Filter all rows in a DataFrame with a selected Month (Pandas Datetime)

Question:

I have the following set of dates and I’m looking to filter all the rows if it has ‘2022-02’ in it.

            Sunday      Monday      Tuesday     Wednesday   Thursday    Friday      Saturday
January     2021-12-26  2021-12-27  2021-12-28  2021-12-29  2021-12-30  2021-12-31  2022-01-01
January     2022-01-02  2022-01-03  2022-01-04  2022-01-05  2022-01-06  2022-01-07  2022-01-08
January     2022-01-09  2022-01-10  2022-01-11  2022-01-12  2022-01-13  2022-01-14  2022-01-15
January     2022-01-16  2022-01-17  2022-01-18  2022-01-19  2022-01-20  2022-01-21  2022-01-22
January     2022-01-23  2022-01-24  2022-01-25  2022-01-26  2022-01-27  2022-01-28  2022-01-29
February    2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03  2022-02-04  2022-02-05
February    2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10  2022-02-11  2022-02-12
February    2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17  2022-02-18  2022-02-19
February    2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24  2022-02-25  2022-02-26
March       2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03  2022-03-04  2022-03-05
March       2022-03-06  2022-03-07  2022-03-08  2022-03-09  2022-03-10  2022-03-11  2022-03-12
March       2022-03-13  2022-03-14  2022-03-15  2022-03-16  2022-03-17  2022-03-18  2022-03-19
March       2022-03-20  2022-03-21  2022-03-22  2022-03-23  2022-03-24  2022-03-25  2022-03-26
April       2022-03-27  2022-03-28  2022-03-29  2022-03-30  2022-03-31  2022-04-01  2022-04-02
April       2022-04-03  2022-04-04  2022-04-05  2022-04-06  2022-04-07  2022-04-08  2022-04-09
....

I am trying to filter out the rows with ‘2022-02’. As shown below:
PS this DataFrame has a row of March because ‘2022-02’ is also present in that row.

            Sunday      Monday      Tuesday     Wednesday   Thursday    Friday      Saturday
February    2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03  2022-02-04  2022-02-05
February    2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10  2022-02-11  2022-02-12
February    2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17  2022-02-18  2022-02-19
February    2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24  2022-02-25  2022-02-26
March       2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03  2022-03-04  2022-03-05

I’ve tried these methods, without any success.

calendar.loc[(calendar >= '2022-02-01') & (calendar < '2020-03-01')]
calendar[calendar == '2022-02-01']
Asked By: Henul

||

Answers:

Based on your comments –

yes they are strings

IIUC, these dates are strings, you are trying to treat them as a date variable while searching for strings). The ideal way to solve this would be to convert everything to datetime and then use pd.series.dt. methods but, since all your data is already strings and this is just a simple string search problem, feel free to just use the following.

  1. row.str.find('2022-02')>=0) this checks if, in a given row, it finds the given string.
  2. calender.apply(lambda row ... applies it to each row independently.
  3. .any(1) takes an OR between the True/False values for each row, to give a True, if any of the rows contain 2022-02
  4. calender[...] using the above boolean series to get only those rows that you want.

Hope this explains the code below –

calender[calender.apply(lambda row: row.str.find('2022-02')>=0).any(1)]
              Sunday      Monday     Tuesday   Wednesday    Thursday  
February  2022-01-30  2022-01-31  2022-02-01  2022-02-02  2022-02-03   
February  2022-02-06  2022-02-07  2022-02-08  2022-02-09  2022-02-10   
February  2022-02-13  2022-02-14  2022-02-15  2022-02-16  2022-02-17   
February  2022-02-20  2022-02-21  2022-02-22  2022-02-23  2022-02-24   
March     2022-02-27  2022-02-28  2022-03-01  2022-03-02  2022-03-03   

              Friday    Saturday  
February  2022-02-04  2022-02-05  
February  2022-02-11  2022-02-12  
February  2022-02-18  2022-02-19  
February  2022-02-25  2022-02-26  
March     2022-03-04  2022-03-05  

Avoid comparing date strings as 2022-01-10 > 2022-02-11

Just to add here, for posterity, comparing dates as strings using > or <= is not a good practice, and its discussed in detail on this popular SO post here.

Furthermore, doing string comparisons only compares Unicode values for each character, and is prone to failed cases such as this '10/15/13' > '10/12/14'

So I would highly recommend avoiding these types of coding practices.

Answered By: Akshay Sehgal

Use from this code


df[(df.Saturday <= '2022-03-06')&(df.Sunday >= '2022-01-26')]

it is very simple and fast
If your dataset is large before this line you can use from convert_dtypes method(some times pandas use from object for data type and it is very slow)

Answered By: Alireza
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.