How to filter a pandas DataFrame on a datetime column

Question:

I'm working on a dataset with a datetime column that looks like this (2012-01-01 00:00:00; 2012-01-01 01:00:00…). I have data for every hour, but I need to do the analysis per day, so I created another column holding only the date. The dataframe looks roughly like this:

               Datetime        Date  Airquality(float)
0   2012-01-01 00:00:00  2012-01-01             120.00
1   2012-01-01 01:00:00  2012-01-01             150.12
2   2012-01-01 02:00:00  2012-01-01             140.00
..
40  2012-01-02 00:00:00  2012-01-02             130.54
41  2012-01-02 01:00:00  2012-01-02             110.33
...
99  2012-12-31 23:00:00  2012-12-31             190.76

Now I need to extract all the days in 2012 on which the pollution was higher than 150 (the dataset covers more than one year). If I group only by the Date column, I get much more information than I need, because of course there are duplicates.

Asked By: Bojan

Answers:

Given:

              Datetime        Date  Airquality(float)
0  2012-01-01 00:00:00  2012-01-01             120.00
1  2012-01-01 01:00:00  2012-01-01             150.12
2  2012-01-01 02:00:00  2012-01-01             140.00
3  2012-01-02 00:00:00  2012-01-02             130.54
4  2012-01-02 01:00:00  2012-01-02             110.33
5  2012-12-31 23:00:00  2012-12-31             190.76

Doing:

import pandas as pd

# We don't actually need this column.
df = df.drop('Date', axis=1)
# This may be already done but...
df.Datetime = pd.to_datetime(df.Datetime)
# Set it as your index.
df = df.set_index('Datetime')

# Filter to just 2012, 
# Resample to daily taking the max. 
# Filter to only show 'Airquality(float)' values > 150.
output = (df.loc[df.index.year == 2012]
            .resample('1D').max()
            [lambda x: x['Airquality(float)'].gt(150)])
print(output)

Output:

            Airquality(float)
Datetime
2012-01-01             150.12
2012-12-31             190.76
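
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. It assumes the six sample rows from the "Given" table; the names daily_max, high_days and days are only illustrative. It works on the single column as a Series and finishes by pulling out just the days, since that is what the question asks for.

```python
import pandas as pd

# Sample rows copied from the "Given" table above.
df = pd.DataFrame({
    "Datetime": [
        "2012-01-01 00:00:00", "2012-01-01 01:00:00", "2012-01-01 02:00:00",
        "2012-01-02 00:00:00", "2012-01-02 01:00:00", "2012-12-31 23:00:00",
    ],
    "Airquality(float)": [120.00, 150.12, 140.00, 130.54, 110.33, 190.76],
})
df["Datetime"] = pd.to_datetime(df["Datetime"])
df = df.set_index("Datetime")

# Daily maximum for 2012 only; days without readings come out as NaN.
daily_max = df.loc[df.index.year == 2012, "Airquality(float)"].resample("1D").max()

# NaN > 150 is False, so empty days drop out together with the low ones.
high_days = daily_max[daily_max > 150]

# Just the calendar days, as strings.
days = high_days.index.strftime("%Y-%m-%d").tolist()
print(days)  # ['2012-01-01', '2012-12-31']
```

Taking the daily max first is what removes the duplicates: each day appears once, and it is kept if any of its hourly readings exceeded 150.
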
Answered By: BeRT2me

pd.Grouper should help here. You can group by day with
df.groupby(pd.Grouper(key="Datetime", freq="1D")).max()
and then apply a normal filter.
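
A rough sketch of how that could look in full, assuming Datetime is left as an ordinary column rather than set as the index. The sample rows come from the question, except the 2013 row, which is made up here only to exercise the year filter; in_2012, daily_max and high_days are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Datetime": [
        "2012-01-01 00:00:00", "2012-01-01 01:00:00", "2012-01-01 02:00:00",
        "2012-01-02 00:00:00", "2012-01-02 01:00:00", "2012-12-31 23:00:00",
        "2013-01-01 00:00:00",  # made-up row, not from the question
    ],
    "Airquality(float)": [120.00, 150.12, 140.00, 130.54, 110.33, 190.76, 200.00],
})
df["Datetime"] = pd.to_datetime(df["Datetime"])

# Keep only the 2012 rows before grouping.
in_2012 = df[df["Datetime"].dt.year == 2012]

# One group per calendar day, reduced to that day's highest reading.
daily_max = in_2012.groupby(pd.Grouper(key="Datetime", freq="1D"))["Airquality(float)"].max()

# The "normal filter": keep days whose maximum exceeded 150.
high_days = daily_max[daily_max > 150]
print(high_days)
```

With these rows, high_days should hold 2012-01-01 and 2012-12-31 only; the 2013 row is excluded by the year filter even though its value is above 150.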

Answered By: Daraan