How can I get the values at every nth hour from a Pandas DataFrame?

Question:

I would like to get values at every nth hour from a Pandas DataFrame. The DataFrame uses a DateTime column as index like this:

                                                       Value A             Value B               Value C
timestamp
2021-03-29 23:58:59.443000+00:00                           0.7                 0.2                   0.0
2021-03-29 23:58:59.458000+00:00                           0.0                 0.1                   0.1
2021-03-29 23:58:59.474000+00:00                           0.3                 0.0                   0.2
2021-03-29 23:59:59.446000+00:00                           0.2                 0.0                   0.0
2021-03-29 23:59:59.461000+00:00                           0.0                 0.0                   0.5

Now I would like to extract the values at every nth hour. What is the best way to do this? The only approach I can think of is to generate a list of the dates at which the values should be extracted, loop through that list, find the date in the DataFrame with the smallest difference, and get the values at that date. But this feels like rather bad practice.

Asked By: Axel


Answers:

Use an asof merge. For each point on the hourly cadence, this merges in the entire row of your DataFrame whose timestamp is closest to it. You can set direction='forward' or direction='backward' to match only the closest time in the future or in the past instead of in either direction.

import pandas as pd

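# Series of hours that span the range of the Index
# ('h' replaces the alias 'H', which is deprecated since pandas 2.2)
s = pd.Series(pd.date_range(df.index.min().floor('h'), df.index.max().ceil('h'), freq='h'),
              name='times')

# merge_asof requires both keys to be sorted; call df.sort_index() first if needed
pd.merge_asof(s, df.reset_index(), left_on='times', right_on='timestamp', direction='nearest')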

                      times                        timestamp  ValueA  ValueB  ValueC
0 2021-03-29 23:00:00+00:00 2021-03-29 23:58:59.443000+00:00     0.7     0.2     0.0
1 2021-03-30 00:00:00+00:00 2021-03-29 23:59:59.461000+00:00     0.0     0.0     0.5
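
To sample every nth hour rather than every hour, the same idea can be applied with a wider step in the target range. The following is a minimal sketch under assumptions: the sample data, the step n = 3, and the names targets and nearest are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data: one reading every 20 minutes, starting at 00:07 UTC.
idx = pd.date_range("2021-03-29 00:07", periods=36, freq="20min",
                    tz="UTC", name="timestamp")
df = pd.DataFrame({"Value A": range(36)}, index=idx)

n = 3  # hypothetical step: one sample every 3rd hour

# Target times every n hours, spanning the range of the index.
targets = pd.Series(
    pd.date_range(df.index.min().floor("h"), df.index.max().ceil("h"), freq=f"{n}h"),
    name="times",
)

# Both sides must be sorted by their key, hence sort_index().
nearest = pd.merge_asof(
    targets,
    df.sort_index().reset_index(),
    left_on="times",
    right_on="timestamp",
    direction="nearest",
)
print(nearest)
```

Keeping the matched timestamp column next to times makes it easy to check how far each selected row is from its target. If distant matches should be rejected, merge_asof also accepts a tolerance argument such as pd.Timedelta("10min").
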
Answered By: ALollz

You can use .loc with a boolean condition built from whichever index or column holds the timestamps. With the timestamps as the index:

# A DatetimeIndex exposes .hour directly; the .dt accessor exists only on Series
nth_df = df.loc[df.index.hour.isin([9])]

With the timestamps as a column:

nth_df = df.loc[df['timestamp'].dt.hour.isin([9])]

Both return the rows that fall in hour 9 on every day in the data. You can combine further conditions with &, like so:

nth_df = df.loc[(df.index.hour == 9) & (df.index.minute == 30)]
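
The same kind of filter can be extended from a single hour to every nth hour by taking the hour of day modulo n. This is only a sketch under assumptions: the sample data, n = 6, and the names in_nth_hour and first_per_hour are hypothetical. Hours are counted from midnight of each day, so the spacing is only even when n divides 24.

```python
import pandas as pd

# Hypothetical sample data: one reading every 30 minutes for two days (UTC).
idx = pd.date_range("2021-03-29", periods=96, freq="30min",
                    tz="UTC", name="timestamp")
df = pd.DataFrame({"Value A": range(96)}, index=idx)

n = 6  # hypothetical step: keep hours 0, 6, 12 and 18

# Boolean mask: True where the hour of day is a multiple of n.
in_nth_hour = df.index.hour % n == 0
nth_df = df.loc[in_nth_hour]

# Optionally keep only the first row inside each selected hour.
first_per_hour = nth_df.groupby(nth_df.index.floor("h")).head(1)
print(first_per_hour)
```

Unlike the asof merge, this keeps every row inside the selected hours and returns nothing for a selected hour that has no data.
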
Answered By: pky