Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

Question:

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I’d like to do is create a new column truncated to hour precision. I’m currently using:

from datetime import datetime

df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that’s fine. However, I’ve an inkling there’s some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Asked By: Jon Clements

Answers:

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
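As a minimal, self-contained sketch of the three methods side by side (the sample frame below is rebuilt from the output shown above; the column names `floor`, `ceil` and `round` are only illustrative):

```python
import pandas as pd

# Rebuild the small example frame from the question.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# floor truncates down, ceil rounds up, round goes to the nearest hour.
df["floor"] = df["dt"].dt.floor("h")
df["ceil"] = df["dt"].dt.ceil("h")
df["round"] = df["dt"].dt.round("h")

print(df)
```

Only floor gives truncation; round would move 17:39:24 forward to 18:00:00.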

Here’s another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
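
For instance, a rough sketch of month and year truncation with this approach might look like the following. The sample dates and the column names `month_start` and `year_start` are made up for illustration, and the extra cast back to `<M8[ns]` is an assumption added here so the resulting dtype does not depend on the pandas version:

```python
import pandas as pd

# Illustrative data spanning several months and years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-02-28 17:39:24",
    ])
})

values = df["dt"].values  # underlying NumPy datetime64 array

# Casting to a coarser unit drops everything below that unit;
# casting back to nanoseconds restores a dtype pandas stores natively.
df["month_start"] = values.astype("<M8[M]").astype("<M8[ns]")
df["year_start"] = values.astype("<M8[Y]").astype("<M8[ns]")

print(df)
```
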
Answered By: Alex Riley

A method I’ve used in the past to accomplish this goal was the following (quite similar to what you’re already doing, but thought I’d throw it out there anyway):

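# Also zero the sub-second fields, otherwise they survive the truncation
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))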
Answered By: David Hagan

Alternatively:

df['dt'].dt.to_period("h")  # hourly Period values
df['dt'].dt.to_period("h").dt.to_timestamp()  # Timestamps truncated to the hour

This is arguably the least ambiguous (more Pythonic?) way to achieve this.
If you use floor/round/ceil with a coarser, non-fixed frequency (months, years…), you get an error message:

ValueError: <YearEnd: month=12> is a non-fixed frequency

See discussion here: https://github.com/pandas-dev/pandas/issues/15303
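
A small sketch of the period route for a coarser unit, here months, assuming a frame with a datetime column named `dt` as in the question (the sample dates and the names `month` and `month_start` are illustrative only):

```python
import pandas as pd

# Illustrative data; only the column name "dt" comes from the question.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-02-28 17:39:24",
    ])
})

# Convert each timestamp to the monthly period that contains it...
df["month"] = df["dt"].dt.to_period("M")

# ...then back to a timestamp at the start of that period.
df["month_start"] = df["month"].dt.to_timestamp()

print(df)
```

The intermediate `month` column holds Period values, which can be useful on their own for grouping.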

Answered By: Adav