Pandas does not respect conversion to time type

Question:

I have this dataframe:

    site    date    time
1   AA  2018-01-01  0100
2   AA  2018-01-01  0200
3   AA  2018-01-01  0300
4   AA  2018-01-01  0400
5   AA  2018-01-01  0500
6   AA  2018-01-01  0600
7   AA  2018-01-01  0700
8   AA  2018-01-01  0800
9   AA  2018-01-01  0900

df.dtypes
>>>   site            object
      date    datetime64[ns]
      time            object

I would like to convert the time column to time type (without date) to later filter the dataframe between desired hours. So I did:

df['time'] = df['time'].apply(lambda x: pd.to_datetime(x, format='%H%M').time())

The dataframe now looks like this:

    site    date    time
1   AA  2018-01-01  01:00:00
2   AA  2018-01-01  02:00:00
3   AA  2018-01-01  03:00:00
4   AA  2018-01-01  04:00:00
5   AA  2018-01-01  05:00:00
6   AA  2018-01-01  06:00:00
7   AA  2018-01-01  07:00:00
8   AA  2018-01-01  08:00:00
9   AA  2018-01-01  09:00:00

However, the data type is still an object type:

df.dtypes
>>> site            object
    date    datetime64[ns]
    time            object
    dtype: object

But when I check the type of an individual value, the conversion does seem to have worked:

df.at[5,'time']
>>> datetime.time(5, 0)

type(df.at[5,'time'])
>>> datetime.time

Still, I can’t filter the data based on time:

from datetime import time
df[df['time'].between_time(time(5),time(8))]
>>> TypeError: Index must be DatetimeIndex
Asked By: user88484


Answers:

You see the TypeError because between_time selects rows based on the index, not on the values of a column, and its documentation states:

Raises

TypeError

    If the index is not a DatetimeIndex

You need to give the dataframe a DatetimeIndex, and for that the index values must be full datetimes, not just times. Calling .time() converts each value into a datetime.time object, which is why the column stays object dtype, and a DatetimeIndex cannot be built from time objects.
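To make the difference concrete, here is a minimal sketch. The frame is a hypothetical reconstruction of the one in the question, and the variable names parsed, as_time and by_time are made up for illustration. It shows that the parsed values are a datetime dtype, that the .time() values fall back to object dtype, and that only the former yields a DatetimeIndex.

```python
from datetime import time

import pandas as pd

# Hypothetical frame mirroring the one in the question (time kept as strings).
df = pd.DataFrame({
    "site": ["AA"] * 9,
    "date": pd.to_datetime(["2018-01-01"] * 9),
    "time": [f"{h:02d}00" for h in range(1, 10)],
})

# Parsing with only %H%M gives full datetimes; the missing date defaults to 1900-01-01.
parsed = pd.to_datetime(df["time"], format="%H%M")

# Extracting the time of day gives datetime.time objects, stored as object dtype.
as_time = parsed.dt.time

print(parsed.dtype)   # a datetime64 dtype
print(as_time.dtype)  # object

# Setting the parsed datetimes as the index produces a DatetimeIndex,
# which is what between_time requires.
by_time = df.set_index(parsed)
print(type(by_time.index).__name__)
```

Note that parsed.dt.time is also a vectorized replacement for the apply call in the question, although the result is still an object column.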

One way to get the result you wanted, starting from the original string time column, is:

df.set_index(pd.DatetimeIndex(
    pd.to_datetime(df["time"], format="%H%M"))).between_time(
    time(5), time(8)
).reset_index(drop=True)

Output:

  site        date  time
0   AA  2018-01-01  0500
1   AA  2018-01-01  0600
2   AA  2018-01-01  0700
3   AA  2018-01-01  0800

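Alternatively, you could combine your date and time columns into a datetime index and then use between_time. Because date is datetime64[ns], it has to be formatted as a string before it can be concatenated with time:

df.set_index(
    pd.to_datetime(df['date'].dt.strftime('%Y-%m-%d') + ' ' + df['time'],
                   format='%Y-%m-%d %H%M')
).between_time(time(5), time(8)).reset_index(drop=True)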
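If you would rather not replace the index at all, a possible variation is to build a temporary DatetimeIndex and ask it for the matching positions with indexer_between_time. This is only a sketch under the assumption that the frame looks like the one in the question; the names stamps, positions and result are illustrative.

```python
from datetime import time

import pandas as pd

# Hypothetical frame mirroring the one in the question, with its 1..9 index.
df = pd.DataFrame(
    {
        "site": ["AA"] * 9,
        "date": pd.to_datetime(["2018-01-01"] * 9),
        "time": [f"{h:02d}00" for h in range(1, 10)],
    },
    index=range(1, 10),
)

# Temporary DatetimeIndex built from the time strings; df itself is not modified.
stamps = pd.DatetimeIndex(pd.to_datetime(df["time"], format="%H%M"))

# Integer positions of the rows whose time of day lies in [05:00, 08:00].
positions = stamps.indexer_between_time(time(5), time(8))

# Select by position, keeping the original index labels and columns.
result = df.iloc[positions]
print(result)
```

Unlike the set_index variants, this keeps the original row labels, which can help if the filtered rows need to be matched back to the full frame later.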
Answered By: SomeDude