Pandas Timedelta Multiple If Condition
Question:
I would like to create a new column called time_of_day for a dataframe such as below so that 06-12:59 is morning, 13-17:59 is afternoon, 18-22:59 is evening and rest of the time is night.
id
TIME
25
00:01
25
02:01
25
06:55
18
22:03
18
23:33
18
00:33
What I expect
id
TIME
TIME_OF_DAY
25
00:01
Night
25
02:01
Night
25
06:55
Morning
18
22:03
Evening
18
23:33
Night
18
00:33
Night
I tried the numpy vectorization with choices and conditions, but it failed. Then I tried the following:
def conditions(s):
if (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('06:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('13:00:00')):
return "Morning"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('13:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('18:00:00')):
return "Afternoon"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('18:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('23:00:00')):
return "Evening"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('23:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('06:00:00')):
return "Night"
df['TIME_OF_DAY'] = df.apply(conditions, axis=1)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I overcome this issue?
PS: I added .all() for the enf od each if condition, but still got the same error.
Answers:
Assuming the sample DataFrame you mentioned in the question, you can make the following modifications to make the code work:
def conditions(t: pd.Timedelta):
if pd.Timestamp('06:00:00') <= t < pd.Timestamp('13:00:00'):
return "Morning"
elif pd.Timestamp('13:00:00') <= t < pd.Timestamp('18:00:00'):
return "Afternoon"
elif pd.Timestamp('18:00:00') <= t < pd.Timestamp('23:00:00'):
return "Evening"
else:
return "Night"
df['TIME_OF_DAY'] = df["TIME"].apply(lambda s: conditions(pd.Timestamp(s)))
Output (df
):
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
Use pandas.cut
, it will be more efficient than a function with apply
:
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
df['TIME_OF_DAY'] = pd.cut(
pd.to_timedelta(df['TIME']+':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
output:
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
If you need a function:
def to_day_period(s):
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
return pd.cut(
pd.to_timedelta(s+':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
df['TIME_OF_DAY'] = to_day_period(df['TIME'])
col1=pd.cut(pd.to_datetime(df1.TIME).dt.hour,[6,13,18,23],labels=['morning','afternoon','evening'],right=False).tolist()
df1.assign(TIME_OF_DAY=col1).fillna("night")
out:
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
I would like to create a new column called time_of_day for a dataframe such as below so that 06-12:59 is morning, 13-17:59 is afternoon, 18-22:59 is evening and rest of the time is night.
id | TIME |
---|---|
25 | 00:01 |
25 | 02:01 |
25 | 06:55 |
18 | 22:03 |
18 | 23:33 |
18 | 00:33 |
What I expect
id | TIME | TIME_OF_DAY |
---|---|---|
25 | 00:01 | Night |
25 | 02:01 | Night |
25 | 06:55 | Morning |
18 | 22:03 | Evening |
18 | 23:33 | Night |
18 | 00:33 | Night |
I tried the numpy vectorization with choices and conditions, but it failed. Then I tried the following:
def conditions(s):
if (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('06:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('13:00:00')):
return "Morning"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('13:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('18:00:00')):
return "Afternoon"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('18:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('23:00:00')):
return "Evening"
elif (pd.to_timedelta(df['TIME']) >= pd.to_timedelta('23:00:00')) & (pd.to_timedelta(df['TIME']) < pd.to_timedelta('06:00:00')):
return "Night"
df['TIME_OF_DAY'] = df.apply(conditions, axis=1)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I overcome this issue?
PS: I added .all() for the enf od each if condition, but still got the same error.
Assuming the sample DataFrame you mentioned in the question, you can make the following modifications to make the code work:
def conditions(t: pd.Timedelta):
if pd.Timestamp('06:00:00') <= t < pd.Timestamp('13:00:00'):
return "Morning"
elif pd.Timestamp('13:00:00') <= t < pd.Timestamp('18:00:00'):
return "Afternoon"
elif pd.Timestamp('18:00:00') <= t < pd.Timestamp('23:00:00'):
return "Evening"
else:
return "Night"
df['TIME_OF_DAY'] = df["TIME"].apply(lambda s: conditions(pd.Timestamp(s)))
Output (df
):
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
Use pandas.cut
, it will be more efficient than a function with apply
:
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
df['TIME_OF_DAY'] = pd.cut(
pd.to_timedelta(df['TIME']+':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
output:
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night
If you need a function:
def to_day_period(s):
bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
return pd.cut(
pd.to_timedelta(s+':00'),
bins=list(map(pd.Timedelta, bins)),
labels=labels, right=False, ordered=False
)
df['TIME_OF_DAY'] = to_day_period(df['TIME'])
col1=pd.cut(pd.to_datetime(df1.TIME).dt.hour,[6,13,18,23],labels=['morning','afternoon','evening'],right=False).tolist()
df1.assign(TIME_OF_DAY=col1).fillna("night")
out:
id TIME TIME_OF_DAY
0 25 00:01 Night
1 25 02:01 Night
2 25 06:55 Morning
3 18 22:03 Evening
4 18 23:33 Night
5 18 00:33 Night