Splitting nested date lists in DataFrame columns and calculating the average hour
Question:
Let's say I have this DataFrame:
ID | date_time |
---|---|
1 | 2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020-04-14 22:10:56, 2021-06-02 22:18:06 |
2 | 2010-09-13 21:43:09, 2011-05-04 23:08:15,2012-06-04 23:08:16 |
3 | 2013-06-14 23:29:17, 2014-08-13 23:20:22,2014-08-13 23:20:22 |
I want to remove the YYYY-MM-DD date from every comma-separated entry, keeping only the time, and then calculate the average hour from the resulting list.
The final output would be:
ID | date_time | AVG_hour |
---|---|---|
1 | 21:10:56,22:18:06,22:10:56 | 22 |
2 | 21:43:09,23:08:15,23:08:16 | 22 |
3 | 23:29:17,23:20:22,23:20:22 | 22 |
I tried the following, but it did not work:
df['date_time'] = [para.split(None, 1)[1] for para in df['date_time']]
df.head()
Answers:
Here is one way to do it:
import numpy as np
import pandas as pd

# Split each cell on commas, parse every piece as a datetime, and keep only its time of day
# as a timedelta; total_seconds() / 3600 turns that into fractional hours.
# np.mean averages the hours per row, and round() gives the nearest whole hour.
df['Avg_hour'] = df['date_time'].str.split(',').apply(lambda x: round(np.mean([pd.to_timedelta(pd.to_datetime(i).strftime('%H:%M:%S')).total_seconds() / 3600 for i in x])))
df
ID date_time Avg_hour
0 1 2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020... 22
1 2 2010-09-13 21:43:09, 2011-05-04 23:08:15,2012-... 23
2 3 2013-06-14 23:29:17, 2014-08-13 23:20:22,2014-... 23
# Same as above, but rounded to 2 decimal places
df['Avg_hour'] = df['date_time'].str.split(',').apply(lambda x: round(np.mean([pd.to_timedelta(pd.to_datetime(i).strftime('%H:%M:%S')).total_seconds() / 3600 for i in x]), 2))
df
ID date_time Avg_hour
0 1 2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020... 21.99
1 2 2010-09-13 21:43:09, 2011-05-04 23:08:15,2012-... 22.66
2 3 2013-06-14 23:29:17, 2014-08-13 23:20:22,2014-... 23.39
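The snippets above fill in Avg_hour but leave date_time untouched, so the first half of the question (dropping the dates) is still open. The attempt in the question, `para.split(None, 1)[1]`, splits only once at the first whitespace, so it strips just the first date in each cell. A minimal sketch of one way to handle both steps follows; `times_only` and `avg_hour` are hypothetical helper names, not part of pandas, and the sketch assumes every entry has the form `YYYY-MM-DD HH:MM:SS`.

```python
def times_only(cell):
    """Return the comma-separated time parts of a 'date time, date time' string."""
    # Each piece looks like ' 2020-06-02 22:18:06'; split() ignores the stray
    # spaces around it, and the last token is the time.
    return ",".join(piece.split()[-1] for piece in cell.split(","))


def avg_hour(times, ndigits=2):
    """Average comma-separated 'HH:MM:SS' values as fractional hours of the day."""
    hours = []
    for t in times.split(","):
        h, m, s = (int(x) for x in t.split(":"))
        hours.append(h + m / 60 + s / 3600)
    return round(sum(hours) / len(hours), ndigits)


# The three date_time cells from the question, as plain strings.
rows = [
    "2020-03-13 21:10:56, 2020-06-02 22:18:06, 2020-04-14 22:10:56, 2021-06-02 22:18:06",
    "2010-09-13 21:43:09, 2011-05-04 23:08:15,2012-06-04 23:08:16",
    "2013-06-14 23:29:17, 2014-08-13 23:20:22,2014-08-13 23:20:22",
]

cleaned = [times_only(r) for r in rows]
averages = [avg_hour(c) for c in cleaned]

print(cleaned[1])  # 21:43:09,23:08:15,23:08:16
print(averages)    # [21.99, 22.66, 23.39]
```

With a DataFrame, the same helpers could be applied per cell, for example `df['date_time'] = df['date_time'].apply(times_only)` followed by `df['Avg_hour'] = df['date_time'].apply(avg_hour)`. The averages agree with the two-decimal output above; rounded to whole hours they come to 22, 23 and 23.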