update column values based on length temp dataframes based on source dataframe
Question:
I have a data:
ID
closed
set
message_time
aaa
22:00
19:00
19:05
aaa
22:00
19:30
19:40
aaa
22:00
20:00
21:00
bbb
23:00
22:00
22:10
ccc
22:00
19:00
19:05
ccc
22:00
19:30
19:40
code is here:
import pandas as pd
df = pd.DataFrame({
'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
'closed': ['22:00', '22:00', '22:00', '23:00', '22:00', '22:00'],
'set': ['19:00', '19:30', '20:00', '22:00', '19:00', '19:30'],
'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})
I need to create new column named "newtime" based on several conditions:
We have temp_df for each of id, so length of temp_df where id = aaa is 3, for bbb is 1, for ccc is 2;
if length of temp_df == 1 then newtime = closed - set
if length of temp_df > 1 then
for rows > 0 and < length
newtime = set[row+1] - set[row]
if row == length then newtime = closed[row] - set[row]
What is the best way ro realize that?
P. S. The needed output is:
ID
closed
set
message_time
newtime
aaa
22:00
19:00
19:05
30 min
aaa
22:00
19:30
19:40
30 min
aaa
22:00
20:00
21:00
120 min
bbb
23:00
22:00
22:10
60 min
ccc
22:00
19:00
19:05
30 min
ccc
22:00
19:30
19:40
150 min
Answers:
Convert to_timedelta
and fillna
the last row per group with the difference:
c = pd.to_timedelta(df['closed']+':00')
s = pd.to_timedelta(df['set']+':00')
df['newtime'] = s.groupby(df['ID']).diff(-1).mul(-1).fillna(c-s)
Output:
ID closed set message_time newtime
0 aaa 22:00 19:00 19:05 0 days 00:30:00
1 aaa 22:00 19:30 19:40 0 days 00:30:00
2 aaa 22:00 20:00 21:00 0 days 02:00:00
3 bbb 23:00 22:00 22:10 0 days 01:00:00
4 ccc 22:00 19:00 19:05 0 days 00:30:00
5 ccc 22:00 19:30 19:40 0 days 02:30:00
If you need output as minutes:
df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1)
.fillna(c-s)
.dt.total_seconds().div(60)
)
Output:
ID closed set message_time newtime
0 aaa 22:00 19:00 19:05 30.0
1 aaa 22:00 19:30 19:40 30.0
2 aaa 22:00 20:00 21:00 120.0
3 bbb 23:00 22:00 22:10 60.0
4 ccc 22:00 19:00 19:05 30.0
5 ccc 22:00 19:30 19:40 150.0
I have a data:
ID | closed | set | message_time |
---|---|---|---|
aaa | 22:00 | 19:00 | 19:05 |
aaa | 22:00 | 19:30 | 19:40 |
aaa | 22:00 | 20:00 | 21:00 |
bbb | 23:00 | 22:00 | 22:10 |
ccc | 22:00 | 19:00 | 19:05 |
ccc | 22:00 | 19:30 | 19:40 |
code is here:
import pandas as pd
df = pd.DataFrame({
'ID': ['aaa', 'aaa', 'aaa', 'bbb', 'ccc', 'ccc'],
'closed': ['22:00', '22:00', '22:00', '23:00', '22:00', '22:00'],
'set': ['19:00', '19:30', '20:00', '22:00', '19:00', '19:30'],
'message_time': ['19:05', '19:40', '21:00', '22:10', '19:05', '19:40']
})
I need to create new column named "newtime" based on several conditions:
We have temp_df for each of id, so length of temp_df where id = aaa is 3, for bbb is 1, for ccc is 2;
if length of temp_df == 1 then newtime = closed - set
if length of temp_df > 1 then
for rows > 0 and < length
newtime = set[row+1] - set[row]
if row == length then newtime = closed[row] - set[row]
What is the best way ro realize that?
P. S. The needed output is:
ID | closed | set | message_time | newtime |
---|---|---|---|---|
aaa | 22:00 | 19:00 | 19:05 | 30 min |
aaa | 22:00 | 19:30 | 19:40 | 30 min |
aaa | 22:00 | 20:00 | 21:00 | 120 min |
bbb | 23:00 | 22:00 | 22:10 | 60 min |
ccc | 22:00 | 19:00 | 19:05 | 30 min |
ccc | 22:00 | 19:30 | 19:40 | 150 min |
Convert to_timedelta
and fillna
the last row per group with the difference:
c = pd.to_timedelta(df['closed']+':00')
s = pd.to_timedelta(df['set']+':00')
df['newtime'] = s.groupby(df['ID']).diff(-1).mul(-1).fillna(c-s)
Output:
ID closed set message_time newtime
0 aaa 22:00 19:00 19:05 0 days 00:30:00
1 aaa 22:00 19:30 19:40 0 days 00:30:00
2 aaa 22:00 20:00 21:00 0 days 02:00:00
3 bbb 23:00 22:00 22:10 0 days 01:00:00
4 ccc 22:00 19:00 19:05 0 days 00:30:00
5 ccc 22:00 19:30 19:40 0 days 02:30:00
If you need output as minutes:
df['newtime'] = (s.groupby(df['ID']).diff(-1).mul(-1)
.fillna(c-s)
.dt.total_seconds().div(60)
)
Output:
ID closed set message_time newtime
0 aaa 22:00 19:00 19:05 30.0
1 aaa 22:00 19:30 19:40 30.0
2 aaa 22:00 20:00 21:00 120.0
3 bbb 23:00 22:00 22:10 60.0
4 ccc 22:00 19:00 19:05 30.0
5 ccc 22:00 19:30 19:40 150.0