Keep max value until ID and condition change in Pandas
Question:
I have a dataframe that looks like this (link to csv)
id time value approved
1 0:00 10 false
1 0:01 20 true
1 0:02 30 true
1 0:03 20 true
1 0:04 40 false
1 0:05 35 false
1 0:06 60 false
2 0:07 20 true
2 0:08 30 true
2 0:09 50 false
2 0:10 45 false
2 0:11 70 false
2 0:12 62 false
and I want to create two more columns: one that keeps the max approved value, with a tolerance of 2 seconds, and one that keeps the time at which that max occurred. So I want it to look like this
id time value approved max_approved max_time
1 0:00 10 false NaN NaN
1 0:01 20 true 20 0:01
1 0:02 30 true 30 0:02
1 0:03 20 true 30 0:02
1 0:04 40 false 40 0:04
1 0:05 35 false 40 0:04
1 0:06 60 false 40 0:04
2 0:07 20 true 20 0:07
2 0:08 30 true 30 0:08
2 0:09 50 false 50 0:09
2 0:10 45 false 50 0:09
2 0:11 70 false 50 0:09
How can I do this? Thanks
Answers:
You could use iterrows to do so:
max_value = 0
for index, row_data in df.iterrows():
    # your logic, e.g. track the running max of the approved values
    if row_data.approved and row_data.value > max_value:
        max_value = row_data.value
    # write the running max for this row, addressed by its index label
    df.loc[index, 'max_approved'] = max_value
    ...
Does this help to get started?
If you want an exact solution, please provide code that builds the DataFrame (so we don't have to parse the data out of your question), or your own code and where you got stuck.
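For anyone who wants to try the answers, here is a minimal sketch that builds the sample frame from the question. The values are copied from the table above; keeping time as plain strings and approved as Python booleans is an assumption about how the CSV would load.

```python
import pandas as pd

# Sample data copied from the question's table (13 rows, one per second).
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    # "0:00" .. "0:12", kept as strings as in the question
    "time": [f"0:{s:02d}" for s in range(13)],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

print(df)
```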
The logic or output is not fully clear, but if I guess correctly, you can try:
import pandas as pd

df['td'] = pd.to_timedelta('0:'+df['time'])
df[['max_approved', 'max_time']] = (df
   .assign(value=df['value'].where(df['approved']),
           last_time=lambda d: d['td'].dt.total_seconds().where(df['approved']),
          )
   .set_index('td').groupby('id')[['value', 'last_time']]
   .apply(lambda s: s.rolling('2s').max().ffill())
   .to_numpy()
)
output:
id time value approved td max_approved max_time
0 1 0:00 10 False 0 days 00:00:00 NaN NaN
1 1 0:01 20 True 0 days 00:00:01 20.0 1.0
2 1 0:02 30 True 0 days 00:00:02 30.0 2.0
3 1 0:03 20 True 0 days 00:00:03 30.0 3.0
4 1 0:04 40 False 0 days 00:00:04 20.0 3.0
5 1 0:05 35 False 0 days 00:00:05 20.0 3.0
6 1 0:06 60 False 0 days 00:00:06 20.0 3.0
7 2 0:07 20 True 0 days 00:00:07 20.0 7.0
8 2 0:08 30 True 0 days 00:00:08 30.0 8.0
9 2 0:09 50 False 0 days 00:00:09 30.0 8.0
10 2 0:10 45 False 0 days 00:00:10 30.0 8.0
11 2 0:11 70 False 0 days 00:00:11 30.0 8.0
12 2 0:12 62 False 0 days 00:00:12 30.0 8.0
After a few days of research I managed to do it this way:
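# True where the current row or one of the two previous rows is approved (the tolerance, assuming one row per second)
canBeTop = (df['approved'].rolling(window=3, min_periods=1).max() == True)
# running max of value per (id, canBeTop) group, kept only where canBeTop holds, then carried forward
df['max_approved'] = df.groupby(['id', canBeTop])['value'].transform('cummax').where(canBeTop).ffill()
# time of the rows whose value equals the running max, carried forward within each id
df['max_time'] = df.where((canBeTop == True) & (df['value'] == df['max_approved']))['time']
df['max_time'] = df.groupby('id', group_keys=False)['max_time'].apply(lambda x: x.ffill())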
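The steps above can be tried end to end on the sample data. This is only a sketch under a few assumptions: the frame is rebuilt by hand from the question's table, can_be_top is just a renamed canBeTop, the astype(int) cast before rolling and the groupby ffill are my substitutions for the original calls, and rows are assumed to be one second apart so that a 3-row window stands in for the 2-second tolerance.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    "time": [f"0:{s:02d}" for s in range(13)],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

# A row can hold a new max if it, or one of the two rows before it, is approved.
# Cast to int first so the rolling max runs on plain numbers (a cautious choice).
can_be_top = (
    df["approved"].astype(int).rolling(window=3, min_periods=1).max().astype(bool)
)

# Running max per (id, can_be_top) group, blanked where the row cannot be a max,
# then carried forward to fill the blanked rows.
df["max_approved"] = (
    df.groupby(["id", can_be_top])["value"]
      .transform("cummax")
      .where(can_be_top)
      .ffill()
)

# Keep the time only on rows that set or equal the running max,
# then carry it forward within each id.
is_max_row = can_be_top & (df["value"] == df["max_approved"])
df["max_time"] = df["time"].where(is_max_row)
df["max_time"] = df.groupby("id")["max_time"].ffill()

print(df)
```

Note that the rolling window here is not grouped by id, so the first rows of a new id can still see approved rows from the previous id. That does not matter for this sample, but it may for other data.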