Keep max value until ID and condition change in Pandas

Question:

I have a dataframe that looks like this (link to csv)

id       time        value   approved
  1       0:00        10       false
  1       0:01        20       true
  1       0:02        30       true
  1       0:03        20       true
  1       0:04        40       false
  1       0:05        35       false
  1       0:06        60       false
  2       0:07        20       true
  2       0:08        30       true
  2       0:09        50       false
  2       0:10        45       false
  2       0:11        70       false
  2       0:12        62       false

I want to add two more columns: the running max of the approved values, with a tolerance of 2 seconds, and the time at which that max occurred. The result should look like this:

id       time        value   approved    max_approved   max_time
  1       0:00        10       false         NaN          NaN
  1       0:01        20       true          20           0:01
  1       0:02        30       true          30           0:02
  1       0:03        20       true          30           0:02
  1       0:04        40       false         40           0:04
  1       0:05        35       false         40           0:04
  1       0:06        60       false         40           0:04
  2       0:07        20       true          20           0:07
  2       0:08        30       true          30           0:08
  2       0:09        50       false         50           0:09
  2       0:10        45       false         50           0:09
  2       0:11        70       false         50           0:09

How can I do this? Thanks

Asked By: mwind


Answers:

You could use iterrows to do so

max_value = 0
for index, row_data in df.iterrows():
  # your logic, e.g.
  if row_data.approved and row_data.value > max_value:
    max_value = row_data.value
  # write the running max back, addressing the row by its index label
  df.loc[index, 'max_approved'] = max_value
  ...

Does this help to get started?

If you want an exact solution, please provide code that builds the DataFrame (so we don’t have to parse the data out of your question), or your own code and a description of where you are stuck.
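
For reference, here is a minimal, self-contained sketch of such a loop. The DataFrame is rebuilt by hand from the table in the question. It assumes one row per second, and that "tolerance of 2 secs" means a value may still raise the max up to two rows after the last approved row of the same id. TOLERANCE, best_value, best_time and since_approved are names made up for this illustration.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    "time": [f"0:{s:02d}" for s in range(13)],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

# Assumption: number of rows after the last approved row that may still raise the max.
TOLERANCE = 2

max_approved, max_time = [], []
current_id = None
for _, row in df.iterrows():
    if row["id"] != current_id:
        # New id: forget everything seen for the previous one.
        current_id = row["id"]
        best_value, best_time, since_approved = None, None, None
    if row["approved"]:
        since_approved = 0
    elif since_approved is not None:
        since_approved += 1
    # A row is eligible if an approved row was seen at most TOLERANCE rows ago.
    eligible = since_approved is not None and since_approved <= TOLERANCE
    if eligible and (best_value is None or row["value"] > best_value):
        best_value, best_time = row["value"], row["time"]
    max_approved.append(best_value)
    max_time.append(best_time)

df["max_approved"] = max_approved
df["max_time"] = max_time
print(df)
```

On the sample data this matches the rows of the expected output shown in the question. iterrows is slow on large frames, so treat this as a way to pin down the logic rather than a final solution.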

Answered By: tturbo

The logic or output is not fully clear, but if I guess correctly, you can try:

df['td'] = pd.to_timedelta('0:'+df['time'])

df[['max_approved', 'max_time']] = (df
   .assign(value=df['value'].where(df['approved']),
           last_time=lambda d: d['td'].dt.total_seconds().where(df['approved']),
          )
   .set_index('td').groupby('id')[['value', 'last_time']]
   .apply(lambda s: s.rolling('2s').max().ffill())
   .to_numpy()
)

output:

    id  time  value  approved              td  max_approved  max_time
0    1  0:00     10     False 0 days 00:00:00           NaN       NaN
1    1  0:01     20      True 0 days 00:00:01          20.0       1.0
2    1  0:02     30      True 0 days 00:00:02          30.0       2.0
3    1  0:03     20      True 0 days 00:00:03          30.0       3.0
4    1  0:04     40     False 0 days 00:00:04          20.0       3.0
5    1  0:05     35     False 0 days 00:00:05          20.0       3.0
6    1  0:06     60     False 0 days 00:00:06          20.0       3.0
7    2  0:07     20      True 0 days 00:00:07          20.0       7.0
8    2  0:08     30      True 0 days 00:00:08          30.0       8.0
9    2  0:09     50     False 0 days 00:00:09          30.0       8.0
10   2  0:10     45     False 0 days 00:00:10          30.0       8.0
11   2  0:11     70     False 0 days 00:00:11          30.0       8.0
12   2  0:12     62     False 0 days 00:00:12          30.0       8.0
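
To see what a time-based window such as '2s' covers, a toy Series can help (the values are invented for illustration and are not taken from the question). As far as I understand the default settings, the window ending at time t spans (t - 2s, t], which at one-second spacing is the current row and the one before it.

```python
import pandas as pd

# Toy series with one observation per second; the values are made up.
s = pd.Series(
    [10.0, 20.0, 30.0, 5.0, 1.0],
    index=pd.to_timedelta([0, 1, 2, 3, 4], unit="s"),
)

two_sec = s.rolling("2s").max()    # current row and the one before it
three_sec = s.rolling("3s").max()  # current row and the two before it

print(pd.DataFrame({"value": s, "max_2s": two_sec, "max_3s": three_sec}))
```

At 4 s the two-second window has already dropped the 30 seen at 2 s, while the three-second window still includes it.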

Answered By: mozway

After a few days of research I managed to do it this way:

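# a row may set a new max if it, or one of the two rows before it, is approved
canBeTop = (df['approved'].rolling(window=3, min_periods=1).max() == True)
# running max per id over the eligible rows, carried forward over the other rows
df['max_approved'] = df.groupby(['id', canBeTop])['value'].transform('cummax').where(canBeTop).ffill()
# keep the time only on eligible rows that hold the current max...
df['max_time'] = df.where(canBeTop & (df['value'] == df['max_approved']))['time']
# ...then carry it forward within each id
df['max_time'] = df.groupby('id', group_keys=False)['max_time'].apply(lambda x: x.ffill())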
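
One thing to watch: the rolling window and the first ffill above run over the whole frame, so on other data they could carry state from one id into the next. A possible per-id variant is sketched below. It rebuilds the sample frame from the question by hand, and can_be_top, running_max and is_new_max are illustrative names. It has only been checked against the sample data.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    "time": [f"0:{s:02d}" for s in range(13)],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

# Eligibility computed inside each id: the row itself or one of the
# two rows before it (within the same id) is approved.
can_be_top = (
    df.groupby("id")["approved"]
      .transform(lambda s: s.astype(int).rolling(window=3, min_periods=1).max())
      .astype(bool)
)

# Running max over the eligible rows of each id, blank elsewhere,
# then forward-filled without crossing id boundaries.
running_max = df.groupby(["id", can_be_top])["value"].cummax().where(can_be_top)
df["max_approved"] = running_max.groupby(df["id"]).ffill()

# Time of the rows that hold the current max, forward-filled per id.
is_new_max = can_be_top & (df["value"] == df["max_approved"])
df["max_time"] = df["time"].where(is_new_max).groupby(df["id"]).ffill()

print(df)
```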

Answered By: mwind