Find the number of days since a max value

Question:

Given the following DataFrame:

+----+--------+------------+------+---------------------+
| id | player | match_date | stat | days_since_max_stat |
+----+--------+------------+------+---------------------+
|  1 |      1 | 2022-01-01 | 1500 | NaN                 |
|  2 |      1 | 2022-01-03 | 1600 | 2                   |
|  3 |      1 | 2022-01-10 | 2100 | 7                   |
|  4 |      1 | 2022-01-11 | 1800 | 1                   |
|  5 |      1 | 2022-01-18 | 1700 | 8                   |
|  6 |      2 | 2022-01-01 | 1600 | NaN                 |
|  7 |      2 | 2022-01-03 | 1800 | 2                   |
|  8 |      2 | 2022-01-10 | 1600 | 7                   |
|  9 |      2 | 2022-01-11 | 1900 | 8                   |
| 10 |      2 | 2022-01-18 | 1500 | 7                   |
+----+--------+------------+------+---------------------+

How would I calculate the days_since_max_stat column? The calculation is per player and excludes the stat in the current row.

For example, the value for the row where id = 5 is 8 because, up to that row, the max stat for player 1 occurred in the row where id = 3: days_since_max_stat = 2022-01-18 - 2022-01-10 = 8 days.
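The arithmetic for that example can be checked directly with plain datetime subtraction (a quick illustration, not part of the question):

```python
import datetime as dt

# For id = 5: the running max for player 1 (current row excluded)
# occurred on 2022-01-10 (id = 3)
gap = dt.datetime(2022, 1, 18) - dt.datetime(2022, 1, 10)
print(gap.days)  # 8
```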

Here’s the base DataFrame:

import datetime as dt
import pandas as pd


dates = [
    dt.datetime(2022, 1, 1),
    dt.datetime(2022, 1, 3),
    dt.datetime(2022, 1, 10),
    dt.datetime(2022, 1, 11),
    dt.datetime(2022, 1, 18),
]
df = pd.DataFrame(
    {
        "id": range(1, 11),
        "player": [1 for i in range(5)] + [2 for i in range(5)],
        "match_date": dates + dates,
        "stat": (1500, 1600, 2100, 1800, 1700, 1600, 1800, 1600, 1900, 1500)
    }
)
Asked By: Jossy


Answers:

First, imagine you have only one player; then you can use expanding to find the index of the cumulative max (idxmax), look up the date at that index, and subtract:

def day_since_max(data):
    # index label of the running max stat in each expanding window;
    # expanding().apply returns floats, so cast back to int before .loc
    maxIdx = data['stat'].expanding().apply(pd.Series.idxmax).astype(int)
    # date on which that max occurred, shifted down so the current row is excluded
    date_at_max = data.loc[maxIdx, 'match_date'].shift()
    # .values drops the index so the subtraction aligns by position
    return data['match_date'] - date_at_max.values

Now we can use groupby().apply to apply that function for each player:

df['days_since_max'] = df.groupby('player').apply(day_since_max).reset_index(level=0, drop=True)

Output:

   id  player match_date  stat days_since_max
0   1       1 2022-01-01  1500            NaT
1   2       1 2022-01-03  1600         2 days
2   3       1 2022-01-10  2100         7 days
3   4       1 2022-01-11  1800         1 days
4   5       1 2022-01-18  1700         8 days
5   6       2 2022-01-01  1600            NaT
6   7       2 2022-01-03  1800         2 days
7   8       2 2022-01-10  1600         7 days
8   9       2 2022-01-11  1900         8 days
9  10       2 2022-01-18  1500         7 days
Answered By: Quang Hoang

You can use a double groupby. The important part is to compute new group labels that gather the rows following each running max. Once you have that, it is a simple cumsum of the day differences per group:

g = df.groupby(df['player'])
# date difference to the previous match, per player, in days
diff = g['match_date'].diff().dt.days
# new group label whenever the running max is reached; shift() excludes the current row
group = df['stat'].ge(g['stat'].cummax()).shift().cumsum()
# days since last max: cumulative sum of the day gaps within each (player, group) block
df['dsms'] = diff.groupby([df['player'], group]).cumsum()

Output:

   id  player match_date  stat  dsms
0   1       1 2022-01-01  1500   NaN
1   2       1 2022-01-03  1600   2.0
2   3       1 2022-01-10  2100   7.0
3   4       1 2022-01-11  1800   1.0
4   5       1 2022-01-18  1700   8.0
5   6       2 2022-01-01  1600   NaN
6   7       2 2022-01-03  1800   2.0
7   8       2 2022-01-10  1600   7.0
8   9       2 2022-01-11  1900   8.0
9  10       2 2022-01-18  1500   7.0
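To see why the double groupby works, the intermediate group labels can be inspected on player 1's rows alone (a small illustration built from the question's data, not part of the original answer):

```python
import datetime as dt
import pandas as pd

# Player 1's rows from the question
df1 = pd.DataFrame({
    "player": [1] * 5,
    "match_date": [dt.datetime(2022, 1, d) for d in (1, 3, 10, 11, 18)],
    "stat": [1500, 1600, 2100, 1800, 1700],
})
g = df1.groupby(df1["player"])
# True where the row sets (or ties) the running max
is_new_max = df1["stat"].ge(g["stat"].cummax())
# shift() excludes the current row; cumsum() gives one label per "since this max" block
group = is_new_max.shift().cumsum()
diff = g["match_date"].diff().dt.days
dsms = diff.groupby([df1["player"], group]).cumsum()
print(group.tolist())
print(dsms.tolist())
```

Rows 3 and 4 share a label because both follow the max set in row 2, so their day gaps (1 and 7) accumulate to 1 and 8.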
Answered By: mozway

import numpy as np


def function1(dd: pd.DataFrame):
    # dd is one expanding window of a single player's rows
    if len(dd) > 1:
        # date of the current (last) row in the window
        date1 = dd.match_date.astype('<M8[ns]').iloc[-1]
        # date of the max stat among the earlier rows (current row excluded)
        date2 = dd.iloc[:-1].sort_values('stat', ascending=False).match_date.astype('<M8[ns]').iloc[0]
        return (date1 - date2).days
    return np.nan


df.assign(days_since_max_stat=pd.Series(df.groupby('player').expanding()).apply(function1))

Output:

   id  player match_date  stat  days_since_max_stat
0   1       1 2022-01-01  1500                  NaN
1   2       1 2022-01-03  1600                  2.0
2   3       1 2022-01-10  2100                  7.0
3   4       1 2022-01-11  1800                  1.0
4   5       1 2022-01-18  1700                  8.0
5   6       2 2022-01-01  1600                  NaN
6   7       2 2022-01-03  1800                  2.0
7   8       2 2022-01-10  1600                  7.0
8   9       2 2022-01-11  1900                  8.0
9  10       2 2022-01-18  1500                  7.0
Answered By: G.G