Python Pandas GroupBy to calculate differences in months

Question:

Given the data frame below, I want to calculate the interval in months between consecutive rows for each name.

My code so far:

import pandas as pd
from io import StringIO
import numpy as np

csvfile = StringIO(
"""Name Year - Month    Score
Mike    2022-11 31
Mike    2022-11 136
Lilly   2022-11 23
Lilly   2022-10 44
Kate    2023-01 1393
Kate    2022-10 2360
Kate    2022-08 1648
Kate    2022-06 543
Kate    2022-04 1935
Peter   2022-04 302
David   2023-01 1808
David   2022-12 194
David   2022-09 4077
David   2022-06 666
David   2022-03 3362""")
    
df = pd.read_csv(csvfile, sep='\t', engine='python')  # columns in the data are tab-separated

df['Year - Month'] = pd.to_datetime(df['Year - Month'], format='%Y-%m')
    
df['Interval'] = (df.groupby(['Name'])['Year - Month'].transform(lambda x: x.diff())/ np.timedelta64(1, 'M'))

df['Interval'] = df['Interval'].replace(np.nan, 1).astype(int)

But the output looks wrong: the intervals are not calculated correctly.

Where has this gone wrong, and how can I correct it?

     Name Year - Month  Score  Interval
0    Mike   2022-11     31         1 <- should be 0
1    Mike   2022-11    136         0
2   Lilly   2022-11     23         1
3   Lilly   2022-10     44         1 <- should be 0
4    Kate   2023-01   1393         1 <- should be 3
5    Kate   2022-10   2360         3 <- should be 2
6    Kate   2022-08   1648         2
7    Kate   2022-06    543         2
8    Kate   2022-04   1935         2 <- should be 0
9   Peter   2022-04    302         1 <- should be 0
10  David   2023-01   1808         1 <- should be 1
11  David   2022-12    194         1 <- should be 3
12  David   2022-09   4077         2 <- should be 3
13  David   2022-06    666         3
14  David   2022-03   3362         3 <- should be 0
Asked By: Mark K

Answers:

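By default, diff() subtracts the previous row, but your rows are sorted newest-first, so each row needs to be compared with the next row in its group instead. You can do that by passing -1 to diff(). The last row of each group then has no next row and comes out as NaN, which is filled with 0.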
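
To make the direction of diff() concrete, here is a minimal sketch on a three-row Series (the dates are taken from Kate's rows; the variable names are only illustrative):

```python
import pandas as pd

# Dates sorted newest-first, as in the question's data.
s = pd.Series(pd.to_datetime(['2023-01', '2022-10', '2022-08'], format='%Y-%m'))

# diff() subtracts the previous row: the first row has no predecessor (NaT),
# and the differences are negative because the dates are descending.
backward = s.diff()

# diff(-1) subtracts the next row: the last row has no successor (NaT),
# and the differences are positive for descending dates.
forward = s.diff(-1)

print(backward)
print(forward)
```

With diff(-1), the gap ends up on the newer of the two rows, which is where the expected output in the question places it.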

...

df['Interval'] = df.groupby(['Name'])['Year - Month'].transform(lambda x: x.diff(-1)) / np.timedelta64(1, 'M')
df['Interval'] = df['Interval'].fillna(0).round().astype(int)

Result:

     Name Year - Month  Score  Interval
0    Mike   2022-11-01     31         0
1    Mike   2022-11-01    136         0
2   Lilly   2022-11-01     23         1
3   Lilly   2022-10-01     44         0
4    Kate   2023-01-01   1393         3
5    Kate   2022-10-01   2360         2
6    Kate   2022-08-01   1648         2
7    Kate   2022-06-01    543         2
8    Kate   2022-04-01   1935         0
9   Peter   2022-04-01    302         0
10  David   2023-01-01   1808         1
11  David   2022-12-01    194         3
12  David   2022-09-01   4077         3
13  David   2022-06-01    666         3
14  David   2022-03-01   3362         0
Answered By: Ashyam
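
A note on the division by np.timedelta64(1, 'M'): that unit is an average month length, so the result is fractional and has to be rounded, and newer pandas versions may reject the 'M' unit altogether. As a possible alternative (a sketch, not part of the answer above), the dates can be turned into a whole-month counter first, so the difference is an exact integer. The MonthIndex column is a hypothetical helper introduced only for this example, and the data is a small subset of the question's rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Kate', 'Kate', 'Kate', 'Peter', 'David', 'David'],
    'Year - Month': ['2023-01', '2022-10', '2022-08', '2022-04', '2023-01', '2022-12'],
})
dates = pd.to_datetime(df['Year - Month'], format='%Y-%m')

# Hypothetical helper column: months counted as year * 12 + month,
# so subtracting two values gives a whole number of months.
df['MonthIndex'] = dates.dt.year * 12 + dates.dt.month

# Subtract the next row within each name. The last row of each group
# has no next row (NaN), which is filled with 0 as in the answer.
df['Interval'] = (
    df.groupby('Name')['MonthIndex'].diff(-1).fillna(0).astype(int)
)

print(df[['Name', 'Year - Month', 'Interval']])
```

This avoids any dependence on day counts, at the cost of an extra column that can be dropped afterwards.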