Python Monthly Change Calculation (Pandas)
Question:
Here is the data:
id | date | population |
---|---|---|
1 | 2021-5 | 21 |
2 | 2021-5 | 22 |
3 | 2021-5 | 23 |
4 | 2021-5 | 24 |
1 | 2021-4 | 17 |
2 | 2021-4 | 24 |
3 | 2021-4 | 18 |
4 | 2021-4 | 29 |
1 | 2021-3 | 20 |
2 | 2021-3 | 29 |
3 | 2021-3 | 17 |
4 | 2021-3 | 22 |
I want to calculate the monthly change in population for each id, so the result will be:
id | date | delta |
---|---|---|
1 | 5 | .2353 |
1 | 4 | -.15 |
2 | 5 | -.0833 |
2 | 4 | -.1724 |
3 | 5 | .2778 |
3 | 4 | .0588 |
4 | 5 | -.1724 |
4 | 4 | .3182 |
delta := (this month - last month) / last month
How should I approach this in pandas? I'm thinking of groupby, but I don't know what to do next.
Remember that there might be more dates, but the result is always
Answers:
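Maybe you could try something like this, once the rows are sorted from oldest to newest within each id:
data['delta'] = data.groupby('id')['population'].diff()
data['delta'] /= data.groupby('id')['population'].shift()
With this approach the first row of each id is NaN, but for the rest this should work.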
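A minimal, self-contained sketch of the same grouped idea on the sample data from the question. Splitting the date string into integer `year` and `month` columns is my own assumption, made so that the sort stays chronological if later months such as `2021-10` appear; the names `ym`, `prev` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    "id": [1, 2, 3, 4] * 3,
    "date": ["2021-5"] * 4 + ["2021-4"] * 4 + ["2021-3"] * 4,
    "population": [21, 22, 23, 24, 17, 24, 18, 29, 20, 29, 17, 22],
})

# Split "2021-5" into integer year and month so the sort is chronological
# (as plain strings, "2021-10" would sort before "2021-5")
ym = data["date"].str.split("-", expand=True).astype(int)
data["year"], data["month"] = ym[0], ym[1]
data = data.sort_values(["id", "year", "month"])

# Previous month's population within each id (NaN for the earliest month)
prev = data.groupby("id")["population"].shift()
data["delta"] = (data["population"] - prev) / prev

# Drop the earliest month of each id, which has nothing to compare against
result = data.dropna(subset=["delta"])[["id", "date", "delta"]]
print(result)
```

Note that this compares each row with the previous available row of the same id, so a missing month would silently be skipped over.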
Use GroupBy.pct_change after sorting by id and date, then remove the rows where delta is missing:
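# Parse the year-month strings so they sort chronologically
df['date'] = pd.to_datetime(df['date'])
# Newest month first within each id
df = df.sort_values(['id','date'], ascending=[True, False])
# periods=-1 compares each row with the next row, i.e. the previous month
df['delta'] = df.groupby('id')['population'].pct_change(-1)
# The oldest month of each id has nothing to compare against
df = df.dropna(subset=['delta'])
print(df)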
id date population delta
0 1 2021-05-01 21 0.235294
4 1 2021-04-01 17 -0.150000
1 2 2021-05-01 22 -0.083333
5 2 2021-04-01 24 -0.172414
2 3 2021-05-01 23 0.277778
6 3 2021-04-01 18 0.058824
3 4 2021-05-01 24 -0.172414
7 4 2021-04-01 29 0.318182
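If you want the exact layout from the question (month number plus a rounded delta), here is one possible sketch. It assumes the dates are already converted to month-start timestamps, as in the printed output above, and it sorts oldest first so that the default `pct_change()` can be used; `out` is just an illustrative name.

```python
import pandas as pd

# Sample data from the question, with dates already converted to month starts
df = pd.DataFrame({
    "id": [1, 2, 3, 4] * 3,
    "date": pd.to_datetime(
        ["2021-05-01"] * 4 + ["2021-04-01"] * 4 + ["2021-03-01"] * 4
    ),
    "population": [21, 22, 23, 24, 17, 24, 18, 29, 20, 29, 17, 22],
})

# Oldest first within each id, so the default pct_change() compares
# each row with the previous (older) month
df = df.sort_values(["id", "date"])
df["delta"] = df.groupby("id")["population"].pct_change()

# Reshape into the requested layout: id, month number, delta (newest first)
out = (
    df.dropna(subset=["delta"])
      .sort_values(["id", "date"], ascending=[True, False])
      .assign(date=lambda d: d["date"].dt.month)
      [["id", "date", "delta"]]
      .round({"delta": 4})
      .reset_index(drop=True)
)
print(out)
```

Sorting on the full timestamp before replacing it with the month number keeps the order correct even if the data spans more than one year.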
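Try this (it assumes the rows of each id are ordered newest first, as in the question, and each result is labelled with the older row of its pair):
df.groupby('id')['population'].rolling(2).apply(lambda x: (x.iloc[0] - x.iloc[1]) / x.iloc[1]).dropna()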
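As a quick sanity check, the sketch below runs a rolling variant that divides by the older month (`x.iloc[1]`), in line with the formula in the question, and compares it with `pct_change(-1)`. It relies on the sample rows already being ordered newest first within each id; the names `rolled` and `pct` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, in its original row order
# (newest month first within each id)
df = pd.DataFrame({
    "id": [1, 2, 3, 4] * 3,
    "date": ["2021-5"] * 4 + ["2021-4"] * 4 + ["2021-3"] * 4,
    "population": [21, 22, 23, 24, 17, 24, 18, 29, 20, 29, 17, 22],
})

# Each window holds [newer, older]; divide by the older month's value.
# The result is labelled with the index of the older row of each pair.
rolled = (
    df.groupby("id")["population"]
      .rolling(2)
      .apply(lambda x: (x.iloc[0] - x.iloc[1]) / x.iloc[1])
      .dropna()
)

# Same numbers via pct_change(-1); a stable sort groups the rows by id
# without disturbing the newest-first order inside each id
pct = (
    df.sort_values("id", kind="stable")
      .groupby("id")["population"]
      .pct_change(-1)
      .dropna()
)

print(rolled)
print(np.allclose(rolled.to_numpy(), pct.to_numpy()))
```

The values agree, but the labels differ: `pct_change(-1)` attaches each delta to the newer month, which is usually what you want in the final table.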