calculate diff between two values and then % difference associated to unique references month by month in pandas dataframe
Question:
I have a pandas dataframe;
ID
MONTH
TOTAL
0
REF1
1
500
1
REF1
2
501
2
REF1
3
620
3
REF2
8
5001
4
REF2
9
5101
5
REF2
10
5701
6
REF2
11
7501
7
REF2
7
6501
8
REF2
6
1501
I need to do a comparison between of difference between the ID’s previous month’s TOTAL.
At the moment I can calculate the difference between the row above but the comparison doesn’t take into account the ID/MONTH. Would this need to be a where loop?
I have tried the below, but this returns NaN in all cells of the ‘Variance’ & ‘Variance%’ columns;
df_all.sort_values(['ID', 'MONTH'], inplace=True)
df_all['Variance'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].shift()
df_all['Variance%'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].pct_change()
The desired outcome is;
ID
MONTH
TOTAL
Variance
Variance %
0
REF1
1
500
0
0
1
REF1
2
501
1
0.2
Answers:
You can shift the Month by adding 1 (eventually use a more complex logic if you have real dates), then perform a self-merge
and subtract:
df['diff'] = df['TOTAL'].sub(
df[['ID', 'MONTH']]
.merge(df.assign(MONTH=df['MONTH'].add(1)),
how='left')['TOTAL']
)
Output:
ID MONTH TOTAL diff
0 REF1 1 500 NaN
1 REF1 2 501 1.0
2 REF1 3 620 119.0
3 REF2 8 5001 -1500.0 # 5001 - 6501
4 REF2 9 5101 100.0
5 REF2 10 5701 600.0
6 REF2 11 7501 1800.0
7 REF2 7 6501 5000.0 # 6501 - 1501
8 REF2 6 1501 NaN
I have a pandas dataframe;
ID | MONTH | TOTAL | |
---|---|---|---|
0 | REF1 | 1 | 500 |
1 | REF1 | 2 | 501 |
2 | REF1 | 3 | 620 |
3 | REF2 | 8 | 5001 |
4 | REF2 | 9 | 5101 |
5 | REF2 | 10 | 5701 |
6 | REF2 | 11 | 7501 |
7 | REF2 | 7 | 6501 |
8 | REF2 | 6 | 1501 |
I need to do a comparison between of difference between the ID’s previous month’s TOTAL.
At the moment I can calculate the difference between the row above but the comparison doesn’t take into account the ID/MONTH. Would this need to be a where loop?
I have tried the below, but this returns NaN in all cells of the ‘Variance’ & ‘Variance%’ columns;
df_all.sort_values(['ID', 'MONTH'], inplace=True)
df_all['Variance'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].shift()
df_all['Variance%'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].pct_change()
The desired outcome is;
ID | MONTH | TOTAL | Variance | Variance % | |
---|---|---|---|---|---|
0 | REF1 | 1 | 500 | 0 | 0 |
1 | REF1 | 2 | 501 | 1 | 0.2 |
You can shift the Month by adding 1 (eventually use a more complex logic if you have real dates), then perform a self-merge
and subtract:
df['diff'] = df['TOTAL'].sub(
df[['ID', 'MONTH']]
.merge(df.assign(MONTH=df['MONTH'].add(1)),
how='left')['TOTAL']
)
Output:
ID MONTH TOTAL diff
0 REF1 1 500 NaN
1 REF1 2 501 1.0
2 REF1 3 620 119.0
3 REF2 8 5001 -1500.0 # 5001 - 6501
4 REF2 9 5101 100.0
5 REF2 10 5701 600.0
6 REF2 11 7501 1800.0
7 REF2 7 6501 5000.0 # 6501 - 1501
8 REF2 6 1501 NaN