# calculate diff between two values and then % difference associated to unique references month by month in pandas dataframe

## Question:

I have a pandas dataframe;

         ID  MONTH  TOTAL
    0  REF1      1    500
    1  REF1      2    501
    2  REF1      3    620
    3  REF2      8   5001
    4  REF2      9   5101
    5  REF2     10   5701
    6  REF2     11   7501
    7  REF2      7   6501
    8  REF2      6   1501

For each row, I need the difference between its TOTAL and the same ID’s TOTAL for the previous month.

At the moment I can calculate the difference from the row above, but that comparison doesn’t take the ID/MONTH into account. Would this need to be a where loop?

I have tried the below, but this returns NaN in all cells of the ‘Variance’ & ‘Variance%’ columns;

``````
df_all.sort_values(['ID', 'MONTH'], inplace=True)
df_all['Variance'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].shift()
df_all['Variance%'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].pct_change()
``````

The desired outcome is;

         ID  MONTH  TOTAL  Variance  Variance%
    0  REF1      1    500         0          0
    1  REF1      2    501         1        0.2

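## Answer:

Grouping by both `ID` and `MONTH` puts every row in its own group, so `shift()` and `pct_change()` have no previous row to work with and return NaN. Instead, you can shift the month by adding 1 (use more complex logic if you have real dates), then perform a self-`merge` and subtract: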

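``````
df['diff'] = df['TOTAL'].sub(
    df[['ID', 'MONTH']]
    # relabel each row as the following month, then look it up by (ID, MONTH)
    .merge(df.assign(MONTH=df['MONTH'].add(1)),
           how='left')['TOTAL']
)
``````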

Output:

``````
     ID  MONTH  TOTAL    diff
0  REF1      1    500     NaN
1  REF1      2    501     1.0
2  REF1      3    620   119.0
3  REF2      8   5001 -1500.0   # 5001 - 6501
4  REF2      9   5101   100.0
5  REF2     10   5701   600.0
6  REF2     11   7501  1800.0
7  REF2      7   6501  5000.0   # 6501 - 1501
8  REF2      6   1501     NaN
``````
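
The question also asks for a percentage variance, which the snippet above does not compute. The following is a minimal sketch of one way to get both columns from the same self-merge, using the sample data from the question. The column names `Variance` and `Variance%` come from the question; the helper name `prev`, the explicit `on=` keys, and the `.to_numpy()` call are assumptions added here for illustration, and the sketch assumes each (`ID`, `MONTH`) pair appears only once.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["REF1"] * 3 + ["REF2"] * 6,
    "MONTH": [1, 2, 3, 8, 9, 10, 11, 7, 6],
    "TOTAL": [500, 501, 620, 5001, 5101, 5701, 7501, 6501, 1501],
})

# Relabel every row as "the following month", then left-merge on (ID, MONTH)
# so each original row picks up the TOTAL of its previous month (NaN if none).
prev = df[["ID", "MONTH"]].merge(
    df.assign(MONTH=df["MONTH"].add(1)),
    on=["ID", "MONTH"],
    how="left",
)["TOTAL"]

# The merge result has a fresh 0..n-1 index. Converting to a NumPy array
# assigns by position, so this still works if df was sorted beforehand.
prev = prev.to_numpy()

df["Variance"] = df["TOTAL"] - prev
# Percentage change relative to the previous month's TOTAL
df["Variance%"] = (df["Variance"] / prev * 100).round(2)

print(df)
```

Rows with no previous month keep NaN in both columns; appending `.fillna(0)` would give the zeros shown in the desired outcome, if that is what you want. If the months within each ID are always consecutive, sorting by `['ID', 'MONTH']` and using `groupby('ID')['TOTAL'].diff()` and `.pct_change()` (grouping by `ID` only) should give the same numbers, but that compares against the row above rather than the true previous month, so a gap in the months would go unnoticed.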