# Calculate the difference and % difference between each unique reference's values month by month in a pandas DataFrame

## Question:

I have a pandas dataframe;

| | ID | MONTH | TOTAL |
|---|---|---|---|
| 0 | REF1 | 1 | 500 |
| 1 | REF1 | 2 | 501 |
| 2 | REF1 | 3 | 620 |
| 3 | REF2 | 8 | 5001 |
| 4 | REF2 | 9 | 5101 |
| 5 | REF2 | 10 | 5701 |
| 6 | REF2 | 11 | 7501 |
| 7 | REF2 | 7 | 6501 |
| 8 | REF2 | 6 | 1501 |

I need to compare each ID’s TOTAL with the same ID’s TOTAL for the previous month.

At the moment I can calculate the difference from the row above, but that comparison doesn’t take the ID/MONTH into account. Would this need to be a where loop?

I have tried the below, but this returns NaN in all cells of the ‘Variance’ & ‘Variance%’ columns;

```
df_all.sort_values(['ID', 'MONTH'], inplace=True)
df_all['Variance'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].shift()
df_all['Variance%'] = df_all['TOTAL'] - df_all.groupby(['ID', 'MONTH'])['TOTAL'].pct_change()
```

The desired outcome is;

| | ID | MONTH | TOTAL | Variance | Variance % |
|---|---|---|---|---|---|
| 0 | REF1 | 1 | 500 | 0 | 0 |
| 1 | REF1 | 2 | 501 | 1 | 0.2 |

## Answers:

You can shift the month by adding 1 (you may need more complex logic if you have real dates), then perform a self-`merge` and subtract:

```
df['diff'] = df['TOTAL'].sub(
    df[['ID', 'MONTH']]
    .merge(df.assign(MONTH=df['MONTH'].add(1)),
           how='left')['TOTAL']
)
```

Output:

```
     ID  MONTH  TOTAL    diff
0  REF1      1    500     NaN
1  REF1      2    501     1.0
2  REF1      3    620   119.0
3  REF2      8   5001 -1500.0  # 5001 - 6501
4  REF2      9   5101   100.0
5  REF2     10   5701   600.0
6  REF2     11   7501  1800.0
7  REF2      7   6501  5000.0  # 6501 - 1501
8  REF2      6   1501     NaN
```
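The question also asks for a percentage column, which the snippet above does not produce. Below is a minimal sketch of one way to extend the same self-merge idea. It assumes the sample data from the question, integer months, a default `RangeIndex`, and unique `(ID, MONTH)` pairs. The names `prev`, `out` and `PREV_TOTAL` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["REF1"] * 3 + ["REF2"] * 6,
    "MONTH": [1, 2, 3, 8, 9, 10, 11, 7, 6],
    "TOTAL": [500, 501, 620, 5001, 5101, 5701, 7501, 6501, 1501],
})

# Shift MONTH forward by 1 so that each row's TOTAL lines up with the
# following month, and rename it so it can sit next to the current TOTAL.
# PREV_TOTAL is a hypothetical helper column name.
prev = df.assign(MONTH=df["MONTH"] + 1).rename(columns={"TOTAL": "PREV_TOTAL"})

# Left self-merge on (ID, MONTH); rows with no previous month get NaN.
# validate="one_to_one" raises if the (ID, MONTH) pairs are not unique.
out = df.merge(prev, on=["ID", "MONTH"], how="left", validate="one_to_one")

# Absolute and percentage difference relative to the previous month.
out["Variance"] = out["TOTAL"] - out["PREV_TOTAL"]
out["Variance%"] = out["Variance"] / out["PREV_TOTAL"] * 100

print(out)
```

Rows without a previous month keep `NaN` in both columns. If zeros are preferred, as in the desired outcome shown in the question, calling `.fillna(0)` on the two columns is one option. With real dates instead of integer months, the `+ 1` step would probably need to become something like adding `pd.DateOffset(months=1)` to a datetime column.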