Is there a way to calculate the running total across only a few columns (unique values only)?

Question:

I am trying to calculate a running total across a few specific columns of my DataFrame, and I only want unique values to count toward it.

Here is an example DataFrame:

Name  Product  Date        Location  Type   Sales  Ship Fee %  Total Fee
Tom   Bananas  01-01-2021  NY        Fruit  120    0.01        1.2
Tom   Apples   01-01-2021  NY        Fruit  120    0.01        1.2
Tom   Bananas  02-01-2021  TX        Fruit  420    0.01        4.2
Tom   Bananas  02-01-2021  TX        Fruit  120    0.01        1.2
Mat   Bananas  02-01-2021  NY        Fruit  30     0.01        0.3

I want to add a Running Total column that is grouped by Name and Date only and that sums just the unique values of the Total Fee column within each group. The result would look something like this:

Name  Product  Date        Location  Type   Sales  Ship Fee %  Total Fee  Running Total
Tom   Bananas  01-01-2021  NY        Fruit  120    0.01        1.2        1.2
Tom   Apples   01-01-2021  NY        Fruit  120    0.01        1.2        1.2
Tom   Bananas  02-01-2021  TX        Fruit  420    0.01        4.2        4.2
Tom   Bananas  02-01-2021  TX        Fruit  120    0.01        1.2        5.4
Mat   Bananas  02-01-2021  NY        Fruit  30     0.01        0.3        0.3

I am lost; I haven’t been able to find anything that gives me this result.

Asked By: pipocaDourada

Answers:

I think this is what you are looking for:

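Option 1: group by "Name" and "Date", then take the cumulative sum of only the unique "Total Fee" values.

# Cumulative sum per (Name, Date) over rows with a distinct Total Fee
df['Running Total'] = df.drop_duplicates(['Name', 'Date', 'Total Fee']).groupby(['Name', 'Date'])['Total Fee'].cumsum()
# Rows dropped as duplicates are NaN after index alignment; fill them with their own Total Fee
df['Running Total'] = df['Running Total'].fillna(df['Total Fee'])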
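One caveat with Option 1: the `fillna` step gives a repeated row its own Total Fee, which matches the running total only when the repeat is the first value seen in its group. If a fee repeats after other fees have already been added, and the intent is for that row to carry the total reached so far, a variant along the following lines may fit better. This is only a sketch under that assumption; the three-row frame is hypothetical and exists just to show the case.

```python
import pandas as pd

# Hypothetical mini-frame: the fee 1.2 repeats AFTER 4.2 was already counted.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom"],
    "Date": ["02-01-2021"] * 3,
    "Total Fee": [4.2, 1.2, 1.2],
})

# Flag every repeat of a (Name, Date, Total Fee) combination after its first row.
is_repeat = df.duplicated(["Name", "Date", "Total Fee"])

# Count repeats as 0 so they inherit the running total reached so far,
# then accumulate within each (Name, Date) group.
df["Running Total"] = (
    df["Total Fee"].mask(is_repeat, 0.0)
    .groupby([df["Name"], df["Date"]])
    .cumsum()
)

print(df)
```

Here the last row ends up with the group's running total instead of falling back to 1.2. On the sample data from the question, both approaches should give the same column, because every repeated fee there is the first fee of its group.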

Option 2: group by "Name", "Product", and "Date", then take the cumulative sum. This gives the accumulated sum for each product on each day for each person.

df['Running Total'] = df.groupby(['Name', 'Product', 'Date'])['Total Fee'].cumsum()

Testing and examples

Given this dataframe:

   Name  Product  Date        Location  Type       Sales  Ship Fee %  Total Fee
0  Tom   Bananas  01-01-2021  NY        Fruit      120    0.01        1.2
1  Tom   Apples   01-01-2021  NY        Fruit      120    0.01        1.2
2  Tom   Bananas  02-01-2021  TX        Fruit      420    0.01        4.2
3  Tom   Bananas  02-01-2021  TX        Fruit      120    0.01        1.2
4  Mat   Bananas  02-01-2021  NY        Fruit      30     0.01        0.3
5  Mat   Bananas  02-01-2021  NY        Fruit      50     0.01        0.3
6  Mat   Apples   03-01-2021  NY        Vegetable  80     0.02        1.6
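To try both options yourself, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the dates are kept as plain strings (an assumption, since the post does not say whether they were parsed), and the names `option1` and `option2` are only illustrative.

```python
import pandas as pd

# Test frame from the answer; values copied from the table above.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat", "Mat", "Mat"],
    "Product": ["Bananas", "Apples", "Bananas", "Bananas",
                "Bananas", "Bananas", "Apples"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021",
             "02-01-2021", "02-01-2021", "03-01-2021"],
    "Location": ["NY", "NY", "TX", "TX", "NY", "NY", "NY"],
    "Type": ["Fruit", "Fruit", "Fruit", "Fruit", "Fruit", "Fruit", "Vegetable"],
    "Sales": [120, 120, 420, 120, 30, 50, 80],
    "Ship Fee %": [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3, 0.3, 1.6],
})

# Option 1: cumulative sum over unique fees per (Name, Date).
unique_rows = df.drop_duplicates(["Name", "Date", "Total Fee"])
option1 = unique_rows.groupby(["Name", "Date"])["Total Fee"].cumsum()
# Rows dropped as duplicates are missing here; restore them as NaN,
# then fill each with its own Total Fee.
option1 = option1.reindex(df.index).fillna(df["Total Fee"])

# Option 2: plain cumulative sum per (Name, Product, Date), repeats included.
option2 = df.groupby(["Name", "Product", "Date"])["Total Fee"].cumsum()

# Show both running totals side by side.
print(df.assign(**{"Option 1": option1, "Option 2": option2}))
```

The two columns should differ only where a fee repeats within a group that Option 2 also treats as one group, such as the second Mat/Bananas row.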

Option 1 result:

   Name  Product  Date        Location  Type       Sales  Ship Fee %  Total Fee  Running Total
0  Tom   Bananas  01-01-2021  NY        Fruit      120    0.01        1.2        1.2
1  Tom   Apples   01-01-2021  NY        Fruit      120    0.01        1.2        1.2
2  Tom   Bananas  02-01-2021  TX        Fruit      420    0.01        4.2        4.2
3  Tom   Bananas  02-01-2021  TX        Fruit      120    0.01        1.2        5.4
4  Mat   Bananas  02-01-2021  NY        Fruit      30     0.01        0.3        0.3
5  Mat   Bananas  02-01-2021  NY        Fruit      50     0.01        0.3        0.3
6  Mat   Apples   03-01-2021  NY        Vegetable  80     0.02        1.6        1.6

Option 2 result:

   Name  Product  Date        Location  Type       Sales  Ship Fee %  Total Fee  Running Total
0  Tom   Bananas  01-01-2021  NY        Fruit      120    0.01        1.2        1.2
1  Tom   Apples   01-01-2021  NY        Fruit      120    0.01        1.2        1.2
2  Tom   Bananas  02-01-2021  TX        Fruit      420    0.01        4.2        4.2
3  Tom   Bananas  02-01-2021  TX        Fruit      120    0.01        1.2        5.4
4  Mat   Bananas  02-01-2021  NY        Fruit      30     0.01        0.3        0.3
5  Mat   Bananas  02-01-2021  NY        Fruit      50     0.01        0.3        0.6
6  Mat   Apples   03-01-2021  NY        Vegetable  80     0.02        1.6        1.6
Answered By: Pedro Rocha