Is there a way to calculate the running total across only a few columns (unique values only)?
Question:
I am trying to calculate the running total across a few specific columns of my DataFrame, and I only want to sum unique values.
Here is an example DataFrame:
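| Name | Product | Date | Location | Type | Sales | Ship Fee % | Total Fee |
|---|---|---|---|---|---|---|---|
| Tom | Bananas | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 |
| Tom | Apples | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 |
| Tom | Bananas | 02-01-2021 | TX | Fruit | 420 | 0.01 | 4.2 |
| Tom | Bananas | 02-01-2021 | TX | Fruit | 120 | 0.01 | 1.2 |
| Mat | Bananas | 02-01-2021 | NY | Fruit | 30 | 0.01 | 0.3 |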
I want a Running Total column that uses only Name and Date as the groupby columns and shows the running sum of the unique values in the Total Fee column. The result would look something like this:
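| Name | Product | Date | Location | Type | Sales | Ship Fee % | Total Fee | Running Total |
|---|---|---|---|---|---|---|---|---|
| Tom | Bananas | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| Tom | Apples | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| Tom | Bananas | 02-01-2021 | TX | Fruit | 420 | 0.01 | 4.2 | 4.2 |
| Tom | Bananas | 02-01-2021 | TX | Fruit | 120 | 0.01 | 1.2 | 5.4 |
| Mat | Bananas | 02-01-2021 | NY | Fruit | 30 | 0.01 | 0.3 | 0.3 |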
I'm stuck: I haven't been able to find anything that produces this result.
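To pin down the requirement, here is a minimal plain-Python sketch (no pandas) of the rule as stated: within each (Name, Date) group, a row's Total Fee is added to the running total only the first time that value appears in the group, and every row shows the group's total so far. The helper name running_unique_totals is hypothetical, only the columns the rule needs are kept, and fees are compared by exact float equality, which is an assumption that holds for the sample values.

```python
# Hypothetical reference implementation of "running total of unique Total Fee values
# per (Name, Date)". Fees count as duplicates only if they are exactly equal floats.
rows = [
    # (Name, Product, Date, Total Fee) taken from the example table
    ("Tom", "Bananas", "01-01-2021", 1.2),
    ("Tom", "Apples", "01-01-2021", 1.2),
    ("Tom", "Bananas", "02-01-2021", 4.2),
    ("Tom", "Bananas", "02-01-2021", 1.2),
    ("Mat", "Bananas", "02-01-2021", 0.3),
]


def running_unique_totals(rows):
    seen = {}    # (name, date) -> set of fee values already counted
    totals = {}  # (name, date) -> running sum of the unique fees counted so far
    out = []
    for name, _product, date, fee in rows:
        key = (name, date)
        seen.setdefault(key, set())
        if fee not in seen[key]:
            # First time this fee appears in the group: add it to the running total
            seen[key].add(fee)
            totals[key] = totals.get(key, 0.0) + fee
        # A repeated fee is skipped, so the row shows the group's total so far
        out.append(totals[key])
    return out


print(running_unique_totals(rows))  # → [1.2, 1.2, 4.2, 5.4, 0.3]
```

The printed list matches the Running Total column in the expected output above.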
Answers:
I think this is what you are looking for:
Option 1: groupby "Name" and "Date", then cumsum over only the unique Total Fee values:

```python
df['Running Total'] = df.drop_duplicates(['Name', 'Date', 'Total Fee']).groupby(['Name', 'Date'])['Total Fee'].cumsum()
# Rows dropped as duplicates are NaN after index alignment; give them their own Total Fee
df['Running Total'] = df['Running Total'].fillna(df['Total Fee'])
```
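A step-by-step sketch of Option 1 on the five rows from the question, with the intermediate objects kept visible. Only the columns the calculation needs are included, and the variable names unique_rows and partial are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat"],
    "Product": ["Bananas", "Apples", "Bananas", "Bananas", "Bananas"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021", "02-01-2021"],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3],
})

# Step 1: keep the first row of each (Name, Date, Total Fee) combination.
# Row 1 (Tom, 01-01-2021, 1.2) repeats row 0, so it is dropped.
unique_rows = df.drop_duplicates(["Name", "Date", "Total Fee"])

# Step 2: cumulative sum of the remaining fees within each (Name, Date) group.
partial = unique_rows.groupby(["Name", "Date"])["Total Fee"].cumsum()

# Step 3: assignment aligns on the index, so the dropped row becomes NaN ...
df["Running Total"] = partial
# ... and is then filled with that row's own Total Fee.
df["Running Total"] = df["Running Total"].fillna(df["Total Fee"])

print(df["Running Total"].tolist())  # → [1.2, 1.2, 4.2, 5.4, 0.3]
```

drop_duplicates keeps the first occurrence by default, so the surviving row in each repeated set is the earliest one and the cumulative sum follows the original row order.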
Option 2: groupby "Name", "Product" and "Date", then cumsum. This gives the accumulated sum for each product, for each person, on each day (repeated fees are added again):

```python
df['Running Total'] = df.groupby(['Name', 'Product', 'Date'])['Total Fee'].cumsum()
```
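A minimal sketch of Option 2 on the question's five rows. as_index=False is left out here: pandas documents that it has no effect on transformations such as cumsum, so the result is a Series aligned to df's index either way.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat"],
    "Product": ["Bananas", "Apples", "Bananas", "Bananas", "Bananas"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021", "02-01-2021"],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3],
})

# One group per (Name, Product, Date); every row's fee is added, duplicates included.
df["Running Total"] = df.groupby(["Name", "Product", "Date"])["Total Fee"].cumsum()

print(df["Running Total"].tolist())  # → [1.2, 1.2, 4.2, 5.4, 0.3]
```

On this data Option 2 happens to match the expected output: Tom's Apples and Bananas on 01-01-2021 land in separate groups but carry the same fee, and no (Name, Product, Date) group repeats a fee.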
Testing and examples
Given this dataframe:
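| | Name | Product | Date | Location | Type | Sales | Ship Fee % | Total Fee |
|---|---|---|---|---|---|---|---|---|
| 0 | Tom | Bananas | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 |
| 1 | Tom | Apples | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 |
| 2 | Tom | Bananas | 02-01-2021 | TX | Fruit | 420 | 0.01 | 4.2 |
| 3 | Tom | Bananas | 02-01-2021 | TX | Fruit | 120 | 0.01 | 1.2 |
| 4 | Mat | Bananas | 02-01-2021 | NY | Fruit | 30 | 0.01 | 0.3 |
| 5 | Mat | Bananas | 02-01-2021 | NY | Fruit | 50 | 0.01 | 0.3 |
| 6 | Mat | Apples | 03-01-2021 | NY | Vegetable | 80 | 0.02 | 1.6 |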
Option 1 result:
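| | Name | Product | Date | Location | Type | Sales | Ship Fee % | Total Fee | Running Total |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Tom | Bananas | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| 1 | Tom | Apples | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| 2 | Tom | Bananas | 02-01-2021 | TX | Fruit | 420 | 0.01 | 4.2 | 4.2 |
| 3 | Tom | Bananas | 02-01-2021 | TX | Fruit | 120 | 0.01 | 1.2 | 5.4 |
| 4 | Mat | Bananas | 02-01-2021 | NY | Fruit | 30 | 0.01 | 0.3 | 0.3 |
| 5 | Mat | Bananas | 02-01-2021 | NY | Fruit | 50 | 0.01 | 0.3 | 0.3 |
| 6 | Mat | Apples | 03-01-2021 | NY | Vegetable | 80 | 0.02 | 1.6 | 1.6 |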
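One caveat with Option 1: the fillna step gives a dropped duplicate its own fee rather than the group's total so far. In the test data every duplicate directly follows its first occurrence, so both readings agree. The sketch below uses hypothetical rows where they differ and shows a variant (forward-filling within each (Name, Date) group) that carries the running total forward instead. Which behaviour is wanted depends on how "running total of unique values" is meant; the column name _rt is only a temporary helper.

```python
import pandas as pd

# Hypothetical rows: the repeated 1.2 fee comes after a different fee in the same group.
df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom"],
    "Date": ["01-01-2021"] * 3,
    "Total Fee": [1.2, 4.2, 1.2],
})

keys = ["Name", "Date"]
partial = df.drop_duplicates(keys + ["Total Fee"]).groupby(keys)["Total Fee"].cumsum()
aligned = partial.reindex(df.index)  # the dropped duplicate (row 2) becomes NaN

# Option 1 as written: the dropped row falls back to its own fee.
with_fillna = aligned.fillna(df["Total Fee"])

# Variant: the dropped row repeats the group's running total so far.
with_ffill = df.assign(_rt=aligned).groupby(keys)["_rt"].ffill()

print(with_fillna.tolist())  # → [1.2, 5.4, 1.2]
print(with_ffill.tolist())   # → [1.2, 5.4, 5.4]
```

The forward fill never starts on a NaN, because the first row of every group is always the first occurrence of its fee and survives drop_duplicates.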
Option 2 result:
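| | Name | Product | Date | Location | Type | Sales | Ship Fee % | Total Fee | Running Total |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Tom | Bananas | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| 1 | Tom | Apples | 01-01-2021 | NY | Fruit | 120 | 0.01 | 1.2 | 1.2 |
| 2 | Tom | Bananas | 02-01-2021 | TX | Fruit | 420 | 0.01 | 4.2 | 4.2 |
| 3 | Tom | Bananas | 02-01-2021 | TX | Fruit | 120 | 0.01 | 1.2 | 5.4 |
| 4 | Mat | Bananas | 02-01-2021 | NY | Fruit | 30 | 0.01 | 0.3 | 0.3 |
| 5 | Mat | Bananas | 02-01-2021 | NY | Fruit | 50 | 0.01 | 0.3 | 0.6 |
| 6 | Mat | Apples | 03-01-2021 | NY | Vegetable | 80 | 0.02 | 1.6 | 1.6 |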
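Putting the two options side by side on the seven test rows shows where they diverge: row 5 repeats Mat's 0.3 fee on 02-01-2021, which Option 1 skips and Option 2 adds. A minimal sketch, including only the columns the calculation uses; the names opt1, opt2 and diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Tom", "Tom", "Tom", "Mat", "Mat", "Mat"],
    "Product": ["Bananas", "Apples", "Bananas", "Bananas", "Bananas", "Bananas", "Apples"],
    "Date": ["01-01-2021", "01-01-2021", "02-01-2021", "02-01-2021",
             "02-01-2021", "02-01-2021", "03-01-2021"],
    "Total Fee": [1.2, 1.2, 4.2, 1.2, 0.3, 0.3, 1.6],
})

# Option 1: unique fees per (Name, Date); dropped rows fall back to their own fee.
opt1 = (
    df.drop_duplicates(["Name", "Date", "Total Fee"])
    .groupby(["Name", "Date"])["Total Fee"]
    .cumsum()
    .reindex(df.index)
    .fillna(df["Total Fee"])
)

# Option 2: every fee per (Name, Product, Date).
opt2 = df.groupby(["Name", "Product", "Date"])["Total Fee"].cumsum()

# Keep only the rows where the two options disagree (with a small float tolerance).
diff = df.assign(option_1=opt1, option_2=opt2)[(opt1 - opt2).abs() > 1e-9]
print(diff)
```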