Python – How to do accumulative sums depending on the value of a column

Question:

I have a dataframe and I want to add a column that should be the accumulative sum of one of the columns but only if the value of another column is a specific one.

For example, my dataframe is as follows:

| Type | Quantity |
| A | 30 |
| B | 10 |
| B | 5 |
| A | 3 |

I would like to add a column SumA that would only do the accumulative sum of the quantities when Type == A.

I have tried this:

data['SumA'] = data['Quantity'].cumsum() if data[(data['Type'] == 'A')]

I keep getting errors and I’m not sure how I can solve them, could someone please give me a hand?

I would like to get something like this:

| Type | Quantity | Sum A | Sum B |
| A | 30 | 30 | 0 |
| B | 10 | 30 | 10 |
| B | 5 | 30 | 15 |
| A | 3 | 33 | 15 |

Asked By: Sara.SP92


Answers:

The error you are getting is a syntax error: a Python conditional expression of the form x if condition also needs an else branch. More importantly, pandas does not use if to select rows.

Instead, select the rows you want with a boolean mask:

data[(data['Type'] == 'A')]['Quantity']

This returns the Quantity column for the rows whose Type equals 'A'.
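As a minimal sketch of what that selection returns, the snippet below rebuilds the sample data from the question and applies the same mask. The variable names mask and quantity_a are only illustrative, and .loc is used here as an equivalent single-step way to write the same selection.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({'Type': ['A', 'B', 'B', 'A'],
                     'Quantity': [30, 10, 5, 3]})

# Boolean mask: True on the rows where Type is 'A'
mask = data['Type'] == 'A'

# Select the Quantity column for the masked rows in one step
quantity_a = data.loc[mask, 'Quantity']

print(quantity_a.tolist())        # [30, 3]
print(quantity_a.index.tolist())  # [0, 3]
```

The selected rows keep their original index labels (0 and 3), which matters when the result is assigned back into the dataframe.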

So in your case the code becomes:

data['sumA'] = data[(data['Type'] == 'A')]['Quantity'].cumsum()

Rows whose Type is not 'A' end up as NaN, because the assignment is aligned on the index. To get the expected output, do this once for each of the types A and B and then fill the missing NaN values:

data['sumA'] = data[(data['Type'] == 'A')]['Quantity'].cumsum()
data['sumB'] = data[(data['Type'] == 'B')]['Quantity'].cumsum()

# Fill NaN values with the previously available value
# (fillna(method='ffill') is deprecated in recent pandas; ffill() is the equivalent)
data.ffill(inplace=True)

# The first values don't have any previous value, so fill with zero
data.fillna(value=0, inplace=True)

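This produces the expected output. Note that the new columns are floats (30.0 rather than 30) because they contained NaN before being filled.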
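If the NaN filling steps feel roundabout, a different technique (not the one used in this answer) is to zero out the non-matching quantities with Series.where and then take the cumulative sum. This is only a sketch on the sample data from the question:

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({'Type': ['A', 'B', 'B', 'A'],
                     'Quantity': [30, 10, 5, 3]})

# Keep Quantity where the Type matches, use 0 elsewhere, then accumulate
data['sumA'] = data['Quantity'].where(data['Type'] == 'A', 0).cumsum()
data['sumB'] = data['Quantity'].where(data['Type'] == 'B', 0).cumsum()

print(data['sumA'].tolist())  # [30, 30, 30, 33]
print(data['sumB'].tolist())  # [0, 10, 15, 15]
```

Because no NaN is ever introduced, no forward fill is needed, and the columns should stay integers.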

Answered By: Thanos Mitsiou

I thought about a somewhat more general solution, which can surely be optimized (I will keep working on it):

We iterate over the unique values of the Type column and create a sum{value} column for each one. Each column holds the cumsum for its respective Type value, while non-matching rows are NaN.

Then I fill the NaN values with the nearest preceding valid value. The last line handles the special case where the first rows of a column are NaN and need to be 0.

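for column in data['Type'].unique():
    column_name = f'sum{column}'
    data[column_name] = data[data['Type'] == column]['Quantity'].cumsum()
    # Assign the results back: fillna(..., inplace=True) on data[column_name]
    # may act on a temporary copy and leave data unchanged
    data[column_name] = data[column_name].ffill()
    data[column_name] = data[column_name].fillna(0)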

output:

    Type    Quantity    sumA   sumB
0   A       30          30.0   0.0
1   B       10          30.0   10.0
2   B       5           30.0   15.0
3   A       3           33.0   15.0
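
For reuse, the same loop could be wrapped in a small helper that returns a new dataframe instead of modifying the original. This is only a sketch: the function name add_cumsum_per_type and its parameters are made up for illustration, and it assumes the dataframe has a unique index.

```python
import pandas as pd

def add_cumsum_per_type(df, type_col='Type', value_col='Quantity', prefix='sum'):
    """Return a copy of df with one running-total column per distinct type."""
    result = df.copy()
    for value in df[type_col].unique():
        mask = df[type_col] == value
        # Cumulative sum over the matching rows only
        running = df.loc[mask, value_col].cumsum()
        # Reindex to all rows (non-matching rows become NaN), carry the last
        # total forward, then use 0 for rows before the first match
        result[f'{prefix}{value}'] = running.reindex(df.index).ffill().fillna(0)
    return result

# Sample data copied from the question
data = pd.DataFrame({'Type': ['A', 'B', 'B', 'A'],
                     'Quantity': [30, 10, 5, 3]})
out = add_cumsum_per_type(data)
print(out)
```

The original data is left untouched, and a type that is added later gets its own column without further code changes.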
Answered By: ImSo3K