Add a column in pandas based on sum of the subgroup values in another column

Question:

Here is a simplified version of my dataframe (the number of persons in my dataframe is way more than 3):

import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})
  Person  Sales
0   John     10
1  David     15
2   Mary     20
3   John     11
4  David     12
5   Mary     18

I would like to add a column "Total" to this data frame that holds the total sales of each person, repeated on every row belonging to that person.
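The desired df:

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38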

What is the easiest way to achieve this?

I have tried

df.groupby('Person').sum()

but the output has only one row per person, so its shape does not match the shape of df.
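To see the mismatch concretely, here is a small check on the sample data (a sketch; `totals` is just an illustrative name). `groupby(...).sum()` collapses the six rows into one row per distinct person, and the person names move into the index:

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# groupby(...).sum() collapses the rows: one row per distinct person,
# with the group keys (sorted by default) as the index
totals = df.groupby('Person').sum()
print(df.shape)                # (6, 2)
print(totals.shape)            # (3, 1)
print(totals.index.tolist())   # ['David', 'John', 'Mary']
```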


Asked By: Pepey


Answers:

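The easiest way to achieve this is by using the pandas groupby and sum functions, then mapping the per-person sums back onto each row:

df['Total'] = df['Person'].map(df.groupby('Person')['Sales'].sum())

This adds a column to the dataframe with the total sales per person. Assigning df.groupby('Person')['Sales'].sum() directly does not work, because that result is indexed by person name and does not align with the row index of df.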
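A short sketch on the sample data showing what happens with a plain assignment of the grouped sum versus mapping it through the Person column (`totals` is an illustrative name). pandas aligns a Series on its index when assigning it as a column, so person-name labels never match the integer row labels:

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

totals = df.groupby('Person')['Sales'].sum()  # Series indexed by person name

# Direct assignment aligns totals' index (names) with df's index (0..5):
# no labels match, so every row gets NaN.
df['Total'] = totals
print(df['Total'].isna().all())  # True

# Mapping each row's Person through the totals Series looks up its sum by name.
df['Total'] = df['Person'].map(totals)
print(df['Total'].tolist())  # [21, 27, 38, 21, 27, 38]
```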

Answered By: cipher

What you want is the transform method, which applies a function to each group and returns a result aligned with the original rows:

df['Total'] = df.groupby('Person')['Sales'].transform('sum')

It gives the expected result:

  Person  Sales  Total
0   John     10     21
1  David     15     27
2   Mary     20     38
3   John     11     21
4  David     12     27
5   Mary     18     38
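A minimal sketch checking that transform keeps the original index, which is why the result can be assigned straight back as a column. It passes the string 'sum' rather than the builtin sum; recent pandas versions may emit a FutureWarning when a builtin callable is passed, while the string always dispatches to the grouped sum:

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

# transform('sum') computes each group's sum and broadcasts it back,
# so the result has the same index (and length) as df.
total = df.groupby('Person')['Sales'].transform('sum')
print(total.index.equals(df.index))  # True

df['Total'] = total
print(df['Total'].tolist())  # [21, 27, 38, 21, 27, 38]
```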
Answered By: Serge Ballesta

Your 'Person' column contains repeated values, so the output of groupby().sum(), which has one row per person, cannot be assigned directly as a new column of the original dataframe.

I would suggest making a new dataframe of the per-person sales sums. The code below does that:

newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

This creates a new dataframe with 'Person' and 'Sales' as columns, one row per person.
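If the totals are still wanted on the original rows, the summary dataframe can be merged back. A sketch, assuming the summed column is renamed to 'Total' (an illustrative choice) so it does not clash with the original 'Sales' column; a left merge keeps the row order of df:

```python
import pandas as pd

df = pd.DataFrame({'Person': ['John', 'David', 'Mary', 'John', 'David', 'Mary'],
                   'Sales': [10, 15, 20, 11, 12, 18]})

newDf = pd.DataFrame(df.groupby('Person')['Sales'].sum()).reset_index()

# Rename so the summed column does not clash with the original 'Sales'
newDf = newDf.rename(columns={'Sales': 'Total'})

# A left merge on 'Person' attaches each person's total to every one of their rows
merged = df.merge(newDf, on='Person', how='left')
print(merged['Total'].tolist())  # [21, 27, 38, 21, 27, 38]
```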

Answered By: Anirudh B M