Pandas Sum of Duplicate Attributes

Question:

I’m using Pandas to manipulate a CSV file with several rows and columns that looks like the following:

Fullname       Amount    Date         Zip      State          .....
John Joe       1         1/10/1900    55555    Confusion
Betty White    5         .            .        Alaska
Bruce Wayne    10        .            .        Frustration
John Joe       20        .            .        .
Betty White    25        .            .        .

I’d like to create a new column entitled Total containing the sum of Amount for each person (identified by Fullname and Zip). I’m having difficulty finding the correct solution.

Let’s just call my csv import csvfile. Here is what I have.

import pandas
df = pandas.read_csv('csvfile.csv', header=0)
df = df.sort_values('Fullname')

I think I have to use iterrows to do what I want. The problem with dropping duplicates is that I would lose the amounts, which may differ between the duplicate rows.

Asked By: user2723240


Answers:

I think you want this:

df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')

groupby groups the rows by the Fullname and Zip columns, as you’ve stated. Calling transform on the Amount column with the string 'sum' computes the total for each group and returns a Series whose index is aligned with the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.:

new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
Answered By: EdChum
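
As a rough illustration of the transform approach above, here is a minimal sketch on a small made-up frame. The Zip and State values for the rows shown as "." in the question are assumptions filled in only so the example runs, and the name totals is hypothetical.

```python
import pandas as pd

# Made-up sample data; Zip/State values for the repeated rows are assumed.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "99501", "10001", "55555", "99501"],
    "State": ["Confusion", "Alaska", "Frustration", "Confusion", "Alaska"],
})

# transform('sum') returns one value per original row, so it can be
# assigned straight back as a new column.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep the first row of each (Fullname, Zip) pair; Total is already the
# group sum, while Amount is simply whatever the first row held.
totals = df.drop_duplicates(subset=["Fullname", "Zip"])

print(df)
print(totals)
```

Note that after drop_duplicates the Amount column still holds the first row's individual amount, so Total is the column to read.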

Consider using one of the following:

df = df.groupby(['Fullname', 'Zip'], as_index=False)['Amount'].sum()
df = df.groupby(['Fullname', 'Zip'], as_index=False)['Amount'].agg('sum')

Both store the per-group total in the Amount column, with one row per (Fullname, Zip) pair. cumsum() is related but not equivalent: it returns a running total for every original row rather than one total per group:

running = df.groupby(['Fullname', 'Zip'])['Amount'].cumsum()

Since the meaning of the Amount column changes after summing, you could rename it with df.rename():

df = df.rename(columns={'Amount':'Total'})
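
To make the behaviour of these calls concrete, here is a minimal sketch on made-up data (the Zip values are assumptions, and the names totals and running are hypothetical). It suggests that sum() collapses each group to one row, whereas cumsum() keeps every row and accumulates within the group.

```python
import pandas as pd

# Made-up sample data; Zip values are assumed for illustration.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "99501", "10001", "55555", "99501"],
})

# One row per (Fullname, Zip); groups come back sorted by key by default.
totals = (
    df.groupby(["Fullname", "Zip"], as_index=False)["Amount"]
      .sum()
      .rename(columns={"Amount": "Total"})
)

# Running total within each group, aligned to the original rows.
running = df.groupby(["Fullname", "Zip"])["Amount"].cumsum()

print(totals)
print(running)
```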

If you want to keep one value from the other columns, you could use agg(), which accepts a dict mapping column labels to functions and so specifies which operation should be performed on each column.

df.groupby(['Fullname', 'Zip'], as_index=False).agg({'Amount': 'sum', 'State': 'first'})
Answered By: Ynjxsjmh
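
The dict form of agg() above can be tried on a small made-up frame. This is only a sketch: the Zip and State values for the repeated rows are assumed, and summary is a hypothetical name.

```python
import pandas as pd

# Made-up sample data; Zip/State values for the repeated rows are assumed.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "99501", "10001", "55555", "99501"],
    "State": ["Confusion", "Alaska", "Frustration", "Confusion", "Alaska"],
})

# Sum Amount per group and keep the first State seen in each group.
summary = (
    df.groupby(["Fullname", "Zip"], as_index=False)
      .agg({"Amount": "sum", "State": "first"})
      .rename(columns={"Amount": "Total"})
)

print(summary)
```

Using 'first' for State is reasonable only if the state is the same for every row of a given person; otherwise a different reducer may be more appropriate.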