Keep other columns when using sum() with groupby

Question:

I have a pandas dataframe below:

    df

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       1         1       1.19        2.39     
1   Jack       1         2       1.19        2.39
2   Luke       0         1       1.08        1.08  
3   Mark       0         1       3.45        3.45
4   Luke       1         0       1.08        1.08

Rows with the same name always have the same values for otherstuff1 and otherstuff2.

I’m trying to group by column name and sum both columns value1 and value2. (Not sum value1 with value2!!! But sum them individually in each column.)

Expecting to get result below:

    newdf

    name    value1    value2  otherstuff1 otherstuff2 
0   Jack       2         3       1.19        2.39     
1   Luke       1         1       1.08        1.08  
2   Mark       0         1       3.45        3.45

I’ve tried

newdf = df.groupby(['name'], as_index=False).sum()

which groups by name and sums up both value1 and value2 columns correctly, but ends up dropping columns otherstuff1 and otherstuff2.

Asked By: SwagZ


Answers:

Something like this? (This assumes otherstuff1 and otherstuff2 are the same for every row with the same name.)

df.groupby(['name','otherstuff1','otherstuff2'],as_index=False).sum()
Out[121]: 
   name  otherstuff1  otherstuff2  value1  value2
0  Jack         1.19         2.39       2       3
1  Luke         1.08         1.08       1       1
2  Mark         3.45         3.45       0       1
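
For anyone who wants to try this locally, here is a minimal sketch that rebuilds the question's dataframe and applies the same idea. The `keys` variable and the final reordering step are my own additions, not part of the answer above. Note that groupby drops rows whose key is NaN by default, so this only behaves as expected if the extra key columns have no missing values.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Group by name plus the columns that are constant within each name,
# so they survive as group keys while value1/value2 are summed.
keys = ["name", "otherstuff1", "otherstuff2"]
newdf = df.groupby(keys, as_index=False).sum()

# Optional: put the columns back in the original order.
newdf = newdf[df.columns]
print(newdf)
```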
Answered By: BENY

You should specify what pandas must do with the other columns. In your case, I think you want to keep one row, regardless of its position within the group.

This can be done with agg on the groupby object. agg accepts a dictionary that specifies which operation to perform on each column.

df.groupby(['name'], as_index=False).agg({'value1': 'sum', 'value2': 'sum', 'otherstuff1': 'first', 'otherstuff2': 'first'})
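
As a rough, self-contained illustration of the agg call above on the question's data: the sketch below also adds a sanity check (my own addition, using `nunique`) because 'first' silently picks one value per group even if the values differ.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

# Sum the value columns, keep the first row's value for the others.
newdf = df.groupby(["name"], as_index=False).agg({
    "value1": "sum",
    "value2": "sum",
    "otherstuff1": "first",
    "otherstuff2": "first",
})
print(newdf)

# Optional check: are the "other" columns really constant within each name?
per_group_unique = df.groupby("name")[["otherstuff1", "otherstuff2"]].nunique()
is_constant = bool(per_group_unique.le(1).all().all())
print(is_constant)
```

If the check prints False, 'first' would be hiding differing values, and another aggregation may be more appropriate.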
Answered By: Guybrush

The key in the answer above is actually as_index=False; without it, all the columns in the grouping list become the index of the result.

p_summ = p.groupby(attributes_list, as_index=False).agg({'AMT': 'sum'})
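
To make the difference concrete, here is a small sketch using the question's dataframe rather than the `p` and `attributes_list` from the line above (which are not defined in this thread). The names `agg_spec`, `indexed`, and `flat` are just illustrative.

```python
import pandas as pd

# Only the columns needed for this comparison.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
})

agg_spec = {"value1": "sum", "value2": "sum"}

# Default (as_index=True): the group key becomes the index.
indexed = df.groupby(["name"]).agg(agg_spec)
print(indexed.index.name)     # the key lives in the index
print(list(indexed.columns))  # only the aggregated columns

# as_index=False: the group key stays as a regular column.
flat = df.groupby(["name"], as_index=False).agg(agg_spec)
print(list(flat.columns))
```

Calling `.reset_index()` on the default result should give the same flat layout.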
Answered By: Graven74

These solutions are great, but when you have too many columns you do not want to type out all of the column names. So here is what I came up with:

column_map = {col: "first" for col in df.columns}
column_map["col_name1"] = "sum"
column_map["col_name2"] = lambda x: set(x) # it can also be a function or lambda

Now you can simply do:

df.groupby(["col_to_group"], as_index=False).aggregate(column_map)
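
Applied to the dataframe from the question, the same pattern might look like the sketch below. The variables `group_col` and `sum_cols` are hypothetical helpers, and leaving the grouping column out of the map is my own choice so that the key is not aggregated a second time.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "name": ["Jack", "Jack", "Luke", "Mark", "Luke"],
    "value1": [1, 1, 0, 0, 1],
    "value2": [1, 2, 1, 1, 0],
    "otherstuff1": [1.19, 1.19, 1.08, 3.45, 1.08],
    "otherstuff2": [2.39, 2.39, 1.08, 3.45, 1.08],
})

group_col = "name"
sum_cols = ["value1", "value2"]

# Default every non-key column to "first", then override the ones to sum.
column_map = {col: "first" for col in df.columns if col != group_col}
column_map.update({col: "sum" for col in sum_cols})

newdf = df.groupby(group_col, as_index=False).agg(column_map)
print(newdf)
```

Because the dictionary is built from df.columns, the result keeps the original column order.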
Answered By: Berkay Berabi