Group By Sum Multiple Columns in Pandas (Ignoring duplicates)

Question:

I have the following code, where my dataframe contains three columns whose values I want to count (plus one unrelated column):

  toBeSummed toBeSummed2 toBesummed3  someColumn
0          X           X           Y         NaN
1          X           Y           Z         NaN
2          Y           Y           Z         NaN
3          Z           Z           Z         NaN

oneframe = pd.concat([df['toBeSummed'], df['toBeSummed2'], df['toBesummed3']], axis=1).reset_index()

temp = oneframe.groupby(['toBeSummed']).size().reset_index()
temp2 = oneframe.groupby(['toBeSummed2']).size().reset_index()
temp3 = oneframe.groupby(['toBesummed3']).size().reset_index()
# Give every frame the same column names so they can be concatenated and regrouped
temp.columns = ["SameName", "Sum"]
temp2.columns = ["SameName", "Sum"]
temp3.columns = ["SameName", "Sum"]

final = pd.concat([temp,temp2,temp3]).groupby(['SameName']).sum().reset_index()
final.columns = ["Letter", "Sum"]

The problem is that this code counts every occurrence of each value, so final comes out as:

  Letter  Sum
0      X    3
1      Y    4
2      Z    5

However, I want each value to be counted at most once per row (i.e., the first row contains two X's, so X should be counted only once for that row).
The desired output is:

  Letter  Sum
0      X    2
1      Y    3
2      Z    3
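
To make the counting rule concrete, here is a minimal sketch in plain Python (it is not part of the original code; the name rows is just an illustrative stand-in that mirrors the three columns shown above). Counting every cell reproduces the current output, while collapsing each row to a set first reproduces the desired one.

```python
from collections import Counter

# Illustrative stand-in for the three columns of the dataframe above.
rows = [
    ("X", "X", "Y"),
    ("X", "Y", "Z"),
    ("Y", "Y", "Z"),
    ("Z", "Z", "Z"),
]

# Count every cell: this is what the current code effectively does.
raw_counts = Counter(value for row in rows for value in row)

# Count each value at most once per row by de-duplicating the row first.
per_row_counts = Counter(value for row in rows for value in set(row))

print(dict(sorted(raw_counts.items())))      # {'X': 3, 'Y': 4, 'Z': 5}
print(dict(sorted(per_row_counts.items())))  # {'X': 2, 'Y': 3, 'Z': 3}
```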

I can update or add more comments if this is confusing.

Asked By: Manny Cl

Answers:

Given df:

  toBeSummed toBeSummed2 toBesummed3  someColumn
0          X           X           Y         NaN
1          X           Y           Z         NaN
2          Y           Y           Z         NaN
3          Z           Z           Z         NaN

Doing:

sum_cols = ['toBeSummed', 'toBeSummed2', 'toBesummed3']

# axis=1 de-duplicates within each row; explode() then flattens the per-row arrays
out = df[sum_cols].apply(lambda x: x.unique(), axis=1).explode().value_counts()
print(out.to_frame('Sum'))

Output:

   Sum
Y    3
Z    3
X    2
Answered By: BeRT2me
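
For comparison, the same per-row de-duplication can probably also be written without apply. The following is a sketch of an alternative, not the method from the answer above: it rebuilds the example frame so that it runs on its own, reshapes it to long form with melt, drops repeated (row, value) pairs, and counts what is left. The names long_form, deduped and result are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question so the sketch is self-contained.
df = pd.DataFrame({
    "toBeSummed":  ["X", "X", "Y", "Z"],
    "toBeSummed2": ["X", "Y", "Y", "Z"],
    "toBesummed3": ["Y", "Z", "Z", "Z"],
    "someColumn":  [np.nan] * 4,
})
sum_cols = ["toBeSummed", "toBeSummed2", "toBesummed3"]

# Long form: one line per (original row, column, value).
# reset_index() exposes the row label as a column named "index".
long_form = df[sum_cols].reset_index().melt(id_vars="index", value_name="Letter")

# Keep each value at most once per original row.
deduped = long_form.drop_duplicates(subset=["index", "Letter"])

# Count the number of rows in which each value appears.
result = deduped.groupby("Letter").size().reset_index(name="Sum")
print(result)
```

Because only the columns listed in sum_cols are selected, someColumn plays no part in the count. The result is sorted by letter rather than by count; sorting by the Sum column afterwards would restore a count-based ordering.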