I have the following DataFrame:
         A  B     C    D
    0  foo  a  1200  300
    0  foo  a   700  300
    0  foo  b  1000  300
    1  bar  b   270   70
    1  bar  a   350   70
    2  abc  c   270  300
    2  abc  a   350  300
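To make the example reproducible, here is a minimal sketch that rebuilds the frame above, including the repeated index. The variable name df is an assumption, chosen to match the answer's code.

```python
import pandas as pd

# Rebuild the frame from the question; the repeated index (0, 0, 0, 1, 1, 2, 2)
# mirrors the one shown in the printout.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)
print(df)
```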
I want to display the sum of the values in column D grouped by column B, but I do not want to sum the values in column D more than once for a single value in column A. That is, column D has only one value per value in column A: foo will only ever have the value 300 in column D, and bar will only ever have the value 70. The values in this column are repeated only because I have repeated indexes.
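The claim that D holds a single value per A can be checked directly. A minimal sketch, assuming the frame is rebuilt from the table above: counting distinct D values per A should give 1 for every A.

```python
import pandas as pd

# Rebuild the frame from the question (repeated index included).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Count distinct D values for each A; a count of 1 everywhere means D is a
# per-A constant that is merely repeated across rows.
distinct_d = df.groupby("A")["D"].nunique()
print(distinct_d)
print("D constant per A:", bool((distinct_d == 1).all()))
```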
I want to output something like this (the formatting doesn't matter; I just need the correct sums):

    a: 300 (from foo) + 300 (from abc) + 70 (from bar) = 670
    b: 300 (from foo) + 70 (from bar) = 370
    c: 300 (from abc)
In other words, values in column D should not be summed together if they come from the same value in column A.
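The rule can be stated independently of pandas. The following plain-Python sketch (an illustration, not the approach used in the answer) keeps one D value per distinct A inside each B group and sums them, which reproduces the totals listed above.

```python
from collections import defaultdict

# (A, B, D) triples copied from the table above; column C plays no role here.
rows = [
    ("foo", "a", 300),
    ("foo", "a", 300),
    ("foo", "b", 300),
    ("bar", "b", 70),
    ("bar", "a", 70),
    ("abc", "c", 300),
    ("abc", "a", 300),
]

# seen[b][a] holds the D value of A value `a` within B group `b`.
# Keying by `a` means a repeated (A, B) pair overwrites itself
# instead of being added twice.
seen = defaultdict(dict)
for a, b, d in rows:
    seen[b][a] = d

# Sum one D value per distinct A in each B group.
totals = {b: sum(per_a.values()) for b, per_a in sorted(seen.items())}
print(totals)  # {'a': 670, 'b': 370, 'c': 300}
```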
You could use pd.unique() after the groupby and then sum the unique values (the output shown is for the question's original example):
    import pandas as pd
    df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x)))  # sum distinct D values per B
    B
    a    370
    b    370
    Name: D, dtype: int64
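On the updated data, this one-liner deduplicates D values rather than A values, so foo's 300 and abc's 300 in group a collapse into one and a comes out as 370 instead of 670. A short sketch to reproduce this; column C is left out because the calculation never uses it.

```python
import pandas as pd

# Only the columns the one-liner touches (C omitted, it is unused).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# pd.unique removes repeated D *values*, not repeated A values, so
# foo's 300 and abc's 300 in group "a" are counted only once.
by_value = df.groupby("B")["D"].apply(lambda x: sum(pd.unique(x)))
print(by_value)  # a -> 370 here, not the 670 the question asks for
```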
For your new example, group by both B and A so that each value in A contributes its D value only once per group, and then sum the results per B:
    df.groupby(['B', 'A'])['D'].apply(lambda x: sum(pd.unique(x))).groupby('B').sum()
    B
    a    670
    b    370
    c    300
    Name: D, dtype: int64
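A self-contained version of the two-step groupby, plus a swapped-in alternative based on drop_duplicates that is not part of the answer. This is a sketch assuming only columns A, B, and D matter; groupby(level="B") is used instead of groupby('B') only to make the MultiIndex level lookup explicit.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Step 1: collapse each (B, A) pair to one value. Because D is constant
# per A, the sum of the unique D values is just that A's D value.
per_pair = df.groupby(["B", "A"])["D"].apply(lambda x: sum(pd.unique(x)))

# Step 2: add up the per-A contributions within each B level.
totals = per_pair.groupby(level="B").sum()
print(totals)

# Alternative (not from the answer): drop repeated (A, B) rows, then do a
# plain grouped sum. This relies on D being constant within each A, as the
# question states.
alt = df.drop_duplicates(subset=["A", "B"]).groupby("B")["D"].sum()
print(alt)
```

The drop_duplicates variant avoids a Python-level lambda per group, which may matter on larger frames; both give the same result as long as D really is constant per A.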