# How to sum values across groups without summing duplicates

## Question:

I have the following `df`:

``````
     A    B        C       D
0  foo    a     1200     300
0  foo    a      700     300
0  foo    b     1000     300
1  bar    b      270      70
1  bar    a      350      70
2  abc    c      270     300
2  abc    a      350     300
``````

I want to display the sum of the values in column `D` grouped by column `B`, but I do not want to add a value from column `D` more than once for the same value in column `A`. That is, column `D` has only one value per value in column `A`.

`foo` will only ever have the value `300` and `bar` will only have the value `70` in column `D`. The values in this column are just repeated because I have repeated indexes.

I want to print something like (no need to show formatting, I just need to output the correct sums):

``````
a: 300 (from foo) + 70 (from bar) + 300 (from abc) = 670
b: 300 (from foo) + 70 (from bar) = 370
c: 300 (from abc)
``````

That is, values in column `D` should not be summed together if the value in column `A` is the same among them.

## Answer:

You could apply `pd.unique()` to each group after the `groupby` and then sum the unique values.

``````
df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x)))
``````
``````
B
a    370
b    370
Name: D, dtype: int64
``````
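One caveat is that this approach deduplicates `D` by value, not by the `A` it belongs to. The sketch below rebuilds the seven-row frame from the question (the constructor call and the name `by_value` are my own, not from the original post) and shows that `foo` and `abc`, which both carry `300`, collapse into a single `300` for group `a`.

```python
import pandas as pd

# Data copied from the question; the repeated index mirrors the original frame.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Deduplicate D by value within each B group, then sum what is left.
by_value = df.groupby("B")["D"].apply(lambda x: sum(pd.unique(x)))

# Group "a" holds 300 (foo), 70 (bar) and 300 (abc), but the two 300s
# are indistinguishable by value, so only one of them is counted.
print(by_value)
```

For this frame, group `a` comes out as 370 instead of the desired 670, so the value-based version is only safe when no two `A` values within a `B` group share the same `D`.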

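UPDATE:
For your new example, you are looking for something like the following. Grouping by both `B` and `A` first means `D` is counted once per (`B`, `A`) pair; the second `groupby('B')` then sums those per-pair values.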

``````
df.groupby(['B','A'])['D'].apply(lambda x: sum(pd.unique(x))).groupby('B').sum()
``````

Output:

``````
B
a    670
b    370
c    300
Name: D, dtype: int64
``````
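As a possible alternative (not part of the original answer), the same totals can be reached by dropping repeated (`A`, `B`) pairs before grouping. This is a minimal sketch that relies on the question's statement that `D` is constant for each value of `A`; the frame construction and the name `result` are assumptions made for illustration.

```python
import pandas as pd

# Data copied from the question; the repeated index mirrors the original frame.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Keep one row per (A, B) pair, then sum D within each B.
# Assumes D never varies within a single A, as stated in the question.
result = df.drop_duplicates(subset=["A", "B"]).groupby("B")["D"].sum()
print(result)
```

If `D` could differ between rows that share the same `A`, `drop_duplicates` would silently keep only the first row of each pair, so the two-level `groupby` above is the safer choice in that case.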