Pandas: sum over a specific index of a list column?

Question:

Assume there is a pandas DataFrame such as

import pandas as pd 

df = pd.DataFrame({'items':[[101,102],[102,101],[102,103],
                            [101,103],[101,101],[102,102],
                            [103,103]],
                   'value':[12,13,11,15,17,8,19]})
print(df)

        items  value
0  [101, 102]     12
1  [102, 101]     13
2  [102, 103]     11
3  [101, 103]     15
4  [101, 101]     17
5  [102, 102]      8
6  [103, 103]     19

I would like to sum 'value' over the 2nd value of df['items'] in each row, such that
[101, 102] + [101, 103] + [101, 101] = 12 + 15 + 17 = 44, and do the same for 102 and 103. The final DataFrame should look something like this:

    0   101     44
    1   102     32
    2   103     19
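
The expected totals (44, 32, 19) come from keying each row on the first element of its list, which is index 0 in Python, since 101 is the first element of [101, 102], [101, 103] and [101, 101]. A plain-Python sketch of that target computation, using the question's data and an illustrative `totals` dict (not part of the original post), may make the goal explicit:

```python
from collections import defaultdict

# Same data as the question's DataFrame, as plain Python lists
items = [[101, 102], [102, 101], [102, 103],
         [101, 103], [101, 101], [102, 102],
         [103, 103]]
values = [12, 13, 11, 15, 17, 8, 19]

# Accumulate each row's value under the FIRST element of its list (index 0)
totals = defaultdict(int)
for pair, value in zip(items, values):
    totals[pair[0]] += value

print(dict(sorted(totals.items())))  # {101: 44, 102: 32, 103: 19}
```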

This is my code, but it seems to be incorrect:

df1 = df.groupby(df['items'][1]).agg({'value':sum})

Any suggestions? Many thanks.

Asked By: DaCard


Answers:

Instead of passing df['items'][1] to groupby, you should pass df['items'].str[1]:

df.groupby(df['items'].str[1]).agg({'value': 'sum'})
       value
items       
101       30
102       20
103       45
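
To see what this version actually groups on, here is a short sketch that prints the keys df['items'].str[1] produces for the question's data and then sums 'value' per key. It uses groupby(...)['value'].sum() instead of .agg(...), which should give the same numbers; the names `keys` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'items': [[101, 102], [102, 101], [102, 103],
                             [101, 103], [101, 101], [102, 102],
                             [103, 103]],
                   'value': [12, 13, 11, 15, 17, 8, 19]})

# .str[1] takes element 1 (the SECOND element) of every list, row by row
keys = df['items'].str[1]
print(keys.tolist())  # [102, 101, 103, 103, 101, 102, 103]

# Select the 'value' column and sum it within each group of keys
result = df.groupby(keys)['value'].sum()
print(result.to_dict())  # {101: 30, 102: 20, 103: 45}
```

Because .str[1] picks the second element, these totals (30, 20, 45) differ from the 44 / 32 / 19 in the question, which correspond to the first element.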

Answered By: ThePyGuy

In [168]: df.groupby(df["items"].str[0]).agg({"value": "sum"})
Out[168]:
       value
items
101       44
102       32
103       19

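df["items"][0] looks up the element of the Series with label 0, i.e. the single list [101, 102] from the first row; it does not take the 0th element of every list in the Series. For that, we use the .str accessor. It is meant for strings, but its [..] indexing works element-wise on anything that supports indexing, lists included (duck typing), so we can use it on lists as well. Note that Python is 0-indexed, so the first element is [0], not [1].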
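
A minimal sketch contrasting the two lookups on the question's data, followed by one way to get the three-row layout the question asked for. The reset_index() step and the .map(lambda ...) alternative are additions for illustration, not part of the answer above; reset_index() moves the group keys back into an ordinary column, so the result gets a 0..2 index like the table in the question.

```python
import pandas as pd

df = pd.DataFrame({'items': [[101, 102], [102, 101], [102, 103],
                             [101, 103], [101, 101], [102, 102],
                             [103, 103]],
                   'value': [12, 13, 11, 15, 17, 8, 19]})

# Label-based lookup on the Series: returns ONE list, the row labelled 0
print(df['items'][0])                # [101, 102]

# .str[0]: the element at position 0 of EVERY list, one per row
first = df['items'].str[0]
print(first.tolist())                # [101, 102, 102, 101, 101, 102, 103]

# Alternative key without the .str accessor (illustrative only)
alt = df['items'].map(lambda pair: pair[0])

# Group on the first elements, sum 'value', then turn the group keys
# back into an ordinary 'items' column with a 0..n-1 index
out = (df.groupby(first)['value']
         .sum()
         .reset_index())
print(out)
```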

Answered By: Mustafa Aydın