Summing dataframe over combinations of index columns

Question

I have a dataframe with different sections (only 2 sections and speeds here, but a circuit can be up to 8 sections and 6 measured speeds) like so:

section	speed	Data1	Data2
A	10	1.5	2.5
A	20	1.0	2.0
B	10	2.5	3.5
B	20	2.0	3.0

I would like to sum my data columns over all possible circuits

A	B	Data1	Data2
10	10	4.0	6.0
10	20	3.5	5.5
20	10	3.5	5.5
20	20	3.0	5.0

How would I do this? I can make the combinations, but not sure how to sum the data columns over them.

Asked By: Daniel F

||

Source

Answer 1

What about using itertools.product, then summing per group:

from itertools import product

df2 = df.set_index(['section', 'speed']).T

out = (pd.concat({k: df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))})
         .unstack(level=-1)
      )

output:

                 Data1  Data2
(A, 10) (B, 10)    4.0    6.0
        (B, 20)    3.5    5.5
(A, 20) (B, 10)    3.5    5.5
        (B, 20)    3.0    5.0

For the exact provided format:

df2 = df.set_index(['section', 'speed']).T

sections = df2.columns.get_level_values('section').unique()

out = (pd.concat({tuple(x[1] for x in k):
                  df2[list(k)].sum(1)
                  for k in product(*(d for _,d in df2.groupby(axis=1, level=0)))
                 })
         .unstack(level=-1)
         .rename_axis(sections).reset_index()
      )

output:

    A   B  Data1  Data2
0  10  10    4.0    6.0
1  10  20    3.5    5.5
2  20  10    3.5    5.5
3  20  20    3.0    5.0

Answered By: mozway

Answer 2

One approach:

from itertools import product

groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
rows = []
for p in product(*groups):
    row = {}
    for e in p:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0) + v
    rows.append(row)

res = pd.DataFrame(rows)
print(res)

Output

    A  Data1  Data2   B
0  10    4.0    6.0  10
1  10    3.5    5.5  20
2  20    3.5    5.5  10
3  20    3.0    5.0  20

Or more pythonic:

def build_row(prod):
    row = {}
    for e in prod:
        d = e.to_dict()
        row[d.pop("section")] = d.pop("speed")
        for k, v in d.items():
            row[k] = row.get(k, 0) + v
    return row


groups = [[row for i, row in v.iterrows()] for _, v in df.groupby("section")]
res = pd.DataFrame([build_row(p) for p in product(*groups)])
print(res)

Note that if you want exact output, just reorder the columns.

Answered By: Dani Mesejo

Answer 3

use pandasql package

df1.sql("""
    select a,b,tb1.data1+tb2.data1 as Data1,tb1.data2+tb2.data2 as Data2
        from
        (select speed as A,data1,data2 from self where section = 'A') tb1
        join
        (select speed as B,data1,data2 from self where section = 'B') tb2
""")

out：

    A   B  Data1  Data2
0  10  10    4.0    6.0
1  10  20    3.5    5.5
2  20  10    3.5    5.5
3  20  20    3.0    5.0

Answered By: G.G

Summing dataframe over combinations of index columns

Question:

Answers: