How to multiply combinations of two sets of pandas dataframe columns

Question:

I would like to multiply every combination of two sets of columns.

Let's say there is a dataframe like the one below:

import pandas as pd

df = {'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[0,1,2]}
df = pd.DataFrame(df)

Now I want to compute the products AC, AD, BC, and BD.
That is, multiply every combination of a column from [A, B] with a column from [C, D].

I tried to use itertools but couldn't figure it out.

The desired output would be:

output = {'AC':[7,16,27], 'AD':[0,2,6], 'BC':[28,40,54], 'BD':[0,5,12]}
output = pd.DataFrame(output)
Asked By: Yun Tae Hwang

||

Answers:

IIUC, you can try

import itertools

cols1 = ['A', 'B']
cols2 = ['C', 'D']

for col1, col2 in itertools.product(cols1, cols2):
    df[col1+col2] = df[col1] * df[col2]
print(df)

   A  B  C  D  AC  AD  BC  BD
0  1  4  7  0   7   0  28   0
1  2  5  8  1  16   2  40   5
2  3  6  9  2  27   6  54  12

Or build a new DataFrame instead of adding columns to the original:

out = pd.concat([df[col1].mul(df[col2]).to_frame(col1+col2)
                 for col1, col2 in itertools.product(cols1, cols2)], axis=1)
print(out)

   AC  AD  BC  BD
0   7   0  28   0
1  16   2  40   5
2  27   6  54  12
Answered By: Ynjxsjmh

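You can directly multiply several columns at once if you convert them to NumPy arrays first with .to_numpy(). The multiplication is element-wise and pairs the columns by position (A with C, B with D), so this gives only AC and BD: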

>>> df[["A","B"]].to_numpy() * df[["C","D"]].to_numpy()
array([[ 7,  0],
       [16,  5],
       [27, 12]])
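
To get all four combinations from the same two arrays rather than only the position-matched pairs, one option is NumPy broadcasting. This is a minimal sketch and not part of the original answer; the names left, right, products, and names are made up for illustration, and it assumes the new column labels are formed by concatenating the source names, as in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9], 'D': [0, 1, 2]})
cols1 = ['A', 'B']
cols2 = ['C', 'D']

left = df[cols1].to_numpy()    # shape (3, 2)
right = df[cols2].to_numpy()   # shape (3, 2)

# Insert extra axes so every left column meets every right column:
# (3, 2, 1) * (3, 1, 2) broadcasts to (3, 2, 2).
# Flattening the last two axes gives one output column per combination.
products = (left[:, :, None] * right[:, None, :]).reshape(len(df), -1)

# Same ordering as the flattened axes: left column varies slowest.
names = [a + b for a in cols1 for b in cols2]   # ['AC', 'AD', 'BC', 'BD']

out = pd.DataFrame(products, columns=names, index=df.index)
print(out)
```

For a handful of columns the itertools.product loop is probably easier to read; broadcasting mainly helps when both sets are large.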

You can also unzip a collection of wanted pairs with zip(*pairs), which yields the first column name of every pair and then the second. Use each group of names to select columns from your DataFrame (selecting the same column multiple times is fine), then multiply the resulting NumPy arrays together:

>>> import math                          # math.prod() needs Python 3.8+
>>> pairs = ["AC", "AD", "BC", "BD"]     # wanted pairs
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs)  # new dataframe
   AC  AD  BC  BD
0   7   0  28   0
1  16   2  40   5
2  27   6  54  12

This extends to groups of any size (triples, octuples of columns, ...) as long as every group has the same length. Beware: zip() stops at the shortest group and silently drops any extra columns.

>>> pairs = ["ABD", "BCD"]
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs)
   ABD  BCD
0    0    0
1   10   40
2   36  108
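
The zip() caveat is easy to miss, so here is a small sketch of one way to guard against it. The helper name multiply_groups is hypothetical and not from the answer; it assumes single-character column names, as in the examples above, and raises instead of silently truncating. On Python 3.10+ zip(*groups, strict=True) would be an alternative to the explicit length check.

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9], 'D': [0, 1, 2]})

# Unequal groups: zip() stops after two names, so 'D' is never used.
truncated = list(zip(*["ABD", "BC"]))
print(truncated)   # [('A', 'B'), ('B', 'C')]


def multiply_groups(frame, groups):
    """Multiply the columns named by each string in groups (hypothetical helper)."""
    # Refuse groups of different lengths instead of letting zip() truncate.
    if len({len(group) for group in groups}) > 1:
        raise ValueError(f"groups differ in length: {groups}")
    # Same technique as above: one array per position, multiplied together.
    result = math.prod(frame[list(cols)].to_numpy() for cols in zip(*groups))
    return pd.DataFrame(result, columns=groups, index=frame.index)


checked = multiply_groups(df, ["ABD", "BCD"])
print(checked)
```

Calling multiply_groups(df, ["ABD", "BC"]) raises ValueError rather than quietly returning the products AB and BC under the wrong labels.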
Answered By: ti7