How to multiply combinations of two sets of pandas dataframe columns
Question:
I would like to multiply the combinations of two sets of columns
Let say there is a dataframe below:
import pandas as pd
df = {'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[0,1,2]}
df = pd.DataFrame(df)
Now, I want to multiply AC, AD, BC, BD
This is like multiplying the combination of [A,B] and [C,D]
I tried to use itertools but failed to figure it out.
So, the desired output will be like:
output = {'AC':[7,16,27], 'AD':[0,2,6], 'BC':[28,40,54], 'BD':[0,5,12]}
output = pd.DataFrame(output)
Answers:
IIUC, you can try
import itertools
cols1 = ['A', 'B']
cols2 = ['C', 'D']
for col1, col2 in itertools.product(cols1, cols2):
df[col1+col2] = df[col1] * df[col2]
print(df)
A B C D AC AD BC BD
0 1 4 7 0 7 0 28 0
1 2 5 8 1 16 2 40 5
2 3 6 9 2 27 6 54 12
Or with new create dataframe
out = pd.concat([df[col1].mul(df[col2]).to_frame(col1+col2)
for col1, col2 in itertools.product(cols1, cols2)], axis=1)
print(out)
AC AD BC BD
0 7 0 28 0
1 16 2 40 5
2 27 6 54 12
You can directly multiply multiple columns if you convert them to NumPy arrays first with .to_numpy()
>>> df[["A","B"]].to_numpy() * df[["C","D"]].to_numpy()
array([[ 7, 0],
[16, 5],
[27, 12]])
You can also unzip a collection of wanted pairs and use them to get a new view of your DataFrame (indexing the same column multiple times is fine) .. then multiplying together the two new NumPy arrays!
>>> import math # standard library for prod()
>>> pairs = ["AC", "AD", "BC", "BD"] # wanted pairs
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs) # new dataframe
AC AD BC BD
0 7 0 28 0
1 16 2 40 5
2 27 6 54 12
This extends to any number of pairs (triples, octuples of columns..) as long as they’re the same length (beware: zip()
will silently drop extra columns beyond the shortest group)
>>> pairs = ["ABD", "BCD"]
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs)
ABD BCD
0 0 0
1 10 40
2 36 108
I would like to multiply the combinations of two sets of columns
Let say there is a dataframe below:
import pandas as pd
df = {'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9], 'D':[0,1,2]}
df = pd.DataFrame(df)
Now, I want to multiply AC, AD, BC, BD
This is like multiplying the combination of [A,B] and [C,D]
I tried to use itertools but failed to figure it out.
So, the desired output will be like:
output = {'AC':[7,16,27], 'AD':[0,2,6], 'BC':[28,40,54], 'BD':[0,5,12]}
output = pd.DataFrame(output)
IIUC, you can try
import itertools
cols1 = ['A', 'B']
cols2 = ['C', 'D']
for col1, col2 in itertools.product(cols1, cols2):
df[col1+col2] = df[col1] * df[col2]
print(df)
A B C D AC AD BC BD
0 1 4 7 0 7 0 28 0
1 2 5 8 1 16 2 40 5
2 3 6 9 2 27 6 54 12
Or with new create dataframe
out = pd.concat([df[col1].mul(df[col2]).to_frame(col1+col2)
for col1, col2 in itertools.product(cols1, cols2)], axis=1)
print(out)
AC AD BC BD
0 7 0 28 0
1 16 2 40 5
2 27 6 54 12
You can directly multiply multiple columns if you convert them to NumPy arrays first with .to_numpy()
>>> df[["A","B"]].to_numpy() * df[["C","D"]].to_numpy()
array([[ 7, 0],
[16, 5],
[27, 12]])
You can also unzip a collection of wanted pairs and use them to get a new view of your DataFrame (indexing the same column multiple times is fine) .. then multiplying together the two new NumPy arrays!
>>> import math # standard library for prod()
>>> pairs = ["AC", "AD", "BC", "BD"] # wanted pairs
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs) # new dataframe
AC AD BC BD
0 7 0 28 0
1 16 2 40 5
2 27 6 54 12
This extends to any number of pairs (triples, octuples of columns..) as long as they’re the same length (beware: zip()
will silently drop extra columns beyond the shortest group)
>>> pairs = ["ABD", "BCD"]
>>> result = math.prod(df[[*cols]].to_numpy() for cols in zip(*pairs))
>>> pd.DataFrame(result, columns=pairs)
ABD BCD
0 0 0
1 10 40
2 36 108