Get all combinations of several columns in a pandas dataframe and calculate sum for each combination
Question:
I have a dataframe as below:
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})
I want to get all combinations of ‘colA’, ‘colB’, ‘colC’ and ‘colD’ and calculate the sum for each combination. I can get all the combinations using itertools:
from itertools import combinations

cols = ['colA', 'colB', 'colC', 'colD']
all_combinations = [c for i in range(2, len(cols)+1) for c in combinations(cols, i)]
But how can I get the sum for each combination and create a new column in the dataframe? Expected output:
id colA colB colC colD colA+colB colB+colC ... colA+colB+colC+colD
a 1 5 9 13 6 14 ... 28
b 2 6 10 14 8 16 ... 32
c 3 7 11 15 10 18 ... 36
d 4 8 12 16 12 20 ... 40
Answers:
First, select from the frame a list of all columns whose names start with col. Then build a dictionary using combinations, where the keys are the names of the new sum columns and the values are the row-wise sums of the corresponding columns of the original dataframe. Finally, unpack the dictionary with ** as keyword arguments to the assign method, which adds the new columns to the frame:
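from itertools import combinations

cols = [c for c in df.columns if c.startswith('col')]
# One '+'-joined column per combination of two or more columns, all added in a single assign call
df = df.assign(**{'+'.join(c): df.loc[:, c].sum(axis=1)
                  for i in range(2, len(cols) + 1) for c in combinations(cols, i)})
print(df)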
id colA colB colC colD colA+colB colA+colC colA+colD colB+colC colB+colD colC+colD colA+colB+colC colA+colB+colD colA+colC+colD colB+colC+colD colA+colB+colC+colD
0 a 1 5 9 13 6 10 14 14 18 22 15 19 23 27 28
1 b 2 6 10 14 8 12 16 16 20 24 18 22 26 30 32
2 c 3 7 11 15 10 14 18 18 22 26 21 25 29 33 36
3 d 4 8 12 16 12 16 20 20 24 28 24 28 32 36 40
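If the one-line dictionary comprehension is hard to read, the same steps can be spelled out as an explicit loop. The following is only a sketch of that idea, not the answerer's code: the helper name add_combination_sums and its min_size parameter are made up for illustration, and it attaches the new columns with pd.concat instead of assign.

```python
from itertools import combinations

import pandas as pd


def add_combination_sums(frame, cols, min_size=2):
    """Return a new frame with one '+'-joined sum column per combination of `cols`.

    Hypothetical helper for illustration; `min_size` is an assumed parameter.
    """
    new_columns = {}
    for size in range(min_size, len(cols) + 1):
        for combo in combinations(cols, size):
            # list(combo) selects the columns; sum(axis=1) adds them row by row
            new_columns['+'.join(combo)] = frame[list(combo)].sum(axis=1)
    # Build all new columns as one frame and attach them in a single concat
    return pd.concat([frame, pd.DataFrame(new_columns, index=frame.index)], axis=1)


df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'colA': [1, 2, 3, 4],
                   'colB': [5, 6, 7, 8],
                   'colC': [9, 10, 11, 12],
                   'colD': [13, 14, 15, 16]})
result = add_combination_sums(df, ['colA', 'colB', 'colC', 'colD'])
print(result.shape)
```

With four source columns there are 6 pairs, 4 triples and 1 quadruple, so the result has 11 new columns next to the original 5. The original df is left unchanged because the helper returns a new frame.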