Variable combinations of column designations in pandas
Question:
I can best explain my problem by starting with an example:
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']},
                  index=[0, 1, 2, 3])
Now I want to build a function that receives a DataFrame (df) and an integer (m).
For example, with m = 2 the function should combine the column names into pairs. The output should be a list containing those pairs. For m = 2 and df:
[[ID, age], [ID, gender], [ID, overweight], [age, gender], [age, overweight], [gender, overweight]]
Does anyone know how I can achieve that?
My problem is that both m and the number of columns are variable.
Answers:
You can use itertools.combinations directly on the DataFrame, because iterating over a DataFrame yields its column names:
from itertools import combinations
m = 2
out = list(combinations(df, m))
output:
[('ID', 'age'),
('ID', 'gender'),
('ID', 'overweight'),
('age', 'gender'),
('age', 'overweight'),
('gender', 'overweight')]
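Since the question asks for a function that takes df and m and returns a list of lists, the snippet above could be wrapped roughly as follows. This is only a sketch: the name column_combinations is made up for illustration, and the tuples are converted to lists purely to match the output format in the question.

```python
from itertools import combinations

import pandas as pd


def column_combinations(df, m):
    """Return every m-sized combination of df's column labels as a list of lists.

    Hypothetical helper name; it only wraps itertools.combinations.
    """
    # combinations() yields tuples in the order the columns appear in df;
    # each tuple is converted to a list to match the requested output shape.
    return [list(combo) for combo in combinations(df.columns, m)]


df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']})

pairs = column_combinations(df, 2)
print(pairs)
```

Because combinations handles any m, the same call works for triples (m = 3) or any other size; if m is larger than the number of columns, the result is simply an empty list.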
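If you need the sub-DataFrames themselves rather than only the column names, you can select each combination of columns:
from itertools import combinations
n = 2
[df[list(i)] for i in combinations(df.columns, n)]
output: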
[   ID  age
 0   1   46
 1   2   48
 2   3   55
 3   4   55,
    ID  gender
 0   1  female
 1   2  female
 2   3    male
 3   4    male,
    ID  overweight
 0   1           y
 1   2           n
 2   3           y
 3   4           y,
    age  gender
 0   46  female
 1   48  female
 2   55    male
 3   55    male,
    age  overweight
 0   46           y
 1   48           n
 2   55           y
 3   55           y,
    gender  overweight
 0  female           y
 1  female           n
 2    male           y
 3    male           y]
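A plain list of sub-DataFrames does not tell you which columns each one holds without inspecting it. One possible variation, shown here only as a sketch, is to key each sub-DataFrame by its tuple of column names; the variable name sub_frames is an assumption for illustration.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 55],
                   "gender": ['female', 'female', 'male', 'male'],
                   "overweight": ['y', 'n', 'y', 'y']})

n = 2
# Map each column-name tuple to the DataFrame restricted to those columns.
sub_frames = {cols: df[list(cols)] for cols in combinations(df.columns, n)}

print(list(sub_frames))               # the column-name tuples
print(sub_frames[("age", "gender")])  # look up one sub-DataFrame by its columns
```

The dictionary keeps the same order as combinations, so iterating over sub_frames.values() gives the same sub-DataFrames as the list comprehension above.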