Python pandas – convert multiple columns to row
Question:
I have a table like this
id | non-binary | female | male | others |
---|---|---|---|---|
1 | True | False | False | False |
2 | False | True | False | False |
3 | False | False | True | False |
4 | True | True | True | False |
I want to combine the columns into a single value per row.
If multiple columns are True, only the first column that is True should be used. E.g., for id=4 the first True is in column "non-binary", so the gender value in the new table should be non-binary.
How can it be converted to the table below using pandas?
id | Gender |
---|---|
1 | non-binary |
2 | female |
3 | male |
4 | non-binary |
Answers:
Since boolean values are numeric (True == 1, False == 0), you can use idxmax to get the index of the maximum value. Applied along the column axis, it returns the name of the first column where True appears, because idxmax reports the first occurrence of the maximum. Note that for a row with no True at all, it returns the first column name.
# set your index to the id column if it is not already
df = df.set_index('id')
out = df.idxmax(axis='columns').rename('Gender')
print(out)
id
1 non-binary
2 female
3 male
4 non-binary
Name: Gender, dtype: object
# Can use .reset_index() or .to_frame() to convert back to a DataFrame
# (depending on what you want to do with your Index)
print(out.reset_index())
id Gender
0 1 non-binary
1 2 female
2 3 male
3 4 non-binary
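As a self-contained sketch of the idxmax approach, the snippet below rebuilds the question's table and also covers a case the question does not mention: a row with no True at all. The extra row id=5 and the decision to leave its gender missing are assumptions added for illustration only.

```python
import pandas as pd

# Rebuild the question's table; id=5 (all False) is a hypothetical extra row.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "non-binary": [True, False, False, True, False],
        "female": [False, True, False, True, False],
        "male": [False, False, True, True, False],
        "others": [False, False, False, False, False],
    }
).set_index("id")

# idxmax returns the label of the first maximum in each row, i.e. the first True.
gender = df.idxmax(axis="columns").rename("Gender")

# For an all-False row idxmax still returns the first column label,
# so blank out the rows that contain no True at all.
gender = gender.where(df.any(axis="columns"))

result = gender.reset_index()
print(result)
```

Masking with `df.any(axis="columns")` is one possible choice; filling such rows with a placeholder label would work just as well if missing values are not wanted.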
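I haven't tested this, but pandas 1.5.0 added a function called from_dummies. It expects each row to contain exactly one True, so a row like id=4 (several True values) raises a ValueError unless the data is cleaned up first.
Try this:
# assumes 'id' is the index, as in the set_index('id') step above
pd.from_dummies(df.astype(int))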
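To make the from_dummies behaviour concrete, here is a minimal sketch that assumes pandas 1.5.0 or later. It decodes only the strictly one-hot rows (ids 1 to 3) and then shows that a multi-True row such as id=4 is rejected. The variable names `one_hot`, `multi`, and `multi_error` are made up for this example.

```python
import pandas as pd

# Strictly one-hot rows (ids 1-3 from the question): exactly one True per row.
one_hot = pd.DataFrame(
    {
        "non-binary": [True, False, False],
        "female": [False, True, False],
        "male": [False, False, True],
        "others": [False, False, False],
    },
    index=pd.Index([1, 2, 3], name="id"),
)

# Without `sep`, from_dummies returns a single-column DataFrame of category names.
genders = pd.from_dummies(one_hot.astype(int)).iloc[:, 0].tolist()
result = pd.DataFrame({"id": one_hot.index, "Gender": genders})
print(result)

# A row with several True values (like id=4) is not valid dummy data.
multi = pd.DataFrame(
    {"non-binary": [True], "female": [True], "male": [True], "others": [False]},
    index=pd.Index([4], name="id"),
)
try:
    pd.from_dummies(multi.astype(int))
    multi_error = None
except ValueError as exc:
    multi_error = str(exc)
print(multi_error)
```

For the question's full table, from_dummies therefore only applies after the multi-True rows have been reduced to a single True, which is the step the other answers handle directly.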
What about just using .loc:
# assumes 'id' is a regular column and the flag columns are boolean
df2 = pd.DataFrame({'id': df['id']})
# each later column is used only if every earlier column is False
df2.loc[df['non-binary'], 'Gender'] = 'non-binary'
df2.loc[~df['non-binary'] & df['female'], 'Gender'] = 'female'
df2.loc[~df['non-binary'] & ~df['female'] & df['male'], 'Gender'] = 'male'
df2.loc[~df['non-binary'] & ~df['female'] & ~df['male'] & df['others'], 'Gender'] = 'others'
print(df2)
Output:
| | id | Gender |
|---|---|---|
| 0 | 1 | non-binary |
| 1 | 2 | female |
| 2 | 3 | male |
| 3 | 4 | non-binary |
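The chain of .loc assignments can also be written with numpy.select, which is a different technique from the answer above but has the same "first match wins" semantics: it checks the conditions in order and takes the first one that is True. This is only a sketch; the `columns` priority list and the "unknown" fallback label are assumptions, not part of the original question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table with 'id' as a regular column.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "non-binary": [True, False, False, True],
        "female": [False, True, False, True],
        "male": [False, False, True, True],
        "others": [False, False, False, False],
    }
)

# Priority order: earlier columns win when several are True.
columns = ["non-binary", "female", "male", "others"]

# np.select evaluates the conditions in order and uses the first True one;
# "unknown" is a hypothetical label for rows where nothing is True.
conditions = [df[col].to_numpy() for col in columns]
df2 = pd.DataFrame(
    {"id": df["id"], "Gender": np.select(conditions, columns, default="unknown")}
)
print(df2)
```

Because the priority lives in a single list, adding or reordering categories does not require rewriting the individual conditions.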