Combine multiple columns into one category column using the column names as value label

Question

I have this data

   ID      A      B      C
0   0   True  False  False
1   1  False   True  False
2   2  False  False   True

And want to transform it into

   ID group
0   0     A
1   1     B
2   2     C

I want to use the column names as value labels for the category column.
Each row has at most one True value.

This is the MWE

#!/usr/bin/env python3
import pandas as pd

df = pd.DataFrame({
    'ID': range(3),
    'A': [True, False, False],
    'B': [False, True, False],
    'C': [False, False, True]
})

result = pd.DataFrame({
    'ID': range(3),
    'group': ['A', 'B', 'C']
})
result.group = result.group.astype('category')

print(df)
print(result)

I could do df.apply(lambda row: ...magic.., axis=1). But isn’t there a more elegant way with pandas’ own tools?

Asked By: buhtz

Answer 1

You can use df.dot:

df['group'] = df[['A', 'B', 'C']].dot(df.columns[1:])
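
For completeness, here is a minimal self-contained sketch of the dot approach that also converts the result to a categorical, as the question asks. It assumes that every column after ID is a boolean flag; flag_cols is only a name chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(3),
    'A': [True, False, False],
    'B': [False, True, False],
    'C': [False, False, True],
})

# Assumption: every column after 'ID' is a boolean flag column.
flag_cols = df.columns[1:]

# True * 'A' == 'A' and False * 'A' == '', so the dot product
# concatenates the names of the columns that are True in each row.
group = df[flag_cols].dot(flag_cols)

# Keep ID, attach the labels, and store them as a categorical.
result = df[['ID']].assign(group=group.astype('category'))
print(result)
```

Note that a row without any True value would end up with an empty string as its label, so it may be worth replacing '' with a missing value before converting to category.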

Answered By: Nuri Taş

Answer 2

You can use melt, then keep only the rows where the value is True to get the result you are expecting:

out = df.melt(id_vars='ID', var_name='group')
out.loc[out['value'], ['ID', 'group']]

Answered By: ArchAngelPwn

Answer 3

You could use pd.melt() to reshape and rename, then filter on the boolean 'value' column using query:

pd.melt(df, id_vars=['ID'], var_name='group').query('value')

   ID group  value
0   0     A   True
4   1     B   True
8   2     C   True

Chaining .drop('value', axis=1).reset_index(drop=True) gives the final output:

   ID group
0   0     A
1   1     B
2   2     C
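
Put together, the whole melt chain could look like the sketch below. The sort_values and astype steps are additions for illustration: melt stacks the frame column by column, so on data that is not already aligned like the sample the rows would not come back in ID order, and the question asks for a category column.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(3),
    'A': [True, False, False],
    'B': [False, True, False],
    'C': [False, False, True],
})

result = (
    pd.melt(df, id_vars=['ID'], var_name='group')
      .query('value')                 # keep only the rows whose flag is True
      .drop(columns='value')          # the boolean column is no longer needed
      .sort_values('ID')              # restore the original row order
      .reset_index(drop=True)
      .astype({'group': 'category'})  # labels as a categorical column
)
print(result)
```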

Answered By: sophocles

Answer 4

Yet another way:

(df.set_index('ID')
   .rename_axis('group', axis=1)  # name the column axis so the stacked level is called 'group'
   .stack()                       # move the column headers into the rows
   .loc[lambda x: x]              # keep only the True entries
   .reset_index()                 # move ID and group back into columns
   .drop(0, axis=1)               # drop the leftover boolean column
)

Output:

   ID group
0   0     A
1   1     B
2   2     C

Answered By: Scott Boston

Answer 5

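More verbose than melt, but this drops the False entries during the reshaping itself:

(df.set_index('ID')
   .rename_axis(columns='group')
   .replace(False, pd.NA)   # turn False into missing values
   .stack()                 # relies on stack() dropping missing values (its long-standing default)
   .reset_index().drop(columns=0)
)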

output:

   ID group
0   0     A
1   1     B
2   2     C
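
A variant of the same idea, sketched here as an assumption-light alternative: instead of relying on stack() to discard missing values, filter on the boolean values explicitly and read the answer straight off the remaining index. The conversion to category at the end is added to match the question.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(3),
    'A': [True, False, False],
    'B': [False, True, False],
    'C': [False, False, True],
})

result = (
    df.set_index('ID')
      .rename_axis(columns='group')   # the stacked level will be named 'group'
      .stack()                        # boolean Series indexed by (ID, group)
      .loc[lambda s: s]               # keep only the True entries
      .index.to_frame(index=False)    # the surviving (ID, group) pairs are the result
      .astype({'group': 'category'})
)
print(result)
```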

Answered By: mozway

Answer 6

Try apply with a lambda:

df.set_index('ID').apply(lambda x: x.index[x][0], axis=1)
Out[39]: 
ID
0    A
1    B
2    C
dtype: object

Answered By: BENY

Answer 7

Use idxmax, and mask the rows that have no True value at all:

s = df.set_index('ID')
s.idxmax(axis=1).where(s.any(axis=1))

ID
0    A
1    B
2    C
dtype: object
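
To see why the where() part matters, here is a small sketch with an extra row (ID 3, added purely for illustration) in which no column is True. Without the mask, idxmax would report the first column for that row, because all of its values tie.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(4),
    'A': [True, False, False, False],
    'B': [False, True, False, False],
    'C': [False, False, True, False],  # the last row (ID 3) has no True value
})

s = df.set_index('ID')

# idxmax returns the first column holding the row maximum; for an all-False
# row that would be 'A', so those rows are masked out with where().
group = s.idxmax(axis=1).where(s.any(axis=1))

# Back to an ID/group frame, with the labels stored as a categorical.
result = group.astype('category').rename('group').reset_index()
print(result)
```

The row without a True value ends up with a missing value in group rather than a misleading label.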

Answered By: Shubham Sharma

Answer 8

df['group'] = df[['A', 'B', 'C']].dot(df.columns[1:])

Answered By: Amandeep Kaur
