How to convert binary columns with multiple occurrences into categorical data in Pandas

Question:

I have the following example data set:

A B C D
foo 0 1 1
bar 0 0 1
baz 1 1 0

How could I extract the names of the columns that hold a 1 in each row and put them into a new column E, so that I get the following table:

A B C D E
foo 0 1 1 C, D
bar 0 0 1 D
baz 1 1 0 B, C

Note that there can be more than two 1s per row.

Asked By: Dan G


Answers:

You can use DataFrame.dot.

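# 0 * 'B, ' is '' and 1 * 'C, ' is 'C, ', so each row's dot product
# concatenates the names of its 1 columns; rstrip drops the trailing ', '
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')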
df

     A  B  C  D     E
0  foo  0  1  1  C, D
1  bar  0  0  1     D
2  baz  1  1  0  B, C

Inspired by jezrael’s answer in this post.
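A self-contained sketch of the dot approach, assuming the sample data from the question is rebuilt as below. Here the flag columns are named explicitly (the names `flag_cols` and `labels` are only for illustration) and the same list is used for both the selected columns and the labels, so the two cannot get out of step.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["foo", "bar", "baz"],
    "B": [0, 0, 1],
    "C": [1, 0, 1],
    "D": [1, 1, 0],
})

flag_cols = ["B", "C", "D"]
# Each label carries a trailing separator: "B, ", "C, ", "D, "
labels = pd.Index(flag_cols) + ", "
# 0 * label gives "" and 1 * label keeps it; the dot product concatenates them
df["E"] = df[flag_cols].dot(labels).str.rstrip(", ")
print(df)
```

Unlike `df.columns[1:]`, this does not depend on the flag columns being the last ones. A row with no 1s ends up with an empty string in E.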

Another approach is to convert each row to boolean and use it as a mask to select the matching column names.

import pandas as pd

cols = pd.Index(['B', 'C', 'D'])

# Each boolean row selects the column names holding a 1
df['E'] = df[cols].astype('bool').apply(lambda row: ", ".join(cols[row]), axis=1)
df

     A  B  C  D     E
0  foo  0  1  1  C, D
1  bar  0  0  1     D
2  baz  1  1  0  B, C
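The mask idea can also be sketched without `DataFrame.apply`, by converting the flag columns to a NumPy boolean array once and iterating over its rows. The extra row `qux` is hypothetical, added only to show a row with three 1s, which the question says can occur.

```python
import pandas as pd

# Sample data from the question plus a hypothetical row "qux" with three 1s
df = pd.DataFrame({
    "A": ["foo", "bar", "baz", "qux"],
    "B": [0, 0, 1, 1],
    "C": [1, 0, 1, 1],
    "D": [1, 1, 0, 1],
})

cols = pd.Index(["B", "C", "D"])
# One boolean row per DataFrame row: True wherever the flag is nonzero
mask = df[cols].to_numpy(dtype=bool)
# Each boolean row selects the matching column names, which are then joined
df["E"] = [", ".join(cols[row]) for row in mask]
print(df)
```

As with `astype('bool')` in the answer, any nonzero value counts as a 1.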
Answered By: wavetitan