Mapping boolean columns to a categorical equivalent in another column

Question:

I have the following dataframe:

import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'A': [1, 0, 0, 0, 0],
        'B': [0, 0, 1, 0, 0],
        'C': [0, 0, 0, 0, 1],
        'D': [0, 1, 0, 0, 0],
        'E': [0, 0, 0, 1, 0]}

df = pd.DataFrame(data)

I want to create a new column, class, that is 0 if A is true (A=1), 1 if B is true (B=1), 2 if C is true, and so on.

Expected output:

   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
Asked By: arilwan


Answers:

# Weight each indicator by its class code: 0*A + 1*B + 2*C + 3*D + 4*E
df['class'] = df.apply(lambda x: x.B + x.C*2 + x.D*3 + x.E*4, axis=1)
print(df)

Prints:

   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
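The lambda above computes the weighted sum 0*A + 1*B + 2*C + 3*D + 4*E one row at a time. A minimal vectorized sketch of the same arithmetic replaces `apply` with `DataFrame.dot`, assuming the indicator columns are exactly A to E in that order; the `cols` list is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'A': [1, 0, 0, 0, 0],
        'B': [0, 0, 1, 0, 0],
        'C': [0, 0, 0, 0, 1],
        'D': [0, 1, 0, 0, 0],
        'E': [0, 0, 0, 1, 0]}
df = pd.DataFrame(data)

# Indicator columns, in the order that defines the class codes (A -> 0, B -> 1, ...)
cols = ['A', 'B', 'C', 'D', 'E']

# Matrix-vector product: each row's indicators times the weights [0, 1, 2, 3, 4],
# which is the same weighted sum the apply() lambda builds by hand.
df['class'] = df[cols].dot(np.arange(len(cols)))
print(df)
```

Because the weights come from `np.arange(len(cols))`, supporting another indicator only means extending `cols`, whereas the lambda has to be edited by hand. Like the original, this assumes each row holds exactly one 1; a row with two 1s would silently produce a sum that is not a valid class.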
Answered By: Алексей Р

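You can use np.nonzero, which for a 2-D array returns a tuple of two arrays (the row indices and the column indices of the non-zero elements), and take the second one. This works because every row contains exactly one 1, so there is exactly one column index per row.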

import numpy as np

df['class'] = np.nonzero(df.iloc[:, 1:].to_numpy())[1]

print(df)

   id  A  B  C  D  E  class
0   1  1  0  0  0  0      0
1   2  0  0  0  1  0      3
2   3  0  1  0  0  0      1
3   4  0  0  0  0  1      4
4   5  0  0  1  0  0      2
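To make the "take element [1]" step concrete, here is a small sketch that unpacks both arrays `np.nonzero` returns for the indicator block. The guard on `np.count_nonzero` is an illustrative addition, not part of the answer: it checks the one-1-per-row assumption before assigning, since otherwise the column indices would not line up with the DataFrame rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'A': [1, 0, 0, 0, 0],
                   'B': [0, 0, 1, 0, 0],
                   'C': [0, 0, 0, 0, 1],
                   'D': [0, 1, 0, 0, 0],
                   'E': [0, 0, 0, 1, 0]})

onehot = df.iloc[:, 1:].to_numpy()  # drop 'id', keep the A..E indicator block

# For a 2-D array, np.nonzero returns (row_indices, column_indices),
# listed in row-major order.
row_idx, col_idx = np.nonzero(onehot)
print(row_idx)  # [0 1 2 3 4]
print(col_idx)  # [0 3 1 4 2]

# col_idx only lines up with the DataFrame rows if every row has exactly
# one non-zero entry, so verify that before assigning it as a column.
if not np.all(np.count_nonzero(onehot, axis=1) == 1):
    raise ValueError("each row must contain exactly one non-zero indicator")
df['class'] = col_idx
```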

Or use np.where with a single condition argument, which accepts the DataFrame directly and so avoids the explicit to_numpy() call:

df['class'] = np.where(df.iloc[:, 1:].eq(1))[1]
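With only a condition argument, `np.where(cond)` behaves like `np.asarray(cond).nonzero()`, so it has the same one-1-per-row requirement. As a swapped-in alternative (not from either answer), `argmax` along axis 1 returns exactly one position per row, so the result length always matches the DataFrame. The trade-off is that an all-zero row silently maps to 0 and a row with several 1s maps to the first one, so it exchanges a length error for a quiet wrong answer on malformed data. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'A': [1, 0, 0, 0, 0],
                   'B': [0, 0, 1, 0, 0],
                   'C': [0, 0, 0, 0, 1],
                   'D': [0, 1, 0, 0, 0],
                   'E': [0, 0, 0, 1, 0]})

indicators = df.iloc[:, 1:]  # the A..E block

# Single-argument np.where: element [1] holds the column positions of the 1s.
via_where = np.where(indicators.eq(1))[1]

# argmax along each row: position of the first maximum, one value per row.
via_argmax = indicators.to_numpy().argmax(axis=1)

df['class'] = via_argmax
print(via_where.tolist() == via_argmax.tolist())  # True
```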
Answered By: ouroboros1