PANDAS: How to swap columns with one hot encoded values

Question:

So my pandas dataframe looks like this:

          category1  category2  ...  category6  category7
filename                        ...
0.wav             5        1.0  ...        NaN        NaN
1.wav             8        1.0  ...        NaN        NaN
2.wav             5        1.0  ...        NaN        NaN

I’ve set the filename column as my index. I now want the values themselves to become my new columns. The values in every column are numbers from 0 to 12 or NaN. Instead of category1…7, I want columns 0 to 12, with these values one-hot encoded. So for file 0.wav I would like a 1 in column 5 and a 1 in column 1, and 0 everywhere else. Like this:

          0  1  2  ...  5  ...  12
filename
0.wav     0  1  0  ...  1  ...   0
1.wav     0  1  0  ...  0  ...   0
2.wav     0  1  0  ...  1  ...   0

I’ve tried using pandas.get_dummies and converting my values from int or float to strings, since get_dummies does not encode numeric DataFrame columns by default. However, I don’t know how to change the columns the way I want.

Asked By: Compil3


Answers:

You can use get_dummies with groupby.max and reindex:

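import pandas as pd

out = (pd.get_dummies(df.stack(), dtype=int)        # dtype=int gives 0/1; newer pandas defaults to bool
         .groupby(level=0).max()                    # 1 if the value occurs anywhere in that file's row
         .reindex(columns=range(13), fill_value=0)  # make sure all columns 0..12 exist
       )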

Output:

           0   1   2   3   4   5   6   7   8   9  10  11  12
filename                                                    
0.wav      0   1   0   0   0   1   0   0   0   0   0   0   0
1.wav      0   1   0   0   0   0   0   0   1   0   0   0   0
2.wav      0   1   0   0   0   1   0   0   0   0   0   0   0
Answered By: mozway
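
For anyone who wants to run this end to end, here is a minimal sketch of the same idea broken into steps. The sample frame is an assumption: it uses only three hypothetical columns (category1 to category3) in place of the question's seven, and the intermediate names (long, dummies, per_file) are made up for illustration. The extra dropna() and astype(int) calls are also additions, intended to keep the column labels as integers regardless of how stack() treats NaN in a given pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's category1..category7.
df = pd.DataFrame(
    {
        "category1": [5, 8, 5],
        "category2": [1.0, 1.0, 1.0],
        "category3": [np.nan, np.nan, np.nan],
    },
    index=pd.Index(["0.wav", "1.wav", "2.wav"], name="filename"),
)

# 1. Put every cell into one long Series keyed by (filename, original column),
#    drop the missing entries, and cast the remaining values to int.
long = df.stack().dropna().astype(int)

# 2. One indicator column per distinct value; dtype=int yields 0/1 instead of bool.
dummies = pd.get_dummies(long, dtype=int)

# 3. Collapse the several rows of each file into one: max() acts as a logical OR.
per_file = dummies.groupby(level=0).max()

# 4. Add the values that never occur, so the columns are exactly 0..12.
out = per_file.reindex(columns=range(13), fill_value=0)

print(out)
```

Taking the max per file works because each indicator is either 0 or 1, so a column ends up as 1 if the value appeared in any of that file's original category columns.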