PANDAS: How to swap columns with one hot encoded values
Question:
So my pandas dataframe looks like this:
          category1  category2  ...  category6  category7
filename
0.wav             5        1.0  ...        NaN        NaN
1.wav             8        1.0  ...        NaN        NaN
2.wav             5        1.0  ...        NaN        NaN
I’ve set the filename column as my index. I now want to have these values as my new columns. The values in every column are numbers from 0 to 12 or NaN. I want columns labeled 0 to 12 instead of category1…7, with these values one-hot encoded. So for file 0.wav I would like a 1 in column 5 and a 1 in column 1, and 0 everywhere else. Like this:
          0  1  2  ...  5  ...  12
filename
0.wav     0  1  0  ...  1  ...   0
1.wav     0  1  0  ...  0  ...   0
2.wav     0  1  0  ...  1  ...   0
I’ve tried using pandas.get_dummies and converting my values from int or float to strings, because, as I understand it, get_dummies requires object values rather than numbers. However, I don’t know how to change the columns the way I want.
Answers:
You can use get_dummies with groupby.max and reindex:
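out = (pd.get_dummies(df.stack(), dtype=int)        # dtype=int gives 0/1 instead of booleans
         .groupby(level=0).max()                    # merge the rows of each filename
         .reindex(columns=range(13), fill_value=0)  # ensure every column 0..12 exists
      )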
Output:
          0  1  2  3  4  5  6  7  8  9  10  11  12
filename
0.wav     0  1  0  0  0  1  0  0  0  0   0   0   0
1.wav     0  1  0  0  0  0  0  0  1  0   0   0   0
2.wav     0  1  0  0  0  1  0  0  0  0   0   0   0
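
To try this end to end, here is a minimal self-contained sketch. The sample frame is made up to resemble the question (only three of the category columns, with hypothetical values), and the explicit dropna, the cast to int and the index=df.index argument are additions of this sketch rather than part of the answer above. They are meant to keep the result the same across pandas versions and to keep files whose values are all NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame.
df = pd.DataFrame(
    {
        "category1": [5, 8, 5],
        "category2": [1.0, 1.0, 1.0],
        "category3": [np.nan, np.nan, np.nan],
    },
    index=pd.Index(["0.wav", "1.wav", "2.wav"], name="filename"),
)

# Long format: one row per (filename, category) pair.
# Drop NaN explicitly and cast to int so the dummy columns are labeled 1, 5, 8.
stacked = df.stack().dropna().astype(int)

out = (
    pd.get_dummies(stacked, dtype=int)  # one indicator column per distinct value
    .groupby(level=0).max()             # collapse back to one row per filename
    # Add the values that never occur as all-zero columns, and restore any
    # file that had only NaN values (assumption: such files should be kept).
    .reindex(index=df.index, columns=range(13), fill_value=0)
)

print(out)
```

Taking the max per filename means a value that appears in several category columns of the same file still yields a single 1; use sum instead of max if you need counts.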