One-Hot Encoding giving NaN values in Python

Question:

I have a classification case study where I am using a Logistic Regression model. I want to use One-Hot Encoding to convert the values of my categorical column (SalStat) into 0 and 1. This is my code:

data2["SalStat"] = data2["SalStat"].map({"less than or equal to 50,000":0, "greater than 50,000":1})
print(data2["SalStat"])

The code above does not convert the values to 0 and 1; instead it converts them to NaN!
Where am I going wrong?

PS: The SalStat column classifies rows as "less than or equal to 50,000" or "greater than 50,000"

Asked By: Sage


Answers:

I think the issue might be with the mapper you have defined.
What if there is some whitespace in the text?
Have a look at this answer
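
A minimal sketch of the whitespace hypothesis, using a small hypothetical data2 whose values start with a space (the real data set is not shown in the question). It reproduces the NaN result, uses repr() to make the hidden characters visible, and strips them before mapping:

```python
import pandas as pd

# Hypothetical sample data: each value carries a leading space.
data2 = pd.DataFrame({"SalStat": [" less than or equal to 50,000",
                                  " greater than 50,000",
                                  " less than or equal to 50,000"]})

mapping = {"less than or equal to 50,000": 0, "greater than 50,000": 1}

# The keys lack the leading space, so nothing matches and map() yields NaN.
print(data2["SalStat"].map(mapping).isna().all())  # True

# repr() shows the leading space that print() hides.
print([repr(v) for v in data2["SalStat"].unique()])

# Strip surrounding whitespace first, then the keys match.
data2["SalStat"] = data2["SalStat"].str.strip().map(mapping)
print(data2["SalStat"].tolist())  # [0, 1, 0]
```

Series.str.strip() removes whitespace at both ends, so the mapping keys can stay exactly as written in the question.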

Answered By: Akshay Bahadur

I suspect the NaN values come from the values in the SalStat column not matching the keys you typed.
It is safer to take the values from the column itself instead of typing them manually.

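# unique() returns values in order of first appearance, so check
# which label ends up as 0 and which as 1
val_1 = data2["SalStat"].unique()[0]
val_2 = data2["SalStat"].unique()[1]

data2["SalStat"] = data2["SalStat"].map({val_1: 0, val_2: 1})
print(data2["SalStat"])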
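
Because unique() returns values in order of first appearance, a positional mapping can silently swap the two labels. A hedged alternative, sketched on hypothetical data, keys the mapping on the raw column values (so any whitespace is carried along automatically) but picks 0 or 1 from the label text:

```python
import pandas as pd

# Hypothetical data: "greater than" appears first, with a leading space.
data2 = pd.DataFrame({"SalStat": [" greater than 50,000",
                                  " less than or equal to 50,000",
                                  " greater than 50,000"]})

# Here unique()[0] would be " greater than 50,000", so a positional
# mapping would assign it 0; decide from the text instead.
values = data2["SalStat"].unique()
mapping = {v: int("greater" in v) for v in values}

data2["SalStat"] = data2["SalStat"].map(mapping)
print(data2["SalStat"].tolist())  # [1, 0, 1]
```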
Answered By: Tyr

For One-Hot Encoding, I suggest you try pd.get_dummies(data2['SalStat']). It is a pandas function that performs One-Hot Encoding on categorical features; .get_dummies() is essentially a shorthand for One-Hot Encoding. If you would like to do OHE the longer way, you could use scikit-learn's OneHotEncoder:

from sklearn.preprocessing import OneHotEncoder

OneHotEncoder is a preprocessing class that transforms categorical classes into binary features. During One-Hot Encoding, each categorical class becomes its own feature of binary data, with ones (1) indicating the presence of the class and zeros (0) indicating otherwise. For example:

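from sklearn.preprocessing import OneHotEncoder
import pandas as pd

dataframe = pd.DataFrame({'Name': ['Jack', 'Mary', 'Sheldon']})

print(dataframe)
print(' ')

# sparse_output=False returns a dense array (this parameter was
# called `sparse` before scikit-learn 1.2)
technique = OneHotEncoder(sparse_output=False,
                          drop=None,
                          categories='auto',
                          handle_unknown='error')

# categories_ holds one array of labels per input column; use the
# labels of the single 'Name' column as the new column names
new_dataframe = pd.DataFrame(technique.fit_transform(dataframe),
                             columns=technique.categories_[0])

print(new_dataframe)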

original dataframe:

      Name
0     Jack
1     Mary
2  Sheldon

new dataframe:

   Jack  Mary  Sheldon
0   1.0   0.0      0.0
1   0.0   1.0      0.0
2   0.0   0.0      1.0
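
For comparison, here is a sketch of the pd.get_dummies route on a hypothetical three-row SalStat column. dtype=int is passed because recent pandas versions return boolean columns by default. get_dummies sorts the categories, so the "greater than 50,000" column is selected by name rather than with drop_first=True, which would keep the opposite label:

```python
import pandas as pd

# Hypothetical SalStat values, already stripped of whitespace.
data2 = pd.DataFrame({"SalStat": ["less than or equal to 50,000",
                                  "greater than 50,000",
                                  "less than or equal to 50,000"]})

# One indicator column per category, as 0/1 integers.
dummies = pd.get_dummies(data2["SalStat"], dtype=int)
print(dummies)

# For a binary target, one column is enough: 1 means "greater than 50,000".
target = dummies["greater than 50,000"]
print(target.tolist())  # [0, 1, 0]
```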
Answered By: Destroyer-byte

While mapping the data, please check whether the values in the column "SalStat" are exactly the same as the ones you give in the mapping (including whitespace). In the given data set, there is a space at the beginning of the values "less than or equal to 50,000" and "greater than 50,000". That is why you are getting NaN values while mapping: the values don't match.

Try this:
data2['SalStat'] = data2['SalStat'].map({' less than or equal to 50,000': 0, ' greater than 50,000': 1})
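
If other text columns in the data set carry the same leading space, one option is to strip every string column once, so later mappings can use the labels without the space. A minimal sketch; the columns JobType and age below are hypothetical:

```python
import pandas as pd

# Hypothetical frame: text columns carry a leading space, "age" is numeric.
data2 = pd.DataFrame({
    "JobType": [" Private", " State-gov"],
    "SalStat": [" less than or equal to 50,000", " greater than 50,000"],
    "age": [39, 52],
})

# Strip surrounding whitespace from every string column; numeric columns are skipped.
for col in data2.columns:
    if pd.api.types.is_string_dtype(data2[col]):
        data2[col] = data2[col].str.strip()

# The mapping keys no longer need the leading space.
data2["SalStat"] = data2["SalStat"].map({"less than or equal to 50,000": 0,
                                         "greater than 50,000": 1})
print(data2)
```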

Answered By: Ikleel