Convert Categorical features to Numerical

Question:

I have a lot of categorical columns and want to convert the values in those columns to numerical values so that I can apply an ML model.

My data looks something like this:

Column 1: good/bad/poor/not reported
Column 2: red/amber/green
Column 3: 1/2/3
Column 4: Yes/No

I have already assigned the numerical values 1, 2, 3, 4 to good, bad, poor, not reported in column 1.

Can I now give the same numerical values (1, 2, 3) to red, green, amber in column 2, and do likewise for the other columns, or will that confuse the model when I implement it?

Asked By: Pranav167

||

Answers:

The colour values you mention are nominal: there is no ranking or order among them. If you assign 1, 2, 3, etc., the data can be misrepresented as coming from a scale.

To avoid this, you can transform them with one-hot encoding. This encodes a multi-valued categorical field as one binary indicator per value:

red = 100
amber = 010
green = 001

You can use the OneHotEncoder class from sklearn.preprocessing:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
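
As a rough illustration, the encoding above could be produced as in the sketch below. The DataFrame and the column name "colour" are made up for the example, and the category order is fixed by hand only so that the output columns line up with red/amber/green as listed above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data; the column name "colour" is an assumption.
df = pd.DataFrame({"colour": ["red", "amber", "green", "red"]})

# Fix the category order so the output columns are red, amber, green.
# Without `categories`, the encoder sorts the values alphabetically.
encoder = OneHotEncoder(categories=[["red", "amber", "green"]])

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = encoder.fit_transform(df[["colour"]]).toarray()

print(encoded)
```

Each row contains a single 1 in the position of its colour, so no colour is treated as larger or smaller than another.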

Answered By: ai_10111

You can do this for the rated (ordered) columns by using df[colname].map({}) or LabelEncoder().
These convert each category to a number, which implies an ordering and a distance between categories: if poor is 1 and good is 3, there is a difference between them, and you want the model to know it. Colours are different: there is no preference among them, and green is no different from blue, so it is better not to use the same method there and to use get_dummies in pandas instead.
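
A minimal sketch of that split, using the columns from the question. The column names, the sample rows, and the ranking poor < bad < good are assumptions for illustration. The sketch uses map rather than LabelEncoder because LabelEncoder assigns integers in sorted order of the labels, which need not match the intended ranking. "not reported" is left out of the mapping here, so it becomes NaN; how to treat it is a separate choice.

```python
import pandas as pd

# Hypothetical sample data modelled on the columns in the question.
df = pd.DataFrame({
    "rating": ["good", "bad", "poor", "not reported"],
    "colour": ["red", "amber", "green", "red"],
    "flag": ["Yes", "No", "Yes", "No"],
})

# Ordered column: map each label to its rank (assumed order: poor < bad < good).
# Labels missing from the dict, such as "not reported", become NaN.
rating_order = {"poor": 1, "bad": 2, "good": 3}
df["rating_num"] = df["rating"].map(rating_order)

# Binary column: a simple 0/1 mapping is enough.
df["flag_num"] = df["flag"].map({"No": 0, "Yes": 1})

# Unordered column: one indicator column per colour.
dummies = pd.get_dummies(df["colour"], prefix="colour")

result = pd.concat([df[["rating_num", "flag_num"]], dummies], axis=1)
print(result)
```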

Answered By: atena karimi