Ordinal encoding in Pandas

Question:

Is there a way to have pandas.get_dummies output the numerical representation in one column rather than a separate column for each option?

Concretely, pandas.get_dummies currently gives me a column for every option:

Size    Size_Big  Size_Medium  Size_Small
Big     1         0            0
Medium  0         1            0
Small   0         0            1

But I’m looking for output more like the following:

Size    Size_Numerical
Big     1
Medium  2
Small   3
Asked By: mikelowry


Answers:

If using pandas isn’t an absolute requirement, scikit-learn has an OrdinalEncoder that does this. Note that its codes are zero-based, so add 1 if you want the numbering to start at 1.
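
As a rough sketch of how that could look (the sample DataFrame and the explicit category order are assumptions for illustration, not part of the answer): passing `categories` pins the order to Big < Medium < Small, since otherwise the encoder sorts whatever categories it finds.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data using the question's column name.
df = pd.DataFrame({"Size": ["Big", "Medium", "Small", "Medium"]})

# Pin the category order explicitly; by default the encoder
# sorts the categories it finds in the data.
encoder = OrdinalEncoder(categories=[["Big", "Medium", "Small"]])

# fit_transform expects 2-D input and returns zero-based floats,
# so flatten, cast to int, and add 1 to start the numbering at 1.
codes = encoder.fit_transform(df[["Size"]]).ravel().astype(int) + 1
df["Size_Numerical"] = codes

print(df)
```

The double brackets in `df[["Size"]]` matter here: the encoder works on a 2-D table of features, not on a single Series.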

Answered By: Mouse

I think OneHotEncoder has the same issue: it expands each label into its own column. To get a single numeric column, you can use LabelEncoder:

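import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Sizes': ['Small', 'Medium', 'Large']})
le = preprocessing.LabelEncoder()
le.fit(df['Sizes'])
# Codes are zero-based and follow sorted label order, hence the + 1
df['Category'] = le.transform(df['Sizes']) + 1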

Outputs:

    Sizes  Category
0   Small         3
1  Medium         2
2   Large         1
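
One thing worth checking with this approach is which label got which number, because LabelEncoder assigns codes in sorted (alphabetical) order rather than in any size order. A minimal sketch, assuming the same three-row sample as above, that reads the mapping back from the fitted encoder's `classes_` attribute:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumed sample data matching the output shown in the answer.
df = pd.DataFrame({"Sizes": ["Small", "Medium", "Large"]})

le = LabelEncoder()
df["Category"] = le.fit_transform(df["Sizes"]) + 1

# classes_ holds the labels in sorted order; the label at
# position i received code i, which is i + 1 after the shift.
mapping = {label: i + 1 for i, label in enumerate(le.classes_)}

print(mapping)
```

With these labels the alphabetical order happens to be Large, Medium, Small; other label names may sort into an order that has nothing to do with their meaning.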
Answered By: Celius Stingher

You don’t want dummies, you want factors/categories.

Use pandas.factorize:

df['Size_Numerical'] = pd.factorize(df['Size'])[0] + 1

output:

     Size  Size_Numerical
0     Big               1
1  Medium               2
2   Small               3
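
Note that pd.factorize numbers values in order of first appearance, so the result above depends on the rows already being in the order Big, Medium, Small. A small sketch with shuffled, made-up rows, comparing the default with `sort=True` (the column names `By_Appearance` and `By_Sorted` are only for illustration):

```python
import pandas as pd

# Hypothetical data where the rows are not in size order.
df = pd.DataFrame({"Size": ["Small", "Big", "Medium", "Big"]})

# Default: codes follow the order in which values first appear.
codes, uniques = pd.factorize(df["Size"])
df["By_Appearance"] = codes + 1

# sort=True: codes follow the sorted order of the unique values.
sorted_codes, sorted_uniques = pd.factorize(df["Size"], sort=True)
df["By_Sorted"] = sorted_codes + 1

print(df)
```

Here `sort=True` gives Big, Medium, Small only because those names happen to sort alphabetically in that order.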
Answered By: mozway

With the category dtype, you could do:

(
    dataf
    .astype({"Size": "category"})
    .assign(
        Size_Numerical=lambda d: d["Size"].cat.rename_categories(
            {"Big": 1, "Medium": 2, "Small": 3}
        )
    )
)

Tested with this data:

import pandas as pd

dataf = pd.DataFrame({'Size':["Big", "Medium", "Small","Medium"]})
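
With rename_categories, the new column is still a categorical column whose categories are the numbers 1, 2, 3. If a plain integer column is wanted, one option is to cast it afterwards. A minimal sketch (the trailing `.astype(int)` and the `result` name are additions for illustration, not part of the answer):

```python
import pandas as pd

dataf = pd.DataFrame({"Size": ["Big", "Medium", "Small", "Medium"]})

result = (
    dataf
    .astype({"Size": "category"})
    .assign(
        # Rename each category to its number, then cast the
        # categorical column to a regular integer column.
        Size_Numerical=lambda d: d["Size"]
        .cat.rename_categories({"Big": 1, "Medium": 2, "Small": 3})
        .astype(int)
    )
)

print(result.dtypes)
```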


Answered By: Prayson W. Daniel

You can convert it to the Categorical type and get codes:

pd.Categorical(['A', 'B', 'C', 'A', 'C']).codes

Output:

array([0, 1, 2, 0, 2], dtype=int8)
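
Applied to the question's data, this might look like the sketch below. The sample rows and the `size_order` list are assumptions for illustration; passing `categories` fixes the order explicitly instead of relying on sorting, and adding 1 shifts the zero-based codes to start at 1.

```python
import pandas as pd

# Hypothetical data; rows are deliberately not in size order.
df = pd.DataFrame({"Size": ["Medium", "Small", "Big", "Medium"]})

# The desired order, smallest code first.
size_order = ["Big", "Medium", "Small"]

cat = pd.Categorical(df["Size"], categories=size_order, ordered=True)

# codes are zero-based positions in size_order; shift to start at 1.
df["Size_Numerical"] = cat.codes + 1

print(df)
```

Values that are not listed in `size_order` get the code -1, so they would show up as 0 after the shift.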
Answered By: Mykola Zotko