Ordinal encoding in Pandas

Question

Is there a way to have pandas.get_dummies output the numerical representation in one column rather than a separate column for each option?

Concretely, pandas.get_dummies currently gives me a separate column for every option:

Size	Size_Big	Size_Medium	Size_Small
Big	1	0	0
Medium	0	1	0
Small	0	0	1

But I’m looking for output more like the following:

Size	Size_Numerical
Big	1
Medium	2
Small	3

Asked By: mikelowry

Answer 1

If using Pandas isn’t an absolute requirement, scikit-learn has an OrdinalEncoder (sklearn.preprocessing.OrdinalEncoder) that does exactly that. Note that its codes start at 0 and are returned as floats.
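A minimal sketch of that approach, assuming the column is named Size as in the question and that Big, Medium, Small is the intended order (the sample frame is made up for illustration):

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Illustrative data, not from the original question
df = pd.DataFrame({"Size": ["Big", "Medium", "Small", "Medium"]})

# Pass the category order explicitly; without `categories`, the encoder
# sorts the unique values itself.
encoder = OrdinalEncoder(categories=[["Big", "Medium", "Small"]])

# fit_transform expects 2-D input and returns zero-based floats,
# so select with double brackets, flatten, cast to int, and shift by one.
codes = encoder.fit_transform(df[["Size"]])
df["Size_Numerical"] = codes.ravel().astype(int) + 1
```

Passing the order explicitly matters mostly when the desired ranking is not alphabetical, for example Small, Medium, Big.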

Answered By: Mouse

Answer 2

I think OneHotEncoder has the same issue: it expands the column into one new column per label. Use LabelEncoder instead, which assigns zero-based codes in sorted label order (which is why Large comes out as 1 below):

from sklearn import preprocessing

# Learn the labels, then map each one to its integer code
le = preprocessing.LabelEncoder()
le.fit(df['Sizes'])
df['Category'] = le.transform(df['Sizes']) + 1  # shift codes to start at 1

Outputs:

    Sizes  Category
0   Small         3
1  Medium         2
2   Large         1

Answered By: Celius Stingher

Answer 3

You don’t want dummies, you want factors/categories.

Use pandas.factorize, which numbers the values in order of first appearance, starting at 0 (hence the + 1):

df['Size_Numerical'] = pd.factorize(df['Size'])[0] + 1

output:

     Size  Size_Numerical
0     Big               1
1  Medium               2
2   Small               3
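Because the default numbering follows first appearance, the result depends on row order. A small sketch of the difference, using made-up data where Small comes first; sort=True is a real parameter of pd.factorize that numbers the sorted unique values instead:

```python
import pandas as pd

# Illustrative data in a different row order than the question's
df = pd.DataFrame({"Size": ["Small", "Big", "Medium", "Big"]})

# Default: codes follow the order in which values first appear
by_appearance, _ = pd.factorize(df["Size"])

# sort=True: codes follow the sorted order of the unique values
by_sorted, uniques = pd.factorize(df["Size"], sort=True)

# Shift the zero-based codes so numbering starts at 1
df["Size_Numerical"] = by_sorted + 1
```

Here by_appearance would give Small the code 0, while by_sorted gives Big the code 0 regardless of where it appears.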

Answered By: mozway

Answer 4

With the category dtype, you could do:

(
    dataf
    .astype({"Size": "category"})
    .assign(
        Size_Numerical=lambda d: d["Size"].cat.rename_categories(
            {"Big": 1, "Medium": 2, "Small": 3}
        )
    )
)

Tested with this data:

import pandas as pd


dataf = pd.DataFrame({'Size':["Big", "Medium", "Small","Medium"]})

Answered By: Prayson W. Daniel

Answer 5

You can convert it to the Categorical type and get codes:

pd.Categorical(['A', 'B', 'C', 'A', 'C']).codes

Output:

array([0, 1, 2, 0, 2], dtype=int8)
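By default the categories are sorted, so the codes above follow alphabetical order. If the intended ranking is different, the order can be declared explicitly. A minimal sketch, assuming a Size column as in the question (the data and the name size_type are illustrative):

```python
import pandas as pd

# Illustrative data, not from the original question
df = pd.DataFrame({"Size": ["Small", "Big", "Medium", "Big"]})

# Declare the intended order explicitly instead of relying on sorting
size_type = pd.CategoricalDtype(categories=["Big", "Medium", "Small"], ordered=True)

# .cat.codes is the zero-based position of each value in `categories`;
# a value not listed there would get the code -1.
df["Size_Numerical"] = df["Size"].astype(size_type).cat.codes + 1
```

Since unknown values become -1 before the shift, they would show up as 0 here, which makes them easy to spot.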

Answered By: Mykola Zotko
