Is there a way to use a lambda, or something quicker than a dictionary, to recode a pandas df column of unique categories into integer codes like 0, 1, 2, etc.?

Question:

Is there a quicker way, via lambda or otherwise, to recode every unique value in a pandas df column?

I am trying to recode this without a dictionary or for loop:

   df['Genres'].unique()

array(['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play',...

It goes on for a while – a lot of unique values!

I would like to recode to 0, 1, 2, 3, etc accordingly.

TIA for any advice

Asked By: thedataboi

Answers:

This can be done with factorize:

df['Encoding'] = pd.factorize(df['Values'])[0]

Let’s say I use your sample as input:

df = pd.DataFrame({'Values':['Art & Design', 'Art & Design;Pretend Play',
       'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
       'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
       'Comics', 'Comics;Creativity', 'Communication', 'Dating',
       'Education', 'Education;Creativity', 'Education;Education',
       'Education;Action & Adventure', 'Education;Pretend Play']})

Using the code proposed above, I get:

                             Values  Encoding
0                      Art & Design         0
1         Art & Design;Pretend Play         1
2           Art & Design;Creativity         2
3   Art & Design;Action & Adventure         3
4                                13         4
5                   Auto & Vehicles         5
6                            Beauty         6
7                 Books & Reference         7
8                          Business         8
9                            Comics         9
10                Comics;Creativity        10
11                    Communication        11
12                           Dating        12
13                        Education        13
14             Education;Creativity        14
15              Education;Education        15
16     Education;Action & Adventure        16
17           Education;Pretend Play        17
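
As a minimal sketch of what factorize hands back, assuming a small made-up frame (the five-row `Genres` data below is illustrative, not the asker's dataset): `pd.factorize` returns both the integer codes and the unique values, so the codes can be mapped back to labels later.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"Genres": ["Comics", "Beauty", "Comics", "Business", "Beauty"]})

# factorize returns a tuple: (integer codes, unique values in order of first appearance)
codes, uniques = pd.factorize(df["Genres"])
df["Encoding"] = codes

# The uniques let you recover the original labels from the codes
decoded = uniques.take(codes)

print(df)
print(list(uniques))
```

Codes are assigned in order of first appearance; passing `sort=True` to `pd.factorize` numbers the sorted unique values instead.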
Answered By: Celius Stingher

I think you want to assign each genre to its index in

df['Genres'].unique()

Then you can simply call this

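genres = df['Genres'].unique().tolist()  # unique() returns a NumPy array, which has no .index(); convert to a list once
df['recodes'] = df['Genres'].apply(lambda x: genres.index(x))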
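
Another option with no lambda at all, sketched here on a small made-up frame (the sample `Genres` values are an assumption for illustration), is to convert the column to the categorical dtype and read off its integer codes:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"Genres": ["Comics", "Beauty", "Comics", "Business", "Beauty"]})

# Converting to the categorical dtype builds the label -> code mapping in one pass
cat = df["Genres"].astype("category")
df["cat_codes"] = cat.cat.codes

# The categories hold the labels; position in this list is the code
categories = list(cat.cat.categories)

print(df)
print(categories)
```

Unlike the index into unique(), these codes follow the sorted order of the categories rather than order of first appearance, and missing values get the code -1.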
Answered By: Nuri Taş

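You can do something really dumb (literally) like

pd.get_dummies(df["Genres"]).idxmax(axis=1)

but note that idxmax returns the column label, so this just hands back the genre name rather than an integer code.

Go with the factorize answer above. Can't beat that one.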

Answered By: O.rka