Is there a way to use a lambda, or something quicker than a dictionary, to recode a pandas df column of unique categories into integer buckets like 0, 1, 2, etc.?
Question:
Is there a quicker way, via lambda or otherwise, to recode every unique value in a pandas df column?
I am trying to recode this without a dictionary or for loop:
df['Genres'].unique()
array(['Art & Design', 'Art & Design;Pretend Play',
'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
'Comics', 'Comics;Creativity', 'Communication', 'Dating',
'Education', 'Education;Creativity', 'Education;Education',
'Education;Action & Adventure', 'Education;Pretend Play',...
It goes on for a while – a lot of unique values!
I would like to recode to 0, 1, 2, 3, etc accordingly.
TIA for any advice
Answers:
This can be done with factorize:
df['Encoding'] = pd.factorize(df['Values'])[0]
Let’s say I use your sample as input:
df = pd.DataFrame({'Values':['Art & Design', 'Art & Design;Pretend Play',
'Art & Design;Creativity', 'Art & Design;Action & Adventure', 13,
'Auto & Vehicles', 'Beauty', 'Books & Reference', 'Business',
'Comics', 'Comics;Creativity', 'Communication', 'Dating',
'Education', 'Education;Creativity', 'Education;Education',
'Education;Action & Adventure', 'Education;Pretend Play']})
Using the code proposed above, I get:
Values Encoding
0 Art & Design 0
1 Art & Design;Pretend Play 1
2 Art & Design;Creativity 2
3 Art & Design;Action & Adventure 3
4 13 4
5 Auto & Vehicles 5
6 Beauty 6
7 Books & Reference 7
8 Business 8
9 Comics 9
10 Comics;Creativity 10
11 Communication 11
12 Dating 12
13 Education 13
14 Education;Creativity 14
15 Education;Education 15
16 Education;Action & Adventure 16
17 Education;Pretend Play 17
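The sample above has no repeated values, so it does not show that repeated values share a code. Here is a minimal sketch on a small made-up column (the data is an assumption, not from the question) that also keeps the second return value of factorize so the codes can be mapped back:

```python
import pandas as pd

# Hypothetical sample with repeated genres (not the asker's data).
df = pd.DataFrame(
    {"Genres": ["Art & Design", "Beauty", "Art & Design", "Comics", "Beauty"]}
)

# factorize returns (codes, uniques); codes follow order of first appearance.
codes, uniques = pd.factorize(df["Genres"])
df["Encoding"] = codes

# uniques lets you recover the original labels from the integer codes.
decoded = uniques[codes]

print(df)
print(list(uniques))
```

Equal values get the same integer, and indexing uniques with the codes gives the labels back.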
I think you want to assign each genre to its index in
df['Genres'].unique()
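Then you can simply call this (unique() returns a NumPy array, which has no .index method, so convert it to a list once first):
uniques = df['Genres'].unique().tolist()
df['recodes'] = df['Genres'].apply(lambda x: uniques.index(x))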
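As a quick check of the index-in-unique idea, here is a sketch on made-up data (an assumption, not the asker's column) comparing it with factorize. Both number the values in order of first appearance, so they should agree when there are no missing values. The lambda does a linear list search for every row, so it is likely to be slower on large columns.

```python
import pandas as pd

# Hypothetical sample column (not the asker's data).
df = pd.DataFrame({"Genres": ["Comics", "Art & Design", "Comics", "Beauty"]})

# Convert the NumPy array from unique() to a list so .index() is available.
uniques = df["Genres"].unique().tolist()
via_lambda = df["Genres"].apply(lambda x: uniques.index(x))

# factorize also assigns codes in order of first appearance.
via_factorize = pd.factorize(df["Genres"])[0]

print(via_lambda.tolist())
print(via_factorize.tolist())
```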
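You can do something really dumb (literally) like
pd.get_dummies(df["Genres"]).to_numpy().argmax(axis=1)
(Calling .idxmax(axis=1) on the dummies instead gives back the genre labels, not integer positions.)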
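To illustrate what the dummies route produces, here is a sketch on made-up, string-only data (an assumption). For strings, get_dummies orders its columns alphabetically, so the positions differ from factorize's order-of-appearance codes and instead match the category codes of the column.

```python
import pandas as pd

# Hypothetical string-only sample (not the asker's data).
s = pd.Series(["Comics", "Art & Design", "Comics", "Beauty"], name="Genres")

# One indicator column per genre; dtype=int avoids relying on the default dtype.
dummies = pd.get_dummies(s, dtype=int)

labels = dummies.idxmax(axis=1)                  # column labels, i.e. the genres again
positions = dummies.to_numpy().argmax(axis=1)    # integer column positions

# The same integers without building the dummy matrix.
cat_codes = s.astype("category").cat.codes

print(labels.tolist())
print(positions.tolist())
print(cat_codes.tolist())
```

The dummy matrix has one column per unique value, so this costs far more memory than factorize for a column with many categories.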
Go with the factorization one above. Can’t beat that one.