How to use Label Encoder in a dataframe which is nested in another dataframe
Question:
My dataset is:
https://www.kaggle.com/datasets/angeredsquid/brewers-friend-beer-recipes
I loaded it like this:
import json
import pandas as pd

filename = 'recipes_full copy.json'
with open(filename, 'r') as f:
    try:
        json_data = json.load(f)
        print("The JSON file is valid")
    except ValueError as e:
        print("The JSON file is invalid:", e)

df = pd.DataFrame(json_data.values())
Result is:
Then I convert the fermentables and hops columns into dataframes like this:
df['fermentables'] = df['fermentables'].apply(pd.DataFrame,columns=["kg","Malt","ppg", "°L Degree Lintner", "bill"])
df['hops'] = df['hops'].apply(pd.DataFrame,columns=["grams", "hop","hoptype", " % AA", "Type", "Time", "IBU", "Percentage"])
and the result is like this:
Now I need to encode the malt names (the Malt column) and the hop names (the hop column) with LabelEncoder.
How can I do this inside the nested dataframes, for all rows of the main dataframe?
Answers:
You can concatenate all the desired columns and feed them to LabelEncoder:
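from sklearn import preprocessing

# Collect the 'Malt' column of every nested dataframe into one Series
malts = pd.concat([x['Malt'] for x in df["fermentables"]])
le_ferm = preprocessing.LabelEncoder()
le_ferm.fit(malts)
print(le_ferm.classes_)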
Output (for the first 80 rows):
['American - Black Malt' 'American - Caramel / Crystal 10L'
'American - Caramel / Crystal 120L' 'American - Caramel / Crystal 20L'
'American - Caramel / Crystal 40L' 'American - Caramel / Crystal 60L'
'American - Caramel / Crystal 80L' 'American - Carapils (Dextrine Malt)'
'American - Chocolate' 'American - Munich - Dark 20L'
'American - Munich - Light 10L' 'American - Pale 2-Row'
'American - Pale 6-Row' 'American - Pale Ale' 'American - Pilsner'
'American - Red Wheat' 'American - Roasted Barley' 'American - Rye'
'American - Special Roast' 'American - Victory' 'American - Vienna'
'American - Wheat' 'American - White Wheat' 'American Crystal 40'
'American Munich' 'Belgian - Cara 20L' 'Belgian - Caramel Pils'
'Belgian - Munich' 'Belgian - Pilsner' 'Belgian - Wheat'
'Belgian Candi Sugar - Clear/Blond' 'Best Munich Dark' 'Best Pilsener'
'Briess American 2-Row' 'Briess Caramel 10L' 'Brown Malt' 'Brown Sugar'
'Canadian - Honey Malt' 'Canadian - Pale Wheat' 'Candi Sugar, Clear'
'Cane Sugar' 'CaraPils' 'Carablonde -Chateau'
'Caramel/Crystal Malt - 40L' 'Caramel/Crystal Malt -120L'
'Castle Malting Abbey' 'Castle Malting Pilsen 2RP/2RS'
'Castle Malting Wheat Blanc' 'Chocolate Malt' 'Corn Sugar (Dextrose)'
'Corn Sugar - Dextrose' 'Corn Sugar - Dextrose ' 'Dark brown sugar'
'Dark sugar' 'Dingemans - Caramunich' 'Dingemans - Pilsen'
'Dry Malt Extract - Extra Light' 'Dry Malt Extract - Light'
'Dry Malt Extract - Pilsen' 'Dry Malt Extract - Wheat' 'Farin, hvit'
'Flaked Barley' 'Flaked Corn' 'Flaked Oats' 'Flaked Rice' 'Flaked Wheat'
'German - Acidulated Malt' 'German - CaraAroma' 'German - CaraFoam'
'German - CaraHell' 'German - Carapils' 'German - Melanoidin'
'German - Munich Dark' 'German - Munich Light' 'German - Pale Ale'
'German - Pale Wheat' 'German - Pilsner' 'German - Vienna'
'German - Wheat Malt' 'Honey' 'Lactose (Milk Sugar)'
'Liquid Malt Extract - Amber' 'Liquid Malt Extract - Light' 'Malted Oats'
'Maris Otter' 'Munich Malt' 'Pilsner (2 Row) Ger' 'Pilsner Malt'
'Rice Hulls' 'Rice Syrup Solids' 'Rolled Oats' 'Sucrose'
'Thomas Fawcett Black Malt' 'Thomas Fawcett Chocolate Malt'
'Thomas Fawcett Crystal Malt' 'Thomas Fawcett Crystal Malt II'
'Thomas Fawcett Maris Otter Pale Malt '
'Thomas Fawcett Pale Ale Malt (Maris Otter)' 'Torrified Wheat'
'Turbinado' 'United Kingdom - Brown' 'United Kingdom - Cara Malt'
'United Kingdom - Chocolate' 'United Kingdom - Crystal 45L'
'United Kingdom - Crystal 90L' 'United Kingdom - Golden Naked Oats'
'United Kingdom - Lager' 'United Kingdom - Maris Otter Pale'
'United Kingdom - Munich' 'United Kingdom - Oat Malt'
'United Kingdom - Pale 2-Row' 'United Kingdom - Pearl'
'United Kingdom - Pilsen' 'United Kingdom - Roasted Barley'
'Weyermann CaraMunich I' 'Weyermann CaraMunich II'
'Weyermann Carafa Special II' 'Weyermann Dark Munich'
'Weyermann Munich II' 'Weyermann Pale Wheat' 'Weyermann Pilsner Malt'
'Weyermann Vienna' 'Wheat starch' 'pumpkin' 'wheat flour king aurthur ']
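Note that the class list above contains entries that differ only by trailing whitespace, for example 'Corn Sugar - Dextrose' and 'Corn Sugar - Dextrose '. If those should share one label, one option is to strip whitespace before fitting. A minimal sketch on a toy Series (the values below are made up for illustration and are not read from the dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the concatenated 'Malt' column (illustrative values only).
malts = pd.Series(["Corn Sugar - Dextrose", "Corn Sugar - Dextrose ", "Honey"])

# Without cleaning, the trailing space produces two distinct classes.
raw_classes = LabelEncoder().fit(malts).classes_

# Stripping surrounding whitespace merges them into a single class.
clean_classes = LabelEncoder().fit(malts.str.strip()).classes_

print(len(raw_classes), len(clean_classes))
```

If you fit on stripped names, the same stripping has to be applied to each nested column before calling transform, otherwise the unstripped names will not be found among the fitted classes.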
Edit: once your label encoders have been fitted, you can transform the corresponding columns with transform:
for df_ferm in df["fermentables"]:
    df_ferm['Malt'] = le_ferm.transform(df_ferm['Malt'])
print(df.at[0, 'fermentables'])
Output:
      kg  Malt   ppg  °L Degree Lintner  bill
0  2.381    11  37.0                1.8  44.7
1  0.907    22  40.0                2.8  17.0
2  0.907    12  35.0                1.8  17.0
3  0.227    62  40.0                0.5   4.3
4  0.227     3  35.0               20.0   4.3
5  0.227     7  33.0                1.8   4.3
6  0.113    61  32.0                2.2   2.1
7  0.340    79  42.0                2.0   6.4
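The same pattern should carry over to the hop column of the nested hops dataframes. Below is a self-contained sketch on a tiny made-up frame, so the recipe names, hop names and the two-column layout are assumptions for illustration rather than the real dataset. It also shows inverse_transform, which maps the integer codes back to the original names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy outer frame (made-up data); each 'hops' cell is a list of [grams, hop] rows.
df = pd.DataFrame({
    "name": ["recipe A", "recipe B"],
    "hops": [
        [[28.0, "Cascade"], [14.0, "Saaz"]],
        [[20.0, "Cascade"]],
    ],
})

# Turn every cell into its own small DataFrame, mirroring the question's setup.
df["hops"] = df["hops"].apply(
    lambda rows: pd.DataFrame(rows, columns=["grams", "hop"])
)

# Fit one encoder on the hop names gathered from all nested frames.
le_hop = LabelEncoder()
le_hop.fit(pd.concat([x["hop"] for x in df["hops"]]))

# Replace the names with integer codes inside each nested frame.
for df_hop in df["hops"]:
    df_hop["hop"] = le_hop.transform(df_hop["hop"])

codes = df.at[0, "hops"]["hop"].tolist()
names = le_hop.inverse_transform(codes).tolist()
print(codes, names)  # [0, 1] ['Cascade', 'Saaz']
```

Fitting a single encoder across all rows keeps the codes consistent between recipes, so the same hop gets the same integer everywhere. Keep in mind that transform raises a ValueError for a name the encoder was not fitted on, so fit on all the data you intend to encode.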