How to use LabelEncoder on a DataFrame that is nested in another DataFrame

Question:

My dataset is:

https://www.kaggle.com/datasets/angeredsquid/brewers-friend-beer-recipes

I loaded it like this:

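import json

import pandas as pd

filename = 'recipes_full copy.json'

with open(filename, 'r') as f:
    try:
        json_data = json.load(f)
        print("The JSON file is valid")
    except ValueError as e:
        print("The JSON file is invalid:", e)
        raise  # json_data is undefined past this point, so stop here

# one row per value of the top-level JSON object (one recipe each)
df = pd.DataFrame(json_data.values())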

The result is:

[Screenshot of the resulting DataFrame not available]

Then I convert the fermentables and hops columns into DataFrames like this:

df['fermentables'] = df['fermentables'].apply(pd.DataFrame,columns=["kg","Malt","ppg", "°L Degree Lintner", "bill"])
df['hops'] = df['hops'].apply(pd.DataFrame,columns=["grams", "hop","hoptype", " % AA", "Type", "Time", "IBU", "Percentage"])
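A minimal, self-contained sketch of this conversion, assuming each recipe stores its fermentables as a list of rows in the column order above (the toy values below are hypothetical, not taken from the dataset). Series.apply forwards the extra columns= keyword to pd.DataFrame, so every cell of the column becomes its own small DataFrame:

```python
import pandas as pd

# Hypothetical toy data: each recipe's fermentables as a list of rows
# (kg, Malt, ppg, °L Degree Lintner, bill). The real layout is assumed similar.
ferm_cols = ["kg", "Malt", "ppg", "°L Degree Lintner", "bill"]
df = pd.DataFrame({"fermentables": [
    [[2.381, "American - Pale 2-Row", 37.0, 1.8, 44.7],
     [0.227, "American - Caramel / Crystal 20L", 35.0, 20.0, 4.3]],
    [[4.0, "Maris Otter", 38.0, 3.0, 100.0]],
]})

# Series.apply passes columns=ferm_cols through to pd.DataFrame for each cell,
# turning every list of rows into a nested DataFrame.
df["fermentables"] = df["fermentables"].apply(pd.DataFrame, columns=ferm_cols)

print(df.at[0, "fermentables"])
```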

The result looks like this:

[Screenshot of the DataFrame with nested fermentables and hops DataFrames not available]

Now I need to encode the malt names (the 'Malt' column) and the hop names (the 'hop' column) with LabelEncoder.

How can I do this inside the nested DataFrames, for all rows of the main DataFrame?

Asked By: Luis Valencia


Answers:

You can concatenate the relevant column from every nested DataFrame and fit a single LabelEncoder on the result, so the encoder sees every name across all recipes:

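import pandas as pd
from sklearn import preprocessing

# gather the 'Malt' column of every nested fermentables DataFrame into one Series
malts = pd.concat([x['Malt'] for x in df["fermentables"]])
le_ferm = preprocessing.LabelEncoder()
le_ferm.fit(malts)
print(le_ferm.classes_)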

Output (for the first 80 rows):

['American - Black Malt' 'American - Caramel / Crystal 10L'
 'American - Caramel / Crystal 120L' 'American - Caramel / Crystal 20L'
 'American - Caramel / Crystal 40L' 'American - Caramel / Crystal 60L'
 'American - Caramel / Crystal 80L' 'American - Carapils (Dextrine Malt)'
 'American - Chocolate' 'American - Munich - Dark 20L'
 'American - Munich - Light 10L' 'American - Pale 2-Row'
 'American - Pale 6-Row' 'American - Pale Ale' 'American - Pilsner'
 'American - Red Wheat' 'American - Roasted Barley' 'American - Rye'
 'American - Special Roast' 'American - Victory' 'American - Vienna'
 'American - Wheat' 'American - White Wheat' 'American Crystal 40'
 'American Munich' 'Belgian - Cara 20L' 'Belgian - Caramel Pils'
 'Belgian - Munich' 'Belgian - Pilsner' 'Belgian - Wheat'
 'Belgian Candi Sugar - Clear/Blond' 'Best Munich Dark' 'Best Pilsener'
 'Briess American 2-Row' 'Briess Caramel 10L' 'Brown Malt' 'Brown Sugar'
 'Canadian - Honey Malt' 'Canadian - Pale Wheat' 'Candi Sugar, Clear'
 'Cane Sugar' 'CaraPils' 'Carablonde -Chateau'
 'Caramel/Crystal Malt - 40L' 'Caramel/Crystal Malt -120L'
 'Castle Malting Abbey' 'Castle Malting Pilsen 2RP/2RS'
 'Castle Malting Wheat Blanc' 'Chocolate Malt' 'Corn Sugar (Dextrose)'
 'Corn Sugar - Dextrose' 'Corn Sugar - Dextrose ' 'Dark brown sugar'
 'Dark sugar' 'Dingemans - Caramunich' 'Dingemans - Pilsen'
 'Dry Malt Extract - Extra Light' 'Dry Malt Extract - Light'
 'Dry Malt Extract - Pilsen' 'Dry Malt Extract - Wheat' 'Farin, hvit'
 'Flaked Barley' 'Flaked Corn' 'Flaked Oats' 'Flaked Rice' 'Flaked Wheat'
 'German - Acidulated Malt' 'German - CaraAroma' 'German - CaraFoam'
 'German - CaraHell' 'German - Carapils' 'German - Melanoidin'
 'German - Munich Dark' 'German - Munich Light' 'German - Pale Ale'
 'German - Pale Wheat' 'German - Pilsner' 'German - Vienna'
 'German - Wheat Malt' 'Honey' 'Lactose (Milk Sugar)'
 'Liquid Malt Extract - Amber' 'Liquid Malt Extract - Light' 'Malted Oats'
 'Maris Otter' 'Munich Malt' 'Pilsner (2 Row) Ger' 'Pilsner Malt'
 'Rice Hulls' 'Rice Syrup Solids' 'Rolled Oats' 'Sucrose'
 'Thomas Fawcett Black Malt' 'Thomas Fawcett Chocolate Malt'
 'Thomas Fawcett Crystal Malt' 'Thomas Fawcett Crystal Malt II'
 'Thomas Fawcett Maris Otter Pale Malt '
 'Thomas Fawcett Pale Ale Malt (Maris Otter)' 'Torrified Wheat'
 'Turbinado' 'United Kingdom - Brown' 'United Kingdom - Cara Malt'
 'United Kingdom - Chocolate' 'United Kingdom - Crystal 45L'
 'United Kingdom - Crystal 90L' 'United Kingdom - Golden Naked Oats'
 'United Kingdom - Lager' 'United Kingdom - Maris Otter Pale'
 'United Kingdom - Munich' 'United Kingdom - Oat Malt'
 'United Kingdom - Pale 2-Row' 'United Kingdom - Pearl'
 'United Kingdom - Pilsen' 'United Kingdom - Roasted Barley'
 'Weyermann CaraMunich I' 'Weyermann CaraMunich II'
 'Weyermann Carafa Special II' 'Weyermann Dark Munich'
 'Weyermann Munich II' 'Weyermann Pale Wheat' 'Weyermann Pilsner Malt'
 'Weyermann Vienna' 'Wheat starch' 'pumpkin' 'wheat flour king aurthur ']
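The classes above include names that differ only by trailing whitespace, for example 'Corn Sugar - Dextrose' and 'Corn Sugar - Dextrose ', and LabelEncoder treats them as different classes. If they should count as the same malt, one option is to strip the names in every nested frame before fitting. A minimal sketch on hypothetical toy data (not the real dataset):

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data; the second recipe's malt name has a trailing space,
# like 'Corn Sugar - Dextrose ' in the real output.
ferm_cols = ["kg", "Malt", "ppg", "°L Degree Lintner", "bill"]
df = pd.DataFrame({"fermentables": [
    [[0.5, "Corn Sugar - Dextrose", 42.0, 0.0, 10.0],
     [4.5, "Maris Otter", 38.0, 3.0, 90.0]],
    [[0.3, "Corn Sugar - Dextrose ", 42.0, 0.0, 5.0]],
]})
df["fermentables"] = df["fermentables"].apply(pd.DataFrame, columns=ferm_cols)

# Strip surrounding whitespace in place, so names that differ only by
# spaces end up as one class.
for df_ferm in df["fermentables"]:
    df_ferm["Malt"] = df_ferm["Malt"].str.strip()

malts = pd.concat([x["Malt"] for x in df["fermentables"]], ignore_index=True)
le_ferm = preprocessing.LabelEncoder()
le_ferm.fit(malts)
print(le_ferm.classes_)  # two classes; the whitespace variant is gone
```

Because the names are stripped inside the nested frames themselves, a later transform on those same columns sees the cleaned names.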

Edit: once the label encoders are fitted, you can use transform to replace each nested column with its integer codes:

# overwrite each nested 'Malt' column in place with its integer codes
for df_ferm in df["fermentables"]:
    df_ferm['Malt'] = le_ferm.transform(df_ferm['Malt'])

print(df.at[0, 'fermentables'])

Output:

      kg  Malt   ppg  °L Degree Lintner  bill
0  2.381    11  37.0                1.8  44.7
1  0.907    22  40.0                2.8  17.0
2  0.907    12  35.0                1.8  17.0
3  0.227    62  40.0                0.5   4.3
4  0.227     3  35.0               20.0   4.3
5  0.227     7  33.0                1.8   4.3
6  0.113    61  32.0                2.2   2.1
7  0.340    79  42.0                2.0   6.4
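The question also asks about hop names, and the same two steps (fit on the concatenated column, then transform each nested frame) apply to the 'hop' column. A minimal sketch that wraps both steps: encode_nested is a hypothetical helper (not part of pandas or scikit-learn), and the toy hop data keeps only two of the eight hop columns for brevity.

```python
import pandas as pd
from sklearn import preprocessing


def encode_nested(outer, nested_col, name_col):
    """Hypothetical helper: fit one LabelEncoder on name_col across every
    nested DataFrame in outer[nested_col], replace that column in place
    with integer codes, and return the fitted encoder."""
    le = preprocessing.LabelEncoder()
    le.fit(pd.concat([inner[name_col] for inner in outer[nested_col]],
                     ignore_index=True))
    for inner in outer[nested_col]:
        inner[name_col] = le.transform(inner[name_col])
    return le


# Hypothetical toy data with only the 'grams' and 'hop' columns.
hop_cols = ["grams", "hop"]
df = pd.DataFrame({"hops": [
    [[28.0, "Cascade"], [14.0, "Centennial"]],
    [[30.0, "Cascade"]],
]})
df["hops"] = df["hops"].apply(pd.DataFrame, columns=hop_cols)

le_hop = encode_nested(df, "hops", "hop")
print(df.at[0, "hops"])
# inverse_transform maps the integer codes back to the original hop names
print(le_hop.inverse_transform(df.at[0, "hops"]["hop"]))
```

Calling encode_nested(df, "fermentables", "Malt") would handle the malt names the same way. Keeping the returned encoders is what makes it possible to decode the integers later.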

Answered By: Tranbi