When saving a list of LabelEncoders the classes_ get overwritten by the last LabelEncoder

Question

I'm trying to save a dict of LabelEncoders (LE) for use at inference time. The code below fits and applies an LE to each column and stores it in a dict (label_object), which is then saved with joblib.dump():

for col in data:
    if data[col].dtype == 'object':
        # If 2 or more unique categories
        if len(list(data[col].unique())) >= 2:
            # Train on the training data
            le.fit(data[col])
            label_object[col] = le
            # Transform both training and testing data
            data[col] = le.transform(data[col])
            label_object[col] = le

When I try this, the classes_ of every LE seem to be overwritten by those of the last LE, in this case 'day_of_incident'.

I'm not sure what is causing this issue. Is there a problem with the logic of the code, or am I doing something wrong?

Asked By: Metel Stairs

Answer 1

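To avoid the shared-object (same id()) problem, I suggest creating a new LabelEncoder instance on each iteration. The original loop reuses a single instance, so every entry in the dict refers to the same object, and that object holds whatever the last fit() produced. You can also merge both conditions into one line, and there is no need to convert the output of data[col].unique() into a list before checking len() >= 2: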

from sklearn.preprocessing import LabelEncoder

for col in data:
    if data[col].dtype == 'object' and len(data[col].unique()) >= 2:
        # New encoder per column, so each dict entry keeps its own classes_
        le = LabelEncoder()
        le.fit(data[col])
        # Replace the column with its integer codes
        data[col] = le.transform(data[col])
        label_object[col] = le

Answered By: Celius Stingher
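
The aliasing described in the answer can be reproduced on a small example. The following is a minimal sketch, not the asker's actual pipeline: the toy DataFrame, the 'city' column, and the file name label_object.joblib are assumptions made for illustration, and only the column name 'day_of_incident' comes from the question. It compares a single reused encoder with one encoder per column, then round-trips the per-column dict through joblib.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; only 'day_of_incident' is a name from the question.
data = pd.DataFrame({
    "city": ["Oslo", "Rome", "Oslo"],
    "day_of_incident": ["Mon", "Tue", "Mon"],
})

# Pattern from the question: one encoder created once and refit per column.
shared = LabelEncoder()
shared_encoders = {}
for col in data:
    shared.fit(data[col])
    shared_encoders[col] = shared  # stores a reference, not a copy

# Both keys point at the same object, so both report the last column's classes.
same_object = shared_encoders["city"] is shared_encoders["day_of_incident"]
shared_city_classes = shared_encoders["city"].classes_.tolist()

# Pattern from the answer: a fresh encoder per column.
separate_encoders = {col: LabelEncoder().fit(data[col]) for col in data}
separate_city_classes = separate_encoders["city"].classes_.tolist()

# Round trip through joblib, as the question intends for inference.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "label_object.joblib")  # hypothetical file name
    joblib.dump(separate_encoders, path)
    loaded = joblib.load(path)

# The reloaded 'city' encoder still maps its own categories.
encoded_city = loaded["city"].transform(["Rome", "Oslo"]).tolist()
```

With the shared instance, shared_encoders["city"] ends up holding the classes of 'day_of_incident', which matches the symptom in the question. With one instance per column, each encoder keeps its own classes_ and survives the dump and load unchanged.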
