When saving a list of LabelEncoders the classes_ get overwritten by the last LabelEncoder
Question:
im trying to save a dict of LE encoders for use in inferencing, this is the code that trains and applies the LE and then saves the LE into dict (label_object) which then will be joblib.dump(ed)()
for col in data:
if data[col].dtype == 'object':
# If 2 or fewer unique categories
if len(list(data[col].unique())) >= 2:
# Train on the training data
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le
When trying this it seems the classes_ of the LE get overwritten by the last LE, in this case ‘day_of_incident’
Im not sure whats causing this issues, is there an issue with the logic of the code or am I doing something wrong?
Answers:
I suggest to avoid memory id()
issues, generating a new instance of Label Encoder per iteration as well. Also you can 1 line both and disregard the need to transform data[col].unique()
output into a list to evaluate if len() >= 2
conditions:
for col in data:
if (data[col].dtype == 'object') & (len(data[col].unique()) >=2:
le = LabelEncoder()
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le
im trying to save a dict of LE encoders for use in inferencing, this is the code that trains and applies the LE and then saves the LE into dict (label_object) which then will be joblib.dump(ed)()
for col in data:
if data[col].dtype == 'object':
# If 2 or fewer unique categories
if len(list(data[col].unique())) >= 2:
# Train on the training data
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le
When trying this it seems the classes_ of the LE get overwritten by the last LE, in this case ‘day_of_incident’
Im not sure whats causing this issues, is there an issue with the logic of the code or am I doing something wrong?
I suggest to avoid memory id()
issues, generating a new instance of Label Encoder per iteration as well. Also you can 1 line both and disregard the need to transform data[col].unique()
output into a list to evaluate if len() >= 2
conditions:
for col in data:
if (data[col].dtype == 'object') & (len(data[col].unique()) >=2:
le = LabelEncoder()
le.fit(data[col])
label_object[col] = le
# Transform both training and testing data
data[col] = le.transform(data[col])
label_object[col] = le