Keras prediction incorrect with scaler and feature selection

Question:

I am building an application that retrains a Keras binary classifier (0 or 1) at a fixed interval (hourly, daily) on new data. The data preparation, training, and testing work well, or at least as expected. It tries different features and scales them with MinMaxScaler (some values are negative).

On live data, predicting on a single data point gives unrealistic values (around 0.9987 to 1 most of the time, which is inaccurate). Since the output is meant to express how close the prediction is to "1", getting such high numbers constantly raises alerts.

The code for live prediction is as follows.

current_df is a pandas DataFrame that contains a single row with the data pulled live, plus the column headers. We select the "features" from it. The feature list is loaded from the db because we implement dynamic feature selection when training the model, so each model can use different features.

Get the features as a list:

# Convert literal str to list
features = ast.literal_eval(features) 

Then select only the features that I need in the dataframe:

# Select the features
selected_df = current_df[features]

Get the values as a list:

# Get the values of the df
current_list = selected_df.values.tolist()[0]

Then I reshape it:

# Reshape to allow scaling and predicting
current_list = np.reshape(current_list, (-1, 1))

If I call "transform" instead of "fit_transform" in the line above, I get the following error: This MinMaxScaler instance is not fitted yet. Call ‘fit’ with appropriate arguments before using this estimator.

Reshape again:

# Reshape to be able to scale
current_list = np.reshape(current_list, (1, -1))

Load the model using Keras (model_location is a Path) and predict:

# Loads the model from the local folder
reconstructed_model = keras.models.load_model(model_location)

prediction = reconstructed_model.predict(current_list)
prediction = prediction.flat[0]

Updated

During training, the data is scaled using fit_transform and transform (MinMaxScaler, although it could also be StandardScaler):

X_train = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)

And this is run when training the model (the "model" config is not shown):

# Compile the model
model.compile(optimizer=optimizer, 
                loss=loss, 
                metrics=['binary_accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=epochs, verbose=0)

# Evaluate using Keras built-in function
scores = model.evaluate(X_test, y_test, verbose=0)
testing_accuracy = scores[1]

# create model with sklearn KerasClassifier for evaluation
eval_model = KerasClassifier(model=model, epochs=epochs, batch_size=10, verbose=0)

# Evaluate model using RepeatedStratifiedKFold
accuracy = ML.evaluate_model_KFold(eval_model, X_test, y_test)

# Predict testing data
pred_test= model.predict(X_test, verbose=0)
pred_test = pred_test.flatten()

# extract the predicted class labels
y_predicted_test = np.where(pred_test > 0.5, 1, 0)

Regarding feature selection, the features are not always the same: I use either SelectKBest (10 or 15 features) or RFECV, and keep the trained model with the highest accuracy, so the selected features can differ from one model to the next.

Is there anything I’m doing wrong here? I’m thinking maybe the scaling should be done before the feature selection or there’s some issue with the scaling (since maybe some values might be 0 when training and 100 when using it and the features are not necessarily the same when scaling).

Asked By: galgo


Answers:

The issue seems to stem from how the StandardScaler / MinMaxScaler is applied. The following example shows how to apply the former. However, if separate scripts handle training and prediction, then the scaler also needs to be serialized at training time and loaded at prediction time.

Set up a classification problem:

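from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)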

Fit a StandardScaler instance on the training set and use the same parameters to .transform the test set:

import pickle
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

# Train time: Serialize the scaler to a pickle file.
with open("scaler.pkl", "wb") as fh:
    pickle.dump(scaler, fh)

# Test time: Load the scaler and apply to the test set.
with open("scaler.pkl", "rb") as fh:
    new_scaler = pickle.load(fh)

X_test = new_scaler.transform(X_test)

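This means the model is fit and evaluated on features that were scaled with the same parameters:

import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # shape must be a tuple, not a bare int
    keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(100),
    layers.Dropout(0.1),
    # sigmoid keeps the output in (0, 1), as binary_crossentropy expects
    layers.Dense(1, activation="sigmoid")])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["binary_accuracy"])
model.fit(X_train, y_train, epochs=25)

y_pred = np.where(model.predict(X_test)[:, 0] > 0.5, 1, 0)
print(accuracy_score(y_test, y_pred))
# Prints the test accuracy (0.8708 in the original answer; varies from run to run)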
Answered By: Alexander L. Hayes

Alexander’s answer is correct; I think there is just some confusion between testing and live prediction. What he said about the testing step applies equally to the live prediction step. After you’ve called scaler.fit_transform on your training set, add the following code to save the scaler:

with open("scaler.pkl", "wb") as fh:
    pickle.dump(scaler, fh)

Then, during the live prediction step, you don’t call fit_transform. Instead, you load the scaler saved during training and call transform:

with open("scaler.pkl", "rb") as fh:
    new_scaler = pickle.load(fh)

# Load features, reshape them, etc

# Scaling step
current_list = new_scaler.transform(current_list)

# Features are scaled properly now, put the rest of your prediction code here

You call fit_transform only once per model, during the training step, on your training set. After that (during testing, or when calculating predictions after the model is deployed) you never call it again; you only call transform. Treat the scaler as part of the model. Naturally, you fit the model on the training set and then use that same model during testing and live prediction, never refitting it. The same should be true for the scaler.
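
Since the question mentions that each trained model can use a different set of features, one way to treat the scaler as part of the model is to save the feature list together with it. The following is only a sketch under assumptions: the column names, the toy values, the file name preprocessing.pkl and the dictionary keys are all made up for illustration and are not taken from the question.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# --- Training time (toy data; column names are hypothetical) ---
train_df = pd.DataFrame({"a": [0.0, 10.0], "b": [-4.0, 4.0], "c": [1.0, 3.0]})
features = ["c", "a"]  # stands in for whatever feature selection picked

scaler = MinMaxScaler()
scaler.fit(train_df[features])  # fit only on the selected training columns

# Store the scaler and the feature list together (hypothetical file name).
artifact_path = Path(tempfile.mkdtemp()) / "preprocessing.pkl"
with open(artifact_path, "wb") as fh:
    pickle.dump({"scaler": scaler, "features": features}, fh)

# --- Live prediction time ---
with open(artifact_path, "rb") as fh:
    artifacts = pickle.load(fh)

current_df = pd.DataFrame({"a": [5.0], "b": [0.0], "c": [2.0]})  # one live row
selected_df = current_df[artifacts["features"]]  # same columns, same order
current_scaled = artifacts["scaler"].transform(selected_df)  # transform only, no fit

print(current_scaled.shape)  # one row, one column per selected feature
```

The scaled array already has shape (1, n_features), so in this sketch it could be passed to the model's predict call without any further reshaping.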

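If you call scaler.fit_transform on the live prediction features, the scaler is refit on that single data point alone and loses everything it learned about the feature distribution of the training set, so the model receives inputs on a different scale from the one it was trained on.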
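
To make the effect concrete, here is a small sketch with made-up numbers (the arrays below are illustrative, not the asker's data). It compares the training-time scaler with a MinMaxScaler that is refit on the single live row, both in the (1, -1) layout and in the (-1, 1) layout used in the question.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[0.0, 0.0], [10.0, 4.0]])  # toy training data, 2 features
live_row = np.array([[5.0, 1.0]])              # one live data point, shape (1, 2)

# Correct: reuse the scaler fitted on the training set.
scaler = MinMaxScaler().fit(X_train)
correct = scaler.transform(live_row)

# Wrong: refit on the single row. Each feature's min equals its max,
# so every value collapses to 0.
refit_row = MinMaxScaler().fit_transform(live_row)

# Also wrong: reshape to one column first. The row's features are then
# scaled against each other instead of against the training data.
refit_column = MinMaxScaler().fit_transform(live_row.reshape(-1, 1)).reshape(1, -1)

print(correct)       # each feature scaled with the training min/max
print(refit_row)     # all zeros
print(refit_column)  # largest feature becomes 1, smallest becomes 0
```

In both refit cases the model sees values that have nothing to do with the ranges it was trained on, which is consistent with the saturated predictions described in the question.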

Answered By: Alex Bochkarev