MinMaxScaler in Python changes original data

Question:

I am trying to use 4 of my 5 csv columns to predict the last column.
I used MinMaxScaler to scale my data to the 0-1 range,
but at some point, when I want to inverse_transform it, MinMaxScaler changes my original data. Here is my code:

from pandas import read_csv
from sklearn.preprocessing import MinMaxScaler

dataset = read_csv('zz.csv', header=0, index_col=0)
values = dataset.values
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)

After I split my scaled data into train_X and train_y, I put them into my model and fit it:

train = scaled[:168, :]
test = scaled[168:, :]

train_X, train_y = train[:, [0,1,3,4]], train[:, 2]
test_X, test_y = test[:, [0,1,3,4]], test[:, 2]
train_X = train_X.reshape((train_X.shape[0], 1, train_X.shape[1]))
test_X = test_X.reshape((test_X.shape[0], 1, test_X.shape[1]))

My model is an LSTM:

# design network
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(4, return_sequences=True, activation="relu", input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(LSTM(16, return_sequences=False, activation="relu"))
model.add(Dense(1))

nadam = tf.keras.optimizers.Nadam(learning_rate=0.0005, beta_1=0.9, beta_2=0.999, epsilon=1e-07)
model.compile(loss='mae', optimizer=nadam, metrics=[tf.keras.metrics.MeanSquaredError()])
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_mean_squared_error', patience=5)
history = model.fit(train_X, train_y, epochs=2000, verbose=1, validation_split=0.2, shuffle=False, callbacks=[stop_early])

Then I use test_X for prediction. In the next lines I concatenate yhat (my predicted data) with test_X, and test_y with test_X, in order to inverse_transform them, producing inv_yhat and inv_y for further use (calculating MSE, MAE, etc.):

import numpy as np

# make a prediction
yhat = model.predict(test_X)
yhat = yhat.reshape(yhat.shape[0],1)
test_X = test_X.reshape((test_X.shape[0], test_X.shape[2]))
# invert scaling for forecast
inv_yhat = np.concatenate((yhat, test_X), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]
# invert scaling for actual
test_y = test_y.reshape((len(test_y), 1))
inv_y = np.concatenate((test_y, test_X), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,2]

But the problem is that when I use inverse_transform, it changes my test_X data to values that are different from the original test_X.
For example, these are my first 5 values in test_X:

array([[69.34],
       [69.66],
       [69.6 ],
       [69.38],
       [69.51]])

and these are my inv_y values, which should be the same test_X data, after inverse_transform:

array([[68.78412 ],
       [68.73931 ],
       [68.715935],
       [68.65166 ],
       [68.69646 ]])

I've also tried fitting the scaler on the training data only and just transforming the test data, roughly like the snippet below, but I had the same problem.
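
scaler = MinMaxScaler(feature_range=(0, 1))
# fit on the training rows only, then apply the same scaling to the test rows
train = scaler.fit_transform(values[:168, :])
test = scaler.transform(values[168:, :])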

Asked By: Betabore


Answers:

You fit the scaler while your label column is at index 2:

train_X, train_y = train[:, [0,1,3,4]], train[:, 2]
test_X, test_y = test[:, [0,1,3,4]], test[:, 2]

But when you inverse-transform, the label column sits at a different position (index 0):

inv_yhat = np.concatenate((yhat, test_X), axis=1)
inv_y = np.concatenate((test_y, test_X), axis=1)

You should recheck your feature positions and make sure the array you pass to inverse_transform has the same column structure as the array the scaler was fitted on.
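
For example (a sketch reusing the arrays from the question), you can rebuild the scaled array with every column in its fitted position before calling inverse_transform:

import numpy as np

# rebuild an (n_samples, 5) array in the ORIGINAL column order:
# scaled features at indices 0, 1, 3, 4; scaled label at index 2
inv = np.empty((len(yhat), 5))
inv[:, [0, 1, 3, 4]] = test_X     # features in the positions the scaler was fitted on
inv[:, 2] = yhat.ravel()          # predicted label back at index 2
inv_yhat = scaler.inverse_transform(inv)[:, 2]

# same idea for the actual labels
inv[:, 2] = test_y.ravel()
inv_y = scaler.inverse_transform(inv)[:, 2]

With the columns back where the scaler expects them, inverse_transform returns the original feature values unchanged, and inv_y / inv_yhat come out in the units of the label column.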

Answered By: DataFlo_w