Does train_test_split affect the result when the value to predict is the same?

Question

I'm new to data science and I have a question about train_test_split.

I have an example that tries to predict iced tea sales from temperature.

My question: when I use train_test_split, my MSE, score, and predicted sales value are different every time I run the code (since train_test_split selects a different part of the data each time).

Is this normal? If a user enters the same value, 30 degrees, every time, will they get a different predicted sales value each time?

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

#1. predict value
temperature = np.reshape(np.array([30]), (1, 1))

#2. data
X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]) #temperatures
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84]) #iced_tea_sales

X = np.reshape(X, (len(X), 1))
y = np.reshape(y, (len(y), 1))

#3. split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

#4. train
lm = LinearRegression()
lm.fit(X_train, y_train)

#5. mse score
y_pred = lm.predict(X_test)
mse = np.mean((y_pred - y_test) ** 2)
r_squared = lm.score(X_test, y_test)

print(f'mse: {mse}')
print(f'score(r_squared): {r_squared}')

#6. predict
sales = lm.predict(temperature)
print(sales) #output, user get their prediction

Asked By: Benjamin W


Answer 1

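The values will differ from run to run because train_test_split, called without a random_state, picks a different training subset each time, so the weights learned by fit() vary and the predictions vary with them. (LinearRegression itself is deterministic: fitting it on exactly the same data gives the same weights.) The predictions should still be close to each other (if you don't have outliers), since all the samples come from the same distribution. If you want the same split, and therefore the same prediction, on every run, pass a fixed random_state to train_test_split.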
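As a minimal sketch of the point above, using the data from the question: fixing random_state makes the split, and so the prediction for 30 degrees, repeatable. The helper name predict_sales and the seed value 42 are assumptions chosen for illustration, not part of the original code. The last lines show one common option for the model that users actually query: refit on all of the data once the split has been used for evaluation, which is also deterministic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Data from the question: temperatures and iced tea sales
X = np.array([29, 28, 34, 31, 25, 29, 32, 31, 24, 33, 25, 31, 26, 30]).reshape(-1, 1)
y = np.array([77, 62, 93, 84, 59, 64, 80, 75, 58, 91, 51, 73, 65, 84])


def predict_sales(temperature, random_state=None):
    """Hypothetical helper: split, fit, and predict sales for one temperature."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state
    )
    lm = LinearRegression().fit(X_train, y_train)
    return float(lm.predict(np.array([[temperature]]))[0])


# Same seed -> same split -> same fitted line -> same prediction
fixed_a = predict_sales(30, random_state=42)
fixed_b = predict_sales(30, random_state=42)
print(fixed_a, fixed_b)

# No seed -> the split changes between calls, so these usually differ
print(predict_sales(30), predict_sales(30))

# Refit on all data for the model that serves users (no randomness involved)
full_a = float(LinearRegression().fit(X, y).predict(np.array([[30]]))[0])
full_b = float(LinearRegression().fit(X, y).predict(np.array([[30]]))[0])
print(full_a, full_b)
```

With only 14 samples, a 30% test set holds 5 points, so the MSE and score will swing noticeably between unseeded splits; that spread reflects the small dataset, not a bug in the code.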

Answered By: Mehul Gupta

Answer 2

Yes, you have to do all the work over again. And I thought we were in 2022? What's the use of NNs and AI if they aren't "smart" enough to adjust themselves to the database and its changes, for example by changing input layers and hidden layers through some built-in algorithm that balances under- and over-fitting? Someone should develop an input data module that automates the test/train/forecast handling of a dataset: check the model on the train and base datasets, then forecast new data from the whole database (base + train), with as little user intervention as possible and a simple, clear workflow. The future isn't in users struggling with data input (unless you're a developer and CLI lover, everyone else is just a user). This only shows how primitive such solutions are. Everyone else just tries NNs and AI for whatever job and fails, like a would-be Linux user who tries a distribution, fails, and never gets to use it as intended. NNs are pure frustration, especially the input procedures. Just imagine a corporate employee trying to solve the "unsolvable" because his data won't fit.

Answered By: Houdini
