Scikit-learn: getting same result on all rows when reusing the model

Question:

I want to estimate a country's GDP from the GDP of its primary industry. The earliest years have no GDP values, so I trained a model on the newer data. My plan is to use that trained model to predict the missing values for the older years.

I then fed the older data to the model as new input, but the model predicts the same value for every year!

What am I doing wrong?

PS. I only just started with ML, so apologies for the messy code/ML technique 🙁

EDIT: FIXED. The new data needed to be scaled too 🙂

Asked By: Frank Jimenez

Answers:

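I believe you need to call sc.transform on X1 as well (transform, not fit_transform, so the scaling learned from the training data is reused). Otherwise the new features are on a different scale from the ones the model was trained on, and the predictions will be wrong.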
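
A minimal sketch of this on synthetic data. The question's code is not shown, so the StandardScaler, the SVR model, and all the numbers here are assumptions for illustration; only the names sc and X1 come from this answer.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the newer data (assumed, not the asker's real data).
rng = np.random.default_rng(0)
X = rng.uniform(1000, 5000, size=(50, 1))        # primary-industry GDP
y = 2.5 * X[:, 0] + rng.normal(0, 50, size=50)   # total GDP

# Fit the scaler on the training features only, then train on scaled features.
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
model = SVR(kernel="rbf", C=1e4)
model.fit(X_scaled, y)

# "Older" rows to predict, in the original (unscaled) units.
X1 = np.array([[1200.0], [2500.0], [4000.0]])

# Wrong: the model sees values far outside the scaled training range,
# so every row ends up with the same prediction.
raw_pred = model.predict(X1)

# Right: reuse the already-fitted scaler before predicting.
scaled_pred = model.predict(sc.transform(X1))

print("without scaling:", raw_pred)
print("with scaling:   ", scaled_pred)
```

Whether unscaled input gives identical predictions or just badly wrong ones depends on the model; with an RBF kernel as assumed here, the rows collapse to a single value.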

Answered By: anthony-khong

Try using an sklearn Pipeline, which takes care of rescaling new data for you before it reaches the model.
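
A rough sketch of what that could look like, again on made-up data. The StandardScaler and SVR steps are assumptions, since the question does not show which scaler or model was used.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the newer data (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(1000, 5000, size=(50, 1))
y = 2.5 * X[:, 0] + rng.normal(0, 50, size=50)

# The pipeline fits the scaler and the model together on the raw features.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1e4))
pipe.fit(X, y)

# New rows are passed in unscaled; the pipeline applies the fitted
# scaler's transform before calling the model's predict.
X1 = np.array([[1200.0], [2500.0], [4000.0]])
pred = pipe.predict(X1)
print(pred)
```

Because scaling lives inside the pipeline, there is no separate transform call to forget when predicting on new data.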

Answered By: ayoubft