Linear regression predict() error: UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names

Question:

I am trying to use my LinearRegression() model to make a prediction for a new x value. The x value is within the range of my regression model. My code is as follows:

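import pandas as pd
from sklearn.linear_model import LinearRegression

# df is my DataFrame with columns 'A' and 'B'
x = df[['A']]
y = df['B']

m = LinearRegression()
m.fit(x, y)

# Test new X
new_x = pd.DataFrame([0.4])
print(m.predict(new_x))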


[-28.43482247] UserWarning: X does not have valid feature names, but LinearRegression was fitted with feature names
  warnings.warn(

This produces the output and warning above. My issues are:

  1. I’m not sure what the warning is telling me.

  2. The value given to me is not right, because the new_x value I entered was on the regression line (I plotted this on the graph). My graph also only covers x >= 0 and y >= 0. Here is a picture of my scatter plot:
    [image: scatter plot]

I have just under a million records as well.

Edit: The solution to 2 is that I was misreading the number. The input should actually be "0.4e7", which gives me the expected answer.

Asked By: ViB

Answers:

The warning appears because the model was fitted on a DataFrame whose columns had names (feature names), and the data passed to predict() does not carry matching names. In your example, pd.DataFrame([0.4]) has only the default integer column label 0, not 'A'. You can avoid the warning by naming the features the same way for both training and prediction. Let’s suppose your training set is a single feature based on this DataFrame:

x = pd.DataFrame({'X':[1,2,3,4,5]})

You then fit your model:

m.fit(x,y)

Now suppose you predict from a new pandas DataFrame with a different column name, such as:

print(m.predict(pd.DataFrame({'X_test':[0.4]})))

We get a closely related warning. This time it can list the mismatched names, because the new DataFrame has an actual column name assigned (which you didn’t define in your example):

FutureWarning: The feature names should match those that were passed during fit. Starting version 1.2, an error will be raised.
Feature names unseen at fit time:
- X_test
Feature names seen at fit time, yet now missing:
- X

  warnings.warn(message, FutureWarning)

The solution is to give both the same column name:

new_x=pd.DataFrame({'X':[0.4]})
print(m.predict(new_x))

This shows no warning or error.

Answered By: Celius Stingher