X has 45 features, but LinearRegression is expecting 8 features as input

Question:

I am trying to fit a polynomial regression model on the California housing data (fetch_california_housing). However, I get the error "X has 45 features, but LinearRegression is expecting 8 features as input". Does anybody know why? Any help would be greatly appreciated. Thanks.

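from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures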

data = fetch_california_housing(return_X_y=True)

X = data[0]
y = data[1]

# Only use 50% of the data.
X = X[:int(X.shape[0] / 2)]
y = y[:int(y.shape[0] / 2)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)


poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X_train)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

y_pred = regressor.predict(poly_reg.transform(X_test))

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: ", mse)

I have tried using reshape(1, -1) on X_train and X_test, but that didn't work either.

Asked By: Shan


Answers:

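When training the LinearRegression model, the code passes the original training data X_train to the fit method instead of the transformed data X_poly. The model is therefore fitted on 8 features, but at prediction time poly_reg.transform(X_test) produces 45 columns (for degree 2 on 8 inputs: 1 bias term, 8 linear terms and 36 quadratic terms), which is why the error is raised. Fitting on X_poly should solve the problem: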

# regressor.fit(X_train, y_train)
regressor.fit(X_poly, y_train)
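
A possible way to avoid this kind of mismatch altogether is to chain the two steps in a scikit-learn pipeline, so the same polynomial expansion is applied in both fit and predict. The sketch below is only an illustration: it uses synthetic data with 8 features as a stand-in for the housing data (an assumption, so that it runs without downloading anything), and the names rng, n_poly and model are not from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the housing data (assumption): 200 rows, 8 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Degree-2 expansion of 8 features: 1 bias + 8 linear + 36 quadratic = 45 columns.
n_poly = PolynomialFeatures(degree=2).fit_transform(X_train).shape[1]
print("Expanded feature count:", n_poly)

# The pipeline expands the features inside both fit() and predict(),
# so the regressor always sees the same number of columns.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
```

With the real data, X and y from fetch_california_housing(return_X_y=True) would take the place of the synthetic arrays; the rest should stay the same.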
Answered By: AboAmmar