X has 45 features, but LinearRegression is expecting 8 features as input
Question:
I am trying to run a polynomial regression model on the California housing data (fetch_california_housing). However, I get the error "X has 45 features, but LinearRegression is expecting 8 features as input". Does anybody know why? Any help would be greatly appreciated. Thanks.
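from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error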
data = fetch_california_housing(return_X_y=True)
X = data[0]
y = data[1]
# Only use 50% of the data.
X = X[:int(X.shape[0] / 2)]
y = y[:int(y.shape[0] / 2)]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X_train)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(poly_reg.transform(X_test))
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: ", mse)
I have tried using reshape(1, -1) on X_train and X_test, but that didn't work either.
Answers:
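I think that when training the LinearRegression model, the code passes the original training data X_train to the fit method instead of the transformed data X_poly. The model is therefore fitted on 8 features, while poly_reg.transform(X_test) produces 45 features at prediction time (a degree-2 expansion of 8 features, including the bias column), and that mismatch raises the error. Does this change solve the problem?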
# regressor.fit(X_train, y_train)
regressor.fit(X_poly, y_train)
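One way to avoid this kind of mismatch altogether is to wrap both steps in a scikit-learn pipeline, so the same transformation is applied at fit and predict time. The following is only a minimal sketch: it uses synthetic data with 8 features as a stand-in for the housing data (an assumption, so that it runs without downloading the dataset), and the names model and n_expanded are illustrative rather than taken from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the housing data: 200 rows, 8 features.
# This is an assumption made so the sketch runs offline.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# The pipeline expands the features and fits the regression together,
# so fit and predict always receive the same expanded columns.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

# predict() applies the polynomial expansion to X_test automatically.
y_pred = model.predict(X_test)

# Number of columns produced by the degree-2 expansion of 8 features.
n_expanded = model.named_steps["polynomialfeatures"].n_output_features_
mse = mean_squared_error(y_test, y_pred)
print("Expanded features:", n_expanded)
print("Mean Squared Error:", mse)
```

With the original data, replacing the synthetic X and y with the arrays returned by fetch_california_housing(return_X_y=True) should work the same way, since the pipeline never exposes the untransformed features to LinearRegression.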