Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
Question:
When I try to predict a single sample from my data, I get a reshape error, even though my data has an equal number of rows. Here is my code:
import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
x = np.array([2.0 , 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])
lr = LinearRegression()
lr.fit(x,y)
print(lr.predict(2.4))
The error is
if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead:
array=2.4.
Reshape your data either using array.reshape(-1, 1) if your data has a
single feature or array.reshape(1, -1) if it contains a single sample.
Answers:
You should reshape your X to be a 2D array, not a 1D array. Fitting a model requires a 2D array, i.e. one of shape (n_samples, n_features).
x = np.array([2.0 , 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])
lr = LinearRegression()
lr.fit(x.reshape(-1, 1), y)
print(lr.predict([[2.4]]))
The error is basically saying to convert the flat feature array into a column array. reshape(-1, 1) does the job; [:, None] can also be used.
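To make the shapes concrete, here is a minimal sketch comparing the two spellings. The three-element array is only an illustrative stand-in for the real feature data.

```python
import numpy as np

# Illustrative stand-in for the flat feature array.
x = np.array([2.0, 2.4, 1.5])

col_a = x.reshape(-1, 1)  # -1 lets NumPy infer the number of rows
col_b = x[:, None]        # None (np.newaxis) appends a new axis
row = x.reshape(1, -1)    # one sample with three features

print(x.shape)      # (3,)
print(col_a.shape)  # (3, 1)
print(col_b.shape)  # (3, 1)
print(row.shape)    # (1, 3)
```

Both column forms hold the same values; which one to use is mostly a matter of taste.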
The second dimension of the feature array X must also match the second dimension of whatever is passed to predict(). Since X is reshaped into a 2D array for fitting, the array passed to predict() should be 2D as well.
x = np.array([2.0 , 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])
X = x[:, None] # X.ndim should be 2
lr = LinearRegression()
lr.fit(X, y)
prediction = lr.predict([[2.4]])
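The same rule applies when predicting several values at once. A rough sketch, assuming a reasonably recent scikit-learn (the n_features_in_ attribute is not available in old releases) and using made-up values in new_values:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])

lr = LinearRegression().fit(x.reshape(-1, 1), y)

# Hypothetical new inputs: three samples, one feature each.
new_values = np.array([2.4, 3.0, 3.6])
predictions = lr.predict(new_values.reshape(-1, 1))  # input shape (3, 1)

print(lr.n_features_in_)  # 1
print(predictions.shape)  # (3,)

# Shape (1, 3) would mean one sample with three features,
# which does not match the single feature seen during fit().
mismatch_rejected = False
try:
    lr.predict(new_values.reshape(1, -1))
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```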
If your input is a pandas column, use double brackets ([[]]) to get a 2D feature array.
df = pd.DataFrame({'feature': x, 'target': y})
lr = LinearRegression()
lr.fit(df['feature'], df['target']) # <---- error
lr.fit(df[['feature']], df['target']) # <---- OK
#        ^^         ^^                <---- double brackets
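The difference between single and double brackets shows up in the shapes. A small sketch with a shortened, illustrative DataFrame:

```python
import pandas as pd

# Shortened, illustrative data.
df = pd.DataFrame({"feature": [2.0, 2.4, 1.5], "target": [196, 221, 136]})

series = df["feature"]    # single brackets: a 1D Series
frame = df[["feature"]]   # double brackets: a 2D DataFrame with one column
column = df["feature"].to_numpy().reshape(-1, 1)  # NumPy alternative

print(series.shape)  # (3,)
print(frame.shape)   # (3, 1)
print(column.shape)  # (3, 1)
```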
Why should X be 2D?
If we look at the source code of fit() (of most estimators in scikit-learn), one of the first things done is to validate the input via validate_data(), which calls check_array() to validate X. check_array() checks, among other things, whether X is 2D. It is essential for X to be 2D because, ultimately, LinearRegression().fit() calls scipy.linalg.lstsq to solve the least squares problem, and lstsq requires X to be 2D to perform matrix multiplication.
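The 2D check can be reproduced directly by calling check_array() from sklearn.utils. This is only a sketch of the validation step with its default arguments, not of everything fit() does:

```python
import numpy as np
from sklearn.utils import check_array

x = np.array([2.0, 2.4, 1.5])  # flat, 1D feature array

# With default arguments, check_array() rejects 1D input.
rejected = False
try:
    check_array(x)
except ValueError as exc:
    rejected = True
    print("ValueError:", exc)

# The reshaped column passes validation.
X = check_array(x.reshape(-1, 1))
print(X.shape)  # (3, 1)
```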
For classifiers, the second dimension is needed to get the number of features, which is essential to get the model coefficients in the correct shape.
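As an illustration, LogisticRegression stores its coefficients with one column per feature. The tiny dataset below is made up purely to show the shape:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: 4 samples, 2 features, binary labels.
X = np.array([[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 2.5]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)

# For a binary problem, coef_ has shape (1, n_features).
print(clf.coef_.shape)  # (1, 2)
```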