Preprocessing in scikit-learn – single sample – Deprecation warning
Question:
On a fresh installation of Anaconda under Ubuntu… I am preprocessing my data in various ways prior to a classification task using Scikit-Learn.
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)
test = scaler.transform(test)
This all works fine, but if I have a new sample (temp below) that I want to classify (and thus want to preprocess in the same way):
temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)
Then I get a deprecation warning…
DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19. Reshape your data either using
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample.
So the question is how should I be rescaling a single sample like this?
I suppose an alternative (not a very good one) would be…
temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]
But I’m sure there are better ways.
Answers:
Well, it actually looks like the warning is telling you what to do.
As part of the uniform interface of sklearn.pipeline stages, as a rule of thumb:
- when you see X, it should be an np.array with two dimensions
- when you see y, it should be an np.array with a single dimension.
Here, therefore, you should consider the following:
import numpy as np
temp = [1,2,3,4,5,5,6,....................,7]
# This makes it into a 2d array with a single row (one sample, many features)
temp = np.array(temp).reshape(1, -1)
temp = scaler.transform(temp)
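To make the shape convention concrete, here is a minimal sketch. The toy training data and the helper name scale_single_sample are assumptions made for illustration; they are not part of the original question or of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training data (assumed for illustration): 4 samples, 3 features.
train = np.array([[0.0, 10.0, 100.0],
                  [1.0, 20.0, 200.0],
                  [2.0, 30.0, 300.0],
                  [4.0, 50.0, 500.0]])
scaler = MinMaxScaler().fit(train)


def scale_single_sample(scaler, sample):
    """Hypothetical helper: scale one sample given as a flat sequence."""
    # One row, as many columns as there are features: shape (1, n_features).
    row = np.asarray(sample, dtype=float).reshape(1, -1)
    # transform returns a 2d array; take the only row to get shape (n_features,).
    return scaler.transform(row)[0]


scaled = scale_single_sample(scaler, [2.0, 30.0, 300.0])
print(scaled.shape)
```

The helper only wraps the reshape(1, -1) call that the warning recommends, so the scaler always receives a 2d array, even for one sample.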
Just listen to what the warning is telling you:
Reshape your data using either X.reshape(-1, 1) if your data has a single feature/column
or X.reshape(1, -1) if it contains a single sample.
For your example (a single sample with more than one feature/column):
temp = np.array(temp).reshape(1,-1)
For one feature/column:
temp = np.array(temp).reshape(-1,1)
This might help: wrap the sample in a second pair of brackets so that it is already a 2d list with a single row.
temp = ([[1,2,3,4,5,6,.....,7]])
I faced the same issue and got the same deprecation warning. I was using a numpy array of shape [23, 276] when I got the message. I tried reshaping it as the warning suggests and got nowhere. Then I selected each row from the numpy array (as I was iterating over it anyway) and appended it to a list. It then worked without any warning.
array = []
array.append(temp[0])
Then you can use the Python list object (here ‘array’) as an input to scikit-learn functions. Not the most efficient solution, but it worked for me.
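If you are iterating over rows anyway, a slice such as data[i:i + 1] keeps each row two-dimensional, so no intermediate list is needed. The sketch below assumes random toy data in place of the [23, 276] array, and also shows that transforming the whole array in one call gives the same result.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Random toy data (assumed for illustration): 5 samples, 3 features.
rng = np.random.default_rng(0)
data = rng.random((5, 3))
scaler = MinMaxScaler().fit(data)

# data[i:i + 1] has shape (1, 3), whereas data[i] would have shape (3,).
row_by_row = np.vstack(
    [scaler.transform(data[i:i + 1]) for i in range(len(data))]
)

# Transforming every row in a single call is equivalent and usually simpler.
all_at_once = scaler.transform(data)
print(np.allclose(row_by_row, all_at_once))
```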
.values.reshape(-1,1)
will be accepted without alerts/warnings
.reshape(-1,1)
will be accepted, but with a deprecation warning
You can always reshape like this:
temp = np.array([1,2,3,4,5,5,6,7])
temp = temp.reshape(len(temp), 1)
The main issue is that your temp.shape is:
(8,)
and you need
(8,1)
-1 stands for the unknown dimension of the array, which numpy infers from the array's length and the other dimensions. Read more about the "newshape" parameter in the numpy.reshape documentation.
# X is a 1-d ndarray
# If we want a COLUMN vector (many/one/unknown samples, 1 feature)
X = X.reshape(-1, 1)
# If we want a ROW vector (one sample, many/one/unknown features)
X = X.reshape(1, -1)
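As a small illustration of how -1 is inferred, the following sketch reshapes a 12-element array in several ways. The array and the variable names are assumptions chosen for the example.

```python
import numpy as np

X = np.arange(12)          # 1-d array, shape (12,)

col = X.reshape(-1, 1)     # -1 is inferred as 12: shape (12, 1)
row = X.reshape(1, -1)     # -1 is inferred as 12: shape (1, 12)
grid = X.reshape(3, -1)    # -1 is inferred as 4:  shape (3, 4)
flat = grid.reshape(-1)    # a lone -1 flattens back to shape (12,)

print(col.shape, row.shape, grid.shape, flat.shape)
```

Only one dimension may be given as -1, and the total number of elements must be divisible by the product of the other dimensions.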
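import pandas as pd
from sklearn.linear_model import LinearRegression
# df is assumed to be an existing pandas DataFrame with columns 'x_1' and 'target'
X = df[['x_1']]
X_n = X.values.reshape(-1, 1)
y = df['target']
y_n = y.values
model = LinearRegression()
model.fit(X_n, y_n)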
y_pred = pd.Series(model.predict(X_n), index=X.index)
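For pandas users, the difference between single and double brackets is worth spelling out. This is a minimal sketch with a made-up three-row DataFrame: selecting with a list of column names already yields a two-dimensional result, while a single column name yields a one-dimensional Series that needs reshaping.

```python
import pandas as pd

# Toy DataFrame (assumed for illustration).
df = pd.DataFrame({"x_1": [1.0, 2.0, 3.0], "target": [2.0, 4.0, 6.0]})

series_values = df["x_1"].to_numpy()     # Series -> 1-d array, shape (3,)
frame_values = df[["x_1"]].to_numpy()    # DataFrame -> 2-d array, shape (3, 1)

# Reshaping the 1-d values gives the same 2-d array as the double brackets.
reshaped = series_values.reshape(-1, 1)

print(series_values.shape, frame_values.shape, reshaped.shape)
```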