Preprocessing in scikit-learn – single sample – Deprecation warning

Question:

On a fresh installation of Anaconda under Ubuntu… I am preprocessing my data in various ways prior to a classification task using Scikit-Learn.

from sklearn import preprocessing

scaler = preprocessing.MinMaxScaler().fit(train)
train = scaler.transform(train)    
test = scaler.transform(test)

This all works fine, but suppose I have a new sample (temp below) that I want to classify, and thus want to preprocess in the same way:

temp = [1,2,3,4,5,5,6,....................,7]
temp = scaler.transform(temp)

Then I get a deprecation warning…

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 
and will raise ValueError in 0.19. Reshape your data either using 
X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1)
if it contains a single sample. 

So the question is how should I be rescaling a single sample like this?

I suppose an alternative (not a very good one) would be…

temp = [temp, temp]
temp = scaler.transform(temp)
temp = temp[0]

But I’m sure there are better ways.

Asked By: Chris Arthur


Answers:

Well, it actually looks like the warning is telling you what to do.

As part of sklearn.pipeline stages’ uniform interfaces, as a rule of thumb:

  • when you see X, it should be an np.array with two dimensions

  • when you see y, it should be an np.array with a single dimension.
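
As a minimal sketch of this convention (the toy values below are hypothetical and not from the question), X holds one row per sample and one column per feature, while y holds one label per sample:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 4 samples, 3 features each.
X = np.array([[0.0, 10.0, 100.0],
              [1.0, 20.0, 200.0],
              [2.0, 30.0, 300.0],
              [3.0, 40.0, 400.0]])
# One label per sample.
y = np.array([0, 1, 0, 1])

print(X.shape)  # (4, 3) -> two dimensions: (n_samples, n_features)
print(y.shape)  # (4,)   -> one dimension: (n_samples,)

# Transformers such as MinMaxScaler expect the 2-D layout of X.
scaler = MinMaxScaler().fit(X)
X_scaled = scaler.transform(X)
```

The scaled output keeps the same (n_samples, n_features) shape as the input.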

Here, therefore, you should consider the following:

import numpy as np

temp = [1,2,3,4,5,5,6,....................,7]
# This makes it a 2-D array with a single row (one sample, many features)
temp = np.array(temp).reshape(1, -1)
temp = scaler.transform(temp)
Answered By: Ami Tavory

Just listen to what the warning is telling you:

Reshape your data using X.reshape(-1, 1) if your data has a single feature/column,
or X.reshape(1, -1) if it contains a single sample.

For your example (a single sample with more than one feature/column), convert the list to an array first:

import numpy as np
temp = np.array(temp).reshape(1, -1)

For one feature/column:

temp = np.array(temp).reshape(-1, 1)
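
As an illustrative sketch that is not part of the original answer, the single-sample case from the question could look like the following. The training values and the two-feature sample are hypothetical; only MinMaxScaler and reshape(1, -1) come from the thread.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: 3 samples, 2 features.
train = np.array([[0.0, 10.0],
                  [5.0, 20.0],
                  [10.0, 30.0]])
scaler = MinMaxScaler().fit(train)

# A single new sample with the same 2 features, given as a plain list.
temp = [5.0, 15.0]

# reshape(1, -1): one row (one sample), as many columns as there are features.
temp_2d = np.asarray(temp).reshape(1, -1)   # shape (1, 2)
scaled = scaler.transform(temp_2d)          # shape (1, 2)

# Take the first (and only) row to get back a 1-D result.
scaled_1d = scaled[0]
print(scaled_1d)  # approximately [0.5, 0.25]
```

Indexing with [0] at the end is only needed if the rest of the code expects a 1-D array; a classifier's predict method would take the 2-D version directly.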
Answered By: Mike

This might help: pass the sample as a nested list, i.e. a list that contains a single row.

temp = [[1,2,3,4,5,6,.....,7]]
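
A small sketch (with hypothetical values, not part of the original answer) of why the extra pair of brackets works: NumPy reads a list containing one list as a 2-D array with a single row. np.atleast_2d gives the same result for a flat list.

```python
import numpy as np

temp = [1, 2, 3, 4]        # hypothetical flat sample, shape (4,)
wrapped = [temp]           # list containing one row

print(np.asarray(temp).shape)      # (4,)
print(np.asarray(wrapped).shape)   # (1, 4)
print(np.atleast_2d(temp).shape)   # (1, 4)
```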
Answered By: Bharath

I faced the same issue and got the same deprecation warning. I was using a NumPy array of [23, 276] when I got the message. I tried reshaping it as the warning suggested and got nowhere. Then I selected each row from the NumPy array (I was iterating over it anyway) and appended it to a list. After that it worked without any warning.

array = []
array.append(temp[0])

Then you can use the Python list object (here ‘array’) as input to scikit-learn functions. It is not the most efficient solution, but it worked for me.

Answered By: shan89

.values.reshape(-1,1) will be accepted without alerts/warnings

.reshape(-1,1) will be accepted, but with a deprecation warning

Answered By: Analytics

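You can always reshape like this (a plain list has no reshape method, so convert it to a NumPy array first):

import numpy as np
temp = np.array([1,2,3,4,5,5,6,7])

temp = temp.reshape(len(temp), 1)

The major issue is that your temp.shape is:
(8,)

and you need a 2-D shape:
(8,1)

Note that (8,1) means eight samples with one feature each. For a single sample with eight features, as in the question, use temp.reshape(1, -1), which gives (1,8).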

Answered By: Francisco Pereira

-1 marks the unknown dimension of the array, which NumPy infers from the array's length and the other dimensions. Read more about the "newshape" parameter in the numpy.reshape documentation:

# X is a 1-d ndarray

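# If you want a COLUMN vector (one/many/unknown samples, 1 feature)
X = X.reshape(-1, 1)

# If you want a ROW vector (one sample, one/many/unknown features)
X = X.reshape(1, -1)

For example, with a pandas DataFrame df that has the columns 'x_1' and 'target':

import pandas as pd
from sklearn.linear_model import LinearRegression

X = df[['x_1']]                  # 2-D DataFrame with a single feature column
X_n = X.values.reshape(-1, 1)    # 2-D ndarray of shape (n_samples, 1)
y = df['target']
y_n = y.values                   # 1-D ndarray of shape (n_samples,)
model = LinearRegression()
model.fit(X_n, y_n)

y_pred = pd.Series(model.predict(X_n), index=X.index)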
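
Returning to the role of -1 in reshape, here is a short sketch (using a hypothetical six-element array) of how NumPy fills in the unknown dimension from the total number of elements:

```python
import numpy as np

X = np.arange(6)            # 1-D array with 6 elements, shape (6,)

col = X.reshape(-1, 1)      # -1 is inferred as 6 -> shape (6, 1)
row = X.reshape(1, -1)      # -1 is inferred as 6 -> shape (1, 6)
grid = X.reshape(2, -1)     # -1 is inferred as 3 -> shape (2, 3)

print(col.shape, row.shape, grid.shape)  # (6, 1) (1, 6) (2, 3)

# The known dimensions must divide the element count evenly.
try:
    X.reshape(4, -1)        # 6 elements cannot be split into 4 equal rows
    divisible = True
except ValueError:
    divisible = False
print(divisible)            # False
```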
Answered By: gregor256