SettingWithCopyWarning won't go away regardless of the approach

Question:

Let me start by saying that I understand what the warning is and why it’s there, and I’ve read a ton of answered questions about it. Using the current pandas (1.2.3) and scikit-learn (0.24.1), this warning simply won’t go away.

I have a dataframe loaded from a pickle, nothing too complex:

print(df)

          Date  Sales     Labels
0   2013-01-01      0 5024.00000
1   2013-01-02   5024 5215.00000
2   2013-01-03   5215 5552.00000
3   2013-01-04   5552 5230.00000
4   2013-01-05   5230    0.00000
..         ...    ...        ...
747 2015-01-18      0 5018.00000
748 2015-01-19   5018 4339.00000
749 2015-01-20   4339 4786.00000
750 2015-01-21   4786 4606.00000
751 2015-01-22   4606 4944.00000

I’m following the accepted answer on how to min-max scale the Sales and Labels columns, because I want to preserve the row order and keep the Date column:

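from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Scale both columns to [0, 1] and write them back into df
df[['Sales', 'Labels']] = scaler.fit_transform(df[['Sales', 'Labels']])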

This gives me the following warning (as you can guess):

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I’ve tried:

df.loc[:, 'Sales'] = scaler.fit_transform(df[['Sales']])

And I still get the warning (even though now it won’t tell me which line it is coming from!).

This makes me wonder whether scikit-learn is internally doing the assignment the old-fashioned way, and whether that’s where the warning now comes from.

I’ve also tried using a .copy(), which I understand only masks the issue, but the warning is still present.

Is there another way to apply MinMaxScaler without the warning?

Asked By: Ælex


Answers:

Most likely df is a subset of another dataframe, for example:

import numpy as np
import pandas as pd

rawdata = pd.DataFrame({'Date': range(5),
                        'Sales': np.random.uniform(1000, 2000, 5),
                        'Labels': np.random.uniform(1000, 2000, 5),
                        'Var': np.random.uniform(0, 1, 5)})

Suppose you then take df as a subset of it. Bear in mind that pandas treats df as a slice of the original dataframe rawdata, so assigning to it while scaling raises the warning:

df = rawdata[['Date','Sales','Labels']]
df[['Sales', 'Labels']] = scaler.fit_transform(df[['Sales', 'Labels']])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

If you scale and transform the original dataframe, it works:

rawdata[['Sales','Labels']] = scaler.fit_transform(rawdata[['Sales', 'Labels']])

Consider whether you still need the original dataframe. If you do, take an explicit copy of the subset, at the cost of extra memory:

df = rawdata[['Date','Sales','Labels']].copy()
df[['Sales', 'Labels']] = scaler.fit_transform(df[['Sales', 'Labels']])
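
To compare the two cases side by side, here is a minimal sketch that records which warnings each assignment raises. The helper scale_and_collect, the names subset and independent, and the seeded random data are all assumptions made for illustration, not part of the original answer. Whether the first list contains SettingWithCopyWarning will depend on your pandas version, so no output is shown.

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Illustrative data (seeded so the run is repeatable); not the asker's data.
rng = np.random.default_rng(0)
rawdata = pd.DataFrame({
    "Date": range(5),
    "Sales": rng.uniform(1000, 2000, 5),
    "Labels": rng.uniform(1000, 2000, 5),
    "Var": rng.uniform(0, 1, 5),
})
cols = ["Sales", "Labels"]


def scale_and_collect(frame):
    """Hypothetical helper: scale `cols` in `frame` and return the
    class names of any warnings raised during the assignment."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        frame[cols] = MinMaxScaler().fit_transform(frame[cols])
    return [w.category.__name__ for w in caught]


original_sales = rawdata["Sales"].copy()

subset = rawdata[["Date", "Sales", "Labels"]]               # derived from rawdata
independent = rawdata[["Date", "Sales", "Labels"]].copy()   # explicit copy

print("subset:     ", scale_and_collect(subset))
print("independent:", scale_and_collect(independent))
```

In both cases rawdata itself is left unscaled, which is why the explicit copy is the safer way to say what you mean.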
Answered By: StupidWolf

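One situation where this warning comes up involves unordered indices. For example, if you split a dataset into subsets (e.g., the train and test splits common in ML), the indices of each subset are no longer in order.

A quick fix is to reset the indices of the subset df, provided they won’t matter in a downstream task. Note that reset_index() returns a new dataframe and keeps the old index as an extra column unless you pass drop=True.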

df = df.reset_index()
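
As a small sketch of what reset_index does to a shuffled subset (the frame full and the row positions picked for train are made up for illustration and stand in for a real train split):

```python
import pandas as pd

full = pd.DataFrame({"Sales": [10, 20, 30, 40, 50]})

# Hypothetical out-of-order subset, standing in for a train split.
train = full.iloc[[3, 0, 4]]

train_kept = train.reset_index()             # old index kept as an "index" column
train_clean = train.reset_index(drop=True)   # old index discarded

print(list(train.index))          # [3, 0, 4]
print(list(train_kept.columns))   # ['index', 'Sales']
print(list(train_clean.index))    # [0, 1, 2]
```

Either call returns a new dataframe rather than modifying train, so the result has to be assigned to a name, as in the line above.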
Answered By: Akanksha Atrey