dask-ml preprocessing raises AttributeError

Question:

I use Dask DataFrames and dask-ml to process my data. When I apply dask-ml's MinMaxScaler to a column, I get the error below. Is there a way to avoid it and make the scaling work?

import dask.dataframe as dd
from dask_ml.preprocessing import MinMaxScaler

df = dd.read_csv('path to csv', parse_dates=['CREATED_AT'],
                 dtype={'ODI_UPDATED_AT': 'object'})
scaler = MinMaxScaler()
print(scaler.fit_transform(df['M']))

AttributeError: 'Scalar' object has no attribute 'copy'

Asked By: Saeide


Answers:

Since the error message is unclear, an issue was opened: "Better error message when using invalid MinMaxScaler.fit() inputs".
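Scikit-learn-style scalers expect input shaped (n_samples, n_features) and compute their statistics column by column, along axis 0. The minimal NumPy sketch below (the values in m are made up, standing in for df['M']) shows what that reduction does to a 1-D column versus a reshaped (n, 1) array. Reading the AttributeError as dask-ml receiving a single dask Scalar where it expected a per-column array is an assumption based on the message, not something confirmed from the dask-ml source.

```python
import numpy as np

# Hypothetical stand-in for df['M']: a single 1-D column of values.
m = np.array([4.0, 8.0, 6.0, 2.0])

# Reducing a 1-D array along axis 0 collapses it to one scalar value.
print(np.ndim(m.min(axis=0)))   # 0

# Reshaping to (n_samples, 1) keeps one statistic per column,
# which is the shape a column-wise scaler works with.
m_2d = m.reshape(-1, 1)
print(m_2d.shape)               # (4, 1)
print(m_2d.min(axis=0))         # [2.]
```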

The way to solve the problem is to pass the scaler input of the appropriate type and shape, a 2-D array instead of a single Series, something like this:

from dask_ml.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
col_1 = df['col_1'].values                 # 1-D dask array
col_1_np = col_1.compute().reshape(-1, 1)  # 2-D NumPy array, shape (n, 1)
scaler.fit(col_1_np)
col_1 = scaler.transform(col_1_np)         # scaled values, shape (n, 1)

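df['col_1'].values gives you a dask array, and col_1.compute().reshape(-1, 1) turns it into a 2-D NumPy array with a single column, which is the shape the scaler expects. Repeat this for each column you want to scale. Finally, concatenate the transformed columns to get a new dask DataFrame: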

# col_2 and col_3 are scaled the same way as col_1 above
names = ['col_1', 'col_2', 'col_3']
ddf = dd.concat([dd.from_array(c, columns=[name])
                 for c, name in zip([col_1, col_2, col_3], names)], axis=1)
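The approach above pulls every column into memory with compute(). A different approach, not the one used in this answer, is to apply the min-max formula that scikit-learn documents for MinMaxScaler, X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) followed by X_std * (max - min) + min, directly with DataFrame arithmetic. The helper min_max_scale below is hypothetical and is exercised here on pandas; it uses only .min(), .max(), .where() and arithmetic, which dask.dataframe also provides, so on a dask DataFrame it should stay lazy, but that is an assumption worth checking against your dask version.

```python
import pandas as pd


def min_max_scale(frame, columns, feature_range=(0.0, 1.0)):
    """Hypothetical helper: min-max scale `columns` of a DataFrame.

    Only .min(), .max(), .where() and arithmetic are used; with a dask
    DataFrame the result is assumed to stay lazy until .compute().
    """
    lo, hi = feature_range
    sub = frame[columns]
    col_min = sub.min()  # one minimum per column
    col_max = sub.max()  # one maximum per column
    # A constant column has zero range; divide by 1 instead (as
    # scikit-learn does) so it maps to the lower bound, not NaN.
    col_range = (col_max - col_min).where(col_max != col_min, 1.0)
    return (sub - col_min) / col_range * (hi - lo) + lo


# Usage on a small pandas frame standing in for the real data.
example = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 10, 10]})
print(min_max_scale(example, ["col_1", "col_2"]))
```

Because the statistics are computed per column and broadcast back over the rows, no reshape to (n, 1) is needed here.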
Answered By: Saeide