ValueError: operands could not be broadcast together with shapes – inverse_transform – Python

Question:

I know this ValueError question has been asked many times, but I am still struggling to find an answer because my case involves inverse_transform.

Say I have an array a

a.shape
> (100,20)

and another array b

b.shape
> (100,3)

I concatenated them with np.concatenate:

hat = np.concatenate((a, b), axis=1)

Now the shape of hat is

hat.shape
> (100,23)

After this, I tried the following:

inversed_hat = scaler.inverse_transform(hat)

When I do this, I get an error:

ValueError: operands could not be broadcast together with shapes (100,23) (25,) (100,23)

Is this broadcast error coming from inverse_transform? Any suggestions would be helpful. Thanks in advance!

Asked By: user8321813

||

Answers:

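Although you didn’t specify, I’m assuming you are using inverse_transform() from a scikit-learn scaler such as StandardScaler (the example below uses MinMaxScaler). You need to fit the scaler first, on data with the same number of columns as the array you later pass to inverse_transform().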

import numpy as np
from sklearn.preprocessing import MinMaxScaler


In [1]: arr_a = np.random.randn(5*3).reshape((5, 3))

In [2]: arr_b = np.random.randn(5*2).reshape((5, 2))

In [3]: arr = np.concatenate((arr_a, arr_b), axis=1)

In [4]: scaler = MinMaxScaler(feature_range=(0, 1)).fit(arr)

In [5]: scaler.inverse_transform(arr)
Out[5]:
array([[ 0.19981115,  0.34855509, -1.02999482, -1.61848816, -0.26005923],
       [-0.81813499,  0.09873672,  1.53824716, -0.61643731, -0.70210801],
       [-0.45077786,  0.31584348,  0.98219019, -1.51364126,  0.69791054],
       [ 0.43664741, -0.16763207, -0.26148908, -2.13395823,  0.48079204],
       [-0.37367434, -0.16067958, -3.20451107, -0.76465428,  1.09761543]])

In [6]: new_arr = scaler.inverse_transform(arr)

In [7]: new_arr.shape == arr.shape
Out[7]: True

Answered By: o-90

It seems you are using a pre-fitted scaler object from sklearn.preprocessing.
If so, the data used to fit it most likely had shape (x,25), whereas the array you are passing to inverse_transform has shape (x,23). The (25,) in the error message is the scaler's stored per-column statistics, and that mismatch is why you are getting this error.
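
The mismatch described above can be reproduced with a minimal sketch. It assumes a StandardScaler (the question does not say which scaler was used) and random data standing in for the real arrays; the names train and hat_like are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(100, 25))     # hypothetical data the scaler was fitted on
hat_like = rng.normal(size=(100, 23))  # same shape as `hat` in the question

scaler = StandardScaler().fit(train)

# Number of columns the scaler saw during fit (attribute set by fit()).
n_fitted = scaler.n_features_in_

# Passing a 23-column array to a scaler fitted on 25 columns fails.
try:
    scaler.inverse_transform(hat_like)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

print("columns seen during fit:", n_fitted)
print("columns passed in:", hat_like.shape[1])
```

Comparing scaler.n_features_in_ with hat.shape[1] before calling inverse_transform is a quick way to confirm whether this is what is happening in your code.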

Answered By: vipin bansal

The problem here is that the scaler holds the statistics of your 25-column df, but you have since changed your df to 23 columns, so it cannot apply the inverse transform.

To fix the problem, fit the scaler on the original 23-column dataframe, and then call inverse_transform on the 23-column dataframe you want to convert back.
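
A minimal sketch of that fix, assuming a StandardScaler and made-up data (the array original is a hypothetical stand-in for the 23-column dataframe):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical 23-column data in its original (unscaled) units.
original = rng.normal(loc=5.0, scale=2.0, size=(100, 23))

# Fit on exactly the columns that will later be inverse-transformed.
scaler = StandardScaler().fit(original)

scaled = scaler.transform(original)          # shape (100, 23)
restored = scaler.inverse_transform(scaled)  # shape (100, 23), original units

round_trip_ok = np.allclose(restored, original)
print(restored.shape, round_trip_ok)
```

Because the scaler was fitted on 23 columns, the 23-column array passes through inverse_transform without a broadcast error.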

More info:
The scaler object keeps track of the information needed to perform the inverse transformation. When you fit a scaler to a dataset using the fit() method, the scaler computes per-column statistics of the data (such as mean and variance for StandardScaler, or minimum and maximum for MinMaxScaler) and stores them in its internal state. Because there is one stored value per column, the array passed to inverse_transform must have the same number of columns as the data used in fit().
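
As a small illustration of that stored state, the sketch below fits both scalers on a made-up 3x2 array and reads back the fitted attributes (mean_, var_, data_min_ and data_max_ are the attribute names scikit-learn uses; the data itself is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Tiny made-up dataset: 3 rows, 2 columns.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

std_scaler = StandardScaler().fit(data)
mm_scaler = MinMaxScaler().fit(data)

# One stored value per column, so each attribute has shape (2,).
print("mean_:    ", std_scaler.mean_)
print("var_:     ", std_scaler.var_)
print("data_min_:", mm_scaler.data_min_)
print("data_max_:", mm_scaler.data_max_)
```

Each attribute has length equal to the number of columns seen during fit(), which is where the (25,) in the error message comes from.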

Answered By: Seb