How does MinMaxScaler work – Scaled per row or scaled for the entire data set?

Question:

I'm wondering how sklearn's MinMaxScaler works on a NumPy array.

Does it scale based on the min max values per row, or does it scale based on the min max values of the entire data set?

import sklearn.preprocessing

# get pandas DataFrame.
dataframe = self.fetch_symbol(
    symbol=symbol,
    period=None,
    lookup=False)

# get X dataframe.
X = dataframe[self.columns].to_numpy()

# apply min max scaler.
scaler = sklearn.preprocessing.MinMaxScaler()
X = scaler.fit_transform(X)
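Because fetch_symbol and self.columns aren't shown, the sketch below substitutes hypothetical random data with the same 2-D layout (rows = samples, columns = features). Inspecting the fitted data_min_ and data_max_ attributes shows what the scaler computed its statistics over.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for dataframe[self.columns].to_numpy():
# 100 rows (time steps) x 6 columns (features).
rng = np.random.default_rng(0)
X = rng.uniform(low=10.0, high=200.0, size=(100, 6))

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# The fitted statistics hold one value per column (feature), not per row.
print(scaler.data_min_.shape)  # (6,)
print(scaler.data_max_.shape)  # (6,)

# They match the column-wise (axis=0) min/max of the input.
print(np.allclose(scaler.data_min_, X.min(axis=0)))  # True
print(np.allclose(scaler.data_max_, X.max(axis=0)))  # True
```

If the scaler worked per row, these arrays would have 100 entries instead of 6.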

Below is a dump of the resulting X array, and there is no 1.0 anywhere in it. How could that be explained if the array were scaled per row?

[[[0.16046406 0.15805957 0.13419023 0.13800743 0.42891535 0.11922597]
  [0.13934693 0.17908731 0.14614396 0.1923503  0.42822784 0.12399251]
  [0.1925308  0.17908731 0.15501285 0.14426272 0.42806807 0.12856839]
  [0.14469139 0.19340406 0.15070694 0.1633544  0.42789004 0.13296123]]

 [[0.14742879 0.24297456 0.13553985 0.125562   0.48300352 0.30485521]
  [0.1262465  0.16483446 0.12275064 0.16348472 0.4821922  0.28753448]
  [0.16365769 0.19787805 0.1559126  0.19736756 0.48006021 0.26733746]
  [0.19741902 0.22306021 0.20533419 0.21926109 0.47704036 0.24956408]]

 [[0.19921137 0.21839448 0.18669666 0.18648596 0.41883789 0.11741573]
  [0.18666493 0.18279369 0.17217224 0.18987489 0.41481457 0.11741573]
  [0.18953269 0.2098939  0.19248072 0.1989151  0.41027914 0.12218477]
  [0.1991136  0.2071456  0.18470437 0.21965205 0.40481333 0.12676305]]

 ...

 [[0.34682917 0.33175915 0.36797013 0.35728155 0.40061129 0.34991894]
  [0.34269779 0.32821724 0.36283865 0.35490831 0.40061115 0.34832607]
  [0.33908283 0.32388823 0.35899004 0.35490831 0.40061589 0.34679691]
  [0.33980583 0.32369146 0.36625964 0.35490831 0.40062501 0.34532891]]

 [[0.9136542  0.87032664 0.93499907 0.93182309 0.73167466 0.84121732]
  [0.89299731 0.85714286 0.92259798 0.92944984 0.73307946 0.88873786]
  [0.88989878 0.84868162 0.91468695 0.90981661 0.73486931 0.88873786]
  [0.87110101 0.82979142 0.91618363 0.90981661 0.73669497 0.88641553]]

 [[0.62920884 0.59937033 0.64507025 0.63667745 0.59950843 0.63437614]
  [0.61412931 0.60232192 0.64507025 0.65738943 0.59995533 0.63049207]
  [0.63168767 0.6090122  0.66548928 0.66321467 0.60035125 0.62691872]
  [0.63499277 0.6111767  0.66666524 0.65846818 0.60061939 0.62363125]]]
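Note that the printout above is elided ("..."), and under column-wise scaling each column reaches 1.0 in only one row, so a partial dump can easily miss every 1.0. The sketch below uses hypothetical random data (not the poster's) to locate the rows that hold each column's maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 50 rows x 3 columns with very different ranges.
rng = np.random.default_rng(42)
X = rng.normal(loc=[100.0, 5.0, 1e6], scale=[10.0, 1.0, 1e5], size=(50, 3))

X_scaled = MinMaxScaler().fit_transform(X)

# Row index at which each column attains its scaled maximum.
rows_with_max = X_scaled.argmax(axis=0)
print(rows_with_max)
print(X_scaled[rows_with_max, np.arange(X.shape[1])])  # each about 1.0

# Only a handful of rows contain a 1.0 anywhere.
has_one = np.isclose(X_scaled, 1.0).any(axis=1)
print(has_one.sum(), "of", len(X_scaled), "rows contain a 1.0")
```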

Asked By: user7934593

||

Answers:

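MinMaxScaler scales each column (feature) independently, not per row and not over the whole array. As the documentation shows, the min and max are taken along axis 0:

X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min

where (min, max) is the feature_range parameter, (0, 1) by default, so by default X_scaled is simply X_std. Each column therefore reaches 1 (up to floating-point rounding) only in the row(s) holding that column's maximum, which is why most rows contain no 1.0.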
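To check the documented formula, here is a small sketch that computes it by hand with NumPy and compares it against MinMaxScaler, both with the default range and with an illustrative feature_range=(-1, 1) (the sample matrix and the custom range are arbitrary choices for the example).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, -2.0, 10.0],
              [4.0,  0.0, 30.0],
              [2.0,  6.0, 20.0]])

# Column-wise formula from the documentation (axis=0 -> per column).
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# With the default feature_range=(0, 1), X_std is the final result.
manual_default = X_std
sk_default = MinMaxScaler().fit_transform(X)

# For another range, X_std is then mapped onto [lo, hi].
lo, hi = -1.0, 1.0
manual_custom = X_std * (hi - lo) + lo
sk_custom = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)

print(np.allclose(manual_default, sk_default))  # True
print(np.allclose(manual_custom, sk_custom))    # True
```

The hand-written version divides by zero on a constant column, whereas scikit-learn treats a zero range as 1, so the two agree only when every column varies.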

As further proof, fitting it on np.array([[1, 0], [3, 5]]) returns array([[0., 0.], [1., 1.]]); scaling per row would instead have given [[1, 0], [0, 1]].
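The sketch below runs that 2x2 example and, for contrast, computes the two alternatives from the question (per-row and whole-array scaling) by hand. Only the first result comes from MinMaxScaler; the other two are plain NumPy illustrations.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1, 0],
              [3, 5]])

# What MinMaxScaler does: min/max per column (axis=0).
per_column = MinMaxScaler().fit_transform(X)

# Per-row alternative: min/max over axis=1 (not what MinMaxScaler does).
row_min = X.min(axis=1, keepdims=True)
row_max = X.max(axis=1, keepdims=True)
per_row = (X - row_min) / (row_max - row_min)

# Whole-array alternative: one global min/max (also not what it does).
whole_array = (X - X.min()) / (X.max() - X.min())

print(per_column)   # [[0. 0.]
                    #  [1. 1.]]
print(per_row)      # [[1. 0.]
                    #  [0. 1.]]
print(whole_array)  # [[0.2 0. ]
                    #  [0.6 1. ]]
```

Only the per-column result matches the scaler's output, and only per-row scaling would put a 1 in every row.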

Answered By: mozway