Difference between Standard scaler and MinMaxScaler

Question:

What is the difference between MinMaxScaler() and StandardScaler()?

mms = MinMaxScaler(feature_range=(0, 1)) (used in one machine learning model)

sc = StandardScaler() (in another machine learning model, StandardScaler was used instead of MinMaxScaler)

Asked By: Chakra


Answers:

From the scikit-learn site:

StandardScaler removes the mean and scales the data to unit variance.
However, outliers influence the empirical mean and standard deviation,
which shrinks the range of the transformed feature values. Note in
particular that because the outliers on each feature have different
magnitudes, the spread of the transformed data on each feature can be
very different: in the scikit-learn example, most of the data lie in
the [-2, 4] range for the transformed median income feature, while the
same data is squeezed into the smaller [-0.2, 0.2] range for the
transformed number of households.

StandardScaler therefore cannot guarantee balanced feature scales in
the presence of outliers.

MinMaxScaler rescales the data set such that all feature values are in
the range [0, 1]. However, in the same example, this scaling
compresses all inliers into the narrow range [0, 0.005] for the
transformed number of households.
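The outlier effect described above is easy to reproduce. The toy data below is hypothetical, not taken from the scikit-learn example:

```python
# Sketch: a single outlier squeezes MinMaxScaler output (toy data)
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [1000.0]])  # last row is an outlier
scaled = MinMaxScaler().fit_transform(X)

# The three inliers are crammed near 0 while the outlier maps to 1
print(scaled.ravel())
```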

Answered By: Simas Joneliunas

MinMaxScaler(feature_range=(0, 1)) will transform each value in the column proportionally into the range [0, 1]. Use this as the first scaler choice when transforming a feature, as it preserves the shape of the dataset (no distortion).

StandardScaler() will transform each value in the column so that the column has mean 0 and standard deviation 1; i.e., each value is normalized by subtracting the mean and dividing by the standard deviation. Use StandardScaler if you know the data distribution is normal.
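A quick sketch of both behaviors on a small hypothetical column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])  # one feature column

mms = MinMaxScaler(feature_range=(0, 1))
x_minmax = mms.fit_transform(X)  # proportional mapping into [0, 1]

sc = StandardScaler()
x_std = sc.fit_transform(X)      # mean 0, standard deviation 1

print(x_minmax.ravel())           # [0.  0.333...  0.666...  1.]
print(x_std.mean(), x_std.std())  # approximately 0.0 and 1.0
```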

If there are outliers, use RobustScaler(). Alternatively, you could remove the outliers and use either of the above two scalers (the choice depends on whether the data is normally distributed).
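A minimal sketch of RobustScaler on toy data with one outlier (the values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

# RobustScaler centers on the median and scales by the interquartile range,
# so the outlier has little effect on how the inliers are transformed.
r = RobustScaler().fit_transform(X)

# Inliers stay in a small range around 0; the outlier remains far out
print(r.ravel())
```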

Additional note: if the scaler is fitted before train_test_split, data leakage will occur. Fit the scaler on the training set only, after train_test_split, and then apply the fitted scaler to the test set.
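A sketch of the leak-free ordering, using hypothetical data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(-1, 1)  # hypothetical feature
y = np.arange(20)

# Split first, then fit the scaler on the training portion only
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

sc = StandardScaler()
X_train_s = sc.fit_transform(X_train)  # statistics come from training data
X_test_s = sc.transform(X_test)        # test data reuses those statistics
```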

Answered By: perpetualstudent

Many machine learning algorithms perform better when numerical input variables are scaled to a standard range.
Scaling the data helps normalize it within a particular range.

When MinMaxScaler is used, the transformation is also known as normalization; it maps all values into the range [0, 1].
The formula is x_scaled = (value - min) / (max - min).

StandardScaler performs standardization; for roughly normal data, the transformed values typically fall in about the [-3, +3] range (though they are not bounded).
The formula is z = (x - mean) / standard_deviation.
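The two formulas above can be checked directly with NumPy on toy values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Min-max normalization: (value - min) / (max - min)
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # [0.  0.333...  0.666...  1.]

# Standardization: (value - mean) / standard deviation
z = (x - x.mean()) / x.std()
print(z.mean(), z.std())  # approximately 0.0 and 1.0
```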

Answered By: Manoj Nahak

Before choosing MinMaxScaler or StandardScaler, you should know the distribution of your dataset.

StandardScaler rescales a dataset to have a mean of 0 and a standard deviation of 1. Standardization is very useful when features have varying scales and the algorithm assumes the data have a Gaussian distribution.

Normalization with MinMaxScaler rescales a dataset so that each value falls between 0 and 1. It is useful when features have varying scales and the algorithm makes no assumptions about the distribution, for example when the distribution is unknown or known not to be Gaussian.
