How to reproduce the behaviour of Ridge(normalize=True)?

Question:

This block of code:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

X = 'some_data'
y = 'some_target'

penalty = 1.5e-5
A = Ridge(normalize=True, alpha=penalty).fit(X, y)

triggers the following warning:

FutureWarning: 'normalize' was deprecated in version 1.0 and will be removed in 1.2.
If you wish to scale the data, use Pipeline with a StandardScaler in a preprocessing stage. To reproduce the previous behavior:

from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(with_mean=False), Ridge())

If you wish to pass a sample_weight parameter, you need to pass it as a fit parameter to each step of the pipeline as follows:
kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X, y, **kwargs)

Set parameter alpha to: original_alpha * n_samples. 
  warnings.warn(
Ridge(alpha=1.5e-05)
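
For reference, the sample_weight routing described in the warning looks like this when written out in full (a minimal sketch with made-up data; the step names come from make_pipeline's automatic naming):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))     # made-up data, for illustration only
y_demo = rng.normal(size=100)
sample_weight = rng.uniform(size=100)

model = make_pipeline(StandardScaler(with_mean=False), Ridge())
# Build '<step_name>__sample_weight' keys so every step receives the weights.
kwargs = {s[0] + '__sample_weight': sample_weight for s in model.steps}
model.fit(X_demo, y_demo, **kwargs)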

But that code gives me completely different coefficients, which is expected, since normalisation and standardisation are different things.

B = make_pipeline(StandardScaler(with_mean=False), Ridge(alpha=penalty))
# Fit the scaler, then the ridge step on the scaled data (equivalent to B.fit(X, y)).
B[1].fit(B[0].fit_transform(X), y)

Output:

A.coef_[0], B[1].coef_[0]
(124.87330648168594, 125511.75051106009)

The result still does not match if I set alpha = penalty * n_features.

Output:

A.coef_[0], B[1].coef_[0]
(124.87330648168594, 114686.09835548172)

even though Ridge() applied a slightly different normalization than I expected; the old docs say:

the regressor X will be normalized by subtracting mean and dividing by l2-norm

So what is the proper way to use ridge regression with normalization, given that the l2-norm scaling apparently has to be applied by transforming the data yourself and fitting again? Nothing comes to mind for doing this with scikit-learn's Ridge, especially from version 1.2 onward.
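
For reference, the old normalize=True preprocessing described in the quoted docs can be reproduced by hand; a minimal sketch, assuming X is a numeric array or DataFrame:

import numpy as np

# What normalize=True did internally, per the docs quoted above:
X_centered = X - X.mean(axis=0)
l2_norms = np.linalg.norm(X_centered, axis=0)  # per-column l2-norms
X_normalized = X_centered / l2_norms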


Prepare the data for experimenting:

import pandas as pd

url = 'https://drive.google.com/file/d/1bu64NqQkG0YR8G2CQPkxR1EQUAJ8kCZ6/view?usp=sharing'
url = 'https://drive.google.com/uc?id=' + url.split('/')[-2]
data = pd.read_csv(url, index_col=0)

X = data.iloc[:,:15]
y = data['target']
Asked By: Rossin


Answers:

The difference is that the coefficients reported with normalize=True are to be applied directly to the unscaled inputs, whereas the pipeline approach applies its coefficients to the model’s inputs, which are the scaled features.

You can "normalize" (an unfortunate overloading of the word) the coefficients by multiplying/dividing by the features’ standard deviation. Together with the change to penalty suggested in the future warning, I get the same outputs:

import numpy as np

np.allclose(A.coef_, B[1].coef_ / B[0].scale_)
# True

(I’ve tested using sklearn.datasets.load_diabetes.)
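
For concreteness, a minimal end-to-end sketch of that check on load_diabetes; the A line assumes a scikit-learn version before 1.2, where normalize=True is still accepted:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
penalty = 1.5e-5

# Old behaviour; runs only on scikit-learn < 1.2, where normalize still exists.
A = Ridge(normalize=True, alpha=penalty).fit(X, y)

# Pipeline replacement, with alpha scaled by n_samples as the warning suggests.
B = make_pipeline(
    StandardScaler(with_mean=False),
    Ridge(alpha=penalty * X.shape[0]),
).fit(X, y)

# B's coefficients act on the scaled features; dividing by the scaler's
# per-feature scale_ expresses them in the original (unscaled) units.
print(np.allclose(A.coef_, B[1].coef_ / B[0].scale_))  # True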

Answered By: Ben Reiniger