PyTorch linear regression 1x1d, consistently wrong slope

Question:

I am learning PyTorch here, and decided to implement a very simple 1-to-1 linear regression, from height to weight.

I got this dataset: https://www.kaggle.com/datasets/mustafaali96/weight-height but any other would do nicely.

Let's import the libraries and the data about females:

import torch
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv('weight-height.csv',sep=',')
#https://www.kaggle.com/datasets/mustafaali96/weight-height
height_f=df[df['Gender']=='Female']['Height'].to_numpy()
weight_f=df[df['Gender']=='Female']['Weight'].to_numpy()
plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
plt.show()

This gives a nice scatter plot of the measured females:
[scatter plot: female height vs. weight distribution]

So far, so good.

Let's make a DataLoader:

class Data(Dataset):
  def __init__(self, X: np.ndarray, y: np.ndarray) -> None:
    # need to convert float64 to float32 else
    # will get the following error
    # RuntimeError: expected scalar type Double but found Float
    self.X = torch.from_numpy(X.reshape(-1, 1).astype(np.float32))
    self.y = torch.from_numpy(y.reshape(-1, 1).astype(np.float32))    
    self.len = self.X.shape[0]  
  def __getitem__(self, index: int) -> tuple:
    return self.X[index], self.y[index]  
  def __len__(self) -> int:
    return self.len

traindata = Data(height_f, weight_f)
batch_size = 500
num_workers = 2
trainloader = DataLoader(traindata, 
                         batch_size=batch_size, 
                         shuffle=True, 
                         num_workers=num_workers)

…linear regression model…

class linearRegression(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(linearRegression, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)
        

    def forward(self, x):
        out = self.linear(x)
        return out
model = linearRegression(1, 1)
criterion = torch.nn.MSELoss() 
optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)

…let's train it:

epochs=10
for epoch in range(epochs):
    print(epoch)
    for i, (inputs, labels) in enumerate(trainloader):
        
        outputs=model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

This prints 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Now let's see what our model gives:

range_height_f=torch.linspace(height_f.min(),height_f.max(),150)

plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
pred=model(range_height_f.reshape(-1, 1))
plt.scatter(range_height_f, pred.detach().numpy(), c ="green",alpha=0.1)


[plot: predicted line with a clearly wrong slope over the data]

Why does it do this? Why the wrong slope? A consistently wrong slope, I might add.
Whatever I change (optimizer, batch size, epochs, females to males...), it gives me this very wrong slope, and I really don't get why.

Edit 1: Added loss tracking, here is the plot:
[plot: training loss]
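(How the loss was recorded isn't shown in the post; a minimal sketch, assuming the per-batch loss is simply collected inside the training loop above and plotted afterwards; the losses list is my own addition:)

losses = []
for epoch in range(epochs):
    for inputs, labels in trainloader:
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())  # keep the batch loss for plotting

plt.plot(losses)
plt.xlabel('batch')
plt.ylabel('MSE loss')
plt.show()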

Edit 2: I decided to explore a bit, and made a regression with sklearn:

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
X_train, X_test, y_train, y_test = train_test_split(height_f, weight_f, test_size = 0.25)

regr = LinearRegression()
regr.fit(X_train.reshape(-1,1), y_train)
plt.scatter(height_f, weight_f, c ="red",alpha=0.1)
range_pred=regr.predict(range_height_f.reshape(-1, 1))
range_pred
plt.scatter(range_height_f, range_pred, c ="green",alpha=0.1)

which gives the following regression, which looks nice:
[plot: sklearn regression line over the data]

t = torch.from_numpy(height_f.astype(np.float32))
p=regr.predict(t.reshape(-1,1))
p=torch.from_numpy(p).reshape(-1,1)


w= torch.from_numpy(weight_f.astype(np.float32)).reshape(-1,1)

print(criterion(p,w).item())

However, in this case criterion = 100.65161998527695.

PyTorch, in its turn, converges to about 210.
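(For an apples-to-apples comparison, the PyTorch model's MSE can be evaluated on the full dataset the same way; this snippet is my illustration, not from the original post:)

# full-dataset MSE of the trained PyTorch model, on the same (female) samples
with torch.no_grad():
    full_pred = model(traindata.X)
print(criterion(full_pred, traindata.y).item())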

Edit 3:
Changed the optimizer from SGD to Adam:

#optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)
optimizer = torch.optim.Adam(model.parameters(), lr=0.5)

The lr is larger in this case, which yields an interesting, but consistent, result.
Here is the loss:
[plot: Adam training loss]
And here is the proposed regression:
[plot: regression line obtained with Adam]

And here is the log of the loss criterion for the Adam optimizer as well:
[log of the loss over the last epochs]

Asked By: Timo Junolainen


Answers:

As far as I can see, the code works as intended. I suggest adding an intercept term, though.

Just for clarification, I do not add code to my answer as I believe the issue is purely a theoretical one. Read up on the simple linear regression model: if the data has a non-zero mean (as is the case here), you can't possibly match both the mean and the slope of the data with merely one coefficient.
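(A rough worked version of this point, using the mean values quoted later in the thread: a line through the origin, y ≈ a·x, has least-squares slope a = Σ xᵢyᵢ / Σ xᵢ², and because the data sits far from zero this sum is dominated by the means, giving a ≈ ȳ/x̄ ≈ 135.9/63.7 ≈ 2.1. That is roughly the too-flat slope visible in the question's plot, since with lr = 1e-5 the bias can barely move from its small initial value, and it is much smaller than the slope a fit with a freely adjusted intercept finds.)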

Answered By: titfortat

I think your issue stems from the data not being centered around zero.

See this thread for another example where "centering" the data prior to training has a huge effect on the convergence of SGD optimization.


Update (Dec 29th, 2022):

TL;DR
It’s all about normalization/initialization.

In detail:
Your data is not centered around 0 and it is not scaled "nicely". This makes it very difficult for SGD (and all its variants) to optimize.

In this answer I showed how centering the training data (subtracting the mean and dividing by the std) solves this problem.
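(For reference, a minimal sketch of that centering approach, assuming the statistics are taken from the raw numpy arrays; the variable names are mine:)

# standardize both the input and the target before building the Dataset
mu_x, sig_x = height_f.mean(), height_f.std()
mu_y, sig_y = weight_f.mean(), weight_f.std()
traindata = Data((height_f - mu_x) / sig_x, (weight_f - mu_y) / sig_y)
trainloader = DataLoader(traindata, batch_size=500, shuffle=True)

# with standardized data the default init and an ordinary learning rate behave well
model = linearRegression(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# predictions in the original units: weight = sig_y * model(x_standardized) + mu_y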

Here I’ll show you how to leave your data as-is, but change the initialization of the weights to solve your problem.

Let m_x, s_x be the mean and std of X, and m_y, s_y be the mean and std of y.
When PyTorch initializes the weights a and b of the linear layer y = aX + b, it assumes X and y have zero mean and unit variance. This is NOT the case here. Far from it.
Therefore, we need to re-adjust the initial a and b accordingly.
Here's the math for it:
[image: derivation of the adjusted initial weight and bias]
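(The derivation image is not reproduced here; reconstructed from the definitions above and the code below, it goes roughly as follows. With standardized variables x' = (X - m_x)/s_x and y' = (y - m_y)/s_y, PyTorch's default initialization is suited to the model y' = a·x' + b. Substituting back into the original units:

(y - m_y)/s_y = a·(X - m_x)/s_x + b
y = (a·s_y/s_x)·X + s_y·(b - a·m_x/s_x + m_y/s_y)

So the initial weight should be rescaled by s_y/s_x, and the initial bias set to s_y·(b - a·m_x/s_x + m_y/s_y). The code below simplifies the bias by taking a = 1, b = 0, i.e. bias = m_y - (s_y/s_x)·m_x.)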

And the code:

mu_x, sig_x, mu_y, sig_y = traindata.X.mean().item(), traindata.X.std().item(), traindata.y.mean().item(), traindata.y.std().item()
# just for fun, here are the values:
# (63.7087, 2.6962, 135.8601, 19.0225)

# start a fresh model and adjust its initial values:
model = linearRegression(1, 1)
model.linear.weight.data *= (sig_y / sig_x)  # rescale the unit-variance init to the data's scale
model.linear.bias.data = sig_y * (-(mu_x/sig_x) + (mu_y/sig_y))  # = mu_y - (sig_y/sig_x) * mu_x

# now you are good to go! continue optimizing like you originally did:
# init an optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.00001)

# optimize for 10 epochs (now you don't need this much, you can even increase the learning rate...)
epochs=10
for epoch in range(epochs):
    print(epoch)
    for i, (inputs, labels) in enumerate(trainloader):
        
        outputs=model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The loss curve looks like this:
[plot: loss curve after adjusting the initialization]
And the optimizer converged to

In []: loss.item()
Out[]: 100.9453125

Similar to that of sklearn.linear_model.LinearRegression.

Plotting the prediction on the data:

[plot: the fitted regression line over the data]

Answered By: Shai

The issue seems to be feature scaling/centering. Classic linear regression, which does not use gradient descent, is able to derive the solution without any scaling.

For SGD however, it is much harder to converge this way.

Try adding this before implementing the Dataset:

from sklearn.preprocessing import StandardScaler
height_f = StandardScaler().fit_transform(height_f.reshape(-1, 1))

I was able to achieve a good result using a learning rate of 0.1 after that.
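(A sketch of the full flow under that suggestion, reusing the Data and linearRegression classes from the question; the rebuilt loader, model and optimizer here are my additions:)

from sklearn.preprocessing import StandardScaler

# scale the input feature, then rebuild the dataset and loader from the question
height_scaled = StandardScaler().fit_transform(height_f.reshape(-1, 1)).ravel()
traindata = Data(height_scaled, weight_f)
trainloader = DataLoader(traindata, batch_size=500, shuffle=True)

# fresh model, larger learning rate as suggested above
model = linearRegression(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    for inputs, labels in trainloader:
        loss = criterion(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()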

Answered By: dx2-66

It looks like the DataLoader + SGD combination is not handling the intercept properly. You should try adding a column of 1's to the data.

scikit-learn's LinearRegression and SGDRegressor behave this way too if you set fit_intercept=False:

[plot: LinearRegression and SGDRegressor fits with fit_intercept=False]

Minimal reproducible example for linear regression with no intercept:

from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv("weight-height.csv")

X = df.Height.to_numpy().reshape(-1, 1)
y = df.Weight.to_numpy()

lr = LinearRegression(fit_intercept=False).fit(X, y)

X_test = np.linspace(df.Height.min(), df.Height.max()).reshape(-1, 1)
y_pred = lr.predict(X_test)

plt.scatter(df.Height, df.Weight, alpha=0.1)
plt.plot(X_test, y_pred, color="black")

[Plot showing a diagonal blob, with a line that does not fit it correctly. It looks like the figure the original poster had an issue with.]
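(Conversely, a sketch of the suggested fix applied to the same example: appending a column of 1's lets the model learn the intercept even with fit_intercept=False. This snippet is my illustration, not part of the original answer.)

# append a constant column of 1's; its coefficient plays the role of the intercept
X_ones = np.hstack([X, np.ones_like(X)])
lr_ones = LinearRegression(fit_intercept=False).fit(X_ones, y)
print(lr_ones.coef_)  # [slope, intercept], matching the fit_intercept=True solution

X_test_ones = np.hstack([X_test, np.ones_like(X_test)])
plt.plot(X_test, lr_ones.predict(X_test_ones), color="green")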

Answered By: Alexander L. Hayes