How to perform a linear regression with a forced gradient in Python?

Question:

I am trying to do a linear regression on some limited and scattered data. I know from theory that the gradient should be 1, but the line may have a y-offset. I found plenty of resources on how to force the intercept in a linear regression, but none on forcing the gradient. I need the linear regression statistics to be reported and the gradient to be precisely 1.
Would I need to calculate the statistics manually? Or is there a way to use a package like "statsmodels", "scipy", or "scikit-learn"? Or do I need to use a Bayesian approach with prior knowledge of the gradient?

Here is a graphical example of what I am trying to achieve.

import numpy as np
import matplotlib.pyplot as plt

# Generate random data to illustrate the point

n = 20
x = np.random.uniform(10, 20, n)
y = x - np.random.normal(1, 1, n) # Noise with mean 1, so the data sit roughly 1 below the 1:1 line

plt.scatter(x, y, ec="k", label="Measured data")

true_x = np.array((8, 20))
plt.plot(true_x, true_x, "k--") # 1:1 line
plt.plot(true_x, true_x-1, "r:", label="Forced gradient") # Theoretical line

m, c = np.polyfit(x, y, 1)
plt.plot(true_x, true_x*m + c, "g:", label="Linear regression")

plt.xlabel("Theoretical value")
plt.ylabel("Measured value")
plt.legend()

Example of forced gradient vs linear regression in random data

Asked By: DouglasCoenen


Answers:

I suggest using scipy.optimize.curve_fit, which has the benefit of being flexible and easy to use for non-linear regressions as well. You just need to define a model function that represents a line with a fixed gradient and takes the offset as its only free parameter:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a):
    gradient = 1 # fixed gradient, not optimized
    return gradient * x + a

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5) # true line: gradient 1, offset 2.5
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')

popt, pcov = curve_fit(func, xdata, ydata) # only the offset a is optimized
print(popt)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f' % tuple(popt))

plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

That generates a plot of the noisy data together with the fitted line of gradient 1.

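Since the question also asks for the regression statistics to be reported, here is a minimal follow-up sketch, assuming the xdata, ydata, popt and pcov variables from the snippet above. With the gradient pinned to 1, the least-squares offset is simply the mean of ydata - xdata, and the diagonal of pcov gives its variance; for a fuller statistics table (standard error, t, p-value, confidence interval), one option is an intercept-only statsmodels fit on the residuals ydata - xdata:

import numpy as np
import statsmodels.api as sm

# Standard error of the fitted offset from curve_fit's covariance matrix
perr = np.sqrt(np.diag(pcov))
print("offset a = %.3f +/- %.3f" % (popt[0], perr[0]))

# Closed-form check: with the slope fixed at 1, the least-squares offset
# is just the mean of the residuals ydata - xdata
print("mean(ydata - xdata) =", np.mean(ydata - xdata))

# Intercept-only OLS on the residuals reports the usual regression statistics
res = sm.OLS(ydata - xdata, np.ones_like(xdata)).fit()
print(res.summary())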

Answered By: Caridorc