parameter b diverges when calculating gradient descent for linear regression
Question:
I have two arrays x and y:
import numpy as np

x = np.arange(-500, 500, 3)
y = 2.5*x + 5
I want to perform a linear regression by calculating gradient descent
I implemented a gradient descent function:
def gradient_descent(x, y, alpha, epochs):
    w = 0.0
    b = 0.0
    n = len(x)
    for a in range(epochs):
        if a % 100 == 0: print(w, b)
        for i in range(n):
            new_w = (-2*alpha*x[i]*(y[i]-w*x[i]+b))/n
            new_b = (-2*alpha*(y[i]-w*x[i]+b))/n
            w -= new_w
            b -= new_b
    return w, b
With hyperparameters alpha=0.0005 and epochs=10000, it calculates w well, but b diverges to infinity. How can I fix it?
Answers:
You were close, but you are missing parentheses in both formulas:
Not only would b diverge to infinity like this, but probably w as well (just more slowly).
See the missing parentheses below:
new_w = (-2*alpha*x[i]*(y[i] - (w*x[i]+b)))/n
new_b = (-2*alpha*(y[i] - (w*x[i]+b)))/n
The difference from your code is that -w*x[i]+b != -(w*x[i]+b): without the parentheses, the minus sign applies only to the w*x[i] term, so b is added to the error instead of subtracted from it.