Curve fitting in Python compared to Excel (variance in results)

Question:

I’m fairly new to Python, although I have used MATLAB quite a bit before. I’m currently trying to fit an exponential curve to data shown on a semi-logarithmic plot. Below is my current code.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

time = np.array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260])

absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160, 0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])


def expf(x, a, b):
    return a * np.exp(b*x)


popt, pcov = curve_fit(expf, time, absorbance, p0=[0.02,0.0141], sigma=np.sqrt(absorbance))
a = popt[0]
b = popt[1]

plt.scatter(time, absorbance, label='data')
plt.plot(time, expf(time, a, b), 'r-', label='exponential fit')
plt.yscale('log')
plt.legend()
plt.show()


# r2 value
y_pred = expf(time, a, b)
SS_res = np.sum((absorbance - y_pred)**2)
SS_tot = np.sum((absorbance - np.mean(absorbance))**2)
r2 = 1 - (SS_res / SS_tot)
print("a =", a, "and b =", b)
print("R-squared value:", '{:.4}'.format(r2))

[Image: Python example]

This gives:

a = 0.0368729243337103 and b = 0.009635937402567983
R-squared value: 0.6657

However, in Excel I get different values for a, b and R² with the same data, and I am not sure what causes this. If I enter the b value from Python into Excel I get the same result, but I wonder why the two differ. Is the difference significant? Does Python just use many more digits than Excel? Thanks in advance.

[Image: Excel example]

I’ve tried weighting the Python fit, but since Excel’s fit is unweighted, I thought the Python fit shouldn’t be weighted either.
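The curve_fit call in the code above passes sigma=np.sqrt(absorbance), which makes it a weighted fit. A minimal sketch of running the same model with and without that weighting, so the two parameter sets can be compared directly (the names popt_w and popt_u are illustrative only):

```python
import numpy as np
from scipy.optimize import curve_fit

time = np.array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160,
                       0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])


def expf(x, a, b):
    return a * np.exp(b * x)


# Weighted fit, as in the question: each residual is divided by sqrt(absorbance)
popt_w, _ = curve_fit(expf, time, absorbance, p0=[0.02, 0.0141],
                      sigma=np.sqrt(absorbance))

# Unweighted fit: sigma omitted, so every point counts equally in y
popt_u, _ = curve_fit(expf, time, absorbance, p0=[0.02, 0.0141])

print("weighted:   a = %.5g, b = %.5g" % tuple(popt_w))
print("unweighted: a = %.5g, b = %.5g" % tuple(popt_u))
```

Neither of these is the same criterion as a fit on ln(y), so neither is expected to reproduce Excel exactly.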

Asked By: JonasDenmark


Answers:

scipy.optimize.curve_fit uses an iterative numerical algorithm (nonlinear least squares) that attempts to find the best fit for an arbitrary function, but it is not guaranteed to return the optimal solution; depending on the starting guess it may stop at a local minimum.
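One common way to guard against a poor local minimum is to restart curve_fit from several initial guesses and keep the result with the lowest residual. A minimal sketch, assuming an unweighted fit; the list of starting points is an arbitrary choice, and sse is a hypothetical helper:

```python
import numpy as np
from scipy.optimize import curve_fit

time = np.array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160,
                       0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])


def expf(x, a, b):
    return a * np.exp(b * x)


def sse(params):
    # Sum of squared residuals in y, the quantity an unweighted curve_fit minimizes
    return float(np.sum((absorbance - expf(time, *params)) ** 2))


# Hypothetical starting guesses for (a, b); the values are arbitrary
starts = [(0.01, 0.005), (0.02, 0.0141), (0.05, 0.01), (0.1, 0.002)]

results = []
for p0 in starts:
    try:
        popt, _ = curve_fit(expf, time, absorbance, p0=p0, maxfev=10000)
        results.append((sse(popt), popt))
    except RuntimeError:
        # curve_fit raises RuntimeError when it fails to converge within maxfev
        continue

best_sse, best_params = min(results, key=lambda r: r[0])
print("best a = %.5g, b = %.5g, SSE = %.5g" % (best_params[0], best_params[1], best_sse))
```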

I am guessing that in Excel you used curve fitting for the exponential function y = a*exp(b*x). For this specific type of function, the fit can be computed directly by taking the natural log of both sides of the equation, which gives log y = b*x + log a. This expresses log y as a linear function of x, so we can solve for log a and b with the usual linear least squares method and then compute a from log a. Note that this is the least-squares solution in log space, which is not the same as least squares in y.
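For reference, the linearized problem is ordinary simple linear regression of z = ln y on x, which has the standard closed-form solution (it minimizes the squared error in ln y, not in y):

```latex
\min_{b,\,\ln a} \sum_{i=1}^{n} \left(z_i - b\,x_i - \ln a\right)^2
\quad\Longrightarrow\quad
b = \frac{\sum_{i}(x_i - \bar{x})(z_i - \bar{z})}{\sum_{i}(x_i - \bar{x})^2},
\qquad
\ln a = \bar{z} - b\,\bar{x},
\qquad z_i = \ln y_i
```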

Here is how to implement this procedure using numpy:

import numpy as np

time = np.array([0, 20, 40, 60, 80, 100, 120, 
                 140, 160, 180, 200, 220, 240, 260])

absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160, 
                       0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])

# Design matrix with columns [x, 1], so the solution vector is [b, log a]
b, log_a = np.linalg.lstsq(np.c_[time, np.ones_like(time)],
                           np.log(absorbance),
                           rcond=None)[0]
a = np.exp(log_a)

print(f"{b=}\n{a=}")

It gives:

b=0.012147124659447353
a=0.02641989005087254
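The same straight-line fit in log space can also be obtained with np.polyfit at degree 1, which returns the slope first and the intercept second. If Excel's R² for this fit is computed on ln(y) rather than on y (this is my understanding of how Excel reports it for an exponential trendline, but treat it as an assumption), the comparable Python number is the log-space R². A minimal sketch:

```python
import numpy as np

time = np.array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160,
                       0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])

# Degree-1 polynomial fit of ln(y) against x; coefficients come back highest power first
b, log_a = np.polyfit(time, np.log(absorbance), 1)
a = np.exp(log_a)

# R^2 of the straight-line fit in log space (assumed to be what Excel reports)
log_y = np.log(absorbance)
log_pred = b * time + log_a
r2_log = 1 - np.sum((log_y - log_pred) ** 2) / np.sum((log_y - log_y.mean()) ** 2)

print(f"{b=}\n{a=}\n{r2_log=:.4f}")
```

This R² is not directly comparable with the one computed on y in the question's code, since the two measure different residuals.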
Answered By: bb1

It is not surprising that the results from Excel and Python are not exactly the same, because they don’t correspond to the same fitted equation.

Python (with what you coded) fits the function y = a*exp(b*x) by minimizing the mean square error with respect to y.

Excel fits the function ln(y) = b*x + ln(a) by minimizing the mean square error with respect to ln(y). This is roughly equivalent to fitting y = a*exp(b*x) by minimizing the mean square RELATIVE error with respect to y.
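The approximation comes from a first-order expansion of the log residual, ln(1 + u) ≈ u, which holds when each fitted value ŷ_i = a*exp(b*x_i) is close to the observed y_i:

```latex
\ln y_i - \ln \hat{y}_i
= \ln\!\left(1 + \frac{y_i - \hat{y}_i}{\hat{y}_i}\right)
\approx \frac{y_i - \hat{y}_i}{\hat{y}_i}
\quad\Longrightarrow\quad
\sum_i \left(\ln y_i - \ln \hat{y}_i\right)^2
\approx \sum_i \left(\frac{y_i - \hat{y}_i}{\hat{y}_i}\right)^2
```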

Of course, relative error is not the same criterion as absolute error, so one cannot expect exactly the same result. The comparison is shown in the table below.

[Image: comparison table of the Python and Excel fits]
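A comparison of this kind can be sketched in code, assuming the two criteria described above: fit once in log space and once by unweighted least squares in y, then score both fits under both criteria. The helpers sse_y and sse_log are hypothetical names; curve_fit is started from the log-space fit so that it refines that solution rather than searching from an arbitrary guess.

```python
import numpy as np
from scipy.optimize import curve_fit

time = np.array([0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260])
absorbance = np.array([0.02, 0.02, 0.042, 0.049, 0.094, 0.130, 0.160,
                       0.159, 0.205, 0.290, 0.440, 0.490, 0.310, 0.317])


def expf(x, a, b):
    return a * np.exp(b * x)


# Fit 1: straight line in log space (least squares on ln y)
b_log, log_a = np.polyfit(time, np.log(absorbance), 1)
params_log = np.array([np.exp(log_a), b_log])

# Fit 2: unweighted nonlinear least squares on y, started from fit 1
params_lin, _ = curve_fit(expf, time, absorbance, p0=params_log)


def sse_y(p):
    # Absolute-error criterion: squared residuals in y
    return float(np.sum((absorbance - expf(time, *p)) ** 2))


def sse_log(p):
    # Log-space criterion: squared residuals in ln(y)
    return float(np.sum((np.log(absorbance) - np.log(expf(time, *p))) ** 2))


print(f"{'fit':<12}{'a':>10}{'b':>10}{'SSE(y)':>10}{'SSE(ln y)':>11}")
for name, p in [("log-linear", params_log), ("curve_fit", params_lin)]:
    print(f"{name:<12}{p[0]:>10.4g}{p[1]:>10.4g}{sse_y(p):>10.4g}{sse_log(p):>11.4g}")
```

Each fit should come out ahead under its own criterion: the log-linear fit has the smaller SSE(ln y), and the curve_fit result has the smaller SSE(y).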

Answered By: JJacquelin