How to fit multiple datasets which have a combination of shared and non-shared parameters

Question:

I’m trying to fit multiple datasets that share some variables between datasets while others are not shared. However, I’m unsure of the steps I need to take to do this. Below I’ve shown the approach I’m trying to use (everything from ‘Issues begin here’ onwards doesn’t work and is just for illustrative purposes).

In this answer somebody is able to share parameters across datasets. Is there some way this could be adapted so that I can also have some non-shared parameters?

Does anybody have an idea how I could achieve this, or could somebody suggest a better approach to achieve the same result? Thanks.

import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
import pandas as pd
from lmfit import minimize, Minimizer, Parameters, Parameter, report_fit, Model


# Create datasets to fit
a = 1.99
start = gamma.ppf(0.001, a)
stop = gamma.ppf(.99, a)
xvals = np.linspace(start, stop, 100)
yvals = gamma.pdf(xvals, a)
data_dict = {}
for dataset in range(4):
    name = 'dataset_' + str(dataset)
    rand_offset = np.random.uniform(-.1, .1)
    noise = np.random.uniform(-.05, .05, len(yvals)) + rand_offset

    data_dict[name] = yvals + noise
df = pd.DataFrame(data_dict)

# Create some non-shared parameters
non_shared_param1 = np.random.uniform(0.009, .21, 4)
non_shared_param2 = np.random.uniform(0.01, .51, 4)

# Create the independent variable
ind_var = np.linspace(.001,100,100)

# Create a model
def model_func(time, Qi, at, vw, R, rhob_cb, al, NSP1, NSP2):
    Dt = at * vw
    Dl = al * vw

    t = time

    first_bot = 8 * np.pi * t * rhob_cb
    sec_bot = np.sqrt((np.pi * (Dl * R) * t))
    exp_top = R * np.power((NSP1 - ((t * vw)/R)), 2)
    exp_bot = 4 * Dl * t
    exp_top2 = R * np.power(NSP2, 2)
    exp_bot2 = 4 * Dt * t
    return (Qi / first_bot * sec_bot) * np.exp(- (exp_top / exp_bot) - (exp_top2 / exp_bot2))

model = Model(model_func)

### Issues begin here ###

all_results = {}
index = 0
for col in df:
    # This block assigns the correct non-shared parameter for the particular fit
    nsp1 = non_shared_param1[index]
    nsp2 = non_shared_param2[index]
    index += 1

    params = Parameters()
    at = 0.1 
    al = 0.15
    vw = 10**-4
    Dt = at * vw
    Dl = al * vw

    # Non-shared parameters
    model.set_param_hint('NSP1', value=nsp1)
    model.set_param_hint('NSP2', value=nsp2)

    # Shared and varying parameters
    model.set_param_hint('vw', value=10**-4, min=10**-10)
    model.set_param_hint('at', value=0.1)
    model.set_param_hint('al', value=0.15)

    # Shared and fixed parameters
    model.set_param_hint('Qi', value=1000, vary=True)
    model.set_param_hint('R', value=1.7, vary=True)
    model.set_param_hint('rhob_cb', value=2895, vary=True)

    # One set of parameters should be returned
    result = model.fit(df[col], time = ind_var)

    all_results[index] = result
Asked By: Jason


Answers:

A fit with lmfit always uses a single instance of a Parameters object; it does not take multiple Parameters objects.

In order to simultaneously fit multiple data sets with similar models (perhaps the same mathematical model, but expecting different parameter values for each data set), you need an objective function that concatenates the residuals from the different component models. Each of those models has to take its parameters from the single instance of Parameters(), with each parameter having a unique name.

So, to fit 2 data sets with the same function (let’s use a Gaussian, with parameters “center”, “amplitude”, and “sigma”), you might define the Parameters as

params = Parameters()
params.add('center_1',    5.,  vary=True)
params.add('amplitude_1', 10., vary=True)
params.add('sigma_1',     1.0, vary=True)
params.add('center_2',    8.,  vary=True)
params.add('amplitude_2', 3.,  vary=True)
params.add('sigma_2',     2.0, vary=True)

Then use ‘center_1’, ‘amplitude_1’, and ‘sigma_1’ to calculate the model for the first data set and ‘center_2’, etc., to calculate the model for the second, perhaps as

def residual(params, x, datasets):
    # 'gaussian' here is assumed to be a unit-amplitude peak helper,
    # e.g. something like np.exp(-(x - center)**2 / (2*sigma**2))
    model1 = params['amplitude_1'] * gaussian(x, params['center_1'], params['sigma_1'])
    model2 = params['amplitude_2'] * gaussian(x, params['center_2'], params['sigma_2'])

    resid1 = datasets[0] - model1
    resid2 = datasets[1] - model2
    return np.concatenate((resid1, resid2))

fit = lmfit.minimize(residual, params, args=(x, datasets))

As you can see from this, Parameter values are independent by default. In order to share parameter values between different datasets, you have to do that explicitly (as shown in the answer you linked).

For example, if you want to require the two sigma values to be the same, you would not change the residual function, just the Parameter definition above:

params.add('sigma_2', expr='sigma_1')

You could require the two amplitudes to add to some value:

params.add('amplitude_2', expr='10 - amplitude_1')

or perhaps you would want to ensure that ‘center_2’ is larger than ‘center_1’, but by an amount to be determined in the fit:

params.add('center_offset', value=0.5, min=0)
params.add('center_2',  expr='center_1 + center_offset')

Those are all ways to tie parameter values together. By default, they’re independent. Of course, you can also have some parameters that get used in all the models (say, just call the parameter ‘sigma’ and use it for all the models).
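To make the shared/non-shared split concrete, here is a minimal self-contained sketch along those lines (the synthetic data, starting values, and the gaussian from lmfit.lineshapes are just illustrative choices): a single ‘sigma’ is shared by both datasets, while the amplitudes and centers remain per-dataset.

import numpy as np
import lmfit
from lmfit.lineshapes import gaussian  # gaussian(x, amplitude, center, sigma)

# Synthetic data: two peaks with different amplitudes/centers but the same width
x = np.linspace(0, 15, 301)
rng = np.random.default_rng(0)
datasets = [gaussian(x, 10, 5, 1.2) + rng.normal(0, 0.1, x.size),
            gaussian(x, 3, 8, 1.2) + rng.normal(0, 0.1, x.size)]

def residual(params, x, datasets):
    p = params.valuesdict()
    # 'sigma' appears only once, so both component models use the same width
    model1 = gaussian(x, p['amplitude_1'], p['center_1'], p['sigma'])
    model2 = gaussian(x, p['amplitude_2'], p['center_2'], p['sigma'])
    return np.concatenate((datasets[0] - model1, datasets[1] - model2))

params = lmfit.Parameters()
params.add('sigma', 1.0, min=0)   # shared by both datasets
params.add('amplitude_1', 8.0)    # non-shared: one set per dataset
params.add('center_1', 4.5)
params.add('amplitude_2', 4.0)
params.add('center_2', 8.5)

fit = lmfit.minimize(residual, params, args=(x, datasets))
lmfit.report_fit(fit)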

Answered By: M Newville

lmfit is the way to go. Thanks very much for M Newville’s answer above.
I want to note that for multiple datasets of differing sizes, one needs to add nan_policy='omit' to the call:

fit = lmfit.minimize(residual, params, args=(x, datasets), nan_policy='omit')

At this point, my question is: how best to fit multiple datasets of different sizes?
See my approach:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import lmfit

# creating some easy Dataframe
data = {'x':[1, 2, 3],
        'meas1':[1, 1, 1],
        'meas2':[2, 3, 4.5]}
data2 = {'x':[1, 2, 4, 5],
         'meas3':[1, 1, 1, 5]}
df1 = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.merge(df1, df2.loc[:,['x','meas3']], on = 'x', how= 'outer')
print(df3)

# defining the fit model
def residual(par, x, df):
    model1 = par['a_1'] * x + par['c_1']
    model2 = par['a_2'] * x + par['c_2']
    model3 = par['a_3'] * x + par['c_3']

    resid1 = df.iloc[:, 1] - model1
    resid2 = df.iloc[:, 2] - model2
    resid3 = df.iloc[:, 3] - model3

    return np.concatenate((resid1, resid2, resid3))

# set Parameters
par = lmfit.Parameters()
for i in [1,2,3]:
    par.add(f'a_{i}', value=1)
    par.add(f'c_{i}', value=1)

# actual fitting
out = lmfit.minimize(residual, par, args=(df3['x'], df3), nan_policy='omit')

# plotting results
def straight(x, a, c):
    return a*x + c

plt.plot(df3['x'], df3['meas1'], 'r*')
plt.plot(df3['x'], df3['meas2'], 'g*')
plt.plot(df3['x'], df3['meas3'], 'b*')
plt.plot(df3['x'], straight(df3['x'], out.params['a_1'], out.params['c_1']), 'r--')
plt.plot(df3['x'], straight(df3['x'], out.params['a_2'], out.params['c_2']), 'g--')
plt.plot(df3['x'], straight(df3['x'], out.params['a_3'], out.params['c_3']), 'b--')
plt.show()
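For what it’s worth, an alternative that avoids the NaN padding from the outer merge is to keep each dataset with its own x-array and concatenate the per-dataset residuals directly. Here is a sketch using the same toy numbers as above; no nan_policy is needed because no NaNs are ever created.

import numpy as np
import lmfit

# Each dataset keeps its own x-array, so no merging or NaN padding is needed
x1 = np.array([1., 2., 3.])          # x for meas1 and meas2
y1 = np.array([1., 1., 1.])          # meas1
y2 = np.array([2., 3., 4.5])         # meas2
x2 = np.array([1., 2., 4., 5.])      # x for meas3
y3 = np.array([1., 1., 1., 5.])      # meas3

def residual(par, x_list, y_list):
    p = par.valuesdict()
    resids = [y - (p[f'a_{i}'] * x + p[f'c_{i}'])
              for i, (x, y) in enumerate(zip(x_list, y_list), start=1)]
    return np.concatenate(resids)

par = lmfit.Parameters()
for i in [1, 2, 3]:
    par.add(f'a_{i}', value=1)
    par.add(f'c_{i}', value=1)

out = lmfit.minimize(residual, par, args=([x1, x1, x2], [y1, y2, y3]))
lmfit.report_fit(out)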

Hope that helps, best!

Answered By: Stephan