scipy, lognormal distribution – parameters

Question:

I want to fit lognormal distribution to my data, using python scipy.stats.lognormal.fit. According to the manual, fit returns shape, loc, scale parameters. But, lognormal distribution normally needs only two parameters: mean and standard deviation.

How to interpret the results from scipy fit function? How to get mean and std.dev.?

Asked By: Jakub M.

||

Answers:

The distributions in scipy are coded in a generic way wrt two parameter location and scale so that location is the parameter (loc) which shifts the distribution to the left or right, while scale is the parameter which compresses or stretches the distribution.

For the two parameter lognormal distribution, the “mean” and “std dev” correspond to log(scale) and shape (you can let loc=0).

The following illustrates how to fit a lognormal distribution to find the two parameters of interest:

In [56]: import numpy as np

In [57]: from scipy import stats

In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3)

In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3)

In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting

In [61]: shape, loc, scale
Out[61]: (2.9212650122639419, 0, 21318.029350592606)

In [62]: np.log(scale), shape  # mu, sigma
Out[62]: (9.9673084420467362, 2.9212650122639419)
Answered By: ars

I just spent some time working this out and wanted to document it here: If you want to get the probability density (at point x) from the three return values of lognorm.fit (lets call them (shape, loc, scale)), you need to use this formula:

x = 1 / (shape*((x-loc)/scale)*sqrt(2*pi)) * exp(-1/2*(log((x-loc)/scale)/shape)**2) / scale

So as an equation that is (loc is µ, shape is σ and scale is α):

x = frac{1}{(x-mu)cdotsqrt{2pisigma^2}}  cdot e^{-frac{log(frac{x-mu}{alpha})^2}{2sigma^2}}

Answered By: Chronial

I think this will help. I was looking for the same issue for a long time and finally found a solution for my problem. In my case, I was trying to fit some data to the lognormal distribution using scipy.stats.lognorm module. However, when I finally got the model parameters, I could not find a way to replicate my results using the mean and std from y data.

In the code below, I explain from the mean and std parameters how to produce a normally distributed data sample using scipy.stats.norm module. Using those data, I fit the normal model (norm_dist_fitted) and also create a normal model using mean and standard deviation (mu, sigma) extracted from the data.

Original model producing the data, fitted and produced-by-(mu-sigma)-pair is compared in a graph.

Fig1


In the next section of the code, I use the normal data to produce a lognormal-distributed sample. To do so notice that the lognormal samples will be the exponential of the original sample. Hence, the mean and standard deviation of the exponential sample will be (exp(mu) and exp(sigma)).

I fitted the produced data to a lognormal (since the log of my sample (exp(x)) is normally distributed and follow the lognormal model assumptions.

To produce a lognormal model from the mean and standard deviation of your original data (x) the code will be:

lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

However, if your data is already in the exponential space (exp(x)), then you have to use:

muX = np.mean(np.log(x))
sigmaX = np.std(np.log(x))
scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX)

Fig2

import scipy
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

mu = 10 # Mean of sample !!! Make sure your data is positive for the lognormal example 
sigma = 1.5 # Standard deviation of sample
N = 2000 # Number of samples

norm_dist = scipy.stats.norm(loc=mu, scale=sigma) # Create Random Process
x = norm_dist.rvs(size=N) # Generate samples

# Fit normal
fitting_params = scipy.stats.norm.fit(x)
norm_dist_fitted = scipy.stats.norm(*fitting_params)
t = np.linspace(np.min(x), np.max(x), 100)

# Plot normals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x, ax=ax, norm_hist=True, kde=False, label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
ax.legend(loc='lower right')
plt.show()


# The lognormal model fits to a variable whose log is normal
# We create our variable whose log is normal 'exponenciating' the previous variable

x_exp = np.exp(x)
mu_exp = np.exp(mu)
sigma_exp = np.exp(sigma)

fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
t = np.linspace(np.min(x_exp), np.max(x_exp), 100)

# Here is the magic I was looking for a long long time
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

# The trick is to understand these two things:
# 1. If the EXP of a variable is NORMAL with MU and STD -> EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# 2. If your variable (x) HAS THE FORM of a LOGNORMAL, the model will be scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX)
# with:
#    - muX = np.mean(np.log(x))
#    - sigmaX = np.std(np.log(x))


# Plot lognormals
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.distplot(x_exp, ax=ax, norm_hist=True, kde=False,
             label='Data exp(X)~N(mu={0:.1f}, sigma={1:.1f})n X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist_fitted.mean(), lognorm_dist_fitted.std()))
ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist.mean(), lognorm_dist.std()))
ax.legend(loc='lower right')
plt.show()
Answered By: nenetto

First, the loc is not a simple linear shift of the distribution, in fact, the loc has its own statistics meaning, it means samples subtract the loc will get a “standardized” lognormal, whose low bound is zero, this is quite important.

Hence, when you specified the “loc” or “floc”, you actually imposed a very strong hypnosis, that you assume those samples have a lower bound, and the lower bound is “exactly” the “loc” value. So the scipy used different algorithms to fit, i.e:
if you provide the loc information, then scipy will adopt a maximum likelihood approach to calculate the fitting parameters, if not it will use numerical solver.

Also, you can read the code:
in scipy package stats/_continuous_distns.py
line: 3889. As follow:

    def fit(self, data, *args, **kwds):
        floc = kwds.get('floc', None)
        if floc is None:
            # loc is not fixed.  Use the default fit method.
            return super(lognorm_gen, self).fit(data, *args, **kwds)

        f0 = (kwds.get('f0', None) or kwds.get('fs', None) or
              kwds.get('fix_s', None))
        fscale = kwds.get('fscale', None)

        if len(args) > 1:
            raise TypeError("Too many input arguments.")
        for name in ['f0', 'fs', 'fix_s', 'floc', 'fscale', 'loc', 'scale',
                     'optimizer']:
            kwds.pop(name, None)
        if kwds:
            raise TypeError("Unknown arguments: %s." % kwds)

        # Special case: loc is fixed.  Use the maximum likelihood formulas
        # instead of the numerical solver.

Furthermore, someone from R community might wonder why the output from python is different from R. Actually, I don’t agree to use R as a “reference”, it is just a software, different software has different flavors of algorithms.

For example, the output from R as follows is not an error, and Python or other software like Fortran used totally different algorithms:

round(3.5)
[1] 4
round(2.5)
[1] 2
Answered By: Chao Sun

Something that helped me was to consider the location and scale as parametrisation.

Instead of use x as in the standard log-normal distribution, you change to x’ = (x-location)/scale

Them the probability density function F(x’)=(1/scale)F((x-location)/scale))

More information in the link
https://en.wikipedia.org/wiki/Location%E2%80%93scale_family

Answered By: Mateus Kinasz
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.