# scipy, lognormal distribution – parameters

## Question:

I want to fit a lognormal distribution to my data using Python's `scipy.stats.lognorm.fit`. According to the manual, `fit` returns *shape, loc, scale* parameters. However, a lognormal distribution normally needs only two parameters: mean and standard deviation.

How should I interpret the results of scipy's `fit` function? How do I get the mean and standard deviation?

## Answers:

The distributions in scipy are coded in a generic way with respect to two parameters, location and scale: `loc` shifts the distribution to the left or right, while `scale` compresses or stretches it.

For the two-parameter lognormal distribution, the “mean” and “std dev” (those of the underlying normal, i.e. of log(x)) correspond to log(`scale`) and `shape` (you can fix `loc=0`).

The following illustrates how to fit a lognormal distribution to find the two parameters of interest:

```
In [56]: import numpy as np
In [57]: from scipy import stats
In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3)
In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3)
In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting
In [61]: shape, loc, scale
Out[61]: (2.9212650122639419, 0, 21318.029350592606)
In [62]: np.log(scale), shape # mu, sigma
Out[62]: (9.9673084420467362, 2.9212650122639419)
```
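Note that `mu` and `sigma` above describe log(sample), not the sample itself. If the mean and standard deviation of the lognormal variable are what you need, they follow from the standard closed-form expressions. The snippet below is a minimal sketch with illustrative values (`mu = 1.0`, `sigma = 0.5` are assumptions chosen for the demo, not values from the session above); it compares those expressions with what the frozen scipy distribution reports.

```python
import numpy as np
from scipy import stats

# Illustrative parameters of log(X); these values are assumptions for the demo.
mu, sigma = 1.0, 0.5

# scipy parametrisation of the two-parameter lognormal: s=sigma, loc=0, scale=exp(mu)
dist = stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

# Closed-form moments of X itself (not of log(X))
mean_x = np.exp(mu + sigma**2 / 2)
std_x = np.sqrt((np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2))

print(dist.mean(), mean_x)        # the two numbers should agree
print(dist.std(), std_x)          # likewise
print(dist.median(), np.exp(mu))  # the median of X equals the scale
```

After a fit with `floc=0`, the same numbers are available from `stats.lognorm(shape, loc, scale).mean()` and `.std()`.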

I just spent some time working this out and wanted to document it here: if you want to get the probability density (at point `x`) from the three return values of `lognorm.fit` (let's call them `(shape, loc, scale)`), you need to use this formula:

```
pdf = 1 / (shape * ((x - loc) / scale) * sqrt(2 * pi)) * exp(-0.5 * (log((x - loc) / scale) / shape) ** 2) / scale
```

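Written as an equation (`loc` is `µ`, `shape` is `σ` and `scale` is `α`):

$$f(x) = \frac{1}{\alpha}\cdot\frac{1}{\sigma\,\frac{x-\mu}{\alpha}\,\sqrt{2\pi}}\,\exp\!\left(-\frac{1}{2}\left(\frac{\ln\frac{x-\mu}{\alpha}}{\sigma}\right)^{2}\right), \qquad x > \mu$$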
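The density formula can be checked numerically against scipy's own implementation. This is a small sketch; the helper name `lognorm_pdf` and the parameter values are made up for illustration.

```python
import numpy as np
from scipy import stats

def lognorm_pdf(x, shape, loc, scale):
    """Density of scipy's three-parameter lognormal, written out by hand."""
    z = (x - loc) / scale  # standardised variable, must be positive
    return np.exp(-0.5 * (np.log(z) / shape) ** 2) / (shape * z * np.sqrt(2 * np.pi)) / scale

# Arbitrary illustrative parameters; every x must be greater than loc.
shape, loc, scale = 0.8, 1.5, 3.0
x = np.linspace(2.0, 20.0, 50)

manual = lognorm_pdf(x, shape, loc, scale)
reference = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
print(np.allclose(manual, reference))
```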

I think this will help. *I was struggling with the same issue for a long time and finally found a solution for my problem*. In my case, **I was trying to fit some data to the lognormal distribution using the `scipy.stats.lognorm` module.** However, once I had the model parameters, I could not find a way to reproduce my results from the mean and std of my data.

In the code below, I start from the mean and std parameters and produce a normally distributed sample using the `scipy.stats.norm` module. Using those data, I fit the normal model (`norm_dist_fitted`) and also create a normal model directly from the mean and standard deviation (`mu, sigma`).

The data, the fitted model and the model built from the (`mu`, `sigma`) pair are compared in a graph.

In the next section of the code, I use the normal data to produce a lognormally distributed sample: the lognormal sample is simply the exponential of the original sample. Note that `exp(mu)` and `exp(sigma)` are not the mean and standard deviation of the exponentiated sample; `exp(mu)` is its median and is what scipy calls `scale`, while `sigma` itself is the `shape` parameter.

I then fitted the exponentiated data to a `lognorm` model, since the log of my sample (exp(x)) is normally distributed and therefore satisfies the lognormal model assumptions.

To produce a lognormal model from the mean and standard deviation of your original data (x) the code will be:

```
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
```

However, if your data is already in the exponential space (that is, x itself is lognormally distributed), then you have to use:

```
muX = np.mean(np.log(x))
sigmaX = np.std(np.log(x))
scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))
```

```
import numpy as np
import scipy.stats  # a bare `import scipy` does not guarantee that scipy.stats is loaded
import matplotlib.pyplot as plt
import seaborn as sns

mu = 10      # Mean of the normal sample
sigma = 1.5  # Standard deviation of the normal sample
N = 2000     # Number of samples

norm_dist = scipy.stats.norm(loc=mu, scale=sigma)  # Create random process
x = norm_dist.rvs(size=N)                          # Generate samples

# Fit normal
fitting_params = scipy.stats.norm.fit(x)
norm_dist_fitted = scipy.stats.norm(*fitting_params)
t = np.linspace(np.min(x), np.max(x), 100)

# Plot normals (sns.distplot is deprecated; histplot with stat='density' replaces it)
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.histplot(x, ax=ax, stat='density',
             label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
ax.legend(loc='lower right')
plt.show()

# The lognormal model fits a variable whose log is normal.
# We create such a variable by exponentiating the previous one.
x_exp = np.exp(x)

# Fit the lognormal with the location held at 0
fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0)
shape_fit, loc_fit, scale_fit = fitting_params_lognormal
lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
t = np.linspace(np.min(x_exp), np.max(x_exp), 100)

# Here is the magic I was looking for a long long time
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# The trick is to understand these two things:
# 1. If X is NORMAL with MU and SIGMA -> EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
# 2. If your variable (x) is itself LOGNORMAL, the model is scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))
#    with:
#    - muX = np.mean(np.log(x))
#    - sigmaX = np.std(np.log(x))

# Plot lognormals; mu and sigma in the labels refer to log(exp(X)) = X
f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
sns.histplot(x_exp, ax=ax, stat='density',
             label='Data X~N(mu={0:.1f}, sigma={1:.1f})\nexp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
        label='Fitted Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(np.log(scale_fit), shape_fit))
ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
        label='Original Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
ax.legend(loc='lower right')
plt.show()
```
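The relationship the script relies on can also be checked without any plotting. The sketch below uses made-up illustrative data (a seeded sample with `loc=2.0`, `scale=0.7` for the underlying normal, both assumptions) and compares the parameters estimated from the logs of the data with those returned by `lognorm.fit` when the location is held at zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
data = np.exp(rng.normal(loc=2.0, scale=0.7, size=5000))  # illustrative lognormal data

# Parameters estimated directly from the logs of the data
mu_hat = np.mean(np.log(data))
sigma_hat = np.std(np.log(data))  # ddof=0, i.e. the maximum likelihood estimate

# Parameters estimated by scipy with the location held at zero
shape, loc, scale = stats.lognorm.fit(data, floc=0)

print(sigma_hat, shape)       # should be close
print(mu_hat, np.log(scale))  # should be close
```

With `floc=0`, `shape` plays the role of `sigmaX` and `np.log(scale)` the role of `muX`, so either route gives the same model.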

First, `loc` is not merely a linear shift of the distribution; it has its own statistical meaning: subtracting `loc` from the samples yields a “standardized” lognormal whose lower bound is zero. This is quite important.

Hence, when you fix the location through `floc`, you are actually imposing a very strong hypothesis: you assume the samples have a lower bound, and that the lower bound is “exactly” the `floc` value. Consequently, `scipy` uses different algorithms to fit:

if you fix the location, `scipy` uses the closed-form maximum likelihood formulas to calculate the fitting parameters; if not, it falls back to the default numerical solver.

Also, you can read the code in the scipy package, `stats/_continuous_distns.py`, line 3889 (the line number depends on the scipy version), as follows:

```
def fit(self, data, *args, **kwds):
    floc = kwds.get('floc', None)
    if floc is None:
        # loc is not fixed. Use the default fit method.
        return super(lognorm_gen, self).fit(data, *args, **kwds)
    f0 = (kwds.get('f0', None) or kwds.get('fs', None) or
          kwds.get('fix_s', None))
    fscale = kwds.get('fscale', None)
    if len(args) > 1:
        raise TypeError("Too many input arguments.")
    for name in ['f0', 'fs', 'fix_s', 'floc', 'fscale', 'loc', 'scale',
                 'optimizer']:
        kwds.pop(name, None)
    if kwds:
        raise TypeError("Unknown arguments: %s." % kwds)
    # Special case: loc is fixed. Use the maximum likelihood formulas
    # instead of the numerical solver.
```
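To see what fixing the location means in practice, here is a hedged sketch on synthetic data with a known lower bound (the shift of 5.0 and the other parameter values are assumptions for the demo). It fits once with `floc` set to the true bound and once with the location left free; the free fit depends on the scipy version and optimizer, so its output is only printed, not relied upon.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
true_loc, true_mu, true_sigma = 5.0, 1.0, 0.5  # illustrative values
data = true_loc + np.exp(rng.normal(true_mu, true_sigma, size=5000))

# Location fixed: we assert that the lower bound of the data is exactly 5.0
shape_fixed, loc_fixed, scale_fixed = stats.lognorm.fit(data, floc=true_loc)

# Location free: the lower bound is estimated together with shape and scale
shape_free, loc_free, scale_free = stats.lognorm.fit(data)

print("fixed loc:", shape_fixed, loc_fixed, np.log(scale_fixed))
print("free loc: ", shape_free, loc_free, np.log(scale_free))
```

If the fixed value is wrong (for example `floc=0` on these shifted data), the estimated shape and scale absorb the error, which is why the choice of `floc` is a strong assumption.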

Furthermore, someone from the `R` community might wonder why the output from Python differs from that of `R`. I don't agree with treating `R` as a “reference”: it is just one piece of software, and different software uses different flavors of algorithms.

For example, the following output from `R` is not an error: `R` rounds halves to the nearest even number, whereas other software, such as Fortran, may use a different rounding algorithm:

```
round(3.5)
[1] 4
round(2.5)
[1] 2
```

Something that helped me was to consider the location and scale as a reparametrisation.

Instead of using x as in the standard log-normal distribution, you change to x’ = (x - location)/scale.

Then the probability density function becomes f(x; location, scale) = (1/scale) f((x - location)/scale), where f on the right-hand side is the standard density.
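The location–scale relation can be verified directly with scipy. This is a minimal sketch with arbitrary illustrative parameter values.

```python
import numpy as np
from scipy import stats

s, loc, scale = 0.9, 2.0, 4.0   # illustrative values
x = np.linspace(2.5, 30.0, 40)  # points above loc

# Density with explicit loc and scale
full = stats.lognorm.pdf(x, s, loc=loc, scale=scale)

# Standard density (loc=0, scale=1) evaluated at the transformed variable, divided by scale
standard = stats.lognorm.pdf((x - loc) / scale, s) / scale

print(np.allclose(full, standard))
```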

More information in the link

https://en.wikipedia.org/wiki/Location%E2%80%93scale_family