A lognormal distribution in python

Question

I have seen several questions in stackoverflow regarding how to fit a log-normal distribution. Still there are two clarifications that I need known.

I have a sample data, the logarithm of which follows a normal distribution. So I can fit the data using scipy.stats.lognorm.fit (i.e a log-normal distribution)

The fit is working fine, and also gives me the standard deviation. Here is my piece of code with the results.

import numpy as np
from scipy import stats

sample = np.log10(data) #taking the log10 of the data

scatter,loc,mean = stats.lognorm.fit(sample) #Gives the paramters of the fit 

x_fit = np.linspace(13.0,15.0,100)
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,mean) #Gives the PDF

print "scatter for data is %s" %scatter
print "mean of data is %s" %mean

enter image description here
THE RESULT

scatter for data is 0.186415047243
mean for data is 1.15731050926

From the image you can clearly see that the mean is around 14.2, but what I get is 1.15??!! Why is this so? clearly the log(mean) is also not near 14.2!!

In THIS POST and in THIS QUESTION it is mentioned that the log(mean) is the actual mean.

But you can see from my above code, the fit that I have obtained is using a the sample = log(data) and it also seems to fit well. However when I tried

sample = data
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,np.log10(mean))

The fit does not seem to work.

1) Why is the mean not 14.2?

2) How to draw fill/draw vertical lines showing the 1 sigma confidence region?

Asked By: Srivatsan

||

Source

Answer 1

You say

I have a sample data, the logarithm of which follows a normal distribution.

Suppose data is the array containing the samples. To fit this data to
a log-normal distribution using scipy.stats.lognorm, use:

s, loc, scale = stats.lognorm.fit(data, floc=0)

Now suppose mu and sigma are the mean and standard deviation of the
underlying normal distribution. To get the estimate of those values
from this fit, use:

estimated_mu = np.log(scale)
estimated_sigma = s

(These are not the estimates of the mean and standard deviation of
the samples in data. See the wikipedia page for the formulas
for the mean and variance of a log-normal distribution in terms of mu and sigma.)

To combine the histogram and the PDF, you can use, for example,

import matplotlib.pyplot as plt.

plt.hist(data, bins=50, normed=True, color='c', alpha=0.75)
xmin = data.min()
xmax = data.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.lognorm.pdf(x, s, scale=scale)
plt.plot(x, pdf, 'k')

If you want to see the log of the data, you could do something like
the following. Note the the PDF of the normal distribution is used
here.

logdata = np.log(data)
plt.hist(logdata, bins=40, normed=True, color='c', alpha=0.75)
xmin = logdata.min()
xmax = logdata.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
plt.plot(x, pdf, 'k')

By the way, an alternative to fitting with stats.lognorm is to fit log(data)
using stats.norm.fit:

logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)

A lognormal distribution in python

Question:

Answers: