Scipy Normaltest how is it used?

Question:

I need to use normaltest in scipy for testing if the dataset is normal distributet. But I cant seem to find any good examples how to use scipy.stats.normaltest.

My dataset has more than 100 values.

Asked By: The Demz

||

Answers:

First i found out that scipy.stats.normaltest is almost the same. The mstats library is used for masked arrays. Arrays where you can mark values as invalid and not taken into the calculation.

import numpy as np
import numpy.ma as ma
from scipy.stats import mstats

x = np.array([1, 2, 3, -1, 5, 7, 3]) #The array needs to be larger than 20, just an example
mx = ma.masked_array(x, mask=[0, 0, 0, 1, 0, 0, 0])
z,pval = mstats.normaltest(mx)

if(pval < 0.055):
    print "Not normal distribution"

"Traditionally, in statistics, you need a p-value of less than 0.05 to
reject the null hypothesis." – http://mathforum.org/library/drmath/view/72065.html

Answered By: The Demz
In [12]: import scipy.stats as stats

In [13]: x = stats.norm.rvs(size = 100)

In [14]: stats.normaltest(x)
Out[14]: (1.627533590094232, 0.44318552909231262)

normaltest returns a 2-tuple of the chi-squared statistic, and the associated p-value. Given the null hypothesis that x came from a normal distribution, the p-value represents the probability that a chi-squared statistic that large (or larger) would be seen.

If the p-val is very small, it means it is unlikely that the data came from a normal distribution. For example:

In [15]: y = stats.uniform.rvs(size = 100)

In [16]: stats.normaltest(y)
Out[16]: (31.487039026711866, 1.4543748291516241e-07)
Answered By: unutbu
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.