Why does SciPy return negative p-values for extremely small p-values with the Fisher-exact test?
Question:
I’ve noticed that the Fisher-exact test in SciPy returns a negative p-value if the p-value is extremely small:
>>> import scipy as sp
>>> import scipy.stats
>>> x = [[48,60],[3088,17134]]
>>> sp.stats.fisher_exact(x)
(4.4388601036269426, -1.5673906617053035e-11)
In R, using the same 2×2 contingency table:
> a = matrix(c(48,60,3088,17134), nrow=2)
> fisher.test(a)
p-value = 6.409e-13
My question is 1) why does SciPy return a negative p-value? 2) how can I use SciPy to generate the correct p-value?
Thanks for the help.
Answers:
Fisher’s exact test uses the hypergeometric distribution.
The version of scipy you are using uses an implementation of the hypergeometric distribution that is not very precise. This is a known problem and has been fixed in the scipy repository.
The negative value is a rounding artifact, not a deliberate signal. The p-value is the sum of the hypergeometric probabilities of all tables that are as extreme or more extreme than the observed one. Nothing overflows in that sum, since every term lies between 0 and 1; the problem is loss of precision. The output above (-1.57e-11 against a true value of about 6.4e-13) shows an absolute error on the order of 1e-11, which is larger than the quantity being computed, so the result can land on either side of zero. A p-value can never be negative, and SciPy does not use the sign as a flag.
In practice, a negative result means the true p-value is very small but the reported digits are meaningless. Rather than treating it as zero, upgrade to a SciPy version that includes the fix, or compute the tail probabilities in log space if you need the actual magnitude (for example for ranking or multiple-testing correction).
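If upgrading is not an option, the two-sided p-value can be computed directly from the hypergeometric distribution in log space. The following is a minimal sketch using only the standard library, not SciPy's own implementation; the names fisher_exact_two_sided and log_comb are made up for this illustration, and the small tolerance mirrors the relative tolerance R's fisher.test uses when comparing probabilities.

```python
from math import exp, fsum, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table (illustrative sketch).

    Sums the hypergeometric probabilities of every table, with the same
    margins, that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    col2 = total - col1

    # Range of values the top-left cell can take with the margins fixed.
    lo = max(0, row1 - col2)
    hi = min(row1, col1)

    log_denom = log_comb(total, row1)

    def log_pmf(k):
        return log_comb(col1, k) + log_comb(col2, row1 - k) - log_denom

    log_obs = log_pmf(a)
    tol = 1e-7  # assumption: guards against ties broken by rounding

    # Every term is exp(something) and therefore non-negative, so the sum
    # cannot come out below zero.
    terms = (
        exp(lp)
        for lp in (log_pmf(k) for k in range(lo, hi + 1))
        if lp <= log_obs + tol
    )
    return min(1.0, fsum(terms))


p = fisher_exact_two_sided([[48, 60], [3088, 17134]])
print(p)
```

For the table in the question this should print a small positive number close to the 6.409e-13 that R reports. Because each term is computed as the exponential of a log-probability and the terms are only added, never subtracted, the result stays non-negative no matter how small it gets.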
For more: https://visualvisionaries.blogspot.com/2023/01/why-does-scipy-return-negative-p-values.html