How to perform two-sample one-tailed t-test with numpy/scipy
Question:
In R, it is possible to perform a two-sample one-tailed t-test simply by using
> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")
Welch Two Sample t-test
data: A and B
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-1.029916 Inf
sample estimates:
mean of x mean of y
0.9954942 1.1798523
In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be rather a discussion of the policy of implementing one-tailed vs. two-tailed tests in scipy.
Therefore, my question is: does anyone know of any examples or instructions on how to perform a one-tailed version of the test using numpy/scipy?
Answers:
From your mailing list link:
because the one-sided tests can be backed out from the two-sided
tests. (With symmetric distributions one-sided p-value is just half
of the two-sided pvalue)
It goes on to say that scipy always gives the test statistic as signed. This means that given p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
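As a minimal sketch of that rule, applied to the question's data (using Welch's test via equal_var=False to match R's default; note that scipy 1.6.0 and later also accept an alternative= keyword in ttest_ind directly):
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

t, p = stats.ttest_ind(A, B, equal_var=False)  # two-sided Welch test, as in R
alpha = 0.05
# greater-than test (Ha: mean(A) > mean(B)): reject when p/2 < alpha and t > 0
reject_greater = (p / 2 < alpha) and (t > 0)
# less-than test (Ha: mean(A) < mean(B)): reject when p/2 < alpha and t < 0
reject_less = (p / 2 < alpha) and (t < 0)
print(t, p, reject_greater, reject_less)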
When the null hypothesis is Ho: P1 >= P2 and the alternative hypothesis is Ha: P1 < P2, you test it in Python by writing ttest_ind(P2, P1). (Notice that P2 comes first.)
import numpy as np
from scipy import stats

first = np.random.normal(3, 2, 400)
second = np.random.normal(6, 2, 400)
stats.ttest_ind(first, second, axis=0, equal_var=True)
You will get a result like the one below:
Ttest_indResult(statistic=-20.442436213923845, pvalue=5.0999336686332285e-75)
In Python, when statistic < 0, your real p-value is actually real_pvalue = 1 - output_pvalue/2 = 1 - 5.0999336686332285e-75/2, which is approximately 1. Since your p-value is larger than 0.05, you cannot reject the null hypothesis that 6 >= 3. When statistic > 0, the real t score is actually equal to -statistic, and the real p-value is equal to pvalue/2.
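For completeness, a short sketch of that conversion in code (the seed is an assumption, added only for reproducibility):
import numpy as np
from scipy import stats

np.random.seed(0)  # hypothetical seed for reproducibility
first = np.random.normal(3, 2, 400)
second = np.random.normal(6, 2, 400)
t, p = stats.ttest_ind(first, second, axis=0, equal_var=True)
# testing Ho: P1 >= P2 (here: 6 >= 3) with ttest_ind(P2, P1): since t < 0,
# the one-sided p-value is 1 - p/2, which is ~1, so Ho cannot be rejected
real_pvalue = 1 - p / 2 if t < 0 else p / 2
print(real_pvalue)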
(A note on ivc's answer: for the less-than test the condition really is p/2 < alpha and t < 0; when t < 0, the quantity 1 - p/2 is the one-sided p-value of the greater-than alternative, not a rejection threshold.)
After trying to add some insights as comments on the accepted answer, but not being able to write them down properly because of the general restrictions on comments, I decided to put my two cents in as a full answer.
First let’s formulate our investigative question properly. The data we are investigating is
A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
with the sample means
A.mean() = 0.99549419
B.mean() = 1.1798523
I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.
So we have the Null Hypothesis
H0: A >= B
that we would like to reject in favor of the Alternative Hypothesis
H1: B > A
Now when you call scipy.stats.ttest_ind(x, y), this performs a hypothesis test on the value of x.mean() - y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call
stats.ttest_ind(B, A)
instead of stats.ttest_ind(A, B). We get as an answer
t-value = 0.42210654140239207
p-value = 0.68406235191764142
and since, according to the documentation, this is the output for a two-tailed t-test, we must divide the p by 2 for our one-tailed test. So depending on the significance level alpha you have chosen, you need
p/2 < alpha
in order to reject the null hypothesis H0. For alpha = 0.05 this is clearly not the case, so you cannot reject H0.
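In code, a brief sketch of that decision:
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
t, p = stats.ttest_ind(B, A)  # t > 0 since B.mean() > A.mean()
alpha = 0.05
print(p / 2 < alpha)  # False: p/2 ~ 0.342, so H0 cannot be rejected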
An alternative way to decide whether you reject H0, without having to do any algebra on t or p, is to compare the t-value with the critical t-value t_crit at the desired level of confidence (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have
df = sample_size_1 + sample_size_2 - 2 = 8
we get from a statistical table like this one that
t_crit(df=8, confidence_level=95%) = 1.860
We clearly have t < t_crit, so we obtain the same result again, namely that we cannot reject H0.
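Instead of a printed table, t_crit can also be obtained from scipy's t distribution; a minimal sketch:
from scipy import stats

df = 8
t_crit = stats.t.ppf(0.95, df)  # one-tailed critical value at 95% confidence
print(t_crit)  # ~1.860, matching the table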
Did you look at this:
How to calculate the statistics "t-test" with numpy
I think that is exactly what this question is asking about.
Basically:
import scipy.stats
x = [1,2,3,4]
scipy.stats.ttest_1samp(x, 0)
Ttest_1sampResult(statistic=3.872983346207417, pvalue=0.030466291662170977)
This is the same result as in this example in R: https://stats.stackexchange.com/questions/51242/statistical-difference-from-zero
import numpy as np
from scipy.stats import ttest_ind

def t_test(x, y, alternative='both-sided'):
    # two-sided Welch p-value from scipy, folded to one side if requested
    _, double_p = ttest_ind(x, y, equal_var=False)
    if alternative == 'both-sided':
        pval = double_p
    elif alternative == 'greater':
        if np.mean(x) > np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    elif alternative == 'less':
        if np.mean(x) < np.mean(y):
            pval = double_p / 2.
        else:
            pval = 1.0 - double_p / 2.
    return pval
A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]
print(t_test(A,B,alternative='greater'))
0.6555098817758839
Based on this function from R: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test
from scipy.stats import ttest_ind

def ttest(a, b, axis=0, equal_var=True, nan_policy='propagate',
          alternative='two.sided'):
    # signed t and two-sided p from scipy; fold p to one side if requested
    tval, pval = ttest_ind(a=a, b=b, axis=axis, equal_var=equal_var,
                           nan_policy=nan_policy)
    if alternative == 'greater':
        if tval < 0:
            pval = 1 - pval / 2
        else:
            pval = pval / 2
    elif alternative == 'less':
        if tval < 0:
            pval /= 2
        else:
            pval = 1 - pval / 2
    else:
        assert alternative == 'two.sided'
    return tval, pval
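For example, a usage sketch (with equal_var=False this should reproduce the R output from the question):
A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]
print(ttest(A, B, equal_var=False, alternative='greater'))
# (-0.4189..., 0.6555...) -- the same t and one-sided p as R's t.test above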