Is there a Python (SciPy) function to determine parameters needed to obtain a target power?
Question:
In R there is a very useful function that helps determine the parameters of a two-sided, two-sample test of proportions needed to reach a target statistical power.
The function is called power.prop.test.
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/power.prop.test.html
You can call it using:
power.prop.test(p1 = .50, p2 = .75, power = .90)
It will tell you n, the sample size (per group) needed to obtain this power. This is extremely useful in determining sample sizes for tests.
Is there a similar function in the scipy package?
Answers:
I’ve managed to replicate the function using the formula below for n and the inverse survival function norm.isf:

from scipy.stats import norm

def sample_power_probtest(p1, p2, power=0.8, sig=0.05):
    z = norm.isf(sig / 2)        # critical value for a two-sided test (normal approximation)
    zp = -1 * norm.isf(power)    # quantile matching the target power
    d = p1 - p2                  # difference between the two proportions
    p_bar = (p1 + p2) / 2        # pooled proportion
    s = 2 * p_bar * (1 - p_bar)  # twice the pooled variance
    n = s * ((zp + z) ** 2) / (d ** 2)
    return int(round(n))

def sample_power_difftest(d, s, power=0.8, sig=0.05):
    z = norm.isf(sig / 2)
    zp = -1 * norm.isf(power)
    # note: s enters linearly here, so it must already equal 2 * variance
    n = s * ((zp + z) ** 2) / (d ** 2)
    return int(round(n))

if __name__ == '__main__':
    n = sample_power_probtest(0.1, 0.11, power=0.8, sig=0.05)
    print(n)  # 14752
    n = sample_power_difftest(0.1, 0.5, power=0.8, sig=0.05)
    print(n)  # 392
Matt’s answer for getting the needed n (per group) is almost right, but there is a small error.
Given d (difference in means), s (standard deviation), sig (significance level, typically .05), and power (typically .80), the formula for calculating the number of observations per group is:
$$n = \frac{2 s^2 \left(z_{sig/2} + z_{power}\right)^2}{d^2}$$
As you can see in his formula, he has
n = s * ((zp + z)**2) / (d**2)
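The “s” part is wrong: it should be 2*s**2. A correct function that reproduces R’s functionality is:

from scipy.stats import norm

def sample_power_difftest(d, s, power=0.8, sig=0.05):
    z = norm.isf(sig / 2)      # critical value for a two-sided test
    zp = -1 * norm.isf(power)  # quantile matching the target power
    n = (2 * (s ** 2)) * ((zp + z) ** 2) / (d ** 2)
    return int(round(n))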
Hope this helps.
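To see why the test values in the first answer did not expose the problem, here is a minimal sketch using only the standard library. `statistics.NormalDist` stands in for `scipy.stats.norm`, and the names `n_original` and `n_corrected` are made up for illustration. With s = 0.5 the two formulas coincide, because 2 * 0.5**2 == 0.5; for any other s they disagree.

```python
from statistics import NormalDist


def _z_sum_sq(power, sig):
    """(z_{sig/2} + z_power)^2 for a two-sided test, normal approximation."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - sig / 2)  # same value as norm.isf(sig / 2)
    zp = std_normal.inv_cdf(power)       # same value as -norm.isf(power)
    return (z + zp) ** 2


def n_original(d, s, power=0.8, sig=0.05):
    # formula from the first answer: s enters linearly
    return s * _z_sum_sq(power, sig) / d ** 2


def n_corrected(d, s, power=0.8, sig=0.05):
    # corrected formula: 2 * s^2, with s the standard deviation
    return 2 * s ** 2 * _z_sum_sq(power, sig) / d ** 2


if __name__ == "__main__":
    for s in (0.5, 1.0):
        print(s, round(n_original(0.1, s)), round(n_corrected(0.1, s)))
```

For s = 0.5 both columns agree, while for s = 1.0 the corrected formula asks for twice as many observations per group.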
Some of the basic power calculations are now available in statsmodels:
http://statsmodels.sourceforge.net/devel/stats.html#power-and-sample-size-calculations
http://jpktd.blogspot.ca/2013/03/statistical-power-in-statsmodels.html
The blog article does not yet take the latest changes to the statsmodels code into account. Also, I haven’t decided yet how many wrapper functions to provide, since many power calculations just reduce to the basic distribution.
>>> import statsmodels.stats.api as sms
>>> es = sms.proportion_effectsize(0.5, 0.75)
>>> sms.NormalIndPower().solve_power(es, power=0.9, alpha=0.05, ratio=1)
76.652940372066908
In R’s stats package:
> power.prop.test(p1 = .50, p2 = .75, power = .90)

     Two-sample comparison of proportions power calculation

              n = 76.7069301141077
             p1 = 0.5
             p2 = 0.75
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group
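The 76.7 reported by power.prop.test can also be reached in closed form. The sketch below assumes the common normal approximation in which the variance under the null uses the pooled proportion and the variance under the alternative uses p1 and p2 separately. That this is the approximation R solves is my assumption, not something stated in this thread, and the function name `n_per_group_props` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group_props(p1, p2, power=0.8, sig=0.05):
    """Approximate per-group n for a two-sided two-sample proportion test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - sig / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)         # quantile for the target power
    p_bar = (p1 + p2) / 2                      # pooled proportion
    sd_null = sqrt(2 * p_bar * (1 - p_bar))    # pooled, used under the null
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # unpooled, under the alternative
    return ((z_alpha * sd_null + z_beta * sd_alt) / (p1 - p2)) ** 2


if __name__ == "__main__":
    # lands close to the n that R prints for the same inputs
    print(n_per_group_props(0.5, 0.75, power=0.9))
```

In practice you would round the result up to the next whole observation per group.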
Using R’s pwr package:
> library(pwr)
> h<-ES.h(0.5,0.75)
> pwr.2p.test(h=h, power=0.9, sig.level=0.05)

     Difference of proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.5235987755982985
              n = 76.6529406106181
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: same sample sizes
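The arcsine-transformed effect size h and the resulting n can be checked by hand. This is a rough sketch with the standard library only; `effect_size_h` and `n_per_group_h` are illustrative names, and the closed form ignores the negligible probability of rejecting in the wrong tail, so it agrees with the solver output only to a few decimals.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def effect_size_h(p1, p2):
    """Arcsine-transformed difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_per_group_h(h, power=0.8, sig=0.05):
    """Per-group n for a two-sided test, normal approximation, equal groups."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - sig / 2)
    z_beta = std_normal.inv_cdf(power)
    # the sign of h does not matter because it is squared
    return 2 * ((z_alpha + z_beta) / h) ** 2


if __name__ == "__main__":
    h = effect_size_h(0.5, 0.75)
    n = n_per_group_h(h, power=0.9)
    print(abs(h), n, ceil(n))
```

The sign of h depends on the order of the two proportions; only its magnitude affects the two-sided sample size.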
You also have:
from statsmodels.stats.power import tt_ind_solve_power
Pass None for the parameter you want to solve for. For instance, to obtain the number of observations for effect_size = 0.1, power = 0.8 and so on, call:
tt_ind_solve_power(effect_size=0.1, nobs1=None, alpha=0.05, power=0.8, ratio=1, alternative='two-sided')
which returns 1570.7330663315456, the number of observations required in the first group (nobs1; with ratio=1 the second group is the same size).
Alternatively, to obtain the power you can attain with the other values fixed:
tt_ind_solve_power(effect_size=0.2, nobs1=200, alpha=0.05, power=None, ratio=1, alternative='two-sided')
which returns 0.5140816347005553.
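As a sanity check on those numbers, the power of a two-sample test can be approximated with the normal distribution instead of the t distribution. This is only a sketch: `approx_power_two_sample` is a hypothetical helper, not part of statsmodels, and because it ignores the heavier tails of the t distribution it comes out slightly higher than the figure above.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(effect_size, nobs1, alpha=0.05, ratio=1.0):
    """Normal approximation to the power of a two-sided two-sample test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    nobs2 = ratio * nobs1
    # shift of the test statistic under the alternative hypothesis
    shift = effect_size * sqrt(nobs1 * nobs2 / (nobs1 + nobs2))
    # probability of rejecting in either tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    print(approx_power_two_sample(0.2, 200))   # a little above the t-based value
    print(approx_power_two_sample(0.1, 1571))  # just over the 0.8 target
```

The small gap between the two approaches shrinks as the sample size grows.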