How to use t.ppf()? What are the arguments?
Question:
I couldn’t understand how to properly use t.ppf; could someone please explain it to me?
I have to use this information
- scipy.stats.t
- scipy.stats
- a mean of 100
- a standard deviation of 0.39
- N = 851 (851 samples)
When I’m asked to calculate the (95%) margin of error using t.ppf() will the code look like below?
cutoff1 = t.ppf(0.05,100,0.36,850)
Can somebody help me, please?
Answers:
According to the reference docs, the arguments to t.ppf are q, df, loc, and scale. The df argument is degrees of freedom, which is usually the sample size minus 1 for a single population sampling problem. Since ppf calculates the inverse cumulative distribution function, by definition a result of x for a given q-value and df means P{T <= x} = q, i.e., there is probability q of getting outcomes less than or equal to x from a T distribution with the given loc and scale. The loc (mean) and scale (standard deviation) arguments are optional, and default to 0 and 1, respectively.
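To make the definition concrete, here is a minimal sketch that checks the inverse relationship between ppf and cdf. The values of q and df are taken from the question; the variable names are only for illustration.

```python
from scipy.stats import t

df = 850   # degrees of freedom: sample size 851 minus 1
q = 0.975  # cumulative probability we want to the left of the cutoff

# Cutoff on the standard t distribution (loc=0, scale=1 by default).
x = t.ppf(q, df)

# cdf is the inverse of ppf, so this recovers q (up to floating point error).
q_back = t.cdf(x, df)

# loc shifts and scale stretches the standard cutoff: loc + scale * x.
x_shifted = t.ppf(q, df, loc=100, scale=0.39)

print(x, q_back, x_shifted)
```

Note that the arguments are positional in the order q, df, loc, scale, so a call like t.ppf(0.05, 100, 0.36, 850) would be read as q=0.05, df=100, loc=0.36, scale=850.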
To get a 95% margin of error, you want 5% of the probability to be in the tails of the distribution. This is usually done symmetrically so that 2.5% is in each tail, so you would use q values of 0.025 and 0.975 for the lower and upper cutoff points respectively. For your particular problem, the code would look something like:
from scipy.stats import t
n = 851
mean = 100
std_dev = 0.39
lower_cutoff = t.ppf(0.025, n - 1, loc=mean, scale=std_dev)  # => 99.23452406698323
upper_cutoff = t.ppf(0.975, n - 1, loc=mean, scale=std_dev)  # => 100.76547593301677
I’m almost certain you have to use the standard error instead, which is std/sqrt(n).
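If the standard error is what the exercise wants, the calculation might look like the sketch below. This assumes that 0.39 is the sample standard deviation and that the margin of error refers to the sample mean; neither is confirmed in the question, and the variable names are only illustrative.

```python
import math
from scipy.stats import t

n = 851
mean = 100
std_dev = 0.39  # assumed to be the sample standard deviation

# Standard error of the mean: std / sqrt(n).
std_err = std_dev / math.sqrt(n)

# Two-sided 95% critical value from the standard t distribution.
t_crit = t.ppf(0.975, n - 1)

# Margin of error and the resulting confidence interval for the mean.
margin_of_error = t_crit * std_err
lower = mean - margin_of_error
upper = mean + margin_of_error

print(margin_of_error, lower, upper)
```

Passing loc=mean and scale=std_err to t.ppf with q values of 0.025 and 0.975 gives the same lower and upper bounds directly.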