Inference using normal and t distribution
If we take a sample from a normal population of known variance $\sigma^2$, then no matter what the sample size, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, exactly.
If we take a sample of size n > 30 from any population distribution where $\sigma^2$ is known, then by the Central Limit Theorem, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, approximately.
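If we take a sample of size n > 30 from any population distribution with unknown variance, then $\sigma^2$ is estimated by calculating $s^2$, and so $\bar{X} \sim N\left(\mu, \frac{s^2}{n}\right)$, approximately, where
$$s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2$$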
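The two forms of the $s^2$ formula give the same value. A minimal Python sketch that checks this; the data values and function names are hypothetical and used only for illustration.

```python
import math
import statistics

def sample_variance_shortcut(xs):
    """Unbiased sample variance via (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_variance_deviations(xs):
    """Unbiased sample variance via sum (x - mean)^2 / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical values, for illustration only
v1 = sample_variance_shortcut(data)
v2 = sample_variance_deviations(data)
# statistics.variance also uses the n - 1 divisor, so all three should agree.
print(v1, v2, statistics.variance(data))
```

The shortcut form is convenient when only the summary values $\sum x$ and $\sum x^2$ are given.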
If we take a sample from a normal population with unknown variance, then the standardised sample mean $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ is exactly described by the t-distribution.
If a large sample (n > 30) is taken from a normal population, the normal distribution can be used instead: as the sample size gets larger, the t-distribution converges to the normal distribution. Technically, if we have a normal population with unknown variance then the standardised $\bar{X}$ follows a t-distribution exactly, but if n > 30 the Central Limit Theorem lets us approximate $\bar{X}$ as normal. In practice the t-distribution is used only with small sample sizes.
t-distribution
The t-distribution table gives the value t for which $P(T \le t) = p$.
Suppose a sample of size n is drawn from an $N(\mu, \sigma^2)$ population. Then the one-sample t statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t-distribution with n − 1 degrees of freedom. Degrees of freedom is denoted by v:
$v = n - 1$
Comparing density curves of the standard normal distribution and t-distribution
- The density curve of the t-distribution is similar in shape to the standard normal curve.
- The spread of the t-distribution is a bit larger than that of the standard normal distribution.
- The t-distribution has more probability in the tails and less in the center than does the standard normal.
- As the degrees of freedom increase, the t density curve becomes ever closer to the standard normal curve.
Example
Suppose we need to construct a 95% confidence interval for the mean μ of a normal
population based on a sample of size n=12. What critical value t should be used?
The area under the t-distribution curve from $-t_{crit}$ to $t_{crit}$ is 0.95, so each tail has area 0.025 and
$P(T \le t_{crit}) = 0.95 + 0.025 = 0.975$
$v = n - 1 = 11$
From the table, for p = 0.975 and v = 11,
$t_{crit} = 2.201$
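The table lookup can be checked numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and inverts the result by bisection. The function names (`t_pdf`, `t_cdf`, `t_crit`) and the step count are assumptions made for this illustration, not part of the notes; a statistics library would normally be used instead.

```python
import math

def t_pdf(t, v):
    """Density of Student's t-distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(t, v, steps=2000):
    """P(T <= t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_crit(p, v):
    """Value t with P(T <= t) = p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, v) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

confidence = 0.95
n = 12
p = confidence + (1 - confidence) / 2  # 0.975, as in the example
print(round(t_crit(p, n - 1), 3))      # prints 2.201
```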
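Hypothesis test when a small sample is drawn from a normal population of unknown variance
Steps:
- We will be given the sample values x and a hypothesised mean value a.
- $H_0: \mu = a$
- $H_1: \mu < a$ or $\mu > a$ or $\mu \ne a$
μ < a and μ > a each mean a one-tailed test. μ ≠ a means a two-tailed test.
- Calculate the sample mean from the data given: $\bar{x} = \frac{\sum x}{n}$
- Calculate $s^2$ and the test statistic $t = \frac{\bar{x} - a}{s/\sqrt{n}}$, with $v = n - 1$.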
Case 1, $H_1: \mu > a$
$P(T \le t_{crit}) = 1 - \text{significance level}$
Critical region: $t > t_{crit}$
Case 2, $H_1: \mu < a$
$P(T \le t_{crit}) = 1 - \text{significance level}$
Critical region: $t < -t_{crit}$
Case 3, $H_1: \mu \ne a$
$P(T \le t_{crit}) = 1 - \frac{\text{significance level}}{2}$
Critical region: $t < -t_{crit}$ or $t > t_{crit}$
- This statement must be written: If t is (greater than or less than) $t_{crit}$, we reject $H_0$.
- Now compare t with $t_{crit}$ and see whether it lies in the critical region. Accept or reject $H_0$ accordingly.
- Write the concluding statement.
Example
A machine is producing circular disks whose radius is normally distributed. Their radius
historically has been 5 cm. The factory foreman believes that the machine is now producing
disks that are too small. A sample of 9 disks is taken and their radii are:
4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0
Carry out a hypothesis test at 10% significance level.
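$H_0: \mu = 5$
$H_1: \mu < 5$
Significance level $= \frac{10}{100} = 0.1$
$v = 9 - 1 = 8$
$\bar{x} = \frac{43.9}{9} = 4.8778$
$s^2 = \frac{1}{8}\left(214.43 - \frac{43.9^2}{9}\right) = 0.03694$, so $s = 0.1922$
$t = \frac{4.8778 - 5}{0.1922/\sqrt{9}} \approx -1.91$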
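The disk example can be reproduced with a short Python sketch. Two things here are assumptions rather than part of the worked example: the critical value 1.397 is the standard table entry for v = 8 and p = 0.90, and the variable names are illustrative. With the unrounded mean the statistic comes out at about −1.91; rounding $\bar{x}$ to 4.88 first gives about −1.87. Either value lies in the critical region.

```python
import math

radii = [4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0]  # sample from the example
mu_0 = 5.0                                              # mean under H0

n = len(radii)
mean = sum(radii) / n
# Shortcut form of the unbiased sample variance.
s2 = (sum(x * x for x in radii) - sum(radii) ** 2 / n) / (n - 1)
t = (mean - mu_0) / math.sqrt(s2 / n)

# Table value for v = 8 and p = 0.90 (10% one-tailed test); supplied by hand.
t_crit = 1.397
reject_h0 = t < -t_crit  # lower-tail critical region, since H1 is mu < 5

print(f"mean={mean:.4f}  s2={s2:.5f}  t={t:.3f}  reject H0: {reject_h0}")
```

Since t falls below $-t_{crit}$, $H_0$ is rejected: at the 10% level there is evidence that the disks are too small.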
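For two samples, one from X and one from Y, the sample means are distributed as
$\bar{X} \sim N\left(\mu_x, \frac{\sigma_x^2}{n_x}\right)$
$\bar{Y} \sim N\left(\mu_y, \frac{\sigma_y^2}{n_y}\right)$
These hold exactly if X and Y are normal, or approximately, by the Central Limit Theorem, if $n_x$ and $n_y$ are both greater than 30.
Provided X and Y are independent,
$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$
Standard deviation $= \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$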
For hypothesis tests the same method is used as before, except that here we use the normal distribution: the critical value is found the same way, with t replaced by z. We then compare the test statistic with the critical value to get the result.
If the variances are unknown, we estimate them by $s_x^2$ and $s_y^2$, provided that the sample sizes of both x and y are greater than 30.
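A minimal Python sketch of the two-sample z test described above. The function names and the summary values are hypothetical, chosen only to illustrate the calculation; `statistics.NormalDist` from the standard library supplies the critical value.

```python
import math
from statistics import NormalDist

def two_sample_z(mean_x, mean_y, var_x, var_y, n_x, n_y, c=0.0):
    """z statistic for H0: mu_x - mu_y = c, with known (or large-sample estimated) variances."""
    se = math.sqrt(var_x / n_x + var_y / n_y)
    return (mean_x - mean_y - c) / se

def z_crit(significance, two_tailed=False):
    """Critical value z with P(Z <= z) = 1 - significance (or 1 - significance / 2)."""
    tail = significance / 2 if two_tailed else significance
    return NormalDist().inv_cdf(1 - tail)

# Hypothetical summary values, chosen only to illustrate the calculation.
z = two_sample_z(mean_x=52.0, mean_y=50.0, var_x=16.0, var_y=9.0, n_x=40, n_y=36)

# One-tailed test of H1: mu_x - mu_y > 0 at the 5% level.
print(round(z, 3), round(z_crit(0.05), 3), z > z_crit(0.05))
```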
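If the variances of X and Y are unknown but both have a common variance, and we are testing $H_0: \mu_x - \mu_y = c$, a two-sample t-test is used. X and Y must be normally distributed.
$v = n_x + n_y - 2$
The test statistic is given by
$$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{s_p^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}$$
where $\mu_x - \mu_y = c$ under $H_0$ and $s_p^2$ is the pooled estimate of the common variance:
$$s_p^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_x + n_y - 2}$$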
Example
A scientist wishes to test whether new heart medication reduces blood pressure. 10
patients with high blood pressure were given the medication and their summary data is
𝛴𝑥=1271 and 𝛴(𝑥 − 𝑥̅ )2 = 640.9. 8 patients with high blood pressure were given a placebo
and their summary data is 𝛴𝑦 = 1036 and 𝛴(𝑦 − 𝑦̅)2 = 222. Carry out a hypothesis test at
10% significance level to see if the medication is working.
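$H_0: \mu_x - \mu_y = 0$
$H_1: \mu_x - \mu_y < 0$
$\bar{x} = \frac{1271}{10} = 127.1$ and $\bar{y} = \frac{1036}{8} = 129.5$, so $\bar{x} - \bar{y} = -2.4$
$s_p^2 = \frac{640.9 + 222}{10 + 8 - 2} = 53.9313$, with $v = 16$
$t = \frac{-2.4 - 0}{\sqrt{53.9313\left(\frac{1}{10} + \frac{1}{8}\right)}} = -0.689$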
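The medication example can be checked from its summary data with the sketch below. The critical value 1.337 is the standard table entry for v = 16 and p = 0.90 and is an addition to the worked example, as are the variable names.

```python
import math

n_x, sum_x, ss_x = 10, 1271.0, 640.9  # medication: n, sum of x, sum of (x - xbar)^2
n_y, sum_y, ss_y = 8, 1036.0, 222.0   # placebo:    n, sum of y, sum of (y - ybar)^2

mean_x = sum_x / n_x
mean_y = sum_y / n_y
v = n_x + n_y - 2
sp2 = (ss_x + ss_y) / v  # pooled estimate of the common variance
t = (mean_x - mean_y - 0) / math.sqrt(sp2 * (1 / n_x + 1 / n_y))

# Table value for v = 16 and p = 0.90 (10% one-tailed test); supplied by hand.
t_crit = 1.337
reject_h0 = t < -t_crit  # lower-tail critical region, since H1 is mu_x - mu_y < 0

print(f"sp2={sp2:.4f}  v={v}  t={t:.3f}  reject H0: {reject_h0}")
```

Here t does not fall below $-t_{crit}$, so $H_0$ is not rejected: at the 10% level there is insufficient evidence that the medication reduces blood pressure.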
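So we have the sample of differences:
1, 0, -2, 11, 5, -1, 1, 3, -1, 5
$H_0: \mu_d = 0$
$H_1: \mu_d > 0$
$\bar{d} = \frac{22}{10} = 2.2$, $v = 10 - 1 = 9$
$s_d^2 = \frac{1}{10-1}\left(188 - \frac{22^2}{10}\right) = 15.51$
$t = \frac{2.2 - 0}{\sqrt{15.51/10}} = 1.767$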
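The same arithmetic for the sample of differences, as a small Python sketch. Only the test statistic is computed, since no significance level is stated for this example; the variable names are illustrative.

```python
import math

d = [1, 0, -2, 11, 5, -1, 1, 3, -1, 5]  # sample from the example

n = len(d)
d_bar = sum(d) / n
# Shortcut form of the unbiased sample variance, applied to the differences.
s_d2 = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)
t = (d_bar - 0) / math.sqrt(s_d2 / n)  # test statistic under H0: mu_d = 0

print(f"d_bar={d_bar}  s_d2={s_d2:.2f}  v={n - 1}  t={t:.3f}")
```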
Confidence Intervals
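Denoted by [a, b]
a < μ < b
A k% confidence interval means that we look up the critical value from the t-distribution or normal distribution table where the area under the graph is $\frac{1}{2} + \frac{k}{200}$. For example, a 95% interval uses p = 0.5 + 0.475 = 0.975.
The sample mean is the midpoint of the interval:
$\bar{x} = \frac{a + b}{2}$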
If we draw from a normal of known variance, then the confidence interval is:
$$\left[\bar{x} - z\frac{\sigma}{\sqrt{n}},\ \bar{x} + z\frac{\sigma}{\sqrt{n}}\right]$$
If we draw from a normal of unknown variance, then the confidence interval is:
$$\left[\bar{x} - t\frac{s}{\sqrt{n}},\ \bar{x} + t\frac{s}{\sqrt{n}}\right]$$
$v = n - 1$; s is the estimated standard deviation.
If we draw from any distribution where n > 30 (Central Limit Theorem), then the confidence interval is:
$$\left[\bar{x} - z\frac{s}{\sqrt{n}},\ \bar{x} + z\frac{s}{\sqrt{n}}\right]$$
s is the estimated standard deviation.
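All three one-sample intervals share the form $\bar{x} \pm \text{critical value} \times \frac{\text{standard deviation}}{\sqrt{n}}$, so one helper covers them. The sketch below is illustrative: the function names are hypothetical, the data are the disk radii from the earlier example, and 2.306 is the standard table value for v = 8 and p = 0.975, which is not quoted in these notes.

```python
import math
from statistics import NormalDist

def mean_interval(x_bar, sd, n, crit):
    """Interval x_bar -/+ crit * sd / sqrt(n); crit is a z or t critical value."""
    half_width = crit * sd / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

def table_probability(level_percent):
    """Cumulative probability to look up in the table: 1/2 + level / 200."""
    return 0.5 + level_percent / 200

# z critical value for a 95% interval, from the standard normal distribution.
z = NormalDist().inv_cdf(table_probability(95))

# 95% t interval for the disk radii (n = 9, so v = 8).
x_bar = 43.9 / 9
s = math.sqrt(0.03694)
low, high = mean_interval(x_bar, s, n=9, crit=2.306)

print(round(z, 3), round(low, 3), round(high, 3))
```

The midpoint of the returned interval is the sample mean.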
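Difference in means being c from two normals of known variance:
$$\left[\bar{x} - \bar{y} - c - z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}},\ \bar{x} - \bar{y} - c + z\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}\right]$$
This can also be used for any population distribution of known variance, given that n > 30.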
Difference in means being c from any distribution of unknown variance given that n>30:
$$\left[\bar{x} - \bar{y} - c - z\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}},\ \bar{x} - \bar{y} - c + z\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}\right]$$
Difference in means being c from two normals of the same, unknown variance:
$$\left[\bar{x} - \bar{y} - c - t s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}},\ \bar{x} - \bar{y} - c + t s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}\right]$$
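As an illustration of the pooled interval, the sketch below applies it to the medication and placebo summary data from the earlier example, with c = 0 and a 90% confidence level. The function name is hypothetical, and 1.746 is the standard table value for v = 16 and p = 0.95, which is not quoted in these notes.

```python
import math

def pooled_difference_interval(mean_x, mean_y, sp2, n_x, n_y, crit, c=0.0):
    """Interval (x_bar - y_bar - c) -/+ crit * s_p * sqrt(1/n_x + 1/n_y)."""
    centre = mean_x - mean_y - c
    half_width = crit * math.sqrt(sp2) * math.sqrt(1 / n_x + 1 / n_y)
    return centre - half_width, centre + half_width

# Summary values from the medication example: means 127.1 and 129.5,
# pooled variance (640.9 + 222) / 16.
sp2 = (640.9 + 222.0) / 16
low, high = pooled_difference_interval(127.1, 129.5, sp2, n_x=10, n_y=8, crit=1.746)

print(round(low, 2), round(high, 2))
```

The interval contains 0, which agrees with the test not rejecting $H_0: \mu_x - \mu_y = 0$.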