Inference using Normal and t distribution

If we take a sample of size n from a normal population with known variance σ², then no matter what the sample size, X̄ ~ N(μ, σ²/n) exactly.

If we take a sample of size n > 30 from any population distribution where σ² is known, then by the Central Limit Theorem, X̄ ~ N(μ, σ²/n) approximately.

If we take a sample of size n > 30 from any population distribution with unknown variance, then σ² is estimated by calculating s², and so X̄ ~ N(μ, s²/n) approximately.

s² = (1/(n−1)) (Σx² − (Σx)²/n) = (1/(n−1)) Σ(x − x̄)²
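As a quick check of these two equivalent forms, here is a minimal Python sketch (standard library only) that computes s² both ways, using the disk radii from the worked example later in these notes:

```python
# Minimal sketch: the unbiased sample variance computed both ways.
from math import sqrt

x = [4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0]   # disk radii from the later example
n = len(x)
x_bar = sum(x) / n

# s^2 = (1/(n-1)) * (sum of x^2  -  (sum of x)^2 / n)
s2_shortcut = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

# s^2 = (1/(n-1)) * sum of (x - x_bar)^2
s2_direct = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

print(x_bar, s2_shortcut, s2_direct, sqrt(s2_direct))   # both forms agree (s^2 ≈ 0.0369)
```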

If we take a sample from a normal population with unknown variance, then the standardized sample mean (X̄ − μ)/(s/√n) is described exactly by the t-distribution. As the sample size gets larger, the t-distribution converges to the normal distribution, so if n > 30 the Central Limit Theorem lets us approximate X̄ as normal. In practice the t-distribution is used only with small sample sizes.

t-distribution
Tables of the t-distribution give the value t with P(T ≤ t) = p.
Suppose a sample of size n is drawn from an N(μ, σ²) population. Then the one-sample t statistic
t = (x̄ − μ)/(s/√n)
has the t-distribution with n − 1 degrees of freedom. Degrees of freedom is denoted by v:
v = n − 1
Comparing density curves of the standard normal distribution and t-distribution
- The density curve of the t-distribution is similar in shape to the standard normal curve.
- The spread of the t-distribution is a bit larger than that of the standard normal distribution.
- The t-distribution has more probability in the tails and less in the center than the standard normal.
- As the degrees of freedom increase, the t density curve becomes ever closer to the standard normal curve.
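This convergence can be seen numerically. The sketch below (assuming SciPy is available) compares the upper 2.5% point of the t-distribution for increasing degrees of freedom with the corresponding standard normal value of about 1.96:

```python
# Sketch: upper 2.5% points of t approach the normal value as v increases (requires SciPy).
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)                      # ≈ 1.960
for v in (2, 5, 11, 30, 100):                 # degrees of freedom
    print(v, round(t.ppf(0.975, df=v), 3))    # 4.303, 2.571, 2.201, 2.042, 1.984
print("normal:", round(z_crit, 3))
```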

The One-Sample t Confidence Interval


The one-sample t interval for a population mean is similar in both reasoning and
computational detail to the one-sample z interval for a population mean.
Choose a simple random sample of size n from a population having unknown mean μ.
Confidence Interval for μ:
x̄ ± t·s/√n
where t is the critical value for the t(n−1) distribution. The margin of error is t·s/√n.

Example
Suppose we need to construct a 95% confidence interval for the mean μ of a normal
population based on a sample of size n=12. What critical value t should be used?
The area under the t-distribution curve from −t_crit to t_crit must be 0.95.
v = n − 1 = 11
Each tail then has area 0.025, so from the table we need P(T ≤ t_crit) = 0.95 + 0.025 = 0.975.
For p = 0.975 and v = 11,
t_crit = 2.201
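Putting the pieces together, here is a sketch of the whole interval calculation (SciPy assumed; the twelve data values are made up purely for illustration):

```python
# Sketch: 95% t confidence interval for a mean with n = 12 (requires SciPy; data are hypothetical).
from math import sqrt
from scipy.stats import t

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.2, 9.6, 10.1]
n = len(data)
x_bar = sum(data) / n
s = sqrt(sum((d - x_bar) ** 2 for d in data) / (n - 1))   # estimated standard deviation

t_crit = t.ppf(0.975, df=n - 1)        # 2.201, the same value as the table lookup above
margin = t_crit * s / sqrt(n)
print((x_bar - margin, x_bar + margin))
```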
Hypothesis test when a small sample is drawn from a normal population of
unknown variance

Steps:
- We will be given values of x (the sample) and a mean value (a).
- H0: μ = a
- H1: μ < a or μ > a or μ ≠ a
  (μ < a or μ > a gives a one-tailed test; μ ≠ a gives a two-tailed test.)
- Calculate the sample mean from the data given: x̄ = Σx/n
- Estimate the variance: s² = (1/(n−1))(Σx² − (Σx)²/n)
- Calculate the test statistic: t = (x̄ − a)/(s/√n)
- Find the critical value, using v = n − 1 degrees of freedom.
  Case 1: H1: μ > a. P(T ≤ t_crit) = 1 − significance level. Critical region: t > t_crit.
  Case 2: H1: μ < a. P(T ≤ t_crit) = 1 − significance level. Critical region: t < −t_crit.
  Case 3: H1: μ ≠ a. P(T ≤ t_crit) = 1 − (significance level)/2. Critical region: t < −t_crit or t > t_crit.
- This statement must be written: if t is (greater than / less than) t_crit, we reject H0.
- Now compare t with t_crit and see whether it lies in the critical region. Accept or reject H0 accordingly.
- Write the concluding statement.
Example
A machine produces circular disks whose radius is normally distributed. The mean radius has historically been 5 cm. The factory foreman believes that the machine is now producing disks that are too small. A sample of 9 disks is taken and their radii are:
4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0
Carry out a hypothesis test at the 10% significance level.
𝐻0 𝜇=5
𝐻1 𝜇<5
Significance Level = 10/100 = 0.1

v=8
x̄ = 43.9/9 = 4.88
s² = (1/8)(214.43 − 43.9²/9) = 0.03694, s = 0.1922
t = (4.88 − 5)/(0.1922/√9) = −1.873

P(T ≤ t_crit) = 1 − 0.1 = 0.9, so t_crit = 1.397


We will reject H0 if t < −t_crit.
Critical region: t < −1.397
Since −1.873 < −1.397, we reject the null hypothesis: there is evidence at the 10% level that the factory foreman's belief is correct.
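For reference, SciPy's built-in one-sample t test reproduces this conclusion. The sketch below gives t ≈ −1.91 rather than −1.873 because it uses the unrounded sample mean, but the decision at the 10% level is the same:

```python
# Sketch: SciPy's one-sample t test for the disk data, H1: mu < 5.
from scipy.stats import t, ttest_1samp

radii = [4.8, 4.9, 4.5, 5.2, 4.9, 4.8, 5.0, 4.8, 5.0]
result = ttest_1samp(radii, popmean=5, alternative='less')
t_crit = t.ppf(0.90, df=len(radii) - 1)      # 1.397

print(result.statistic)                      # ≈ -1.91 (unrounded mean)
print(result.statistic < -t_crit)            # True -> reject H0 at the 10% level
```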
Testing for difference between means

X̄ ~ N(μ_x, σ_x²/n_x)
Ȳ ~ N(μ_y, σ_y²/n_y)

These hold exactly if X and Y are normal, and approximately (by the Central Limit Theorem) if n_x and n_y are greater than 30. Provided X and Y are independent,

X̄ − Ȳ ~ N(μ_x − μ_y, σ_x²/n_x + σ_y²/n_y)

Standard deviation = √(σ_x²/n_x + σ_y²/n_y)

The test statistic is given by
z = ((x̄ − ȳ) − (μ_x − μ_y)) / √(σ_x²/n_x + σ_y²/n_y)

For hypothesis tests the same method is used as before, but with the normal distribution: the critical value is found in the same way, with z in place of t, and the test statistic is then compared with the critical value. If the variances are unknown, we estimate them by s_x² and s_y², provided that both sample sizes are greater than 30.
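A minimal sketch of this z test, using made-up summary values (both sample sizes over 30, variances estimated by s_x² and s_y², and a one-tailed test at the 5% level chosen purely for illustration):

```python
# Sketch: two-sample z test of H0: mu_x - mu_y = 0 with made-up summary values (requires SciPy).
from math import sqrt
from scipy.stats import norm

n_x, x_bar, s2_x = 50, 20.1, 4.0    # hypothetical sample size, mean, estimated variance
n_y, y_bar, s2_y = 60, 19.2, 3.5

z = ((x_bar - y_bar) - 0) / sqrt(s2_x / n_x + s2_y / n_y)
z_crit = norm.ppf(0.95)             # one-tailed test at the 5% level (illustrative choice)
print(z, z > z_crit)                # reject H0 if z falls in the critical region
```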

If the variances of X and Y are unknown but they share a common variance, and X and Y are normally distributed, then to test H0: μ_x − μ_y = c a two-sample t-test is used.

Here s_p² is the unbiased pooled estimate of the common variance, defined:

s_p² = [Σ(x − x̄)² + Σ(y − ȳ)²] / (n_x + n_y − 2) = [(n_x − 1)s_x² + (n_y − 1)s_y²] / (n_x + n_y − 2)

v = n_x + n_y − 2
The test statistic is given by
t = ((x̄ − ȳ) − (μ_x − μ_y)) / √(s_p²(1/n_x + 1/n_y))

Example
A scientist wishes to test whether a new heart medication reduces blood pressure. 10 patients with high blood pressure were given the medication; their summary data are Σx = 1271 and Σ(x − x̄)² = 640.9. 8 patients with high blood pressure were given a placebo; their summary data are Σy = 1036 and Σ(y − ȳ)² = 222. Carry out a hypothesis test at the 10% significance level to see if the medication is working.
𝐻0 𝜇𝑥 − 𝜇𝑦 = 0

𝐻1 𝜇𝑥 − 𝜇𝑦 < 0

Significance Level = 0.1


x̄ = 1271/10 = 127.1, ȳ = 1036/8 = 129.5, x̄ − ȳ = −2.4
s_p² = (640.9 + 222)/(10 + 8 − 2) = 53.93, s_p = 7.344, v = 10 + 8 − 2 = 16
t = (−2.4 − 0)/√(53.93(1/10 + 1/8)) = −0.689

P(T ≤ t_crit) = 1 − 0.1 = 0.9, so t_crit = 1.337

Critical region: t < −1.337. Since −0.689 > −1.337, t lies in the acceptance region and we accept H0: there is insufficient evidence at the 10% level that the medication reduces blood pressure.
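The same calculation can be rechecked directly from the summary data; a brief sketch (not part of the original notes, SciPy assumed):

```python
# Sketch: the pooled two-sample t test recomputed from the summary data above (requires SciPy).
from math import sqrt
from scipy.stats import t

n_x, x_bar, ss_x = 10, 1271 / 10, 640.9   # medication group: n, mean, sum of squared deviations
n_y, y_bar, ss_y = 8, 1036 / 8, 222.0     # placebo group

sp2 = (ss_x + ss_y) / (n_x + n_y - 2)                         # pooled variance, ≈ 53.93
t_stat = (x_bar - y_bar) / sqrt(sp2 * (1 / n_x + 1 / n_y))    # ≈ -0.689
t_crit = t.ppf(0.90, df=n_x + n_y - 2)                        # 1.337

print(t_stat, t_stat < -t_crit)    # False -> t is not in the critical region; accept H0
```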

If n_x = n_y and every piece of data in x is naturally linked to a piece of data in y, then we have a paired sample.
A new set of data is created, d = x − y, and a one-sample t-test is carried out on the differences.
Example
Dwayne believes that his mystical crystals can boost IQs. He takes 10 students and records
their IQs before and after they have been ‘blessed’ by the crystals. The results are:
Victim:     1    2    3    4    5    6    7    8    9   10
IQ before: 107  124  161   89   96  120  109   98  147   89
IQ after:  108  124  159  100  101  119  110  101  146   94

Test Dwayne’s claim at the 5% significance level.


𝑑𝑖 = 𝐼𝑄𝑎𝑓𝑡𝑒𝑟 − 𝐼𝑄𝑏𝑒𝑓𝑜𝑟𝑒

So we have the sample of differences:
1, 0, -2, 11, 5, -1, 1, 3, -1, 5
H0: μ_d = 0
H1: μ_d > 0
d̄ = 22/10 = 2.2, v = 10 − 1 = 9
s_d² = (1/9)(188 − 22²/10) = 15.51
t = (2.2 − 0)/√(15.51/10) = 1.767

P(T ≤ t_crit) = 1 − 0.05 = 0.95, so t_crit = 1.833


Critical region: t > 1.833. Since 1.767 < 1.833, t does not lie in the critical region, so we accept H0: there is insufficient evidence at the 5% level to support Dwayne’s claim.
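SciPy's paired-sample test gives the same decision; a brief sketch (the alternative='greater' argument needs a reasonably recent SciPy):

```python
# Sketch: paired-sample t test for the IQ data above.
from scipy.stats import t, ttest_rel

before = [107, 124, 161, 89, 96, 120, 109, 98, 147, 89]
after = [108, 124, 159, 100, 101, 119, 110, 101, 146, 94]

result = ttest_rel(after, before, alternative='greater')   # tests mu_d > 0 for d = after - before
t_crit = t.ppf(0.95, df=len(before) - 1)                   # 1.833

print(result.statistic)                  # ≈ 1.77
print(result.statistic > t_crit)         # False -> accept H0 at the 5% level
```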

Confidence Intervals
A confidence interval is denoted by [a, b], meaning a < μ < b.
For a c% confidence interval, the critical value is looked up from the t-distribution or normal-distribution table where the area under the graph to the left of the critical value is 1/2 + c/200.
The sample mean is the midpoint of the interval:
x̄ = (a + b)/2
If we draw from a normal population of known variance, then the confidence interval is:
[x̄ − z·σ/√n, x̄ + z·σ/√n]
If we draw from a normal population of unknown variance, then the confidence interval is:
[x̄ − t·s/√n, x̄ + t·s/√n]
where v = n − 1 and s is the estimated standard deviation.
If we draw from any distribution where n > 30 (Central Limit Theorem), then the confidence interval is:
[x̄ − z·s/√n, x̄ + z·s/√n]
where s is the estimated standard deviation.
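The three cases differ only in which critical value is used, so a small helper can cover them. This is a sketch under the assumption that SciPy is available; mean_ci is a hypothetical helper name, not part of the original notes:

```python
# Sketch: confidence interval for a mean, choosing z or t as described above (requires SciPy).
from math import sqrt
from scipy.stats import norm, t

def mean_ci(x_bar, sd, n, level=0.95, use_t=True):
    """Return the level*100% confidence interval for mu (hypothetical helper)."""
    p = 0.5 + level / 2                                  # area to the left of the critical value
    crit = t.ppf(p, df=n - 1) if use_t else norm.ppf(p)
    margin = crit * sd / sqrt(n)
    return (x_bar - margin, x_bar + margin)

# 90% interval for the disk example (normal population, variance estimated by s):
print(mean_ci(4.88, 0.1922, 9, level=0.90))
```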

Confidence Interval for the difference between population means


Difference in means being c from two normals of known variance:

[x̄ − ȳ − c − z·√(σ_x²/n_x + σ_y²/n_y), x̄ − ȳ − c + z·√(σ_x²/n_x + σ_y²/n_y)]

Here c is the hypothesised difference in means; taking c = 0 gives the usual confidence interval for μ_x − μ_y. This can also be used for any population distribution of known variance, given that n > 30.

Difference in means being c from any distribution of unknown variance, given that n > 30:

[x̄ − ȳ − c − z·√(s_x²/n_x + s_y²/n_y), x̄ − ȳ − c + z·√(s_x²/n_x + s_y²/n_y)]

Difference in means being c from two normals with the same, unknown variance:

[x̄ − ȳ − c − t·s_p·√(1/n_x + 1/n_y), x̄ − ȳ − c + t·s_p·√(1/n_x + 1/n_y)]

Here s_p² is the unbiased pooled estimate of the common variance, as defined above, and t is the critical value for v = n_x + n_y − 2 degrees of freedom.
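As an illustration, a sketch computing this pooled-variance interval for the blood-pressure example earlier (with c = 0 and 90% confidence; SciPy assumed):

```python
# Sketch: pooled-variance interval for the blood-pressure example, with c = 0 (requires SciPy).
from math import sqrt
from scipy.stats import t

n_x, n_y = 10, 8
x_bar, y_bar, c = 127.1, 129.5, 0
sp2 = (640.9 + 222.0) / (n_x + n_y - 2)          # pooled variance from the earlier example

t_crit = t.ppf(0.95, df=n_x + n_y - 2)           # 90% interval -> area 0.95 to the left
half_width = t_crit * sqrt(sp2 * (1 / n_x + 1 / n_y))
print((x_bar - y_bar - c - half_width, x_bar - y_bar - c + half_width))
```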
