Hypothesis Testing
S.D. of $\bar{x} = \sqrt{\dfrac{\sum \bar{x}^2}{n} - \left(\dfrac{\sum \bar{x}}{n}\right)^2}$
Let us list the samples for the above data and find the mean and S.D. of the sample means.
Sample no.   Nos. on the tickets in the sample   Sample mean $\bar{x}$   $\bar{x}^2$
1            1, 1                                1                       1
2            1, 3                                2                       4
3            1, 5                                3                       9
4            3, 1                                2                       4
5            3, 3                                3                       9
6            3, 5                                4                       16
7            5, 1                                3                       9
8            5, 3                                4                       16
9            5, 5                                5                       25
Total                                            $\sum \bar{x} = 27$     $\sum \bar{x}^2 = 93$
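Thus, for the sample,
Mean of $\bar{x} = \dfrac{\sum \bar{x}}{n} = \dfrac{27}{9} = 3$
S.D. of $\bar{x} = \sqrt{\dfrac{\sum \bar{x}^2}{n} - \left(\dfrac{\sum \bar{x}}{n}\right)^2} = \sqrt{\dfrac{93}{9} - \left(\dfrac{27}{9}\right)^2} = \sqrt{\dfrac{93}{9} - 3^2} = \sqrt{\dfrac{31}{3} - 9} = \sqrt{\dfrac{4}{3}} = \dfrac{2}{\sqrt{3}}$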
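Next, consider the population 1, 3 and 5 and let us find its mean and S.D.
We have n = 3, $\sum x = 9$ and $\sum x^2 = 35$.
Population mean $= \mu = \dfrac{\sum x}{n} = \dfrac{9}{3} = 3$ and
S.D. of population $= \sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$
$= \sqrt{\dfrac{35}{3} - (3)^2}$
$= \sqrt{\dfrac{35}{3} - 9}$
$= \sqrt{\dfrac{35 - 27}{3}}$
$= \sqrt{\dfrac{8}{3}}$
$= \sqrt{\dfrac{4 \times 2}{3}}$
$= 2\sqrt{\dfrac{2}{3}}$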
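Thus, in general, the S.D. of $\bar{x}$ equals $\dfrac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population S.D. and n is the sample size. Here each sample has 2 tickets, and $2\sqrt{\dfrac{2}{3}} \div \sqrt{2} = \dfrac{2}{\sqrt{3}}$.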
Definition:
The distribution of $\bar{x}$ is known as its sampling distribution.
The standard deviation of $\bar{x}$ is known as its "Standard Error".
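The ticket example can be checked by brute force. The following is a minimal sketch, assuming samples of size 2 drawn with replacement from the population 1, 3, 5 (as in the table); the helper names `mean` and `sd` are illustrative only.

```python
from itertools import product
from math import sqrt

population = [1, 3, 5]
sample_size = 2  # assumption: two tickets per sample, drawn with replacement

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # S.D. in the form used in these notes: sqrt(sum(x^2)/n - (sum(x)/n)^2)
    m = mean(values)
    return sqrt(sum(v * v for v in values) / len(values) - m * m)

# All 9 ordered samples and their means
samples = list(product(population, repeat=sample_size))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)   # mean of the sampling distribution
standard_error = sd(sample_means)    # S.D. of the sampling distribution
sigma = sd(population)               # population S.D.

print(mean_of_means)
print(standard_error, sigma / sqrt(sample_size))
```

The two numbers on the last line agree, which illustrates that the standard error of the mean equals the population S.D. divided by the square root of the sample size.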
Ex.2. The mean life of a large set of fluorescent tubes is 1570 hours with a
standard deviation of 150 hours. A sample of 100 tubes is drawn from it
with replacement. Find the probabilities that the mean life of these tubes
will
(i) exceed 1600 hrs.
(ii) not exceed 1540 hrs.
(Given: For s.n.v. z, area between (i) z = 0 to z = 2 is 0.4772
(ii) z = 0 to z = 1.33 is 0.4082)
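Soln: Let $\bar{x}$, the sample mean, be the variate that follows a normal distribution.
We have,
$\mu = 1570$
$\sigma = 150$
For the normal distribution, the s.n.v. z is given by
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{x} - 1570}{150/\sqrt{100}} = \dfrac{\bar{x} - 1570}{15}$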
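As a rough numerical check of this example, the sketch below evaluates z at 1600 and 1540 hours and converts each to a tail probability. It assumes the normal approximation stated above; `normal_cdf` is a helper written here for illustration, built on the standard-library error function.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal variate up to z
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma, n = 1570, 150, 100
standard_error = sigma / sqrt(n)          # 150 / 10 = 15

z_1600 = (1600 - mu) / standard_error     # 2.0
z_1540 = (1540 - mu) / standard_error     # -2.0

p_exceed_1600 = 1 - normal_cdf(z_1600)    # area to the right of z = 2
p_not_exceed_1540 = normal_cdf(z_1540)    # area to the left of z = -2

print(round(p_exceed_1600, 4), round(p_not_exceed_1540, 4))
```

Both probabilities come to about 0.0228, which matches 0.5 – 0.4772 from the given table value for z = 2.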
Note:
1. Sample proportion (p) in general follows Binomial Distribution
2. For a large value of n it can be approximated by Normal Distribution
3. The distribution of p is called the “Sampling distribution” and the std
deviation of p is called the “Standard Error”.
4. Standard Normal Variate (s.n.v.) for p is given by,
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$
Now,
when p = 25% = 0.25, the s.n.v. z is given by
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.25 - 0.2}{0.02} = \dfrac{0.05}{0.02} = 2.5$
∴ Prob(p > 0.25) = P(z > 2.5)
= Area to the right of z = 2.5
= 0.5 – 0.4938
= 0.0062
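The same calculation can be sketched in a few lines. The values P = 0.2 and standard error 0.02 are taken directly from the worked lines above (the sample size behind them is not restated here, so it is not assumed); `normal_cdf` is an illustrative helper.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal variate up to z
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

P = 0.20               # population proportion used in the working above
p = 0.25               # sample proportion of interest
standard_error = 0.02  # value of sqrt(PQ/n) used in the working above

z = (p - P) / standard_error
prob_p_above = 1 - normal_cdf(z)   # area to the right of z

print(round(z, 2), round(prob_p_above, 4))
```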
Confidence interval and confidence limits:
For large proportion:
Let a sample of size n be drawn from a large population with mean $\mu$ and
standard deviation $\sigma$. Let $\bar{x}$ denote the mean of the sample drawn.
Then the s.n.v. z is given by
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Along similar lines we can calculate the 99 % confidence limits and confidence
interval. The procedure can be repeated for the population proportion. The
different limits and intervals are written below in tabular form:
1) For population mean, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
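Level of significance   Confidence limits                                                Confidence interval
1 % (0.01)              $p - 2.58\sqrt{\dfrac{PQ}{n}}$ and $p + 2.58\sqrt{\dfrac{PQ}{n}}$   $\left(p - 2.58\sqrt{\dfrac{PQ}{n}},\ p + 2.58\sqrt{\dfrac{PQ}{n}}\right)$
Certainty               $p - 3\sqrt{\dfrac{PQ}{n}}$ and $p + 3\sqrt{\dfrac{PQ}{n}}$         $\left(p - 3\sqrt{\dfrac{PQ}{n}},\ p + 3\sqrt{\dfrac{PQ}{n}}\right)$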
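For the population mean, the limits take the same form with $\bar{x}$ in place of p and $\sigma/\sqrt{n}$ as the standard error. A minimal sketch, assuming the multipliers 1.96, 2.58 and 3 used in these notes; the function name `mean_confidence_limits` and the sample figures below are made up for illustration.

```python
from math import sqrt

# Multipliers used in these notes for each confidence level
Z_MULTIPLIER = {"95%": 1.96, "99%": 2.58, "almost certain": 3.0}

def mean_confidence_limits(x_bar, sd, n, level):
    """Return (lower, upper) limits for the population mean (large sample)."""
    margin = Z_MULTIPLIER[level] * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample: mean 50, S.D. 10, size 100
lower, upper = mean_confidence_limits(50, 10, 100, "99%")
print(round(lower, 2), round(upper, 2))
```

With these made-up figures the standard error is 1, so the 99 % limits are simply 50 ± 2.58.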
Exercise:
1. In a study of television viewing habits, in order to obtain an interval
estimate of the average number of hours per week that teenagers spend
watching television programmes, a random sample of 100 teenage children is
taken. The sample investigation revealed a mean of 9.2 hours with S.D.
of 3.2 hours. Obtain the desired interval estimate with confidence
coefficient of 0.99. (8.3744, 10.0256)
2. For a given sample of 200 items drawn from a large population, the mean is
65 and the S.D. is 8. Find the 95 % Confidence limits for the population
mean. (63.8913 and 66.1087)
3. A hospital is collecting data regarding the number of days spent by a patient
in the hospital for typhoid. A sample shows the following:
No. of days spent (x)    3    4    5    6    7    8    9
No. of patients (y)      8   10   15   20   20   15   12
Find the limits within which the mean number of days required to be spent in
the hospital by a patient for typhoid lies almost certainly. (5.745 and 6.795)
Estimation of Point: (Sample proportion)
Ex.1. A random sample of 100 balls selected from a large consignment of tennis
balls gave 10 % bad balls. Find 99 % confidence limits for the percentage of
bad balls in the consignment.
Soln: We proceed as follows:
To find: the 99% confidence limits for the percentage of bad balls in
the consignment.
Given that:
𝑛 = 100
𝑝 = 10% = 0.1
𝑃 = 𝑝 = 0.1
⟹ 𝑄 = 1 − 𝑃 = 0.9 and 𝛼 = 1 %
99% confidence limits are given by
$p - 2.58\sqrt{\dfrac{PQ}{n}}$ and $p + 2.58\sqrt{\dfrac{PQ}{n}}$
i.e. $0.1 - 2.58\sqrt{\dfrac{0.1 \times 0.9}{100}}$ and $0.1 + 2.58\sqrt{\dfrac{0.1 \times 0.9}{100}}$
i.e. $0.1 - 2.58 \times 0.03$ and $0.1 + 2.58 \times 0.03$
i.e. $0.1 - 0.0774$ and $0.1 + 0.0774$
i.e. 0.0226 and 0.1774
Thus, the 99% confidence limits for the percentage of bad balls are:
2.26 % and 17.74 %
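The arithmetic of this example can be reproduced with a short sketch. It assumes, as in the solution, that the unknown population proportion P is estimated by the sample proportion p, and uses the 2.58 multiplier for 99 % confidence.

```python
from math import sqrt

n = 100        # sample size
p = 0.10       # sample proportion of bad balls
P = p          # P estimated by p, as in the solution
Q = 1 - P

standard_error = sqrt(P * Q / n)   # sqrt(0.09 / 100) = 0.03
margin = 2.58 * standard_error     # 2.58 * 0.03 = 0.0774

lower, upper = p - margin, p + margin
print(f"{100 * lower:.2f}% to {100 * upper:.2f}%")   # 2.26% to 17.74%
```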
Exercise:
1. In a sample of 1000 T.V. viewers, 340 watch a particular programme. Find
99 % confidence limits for the percentage of all viewers who watch the
programme. (30.135% and 37.865%)
2. A department store wants to determine the percentage of shoppers who
leave only after having actually bought at least one item. A random sample
of 900 shoppers leaving the store showed that 750 had bought something,
ranging from a couple of soap bars to a complete bedroom furniture set.
What is the 99% confidence interval for the true percentage of buyers?
(80.13%, 86.54%)
3. A sample of 900 days is taken from the meteorological records of a certain
district and 100 of them are found to be foggy. Determine the probable
limits for the percentage of foggy days in the district. (7.89% and 14.135%)
Testing of Hypothesis:
Acceptance region:
The region that falls outside the critical region is called the "region of
acceptance".
Level of significance:
The level of significance is defined as the maximum probability with which we
would be willing to risk a Type I error.
[Alternatively, if a test is designed so that P(Type I Error) ≤ 𝛼,
then 𝛼 is called the level of significance.] It is denoted by 𝛼.
Since 𝛼 is a probability, it is always less than 1. Generally we take 𝛼
between 1% and 5%. By default we take 𝛼 as 5% (0.05).
Note:
1) The decision criterion is based on:
i) $H_1$,
ii) the l.o.s. 𝛼 and
iii) the sample size n.
2) However, for large samples, it depends only on $H_1$ and 𝛼.
3) Here, z is the computed value of the test statistic based on the selected
sample mean or sample proportion.
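For a large sample and a two-sided alternative, the decision criterion reduces to comparing |z| with a critical value fixed by 𝛼. The sketch below is one way to obtain that value, assuming Python 3.8+ for `statistics.NormalDist`; the function names are illustrative. Note that the exact 1 % value is about 2.576, which these notes round to 2.58.

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha):
    # Value c such that P(|z| > c) = alpha for a standard normal variate
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_h0(z, alpha=0.05):
    # Two-sided rule: reject H0 when z falls in either tail
    return abs(z) > two_tailed_critical_value(alpha)

print(round(two_tailed_critical_value(0.05), 2))   # 1.96
print(round(two_tailed_critical_value(0.01), 3))   # 2.576
```

A one-sided alternative would put the whole of 𝛼 in a single tail, so the critical value would come from `inv_cdf(1 - alpha)` instead.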
Ex.2. An ambulance service claims that it takes on average 8.9 minutes for
an ambulance to reach its destination in emergency calls. To check this
claim, the agency which licenses ambulance services has timed them on 50
emergency calls, getting a mean of 9.3 minutes with a standard deviation
of 1.6 minutes. What can they conclude at 5 % l.o.s.?
Soln: We proceed as follows:
Step 1.: $H_0$ = An ambulance reaches its destination in 8.9 minutes on average.
$H_1$ = An ambulance does not reach its destination in 8.9 minutes on average.
(i.e. $H_0: \mu = 8.9$ min
$H_1: \mu \neq 8.9$ min)
Step 2.: 𝛼 = 0.05 (i.e. 5 %)
Step 3.: Based on 𝐻1 and 𝛼 and large value of n = 50, we take decision:
Reject 𝐻0 iff 𝑧 < −1.96 or 𝑧 > 1.96
Step 4.: For the given sample,
$\bar{x} = 9.3$
$\sigma = 1.6$
$n = 50$
and for the population, the claimed mean is $\mu = 8.9$.
The test statistic z is given by
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{9.3 - 8.9}{1.6/\sqrt{50}} = 1.77$
Step 5: Observe that the computed value z = 1.77 is not greater than 1.96
(and not less than −1.96). Thus we do not reject H0.
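The five steps of this example can be sketched as follows. The code assumes, as the solution does, that the sample S.D. of 1.6 stands in for the population S.D. because the sample is large; variable names are illustrative.

```python
from math import sqrt

mu_0 = 8.9      # mean claimed under H0 (minutes)
x_bar = 9.3     # sample mean
s = 1.6         # sample S.D., used in place of sigma for a large sample
n = 50          # sample size
critical = 1.96 # two-tailed critical value at 5 % l.o.s.

z = (x_bar - mu_0) / (s / sqrt(n))
reject_h0 = z < -critical or z > critical

print(round(z, 2), reject_h0)   # 1.77 False
```

Since z stays inside (−1.96, 1.96), the data do not contradict the claimed mean of 8.9 minutes at the 5 % level.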
Ex.2. In a sample of 400 residents of a locality, 232 are men. Test the
hypothesis that sex ratio is 1:1 at 1 % l.o.s.
Soln: We proceed as follows:
Step 1: H0 = Sex ratio is 1:1
(men and women occur in the ratio 1:1, i.e. $P = \dfrac{1}{2}$).
H1 = Sex ratio is not 1:1, i.e. $P \neq \dfrac{1}{2}$.
Step 2: Level of significance, 𝛼 = 0.01 = 1%
Step 3: Decision criterion: Based on H1 and 𝛼 (and a large value of n = 400)
we arrive at the decision criterion: reject H0
if z < −2.58 or z > 2.58
Step 4: Given that
$P = \dfrac{1}{2} = 0.5$ (P = proportion of men in the population)
$\Rightarrow Q = 1 - P = \dfrac{1}{2} = 0.5$
n = 400
$p = \dfrac{232}{400} = 0.58$ (p = proportion of men in the sample)
The test statistic z is given by
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{0.58 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{400}}} = \dfrac{0.08}{0.5/20} = \dfrac{0.08 \times 20}{0.5} = 3.2$
Step 5: Observe that the computed value z = 3.2 > 2.58.
Thus we reject H0.
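The proportion test above follows the same pattern as the test for a mean. A minimal sketch, using 2.58 as the two-tailed critical value at 1 % l.o.s. as in the solution; variable names are illustrative.

```python
from math import sqrt

n = 400          # residents sampled
men = 232        # men in the sample
P = 0.5          # proportion of men under H0 (sex ratio 1:1)
Q = 1 - P
critical = 2.58  # two-tailed critical value at 1 % l.o.s.

p = men / n                          # sample proportion, 0.58
z = (p - P) / sqrt(P * Q / n)        # standard error is 0.5 / 20 = 0.025
reject_h0 = z < -critical or z > critical

print(round(z, 2), reject_h0)   # 3.2 True
```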