Chapter 2 SM
CHAPTER TWO
2 INFERENCE ABOUT POPULATION AND PROPORTION
❖ Outline:
2.1 Introduction
2.2 Estimation
2.2.1 Sampling distribution of the sample mean
2.2.2 Point and interval estimation of population mean
2.2.3 Sampling distribution of the sample proportion
2.2.4 Point and interval estimation of population proportion
2.3 Hypothesis testing
2.3.1 Important concepts in testing statistical hypothesis
2.3.2 Hypothesis testing about the population mean
2.3.3 Hypothesis testing about the population proportion
2.3.4 Sample size determination
2.1. INTRODUCTION
Inference, specifically decision making and prediction, is centuries old and plays a very important role
in our lives. Each of us faces daily personal decisions and situations that require predictions concerning
the future. The inferences that individuals make should be based on relevant facts, which we call
observations, or data.
Methods for making inferences about parameters fall into one of two categories. Either we will estimate
(predict) the value of the population parameter of interest or we will test a hypothesis about the value of
the parameter. These two methods of statistical inference, estimation and hypothesis testing, involve
different procedures and, more importantly, they answer two different questions about the parameter. In
estimating a population parameter, we are answering the question, "What is the value of the population
parameter?" In testing a hypothesis, we are answering the question, "Is the parameter value equal to
this specific value?"
Inference is the process of making interpretations or conclusions from sample data for the totality of
the population. Inferential statistics uses the sample results to make decisions and draw conclusions
about the population from which the sample is drawn. In statistics there are two ways through which
inference can be made.
✓ Statistical estimation
✓ Statistical hypothesis testing
Parameter and Statistic
A number that describes a population is called a parameter. A number that describes a sample is called a statistic.
Statistical Method Chapter 2: Inference about Population and Proportion
The two common forms of statistical inference are:
1. Estimation
2. Null hypothesis tests of significance (NHTS)
There are two forms of estimation:
➢ Point estimation (the single most likely value for the parameter)
➢ Interval estimation (also called a confidence interval for the parameter)
Both estimation and NHTS are used to infer parameters. A parameter is a statistical constant that
describes a feature of a phenomenon, population, pmf, or pdf.
2.2 Statistical Estimation:
This is one way of making inference about the population parameter where the investigator does not
have any prior notion about values or characteristics of the population parameter. There are two types of
estimation:
i. Point estimation: The goal of point estimation is to make a reasonable guess of the unknown value
of a designated population quantity, e.g., the population mean. The quality of an individual
estimate depends on the individual sample from which it was computed and is therefore affected by
chance variation. A point estimate is a single value computed from sample information that is used to
estimate a parameter. The best point estimate of the population mean $\mu$ is the sample mean $\bar{X}$.
ii. Interval estimation: This is the procedure that produces an interval of values as an estimate for a
parameter, that is, an interval that contains the likely values of the parameter. It deals with identifying
the upper and lower limits of a parameter.
Estimator and Estimate
An estimator is the rule or random variable that helps us approximate a population parameter, whereas an
estimate is one of the possible values that an estimator can assume. For example, the sample mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
is an estimator for the population mean, and $\bar{X} = 10$ is an estimate, which is one of the
possible values of $\bar{X}$.
Three Properties of a Good Estimator
➢ It should be unbiased.
➢ It should be consistent.
➢ It should be relatively efficient.
✓ The estimator should be an unbiased estimator. That is, the expected value or the mean of the
estimates obtained from samples of a given size is equal to the parameter being estimated. It is
desirable that the sampling distribution be centered on the true population parameter. An estimator
with this property is called unbiased.
✓ For example, $E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$, so $\bar{X}$ is an unbiased estimator (UE) of $\mu$.
Now, let's multiply both sides of the equation by n-1, just so we don't have to keep carrying that
around, and square out the right side, just like we did with that shortcut formula for SSX,
above.
Unfortunately, the expected value of the square of something is not equal to the square of the
expected value, so we seem to have hit an impasse with both terms on the RHS. But, we're not
out of tricks yet. Each of those terms is an expected value of something squared: a second
moment. Let's use the trick about moments that we saw above. First, let Y be the random
variable defined by the sample mean. We're trying to figure out the expected value of its
square.
We can substitute this stuff for the second term on the RHS of equation 1. Also, note that the first term
on the RHS of equation 1 is the second moment of X, so that can also be rewritten. Doing both
substitutions gives us:
✓ The estimator should be consistent. For a consistent estimator, as sample size increases, the value of
the estimator approaches the value of the parameter estimated.
✓ The estimator should be a relatively efficient estimator. That is, of all the statistics that can be used
to estimate a parameter, the relatively efficient estimator has the smallest variance. It's desirable that
our chosen estimator have a small standard error in comparison with other estimators we might
have chosen.
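The unbiasedness property can be checked exactly on a tiny population. The sketch below is only an illustration: the population values and the sample size are assumptions chosen for convenience, and sampling is taken with replacement so that the draws are independent. It enumerates every possible sample and averages the sample mean, the sample variance with divisor n − 1, and the variance with divisor n.

```python
from itertools import product

population = [6, 8, 10, 12, 14]   # illustrative population (an assumption)
n = 2                              # illustrative sample size (an assumption)

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs, ddof):
    # ddof=1 gives the usual S^2 (divisor n-1); ddof=0 uses divisor n
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

# All N**n equally likely ordered samples when drawing with replacement
samples = list(product(population, repeat=n))

expected_mean = sum(sample_mean(s) for s in samples) / len(samples)
expected_s2 = sum(sample_var(s, 1) for s in samples) / len(samples)
expected_biased = sum(sample_var(s, 0) for s in samples) / len(samples)

print("mu =", mu, " E(sample mean) =", expected_mean)
print("sigma^2 =", sigma2, " E(S^2, divisor n-1) =", expected_s2)
print("E(variance with divisor n) =", expected_biased)
```

In this setting the average of the sample means equals the population mean and the average of S² equals the population variance, while the divisor-n version falls short by the factor (n − 1)/n.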
2.2.1 Sampling Distribution of the sample mean
Because statistics such as $\bar{x}$ vary from sample to sample, they are random variables. As such, statistics
have probability distributions associated with them. In order to make probability statements regarding a
sample statistic, we need to know the probability distribution of the sample statistic. That is to say, we
need to know the shape, center and spread of the sample statistic's distribution.
The sampling distribution of a statistic is a probability distribution for all possible values of the
statistic computed from a sample of size n.
➢ There are commonly three properties of interest of a given sampling distribution.
✓ Its Mean
✓ Its Variance
✓ Its Functional form.
The sampling distribution of the sample mean is a theoretical probability distribution that shows the
functional relationship between the possible values of the sample mean, based on samples of size n,
and the probability associated with each value, for all possible samples of size n drawn from that
particular population.
Steps for the construction of Sampling Distribution of the mean
There are some steps to construct sampling distribution of sample mean as follows:
1. From a finite population of size N, randomly draw all possible samples of size n. There are
$\binom{N}{n}$ possible samples (and hence sample means) if the sampling is without replacement and
$N^n$ possible samples if the sampling is with replacement.
2. Calculate the mean for each sample.
3. Summarize the mean obtained in step 2 in terms of frequency distribution.
4. Find the probability distribution of all sample means.
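The sampling error is the difference between the sample measure and the corresponding population
measure, which arises because the sample is not a perfect representation of the population. The standard
error of the mean is the standard deviation of the sampling distribution of the sample mean.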
Example 2.1: Suppose we have a population of size N = 5, consisting of the ages of five children:
6, 8, 10, 12, and 14. Take samples of size n = 2 without replacement.
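The construction steps listed earlier can be carried out for Example 2.1 by brute force. The following sketch enumerates all samples of size 2 without replacement, tabulates the sample means, and computes the mean and variance of the resulting sampling distribution. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ages = [6, 8, 10, 12, 14]   # the population of Example 2.1
n = 2

# Step 1: all possible samples of size n without replacement
samples = list(combinations(ages, n))

# Step 2: the mean of each sample (exact fractions avoid rounding)
means = [Fraction(sum(s), n) for s in samples]

# Steps 3 and 4: frequency table turned into a probability distribution
dist = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}
for m, p in dist.items():
    print(f"sample mean = {float(m):4.1f}   probability = {p}")

mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in dist.items())
print("number of samples:", len(samples))
print("mean of sample means:", mean_of_means)
print("variance of sample means:", var_of_means)
```

The mean of the sample means equals the population mean of 10. The variance of the sample means also agrees with the standard finite-population result $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, which is quoted here as a known fact rather than taken from the text above.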
The central limit theorem states that the sample mean follows approximately the normal distribution
with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the
population from which the sample was selected. The sample size n has to be large (usually n ≥ 30) if
the population from which the sample is taken is not normal.
If the population follows the normal distribution, then the sample size n can be either small or large.
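A small simulation can make the central limit theorem concrete. The sketch below is an illustration under assumed settings: the population is Exponential(1), which is clearly non-normal with mean 1 and standard deviation 1, and the sample size, number of repetitions and seed are arbitrary choices. It compares the center and spread of the simulated sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
n, reps = 30, 5000        # sample size and number of repeated samples (assumed)
mu, sigma = 1.0, 1.0      # mean and sd of the Exponential(1) population

# Draw `reps` samples of size n and record each sample mean
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

print(f"average of sample means: {center:.3f} (mu = {mu})")
print(f"sd of sample means:      {spread:.3f} (sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

The simulated values should land close to the theoretical ones, although they will differ slightly from run to run if the seed is changed.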
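$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
A $(1-\alpha)100\%$ confidence interval for $\mu$ is therefore
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$
However, most of the time $\sigma$ is not known. In that case we estimate $\sigma$ by its point estimate S, and
$$\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
is a $(1-\alpha)100\%$ confidence interval for $\mu$.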
The Z values corresponding to the most commonly used confidence levels are given below.
(1-α)100%    α       α/2      Zα/2
90           0.10    0.05     1.645
95           0.05    0.025    1.96
99           0.01    0.005    2.58
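The tabulated critical values can be reproduced with the standard normal quantile function. This is a minimal sketch using Python's standard library; the helper name `z_critical` is an illustrative choice, not something defined in these notes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    # The upper alpha/2 point is the (1 - alpha/2) quantile of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The computed value for 99% is slightly below 2.58; the table rounds it to two decimals.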
Figure 1: The area to the right of zα is α. For example, z.05 is 1.645. The area outside ±zα/2
is α/2 + α/2 = α. For example, z.025 = 1.96, so the area to the right of 1.96 is 0.025, the area to the left
of −1.96 is also 0.025, and the area outside ±1.96 is 0.05.
➢ Why zα/2, rather than zα? We want to make sure that the total area outside the interval is α. This
means that α/2 should be to the left of the interval and α/2 should be to the right. In the special
case of a 90% confidence interval, α = 0.1, so α/2 = 0.05, and z.05 is indeed 1.645.
The expression
$$Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
is called the half width of the confidence interval or the margin of error. The half width is a measure of
precision; the tighter the interval, the more precise our estimate. Not surprisingly, the half width
➢ decreases as the sample size increases;
➢ increases as the population standard deviation increases;
➢ increases as the confidence level increases (higher confidence requires a larger zα/2).
For any sample size n and any confidence level 1−α, we have $t_{\alpha/2,\,n-1} > z_{\alpha/2}$. Consequently, intervals
based on the t distribution are always wider than those based on the standard normal. As the sample
size increases, the df increases. As the df increases, the t distribution approaches the standard normal distribution.
• For example, we know that z.05 = 1.645. Look down the .05 column of the t table: tn,.05
approaches 1.645 as n increases.
When to use t, when to use z? Strictly speaking, the conditions are as follows:
➢ Z-based confidence intervals are valid if we have a large sample;
➢ t - based confidence intervals are valid if we have a sample from a normal distribution with an
unknown variance.
Examples: 1) The registrar of Dambi Dollo University is interested to estimate the average age of
students who graduate with BSc degree. From past studies the population standard deviation is known
to be 2 years. A sample of 50 graduating students is selected, and the mean is found to be 23.2 years.
Find the 95% confidence interval estimate of the population mean age of the graduating students at the
university.
2) A random sample of size 36 selected from a normal population has a mean of 32. The
population standard deviation (σ) is 4.2. Find
a) A 95% confidence interval for the population mean
b) A 99% confidence interval for the population mean
c) Which interval is wider? Explain why.
3) The mean operating lifetime for a random sample of n = 10 light bulbs is $\bar{X}$ = 4,000 hr, with
sample standard deviation S = 200 hr. The operating life of bulbs in general is assumed to be
approximately normally distributed. Find the 95% confidence interval for the true mean operating
lifetime.
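Solutions:
1. Given: σ = 2 years, $\bar{X}$ = 23.2 years, n = 50 (Case 1)
A $(1-\alpha)100\%$ confidence interval for $\mu$ is
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025 \Rightarrow Z_{\alpha/2} = 1.96$$
$$23.2 - 1.96\frac{2}{\sqrt{50}} < \mu < 23.2 + 1.96\frac{2}{\sqrt{50}}$$
$$23.2 - 0.55 < \mu < 23.2 + 0.55$$
$$22.65 < \mu < 23.75$$
Interpretation: The registrar is 95% confident that the average age of graduating students is
between 22.65 and 23.75 years.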
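2. Given: σ = 4.2, $\bar{X}$ = 32, n = 36
a) 95%: $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025 \Rightarrow Z_{\alpha/2} = 1.96$
$$32 \pm 1.96\frac{4.2}{\sqrt{36}} = 32 \pm 1.372 \Rightarrow 30.628 < \mu < 33.372$$
b) 99%: $1 - \alpha = 0.99 \Rightarrow \alpha = 0.01,\ \alpha/2 = 0.005 \Rightarrow Z_{\alpha/2} = 2.58$
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$32 - 2.58\frac{4.2}{\sqrt{36}} < \mu < 32 + 2.58\frac{4.2}{\sqrt{36}}$$
$$32 \pm 1.806$$
$$30.194 < \mu < 33.806$$
The 99% confidence interval is (30.194, 33.806).
Interpretation: We are 99% confident that the population mean is between 30.194 and 33.806.
c) The 99% confidence interval is wider than the 95% confidence interval.
As the confidence level increases, the interval becomes wider.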
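The interval calculations above all follow one pattern, point estimate ± critical value × standard error, so they can be sketched with a single helper. The function name `mean_ci` is illustrative. The t critical value for 9 degrees of freedom (2.262) is hard-coded from a standard t table because the Python standard library has no t quantile function; this is an assumption of the sketch, used for the light-bulb example.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, sd, n, critical):
    """Return (lower, upper) = xbar -/+ critical * sd / sqrt(n)."""
    half_width = critical * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width

z95 = NormalDist().inv_cdf(0.975)   # about 1.96
z99 = NormalDist().inv_cdf(0.995)   # about 2.576 (tables often round to 2.58)
T_025_DF9 = 2.262                   # t table value for alpha/2 = 0.025, df = 9 (assumed from a table)

ci_age = mean_ci(23.2, 2, 50, z95)              # graduating students, sigma known
ci_95 = mean_ci(32, 4.2, 36, z95)               # normal population, 95%
ci_99 = mean_ci(32, 4.2, 36, z99)               # normal population, 99%
ci_bulbs = mean_ci(4000, 200, 10, T_025_DF9)    # small sample, sigma unknown

for name, ci in [("age 95%", ci_age), ("n=36 95%", ci_95),
                 ("n=36 99%", ci_99), ("bulbs 95% (t)", ci_bulbs)]:
    print(f"{name:14s} ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the exact 99% quantile is a little smaller than the rounded table value 2.58, the computed 99% half width comes out marginally below the hand-calculated 1.806.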
The following table shows all possible values of p̂ (rounded to two decimal places) for each sample.
The sampling distribution of the sample proportion $\hat{P}$ is approximately normal with
mean $\mu_{\hat{P}} = P$ and standard error $\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$ if nP and n(1 − P) are both greater than 5.
2.2.4 Point and Interval estimation of population proportions (P)
Point estimation of population proportions
If P represents the population proportion, then the sample proportion $\hat{P} = \frac{X}{n}$ provides a good
estimate of P. Therefore, the sample proportion $\hat{P}$ is the point estimate of the population proportion.
Interval estimation of population proportions (P)
In the binomial experiment each trial results in one of two outcomes, which we labeled as either a
success or a failure. We designated P as the probability of a success and 1 − P as the probability of a
failure. Then the probability distribution for x, the number of successes in n identical trials, is
$$P(x) = \frac{n!}{x!\,(n-x)!}\,P^{x}(1-P)^{n-x}$$
In a random sample of size n from a population in which the proportion of elements classified as successes
is P, the best estimate of the parameter P is the sample proportion of successes. Letting x denote the
number of successes in the n sample trials, the sample proportion is $\hat{P} = \frac{x}{n}$. The distribution of X can be
approximated by a normal curve when $nP > 5$ and $n(1-P) > 5$.
In a similar way, the distribution of $\hat{P} = \frac{x}{n}$ can be approximated by a normal distribution with mean
$\mu_{\hat{P}} = P$ and standard error $\sigma_{\hat{P}} = \sqrt{\frac{P(1-P)}{n}}$.
A general $(1-\alpha)100\%$ confidence interval for the proportion of successes is given by
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right), \quad \hat{q} = 1 - \hat{p}.$$
Examples
a. In a random sample of n = 230 voters, 54 voted for candidate A. Find the 90% confidence
interval for the proportion of individuals who voted for candidate A.
b. In a sample of 100 teenage girls, 30% used hair coloring. Find the 95% confidence interval of
the true proportion of teenage girls who use hair coloring.
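Solutions:
a) Let x be the number of individuals who voted for candidate A.
$$\hat{p} = \frac{x}{n} = \frac{54}{230} = 0.235, \quad \hat{q} = 1 - \hat{p} = 1 - 0.235 = 0.765, \quad 90\% \Rightarrow Z_{\alpha/2} = 1.645$$
Confidence interval:
$$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$$
$$\left(0.235 - 1.645\sqrt{\frac{0.235 \times 0.765}{230}},\ 0.235 + 1.645\sqrt{\frac{0.235 \times 0.765}{230}}\right)$$
$$(0.235 - 0.046,\ 0.235 + 0.046)$$
$$(0.189,\ 0.281) \Rightarrow 0.189 < p < 0.281$$
$$18.9\% < p < 28.1\%$$
We can be 90% confident that the true population proportion is between 18.9% and 28.1%.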
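The same calculation can be sketched in code and applied to both examples, including the hair-coloring example (30% of 100 girls, 95% confidence). The helper name `proportion_ci` is illustrative, and the sketch assumes the large-sample normal approximation described above.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

ci_votes = proportion_ci(54, 230, 0.90)   # example a: voters for candidate A
ci_hair = proportion_ci(30, 100, 0.95)    # example b: 30% of 100 teenage girls

print(f"candidate A, 90%: ({ci_votes[0]:.3f}, {ci_votes[1]:.3f})")
print(f"hair coloring, 95%: ({ci_hair[0]:.3f}, {ci_hair[1]:.3f})")
```

The code keeps p̂ unrounded, so intermediate digits can differ slightly from a hand calculation that rounds p̂ to 0.235 first, but the interval for example a agrees to three decimals.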
Exercise:
1. A survey of 1000 people who watched the Democratic/Republican debate resulted in 600 who
thought that the Democrats won the debate. Construct a 95% confidence interval for the
proportion of people who thought the Democrats won the debate.
2. A survey of 120 female freshmen shows that 18% did not wish to work after marriage. Find the
95% confidence interval of the true proportion of females who do not wish to work after marriage.
2.3 Hypothesis testing
The idea of hypothesis testing is:
Ask a question with two possible answers
Design a test, or calculation of data
Base the decision (answer) on the test
Hypothesis testing is one way of making inference about the population parameter where the
investigator has prior notion about the values of the parameter. It is a common method of drawing
inferences about a population based on statistical evidence from a sample.
Hypothesis testing: A procedure, based on sample evidence and probability theory, used to determine
whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and
should be rejected.
A hypothesis is a statement or a claim about the values of the parameter whose plausibility is to be
evaluated on the basis of the sample data.
Hypothesis: A statement about the value of a population parameter developed for the purpose of
testing.
A statistical hypothesis test is a method of making statistical decisions using experimental data.
2.3.1 Important Concepts in Hypothesis testing
Statistical hypothesis: Is an assertion, statement, or claim about the population whose plausibility is to
be evaluated on the basis of the sample data.
Test statistic: Is a statistic whose value serves to determine whether to reject or not reject the
hypothesis to be tested. There are two types of statistical hypotheses for each situation: the null
hypothesis and the alternative hypothesis.
a. Null hypothesis: Is a claim or statement about a population parameter that is usually assumed to be
true from the very beginning until it is declared false. It is a statistical hypothesis that states a
hypothesis of equality or the hypothesis of no difference between a parameter and a specific value.
It is usually denoted by H0.
b. Alternative hypothesis: Is a claim or statement about a population parameter that will be true if the
null hypothesis is false. It is a statistical hypothesis that states a hypothesis of difference between a
parameter and a specific value. It is usually denoted by H1 or HA.
Types and size of errors: There are two types of error in hypothesis testing
Type I error: Rejecting the null hypothesis when it is true. The significance level (α) can be interpreted
as the probability of rejecting the null hypothesis when it is actually true. The probability of type I error
is denoted by α. That is, P(Type I error) = α, called the level of significance.
Type II error: Failing to reject the null hypothesis when it is false (accepting the null hypothesis when it
is false). The probability of type II error is denoted by β. That is, P(Type II error) = β.
Type I error and type II error have inverse relationship and therefore, cannot be minimized at the same
time. In practice we set α at some value and design a test that minimizes β. This is because type I error
is often considered to be more serious, and therefore more important to avoid than type II error.
The following table gives a summary of possible results of any hypothesis test:
Decision              H0 is true            H0 is false
Reject H0             Type I error (α)      Correct decision
Do not reject H0      Correct decision      Type II error (β)
➢ The critical and noncritical regions and the critical value for a two-tailed test are shown in the
following figure.
Compare the computed test statistic with the critical value. If the computed value is within the rejection
region(s), we reject the null hypothesis; otherwise, we do not reject the null hypothesis.
Interpret the decision.
Based on the decision in Step 4, we state a conclusion in the context of the original problem.
A one-tailed test indicates that the null hypothesis should be rejected when the test value is in the
critical region on one side of the mean. A one-tailed test is either a right-tailed test or a left-tailed test,
depending on the direction of the inequality of the alternative hypothesis.
In a two-tailed test, the null hypothesis should be rejected when the test value is in either of the two
critical regions.
The choice of the alternative hypothesis (H1) depends on the prior information on µ.
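H1                   Reject H0 if                    Do not reject H0 if
$\mu \neq \mu_0$     $|Z_{cal}| > Z_{\alpha/2}$      $|Z_{cal}| \le Z_{\alpha/2}$
$\mu > \mu_0$        $Z_{cal} > Z_{\alpha}$          $Z_{cal} \le Z_{\alpha}$
$\mu < \mu_0$        $Z_{cal} < -Z_{\alpha}$         $Z_{cal} \ge -Z_{\alpha}$
where
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$
If the population standard deviation σ is unknown, the sample standard deviation S is used in its place and the
test statistic becomes
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$
The decision rule is the same as above.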
Case 2: When n is small and σ is unknown
We use the t-test. Test statistic:
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1}$$
After specifying α we have the following regions (critical and acceptance) on the t distribution with
n − 1 degrees of freedom, corresponding to the three alternative hypotheses.
Table: Summary of decision rules
H1                   Reject H0 if                         Do not reject H0 (Accept H0) if
$\mu \neq \mu_0$     $|t_{cal}| > t_{\alpha/2,\,n-1}$     $|t_{cal}| \le t_{\alpha/2,\,n-1}$
$\mu > \mu_0$        $t_{cal} > t_{\alpha,\,n-1}$         $t_{cal} \le t_{\alpha,\,n-1}$
$\mu < \mu_0$        $t_{cal} < -t_{\alpha,\,n-1}$        $t_{cal} \ge -t_{\alpha,\,n-1}$
For the t distribution to apply strictly we need the following two assumptions:
➢ The observations are selected at random from the population
➢ The population distribution is normal
Sometimes the second assumption may not be met, but the t test is robust to departures from the normal
distribution. That means that even when assumption 2 is not satisfied, the probabilities calculated from the t
table are still approximately correct.
Examples:
1. Convicted murderers receive a sentence of an average of 18.7 years in prison. A criminologist
wants to perform a hypothesis test to determine whether the mean sentence by one particular judge
differs from 18.7 years. A random sample of 36 cases from the court files of this judge is taken.
It is found that the sample mean is 17.2 years. Assume that the population standard deviation is 4.2
years. Test whether the mean differs from 18.7 years. Use the 0.05 significance level.
2. Wollega University uses thousands of fluorescent light bulbs each year. The brand of bulb it
currently uses has a mean life of 900 hours. A manufacturer claims that its new brand of bulbs,
which cost the same as the brand the university currently uses, has a mean life of more than 900
hours. The university has decided to purchase the new brand if, when tested, the test evidence
supports the manufacturer's claim at the 0.05 significance level. Suppose 64 bulbs were tested with
the following results: $\bar{X}$ = 920 hours, S = 80 hours. Will Wollega University purchase the new
brand of fluorescent bulbs?
3. For healthy women aged 18-24, the systolic blood pressure reading has a mean of 114.8. A random
sample of 16 women has an average systolic blood pressure of 117.23 with a standard deviation of
5.63. Test the claim that the mean systolic blood pressure is different from 114.8. Use the 0.05 significance level.
4. A job placement director claims that the average monthly starting salary for nurses is less than 1600
birr. A sample of 16 nurses has a mean monthly starting salary of 1570 birr with a sample standard
deviation of 120 birr. At α=0.05 test the claim that nurses earn less than 1600 birr a month.
5. Researchers are interested in the mean level of an enzyme in a certain population. They take a
sample of 36 individuals, determine the level of enzyme in each and compute a sample mean 22. It
is known that the variable of interest is approximately normally distributed with a standard
deviation of 10. Let’s say that they are asking the following question: Can we conclude that the
mean enzyme level in this population is different from 25?
Solution:
1. Step 1: State the null and alternative hypotheses
$$H_0: \mu = 18.7$$
$$H_1: \mu \neq 18.7$$
Step 2: $\alpha = 0.05$
Step 3: σ is known and n is large, so use the Z statistic
Step 4: Critical region: Reject $H_0$ if $|Z_{cal}| > Z_{\alpha/2} = 1.96$
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{17.2 - 18.7}{4.2/\sqrt{36}} = -2.143$$
Step 6: Decision: Since $|Z_{cal}| = 2.143 > 1.96$, reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ the criminologist can conclude that the average sentence is
different from 18.7 years.
2. Step 1: State the null and alternative hypotheses
$$H_0: \mu = 900$$
$$H_1: \mu > 900$$
Step 2: $\alpha = 0.05$
Step 3: σ is unknown but n is large, so use the Z statistic
Step 4: Critical region: Reject $H_0$ if $Z_{cal} > Z_{\alpha} = 1.645$
Step 5: Calculation of the test statistic:
$$Z_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{920 - 900}{80/\sqrt{64}} = 2$$
Step 6: Decision: Since $Z_{cal} = 2 > 1.645$, reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ there is enough evidence to indicate that the new brand of light bulbs has a
mean lifetime of more than 900 hours.
3. Step 1: State the null and alternative hypotheses
$$H_0: \mu = 114.8$$
$$H_1: \mu \neq 114.8$$
Step 2: $\alpha = 0.05$
Step 3: n is small and σ is unknown, so use the t-test
Step 4: Critical region: Reject $H_0$ if $|t_{cal}| > t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$
Step 5: Calculation of the test statistic:
$$t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{117.23 - 114.8}{5.63/\sqrt{16}} = 1.726$$
Step 6: Decision: Since $|t_{cal}| = 1.726 < t_{\alpha/2,\,n-1} = 2.131$, do not reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ there is not enough evidence to conclude that the mean systolic blood
pressure for healthy women aged 18-24 differs from 114.8.
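The seven-step procedure reduces to computing one standardized statistic and comparing it with a critical value. The sketch below applies this to examples 1 to 4 above. The helper names are illustrative; the t critical values for 15 degrees of freedom (2.131 two-sided, 1.753 one-sided at α = 0.05) are hard-coded from a t table as an assumption, since the standard library has no t quantile function.

```python
import math
from statistics import NormalDist

def standardized_stat(xbar, mu0, sd, n):
    """(xbar - mu0) / (sd / sqrt(n)); serves as Z_cal or t_cal depending on the case."""
    return (xbar - mu0) / (sd / math.sqrt(n))

def reject_h0(stat, critical, tail):
    """Decision rule for a positive critical value and tail in {'two', 'right', 'left'}."""
    if tail == "two":
        return abs(stat) > critical
    if tail == "right":
        return stat > critical
    if tail == "left":
        return stat < -critical
    raise ValueError("tail must be 'two', 'right' or 'left'")

z_two = NormalDist().inv_cdf(0.975)   # z_{0.025}, about 1.96
z_one = NormalDist().inv_cdf(0.95)    # z_{0.05}, about 1.645
T_TWO_DF15 = 2.131                    # t_{0.025, 15} from a t table
T_ONE_DF15 = 1.753                    # t_{0.05, 15} from a t table

z1 = standardized_stat(17.2, 18.7, 4.2, 36)       # example 1: sentences
z2 = standardized_stat(920, 900, 80, 64)          # example 2: light bulbs
t3 = standardized_stat(117.23, 114.8, 5.63, 16)   # example 3: blood pressure
t4 = standardized_stat(1570, 1600, 120, 16)       # example 4: nurses' salary

decisions = {
    "example 1": reject_h0(z1, z_two, "two"),
    "example 2": reject_h0(z2, z_one, "right"),
    "example 3": reject_h0(t3, T_TWO_DF15, "two"),
    "example 4": reject_h0(t4, T_ONE_DF15, "left"),
}
for name, rejected in decisions.items():
    print(name, "-> reject H0" if rejected else "-> do not reject H0")
```

For example 4 the statistic does not fall below the negative critical value, so under these assumptions the data do not support the claim that nurses earn less than 1600 birr.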
Exercises:
1. State the null and alternative hypotheses for each of the following
a) A researcher thinks that if expectant mothers use vitamin pills, the birth weight of the babies
will increase. The average of the birth weights of the population is 4.6 Kilograms.
b) An engineer claims that she can decrease the mean number of defects in a manufacturing
process of compact discs by using robots instead of human for certain tasks. The mean
number of defective disks is 18
c) A psychologist feels that if he plays soft music during a test, the result of the test will be
changed. He is not sure whether the grades will be higher or lower. In the past, the mean of the
scores was 73.
2. The scores on an aptitude test required for entry into a certain job position is normally distributed
with mean 500 and standard deviation of 120. If a random sample of 36 applicants has a mean of
546, is there evidence that their mean score is different from 500? Use α=0.05.
3. Ten years ago, the mean age of juveniles held in public custody was 16.0 years. The mean age of
250 randomly selected juveniles currently being held in public custody is 15.86 years. Assuming
σ = 1.01 years, does it appear that the mean age of all juveniles being held in public custody this
year is less than it was ten years ago? Use α = 0.10.
4. The mean life time of light bulbs produced by a company is known to be 1600 hours. The mean
life time of a sample of 16 light bulbs produced by the factory is computed to be 1570 hours
a) If the population standard deviation is 120 hours, test whether or not the mean life time is
different from 1600 hours
b) If the population standard deviation is not known and the sample standard deviation is 110
hour, is there any evidence to say that the mean life time of the light bulbs is more than 1600
hours?
5. With a standard care, cancer patients are expected to survive a mean duration of time equal to
38.3 months. A clinician claims that a new therapy will improve survival time. The new therapy
is administered to 100 cancer patients. Their average time is 46.9 months. Suppose σ is known to
be 43.3 months. Is this statistically significant evidence of improved survival time at the 0.05
level of significance?
6. A recent study shows that the average age of murder victims in a small city is 23.2 years. A
random sample of 18 recent victims had a mean of 22.6 years and a standard deviation of 2
years. At α=0.05, is the average different from 23.2 years? Assume the variable is approximately
normally distributed.
7. Oromia International Bank claims that the mean wait time for a teller during peak hours is less
than 4 minutes. A random sample of 20 wait times has a mean of 2.6 minutes with a sample
standard deviation of 2.1 minutes. At α=0.05 test the bank’s claim
2.3.3 Hypothesis testing about the population proportion: P
The procedure to make tests of hypothesis about the population proportion P for large samples is
similar in many respects to that for the population mean. The procedure includes the same seven steps.
Similarly, the test can be two-tailed or one-tailed. When the sample size is large, the sample proportion
$\hat{P}$ is approximately normally distributed with mean equal to P and standard deviation equal to
$\sqrt{\frac{P(1-P)}{n}}$. Hence, we use the normal distribution to perform a test of hypothesis about the population
proportion P for a large sample. The sample size is considered to be large when $n\hat{P}$ and $n(1-\hat{P})$ are
both greater than 5.
Suppose the assumed or hypothesized value of P (the parameter of the binomial distribution) is denoted
by $P_0$. Then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:
1. $H_0: P = P_0$ vs $H_1: P \neq P_0$
2. $H_0: P = P_0$ vs $H_1: P > P_0$
3. $H_0: P = P_0$ vs $H_1: P < P_0$
The choice of $H_1$ depends on the prior information we have on the value of P.
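Decision Rule:
Null hypothesis    Alternative hypothesis    Reject H0 if
$P = P_0$          $P \neq P_0$              $|Z_{cal}| > Z_{\alpha/2}$
$P = P_0$          $P > P_0$                 $Z_{cal} > Z_{\alpha}$
$P = P_0$          $P < P_0$                 $Z_{cal} < -Z_{\alpha}$
where
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} \sim N(0,1).$$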
Example: A manufacturing company has submitted a claim that 90% of items produced by a certain
process are non-defective. An improvement in the process is being considered that they feel will lower
the proportion of defectives below the current 10%. In an experiment 100 items are produced with the
new process and 5 are defective. Is this evidence sufficient to conclude that the method has been
improved? Use a 0.05 level of significance.
Solution: As usual, we follow the steps, with P denoting the proportion of non-defective items:
1. $H_0: P = 0.9$ (actually $P \le 0.9$) vs $H_1: P > 0.9$
2. $\alpha = 0.05$
3. Critical region: $Z_{cal} > Z_{0.05} = 1.645$
4. Computation:
$$\hat{P} = \frac{X}{n} = \frac{95}{100} = 0.95$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.95 - 0.90}{\sqrt{\frac{0.9 \times 0.1}{100}}} = 1.67$$
5. Decision: Since $Z_{cal} = 1.67 > 1.645$, reject $H_0$.
6. Conclusion: At the 0.05 level of significance we have evidence that the improvement has reduced the
proportion of defectives.
Example: The unemployment rate in a given country at a given period is believed to be 10%. The
government embarked on a series of projects to reduce unemployment. It was of interest to determine
whether unemployment decreased as a result of the projects. A random sample of 500 people was
chosen, and 48 of them were found to be unemployed. Test at the 1% level of significance whether the
government projects reduced the unemployment rate.
Solution: As usual, we follow the steps:
1. $H_0: P = 0.1$ vs $H_1: P < 0.1$
2. $\alpha = 0.01$
3. Critical region: $Z_{cal} < -Z_{\alpha} = -Z_{0.01} = -2.33$
4. Computation:
$$\hat{P} = \frac{X}{n} = \frac{48}{500} = 0.096$$
$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}} = \frac{0.096 - 0.1}{\sqrt{\frac{0.1 \times 0.9}{500}}} = -0.3$$
5. Decision: Since $Z_{cal} = -0.3 > -2.33$, do not reject $H_0$. At the 1% level of significance there is not
enough evidence that the projects reduced the unemployment rate.
Examples:
1. A registrar officer believes that the dropout rate for seniors at Wollega University is 15%. He performed
a hypothesis test to determine if the percentage is the same as or different from 15%. Last year, 38
seniors from a random sample of 200 seniors withdrew. At α = 0.05 test the officer's claim.
2. A telephone company representative estimates that more than 25% of its customers want call
waiting service. A sample of 200 customers showed that 63 had the call waiting service. At α = 0.05,
is his estimate appropriate?
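Solutions:
1. $H_0: P = 0.15$ vs $H_1: P \neq 0.15$, with $\alpha = 0.05$; reject $H_0$ if $|Z_{cal}| > Z_{\alpha/2} = 1.96$.
$$\hat{p} = \frac{38}{200} = 0.19, \quad p_0 = 0.15, \quad 1 - p_0 = 0.85$$
$$Z_{cal} = \frac{0.19 - 0.15}{\sqrt{\frac{0.15 \times 0.85}{200}}} = 1.58$$
Step 6: Decision: Since $Z_{cal} = 1.58 < 1.96$, do not reject $H_0$.
Step 7: Interpretation: At $\alpha = 0.05$ there is not enough evidence to reject the claim that the dropout rate
for seniors is 15%.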
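The proportion tests in this section can be checked with a short sketch. It computes $Z_{cal}$ for the four examples above (non-defective items, unemployment, senior dropout, call waiting) and applies the corresponding critical region. The helper name `proportion_z` is illustrative, and the tail and α for each case are read from the example statements.

```python
import math
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """Z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def z_upper(alpha):
    """Upper-tail critical value z_alpha of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha)

z_items = proportion_z(95, 100, 0.90)   # H1: P > 0.90, alpha = 0.05
z_unemp = proportion_z(48, 500, 0.10)   # H1: P < 0.10, alpha = 0.01
z_drop = proportion_z(38, 200, 0.15)    # H1: P != 0.15, alpha = 0.05
z_call = proportion_z(63, 200, 0.25)    # H1: P > 0.25, alpha = 0.05

reject_items = z_items > z_upper(0.05)
reject_unemp = z_unemp < -z_upper(0.01)
reject_drop = abs(z_drop) > z_upper(0.025)
reject_call = z_call > z_upper(0.05)

for name, z, rej in [("non-defective items", z_items, reject_items),
                     ("unemployment", z_unemp, reject_unemp),
                     ("senior dropout", z_drop, reject_drop),
                     ("call waiting", z_call, reject_call)]:
    print(f"{name:20s} Z_cal = {z:6.2f}  reject H0: {rej}")
```

For the call waiting example the statistic exceeds the one-sided critical value, so under the normal approximation the data support the representative's estimate that more than 25% of customers want the service.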
Exercises:
1) Candidate Chala is one of the two candidates running for mayor of Nekemte town. A
random poll of 672 registered voters finds that 323 will vote for candidate Chala. At α = 0.05, is it
reasonable to assume that half of the population will vote for Chala?
2) Hawi believes that 50% of the brides in Nekemte are younger than their grooms. She performs a
hypothesis test to determine if the percentage is the same as or different from 50%. Hawi samples 100
brides and 53 reply that they are younger than their grooms. At the 1% level of significance test Hawi's
claim.
2.3.4 Sample size determination
In planning a statistical investigation we should decide the number of units (Sample size) to be studied
in order to answer the study objectives. If the sample size is too small we may fail to detect important
effects, or may estimate effects too imprecisely. If the sample size is too large then we will waste
resources. Therefore it is recommended to determine the appropriate sample size for our study.
How many samples should be included in our study? The sample size depends on the maximum error
of the estimate, the population standard deviation, and the degree of confidence.
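Recall that the maximum error of the estimate, denoted here by $\varepsilon$, is
$$\varepsilon = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \Rightarrow \sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{\varepsilon} \Rightarrow n = \left(\frac{Z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2$$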
Example: The college president asks the registrar officer to estimate the average age of the students at
their college. From a previous study, the standard deviation of the ages was found to be σ = 2 years.
How large should the sample be if the officer wishes to be 99% confident that the estimate is accurate within 1 year?
Solution: Given: $Z_{\alpha/2} = 2.58$, $\sigma = 2$, maximum error $\varepsilon = 1$
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2 = \left(\frac{2.58 \times 2}{1}\right)^2 = 26.6256 \approx 27$$
Example: A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the
estimate is accurate within 2 feet. From a previous study, the standard deviation of the depths measured
was 4.38 feet. How large a sample is required?
Solution:
$$n = \left(\frac{2.58 \times 4.38}{2}\right)^2 = 31.92$$
Round the value 31.92 up to 32. Therefore, to be 99% confident that the estimate is within 2 feet of the
true mean depth, the scientist needs a sample of at least 32 measurements. (Always round up to the next
whole number.)
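The sample size rule for a mean, including the "always round up" step, can be sketched as a one-line function. The function and parameter names are illustrative. The critical value 2.58 is the rounded table value used in the two examples above.

```python
import math

def sample_size_mean(z, sigma, error):
    """Smallest whole n with z * sigma / sqrt(n) <= error, i.e. ceil((z * sigma / error)^2)."""
    return math.ceil((z * sigma / error) ** 2)

n_age = sample_size_mean(2.58, 2, 1)       # student ages, within 1 year
n_river = sample_size_mean(2.58, 4.38, 2)  # river depth, within 2 feet

print("students needed:", n_age)
print("depth measurements needed:", n_river)
```

Using `math.ceil` rather than ordinary rounding guarantees that the resulting margin of error never exceeds the requested maximum error.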
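Similarly, for proportions the sample size required is given by:
$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{\varepsilon}\right)^2$$
where $\varepsilon$ is the maximum error of the estimate.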
Example: A university administrator wishes to estimate, with 90 percent confidence, the proportion of
students enrolled in M.B.A. programs who also have undergraduate degrees in business. It was found
that in a random sample of 230 students enrolled in M.B.A. programs, 54 have undergraduate degrees in
business. What sample size is required if the researcher wishes to be accurate within 5% of the
true proportion?
Solution:
Given: 90% $\Rightarrow Z_{\alpha/2} = 1.645$, $\hat{p} = \frac{54}{230} = 0.235$, $\hat{q} = 0.765$ and maximum error $\varepsilon = 0.05$
$$n = \hat{p}\,\hat{q}\left(\frac{Z_{\alpha/2}}{\varepsilon}\right)^2 = 0.235 \times 0.765 \times \left(\frac{1.645}{0.05}\right)^2 \approx 194.59 \approx 195$$
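The sample size formula for a proportion can be sketched in the same way. The function name is illustrative. The second call uses p̂ = 0.5, a common conservative choice when no prior estimate is available, because p̂(1 − p̂) is largest there; that choice is an addition for illustration and is not part of the example above.

```python
import math

def sample_size_proportion(z, p_hat, error):
    """ceil(p_hat * (1 - p_hat) * (z / error)^2), always rounding up."""
    return math.ceil(p_hat * (1 - p_hat) * (z / error) ** 2)

n_mba = sample_size_proportion(1.645, 0.235, 0.05)         # prior estimate 54/230, rounded to 0.235
n_conservative = sample_size_proportion(1.645, 0.5, 0.05)  # no prior estimate assumed

print("with prior estimate 0.235:", n_mba)
print("conservative (p = 0.5):", n_conservative)
```

The conservative sample size is larger, which is the price paid for not relying on a pilot estimate of the proportion.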