Chapter-7-Estimation & Hypothesis Testing

estimation book

Uploaded by Tewodros Bekele
Copyright © All Rights Reserved

CHAPTER 7

ESTIMATION AND HYPOTHESIS TESTING

- Inference is the process of drawing interpretations or conclusions about the whole population from sample data.
- Only the sample data are available for making the inference.
- In statistics there are two ways through which inference can be made:
  - Statistical estimation
  - Statistical hypothesis testing

[Figure: inference diagram linking Population, Sample, Numerical data, Analyzed data and Inference]

Data analysis is the process of extracting relevant information from the summarized data.

Statistical Estimation
This is one way of making inference about a population parameter when the investigator has no prior notion about the value or characteristics of that parameter.
There are two types of estimation.
1) Point Estimation
It is a procedure that results in a single value as an estimate of a parameter.
2) Interval Estimation
It is a procedure that results in an interval of values as an estimate of a parameter, that is, an interval that contains the likely values of the parameter. It deals with identifying the lower and upper limits for the parameter. The limits themselves are random variables, since they are computed from the sample.
Definitions

Confidence Interval: An interval estimate with a specified level of confidence.
Confidence Level: The percentage of intervals, constructed in the same way from repeated samples, that would contain the true value of the parameter.
Consistent Estimator: An estimator whose values get closer to the value of the parameter as the sample size increases.
Degrees of Freedom: The number of data values that are free to vary once a statistic has been determined.
Estimator: A sample statistic used to estimate a population parameter. A good estimator should be unbiased, consistent, and relatively efficient.
Estimate: A value that an estimator assumes for a particular sample.
Interval Estimate: A range of values used to estimate a parameter.
Point Estimate: A single value used to estimate a parameter.
Relatively Efficient Estimator: Of the competing estimators for a parameter, the one with the smallest variance.
Unbiased Estimator: An estimator whose expected value is equal to the value of the parameter being estimated.
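The definitions of an unbiased estimator and of degrees of freedom can be illustrated with a small simulation. The sketch below is only an illustration: it assumes a normal population with mean 0 and σ = 2, and samples of size n = 5 (values chosen for the demo, not taken from the text). It averages two variance estimators over many samples: the sample variance S², which divides the sum of squared deviations by n − 1, and the same sum divided by n. The first average should settle near σ² = 4, the second near (n − 1)/n · σ² = 3.2.

```python
import random


def average_variance_estimates(sigma=2.0, n=5, trials=20000, seed=1):
    """Average two variance estimators over many simulated samples.

    Samples come from a normal population with mean 0 and standard
    deviation sigma (an illustrative choice for this demo).
    """
    rng = random.Random(seed)
    total_div_n_minus_1 = 0.0   # S^2: sum of squares / (n - 1)
    total_div_n = 0.0           # same sum of squares / n
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        total_div_n_minus_1 += ss / (n - 1)
        total_div_n += ss / n
    return total_div_n_minus_1 / trials, total_div_n / trials


unbiased, biased = average_variance_estimates()
print(f"sigma^2 = 4.0, average S^2 = {unbiased:.2f}, average with divisor n = {biased:.2f}")
```

The n − 1 divisor reflects the degrees of freedom. Once x̄ has been computed, only n − 1 of the deviations are free to vary.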

Point and Interval estimation of the population mean: µ


- Point Estimation
A statistic used to estimate the value of a parameter is called a point estimate. A point estimator is the formula we use to compute the point estimate. For instance, the sum of the observations divided by n, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, is the point estimator used to compute the estimate of the population mean µ. That is, $\bar{X}$ is a point estimator of µ.

- Confidence Interval Estimation of the Population Mean
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, sampling error makes it unlikely that our sample statistic will be exactly equal to the population parameter. Instead, the statistic will fall within an interval of values around it. We have to be satisfied knowing that the statistic is "close to" the parameter. That leads to the obvious question: what is "close"?

We can phrase the question differently: How confident can we be that the value of the statistic falls within a certain "distance" of the parameter? Or, what is the probability that the parameter's value is within a certain range of the statistic's value? This range is the confidence interval.

The confidence level is the probability that the interval constructed around the statistic will contain the value of the parameter.
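The repeated-sampling meaning of the confidence level can be checked by simulation. The sketch below assumes a normal population with µ = 50 and σ = 10 (hypothetical values chosen only for illustration). It draws many samples of size 25, builds the known-σ interval $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ from each sample, and reports the fraction of intervals that contain µ. At a 95% level this fraction should come out close to 0.95.

```python
import math
import random
from statistics import NormalDist


def coverage(mu=50.0, sigma=10.0, n=25, conf=0.95, trials=10000, seed=7):
    """Fraction of known-sigma z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # Z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)           # error of estimate
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


result = coverage()
print(f"fraction of 95% intervals containing mu: {result:.3f}")
```

Any single interval either contains µ or does not. The 95% describes how often the procedure succeeds over repeated samples.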

There are different cases to be considered when constructing confidence intervals.

Case 1: If the sample size is large or if the population is normal with known variance

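Recall the Central Limit Theorem, which applies to the sampling distribution of the mean of a sample. Consider samples of size n drawn, with replacement and with order important, from a population whose mean is µ and standard deviation is σ. The population can have any frequency distribution. The sampling distribution of $\bar{X}$ will have mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and it approaches a normal distribution as n gets large. This allows us to use the normal distribution curve for computing confidence intervals. The $(1-\alpha)100\%$ confidence interval for µ is

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $Z_{\alpha/2}\,\sigma/\sqrt{n}$ is the error of estimate.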

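- For the interval estimate to be good, the error $Z_{\alpha/2}\,\sigma/\sqrt{n}$ should be small. How can it be made small?
  - By making n large
  - When the variability (σ) is small
  - By taking Z small (that is, by accepting a lower confidence level)
- To obtain the value of Z, we have to attach this to a theory of chance. That is, there is an area of size $1-\alpha$ under the standard normal curve such that $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1-\alpha$.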
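The Case 1 interval can be computed directly. The sketch below uses Python's standard `statistics.NormalDist` to find $Z_{\alpha/2}$ instead of a printed table. The function name `z_interval` is a hypothetical helper introduced here for illustration.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha)100% confidence interval for mu when sigma is known
    (or n is large): xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    error = z * sigma / math.sqrt(n)          # error of estimate
    return xbar - error, xbar + error


# Example 1 of this chapter: n = 25, xbar = 32, sigma = 4.2
low, high = z_interval(32, 4.2, 25, 0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")   # 95% CI: (30.35, 33.65)
```

Because the error is proportional to $1/\sqrt{n}$, quadrupling the sample size halves the width of the interval. This is the first of the ways listed above to make the error small.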

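Case 2: If the sample size is large and the population variance is unknown
Usually σ² is not known. In that case we estimate σ² by its point estimator $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$, and the interval becomes $\bar{X} \pm Z_{\alpha/2}\frac{S}{\sqrt{n}}$.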

Here are the Z values corresponding to the most commonly used confidence levels.

Confidence level (%)   α      α/2     Z_{α/2}
90                     0.10   0.05    1.645
95                     0.05   0.025   1.96
99                     0.01   0.005   2.58
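The Z values in the table are quantiles of the standard normal distribution, so they can be reproduced rather than looked up. The sketch below uses the standard-library `statistics.NormalDist.inv_cdf` to regenerate the three rows.

```python
from statistics import NormalDist

# Z_{alpha/2}: the standard normal point with area alpha/2 to its right
table = {}
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    table[conf] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{conf:.0%}  alpha = {alpha:.2f}  alpha/2 = {alpha / 2:.3f}  Z = {table[conf]:.3f}")
```

The 99% value prints as 2.576, which the table rounds to 2.58.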

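Case 3: If the sample size is small and the population variance is not known
Assuming the population is normal, σ is replaced by S and Z by the Student t value with n − 1 degrees of freedom:

$$\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}$$

The unit of measurement of the confidence interval is the standard error. This is just the standard deviation of the sampling distribution of the statistic.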

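Examples:

1. A sample of size 25 drawn from a normal population has a mean of 32. The population standard deviation is 4.2. Find
a) A 95% confidence interval for the population mean.
b) A 99% confidence interval for the population mean.

Solution: Since σ is known, $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 4.2/\sqrt{25} = 0.84$.

a) $32 \pm 1.96 \times 0.84 = 32 \pm 1.65$, so $30.35 < \mu < 33.65$.

b) $32 \pm 2.58 \times 0.84 = 32 \pm 2.17$, so $29.83 < \mu < 34.17$.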

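2. A drug company is testing a new drug that is supposed to reduce blood pressure. Among the six people used as subjects, the average drop in blood pressure is 2.28 points, with a standard deviation of 0.95 points. What is the 95% confidence interval for the mean change in pressure?

Solution: Since n = 6 is small and σ is unknown, we assume the drops are normally distributed and use the t-distribution with n − 1 = 5 degrees of freedom, for which $t_{0.025}(5) = 2.571$:

$$2.28 \pm 2.571 \times \frac{0.95}{\sqrt{6}} = 2.28 \pm 1.00, \quad\text{so}\quad 1.28 < \mu < 3.28$$

That is, we can be 95% confident that the mean decrease in blood pressure is between 1.28 and 3.28 points.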
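Python's standard library has no Student-t quantile function. The sketch below therefore computes one from the t density, using Simpson's rule for the CDF and bisection for its inverse, and then applies the Case 3 interval to the blood-pressure figures. This is a self-written numerical approximation for illustration, not a library routine. In practice $t_{\alpha/2}(n-1)$ is read from a t-table or taken from a statistics package. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3          # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Inverse of t_cdf for p > 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, conf=0.95):
    """xbar +/- t_{alpha/2}(n-1) * s / sqrt(n) (normal population, sigma unknown)."""
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    error = t * s / math.sqrt(n)
    return xbar - error, xbar + error


# Example 2: n = 6 subjects, mean drop 2.28, s = 0.95
low, high = t_interval(2.28, 0.95, 6)
print(f"95% CI: ({low:.2f}, {high:.2f})")   # 95% CI: (1.28, 3.28)
```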

Hypothesis Testing

- This is another way of making inference about a population parameter. It is used when the investigator has a prior notion about the value of the parameter.
Definitions:
- Statistical hypothesis: an assertion or statement about the population whose plausibility is to be evaluated on the basis of the sample data.
- Test statistic: a statistic whose value serves to determine whether to reject or accept the hypothesis being tested. It is a random variable.
- Statistical test: a test or procedure used to evaluate a statistical hypothesis. Its outcome depends on the sample data.
There are two types of hypotheses:
Null hypothesis:
- It is the hypothesis to be tested.
- It is the hypothesis of equality, or the hypothesis of no difference.
- Usually denoted by H0.
Alternative hypothesis:
- It is the hypothesis accepted when the null hypothesis is rejected.
- It is the hypothesis of difference.
- Usually denoted by H1 or Ha.

Types and size of errors:

- Testing a hypothesis is based on sample data, which may involve sampling and non-sampling errors.
- The following table summarizes the possible results of any hypothesis test:

                      Decision
Truth         Don't reject H0      Reject H0
H0            Right decision       Type I error
H1            Type II error        Right decision

- Type I error: Rejecting the null hypothesis when it is true.

- Type II error: Failing to reject the null hypothesis when it is
false.
NOTE:
1. These errors are present in any two-choice decision-making problem.
2. There is always a possibility of committing one error or the other.
3. The type I error (α) and the type II error (β) have an inverse relationship for a fixed sample size and therefore cannot be minimized at the same time.

- In practice we set α at some value and design a test that minimizes β. This is because a type I error is often considered more serious, and therefore more important to avoid, than a type II error.
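The inverse relationship between α and β can be made concrete for a one-sided z-test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$. If the true mean is $\mu_1$, then $\beta = \Phi\big(Z_\alpha - \frac{\mu_1-\mu_0}{\sigma/\sqrt{n}}\big)$. The sketch below evaluates this formula for hypothetical values (µ0 = 100, µ1 = 103, σ = 10, n = 25) chosen only for illustration. The helper name `type_ii_error` is also hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu1, sigma, n):
    """beta for the one-sided z-test of H0: mu = mu0 vs H1: mu > mu0,
    evaluated at a specific true mean mu1 > mu0."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha)                # critical value Z_alpha
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))    # mean of Z when mu = mu1
    return std.cdf(z_alpha - shift)                 # P(accept H0 | mu = mu1)


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=100, mu1=103, sigma=10, n=25)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Lowering α from 0.10 to 0.01 raises β from about 0.41 to about 0.80 when the sample size is fixed. Increasing n is the usual way to reduce both.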

General steps in hypothesis testing:

1. The first step in hypothesis testing is to specify the null hypothesis (H0) and the alternative hypothesis (H1).
2. The next step is to select a significance level, α.
3. Identify the sampling distribution of the estimator.
4. The fourth step is to calculate a statistic analogous to the parameter specified by the null hypothesis.
5. Identify the critical region.
6. Make a decision.
7. Summarize the result.

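Hypothesis testing about the population mean, µ:

Suppose the assumed or hypothesized value of µ is denoted by µ0. Then one can formulate two-sided (1) and one-sided (2 and 3) hypotheses as follows:

1. $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$
2. $H_0: \mu = \mu_0$ versus $H_1: \mu < \mu_0$
3. $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$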

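CASES:

Case 1: When sampling is from a normal distribution with σ² known

- The relevant test statistic is $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$, which follows the standard normal distribution when H0 is true.

- After specifying α, we have the following regions (critical and acceptance) on the standard normal distribution corresponding to the above three hypotheses.

Summary table for decision rule.

H1           Reject H0 if            Accept H0 if            Inconclusive if
µ ≠ µ0       |Zcal| > Z_{α/2}        |Zcal| < Z_{α/2}        |Zcal| = Z_{α/2}
µ < µ0       Zcal < −Z_α             Zcal > −Z_α             Zcal = −Z_α
µ > µ0       Zcal > Z_α              Zcal < Z_α              Zcal = Z_α

Where: $Z_{cal} = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$ is the value of the test statistic computed from the sample.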
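The Case 1 decision rules can be written as a single function. The sketch below is a hypothetical helper (`z_test` and its `alternative` argument are names introduced here, not from the text). It computes Zcal and applies the matching row of the summary table. For simplicity, the boundary ("inconclusive") case is reported as accepting H0.

```python
import math
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (Zcal, decision) for H0: mu = mu0 using the Case 1 rules."""
    zcal = (xbar - mu0) / (sigma / math.sqrt(n))
    std = NormalDist()
    if alternative == "two-sided":          # H1: mu != mu0
        reject = abs(zcal) > std.inv_cdf(1 - alpha / 2)
    elif alternative == "less":             # H1: mu < mu0
        reject = zcal < -std.inv_cdf(1 - alpha)
    elif alternative == "greater":          # H1: mu > mu0
        reject = zcal > std.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return zcal, "reject H0" if reject else "accept H0"


# Light-bulb example in this chapter: xbar = 1570, mu0 = 1600, sigma = 120, n = 16
print(z_test(1570, 1600, 120, 16, alpha=0.05, alternative="less"))   # (-1.0, 'accept H0')
```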

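Case 2: When sampling is from a normal distribution with σ² unknown and a small sample size

- The relevant test statistic is $t = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$, which follows Student's t-distribution with n − 1 degrees of freedom when H0 is true.

- After specifying α, we have the following regions on the Student t-distribution corresponding to the above three hypotheses.

H1           Reject H0 if                Accept H0 if                Inconclusive if
µ ≠ µ0       |tcal| > t_{α/2}(n−1)       |tcal| < t_{α/2}(n−1)       |tcal| = t_{α/2}(n−1)
µ < µ0       tcal < −t_α(n−1)            tcal > −t_α(n−1)            tcal = −t_α(n−1)
µ > µ0       tcal > t_α(n−1)             tcal < t_α(n−1)             tcal = t_α(n−1)

Where: $t_{cal} = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$ is the value of the test statistic computed from the sample.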

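Case 3: When sampling is from a non-normally distributed population or a population whose functional form is unknown
- If the sample size is large, one can perform a test of hypothesis about the mean by using $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$, which is approximately standard normal by the Central Limit Theorem. If σ is unknown, it is replaced by S.

- The decision rule is the same as in Case 1.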

Examples:
1. Test the hypothesis that the average content of containers of a certain lubricant is 10 liters if the contents of a random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use the 0.01 level of significance and assume that the distribution of contents is normal.

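Solution:
Let µ be the mean content (in liters) of containers of the lubricant.
Step 1: Identify the appropriate hypothesis
H0: µ = 10 versus H1: µ ≠ 10
Step 2: Select the level of significance: α = 0.01
Step 3: Select an appropriate test statistic
$t = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$ with n − 1 = 9 degrees of freedom.
The t-statistic is appropriate because the population variance is not known and the sample size is small.
Step 4: Identify the critical region.
Here we have two critical regions, since the hypothesis is two-tailed: reject H0 if |tcal| > t_{0.005}(9) = 3.250.

Step 5: Computations:
$\bar{x} = 100.6/10 = 10.06$, $s = \sqrt{0.544/9} = 0.246$, so $t_{cal} = \frac{10.06-10}{0.246/\sqrt{10}} = 0.77$.

Step 6: Decision
Accept H0, since |tcal| = 0.77 < 3.250, that is, tcal is in the acceptance region.
Step 7: Conclusion
At the 1% level of significance, we have no evidence to say that the average content of containers of the given lubricant is different from 10 liters, based on the given sample data.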
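The computations in Steps 5 and 6 can be reproduced with the standard `statistics` module. In the sketch below the critical value 3.250 is the tabulated $t_{0.005}(9)$ used in Step 4, entered by hand rather than computed.

```python
import math
import statistics

contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]
mu0 = 10.0
n = len(contents)

xbar = statistics.mean(contents)       # sample mean
s = statistics.stdev(contents)         # sample SD, divisor n - 1
t_cal = (xbar - mu0) / (s / math.sqrt(n))

t_table = 3.250  # tabulated t_{0.005}(9) for a two-tailed test at alpha = 0.01
decision = "reject H0" if abs(t_cal) > t_table else "accept H0"
print(f"xbar = {xbar:.2f}, s = {s:.3f}, t = {t_cal:.2f} -> {decision}")
# xbar = 10.06, s = 0.246, t = 0.77 -> accept H0
```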
2. The mean lifetime of a sample of 16 fluorescent light bulbs produced by a company is computed to be 1570 hours. The population standard deviation is 120 hours. Suppose the hypothesized value for the population mean is 1600 hours. Can we conclude that the lifetime of the light bulbs is decreasing? (Use α = 0.05 and assume the normality of the population.)

Solution:
Let µ be the mean lifetime (in hours) of the light bulbs.
Step 1: Identify the appropriate hypothesis
H0: µ = 1600 versus H1: µ < 1600
Step 2: Select the level of significance: α = 0.05
Step 3: Select an appropriate test statistic
$Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$
The Z-statistic is appropriate because the population variance is known and the population is normal.

Step 4: Identify the critical region.
This is a left-tailed test: reject H0 if Zcal < −Z_{0.05} = −1.645.

Step 5: Computations:
$Z_{cal} = \frac{1570-1600}{120/\sqrt{16}} = \frac{-30}{30} = -1$

Step 6: Decision
Accept H0, since Zcal = −1 > −1.645, that is, Zcal is in the acceptance region.
Step 7: Conclusion
At the 5% level of significance, we have no evidence to say that the lifetime of the light bulbs is decreasing, based on the given sample data.
3. It is known from a pharmacological experiment that rats fed with a particular diet over a certain period gain an average of 40 gms in weight. A new diet was tried on a sample of 20 rats, yielding a mean weight gain of 43 gms with variance 7 gms². Test the hypothesis that the new diet is an improvement, assuming normality.
a) State the appropriate hypothesis.
b) What is the appropriate test statistic? Why?
c) Identify the critical region(s).
d) On the basis of the given information, test the hypothesis and make a conclusion.
Solution: left as an exercise.

Test of Association

- Suppose we have a population consisting of observations having two attributes, or qualitative characteristics, say A and B.
- If the attributes are independent, then the probability of possessing both A and B is PA × PB,
where PA is the probability that a member has attribute A and PB is the probability that a member has attribute B.

- Suppose A has r mutually exclusive and exhaustive classes A1, A2, ..., Ar, and B has c mutually exclusive and exhaustive classes B1, B2, ..., Bc.

- The entire set of data can be represented using a contingency table.

               B
A        B1    B2    ...   Bj    ...   Bc    Total
A1       O11   O12   ...   O1j   ...   O1c   R1
A2       O21   O22   ...   O2j   ...   O2c   R2
...
Ai       Oi1   Oi2   ...   Oij   ...   Oic   Ri
...
Ar       Or1   Or2   ...   Orj   ...   Orc   Rr
Total    C1    C2    ...   Cj    ...   Cc    n
- The chi-square test is used to test the hypothesis of independence of two attributes. For instance, we may be interested in:
  - Whether the presence or absence of hypertension is independent of smoking habit or not.
  - Whether the size of the family is independent of the level of education attained by the mothers.
  - Whether there is an association between father and son regarding boldness.
  - Whether there is an association between stability of marriage and the period of acquaintanceship prior to marriage.

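- The χ² statistic is given by:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-e_{ij})^2}{e_{ij}} \sim \chi^2((r-1)(c-1))$$

- The expected frequency $e_{ij}$ is given by: $e_{ij} = \frac{R_i C_j}{n}$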
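The expected-frequency formula follows from the independence rule stated at the start of this section. Under H0 the probability of falling in cell (i, j) is $P(A_i)P(B_j)$. Estimating each marginal probability by the corresponding sample proportion (the estimation step the procedure relies on) gives:

```latex
\hat P(A_i) = \frac{R_i}{n}, \qquad \hat P(B_j) = \frac{C_j}{n}
e_{ij} = n\,\hat P(A_i \cap B_j) = n\,\hat P(A_i)\,\hat P(B_j) = n\cdot\frac{R_i}{n}\cdot\frac{C_j}{n} = \frac{R_i C_j}{n}
```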

Remark:

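- The null and alternative hypotheses may be stated as:
H0: A and B are independent (there is no association)
H1: A and B are not independent (there is an association)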

Decision Rule:

- Reject H0 (independence) at the α level of significance if the calculated value of χ² exceeds the tabulated value with (r − 1)(c − 1) degrees of freedom, that is, reject H0 if $\chi^2_{cal} > \chi^2_{\alpha}((r-1)(c-1))$.
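The whole chi-square procedure (totals, expected frequencies, statistic, degrees of freedom, decision) fits in a short function. The sketch below is an illustrative helper with a hypothetical name. It takes the critical value from a small dictionary of standard tabulated upper 5% points rather than computing it. The usage runs it on the father and son boldness data from Example 1.

```python
def chi_square_independence(observed):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row_totals[i] * col_totals[j] / n   # e_ij = R_i * C_j / n
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2, (r - 1) * (c - 1)


# Tabulated upper 5% points of the chi-square distribution (standard table values)
CHI2_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

father_son = [[85, 59],    # father bold:     son bold, son not bold
              [65, 91]]    # father not bold: son bold, son not bold
chi2, df = chi_square_independence(father_son)
decision = "reject H0" if chi2 > CHI2_05[df] else "accept H0"
print(f"chi2 = {chi2:.2f}, df = {df}, critical = {CHI2_05[df]} -> {decision}")
# chi2 = 9.03, df = 1, critical = 3.841 -> reject H0
```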

Examples:
1. A geneticist took a random sample of 300 men to study whether there is an association between father and son regarding boldness. He obtained the following results.

                    Son
Father        Bold    Not bold
Bold           85       59
Not bold       65       91

Using α = 0.05, test whether there is an association between father and son regarding boldness.
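Solution:
H0: boldness of the son is independent of boldness of the father; H1: they are associated.

- First calculate the row and column totals: R1 = 144, R2 = 156, C1 = 150, C2 = 150, n = 300.

- Then calculate the expected frequencies (eij's): e11 = 144 × 150/300 = 72, e12 = 72, e21 = 156 × 150/300 = 78, e22 = 78.

- Obtain the calculated value of the chi-square:
$$\chi^2_{cal} = \frac{(85-72)^2}{72} + \frac{(59-72)^2}{72} + \frac{(65-78)^2}{78} + \frac{(91-78)^2}{78} = 9.03$$

- Obtain the tabulated value of chi-square: with (2 − 1)(2 − 1) = 1 degree of freedom, $\chi^2_{0.05}(1) = 3.841$.

- The decision is to reject H0, since 9.03 > 3.841.

Conclusion: At the 5% level of significance, we have evidence to say there is an association between father and son regarding boldness, based on this sample data.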
2. A random sample of 200 retired men was classified according to education and number of children, as shown below.

                          Number of children
Education level           0-1     2-3     Over 3
Elementary                14      37      32
Secondary and above       31      59      27

Test the hypothesis that the size of the family is independent of the level of education attained by fathers. (Use the 5% level of significance.)

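Solution:

- First calculate the row and column totals: R1 = 83, R2 = 117, C1 = 45, C2 = 96, C3 = 59, n = 200.

- Then calculate the expected frequencies (eij's) using eij = Ri Cj / n:
e11 = 18.675, e12 = 39.84, e13 = 24.485, e21 = 26.325, e22 = 56.16, e23 = 34.515.

- Obtain the calculated value of the chi-square:
χ²cal = 1.170 + 0.202 + 2.307 + 0.830 + 0.144 + 1.636 = 6.29

- Obtain the tabulated value of chi-square: with (2 − 1)(3 − 1) = 2 degrees of freedom, $\chi^2_{0.05}(2) = 5.991$.

- The decision is to reject H0, since 6.29 > 5.991.

Conclusion: At the 5% level of significance, we have evidence to say there is an association between the size of the family and the level of education attained by fathers, based on this sample data.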
