Chapter-7-Estimation & Hypothesis Testing
[Figure: diagram relating Population, Sample, Numerical data, Data analyzed, and Inference]
Statistical Estimation
This is one way of making inference about the population parameter
where the investigator does not have any prior notion about values or
characteristics of the population parameter.
There are two types of estimation.
1) Point Estimation
It is a procedure that results in a single value as an estimate for a
parameter.
2) Interval Estimation
It is a procedure that results in an interval of values as an estimate
for a parameter, that is, an interval that contains the likely values of the
parameter. It deals with identifying the upper and lower limits of a
parameter. The limits themselves are random variables.
Definitions
Confidence Interval: An interval estimate with a specific level of
confidence
Confidence Level: The percent of the time the true value will lie in the
interval estimate given.
Consistent Estimator: An estimator which gets closer to the value of
the parameter as the sample size increases.
Degrees of Freedom: The number of data values which are allowed
to vary once a statistic has been determined.
Estimator: A sample statistic which is used to estimate a population
parameter. A good estimator should be unbiased, consistent, and
relatively efficient.
Estimate: Any of the different possible values which an estimator can
assume.
Interval Estimate: A range of values used to estimate a parameter.
Point Estimate: A single value used to estimate a parameter.
Relatively Efficient Estimator: The estimator for a parameter with
the smallest variance.
Unbiased Estimator: An estimator whose expected value is the value
of the parameter being estimated.
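The definition of an unbiased estimator can be checked by brute force on a tiny population. The following is a minimal sketch, assuming a hypothetical population {1, 2, 3} and samples of size 2 drawn with replacement; the names `population` and `sum_sq_dev` are illustrative and do not come from these notes. It compares the variance estimator that divides by n - 1 with the one that divides by n.

```python
from itertools import product

# Hypothetical population, chosen only for illustration.
population = [1.0, 2.0, 3.0]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

n = 2  # sample size
# Every ordered sample of size n drawn with replacement (all equally likely).
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

# Expected value of each estimator = average over all possible samples.
expected_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
expected_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(f"population variance      : {sigma2:.4f}")
print(f"E[S^2] with divisor n - 1: {expected_unbiased:.4f}")
print(f"E[S^2] with divisor n    : {expected_biased:.4f}")
```

In this small case the divisor n - 1 reproduces the population variance on average, while the divisor n underestimates it.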
The confidence level is the probability that the value of the parameter
falls within the range specified by the confidence interval surrounding the
statistic.
- For the interval estimator to be good, the error should be small. How can
it be made small?
    By making n large
    By having small variability
    By taking Z small (that is, by accepting a lower confidence level)
- To obtain the value of Z, we have to attach this to a theory of chance. That is,
there is an area of size $1 - \alpha$ such that $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$.
Case 2: If the sample size is large and the variance is unknown
Usually $\sigma^2$ is not known; in that case we estimate it by its point estimator $S^2$.
Here are the Z values corresponding to the most commonly used confidence
levels.

Confidence level (%)   $\alpha$   $\alpha/2$   $Z_{\alpha/2}$
90                     0.10       0.05         1.645
95                     0.05       0.025        1.96
99                     0.01       0.005        2.58
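The tabulated Z values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice, not something defined in these notes.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_critical(confidence_percent):
    """Two-sided critical value Z_{alpha/2} for a given confidence level in percent."""
    alpha = 1 - confidence_percent / 100
    # Z_{alpha/2} leaves an area of alpha/2 in the upper tail.
    return std_normal.inv_cdf(1 - alpha / 2)

for level in (90, 95, 99):
    alpha = 1 - level / 100
    print(f"{level}%  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  "
          f"z={z_critical(level):.3f}")
```

Rounded to two decimals, the 95% and 99% values agree with the 1.96 and 2.58 listed above.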
The unit of measurement of the confidence interval is the standard error.
This is just the standard deviation of the sampling distribution
of the statistic.
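To see how the standard error drives the width of an interval for a mean, here is a minimal sketch. It assumes the usual known-variance setting, where the standard error of the sample mean is sigma / sqrt(n) and the margin of error is Z times that; the value sigma = 12 and the function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of the interval: Z_{alpha/2} times the standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * standard_error(sigma, n)

sigma = 12.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(f"n={n:4d}  SE={standard_error(sigma, n):.3f}  "
          f"margin(95%)={margin_of_error(sigma, n):.3f}")
```

Quadrupling n halves the margin of error, while raising the confidence level (a larger Z) widens it.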
Examples:
Solution:
a)
b)
Solution:
Hypothesis Testing
- This is also one way of making inference about a population
parameter, where the investigator has a prior notion about the
value of the parameter.
Definitions:
- Statistical hypothesis: is an assertion or statement about
the population whose plausibility is to be evaluated on the
basis of the sample data.
- Test statistic: a statistic whose value serves to
determine whether to reject or accept the hypothesis to be
tested. It is a random variable.
- Statistical test: a test or procedure used to evaluate a
statistical hypothesis; its value depends on the sample data.
There are two types of hypothesis:
Null hypothesis:
- It is the hypothesis to be tested.
- It is the hypothesis of equality or the hypothesis of no
difference.
- Usually denoted by H0.
Alternative hypothesis:
- It is the hypothesis available when the null hypothesis has to
be rejected.
- It is the hypothesis of difference.
- Usually denoted by H1 or Ha.
- Type I error: Rejecting the null hypothesis when it is
true.
- Type II error: Failing to reject the null hypothesis when it is
false.
NOTE:
1. These errors are present in any two-choice
decision-making problem.
2. There is always a possibility of committing one or the other
error.
3. Type I error ($\alpha$) and Type II error ($\beta$) have an inverse relationship
and therefore cannot be minimized at the same time (for a fixed sample size).
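The inverse relationship between the two error probabilities can be illustrated numerically. The sketch below assumes a lower-tailed Z test of H0: mu = mu0 against H1: mu < mu0 with known sigma; the function name `type_ii_error` and the true mean of 1550 are hypothetical values picked for the illustration, while mu0 = 1600, sigma = 120, and n = 16 are borrowed from the light bulb example later in this chapter.

```python
import math
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu1, sigma, n):
    """Probability of failing to reject H0: mu = mu0 (vs H1: mu < mu0)
    when the true mean is actually mu1 < mu0."""
    se = sigma / math.sqrt(n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    cutoff = mu0 - z_alpha * se  # H0 is rejected when the sample mean < cutoff
    # beta = P(sample mean >= cutoff) when the sample mean ~ N(mu1, se)
    return 1 - NormalDist(mu1, se).cdf(cutoff)

mu0, mu1, sigma, n = 1600.0, 1550.0, 120.0, 16  # mu1 is a hypothetical true mean
betas = {a: type_ii_error(a, mu0, mu1, sigma, n) for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"alpha={a:.2f}  beta={b:.3f}")
```

With the sample size held fixed, each reduction in alpha produces a larger beta.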
6. Making decision.
1.
2.
3.
CASES:
Where:
H0 Reject H0 if Accept H0 if Inconclusive if
Where:
Examples:
1. Test the hypothesis that the average content of
containers of a certain lubricant is 10 liters if the contents of a
random sample of 10 containers are 10.2, 9.7, 10.1, 10.3, 10.1,
9.8, 9.9, 10.4, 10.3, and 9.8 liters. Use the 0.01 level of
significance and assume that the distribution of contents is
normal.
Solution:
Let $\mu$ = population mean content, $\mu_0 = 10$
Step 1: Identify the appropriate hypothesis
The t-statistic is appropriate because the population variance is
not known and the sample size is small.
Step 4: Identify the critical region.
Here we have two critical regions since we have a two-tailed
hypothesis.
Step 5: Computations:
Step 6: Decision
Accept H0, since tcal is in the acceptance region.
Step 7: Conclusion
At the 1% level of significance, we have no evidence to say that the
average content of containers of the given lubricant is
different from 10 liters, based on the given sample data.
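The computations for this example can be sketched as follows, using the ten observations given in the problem. The critical value 3.250 for a two-tailed test with 9 degrees of freedom at the 0.01 level is taken from a standard t table rather than computed, since the Python standard library has no t distribution; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

contents = [10.2, 9.7, 10.1, 10.3, 10.1, 9.8, 9.9, 10.4, 10.3, 9.8]
mu0 = 10.0  # hypothesized mean content in liters

n = len(contents)
x_bar = mean(contents)
s = stdev(contents)  # sample standard deviation (divisor n - 1)
t_cal = (x_bar - mu0) / (s / math.sqrt(n))

# Table value t_{0.005}(9) for a two-tailed test at the 0.01 level (not computed here).
t_crit = 3.250
reject_h0 = abs(t_cal) > t_crit

print(f"x_bar = {x_bar:.2f}, s = {s:.4f}, t_cal = {t_cal:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The calculated t is about 0.77, well inside the acceptance region between -3.250 and 3.250, which matches the decision stated above.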
2. The mean life time of a sample of 16 fluorescent light bulbs
produced by a company is computed to be 1570 hours. The
population standard deviation is 120 hours. Suppose the
hypothesized value for the population mean is 1600 hours. Can we
conclude that the life time of light bulbs is decreasing?
(Use $\alpha = 0.05$ and assume the normality of the population)
Solution:
Let $\mu$ = population mean lifetime, $\mu_0 = 1600$
Step 1: Identify the appropriate hypothesis
Step 5: Computations:
Step 6: Decision
Accept H0, since Zcal is in the acceptance region.
Step 7: Conclusion
At the 5% level of significance, we have no evidence to say that
the lifetime of light bulbs is decreasing, based on the given
sample data.
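The Z computation for this example can be sketched as follows. The test is taken to be lower-tailed (H1: mu < 1600) because the question asks whether the lifetime is decreasing, and the 5% level is the one quoted in the conclusion; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar = 1570.0  # sample mean lifetime in hours
mu0 = 1600.0    # hypothesized population mean
sigma = 120.0   # known population standard deviation
n = 16
alpha = 0.05

z_cal = (x_bar - mu0) / (sigma / math.sqrt(n))
z_crit = -NormalDist().inv_cdf(1 - alpha)  # lower-tail critical value, about -1.645
reject_h0 = z_cal < z_crit

print(f"z_cal = {z_cal:.3f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Since the calculated Z of -1 is greater than the critical value, it does not fall in the rejection region.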
3. It is known in a pharmacological experiment that rats fed
with a particular diet over a certain period gain an average of 40
gms in weight. A new diet was tried on a sample of 20 rats,
yielding an average weight gain of 43 gms with variance 7 gms$^2$. Test the
hypothesis that the new diet is an improvement, assuming
normality.
a) State the appropriate hypothesis
b) What is the appropriate test statistic? Why?
c) Identify the critical region(s)
d) On the basis of the given information test the
hypothesis and make conclusion.
Solution (exercise).
Test of Association
                          B
A        B1    B2    . .   Bj    .    Bc     Total
A1       O11   O12         O1j        O1c    R1
A2       O21   O22         O2j        O2c    R2
.
.
Ai       Oi1   Oi2         Oij        Oic    Ri
.
.
Ar       Or1   Or2         Orj        Orc    Rr
Total    C1    C2          Cj         Cc     n
- The chi-square test is used to test the hypothesis of
independence of two attributes. For instance, we may be
interested in:
    Whether the presence or absence of hypertension
    is independent of smoking habit.
    Whether the size of the family is independent of
    the level of education attained by the mothers.
    Whether there is association between father and
    son regarding boldness.
    Whether there is association between stability of
    marriage and period of acquaintanceship prior to
    marriage.
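- The $\chi^2$ test statistic is given by:
$\chi^2_{cal} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - e_{ij})^2}{e_{ij}}$, where $e_{ij} = \frac{R_i C_j}{n}$ is the expected frequency of cell $(i, j)$ and the statistic has $(r-1)(c-1)$ degrees of freedom.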
Remark:
- The null and alternative hypotheses may be stated as:
$H_0$: There is no association between A and B.
$H_1$: There is an association between A and B.
Decision Rule:
Examples:
1. A geneticist took a random sample of 300 men to study
whether there is association between father and son regarding
boldness. He obtained the following results.
                Son
Father      Bold    Not
Bold         85      59
Not          65      91
Using the chi-square test, determine whether there is association between father
and son regarding boldness.
Solution:
- Obtain the calculated value of the chi-square.
Solution:
- Then calculate the expected frequencies ($e_{ij}$'s).
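For the father and son table above, the expected frequencies and the calculated chi-square can be sketched as follows. Each expected frequency is taken as (row total x column total) / n, the usual rule under independence. The significance level did not survive in the problem statement, so alpha = 0.05 is assumed here, together with the table value 3.841 for 1 degree of freedom; the variable names are illustrative.

```python
# Observed frequencies: rows = father (Bold, Not), columns = son (Bold, Not).
observed = [[85, 59],
            [65, 91]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_cal = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Table value for alpha = 0.05 and df = 1; alpha is an assumption, not given in the text.
chi2_crit = 3.841
reject_h0 = chi2_cal > chi2_crit

print("expected frequencies:", expected)
print(f"chi-square = {chi2_cal:.4f}, df = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under the assumed 5% level the calculated value exceeds the table value, so the hypothesis of no association would be rejected; a different alpha would need its own table value.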