
Aksum University, CHS & CSH

School of Public Health


Department of Epidemiology and Biostatistics

Estimation & Hypothesis Testing

Getachew M. (MPH, Asst. Professor)


1
Contents

 Principles of estimation/inference
Sampling Distribution
Point and interval estimates
Hypothesis testing
Concept of P-value

2
Session Objectives
After completion of this session, students will be able to:
 Explain the principles of sampling distributions of
means and proportions and calculate their standard
errors.
 State the principles of estimation and differentiate
between point and interval estimations.
 Compute appropriate confidence intervals for
population means and proportions and interpret the
findings.
 Become familiar with estimation and hypothesis formulation.
 Test hypotheses for significance using sample
means and proportions.
3
Inferential statistics
Inferential statistics: the process of generalizing or
drawing conclusions about the target population on
the basis of results obtained from a sample.

Parameter: the true quantity of the population
(which is rarely known for certain).

Statistic: a quantity calculated from a sample; a
descriptive measure that describes a particular
feature of the sample. E.g. if we want to
know the mean age of all Egyptian bladder cancer
cases, we take a sample of bladder cancer cases and
calculate the mean age.
4
5
How is information from the sample linked to the
population?
Via the sampling distribution.
Sampling distribution:
This is not the distribution of the sample.

It is the distribution of a statistic.

The frequency distribution of the statistic taken
over all possible samples.
Serves to answer probability questions about sample
statistics.
Procedure:
If we take many, many samples and compute the statistic for
each of those samples, the distribution of all those
statistics is the sampling distribution.
6
A sampling distribution of sample means is a
distribution obtained by using the means computed
from random samples of a specific size taken from a
population
Properties of Sampling Distribution
Three things describe a sampling distribution:
Its mean
Its variance
Its shape
The mean of the sample means will be the same as
the population mean.
The standard deviation of the sample means will be
smaller than the standard deviation of the
population, and
it will be equal to the population standard deviation
divided by the square root of the sample size.
7
Now consider all possible samples of size n = 2, drawn
with replacement from the population {18, 20, 22, 24}
(N = 4).

All 16 possible samples (1st observation down, 2nd observation across):

          18       20       22       24
  18    18,18    18,20    18,22    18,24
  20    20,18    20,20    20,22    20,24
  22    22,18    22,20    22,22    22,24
  24    24,18    24,20    24,22    24,24

Corresponding sample means:

          18       20       22       24
  18      18       19       20       21
  20      19       20       21       22
  22      20       21       22       23
  24      21       22       23       24

• 16 possible samples
(with replacement)

8
Sample mean (x̄)   Freq   P(x̄)
18 1 0.0625
19 2 0.1250
20 3 0.1875
21 4 0.2500
22 3 0.1875
23 2 0.1250
24 1 0.0625
Comparing the population with its sampling distribution

Population                        Sampling distribution of the mean
N = 4                             n = 2
μ = 21, σ = 2.236                 μx̄ = 21, σx̄ = 1.58

[Bar charts, not reproduced here: the population places probability 0.25 on each of
x = 18, 20, 22, 24, while the distribution of the sample mean x̄ runs from 18 to 24
and is peaked at x̄ = 21.]

10
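The following short Python sketch (added here for illustration, not part of the original slides) enumerates the same 16 samples and recovers the values quoted above: μ = 21, σ ≈ 2.236, μx̄ = 21 and σx̄ ≈ 1.58 = σ/√n.

```python
# Sketch (not from the slides): enumerate all samples of size n = 2 drawn
# with replacement from the population {18, 20, 22, 24} and compare the
# population parameters with the sampling distribution of the mean.
from itertools import product
from collections import Counter
import math

population = [18, 20, 22, 24]
N = len(population)

mu = sum(population) / N
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)   # population SD

samples = list(product(population, repeat=2))                   # 16 samples
sample_means = [sum(s) / 2 for s in samples]

mu_xbar = sum(sample_means) / len(sample_means)
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / len(sample_means))

print(f"mu = {mu}, sigma = {sigma:.3f}")                 # 21, 2.236
print(f"mu_xbar = {mu_xbar}, SE = {sigma_xbar:.2f}")     # 21, 1.58
print("sigma/sqrt(n) =", round(sigma / math.sqrt(2), 2)) # 1.58

# Frequency table of the sample means (matches the table above)
for m, freq in sorted(Counter(sample_means).items()):
    print(m, freq, freq / 16)
```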
Properties of sampling distribution of the mean
 If a population is normal with mean μ and standard
deviation σ, the sampling distribution of x̄ is also
normally distributed, with:
 The mean of the distribution of the sample mean equal
to the mean of the population from which the
samples were drawn.
 The variance of the distribution of the sample mean equal
to the variance of the population divided by the
sample size (so the standard deviation of x̄ is σ/√n).

μx̄ = μ   and   σx̄ = σ/√n
11
Sampling Distribution of the proportion
The sample proportion is derived from counts or
frequency data.
It is easier and more reliable to work with, since it does
not depend on a separately estimated variance.
Suppose we choose a random sample of size n;
the sampling distribution of the sample
proportion p possesses the following properties.
The sample proportion p will be an estimate of
the population proportion P.

12
Sampling Distribution of the proportion……….

The standard deviation of p is:

σp = √(p(1 − p)/n)

(called the standard error of the proportion).

Provided n is large enough, the shape of the sampling
distribution of p is normal.

13
• The mean of the distribution, μp, will be equal to the
true population proportion, p, and the variance of the
distribution, σ2 will be equal to p(q)/n.

μp  p σp 
p(1 p)
n
np  5
n(1  p)  5

14
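A small simulation sketch (an illustrative addition; the values p = 0.3 and n = 100 are arbitrary) that checks these formulas empirically:

```python
# Sketch: simulate the sampling distribution of a sample proportion p-hat
# for p = 0.3 and n = 100, and compare the simulated SD with sqrt(p(1-p)/n).
import math
import random

random.seed(1)
p, n, reps = 0.3, 100, 20_000

p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_phat = sum(p_hats) / reps
sd_phat = math.sqrt(sum((ph - mean_phat) ** 2 for ph in p_hats) / reps)

print(f"mean of p-hat : {mean_phat:.4f}  (theory: {p})")
print(f"SD of p-hat   : {sd_phat:.4f}  (theory: {math.sqrt(p*(1-p)/n):.4f})")
print("np =", n * p, " n(1-p) =", n * (1 - p), " -> normal approximation OK")
```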
Central Limit Theorem (CLT)
Regardless of the shape of the frequency distribution
of a characteristic in the parent population;

As the sample size n increases, the shape of the


distribution of the sample means taken from a
population with mean μ and standard deviation of σ
will approach a normal distribution.

And, this distribution will have a mean μ and standard


deviation σ/√n (called the standard error of the
mean).
15
Central Limit Theorem (CLT)…..

The central limit theorem states that the sampling


distribution of any statistic will be normal or nearly
normal, if the sample size is large enough.

How large is "large enough"?

As a rough rule of thumb, many researchers say that a


sample size of 30 is large enough for continuous data
16
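The sketch below (an illustration added here, using an exponential population chosen only because it is strongly skewed) shows the CLT numerically: the sample means centre on μ with a spread close to σ/√n as n grows.

```python
# Sketch: the CLT in action. Draw sample means from a highly skewed
# (exponential) population and watch their distribution tighten around mu
# with SD close to sigma/sqrt(n) as n grows.
import math
import random

random.seed(2)
mu = sigma = 1.0          # for Exp(1), the mean and SD are both 1
reps = 10_000

for n in (2, 10, 30, 100):
    means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    avg = sum(means) / reps
    sd = math.sqrt(sum((m - avg) ** 2 for m in means) / reps)
    print(f"n = {n:3d}: mean of x-bar = {avg:.3f}, SD = {sd:.3f}, "
          f"sigma/sqrt(n) = {sigma / math.sqrt(n):.3f}")
```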
Standard deviation and Standard error
Standard deviation is a measure of variability
between individual observations (descriptive index
relevant to mean)

Standard error refers to the variability of summary


statistics (e.g. the variability of the sample mean or a
sample proportion)

Standard error is a measure of uncertainty in a sample
statistic, i.e. the precision of the estimate.
17
Parameter Estimations
In parameter estimation, we generally assume that the
underlying (unknown) distribution of the variable of
interest is adequately described by one or more
(unknown) parameters, referred as population
parameters.
As it is usually not possible to make measurements on
every individual in the population, parameters cannot
usually be determined exactly.

Instead, we estimate parameters by calculating the
corresponding characteristics from a random sample;
these are called estimates.

18
Parameter Estimations…….
Estimation is a procedure in which we use the
information contained in a sample to draw inferences
about the true parameter of interest.
An estimator is a statistic that is used to estimate an
unknown population parameter.
Estimator

19
Estimations……
Properties of a good estimator:
Unbiased: on average, the sample statistic equals
the population parameter (although a single estimate is unlikely
to equal the parameter exactly unless the whole source population is included).

Minimum variance: an estimator which has a
minimum standard error (variance) is a good estimator.
Example:
In a skewed distribution the median has a smaller
standard error than the mean. Thus, the median is a better
estimator than the mean in such cases.

20
Estimation
There are two types of estimation:
Point estimation:
is a specific numerical value estimate of parameter.

It uses the information in the sample to arrive at a single


number (that is called an estimate) that is intended to be
close to the true value of the parameter.
Interval estimation:
An interval estimate of a parameter is a range of values
used to estimate the parameter.

It uses the information in the sample to arrive at an
interval (i.e. two endpoints) that is intended to
enclose the true value of the parameter.

21
Point Estimation
Estimator is a rule (formula) that tells us how to
calculate the value of an estimate based on the
measurements contained in a sample.

Suppose we want to estimate the true (population)


mean, μ, of a random variable X , from a sample Xi, i =
1, 2, ...n.
One possible point estimator is the sample mean,
x̄ = (ΣXi)/n.

A point estimate does not convey the precision of the
estimate, and hence we need another method of
estimation (interval estimation) that addresses this problem.
22
Point Estimation……….
 Example:
The mean survival time of 91 laboratory rats after
removal of the thyroid gland was 82 days with a
standard deviation of 10 days (assume the rats were
randomly selected).

In the above example, the point estimates for the
population parameters μ and σ (with regard to the
survival time of all laboratory rats after removal of
the thyroid gland) are 82 days and 10 days,
respectively.

23
Confidence Intervals
Also called Confidence Limits (Lower and upper)

 It is a statement that a population
parameter has a value lying between two
specified limits, with a certain confidence level
(usually 95%).

It consists of three parts:


A confidence level,
The statistic and
Margin of error.
Confidence Intervals Estimation:
That is, if we take 100 samples and construct a 95% CI from
each, we expect the true value of the parameter to lie within the
calculated intervals in about 95 of them (and not to lie within the
intervals in about 5 of them).

25
Confidence Intervals Estimation...........
Confidence interval takes into account the sample to sample
variation of the statistic and gives the measure of precision.
The general formula used to calculate a Confidence interval is:
CI = Estimate ± K × Standard Error, where K is called the
reliability coefficient.
Most commonly the 95% confidence intervals are calculated,
however 90% and 99% confidence intervals are sometimes
used.

The term K × Standard Error (for example, zα/2 · σ/√n) is called the maximum error of the estimate.


The maximum error of the estimate is the maximum difference
between the point estimate of a parameter and the actual value
of the parameter.
26
Confidence Intervals Estimation……
Therefore, the confidence interval is: Estimate ± zα/2 × Standard Error (for a mean, x̄ ± zα/2 · σ/√n).

27
Confidence Intervals Estimation……
90% CI is narrower than 95% CI since we are only
90% certain that the interval includes the population
parameter.

On the other hand, a 99% CI will be wider than a 95% CI;
the extra width means that we can be more certain
that the interval will contain the population parameter.

But to obtain a higher confidence from the same


sample, we must be willing to accept a larger margin
of error (a wider interval).
28
Confidence Intervals Estimation:

For a given confidence level (i.e. 90%, 95%, 99%)


the width of the confidence interval depends on the
standard error of the estimate which in turn depends
on the:
1. Sample size:-The larger the sample size, the
narrower the confidence interval (this is to mean the
sample statistic will approach the population
parameter) and the more precise our estimate. Lack
of precision means that in repeated sampling the
values of the sample statistic are spread out or
scattered. The result of sampling is not repeatable.

29
Confidence Intervals Estimation:
To increase precision (of an SRS), use a larger
sample.

You can make the precision as high as you want by


taking a large enough sample. The margin of error
decreases as √n increases.

2. Standard deviation: the more the variation
among the individual values,
the wider the confidence interval and the less
precise the estimate. (For a given standard deviation, as sample
size increases the standard error decreases.)
30
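A tiny numerical illustration of this point (added here; σ = 10 is an arbitrary value): the 95% margin of error E = 1.96·σ/√n halves each time the sample size is quadrupled.

```python
# Sketch: the 95% margin of error E = 1.96 * sigma / sqrt(n) shrinks by half
# each time the sample size is quadrupled (precision grows with sqrt(n)).
import math

sigma = 10          # assumed population SD, for illustration only
for n in (25, 100, 400, 1600):
    E = 1.96 * sigma / math.sqrt(n)
    print(f"n = {n:4d}: margin of error = {E:.2f}")
```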
Estimation of
1. Confidence Interval for Single Mean
2. Confidence Interval for Difference of Means
3. Confidence Interval for Single Proportion
4.Confidence Interval for Difference of Proportions
5. Significant Test and P-value (Z-test, t-test, X2, F-
test)

31
Two-Sided Confidence Intervals
Two-Sided Confidence Intervals for mean...
The statement P(−1.96 ≤ Z ≤ +1.96) = 0.95 is merely a shorthand
algebraic statement that 95% of the SND curve lies between +1.96 and −1.96. If
one chooses the sampling distribution of means (a
normal curve with mean μ and standard deviation
σ/√n), then substituting (x̄ − μ)/(σ/√n) for Z, it follows that
P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ +1.96) = 0.95.
With a little manipulation:
P(x̄ − 1.96·σ/√n ≤ μ ≤ x̄ + 1.96·σ/√n) = 0.95

32
Two-Sided Confidence Intervals
Estimating standard error

 As we have discussed earlier, the sampling
distribution is only a theoretical distribution, as
in practice we take only one sample and not
repeated samples.

 Hence the S.E. is often not known, but it can be
estimated from a single sample (for a mean, by s/√n,
where s is the sample standard deviation).

33
Two-Sided Confidence Intervals

Precision of statistical estimation

34
Two-Sided Confidence Intervals

 The (1 − α)100% confidence interval (C.I.) for
μ:
 We want to find two values L and U between
which μ lies with high probability, i.e.
P(L ≤ μ ≤ U) = 1 − α

Confidence level guide to Z-values

Confidence level      Zα/2 (2-tail)   Zα (1-tail)
80%   (α = 20%)       1.28            0.84
90%   (α = 10%)       1.645           1.28
95%   (α = 5%)        1.96            1.645
99%   (α = 1%)        2.575           2.325
c     (α = 1.0 − c)   Z(c/2)          Z(c − 0.5)

35
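For reference, these critical values can be reproduced with a normal-quantile function; the sketch below uses scipy only as one convenient option (the results agree with the table above up to rounding).

```python
# Sketch: reproducing the z-value guide above with scipy's normal quantiles.
from scipy.stats import norm

for conf in (0.80, 0.90, 0.95, 0.99):
    alpha = 1 - conf
    z_two = norm.ppf(1 - alpha / 2)   # two-tailed critical value, z_{alpha/2}
    z_one = norm.ppf(1 - alpha)       # one-tailed critical value, z_alpha
    print(f"{conf:.0%}: z(two-tail) = {z_two:.3f}, z(one-tail) = {z_one:.3f}")
```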
Confidence interval for a single mean (for continuous variable )

Interval estimation for mean

A (1 − α)100% confidence interval for the
unknown population mean μ (when the population
standard deviation σ is known) is:

x̄ ± zα/2 · σ/√n

36
Interval estimation for a single mean

Formula for the confidence interval for the
mean when the variance is unknown and n < 30:

x̄ ± t(α/2, n−1) · s/√n

where n − 1 = degrees of freedom for Student's t
distribution and s = sample standard deviation.

37
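A minimal sketch of this t-based interval (the summary values x̄ = 82, s = 10 and n = 25 are made up for illustration; they are not from the slides):

```python
# Sketch: 95% CI for a mean when sigma is unknown and n < 30, using the
# t-distribution with n - 1 degrees of freedom (values below are hypothetical).
import math
from scipy.stats import t

xbar, s, n = 82.0, 10.0, 25      # hypothetical sample summary
conf = 0.95
t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # about 2.064
E = t_crit * s / math.sqrt(n)                  # margin of error
print(f"{conf:.0%} CI: ({xbar - E:.1f}, {xbar + E:.1f})")   # roughly (77.9, 86.1)
```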
Interval estimation for difference of mean

38
Interval estimation for difference of mean

39
Interval estimation for single proportion

40
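The formula on this slide was not captured in the extraction; the sketch below assumes the usual large-sample (Wald) interval p̂ ± zα/2·√(p̂(1 − p̂)/n), with invented counts (30 successes out of 200) purely for illustration.

```python
# Sketch: the usual large-sample (Wald) 95% CI for a single proportion,
# p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n). Numbers are illustrative only.
import math

x, n = 30, 200                  # hypothetical: 30 successes out of 200
p_hat = x / n                   # 0.15
z = 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"95% CI: ({p_hat - z * se:.3f}, {p_hat + z * se:.3f})")   # about (0.101, 0.199)
```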
Confidence interval for the difference of two
population proportion

41
Interval for Difference of Means

Remark
You can construct a 100(1-α)% confidence
interval for a paired experiment using:
Two Dependent Means
The test to be used in this section is the paired
t test

42
Confidence Intervals
Example: A random sample of 100 drug-treated
patients has a mean survival time of 46.9 months. If
the SD of the population is 43.3 months, find a 95%
confidence interval for the population mean.
Solution:
x̄ ± zα/2 · σ/√n
46.9 ± (1.96)(43.3/√100) = 46.9 ± 8.5
= (38.4 to 55.4 months). Hence, there is 95%
certainty that the limits (38.4 , 55.4) hold the mean
survival times in the population from which the
sample arose.

43
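The same calculation, reproduced as a short sketch (added here for illustration):

```python
# Sketch: reproducing the example above (x-bar = 46.9, sigma = 43.3, n = 100).
import math

xbar, sigma, n, z = 46.9, 43.3, 100, 1.96
E = z * sigma / math.sqrt(n)                  # maximum error of the estimate
print(f"{xbar} +/- {E:.1f}  ->  ({xbar - E:.1f}, {xbar + E:.1f}) months")
# 46.9 +/- 8.5 -> (38.4, 55.4) months
```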
Example 1

44
45
Example 2

46
Example 4

47
Exercise

48
Hypothesis testing

49
Hypothesis testing………
The formal process of hypothesis testing provides us
with a means of answering research questions.

Hypothesis is a testable statement that describes the


nature of the proposed relationship between two or more
variables of interest.

The purpose of the study is to collect data which will


allow the researcher to test the hypothesis.

50
Hypothesis testing………
This statement (assumption) may or may not
be true.
Statistical tests can show, with a certain
degree of confidence, whether a relationship
exists.
The best way to determine whether a statistical
hypothesis is true would be to examine the entire
population.
Since that is often impractical, researchers
typically examine a random sample from the
population.
51
Hypothesis testing

In hypothesis testing,
the researcher must define the population under
study,
state the particular hypothesis that will be
investigated,
give the significance level,
select a sample from the population,
Collect the data,
perform the calculations required for the statistical
test, and reach a conclusion.

52
Example of hypothesis
1) The mean height of Aksum College of Health
Sciences students is 1.63 m.

2) There is no difference between the distributions of Pf
and Pv malaria in Ethiopia (i.e. they are distributed in
equal proportions).

53
Hypothesis testing……

54
Hypothesis testing…..

55
Hypothesis testing…..

56
Hypothesis testing…..

57
Hypothesis testing…..

Level of significance
A method for decision making must be agreed upon.
If H0 is rejected, then HA is accepted.
How is a “significant” difference defined?
A null hypothesis is either true or false, and it is
either rejected or not rejected.
No error is made if it is true and we fail to reject
it, or if it is false and rejected.
An error is made, however, if it is true but
rejected, or if it is false and we fail to reject it.

58
Hypothesis testing cont’d …..

A random sample of size n is taken and the


information from the sample is used to reject or
accept (fail to reject) the null hypothesis.

It is not always possible to make a correct


decision since we are dealing with random
samples.

Therefore, we must learn to live with probabilities of


type I (α) and type II (β) errors.
59
Hypothesis testing cont’d …..

60
Hypothesis testing cont’d …..
Type I Error: A type I error occurs when one rejects
the null hypothesis while it is true.

The probability of committing a type I error is the level


of significance of the test of hypothesis, and is
denoted by α.
Type II Error: A type II error occurs when one
fails to reject the null hypothesis when the
alternative hypothesis is true.
The probability of committing a type II error is denoted
by β.
61
Hypothesis testing cont’d …..
When we test the hypotheses, we can never be 100% certain
of our conclusions.
We can only be confident to a certain level – hopefully a high
one.
The level of significance (α) is the maximum probability of
committing a type I error.
In practice, the level of significance ( α ) is chosen arbitrarily
and the limits for accepting Ho are determined.
 Types of test to be set up:
The form of HA will determine the kind of test to be set up
(either one tailed or two tailed tests ).
Consider the situation when HA includes the symbol "≠",
that is, HA: μ ≠ …, P ≠ …, μ1 − μ2 ≠ …, P1 − P2 ≠ …, etc. (two-
tailed test).
62
Hypothesis testing cont’d …..

63
What Do We Test
Effect or Difference we are interested in

Mean and proportion in the population

Difference in Means or Proportions

Odds Ratio (OR)

Relative Risk (RR)

Correlation Coefficient

Estimated Parameters

64
Hypothesis testing for population mean
Test procedure for two tailed test
1.State the null hypothesis: H0: μ =μ0

2.State the alternative hypothesis:H1:μ≠μ0

3. Fix the level of significance (α) and compute the test
statistic under the null hypothesis (assuming it
is true), for example as z = (x̄ − μ0)/(σ/√n).

Note that this is not the only test statistic.
Depending on the type of data and the sample size, we may
need to compute a z-score, a t-score or a chi-square statistic.

For large samples (n ≥ 30), the test statistic has a
standard normal distribution, z ~ N(0, 1).
65
Hypothesis testing for population mean

For small samples (n < 30), if the true variance (σ²) is unknown,
the test statistic t = (x̄ − μ0)/(s/√n) follows a Student's t-distribution
with n − 1 degrees of freedom.
If the interest is to check the presence or absence of an association,
the chi-square distribution is used.
66
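Where association is the question, a chi-square test on a contingency table is the usual tool; the sketch below uses scipy's chi2_contingency on an invented 2 × 2 table (both the counts and the choice of scipy are assumptions for illustration, not taken from the slides).

```python
# Sketch: chi-square test of association for a hypothetical 2 x 2 table
# (exposure vs. disease). The counts are invented for illustration.
from scipy.stats import chi2_contingency

table = [[30, 70],    # exposed:   30 diseased, 70 not
         [15, 85]]    # unexposed: 15 diseased, 85 not

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
```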
Example
1. The average age of Aksum university staffs is 30
years with variance of 20 years. To check this
assumption a graduating MPH student wants to
proof whether the assumption made about average
age is true or not. He took a random sample of 10
staffs and found the average age (mean) of 27 years.
Test that the average age of the staffs is 30 years.

• Solution: Test procedure:


1. Null hypothesis: Ho: μ = 30
2. Alternative hypothesis: HA: μ ≠ 30
67
68
5. Decision rule:
• Since |−2.12| is greater than 1.96, reject
the null hypothesis.

Interpretation:
Thus, the average age of Aksum University staff is not 30
years (at the 5% level of significance).

69
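The example can be reproduced with a few lines (an illustrative sketch, not part of the slides):

```python
# Sketch: reproducing the example above (mu0 = 30, sigma^2 = 20, n = 10, x-bar = 27).
import math
from scipy.stats import norm

mu0, var, n, xbar, alpha = 30, 20, 10, 27, 0.05
z = (xbar - mu0) / math.sqrt(var / n)            # about -2.12
p_two_sided = 2 * norm.sf(abs(z))                # about 0.034
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}")
print("Reject H0" if abs(z) > norm.ppf(1 - alpha / 2) else "Do not reject H0")
```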
Test Procedure for One Tailed Test
Is the same as the test for two tailed test except
that the alternative hypothesis(H1) is
H1 : μ < μo Or μ > μo When Ho: μ = μo

If the alternative hypothesis is H1: μ < μo, then
the decision should be:
– Reject the null hypothesis if z calc < −zα
(similarly, for H1: μ > μo, reject if z calc > +zα).
Test Procedure for One Tailed Test…
Hypothesis Testing for Single Population Proportion (two tail)

72
Hypothesis Testing for Single Population Proportion (two tail)

73
Example
• The national institute of mental health published an
article stating that in any one year period,
approximately 9.5 percent of American adults suffer
from depression or a depressive illness.

Suppose that in a survey of 100 people in a certain


town, seven of them suffered from depression or a
depressive illness. Conduct a hypothesis test to
determine if the true proportion of the people in that
town suffering from depression or depressive illness is
different from the percent in the general adult
American population.
74
Solution: the hypothesis can be stated as:
–Ho: P = 0.095
–HA: P ≠ 0.095
• The level of significance for the test is α = 0.05.

• The critical value can be obtained from the normal
distribution at the mentioned level of significance.

• Thus, Zα/2 = Z0.025 = ± 1.96.

75
76
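The computation slide itself was not extracted; the sketch below carries out the standard one-sample proportion z-test for these numbers (p0 = 0.095, n = 100, 7 observed cases).

```python
# Sketch: the one-sample proportion z-test for the example above
# (p0 = 0.095, n = 100, 7 observed cases).
import math
from scipy.stats import norm

p0, n, x = 0.095, 100, 7
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # about -0.85
p_two_sided = 2 * norm.sf(abs(z))                 # about 0.39
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.2f}")
# |z| < 1.96, so H0 is not rejected: no evidence the town differs from 9.5%.
```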
Procedure for one tailed test of proportion

77
Procedure for one tailed test of proportion

Decision: since Z calc is less than the tabulated value of Z,
the decision is: do not reject the null hypothesis.
78
Hypothesis testing for two proportions
• A similar approach is adopted when
performing a hypothesis test to compare two
proportions. The standard error of the
difference in proportions is again calculated,
but because we are evaluating the probability
of the data on the assumption that the null
hypothesis is true we calculate a slightly
different standard error.

79
Hypothesis testing for two proportions……..

80
Hypothesis testing for two proportions………

81
Example

82
83
Decision: reject Ho
• Because Z calc > Z tab; in other words, the p-value is
less than the level of significance (i.e., α = 0.01).

How can you help the health officer if his claim is:

''The prevalence of malaria in 1978 was greater than
the prevalence of malaria in 1979''?
(Take the level of significance α = 0.01, one-tailed; Z tab = 2.33.)

Decision: Reject the null hypothesis, since 8.571 > 2.33.

84
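The malaria counts behind this example sit on image slides that were not extracted, so the sketch below uses invented counts purely to show the pooled two-proportion z-test described above.

```python
# Sketch: two-sample proportion z-test with a pooled standard error.
# The counts below are hypothetical placeholders, not the malaria data.
import math
from scipy.stats import norm

x1, n1 = 180, 1000    # e.g. cases in year 1 (invented)
x2, n2 = 120, 1000    # e.g. cases in year 2 (invented)

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
print(f"z = {z:.2f}, one-sided p = {norm.sf(z):.4f}")
# Reject H0 at alpha = 0.01 (one-tailed) if z > 2.33.
```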
Tests of Hypothesis using the t - distribution
 Tests of hypotheses about the mean are carried out with the
t-distribution just as for the normal distribution, except that
we must consider the number of degrees of freedom and use
a different table (the table of t-distribution).
 Types of t-test:

One-sample t-test: which is used to compare a


single mean to a fixed number or "gold standard".
Paired t-test: which is used to compare two means based
on samples that are paired in some way.
Two-sample t-test: which is used to compare two
population means based on independent samples from the
two populations or groups.

85
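A minimal sketch of the three tests using scipy.stats (the data are made up, and the choice of scipy is an assumption; the slides do not prescribe any software):

```python
# Sketch: the three t-tests listed above, run on small made-up samples.
from scipy import stats

before  = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7]
after   = [4.6, 4.5, 5.4, 5.0, 5.3, 4.4]
group_b = [6.2, 5.8, 6.5, 6.1, 5.9, 6.4]

# One-sample t-test: is the mean of `before` different from a fixed value 5.0?
print(stats.ttest_1samp(before, popmean=5.0))

# Paired t-test: the same subjects measured before and after.
print(stats.ttest_rel(before, after))

# Two-sample (independent) t-test: `before` group vs. an independent group.
print(stats.ttest_ind(before, group_b))
```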
The P-Value

86
Two-Sided P-Value

One-sided Ha → P-value = area under the curve (AUC) in the tail beyond z calc.
Two-sided Ha → consider potential deviations in both directions → double the one-sided P-value.

Examples:
If one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020.
If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
87
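The sketch below shows where such numbers come from: the one-sided P-value is the normal tail area beyond z calc, and the two-sided P-value doubles it. The z values 3.09 and 0.60 are inferred here because they reproduce the one-sided P-values quoted above; they are not stated on the slide.

```python
# Sketch: a one-sided P-value is the normal tail area beyond z_calc;
# the two-sided P-value doubles it.
from scipy.stats import norm

for z_calc in (3.09, 0.60):
    one_sided = norm.sf(abs(z_calc))       # upper-tail area beyond |z_calc|
    print(f"z = {z_calc}: one-sided P = {one_sided:.4f}, "
          f"two-sided P = {2 * one_sided:.4f}")
# z = 3.09 -> one-sided 0.0010, two-sided 0.0020
# z = 0.60 -> one-sided 0.2743, two-sided 0.5485 (the slide doubles the rounded value: 0.5486)
```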
Interpretation
The P-value answers the question: what is the probability
of the observed test statistic … when H0 is true?
Thus, smaller and smaller P-values provide stronger
and stronger evidence against H0.
Small P-value → strong evidence against H0.

88
Interpretation…..
Conventions*
0.01 < P ≤ 0.05 → significant evidence against H0
P ≤ 0.01 → highly significant evidence against H0
Examples
P = 0.27 → non-significant evidence against H0
P = 0.01 → highly significant evidence against H0

89
Interpretation…..

90
Confidence Interval (CI) VS p-value

91
Thank You !!!

92
