LECTURE 2

Estimation

Lecturer: Nguyen Thi Thu Van


Email: [email protected]
Content
• Estimation
• Measures of goodness for point estimators
• Bias vs. mean squared error
• Common unbiased point estimators
• Error of estimation
• Confidence intervals
• Large-sample confidence intervals
• Selecting the sample size
• Small-sample confidence intervals

Estimation
Why do we need Estimation?
Estimation is a knowledge-building
effort focused on an object of interest.
Getting started with an example
Suppose we want to estimate the average height
of all students at a university, such as UEH, VGU,
or VNU-HCM.
In practice, since we cannot survey all students in
the university (a large population), using a sample
and the estimation method provides us with an
approximate value for the parameter of interest (in
this case, the average height).
Process of Estimation
Step 1: Define the Probability Model.
Step 2: Define the Parameter to Estimate.
Step 3: Choose the Estimator.
Step 4: Collect the Data Sample.
Gather a random sample of student heights
from the university population.
Step 5: Compute the Estimate.
Apply the estimator to the sample data to
calculate the estimated average height.
Estimators and Estimate
Evaluating the Quality of Statistical Estimators

• Each estimator represents a unique rule for
obtaining a single estimate. Naturally, some
estimators perform better than others. But how
can we establish clear criteria to evaluate and
compare the quality of statistical estimators?
• To answer this question, we rely on specific
measures of "goodness" for estimators.
• Imagine a man hits the bull's-eye with a single shot:
this alone doesn't prove his skill. However, consistently
hitting it 100 times builds confidence in his ability.
• Similarly, the goodness of a point estimator is
evaluated through repeated sampling. By analyzing
the frequency distribution of estimates and their
closeness to the true parameter, we assess its
accuracy and reliability.
Next, we introduce two key measures of goodness for
estimators: Bias and Mean Squared Error (MSE).
Measures of Goodness for Estimators

• Bias
• Mean Squared Error

In statistics, bias is the difference between the
expected value of an estimator $\hat{\theta}$ and the true value of
the parameter $\theta$ being estimated: $B(\hat{\theta}) = E(\hat{\theta}) - \theta$. It
indicates a systematic error in the estimation process.
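
Bias and MSE can be approximated by repeated sampling. The sketch below is an illustration, not part of the lecture. It assumes a normal population with variance 4 and takes MSE to be $E[(\hat{\theta} - \theta)^2]$. It compares two estimators of the population variance: the sample variance with divisor n − 1 and the one with divisor n. The helper name `simulate_bias_mse` and all numeric settings are hypothetical.

```python
import random
import statistics


def simulate_bias_mse(estimator, true_value, sample_size=10, trials=20000, seed=0):
    """Approximate the bias and MSE of `estimator` by repeated sampling.

    Assumption: samples come from a normal population N(0, 2^2),
    so the true variance being estimated is 4.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 2.0) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    # Bias: average estimate minus the true parameter value.
    bias = statistics.fmean(estimates) - true_value
    # MSE: average squared distance from the true parameter value.
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias, mse


# statistics.variance divides by n - 1; statistics.pvariance divides by n.
bias_unbiased, mse_unbiased = simulate_bias_mse(statistics.variance, 4.0)
bias_biased, mse_biased = simulate_bias_mse(statistics.pvariance, 4.0)

print(f"divisor n-1: bias = {bias_unbiased:+.3f}, MSE = {mse_unbiased:.3f}")
print(f"divisor n:   bias = {bias_biased:+.3f}, MSE = {mse_biased:.3f}")
```

In this setup the divisor-n estimator is biased downward but has the smaller MSE, which is why bias and MSE are reported as separate measures.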
[Figure: large variation vs. low variation]
Example
a) Which of these estimators are unbiased?
b) Among the unbiased estimators, which one has
the smallest variance?
Common Unbiased Estimators
Error of Estimation
Bias and MSE are theoretical measures used to
evaluate the quality of an estimator based on
repeated sampling. However, in practice, we often
work with specific samples rather than repeated
ones.
The error of estimation is defined as
$\epsilon = |\hat{\theta} - \theta|$,
the distance between the estimated value and
the true parameter. It measures the estimator's
accuracy for a given sample.
Since the true parameter is unknown, we rely on
probability to quantify the uncertainty and assess
how far an estimate might be from the true value.
This complements bias and MSE by focusing on
actual outcomes in specific datasets.
Example
A sample of n = 1000 voters,
randomly selected from a city,
showed y = 560 in favor of
candidate Jones. Estimate p,
the fraction of voters in the
population favoring Jones, and
place a 2-standard-error bound
on the error of estimation.
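
A minimal sketch of the arithmetic for this example. It assumes the usual approach of replacing the unknown p with the sample fraction when computing the standard error $\sqrt{p(1 - p)/n}$. The variable names are illustrative.

```python
import math

n = 1000  # voters sampled
y = 560   # voters in favor of Jones

# Point estimate of p: the sample fraction.
p_hat = y / n

# Standard error of p_hat, with p_hat substituted for the unknown p (assumption).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# 2-standard-error bound on the error of estimation.
bound = 2 * standard_error

print(f"p_hat = {p_hat:.2f}, bound = {bound:.4f}")
# prints: p_hat = 0.56, bound = 0.0314
```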
Example
Confidence Estimators
[Figure: a random sample with mean $\bar{X} = 50$ is drawn from a population whose mean $\mu$ is unknown. The speaker says: "I am 95% confident that $\mu$ is between 40 & 60."]
So, what does the number 50 represent? And
what does he mean by "I am 95% confident that μ
is between 40 & 60"?
Two forms of Estimation
• Point estimation: A point estimate is a single value that serves
as the best guess for a population parameter. Ex:
The sample mean of $50 is used as a point estimate of the
average daily salary of all managers in HCMC (the population).
• Interval estimation: A confidence interval provides additional
information about the variability of an estimate, quantifying the
uncertainty associated with a point estimate of a population
parameter. Ex: A 95% confidence interval for the average
daily salary of the population is [$48, $52]: we are 95%
confident that this range contains the true population mean.
[Figure: number line from 40 (lower confidence limit) through 50 (point estimate) to 60 (upper confidence limit); the distance from 40 to 60 is the width of the confidence interval]
• An interval estimator uses sample data to calculate two
endpoints that define an interval. Ideally, this interval
should contain the target parameter 𝜃 and be as
narrow as possible.
• Since the endpoints depend on the sample, they vary
randomly, making the interval's length and location
uncertain.
• Our goal is to find an estimator that produces narrow
intervals with a high probability of enclosing 𝜃.
Confidence limits are the bounds of an interval
estimator, with the confidence coefficient indicating
the probability that the interval contains the true
parameter $\theta$: $P(\theta_L \le \theta \le \theta_U) = 1 - \alpha$.
A higher confidence coefficient means a greater
likelihood that intervals include $\theta$.
$\theta_L$ and $\theta_U$ are computed
from sample data.
How to find Confidence Interval?
One useful method for finding confidence intervals is
the pivotal method. This method relies on identifying
a pivotal quantity that satisfies two key properties:
• It is a function of the sample measurements and
the unknown parameter 𝜃, with 𝜃 being the only
unknown.
• Its probability distribution does not depend on the
parameter 𝜃.
Steps in the Pivotal Method
Step 1: Identify the pivotal quantity.

Step 2: Find the probability distribution.

Step 3: Choose a confidence level.

Step 4: Find the confidence interval.

Step 5: Solve for the unknown parameter.


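Example. Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random
variables from a normal distribution $N(\mu, \sigma^2)$,
where $\mu$ is the unknown population mean and $\sigma^2$
is the known variance. Construct a confidence
interval for $\mu$.

Step 1: Pivotal quantity
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
where $\mu$ is the only unknown parameter.
Note that SE $= \sigma/\sqrt{n}$ is called the standard error.

Step 2: By Theorem 1 in Lecture 1, $Z \sim N(0, 1)$,
a distribution that does not depend on $\mu$.

Step 3: Choose a confidence level $1 - \alpha$ and take the critical values $\pm z_{\alpha/2}$, so that
$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$

Step 4: Construct the confidence interval
$P\left(-z_{\alpha/2} < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

Step 5: Solve for $\mu$
$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$

CI for $\mu$: $\left(\bar{X} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$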
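
The interval for the mean with known variance can be sketched in code, together with a repeated-sampling check of what the confidence level means. This is an illustration under stated assumptions: the population values (mean 50, standard deviation 10, n = 25) and the names `z_interval` and `coverage` are hypothetical, not from the lecture.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal population with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    half_width = z * sigma / math.sqrt(len(sample))
    center = fmean(sample)
    return center - half_width, center + half_width


def coverage(mu=50.0, sigma=10.0, n=25, trials=5000, confidence=0.95, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        hits += low < mu < high
    return hits / trials


rate = coverage()
print(f"Share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to the chosen confidence level, because the endpoints vary from sample to sample while the true mean stays fixed.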
Examples
Large-Sample Confidence Intervals
Confidence Interval for Large Size
Example. Let $\hat{\theta}$ be a statistic that follows
a normal distribution with mean $\theta$ and
standard error $\sigma_{\hat{\theta}}$.
Derive a confidence interval for $\theta$ with a
confidence level of $1 - \alpha$.
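
One way to carry out this derivation is with the pivotal method. The pivotal quantity $Z$ below is an assumption of this sketch, chosen by analogy with the known-variance case for the sample mean; it is not taken from the slides.

```latex
\begin{align*}
Z &= \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \sim N(0, 1) \\
1 - \alpha &= P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) \\
&= P\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right)
\end{align*}
```

Under these assumptions the interval would be $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$.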
For instance, a sample of 11 circuits from a large normal
population has a mean resistance of 2.20 ohms. We know
from past testing that the population standard deviation is 0.35
ohms. Determine a 95% confidence interval for the true mean
resistance of the population.
Solution. $\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
$= 2.20 \pm 1.96\,(0.35/\sqrt{11})$
$= 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$
This means we are 95% confident that the interval from
1.9932 to 2.4068 ohms contains the true mean resistance.
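
The arithmetic in this solution can be checked with a short sketch. The critical value is taken from Python's standard normal distribution rather than a table, so it differs from 1.96 only beyond the third decimal. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 11              # circuits sampled
sample_mean = 2.20  # ohms
sigma = 0.35        # known population standard deviation, ohms
confidence = 0.95

# Critical value z_{alpha/2} (about 1.96 for 95% confidence).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * sigma / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"{lower:.4f} <= mu <= {upper:.4f}")
# prints: 1.9932 <= mu <= 2.4068
```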
Exercises
Selecting the Sample Size
Choosing Sample Size

• As we know, some collected data contain a large
amount of information about the parameter of
interest; others contain little or none. Many
factors affect the quality of this information.
• We obviously want to obtain this information at
minimum cost (i.e., with a small sample size).
How to Reduce the Sampling Error?
• The required sample size can be found to reach a desired
margin of error (e) with a specified level of confidence ($1 - \alpha$).
• The margin of error, also called the sampling error, is $e = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$. It is
  - the amount of imprecision in the estimate of the population
    parameter
  - the amount added to and subtracted from the point estimate to
    form the confidence interval
• The margin of error can be reduced by reducing the standard
deviation $\sigma$, increasing the sample size n, and/or lowering the
confidence level $1 - \alpha$.
Determining Sample Size

For the mean:
$\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Sampling error: $e = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Solve for n: $n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{e^2}$
Required Sample Size Example
If $\sigma = 45$, what sample size is needed to
estimate the mean within ± 5 with 90%
confidence?

$n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{e^2} = \dfrac{(1.645)^2 (45)^2}{5^2} = 219.19$

So the required sample size is n = 220
(Always round up)
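
A minimal sketch of the same calculation, with the round-up rule made explicit. The function name `sample_size_for_mean` is hypothetical, and the critical value 1.645 is passed in as given on the slide.

```python
import math


def sample_size_for_mean(sigma, margin, z):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the margin."""
    return math.ceil((z * sigma / margin) ** 2)


n_required = sample_size_for_mean(sigma=45, margin=5, z=1.645)
print(n_required)  # prints: 220
```

Rounding up instead of to the nearest integer keeps the margin of error at or below the target.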
Determining Sample Size
For the proportion:
Sampling error: $e = z_{\alpha/2}\,\sqrt{\dfrac{\pi(1 - \pi)}{n}}$
Now solve for n to get $n = \dfrac{z_{\alpha/2}^2\,\pi(1 - \pi)}{e^2}$
Required Sample Size Example
How large a sample would be necessary to
estimate the true proportion defective in a large
population within ±3%, with 95% confidence?

(Assume a pilot sample yields p = 0.12)


Solution. For 95% confidence, use $z_{\alpha/2} = 1.96$
e = 0.03; p = 0.12, so use this to estimate $\pi$

$n = \dfrac{z_{\alpha/2}^2\,\pi(1 - \pi)}{e^2} = \dfrac{(1.96)^2 (0.12)(1 - 0.12)}{(0.03)^2} = 450.74$

So use n = 451
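
The proportion case can be sketched the same way. The function name `sample_size_for_proportion` is hypothetical. The second call is an addition to the slide: it uses p = 0.5, the value that maximizes p(1 − p), as a conservative choice when no pilot estimate is available.

```python
import math


def sample_size_for_proportion(p, margin, z):
    """Smallest n for which z * sqrt(p * (1 - p) / n) does not exceed the margin."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


# Pilot estimate p = 0.12 from the example.
n_pilot = sample_size_for_proportion(p=0.12, margin=0.03, z=1.96)

# Conservative choice when p is unknown (not part of the slide's example).
n_conservative = sample_size_for_proportion(p=0.5, margin=0.03, z=1.96)

print(n_pilot, n_conservative)  # prints: 451 1068
```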
Example. The reaction of an individual to a stimulus in a
psychological experiment may take one of two forms, A or
B. If an experimenter wishes to estimate the probability p
that a person will react in manner A, how many people
must be included in the experiment?

Assume that the experimenter will be satisfied if the error of
estimation is less than 0.04 with probability equal to 0.90.
Assume also that he expects p to lie somewhere in the
neighborhood of 0.6.
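
One possible way to work this example, sketched under the assumption that the sample-size formula for a proportion applies, with the expected value 0.6 standing in for p. The critical value comes from Python's standard normal distribution. Variable names are illustrative.

```python
import math
from statistics import NormalDist

p_guess = 0.6       # expected neighborhood of p, from the problem statement
bound = 0.04        # largest acceptable error of estimation
probability = 0.90  # probability that the error stays within the bound

# Critical value z_{alpha/2} with alpha = 0.10 (about 1.645).
z = NormalDist().inv_cdf(1 - (1 - probability) / 2)

n_exact = z**2 * p_guess * (1 - p_guess) / bound**2
n_required = math.ceil(n_exact)

print(n_required)  # prints: 406
```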
Example. An experimenter wishes to compare the effectiveness of
two methods of training industrial employees to perform an
assembly operation. The selected employees are to be divided
into two groups of equal size, the first receiving training method 1
and the second receiving training method 2. After training, each
employee will perform the assembly operation, and the length of
assembly time will be recorded. The experimenter expects the
measurements for both groups to have a range of approximately 8
minutes.

If the estimate of the difference in mean assembly times is to be
correct to within 1 minute with probability .95, how many workers
must be included in each training group?
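
A hedged sketch of one way to approach this example. It relies on two assumptions that the problem does not state: each group's standard deviation is approximated as range/4 (a common rule of thumb), and the two groups are independent, so the variance of the difference in sample means is $\sigma^2/n + \sigma^2/n$.

```python
import math
from statistics import NormalDist

range_estimate = 8.0          # minutes, expected range in each group
sigma = range_estimate / 4    # ASSUMPTION: range is roughly 4 standard deviations
bound = 1.0                   # minutes
probability = 0.95

# Critical value z_{alpha/2} with alpha = 0.05 (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - probability) / 2)

# Solve z * sqrt(2 * sigma^2 / n) = bound for n (independent groups assumed).
n_exact = z**2 * 2 * sigma**2 / bound**2
n_per_group = math.ceil(n_exact)

print(n_per_group)  # prints: 31
```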
How to Reduce Margin of Error from Finite Populations?
• We simply multiply the margin of error by a factor called
the finite population correction factor (FPCF), $\sqrt{\dfrac{N - n}{N - 1}}$, where N is the population size.
Then the margin of error becomes
$e = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\,\sqrt{\dfrac{N - n}{N - 1}}$
• If the sample size n is less than 5 percent of the
population, and we are sampling without replacement,
then we consider the size of the population to be
effectively infinite. In this case, the FPCF is approximately 1.
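
The effect of the correction can be illustrated with a short sketch. It assumes the standard form of the factor, $\sqrt{(N - n)/(N - 1)}$. The function name `margin_of_error` and the population sizes are hypothetical values chosen for illustration.

```python
import math


def margin_of_error(sigma, n, z, population_size=None):
    """Margin of error for a mean, optionally with the finite population correction."""
    e = z * sigma / math.sqrt(n)
    if population_size is not None:
        # Assumed standard correction factor: sqrt((N - n) / (N - 1)).
        e *= math.sqrt((population_size - n) / (population_size - 1))
    return e


e_infinite = margin_of_error(sigma=45, n=220, z=1.645)
e_small_pop = margin_of_error(sigma=45, n=220, z=1.645, population_size=1000)
e_large_pop = margin_of_error(sigma=45, n=220, z=1.645, population_size=1_000_000)

print(f"no correction: {e_infinite:.3f}")
print(f"N = 1,000:     {e_small_pop:.3f}")
print(f"N = 1,000,000: {e_large_pop:.3f}")
```

When the sample is a large share of the population (220 of 1,000) the margin shrinks noticeably. When it is a tiny share, the factor is close to 1 and the correction can be ignored.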
Small-Sample Confidence Intervals
-- The End of Topic --
Thank You!
