
NUEVA VIZCAYA STATE UNIVERSITY

Bayombong, Nueva Vizcaya


College of Engineering
Chapter 5: Sampling Distributions and Point Estimation of Parameters
POINT ESTIMATION
Overview

Suppose we have an unknown population parameter, such as a population mean μ or a population proportion p, which we'd like to estimate. For example, suppose we are interested in estimating:

• p = the (unknown) proportion of American college students, 18-24, who have a smartphone
• μ = the (unknown) mean number of days it takes Alzheimer's patients to achieve certain milestones

In either case, we can't possibly survey the entire population. That is, we can't survey all American college students between the ages of 18 and 24. Nor can we survey all patients with Alzheimer's disease. So, of course, we do what comes naturally and take a random sample from the population, and use the resulting data to estimate the value of the population parameter. Of course, we want the estimate to be "good" in some way.

In this lesson, we'll learn two methods, namely the method of maximum likelihood and the method of moments, for deriving formulas for "good" point estimates of population parameters. We'll also learn one way of assessing whether a point estimate is "good": by defining what it means for an estimate to be unbiased.
Some Definitions
• A point estimate is a reasonable value of a population parameter.
• The random variables X1, X2, …, Xn are a random sample of size n if:
  a) the Xi are independent random variables, and
  b) every Xi has the same probability distribution.
• Data collected, X1, X2, …, Xn, are random variables.
• Functions of these random variables, such as X̄ and S², are also random variables called statistics.
• A statistic is any function of the observations in a random sample.
• Statistics have their own unique distributions, called sampling distributions.
• The probability distribution of a statistic is called a sampling distribution.
Point Estimator
A point estimate of a population parameter θ is a single numerical value θ̂ of a statistic Θ̂; the statistic Θ̂ is called the point estimator.

Sampling Distribution of the Sample Mean
• A random sample of size n is taken from a normal population with mean μ and variance σ².
• The observations X1, X2, …, Xn are normally and independently distributed.
• A linear function of normal and independent random variables is itself normally distributed, so X̄ is normally distributed with mean μ and variance σ²/n.

Central Limit Theorem
If X1, X2, …, Xn is a random sample of size n taken from a population with mean μ and finite variance σ², and X̄ is the sample mean, then the limiting form of the distribution of Z = (X̄ − μ)/(σ/√n), as n → ∞, is the standard normal distribution.

Suppose that a random variable X has a continuous uniform distribution:

  f(x) = 1/2 for 4 ≤ x ≤ 6, and 0 otherwise

Find the distribution of the sample mean of a random sample of size n = 40.

Solution: here μ = 5 and σ² = (6 − 4)²/12 = 1/3, so by the central limit theorem X̄ is approximately normal with mean 5 and variance σ²/n = (1/3)/40 = 1/120.
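This result is easy to check by simulation; a minimal sketch, assuming NumPy is available (the library and the seed are not part of the original module):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n, reps = 40, 100_000

    # Each row is one random sample of size n from the uniform(4, 6) population.
    samples = rng.uniform(4, 6, size=(reps, n))
    xbar = samples.mean(axis=1)        # one sample mean per replication

    print(xbar.mean())                 # close to mu = 5
    print(xbar.var())                  # close to sigma^2/n = 1/120 ~ 0.00833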

Sampling Distributions of Sample Means

Example 7-1: Resistors
An electronics company manufactures resistors having a mean resistance of 100 ohms and a standard deviation of 10 ohms. The distribution of resistance is normal. What is the probability that a random sample of n = 25 resistors will have an average resistance of less than 95 ohms?

Solution: σ_X̄ = σ/√n = 10/√25 = 2, so z = (95 − 100)/2 = −2.5 and P(X̄ < 95) = Φ(−2.5) ≈ 0.0062.
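The same answer can be computed directly; a small sketch assuming SciPy is available (not part of the original module):

    from math import sqrt
    from scipy.stats import norm

    mu, sigma, n = 100, 10, 25
    se = sigma / sqrt(n)                  # std deviation of X-bar = 2
    p = norm.cdf(95, loc=mu, scale=se)    # P(X-bar < 95)
    print(round(p, 4))                    # 0.0062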

Example 7-2: Central Limit Theorem

Two Populations
We have two independent normal populations. What is the distribution of the difference of the sample means?

Sampling Distribution of a Difference in Sample Means

• If we have two independent populations with means μ1 and μ2, and variances σ1² and σ2²,
• and if X̄1 and X̄2 are the sample means of two independent random samples of sizes n1 and n2 from these populations,
• then the sampling distribution of

  Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

is approximately standard normal, if the conditions of the central limit theorem apply.
• If the two populations are normal, then the sampling distribution of Z is exactly standard normal.

General Concepts of Point Estimation
• We want point estimators that:
  – are unbiased, and
  – have minimal variance.
• We use the standard error of the estimator to calculate its mean squared error.

Unbiased Estimators Defined
The point estimator Θ̂ is an unbiased estimator for the parameter θ if

  E(Θ̂) = θ   (7-5)

If the estimator is not unbiased, then the difference

  E(Θ̂) − θ   (7-6)

is called the bias of the estimator Θ̂. The mean of the sampling distribution of an unbiased estimator Θ̂ is equal to θ.

Example 7-3: Aircraft Engine Life
The effective life of a component used in jet-turbine aircraft engines is a normally distributed random variable with parameters shown (old). The engine manufacturer introduces an improvement into the manufacturing process for this component that changes the parameters as shown (new). Random samples are selected from the "old" process and the "new" process as shown. What is the probability that the difference in the two sample means is at least 25 hours?

Figure 7-4: Sampling distribution of the sample mean difference.

Process       Old (1)   New (2)   Diff (2−1)
x̄             5,000     5,050     50
s             40        30        50 = √(40² + 30²)
n             16        25

Calculations
s/√n          10        6         11.7 = √(10² + 6²)
z = (25 − 50)/11.7 = −2.14
P(x̄2 − x̄1 > 25) = P(Z > −2.14) = 0.9840 = 1 − NORMSDIST(−2.14)
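The spreadsheet arithmetic above can be reproduced in a few lines; a sketch assuming SciPy is available (NORMSDIST is the Excel function used in the original):

    from math import sqrt
    from scipy.stats import norm

    mu_diff = 5050 - 5000                     # mean of X-bar2 - X-bar1 = 50 hours
    sd_diff = sqrt(40**2/16 + 30**2/25)       # ~11.66 hours
    z = (25 - mu_diff) / sd_diff              # ~ -2.14
    print(round(1 - norm.cdf(z), 4))          # P(Z > z) = 0.9840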
Example 7-4: Sample Mean & Variance Are Unbiased, Part 1
• X is a random variable with mean μ and variance σ². Let X1, X2, …, Xn be a random sample of size n.
• Show that the sample mean X̄ is an unbiased estimator of μ:

  E(X̄) = E[(X1 + X2 + … + Xn)/n]
       = (1/n)[E(X1) + E(X2) + … + E(Xn)]
       = (1/n)(μ + μ + … + μ)
       = nμ/n = μ

Example 7-4: Sample Mean & Variance Are Unbiased, Part 2
Show that the sample variance S² is an unbiased estimator of σ²:

  E(S²) = E[Σ(Xi − X̄)²/(n − 1)]
        = [1/(n − 1)] E[Σ(Xi² + X̄² − 2X̄Xi)]
        = [1/(n − 1)] E[ΣXi² − nX̄²]
        = [1/(n − 1)] [ΣE(Xi²) − nE(X̄²)]
        = [1/(n − 1)] [Σ(μ² + σ²) − n(μ² + σ²/n)]
        = [1/(n − 1)] (nμ² + nσ² − nμ² − σ²)
        = (n − 1)σ²/(n − 1) = σ²
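Both identities are easy to confirm by simulation; a minimal sketch assuming NumPy (the seed and the chosen μ, σ, n are illustrative only):

    import numpy as np

    rng = np.random.default_rng(seed=2)
    mu, sigma, n, reps = 10.0, 3.0, 5, 200_000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)     # sample variance with the n-1 divisor

    print(xbar.mean())             # ~10.0 -> E(X-bar) = mu
    print(s2.mean())               # ~9.0  -> E(S^2) = sigma^2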


Other Unbiased Estimators of the Population Mean
Consider the following n = 10 observations (xi, with sorted values xi'):

i      xi     xi'
1      12.8   8.5
2      9.4    8.7
3      8.7    9.4
4      11.6   9.8
5      13.1   10.3
6      9.8    11.6
7      14.1   12.1
8      8.5    12.8
9      12.1   13.1
10     10.3   14.1
Σ      110.4

Mean = x̄ = 110.4/10 = 11.04
Median = x̃ = (10.3 + 11.6)/2 = 10.95
Trimmed mean (discarding the smallest and largest observations) = (110.4 − 8.5 − 14.1)/8 = 10.98

• All three statistics are unbiased.
  – Do you see why?
• Which is best?
  – We want the most reliable one.
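The three estimates can be checked in a few lines, assuming NumPy is available:

    import numpy as np

    x = np.array([12.8, 9.4, 8.7, 11.6, 13.1, 9.8, 14.1, 8.5, 12.1, 10.3])

    print(x.mean())                    # 11.04
    print(np.median(x))                # 10.95
    print(np.sort(x)[1:-1].mean())     # trimmed mean: drop min and max -> 10.975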

Figure 7-5: The sampling distributions of two unbiased estimators.

Minimum Variance Unbiased Estimators
• If we consider all unbiased estimators of θ, the one with the smallest variance is called the minimum variance unbiased estimator (MVUE).
• If X1, X2, …, Xn is a random sample of size n from a normal distribution with mean μ and variance σ², then the sample mean X̄ is the MVUE for μ.
• The sample mean and a single observation are both unbiased estimators of μ. The variance of the:
  – sample mean is σ²/n
  – single observation is σ²
  Since σ²/n ≤ σ², the sample mean is preferred.

Standard Error of an Estimator
The standard error of an estimator Θ̂ is its standard deviation, given by

  σ_Θ̂ = √V(Θ̂)

If the standard error involves unknown parameters that can be estimated, substituting these values into σ_Θ̂ produces an estimated standard error, denoted σ̂_Θ̂. Equivalent notation: σ̂_Θ̂ = s_Θ̂ = se(Θ̂).

If the Xi are ~N(μ, σ²), then X̄ is normally distributed and σ_X̄ = σ/√n. If σ is not known, then σ̂_X̄ = s/√n.
Choosing Among Unbiased Estimators
Suppose that Θ̂1 and Θ̂2 are unbiased estimators of θ. If the variance of Θ̂1 is less than the variance of Θ̂2, then Θ̂1 is preferable.

Example 7-5: Thermal Conductivity
• These observations are 10 measurements of thermal conductivity of Armco iron:

  41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04

  Mean = 41.924   Std dev (s) = 0.284   Std error (s/√n) = 0.0898

• Since σ is not known, we use s to calculate the standard error.
• Since the standard error is about 0.2% of the mean, the mean estimate is fairly precise. We can be very confident that the true population mean is 41.924 ± 2(0.0898).
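A quick check of these summary statistics, assuming NumPy is available:

    import numpy as np

    x = np.array([41.60, 41.48, 42.34, 41.95, 41.86, 42.18,
                  41.72, 42.26, 41.81, 42.04])

    xbar = x.mean()                   # 41.924
    s = x.std(ddof=1)                 # ~0.284
    se = s / np.sqrt(len(x))          # ~0.0898
    print(xbar, s, se)
    print(xbar - 2*se, xbar + 2*se)   # ~ (41.74, 42.10)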

Mean Squared Error
The mean squared error of an estimator Θ̂ of the parameter θ is defined as:

  MSE(Θ̂) = E(Θ̂ − θ)²   (7-7)

This can be rewritten as:

  MSE(Θ̂) = E[Θ̂ − E(Θ̂)]² + [θ − E(Θ̂)]² = V(Θ̂) + (bias)²

Conclusion: the mean squared error (MSE) of the estimator is equal to the variance of the estimator plus the bias squared. It measures both characteristics.

Relative Efficiency
• The MSE is an important criterion for comparing two estimators; the relative efficiency of Θ̂1 to Θ̂2 is the ratio MSE(Θ̂1)/MSE(Θ̂2).
• If the relative efficiency is less than 1, we conclude that the 1st estimator is superior to the 2nd estimator.

Optimal Estimator
• A biased estimator can be preferred to an unbiased estimator if it has a smaller MSE.
• Biased estimators are occasionally used in linear regression.
• An estimator whose MSE is smaller than that of any other estimator is called an optimal estimator.

Figure 7-6: A biased estimator can have a smaller variance than the unbiased estimator.
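The point that a biased estimator can win on MSE is easy to see numerically: for normal data, the variance estimator that divides by n (biased) has a smaller MSE than S², which divides by n − 1. A simulation sketch, assuming NumPy (all constants illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=3)
    sigma2, n, reps = 4.0, 10, 200_000

    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    v_unbiased = x.var(axis=1, ddof=1)    # divide by n-1: unbiased
    v_biased = x.var(axis=1, ddof=0)      # divide by n: biased downward

    mse = lambda est: np.mean((est - sigma2)**2)
    print(mse(v_unbiased))                # larger MSE
    print(mse(v_biased))                  # smaller MSE despite the bias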

Methods of Point Estimation
• There are three methodologies to create point estimates of a population parameter:
  – Method of moments
  – Method of maximum likelihood
  – Bayesian estimation of parameters
• Each approach can be used to create estimators with varying degrees of bias and relative efficiency.

Method of Moments
• A "moment" is a kind of expected value of a random variable.
• A population moment relates to the entire population or its representative function.
• A sample moment is calculated in the same way as its associated population moment.

Moments Defined
• Let X1, X2, …, Xn be a random sample from the probability distribution f(x), where f(x) can be either a:
  – discrete probability mass function, or
  – continuous probability density function.
• The kth population moment (or distribution moment) is E(X^k), k = 1, 2, ….
• The kth sample moment is (1/n)ΣXi^k, k = 1, 2, ….
• If k = 1 (called the first moment), then:
  – the population moment is μ, and
  – the sample moment is x̄.
  The sample mean is thus the moment estimator of the population mean.

Moment Estimators
Let X1, X2, …, Xn be a random sample from either a probability mass function or a probability density function with m unknown parameters θ1, θ2, …, θm. The moment estimators Θ̂1, Θ̂2, …, Θ̂m are found by equating the first m population moments to the first m sample moments and solving the resulting simultaneous equations for the unknown parameters.
Example 7-6: Exponential Moment Estimator, Part 1
• Suppose that X1, X2, …, Xn is a random sample from an exponential distribution with parameter λ.
• There is only one parameter to estimate, so equating the first population and sample moments, we have E(X) = x̄.
• E(X) = 1/λ = x̄
• λ̂ = 1/x̄ is the moment estimator.

Example 7-6: Exponential Moment Estimator, Part 2
• As an example, the time to failure of an electronic module is exponentially distributed.
• Eight units are randomly selected and tested. Their times to failure are:

  11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38   (mean = 21.646)

• The moment estimate of the λ parameter is λ̂ = 1/21.646 = 0.04620.
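In code, assuming NumPy is available:

    import numpy as np

    t = np.array([11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38])

    xbar = t.mean()           # 21.646
    lam_hat = 1.0 / xbar      # moment estimate of lambda
    print(xbar, lam_hat)      # 21.646, ~0.0462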
Example 7-7: Normal Moment Estimators
Suppose that X1, X2, …, Xn is a random sample from a normal distribution with parameters μ and σ², so that E(X) = μ and E(X²) = μ² + σ². Equating the first two population moments to the first two sample moments gives:

  μ̂ = x̄ = (1/n)ΣXi
  μ̂² + σ̂² = (1/n)ΣXi²

Solving the second equation for σ̂²:

  σ̂² = (1/n)ΣXi² − x̄² = [ΣXi² − nx̄²]/n = Σ(Xi − X̄)²/n   (biased)


Example 7-8: Gamma Moment Estimators, Part 1
For the gamma distribution, equate the first two population moments, expressed in terms of the parameters r and λ, to the corresponding sample statistics:

  r/λ = E(X) = x̄   (the mean)
  r/λ² = E(X²) − [E(X)]²   (the variance), or
  r(r + 1)/λ² = E(X²)

and now solving for r and λ:

  r̂ = x̄² / [(1/n)ΣXi² − x̄²]
  λ̂ = x̄ / [(1/n)ΣXi² − x̄²]

Example 7-8: Gamma Moment Estimators, Part 2
Using the exponential example data shown, we can estimate the parameters of the gamma distribution.

  xi        xi²
  11.96     143.0416
  5.03      25.3009
  67.40     4542.7600
  16.07     258.2449
  31.50     992.2500
  7.73      59.7529
  11.10     123.2100
  22.38     500.8644

  x̄ = 21.646   ΣXi² = 6645.4247

  r̂ = 21.646² / [(1/8)(6645.4247) − 21.646²] = 1.29
  λ̂ = 21.646 / [(1/8)(6645.4247) − 21.646²] = 0.0598
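The same calculation in code, assuming NumPy is available:

    import numpy as np

    t = np.array([11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38])

    xbar = t.mean()              # 21.646
    m2 = np.mean(t**2)           # (1/n) * sum of squares
    denom = m2 - xbar**2

    print(xbar**2 / denom)       # r-hat ~ 1.29
    print(xbar / denom)          # lambda-hat ~ 0.0598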
Maximum Likelihood Estimators
• Suppose that X is a random variable with probability distribution f(x; θ), where θ is a single unknown parameter. Let x1, x2, …, xn be the observed values in a random sample of size n. Then the likelihood function of the sample is:

  L(θ) = f(x1; θ) · f(x2; θ) · … · f(xn; θ)   (7-9)

• Note that the likelihood function is now a function of only the unknown parameter θ. The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes the likelihood function L(θ).
• If X is a discrete random variable, then L(θ) is the probability of obtaining those sample values. The MLE is the θ that maximizes that probability.

Example 7-9: Bernoulli MLE
Let X be a Bernoulli random variable. The probability mass function is f(x; p) = p^x (1 − p)^(1−x), x = 0, 1, where p is the parameter to be estimated. The likelihood function of a random sample of size n is:

  L(p) = p^x1 (1 − p)^(1−x1) · p^x2 (1 − p)^(1−x2) · … · p^xn (1 − p)^(1−xn)
       = p^(Σxi) (1 − p)^(n−Σxi)

  ln L(p) = (Σxi) ln p + (n − Σxi) ln(1 − p)

  d ln L(p)/dp = (Σxi)/p − (n − Σxi)/(1 − p) = 0

  p̂ = (Σxi)/n


Example 7-10: Normal MLE for μ
Let X be a normal random variable with unknown mean μ and known variance σ². The likelihood function of a random sample of size n is:

  L(μ) = Π [1/√(2πσ²)] e^(−(xi − μ)²/(2σ²)) = (2πσ²)^(−n/2) e^(−(1/(2σ²)) Σ(xi − μ)²)

  ln L(μ) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ(xi − μ)²

  d ln L(μ)/dμ = (1/σ²) Σ(xi − μ) = 0

  μ̂ = (Σxi)/n = x̄   (same as the moment estimator)
Example 7-11: Exponential MLE
Let X be an exponential random variable with parameter λ. The likelihood function of a random sample of size n is:

  L(λ) = Π λe^(−λxi) = λ^n e^(−λΣxi)

  ln L(λ) = n ln λ − λΣxi

  d ln L(λ)/dλ = n/λ − Σxi = 0

  λ̂ = n/Σxi = 1/x̄   (same as the moment estimator)
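The behavior described next (the log likelihood peaking at λ̂ = 1/x̄ and sharpening as n grows) can be sketched numerically, assuming NumPy; note that ln L(λ) depends on the data only through n and the total n·x̄:

    import numpy as np

    xbar = 21.646                       # from the Example 7-6 data
    lam = np.linspace(0.01, 0.12, 400)

    for n in (8, 20, 40):
        loglik = n*np.log(lam) - lam*n*xbar
        peak = lam[np.argmax(loglik)]          # ~0.0462 for every n
        flatness = loglik.max() - loglik[-1]   # drop from peak to lam = 0.12
        print(n, round(peak, 4), round(flatness, 1))  # drop grows with n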
Why Does MLE Work?
• From Examples 7-6 and 7-11, using the 8 data observations, the plot of the ln L(λ) function is maximized at λ = 0.0462. The curve is flat near the maximum, indicating that the estimator is not precise.
• As the sample size increases, while maintaining the same x̄, the curve maxima stay at the same λ, but the curves become sharper and the estimator more precise.

Figure 7-7: Log likelihood for the exponential distribution. (a) n = 8, (b) n = 8, 20, 40.

Example 7-12: Normal MLEs for μ and σ²
Let X be a normal random variable with both unknown mean μ and unknown variance σ². The likelihood function of a random sample of size n is:

  L(μ, σ²) = Π [1/√(2πσ²)] e^(−(xi − μ)²/(2σ²)) = (2πσ²)^(−n/2) e^(−(1/(2σ²)) Σ(xi − μ)²)

  ln L(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ(xi − μ)²

  ∂ ln L(μ, σ²)/∂μ = (1/σ²) Σ(xi − μ) = 0
  ∂ ln L(μ, σ²)/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ(xi − μ)² = 0

  μ̂ = x̄  and  σ̂² = Σ(xi − x̄)²/n

Properties of an MLE
Under very general and non-restrictive conditions, when the sample size n is large and if Θ̂ is the MLE of the parameter θ:
(1) Θ̂ is an approximately unbiased estimator for θ, i.e., E(Θ̂) ≈ θ;
(2) the variance of Θ̂ is nearly as small as the variance that could be obtained with any other estimator; and
(3) Θ̂ has an approximate normal distribution.

Notes:
• Mathematical statisticians will often prefer MLEs because of these properties. Properties (1) and (2) state that MLEs are approximately MVUEs.
• To use MLEs, the distribution of the population must be known or assumed.


Importance of Large Sample Sizes
• Consider the MLE for σ² shown in Example 7-12:

  E(σ̂²) = E[Σ(Xi − X̄)²/n] = [(n − 1)/n] σ² = σ² − σ²/n

• Then the bias is:

  E(σ̂²) − σ² = [(n − 1)/n] σ² − σ² = −σ²/n

• Since the bias is negative, the MLE underestimates the true variance σ².
• The MLE is an asymptotically (large-sample) unbiased estimator: the bias approaches zero as n increases.
Invariance Property
Let Θ̂1, Θ̂2, …, Θ̂k be the maximum likelihood estimators (MLEs) of the parameters θ1, θ2, …, θk. Then the MLE of any function h(θ1, θ2, …, θk) of these parameters is the same function h(Θ̂1, Θ̂2, …, Θ̂k) of the estimators. This property is illustrated in Example 7-13.

Example 7-13: Invariance
For the normal distribution, the MLEs were:

  μ̂ = x̄  and  σ̂² = Σ(xi − x̄)²/n

To obtain the MLE of the function h(μ, σ²) = √(σ²) = σ, substitute the estimators μ̂ and σ̂² into the function h:

  σ̂ = √[Σ(xi − x̄)²/n]

which is not the sample standard deviation s.

Complications of the MLE Method
The method of maximum likelihood is an excellent technique; however, there are two complications:
1. It may not be easy to maximize the likelihood function, because the derivative function set to zero may be difficult to solve algebraically.
2. The likelihood function may be impossible to solve analytically, so numerical methods must be used.
The following two examples illustrate.

Example 7-14: Uniform Distribution MLE

  f(x) = 1/a for 0 ≤ x ≤ a

  L(a) = Π (1/a) = a^(−n) for 0 ≤ x1, …, xn ≤ a

  dL(a)/da = −n a^(−n−1)

Figure 7-8: The likelihood function for this uniform distribution.

Calculus methods don't work here because L(a) is maximized at the discontinuity: the derivative is negative for every a > 0, so L(a) is strictly decreasing and never has a zero slope. Clearly, a cannot be smaller than max(xi); thus the MLE is â = max(xi).
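A two-line illustration, assuming NumPy (the true a and the sample size are made up):

    import numpy as np

    rng = np.random.default_rng(seed=5)
    x = rng.uniform(0.0, 10.0, size=50)   # hypothetical sample, true a = 10

    # L(a) = a**(-n) is decreasing for a >= max(x) and zero below it,
    # so the likelihood is maximized at the sample maximum.
    print(x.max())                        # a-hat, just below 10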

Prepared by: VHANESSA LIAN T. MARIANO, Emergency Instructor
