chapter7-Sampling-Distribution

Uploaded by 20150009758
Copyright © All Rights Reserved
ENGINEERING DATA ANALYSIS

Sampling Distributions
CE Group 1
Aparece
Introduction
• Parameters are numerical descriptive measures
for populations.
– For the normal distribution, the location and
shape are described by μ and σ.
– For a binomial distribution consisting of n
trials, the location and shape are determined
by p.
• Often the values of parameters that specify the
exact form of a distribution are unknown.
• You must rely on the sample to learn about these
parameters.
Sampling
Examples:
• A pollster is sure that the responses to his
“agree/disagree” question will follow a binomial
distribution, but p, the proportion of those who
“agree” in the population, is unknown.
• An agronomist believes that the yield per acre of
a variety of wheat is approximately normally
distributed, but the mean and the standard
deviation  of the yields are unknown.
 If you want the sample to provide reliable
information about the population, you must
select your sample in a certain way!
Simple Random Sampling
• The sampling plan or experimental
design determines the amount of
information you can extract, and
often allows you to measure the
reliability of your inference.
• Simple random sampling is a
method of sampling that allows each
possible sample of size n an equal
probability of being selected.
Example
•There are 89 students in a
statistics class. The instructor
wants to choose 5 students to form
a project group. How should he
proceed?
1. Give each student a number from 01 to 89.
2. Choose 5 pairs of random digits from the random number table.
3. If a number between 90 and 00 is chosen, choose another number.
4. The five students with those numbers form the group.
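The four steps can be mimicked in code. Below is a minimal sketch in which Python's pseudo-random generator stands in for the printed random number table. The helper name `choose_group` is hypothetical. Redrawing a number that was already chosen is an added assumption, because the slides do not say how to handle repeats.

```python
import random

def choose_group(class_size=89, group_size=5, seed=None):
    """Hypothetical helper: pick group_size distinct students numbered 01..class_size."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < group_size:
        pair = rng.randint(0, 99)       # one pair of random digits: 00..99
        if pair == 0 or pair > class_size:
            continue                     # 00 and 90..99 match no student: draw again
        if pair in chosen:
            continue                     # assumption: a repeated number is also redrawn
        chosen.append(pair)
    return chosen

print(choose_group(seed=1))
```

The standard library can also do this directly: `random.sample(range(1, 90), 5)` returns a simple random sample of 5 distinct student numbers.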
Types of Samples
• Sampling can occur in two types of
practical situations:
1. Observational studies: The data existed before you decided to study it. Watch out for
– Nonresponse: Are the responses biased because only opinionated people responded?
– Undercoverage: Are certain segments of the population systematically excluded?
– Wording bias: The question may be too complicated or poorly worded.
2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units.
– Hypothetical populations can make random sampling difficult if not impossible.
– Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.
– Samples must behave like random samples!
Other Sampling Plans
• There are several other sampling plans
that still involve randomization:
1. Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each stratum.
2. Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in the cluster.
3. 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.
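The three randomized plans differ only in what gets sampled at random. Below is a minimal sketch of each. The function names and the "city blocks of households" population are hypothetical, chosen to echo the examples that follow. Strata and clusters are both modelled as a dict from a group name to a list of units.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from every stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Simple random sample of whole clusters, then a census of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # cluster names picked at random
    return {name: list(clusters[name]) for name in chosen}

def systematic_sample(population, k, rng):
    """1-in-k: random start among the first k elements, then every k-th element."""
    start = rng.randrange(k)            # index 0..k-1
    return population[start::k]

rng = random.Random(0)
# Hypothetical population: 12 city blocks of 10 households each.
blocks = {f"block{b:02d}": [f"b{b:02d}-h{h}" for h in range(10)] for b in range(12)}
households = [h for units in blocks.values() for h in units]

print(stratified_sample(blocks, 2, rng))       # 2 households from every block
print(cluster_sample(blocks, 3, rng))          # every household in 3 random blocks
print(systematic_sample(households, 10, rng))  # 1-in-10 systematic sample
```

The same block structure can act as strata or as clusters. What changes is whether you sample within every group or sample whole groups.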
Examples
• Divide California into counties and take a simple random sample within each county. (Stratified)
• Divide California into counties and take a simple random sample of 10 counties. (Cluster)
• Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all who live there. (Cluster)
• Choose an entry at random from the phone book, and select every 50th number thereafter. (1-in-50 Systematic)
Non-Random Sampling Plans
• There are several other sampling plans that
do not involve randomization. They should
NOT be used for statistical inference!
1. Convenience sample: A sample that can be taken easily without random selection.
– People walking by on the street
2. Judgment sample: The sampler decides who will and won't be included in the sample.
3. Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic.
– Race, ethnic origin, gender, etc.
Sampling Distributions
Definition: The sampling distribution of a
statistic is the probability distribution for the
possible values of the statistic that results
when random samples of size n are
repeatedly drawn from the population.
Example: Population: 3, 5, 2, 1. Draw samples of size n = 3 without replacement.

Possible samples    x̄
3, 5, 2             10/3 = 3.33
3, 5, 1             9/3 = 3
3, 2, 1             6/3 = 2
5, 2, 1             8/3 = 2.67

Each value of x̄ is equally likely, with probability p(x̄) = 1/4.
[Figure: probability histogram of x̄ (horizontal axis x̄, vertical axis p(x̄))]
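A sampling distribution this small can be found by brute-force enumeration. The sketch below lists every sample of size 3 drawn without replacement with `itertools.combinations`, computes each x̄ exactly with `fractions.Fraction`, and tabulates the resulting probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 5, 2, 1]
n = 3

# Every sample of size n drawn without replacement is equally likely.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of x-bar: value -> probability.
dist = {m: Fraction(c, len(samples)) for m, c in Counter(means).items()}

for s, m in zip(samples, means):
    print(s, m, round(float(m), 2))

mean_of_xbar = sum(m * p for m, p in dist.items())
population_mean = Fraction(sum(population), len(population))
print("mean of x-bar:", mean_of_xbar, "population mean:", population_mean)
```

Both means come out to 11/4 = 2.75. The mean of the sampling distribution of x̄ equals the population mean.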
Sampling Distributions
Sampling distributions for statistics can be
– approximated with simulation techniques, or
– derived using mathematical theorems.
The Central Limit Theorem is one such theorem.
Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite mean μ and standard deviation σ, then, when n is large, the sampling distribution of the sample mean x̄ is approximately normally distributed, with mean μ and standard deviation σ/√n. The approximation becomes more accurate as n becomes large.
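Simulation is one way to see the theorem at work. The minimal sketch below uses an Exponential(1) population, chosen here because it is strongly right-skewed and has μ = σ = 1; the slides do not use this population. The sketch draws many samples of size n = 30 and checks that the sample means center on μ with spread close to σ/√n.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw reps samples of size n from an Exponential(1) population; return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Exponential(1) is right-skewed, with mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 30
means = simulate_sample_means(n, reps=20_000)

print("mean of x-bar:", round(statistics.fmean(means), 3), "theory:", mu)
print("SD of x-bar:  ", round(statistics.stdev(means), 3), "theory:", round(sigma / n ** 0.5, 3))
```

A histogram of `means` would look mound-shaped even though the population itself is skewed.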
Example
Toss a fair die n = 1 time. The distribution of x, the number on the upper face, is flat or uniform.
$$\mu = \sum x\,p(x) = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + \cdots + 6\left(\tfrac{1}{6}\right) = 3.5$$
$$\sigma = \sqrt{\sum (x - \mu)^2\,p(x)} = 1.71$$
Example

Toss a fair die n = 2 times. The distribution of x̄, the average number on the two upper faces, is mound-shaped.
Mean: μ = 3.5
Std Dev: σ/√2 = 1.71/√2 = 1.21
Example

Toss a fair die n = 3 times. The distribution of x̄, the average number on the three upper faces, is approximately normal.
Mean: μ = 3.5
Std Dev: σ/√3 = 1.71/√3 = .987
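The three die examples can be checked exactly rather than by simulation. The sketch below enumerates all 6ⁿ equally likely outcomes with `itertools.product`. It builds the exact distribution of x̄ with `fractions.Fraction`, then reports its mean and standard deviation. The helper names `xbar_distribution` and `mean_and_sd` are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)   # a fair die: each face has probability 1/6

def xbar_distribution(n):
    """Exact sampling distribution of the average of n fair-die tosses."""
    dist = {}
    for outcome in product(faces, repeat=n):      # all 6**n equally likely outcomes
        xbar = Fraction(sum(outcome), n)
        dist[xbar] = dist.get(xbar, 0) + Fraction(1, 6 ** n)
    return dist

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return float(mean), sqrt(var)

for n in (1, 2, 3):
    mean, sd = mean_and_sd(xbar_distribution(n))
    print(f"n = {n}: mean = {mean}, SD = {sd:.3f}")
```

The mean is 3.5 for every n, and the exact standard deviations are about 1.708, 1.208 and 0.986. The .987 for n = 3 comes from dividing the rounded value σ = 1.71 by √3.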
Why is this Important?
The Central Limit Theorem also implies that the
sum of n measurements is approximately normal
with mean nμ and standard deviation σ√n.
Many statistics that are used for statistical
inference are sums or averages of sample
measurements.
When n is large, these statistics will have
approximately normal distributions.
This will allow us to describe their behavior and
evaluate the reliability of our inferences.
How Large is Large?
• If the sampled population is normal, then the sampling distribution of x̄ will also be normal, no matter what the sample size.
• When the sampled population is approximately symmetric, the sampling distribution of x̄ becomes approximately normal for relatively small values of n.
• When the sampled population is skewed, the sample size must be at least 30 before the sampling distribution of x̄ becomes approximately normal.
The Sampling Distribution of the
Sample Mean
A random sample of size n is selected from a population with mean μ and standard deviation σ. The sampling distribution of the sample mean x̄ will have mean μ and standard deviation σ/√n.
• If the original population is normal, the sampling distribution will be normal for any sample size.
• If the original population is nonnormal, the sampling distribution will be approximately normal when n is large.
The standard deviation of x-bar is sometimes called the STANDARD ERROR (SE).
Finding Probabilities for
the Sample Mean

• If the sampling distribution of x̄ is normal or approximately normal, standardize or rescale the interval of interest in terms of
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
• Find the appropriate area using Table 3.
Example: A random sample of size n = 16 is drawn from a normal distribution with μ = 10 and σ = 8. Find P(x̄ > 12).
$$P(\bar{x} > 12) = P\left(z > \frac{12 - 10}{8/\sqrt{16}}\right) = P(z > 1) = 1 - .8413 = .1587$$
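The two steps (standardize, then look up an area) can be reproduced with `statistics.NormalDist` from Python's standard library, which takes the place of Table 3 here. The function name `prob_xbar_greater` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_xbar_greater(a, mu, sigma, n):
    """P(x-bar > a) when x-bar is (approximately) normal with mean mu and SE sigma/sqrt(n)."""
    se = sigma / sqrt(n)              # standard error of the sample mean
    z = (a - mu) / se                 # standardize the cut-off
    return 1 - NormalDist().cdf(z)    # upper-tail area of the standard normal

print(round(prob_xbar_greater(12, mu=10, sigma=8, n=16), 4))
```

With n = 16 the standard error is 8/4 = 2, so z = 1 and the upper-tail area rounds to .1587.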
Example
A soda filling machine is supposed to fill cans of
soda with 12 fluid ounces. Suppose that the fills are
actually normally distributed with a mean of 12.1 oz
and a standard deviation of .2 oz. What is the
probability that the average fill for a 6-pack of soda is
less than 12 oz?

$$P(\bar{x} < 12) = P\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{12 - 12.1}{.2/\sqrt{6}}\right) = P(z < -1.22) = .1112$$
The Sampling Distribution of
the Sample Proportion
The Central Limit Theorem can be used to
conclude that the binomial random variable x is
approximately normal when n is large, with mean
np and standard deviation √(npq).
The sample proportion, p̂ = x/n, is simply a
rescaling of the binomial random variable x,
dividing it by n.
From the Central Limit Theorem, the sampling
distribution of p̂ will also be approximately
normal, with a rescaled mean and standard
deviation.
The Sampling Distribution of
the Sample Proportion
A random sample of size n is selected from a binomial population with parameter p.
The sampling distribution of the sample proportion, p̂ = x/n, will have mean p and standard deviation √(pq/n).
If n is large, and p is not too close to zero or one, the sampling distribution of p̂ will be approximately normal.
The standard deviation of p-hat is sometimes called the STANDARD ERROR (SE) of p-hat.
Finding Probabilities for
the Sample Proportion
• If the sampling distribution of p̂ is normal or approximately normal, standardize or rescale the interval of interest in terms of
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
• Find the appropriate area using Table 3.
Example: A random sample of size n = 100 is drawn from a binomial population with p = .4. Find P(p̂ > .5).
$$P(\hat{p} > .5) = P\left(z > \frac{.5 - .4}{\sqrt{.4(.6)/100}}\right) = P(z > 2.04) = 1 - .9793 = .0207$$
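The same standardize-then-look-up recipe works for p̂, with √(pq/n) as the standard error. The sketch below again uses `statistics.NormalDist` in place of Table 3. The function name `prob_phat_greater` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_phat_greater(a, p, n):
    """Normal-approximation P(p-hat > a) for a binomial sample proportion."""
    q = 1 - p
    se = sqrt(p * q / n)              # standard error of p-hat
    z = (a - p) / se                  # standardize the cut-off
    return 1 - NormalDist().cdf(z)    # upper-tail area

print(round(prob_phat_greater(0.5, p=0.4, n=100), 4))
```

This gives about .0206. The slide's .0207 uses z rounded to 2.04 before the table lookup.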
Example
The soda bottler in the previous example claims
that only 5% of the soda cans are underfilled.
A quality control technician randomly samples 200
cans of soda. What is the probability that more
than 10% of the cans are underfilled?
n = 200
S: underfilled can
p = P(S) = .05
q = .95
np = 10, nq = 190: OK to use the normal approximation.
$$P(\hat{p} > .10) = P\left(z > \frac{.10 - .05}{\sqrt{.05(.95)/200}}\right) = P(z > 3.24) = 1 - .9994 = .0006$$
This would be very unusual, if indeed p = .05!
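Below is a minimal sketch of the whole check: first confirm that the normal approximation is reasonable, then compute the tail area. It assumes the rule of thumb that np and nq should both exceed 5. Raising `ValueError` when the rule fails is a design choice of this sketch, and `upper_tail_phat` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_phat(a, p, n):
    """Return (z, P(p-hat > a)) by the normal approximation, after checking np and nq."""
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("np and nq should both exceed 5 for the normal approximation")
    z = (a - p) / sqrt(p * q / n)
    return z, 1 - NormalDist().cdf(z)

z, prob = upper_tail_phat(0.10, p=0.05, n=200)
print(f"z = {z:.2f}, P = {prob:.4f}")
```

For the soda example, np = 10 and nq = 190 pass the check. The result matches the slide: z ≈ 3.24 and P ≈ .0006.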
Statistical Process Control
The cause of a change in the variable is said to
be assignable if it can be found and corrected.
Other variation that is not controlled is regarded
as random variation.
If the variation in a process variable is solely
random, the process is said to be in control.
If out of control, we must reduce the variation
and get the measurements of the process
variable within specified limits.
The x̄ Chart for
Process Means
At various times during production, we take a
sample of size n and calculate the sample mean x̄.
According to the CLT, the sampling distribution of x̄
should be approximately normal; almost all of the
values of x̄ should fall into the interval
$$\mu \pm 3\frac{\sigma}{\sqrt{n}}$$
If a value of x̄ falls outside of this interval, the
process may be out of control.
The x̄ Chart
To create a control chart, collect data on k
samples of size n. Use the sample data to
estimate μ and σ.
The mean μ is estimated with x̿, the grand
average of all the sample statistics calculated for
the nk measurements on the process variable.
The standard deviation σ is estimated by s, the
standard deviation of the nk measurements.
Create the control chart, using a centerline and
control limits.
The x̄ Chart
Centerline: x̿
$$\text{LCL}: \bar{\bar{x}} - 3\frac{s}{\sqrt{n}} \qquad \text{UCL}: \bar{\bar{x}} + 3\frac{s}{\sqrt{n}}$$
When a sample mean falls outside the control limits, the process may be out of control.
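The chart construction above translates into a few lines of code. In the minimal sketch below, σ is estimated by s, the standard deviation of all nk measurements, as the slides describe, and every sample is assumed to have the same size n. The function names and the small data set are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import fmean, stdev

def xbar_chart_limits(samples):
    """(LCL, centerline, UCL) for an x-bar chart from k samples of equal size n."""
    n = len(samples[0])
    pooled = [x for sample in samples for x in sample]   # all nk measurements
    center = fmean(pooled)                               # grand average (x double bar)
    s = stdev(pooled)                                    # estimate of sigma
    half_width = 3 * s / sqrt(n)
    return center - half_width, center, center + half_width

def out_of_control(samples, limits):
    """Indices of samples whose mean falls outside [LCL, UCL]."""
    lcl, _, ucl = limits
    return [i for i, sample in enumerate(samples) if not lcl <= fmean(sample) <= ucl]

# Hypothetical data: k = 4 samples of n = 3 fill volumes (oz), for illustration only.
data = [[12.1, 12.0, 12.2], [12.1, 12.3, 11.9], [12.0, 12.1, 12.2], [12.2, 12.1, 12.0]]
limits = xbar_chart_limits(data)
print("LCL, centerline, UCL:", [round(v, 3) for v in limits])
print("samples out of control:", out_of_control(data, limits))
```

Later samples can be checked with `out_of_control(new_samples, limits)` against the limits set from the baseline data.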
The p Chart for
Proportion Defective
At various times during production, we take a
sample of size n and calculate the proportion of
defective items, p̂ = x/n.
According to the CLT, the sampling distribution
of p̂ should be approximately normal; almost all
of the values of p̂ should fall into the interval
$$p \pm 3\sqrt{\frac{pq}{n}}$$
If a value of p̂ falls outside of this interval, the
process may be out of control.
The p Chart
To create a control chart, collect data on k
samples of size n. Use the sample data to
estimate p.
The population proportion defective p is
estimated with
$$\bar{p} = \frac{\sum \hat{p}_i}{k},$$
the grand average of all the sample proportions
calculated for the k samples.
Create the control chart, using a centerline and
control limits.
The p Chart
Centerline: p̄
$$\text{LCL}: \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \qquad \text{UCL}: \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
When a sample proportion falls outside the control limits, the process may be out of control.
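Below is a minimal sketch of the p chart computation. It averages the k sample proportions to get p̄ and places 3-standard-error limits around it. The function name and the defect counts are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import fmean

def p_chart_limits(defectives, n):
    """(LCL, centerline, UCL) for a p chart.

    defectives: number of defective items in each of k samples of size n.
    """
    p_hats = [x / n for x in defectives]       # sample proportions x/n
    p_bar = fmean(p_hats)                      # average of the k sample proportions
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar, p_bar + half_width

# Hypothetical counts: k = 5 samples of n = 100 items, for illustration only.
counts = [18, 22, 20, 19, 21]
lcl, center, ucl = p_chart_limits(counts, n=100)
print(f"LCL = {lcl:.4f}, centerline = {center:.4f}, UCL = {ucl:.4f}")
```

Here p̄ = .20, so the limits are .20 ± 3(.04), that is, .08 and .32.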
Key Concepts
I. Sampling Plans and Experimental Designs
1. Simple random sampling
a. Each possible sample is equally likely to occur.
b. Use a computer or a table of random numbers.
c. Problems are nonresponse, undercoverage, and
wording bias.
2. Other sampling plans involving randomization
a. Stratified random sampling
b. Cluster sampling
c. Systematic 1-in-k sampling
Key Concepts
3. Nonrandom sampling
a. Convenience sampling
b. Judgment sampling
c. Quota sampling
II. Statistics and Sampling Distributions
1. Sampling distributions describe the possible values of a
statistic and how often they occur in repeated sampling.
2. Sampling distributions can be derived mathematically,
approximated empirically, or found using statistical
theorems.
3. The Central Limit Theorem states that sums and averages of
measurements from a nonnormal population with finite
mean μ and standard deviation σ have approximately
normal distributions for large samples of size n.
Key Concepts
III. Sampling Distribution of the Sample Mean
1. When samples of size n are drawn from a normal population
with mean μ and variance σ², the sample mean x̄ has a
normal distribution with mean μ and variance σ²/n.
2. When samples of size n are drawn from a nonnormal
population with mean μ and variance σ², the Central Limit
Theorem ensures that the sample mean x̄ will have an
approximately normal distribution with mean μ and variance
σ²/n when n is large (n ≥ 30).
3. Probabilities involving the sample mean x̄ can be calculated
by standardizing the value of x̄ using
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Key Concepts
IV. Sampling Distribution of the Sample
Proportion
1. When samples of size n are drawn from a
binomial population with parameter p, the
sample proportion p̂ will have an
approximately normal distribution with mean p
and variance pq/n as long as np > 5 and nq > 5.
2. Probabilities involving the sample proportion
can be calculated by standardizing the value p̂
using
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
Key Concepts
V. Statistical Process Control
1. To monitor a quantitative process, use an x̄ chart. Select k
samples of size n and calculate the overall mean x̿ and the
standard deviation s of all nk measurements. Create upper and
lower control limits as
$$\text{LCL}: \bar{\bar{x}} - 3\frac{s}{\sqrt{n}} \qquad \text{UCL}: \bar{\bar{x}} + 3\frac{s}{\sqrt{n}}$$
If a sample mean falls outside these limits, the process may be out of control.
2. To monitor a binomial process, use a p chart. Select k samples
of size n and calculate the average of the sample proportions as
$$\bar{p} = \frac{\sum \hat{p}_i}{k}$$
Create upper and lower control limits as
$$\text{LCL}: \bar{p} - 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \qquad \text{UCL}: \bar{p} + 3\sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$$
If a sample proportion falls outside these limits, the process may be out of control.
