Sampling Distribution
Introduction
The value of a statistic varies from one sample to another even if the samples are selected by the
same procedure from the same population. Thus the statistic is a random variable. The
distribution or probability distribution of a statistic is called a sampling distribution.
For example, the distribution of the sample mean is a sampling distribution of the mean and the
distribution of the sample proportion is a sampling distribution of the proportion. The standard
deviation of the sampling distribution of a statistic is called the standard error of the statistic.
The sampling distribution of the mean is a probability distribution and has the following major
characteristics:
1. The mean of all the sample means will be exactly equal to the population mean.
2. If the population from which the samples are drawn is normal, the distribution of sample
means is also normally distributed.
3. If the population from which the samples are drawn is not normal, the sampling
distribution is approximately normal, provided the samples are “sufficiently” large
(usually accepted to include at least 30 observations).
The central limit theorem states that, for large random samples, the shape of the sampling
distribution of the sample means is close to a normal probability distribution. The approximation
is more accurate for large samples than for small samples. One can make logical and reasonable
statements about the distribution of the sample means with little or no information about the
shape of the original distribution from which the sample was taken.
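The behaviour described by the central limit theorem can be checked by simulation. The sketch below is illustrative only: the exponential population, the sample size of 30, the 5000 repetitions, and the function name `simulate_sample_means` are all assumptions chosen for the demonstration, not part of the original notes.

```python
import random
import statistics

def simulate_sample_means(sample_size, num_samples, seed=0):
    """Draw repeated samples from a skewed Exp(1) population; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

# Exp(1) is strongly right-skewed, with population mean 1 and standard deviation 1.
means = simulate_sample_means(sample_size=30, num_samples=5000)
center = statistics.fmean(means)   # expected to be near the population mean, 1
spread = statistics.pstdev(means)  # expected to be near 1 / sqrt(30), about 0.18
print(f"mean of sample means = {center:.3f}, sd of sample means = {spread:.3f}")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.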
Statistics
The central limit theorem does not address the dispersion of the sampling distribution of
sample means, nor does it address how the mean of that distribution compares with the mean of
the population. It can be shown that the mean of the sampling distribution is the population
mean, and that if the standard deviation in the population is σ, the standard deviation of
the sample means is $\sigma/\sqrt{n}$, where n is the number of observations in each sample.
This quantity is referred to as the standard error of the mean. It is the standard deviation of
the sampling distribution of the sample mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Where:
$\sigma_{\bar{X}}$ is the standard error of the mean,
σ is the population standard deviation, and
n is the sample size.

The standard error is a measure of the variability of the sampling distribution of the means.
Its size is affected by the population standard deviation: as the standard deviation increases,
so does the standard error. The standard error is also affected by the sample size: as the
sample size increases, the standard error decreases, which indicates that there is less
variability in the distribution of the sample means.
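The effect of sample size on the standard error can be seen numerically. This is a minimal sketch: the function name `standard_error` and the population standard deviation of 10 are hypothetical, chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation of 10, used only for illustration.
for n in (4, 16, 64, 256):
    print(n, standard_error(10.0, n))
```

Each fourfold increase in the sample size halves the standard error, because n enters the formula through its square root.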
1. The mean of the distribution of sample means will be exactly equal to the population
mean if we are able to select all possible samples of a particular size from a given
population. That is, $\mu_{\bar{X}} = \mu$. Even if one does not select all samples, one can
expect the mean of the distribution of the sample mean to be close to the population mean.
2. There will be less dispersion in the sampling distribution of the sample mean than in the
population. If the standard deviation of the population is σ, the standard deviation of the
distribution of sample means is $\sigma/\sqrt{n}$. Note that when one increases the size of
the sample, the standard error of the mean decreases.
The majority of decisions are made on the basis of sampling. Generally one has a population and
wishes to know something about that population, such as the mean. One takes a sample from that
population and wishes to conclude whether the sampling error, that is, the difference between
the population parameter and the sample statistic, is due to chance.
One can compute the probability that a sample mean will fall within a certain range. The
sampling distribution of the sample mean will follow the normal probability distribution under
two conditions:
1. When the samples are taken from populations known to follow the normal distribution. In
this case the size of the sample is not a factor.
2. When the shape of the population distribution is not known or the shape is known to be
non-normal, but the sample contains at least 30 observations.
The z-value is used to convert any normal distribution to the standard normal distribution. One
can use the standard normal table to find the probability of selecting a value of an observation
that falls within a specified range. The formula is:

$$z = \frac{X - \mu}{\sigma}$$

In this formula X is the value of the random variable, μ is the population mean, and σ is the
population standard deviation.
Since most decisions are based on a sample, one is interested in the distribution of the sample
mean $\bar{X}$, not the value of X, the value of one observation. Change X to $\bar{X}$. Then
change the standard deviation σ to the standard error of the mean $\sigma/\sqrt{n}$:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
Example 1:
Suppose that a population consists of N = 5 numbers: 1, 2, 3, 4, 5. The mean (μ) and the standard
deviation (σ) of this population are given by

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3,$$

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2} = \sqrt{10/5} = \sqrt{2} = 1.4142.$$
Suppose that all possible samples of size n = 2 are drawn with replacement, and that the sample
mean $\bar{X}$ is computed for each sample. There are $N^n = 5^2 = 25$ samples of size two which
can be drawn with replacement. These samples are
(1, 1) (2, 1) (3, 1) (4, 1) (5, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5)
The corresponding means of these twenty-five samples are
1.0 1.5 2.0 2.5 3.0
1.5 2.0 2.5 3.0 3.5
2.0 2.5 3.0 3.5 4.0
2.5 3.0 3.5 4.0 4.5
3.0 3.5 4.0 4.5 5.0
The statistic $\bar{X}$ clearly assumes values ranging from 1.0 to 5.0. The sampling distribution
of $\bar{X}$ is given in the table that follows. This is called the sampling distribution of
$\bar{X}$ with replacement.
Mean and Standard Deviation of the Sampling Distribution of Means
The sampling distribution of $\bar{X}$ has a mean and a standard deviation. The mean and the
standard deviation of the sampling distribution of $\bar{X}$, denoted respectively by
$\mu_{\bar{X}}$ and $\sigma_{\bar{X}}$, are given by

$$\mu_{\bar{X}} = \sum \bar{x}\, f(\bar{x}), \qquad \sigma_{\bar{X}} = \sqrt{\sum (\bar{x} - \mu_{\bar{X}})^2 f(\bar{x})}.$$

These measures for the sampling distribution of $\bar{X}$ are calculated in the following table.
$\bar{x}$    $f(\bar{x})$    $\bar{x} f(\bar{x})$    $\bar{x}^2 f(\bar{x})$    $\bar{x} - \mu_{\bar{X}}$    $(\bar{x} - \mu_{\bar{X}})^2$    $(\bar{x} - \mu_{\bar{X}})^2 f(\bar{x})$
1.0    1/25    1/25     1/25       -2.0    4.0     4/25
1.5    2/25    3/25     4.5/25     -1.5    2.25    4.5/25
2.0    3/25    6/25     12/25      -1.0    1.0     3/25
2.5    4/25    10/25    25/25      -0.5    0.25    1/25
3.0    5/25    15/25    45/25       0      0       0/25
3.5    4/25    14/25    49/25       0.5    0.25    1/25
4.0    3/25    12/25    48/25       1.0    1.0     3/25
4.5    2/25    9/25     40.5/25     1.5    2.25    4.5/25
5.0    1/25    5/25     25/25       2.0    4.0     4/25

Totals: $\sum f(\bar{x}) = 1$, $\sum \bar{x} f(\bar{x}) = 75/25 = 3$, $\sum \bar{x}^2 f(\bar{x}) = 250/25 = 10$, $\sum (\bar{x} - \mu_{\bar{X}})^2 f(\bar{x}) = 25/25 = 1$.
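$$\mu_{\bar{X}} = \sum \bar{x} f(\bar{x}) = 3.$$

$$\sigma_{\bar{X}} = \sqrt{\sum (\bar{x} - \mu_{\bar{X}})^2 f(\bar{x})} = \sqrt{1} = 1.$$

Equivalently,

$$\sigma_{\bar{X}}^2 = \sum \bar{x}^2 f(\bar{x}) - \mu_{\bar{X}}^2 = 10 - (3)^2 = 1.$$

The standard deviation of the sampling distribution of $\bar{X}$, denoted by $\sigma_{\bar{X}}$,
is equal to the population standard deviation divided by the square root of the sample size n,
that is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{1.4142}{\sqrt{2}} = 1.$$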
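The with-replacement case can be reproduced by brute-force enumeration. The following sketch uses exact fractions and lists all 25 ordered samples; the variable names are illustrative and not taken from the original notes.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5]
n = 2

# All N**n ordered samples drawn with replacement, each equally likely.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Mean and variance of the sampling distribution of the sample mean.
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)

# Population variance, for comparison with sigma^2 / n.
mu = Fraction(sum(population), len(population))
var_pop = sum((x - mu) ** 2 for x in population) / len(population)

print(len(samples), mu_xbar, var_xbar, var_pop / n)  # prints: 25 3 1 1
```

The enumeration gives a mean of 3 and a variance of 1 for the sample means, which agrees with the population variance of 2 divided by the sample size of 2.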
Suppose all possible samples of size two are drawn from the population without replacement and
the sample mean $\bar{X}$ is computed for each sample. There are $^5C_2 = 10$ samples of size two
which can be drawn without replacement from the population.
These samples are
(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5).
The corresponding sample means are
1.5, 2, 2.5, 3, 2.5, 3, 3.5, 3.5, 4, 4.5.
The statistic $\bar{X}$ now assumes values ranging from 1.5 to 4.5.
Computations for the mean and the standard deviation of the sampling distribution of $\bar{X}$
are shown in the following table.

$\bar{x}$    $f(\bar{x})$    $\bar{x} f(\bar{x})$    $\bar{x}^2 f(\bar{x})$
1.5    1/10    1.5/10    2.25/10
2.0    1/10    2.0/10    4/10
2.5    2/10    5.0/10    12.5/10
3.0    2/10    6.0/10    18/10
3.5    2/10    7.0/10    24.5/10
4.0    1/10    4.0/10    16/10
4.5    1/10    4.5/10    20.25/10

Totals: $\sum \bar{x} f(\bar{x}) = 30/10 = 3$, $\sum \bar{x}^2 f(\bar{x}) = 97.5/10 = 9.75$.

$$\mu_{\bar{X}} = \sum \bar{x} f(\bar{x}) = 3.$$

$$\sigma_{\bar{X}} = \sqrt{\sum \bar{x}^2 f(\bar{x}) - \mu_{\bar{X}}^2} = \sqrt{9.75 - (3)^2} = \sqrt{0.75} = 0.866.$$
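The mean of the sampling distribution of $\bar{X}$ equals the population mean, that is,
$\mu_{\bar{X}} = \mu$.
The standard deviation of the sampling distribution of $\bar{X}$ is given by

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{1.4142}{\sqrt{2}} \sqrt{\frac{5 - 2}{5 - 1}} = 0.866.$$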
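The without-replacement result can be checked the same way. This sketch enumerates the 10 unordered samples and compares the variance of their means with the finite-population formula; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]
N, n = len(population), 2

# All C(N, n) samples drawn without replacement, each equally likely.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)

# Formula: (sigma^2 / n) * (N - n) / (N - 1).
mu = Fraction(sum(population), N)
var_pop = sum((x - mu) ** 2 for x in population) / N
var_formula = var_pop / n * Fraction(N - n, N - 1)

print(len(samples), mu_xbar, var_xbar, var_formula)  # prints: 10 3 3/4 3/4
print(round(float(var_xbar) ** 0.5, 3))              # prints: 0.866
```

Both routes give a variance of 3/4, whose square root is the 0.866 obtained above.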
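From all combinations of these samples from the two populations, a distribution of
differences of means, $\bar{X}_1 - \bar{X}_2$, is obtained, which is called the sampling
distribution of differences of the means. The mean and standard deviation of this sampling
distribution, denoted by $\mu_{\bar{X}_1 - \bar{X}_2}$ and $\sigma_{\bar{X}_1 - \bar{X}_2}$, are
given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2 \qquad (1)$$

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

and

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \qquad (2)$$

The variances add because the two samples are drawn independently. The results (1) and (2) also
hold for finite populations if sampling is with replacement. They are also valid for finite
populations when sampling is without replacement, provided the population sizes $N_1$ and $N_2$
are large relative to the sample sizes $n_1$ and $n_2$ respectively. However, if the populations
are small and sampling is without replacement, then $\sigma_{\bar{X}_1}$ and $\sigma_{\bar{X}_2}$
must each be computed with the finite population correction factor $\sqrt{(N - n)/(N - 1)}$. In
this case, the mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ is still
$\mu_1 - \mu_2$ and its standard deviation is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{\sigma_2^2}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}} \qquad (5)$$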
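The two versions of the standard deviation of a difference of means can be wrapped in one small helper. This is a sketch under stated assumptions: the function name `se_difference`, its optional `N1` and `N2` parameters, and the numbers in the usage lines are hypothetical and serve only to illustrate the formulas.

```python
import math

def se_difference(sigma1, n1, sigma2, n2, N1=None, N2=None):
    """Standard deviation of X1bar - X2bar for independent samples.

    If a finite population size N1 or N2 is supplied, the matching variance
    term is multiplied by the correction (N - n) / (N - 1).
    """
    var1 = sigma1 ** 2 / n1
    var2 = sigma2 ** 2 / n2
    if N1 is not None:
        var1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        var2 *= (N2 - n2) / (N2 - 1)
    return math.sqrt(var1 + var2)

# Hypothetical inputs: sigma1 = 4, n1 = 16, sigma2 = 3, n2 = 9.
plain = se_difference(4, 16, 3, 9)                     # sqrt(1 + 1)
corrected = se_difference(4, 16, 3, 9, N1=100, N2=50)  # smaller, due to the correction
print(plain, corrected)
```

Because each correction factor is below 1 whenever the sample size exceeds 1, the corrected value is always smaller than the uncorrected one.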
Example 2
Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population 3, 4, 5.
Similarly draw all possible random samples of size $n_2 = 2$ with replacement from another finite
population 1, 2, 3. (a) Find the sample means $\bar{X}_1$ and $\bar{X}_2$ and the possible
differences between the sample means of the two populations. (b) Form the sampling distribution
of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance. (c) Verify that
(i) $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and
(ii) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
Solution
In both populations N1 = N2 = 3 and n1 = n2 = 2. There are (3)(3) = 9 possible samples which can be
drawn with replacement from each population.
The samples for the first population are (3, 3), (3, 4), (3, 5), (4, 3),
(4, 4), (4, 5), (5, 3), (5, 4), (5, 5).
The corresponding sample means are 3, 3.5, 4, 3.5, 4, 4.5, 4, 4.5, 5.
The samples for the second population are (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3).
The corresponding sample means are 1, 1.5, 2, 1.5, 2, 2.5, 2, 2.5, 3.
The eighty-one possible differences ($\bar{X}_1 - \bar{X}_2$) are shown in the following table,
with the values of $\bar{X}_1$ across the top and the values of $\bar{X}_2$ down the left side.

$\bar{X}_2$ \ $\bar{X}_1$    3.0    3.5    3.5    4.0    4.0    4.0    4.5    4.5    5.0
1.0                          2.0    2.5    2.5    3.0    3.0    3.0    3.5    3.5    4.0
1.5                          1.5    2.0    2.0    2.5    2.5    2.5    3.0    3.0    3.5
1.5                          1.5    2.0    2.0    2.5    2.5    2.5    3.0    3.0    3.5
2.0                          1.0    1.5    1.5    2.0    2.0    2.0    2.5    2.5    3.0
2.0                          1.0    1.5    1.5    2.0    2.0    2.0    2.5    2.5    3.0
2.0                          1.0    1.5    1.5    2.0    2.0    2.0    2.5    2.5    3.0
2.5                          0.5    1.0    1.0    1.5    1.5    1.5    2.0    2.0    2.5
2.5                          0.5    1.0    1.0    1.5    1.5    1.5    2.0    2.0    2.5
3.0                          0      0.5    0.5    1.0    1.0    1.0    1.5    1.5    2.0
(b) The differences ($\bar{X}_1 - \bar{X}_2$) range from 0 to 4. The sampling distribution of
$\bar{X}_1 - \bar{X}_2$ is given in the following table. The last two columns have been added for
calculation of the mean and variance of the sampling distribution of ($\bar{X}_1 - \bar{X}_2$).
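$\bar{x}_1 - \bar{x}_2$    f     $f(\bar{x}_1 - \bar{x}_2)$    $(\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2)$    $(\bar{x}_1 - \bar{x}_2)^2 f(\bar{x}_1 - \bar{x}_2)$
0      1     1/81     0        0
0.5    4     4/81     2/81     1/81
1.0    10    10/81    10/81    10/81
1.5    16    16/81    24/81    36/81
2.0    19    19/81    38/81    76/81
2.5    16    16/81    40/81    100/81
3.0    10    10/81    30/81    90/81
3.5    4     4/81     14/81    49/81
4.0    1     1/81     4/81     16/81

Totals: $\sum f = 81$, $\sum f(\bar{x}_1 - \bar{x}_2) = 1$, $\sum (\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2) = 162/81 = 2$, $\sum (\bar{x}_1 - \bar{x}_2)^2 f(\bar{x}_1 - \bar{x}_2) = 378/81$.

$$\mu_{\bar{X}_1 - \bar{X}_2} = \sum (\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2) = 162/81 = 2$$

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sum (\bar{x}_1 - \bar{x}_2)^2 f(\bar{x}_1 - \bar{x}_2) - \mu^2_{\bar{X}_1 - \bar{X}_2} = 378/81 - (2)^2 = 4.6667 - 4 = 0.6667.$$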
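(c) The mean and variance of the first population are
$\mu_1 = (3 + 4 + 5)/3 = 4$ and $\sigma_1^2 = [(3 - 4)^2 + (4 - 4)^2 + (5 - 4)^2]/3 = 2/3$.
The mean and variance of the second population are
$\mu_2 = (1 + 2 + 3)/3 = 2$ and $\sigma_2^2 = [(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2]/3 = 2/3$.
(i) $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 4 - 2 = 2$.
(ii) $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{2/3}{2} + \frac{2/3}{2} = \frac{2}{3} = 0.6667$.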
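The whole of Example 2 can be verified by enumerating the 81 differences directly. The sketch below uses exact fractions; the helper names `sample_means` and `pop_variance` are illustrative and not part of the original notes.

```python
from fractions import Fraction
from itertools import product

pop1, pop2, n = [3, 4, 5], [1, 2, 3], 2

def sample_means(population, size):
    """Means of all ordered samples of the given size, drawn with replacement."""
    return [Fraction(sum(s), size) for s in product(population, repeat=size)]

def pop_variance(population):
    """Population variance (divisor N, not N - 1)."""
    mu = Fraction(sum(population), len(population))
    return sum((x - mu) ** 2 for x in population) / len(population)

# Every pairing of a sample mean from population 1 with one from population 2.
diffs = [m1 - m2 for m1 in sample_means(pop1, n) for m2 in sample_means(pop2, n)]
mean_diff = sum(diffs) / len(diffs)
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / len(diffs)
var_formula = pop_variance(pop1) / n + pop_variance(pop2) / n

print(len(diffs), mean_diff, var_diff, var_formula)  # prints: 81 2 2/3 2/3
```

The enumerated mean of 2 and variance of 2/3 (0.6667) match the values from the table and from the formulas in part (c).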
Example 3:
The strength of a certain type of wire is normally distributed with a mean of 99.8 and a standard
deviation of 5.48. (a) What are the mean and standard deviation of the sampling distribution of
the mean based on a random sample of size 100? (b) A single random sample of 16 values is
drawn from this population. What is the probability that the mean of this sample will be between
98.8 and 100.9?
Solution
(a) Here μ = 99.8, σ = 5.48 and n = 100. The mean $\mu_{\bar{X}}$ and standard deviation
$\sigma_{\bar{X}}$ of the sampling distribution of $\bar{X}$ are given by
$\mu_{\bar{X}} = \mu = 99.8$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 5.48/\sqrt{100} = 0.548$.
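Part (b) of Example 3 uses a sample of 16 rather than 100, so the standard error must be recomputed before any probability is found. The following is a minimal sketch of one way to carry out that calculation, assuming Python 3.8 or later for `statistics.NormalDist`; it is not the worked solution from the original notes.

```python
import math
from statistics import NormalDist

mu, sigma, n = 99.8, 5.48, 16

# Standard error for a sample of 16: 5.48 / 4 = 1.37.
se = sigma / math.sqrt(n)

# The population is normal, so the sample mean is normal with mean mu and sd se.
xbar_dist = NormalDist(mu, se)
prob = xbar_dist.cdf(100.9) - xbar_dist.cdf(98.8)

print(f"standard error = {se:.2f}, P(98.8 <= mean <= 100.9) = {prob:.4f}")
```

Because the code uses unrounded z-values, its result may differ in the third decimal place from a value read off a printed standard normal table.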
Example 4
A firm employs 1500 people. During a given year the mean amount contributed to a charity
drive per employee was Rs. 25.75. The standard deviation was Rs. 5.25. What is the probability
that a random sample of 100 employees yields a mean between Rs. 25.00 and Rs. 27.00?
Solution
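Here μ = Rs. 25.75, σ = Rs. 5.25, N = 1500, n = 100 so that

$$\mu_{\bar{X}} = \mu = \text{Rs. } 25.75 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{5.25}{\sqrt{100}} \sqrt{\frac{1500 - 100}{1500 - 1}} = \text{Rs. } 0.51$$

For $\bar{X} = 25.00$, $z = (\bar{X} - \mu_{\bar{X}})/\sigma_{\bar{X}} = (25.00 - 25.75)/0.51 = -1.47$
For $\bar{X} = 27.00$, $z = (27.00 - 25.75)/0.51 = 2.45$
$P(25.00 \le \bar{X} \le 27.00) = P(-1.47 \le z \le 2.45)$
$= P(-1.47 \le z \le 0) + P(0 \le z \le 2.45)$
$= 0.4292 + 0.4929 = 0.9221$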
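The calculation in Example 4 can be reproduced without a printed table. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; the function name `prob_mean_between` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, low, high, N=None):
    """P(low <= sample mean <= high) under a normal sampling distribution.

    If the population size N is given, the standard error includes the
    finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    dist = NormalDist(mu, se)
    return se, dist.cdf(high) - dist.cdf(low)

se, prob = prob_mean_between(25.75, 5.25, 100, 25.00, 27.00, N=1500)
print(f"standard error = {se:.4f}, probability = {prob:.4f}")
```

The result is close to, but not identical with, the 0.9221 obtained above, because the worked solution rounds the standard error to 0.51 and the z-values to two decimals before using the table.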
Example 5
Scores on a test for students of Group 1 are normally distributed with a mean and variance of 60
and 100 respectively. Scores for students of Group 2 are normally distributed with a mean of 50
and a variance of 121. A random sample of 10 students is selected from Group 1. An
independent random sample of size 11 is selected from Group 2. What is the probability that the
difference between sample means $\bar{X}_1 - \bar{X}_2$ is between 8 and 14?
Solution
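Here $\mu_1 - \mu_2 = 60 - 50 = 10$, $\sigma_1^2 = 100$, $\sigma_2^2 = 121$, $n_1 = 10$, $n_2 = 11$.

For $\bar{X}_1 - \bar{X}_2 = 8$, $z_1 = \dfrac{8 - 10}{\sqrt{\dfrac{100}{10} + \dfrac{121}{11}}} = -0.44$

For $\bar{X}_1 - \bar{X}_2 = 14$, $z_2 = \dfrac{14 - 10}{\sqrt{\dfrac{100}{10} + \dfrac{121}{11}}} = 0.87$

The required probability is given by
$P[8 \le (\bar{X}_1 - \bar{X}_2) \le 14] = P(-0.44 \le z \le 0.87) = P(-0.44 \le z \le 0) + P(0 \le z \le 0.87)$
$= 0.1700 + 0.3078 = 0.4778$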
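The probability asked for in Example 5 can also be computed directly from the sampling distribution of the difference. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu1, var1, n1 = 60, 100, 10   # Group 1: mean, variance, sample size
mu2, var2, n2 = 50, 121, 11   # Group 2: mean, variance, sample size

# Standard deviation of X1bar - X2bar: sqrt(100/10 + 121/11) = sqrt(21).
sd_diff = math.sqrt(var1 / n1 + var2 / n2)

# Both populations are normal, so the difference of sample means is normal.
diff_dist = NormalDist(mu1 - mu2, sd_diff)
prob = diff_dist.cdf(14) - diff_dist.cdf(8)

print(f"sd of difference = {sd_diff:.3f}, P(8 <= difference <= 14) = {prob:.4f}")
```

The value agrees with a table-based calculation to about two decimal places; small differences come from rounding the z-values to -0.44 and 0.87.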
Example 6
A market analyst studying the length of time shoppers spend in two types of grocery store
observes a sample of 75 shoppers in each store. The mean time the sample of shoppers spends in
Store A is 55 minutes. The mean time the sample of shoppers spends in Store B is 49 minutes.
What is the probability of observing a sample difference $\bar{X}_A - \bar{X}_B$ at least as large as and in the
same direction as this if there is no difference in the true mean time that shoppers spend in the
two stores and if the standard deviation is 15 minutes for both populations? What assumptions
are made regarding the samples?
Solution
Here $\bar{X}_A = 55$ minutes, $\bar{X}_B = 49$ minutes, $n_A = n_B = 75$,