Sample and Sampling Distribution
In statistics, sampling plays a vital role, because most of the time we are occupied with a class of
problems that involve an attempt to say something about the properties of a large group of
objects, given information on a relatively small subset of them. This is done, especially in Least
Developed Countries (LDCs), because the resources needed to undertake a census (complete
enumeration) are often unavailable. So the major motivation for examining a sample rather than the whole
population is that the collection of complete information on the latter would typically be
prohibitively expensive. Even in circumstances where sufficient resources are apparently
available to contact the whole population, it may be preferable to devote these resources to just a
subset of the population in the hope that such a concentration of effort will produce more
accurate measurements. But if we take a sample from a population, the eventual aim is to make
statements that have some validity for the population at large. Therefore, it is important that the
sample be representative of the whole population. Generally, the overall purpose of this chapter
is to introduce and equip students with the concepts of sample, sampling and sampling
distribution.
At the outset of this course, we defined what sample and population mean. To restate,
population refers to all items that have been chosen for study; that is, the larger parent group is
called the population, and sample refers to a portion or subset of the population selected.
Sampling is the process of selecting a sample (part of the items) from the population of
interest for critical investigation.
Exercise
1. Why sampling?
2. What are the differences and similarities between sampling and census (complete enumeration)?
Mathematically, we can describe samples and population by using measures such as the mean,
median, mode, and standard deviation. When these terms describe the characteristics of a
sample, they are called statistics. When they describe the characteristics of a population, they are
referred to as parameters.
If we are convinced that the sample statistics are accurate estimates of the population
characteristics, we can use sample statistics to estimate the population parameters without
measuring the entirety of the items under study. In order to be consistent, statisticians use lower
case Roman letters to denote sample statistics, and Greek or capital letters to denote population
parameters.
Table 2.1: Summary of definitions and characteristics of population and sample

                  Population                          Sample
Definition        Collection of all items being       Part (subset) of the
                  dealt with in a study               population
Characteristic    Population parameter                Sample statistic
Symbol            Population size = N                 Sample size = n
                  Population mean = µ                 Sample mean = x̄
                  Population variance = σ²            Sample variance = S²
                  Population standard deviation = σ   Sample standard deviation = S
B) Non-probability (Non-random/Judgment) Sampling
Non-probability sampling is a sampling methodology in which personal knowledge and opinion play a major role in
identifying which elements of the population are to be included in the sample, and the
probability that a given element of the population will be included in the sample is not known. As
with probability sampling, there are several non-probability sampling techniques for
selecting a sample, which will be discussed later:
Accidental, Haphazard or Convenience Sampling
Purposive Sampling
Quota Sampling
Snowball Sampling
2.2.3. Criteria in Sampling Procedure
In this context one must remember that two costs are involved in a sampling analysis: the cost of
collecting the data and the cost of an incorrect inference (conclusion) resulting from the data.
There are two causes of incorrect inferences, namely non-sampling error (systematic bias) and
sampling error.
1. Systematic bias: this is a non-sampling error that results from errors in the sampling
procedure, and it cannot be reduced or eliminated by increasing the sample size. In other words, it is the
difference between the population and the sample that arises from deficiencies in the
sampling approach. However, the causes responsible for these errors can be detected and
corrected. Bias enters when a sample fails to represent the population it was intended to
represent. Usually a systematic bias is the result of one or more of the following factors:
i. Inappropriate sampling frame: If the sampling frame is inappropriate, i.e., a biased
representation of the universe, it will result in a systematic bias. It can also arise from a
wrong list or set of directions for identifying the target population.
ii. Defective measuring device: In survey work, systematic bias can result if the questionnaire or
the interviewer is biased. Similarly, if the physical measuring device is defective there will be
systematic bias in the data collected through such a measuring device.
iii. Non-respondents: If we are unable to sample all the individuals initially included in the
sample, there may arise a systematic bias.
iv. Indeterminacy principle: Sometimes we find that individuals act differently when kept under
observation than they do when unobserved. For instance, if workers
are aware that somebody is observing them in the course of a work study on the basis of which
the average length of time to complete a task will be determined, and accordingly the quota
will be set for piece work, they generally tend to work slowly in comparison to the speed at
which they work when unobserved.
v. Natural bias in the reporting of data: This also leads to a systematic bias. There is usually a
downward bias in the income data collected by a government taxation department, whereas we
find an upward bias in the income data collected by some social organizations.
2. Sampling error: this is the difference between a sample and the population from which it is
selected, even though a probability sample has been selected. In other words, it indicates the
random variations in the sample estimates around the true population parameters. Since these variations
occur randomly and are equally likely to be in either direction, they tend to be
compensatory, and the expected value of such errors is equal to zero. Sampling
error decreases as the size of the sample increases, and it is of a smaller
magnitude in the case of a homogeneous population.
Sampling error can be measured for a given sample design and size. The measurement of
sampling error is usually called the ‘precision of the sampling plan’. If we increase the sample
size, the precision can be improved. But increasing the size of the sample has its own limitations.
A large sample increases the cost of collecting data and can also increase the systematic bias.
Thus the effective way to increase precision is usually to select a better sampling design which
has a smaller sampling error for a given sample size at a given cost.
In brief, while selecting a sampling procedure, a researcher must ensure that the procedure causes
a relatively small sampling error and helps to control the systematic bias in a better way.
Sampling error = Frame error + Chance error + Response error
2.2.4. Characteristics of a Good Sample Design
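The population mean is

$$\mu = \frac{\sum_{i=1}^{5} X_i}{N} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$

and the population standard deviation is

$$\sigma = \sqrt{\frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.83$$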
Now, let us assume the sample size n = 2 and take all the possible samples of size 2 from this
population.
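The enumeration described here can be sketched in Python. This is only an illustrative sketch: it assumes the population is the five values 2, 4, 6, 8 and 10 used in the calculation of the population mean, and that samples are drawn without replacement with order ignored. The variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]  # assumed population values from the example
n = 2                          # sample size

# Every unordered sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Mean of each sample, kept exact with Fraction.
sample_means = [Fraction(sum(s), n) for s in samples]

# Probability of each distinct sample mean = its relative frequency.
counts = Counter(sample_means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for mean, prob in distribution.items():
    print(f"sample mean = {float(mean):.0f}, probability = {float(prob):.1f}")

# Mean of the sampling distribution: sum of (sample mean x probability).
mean_of_means = sum(m * p for m, p in distribution.items())
print("mean of the sample means:", float(mean_of_means))
```

Under these assumptions there are 10 possible samples, and the mean of the sample means equals the population mean of 6.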
Accordingly, the sampling distribution of the means of the ages of children as tabulated above
has three predictable patterns.
These are:
1- The mean of the sampling distribution and the mean of the population are equal. This
can be shown as follows:
Sample mean ($\bar{X}$)    Probability $P(\bar{X})$
3                          0.1
4                          0.1
5                          0.2
6                          0.2
7                          0.2
8                          0.1
9                          0.1
Total                      1.00
The mean of the sampling distribution is given by $\mu_{\bar{X}} = \sum \bar{X} \cdot P(\bar{X})$.
Thus, $\mu_{\bar{X}} = \sum \bar{X} \cdot P(\bar{X})$ = (3 × 0.1) + (4 × 0.1) + (5 × 0.2) + (6 × 0.2) + (7 × 0.2) + (8 × 0.1)
+ (9 × 0.1)
= 0.3 + 0.4 + 1.0 + 1.2 + 1.4 + 0.8 + 0.9
= 6.0
This value is the same as the mean of the population.
2- The spread of the sample means in the distribution is smaller than in the population values.
For instance, the spread in the distribution of sample means above is from 3 to 9, while the
spread in the population was from 2 to 10.
3- The shape of the sampling distribution of the means tends to be “Bell-shaped” and
approximates the normal probability distribution, even when the population is not normally
distributed.
THE CENTRAL LIMIT THEOREM
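The standard deviation of the sampling distribution of the mean (the standard error), computed over the seven distinct sample means, is

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{7} (\bar{X}_i - \mu_{\bar{X}})^2}{7}} = \sqrt{\frac{(3-6)^2 + (4-6)^2 + \dots + (9-6)^2}{7}} = \sqrt{\frac{28}{7}} = \sqrt{4} = 2$$

This calculation gives the seven distinct values equal weight. Weighting each value by its probability $P(\bar{X})$ gives $\sqrt{\sum (\bar{X} - \mu_{\bar{X}})^2 P(\bar{X})} = \sqrt{3.0} \approx 1.73$, which is the exact standard error for samples of size 2 drawn without replacement from this population.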
However, since it is not possible to take all possible samples from the population, we must use
an alternative method to compute $\sigma_{\bar{X}}$.
If the standard deviation is given for a finite population,

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}$$

where σ = population standard deviation,
N = population size,
n = sample size.
Since we generally deal with very large populations, which can be considered infinite, $\sqrt{(N-n)/(N-1)}$
would approach 1. Hence,

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

- The factor $\sqrt{(N-n)/(N-1)}$ is known as the finite population correction factor and should be used when the population
size is finite.
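As a rough illustration of the two formulas, the following Python sketch computes the standard error of the mean with and without the finite population correction. The function name `standard_error_mean` and its parameters are hypothetical, chosen only for this example.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size; if given, the finite population
           correction sqrt((N - n) / (N - 1)) is applied
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma = math.sqrt(8)  # population standard deviation of 2, 4, 6, 8, 10

# Treating the population as infinite: sigma / sqrt(n)
print(round(standard_error_mean(sigma, n=2), 2))

# Applying the correction for N = 5, n = 2
print(round(standard_error_mean(sigma, n=2, N=5), 2))
```

For the five-element population used earlier, n is 40% of N, so the correction factor is far from 1: the uncorrected value is 2, while the corrected value is about 1.73.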
A population is said to be infinite when it is not possible to list or count all the elements
included in the population (i.e., when the elements are unlimited).
Even when the elements in the population are limited, the population may be
considered infinite when the sample size is small relative to it. As a rule of thumb, statisticians
consider the population infinite when n ≤ 5% of N, and finite
when n > 0.05 N.
- Any single sample mean will tend to be closer to the population mean as the value of $\sigma_{\bar{X}}$
decreases.
Example 1: The IQ scores of college students are normally distributed with a mean of 120 and a
standard deviation of 10.
a - What is the probability that the IQ score of any one student chosen at random is
between 120 and 125?
b - If a random sample of 25 students is taken, what is the probability that the mean of
this sample will be between 120 and 125?
Solution:
a - Here σ = 10, μ = 120 and X = 125.
Using the standardized normal distribution formula,

$$Z = \frac{X - \mu}{\sigma} = \frac{125 - 120}{10} = \frac{5}{10} = 0.5$$

From the table, the area between the mean and Z = 0.5 is 0.1915.
There is a 19.15% chance that a student picked at random will have an IQ score between
120 and 125.
b - The standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = \frac{10}{5} = 2$$

Then

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{125 - 120}{2} = \frac{5}{2} = 2.5$$

The area between the mean and Z = 2.5 is 0.4938.
There is a 49.38% chance that the sample mean will be between 120 and 125.
- As the sample size increases further, this chance will also increase.
- It can be noted that the probability of a sample mean being between 120 and 125 is much
higher than the probability of an individual student having an IQ between 120 and 125.
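The table look-ups in Example 1 can be checked numerically. The sketch below is one way to do so, using the error function from Python's standard library to evaluate the normal cumulative distribution; the helper name `normal_cdf` is an assumption for this illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma, n = 120, 10, 25

# a) One student: P(120 < X < 125), using the population standard deviation.
p_individual = normal_cdf(125, mu, sigma) - normal_cdf(120, mu, sigma)

# b) Mean of 25 students: same interval, using the standard error sigma / sqrt(n).
se = sigma / math.sqrt(n)
p_sample_mean = normal_cdf(125, mu, se) - normal_cdf(120, mu, se)

print(round(p_individual, 4))
print(round(p_sample_mean, 4))
```

The two results should agree with the table values 0.1915 and 0.4938 to four decimal places.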
Example 2: If the heights of male students are normally distributed with mean 66 inches and
standard deviation 3.0 inches,
a - What is the probability that the height of any student picked at random will be more than
60 inches?
b - If a random sample of 25 students is taken, what is the probability that their mean height
would be between 66 and 67 inches?
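No worked solution is given here for Example 2, so the following Python sketch is offered only as one possible way to check an answer. It follows the same steps as Example 1 and assumes the standard error σ/√n applies (a large population); the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma, n = 66.0, 3.0, 25

# a) P(X > 60) for a single student: Z = (60 - 66) / 3 = -2.
p_taller_than_60 = 1.0 - normal_cdf(60, mu, sigma)

# b) P(66 < sample mean < 67): standard error = 3 / sqrt(25) = 0.6.
se = sigma / math.sqrt(n)
p_mean_66_to_67 = normal_cdf(67, mu, se) - normal_cdf(66, mu, se)

print(round(p_taller_than_60, 4))
print(round(p_mean_66_to_67, 4))
```

A printed normal table, which rounds Z to two decimals, may give a slightly different figure for part b than the computed value.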
Therefore, a sampling distribution is the distribution of a sample statistic. It is a model of a
distribution of scores, like the population distribution, except that the scores are not raw scores,
but statistics.
In addition, suppose we draw all possible samples of size n from a population of size N. Suppose
further that we compute a mean score for each sample. In this way, we create a sampling
distribution of the mean.
We know the following. The mean of the population (μ) is equal to the mean of the sampling
distribution ($\mu_{\bar{X}}$). And the standard error of the sampling distribution ($\sigma_{\bar{X}}$) is determined by the
standard deviation of the population (σ), the population size (N), and the sample size (n). These
relationships are shown in the equations below:

$$\mu_{\bar{X}} = \mu \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}$$
Therefore, we can specify the sampling distribution of the mean whenever two conditions are
met:
Exercise: Determine the probability distribution of the mean for samples of size n = 2 drawn from
a population that has N = 5 elements (1, 3, 5, 5, 9).
A) How many possible samples can you draw from a population of 5 elements, taking 2
elements at a time (sampling without replacement)?
B) Find the sampling distribution of the mean.
C) Find the standard error of the mean.
B) Sampling Distribution of the Proportion
In a population of size N, suppose that the probability of the occurrence of an event (dubbed a
"success") is P, and the probability of the event's non-occurrence (dubbed a "failure") is Q = 1 - P. From
this population, suppose that we draw all possible samples of size n. And finally, within each
sample, suppose that we determine the proportion of successes p and failures q. In this way, we
create a sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion ($\mu_p$) is equal to the
probability of success in the population (P). And the standard error of the sampling distribution
($\sigma_p$) is determined by the standard deviation of the population ($\sigma = \sqrt{PQ}$), the population size, and the
sample size. These relationships are shown in the equations below:

$$\mu_p = P \qquad \sigma_p = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} = \sqrt{\frac{PQ}{n}} \cdot \sqrt{\frac{N-n}{N-1}}$$
Example: Consider a population of N = 5 numbers: 3, 6, 9, 12, and 15. Taking the even
numbers as successes, the population proportion of even numbers is P = 2/5 = 0.4. Consider all samples of size 3 (n = 3) that
can be drawn from the population. The samples and their sample proportions are given in the table below.
Samples       Sample proportion (p)
3, 6, 9       1/3
3, 6, 12      2/3
3, 6, 15      1/3
3, 9, 12      1/3
3, 9, 15      0/3
3, 12, 15     1/3
6, 9, 12      2/3
6, 9, 15      1/3
6, 12, 15     2/3
9, 12, 15     1/3
Given the above table, we can construct the probability distribution of the sample proportions,
as in the following table.
Table: Probability distribution of the sample proportion (p)
Sample proportion (p)    0/3    1/3    2/3
Probability P(p)         0.1    0.6    0.3
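The two tables above can be reproduced by enumerating every sample. The Python sketch below is illustrative only; it assumes sampling without replacement with order ignored, counts an even number as a "success", and uses hypothetical variable names.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]
n = 3  # sample size

# Every unordered sample of size n, drawn without replacement.
samples = list(combinations(population, n))

# Proportion of even numbers ("successes") in each sample.
proportions = [Fraction(sum(1 for x in s if x % 2 == 0), n) for s in samples]

# Probability of each distinct proportion = its relative frequency.
counts = Counter(proportions)
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}

for p, prob in distribution.items():
    print(f"sample proportion = {p}, probability = {float(prob):.1f}")

# Expected value of the sample proportion: sum of (p x probability).
expected_p = sum(p * prob for p, prob in distribution.items())
print("expected sample proportion:", float(expected_p))
```

Under these assumptions the expected sample proportion comes out to 0.4, the population proportion of even numbers.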
The sampling distribution of the proportion is the probability distribution of all possible values of the
sample proportion (p). It requires an understanding of the properties of the sampling distribution
of the proportion: the mean value of p, the standard deviation of p, and the shape or form of
the sampling distribution of p.
Properties of the sampling distribution of the proportion (p):
1. The expected value of the sample proportion, E(p), is equal to the population proportion, P.
Symbolically: E(p) = P
where E(p) is the expected value of the random variable p and
P is the population proportion. From the above example,
E(p) = (0/3)(0.1) + (1/3)(0.6) + (2/3)(0.3) = 0.4 = P.
2. Just as with the standard deviation of the sample means, the standard deviation of
the sample proportion ($\sigma_p$) also depends on whether the population is finite or infinite:

$$\sigma_p = \sqrt{\frac{PQ}{n}} \quad \text{(infinite population)} \qquad \sigma_p = \sqrt{\frac{PQ}{n}} \cdot \sqrt{\frac{N-n}{N-1}} \quad \text{(finite population)}$$
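A small Python sketch of these two formulas follows. It is only an illustration; the function name `standard_error_proportion` and its parameters are hypothetical, and Q is taken as 1 - P.

```python
import math

def standard_error_proportion(P, n, N=None):
    """Standard error of the sample proportion.

    P: population proportion of successes (Q = 1 - P)
    n: sample size
    N: population size; if given, the finite population
       correction sqrt((N - n) / (N - 1)) is applied
    """
    Q = 1.0 - P
    se = math.sqrt(P * Q / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# Even-number example: P = 0.4, n = 3, N = 5.
print(round(standard_error_proportion(0.4, n=3), 4))        # population treated as infinite
print(round(standard_error_proportion(0.4, n=3, N=5), 4))   # finite population
```

For the even-number example, the finite-population value is 0.2, which matches the standard deviation obtained directly from the probability distribution of the sample proportion (0/3, 1/3, 2/3 with probabilities 0.1, 0.6, 0.3).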