
Sample and Sampling Distribution

Class note

Uploaded by

Emran Amin
Copyright
© © All Rights Reserved

CHAPTER TWO

SAMPLING AND SAMPLING DISTRIBUTION


2.1. Introduction

Sampling plays a vital role in statistics, because most of the time we face problems in which we
try to say something about the properties of a large group of objects, given information on a
relatively small subset of them. This is done, especially in Least Developed Countries (LDCs),
because the resources needed to undertake a census (complete enumeration) are often not
available. The major motivation for examining a sample rather than the whole population is that
collecting complete information on the latter would typically be prohibitively expensive. Even
where sufficient resources are apparently available to contact the whole population, it may be
preferable to devote these resources to just a subset of the population, in the hope that such a
concentration of effort will produce more accurate measurements. When we take a sample from a
population, however, the eventual aim is to make statements that have some validity for the
population at large. It is therefore important that the sample be representative of the whole
population. The overall purpose of this chapter is to introduce students to the concepts of
sample, sampling and sampling distribution.

2.2. Sample and Sampling Theory

2.2.1. The Concepts of Sampling

At the outset of this course, we defined what sample and population mean. To restate,
population refers to all the items of interest in a study; that is, the larger parent group is
called the population. Sample refers to a portion or subset of the population selected for study.
Sampling: - is the process of selecting a sample (a part of the items) from the population of
interest for critical investigation.
Exercise
1. Why sampling?
2. What are the differences and similarities between sampling and a census (complete enumeration)?

Mathematically, we can describe samples and populations by using measures such as the mean,
median, mode, and standard deviation. When these terms describe the characteristics of a
sample, they are called statistics. When they describe the characteristics of a population, they
are referred to as parameters.
If we are convinced that the sample statistics are accurate estimates of the population
characteristics, we can use them to estimate the population parameters without measuring every
item under study. To be consistent, statisticians use lower case Roman letters to denote sample
statistics, and Greek or capital letters to denote population parameters.
Table 2.1: Summary of definitions and characteristics of population and sample
                 Population                                   Sample
Definition       Collection of all items being dealt with     Part (subset) of the population
                 in a study
Characteristic   Population parameter                         Sample statistic
Symbol           Population size = N                          Sample size = n
                 Population mean = µ                          Sample mean = x̄
                 Population variance = σ²                     Sample variance = s²
                 Population standard deviation = σ            Sample standard deviation = s

2.2.2. Types of Sampling


In statistics, there are two methods of selecting samples from a population:
- Random or probability sampling
- Non-random, non-probability or judgment sampling
A) Probability (Random) Sampling
It is sampling in which every element in the population has a non-zero chance of being chosen for
the sample, and the probability that each element is included in the sample is known. There are
several probability sampling techniques. The following are commonly used in statistical
investigation:
1. Simple Random Sampling
2. Systematic Random Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Multi-Stage Sampling
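
As a rough illustration of the first two techniques in the list, the sketch below draws a simple random sample and a systematic sample from a hypothetical sampling frame of 100 unit IDs. The function names, the frame and the seed are assumptions made for this illustration only. The systematic version uses the common "random start, then every k-th unit" rule with k = N // n, which is one of several possible conventions.

```python
import random


def simple_random_sample(frame, n, rng):
    """Draw n distinct units; every unit has the same inclusion probability n/N."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit, with interval k = N // n."""
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]


rng = random.Random(2024)            # fixed seed so the run is repeatable
frame = list(range(1, 101))          # hypothetical sampling frame: unit IDs 1..100

print("simple random:", simple_random_sample(frame, 10, rng))
print("systematic:   ", systematic_sample(frame, 10, rng))
```

In both cases the chance that a given unit enters the sample is known in advance (here 10/100), which is what makes these probability sampling techniques.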

B) Non-probability (Non-random/Judgment) Sampling

It is a sampling methodology in which personal knowledge and opinion play a major role in
identifying which elements of the population are to be included in the sample, and the
probability that an element of the population is included in the sample is not known. The
following are the non-probability sampling techniques, which will be discussed later:
- Accidental, Haphazard or Convenience Sampling
- Purposive Sampling
- Quota Sampling
- Snowball Sampling
2.2.3. Criteria in Sampling Procedure

In this context one must remember that two costs are involved in a sampling analysis: the cost of
collecting the data and the cost of an incorrect inference (conclusion) resulting from the data.
There are two causes of incorrect inferences, namely systematic bias (non-sampling error) and
sampling error.

1. Systematic bias: it is a non-sampling error that results from errors in the sampling
procedure, and it cannot be reduced or eliminated by increasing the sample size. Put differently,
it is the difference between the population and the sample that arises from deficiencies in the
sampling approach. However, the causes responsible for these errors can be detected and
corrected. Bias enters when a sample fails to represent the population it was intended to
represent. Usually a systematic bias is the result of one or more of the following factors:
i. Inappropriate sampling frame: If the sampling frame is inappropriate, i.e., a biased
representation of the universe, it will result in a systematic bias. It can also arise from a
wrong list or set of directions for identifying the target population.
ii. Defective measuring device: In survey work, systematic bias can result if the questionnaire or
the interviewer is biased. Similarly, if the physical measuring device is defective, there will be
systematic bias in the data collected through that device.
iii. Non-respondents: If we are unable to obtain responses from all the individuals initially
included in the sample, a systematic bias may arise.
iv. Indeterminacy principle: Sometimes individuals act differently when kept under observation
than they do when unobserved. For instance, if workers are aware that somebody is observing
them in the course of a work study, on the basis of which the average length of time to
complete a task will be determined and the quota for piece work will be set accordingly, they
generally tend to work more slowly than they do when unobserved.
v. Natural bias in the reporting of data: This also leads to a systematic bias. There is usually a
downward bias in the income data collected by the government taxation department, whereas we
find an upward bias in the income data collected by some social organizations.

2. Sampling error: it is the difference between a sample and the population from which it is
selected, which arises even when a probability sample has been selected. In other words, it is
the random variation of the sample estimates around the true population parameters. Since these
errors occur randomly and are equally likely to be in either direction, they are compensatory in
nature and their expected value is zero. Sampling error decreases as the sample size increases,
and it is of smaller magnitude in a homogeneous population.
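
Both claims (errors that average out to zero, and errors that shrink as the sample grows) can be checked by brute force on a small population. The sketch below is only an illustration: the six population values are hypothetical, and the helper name `sampling_errors` is an assumption. It lists the error (sample mean minus population mean) of every possible sample drawn without replacement, using exact fractions so that the average error is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical population of six values (not taken from the text).
population = [12, 15, 18, 22, 25, 30]


def sampling_errors(values, n):
    """Error (sample mean - population mean) of every possible sample of size n."""
    exact = [Fraction(v) for v in values]      # exact arithmetic, no rounding noise
    mu = mean(exact)
    return [mean(sample) - mu for sample in combinations(exact, n)]


for n in (2, 3, 4, 5):
    errors = sampling_errors(population, n)
    avg_error = float(mean(errors))                     # expected value of the error
    rms_error = sqrt(mean(e * e for e in errors))       # typical size of the error
    print(f"n={n}: samples={len(errors)}, mean error={avg_error}, RMS error={rms_error:.3f}")
```

For every sample size the mean error is exactly zero, while the root-mean-square error becomes smaller as n increases.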

Sampling error can be measured for a given sample design and size. The measurement of
sampling error is usually called the ‘precision of the sampling plan’. Increasing the sample size
improves the precision, but it has its own limitations: a large sample increases the cost of
collecting data and can also increase the systematic bias. Thus the effective way to increase
precision is usually to select a better sampling design, one which has a smaller sampling error
for a given sample size at a given cost.

In brief, while selecting a sampling procedure, the researcher must ensure that the procedure
causes a relatively small sampling error and helps to control the systematic bias in a better way.

Sampling error = Frame error + Chance error + Response error
2.2.4. Characteristics of a Good Sample Design

A sample design must:
a) result in a truly representative sample
b) result in a small sampling error
c) be viable in the context of the funds available for the research study
d) make it possible to control systematic bias
e) be such that the results of the sample study can be applied, in general, to the universe
with a reasonable level of confidence.

2.3. Sampling Distributions

Definition: a sampling distribution is the probability distribution of all the possible values of
a sample statistic. To understand how close the sample values (statistics) are to the population
parameter, we need to understand the properties of the sampling distribution of the statistic.
We have sampling distributions of the mean, proportion and variance, which we will discuss here.

A) Sampling Distribution of the mean:

- The sample mean is an estimate of the population mean, and it can be made more accurate by
taking larger samples.
- The sampling distribution of the mean is the probability distribution of all possible sample
means of a given size, selected from a population.
The steps to construct the sampling distribution of the mean are illustrated by the following
example.
Example: - Suppose a baby sitter has 5 children under her supervision with average age of 6
years. The age of each child is given as follows.
Children X1 X2 X3 X4 X5
Age 2 4 6 8 10
These 5 children constitute our entire population, so that N = 5 and the population mean is

$$\mu = \frac{\sum_{i=1}^{5} X_i}{N} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$$

The standard deviation is given by the formula

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

$$\therefore \sigma = \sqrt{\frac{(2-6)^2 + (4-6)^2 + (6-6)^2 + (8-6)^2 + (10-6)^2}{5}} = \sqrt{\frac{40}{5}} = \sqrt{8} = 2.83$$
Now, let us assume the sample size n = 2 and take all the possible samples of size 2 from this
population. Since

$${}^{N}C_{n} = {}^{5}C_{2} = \frac{5!}{2!\,3!} = \frac{5 \times 4}{2} = 10,$$

there are 10 possible samples. These are:
Sample     Values      Sample mean
X1, X2     (2, 4)      X̄1 = 3
X1, X3     (2, 6)      X̄2 = 4
X1, X4     (2, 8)      X̄3 = 5
X1, X5     (2, 10)     X̄4 = 6
X2, X3     (4, 6)      X̄5 = 5
X2, X4     (4, 8)      X̄6 = 6
X2, X5     (4, 10)     X̄7 = 7
X3, X4     (6, 8)      X̄8 = 7
X3, X5     (6, 10)     X̄9 = 8
X4, X5     (8, 10)     X̄10 = 9
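
The list of samples above can be reproduced mechanically. The following is a minimal sketch that uses only the Python standard library. The dictionary `ages` simply restates the example population, and the variable names are chosen for this illustration.

```python
from itertools import combinations
from math import comb
from statistics import mean

ages = {"X1": 2, "X2": 4, "X3": 6, "X4": 8, "X5": 10}   # population from the example
n = 2                                                    # sample size

# Every unordered sample of n children, drawn without replacement.
samples = list(combinations(ages, n))
sample_means = [mean(ages[child] for child in sample) for sample in samples]

print("number of samples:", len(samples), "= 5C2 =", comb(len(ages), n))
for sample, x_bar in zip(samples, sample_means):
    values = tuple(ages[child] for child in sample)
    print(", ".join(sample), values, "mean =", x_bar)
```

`itertools.combinations` yields the pairs in the same order as the listing above, so the printed means can be compared with it line by line.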
If only the first sample were taken, the average would be 3; if the second were taken, the
average would be 4, and so on. Taken individually, most of these sample means differ from the
population mean, so a single sample may be unrepresentative of the population. However, if the
grand mean, $\mu_{\bar{X}}$ or $\bar{\bar{X}}$, of the distribution of these sample means is
taken, i.e.,

$$\mu_{\bar{X}} = \frac{\sum_{i=1}^{10} \bar{X}_i}{10} = \frac{3 + 4 + 5 + 6 + 5 + 6 + 7 + 7 + 8 + 9}{10} = \frac{60}{10} = 6$$
The grand mean has the same value as the population mean. Let’s organize this distribution of
sample means into a frequency distribution and a probability distribution.
Sample Mean   Frequency   Relative Frequency   Probability
3             1           1/10                 0.1
4             1           1/10                 0.1
5             2           2/10                 0.2
6             2           2/10                 0.2
7             2           2/10                 0.2
8             1           1/10                 0.1
9             1           1/10                 0.1
Total         10          10/10                1.00
The probability distribution of the sample mean is referred to as the “Sampling distribution of
the mean.”
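
Turning the ten sample means into a probability distribution is a counting exercise. The sketch below is one possible way to do it. The list `sample_means` restates the ten means of the example, and `Fraction` is used only to keep the probabilities exact.

```python
from collections import Counter
from fractions import Fraction

# Means of the 10 possible samples of size 2 from the five-children example.
sample_means = [3, 4, 5, 6, 5, 6, 7, 7, 8, 9]

frequency = Counter(sample_means)
total = len(sample_means)

# Probability of each distinct sample mean = its frequency / number of samples.
distribution = {x_bar: Fraction(count, total) for x_bar, count in sorted(frequency.items())}

print("Mean  Freq  Probability")
for x_bar, prob in distribution.items():
    print(f"{x_bar:<5} {frequency[x_bar]:<5} {float(prob):.1f}")

# Weighting each distinct mean by its probability reproduces the grand mean.
weighted_mean = sum(x_bar * prob for x_bar, prob in distribution.items())
print("sum of probabilities:", sum(distribution.values()))
print("probability-weighted mean:", weighted_mean)
```

The probabilities add up to 1 and the probability-weighted mean is 6, the same value as the grand mean and the population mean.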

Accordingly, the sampling distribution of the means of the ages of the children, as tabulated
above, has three predictable patterns.
These are: -
1- The mean of the sampling distribution and the mean of the population are equal. This
can be shown as follows:
Sample Mean (X̄)   Probability P(X̄)
3                  0.1
4                  0.1
5                  0.2
6                  0.2
7                  0.2
8                  0.1
9                  0.1
Total              1.00
The mean of the sampling distribution is given by $\mu_{\bar{X}} = \sum \bar{X} \, P(\bar{X})$.
Thus,

$$\mu_{\bar{X}} = (3)(0.1) + (4)(0.1) + (5)(0.2) + (6)(0.2) + (7)(0.2) + (8)(0.1) + (9)(0.1)$$
$$= 0.3 + 0.4 + 1.0 + 1.2 + 1.4 + 0.8 + 0.9 = 6.0$$

This value is the same as the mean of the population.
2- The spread of the sample means is smaller than the spread of the population values.
For instance, the sample means above range from 3 to 9, while the population values range
from 2 to 10.
3- The shape of the sampling distribution of the mean tends to be “bell-shaped” and
approximates the normal probability distribution as the sample size increases, even when the
population is not normally distributed.
THE CENTRAL LIMIT THEOREM

The Central Limit Theorem states that:

“Regardless of the shape of the population distribution, the distribution of the sample means
approaches the normal probability distribution as the sample size increases.”
The question is how large the sample size should be for the distribution of sample means to
approximate the normal distribution for any type of population. In practice, sample sizes of 30
or larger are considered adequate for this purpose. It should be noted, however, that the
sampling distribution is normally distributed if the original population is normally
distributed, no matter what the sample size.
The discussion of the sampling distribution is concerned with the proximity of a sample mean to
the population mean. The possible values of the sample mean cluster around the population mean
and, according to the central limit theorem, the distribution of sample means tends to be normal
for a sample size n of 30 or larger.
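
A small simulation can make the theorem tangible. The sketch below is illustrative only. It assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen for this example and not taken from the text), draws many samples of size n with a fixed seed, and reports the mean, spread and skewness of the resulting sample means. A skewness near 0 indicates a roughly symmetric, bell-like shape.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution, about 2 for an exponential one."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)


def simulate_sample_means(n, replications, rng):
    """Means of `replications` random samples of size n from an exponential population."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(replications)]


rng = random.Random(42)                      # fixed seed so the run is repeatable
for n in (1, 5, 30):
    means = simulate_sample_means(n, 5000, rng)
    print(f"n={n:>2}: mean={fmean(means):.3f}  sd={pstdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}  skewness={skewness(means):.2f}")
```

As n grows, the mean of the sample means stays near the population mean, their spread tracks sigma/sqrt(n), and the skewness moves towards zero.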
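Standard error of the mean ($\sigma_{\bar{X}}$): is a measure of dispersion of the distribution
of sample means. It is similar to the standard deviation of a frequency distribution, and it
measures the likely deviation of a sample mean from the grand mean of the sampling distribution.

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X} - \mu_{\bar{X}})^2}{N}}$$

where N here denotes the number of sample means (not the population size).
For the above example of the sampling distribution of the ages of 5 children, there are 10
sample means, so the standard error of the mean is:

$$\sigma_{\bar{X}} = \sqrt{\frac{\sum_{i=1}^{10} (\bar{X}_i - \mu_{\bar{X}})^2}{10}} = \sqrt{\frac{(3-6)^2 + (4-6)^2 + 2(5-6)^2 + 2(6-6)^2 + 2(7-6)^2 + (8-6)^2 + (9-6)^2}{10}}$$
$$= \sqrt{\frac{30}{10}} = \sqrt{3} \approx 1.73$$

Note that each of the 10 sample means is counted, so the values 5, 6 and 7, which each occur
twice, enter the sum twice.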
However, since it is usually not possible to take all possible samples from the population, we
must use an alternative method to compute $\sigma_{\bar{X}}$.
For samples drawn from a finite population,

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$

Where σ = Population standard deviation
N = Population size
n = Sample size.

Since we generally deal with very large populations, which can be considered infinite,
$\frac{N - n}{N - 1}$ would approach 1. Hence,

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

- The factor $\sqrt{\frac{N - n}{N - 1}}$ is known as the finite population correction factor and
should be used when the population size is finite.
- A population is said to be infinite when it is not possible to list or count all the elements
included in the population (i.e., when the elements are unlimited).
- When the elements in the population are limited, the population may still be treated as
infinite if the sample size is small relative to it. As a rule of thumb, statisticians treat the
population as infinite when n ≤ 5% of N, and as finite when n > 0.05 N.
- Any single sample mean is likely to be closer to the population mean as the value of
$\sigma_{\bar{X}}$ decreases.
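
The two routes to the standard error can be compared on the five-children population (ages 2, 4, 6, 8, 10 with n = 2). The sketch below is a minimal illustration. The function name `standard_error` and its optional `N` argument are assumptions made here, and the correction factor is applied only when n exceeds 5% of N, following the rule of thumb stated above.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), times the finite population correction when n > 5% of N."""
    se = sigma / sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se


ages = [2, 4, 6, 8, 10]           # population from the five-children example
n = 2

# Direct route: standard deviation of the means of all possible samples.
all_means = [mean(s) for s in combinations(ages, n)]
direct = pstdev(all_means)

# Formula route: only sigma, n and N are needed.
via_formula = standard_error(pstdev(ages), n, N=len(ages))

print(f"direct = {direct:.4f}, formula = {via_formula:.4f}")
print(f"large population, sigma = 10, n = 25: {standard_error(10, 25):.1f}")
```

Both routes give √3 ≈ 1.73 for this population. The last line shows the large-population case, where no correction is applied and the result is simply 10/√25 = 2.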
Example 1: The IQ scores of college students are normally distributed with a mean of 120 and a
standard deviation of 10.
a – What is the probability that the IQ score of any one student chosen at random is
between 120 and 125?
b – If a random sample of 25 students is taken, what is the probability that the mean of
this sample will be between 120 and 125?
Solution: -
a - Here μ = 120 and σ = 10. Using the standardized normal distribution formula,

$$Z = \frac{X - \mu}{\sigma} = \frac{125 - 120}{10} = \frac{5}{10} = 0.5$$

From the table, the area between the mean and Z = 0.5 is 0.1915.
∴ There is a 19.15 % chance that a student picked at random will have an IQ score between
120 and 125.
b - The standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = \frac{10}{5} = 2$$

Then

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{125 - 120}{2} = \frac{5}{2} = 2.5$$

The area between the mean and Z = 2.5 is 0.4938.
∴ There is a chance of 49.38 % that the sample mean will be between 120 and 125.
- As the sample size increases further, this chance will also increase (towards its upper limit
of 50 %).
- It can be noted that the probability of a sample mean being between 120 and 125 is much
higher than the probability of an individual student having an IQ between 120 and 125.

Example 2: If the heights of male students are normally distributed with a mean of 66 inches and
a standard deviation of 3.0 inches,
a – what is the probability that the height of any student picked at random will be more than
60 inches?
b – if a random sample of 25 students is taken, what is the probability that their mean height
will be between 66 and 67 inches?
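
Probabilities of this kind can also be obtained without a printed Z table. The sketch below is one possible way, using `statistics.NormalDist` from the Python standard library. The helper `prob_between` is a name introduced for this illustration. It treats the mean of n observations as normal with standard deviation σ/√n, which assumes a population large enough that no finite population correction is needed.

```python
from math import sqrt
from statistics import NormalDist


def prob_between(lo, hi, mu, sigma, n=1):
    """P(lo < mean of n observations < hi) when the population is normal(mu, sigma)."""
    dist = NormalDist(mu, sigma / sqrt(n))    # n = 1 describes a single observation
    return dist.cdf(hi) - dist.cdf(lo)


# Example 1 (IQ scores: mu = 120, sigma = 10)
print(round(prob_between(120, 125, 120, 10), 4))          # one student
print(round(prob_between(120, 125, 120, 10, n=25), 4))    # mean of 25 students

# Example 2 (heights: mu = 66, sigma = 3)
print(round(1 - NormalDist(66, 3).cdf(60), 4))             # one student taller than 60 inches
print(round(prob_between(66, 67, 66, 3, n=25), 4))         # mean of 25 between 66 and 67
```

For Example 2 this corresponds to Z = (60 − 66)/3 = −2 in part a and Z = (67 − 66)/0.6 ≈ 1.67 in part b, giving probabilities of roughly 0.977 and 0.452.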
In summary, a sampling distribution is the distribution of a sample statistic. It is a model of a
distribution of scores, like the population distribution, except that the scores are not raw
scores but statistics.
Suppose we draw all possible samples of size n from a population of size N, and suppose
further that we compute the mean of each sample. In this way, we create a sampling
distribution of the mean.
We know the following. The mean of the population (μ) is equal to the mean of the sampling
distribution ($\mu_{\bar{X}}$). And the standard error of the sampling distribution
($\sigma_{\bar{X}}$) is determined by the standard deviation of the population (σ), the
population size, and the sample size. These relationships are shown in the equations below:

$$\mu_{\bar{X}} = \mu$$
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$

Therefore, we can specify the sampling distribution of the mean whenever two conditions are
met:
- The population is normally distributed, or the sample size is sufficiently large (30 or larger).
- The population standard deviation σ is known.
Exercise: Determine the probability distribution of the mean for samples of size n = 2 drawn
from a population that has N = 5 elements (1, 3, 5, 5, 9).
A) How many possible samples can you draw from a population of 5 elements, taking 2
elements at a time (sampling without replacement)?
B) Find the sampling distribution of the mean.
C) Find the standard error of the mean.
B) Sampling Distribution of the Proportion
In a population of size N, suppose that the probability of the occurrence of an event (dubbed a
"success") is P, and the probability of the event's non-occurrence (dubbed a "failure") is
Q = 1 − P. From this population, suppose that we draw all possible samples of size n. And
finally, within each sample, suppose that we determine the proportion of successes $\bar{p}$ and
failures $\bar{q}$. In this way, we create a sampling distribution of the proportion.
We find that the mean of the sampling distribution of the proportion ($\mu_{\bar{p}}$) is equal
to the probability of success in the population (P). And the standard error of the sampling
distribution ($\sigma_{\bar{p}}$) is determined by the standard deviation of the population
($\sigma = \sqrt{PQ}$), the population size, and the sample size. These relationships are shown
in the equations below:

$$\mu_{\bar{p}} = P$$
$$\sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$
Example: Consider a population of N = 5 numbers: 3, 6, 9, 12, and 15. Let an even number count
as a success; the population proportion of even numbers is P = 2/5 = 0.4. Consider all samples
of size 3 (n = 3) drawn from the population. The samples and their sample proportions are given
in the table below.
Samples       Sample Proportion (p̄)
3, 6, 9       1/3
3, 6, 12      2/3
3, 6, 15      1/3
3, 9, 12      1/3
3, 9, 15      0/3
3, 12, 15     1/3
6, 9, 12      2/3
6, 9, 15      1/3
6, 12, 15     2/3
9, 12, 15     1/3
Given the above table, we can construct the probability distribution of the sample proportions,
as in the following table.
Table: - Probability distribution of the sample proportion (p̄)
Sample proportion (p̄)    0/3    1/3    2/3
Probability P(p̄)         0.1    0.6    0.3
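
The two tables above can be rebuilt by enumerating the samples. The following sketch does this with exact fractions. The helper name `proportion_even` is an assumption made for this illustration, and "success" means an even number, as in the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]     # population from the example
n = 3                              # sample size


def proportion_even(values):
    """Share of even numbers (the 'successes') among the given values."""
    return Fraction(sum(1 for x in values if x % 2 == 0), len(values))


samples = list(combinations(population, n))
counts = Counter(proportion_even(s) for s in samples)

# Probability of each sample proportion = samples giving it / number of samples.
distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}
for p, prob in distribution.items():
    print(f"p_bar = {p}: probability = {float(prob):.1f}")

expected = sum(p * prob for p, prob in distribution.items())
print("E(p_bar) =", expected, "| population proportion P =", proportion_even(population))
```

The enumeration gives probabilities 0.1, 0.6 and 0.3 for the proportions 0, 1/3 and 2/3, and the probability-weighted average of the sample proportions equals the population proportion 2/5.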

The sampling distribution of the proportion is the probability distribution of all possible
values of the sample proportion (p̄). It is necessary to understand its properties: the mean
value of p̄, the standard deviation of p̄, and the shape or form of the sampling distribution
of p̄.
Properties of the sampling distribution of the proportion (p̄).
1. The expected value of the sample proportion, E(p̄), is equal to the population proportion, P.

Symbolically: $E(\bar{p}) = P$
Where E(p̄) is the expected value of the random variable p̄ and P is the population proportion.
From the above example,

$$E(\bar{p}) = \Pr(\bar{p}_1)\,\bar{p}_1 + \Pr(\bar{p}_2)\,\bar{p}_2 + \Pr(\bar{p}_3)\,\bar{p}_3 = (0.1)(0) + (0.6)\left(\tfrac{1}{3}\right) + (0.3)\left(\tfrac{2}{3}\right)$$
$$= 0 + 0.2 + 0.2 = 0.4$$

Thus, $E(\bar{p}) = P$.
2. Just as with the standard deviation of the sample means, the standard deviation of the
sample proportion ($\sigma_{\bar{p}}$) also depends on whether the population is finite or
infinite:

$$\sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}} \quad \text{and} \quad \sigma_{\bar{p}} = \sqrt{\frac{PQ}{n}} \cdot \sqrt{\frac{N - n}{N - 1}}$$

when the population is infinite and finite respectively.
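
For the small population of the example (3, 6, 9, 12, 15 with n = 3 and even numbers as successes), the two formulas can be compared with a direct enumeration of all samples. The sketch below is illustrative only. The function name `se_proportion` and its optional `N` argument are assumptions made here, with `N` supplied only when the population is treated as finite.

```python
from itertools import combinations
from math import sqrt
from statistics import pstdev


def se_proportion(P, n, N=None):
    """sqrt(PQ/n), multiplied by sqrt((N - n)/(N - 1)) when a finite population size N is given."""
    Q = 1 - P
    se = sqrt(P * Q / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se


population = [3, 6, 9, 12, 15]     # population from the example; even numbers are successes
n = 3
P = sum(x % 2 == 0 for x in population) / len(population)

# Direct route: standard deviation of the proportions of all possible samples.
proportions = [sum(x % 2 == 0 for x in s) / n for s in combinations(population, n)]

print(f"infinite-population formula: {se_proportion(P, n):.4f}")
print(f"finite-population formula:   {se_proportion(P, n, N=len(population)):.4f}")
print(f"direct enumeration:          {pstdev(proportions):.4f}")
```

Here the sample takes 3 of only 5 elements, so the finite-population formula (0.2) agrees with the enumeration, while the infinite-population formula (about 0.28) overstates the spread.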
