Chapter 1 - 071218
Define sampling.
Reason out why sampling is needed.
Define sampling and non-sampling errors.
Identify types of samples: random and non-random samples.
Understand the sampling distributions of the mean and the proportion.
Introduction
Sampling is as common and important in statistics as salt is in food. At home, a cook tastes a teaspoonful to judge the quality of what is being cooked. In the medical sciences, a few drops of blood are taken and tested microscopically or chemically to find out whether the blood contains any abnormalities.
Nowadays, sampling methods are extensively used in socio-economic surveys to learn about the living conditions, cost of living index, etc. of a class of people. In biological studies, experiments are conducted on some units (persons, animals or plants) and inferences are drawn about the breed or variety to which the units belong. In industry, sampling procedures are predominantly used for quality control.
Sampling theory is the study of the relationship between a population and the samples drawn from it. It is concerned with estimating the properties of the population from those of the samples and also with gauging the precision of these estimates. Sampling theory is applicable only to random samples. This movement from the particular (sample) towards the general (population) is known as statistical induction or statistical inference. In simple words, we use the sample to draw inferences about the population.
For example, measures such as the mean that we work out from a sample are called statistics. When such measures describe a characteristic of the population, they are called parameters.
Less accuracy: Conclusions derived from a sample are more liable to error than those derived from a census. Therefore, the sampling technique is less accurate than the census technique.
Need for specialized knowledge: Sample results can be reliable only if a competent and able researcher makes the selection. If it is done by an average researcher, the selection is liable to error, and this results in false generalizations about the characteristics of the population.
Representativeness: The selected sample must truly represent the whole population. It
should not lack a quality found in the whole population so that the results can be
generalized.
Adequacy: The sample should be large enough to allow conclusions that are applicable to the whole population.
Independence: The elementary units selected for the sample should be independent of one another, and all units of the population should have the same chance of being selected in the sample.
Homogeneity: The elements included in the sample must be similar to the other elements.
1.1.5. Sampling and Non-Sampling Errors
A sampling study is subject to sampling and non-sampling errors, which may be random and/or constant in nature.
A. Sampling error is the difference between the sample estimate and the actual value of the population parameter. This error arises by chance alone: even when the sample is properly selected, there will be some difference between the sample statistics and the actual population parameters. The mean of the sample may differ from the population mean by chance alone, and the standard deviation of the sample may also differ from the population standard deviation. Therefore, we can expect some difference between a sample statistic and the corresponding population parameter; this difference is known as sampling error. To illustrate, take a very simple example. Suppose a student has scored the following grades in 10 subjects (consider these subjects as the population): 55, 60, 65, 90, 55, 75, 88, 45, 85, 82. Say a sample of four grades, 55, 65, 82 and 90, is selected at random from this population to estimate the student's average grade. The mean of this sample is (55 + 65 + 82 + 90)/4 = 73, whereas the population mean is 700/10 = 70. The sampling error is therefore 73 - 70 = 3.
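The arithmetic of this example can be checked with a short Python sketch that uses only the standard library; the variable names are illustrative and not part of the text.

```python
from statistics import mean

# Population from the text: one student's grades in 10 subjects
population = [55, 60, 65, 90, 55, 75, 88, 45, 85, 82]
# The sample of four grades quoted in the text
sample = [55, 65, 82, 90]

mu = mean(population)    # population parameter (population mean)
x_bar = mean(sample)     # sample statistic (sample mean)
sampling_error = x_bar - mu

print(f"population mean = {mu}, sample mean = {x_bar}, sampling error = {sampling_error}")
```

A different random sample of four grades would generally give a different sample mean, and hence a different sampling error, even though nothing was done wrongly.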
B. Non-sampling error (systematic error), also called sampling bias, occurs because of human mistakes rather than chance variation, and it cannot be reduced or eliminated by increasing the sample size. Factors that can contribute to such error include an inappropriate sampling frame, accessibility bias, defective measuring devices, non-response bias and defects in data collection.
Accessibility bias: In many research studies, statisticians tend to select the respondents who are most accessible to them. When not all members of the population are equally accessible, the statistician must provide some control mechanism to ensure that no group of respondents is over- or under-represented.
1.1.6. Types of Samples
Types of sampling
In probability sampling, the sample units are not selected at the discretion of the researcher but by random selection procedures, and this randomization helps us make unbiased inferences about the population of interest. In probability sampling, each unit of the population has some known probability of entering the sample. The four most commonly used types of probability sampling are discussed below:
1. Simple Random Sampling
In simple random sampling, each element in the population has an equal chance of being included in the sample. The sample is drawn by a random procedure from a sampling frame; drawing names from a hat is a typical simple random sampling technique. The process is simple because it requires only one stage of sample selection. A simple random sample is selected as follows. Each element in the sampling frame is assigned a number. Each number is then written on a separate piece of paper, the papers are properly mixed, and one is selected. If, say, the sample size is 45, the selection procedure is repeated 45 times. When the population consists of a large number of elements, a table of random digits or computer-generated random numbers is used instead.
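Computer-generated selection can be sketched in Python with the standard `random` module. `random.Random.sample` draws without replacement, which matches repeating the draw from the hat without returning the slips. The frame of 600 numbered elements, the seed and the function name `simple_random_sample` are assumptions for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from a sampling frame so that every element
    has the same chance of selection (sampling without replacement)."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical numbered sampling frame of 600 elements
frame = list(range(1, 601))
chosen = simple_random_sample(frame, 45, seed=2018)
print(sorted(chosen))
```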
2. Systematic Sampling
The mechanics of taking a systematic sample are simple. If the population contains N ordered elements and a sample of size n is desired, we find the ratio N/n to obtain the sampling interval. For example, if the population size is N = 600 and the desired sample size is n = 60, the sampling interval is 600/60 = 10. A random starting point within the first interval is chosen, and every 10th element after it is selected; e.g., if the researcher starts from the fourth element, the 4th, 14th, 24th, etc. elements will be selected.
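The selection rule can be sketched as follows, assuming (as in the example) that N is a multiple of n so that the interval k = N/n is a whole number. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based positions chosen by systematic sampling:
    a random start in 1..k, then every k-th element, where k = N // n."""
    k = N // n  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start within the first interval
    if not 1 <= start <= k:
        raise ValueError("start must lie in the first interval 1..k")
    return [start + i * k for i in range(n)]

positions = systematic_sample(600, 60, start=4)
print(positions[:3], positions[-1])  # [4, 14, 24] 594
```

When N is not a multiple of n, using k = N // n still keeps every selected position within 1..N, but texts differ on how to treat the leftover elements.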
3. Stratified Sampling
For example, suppose you want to collect information on the income and expenditure of the male population in, say, Jijiga Town.
First, you would split the whole male population of the town into various strata on the basis of, say, level of income, such as:
Higher income level
Having classified the population into these groups, you would then select elements from each stratum using a random sampling technique.
How to form strata? Strata are formed on the basis of the common characteristics of the items (elements) to be put in each stratum. The strata are formed in such a way that elements are more homogeneous within each stratum. Thus, strata are formed purposively, usually on the basis of the researcher's past experience and personal judgment.
How should items (elements) be selected from each stratum? The usual method of selecting items for the sample from each stratum is simple random sampling.
How many items should be selected from each stratum (sample size)? Stratified samples may be either proportional or non-proportional.
In proportional stratified sampling, the number of elements drawn from each stratum is proportional to the size of that stratum relative to the population.
For example, if you plan to draw a sample of 100 students from the College of Business and Economics to assess factors affecting their academic performance at Jijiga University, you should first divide the target population into sub-populations or strata (different departments such as Management, Accounting, Economics and Public Administration). Once you have classified the population into different strata, you can draw a sample from each stratum proportionally using
$$n_i = \frac{N_i}{N} \times n$$
where n_i = the number of sample units from stratum i, N = the total number of units in the population, N_i = the total number of units in stratum i, and n = the desired sample size. Thus, the elements to be drawn from each stratum would be 25, 27, 24 and 24 respectively. Proportional stratification yields a sample that represents the population with respect to the proportion of each stratum in the population. Proportional stratified sampling yields satisfactory results if the dispersion in the various strata is of proportionately the same magnitude. If there is a significant difference in dispersion from stratum to stratum, sample estimates will be much more efficient if non-proportional stratified random sampling is used. Here, equal numbers of elements are selected from each stratum regardless of how the stratum is represented in the population. Thus, in the earlier example, an equal number, i.e., 25, of students would be drawn from each department to constitute the sample.
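Both allocation rules can be sketched in Python. The text does not give the department sizes, so the stratum sizes below are HYPOTHETICAL (1,000 students in total), chosen only so that the formula reproduces the allocation of 25, 27, 24 and 24 quoted above. Because n_i = (N_i/N) × n is rarely a whole number in practice, the sketch rounds with a largest-remainder rule; that is one common choice, not something the text prescribes.

```python
import math

def proportional_allocation(stratum_sizes, n):
    """n_i = (N_i / N) * n, with fractional parts settled by the
    largest-remainder rule so the allocations add up to exactly n."""
    N = sum(stratum_sizes.values())
    exact = {name: size * n / N for name, size in stratum_sizes.items()}
    alloc = {name: math.floor(value) for name, value in exact.items()}
    shortfall = n - sum(alloc.values())
    # hand the leftover units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:shortfall]:
        alloc[name] += 1
    return alloc

def equal_allocation(stratum_names, n):
    """Non-proportional (equal) allocation: the same number from every stratum."""
    base, extra = divmod(n, len(stratum_names))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(stratum_names)}

# HYPOTHETICAL department sizes (not given in the text)
sizes = {"Management": 250, "Accounting": 270, "Economics": 240, "Public Administration": 240}
print(proportional_allocation(sizes, 100))
print(equal_allocation(list(sizes), 100))
```

Within each stratum, the allocated number of elements would then be drawn by simple random sampling.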
4. Cluster sampling
No confidence can be placed in the data obtained from non-probability samples; they do not represent the larger population. Therefore, the results obtained may not be generalized to the entire population.
Non-probability sampling depends exclusively on uncontrolled factors and the researcher's insight, and there is no statistical method to determine the margin of sampling error.
The advantage of non-probability sampling is that it is less expensive, and a researcher may take advantage of the available respondents without the statistical complexity of probability sampling. Some of the non-probability sampling methods are discussed below:
1. Quota sampling
Under this sampling approach, the interviewers are simply given quotas to be fulfilled from the different strata (groups). E.g., an interviewer in a particular city may be assigned 100 interviews, divided among different subgroups (say, 50 male respondents and 50 female respondents). Within the pre-assigned quotas, the selection of the sample elements depends on the interviewer's personal judgment.
For example, suppose one is studying consumer preferences for ice cream among children and college students, and it is decided to interview 250 individuals from each category. If the city has five colleges, one may decide to fix a quota of 50 students to be interviewed from each college. It is entirely up to the interviewer who will constitute this sub-sample of 50 students in a college: they may be the first 50 students who visit the ice cream parlour, or the 50 students who visit the parlour between 4 p.m. and 6 p.m., etc.
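The "first students to arrive" rule from the ice cream example can be sketched as a quota-filling loop. The arrival list and the function name `fill_quotas` are hypothetical. Nothing in the loop is random, which is why a quota sample is a non-probability sample.

```python
def fill_quotas(arrivals, quotas):
    """Accept respondents in the order they turn up until every group's quota
    is met; anyone from a group whose quota is already full is skipped."""
    remaining = dict(quotas)
    selected = {group: [] for group in quotas}
    for respondent, group in arrivals:
        if remaining.get(group, 0) > 0:
            selected[group].append(respondent)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas fulfilled
    return selected

# HYPOTHETICAL stream of people arriving, as (id, group) pairs
arrivals = [(1, "male"), (2, "male"), (3, "male"), (4, "female"), (5, "male"), (6, "female")]
result = fill_quotas(arrivals, {"male": 2, "female": 2})
print(result)  # {'male': [1, 2], 'female': [4, 6]}
```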
In judgment sampling, the investigator has complete freedom to select the sample according to his or her personal judgment, because he or she is well acquainted with the population. Knowing the population very well, the investigator can decide which members (elementary units) would, in his or her judgment, constitute a proper cross-section representing the parameters relevant to the study. The intent of this method is to select elements that are believed to be typical or representative of the population. The key underlying assumption of this type of sampling is that, with sound expert judgment and an appropriate strategy, one can carefully and consciously choose the elements to be included in the sample. This method of sampling is generally used in studies involving the performance of personnel.
For example, if one is studying the performance of sales staff in a marketing organization, the people are classified into top-grade, medium-grade and low-grade performers. Having specified the qualities that are important in the study, the expert (possibly the Vice-President of Sales) indicates the people who, to his or her knowledge, would be representative of each of the three categories.
Snowball Sampling
It is also known as multiplicity sampling. The term snowball comes from the analogy of a snowball, which begins small but becomes bigger and bigger as it rolls downhill. Under this technique, the initial respondents are selected randomly, and additional respondents are then obtained from referrals or other information provided by the initial respondents. E.g., say a researcher wants to study the impact of the September 11 terrorist attack on the social life and lifestyle of the survivors, and uses the telephone to obtain referrals. Random telephone calls are made, and the respondents (those answering the calls) are asked whether they know someone else who meets the study's respondent qualification, such as having survived the September 11 terrorist attack in New York.
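The referral process can be sketched as a breadth-first walk over a referral network that stops once the desired sample size is reached. The network, the seed respondents (which the text says are chosen at random) and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Grow a sample from initial respondents by following their referrals."""
    sample, seen = [], set()
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        if person in seen:
            continue  # already interviewed via another referral
        seen.add(person)
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:
                queue.append(referred)
    return sample

# HYPOTHETICAL referral network: respondent -> people they refer
referrals = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"], "E": ["G"]}
result = snowball_sample(["A", "B"], referrals, target_size=5)
print(result)  # ['A', 'B', 'C', 'D', 'E']
```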
3. Convenience Sampling
In this scheme, elements are selected only because they are easy, inexpensive and convenient to sample. For example, selecting a sample from readily available sources or lists, such as a telephone directory or a register of small-scale industrial units, gives a convenience sample. No planned effort is made to collect information: the researcher comes across certain people, communicates with them, and then tries to generalize about the whole population. This sampling technique is not scientific. In general, the results
1.2.1. Definition
A sampling distribution shows how sample results are distributed. In other words, it is the probability distribution of a sample statistic based on randomly selected samples (e.g., the distribution of sample means is a sampling distribution). Sampling distributions play a crucial role in statistical work because they enable us to use data from random samples to make predictions or judgments about the population.
The sampling distribution of the mean refers to the probability distribution of all possible sample means of random samples of a given size (n) taken from a specified population. If samples are taken from a normal population N(μ, σ), the sampling distribution of the mean is also normal, with mean $\mu_{\bar{x}} = \mu$ and standard deviation (standard error) $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, where μ is the mean of the population, σ is the standard deviation of the population and n is the number of items in a sample. When sampling is from a population that is not normal (it may be positively or negatively skewed), the central limit theorem still ensures that the sampling distribution of the mean tends closely towards the normal distribution, provided the number of sample items is large (as a rule of thumb, n > 30). To reduce the sampling distribution of the mean to the standard normal distribution N(0, 1), we use the normal variate

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

The sample mean is an unbiased estimator of the population mean because the mean of all possible sample means, $\mu_{\bar{x}}$ (for a given sample size n), is equal to the population mean μ.
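For a small population, both claims can be checked exactly by listing every possible sample. The sketch below reuses the ten grades from the sampling-error example and assumes sampling with replacement, so each of the 10² ordered samples of size n = 2 is equally likely; exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance

# The ten grades from the sampling-error example, used as a small population
population = [Fraction(x) for x in [55, 60, 65, 90, 55, 75, 88, 45, 85, 82]]
n = 2  # sample size

# Every ordered sample of size n drawn WITH replacement, each equally likely,
# so this list is the exact sampling distribution of the sample mean
sample_means = [sum(s) / n for s in product(population, repeat=n)]

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

print("mean of sample means:", mean(sample_means), "| population mean:", mu)
print("variance of sample means:", pvariance(sample_means), "| sigma^2 / n:", sigma2 / n)
```

Under sampling without replacement, the mean of the sample means is still μ, but the variance is multiplied by the finite population correction (N - n)/(N - 1).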
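Example 1- A company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 300 milliliters. In fact, the contents vary according to a normal distribution with a mean (μ) of 298 ml and a standard deviation (σ) of 3 ml.
a) What is the probability that the mean content of the bottles in a six-pack is less than 295 ml?
b) What is the probability that an individual bottle contains less than 295 ml?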
a) Solution
Given
μ = 298 ml, σ = 3 ml, n = 6
Required
P(x̄ < 295) = ?

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{295 - 298}{3/\sqrt{6}} = \frac{-3}{1.225} = -2.45$$

The area between z = 0 and z = -2.45 obtained from the normal distribution table is 0.4929. Since the desired area is in the left tail, subtract 0.4929 from 0.5000. Thus, P(x̄ < 295) = 0.5000 - 0.4929 = 0.0071.
Interpretation
The probability that the mean content of the bottles in a six-pack is less than 295 ml is 0.0071, or 0.71%.
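b) Solution
Given
μ = 298 ml, σ = 3 ml, n = 1 (a single bottle)
Required
P(x < 295) = ?

$$Z = \frac{x - \mu}{\sigma/\sqrt{n}} = \frac{295 - 298}{3/\sqrt{1}} = -1$$

The area between z = 0 and z = -1 obtained from the normal distribution table is 0.3413. Since the desired area is in the left tail, P(x < 295) = 0.5000 - 0.3413 = 0.1587.
Interpretation
The probability that an individual bottle contains less than 295 ml is 0.1587, or 15.87%.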
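Both parts can be recomputed with Python's standard-library `statistics.NormalDist` in place of the printed table. The helper name `prob_mean_below` is hypothetical. Because the code does not round z to two decimals, its answers can differ from the table values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n):
    """P(sample mean < x) for samples of size n from a normal population N(mu, sigma)."""
    standard_error = sigma / sqrt(n)   # sigma_xbar = sigma / sqrt(n)
    z = (x - mu) / standard_error      # standardize to N(0, 1)
    return z, NormalDist().cdf(z)      # left-tail area

z_six, p_six = prob_mean_below(295, 298, 3, n=6)   # part (a): mean of a six-pack
z_one, p_one = prob_mean_below(295, 298, 3, n=1)   # part (b): a single bottle
print(f"(a) z = {z_six:.2f}, P = {p_six:.4f}")
print(f"(b) z = {z_one:.2f}, P = {p_one:.4f}")
```

The comparison shows why averaging helps: the same 3 ml shortfall is one standard error for a single bottle but about 2.45 standard errors for the mean of six.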
Example 2- At the Harar Brewery factory, beer bottles are not always filled to capacity. The brewery advertises that its bottles contain on average 300 milliliters of beer, with a standard deviation of 50 milliliters. A random sample of 36 bottles is taken from the production line.
A. Find the probability that the mean fill is between 280 milliliters and 315 milliliters.
B. Find the probability that the mean fill is 315 milliliters or more.
a) Solution
Given
μ = 300 ml, σ = 50 ml, n = 36
Required
P(280 < x̄ < 315) = ?

$$z_1 = \frac{\bar{x}_1 - \mu}{\sigma/\sqrt{n}} = \frac{280 - 300}{50/\sqrt{36}} = \frac{-20}{8.33} = -2.40$$

$$z_2 = \frac{\bar{x}_2 - \mu}{\sigma/\sqrt{n}} = \frac{315 - 300}{50/\sqrt{36}} = \frac{15}{8.33} = 1.80$$

P(-2.4 < Z < 1.8) = P(-2.4 < Z < 0) + P(0 < Z < 1.8)
The area between z = 0 and z = -2.40 is 0.4918. The area obtained from the normal distribution table between z = 0 and z = 1.80 is 0.4641. Adding the two areas gives 0.4918 + 0.4641 = 0.9559, i.e., 95.59%.
Hence, the probability that the mean fill of a random sample of 36 bottles from the production line is between 280 and 315 milliliters is 0.9559, or 95.59%.
b) Solution
Required
P(x̄ > 315) = ?
Find the z value:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{315 - 300}{50/\sqrt{36}} = \frac{15}{8.33} = 1.80$$

P(Z > 1.8) = ?
The area between z = 0 and z = 1.80 obtained from the normal distribution table is 0.4641. Since the desired area is in the right tail, subtract 0.4641 from 0.5000: 0.5000 - 0.4641 = 0.0359.
Hence, the probability that the mean fill of a random sample of 36 bottles is 315 milliliters or more is 0.0359, or 3.59%.
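The same calculation can be sketched with `statistics.NormalDist` by modelling the sampling distribution of the mean directly as a normal distribution with mean μ and standard error σ/√n, so no manual standardization is needed.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 300, 50, 36
# Sampling distribution of the mean: normal with mean mu and standard error sigma/sqrt(n)
xbar_dist = NormalDist(mu, sigma / sqrt(n))

p_between = xbar_dist.cdf(315) - xbar_dist.cdf(280)   # part A
p_at_least_315 = 1 - xbar_dist.cdf(315)                # part B
print(f"A: {p_between:.4f}  B: {p_at_least_315:.4f}")
```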
Exercises
1. The amount of soda in each bottle is normally distributed with a population mean of 32.2 ounces and a population standard deviation of 0.3 ounces.
Required
a. Find the probability that a bottle bought by a customer will contain more than 32 ounces.
b. Calculate the probability that a bottle bought will contain less than 29 ounces.
2. The annual wages of all employees of a company have a mean of 20,400 per year with a standard deviation of 3,200. The personnel manager is going to take a random sample of 36 employees and calculate the sample mean wage. What is the probability that the sample mean will exceed 21,000?
3. A company makes engines used in speedboats. The company's engineers believe that the engine delivers an average power of 220 horsepower (HP) and that the standard deviation of power delivered is 15 HP. A potential buyer intends to sample 100 engines (each engine to be run a single time). What is the probability that the sample mean, x̄, will be less than 217 HP?
Like the sampling distribution of the mean, we can also have a sampling distribution of the proportion. It is the probability distribution of all possible sample proportions p̂ of random samples of a given size (n) taken from a specified population.
This arises in the case of statistics of attributes. For instance, assume that we have worked out the proportion of defective parts in a large number of samples, each of, say, 100 items, taken from an infinite population, and plotted a probability distribution of these proportions; we obtain what is known as the sampling distribution of the proportion. Usually, statistics of attributes satisfy the conditions of a binomial distribution, which tends to become normal as the sample size increases. The standard deviation (standard error) of the sampling distribution of the proportion is

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

where p is the population proportion, q = 1 - p and n is the sample size.
In order to use the normal approximation for the sampling distribution of p̂, the sample size needs to be large. A commonly used rule of thumb says that the normal approximation to the distribution of p̂ may be used only if both np and nq are each at least 5.
The normal variate of the sampling distribution of the proportion (the difference between the sample proportion and the population proportion in standardized normal units) is

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} \sim N(0, 1)$$
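The mean and standard error of p̂ can be checked exactly for a small case by listing the binomial distribution of p̂ = X/n. The values n = 10 and p = 0.3 are illustrative assumptions, chosen small on purpose so that the rule of thumb fails (np = 3): the standard error formula still holds exactly, but the normal shape is not yet a good approximation. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_hat_distribution(n, p):
    """Exact sampling distribution of p-hat = X/n when X ~ Binomial(n, p)."""
    q = 1 - p
    return {Fraction(k, n): comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

def normal_approx_ok(n, p):
    """Rule of thumb from the text: both np and nq are at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n, p = 10, Fraction(3, 10)   # illustrative values, not from the text
dist = p_hat_distribution(n, p)
mean_p_hat = sum(value * prob for value, prob in dist.items())
var_p_hat = sum((value - mean_p_hat) ** 2 * prob for value, prob in dist.items())

print(mean_p_hat == p)               # True: p-hat is centred on p
print(var_p_hat == p * (1 - p) / n)  # True: variance is pq/n
print(normal_approx_ok(n, p))        # False: np = 3 is below 5
```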
Example 1- A state representative received 52% of the votes in the last election. One year later, the representative wanted to study his popularity. If his popularity has not changed, what is the probability that more than half of a sample of 300 voters would vote for him?
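Solution
Given
n = 300, p = 0.52, q = 1 - p = 0.48
Required
P(p̂ > 0.50) = ?
Since both np (300 × 0.52 = 156) and nq (300 × 0.48 = 144) are greater than 5, we can use the normal approximation to the distribution of p̂.

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.50 - 0.52}{\sqrt{(0.52)(0.48)/300}} = \frac{-0.02}{0.0288} = -0.69$$

The area between z = 0 and z = -0.69 obtained from the normal distribution table is 0.2549. Since the desired area lies to the right of z = -0.69, add 0.2549 to 0.5000: P(p̂ > 0.50) = P(Z > -0.69) = 0.5000 + 0.2549 = 0.7549.
Interpretation
The probability that more than half of a sample of 300 voters would vote for him is 0.7549, or 75.49%.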
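A sketch that recomputes this probability with the normal approximation via `statistics.NormalDist`; the function name `prob_p_hat_above` is hypothetical. Since the sampling distribution of p̂ is centred on p = 0.52, the chance that p̂ exceeds 0.50 has to be more than one half, which is a quick sanity check on the table work.

```python
from math import sqrt
from statistics import NormalDist

def prob_p_hat_above(threshold, p, n):
    """P(p-hat > threshold) using the normal approximation z = (p-hat - p) / sqrt(pq/n)."""
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("rule of thumb np >= 5 and nq >= 5 is not met")
    standard_error = sqrt(p * q / n)
    z = (threshold - p) / standard_error
    return z, 1 - NormalDist().cdf(z)   # right-tail area

z, prob = prob_p_hat_above(0.50, p=0.52, n=300)
print(f"z = {z:.2f}, P(p-hat > 0.50) = {prob:.4f}")
```

Without rounding z to -0.69, the result is about 0.756; the table-based value with z = -0.69 is 0.7549.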
Example 2: Suppose that a certain shoe company has noticed that, on average, a proportion of 0.02 of the shoes produced are defective. A random sample of 400 shoes is examined for the proportion of defective shoes. Find the probability that the sample proportion of defective shoes is between 0.01 and 0.03.
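Given
n = 400, p = 0.02, q = 1 - p = 0.98
Required
P(0.01 < p̂ < 0.03) = ?
Since np = 400 × 0.02 = 8 and nq = 400 × 0.98 = 392 are both at least 5, the sample size is large enough to solve for the required probability using the standard normal variable, that is, the normal approximation to the distribution of p̂.

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}, \qquad \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.02)(0.98)}{400}} = 0.007$$

$$P(0.01 < \hat{p} < 0.03) = P\left(\frac{0.01 - 0.02}{0.007} < Z < \frac{0.03 - 0.02}{0.007}\right) = P(-1.43 < Z < 1.43)$$

The area between z = 0 and z = 1.43 obtained from the normal distribution table is 0.4236, and by symmetry the area between z = -1.43 and z = 0 is also 0.4236. Thus, P(-1.43 < Z < 1.43) = 0.4236 + 0.4236 = 0.8472.
Interpretation
The probability that the sample proportion of defective shoes is between 0.01 and 0.03 is 0.8472, or 84.72%.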
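A short sketch that checks the rule of thumb and then computes the probability from the approximate sampling distribution of p̂, a normal distribution with mean p and standard error √(pq/n), using `statistics.NormalDist`. With unrounded z values the result is about 0.847, close to the table answer of 0.8472.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.02, 400
q = 1 - p
if n * p < 5 or n * q < 5:          # rule of thumb: here np = 8, nq = 392
    raise ValueError("normal approximation not advised")
standard_error = sqrt(p * q / n)    # sqrt(0.02 * 0.98 / 400) = 0.007

# Sampling distribution of p-hat approximated by N(p, standard_error)
p_hat_dist = NormalDist(p, standard_error)
prob = p_hat_dist.cdf(0.03) - p_hat_dist.cdf(0.01)
print(f"standard error = {standard_error:.4f}, P(0.01 < p-hat < 0.03) = {prob:.4f}")
```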
Exercises