Ch-6&7
Probability distributions
Mr. Yonatan N.
RANDOM VARIABLES AND PROBABILITY
DISTRIBUTIONS
Definition: A random variable is a numerical
description of the outcomes of an
experiment, i.e. a numerical-valued function
defined on the sample space. It is usually denoted
by a capital letter.
If X is a random variable, then it
is a function from the elements of the
sample space S to the set of real numbers R,
i.e. X: S → R.
A random variable takes each possible
outcome and assigns a number to it.
Example: Flip a coin three times, let X be
the number of heads in three tosses.
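The mapping X: S → R for this example can be sketched in Python. This is only an illustration: the names `sample_space` and `number_of_heads` are made up here, and outcomes are written as strings such as "HHT".

```python
from itertools import product

# Sample space S for three coin tosses: every sequence of H (head) and T (tail).
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]

def number_of_heads(outcome: str) -> int:
    """The random variable X: assigns a number to each outcome in S."""
    return outcome.count("H")

# Tabulate X(s) for every outcome s in S.
x_values = {outcome: number_of_heads(outcome) for outcome in sample_space}

for outcome, x in x_values.items():
    print(outcome, "->", x)
```

The eight outcomes are mapped onto only four distinct values of X: 0, 1, 2 and 3.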
Random variables are of two types:
1. Discrete random variable: a variable that can
assume only a specific number of values.
Its values can be counted.
Examples:
Toss a coin n times and count the number of heads.
Number of children in a family.
Number of car accidents per week.
Number of defective items in a given company.
Number of bacteria per two cubic centimeters of
water.
Three common probability distributions of discrete
random variables are:
the binomial distribution,
the Poisson distribution,
the hypergeometric distribution.
Binomial distribution
The binomial distribution can be used in situations
in which a given experiment (often referred to, in
this context, as a trial) is repeated a number of
times.
For the binomial model to be applied the following
four criteria must be satisfied:
the trial is carried out a fixed number of times n
the outcomes of each trial can be classified into two
‘types’ conventionally named ‘success’ or ‘failure’
the probability p of success remains constant for
each trial
the individual trials are independent of each other.
For example, if we toss a coin
7 times, what is the probability that exactly
4 heads occur?
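A minimal sketch of this calculation, assuming the coin is fair (p = 0.5, which the example does not state explicitly). It uses the standard binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k); the function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with constant success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumption: a fair coin, so the probability of a head on each toss is 0.5.
prob_four_heads = binomial_pmf(k=4, n=7, p=0.5)
print(round(prob_four_heads, 4))  # 35/128, about 0.2734
```

Under that assumption, exactly 4 heads in 7 tosses has probability 35/128, a little over 27%.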
2. Continuous random variable: a variable
that can assume all values between
any two given values.
Examples:
Height of students at a certain college.
Mark of a student.
Lifetime of light bulbs.
Length of time required to complete a given
training.
In principle, variables such as height, weight,
and temperature are continuous; in practice,
the limitations of our measuring instruments
restrict us to a discrete (though sometimes …
The Normal Distribution
It is a common continuous probability distribution.
The normal distribution is probably the most
important distribution in all of probability and statistics.
Many populations have distributions that can
be fit very closely by an appropriate normal
(or Gaussian, bell) curve.
Examples: height, weight, and other physical
characteristics, scores on various tests, etc.
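As a rough illustration of working with a normal curve, the sketch below uses Python's standard `statistics.NormalDist`. The mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen only for this example; they do not come from the slides.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration only:
# heights with mean 170 cm and standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Probability that a height lies within one standard deviation of the mean.
within_one_sd = heights.cdf(180) - heights.cdf(160)
print(round(within_one_sd, 4))  # about 0.6827
```

For any normal distribution, roughly 68% of values fall within one standard deviation of the mean, whatever the particular mean and standard deviation are.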
Probability Distribution
Definition: a probability distribution consists of the
values that a random variable can assume and
the corresponding probabilities of those values.
Example: Consider the experiment of tossing a
coin three times. Let X be the number of heads.
Construct the probability distribution of X.
Solution:
First, identify the possible values that X can
assume.
Then calculate the probability of each possible
distinct value of X and express the result in the form of
a frequency distribution.
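The two solution steps can be sketched as follows, assuming a fair coin so that all eight outcomes are equally likely (an assumption, since the example does not say so). Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list the outcomes and count how many give each value of X.
outcomes = list(product("HT", repeat=3))
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Step 2: P(X = x) = (outcomes with x heads) / (total number of outcomes),
# valid here because a fair coin makes all outcomes equally likely.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(head_counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

This gives P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8 and P(X = 3) = 1/8, which sum to 1 as any probability distribution must.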
Chapter 7
SAMPLING METHODS
LEARNING OBJECTIVES
What is research?
• “Scientific research is systematic, controlled,
empirical, and critical investigation of natural
phenomena guided by theory and hypotheses
about the presumed relations among such
phenomena.”
– Kerlinger, 1986
Important Components of Empirical Research
SAMPLING
A sample is “a smaller (but hopefully
representative) collection of units from a
population used to determine truths about that
population” (Field, 2005)
Why sample?
Resources (time, money) and workload
Gives results with known accuracy that can be
calculated mathematically
The sampling frame is the list from which the
potential respondents are drawn, for example:
Registrar’s office records
Class rosters
Sampling frame errors must be assessed.
SAMPLING……
What is your population of interest?
To whom do you want to generalize your
results?
All doctors
School children
Indians
Women aged 15-45 years
Other
Can you sample the entire population?
SAMPLING BREAKDOWN
[Figure: diagram relating the TARGET POPULATION, the STUDY POPULATION, and the SAMPLE]
Types of Samples
Process
The sampling process comprises several
stages:
Defining the population of concern
Specifying a sampling frame, a set of items
or events possible to measure
Specifying a sampling method for selecting
items or events from the frame
Determining the sample size
Implementing the sampling plan
Sampling and data collecting
Reviewing the sampling process
Population definition
A population can be defined as including
all people or items with the characteristic
one wishes to understand.
Because there is very rarely enough time
or money to gather information from
everyone or everything in a population,
the goal becomes finding a
representative sample (or subset) of that
population.
SAMPLING FRAME
In the most straightforward case, such as the
sentencing of a batch of material from production
(acceptance sampling by lots), it is possible to
identify and measure every single item in the
population and to include any one of them in our
sample.
However, in the more general case this is not
possible.
There is no way to identify all rats in the set of all
rats.
As a remedy, we seek a sampling frame which has
the property that we can identify every single
element and include any in our sample.
The sampling frame must be representative of the
population.
PROBABILITY SAMPLING
NON PROBABILITY SAMPLING
Any sampling method where some elements of the population
have no chance of selection (these are sometimes referred
to as 'out of coverage' or 'undercovered'), or where the
probability of selection can't be accurately determined.
It involves the selection of elements based on assumptions
regarding the population of interest, which form the criteria
for selection.
Hence, because the selection of elements is nonrandom,
nonprobability sampling does not allow the estimation of
sampling errors.
SIMPLE RANDOM SAMPLING
• Applicable when the population is small,
homogeneous & readily available.
• All subsets of the frame of a given size are given
an equal probability of selection.
• Each element of the frame thus has an equal
probability of selection.
• It provides for the greatest number of possible
samples.
• Selection is done by assigning a number to
each unit in the sampling frame.
• A table of random numbers or a lottery system is
then used to determine which units are to be
selected.
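The numbering-and-lottery procedure can be sketched with Python's standard `random` module standing in for the random number table. The frame of 50 numbered units and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(list(frame), n)

# Hypothetical sampling frame: units numbered 1 to 50.
frame = range(1, 51)
sample = simple_random_sample(frame, n=5, seed=1)
print(sample)
```

Because `random.sample` draws without replacement, no unit can appear twice in the sample.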
SIMPLE RANDOM SAMPLING……..
Advantages
Estimates are easy to calculate.
Simple random sampling is always an EPS (equal probability
of selection) design, but not all EPS designs are simple
random sampling.
Disadvantages
If the sampling frame is large, this method is impracticable.
Minority subgroups of interest in the population may not be
present in the sample in sufficient numbers for study.
REPLACEMENT OF SELECTED UNITS
SYSTEMATIC SAMPLING……
ADVANTAGES:
Sample easy to select
Suitable sampling frame can be identified easily
Sample evenly spread over entire reference population
DISADVANTAGES:
Sample may be biased if hidden periodicity in
population coincides with that of selection.
Difficult to assess precision of estimate from one
survey.
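The slides list only the advantages and disadvantages, so the sketch below assumes the usual procedure for systematic sampling: pick a random start within the first interval, then take every k-th unit, where k is the frame size divided by the sample size. The function name and the frame of 100 units are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start (assumed standard procedure)."""
    units = list(frame)
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and the frame size")
    k = len(units) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start in the first interval
    return units[start::k][:n]

# Hypothetical frame: units numbered 1 to 100, sample of 10.
sample = systematic_sample(range(1, 101), n=10, seed=3)
print(sample)
```

The fixed spacing is what spreads the sample evenly over the frame, and it is also why a hidden periodicity with the same period as k can bias the result.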
STRATIFIED SAMPLING
Where the population embraces a number of distinct
categories, the frame can be organized into
separate "strata." Each stratum is then sampled
as an independent sub-population, out of which
individual elements can be randomly selected.
Every unit in a stratum has the same chance of being
selected.
Using the same sampling fraction for all strata ensures
proportionate representation in the sample.
Adequate representation of minority subgroups of
interest can be ensured by stratification and by varying
the sampling fraction between strata as required.
STRATIFIED SAMPLING……
Finally, since each stratum is treated as an
independent population, different sampling
approaches can be applied to different strata.
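A minimal sketch of proportionate stratified sampling, in which the same sampling fraction is applied to every stratum and units are drawn at random within each. The strata ("urban", "rural"), their sizes, and the 10% fraction are hypothetical.

```python
import random

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Sample each stratum independently using the same sampling fraction."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = round(len(units) * fraction)  # stratum sample size
        sample[name] = rng.sample(list(units), size)
    return sample

# Hypothetical strata: 60 urban units and 40 rural units.
strata = {
    "urban": [f"U{i}" for i in range(60)],
    "rural": [f"R{i}" for i in range(40)],
}
sample = proportionate_stratified_sample(strata, fraction=0.1, seed=7)
print({name: len(units) for name, units in sample.items()})  # {'urban': 6, 'rural': 4}
```

To over-represent a minority subgroup, one could pass a different fraction for each stratum instead of a single shared value.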
CLUSTER SAMPLING
Cluster sampling is an example of 'two-stage
sampling'.
First stage: a sample of areas is chosen.
Second stage: a sample of respondents within
those areas is selected.
The population is divided into clusters of
homogeneous units, usually based on
geographical contiguity.
Sampling units are groups rather than
individuals.
A sample of such clusters is then selected.
All units from the selected clusters are studied.
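The last two steps, selecting clusters at random and then studying every unit in them, might look like the sketch below. The village names and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # the sampling units are clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: four villages with five households each.
clusters = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(5)] for v in range(4)
}
chosen, units = cluster_sample(clusters, n_clusters=2, seed=11)
print(chosen, len(units))
```

Only a list of clusters is needed up front; a list of individual units is required only for the clusters that are actually selected.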
CLUSTER SAMPLING…….
Advantages:
Cuts down on the cost of preparing a
sampling frame.
Can reduce travel and other
administrative costs.
Disadvantages:
Sampling error is higher than for
a simple random sample of the same size.
Often used to evaluate vaccination
coverage in the EPI (Expanded Programme on Immunization).
Difference Between Strata and
Clusters
Although strata and clusters are both non-
overlapping subsets of the population, they
differ in several ways.
All strata are represented in the sample, but
only a subset of clusters is in the sample.
With stratified sampling, the best survey
results occur when elements within strata are
internally homogeneous. However, with
cluster sampling, the best results occur when
elements within clusters are internally
heterogeneous.
MULTISTAGE SAMPLING
This technique is essentially the process of taking
random samples of preceding random samples.
It is not as effective as true random sampling, but it
probably solves more of the problems inherent to
random sampling.
It is an effective strategy because it banks on multiple
randomizations, which makes it extremely useful.
Multistage sampling is used frequently when a complete
list of all members of the population does not exist or is
inappropriate.
Moreover, by avoiding the use of all sample units in
all selected clusters, multistage sampling avoids the
large, and perhaps unnecessary, costs associated
with traditional cluster sampling.
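A two-stage version of "random samples of preceding random samples" can be sketched as follows: clusters are sampled first, then a random subset of units is taken within each selected cluster rather than all of them. The school and student labels are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical population: six schools with twenty students each.
clusters = {
    f"school_{s}": [f"s{s}_student_{i}" for i in range(20)] for s in range(6)
}
sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=4, seed=5)
print({name: len(units) for name, units in sample.items()})
```

Here 12 students are studied instead of the 60 that full cluster sampling of three schools would require, which is the cost saving described above.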
MATCHED RANDOM SAMPLING
A method of assigning participants to groups in
which pairs of participants are first matched on some
characteristic and then individually assigned
randomly to groups.
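One way to sketch this, under the assumption that matching means sorting participants on the characteristic and pairing neighbours (the slides do not specify how pairs are formed). The participant names and ages are hypothetical.

```python
import random

def matched_random_assignment(participants, key, seed=None):
    """Pair participants with similar key values, then split each pair at random."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)  # assumed matching rule: adjacent values
    group_a, group_b = [], []
    # With an odd number of participants, the last one is left unassigned.
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)                    # random assignment within the pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b

# Hypothetical participants as (name, age), matched on age.
participants = [("A", 34), ("B", 21), ("C", 22), ("D", 35), ("E", 50), ("F", 48)]
group_a, group_b = matched_random_assignment(participants, key=lambda p: p[1], seed=2)
print(group_a)
print(group_b)
```

Each group receives exactly one member of every matched pair, so the two groups end up similar on the matching characteristic.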
The procedure for matched random sampling can be
summarized in the following contexts:
Judgmental sampling or
Purposive sampling
Questions???