
Wa0014

The document outlines a course on Introductory Biostatistics, covering key topics such as data collection methods, types of variables, scales of measurement, and statistical tests including measures of central tendency and dispersion. It emphasizes the importance of biostatistics in medical research and health sciences, detailing various types of research data and methods of data collection. Additionally, it explains the application of statistical tools and software for practical assignments in the field.

Uploaded by

olusolaolomola12
Copyright © All Rights Reserved

INTRODUCTORY BIOSTATISTICS
PHG 316
COURSE OUTLINE
• Importance of statistics in medicine
• Data collection methods and types of variables
• Scales of measurement
• Measures of central tendency (ungrouped and grouped data)
• Measures of dispersion (ungrouped and grouped data)
• Data presentation methods
• Sampling and types of sampling
• Probability theory and the binomial distribution
• Practical assignments and applications using JotForm, Google Forms, MS Excel, GraphPad, SPSS and other online statistics and survey tools.
Course outline continued
• The normal distribution and introduction to the Z-test
• Introduction to Student's t-test
• Chi-square test
• Correlation and linear regression
• Vital statistics
BIOSTATISTICS
• Statistics is the science that deals with methods of collecting, organizing, presenting, summarizing, analyzing and interpreting data.
• Biostatistics is the branch of statistics that deals with the application of statistics to medicine and the health sciences, including biomedical research laboratories, clinical medicine, health promotion, and national and global health-care systems.
• Biostatistics helps researchers make sense of the data they collect, for example to decide whether a treatment is working or to find factors that contribute to disease. Medical statisticians design and analyze studies to identify the real causes of health issues, as distinct from chance variation.
FIELDS OF STATISTICS TO BE COVERED
• Descriptive statistics: e.g. mean, median, standard deviation and variance
• Relational statistics: how some data relate to others, e.g. correlation, multiple correlation, regression
• Inferential statistics, divided into two types:
• Parametric tests of significance, e.g. F-test, t-test, ANOVA
• Non-parametric tests of significance, e.g. chi-square, Mann-Whitney, Kruskal-Wallis
TYPES OF RESEARCH DATA
DATA: can be defined as facts and statistics collected for reference or analysis. In other words, data are the measurements taken on the variables that a researcher is interested in.
1. Observational data: data captured by observing a behavior or activity. They can be collected through human observation, open-ended surveys or instruments that monitor and record information.
2. Experimental data: data collected through active intervention by the researcher, who alters a variable and measures the change or difference it produces.
TYPES OF RESEARCH DATA CONT’D
3. Simulation data: data generated by imitating the operation of a real-world process or system over time using computer test models, e.g. to predict weather conditions, or in economic models, seismic activity and organ models.
4. Derived/Compiled data: this involves using existing data points,
often from different data sources, to create new data through some
sort of transformation, such as an arithmetic formula or
aggregation.
METHODS OF DATA COLLECTION
Methods of data collection include:
1. Interviews, e.g. in-person interviews, phone interviews
2. Questionnaires and surveys
3. Observations: collection of information without asking questions
4. Documents and records, e.g. attendance records, death records, life records, gene bank records etc.
5. Focus groups
6. Oral histories
7. Online tracking
8. Online marketing analytics
9. Social media monitoring
VARIABLES AND TYPES OF VARIABLES

• VARIABLE: An entity that can assume different values.
TYPES OF VARIABLE
Variables are either qualitative or quantitative.
VARIABLES AND TYPES OF VARIABLES
• Quantitative variable: a variable to which numerical values can be assigned, e.g. age, height, weight, blood pressure, sperm count.
• It is further divided into two types, namely discrete and continuous.
• Discrete: variables that can be counted in a finite amount of time and usually take whole-number values, e.g. number of children.
• Continuous: variables that can take fractional values, e.g. the weight of a chemical, 10.35 kg.
VARIABLES AND TYPES OF VARIABLES
• Qualitative variable: a variable to which no numerical value is assigned, but which can be placed into categories, e.g. sex, religion, occupation, ethnicity.
• The types of qualitative variable are nominal and ordinal.
• Nominal: a nominal variable names, labels or categorizes the attributes being measured. It takes qualitative values representing different categories, and there is no intrinsic ordering of these categories, e.g. gender, name, phone number.
• Ordinal: the categories of an ordinal variable have a natural order, so one category ranks higher than another, e.g. socio-economic status ("low income", "middle income", "high income"), education level ("high school", "BSc", "MSc", "PhD"), satisfaction rating ("extremely dislike", "dislike", "neutral", "like", "extremely like").
SCALES OF MEASUREMENTS
1. Nominal scale: This scale defines the identity property of data. Its values are labels with no numerical meaning: the data can be placed into categories but cannot be added, subtracted, multiplied or divided, and the difference between data points cannot be measured, e.g. eye colour and country of birth. It has three (3) categories:
• Nominal with order: e.g. "cold, warm, hot and very hot".
• Nominal without order: e.g. male and female.
• Dichotomous: data with only two categories or levels, e.g. "yes" and "no".
SCALES OF MEASUREMENTS CONT’D

• 2. Ordinal scale: This scale defines data that is placed in a specific order. Each value is ranked, but nothing specifies how far apart the categories are. These values cannot be added or subtracted.
• Example: satisfaction data points in a survey, where "1 = happy, 2 = neutral, 3 = unhappy". Finishing positions in a race are also ordinal data: first, second and third place show the order in which the runners finished, but not how far the first-place finisher was ahead of the second-place finisher.
SCALES OF MEASUREMENTS CONT’D
3. Interval scale: The interval scale has the properties of the nominal and ordinal scales, and in addition the difference between data points can be quantified. This type of data shows both the order of the values and the exact differences between them. Values can be added or subtracted, but not meaningfully multiplied or divided. For example, 40 degrees Celsius is not twice as hot as 20 degrees Celsius.
• On this scale, zero is an arbitrary point rather than the absence of the quantity: when you measure temperature in degrees Celsius, zero is itself a temperature. (On the ratio scale, by contrast, zero means the quantity is absent.)
• Data points on the interval scale are separated by equal intervals, e.g. 11-20, 21-30 etc. This scale is used to quantify the difference between values, whereas the nominal and ordinal scales describe qualitative values only. Other examples of interval scales include the year a car was made and the months of the year.
SCALES OF MEASUREMENTS CONT’D
4. Ratio scale: The ratio scale has the properties of the other three scales: the data have an identity (nominal), can be ordered (ordinal), have equal intervals (interval), and can be expressed as exact values. Weight, height and distance are all examples of ratio variables. Data on the ratio scale can be added, subtracted, multiplied and divided.
Ratio scales also differ from interval scales in having a "true zero": zero means that none of the quantity is present, and negative values are impossible. Height and weight are examples: nobody can be negative centimeters tall or weigh negative kilograms, and a weight of 0 kg would mean no weight at all. Examples of the use of this scale are calculating shares or sales. Of all the scales of measurement, ratio data allow data scientists to do the most.
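To make the interval/ratio distinction concrete, here is a small Python sketch (an added illustration, not part of the original notes) comparing the same two temperatures on two interval scales (°C, °F) and on a ratio scale (kelvin). The helper function names are hypothetical.

```python
# Ratios are meaningless on an interval scale but meaningful on a ratio scale.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert °C to °F (another interval scale with an arbitrary zero)."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c: float) -> float:
    """Convert °C to kelvin (a ratio scale: 0 K means no thermal energy)."""
    return c + 273.15

hot, mild = 40.0, 20.0

# Differences are meaningful on an interval scale.
print("Difference in °C:", hot - mild)

# The "ratio" changes when only the unit changes, so it carries no meaning.
ratio_c = hot / mild
ratio_f = celsius_to_fahrenheit(hot) / celsius_to_fahrenheit(mild)
print(f"Ratio in °C: {ratio_c:.2f}, ratio in °F: {ratio_f:.2f}")

# With a true zero, the ratio is meaningful: 40 °C is only about 7% "hotter" in kelvin.
ratio_k = celsius_to_kelvin(hot) / celsius_to_kelvin(mild)
print(f"Ratio in K: {ratio_k:.3f}")
```

The °C ratio is 2.00 but the °F ratio is about 1.53, which is why "40 degrees is not 20 degrees multiplied by two".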
MEASURES OF CENTRAL TENDENCY
• There are three main measures of central tendency: the mode, the
median and the mean.
Central Tendency for Ungrouped data
• Mean: also known as the average, it is the most reliable and most used measure of central tendency because it is amenable to other statistical tests. It is represented by x̄.
$$\bar{x} = \frac{\sum x}{n} = \frac{\text{sum of all values}}{\text{number of observations}}$$
• E.g., 10 scores for the PHG 316 exam:
• 30, 45, 50, 55, 60, 65, 70, 75, 80, 55
$$\bar{x} = \frac{585}{10} = 58.5$$
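The same calculation as a minimal Python sketch (an added illustration; the course itself uses Excel/SPSS), computing Σx / n by hand and checking it against the standard library's `statistics.mean`:

```python
from statistics import mean

# The 10 PHG 316 exam scores from the example.
scores = [30, 45, 50, 55, 60, 65, 70, 75, 80, 55]

total = sum(scores)   # Σx
n = len(scores)       # number of observations
x_bar = total / n     # x̄ = Σx / n

print(total, n, x_bar)
assert x_bar == mean(scores)  # agrees with the standard-library mean
```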
Advantages and Disadvantages of Mean
• Advantage: It is amenable to many statistical tests, e.g. t-tests and ANOVA use mean values.

• Disadvantage: It is affected by extreme values, i.e. outliers.


Central Tendency for Ungrouped data CONT’D
• MEDIAN: The middle value of a set of observations, i.e. the value that divides the distribution into two equal halves. To find the median, you must first form an array, i.e. arrange the scores in ascending or descending order.
Scores 30 45 50 55 60 65 70 75 80 55
Array 30 45 50 55 55 60 65 70 75 80
Position 1 2 3 4 5 6 7 8 9 10
• If the number of scores is odd, the median is the value at position $\frac{n+1}{2}$ of the array. For example, if n were 9, the median would be the value at position $\frac{9+1}{2} = 5$, i.e. the 5th number.
Median Cont’D
• If the number of scores/observations is even, e.g. the 10 PHG 316 scores mentioned earlier, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
• Here $\frac{10}{2} = 5$ and $\frac{10}{2} + 1 = 6$, i.e. the 5th and 6th values of the array:
$$\text{Median} = \frac{55 + 60}{2} = 57.5$$
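A minimal Python sketch of the positional rule above (added illustration; `median_by_position` is a hypothetical helper name). It also replaces the score 80 with a hypothetical outlier of 800 to show why the median is preferred for skewed data: the mean jumps, the median does not.

```python
from statistics import mean, median

def median_by_position(values):
    """Median via the positional rule: position (n+1)/2 for odd n,
    average of positions n/2 and n/2 + 1 for even n (1-based positions)."""
    array = sorted(values)  # the "array": scores in ascending order
    n = len(array)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return array[(n + 1) // 2 - 1]  # 1-based position -> 0-based index
    lower = array[n // 2 - 1]           # value at position n/2
    upper = array[n // 2]               # value at position n/2 + 1
    return (lower + upper) / 2

scores = [30, 45, 50, 55, 60, 65, 70, 75, 80, 55]
print(median_by_position(scores))      # even n: (55 + 60) / 2
print(median_by_position(scores[:9]))  # odd n: 5th value of the sorted first nine

# Hypothetical data-entry error: 80 recorded as 800.
skewed = scores[:8] + [800] + scores[9:]
print(mean(scores), mean(skewed))      # the mean jumps from 58.5 to 130.5
print(median(scores), median(skewed))  # the median stays at 57.5
```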
Advantages and disadvantages of Median
• Median is the second most commonly used measure of central tendency.
• ADVANTAGE: Very useful when the distribution is skewed to one side (right or left); the median is then more appropriate than the mean.
• DISADVANTAGE: You must arrange the values into an array before you can find the median.
MODE in an Ungrouped Data

• Mode is the most frequently occurring value in the distribution.
• Advantage: It is the easiest measure of central tendency to find, i.e. the value with the highest frequency of occurrence.
• Disadvantage: It is not amenable to the statistical tests used in inferential statistics.
• It is the least useful measure of central tendency because it takes only a single value into account.
• In our example, the mode = 55.
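A short Python sketch of finding the mode by counting frequencies (added illustration). Returning a list is a design choice here so that a distribution with two or more equally frequent values (multimodal data) is handled too.

```python
from collections import Counter

scores = [30, 45, 50, 55, 60, 65, 70, 75, 80, 55]

counts = Counter(scores)        # frequency of each distinct score
highest = max(counts.values())  # highest frequency of occurrence
modes = [value for value, f in counts.items() if f == highest]
print(modes, highest)
```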
MEASURES OF CENTRAL TENDENCY IN A GROUPED DATA

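| Score | Freq (f) | Xm | fXm | CF |
|---|---|---|---|---|
| 20-29 | 4 | 24.5 | 98 | 4 |
| 30-39 | 10 | 34.5 | 345 | 14 |
| 40-49 | 20 | 44.5 | 890 | 34 |
| 50-59 | 32 | 54.5 | 1744 | 66 |
| 60-69 | 20 | 64.5 | 1290 | 86 |
| 70-79 | 8 | 74.5 | 596 | 94 |
| 80-89 | 4 | 84.5 | 338 | 98 |
| 90-99 | 2 | 94.5 | 189 | 100 |
| Total | n = 100 | | Σ = 5490 | |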
Mean of Grouped Data
$$\text{MEAN } \bar{x} = \frac{\sum f X_m}{\sum f} = \frac{5490}{100} = 54.9$$
f = frequency of class
Xm = midpoint of class interval
Σf = total number of observations
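A minimal Python sketch of the grouped mean (added illustration), storing the table above as (lower limit, upper limit, frequency) tuples and computing each midpoint Xm from the stated limits:

```python
# Grouped frequency table from the example: (lower limit, upper limit, frequency).
table = [
    (20, 29, 4), (30, 39, 10), (40, 49, 20), (50, 59, 32),
    (60, 69, 20), (70, 79, 8), (80, 89, 4), (90, 99, 2),
]

def midpoint(lower, upper):
    """Class midpoint Xm, e.g. (20 + 29) / 2 = 24.5."""
    return (lower + upper) / 2

sum_f = sum(f for _, _, f in table)                         # Σf = n
sum_fxm = sum(f * midpoint(lo, hi) for lo, hi, f in table)  # ΣfXm
grouped_mean = sum_fxm / sum_f                              # x̄ = ΣfXm / Σf
print(sum_f, sum_fxm, grouped_mean)
```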
Median of Grouped Data
$$\text{Median} = L_i + \left(\frac{\frac{N}{2} - n_b}{n_w}\right) i$$
• Li = lower real limit of the median class
• N = total number of observations / sample size
• nb = number of cases (cumulative frequency) below the median class interval
• i = width/size of class interval
• nw = frequency within the median class interval
The median observation lies between positions $\frac{n}{2}$ and $\frac{n}{2} + 1$, i.e. between the 50th and 51st observations (position $\frac{50 + 51}{2} = 50.5$).
Therefore, the median class interval is 50-59: the cumulative frequency is 34 below this class and reaches 66 within it.
Median calculations continued
• We take Li for class 50-59 = 49.5, its lower real limit (because an exam score of 49.5 or more is rounded up to 50).
$$\text{Median} = 49.5 + \left(\frac{\frac{100}{2} - 34}{32}\right) 10 = 49.5 + \frac{(50 - 34) \times 10}{32} = 49.5 + \frac{160}{32} = 49.5 + 5$$
ANS = 54.5
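A Python sketch of the grouped-median formula (added illustration; `grouped_median` and its `unit` parameter are hypothetical names). It assumes whole-number scores, so the lower real limit is the stated lower limit minus 0.5 and the class width is 10.

```python
from itertools import accumulate

table = [
    (20, 29, 4), (30, 39, 10), (40, 49, 20), (50, 59, 32),
    (60, 69, 20), (70, 79, 8), (80, 89, 4), (90, 99, 2),
]

def grouped_median(table, unit=1.0):
    """Median = Li + ((N/2 - nb) / nw) * i for a grouped frequency table.

    Assumes scores are recorded to the nearest `unit`, so the lower real
    limit Li is the stated lower limit minus unit / 2 (e.g. 49.5 for 50-59).
    """
    n = sum(f for _, _, f in table)
    cumulative = list(accumulate(f for _, _, f in table))
    half = n / 2
    for k, (lo, hi, nw) in enumerate(table):
        if cumulative[k] >= half:                   # first class whose CF reaches N/2
            nb = cumulative[k - 1] if k > 0 else 0  # cases below the median class
            li = lo - unit / 2                      # lower real limit
            width = (hi - lo) + unit                # i, e.g. 59 - 50 + 1 = 10
            return li + (half - nb) / nw * width
    raise ValueError("empty table")

print(grouped_median(table))
```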
MODE IN GROUPED DATA
$$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) i$$
• L = lower class limit of the modal class
• D1 = excess of modal frequency over frequency of next lower class
• D2 = excess of modal frequency over frequency of next higher class
• i = width of class interval
• E.g. modal class = 50-59 (i.e. the class with the highest frequency)
• L = 50
• D1 = (frequency of 50-59) − (frequency of 40-49) = 32 − 20, so D1 = 12
• D2 = (frequency of 50-59) − (frequency of 60-69) = 32 − 20, so D2 = 12
• i = 10
MODE OF GROUPED DATA CONT’D
$$\text{Mode} = 50 + \left(\frac{12}{12 + 12}\right) 10 = 50 + \left(\frac{12}{24}\right) 10 = 50 + 5$$
ANS = 55
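A Python sketch of the grouped-mode formula (added illustration; `grouped_mode` and `use_real_limit` are hypothetical names). By default it follows the worked example and uses the stated lower limit L = 50. As an option, it can use the lower real limit 49.5, the convention the median calculation above uses, which gives 54.5 instead.

```python
table = [
    (20, 29, 4), (30, 39, 10), (40, 49, 20), (50, 59, 32),
    (60, 69, 20), (70, 79, 8), (80, 89, 4), (90, 99, 2),
]

def grouped_mode(table, use_real_limit=False, unit=1.0):
    """Mode = L + (D1 / (D1 + D2)) * i for a grouped frequency table."""
    freqs = [f for _, _, f in table]
    k = freqs.index(max(freqs))                          # modal class: highest frequency
    lo, hi, f_modal = table[k]
    f_below = freqs[k - 1] if k > 0 else 0               # next lower class
    f_above = freqs[k + 1] if k + 1 < len(freqs) else 0  # next higher class
    d1 = f_modal - f_below
    d2 = f_modal - f_above
    if d1 + d2 == 0:
        raise ValueError("formula undefined when D1 + D2 = 0")
    width = (hi - lo) + unit                             # i = 10 here
    L = lo - unit / 2 if use_real_limit else lo
    return L + d1 / (d1 + d2) * width

print(grouped_mode(table))                       # L = 50, as in the worked example
print(grouped_mode(table, use_real_limit=True))  # L = 49.5
```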
Formula for constructing a table when given
Raw Scores
$$K = 1 + 3.322 \log_{10} n$$
K = number of class intervals to construct
log n = base-10 logarithm of the sample size; e.g. for the sample of 100 used earlier:
$$K = 1 + 3.322 \log_{10} 100 = 1 + 3.322 \times 2 = 1 + 6.644 = 7.644$$
ANS ≈ 8
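The same formula (commonly known as Sturges' rule) as a short Python sketch (added illustration; `number_of_classes` is a hypothetical helper). Rounding up with `math.ceil` is an assumed convention here. Ordinary rounding also gives 8 for n = 100.

```python
import math

def number_of_classes(n):
    """K = 1 + 3.322 * log10(n); returns the exact value and K rounded up."""
    k = 1 + 3.322 * math.log10(n)
    return k, math.ceil(k)

k_exact, k_rounded = number_of_classes(100)
print(k_exact, k_rounded)
```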
Constructing a table when given raw scores
• To calculate the width of the class interval:
$$W = \frac{R}{K}$$
• W = width of class interval to be constructed
• R = range of scores (i.e. difference between the highest and lowest scores)
• K = number of class intervals
$$W = \frac{99 - 20}{8} = \frac{79}{8} = 9.875$$
• Therefore, W ≈ 10.
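A Python sketch that computes W, builds whole-number class intervals starting at the lowest score, and tallies frequencies (added illustration). The helper names and the ten raw scores are hypothetical. The original 100 raw scores are not given in the notes.

```python
import math

def class_width(lowest, highest, k):
    """W = R / K, where R = highest - lowest."""
    return (highest - lowest) / k

def build_intervals(start, width, k):
    """Class intervals with whole-number limits, e.g. 20-29, 30-39, ..."""
    return [(start + j * width, start + (j + 1) * width - 1) for j in range(k)]

w = class_width(20, 99, 8)  # 79 / 8
w_rounded = math.ceil(w)    # round up so the classes cover the whole range
intervals = build_intervals(20, w_rounded, 8)
print(w, w_rounded, intervals)

# Hypothetical raw scores, tallied into the intervals to give class frequencies.
raw_scores = [23, 35, 47, 52, 58, 61, 74, 88, 95, 50]
freqs = [sum(lo <= s <= hi for s in raw_scores) for lo, hi in intervals]
print(freqs)
```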
ASSIGNMENT 1
• Complete and submit your assignment. The assignment will be posted on your class WhatsApp group.
MEASURES OF DISPERSION
• Measures of dispersion are also known as measures of variability or measures of the spread of the data.
• They indicate how far the values lie from the central value.
• Examples of measures of dispersion are:
• Range
• Variance
• Standard Deviation
• Coefficient of variation
Measures of dispersion for ungrouped data
1. Range: the difference between the highest and lowest values.
e.g., 5 scores: 15, 25, 35, 10, 60
Range = 60 − 10 = 50.
Advantage: The range is the easiest measure of dispersion to calculate.
Disadvantages:
i. It is not very useful because it uses only two scores.
ii. It varies with sample size (larger samples tend to have larger ranges).
iii. It is not amenable to other statistical tests.
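A Python sketch of the range (added illustration), plus two hypothetical data sets that share the same range but differ in spread. This illustrates the first disadvantage: the range uses only two scores.

```python
from statistics import stdev

def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

scores = [15, 25, 35, 10, 60]
print(data_range(scores))

# Two hypothetical data sets with the same range but different spread.
a = [10, 35, 35, 35, 60]
b = [10, 10, 35, 60, 60]
print(data_range(a), data_range(b))
print(round(stdev(a), 2), round(stdev(b), 2))
```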
Measures of dispersion for ungrouped data
Cont’D
2. Variance: also known as the mean squared deviation and represented by S².
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
x̄ = mean
n = sample size
E.g., 5 scores: 10, 20, 30, 40 and 50
First calculate the mean: $\bar{x} = \frac{150}{5} = 30$
$$S^2 = \frac{(10-30)^2 + (20-30)^2 + (30-30)^2 + (40-30)^2 + (50-30)^2}{5-1} = \frac{400+100+0+100+400}{4}$$
S² = 250
• Advantages of variance:
i. It makes use of all the scores in the distribution.
ii. It is useful in inferential statistics.
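A Python sketch of the deviation formula for the sample variance, with the n − 1 denominator as above (added illustration; `sample_variance` is a hypothetical helper), checked against `statistics.variance`, which also divides by n − 1:

```python
from statistics import variance

def sample_variance(values):
    """S² = Σ(x - x̄)² / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

data = [10, 20, 30, 40, 50]
print(sample_variance(data))  # (400 + 100 + 0 + 100 + 400) / 4
print(variance(data))         # standard library agrees
```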
Measures of dispersion for ungrouped data
Cont’D
3. Standard Deviation: also known as the root mean squared deviation, i.e. the square root of the variance. Roughly, it measures the typical distance of the values from the mean.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Advantages:
i. It is the most used measure of dispersion.
ii. It is amenable to many statistical tests.
iii. It is expressed in the same unit as the original scores (whereas the variance is in squared units).
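A Python sketch of the sample standard deviation as the square root of the variance (added illustration; `sample_sd` is a hypothetical helper), checked against `statistics.stdev`:

```python
import math
from statistics import stdev

def sample_sd(values):
    """S = sqrt( Σ(x - x̄)² / (n - 1) )."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [10, 20, 30, 40, 50]
s = sample_sd(data)
print(round(s, 2))  # square root of the variance 250
assert math.isclose(s, stdev(data))
```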
Measures of dispersion for ungrouped data
Cont’D
4. Coefficient of Variation (CV).
$$\text{C.V.} = \frac{\text{S.D.}}{\text{Mean}} \times 100\%$$
From the example, the variance is 250 and the mean is 30:
$$\text{CV} = \frac{\sqrt{250}}{30} \times 100 = \frac{15.8}{30} \times 100$$
CV = 52.7%
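A Python sketch of the coefficient of variation (added illustration; `coefficient_of_variation` is a hypothetical helper). Multiplying every value by 10, like a change of unit, leaves the CV unchanged. This is the unit-free property that makes the CV useful for comparing variables measured in different units.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """C.V. = (S.D. / mean) x 100 percent (requires a non-zero mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100

data = [10, 20, 30, 40, 50]
cv = coefficient_of_variation(data)
print(round(cv, 1))

# Rescaling all values (like changing units) does not change the CV.
cv_scaled = coefficient_of_variation([x * 10 for x in data])
assert math.isclose(cv, cv_scaled)
```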
Coefficient of variations cont’d
• CV is used to compare the variability of two variables, e.g. blood pressure and cholesterol levels, to find out which varies more in the population. The two variables have different units, so their SDs are difficult to compare directly. Their CVs can be compared directly, however, because the units cancel when the SD of each variable is divided by its mean:
$$\text{For B.P.} = \frac{SD\ (\text{mmHg})}{\text{mean}\ (\text{mmHg})} \qquad \text{Cholesterol} = \frac{SD\ (\text{mmol/L})}{\text{mean}\ (\text{mmol/L})}$$
• If, e.g., the CV of BP is 40% and the CV of cholesterol is 60%, we can conclude that cholesterol varies more than B.P. in the population.
Second formula for Variance
• The advantage of this second formula over the first is that you don't need to calculate the mean first, which helps when there are many values.
$$S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
• The S.D. remains the square root of the variance, i.e. the square root of the formula above.
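A Python sketch showing that the second (computational) formula gives the same variance as the deviation formula (added illustration; both helper names are hypothetical):

```python
def variance_shortcut(values):
    """S² = (Σx² - (Σx)² / n) / (n - 1): no need to compute the mean first."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def variance_deviation(values):
    """S² = Σ(x - x̄)² / (n - 1), for comparison."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [10, 20, 30, 40, 50]
print(variance_shortcut(data), variance_deviation(data))
```

The shortcut is convenient by hand. In floating-point software, however, subtracting two large, nearly equal sums can lose precision, so statistical packages generally prefer deviation-based or other numerically stable methods.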
Measures of dispersion for grouped data
| Score | Frequency (f) | Xm | Xm − x̄ | (Xm − x̄)² | f(Xm − x̄)² |
|---|---|---|---|---|---|
| 20-29 | 4 | 24.5 | −30.4 | 924.16 | 3696.64 |
| 30-39 | 10 | 34.5 | −20.4 | 416.16 | 4161.60 |
| 40-49 | 20 | 44.5 | −10.4 | 108.16 | 2163.20 |
| 50-59 | 32 | 54.5 | −0.4 | 0.16 | 5.12 |
| 60-69 | 20 | 64.5 | 9.6 | 92.16 | 1843.20 |
| 70-79 | 8 | 74.5 | 19.6 | 384.16 | 3073.28 |
| 80-89 | 4 | 84.5 | 29.6 | 876.16 | 3504.64 |
| 90-99 | 2 | 94.5 | 39.6 | 1568.16 | 3136.32 |
| Total | Σf = 100 | | | | 21584 |

Mean = 54.9
$$\text{Variance, } S^2 = \frac{\sum f (X_m - \bar{x})^2}{n - 1} = \frac{21584}{99} = 218.02$$
$$\text{S.D.} = \sqrt{\text{Variance}} = \sqrt{218.02} = 14.77$$
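A Python sketch of the grouped variance via the deviation formula, using the class midpoints and frequencies from the table above (added illustration):

```python
import math

# (midpoint Xm, frequency f) for each class in the table.
table = [(24.5, 4), (34.5, 10), (44.5, 20), (54.5, 32),
         (64.5, 20), (74.5, 8), (84.5, 4), (94.5, 2)]

n = sum(f for _, f in table)                                    # Σf = 100
x_bar = sum(f * xm for xm, f in table) / n                      # 54.9
sum_f_sq_dev = sum(f * (xm - x_bar) ** 2 for xm, f in table)    # Σf(Xm - x̄)²
s2 = sum_f_sq_dev / (n - 1)                                     # S²
s = math.sqrt(s2)                                               # S.D.
print(round(sum_f_sq_dev, 2), round(s2, 2), round(s, 2))
```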
Second formula for variance in grouped data
$$S^2 = \frac{\sum f X_m^2 - \frac{(\sum f X_m)^2}{n}}{n - 1}$$

| Score | Frequency (f) | Xm | Xm² | fXm² | fXm |
|---|---|---|---|---|---|
| 20-29 | 4 | 24.5 | 600.25 | 2401 | 98 |
| 30-39 | 10 | 34.5 | 1190.25 | 11902.5 | 345 |
| 40-49 | 20 | 44.5 | 1980.25 | 39605 | 890 |
| 50-59 | 32 | 54.5 | 2970.25 | 95048 | 1744 |
| 60-69 | 20 | 64.5 | 4160.25 | 83205 | 1290 |
| 70-79 | 8 | 74.5 | 5550.25 | 44402 | 596 |
| 80-89 | 4 | 84.5 | 7140.25 | 28561 | 338 |
| 90-99 | 2 | 94.5 | 8930.25 | 17860.5 | 189 |
| Total | Σf = 100 | | | ΣfXm² = 322985 | ΣfXm = 5490 |

• ΣfXm² = 322985
• (ΣfXm)² = (5490)² = 30140100
$$\frac{(\sum f X_m)^2}{n} = \frac{30140100}{100} = 301401$$
$$S^2 = \frac{322985 - 301401}{99} = \frac{21584}{99} = 218.02$$
• S.D. = √S² = √218.02 = 14.77
$$\text{C.V.} = \frac{SD}{\text{Mean}} \times 100 = \frac{14.77}{54.9} \times 100 = 26.9$$
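A Python sketch of the computational formula for grouped data, ending with the CV (added illustration). It reproduces ΣfXm = 5490 and ΣfXm² = 322985 from the table.

```python
import math

# (midpoint Xm, frequency f) for each class in the table.
table = [(24.5, 4), (34.5, 10), (44.5, 20), (54.5, 32),
         (64.5, 20), (74.5, 8), (84.5, 4), (94.5, 2)]

n = sum(f for _, f in table)                        # Σf
sum_fxm = sum(f * xm for xm, f in table)            # ΣfXm
sum_fxm2 = sum(f * xm ** 2 for xm, f in table)      # ΣfXm²
s2 = (sum_fxm2 - sum_fxm ** 2 / n) / (n - 1)        # S²
s = math.sqrt(s2)                                   # S.D.
cv = s / (sum_fxm / n) * 100                        # C.V. in percent
print(sum_fxm, sum_fxm2, round(s2, 2), round(s, 2), round(cv, 1))
```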
DATA PRESENTATION METHODS
Methods of Data presentation
• 1. Text presentation
• 2. Tables
• 3. Charts
• 4. Graphs
Methods of Data presentation
1. Text presentation: Text is the main method of conveying information, as it is used to explain results and trends and to provide contextual information.
E.g., the number of COVID-19 deaths in Nigeria was 1,945 as of November 30, 2020. By November 30, 2021, it had risen to 2,820, an increase of about 45%.
Methods of Data presentation Cont’d
2. Tables: Tables show the raw data in rows and columns. They are designed to simplify the presentation and to allow quick comparison. A table shows all the data at once and in a precise form. However, it can be hard to interpret a table or to see patterns in it.
E.g., the table of student scores already presented under measures of central tendency for grouped data.
Methods of Data presentation Cont’d
3. Charts: Various types of charts are used in the presentation of data.
a. Pie chart: it displays the relative figures (proportions or percentages) of the classes or strata of a given sample or population.
• The angle of each sector of a pie chart should be proportional to the frequency of the class it represents.
• A pie chart is used to display the distribution of nominal data (data classified into different categories) visually. It is generally the most appropriate format for information grouped into a small number of categories, and it gives a simple pictorial display of the relative sizes of the classes.
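The sector-angle rule can be sketched in Python as angle = (f / Σf) × 360° (added illustration). The category names and frequencies below are hypothetical.

```python
def pie_angles(frequencies):
    """Sector angle for each class: f * 360 / total frequency, in degrees."""
    total = sum(frequencies.values())
    return {label: f * 360 / total for label, f in frequencies.items()}

# Hypothetical frequencies for three categories.
frequencies = {"Category A": 30, "Category B": 45, "Category C": 25}
angles = pie_angles(frequencies)
print(angles)
```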
How to make a pie chart in Excel
[Figure 1. Effect of 0.3% salt diet (control diet) on estrous cycle of female rats. Pie chart with sectors for proestrus, estrus, metestrus and diestrus]
To use Microsoft Excel to create a pie chart:
i. select the data to be charted (the labels and frequencies only),
ii. go to Insert,
iii. click on Recommended Charts/All Charts/Charts,
iv. click on Pie chart,
v. select the type of pie chart you want.
Methods of Data presentation Cont’d
b. Bar Chart: A bar chart shows data in separate columns. It consists of a group of equally spaced rectangular bars, one for each category (or class) of the given statistical data.
The bars are differentiated by different shades or colors. They start from a common baseline and must be of equal width, and their length represents the values of the data.
Bar charts may be of two types: vertical and horizontal. The bar chart is one of the most common methods of presenting data visually. Its main purpose is to display quantities in the form of bars.
Bar charts
[Figure: bar chart of serum testosterone concentration, testosterone level (ng/mL), for the groups Control, L-NAME, L-NAME + Omega 357 and Omega 357]
To use Microsoft Excel to create a bar chart:
i. select the data to be charted (the labels and frequencies only),
ii. go to Insert,
iii. click on Recommended Charts/All Charts/Charts,
iv. click on Bar chart,
v. select the type of bar chart you want.
Methods of Data presentation Cont’d

4. Graphs: Because a bar chart has x and y axes, it is sometimes called a graph. However, the graphs considered here include:
a. Histogram: A histogram is a set of bars whose areas are proportional to the frequencies of the classes they represent. It shows continuous data with no gaps between the columns, reflecting the continuity of the data categories. It can be vertical or horizontal. The histogram should be clearly distinguished from the bar chart: the most striking physical difference is that, unlike a bar chart, a histogram has no gaps between successive rectangles.
Histogram
• To use Microsoft Excel to create a
histogram,
• i. select the data to be created – the
label and the frequency only,
• ii. go to Insert,
• iii. click on Recommended
Charts/All Charts/Charts,
• iv. click on histogram,
• v. select the type of histogram you
want
Methods of Data presentation Cont’d

b. Line Graph: A line graph is usually meant for showing the frequencies of the various values of a variable. Successive points are joined by line segments, so that a glance at the graph is enough for the reader to understand the distribution of the variable. It shows all data points.
SAMPLING AND SAMPLING TECHNIQUES
• Sampling is the selection of study units from a defined population (study population).
• Steps involved in sampling:
i. Define your study population clearly, e.g. by sex, age etc.
ii. Decide how many people/samples you need.
iii. Decide which sampling method you intend to use.
• To draw a valid conclusion about your study population, your sample should be representative of the total population, as in the quota system of representation in government.
Methods of Sampling
• Sampling methods can be divided into two major categories:
1. Probability sampling
2. Non-probability sampling (applicable where there is no sampling frame)

A sampling frame is the list of all the units that make up your population.
Non-probability sampling Methods
The non-probability sampling methods include:
• Convenience (haphazard) sampling
• Quota sampling
• Purposive sampling

• Convenience sampling: In this type of sampling, the study units that happen to be available at the time of data collection are selected into the sample, e.g. for a focus group discussion.
Non-probability sampling Methods Cont’d
• Quota sampling: This method ensures that a certain number of sampling units from different categories with specific characteristics appear in the sample, so that all these characteristics are represented. It is useful when a researcher feels that convenience sampling will not provide the desired balance of sampling units.

• Purposive sampling: Here you select subjects that you presume are typical of the population to be studied, e.g. picking a group of undergraduate students in Idi-Araba to represent Unilag students.
Probability Sampling Methods
• Probability sampling involves a random selection procedure to ensure that each unit of the sample is chosen on the basis of chance. All units in the study population should have an equal, or at least a known, chance of being included in the sample. Types of probability sampling include:
i. Simple random sampling
ii. Systematic random sampling
iii. Stratified random sampling
iv. Cluster sampling
v. Multistage sampling
Probability Sampling Methods Cont’d
1. Simple random sampling: This is the simplest form of probability sampling, in which every member of the population has an equal chance of being selected as a member of the sample.
Procedure: Make a numbered list of all the units in the population from which you want to draw the sample, then select the required number of sampling units, e.g. by balloting or by using a table of random numbers.

Disadvantage: It is difficult when the sampling frame is large, e.g. in the millions.
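A computer can replace the ballot or the table of random numbers. Here is a Python sketch (added illustration) in which the 1,000-unit sampling frame and the seed are hypothetical. `random.Random.sample` draws without replacement, giving every unit the same chance of selection.

```python
import random

# Hypothetical numbered sampling frame of 1,000 population units.
sampling_frame = [f"unit-{i:04d}" for i in range(1, 1001)]

rng = random.Random(2024)                   # fixed seed so the draw can be reproduced
sample = rng.sample(sampling_frame, k=100)  # equal chance for every unit, no repeats
print(len(sample), sample[:5])
```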
Probability Sampling Methods Cont’d
2. Systematic sampling: Here individuals are chosen at regular intervals from the sampling frame. The interval is derived from the sampling fraction. Ideally, the starting unit should be selected at random. E.g. every 3rd person in a class.
$$\text{Sampling fraction} = \frac{\text{Sample size}}{\text{Study population size}}, \quad \text{e.g. } \frac{100}{1{,}000} \text{ means 1 in 10 persons.}$$
Disadvantage: To prevent bias, don't use it if there is periodicity in the population or sampling frame. There is a risk of bias if your sampling interval coincides with a systematic variation in the sampling frame, e.g. if every 3rd person on the list is a lady, a 1-in-3 sample may consist only of ladies and the findings become biased.
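A Python sketch of systematic sampling with a random start (added illustration; `systematic_sample` and the 1,000-person frame are hypothetical). It assumes the interval is k = N // n, the whole-number inverse of the sampling fraction.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick every k-th unit after a random start, with k = N // n."""
    N = len(frame)
    k = N // n                  # sampling interval, e.g. 1,000 // 100 = 10
    start = rng.randrange(k)    # random start within the first interval (0-based)
    return [frame[start + j * k] for j in range(n)]

frame = list(range(1, 1001))    # hypothetical numbered frame of 1,000 people
sample = systematic_sample(frame, 100, random.Random(7))
print(sample[:5])
```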
Probability Sampling Methods Cont’d
3. Stratified sampling: Here the sampling frame is divided into strata, and samples are obtained from each stratum by random or systematic sampling. This method is used when it is important that the sample includes representatives of groups of study units with specific characteristics. A separate sampling frame is prepared for each stratum. It is used when the population is heterogeneous.
E.g., when you need to pick senior secondary students into clubs, and SS1, SS2 and SS3 must each be well represented in the 5 clubs of 12 people each to be formed.
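A Python sketch of the club example (added illustration). The class lists are hypothetical, and the sketch assumes equal allocation: a simple random sample of 20 students from each stratum gives 60 students for 5 clubs of 12. Proportional allocation, where each stratum's share follows its size, is a common alternative.

```python
import random

# Hypothetical strata: a separate sampling frame for each class level.
strata = {
    "SS1": [f"SS1-{i}" for i in range(1, 121)],
    "SS2": [f"SS2-{i}" for i in range(1, 101)],
    "SS3": [f"SS3-{i}" for i in range(1, 81)],
}

def stratified_sample(strata, per_stratum, rng):
    """Draw a simple random sample of the same size from each stratum."""
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

rng = random.Random(316)
chosen = stratified_sample(strata, 20, rng)  # 3 x 20 = 60 students
print({name: len(units) for name, units in chosen.items()})
```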
Probability Sampling Methods Cont’d
4. Cluster sampling: Here the clusters, rather than the individuals, are the study units. This method is usually used when you don't have a complete sampling frame or because of logistic difficulty. The steps are:
1. Divide the population into clusters of homogeneous units. Here the sampling frame is the list of clusters, not individuals.
2. Select clusters by simple random sampling.
3. Examine all units in the selected clusters.
E.g., in a village with multiple compounds, the compounds become your clusters, and you can ballot to decide which clusters to use for your study.
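A Python sketch of the village example (added illustration). The compounds, their residents and the choice of 4 clusters are all hypothetical. The sampling frame is the list of compounds; a random draw of compounds stands in for the ballot, and everyone in the chosen compounds is included.

```python
import random

# Hypothetical village: compound name -> residents (5 to 8 people each).
compounds = {
    f"compound-{c}": [f"compound-{c}/person-{p}" for p in range(1, 6 + c % 4)]
    for c in range(1, 21)
}

rng = random.Random(42)
selected = rng.sample(sorted(compounds), k=4)   # ballot: random sample of clusters
participants = [person for name in selected for person in compounds[name]]  # everyone in them
print(selected, len(participants))
```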
Probability Sampling Methods Cont’d
5. Multi-stage sampling: sampling that involves several stages (at least 2), usually with more than one sampling method. It is used when you have a large and diverse population, e.g. when you need 1,000 Nigerians to represent the population of Nigeria.
• First, you can use the 36 states as the sampling frame and pick 5 of them randomly.
• From the 5 states picked, you can use the local governments as the sampling frame and pick 2 local governments from each state.
• Within those local governments, you can pick a ward.
• You can then narrow down further to streets and use systematic sampling to pick every 3rd house in each street.
• At the end, the selected units should be representative of the population.
Non-probability sampling Methods Cont’d
Snowball Sampling: This is also called network referral or chain-referral sampling. It is usually used for hard-to-reach populations or rare conditions. Data are collected from a small group of people with a particular characteristic, who then help recruit other people like them. Data are then collected from the newly recruited people, who in turn recruit others, and this continues until the desired sample size is achieved. E.g., studies among drug addicts, sex workers etc.
