SAMPLE SIZE DETERMINATION
At the end of this presentation, we should be able to:
Understand the significance of sample size.
Determine sample size.
Understand factors that may affect sample size.
Use sample size in our research or study.
WHAT IS SAMPLE SIZE?
Sample size is the number of subjects/units in the sub-population studied in order to make an inference about a reference population (the broader population to which the findings from a study are to be generalized).
In a census, the sample size is equal to the population size. However, in research, because of time and budget constraints, a representative sample is normally used.
Other things being equal, the larger the sample size, the more precise the findings from a study.
Availability of resources sets the upper limit of the sample size, while the required accuracy sets the lower limit.
Therefore, an optimum sample size is an essential component of any research.
WHAT IS SAMPLE SIZE DETERMINATION?
Sample size determination is the mathematical estimation of the number of subjects/units to be included in a study.
When a representative sample is taken from a population, the findings are generalized to the population.
Optimum sample size determination is required for the following reasons:
1. To allow for appropriate analysis
2. To provide the desired level of accuracy
3. To allow validity of significance tests.
HOW LARGE A SAMPLE DO I NEED?
If the sample is too small:
1. Even a well-conducted study may fail to answer its research question
2. It may fail to detect important effects or associations
3. It may estimate these effects or associations imprecisely
CONVERSELY
If the sample size is too large:
1. The study will be difficult and costly
2. Time constraint
3. Too few cases may be available, e.g. in a rare disease.
4. Loss of accuracy.
Hence, the optimum sample size must be determined before commencement of a study.
Random error
Systematic error (bias)
Precision (reliability)
Accuracy (validity)
Null hypothesis
Alternative hypothesis
Type I (α) error
Type II (β) error
Power (1-β)
Effect size
Design effect
Random error: error that occurs by chance. Sources are sample variability, subject-to-subject differences and measurement errors. It can be reduced by averaging, increasing the sample size or repeating the experiment.
Systematic error: deviations not due to chance alone. Several factors, e.g. patient selection criteria, may contribute. It can be reduced by good study design and conduct of the experiment.
Precision: the degree to which a variable has the same value when measured several times. It is a function of random error.
Accuracy: the degree to which a variable actually represents the true value. It is a function of systematic error.
Null hypothesis: It states that there is no difference among groups or no association between the predictor and the outcome variable. This hypothesis needs to be tested.
Alternative hypothesis: It contradicts the null hypothesis. The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of significance rejects the null hypothesis. There are two types: one-tailed (one-sided) or two-tailed (two-sided).
Type I (α) error: It occurs if an investigator rejects a null hypothesis that is actually true in the population. The probability of making a type I (α) error is called the level of significance and is conventionally set at 0.05 (5%). It is specified as Zα in sample size computation; Zα is the value from the standard normal distribution corresponding to α. Sample size is inversely related to the type I error: the smaller the α accepted, the larger the sample required.
Type II (β) error: It occurs if the investigator fails to reject a null hypothesis that is actually false in the population. It is specified in terms of Zβ in sample size computation; Zβ is the value from the standard normal distribution corresponding to β.
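As a rough illustration of how the Z values for the type I and type II errors are read off the standard normal distribution, the sketch below uses Python's standard library. The helper name `z_value` and the choice of a 20% type II error are assumptions made for the example, not part of the presentation.

```python
from statistics import NormalDist


def z_value(tail_probability: float) -> float:
    """Standard normal deviate that leaves `tail_probability` in the upper tail."""
    return NormalDist().inv_cdf(1.0 - tail_probability)


alpha = 0.05  # type I error (level of significance)
beta = 0.20   # type II error; illustrative choice, i.e. power = 0.80

z_alpha_two_sided = z_value(alpha / 2)  # two-tailed test, close to 1.96
z_alpha_one_sided = z_value(alpha)      # one-tailed test, close to 1.645
z_beta = z_value(beta)                  # close to 0.84

print(round(z_alpha_two_sided, 3), round(z_alpha_one_sided, 3), round(z_beta, 3))
```

A two-tailed test splits the type I error between both tails, which is why 0.05 corresponds to 1.96 rather than 1.645.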
Power (1-β): This is the probability that the test will correctly identify a significant difference, effect or association in the sample should one exist in the population. Sample size is directly related to the power of the study: the larger the sample size, the greater the power of the study to detect a significant difference, effect or association.
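The link between sample size and power can be sketched numerically. The example below is only an approximation under stated assumptions: two equal groups, a two-sided test of a difference d between proportions, a common proportion p used for the variance, and the normal approximation (the negligible opposite tail is ignored). The function name `approx_power` and the values p = 0.4 and d = 0.10 are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group: int, p: float, d: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a difference d between two proportions."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two proportions, equal group sizes
    standard_error = sqrt(2 * p * (1 - p) / n_per_group)
    return std_normal.cdf(d / standard_error - z_alpha)


# Power rises as the number of subjects per group grows
for n in (100, 200, 400, 800):
    print(n, round(approx_power(n, p=0.4, d=0.10), 2))
```

Under these assumptions, doubling the sample size never lowers the power, which is the sense in which the two are directly related.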
Effect size: This is a measure of the strength of the relationship between two variables in a population. It is the magnitude of the effect under the alternative hypothesis. The bigger the size of the effect in the population, the easier it will be to find.
Design effect: Geographic clustering is generally used to make a study easier and cheaper to perform, but it affects the required sample size.
The effect on the sample size depends on the number of clusters and the variance between and within clusters.
In practice, this is determined from previous studies and is expressed as a constant called the ‘design effect’, often between 1.0 and 2.0. The sample size for a simple random sample is multiplied by the design effect to obtain the sample size for the cluster sample.
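The multiplication described above can be sketched as follows. The function name `cluster_sample_size`, the starting size of 384 and the design effect of 1.5 are illustrative assumptions; in practice the design effect would come from previous studies.

```python
from math import ceil


def cluster_sample_size(n_simple_random: float, design_effect: float) -> int:
    """Inflate a simple-random-sample size by the design effect, rounding up."""
    return ceil(n_simple_random * design_effect)


# Hypothetical inputs: 384 subjects under simple random sampling, design effect 1.5
n_cluster = cluster_sample_size(384, 1.5)
print(n_cluster)  # 576
```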
Odds ratio (OR) is a measure of effect size, describing the strength of association or non-independence between two binary data values.
Relative risk (RR) is the risk of an event (or of developing a disease) relative to exposure. Relative risk is the ratio of the probability of the event occurring in the exposed group to that in the non-exposed group.
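Both measures can be computed from a 2x2 table. The sketch below uses the usual cell layout (a = exposed with event, b = exposed without event, c = unexposed with event, d = unexposed without event); the counts are hypothetical and chosen only to show the arithmetic.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of the event among the exposed divided by odds among the unexposed."""
    return (a * d) / (b * c)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk of the event among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed develop the event
a, b, c, d = 30, 70, 10, 90
print(round(odds_ratio(a, b, c, d), 2))     # 3.86
print(round(relative_risk(a, b, c, d), 2))  # 3.0
```

The two values differ here because the event is not rare; they move closer together as the event becomes less common.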
POWER ANALYSIS
When the estimated sample size cannot be included in a study, post-hoc power analysis should be carried out.
The probability of correctly rejecting the null hypothesis is equal to 1 - β, which is called power. The power of a test refers to its ability to detect what it is looking for: it is the probability of finding an effect of a given size when that effect exists.
Post-hoc power analysis is done after a study has been carried out, to help interpret its results.
AT WHAT STAGE CAN SAMPLE SIZE BE ADDRESSED?
It can be addressed at two stages:
1. By determining the optimum sample size required during the planning stage, while designing the study, using the appropriate approach and information on some parameters.
2. Through post-hoc power analysis at the stage of interpretation of the result.
APPROACH FOR ESTIMATING SAMPLE SIZE/POWER ANALYSIS
Approaches for estimating sample size and performing power analysis depend primarily on:
1. The study design
2. The main outcome measure of the study
There are distinct approaches for calculating sample size for different study designs and different outcome measures.
1. THE STUDY DESIGN
There are many different approaches for calculating the sample size for different study designs, such as case-control design, cohort design, cross-sectional studies, clinical trials, diagnostic test studies etc.
Within each study design there could be more sub-designs, and the sample size calculation will vary accordingly.
Therefore, one must use the correct approach for computing the sample size, appropriate to the study design and its subtype.
2. THE PRIMARY OUTCOME MEASURE
The primary outcome measure is usually reflected in the primary research question of the study and also depends on the study design.
For estimating risk in a case-control study, it will be the odds ratio, while for a cohort study it will be the relative risk.
For a case-control study, it could be the difference in means/proportions of exposure in cases and controls, the crude/adjusted odds ratio etc.
Hence, while calculating sample size, one of these primary outcome measures has to be specified, because there are distinct approaches for calculating sample size for different outcome measures.
STATISTICAL INFERENCE FROM THE STUDY RESULTS
In addition, there are also different procedures for calculating sample size for the two approaches to drawing statistical inference from the study result, i.e.
1. Estimation (confidence interval approach)
2. Hypothesis testing (test of significance approach)
A researcher needs to select the appropriate procedure for computing the sample size and accordingly use the same approach for drawing a statistical inference subsequently.
NB: Tests of significance: Chi-squared, t-test, Z-test, F-test, P-value
ADDITIONAL PARAMETERS
Depending upon the approach chosen for calculating the sample size, one also needs to specify some additional parameters such as:
Hypothesis
Precision
Type I error
Type II error
Power
Effect size
Design effect
PROCEDURE FOR CALCULATING SAMPLE SIZE
There are four procedures that could be used for calculating sample size:
1. Use of formulae
2. Ready-made table
3. Nomograms
4. Computer software
USE OF FORMULAE FOR SAMPLE SIZE CALCULATION & POWER ANALYSIS
There are many formulae for calculating sample size and power in different situations for different study designs.
The appropriate sample size for a population-based study is determined largely by 3 factors:
1. The estimated prevalence of the variable of interest.
2. The desired level of confidence.
3. The acceptable margin of error.
To calculate the minimum sample size required for accuracy in estimating proportions, the following decisions must be taken:
1. Decide on a reasonable estimate of the key proportion (p) to be measured in the study.
2. Decide on the degree of accuracy (d) that is desired in the study, usually 1%-5% (0.01 to 0.05).
3. Decide on the confidence level (Z) you want to use. Usually 95% ≡ 1.96.
4. Estimate the size (N) of the population that the sample is supposed to represent.
5. Decide on the minimum difference you expect to find statistically significant.
For population > 10,000:
$$n = \frac{Z^2 p q}{d^2}$$
n = desired sample size (when the population > 10,000)
Z = standard normal deviate, usually set at 1.96 (or ≈ 2), which corresponds to the 95% confidence level
p = proportion in the target population estimated to have a particular characteristic. If there is no reasonable estimate, use 50% (i.e. 0.5)
q = 1 - p (proportion in the target population not having the particular characteristic)
d = degree of accuracy required, usually set at 0.05 (occasionally at 0.02)
E.g. if the proportion of a target population with a certain characteristic is 0.50, the Z statistic is 1.96 and we desire accuracy at the 0.05 level, then the sample size is
$$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16 \approx 384$$
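The worked example for a large population can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_size_proportion` is an assumption introduced for the example.

```python
def sample_size_proportion(p: float, d: float, z: float = 1.96) -> float:
    """Sample size for estimating a proportion in a large population: n = z^2 * p * q / d^2."""
    q = 1 - p
    return (z ** 2) * p * q / (d ** 2)


n = sample_size_proportion(p=0.5, d=0.05)
print(round(n, 2))  # 384.16
```

The presentation reports this as 384; rounding up to 385 is the more conservative choice when a minimum is wanted.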
If the study population is < 10,000:
$$n_f = \frac{n}{1 + \frac{n}{N}}$$
nf = desired sample size when the study population is < 10,000
n = desired sample size when the study population is > 10,000
N = estimate of the population size
Example: if n were found to be 400 and the population size were estimated at 1000, then nf would be calculated as follows:
$$n_f = \frac{400}{1 + \frac{400}{1000}} = \frac{400}{1.4} \approx 286$$
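The correction for a smaller population can be sketched in the same way. The function name `finite_population_sample_size` is an assumption; the inputs are the ones used in the example above (n = 400, N = 1000).

```python
def finite_population_sample_size(n: float, population_size: float) -> float:
    """Adjust a large-population sample size n for a finite population: nf = n / (1 + n/N)."""
    return n / (1 + n / population_size)


nf = finite_population_sample_size(n=400, population_size=1000)
print(round(nf))  # 286
```

As N grows, the term n/N shrinks towards zero and nf approaches n, which is why the correction is usually ignored for large populations.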
SAMPLE SIZE FORMULA FOR COMPARISON OF GROUPS
If we wish to test a difference (d) between two sub-samples regarding a proportion and can assume an equal number of cases (n1 = n2 = n’) in the two sub-samples, the formula for n’ is
$$n' = \frac{2 z^2 p q}{d^2}$$
E.g. suppose we want to compare an experimental group against a control group with regard to women using contraception. If we expect p to be 40% (0.4) and wish to conclude that an observed difference of 0.10 or more is significant at the 0.05 level, the sample size in each group will be:
$$n' = \frac{2 (1.96)^2 (0.4)(0.6)}{(0.10)^2} = 184.4 \approx 184$$
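The comparison-of-groups calculation can be reproduced as a short sketch. The function name `sample_size_two_groups` is an assumption; the formula and inputs are those of the contraception example.

```python
def sample_size_two_groups(p: float, d: float, z: float = 1.96) -> float:
    """Sample size per group for detecting a difference d between two proportions: 2 * z^2 * p * q / d^2."""
    q = 1 - p
    return 2 * (z ** 2) * p * q / (d ** 2)


n_per_group = sample_size_two_groups(p=0.4, d=0.10)
print(round(n_per_group, 1))  # 184.4
```

The result is the number needed in each of the two groups, so the whole study would need about twice that many subjects.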
USE OF READYMADE TABLE FOR SAMPLE SIZE CALCULATION
How large a sample of patients should be followed up if an investigator wishes to estimate the incidence rate of a disease to within 10% of its true value with 95% confidence?
The table shows that for a relative precision e = 0.10 and a confidence level of 95%, a sample size of 385 would be needed.
The table can be used to find the sample size for other choices of relative precision and confidence level, e.g. if the level of confidence is reduced to 90%, the sample size would be 271.
Such tables give ready-made sample sizes.
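One formula that reproduces both figures quoted from the table is n = (z / e)^2, rounded up, where z is the two-sided standard normal deviate for the chosen confidence level and e is the relative precision. Treating this as the basis of the table is an assumption; the function name `incidence_rate_sample_size` is likewise introduced only for this sketch.

```python
from math import ceil
from statistics import NormalDist


def incidence_rate_sample_size(relative_precision: float, confidence: float) -> int:
    """Assumed table formula: n = (z / e)^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / relative_precision) ** 2)


print(incidence_rate_sample_size(0.10, 0.95))  # 385
print(incidence_rate_sample_size(0.10, 0.90))  # 271
```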
USE OF NOMOGRAM FOR SAMPLE SIZE CALCULATION
To use a nomogram to calculate the sample size, one needs to specify the study group (group 1) and the control group (group 2). The assignment could be arbitrary or based on the study design; the nomogram will work either way.
The researcher should then decide the effect size that is clinically important to detect. This should be expressed in terms of the % change in the response rate compared with that of the control group.
E.g. if 40% of patients treated with standard therapy are cured and one wants to know whether a new drug can cure 50%, one is looking for a 25% increase in cure rate:
(50% - 40%) / 40% = 25%
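The percentage change fed into the nomogram is a relative change, not the absolute difference of 10 percentage points. A minimal sketch of that arithmetic, with a hypothetical helper name `relative_change_percent`:

```python
def relative_change_percent(control_rate: float, new_rate: float) -> float:
    """Change in response rate relative to the control group, in percent."""
    return (new_rate - control_rate) / control_rate * 100


# Cure rates in percent: 40% with standard therapy, 50% hoped for with the new drug
increase = relative_change_percent(control_rate=40, new_rate=50)
print(increase)  # 25.0
```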
USE OF COMPUTER SOFTWARE FOR SAMPLE SIZE CALCULATION & POWER ANALYSIS
The following software can be used for calculating sample size and power:
Epi-info
nQuery
Power & Precision
Sample
STATA
SPSS
Epi-info for sample size determination
In STATCALC:
1. Select SAMPLE SIZE & POWER.
2. Select POPULATION SURVEY.
3. Enter the size of the population (e.g. 15 000).
4. Enter the expected frequency (an estimate of the true prevalence, e.g. 80% ± your minimum standard).
5. Enter the worst acceptable result (e.g. 75%), i.e. the margin of error is 5%.
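For a rough cross-check of what to expect with these inputs, the sketch below applies the formulae from this presentation (n = Z²pq/d², followed by the adjustment nf = n / (1 + n/N)) to a population of 15 000, an expected frequency of 80% and a margin of error of 5%. It is not the software's own routine, and the figure it reports may differ from this hand calculation; the function name `survey_sample_size` is an assumption.

```python
from math import ceil


def survey_sample_size(population_size: int, p: float, d: float, z: float = 1.96) -> float:
    """Large-population sample size, then adjusted for the stated population size."""
    n_large = (z ** 2) * p * (1 - p) / (d ** 2)   # n = Z^2 * p * q / d^2
    return n_large / (1 + n_large / population_size)  # nf = n / (1 + n/N)


nf = survey_sample_size(population_size=15_000, p=0.80, d=0.05)
print(ceil(nf))  # 242
```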
How to use sample size formulae
Steps:
1st: Formulate a research question.
2nd: Select the appropriate study design, primary outcome measure and level of statistical significance.
3rd: Use the appropriate formula to calculate the sample size.
Finally
Sample size determination is one of the most essential components of every research/study.
The larger the sample size, the higher the degree of accuracy, but this is limited by the availability of resources.
It can be determined using formulae, a ready-made table, a nomogram or computer software.
STILL CONFUSED...
Smart people don’t do it alone...
Call a statistician
•Sample selection
•Sample size determination
•Analysis of data