Randomised Controlled Trials (RCTs) – Sample Size: The Magic Number?
Y H Chan

Clinical Trials and Epidemiology Research Unit
226 Outram Road
Blk A #02-02
Singapore 169039

Y H Chan, PhD
Head of Biostatistics

Correspondence to: Y H Chan
Tel: (65) 6317 2121 Fax: (65) 6317 2122 Email: chanyh@cteru.gov.sg

INTRODUCTION
A common question posed to a biostatistician by a medical researcher is “How many subjects do I need to obtain a significant result for my study?”. That magic number! In the manufacturing industry, it is permissible to test thousands of components in order to derive a conclusive result, but in medical research the sample size has to be “just large enough” to provide a reliable answer to the research question. If the sample size is too small, the study is a waste of time, as no conclusive results are likely to be obtained; if the sample size is too large, extra subjects may be given a therapy that could perhaps have been shown to be non-efficacious with a smaller sample size(1).

Besides the scientific justification for doing a study, another major reason why a researcher wants an estimate of the sample size is to calculate the cost of the study, which determines whether the study can be conducted within budget. This magic number also helps the researcher to estimate the length of his/her study. For example, the calculated sample size may be 50 (a manageable number), but if the yearly accrual of subjects is 10 (assuming all subjects consent to be in the study), it will take at least five years to complete the study! In that case a multicentre study is encouraged.

STATISTICAL THEORY ON SAMPLE SIZE CALCULATIONS
The Null Hypothesis is set up to be rejected. The philosophical argument is that it is easier to prove a statement is false than to prove it is true. For example, suppose we want to prove that “all cats are black”. Even if you point out black cats to me everywhere, there is still doubt that a white cat could be lying under a table somewhere. But once you bring me a white cat, the hypothesis that ‘all cats are black’ is disqualified.

Hence, if we are interested in comparing two therapies, the null hypothesis will be “there is no difference” versus the Alternative Hypothesis of “there is a difference”. From the above philosophical argument, not being able to reject the null hypothesis does not mean that it is true (just that we do not have enough evidence to reject it).

We want to reject the null hypothesis, but in doing so we could be committing a Type I Error: rejecting the null hypothesis when it is true. In a research study, there is no such thing as “my results are correct”, but rather “how much error am I committing”. For example, suppose that in the population there is actually no difference between the two therapies (but we do not know this; that is why we are doing the study), and after conducting the study a significant difference (p<0.05) is found. This would be a Type I Error.

There are only two reasons for a significant difference (assuming that we have controlled for bias of any kind): either there is actually a difference between the two therapies, or it arose by chance. The p-value gives us this “amount of chance”: it is the probability of obtaining a difference at least as large as the one observed if there were truly no difference between the therapies. If the p-value is 0.03, a difference this large would arise by chance alone only 3% of the time when the null hypothesis is true. If the p-value is very small, then a chance explanation is implausible, and the difference is more likely due to the therapies (still with a small possibility of being “wrong”).

The other situation is not being able to reject the null hypothesis when it is actually false (Type II Error). As mentioned, the main aim of a clinical study is to reject the null hypothesis, and we can improve our chance of achieving this by controlling the type II error(2). This is expressed through the Power of the study (1 – type II error): the probability of rejecting the null hypothesis when it is false. Conventionally, the power is set at 80% or more; the higher the power, the bigger the sample size required.

To be conservative, a two-sided test (which requires a larger sample size) is usually carried out rather than a one-sided test, which assumes that the test therapy will perform clinically better than the standard or control therapy.

To estimate a sample size that will ethically answer the research question of an RCT with a reliable conclusion, the following information should be available.
SAMPLE SIZE CALCULATIONS
Singapore Med J 2003 Vol 44(4) : 173
Type of configuration(4)
Parallel design
The most commonly used design. Subjects are randomised to one of two or more arms of different therapies and treated concurrently.

Crossover design
For this design, subjects act as their own controls and are randomised to a sequence of two or more therapies, with a washout period in between therapies. It is appropriate for chronic conditions that return to their original level once therapy is discontinued.

Type I error and Power(5)
The type I error is usually set at two-sided 5%, and the power at 80% or 90%.

$$m\ (\text{size per group}) = c \times \frac{\pi_1(1-\pi_1) + \pi_2(1-\pi_2)}{(\pi_1-\pi_2)^2}$$

where c = 7.9 for 80% power and 10.5 for 90% power, and π1 and π2 are the proportion estimates. Thus from the above example, π1 = 0.25 and π2 = 0.65. For 80% power, we have

$$m = 7.9 \times \frac{0.25(1-0.25) + 0.65(1-0.65)}{(0.25-0.65)^2} = 7.9 \times \frac{0.415}{0.16} = 20.49$$

Rounding up to 21 per group, 21 × 2 = 42 subjects will be needed.

Table I shows the required sample size per group for π1 and π2 in steps of 0.1, for powers of 80% and 90% at two-sided 5%.
Table I
π1 \ π2    0.2        0.3        0.4        0.5        0.6        0.7        0.8        0.9
0.1        199 (266)  62 (82)    32 (42)    20 (26)    14 (17)    10 (12)    7 (9)      5 (6)
0.2        –          294 (392)  82 (109)   39 (52)    23 (30)    15 (19)    10 (13)    7 (9)
0.3                   –          356 (477)  93 (125)   42 (56)    24 (31)    15 (19)    10 (12)
0.4                              –          388 (519)  97 (130)   42 (56)    23 (30)    14 (17)
0.5                                         –          388 (519)  93 (125)   39 (52)    20 (26)
0.6                                                    –          356 (477)  82 (109)   32 (42)
0.7                                                               –          294 (392)  62 (82)
0.8                                                                          –          199 (266)
Numbers in ( ) are for 90% power.
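The two-proportion calculation can be checked with a short Python sketch. It assumes that c is the usual constant (z(1 − α/2) + z(1 − β))² taken from the standard normal distribution, which evaluates to about 7.85 for 80% power and 10.51 for 90% power (quoted above as 7.9 and 10.5). The function names are hypothetical. `n_two_proportions` is the formula in the text; `n_two_proportions_pooled` is an assumed pooled-variance variant of the normal approximation, included only because it appears to match the Table I entries (for π1 = 0.1 and π2 = 0.2 it gives 199 and 266, whereas the simple formula with c = 7.9 gives 198 at 80% power).

```python
import math
from statistics import NormalDist


def c_constant(alpha: float = 0.05, power: float = 0.80) -> float:
    """Assumed definition of c: (z_{1-alpha/2} + z_{power})^2, two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


def n_two_proportions(p1: float, p2: float, c: float = 7.9) -> float:
    """Size per group from the simple formula in the text (not yet rounded up)."""
    return c * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


def n_two_proportions_pooled(p1: float, p2: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Hypothetical pooled-variance normal approximation, rounded up per group."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    p_bar = (p1 + p2) / 2  # average proportion used for the null-hypothesis variance
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(num ** 2 / (p1 - p2) ** 2)


# Worked example from the text: pi1 = 0.25, pi2 = 0.65, 80% power (c = 7.9).
m = n_two_proportions(0.25, 0.65)
per_group = math.ceil(m)                       # always round a sample size up
print(round(m, 2), per_group, 2 * per_group)   # 20.49 21 42

# Constants behind the tabulated c values.
print(round(c_constant(power=0.80), 2), round(c_constant(power=0.90), 2))  # 7.85 10.51

# First Table I cell (pi1 = 0.1, pi2 = 0.2) at 80% and 90% power.
print(n_two_proportions_pooled(0.1, 0.2),
      n_two_proportions_pooled(0.1, 0.2, power=0.90))  # 199 266
```

Rounding up at the end, as the text does for 20.49 → 21, keeps the approximate power at or above the target; the total is then twice the per-group figure.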
… the magic number being generated is accepted by the user. For this number to be “correct”, the right formula must be used for the right type of design and primary outcome. It is important to note that nearly all the programs provide the sample size for one group and not the total (except for paired designs).

A simple-to-use PC-based sample size program, affordable in cost, is Machin et al’s(6) Sampsize version 2.1, but it can only be installed on Windows 98 and below. Software with network capabilities are SPSS

Continuous outcomes
Two independent samples
The primary outcome of interest is the mean difference in an outcome variable between two treatment groups. For example, it is postulated that a good clinical response
difference between the active and placebo groups is 0.2
units with an SD of 0.5 units, how many subjects will be
required to obtain statistical significance for this clinical
difference?
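For the example above, the standardised difference is δ = 0.2/0.5 = 0.4. The Python sketch below assumes the basic normal-approximation formula m = 2c/δ² per group, with c = 7.9 (80% power) or 10.5 (90% power) at two-sided 5%; the function name `n_two_means` is hypothetical, and some published tables may add a small correction term, so their figures can be a subject or so higher.

```python
import math


def n_two_means(diff: float, sd: float, c: float = 7.9) -> int:
    """Size per group for two independent means, assuming m = 2c / delta^2.

    diff: clinically relevant difference in means
    sd:   common standard deviation of the outcome
    c:    7.9 for 80% power, 10.5 for 90% power (two-sided 5%)
    """
    delta = diff / sd                      # standardised difference
    return math.ceil(2 * c / delta ** 2)   # round up to whole subjects


per_group = n_two_means(0.2, 0.5)          # 2 * 7.9 / 0.16 = 98.75 -> 99
print(per_group, 2 * per_group)            # 99 198
print(n_two_means(0.2, 0.5, c=10.5))       # 90% power: 131.25 -> 132
```

Under this assumption, 80% power needs 99 subjects per group, i.e. 198 in total.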
A simple formula, for a two-sided test of 5%, is 2c
Table III
δ          0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
80% power  792    200    90     52     34     24     19     15     12
90% power  1,052  265    119    68     44     32     24     19     15