Sample Size Calculations: Basic Principles and Common Pitfalls
doi: 10.1093/ndt/gfp732
Advance Access publication 12 January 2010
CME Series
Marlies Noordzij, Giovanni Tripepi, Friedo W. Dekker, Carmine Zoccali, Michael W. Tanck and Kitty J. Jager
Table 1. Overview of errors in clinical research

Sample result                Population: difference does not exist           Population: difference exists
Difference does not exist    True negative result                            False negative result; Type II error (beta)
Difference exists            False positive result; Type I error (alpha)     True positive result; Power (1-beta)

…differ from each other. On the one hand, the null hypothesis (H0) hypothesizes that the groups of subjects (samples) come from the same source population, i.e. that they are not different from each other. …

…the probability of detecting an effect that is present in a population using a test based on a sample from that population (true positive). The power is the complement of beta: 1-beta. So, in the case of a beta of 0.20, the power would be 0.80 or 80%, representing the probability of avoiding a false-negative conclusion, or the chance of correctly rejecting the null hypothesis.

3. The smallest effect of interest. The smallest effect of interest is the minimal difference between the studied groups that the investigator wishes to detect; it is often referred to as the minimal clinically relevant difference, sometimes abbreviated as MCRD. This should be a difference that the investigator believes to be clinically relevant and biologically plausible. For continuous outcome variables, …
Table 2. Components of sample size calculations and their definitions

Alpha (type I error) - The probability of falsely rejecting H0 and detecting a statistically significant difference when the groups in reality are not different, i.e. the chance of a false-positive result.
Beta (type II error) - The probability of falsely accepting H0 and not detecting a statistically significant difference when a specified difference between the groups in reality exists, i.e. the chance of a false-negative result.
Power (1-beta) - The probability of correctly rejecting H0 and detecting a statistically significant difference when a specified difference between the groups in reality exists.
Minimal clinically relevant difference - The minimal difference between the groups that the investigator considers biologically plausible and clinically relevant.
Variance - The variability of the outcome measure, expressed as the SD in case of a continuous outcome.

Abbreviations: H0, null hypothesis, i.e. the compared samples come from the same source population (the compared groups are not different from each other); SD, standard deviation.
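To make these definitions concrete, the short Python simulation below (added here for illustration, not part of the article; the group size of 64 per arm, SD of 1 and true difference of 0.5 SD are arbitrary choices) draws many two-group samples: under H0 the rejection rate approximates alpha, and under a real difference it approximates the power.

    # Illustrative simulation: repeatedly draw two samples and test them;
    # under H0 the rejection rate approximates alpha, and under a true
    # difference it approximates the power (1-beta).
    import random
    from statistics import NormalDist, mean, stdev

    def two_sided_p(x, y):
        # Two-sample z-test p-value (normal approximation, estimated SDs).
        se = (stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y)) ** 0.5
        z = (mean(x) - mean(y)) / se
        return 2 * (1 - NormalDist().cdf(abs(z)))

    def rejection_rate(true_diff, n=64, sd=1.0, alpha=0.05, trials=2000):
        hits = 0
        for _ in range(trials):
            x = [random.gauss(0.0, sd) for _ in range(n)]
            y = [random.gauss(true_diff, sd) for _ in range(n)]
            if two_sided_p(x, y) < alpha:
                hits += 1
        return hits / trials

    print("rejection rate under H0 (type I):", rejection_rate(0.0))  # ~0.05
    print("rejection rate under H1 (power): ", rejection_rate(0.5))  # ~0.80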
…renal failure patients in Australia is 140 g/m2 with an SD of 60 g/m2.

Sometimes, the minimal clinically relevant difference and the variability are combined and expressed as a multiple of the SD of the observations: the standardized difference. The standardized difference is also referred to as the effect size and can be calculated as:

    standardized difference = (μ1 - μ2) / SD

A summary of all components of sample size calculations can be found in Table 2.

…the value 0.842 should be filled in for b in the formula. These multipliers for conventional values of alpha and beta can be found in Table 3.

Suppose the investigators consider a difference in SBP of 15 mmHg between the treated and the control group (μ1 - μ2) as clinically relevant and want such an effect to be detected with 80% power (0.80) at a significance level (alpha) of 0.05. Past experience with similar experiments, …
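To make the arithmetic concrete, here is a minimal Python sketch of the standard two-group formula this example refers to, n per group = 2 x (a + b)^2 x SD^2 / (μ1 - μ2)^2, with a and b the multipliers for alpha and power from Table 3. Note that the SD of 20 mmHg used below is an assumed value for illustration only; the article's own figure for this example falls outside this excerpt.

    # Sample size per group for comparing two means:
    #   n = 2 * (a + b)^2 * SD^2 / (mu1 - mu2)^2
    # a = multiplier for alpha (1.96 for two-sided alpha = 0.05)
    # b = multiplier for power (0.842 for 80% power), cf. Table 3.
    from statistics import NormalDist

    def n_per_group(diff, sd, alpha=0.05, power=0.80):
        a = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.960
        b = NormalDist().inv_cdf(power)          # ~0.842
        return 2 * (a + b) ** 2 * sd ** 2 / diff ** 2

    # SBP example: 15 mmHg difference, 80% power, alpha 0.05,
    # with an ASSUMED SD of 20 mmHg (illustrative only):
    print(n_per_group(diff=15, sd=20))  # ~27.9 -> round up to 28 per group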
Fig. 1. Nomogram for the calculation of sample size or power (adapted from Altman 1982) [2].
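The nomogram in Fig. 1 links the standardized difference, the total study size and the power. A rough programmatic counterpart (an approximation added here for illustration, assuming two equal-sized groups and a normal test statistic) is:

    # Approximate power for a two-sided test comparing two equal groups,
    # given the standardized difference (effect size) and TOTAL sample size.
    from statistics import NormalDist

    def approx_power(std_diff, n_total, alpha=0.05):
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        return NormalDist().cdf(std_diff * n_total ** 0.5 / 2 - z_alpha)

    # With the illustrative numbers above (15 mmHg difference, assumed SD
    # of 20 mmHg), the standardized difference is 0.75; 56 subjects in
    # total (28 per group) give roughly 80% power.
    print(approx_power(0.75, 56))  # ~0.80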
Table 4. Approximate relative sample size for different levels of alpha and power

                  Alpha (type I error)
Power (1-beta)    0.05    0.01    0.001
0.80              100     149     218
0.90              134     190     266
0.99              234     306     402

…an odds ratio is significantly different from one, after adjustment for potential confounders [3]. Also, sample size …

In most studies, investigators estimate the difference of interest and the standard deviation based on results from a pilot study, published data or their own knowledge and opinion. This means that the calculation of an appropriate sample size partly relies on subjective choices or crude estimates of certain factors, which may seem rather artificial to some. Unless the pilot study was large, using information from a pilot study often results in unreliable estimates of the variability and the minimal clinically relevant difference. By definition, pilot studies are underpowered, and the observed difference in a pilot study is therefore an imprecise estimate of the difference in the population. Not accounting for this sampling error will lead to underpowered studies [7]. Also, published reports could provide an …
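Because the required sample size scales with the squared sum of the two multipliers, (a + b)^2, the relative sizes in Table 4 can be recomputed directly from normal quantiles. A short illustrative sketch (added here), normalizing alpha = 0.05 with 80% power to 100:

    # Recompute Table 4: n is proportional to (a + b)^2, where a and b are
    # the normal multipliers for alpha (two-sided) and power, respectively.
    from statistics import NormalDist

    def multiplier(alpha, power):
        nd = NormalDist()
        return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2

    base = multiplier(0.05, 0.80)  # reference cell, scaled to 100
    for power in (0.80, 0.90, 0.99):
        row = [round(100 * multiplier(alpha, power) / base)
               for alpha in (0.05, 0.01, 0.001)]
        print(power, row)
    # Output matches Table 4:
    # 0.8  [100, 149, 218]
    # 0.9  [134, 190, 266]
    # 0.99 [234, 306, 402]

This makes the trade-off explicit: at 80% power, tightening alpha from 0.05 to 0.01 already requires roughly 50% more subjects.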