Sample Size in Medical Research Ethics

This document discusses sample size and statistical power in medical research. It explains that studies with too small a sample size will lack the power to detect clinically important effects, and may thus be unethical. The document provides an example to illustrate how statistical power increases with larger sample sizes, and how sample size calculations can be done prospectively to ensure adequate power to detect a clinically meaningful effect if present.

1336 BRITISH MEDICAL JOURNAL VOLUME 281 15 NOVEMBER 1980

Medicine and Mathematics

Statistics and ethics in medical research


III How large a sample?
DOUGLAS G ALTMAN

Division of Computing and Statistics, Clinical Research Centre, Harrow, Middx HA1 3UJ
DOUGLAS G ALTMAN, BSC, medical statistician (member of scientific staff)

Whatever type of statistical design is used for a study, the problem of sample size must be faced. This aspect, which causes considerable difficulty for researchers, is perhaps the most common reason for consulting a statistician. There are also, however, many who give little thought to sample size, choosing the most convenient number (20, 50, 100, etc) or time period (one month, one year, etc) for their study. They, and those who approve such studies, should realise that there are important statistical and ethical implications in the choice of sample size for a study.

A study with an overlarge sample may be deemed unethical through the unnecessary involvement of extra subjects and the correspondingly increased costs. Such studies are probably rare. On the other hand, a study with a sample that is too small will be unable to detect clinically important effects. Such a study may thus be scientifically useless, and hence unethical in its use of subjects and other resources. Studies that are too small are extremely common, to judge by surveys of published research.1,2 The ethical implications, however, have only rarely been recognised.3,4

The approach to the calculation of sample size will depend on the complexity of the study design. I will discuss it here in the context of trying to ascertain whether a new treatment is better than an existing one, since it will help if the ideas are illustrated by one of the most common types of research.

Significance tests and power

Despite their widespread use in medical research, significance tests are often imperfectly understood. In particular, few medical researchers know what the power of a test is. This is perhaps because most simple books and courses on medical statistics do not discuss it in any detail, even though it is a concept fundamental to understanding significance tests. Some of the general implications, however, are well appreciated, such as the awareness that the more subjects there are, the greater the likelihood of statistical significance.

Formally, the power of a significance test is a measure of how likely that test is to produce a statistically significant result for a population difference of any given magnitude. Practically, it indicates the ability to detect a true difference of clinical importance. The power may be calculated retrospectively to see how much chance a completed study had of detecting (as significant) a clinically relevant difference. More importantly, it may be used prospectively to calculate a suitable sample size. If the smallest difference of clinical relevance can be specified, we can calculate the sample size necessary to have a high probability of obtaining a statistically significant result (that is, high power) if that is the true difference. For a continuous variable, such as weight or blood pressure, it is also necessary to have a measure of the usual amount of variability. A simple example will, I hope, illustrate the relation between the sample size and the power of a test.

[Figure 1: graph of power (0 to 1.0) against total study size (0 to 1200).]
FIG 1: Relation between sample size and power to detect as significant (p<0.05 or p<0.01) a difference of 0.5 cm when standard deviation is 2 cm.

AN EXAMPLE

Suppose we wish to carry out a milk-feeding trial on 5-year-old children in which a random half of the children are given extra milk every day for a year. We know that at this age children's height gain in 12 months has a mean of about 6 cm and a standard deviation of 2 cm. We consider that an extra increase in height in the milk group of 0.5 cm on average will be an important difference, and we want a high probability of detecting a true difference at least that large.

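The Normal-approximation arithmetic behind fig 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the calculation used to draw the article's figures. It assumes two equal groups, a known common standard deviation σ and a two-tailed test, and it ignores the negligible chance of a significant result in the wrong direction. With N subjects in all, the standard error of the difference between the two group means is 2σ/√N. For a true difference δ and standardised difference Δ = δ/σ, the power is therefore approximately Φ(Δ√N/2 − z), where Φ is the standard Normal distribution function and z is the two-tailed critical value (about 1.96 at the 5% level and 2.58 at the 1% level). The function names are hypothetical. Only the standard library's `statistics.NormalDist` is used.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution (mean 0, SD 1)


def z_crit(alpha):
    """Two-tailed critical value: about 1.96 for alpha=0.05, 2.58 for 0.01."""
    return Z.inv_cdf(1 - alpha / 2)


def power_two_sample(diff, sd, total_n, alpha):
    """Approximate power to detect a true difference `diff` between two means.

    Assumes two equal groups (total_n / 2 each), a known common SD and the
    Normal approximation; significance in the wrong direction is ignored.
    """
    std_diff = diff / sd  # standardised difference
    return Z.cdf(std_diff * sqrt(total_n) / 2 - z_crit(alpha))


def total_size(diff, sd, power, alpha):
    """Smallest even total study size whose approximate power reaches `power`."""
    std_diff = diff / sd
    n = 4 * (z_crit(alpha) + Z.inv_cdf(power)) ** 2 / std_diff ** 2
    return 2 * ceil(n / 2)  # round up so both groups are whole and equal


def detectable_difference(sd, total_n, power, alpha):
    """True difference detectable with the given power by total_n subjects."""
    return 2 * sd * (z_crit(alpha) + Z.inv_cdf(power)) / sqrt(total_n)


if __name__ == "__main__":
    # Milk-feeding trial: SD of 12-month height gain 2 cm, important difference 0.5 cm
    for n in (500, 840):
        print(n, "children: power",
              round(power_two_sample(0.5, 2, n, 0.01), 2), "at 1%,",
              round(power_two_sample(0.5, 2, n, 0.05), 2), "at 5%")
    print(total_size(0.5, 2, 0.85, 0.01))                       # 836
    print(round(detectable_difference(2, 500, 0.85, 0.01), 2))  # 0.65
```

Applied to the milk-feeding trial, the sketch gives values close to those read off fig 1 in the text that follows:

- a total of 836 children for 85% power at the 1% level;
- about 59% power at the 1% level and 80% power at the 5% level with 500 children;
- a detectable difference of about 0.65 cm with 500 children.

Like the nomogram, it treats the standard deviation as known, so for small samples it will tend to overstate the power. A calculation based on the t distribution would be more accurate there.
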
Figure 1 shows the power of the test for a true difference of 0.5 cm. The increase in power with increasing sample size is clearly seen, as is the relation with the significance level. For any given sample size the probability of obtaining a result significant at either the 5% or 1% level, given a true difference in growth of 0.5 cm, can be read off. Power of 80-90% is recommended; fig 1 shows that to achieve an 85% chance of detecting the specified difference of 0.5 cm significant at the 1% level, we would need a total of about 840 children.

If we are told that we can have at most 500 children in all, what will the power be now? Figure 1 shows that the power drops from 85% to 60%. We are now more than twice as likely to miss a true difference of 0.5 cm at the 1% level (a 40% rather than a 15% chance), although the power is still about 80% for a test at the 5% level of significance. Alternatively, and not shown by fig 1, this size of study achieves the same power as the larger one for a difference of 0.65 cm instead of 0.5 cm. Whether or not this is thought sufficient will depend on how far one is prepared to alter one's criteria of acceptability for the sake of expediency. Although they are to some extent arbitrary, it is generally advisable to stick closely to the prestated criteria.

[Figure 2: nomogram with a scale for the standardised difference (0.0 to 1.2), a scale for total study size (N) with markings for significance levels 0.05 and 0.01, and a scale for power (0.05 to 0.995).]
FIG 2: Nomogram for a two-sample comparison of a continuous variable, relating power, total study size, the standardised difference, and significance level.

A NEW SIMPLE METHOD

The formula on which these calculations are based is not particularly simple. Graphs are preferable, but because so many variables are concerned, a large set of graphs like fig 1 would be necessary to calculate sample size for any problem. Greater flexibility, however, is achieved by the nomogram shown in fig 2. This makes use of the standardised difference, which is equal to the postulated true difference (usually the smallest medically relevant difference) divided by the estimated standard deviation. So in the previous example the standardised difference of interest was 0.5/2.0 = 0.25. The nomogram is appropriate for calculating power for a two-sample comparison of a continuous measurement with the same number of subjects in each
group. The only restriction is the common requirement that the variable that is being measured is roughly Normally distributed.

The nomogram gives the relation between the standardised difference, the total study size, the power, and the level of significance. Given the significance level (5% or 1%),* the required value of any one of the other three variables can easily be read off its scale: join the known values of the other two with a straight line and read where it crosses the third scale. By using this nomogram, it is both simple and quick to assess the effect on the power of varying the sample size, the effect on the required sample size of changing the difference of importance, and so on. It is easy to confirm the earlier calculations for the milk-feeding trial.

An estimate of the standard deviation should usually be available, either from previous studies or from a pilot study. Note that the nomogram is not strictly appropriate for retrospective calculations. Although it will be reasonably close for samples larger than 100, for smaller samples it will tend to overestimate the power.

QUALITATIVE DATA

For many studies the outcome measure is not continuous but qualitative: for example, where one is looking for the presence or absence of some condition or comparing survival rates. Peto et al5 have discussed calculating sample size for such studies, and they emphasise the problem of getting enough subjects when either the condition is rare or the expected improvement is not large. For example, about 1600 subjects would be needed to have a power of 90% of detecting (at p<0.05) a reduction in mortality from 15% to 10%. Although the sample size will in general need to be much larger for studies including qualitative outcome measures, the logic behind the calculations is exactly the same as with continuous data, except that a prior estimate of the standard deviation is not needed. Several authors have published graphs for general use.6-8

OTHER TYPES OF STUDY

Sequential designs are similarly amenable to the incorporation of considerations of power at the design stage. Indeed, this is probably much more common for sequential designs than for ordinary randomised studies. For these, and for more complicated designs, it may be particularly helpful to enlist the aid of a statistician when thinking about sample size.

Conclusions

The idea behind using the concept of power to calculate sample size is to maximise, so far as practicable, the chances of finding a real and important effect if it is there, and to enable us to be reasonably sure that a negative finding provides strong grounds for believing that there is no important difference. The effect of the approach outlined above is to make clinical importance and statistical significance coincide, thus avoiding a common problem of interpretation.

Before embarking on a study the appropriate sample size should be calculated. If not enough subjects are available then the study should not be carried out or some additional source of subjects should be found.5 (It should also be borne in mind that expected accession rates tend to be over-optimistic.) The calculations affecting sample size and power should be reported when publishing results. A study2 of 172 randomised controlled trials published in the New England Journal of Medicine and the Lancet from 1973 to 1976 found that none mentioned a prior estimate of the required sample size, and none specified a clinically relevant difference that might allow calculation of the power of their study. Obviously in most of these studies such calculations were not done.

It is surprising and worrying that in such an ethically sensitive area as clinical trials so little attention has been given to an aspect that can have major ethical consequences. If the sample size is too small there is an increased risk of a false-negative finding. A recent survey1 of 71 supposedly negative trials found that two-thirds of them had at least a 10% risk of missing a true improvement of 50%. In only one of the 71 studies was power mentioned as having been considered before carrying out the study. It is surely ethically indefensible to carry out a study with only a small chance of detecting a treatment effect unless it is a massive one, and with a consequently high probability of failure to detect an important therapeutic effect.

*As in the example, these are two-tailed significance levels.

This is the third in a series of eight articles.
No reprints will be available from the authors.

References

1 Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. N Engl J Med 1978;299:690-4.
2 Ambroz A, Chalmers TC, Smith H, Schroeder B, Freiman JA, Shareck EP. Deficiencies of randomized control trials. Clinical Research 1978;26:280A.
3 Newell DJ. Type II errors and ethics. Br Med J 1978;iv:1789.
4 Anonymous. Controlled trials: planned deception? Lancet 1979;i:534-5.
5 Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br J Cancer 1976;34:585-612.
6 Aleong J, Bartlett DE. Improved graphs for calculating sample sizes when comparing two independent binomial distributions. Biometrics 1979;35:875-81.
7 Boag JW, Haybittle JL, Fowler JF, Emery EW. The number of patients required in a clinical trial. Br J Radiol 1971;44:122-5.
8 Mould RF. Clinical trial design in cancer. Clin Radiol 1979;30:371-81.

A right-handed 46-year-old stonemason developed a right axillary vein thrombosis. No haematological, biochemical, or physical abnormalities were found to account for his thrombosis, and he has recovered well taking anticoagulants. Might his condition have been related to his occupation?

It might have been, especially if he had had a spell off work. Axillary vein thrombosis commonly results from unaccustomed use of the arm, including upward movements that compress the vein between clavicle and first rib.

What are the health hazards of taking small babies to public swimming pools?

Mother and baby bathing is a rewarding experience for both parent and child. It aids physical development of the baby and augments the psychological "bonding." Many public bathing pools have special mother (father) and baby bathing sessions, and those interested are advised to try to use this facility. There is the safety advantage of a poolside attendant being present. The best age to start for the baby is from 9 to 12 months, although some enthusiasts may start earlier. Much depends on the development of the baby and the confidence of the parent. The pool should be reasonably warm, between 80-85°F (26-30°C) (most public baths are 70-75°F (21-24°C)), and it is most important to let the baby gain confidence by holding him and only gradually allowing independence in the water. It is preferable to have only parents and babies in the pool, as excited older children shouting and splashing may be frightening. It is unwise to take a baby bathing until at least 1-1½ hours after his last meal. There is no more risk of contracting any infection than in any other social activity, and provided the parent is not over-enthusiastic the chance of an accident is negligible. Small babies take to bathing readily, and parents who have used the special sessions confirm that parent and baby bathing is well worth while.
