Annals of The National Academy of Medical Sciences
Review Article
Sample Size Calculation in Medical Research
Charan et al.
1 Department of Pharmacology, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India
2 Department of Community and Family Medicine, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India
3 Department of Paediatrics, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India
4 Department of Pharmacology, All India Institute of Medical Sciences, Jodhpur, Rajasthan, India
5 All India Institute of Medical Sciences, Jodhpur, Rajasthan, India

Address for correspondence: Rimplejeet Kaur, PhD, Department of Pharmacology, All India Institute of Medical Sciences, Jodhpur 342005, Rajasthan, India (e-mail: [email protected]).
Abstract
Quality of research is determined by many factors, and one such critical factor is sample size. Failure to use the correct sample size in a study might lead to fallacious results in the form of rejection of true findings or approval of false results. Too large a sample size wastes resources, whereas too small a sample size might fail to answer the research question or provide imprecise results and may call the validity of the study into question. Despite being such a paramount aspect of research, the knowledge about sample size calculation is sparse among researchers. Why it is important to calculate sample size, when to calculate it, how to calculate it, and what details about sample size calculation should be reported in research protocols or articles are basics that remain little known to the majority of researchers. The present review addresses these fundamentals of sample size. Sample size should be calculated during the initial phase of planning of the study. Several components are required for sample size calculation, such as effect size, type-1 error, type-2 error, and variance. Researchers must be aware that there are different formulas for calculating sample size for different types of study designs. The researcher must include details about sample size calculation in the methodology section, so that it can be justified; this also adds to the transparency of the study. The literature about calculation of sample size for different study designs is scattered over many textbooks and journals. A thorough literature search was conducted to find the relevant information for this review. This paper presents the sample size calculation formulas in a single review in a simplified manner with relevant examples, so that researchers may adequately use them in their research.

Keywords
► sample size
► medical research
► clinical trials
► case control study
► cohort study
► cross-sectional study
published online April 15, 2021
DOI https://doi.org/10.1055/s-0040-1722104
ISSN 0379-038X.

© 2021. National Academy of Medical Sciences (India).
This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Thieme Medical and Scientific Publishers Pvt. Ltd. A-12, 2nd Floor, Sector 2, Noida-201301 UP, India
Annals of the National Academy of Medical Sciences (India) Vol. 57 No. 2/2021 © 2021. National Academy of Medical Sciences (India).
According to H1, the birth weight of neonates born to women consuming tobacco in pregnancy is less than that of neonates born to women not consuming tobacco in pregnancy.
3. Choosing the primary outcome and suitable statistical test applicable:
The sample size calculation is usually based on the primary objective of the study. Sample size determination is also related to the selection of the statistical test for data analysis, as the calculation of sample size may also be based on the statistical tests that will be used for data interpretation.
4. Selecting significance level and power of study:
Discussed in the “Prerequisites for Sample Size Calculation” section of the article.
5. Calculating sample size manually using formulas or with statistical software.

Prerequisites for Sample Size Calculation

Novice researchers often choose the sample size based on convenience. For example, an orthopaedician who wants to know the prevalence of osteoarthritis in a particular city may include all patients visiting his/her hospital in a particular duration, for example, 2 months. The issue with this selection of population is that the patients with osteoarthritis visiting that particular hospital may not be truly representative of that city, as there are many other hospitals in the city with more orthopaedic patients visiting them.
The most common question that a researcher wonders about is what the adequate sample size for a study should be. As mentioned earlier, various statistical formulas are used for determining the sample size. Application of these formulas requires predetermined information regarding these four components. ►Table 1 shows how to establish these components for sample size calculation.

Level of Significance/Alpha Value/Type-1 Error
It is the cut-off against which the p-value is compared and is often loosely called the p-value. It is defined as the probability of falsely rejecting the H0. For example, we are comparing two drugs for lipid lowering efficacy. The first group receives drug A and the second group receives drug B. Here, the H0 will be that there is no difference in lipid lowering efficacy between these two groups and the H1 will be that drug B is more efficacious than drug A. If for this study we consider a p-value of 0.05 as significant, this will mean that we are accepting a 5% chance of detecting a difference in efficacy between the two groups when in reality there is no difference in efficacy of drug A and drug B at all, that is, a false positive result. For medical research, an α-value of 0.05 is used (►Table 1). The matching confidence levels (CI) for the appropriate level of significance are: (1) CI 95% for the 5% (α = 0.05) level of significance and (2) CI 99% for the 1% (α = 0.01) level of significance.

Power/Type-2 Error
Power is defined as the probability of finding the difference between two study groups if it actually exists. It is an essential tool to measure the validity of the study. Power is calculated from another type of error known as β error or type-2 error. A type-2 error is a false negative result, which means the study fails to detect the difference between two groups when the difference actually exists. The acceptable value of this error should also be decided by the researcher before initiating the study. Conventionally, the acceptable value of β error is 0.20, that is, a 20% chance that the null hypothesis is wrongly accepted. Power of study is calculated as 1 − β. So, if β is 0.20 then power is 0.8, that is, 80%. Thus, the power could be defined as the probability of correctly rejecting the H0. It is usually kept at 80% or above for medical research (►Table 1).

Effect Size
As described earlier, effect size is the difference in the value of the variable between the control group and the test group. If the effect size is high, a smaller sample size will be required to prove the effect, and if the effect size is low, a larger sample size will be needed (►Table 1). The effect size is a numerical value for continuous outcome variables. For example, if the increase in hemoglobin caused by drug A is 1 g/dL and that caused by drug B is 5 g/dL, the effect size will be 4 g/dL. In case of binary
outcomes, the difference between the event rates of the two groups should be considered as the effect size. For example, for the development of anxiety as an adverse effect of a drug (yes/no), if the difference between both groups is 5%, then the effect size is taken as 5%. Effect size value for sample size calculation is warranted for analytical studies and not for descriptive or cross-sectional studies. Effect size value could be determined from previously conducted studies, by conducting a pilot study, or it could be based on the experience of the researcher.

Variance/Standard Deviation
If the endpoint used for the sample size calculation is quantitative, then this parameter is required for comparative studies. Like effect size, the variance/standard deviation is also identified from previously conducted studies, by conducting a pilot study, or it could be based on the experience of the researcher (►Table 1).

Dropout Rate
Dropout rate is an estimate of the number of participants who will leave the study for some reason. To compensate for these possible dropouts, some extra patients need to be accommodated in the sample size. It is calculated by the formula (►Table 1) mentioned below:

$$ N_1 = \frac{n}{1 - d} $$

where N1 is the adjusted sample size, n is the required sample size, and d is the dropout rate.
The components mentioned above are required for calculation of sample size for almost all types of study designs. Besides these, there could be other parameters required as per study design, for example, precision/margin of error for prevalence studies and pooled prevalence for clinical trials. Details on how to calculate them are given in the relevant sections below.

Importance of Pilot Study

As mentioned earlier, for sample size calculation, various components, such as prevalence, variance, effect size, and standard deviation, are derived from the previously published literature. Many times such information is not found on literature search; in such cases pilot studies could be planned. A pilot study is a small-scale study conducted prior to the actual large-scale study to assess feasibility and scientific validity. It also serves as a source of the information required for sample size calculation for the subsequent large study. If the results of the pilot study show that the study is not feasible and useful, then the idea of conducting the larger study could be dropped. This will help in saving time and resources.5

Sample size calculation requires choosing the right formula as per the study design. This article explains in detail the sample size formulas for cross-sectional studies and clinical trials.

Cross-Sectional Studies
In such studies, data are collected at a particular time to answer questions about the status of the population at that particular time. Such studies include questionnaires, disease prevalence surveys, meta-analysis, etc. Cross-sectional studies are also frequently used to show association.6 Cross-sectional studies usually involve estimation of prevalence and estimation of mean.
For estimation of prevalence, the formula used is as follows:

$$ \text{Sample size} = \frac{Z_{1-\alpha/2}^{2} \, p(1-p)}{d^{2}} $$

Where Z1–α/2 is the standard normal variate (1.96 at 5% error; ►Table 2), p is the expected proportion in the population, and d is precision. Precision is a measure of random sampling error. It is of two types as follows:
1. Absolute precision: it refers to the actual uncertainty in a quantity. For example, if the prevalence of tonsillitis in children is 20 ± 10%, the absolute uncertainty is 10%.
2. Relative precision: it expresses the uncertainty as a fraction of the quantity of interest. For our example of a prevalence of 20% with a relative precision of 10%, the uncertainty is 10% of 20%, which is equal to 2% points.
Conventionally, absolute precision is taken as 5% if the prevalence of disease is expected to be between 10 and 90%. If the prevalence is below 10%, then the precision is usually taken as half of the prevalence, and if the prevalence is expected to be more than 90%, d is calculated as 0.5 × (1 − P), where P is prevalence.7
For example, a researcher wants to calculate the sample size for a cross-sectional study to know the prevalence/proportion of asthma in traffic police in a city. As per a previously published study, the prevalence of asthma in traffic police in the city is around 10%, and the researcher wants to calculate the sample size with an absolute precision of 5% and type-1 error of 5%.
Here, Z1–α/2 will be 1.96 (►Table 2),
p = 0.10 (percentage converted into proportion), and
d will be 0.05.
Hence, by putting the values in the above-mentioned formula, the sample size will be as follows:

$$ \text{Sample size} = \frac{1.96^{2} \times 0.10 \times (1 - 0.10)}{0.05^{2}} \approx 138.3 $$

which is rounded up to 139.
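The prevalence formula and the dropout adjustment described above can be sketched in Python as follows. This is a minimal illustration and not part of the original article: the function names are hypothetical, the Z-value is computed from the standard normal distribution instead of being read from a table, and rounding up to the next whole participant is an assumed convention.

```python
import math
from statistics import NormalDist


def z_two_sided(alpha: float) -> float:
    """Standard normal variate Z(1 - alpha/2), e.g. about 1.96 for alpha = 0.05."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def prevalence_sample_size(p: float, d: float, alpha: float = 0.05) -> int:
    """Sample size for estimating a prevalence p with absolute precision d.

    Implements n = Z^2 * p * (1 - p) / d^2 and rounds up (assumed convention).
    """
    z = z_two_sided(alpha)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def adjust_for_dropout(n: int, dropout_rate: float) -> int:
    """Adjusted sample size N1 = n / (1 - d), rounded up (assumed convention)."""
    return math.ceil(n / (1 - dropout_rate))


# Prevalence 20%, absolute precision 5% points, type-1 error 5%, dropout 10%
n = prevalence_sample_size(p=0.20, d=0.05)
print(n, adjust_for_dropout(n, 0.10))
```

With a prevalence of 20%, an absolute precision of 5% points, and a type-1 error of 5%, the sketch gives 246 participants, and 274 after adjusting for a 10% dropout rate, which matches the reporting example given later in this article.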
For the asthma example, the prevalence formula gives 1.96² × 0.10 × (1 − 0.10)/0.05² ≈ 138.3, which is rounded up to 139. Assuming a dropout rate of 10%, this is adjusted as follows:

Adjusted sample size (N1) = n/(1 − d)

Where N1 is the adjusted sample size, n is the required sample size, and d is the dropout rate.

Corrected sample size = 139/[1 − (10/100)] ≈ 154.4, which is rounded up to 155

So, a total of 155 traffic policemen need to be screened for asthma for this study.

Table 2 Z-values for sample size calculation

Value                               Standard normal variate (Z)
α-Value                             Z1–α/2 (two sided)
0.01 (level of significance 1%)     2.58
0.05 (level of significance 5%)     1.96
0.10 (level of significance 10%)    1.64
α-Value                             Z1–α (one sided)
0.01 (level of significance 1%)     2.33
0.05 (level of significance 5%)     1.65
0.10 (level of significance 10%)    1.28
β-Value                             Z1–β
0.01 (power 99%)                    2.33
0.05 (power 95%)                    1.65
0.20 (power 80%)                    0.84

If the study involves estimation of a mean in a cross-sectional study, then the formula for sample size calculation is as mentioned below:

$$ \text{Sample size} = \frac{Z_{1-\alpha/2}^{2} \, SD^{2}}{d^{2}} $$

Where Z1–α/2 is the standard normal variate (1.96 at 5% error; ►Table 2), d is the precision of measurement with respect to the endpoint, and SD is the standard deviation, the value of which is extracted from previously published similar studies, an internal pilot study, or from experienced researchers working in the same area.
For example, estimation of the average blood sugar level in the last trimester of pregnancy in women with gestational diabetes in a particular region is the study objective. On review of literature, a similar study was found with an SD of 30 mg/dL. With this value of SD and a precision of 5 mg/dL around the true value of blood glucose, the formula gives 1.96² × 30²/5² ≈ 138.3, that is, a sample size of 139.

Clinical Trials
In a clinical trial, the researcher could be calculating either the difference between the proportions of two groups or the difference between a quantitative endpoint, that is, the mean, between two groups.
If the clinical trial involves estimation of a qualitative endpoint between two groups, that is, the difference between proportions, then the formula used for sample size calculation is:

$$ \text{Sample size} = \frac{2 \, (Z_{1-\alpha/2} + Z_{\beta})^{2} \, p(1-p)}{(p_1 - p_2)^{2}} $$

Where Z1–α/2 is the standard normal variate, which is 1.96 at 5% error, and Zβ is 0.842 at 80% power (listed as Z1–β in ►Table 2),
p1–p2 is the effect size, that is, the expected difference between two groups, and
p is the pooled prevalence, calculated by adding the prevalence in group 1 and the prevalence in group 2 and then dividing the sum by 2.
As mentioned earlier, the values of effect size and pooled prevalence are obtained from previous studies, a pilot study, or the experience of the researcher.9
For example, one wants to find out the effect of drug A on mortality in patients with colon cancer. For this study, patients will be divided into two groups. One group will receive test drug A and the other group will receive placebo, and the standard drug therapy will be given to both groups.
To calculate the sample size for this study, the information required is the expected difference between the two groups and the pooled prevalence. On searching the literature, it was found that the mortality with standard care treatment is 20%. Since drug A is a new drug, data on mortality are not available; thus, after discussions with other researchers working on this, it was decided that a 50% reduction in mortality can be considered clinically significant, and so the expected mortality in the drug-A group is taken as 10%. Using these values, the effect size will be 10% (20 − 10) and the pooled prevalence will be 15% ([20 + 10]/2). On conversion of the percentages to proportions, the effect size will be 0.10 and the pooled prevalence will be 0.15. On adding these parameters to the formula, the sample size will be:

$$ \text{Sample size} = \frac{2 \times (1.96 + 0.84)^{2} \times 0.15 \times (1 - 0.15)}{0.10^{2}} \approx 200 $$

in each group.
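The two-group clinical trial calculations (difference between proportions, described above, and difference between means, covered in the fasting blood glucose example of this article) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the default Z-values are the rounded figures from Table 2 (1.96 for 5% two-sided error, 0.84 for 80% power), and results are rounded up to whole participants per group.

```python
import math

# Rounded Z-values as listed in Table 2 (assumed defaults for this sketch)
Z_ALPHA_TWO_SIDED_5PCT = 1.96
Z_POWER_80PCT = 0.84


def _round_up(n: float) -> int:
    """Round up, after trimming floating-point noise such as 98.00000000000003."""
    return math.ceil(round(n, 9))


def two_proportions_sample_size(p1: float, p2: float,
                                z_alpha: float = Z_ALPHA_TWO_SIDED_5PCT,
                                z_beta: float = Z_POWER_80PCT) -> int:
    """Per-group size: 2 * (Za + Zb)^2 * p * (1 - p) / (p1 - p2)^2,
    where p is the pooled prevalence (p1 + p2) / 2."""
    pooled = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * pooled * (1 - pooled) / (p1 - p2) ** 2
    return _round_up(n)


def two_means_sample_size(sd: float, difference: float,
                          z_alpha: float = Z_ALPHA_TWO_SIDED_5PCT,
                          z_beta: float = Z_POWER_80PCT) -> int:
    """Per-group size: 2 * SD^2 * (Za + Zb)^2 / d^2,
    where d is the expected difference between the two means."""
    n = 2 * sd ** 2 * (z_alpha + z_beta) ** 2 / difference ** 2
    return _round_up(n)


# Mortality 20% with standard care versus an expected 10% with drug A
print(two_proportions_sample_size(0.20, 0.10))
# SD of 50 and an expected difference of 20 between the two means
print(two_means_sample_size(sd=50, difference=20))
```

For the comparison of means with an SD of 50 and an expected difference of 20, the sketch returns 98 per group, the figure reported in the article's fasting blood glucose example. Using exact rather than rounded Z-values can shift a result by one participant, so the choice of Z-values should be stated when reporting.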
Table 3 Sample size formula with interpretation for different study designs

Type of study: Cohort study10,11
Sample size formula:
$$ \text{Sample size} = \frac{\left[ Z_{1-\alpha/2} \sqrt{\left(1 + \frac{1}{m}\right) p (1 - p)} + Z_{1-\beta} \sqrt{\frac{p_0 (1 - p_0)}{m} + p_1 (1 - p_1)} \right]^{2}}{(p_0 - p_1)^{2}} $$
Interpretation:
• Z1–β = the value for the desired power = 0.84 at 80% power
• Z1–α/2 = the standard normal variate, which is 1.96 at 5% error
• p0 = possibility of event in controls, from previous studies
• p1 = possibility of event in experimental subjects, from previous studies
• m = number of control subjects per experimental subject
• p = [p1 + (m × p0)]/(m + 1)

Type of study: Case control studies10
Sample size formula:
$$ \text{Sample size} = \frac{r + 1}{r} \times \frac{p (1 - p) \, (Z_{1-\beta} + Z_{1-\alpha/2})^{2}}{(p_1 - p_2)^{2}} $$
Interpretation:
• r = control to cases ratio (1 if there is the same number of patients in both groups)
• p = proportion of population = (P1 + P2)/2
• Z1–β = the value for the desired power (0.84 for 80% power and 1.28 for 90% power)
• Z1–α/2 = the standard normal variate, which is 1.96 at 5% error
• P1 = proportion in cases
• P2 = proportion in controls

Type of study: Diagnostic tests12,13
Sample size formula:
For determining sensitivity:
$$ \text{Sample size} = \frac{TP + FN}{P} $$
$$ TN + FP = \frac{Z^{2} \times \text{Specificity} \times (1 - \text{Specificity})}{W^{2}} $$
Interpretation:
• Z is conventionally taken as 1.96, in line with a 95% confidence interval
• P is the prevalence of disease in the study population

Type of study: Animal studies14,15
Sample size formula:
For one-way ANOVA design (group comparison):
• Minimum number of animals per group: $$ n = \frac{10}{k} + 1 $$
• Maximum number of animals per group: $$ n = \frac{20}{k} + 1 $$
Interpretation:
• k = number of groups
• n = number of animals per group
• r = number of repeated measurements

One within factor, repeated-measure ANOVA (one group, repeated measures):
• Minimum number of animals per group: $$ n = \frac{10}{r - 1} + 1 $$
• Maximum number of animals per group: $$ n = \frac{20}{r - 1} + 1 $$
Interpretation:
• N = total number of animals
• k = number of groups
• n = number of animals per group
• r = number of repeated measurements
• If the study involves sacrificing of animals, then n should be multiplied by r.

One between, one within factor, repeated measures ANOVA (group comparison, repeated measurements):
• Minimum number of animals per group: $$ n = \frac{10}{kr} + 1 $$
• Maximum number of animals per group: $$ n = \frac{20}{kr} + 1 $$
Interpretation:
• k = number of groups
• n = number of animals per group
• r = number of repeated measurements

Abbreviations: ANOVA, analysis of variance; CI, confidence interval.
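Two of the formulas in Table 3 can be sketched in Python as shown below. This is a hedged illustration only: the function names and the input values in the usage lines are hypothetical and do not come from the article, the default Z-values are the rounded figures quoted in the table, and the animal-study function returns the unrounded minimum and maximum so that the rounding rule is left to the researcher.

```python
import math


def case_control_sample_size(p_cases: float, p_controls: float, r: float = 1.0,
                             z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Case control formula from Table 3:
    ((r + 1) / r) * p * (1 - p) * (Zb + Za)^2 / (p1 - p2)^2,
    with p = (p1 + p2) / 2 and r = control to cases ratio.
    The result is rounded up (assumed convention)."""
    p = (p_cases + p_controls) / 2
    n = ((r + 1) / r) * p * (1 - p) * (z_beta + z_alpha) ** 2 \
        / (p_cases - p_controls) ** 2
    # round() trims floating-point noise before rounding up
    return math.ceil(round(n, 9))


def one_way_anova_animals_per_group(k: int) -> tuple[float, float]:
    """Animal study, one-way ANOVA design from Table 3:
    minimum n = 10/k + 1 and maximum n = 20/k + 1 (unrounded)."""
    return 10 / k + 1, 20 / k + 1


# Hypothetical inputs: exposure in 40% of cases and 20% of controls, 1:1 ratio
print(case_control_sample_size(0.40, 0.20))
# Hypothetical design with 4 groups of animals
print(one_way_anova_animals_per_group(4))
```

Returning the unrounded bounds for the animal-study design is a deliberate choice in this sketch, since the table itself does not state how fractional values should be rounded.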
If the clinical trial involves a quantitative endpoint, that is, the difference between the means of two groups, the sample size formula is:

$$ \text{Sample size} = \frac{2 \, SD^{2} \, (Z_{1-\alpha/2} + Z_{\beta})^{2}}{d^{2}} $$

Where SD is the standard deviation, which is decided based on a previous study or by the other means discussed earlier in the text, and
d is the effect size, that is, the expected difference between the two means, which will be based on previously available data.
For example, a new antidiabetic drug A is to be evaluated for reduction of the fasting blood glucose (FBG) level in comparison to the old antidiabetic drug B. For this study, diabetic patients will be randomly allocated to two groups; one group will be administered the new drug A and the other group will receive drug B. Literature of previous similar studies suggests that the reduction of FBG by drug A is 20 mg/dL more than by drug B and the SD of the difference is 50 mg/dL. On entering the values in the formula:

$$ \text{Sample size} = \frac{2 \times 50^{2} \times (1.96 + 0.84)^{2}}{20^{2}} = 98 $$

Thus, the sample size needed for this study will be 98 in each group. The sample size may be adjusted to accommodate the dropout rate by the formula mentioned in earlier sections.

What to Mention in Research/Protocol/Report about Sample Size

Reporting of details of sample size calculation is often disregarded in the research protocol, as well as in the final research report/article. On the contrary, it should be mentioned in detail so that the authenticity of the sample size is verifiable (►Table 3).
The following information should be included in the study protocol, illustrated here for a prevalence study of the kind described above:
“Sample size was calculated based on the previously published study by Charan and Kantharia14 in which prevalence of diabetes was 20%. With the absolute precision of 5% points and type-1 error of 5%, the sample size was calculated as 246. After adjusting the sample size for dropout rate of 10%, the final sample size was 274. The sample size was calculated manually by using the formula for cross-sectional studies.”
Thus, the sample size section of the research protocol must contain two references: one for the study from which the prevalence of the disease is derived and another for the source of the sample size formula. Besides this, if any software is used for sample size calculation, then that too needs to be mentioned.

Conclusion

Sample size calculation is one of the important aspects of planning a research study, and any laxity in its estimation may lead to misleading or incorrect findings. The important factors to be considered during calculation of sample size are: type of study, effect size, type of outcome, variance of outcome, significance level, and the power of the test. Sample size calculation requires a thorough review of the literature to determine some of the parameters such as prevalence and effect size. Sample size calculations should be explained in detail in the study protocol and publication, so that they can be verified by anyone.

Funding
None.

Conflict of Interest
None declared.

References
1 Johnston KM, Lakzadeh P, Donato BMK, Szabo SM. Methods of sample size calculation in descriptive retrospective burden of illness studies. BMC Med Res Methodol 2019;19(1):9
2 Uttley J. Power analysis, sample size, and assessment of statistical assumptions—improving the evidential value of lighting research. Leukos 2019;15:143–162
3 Crutzen R, Peters GY. Targeting next generations to change the common practice of underpowered research. Front Psychol 2017;8:1184
4 Halpern SD, Karlawish JH, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA 2002;288(3):358–362
5 Das S, Mitra K, Mandal M. Sample size calculation: Basic principles. Indian J Anaesth 2016;60(9):652–656
6 Habib A, Johargy A, Mahmood K, Humma H. Design and determination of the sample size in medical research. IOSR J Dent Med 2014;13:21–31
7 Naing L, Winn T, Rusli BN. Practical issues in calculating the sample size for prevalence studies. medical statistics. Arch Orofac Sci 2006;1:9–14
8 World Health Organization. Clinical trials. Available at: https://www.who.int/health-topics/clinical-trials/#tab=tab_1. Accessed September 1, 2020
9 Charan J, Biswas T. How to calculate sample size for different study designs in medical research? Indian J Psychol Med 2013;35(2):121–126
10 Süt N. Study designs in medicine. Balkan Med J 2014;31(4):273–277
11 Sharma SK, Mudgal SK, Thakur K, Gaur R. How to calculate sample size for observational and experimental nursing research studies? Natl J Physiol Pharm Pharmacol 2020;10:1–8
12 Baratloo A, Hosseini M, Negida A, El Ashal G. Part 1: simple definition and calculation of accuracy, sensitivity and specificity. Emergency (Tehran) 2015;3(2):48–49
13 Negida A, Fahim NK, Negida Y. Sample size calculation guide - part 4: how to calculate the sample size for a diagnostic test accuracy study based on sensitivity, specificity, and the area under the ROC curve. Adv J Emerg Med 2019;3(3):e33
14 Charan J, Kantharia ND. How to calculate sample size in animal studies? J Pharmacol Pharmacother 2013;4(4):303–306
15 Arifin WN, Zahiruddin WM. Sample size calculation in animal studies using resource equation approach. Malays J Med Sci 2017;24(5):101–105