A Guide to Sample Size for Animal-based Studies, First Edition. Penny S. Reynolds.
© 2024 John Wiley & Sons Ltd. Published 2024 by John Wiley & Sons Ltd.
Downloaded from https://onlinelibrary.wiley.com/doi/ by University Of Wisconsin-Stout, Wiley Online Library on [19/02/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
of resources, and the ethical requirement to minimise waste and suffering of research animals. Thus, sample size calculations are not a single calculation but a set of calculations, involving iteration through formal estimates, followed by reality checks for feasibility and ethical constraints (Reynolds 2019). Additional challenges to right-sizing experiments include those imposed by experimental design and biological variability (Box 1.2).

BOX 1.2
Challenges for Right-Sizing Animal-Based Studies

Ethics and welfare considerations. The three Rs (Replacement, Reduction, and Refinement) should be the primary driver of animal numbers.

Experimental design. Animal-based research has no design culture. Clinical trial models are inappropriate for exploratory research. Multifactorial agricultural/industrial designs may be more suitable in many cases, but they are unfamiliar to most researchers.

Biological variability. Animals can display significant differences in responses to interventions, making it challenging to estimate an appropriate sample size.

Cost and resource constraints. The financial cost of conducting animal-based research, including the cost of housing, caring for, and monitoring the animals, must be considered in estimates of sample size.

In The Principles of Humane Experimental Technique (1959), Russell and Burch were very clear that Reduction is achieved by systematic strategies of experimentation rather than trial and error. In particular, they emphasised the role of the statistically based family of experimental designs and design principles proposed by Ronald Fisher, still relatively new at the time. Formal experimental designs customised to address the particular research question increase the experimental signal through the reduction of variation. Design principles that reduce bias, such as randomisation and allocation concealment (blinding), increase validity. These methods increase the amount of usable information that can be obtained from each animal (Parker and Browne 2014).

Although it has now been almost a century since Fisher-type designs were developed, many researchers in biomedical sciences still seem unaware of their existence. Many preclinical studies reported in the literature consist of numerous two-group designs. However, this approach is both inefficient and inflexible, and unsuited to exploratory studies with multiple explanatory variables (Reynolds 2022). Statistically based designs are rarely reported in the preclinical literature. In part, this is because the design of experiments is seldom taught in introductory statistics courses directed towards biomedical researchers.

Power calculations are the gold standard for sample size justification. However, they are commonly misapplied, with little or no consideration of study design, type of outcome variable, or the purpose of the study. The most common power calculation is for two-group comparisons of independent samples. However, this is inappropriate when the study is intended to examine multiple independent factors and interactions. Power calculations for continuous variables are not appropriate for correlated observations or count data with a high prevalence of zeros. Power calculations cannot be used at all when statistical inference is not the purpose of the study, for example, assessment of operational and ethical feasibility, descriptive or natural history studies, and species inventories.

Evidence of right-sizing is provided by a clear plan for sample size justification and transparent reporting of the number of all animals used in the study. This is why these items are part of best-practice reporting standards for animal research publications (Kilkenny et al. 2010; Percie du Sert et al. 2020) and are essential for the assessment of research reproducibility (Vollert et al. 2020). Unfortunately, there is little evidence that either sample size justification or sample size reporting has improved over the past decade. Most published animal research studies are underpowered and biased (Button et al. 2013; Henderson et al. 2013; Macleod et al. 2015), with poor validity (Würbel 2017; Sena and Currie 2019), severely limiting reproducibility and translation potential (Sena et al. 2010; Silverman et al. 2017). A recent cross-sectional survey of mouse cancer model papers published in high-impact oncology journals found that fewer than 2% reported formal power calculations, and less than one-third reported sample size per group. It was impossible to determine attrition losses, or how many experiments (and therefore animals) were discarded due to failure to achieve statistical
significance (Nunamaker and Reynolds 2022). The most common sample size mistake is not performing any calculations at all (Fosgate 2009). Instead, researchers make vague and unsubstantiated statements such as 'Sample size was chosen because it is what everyone else uses' or 'experience has shown this is the number needed for statistical significance'. Researchers often game, or otherwise adjust, calculations to obtain a preferred sample size (Schulz and Grimes 2005; Fitzpatrick et al. 2018). In effect, these studies were performed without justification of the number of animals used.

Statistical thinking is both a mindset and a set of skills for understanding and making decisions based on data (Tong 2019). Reproducible data can only be obtained by sustained application of statistical thinking to all experimental processes: good laboratory procedure, standardised and comprehensive operating protocols, appropriate design of experiments, and methods of collecting and analysing data. Appropriate strategies of sample size justification are an essential component.

1.1 Organisation of the Book

This book is a guide to methods of approximating sample sizes. There will never be one number or approach, and sample size will be determined for the most part by study objectives and choice of the most appropriate statistically based study design. Although advanced statistical or mathematical skills are not required, readers are expected to have at least a basic course on statistical analysis methods and some familiarity with the basics of power and hypothesis testing. SAS code is provided in appendices at the end of each chapter, and references to specific R packages are given in the text. It is strongly recommended that everyone involved in devising animal-based experiments take at least one course in the design of experiments, a topic not often covered by statistical analysis courses.

This book is organised into four sections (Figure 1.1).

Figure 1.1: Overview of book organisation. For animal numbers to be justifiable (Are they feasible? appropriate? ethical? verifiable?), sample size should be determined by formal quantitative calculations (arithmetic, probability-based, precision-based, power-based) and consideration of operational constraints.

Part I Sample size basics discusses definitions of sample size, elements of sample size determination, and strategies for maximising information power without increasing sample size.

Part II Feasibility. This section presents strategies for establishing study feasibility with pilot studies. Justification of animal numbers must first address questions of operational feasibility ('Can it work?' Is the study possible? suitable? convenient? sustainable?). Once operational logistics are standardised, pilot studies can be performed to establish empirical feasibility ('Does it work?' Is the output large enough to be measured? consistent enough to be reliable?)
and translational feasibility ('Will it work?' proof of concept and proof of principle) before proceeding to the main experiments. Power calculations are not appropriate for most pilots. Instead, common-sense feasibility checks include basic arithmetic (with structured back-of-the-envelope calculations), simple probability-based calculations, and graphics.

Part III Description. This section presents methods for summarising the main features of the sample data and results. Basic descriptive statistics provide a simple and concise summary of the data in terms of central tendency and dispersion or spread. Graphical representations are used to identify patterns and outliers and explore relationships between variables. Intervals computed from the sample data are the range of values estimated to contain the true value of a population parameter with a certain degree of confidence. Four types of intervals are discussed: confidence intervals, prediction intervals, tolerance intervals, and reference intervals. Intervals shift emphasis away from significance tests and P-values to more meaningful interpretation of results.

Part IV Comparisons. Power-based calculations for sample size are centred on understanding effect size in the context of specific experimental designs and the choice of outcome variables. Effect size provides information about the practical significance of the results beyond considerations of statistical significance. Specific designs considered are two-group comparisons, ANOVA-type designs, and hierarchical designs.

References

Button, K.S., Ioannidis, J.P.A., Mokrysz, C. et al. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 14: 365–376.

Fitzpatrick, B.G., Koustova, E., and Wang, Y. (2018). Getting personal with the "reproducibility crisis": interviews in the animal research community. Lab Animal (NY) 47: 175–177.

Fosgate, G.T. (2009). Practical sample size calculations for surveillance and diagnostic investigations. Journal of Veterinary Diagnostic Investigation 21: 3–14. https://doi.org/10.1177/104063870902100102.

Graham, M.L. and Prescott, M.J. (2015). The multifactorial role of the 3Rs in shifting the harm-benefit analysis in animal models of disease. European Journal of Pharmacology 759: 19–29. https://doi.org/10.1016/j.ejphar.2015.03.040.

Henderson, V.C., Kimmelman, J., Fergusson, D. et al. (2013). Threats to validity in the design and conduct of preclinical efficacy studies: a systematic review of guidelines for in vivo animal experiments. PLoS Medicine 10: e1001489.

Kilkenny, C., Browne, W.J., Cuthill, I.C. et al. (2010). Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biology 8 (6): e1000412. https://doi.org/10.1371/journal.pbio.1000412.

Macleod, M.R., Lawson McLean, A., Kyriakopoulou, A. et al. (2015). Risk of bias in reports of in vivo research: a focus for improvement. PLoS Biology 13: e1002273. https://doi.org/10.1371/journal.pbio.1002273.

Nunamaker, E.A. and Reynolds, P.S. (2022). "Invisible actors"—how poor methodology reporting compromises mouse models of oncology: a cross-sectional survey. PLoS ONE 17 (10): e0274738. https://doi.org/10.1371/journal.pone.0274738.

Parker, R.M.A. and Browne, W.J. (2014). The place of experimental design and statistics in the 3Rs. ILAR Journal 55 (3): 477–485.

Percie du Sert, N., Hurst, V., Ahluwalia, A. et al. (2020). The ARRIVE guidelines 2.0: updated guidelines for reporting animal research. PLoS Biology 18 (7): e3000410. https://doi.org/10.1371/journal.pbio.3000410.

Reynolds, P.S. (2019). When power calculations won't do: Fermi approximation of animal numbers. Lab Animal (NY) 48: 249–253.

Reynolds, P.S. (2021). Statistics, statistical thinking, and the IACUC. Lab Animal (NY) 50 (10): 266–268. https://doi.org/10.1038/s41684-021-00832-w.

Reynolds, P.S. (2022). Between two stools: preclinical research, reproducibility, and statistical design of experiments. BMC Research Notes 15: 73. https://doi.org/10.1186/s13104-022-05965-w.

Russell, W.M.S. and Burch, R.L. (1959). The Principles of Humane Experimental Technique. London: Methuen.

Schulz, K.F. and Grimes, D.A. (2005). Sample size calculations in randomised trials: mandatory and mystical. Lancet 365 (9467): 1348–1353. https://doi.org/10.1016/S0140-6736(05)61034-3.

Sena, E.S. and Currie, G.L. (2019). How our approaches to assessing benefits and harms can be improved. Animal Welfare 28: 107–115.

Sena, E.S., van der Worp, H.B., Bath, P.M. et al. (2010). Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biology 8 (3): e1000344. https://doi.org/10.1371/journal.pbio.1000344.

Silverman, J., Macy, J., and Preisig, P. (2017). The role of the IACUC in ensuring research reproducibility. Lab Animal (NY) 46: 129–135.

Tong, C. (2019). Statistical inference enables bad science; statistical thinking enables good science. American Statistician 73: 246–261.

Vollert, J., Schenker, E., Macleod, M. et al. (2020). Systematic review of guidelines for internal validity in the design, conduct and analysis of preclinical biomedical experiments involving laboratory animals. BMJ Open Science 4 (1): e100046. https://doi.org/10.1136/bmjos-2019-100046.

Würbel, H. (2017). More than 3Rs: the importance of scientific validity for harm-benefit analysis of animal research. Lab Animal 46: 164–166.
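In the spirit of the chapter appendices (the book supplies SAS code and pointers to R packages), the standard two-group power calculation discussed in this chapter can be sketched in a few lines. This is a hypothetical Python illustration, not the book's own code: the function name is invented, and it uses the normal approximation rather than the exact noncentral-t calculation, so answers run slightly below t-based software output.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided, two-group comparison of
    independent means, via the usual normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) / d)**2
    where d is the standardised effect size (Cohen's d)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# A 'large' standardised effect (d = 0.8) at the conventional 5%
# significance level and 80% power:
print(n_per_group(0.8))  # 25 per group (exact t-based answers are slightly larger)
```

As the chapter stresses, this calculation applies only to a two-group comparison of independent observations; it does not carry over to multi-factor designs, correlated outcomes, or zero-inflated count data.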
2
Sample Size Basics
Figure 2.1: Units of replication. (a) Experimental unit = individual animal = biological unit. The entire entity to which an
experimental or control intervention can be independently applied. There are two treatment interventions A or B. Here each
mouse receives a separate intervention, and the individual mouse is the experimental unit (EU). The individual mouse is also
the biological unit. (b) Experimental unit = groups of animals. There are two treatment interventions A or B. Each dam
receives either A or B, but measurements are conducted on the pups in each litter. The experimental unit is the dam (N = 2),
and biological unit is the pup (n = 8). For this design, the number of pups cannot contribute to the test of the central
hypothesis. (c) Experimental unit with repeated observations. The experimental unit is the individual animal (= biological unit)
with four sequential measurements made on each animal. The sample size N = 2. (d) Experimental unit = part of each animal.
There are two treatment interventions A or B. Treatment A is randomised to either the right or left flank of each mouse,
and B is injected into the opposite flank of that mouse. The experimental unit is flank (N = 8). The individual mouse is the
biological unit. Each mouse can be considered statistically as a block with paired observations within each animal.
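The sample sizes quoted in the caption follow mechanically from where the intervention is randomised. A short Python tally may make this concrete (the book's own code is in SAS and R; variable names are invented here, and the per-litter and per-mouse counts are assumptions consistent with the caption):

```python
# Tallying experimental units (EUs) for the designs in Figure 2.1.
# The EU count, not the animal count, is the sample size N.

# (a) Each mouse independently receives A or B: EU = individual mouse.
N_a = 2

# (b) Each dam receives A or B; pups are only measured: EU = dam.
n_dams, pups_per_litter = 2, 8
N_b = n_dams                               # N = 2, not 16
biological_units_b = n_dams * pups_per_litter

# (c) Four sequential measurements on each of two animals: EU = animal.
N_c = 2                                    # repeated measures add no EUs

# (d) A and B randomised to opposite flanks of each mouse: EU = flank.
n_mice_d, flanks_per_mouse = 4, 2
N_d = n_mice_d * flanks_per_mouse          # N = 8

print(N_a, N_b, biological_units_b, N_c, N_d)  # 2 2 16 2 8
```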
applied (Figure 2.1a). Cox and Donnelly (2011) define it as the 'smallest subdivision of the experimental material such that two distinct units might be randomized (randomly allocated) to different treatments.' Whatever happens to one experimental unit will have no bearing on what happens to the others (Hurlbert 2009). If the test intervention is applied to a 'grouping' other than the individual animal (e.g. a litter of mice, a cage or tank of animals, a body part; Figure 2.1b–d), then the sample size N will not be the same as the number of animals.

The total sample size N refers to the number of independent experimental units in the sample. The classic meaning of a 'replicate' refers to the number of experimental units within a treatment or intervention group. Therefore, replicating experimental units (and hence increasing N) contributes to statistical power for testing the central statistical hypothesis. Power calculations estimate the number of experimental units required to test the hypothesis. The assignment of treatments and controls to experimental units should be randomised if the intention is to perform statistical hypothesis tests on the data (Cox and Donnelly 2011).

Independence of experimental units is essential for most null hypothesis statistical tests and methods of analysis and is the most important condition for ensuring the validity of statistical inferences (van Belle 2008). Non-independence of experimental units occurs with repeated measures and multi-level designs and must be handled by the appropriate statistically based designs and analyses for hypothesis tests to be valid.

2.3 Biological Unit

The biological unit is the entity about which inferences are to be made. Replicates of the biological unit are the number of unique biological samples or individuals used in an experiment. Replication of biological units captures biological variability between and within these units (Lazic et al. 2018). The biological unit is not necessarily the same as the experimental unit. Depending on how the treatment intervention is randomised, the experimental unit can be an individual biological unit, a group of biological units, a sequence of observations on a single biological unit, or a part of a biological unit (Lazic and Essioux 2013; Lazic et al. 2018). The biological unit of replication may be the whole animal or a single biological sample, such as strains of mice, cell lines, or tissue samples (Table 2.1).

Table 2.1: Units of Replication in a Hypothetical Single-Cell Gene Expression RNA Sequencing Experiment. Designating a given replicate unit as an experimental unit depends on the central hypothesis to be tested and the study design.

2.4 Technical Replicates

Technical replicates or repeats are multiple measurements made on subsamples of an experimental unit (Figure 2.2). Technical replicates are used to obtain an estimate of measurement error, the difference between a measured quantity and its true value. Technical replicates are essential for assessing internal quality control of experimental procedures and processes, and for ensuring that results are not an artefact of processing variation (Taylor and Posch 2014). Differences between operators and instruments, instrument drift, subjectivity in determination of measurement landmarks, or faulty calibration can result in measurement error. Cell cultures and protein-based experiments can also show considerable variation from run to run, so in vitro experiments are usually repeated several times. At least three technical replicates of Western blots, PCR measurements, or cell proliferation assays may be necessary to assess reliability of technique and confirm validity of observed changes in protein levels or gene expression (Taylor and Posch 2014).

The variance calculated from the multiple measurements is an estimate of the precision, and therefore the repeatability, of the measurement. Technical replicates measure the variability between measurements on the same experimental units. Repeating measurements increases the precision only for estimates of the measurement error; they do not measure variability either within or between treatment groups. Therefore, increasing the number of technical replicates does not improve power or contribute to the sample size for testing the central hypothesis. Analysing technical repeats as independent measurements is pseudo-replication.

High-dimensionality studies produce large amounts of output information per subject. Examples include multiple DNA/RNA microarrays, biochemistry assays, biomarker studies, proteomics, metabolomics, and inflammasome profiles. These studies may require a number of individual animals, either for operational purposes (for example, to obtain enough tissue for processing) or as part of the study design (for example, to estimate biological variation). Sample size will then be determined by the amount of tissue required for the assay technical replicates, or by design-specific requirements for power. Design features include anticipated response/expression rates, expected false positive rate, and number of sampling time points (Lee and Whitmore 2002; Lin et al. 2010; Jung and Young 2012).

Example: Experimental Units with Technical Replication

Two treatments A and B are randomly allocated to six individually housed mice, with three mice receiving A and three receiving B. Lysates are obtained from each mouse in three separate aliquots (Figure 2.2).

The individual mouse is the experimental unit because treatments can be independently and randomly allocated to each mouse. There are three subsamples or technical replicates per mouse. The total sample size is N = 6, with k = 2 treatments, n = 3 mice per treatment group, and j = 3 technical replicates per mouse. The total sample size N is 6, not 18.

2.5 Repeats, Replicates, and Pseudo-Replication

Confusion of repeats with replicates is a problem of study design, and pseudo-replication is a problem of analysis. Study validity is compromised by incorrect identification of the experimental unit. A replicate is
Figure 2.2: Experimental unit versus technical replicates. Two treatments A and B are randomly allocated to six mice. The individual mouse is the experimental unit. Three lysate aliquots are obtained from each mouse. These are technical replicates. The total sample size N is 6, not 18.
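The example in Figure 2.2 translates directly into code. The sketch below is a hypothetical Python illustration (the lysate values and identifiers are invented): technical replicates are averaged within each mouse, so each experimental unit contributes a single observation and the sample size stays at N = 6.

```python
from statistics import mean

# Hypothetical lysate measurements: six mice (the experimental units),
# with three technical-replicate aliquots each.
lysates = {
    "A": {"m1": [4.1, 4.3, 4.2], "m2": [3.9, 4.0, 4.1], "m3": [4.4, 4.2, 4.3]},
    "B": {"m4": [5.0, 5.2, 5.1], "m5": [4.8, 4.9, 5.0], "m6": [5.3, 5.1, 5.2]},
}

# Technical replicates estimate measurement error only, so they are
# collapsed to one mean per mouse before any between-group comparison.
unit_means = {trt: {m: mean(v) for m, v in mice.items()}
              for trt, mice in lysates.items()}

N = sum(len(mice) for mice in unit_means.values())
total_measurements = sum(len(v) for mice in lysates.values()
                         for v in mice.values())
print(N, total_measurements)  # 6 18 — the sample size is 6, not 18
```

Treating the 18 aliquot values as independent observations would be pseudo-replication; only the six unit means enter the test of the central hypothesis.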
correct replication unit, and almost half showed pseudo-replication, suggesting that inferences based on hypothesis tests were likely invalid.

Three of the most common types of pseudo-replication are simple, sacrificial, and temporal. Others are described in Hurlbert and White (1993) and Hurlbert (2009).

Simple pseudo-replication occurs when there is only a single replicate per treatment. There may be multiple observations, but they are not obtained from independent experimental replicates. The artificial inflation of sample size results in estimates of the standard error that are too small, contributing to an increased Type I error rate and an increased number of false positives.

Example: Mouse Photoperiod Exposure

A study on circadian rhythms was conducted to assess the effect of two different photoperiods on mouse wheel-running. Mice in one environmental chamber were exposed to a long photoperiod with 14 hours of light, and mice in a second chamber to a short photoperiod with 6 hours of light. There were 15 cages in each chamber with four mice per cage. What is the effective sample size?

This is simple pseudo-replication. The experimental unit is the chamber, so the effective sample size is one per treatment. Analysing the data as if there is a sample size of n = 60 (or even n = 15) per treatment is incorrect. The number of mice and cages in each chamber is irrelevant. This design implicitly assumes that chamber conditions are uniform and chamber effects are zero. However, variation both between chambers and between repeats for the same chamber can be considerable (Potvin and Tardif 1988; Hammer and Hopper 1997). Increasing the sample size of mice will not remedy this situation because chamber environment is confounded with photoperiod. It is, therefore, not possible to estimate experimental error, and inferential statistics cannot be applied. Analysis should be restricted to descriptive statistics only. The study should be re-designed either to allow replication across several chambers or, if chambers are limited, as a multi-batch design replicated at two or more time points.

Sacrificial pseudo-replication occurs when there are multiple replicates within each treatment arm and the data are structured as a feature of the design (such as pairing, clustering, or nesting), but the design structure is ignored in the analyses. The units are treated as independent, so the degrees of freedom for testing treatment effects are too large. Sacrificial pseudo-replication is especially common in studies with categorical outcomes when the χ² test or Fisher's exact test is used for analysis (Hurlbert and White 1993; Hurlbert 2009).

Example: Sunfish Foraging Preferences

Dugatkin and Wilson (1992) studied feeding success and tankmate preferences in 12 individually marked sunfish housed in two tanks. Preference was evaluated for each fish for all possible pairwise combinations of two other tankmates. There were 2 groups × 60 trials per group × 2 replicate sets of trials, for a total of 240 observations. They concluded that feeding success was weakly but statistically significantly correlated with aggression (P < 0.001) based on 209 degrees of freedom, and that fish in each group strongly preferred (P < 0.001) the same individual in each of the two replicate preference experiments, based on 60 observations.

The actual number of experimental units is 12, with 6 fish per tank. The correct degrees of freedom for the regression analysis is 4, not 209. Suggested analyses for preference data included one-sample t-tests with 5 degrees of freedom or a one-tailed Wilcoxon matched-pairs test with N = 12. Correct analyses would produce much larger P-values, suggesting that interpretation of these data requires substantial revision (Lombardi and Hurlbert 1996).

Temporal (or spatial) pseudo-replication occurs when multiple measurements are obtained sequentially on the same experimental units but analysed as if each represents an independent experimental unit. Sequential observations (or repeated measures) are correlated within each individual. Repeated measures increase the precision of within-unit estimates, but the number of repeated measures does not increase the power for estimating treatment effects.
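The photoperiod example can be checked with simple arithmetic: error degrees of freedom for the treatment comparison come from experimental units (chambers), not from mice or cages. A hypothetical Python sketch (the function name is invented; it assumes a completely randomised comparison):

```python
def error_df(eu_per_treatment: int, n_treatments: int = 2) -> int:
    """Error degrees of freedom for a completely randomised comparison:
    total experimental units minus the number of treatment groups."""
    return eu_per_treatment * n_treatments - n_treatments

# Pseudo-replicated analyses treat mice (n = 60) or cages (n = 15)
# per chamber as independent experimental units:
df_mice = error_df(60)      # 118 — wrongly inflated
df_cages = error_df(15)     # 28  — still wrong
# Correct analysis: the chamber is the experimental unit, one per photoperiod:
df_chambers = error_df(1)   # 0 — no error df, so no inferential test is possible
print(df_mice, df_cages, df_chambers)  # 118 28 0
```

Zero error degrees of freedom is the arithmetic expression of the text's conclusion: with one chamber per treatment, experimental error cannot be estimated and analysis must remain descriptive.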