Biostatistics Decoded - 2nd Edition

Biostatistics Decoded, 2nd Edition, published by John Wiley & Sons in 2020, provides a comprehensive overview of biostatistical concepts and methods. The book is structured to cater to readers at different levels of understanding, with clear explanations and examples to facilitate learning. This edition includes new content on experimental designs and study designs, expanding beyond the focus of the first edition on epidemiological and clinical research.

Biostatistics Decoded, 2nd Edition

This edition first published 2020
© 2020 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as
permitted by law. Advice on how to obtain permission to reuse material from this title is available at
http://www.wiley.com/go/permissions.

The right of A. Gouveia Oliveira to be identified as the author of this work has been asserted in accordance
with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit
us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that
appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty


While the publisher and authors have used their best efforts in preparing this work, they make no
representations or warranties with respect to the accuracy or completeness of the contents of this work and
specifically disclaim all warranties, including without limitation any implied warranties of merchantability or
fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales
materials or promotional statements for this work. The fact that an organization, website, or product is
referred to in this work as a citation and/or potential source of further information does not mean that the
publisher and authors endorse the information or services the organization, website, or product may provide
or recommendations it may make. This work is sold with the understanding that the publisher is not engaged
in rendering professional services. The advice and strategies contained herein may not be suitable for your
situation. You should consult with a specialist where appropriate. Further, readers should be aware that
websites listed in this work may have changed or disappeared between when this work was written and when
it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial
damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data


Names: Oliveira, A. Gouveia, author.
Title: Biostatistics Decoded / A. Gouveia Oliveira.
Description: Second edition. | Hoboken, NJ : Wiley, 2020. | Includes
bibliographical references and index.
Identifiers: LCCN 2020017017 (print) | LCCN 2020017018 (ebook) | ISBN
9781119584209 (cloth) | ISBN 9781119584315 (adobe pdf) | ISBN
9781119584285 (epub)
Subjects: MESH: Biostatistics
Classification: LCC R853.S7 (print) | LCC R853.S7 (ebook) | NLM WA 950 |
DDC 610.72/7–dc23
LC record available at https://lccn.loc.gov/2020017017
LC ebook record available at https://lccn.loc.gov/2020017018

Cover Design: Wiley


Cover Images: Ultrasound Monitor © Belish/Shutterstock, Iris Pairs Plot
© Wikimedia Commons, Graph © Wiley

Set in 9.5/12.5pt STIXTwoText by SPi Global, Pondicherry, India

Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

10 9 8 7 6 5 4 3 2 1
Contents

Preface

1 Populations and Samples
1.1 The Object of Biostatistics
1.2 Scales of Measurement
1.3 Central Tendency Measures
1.4 Sampling
1.5 Inferences from Samples
1.6 Measures of Location and Dispersion
1.7 The Standard Deviation
1.8 The n − 1 Divisor
1.9 Degrees of Freedom
1.10 Variance of Binary Variables
1.11 Properties of Means and Variances
1.12 Descriptive Statistics
1.13 Sampling Variation
1.14 The Normal Distribution
1.15 The Central Limit Theorem
1.16 Properties of the Normal Distribution
1.17 Probability Distribution of Sample Means
1.18 The Standard Error of the Mean
1.19 The Value of the Standard Error
1.20 Distribution of Sample Proportions
1.21 Convergence of Binomial to Normal Distribution

2 Descriptive Studies
2.1 Designing a Research
2.2 Study Design
2.3 Classification of Descriptive Studies
2.4 Cross-sectional Studies
2.5 Inferences from Means
2.6 Confidence Intervals
2.7 Statistical Tables
2.8 The Case of Small Samples
2.9 Student’s t Distribution
2.10 Statistical Tables of the t Distribution
2.11 Inferences from Proportions
2.12 Statistical Tables of the Binomial Distribution
2.13 Sample Size Requirements
2.14 Longitudinal Studies
2.15 Incidence Studies
2.16 Cohort Studies
2.17 Inference from Incidence Studies
2.18 Standardization
2.19 Time-to-Event Cohort Studies
2.20 The Actuarial Method
2.21 The Kaplan–Meier Method
2.22 Probability Sampling
2.23 Simple Random Sampling
2.24 Replacement in Sampling
2.25 Stratified Sampling
2.26 Multistage Sampling

3 Analytical Studies
3.1 Objectives of Analytical Studies
3.2 Measures of Association
3.3 Odds, Logits, and Odds Ratios
3.4 Attributable Risk
3.5 Classification of Analytical Studies
3.6 Uncontrolled Analytical Studies
3.7 Comparative Analytical Studies
3.8 Hybrid Analytical Studies
3.9 Non-probability Sampling in Analytical Studies
3.10 Comparison of Two Means
3.11 Comparison of Two Means from Small Samples
3.12 Comparison of Two Proportions

4 Statistical Tests
4.1 The Null and Alternative Hypotheses
4.2 The z-Test
4.3 The p-Value
4.4 Student’s t-Test
4.5 The Binomial Test
4.6 The Chi-Square Test
4.7 The Table of the Chi-Square Distribution
4.8 Analysis of Variance
4.9 Partitioning the Sum of Squares
4.10 Statistical Tables of the F Distribution
4.11 The ANOVA Table

5 Aspects of Statistical Tests
5.1 One-Sided Tests
5.2 Power of a Statistical Test
5.3 Sample Size Estimation
5.4 Multiple Comparisons
5.5 Scale Transformation
5.6 Non-parametric Tests

6 Cross-sectional Studies
6.1 Linear Regression
6.2 The Least Squares Method
6.3 Linear Regression Estimates
6.4 Regression and Correlation
6.5 The F-Test in Linear Regression
6.6 Interpretation of Regression Analysis Results
6.7 Multiple Regression
6.8 Regression Diagnostics
6.9 Selection of Predictor Variables
6.10 Independent Nominal Variables
6.11 Interaction
6.12 Nonlinear Regression

7 Case–Control Studies
7.1 Analysis of Case–Control Studies
7.2 Logistic Regression
7.3 The Method of Maximum Likelihood
7.4 Estimation of the Logistic Regression Model
7.5 The Likelihood Ratio Test
7.6 Interpreting the Results of Logistic Regression
7.7 Regression Coefficients and Odds Ratios
7.8 Applications of Logistic Regression
7.9 The ROC Curve
7.10 Model Validation

8 Cohort Studies
8.1 Repeated Measurements
8.2 The Paired t-Test
8.3 McNemar’s Test
8.4 Generalized Linear Models
8.5 The Logrank Test
8.6 The Adjusted Logrank Test
8.7 The Incidence Rate Ratio
8.8 The Cox Proportional Hazards Model
8.9 Assumptions of the Cox Model
8.10 Interpretation of Cox Regression

9 Measurement
9.1 Construction of Clinical Questionnaires
9.2 Factor Analysis
9.3 Interpretation of Factor Analysis
9.4 Factor Rotation
9.5 Factor Scores
9.6 Reliability
9.7 Concordance
9.8 Validity
9.9 Validation of Diagnostic Tests

10 Experimental Studies
10.1 Main Design Features and Classification
10.2 Experimental Controls
10.3 Replicates
10.4 Classification of Experimental Designs
10.5 Completely Randomized Design
10.6 Interaction
10.7 Full Factorial Design
10.8 The Random Effects Model
10.9 Components of Variance
10.10 ANOVA Model II and Model III
10.11 Rules for the Definition of the Error Terms
10.12 ANOVA on Ranks

11 Blocking
11.1 Randomized Block Design
11.2 Generalized Randomized Block Design
11.3 Incomplete Block Design
11.4 Factorial Design with Randomized Blocks
11.5 Latin and Greco-Latin Square Design

12 Simultaneous Inference
12.1 Multiple Comparisons
12.2 Generalist Methods
12.3 Multiple Comparisons of Group Means
12.4 Pairwise Comparison of Means
12.5 Different Variances
12.6 Comparison to a Control
12.7 Comparison of post hoc Tests
12.8 Complex Comparisons
12.9 Tests of Multiple Contrasts
12.10 A posteriori Contrasts
12.11 The Size of an Experiment

13 Factorial ANOVA
13.1 The n-Way ANOVA
13.2 The 2^k Factorial Design
13.3 The 2^k Factorial Design with Blocking
13.4 The Fractional Factorial Design

14 Nested Designs
14.1 Split–Plot Design
14.2 Nested (Hierarchical) Design
14.3 Mixed Model Nested ANOVA
14.4 Mixed Model Nested ANOVA with Three Sublevels
14.5 Pure Model II Nested ANOVA

15 Repeated Measures
15.1 Repeated Measures ANOVA
15.2 Repeated Measures ANOVA with Two Factors
15.3 ANOVA with Several Repeated Measures
15.4 Multivariate Tests

16 Clinical Trials
16.1 Classification of Clinical Trials
16.2 The Clinical Trial Population
16.3 The Efficacy Criteria
16.4 Controlled Clinical Trials
16.5 The Control Group
16.6 Blinding
16.7 Randomization
16.8 Non-comparative Clinical Trials
16.9 Regression Toward the Mean
16.10 Non-randomized Controlled Clinical Trials
16.11 Classical Randomized Clinical Trial Designs
16.12 Alternative Clinical Trial Designs
16.13 Pragmatic Clinical Trials
16.14 Cluster Randomized Trials
16.15 The Size of a Clinical Trial
16.16 Non-inferiority Clinical Trials
16.17 Adaptive Clinical Trials
16.18 Group Sequential Plans
16.19 The Alpha Spending Function
16.20 The Clinical Trial Protocol
16.21 The Data Record

17 Analysis of Clinical Trials
17.1 General Analysis Plan
17.2 Data Preparation
17.3 Study Populations
17.4 Primary Efficacy Analysis
17.5 Analysis of Multiple Endpoints
17.6 Secondary Analyses
17.7 Safety Analysis

18 Meta-analysis
18.1 Purpose of Meta-analysis
18.2 Measures of Effect
18.3 The Inverse Variance Method
18.4 The Random Effects Model
18.5 Heterogeneity
18.6 Publication Bias
18.7 The Forest Plot

References
Index

Preface

In this second edition of Biostatistics Decoded, the style of presentation of the concepts and
methods follows the same approach as in the first edition. Thus, the main features adopted in
the first edition are all present in this second edition: easy mathematics requiring knowledge
of little more than the basic arithmetic operations, emphasis on the explanation of the
foundations of statistical concepts and on the rationale of statistical methods, and profuse
illustration with working examples in a detailed step-by-step presentation of the statistical
calculations.

Also, in keeping with the format of the first edition, this book is written at two levels
of difficulty. Readers who only wish to understand the rationale of the statistical methods
may read only the plain text, without loss of continuity, while those readers who want to
understand how the calculations are done should also read the text boxes.

A great deal of new content has been included in this second edition. While the first edition
was focused almost entirely on epidemiological and clinical research, in this edition several
chapters covering experimental designs used in basic science research are introduced. Several
important topics on study designs are also included, such as quasi-experimental designs,
pragmatic clinical trials, new designs of randomized controlled trials, and cluster randomized
trials. Additions in statistical methods include not only an extensive discussion of methods
based on the analysis of variance, but also an introduction to methods that are gaining great
importance in data analysis, such as generalized linear models. The sections on observational
studies have been expanded to better explain the various designs available and the estimates
they produce. The chapter on meta-analysis was also developed further.

The changes to the first edition are not limited to the addition of new content. The
structure of the book was reorganized according to research types, with the statistical
methods used in each type of research presented immediately after the discussion of
the main research designs. This chapter organization contrasts with the typical organization
of most statistical textbooks, where the presentation of statistical methods is often
organized first in basic concepts and applied methods, and then according to the types
of variables. The organization in this book seems more relevant to practicing researchers,
or professionals looking to expand their knowledge of statistical methods in a specific
problem area.

This book is suitable for both introductory and advanced courses on biostatistics and
research methodology, as well as for experienced biostatisticians who may find refreshing
the presentation of statistical methods they may use often without fully understanding the
rationale behind them. The data used in the examples is either fictitious or from personal
observation, and examples of computer outputs and many graphs were produced using Stata
statistical software (Stata Corp., College Station, TX, USA).

I am indebted to the faculty and staff of the Pharmacy Department of the Federal University
of Rio Grande do Norte, Brazil, who gave me the opportunity and motivation to develop my
understanding, skills, and hands-on experience of research designs and statistical methods
applied to basic science during the last seven marvelous years at that department. Many
other people have given me encouragement to pursue this endeavor, probably several of
them not even being aware of the importance of their role, in particular my dear friend
Ingrid Bezerra, dear Professor Ivonete Araújo, my colleague Rand Martins, and, most of
all, my sons Miguel and Ivan to whom I dedicate this book, and, inevitably, Ana Cristina.

1 Populations and Samples

1.1 The Object of Biostatistics

Biostatistics is a science that allows us to make abstractions from instantiated facts,
therefore helping us to improve our knowledge and understanding of the real world. Most people
are aware that biostatistics is concerned with the development of methods and of analytical
techniques that are applied to establish facts, such as the proportion of individuals in the
general population who have a particular disease. The majority of people are probably also
aware that another important application of biostatistics is the identification of
relationships between facts, for example, between some characteristic of individuals and the
occurrence of disease. Consequently, biostatistics allows us to establish the facts and the
relationships among them, that is, the basic building blocks of knowledge. Therefore, it
is generally recognized that biostatistics plays an important role in increasing our
knowledge in biosciences.

However, it is not so widely recognized that biostatistics is of critical importance in the
decision-making process. Clinical practice is largely involved in taking actions to prevent,
correct, remedy, or cure diseases. But before each action is taken, a decision must be made
as to whether an action is required and which action will benefit the patient most. This is, of
course, the most difficult part of clinical practice, simply because people can make decisions
about alternative actions only if they can predict the likely outcome of each action. In other
words, to be able to make decisions about the care of a patient, a clinician needs to be able to
predict the future, and it is precisely here that the central role of biostatistics in
decision making resides.

Actually, biostatistics can be thought of as the science that allows us to predict the future.

How is this magic accomplished? Simply by considering that, for any given individual,
the expectation is that his or her features and outcomes are the same, on average, as those
of the population to which the individual belongs. Therefore, once we know the average
features of a given population, we are able to make a reasonable prediction of the features
of each individual belonging to that population.

Let us take a further look at how biostatistics allows us to predict the future using, as an
example, personal data from a nationwide survey of some 45 000 people in the population.
The survey estimated that 27% of the population suffers from chronic venous insufficiency
(CVI) of the lower limbs. With this information we can predict, for each member of the
population, knowing nothing else, that such a person has a 27% chance of suffering from
CVI. We can further refine our prediction about that person if we know more about
the population. Figure 1.1 shows the prevalence of CVI by sex and by age group. With this
information we can predict, for example, for a 30-year-old woman, that she has a 40%
chance of having CVI and that in, say, 30 years she will have a 60% chance of suffering
from CVI.

Figure 1.1 Using statistics for predictions. Age- and sex-specific prevalence rates of chronic venous
insufficiency. [Plot of prevalence (%) against age (years), with one curve for women and one for men.]
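
The prediction rule just described amounts to a lookup: use the prevalence of the most specific group the person is known to belong to, and fall back on the overall prevalence when nothing else is known. The following is a minimal Python sketch of that idea. The names `OVERALL_PREVALENCE`, `STRATUM_PREVALENCE`, and `predicted_risk` are hypothetical, and the table holds only the three figures quoted in the text, not the full data behind Figure 1.1.

```python
# Overall survey estimate quoted in the text.
OVERALL_PREVALENCE = 0.27

# Hypothetical lookup table: (sex, age in years) -> prevalence.
# Only the two values quoted in the text are filled in.
STRATUM_PREVALENCE = {
    ("female", 30): 0.40,
    ("female", 60): 0.60,
}


def predicted_risk(sex=None, age=None):
    """Return the chance of CVI for the most specific group known.

    Falls back on the overall prevalence when the group is not in the table.
    """
    return STRATUM_PREVALENCE.get((sex, age), OVERALL_PREVALENCE)


print(predicted_risk())              # nothing known about the person
print(predicted_risk("female", 30))  # 30-year-old woman
print(predicted_risk("female", 60))  # the same woman 30 years later
```

A real table would cover every age and sex group; here any group that is not listed simply receives the overall figure.
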
Therefore, the key to prediction is to know about the characteristics of individuals and of
disease and treatment outcomes in the population. So we need to study, measure, and
evaluate populations. However, this is not easily accomplished. The problem is that, in practice,
most populations of interest to biomedical research have no material existence. Patient
populations are very dynamic entities. For example, the populations of patients with acute
myocardial infarction, with flu, or with bacterial pneumonia are changing at every instant,
because new cases are entering the population all the time, while patients resolving the
episode or dying from it are leaving the population. Therefore, at any given instant there
is one population of patients, but in practice there is no possible way to identify and evaluate
each and every member of the population. Populations have no actual physical existence;
they are only conceptual.

So, if we cannot study the whole population, what can we do? Well, the most we can do is
to study, measure, and evaluate a sample of the population. We may then use the
observations we made in the sample to estimate what the population is like. This is what
biostatistics is about: sampling. Biostatistics studies the sampling process and the phenomena
associated with sampling, and by doing so it gives us a method for studying populations
which are immaterial. Knowledge of the features and outcomes of a conceptual population
allows us to predict the features and future behavior of an individual known to belong to
that population, making it possible for the health professional to make informed decisions.
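
The idea of using a sample to estimate what the population is like can be illustrated with a small simulation. This is only a sketch under stated assumptions: unlike a real patient population, the simulated one is fully listed in memory, its size (100 000), the sample size (1 000), and the random seed are arbitrary choices, and the true prevalence is set to the 27% quoted earlier.

```python
import random


def estimate_prevalence(population, sample_size, rng):
    """Estimate a proportion from a simple random sample without replacement."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Simulated population: 1 = has the condition, 0 = does not (assumed 27%).
population = [1] * 27_000 + [0] * 73_000

rng = random.Random(42)  # fixed seed so the run is repeatable
estimate = estimate_prevalence(population, 1_000, rng)

print(f"sample estimate: {estimate:.3f}, value in the simulated population: 0.270")
```

Running the sketch with different seeds gives slightly different estimates, which is the sampling variation that later sections of the chapter deal with.
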
Biostatistics is involved not only in helping to build knowledge and to make individual
predictions, but also in measurement. Material things have weight and volume and are
usually measured with laboratory equipment, but what about things that we know to exist
which have no weight, no volume, and cannot be seen? Like pain, for example. One important
area of research in biostatistics is on methods for the development and evaluation of
instruments to measure virtually anything we can think of. This includes not just things
that we know to exist but are not directly observable, like pain or anxiety, but also things
that are only conceptual and have no real existence in the physical world, such as quality of
life or beliefs about medications.

In summary, biostatistics not only makes an enormous contribution to increasing our
knowledge in the biosciences, it also provides us with methods that allow us to measure things
that may not even exist in the physical world, in populations that are only conceptual, in order
to enable us to predict the future and to make the best decisions.

This dual role of biostatistics corresponds to its application in clinical research
and in basic science research. In the former, the main purpose of biostatistics is to determine
the characteristics of defined populations and the main concern is in obtaining correct
values of those characteristics. In basic science, biostatistics is mainly used to take into
account the measurement error, through the analysis of the variability of replicate
measurements, and to control the effect of factors that may influence measurement error.

1.2 Scales of Measurement

Biostatistical methods require that everything is measured. It is of great importance to select
and identify the scale used for the measurement of each study variable, or attribute, because
the scale determines the statistical methods that will be used for the analysis. There are only
four scales of measurement.

The simplest scale is the binary scale, which has only two values. Patient sex (female,
male) is an example of an attribute measured on a binary scale. Everything that has a
yes/no answer (e.g. obesity, previous myocardial infarction, family history of hypertension)
is measured on a binary scale. Very often the values of a binary scale are not numbers
but terms, and this is why the binary scale is also a nominal scale. However, the values of
any binary attribute can readily be converted to 0 and 1. For example, the attribute sex, with
values female and male, can be converted to the attribute female sex with values 0 meaning
no and 1 meaning yes.
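
The conversion of a binary attribute to 0 and 1 can be sketched in a few lines of Python. The function name `to_indicator` and the observations are made up for illustration.

```python
def to_indicator(values, positive):
    """Recode a two-valued attribute as 0/1, where 1 means `positive`."""
    if len(set(values)) > 2:
        raise ValueError("a binary attribute can take at most two values")
    return [1 if value == positive else 0 for value in values]


sex = ["female", "male", "male", "female"]  # made-up observations
female_sex = to_indicator(sex, positive="female")

print(female_sex)  # [1, 0, 0, 1]
```

One convenient consequence of the 0/1 coding is that the mean of the recoded attribute is the proportion of observations with the value coded 1.
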
Next in complexity is the categorical scale. This is simply a nominal scale with more
than two values. In common with the binary scale, the values in the categorical scale
are usually terms, not numbers, and the order of those terms is arbitrary: the first term
in the list of values is not necessarily smaller than the second. Arithmetic operations with
categorical scales are meaningless, even if the values are numeric. Examples of attributes
measured on a categorical scale are profession, ethnicity, and blood type.

It is important to note that in a given person an attribute can have only a single value.
However, sometimes we see categorical attributes that seem to take several values for
the same person. Consider, for example, an attribute called cardiovascular risk factors with
values arterial hypertension, hypercholesterolemia, diabetes mellitus, obesity, and
tabagism. Obviously, a person can have more than one risk factor and this attribute is called
a multi-valued attribute. This attribute, however, is just a compact presentation of a set
of related attributes grouped under a heading, which is commonly used in data forms.
For analysis, these attributes must be converted into binary attributes. In the example,
cardiovascular risk factors is the heading, while arterial hypertension,
hypercholesterolemia, diabetes mellitus, obesity, and tabagism are binary variables that take
the values 0 and 1.
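
The conversion of a multi-valued attribute into a set of binary attributes can be sketched as follows. This is an illustrative Python example, not a prescribed procedure: the function name `expand_risk_factors` and the patient record are hypothetical, while the list of risk factors is the one given in the text.

```python
RISK_FACTORS = [
    "arterial hypertension",
    "hypercholesterolemia",
    "diabetes mellitus",
    "obesity",
    "tabagism",
]


def expand_risk_factors(recorded):
    """Turn the list of factors ticked on a data form into 0/1 attributes."""
    present = set(recorded)
    unknown = present - set(RISK_FACTORS)
    if unknown:
        raise ValueError(f"unknown risk factor(s): {sorted(unknown)}")
    return {factor: int(factor in present) for factor in RISK_FACTORS}


patient = ["obesity", "diabetes mellitus"]  # made-up record
row = expand_risk_factors(patient)

for factor, value in row.items():
    print(f"{factor}: {value}")
```

Each key of the resulting dictionary is one binary attribute, so a patient with no risk factors gets a row of zeros rather than a missing value.
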
When values can be ordered, we have an ordinal scale. An ordinal scale may have any
number of values, the values may be terms or numbers, and the values must have a natural
order. An example of an ordinal scale is the staging of a tumor (stage I, II, III, IV). There is a
natural order of the values, since stage II is more invasive than stage I and less than stage III.
However, one cannot say that the difference, either biological or clinical, between stage
I and stage II is larger or smaller than the difference between stage II and stage III. In
ordinal scales, arithmetic operations are meaningless.
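
What an ordinal scale does and does not allow can be shown with a short Python sketch. The names `STAGES`, `RANK`, and `is_more_advanced` and the observed stagings are made up. The ranks are used only to compare and sort values, never to add or subtract them.

```python
STAGES = ("I", "II", "III", "IV")  # natural order of the scale
RANK = {stage: position for position, stage in enumerate(STAGES)}


def is_more_advanced(a, b):
    """True if stage `a` comes after stage `b` in the natural order."""
    return RANK[a] > RANK[b]


observed = ["III", "I", "IV", "II", "I"]  # made-up stagings
in_order = sorted(observed, key=RANK.__getitem__)

print(in_order)                     # ['I', 'I', 'II', 'III', 'IV']
print(is_more_advanced("II", "I"))  # True
```

The positions stored in `RANK` are an implementation detail. Treating the gap between two ranks as a quantity would assume equal spacing between stages, which is exactly what an ordinal scale does not guarantee.
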
Attributes measured on ordinal scales are often found in clinical research. Figure 1.2
shows three examples of ordinal scales: the item list, where the subjects select the item
that most closely corresponds to their opinion, the Likert scale, where the subjects read
a statement and indicate their degree of agreement, and the visual analog scale, where
the subjects mark on a 100 mm line the point that they feel corresponds to their
assessment of their current state. Psychometric, behavioral, quality of life, and, in general,
many questionnaires commonly used in clinical research have an ordinal scale of
measurement.

When the values are numeric, ordered, and the difference between consecutive values is
the same all along the scale, that is an interval scale. Interval scales are very common in


Item list:
Compared to last month, today you feel:
    much worse
    worse
    no change
    better
    much better

Likert scale:
Today I feel much better than last month:
    1 (strongly disagree)    2    3    4    5 (strongly agree)

Visual analog scale:
How do you feel today? Place a vertical mark on the line below to indicate how you feel today.
    Worst imaginable [100 mm line] Best imaginable

Figure 1.2 Examples of commonly used ordinal scales.
