
Brief Notes on

Biomedical Literature Evaluation

Hisham S. Abou-Auda, Ph.D.


College of Pharmacy, King Saud University

Last Revision: September 2005



Primary Literature

• Original published or unpublished works that introduce new knowledge or enhance existing knowledge

• Experimental studies
• Clinical trials
• Pharmaceutical research
• Educational assessments

• Observational studies
• Cross-sectional
• Case control
• Cohort

• Descriptive Reports
• Case reports
• Case series
• Pharmaceutical practice
• Editorials
• Letter-to-the-Editor

NOTE: A review article published in Pharmacotherapy is not considered primary literature.

Importance of Critical Literature Evaluation Skills

• Critical literature evaluation is the process of reading and evaluating primary literature journal articles in order to arrive at an interpretation that you can call "your own"

• Are the results believable?
• Are the results applicable to your practice?
• Does the paper really support its claims?

• The ultimate interpretation and decision about the value of an article rests
with the reader.

• Many studies are open to differing interpretation

• The majority of clinical decisions are based on primary literature reports.

• Literature evaluation skills are needed when therapeutic dilemmas arise, when information about a particular subject is conflicting, or when making decisions about drug policy (e.g., adding or deleting a drug from an institution's drug formulary).

• Patients often ask pharmacists to help them interpret information that they
have obtained in the lay press.

• It has been estimated that 40 to 50% of published articles in the medical literature have serious problems with study design, statistical analysis, and conclusions.

Reasons for Publications of Poor Quality Research

1. The "Publish or Perish" dilemma
2. Investigator's lack of knowledge in study design or statistical analysis
3. Peer-reviewer's lack of knowledge in study design or statistical analysis
4. Preliminary results published to stimulate further research in the area
5. Scientific fraud

From 1975 to 1983, the FDA conducted routine audits of investigators who were conducting clinical trials regulated by the agency. Fraudulent data were found to have been submitted by 59% (24/41) of investigators disciplined for scientific misconduct.

Validity

• Internal Validity: Within the confines of the study, the results appear to be
accurate, the methods and analysis are appropriate and the interpretations of
the investigators appear supported.

• External Validity: The conclusions of the study can be applied to the reader's practice. Also referred to as "generalizability" of results.

Bias
• Bias is a systematic variation in which treatment groups under study are treated or
measured differently on a consistent basis

• Bias can mislead one to conclude an erroneous outcome

• Not all types of bias can be avoided

Stages Where Bias Can Occur Within a Study
(From Journal of Chronic Diseases 1979;32:51-63)

1. Reading background information or the introduction of the study
2. Defining and choosing the study sample
3. Applying the experimental interventions
4. Measuring the outcomes
5. Analyzing the data
6. Interpreting the analysis and results
7. Publishing the findings

Common Biases Found in Research

(Adapted from Millares. Applied Drug Information. Vancouver: Applied Therapeutics, Inc.; 1998.)

Bias of Rhetoric: Rhetorical techniques used to convince the reader without reference.

One-Sided Reference Bias: Authors may restrict references to only those works which support their conclusions.

Positive Result Bias: Authors are more likely to submit and publish positive results.

Hot Stuff Bias: When a topic is "hot," investigators and editors may not be able to resist publishing additional results, no matter how preliminary or shaky the results may appear.

Suspicion Bias: Knowledge of a subject's prior exposure may influence subject selection or the outcome of an intervention.

Sample Size Bias: Samples which are too small can prove nothing; samples
which are too large can prove anything.

Admission Rate or "Berkson" Bias: When hospitalization rates differ for different exposures/diseases, the relation between exposure and disease can become distorted.

Procedure Selection Bias: Certain clinical procedures may be preferentially offered to those at low risk or those expected to have a favorable outcome.

Missing Clinical Data Bias: Certain clinical data may be missing because they were normal, negative, never measured, or never recorded.

Volunteer Bias: Volunteers from a specified sample may exhibit characteristics
which may differ from non-volunteers.

Contamination Bias: This may occur when members of the control group
inadvertently receive the experimental intervention.

Withdrawal Bias: Patients who withdraw from a study may differ systematically from those who remain.

Therapeutic Personality Bias: When an investigator knows which treatment the subjects are receiving, the outcomes or measurements may be influenced.

Insensitive Measure Bias: When outcome measures are incapable of detecting clinically significant differences.

Apprehension Bias: Certain measurements may be altered from usual values when the patient is apprehensive.

Obsequiousness Bias: Subjects may alter their responses based on what they
perceive to be desired by the investigator.

Attention Bias: Subjects may alter their behavior if they know they are being
observed (also known as the Hawthorne Effect).

Instrument Bias: Defects in the calibration or maintenance of measurement instruments may lead to systematic deviations in results.

Post-hoc Significance Bias: Alpha or beta error levels selected after the data have been analyzed.

Data Dredging Bias: Data are reviewed for all possible associations without a
prior hypothesis.

Tidying-up Bias: Exclusion of outliers or other "untidy" results that is not justified on statistical grounds.

Magnitude Bias: When interpreting a finding, the selection of a scale of measurement may markedly affect the interpretation.

Significance Bias: The confusion of statistical significance with clinical significance.

Correlation Bias: Equating correlation with causation

Confounding
• A variable, other than the treatment under study, that affects the dependent or independent variables and makes it difficult to determine what effect the treatment has on the measured outcomes.

• Confounding variables may hide a true association.

• It may be impossible to eliminate all confounding variables.

• Statistical methods can be used to control for some of the effects of confounding.

Components of an Evaluative Journal Article

• Title: A brief description of the article's subject

• Authors: Informs the reader who conducted the study; the primary author or investigator is listed first

• Abstract: Identifies the purpose, design, and methodology of the study and briefly reviews its results and conclusions

• Introduction: Provides a framework for the purpose of the study through an overview of the literature and references other published studies that support the purpose

• Methods: A detailed description of the techniques used to conduct the study; enough information should be provided so that the reader could replicate the study. (This is the most important component of the article and must be evaluated with careful scrutiny!)

• Results: Organizes and simplifies the raw data, but does not interpret the results

• Discussion: Authors interpret the data and debate the significance of their findings

• Conclusion: Provides a one- to two-sentence summary of the study's purpose and results

• References: Lists all references cited throughout the study

Evaluating the Journal
Characteristics of Reputable Scientific Journals

• Editorial policy that specifies the requirements for the types and
formats of submitted manuscripts

• Peer-review policy which requires that all manuscripts be reviewed by consultants or researchers in the field of study prior to publication

• An editorial board that is composed of well-known researchers and leaders in their respective disciplines (the editorial board is usually listed on or near the title page)

• Does not contain an overabundance of advertisements

• Reputable journals usually publish supplements. Keep in mind that information found in supplements may not be peer-reviewed.

• Clinical studies are rarely published in "throw-away" journals. Throw-away journals are characterized by being free to readers, having a high advertisement-to-text ratio, not being owned by professional societies, and having variable peer-review processes.

Evaluating the Investigators

• The investigators should have appropriate training and expertise in the area.

• The investigators should have a good track record of prior research.

• A biostatistician should be involved with the evaluation of data.

• The validity of studies authored entirely by investigators working in the pharmaceutical industry is sometimes questioned because the goal of these individuals is primarily promotion of the company's drug products.

• The source of funding and potential conflicts of interest should be disclosed by the authors. Complete funding by pharmaceutical companies may be an issue of concern. However, potential bias does not necessarily negate the study results.

• The research site should have appropriate resources and technology to effectively conduct the study.

Evaluating the Title

• The title should be brief and catch the attention of readers interested in the
topic.

• The title should indicate what the article is about without drawing any
conclusions. A title that sounds like a newspaper headline may indicate
bias by the authors.

• The title should not give the impression of answering questions that the study is not designed to answer.

Evaluating the Abstract

• Abstracts should provide a very brief overview of the study, and provide
enough information for the reader to determine whether the study is of
interest.

• Data presented in the abstract may not be discussed in the body of the article, and important data may be omitted from the abstract due to space limitations. Abstracts should not be used as substitutes for careful analysis of the study, and clinical decisions should not be made based on information from the abstract alone. (i.e., you must read the entire article to accurately evaluate the study)

• Inflammatory language may indicate the potential for bias

• Certain journals may have formats for "structured" abstracts to improve quality. Structured abstracts include: objective, research design, clinical setting, participants, interventions, main outcome measurements, results, and conclusions.

Evaluating the Introduction

• The introduction should provide valid reasons supporting the need to conduct the study; all factual statements must be referenced, and the authors should appropriately interpret the available literature for applicability and relevance.

• References should be up-to-date and mostly from the primary literature, not the tertiary literature. Authors should avoid citing only their own past research.

• A good introduction should provide a complete synopsis of the literature published to date and include the most important studies.

• Refer to original articles to confirm suspicious data. If the data are misquoted, then the credibility of the study or report is questionable.

• Conduct your own literature search to confirm that the authors have completely reviewed the literature. Authors may choose to ignore recent work that disagrees with their own.

• The study objective usually appears at the end of the introduction, and should be stated in a clear and concise manner. If the authors fail to state an objective, it may be an indication that the study was not well planned.

• A research and a null hypothesis must be formulated for each study and should be stated in the introduction of an article:

Alternative Hypothesis (HA): A difference exists between groups
Null Hypothesis (H0): A difference does not exist between groups

Evaluating the Methods (the most important section!)
Important Definitions
Randomization: A procedure equivalent to flipping a coin that helps
ensure that treatment groups are similar. When studies
are randomized, subjects have an equal and
independent chance of receiving any of the treatment
modalities. Appropriate randomization techniques
include the use of random number tables, computer-generated random numbers, or lotteries.

Stratification: A randomization procedure that divides subjects into equal groups to control for differences in confounding variables. Stratification allows separate estimates to be made for groups of individuals who have the same values for the confounding variable.

Open-label: A study in which there is no blinding. Both the investigator and subjects are aware of the assignment of the treatment groups.

Single-blind: Either subjects or investigators are unaware of assignment of subjects to active or control groups.

Double-blind: Both subjects and investigators are unaware of assignment of subjects to active or control groups.

Triple-blind: Both subjects and investigators are unaware of assignment of subjects to active or control groups; another group involved with interpretation of data is also unaware of subject assignment.

Double-Dummy: A study in which two placebos are needed to achieve proper blinding of treatments.

Parallel study: All subjects receive only one treatment. Parallel studies are most appropriate when therapies are definitive or when disease states are self-limited (e.g., antimicrobials for infectious diseases).

Cross-Over Study: All subjects receive both treatments being studied and outcomes are assessed for each therapy. Cross-over studies are appropriate when diseases are highly variable (e.g., pain). Advantages include a smaller required sample size and decreased error caused by variability.

Important Considerations for Cross-Over Studies
• Appropriate wash-out period
• Diseases with exacerbations and remissions
• Subjects must be randomly assigned to treatment order
• Blinded to time of cross-over
• Subject dropouts and deaths should be minimized

Controls: A group of persons used for comparison with a study group. Ideally, the control group is identical to the study group except that it has not been exposed to the treatment under study.

Placebo-controlled: A study in which the "control" group receives a placebo that is identical to the study drug in terms of appearance, taste, and smell, but does not contain active drug.

Active-controlled: A study in which the "control" group receives treatment with another pharmacologically active medication.

Historical control: A study in which data from previously conducted trials or groups of previously treated patients are used for comparison with the current study population.

Run-In Period: Preinvestigation observation of patients, usually designed to ensure that they are appropriate candidates for entrance into a randomized clinical trial (e.g., to ensure adherence to therapy).

• The protocol must be approved by the institutional review board of the institution to ensure patient safety and sound clinical research.

• All subjects must agree to participate by signing an informed consent that clearly states the risks and benefits of participating in the study.

• The inclusion and exclusion criteria for the subjects should be appropriate for the
topic of study and clearly defined. Subjects enrolled in the study should have
characteristics that are representative of patients with disease, and severity
criteria should be appropriate.

• The methods section should describe in detail how the patients were selected to
participate in the study. Where is the patient population from?

• Patients should be randomly assigned to treatment groups to ensure that groups are homogenous. All demographic information should be presented. (Always check to make sure that baseline characteristics between groups are similar)

• The medication doses, routes and frequencies of administration, and duration of treatment should be appropriate for the condition studied.

• All concomitant therapies should be clearly described, and diet and lifestyle characteristics should be similar between groups.

• Compliance with therapy and adverse events should be monitored and recorded appropriately.

• All measurements should be standardized and conducted at appropriate intervals. Data collection forms and instruments should be validated.

• Power calculations and the level of statistical significance should be determined a priori.

Evaluating the Results

• Data should be presented in a clear and understandable format. Data presented in the abstract, charts, tables, or graphs should be consistent with what is described in the text.

Review of Statistical Methods used in Biomedical Literature

Statistical Analysis: The organization and mathematical manipulation of data used to describe characteristics that have been studied and to formulate conclusions from the data.

Statistically Significant: The difference that exists between study groups cannot be explained by the role of chance alone.

Clinically Significant: The difference that exists between study groups is substantial enough to be clinically useful.

Type I Error: An error that occurs when statistical tests show that a
difference exists between study groups, but in fact there is no
difference (α-error)

Type II Error: Statistical tests conclude that no difference exists between study groups, but in fact a difference does exist (β-error)

Type I Error
• α is the probability of making a Type I error

• Acceptable values for α error are selected before the study is conducted (a
priori) to minimize the risk of an incorrect conclusion

• The generally accepted value used for α error in biomedical studies is ≤ 0.05

• An α ≤ 0.05 means that type I error will occur less than 5% of the time, and
that 95% of the time investigators can be sure that the results cannot be
explained by chance alone

• After the study is completed, a p-value is calculated from the data collected to
determine the actual observed significance

• If the p-value is less than the preset α, it is assumed that the data supports the
hypothesis 95% of the time

Type II Error
• β is the probability of Type II error and is directly related to the study's "power"
(Power = 1 - β)

• β depends on the number of subjects (n) and the size of the difference the investigators are trying to detect; the desired power must be preset before the study is conducted (a priori)

• The arbitrarily accepted level of β-error is ≤ 0.2, which translates to a power of 80%

• A power of < 80% (or 0.80) suggests that there may not be enough study subjects
to detect a difference between groups

Information Needed for a Sample Size Calculation
• Mean difference the investigator wishes to detect
• Anticipated standard deviation within the study population
• Desired power (1 − β)
• Preset α
• One- or two-tailed analysis

• An example of a sample size calculation formula for comparison of sample means:

n = 2 [ PI × σ / (µ2 − µ1) ]²

where n = desired sample size for each group,
PI = desired power index,
σ = anticipated standard deviation,
and µ2 − µ1 = the mean difference one wishes to detect

(NOTE: This is an example; the specific equation used will vary depending on the type of data, number of groups, etc.)

• The smaller the difference one wishes to detect between groups, the larger the
sample size will be. For example, to detect a 20% difference in cure rates
between two antibiotics, 20 patients will be needed to have 80% power.
However, to detect a difference of 10%, 200 patients will be needed to
have 80% power.
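The formula above can be sketched in code. The power index PI is assumed here to equal z(1 − α/2) + z(1 − β), a common convention that the notes do not define explicitly; the corresponding z-values are supplied as defaults and are themselves an assumption of this sketch.

```python
import math

def sample_size_per_group(sigma, delta, z_alpha=1.96, z_beta=0.842):
    """Sketch of n = 2 * [PI * (sigma / delta)]**2, rounded up.

    PI (the power index) is assumed to be z_(1-alpha/2) + z_(1-beta);
    the defaults correspond to alpha = 0.05 (two-tailed) and 80% power.
    """
    pi = z_alpha + z_beta                 # power index (assumed definition)
    n = 2 * (pi * sigma / delta) ** 2     # sample size per group
    return math.ceil(n)                   # round up to whole subjects

# Detecting a difference of half a standard deviation (sigma = 1, delta = 0.5):
print(sample_size_per_group(sigma=1.0, delta=0.5))
```

Halving the detectable difference roughly quadruples the required sample size, consistent with the inverse-square relationship between n and the mean difference in the formula.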

"One-Tailed" vs. "Two-Tailed" Analysis

• Refers to the distribution of the test statistic

• One-tailed analyses are designed to detect deviations from the null hypothesis in only one direction. The test will only detect whether drug A is more effective than drug B; it cannot provide a valid determination of whether drug A is less effective than drug B. One-tailed tests are typically used when comparing an active drug to placebo. In this situation, one is reasonably sure that the active drug is at least as effective as the placebo, and it is highly unlikely that the active drug will be worse than placebo.

• A two-tailed test is designed to detect deviations from the null hypothesis in both directions. This type of analysis will determine if drug A is better than drug B, equal to drug B, or worse than drug B.

Descriptive Statistics

Measures of central tendency:
• mean
• median
• mode

Measures of variability:
• range
• standard deviation
• coefficient of variation
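As a quick illustration with hypothetical data, all of these measures can be computed with Python's standard library:

```python
import statistics

# Hypothetical sample of eight measurements
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability
rng = max(data) - min(data)
sd = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
cv = sd / mean                # coefficient of variation = SD / mean

print(mean, median, mode, rng, round(sd, 3), round(cv, 3))
```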

Incidence
Incidence is the probability that a healthy person will develop a disease within a specified period. It is the number of new cases of disease in the population within a specific period.

Incidence rate = Number of new cases of disease / Population at risk

Prevalence
Prevalence measures the number of people in the population who have the disease at a given time. It is the probability of people having a disease within a specified time frame.

Prevalence = Number of existing cases of disease / Total population
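The two frequency measures above can be illustrated with made-up numbers (all counts here are hypothetical):

```python
# Hypothetical town of 10,000 people: 400 have the disease at the start of
# the year, and 120 new cases develop during the year among the 9,600
# people initially at risk.
new_cases = 120
population_at_risk = 10_000 - 400
existing_cases = 400 + 120          # cases present at the end of the year
total_population = 10_000

incidence_rate = new_cases / population_at_risk   # new cases / at risk
prevalence = existing_cases / total_population    # existing cases / total

print(incidence_rate, prevalence)
```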

Relative Risk
Relative risk is a measure of disease frequency when a specific factor is present or
absent. An actual risk can only be measured when a cohort type of study design is
used. Prospective studies allow for defining populations at risk and also allow for
calculation of excess risk caused by exposure to a particular factor.

RR = (Incidence rate among those exposed to a particular factor) / (Incidence rate among those not exposed to the same factor)

Odds Ratio
The odds ratio is also a measure of disease frequency. An odds ratio is an estimator of relative risk and is calculated when prospective studies evaluating exposure to certain factors are not practical. Odds ratios are calculated when retrospective case-control studies are used. When calculating an odds ratio, three assumptions must be made: the control group is representative of the general population, the cases are representative of the population with the disease, and the frequency of the disease in the population is small.

Odds Ratio = (Cases with exposure × Controls without exposure) / (Cases without exposure × Controls with exposure)
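Both measures can be sketched from a 2×2 table; the counts below are hypothetical:

```python
# Hypothetical 2x2 table from a cohort of 200 subjects:
#                 Disease   No disease
# Exposed           a=10       b=90
# Unexposed         c=5        d=95
a, b, c, d = 10, 90, 5, 95

risk_exposed = a / (a + b)          # incidence among the exposed
risk_unexposed = c / (c + d)        # incidence among the unexposed
relative_risk = risk_exposed / risk_unexposed

# Cross-product ("ad/bc") estimator used in case-control studies
odds_ratio = (a * d) / (b * c)

print(relative_risk, round(odds_ratio, 2))
```

With these counts the odds ratio (about 2.11) is close to the relative risk (2.0), illustrating the rare-disease assumption under which the OR approximates the RR.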

95% Confidence Intervals
A statistical term that represents the interval of numerical values within which you can be 95% confident that the population value you are estimating lies. This is also known as an interval estimate.

Inferential Statistics
Used to draw conclusions about the null hypothesis and to determine whether an observed difference can be explained by chance alone.

The type of statistical test that is used is based on:
• Data type utilized in the study
• Number of groups for comparison
• Type of study design

Types of Data
Continuous: Characterized by having an infinite number of evenly spaced potential values between any two points.
Examples: age, weight, serum creatinine, etc.

Nominal: Characterized by the arbitrary assignment of numbers to different characteristics that have a finite number of possible values.
Examples: race or sex

Ordinal: Data that are ranked in a specific order with no consistent magnitude of difference between ranks.
Examples: stage of disease or opinion scores (ranked 1-5)

Dependent variable
The outcome variable of interest in a research study. The outcome that one intends
to explain or estimate.

Independent variable
Variables that affect the corresponding measurement of the dependent variable in a research study. Independent variables define the conditions under which the dependent variable is to be examined.

Parametric
Data collected from a sample that can be described using a "normal distribution". When plotted, the data are symmetrical and continuous, forming a bell-shaped curve with the mean value corresponding to the highest point on the curve. Used to refer to interval or ratio data.

Nonparametric
Data collected from a sample that is not normally distributed. Used to refer to ordinal
or nominal data.

Common Statistics for Nominal Measurements

• Chi-Square: A statistical test used to compare nominal data from independent groups when only two possible choices exist. This test should not be used if the expected value in any cell is less than 5.

• Fisher's Exact test: Another statistical test used to compare nominal data from 2
independent groups. (used when expected frequencies are < 5)

• McNemar's test: A variant of the Chi Square test used to compare nominal data
from two matched or paired groups (e.g., when data is
collected from the same patient at different time periods).

• Contingency Table Analysis: A test used to compare nominal data when there are three or more groups or three or more possible outcomes. Also referred to as "R X C" or "Row-by-Column".

• Cochran Mantel-Haenszel Test: A test used to compare nominal data when more than one independent variable exists. It is often used when data from several contingency tables are combined and analyzed together.

• Bonferroni Correction: A modification commonly used in statistical tests to make adjustments in p-values in order to minimize type I error when comparing several groups. (the more comparisons that are made, the greater the risk for type I error)
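For a 2×2 table, the chi-square statistic can be computed by hand with the standard shortcut formula (a sketch with hypothetical counts; real analyses would use a statistics package):

```python
# Chi-square statistic for a 2x2 table, using the shortcut formula:
# chi2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
# With the counts below every expected cell value is 25, i.e. >= 5,
# so the test's expected-frequency requirement is satisfied.
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

#                 Cured   Not cured
# Drug A           20        30
# Drug B           30        20
chi2 = chi_square_2x2(20, 30, 30, 20)
print(chi2)  # compare against the critical value 3.84 (df = 1, alpha = 0.05)
```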

Common Statistics for Ordinal Measurements

• Wilcoxon rank sum test: A statistical test used to determine differences in ordinal measurements taken from 2 independent groups. The test involves combining the data from both samples and ranking them from smallest to largest.

• Mann-Whitney U test: A statistical test used to determine differences in ordinal measurements taken from 2 independent groups. This test involves comparing each data value from one group to the values obtained in the other group.

(The Wilcoxon rank sum test and the Mann-Whitney U test use different methods to calculate the exact same p-values)

• Kruskal-Wallis rank sum: A single statistical test used to compare differences in ordinal data taken from 3 or more independent groups. If a difference is detected, then additional statistical tests are required to determine exactly which groups are different, since there can be many comparisons.

• Wilcoxon signed-rank test: A statistical test used to compare ordinal data that are collected using repeated measures. (e.g., when data are collected from the same individual at different time points, also known as a crossover study design)

• Friedman's test: A statistical test used to compare ordinal measurements from 3 or more related groups when the data are collected using repeated measures.

Common Statistics for Interval and Ratio Measurements
• Student's t test: A test statistic used to compare continuous (interval or ratio)
data collected from 2 independent groups with equal variances. If
the data are not normally distributed, then the appropriate
corresponding nonparametric test statistic is the Mann-Whitney
U test.

• Paired t test: A test statistic used to compare continuous (interval or ratio) data
collected from the same individual using repeated measures. If
the data are not normally distributed, then the appropriate
nonparametric test statistic is the Wilcoxon signed-rank test.

• Analysis of variance (ANOVA): A single test statistic used to compare continuous data collected from 3 or more groups, assuming the data are normally distributed and the variances of all groups are equal. The Kruskal-Wallis test is the appropriate nonparametric test for this instance.

Note: If the p-value calculated from an ANOVA test reports statistical significance (p < 0.05), one can conclude that a significant difference among groups exists. However, to determine which groups are significantly different, additional statistical tests (post hoc methods) are needed. Post hoc methods used to adjust for multiple comparisons include:
• Tukey's test
• Duncan's test
• Dunnett's test
• Student-Newman-Keul's test (SNK)
• Scheffé's test
• Fisher’s Least significant difference test (LSD)
• Bonferroni test

• Analysis of Covariance (ANCOVA): A statistical test that compares differences in a continuous variable while adjusting for variations that may result from differences between treatment groups (e.g., age, weight, etc.). The control variables used are known as covariates.

Regression Models
• Regression models: Statistical techniques used to determine if there are associations between two variables. A correlation coefficient is calculated (denoted by the letter "r"). The correlation coefficient spans between +1 and -1. Perfect correlation would result in a correlation coefficient of 1.00 (r = 1.00), and if the r value is -1, then a perfect negative correlation exists.

• Spearman's correlation coefficient: A correlation coefficient used to assess possible associations when data for the regression model are nonparametric.

• Pearson's correlation coefficient: A correlation coefficient used to assess possible associations when data for the regression model are parametric.

• Multivariate regression analysis: A statistical technique used to determine if there is a relationship between multiple independent variables and a single dependent variable. A single continuous variable is chosen, and other variables are evaluated to determine their effect on the dependent variable.

• Logistic regression: A statistical technique also used to determine if there is a relationship between multiple independent variables and a single dependent variable. This method of analysis is designed for a nominal dependent variable.
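A minimal illustration of the correlation coefficient r, computed directly from its textbook definition with hypothetical data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # covariance term
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))            # spread of x
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))            # spread of y
    return cov / (sx * sy)

# A perfect positive linear relationship gives r close to +1,
# a perfect negative one gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```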

Survival Analysis
• Kaplan-Meier Method: A method commonly used for estimating the effect of a
variable on survival beyond a certain time point. The outcome
measure does not have to be survival; it can be any
dichotomous (yes or no) outcome that is measured over a
period of time. Subjects are followed until they experience the
outcome under study or follow-up ends. Kaplan-Meier curves
represent the probability of not experiencing the event over
time. (Note: survival curves are not simple proportions
because of censored data.)

• Statistical tests commonly used to compare survival curves
include the Wilcoxon rank-sum test, Mann-Whitney U test,
and the log-rank test.
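The Kaplan-Meier calculation described above can be sketched in a few lines. This is an illustrative implementation only; the follow-up times and censoring flags below are invented. At each observed event time the survival probability is multiplied by (1 - deaths/at-risk), and censored subjects leave the risk set without stepping the curve down, which is why the curve is not a simple proportion:

```python
# Minimal Kaplan-Meier estimator sketch (illustrative data).
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event occurred, 0 = censored.
    Returns a list of (time, survival probability) step points."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # group all subjects sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            s *= 1 - deaths / n_at_risk   # step down only at event times
            curve.append((t, s))
        n_at_risk -= removed              # censored subjects leave the risk set
    return curve

times  = [2, 3, 3, 5, 7, 8, 8, 9]   # invented follow-up times
events = [1, 1, 0, 1, 0, 1, 1, 0]   # 0 = censored (e.g., lost to follow-up)
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Note how the censored subjects at times 3, 7, and 9 shrink the risk set for later event times without themselves producing a step in the curve.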

• Cox proportional hazards models: A multivariable regression method used
to determine if there is a relationship between multiple
independent variables and a single dependent variable when
the dependent variable is time to an event with censored
data.

General Considerations
• Standard error of the mean (SEM): This value represents the "estimated" variation
that would occur between means from repeated samples of the same size. The
SEM is sometimes reported in the medical literature because it is smaller than the
standard deviation and it makes the data appear to have a very narrow range.
However, SD should always be used to describe variation within a sample.

SEM = SD / √n
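A quick numeric illustration of the point above (the sample values are made up): the SEM is always smaller than the SD by a factor of √n, and it shrinks as the sample grows, which is why reporting SEM can make data look less variable than they really are.

```python
# SD versus SEM on an invented sample: SEM = SD / sqrt(n).
import math

values = [4.1, 5.0, 4.7, 5.3, 4.4, 5.6, 4.9, 5.0]
n = len(values)
mean = sum(values) / n
# sample standard deviation (n - 1 in the denominator)
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
sem = sd / math.sqrt(n)
print(round(sd, 3), round(sem, 3))  # SEM is smaller than SD
```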

• Bonferroni's adjustment should be used when multiple comparisons are made,
to minimize the potential for α error.
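For example, with k comparisons each test is judged against α/k so the overall chance of a type I (α) error stays near the nominal level. The p-values below are invented for illustration:

```python
# Bonferroni adjustment sketch with three invented pairwise p-values.
alpha = 0.05
p_values = [0.004, 0.030, 0.020]   # one p-value per pairwise comparison
k = len(p_values)
adjusted_alpha = alpha / k         # 0.05 / 3, roughly 0.0167
for p in p_values:
    print(p, "significant" if p < adjusted_alpha else "not significant")
```

Note that 0.030 and 0.020 would be "significant" at an unadjusted α of 0.05 but not after the adjustment, which is exactly the multiple-comparison inflation the method guards against.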

• When a significant difference is reported using an ANOVA, post hoc results
should be given.

• Parametric tests are generally considered to be "more powerful" than
nonparametric tests.

Hisham S. Abou-Auda, Ph.D.


September 2005

DESCRIPTIVE STUDIES

A descriptive study can be used to document and communicate experiences that the author feels are
important to bring to the attention of the medical community. The investigator simply records data from observations
made and draws conclusions as to possible reasons for the events witnessed. Alternatively, descriptive studies may
describe unusual or new events, such as the occurrence of sudden infant death syndrome (SIDS) in several siblings
within a single family. Descriptive studies can help identify possible causes of or treatments for a problem, and as such
would generate hypotheses that can be tested later using experimental or other study designs.
Descriptive studies fall into two main types: (1) those dealing with individuals (case reports, case series)
or (2) those dealing with populations (surveillance or ecological studies).28 Case reports are based on the observations
of individual patients. They are often used to describe an adverse event following the use of a particular drug or group of
drugs, or to report a possible drug interaction. Case reports frequently generate hypotheses to serve as the basis for more
rigorous studies to examine the relationship between drug administration and the outcomes observed.
Case series document the observations from a group or series of patients, all of whom had been
exposed to a particular drug/substance or group of drugs. A case series essentially combines the observations from
several small or individual case reports. The outcomes are observed and recorded. Case series are also used to examine
the prior histories of patients with the same outcome in hopes of identifying a possible cause-and-effect relationship,
thereby generating hypotheses for further testing. Case series are useful for estimating the incidence of an adverse event
of a newly marketed drug when there is limited information available about that particular event. Case series can also help
monitor for the possible occurrence of certain adverse events that might or might not be associated with the use of a drug,
for example, suicidal ideation following haloperidol use. Surveillance studies are used to more systematically track and
monitor potential public health issues,28 such as the Centers for Disease Control and Prevention's surveillance
monitoring for the occurrence of Guillain-Barré syndrome following vaccinations. Ecological studies examine the
association or correlation between exposures to drugs or other substances and the occurrence of health problems in a
population. Most of the data needed for these studies are found in already available statistics.29 For example, an
ecological study might look at the correlation between opioid drug death rates and the number of prescriptions dispensed
for opioids.
A major limitation of descriptive studies is that they do not provide definitive explanations, test
hypotheses, determine causes, or supply evidence that one drug is superior to another. Indeed, the outcome observed
might not even be related to the drug or substance the person(s) were exposed to. Thus, readers must exercise a great
deal of caution when reading the results from case reports, case series, or other descriptive reports/studies and should
not draw conclusions about causality from them.
OBSERVATIONAL STUDIES: CASE-CONTROL, COHORT, AND CROSS-SECTIONAL

When conducting observational studies, the investigators are essentially bystanders to the events under
study. They examine the natural course of health events, gather data about the subjects included, and then classify, sort,
and analyze the data obtained. The investigators use comparisons to explore and provide insight into the causes of
medical conditions or the risk factors associated with disease occurrence.
When evaluating the relationship between drugs and the occurrence of specific outcomes, there are
two basic approaches an investigator can take: start from the effect or outcome and work back to identify the possible
cause or exposure (case-control studies), or start from the possible cause or exposure and then work to identify the
possible effect or outcome that results (cohort studies). Cross-sectional studies often involve surveys and collect data
simultaneously from the comparison groups. Some have actually classified cross-sectional studies as descriptive (dealing
with populations) in design,28 although they are considered observational studies for this discussion.
Case-Control Studies

In case-control studies, one group of patients who have the target outcome/disease (i.e., the cases) is
selected and compared with another group of similar individuals who do not have the target outcome/disease (i.e., the
controls). Cases and controls are then compared to determine the existence of patient characteristics or exposures that
might have contributed to development of the outcome (i.e., disease or condition) in the cases. Since determining the
presence of the targeted characteristics or exposures always involves looking back in time, a case-control study design
is always retrospective. If investigators find that a greater number of case patients had a certain exposure than control
patients, that points to the exposure as a possible contributor to outcome development in the cases. For example,
suppose investigators want to study whether there is a link between aspirin use and ulcers. The cases would consist of
patients who have an ulcer, and the controls would consist of ulcer-free patients of a similar age, and so on, as the cases.
The investigators would then determine the history of aspirin use in both groups of patients. If the cases (patients with
ulcers) are found to have used much more aspirin than the controls (patients without ulcers), that finding indirectly points
to aspirin as a possible cause of or contributor to ulcer development.
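The statistic usually reported for a comparison like this in a case-control design is the odds ratio (OR). As a hedged sketch, the 2×2 counts below are invented to mirror the aspirin/ulcer example in the text:

```python
# Odds ratio sketch for a hypothetical case-control study (invented counts).
# Exposure history among cases (ulcer) and controls (no ulcer):
cases_exposed, cases_unexposed = 40, 60        # a, c
controls_exposed, controls_unexposed = 20, 80  # b, d

# OR = (a * d) / (b * c): odds of exposure in cases vs. controls
odds_ratio = (cases_exposed * controls_unexposed) / (controls_exposed * cases_unexposed)
print(round(odds_ratio, 2))  # OR > 1 suggests exposure is associated with the outcome
```

Because subjects are selected by outcome rather than by exposure, a case-control study cannot estimate incidence directly, which is why the odds ratio (rather than relative risk) is the natural measure of association here.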
A case-control study has some advantages over other designs. Case-control studies take little time to
design, initiate, and complete because the outcome of interest is already present (i.e., the cases have it). Thus, they are
useful for studying possible causes of rare diseases that would require large numbers of subjects or conditions that take
many years to develop. Since case-control studies use patients who have already developed the disease of interest,
there is no need to wait for time to elapse between an exposure and actual development of the disease.
From an ethical perspective, case-control studies have another advantage for situations where neither
experimental nor follow-up observational studies can be sanctioned (e.g., whether certain drugs might cause teratogenic
effects in the fetus, or whether long-term use of a drug or exposure to a chemical could lead to cancer development).
Further, case-control studies are ideal for exploring possible disease etiology ("fishing expeditions") so that specific
hypotheses can be formulated to justify a future detailed investigation. There is little to no risk to the patients involved in
case-control studies because the cases already have the outcome under evaluation and the controls are being included
simply for comparative purposes. Finally, case-control studies are relatively inexpensive since existing records or
surveys/interviews can often be used to collect the necessary data.
On the other hand, there are several disadvantages with the case-control study design. A detailed study of
mechanism is rarely possible with this design. The case-control design is not suited to evaluating therapy efficacy or to
studying disease prophylaxis; in these situations, experimental trials should be used. A major problem with the
case-control design is its reliance on patient recall or on existing medical records or large databases for information.
Completely accurate information may not be available from medical records or databases. Likewise, information
concerning the dose, duration, or administration of drugs may be inadequately recorded or imperfectly remembered.
Validation of information collected is difficult or sometimes impossible to accomplish.
The case-control design cannot control extraneous variables that may affect the cause-and-effect
relationship. Case-control studies are subject to antecedent-consequent relationships (i.e., the chicken-and-egg
phenomenon). One cannot be sure what came first: whether the exposure/characteristic being studied really led to the
outcome/disease, or if the outcome in some way predisposed people to have characteristics that falsely appeared to
affect disease development.
Case-control studies are also subject to numerous types of bias.30 An exhaustive discussion of biases
associated with case-control trials is beyond the scope of this chapter, but some important types are highlighted.
Case-control study design may be affected by recall bias, meaning that patients with medical conditions or diseases may
recall their past exposure to factors that might affect the condition differently than those in the control group without such
conditions. Another important bias to consider when evaluating case-control studies is interviewer or reporter bias,
which occurs when the individual obtaining data from the cases and controls allows their own knowledge of the condition
and exposures to influence their objectivity. The appropriate selection of cases is critical to the reporting of valid results in
case-control studies. Who they are, where they come from, and how the outcome (condition or disease) is defined
represent important considerations when selecting case and control study subjects. Selection of an appropriate control
group is difficult when conducting case-control studies because it is almost impossible to find a comparison group
identical to the cases. Selection bias occurs when unintentional differences between the patients in the case and control
groups influence the study results. To help avoid selection bias, each eligible case in the target population, irrespective
of exposure, should ideally have an equal chance of appearing in the study. Methods have been developed to help
minimize problems associated with the proper selection of a control group in case-control studies, although they cannot
eliminate the problems. These include the selection of multiple controls, in which more than one control group is selected
for comparison, and matching, in which control subjects who share particular characteristics with the cases are identified
and selected for inclusion.
Cohort Studies (Follow-up Studies)

Cohort studies start with patients who do not yet have the outcome (i.e., medical condition, disease)
being studied. Rather, the investigators identify characteristics or exposures they believe may affect the development of
the outcome and enroll patient groups who have (exposed, study group) or do not have (nonexposed, control group) the
characteristics or exposures. The groups of patients are then followed forward over time, and the rates or risks of outcome
development are identified and compared among groups. The cohort study represents the strongest observational study
design.
Cohort studies can be concurrent (prospective) or nonconcurrent (retrospective or historical). In both
types, the groups or cohorts of patients (i.e., those with the characteristics/exposures and those without them) are first
selected for the study. With the concurrent cohort design, the patients are then followed prospectively over time to
determine whether there are any differences in outcome development between the groups. A prospective cohort study
can be very time consuming and expensive to undertake. On the other hand, the nonconcurrent retrospective cohort
design can be done more quickly and less expensively. With the nonconcurrent cohort design, all data, including the
exposures and the outcomes, are obtained from already existing medical records or databases. Thus, with a
nonconcurrent cohort study, the outcomes have already occurred before the investigation even begins. The key feature
of the retrospective cohort design is that, at the start of the study (which occurs at a designated point in the past), the
patients are selected for inclusion in either the study or control groups with no knowledge by the investigators of whether
the outcome later developed. Once all patients are included in their appropriate group, the investigators can then
examine the existing data at later points in time to determine whether the patients in each group did, or did not, develop
the outcome of interest. Unlike the prospective cohort design, the retrospective cohort design is subject to potential
existing-data inaccuracy, as with the case-control study design.
A prospective cohort study design has several advantages over the case-control study. Its prospective
nature allows for a complete description of experiences and outcomes that occur over time, including rates of
progression, staging of disease, and natural history. The cohort design offers greater assurance that the
characteristics/exposures under study actually preceded the outcome development. It also enables study of multiple
potential effects of a given exposure, thereby allowing for information to be obtained on potential benefits as well as risks.
The cohort design permits flexibility in choosing the variables to be systematically recorded and can delineate various
types of consequences that may be produced by a single risk factor.
In contrast to the case-control design, the cohort design (with the exception of the retrospective cohort)
has few problems associated with incomplete medical records, and no recall bias. Another advantage of the cohort study
design over the case-control design is that the cohort design allows for better determination of cause-effect relationships.
However, it must be noted that none of the observational study designs can prove that a cause-effect relationship exists.
Since it is still an observational study design, however, cohort studies have the disadvantages inherent
in any observational study. Cohort studies are subject to patient selection bias. Every effort must be made to identify
independently each characteristic, in addition to those targeted in the study, that might affect outcome development and
to ensure an even distribution of these factors among study groups. Another major problem of the prospective cohort
study design is patient dropout over time, since patients could move, fail to respond to questionnaires or follow-up visits,
or decide to quit the study, all of which could result in an uneven patient distribution between groups. Cohort studies
should report the attempts made by the investigators to track down subjects and minimize the number lost to follow-up,
identify the rate of follow-up losses, and explore the possibility of biased attrition. By examining the characteristics of
dropouts, the investigators may identify reasons for subject loss that are related to the outcomes under study and that
affect any differences identified. The more similar the dropouts are to those remaining in the study group, the less chance
there is for attrition bias. If possible, investigators should contact a representative sample of dropouts to identify the
reasons for discontinuation and to take any substantial differences into account when analyzing the study results.
Another disadvantage associated with cohort studies is that current practice, drug usage, or exposure to
the study factors of interest may change over time, possibly invalidating the findings of the study or making them
irrelevant. Cohort studies are also subject to surveillance bias due to possible unequal examination or scrutiny of the
patients in each group, leading to unequal factors of interest being identified in the groups simply for that reason.32
Cross-Sectional Studies (Prevalence or Survey Studies)

The cross-sectional study makes simultaneous assessments of both the characteristics/exposures and
the outcomes at the same, present time in a certain population. Thus, it provides a "cross section" or "snapshot" of the
characteristics of a condition or outcome at that given point in time. The cross-sectional design is often used to identify
possible risk factors and causes of a disease or condition, to determine the prevalence of a disease or condition at a
specific point in time, or to determine whether beliefs or practices might affect health or behavioral outcomes. For
example, one cross-sectional study examined whether alcohol use was associated with intimate partner violence (IPV)
in Russia.33 A questionnaire gathered information about demographics, health status, alcohol use, and IPV in persons
recruited from a clinic. The participants were divided according to their alcohol use, and it was found that those who
misused alcohol were 3.28 times more likely than those with no alcohol misuse to perpetrate IPV.
Advantages of the cross-sectional design include its relatively low cost and the relatively short amount
of time needed for the study, since all of the information is collected at the same time. Time is not needed for outcomes
to develop when conducting cross-sectional studies, and loss to follow-up (dropouts) is not a problem.
There are several disadvantages of cross-sectional studies. It is difficult to truly determine the
cause-effect relationship, that is, whether the possible characteristics or exposures being examined were responsible for
the outcomes. The type of patients (population) selected for the cross-sectional study and their comparability with other
individuals with the same outcome can affect the ability to extrapolate the findings outside of the study. Once a
population is identified for a cross-sectional study, it is important to include an appropriate sample of that population for
study inclusion to minimize selection bias. Even when an appropriate population and sample are selected for a
cross-sectional study, the patients might not complete or return surveys or questionnaires. The study findings can be
skewed if the responders differ in important ways from the nonresponders. Finally, the use of surveys or questionnaires
depends on self-reporting, which can be inaccurate.32
EXPERIMENTAL STUDIES

Experimental studies are always prospective in design and involve actual intervention, that is,
manipulation and regulation of the variables in a study by the investigators.34 There are two types of experimental
studies, controlled and noncontrolled. Controlled studies, in contrast to noncontrolled studies, use a comparison (i.e.,
control) group(s) in addition to the group receiving the drug being investigated. Noncontrolled clinical trials generally
should not (and cannot) be used to establish the efficacy of a certain treatment because, without a control or comparison
group, it is difficult to truly know the extent to which the treatment itself was responsible for any effects observed. The
comparison group(s) allow for determining the possible influence that other outside factors (e.g., environmental) could
have on a study's outcomes independent of the drug being evaluated. Control/comparison groups also allow for relative
comparisons of efficacy to be made with other proven treatments. However, there are situations in which a noncontrolled
trial might still provide useful information. These include use for serious conditions for which it is unethical to use placebo
but no other effective therapy exists, and to provide preliminary information (e.g., possible effect size of treatment, types
of patients more likely to benefit from treatment) useful for designing and undertaking subsequent controlled studies.
Since the controlled study is the strongest type of experimental study, the remainder of the discussion
will focus on the controlled design. Several guides and checklists have been published to assist readers in evaluating the
quality of experimental clinical studies.35,36 However, these tools have often not been subjected to reliability and validity
testing, and the user should carefully consider the source of any tool used and its purpose.36 Table 61-2 lists the criteria
often included in such checklists and can be used as a guide for the evaluation of published clinical drug studies.
