Module 1 Introduction to Biostatistics, Epidemiology

Biostatistics is the application of statistical tools in biological and medical data, focusing on data analysis, epidemiology, and predictive modeling. It encompasses descriptive and inferential statistics, with various applications in public health, cancer research, and genetics. The document also discusses methods of data collection, presentation, and the importance of understanding variability in health-related studies.

Uploaded by

Shiela Mae
Copyright
© All Rights Reserved

Biostatistics

What is Statistics?
STATISTICS –field of study concerned with:
1.The collection, organization,
summarization, computation and analysis
of data, and
2.the drawing of inferences about a body of
data when only a part of the data is
observed.

BIOSTATISTICS –application of statistical
tools and concepts to data derived from
the biological sciences and medicine.
Why Study Statistics?
Primarily due to VARIABILITY
Not everything in this world is exactly
the same or alike.
Ex: some bacteria are resistant while
others are susceptible to an antibiotic

There is variability within a person from
one time to another.
Ex: sitting blood pressure (BP) vs.
standing BP; morning BP vs. evening BP
Why Study
Biostatistics?
1. Data Analysis Skills
2. Medical and Public Health
Research
3. Epidemiology
4. Predictive Modeling
5. Critical Thinking
6. Interdisciplinary Applications
2 DIVISIONS OF STATISTICS
1.Descriptive statistics – uses different
methods of statistics to summarize
and present data in narrative form.
e.g. – methods of tabulation
- graphical presentation
- computation of averages
- measures of variability
2. Inferential Statistics – makes
generalizations and draws
conclusions about a target
population based on results
from a sample.
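The two divisions can be sketched in a few lines of Python. The blood pressure readings are made-up values, not data from this module, and the 95% interval uses the normal approximation (z = 1.96), an assumption that fits large samples better than a sample of 10.

```python
import math
import statistics

# Hypothetical sitting systolic BP readings (mmHg) from a sample of 10 patients.
sample_bp = [118, 122, 130, 125, 121, 135, 128, 119, 124, 127]

# Descriptive statistics: summarize the sample itself.
mean_bp = statistics.mean(sample_bp)   # average
sd_bp = statistics.stdev(sample_bp)    # sample standard deviation (divides by n - 1)

# Inferential statistics: use the sample to say something about the population.
# Approximate 95% confidence interval for the population mean (normal approximation).
standard_error = sd_bp / math.sqrt(len(sample_bp))
ci_low = mean_bp - 1.96 * standard_error
ci_high = mean_bp + 1.96 * standard_error

print(f"mean = {mean_bp:.1f} mmHg, SD = {sd_bp:.1f} mmHg")
print(f"approx. 95% CI for the population mean: {ci_low:.1f} to {ci_high:.1f} mmHg")
```

The first two numbers describe only the 10 patients observed; the interval is a statement about the wider population they were drawn from.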
Application of Biostatistics
1.In community medicine and public
health
2.In cancer research
3.In advanced biomedical technologies
4.In pharmacology
5.In ecology
6.In demography
7.In population genetics and statistical
genetics
8.In bioinformatics
9.In systems biology
10.In agriculture
11.In genetics
12.In physiology and anatomy
Epidemiology - It is the study
(scientific, systematic, and data-
driven) of the distribution (frequency,
pattern) and determinants (causes,
risk factors) of health-related states
and events (not just diseases) in
specified populations (neighborhood,
school, city, state, country, global)
- It is the study of how disease is
distributed in population and the
factors that influence or determine
this distribution.
- It is the study of the distribution and
determinants of health related states
EPIDEMIOLOGY - Study of the
distribution and determinants of
diseases or conditions in a defined
population
Derived from 3 Greek roots:
“epi” = upon
“demos” = people
“logia” = study
Based on Two Fundamental
Assumptions
• Diseases do not occur by chance
• Diseases are not randomly
distributed in the population; thus,
USES OF EPIDEMIOLOGY
1. Identify factors that cause disease
2. Identify factors or conditions that can
be used or modified to prevent the
occurrence or spread of disease
3. Explain how and why diseases and
epidemics occur
4. Evaluate the effectiveness of vaccine
and different forms of therapy
5. Establish a clinical diagnosis of
disease
6. Identify the health needs of the
community
7. Evaluate the effectiveness of health
PRIMARY HEALTH CARE - Forms
an integral part both of the
country’s health system of which it
is the nucleus and of the overall
social and economic development of
the community
- It aims at an improved state of health
and quality of life for all people, attained
through self-reliance
AGENCIES INVOLVED IN PHC
• Department of Health
• Department of the Interior and Local Government
• Department of Education, Culture and Sports
• Department of Social Welfare and Development
• Department of National Defense
• Department of Public Works and Highways
• Population
• Department of Budget and Management
• Philippine Hospital Association
• Philippine National Red Cross
• Philippine National Organization
• World Health Organization
• Health Sciences Center UP
• Council of Health Agencies of the Philippines
2 Main types of Epidemiological Studies
1.Experimental studies. The investigator
determines through a controlled process the
exposure for each individual (clinical trial) or
community (community trial), and then tracks the
individuals or communities over time to detect the
effects of the exposure.
Example:
In a clinical trial of a new COVID-19 vaccine, the
investigator may randomly assign some of the
participants to receive the new vaccine, while others
receive a placebo shot. The investigator then tracks
all participants, observes who gets the disease that
the new vaccine is intended to prevent, and compares
the two groups (new vaccine vs. placebo) to see
whether the vaccine group has a lower rate of
disease.
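The comparison described in the vaccine example can be sketched as follows. All counts are hypothetical. Relative risk and vaccine efficacy (1 minus relative risk) are standard measures, but they are not named in this module and are brought in here only for illustration.

```python
def risk(cases: int, total: int) -> float:
    """Proportion of a group that developed the disease during follow-up."""
    return cases / total

# Hypothetical trial results (illustrative counts only).
vaccine_cases, vaccine_total = 10, 1000
placebo_cases, placebo_total = 50, 1000

risk_vaccine = risk(vaccine_cases, vaccine_total)
risk_placebo = risk(placebo_cases, placebo_total)

# Relative risk: rate of disease in the vaccine group relative to the placebo group.
relative_risk = risk_vaccine / risk_placebo

# Vaccine efficacy is conventionally reported as 1 - relative risk.
vaccine_efficacy = 1 - relative_risk

print(f"risk (vaccine) = {risk_vaccine:.3f}, risk (placebo) = {risk_placebo:.3f}")
print(f"relative risk = {relative_risk:.2f}, vaccine efficacy = {vaccine_efficacy:.0%}")
```

A relative risk below 1 means the vaccine group had a lower rate of disease than the placebo group.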
Descriptive vs. Analytic Epidemiology
Descriptive Epidemiology – it deals with the
questions: Who, What, When, and Where
Person: characteristics (age, sex,
occupation) of the individuals affected by the
outcome
Place: geography (residence, work, hospital) of
the affected individuals
Time: when events (diagnosis, reporting;
testing) occurred
Analytic Epidemiology – it deals with the
questions: Why and How.
- it is used to help identify the cause of
disease.
- it involves designing a study to test
hypotheses developed using descriptive
Data
DATA -raw material for statistics;
• numbers from actual measurements (“how much”)
Ex: AST level, specific gravity, LAP score
• numbers from the process of counting (“how many”)
Ex: no. of patients with malaria at PGH
no. of patients infected with dengue virus of DENV-1
serotype

Population – totality of objects under investigation
Sample – small representative part of the entire
population
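The population and sample definitions can be sketched in Python. Simple random sampling is only one way of trying to obtain a representative sample and is an assumption here, since the module does not name a sampling method. The patient IDs, sample size, and seed are hypothetical.

```python
import random

# Hypothetical population: ID numbers of 500 patients in a hospital registry.
population = list(range(1, 501))

# Seeded generator so the illustration is reproducible.
rng = random.Random(42)

# Draw 25 patients without replacement; every patient has the same chance of selection.
sample = rng.sample(population, k=25)

print(f"population size = {len(population)}, sample size = {len(sample)}")
```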
Categories of Data
According to Source
1.PRIMARY DATA –prospectively
obtained by the researcher to answer
the objective(s) of his study (e.g.
surveys, experiments, observations)

Ex: A researcher wants to determine the
prevalence of mutations in the CYP120
gene in patients with coronary artery
disease. Therefore, he extracts blood
from the patients and performs lab
tests to detect the mutations.
2.SECONDARY DATA –already
existing; obtained by other people
for their own purposes, not those of the
researcher.
• inexpensive, but data access may
be difficult
• generally more problematic;
researcher had no control over:
◦ data collection –issues on quality of
data, e.g. incompleteness,
inaccuracy
◦ rationale behind the data collection
◦ definitions used in classifying
individuals into categories of the
variables considered
e.g. International vs. Filipino Cholesterol
Ex: A researcher wants to determine the
number of colon cancer cases at NKTI and
their response to treatment. He tries to get
the information from the Hospital Tumor
Registry, which usually records only the
baseline characteristics of CEA patients
(e.g. age at diagnosis, tumor stage,
morphology, histopath grade, initial
treatment). Response to treatment is not
available in the registry.
SECONDARY DATA
Example: A researcher wants to
determine the number of breast cancer
cases at SLMC and their response to
treatment. He tries to get the information
from the Hospital Tumor Registry, which
usually records only the baseline
characteristics of BRCA patients (ex: age
at dx, tumor stage, morphology,
histopath grade, initial tx). Response to
treatment is NOT available in the registry.
Sources of Secondary Data
1.DISEASE REGISTRIES –contain the
names and other relevant clinical data
of all cases of diseases of public
health importance in a locality,
nationally or regionally (e.g. TB,
cancer, malaria, dengue,
schistosomiasis, etc.)
May provide running summary of the
number afflicted, clinical & treatment
status of each patient, statistics on
defaulters, recoveries & deaths
2. ROUTINELY KEPT RECORDS
hospital medical records
containing large amounts of
patient information

3.DISEASE DATABANKS
repositories of patient’s personal
data, medical history, risk
factors, diagnostics and other
ancillary procedures, clinical
information
4. JOURNALS, MAGAZINES
Scientific journals and entries
that carry information on the
topic of interest, peer-reviewed
articles

5. BOOKS
Literary works are often based on
articles previously printed and
verified by scholarly committees;
as such, citations and
bibliographies are usually present
Methods of Data Collection
1.USING AVAILABLE INFORMATION/
REGISTRATION METHOD –using
secondary data not analyzed or
published yet according to owner’s
objectives.
- This is subject to certain government
laws.
Data collected by DOH from hospitals
and clinics are useful in identifying
increases in the incidence of diseases,
etc.
2.OBSERVATIONS –involves the
systematic selection, watching and
recording of behavior and
characteristics of people, objects or
events
can give additional, more accurate
information on behavior of people
can validate the information
obtained from interviews, especially
on sensitive topics such as alcohol
or drug use, or stigmatizing
diseases.
often time consuming, thus most
often used in small-scale studies
Undertaken in different ways:
◦ Participant observation –the observer
takes part in the situation he or she
observes
◦ Non-participant observation –the
observer watches the situation,
openly or concealed, but does not
participate
3.INTERVIEW –one-to-one
encounter between the
interviewer and respondent to
elicit opinions or feelings which
cannot be observed.
Interview schedule –done either
face to face or by
telephone, using a list of
questions; the answers are
written verbatim by the data
collector.
4. FOCUS GROUP DISCUSSIONS
(FGD) -allows a group of 8-12
informants to freely discuss a
certain subject with the guidance
of a facilitator or reporter.
Useful when there is no available
data on the topic and the
researcher wants to get opinions,
experiences of informants on the
topic to be able to construct a
questionnaire
5. Indirect or Questionnaire Method – It
utilizes a carefully planned and printed
questionnaire, which is given to the
respondents sampled for the study to elicit
responses that will answer the research
questions.
2 Types of Questions
1. Open-Ended Question – It allows a free
response from the respondents.
2. Closed-Ended Question – It limits the
responses to the checklist of answers
prepared by the researcher.
Written or self-administered questionnaire -
written questions are answered by the
respondents in written form
gather the respondents in one place at
one time, giving oral or written
instructions, letting the respondents
accomplish questionnaires
hand-delivering questionnaires to
respondents and collecting them later
sending questionnaires by mail with clear
instructions
6. Experimental Method – This is used
when the objective of the study is to
establish the relationship of certain
phenomena under controlled condition.
Sources of Bias
During Data Collection
1.DEFECTIVE INSTRUMENTS
Uncalibrated or unstandardized
weighing scales or other
measuring equipment
Questionnaires with:
◦ fixed or closed questions on topics
about which little is known;
◦ open-ended questions without
guidelines on how to answer them;
◦ vaguely phrased questions;
◦ ‘leading questions’ that cause the
respondent to believe one answer
would be preferred over another;
Ex: Do you think fainting during fever
is dangerous? vs. What do you think of
fainting during fever?
◦ questions placed in an illogical order
Bias can be prevented by careful
planning of the data collection,
by pre-testing the data collection
tools, calibrating equipment
2.OBSERVER BIAS -data collector may
only see or hear things in which (s)he
is interested or may miss information
that is critical to the research.
Bias can be prevented by the:
◦ preparation of observation protocols and
guidelines for conducting loosely
structured interviews
◦ training data collectors and letting them practice
◦ allowing data collectors to work in pairs
when using flexible research techniques
◦ discussing and interpreting the data
immediately after data collection.
3.EFFECT OF THE INTERVIEW
ON THE INFORMANT
Informant may give misleading
answers or evade certain
questions if he/she is wary of the
intention of the interview
requires careful selection of
interviewers.
Qualities of Statistical Data
TIMELINESS –interval between date of
occurrence of different events and time
the data is ready for use or
dissemination
COMPLETENESS -covers the entire
geographic area and target population
within the area of interest
◦ All items/fields in the form are accomplished
ACCURACY -how close the
measurement or the data is to the true
value.
PRECISION - repeatability or
consistency of the information
when a measurement is done or an
observation is made more than
once
RELEVANCE - consistency of data
produced with the needs of the
data users.
ADEQUACY - collected data
provide all the basic information
needed to meet the objectives of
the user.
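Accuracy and precision, as defined above, can be contrasted with a small Python sketch. The true weight and the readings from two weighing scales are hypothetical. Measuring accuracy as bias (mean minus true value) and precision as the standard deviation of repeated readings is one common convention, assumed here.

```python
import statistics

true_weight_kg = 70.0  # assumed true value for the illustration

# Hypothetical repeated readings of the same person on two weighing scales.
scale_a = [72.1, 72.0, 72.2, 71.9, 72.0]  # consistent, but off target (e.g. uncalibrated)
scale_b = [69.0, 71.2, 70.5, 68.8, 70.6]  # centered near the target, but scattered

def summarize(readings, true_value):
    """Return (bias, spread) for a set of repeated readings."""
    bias = statistics.mean(readings) - true_value  # accuracy: closeness to the true value
    spread = statistics.stdev(readings)            # precision: repeatability of readings
    return bias, spread

bias_a, spread_a = summarize(scale_a, true_weight_kg)
bias_b, spread_b = summarize(scale_b, true_weight_kg)

print(f"Scale A: bias = {bias_a:+.2f} kg, SD = {spread_a:.2f} kg")
print(f"Scale B: bias = {bias_b:+.2f} kg, SD = {spread_b:.2f} kg")
```

In this made-up data, Scale A is precise but not accurate, while Scale B is accurate on average but less precise.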
Methods of Presenting Data
1. Textual Presentation of Data
• Paragraph form.
• Very wordy and cumbersome
2. Tabular presentation of data – it
is in rows and columns form.

Exact form of a table depends on:


the purpose for which it is designed
complexity of the material

Guidelines
1.Relatively simple and easy to read
2.The title should be clear, concise,
and direct to the point; should indicate
what is being tabulated.
3.Left column is for independent variable;
the dependent variable is in the next
columns; the derived or calculated column
(often average) is on the far right.

4.Units of measurement for the data should
be given.
5.Each row and column, as appropriate,
should be labelled concisely and clearly.
6.Totals should be shown, if appropriate.
7.Codes, abbreviations, and symbols should
be explained in a footnote.
Title: Clearly state the purpose of the
experiment, e.g., The effect of
_________ (independent variable) on
__________ (dependent variable),
___ (Place), ___ (Time) (if applicable)
3. Graphical presentation of data

Graphs, diagrams, charts and other
representations of data are easier to
read than tables and can present large,
complex masses of data in a simpler
language such as trends or patterns.
Guidelines
1.Self-explanatory
clear & concise title indicating what is
being represented, time element
involved, place where data relates to
if secondary data, indicate the source
of original data
2.Properly labeled scales –should
identify units of measurements in the x
and y axis; scales should have good
proportions.
3.Proper identification of trend lines
and curves -labels or legends are
utilized; no. of trend lines are kept to a
minimum.
4.Grids or guide rulings should be
drawn lightly so that lines and curves
are delineated vividly.
5.Neat, businesslike quality –devoid of
unnecessary trimmings &
draftsmanship.
6.Basis of classification is generally
placed along the x axis (sometimes the
independent variable) while
frequencies or percentages are placed
along the y axis (sometimes the
dependent variable).
7.Vertical scale should always start with
zero, showing a break only when the
range of the observations is too far
from the origin.
8.Colors add appeal but consider the
aspect of reproducing the report.
9.On an arithmetic scale, equal
distances between tick marks on an
Types of commonly used graphs for data

BAR GRAPH/CHART/DIAGRAM
(HORIZONTAL OR VERTICAL)
- used for a qualitative or discrete quantitative
variable to compare absolute or relative counts,
rates, etc. between categories of that variable.
PIE CHART
- used for qualitative variable to show
the breakdown of a group or total
where the no. of categories is not too
many.
COMPONENT BAR DIAGRAM/CHART – same as for
pie chart
HISTOGRAM – for continuous quantitative variable;
graphic representation of the frequency distribution
of a continuous variable or measurement including
age groups.
FREQUENCY POLYGON – for
quantitative variable; same function as
histogram
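The frequency distribution behind a histogram or frequency polygon can be sketched as follows. The ages and the 10-year class width are hypothetical choices, and the text bars stand in for a real plot.

```python
from collections import Counter

# Hypothetical ages (in completed years) of 15 patients.
ages = [3, 7, 12, 15, 18, 21, 24, 25, 29, 33, 36, 41, 47, 52, 58]
class_width = 10

# Assign each age to a class interval, identified by its lower limit.
counts = Counter((age // class_width) * class_width for age in ages)

# Print one bar per age group; bar length equals the frequency.
for lower in sorted(counts):
    upper = lower + class_width - 1
    print(f"{lower:>2}-{upper:<2} | {'#' * counts[lower]} ({counts[lower]})")
```

A frequency polygon would plot the same counts as points at the midpoint of each age group, joined by lines.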
LINE DIAGRAM – for time series;
shows trend data or changes with time
with respect to some other variable
SCATTERPOINT; SCATTERPLOT;
DOT DIAGRAM; SCATTERGRAPH –
for quantitative variables; shows
correlation between two quantitative
variables
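The correlation that a scatterplot displays can also be summarized as a number. The sketch below computes Pearson's correlation coefficient, a common measure that this module does not name and that is used here as an assumption. The age and blood pressure values are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations: age (years) and systolic BP (mmHg).
age_years = [25, 32, 41, 48, 55, 63]
systolic_bp = [116, 120, 126, 131, 138, 145]

r = pearson_r(age_years, systolic_bp)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 correspond to points lying close to a straight line on the scatterplot; values near 0 suggest no linear relationship.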
Venn Diagram - Synonym: set
diagram or logic diagram
A diagram representing mathematical or
logical sets pictorially as circles or closed
curves within an enclosing rectangle (the
universal set), common elements of the sets
being represented by the areas of overlap
among the circles.
Epidemic Curve

The epidemic curve represents in a graphic
form the onset of cases of the disease, either
as a histogram, a bar graph, or a frequency
polygon.
Epidemic: local
Pandemic: international
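The data behind an epidemic curve is a count of cases by time of onset. A minimal sketch, using hypothetical onset dates and text bars in place of a chart:

```python
from collections import Counter
from datetime import date

# Hypothetical dates of symptom onset for 8 cases in an outbreak.
onset_dates = [
    date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 2),
    date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 3),
    date(2024, 3, 4), date(2024, 3, 5),
]

# Number of new cases per day of onset.
cases_per_day = Counter(onset_dates)

# One bar per day; bar length equals the number of cases with onset on that day.
for day in sorted(cases_per_day):
    print(f"{day.isoformat()} | {'#' * cases_per_day[day]}")
```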
Variables
VARIABLE – a phenomenon, character
or trait, whose values or categories
cannot be predicted with certainty; it is
simply what is being observed or
measured
can take different values among
different persons, places, or things

Ex: age at diagnosis
exposure to a pathogen (yes or no)
birthweight
Classification of Variables
by Precision

QUALITATIVE VARIABLES – also called
categorical or nominal variables
merely classifications, as
membership in one of a few
groups
generally described in terms of
percentages or proportions
often displayed in contingency
tables or bar/pie charts.
Examples:
Race - black/African American,
white, Asian, American
Indian/Alaskan native, Native
Hawaiian/other Pacific Islander
Cervical tissues - cancerous,
normal
Sex - male, female
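Describing a qualitative variable in terms of counts and percentages can be sketched as follows. The cervical tissue results are hypothetical.

```python
from collections import Counter

# Hypothetical classification of 10 cervical tissue samples.
tissue_results = [
    "normal", "cancerous", "normal", "normal", "cancerous",
    "normal", "normal", "normal", "cancerous", "normal",
]

counts = Counter(tissue_results)
total = len(tissue_results)

# Percentage of samples falling in each category.
percentages = {category: 100 * n / total for category, n in counts.items()}

for category, pct in percentages.items():
    print(f"{category}: {counts[category]} of {total} ({pct:.0f}%)")
```

The same counts are what a bar chart or pie chart of this variable would display.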
QUANTITATIVE VARIABLES – also
called numeric, scaled, or metric
variables
variables that can be measured
according to an amount or
quantity;
generally described with means and
standard deviations
Examples:
Colony count
AST: ALT ratio
LAP score
Classification of Variables
by Relationship
INDEPENDENT VARIABLES -
variables that are manipulated or
treated in a study in order to see
what effect differences in them will
have on those variables proposed
as being dependent on them.
Synonyms: cause, input,
predisposing factor, antecedent,
risk factor, characteristic,
attribute, determinant, intervention
DEPENDENT VARIABLES -
variables in which changes are
the results of the level or amount
of the independent variable(s)
Synonyms: effect, outcome,
consequence, result, condition,
disease
Examples:
Alive or dead
Treatment success or treatment
failure
CONFOUNDING / INTERVENING
VARIABLES - variables that
should be studied as they may
influence or ‘confound’ the effect
of the independent variable(s) on
the dependent variable(s).
Examples:
Sex, age, ethnic origin,
education, marital status,
social status
Example: In a study of the effect of
TB on child mortality, the
nutritional status of the child may
play an intervening role.
TB - independent variable
child mortality - dependent
variable
nutritional status of the child –
confounding variable

Effects of probiotics on total
serum IgE levels in high risk
Types of Qualitative
Variables
NOMINAL VARIABLES –
categorical variables with no
natural ordering of the categories
Ex: leukemia – lymphoblastic,
myeloid, erythroid, monomyelocytic
◦ Dichotomous or binary variable –
when the variable exactly has two
categories
Ex: Outcome – alive, dead
Pregnancy test – positive, negative
ORDINAL VARIABLES –
variables with natural ordering of
categories; the magnitude is not
important, but there is an order
to the data
Ex: Pain classified on a 4-point
scale - none, mild, moderate,
severe
Tumor stage – 0 (in situ), I, II, III,
IV
Occasionally, a variable can be
classified as either nominal or
ordinal.
Ex: Genotype classified into the four
categories AA, AB, BB, O
(nominal).
Alternatively, the number of A alleles is
counted and the trait is
treated as an ordinal scale,
such as 0, 1, or 2 A alleles.
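The recoding of genotype from a nominal label to an ordinal count of A alleles can be sketched as follows. The function name and the list of genotypes are hypothetical.

```python
def count_a_alleles(genotype: str) -> int:
    """Number of A alleles in a genotype label such as 'AA', 'AB', or 'BB'."""
    return genotype.count("A")

# Hypothetical genotypes of five individuals (nominal categories).
genotypes = ["AA", "AB", "BB", "AB", "AA"]

# The same individuals on an ordinal scale: 0, 1, or 2 A alleles.
a_allele_counts = [count_a_alleles(g) for g in genotypes]

for genotype, n in zip(genotypes, a_allele_counts):
    print(f"{genotype} -> {n} A allele(s)")
```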
Types of Quantitative
Variables
DISCRETE OR DISCONTINUOUS
NUMERIC SCALE -values only take
integers or a small number of values
Both order and magnitude are
important for discrete variables, but
the values are usually restricted to
integers or whole numbers
Ex: Number of mutant alleles
Number of patients
CONTINUOUS NUMERIC
VARIABLE
- the values are not restricted to a
set of specified values, can include
decimals or fractions
Ex: RBC pallor area: 3/4
Potassium (e.g.4.03 mEq/L)
Scales of Measurement
• NOMINAL SCALE –variables that can be
categorized into groups
• ORDINAL SCALE –variables that can be ranked
or ordered; can either be qualitative or
quantitative.
Ex: Height -qualitative if recorded as short,
medium, or tall; quantitative if recorded in actual
height measurement,
e.g. 5 ft 5 in
Pain scale (0 to 10) –quantitative, ordinal
Pain is ranked according to severity but it does not
mean that a score of 2 indicates a pain that is 2
times more severe than a pain with a score of 1.
• INTERVAL SCALE –exact
distance between two categories
can be determined but the zero
point is arbitrary
Ex: Temperature –arbitrarily, the
freezing point of water is set at 0ºC.
It does not mean that at 0ºC, there
is no temperature at all. It also
does not mean that a temperature
of 80ºC is twice as hot as 40ºC.
• RATIO SCALE –similar to interval
scale but zero point is fixed; ratio
of two numbers can be computed
and interpreted.
Ex: Mass in kg – a zero kg mass
means absence of mass. A
mass of 80 kg is twice as heavy
as 40 kg.
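One way to see the difference between the interval and ratio scales is to change the unit and check whether the ratio survives. The Fahrenheit and pound conversions below are not part of this module and are used only for illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32

def kg_to_lb(kg: float) -> float:
    return kg * 2.20462  # approximate conversion factor

# Interval scale: the zero point is arbitrary, so the ratio changes with the unit.
ratio_celsius = 80 / 40
ratio_fahrenheit = celsius_to_fahrenheit(80) / celsius_to_fahrenheit(40)

# Ratio scale: the zero point is fixed, so the ratio is the same in any unit.
ratio_kg = 80 / 40
ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print(f"temperature ratio: {ratio_celsius:.2f} in Celsius, {ratio_fahrenheit:.2f} in Fahrenheit")
print(f"mass ratio: {ratio_kg:.2f} in kg, {ratio_lb:.2f} in lb")
```

Because the temperature ratio depends on the unit chosen, "twice as hot" has no fixed meaning, while "twice as heavy" holds in any unit of mass.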
NOTES:
Some variables are always qualitative
in nature.
Ex: Sex (male, female)
Disease status (sick, not sick)

Data in ratio scale can be transformed
to the nominal scale but not vice versa.
Ex: Height in cm can be transformed to
short/tall. But height expressed as
short/tall cannot be transformed to
actual height in cm.
Other variables can be measured
either as a qualitative or a
quantitative variable depending
on the objective of the data collection
Ex: Weight –can be recorded as
lightweight/medium
weight/heavyweight (qualitative)
or in actual weight measurement
in kg (quantitative)
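The one-way transformation from measured height to categories, described in the notes above, can be sketched as follows. The 160 cm cutoff and the heights are hypothetical.

```python
def height_category(height_cm: float, cutoff_cm: float = 160.0) -> str:
    """Classify a measured height as 'short' or 'tall' using an assumed cutoff."""
    return "tall" if height_cm >= cutoff_cm else "short"

# Hypothetical measured heights (quantitative).
heights_cm = [150.5, 162.0, 171.3, 158.9]

# The same individuals recorded as categories (qualitative).
categories = [height_category(h) for h in heights_cm]

for h, c in zip(heights_cm, categories):
    print(f"{h} cm -> {c}")
```

The reverse step is not possible: both 162.0 cm and 171.3 cm become "tall", so the original measurements cannot be recovered from the categories.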
