Introduction to Biostatistics
George Mwasekaga, MD
Related Tasks
• a) Define terminologies used in
biostatistics
• b) Explain importance of biostatistics
• c) Explain importance of data stratification
• d) Explain different types of biostatistical
data
Definition of terms
• Biostatistics can be defined as the
application of statistics to biological
problems
• Statistic: A descriptive measure
computed from the data of a sample
• Statistics: The science of analysing data
to gain useful information
• Data is a collection of observations
expressed in numerical figures
• Variable: A characteristic that can be
measured or observed,
or an attribute that can take different
values for different objects
Examples:
– Sex : Male, female
– Weight: kg, lb
– Height: Tall, short
– Level of knowledge: High, low
• Population: The whole collection of entities
(e.g. people) of interest in a study
• Sample: A smaller group of units selected
from the study population
• Parameter: A descriptive measure
computed from the data of a population
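The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 weights is invented, and the names `population`, `sample`, `population_mean` and `sample_mean` are assumptions made for the example.

```python
import random
import statistics

# Hypothetical population: weights (kg) of 1,000 people, invented for illustration.
random.seed(1)
population = [round(random.gauss(65, 10), 1) for _ in range(1000)]

# Parameter: a descriptive measure computed from the whole population.
population_mean = statistics.mean(population)

# Sample: a smaller group of units selected from the population.
sample = random.sample(population, 50)

# Statistic: the same descriptive measure computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean:.1f} kg")
print(f"Statistic (sample mean):     {sample_mean:.1f} kg")
```

In practice the whole population is rarely measured, so the statistic is used as an estimate of the parameter.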
Why do doctors need statistics?
• ACTIVITY 1
• Work in pairs. Think of 3 reasons why
doctors need statistics.
Why do doctors need statistics?
Formal way of organising ideas; helps make
decisions; evidence-based medicine
• Old doc Jim uses chloroquine for malaria
• Young doc Mohammed uses Coartem; he
thinks chloroquine doesn’t work any more
• What are you going to use?
Why do doctors need statistics?
• Allows us to deal with variation
– Not every person who does not receive a vaccine for
measles will get measles – some are naturally immune
– There may be a few people who are vaccinated
against measles who still get measles – some do not
respond to the vaccine
• Statistics provides a scientific way of
dealing with biological variation, allows
calculation of risk
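As a rough sketch of what "calculation of risk" can look like, the snippet below compares the risk of measles in two groups. All counts are hypothetical, invented for illustration, and the risk ratio is one common summary that this lecture does not itself define.

```python
def risk(cases, total):
    """Risk = number who develop the disease / number of people in the group."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return cases / total


# Hypothetical counts, invented for illustration only.
unvaccinated_cases, unvaccinated_total = 30, 100
vaccinated_cases, vaccinated_total = 2, 100

risk_unvaccinated = risk(unvaccinated_cases, unvaccinated_total)
risk_vaccinated = risk(vaccinated_cases, vaccinated_total)

# Risk ratio: how many times higher the risk is without the vaccine.
risk_ratio = risk_unvaccinated / risk_vaccinated

print(f"Risk if unvaccinated: {risk_unvaccinated:.2f}")
print(f"Risk if vaccinated:   {risk_vaccinated:.2f}")
print(f"Risk ratio:           {risk_ratio:.1f}")
```

Neither risk is 0 or 1 in this made-up example, which mirrors the point above: some unvaccinated people stay well and a few vaccinated people still fall ill.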
Why do doctors need statistics?
• Allows us to explain if relationships
between variables are due to chance
– Smoking and lung cancer – not every patient
that smokes will get lung cancer. Is the
relationship due to chance or is it significant?
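One common way to ask whether such a relationship could be due to chance is a chi-square test on a 2x2 table. The lecture does not name a specific test, so this is a sketch of one possible approach. The counts are hypothetical, and `chi_square_2x2` is a helper written for this example, not a library function.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (no continuity correction) for a 2x2 table:

                  outcome yes   outcome no
    exposed            a             b
    unexposed          c             d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts: 100 smokers and 100 non-smokers, invented for illustration.
smokers_cancer, smokers_no_cancer = 20, 80
nonsmokers_cancer, nonsmokers_no_cancer = 5, 95

chi2 = chi_square_2x2(smokers_cancer, smokers_no_cancer,
                      nonsmokers_cancer, nonsmokers_no_cancer)

# Critical value of the chi-square distribution with 1 degree of freedom at the 5% level.
CRITICAL_5_PERCENT_1_DF = 3.841

print(f"Chi-square statistic: {chi2:.2f}")
if chi2 > CRITICAL_5_PERCENT_1_DF:
    print("A difference this large would be unlikely if only chance were at work.")
else:
    print("The difference could plausibly be due to chance.")
```

A statistic above the critical value suggests the association is unlikely to be due to chance alone. It does not by itself show that smoking causes lung cancer.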
Importance of biostatistics
• Provides a standardized technique to cope
with inevitable biological variability
• Biostatistical methods are a numerical
approach that can quantify data and also
account for variation
• Needed for both data analysis and study
design.
Application of Biostatistics
Statistical methods have a role to play in:
• Official health statistics/statements (e.g.
trends of number of cases of a disease over
time).
• Epidemiology (e.g. association of diseases
with some risk factors).
• Clinical studies (e.g. comparison of
treatments in clinical trials).
• Health service administration (e.g. resource
allocation).
What is a variable?
• A variable is any aspect which is observed or
measured in a study
• In medicine, this information is collected in a
study which aims to answer a specific question.
• Studies make observations on an individual. The
aspects of the individual that are measured are the
variables (height, weight, eye colour)
• There may be only one variable in the study or
many
What are data?
• Data are the recorded results of observing,
measuring or counting these variables
• Different studies collect different data. It is
important to think about what data you
want before starting the study.
Summary DATA survey of TB patients
ACTIVITY 2: What are the variables?
What are you observing/counting?
Age   Sex   Weight (kg)   Smear result   Alive after 6 months?
76    F     62            Negative       Yes
55    M     89            Positive       No
82    F     51            Positive       Yes
74    F     67            Negative       Yes
62    M     74            Positive       No
Variables –
factors you are observing/measuring
• Age
• Sex
• Weight
• Smear test result
• Whether dead or alive at 6 months
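To make the idea concrete, the survey can be stored with one record per patient, so that each field of the record is one variable. This is a minimal sketch. The field names (`weight_kg`, `alive_6m` and so on) are illustrative choices, not part of the original survey.

```python
# The five TB patients from the summary table, one dictionary per patient.
patients = [
    {"age": 76, "sex": "F", "weight_kg": 62, "smear": "Negative", "alive_6m": "Yes"},
    {"age": 55, "sex": "M", "weight_kg": 89, "smear": "Positive", "alive_6m": "No"},
    {"age": 82, "sex": "F", "weight_kg": 51, "smear": "Positive", "alive_6m": "Yes"},
    {"age": 74, "sex": "F", "weight_kg": 67, "smear": "Negative", "alive_6m": "Yes"},
    {"age": 62, "sex": "M", "weight_kg": 74, "smear": "Positive", "alive_6m": "No"},
]

# The variables are the aspects recorded for every patient (the column names).
variables = list(patients[0].keys())
print("Variables:", variables)

# The data for one variable are the values observed across all patients.
ages = [patient["age"] for patient in patients]
print("Ages:", ages)
```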
Types of Variable
QUANTITATIVE          QUALITATIVE
• Numerical           • Non-numerical
  – Height              – Eye colour
Qualitative (categorical variables)
Do not take numerical values, but are
recorded and reported in categories
– Marital status
(single, married, cohabiting, separated, divorced, widowed)
• If only 2 categories = BINARY
– Sex
(male or female)
• If > 2 categories with a natural order =
ORDERED CATEGORICAL
– Level of knowledge
(poor, average, good)
Quantitative or numerical variables
Discrete - taking whole numbers only
➢ number of births
➢ number of newly admitted patients
Continuous - taking any value within
meaningful limits; infinitely many values are possible
➢ Haemoglobin
➢ Birth weight
ACTIVITY:
What type of variables are these?
• Age
• Sex
• Weight
• Smear test result – positive or negative
• Whether dead or alive at 6 months
• Blood pressure
• Number of episodes of diarrhoea in one
month
• Village
Defining the variables
• Age – QUANTITATIVE, CONTINUOUS
• Sex – QUALITATIVE, BINARY
• Weight – QUANTITATIVE, CONTINUOUS
• Smear test result - QUALITATIVE,
BINARY
• Whether dead or alive at 6 months -
QUALITATIVE, BINARY
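The classification rules from the preceding slides can be written as a small decision function. This is a sketch only: `variable_type` and its parameters are hypothetical names, and the caller answers the questions (numerical? whole numbers only? how many categories? ordered?) from knowledge of the variable rather than from the recorded values.

```python
def variable_type(numerical, whole_numbers_only=False, n_categories=None, ordered=False):
    """Classify a variable using the rules from the slides."""
    if numerical:
        if whole_numbers_only:
            return "quantitative, discrete"
        return "quantitative, continuous"
    if n_categories == 2:
        return "qualitative, binary"
    if ordered:
        return "qualitative, ordered categorical"
    return "qualitative, categorical"


# Examples taken from the slides.
examples = {
    "Age": variable_type(numerical=True),
    "Sex": variable_type(numerical=False, n_categories=2),
    "Weight": variable_type(numerical=True),
    "Smear test result": variable_type(numerical=False, n_categories=2),
    "Number of births": variable_type(numerical=True, whole_numbers_only=True),
    "Level of knowledge": variable_type(numerical=False, n_categories=3, ordered=True),
    "Marital status": variable_type(numerical=False, n_categories=6),
}

for name, kind in examples.items():
    print(f"{name}: {kind}")
```

The remaining variables in the activity can be classified by answering the same questions.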
Outcome and Exposure variables
• Outcome = what happens
• Outcome variable (Response variable)
– what we are interested in, what we are
looking at
• Exposure variable (Explanatory variable)
– factor which may influence/explain the
outcome
Outcome and Exposure Variables
OUTCOME VARIABLE                  EXPOSURE VARIABLES
What you are measuring/what       Factors affecting the change in
you want to know                  outcome
RESPONSE VARIABLE                 EXPLANATORY VARIABLE
DEPENDENT VARIABLE                INDEPENDENT VARIABLE
• Dead at 6 months                • Sex
• Blood pressure with new drug    • Age
• Malaria yes or no               • Smear result
• Number of malarial parasites    • Sleeping under a mosquito net
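As an illustration of looking at an outcome by an exposure, the sketch below cross-tabulates survival at 6 months (outcome) against smear result (exposure) for the five TB patients in the earlier survey table. Treating smear result as the exposure is a choice made for this example.

```python
from collections import Counter

# (smear result, alive after 6 months?) for the five TB patients in the survey table.
records = [
    ("Negative", "Yes"),
    ("Positive", "No"),
    ("Positive", "Yes"),
    ("Negative", "Yes"),
    ("Positive", "No"),
]

# Count how often each exposure/outcome combination occurs.
counts = Counter(records)

print(f"{'Smear result':<14}{'Alive':>6}{'Dead':>6}")
for exposure in ("Negative", "Positive"):
    alive = counts[(exposure, "Yes")]
    dead = counts[(exposure, "No")]
    print(f"{exposure:<14}{alive:>6}{dead:>6}")
```

With only five patients the table cannot support any conclusion. It only shows how outcome and exposure variables are laid out against each other.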
Migration and HIV-1
• Outcome (Dependent) Variable:
– HIV-1 positive (antibody testing)
• Exposure (Independent) Variables:
– Not changed address in 3 years
– Moved within village
– Moved to neighbouring village
– Left the area
– Joined study area in last 3 years
– Years lived at present house
– Number of lifetime sexual partners
Continuous data in categories
• If there is a large amount of continuous data, it is
useful to create categories for it
• Define number of decimal places
• Create frequency table = recording number
of events/occurrences in each category
Frequency Tables
• Sometimes it is useful to put numerical data
into categories. This is a way of summarising
the data.
• If you have more than 20 observations you
can form a frequency distribution.
• For discrete variables you can make a table
for each value or for groups of values
• For continuous variables, you need first to
form categories or groups
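A minimal sketch of forming categories for a continuous variable is shown below. The haemoglobin values are hypothetical, and `frequency_table` is a helper written for this example. Each category includes its lower limit and excludes its upper limit, which is one way to make sure the categories do not overlap.

```python
def frequency_table(values, start, width, n_bins):
    """Count values in categories [lower, upper): lower limit included, upper excluded."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{value} falls outside the categories")
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# Hypothetical haemoglobin values (g/dl), invented for illustration.
haemoglobin = [9.4, 10.0, 10.9, 11.0, 11.2, 12.5, 12.9, 13.0, 13.7, 14.1]

table = frequency_table(haemoglobin, start=9, width=1, n_bins=6)

print("Haemoglobin (g/dl)   Frequency")
for lower, upper, count in table:
    print(f"{lower} to <{upper}".ljust(21) + str(count))
```

Because a value such as 10.0 can fall in only one category (10 to <11), every observation is counted exactly once.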
Discrete data: Frequency table
Method of delivery    No. of births   Percentage
Normal                478             79.7
Forceps               65              10.8
Caesarean section     57              9.5
Total                 600             100.0
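The percentage column is each count divided by the total, multiplied by 100. A short sketch that recomputes it from the counts in the table (the variable names are chosen for this example):

```python
# Number of births by method of delivery, taken from the table above.
births = {"Normal": 478, "Forceps": 65, "Caesarean section": 57}

total = sum(births.values())

# Percentage = count / total * 100, rounded to one decimal place.
percentages = {method: round(100 * count / total, 1)
               for method, count in births.items()}

print(f"{'Method of delivery':<20}{'No.':>6}{'%':>8}")
for method, count in births.items():
    print(f"{method:<20}{count:>6}{percentages[method]:>8.1f}")
print(f"{'Total':<20}{total:>6}{100.0:>8.1f}")
```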
Continuous data - FREQUENCY TABLES
Categories of width 5:
Exam score   # of students
40-44        1
45-49        0
50-54        2
55-59        2
60-64        6
65-69        6
70-74        3
75-79        7
80-84        2
85-89        0
90-94        1
Categories of width 10:
Exam score   # of students
40-49        1
50-59        4
60-69        12
70-79        10
80-89        2
90-99        1
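The two exam-score tables summarise the same students with different category widths. As a check, the sketch below regroups the 5-point categories into 10-point categories. The counts come from the slide, and the variable names `narrow` and `wide` are chosen for this example.

```python
# Exam scores grouped in 5-point categories: (lowest score, highest score) -> students.
narrow = {
    (40, 44): 1, (45, 49): 0, (50, 54): 2, (55, 59): 2,
    (60, 64): 6, (65, 69): 6, (70, 74): 3, (75, 79): 7,
    (80, 84): 2, (85, 89): 0, (90, 94): 1,
}

# Merge each 5-point category into the 10-point category that contains it.
wide = {}
for (low, high), count in narrow.items():
    wide_low = (low // 10) * 10
    key = (wide_low, wide_low + 9)
    wide[key] = wide.get(key, 0) + count

print("Exam score   # of students")
for (low, high), count in wide.items():
    print(f"{low}-{high}".ljust(13) + str(count))
print("Total".ljust(13) + str(sum(wide.values())))
```

Wider categories give a shorter table but hide detail: for example, the 10-point table no longer shows that no student scored 45-49.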
Study of 70 women - haemoglobin
• Create a frequency table for this data
• How many categories will you have?
• How will you make sure that the
categories do not overlap?
• How will you draw the table?
Assignment
• Explain importance of data stratification
References
• Stewart, A. (2002). Basic Statistics and
Epidemiology: A Practical Guide. Radcliffe
Medical Press.
• Bonita, R., Beaglehole, R., & Kjellström, T.
(2006). Basic Epidemiology. World Health
Organization.
• Bland, M. (2000). An Introduction to Medical
Statistics (3rd ed.). Oxford University Press.
• Johnson, B., & Christensen, L. (2008).
Educational Research: Quantitative,
Qualitative, and Mixed Approaches. Sage.