Introduction To Biostatistics
E-mail - [email protected]
11/27/2022 Werkneh M. 1
Table of contents
Chapter one: Introduction to Biostatistics
• Basic statistical concepts
• Classification of statistics
• Types of variables
• Application and limitation of statistics
• Data types and measurement scales
• Data collection methods
• Questionnaire design and interviewing techniques
Chapter two: Methods of data organization, presentation and summarization
• Tables and diagrams
• Measures of central tendency
• Measures of variation
Table of contents…
• Types of probability
Chapter one: Introduction to Biostatistics
Objectives:
At the end of this section, the student will be able to:
• Define Statistics and Biostatistics
• Discuss the role of statistics in health sciences
• Understand the importance of statistics in health sciences
• Describe methods of data collection
• Understand common types of variables and measurement scales
• Know how to design a questionnaire
Introduction to Biostatistics
What is statistics?
What is biostatistics?
Definition
Statistics is important to all professionals who want to help and make the world a better place.
We see “vital statistics” in newspapers: announcements of life events such as births,
marriages and deaths.
Uncertainties
Scenario (court procedure): A crime has been discovered and a suspect has been
identified. After a police investigation collects evidence against the suspect, the prosecutor presents
the summarized evidence to a jury. The jurors are given the rules on convicting beyond a
reasonable doubt and on reaching a unanimous decision, and then they debate. After the debate, the jurors vote
and a verdict is reached: guilty or not guilty. Why do we need this time-consuming and
costly process of trial by jury? One reason is that the truth is often unknown, or at least
uncertain. Perhaps only the suspect knows, but he or she does not talk. It is uncertain because
of variability (every case is different) and because of possibly incomplete information. Trial
by jury is the way our society deals with uncertainty; its goal is to minimize mistakes.
Definitions
The American Statistical Association (ASA) defines statistics as “the science of learning from data, and of measuring,
controlling and communicating uncertainty.”
It is a methodological discipline that exists not for its own sake but to offer other
fields of study a coherent set of ideas and tools for dealing with data.
Definitions …
It is a methodology that scientists and mathematicians have developed for
collecting, analyzing and interpreting data and drawing conclusions from information,
together with the detailed planning that precedes all these activities
Statistics is a way of thinking about numbers (collection, analysis, and
presentation) with emphasis on relating their interpretation and meaning to the
manner in which they are collected
Two main areas:
Mathematical statistics and
Applied statistics
Classifications of statistics
Mathematical statistics
• Development of new methods of statistical inference
• Requires detailed knowledge of abstract mathematics
Applied statistics
• Application of methods of mathematical statistics to a specific area
Statistical methods can be used to answer questions such as:
What kind and how much data need to be collected?
How should we organize and summarize the data?
How can we analyse the data and draw conclusions from it?
How can we assess the strength of the conclusions and evaluate their
uncertainty?
Cont’d…
The term statistics is used to mean either statistical data or statistical
methods.
NB: Even though statistical data always denote figures (numerical descriptions), it
must be remembered that not all numerical descriptions are statistical data.
For numerical descriptions to be called statistics, they must possess the
following characteristics:
Characteristics of statistical data
i. They must be aggregates: statistics are a 'number of facts.' A single fact,
even though numerically stated, cannot be called statistics.
ii. They must be affected to a marked extent by a multiplicity of causes.
iii. They must be enumerated or estimated according to a reasonable standard of
accuracy.
If the basis is incorrect, the results are bound to be misleading.
iv. They must have been collected in a systematic manner.
Facts collected in an unsystematic manner, without a complete awareness
of the object of inquiry, will be confusing and cannot be made the basis of valid
conclusions.
Characteristics of statistical data…
v. They must be placed in relation to each other
Numerical facts may be placed in relation to each other in terms
of time, space or condition
That is, they must be comparable
Statistical Methods
It refers to a body of methods that are used for collecting, organising,
analysing and interpreting numerical data in order to understand a
phenomenon or make wise decisions.
It is a branch of the scientific method and helps us understand the
object under study better.
Statistical cycle
Statistical cycle …
• The Problem step is about turning vague feelings into precise informational
goals and specific questions that can be answered using data
• Forming useful questions that can realistically be answered using statistical data
• The Plan step is about deciding what people/objects/entities to obtain data on, what
things we should “measure,” and how we are going to do all of this.
• The Data step is about obtaining the data, storing it, and “whipping it into shape”
(data wrangling and cleaning).
• The Analysis step is about obtaining results through detailed examination of the
data.
• The Conclusions step is about making sense of it all and then abstracting and
communicating what has been learned.
• In practice, there is back-and-forth between the major steps
Biostatistics
• Statistics is a very broad subject, with applications in a vast number of different
fields — business, education, psychology, agriculture, and economics, to mention
only a few.
• When the data analyzed are derived from the biological sciences and medicine,
we use the term biostatistics to distinguish this particular application of statistical
tools and concepts.
• Biostatistics is a branch of applied statistics concerned with the application of
statistical methods to medical and biological problems.
• However, in some circumstances standard methods may be modified by
biostatisticians.
Rationale of Biostatistics
Statistical methods are used to explain and predict some health outcomes
and the direction of epidemics and pandemics, and
They influence decision-makers in public health.
They guide public health and other healthcare practitioners on how to go about
controlling these diseases.
Biostatistics is a critical and invaluable tool in developing public health policy
and initiatives
The scientific community relies on accurate and timely data to deal with outbreaks
of infectious diseases such as Ebola and COVID-19.
From trial design to protocol development, biostatistics helps clinical
research in various ways; the areas listed below are a few examples.
Relevance of biostatistics
Program evaluation
To create population-based interventions
Identifying barriers
To determine the accuracy of measurements
To compare measurement techniques
To assess diagnostic tests, to determine normal values
To estimate prognosis and to monitor patients
Provide a way of organizing information
Assessment of health status
Relevance of biostatistics…
Resource allocation
To determine the nature of information
Magnitude of association: Strong vs weak association
Assessing risk factors: Cause & effect relationship
Evaluation of a new vaccine or drug
Is the effect due to chance or some bias?
Drawing inferences: from sample information to the population
Limitation of statistics
• It deals with only those subjects of inquiry that are capable of being
quantitatively measured and numerically expressed
Types of Statistics
Depending on how data are used, statistics is divided into two types:
1. Descriptive statistics
2. Inferential statistics
1. Descriptive statistics
Deals only with the set of information that has actually been collected
Consists of the collection, organization, summarization, and presentation of data.
Focuses purely on description and does not involve generalizing beyond the data at
hand.
There are 3 main types of descriptive statistics:
The distribution concerns the frequency of each value.
The central tendency concerns the averages of the values.
The variability or dispersion concerns how spread out the values are.
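As a minimal sketch, the three kinds of descriptive statistics listed above can be computed with Python's standard `collections` and `statistics` modules. The blood pressure readings are made-up values for illustration only, and the choice of the sample standard deviation (divisor n - 1) as the measure of spread is one common option among several.

```python
from collections import Counter
import statistics

# Hypothetical systolic blood pressure readings (mmHg) for ten patients;
# illustrative values only, not data from this course.
sbp = [118, 122, 130, 118, 140, 125, 118, 135, 122, 128]

# 1. Distribution: how often each value occurs
frequency = Counter(sbp)

# 2. Central tendency: averages of the values
mean = statistics.mean(sbp)
median = statistics.median(sbp)
mode = statistics.mode(sbp)

# 3. Variability (dispersion): how spread out the values are
value_range = max(sbp) - min(sbp)
sd = statistics.stdev(sbp)  # sample standard deviation (divisor n - 1)

print(sorted(frequency.items()))
print(mean, median, mode, value_range, round(sd, 2))
```

None of these numbers says anything about patients outside the ten measured; that is what keeps them descriptive rather than inferential.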
Types of Statistics…
2. Inferential statistics.
Generalizing from our data to another set of cases or from samples to populations
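One common inferential tool is a confidence interval for a population mean. The sketch below is hypothetical: it simulates a population of heights (in real studies the population is usually not observable), draws one random sample, and uses the normal (z) approximation; a t-based interval would be more exact for small samples.

```python
import random
import statistics

random.seed(1)
# Hypothetical population of 10,000 adult heights (cm)
population = [random.gauss(165, 8) for _ in range(10_000)]

# In practice only this sample would be observed
sample = random.sample(population, 100)
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

# Approximate 95% confidence interval for the population mean (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
margin = z * s / n ** 0.5
ci = (x_bar - margin, x_bar + margin)

print(f"sample mean = {x_bar:.1f} cm, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

The step from `sample` to a statement about `population` is the generalization that distinguishes inferential from descriptive statistics.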
Definition of common …
Parameter: a numerical expression of a population measurement
Example
Population mean (µ),
Population variance (σ²),
Population standard deviation (σ), etc.
Statistic: a numerical expression of a sample measurement.
Example:
Sample mean (x̄),
Sample variance (s²) and
Sample standard deviation (s).
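The difference between a parameter and a statistic can be sketched in Python, assuming a small hypothetical population of patient ages. Parameters use every member of the population (`pvariance`, `pstdev` divide by N), while the sample statistics use only the sampled values (`variance`, `stdev` divide by n - 1).

```python
import random
import statistics

random.seed(42)
# Hypothetical finite population: ages of 500 clinic patients
population = [random.randint(18, 80) for _ in range(500)]

# Parameters: computed from the whole population
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Statistics: computed from one random sample; they change from sample to sample
sample = random.sample(population, 30)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)  # divides by n - 1
s = statistics.stdev(sample)

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"x_bar = {x_bar:.1f}, s = {s:.1f}")
```

Rerunning with a different seed changes `x_bar` and `s` but not `mu` and `sigma` for a given population, which is the practical meaning of the two definitions.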
Definition of common …
Variable: a characteristic that takes different values in different persons, places, or
things (i.e., it is not the same when observed in different individuals)
E.g. blood pressure, heart rate, height, weight, age, sex, temperature, etc.
Variables can be classified as
• Categorical (qualitative) and
• Numeric (quantitative) variables
Definition of common…
Categorical variables
• Characteristics that cannot be quantified
• Variables that can be placed into distinct categories
• Can be either nominal or ordinal.
• E.g. gender (male or female), religion, residence etc.
• Data: the values (measurements/observations) that variables can assume.
• Random variables: variables whose values are determined by chance
• Data set: a collection of data values
• Data value or a datum - each value in the data set
Summary of variable classification
Variables may be classified into two main categories: categorical and numeric. Each
category is then classified in two subcategories: nominal or ordinal for categorical
variables, discrete or continuous for numeric variables.
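The two-level classification can be held in a small lookup table. This is a hypothetical sketch: the variable names and assigned types are illustrative (some come from these notes, while "number of children" is added as a typical discrete count), and in practice the type can depend on how a variable is actually measured.

```python
# Hypothetical lookup table: variable -> (main category, subcategory)
VARIABLE_TYPES = {
    "sex": ("categorical", "nominal"),
    "blood group": ("categorical", "nominal"),
    "pain level": ("categorical", "ordinal"),
    "tumor stage": ("categorical", "ordinal"),
    "number of children": ("numeric", "discrete"),
    "height (cm)": ("numeric", "continuous"),
    "weight (kg)": ("numeric", "continuous"),
}


def variables_of(main_type, subtype=None):
    """Return sorted variable names of a main category, optionally filtered by subtype."""
    return sorted(
        name
        for name, (main, sub) in VARIABLE_TYPES.items()
        if main == main_type and (subtype is None or sub == subtype)
    )


print(variables_of("categorical", "ordinal"))
print(variables_of("numeric"))
```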
Measurement/observations
Assignment of numbers to objects or events according to a set of rules
Different sets of rules produce different measurement scales
It is the basis of all scientific inquiry and an essential component of research methodology.
It is a critical junction between scientific theory and application:
a process through which researchers describe, explain, and predict
phenomena and constructs of our daily existence
Measurement is important in research design in two critical areas:
1. It allows researchers to quantify abstract constructs and variables.
2. The level of statistical sophistication that can be used to analyze data from a study
depends directly on the scale of measurement used to quantify the variables of interest.
Measurement scales
It is common practice to distinguish four basic types of data (scales of
measurement), grouped under the broader headings of nonmetric and metric measurement:
Nominal Scale
Ordinal scale
Interval scale
Ratio scale
Nominal and ordinal scales are nonmetric measurement scales.
Nominal Scale
The least sophisticated, lowest level of measurement
It consists of naming and is used only for qualitative data
Classifies data into mutually exclusive, exhaustive categories
No order or ranking can be imposed in a quantitative sequence
Used only to qualitatively classify or categorize, not to quantify
No absolute zero point
Standard mathematical operations cannot be applied
Purely descriptive and cannot be manipulated mathematically
e.g. Sex (female, male), Exam result (pass, fail), Blood group (A, B, O or
AB), Eye color (blue, green, brown, black), marital status, etc.
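For nominal data, counting is the main permissible operation: frequencies, proportions and the mode. A minimal sketch with hypothetical donor blood groups follows; computing a "mean blood group" would be meaningless, so no such step appears.

```python
from collections import Counter

# Hypothetical blood groups recorded for 12 blood donors
blood_groups = ["O", "A", "B", "O", "AB", "A", "O", "O", "B", "A", "O", "A"]

# Frequencies per category
counts = Counter(blood_groups)

# The mode: the most frequent category
most_common_group, frequency = counts.most_common(1)[0]

# Proportions (relative frequencies) per category
proportions = {group: n / len(blood_groups) for group, n in counts.items()}

print(counts)
print(most_common_group, frequency)
```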
Ordinal scale
Builds on nominal measurement. However,
the ability to measure both the identity and the magnitude of variables makes it a
higher level of measurement
Orders variables, with some numbers representing more than others
Gives information about relative position but not about the interval between the ranks or
categories
Qualitative in nature and lacks the mathematical properties necessary for
sophisticated statistical analyses.
The underlying idea is the concept of greater than or less than
NB: it is impossible to express the real difference between measurements in numerical
terms.
Ordinal scale…
There are both category and rank. However, precise differences between
the ranks do not exist
Spaces or intervals between the categories are not necessarily equal
Ordinal measurements tell you the direction of the difference between two
individuals
Example
Pain level (Mild, Moderate, Severe), Tumor stage (Stage 0-IV)
Response to treatment (poor, fair, good)
Severity of disease (mild, moderate, severe)
Income status (low, middle, high)
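A minimal sketch of ordinal data in Python, using hypothetical pain ratings: the position of a category in an ordered list encodes its rank only. Greater-than/less-than comparisons and the median depend on ordering alone, so they are valid; averaging the rank numbers is deliberately avoided because it would assume equal spacing between categories.

```python
# Ordered categories: list position encodes rank, not distance
PAIN_LEVELS = ["Mild", "Moderate", "Severe"]
RANK = {level: i for i, level in enumerate(PAIN_LEVELS)}

# Hypothetical pain ratings from seven patients
ratings = ["Moderate", "Mild", "Severe", "Moderate", "Mild", "Severe", "Moderate"]

# Comparisons (greater than / less than) are meaningful
severe_worse_than_mild = RANK["Severe"] > RANK["Mild"]

# The median needs only the ordering (n is odd here, so take the middle value)
ordered = sorted(ratings, key=lambda level: RANK[level])
median_rating = ordered[len(ordered) // 2]

print(severe_worse_than_mild, median_rating)
```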
Metric measurement scales
• Interval and ratio scales are the two types of metric measurement scales,
• They are quantitative in nature.
• They represent the most sophisticated level of measurement and
• lend themselves well to sophisticated and powerful statistical techniques
Interval scale
Builds on ordinal measurement by providing precise differences between units
and
identifies both the direction and the magnitude of a difference
Numbers are scaled at equal distances
There is no meaningful/true zero; instead, the zero point is arbitrary
Lack of an absolute zero point makes multiplication and division (ratios) meaningless
Addition and subtraction are possible
• e.g. on either the Fahrenheit or the Celsius scale, zero does not represent a complete absence of
temperature, and
• the difference between 10 and 20 degrees is the same as the difference between 40 and 50
degrees. There might be a qualitative difference, but the quantitative difference is identical: 10
degrees
• But 30 °C is not twice as hot as 15 °C
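The temperature example can be checked numerically. In this sketch, converting Celsius to Fahrenheit (another interval scale) keeps equal differences equal, but the ratio 30/15 changes with the scale; Kelvin, which has an absolute zero, gives the physically meaningful ratio of about 1.05 rather than 2. The helper function names are illustrative.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    return c + 273.15  # Kelvin has an absolute (true) zero


# Equal differences stay equal after conversion to another interval scale
diff_c = (20 - 10, 50 - 40)
diff_f = (celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10),
          celsius_to_fahrenheit(50) - celsius_to_fahrenheit(40))

# Ratios do not survive the conversion, because the zero point moves
ratio_c = 30 / 15
ratio_f = celsius_to_fahrenheit(30) / celsius_to_fahrenheit(15)
ratio_k = celsius_to_kelvin(30) / celsius_to_kelvin(15)

print(diff_c, diff_f)
print(round(ratio_c, 3), round(ratio_f, 3), round(ratio_k, 3))
```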
Ratio scale
The second type of metric measurement scale
Identical to the interval scale, except that it has an absolute zero point
Data are presented in a frequency distribution in logical order.
Unlike with interval scale data, all mathematical operations are possible
Highest level of measurement; allows the use of sophisticated
statistical techniques.
Ratios matter because there is a natural (true) zero value, e.g. for salaries a value of
zero indicates none of the variable.
Examples include money, height, weight, and time
Ten dollars is 10 times as much as 1 dollar, and 20 dollars is twice as much as 10
dollars.
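On a ratio scale, changing the unit only multiplies every value by a constant and leaves zero at zero, so statements such as "twice as much" do not depend on the unit chosen. A minimal sketch with hypothetical amounts; the kilogram-to-pound factor 2.20462 is approximate.

```python
def ratio(a, b):
    return a / b


# Money: $0 means no money at all, a true zero
dollars = (20, 10)
cents = tuple(100 * d for d in dollars)  # change of unit: multiplication only

# Weight: 0 kg means no weight; 1 kg is approximately 2.20462 lb
kg = (80, 40)
lb = tuple(w * 2.20462 for w in kg)

print(ratio(*dollars), ratio(*cents), round(ratio(*lb), 6))
```

This is the contrast with Celsius temperature, whose arbitrary zero makes the same kind of ratio depend on the scale.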
Cont’d…
There is not complete agreement among statisticians about the classification of
data into one of the four categories.
Most data analysis techniques that apply to ratio data also apply to interval data
Therefore, in most practical settings, these types of data (interval and ratio) are
grouped under metric data
In some other instances, numeric data are instead classified as numerical discrete
and numerical continuous
Data collection methods/techniques
Data collection methods…
Data can be collected in a variety of ways, and
the factors influencing the choice of data collection method include:
Type of data/variable
Objective of the study
Availability of data
Money
Time
Quality needed
Characteristics of study participants
Types of data:
Primary source
Secondary source
Sources of Data
2- External sources
Published reports, commercially available data banks, or the research literature
3- Surveys, e.g. the Demographic and Health Surveys (DHS)
4- Experiments
Data collection methods/techniques
Methods of data collection
• Formal testing (psychological, educational, academic, intelligence)
• Observation
• Face-to-face and self-administered interviews
• Postal or mail method and telephone interviews
• Using available information
• Focus group discussions (FGD)
• Other data collection techniques – Rapid appraisal techniques, life histories, case
studies, etc.
Observation
Systematically selecting, watching and recording behaviors of people or
phenomena
Gives additional, often more accurate information on behavior than interviews or
questionnaires
Can be used in any kind of research (quantitative, qualitative)
It can range from simple visual observation to
high-level observation using sophisticated machines and equipment, such as biochemical tests, X-ray
machines, microscopes, and clinical and microbiological examinations.
An efficient approach when researchers study and quantify some type of behavior.
There should be a standardized outline/guidelines for the observations prior to
actual data collection to ensure adequate validity and inter-rater reliability
Observation
• The oldest and most basic tool
• ‘Data’: body language, facial expression, behavior, other non-verbal expressions,
movements, etc.
• Observation of behaviors, actions, activities and interactions helps in understanding
more than what people say about situations, and in understanding complex situations more fully.
• e.g. a researcher may be interested in studying cooperative behavior of university students in a
classroom
Advantage
Provides deep understanding and relatively more accurate data on behavior and
activities
Permits the researcher/evaluator to enter into and understand the situation/context.
Gives more detailed and context-related information
Allows the researcher to observe whether people do what they say they do; often used for
exploring deviant behavior
Useful to access tacit knowledge of subjects - the subconscious knowledge that
they would not be able to verbalize in an interview setting
Provide direct information about behavior of individuals and groups.
Disadvantages
Time consuming and expensive
Observer bias: selective perception of observer may distort/affect the investigation.
Ethical issues concerning confidentiality or privacy may arise.
May affect behavior of participants
Needs well-qualified, highly trained observers, who may need to be content experts
when high-level machines are used, and who need good research skills and command of the local
language
A time lag between observation and note taking is likely, which requires a good memory
and the ability to take notes afterwards
Interview
• Oral questioning of respondents so that they tell us their story in their own way,
assuming their perspectives are meaningful, knowable and can affect the
success of the research/project.
• Relatively simple approach, but it can produce a wealth of information
• Relatively inexpensive and efficient way to collect a wide variety of data that
does not require formal testing.
• The effectiveness of an interview depends on how it is structured
• Interviewer should be trained properly to avoid variation of collected data
• Responses can be recorded by:
Writing them down
Tape-recording
Combination of them
Interview …
Interviews can be conducted with varying degrees of flexibility
High degree of flexibility: when
Studying sensitive issues (e.g. commercial sex)
The researcher has little understanding of the problem
Frequently applied in exploratory studies
Low degree of flexibility: when
The researcher is relatively knowledgeable about expected answers or
The number of respondents being interviewed is relatively large
Questionnaires may be used with a fixed list of questions in a standard
sequence, with mainly fixed answers
Types of interview
In-depth Interviews
The interviewer does not follow a rigid form; rather, a limited set of open-ended
questions is used with extensive probing
Interview Guide
Includes a list of questions or issues that are to be explored
Helps the interviewer to remember the points to cover
Suggests ways of approaching and talking about topics, making interviewing more
systematic and comprehensive.
Reminds the interviewer about probes and ways of asking questions
Gives a possible order of topics; and
Helps the interviewer enable people to talk in their own way, as fully as
possible.
Includes an introduction and way of ending the interview.
Conducting an In-depth …
Cont’d…
Important points to remember
Avoid variations in the interview setting
Consent of the respondent should be obtained
Knowledge gap between the interviewer and interviewee must be considered to facilitate
understanding
Use a tape recorder and/or a note taker who can assist
Confidentiality of information must be maintained
Receive the information accurately
Information can be distorted by the interviewer through
• Fatigue
• Bias (expectation of answers),
• Preoccupation with taking notes and
• By technical languages (Jargons) foreign to the interviewee
Tips for Effective Interviewing
Telephone interview
They are less costly
People may be more candid in their opinions since there is no face-to-face contact.
A major drawback of the telephone survey is that some
people may not have phones or may not answer when the calls are
made, so not everyone has the same chance of being surveyed.
Some people have unlisted numbers or only cell phones, so they cannot be surveyed.
The tone of voice of the interviewer might influence the response of
the person who is being interviewed.
We may not get rich and detailed responses
Mail interview
Use of documentary sources/available information
Available information includes
Clinical and other personal records,
death certificates,
published mortality statistics,
census publications, etc. Examples include
Official publications of the Central Statistical Authority
Publications of the Ministry of Health and other ministries
Newspapers and journals.
International publications, such as those of WHO, the World Bank and UNICEF
Records of hospitals or any health institutions.
The use of key informants is an important technique for gaining access to available
information
Advantages
Ease of access
Time-saving
Longitudinal analysis
Disadvantages
Data may not always be complete and precise enough, or may be too disorganized
Bias
Self-administered questionnaire
Focus group discussion (FGD)
FGD is a qualitative data collection method in which approximately 6-10 people, guided
by a facilitator, discuss an issue/research topic among themselves in depth
People with something in common discuss an issue (homogeneous group
composition)
It is focused. Ideal size: 6-10 (maximum 12, minimum 4)
Often preferred over in-depth interviews because:
Group interaction stimulates richer responses and the emergence of new ideas
Observation: the researcher observes and gets first-hand insights
Cost and timing: an FGD can be done more quickly and generally less expensively
than a series of in-depth interviews
FGD…
It requires
Relaxed, informal/natural setting
Duration: 1.5-2 hours (varies)
One leader (moderator) and one assistant (or co-moderator)
Good communication skills
Recording of all verbal and non-verbal messages, in addition to notes
FGD guide: tool used to guide the discussion, probe/follow up questions
When to use FGD
Idea generation
Problem identification and definition
Evaluation of message concepts
Program design
FGD… Advantages
Produce a lot of information in less time and cost /Richness of data/
Excellent for obtaining information from illiterate communities
Generally more acceptable to the community
Explore a variety of opinions or views within a group
Flexibility in covering topics (may uncover unanticipated ideas that are important)
Group synergy (group dynamics)
Broader understanding by providing well-grounded data on social and cultural norms
with respect to health, disease and disease prevention
Ability to study special respondents
Youth
Professionals (doctors, lawyers)
Disadvantages
Needs more time for preparations
Requires good facilitation skills
Lack of generalizability (small sample size)
High selection bias
Might be misused
FGD is not a replacement for quantitative research
Subject to Interpretation - subjective
Cost-per-respondent is high (compared to survey)
Results dependent on skill of moderator in running the group and analysis
(researcher)
Good facilitation skills
Good facilitation skills…
Treat participants as the experts: the facilitator neither agrees nor disagrees with what is
said; there are no right or wrong answers
Active listening: pay attention to what participants say and follow up with
relevant questions and probes
Don’t lose control
Build rapport with and among participants; it starts from welcoming
Prevent dominance
Skills Required for Moderator
Observation
Interpersonal
Communication
Interpretive
FGD… challenges
More time for preparations
The moderator may disrupt group interaction
Group members can affect each other (“polarization”, i.e. splitting into two camps)
FGDs may not work well for some sensitive topics that require self-disclosure
Individual behaviors
Criticizing/undermining others’ ideas
Blockers
Recognition seekers
Withdrawing (expressing no view)
Dominating/authority: “I know everything”, expecting others to confirm
Fear of consequences
Hip pocket decision, the influence of those in power
Questionnaire Design
General Considerations
With only a few exceptions, long questionnaires get less response than short
questionnaires.
Questionnaire design…
Reading assignments
Thank you!!