Biostat Lecture 2
2 Sources of Data Collection
1. Primary Data
2. Secondary Data
1. Primary Data
▪ Obtained firsthand by the
investigator from original sources.
• Thesis & dissertations
• Interview and questionnaire
• Letters, diaries and
autobiographies
• Experimentation
• Journals and newspapers
2. Secondary Data
▪ Are finished products taken
from raw materials.
▪ Data which already exist.
• Data obtained from registry
of cases of hospitals
• Documented materials
• Book of factual information
(textbooks)
What is biostatistics? What does biostatistics mean?
(Biostatistics meaning, definition & explanation)
■ Biostatistics is the application of
statistics to a wide range of topics
in biology.
■ The science of biostatistics
encompasses the design of
biological experiments, especially
in medicine, pharmacy, agriculture
and fishery; the collection,
summarization, and analysis of
data from those experiments; and
the interpretation of, and inference
from, the results.
■ A major branch of this is medical
biostatistics, which is exclusively
concerned with medicine and
health.
What are these numbers?
66, 79, 64
❖ The relationship between
variables is written as a “function”:
𝐹(𝑥) = 210 − 𝑥 = 𝐻𝑅
There are two broad flavours of
variables (Types of Variables;
Independent/Dependent Variable)
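As a small illustration of the function notation 𝐹(𝑥) = 210 − 𝑥 = 𝐻𝑅, the sketch below treats x as the independent variable and HR as the dependent variable. The function name `predicted_hr` and the input values are illustrative assumptions; the slide does not say what x measures.

```python
def predicted_hr(x):
    """Dependent variable HR computed from the independent variable x: F(x) = 210 - x."""
    return 210 - x

# Hypothetical inputs, chosen only to show how each x maps to one HR value.
inputs = [20, 40, 60]
outputs = [predicted_hr(x) for x in inputs]

for x, hr in zip(inputs, outputs):
    print(f"x = {x} -> HR = {hr}")
```

Changing x changes HR, never the other way round, which is what makes x the independent variable here.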
Dichotomous Variables
■ The most common type of categorical
variable is “dichotomous”, meaning that it
has two levels or two possible values.
Why is this important?
✓ This is important because a
lot of the computations that will be
done in future lectures depend
upon whether or not the exposure
or outcome variables are
dichotomous. When both are
dichotomous, the data can be laid out
in a two-by-two contingency table.
■ “Dichotomize” means to convert a non-
dichotomous variable to a dichotomous
one.
■ Categorizing Continuous Variables:
categorical variables with more than
two levels can also be created.
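The ideas on this slide can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the records are made-up values, the variable is taken to be age, and the cut-off of 50 and the three age bands are arbitrary choices, not values from the lecture.

```python
from collections import Counter

# Hypothetical records: (age in years, outcome present?). Made-up values.
records = [(23, False), (35, False), (41, True), (52, True),
           (58, False), (64, True), (67, True), (29, False)]

def dichotomize(age, cutoff=50):
    """Convert a continuous age into a variable with two levels."""
    return "older" if age >= cutoff else "younger"

def categorize(age):
    """Convert a continuous age into a categorical variable with three levels."""
    if age < 40:
        return "under 40"
    if age < 60:
        return "40-59"
    return "60 and over"

# Two-by-two contingency table: dichotomous exposure by dichotomous outcome.
table = Counter((dichotomize(age), outcome) for age, outcome in records)

print("            outcome  no outcome")
for level in ("older", "younger"):
    print(f"{level:<10} {table[(level, True)]:>8} {table[(level, False)]:>11}")

# Counts per level of the three-level version of the same variable.
levels = Counter(categorize(age) for age, _ in records)
print(dict(levels))
```

The same continuous variable yields either a two-level or a three-level categorical variable, depending only on where the cut-offs are placed.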
Sampling
▪ Descriptive Statistics
❖ Just describing the people in front of
me.
▪ Inferential Statistics
❖ Using the information to learn
something about the larger population at
hand.
■ The sample comes from the larger
population, also called “reference
population.”
■ The sample is extracted from the larger
population, then analyzed to learn
something about the larger population.
■ Statistics are performed on this
representative sample in order to infer
properties about the reference population.
■ It’s important that the sample be
representative.
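A minimal sketch of the sample-versus-population idea, using only Python's standard library. The reference population is simulated (10,000 made-up heart-rate values), and the sample size of 200 and the fixed seed are assumptions chosen for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical reference population: 10,000 simulated resting heart rates.
population = [random.gauss(70, 10) for _ in range(10_000)]

# Draw a simple random sample (without replacement) from the population.
sample = random.sample(population, 200)

# Descriptive statistics: summarize only the sample in hand.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential step: the sample mean serves as an estimate of the population mean.
# In practice the population mean is unknown; it is computed here only for comparison.
population_mean = statistics.mean(population)

print(f"sample mean     = {sample_mean:.1f} (sd {sample_sd:.1f})")
print(f"population mean = {population_mean:.1f}")
```

Because every member of the population is equally likely to be drawn, the sample tends to be representative and its mean lands close to the population mean.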
The Null Hypothesis
o What is it?
▪ It is a statement that there is no
relationship between the variables we are testing.
o Why do we care?
▪ Statistical tests allow us to either “reject” or
“fail to reject” the null hypothesis.
o H₀: the average number of subjects getting
better in the test group is no different from the
average number of subjects getting better in the placebo group.
The p-Value
o A “p-value” is
computed from a
statistical test. It
tells us whether
we should reject
the null
hypothesis.
o Whether or not we
reject the null is
determined by
whether the p-value
is below a certain
cut-off, which we call
the alpha value.
Traditionally, alpha is
set to 0.05 or 0.01.
■ A convenient, though inaccurate,
interpretation is that the p-value is the
probability that your result was due to
chance → More accurately, the p-value
is the probability of obtaining a result
at least as extreme as the one observed,
assuming the null hypothesis is true.
The probability of incorrectly rejecting
a true null hypothesis is the alpha value.
o A useful memory aid: “If the p is
low, the null (hypothesis) must
go.”
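To make the reject / fail-to-reject decision concrete, here is a minimal sketch that computes a p-value for a test-group versus placebo-group comparison and compares it with alpha. The lecture does not name a test at this point; Fisher's exact test is used here only because it can be written with the standard library. The trial counts (8 of 10 improved on treatment, 3 of 10 on placebo) are made up.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # Number of tables with k in the top-left cell, all margins held fixed.
        return comb(col1, k) * comb(n - col1, row1 - k)

    observed = weight(a)
    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    # Add up every table that is no more likely than the observed one under H0.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)

alpha = 0.05  # conventional cut-off

# Hypothetical trial: rows are test group and placebo group,
# columns are "got better" and "did not get better".
p_value = fisher_exact_two_sided(8, 2, 3, 7)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.4f}; {decision}")
```

With these made-up counts the p-value comes out just under 0.07, which is above alpha = 0.05, so the null hypothesis is not rejected even though the two groups look different at first glance.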
Confidence Intervals
■ A confidence interval is another way to express
a statistical result along with its significance
level, without having to use a p-value.
o Gives us a range of values where the answer
probably sits.
Example:
– The mean age of university students is 21
years old (18, 21.5).
– Here (18, 21.5) is the “confidence interval” of the parameter
estimate, 21.
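A rough sketch of how such an interval could be computed for a mean. The ages are made-up values, the 95% level is an assumed choice, and the normal approximation is used for simplicity; with a sample this small, a t-distribution would normally give a slightly wider interval.

```python
import statistics

# Hypothetical ages of 12 university students (made-up values).
ages = [18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 24, 25]

n = len(ages)
mean_age = statistics.mean(ages)
std_err = statistics.stdev(ages) / n ** 0.5  # standard error of the mean

# 95% interval under the normal approximation (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
lower = mean_age - z * std_err
upper = mean_age + z * std_err

print(f"mean age = {mean_age:.1f} (95% CI {lower:.1f}, {upper:.1f})")
```

The point estimate is reported together with the range, in the same "estimate (lower, upper)" form as the example above.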
Commonly Used Statistical
Tests