Introduction To Biostatistics
Biostatistics
Lecture Objectives
◼ Overall: To give a basic understanding
of descriptive statistics
◼ Specific:
– understand the branches of statistics
– understand the different types of data that
can be collected
Statistics
◼ The science of collecting, monitoring,
analyzing, summarizing, and
interpreting data.
– This includes design issues as well.
Branches of Statistics
◼ Descriptive statistics
– Gives numerical and graphic procedures to
summarize a collection of data in a clear and
understandable way.
– Provides summary indices for a given data set, e.g.
arithmetic mean, median, standard deviation,
coefficient of variation, etc.
◼ Inductive (inferential) statistics
– Provides procedures to draw inferences about a
population from a sample
[Figure: a sample drawn from a population]
◼ Hair color
– blonde, brown, red, black, etc.
◼ Opinion of students about riots
– ticked off, neutral, happy
◼ Smoking status
– smoker, non-smoker
Categorical data classified as Nominal,
Ordinal, and/or Binary
Categorical data: nominal data, ordinal data
Measurement data: discrete, continuous
Discrete Measurement Data
Only certain values are possible (there
are gaps between the possible values).
0 1 2 3 4 5 6 7
Continuous data -- Theoretically,
no gaps between possible values
0 1000
Discrete Measurement Data
Examples
◼ Number of pregnancies
◼ Number of students late for class
◼ Number of crimes reported
◼ Number of huts in a sampled rural home
◼ CD4 counts
The mean is $\bar{x} = 137.14$
Median
Median=132
Example 2. Median if n is even
Six men with high cholesterol participated in a study to
investigate the effects of diet on cholesterol level. At the
beginning of the study, their cholesterol levels (mg/dL)
were as follows:
366, 327, 274, 292, 274 and 230.
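Rearrange the data in numerical order as follows:
230, 274, 274, 292, 327, 366.
Since n = 6 is even, the median is the average of the two
middle values: (274 + 292)/2 = 283 mg/dL.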
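The even-n rule can be sketched in Python using the six cholesterol values above. This is only an illustration: the variable names are made up for the sketch, and the standard-library `statistics.median` call serves as a cross-check.

```python
import statistics

# Cholesterol levels (mg/dL) of the six men, as listed in Example 2
cholesterol = [366, 327, 274, 292, 274, 230]

# Step 1: put the observations in numerical order
ordered = sorted(cholesterol)
n = len(ordered)

# Step 2: n is even, so average the two middle observations
median_manual = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Cross-check against the standard library
median_lib = statistics.median(cholesterol)

print(ordered)
print(median_manual, median_lib)
```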
      Original   Altered
x1    87         87
x2    95         95
x3    98         98      (median is unchanged)
x4    101        101
x5    105.0      1050
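To see why the median is unchanged while the mean is not, here is a small Python sketch using the two columns of values from the table. The list names `original` and `altered` are labels chosen for this illustration, not terms from the lecture.

```python
import statistics

# The two columns of the table: only the last value differs
original = [87, 95, 98, 101, 105.0]
altered = [87, 95, 98, 101, 1050]

mean_original = statistics.mean(original)
mean_altered = statistics.mean(altered)
median_original = statistics.median(original)
median_altered = statistics.median(altered)

# The extreme value pulls the mean upward but leaves the middle value alone
print("mean:  ", mean_original, "->", mean_altered)
print("median:", median_original, "->", median_altered)
```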
Measures of Variation
◼ Summarize the dispersion of individual
values from some central value like the
mean
◼ Measures of dispersion characterise how
spread out the distribution is, i.e., how variable
the data are.
[Figure: individual observations (x) scattered around the mean]
Indices of Variation
◼ Commonly used measures of
dispersion include:
– Range
– Variance & standard deviation
– Inter-quartile range (IQR)
– Coefficient of Variation (or
relative standard deviation)
Range
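$R = (x_{\min}, x_{\max})$, the smallest and largest observed values;
often reported as the single difference $x_{\max} - x_{\min}$.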
Inter-quartile Range
◼ IQR = third quartile - first quartile
or, equivalently
IQR = Q3 - Q1
Q1 = lower quartile (has 25% of data
below and 75% above)
Q3 = upper quartile (has 75% of data
below and 25% above)
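As a rough illustration of the range and the IQR, the sketch below reuses the six cholesterol values from the median example. It relies on Python's `statistics.quantiles` with its default "exclusive" method, which is an assumption of this sketch. Several quartile conventions exist, and other software or hand methods may give somewhat different Q1 and Q3 for the same data.

```python
import statistics

# Cholesterol levels (mg/dL) from the median example
cholesterol = [366, 327, 274, 292, 274, 230]

# Range, reported here as the difference between largest and smallest value
data_range = max(cholesterol) - min(cholesterol)

# Quartiles: n=4 asks for the three cut points Q1, Q2 (median), Q3.
# The default method is "exclusive"; this choice is an assumption.
q1, q2, q3 = statistics.quantiles(cholesterol, n=4)

# IQR = Q3 - Q1
iqr = q3 - q1

print("range =", data_range)
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```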
IQR: Example
\[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n} \]
◼ Variance of a sample: usually subtract 1
from n in the denominator
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
where n − 1 is the effective sample size, also called the
degrees of freedom.
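The difference between dividing by n and dividing by n − 1 can be sketched in Python. The data are the six cholesterol values from the median example. The helper `sum_sq_dev` is a name invented for this sketch, and the `statistics` functions are used only to cross-check the hand formulas.

```python
import statistics

def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# Cholesterol levels (mg/dL) from the median example
cholesterol = [366, 327, 274, 292, 274, 230]
n = len(cholesterol)
ss = sum_sq_dev(cholesterol)

var_n = ss / n                  # denominator n
var_n_minus_1 = ss / (n - 1)    # denominator n - 1 (sample variance)

# Cross-check: pvariance divides by n, variance divides by n - 1
check_n = statistics.pvariance(cholesterol)
check_n_minus_1 = statistics.variance(cholesterol)

print(var_n, var_n_minus_1)
```

Dividing by n − 1 always gives the slightly larger value, and the gap shrinks as n grows.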
Standard deviation
◼ Problem with variance: its unit of measurement is
awkward, because the values are squared
◼ Solution: take the square root of the variance
=> standard deviation
◼ Sample standard deviation (s or sd)
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
What is a standard deviation?
◼ it is the typical (standard) difference
(deviation) of an observation from the mean
◼ think of it as the average distance a data
point is from the mean, although this is not
strictly true
Example
Data          Deviation     Deviation²
151            13.86          192.02
124           -13.14          172.73
132            -5.14           26.45
170            32.86         1079.59
146             8.86           78.45
124           -13.14          172.73
113           -24.14          582.88
Sum = 960.0   Sum = 0.00    Sum = 2304.86
$\bar{x} = 137.14$
Example (contd.)
\[ \sum_{i=1}^{7} (x_i - \bar{x})^2 = 2304.86 \]
Therefore,
\[ s = \sqrt{\frac{2304.86}{7 - 1}} = 19.6 \]
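The worked example can be reproduced step by step in Python. This is a minimal sketch using the seven data values from the table. The variable names are illustrative, and `statistics.stdev` is included only as a cross-check of the manual calculation.

```python
import math
import statistics

# The seven observations from the worked example
data = [151, 124, 132, 170, 146, 124, 113]
n = len(data)

mean = sum(data) / n                          # 960 / 7
deviations = [x - mean for x in data]         # these sum to zero (up to rounding)
ss = sum(d ** 2 for d in deviations)          # sum of squared deviations

s = math.sqrt(ss / (n - 1))                   # sample standard deviation

# Cross-check with the standard library (also uses n - 1)
s_lib = statistics.stdev(data)

print(round(mean, 2), round(ss, 2), round(s, 1))
```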
Standard deviation
◼ Caution must be exercised when using
standard deviation as a comparative index of
dispersion
Weights of newborn elephants (kg):
929, 853, 878, 939, 895, 972, 937, 841, 801, 826
n = 10, $\bar{x} = 887.1$, sd = 56.50

Weights of newborn mice (kg):
0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89
n = 10, $\bar{x} = 0.68$, sd = 0.255

cv_mice = 0.375
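The comparison can be sketched in Python by computing the coefficient of variation (sd divided by the mean) for both groups from the weights listed above. The helper name `coefficient_of_variation` is made up for this sketch. Because the code works with unrounded means and standard deviations, the mice value comes out near 0.378 rather than 0.375; the slide's figure appears to use the rounded values 0.255 / 0.68.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    return statistics.stdev(values) / statistics.mean(values)

# Weights (kg) as listed on the slide
elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]

sd_elephants = statistics.stdev(elephants)
sd_mice = statistics.stdev(mice)

cv_elephants = coefficient_of_variation(elephants)
cv_mice = coefficient_of_variation(mice)

# The elephants have the far larger sd, but the mice vary more
# relative to their own mean
print("sd:", round(sd_elephants, 2), round(sd_mice, 3))
print("cv:", round(cv_elephants, 3), round(cv_mice, 3))
```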