
Introduction Biostat

Uploaded by lailykurnia
Copyright © All Rights Reserved

Biostatistics (BIOL130024.01)
Professor Zewei Luo
Dr Chenqi Lu

Telephone: 021-55665269
E-mail: [email protected]
[email protected]
Office: 2309 Guanghua East Building
Major Teaching Components
• Course lectures
• Exercises with computer software (Minitab)
• Inter-course tests
• Final examination
Reference
• Text: Glover, T. & Mitchell, K. An Introduction to Biostatistics. Copyright © 2002 Waveland Press.
• Reference books
(1) Mather, K. (1973). Statistical Analysis in Biology. Chapman & Hall.
(2) Elandt-Johnson, R. C. (1971). Probability Models and Statistical Genetics. John Wiley & Sons.
Chapter 1. Introduction to Data Analysis

§1.1. The General Concept of the Scientific Method

Design and evaluation of experiments:
• Observation of a particular event
• Statement of the problem
• Formulation of a hypothesis
• Design of the experiment
• Making a prediction
Statistics: the subject that helps us design experiments and interpret them properly, covering the
• collection
• manipulation
• summarization
• analysis of experimental data
• use of the data to test scientific hypotheses

Biostatistics (Biometry): statistics applied in the biosciences


§ 1.2. Basic Concepts in Biostatistics
• Population vs. sample: a descriptive measure of a population is called a parameter; the corresponding descriptive measure of a sample is called a statistic.
• Variables and data types
1. Quantitative variables
(a) Continuous variables or interval data
(b) Discrete variables
2. Ranked (ordinal) variables
3. Categorical data
§ 1.3. Measures of Central Tendency
(1) Mean
• Population mean: if a population contains N entities whose measures are $x_1, x_2, \ldots, x_N$, the arithmetic mean is given by
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
• Sample mean: if a sample collected from a population contains n observations $x_1, x_2, \ldots, x_n$, the sample mean is given by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
An example demonstrating $\mu$ and $\bar{X}$
The population measures are 1, 6, 4, 5, 6, 3, 8, 7, with N = 8, so $\mu = 40/8 = 5$.
If a sample with n = 3 is randomly collected from the population, there are $\binom{8}{3} = 56$ possible samples. Of these samples,
• seven have a mean of 5, equal to the population mean;
• the rest have means that differ from the population mean.
The average of all 56 sample means is 5, the population mean. Hence the sample mean is an unbiased estimate of the mean of the population from which the sample is collected.
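You can check the sampling experiment above by brute force. The Python sketch below enumerates every sample of n = 3 drawn without replacement from the eight population values. It treats the two entities that both measure 6 as distinct, counts the samples whose mean equals $\mu$, and averages all the sample means. Exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction
from itertools import combinations

# Population measures from the example (N = 8)
population = [1, 6, 4, 5, 6, 3, 8, 7]
mu = Fraction(sum(population), len(population))  # population mean

# All samples of size n = 3; combinations() works on positions,
# so the two entities that both measure 6 are counted separately
samples = list(combinations(population, 3))
sample_means = [Fraction(sum(s), 3) for s in samples]

n_samples = len(sample_means)
n_equal_to_mu = sum(1 for m in sample_means if m == mu)
mean_of_means = sum(sample_means) / n_samples

print(f"mu = {mu}")                                        # mu = 5
print(f"number of samples = {n_samples}")                  # 56
print(f"samples with mean equal to mu = {n_equal_to_mu}")  # 7
print(f"average of sample means = {mean_of_means}")        # 5
```

The average equals $\mu$ by symmetry. Each population value appears in $\binom{7}{2} = 21$ of the 56 samples, so the sample totals sum to $21 \times 40 = 840$, and $840 / (56 \times 3) = 5$.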
(2) Median: the "middle" value of an ordered list of observations.
Depth of an observation (d): its position relative to the nearest extreme (end) when the data are listed in ascending order.
The population or sample median ($M$ or $\tilde{X}$) is defined as the observation whose depth is $d = (N+1)/2$ or $d = (n+1)/2$. In the example with n = 15 observations (in cm),
$$\tilde{X} = x_d = x_{(n+1)/2} = x_8 = 38 \text{ cm}$$
(3) Mode: the most frequently occurring observation in a data set. In the same example, the mode is 29 cm.
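The following is a minimal Python sketch of the depth rule for the median, plus the standard library's mode. The measurements in cm for the n = 15 example are not reproduced in this text, so the sketch uses the population from § 1.3 instead. The rule for an even n (averaging the two middle observations) is a common convention and an assumption here, since the slide only states the depth formula.

```python
from statistics import mode, multimode

def median_by_depth(data):
    """Median as the observation at depth d = (n + 1) / 2 of the sorted data.

    When n is even, d falls halfway between two observations and the
    median is taken as their average (a common convention).
    """
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data")
    d = (n + 1) / 2          # depth, counted from 1
    lo = int(d)              # depth of the observation at or just below d
    if d == lo:
        return x[lo - 1]     # shift the 1-based depth to a 0-based index
    return (x[lo - 1] + x[lo]) / 2

population = [1, 6, 4, 5, 6, 3, 8, 7]
print(median_by_depth(population))   # 5.5 (average of the 4th and 5th values)
print(mode(population))              # 6
print(multimode([1, 2, 2, 3, 3]))    # [2, 3]: a data set can have several modes
```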
• Data types and the corresponding measures of central tendency
1. Quantitative variables, (a) continuous or (b) discrete: mean
2. Ranked (ordinal) variables: median
3. Categorical data: mode


§ 1.4. Measures of Dispersion and Variability
(1) Range: the difference between the largest and smallest observations in a group of data
(2) Variance: the average of the squared deviations of the observations from their mean (the sample variance divides by n − 1 rather than n)
Population variance:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Sample variance:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$$
It is easy to show that $\sum_{i=1}^{n}(x_i - \bar{X}) = 0$ and
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \Big/ n\right]$$
(3) Standard deviation (s.d.): the square root of the variance, i.e. population s.d. $\sigma = \sqrt{\sigma^2}$ and sample s.d. $s = \sqrt{s^2}$
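This Python sketch computes the variance formulas for the population from § 1.3. It checks that the deviations sum to zero and that the definitional and computational sums of squares agree. It then divides by N (population) and by n − 1 (sample). Exact fractions keep the comparison free of rounding error.

```python
import math
from fractions import Fraction

data = [1, 6, 4, 5, 6, 3, 8, 7]
n = len(data)
mean = Fraction(sum(data), n)

# Deviations from the mean always sum to zero
assert sum(x - mean for x in data) == 0

# Sum of squared deviations, by definition and by the computational form
ss_definition = sum((x - mean) ** 2 for x in data)
ss_computational = sum(x * x for x in data) - Fraction(sum(data) ** 2, n)
assert ss_definition == ss_computational

pop_var = ss_definition / n           # sigma^2: treat the data as the whole population
sample_var = ss_definition / (n - 1)  # s^2: treat the data as a sample
print(pop_var, sample_var)            # 9/2 36/7
print(math.sqrt(pop_var), math.sqrt(sample_var))  # sigma and s
```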

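§ 1.5. Descriptive statistics for grouped data
When n observations are grouped into c classes, the class value $x_i$ occurs with frequency $f_i$ ($n = \sum_{i=1}^{c} f_i$). The mean and variance are then computed from frequency-weighted sums. In the quadrat example, $\sum f_i = 800$, $\sum f_i x_i = 857$ and $\sum f_i x_i^2 = 1805$:
$$\bar{X} = \frac{\sum_{i=1}^{c} f_i x_i}{\sum_{i=1}^{c} f_i} = \frac{857}{800} \approx 1.1 \text{ plants/quadrat}$$
$$s^2 = \frac{\sum_{i=1}^{c} f_i x_i^2 - \left(\sum_{i=1}^{c} f_i x_i\right)^2 / n}{n-1} = \frac{1805 - 857^2/800}{799} \approx 1.11\ (\text{plants/quadrat})^2$$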
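The grouped formulas need only the three totals. The Python sketch below reproduces the quadrat numbers from those totals. It also applies the same formulas to a small frequency table. That table is hypothetical, since the full quadrat table is not reproduced here.

```python
import math

def from_sums(n, sum_fx, sum_fx2):
    """Mean and sample variance from n = sum(f), sum(f x) and sum(f x^2)."""
    mean = sum_fx / n
    var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, var

def grouped_mean_var(values, freqs):
    """Mean and sample variance for c classes with values x_i and frequencies f_i."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return from_sums(n, sum_fx, sum_fx2)

# Totals from the quadrat example
mean, var = from_sums(800, 857, 1805)
print(f"mean = {mean:.2f} plants/quadrat")            # mean = 1.07 plants/quadrat
print(f"s^2 = {var:.2f}, s = {math.sqrt(var):.2f}")   # s^2 = 1.11, s = 1.05

# Hypothetical small table: 2 quadrats with 0 plants, 3 with 1, 1 with 2
print(grouped_mean_var([0, 1, 2], [2, 3, 1]))
```

The hypothetical table gives the same mean and variance as the ungrouped list 0, 0, 1, 1, 1, 2. That is the point of weighting each class value by its frequency.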
§ 1.6. Quartiles and Box Plots
A group of n observations (data points) arranged in ascending order:
$$x_1, x_2, \ldots, Q_1, \ldots, Q_2, \ldots, Q_3, \ldots, x_{n-1}, x_n$$
Here $Q_1$ is the first quartile (25th percentile) and $Q_2$ is the second quartile (50th percentile, i.e. the median $\tilde{X}$). $Q_3$ is the third quartile (75th percentile).
Inter-quartile range: $IQR = Q_3 - Q_1$
The five-number summary: $x_1, Q_1, Q_2\ (\text{or } \tilde{X}), Q_3, x_n$
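Box plot: a graphic presentation of the five-number summary.
(Figure: box plot of the weights of 15 lake trout caught in Geneva's Lake Trout Derby in 1994, marking $Q_1$, the median $\tilde{X}$, $Q_3$ and an outlier.)
The fences used to flag outliers are
$$f_3 = Q_3 + 1.5(IQR) = 6.545 \text{ lb}, \qquad f_1 = Q_1 - 1.5(IQR) = 0.535 \text{ lb}$$
Observations beyond the fences (above $f_3$ or below $f_1$) are plotted individually as outliers.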
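The following Python sketch computes the five-number summary, the IQR and the fences. It assumes that quartiles are taken from `statistics.quantiles` with its default "exclusive" method. Textbooks and packages use several slightly different quartile rules, so hand-computed $Q_1$ and $Q_3$ may differ a little. The data are hypothetical, because the lake trout weights are not listed here.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Five-number summary, IQR, fences (f1, f3) and points beyond the fences.

    Assumption: quartiles use statistics.quantiles' default 'exclusive'
    method; other quartile conventions can give slightly different values.
    """
    x = sorted(data)
    q1, q2, q3 = quantiles(x, n=4)   # Q1, median, Q3
    iqr = q3 - q1
    f1 = q1 - 1.5 * iqr              # lower fence
    f3 = q3 + 1.5 * iqr              # upper fence
    outliers = [v for v in x if v < f1 or v > f3]
    return (x[0], q1, q2, q3, x[-1]), iqr, (f1, f3), outliers

# Hypothetical data (not the lake trout weights)
summary, iqr, fences, outliers = box_plot_summary([1, 3, 4, 5, 6, 6, 7, 8, 20])
print(summary)    # (1, 3.5, 6.0, 7.5, 20)
print(iqr)        # 4.0
print(fences)     # (-2.5, 13.5)
print(outliers)   # [20]
```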


Chapter 3. Probability Distributions
Random variable (r.v.): a variable whose actual value is determined by chance operations, which are fully specified by a probability distribution.

Discrete r.v.: takes discrete values, e.g. X = −1, 0, 2, 5, …
Continuous r.v.: takes continuous values, e.g. $X \in \mathbb{R}$

§ 3.1. Probability distributions of discrete r.v.
The probability distribution, or probability density function $f(\cdot)$, of a discrete r.v. X is a real function giving the probability that X takes the value x, i.e.
$$f(x) = P(X = x), \qquad f(x) \ge 0, \qquad \sum_{\text{all } x} f(x) = 1$$
where f is defined for all possible values of X.
In Example 3.1 (a single roll of a fair die), every value x = 1, 2, …, 6 that X may take has the same probability, $f(x) = 1/6$. Such a distribution is referred to as a uniform distribution.

In Example 3.2 (the sum of two fair dice), the uniformity no longer holds. Graphically, $f(x)$ rises from $1/36$ at x = 2 to $6/36$ at x = 7 and falls back to $1/36$ at x = 12.

As the number of rolls approaches infinity, the average of the outcomes $x_1, x_2, \ldots$ of r.v. X approaches the expected value of X:
$$\mu = E(X) = \sum_{\text{all } x} x f(x)$$
In Example 3.1, $\mu = E(X) = \sum_{x=1}^{6} x f(x) = (1 + 2 + \cdots + 6)/6 = 3.5$.
In Example 3.2, $\mu = E(X) = \sum_{x=2}^{12} x f(x) = (1 \times 2 + 2 \times 3 + \cdots + 1 \times 12)/36 = 7$.
In general, if H(X) is a function of r.v. X with probability distribution f(x), then
$$E[H(X)] = \sum_{\text{all } x} H(x) f(x)$$

Comparing Example 3.1 ($X_1$), Example 3.2 ($X_1 + X_2$, with $X_1$ and $X_2$ i.i.d.) and Example 3.3 ($2X_1$):
$$E(X_1 + X_2) = E(X_1) + E(X_2) = 2E(X_1) = 2 \times 3.5 = 7$$
$$E(2X_1) = 2E(X_1) = 2 \times 3.5 = 7$$
In the infinite die-rolling experiment, we also explore the variation of the outcomes:
$$\mathrm{Var}(X) = \sigma^2 = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \sum_{\text{all } x}(x - \mu)^2 f(x) = E[(X - \mu)^2] = E(X^2) - \mu^2 = E(X^2) - [E(X)]^2$$
In Example 3.1, $E(X^2) = 15.167$ and $E(X) = 3.5$, i.e. $\mathrm{Var}(X_1) = 15.167 - 3.5^2 = 2.917$.
In Example 3.2, $\mathrm{Var}(X_1 + X_2) = 5.833 = 2\,\mathrm{Var}(X_1)$.
In Example 3.3, $\mathrm{Var}(2X_1) = 11.667 = 4\,\mathrm{Var}(X_1)$.
So, although $E(X_1 + X_2) = E(2X_1)$, we have $\mathrm{Var}(X_1 + X_2) \neq \mathrm{Var}(2X_1)$.
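The die examples can be reproduced exactly. This Python sketch stores each discrete distribution as a dictionary `{x: f(x)}` of fractions and applies $E[H(X)] = \sum H(x) f(x)$ directly. The helper names `expectation` and `variance` are illustrative, not taken from the course material.

```python
from fractions import Fraction
from itertools import product

def expectation(dist, h=lambda x: x):
    """E[H(X)] = sum over all x of H(x) f(x) for a distribution {x: f(x)}."""
    return sum(h(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: x * x) - mu * mu

# Example 3.1: one fair die, f(x) = 1/6 for x = 1..6
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Example 3.2: X1 + X2 for two independent dice
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

# Example 3.3: 2 X1
double = {2 * x: p for x, p in die.items()}

for name, dist in [("X1", die), ("X1 + X2", two_dice), ("2 X1", double)]:
    print(name, expectation(dist), variance(dist))
# X1 7/2 35/12
# X1 + X2 7 35/6
# 2 X1 7 35/3
```

The exact values are 35/12 ≈ 2.917, 35/6 ≈ 5.833 and 35/3 ≈ 11.667. They confirm that both sums have mean 7 but different variances.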
§ 3.2. Binomial Distribution
Let $X = \sum_{i=1}^{n} X_i$ be a discrete r.v., where the $X_i$ are discrete r.v.s that are independent of each other and have the same probability distribution (i.i.d.):
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
Then X follows a binomial distribution with parameters n and p, written B(n, p). Its pdf has the form
$$f(x) = P(X = x) = P\left(\sum_{i=1}^{n} X_i = x\right) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n$$
It is easy to demonstrate that
$$\mu = E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np$$
$$\sigma^2 = \mathrm{Var}(X) = \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = \sum_{i=1}^{n} p(1-p) = np(1-p)$$
It is important to notice that a binomial r.v. is the sum of the outcomes of a series of independent 0-1 trials.

Example 3.8. The number of males among 5 children: p = 0.5 and n = 5, so
$$\mu = np = 2.5 \quad \text{and} \quad \sigma^2 = np(1-p) = 1.25$$
This binomial distribution is symmetric about the mean.
(Figure: the probability distributions of B(10, 0.25) and B(20, 0.25).)
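The following Python sketch evaluates the binomial pdf for Example 3.8 with `math.comb`. It then recovers $\mu = np$ and $\sigma^2 = np(1-p)$ by direct summation over x = 0, …, n.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 3.8: number of males among 5 children, p = 0.5
n, p = 5, 0.5
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
print(pmf)         # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(mean, var)   # 2.5 1.25
```

With p = 0.5 the list reads the same forwards and backwards, which is the symmetry about the mean noted above. With p = 0.25, as in B(10, 0.25), it does not.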
In some applications, we are interested in calculating
$$P(X \le \xi) = \sum_{x \le \xi} f(x) = F_X(\xi)$$
which is referred to as the cumulative distribution function (CDF) of r.v. X. The CDF satisfies:
$F(x) \ge 0$ for any x;
F(x) is monotonically non-decreasing; and
$F(\infty) = 1.0$.
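For a discrete r.v., the CDF is just a running sum of the pdf. The Python sketch below builds $F(x)$ for B(10, 0.25) with `itertools.accumulate`. It illustrates the three properties listed above.

```python
from itertools import accumulate
from math import comb

def binom_cdf_table(n, p):
    """F(x) = P(X <= x) for x = 0..n, built as a running sum of the pmf."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    return list(accumulate(pmf))

F = binom_cdf_table(10, 0.25)
# F never decreases, starts at P(X = 0) >= 0 and reaches 1 at x = n
print([round(v, 4) for v in F])
```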
§ 3.3. Poisson Distribution
A discrete r.v. X counts the number of occurrences of a rare event in an interval of time or space. Occurrences of the event are independently distributed across the time interval (or space).

The pdf of X is specified by a parameter $\lambda$ and given by
$$f(x) = P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$
It is easy to show that $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$.
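The Python sketch below checks $E(X) = \mathrm{Var}(X) = \lambda$ numerically. It sums the Poisson pdf over a truncated support. Truncating at x = 60 is an assumption that only suits small $\lambda$, where the remaining tail probability is negligible.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0
# Truncate the infinite support; for lam = 2 the tail beyond x = 60 is negligible
xs = range(61)
probs = [poisson_pmf(x, lam) for x in xs]
mean = sum(x * f for x, f in zip(xs, probs))
var = sum((x - mean) ** 2 * f for x, f in zip(xs, probs))
print(round(sum(probs), 10), round(mean, 10), round(var, 10))  # 1.0 2.0 2.0
```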

Examples of Poisson distribution


• Accidents on a pedestrian crossing per week
• Children with meningitis in a family
• Individuals of a rare species in a quadrat
• Bacterial cells in a very dilute liquid culture
• Chiasmata between two genes on a chromosome
Use of Poisson distribution
• As with the binomial distribution, events are assumed to be independent of each other.
• So, the Poisson distribution can be used to test for independence of events.
• For example, a rare species could be distributed at random over a site (independent) or clumped in certain areas (non-independent).
Possible spatial distributions (figure): random, clumped, uniform.
Effect of $\lambda$ on the Poisson distribution
(Figure: probability f(x) plotted against x for $\lambda$ = 0.5, 2.0 and 5.0.)

As $\lambda$ increases, the distribution tends towards a normal distribution.


Poisson Approximation to Binomial Distribution

For X ~ B(n, p) with n ≥ 100 and np ≤ 10, X is approximately P($\lambda$) with $\lambda = np$.

(Figure: B(20, 0.05) compared with P(1.0), where $\lambda = np = 1.0$.)
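The approximation can be quantified by the largest gap between the two pdfs. This Python sketch computes that gap for the B(20, 0.05) vs P(1.0) case shown above. It then repeats the comparison for a larger n with the same $\lambda = np$. The Poisson pdf is evaluated on the log scale with `math.lgamma`, so that x! does not overflow a float for large x.

```python
from math import comb, exp, lgamma, log

def binom_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # log scale: log f(x) = -lam + x log(lam) - log(x!), with log(x!) = lgamma(x + 1)
    return exp(-lam + x * log(lam) - lgamma(x + 1))

def max_abs_difference(n, p):
    """Largest |P(X = x)| gap between B(n, p) and P(np) over x = 0..n."""
    lam = n * p
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(n + 1))

d20 = max_abs_difference(20, 0.05)     # the B(20, 0.05) vs P(1.0) case
d200 = max_abs_difference(200, 0.005)  # same np = 1.0, larger n
print(round(d20, 4), round(d200, 4))
```

With np held at 1.0, the gap shrinks as n grows. This is consistent with the rule of thumb favouring large n and small p.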
§ 3.4. Normal Distribution
If a r.v. X has pdf
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/2\sigma^2}$$
then its distribution is referred to as a normal (Gaussian) distribution, with
(a) $f(x) > 0$ for $x \in (-\infty, \infty)$
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(c) $P(X \le x) = F(x) = \int_{-\infty}^{x} f(y)\,dy$
(d) $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$
The distribution is fully characterized by the two parameters $\mu$ and $\sigma^2$:
$$P(\mu - \sigma \le X \le \mu + \sigma) = \int_{\mu-\sigma}^{\mu+\sigma} f(x)\,dx \approx 68\%$$
$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = \int_{\mu-2\sigma}^{\mu+2\sigma} f(x)\,dx \approx 95\%$$
$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) = \int_{\mu-3\sigma}^{\mu+3\sigma} f(x)\,dx \approx 99.7\%$$

The normal distribution with $\mu = 0$ and $\sigma^2 = 1.0$ is referred to as the standard normal distribution N(0, 1). If r.v. X ~ N($\mu$, $\sigma^2$), then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
This unifies the evaluation of probabilities for any normal distribution (refer to Table C.3 or Minitab).
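Python's standard library can stand in for Table C.3. The sketch below uses `statistics.NormalDist`, which takes the standard deviation $\sigma$ rather than the variance. It reproduces the 68/95/99.7% coverage figures and shows that standardizing to Z gives the same probability. The parameters $\mu = 10$, $\sigma = 2$ are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)   # standard normal N(0, 1)

# Probability within k standard deviations of the mean, for any normal r.v.
for k in (1, 2, 3):
    print(k, round(Z.cdf(k) - Z.cdf(-k), 4))   # 0.6827, 0.9545, 0.9973

# Standardizing: if X ~ N(mu, sigma^2) then Z = (X - mu) / sigma ~ N(0, 1)
mu, sigma = 10.0, 2.0          # hypothetical parameters
x = 13.0
z = (x - mu) / sigma           # 1.5
print(NormalDist(mu, sigma).cdf(x), Z.cdf(z))  # same value: P(X <= 13) = P(Z <= 1.5)
```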
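Normal approximation to Binomial distribution
A r.v. X ~ B(n, p) has mean $\mu = np$ and variance $\sigma^2 = np(1-p)$. If np(1 − p) > 3, then X is approximately N($\mu$, $\sigma^2$).

But it should be noted that the approximation uses a continuity correction of 0.5:
$$F_B(x) = P(X \le x) = \sum_{i=0}^{x}\binom{n}{i} p^i (1-p)^{n-i} \approx F_N\left(\frac{x + 0.5 - np}{\sqrt{np(1-p)}}\right)$$
where $F_N$ is the CDF of the standard normal distribution N(0, 1).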
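Comparison of continuous and discrete data:
• Continuous: the pdf f(x) is a density, not a probability. The probability of a short interval of width Δx is approximately f(x)Δx, and the CDF is $F(x) = \int_{-\infty}^{x} f(y)\,dy$.
• Discrete: the pdf f(x) is a probability. Each (integer) value corresponds to an interval of width 1 with probability f(x), and the CDF is $F(x) = \sum_{y \le x} f(y)$.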
