
Lecture 7 9

Uploaded by

Pelumi Isaac
Copyright
© © All Rights Reserved

Basic Probability Theory & Probability Distributions
(Lectures 7 – 9)
Rotimi F. Afolabi, PhD
Department of Epidemiology & Medical Statistics
College of Medicine
([email protected]/08066075196)
Objective

• To introduce basic concepts in probability theory and probability distributions

Probability theory & distributions_PSM2018
Probability – what/why?
• Probability is the likelihood or chance that something will happen
– words like "might", "almost" or "certainly" are used to indicate how likely or unlikely an event is
• We might want to do more with data than just describe it …
– by testing specific inferences about the behaviour of the data
– probability statements about the significance of our statistics are required
– probability is basic to statistical inference
Probability and decision making
• Have you ever had a difficult decision to make as a student, nurse, laboratory scientist, pharmacist, medical practitioner or an individual?
• Did you end up making the decision based on your intuition?
• Most times, the decisions we make are the wrong ones
• Sometimes decision making may be hard
• Generally, every person's life is affected daily by a series of "probability models" that profoundly affect his or her future
Three Approaches To Defining Probability
• Classical or A Priori Probability
– A probability which we quote from prior information
– e.g. tossing a coin: the outcome is either head or tail

• Empirical or Frequency concept
– A probability which we calculate based on information collected
– i.e., the proportion of times the event occurs in a long series of trials

• Subjective
– A probability based on our own judgment
– i.e. on our personal experience; degree of belief
Example
Type    Frequency
A       22
B       5
AB      2
O       21
Total   50
• P(E) = P(blood group A) = 1/4 = 0.25
– using the classical interpretation of probability (the four blood groups treated as equally likely)
• P(E) is approximately 22/50 = 0.44
– using the relative frequency interpretation of probability
• P(E) = P(blood group A) = ??
– using the subjective interpretation of probability
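The classical and relative-frequency figures in the blood group example can be reproduced with a few lines of Python. This is only a sketch: the name `blood_group_counts` is illustrative, and the classical value assumes the four groups are treated as equally likely.

```python
from fractions import Fraction

# Counts copied from the blood group table in the example.
blood_group_counts = {"A": 22, "B": 5, "AB": 2, "O": 21}
total = sum(blood_group_counts.values())

# Classical view (assumption): four groups treated as equally likely.
p_classical = Fraction(1, len(blood_group_counts))

# Relative-frequency view: observed proportion with group A.
p_empirical = Fraction(blood_group_counts["A"], total)

print(float(p_classical), float(p_empirical))
```

The two answers differ because the classical approach ignores the data, while the frequency approach depends entirely on it.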
Probability Terms & Definitions
• Experiment: any process which, when repeated, generates a set of results or observations
– In statistics, an experiment need not be laboratory based, e.g. balloting, throwing a coin
• Trial: a particular act of any experiment
• Outcome: the result of a trial
• Sample Space: totality of all possible outcomes of an experiment
• Equally Likely: all outcomes of the experiment have the same chance of occurrence
Events
• Subset of a sample space
– an outcome or a set of outcomes of an experiment defined by a given rule

• Example: Tossing a coin three times
– Event A = getting exactly two heads = {HTH, HHT, THH}
• Example: Tossing a fair die
– Event A = result is an even number = {2, 4, 6}
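The coin-tossing event can be checked by listing the whole sample space. The sketch below assumes a fair coin, so that all eight outcomes are equally likely; the variable names are illustrative.

```python
from itertools import product

# Sample space for three tosses of a coin: 2**3 = 8 outcomes.
sample_space = ["".join(t) for t in product("HT", repeat=3)]

# Event A: exactly two heads.
event_a = [o for o in sample_space if o.count("H") == 2]

# With equally likely outcomes (fair coin assumed), P(A) = |A| / |S|.
p_a = len(event_a) / len(sample_space)
print(sorted(event_a), p_a)
```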
Types of Event
• Simple event
– A particular result of an experiment that has only one outcome of interest
• Compound events
– More than one possible outcome of interest
• Mutually exclusive events
– Two events A and B are said to be mutually exclusive if they cannot occur together
– e.g., in a single birth, having a male child rules out having a female child
• Independent events
– The occurrence of one event does not affect the occurrence of the other event
– e.g., in a multiple birth, one child being male does not influence whether another child is female
Values of Probability
• It is non-negative
• P(E) = 0: the event cannot take place
• P(E) = 1: the event is sure to take place
• 0 ≤ P(E) ≤ 1
Probability Thermometer
(figure: probability scale running from 0.0 (low) through 0.5 (fair) to 1.0 (high))
Useful set notations
• The joint occurrence of two sets (the intersection) contains the common elements
– The intersection is the joint occurrence of A and B
– Notation: A and B, A ∩ B

• Two sets are mutually exclusive if the sets have no common elements

• The union of two sets, A and B, is another set which consists of all elements belonging to set A or to set B (some may belong to both sets)
– Notation: A or B, A ∪ B
Example of an Intersection of Events
• 40 patients (characteristic/event A)
• 30 individuals aged ≤ 30 years (characteristic/event B)
• Suppose we are told that there are 20 patients aged ≤ 30 years
• The intersection of sets A and B is the set of 20 patients aged ≤ 30 (A and B)
– The intersection is the joint occurrence of individuals who are patients and are aged ≤ 30 years
Example of Mutually Exclusive Events
• 80 300-level students (characteristic A)
• 50 200-level students (characteristic B)
• The intersection of the two sets, A and B, is empty because a student cannot belong to both class levels
– Sets A and B are mutually exclusive (i.e., the two class levels are mutually exclusive) because a student cannot belong to both class levels at the same time
Example of a Union of Events
• Recall the information in the example of intersection of events
– Sets A and B total 70
– But 20 individuals are common to both sets

• The union of sets A and B is the set of all individuals who are patients or aged ≤ 30
– The union of sets A and B is determined by adding up the totals of each set and then subtracting the number of individuals common to both sets
» Set A = 40
» + Set B = 30
» Total = 70
» – Common = 20
» Union = 50 unique individuals
– The union of these two sets (A or B) consists of the 50 unique individuals who have one or both characteristics (A alone or B alone or both A and B)
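The counting argument for the union can be mirrored with Python sets. The ID numbers below are hypothetical, chosen only so that the sizes match the example (40 in A, 30 in B, 20 in both).

```python
# Hypothetical ID numbers: 40 patients, 30 people aged <= 30, 20 in both.
patients = set(range(1, 41))            # IDs 1..40  (set A)
aged_30_or_less = set(range(21, 51))    # IDs 21..50 (set B)

both = patients & aged_30_or_less       # intersection, A and B
either = patients | aged_30_or_less     # union, A or B

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
union_size = len(patients) + len(aged_30_or_less) - len(both)
print(len(both), len(either), union_size)
```

Subtracting the overlap is what prevents the 20 shared individuals from being counted twice.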
Features of Probability with Mutually Exclusive Events
• When there are n mutually exclusive events (events that cannot occur together), the probability of any event is non-negative
– P(Event A) ≥ 0

• The sum of the probabilities of all n mutually exclusive events equals 1, provided the events are also exhaustive (together they cover every possible outcome)
– P(Event 1) + P(Event 2) + … + P(Event n) = 1

• If Event A and Event B are mutually exclusive events, then the probability of either Event A or Event B is the sum of the two probabilities
– P(Event A or Event B) = P(Event A) + P(Event B)
Addition Rule/Law of Probability
General rule:
• The probability that event A or event B occurs is:
– P(A or B) = P(A) + P(B) – P(A and B)
– P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Special case:
• The probability of occurrence of 2 or more mutually exclusive events is equal to the sum of their respective probabilities
• E.g. when two events, A and B, are mutually exclusive, then the probability that event A or event B occurs is:
– P(A or B) = P(A) + P(B)
– P(A ∪ B) = P(A) + P(B)
– since P(A and B) = 0 for mutually exclusive events
Types of Probability
• The joint probability of an event A and an event B is
– P(A ∩ B) = P(A and B)
– P(A and B) = 0, if events A and B are mutually exclusive
• The conditional probability of an event A given that an event B is present (has occurred) is
– P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
– P(A|B) = 0, if the two events are mutually exclusive
– P(A|B) = P(A), if the events are independent
• The marginal probabilities of an event A and an event B are
– P(A) and P(B), respectively
Multiplication Rule of Probability
General rule:
• Applies to the joint probability of 2 or more events
• For two events, say A and B:
P(A ∩ B) = P(A) × P(B|A) = P(B) × P(A|B)
Special case:
• The probability of occurrence of 2 or more independent events is equal to the product of their respective probabilities
• When events A and B are independent, then:
– P(A ∩ B) = P(A)P(B)
– P(A|B) = P(A)
– P(B|A) = P(B)
Example 1
• The notion of conditional probability can be found in many different types of problems
• E.g. a diagnostic test for a disease

          Disease +   Disease -   Total
Test +    30          10          40
Test -    10          50          60
Total     40          60          100

• What is the probability that a person has the disease?
Answer: 40/100 = 0.4
• What is the probability that a person has the disease given that they tested positive?
– I.e., P(A|B) = P(A and B) / P(B)
– ??
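One way to work through the conditional probability question is to compute each piece of P(A|B) = P(A and B) / P(B) from the table counts. The sketch below uses illustrative key names for the four cells; it is not the only way to lay out the calculation.

```python
# 2 x 2 counts from the diagnostic test table.
counts = {
    ("test+", "disease+"): 30, ("test+", "disease-"): 10,
    ("test-", "disease+"): 10, ("test-", "disease-"): 50,
}
n = sum(counts.values())

# Marginal probabilities and the joint probability.
p_disease = (counts[("test+", "disease+")] + counts[("test-", "disease+")]) / n
p_test_pos = (counts[("test+", "disease+")] + counts[("test+", "disease-")]) / n
p_both = counts[("test+", "disease+")] / n

# Conditional probability: P(disease+ | test+) = P(both) / P(test+)
p_disease_given_pos = p_both / p_test_pos
print(p_disease, round(p_disease_given_pos, 4))
```

Equivalently, the conditional probability restricts attention to the 40 people who tested positive and asks what fraction of them have the disease.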
Example 2: Probability of selecting a female or an older diabetic patient?

              Age
Gender   Younger   Older   Total
Male     30        20      50
Female   40        110     150
Total    70        130     200

Marginal probabilities can be calculated:
P(Female) = 150/200 = 0.75
P(Younger) = 70/200 = 0.35
P(Older) = 130/200 = 0.65
P(Male) = ?
Joint probability = P(Female and Older) = P(Female ∩ Older) = 110/200 = 0.55

Using the addition rule of probability:
P(Female or Older)
= P(Female) + P(Older) – P(Female and Older)
= 150/200 + 130/200 – 110/200 = 170/200 = 0.85
Exercises
• Obtain the probability of selecting a male or a younger diabetic patient
– ??

• Are the two characteristics, sex and age, independent?
– ??
– If sex and age are independent, then the probability of being in a particular age group should be the same for both sexes
– In other words, the conditional probabilities should be equal
– i.e., P(Older given Male) = P(Older given Female) = P(Older), or P(Older | Male) = P(Older | Female) = P(Older)
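The independence criterion stated above can be checked numerically against the gender-by-age table of Example 2. This is a descriptive comparison of sample proportions only, not a formal significance test, and the dictionary layout is an illustrative choice.

```python
# Counts from the Example 2 table (gender by age group).
table = {"Male": {"Younger": 30, "Older": 20},
         "Female": {"Younger": 40, "Older": 110}}
n = sum(sum(row.values()) for row in table.values())

# Marginal P(Older) and conditional P(Older | sex).
p_older = sum(row["Older"] for row in table.values()) / n
p_older_given = {sex: row["Older"] / sum(row.values())
                 for sex, row in table.items()}

# Independence would require each conditional probability to equal P(Older).
independent = all(abs(p - p_older) < 1e-9 for p in p_older_given.values())
print(p_older, p_older_given, independent)
```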
Exercise 2
Table 1: Age distribution of patients at a community health centre in Jan 2014

Age group   Frequency
15-19       24
20-24       46
25-29       50
30-34       36
35-39       44
40-44       52
45-49       48

• What is the probability that someone selected at random from this sample is
i. Aged 20-24 yrs
ii. Below 35 years
iii. Above 44 years
iv. Aged 15-19 and 35-39 years
v. Either 25-29 yrs or 45-49 yrs
vi. Aged 20-24 given 30-34 years
vii. Not aged 40-44 yrs
PROBABILITY DISTRIBUTIONS
Recap - Random Variables
• What is a random variable?
– a numerical outcome of a random process or random event
– an observable that takes on values with certain probabilities
– its values can vary with each repetition of an experiment
– it could be discrete or continuous
• Why do we need random variables?
– We use them as a model for our observed data
• Knowledge of the probability distribution of a variable allows us to come to conclusions about a population based on data taken from a sample of that population
What is Probability Distribution?
• Distribution of the probability of all possible outcomes of a random variable
• An arrangement of the events and their corresponding probabilities in a tabular form

• For instance, for families of four children in this class, what is the distribution of the number of girls?
• Explain with practical demonstration!

• Two types of probability distributions
– Discrete and continuous
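A theoretical version of the "number of girls in a family of four" distribution can be tabulated by listing every possible birth sequence. The sketch assumes each child is equally likely to be a girl or a boy, independently of the others; real class data would give an empirical distribution that need not match it.

```python
from itertools import product
from collections import Counter

# Assumption: each child is a girl (G) or boy (B) with probability 0.5,
# independently, so all 16 birth sequences are equally likely.
families = list(product("GB", repeat=4))
girls_count = Counter(seq.count("G") for seq in families)

# Probability distribution of the number of girls (0 to 4).
distribution = {k: girls_count[k] / len(families) for k in range(5)}
for k, p in distribution.items():
    print(k, p)
```

Each row pairs a value of the random variable with its probability, which is exactly the tabular arrangement described above.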
Discrete Probability Distributions
• This is the probability distribution of a random variable that can assume only a countable set of values, typically integers
I. Bernoulli Distribution
II. Binomial Distribution
III. Poisson Distribution
IV. Geometric Distribution
V. Hypergeometric Distribution
VI. Negative Binomial Distribution
Binomial Probability

• The total number of successes, X, is a binomial random variable with parameters n and p
– where n is a fixed number of trials
– each trial results in a “success” with probability p and
– a “failure” with probability (1-p)

• And the probability that X=x (i.e., that there are exactly x successes) is:

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$
Binomial Statistical Expression
If you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly x “successes” is

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$$

where
n = number of trials
x = number of successes out of n trials
p = probability of success
1-p = probability of failure
Factorial !

0! =1
1! =1
2! = 2x1
3! = 3x2x1
4! = 4x3x2x1
………..
Binomial Distribution Properties:
• Each trial has only two possible outcomes (dichotomous)
• Trials are repeated “n” times
• Successive trials are independent
• Probability of success is constant from trial to trial
• The random variable X is the number of successes in the n trials


Mean and Variance of X ~ Bin (n, p)

If X follows a binomial distribution with parameters n and p, i.e., X ~ Bin(n, p), then:

$$\mu_X = E(X) = np$$
$$\sigma_X^2 = \mathrm{Var}(X) = np(1-p)$$
$$\sigma_X = \mathrm{SD}(X) = \sqrt{np(1-p)}$$


Example 1
• Suppose that 30% of a certain population have blood group O. For a sample of 20 drawn from this population, find the probability that:
i. Exactly three persons have blood group O
{ans: P(X=3) = 0.0716}
ii. Three or more persons have blood group O
{ans: 1 – [P(X=0) + P(X=1) + P(X=2)] = 1 – 0.0355 = 0.9645}
iii. Exactly five persons have blood group O?
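The quoted answers for parts (i) and (ii) can be checked with a small helper that evaluates the binomial formula directly. The function name `binom_pmf` is an illustrative choice, not something from the lecture; `math.comb` is from the Python standard library.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.30  # sample size and proportion with blood group O

# (i) exactly three; (ii) three or more, via the complement of 0, 1 or 2.
p_exactly_3 = binom_pmf(3, n, p)
p_at_least_3 = 1 - sum(binom_pmf(k, n, p) for k in range(3))
print(round(p_exactly_3, 4), round(p_at_least_3, 4))
```

Part (iii) follows the same pattern with `binom_pmf(5, n, p)`.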


Example 2
i. Thirty individuals, each susceptible to tuberculosis, come in contact with a carrier of the disease. The probability that the disease will be passed from the carrier to any given subject is 0.10. How many are expected to contract the disease?
ii. If 10% of the population of students are colour blind, what is the probability that in a screening of 8 students, 3 or more are colour blind?
Solution
1. E(X) = np = 30 × 1/10 = 3
2. p = 10% = 1/10, q = 9/10, n = 8, x ≥ 3
P(X ≥ 3) = 1 – P(X < 3)
= 1 – [P(X = 0) + P(X = 1) + P(X = 2)]
= 1 – [8C0 (1/10)^0 (9/10)^(8-0) + 8C1 (1/10)^1 (9/10)^(8-1) + 8C2 (1/10)^2 (9/10)^(8-2)]
= …………………
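For readers who want to evaluate the expression numerically, the sketch below carries out the same two steps in Python. The helper `binom_pmf` is an illustrative name; the arithmetic follows the complement rule used in the solution.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Part 1: expected number infected among 30 contacts, p = 0.10.
expected_cases = 30 * 0.10

# Part 2: P(X >= 3) for n = 8 students, p = 0.10, via the complement.
p_three_or_more = 1 - sum(binom_pmf(k, 8, 0.10) for k in range(3))
print(expected_cases, round(p_three_or_more, 4))
```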
Exercise
• Suppose it is known that the probability of
recovery from a certain disease is 0.4.
• If 15 randomly selected people are affected
by the disease, what is the probability that
i. At least 3 will recover?
ii. Four or more will recover?
iii. Fewer than three will recover?



Poisson Distribution
Distribution of a discrete random variable X with parameter λ, the average number of occurrences of an event in a given space, time or volume
It arises from an experiment in which discrete events are observed in a continuous interval of time, space or volume such that
1. Occurrence of events is random (in space or time)
2. Probability of occurrence is very small (in a short interval it approaches zero)
3. Trial size is large and
4. Events are rare
Poisson distribution …
• Useful for rare events
• Poisson distribution is for counts
– if events happen at a constant rate over time, the Poisson distribution gives the probability of X events occurring in time T
• It is characterized by the parameter λ (the expected value of X)
– Mean = λ = Variance


Poisson Distribution – formula expression
• If X = # of new cases next month and X ~ Poisson(λ), then the probability that X = x (a particular count) is:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

where x = number of occurrences of some random event
e = 2.7183, a constant
λ is the mean number of occurrences
Example 1
• The average number of deaths from lung cancer in a certain population per year has been observed to be 12. If the number of deaths from the disease follows the Poisson distribution, what is the probability that during the current year
(i) 10 or fewer people will die of the disease?
(ii) There will be at least 3 deaths from lung cancer?
Solution: (ii)
λ = µ = 12
P(X ≥ 3) = 1 – {P(X=0) + P(X=1) + P(X=2)}
= .......
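Both parts of the lung cancer example can be evaluated by summing Poisson probabilities. The sketch below assumes λ = 12 as given; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

lam = 12  # mean number of deaths per year

# (i) P(X <= 10): add the probabilities for x = 0, 1, ..., 10.
p_10_or_fewer = sum(poisson_pmf(k, lam) for k in range(11))

# (ii) P(X >= 3) = 1 - {P(X=0) + P(X=1) + P(X=2)}.
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))
print(round(p_10_or_fewer, 4), round(p_at_least_3, 6))
```

Using the complement in part (ii) avoids summing the infinitely many terms for x = 3, 4, 5, and so on.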
Poisson Approximation to Binomial
• Approximation to the Binomial distribution when n is large and p is small
» Mean = np = λ
» Variance = npq ≈ λ (since q = 1 – p is close to 1)
• i.e. the Poisson can be applied when n (sample size) is large and p (probability of success) is very small relative to q
Example: Poisson Approximation to Binomial Distribution
The white blood cell count of a healthy individual can be as much as 6000 per cubic millimeter of blood (a drop). The probability of a white cell deficiency per cubic millimeter of blood is 0.001. If a drop of blood is taken from an individual, then:

Question: What is the probability of finding:
(i) no deficient white cell
(ii) one deficient white cell and
(iii) at least 2 deficient white blood cells
Solution
Discrete event of interest is the occurrence of a deficient white blood cell
Continuous interval is a drop of blood
X (number of deficient white cells) is a random variable with parameter
λ = np = 6000 × 0.001 = 6
If x = 0:

$$P(X = 0) = \frac{e^{-\lambda}\lambda^{0}}{0!} = \frac{e^{-6}\, 6^{0}}{0!} = e^{-6} = 0.002479$$
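To see how close the approximation is in this example, the Poisson values can be set beside the exact binomial ones. This is a sketch; the helper names are illustrative, and the binomial column treats the 6000 cells as independent trials with p = 0.001, as the example does.

```python
from math import comb, exp, factorial

n, p = 6000, 0.001
lam = n * p  # Poisson parameter, lambda = np = 6

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return lam**x * exp(-lam) / factorial(x)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Parts (i) and (ii): Poisson approximation next to the exact binomial.
for x in (0, 1):
    print(x, poisson_pmf(x, lam), binom_pmf(x, n, p))

# Part (iii): at least 2 = 1 - P(X=0) - P(X=1), Poisson version.
p_at_least_2 = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(p_at_least_2)
```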
Exercise 1
• Suppose that over a period of several years
the average number of deaths from a certain
disease has been 10.
• If the number of deaths from this disease
follows the Poisson distribution, what is the
probability that during the current year:
1. Exactly seven people will die from the disease?
2. There will be no deaths from the disease?
3. At least one person will die from the disease?
Exercise 2
• The average number of serious accidents per year in a drug manufacturing company is 5, and it is assumed that the occurrence of industrial accidents follows a Poisson distribution. What is the probability that in a given year there will be:
1. Fewer than 2 accidents
2. No accidents
3. Exactly seven accidents
4. Two or more accidents


Continuous Distribution
Continuous probability distributions
• Uniform distribution
• Exponential distribution
• Gamma distribution
• Normal distribution
• Chi-square distribution
• Student t-distribution
• Weibull distribution



The Normal (Gaussian) Distribution
• The normal distribution is also called the “Gaussian distribution” in honour of Carl Friedrich Gauss
• The normal distribution is a theoretical probability distribution that is perfectly symmetric about its mean (which is also its median and mode), and has a “bell”-like shape
– Reasonable description for many variables such as height, blood pressure, body temperature and haemoglobin level
– Such variables are symmetric, with scores more concentrated in the middle than in the tails
– It occupies a central role in statistical inference
– Sometimes described as bell shaped
Simple Normal Curve



Parameters of the normal distribution
• Normal distributions are uniquely defined by two quantities called parameters: a mean (μ) and a standard deviation (σ)
• Areas under a normal curve represent the proportion of total values described by the curve that fall in a given range
• The height of a normal distribution curve can be specified mathematically in terms of the two parameters:
– the mean (μ) and
– the standard deviation (σ)
– A random variable X having a normal distribution is usually written in the form: X ~ N(μ, σ²)
Normal Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty$$

• where μ is the mean,
σ is the standard deviation,
π is the constant 3.14159, and
e is the base of natural logarithms and is equal to 2.718282
• X can take on any value from -infinity to +infinity
Properties of the normal curve
• It is bell-shaped
• It is symmetrical about the mean value
• Determined by the mean and variance
• Its mean, median and mode are equal
• Total area under the curve is 1 (100%)
• Approximately 68% of observations lie within 1 SD (left and right) of the mean value
• Approximately 95% of observations lie within 1.96 SD (left and right) of the mean value
• Approximately 99% of observations lie within 2.576 SD (left and right) of the mean value
Demonstrating the Percentage Points of Normal Distribution
(figure: normal curve with the horizontal axis marked from -3 to +3 around the mean; bands mark 68%, 95% and 99% of the area)
68% of values are within 1 standard deviation of the mean

95% are within 1.96 standard deviations

99% are within 2.576 standard deviations (99.7% are within 3 standard deviations)
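These percentage points can be confirmed from the standard normal curve using the error function in Python's standard library, since P(-k < Z < k) = erf(k/√2). The function name `central_area` is an illustrative choice.

```python
from math import erf, sqrt

def central_area(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

# Area within k standard deviations of the mean.
for k in (1, 1.96, 2.576, 3):
    print(k, round(central_area(k), 4))
```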


Importance of the Normal distribution
• Fits many practical data distributions in biomedical sciences
• Binomial distribution can be approximated by a normal when n is large
• Sampling distributions of means and proportions are approximately normal when the sample size is large
• It is the cornerstone of many parametric tests of statistical significance
• It is the foundation of other distributions (e.g. Chi-square, F-distribution, t-distribution, etc.)
Standard Normal Distribution
• This is a normal distribution with mean = 0 and standard deviation = 1
– sometimes called the z distribution
(figure: standard normal curve centred at µ = 0 with σ = 1)


Standardizing normal distribution



Transforming to Standard Normal
• In fact, any normal distribution with any mean and standard deviation can be transformed to a standard normal curve
• This process is called standardizing or computing z-scores
• A z-score can be computed for any observation from any normal curve by:

$$z = \frac{x - \mu}{\sigma}$$

– x is the score from the original normal distribution
– μ is the mean of the original normal distribution, and
– σ is the standard deviation of the original normal distribution
Z-score Interpretation
• A z-score always reflects the number of standard deviations above or below the mean
– A z-score measures the distance of any observation from its distribution’s mean in units of standard deviation
• This z-score can help assess where the observations fall relative to the rest of the observations in the distribution
– We can use this score to determine areas under the curve

• For instance, if a student scored 80 on a test in which the mean is 50 and the standard deviation is 10, then the student scored 3 standard deviations above the mean, i.e.

$$z = \frac{x - \mu}{\sigma} = \frac{80 - 50}{10} = 3$$
The area under the normal distribution curve
• Proportion of the population that has values in some specified range,
or
• Probability that an individual observation will lie in the specified range
– Calculate with a calculator or computer, or look up in a table


Example 1
Given a normal distribution of male heights with µ = 171.5 cm and σ = 6.5 cm, what is the proportion of men taller than 180 cm?
(figure: normal curve with σ = 6.5, horizontal axis marked at 171.5 and 180)
Area in the upper tail of distribution
1. Z = (180 – 171.5) / 6.5 = 1.31
2. Now we need to find the area of the standard normal curve above 1.31
3. P(Z > 1.31) = 0.0951
4. Proportion of men taller than 180 cm is 9.5%
(figure: standard normal curve with σ = 1, horizontal axis marked at 0 and 1.31)
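Instead of a printed table, the upper-tail area can be computed with the complementary error function from Python's standard library. In this sketch the z-score is rounded to two decimal places to mimic a table lookup; `upper_tail` is an illustrative name.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))

mu, sigma, x = 171.5, 6.5, 180

# Standardize, rounding to 2 d.p. as when reading a z table.
z = round((x - mu) / sigma, 2)
print(z, round(upper_tail(z), 4))
```

Without the rounding step the area changes slightly in the fourth decimal place, which is why table-based and software answers can differ a little.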
Example 2
• Suppose the distribution of grades in your statistics class is normal, with mean = 83.4 and s = 7. There are 120 students in the class. If you score 97.4 in the class, approximately how many people have scores higher than you?

• Your standard score is z = (97.4 – 83.4)/7 = 2, and we know that about 2.3% of the population has a standard score greater than 2 (and therefore a higher exam score)
– There are 120 people in the class
– So about (0.023) × (120) = 2.76 ≈ 3 people have higher scores
Example 3
• The weights of 500 students are normally distributed with a mean of 62 kg and a standard deviation of 4 kg. How many students have weights:
i. More than 70 kg
• (prob = 0.02275 … about 11 students)
ii. Between 60 kg and 72 kg
• (prob = 0.68525 … about 343 students)
iii. Less than 65 kg
• (prob = 0.77337 … about 387 students)
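The three probabilities in Example 3 can be reproduced with a normal cumulative distribution function built from `math.erf`. This is a sketch; the helper name `normal_cdf` is illustrative, and the expected counts are simply each probability multiplied by 500.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma, n_students = 62, 4, 500

p_over_70 = 1 - normal_cdf(70, mu, sigma)
p_60_to_72 = normal_cdf(72, mu, sigma) - normal_cdf(60, mu, sigma)
p_under_65 = normal_cdf(65, mu, sigma)

# Probability and expected number of students for each part.
for label, prob in [(">70", p_over_70), ("60-72", p_60_to_72), ("<65", p_under_65)]:
    print(label, round(prob, 5), round(prob * n_students, 1))
```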
Exercise 1
• A medical centre employs a large number of clerical staff. Applicants for employment are timed carrying out a standard task which involves entering data into a computer. It was observed that the time each applicant takes is normally distributed with a mean of 340 seconds and a standard deviation of 80 seconds.
Determine:
I. The proportion of the applicants who took more than 420 seconds.
II. The percentage of the applicants who took between 240 and 420 seconds.
Exercise 2
i. For µ = 45, σ = 4, and x = 52, calculate the
percentage area of the normal curve below x.
ii. For µ = 33, σ = 5, and x = 45, calculate the
percentage area of the normal curve above x.
iii. For µ = 125, σ = 10, and x = 110, calculate the
percentage area of the normal curve below x.
iv. For µ = 125, σ = 10, x1 = 110 and x2=130, calculate
the percentage area of the normal curve
between x1 and x2.
