Probability
NORMAL
DISTRIBUTION
INDUCTIVE
STATISTICS
PROBABILITY
RANDOM CIRCUMSTANCES
A random circumstance is one in which
the outcome is unpredictable.
Example: Disease Status
You have the disease
You do not have the disease
A = event
N = number of exhaustive (all possible) events
n = number of events favorable to A
P(A) = n/N
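The ratio n/N above can be sketched in a few lines of Python. This is only an illustration under the assumption that all N events are equally likely; the function name `classical_probability` and the balanced-die example are hypothetical choices, not part of the original slides.

```python
from fractions import Fraction


def classical_probability(favorable: int, exhaustive: int) -> Fraction:
    """Return P(A) = n / N, assuming all N events are equally likely."""
    if exhaustive <= 0 or not 0 <= favorable <= exhaustive:
        raise ValueError("need N > 0 and 0 <= n <= N")
    return Fraction(favorable, exhaustive)


# Illustrative case: one face of a balanced six-sided die shows a 2.
p_two = classical_probability(1, 6)
print(p_two)  # 1/6
```

Exact fractions are used so the result reads the same way as the hand calculation.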
Conditions for Probabilities
for Random Variables
Condition 1
The sum of the probabilities over all possible
values of a discrete random variable must equal
1.
Condition 2
The probability of any specific outcome for a
discrete random variable must be between 0 and
1.
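The two conditions can be checked mechanically. The sketch below is a minimal illustration; the helper name `is_valid_distribution` is hypothetical, and the floating-point tolerance is whatever `math.isclose` uses by default.

```python
import math


def is_valid_distribution(probabilities: list[float]) -> bool:
    """Check both conditions for a discrete probability distribution."""
    # Condition 2: every individual probability lies between 0 and 1.
    each_in_range = all(0.0 <= p <= 1.0 for p in probabilities)
    # Condition 1: the probabilities sum to 1 (within rounding tolerance).
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return each_in_range and sums_to_one


print(is_valid_distribution([1 / 8, 3 / 8, 3 / 8, 1 / 8]))  # True
print(is_valid_distribution([0.5, 0.6]))                    # False
```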
PROBABILITY DEFINITIONS
AND RELATIONSHIPS
Sample space: collection of unique, nonoverlapping
possible outcomes of a random circumstance.
Simple event: one outcome in the sample space; a
possible outcome of a random circumstance.
Event: a collection of one or more simple events in
the sample space; often written as
A, B, C, and so on.
Equally Likely Simple Events
If there are k simple events in the sample space
and they are all equally likely, then the probability
of the occurrence of each one is 1/k.
EXAMPLE: HOW MANY GIRLS ARE LIKELY?
A family has 3 children. The probability of a girl is 1/2.
What are the probabilities of having 0, 1, 2, or 3 girls?
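One way to answer this question is to list every possible birth order and count. The sketch below assumes, for illustration, that each child is a girl with probability 1/2 independently of the others, so that all 8 birth orders are equally likely; that assumption is ours and should be adjusted if a different probability is intended.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: girl (G) and boy (B) are equally likely and births are
# independent, so the 8 birth orders below are equally likely.
outcomes = list(product("GB", repeat=3))
girl_counts = Counter(order.count("G") for order in outcomes)

probabilities = {
    k: Fraction(girl_counts[k], len(outcomes)) for k in range(4)
}

for k, p in probabilities.items():
    print(f"P({k} girls) = {p}")
```

Under these assumptions the four probabilities are 1/8, 3/8, 3/8, and 1/8.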
EXAMPLE:
Rolling a die.
The chance of rolling a 2 is 1/6, because there is
a 2 on one face and a total of 6 faces.
So, assuming the die is balanced, a 2 will come
up 1 time in 6.
Example: Probability of Simple Events
Random Circumstance:
A three-digit winning lottery number is selected.
Sample Space: {000,001,002,003, . . . ,997,998,999}.
There are 1000 simple events.
Probability of a Simple Event: Assuming all three-digit numbers
are equally likely, the probability that any specific
three-digit number is the winner is 1/1000.
Probability That Either of Two Events
Happens ("or" rule / addition theorem)
= 1/80 × 1/10
= 1/800
NOT RULE
The chance of an event not happening is 1
minus the chance of it happening.
For example, the chance of not getting a 2 on a
die is 1 - 1/6 = 5/6.
This rule can be very useful. Sometimes
complicated problems are greatly simplified by
examining them backwards.
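A small sketch of "examining a problem backwards": instead of adding up every way to roll at least one 2, compute the chance of rolling no 2 at all and subtract it from 1. The number of rolls (4) is a hypothetical value chosen for illustration, and the rolls are assumed to be independent.

```python
from fractions import Fraction

p_two = Fraction(1, 6)
p_not_two = 1 - p_two  # NOT rule: 1 - 1/6 = 5/6

rolls = 4  # hypothetical number of independent rolls
p_no_two_in_any_roll = p_not_two ** rolls
p_at_least_one_two = 1 - p_no_two_in_any_roll  # NOT rule applied again

print(p_not_two)           # 5/6
print(p_at_least_one_two)  # 671/1296
```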
Possible Outcomes of Coin Flipped 3 times
HHH HHT THH HTH HTT THT TTH TTT
P(2H, 1T) = 3/8
Addition rule: P(2H, 1T) = P(HHT or HTH or THH)
= P(HHT) + P(HTH) + P(THH)
= 1/8 + 1/8 + 1/8 = 3/8
Multiplication rule: P(2H, 1T)
= P(H)·P(H)·P(T) + P(H)·P(T)·P(H) + P(T)·P(H)·P(H)
= (1/2)(1/2)(1/2) + (1/2)(1/2)(1/2) + (1/2)(1/2)(1/2) = 3/8
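The same calculation can be reproduced by enumeration. This is a minimal sketch assuming a fair coin and independent flips; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 sequences of three flips.
sequences = ["".join(flips) for flips in product("HT", repeat=3)]

# Multiplication rule: each specific sequence has probability 1/2 * 1/2 * 1/2.
p_sequence = Fraction(1, 2) ** 3

# Addition rule: sum over the mutually exclusive sequences with exactly 2 heads.
two_heads = [s for s in sequences if s.count("H") == 2]
p_two_heads = sum(p_sequence for _ in two_heads)

print(two_heads)    # ['HHT', 'HTH', 'THH']
print(p_two_heads)  # 3/8
```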
BINOMIAL RANDOM VARIABLES
A binomial random variable results from a binomial experiment.
BINOMIAL PROBABILITY
DISTRIBUTION
Example: Flip a coin 3 times. The possible outcomes are listed below (call
heads = hits; tails = misses).

Possible Outcomes of Coin Flipped 3 Times
Outcome   No. Hits (x)
HHH       3
HHT       2
THH       2
HTH       2
HTT       1
THT       1
TTH       1
TTT       0

Frequency Distribution of Data
x   f (frequency)
3   1
2   3
1   3
0   1
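The frequencies 1, 3, 3, 1 are the binomial coefficients for n = 3. The sketch below uses the standard binomial formula, P(X = x) = C(n, x) p^x (1 - p)^(n - x), which is not written out in the slides; the function name `binomial_pmf` and the assumption of a fair coin (p = 1/2) are ours.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x: int, n: int, p: Fraction) -> Fraction:
    """P(X = x) for n independent trials, each a hit with probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 3, Fraction(1, 2)  # three flips of a coin assumed to be fair
for x in range(n, -1, -1):
    # Columns: number of hits, frequency among the 8 outcomes, probability.
    print(x, comb(n, x), binomial_pmf(x, n, p))
```

Dividing each frequency by the 8 equally likely outcomes gives the same probabilities as the formula.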
LARGER FAMILIES: BINOMIAL
DISTRIBUTION
The binomial distribution is a shortcut method based on the
expansion of the equation

(p + q)^n = 1

where p = probability of one event (say, a normal child),
q = probability of the alternative event (mutant child), and
n is the number of children in the family.
Since p + q = 1, and 1 raised to any power (multiplied by itself)
is always equal to 1, this equation describes the probability of
any size family.
BINOMIAL FOR A FAMILY OF 2
The expansion of the binomial for n = 2 is:

p^2 + 2pq + q^2 = 1

The 3 terms represent the 3 different kinds of families:
p^2 is the families with 2 normal children, 2pq is the
families with 1 normal and 1 mutant child, and q^2 is
the families with 2 mutant children.
As before, p = 3/4 and q = 1/4.
Chance of 2 normal children = p^2 = (3/4)^2 = 9/16.
Chance of 1 normal plus 1 mutant = 2pq = 2 × 3/4 ×
1/4 = 6/16 = 3/8.
BINOMIAL FOR A FAMILY OF 3
p^3 + 3p^2q + 3pq^2 + q^3 = 1

Here, p^3 is a family of 3 normal children, 3p^2q is 2 normal plus
1 affected, 3pq^2 is 1 normal plus 2 affected, and q^3 is 3 affected.
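The terms of the expansion can be computed directly for any family size. The following is a sketch using p = 3/4 and q = 1/4 as given in the text; the helper name `expansion_terms` is hypothetical.

```python
from fractions import Fraction
from math import comb


def expansion_terms(n: int, p: Fraction) -> list[Fraction]:
    """Terms of (p + q)^n, ordered from n normal children down to 0."""
    q = 1 - p
    return [comb(n, k) * p**k * q ** (n - k) for k in range(n, -1, -1)]


p = Fraction(3, 4)  # probability of a normal child, as in the text
for n in (2, 3):
    terms = expansion_terms(n, p)
    # The terms always sum to 1 because p + q = 1.
    print(n, [str(t) for t in terms], sum(terms))
```

For n = 2 this gives 9/16, 3/8, and 1/16; for n = 3 it gives 27/64, 27/64, 9/64, and 1/64.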
LARGER FAMILIES
(p + q)^n = 1
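[Figure: normal curve with the horizontal axis marked at x - 3s, x - 2s, x - s, x, x + s, x + 2s, and x + 3s. 68% of values lie within 1 standard deviation of the mean and 95% within 2 standard deviations. Area by band, from the mean outward on each side: 34%, 13.5%, 2.4%, 0.1%.]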
Probability of calculated values from
tables
The probability associated with a calculated value is
determined by referring to the respective statistical table.
APPLICATIONS
1. Determine sensitivity & specificity of a
diagnostic test
2. Determine chance of success or failure of a
specific treatment
3. Solve most transmission problems
4. Determine the effect of a certain exposure on
a disease outcome
5. Study the survival patterns of two or more
groups of patients receiving different
treatments.
NORMAL RANDOM VARIABLES
If a population of measurements follows a normal
curve, and if X is the measurement for a randomly
selected individual from that population, then
• X is said to be a normal random variable
• X is also said to have a normal distribution
Any normal random variable can be completely
characterized by its mean, µ, and standard deviation, σ.
NORMAL DISTRIBUTIONS (OF NORMAL
RANDOM VARIABLES)
THE NORMAL CURVE
NB: position of measures of central tendency
50% of scores fall below the mean
[Figure: normal curve (vertical axis f) with the mean, median, and mode marked at the same central point]
EXAMPLE NORMAL CURVES (HEIGHTS; MALES; FEMALES)
Women: µ = 63.6, σ = 2.5
Men: µ = 69.0, σ = 2.8
[Figure: two normal curves centered at 63.6 and 69.0; horizontal axis: Height (inches)]
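Because a normal curve is fully described by its mean and standard deviation, the two height curves can be compared numerically. The sketch below uses `statistics.NormalDist` from the Python standard library and takes 2.5 and 2.8 to be the standard deviations for women and men; the two questions asked are hypothetical illustrations.

```python
from statistics import NormalDist

# Assumption: 2.5 and 2.8 are the standard deviations (inches).
women = NormalDist(mu=63.6, sigma=2.5)
men = NormalDist(mu=69.0, sigma=2.8)

# Proportion of women taller than the mean height of men (69.0 inches).
p_women_above_69 = 1 - women.cdf(69.0)

# Proportion of men shorter than the mean height of women (63.6 inches).
p_men_below_63_6 = men.cdf(63.6)

print(round(p_women_above_69, 4), round(p_men_below_63_6, 4))
```

Both proportions are small (a few percent), which matches the limited overlap of the two curves.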
Normal Curve Characteristics
• The curve is bell-shaped and symmetrical.
• The mean, median, and mode are all equal.
• The highest frequency is in the middle of the curve.
• The frequency gradually tapers off as the scores
approach the ends of the curve.
• The curve approaches, but never meets, the abscissa
at both high and low ends.
The Empirical Rule
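Standard Normal Distribution: µ = 0 and σ = 1
68% of data are within 1 standard deviation of the mean
95% of data are within 2 standard deviations of the mean
99.7% of data are within 3 standard deviations of the mean
[Figure: normal curve with the horizontal axis marked at x - 3s, x - 2s, x - s, x, x + s, x + 2s, and x + 3s. Area by band, from the mean outward on each side: 34%, 13.5%, 2.4%, 0.1%.]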
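The 68%, 95%, and 99.7% figures are rounded values that can be recovered from the standard normal curve. This is a minimal check using `statistics.NormalDist`; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Proportion of a normal population within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))  # 68.3, 95.4, 99.7
```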
POSITIVELY SKEWED DISTRIBUTION
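[Figure: positively skewed curve with the mode, median, and mean marked; the long tail extends to the right, with mode < median < mean]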
OTHER DISTRIBUTIONS
Bimodal
Data shows 2 peaks
Multimodal
More than 2 peaks
BIMODAL
z = (x - x̄) / sd
where x is a score, x̄ is the mean, and sd is the standard deviation.
A positive z score implies a value above the mean.
A negative z score implies a value below the mean.
INTERPRETING Z SCORES
Mean = 70, SD = 6
Then a score of 82 is 2 sd [(82 - 70)/6] above the mean, or 82 = Z score of 2.
Similarly, a score of 64 = a Z score of -1.
By using Z scores, we can standardize a set of scores to a scale that is more intuitive.
Many IQ tests and aptitude tests do this, setting a mean of 100 and an SD of 15, for example.
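The worked values above can be reproduced with a short function. This is a sketch; the name `z_score` is hypothetical and the inputs are the mean of 70 and SD of 6 from the example.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


mean, sd = 70, 6  # values from the example in the text
for score in (82, 64):
    print(score, z_score(score, mean, sd))  # 82 -> 2.0, 64 -> -1.0
```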
REMEMBER:
A Z score reflects position in a normal
distribution
The Normal Distribution has been plotted out
such that we know what proportion of the
distribution occurs above or below any point
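In place of a printed table, the proportion below or above a given Z score can be looked up programmatically. The sketch below relies on `statistics.NormalDist` from the Python standard library; the Z score of 2.0 is an illustrative input.

```python
from statistics import NormalDist


def proportion_below(z: float) -> float:
    """Proportion of a normal distribution lying below the given Z score."""
    return NormalDist().cdf(z)


z = 2.0  # illustrative Z score
below = proportion_below(z)
above = 1 - below
print(round(below, 4), round(above, 4))
```

For a Z score of 2, roughly 97.7% of the distribution lies below and 2.3% above.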
IMPORTANCE OF DISTRIBUTION
Given the mean, the standard deviation, and
some reasonable expectation of normal
distribution, we can establish the confidence
level of our findings
INFERENCE IS BUILT ON PROBABILITY
Inferential statistics rely on the laws of
probability to determine the ‘significance’ of the
data we observe.
Statistical significance is NOT the same as
practical significance.
In statistics, we generally consider ‘significant’
those differences that would occur by chance alone
less than 1 time in 20 (p < 0.05).
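As a rough illustration of the 1-in-20 convention, the sketch below converts a Z score into the chance of seeing a value at least that extreme in either direction under a standard normal curve. The two-sided framing and the function name `two_sided_p_value` are assumptions made for this example; the slides do not specify a particular test.

```python
from statistics import NormalDist

ALPHA = 1 / 20  # the conventional 1-in-20 threshold

def two_sided_p_value(z: float) -> float:
    """Chance of a standard normal value at least as extreme as z (either tail)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Z score beyond which a result would be called significant (two-sided).
critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)

print(round(critical_z, 2))            # 1.96
print(two_sided_p_value(2.0) < ALPHA)  # True
print(two_sided_p_value(1.0) < ALPHA)  # False
```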
Statistical inference is the process of drawing
conclusions from data that are subject to random
variation, for example, observational errors or sampling
variation.
Statistical inference, statistical induction and
inferential statistics are used to describe systems of
procedures that can be used to draw conclusions from
datasets arising from systems affected by random
variation
THE MEANING OF STATISTICS
HAS SEVERAL MEANINGS
Meaning: Collections of numerical data
Example: Last year’s enrollment figures

Meaning: Summary measures calculated from a collection of data
Example: Average enrollment per month last year

Meaning: Activity of using and interpreting a collection of numerical data
Example: Evaluators made a projection of next year’s enrollments
DESCRIPTIVE STATISTICS
Use of numerical information to summarize,
simplify, and present masses of data.
Organized and summarized for clearer presentation