Probability Review: Thursday Sep 13
Probability Review
- Events and event spaces
- Random variables
- Joint probability distributions: marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
- Structural properties: independence, conditional independence
- Mean and variance
- The big picture
Events and Event Spaces
- Event: a subset of the outcome space; e.g., "first toss is heads" = {HH, HT}.
- S: event space, a set of events, closed under finite union and complement.
- Closure entails the other binary set operations: intersection, difference, etc.
Probability Measure
Defined over $(\Omega, S)$ such that:
- $P(\alpha) \ge 0$ for all $\alpha \in S$
- $P(\Omega) = 1$
- If $\alpha$ and $\beta$ are disjoint, then $P(\alpha \cup \beta) = P(\alpha) + P(\beta)$
Other properties can be deduced from the above axioms.
Visualization (figure omitted)
Conditional Probability
$P(F \mid H)$ = fraction of worlds in which H is true that also have F true:
$$P(F \mid H) = \frac{P(F \cap H)}{P(H)}$$
(Figure: an event A covered by a partition $B_1, \dots, B_7$ of the outcome space.)
Law of total probability: $P(A) = \sum_i P(B_i)\,P(A \mid B_i)$
Random Variables
A random variable maps each outcome to a value. (Figure: example random variables I: Intelligence, with values {high, low}, and G: Grade, with values {A+, A, ...}.)
Probability of Discrete RV
Probability mass function (pmf): $P(X = x_i)$
Easy facts about pmfs:
- $\sum_i P(X = x_i) = 1$
- $P(X = x_i \cap X = x_j) = 0$ if $i \ne j$
- $P(X = x_i \cup X = x_j) = P(X = x_i) + P(X = x_j)$ if $i \ne j$
- $P(X = x_1 \cup X = x_2 \cup \dots \cup X = x_k) = 1$
Common Distributions
- Uniform: $X \sim U[1, \dots, N]$; X takes values 1, 2, …, N with $P(X = i) = 1/N$. E.g., picking balls of different colors from a box.
- Binomial: $X \sim \mathrm{Bin}(n, p)$; X takes values 0, 1, …, n with $P(X = i) = \binom{n}{i} p^i (1 - p)^{n - i}$.
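A minimal numeric sketch of these two pmfs (the parameters N = 6, n = 10, p = 0.3 are arbitrary examples):

```python
from math import comb

def uniform_pmf(i, N):
    """P(X = i) = 1/N for X ~ U[1, ..., N]."""
    return 1.0 / N if 1 <= i <= N else 0.0

def binomial_pmf(i, n, p):
    """P(X = i) = C(n, i) p^i (1 - p)^(n - i) for X ~ Bin(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Each pmf sums to 1 over its support, as required.
assert abs(sum(uniform_pmf(i, 6) for i in range(1, 7)) - 1.0) < 1e-12
assert abs(sum(binomial_pmf(i, 10, 0.3) for i in range(11)) - 1.0) < 1e-12
```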
Continuous Random Variables
Probability density function (pdf) instead of probability mass function (pmf). A pdf is any function f(x) that describes the probability density in terms of the input variable x.
Probability of Continuous RV
Properties of the pdf:
- $f(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Cumulative Distribution Function
$F_X(v) = P(X \le v)$
- Discrete RVs: $F_X(v) = \sum_{v_i \le v} P(X = v_i)$
- Continuous RVs: $F_X(v) = \int_{-\infty}^{v} f(x)\,dx$, and $\frac{d}{dx} F_X(x) = f(x)$
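A quick check of the last identity, using the Exponential(1) distribution (pdf $f(x) = e^{-x}$, CDF $F(v) = 1 - e^{-v}$ for $x \ge 0$) as a stand-in example: numerically differentiating the CDF recovers the pdf.

```python
import math

# Exponential(1): f(x) = e^{-x}, F(v) = 1 - e^{-v} for x >= 0.
f = lambda x: math.exp(-x)
F = lambda v: 1.0 - math.exp(-v)

# Central difference F'(v) ~ (F(v+h) - F(v-h)) / (2h) should match f(v).
v, h = 1.5, 1e-5
assert abs((F(v + h) - F(v - h)) / (2 * h) - f(v)) < 1e-8
```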
Common Distributions
Normal: $X \sim N(\mu, \sigma^2)$
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
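As a sanity check, the density above should integrate to 1; a crude Riemann sum over [-10, 10] for the standard normal ($\mu = 0$, $\sigma = 1$) gets close:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2), transcribing the formula above."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Riemann sum over [-10, 10]; the tails beyond are negligible.
dx = 0.001
total = sum(normal_pdf(-10 + k * dx) * dx for k in range(int(20 / dx)))
assert abs(total - 1.0) < 1e-3
```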
Multivariate Normal
Generalization to higher dimensions of the one-dimensional normal, parameterized by a mean vector $\vec\mu$ and a covariance matrix $\Sigma$:
$$f_{\vec X}(x_1, \dots, x_d) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\,(\vec x - \vec\mu)^T \Sigma^{-1} (\vec x - \vec\mu)\right)$$
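A direct transcription of this density into code, with a small consistency check: with $\Sigma = I$ and $\vec\mu = 0$ it must factor into a product of one-dimensional standard normal densities.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x, transcribing the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm_const

one_d = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
x = np.array([0.5, -1.0])
assert np.isclose(mvn_pdf(x, np.zeros(2), np.eye(2)), one_d(0.5) * one_d(-1.0))
```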
Joint Probability Distributions
For continuous RVs, the joint pdf $f_{X,Y}(x, y)$ satisfies $\iint f_{X,Y}(x, y)\,dx\,dy = 1$.
Chain Rule
Always true:
$$P(x, y, z) = p(x)\,p(y \mid x)\,p(z \mid x, y) = p(z)\,p(y \mid z)\,p(x \mid y, z) = \dots$$
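A brute-force check of the chain rule on an arbitrary random joint table over three binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((2, 2, 2))
P /= P.sum()                        # an arbitrary joint P(x, y, z)

Px = P.sum(axis=(1, 2))             # p(x)
Pxy = P.sum(axis=2)                 # p(x, y)
for x in range(2):
    for y in range(2):
        for z in range(2):
            # p(x) p(y|x) p(z|x, y) reassembles the joint exactly.
            chain = Px[x] * (Pxy[x, y] / Px[x]) * (P[x, y, z] / Pxy[x, y])
            assert np.isclose(P[x, y, z], chain)
```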
Conditional Probability
For the events $\{X = x\}$ and $\{Y = y\}$:
$$P(X = x \mid Y = y) = \frac{P(X = x \cap Y = y)}{P(Y = y)}, \qquad \text{i.e.,} \qquad p(x \mid y) = \frac{p(x, y)}{p(y)}$$
Marginalization
We know p(X, Y); what is P(X = x)? We can use the law of total probability. Why?
$$p(x) = \sum_y P(x, y) = \sum_y P(y)\,P(x \mid y)$$
Marginalization Cont.
Another example:
$$p(x) = \sum_{y,z} P(x, y, z) = \sum_{y,z} P(y, z)\,P(x \mid y, z)$$
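Both routes to the marginal, the direct sum and the law of total probability, agree on any joint table; a sketch with an arbitrary random joint:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((2, 3, 4))
P /= P.sum()                        # an arbitrary joint p(x, y, z)

Pyz = P.sum(axis=0)                 # p(y, z)
P_x_given_yz = P / Pyz              # p(x | y, z)

direct = P.sum(axis=(1, 2))                          # sum_{y,z} p(x, y, z)
total_prob = (Pyz * P_x_given_yz).sum(axis=(1, 2))   # sum_{y,z} p(y,z) p(x|y,z)
assert np.allclose(direct, total_prob)
```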
Bayes Rule
We know that P(rain) = 0.5. If we also know that the grass is wet, how does this affect our belief about whether it rained?
$$P(\text{rain} \mid \text{wet}) = \frac{P(\text{rain})\,P(\text{wet} \mid \text{rain})}{P(\text{wet})}$$
In general:
$$P(x \mid y) = \frac{P(x)\,P(y \mid x)}{P(y)}, \qquad P(x \mid y, z) = \frac{P(x \mid z)\,P(y \mid x, z)}{P(y \mid z)}$$
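A worked version of the rain/wet update. The slide only fixes P(rain) = 0.5; the two likelihoods below are made-up numbers purely to make the computation concrete:

```python
p_rain = 0.5                 # from the slide
p_wet_given_rain = 0.9       # assumed for illustration
p_wet_given_dry = 0.2        # assumed for illustration

# Denominator via the law of total probability.
p_wet = p_rain * p_wet_given_rain + (1 - p_rain) * p_wet_given_dry
p_rain_given_wet = p_rain * p_wet_given_rain / p_wet
print(p_rain_given_wet)      # ~0.818: wet grass raises our belief in rain
```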
Structural Properties
Independence
X is independent of Y means that knowing Y does not change our belief about X:
- $P(X \mid Y = y) = P(X)$
- $P(X = x, Y = y) = P(X = x)\,P(Y = y)$
The above should hold for all x and y. Independence is symmetric and is written $X \perp Y$.
Independence
$X_1, \dots, X_n$ are independent if and only if
$$P(X_1 \in A_1, \dots, X_n \in A_n) = \prod_{i=1}^{n} P(X_i \in A_i)$$
Conditional Independence
X is conditionally independent of Y given Z, written $X \perp Y \mid Z$, if $P(X \mid Y, Z) = P(X \mid Z)$: once Z is known, Y carries no further information about X.
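A toy numeric contrast between the two notions (all tables below are made-up numbers): a joint assembled as $p(z)\,p(x \mid z)\,p(y \mid z)$ satisfies $X \perp Y \mid Z$ by construction, yet X and Y are not marginally independent.

```python
import numpy as np

Pz = np.array([0.3, 0.7])
Px_z = np.array([[0.9, 0.1],      # p(x | z); rows indexed by z
                 [0.2, 0.8]])
Py_z = np.array([[0.6, 0.4],      # p(y | z)
                 [0.1, 0.9]])
P = np.einsum('z,zx,zy->xyz', Pz, Px_z, Py_z)   # joint p(x, y, z)

# X ⊥ Y | Z: p(x, y | z) factors for every z.
for z in range(2):
    assert np.allclose(P[:, :, z] / Pz[z], np.outer(Px_z[z], Py_z[z]))

# ...but marginally, p(x, y) != p(x) p(y).
Pxy = P.sum(axis=2)
print(np.allclose(Pxy, np.outer(Pxy.sum(axis=1), Pxy.sum(axis=0))))   # False
```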
Mean and Variance
Mean
- Discrete RVs: $E[X] = \sum_{v_i} v_i\,P(X = v_i)$
- Continuous RVs: $E[X] = \int x\,f(x)\,dx$
- For a function of X: $E[g(X)] = \int g(x)\,f(x)\,dx$ (with the analogous sum for discrete RVs)

Variance
$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2$
- Discrete RVs: $V[X] = \sum_{v_i} (v_i - \mu)^2\,P(X = v_i)$
- Continuous RVs: $V[X] = \int (x - \mu)^2\,f(x)\,dx$

Covariance: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X\,\mu_Y$
Properties
Mean:
- $E[X + Y] = E[X] + E[Y]$
- $E[aX] = a\,E[X]$
- If X and Y are independent, $E[XY] = E[X]\,E[Y]$
Variance:
- $V[aX + b] = a^2\,V[X]$
- If X and Y are independent, $V[X + Y] = V(X) + V(Y)$
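An exact check of the two independence-dependent properties on small discrete RVs (the values and probabilities are toy numbers); independence lets us take the joint as an outer product:

```python
import numpy as np

x_vals, px = np.array([0., 1., 2.]), np.array([0.2, 0.5, 0.3])
y_vals, py = np.array([-1., 1.]),    np.array([0.4, 0.6])

E = lambda vals, p: (vals * p).sum()
V = lambda vals, p: E(vals**2, p) - E(vals, p)**2

joint = np.outer(px, py)                        # independence: p(x, y) = p(x) p(y)
E_XY = (np.outer(x_vals, y_vals) * joint).sum()
assert np.isclose(E_XY, E(x_vals, px) * E(y_vals, py))      # E[XY] = E[X] E[Y]

s_vals, s_p = (x_vals[:, None] + y_vals[None, :]).ravel(), joint.ravel()
assert np.isclose(V(s_vals, s_p), V(x_vals, px) + V(y_vals, py))  # V(X+Y) = V(X)+V(Y)
```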
The Big Picture
(Diagram: Model and Data connected in both directions, with estimation/learning going from data to model and statistical inference applying the model to data.)
Given observations from a model: what (conditional) independence assumptions hold? This is structure learning.
Example: The Monty Hall Problem
(Decision-tree figure: after we pick door i, either the car is behind our door and the host reveals Goat A or Goat B at random, or the car is behind another door and the host must reveal the one remaining goat.)
Let $C_k$ denote "the car is behind door k" and $H_{ij}$ denote "we picked door i and the host then opened door j". The host's behavior gives
$$P(H_{ij} \mid C_k) = \begin{cases} 0 & \text{if } j = k \\ 1/2 & \text{if } i = k \\ 1 & \text{if } i \ne k \text{ and } j \ne k \end{cases}$$
Suppose we pick door 1 and the host opens door 3. Then
$$P(H_{13} \mid C_1)\,P(C_1) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}$$
$$P(H_{13}) = \sum_k P(H_{13} \mid C_k)\,P(C_k) = \frac{1}{2} \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 0 = \frac{1}{2}$$
so, by Bayes rule,
$$P(C_1 \mid H_{13}) = \frac{P(H_{13} \mid C_1)\,P(C_1)}{P(H_{13})} = \frac{1/6}{1/2} = \frac{1}{3}, \qquad P(C_2 \mid H_{13}) = 1 - P(C_1 \mid H_{13}) = \frac{2}{3}$$
Switching to door 2 doubles our chance of winning.
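A quick simulation confirming the 1/3 vs. 2/3 split:

```python
import random

def monty_trial(switch):
    car = random.randrange(3)
    pick = 0                        # always pick door 1, as on the slide
    # Host opens a random door hiding a goat that is not our pick.
    host = random.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        pick = next(d for d in range(3) if d != pick and d != host)
    return pick == car

n = 100_000
print(sum(monty_trial(False) for _ in range(n)) / n)   # ~0.333 if we stay
print(sum(monty_trial(True)  for _ in range(n)) / n)   # ~0.667 if we switch
```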
Information Theory
P(X) encodes our uncertainty about X. Some variables are more uncertain than others.
(Figure: two example distributions P(X) and P(Y) with different spreads, illustrating different amounts of uncertainty.)
Entropy quantifies this uncertainty:
$$H_P(X) = E_P\left[\log \frac{1}{P(x)}\right] = \sum_x P(x)\,\log \frac{1}{P(x)} = -\sum_x P(x)\,\log P(x)$$
We can define conditional entropy similarly:
$$H_P(X \mid Y) = E\left[\log \frac{1}{p(x \mid y)}\right] = H_P(X, Y) - H_P(Y)$$
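A small sketch computing both quantities from a made-up joint table and using the identity:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum P log2 P over the positive entries (0 log 0 := 0)."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

Pxy = np.array([[0.25, 0.25],
                [0.40, 0.10]])      # toy joint p(x, y)

H_X_given_Y = entropy(Pxy.ravel()) - entropy(Pxy.sum(axis=0))  # H(X,Y) - H(Y)
print(H_X_given_Y)
```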
Mutual Information: MI
Remember independence? If $X \perp Y$, then knowing Y won't change our belief about X. Mutual information helps quantify this (though it is not the only way):
$$I_P(X; Y) = H_P(X) - H_P(X \mid Y)$$
MI is the amount of uncertainty in X that is removed by knowing Y. It is symmetric, and $I(X; Y) = 0$ iff X and Y are independent. Equivalently,
$$I(X; Y) = \sum_{x, y} p(x, y)\,\log \frac{p(x, y)}{p(x)\,p(y)}$$
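Computing I(X; Y) from a joint table: an independent joint (an outer product) gives exactly 0, while a dependent one gives a positive value. The tables are toy numbers:

```python
import numpy as np

def mutual_information(Pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ]."""
    Px = Pxy.sum(axis=1, keepdims=True)
    Py = Pxy.sum(axis=0, keepdims=True)
    mask = Pxy > 0
    return (Pxy[mask] * np.log2(Pxy[mask] / (Px @ Py)[mask])).sum()

print(mutual_information(np.outer([0.3, 0.7], [0.5, 0.5])))    # 0.0 (independent)
print(mutual_information(np.array([[0.4, 0.1],
                                   [0.1, 0.4]])))              # > 0 (dependent)
```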
Example: a contingency table of gender vs. party affiliation.

            Republican   Democrat   Independent   Total
Male           200          150          50         400
Female         250          300          50         600
Total          450          450         100        1000
Degrees of freedom = (|g| - 1) × (|v| - 1) = (2 - 1) × (3 - 1) = 2
Expected frequency count: $E_{g,v} = \dfrac{(\text{row total for } g) \times (\text{column total for } v)}{n}$, e.g. $E_{\text{male, rep}} = \frac{400 \times 450}{1000} = 180$
Test statistic: $\chi^2 = \sum_{g,v} \dfrac{(O_{g,v} - E_{g,v})^2}{E_{g,v}}$
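Putting the pieces together for the table above; scipy.stats.chi2_contingency should give the same statistic along with a p-value, but the computation is short enough to do directly:

```python
import numpy as np

observed = np.array([[200, 150,  50],       # Male:   Rep, Dem, Ind
                     [250, 300,  50]])      # Female: Rep, Dem, Ind
row = observed.sum(axis=1, keepdims=True)   # 400, 600
col = observed.sum(axis=0, keepdims=True)   # 450, 450, 100
n = observed.sum()                          # 1000

expected = row @ col / n          # e.g. Male-Republican: 400 * 450 / 1000 = 180
chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)                       # ~16.2 with 2 degrees of freedom
```

With 2 degrees of freedom, a statistic of roughly 16.2 exceeds the usual critical values by a wide margin, so the test rejects independence of gender and party affiliation.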
Acknowledgment