Graph Lecture 19

Graphical models provide a compact representation for joint probability distributions through directed acyclic graphs (DAGs) and conditional probability tables (CPTs). For Bayesian networks, the DAG encodes local independence assumptions, and the distribution factorizes according to the graph. For Markov random fields, the graph represents conditional independence properties, and the distribution factorizes over cliques in the graph. Variable elimination is a method for efficient probabilistic inference in graphical models that exploits the conditional independence structure to decompose computations. The order of variable elimination can significantly impact computational complexity.


Graphical Models

Aarti Singh
Slides Courtesy: Carlos Guestrin

Machine Learning 10-701/15-781


Nov 15, 2010
Directed – Bayesian Networks
• Compact representation for a joint probability distribution

• Bayes Net = Directed Acyclic Graph (DAG) + Conditional Probability Tables (CPTs)

• Distribution factorizes according to the graph ≡ distribution satisfies the local Markov independence assumptions
  ≡ xk is independent of its non-descendants given its parents pak
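Written out, the factorization this refers to is the product of each variable's CPT given its parents:

\[
P(x_1,\dots,x_n) \;=\; \prod_{k=1}^{n} P(x_k \mid \mathrm{pa}_k)
\]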
Directed – Bayesian Networks
• Graph encodes local independence assumptions (local Markov assumptions)
• Other independence assumptions can be read off the graph using d-separation
• Distribution factorizes according to the graph ≡ distribution satisfies all independence assumptions found by d-separation
  [figure: the set F of distributions that factorize according to the graph and the set I of distributions satisfying the d-separation independencies coincide]
• Does the graph capture all independencies? Yes, for almost all distributions that factorize according to the graph. More in 10-708
D-separation
• a is d-separated from b by c ≡ a ⊥ b | c
• Three important configurations:
  – Causal direction (a → c → b):  a ⊥ b | c
  – Common cause (a ← c → b):  a ⊥ b | c
  – V-structure, “explaining away” (a → c ← b):  a and b are NOT independent given c
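A small numerical illustration of the v-structure case — a minimal numpy sketch with hypothetical CPTs, not from the slides. Here a and b are independent by construction, yet conditioning on c makes them dependent:

import numpy as np

# Hypothetical v-structure a -> c <- b with binary variables, illustrating
# "explaining away": a and b are marginally independent but become dependent
# once the common child c is observed.
P_a = np.array([0.5, 0.5])          # P(a)
P_b = np.array([0.5, 0.5])          # P(b)
P_c1 = np.array([[0.1, 0.9],        # P(c=1 | a, b): rows index a, columns index b
                 [0.9, 0.99]])

joint_c1 = P_a[:, None] * P_b[None, :] * P_c1   # joint P(a, b, c=1)

p_a1_given_c1 = joint_c1[1, :].sum() / joint_c1.sum()
p_a1_given_b1_c1 = joint_c1[1, 1] / joint_c1[:, 1].sum()
print(p_a1_given_c1, p_a1_given_b1_c1)          # the two differ, so a and b are dependent given c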
Undirected – Markov Random Fields
• Popular in statistical physics, computer vision, sensor networks, social networks, protein-protein interaction networks
• Example – image denoising: xi – value at pixel i, yi – observed noisy value
Conditional Independence properties
• No directed edges
• Conditional independence ≡ graph separation
• A, B, C – non-intersecting sets of nodes
• A ⊥ B | C if all paths between nodes in A & B are “blocked”, i.e. every such path contains a node z in C
Factorization
• Joint distribution factorizes according to the graph:
  P(x) = (1/Z) ∏C ψC(xC)
  where each ψC is an arbitrary positive function of a clique xC (e.g. xC = {x1,x2} is a clique, xC = {x2,x3,x4} a maximal clique)
• The normalization constant (partition function) Z = ∑x ∏C ψC(xC) is typically NP-hard to compute


MRF Example
• Often the potentials are written in exponential form, ψC(xC) = exp(−E(xC)), where E(xC) is the energy of the clique (e.g. lower if the variables in the clique take similar values)
MRF Example
Ising model:
• cliques are edges, xC = {xi, xj}
• binary variables xi ∈ {−1, 1}
• the product xi·xj is 1 if xi = xj and −1 if xi ≠ xj
• Probability of an assignment is higher if neighbors xi and xj take the same value
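A minimal sketch of this model, assuming the common single-parameter form P(x) ∝ exp(β ∑(i,j)∈E xi·xj); the slide's exact parameterization (e.g. whether it also couples to the noisy observations yi) is not shown in the extracted text:

import numpy as np
from itertools import product

# Hypothetical 1D Ising chain with 4 spins in {-1, +1} and coupling beta > 0.
beta = 0.5
edges = [(0, 1), (1, 2), (2, 3)]            # cliques are the edges of the chain

def unnormalized_prob(x):
    # Product of edge potentials exp(beta * x_i * x_j): larger when neighbors agree.
    return np.exp(beta * sum(x[i] * x[j] for i, j in edges))

# Partition function Z by brute-force enumeration (this is what is NP-hard in general).
states = list(product([-1, 1], repeat=4))
Z = sum(unnormalized_prob(x) for x in states)

print(unnormalized_prob((1, 1, 1, 1)) / Z)      # aligned neighbors: higher probability
print(unnormalized_prob((1, -1, 1, -1)) / Z)    # disagreeing neighbors: lower probability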
Hammersley-Clifford Theorem
• F – set of distributions that factorize according to the graph
• I – set of distributions that respect the conditional independencies implied by graph separation

• I ⊆ F (for strictly positive distributions)
  Important because: given the independencies of P we can get an MRF structure G
• F ⊆ I
  Important because: we can read independencies of P from the MRF structure G


What you should know…
• Graphical Models: Directed Bayesian networks, Undirected
Markov Random Fields
– A compact representation for large probability
distributions
– Not an algorithm
• Representation of a BN, MRF
– Variables
– Graph
– CPTs
• Why BNs and MRFs are useful
• D-separation (conditional independence) & factorization
Topics in Graphical Models
• Representation
– Which joint probability distributions does a graphical
model represent?

• Inference
– How to answer questions about the joint probability
distribution?
• Marginal distribution of a node variable
• Most likely assignment of node variables

• Learning
– How to learn the parameters and structure of a graphical
model?
Inference
• Possible queries:
  1) Marginal distribution, e.g. P(S)
     Posterior distribution, e.g. P(F|H=1)
  2) Most likely assignment of nodes:
     arg max P(F=f, A=a, S=s, N=n | H=1)
     f,a,s,n
  [figure: Bayes net with Flu → Sinus ← Allergy, Sinus → Headache, Sinus → Nose]
Inference
• Possible queries:
  1) Marginal distribution, e.g. P(S)
     Posterior distribution, e.g. P(F|H=1)

  P(F|H=1) = P(F, H=1) / P(H=1) = P(F, H=1) / ∑f P(F=f, H=1)

  → We will focus on computing P(F, H=1); the posterior then follows with only a constant factor more effort
Marginalization
• Need to marginalize over the other variables:
  P(S) = ∑f,a,n,h P(f, a, S, n, h)
  P(F, H=1) = ∑a,s,n P(F, a, s, n, H=1)     (2^3 terms)
• To marginalize out n binary variables, need to sum over 2^n terms
• Inference seems exponential in the number of variables!
• Actually, inference in graphical models is NP-hard
Bayesian Networks Example
• 18 binary attributes
• Inference
  – P(BatteryAge | Starts=f)
  – need to sum over 2^16 terms!
• Not impressed?
  – HailFinder BN – more than 3^54 = 58149737003040059690390169 terms
Fast Probabilistic Inference
P(F,H=1) = ∑a,s,n P(F, a, s, n, H=1)
         = ∑a,s,n P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)
         = P(F) ∑a P(a) ∑s P(s|F,a) P(H=1|s) ∑n P(n|s)

• Push sums in as far as possible
• Distributive property: x1·z + x2·z = z·(x1 + x2)   (2 multiplies vs. 1 multiply)
Fast Probabilistic Inference
P(F,H=1) = ∑a,s,n P(F, a, s, n, H=1)
         = ∑a,s,n P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)       (8 values × 4 multiplies)
         = P(F) ∑a P(a) ∑s P(s|F,a) P(H=1|s) ∑n P(n|s)     (∑n P(n|s) = 1)
         = P(F) ∑a P(a) ∑s P(s|F,a) P(H=1|s)               (4 values × 1 multiply)
         = P(F) ∑a P(a) g1(F,a)                            (2 values × 1 multiply)
         = P(F) g2(F)                                      (1 multiply)

32 multiplies vs. 7 multiplies
In general: 2^n vs. n·2^k, where k is the scope of the largest factor

(Potential for) exponential reduction in computation!
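To make the reduction concrete, here is a small numpy sketch of the same computation with hypothetical CPTs for the Flu/Allergy/Sinus/Headache/Nose network; the numbers are made up, and the brute-force sum and the pushed-in sums agree:

import numpy as np

# Hypothetical CPTs (binary variables; values are illustrative only).
P_F = np.array([0.9, 0.1])                     # P(F)
P_A = np.array([0.8, 0.2])                     # P(A)
P_S = np.zeros((2, 2, 2))                      # P(S | F, A), indexed [f, a, s]
P_S[0, 0] = [0.95, 0.05]
P_S[0, 1] = [0.40, 0.60]
P_S[1, 0] = [0.30, 0.70]
P_S[1, 1] = [0.10, 0.90]
P_H = np.array([[0.9, 0.1], [0.2, 0.8]])       # P(H | S), indexed [s, h]
P_N = np.array([[0.8, 0.2], [0.3, 0.7]])       # P(N | S), indexed [s, n]

# Brute force: sum the full joint over a, s, n with H = 1 (2^3 terms per value of F).
brute = np.zeros(2)
for f in range(2):
    for a in range(2):
        for s in range(2):
            for n in range(2):
                brute[f] += P_F[f] * P_A[a] * P_S[f, a, s] * P_N[s, n] * P_H[s, 1]

# Pushed-in sums:
# P(F, H=1) = P(F) * sum_a P(a) * sum_s P(s|F,a) P(H=1|s) * sum_n P(n|s)
sum_n = P_N.sum(axis=1)                                 # equals 1 for every s
g1 = np.einsum('fas,s,s->fa', P_S, P_H[:, 1], sum_n)    # eliminate s
g2 = np.einsum('a,fa->f', P_A, g1)                      # eliminate a
pushed = P_F * g2

print(brute, pushed)              # identical up to round-off
print(pushed / pushed.sum())      # posterior P(F | H=1)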


Fast Probabilistic Inference – Variable Elimination
P(F,H=1) = ∑a,s,n P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)
         = P(F) ∑a P(a) ∑s P(s|F,a) P(H=1|s) ∑n P(n|s)
  where ∑n P(n|s) = 1, ∑s P(s|F,a) P(H=1|s) = P(H=1|F,a), and ∑a P(a) P(H=1|F,a) = P(H=1|F)

(Potential for) exponential reduction in computation!


Variable Elimination – Order can make a HUGE difference
P(F,H=1) = ∑a,s,n P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)
         = P(F) ∑a P(a) ∑s P(s|F,a) P(H=1|s) ∑n P(n|s)
  (as above, the intermediate factors P(H=1|F,a) and P(H=1|F) have scope at most 2)

Eliminating s before n instead:
P(F,H=1) = P(F) ∑a P(a) ∑n ∑s P(s|F,a) P(n|s) P(H=1|s)
  generates a factor g(F,a,n) – the scope of the largest factor is now 3

(Potential for) exponential reduction in computation!


Variable Elimination – Order can make a HUGE difference
[figure: node Y connected to X1, X2, X3, X4 (…, Xn)]
• Eliminating the Xi first: the largest factor generated is g(Y) – scope 1
• Eliminating Y first: the largest factor generated is g(X1,X2,..,Xn) – scope n
Variable Elimination Algorithm
• Given a BN – the DAG and CPTs (initial factors p(xi|pai) for i = 1,..,n)
• Given a query P(X|e) ∝ P(X,e), where X is a set of variables and e the evidence
• Instantiate the evidence e, e.g. set H=1   IMPORTANT!!!
• Choose an ordering on the variables, e.g. X1, …, Xn
• For i = 1 to n, if Xi ∉ {X, e}:
  – Collect the factors g1,…,gk that include Xi
  – Generate a new factor g by eliminating (summing out) Xi from the product of these factors: g = ∑Xi g1 × ⋯ × gk
  – Variable Xi has been eliminated!
  – Remove g1,…,gk from the set of factors and add g
• Normalize P(X,e) to obtain P(X|e)
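A minimal sketch of this procedure in Python, assuming discrete variables and a simple dictionary-based factor representation (names and representation are illustrative, not from the slides):

from itertools import product

# A factor is a pair (vars, table): vars is a tuple of variable names and table
# maps each joint assignment (a tuple of values, in the same order) to a number.
# Evidence is handled by restricting the initial CPT factors to the observed
# values before calling variable_elimination.

def multiply(f1, f2, domains):
    v1, t1 = f1
    v2, t2 = f2
    vars_out = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for assignment in product(*(domains[v] for v in vars_out)):
        amap = dict(zip(vars_out, assignment))
        table[assignment] = (t1[tuple(amap[v] for v in v1)] *
                             t2[tuple(amap[v] for v in v2)])
    return vars_out, table

def sum_out(factor, var):
    # Marginalize var out of the factor: add up table entries that agree elsewhere.
    vs, t = factor
    keep = tuple(v for v in vs if v != var)
    table = {}
    for assignment, val in t.items():
        key = tuple(a for v, a in zip(vs, assignment) if v != var)
        table[key] = table.get(key, 0.0) + val
    return keep, table

def variable_elimination(factors, order, domains):
    # order: elimination order over the variables not in the query or evidence.
    for var in order:
        involved = [f for f in factors if var in f[0]]
        if not involved:
            continue
        rest = [f for f in factors if var not in f[0]]
        prod = involved[0]
        for f in involved[1:]:
            prod = multiply(prod, f, domains)   # collect factors containing var
        rest.append(sum_out(prod, var))         # eliminate var, add the new factor g
        factors = rest
    result = factors[0]                         # combine whatever remains
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result                               # unnormalized P(X, e); normalize to get P(X|e)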
Complexity for (Poly)tree graphs
Variable elimination order:
• Consider the undirected version of the graph
• Start from the “leaves” and work up: find a topological order and eliminate variables in reverse order
• This does not create any factors bigger than the original CPTs
• For polytrees, inference is linear in the number of variables (vs. exponential in general)!
Complexity for graphs with loops
• Loop – undirected cycle
• Linear in the number of variables, but exponential in the size of the largest factor generated!
• Moralize the graph: connect the parents of each node into a clique & drop the directions of all edges
• When you eliminate a variable, add edges between its neighbors
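A minimal sketch of this bookkeeping on the (moralized) undirected graph, using a hypothetical 4-cycle as input: eliminating a variable generates a factor over its current neighbors and connects those neighbors into a clique:

def elimination_factor_scopes(adj, order):
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    scopes = []
    for v in order:
        nbrs = adj.pop(v)
        scopes.append((v, set(nbrs)))                 # new factor's scope = current neighbors of v
        for u in nbrs:
            adj[u] |= nbrs - {u}                      # connect the neighbors into a clique
            adj[u].discard(v)
    return scopes

# Example: a 4-cycle A - B - C - D - A (hypothetical graph).
adj = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'A', 'C'}}
for var, scope in elimination_factor_scopes(adj, ['A', 'B', 'C', 'D']):
    print(var, '->', scope)   # the largest scope over the order is the induced width for this order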


Complexity for graphs with loops
• Loop – undirected cycle

  Var eliminated   Factor generated
  S                g1(C,B)
  B                g2(C,O,D)
  D                g3(C,O)
  C                g4(T,O)
  T                g5(O,X)
  O                g6(X)

• Linear in the number of variables, but exponential in the size of the largest factor generated ~ tree-width (max clique size − 1) of the resulting graph!
Example: Large tree-width with small number of parents
• At most 2 parents per node, but the tree-width is O(√n)
• Compact representation does not imply easy inference


Choosing an elimination order
• Choosing best order is NP-complete
– Reduction from MAX-Clique
• Many good heuristics (some with guarantees)
• Ultimately, can’t beat NP-hardness of inference
– Even optimal order can lead to exponential variable
elimination computation
• In practice
– Variable elimination often very effective
– Many (many many) approximate inference approaches
available when variable elimination too expensive
Inference
• Possible queries:
  2) Most likely assignment of nodes:
     arg max P(F=f, A=a, S=s, N=n | H=1)
     f,a,s,n
• Use the distributive property for max:
  max(x1·z, x2·z) = z·max(x1, x2)   (2 multiplies vs. 1 multiply)
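Pushing the max operators inward exactly as we pushed sums gives the max-product (Viterbi-style) version of variable elimination; on the same example:

\[
\max_{a,s,n} P(F)\,P(a)\,P(s \mid F,a)\,P(n \mid s)\,P(H{=}1 \mid s)
\;=\; P(F)\,\max_{a} P(a)\,\max_{s} P(s \mid F,a)\,P(H{=}1 \mid s)\,\max_{n} P(n \mid s)
\]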
Topics in Graphical Models
• Representation
– Which joint probability distributions does a graphical
model represent?

• Inference
– How to answer questions about the joint probability
distribution?
• Marginal distribution of a node variable
• Most likely assignment of node variables

• Learning
– How to learn the parameters and structure of a graphical
model?
Learning
[figure: data x(1), …, x(m)  →  Bayes net = graph structure + parameters (the CPTs P(Xi | PaXi))]
• Given a set of m independent samples (assignments of the random variables),
  find the best (most likely?) Bayes Net (graph structure + CPTs)
Learning the CPTs (given structure)
• For each discrete variable Xk, compute MLE or MAP estimates of its CPT from the data x(1), …, x(m)
• MLEs decouple for each CPT in Bayes Nets
• Given the structure, the log likelihood of the data (for the Flu/Allergy/Sinus/Headache/Nose network) is
  log P(D | θ, G) = ∑j [ log P(f(j)) + log P(a(j)) + log P(s(j) | f(j), a(j)) + log P(h(j) | s(j)) + log P(n(j) | s(j)) ]
  and each term depends only on one CPT’s parameters: θF, θA, θS|F,A, θH|S, θN|S
• Can compute MLEs of each parameter independently!
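A minimal sketch of one such decoupled MLE for a single CPT, say P(S | F, A), with a made-up data array; a MAP estimate would simply add pseudo-counts before normalizing:

import numpy as np

# Hypothetical samples of (F, A, S), one row per data point x(j).
data = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 0, 0]])

counts = np.zeros((2, 2, 2))
for f, a, s in data:
    counts[f, a, s] += 1

# MLE: theta_{s|f,a} = Count(f, a, s) / Count(f, a)
cpt = counts / counts.sum(axis=2, keepdims=True)
print(cpt)   # estimated P(S | F, A); the other CPTs are estimated the same way, independently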
Information theoretic interpretation of MLE
• Plugging in the MLE estimates gives the ML score
• Reminds of entropy

Information theoretic interpretation of MLE
• One term of the score doesn’t depend on the graph structure
• The remaining term is the ML score for the graph structure
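The equations these bullets annotate did not survive extraction; the standard form of the decomposition, which matches the annotations, is:

\[
\frac{1}{m} \log P(D \mid \hat\theta, G)
\;=\; \sum_i \hat I\big(X_i;\, \mathrm{Pa}_{X_i}\big) \;-\; \sum_i \hat H(X_i)
\]

where \(\hat I\) and \(\hat H\) are the empirical mutual information and entropy. The entropy term does not depend on the graph, so the mutual-information term serves as the ML score for the graph structure.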


ML – Decomposable Score
• Log data likelihood
• Decomposable score:
  – Decomposes over families in the BN (a node and its parents)
  – Will lead to significant computational efficiency!!!
  – Score(G : D) = ∑i FamScore(Xi | PaXi : D)
How many trees are there?
• Trees – every node has at most one parent
• n^(n−2) possible trees (Cayley’s Theorem)
• Nonetheless – an efficient optimal algorithm finds the best tree!


Scoring a tree
• Equivalent trees (same score I(A,B) + I(B,C)): the chains A → B → C, A ← B → C, A ← B ← C
• The score provides an indication of structure: the chain A – B – C scores I(A,B) + I(B,C), whereas the tree with A connected to both B and C scores I(A,B) + I(A,C)


Chow-Liu algorithm
• For each pair of variables Xi, Xj
  – Compute the empirical distribution P̂(xi, xj) from counts in the data
  – Compute the mutual information Î(Xi, Xj) = ∑xi,xj P̂(xi,xj) log [ P̂(xi,xj) / (P̂(xi) P̂(xj)) ]
• Define a graph
  – Nodes X1,…,Xn
  – Edge (i,j) gets weight Î(Xi, Xj)
• Optimal tree BN
  – Compute the maximum weight spanning tree (e.g. Prim’s or Kruskal’s algorithm, O(n log n))
  – Directions in the BN: pick any node as root; breadth-first search defines the directions
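A minimal sketch of the algorithm for binary data, using empirical pairwise mutual information as edge weights and a Kruskal-style maximum weight spanning tree (the data and variable indexing are hypothetical):

import numpy as np
from itertools import combinations

def mutual_information(x, y):
    # Empirical mutual information between two binary columns.
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    n = data.shape[1]
    # Weight every pair by empirical mutual information, largest first.
    edges = sorted(((mutual_information(data[:, i], data[:, j]), i, j)
                    for i, j in combinations(range(n), 2)), reverse=True)
    parent = list(range(n))                      # union-find for Kruskal
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = []
    for w, i, j in edges:                        # maximum weight spanning tree
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree                                  # undirected edges; orient by BFS from any root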
Chow-Liu algorithm example
[figure: a graph whose edges are weighted by empirical mutual information; the maximum weight spanning tree is kept and then oriented outward from a chosen root]
Scoring general graphical models
• The graph that maximizes the ML score is the complete graph!
• Information never hurts: H(A|B) ≥ H(A|B,C)
• Adding a parent always increases the ML score: I(A; B,C) ≥ I(A; B)
• The more edges, the fewer independence assumptions, the higher the likelihood of the data – but it will overfit…
• Why does ML for trees work? Restricted model space – tree graphs
Regularizing
• Model selection
  – Use the MDL (Minimum Description Length) score
  – or the BIC score (Bayesian Information Criterion)
• Still NP-hard
• Mostly heuristic (exploit score decomposition)
• Chow-Liu provides the best tree approximation to any distribution
• Start with the Chow-Liu tree; add, delete, or invert edges; evaluate the BIC score
What you should know
• Learning BNs
– Maximum likelihood or MAP learns parameters
– ML score
• Decomposable score
• Information theoretic interpretation (Mutual
information)
– Best tree (Chow-Liu)
– Other BNs, usually local search with BIC score
