CS264A_Review_Note
Existential Quantification / Forgetting

• ∆ |= ∃P ∆
• If α is a sentence that does not mention P, then ∆ |= α ⇐⇒ ∃P ∆ |= α.

So we can safely remove P from ∆ when reasoning about sentences that do not mention P. This is called:

• forgetting P from ∆
• projecting ∆ on all units / variables but P

To existentially quantify (forget) a variable B from a CNF: do all B-resolutions (add the resolvents), then drop all clauses containing B or ¬B.

Overall workflow:
1. Turning the KB ∆ into CNF.
2. Simplifying the KB ∆.
3. Deduction (strategies of resolution, directed resolution).

Completeness of Resolution / Inference Rule

We say a rule R is complete iff ∀α, if ∆ |= α then ∆ ⊢R α. In other words, R is complete when it can “discover everything from ∆”.

Resolution as an inference rule is NOT complete. A counterexample: ∆ = {A, B}, α = A ∨ B (∆ |= α, yet no resolution step can derive the clause A ∨ B from the two unit clauses).

However, when applied to CNF, resolution is refutation complete, which means it is sufficient to discover any inconsistency.

Unit Resolution

Unit resolution is a special case of resolution where min(|Ci|, |Cj|) = 1, with |Ci| denoting the size of clause Ci. Unit resolution corresponds to modus ponens (MP). It is NOT refutation complete, but it has an efficiency benefit: it can be applied in linear time.

Refutation Theorem

∆ |= α iff ∆ ∧ ¬α is inconsistent (useful in proofs):

• resolution finds a contradiction in ∆ ∧ ¬α: ∆ |= α
• resolution finds no contradiction in ∆ ∧ ¬α: ∆ ⊬ α
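The B-resolution procedure above is easy to prototype. Below is a minimal Python sketch of forgetting a variable from a CNF by resolution; the clause encoding (frozensets of signed integers) and all function names are illustrative choices, not from the course material.

```python
# Forgetting a variable B from a CNF by resolution (minimal sketch).
# Clauses are frozensets of integer literals: positive = variable, negative = negation.

def forget(cnf, b):
    """Existentially quantify variable b out of a CNF (a set of clauses)."""
    pos = [c for c in cnf if b in c]        # clauses containing  B
    neg = [c for c in cnf if -b in c]       # clauses containing ~B
    rest = [c for c in cnf if b not in c and -b not in c]
    resolvents = set()
    for cp in pos:                          # perform all B-resolutions
        for cn in neg:
            r = (cp - {b}) | (cn - {-b})
            if not any(-l in r for l in r): # skip tautologous resolvents
                resolvents.add(frozenset(r))
    # drop all clauses mentioning B, keep the resolvents
    return set(map(frozenset, rest)) | resolvents

# Example: forget B from {A ∨ B, ¬B ∨ C}  ->  {A ∨ C}
print(forget({frozenset({1, 2}), frozenset({-2, 3})}, 2))   # {frozenset({1, 3})}
```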
Resolution Strategies: Linear Resolution

All the clauses that are originally included in the CNF ∆ are root clauses. Linear resolution resolves Ci and Cj only if one of them is a root clause or an ancestor of the other clause.

An example: ∆ = {¬A, C}, {¬C, D}, {A}, {¬C, ¬D}.
(Figure: a linear derivation starting from the root clauses {¬A, C}, {¬C, D}, {A}, {¬C, ¬D}; e.g. resolving {¬A, C} with {A} gives {C}, and the chain continues from each new resolvent.)

Directed Resolution: Forgetting

Directed resolution can be applied to forgetting / projecting. When we do existential quantification on variables P1, P2, . . . , Pm, we:

1. put them in the first m places of the variable order;
2. after processing the first m buckets (P1, P2, . . . , Pm), remove the first m buckets.

SAT Solvers

The SAT solvers we learn in this course are:

• requiring modest space
• foundations of many other things

Along this line there are: SAT I, SAT II, DPLL, and other modern SAT solvers. They can be viewed as optimized searchers over all the worlds ωi, looking for a world satisfying ∆.
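A hedged sketch of such a DPLL-style search (condition on the partial assignment, unit-propagate, otherwise split); the clause encoding follows the earlier sketch and everything here is illustrative, not the course's reference implementation.

```python
# Minimal DPLL-style SAT search: clauses are frozensets of signed integers.

def dpll(clauses, assignment=()):
    # condition the CNF on the current (partial) assignment
    simplified = []
    for c in clauses:
        if any(l in assignment for l in c):
            continue                          # clause already satisfied
        reduced = frozenset(l for l in c if -l not in assignment)
        if not reduced:
            return None                       # empty clause: contradiction
        simplified.append(reduced)
    if not simplified:
        return set(assignment)                # all clauses satisfied: a model
    # unit propagation: a unit clause forces its literal
    unit = next((next(iter(c)) for c in simplified if len(c) == 1), None)
    if unit is not None:
        return dpll(simplified, assignment + (unit,))
    # otherwise split on some literal (here simply the first one we see)
    lit = next(iter(simplified[0]))
    return dpll(simplified, assignment + (lit,)) or dpll(simplified, assignment + (-lit,))

# Example: (A ∨ B) ∧ (¬A ∨ B) ∧ (¬B ∨ C)
print(dpll([frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})]))
```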
OBDD: Variable Orders and Subfunctions

The first order has 3 distinct subfunctions, only 1 of which depends on x4, thus the next layer has 1 node only. For comparison, conditioning on x1, x3, x5 first gives many more distinct subfunctions (partial listing):

x1 x3 x5   subfunction
0  1  1    x4 + x6
1  0  0    x2
1  0  1    x2 + x6
1  1  0    x2 + x4
1  1  1    x2 + x4 + x6

The number of distinct subfunctions is a reliable measurement of the OBDD graph size, and is useful to determine which variable order is better.

Combining OBDDs splits into two cases (a small code sketch follows at the end of this subsection):

• case 1 — f and g have the same top variable x: f ∧ g = x ? (f1 ∧ g1) : (f0 ∧ g0)
• case 2 — assuming x < y, thus x does not appear in g: f ∧ g = x ? (f1 ∧ g) : (f0 ∧ g)

SDDs (Sentential Decision Diagrams)

SDD is the most popular generalization of OBDD. It is also a circuit type.

• Order: needed, and matters
• Unique: when canonical / reduced

VTree

A vtree is a binary tree that denotes the order and the structure of an SDD. Each internal node's left branch refers to the variables of the primes, and its right branch refers to the variables of the subs. Variables are split in the same way in each subfunction.

From OBDD to SDD

OBDD is a special case of SDD with a right-linear vtree (right-linear means that each internal node's left child is a leaf). SDD is a strict superset of OBDD, maintaining the key properties of OBDD (what is called path-width in OBDD is called tree-width in SDD), and can be exponentially smaller than OBDD.
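Below is the promised sketch of the Apply recursion for conjoining two OBDDs, covering the two cases above. It omits the unique table / memoization a real package would use; the tuple-based node encoding is an assumption made purely for illustration.

```python
# Apply (conjoin) for ordered BDDs. A node is either a bool constant
# or a tuple (var, high, low); `order` lists variables from root to leaves.

def conjoin(f, g, order):
    if isinstance(f, bool):
        return g if f else False
    if isinstance(g, bool):
        return f if g else False
    xf, f1, f0 = f
    xg, g1, g0 = g
    if xf == xg:                                   # case 1: same top variable x
        hi, lo = conjoin(f1, g1, order), conjoin(f0, g0, order)
    elif order.index(xf) < order.index(xg):        # case 2: x < y, so x not in g
        hi, lo = conjoin(f1, g, order), conjoin(f0, g, order)
    else:                                          # symmetric: y < x
        return conjoin(g, f, order)
    if hi == lo:                                   # reduction: drop a redundant test
        return hi
    return (xf, hi, lo)

# Example: f = x1 ∧ x2, g = x2
f = ('x1', ('x2', True, False), False)
g = ('x2', True, False)
print(conjoin(f, g, ['x1', 'x2']))   # ('x1', ('x2', True, False), False)
```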
SDD: Compression

An (X, Y)-partition of f,

f(X, Y) = g1(X)h1(Y) + · · · + gn(X)hn(Y)

where the primes gi satisfy
• ∀i, gi ≠ 0
• ∀i ≠ j, gi gj = 0 (mutually exclusive)
• g1 + g2 + · · · + gn = 1 (exhaustive),

is compressed if there are no equal subs, that is, hi ≠ hj for all i ≠ j. Any f has a unique compressed (X, Y)-partition.

Number of vtrees over n variables: n! × Cn−1 = (2(n − 1))! / (n − 1)!, where Cn−1 is the Catalan number. (A vtree is a binary tree; unlike a subtree — the root node plus all descendants of that node — a fragment need not include all descendants of its root.)

Construct an SDD: Example

Following the previous example, using that specific vtree, the SDD we construct looks like: (figure; its elements include the primes/subs C, ¬B, . . . )

Bottom-Up Compilation (OBDD/SDD)

To compile a CNF:
• build the OBDD/SDD for each literal;
• disjoin the literals of each clause;
• conjoin the clauses to obtain the CNF.

SDD, PSDD and Conditional PSDD

These are circuits for learning from Data & Knowledge.
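To make the bottom-up scheme concrete, here is a tiny illustration that replaces OBDD/SDD Apply with explicit sets of models (a deliberate simplification so the example is self-contained and runnable); the variable names and the sample CNF are made up.

```python
# Bottom-up compilation, illustrated with model sets standing in for circuits:
# literals -> disjoin into clauses -> conjoin into the CNF.

from itertools import product

VARS = ['A', 'B', 'C']
WORLDS = [dict(zip(VARS, bits)) for bits in product([False, True], repeat=len(VARS))]

def literal(var, positive=True):
    return {i for i, w in enumerate(WORLDS) if w[var] == positive}

def disjoin(fs):   # OR:  union of models
    return set().union(*fs)

def conjoin(fs):   # AND: intersection of models
    out = set(range(len(WORLDS)))
    for f in fs:
        out &= f
    return out

# CNF: (A ∨ ¬B) ∧ (B ∨ C)
cnf = [[('A', True), ('B', False)], [('B', True), ('C', True)]]
compiled = conjoin(disjoin(literal(v, s) for (v, s) in clause) for clause in cnf)
print(len(compiled), "models out of", len(WORLDS))   # 4 models out of 8
```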
PSDD: Learning Parameters from Data

Starting from the top, trace the high wires (the one high child under each OR-gate, all children under each AND-gate) for each sample. Assign 1 for each sample along the trace, under the OR-gates. Then normalize under each OR-gate (so the parameters sum up to 1).

In this case, the OR-gate input high wires corresponding to ¬L ∧ ¬K ∧ P ∧ ¬A are assigned 0 + 6 = 6. If the same edge gets an assignment ≥ 2 times, sum the counts up (e.g. 6 + 10 = 16).

Conditional PSDDs and PSDD Multiplication

(We can't expect to have κ3 = κ1 × κ2.) The PSDD circuits involved (PSDD1, PSDD2, PSDD3) don't need to be similar at all.

An application: compiling a Bayesian Network into PSDDs, e.g. PSDDall = PSDDA ∗ PSDDB ∗ PSDDC|AB ∗ PSDDD|B . . .

Prime Implicates (PI)

For PI, existential quantification and CE (clausal entailment check) are easy. Prime means a clause/term is not subsumed by any other clause/term (e.g. α is a prime implicate of ∆). Duality: ¬α is then a prime implicant of ¬∆.
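A small helper for the subsumption notion above: it drops every clause that is strictly subsumed by another. This is only one ingredient of computing prime implicates (the full computation also needs closure under resolution); the names are illustrative.

```python
# Keep only clauses that are not subsumed by (i.e. are not supersets of) another clause.

def drop_subsumed(clauses):
    clauses = [frozenset(c) for c in clauses]
    keep = []
    for c in clauses:
        if not any(other < c for other in clauses):   # a strict subset subsumes c
            keep.append(c)
    return set(keep)

# {A}, {A, B}, {B, C}  ->  {A, B} is subsumed by {A}
print(drop_subsumed([{1}, {1, 2}, {2, 3}]))   # {frozenset({1}), frozenset({2, 3})}
```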
Model-Based Diagnosis

In a circuit, on each edge (connecting two gates) there is a signal (high or low), denoted X, Y, A, B, C, . . . (these signals can be directly observed, and an observation α is over them). For each gate (usually numbered 1, 2, . . . ), there is one extra variable (ok1, ok2, . . . ) called a health variable, representing whether or not the gate is functioning correctly. So ∆ contains A, B, C, . . . , ok1, ok2, . . . . Examples:

(Figure: circuit (a) — gates 1 and 2 in a chain with signals A, B, C; circuit (b) — gates 1, 2, 3 with signals A, B, C, D, E.)

∆a = { ok1 ⇒ (A ⇐⇒ ¬B),  ok2 ⇒ (B ⇐⇒ ¬C) }

∆b = { ok1 ⇒ (A ⇐⇒ ¬C),  ok2 ⇒ (B ⇐⇒ ¬D),  ok3 ⇒ ((C ∨ D) ⇐⇒ E) }

Model-based diagnosis figures out the possible situations of the health variables when given ∆ and α (an observation, e.g. αa = C, αb = ¬E, etc.). ∆ here is called a system, and α is a system observation.

For example, in case (a): if ∆ ∧ α ∧ ok1 ∧ ok2 is satisfiable (using a SAT solver) then the health condition ok1 = t, ok2 = t is normal; otherwise it is abnormal. To do diagnosis we collect all the normal assignments of the health variables.

E.g. for example (b) with α = ¬A, ¬B, ¬E, the diagnosis is:

ok1  ok2  ok3   normal?
 ✓    ✓    ✓    no
 ✓    ✓    ✗    yes
 ✓    ✗    ✓    no
 ✓    ✗    ✗    yes
 ✗    ✓    ✓    no
 ✗    ✓    ✗    yes
 ✗    ✗    ✓    yes
 ✗    ✗    ✗    yes

Collecting all the yes rows and simplifying: ¬ok3 ∨ (¬ok1 ∧ ¬ok2).

Health Condition

The health condition of system ∆ given observation α is:

Health(∆, α) = ∃(all variables except the oki) ∆ ∧ α

— the projection of ∆ ∧ α onto the health variables oki. Note: this can be done easily by the bucket (directed) resolution + forgetting we learned before.

Methods of Diagnosis

Based on the health condition Health(∆, α) we can do model-based diagnosis:

• CNF view — conflict: implicate of Health(∆, α); min-conflict: PI (prime implicate) of Health(∆, α)
• DNF view — partial diagnosis: implicant of Health(∆, α); kernel diagnosis: IP (prime implicant) of Health(∆, α)

Minimum Cardinality Diagnosis: turn the health condition Health(∆, α) into a DNNF and then compute the minCard. The path with minimum cardinality corresponds to the solution.
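A brute-force sketch of the diagnosis procedure for example (b): it enumerates health assignments and keeps those consistent with ∆b ∧ α, replacing the SAT solver by exhaustive search over the unobserved signals. The function names are illustrative.

```python
from itertools import product

def system_b(A, B, C, D, E, ok1, ok2, ok3):
    # Δb: ok1 ⇒ (A ⇔ ¬C), ok2 ⇒ (B ⇔ ¬D), ok3 ⇒ ((C ∨ D) ⇔ E)
    return ((not ok1 or (A == (not C))) and
            (not ok2 or (B == (not D))) and
            (not ok3 or ((C or D) == E)))

def consistent(ok1, ok2, ok3, obs):
    # obs fixes some signals; search the remaining signals for a satisfying completion
    for C, D in product([False, True], repeat=2):
        if system_b(ok1=ok1, ok2=ok2, ok3=ok3, C=C, D=D, **obs):
            return True
    return False

obs = dict(A=False, B=False, E=False)            # α = ¬A ∧ ¬B ∧ ¬E
diagnoses = [hs for hs in product([True, False], repeat=3)
             if consistent(*hs, obs)]
print(diagnoses)   # matches the table: e.g. (True, True, True) is absent
```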
Current Topics

• Explaining decisions of ML systems. (see “Why Should I Trust You?”, KDD’16)
• Measuring robustness of decisions.

Readings:

• Three Modern Roles for Logic in AI (PODS’20)
• Human-Level Intelligence or Animal-Like Abilities? (CACM’18)

Explanation (Explaining Decisions)

(Pipeline figure: Data → training → ML System, which is a function such as a BN/NN/RF/…; an Instance is fed to the ML System, which outputs a Decision; the ML System is compiled into a Tractable Circuit for analysis. BN: Bayesian Nets, NN: Neural Nets, RF: Random Forests.)

Classifiers: Review

Function version of a classifier:

f(x1, . . . , xn)

where the xi are called features; all features x1, x2, . . . , xn together form an instance; the output of f is the decision (classification); positive/negative decisions refer to f = 1/0 respectively, and the corresponding instances are called positive/negative instantiations.

• Boolean Classifier: the xi and f have Boolean values
  – Propositional formula as classifier: ω |= ∆ is positive and ω |= ¬∆ is negative.
• Monotone Classifier: a positive instance remains positive if we flip some features from − to +
  – e.g. f(+ − − +) → +  ⇒  f(+ + + +) → +

Minimum cardinality of classifiers: the number of false variables (negative features). Note: computed on DNNF easily. Sometimes the circuit can be minimized by (1) smoothing and (2) pruning edges that aren't helpful to minCard.

• Sub-Circuit: a model; trace down one child of each OR-gate and all children of each AND-gate.

MC Explanations and PI Explanations

MC Explanations (MC: Minimum Cardinality)

• which positive features are responsible for a yes decision? (negative: vice versa)
• computed in linear time on DNNF* (def: both ∆ and ¬∆ are DNNF)
• to answer the question “which positive features”: define cardinality = # positive variables; condition on the negative features observed in the current instance; compute minCard; minimize (kill unhelpful nodes and edges); enumerate (the sub-circuits)

PI Explanations (PI: Prime Implicant)

• which characteristics make the rest irrelevant?
• compute the PIs; the sufficient reasons are all the PI terms.
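A sketch of the minCard computation used above, assuming the circuit is a smooth DNNF encoded as nested tuples (an illustrative encoding): negative literals count 1, positive literals 0, AND-nodes add, OR-nodes take the minimum.

```python
# minCard on a (smooth) DNNF given as nested tuples.

def min_card(node):
    kind = node[0]
    if kind == 'lit':                    # ('lit', var, sign)
        return 0 if node[2] else 1       # count false (negative) variables
    children = [min_card(c) for c in node[1:]]
    return sum(children) if kind == 'and' else min(children)

# (A ∧ ¬B ∧ C) ∨ (¬A ∧ ¬B ∧ C): minimum cardinality is 1
dnnf = ('or',
        ('and', ('lit', 'A', True),  ('lit', 'B', False), ('lit', 'C', True)),
        ('and', ('lit', 'A', False), ('lit', 'B', False), ('lit', 'C', True)))
print(min_card(dnnf))   # 1
```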
Decision and Classifier Bias: Definition

Protected features: features we don't want to influence the classification outcome (e.g. gender, age). A decision is biased if the result changes when we flip the value of a protected feature. A classifier is biased if one of its decisions is biased.

Decision and Classifier Bias: Judgement

Theorem: a decision is biased iff each of its sufficient reasons contains at least one protected feature.

Complete Reason (for a Decision)

The complete reason is the disjunction (∨) of all sufficient reasons. (e.g. α = (E ∧ F ∧ G) ∨ (F ∧ W) — we made the decision because of α.)

Reason Circuit (for a Decision)

The reason circuit is a tractable circuit representation of the complete reason. If the classifier is in a special form (e.g. OBDD, Decision-DNNF), then the reason circuit can be obtained directly in linear time. How:

1. compile the classifier into a circuit, and get a positive instance ready (otherwise work on the negation of the classifier circuit);
2. add consensus: from (¬A ∧ α) ∨ (A ∧ β), derive α ∧ β; add all such α ∧ β terms into the circuit;
3. filtering: go to the branches incompatible with the instance and kill them.

• the reason circuit is thereby monotone (positive features remain positive, negative features remain negative);
• because of monotonicity, existential quantification can be done in linear time.

The reason circuit can be used to handle queries such as: sufficient reasons, necessary properties, necessary reason, because statements, . . .

Reasoning about ML Systems: Overview

Queries               Explanation, Robustness, Verification, etc.
ML Systems            Neural Networks, Graphical Models, Random Forests, etc.
Tractable Circuits    OBDD, SDD, DNNF, etc.

For more: http://reasoning.cs.ucla.edu/xai/

Robustness (for Decision / Classifier)

Hamming distance (between instances): the number of features on which they disagree, denoted d(x1, x2).

Instance robustness:  robustness_f(x) = min over x′ with f(x′) ≠ f(x) of d(x, x′)

Model robustness:  model robustness(f) = (1/2^n) Σ_x robustness_f(x)

Instance robustness is the minimum number of flips needed to change the decision. Model robustness is the average of all instances' robustness (2^n is the number of instances). E.g. odd-parity: the model robustness is 1.
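The two robustness definitions are easy to check by brute force on small classifiers; the sketch below does exactly that and reproduces the odd-parity value of 1. Everything here is illustrative.

```python
from itertools import product

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def instance_robustness(f, x, n):
    # minimum Hamming distance to an instance with a different decision
    others = (y for y in product([0, 1], repeat=n) if f(y) != f(x))
    return min(hamming(x, y) for y in others)

def model_robustness(f, n):
    # average instance robustness over all 2^n instances
    return sum(instance_robustness(f, x, n) for x in product([0, 1], repeat=n)) / 2**n

odd_parity = lambda x: sum(x) % 2        # flipping any single feature flips the decision
print(model_robustness(odd_parity, 3))   # 1.0
```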
Compiling I/O of ML Systems

By compiling the input/output behavior of ML systems, we can analyze classifiers with tractable circuits. From easiest to hardest conceptually: RF, NN, BN. Main challenge: scaling to large ML systems.

Compiling Decision Trees and Random Forests

DT (decision tree): can be translated into multi-valued propositional logic.

(Example figure: a tree testing x ≥ 2 at the root, then y ≥ −7 and x ≥ 6, with leaves 0, 1, 1, 0.) Here x ∈ (−∞, 2) → x = x1, x ∈ [2, 6) → x = x2, x ∈ [6, +∞) → x = x3; y ∈ (−∞, −7) → y = y1, y ∈ [−7, +∞) → y = y2.

RF (random forest): majority voting of many DTs.

Compiling Binary Neural Networks

This is a very recent topic. Binary: the whole NN represents a Boolean function.

• Input to the NN (and to each neuron): Boolean (0/1)
• Step activation function:

  σ(x) = 1 if Σ_i w_i x_i ≥ T, and 0 otherwise

  where the neuron has a threshold T and the inputs from the previous layer are x1, x2, . . . , xi, . . . , with corresponding weights w1, w2, . . . , wi, . . . .

For instance, a neuron that represents 2A + 2B − 3C ≥ 1 can be reduced to a Boolean circuit by conditioning on one input at a time:

(Figure: root 2A + 2B − 3C ≥ 1; under A = 1 / A = 0 it becomes 2B − 3C ≥ −1 / 2B − 3C ≥ 1; conditioning on B yields −3C ≥ −3, −3C ≥ −1, −3C ≥ 1; conditioning on C yields the leaves 0 and 1.)
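The conditioning shown in the figure can be sketched as a small recursion: fix one input at a time, adjust the threshold, and stop once the remaining inputs can no longer change the outcome. The tuple-based node encoding and the function name are assumptions for illustration.

```python
# Reduce a step-activation neuron (Σ w_i x_i ≥ T) to a decision graph.

def neuron_to_circuit(weights, names, T):
    if not weights:                       # no inputs left: threshold is decided
        return 0 >= T
    best = sum(w for w in weights if w > 0)     # best achievable remaining sum
    worst = sum(w for w in weights if w < 0)    # worst achievable remaining sum
    if worst >= T:
        return True                       # always fires, regardless of remaining inputs
    if best < T:
        return False                      # can never fire
    w, v = weights[0], names[0]
    hi = neuron_to_circuit(weights[1:], names[1:], T - w)   # v = 1
    lo = neuron_to_circuit(weights[1:], names[1:], T)       # v = 0
    return (v, hi, lo) if hi != lo else hi

print(neuron_to_circuit([2, 2, -3], ['A', 'B', 'C'], 1))
# ('A', ('B', True, ('C', False, True)), ('B', ('C', False, True), False))
```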
Naïve Bayes Classifier

(Structure: class node C with children E1, . . . , En.)

• Class: C (all Ei depend on C)
• Features: E1, . . . , En (conditionally independent given C)
• Instance: e1, . . . , en = e
• Class posterior (note that Pr(α|β) = Pr(α ∧ β) / Pr(β)):

  Pr(c | e1, . . . , en) = Pr(e1, . . . , en | c) Pr(c) / Pr(e1, . . . , en)
                        = Pr(e1|c) · · · Pr(en|c) Pr(c) / Pr(e1, . . . , en)
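A direct implementation of the class-posterior formula above for a toy naive Bayes model; the numbers are invented purely for illustration.

```python
from math import prod

def posterior(prior, likelihoods, evidence):
    """prior[c] = Pr(c); likelihoods[c][i][e] = Pr(E_i = e | c)."""
    joint = {c: prior[c] * prod(likelihoods[c][i][e] for i, e in enumerate(evidence))
             for c in prior}
    z = sum(joint.values())                       # Pr(e1, ..., en)
    return {c: p / z for c, p in joint.items()}

prior = {'c1': 0.6, 'c2': 0.4}
likelihoods = {'c1': [{0: 0.9, 1: 0.1}, {0: 0.3, 1: 0.7}],
               'c2': [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}]}
print(posterior(prior, likelihoods, (1, 1)))      # Pr(c1 | E1=1, E2=1) ≈ 0.208
```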
Naïve Bayes: CPT

A Bayesian Network has a conditional probability table (CPT) at each of its nodes. E.g., for the previous example, the CPT of node C:

C    ΘC
c1   θc1 (e.g., 0.1)
...
ck   θck (e.g., 0.2)

where ∀i θci ∈ [0, 1] and Σi θci = 1. And at node Ej, the CPT:

C    Ej     ΘEj|C
c1   ej,1   θej,1|c1 (e.g., 0.01)
c1   ej,2   θej,2|c1 (e.g., 0.03)
...
c1   ej,q   θej,q|c1 (e.g., 0.1)
c2   ej,1   θej,1|c2 (e.g., 0.01)
...
ck   ej,q   θej,q|ck (e.g., 0.02)

where ∀i, j, x: θej,x|ci ∈ [0, 1] and ∀i, j: Σx θej,x|ci = 1 — under each condition (each value ci), the probabilities sum to 1. All the Pr(ei|c) are found in the CPT tables.

Compiling a Naive Bayes Classifier

Brute-force method: consider the sub-classifiers ∆|U and ∆|¬U, recursively. Problem: this can have exponential size (in the number of variables). Solution: cache sub-classifiers. Note: the naive Bayes classifier has a threshold T and a prior (e.g., in the previous example, we have the prior of C, and if Pr(C = ci | Ej = ej,x) ≥ T then, for example, the answer is yes, otherwise no). We may have different conditions, with different conditional probabilities, sharing the same sub-classifier.

Odds vs. Probability

Probability: Pr(c) (chance to happen, in [0, 1]).

Odds:  O(c) = Pr(c) / Pr(¬c)

Conditional odds:  O(c|e) = Pr(c|e) / Pr(¬c|e)

Log odds:  log O(c|e) = log [ Pr(c|e) / Pr(¬c|e) ]

In the previous example, if we use log odds instead of probability: Pr(α) ≥ p ⇐⇒ log O(α) ≥ ρ = log [ p / (1 − p) ].

For naive Bayes,

log O(c|e) = log { [ Pr(c) Π_i Pr(ei|c) / Pr(e) ] / [ Pr(¬c) Π_i Pr(ei|¬c) / Pr(e) ] }
           = log O(c) + Σ_i log [ Pr(ei|c) / Pr(ei|¬c) ]
           = log O(c) + Σ_i w_ei

where w_ei is the weight of evidence ei, depending on the instance, and log O(c) is the prior log-odds. Changing the class prior (shifting log O(c)) shifts all log O(c|e) by the same amount.
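A sketch of classification with log odds and weights of evidence, following the formula above; the probabilities are the same invented numbers as in the earlier posterior sketch, and the decision threshold p = 0.5 (i.e. ρ = 0) is an assumption for illustration.

```python
from math import log

def log_odds_posterior(prior_c, likelihood_c, likelihood_not_c, evidence):
    prior_log_odds = log(prior_c / (1 - prior_c))
    weights = [log(likelihood_c[i][e] / likelihood_not_c[i][e])
               for i, e in enumerate(evidence)]          # weights of evidence w_ei
    return prior_log_odds + sum(weights)

likelihood_c     = [{0: 0.9, 1: 0.1}, {0: 0.3, 1: 0.7}]
likelihood_not_c = [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}]
score = log_odds_posterior(0.6, likelihood_c, likelihood_not_c, (1, 1))
print(score >= 0)    # decide "yes" iff log O(c|e) ≥ ρ; with p = 0.5, ρ = 0
```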
Application: Solving MPE & MAR

MPE: most probable explanation
→ NP-complete
→ probabilistic reasoning: find the world with the largest probability
→ solved by weighted MaxSAT
→ compile to DNNF

MAR: marginal probability
→ PP-complete
→ sum of the probabilities of all worlds that satisfy certain conditions
→ solved by WMC (weighted model counting)
→ compile to d-DNNF

Conditional versions: work on a “shrunk table” where some worlds are removed.

Solving MPE via MaxSAT

• Input: weighted CNF α1, . . . , αn (with weights w1, . . . , wn)
  – e.g. (x ∨ ¬y ∨ ¬z)^3, (¬x)^10.1, (y)^0.5, (z)^2.5 — next to the clauses, 3, 10.1, 0.5, 2.5 are the corresponding weights
  – W: the weight of hard clauses, greater than the sum of all soft clauses' weights
• Find the variable assignment with the highest weight / least penalty:

  Wt = weight(x1, . . . , xn) = Σ wi over the clauses αi with x1, . . . , xn |= αi
  Pn = penalty(x1, . . . , xn) = Σ wi over the clauses αi with x1, . . . , xn ⊭ αi
  Wt(x1, . . . , xn) + Pn(x1, . . . , xn) = Ψ (a constant)

Solving MPE via MaxSAT: Example

Given a Bayesian Network A → B with CPTs:

A    θA          A   B    θB|A
a1   0.3         a1  b1   0.2
a2   0.5         a1  b2   0.8
a3   0.2         a2  b1   1
                 a2  b2   0
                 a3  b1   0.6
                 a3  b2   0.4

• Indicator variables:
  – from A (values a1, a2, a3): Ia1, Ia2, Ia3
  – from B (values b1, b2): Ib1, Ib2

• Indicator clauses:

  (Ia1 ∨ Ia2 ∨ Ia3)^W     (Ib1 ∨ Ib2)^W
  (¬Ia1 ∨ ¬Ia2)^W         (¬Ib1 ∨ ¬Ib2)^W
  (¬Ia1 ∨ ¬Ia3)^W
  (¬Ia2 ∨ ¬Ia3)^W

• Parameter clauses (one per CPT row):

  (¬Ia1)^−log(.3)          (¬Ia1 ∨ ¬Ib1)^−log(.2)
  (¬Ia2)^−log(.5)          (¬Ia1 ∨ ¬Ib2)^−log(.8)
  (¬Ia3)^−log(.2)          (¬Ia2 ∨ ¬Ib1)^−log(1)
                           (¬Ia2 ∨ ¬Ib2)^−log(0)
                           (¬Ia3 ∨ ¬Ib1)^−log(.6)
                           (¬Ia3 ∨ ¬Ib2)^−log(.4)

  where we define W = −log(0), so the zero-parameter clause is hard.

• The weighted CNF contains all indicator clauses and parameter clauses.
• Evidence: e.g. A = a1 is asserted by adding (Ia1)^W.

Given a complete instantiation Γ of the indicator variables (i.e., a world), e.g. ¬Ia1, . . . , ¬Ib2:

  Pn(Γ) = Σ over the parameters θx|u compatible with the world of −log θx|u = −log Π θx|u = −log Pr(x)

so minimizing the penalty maximizes the probability of the world.
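A brute-force check of this reduction on the example network A → B: the penalty of a world (the sum of −log θ over the parameter clauses it falsifies) equals −log of its probability, so the least-penalty world is the MPE. The code below enumerates worlds directly rather than calling a MaxSAT solver; all names are illustrative.

```python
from math import log, inf

theta_A = {'a1': 0.3, 'a2': 0.5, 'a3': 0.2}
theta_B_given_A = {('a1', 'b1'): 0.2, ('a1', 'b2'): 0.8,
                   ('a2', 'b1'): 1.0, ('a2', 'b2'): 0.0,
                   ('a3', 'b1'): 0.6, ('a3', 'b2'): 0.4}

def neglog(p):
    return inf if p == 0 else -log(p)

def penalty(a, b):
    # the world (a, b) falsifies exactly the clauses (¬Ia)^{−log θa} and (¬Ia ∨ ¬Ib)^{−log θb|a}
    return neglog(theta_A[a]) + neglog(theta_B_given_A[(a, b)])

worlds = [(a, b) for a in theta_A for b in ('b1', 'b2')]
mpe = min(worlds, key=lambda w: penalty(*w))
print(mpe, theta_A[mpe[0]] * theta_B_given_A[mpe])   # ('a2', 'b1') with probability 0.5
```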
MaxSAT: Solving

Previously we discussed methods of solving MaxSAT problems, such as search. MaxSAT can also be solved by compiling to DNNF and computing the minCard.

An example (unweighted for simplicity):

∆:  A ∨ B (C0),  ¬A ∨ B (C1),  ¬B (C2)

• add selector variables S0, S1, S2:

  ∆′:  A ∨ B ∨ S0,  ¬A ∨ B ∨ S1,  ¬B ∨ S2

  each Si represents whether or not the corresponding clause is selected to be unsatisfiable / thrown away;

• assign weights:

  w(S0) = 1, w(S1) = 1, w(S2) = 1
  w(¬S0) = 0, w(¬S1) = 0, w(¬S2) = 0
  w(A) = w(¬A) = w(B) = w(¬B) = 0

• define cardinality as the number of positive selector variables — computing minCard is then the same as working on the weights;
• compile ∆′ into DNNF (hopefully);
• compute minCard; the optimum minCard = 1 is achieved by S0, ¬S1, ¬S2, ¬A, ¬B; solution: ¬A, ¬B; satisfied clauses: C1, C2.
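A brute-force sketch of the selector-variable construction on this example: over all models of ∆′, the minimum number of selectors set to true is 1, matching the minCard obtained on the compiled DNNF. The encoding is illustrative.

```python
from itertools import product

clauses = [[('A', True), ('B', True)],     # C0: A ∨ B
           [('A', False), ('B', True)],    # C1: ¬A ∨ B
           [('B', False)]]                 # C2: ¬B

def satisfies(world, clause):
    return any(world[v] == sign for v, sign in clause)

best = None
for A, B, S0, S1, S2 in product([False, True], repeat=5):
    world, sel = {'A': A, 'B': B}, [S0, S1, S2]
    # Δ': each clause Ci is augmented with its selector Si
    if all(satisfies(world, c) or s for c, s in zip(clauses, sel)):
        best = sum(sel) if best is None else min(best, sum(sel))
print(best)   # 1  (e.g. S0 true with ¬A, ¬B: only C0 is thrown away)
```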
Factor vs. Distribution

A factor sums to anything; a distribution sums to 1.

Solving MAR via WMC

Pipeline: Bayesian Network → (reduce) → Weighted Boolean Formula → (compile) → Tractable Boolean Circuit.

Solving MAR via WMC: Example

Network A → B, A → C with CPTs:

A    θA        A   B   θB|A       A   C   θC|A
a1   0.1       a1  b1  0.1        a1  c1  0.1
a2   0.9       a1  b2  0.9        a1  c2  0.9
               a2  b1  0.2        a2  c1  0.2
               a2  b2  0.8        a2  c2  0.8

• Indicator variables: Ia1, Ia2, Ib1, Ib2, Ic1, Ic2
• Parameter variables: Pa1, Pa2, Pb1|a1, Pb2|a1, Pb1|a2, Pb2|a2, Pc1|a1, Pc2|a1, Pc1|a2, Pc2|a2
• The I∗ and P∗ are all Boolean variables.
• Indicator clauses:
  A: Ia1 ∨ Ia2,  ¬Ia1 ∨ ¬Ia2
  B: Ib1 ∨ Ib2,  ¬Ib1 ∨ ¬Ib2
  C: Ic1 ∨ Ic2,  ¬Ic1 ∨ ¬Ic2
• Parameter clauses:
  A: Ia1 ⇐⇒ Pa1,  Ia2 ⇐⇒ Pa2
  B: Ia1 ∧ Ib1 ⇐⇒ Pb1|a1,  Ia1 ∧ Ib2 ⇐⇒ Pb2|a1,  Ia2 ∧ Ib1 ⇐⇒ Pb1|a2,  Ia2 ∧ Ib2 ⇐⇒ Pb2|a2
  C: Ia1 ∧ Ic1 ⇐⇒ Pc1|a1,  Ia1 ∧ Ic2 ⇐⇒ Pc2|a1,  Ia2 ∧ Ic1 ⇐⇒ Pc1|a2,  Ia2 ∧ Ic2 ⇐⇒ Pc2|a2

  the general rule is: Iu1 ∧ · · · ∧ Ium ∧ Ix ⇐⇒ Px|u1...um

• Weights are defined as:

  Wt(Ix) = Wt(¬Ix) = Wt(¬Px|u) = 1,   Wt(Px|u) = θx|u

  e.g. Pb2|a2 has weight 0.8.

MAR as WMC: Example with Local Structure

Network A → B, A → C with CPTs (this time denote the values of A as a and ā, etc.):

A   θA       A  B  θB|A       A  C  θC|A
a   0.5      a  b  1          a  c  0.8
ā   0.5      a  b̄  0          a  c̄  0.2
             ā  b  0          ā  c  0.2
             ā  b̄  1          ā  c̄  0.8

First we construct the clauses as before. Local structure: a re-surfacing old concept; here the parameter values matter.

• Zero parameters (logical constraints): e.g. the parameter clause Ia ∧ Ib̄ ⇐⇒ Pb̄|a is replaced by ¬Ia ∨ ¬Ib̄ (that combination is simply impossible).
• One parameters (logical constraints): e.g. Ia ∧ Ib ⇐⇒ Pb|a is dropped (the parameter variable is not needed).
• Equal parameters: e.g. the two 0.8 entries share one parameter variable: (Ia ∧ Ic) ∨ (Iā ∧ Ic̄) ⇐⇒ P1.
• Context-Specific Independence (CSI): independence that holds only when considering some specific worlds.

With local structure considered, the clauses are:

  Ia ∨ Iā        Ib ∨ Ib̄        Ic ∨ Ic̄
  ¬Ia ∨ ¬Iā      ¬Ib ∨ ¬Ib̄      ¬Ic ∨ ¬Ic̄
  ¬Ia ∨ ¬Ib̄      ¬Iā ∨ ¬Ib
  (Ia ∧ Ic) ∨ (Iā ∧ Ic̄) ⇐⇒ P1   (probability 0.8)
  (Ia ∧ Ic̄) ∨ (Iā ∧ Ic) ⇐⇒ P2   (probability 0.2)
  (Ia ∨ Iā) ⇐⇒ P3               (probability 0.5)
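A brute-force weighted model count for part of the first example above (variables A and B only, without local structure): summing Wt over all models of the indicator and parameter constraints yields 1, and adding an evidence clause such as Ib1 would instead yield Pr(b1). All identifiers are illustrative.

```python
from itertools import product

inds   = ['Ia1', 'Ia2', 'Ib1', 'Ib2']
params = {'Pa1': 0.1, 'Pa2': 0.9,
          'Pb1a1': 0.1, 'Pb2a1': 0.9, 'Pb1a2': 0.2, 'Pb2a2': 0.8}

def holds(w):
    return (w['Ia1'] != w['Ia2'] and w['Ib1'] != w['Ib2']        # exactly one value per variable
            and w['Pa1'] == w['Ia1'] and w['Pa2'] == w['Ia2']    # I_a <=> P_a
            and w['Pb1a1'] == (w['Ia1'] and w['Ib1'])            # I_a ∧ I_b <=> P_b|a
            and w['Pb2a1'] == (w['Ia1'] and w['Ib2'])
            and w['Pb1a2'] == (w['Ia2'] and w['Ib1'])
            and w['Pb2a2'] == (w['Ia2'] and w['Ib2']))

def weight(w):
    out = 1.0
    for p, theta in params.items():
        out *= theta if w[p] else 1.0     # Wt(P) = θ; Wt(¬P) = Wt(I) = Wt(¬I) = 1
    return out

names = inds + list(params)
total = sum(weight(dict(zip(names, bits)))
            for bits in product([False, True], repeat=len(names))
            if holds(dict(zip(names, bits))))
print(total)   # ≈ 1.0
```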
Graph Properties: Cutwidth and Pathwidth

(Figure: an example CNF with clauses X ∨ Y, Y ∨ ¬Z, ¬X ∨ Q shown as (c) an incidence graph over variables X, Y, Z, Q and clauses 1, 2, 3, and (d) a hypergraph; an example cut is labeled with cutset {C3} and separator {v1, v3}.)

Cutwidth and pathwidth are both influenced by the variable ordering.

• Cutwidth of a variable order: the size of the largest cutset, e.g. 3 in this case. (The cutset is the set of clauses that cross a cut.)
• Cutwidth of a CNF: the smallest cutwidth attained by any variable order.
• Pathwidth of a variable order: the size of the largest separator, e.g. 3 in this case. (The separator is the set of variables that appear in the clauses of the cutset and lie before the cut, according to the variable ordering.)
• Pathwidth of a CNF: the smallest pathwidth attained by any variable order.

Graph Properties: Treewidth

Treewidth of a graph G: tw(G) is the minimum width among all tree-decompositions of G (see https://en.wikipedia.org/wiki/Treewidth). In many cases, good performance is guaranteed when the treewidth is small.
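A small sketch that computes the cutwidth and pathwidth of a clause set under a given variable order, following the definitions above; the example CNF and all names are made up for illustration.

```python
def cut_and_path_width(clauses, order):
    cutwidth = pathwidth = 0
    for i in range(1, len(order)):
        before, after = set(order[:i]), set(order[i:])
        # cutset: clauses mentioning variables on both sides of the cut
        cutset = [c for c in clauses
                  if {abs(l) for l in c} & before and {abs(l) for l in c} & after]
        # separator: variables of the cutset that lie before the cut
        separator = {abs(l) for c in cutset for l in c} & before
        cutwidth, pathwidth = max(cutwidth, len(cutset)), max(pathwidth, len(separator))
    return cutwidth, pathwidth

# clauses over variables 1..4
cnf = [{1, 2}, {2, -3}, {-1, 4}, {3, 4}]
print(cut_and_path_width(cnf, [1, 2, 3, 4]))   # (2, 2)
```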
Extended Resolution

Extended resolution might reduce proof cost (e.g. pigeonhole: from exponential to polynomial).

Resolution (recall): from X ∨ α and ¬X ∨ β, derive α ∨ β.

Example refutation:

1. {¬A, C}   ∆
2. {¬B, C}   ∆
3. {¬C, D}   ∆
4. {¬D}      ¬α
5. {A}       ¬α
6. {¬C}      3, 4
7. {¬A}      1, 6
8. {}        5, 7

Extension rule (carefully choose the literals ℓi; X must be a new variable, unseen in the CNF):

X ⇐⇒ ℓ1 ∨ ℓ2

which is equivalent to adding the following clauses:

¬X ∨ ℓ1 ∨ ℓ2
X ∨ ¬ℓ1
X ∨ ¬ℓ2

Intuition: resolving on multiple variables all at once.

AC: Conclusions

Two fundamental notions:
1. Arithmetic Circuit (AC): indicator variables, constants, additions, multiplications.
2. Evaluating an AC (at evidence): set an indicator to 1 if its subscript is consistent with the evidence, otherwise 0.

Three fundamental questions: (1) evaluating the factor f(x)? (2) marginals of the factor? (3) MPE of the factor?
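A sketch of evaluating an arithmetic circuit at evidence, as in point 2 above: indicators consistent with the evidence are set to 1, the rest to 0, and the circuit is evaluated bottom-up. The node encoding and the tiny example AC are illustrative.

```python
from math import prod

def evaluate(node, evidence):
    kind = node[0]
    if kind == 'const':                     # ('const', value): a parameter θ
        return node[1]
    if kind == 'ind':                       # ('ind', var, value): an indicator λ
        _, var, value = node
        return 1.0 if evidence.get(var, value) == value else 0.0
    vals = [evaluate(child, evidence) for child in node[1:]]
    return sum(vals) if kind == '+' else prod(vals)

# AC for Pr(A): θ_a · λ_a + θ_ā · λ_ā   with θ_a = 0.3
ac = ('+',
      ('*', ('const', 0.3), ('ind', 'A', True)),
      ('*', ('const', 0.7), ('ind', 'A', False)))
print(evaluate(ac, {}))           # 1.0   (no evidence: all indicators are 1)
print(evaluate(ac, {'A': True}))  # 0.3   (evidence A = true)
```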