
CS264A Automated Reasoning Review Note

2020 Fall By Zhiping (Patricia) Xiao

Notation

variable       x, α, β, ... (a.k.a. propositional variable / Boolean variable)
literal        x, ¬x
conjunction    conjunction of α and β: α ∧ β
disjunction    disjunction of α and β: α ∨ β
negation       negation of α: ¬α
sentence       variables are sentences; negation, conjunction, and disjunction of sentences are sentences
term           conjunction (∧) of literals
clause         disjunction (∨) of literals
normal forms   universal format of all logic sentences (every sentence can be transformed into CNF/DNF)
CNF            conjunctive normal form, conjunction (∧) of clauses (∨)
DNF            disjunctive normal form, disjunction (∨) of terms (∧)
world          ω: truth assignment of all variables (e.g. ω |= α means sentence α holds at world ω)
models         Mods(α) = {ω : ω |= α}

Main Content of CS264A

• Foundations: logic, quantified Boolean logic, SAT solvers, Max-SAT, etc., and compiling knowledge into tractable circuits (the book chapters)
• Application: three modern roles of logic in AI
  1. logic for computation
  2. logic for learning from knowledge / data
  3. logic for meta-learning

Syntax and Semantics of Logic

Logic syntax, "how to express", includes the literal, etc., all the way to the normal forms (CNF/DNF).
Logic semantics, "what does it mean", can be discussed from two perspectives:
• properties: consistency, validity, etc. (of a sentence)
• relationships: equivalence, entailment, mutual exclusiveness, etc. (of sentences)

Useful Equations

α ⇒ β = ¬α ∨ β
α ⇒ β = ¬β ⇒ ¬α
¬(α ∨ β) = ¬α ∧ ¬β
¬(α ∧ β) = ¬α ∨ ¬β
γ ∧ (α ∨ β) = (γ ∧ α) ∨ (γ ∧ β)
γ ∨ (α ∧ β) = (γ ∨ α) ∧ (γ ∨ β)

Models

Listing the 2^n worlds ω_i involving n variables, we have a truth table. If sentence α is true at world ω, written ω |= α, we say:
• sentence α holds at world ω
• ω satisfies α
• ω entails α
otherwise ω ⊭ α.
Mods(α) = {ω : ω |= α} is called the models / meaning of α:
Mods(α ∧ β) = Mods(α) ∩ Mods(β)
Mods(α ∨ β) = Mods(α) ∪ Mods(β)
Mods(¬α) = W \ Mods(α)
ω |= α: world ω entails/satisfies sentence α. α ⊢ β: sentence α derives sentence β.

Semantic Properties

Defining ∅ as the empty set and W as the set of all worlds:
Consistency: α is consistent when Mods(α) ≠ ∅
Validity: α is valid when Mods(α) = W
α is valid iff ¬α is inconsistent. α is consistent iff ¬α is invalid.

Semantic Relationships

Equivalence: α and β are equivalent iff Mods(α) = Mods(β)
Mutually Exclusive: α and β are mutually exclusive iff Mods(α ∧ β) = Mods(α) ∩ Mods(β) = ∅
Exhaustive: α and β are exhaustive iff Mods(α ∨ β) = Mods(α) ∪ Mods(β) = W, that is, when α ∨ β is valid.
Entailment: α entails β (α |= β) iff Mods(α) ⊆ Mods(β). That is, satisfying α is stricter than satisfying β.
Monotonicity: a property of the entailment relation:
• if α implies β, then α ∧ γ implies β;
• if α entails β, then α ∧ γ entails β.
It implies that adding more knowledge to the existing KB (knowledge base) never retracts any conclusion. This is considered a limitation of traditional logic. Proof: Mods(α ∧ γ) ⊆ Mods(α) ⊆ Mods(β).

Quantified Boolean Logic: Notations

Our discussion of quantified Boolean logic centers around conditioning and restriction (|, ∃, ∀). With a propositional sentence ∆ and a variable P:
• condition ∆ on P: ∆|P, i.e. replacing all occurrences of P by true.
• condition ∆ on ¬P: ∆|¬P, i.e. replacing all occurrences of P by false.
Boole's/Shannon's Expansion:
∆ = (P ∧ (∆|P)) ∨ (¬P ∧ (∆|¬P))
It enables solving logic recursively, e.g. DPLL.
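A minimal sketch of conditioning in code (not from the course notes; the clausal-list encoding with signed integers is an assumption):

  # Conditioning a clausal-form CNF on a literal, as used by Shannon's
  # expansion and DPLL. Variables are positive ints; the literal -2 means
  # "variable 2 is false".
  def condition(delta, literal):
      """Return delta | literal: drop satisfied clauses, shrink the rest."""
      result = []
      for clause in delta:
          if literal in clause:
              continue                      # clause satisfied, drop it
          reduced = [l for l in clause if l != -literal]
          result.append(reduced)            # may become the empty clause []
      return result

  # Shannon: Delta = (P and Delta|P) or (not P and Delta|not P), so
  # satisfiability splits into two smaller problems:
  delta = [[1, 2], [-1, 3]]                 # (P1 or P2) and (not P1 or P3)
  print(condition(delta, 1))                # Delta|P1      -> [[3]]
  print(condition(delta, -1))               # Delta|not P1  -> [[2]]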
Existential & Universal Quantification

Existential Quantification:
∃P ∆ = ∆|P ∨ ∆|¬P
Universal Quantification:
∀P ∆ = ∆|P ∧ ∆|¬P
Duality:
∃P ∆ = ¬(∀P ¬∆)
∀P ∆ = ¬(∃P ¬∆)
Quantified Boolean logic is different from first-order logic, for it does not express everything as objects and relations among objects.

Forgetting

The right-hand side of the above-mentioned equation, ∃P ∆ = ∆|P ∨ ∆|¬P, doesn't mention P.
An example: ∆ = {A ⇒ B, B ⇒ C}, then:
∆ = (¬A ∨ B) ∧ (¬B ∨ C)
∆|B = C
∆|¬B = ¬A
∴ ∃B∆ = ∆|B ∨ ∆|¬B = ¬A ∨ C
• ∆ |= ∃P ∆
• If α is a sentence that does not mention P, then ∆ |= α ⇐⇒ ∃P ∆ |= α
We can safely remove P from ∆ when considering existential quantification. This is called:
• forgetting P from ∆
• projecting ∆ on all variables but P

Resolution / Inference Rule

Modus Ponens (MP):
  α, α ⇒ β
  --------
     β
Resolution:
  α ∨ β, ¬β ∨ γ
  -------------
      α ∨ γ
equivalent to:
  ¬α ⇒ β, β ⇒ γ
  -------------
     ¬α ⇒ γ
Above the line are the known conditions; below the line is what can be inferred from them.
In the resolution example, α ∨ γ is called a "resolvent". We can say it any of these ways:
• resolve α ∨ β with ¬β ∨ γ
• resolve over β
• do β-resolution
MP is a special case of resolution: taking α = false in the rule above, the premises become β and ¬β ∨ γ (i.e. β ⇒ γ), and the resolvent is γ.
It is written as:
∆ = {α ∨ β, ¬β ∨ γ} ⊢R α ∨ γ
Applications of resolution rules:
1. existential quantification
2. simplifying the KB (∆)
3. deduction (strategies of resolution, directed resolution)

Completeness of Resolution / Inference Rule

We say rule R is complete iff for all α, if ∆ |= α then ∆ ⊢R α.
In other words, R is complete when it can "discover everything from ∆".
Resolution / inference rule is NOT complete. A counterexample: ∆ = {A, B}, α = A ∨ B (entailed, but not derivable by resolution).
However, when applied to CNF, resolution is refutation complete, which means it is sufficient to discover any inconsistency:
• resolution finds a contradiction on ∆ ∧ ¬α: ∆ |= α
• resolution does not find any contradiction on ∆ ∧ ¬α: ∆ ⊬ α

Clausal Form of CNF

CNF, the Conjunctive Normal Form, is a conjunction of clauses:
∆ = C1 ∧ C2 ∧ ...
written in clausal form as:
∆ = {C1, C2, ...}
where each clause Ci is a disjunction of literals:
Ci = li1 ∨ li2 ∨ li3 ∨ ...
written in clausal form as:
Ci = {li1, li2, li3}
Resolution in clausal form is formalized as:
• Given clauses Ci and Cj where literal P ∈ Ci and literal ¬P ∈ Cj
• The resolvent is (Ci \ {P}) ∪ (Cj \ {¬P}) (Notation: removing set {P} from set Ci is written Ci \ {P})
If the clausal form of a CNF contains an empty clause (∃i, Ci = ∅ = {}), then the CNF is inconsistent / unsatisfiable.

Existential Quantification via Resolution

1. Turn the KB ∆ into CNF.
2. To existentially quantify B, do all B-resolutions.
3. Drop all clauses containing B.

Unit Resolution

Unit resolution is a special case of resolution where min(|Ci|, |Cj|) = 1, where |Ci| denotes the size of set Ci. Unit resolution corresponds to modus ponens (MP). It is NOT refutation complete, but it is efficient: it can be applied in linear time.

Refutation Theorem

∆ |= α iff ∆ ∧ ¬α is inconsistent. (useful in proofs)
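The clausal resolution rule is a one-liner on sets; a sketch, again assuming the signed-integer encoding:

  def resolve(ci, cj, p):
      """Resolvent of clauses ci, cj over variable p (p in ci, -p in cj)."""
      assert p in ci and -p in cj
      return (ci - {p}) | (cj - {-p})

  # resolve {A or B} with {not B or C} over B (A=1, B=2, C=3):
  print(resolve(frozenset({1, 2}), frozenset({3, -2}), 2))  # frozenset({1, 3})
  # An empty resolvent certifies inconsistency (refutation on CNF):
  print(resolve(frozenset({2}), frozenset({-2}), 2))        # frozenset()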
Resolution Strategies: Linear Resolution

All the clauses originally included in the CNF ∆ are root clauses.
Linear resolution resolves Ci and Cj only if one of them is a root clause or an ancestor of the other clause.
An example: ∆ = {¬A, C}, {¬C, D}, {A}, {¬C, ¬D}. One linear derivation: resolve {¬A, C} with {A} to get {C}; resolve {C} with {¬C, D} to get {D}; resolve {D} with {¬C, ¬D} to get {¬C}; resolve {¬C} with {C} to get {}, the empty clause, proving ∆ inconsistent.

Resolution Strategies: Directed Resolution

Directed resolution is based on bucket elimination, and requires pre-defining an order in which to process the variables. The steps are as follows:
1. With n variables, we have n buckets, each corresponding to a variable, listed from top to bottom in order.
2. Fill the clauses into the buckets: scanning top-down, put each clause into the first bucket whose corresponding variable is included in the clause.
3. Process the buckets top-down; whenever we have a P-resolvent Cij, put it into the first following bucket whose corresponding variable is included in Cij.
An example: ∆ = {¬A, C}, {¬C, D}, {A}, {¬C, ¬D}, with variable order A, D, C, initialized as:
A: {¬A, C}, {A}
D: {¬C, D}, {¬C, ¬D}
C:
After processing we find {} ({C} is the A-resolvent, {¬C} is the D-resolvent, {} is a C-resolvent):
A: {¬A, C}, {A}
D: {¬C, D}, {¬C, ¬D}
C: {C}, {¬C}, {}
Directed resolution can also be used to build a decision tree: the P-bucket gives the P nodes.

Directed Resolution: Forgetting

Directed resolution can be applied to forgetting / projecting. When we existentially quantify variables P1, P2, ..., Pm, we:
1. put them in the first m places of the variable order
2. after processing the first m buckets (P1, P2, ..., Pm), remove the first m buckets
3. keep the clauses (original clauses or resolvents) in the remaining buckets
then it is done.

SAT Solvers

The SAT solvers we learn in this course:
• require modest space
• are foundations of many other things
Along the line there are: SAT I, SAT II, DPLL, and other modern SAT solvers.
They can be viewed as optimized searchers over all the worlds ωi, looking for a world satisfying ∆.

SAT I

1. SAT-I (∆, n, d):
2. If d = n:
3.   If ∆ = {}, return {}
4.   If ∆ = {{}}, return FAIL
5. If L = SAT-I(∆|P_{d+1}, n, d+1) ≠ FAIL:
6.   return L ∪ {P_{d+1}}
7. If L = SAT-I(∆|¬P_{d+1}, n, d+1) ≠ FAIL:
8.   return L ∪ {¬P_{d+1}}
9. return FAIL

∆: a CNF, unsat when {} ∈ ∆, satisfied when ∆ = {}
n: number of variables, P1, P2, ..., Pn
d: the depth of the current node
• the root node has depth 0 and decides P1
• nodes at depth n − 1 try Pn
• leaf nodes are at depth n; each represents a world ωi
A typical DFS (depth-first search) algorithm:
• DFS, thus O(n) space requirement (moderate)
• no pruning, thus O(2^n) time complexity

SAT II

1. SAT-II (∆, n, d):
2. If ∆ = {}, return {}
3. If ∆ = {{}}, return FAIL
4. If L = SAT-II(∆|P_{d+1}, n, d+1) ≠ FAIL:
5.   return L ∪ {P_{d+1}}
6. If L = SAT-II(∆|¬P_{d+1}, n, d+1) ≠ FAIL:
7.   return L ∪ {¬P_{d+1}}
8. return FAIL

Mostly SAT I, plus early stopping: the termination tests run at every depth, not only at d = n.

Utility of Using Graphs

Primal Graph: each node represents a variable P. Given CNF ∆, if there is at least one clause C ∈ ∆ such that li, lj ∈ C, then the corresponding nodes Pi and Pj are connected by an edge.
The treewidth w (a property of the graph) can be used to estimate time & space complexity, e.g. the complexity of directed resolution: its space complexity with n variables is O(n exp(w)). For more, see the textbook — the min-fill heuristic.
Decision Tree: can be used for model counting. E.g. ∆ = A ∧ (B ∨ C), where n = 3.
[Figure: decision tree for ∆ — root A (high child to a B node, low child to false); B's high child is true, low child a C node; C's high child is true, low child false. Solid edges are high (true) children, dashed edges low (false) children.]
For counting purposes we assign value 2^n = 2^3 = 8 to the root (A in this case), 2^{n−1} = 4 to the next level (its direct children), etc., and finally sum up the values assigned to all true leaves. Here we have 2 + 1 = 3, so |Mods(∆)| = 3. Constructing the tree:
• if a branch is inconsistent, put false there.
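A direct transcription of SAT-II into Python (a sketch under the same assumed encoding; the conditioning helper is repeated to keep it self-contained):

  def condition(delta, lit):
      # same conditioning helper as in the earlier sketch
      return [[l for l in c if l != -lit] for c in delta if lit not in c]

  def sat2(delta, n, d):
      """SAT-II: DFS with early stopping; returns a partial model or None."""
      if not delta:
          return set()                      # Delta = {}: satisfied
      if [] in delta:
          return None                       # Delta contains {}: FAIL
      if d == n:
          return None                       # defensive: all variables tried
      p = d + 1                             # variable decided at this depth
      for lit in (p, -p):
          sub = sat2(condition(delta, lit), n, d + 1)
          if sub is not None:
              return sub | {lit}
      return None

  print(sat2([[1, 2], [-1, 3]], 3, 0))      # e.g. {1, 2, 3}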
Termination Tree

A termination tree is a sub-tree of the complete search space (which is a depth-n complete binary tree), including only the nodes visited while running the algorithm.
When drawing the termination tree of SAT I and SAT II, we put a cross (X) on the failed nodes, with the {{}} label next to them. Keep going until we find an answer — where ∆ = {}.

Unit-Resolution

1. Unit-Resolution (∆):
2. I = unit clauses in ∆
3. If I = {}: return (I, ∆)
4. Γ = ∆|I
5. If Γ = ∆: return (I, Γ)
6. return Unit-Resolution(Γ)

Used in DPLL, at each node.

DPLL

01. DPLL (∆):
02. (I, Γ) = Unit-Resolution(∆)
03. If Γ = {}, return I
04. If {} ∈ Γ, return FAIL
05. choose a literal l in Γ
06. If L = DPLL(Γ ∪ {{l}}) ≠ FAIL:
07.   return L ∪ I
08. If L = DPLL(Γ ∪ {{¬l}}) ≠ FAIL:
09.   return L ∪ I
10. return FAIL

Mostly SAT II, plus unit resolution. Unit-Resolution is used at each node to look for entailed values, saving search steps.
If any implication is made by Unit-Resolution, we write down the values next to the node where the implication is made (e.g. A = t, B = f, ...).
This is NOT a standard DFS: the Unit-Resolution component makes the search flexible.

Non-chronological Backtracking

Chronological backtracking: when we find a contradiction/FAIL while searching, backtrack to the parent.
Non-chronological backtracking is an optimization where we jump back to earlier nodes, a.k.a. conflict-directed backtracking.

Implication Graphs

An implication graph is used to find more clauses to add to the KB, so as to empower the algorithm.
An example implication graph, upon the first conflict found when running DPLL+ for:
∆ = 1.{A,B}  2.{B,C}  3.{¬A,¬X,Y}  4.{¬A,X,Y}  5.{¬A,¬Y,Z}  6.{¬A,X,¬Z}  7.{¬A,¬Y,¬Z}
[Figure: decisions 0/A=t, 1/B=t, 2/C=t, 3/X=t; rule 3 implies 3/Y=t; rule 5 implies 3/Z=t; rules 7 (and 5) lead to the conflict 3/{}.]
The decision and implication assignments of variables are labeled by the depth (decision level) at which each value is determined.
The edges are labeled by the ID of the corresponding rule in ∆ that is used to generate a unit clause (make an implication).

Implication Graphs: Cuts

Cuts in an implication graph can be used to identify the conflict sets. Still following the previous example:
[Figure: Cut#1 separates {A=t, X=t}; Cut#2 separates {A=t, Y=t}; Cut#3 separates {A=t, Y=t, Z=t} from the conflict.]
Here Cut#1 results in the learned clause {¬A, ¬X}, Cut#2 in the learned clause {¬A, ¬Y}, and Cut#3 in the learned clause {¬A, ¬Y, ¬Z}.

Asserting Clause & Assertion Level

Asserting Clause: includes only one variable at the last (highest) decision level. (The last decision level means the level where the last decision/implication is made.)
Assertion Level (AL): the second-highest level in the clause. (Note: 3 is higher than 0.)
An example (following the previous example, on the learned clauses):

Clause          Decision Levels  Asserting?  AL
{¬A, ¬X}        {0, 3}           Yes         0
{¬A, ¬Y}        {0, 3}           Yes         0
{¬A, ¬Y, ¬Z}    {0, 3, 3}        No          0

UIP (Unique Implication Point)

A variable that is set on every path from the last decision level to the contradiction.
The first UIP is the one closest to the contradiction.
For example, in the previous example, the last UIP is 3/X = t, while the first UIP is 3/Y = t.

DPLL+

01. DPLL+ (∆):
02. D ← ()
03. Γ ← {}
04. While true Do:
05.   (I, L) = Unit-Resolution(∆ ∧ Γ ∧ D)
06.   If {} ∈ L:
07.     If D = (): return false
08.     Else (backtrack to assertion level):
09.       α ← asserting clause
10.       m ← AL(α)
11.       D ← first m + 1 decisions in D
12.       Γ ← Γ ∪ {α}
13.   Else:
14.     find ℓ where {ℓ} ∉ I and {¬ℓ} ∉ I
15.     If an ℓ is found: D ← D; ℓ
16.     Else: return true

Returns true if the CNF ∆ is satisfiable, otherwise false.
Γ is the learned clauses; D is the decision sequence.
Idea: backtrack to the assertion level, add the conflict-driven clause to the knowledge base, apply unit resolution.
Selecting α: find the first UIP.
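A sketch of the Unit-Resolution routine above, under the same assumed clausal encoding (one unit processed per pass):

  def unit_resolution(delta):
      """(I, Gamma) = closure of conditioning Delta on its unit clauses."""
      implied, gamma = set(), [list(c) for c in delta]
      while True:
          unit = next((c[0] for c in gamma if len(c) == 1), None)
          if unit is None:
              return implied, gamma         # caller checks [] in gamma
          implied.add(unit)
          # condition gamma on the unit literal
          gamma = [[l for l in c if l != -unit] for c in gamma if unit not in c]

  delta = [[-1, 2], [-2, 3], [1]]           # A => B, B => C, A
  print(unit_resolution(delta))             # ({1, 2, 3}, [])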
Exhaustive DPLL

Exhaustive DPLL: DPLL that doesn't stop when finding a solution; it keeps going until it has explored the whole search space.
It is useful for model counting.
However, recall that DPLL is based on the fact that ∆ is satisfiable iff ∆|P is satisfiable or ∆|¬P is satisfiable, which implies that we do not have to test both branches to determine satisfiability.
Therefore, we have a smarter algorithm for model counting using DPLL: CDPLL.

CDPLL

1. CDPLL (Γ, n):
2. If Γ = {}: return 2^n
3. If {} ∈ Γ: return 0
4. choose a literal l in Γ
5. (I+, Γ+) = Unit-Resolution(Γ ∪ {{l}})
6. (I−, Γ−) = Unit-Resolution(Γ ∪ {{¬l}})
7. return CDPLL(Γ+, n − |I+|) + CDPLL(Γ−, n − |I−|)

n is the number of variables; it is essential when counting the models.
An example of the termination tree, for ∆ = 1.{¬A, B}, 2.{¬B, C} with n = 3:
[Figure: root decides A. On A = t, unit resolution implies B = t (clause 1) and C = t (clause 2), giving count 1. On A = f, decide B: on B = t, clause 2 implies C = t (count 1); on B = f, Γ = {} with C still free (count 2). Total: 1 + 1 + 2 = 4 models.]

Certifying UNSAT: Method #1

When a query is satisfiable, we have an answer to certify.
However, when it is unsatisfiable, we also want to validate this conclusion.
One method is verifying UNSAT directly, e.g. from implication graphs. Example (∆ from the implication-graph example):

level  assignment  reason
0      A           (decision)
1      B           (decision)
2      C           (decision)
3      X           (decision)
3      Y           ¬A ∨ ¬X ∨ Y
3      Z           ¬A ∨ ¬Y ∨ Z

And then the learned clause ¬A ∨ ¬Y is applied. The learned clause is asserting with AL = 0, so we add ¬Y to level 0, right after A, then keep going from ¬Y.

Certifying UNSAT: Method #2

Verify that the Γ generated by the SAT solver after running on ∆ is correct:
• Will ∆ ∪ Γ produce an inconsistency?
  – Can use Unit-Resolution to check.
• Does CNF Γ = {α1, α2, ..., αn} come from ∆?
  – ∆ ∧ ¬αi is inconsistent for all clauses αi.
  – Can use Unit-Resolution to check.
Why Unit-Resolution is enough: the {αi} are generated from cuts in an implication graph, and the implication graph is built upon conflicts found by Unit-Resolution. Therefore, the conflicts can be detected by Unit-Resolution.

UNSAT Cores

For CNF ∆ = {α1, α2, ..., αn}, an UNSAT core is any subset consisting of some αi ∈ ∆ that are inconsistent together. There exists at least one UNSAT core iff ∆ is UNSAT.
A minimal UNSAT core is an UNSAT core of ∆ such that, if we remove any clause from it, the remaining clauses become consistent together.

More on SAT

• Can a SAT solver be faster than linear time?
  – 2-literal watching (in the textbook)
• The "phase-selection" / variable-ordering problem (including the decision on trying P or ¬P first)?
  – An efficient and simple way: "try the phase you've tried before." This is because of the way modern SAT solvers work (cache, etc.).

SAT using Local Search

The general idea is to start from a random guess of the world ω; if UNSAT, move to another world by flipping one variable in ω (P to ¬P, or ¬P to P).
• Random CNF: n variables, m clauses. When m/n is extremely small or large, it is easier to randomly generate a satisfying world or certify failure: when m/n → 0 it is almost always SAT; m/n → ∞ makes it almost always UNSAT. In practice, the split point is m/n ≈ 4.24.
  Two ideas to generate random clauses:
  – 1st idea: variable-length clauses
  – 2nd idea: fixed-length clauses (k-SAT, e.g. 3-SAT)
• Strategy for taking a move:
  – Use a cost function to determine the quality of a world.
    * Simplest cost function: the number of unsatisfied clauses.
    * A lot of variations.
    * Intend to move in a lower-cost direction ("hill climbing").
  – Termination criterion: no neighbor is better (smaller cost) than the current world. (A local, not necessarily global, optimum.)
  – Avoid local optima: randomly restart multiple times.
• Algorithms (see the sketch below):
  – GSAT: hill climbing + side moves (moving to neighbors whose cost equals that of ω)
  – WALKSAT: iterative repair
    * randomly pick an unsatisfied clause
    * pick a variable within that clause to flip, such that flipping it will make the fewest previously satisfied clauses become unsatisfied, then flip it
  – Combination of logic and randomness:
    * randomly select a neighbor; if better than the current node, move; otherwise move with a probability (determined by how much worse it is)
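A WALKSAT-style sketch (assumed encoding as before; the "fewest newly broken clauses" heuristic is approximated here by the total cost after a flip):

  import random

  def walksat(clauses, n, max_flips=10_000, p_noise=0.5):
      """Local search: flip one variable per step on a random initial world."""
      world = {v: random.choice([True, False]) for v in range(1, n + 1)}
      sat = lambda c: any(world[abs(l)] == (l > 0) for l in c)
      for _ in range(max_flips):
          unsat = [c for c in clauses if not sat(c)]
          if not unsat:
              return world                  # all clauses satisfied
          clause = random.choice(unsat)     # randomly pick an unsatisfied clause
          def cost_after_flip(v):
              world[v] = not world[v]
              cost = sum(1 for c in clauses if not sat(c))
              world[v] = not world[v]
              return cost
          if random.random() < p_noise:     # random-walk step
              v = abs(random.choice(clause))
          else:                             # greedy step
              v = min((abs(l) for l in clause), key=cost_after_flip)
          world[v] = not world[v]
      return None                           # give up; restart in practice

  print(walksat([[1, 2], [-1, 2], [-2, 3]], 3))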
Max-SAT

Max-SAT is an optimization version of SAT; in other words, a Max-SAT solver is an optimizing SAT solver.
Goal: find the assignment of variables that maximizes the number of satisfied clauses in a CNF ∆. (We can easily come up with other variations, such as Min-SAT, etc.)
• We assign a weight to each clause as the score of satisfying it / cost of violating it.
• We maximize the score. (This is only one way of solving the problem; we can also minimize the cost. Note: score is different from cost.)
Solving Max-SAT problems generally goes in three directions:
• Local Search
• Systematic Search (branch and bound, etc.)
• Max-SAT Resolution

Max-SAT Example

We have images I1, I2, I3, I4, with weights (importance) 5, 4, 3, 6 respectively, knowing: (1) I1, I4 can't be taken together; (2) I2, I4 can't be taken together; (3) I1, I2 if overlapping then discount by 2; (4) I1, I3 if overlapping then discount by 1; (5) I2, I3 if overlapping then discount by 1.
Then we have the knowledge base ∆ as:
(I1, 5)
(I2, 4)
(I3, 3)
(I4, 6)
(¬I1 ∨ ¬I2, 2)
(¬I1 ∨ ¬I3, 1)
(¬I2 ∨ ¬I3, 1)
(¬I1 ∨ ¬I4, ∞)
(¬I2 ∨ ¬I4, ∞)
To simplify the example we look at I1 and I2 (and their unit clauses) only:

I1  I2  score  cost
t   t   9      0
t   f   5      4
f   t   4      5
f   f   0      9

In practice we list the truth table of I1 through I4 (2^4 = 16 worlds).

Max-SAT Resolution

In Max-SAT, in order to keep the same cost/score before and after resolution, we:
• abandon the resolved clauses;
• add compensation clauses.
Consider the following two clauses to resolve, abbreviating c1 = ℓ1 ∨ ℓ2 ∨ ... ∨ ℓm and c2 = o1 ∨ o2 ∨ ... ∨ on:
x ∨ ℓ1 ∨ ℓ2 ∨ ... ∨ ℓm
¬x ∨ o1 ∨ o2 ∨ ... ∨ on
The results are the resolvent c1 ∨ c2, and the compensation clauses:
x ∨ c1 ∨ ¬o1
x ∨ c1 ∨ o1 ∨ ¬o2
...
x ∨ c1 ∨ o1 ∨ o2 ∨ ... ∨ ¬on
¬x ∨ c2 ∨ ¬ℓ1
¬x ∨ c2 ∨ ℓ1 ∨ ¬ℓ2
...
¬x ∨ c2 ∨ ℓ1 ∨ ℓ2 ∨ ... ∨ ¬ℓm

Directed Max-SAT Resolution

1. Pick an order of the variables, say x1, x2, ..., xn.
2. For each xi, exhaust all possible Max-SAT resolutions, then move on to x_{i+1}.
When resolving xi, use only the clauses that do not mention any xj, j < i.
Resolve two clauses on xi only when there isn't an xj ≠ xi such that xj and ¬xj belong to the two clauses respectively. (Formally: the two clauses do not contain complementary literals on any xj ≠ xi.)
Ignore resolvents and compensation clauses when they have appeared before — as original clauses, resolvents, or compensation clauses.
In the end, there remain k copies of false (conflicts), and Γ (guaranteed to be satisfiable). k is the minimum cost; each world satisfying Γ achieves this cost.

Directed Max-SAT Resolution: Example

∆ = (¬a ∨ c) ∧ (a) ∧ (¬a ∨ b) ∧ (¬b ∨ ¬c), variable order: a, b, c.
First resolve on a: resolving (¬a ∨ c) with (a) gives the resolvent (c) and the compensation clause (a ∨ ¬c); resolving (a ∨ ¬c) with (¬a ∨ b) gives the resolvent (b ∨ ¬c) and the compensation clauses (¬a ∨ b ∨ c) and (a ∨ ¬c ∨ ¬b).
Then resolve on b: resolving (b ∨ ¬c) with (¬b ∨ ¬c) gives the resolvent (¬c) (the compensation clauses are tautologies and are dropped).
Finally, resolve on c: resolving (c) with (¬c) gives false.
The final output is:
{ false, (¬a ∨ b ∨ c), (a ∨ ¬b ∨ ¬c) }
where Γ = (¬a ∨ b ∨ c) ∧ (a ∨ ¬b ∨ ¬c) and k = 1, indicating that there must be at least one clause in ∆ that is not satisfiable.

Beyond NP

Some problems, even those harder than NP problems, can be reduced to logical reasoning.
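Back to the image-selection example above — a brute-force sketch that treats ∞-weight clauses as hard constraints and minimizes the total cost of violated clauses:

  from itertools import product

  def maxsat_brute_force(weighted_clauses, n):
      """Enumerate all 2^n worlds; return (least cost, best world).
      Cost = total weight of violated clauses; inf marks hard constraints."""
      best_cost, best_world = float("inf"), None
      for bits in product([False, True], repeat=n):
          world = {v + 1: bits[v] for v in range(n)}
          cost = sum(w for clause, w in weighted_clauses
                     if not any(world[abs(l)] == (l > 0) for l in clause))
          if cost < best_cost:
              best_cost, best_world = cost, world
      return best_cost, best_world

  # I1..I4 -> variables 1..4, weights as in the example
  delta = [([1], 5), ([2], 4), ([3], 3), ([4], 6),
           ([-1, -2], 2), ([-1, -3], 1), ([-2, -3], 1),
           ([-1, -4], float("inf")), ([-2, -4], float("inf"))]
  print(maxsat_brute_force(delta, 4))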
Complexity Classes

[Figure: a hierarchy of complexity classes with example complete problems — PP^PP: SDP; NP^PP: MAP; PP: MAR; NP: MPE.]

abbr.  meaning
SDP    Same-Decision Probability
MAP    Maximum A Posteriori hypothesis
MAR    MArginal probabilities
MPE    Most Probable Explanation

A complete problem means that it is one of the hardest problems of its complexity class; e.g. NP-complete: among all NP problems, there is no problem harder than it.
Our goal: reduce complete problems to prototypical problems (Boolean formulas), then transform them into tractable Boolean circuits.

Prototypical Problems

[Figure: the prototypical problems per class — PP^PP: MAJ-MAJ-SAT; NP^PP: E-MAJ-SAT; PP: MAJ-SAT; NP: SAT.]

abbr.        meaning
SAT          satisfiability
MAJ-SAT      majority-instantiation satisfiability
E-MAJ-SAT    with an (X, Y)-split of the variables, there exists an X-instantiation that satisfies the majority of Y-instantiations
MAJ-MAJ-SAT  with an (X, Y)-split of the variables, the majority of X-instantiations satisfy the majority of Y-instantiations

Again, these are all complete problems.

Bayesian Network to MAJ-SAT Problem

Problems reduced to MAJ-SAT include:
• the #SAT problem (model counting)
• the WMC problem (weighted model counting)
Consider the WMC (weighted model counting) problem, e.g. with three variables A, B, C; the weight of world A = t, B = t, C = f should be:
w(A, B, ¬C) = w(A) w(B) w(¬C)
Typically, in a Bayesian network where both B and C depend on A:
[Figure: a network with root A and children B and C.]
we therefore have:
Pr(A = t, B = t, C = t) = θ_A θ_{B|A} θ_{C|A}
where Θ = {θ_A, θ_¬A} ∪ {θ_{B|A}, θ_{¬B|A}, θ_{B|¬A}, θ_{¬B|¬A}} ∪ {θ_{C|A}, θ_{¬C|A}, θ_{C|¬A}, θ_{¬C|¬A}} are the parameters within the Bayesian network at nodes A, B, C respectively, indicating the probabilities.
Though slightly more complex than treating each variable equally, by working on Θ we can safely reduce any Bayesian network to a MAJ-SAT problem.

NNF (Negation Normal Form)

NNF is the form of Tractable Boolean Circuit we are specifically interested in.
In an NNF, leaf nodes are true, false, P, or ¬P; internal nodes are either and or or, indicating an operation on all their children.
We draw an NNF as if it is made up of logic; from a circuit perspective, it is made up of gates.
[Figure: and, or, and not gate symbols.]
NNF Properties

Property         On Whom    Satisfying NNF
Decomposability  and-nodes  DNNF
Determinism      or-nodes   d-NNF
Smoothness       or-nodes   s-NNF
Flatness         whole NNF  f-NNF
Decision         or-nodes   BDD (FBDD)
Ordering         each node  OBDD

Decomposability: for any and-node, any pair of its children must be over disjoint variable sets. (e.g. one child A ∨ B, the other C ∨ D)
Determinism: for any or-node, any pair of its children must be mutually exclusive. (e.g. one child A ∧ B, the other ¬A ∧ B)
Smoothness: for any or-node, any pair of its children must be over the same variable set. (e.g. one child A ∧ B, the other ¬A ∧ ¬B)
Flatness: the height of each sentence is at most 2 (depth 0, 1, 2 only). (sentence: from the root — select one child when seeing or, all children when seeing and — all the way to the leaves / literals; e.g. CNF, DNF)
Decision: a decision node N can be true, false, or an or-node (X ∧ α) ∨ (¬X ∧ β) (X: variable; α, β: decision nodes; N decides on dVar(N) = X).
Ordering: makes no sense unless decision holds (FBDD); variables are decided following a fixed order.

Variations of NNF

Acronym   Description
NNF       Negation Normal Form
d-NNF     Deterministic Negation Normal Form
s-NNF     Smooth Negation Normal Form
f-NNF     Flat Negation Normal Form
DNNF      Decomposable Negation Normal Form
d-DNNF    Deterministic Decomposable Negation Normal Form
sd-DNNF   Smooth Deterministic Decomposable Negation Normal Form
BDD       Binary Decision Diagram
FBDD      Free Binary Decision Diagram
OBDD      Ordered Binary Decision Diagram
OBDD<     Ordered Binary Decision Diagram (using order <)
DNF       Disjunctive Normal Form
CNF       Conjunctive Normal Form
PI        Prime Implicates
IP        Prime Implicants
MODS      Models

FBDD: the intersection of DNNF and BDD.
OBDD<: if N and M are or-nodes, and N is an ancestor of M, then dVar(N) < dVar(M).
OBDD: the union of all OBDD< languages. In this course we always use OBDD to refer to OBDD<.
MODS: the subset of DNF where every sentence satisfies determinism and smoothness.
PI: subset of CNF; each clause entailed by ∆ is subsumed by an existing clause, and no clause in the sentence ∆ is subsumed by another.
IP: dual of PI; subset of DNF; each term entailing ∆ subsumes some existing term, and no term in the sentence ∆ is subsumed by another.

[Figure: the NNF language lattice — NNF at the top; beneath it d-NNF, s-NNF, DNNF, f-NNF; then BDD and d-DNNF; then FBDD, sd-DNNF, DNF, CNF; then OBDD; with OBDD<, MODS, IP, PI at the bottom. Edges are annotated with the queries gained, e.g. CO/CE/ME for DNNF; VA/IM/CT for d-DNNF; EQ? for d-DNNF and FBDD; EQ for OBDD<; SE for PI, IP, MODS.]

NNF Queries

Abbr.  Spelled name                description
CO     consistency check           SAT(∆)
VA     validity check              ¬SAT(¬∆)
SE     sentence entailment check   ∆1 |= ∆2
CE     clausal entailment check    ∆ |= clause α
IM     implicant testing           term ℓ |= ∆
EQ     equivalence testing         ∆1 ≡ ∆2
CT     model counting              |Mods(∆)|
ME     model enumeration           enumerate ω ∈ Mods(∆)

Our goal is to get the above-listed queries done on our circuit within polytime.
Besides, we also seek polytime transformations: Projection (existential quantification), Conditioning, Conjoin, Disjoin, Negate, etc.

The Capability of NNFs on Queries

          CO  VA  CE  IM  EQ  SE  CT  ME
NNF       ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
d-NNF     ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
s-NNF     ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
f-NNF     ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
DNNF      ✓   ◦   ✓   ◦   ◦   ◦   ◦   ✓
d-DNNF    ✓   ✓   ✓   ✓   ?   ◦   ✓   ✓
FBDD      ✓   ✓   ✓   ✓   ?   ◦   ✓   ✓
OBDD      ✓   ✓   ✓   ✓   ✓   ◦   ✓   ✓
OBDD<     ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓
BDD       ◦   ◦   ◦   ◦   ◦   ◦   ◦   ◦
sd-DNNF   ✓   ✓   ✓   ✓   ?   ◦   ✓   ✓
DNF       ✓   ◦   ✓   ◦   ◦   ◦   ◦   ✓
CNF       ◦   ✓   ◦   ✓   ◦   ◦   ◦   ◦
PI        ✓   ✓   ✓   ✓   ✓   ✓   ◦   ✓
IP        ✓   ✓   ✓   ✓   ✓   ✓   ◦   ✓
MODS      ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓

NNF Transformations

notation  transformation        description
CD        conditioning          ∆|P
FO        forgetting            ∃P, Q, ... ∆
SFO       singleton forgetting  ∃P.∆
∧C        conjunction           ∆1 ∧ ∆2
∧BC       bounded conjunction   ∆1 ∧ ∆2
∨C        disjunction           ∆1 ∨ ∆2
∨BC       bounded disjunction   ∆1 ∨ ∆2
¬C        negation              ¬∆

Our goal is to transform in polytime while still keeping the properties (e.g. a DNNF stays a DNNF).
Bounded conjunction / disjunction: the KB ∆ is bounded on the conjunction / disjunction operation; that is, taking any two formulas from ∆, their conjunction / disjunction also belongs to ∆.

The Capability of NNFs on Transformations

          CD  FO  SFO ∧C  ∧BC ∨C  ∨BC ¬C
NNF       ✓   ◦   ✓   ✓   ✓   ✓   ✓   ✓
d-NNF     ✓   ◦   ✓   ✓   ✓   ✓   ✓   ✓
s-NNF     ✓   ◦   ✓   ✓   ✓   ✓   ✓   ✓
f-NNF     ✓   ◦   ✓   ✗   ✗   ✗   ✗   ✓
DNNF      ✓   ✓   ✓   ◦   ◦   ✓   ✓   ◦
d-DNNF    ✓   ◦   ◦   ◦   ◦   ◦   ◦   ?
FBDD      ✓   ✗   ◦   ✗   ◦   ✗   ◦   ✓
OBDD      ✓   ✗   ✓   ✗   ◦   ✗   ◦   ✓
OBDD<     ✓   ✗   ✓   ✗   ✓   ✗   ✓   ✓
BDD       ✓   ✓   ✓   ✓   ✓   ✓   ✓   ✓
sd-DNNF   ✓   ✓   ✓   ✓   ?   ◦   ✓   ✓
DNF       ✓   ✓   ✓   ✗   ✓   ✓   ✓   ✗
CNF       ✓   ◦   ✓   ✓   ✓   ✗   ✓   ✗
PI        ✓   ✓   ✓   ✗   ✗   ✗   ✓   ✗
IP        ✓   ✗   ✗   ✗   ✓   ✗   ✗   ✗
MODS      ✓   ✓   ✓   ✗   ✓   ✗   ✗   ✗

✓: can be done in polytime
◦: cannot be done in polytime unless P = NP
✗: cannot be done in polytime even if P = NP
?: remains unclear (no proof yet)

DNNF

CO: check consistency in polytime, because:
SAT(A ∨ B) = SAT(A) ∨ SAT(B)
SAT(A ∧ B) = SAT(A) ∧ SAT(B)  // DNNF only
SAT(X) = true
SAT(¬X) = true
SAT(true) = true
SAT(false) = false

CE: clausal entailment; check ∆ |= α (α = ℓ1 ∨ ℓ2 ∨ ... ∨ ℓn) by checking the consistency of:
∆ ∧ ¬ℓ1 ∧ ¬ℓ2 ∧ ... ∧ ¬ℓn
constructing a new NNF with the NNF of ∆ and the NNF of ¬α as direct children of a root and-node.
When a variable P appears in both α and ∆, the new NNF is not a DNNF. We fix this by conditioning ∆'s NNF on P or ¬P, depending on whether P or ¬P appears in α. (∆ → (¬P ∧ ∆|¬P) ∨ (P ∧ ∆|P); if P is in α, then ¬P is in ¬α, so we use ∆|¬P.)
Interestingly, this transformation might turn a non-DNNF (troubled by P) into a DNNF.

CD: conditioning; ∆|A replaces all A in the NNF with true and ¬A with false. For ∆|¬A, vice versa.

ME: model enumeration; CO + CD → ME: we keep checking ∆|X, ∆|¬X, etc.

DNNF: Projection / Existential Quantification

Recall: ∆ = A ⇒ B, B ⇒ C, C ⇒ D; existentially quantifying B, C is the same as forgetting B, C, in other words projecting onto A, D.
In a DNNF, we existentially quantify {Xi}, i ∈ S (S a selected set) by:
• replacing all occurrences of Xi (both positive and negative, Xi and ¬Xi) in the DNNF with true (note: the result is still a DNNF);
• checking whether the resulting circuit is consistent.
This can be done on DNNF because:
∃X.(α ∨ β) = (∃X.α) ∨ (∃X.β)
∃X.(α ∧ β) = (∃X.α) ∧ (∃X.β)  // DNNF only
In a DNNF, ∃X.(α ∧ β) is α ∧ (∃X.β) or (∃X.α) ∧ β, since by decomposability X appears in only one child.
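A sketch of CO and FO on a tuple-encoded NNF (the nested-tuple encoding is an assumption; the and-case of sat is only sound when the circuit is a DNNF):

  def sat(node):
      """Consistency check; the 'and' case is only sound for DNNF."""
      if node is True or node is False:
          return node
      if isinstance(node, int):
          return True                       # a literal is always satisfiable
      op, *kids = node
      results = [sat(k) for k in kids]
      return any(results) if op == "or" else all(results)

  def forget(node, xs):
      """Existentially quantify the variables xs: replace their literals by true."""
      if node is True or node is False:
          return node
      if isinstance(node, int):
          return True if abs(node) in xs else node
      op, *kids = node
      return (op, *[forget(k, xs) for k in kids])

  # (A and B) or (not A and C); forgetting A keeps the circuit a DNNF.
  delta = ("or", ("and", 1, 2), ("and", -1, 3))
  print(sat(delta), sat(forget(delta, {1})))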
Minimum Cardinality

Cardinality: in our case, by default, defined as the number of false values in an assignment (in a world, how many variables' truth values are false). We seek its minimum. (It could just as easily be defined otherwise, e.g. as the number of true values, seeking its maximum.)
minCard(X) = 0
minCard(¬X) = 1
minCard(true) = 0
minCard(false) = ∞
minCard(α ∨ β) = min(minCard(α), minCard(β))
minCard(α ∧ β) = minCard(α) + minCard(β)
Again, the last rule holds only in DNNF.
Filling the values into a DNNF circuit, we can easily compute the minimum cardinality.
• minimizing cardinality requires smoothness;
• it can help us optimize the circuit by "killing" the children of or-nodes with higher cardinality, and further removing dangling nodes.

d-DNNF

CT: model counting. MC(α) = |Mods(α)|
(decomposable)  MC(α ∧ β) = MC(α) × MC(β)
(deterministic) MC(α ∨ β) = MC(α) + MC(β)
counting graph: replace ∨ with + and ∧ with ∗ in a d-DNNF. Leaves: MC(X) = 1, MC(¬X) = 1, MC(true) = 1, MC(false) = 0.
weighted model counting (WMC): can be computed similarly, replacing 0/1 with weights.
Note: smoothness is important, otherwise there can be wrong answers. Guarantee smoothness by adding trivial units to a sub-circuit (e.g. α ∧ (A ∨ ¬A)).
Marginal Count: counting models under some conditions (e.g. counting ∆|{A, ¬B}): CD + CT.
It is not hard to compute, and marginal counting bridges CT to a structure on which we can compute partial derivatives (input: the conditions / assignment of variables), similar to neural networks.
FO: forgetting / projection / existential quantification. Note: a problem occurs — the resulting graph might no longer be deterministic, thus d-DNNF is not considered to support polytime FO.

Arithmetic Circuits (ACs)

The counting graph we used to do CT on a d-DNNF is a typical example of an Arithmetic Circuit (AC). Other operations can be done in ACs: e.g., replacing "+" by "max" in the counting graph and running it yields the most likely instantiation (MPE).
If a Bayesian network's circuit is decomposable, deterministic, and smooth, then it can be turned into an arithmetic circuit.

Succinctness v.s. Tractability

Succinctness: not expensive (in space); tractability: easy to use.
Along the line OBDD → FBDD → d-DNNF → DNNF, succinctness goes up (higher and higher space efficiency), but the set of tractable operations shrinks.

Knowledge-Base Compilation

Top-down approaches:
• based on exhaustive search.
Bottom-up approaches:
• based on transformations.

Top-Down Compilation via Exhaustive DPLL

Top-down compilation of a circuit can be done by keeping the trace of an exhaustive DPLL run.
The trace is automatically a circuit equivalent to the original CNF ∆.
It is a decision tree, where:
• each node has its high and low children;
• leaves are SAT or UNSAT results.
We need to deal with the redundancy of that circuit:
1. Do not record redundant portions of the trace (e.g. too many SAT and UNSAT leaves — keeping only one SAT and one UNSAT is enough);
2. Avoid equivalent subproblems (merge nodes of the same variable with exactly the same out-edges, from bottom to top, iteratively).
In practice, formula caching is essential to reduce the amount of work; the trade-off is that it requires a lot of space.
A limitation of exhaustive DPLL: some conflicts can't be found in advance.

OBDD (Ordered Binary Decision Diagrams)

In an OBDD there are two special nodes, 0 and 1, always drawn as squares. Every other node corresponds to a variable (say, xi) and has two out-edges: a high-edge (solid, decide xi = 1, linking to the high child) and a low-edge (dashed, decide xi = 0, linking to the low child).
[Figure: the OBDD of ∆ = (x1 ∧ x2) ∨ ¬x3, i.e. f = x1 x2 + (1 − x3), with nodes x1, x2, x3 over sinks 0 and 1.]
We express the KB ∆ as a function f by turning every ∧ into multiplication and every ∨ into addition; ¬ flips between 0 and 1, and all non-zero values become 1. Another example expresses the knowledge base whose worlds have an odd number of positive values:
[Figure: the OBDD of the odd-parity function f = (x1 + x2 + x3 + x4) % 2, with two nodes per level for x2, x3, x4.]
Reduction rules of OBDD:
• deletion rule: a node whose high- and low-edges point to the same child is removed (its parent edges are redirected to that child);
• merge rule: two nodes with the same variable and identical high and low children are merged.
[Figure: the two reduction rules applied to small example diagrams.]
An OBDD to which neither rule applies is a reduced OBDD. Reduced OBDDs are canonical, i.e. given a fixed variable order, ∆ has only one reduced OBDD.
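A sketch of the two reduction rules, with nodes encoded as (var, low, high) triples and a unique table providing the merge rule (the encoding is assumed, not from the notes):

  def mk(var, low, high, unique={}):
      """Make a reduced OBDD node, applying both reduction rules."""
      if low == high:                       # deletion rule: redundant test
          return low
      # merge rule: identical triples share one node via the unique table
      return unique.setdefault((var, low, high), (var, low, high))

  def evaluate(node, world):
      while node not in (0, 1):
          var, low, high = node
          node = high if world[var] else low
      return node

  # Delta = (x1 and x2) or (not x3), built with order x1 < x2 < x3:
  n3 = mk(3, 1, 0)                          # not x3
  n2 = mk(2, n3, 1)                         # x2 ? 1 : (not x3)
  n1 = mk(1, n3, n2)                        # x1 ? n2 : (not x3)
  print(evaluate(n1, {1: 1, 2: 1, 3: 1}))   # 1, since x1 and x2 holds
  print(mk(2, n3, n3))                      # deletion rule collapses to n3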
OBDD: Subfunction and Graph Size

Considering the function f of a KB ∆ with a fixed variable order v1, v2, ..., vn: after determining the first m variables, we have up to 2^m different cases of the remaining function (given the instantiation).
The number of distinct subfunctions (ranging from 1 to 2^m) involving v_{m+1} determines the number of nodes we need for variable v_{m+1}. Smaller is better.
An example: f = x1 x2 + x3 x4 + x5 x6, examining two different variable orders: x1, x2, x3, x4, x5, x6, or x1, x3, x5, x2, x4, x6. Check the subfunctions after the first three variables are fixed.
The first order has 3 distinct subfunctions, only 1 of which depends on x4, thus the next layer has 1 node only:

x1 x2 x3  subfunction
0  0  0   x5x6
0  0  1   x4 + x5x6
0  1  0   x5x6
0  1  1   x4 + x5x6
1  0  0   x5x6
1  0  1   x4 + x5x6
1  1  0   1
1  1  1   1

The second order has 8 distinct subfunctions, 4 of which depend on x2, thus the next layer has 4 nodes:

x1 x3 x5  subfunction
0  0  0   0
0  0  1   x6
0  1  0   x4
0  1  1   x4 + x6
1  0  0   x2
1  0  1   x2 + x6
1  1  0   x2 + x4
1  1  1   x2 + x4 + x6

Subfunction count is a reliable measurement of OBDD graph size, and is useful to determine which variable order is better.

OBDD: Transformations

¬C: negation. Negation on OBDD, and on any BDD, is simple: just swap the nodes 0 and 1 — turning 0 into 1 and 1 into 0. O(1) time complexity.
CD: conditioning. O(1) time complexity. ∆|X redirects all parent edges of X-nodes to their high children, then removes the X-nodes; similarly, ∆|¬X redirects all parent edges of X-nodes to their low children, then removes the nodes.
[Figure: ∆, ∆|x3, ∆|¬x3, and the reduced OBDD after conditioning.]
∧C: conjunction.
• Conjoining BDDs is super easy (O(1)): link the root of ∆2 to where node 1 was in ∆1, and we are done.
• Conjoining OBDDs, since we have to keep the order, is quadratic. Assuming OBDDs f and g have the same variable order, and their sizes (#nodes) are n and m respectively, the time complexity of generating f ∧ g is O(nm). This theoretical optimum is achieved in practice by proper caching.
[Figure: case 1 — f and g both rooted at x: f ∧ g is an x-node over (f1 ∧ g1, f0 ∧ g0); case 2 — f rooted at x, g rooted at y with x < y (so x is not in g): f ∧ g is an x-node over (f1 ∧ g, f0 ∧ g).]

SDDs (Sentential Decision Diagrams)

SDD is the most popular generalization of OBDD. It is also a circuit type.
• Order: needed, and matters
• Unique: when canonical / reduced

SDD: Structured Decomposability

Decomposability:
f(ABCD) = f1[g1(AB) ∧ h1(CD)] ∨ f2[g2(A) ∧ h2(BCD)] ∨ ...
Structured Decomposability:
f(ABCD) = f1[g1(AB) ∧ h1(CD)] ∨ f2[g2(AB) ∧ h2(CD)] ∨ ...
Feature: variables split in the same way in each subfunction.

SDD: Partitioned Determinism

An (X, Y)-partition of a function f goes like:
f(X, Y) = g1(X)h1(Y) + ... + gn(X)hn(Y)
where X ∩ Y = ∅ and X ∪ Y = V, V being all the variables we have for function f.
This is called structured decomposability.
gi, over X, is called a prime; hi, over Y, is called a sub.
Requirements on the primes:
∀i ≠ j: gi ∧ gj = false   // mutual exclusiveness
g1 ∨ ... ∨ gn = true      // exhaustive
∀i: gi ≠ ⊥                // satisfiable

VTree

A vtree is a binary tree that denotes the order and the structure of an SDD. Each node's left branch refers to the variables of the primes, and each node's right branch refers to those of the subs.

From OBDD to SDD

OBDD is a special case of SDD with a right-linear vtree. (Right-linear: each node's left child is a leaf.)
SDD is a strict superset of OBDD, maintaining key properties of OBDD (what is called path-width in OBDD is called tree-width in SDD), and can be exponentially smaller than OBDD.
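A sketch of the quadratic OBDD conjoin from OBDD: Transformations above, with the cache that makes the O(nm) bound achievable (same assumed triple encoding as the reduction sketch):

  def mk(var, low, high, unique={}):
      if low == high:
          return low
      return unique.setdefault((var, low, high), (var, low, high))

  def var_of(node):                         # terminals sort after all variables
      return node[0] if isinstance(node, tuple) else float("inf")

  def apply_and(f, g, cache={}):
      """Conjoin two OBDDs over the same variable order in O(|f||g|)."""
      if f in (0, 1) and g in (0, 1):
          return f & g
      if 0 in (f, g):
          return 0
      if (f, g) in cache:
          return cache[(f, g)]
      v = min(var_of(f), var_of(g))
      f0, f1 = (f[1], f[2]) if var_of(f) == v else (f, f)
      g0, g1 = (g[1], g[2]) if var_of(g) == v else (g, g)
      node = mk(v, apply_and(f0, g0), apply_and(f1, g1))
      cache[(f, g)] = node
      return node

  fx = mk(1, 0, 1)                          # x
  fxy = mk(1, 1, mk(2, 0, 1))               # not x or y
  print(apply_and(fx, fxy))                 # x and y: (1, 0, (2, 0, 1))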
SDD: Compression

An (X, Y)-partition is compressed if there are no equal subs. That is,
hi ≠ hj, ∀i ≠ j
Any f has a unique compressed (X, Y)-partition.

Systematic Way of Building SDD: Example

Given: f = (A ∧ B) ∨ (B ∧ C) ∨ (C ∧ D)
X = {A, B}
Y = {C, D}
Then we can have the sub-functions (subs) as f conditioned on the primes:

prime     sub
A ∧ B     true
A ∧ ¬B    C ∧ D
¬A ∧ B    C
¬A ∧ ¬B   C ∧ D

Resolving the primes with the same sub (disjoining them), to conduct compression:

prime     sub
A ∧ B     true
¬A ∧ B    C
¬B        C ∧ D

f = (A ∧ B)(true) + (¬A ∧ B)(C) + (¬B)(C ∧ D), with primes A ∧ B, ¬A ∧ B, ¬B and subs true, C, C ∧ D.
One possible vtree is:
[Figure: root node 6; its left child is node 2 over leaves 0: B and 1: A; its right child is node 5 over leaves 3: C and 4: D.]
Note that there are other possible vtrees, but under this circumstance, where X and Y are fixed, the leaves under the left branch of the root have to contain, and only contain, variables belonging to X, and the right branch those of Y. For intermediate nodes (neither leaves nor the root), the same holds recursively.

Construct an SDD: Example

Following the previous example, using that specific vtree, the SDD we construct looks like:
[Figure: a root decision node (vtree node 6) with elements (A ∧ B, ⊤), (¬A ∧ B, C), (¬B, C ∧ D); each prime is decomposed over vtree node 2 (B, A) and each sub over vtree node 5 (C, D).]
where ⊤ stands for always true and ⊥ for always false. (Symbol references: https://en.wikipedia.org/wiki/List_of_logic_symbols, https://oeis.org/wiki/List_of_LaTeX_mathematical_symbols)
Each element consists of a prime and a sub; if either involves more than one variable (i.e. represents an intermediate node in the vtree), we decompose it again (according to its left-right branches in the vtree).
OBDDs are SDDs where the partition at any node has |X| = 1, being a Shannon decomposition (gi(X)hi(Y|X)).
In an SDD circuit, the in-signals of any or-gate are either one-high or all-low (when, for example, the selected prime has a ⊥ sub).

Bottom-Up Compilation (OBDD/SDD)

• To compile a CNF:
  – OBDD/SDD for literals
  – disjoin literals to clauses
  – conjoin clauses to CNF
• Similarly for DNF
• Works for every Boolean formula
An example of bottom-up compilation:
[Figure: CNF (x+y)(y+z) with order x, y, z — OBDDs for x, then x+y, then y+z, then their conjunction (x+y)(y+z); garbage collection is needed along the way.]
Note: many nodes that are garbage-collected in the middle are omitted; for instance, the second step shows garbage collection of the first x literal node.
Garbage collection: for the sake of memory.
Challenges: a good variable order; scheduling apply (e.g. conjoin, disjoin) operations; etc.
Top-Down v.s. Bottom-Up: bottom-up approaches are typically more space-consuming, yet more flexible. (Sometimes f1 ∧ f2 can be simple even when f1 and f2 on their own are complex.)

Canonicity in Compilation

OBDDs are canonical:
fixed variable order → unique reduced OBDD
SDDs are canonical:
fixed vtree → unique trimmed & compressed SDD
Note: variable ordering has great impact on OBDD size; the vtree has significant impact on SDD size.

Minimizing OBDD Size

n variables lead to n! possible orders. We swap two adjacent variables to change the variable order. (This can be done easily, and can explore all possibilities.)

Same Partition: Polytime Operation

(X, Y)-partitions of
f: (p1, q1) ... (pn, qn)
g: (r1, s1) ... (rm, sm)
which means that, for example,
f = p1(X)q1(Y) + ... + pn(X)qn(Y)
Then we have the (X, Y)-partition of f ∘ g being:
(pi ∧ rj, qi ∘ sj | pi ∧ rj ≠ false)
where there are up to m × n sub-functions in total.
Note: at this stage, compression is not guaranteed.
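A sketch of the m × n element product above, representing primes and subs extensionally as sets of models (an assumed toy encoding) and compressing by merging equal subs:

  def combine(f, g, op):
      """(X,Y)-partition of f op g from partitions of f and g.
      Primes/subs are model sets; 'and' of primes is set intersection."""
      elements = []
      for p, q in f:
          for r, s in g:
              prime = p & r                 # p_i and r_j
              if prime:                     # keep only consistent primes
                  elements.append((prime, op(q, s)))
      return compress(elements)

  def compress(elements):
      """Merge elements that share the same sub (disjoin their primes)."""
      by_sub = {}
      for prime, sub in elements:
          key = frozenset(sub)
          by_sub[key] = by_sub.get(key, frozenset()) | prime
      return [(set(prime), set(sub)) for sub, prime in by_sub.items()]

  # X = {A}, Y = {B}; f = A or B, g = not B, over model-set encodings:
  f = [({"A"}, {"B", "-B"}), ({"-A"}, {"B"})]
  g = [({"A", "-A"}, {"-B"})]
  print(combine(f, g, lambda q, s: q & s))  # (A or B) and not B = A and not B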
Minimizing SDD Size

The key point of optimizing the SDD size is to find the best vtree. A vtree embeds a variable order.
There are two approaches to finding a good vtree:
• statically: by pre-examining the Boolean function
• dynamically: by searching for an appropriate one at runtime
Distinct sub-functions matter. Different vtrees can have exponentially different SDD sizes.

Counting Vtrees

A vtree embeds a variable order because the variable order can be obtained by a left-right traversal of the vtree. A vtree dissects a variable order: it tells the division between primes and subs explicitly.
• # variable orders: n! (n: #vars)
• # dissections: C_{n−1} = (2(n−1))! / (n!(n−1)!)
  (Catalan number; # full binary trees with n leaves)
• # vtrees over n variables:
  n! × C_{n−1} = (2(n−1))! / (n−1)!

Searching Over Vtrees

• a double-search problem:
  – variable order
  – dissection
• using tree operations:
  – rotating
  – swapping

Tree Rotations

[Figure: right/left rotation of adjacent internal nodes a and b over subtrees x, y, z.]
Rotation preserves the variable order; it enumerates all dissections.

Tree Swapping

[Figure: swap(a) exchanges the two children x and y of node a.]

Searching Over Vtrees: in Practice

Vtree fragments: root, child, left-linear fragment (beneath the left child), right-linear fragment (beneath the right child). (Fragment: a possibly empty connected subgraph of a binary tree; unlike a subtree — a root node plus all descendants of that node — a fragment need not include all descendants of its root.)
Fragment operations: next, previous, goto, etc.
swap + rotate: enough to explore all possible vtrees.
In practice: we need a time limit to avoid exploding ourselves.
Greedy search:
• enumerate all vtrees over a window (i.e. reachable via a certain number of rotate/swap operations)
• greedily accept the best vtree found, and then move the window

SDD, PSDD and Conditional PSDD

These are circuits for learning from Data & Knowledge.

year  model             comments
2011  SDD               Tractable Boolean Circuit
2014  PSDD              P: Probabilistic
2018  Conditional PSDD  conditional probability

Impact of knowledge (supervised/unsupervised):
• reduce the amount of data needed (for training)
• improve robustness of ML systems
• improve generality of ML systems
Truth table: world, instantiation, 1/0.
Probability distribution: world, instantiation, Pr(·).

Probabilistic: Review

• Marginal Probability: formally, the marginal probability of X can always be written as an expected value:
  p_X(x) = ∫ p_{X|Y}(x | y) p_Y(y) dy = E_Y[p_{X|Y}(x | y)]
  computed by examining the conditional probability of X (some variables) given a particular value of Y (the remaining variables), and then averaging over the distribution of all Y. In our case it is usually the sum of some worlds' probabilities (Σ_i Pr(ω_i)).
• Conditional Probability:
  Pr(α|β) = Pr(α, β) / Pr(β)
To compute them efficiently/effectively, we can use circuits.
SDD (probability version): (X, Y)-partition
f(X, Y) = g1(X)h1(Y) + ... + gn(X)hn(Y)
∀i, gi ≠ 0
∀i ≠ j, gi gj = 0 (mutually exclusive)
g1 + g2 + ... + gn = 1 (exhaustive)
where in this case the gi are probabilities.
A compressed (∀i ≠ j, hi ≠ hj) (X, Y)-partition of f is unique.
E.g., given α, we have:
Pr(α) = Σ_{i=1}^{n} Pr(α|gi) Pr(gi)
Structured Space: instead of considering all possible worlds, cross off some worlds for not satisfying known constraints.
• e.g. Routes: nodes are cities, edges are streets. Assign to an edge the value 1 for being on the route and 0 for not. Structure: being a route. An unstructured assignment has 2^m possibilities, where m is the number of possible streets (0/1 for each).
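A toy numeric check (all numbers invented) of the identities in the Probabilistic: Review section above:

  # Marginal as an expectation: p_X(x) = sum_y p_{X|Y}(x|y) p_Y(y)
  p_Y = {"y1": 0.6, "y2": 0.4}
  p_X_given_Y = {("x1", "y1"): 0.9, ("x1", "y2"): 0.25}
  p_x1 = sum(p_X_given_Y[("x1", y)] * p for y, p in p_Y.items())
  print(p_x1)                               # 0.9*0.6 + 0.25*0.4 = 0.64

  # Probability-version SDD: primes g_i are exclusive and exhaustive with
  # probabilities Pr(g_i), so Pr(alpha) = sum_i Pr(alpha | g_i) Pr(g_i).
  pr_g = [0.5, 0.3, 0.2]
  pr_alpha_given_g = [1.0, 0.5, 0.0]
  print(sum(p * c for p, c in zip(pr_g, pr_alpha_given_g)))  # 0.65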
From SDD to PSDD

PSDD, compared to SDD, is almost the same, except that:
• OR-gates: have probability distributions over all their inputs.
• Any two OR-gates may have different probability distributions.
The AND-gates are kept the same; no probability applies.

PSDD: Probability of Feasible Instantiations

Evaluate the circuit top-down — for each world, from the top, trace one child at each OR-gate and all children at each AND-gate. Then we have Pr(ωi).
Interpreting PSDD parameters: each OR-gate induces a normalized distribution over satisfying assignments. The probability distribution corresponds to the probabilities of the primes.
It is good for defining arbitrary events.

PSDD: Computing Marginal Probabilities

In this case, marginal probabilities refers to the probabilities of some partial assignments (e.g. Pr(A = t, B = f) when the variables are A, B, C, D).
PSDDs are ACs (OR: +, AND: ∗).
The challenge: the parameters (probability distributions) are unknown — they must be learned.

PSDD: Learning Background Knowledge

We learn the parameters of a PSDD via evidence.
Evidence: observed data samples.
First we have the SDD structure. Then, we have data such as:

L K P A  #samples
0 0 1 0  6
0 0 1 1  10
...

Starting from the top, trace the high wires (value 1; one child under each OR, all children under each AND) for each sample. Add the sample's count to each traced edge under the OR-gates. (In this case, the OR-gate input wires corresponding to ¬L ∧ ¬K ∧ P ∧ ¬A are assigned 0 + 6 = 6. If the same edge gets assignments ≥ 2 times, sum them up, e.g. 6 + 10 = 16.)
Normalize under each OR-gate (sum to 1).

Likelihood

For a model Pr(·) and a PSDD with parameters θ, the idea is that we evaluate the quality of the parameters by likelihood (ei is a single observation — the line with count 6 above is actually 6 observations):
L(Data|θ) = Prθ(e1) ∗ Prθ(e2) ∗ ... ∗ Prθ(en)

Dataset Incompleteness

Incomplete data means that for some worlds / observations, some variable instantiations are missing.

Dataset Type             Algorithm
Classical Complete       Closed-form solution (unique maximum-likelihood estimates)
Classical Incomplete     EM Algorithm (on PSDD)
Non-classical Incomplete N/A in ML

Non-classical incomplete dataset example:
x2 ∧ (y2 ∨ z2), x ⇒ y, ...
Missing in the ML literature; conceptually doable but there are computational reasons. (See the extension readings mentioned in class.)

PSDD Multiplication

factor ← { distribution, normalization constant κ }
factor: worlds' instantiations and sample counts (integer); distribution: worlds' instantiations and probabilities.
Consider the tables as matrices; then F = D ∗ κ.
Normalization needs to be re-done after multiplication (multiplying two circuits).
Aligning the rows of worlds in the factor tables, the resulting factor table (of the multiplication) is computed by multiplying each row's values (# samples multiplied).
Besides, when κ1·PSDD1 × κ2·PSDD2 = κ3·PSDD3, it doesn't mean that κ1, κ2, κ3 have any correlation (we can't expect κ3 = κ1 × κ2). The PSDD circuits involved (PSDD1, PSDD2, PSDD3) needn't be similar at all.
An application: compiling a Bayesian network into PSDDs, e.g. PSDD_all = PSDD_A ∗ PSDD_B ∗ PSDD_{C|AB} ∗ PSDD_{D|B} ...

Conditional PSDD

Conditional PSDD models Pr(α|β).
Its circuit is always a hybrid — from root to leaves, SDD on top and PSDD at the bottom — meaning that the probability of the condition β is not important at all.
An application: hierarchical maps. If we treat each part of the map as a conditional PSDD conditioned on the outer connections, then we can solve a very big map by safely dividing it into smaller maps.

Conditional Vtrees

Conditional PSDDs for Pr(Y|X) need conditional vtrees. X, Y are sets of variables; X contains the conditions.
The conditional vtree must contain a node with precisely the variables in X contained in the subtree beneath it. This node is called an X-node, drawn as a ∗ instead of a · in the vtree.
The X-node must be reachable from the root of the vtree by following only right children.

Prime Implicate (PI), Prime Implicant (IP)

The two concepts are closely related. ∆ is the knowledge base.

Prime Implicate (PI)                 Prime Implicant (IP)
clauses                              terms
CNF, no subsumed clauses             DNF, no subsumed terms
implicate c of ∆: ∆ |= c             implicant t of ∆: t |= ∆
Resolution: (α∨x, β∨¬x) / (α∨β)      Consensus: (α∧x, β∧¬x) / (α∧β)
∧ of prime implicates of ∆           ∨ of prime implicants of ∆

To obtain PI/IP: close ∆ under resolution / consensus, then drop subsumed clauses/terms.
Subsume = all literals already contained:
• Clauses: c1 subsumes c2, e.g. c1 = A ∨ ¬B, c2 = A ∨ ¬B ∨ C; c1 |= c2.
• Terms: t1 subsumes t2, e.g. t1 = ¬A ∧ B, t2 = ¬A ∧ B ∧ ¬C; t2 |= t1.
For PI, existential quantification and CE (clausal entailment check) are easy.
Prime means a clause/term is not subsumed by any other clause/term.
Duality: α is a prime implicate of ∆ iff ¬α is a prime implicant of ¬∆.
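A brute-force sketch of "close under resolution, then drop subsumed clauses" for prime implicates (fine for small inputs only; signed-integer encoding assumed):

  def prime_implicates(clauses):
      """Close a CNF under resolution and drop subsumed clauses."""
      clauses = {frozenset(c) for c in clauses}
      while True:
          new = set()
          for ci in clauses:
              for cj in clauses:
                  for lit in ci:
                      if -lit in cj:
                          r = (ci - {lit}) | (cj - {-lit})
                          # skip tautologies (containing both x and not x)
                          if not any(-l in r for l in r):
                              new.add(r)
          merged = clauses | new
          # a clause is subsumed if a strict subset of it is present
          pruned = {c for c in merged if not any(d < c for d in merged)}
          if pruned == clauses:
              return pruned
          clauses = pruned

  # Delta = (A or B) and (not A or C); PI adds the resolvent B or C:
  print(prime_implicates([[1, 2], [-1, 3]]))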
Model-Based Diagnosis

In a circuit, on each edge (connecting two gates) there is a signal (high or low), denoted X, Y, A, B, C, ... (α: the part that can be directly observed). For each gate (usually numbered 1, 2, ...), there is one extra variable (ok1, ok2, ...) called a health variable, representing whether or not the gate is functioning correctly.
∆ contains A, B, C, ..., ok1, ok2, .... Examples:
[Figure (a): an inverter chain A → gate 1 → B → gate 2 → C. Figure (b): inputs A and B feed inverters 1 and 2 producing C and D, which feed or-gate 3 producing E.]
∆a = { ok1 ⇒ (A ⇔ ¬B), ok2 ⇒ (B ⇔ ¬C) }
∆b = { ok1 ⇒ (A ⇔ ¬C), ok2 ⇒ (B ⇔ ¬D), ok3 ⇒ ((C ∨ D) ⇔ E) }
Model-based diagnosis figures out the possible states of the health variables when given ∆ and α (an observation, e.g. αa = C, αb = ¬E, etc.).
∆ here is called a system, and α a system observation.
For example: in case (a), if ∆ ∧ α ∧ ok1 ∧ ok2 is satisfiable (using a SAT solver), then the health condition ok1 = t, ok2 = t is normal; otherwise it is abnormal.
To do diagnosis we collect all the normal assignments of the health variables.
E.g. example (b), α = ¬A, ¬B, ¬E; diagnosis:

ok1  ok2  ok3  normal?
t    t    t    no
t    t    f    yes
t    f    t    no
t    f    f    yes
f    t    t    no
f    t    f    yes
f    f    t    yes
f    f    f    yes

Collecting all the yes rows and simplifying: ¬ok3 ∨ (¬ok1 ∧ ¬ok2).

Health Condition

The health condition of system ∆ given observation α is:
Health(∆, α) = ∃(all variables except the oki). (∆ ∧ α)
— the projection of ∆ ∧ α onto the health variables oki.
Note: this can be done easily by the bucket resolution + forgetting we learned before.

Methods of Diagnosis

Based on the health condition Health(∆, α) we can do model-based diagnosis.
CNF:
• conflict: implicates of Health(∆, α)
• min-conflict: PI of Health(∆, α)
DNF:
• partial diagnosis: implicant of Health(∆, α)
• kernel: IP of Health(∆, α)
Minimum Cardinality Diagnosis: turn the health condition Health(∆, α) into a DNNF and then compute the minCard. The path with minimum cardinality corresponds to the solution.

Current Topics

• Explaining decisions of ML systems. (see "Why Should I Trust You?" (KDD'16))
• Measuring robustness of decisions.
Readings:
• Three Modern Roles for Logic in AI (PODS'20)
• Human-Level Intelligence or Animal-Like Abilities? (CACM'18)

Classifiers: Review

Function version of a classifier:
f(x1, ..., xn)
where the xi are called features; all features x1, x2, ..., xn together: an instance; the output of f: the decision (classification); positive/negative decision refers to f = 1/0 respectively, while the corresponding instances are called positive/negative instances.
• Boolean Classifier: xi, f have Boolean values
  – Propositional formula as classifier: ω |= ∆ positive and ω |= ¬∆ negative.
• Monotone Classifier: a positive instance remains positive if we flip some features from − to +
  – e.g. f(+ − − +) → +  ⇒  f(+ + + +) → +
Minimum Cardinality of Classifiers: the number of false variables (negative features). Note: computed easily on a DNNF. Sometimes the circuit can be minimized when (1) smooth, (2) pruning edges that aren't helpful to minCard.
• Sub-Circuit: a model; trace down one child of each OR-gate and all children of AND-gates.

Explanation (Explaining Decisions)

[Figure: pipeline — Data → training → ML System (function: BN/NN/RF, ...) → Compile → Tractable Circuit; an Instance fed to the circuit yields a Decision. (BN: Bayesian Nets, NN: Neural Nets, RF: Random Forests)]

MC Explanations and PI Explanations

MC Explanations (MC: Minimum Cardinality)
• which positive features are responsible for a yes decision? (negative: vice versa)
• computed in linear time on DNNF* (def: ∆ and ¬∆ are both DNNF)
• to answer the question of which positive features: define minCard = # positive variables; condition on the negative features observed in the current case; compute minCard; minimize (kill unhelpful nodes and edges); enumerate (sub-circuits)
PI Explanations (PI: Prime Implicant)
• which characteristics make the rest irrelevant?
• compute PI; the sufficient reasons are all the PI terms.
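A brute-force sketch of PI explanations (sufficient reasons) for a tiny hypothetical Boolean classifier — the classifier here is made up; real systems compute this on a compiled circuit:

  from itertools import combinations, product

  def pi_explanations(f, instance):
      """Minimal sub-terms of the instance that fix f's decision
      regardless of the remaining features (brute force)."""
      n, decision = len(instance), f(*instance)
      def fixes(idxs):                      # do these features decide f?
          free = [i for i in range(n) if i not in idxs]
          for bits in product([0, 1], repeat=len(free)):
              x = list(instance)
              for i, b in zip(free, bits):
                  x[i] = b
              if f(*x) != decision:
                  return False
          return True
      found = []
      for k in range(n + 1):                # smallest terms first => minimal
          for idxs in combinations(range(n), k):
              if not any(set(s) <= set(idxs) for s in found) and fixes(set(idxs)):
                  found.append(idxs)
      return [{i: instance[i] for i in idxs} for idxs in found]

  # f = (E and F and G) or (F and W), instance E=F=G=W=1:
  f = lambda e, ff, g, w: (e and ff and g) or (ff and w)
  print(pi_explanations(f, (1, 1, 1, 1)))   # {F, W} and {E, F, G}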
Decision and Classifier Bias: Definition Reasoning about ML Systems: Overview Compiling Binary Neural Networks
Protected features: we don’t want them to influ- Queries Explanation, Robustness, This is a very recent topic.
ence the classification outcome. (e.g. gender, age) Verification, etc. Binary: the whole NN represents a Boolean Function.
Decision is biased if the result changes when we flip ML Systems Neural Networks, Graphi-
the value of a protected feature. • Input to the NN (and to each neuron): Boolean
cal Models, Random Fo- (0/1)
Classifier is biased if one of its decisions is biased. rests, etc.
Tractable Circuits OBDD, SDD, DNNF, etc. • Step activation function:
Decision and Classifier Bias: Judgement
(
Theorem: Decision is biased iff each of its sufficient For more: http://reasoning.cs.ucla.edu/xai/
P
1 i w i xi ≥ T
reasons contains at least one protected feature. σ(x) =
0 otherwise
Robustness (for Decision / Classifier)
Complete Reason (for Decision) Hamming Distance (between instances): the num- where in this case the neuron has a th-
Complete reason is the disjunction (∨) of all suffi- ber of disagreed features. Denoted as d(x1 , x2 ). reshold T and inputs from the last layer are:
cient reasons. (e.g. α = (E ∧ F ∧ G) ∨ (F ∧ W ) — Instance Robustness: x1 , x2 , . . . , xi , . . . , with corresponding weights
we made the decision because of α) w1 , w2 , . . . , wi , . . . .
robustnessf (x) = min d(x, x0 )
x0 :f (x0 )6=f (x) For instance, a neuron that represents 2A+2B −3C ≥
Reason Circuit (for Decision) 1 can be reduced to a Boolean circuit:
Reason Circuit: tractable circuit representation of Model Robustness:
2A+2B-3C≥1
the complete reason. 1 X A=1
model robustness(f ) = n robustnessf (x) A=0
If the classifier is in a special form (e.g. OBDD, 2 x
Decision-DNNF), then reason circuit can be obtained 2B-3C≥1 2B-3C≥-1
directly in linear time. How: Instance Robustness is the minimum amount of flips B=0 B=1 B=0 B=1
needed to change decision. Model Robustness is the
1. compile the classifier into a circuit, and get a -3C≥1 -3C≥-1 -3C≥-3
average of all instances’ robustness. (2n is the amount C=0 C=0
positive instance ready (otherwise work on C=1 C=0 C=1
of instances.) C=1
negation of the classifier circuit);
e.g. odd-parity: the model-robustness is 1. 0 1
2. add consensus:
Compiling I/O of ML Systems Naı̈ve Bayes Classifier
(¬A ∧ α) ∨ (A ∧ β)
α∧β By compiling the input/output behavior of ML sys- Naı̈ve Bayes Classifier:
tems, we can analyze classifiers by tractable circuits.
add all the α ∧ β terms into the circuit. From easiest to hardest conceptually: RF, NN, BN C
main challenge: scaling to large ML systems
3. filtering: go to the branches incompatible with E1 …… En
the instance and kill them.
Compiling Decision Trees and Random Forests
• the reason circuit thereby monotone (po- DT (decision tree): could transfer into multi-valued • Class: C (all Ei depend on C)
sitive feature remains positive, negative fe- propositional logic
ature remains negative) • Features: E1 , . . . En (conditional independent)
Naïve Bayes Classifier
[figure: class node C with edges to features E1, ..., En]
• Class: C (all Ei depend on C)
• Features: E1, ..., En (conditionally independent given C)
• Instance: e1, ..., en = e
• Class Posterior: (note that Pr(α|β) = Pr(α ∧ β)/Pr(β))
      Pr(c|e1, ..., en) = Pr(e1, ..., en|c) Pr(c) / Pr(e1, ..., en)
                        = Pr(e1|c) · · · Pr(en|c) Pr(c) / Pr(e1, ..., en)
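A minimal sketch of this computation (mine; the numbers are made up), where the denominator Pr(e1, ..., en) is obtained by summing the numerator over all classes:

    def posterior(prior, likelihood):
        # prior: {c: Pr(c)}; likelihood: {c: [Pr(ei|c) for each feature]},
        # already looked up for the observed instance e1, ..., en
        joint = {}
        for c, p in prior.items():
            for p_ei in likelihood[c]:
                p *= p_ei                    # Pr(e1|c) ... Pr(en|c) Pr(c)
            joint[c] = p
        z = sum(joint.values())              # Pr(e1, ..., en)
        return {c: p / z for c, p in joint.items()}

    print(posterior({"c1": 0.1, "c2": 0.9},
                    {"c1": [0.01, 0.03], "c2": [0.2, 0.5]}))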
Naïve Bayes: CPT
A Bayesian Network has a conditional probability table (CPT) at each of its nodes.
e.g. the CPT of node C in the previous example:

    C     ΘC
    c1    θc1 (e.g., 0.1)
    ...
    ck    θck (e.g., 0.2)

where ∀i θci ∈ [0, 1] and Σ_{i=1}^{k} θci = 1.
And at node Ej, the CPT:

    C     Ej     ΘEj|C
    c1    ej,1   θej,1|c1 (e.g., 0.01)
    c1    ej,2   θej,2|c1 (e.g., 0.03)
    ...
    c1    ej,q   θej,q|c1 (e.g., 0.1)
    c2    ej,1   θej,1|c2 (e.g., 0.01)
    ...
    ck    ej,q   θej,q|ck (e.g., 0.02)

where ∀i, j, x θej,x|ci ∈ [0, 1] and ∀i, j Σ_{x=1}^{q} θej,x|ci = 1; under each condition, the probabilities sum to 1.
∀i, ei, c, the values Pr(ei|c) are all in the CPT tables.

Odds v.s. Probability
Probability: Pr(c) (chance to happen, in [0, 1]).
Odds:
    O(c) = Pr(c) / Pr(c̄)
    O(c|e) = Pr(c|e) / Pr(c̄|e)   (conditional odds)
    log O(c|e) = log [Pr(c|e) / Pr(c̄|e)]   (log odds)
In the previous example, if we use log odds instead of probability: Pr(α) ≥ p ⇐⇒ log O(α) ≥ ρ = log [p / (1 − p)].

    O(c|e) = [Pr(c) Π_{i=1}^{n} Pr(ei|c) / Pr(e)] / [Pr(c̄) Π_{i=1}^{n} Pr(ei|c̄) / Pr(e)]
    log O(c|e) = log O(c) + Σ_{i=1}^{n} log [Pr(ei|c) / Pr(ei|c̄)] = log O(c) + Σ_{i=1}^{n} w_{ei}

w_{ei} is the weight of evidence ei, depending on the instance; log O(c) is the prior log-odds. Changing the class prior (shifting log O(c)) shifts all log O(c|e) by the same amount.
Compiling Naive Bayes Classifier
Brute-force method: consider the sub-classifiers ∆|U and ∆|¬U, recursively.
Problem: the result can have exponential size (in the number of variables).
Solution: cache sub-classifiers.
Note: a Naïve Bayes classifier has a threshold T and a prior (e.g. in the previous example, we have the prior of C, and if Pr(C = ci | Ej = ej,x) ≥ T then, for example, the answer is yes, otherwise no). We may have different conditions, with different conditional probabilities, sharing the same sub-classifier.
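A minimal sketch of the recursion with caching, under my own simplification (not the lecture's exact algorithm): work in log-odds, track how much more evidence weight is still needed (the "budget") to clear the threshold, and let partial instantiations that leave the same budget share one cached sub-classifier. All names and numbers are assumptions.

    def compile_nb(weights, budget, i=0, cache=None):
        # weights[i] = (w_true, w_false): weight of evidence for Ei = true / false
        cache = {} if cache is None else cache
        hi = sum(max(w) for w in weights[i:])
        lo = sum(min(w) for w in weights[i:])
        if lo >= budget:
            return True                       # 'yes' whatever the remaining features
        if hi < budget:
            return False                      # 'no' whatever the remaining features
        key = (i, round(budget, 12))          # same budget => same sub-classifier
        if key not in cache:
            w_true, w_false = weights[i]
            cache[key] = {f"E{i}": (compile_nb(weights, budget - w_false, i + 1, cache),
                                    compile_nb(weights, budget - w_true, i + 1, cache))}
        return cache[key]

    # two features, prior log-odds -1, decision threshold rho = 0
    print(compile_nb([(1.5, -1.5), (0.98, -0.98)], budget=0 - (-1)))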
Application: Solving MPE & MAR
MPE: most probable explanation
→ NP-complete
→ probabilistic reasoning: find the world with the largest probability
→ solved by weighted MaxSAT
→ compile to DNNF
MAR: marginal probability
→ PP-complete
→ the sum of the probabilities of all worlds that satisfy certain conditions
→ solved by WMC (weighted model counting)
→ compile to d-DNNF
conditional version: work on a “shrunk table” where some worlds are removed

Solving MPE via MaxSAT
• Input: weighted CNF = α1, ..., αn (with weights w1, ..., wn)
  – e.g. (x ∨ ¬y ∨ ¬z)^3, (¬x)^10.1, (y)^0.5, (z)^2.5
  – next to the clauses, 3, 10.1, 0.5, 2.5 are the corresponding weights
  – W: the weight of hard clauses, greater than the sum of all soft clauses’ weights
• find the variable assignment with the highest weight / least penalty:
      Wt = weight(x1, ..., xn) = Σ_{x1,...,xn |= αi} wi
      Pn = penalty(x1, ..., xn) = Σ_{x1,...,xn ⊭ αi} wi
      Wt(x1, ..., xn) + Pn(x1, ..., xn) = Ψ (a constant)
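A tiny sketch (mine) of the Wt/Pn bookkeeping over the example clauses above, finding the MaxSAT solution by brute force; literals are encoded as signed integers:

    from itertools import product

    # (x ∨ ¬y ∨ ¬z)^3, (¬x)^10.1, (y)^0.5, (z)^2.5  with x, y, z encoded as 1, 2, 3
    clauses = [({1, -2, -3}, 3.0), ({-1}, 10.1), ({2}, 0.5), ({3}, 2.5)]

    def weight(assignment):
        # Wt: total weight of the satisfied clauses (Pn is the rest of Ψ)
        return sum(w for lits, w in clauses
                   if any(assignment[abs(l)] == (l > 0) for l in lits))

    best = max((dict(zip((1, 2, 3), bits)) for bits in product([False, True], repeat=3)),
               key=weight)
    print(best, weight(best))   # {1: False, 2: False, 3: True}, Wt = 15.6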
Solving MPE via MaxSAT: Example
Given a Bayesian Network A → B (with CPTs listed):

    A    θA        A    B    θB|A
    a1   0.3       a1   b1   0.2
    a2   0.5       a1   b2   0.8
    a3   0.2       a2   b1   1
                   a2   b2   0
                   a3   b1   0.6
                   a3   b2   0.4

• Indicator variables:
  – from A (values a1, a2, a3): Ia1, Ia2, Ia3
  – from B (values b1, b2): Ib1, Ib2
• Indicator Clauses:
  A: (Ia1 ∨ Ia2 ∨ Ia3)^W, (¬Ia1 ∨ ¬Ia2)^W, (¬Ia1 ∨ ¬Ia3)^W, (¬Ia2 ∨ ¬Ia3)^W
  B: (Ib1 ∨ Ib2)^W, (¬Ib1 ∨ ¬Ib2)^W
  where we define W = −log(0), i.e. these are hard clauses.
• Parameter Clauses: (one per row of the CPTs)
  A: (¬Ia1)^−log(.3), (¬Ia2)^−log(.5), (¬Ia3)^−log(.2)
  B: (¬Ia1 ∨ ¬Ib1)^−log(.2), (¬Ia1 ∨ ¬Ib2)^−log(.8),
     (¬Ia2 ∨ ¬Ib1)^−log(1), (¬Ia2 ∨ ¬Ib2)^−log(0),
     (¬Ia3 ∨ ¬Ib1)^−log(.6), (¬Ia3 ∨ ¬Ib2)^−log(.4)
• the weighted CNF contains all Indicator Clauses and Parameter Clauses
• Evidence: e.g. A = a1, by adding (Ia1)^W.
Given a certain instantiation Γ of the indicators, e.g. ¬Ia1, ..., ¬Ib2:
    Pn(Γ) = Σ_{θx|u ∼ Γ} −log θx|u = −log Π_{θx|u ∼ Γ} θx|u = −log Pr(x)
so the least-penalty instantiation is the most probable world.
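A sketch (mine) that carries out this encoding's arithmetic for the example network and brute-forces the least-penalty (= most probable) instantiation; a large constant stands in for the hard weight W = −log(0):

    import math
    from itertools import product

    theta_A = {"a1": 0.3, "a2": 0.5, "a3": 0.2}
    theta_B = {("a1", "b1"): 0.2, ("a1", "b2"): 0.8, ("a2", "b1"): 1.0,
               ("a2", "b2"): 0.0, ("a3", "b1"): 0.6, ("a3", "b2"): 0.4}
    BIG = 1e9                                   # stands in for W = -log(0)
    nlog = lambda p: -math.log(p) if p > 0 else BIG

    def penalty(a, b, evidence_a=None):
        pn = BIG if evidence_a and a != evidence_a else 0.0   # (Ia)^W falsified
        return pn + nlog(theta_A[a]) + nlog(theta_B[(a, b)])

    mpe = min(product(theta_A, ("b1", "b2")), key=lambda ab: penalty(*ab))
    print(mpe, math.exp(-penalty(*mpe)))        # ('a2', 'b1'), probability 0.5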
MaxSAT: Solving
Previously we discussed methods of solving MaxSAT problems, such as searching. MaxSAT can also be solved by compiling to DNNF and computing the minCard.
An Example: (unweighted for simplicity)

    ∆: A ∨ B, ¬A ∨ B, ¬B
       (C0)   (C1)    (C2)

• add selector variables S0, S1, S2, representing whether or not a clause is selected to be unsatisfiable / thrown away:
      ∆′: A ∨ B ∨ S0, ¬A ∨ B ∨ S1, ¬B ∨ S2
• assign weights:
      w(S0) = 1, w(S1) = 1, w(S2) = 1
      w(¬S0) = 0, w(¬S1) = 0, w(¬S2) = 0
      w(A) = w(¬A) = w(B) = w(¬B) = 0
• define cardinality: the number of positive selector variables; computing minCard is the same as working with these weights
• compile ∆′ into DNNF (hopefully)
• compute minCard; the optimum minCard = 1 is achieved at S0, ¬S1, ¬S2, ¬A, ¬B; solution: ¬A, ¬B; satisfied clauses: C1, C2.
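A brute-force sketch (mine) of this reduction; a real implementation would read the minCard off the compiled DNNF instead of enumerating worlds:

    from itertools import product

    clauses = [{"A", "B"}, {"-A", "B"}, {"-B"}]          # C0, C1, C2
    variables = ("A", "B")

    def min_card():
        best = None
        for bits in product([False, True], repeat=len(variables)):
            world = dict(zip(variables, bits))
            holds = lambda l: world[l.lstrip("-")] != l.startswith("-")
            # selector Si must be true exactly when clause Ci is falsified
            card = sum(not any(holds(l) for l in c) for c in clauses)
            if best is None or card < best[0]:
                best = (card, world)
        return best

    print(min_card())   # (1, {'A': False, 'B': False}): throw away C0, keep C1, C2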
Factor v.s. Distribution
A factor sums up to anything; a distribution sums up to 1.

Solving MAR via WMC
Pipeline: Bayesian Network →(reduce)→ Weighted Boolean Formula →(compile)→ Tractable Boolean Circuit → Arithmetic Circuit (AC)
Reduction: using indicator and parameter variables. The network N has a CNF encoding ∆N; for any evidence e = e1, ..., ek:
    Pr(e) = WMC(∆N ∧ Ie1 ∧ ... ∧ Iek)
More Reading: Modeling and Reasoning with Bayesian Networks.

Solving MAR via WMC: Example
Given the network A → B, A → C with CPTs:

    A    θA      A    B    θB|A     A    C    θC|A
    a1   0.1     a1   b1   0.1      a1   c1   0.1
    a2   0.9     a1   b2   0.9      a1   c2   0.9
                 a2   b1   0.2      a2   c1   0.2
                 a2   b2   0.8      a2   c2   0.8

• Indicator Variables: Ia1, Ia2, Ib1, Ib2, Ic1, Ic2
• Parameter Variables: Pa1, Pa2, Pb1|a1, Pb2|a1, Pb1|a2, Pb2|a2, Pc1|a1, Pc2|a1, Pc1|a2, Pc2|a2
• I∗ and P∗ are all Boolean variables.
• Indicator Clauses:
  A: Ia1 ∨ Ia2, ¬Ia1 ∨ ¬Ia2
  B: Ib1 ∨ Ib2, ¬Ib1 ∨ ¬Ib2
  C: Ic1 ∨ Ic2, ¬Ic1 ∨ ¬Ic2
• Parameter Clauses:
  A: Ia1 ⇐⇒ Pa1, Ia2 ⇐⇒ Pa2
  B: Ia1 ∧ Ib1 ⇐⇒ Pb1|a1, Ia1 ∧ Ib2 ⇐⇒ Pb2|a1, Ia2 ∧ Ib1 ⇐⇒ Pb1|a2, Ia2 ∧ Ib2 ⇐⇒ Pb2|a2
  C: Ia1 ∧ Ic1 ⇐⇒ Pc1|a1, Ia1 ∧ Ic2 ⇐⇒ Pc2|a1, Ia2 ∧ Ic1 ⇐⇒ Pc1|a2, Ia2 ∧ Ic2 ⇐⇒ Pc2|a2
  the rule is: Iu1 ∧ · · · ∧ Ium ∧ Ix ⇐⇒ Px|u1,...,um
• Weights are defined as:
      Wt(Ix) = Wt(¬Ix) = Wt(¬Px|u) = 1
      Wt(Px|u) = θx|u
  e.g. Pb2|a2 has 0.8 weight.
Network Instantiation (ai, bj, ck): e.g. Wt(a1, b1, c2) = .1 ∗ .1 ∗ .9 = .009.
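A sketch (mine) verifying the reduction on this example by brute force: the models of the CNF encoding are exactly the network instantiations, so the WMC reduces to a sum of parameter-weight products:

    from itertools import product

    theta_A = {"a1": 0.1, "a2": 0.9}
    theta_B = {("a1", "b1"): 0.1, ("a1", "b2"): 0.9, ("a2", "b1"): 0.2, ("a2", "b2"): 0.8}
    theta_C = {("a1", "c1"): 0.1, ("a1", "c2"): 0.9, ("a2", "c1"): 0.2, ("a2", "c2"): 0.8}

    def wmc(evidence=None):
        evidence, total = evidence or {}, 0.0
        for a, b, c in product(theta_A, ("b1", "b2"), ("c1", "c2")):
            world = {"A": a, "B": b, "C": c}
            if any(world[v] != x for v, x in evidence.items()):
                continue                          # an evidence indicator is falsified
            total += theta_A[a] * theta_B[(a, b)] * theta_C[(a, c)]
        return total

    print(wmc())                # 1.0 with no evidence
    print(wmc({"B": "b1"}))     # Pr(b1) = 0.1*0.1 + 0.9*0.2 = 0.19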
MAR as WMC: Example with Local Structure
Given the same structure A → B, A → C, with CPTs (this time denote the values of each variable as e.g. a and ā):

    A    θA      A    B    θB|A     A    C    θC|A
    a    0.5     a    b    1        a    c    0.8
    ā    0.5     a    b̄    0        a    c̄    0.2
                 ā    b    0        ā    c    0.2
                 ā    b̄    1        ā    c̄    0.8

First we construct the clauses as before.
Local Structure: a re-surfacing old concept; here the parameter values matter.
• Zero Parameters (become logical constraints): e.g. θb̄|a = 0 turns Ia ∧ Ib̄ ⇐⇒ Pb̄|a into the clause ¬Ia ∨ ¬Ib̄ (the parameter variable is dropped).
• One Parameters: e.g. θb|a = 1 lets us drop Ia ∧ Ib ⇐⇒ Pb|a altogether (no constraint is needed).
• Equal Parameters: e.g. θc̄|a = θc|ā = 0.2 share one parameter variable:
      (Ia ∧ Ic̄) ∨ (Iā ∧ Ic) ⇐⇒ P2
• Context-Specific Independence (CSI): independent only when considering some specific worlds.
With local structure considered, the clauses are:
      Ia ∨ Iā         ¬Ia ∨ ¬Iā
      Ib ∨ Ib̄         ¬Ib ∨ ¬Ib̄
      Ic ∨ Ic̄         ¬Ic ∨ ¬Ic̄
      ¬Ia ∨ ¬Ib̄       ¬Iā ∨ ¬Ib
      (Ia ∧ Ic) ∨ (Iā ∧ Ic̄) ⇐⇒ P1   (0.8 prob)
      (Ia ∧ Ic̄) ∨ (Iā ∧ Ic) ⇐⇒ P2   (0.2 prob)
      Ia ∨ Iā ⇐⇒ P3                  (0.5 prob)
This can be compiled into an sd-DNNF, and we can build the AC accordingly, by: (1) replacing Ix with λx (and Ix̄ with λx̄); (2) replacing Py with θy; (3) replacing "and" with ∗ and "or" with +.
Evidence in AC: when there is no evidence, all λi = 1; when there is evidence, λi = 1 if compatible with it, otherwise λi = 0 (e.g. given A = a: λa = 1, λā = 0).
MAR as WMC: Example — AC
The AC generated from the previous example (considering local structure) is:
[AC figure: alternating ∗ and + nodes over the leaves λa, λā, λb, λb̄, λc, λc̄ and the parameters .5, .2, .8]
On an AC we can do backpropagation:
    ∂f/∂λx (e) = Pr(x, e − x)
    ∂f/∂θx|u (e) = Pr(x, u, e)
(e − x is the evidence e with the value of x’s variable retracted.)
There are other possible reductions, such as minimizing the size of the CNF, etc.
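A generic sketch (mine; the hand-built circuit below encodes the earlier chain A → B, not the exact AC drawn above) of evaluating an AC and reading marginals off ∂f/∂λx via backpropagation:

    import math

    def evaluate(node, val):
        kind, body = node
        if kind == "leaf":
            return val[body]
        vals = [evaluate(k, val) for k in body]
        return math.prod(vals) if kind == "*" else sum(vals)

    def backprop(node, val, grad, up=1.0):
        # accumulates d(output)/d(leaf) into grad; re-evaluates siblings,
        # which is quadratic but fine for a sketch
        kind, body = node
        if kind == "leaf":
            grad[body] = grad.get(body, 0.0) + up
        elif kind == "+":
            for k in body:
                backprop(k, val, grad, up)
        else:
            for i, k in enumerate(body):
                others = math.prod(evaluate(s, val) for j, s in enumerate(body) if j != i)
                backprop(k, val, grad, up * others)

    # f = λa θa (λb θb|a + λb̄ θb̄|a) + λā θā (λb θb|ā + λb̄ θb̄|ā)
    L = lambda s: ("leaf", s)
    side = lambda la, ta, tb, tbb: ("*", [L(la), L(ta), ("+", [("*", [L("lb"), L(tb)]),
                                                               ("*", [L("lbb"), L(tbb)])])])
    ac = ("+", [side("la", "ta", "tb_a", "tbb_a"), side("laa", "taa", "tb_aa", "tbb_aa")])

    val = {"la": 1, "laa": 1, "lb": 1, "lbb": 0,          # evidence: B = b
           "ta": 0.1, "taa": 0.9, "tb_a": 0.1, "tbb_a": 0.9, "tb_aa": 0.2, "tbb_aa": 0.8}
    grad = {}
    backprop(ac, val, grad)
    print(evaluate(ac, val), grad["la"])   # Pr(b) = 0.19, dF/dλa = Pr(a, b) = 0.01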
ACs with Factors
Motivation: avoid losing the reference point etc. when learning ACs from data.
For instance, instead of listing A, B, ΘB|A and using it, we list A, B, f(A, B), where the f values are integers. In the AC, because we use f instead of Θ, the values are integers as well.
We can build ACs to compute a factor f in this way. (e.g. given instance A, B, compute f(a, b) via the AC by setting λa = λb = 1 and λā = λb̄ = 0)
Some of these ACs also compute:
• marginals: e.g. f(a) = f(a, b) + f(a, b̄) can be computed via the AC by setting λa = λb = λb̄ = 1 and λā = 0;
• MPE: by replacing “+” with “max” in the AC.
Claim: If an AC
1. computes a factor f,
2. and is decomposable, deterministic and smooth,
then it computes marginals (2003) and MPE (2006).
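Continuing the sketch above (evaluate and L are reused; the integer factor below is made up): marginals come from setting both indicators of a summed-out variable to 1, and MPE from swapping + for max:

    fac = ("+", [("*", [L("la"), L("lb"), L("f_ab")]),        # f(a, b)  = 2
                 ("*", [L("la"), L("lbb"), L("f_abb")]),      # f(a, b̄)  = 5
                 ("*", [L("laa"), L("lb"), L("f_aab")]),      # f(ā, b)  = 1
                 ("*", [L("laa"), L("lbb"), L("f_aabb")])])   # f(ā, b̄)  = 4
    F = {"f_ab": 2, "f_abb": 5, "f_aab": 1, "f_aabb": 4}

    print(evaluate(fac, {**F, "la": 1, "lb": 1, "laa": 0, "lbb": 0}))  # f(a, b) = 2
    print(evaluate(fac, {**F, "la": 1, "lb": 1, "lbb": 1, "laa": 0}))  # f(a) = 2 + 5 = 7

    def evaluate_max(node, val):                 # MPE: replace "+" with "max"
        kind, body = node
        if kind == "leaf":
            return val[body]
        vals = [evaluate_max(k, val) for k in body]
        return math.prod(vals) if kind == "*" else max(vals)

    print(evaluate_max(fac, {**F, "la": 1, "laa": 1, "lb": 1, "lbb": 1}))  # max f = 5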
Sum-Product Nets (SPN, 2011)
Claim: If an AC
1. computes a factor f,
2. and is decomposable and smooth,
then it computes the marginals of f.
Such ACs are known as SPNs (Sum-Product Nets). SPNs can’t compute MPE in linear time.
Decomposability and smoothness guarantee that every sub-circuit term is a complete instantiation.
Determinism further guarantees a one-to-one mapping between sub-circuit terms and complete instantiations.
An SPN that satisfies determinism is called a Selective-SPN, and it computes MPE.

Parametric Completeness of Factors
Definition: Parameters Θ are complete for factor f(x) iff for any instantiation x, f(x) can be expressed as a product of parameters in Θ.
Claim: The parameters of a Bayesian Network are complete for its factor.
Infer: when completeness of the parameters is guaranteed, there exists an AC(X, Θ) that is decomposable, deterministic and smooth.

Factor: Sub-circuit Term & Coefficient
[sub-circuit figure: from the root + pick one ∗ branch; along it, one + node contributes λa with constant 2 and another contributes λb with constant 5]
e.g., the above sub-circuit has:
    term: ab
    coefficient: 2 ∗ 5 = 10
An instantiation can have multiple sub-circuits with the same term but different coefficients; sum the coefficients up to get the factor value.

Finale: more topics
ACs:
• model-based supervised learning: in between AC-encoding with & without local structure; only part of the parameters (part of θ) are known, and the rest are to learn.
• background knowledge (BK): (1) known parameters; (2) functional dependencies (sometimes we know that Y = f(X) but we don’t know the identity of the function f).
• from compiling the model to compiling the query: e.g. evidence A, C, query B; the AC’s leaves are λa, λā, λc, λc̄, Θ; the outputs P∗(b), P∗(b̄) can be trained from labeled data (GD etc.).
• tensor graphs:
  – a new AC compilation algorithm
  – key benefit: parallelism
• Structural Causal Models (SCMs): exogenous variables (distributions, e.g. Ux, which points to x) and endogenous variables (functions, e.g. x, a node in a directed graph).

Solving PP^PP-complete problems with tractable circuits
• Maj-Maj-SAT is solvable in linear time (in the SDD size) if we can constrain its SDD (i.e. normalize it for a constrained Vtree).
• A Vtree is x-constrained iff there is a node v that (1) appears on the right-most path and (2) the set of variables outside v equals x.

Graph abstractions of KB
• primal, dual, incidence graphs; hyper-graph
• tree-width, cut-width, path-width

Auxiliary variables: basically, the idea is to add X ⇐⇒ ℓ1 ∨ ℓ2 where ℓ1 and ℓ2 are carefully-chosen literals.
• Equivalent Modulo Forgetting (EMF): a function f(X) is EMF to a function g(X, Y) iff f(X) = ∃Y g(X, Y).
• Tseitin Transformation (1968): converts Boolean formulas into CNF.
Graph Abstraction: Examples
Given a CNF:
    (X ∨ Y) ∧ (Y ∨ ¬Z) ∧ (¬X ∨ Q)
(a) primal graph: one node per variable; an edge between two variables iff they co-occur in a clause (here {X, Y}, {Y, Z}, {X, Q}).
(b) dual graph: one node per clause (X∨Y, Y∨¬Z, ¬X∨Q); an edge between two clauses iff they share a variable (here X∨Y and Y∨¬Z share Y; X∨Y and ¬X∨Q share X).
(c) incidence graph: a bipartite graph over the variables X, Y, Z, Q and the clauses X∨Y, Y∨¬Z, ¬X∨Q, with an edge between each clause and the variables it mentions.
(d) hypergraph: nodes Y, X, Z, Q, with one hyperedge per clause (1: {X, Y}, 2: {Y, Z}, 3: {X, Q}).

Graph Properties: Treewidth
Treewidth of a graph G: tw(G) is the minimum width among all tree-decompositions of G (see https://en.wikipedia.org/wiki/Treewidth).
In many cases, good performance is guaranteed when there is a small treewidth.
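A small sketch (mine) deriving the primal, dual, and incidence abstractions of the CNF above as plain edge sets:

    from itertools import combinations

    cnf = [{"X", "Y"}, {"Y", "-Z"}, {"-X", "Q"}]
    var = lambda lit: lit.lstrip("-")

    def primal(cnf):    # variables co-occurring in some clause
        return {frozenset(map(var, pair)) for c in cnf for pair in combinations(c, 2)}

    def dual(cnf):      # clauses sharing a variable
        return {(i, j) for i, j in combinations(range(len(cnf)), 2)
                if {var(l) for l in cnf[i]} & {var(l) for l in cnf[j]}}

    def incidence(cnf): # bipartite clause-variable edges
        return {(i, var(l)) for i, c in enumerate(cnf) for l in c}

    print(primal(cnf))     # {X,Y}, {Y,Z}, {X,Q}
    print(dual(cnf))       # (0, 1) via Y; (0, 2) via X
    print(incidence(cnf))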
AC: Conclusions
Two fundamental notations:
1. Arithmetic Circuit (AC): indicator variables, constants, additions, multiplications.
2. Evaluating the AC (at evidence): set an indicator → 1 if its subscript is consistent with the evidence, otherwise → 0.
Three fundamental questions: (1) the reference factor f(x)? (2) the marginal of the factor? (3) the MPE of the factor?

CNF Properties: Cutwidth and Pathwidth
Given the case:
    C1: v5 ∨ v6
    C2: v4 ∨ ¬v5 ∨ v6
    C3: v1 ∨ v3 ∨ v4 ∨ v5
    C4: v2 ∨ v3
    C5: v1 ∨ v2 ∨ ¬v3
under the variable order v1, v2, v3, v4, v5, v6:

    cut after    cutset          separator
    v1           {C3, C5}        {v1}
    v2           {C3, C4, C5}    {v1, v2}
    v3           {C3}            {v1, v3}
    v4           {C2, C3}        {v1, v3, v4}
    v5           {C1, C2}        {v4, v5}

Cutwidth and pathwidth are both influenced by the variable ordering.
Cutwidth of a variable order: the size of the largest cutset, e.g. 3 in this case. (A cutset is the set of clauses that crosses a cut.)
Cutwidth of a CNF: the smallest cutwidth attained by any variable order.
Pathwidth of a variable order: the size of the largest separator, e.g. 3 in this case. (A separator is the set of variables that appear in the clauses within the cutset, and before the cut, according to the variable ordering.)
Pathwidth of a CNF: the smallest pathwidth attained by any variable order.
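A sketch (mine) that recomputes the table above for a given variable order and takes the maxima; literal signs are dropped since only variables matter here:

    cnf = {"C1": {5, 6}, "C2": {4, 5, 6}, "C3": {1, 3, 4, 5}, "C4": {2, 3}, "C5": {1, 2, 3}}

    def cuts(cnf, order):
        rows = []
        for i in range(1, len(order)):
            before, after = set(order[:i]), set(order[i:])
            cutset = {c for c, vs in cnf.items() if vs & before and vs & after}
            separator = {v for c in cutset for v in cnf[c] & before}
            rows.append((order[i - 1], cutset, separator))
        return rows

    rows = cuts(cnf, [1, 2, 3, 4, 5, 6])
    for v, cs, sep in rows:
        print(f"cut after v{v}: cutset {sorted(cs)}, separator {sorted(sep)}")
    print("cutwidth:", max(len(cs) for _, cs, _ in rows))    # 3
    print("pathwidth:", max(len(s) for _, _, s in rows))     # 3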
Auxiliary Variables
[diagram: f(X) ←(EMF, forgetting Y)— g(X, Y) —(compile)→ d-DNNF for g(X, Y) —(forgetting Y)→ DNNF for f(X)]
There is no easy direct way from f(X) to its DNNF. Sometimes fn(X) has exponential size when gn(X, Y) has polynomial size.
When adding auxiliary variables to ∆, we guarantee equal satisfiability.
An example:
    ∆ = (A ∨ D) ∧ (B ∨ D) ∧ (C ∨ D) ∧ (A ∨ E) ∧ (B ∨ E) ∧ (C ∨ E)
    Σ = (A ∨ ¬X) ∧ (B ∨ ¬X) ∧ (C ∨ ¬X) ∧ (D ∨ X) ∧ (E ∨ X)
Here we have ∃X Σ = ∆, by doing existential quantification (forgetting).
Extended Resolution: might reduce cost (e.g. Pigeonhole: exponential to polynomial). Recall resolution:
    X ∨ α, ¬X ∨ β
    α ∨ β
An example refutation:
    1. {¬A, C}   ∆
    2. {¬B, C}   ∆
    3. {¬C, D}   ∆
    4. {¬D}      ¬α
    5. {A}       ¬α
    6. {¬C}      3, 4
    7. {¬A}      1, 6
    8. {}        5, 7
Extension rule: (carefully choose literals ℓi; X must be new/unseen in this CNF)
    X ⇐⇒ ℓ1 ∨ ℓ2
which is equivalent to adding the following clauses:
    ¬X ∨ ℓ1 ∨ ℓ2
    X ∨ ¬ℓ1
    X ∨ ¬ℓ2
Intuition: resolving multiple variables all at once.
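A minimal sketch (mine) of adding such definitions Tseitin-style: the CNF grows by three clauses per fresh variable while staying equisatisfiable with the original formula:

    import itertools

    counter = itertools.count(1)
    neg = lambda l: l[1:] if l.startswith("-") else "-" + l

    def extend_or(l1, l2, clauses):
        # extension rule: fresh X with X <=> (l1 ∨ l2); returns X
        x = f"X{next(counter)}"
        clauses += [[neg(x), l1, l2],    # ¬X ∨ l1 ∨ l2
                    [x, neg(l1)],        # X ∨ ¬l1
                    [x, neg(l2)]]        # X ∨ ¬l2
        return x

    clauses = []
    x1 = extend_or("A", "B", clauses)        # X1 <=> A ∨ B
    x2 = extend_or(x1, "-C", clauses)        # X2 <=> X1 ∨ ¬C
    clauses.append([x2])                     # assert A ∨ B ∨ ¬C via the definitions
    print(clauses)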