Unit-4 Bayesian Networks
Bayesian Networks
Overview
Recalling of Naïve Bayes
Bayesian networks
Making predictions
Learning Bayesian networks
Specific algorithms (K2, Tree Augmented Naïve Bayes (TAN), Bayesian Multinet)
Data structures for fast learning (All-Dimensions (AD) trees)
Recalling of Naïve Bayes
In many Bayesian networks, each node represents a variable, such as someone's height,
age, or gender. A variable might be discrete, such as Gender = {Female, Male}, or might
be continuous, such as someone's age.
Links are added between nodes to indicate that one node directly influences the other.
When a link does not exist between two nodes, this does not mean that they are
completely independent, since they may be connected via other nodes. They may, however,
become dependent or independent depending on the evidence that is set on other nodes.
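The point above can be illustrated with a minimal sketch: two variables A and B with no link between them, but a common parent C. All structure and probability values below are made-up illustration numbers, not part of the original slides.

```python
# Hypothetical example: A and B have no link between them, but share a
# common parent C, so they are dependent until C is observed.
# All probability values are invented for illustration.

p_c = {True: 0.3, False: 0.7}          # p(C)
p_a_given_c = {True: 0.9, False: 0.2}  # p(A=True | C)
p_b_given_c = {True: 0.8, False: 0.1}  # p(B=True | C)

def joint(a, b, c):
    """p(A=a, B=b, C=c) from the factorization p(C) p(A|C) p(B|C)."""
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

# With C unobserved, p(A=True) and p(A=True | B=True) differ:
# A and B are dependent even though no link connects them.
p_a = sum(joint(True, b, c) for b in (True, False) for c in (True, False))
p_ab = sum(joint(True, True, c) for c in (True, False))
p_b = sum(joint(a, True, c) for a in (True, False) for c in (True, False))
print(p_a, p_ab / p_b)

# Once evidence C=True is set, A becomes independent of B:
# p(A=True | B=True, C=True) equals p(A=True | C=True).
p_a_given_bc = joint(True, True, True) / sum(
    joint(a, True, True) for a in (True, False))
print(p_a_given_c[True], p_a_given_bc)
```

Here observing the shared parent C "blocks" the path between A and B, which is exactly the evidence-dependent (in)dependence described above.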
Directed Acyclic Graph
Each node in the graph is a random variable. A node X is a parent of another node Y if there is an arrow from node X to node Y, e.g. A is a parent of B.
[Fig 2: a directed acyclic graph over four nodes A, B, C, D.]
A simple Bayesian network. Example:
Assume all of these nodes correspond to Boolean variables. The 7 nodes are:
1. Late wakeup
2. Accident
3. Rainy day
4. Traffic jam
5. Late for work
6. Late for meeting
7. Meeting postponed
[Fig: a Bayesian network over these seven variables.]
Making Predictions
There are 7 nodes or variables, so a full joint probability table has (2^7) − 1 = 127 independent entries, one probability for each combination of values (the last entry is fixed because the probabilities sum to 1).
[Fig: the seven-node network.]
Making Predictions
From these networks we can read off different conditional independence relationships.
[Fig: the seven-node network.]
General Product Rule
Example of the general product rule, for a network over X1, ..., X6 in which X1 is a parent of X2, X3 and X4; X2 is a parent of X4 and X5; X3 is a parent of X5; and X5 is a parent of X6:
p(x1, x2, x3, x4, x5, x6) = p(x6 | x5) p(x5 | x3, x2) p(x4 | x2, x1) p(x3 | x1) p(x2 | x1) p(x1)
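The factorization above can be sketched directly in code: multiply one conditional probability per node and check that the result is a valid joint distribution. All the CPT numbers below are invented for illustration.

```python
import itertools

# Made-up CPTs for the factorization
# p(x1,...,x6) = p(x6|x5) p(x5|x3,x2) p(x4|x2,x1) p(x3|x1) p(x2|x1) p(x1).
# Each entry is p(variable = True | parent values); all numbers are invented.
p_x1 = 0.6
p_x2 = {True: 0.7, False: 0.2}             # p(x2=True | x1)
p_x3 = {True: 0.4, False: 0.9}             # p(x3=True | x1)
p_x4 = {(True, True): 0.5, (True, False): 0.8,
        (False, True): 0.1, (False, False): 0.3}    # p(x4=True | x2, x1)
p_x5 = {(True, True): 0.6, (True, False): 0.25,
        (False, True): 0.75, (False, False): 0.05}  # p(x5=True | x3, x2)
p_x6 = {True: 0.9, False: 0.15}            # p(x6=True | x5)

def bern(p, value):
    """Probability of a Boolean value given p(True)."""
    return p if value else 1 - p

def joint(x1, x2, x3, x4, x5, x6):
    """One product of local conditionals, following the chain rule above."""
    return (bern(p_x6[x5], x6) * bern(p_x5[(x3, x2)], x5) *
            bern(p_x4[(x2, x1)], x4) * bern(p_x3[x1], x3) *
            bern(p_x2[x1], x2) * bern(p_x1, x1))

# A valid factorization must sum to 1 over all 2**6 assignments:
total = sum(joint(*vals)
            for vals in itertools.product((True, False), repeat=6))
print(round(total, 10))  # 1.0
```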
Arc Reversal - Bayes Rule
[Fig: two networks over X1, X2, X3, each shown alongside the equivalent network obtained by reversing an arc.]
p(x1, x2, x3) = p(x3 | x1) p(x2 | x1) p(x1)
p(x1, x2, x3) = p(x3 | x2, x1) p(x2) p(x1)
By Bayes' rule, reversing an arc in this way leaves the joint distribution unchanged, so each network is equivalent to its reversed counterpart. Arc reversal is used to avoid overfitting (overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data).
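A minimal two-variable sketch of arc reversal with Bayes' rule (the numbers are invented): given p(x1) and p(x2 | x1), compute the reversed parameterization p(x2) and p(x1 | x2), and check that the joint is identical either way.

```python
# Arc reversal sketch with illustrative numbers: the network X1 -> X2,
# parameterized by p(x1) and p(x2 | x1), is converted to the reversed
# network X2 -> X1, parameterized by p(x2) and p(x1 | x2).
p_x1 = 0.3
p_x2_given_x1 = {True: 0.8, False: 0.4}  # p(x2=True | x1)

def bern(p, v):
    """Probability of a Boolean value given p(True)."""
    return p if v else 1 - p

# p(x2=True) by marginalizing out x1
p_x2 = sum(bern(p_x1, x1) * p_x2_given_x1[x1] for x1 in (True, False))

# p(x1 | x2) by Bayes' rule
def p_x1_given_x2(x1, x2):
    return bern(p_x1, x1) * bern(p_x2_given_x1[x1], x2) / bern(p_x2, x2)

# The joint p(x1, x2) is the same under both parameterizations:
for x1 in (True, False):
    for x2 in (True, False):
        forward = bern(p_x1, x1) * bern(p_x2_given_x1[x1], x2)
        reversed_ = bern(p_x2, x2) * p_x1_given_x2(x1, x2)
        assert abs(forward - reversed_) < 1e-12
print("joint unchanged after arc reversal")
```

The same computation extends to the three-node networks in the figure; reversing an arc never changes the joint distribution, only how it is parameterized.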