DATA SCIENCE
UNIT-IV
Mrs. M. VANATHI
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
What is Learning?
• Herbert Simon: “Learning is any process by which a
system improves performance from experience.”
• For a machine, experiences come in the form of
data.
• What is the task? Classification / problem solving / planning / control
• What does it mean to improve performance? Learning is guided by an objective: a notion of loss or gain
Machine Learning - Definitions
• Machine Learning - Field of study that gives computers
the ability to learn without being explicitly programmed
(Arthur Samuel 1959)
• A branch of artificial intelligence, concerned with the
design and development of algorithms that allow
computers to evolve behaviors based on empirical data
(experience)
• Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task (or tasks drawn from a population of similar tasks) more effectively the next time (H. Simon, 1983)
So What Is Machine Learning?
• Automating automation: getting computers to program themselves
• Writing software is the bottleneck, so let the data do the work instead!
(Figure: Traditional Programming vs. Machine Learning)
Contd.,
Traditional programming: write programs using hard-coded (fixed) rules
Machine Learning teams
• Observe the world (Labeled Data)
• Develop models that match observations
• Teach computer to learn these models
• Computer applies learned model to the world
Provides various techniques that can learn from and make predictions on
data
Machine Learning Workflow
ML Terminology
Examples: Items or instances used for learning or evaluation.
Features: Set of attributes represented as a vector associated with an example.
Labels: Values or categories assigned to examples. In classification the labels
are categories; in regression the labels are real numbers.
Target: The correct label for a training example. This is extra data that is
needed for supervised learning.
Output: The label predicted for an input set of features using the model of the machine learning algorithm.
Training sample: Examples used to train a machine learning algorithm.
Validation sample: Examples used to tune parameters of a learning
algorithm.
Model: Information that the machine learning algorithm stores after training.
The model is used when predicting the output labels of new, unseen examples.
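A minimal scikit-learn sketch of this terminology (the Iris dataset and logistic-regression estimator are illustrative choices, not part of the slides):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)    # examples: feature vectors X with labels y

# Training sample: used to fit the model.
# Validation sample: held out to tune the learning algorithm's parameters.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # the stored model
output = model.predict(X_val)        # output: predicted labels for unseen examples
print("validation accuracy:", model.score(X_val, y_val))
```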
Key Elements of Machine Learning
Machine Learning Areas
Supervised Learning: Data and corresponding labels are
given
Example: Spam Filter
Classification
In classification, we predict labels y (classes) for inputs x
Examples:
OCR (input: images, classes: characters)
Medical diagnosis (input: symptoms, classes: diseases)
Automatic essay grader (input: document, classes: grades)
Fraud detection (input: account activity, classes: fraud / no fraud)
Customer service email routing
Recommended articles in a newspaper, recommended books
DNA and protein sequence identification
Categorization and identification of astronomical images
Financial investments
… many more
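As a hedged illustration of one task from this list, a tiny bag-of-words spam filter; the corpus, labels, and query below are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting at noon", "cheap money offer", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(emails)          # inputs x: word-count feature vectors
clf = MultinomialNB().fit(X, labels)   # learns class priors and word likelihoods

print(clf.predict(vec.transform(["free money"])))  # -> ['spam']
```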
Inductive learning
Simplest form: learn a function from examples
f is the target function
An example is a pair (x, f(x))
Inductive learning method
Construct/adjust h to agree with f on training set
(h is consistent if it agrees with f on all examples)
E.g., curve fitting:
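A small NumPy sketch of curve fitting as hypothesis adjustment; the sample points and polynomial degrees are illustrative assumptions:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 4.2, 8.8, 16.1])    # noisy samples of an f(x) near x^2

for degree in (1, 2, 4):
    h = np.polyfit(x, y, degree)             # adjust hypothesis h to the training set
    training_error = np.sum((np.polyval(h, x) - y) ** 2)
    print(f"degree {degree}: training error = {training_error:.4f}")

# The degree-4 h is consistent (it agrees with all five examples),
# but a high-degree fit may not generalize to new examples.
```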
Supervised Learning
Learning a discrete function: Classification
Boolean classification:
Each example is classified as true (positive) or false (negative).
Applications of SL
Pattern recognition
Medical diagnosis
Speech recognition
Face recognition
Unsupervised Learning
In unsupervised learning, there is no such supervisor and we only have input data. The aim is to find the regularities in the input: there is a structure to the input space such that certain patterns occur more often than others, and we want to see what generally happens and what does not.
Types of USL
Dimensionality reduction
Clustering
Scenario
A company has data about its past customers; the customer data can be clustered into groups of similar customers (e.g., for customer segmentation).
Contd.,
The above image shows a robot, a diamond, and fire.
The goal of the robot is to get the reward, the diamond, while avoiding the hurdles, the fire.
The robot learns by trying all the possible paths and then choosing the path that reaches the reward with the fewest hurdles.
Each right step gives the robot a reward, and each wrong step subtracts from its reward. The total reward is calculated when it reaches the final reward, the diamond; a toy sketch of this trial-and-error learning follows.
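A toy sketch of this idea written as tabular Q-learning on a one-dimensional corridor; the slides name no algorithm, and the grid size, rates, and episode count are illustrative assumptions:

```python
import random

n_states, actions = 5, (-1, +1)          # states 0..4; actions: move left/right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def step(s, a):
    s2 = min(max(s + a, 0), n_states - 1)
    if s2 == n_states - 1: return s2, +1.0, True   # diamond: positive reward
    if s2 == 0:            return s2, -1.0, True   # fire: reward is subtracted
    return s2, 0.0, False

for _ in range(500):                     # episodes of trying possible paths
    s, done = 2, False
    while not done:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Learned policy: from every interior state, move right (+1) toward the diamond.
print({s: max(actions, key=lambda act: Q[(s, act)]) for s in range(1, n_states - 1)})
```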
Comparison Table

Criteria         | Supervised ML                 | Unsupervised ML            | Reinforcement ML
Type of problems | Regression and classification | Association and clustering | Exploitation or exploration
Dimensionality Reduction
Curse of Dimensionality
Increasing the number of
features will not always improve
classification accuracy.
Contd.,
Feature extraction: finds a set of new features (i.e., through some mapping $f()$) from the existing features. The mapping $f()$ could be linear or non-linear:
$y = f(x)$, mapping $x = (x_1, x_2, \ldots, x_N)^T$ to $y = (y_1, y_2, \ldots, y_K)^T$, with $K < N$
Feature selection: chooses a subset of the original features:
$y = (x_{i_1}, x_{i_2}, \ldots, x_{i_K})^T$
Feature Extraction
• Linear combinations are particularly attractive because they are
simpler to compute and analytically tractable.
Vector Representation
• A vector $x \in R^N$ can be represented by N components: $x = (x_1, x_2, \ldots, x_N)^T$
• Assuming the standard basis $\langle v_1, v_2, \ldots, v_N \rangle$ (i.e., unit vectors in each dimension), $x_i$ can be obtained by projecting x along the direction of $v_i$:
$x_i = \frac{x^T v_i}{v_i^T v_i}$
• x can be “reconstructed” from its projections as follows:
$x = \sum_{i=1}^{N} x_i v_i = x_1 v_1 + x_2 v_2 + \ldots + x_N v_N$
• Since the basis vectors are the same for all $x \in R^N$ (standard basis), we typically represent x simply as an N-component vector.
Contd.,
• Example assuming N = 2: $x = (x_1, x_2)^T = (3, 4)^T$

Covariance of the data (Step 3): with sample mean $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$ and $\Phi_i = x_i - \bar{x}$,
$\Sigma_x = \frac{1}{M}\sum_{i=1}^{M}(x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{M}\sum_{i=1}^{M}\Phi_i \Phi_i^T = \frac{1}{M} A A^T$, where $A = [\Phi_1\ \Phi_2\ \ldots\ \Phi_M]$
i.e., the columns of A are the $\Phi_i$ (N x M matrix)
PCA - Steps
Step 4: compute the eigenvalues/eigenvectors of $\Sigma_x$:
$\Sigma_x u_i = \lambda_i u_i$, where we assume $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N$
Note: most software packages return the eigenvalues (and corresponding eigenvectors) in decreasing order; if not, you can explicitly put them in this order.

Using diagonalization: $\Sigma_x = P \Lambda P^T$, where the columns of P are the eigenvectors of $\Sigma_x$ and the diagonal elements of $\Lambda$ are the eigenvalues, i.e., the variances.

Projecting the centered data, $y_i = U^T(x_i - \bar{x}) = P^T \Phi_i$, the covariance of the projected data is
$\Sigma_y = \frac{1}{M}\sum_{i=1}^{M}(y_i - \bar{y})(y_i - \bar{y})^T = \frac{1}{M}\sum_{i=1}^{M} y_i y_i^T = \frac{1}{M}\sum_{i=1}^{M}(P^T\Phi_i)(P^T\Phi_i)^T$
$= P^T \Big(\frac{1}{M}\sum_{i=1}^{M}\Phi_i\Phi_i^T\Big) P = P^T \Sigma_x P = P^T (P \Lambda P^T) P = \Lambda$
i.e., the projected features are decorrelated, with variances given by the eigenvalues.
Example
• Compute the PCA of the following dataset:
(1,2),(3,3),(3,5),(5,4),(5,6),(6,5),(8,7),(9,8)
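A hedged NumPy sketch of the PCA steps on this dataset (the eigen-solver and explicit reordering mirror the note above about decreasing eigenvalues):

```python
import numpy as np

X = np.array([(1, 2), (3, 3), (3, 5), (5, 4), (5, 6), (6, 5), (8, 7), (9, 8)], dtype=float)

x_bar = X.mean(axis=0)                 # sample mean
Phi = X - x_bar                        # centered data, Phi_i = x_i - x_bar
Sigma_x = Phi.T @ Phi / len(X)         # covariance (1/M) A A^T
vals, vecs = np.linalg.eigh(Sigma_x)   # eigenvalues/eigenvectors of Sigma_x
order = np.argsort(vals)[::-1]         # put eigenvalues in decreasing order
vals, vecs = vals[order], vecs[:, order]

print("eigenvalues:", vals)
print("principal direction u1:", vecs[:, 0])
y = Phi @ vecs                         # projections y_i = P^T (x_i - x_bar)
```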
Contd.,
• The eigenvectors are the solutions of the systems:
$\Sigma_x u_i = \lambda_i u_i$
Application to Images
• The goal is to represent images in a space of lower
dimensionality using PCA.
o Useful for various applications, e.g., face recognition, image
compression, etc.
Application to Images Contd.,
• The key challenge is that the covariance matrix $\Sigma_x$ is now very large (i.e., $N^2 \times N^2$) – see Step 3:
$\Sigma_x = \frac{1}{M} A A^T$, where A is an $N^2 \times M$ matrix (each image, flattened, is an $N^2$-dimensional column)
Application to Images Contd.,
• But do $AA^T$ and $A^TA$ have the same number of eigenvalues/eigenvectors?
− $AA^T$ can have up to $N^2$ eigenvalues/eigenvectors.
− $A^TA$ can have up to M eigenvalues/eigenvectors.
− It can be shown that the M eigenvalues/eigenvectors of $A^TA$ are also the M largest eigenvalues/eigenvectors of $AA^T$.
Application to Images Contd.,
Step 3: compute $A^TA$ (an M x M matrix) instead of $AA^T$
Step 4a: compute the eigenvalues $\mu_i$ and eigenvectors $v_i$ of $A^TA$
Step 4b: compute the $\lambda_i$, $u_i$ of $AA^T$ using $\lambda_i = \mu_i$ and $u_i = A v_i$, then normalize each $u_i$ to unit length. A NumPy sketch follows.
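A NumPy sketch of this shortcut under assumed toy shapes (M = 20 images of 64 × 64 pixels; random numbers stand in for centered face images):

```python
import numpy as np

M, N = 20, 64
A = np.random.rand(N * N, M)       # columns: centered, flattened images (toy data)

mu, V = np.linalg.eigh(A.T @ A)    # eigenpairs of the small M x M matrix A^T A
U = A @ V                          # Step 4b: u_i = A v_i (eigenvectors of A A^T)
U /= np.linalg.norm(U, axis=0)     # normalize each u_i to unit length
# lambda_i = mu_i; the columns of U are (up to M) top eigenvectors of A A^T.
```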
Dataset
Example Contd., Top eigenvectors: u1,…uk
(visualized as an image - eigenfaces)
Example Contd.,
How can you visualize the eigenvectors (eigenfaces) as an image?
• Their values must first be mapped to integer values in the interval [0, 255] (required by the PGM format).
• Suppose $f_{min}$ and $f_{max}$ are the min/max values of a given eigenface (could be negative).
• If $x \in [f_{min}, f_{max}]$ is the original value, then the new value $y \in [0, 255]$ can be computed with the linear rescaling
$y = 255 \cdot \frac{x - f_{min}}{f_{max} - f_{min}}$
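The same rescaling as a small NumPy helper (the function name is illustrative):

```python
import numpy as np

def to_pgm_range(eigenface):
    """Map eigenface values from [f_min, f_max] to integers in [0, 255]."""
    f_min, f_max = eigenface.min(), eigenface.max()
    return np.round(255 * (eigenface - f_min) / (f_max - f_min)).astype(np.uint8)
```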
Example Contd.,
Interpretation: represent a face in terms of eigenfaces:
$\hat{x} = \sum_{i=1}^{K} y_i u_i = y_1 u_1 + y_2 u_2 + \ldots + y_K u_K \approx x - \bar{x}$
i.e., the (centered) face x is represented by its coefficient vector $y = (y_1, y_2, \ldots, y_K)^T$.
Limitations
• Background changes cause problems
- De-emphasize the outside of the face (e.g., by multiplying the input
image by a 2D Gaussian window centered on the face).
• Light changes degrade performance
- Light normalization might help but this is a challenging issue.
• Performance decreases quickly with changes to face size
- Scale input image to multiple sizes.
- Multi-scale eigenspaces.
• Performance decreases with changes to face orientation (but not as fast
as with scale changes)
- Out-of-plane rotations are more difficult to handle.
- Multi-orientation eigenspaces.
Limitations contd.,
• Not robust to misalignment
Limitations contd.,
• PCA is not always an optimal dimensionality-reduction technique
for classification purposes.
Linear Discriminant Analysis (LDA)
What is the goal of LDA?
• Seeks to find directions along which the classes are best separated
(i.e., increase discriminatory information).
• It takes into consideration the scatter (i.e., variance) within-classes
and between-classes.
• Let $\mu_i$ be the mean of the i-th class, i = 1, 2, …, C, and $\mu$ the mean of the whole dataset:
$\mu = \frac{1}{C}\sum_{i=1}^{C}\mu_i$
Within-class scatter matrix:
$S_w = \sum_{i=1}^{C}\sum_{j=1}^{M_i}(x_j - \mu_i)(x_j - \mu_i)^T$
Between-class scatter matrix:
$S_b = \sum_{i=1}^{C}(\mu_i - \mu)(\mu_i - \mu)^T$
Linear Discriminant Analysis (LDA) Contd.,
• Suppose the desired projection transformation is:
$y = U^T x$
• Suppose the scatter matrices of the projected data y are $\tilde{S}_b$ and $\tilde{S}_w$.
• LDA seeks transformations that maximize the between-class scatter and minimize the within-class scatter:
$\max \dfrac{|\tilde{S}_b|}{|\tilde{S}_w|} \quad \text{or} \quad \max \dfrac{|U^T S_b U|}{|U^T S_w U|}$
Linear Discriminant Analysis (LDA) Contd.,
• It can be shown that the columns of the matrix U are the
eigenvectors (i.e., called Fisherfaces) corresponding to the largest
eigenvalues of the following generalized eigen-problem:
$S_b u_k = \lambda_k S_w u_k$
• It can be shown that Sb has at most rank C-1; therefore, the max
number of eigenvectors with non-zero eigenvalues is C-1, that
is:
max dimensionality of LDA sub-space is C-1
e.g., when C=2, we always end up with one LDA feature
no matter what the original number of features was!
Example
Linear Discriminant Analysis (LDA) Contd.,
• If Sw is non-singular, we can solve a conventional eigenvalue
problem as follows:
$S_b u_k = \lambda_k S_w u_k \;\Rightarrow\; S_w^{-1} S_b u_k = \lambda_k u_k$
Linear Discriminant Analysis (LDA) Contd.,
To alleviate this problem (i.e., when $S_w$ is singular), PCA could be applied first:
1) First, apply PCA to reduce data dimensionality:
$x = (x_1, x_2, \ldots, x_N)^T \;\xrightarrow{\text{PCA}}\; y = (y_1, y_2, \ldots, y_M)^T$, with $M < N$
2) Then, apply LDA in the reduced space (a sketch of this two-stage pipeline follows).
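A hedged scikit-learn sketch of the two-stage pipeline (the dataset and component counts are arbitrary stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)              # C = 3 classes, N = 4 features

# PCA first reduces dimensionality; LDA then yields at most C-1 = 2 features.
pipe = make_pipeline(PCA(n_components=3), LinearDiscriminantAnalysis(n_components=2))
Z = pipe.fit_transform(X, y)
print(Z.shape)                                  # (150, 2)
```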
Classification
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label
The set of tuples used for model construction is training set
The model is represented as classification rules, decision trees, or
mathematical formulae
Model usage: for classifying future or unknown objects
Estimate the accuracy of the model
The known label of each test sample is compared with the classified result from the model
The test set is independent of the training set; otherwise over-fitting will occur
If the accuracy is acceptable, use the model to classify data tuples
whose class labels are not known
Illustrating Classification Task

Training Set:
Tid | Attrib1 | Attrib2 | Attrib3 | Class
1   | Yes     | Large   | 125K    | No
2   | No      | Medium  | 100K    | No
3   | No      | Small   | 70K     | No
6   | No      | Medium  | 60K     | No

The learning algorithm induces a model from the training set; the model is then applied to the test set.

Test Set:
Tid | Attrib1 | Attrib2 | Attrib3 | Class
11  | No      | Small   | 55K     | ?
15  | No      | Large   | 67K     | ?
Issues: Data Preparation
• Data cleaning
– Preprocess data in order to reduce noise and handle missing
values
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
• Data transformation
– Generalize data to higher-level concepts (e.g., concept hierarchies, discretization)
– Normalize attribute values
Classification Techniques
Decision Tree based Methods
Rule-based Methods
KNN
Naïve Bayes and Bayesian Belief Networks
Neural Networks
Support Vector Machines
and more...
Learning decision trees
Example Problem: decide whether to wait for a table at a restaurant,
based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60,
>60)
Feature(Attribute)-based representations
• Examples described by feature(attribute) values
– (Boolean, discrete, continuous)
• E.g., situations where I will/won't wait for a table:
Expressiveness
Decision trees can express any function of the input attributes.
E.g., for Boolean functions, truth table row → path to leaf:
Trivially, there is a consistent decision tree for any training set with
one path to leaf for each example (unless f nondeterministic in x) but
it probably won't generalize to new examples
Rule-Based Classifier
Classify records by using a collection of “if…then…” rules
Rule: (Condition) → y
where
Condition is a conjunction of attribute tests
y is the class label
LHS: rule antecedent or condition
RHS: rule consequent
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
Rule-based Classifier (Example)

Name          | Blood Type | Give Birth | Can Fly | Live in Water | Class
human         | warm       | yes        | no      | no            | mammals
python        | cold       | no         | no      | no            | reptiles
salmon        | cold       | no         | no      | yes           | fishes
whale         | warm       | yes        | no      | yes           | mammals
frog          | cold       | no         | no      | sometimes     | amphibians
komodo        | cold       | no         | no      | no            | reptiles
bat           | warm       | yes        | yes     | no            | mammals
pigeon        | warm       | no         | yes     | no            | birds
cat           | warm       | yes        | no      | no            | mammals
leopard shark | cold       | yes        | no      | yes           | fishes
turtle        | cold       | no         | no      | sometimes     | reptiles
penguin       | warm       | no         | no      | sometimes     | birds
porcupine     | warm       | yes        | no      | no            | mammals
eel           | cold       | no         | no      | yes           | fishes
salamander    | cold       | no         | no      | sometimes     | amphibians
gila monster  | cold       | no         | no      | no            | reptiles
platypus      | warm       | no         | no      | no            | mammals
owl           | warm       | no         | yes     | no            | birds
dolphin       | warm       | yes        | no      | yes           | mammals
eagle         | warm       | no         | yes     | no            | birds
K-Nearest Neighbors
Given a query item: find the k closest matches and return the most frequent label among them.
K-Nearest Neighbors
k = 3 votes for “cat”
K-Nearest Neighbors
2 votes for cat, 1 each for buffalo, deer, and lion: cat wins.
K-NN Issues
The Data is the Model
• No training needed.
• Accuracy generally improves with more data.
• Matching is simple and fast (and single pass).
• Usually need data in memory, but can be run off disk.
Minimal Configuration:
• Only parameter is k (number of neighbors)
• Two other choices are important:
• Weighting of neighbors (e.g. inverse distance)
• Similarity metric
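A minimal scikit-learn sketch reflecting these choices, i.e. k, inverse-distance weighting, and the similarity metric (the dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3,       # k, the only required parameter
                           weights="distance",  # inverse-distance neighbor weighting
                           metric="euclidean")  # similarity metric
knn.fit(X_train, y_train)                       # "training" just stores the data
print("accuracy:", knn.score(X_test, y_test))
```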
K-NN Metrics
• Euclidean distance: simplest, fast to compute: $d(x, y) = \lVert x - y \rVert$
Linear Regression
• The predicted value of y is given by $\hat{y} = X\beta$, i.e., each prediction is a row of X dotted with the coefficient vector $\beta$.
• There are many gradient-based methods which reduce the RSS error by taking the derivative with respect to $\beta$.
• These updates happen many times in one pass over the dataset.
Statistic: some value which should be small under the null hypothesis, and large if the alternate hypothesis is true.
The $R^2$ statistic for the fitted line is
$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$
The subtracted ratio can be described as the fraction of the total variance not explained by the model; its numerator (the residual sum of squares about the line of best fit) is small if the fit is good. A NumPy sketch of these quantities follows.
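A NumPy sketch on made-up data (here $\beta$ comes from a least-squares solver; the gradient-based updates described above would reach the same fit):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = np.column_stack([np.ones_like(x), x])     # design matrix: intercept + slope
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # beta minimizing the RSS
y_hat = X @ beta                              # predictions: rows of X dotted with beta

rss = np.sum((y - y_hat) ** 2)                # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
r2 = 1 - rss / tss                            # R^2 = 1 - RSS/TSS
print(f"beta = {beta}, R^2 = {r2:.4f}")       # R^2 near 1 indicates a good fit
```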
R²-values and P-values
• Statistic: from R-squared we can derive another statistic (using degrees of freedom) that has a standard distribution called an F-distribution.
• From the CDF of the F-distribution, we can derive a P-value for the data.
• The P-value is, as usual, the probability of observing the data under the null hypothesis of no linear relationship.
Issues (decision tree induction)
Determine how to split the records
How to specify the attribute test condition?
How to determine the best split?
Determine when to stop splitting
Example of node impurity:
C0: 5, C1: 5 (non-homogeneous, high degree of impurity)
C0: 9, C1: 1 (homogeneous, low degree of impurity)
Measures of node impurity include the Gini index and the misclassification error.
Gini Index:
$GINI(t) = 1 - \sum_j [\,p(j \mid t)\,]^2$
Gini index of a split into k partitions:
$GINI_{split} = \sum_{i=1}^{k} \frac{n_i}{n}\, GINI(i)$
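A small Python sketch of these two formulas, reusing the illustrative class counts above:

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2 for the class counts at node t."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """Weighted Gini of a split: sum_i (n_i / n) * GINI(i)."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

print(gini([5, 5]))                  # 0.5  -> non-homogeneous, high impurity
print(gini([9, 1]))                  # 0.18 -> homogeneous, low impurity
print(gini_split([[5, 5], [9, 1]]))  # 0.34 -> weighted impurity of the split
```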
Advantages of decision trees:
Easy to construct/implement
Extremely fast at classifying unknown records
Models are easy to interpret for small-sized trees
Accuracy is comparable to other classification techniques for
many simple data sets
Tree models make no assumptions about the distribution of the
underlying data : nonparametric
Have a built-in feature selection method that makes them
immune to the presence of useless variables
Bayesian network example (stock market): variables X1, …, X6 with directed edges encoding the factorization
$p(x_1, x_2, x_3, x_4, x_5, x_6) = p(x_6 \mid x_5)\, p(x_5 \mid x_3, x_2)\, p(x_4 \mid x_2, x_1)\, p(x_3 \mid x_1)\, p(x_2 \mid x_1)\, p(x_1)$
Example: a Bayesian network for paper review with nodes W (Good Writer), S, Q (Quality), M (Reviewer Mood), L (Review Length), and A (Accepted).
nodes = domain variables
edges = direct causal influence
CPT for $P(Q \mid W, S)$ (probability pairs for the four (W, S) combinations): (0.6, 0.4), (0.3, 0.7), (0.4, 0.6), (0.1, 0.9)
The joint distribution factorizes as
$P(w, s, m, q, l, a) = P(w)\,P(s)\,P(m \mid w)\,P(q \mid w, s)\,P(l \mid m)\,P(a \mid m, q)$
Compact & natural representation:
nodes with at most k parents need $O(2^k n)$ parameters instead of $O(2^n)$
the parameters are natural (conditional probabilities)
Allows combination of different types of reasoning:
Causal: P(Reviewer-Mood | Good-Writer)
Evidential: P(Reviewer-Mood | not Accepted)
Intercausal: P(Reviewer-Mood | not Accepted, Quality)
If you have a Boolean variable with k Boolean parents, its conditional probability table has $2^{k+1}$ probabilities (but only $2^k$ need to be stored)
$P(C, S) = P(C \mid S) \cdot P(S)$

Joint distribution of Smoking (S) and Cancer (C):

S \ C | none  | benign | malignant | total (P(Smoke))
no    | 0.768 | 0.024  | 0.008     | 0.800
light | 0.132 | 0.012  | 0.006     | 0.150
heavy | 0.035 | 0.010  | 0.005     | 0.050
total | 0.935 | 0.046  | 0.019     |

(column totals give P(Cancer))
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?
$P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W)} = \frac{P(W \mid R)\,P(R)}{P(W \mid R)\,P(R) + P(W \mid \neg R)\,P(\neg R)} = \frac{0.9 \times 0.4}{0.9 \times 0.4 + 0.2 \times 0.6} = 0.75$
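The same diagnostic computation as a few lines of Python, using the slide's numbers:

```python
p_w_given_r, p_r = 0.9, 0.4       # P(W|R) and P(R)
p_w_given_not_r = 0.2             # P(W|~R)

p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)  # P(W), total probability
p_r_given_w = p_w_given_r * p_r / p_w                  # Bayes' rule
print(p_r_given_w)                                     # 0.75
```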
Step 1
Determine what the propositional (random) variables should be
Determine causal (or another type of influence) relationships and develop the topology of the network

Example: Burglary and Earthquake are parents of Alarm, with CPT P(A | B, E):

B | E | A = True | A = False
T | T | 0.950    | 0.050
T | F | 0.940    | 0.060
F | T | 0.290    | 0.710
F | F | 0.001    | 0.999
Priors: P(B) = 0.001 (Burglary), P(E) = 0.002 (Earthquake)

B | E | P(A | B, E)
T | T | 0.95
T | F | 0.94
F | T | 0.29
F | F | 0.001

P(M | A): for A = F, 0.05
Population-wide ANomaly Detection and
Assessment (PANDA)
A detector specifically for a large-scale outdoor release of
inhalational anthrax
Uses a massive causal Bayesian network
(Figure: population-wide approach, with an Anthrax Release global node.)
$P(j \wedge m \wedge a \wedge \neg b \wedge \neg e) = P(j \mid a)\,P(m \mid a)\,P(a \mid \neg b \wedge \neg e)\,P(\neg b)\,P(\neg e) = 0.9 \times 0.7 \times 0.001 \times 0.999 \times 0.998 \approx 0.00062$
(Figure: connection types, with Lung Cancer (L) and Bronchitis (B) on a converging connection; knowing T, an intermediate cause, makes A and X independent.)
$P(J \wedge M \wedge A \wedge \neg B \wedge \neg E) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B, \neg E)\,P(\neg B)\,P(\neg E)$
In general:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(X_i))$
Given its parent (Alarm), JohnCalls is conditionally independent of the other variables in the network.
$P(A) = P(B)P(E)(0.95) + P(\neg B)P(E)(0.29) + P(B)P(\neg E)(0.94) + P(\neg B)P(\neg E)(0.001)$
$\quad = (0.001)(0.002)(0.95) + (0.999)(0.002)(0.29) + (0.001)(0.998)(0.94) + (0.998)(0.999)(0.001)$
$\quad = 0.002517$

$P(J) = P(A)(0.9) + P(\neg A)(0.05) = (0.002517)(0.9) + (0.9975)(0.05) = 0.052$

$P(B \mid J) = \frac{P(J \mid B)\,P(B)}{P(J)} = \frac{(0.85)(0.001)}{0.052} \approx 0.016$

Many false positives.
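A hedged sketch of the same inference by brute-force enumeration of the joint P(b, e, a, j) of this network; the P(J | A) values 0.90/0.05 are the ones used in the computation above:

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}        # P(JohnCalls | Alarm)

def joint(b, e, a, j):
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return p * (P_J[a] if j else 1 - P_J[a])

p_j = sum(joint(b, e, a, True) for b, e, a in product([True, False], repeat=3))
p_b_and_j = sum(joint(True, e, a, True) for e, a in product([True, False], repeat=2))
print(round(p_j, 3))                   # ~0.052
print(round(p_b_and_j / p_j, 3))       # P(B|J) ~0.016: many false positives
```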
Neural Network
An artificial neural network is composed of processing elements (neurons) organized in layers.
A typical structure has three layers: input, intermediate (called the hidden layer), and output.
ANNs are inherently parallel, distributed processes; most ANN software runs on sequential machines emulating distributed processes.
Pros and Cons of Neural Networks
Pros:
Good to use in continuous domains with little knowledge.
Ability to solve problems that are difficult to define.
Can be used when a good functional model is not known.
Provides human characteristics to problem solving that are difficult to simulate.
Flexibility and ease of maintenance.
Fast processing speed.
Cons:
Not interpretable (“black box”).
Learning is slow.
Good generalization can require many data points.
Feedforward networks: signals flow in one direction only; no feedback or cycles.
Common unit types: the linear neuron; the logistic neuron, with a sigmoidal (S-shaped) activation; and softmax output units $y_k$.
Overall Network Function
• Combining stages of the overall function with sigmoidal output units.
• Gradient descent weight update: $w^{(\tau+1)} = w^{(\tau)} - \eta\,\nabla E_n(w^{(\tau)})$ (a sketch follows)
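A minimal sketch of this update rule for a single linear neuron with squared error; the data, learning rate, and step count are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)   # synthetic targets

w, eta = np.zeros(3), 0.1
for tau in range(500):
    n = rng.integers(len(X))                   # stochastic: one pattern n per step
    grad = (X[n] @ w - y[n]) * X[n]            # grad of E_n(w) = 0.5 (w.x_n - y_n)^2
    w = w - eta * grad                         # w(tau+1) = w(tau) - eta * grad E_n
print(w)                                       # approaches w_true
```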
Descent Methods
• Newton–Raphson (second order; uses the second derivatives $\nabla^2$)
• All of these can be used here; stochastic gradient descent is particularly effective.

The test set is reserved for estimating the performance of a fully-specified classifier. To evaluate the model while still building and tuning it, create a third subset of the data known as the validation set.