AI Lecture 9

The document provides an overview of machine learning, defining it as the ability of computers to learn without explicit programming, and discusses applications such as spam filtering, stock market prediction, and self-driving cars. It covers the main machine learning methods, including supervised, unsupervised, and reinforcement learning, as well as decision trees for classification and regression tasks. It also explains entropy, information gain, and techniques for improving decision tree accuracy, such as pruning and early stopping.


ARTIFICIAL INTELLIGENCE
DR. MANAL TANTAWI
ASSOCIATE PROFESSOR IN SCIENTIFIC COMPUTING DEPARTMENT
FACULTY OF COMPUTER & INFORMATION SCIENCES
AIN SHAMS UNIVERSITY
PART 9

➢ Introduction to Machine Learning
➢ Decision Trees
INTRODUCTION TO MACHINE LEARNING
DEFINITION OF MACHINE LEARNING

➢ Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
DEFINITION OF MACHINE LEARNING

➢ Tom Mitchell (1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

MOTIVATING EXAMPLE

• SPAM FILTERING
Spam: all emails you don't want to receive and have not asked to receive.

T: Identify spam emails
P: The number of correctly classified emails as spam/non-spam
E: A database of emails that were labelled by users.
MACHINE LEARNING APPLICATIONS

➢ PATTERN RECOGNITION (CLASSIFICATION)
• DIAGNOSIS
• OBJECT RECOGNITION
• BIOMETRICS
• SPEECH RECOGNITION
• HUMAN MACHINE INTERFACES
➢ STOCK MARKET PREDICTION
➢ SELF-DRIVING CARS
➢ NATURAL LANGUAGE PROCESSING
➢ RECOMMENDATION SYSTEMS
MACHINE LEARNING

• Supervised learning
• Unsupervised learning
• Reinforcement learning
MACHINE LEARNING WORKFLOW

1. Collect and understand data
2. Prepare data
3. Train a model
4. Test the model
5. Improve accuracy (improving previous steps)
COLLECT AND UNDERSTAND DATA
Typically, data is represented in tables. The columns are referred to as features of the data,
and the rows are referred to as examples.

PREPARE DATA
Real-world data is never ideal to work with. Data might be sourced from different systems and different organizations, which may have different standards and rules for data integrity. There are often missing values, columns with the same value for all examples, empty columns, inconsistent data, categorical data that needs to be encoded, and data in formats that are difficult for the algorithms we want to use.

(Figure: example of a dataset with missing and ambiguous values.)
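A minimal pandas sketch of these preparation steps (the file and column names "emails.csv", "age", and "color" are hypothetical, not from the lecture):

```python
# Minimal data-preparation sketch (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("emails.csv")                      # load raw data
df = df.dropna(axis=1, how="all")                   # drop empty columns
df = df.loc[:, df.nunique(dropna=False) > 1]        # drop constant-value columns
df["age"] = df["age"].fillna(df["age"].median())    # impute missing numeric values
df = pd.get_dummies(df, columns=["color"])          # one-hot encode categorical data
```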
SUPERVISED LEARNING

• SUPERVISED LEARNING (SL)
➢ Is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
➢ In supervised learning, each example is a pair consisting of an input object (vector) and a desired output value. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new (unseen) examples (generalization).
SUPERVISED LEARNING

• Regression
• Classification

REGRESSION (PREDICTING HOUSE PRICES)

(Figures: fitting house-price data with linear and non-linear regression models.)
CLASSIFICATION

• Rule based learning
• Instance based learning (lazy learning)
• Experience based learning
CLASSIFICATION

➢ Rule based learning: The defining characteristic of a rule-based machine learner is the identification and utilization of a set of relational rules that collectively represent the knowledge captured by the system using some form of learning algorithm.

➢ Instance based learning: Instead of performing explicit generalization, it compares new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy".

➢ Experience based learning: It infers a function from labeled training data consisting of a set of training examples, which can be used for classifying new (unseen) examples (generalization).
RULE BASED LEARNING

Corresponding Rules:
• If it is Sunny, temperature is Hot, humidity is High, and wind is Weak, then I will not play.
• If it is Sunny, temperature is Hot, humidity is High, and wind is Strong, then I will not play.
• If it is Overcast, temperature is Hot, humidity is High, and wind is Weak, then I will play.
…
• If it is Rain, temperature is Mild, humidity is High, and wind is Strong, then I will not play.
DRAWBACKS OF RULE-BASED SYSTEMS

• NO LEARNING: THE SYSTEM JUST USES THE SET OF RULES GIVEN BY THE KNOWLEDGE ENGINEER.
• SOMETIMES MORE THAN ONE RULE CAN BE ACTIVATED, OR NONE OF THEM.
• FOR LARGE DATASETS, THERE WILL BE TOO MANY LONG RULES.
• RULES CONTAIN REDUNDANCIES AND UNNECESSARY CONDITIONS.
• OVERFITTING (LOW OR ZERO TRAINING ERROR AND HIGH TEST ERROR).
DECISION TREES

• DECISION TREE LEARNING IS A METHOD FOR APPROXIMATING DISCRETE-VALUED TARGET FUNCTIONS THAT IS ROBUST TO NOISY DATA AND CAPABLE OF LEARNING DISJUNCTIVE EXPRESSIONS.
• LEARNED TREES CAN ALSO BE RE-REPRESENTED AS SETS OF IF-THEN RULES TO IMPROVE HUMAN READABILITY.
• DECISION TREES ARE A NON-PARAMETRIC SUPERVISED LEARNING METHOD USED FOR BOTH CLASSIFICATION AND REGRESSION TASKS.
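As a quick illustration of this supervised use, a minimal scikit-learn sketch (the tiny integer-encoded dataset is invented for the example):

```python
# Decision-tree classification sketch with scikit-learn.
from sklearn.tree import DecisionTreeClassifier

# Invented toy features: [outlook, humidity], integer-encoded.
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0]]
y = ["No", "Yes", "Yes", "No", "Yes"]

clf = DecisionTreeClassifier(criterion="entropy")   # information-gain-style splits
clf.fit(X, y)
print(clf.predict([[1, 0]]))                        # classify an unseen example
```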
DECISION TREE FOR PLAY TENNIS

Corresponding Rules:
• If it is sunny and humidity is high, then I will not play.
• If it is sunny and humidity is normal, then I will play.
• If it is overcast, then I will play.
• If it is raining and wind is strong, then I will not play.
• If it is raining and wind is weak, then I will play.
CONTINUE…

• EACH INTERNAL NODE TESTS AN ATTRIBUTE.
• EACH BRANCH CORRESPONDS TO AN ATTRIBUTE VALUE.
• EACH LEAF NODE ASSIGNS A CLASSIFICATION.

WHEN TO CONSIDER DECISION TREES?

• INSTANCES DESCRIBABLE BY ATTRIBUTE-VALUE PAIRS.
• TARGET FUNCTION IS DISCRETE VALUED.
• POSSIBLY NOISY TRAINING DATA.
• DISJUNCTIVE HYPOTHESES MAY BE REQUIRED.
TYPES OF DECISION TREES

• Classification Tree (classifies things into categories)
• Regression Tree (predicts a numeric value)
DECISION TREES (LOGIC FUNCTIONS)

(Figures: decision trees representing Boolean logic functions.)
DECISION TREES LEARNING FROM DATA
HOW CAN WE FIND THE BEST TREE?

The exponentially large number of possible trees makes decision tree learning hard!
GREEDY ALGORITHM FOR DECISION TREE LEARNING

DECISION TREES TRAINING ALGORITHM ID3 (BINARY OR MULTICLASS)

ID3 grows the tree greedily: at each node it selects the attribute whose split yields the highest information gain, partitions the examples by that attribute's values, and recurses on each branch until a node is pure (all one class) or no attributes remain.
ENTROPY AND INFORMATION GAIN
ENTROPY

$$\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i$$

where $p_i$ is the probability of class $i$.

Example: Given 14 examples of two classes, where 9 are positive and 5 are negative:

$$\mathrm{Entropy}(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940$$
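A small Python sketch (Python is assumed here; the lecture itself gives no code) that reproduces the 0.940 value:

```python
import math

def entropy(class_counts):
    """Entropy of a label distribution, given the count of each class."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

print(entropy([9, 5]))   # ~0.940 for 9 positive / 5 negative examples
```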
INFORMATION GAIN

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where $\mathrm{Values}(A)$ is the set of values of attribute $A$, $S_v$ is the subset of $S$ for which attribute $A$ has value $v$, and $|S_v|$ is the number of examples in $S_v$.
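Continuing the sketch above, the gain can be computed from per-branch class counts; the Wind split counts below are from Mitchell's tennis data:

```python
def information_gain(parent_counts, branch_counts):
    """Gain of a split: parent entropy minus weighted branch entropies."""
    total = sum(parent_counts)
    weighted = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - weighted

# Splitting 9+/5- on Wind: Weak -> 6+/2-, Strong -> 3+/3-.
print(information_gain([9, 5], [[6, 2], [3, 3]]))   # ~0.048
```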
EXAMPLES FOR COMPUTING GAIN

(Worked example from Mitchell's tennis data: Gain(S, Humidity) = 0.151 exceeds Gain(S, Wind) = 0.048, so humidity is a better attribute to split on than wind.)
TENNIS EXAMPLE

(Outlook has the maximum gain among the four attributes, so it is the winner for the root split.)
TENNIS EXAMPLE

$S_{Rain}$ = {D4, D5, D6, D10, D14}

After computing Gain($S_{Rain}$, Humidity), Gain($S_{Rain}$, Temperature) and Gain($S_{Rain}$, Wind), the winner with maximum gain is Wind.
TENNIS EXAMPLE

Sunny + high humidity -> all examples are No
Sunny + normal humidity -> all examples are Yes
Rain + strong wind -> all examples are No
Rain + weak wind -> all examples are Yes
Thus, we reach leaves for all branches.
FINAL DECISION TREE FOR TENNIS EXAMPLE USING ID3 LEARNING ALGORITHM

(Figure: the final tree with Outlook at the root, Humidity under Sunny, Wind under Rain, and Overcast as a pure Yes leaf.)
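For concreteness, a compact recursive ID3 sketch (not the lecture's exact pseudocode), reusing the entropy and information_gain helpers from the earlier sketches:

```python
def id3(examples, attributes):
    """Recursive ID3 sketch: examples are (features_dict, label) pairs."""
    labels = [lbl for _, lbl in examples]
    if len(set(labels)) == 1:                        # pure node -> leaf
        return labels[0]
    if not attributes:                               # nothing left -> majority leaf
        return max(set(labels), key=labels.count)

    classes = sorted(set(labels))
    counts = lambda ls: [ls.count(c) for c in classes]

    def gain_of(attr):
        branches = {}
        for feats, lbl in examples:
            branches.setdefault(feats[attr], []).append(lbl)
        return information_gain(counts(labels),
                                [counts(b) for b in branches.values()])

    best = max(attributes, key=gain_of)              # attribute with max gain
    tree = {}
    for value in {feats[best] for feats, _ in examples}:
        subset = [(f, l) for f, l in examples if f[best] == value]
        tree[(best, value)] = id3(subset, [a for a in attributes if a != best])
    return tree
```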
C4.5 AND C5 DECISION TREE LEARNING ALGORITHMS

• C4.5 AND C5 ARE EXTENSIONS OF ID3
➢ C4.5 made several improvements to ID3. Some of these are:
❖ Handling both continuous and discrete attributes - In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.
❖ Handling training data with missing attribute values - C4.5 allows attribute values to be marked as ? for missing. Missing attribute values are simply not used in gain and entropy calculations, or are assigned the most probable value given the example's target class.
❖ Pruning trees after creation - C4.5 goes back through the tree once it's been created and attempts to remove branches that do not help by replacing them with leaf nodes.
➢ C5 is similar to C4.5, but it improves speed and memory usage.
CONTINUOUS VALUED FEATURES

For example, let temperature be a continuous valued feature:

Temperature   40   10   35   15   25   30
Play Tennis   No   No   Yes  No   Yes  Yes

We can't consider each value as a node in the tree; this would cause overfitting!

Solution?
CONTINUOUS VALUED FEATURES

The solution is a threshold split:

(Figure: a decision node testing "Temperature > t?" with Yes/No branches.)

How can we choose this threshold?
THRESHOLD SPLIT SELECTION ALGORITHM

Step 1:
Sort all values of the continuous feature A in the training set; denote the sorted values {v1, v2, v3, …, vN}.

Step 2:
For i = 1 … N-1:
  Consider the split t_i = (v_i + v_{i+1}) / 2
  Compute the information gain for the threshold split A >= t_i
Choose the t* with the highest information gain.
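A Python sketch of this procedure (assuming the entropy and information_gain helpers defined earlier):

```python
def best_threshold(values, labels):
    """Return (threshold, gain) maximizing information gain for A >= t."""
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    counts = lambda ls: [sum(1 for l in ls if l == c) for c in classes]
    best_t, best_gain = None, -1.0
    for i in range(len(pairs) - 1):
        t = (pairs[i][0] + pairs[i + 1][0]) / 2      # midpoint candidate
        left  = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        g = information_gain(counts(labels), [counts(left), counts(right)])
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain

temps = [40, 10, 35, 15, 25, 30]
play  = ["No", "No", "Yes", "No", "Yes", "Yes"]
print(best_threshold(temps, play))   # picks t = 20 on this data
```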
THRESHOLD SPLIT SELECTION ALGORITHM

After sorting:

Temperature   10   15   25   30   35   40
Play Tennis   No   No   Yes  Yes  Yes  No

Candidate thresholds (midpoints): 12.5, 20, 27.5, 32.5, 37.5

Compute the information gain at each threshold and choose the maximum.
THRESHOLD SPLIT SELECTION ALGORITHM

After sorting:

Temperature   10   15   25   30   35   40
Play Tennis   No   No   Yes  Yes  Yes  No

Candidate thresholds: 20 and 37.5

To avoid heavy computation, we can compute midpoints only for adjacent points that differ in target value. It can be shown that the maximum information gain is achieved at one of these points (Fayyad, 1991).
OVERFITTING IN DECISION TREES

Principle of Occam's razor: "Simpler trees are better."
• When two trees have similar classification error on the validation set, pick the simpler one.

(Figure: with the same validation error, choose the tree of moderate complexity over the overfitted one.)
HOW CAN WE CHOOSE SIMPLE TREES?

➢ Early Stopping: stop the learning algorithm before the tree becomes too complex.

➢ Pruning: simplify the tree after the learning algorithm terminates.
EARLY STOPPING

➢ STOP TREE FROM GROWING BASED ON VALIDATION SET

(Figure: validation error vs. tree depth; stop growing at the depth where validation error stops decreasing.)
EARLY STOPPING

➢ STOP TREE FROM GROWING BASED ON CLASSIFICATION ERROR

Typically, add a magic parameter ε: stop if the error doesn't decrease by more than ε.

Assume that ε = 0.02:

Split for Outlook = Rain    Classification error
No split                    0.21  √
Split on wind               0.20
Split on temperature        0.30
Split on humidity           0.40

"No split" wins: the best split (wind) reduces the error by only 0.01 < ε, so we stop and add a "Yes" leaf, since the "Yes" examples in the Rain branch outnumber the "No" examples.
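A tiny sketch of this stopping rule, using the error values from the table above:

```python
def should_split(error_no_split, best_split_error, eps=0.02):
    """Early stopping: split only if error drops by more than eps."""
    return error_no_split - best_split_error > eps

candidate_errors = {"wind": 0.2, "temperature": 0.3, "humidity": 0.4}
print(should_split(0.21, min(candidate_errors.values())))   # False -> no split
```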
EARLY STOPPING

➢ STOP TREE FROM GROWING BASED ON A SMALL NUMBER OF DATA POINTS FOR A NODE.
EARLY STOPPING

Disadvantage:
Too short-sighted: we may miss "good" splits that occur right after "useless" splits.
PRUNING

• WHICH TREE IS SIMPLER?

L(T) is the number of leaf nodes of tree T.

(Figure: two candidate trees; the one with L(T) = 3 is simpler than the one with L(T) = 5.)
BALANCE FIT AND COMPLEXITY

➢ Want to balance how well the tree fits the data against the complexity of the tree.

TOTAL COST

A common formulation of the total cost: C(T) = Error(T) + λ · L(T), where Error(T) is the classification error, L(T) is the number of leaf nodes, and λ ≥ 0 trades off fit against complexity.
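A sketch of the resulting pruning decision (the λ and error values are illustrative, not from the lecture):

```python
def total_cost(error, num_leaves, lam=0.01):
    """Total cost C(T) = Error(T) + lambda * L(T)."""
    return error + lam * num_leaves

full   = total_cost(error=0.20, num_leaves=5)   # original subtree
pruned = total_cost(error=0.21, num_leaves=3)   # same subtree with a split undone
print(pruned <= full)                           # True -> keep the pruned tree
```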
PRUNING ALGORITHM

Consider each split, starting from the bottom of the tree: "undo" the split to obtain a smaller tree T_smaller; if T_smaller does not have a higher total cost, keep it. Repeat for every split.
CREDITS

• MACHINE LEARNING – TOM M. MITCHELL

THANK YOU
