Yapay Zeka ve Makine Öğrenmesi 10 (Artificial Intelligence and Machine Learning 10)

AI lecture notes


Lecture #5

Decision Support

• One of the earliest AI problems was decision support
• The first solution to this problem was expert systems
• They used an often very large number of hand-crafted if-then rules
• These problems are well suited to a type of algorithm called Decision Trees
• The dataset typically contains mostly categorical features, but can include numerical features as well
Decision Trees

• Decision Trees have one big advantage: the trained model is easy to visualize and interpret
• We can understand what the algorithm has learned
• This can be important in applications where we want to investigate why the system made a particular decision
• Decision Trees are therefore commonly referred to as a completely transparent method
Example: Weather dataset

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      false  NO
sunny     hot          high      true   NO
overcast  hot          high      false  YES
rainy     mild         high      false  YES
rainy     cool         normal    false  YES
rainy     cool         normal    true   NO
overcast  cool         normal    true   YES
sunny     mild         high      false  NO
sunny     cool         normal    false  YES
rainy     mild         normal    false  YES
sunny     mild         normal    true   YES
overcast  mild         high      true   YES
overcast  hot          normal    false  YES
rainy     mild         high      true   NO
Building the tree

• At each node, we need to find the attribute that best divides the data into Yes and No
• To do this, we calculate the information gain for each attribute and candidate value
• The attribute with the highest information gain is selected at each node
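As a minimal sketch (not part of the original notes), the entropy and information gain used here could be computed in R as follows. The data frame name weather and the function names entropy and info_gain are assumptions; the column names match the table above:

# Entropy of a vector of class labels (e.g. the Play column)
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # ignore empty classes
  -sum(p * log2(p))
}

# Information gain of splitting `data` on `attribute` with respect to `target`
info_gain <- function(data, attribute, target = "Play") {
  base <- entropy(data[[target]])
  subsets <- split(data, data[[attribute]])
  remainder <- sum(sapply(subsets, function(s)
    nrow(s) / nrow(data) * entropy(s[[target]])))
  base - remainder
}

# Example: gain of splitting the full weather data on Outlook
# info_gain(weather, "Outlook")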
Find the root node

Outlook  Sunny  Overcast  Rainy
Yes      2      4         3
No       3      0         2

Temperature  Hot  Mild  Cool
Yes          2    4     3
No           2    2     1

Humidity  High  Normal
Yes       3     6
No        4     1

Windy  True  False
Yes    3     6
No     3     2

Attribute    Gain
Outlook      0.306
Temperature  0.088
Humidity     0.211
Windy        0.107

Outlook has the highest gain and is selected as the root node.
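Using the info_gain sketch above, the same comparison could be reproduced for all four attributes (again assuming the hypothetical weather data frame; the gain values in the notes may come from a slightly different formula, so the exact numbers can differ, but the ranking should agree):

# Compare the information gain of every attribute at the root
sapply(c("Outlook", "Temperature", "Humidity", "Windy"),
       function(a) info_gain(weather, a))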
Find the root node

outlook
  sunny    -> ?
  overcast -> yes
  rainy    -> ?

Overcast has perfect gain: all of its examples belong to the same category (Yes).

Let's find the sunny node!


All examples with sunny

Outlook  Temperature  Humidity  Windy  Play
sunny    hot          high      false  NO
sunny    hot          high      true   NO
sunny    mild         high      false  NO
sunny    cool         normal    false  YES
sunny    mild         normal    true   YES

• Now we use a subset of the data (extracting it in R is sketched below)
• It contains all examples with Outlook = sunny
• 5 examples
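A minimal sketch of building this subset and re-computing the gains on it, reusing the hypothetical weather data frame and info_gain function from above:

# Keep only the examples where Outlook is sunny
sunny <- subset(weather, Outlook == "sunny")

# Re-compute the gain of the remaining attributes on this subset
sapply(c("Temperature", "Humidity", "Windy"),
       function(a) info_gain(sunny, a))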
Find the sunny node

Temperature  Hot  Mild  Cool
Yes          0    1     1
No           2    1     0

Humidity  High  Normal
Yes       0     2
No        3     0

Windy  True  False
Yes    1     1
No     1     2
Find the sunny node

outlook
  sunny    -> humidity
                high   -> no
                normal -> yes
  overcast -> yes
  rainy    -> ?

Since humidity has perfect gain, it is selected.

Let's find the rainy node!


All examples with rainy

Outlook  Temperature  Humidity  Windy  Play
rainy    mild         high      false  YES
rainy    cool         normal    false  YES
rainy    cool         normal    true   NO
rainy    mild         normal    false  YES
rainy    mild         high      true   NO

• Again, we use a subset of the data
• It contains all examples with Outlook = rainy
• 5 examples
Find the rainy node

Temperature  Hot  Mild  Cool
Yes          0    2     1
No           0    1     1

Humidity  High  Normal
Yes       1     2
No        1     1

Windy  True  False
Yes    0     3
No     2     0

Since windy has perfect gain, it is selected.
Final tree

outlook
  sunny    -> humidity
                high   -> no
                normal -> yes
  overcast -> yes
  rainy    -> windy
                false -> yes
                true  -> no
The problem

• In most cases, there are several possible trees that can be generated
• The aim is to:
1. Generate a tree that classifies the training data as accurately as possible
2. Generate the smallest possible tree
• It can be tricky to satisfy both
• The first has the highest priority
Generating a good tree

• There is a wide range of algorithms for generating decision trees
• Each tries to fulfill both criteria as far as possible
• Weka uses an algorithm called J48
Classification

• To classify an example, we traverse the tree by following the branches that match the attribute values of the example
• When we reach a leaf node, the result (category) is returned
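As a sketch (not from the notes), this traversal of the final tree above can be written directly as nested conditions in R; the function and argument names are illustrative:

# Classify one example by following the final tree above
classify <- function(outlook, humidity, windy) {
  if (outlook == "overcast") return("yes")
  if (outlook == "sunny") {
    if (humidity == "high") return("no") else return("yes")
  }
  # outlook == "rainy": the decision depends on windy
  if (windy) "no" else "yes"
}

classify("sunny", "normal", FALSE)   # returns "yes"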
Overfitting

• Decision Trees can suffer from overfitting
• This means that the learned model is very specific to the training data, but can be bad at classifying unseen examples
• To get around this problem, learning is usually stopped before there is a risk of overfitting
Overfitting

• A common approach to reduce overfitting in Decision Trees is to stop creating more branches if there is only a very small increase in gain
• We can set a minimum threshold for how large the gain must be to allow a new branch to be created (sketched below)
• There is no universal answer to which limit to use
• You have to experiment on the dataset you use
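As an illustrative sketch, the CART implementation in the rpart package (the same algorithm caret calls later in these notes) exposes a complexity parameter cp that plays this role: a split is only kept if it improves the overall fit by at least cp. The threshold values and the weather data frame below are assumptions, not recommendations from the notes:

library(rpart)

# Require each split to improve the relative fit by at least cp = 0.02,
# and allow splitting nodes with as few as 5 examples (the default of 20
# would prevent any split on a 14-example dataset).
tree <- rpart(Play ~ ., data = weather,
              method = "class",
              control = rpart.control(cp = 0.02, minsplit = 5))

# Inspect how the cross-validated error changes with the threshold
printcp(tree)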
When to use Decision Trees

• As mentioned, one big advantage of DTs is that we can interpret the trained model
• There are some other benefits of DTs
• They work on both numerical and nominal attributes without pre-processing the data, which many other algorithms don't
• They also support probabilistic reasoning about assignments, as we did when we returned the most probable category
When to use Decision Trees

• The major drawback is that DTs are not very good for complex learning problems
• If we have lots of categories, the decision tree tends to become very complicated and will most likely make poor predictions
• Another disadvantage is that they can only make simple greater-than/less-than decisions for numerical attributes
• They work best if we have combinations of numerical and nominal data and few categories (which many real-world problems satisfy)
Weka

• Weka's standard Decision Tree classifier is called J48
• When using J48 on the Weather dataset we get the following result:

[Weka J48 output not reproduced in this text version]
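If you prefer to call the same classifier from R rather than the Weka GUI, the RWeka package wraps J48. A minimal sketch, assuming the weather data is available as a data frame with a factor column Play (RWeka also requires a working Java installation):

library(RWeka)

# Train Weka's J48 (C4.5) decision tree from R
j48 <- J48(Play ~ ., data = weather)
print(j48)       # shows the learned tree in text form
summary(j48)     # training-set evaluation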
R

• In R, we can use an algorithm called CART
• The dataset needs to be in CSV format
• The R script looks like this:
R script

# Load the ML library
library(caret)

# Read the dataset
dataset <- read.csv("FIFA_skill.csv")

# Set up 10-fold cross-validation
control <- trainControl(method = "cv", number = 10)
metric <- "Accuracy"

# Train the model using CART
set.seed(7)
cart <- train(PlayerSkill ~ ., data = dataset, method = "rpart",
              metric = metric, trControl = control)

# Print the result
print(cart)
R result

> print(cart)
CART

19 samples
 3 predictor
 2 classes: 'bad', 'good'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 17, 17, 17, 17, 17, 17, ...
Resampling results:

  Accuracy  Kappa
  0.55      0

Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
  There were missing values in resampled performance measures.
R result

• The warning message from R means that the 10-fold CV split the dataset so that one class was missing in some iterations
• This has a large impact on the result
• More data is needed to model this dataset accurately
• If we make a copy of each example in the dataset (twice as much data), the result is:
R result

CART

38 samples
 3 predictor
 2 classes: 'bad', 'good'

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 35, 34, 34, 34, 34, 34, ...
Resampling results across tuning parameters:

  cp         Accuracy   Kappa
  0.0000000  0.8416667  0.69
  0.3333333  0.8416667  0.69
  0.6666667  0.6083333  0.19

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.3333333.
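Once trained, the caret model can classify new examples with predict(). A minimal sketch that reuses a row of the training data as a stand-in for a new player, since the actual predictor columns of FIFA_skill.csv are not listed in the notes:

# Take one example (minus the target column) as a hypothetical new player
new_player <- dataset[1, setdiff(names(dataset), "PlayerSkill"), drop = FALSE]

# Predict its class with the cross-validated CART model
predict(cart, newdata = new_player)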
