Decision Trees
CS 165B
Spring 2012
Course outline
• Introduction (Ch. 1)
• Concept learning (Ch. 2)
• Decision trees (Ch. 3)
• Ensemble learning
• Neural Networks (Ch. 4)
• …
Schedule
Projects
• Project proposals are due by Friday 4/20.
• 2-person teams
• If you want to define your own project:
– Submit a 1-page proposal with references and ideas
– It needs to have a significant Machine Learning component
– You may do experimental work, theoretical work, a combination of both, or a critical survey of results in some specialized topic
• Originality is not mandatory but is encouraged.
• Try to make it interesting!
Decision tree learning
Decision tree representation
– Most popular method for representing discrete-valued target functions
– A decision tree represents a disjunction of conjunctions of attribute values
– A more general hypothesis representation than in concept learning
Training Examples
• Can be represented by logical formulas
[Figure: decision tree whose root splits on sunny / overcast / rain; the sunny branch tests Humidity (No / Yes), overcast is a Yes leaf, and the rain branch tests Wind (No / Yes)]
Representation in decision trees
Applications of Decision Trees
Top-Down Construction
Main loop:
1. Choose the “best” decision attribute (A) for next node
2. Assign A as decision attribute for node
3. For each value of A, create new descendant of node
4. Sort training examples to leaf nodes
5. If training examples perfectly classified, STOP,
Else iterate over new leaf nodes
Grow tree just deep enough for perfect classification
– If possible (or can approximate at chosen depth)
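A minimal recursive sketch of this loop (Python; the dict-based example/tree encoding and the best_attribute scorer are assumptions of mine — the scorer would be the information gain defined a few slides later):

```python
# Sketch of the main loop above (assumed representation: each example is a dict
# mapping attribute -> value plus a 'label' key; best_attribute() is a scorer
# such as information gain, defined on a later slide).
def build_tree(examples, attributes):
    labels = [e['label'] for e in examples]
    if len(set(labels)) == 1:                     # training examples perfectly classified: STOP
        return labels[0]
    if not attributes:                            # nothing left to split on: majority vote
        return max(set(labels), key=labels.count)
    A = best_attribute(examples, attributes)      # step 1: choose the "best" attribute
    node = {'attribute': A, 'children': {}}       # step 2: A becomes the decision attribute
    for v in set(e[A] for e in examples):         # step 3: one descendant per value of A
        subset = [e for e in examples if e[A] == v]      # step 4: sort examples to branches
        rest = [a for a in attributes if a != A]
        node['children'][v] = build_tree(subset, rest)   # step 5: iterate on new leaves
    return node
```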
Which attribute is best?
Choosing Best Attribute?
• Consider 64 examples, 29+ and 35-
• Which one is better?
[Figure: two pairs of candidate splits of the (29+, 35−) examples on attributes A1 and A2, each with branches t and f; their class counts and entropies are worked out on the next slides]
Entropy
• A measure for
– uncertainty
– impurity
– information content
• Information theory: optimal length code assigns (- log2p) bits to
message having probability p
• S is a sample of training examples
– p+ is the proportion of positive examples in S
– p- is the proportion of negative examples in S
• Entropy of S: the expected number of bits needed, under the optimal code, to encode the class of a randomly drawn member of S
Entropy(S) = p+(−log2 p+) + p−(−log2 p−) = −p+ log2 p+ − p− log2 p−
• Can be generalized to more than two values
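As a quick illustration, the two-class entropy can be computed directly from class counts (a Python sketch; the function name and counts-based interface are my own):

```python
import math

def entropy(pos, neg):
    """Entropy of a sample containing `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken to be 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(29, 35))   # ~0.993 for the (29+, 35-) sample on the next slides
print(entropy(9, 5))     # ~0.940 for the 14 PlayTennis examples {D1,...,D14}
```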
Entropy
Choosing Best Attribute?
• Consider 64 examples (29+, 35−), with E(S) = 0.993, and compute the entropies of each split:
• Which one is better?
  A1: t → 25+, 5−  (E = 0.650)   f → 4+, 30−  (E = 0.522)
  A2: t → 15+, 19− (E = 0.989)   f → 14+, 16− (E = 0.997)
• Which is better?
  A1: t → 21+, 5−  (E = 0.708)   f → 8+, 30−  (E = 0.742)
  A2: t → 18+, 33− (E = 0.937)   f → 11+, 2−  (E = 0.619)
Information Gain
• Gain(S,A): reduction in entropy after choosing attr. A
$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \dfrac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$
E(S) = 0.993 for the (29+, 35−) sample:
  First pair:  A1: t → 25+, 5−  (E = 0.650),  f → 4+, 30−  (E = 0.522)   Gain: 0.395
               A2: t → 15+, 19− (E = 0.989),  f → 14+, 16− (E = 0.997)   Gain: 0.000
  Second pair: A1: t → 21+, 5−  (E = 0.708),  f → 8+, 30−  (E = 0.742)   Gain: 0.265
               A2: t → 18+, 33− (E = 0.937),  f → 11+, 2−  (E = 0.619)   Gain: 0.121
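A small sketch that reproduces the gains in the second pair (Python, reusing the entropy helper sketched on the Entropy slide; a split is represented simply by the class counts in each branch):

```python
def information_gain(parent_counts, branches):
    """Gain of a split: parent entropy minus the size-weighted entropy of the branches.

    `parent_counts` is a (pos, neg) pair; `branches` is a list of (pos, neg) pairs.
    """
    total = sum(parent_counts)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in branches)
    return entropy(*parent_counts) - weighted

print(information_gain((29, 35), [(21, 5), (8, 30)]))   # A1 in the second pair: ~0.265
print(information_gain((29, 35), [(18, 33), (11, 2)]))  # A2 in the second pair: ~0.121
```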
Gain function
• Gain measures how much an attribute can reduce uncertainty
• Its value lies between 0 and 1
• What is the significance of
– a gain of 0?
E.g., a 50/50 split of +/− both before and after discriminating on the attribute’s values
– a gain of 1?
E.g., going from “perfect uncertainty” to perfect certainty after splitting on a perfectly predictive attribute
Training Examples
[Figure: candidate splits of the training examples on Humidity and on Wind]
Sort the Training Examples
[Figure: partially grown tree. The root Outlook splits the 9+, 5− examples {D1,…,D14}; the Overcast branch is already a Yes leaf, while the Sunny and Rain branches are still open]
S_sunny = {D1, D2, D8, D9, D11}
Gain(S_sunny, Humidity) = 0.970
Gain(S_sunny, Temp) = 0.570
Gain(S_sunny, Wind) = 0.019
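To see where the 0.970 comes from (assuming the standard PlayTennis table, in which S_sunny holds 2 positive and 3 negative examples and Humidity separates them perfectly: High → 0+, 3−; Normal → 2+, 0−):

$\mathrm{Entropy}(S_{sunny}) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} \approx 0.97$

$\mathrm{Gain}(S_{sunny}, \mathrm{Humidity}) = \mathrm{Entropy}(S_{sunny}) - \tfrac{3}{5}\cdot 0 - \tfrac{2}{5}\cdot 0 \approx 0.97$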
Final Decision Tree for Example
Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind
    Strong → No
    Weak → Yes
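The finished tree can also be written down as a small lookup structure; a sketch in Python (the nested-dict encoding and classify helper are my own, not part of the slides):

```python
# The final PlayTennis tree as nested dicts (hypothetical encoding).
TREE = {'attribute': 'Outlook',
        'children': {'Sunny':    {'attribute': 'Humidity',
                                  'children': {'High': 'No', 'Normal': 'Yes'}},
                     'Overcast': 'Yes',
                     'Rain':     {'attribute': 'Wind',
                                  'children': {'Strong': 'No', 'Weak': 'Yes'}}}}

def classify(tree, example):
    """Follow the tested attributes down the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree['children'][example[tree['attribute']]]
    return tree

print(classify(TREE, {'Outlook': 'Sunny', 'Humidity': 'Normal', 'Wind': 'Weak'}))  # Yes
```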
Hypothesis Space Search by ID3
• Hypothesis space (all possible trees) is complete!
– The target function is surely in there
Hypothesis Space Search in Decision Trees
• Conduct a search of the space of decision trees which
can represent all possible discrete functions.
Restriction bias vs. Preference bias
• Restriction bias (or Language bias)
– Incomplete hypothesis space
• Preference (or search) bias
– Incomplete search strategy
• Candidate Elimination has restriction bias
• ID3 has preference bias
• In most cases, we have both a restriction and a
preference bias.
Inductive Bias in ID3
Overfitting the Data
• Learning a tree that classifies the training data perfectly may
not lead to the tree with the best generalization performance.
- There may be noise in the training data that the tree is fitting
- The algorithm might be making decisions based on very little data
• A hypothesis h is said to overfit the training data if there is another hypothesis h′ such that h has smaller error than h′ on the training data but larger error than h′ on the test data.
[Plot: accuracy on the training data vs. accuracy on the testing data as a function of tree complexity]
Overfitting in Decision Trees
• Consider adding noisy training example (should be +):
Day Outlook Temp Humidity Wind Tennis?
D15 Sunny Hot Normal Strong No
[Figure: the learned decision tree, rooted at Outlook]
Overfitting - Example
[Figure: the overfitted tree adds an extra Wind test (Strong → No, Weak → Yes) to accommodate the noisy example]
Avoiding Overfitting
Reduced-Error Pruning
• A post-pruning, cross-validation approach
- Partition the training data into a “grow” set and a “validation” set.
- Build a complete tree for the “grow” data.
- Until accuracy on the validation set decreases, do:
For each non-leaf node in the tree:
Temporarily prune the subtree below it; replace it by a majority vote.
Test the accuracy of the hypothesis on the validation set.
Permanently prune the node whose removal gives the greatest increase in accuracy on the validation set.
• Problem: Uses less data to construct the tree
• Sometimes done at the rules level
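A compact sketch of this loop (Python, reusing the nested-dict trees and the classify helper sketched earlier; it also assumes each internal node stores a 'majority' label recorded while the tree was grown — that field is my own addition, not something stated on the slide):

```python
def accuracy(tree, validation):
    """Fraction of validation examples the tree classifies correctly."""
    return sum(classify(tree, e) == e['label'] for e in validation) / len(validation)

def internal_nodes(tree, parent=None, branch=None):
    """Yield (node, parent, branch_value) for every non-root internal node."""
    if isinstance(tree, dict):
        if parent is not None:
            yield tree, parent, branch
        for value, child in tree['children'].items():
            yield from internal_nodes(child, tree, value)

def reduced_error_prune(tree, validation):
    """Greedily replace internal nodes by majority-vote leaves while accuracy
    on the validation set does not decrease (the root is never pruned here)."""
    while True:
        base = accuracy(tree, validation)
        best = None
        for node, parent, branch in internal_nodes(tree):
            parent['children'][branch] = node['majority']   # temporarily prune below this node
            delta = accuracy(tree, validation) - base
            parent['children'][branch] = node                # restore the subtree
            if best is None or delta > best[0]:
                best = (delta, parent, branch, node['majority'])
        if best is None or best[0] < 0:
            return tree                                      # accuracy would decrease: stop
        _, parent, branch, leaf = best
        parent['children'][branch] = leaf                    # prune the best node permanently
```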
Rule post-pruning
Example of rule post pruning
• IF (Outlook = Sunny) ^ (Humidity = High)
– THEN PlayTennis = No
• IF (Outlook = Sunny) ^ (Humidity = Normal)
– THEN PlayTennis = Yes
[Figure: the decision tree (rooted at Outlook) from which these rules are read off]
Extensions of basic algorithm
Continuous Valued Attributes
• Create a discrete attribute from continuous variables
– E.g., define critical Temperature = 82.5
• Candidate thresholds
– chosen by gain function
– can have more than one threshold
– typically where values change quickly
Candidate thresholds: (48+60)/2 = 54 and (80+90)/2 = 85

Temp:    40  48  60  72  80  90
Tennis?   N   N   Y   Y   Y   N
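A sketch of how the candidates might be enumerated (Python; picking among them by information gain would reuse the gain computation from earlier slides):

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent (sorted) values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb]

temps  = [40, 48, 60, 72, 80, 90]
labels = ['N', 'N', 'Y', 'Y', 'Y', 'N']
print(candidate_thresholds(temps, labels))   # [54.0, 85.0], i.e. Temp > 54 and Temp > 85
```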
Attributes with Many Values
• Problem:
– If attribute has many values, Gain will select it (why?)
– E.g., a birthdate attribute with 365 possible values
Attributes with many values
• Problem: Gain will select attribute with many values
• One approach: use GainRatio instead
$\mathrm{GainRatio}(S, A) = \dfrac{\mathrm{Gain}(S, A)}{\mathrm{SplitInformation}(S, A)}$

$\mathrm{SplitInformation}(S, A) = -\sum_{i=1}^{c} \dfrac{|S_i|}{|S|} \log_2 \dfrac{|S_i|}{|S|}$

SplitInformation is the entropy of the partitioning itself; it penalizes a higher number of partitions.
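A sketch in Python (function names are my own; the subset sizes |S_i| are passed in directly):

```python
import math

def split_information(sizes):
    """Entropy of the partition itself, computed from the subset sizes |S_i|."""
    total = sum(sizes)
    return -sum(s / total * math.log2(s / total) for s in sizes if s)

def gain_ratio(gain, sizes):
    return gain / split_information(sizes)

# A many-valued attribute (e.g. 365 birthdates, one example each) is penalized heavily:
print(split_information([1] * 365))   # ~8.51 bits
print(split_information([32, 32]))    # 1.0 bit for an even binary split
```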
Attributes with Costs
• Consider
– medical diagnosis: BloodTest has cost $150, Pulse has a cost of $5.
– robotics: Width-From-1ft has a cost of 23 sec., Width-From-2ft a cost of 10 sec.
• How to learn a consistent tree with low expected cost?
• Replace gain by
– Tan and Schlimmer (1990):
$\dfrac{\mathrm{Gain}^2(S, A)}{\mathrm{Cost}(A)}$
– Nunez (1988):
$\dfrac{2^{\mathrm{Gain}(S, A)} - 1}{(\mathrm{Cost}(A) + 1)^w}$
where w ∈ [0, 1] determines the importance of cost
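For illustration only (Python, with hypothetical gain and cost numbers, not taken from the slides):

```python
def tan_schlimmer(gain, cost):
    """Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez(gain, cost, w=1.0):
    """(2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]."""
    return (2 ** gain - 1) / (cost + 1) ** w

# The same gain achieved by a $150 test vs. a $5 test: the cheaper one scores higher.
print(nunez(0.5, 150), nunez(0.5, 5))
```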
Gini Index
• Another sensible measure of impurity (i and j are classes):
$\mathrm{Gini}(S) = \sum_{i \neq j} p_i\, p_j = 1 - \sum_i p_i^2$
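A sketch of the computation from class counts (Python; the function name is my own):

```python
def gini(counts):
    """Gini impurity from class counts: 1 - sum_i p_i^2  (= sum over i != j of p_i * p_j)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([29, 35]))   # ~0.496 for the (29+, 35-) sample used earlier
print(gini([30, 0]))    # 0.0 for a pure node
```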
Gini Index for Color
[Figure: the examples split on Color? into red, green, and yellow branches, with the Gini index evaluated for the partition]
Gain of Gini Index
Three Impurity Measures
Decision Trees as Features
• Rather than using decision trees to represent the target function, use
small decision trees as features
Regression Tree
• Similar to classification
• Use a set of attributes to predict the value (instead
of a class label)
• Instead of computing information gain, compute
the sum of squared errors
• Partition the attribute space into a set of
rectangular subspaces, each with its own predictor
– The simplest predictor is a constant value
Rectilinear Division
• A regression tree is a piecewise constant function of the
input attributes
[Figure: a regression tree that tests X1 and X2 against thresholds t1–t4, and the corresponding partition of the (X1, X2) plane into rectangles r1–r5, one constant prediction per rectangle]
Growing Regression Trees
• The best split is the one that reduces the variance the most:
$\Delta I(LS, A) = \mathrm{var}_{y|LS}\{y\} - \sum_{a} \dfrac{|LS_a|}{|LS|}\, \mathrm{var}_{y|LS_a}\{y\}$
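A sketch of the same computation (Python; the list-of-groups interface is my own choice):

```python
def variance(ys):
    """Population variance of a list of outputs."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def variance_reduction(ys, groups):
    """Variance of all outputs minus the size-weighted variance of each group of the split."""
    n = len(ys)
    return variance(ys) - sum(len(g) / n * variance(g) for g in groups)

# Hypothetical outputs: a split separating the small from the large y values
ys = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
print(variance_reduction(ys, [ys[:3], ys[3:]]))   # most of the variance is removed
```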
Regression Tree Pruning
• Exactly the same algorithms apply: pre-pruning
and post-pruning.
• In post-pruning, the tree that minimizes the squared error on the validation set is selected.
• In practice, pruning is more important in regression because full trees are much more complex (often every object has a different output value, and hence the full tree has as many leaves as there are objects in the learning sample).
When Are Decision Trees Useful ?
• Advantages
– Very fast: can handle very large datasets with many
attributes
– Flexible: several attribute types, classification and
regression problems, missing values…
– Interpretability: provide rules and attribute importance
• Disadvantages
– Instability of the trees (high variance)
– Not always competitive with other algorithms in terms
of accuracy
History of Decision Tree Research
• Hunt and colleagues in Psychology used full-search decision tree methods to model human concept learning in the 1960s
Summary
• Decision trees are practical for concept learning
• Basic information measure and gain function for best-first search of the space of decision trees
• ID3 procedure
– search space is complete
– Preference for shorter trees
• Overfitting is an important issue with various solutions
• Many variations and extensions possible
References
• Classification and Regression Trees, L. Breiman et al., Wadsworth, 1984.
• C4.5: Programs for Machine Learning, J. R. Quinlan, Morgan Kaufmann, 1993.
• Random Forests, L. Breiman, Machine Learning 45 (1): 5–32, 2001.
• The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, R. Tibshirani, and J. Friedman, New York: Springer Verlag, 2001.
• Constructing Optimal Binary Decision Trees is NP-complete, L. Hyafil and R. L. Rivest, Information Processing Letters 5 (1): 15–17, 1976.
Software
• In R:
– Packages tree and rpart
• C4.5:
– http://www.cse.unsw.edu.au/~quinlan
• Weka
– http://www.cs.waikato.ac.nz/ml/weka