Supervised Learning-Classification Part-4 Divide and Conquer

- Decision tree learners are powerful classifiers that use a tree structure to model relationships between features and potential outcomes. They work by recursively splitting data into increasingly homogeneous subsets based on feature values.
- The C5.0 algorithm is commonly used to build decision trees and performs well out of the box for most problems. It works by selecting features that maximize information gain at each split to create pure leaf nodes.
- Decision trees can be pruned to prevent overfitting by stopping growth early or removing leaf nodes from a fully grown tree. This reduces model complexity and improves generalization to new data.


SUPERVISED LEARNING: CLASSIFICATION
Divide and Conquer: Decision Trees and Rules
Introduction
• While deciding between several job offers with
various levels of pay and benefits, many people
begin by making lists of pros and cons, and
eliminate options based on simple rules. For
instance, "if I have to commute for more than an
hour, I will be unhappy." Or, "if I make less than
$50k, I won't be able to support my family." In
this way, the complex and difficult decision of
predicting one's future happiness can be reduced
to a series of simple decisions.
Decision tree
• Decision tree learners are powerful classifiers,
which utilize a tree structure to model the
relationships among the features and the
potential outcomes. As illustrated in the following
figure, this structure earned its name because it
mirrors how a literal tree begins at a wide trunk
which, if followed upward, splits into
narrower and narrower branches. In much the
same way, a decision tree classifier uses a
structure of branching decisions, which channel
examples into a final predicted class value.
Some potential uses include:
• Credit scoring models in which the criteria that
cause an applicant to be rejected need to be
clearly documented and free from bias
• Marketing studies of customer behavior such as
satisfaction or churn, which will be shared with
management or advertising agencies
• Diagnosis of medical conditions based on
laboratory measurements, symptoms, or the rate
of disease progression
• In spite of their wide applicability, it is worth
noting some scenarios where trees may not
be an ideal fit. One such case might be a task
where the data has a large number of nominal
features with many levels or it has a large
number of numeric features. These cases may
result in a very large number of decisions and
an overly complex tree. They may also
contribute to the tendency of decision trees to
overfit the data.
Divide and Conquer
• Decision trees are built using a heuristic called
recursive partitioning. This approach is also
commonly known as divide and conquer
because it splits the data into subsets, which
are then split repeatedly into even smaller
subsets, and so on, until the process stops when
the algorithm determines that the data within
the subsets are sufficiently homogeneous or
another stopping criterion has been met.
Divide and conquer might stop splitting at a
node when:
• All (or nearly all) of the examples at the node
have the same class
• There are no remaining features to distinguish
among the examples
• The tree has grown to a predefined size limit
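As a rough illustration of this process, the sketch below grows a tree over a data frame of categorical features and stops at a node under the three conditions listed above. It is a simplified toy written for this slide, not the C5.0 implementation: the function name grow_tree and its arguments are invented, and it picks splits by a simple majority-vote count rather than the entropy measure C5.0 uses (introduced in a later slide).

grow_tree <- function(data, target, features, depth = 0, max_depth = 5) {
  y <- data[[target]]

  # Stop if the node is pure, no features remain, or the size limit is reached
  if (length(unique(y)) == 1 || length(features) == 0 || depth >= max_depth)
    return(names(which.max(table(y))))      # leaf: predict the majority class

  # Pick the feature whose partitions are most homogeneous; this toy version
  # counts majority-vote agreement within each partition (C5.0 uses entropy)
  purity <- sapply(features, function(f) {
    parts <- split(y, data[[f]], drop = TRUE)
    sum(sapply(parts, function(p) max(table(p))))
  })
  best <- features[which.max(purity)]

  # Divide: one subtree per level of the chosen feature, built recursively
  lapply(split(data, data[[best]], drop = TRUE), grow_tree,
         target = target, features = setdiff(features, best),
         depth = depth + 1, max_depth = max_depth)
}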
Example
• To illustrate the tree building process, let's consider a simple
example. Imagine that you work for a Hollywood studio,
where your role is to decide whether the studio should move
forward with producing the screenplays pitched by promising
new authors. After returning from a vacation, your desk is
piled high with proposals. Without the time to read each
proposal cover-to-cover, you decide to develop a decision tree
algorithm to predict whether a potential movie would fall into
one of three categories: Critical Success, Mainstream Hit, or
Box Office Bust.
First, split on the feature indicating the number of celebrities, partitioning the
movies into groups with and without a significant number of A-list stars.
Then, among the group of movies with a larger number of celebrities, make
another split between movies with and without a high budget.
• At this point, we have partitioned the data into three
groups. The group at the top-left corner of the
diagram is composed entirely of critically acclaimed
films. This group is distinguished by a high number of
celebrities and a relatively low budget. At the top-
right corner, the majority of movies are mainstream
hits with high budgets and a large number of celebrities.
The final group, which has little star power but
budgets ranging from small to large, contains the
flops.
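Read as a set of nested rules, the two splits above amount to something like the following sketch; the feature values "high" and "low" and the function name predict_movie are illustrative placeholders, not thresholds learned from real data.

# Hypothetical rule form of the movie example above
predict_movie <- function(celebrities, budget) {
  if (celebrities == "high") {
    if (budget == "high") "Mainstream Hit" else "Critical Success"
  } else {
    "Box Office Bust"                       # little star power: likely a flop
  }
}

predict_movie(celebrities = "high", budget = "low")   # "Critical Success"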
The C5.0 decision tree algorithm
• The C5.0 algorithm has become the industry
standard for producing decision trees because it
performs well on most types of problems directly
out of the box. Compared to other advanced
machine learning models, the decision trees
built by C5.0 generally perform nearly as well,
but are much easier to understand and deploy.
• Tree-based learning algorithms are considered to be
among the best and most widely used supervised learning
methods (those with a pre-defined target variable).
• Decision trees are powerful non-linear classifiers,
which utilize a tree structure to model the
relationships among the features and the potential
outcomes. A decision tree classifier uses a structure
of branching decisions, which channel examples into
a final predicted class value.
• The decision tree model follows the steps outlined below
in classifying data:
• It places all the training examples at the root node.
• It divides the training examples based on selected attributes.
• It selects attributes using a statistical measure (such as information gain).
• Recursive partitioning continues until no training
example remains, or until no attribute remains, or the
remaining training examples belong to the same class.
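In R, trees of this kind can be grown with the C50 package's C5.0() function. The snippet below is a minimal sketch, assuming the C50 package is installed and using the built-in iris data in place of a real classification problem.

# Fit a C5.0 decision tree and inspect it (install.packages("C50") if needed)
library(C50)

model <- C5.0(Species ~ ., data = iris)   # predict Species from all other features
summary(model)                            # prints the chosen splits and training error
predict(model, head(iris))                # class predictions for new examples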
Decision Tree on Car Purchase
Choosing the best split
• The first challenge that a decision tree will face is to
identify which feature to split upon.
• The degree to which a subset of examples contains
only a single class is known as purity, and any subset
composed of only a single class is called pure.
• There are various measurements of purity that can
be used to identify the best decision tree splitting
candidate.
• C5.0 uses entropy, which quantifies the randomness,
or disorder, within a set of class values.
• The decision tree hopes to find splits that reduce
entropy, ultimately increasing homogeneity within
the groups.
• Entropy is defined as:
Entropy(S) = Σ (i = 1 to c) -p_i * log2(p_i)
• In this formula, for a given segment of data (S), the
term c refers to the number of class levels and p_i
refers to the proportion of values falling into class
level i.
• For example, suppose we have a partition of data
with two classes: red (60 percent) and white (40
percent). We can calculate the entropy as follows:
• > -0.60 * log2(0.60) - 0.40 * log2(0.40)
• [1] 0.9709506
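The same calculation generalizes to any number of classes. The small helper below is a sketch that wraps the formula and reproduces the two-class value above.

# Entropy of a vector of class labels (a sketch of the formula above)
entropy <- function(y) {
  p <- prop.table(table(y))                # class proportions p_i
  -sum(p * log2(p))
}

entropy(c(rep("red", 60), rep("white", 40)))   # 0.9709506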
• To use entropy to determine the optimal feature to split
upon, the algorithm calculates the change in homogeneity
that would result from a split on each possible feature,
which is a measure known as information gain.
• The information gain for a feature F is calculated as the
difference between the entropy in the segment before the
split (S1) and the entropy of the partitions resulting from the split (S2):
InfoGain(F) = Entropy(S1) - Entropy(S2)
where Entropy(S2) is the sum of the entropies of each partition,
weighted by the proportion of examples falling into that partition.
• The higher the information gain, the better a
feature is at creating homogeneous groups after
a split on this feature. If the information gain is
zero, there is no reduction in entropy for splitting
on this feature. On the other hand, the maximum
information gain is equal to the entropy prior to
the split. This would imply that the entropy after
the split is zero, which means that the split
results in completely homogeneous groups.
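Using the entropy() helper sketched earlier, information gain can be computed directly; the data in the example below is made up purely to show the maximal case described above, where the split yields completely pure groups.

# Information gain = entropy before the split minus the weighted entropy
# of the partitions produced by the split
info_gain <- function(y, feature) {
  parts   <- split(y, feature)
  weights <- sapply(parts, length) / length(y)
  entropy(y) - sum(weights * sapply(parts, entropy))
}

# Made-up example: a two-level feature that separates the classes perfectly
y <- c("hit", "hit", "hit", "flop", "flop", "flop")
f <- c("high", "high", "high", "low", "low", "low")
info_gain(y, f)   # 1, equal to the entropy before the split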
Pruning the decision tree
• If the tree grows overly large, many of the decisions it
makes will be overly specific and the model will be
overfitted to the training data.
• The process of pruning a decision tree involves reducing
its size such that it generalizes better to unseen data.
• One solution to this problem is to stop the tree from
growing once it reaches a certain number of decisions or
when the decision nodes contain only a small number of
examples. This is called early stopping or pre-pruning
the decision tree.
• Post-pruning involves growing a tree that is
intentionally too large and then pruning leaf
nodes to reduce the size of the tree to a more
appropriate level.
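In the C50 package, both strategies can be influenced through C5.0Control(): minCases acts as a pre-pruning limit on how small the branches of a split may become, while CF is the confidence factor used during post-pruning (smaller values prune the grown tree more aggressively). The snippet below is a sketch of these options, again using the iris data; the chosen values are arbitrary.

library(C50)

# Pre-pruning: demand more examples per branch before a split is allowed
pre <- C5.0(Species ~ ., data = iris,
            control = C5.0Control(minCases = 20))

# Post-pruning: a lower confidence factor prunes the fully grown tree harder
post <- C5.0(Species ~ ., data = iris,
             control = C5.0Control(CF = 0.10))

summary(pre)    # compare the size of the two resulting trees
summary(post)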
