
UNIT-IV

Object Segmentation
Supervised and Unsupervised Learning
Regression Vs Segmentation
Regression and classification are supervised learning approaches that map an
input to an output based on example input-output pairs, while clustering is an
unsupervised learning approach.
• Regression: It predicts continuous output values. Regression analysis is a
statistical model used to predict numeric data instead of labels. It can also
identify distribution trends from the available or historical data. Predicting
a person's income from attributes such as age and experience is an example of
building a regression model.
• Classification: It predicts a discrete set of values. In classification, the
data is categorized under different labels according to some parameters, and
the labels are then predicted for new data. Classifying emails as either spam
or not spam is an example of a classification problem.
• Clustering: Clustering groups data according to the similarity of data
points and data patterns. The aim is to gather similar data into localized
regions and separate dissimilar data. This way, when a new data point
arrives, we can easily identify which group or cluster it belongs to. This is
done for unlabelled datasets, where it is up to the machine to figure out the
categories. A brief sketch contrasting the three approaches is given below.
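A minimal sketch contrasting the three approaches with scikit-learn (the toy data and model choices here are illustrative assumptions, not part of the slides):

# Illustrative sketch: regression, classification, and clustering with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Regression: predict a continuous value (e.g. income) from age and experience.
X_reg = np.array([[25, 2], [30, 5], [40, 15], [50, 25]])
y_income = np.array([30000, 45000, 70000, 90000])
reg = LinearRegression().fit(X_reg, y_income)
print(reg.predict([[35, 10]]))          # a continuous prediction

# Classification: predict a discrete label (spam = 1, not spam = 0) from two features.
X_clf = np.array([[0.9, 12], [0.1, 1], [0.8, 9], [0.2, 0]])
y_spam = np.array([1, 0, 1, 0])
clf = DecisionTreeClassifier().fit(X_clf, y_spam)
print(clf.predict([[0.7, 8]]))          # a discrete label

# Clustering: no labels at all; the algorithm finds the groups itself.
X_unlabelled = np.array([[1, 1], [1.2, 0.8], [8, 8], [8.5, 7.9]])
km = KMeans(n_clusters=2, n_init=10).fit(X_unlabelled)
print(km.labels_)                        # cluster assignment for each point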
Basic Decision Tree Concept
• A decision tree is an important data structure that is used to solve many
computational problems.
Binary Decision Tree
A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7
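As an illustration (not from the slides), the truth table above can be read as a binary decision tree: test A at the root, then B, then C, and return the minterm value stored at the leaf. A minimal sketch, assuming the leaf values m0..m7 are kept in a list:

# Hypothetical sketch: evaluating the truth table above as a binary decision tree.
# Each internal node tests one attribute (A, then B, then C); leaves hold m0..m7.
def evaluate(A: int, B: int, C: int, leaves):
    if A == 0:
        if B == 0:
            return leaves[0] if C == 0 else leaves[1]   # m0 / m1
        else:
            return leaves[2] if C == 0 else leaves[3]   # m2 / m3
    else:
        if B == 0:
            return leaves[4] if C == 0 else leaves[5]   # m4 / m5
        else:
            return leaves[6] if C == 0 else leaves[7]   # m6 / m7

# Example: f = A AND (B OR C), encoded as the leaf values m0..m7.
f_leaves = [0, 0, 0, 0, 0, 1, 1, 1]
print(evaluate(1, 0, 1, f_leaves))   # -> 1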
Basic Concept

• In the example above, we considered a decision tree in which the values of
every attribute are binary only. Decision trees are also possible where the
attributes are of a continuous data type.
Decision Tree with numeric data
Some Characteristics

• A decision tree may be n-ary, n ≥ 2.
• There is a special node called the root node.
• All nodes drawn with a circle (or ellipse) are called internal nodes.
• All nodes drawn with rectangular boxes are called terminal nodes or leaf
nodes.
• The edges of a node represent the outcomes for the values of that node's attribute.
• In a path, a node with the same label is never repeated.
• A decision tree is not unique: different orderings of the internal nodes can give
different decision trees.
Decision Tree and Classification Task
• A decision tree helps us to classify data.
– Internal nodes test some attribute
– Edges are the values of that attribute
– External nodes (leaves) are the outcomes of the classification
• Such a classification is, in fact, made by posing questions, starting from the
root node and following a path down to a terminal node.
Decision Tree and Classification Task

Example 1 : Vertebrate Classification


Name            Body Temperature   Skin Cover   Gives Birth   Aquatic Creature   Aerial Creature   Has Legs   Hibernates   Class
Human           Warm               hair         yes           no                 no                yes        no           Mammal
Python          Cold               scales       no            no                 no                no         yes          Reptile
Salmon          Cold               scales       no            yes                no                no         no           Fish
Whale           Warm               hair         yes           yes                no                no         no           Mammal
Frog            Cold               none         no            semi               no                yes        yes          Amphibian
Komodo          Cold               scales       no            no                 no                yes        no           Reptile
Bat             Warm               hair         yes           no                 yes               yes        yes          Mammal
Pigeon          Warm               feathers     no            no                 yes               yes        no           Bird
Cat             Warm               fur          yes           no                 no                yes        no           Mammal
Leopard shark   Cold               scales       yes           yes                no                no         no           Fish
Turtle          Cold               scales       no            semi               no                yes        no           Reptile
Penguin         Warm               feathers     no            semi               no                yes        no           Bird
Porcupine       Warm               quills       yes           no                 no                yes        yes          Mammal
Eel             Cold               scales       no            yes                no                no         no           Fish
Salamander      Cold               none         no            semi               no                yes        yes          Amphibian

What are the class labels of Dragon and Shark?


Decision Tree and Classification Task
Example 1 : Vertebrate Classification
• Suppose a new species is discovered, as follows.
Name           Body Temperature   Skin Cover   Gives Birth   Aquatic Creature   Aerial Creature   Has Legs   Hibernates   Class
Gila Monster   cold               scales       no            no                 no                yes        yes          ?
• A decision tree that can be induced from the data in Example 1 is as follows.
Decision Tree and Classification Task
• Example 1 illustrates how we can solve a classification problem by asking a
series of questions about the attributes.
– Each time we receive an answer, a follow-up question is asked until we reach a
conclusion about the class label of the test record.

• The series of questions and their answers can be organized in the form of a
decision tree
– a hierarchical structure consisting of nodes and edges.

• Once a decision tree is built, it can be applied to any test record to classify it,
as sketched below.
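A minimal sketch of how such a question-by-question classification could look in code. The particular splits below are an assumption chosen so that they fit the table in Example 1; they are not necessarily the tree shown on the slide:

# Hypothetical sketch: classifying a vertebrate by asking attribute questions
# from the root downwards. The splits chosen here fit the table in Example 1.
def classify(record: dict) -> str:
    if record["body_temperature"] == "warm":
        if record["gives_birth"] == "yes":
            return "Mammal"
        return "Bird"
    # cold-blooded branch
    if record["skin_cover"] == "none":
        return "Amphibian"
    if record["aquatic"] == "yes":
        return "Fish"
    return "Reptile"

gila_monster = {"body_temperature": "cold", "skin_cover": "scales",
                "gives_birth": "no", "aquatic": "no",
                "aerial": "no", "has_legs": "yes", "hibernates": "yes"}
print(classify(gila_monster))   # -> "Reptile"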


Basic Idea
Segment the predictor space into sub-regions, and learn from the training set
the value to predict in each sub-region as the mean, mode, or median of the
response variable of the training examples that fall in that segment.
[Figure: the (x1, x2) predictor space is split at the cutpoints x1a and x2a into
Region 1, Region 2 and Region 3; the corresponding decision tree first splits on
X1 < x1a versus X1 > x1a, and then on X2 < x2a versus X2 > x2a.]
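As a rough illustration (the cutpoints and data below are assumptions, not taken from the slides), predicting with such a segmentation just means averaging the training responses that fall in each region:

# Illustrative sketch: predict by the mean response of the training points
# that fall in the same rectangular region of (x1, x2) space.
import numpy as np

def region_of(x1, x2, x1a=0.5, x2a=0.5):
    if x1 < x1a:
        return 1                    # Region 1
    return 2 if x2 > x2a else 3     # Region 2 / Region 3

X_train = np.array([[0.2, 0.3], [0.3, 0.8], [0.7, 0.9], [0.8, 0.2], [0.9, 0.4]])
y_train = np.array([2.0, 2.4, 3.2, 5.5, 5.7])

regions = np.array([region_of(x1, x2) for x1, x2 in X_train])
region_means = {r: y_train[regions == r].mean() for r in np.unique(regions)}

x_new = (0.85, 0.3)
print(region_means[region_of(*x_new)])   # mean response of the region x_new falls in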
Why Trees?
What would you do tonight? Decide amongst the
following:

• Finish homework
• Go to a party
• Read a book
• Hang out with friends
[Figure: a decision tree for this choice. Root: Homework deadline tonight?
Yes → Do homework. No → Party invitation? Yes → Go to the party.
No → Do I have friends? Yes → Hang out with friends. No → Read a book.]
Why Trees?
We split the predictor space along the branches of a tree, and therefore these
methods are called decision tree methods.
Why Forest?
A single tree is not the most powerful model, but combining multiple trees, as in
bagging, random forests and boosting, yields much better results (a brief sketch below).
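A minimal sketch of this effect on synthetic data (the data set, forest size and metric are illustrative assumptions):

# Illustrative sketch: a single tree vs. a random forest on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print("single tree R^2  :", tree.score(X_te, y_te))
print("random forest R^2:", forest.score(X_te, y_te))   # usually noticeably higher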
Regression
Build a regression tree:
• Divide the predictor space into J distinct, non-overlapping regions R1, R2, …, RJ.

• Make the same prediction for all observations in the same region: use the mean
of the responses of all training observations that are in that region.
[Figure: the (x1, x2) predictor space divided into Region 1, Region 2 and Region 3,
with one predicted value per region (y = 2.2, y = 3.2 and y = 5.6).]
Finding the sub-regions
The regions could in principle have any shape, but we restrict them to rectangles (boxes).
[Figure: two partitions of the (x1, x2) space, each region labelled with its predicted
value: one with arbitrarily shaped regions and one with the rectangular (box-shaped)
regions that are actually used.]
Find boxes R1, . . . , RJ that minimize the RSS

    RSS = Σ_{j=1}^{J} Σ_{i ∈ Rj} (y_i − ŷ_Rj)²

where ŷ_Rj is the mean response value of all training observations in the Rj region.

This is computationally very expensive!

Solution: a top-down, greedy approach: recursive binary splitting.
Recursive Binary Splitting
1. Consider every predictor X1, …, Xp and all possible values of the cutpoint s
for each predictor. Choose the predictor and cutpoint that minimize the RSS.
This can be done quickly, assuming the number of predictors is not very large.
2. Repeat step 1, but only within each of the resulting sub-regions.
3. Stop when a node contains only one class, contains fewer than n data points,
or the maximum depth is reached.
A minimal sketch of this procedure is given below.
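A minimal sketch of recursive binary splitting for a regression tree (variable names, the five-point minimum per node and the depth limit are assumptions for illustration, not taken from the slides):

# Illustrative sketch of recursive binary splitting for a regression tree.
import numpy as np

def rss(y):
    # Residual sum of squares around the mean; 0 for an empty node.
    return float(((y - y.mean()) ** 2).sum()) if len(y) else 0.0

def best_split(X, y):
    """Scan every predictor and every cutpoint; return the split minimizing RSS."""
    best = None
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left = X[:, j] < s
            cost = rss(y[left]) + rss(y[~left])
            if best is None or cost < best[0]:
                best = (cost, j, s)
    return best  # (RSS, predictor index, cutpoint)

def grow(X, y, min_points=5, depth=0, max_depth=3):
    # Stop: too few points or maximum depth reached -> leaf predicting the mean.
    if len(y) < min_points or depth == max_depth:
        return {"leaf": True, "prediction": float(y.mean())}
    _, j, s = best_split(X, y)
    left = X[:, j] < s
    if left.all() or (~left).all():          # no useful split found
        return {"leaf": True, "prediction": float(y.mean())}
    return {"leaf": False, "predictor": j, "cutpoint": float(s),
            "left": grow(X[left], y[left], min_points, depth + 1, max_depth),
            "right": grow(X[~left], y[~left], min_points, depth + 1, max_depth)}

def predict(tree, x):
    # Follow the splits from the root until a leaf is reached.
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["predictor"]] < tree["cutpoint"] else tree["right"]
    return tree["prediction"]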
Overfitting
If we keep splitting, we will keep reducing the training RSS, and the tree will
eventually overfit the training data.
Pruning
Fewer splits (fewer regions) mean lower variance and better interpretability,
at the cost of a little more bias.

Ideas?

• Stop splitting when the RSS improvement is lower than a threshold.
This yields smaller trees but is not effective, because it is short-sighted:
a seemingly worthless split early in the tree might be followed by a very good
split, i.e. a split that leads to a large reduction in RSS later on.
Pruning
A better approach is to grow a large tree and then look for the subtree that
minimizes the test error.

How?

• Cross-validation over all possible subtrees? This is far too expensive.
• Cost complexity pruning, also known as weakest link pruning.

Cost complexity pruning
Consider a tuning parameter α. For each value of α there is a subtree T that minimizes

    Σ_{m=1}^{|T|} Σ_{i: x_i ∈ R_m} (y_i − ŷ_Rm)² + α|T|

where |T| is the number of terminal nodes of the subtree and ŷ_Rm is the mean
response of the training observations in terminal region R_m. α controls the
complexity of the tree, similarly to what we saw with other regularizations
(e.g. the LASSO).

It turns out that as we increase α from zero, branches get pruned from the tree
in a nested and predictable fashion, so obtaining the whole sequence of subtrees
as a function of α is easy.
ALGORITHM FOR PRUNING

1. Use recursive binary splitting to grow a large tree on


the training data, stopping only when each terminal
node has fewer than some minimum number of
observations
2. Apply cost complexity pruning to the large tree in
order to obtain a sequence of best subtrees, as a
function of α
3. Use K-fold cross-validation to choose α. For each fold k = 1, …, K:
– Repeat steps 1 and 2 on all but the k-th fold
– Estimate the MSE on the held-out k-th fold as a function of α
– Average the results over the folds and pick the α that minimizes the average error
4. Return the subtree from Step 2 that corresponds to
the chosen value of α
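In scikit-learn, a similar workflow can be sketched with cost_complexity_pruning_path and cross-validation (the synthetic data set and K = 5 are illustrative assumptions):

# Illustrative sketch: cost complexity pruning with cross-validation in scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=0)

# Steps 1-2: grow a large tree and obtain the sequence of candidate alphas.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas

# Step 3: K-fold cross-validation to choose alpha (here K = 5).
cv_scores = [cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                             X, y, cv=5).mean() for a in alphas]
best_alpha = alphas[int(np.argmax(cv_scores))]

# Step 4: the final pruned tree for the chosen alpha.
final_tree = DecisionTreeRegressor(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print(best_alpha, final_tree.get_n_leaves())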
Hitters data set:
Response variable: baseball player’s Salary
Predictors:
– Years (the number of years that he has played
in the major leagues)
– Hits (the number of hits that he made in the
previous year)
– Walks, RBI, PutOuts, and other performance measures

Note: we log-transform Salary so that its distribution has more of a typical
bell shape.
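A minimal sketch of fitting a regression tree to these data. The file name Hitters.csv (e.g. exported from the ISLR resources), the chosen predictors and the tree depth are assumptions for illustration:

# Illustrative sketch: regression tree on the Hitters data (file path assumed).
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

hitters = pd.read_csv("Hitters.csv").dropna(subset=["Salary"])
X = hitters[["Years", "Hits", "Walks", "RBI", "PutOuts"]]
y = np.log(hitters["Salary"])          # log-transform the response

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))   # text view of the splits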
Multiple Decision Trees
Multi-output problem:
• A multi-output problem is a supervised learning problem with
several outputs to predict, that is when Y is a 2d array of size
[n_samples, n_outputs].
• When there is no correlation between the outputs, a very simple
way to solve this kind of problem is to build n independent
models, i.e. one for each output, and then to use those models
to independently predict each one of the n outputs. However,
because it is likely that the output values related to the same
input are themselves correlated, an often better way is to build a
single model capable of predicting simultaneously all n outputs.
First, it requires lower training time since only a single estimator
is built. Second, the generalization accuracy of the resulting
estimator may often be increased.
With regard to decision trees, this strategy can readily be used
to support multi-output problems. This requires the following
changes:
• Store n output values in leaves, instead of 1;
• Use splitting criteria that compute the average reduction
across all n outputs.
Scikit-learn offers support for multi-output problems by implementing this
strategy in both DecisionTreeClassifier and DecisionTreeRegressor. If a
decision tree is fit on an output array Y of size [n_samples, n_outputs],
then the resulting estimator will:
• Output n_output values upon predict;
• Output a list of n_output arrays of class probabilities upon
predict_proba.
The use of multi-output trees for regression is demonstrated
in Multi-output Decision Tree Regression. In this example, the
input X is a single real value and the outputs Y are the sine
and cosine of X.
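A minimal sketch along the lines of that example (the noise level and tree depth are arbitrary choices):

# Illustrative sketch: a multi-output regression tree predicting sin(x) and cos(x).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)          # single real-valued input
Y = np.column_stack([np.sin(X).ravel(), np.cos(X).ravel()])
Y += 0.1 * rng.randn(*Y.shape)                     # a little noise

tree = DecisionTreeRegressor(max_depth=5).fit(X, Y)
print(tree.predict([[1.5]]))   # one row with two outputs, approx. [sin(1.5), cos(1.5)]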
The use of multi-output trees for classification is demonstrated in Face
completion with multi-output estimators. In this example, the inputs X are
the pixels of the upper half of faces and the outputs Y are the pixels of the
lower half of those faces.
