
Chapter 2. Machine Learning Types and Learning Strategies

By Abebe B, PhD
12/20/2022
1. Machine Learning Types


Supervised Learning

(Figure: supervised learning — example data and model.)
1. Decision Tree
• Decision Trees are a type of supervised machine learning in
which the data is repeatedly split according to a certain
parameter.
• A tree can be explained by two entities: decision nodes and leaves.
• The leaves are the final outcomes (decisions), and the decision
nodes are the points where the data is split.

• An example of a decision tree can be explained using a binary tree.
• Say you want to predict whether a person is fit, given
information such as their age, eating habits, and physical
activity.


• The decision nodes here are questions like 'What's the age?',
'Does he exercise?', and 'Does he eat a lot of pizzas?', and the
leaves are outcomes like 'fit' or 'unfit'.
• In this case it was a binary classification problem (a yes/no
type problem).
• There are two main types of Decision Trees:
• Classification trees (yes/no types): here the decision variable is
categorical.
• Regression trees (continuous data types): here the decision or
outcome variable is continuous, e.g. a number like 123.
• There are many algorithms that construct Decision Trees, but
one of the best known is the ID3 Algorithm (Iterative
Dichotomiser 3).
• The algorithm iteratively (repeatedly) dichotomises (divides) features
into two or more groups at each step.


• The ID3 Algorithm performs the following tasks recursively
(a minimal sketch of these steps in Python follows this list):
1. Create a root node for the tree.
2. If all examples are positive, return the leaf node 'positive'.
3. Else, if all examples are negative, return the leaf node 'negative'.
4. Calculate the entropy of the current state, H(S).
5. For each attribute x, calculate the entropy with respect to
that attribute, denoted H(S, x).
6. Select the attribute with the maximum value of
Information Gain, IG(S, x).
7. Remove the attribute that offers the highest IG from the set of
attributes.
8. Repeat until we run out of attributes, or the decision tree
consists only of leaf nodes.
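Below is a minimal sketch of these steps in plain Python. It assumes the dataset is given as a list of attribute→value dictionaries plus a parallel list of class labels (names such as `id3` and `information_gain` are illustrative, not from the slides); the entropy and information-gain formulas are the standard ones defined on the following slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """IG(S, attr) = H(S) - sum over values x of P(x) * H(S_x)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def id3(rows, labels, attributes):
    # Steps 2-3: if all examples share one class, return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 8: out of attributes -> majority-class leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 4-6: pick the attribute with maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    # Step 7: remove the chosen attribute, then recurse on each branch.
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        branch = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        b_rows, b_labels = zip(*branch)
        tree[best][v] = id3(list(b_rows), list(b_labels), remaining)
    return tree
```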



• Entropy: entropy, also called Shannon entropy, is denoted by H(S)
and is defined as H(S) = −Σx p(x) log2 p(x), summed over the
possible outcomes x.
• Entropy tells us about the predictability of a certain event. For example,
consider a coin toss whose probability of heads is 0.5 and probability
of tails is 0.5.
• Here the entropy is the highest possible, since there's no way of
determining what the outcome might be.
• Alternatively, consider a coin which has heads on both sides; the
outcome of such an event can be predicted perfectly, since we know
beforehand that it will always be heads. In other words, this event has
no randomness, hence its entropy is zero.
• In general, lower values imply less uncertainty while higher values
imply more uncertainty.
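Using the `entropy` helper from the ID3 sketch above, the two coin examples give exactly these extremes:

```python
print(entropy(['heads', 'tails']))   # fair coin -> 1.0 bit (maximum uncertainty)
print(entropy(['heads', 'heads']))   # two-headed coin -> -0.0, i.e. zero randomness
```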
• Information Gain: information gain, also called Kullback-Leibler
divergence, denoted IG(S, A) for a set S, is the effective change
in entropy after deciding on a particular attribute A.
• It measures the relative change in entropy with respect to the
independent variables.
• IG(S, A) = H(S) − Σx P(x) · H(x)
• where IG(S, A) is the information gain from applying feature A,
H(S) is the entropy of the entire set, and the second term
calculates the entropy after applying the feature A, where P(x)
is the probability of event x.



• Here 'x' ranges over the possible values of an
attribute. For example, the attribute 'Wind' takes
two possible values in the sample data,
hence x = {Weak, Strong}.


• Here we observe that whenever the Outlook is Overcast, Play Golf is
always 'Yes'. This is no coincidence: this simple subtree
results because the attribute Outlook gives the highest
information gain.
• How do we proceed from this point? We simply apply
recursion, following the algorithm steps described earlier.
• Now that we've used Outlook, we have three attributes remaining:
Humidity, Temperature, and Wind.
• And we had three possible values of Outlook: Sunny, Overcast, and Rain.
The Overcast node already ended up with the leaf node 'Yes', so
we're left with two subtrees to compute: Sunny and Rain.


Example 2

• IG(S, Fever) = 0.99 − (8/14) × 0.81 − (6/14) × 0.91 = 0.13
• IG(S, BreathingIssues) = 0.40
• IG(S, Cough) = 0.09
• BreathingIssues offers the highest information gain, so it is selected first.
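As a quick check, the Fever figure can be recomputed directly (the slide's inputs are already rounded, so the result matches 0.13 only up to rounding):

```python
# IG(S, Fever) = H(S) - (8/14)*H(S_fever=yes) - (6/14)*H(S_fever=no)
ig_fever = 0.99 - (8 / 14) * 0.81 - (6 / 14) * 0.91
print(ig_fever)  # ~0.137, the slide's 0.13 up to rounding
```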


Example 2 (continued)


2. Classification Trees
• A classification tree is an algorithm where the target variable
is fixed or categorical. The algorithm is then used to identify
the "class" within which a target variable would most likely
fall.
• An example of a classification-type problem would be
determining who will or will not subscribe to a digital
platform, or who will or will not graduate from high school.
• These are examples of simple binary classifications, where the
categorical dependent variable can assume only one of two
mutually exclusive values.
• For instance, you may have to predict which type of
smartphone a consumer may decide to purchase.


• In such cases, there are multiple values for the categorical
dependent variable.

(Figure: a classic classification tree.)


3. Regression Trees
• A regression tree refers to an algorithm where
the target variable is continuous and the
algorithm is used to predict its value.
• As an example of a regression-type
problem, you may want to
predict the selling price of a
residential house, which is a
continuous dependent variable.
• This will depend on both
continuous factors, like square
footage, and categorical
factors, like the style of the home,
the area in which the property is
located, and so on.
• Classification trees are used when the dataset needs to be split
into classes which belong to the response variable.
• Regression trees are used when the response variable is
continuous. For instance, if the response variable is something
like the price of a property or the temperature of the day, a
regression tree is used.
• In other words, regression trees are used for prediction-type
problems while classification trees are used for
classification-type problems.
• A classification tree splits the dataset based on the
homogeneity of data.
• In a regression tree, a regression model is fit to the target
variable using each of the independent variables.

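To make the classification/regression distinction concrete, here is a minimal scikit-learn sketch; the tiny inline datasets are invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical response ('fit'/'unfit').
X_cls = [[25, 0], [40, 1], [35, 1], [50, 0]]   # [age, exercises?]
y_cls = ['unfit', 'fit', 'fit', 'unfit']
clf = DecisionTreeClassifier(max_depth=2).fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))                  # -> a class label

# Regression tree: continuous response (house price).
X_reg = [[1200], [1500], [1800], [2400]]       # square footage
y_reg = [150000, 180000, 210000, 300000]
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[2000]]))                   # -> a number
```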


• Advantages of Classification and Regression Trees
(i) The results are simple: the interpretation of results
summarized in classification or regression trees is usually fairly
simple.
• It allows for the rapid classification of new observations.
• It can often result in a simpler model, which explains why the
observations are classified or predicted the way they are.
(ii) Classification and regression trees are nonparametric and
nonlinear: the results from classification and regression trees
can be summarized in simple if-then conditions, and there is no
implicit assumption that:
• the predictor variables and the dependent variable are linearly related;
• the predictor variables and the dependent variable follow some
specific nonlinear link function; or
• the predictor variables and the dependent variable are monotonically
related.


(iii) Classification and regression trees implicitly perform feature
selection:
• When we use decision trees, the top few nodes on which the tree is
split are the most important variables within the set. As a result,
feature selection is performed automatically and we don't need to
do it separately.
• Limitations of Classification and Regression Trees
• There are many classification and regression tree examples where
the use of a decision tree has not led to the optimal result.
• (i) Overfitting: overfitting occurs when the tree takes into account a
lot of the noise that exists in the data and comes up with an inaccurate
result. A common mitigation is sketched after this list.
• (ii) High variance: a small change in the data can lead to
a very high variance in the prediction, thereby affecting the stability
of the outcome.
• (iii) Low bias: a decision tree that is very complex usually has low
bias. This makes it very difficult for the model to incorporate any new
data.
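A common mitigation for overfitting and high variance is to constrain or prune the tree; here is a sketch using scikit-learn's controls (the parameter values are illustrative, not tuned):

```python
from sklearn.tree import DecisionTreeClassifier

# An unconstrained tree can grow until it memorizes noise (overfitting).
# Capping depth and leaf size trades a little bias for much lower variance.
pruned = DecisionTreeClassifier(
    max_depth=3,          # limit tree complexity
    min_samples_leaf=5,   # each leaf must cover at least 5 samples
    ccp_alpha=0.01,       # cost-complexity (post-)pruning strength
)
# ...then fit as usual: pruned.fit(X_train, y_train)
```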


4. Regression
• Regression analysis is a statistical method for modelling the
relationship between a dependent (target) variable and one or
more independent (predictor) variables.
• More specifically, regression analysis helps us to understand
how the value of the dependent variable changes with respect
to one independent variable when the other
independent variables are held fixed.
• It predicts continuous/real values such as temperature, age,
salary, price, etc.
• Example: suppose there is a marketing company A, which runs
various advertisements every year and gets sales from them. The
list below shows the advertisements made by the company in
the last 5 years and the corresponding sales.

(Table: advertisement spend vs. sales for the last 5 years.)


• Now the company wants to run a $200 advertisement
in the year 2019 and wants to know the
predicted sales for that year. To
solve such prediction problems in
machine learning, we need regression analysis.
• Regression is a supervised learning technique
which helps in finding the correlation between
variables and enables us to predict a
continuous output variable based on one or
more predictor variables.
• It is mainly used for prediction, forecasting,
time-series modelling, and determining the
cause-and-effect relationship between variables.
• "Regression shows a line or curve that passes
through all the data points on a target-predictor
graph in such a way that the vertical distance
between the data points and the regression line is
minimum." The distance between the data points and the
line tells whether the model has captured a strong
relationship or not.
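A minimal sketch of the advertisement example with ordinary linear regression; since the slide's table is an image, the spend/sales figures below are invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical advertisement spend ($) and resulting sales, one row per year
ad_spend = np.array([[90], [120], [150], [100], [130]])
sales = np.array([1000, 1300, 1800, 1200, 1380])

model = LinearRegression().fit(ad_spend, sales)
# Predict the sales for a $200 advertisement budget (the 2019 question)
print(model.predict([[200]]))
```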


• Some examples of regression are:
• prediction of rainfall using temperature and other factors;
• determining market trends;
• prediction of road accidents due to rash driving.
• Terminology related to regression analysis:
• Dependent variable: the main factor in regression analysis which we
want to predict or understand is called the dependent variable. It is also
called the target variable.
• Independent variable: the factors which affect the dependent variable,
or which are used to predict its value, are
called independent variables, also called predictors.
• Outliers: an outlier is an observation which contains either a very low
or a very high value in comparison to the other observed values.
• Multicollinearity: if the independent variables are highly correlated with
each other, this condition is called
multicollinearity. It should not be present in the dataset, because it
creates problems when ranking the most influential variables.
• Underfitting and overfitting: if our algorithm works well on the
training dataset but not on the test dataset, the problem is
called overfitting; and if our algorithm does not perform well even on the
training dataset, the problem is called underfitting.


• Below are some other reasons for using regression analysis:
• Regression estimates the relationship between the target and the
independent variables.
• It is used to find trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can confidently determine the
most important factor, the least important factor, and how each
factor affects the others.
• Some important types of regression are given below:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Support Vector Regression
• Group Assignment:
• Decision Tree Regression
• Random Forest Regression
• Ridge Regression
• Lasso Regression
• We will discuss these in detail in chapter 3.
Unsupervised Learning

• Unsupervised learning has no target variable (no labels).
• An unsupervised algorithm works on input data without labelled
responses: there are no pre-existing labels, and little human
intervention is needed. It is mostly used in exploratory analysis, as it
can automatically identify structure in data (a clustering sketch follows).
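As a quick illustration of finding structure without labels, here is a k-means clustering sketch (k-means is one common unsupervised algorithm, not one named on the slides; the points are invented):

```python
from sklearn.cluster import KMeans

# Unlabelled 2-D points: no target variable is provided
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster structure discovered automatically
print(km.cluster_centers_)
```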


2. Neural Network Learning Techniques
1. Transfer Learning

• Transfer learning aims to leverage the knowledge learned
on a resource-rich domain/task to help learn a task that
does not have sufficient training data.
• It is sometimes referred to as domain adaptation.
• The resource-rich domain is known as the source and the low-
resource task is known as the target.
• Transfer learning works best if the model features learned
on the source task are general (i.e., domain-independent).
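A minimal fine-tuning sketch in PyTorch, assuming torchvision ≥ 0.13 and an ImageNet-pretrained ResNet-18 as the source model; the 5-class target task is hypothetical:

```python
import torch.nn as nn
from torchvision import models

# Source: a model pretrained on a resource-rich task (ImageNet)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the general, domain-independent features learned on the source
for param in model.parameters():
    param.requires_grad = False

# Target: replace the head for a low-resource 5-class task;
# only this new layer is trained on the target data
model.fc = nn.Linear(model.fc.in_features, 5)
```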


2. Parallel Learning

• Layer parallelism
3. Centralized, Federated and Distributed Learning


• Centralized Machine Learning
• In centralized machine learning, the participants are
connected to a central server, to which they upload their data.
• In particular, the participants upload their local data to the
cloud server, and the cloud server performs all the
computational tasks needed to train on the data.
• On the one hand, centralized training is computationally
efficient for the participants, as they are freed
from the computation responsibilities, which require
substantial resources.
• On the other hand, participants' private data is highly at
risk, as the cloud server can be malicious or the data can be
inferred by adversaries.


• Distributed Machine Learning
• Distributed machine learning algorithms are designed to
resolve the computational problems of complex algorithms
on large-scale datasets.
• Distributed machine learning algorithms are more
efficient and scalable than centralized algorithms. The
distributed models are trained with the same methodology as
centralized machine learning models, except that they are
trained separately on multiple participants.
• During training in a distributed algorithm, the
participants independently train their models and send the
weight updates to the central server.
• Meanwhile, the central server receives the updates from the
participants and averages them to produce the output. After a
certain number of communication rounds, convergence testing is
done on the central cloud server.
• Federated Learning
• Similar to distributed machine learning, federated learning
also trains models independently on each participant.
• The main difference from distributed machine learning
is that in federated learning each
participant initializes the training independently, as if there were
no other participants in the network.
• In federated learning, the training is conducted
collaboratively yet independently on the individual participants.
• In particular, the number of local epochs is declared in the learning
parameters, and each participant trains on its data by running
those local epochs.
• After the specified number of epochs, the local update is
computed, and the participants send their updates to the
cloud server (a toy sketch of the server-side averaging follows).
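Here is a toy sketch of the server-side averaging step shared by the distributed and federated schemes above (FedAvg-style, in plain NumPy; the participant updates are invented for illustration):

```python
import numpy as np

def average_updates(updates):
    """Server step: average the per-layer weights received from participants."""
    return [np.mean(layers, axis=0) for layers in zip(*updates)]

# Each participant trains locally and sends its weights (one array per layer)
participant_updates = [
    [np.array([0.1, 0.2]), np.array([0.5])],   # participant 1
    [np.array([0.3, 0.0]), np.array([0.7])],   # participant 2
]
global_weights = average_updates(participant_updates)
print(global_weights)   # [array([0.2, 0.1]), array([0.6])]
```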
