Applied Machine Learning I

The document provides an overview of various supervised learning algorithms for classification tasks, including K-Nearest Neighbor (K-NN), Decision Trees, Random Forest, Support Vector Machines (SVM), and Logistic Regression. Each algorithm is explained in terms of its working principles, advantages, disadvantages, and specific terminologies. The document also includes examples and mathematical formalism relevant to the discussed algorithms.


Lecture Week 8:

Applied Machine Learning


Supervised Learning Algorithms for Classification Task
Mr. Nwachukwu Victor C.
K-Nearest Neighbor Algorithm



K-Nearest Neighbor Algorithm
The K-Nearest Neighbors (K-NN)
algorithm is a popular technique in
machine learning used for both
classification and regression tasks.

It works on the principle that data points that are close together are likely to have similar characteristics.



KNN Classification
Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is an n-dimensional feature vector and $y_i$ is the class label:
1. Compute Distance: For a test point $x$, compute the distance $d(x, x_i)$ between $x$ and all points in the training set.
2. Select Neighbors: Select the $K$ closest points.
3. Majority Vote: Determine the class label by majority vote among the $K$ neighbours.

Distance Metrics
1. Euclidean Distance: $d(x, x') = \sqrt{\sum_{j=1}^{n} (x_j - x'_j)^2}$
2. Manhattan Distance: $d(x, x') = \sum_{j=1}^{n} |x_j - x'_j|$
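To make the three steps concrete, here is a minimal from-scratch sketch in Python; the toy training points and the `knn_predict` helper are illustrative, not from the lecture:

```python
# Minimal K-NN classifier sketch (illustrative; toy data).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single test point x by majority vote of its k nearest neighbours."""
    # 1. Compute Distance: Euclidean distance from x to every training point.
    distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # 2. Select Neighbors: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # 3. Majority Vote: most common label among the k neighbours.
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [6.0, 5.0], [7.0, 7.0]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # -> "A"
```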



KNN Classification
Choosing K: The value of K significantly impacts the performance of the algorithm. A low K value is more sensitive to noise in the data and can overfit; a high K value reduces the impact of noise but can oversmooth the decision boundary and underfit.

Non-parametric: K-NN does not make any assumptions about the underlying data distribution, making it suitable for various data types.

Lazy Learner: K-NN doesn't learn a model from the data during training. Instead, it waits until a new data point arrives and then calculates the distances to determine the nearest neighbours.

Distance Metric: The choice of distance metric (e.g., Euclidean, Manhattan) affects how the distance between points is calculated. We will be using the Euclidean distance metric.
KNN Classification
Advantages of K-NN
• Simplicity: Easy to understand and implement.
• No Training Phase: Useful for applications where the dataset is frequently updated.
• Adaptability: Can be used for both classification and regression tasks.

Disadvantages of K-NN
• Computational Cost: Can be expensive in terms of memory and computation, especially with large datasets, as it needs to store all training data and compute distances for each query.
• Curse of Dimensionality: Performance can degrade with high-dimensional data because the distance metrics become less informative.
• Sensitive to Noise: Can be influenced by irrelevant features or noisy data points.
KNN Question I
Suppose you have a dataset containing information about houses, with each house represented by two features: square footage (in square feet) and number of bedrooms. You are given the following dataset:

[Table of houses omitted in the source.]

Using the KNN algorithm with K = 3 and the Euclidean distance measure, find the 3 nearest neighbours of a new house with the following features: square footage = 1600 sq ft and bedrooms = 2.

Solution: [distance calculations omitted in the source.] Thus the 3 nearest neighbours to the new house are House 1, House 4, and House 3.
KNN Question II
A KNN classifier assigns a test instance to the majority class associated with its K nearest training instances. Distance between instances is measured using Euclidean distance. Suppose we have the following training set of positive (+) and negative (-) instances and a single test instance (o). All instances are projected onto a vector space of two real-valued features (X and Y). Answer the following questions. Assume "unweighted" KNN (every nearest neighbour contributes equally to the final vote).

[Scatter plot of the training instances and test instance omitted in the source.]

(a) What would be the class assigned to this test instance for K = 1? Give your reason.
(b) What would be the class assigned to this test instance for K = 3? Give your reason.
(c) What would be the class assigned to this test instance for K = 5? Give your reason.
(d) Setting K to a large value seems like a good idea. We get more votes! Given this particular training set, would you recommend setting K = 11? Why or why not?
Decision Tree Algorithm



Decision Tree (DT) Algorithm
The Decision Tree algorithm is a popular
supervised learning algorithm used for
both classification and regression tasks.

It splits the data into subsets based on the most significant attribute in the dataset, recursively doing so to form a tree-like model of decisions.

Decision Tree Terminologies
• Root Node: The topmost node of the tree. It represents the first feature or decision from which the tree branches.

• Internal Nodes (Decision Nodes): Nodes where the data is tested against the values of particular attributes. Each internal node has branches leading to further nodes.

• Leaf Nodes (Terminal Nodes): The end points of the branches, where the final decision or prediction is made. Leaf nodes have no further branches.

• Branches (Edges): Links between nodes that represent the outcome of a decision under particular conditions.



Decision Tree Terminologies
• Splitting: The process of dividing a node into two or more sub-nodes based on a
decision criterion. It involves selecting a feature and a threshold to create subsets
of data.

• Parent Node: A node that is split into child nodes. The original node from which a
split originates.

• Child Node: Nodes created as a result of a split from a parent node.


• Decision Criterion: The rule or condition used to determine how the data should
be split at a decision node. It involves comparing feature values against a
threshold.

• Pruning: The process of removing branches or nodes from a decision tree to improve its generalisation and prevent overfitting.
How Decision Trees Work (Classification Task)
1. Starting Point: Begin with the entire dataset as the root.
2. Splitting: At each node, choose the best feature to split the data. The best split is based on a criterion such as Gini impurity or Information Gain.
3. Stopping Criteria: Recursively split the subsets until one of the stopping criteria is met, such as maximum depth of the tree, minimum number of samples in a node, or no further gain from splitting.
4. Prediction: For classification, the label of a new instance is determined by traversing the tree from the root to a leaf, following the splits.

Splitting Criteria
1. Gini impurity index:
$$Gini = 1 - \sum_{i=1}^{C} p_i^2$$
where $p_i$ is the probability of a randomly chosen element being correctly classified (the proportion of class $i$ in the node) and $C$ is the number of classes.
2. Entropy:
$$H(S) = -\sum_{i=1}^{C} p_i \log_2 p_i$$
Information gain for an attribute $A$:
$$IG(S, A) = H(S) - \sum_{v \in \text{values}(A)} \frac{|S_v|}{|S|} H(S_v)$$
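The two impurity measures in a short Python sketch (the function names are mine; this is a sketch of the formulas, not a full tree builder):

```python
# Impurity measures used for decision-tree splits (illustrative sketch).
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum_i p_i * log2(p_i) over the class proportions p_i."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(gini(["Y", "Y", "N", "N"]))     # 0.5 (maximally impure for two classes)
print(entropy(["Y", "Y", "N", "N"]))  # 1.0
```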
Building DTs using Gini Index
Given the dataset:

Employee | Study Hrs. | Pass Exam
---------|------------|----------
Fresher  | > 2hrs     | Y
Fresher  | > 2hrs     | Y
Senior   | > 2hrs     | Y
Junior   | < 2hrs     | Y
Fresher  | > 2hrs     | N
Senior   | > 2hrs     | N
Fresher  | < 2hrs     | N

Build a decision tree using the Gini index split criterion.

Step I: Weighted Gini Index of Employee
Gini index (Fresher) = $1 - [(2/4)^2 + (2/4)^2] = 0.5$
Gini index (Senior) = $1 - [(1/2)^2 + (1/2)^2] = 0.5$
Gini index (Junior) = $1 - [(1/1)^2 + (0/1)^2] = 0$
Weighted Gini for Employee = $(4/7)(0.5) + (2/7)(0.5) + (1/7)(0) \approx 0.429$


Building DTs using Gini Index
Step II: Weighted Gini Index of Study Hrs.
Gini index (>2hrs) = $1 - [(3/5)^2 + (2/5)^2] = 0.48$
Gini index (<2hrs) = $1 - [(1/2)^2 + (1/2)^2] = 0.5$
Weighted Gini for Study Hrs. = $(5/7)(0.48) + (2/7)(0.5) \approx 0.486$

"Employee" has the lower weighted Gini index (0.429 vs. 0.486), and thus the higher information gain, so we build the tree with "Employee" as the root node.

[Diagram omitted in the source: a tree with "Employee" at the root branching to Fresher, Junior, and Senior, each branch split further on Study Hrs. (>2 / <2) and ending in Y/N leaves.]
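Both steps can be checked with a few lines of Python (my own recomputation of the slide's arithmetic; the helper names are illustrative):

```python
# Recompute the weighted Gini indices for the Employee dataset (sketch).
from collections import Counter

data = [  # (Employee, Study Hrs., Pass Exam) rows from the slide
    ("Fresher", ">2", "Y"), ("Fresher", ">2", "Y"), ("Senior", ">2", "Y"),
    ("Junior", "<2", "Y"), ("Fresher", ">2", "N"), ("Senior", ">2", "N"),
    ("Fresher", "<2", "N"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(column):
    """Weighted Gini of 'Pass Exam' after splitting on the given column index."""
    total = len(data)
    score = 0.0
    for v in {row[column] for row in data}:
        subset = [row[2] for row in data if row[column] == v]
        score += len(subset) / total * gini(subset)
    return score

print(f"Employee:   {weighted_gini(0):.3f}")  # ~0.429 -> chosen as the root
print(f"Study Hrs.: {weighted_gini(1):.3f}")  # ~0.486
```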
Decision Tree (DT) Algorithm Pros and Cons
Pros:
1. Simple to understand and interpret.
2. Can handle both numerical and categorical data.
3. Requires little data preprocessing (no need for normalization or scaling).
4. Capable of capturing non-linear relationships.
5. Can handle multi-output problems.

Cons:
1. Prone to overfitting, especially with deep trees.
2. Can be unstable; small variations in the data can result in a completely different tree.
3. Greedy nature does not guarantee the globally optimal solution.
4. Decision trees can be biased if some classes dominate.


Random Forest Algorithm



Random Forest Algorithm
The Random Forest algorithm is an
ensemble learning method that
combines the predictions of multiple
decision trees to improve classification
and regression accuracy.

By aggregating the results of several trees, Random Forest mitigates the overfitting problem inherent in individual decision trees and provides more robust predictions.



How Random Forest Works (Classification Task)
1. Bootstrap Sampling: The algorithm generates multiple subsets of the training data through a process called bootstrap sampling. Each subset is created by randomly selecting samples from the original dataset with replacement, meaning some samples may appear multiple times in a subset.

2. Decision Tree Construction: For each subset, a decision tree is constructed. Unlike standard decision trees, Random Forest introduces additional randomness: at each split in the tree, a random subset of features is selected, and the best split is chosen only from this subset. This process is known as "feature bagging."

3. Aggregation of Predictions: For classification tasks, each decision tree in the forest casts a vote for the predicted class. The final prediction is the class that receives the majority of votes.
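These three steps are what scikit-learn's RandomForestClassifier implements; a minimal usage sketch on synthetic data (assuming scikit-learn is installed; the hyperparameter values are illustrative):

```python
# Random Forest via scikit-learn (usage sketch; synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees (one per bootstrap sample)
    max_features="sqrt",  # feature bagging: features considered at each split
    bootstrap=True,       # sample the training set with replacement
    random_state=0,
)
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # majority-vote accuracy on held-out data
```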
Mathematical Formalism
1. Bagging (Bootstrap Sampling)
Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$ of size $N$, generate $B$ bootstrap samples $D_1, D_2, \dots, D_B$, where each bootstrap sample $D_b$ is created by randomly selecting $N$ instances from $D$ with replacement. Here the $x_i$ are the data instances and the $y_i$ their corresponding labels.

2. Growing Decision Trees with Feature Bagging
For each bootstrap sample $D_b$, grow an unpruned tree $T_b$. At each node of the tree, randomly select $m$ features out of the total $p$ features. From these $m$ features, select the feature that provides the best split according to a chosen impurity measure (Gini impurity, entropy).


Mathematical Formalism
3. Aggregating Predictions
Once all $B$ decision trees are trained, each tree $T_b$ predicts a class label $\hat{y}_b = T_b(x)$ for an instance $x$. The final prediction is determined by majority vote:
$$\hat{y} = \operatorname{mode}\{\hat{y}_1, \hat{y}_2, \dots, \hat{y}_B\}$$

Pros and Cons
Pros:
• High accuracy
• Robustness
• Feature importance
• Versatility
• Handles missing values

Cons:
• Complex
• Computationally intensive
• High memory usage
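The formalism above, condensed into a from-scratch sketch that reuses scikit-learn's tree learner (all names and parameter values are mine; a sketch, not a production implementation):

```python
# From-scratch bagging + majority vote around scikit-learn trees (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
rng = np.random.default_rng(1)
B = 25  # number of bootstrap samples / trees

trees = []
for _ in range(B):
    # 1. Bootstrap sample D_b: N draws from D with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Unpruned tree T_b with feature bagging (m = sqrt(p) features per split).
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# 3. Aggregate: each tree votes; the majority class wins.
votes = np.array([t.predict(X[:5]) for t in trees])  # shape (B, 5)
y_hat = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print(y_hat)
```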
Support Vector Machine (SVM) Algorithm



Support Vector Machine (SVM) Algorithm
Support Vector Machine (SVM) is a
machine learning algorithm used for
linear or nonlinear classification.
Key Concepts
1. Hyperplane: In SVM, the goal is to find
a hyperplane that best separates the
classes in the feature space.

2. Support Vectors: These are the data points that are closest to the hyperplane and influence its position and orientation. The SVM algorithm uses these support vectors to find the optimal hyperplane.
Support Vector Machine (SVM) Algorithm
3. Margin: The margin is the distance between the hyperplane and the nearest data points (the support vectors). The main objective of the support vector machine algorithm is to maximize the margin; a wider margin generally indicates better classification performance.

4. Kernel: A kernel is the mathematical function used in SVM to map the original input data points into high-dimensional feature spaces. Some of the common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
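A minimal scikit-learn sketch showing the kernel choice in practice (synthetic data; the hyperparameter values are illustrative):

```python
# SVM classification with an RBF kernel via scikit-learn (usage sketch).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feature scaling matters for SVMs because they are margin/distance based.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
# The support vectors that determine the optimal hyperplane:
print(clf.named_steps["svc"].support_vectors_.shape)
```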
Support Vector Machine (SVM) Algorithm
Advantages
1. Effective in high-dimensional spaces: SVM is particularly effective in cases where the number of dimensions exceeds the number of samples.
2. Memory efficient: SVM uses a subset of training points (support vectors) in the decision function.
3. Versatile: Different kernel functions can be specified for the decision function, making it adaptable to various data types and distributions.

Disadvantages
1. Not suitable for large datasets: SVMs are not suitable for very large datasets due to their high computational complexity.
2. Difficult to choose the right kernel: The choice of kernel and its parameters can significantly affect the performance of SVM.
3. Sensitive to noisy data: SVM does not perform well when the data has a lot of noise.


Logistic Regression



Logistic Regression
Logistic regression is used for binary classification using a sigmoid function (logistic function), which takes the independent variables as input and produces a probability value between 0 and 1.

The logistic function is defined as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
where $z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$ is a linear combination of the input features $x_1, \dots, x_n$ and the model parameters $\beta_0, \dots, \beta_n$ (coefficients).
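In code, the logistic function is a one-liner (the coefficient values below are illustrative, not from the slides):

```python
# The logistic (sigmoid) function: sigma(z) = 1 / (1 + e^(-z)) (sketch).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# z is a linear combination of features and coefficients: z = beta0 + beta . x
beta0, beta = -1.0, np.array([2.0, 0.5])  # illustrative coefficients
x = np.array([0.8, 1.2])
print(sigmoid(beta0 + beta @ x))  # probability in (0, 1), here ~0.77
```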


Logistic Regression
Common Terms
1. Independent variables: The input features or predictor variables used to predict the dependent variable.
2. Dependent variable: The target variable in a logistic regression model, which we are trying to predict.
3. Coefficients: The logistic regression model's estimated parameters, which describe how the independent and dependent variables relate to one another.
4. Maximum likelihood estimation: The method used to estimate the coefficients of the logistic regression model, which maximizes the likelihood of observing the data given the model. The log-likelihood function is given as:
$$\ell(\beta) = \sum_{i=1}^{N} \left[ y_i \log \sigma(z_i) + (1 - y_i) \log(1 - \sigma(z_i)) \right]$$
Maximizing the log-likelihood function with respect to the parameters $\beta$ gives the estimated model coefficients. This is typically done using iterative optimization algorithms such as gradient descent.
Logistic Regression Steps
1. Initialization: Initialize the model parameters $\beta$ (to zero or small random values).
2. Model Specification: Define the linear combination of features, $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$, and apply the logistic function to obtain probabilities: $p = \sigma(z) = \frac{1}{1 + e^{-z}}$.
3. Optimization: Use an optimization algorithm to maximize the likelihood function and find the optimal parameters $\beta$.
4. Prediction: For a new instance $x$, compute the linear combination $z$, compute the probability $p = \sigma(z)$, and classify the instance based on the decision rule (e.g., threshold at 0.5).
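Putting the four steps together, a minimal gradient-ascent sketch on synthetic data (the learning rate, iteration count, and data generation are my own illustrative choices):

```python
# Logistic regression trained by maximizing the log-likelihood (sketch).
import numpy as np

rng = np.random.default_rng(0)
N = 200
X = rng.normal(size=(N, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic binary labels

X1 = np.hstack([np.ones((N, 1)), X])  # prepend a column of 1s for beta_0

# Step 1: Initialization.
beta = np.zeros(X1.shape[1])

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):
    # Step 2: Model specification -- z = X beta, p = sigma(z).
    p = sigmoid(X1 @ beta)
    # Step 3: Optimization -- the log-likelihood gradient is X^T (y - p).
    beta += 0.1 / N * X1.T @ (y - p)

# Step 4: Prediction with a 0.5 threshold.
x_new = np.array([1.0, 0.3, -0.2])  # [1, x1, x2]
p_new = sigmoid(x_new @ beta)
print(p_new, int(p_new >= 0.5))
```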
