Module-I: Supervised Learning [14 Sessions] [Bloom's Taxonomy Selected: Application]
An overview of Machine Learning (ML); ML workflow; types of ML; Types of features, Feature Engineering -
Data Imputation Methods; Regression – introduction; simple linear regression, loss functions; Polynomial
Regression; Logistic Regression; Softmax Regression with cross entropy as cost function;
Bayesian Learning – Bayes Theorem, estimating conditional probabilities for categorical and continuous
features, Naïve Bayes for supervised learning; Bayesian Belief networks; Support Vector Machines – soft margin
and kernel tricks.
Machine Learning
Machine learning (ML) is a type of artificial intelligence that enables
machines to learn and improve from experience without being explicitly
programmed.
• Supervised learning involves training a model on a labeled dataset,
where each input example is paired with its corresponding output value.
ML Workflow
• Data Collection
• Data Cleaning
• Feature Engineering
• Model Selection
• Model Training
• Model Evaluation
• Model Deployment
• Data collection involves gathering data from various sources, such as databases, APIs,
and web scraping.
• Data cleaning involves preprocessing the data to remove missing values, outliers,
and other anomalies.
• Model selection involves choosing the appropriate algorithm for
the task at hand. This can involve experimenting with different
algorithms and comparing their performance.
• Model training involves fitting the model to the training data using
an optimization algorithm such as gradient descent.
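As a rough end-to-end illustration of these steps (a minimal sketch using scikit-learn and its built-in Iris dataset; the split sizes and algorithm choice are just placeholders):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Data collection: load a ready-made labeled dataset
X, y = load_iris(return_X_y=True)

# Data cleaning / feature engineering would normally happen here

# Model selection: pick a candidate algorithm to try
model = LogisticRegression(max_iter=1000)

# Model training: fit the model on the training split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

# Model evaluation: score on held-out data before deployment
print(accuracy_score(y_test, model.predict(X_test)))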
Types of Features
Feature Engineering
Feature engineering is the process of selecting and transforming the
input features to improve the model’s performance.
Some common feature engineering techniques include feature scaling,
one-hot encoding, and dimensionality reduction.
Feature scaling involves scaling the numerical features to a common
scale to prevent bias in the model.
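A minimal sketch of two of these techniques, assuming scikit-learn's StandardScaler and OneHotEncoder and a small made-up feature matrix:

import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Hypothetical numerical feature (age) and categorical feature (city)
ages = np.array([[25.0], [32.0], [47.0], [51.0]])
cities = np.array([["Delhi"], ["Mumbai"], ["Delhi"], ["Chennai"]])

# Feature scaling: bring the numeric column to zero mean and unit variance
scaled_ages = StandardScaler().fit_transform(ages)

# One-hot encoding: turn each category into its own binary column
encoded_cities = OneHotEncoder().fit_transform(cities).toarray()

print(scaled_ages)
print(encoded_cities)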
Data Transformation
Data transformation is a key step in a machine learning pipeline: the raw data is modified and converted into a better format so that it is more suitable for analysis and model training.
Image Data Transformation:
Numerical Data Transformation:
Numerical data typically requires scaling and normalization to
ensure that all the input features have a similar scale and range.
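For example, min-max normalization rescales every feature to the [0, 1] range. A small sketch, assuming scikit-learn's MinMaxScaler and made-up values:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical features with very different ranges (salary vs. years of experience)
X = np.array([
    [30000.0, 1.0],
    [60000.0, 5.0],
    [90000.0, 10.0],
])

# Each column is mapped to [0, 1] as (x - min) / (max - min)
print(MinMaxScaler().fit_transform(X))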
Data Imputation
Methods of Data Imputation
Mean Imputation: Replace missing values with the mean of the
non-missing values in the variable.
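A minimal sketch of mean imputation, assuming scikit-learn's SimpleImputer (pandas' fillna with the column mean would give the same result):

import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical column with one missing value
X = np.array([[4.0], [8.0], [np.nan], [6.0], [2.0]])

# Replace each NaN with the mean of the non-missing values (here 5.0)
print(SimpleImputer(strategy="mean").fit_transform(X))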
Example
Suppose we have a dataset of five data points in which one value is missing (NaN). We impute the missing value by fitting a simple linear regression line

y = a + b·x

where
y is the dependent variable (the missing value we want to impute),
x is the independent variable (in this case, the index of the data point),
b is the slope of the line, and a is the y-intercept.
To estimate the missing value, we first need to fit the linear regression
model to the available data. We can do this by using the least squares
method, which finds the values of a and b that minimize the sum of the
squared differences between the predicted values and the actual values
of the dependent variable. The formula for the slope b and y-intercept
a are given by:
Find the value of Y for x = 12:
Y = 1.5 + (0.95 × 12)
Y = 12.9
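The same fit can be reproduced with an ordinary least-squares routine. The sketch below uses NumPy's polyfit on made-up (x, y) pairs chosen so that the fitted values land near the a ≈ 1.5 and b ≈ 0.95 used above:

import numpy as np

# Hypothetical observed points (the point with the missing y is left out of the fit)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.4, 3.5, 4.3, 5.4, 6.2])

# Least-squares line y = a + b*x; for degree 1, polyfit returns [slope, intercept]
b, a = np.polyfit(x, y, 1)
print(a, b)          # roughly 1.51 and 0.95

# Impute the missing value at x = 12 using the fitted line
print(a + b * 12)    # roughly 12.9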
K-Nearest Neighbor (KNN) Imputation
Calculating K Nearest Neighbors with NaN Euclidean Distance:
Which features will the KNN imputer take into account? Whatever columns are passed as X.
Because it relies on Euclidean distance calculations, the KNN imputer is not applicable to categorical variables.
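A minimal sketch of KNN imputation, assuming scikit-learn's KNNImputer (which uses the NaN-aware Euclidean distance described below) and made-up data:

import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical numeric data with missing entries
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each NaN is filled with the mean of that feature over the n_neighbors
# rows closest in NaN-aware Euclidean distance
print(KNNImputer(n_neighbors=2).fit_transform(X))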
The formula for the (NaN-aware) Euclidean distance between two data points X and Y is:

d(X, Y) = sqrt( weight × Σ_i (X_i − Y_i)² )

where the sum runs only over the coordinates present in both points, X_i and Y_i represent the values of the i-th feature or dimension of data points X and Y, respectively, and
weight = (total # of coordinates) / (# of present coordinates).
To calculate the Euclidean distance between X and Y, we ignore the missing value (NaN) in feature B and compute the distance using the available features:

d(X, Y) = sqrt(1.5 × 5) ≈ 2.74

In this case, the missing value in feature B does not contribute to the Euclidean distance calculation between X and Y. The distance is determined from the available features A and C, with the weight 3/2 = 1.5 compensating for the dropped coordinate.
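This weighted distance can be checked numerically. The sketch below uses scikit-learn's nan_euclidean_distances helper with hypothetical values for A, B, and C chosen to reproduce the sqrt(1.5 × 5) computation above:

import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

# Features A, B, C; feature B is missing in X
X = np.array([[2.0, np.nan, 4.0]])
Y = np.array([[3.0, 5.0, 6.0]])

# Only A and C are used, scaled by weight = 3 / 2 = 1.5
print(nan_euclidean_distances(X, Y))  # sqrt(1.5 * ((2-3)**2 + (4-6)**2)) ≈ 2.74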
What is Regression?
Simple Linear Regression
Simple linear regression is a type of regression where there is only one
independent variable. The goal of simple linear regression is to find the
line of best fit that minimizes the difference between the predicted values
and the actual values of the dependent variable.
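A minimal sketch of fitting such a line, assuming scikit-learn's LinearRegression and a tiny made-up dataset:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single feature x and target y
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.3, 6.2, 8.1, 10.2])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_[0])  # y-intercept and slope of the best-fit line
print(model.predict([[6.0]]))            # prediction for x = 6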
Loss Functions
In order to find the line of best fit in simple linear regression, we need to
define a loss function that measures the difference between the predicted
values and the actual values of the dependent variable.
Two commonly used loss functions in simple linear regression are the
Mean Squared Error (MSE) and the Mean Absolute Error (MAE).
MSE = (1/n) Σ_{i=1..n} (y_i − ŷ_i)²

MAE = (1/n) Σ_{i=1..n} |y_i − ŷ_i|
Loss Functions:
Loss functions are used in regression to quantify the difference
between the predicted values and the actual values of the dependent
variable.
The goal of regression is to minimize the loss function by adjusting
the parameters of the model.
The most commonly used loss function in simple linear regression is
the mean squared error (MSE), which is the average of the squared
differences between the predicted values and the actual values.
Other loss functions include mean absolute error (MAE) and root
mean squared error (RMSE).
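All three losses are straightforward to compute directly; a small NumPy sketch with made-up predictions:

import numpy as np

# Made-up actual and predicted values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
rmse = np.sqrt(mse)                     # root mean squared error

print(mse, mae, rmse)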
Polynomial Regression
Types of Polynomial Regression
A quadratic equation is the general term for a second-degree polynomial equation, but the degree can go up to any value n. Here is the categorization of polynomial regression:
Linear – if the degree is 1
Quadratic – if the degree is 2
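Polynomial regression is typically implemented as linear regression on polynomially expanded features. A sketch of a quadratic (degree 2) fit, assuming scikit-learn's PolynomialFeatures and made-up data:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical data following a roughly quadratic trend
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1])

# degree=2 expands x into [1, x, x^2]; degree=1 would reduce to plain linear regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6.0]]))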
Introduction to Logistic Regression
Logistic regression is a popular method for binary classification,
where the goal is to predict the probability of an input belonging to
a particular class.
The logistic regression model uses a logistic function to model
the probability of an input belonging to the positive class.
The logistic function maps any input to a value between 0 and
1, which can be interpreted as a probability.
p(y = 1 | x) = 1 / (1 + e^(−z))

where z = θ0 + θ1·x1 + θ2·x2 + · · · + θn·xn is the linear combination of the input features x1, x2, . . . , xn and their corresponding coefficients θ1, θ2, . . . , θn.
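A minimal sketch of this computation, with made-up coefficients θ and features x (a real model would learn θ from data instead):

import numpy as np

def sigmoid(z):
    # Maps any real number z to a value in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

theta0 = -1.0                  # hypothetical intercept θ0
theta = np.array([0.8, 0.5])   # hypothetical coefficients θ1, θ2
x = np.array([2.0, 3.0])       # hypothetical input features x1, x2

z = theta0 + np.dot(theta, x)  # linear combination of the features
print(sigmoid(z))              # p(y = 1 | x) ≈ 0.89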
Softmax Regression
• Softmax regression is a generalization of logistic regression that can be used for
multi-class classification problems.
• The softmax function is used to compute the probability of each class, and the class with the highest probability is chosen as the predicted class.
p(y = i | x) = e^(z_i) / Σ_{j=1..K} e^(z_j)
where K is the number of classes, zi is the linear combination of input features for
class i, and the denominator sums over all classes.
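A small NumPy sketch of the softmax computation for K = 3 classes, using made-up scores z:

import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick; it does not change the result
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

z = np.array([2.0, 1.0, 0.1])  # hypothetical linear scores z_i, one per class
probs = softmax(z)
print(probs, probs.sum())       # class probabilities; they sum to 1
print(np.argmax(probs))         # predicted class = the one with the highest probability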
Cross Entropy Loss
Solving Classification Problems with Naïve Bayes
How does the naïve Bayes algorithm work?
Naïve Bayes treats each feature as its own independent piece of evidence for a class; it doesn't take into account the effect of any other feature.
• Advantage: often used for benchmarking of a model.
Three types of naïve Bayes classifiers in sklearn
• Bernoulli Naïve Bayes – used when the data is binary, like true/false or yes/no.
• Multinomial Naïve Bayes – used when there are discrete counts, such as the number of family members or pages in a book.
• Gaussian Naïve Bayes – used when all features are continuous variables, like temperature or height.
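A minimal sketch of the Gaussian variant on continuous features, assuming scikit-learn's GaussianNB and the built-in Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GaussianNB models each continuous feature per class with a normal distribution;
# BernoulliNB / MultinomialNB would be used for binary or count features instead
clf = GaussianNB().fit(X_train, y_train)
print(clf.score(X_test, y_test))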
Reversing the condition
Example: Rahul’s favorite breakfast is bagels and his favorite lunch is pizza. The probability of Rahul having
bagels for breakfast is 0.6. The probability of him having pizza for lunch is 0.5. The probability of him having
a bagel for breakfast given that he eats a pizza for lunch is 0.7.
Let’s define event A as Rahul having a bagel for breakfast, Event B as Rahul having a pizza for lunch.
P(A) = 0.6
P(B) = 0.5
P(A|B) = 0.7
If we look at the numbers, the probability of having a bagel (0.6) is different from the probability of having a bagel given that he has pizza for lunch (0.7). This means that having a bagel for breakfast is dependent on having pizza for lunch.
Now, what if we need to know the probability of him having pizza for lunch given that he had a bagel for breakfast, i.e. P(B|A)?
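Applying Bayes' theorem to these numbers gives the reverse probability:

P(B|A) = P(A|B) · P(B) / P(A) = (0.7 × 0.5) / 0.6 ≈ 0.58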
Bayes' theorem describes the probability of an event based on prior knowledge of the conditions that might be related to the event. If we know the conditional probability P(A|B), we can use Bayes' rule to find the reverse probability P(B|A).
Naïve Bayes is a popular algorithm for supervised learning, particularly
for text classification problems. It’s based on Bayes’ theorem, which is
a formula for calculating conditional probabilities.
Bayesian Belief Networks
One common application of Bayesian belief networks
is in medical diagnosis.
Naïve Bayes and Bayesian belief networks are powerful tools for
probabilistic modeling and inference.
In a Bayesian belief network, multiplying together the conditional probabilities of each node given its parents yields the joint probability distribution.
Support Vector Machines (SVMs) are a popular machine
learning algorithm for classification and regression. They work
by finding the hyperplane that maximally separates the data
points in a high-dimensional feature space.
• The basic idea behind SVMs is to find the hyperplane that maximizes
the margin between the positive and negative data points. The
margin is the distance between the hyperplane and the closest data
points from either class. The hyperplane that maximizes the margin is
called the maximum margin hyperplane.
• SVMs can be extended to handle non-linearly separable data using a
technique called the kernel trick. The kernel trick involves mapping
the original feature space to a higher-dimensional space using a non-
linear function, and then finding the maximum margin hyperplane in
this new space.
• Applying a mapping function
Disadvantages:
• Increased Computation
• Increased Learning Cost
• No single rule of thumb that works for all types of data
In practice, data is often not perfectly separable, and finding the
maximum margin hyperplane is not always possible. In these
cases, we can use a variant of SVM called the soft margin SVM.
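In scikit-learn's SVC, the soft margin is controlled by the penalty parameter C: a small C tolerates more margin violations, a large C fits the training data more strictly. A minimal sketch on made-up, slightly overlapping data:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical, slightly overlapping two-class data
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)

soft = SVC(kernel="linear", C=0.1).fit(X, y)      # wide margin, more violations tolerated
strict = SVC(kernel="linear", C=100.0).fit(X, y)  # narrow margin, fewer violations
print(soft.score(X, y), strict.score(X, y))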
The kernel trick is a powerful technique for extending SVMs to handle
non-linearly separable data. The basic idea is to map the data into a
higher-dimensional feature space using a non-linear function, and then
apply the linear SVM algorithm to this new feature space.
• The key insight behind the kernel trick is that we can compute the dot
product between the mapped data points without explicitly computing the
mapping. This is done by defining a kernel function that takes two data
points as input and returns the dot product of their mapped features.
• Some common kernel functions include the polynomial kernel, which
computes the dot product of two vectors raised to a certain power, and the
radial basis function kernel, which measures the similarity between two data
points based on their distance in the feature space.
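The trick can be illustrated directly: for the degree-2 polynomial kernel (with no bias term), the kernel value equals the dot product of explicitly mapped features. A small NumPy sketch with made-up 2-D points:

import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# Degree-2 polynomial kernel (no bias term): k(x, y) = (x . y)^2
k = np.dot(x, y) ** 2

# Explicit degree-2 mapping phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

# The dot product in the mapped space equals the kernel value (both are 121 here)
print(k, np.dot(phi(x), phi(y)))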
Support Vector Machines are a powerful machine learning
algorithm that can handle both linear and non-linearly
separable data.