Week 3
Machine Learning
Introduction to Supervised Learning
• https://developers.google.com/machine-learning/glossary#model
• https://scikit-learn.org/stable/glossary.html
• https://seaborn.pydata.org/examples/index.html
3
Applications
• Association
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
• Reinforcement Learning
4
Supervised Learning: Uses
• Prediction of future cases: Use the rule to predict the output for
future inputs
• Knowledge extraction: The rule is easy to understand
• Compression: The rule is simpler than the data it explains
• Outlier detection: Exceptions that are not covered by the rule, e.g.,
fraud
5
Supervised Learning
• We discuss supervised learning starting from the simplest case, which
is learning a class from its positive and negative examples.
• We generalize and discuss the case of multiple classes, then
regression, where the outputs are continuous.
6
Learning a Class from Examples
• Class C of a “family car”
• Prediction: Is car x a family car?
• Knowledge extraction: What do people expect from a family car?
• Output:
Positive (+) and negative (–) examples
• Input representation:
x1: price, x2 : engine power
7
Training set X
$r = \begin{cases} 1 & \text{if } x \text{ is positive} \\ 0 & \text{if } x \text{ is negative} \end{cases}$

$X = \{x^t, r^t\}_{t=1}^{N}$

$h(x) = \begin{cases} 1 & \text{if } h \text{ classifies } x \text{ as positive} \\ 0 & \text{if } h \text{ classifies } x \text{ as negative} \end{cases}$

Empirical error: $E(h \mid X) = \sum_{t=1}^{N} \mathbf{1}\big(h(x^t) \neq r^t\big)$
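A minimal sketch of this empirical error in Python, using made-up price/engine-power values and an assumed axis-aligned rectangle hypothesis (the bounds are illustrative, not learned from the data):

```python
import numpy as np

# Toy training set: x1 = price, x2 = engine power; r = 1 for family car, 0 otherwise.
# All values are made up for illustration.
X = np.array([[12_000, 90], [15_000, 110], [30_000, 200], [8_000, 60], [22_000, 160]])
r = np.array([1, 1, 0, 0, 1])

def h(x, p1=10_000, p2=20_000, e1=80, e2=150):
    """Axis-aligned rectangle hypothesis: predict positive iff price and
    engine power fall inside the assumed bounds (bounds are illustrative)."""
    return int(p1 <= x[0] <= p2 and e1 <= x[1] <= e2)

# Empirical error E(h | X): number of training examples h misclassifies.
E = sum(h(x) != rt for x, rt in zip(X, r))
print(f"E(h | X) = {E} errors on {len(X)} examples")
```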
Any h ∈ H between S and G is consistent, and together they make up the
version space (Mitchell, 1997). 14
Margin
• Choose h with largest margin
18
Multiple Classes, C_i, i = 1, ..., K
The training set: $X = \{x^t, r^t\}_{t=1}^{N}$, where
$r_i^t = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \neq i \end{cases}$
Train hypotheses h_i(x), i = 1, ..., K, one per class, each predicting r_i^t.

Regression
• If there is no noise, the task is interpolation. We would like to find the function
f(x) that passes through these points such that we have
$r^t = f(x^t)$ 20
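A minimal sketch of the one-vs-all encoding behind the multi-class setup above, using made-up class labels (the array names are illustrative):

```python
import numpy as np

# Toy 3-class problem; labels are integers 0..K-1 (illustrative data only).
K = 3
labels = np.array([0, 2, 1, 0, 1, 2])

# r_i^t = 1 if x^t belongs to class C_i, 0 otherwise (one column per class).
R = np.zeros((len(labels), K), dtype=int)
R[np.arange(len(labels)), labels] = 1
print(R)

# One hypothesis h_i is then trained per column, each treating class i as
# positive and every other class as negative (one-vs-all).
```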
Regression
• In time-series prediction, for example, we have data up to the present and we
want to predict the value for the future.
• In regression, there is noise added to the output of the unknown function:
  $r^t = f(x^t) + \varepsilon$
• The empirical error of an estimate g on the training set is
  $E(g \mid X) = \frac{1}{N} \sum_{t=1}^{N} \big( r^t - g(x^t) \big)^2$
21
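A minimal sketch of this empirical error for regression, using noisy samples from an assumed linear function:

```python
import numpy as np

def empirical_error(g, x, r):
    """E(g | X) = (1/N) * sum_t (r^t - g(x^t))^2 for a candidate function g."""
    return np.mean((r - g(x)) ** 2)

# Illustrative noisy samples: r^t = f(x^t) + noise, with f(x) = 2x + 1 assumed.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
r = 2 * x + 1 + rng.normal(scale=0.3, size=x.shape)

print(empirical_error(lambda x: 2 * x + 1, x, r))  # small: close to the noise variance
print(empirical_error(lambda x: 0.5 * x, x, r))    # much larger error
```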
Regression
• The square of the difference is one error (loss) function that can be used; another
is the absolute value of the difference.
• Our aim is to find g(·) that minimizes the empirical error.
• Our approach is the same; we assume a hypothesis class for g(·) with a small set
of parameters.
• If we assume that g(x) is linear, we have
$g(x) = w_1 x_1 + \dots + w_d x_d + w_0 = \sum_{j=1}^{d} w_j x_j + w_0$
22
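A minimal sketch of fitting such a multivariate linear model with ordinary least squares (the data and "true" weights below are assumed for illustration):

```python
import numpy as np

# Illustrative data with d = 2 inputs; the "true" weights below are assumed.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 2))
r = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.5, size=50)

# Append a column of ones so w0 is estimated together with w1..wd.
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, r, rcond=None)

print("w1, w2, w0 =", np.round(w, 2))            # close to 3, -2, 5
print("g(x) at x = (1, 1):", np.array([1, 1, 1]) @ w)
```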
Regression
• Let us now go back to our example in previous section where we estimated the price of a
used car.
• There we used a single input linear model
$g(x) = w_1 x + w_0$
where $w_1$ and $w_0$ are the parameters to learn from data. The $w_1$ and $w_0$ values should minimize
$E(w_1, w_0 \mid X) = \frac{1}{N} \sum_{t=1}^{N} \big( r^t - (w_1 x^t + w_0) \big)^2$
• The output may be taken as a higher-order function of the input, for example quadratic:
$g(x) = w_2 x^2 + w_1 x + w_0$
23
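A minimal sketch of the used-car example, with made-up age/price values; np.polyfit is used here as a stand-in for minimizing E(w1, w0 | X):

```python
import numpy as np

# Made-up used-car data: x = age in years, r = price in $1000s (illustrative only).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
r = np.array([19.0, 17.2, 15.1, 13.8, 12.9, 12.1, 11.8, 11.5])

# np.polyfit minimizes the same squared-error criterion.
lin = np.polyfit(x, r, deg=1)    # g(x) = w1*x + w0
quad = np.polyfit(x, r, deg=2)   # g(x) = w2*x^2 + w1*x + w0

for name, coeffs in [("linear", lin), ("quadratic", quad)]:
    mse = np.mean((r - np.polyval(coeffs, x)) ** 2)
    print(f"{name}: training MSE = {mse:.3f}")
```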
Regression
• Linear, second-order, and sixth-order polynomials are fitted to the same set of points.
• The highest order gives a perfect fit, but given this much data it is very unlikely that the real curve is so shaped.
• The second order seems better than the linear fit in capturing the trend in the training data.
[Figure: the same data points fitted with $g(x) = w_1 x + w_0$, $g(x) = w_2 x^2 + w_1 x + w_0$, and a sixth-order polynomial.]
24
Model Selection & Generalization
• If the training set we are given contains only a small subset of all
possible instances, the solution is not unique.
• This is an example of an ill-posed problem, where the data by itself is
not sufficient to find a unique solution.
• If learning is ill-posed, and data by itself is not sufficient to find the
solution, we should make some extra assumptions to have a unique
solution with the data we have.
• The set of assumptions we make to have learning possible is called
the inductive bias of the learning algorithm.
• The need for inductive bias: assumptions about the hypothesis class H.
25
Model Selection & Generalization
• Learning is not possible without inductive bias, and now
the question is how to choose the right bias.
• This is called model selection, which is choosing between
possible H.
• In answering this question, we should remember that the
aim of machine learning is rarely to replicate the training
data but rather to predict correctly for new cases.
• We would like to be able to generate the right output for
an input instance outside the training set, one for which
the correct output is not given in the training set.
• How well a model trained on the training set predicts the
right output for new instances is called generalization.
26
Underfitting
• For best generalization, we should match the
complexity of the hypothesis class H with the
complexity of the function underlying the
data.
• If H is less complex than the function, we have
underfitting, for example, when trying to fit a
line to data sampled from a third-order
polynomial.
• In such a case, as we increase the complexity,
the training error decreases.
• But if we have H that is too complex, the data
is not enough to constrain it and we may end
up with a bad hypothesis, h ∈ H.
27
Overfitting
• If there is noise, an overcomplex hypothesis may learn not only the underlying
function but also the noise in the data and may make a bad fit, for example, when
fitting a sixth-order polynomial to noisy data sampled from a third-order
polynomial.
• This is called overfitting.
• In such a case, having more training data helps but only up to a certain point.
• Given a training set and H, we can find h∈ H that
has the minimum training error but if H is not
chosen well, no matter which h∈ H we pick,
we will not have good generalization.
28
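A minimal sketch of this effect, fitting polynomials of increasing degree to noisy samples from an assumed third-order polynomial:

```python
import numpy as np

# Noisy samples from a third-order polynomial (the ground truth is assumed here).
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 10)
r = x**3 - 0.5 * x + rng.normal(scale=0.1, size=x.shape)

for deg in (1, 2, 3, 6):
    coeffs = np.polyfit(x, r, deg=deg)
    train_mse = np.mean((r - np.polyval(coeffs, x)) ** 2)
    print(f"degree {deg}: training MSE = {train_mse:.4f}")

# Training error keeps shrinking as the degree grows, but the degree-6 fit is
# chasing the noise (overfitting), while degree 1 is too simple (underfitting).
```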
Triple Trade-Off
• In all learning algorithms that are trained from example data, there is
a trade-off between three factors:
• the complexity of the hypothesis C(H) we fit to data, namely, the capacity of
the hypothesis class,
• the amount of training data (N), and
• the generalization error (E) on new examples.
• As N increases, E decreases.
• As C(H) increases, E first decreases and then increases.
29
Train and Validation Set
• We can measure the generalization ability of a hypothesis, namely, the
quality of its inductive bias, if we have access to data outside the training
set.
• We simulate this by dividing the dataset we have into two parts.
• We use one part for training (i.e., to fit a hypothesis); the remaining
part is called the validation set and is used to test the
generalization ability.
• Assuming large enough training and validation sets, the hypothesis that is
the most accurate on the validation set is the best one (best inductive
bias).
30
Cross-Validation and Test Set
• This process of splitting the data is called cross-validation.
• Note that if we then need to report the error to give an idea about the
expected error of our best model, we should not use the validation error.
• We have used the validation set to choose the best model, and it has
effectively become a part of the training set.
• We need a test set, containing examples not used in training or validation.
• We split the data as
• Training set (50%)
• Validation set (25%)
• Test set (25%)
• Resampling is used when there is little data.
31
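A minimal sketch of this 50/25/25 split and of choosing a model on the validation set before reporting test error (the data and candidate models are assumed for illustration):

```python
import numpy as np

# Illustrative 50/25/25 split; data again sampled from an assumed cubic + noise.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 200)
r = x**3 - 0.5 * x + rng.normal(scale=0.1, size=x.shape)

idx = rng.permutation(len(x))
tr, va, te = idx[:100], idx[100:150], idx[150:]   # 50% / 25% / 25%

def mse(coeffs, xs, rs):
    return np.mean((rs - np.polyval(coeffs, xs)) ** 2)

# Model selection: fit candidate models on the training set and pick
# the one with the lowest validation error ...
fits = {d: np.polyfit(x[tr], r[tr], deg=d) for d in (1, 2, 3, 6)}
best = min(fits, key=lambda d: mse(fits[d], x[va], r[va]))

# ... then report the error of the chosen model on the untouched test set.
print("chosen degree:", best, "  test MSE:", round(mse(fits[best], x[te], r[te]), 4))
```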
Holdout Method for Model Evaluation and Selection
• For model evaluation, the dataset is split into two parts. Generally, a 70-30% split is used.
• For model selection, the dataset is split into three different sets: training, validation, and test.
• The hold-out method can also be used for hyperparameter tuning.
32
Leave-p-out Cross-Validation
• Leave-p-out cross-validation involves using p observations as the validation set
and the remaining observations as the training set.
• This is repeated for all possible ways of splitting the original sample into a
validation set of p observations and a training set.
33
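A minimal sketch with scikit-learn's LeavePOut, on a tiny made-up dataset:

```python
import numpy as np
from sklearn.model_selection import LeavePOut

# Tiny illustrative dataset: leave-p-out is only practical for small N,
# because the number of splits grows combinatorially with N.
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 0])

lpo = LeavePOut(p=2)
print("number of splits:", lpo.get_n_splits(X))   # C(5, 2) = 10
for train_idx, val_idx in lpo.split(X):
    # Each iteration holds out a different pair of observations for validation.
    print("train:", train_idx, "validation:", val_idx)
```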
K-Fold Cross-Validation
• For more advanced statistical evaluation, experienced experimenters often prefer
the so-called K-fold cross-validation.
• To begin with, the set of pre-classified examples is divided into K equally sized (or
almost equally sized) subsets, which machine-learning jargon sometimes (not
quite correctly) refers to as “folds.”
34
K-Fold Cross-Validation
• K-fold cross-validation then runs K experiments.
• In each, one of the K subsets is removed so as to be used only for testing (this
guarantees that, in each run, a different testing set is used).
• The training is then carried out on the union of the remaining K-1 subsets.
• Again, the results are averaged, and the standard deviation calculated.
35
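A minimal sketch of K-fold cross-validation with scikit-learn, averaging the fold scores and reporting their standard deviation (the classifier and K = 5 are arbitrary choices here):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative 5-fold cross-validation on the Iris data.
X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# Average the K results and report the standard deviation, as described above.
print("fold accuracies:", np.round(scores, 3))
print(f"mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```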
Dimensions of a Supervised Learner
1. Model: $g(x \mid \theta)$
where g(·) is the model, x is the input, and θ are the parameters.
37
Homework
Week 3
LAB.
https://scikit-learn.org/stable/auto_examples/index.html#dataset-examples
The Iris Dataset
• This dataset consists of 3 different types of irises (Setosa, Versicolour, and
Virginica), with petal and sepal measurements (length and width) stored in a
150×4 numpy.ndarray.
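A minimal sketch of loading this dataset with scikit-learn:

```python
from sklearn.datasets import load_iris

# Load the Iris data: 150 samples x 4 features.
iris = load_iris()
print(iris.data.shape)      # (150, 4) numpy.ndarray
print(iris.feature_names)   # sepal/petal length and width
print(iris.target_names)    # ['setosa' 'versicolor' 'virginica']
print(iris.data[:3])        # first three rows
```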