
School of Electronic Engineering and Computer Science

Queen Mary University of London

ECS7020P Machine Learning


Supervised learning: Regression

Dr Jesús Requena Carrión

5 Oct 2023
How far is the equator from the north pole? 10,000km

"By using this method, a sort of equilibrium is established between the errors which prevents the extremes from prevailing [...] [getting us closer to the] truth."

Adrien-Marie Legendre, 1805

2/54
Embrace the error!

3/54
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

4/54
Machine learning

There are two main ways of thinking about ML:


Data-first view: ML is a set of tools for extracting knowledge from data.
Deployment-first (our) view: ML is a set of tools together with a methodology for solving problems using data.

In ML, data is organised as a dataset (a collection of items described by a set of attributes) and knowledge is represented as a model.

Machine learning distinguishes between different types of problems, techniques and models, which can be arranged into a taxonomy.

5/54
Machine learning taxonomy

Machine Learning
• Supervised: Regression, Classification
• Unsupervised: Structure Analysis, Density Estimation
6/54
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

7/54
Problem formulation

Regression is a supervised problem: Our goal is to predict the value of one attribute (the label) using the remaining attributes (the predictors, also called features).
The label is a continuous variable.
Our job is then to find the best model that assigns a unique label to a given set of predictors.
We use datasets consisting of labelled samples to build models.

Predictors → Model? → Label

8/54
Examples of regression problems
The following are examples of problems that can be formulated as a
regression problem:
1. Predict the energy consumption of a household, given the location
of the house, household size, income, intensity of occupation.
2. Predict future values of a company stock, given past stock prices.
3. Predict distance driven by a vehicle given its speed and journey
duration.
4. Predict demand given past demand and currency exchange rate.
5. Predict tomorrow’s temperature given today’s temperature and
pressure.
6. Predict the probability of developing a specific heart condition given BMI, alcohol consumption, diet, number of daily steps.

Identify the labels and predictors for each. Do we need machine learning to solve them?
Go to www.menti.com and use code 1858 5479

9/54
Mathematical notation

xi → f(⋅) → ŷi

Population:
x is the predictor attribute
y is the label attribute
Dataset:
N is the number of samples, i identifies each sample
xi is the predictor of sample i
yi is the actual label of sample i
(xi , yi ) is sample i, {(xi , yi ) ∶ 1 ≤ i ≤ N } is the entire dataset
Model:
f (⋅) denotes the model
ŷi = f (xi ) is the predicted label for sample i
ei = yi − ŷi is the prediction error for sample i

10/54
Visualising our mathematical notation

11/54
Candidate solutions
Which line is the best mapping of age to salary?

12/54
What is a good model?

In order for us to find the best model we need a notion of model quality.

The squared error $e_i^2 = (y_i - \hat{y}_i)^2$ is a common quantity used in regression to capture the quality of a single prediction.

Based on the squared error, we can define quality metrics over a whole dataset. Two such metrics are the sum of squared errors (SSE) and the mean squared error (MSE), which are computed as:
$$E_{SSE} = e_1^2 + e_2^2 + \cdots + e_N^2 = \sum_{i=1}^{N} e_i^2$$

$$E_{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^2$$
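A minimal sketch of these two metrics in NumPy (not part of the original slides; the toy values are invented):

```python
import numpy as np

def sse(y, y_hat):
    """Sum of squared errors E_SSE."""
    return np.sum((y - y_hat) ** 2)

def mse(y, y_hat):
    """Mean squared error E_MSE."""
    return sse(y, y_hat) / len(y)

y = np.array([1.0, 2.0, 3.0])        # actual labels y_i
y_hat = np.array([1.1, 1.9, 3.2])    # predicted labels
print(sse(y, y_hat), mse(y, y_hat))  # ≈ 0.06 and 0.02
```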

13/54
MSE: Example

14/54
A zero-error model?

Given a dataset, is it possible to find a model such that ŷi = yi for every instance i in the dataset, i.e. a model whose error is zero, $E_{MSE} = 0$?

(a) Never, there will always be a non-zero error
(b) It is never guaranteed, but might be possible for some datasets
(c) Always, there will always be a model complex enough to achieve this

15/54
The nature of the error

When considering a regression problem we need to be aware that:


The chosen predictors might not include all the factors that
determine the label.
The chosen model might not be able to represent the true
relationship between response and predictor (the pattern).
Random mechanisms (noise) might be present.

Mathematically, we represent this discrepancy as

y = ŷ + e
= f (x) + e

There will always be some discrepancy (error e) between the true label y
and our model prediction f (x). Embrace the error!

16/54
Regression as an optimisation problem (take 1)

Given a dataset $\{(x_i, y_i) : 1 \leq i \leq N\}$, every candidate model f has its own $E_{MSE}$. Our goal is to find the model with the lowest $E_{MSE}$:

$$f_{best}(x) = \arg\min_{f} \frac{1}{N}\sum_{i=1}^{N} \left(y_i - f(x_i)\right)^2$$

The question is, how do we find such a model? Finding it is an optimisation problem.

Why take 1? Note that we are defining regression as finding the model that minimises $E_{MSE}$ on the training dataset, without considering what happens once deployed. We'll revise this definition.

17/54
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

18/54
Our regression learner

[Diagram: Data and Knowledge enter the Learner, which produces a Model; during Deployment, the Model takes New data and produces a Prediction/Action.]

Data: Labelled samples (predictors and true label).
Model: Predicts a label based on the predictors.

19/54
Simple regression
Simple regression considers one predictor x and one label y.

20/54
Simple linear regression

In simple linear regression, models are defined by the mathematical expression

$$f(x) = w_0 + w_1 x$$

Hence, the predicted label ŷi can be expressed as

$$\hat{y}_i = f(x_i) = w_0 + w_1 x_i$$

A linear model therefore has two parameters, w0 (intercept) and w1 (gradient), which need to be tuned to achieve the highest quality.

In machine learning, we use a dataset to tune the parameters. We say that we train the model or fit the model to the training dataset.
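As a minimal sketch (not part of the slides, with invented age/salary values), the two parameters can be fitted with NumPy:

```python
import numpy as np

# Invented age (years) and salary (kGBP) values, for illustration only
x = np.array([18.0, 25.0, 37.0, 45.0, 52.0, 66.0])   # predictor: age
y = np.array([12.0, 30.0, 55.0, 62.0, 70.0, 80.0])   # label: salary

# Fit f(x) = w0 + w1*x by minimising the squared error
w1, w0 = np.polyfit(x, y, deg=1)    # polyfit returns the highest power first
y_hat = w0 + w1 * x                 # predicted labels
print(f"w0 = {w0:.2f}, w1 = {w1:.2f}, training MSE = {np.mean((y - y_hat)**2):.2f}")
```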

21/54
Linear solution: Example

22/54
Beyond linearity
Sketch the model that you would choose for the Salary Vs Age dataset
and try to find a suitable mathematical expression.

23/54
Simple polynomial regression

The general form of a polynomial regression model is:

$$f(x_i) = w_0 + w_1 x_i + w_2 x_i^2 + \cdots + w_D x_i^D$$

where D is the degree of the polynomial.

Polynomial regression defines a family of families of models. For each value of D, we have a different family: D = 1 corresponds to the linear family, D = 2 to the quadratic, D = 3 to the cubic, and so on.

We call D a hyperparameter: setting its value selects a different family of models, each with its own collection of parameters.
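A small sketch (reusing the invented data above, not from the slides) of how the hyperparameter D selects a different polynomial family, each fitted by least squares:

```python
import numpy as np

x = np.array([18.0, 25.0, 37.0, 45.0, 52.0, 66.0])   # invented ages
y = np.array([12.0, 30.0, 55.0, 62.0, 70.0, 80.0])   # invented salaries

for D in (1, 2, 3, 5):
    w = np.polyfit(x, y, deg=D)     # coefficients w_D, ..., w_1, w_0
    y_hat = np.polyval(w, x)        # evaluate the degree-D polynomial
    print(f"D = {D}: training MSE = {np.mean((y - y_hat)**2):.3f}")
```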

24/54
Quadratic solution

25/54
Cubic solution

26/54
Degree-5 solution

27/54
Multiple regression
In multiple regression there are two or more predictors (features). Given item i, we will denote each individual predictor as xi,1 , xi,2 , ... and xi,K , where K is the number of predictors.

28/54
Multiple regression: Linear model

Use vector notation to represent a multiple linear regression model where the predictors are age and height and the label is salary.

29/54
Multiple regression: Vector notation

Let K denote the number of predictors. We will represent the k-th predictor of item i as xi,k.

Using vector notation, the predictors of item i can be packed together into a vector represented in bold font:

xi = [1, xi,1 , xi,2 , . . . , xi,K ]T ,

where the constant 1 is prepended for convenience.

Using vector notation, multiple regression can then be expressed as

ŷi = f (xi )

Good news: the notation developed for simple regression can be easily translated to the multivariate scenario, no extra effort required!

30/54
Multiple linear regression: Formulation

Linear models in multiple regression are simply the sum of a constant (or
intercept) and each predictor multiplied by its own coefficient.

Multiple linear regression models can be expressed as:

f (xi ) = wT xi = w0 + w1 xi,1 + ⋅ ⋅ ⋅ + wK xi,K

where w = [w0 , w1 , . . . , wK ]T is the model’s parameter vector.

Note that we can use the same vector notation for simple linear
regression models, by defining w = [w0 , w1 ]T and xi = [1, xi ]T .

31/54
Multiple linear regression: Solution visualisation
Multiple linear regression models are planes (or hyperplanes).

32/54
Multiple regression: More notation

In multiple linear regression, the training dataset can be represented by the design matrix X:

$$\mathbf{X} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,K} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & x_{N,2} & \dots & x_{N,K} \end{bmatrix}$$

together with the label vector y:

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

33/54
Multiple regression: More notation
Given a linear model defined by coefficients

$$\mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_K \end{bmatrix}$$

we can calculate the predicted label vector ŷ as

$$\hat{\mathbf{y}} = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_N \end{bmatrix} = \mathbf{X}\mathbf{w} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,K} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & x_{N,2} & \dots & x_{N,K} \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_K \end{bmatrix}$$

and the error vector e as

$$\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$$

34/54
Multiple linear regression: Example

Consider a dataset consisting of 4 samples described by three attributes:

Age [Years] Height [cm] Salary [GBP]


S1 18 175 12000
S2 37 180 68000
S3 66 158 80000
S4 25 168 45000

1. Use vector notation to represent the linear regression model.
2. Obtain the design matrix X and response vector y.
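A short sketch of task 2 in NumPy (not part of the original slides), using the four samples from the table above:

```python
import numpy as np

# Age [years], Height [cm], Salary [GBP] for samples S1-S4
age    = np.array([18.0, 37.0, 66.0, 25.0])
height = np.array([175.0, 180.0, 158.0, 168.0])
salary = np.array([12000.0, 68000.0, 80000.0, 45000.0])

# Design matrix X: a leading column of ones, then one column per predictor
X = np.column_stack([np.ones_like(age), age, height])
y = salary   # label (response) vector

print(X.shape, y.shape)   # (4, 3) (4,)
```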

35/54
The least squares solution

It can be shown that the linear model that minimises $E_{MSE}$ on a training dataset defined by a design matrix X and a label vector y has the parameter vector:

$$\mathbf{w}_{best} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

This is an exact or analytical solution and is known as the least squares solution. It is valid for simple and multiple linear regression.

This solution can also be used for polynomial models, by treating the powers of the predictor as predictors themselves.

Note that the inverse matrix $(\mathbf{X}^T\mathbf{X})^{-1}$ exists only when all the columns of X are linearly independent.
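As an illustrative sketch (not from the slides, rebuilding the X and y of the previous example), the least squares parameters can be computed directly; in practice np.linalg.lstsq is usually preferred over forming the inverse explicitly:

```python
import numpy as np

# Rebuild X and y from the earlier example (Age, Height -> Salary)
age    = np.array([18.0, 37.0, 66.0, 25.0])
height = np.array([175.0, 180.0, 158.0, 168.0])
y      = np.array([12000.0, 68000.0, 80000.0, 45000.0])
X = np.column_stack([np.ones_like(age), age, height])

# Closed-form least squares solution: w_best = (X^T X)^(-1) X^T y
w_best = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer alternative solving the same minimisation
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ w_best                       # predicted labels
print("w_best =", w_best)
print("training MSE =", np.mean((y - y_hat) ** 2))
```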

36/54
Other models for regression

Linear and polynomial models are not the only options available. Other
families of models that can be used include:

Exponential
Sinusoids
Radial basis functions
Splines
And many more!

The mathematical formulation is identical; only the expression for f(⋅) changes.

37/54
Other quality metrics
In addition to the MSE, we can consider other quality metrics:
Root mean squared error. Measures the sample standard deviation of the prediction error:

$$E_{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} e_i^2}$$

Mean absolute error. Measures the average of the absolute prediction error:

$$E_{MAE} = \frac{1}{N}\sum_{i=1}^{N} |e_i|$$

R-squared. Measures the proportion of the variance in the response that is predictable from the predictors:

$$E_{R} = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}, \quad \text{where } \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$
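A compact sketch of these metrics as NumPy functions (not from the slides; the toy values are invented):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error E_RMSE."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error E_MAE."""
    return np.mean(np.abs(y - y_hat))

def r_squared(y, y_hat):
    """Proportion of label variance explained by the model."""
    e = y - y_hat
    return 1.0 - np.sum(e ** 2) / np.sum((y - np.mean(y)) ** 2)

y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4])
print(rmse(y, y_hat), mae(y, y_hat), r_squared(y, y_hat))
```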

38/54
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

39/54
Flexibility

Models allow us to generate multiple shapes by tuning their parameters. We talk about the degrees of freedom or the complexity of a model to describe its ability to generate different shapes, i.e. its flexibility.

The degrees of freedom of a model are in general related to the number of parameters of the model:
A linear model $y = w_0 + w_1 x$ has two parameters and is inflexible, as it can only generate straight lines.
A cubic model $y = w_0 + w_1 x + w_2 x^2 + w_3 x^3$ has four parameters and is more flexible than a linear one.
The flexibility of a model is related to its interpretability and its accuracy, and there is a trade-off: more flexible models can fit the training data more closely, but are harder to interpret.

40/54
Interpretability
Model interpretability is crucial for us, as humans, to understand in a
qualitative manner how a predictor is mapped to a label. Inflexible
models produce solutions that are usually simpler and easier to interpret.

According to this linear model, the older you get, the more money you make.
According to this polynomial model, our salary remains the same as teenagers, then increases between our 20s and 50s, then...
41/54
Quality on the training dataset
The quality of a model on a training dataset is also related to its
flexibility. During training, the error produced by flexible models is in
general lower.

The training error of the best linear model is $E_{MSE} = 0.0983$.
The training error of the best polynomial model is $E_{MSE} = 0.0379$.

42/54
Generalisation

We have considered the training MSE, i.e. the quality of regression models on the training dataset.

[Diagram: Data and Priors enter the Learner, which produces a Model; during Deployment, the Model takes New data and produces a Prediction/Action.]

Will our model work well during deployment, when presented with new data? Generalisation is the ability of our model to successfully translate what was learnt during the learning stage to deployment.

43/54
Generalisation
In this figure, the red curve represents the training MSE of different
models of increasing complexity, whereas the blue curve represents the
deployment MSE for the same models. What’s happening?

[Figure: training and deployment MSE plotted against model flexibility, with the low-flexibility region labelled "underfitting", the high-flexibility region labelled "overfitting", and the minimum of the deployment curve labelled "just right".]

44/54
Underfitting and overfitting

By comparing the performance of models during training and deployment, we can observe three different behaviours:
Underfitting: Large errors are produced during both training and deployment. The model is unable to capture the underlying pattern. Rigid models lead to underfitting.
Overfitting: Small errors are produced during training, large errors during deployment. The model is memorising irrelevant details. Overly complex models and insufficient data lead to overfitting.
Just right: Low training and deployment errors. The model is
capable of reproducing the underlying pattern and ignores
irrelevant details.
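A minimal numerical sketch of these behaviours (synthetic data, invented for illustration): a noisy quadratic pattern is fitted with polynomials of increasing degree, and training error is compared with error on held-out data standing in for deployment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a quadratic pattern plus noise
x = rng.uniform(0.0, 1.0, size=40)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

# Hold out half of the samples to stand in for deployment data
x_train, y_train = x[:20], y[:20]
x_test,  y_test  = x[20:], y[20:]

for D in (1, 2, 10):   # rigid, just right, very flexible
    w = np.polyfit(x_train, y_train, deg=D)
    mse_train = np.mean((y_train - np.polyval(w, x_train)) ** 2)
    mse_test  = np.mean((y_test  - np.polyval(w, x_test)) ** 2)
    print(f"D = {D:2d}: training MSE = {mse_train:.4f}, held-out MSE = {mse_test:.4f}")
```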

45/54
Underfitting and overfitting

46/54
Underfitting

47/54
Overfitting

48/54
Just right

49/54
Underfitting and overfitting
Remember this: Generalisation can only be assessed by comparing
training and deployment performance, not by just looking at how each
model fits the training data.

[Figure: training and deployment MSE plotted against model flexibility.]

50/54
Agenda

Recap

Formulation of regression problems

Basic regression models

Flexibility, interpretability and generalisation

Summary

51/54
Regression: Basic methodology

Regression is a family of problems in machine learning, where we set out to find a model that predicts a continuous label.
To build a model we use:
• A training dataset,
• a tunable model,
• a quality metric and
• an optimisation procedure.
The final quality of a model has to be assessed during deployment.

52/54
Model generalisation

Models have different degrees of flexibility. Complex models are flexible, simple models are rigid.
A model generalises well when it can deal successfully with samples
that it hasn’t been exposed to during training.
Three terms describe the ability of models to generalise:
• Underfitting: unable to describe the underlying pattern
• Overfitting: memorisation of irrelevant details
• Just right: reflects underlying pattern and ignores irrelevant
details

53/54
Final historical note

Wondering where the term regression comes from?

In the 19th century, Galton noticed that children of tall people tend to be
taller than average – but not as tall as their parents. Galton called this
reversion and later regression towards mediocrity.

This observation is nowadays called regression to the mean. You can read more about this curious effect in Kahneman's Thinking, Fast and Slow.

54/54
