
ECS784U/P DATA ANALYTICS

(WEEK 3, 2024)
SUPERVISED LEARNING (REGRESSION)

DR ANTHONY CONSTANTINOU
SCHOOL OF ELECTRONIC ENGINEERING AND COMPUTER SCIENCE
TIMETABLE
LECTURE OVERVIEW
Supervised Learning (Regression)
▪ Coursework 1.
▪ Week 3 Lab.
▪ Association.
▪ Bivariate Linear Regression.
▪ Multivariate Linear Regression.
▪ Non-linear Regression.
▪ Optimisation.
▪ Regularisation.

COURSEWORK 1: DATES
Important Dates:

▪ Release date:
▪ end of Week 2, Friday 2nd February 2024 at 12:00 noon.

▪ Submission deadline:
▪ mid-Week 8, Wednesday 13th March 2024 at 10:00AM.

▪ Late submission deadline (cumulative penalty applies):
▪ Within 7 days after deadline.
COURSEWORK 1: GUIDELINES (Reading slide)
General information:
▪ Students will sometimes upload their coursework as a draft and not hit the submit button.
Make sure you fully complete the submission process.
▪ A penalty will be applied automatically by the system for late submissions.
▪ Lecturers cannot remove the penalty!
▪ Penalties can only be challenged via submission of an Extenuating Circumstances
(EC) form which can be found on your Student Support page. All the information
you need to know is on that page, including how to submit an EC claim along with the
deadline dates and full guidelines.
▪ Deadline extensions can only be granted through approval of an EC claim.
▪ If you submit an EC form, your case will be reviewed by a panel. When the panel
reaches a decision, they will inform both you and the module organiser (Anthony).
▪ If you miss both the submission deadline and the late submission deadline, you will
automatically receive a score of 0.
▪ Submissions via e-mail are not accepted.

▪ The School recommends that we set the deadline during a weekday at 10:00 AM.

▪ For more details on submission regulations, please refer to your relevant student
handbook.
COURSEWORK 1: OVERVIEW
▪ The submission involves two files:
▪ a data analytic report (Deliverable 1).
▪ a Jupyter notebook (Deliverable 2).

▪ You should address a data-related problem in your professional field or a field you
are interested in (e.g., healthcare, sports, bioinformatics, gaming, finance, etc). If you
are motivated by the subject matter, the project will be more fun for you, and you will
likely produce a better report.

▪ Once you determine the area that interests you the most, you
should search for a suitable data set online, or collate the
data set yourself (see Section 5 for possible data sources).
COURSEWORK 1: OVERVIEW
▪ You should apply a minimum of TWO data analytic techniques (i.e. machine
learning algorithms) of your choice to your data, from those covered in this
course up to and including Week 5.
▪ The aim is to learn two models and contrast their performance on your
input data.
▪ You are allowed to test more than TWO data analytic techniques if you
wish (e.g., using multiple techniques to learn a model, or learning more
than two models), but this is not a requirement and will not necessarily
improve your mark.
▪ Remember to use the page limit wisely against the marking criteria.

▪ The algorithms you can choose from are:
▪ Linear, non-linear and logistic regression,
▪ Support vector classification or regression,
▪ Decision trees,
▪ KNN,
▪ k-means,
▪ GMMs.
COURSEWORK 1: DELIVERABLES (Reading slide)
Deliverable 1: Technical report:
▪ The report shall have a maximum length of 7 pages including references. Pages beyond the first 7
will NOT be marked.
▪ Font size should be no smaller than 11, and page margins no narrower than 2cm. There
are no other formatting requirements; e.g., the document can have a single-column or a
two-column format.
▪ Reports should be written with a technical audience in mind. The report should be concise
and clear, adopting the same style you would use in writing a scientific report or project
dissertation.
▪ Some of the components your report should include:
▪ Problem statement and hypothesis.
▪ A review of relevant literature.
▪ Description of your data set and how it was obtained, including a sample of the data presented
in a figure, along with pointers to your data sources.
▪ Description of any data pre-processing steps you took (if any).
▪ What you have learnt by exploring the data; you may include some visualisations if necessary.
▪ How you chose which features to use in your analysis.
▪ Details of your modelling process, including how you selected your data analytic methods, as
well as how you determined the optimal model through validation.
▪ Your challenges and successes.
▪ Concluding remarks including key findings.
▪ Possible extensions or business applications of your project.
COURSEWORK 1: DELIVERABLES (Reading slide)
Deliverable 2: Jupyter notebook:
▪ You must submit your Jupyter notebook Python code as a separate PDF
file. This is needed so that we can quickly refer to your code outputs while
marking your report. Please do not forget to add some section headings and
comments around your code, similar to those added to the notebooks used
in the labs.

▪ In Windows, you can generate a PDF file by right-clicking the Jupyter notebook loaded
in your browser and selecting ‘Print’; you should then be given an option to save it as
a PDF file.

▪ Do NOT copy-and-paste your notebook’s code into a Word document, as this approach
will not preserve the notebook’s format.

▪ You do NOT need to submit your data set or the actual .ipynb file. These
might be requested at a later stage, if and only if we would like to review
your code and/or data in greater depth.
COURSEWORK 1: MARKING CRITERIA

This coursework contributes 60% towards your total module mark.
COURSEWORK 1: TIMETABLE (Reading slide)
COURSEWORK 1: DATA SOURCES
Using public data is the most common choice. If you have access to private data, that is also
an option, though you will have to be careful about what results you can release to us. Some
sources of publicly available data are listed below (you don't have to use these sources).
▪ Kaggle
▪ https://www.kaggle.com/
▪ Over 50,000 public data sets for machine learning.

▪ UC Irvine Machine Learning Repository


▪ https://archive.ics.uci.edu/
▪ More than 600 data sets across different domains for machine learning testing.

▪ NHS Health and Social Care Information Centre


▪ http://www.hscic.gov.uk/home
▪ Health datasets from the UK National Health Service.
▪ Google Finance
▪ https://www.google.com/finance
▪ Forty years' worth of stock market data, updated in real time.

▪ Football datasets
▪ http://www.football-data.co.uk/
▪ This site provides historical data for football matches around the world.

More data sources are provided in the coursework specification document available
on QM+.
WEEK 3 LAB OVERVIEW (Reading slide)
Pandas
What is Pandas?
▪ Pandas is one of the fundamental libraries for data manipulation, including data
cleaning, and analysis in Python.
▪ It is suitable for manipulating two-dimensional tabular data, so you might find it
useful for Coursework 1.
▪ Three Jupyter notebook files:
▪ Notebook 1: covers Series and DataFrames.
▪ A Series is a one-dimensional object similar to an array, list, or column in a
table (e.g., each column in Excel).
▪ A DataFrame (one of the most commonly used Pandas objects) is a two-
dimensional data structure, similar to a spreadsheet, relational database
table, or R's data.frame object.
▪ Notebook 2: Manipulating DataFrames.
▪ E.g., merging, concatenating, pivoting and deleting data.
▪ Notebook 3: Processing data from DataFrames.
▪ E.g., drop/remove/replace operations, data discretisation, outliers,
sampling, conditioning/grouping.
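
As a minimal sketch of the objects these notebooks cover (the column names and values below are made up for illustration):

```python
import pandas as pd

# A Series: a one-dimensional labelled array, like a single column in a table.
temperatures = pd.Series([21.5, 23.0, 19.8], name="temperature")

# A DataFrame: a two-dimensional table built from one or more columns.
sales = pd.DataFrame({
    "temperature": [21.5, 23.0, 19.8],
    "ice_cream_sales": [120, 150, 95],
})

# Typical manipulation and processing steps covered in the notebooks:
print(sales.describe())                        # summary statistics
hot_days = sales[sales["temperature"] > 20]    # conditioning (row filtering)
trimmed = sales.drop(columns=["temperature"])  # dropping a column
```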
ASSOCIATION

[Figure: three scatter plots of Variable 1 (y) against Variable 2 (x), illustrating positive, zero and negative association. Caption: relationship between two variables.]
POSITIVE ASSOCIATION

[Figure: scatter plot of ice cream sales against temperature, showing an upward trend.]

What does it mean?

▪ There is a positive relationship between the two variables.
Specifically, larger quantities of ice cream sales associate with higher
temperature.
▪ Knowing the quantity of ice cream sales improves our prediction
about temperature.
▪ Knowing temperature improves our prediction about ice cream sales.
NO ASSOCIATION

[Figure: scatter plot of car value against car colour, showing no pattern.]

What does it mean?

▪ There is no meaningful statistical relationship between a
car’s colour and its value.
▪ Knowing the colour of a car does not improve our prediction
about the value of the car.
▪ Knowing the value of a car does not improve our prediction
about the colour of the car.
NEGATIVE ASSOCIATION

[Figure: scatter plot of life expectancy against cigarettes smoked, showing a downward trend.]

What does it mean?

▪ There is a negative relationship between the two variables. Specifically,
larger quantities of cigarettes smoked associate with shorter life
expectancy.
▪ Knowing the quantity of cigarettes smoked improves our prediction about
life expectancy.
▪ Knowing life expectancy improves our prediction about cigarettes smoked.
REGRESSION

[Diagram: Machine Learning splits into Supervised and Unsupervised learning; Supervised learning splits into Regression and Classification.]
SUPERVISED LEARNING
There are two types of supervised learning.

▪ Regression
▪ Predict numeric target 𝑦.
▪ E.g., house price growth, temperature, etc.

▪ Classification
▪ Predict categorical target 𝑦.
▪ E.g., Female/Male, True/False, Win/Draw/Lose, Rain/No-Rain.
LINEAR REGRESSION
Why is it important?
▪ Linear Regression is one of the simplest predictive models.
▪ Not to be confused with regression vs classification
terminology! E.g., logistic regression is a classification
method.
▪ Despite its simplicity, regression serves as the foundation
for more advanced statistical and machine learning models,
especially because it can be extended to non-linear
representation.
▪ For example, neural networks can be viewed as a set of
parametric (i.e., fixed set of parameters) non-linear
regression functions nested within one another.
BIVARIATE LINEAR REGRESSION

▪ Regression with just two variables.


▪ Linear regression is performed under the assumption that the
variables are linearly related.
▪ Variable 𝒙 serves as one of the feature variables.
▪ Also referred to as an independent variable.
▪ Variable 𝒚 serves as the target variable.
▪ Also referred to as the dependent variable.
[Figure: scatter plot of Target (y) against Feature (x) with points lying roughly along a straight line.]
HOW DOES IT WORK?
▪ It answers the following question:
What is prediction ŷ given observation 𝑥?

▪ The regression line/model ŷ = f(x) represents a prediction of 𝑦, denoted as ŷ.
▪ Each dot is an observation of 𝑥 relative to 𝑦 (one data point).
▪ The vertical distance between a dot and the line is the error between prediction and observation.

[Figure: scatter plot of Target (y) against Feature (x) with a fitted regression line and the error marked for one data point.]
HOW DO WE EXPLAIN THE LINE?
▪ Regression line: ŷ = ax + b
▪ If we assume b = 4, then ŷ = ax + 4, and this means that ŷ is expected to
be 4 when feature x has value 0.
▪ Parameter b is known as the intercept.
▪ The other parameter, a, tells us how much we can expect ŷ to change as
x increases.
▪ For example, if a = 4.5 then ŷ is expected to increase at 4.5 times the
rate of increase in x.
▪ Parameter a is known as the slope.

.
. . .
20

. . The slope indicates the steepness of


15
Target (y)

a line and the intercept the location


.. .
10

where it intersects an axis.


5

.
0

0 1 2 3 4 5 23
Feature (x)
HOW DO WE EXPLAIN THE LINE?
Assuming b = 4 and a = 4.5:

▪ when x = 0, then ŷ = ax + b = b = 4.
▪ when x = 1, then ŷ = ax + b = 4.5 × 1 + 4 = 8.5.

[Figures: the scatter plot of Target (y) against Feature (x) with the predictions at x = 0 and x = 1 marked on the fitted line.]
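
A minimal sketch of these two predictions in Python, using the slope and intercept from the example above:

```python
# Parameters from the worked example: slope a and intercept b.
a, b = 4.5, 4.0

def predict(x):
    """Return the prediction y-hat = a*x + b."""
    return a * x + b

print(predict(0))  # 4.0
print(predict(1))  # 8.5
```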
BIVARIATE LINEAR REGRESSION
▪ The predictions are only as accurate as the strength of the correlation
between 𝑥 and 𝑦.
▪ Pearson’s correlation coefficient 𝑟 (or 𝑅) is most commonly used.
▪ The value of 𝑟 ranges between −1 (negative correlation) and 1 (positive
correlation), where 0 represents no correlation.
▪ Note that 𝑟 does not take into consideration whether a variable is defined
as a feature 𝑥 or a target 𝑦 variable; it treats both equally (it is symmetric).

Figure taken from Statistics Laerd. (2017). Pearson Product-Moment Correlation. Retrieved August 16, 2017,
from https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php
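
As a quick sketch, Pearson's r can be computed directly, for example with NumPy (the data below is made up for illustration):

```python
import numpy as np

# Illustrative data: temperature and ice cream sales.
x = np.array([15.0, 18.0, 21.0, 24.0, 27.0, 30.0])
y = np.array([40.0, 55.0, 70.0, 90.0, 110.0, 115.0])

# Pearson's correlation coefficient r; note it is symmetric in x and y.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))  # close to 1, i.e., a strong positive correlation
```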
BIVARIATE LINEAR REGRESSION

▪ The line of best fit is the line that best represents the relationship between
the two variables.
▪ When 𝑟 = 1, it simply means that there is no variation between the
data points and the line of best fit; it does not tell us anything about the
slope of the line of best fit.

Figure taken from Statistics Laerd. (2017). Pearson Product-Moment Correlation. Retrieved August 16, 2017,
from https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php
BIVARIATE LINEAR REGRESSION
The lower the variability between observed data and the line of best fit, the
closer the 𝑟 coefficient is to −1 or 1. This means that different levels
of variability can generate similar regression lines.

Figure taken from Statistics Laerd. (2017). Pearson Product-Moment Correlation. Retrieved August 16, 2017,
from https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php
BIVARIATE LINEAR REGRESSION (Reading slide)

The 𝑟 correlation for several sets of (x, y) points. Note that the correlation reflects the noisiness and
direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many
aspects of nonlinear relationships (bottom).
N.B.: the figure in the center has a slope of 0 but in that case the correlation coefficient is undefined because
the variance of Y is zero. Source: Wikipedia, https://en.wikipedia.org/wiki/Correlation_and_dependence
10 MINUTES BREAK
ERROR (RESIDUALS)
▪ The prediction is

ŷ = f(x) = ax + b

▪ But what we observe is

y = f(x) = ax + b + E

▪ where the error (residual) is E = y − ŷ.

[Figure: scatter plot of Target (y) against Feature (x) with a fitted regression line; the residuals are the vertical distances from the points to the line.]
OUTLIERS
▪ An outlier is a data point that is abnormally distant from other
observations.
▪ An outlier occurs due to variability or due to error.
▪ Some evaluators are sensitive to outliers, such as MSE, while others are
much less sensitive, such as the Mean Absolute Error (MAE).

E_MAE(w) = Σ_i |y_i − f(x_i)|

E_MSE(w) = Σ_i (y_i − f(x_i))²   (squaring exaggerates the impact of outliers)

[Figure: house price against square metres with a fitted line and one outlier marked.]
ERROR MEASURES (Reading slide)
There are many different error measures that can be used to
compute 𝑬.
Assuming 𝑁 is sample size:
▪ MSE: Mean squared error.
▪ Measures the average of the squares of the prediction error.

▪ RMSE: Root mean squared error.


▪ Measures the sample standard deviation of the prediction error.

▪ MAE: Mean absolute error.
▪ Measures the average of the absolute prediction error.
▪ Preserves the prediction unit (e.g., £).

▪ R-squared: Coefficient of determination.
▪ Measures the proportion of the variance in the dependent variable that is predictable
from the independent variables.
▪ Commonly used for statistical hypothesis testing.
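
A minimal sketch of computing these error measures with scikit-learn, assuming y_true and y_pred hold the observed and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative observed and predicted values.
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_pred = np.array([11.0, 11.5, 16.0, 18.0])

mse = mean_squared_error(y_true, y_pred)   # MSE: average squared error
rmse = np.sqrt(mse)                        # RMSE: back in the original unit
mae = mean_absolute_error(y_true, y_pred)  # MAE: average absolute error
r2 = r2_score(y_true, y_pred)              # proportion of variance explained

print(mse, rmse, mae, r2)
```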
PCA VS LINEAR REGRESSION
▪ Regression: Predict a special output variable (𝑦) given others 𝑥 .
▪ PCA: No special variable. Reduce data dimensions (𝑥) with minimum
information loss.
▪ Error on 𝑦 (Regression) versus error over all 𝑥 (PCA).
[Figure: the same data points shown twice. Left: regression minimises the error on 𝑦 against feature 𝑥₁. Right: PCA projects the points onto a new axis 𝑧, minimising the error over all features (𝑥₁, 𝑥₂).]
HOW DOES LINEAR REGRESSION WORK AS
A MACHINE LEARNING ALGORITHM?
▪ ŷ = f(x) = ax + b
▪ 𝑥 is a feature and 𝑎 and 𝑏 are model parameters.
▪ As a search problem, iterate over different linear lines; i.e., over
parameters 𝑎 and 𝑏.
▪ The lines searched depend on how we iterate through 𝑎 and 𝑏.
▪ For each line searched, compute 𝐸 which represents the error of
the line/model relative to the observed data points.
▪ Move towards the line that minimises 𝐸; i.e., argmin 𝐸(𝑎, 𝑏).

[Figure: two scatter plots of Target (y) against Feature (x), each with a different candidate regression line.]
LINEAR REGRESSION PSEUDOCODE (Reading slide)

▪ trainLinearRegression(𝑦, 𝑥)
▪ For 𝑎 = -10.0 to 10.0
▪ For 𝑏 = -10.0 to 10.0
▪ Store how well the learnt line ŷ = 𝑎𝑥 + 𝑏 fits the observed data.
▪ Pick the optimal combination of 𝑎 and 𝑏.

▪ predictionWithLinearRegression(𝑥, 𝑎, 𝑏)
▪ Return ŷ = 𝑎𝑥 + 𝑏

[Figure: two scatter plots of Target (y) against Feature (x), each with a different candidate regression line.]
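
A minimal runnable sketch of this exhaustive search, assuming synthetic data and MSE as the measure of how well each line fits:

```python
import numpy as np

def train_linear_regression(y, x):
    """Exhaustive search over slope a and intercept b on a coarse grid, minimising MSE."""
    best_a, best_b, best_error = 0.0, 0.0, float("inf")
    for a in np.arange(-10.0, 10.0, 0.1):
        for b in np.arange(-10.0, 10.0, 0.1):
            error = np.mean((y - (a * x + b)) ** 2)  # how well this line fits
            if error < best_error:
                best_a, best_b, best_error = a, b, error
    return best_a, best_b

def prediction_with_linear_regression(x, a, b):
    return a * x + b

# Illustrative data lying on the line y = 4.5x + 4.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 4.5 * x + 4.0

a, b = train_linear_regression(y, x)
print(a, b, prediction_with_linear_regression(2.0, a, b))
```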
SEARCH AND OPTIMISATION
Optimisation represents the process of arriving at the optimal model
parameters.
▪ There are different approaches to optimisation, and each of those
approaches determines how to iterate over the different parameter values.
▪ E.g., exhaustive search explores all possible combinations (within a
range) and returns the combination that minimises the error.
▪ E.g., gradient descent makes bigger or smaller steps towards the
optimal weights, proportional to the change in the error.

[Figure: two scatter plots of Target (y) against Feature (x), each with a different candidate regression line.]
SEARCH AND OPTIMISATION (Reading slide)
▪ Gradient descent, for example, is much more efficient than exhaustive
search.
▪ The search approach represents an important decision when learning from
large data sets, since each additional parameter might mean an additional
nested loop, and nested loops have exponential, or higher, impact on
computational complexity.
▪ Moreover, exploring non-linear relationships might involve an additional
loop for each additional polynomial order searched.
▪ There are many different search methods for optimisation, but we will not be
covering them in this course.

[Figure: two scatter plots of Target (y) against Feature (x), each with a different candidate regression line.]
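
A minimal sketch of gradient descent for the same bivariate regression, assuming an MSE loss; the learning rate and step count are illustrative choices, not values from the lecture:

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.01, steps=2000):
    """Fit y-hat = a*x + b by stepping proportionally to the gradient of the MSE."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = y - (a * x + b)
        grad_a = -2.0 / n * np.sum(residuals * x)  # dE/da
        grad_b = -2.0 / n * np.sum(residuals)      # dE/db
        a -= learning_rate * grad_a                # larger error change, larger step
        b -= learning_rate * grad_b
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 4.5 * x + 4.0
print(gradient_descent(x, y))  # approaches (4.5, 4.0)
```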
MULTIVARIATE LINEAR REGRESSION
▪ We now know how to predict ŷ given a single feature 𝑥.
▪ e.g., carCrashes = 𝑎. drivingSpeed + 𝑏

▪ What about when we have multiple features 𝑥?

▪ Multivariate regression is simply an extension of bivariate regression.


▪ We have an intercept parameter, as before, but now we have a slope
parameter for each independent variable.
▪ In other words, each input 𝑥 has its own coefficient (slope weight).
▪ e.g., carCrashes = 𝑎. 𝑑𝑟𝑖𝑣𝑖𝑛𝑔𝑆𝑝𝑒𝑒𝑑 + 𝑏. roadCondition + 𝑐. milesDriven + 𝑑

▪ … if we have two features, instead of a straight line, we have a flat plane.
▪ Three dimensions are harder to visualise.
MULTIVARIATE LINEAR REGRESSION (Reading slide)
▪ The general expression of multivariate regression is

ŷ = β₀ + β₁·x₁ + β₂·x₂ + … + βₖ·xₖ

▪ Where β₀ is the intercept, and the equation is specified over k
independent variables.
▪ Note input variables x₁ … xₖ are called independent precisely
because their impact is additive.
▪ E.g., the value of x₁ does not influence the impact of x₂, or any
other input x.
▪ If the independent variables 𝑥 correlate with each
other, then this would violate the independence
assumption, implying some negative impact on
the accuracy of the learnt model.
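
A minimal sketch with scikit-learn, using made-up values for the car-crash example above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative features: drivingSpeed, roadCondition, milesDriven (made-up values).
X = np.array([
    [60, 0.8, 12000],
    [80, 0.6, 15000],
    [50, 0.9,  8000],
    [90, 0.4, 20000],
    [70, 0.7, 11000],
])
y = np.array([2, 5, 1, 8, 3])  # carCrashes (made-up values)

model = LinearRegression().fit(X, y)
print(model.intercept_)               # the intercept
print(model.coef_)                    # one slope coefficient per feature
print(model.predict([[75, 0.5, 14000]]))
```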
NON-LINEAR REGRESSION
What if the relationship is not a straight line ?
▪ Some relationships may not be linear.
▪ e.g., demand for electricity given temperature.

[Figure: electricity demand (y) against temperature (x); demand is high at low temperatures (heating) and at high temperatures (air conditioning), with a minimum in between.]

▪ This requires a more complex regression than ŷ = ax + b.
QUADRATIC REGRESSION
▪ Non-linear lines require ‘curves’.

▪ To introduce a single curve, we move from ŷ = ax + b to the more complex

ŷ = ax² + bx + c

(the exponent 2 makes the equation quadratic)

▪ And instead of optimising for argmin E(a, b), where

E_MSE(a, b) = Σ_i (y_i − (a·x_i + b))²

▪ we now optimise for argmin E(a, b, c), with one additional parameter:

E_MSE(a, b, c) = Σ_i (y_i − ŷ_i)² = Σ_i (y_i − (a·x_i² + b·x_i + c))²
POLYNOMIAL REGRESSION (Reading slide)

We can introduce more curves:


▪ The shape of a polynomial regression function depends on a predetermined
value called the ‘order’.
▪ Think of it as an extension of a quadratic representation, where the
fitted line has more than one curve.

▪ Linear: ŷ = ax + b
▪ Quadratic (or 2nd order polynomial): ŷ = ax² + bx + c
▪ 3rd order polynomial: ŷ = ax³ + bx² + cx + d
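
A minimal sketch of fitting polynomials of different orders with NumPy, using made-up data shaped like the electricity-demand example:

```python
import numpy as np

# Illustrative data: electricity demand is high at low and at high temperatures.
temp = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
demand = np.array([90.0, 70.0, 50.0, 40.0, 38.0, 45.0, 65.0, 95.0])

# np.polyfit returns coefficients (highest order first) that minimise squared error.
linear = np.polyfit(temp, demand, deg=1)     # y-hat = ax + b
quadratic = np.polyfit(temp, demand, deg=2)  # y-hat = ax^2 + bx + c
cubic = np.polyfit(temp, demand, deg=3)      # y-hat = ax^3 + bx^2 + cx + d

# Evaluate the quadratic fit at a new temperature.
print(np.polyval(quadratic, 22.0))
```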
MODEL UNDERFITTING
▪ Recall the relationship between temperature and electricity from a previous slide.
▪ Because the relationship is not linear, a linear regression model is likely to underfit the data.
▪ Underfitting occurs when the trained model is too simple relative to the patterns available in the data.

[Figure: electricity demand (y) against temperature (x) with a straight line fitted through clearly non-linear data.]
MODEL OVERFITTING
▪ The higher the order of polynomial regression, the more complex the regression line becomes.
▪ It is possible to introduce enough curves to fit the data perfectly.
▪ Complex models are at risk of adjusting to specific data patterns that do not generalise well. This phenomenon is known as model overfitting.
▪ This means that increasing the polynomial order does not mean the regression will be better, or offer better predictions.

[Figure: electricity demand (y) against temperature (x) with a high-order polynomial passing through every data point.]
MODEL OVERFITTING
[Figure: the same high-order polynomial fit shown twice, once with the original data and once with additional training data; the extra points lead to a smoother, better-generalising fit.]

▪ Acquiring more training data would have resulted in a better polynomial fit,
and this represents one way of addressing model overfitting.
▪ What if we cannot acquire more data?
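
A minimal sketch illustrating under- and overfitting by comparing held-out error across polynomial orders; the data, split, and orders are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative noisy, non-linear (U-shaped) data.
x = np.linspace(0.0, 10.0, 40)
y = 0.8 * (x - 5.0) ** 2 + 10.0 + rng.normal(0.0, 3.0, size=x.shape)

# Random train/test split.
idx = rng.permutation(len(x))
train_idx, test_idx = idx[:30], idx[30:]

for order in (1, 2, 6):
    coeffs = np.polyfit(x[train_idx], y[train_idx], deg=order)
    test_mse = np.mean((y[test_idx] - np.polyval(coeffs, x[test_idx])) ** 2)
    # Order 1 underfits the curve; very high orders tend to chase training noise.
    print(order, round(test_mse, 1))
```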
[Slides 46-52 are hidden.]
