
ITCS 6156/8156 Fall 2023

Machine Learning

Linear Regression

Instructor: Hongfei Xue


Email: [email protected]
Class Meeting: Mon & Wed, 4:00 PM – 5:15 PM, CHHS 376

Some content in the slides is based on Dr. Razvan’s lecture


Machine Learning as Optimization
Convexity
Convex Optimization
Gradient Descent

Gradient Descent

[Figure, repeated across several slides: successive gradient descent steps on the error surface 𝐽(𝑤₁, 𝑤₂), plotted over the parameter axes 𝑤₁ and 𝑤₂]
Gradient Descent


Taylor Expansion
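A hedged reconstruction of the standard first-order argument behind gradient descent (the step Δ𝐰 and learning rate 𝜂 are notation introduced here):

$$J(\mathbf{w} + \Delta\mathbf{w}) \approx J(\mathbf{w}) + \nabla J(\mathbf{w})^T \Delta\mathbf{w}$$

Choosing $\Delta\mathbf{w} = -\eta\,\nabla J(\mathbf{w})$ for a small learning rate $\eta > 0$ makes the right-hand side smaller than $J(\mathbf{w})$, which motivates the update $\mathbf{w} \leftarrow \mathbf{w} - \eta\,\nabla J(\mathbf{w})$.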
Gradient Descent


Gradient Descent

• The key operation in the above update step is the calculation of each partial derivative.


Gradient Descent

• The final weight update rule:
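As a hedged sketch: assuming the sum-of-squares cost $J(\mathbf{w}) = \frac{1}{2N}\sum_{n=1}^{N}(h_{\mathbf{w}}(\mathbf{x}_n) - y_n)^2$ with $h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$ (as defined later in these slides), the partial derivatives are $\partial J/\partial w_j = \frac{1}{N}\sum_{n=1}^{N}(h_{\mathbf{w}}(\mathbf{x}_n) - y_n)\,x_{nj}$, giving the batch update $\mathbf{w} \leftarrow \mathbf{w} - \frac{\eta}{N}\mathbf{X}^T(\mathbf{X}\mathbf{w} - \mathbf{y})$. The Python below is an illustrative implementation, not the instructor's code; eta and n_iters are arbitrary defaults.

import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_iters=1000):
    """Minimize J(w) = 1/(2N) * ||Xw - y||^2 by full-batch gradient descent."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_iters):
        residuals = X @ w - y        # h_w(x_n) - y_n for every example
        grad = X.T @ residuals / N   # vector of partial derivatives dJ/dw_j
        w -= eta * grad              # step against the gradient
    return w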


Issues with Gradient Descent

• Issues with Gradient Descent:
• Slow convergence
• Stuck in local minima

• Note that the second issue does not arise in the case of a convex problem, since the error surface has only one global minimum.

• More efficient algorithms exist for batch optimization, including Conjugate Gradient Descent and other quasi-Newton methods. Another approach is to consider training examples in an online or incremental fashion, resulting in an online algorithm called Stochastic Gradient Descent.
Stochastic Gradient Descent (SGD)
• Update weights after every (or a small subset of) training
example(s).

• Why SGD?
Stochastic Gradient Descent (SGD)

[Algorithm: the batch gradient descent update, computed from only 1 or K (a small number of) training examples at a time; sketched in code below]
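A minimal sketch of the stochastic / mini-batch variant under the same assumptions (illustrative only; the batch size K, learning rate, and number of epochs are arbitrary choices):

import numpy as np

def sgd(X, y, eta=0.01, K=1, n_epochs=50, seed=0):
    """Stochastic / mini-batch gradient descent for J(w) = 1/(2N) * ||Xw - y||^2."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(n_epochs):
        order = rng.permutation(N)            # reshuffle the examples each epoch
        for start in range(0, N, K):
            idx = order[start:start + K]      # use only 1 or K examples per update
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= eta * grad
    return w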
Polynomial Basis Functions

• Q: What if the raw feature is insufficient for good performance?
• Example: non-linear dependency between the label and the raw feature.

• A: Engineer / learn higher-level features, as functions of the raw feature.

• Polynomial curve fitting:
- Add new features, as polynomials of the original feature (see the sketch below).
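As an illustration of adding polynomial features (a sketch; polynomial_features is a helper name introduced here, not from the slides):

import numpy as np

def polynomial_features(x, M):
    """Map a 1-D array of raw inputs x to the design matrix [1, x, x^2, ..., x^M]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=M + 1, increasing=True)   # column j holds x**j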
Regression: Curve Fitting

[Figure: the target function 𝑓]
Regression: Curve Fitting

[Figure: noisy training points, the target function 𝑓, and the learned curve ℎ plotted against 𝑦]

• Training: build a function ℎ(𝑥) based on (noisy) training examples $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$.
Regression: Curve Fitting

[Figure: noisy training points, the target function 𝑓, and the learned curve ℎ plotted against 𝑦]

• Testing: for an arbitrary (unseen) instance 𝑥 ∈ 𝐗, compute the output ℎ(𝑥); we want it to be close to 𝑓(𝑥).
Regression: Polynomial Curve Fitting

$$h(x) = h(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

(the coefficients $w_j$ are the parameters; the powers $x^j$ are the features)
Polynomial Curve Fitting
• Parametric model:
$$h(x) = h(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

• Polynomial curve fitting is (Multiple) Linear Regression:


$$\mathbf{x} = [1, x, x^2, \cdots, x^M]^T, \qquad h(x) = h(\mathbf{x}, \mathbf{w}) = h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$$

• Learning = minimize the Sum-of-Squares error function:

$$\hat{\mathbf{w}} = \underset{\mathbf{w}}{\operatorname{argmin}}\; J(\mathbf{w}), \qquad J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( h_{\mathbf{w}}(\mathbf{x}_n) - y_n \right)^2$$

• Least Square Estimate:

$$\hat{\mathbf{w}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$
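A small numpy sketch of the least-squares estimate above (illustrative, not the course code); np.linalg.lstsq is used instead of forming (𝐗ᵀ𝐗)⁻¹ explicitly, which is equivalent here but numerically safer:

import numpy as np

def fit_least_squares(X, y):
    """Return w_hat minimizing ||Xw - y||^2, i.e. (X^T X)^{-1} X^T y when X^T X is invertible."""
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_hat

# Example: fit an M = 3 polynomial to noisy samples of a sine curve
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)
X = np.vander(x, N=4, increasing=True)   # columns [1, x, x^2, x^3]
w_hat = fit_least_squares(X, y)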
Polynomial Curve Fitting

• Generalization = how well the parameterized ℎ(𝑥, 𝐰) performs on arbitrary (unseen) test instances 𝑥 ∈ 𝑋.

• Generalization performance depends on the value of M


0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial

• Which M to pick? Why?


• Follow the wisdom of a philosopher.
Occam’s Razor

William of Occam (1288 – 1348)


English Franciscan friar, theologian and
philosopher.

“Entia non sunt multiplicanda praeter necessitatem”


• Entities must not be multiplied beyond necessity.

i.e. Do not make things needlessly complicated.


i.e. Prefer the simplest hypothesis that fits the data.
Polynomial Curve Fitting

• Model Selection: choosing the order M of the polynomial.


- Best generalization obtained with M=3.
- M = 9 obtains poor generalization, even though it fits
training examples perfectly:
• But M = 9 polynomials subsume M = 3 polynomials!

• Overfitting ≡ good performance on training examples, poor


performance on test examples.
Over-fitting and Parameter Values
Overfitting
• Measure fit using the Root-Mean-Square (RMS) error (RMSE):
$$E_{\mathrm{RMS}}(\mathbf{w}) = \sqrt{\frac{1}{N} \sum_{n} \left( \mathbf{w}^T \mathbf{x}_n - t_n \right)^2}$$
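For reference, the RMS error above can be computed in one line (rms_error is a helper name introduced here):

import numpy as np

def rms_error(X, t, w):
    """Root-mean-square error of the predictions Xw against the targets t."""
    return np.sqrt(np.mean((X @ w - t) ** 2))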
• Use 100 random test examples, generated in the same way:
Overfitting vs. Data Set Size

• More training data ⟹ less overfitting

• What if we do not have more training data?


- Use regularization
Regularization

• Penalize large parameter values:


$$E(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \right)^2 + \underbrace{\frac{\lambda}{2} \lVert \mathbf{w} \rVert^2}_{\text{regularizer}}$$

$$\mathbf{w}^{*} = \underset{\mathbf{w}}{\operatorname{argmin}}\; E(\mathbf{w})$$
Ridge Regression

• Multiple linear regression with L2 regularization:


$$J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \right)^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2, \qquad \hat{\mathbf{w}} = \underset{\mathbf{w}}{\operatorname{argmin}}\; J(\mathbf{w})$$

• Solution: $\hat{\mathbf{w}} = (\lambda N \mathbf{I} + \mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{t}$


- Prove it.
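A hedged sketch of the requested proof, assuming $h_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$ so that $J$ can be written in matrix form: setting the gradient to zero,

$$\nabla J(\mathbf{w}) = \frac{1}{N}\mathbf{X}^T(\mathbf{X}\mathbf{w} - \mathbf{t}) + \lambda\mathbf{w} = \mathbf{0} \;\Longrightarrow\; (\lambda N\mathbf{I} + \mathbf{X}^T\mathbf{X})\,\mathbf{w} = \mathbf{X}^T\mathbf{t} \;\Longrightarrow\; \hat{\mathbf{w}} = (\lambda N\mathbf{I} + \mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{t},$$

and for $\lambda > 0$ the matrix $\lambda N\mathbf{I} + \mathbf{X}^T\mathbf{X}$ is positive definite, so the inverse always exists. A small numpy sketch of this closed form (fit_ridge is a name introduced here):

import numpy as np

def fit_ridge(X, t, lam):
    """Closed-form ridge solution w_hat = (lam*N*I + X^T X)^{-1} X^T t."""
    N, D = X.shape
    A = lam * N * np.eye(D) + X.T @ X
    return np.linalg.solve(A, X.T @ t)   # solve the linear system rather than inverting A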
9th Order Polynomial with Regularization
9th Order Polynomial with Regularization
Training & Test error vs. ln 𝜆

How do we find the optimal value of 𝜆?


Model Selection

• Put aside an independent validation set.


• Select parameters giving best performance on validation set.

ln 𝜆 ∈ {−40, −35, −30, −25, −20, −15}


K-fold Cross-Validation

Source: https://scikit-learn.org/stable/modules/cross_validation.html
K-fold Cross-Validation

• Split the training data into K folds and try a wide range of tuning parameter values (see the sketch below):
- split the data into K folds of roughly equal size
- iterate over a set of values for 𝜆
• iterate over k = 1, 2, ⋯ , K
- use all folds except k for training
- validate (calculate the test error) on the k-th fold
• error[𝜆] = average error over the K folds
- choose the value of 𝜆 that gives the smallest error
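A minimal sketch of this procedure, reusing the fit_ridge and rms_error helpers introduced in the earlier sketches (all names here are illustrative, not from the slides):

import numpy as np

def kfold_select_lambda(X, t, lambdas, K=5, seed=0):
    """Return the lambda with the smallest average validation RMS error over K folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(t)), K)   # K folds of roughly equal size
    avg_error = {}
    for lam in lambdas:
        fold_errors = []
        for k in range(K):
            val_idx = folds[k]
            train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
            w = fit_ridge(X[train_idx], t[train_idx], lam)            # train on all folds but k
            fold_errors.append(rms_error(X[val_idx], t[val_idx], w))  # validate on fold k
        avg_error[lam] = np.mean(fold_errors)
    return min(avg_error, key=avg_error.get)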
Regularization: Ridge vs. Lasso

• Ridge regression:

$$J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} w_j^{2}$$

• Lasso:

$$J(\mathbf{w}) = \frac{1}{2N} \sum_{n=1}^{N} \left( h_{\mathbf{w}}(\mathbf{x}_n) - t_n \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{M} \lvert w_j \rvert$$

- if 𝜆 is sufficiently large, some of the coefficients 𝑤ⱼ are driven to 0 ⟹ sparse model (see the example below)
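A small illustration of this sparsity effect using scikit-learn's Ridge and Lasso estimators (the data, polynomial degree, and regularization strength alpha are arbitrary; scikit-learn parameterizes the penalty slightly differently from the slides):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=30)
t = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)
X = np.vander(x, N=10, increasing=True)[:, 1:]   # features x, x^2, ..., x^9

ridge = Ridge(alpha=1e-3).fit(X, t)
lasso = Lasso(alpha=1e-3, max_iter=100_000).fit(X, t)

print("ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))  # typically all 9
print("lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))  # typically fewer (sparse)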
Regularization: Ridge vs. Lasso

Plot of the contours of the unregularized error function (blue) along with the
constraint region (3.30) for the quadratic regularizer 𝑞 = 2 on the left and the lasso
regularizer 𝑞 = 1 on the right, in which the optimum value for the parameter vector
𝐰 is denoted by 𝐰∗. The lasso gives a sparse solution in which 𝑤₁∗ = 0.
Regularization

• Parameter norm penalties (term in the objective).


• Limit parameter norm (constraint).
• Dataset augmentation.
• Dropout.
• Ensembles.
• Semi-supervised learning.
• Early stopping.
• Noise robustness.
• Sparse representations.
• Adversarial training.
Questions?
