
Linear Regression

Jia-Bin Huang
ECE-5424G / CS-5824 Virginia Tech Spring 2019
BRACE YOURSELVES

WINTER IS COMING
BRACE YOURSELVES

HOMEWORK IS COMING
Administrative
• Office hour
• Chen Gao
• Shih-Yang Su

• Feedback (Thanks!)
• Notation?

• More descriptive slides?

• Video/audio recording?

• TA hours (uniformly spread over the week)?


Recap: Machine learning algorithms
• Discrete output: Classification (supervised), Clustering (unsupervised)
• Continuous output: Regression (supervised), Dimensionality reduction (unsupervised)
Recap: Nearest neighbor classifier
• Training data: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}

• Learning
Do nothing (just store the training data).

• Testing
h(x) = y^(k), where k = argmin_i dist(x^(i), x)
Recap: Instance/Memory-based Learning
1. A distance metric
• Continuous? Discrete? PDF? Gene data? Learn the metric?
2. How many nearby neighbors to look at?
• 1? 3? 5? 15?
3. A weighting function (optional)
• Closer neighbors matter more
4. How to fit with the local points?
• Kernel regression

Slide credit: Carlos Guestrin
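As a concrete illustration of these four design choices, here is a minimal sketch of kernel-weighted k-NN regression in NumPy (the function name, the default k, and the Gaussian bandwidth are illustrative assumptions, not from the slides):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5, bandwidth=1.0):
    """Kernel-weighted k-NN regression for a single query point.

    The four design choices from the slide:
      1. distance metric: Euclidean distance
      2. number of neighbors: k
      3. weighting function: Gaussian kernel on the distance
      4. local fit: weighted average of the neighbors' targets
    """
    dists = np.linalg.norm(X_train - x_query, axis=1)                  # 1. distance metric
    nearest = np.argsort(dists)[:k]                                    # 2. k nearest neighbors
    weights = np.exp(-dists[nearest] ** 2 / (2 * bandwidth ** 2))      # 3. closer neighbors matter more
    return np.sum(weights * y_train[nearest]) / np.sum(weights)        # 4. kernel regression
```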


Validation set
• Splitting the training set: a fake test set used to tune hyper-parameters

Slide credit: CS231 @ Stanford


Cross-validation
• 5-fold cross-validation: split the training data into 5 equal folds
• Use 4 of them for training and 1 for validation, rotating which fold is held out

Slide credit: CS231 @ Stanford
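A minimal sketch of setting up such a 5-fold split in NumPy (the function name, the shuffling step, and the example size of 100 are illustrative assumptions):

```python
import numpy as np

def five_fold_indices(n_examples, seed=0):
    """Shuffle example indices and split them into 5 roughly equal folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_examples)
    return np.array_split(indices, 5)

# Each fold takes one turn as the validation set; the other 4 serve as training data.
folds = five_fold_indices(100)
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # ... train on train_idx, evaluate the hyper-parameter choice on val_idx ...
```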


Things to remember
• Supervised Learning
• Training/testing data; classification/regression; Hypothesis
• k-NN
• Simplest learning algorithm
• With sufficient data, a "strawman" approach that is very hard to beat
• Kernel regression/classification
• Set k to n (the number of data points) and choose a kernel width
• Smoother than k-NN
• Problems with k-NN
• Curse of dimensionality
• Not robust to irrelevant features
• Slow NN search: must remember (very large) dataset for prediction
Today’s plan: Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
Regression
Training set (with real-valued outputs) → Learning Algorithm → hypothesis h

x (size of house) → h → y (estimated price)
House pricing prediction
[Scatter plot: Price ($) in 1000's vs. Size in feet²]
Training set (m = 47 examples):
  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

• Notation:
• m = number of training examples
• x = input variable / features
• y = output variable / target variable
• (x, y) = one training example
• (x^(i), y^(i)) = the i-th training example
Examples: x^(1) = 2104, y^(1) = 460
Slide credit: Andrew Ng
Model representation

Training set → Learning Algorithm → hypothesis h
x (size of house) → h → y (estimated price)

Hypothesis: h_θ(x) = θ₀ + θ₁x   (shorthand: h(x))
[Plot: Price ($) in 1000's vs. Size in feet², with a fitted line through the data]

Univariate linear regression (linear regression with one input variable)

Slide credit: Andrew Ng
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
Training set (m = 47 examples):
  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

• Hypothesis: h_θ(x) = θ₀ + θ₁x

• θ₀, θ₁: parameters/weights

• How to choose the θ's?

Slide credit: Andrew Ng
h_θ(x) = θ₀ + θ₁x

[Three example plots of h_θ(x) over the same axes, showing how different choices of θ₀ and θ₁ give different intercepts and slopes]
Slide credit: Andrew Ng


Cost function
• Idea: choose θ₀, θ₁ so that h_θ(x^(i)) is close to y^(i) for our training examples (x^(i), y^(i))

h_θ(x) = θ₀ + θ₁x

J(θ₀, θ₁) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

[Plot: Price ($) in 1000's vs. Size in feet², with a candidate hypothesis line]

Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
Slide credit: Andrew Ng
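A minimal NumPy sketch of this cost function; the helper name `cost` is illustrative, and the example data points are the four rows from the table above:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1/2m) * sum_i (h_theta(x_i) - y_i)^2."""
    m = len(y)
    predictions = theta0 + theta1 * x              # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Housing data from the slide (sizes in feet^2, prices in $1000's):
x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
print(cost(0.0, 0.2, x, y))   # cost of the hypothesis h(x) = 0.2 * x
```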
Simplified

Original:
• Hypothesis: h_θ(x) = θ₀ + θ₁x
• Parameters: θ₀, θ₁
• Cost function: J(θ₀, θ₁) = (1/2m) Σᵢ (h_θ(x^(i)) − y^(i))²
• Goal: minimize J(θ₀, θ₁) over θ₀, θ₁

Simplified (set θ₀ = 0):
• Hypothesis: h_θ(x) = θ₁x
• Parameter: θ₁
• Cost function: J(θ₁) = (1/2m) Σᵢ (h_θ(x^(i)) − y^(i))²
• Goal: minimize J(θ₁) over θ₁
Slide credit: Andrew Ng
h_θ(x) as a function of x (for a fixed θ₁), and J(θ₁) as a function of θ₁

[Sequence of paired plots: on the left, the hypothesis line h_θ(x) = θ₁x over the data for a particular θ₁; on the right, the corresponding point on the cost curve J(θ₁). As θ₁ varies, the cost traces out a bowl-shaped curve whose minimum is at the best-fit slope.]

Slide credit: Andrew Ng


• Hypothesis: h_θ(x) = θ₀ + θ₁x

• Parameters: θ₀, θ₁

• Cost function: J(θ₀, θ₁) = (1/2m) Σᵢ (h_θ(x^(i)) − y^(i))²

• Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
Slide credit: Andrew Ng
Cost function

Slide credit: Andrew Ng


How do we find good θ₀, θ₁ that minimize J(θ₀, θ₁)?
Slide credit: Andrew Ng
Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
Gradient descent
Have some function J(θ₀, θ₁)
Want: min over θ₀, θ₁ of J(θ₀, θ₁)

Outline:
• Start with some θ₀, θ₁ (e.g., θ₀ = 0, θ₁ = 0)
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁),
until we hopefully end up at a minimum
Slide credit: Andrew Ng
Gradient descent
Repeat until convergence {
  θ_j := θ_j − α (∂/∂θ_j) J(θ₀, θ₁)   (simultaneously for j = 0 and j = 1)
}

α: learning rate (step size)

(∂/∂θ_j) J(θ₀, θ₁): partial derivative (rate of change)

Slide credit: Andrew Ng


Gradient descent
Correct: simultaneous update
  temp0 := θ₀ − α (∂/∂θ₀) J(θ₀, θ₁)
  temp1 := θ₁ − α (∂/∂θ₁) J(θ₀, θ₁)
  θ₀ := temp0
  θ₁ := temp1

Incorrect:
  temp0 := θ₀ − α (∂/∂θ₀) J(θ₀, θ₁)
  θ₀ := temp0
  temp1 := θ₁ − α (∂/∂θ₁) J(θ₀, θ₁)   (this now uses the already-updated θ₀)
  θ₁ := temp1

Slide credit: Andrew Ng


θ₁ := θ₁ − α (∂/∂θ₁) J(θ₁)

[Plot of J(θ₁) vs. θ₁: to the right of the minimum the slope (∂/∂θ₁) J(θ₁) > 0, so the update decreases θ₁; to the left of the minimum the slope (∂/∂θ₁) J(θ₁) < 0, so the update increases θ₁. Either way θ₁ moves toward the minimum.]
Slide credit: Andrew Ng
Learning rate
Gradient descent for linear regression
Repeat until convergence {
  θ_j := θ_j − α (∂/∂θ_j) J(θ₀, θ₁)   (simultaneously for j = 0 and j = 1)
}

• Linear regression model
  h_θ(x) = θ₀ + θ₁x
  J(θ₀, θ₁) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Slide credit: Andrew Ng


Computing partial derivative
• (∂/∂θ_j) J(θ₀, θ₁) = (∂/∂θ_j) (1/2m) Σ_{i=1}^{m} (θ₀ + θ₁x^(i) − y^(i))²

• j = 0: (∂/∂θ₀) J(θ₀, θ₁) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
• j = 1: (∂/∂θ₁) J(θ₀, θ₁) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)

Slide credit: Andrew Ng


Gradient descent for linear regression
Repeat until convergence {
  θ₀ := θ₀ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
  θ₁ := θ₁ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
Update θ₀ and θ₁ simultaneously

Slide credit: Andrew Ng


Batch gradient descent
• "Batch": each step of gradient descent uses all m training examples
Repeat until convergence {
  θ₀ := θ₀ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
  θ₁ := θ₁ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
m: number of training examples

Slide credit: Andrew Ng
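A minimal NumPy sketch of this batch update rule; the function name, learning rate, and iteration count are illustrative choices, not values from the slides:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        error = (theta0 + theta1 * x) - y          # uses all m training examples ("batch")
        grad0 = np.sum(error) / m                  # dJ/dtheta0
        grad1 = np.sum(error * x) / m              # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous update
    return theta0, theta1
```

Note that with raw house sizes in the thousands, a learning rate of 0.01 would actually diverge on the housing data; in practice you would scale the features first (see the feature-scaling slide below) or use a much smaller α.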


Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
Training dataset
  Size in feet² (x) | Price ($) in 1000's (y)
  2104              | 460
  1416              | 232
  1534              | 315
  852               | 178
  …                 | …

h_θ(x) = θ₀ + θ₁x

Slide credit: Andrew Ng


Multiple features (input variables)
  Size in feet² (x₁) | Bedrooms (x₂) | Floors (x₃) | Age in years (x₄) | Price ($) in 1000's (y)
  2104               | 5             | 1           | 45                | 460
  1416               | 3             | 2           | 40                | 232
  1534               | 3             | 2           | 30                | 315
  852                | 2             | 1           | 36                | 178
  …                  |               |             |                   | …

Notation:
• n = number of features
• x^(i) = input features of the i-th training example
• x_j^(i) = value of feature j in the i-th training example
Slide credit: Andrew Ng
Hypothesis
Previously: h_θ(x) = θ₀ + θ₁x

Now: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

Slide credit: Andrew Ng


h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
• For convenience of notation, define x₀ = 1 (for all examples)
• Then x = [x₀, x₁, …, xₙ]ᵀ, θ = [θ₀, θ₁, …, θₙ]ᵀ, and h_θ(x) = θᵀx

Slide credit: Andrew Ng


Gradient descent
• Previously (n = 1):
Repeat until convergence {
  θ₀ := θ₀ − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i))
  θ₁ := θ₁ − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) · x^(i)
}

• New algorithm (n ≥ 1):
Repeat until convergence {
  θ_j := θ_j − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) · x_j^(i)   (for j = 0, 1, …, n)
}
Simultaneously update θ_j for every j


Slide credit: Andrew Ng
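For the multivariate case, the same simultaneous update is often written in vectorized form. A minimal sketch assuming a design matrix X whose first column holds the x₀ = 1 entries (the helper name and defaults are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Simultaneous update theta_j := theta_j - alpha*(1/m)*sum_i (h(x_i) - y_i)*x_ij for all j,
    with X of shape (m, n+1) including the x0 = 1 column."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(n_iters):
        error = X @ theta - y                          # h_theta(x_i) - y_i for every example
        theta = theta - (alpha / m) * (X.T @ error)    # updates every theta_j at once
    return theta

# Adding the x0 = 1 column to raw features of shape (m, n):
# X = np.hstack([np.ones((features.shape[0], 1)), features])
```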


Gradient descent in practice: Feature scaling
• Idea: make sure features are on a similar scale (e.g., roughly −1 ≤ x_j ≤ 1)
• E.g., x₁ = size (0–2000 feet²), x₂ = number of bedrooms (1–5)

[Contour plots of the cost over (θ₁, θ₂): without scaling the contours are elongated and gradient descent zigzags; with scaling they are closer to circular and gradient descent converges faster.]

Slide credit: Andrew Ng
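One common way to put features on a similar scale is mean normalization; a minimal sketch (the exact recipe of subtracting the mean and dividing by the range is an assumption, since the slide does not pin one down):

```python
import numpy as np

def mean_normalize(features):
    """Scale each column to roughly [-1, 1]: subtract its mean, divide by its range."""
    mu = features.mean(axis=0)                              # per-feature mean
    rng = features.max(axis=0) - features.min(axis=0)       # per-feature range (std also works)
    return (features - mu) / rng, mu, rng

# Size in [0, 2000] and bedrooms in [1, 5] end up on comparable scales:
raw = np.array([[2104, 5], [1416, 3], [1534, 3], [852, 2]], dtype=float)
scaled, mu, rng = mean_normalize(raw)
```

The same mu and rng learned from the training data must be applied to any new example before predicting.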
Gradient descent in practice: Learning rate
• Automatic convergence test: declare convergence when J(θ) decreases by less than a small threshold in one iteration
• α too small: slow convergence
• α too large: J(θ) may not decrease on every iteration; may not converge

• To choose α, try

  0.001, …, 0.01, …, 0.1, …, 1

Image credit: CS231n@Stanford
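A sketch of trying learning rates a few orders of magnitude apart and watching J(θ) per iteration, as suggested above; the toy data, the convergence threshold, and the helper name are illustrative assumptions:

```python
import numpy as np

def cost_history(X, y, alpha, n_iters=200):
    """Run vectorized gradient descent and record J(theta) after every iteration."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(n_iters):
        error = X @ theta - y
        theta -= (alpha / m) * (X.T @ error)
        history.append(np.sum((X @ theta - y) ** 2) / (2 * m))
    return history

# Toy data with the x0 = 1 column already included:
X = np.array([[1, 0.5], [1, 1.0], [1, 1.5], [1, 2.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
for alpha in (0.001, 0.01, 0.1, 1.0):
    J = cost_history(X, y, alpha)
    converged = abs(J[-2] - J[-1]) < 1e-6        # simple automatic convergence check
    print(alpha, J[-1], converged)               # small alpha: slow; too-large alpha: J blows up
```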


House prices prediction

• You are not limited to the features as given; you can define new ones
• E.g., combine the raw lot measurements into a single feature, area = frontage × depth, and fit h_θ(x) = θ₀ + θ₁ · area

Slide credit: Andrew Ng


Polynomial regression

[Plot: Price ($) in 1000's vs. Size in feet², with a polynomial curve fit to the data]

h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃, with
  x₁ = (size)
  x₂ = (size)²
  x₃ = (size)³

Slide credit: Andrew Ng
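A minimal sketch of generating these polynomial features and scaling them before fitting; the cubic choice matches the slide, while the scaling step is an added practical assumption:

```python
import numpy as np

size = np.array([2104, 1416, 1534, 852], dtype=float)

# x1 = size, x2 = size^2, x3 = size^3 -- the model stays linear in the parameters theta.
poly = np.column_stack([size, size ** 2, size ** 3])

# Polynomial features have wildly different ranges, so feature scaling matters even more here.
poly = (poly - poly.mean(axis=0)) / (poly.max(axis=0) - poly.min(axis=0))

# Prepend the x0 = 1 column to form the design matrix for gradient descent or the normal equation.
X = np.hstack([np.ones((poly.shape[0], 1)), poly])
```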


Linear Regression
• Model representation

• Cost function

• Gradient descent

• Features and polynomial regression

• Normal equation
  x₀ | Size in feet² (x₁) | Bedrooms (x₂) | Floors (x₃) | Age in years (x₄) | Price ($) in 1000's (y)
  1  | 2104               | 5             | 1           | 45                | 460
  1  | 1416               | 3             | 2           | 40                | 232
  1  | 1534               | 3             | 2           | 30                | 315
  1  | 852                | 2             | 1           | 36                | 178
  …  |                    |               |             |                   | …

X = matrix whose rows are the training examples (with x₀ = 1),  y = [460, 232, 315, 178]ᵀ

θ = (XᵀX)⁻¹ Xᵀ y

Slide credit: Andrew Ng
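A minimal NumPy sketch of the normal equation on this table; using a least-squares solver instead of an explicit matrix inverse is a standard numerical practice, not something the slide specifies:

```python
import numpy as np

# Design matrix X: x0 = 1, size, bedrooms, floors, age -- one row per training example.
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# theta = (X^T X)^{-1} X^T y, computed via a least-squares solver.
# With only these 4 rows there are more parameters than examples, so X^T X is singular and
# lstsq returns the minimum-norm solution; with the full m = 47 dataset the explicit formula applies.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)          # fitted parameters
print(X @ theta)      # predictions on the training examples
```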
Least square solution

Justification/interpretation 1
• Loss minimization

• Least squares loss

• Empirical Risk Minimization (ERM)


Justification/interpretation 2
• Probabilistic model
• Assume linear model with Gaussian errors

• Solving maximum likelihood

Image credit: CS 446@UIUC


Justification/interpretation 3
• Geometric interpretation

[Diagram: the target vector y, its projection Xθ onto the column space of X, and the residual Xθ − y]

• Xθ lies in the column space of X, i.e., the span of the columns of X

• The residual Xθ − y is orthogonal to the column space of X
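A short derivation tying these interpretations to the formula θ = (XᵀX)⁻¹Xᵀy (standard least-squares algebra, not text from the slides):

```latex
% Least-squares objective in matrix form (matching J above):
J(\theta) = \tfrac{1}{2m}\,\lVert X\theta - y\rVert^{2}
          = \tfrac{1}{2m}\,(X\theta - y)^{\top}(X\theta - y)

% Set the gradient to zero:
\nabla_{\theta} J(\theta) = \tfrac{1}{m}\,X^{\top}(X\theta - y) = 0
\;\Longrightarrow\; X^{\top}X\,\theta = X^{\top}y
\;\Longrightarrow\; \theta = (X^{\top}X)^{-1}X^{\top}y \quad (\text{when } X^{\top}X \text{ is invertible})
```

The middle condition, Xᵀ(Xθ − y) = 0, is exactly the geometric statement above: the residual is perpendicular to every column of X.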


m training examples, n features

Gradient Descent:
• Need to choose α
• Needs many iterations
• Works well even when n is large

Normal Equation:
• No need to choose α
• No need to iterate
• Need to compute (XᵀX)⁻¹, roughly O(n³)
• Slow if n is very large

Slide credit: Andrew Ng


Things to remember
• Model representation
  h_θ(x) = θ₀ + θ₁x₁ + … + θₙxₙ = θᵀx

• Cost function
  J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
• Gradient descent for linear regression
  Repeat until convergence { θ_j := θ_j − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) · x_j^(i) }
• Features and polynomial regression
  Can combine features; can use different functions to generate features (e.g., polynomial)
• Normal equation
  θ = (XᵀX)⁻¹ Xᵀ y
Next
• Naïve Bayes, Logistic regression, Regularization
