Lecture 3
Regression Analysis
Ziyi Cao
University of Texas at Dallas
[email protected]
Spring 2025
Agenda
• Linear Regression
• Gradient Descent
• Polynomial Regression
• Python Practice
Linear Regression
Concepts
Least Squares Method
Linear Regression Concepts
• Simple linear regression
• Models the linear relationship between a numeric target variable and an
explanatory variable
• $Y$: target variable / dependent variable / outcome variable
• $X$: explanatory variable / independent variable / predictor / regressor
$Y = \beta_0 + \beta_1 X + \varepsilon$
Linear Regression Concepts
• Multiple linear regression
• Models the linear relationship between a numeric target variable and a set of
explanatory variables
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
• $Y$: outcome
• $\beta_0, \beta_1, \dots, \beta_k$: parameters (to be estimated)
• $X_1, \dots, X_k$: predictors
• $\varepsilon$: random error (unexplained part)
Linear Regression Concepts
• Linear Relationship
• Linearity is defined with respect to the parameters, NOT the predictors
• The X values are observed (treated as constants); the parameters are the unknowns (to be solved for)
• You can always apply nonlinear transformations to the variables before fitting the model
Both of the following are linear regressions:
$Y = \beta_0 + \beta_1 X + \varepsilon$
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$
Define a new variable $Z = X^2$; then the second model becomes $Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon$, which is linear in the parameters.
Are They Linear Regressions?
• Suppose $i = 1, 2, \dots, 5$ and the parameter is $\beta$
Linear Regression Estimation
• Parameters ($\beta$s) need to be estimated
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
• $Y$: outcome
• $\beta_0, \dots, \beta_k$: parameters
• $X_1, \dots, X_k$: predictors
• $\varepsilon$: random error (unexplained part)
Linear Regression Estimation – Intuition
$Y = \beta_0 + \beta_1 X + \varepsilon$
• $\beta_0$: Intercept
• $\beta_1$: Slope
For datapoint $i$ with observed values $(X_i, Y_i)$:
• Fitted value: $\hat{Y}_i$
• Predicted from the regression
• Error (residual): $e_i = Y_i - \hat{Y}_i$
• Difference between an observed value and a predicted value
• Key component for performance measures
Linear Model Estimation
• Goal: Minimize the total error
• We want $e_i$ close to zero for each $i$ ⇒ transform the errors so that they are non-negative
• Potential methods
• Mean (sum) of absolute errors: $MAE = \frac{1}{n}(|e_1| + |e_2| + \dots + |e_n|)$
• Mean (sum) of squared errors: $MSE = \frac{1}{n}(e_1^2 + e_2^2 + \dots + e_n^2)$
Note: For both methods, n is the number of observations
Minimizing MSE is equivalent to minimizing the sum of squared errors; the factor $1/n$ does not change the minimizer.
Linear Model Estimation – Least Squares Method
• Criteria: minimize the sum (mean) of squared errors
$\min_{\boldsymbol{\beta}} MSE(\boldsymbol{\beta}) \;\Leftrightarrow\; \min_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
• $\boldsymbol{\beta}$ in bold represents a vector
• Solving for $\boldsymbol{\beta}$ (calculus & linear algebra):
• For each $\beta_k$, take the partial derivative and set $\frac{\partial MSE(\boldsymbol{\beta})}{\partial \beta_k} = 0$
• This gives K+1 equations; solve for the K+1 unknown $\beta$s
• The solution can be put in matrix form: $\boldsymbol{\beta} = (X'X)^{-1} X'Y$
X is an N*(K+1) matrix; Y is an N*1 vector; $\boldsymbol{\beta}$ is a (K+1)*1 vector
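As a minimal sketch (not from the slides), the closed-form solution can be computed directly with NumPy; the data below are made up for illustration.

```python
import numpy as np

# Hypothetical data: N = 5 observations, K = 2 predictors
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])
y = np.array([6.1, 5.9, 12.2, 11.8, 16.0])

# Prepend a column of ones so beta_0 (the intercept) is estimated too: X is N x (K+1)
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Least squares solution: beta = (X'X)^{-1} X'Y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)  # [beta_0, beta_1, beta_2]
```

In practice, np.linalg.lstsq is numerically safer than forming the inverse explicitly, but the line above mirrors the matrix formula on the slide.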
Gradient Descent
Linear Regression: An Example
• Recall the optimization problem: $\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ (the cost function)
• Method 1: closed-form solution
• $\boldsymbol{\beta} = (X'X)^{-1} X'Y$
• Solvable, but computationally costly; complicated models may have no closed-form solution
• Method 2: search for it
• Gradient Descent (Optimization Algorithm)
• Solve an optimization problem
• In conjunction with neural networks, regressions, SVMs, …
Gradient Descent – Intuition
• River flowing down a mountain
• The cost function defines a surface (a hypersurface over the parameters)
• Searching for minima of cost function
• The lowest direction
• Start somewhere (A)
• Initial point/value
• Flow down to some (adjusted) directions
• Direction – gradient (slope)
• Distance – Learning rate (step size)
• Keep flowing until the ground is flat or you reach a lake (B)
• Convergence – local minima
Gradient Descent – Concepts
• Search for the minimum of the cost function and the parameter values that achieve it
• Initial point
• Gradient
• Partial derivative w.r.t all variables
• Learning rate
• Length to move along the direction
• Next step: move from the current point in the direction of the negative gradient, by a distance governed by the learning rate
• Repeat until convergence
• You need to set: initial point, learning rate, convergence criteria
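As a toy sketch of these three choices (not taken from the slides), here is gradient descent on the one-dimensional cost function $f(x) = (x - 3)^2$; the initial point, learning rate, and tolerance are arbitrary.

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3)
def grad(x):
    return 2 * (x - 3)

x = 10.0          # initial point (chosen arbitrarily)
lr = 0.1          # learning rate (step size)
tol = 1e-6        # convergence criterion on the update size

for step in range(1000):
    x_new = x - lr * grad(x)   # move against the gradient
    if abs(x_new - x) < tol:   # converged: the update is tiny
        break
    x = x_new

print(step, x)  # x should be close to the minimizer 3
```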
Learning Rate (Step Size)
• Large learning rate <=> large step size
• Small learning rate: slow convergence
• Large learning rate: divergence (the updates may overshoot and never converge)
Pros and Cons
• Advantages
• Simple and usually effective on ML tasks
• Disadvantages
• May get stuck in a local minimum if the cost function is non-convex
• Mitigation: try multiple initial points
Batch Gradient Descent
• Use the whole data set when computing the gradient – suitable for small data sets or simple models
• Example (Linear Regression):
• Compute the gradient (partial derivatives) of the cost function MSE w.r.t. each $\beta_k$
• Select a set of initial $\beta$s
• Gradient: partial derivatives of $MSE(\boldsymbol{\beta}) = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ w.r.t. each $\beta_k$
• Compute the gradient using the whole dataset – plug in all X and y observations and the initial $\beta$s
• Update $\boldsymbol{\beta}$: $\boldsymbol{\beta} \leftarrow \boldsymbol{\beta} - \eta \, \nabla MSE(\boldsymbol{\beta})$
• Learning rate $\eta$: pre-determined
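A hedged sketch of batch gradient descent for simple linear regression; the synthetic data (roughly $y = 2 + 3x$), learning rate, and iteration count are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data roughly following y = 2 + 3x
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 2 + 3 * x + rng.normal(scale=0.5, size=100)

b0, b1 = 0.0, 0.0   # initial parameters
lr = 0.05           # learning rate
n = len(x)

for _ in range(2000):
    y_hat = b0 + b1 * x
    # Gradients of MSE = (1/n) * sum((y_hat - y)^2) w.r.t. b0 and b1,
    # computed over the whole dataset (batch gradient descent)
    g0 = (2 / n) * np.sum(y_hat - y)
    g1 = (2 / n) * np.sum((y_hat - y) * x)
    b0 -= lr * g0
    b1 -= lr * g1

print(b0, b1)  # should end up close to 2 and 3
```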
Stochastic Gradient Descent
• For a huge dataset, batch gradient descent is still computationally expensive
• Do we really need every sample to compute the gradient and update the parameters at every step (iterating until convergence)?
• How about pulling a subset of data?
• In extreme cases => SGD:
• Picks one random instance (sample) to compute the gradient
• Update the parameters
• Adds noise to the gradient
• Less likely to be stuck at local minima
• More iterations to converge
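One possible illustration uses scikit-learn's SGDRegressor; the choice of library, the synthetic data, and all hyperparameters here are assumptions, not prescribed by the slides.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical data roughly following y = 2 + 3x
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(1000, 1))
y = 2 + 3 * X[:, 0] + rng.normal(scale=0.5, size=1000)

# SGDRegressor updates the parameters sample by sample (stochastic gradient descent)
sgd = SGDRegressor(loss="squared_error", learning_rate="constant",
                   eta0=0.01, max_iter=1000, tol=1e-4, random_state=0)
sgd.fit(X, y)
print(sgd.intercept_, sgd.coef_)  # roughly 2 and 3
```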
Mini Batch Gradient Descent
• Batch Gradient Descent vs. Stochastic Gradient Descent
• Smooth (but slow) updates vs. handling large data sets (but noisy updates)
• Mini-batch gradient descent balances the two
• Calculate gradient based on a mini-batch (e.g., size = 32 records)
• Not too much noise (compared to SGD)
• More efficient (compared to BGD)
• Need to decide on batch size
• Very commonly used
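A minimal sketch of the mini-batch update loop with batch size 32, as in the example above; the synthetic data and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=1000)
y = 2 + 3 * x + rng.normal(scale=0.5, size=1000)

b0, b1 = 0.0, 0.0
lr, batch_size = 0.05, 32

for epoch in range(50):
    idx = rng.permutation(len(x))              # shuffle each epoch
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x[batch], y[batch]
        y_hat = b0 + b1 * xb
        # Gradient of MSE computed on the mini-batch only
        b0 -= lr * (2 / len(xb)) * np.sum(y_hat - yb)
        b1 -= lr * (2 / len(xb)) * np.sum((y_hat - yb) * xb)

print(b0, b1)  # should end up close to 2 and 3
```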
Polynomial Regression
Polynomial Regression
• Generate new features consisting of polynomial combinations of the original
features
• Captures the non-linear pattern between Y and X
• But still a linear model
• Example (degree = 2): see the sketch below
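A small sketch of generating degree-2 polynomial features with scikit-learn's PolynomialFeatures; the two-feature input is a made-up example, and the use of scikit-learn is an assumption.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two original features x1 and x2 for three hypothetical observations
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Generated columns: x1, x2, x1^2, x1*x2, x2^2
print(poly.get_feature_names_out(["x1", "x2"]))
print(X_poly)
```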
Example I: Polynomial Features
• Hyperbolic tangent: tanh(x):
• S-shaped function
• Asymptotically approaches 1 and −1
Example I: Polynomial Features
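A possible reconstruction of this example (the polynomial degree and sample size are assumptions): fit a linear regression on polynomial features to data generated from tanh(x).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Generate data from the hyperbolic tangent
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.tanh(x).ravel()

# Polynomial regression: still a linear model in the parameters,
# but fit on polynomial features of x (degree chosen arbitrarily)
model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
model.fit(x, y)

print(model.predict([[0.5]]), np.tanh(0.5))  # fitted vs. true value
```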
Overfitting
• When overfitting occurs:
• The model captures the noise in the training data instead of the underlying structure
• Overly complicated model (e.g., too many variables, too complicated a structure)
• Consequences:
• Bias? Low
• Variance? High (small changes in x lead to large changes in predicted y)
• Predictive power on new data? Poor
• Significance for interpretation? No
• Solution / Preventions
• Splitting the data
• Regularization
• Cross-validation (a sketch follows below)
• …
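As a hedged sketch of the cross-validation idea (assuming scikit-learn), cross_val_score evaluates the model on several held-out folds rather than on the data it was trained on; the data here are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = 2 + 3 * X[:, 0] + rng.normal(scale=0.5, size=200)

# 5-fold cross-validation: each fold is held out once for evaluation
scores = cross_val_score(LinearRegression(), X, y,
                         cv=5, scoring="neg_mean_squared_error")
print(-scores.mean())  # average held-out MSE
```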
Data Splitting – Intuition
• Model with overfitting problem
• Nice performance for data in hand
• Poor predictive accuracy for new dataset
[Figure: an overfitted fit. Blue points: data collected; orange point: a new data point; grey points: predicted values. The error on the collected data is 0, but the error on the new data point is high. What if the orange point were intentionally held out during estimation?]
Data Splitting to Address Overfitting
• Nice performance for data in hand
• Poor predictive accuracy for new dataset
Split the data into two groups:
• Training Set – hypothetically the “in-hand” data; purpose: train the model
• Test Set – hypothetically the “new” data, kept untouched; purpose: show performance
• Under overfitting: very low error on the training set, but high error (poor performance) on the test set
ALWAYS split the data first!
Test Set for Performance Measure
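A minimal sketch of splitting the data before any modeling, assuming scikit-learn's train_test_split; the 80/20 split and the synthetic data are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = 2 + 3 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Split FIRST: the test set stays untouched until performance measurement
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```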
PYTHON PRACTICE