
Ridge, Lasso and Elastic Net

Presented by: Pham Gi Nguyen
Student ID: BA12-142
Course: AI and Machine Learning
Lecturer: Nguyen Cam

01
Overview of Topics

1. Advancements with Regression
2. Ridge Regression
3. Lasso Regression
4. Elastic Net

02
1. Introduction to Regression Analysis

01 If we continue to draw from OLS as our only approach to linear regression techniques, methodologically speaking, we are still within the late 1800s and early 1900s timeframe.

02 With advancements in computing technology, regression analysis can be carried out using a variety of different statistical techniques, which has led to the development of new tools and methods.

03 The techniques we will discuss today will bring us up to date with advancements in regression analysis.

04 In modern data analysis, we will often find data with a very high number of independent variables, and we need better regression techniques to handle this high-dimensional modeling.

03
Review of Linear Regression Analysis

Simple Linear Regression Formula

The simple regression model can be represented as follows:

Y = β0 + β1X1 + ϵ

The β0 represents the Y-intercept value, the coefficient β1 represents the slope of the line, X1 is an independent variable, and ϵ is the error term. The error term is the value needed to correct for the prediction error between the observed and predicted values.

The output of a regression analysis will produce a coefficient table similar to the one below.
• This table shows that the intercept is -114.326 and the Height coefficient is 106.505 ± 11.55.
• This can be interpreted as: for each unit increase in X, we can expect Y to increase by 106.5.
• Also, the t value and Pr > |t| indicate that these variables are statistically significant at the 0.05 level and can be included in the model.
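To make the coefficient-table reading concrete, here is a minimal sketch (not part of the original deck) that fits a simple linear regression with statsmodels on synthetic data; the library, variable names, and data are illustrative assumptions, chosen only so the summary shows the same kind of intercept, slope, t value, and Pr > |t| columns discussed above.

```python
# Minimal sketch, not from the slides: synthetic height-vs-weight style data,
# fitted with statsmodels OLS so that model.summary() prints a coefficient
# table (coef, std err, t, P>|t|) like the one shown on this slide.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
height = rng.uniform(1.4, 2.0, size=50)                        # predictor X1
weight = -114.0 + 106.5 * height + rng.normal(0, 8, size=50)   # Y = β0 + β1·X1 + ϵ

X = sm.add_constant(height)        # adds the intercept column for β0
model = sm.OLS(weight, X).fit()
print(model.summary())             # coefficient table with t values and Pr > |t|
```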

04
Ordinary Least Squares

What is Ordinary Least Squares or OLS?

• In statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model.

• The goal of OLS is to minimize the differences between the observed responses in some arbitrary dataset and the responses predicted by the linear approximation of the data.
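As a short formal sketch (this equation is not on the original slide), "minimizing the differences" means choosing the coefficients that minimize the residual sum of squares:

```latex
\hat{\beta}^{\text{OLS}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2}
```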

05
2. Ridge Regression

01 Ridge Regression is a modeling technique that works to solve the multicollinearity problem in OLS models through the incorporation of the shrinkage parameter, λ.

02 The assumptions of the model are the same as OLS: linearity, constant variance, and independence. Normality need not be assumed.

03 Additionally, multiple linear regression (OLS) has no way to identify a smaller subset of important variables.

06
Ridge Regression

• In OLS regression, the equation Y = β0 + β1X1 + β2X2 + … + ϵ can be represented in matrix notation as follows:

  X'Xβ = X'Y

• Where X is the design matrix with [X]ij = xij, Y is the vector of responses (y1, …, yn), and β is the vector of coefficients (β1, …, βp).

• This equation can be rearranged to show the following:

  β̂ = (X'X)⁻¹X'Y

• Where R = X'X, and R is the correlation matrix of the independent variables (assuming standardized predictors).

• These estimates are unbiased, so the expected values of the estimates are the population values. That is,

  E(β̂) = β

• The variance-covariance matrix of the estimates is

  V(β̂) = σ²R⁻¹

07
Ridge Regression

• Ridge Regression proceeds by adding a small value, λ, to the diagonal elements of the correlation matrix. (This is where ridge regression gets its name, since the diagonal of ones may be thought of as a ridge.)

• λ is a positive value less than one (usually less than 0.3).

• The amount of bias of the estimator and the covariance matrix of the estimates are given by the expressions below.
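The two formulas referenced above appeared as images in the original deck. As a hedged reconstruction using standard results, and the R = X'X notation from the previous slide, they take the form:

```latex
% Standard forms (an assumption; not reproduced from the slide's images):
\hat{\beta}^{\text{ridge}} = (R + \lambda I)^{-1} X'Y
% bias introduced by the shrinkage:
E\big[\hat{\beta}^{\text{ridge}}\big] - \beta = \big[(R + \lambda I)^{-1} R - I\big]\beta = -\lambda\,(R + \lambda I)^{-1}\beta
% variance-covariance matrix of the ridge estimates:
V\big(\hat{\beta}^{\text{ridge}}\big) = \sigma^{2}\,(R + \lambda I)^{-1} R\,(R + \lambda I)^{-1}
```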

08
Ridge Trace

• One of the main obstacles in using ridge regression is choosing an appropriate value of λ. The inventors of ridge regression suggested using a graphic which they called a "ridge trace."

• A ridge trace is a plot that shows the ridge regression coefficients as a function of λ.

• When viewing the ridge trace, we are looking for the λ for which the regression coefficients have stabilized. Often the coefficients will vary widely for small values of λ and then stabilize.

• Choose the smallest value of λ possible (which introduces the smallest bias) after which the regression coefficients seem to have remained constant.

• Note: Increasing λ will eventually drive the regression coefficients to 0.
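A minimal sketch of how a ridge trace can be drawn (not part of the original deck); the data, the λ grid, and the use of scikit-learn and matplotlib are illustrative assumptions:

```python
# Sketch: plot ridge coefficient paths against lambda on a log scale.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=80, n_features=8, noise=15.0, random_state=1)
X = StandardScaler().fit_transform(X)            # predictors on a common scale

lambdas = np.logspace(-3, 3, 50)                 # grid of shrinkage values
coef_paths = [Ridge(alpha=lam).fit(X, y).coef_ for lam in lambdas]

plt.plot(lambdas, coef_paths)                    # one path per coefficient
plt.xscale("log")
plt.xlabel("lambda (called alpha in scikit-learn)")
plt.ylabel("ridge coefficient")
plt.title("Ridge trace")
plt.show()
```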

09
Scale in Ridge Regression

• Here is a visual representation of the ridge coefficients for λ versus a linear regression.

• We can see that the size of the (penalized) coefficients has decreased through our shrinking function, ℓ2.

• It is also important to point out that in ridge regression we usually leave the intercept unpenalized because it is not on the same scale as the other predictors.

• The λ penalty is unfair if the predictor variables are not on the same scale.

• Therefore, if we know that the variables are not measured in the same units, we typically center and scale all of the variables before building a ridge regression.
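A minimal sketch of the center-and-scale recommendation above (not from the original deck); the pipeline, data, and penalty value are illustrative assumptions, and note that scikit-learn's Ridge leaves the intercept unpenalized by default, matching the point about the intercept:

```python
# Sketch: standardize predictors before fitting ridge regression.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))   # alpha plays the role of lambda
model.fit(X, y)
print(model.named_steps["ridge"].coef_)        # penalized slopes on the standardized scale
print(model.named_steps["ridge"].intercept_)   # intercept, left unpenalized
```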

10
Variable Selection

• The problem of picking out the relevant variables from a larger set is called variable selection.

• Suppose there is a subset of coefficients that are identically zero. This means that the mean response doesn't depend on these predictors at all.

• The red paths on the plot are the true non-zero coefficients; the grey paths are the true zeros.

• The vertical dashed line is the point at which ridge regression's MSE starts losing to linear regression.

• Note: the grey coefficient paths are not exactly zero; they are shrunken, but non-zero.

11
Variable Selection

• We can show that ridge regression doesn't set the coefficients exactly to zero unless λ = ∞, in which case they are all zero.

• Therefore, ridge regression cannot perform variable selection.

• Ridge regression performs well when there is a subset of true coefficients that are small or zero.

• It doesn't do well when all of the true coefficients are moderately large; however, it will still perform better than OLS regression.

12
Ridge Regression

Advantages
• Reduces Overfitting: By adding a penalty to the size of coefficients, Ridge Regression reduces the risk of overfitting.
• Handles Multicollinearity: It is effective in dealing with multicollinearity (when predictor variables are highly correlated).
• Computationally Efficient: It is computationally efficient and can be solved using standard linear algebra techniques.

Disadvantages
• Does Not Perform Variable Selection: Ridge Regression does not set any coefficients to zero, so it does not perform variable selection.
• Interpretability: The model can be less interpretable because it includes all predictors in the final model.

Potential Applications

Genomics and Bioinformatics:
• Gene Expression Data: Ridge Regression is used to handle multicollinearity among gene expression data and improve the accuracy of gene function prediction.

Economics and Finance:
• Portfolio Optimization: It's applied in financial models to predict returns and manage portfolios, especially when dealing with a large number of correlated financial indicators.

Healthcare:
• Disease Risk Prediction: Ridge Regression helps in predicting disease risk by considering multiple correlated health metrics.

13
3. Lasso Regression

01 The lasso combines some of the shrinking advantages of ridge regression with variable selection.

02 Lasso is an acronym for 'Least Absolute Shrinkage and Selection Operator'.

03 The lasso is very competitive with ridge regression in regards to prediction error.

04 The only difference between the lasso and ridge regression is that ridge uses the ℓ2 penalty ∥β∥2², whereas the lasso uses the ℓ1 penalty ∥β∥1.
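To make point 04 concrete, here is a sketch of the two penalized criteria in a standard form (consistent with the slide, but the exact formulas are not copied from it):

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\ \|y - X\beta\|_2^{2} + \lambda\,\|\beta\|_2^{2}
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\ \|y - X\beta\|_2^{2} + \lambda\,\|\beta\|_1
```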

14
Lasso Regression

• The tuning parameter λ controls the strength of the penalty and, like ridge regression, we get β̂lasso = the linear regression estimate when λ = 0, and β̂lasso = 0 when λ = ∞.

• For λ in between these two extremes, we are balancing two ideas: fitting a linear model of y on X, and shrinking the coefficients.

• The nature of the ℓ1 penalty causes some of the coefficients to be shrunken to exactly zero.

• This is what makes the lasso different from ridge regression: it is able to perform variable selection in the linear model.

• Important: As λ increases, more coefficients are set to zero (fewer variables selected), and among the non-zero coefficients, more shrinkage is employed.
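A minimal sketch of this behaviour (not from the original deck); the data and the λ values are illustrative assumptions, and scikit-learn again calls the penalty strength alpha:

```python
# Sketch: count how many lasso coefficients are exactly zero as lambda grows.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)

for lam in [0.1, 1.0, 10.0, 100.0]:
    coef = Lasso(alpha=lam, max_iter=10_000).fit(X, y).coef_
    print(f"lambda={lam:>6}: {np.sum(coef != 0)} non-zero coefficients")
```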

15
Lasso Regression

Because the lasso sets some coefficients to exactly zero, it performs variable selection in the linear model.

16
Lasso Regression

We can also use plots of the degrees of freedom (df) to put different estimates on equal footing.

17
Constrained Form

• It can be helpful to think about our ℓ1 and ℓ2 penalties in the following form: the penalized problem can be rewritten in an equivalent constrained form (see the sketch after this list).

• t is a tuning parameter (which we have been calling λ earlier).

• The usual OLS regression solves the unconstrained least squares problem; these estimates constrain the coefficient vector to lie in some geometric shape centered around the origin.
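The formulas on this slide were images in the original deck; a hedged reconstruction of the standard constrained forms they refer to is:

```latex
\text{lasso:}\quad \min_{\beta}\ \|y - X\beta\|_2^{2} \ \ \text{subject to}\ \ \|\beta\|_1 \le t
\qquad
\text{ridge:}\quad \min_{\beta}\ \|y - X\beta\|_2^{2} \ \ \text{subject to}\ \ \|\beta\|_2^{2} \le t
```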

18
Constrained Form

This generally reduces the variance because it keeps the estimate close to zero. But the shape that we choose really matters!

The contour lines are the least squares error function. The blue diamond is the constraint region for the lasso regression; the blue circle is the constraint region for the ridge regression.

19
Lasso Regression

Advantages
• Performs Variable Selection: Lasso can shrink some coefficients to zero, effectively performing variable selection and producing simpler, more interpretable models.
• Reduces Overfitting: Like Ridge Regression, Lasso also reduces the risk of overfitting by adding a penalty to the size of coefficients.

Disadvantages
• Computationally Intensive: It can be more computationally intensive than Ridge Regression, especially with a large number of predictors.
• Bias: Lasso can introduce bias into the model, especially if the true relationship between predictors and the response is not sparse.

Potential Applications

Feature Selection in Machine Learning:
• High-Dimensional Data: Lasso is extensively used for selecting important features in datasets with a large number of variables, such as text classification and image recognition.

Credit Scoring:
• Risk Assessment: In finance, Lasso can identify key predictors of credit risk from a large set of financial indicators.

Marketing:
• Customer Segmentation: Helps in identifying the most influential factors that differentiate between different customer segments.

20
4. Elastic Net

01 When we are working with high-dimensional data, correlations between the variables can be high, resulting in multicollinearity.

02 These correlated variables can sometimes form groups or clusters of correlated variables.

03 There are many times where we would want to include the entire group in the model selection if one variable has been selected.

04 This can be thought of as an elastic net catching a school of fish instead of singling out a single fish.

21
Elastic Net

• The total number of variables that the lasso variable selection procedure can select is bounded by the total number of samples in the dataset.

• Additionally, the lasso fails to perform grouped selection. It tends to select one variable from a group and ignore the others.

• The elastic net forms a hybrid of the ℓ1 and ℓ2 penalties.

22
Elastic Net

• Ridge, Lasso, and Elastic Net are all part of the same family, sharing a penalty term (see the sketch after this list).

• If α = 0, then we have Ridge Regression.
• If α = 1, then we have the LASSO.
• If 0 < α < 1, then we have the elastic net.
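The penalty term itself was an image in the original deck; one common parameterization that is consistent with the α cases above (an assumption, not necessarily the slide's exact formula) is:

```latex
P_{\lambda,\alpha}(\beta) = \lambda\Big(\alpha\,\|\beta\|_1 + \tfrac{1-\alpha}{2}\,\|\beta\|_2^{2}\Big)
```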

23
Elastic Net

• The specification of the elastic net penalty above is actually considered a naïve elastic net.

• Unfortunately, the naïve elastic net does not perform well in practice.

• The parameters are penalized twice with the same α level (this is why it is called naïve).

• To correct this, we can use the rescaling sketched below.
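The correction formula was an image in the original slides. In the (λ1, λ2) parameterization of the penalty, the standard fix rescales the naïve estimate to undo the double shrinkage; as a hedged sketch:

```latex
\hat{\beta}^{\text{elastic net}} = (1 + \lambda_2)\,\hat{\beta}^{\text{naive}}
```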

24
Elastic Net - Constraint

Here is the visualization of the constrained region for the elastic net.

(Figure: visualizations of the constraint regions for Ridge Regression, the Lasso, and the Elastic Net.)

25
Elastic Net

Advantages
• Combines Ridge and Lasso: Elastic Net combines the penalties of Ridge and Lasso, providing a balance between the two methods.
• Handles Multicollinearity and Variable Selection: It can handle multicollinearity and perform variable selection simultaneously.
• Flexibility: The mixing parameter allows for flexibility in the amount of Ridge and Lasso penalties applied.

Disadvantages
• Complexity: The model can be more complex to tune due to the additional mixing parameter.
• Computationally Intensive: Like Lasso, Elastic Net can be computationally intensive, especially with a large number of predictors.

Potential Applications

Predictive Modeling in Medicine:
• Genetic Studies: Elastic Net is useful in predictive modeling where there are groups of correlated genes, combining the strengths of Ridge and Lasso.

Chemoinformatics:
• Drug Discovery: Used to identify important molecular descriptors and predict the biological activity of compounds.

Social Science Research:
• Survey Data Analysis: Elastic Net helps in analyzing large survey datasets by selecting significant predictors while handling multicollinearity.
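A minimal sketch of an elastic net fit (not from the original deck); the data are illustrative, and note the naming mismatch: scikit-learn's alpha is the overall penalty strength (our λ), while l1_ratio is the mixing parameter (the α of the previous slides):

```python
# Sketch: elastic net with standardized predictors and a 50/50 l1/l2 mix.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=120, n_features=30, n_informative=8,
                       noise=5.0, random_state=0)

# l1_ratio=0 behaves like ridge, l1_ratio=1 like lasso, values in between give the elastic net
model = make_pipeline(StandardScaler(),
                      ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10_000))
model.fit(X, y)
print(model.named_steps["elasticnet"].coef_)   # partly sparse coefficient vector
```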
26
References

Ridge Regression
• Ridge Regression (Wikipedia)
• Ridge Regression Lecture Notes from "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman.

Lasso Regression
• Lasso Regression (Wikipedia)
• Lasso Regression Lecture Notes by Trevor Hastie.

Elastic Net
• Elastic Net (Wikipedia)
• Elastic Net Regression Tutorial on Cross Validated, a Stack Exchange site.

Most Important: TRUST ME BRO

27
Thank You For Watching

28
