PGN AI and ML Presentation
Elastic Net
Presented by: Pham Gi Nguyen
Student ID: BA12-142
Course: AI and Machine Learning
Lecturer: Nguyen Cam
Overview of Topics
1. Advancements with Regression
2. Ridge Regression
3. Lasso Regression
4. Elastic Net
1. Introduction to Regression Analysis
If we continue to draw from OLS as our only approach to linear regression, then, methodologically speaking, we are still working within the late 1800s and early 1900s.
Review of Linear Regression Analysis
Simple Linear Regression Formula
The simple regression model can be represented as follows:
Y = β0 + β1X + ε
The output of a regression analysis will produce a coefficient table similar to the one below.
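As a concrete illustration, here is a minimal sketch (using statsmodels on synthetic data, which are an assumption for illustration, not taken from the slides) that fits a simple linear regression and prints a coefficient table like the one referenced above:

    # A minimal sketch: fit y = b0 + b1*x by OLS and print a coefficient table.
    # The data are synthetic, assumed purely for illustration.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)  # true b0 = 2, b1 = 3

    X = sm.add_constant(x)       # add the intercept column
    fit = sm.OLS(y, X).fit()     # ordinary least squares
    print(fit.summary())         # coefficient table: estimates, std errors, t, p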
Ordinary Least Squares
What is Ordinary Least Squares, or OLS?
• In statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model.
2. Ridge Regression
• Ridge Regression is a modeling technique that works to solve the multicollinearity problem in OLS models through the incorporation of the shrinkage parameter, λ.
• The assumptions of the model are the same as OLS: linearity, constant variance, and independence. Normality need not be assumed.
Ridge Regression
• In OLS regression, the equation Y = β0 + β1X1 + β2X2 + … + ε can be represented in matrix notation as the normal equations:
X'Xβ = X'Y
• Where X is the design matrix with [X]ij = xij, y is the vector of responses (y1, …, yn), and β is the vector of coefficients (β1, …, βp).
β̂ = (X'X)⁻¹ X'Y
• Where R = X'X.
• These estimates are unbiased, so the expected values of the estimates are the population values. That is,
E(β̂) = β
• The variance-covariance matrix of the estimates is
V(β̂) = σ²R⁻¹
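A minimal numpy sketch of the matrix solution above, on synthetic data (the data and dimensions are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 3
    X = rng.normal(size=(n, p))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

    R = X.T @ X                              # R = X'X
    beta_hat = np.linalg.solve(R, X.T @ y)   # solve X'X b = X'y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)         # unbiased estimate of sigma^2
    V = sigma2 * np.linalg.inv(R)            # V(beta_hat) = sigma^2 R^-1
    print(beta_hat)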
Ridge Regression
• Ridge Regression proceeds by adding a small value, λ, to the diagonal elements of the correlation matrix. (This is where ridge regression gets its name, since the diagonal of ones may be thought of as a ridge.)
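A minimal sketch of that idea in numpy: add λ to the diagonal before solving the normal equations. The predictors are standardized so X'X is on a correlation-like scale; the data are synthetic assumptions.

    import numpy as np

    def ridge_solve(X, y, lam):
        # Solve (X'X + lam*I) b = X'y; lam = 0 recovers OLS.
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 4))
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the predictors
    y = X @ np.array([1.0, 0.5, 0.0, -1.0]) + rng.normal(size=100)
    print(ridge_solve(X, y, lam=10.0))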
Ridge Trace
• One of the main obstacles in using ridge regression is choosing an appropriate value of λ. The inventors of ridge regression suggested using a graphic which they called a "ridge trace."
• When viewing the ridge trace, we are looking for the λ for which the regression coefficients have stabilized. Often the coefficients will vary widely for small values of λ and then stabilize.
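A minimal sketch of a ridge trace using scikit-learn's Ridge on synthetic data (an assumption for illustration): fit over a grid of λ values and plot each coefficient's path.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(size=100)

    lambdas = np.logspace(-2, 4, 50)
    coefs = np.array([Ridge(alpha=lam).fit(X, y).coef_ for lam in lambdas])

    plt.semilogx(lambdas, coefs)   # one path per coefficient
    plt.xlabel("lambda")
    plt.ylabel("coefficient")
    plt.title("Ridge trace: look for where the paths stabilize")
    plt.show()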
Scale in Ridge Regression
• Here is a visual representation of the ridge coefficients for λ versus a linear regression.
• We can see that the size of the penalized coefficients has decreased through our shrinking function, ℓ2.
• The λ penalty is applied unfairly if the predictor variables are not on the same scale, so predictors should be standardized before fitting.
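A minimal sketch of the fix: standardize the predictors so λ penalizes every coefficient on a common scale (scikit-learn pipeline; the synthetic scales are assumptions for illustration):

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 3)) * np.array([1.0, 100.0, 0.01])  # very different scales
    y = X @ np.array([1.0, 0.01, 50.0]) + rng.normal(size=100)

    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
    print(model.named_steps["ridge"].coef_)  # coefficients penalized on a common scale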
Variable Selection
• The problem of picking out the relevant variables from a larger set is called variable selection.
• The red paths on the plot are the true non-zero coefficients; the grey paths are the true zeros.
• The vertical dashed line is the point at which ridge regression's MSE starts losing to linear regression.
• Note: the grey coefficient paths are not exactly zero; they are shrunken, but non-zero.
Variable Selection
• We can show that ridge regression doesn't set the coefficients exactly to zero unless λ = ∞, in which case they are all zero.
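A minimal sketch checking that claim numerically (synthetic data assumed): even under a very heavy penalty, ridge coefficients shrink toward zero but never reach it exactly.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(5)
    X = rng.normal(size=(100, 4))
    y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(size=100)

    coef = Ridge(alpha=1e6).fit(X, y).coef_
    print(coef)                   # tiny values...
    print(np.all(coef != 0.0))    # ...but none exactly zero -> True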
Ridge Regression
Advantages
• Reduces Overfitting: By adding a penalty to the size of coefficients, Ridge Regression reduces the risk of overfitting.
• Handles Multicollinearity: It is effective in dealing with multicollinearity (when predictor variables are highly correlated).
• Computationally Efficient: It is computationally efficient and can be solved using standard linear algebra techniques.
Disadvantages
• Does Not Perform Variable Selection: Ridge Regression does not set any coefficients to zero, so it does not perform variable selection.
• Interpretability: The model can be less interpretable because it includes all predictors in the final model.
Potential Applications
Genomics and Bioinformatics:
• Gene Expression Data: Ridge Regression is used to handle multicollinearity among gene expression data and improve the accuracy of gene function prediction.
Healthcare:
• Disease Risk Prediction: Ridge Regression helps in predicting disease risk by considering multiple correlated health metrics.
3. Lasso Regression
Lasso Regression
• The lasso estimate minimizes ‖y − Xβ‖² + λ Σj |βj|.
• The tuning parameter λ controls the strength of the penalty; like ridge regression, β̂lasso equals the linear regression estimate when λ = 0, and β̂lasso = 0 when λ = ∞.
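One way to see the two extremes is the soft-thresholding view: assuming orthonormal predictors (X'X = I) and the objective 0.5‖y − Xβ‖² + λΣ|βj|, the lasso estimate is the OLS estimate soft-thresholded at λ. A minimal sketch (the example coefficients are made up):

    import numpy as np

    def soft_threshold(b, lam):
        # Shrink toward zero by lam; values within lam of zero become exactly zero.
        return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

    beta_ols = np.array([3.0, -0.4, 1.2, 0.1])   # hypothetical OLS estimates
    print(soft_threshold(beta_ols, 0.0))    # lam = 0: the OLS estimate, unchanged
    print(soft_threshold(beta_ols, 0.5))    # moderate lam: small coefs hit exactly 0
    print(soft_threshold(beta_ols, 10.0))   # huge lam: everything is 0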
Lasso Regression
Because the lasso sets some coefficients exactly to zero, it performs variable selection in the linear model.
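A minimal sketch of those exact zeros with scikit-learn's Lasso (synthetic data assumed; only the first two predictors truly matter):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(6)
    X = rng.normal(size=(200, 6))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

    coef = Lasso(alpha=0.2).fit(X, y).coef_
    print(coef)                   # irrelevant coefficients come out exactly 0.0
    print(np.flatnonzero(coef))   # indices of the selected variables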
Lasso Regression
We can also use plots of the degrees of freedom (df) to put different estimates on equal footing.
Constrained Form
• It can be helpful to think about our ℓ1 and ℓ2 penalties in the constrained form: minimize ‖y − Xβ‖² subject to ‖β‖1 ≤ t (lasso) or ‖β‖2² ≤ t (ridge).
• The usual OLS regression solves the unconstrained least squares problem; these estimates constrain the coefficient vector to lie in some geometric shape centered around the origin.
Constrained Form
This generally reduces the variance because it keeps the estimate close to zero. But the shape that we choose really matters!
Left: the contour lines are the least squares error function; the blue diamond is the constraint region for the lasso regression. Right: the contour lines are the least squares error function; the blue circle is the constraint region for the ridge regression.
Lasso Regression
Advantages
• Performs Variable Selection: Lasso can shrink some coefficients to zero, effectively performing variable selection and producing simpler, more interpretable models.
• Reduces Overfitting: Like Ridge Regression, Lasso also reduces the risk of overfitting by adding a penalty to the size of coefficients.
Disadvantages
• Computationally Intensive: It can be more computationally intensive than Ridge Regression, especially with a large number of predictors.
• Bias: Lasso can introduce bias into the model, especially if the true relationship between predictors and the response is not sparse.
Potential Applications
Feature Selection in Machine Learning:
• High-Dimensional Data: Lasso is extensively used for selecting important features in datasets with a large number of variables, such as text classification and image recognition.
Credit Scoring:
• Risk Assessment: In finance, Lasso can identify key predictors of credit risk from a large set of financial indicators.
Marketing:
• Customer Segmentation: Helps in identifying the most influential factors that differentiate between different customer segments.
4. Elastic Net
When we are working with high-dimensional data, correlations between the variables can be high, resulting in multicollinearity.
Elastic Net
• The total number of variables that the lasso variable selection procedure can select is bounded by the total number of samples in the dataset.
• Additionally, lasso fails to perform grouped selection: it tends to select one variable from a group and ignore the others.
• The elastic net forms a hybrid of the ℓ1 and ℓ2 penalties.
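A minimal sketch of the grouped-selection point (synthetic data assumed): with two nearly identical predictors, lasso tends to keep one and drop the other, while the elastic net tends to spread weight across both.

    import numpy as np
    from sklearn.linear_model import Lasso, ElasticNet

    rng = np.random.default_rng(7)
    z = rng.normal(size=300)
    X = np.column_stack([z, z + 0.01 * rng.normal(size=300), rng.normal(size=300)])
    y = 2.0 * z + rng.normal(size=300)

    print(Lasso(alpha=0.1).fit(X, y).coef_)                     # often one of the pair is 0
    print(ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y).coef_)  # weight shared by the pair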
Elastic Net
• Ridge, Lasso, and Elastic Net are all part of the same family, with a penalty term of the form λ Σj [(1 − α) βj²/2 + α |βj|]: α = 0 gives ridge, α = 1 gives lasso, and 0 < α < 1 gives the elastic net.
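A minimal sketch of that family in scikit-learn, where ElasticNet's l1_ratio plays the role of α (synthetic data assumed): l1_ratio = 1 is the lasso end and l1_ratio near 0 is the ridge end.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(8)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(size=200)

    for l1_ratio in (1.0, 0.5, 0.05):
        coef = ElasticNet(alpha=0.5, l1_ratio=l1_ratio).fit(X, y).coef_
        print(l1_ratio, coef)   # more exact zeros as l1_ratio -> 1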
Elastic Net
• The specification of the elastic net penalty above is actually considered a naïve elastic net.
• Unfortunately, the naïve elastic net does not perform well in practice.
• The parameters are penalized twice, once by the ℓ1 term and once by the ℓ2 term (this double shrinkage is why it is called naïve); the corrected elastic net rescales the estimates to undo it.
Elastic Net - Constraint
Here is the visualization of the constrained region for the elastic net.
[Figure: visualizations of the constraint regions for Ridge Regression, Lasso, and Elastic Net.]
Elastic Net
Advantages
• Combines Ridge and Lasso: Elastic Net combines the penalties of Ridge and Lasso, providing a balance between the two methods.
• Handles Multicollinearity and Variable Selection: It can handle multicollinearity and perform variable selection simultaneously.
• Flexibility: The mixing parameter allows for flexibility in the amount of Ridge and Lasso penalties applied.
Disadvantages
• Complexity: The model can be more complex to tune due to the additional mixing parameter.
• Computationally Intensive: Like Lasso, Elastic Net can be computationally intensive, especially with a large number of predictors.
Potential Applications
Predictive Modeling in Medicine:
• Genetic Studies: Elastic Net is useful in predictive modeling where there are groups of correlated genes, combining the strengths of Ridge and Lasso.
Chemoinformatics:
• Drug Discovery: Used to identify important molecular descriptors and predict the biological activity of compounds.
Thank You For Watching