
Big Data Statistics, meeting 7: When d is bigger than n, part 4

28 February 2024
Found and open
■ Found: With high probability the difference between the LASSO estimator β̂_n and the true β is small even if d > n, provided the number of non-zero entries of β remains fixed: cf. the Theorem (upper bound on error of LASSO, probabilistic version).
■ Remained open:
◆ What do we know about the averaged squared approximation error (β̂_n − β)^T X_n^T X_n (β̂_n − β) when β̂_n is the LASSO estimator?
◆ Does the LASSO estimator select those covariates for which β_j ≠ 0?

Criterion 2
■ Recall that criterion 2 of Lecture 6 (see slide 13 of that lecture) for a good estimator was that the average squared approximation error goes to zero, i.e.

(1/n) ||X_n (β̂_n − β)||_2^2 → 0.

■ Today, we will consider two results for this quantity:


◆ The first result uses the restricted eigenvalue condition over C(S, 3) and takes the number of non-zero elements of β to equal k;
◆ The second result uses another type of sparsity, so-called weak sparsity. It requires

Σ_{j=1}^d |β_j| ≤ R_1,

where R_1 is a strictly positive real number (both this quantity and the approximation error above are computed in the short sketch below).
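As a point of reference (not part of the original slides), this is how the two quantities just mentioned would be computed for a given design X_n, true vector β and estimate β̂_n; the arrays below are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 300

X = rng.standard_normal((n, d))                    # design matrix X_n (placeholder)
beta = np.r_[np.ones(5), np.zeros(d - 5)]          # true coefficient vector beta
beta_hat = beta + 0.01 * rng.standard_normal(d)    # some estimate beta_hat (placeholder)

# Criterion 2: averaged squared approximation error (1/n)||X_n(beta_hat - beta)||_2^2
approx_err = np.mean((X @ (beta_hat - beta)) ** 2)

# Weak sparsity requires the l1-norm sum_j |beta_j| to be bounded by R_1
l1_norm = np.sum(np.abs(beta))

print(approx_err, l1_norm)
```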

Criterion 2 (cont’d)
■ The reason for considering two different types of sparsity is to get a feel for how our notion of sparsity affects the results we can expect (in the straightforward sense of ’a certain input yields a certain output’).
■ Which strategy worked nicely in Lecture 6 when we looked at ||β̂_n − β||_2?
■ Splitting the problem into
◆ a deterministic part; and
◆ a random part.
■ Here we will do the same. The deterministic result for the LASSO estimator,
i.e. the solution to
 2
n
X Xd Xd
minimize w.r.t. β : (1/n) yi − βj xij  + λ |βj |, (1)
i=1 j=1 j=1

is as follows

Criterion 2 (cont’d)
Theorem (upper bound on approximation error): For problem (1), suppose λ_n ≥ 2 ||X_n^T ε_n(ω)||_∞ / n > 0.
(i) Suppose that the design matrix X_n satisfies the restricted eigenvalue bound with parameter γ > 0 over C(S, 3) and let |S_A(β)| = k. Then for any solution β̂_n of (1) we have

(1/n) ||X_n (β̂_n − β)||_2^2 ≤ (c_1/γ) k λ_n^2,    (2)

where c_1 is a constant independent of d and n.
(ii) If Σ_{j=1}^d |β_j| ≤ R_1, we have

(1/n) ||X_n (β̂_n − β)||_2^2 ≤ c_2 R_1 λ_n,    (3)

where c_2 is a constant independent of d and n.

■ Remark: In part (i) we denote by |S_A(β)| the cardinality, i.e. the number of elements, of the active set of the true regression vector β.
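As a rough numerical illustration (not part of the original slides): in a simulation, where the noise vector ε_n is known, the lower bound 2 ||X_n^T ε_n(ω)||_∞ / n on λ_n can be evaluated directly. A minimal sketch with a synthetic Gaussian design:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 100, 500, 1.0

X = rng.standard_normal((n, d))        # synthetic design matrix X_n
eps = sigma * rng.standard_normal(n)   # noise vector eps_n (only observable in a simulation)

# Lower bound on lambda_n required by the theorem: 2 * ||X_n^T eps_n||_inf / n
lam_lower = 2 * np.max(np.abs(X.T @ eps)) / n
print("lambda_n must be at least:", lam_lower)
```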

Criterion 2 (cont’d)
Remarks (on upper bound on approximation error):
■ We know from Lecture 6 that by choosing λ_n = τ σ √(log(d)/n), with τ chosen appropriately (see Lecture 6, where τ > √8 was needed), the statements above will hold with high probability if τ is not too close to √8 or if log(d) is not too small (or both).
■ With that choice of λ_n, statement (i) becomes

(1/n) ||X_n (β̂_n − β)||_2^2 ≤ (c̃_1/γ) k log(d)/n,    (4)

where c̃_1 is a constant independent of d and n; and statement (ii) becomes

(1/n) ||X_n (β̂_n − β)||_2^2 ≤ c̃_2 R_1 √(log(d)/n),    (5)

where c̃_2 is a constant independent of d and n.
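The rate in (4) can be eyeballed in a small simulation. The sketch below is an illustration under assumed settings, not part of the lecture: it uses scikit-learn's Lasso, whose objective is (1/(2n))||y − Xβ||_2^2 + α||β||_1, so α = λ_n/2 corresponds to problem (1):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, d, k, sigma, tau = 200, 1000, 5, 1.0, 3.0   # tau chosen larger than sqrt(8)

beta = np.zeros(d)
beta[:k] = 1.0                                  # k non-zero entries (hard sparsity)

X = rng.standard_normal((n, d))
y = X @ beta + sigma * rng.standard_normal(n)

lam_n = tau * sigma * np.sqrt(np.log(d) / n)

# Objective (1) is (1/n)||y - Xb||_2^2 + lam_n ||b||_1; scikit-learn's Lasso uses
# (1/(2n))||y - Xb||_2^2 + alpha ||b||_1, hence alpha = lam_n / 2.
beta_hat = Lasso(alpha=lam_n / 2, fit_intercept=False).fit(X, y).coef_

approx_err = np.mean((X @ (beta_hat - beta)) ** 2)   # (1/n)||X_n(beta_hat - beta)||_2^2
print("approximation error      :", approx_err)
print("k log(d)/n (rate in (4)) :", k * np.log(d) / n)
```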

Selecting right variables (cont’d)
Here comes the result (for the exact assumptions see Wainwright (2009) or Section 11.4 of Hastie, T., Tibshirani, R. and Wainwright, M. (2016)).

Theorem (selecting right variables): Let λ_n = (c_3/γ) σ √(log(d)/n), where c_3 is a constant.
Then with probability at least of order (ignoring constants) 1 − exp(−log(d)) we have
(i) S_A(β̂_n) ⊆ S_A(β);
(ii) if additionally min_{j∈S_A(β)} |β_j| is bounded from below (with the lower bound depending on λ_n), we have S_A(β̂_n) = S_A(β).
Remarks (on Theorem selecting right variables)
■ Statement (i) means our estimator does not include variables that are irrelevant
(with high probability);
■ Yet, statement (i) does not exclude the possibility of missing important variables;
■ Statement (ii) implies that by imposing further assumptions we select all important
variables and only those (with high probability).
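Continuing the simulation style from above (again an illustration under assumed toy settings, not part of the slides), statements (i) and (ii) can be checked by comparing the estimated support with the true one:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, d, k, sigma = 200, 1000, 5, 1.0

beta = np.zeros(d)
beta[:k] = 1.0                                   # non-zero coefficients well separated from 0

X = rng.standard_normal((n, d))
y = X @ beta + sigma * rng.standard_normal(n)

lam_n = 3.0 * sigma * np.sqrt(np.log(d) / n)
beta_hat = Lasso(alpha=lam_n / 2, fit_intercept=False).fit(X, y).coef_

support_true = set(np.flatnonzero(beta))
support_hat = set(np.flatnonzero(beta_hat))

print("no irrelevant variables selected (statement (i)):", support_hat <= support_true)
print("exact support recovery (statement (ii))         :", support_hat == support_true)
```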

Older penalties (cont’d)
■ Message of Quiz 7?
■ Even if X_n^T X_n does not have an inverse,

X_n^T X_n + I_{d×d}

will have an inverse: X_n^T X_n is positive semi-definite and I_{d×d} is positive definite, so their sum is positive definite, and every positive definite matrix has an inverse.
■ We can generalize this to λ I_{d×d} (cf. also Quiz 7), which is also positive definite if λ > 0, to find that the following matrix has an inverse too:

X_n^T X_n + λ I_{d×d}.

■ This opens up another way to solve the issues mentioned on the previous slide. If (6) is not possible for β̂ because X_n^T X_n does not have an inverse, we could simply replace it by

β̂_n^R = (X_n^T X_n + λ I_{d×d})^{-1} X_n^T Y_n.    (7)

■ β̂_n^R is known as the Ridge regression estimator (the name Ridge is the reason for the superscript R).
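A minimal numpy sketch of the closed form (7) (an illustration, not code from the course; it solves the linear system instead of forming the inverse explicitly, which is numerically preferable):

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Ridge regression estimator (7): (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Example with d > n, where X^T X itself has no inverse:
rng = np.random.default_rng(4)
n, d = 50, 200
X = rng.standard_normal((n, d))
y = X @ np.r_[np.ones(5), np.zeros(d - 5)] + rng.standard_normal(n)

beta_ridge = ridge_estimator(X, y, lam=1.0)
print(beta_ridge[:8])
```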
Older penalties (cont’d)
Remarks (on Ridge regression)
■ Clearly, if λ is small, X_n^T X_n + λ I_{d×d} will be close to X_n^T X_n. Yet, adding the small λ I_{d×d} to X_n^T X_n has a huge impact, as X_n^T X_n + λ I_{d×d} has an inverse, in contrast to X_n^T X_n.
■ In practice, choosing λ too small may make it hard (or impossible) for your software to find the inverse of X_n^T X_n + λ I_{d×d}.
■ As alluded to in the introduction to Quiz 7, in practice finding the inverse of X_n^T X_n is difficult (or impossible) if X_n is close to collinear (that is, we have highly correlated regressors).
■ Also in such a case, inverting X_n^T X_n + λ I_{d×d} instead and using (7) as our regression estimator provides a way out.
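A quick numerical illustration of this point (an assumed toy setting, not from the slides): adding λ I_{d×d} dramatically reduces the condition number of a nearly collinear design:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, lam = 100, 5, 1e-2

# Nearly collinear design: the last column is almost a copy of the first one.
X = rng.standard_normal((n, d))
X[:, -1] = X[:, 0] + 1e-6 * rng.standard_normal(n)

print("cond(X^T X)          :", np.linalg.cond(X.T @ X))
print("cond(X^T X + lam * I):", np.linalg.cond(X.T @ X + lam * np.eye(d)))
```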

Older penalties (cont’d)
■ While (7) was how Ridge regression was introduced historically, nowadays another way of introducing it is more popular.
■ The more recent way of introducing the Ridge regression estimator is the following: the solution to

minimize w.r.t. β:   Σ_{i=1}^n ( y_i − Σ_{j=1}^d β_j x_{ij} )^2 + λ Σ_{j=1}^d β_j^2    (8)

is called the Ridge regression estimator.

■ On exercise sheet 4 you can convince yourself that (7) and (8) are indeed the same; a small numerical check also follows below.
■ If you think back to our first lecture on the LASSO, you will see that the penalty λ Σ_{j=1}^d β_j^2 includes β_1; this means that our data were already processed to fulfil the normalizing and centering conditions (recall that we do not want to penalize the intercept).
■ In line with most of the literature we do not multiply Σ_{i=1}^n ( y_i − Σ_{j=1}^d β_j x_{ij} )^2 in Equation (8) by (1/n).
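Here is the small numerical check mentioned above (a sketch under assumed conventions: scikit-learn's Ridge with fit_intercept=False minimizes ||y − Xβ||_2^2 + α||β||_2^2, which matches (8) with α = λ):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n, d, lam = 50, 200, 1.0

X = rng.standard_normal((n, d))
y = X @ np.r_[np.ones(5), np.zeros(d - 5)] + rng.standard_normal(n)

# (7): closed form
beta_closed = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# (8): penalized least squares via scikit-learn (alpha = lambda, no intercept)
beta_pen = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print("max abs difference between (7) and (8):", np.max(np.abs(beta_closed - beta_pen)))
```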

Older penalties (cont’d)
■ Let’s try to get a feel for what Ridge regression does.
■ When we started studying the LASSO estimator, which special case did we look at to get a feel for what the LASSO does?
■ Right, we looked at orthogonal design. Let’s do the same here. Assume that X_n^T X_n = I_{d×d}; then Equation (7) becomes

β̂_n^R = (I_{d×d} + λ I_{d×d})^{-1} X_n^T Y_n = (1/(1 + λ)) X_n^T Y_n.

■ Let’s compare this to what our ordinary OLS estimator β̂^LS would be if we had not added the term λ I_{d×d}. Then β̂^LS would be

β̂^LS = X_n^T Y_n.

■ So, in this case we see very clearly that Ridge regression shrinks the ordinary OLS estimator towards zero.

Older penalties (cont’d)
To conclude this section and to shed light on the third statement, let us compare the Ridge regression estimator with the LASSO for orthogonal design.
■ We can rewrite the Ridge regression estimator as

β̂_i^R = (1/(1 + λ)) β̂_LS,i.

■ The LASSO estimator can also be written in terms of β̂^LS as

β̂_i = β̂_LS,i − λ/2   if β̂_LS,i ≥ λ/2;
β̂_i = 0              if −λ/2 < β̂_LS,i < λ/2;
β̂_i = β̂_LS,i + λ/2   if β̂_LS,i ≤ −λ/2.

■ Bottom line:
◆ The Ridge regression estimator does not select variables (β̂_i^R is unequal to zero whenever β̂_LS,i is), whereas the LASSO does select variables.
◆ Hence, the LASSO overcomes the problem of having to estimate many regression coefficients with a possibly small n; Ridge fails to do so.
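A compact way to see both formulas at once is to generate an exactly orthonormal design (so X_n^T X_n = I_{d×d}) and apply the two shrinkage rules directly; this sketch uses assumed toy numbers and the no-(1/n) convention of (8):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, lam = 100, 10, 2.0

# Orthonormal design: the Q factor of a QR decomposition satisfies Q^T Q = I.
X, _ = np.linalg.qr(rng.standard_normal((n, d)))

beta = np.r_[3.0, -2.0, 0.5, np.zeros(d - 3)]
y = X @ beta + 0.5 * rng.standard_normal(n)

beta_ls = X.T @ y                              # OLS, since X^T X = I
beta_ridge = beta_ls / (1 + lam)               # Ridge: uniform shrinkage towards zero
beta_lasso = np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam / 2, 0.0)  # LASSO: soft thresholding

print("OLS  :", np.round(beta_ls, 2))
print("Ridge:", np.round(beta_ridge, 2))
print("LASSO:", np.round(beta_lasso, 2))
```

The two rules behave exactly as stated above: Ridge only rescales every OLS coefficient by 1/(1 + λ), while soft thresholding sets every coefficient with |β̂_LS,i| ≤ λ/2 exactly to zero.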
Correlations and penalties
We now turn to a generalization of the LASSO penalty known as the elastic net. To motivate it we collect a few more facts about the LASSO and the Ridge regression estimator:
■ If we have several strongly correlated regressors, the LASSO will tend to select only one (or maybe two) of them, simply because many regression coefficients are estimated as zero by the LASSO. Picking only one (or maybe two) from a group of strongly correlated regressors may seem a bit arbitrary.
■ When we have highly correlated variables, the estimates for the regression coefficients are somewhat arbitrary. For example, when X_1 and X_2 are strongly correlated and their ’joint effect’ is 2, say, it does not really make a difference whether you estimate β_1 and β_2 both as 1 or whether you estimate β_1 as 1.8 and β_2 as 0.2. Thus, we may view this too as arbitrary (on exercise sheet 4 you can convince yourself of this).
■ Ridge regression is known to overcome this arbitrariness by dividing the total effect equally among the two regressors (on exercise sheet 4 you can convince yourself of this property of Ridge regression).

Correlations and penalties
■ The elastic net estimator is the solution to

minimize w.r.t. β:   Σ_{i=1}^n ( y_i − Σ_{j=1}^d β_j x_{ij} )^2 + λ ( α Σ_{j=1}^d |β_j| + (1 − α) Σ_{j=1}^d β_j^2 ).

■ Here α ∈ [0, 1].
■ Clearly, for α = 0 we obtain the Ridge regression estimator, while for α = 1 we get the LASSO estimator.
■ For values of α between 0 and 1 we obtain a compromise between the LASSO and Ridge.
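A direct transcription of this objective (a sketch, assuming the convex-optimization package cvxpy is available; it is not software used in the course, and scikit-learn's ElasticNet would also work but rescales the penalty and calls the mixing parameter l1_ratio):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
n, d, lam, alpha = 100, 20, 5.0, 0.5

X = rng.standard_normal((n, d))
y = X @ np.r_[np.ones(3), np.zeros(d - 3)] + rng.standard_normal(n)

b = cp.Variable(d)
# Elastic net objective exactly as on the slide: squared loss + lam * (alpha * l1 + (1 - alpha) * squared l2)
penalty = alpha * cp.norm1(b) + (1 - alpha) * cp.sum_squares(b)
cp.Problem(cp.Minimize(cp.sum_squares(y - X @ b) + lam * penalty)).solve()

print(np.round(b.value, 3))
```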

