Ridge and Lasso in Python
Want to follow along on your own machine? Download the .py (lab10.py) or Jupyter Notebook (Lab 10 - Ridge Regression
and the Lasso in Python.ipynb) version.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
We will use the sklearn package in order to perform ridge regression and the lasso. The main functions in this package that
we care about are Ridge(), which can be used to fit ridge regression models, and Lasso(), which will fit lasso models. They
also have cross-validated counterparts: RidgeCV() and LassoCV(). We'll use these a bit later.
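All of these estimators live in sklearn.linear_model. A reasonable set of imports for the rest of this lab (exactly which helpers you need depends on the cells you run) is shown below. Note that the normalize = True argument used in some cells was removed in newer scikit-learn releases, where you would standardize the features yourself instead (e.g. with scale()).
from sklearn.linear_model import Ridge, RidgeCV, Lasso, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale
from sklearn.metrics import mean_squared_error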
Before proceeding, let's first ensure that the missing values have been removed from the data, as described in the previous
lab.
df = pd.read_csv('Hitters.csv').dropna().drop('Player', axis = 1)
df.info()
dummies = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
We will now perform ridge regression and the lasso in order to predict Salary on the Hitters data. Let's set up our data:
y = df.Salary

# Drop the column with the dependent variable (Salary), and columns for which we created dummy variables
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis = 1).astype('float64')

# Define the feature set X, adding back one dummy column per categorical variable
X = pd.concat([X_, dummies[['League_N', 'Division_W', 'NewLeague_N']]], axis = 1)

X.info()
Associated with each alpha value is a vector of ridge regression coefficients, which we'll store in a matrix coefs. In this case,
it will be a 100 × 19 matrix, with 100 rows (one for each value of alpha) and 19 columns (one for each predictor). Remember that
we'll want to standardize the variables so that they are on the same scale. To do this, we can use the normalize = True
parameter:
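The code below is a minimal sketch of that setup. The exact grid of alpha values is an illustrative choice, not something fixed by the lab, but it should span several orders of magnitude so that the plot which follows is informative.
# 100 candidate values of alpha on a log scale (the exact range is an illustrative choice)
alphas = 10**np.linspace(10, -2, 100)*0.5
ridge = Ridge(normalize = True)
coefs = []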
for a in alphas:
    ridge.set_params(alpha = a)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)

np.shape(coefs)
We expect the coefficient estimates to be much smaller, in terms of l2 norm, when a large value of alpha is used, as
compared to when a small value of alpha is used. Let's plot and find out:
ax = plt.gca()
ax.plot(alphas, coefs)
ax.set_xscale('log')
plt.axis('tight')
plt.xlabel('alpha')
plt.ylabel('weights')
We now split the samples into a training set and a test set in order to estimate the test error of ridge regression and the lasso:
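A sketch of one way to do this with train_test_split(); the 50/50 split and the random seed are illustrative choices, not fixed by the lab:
# Hold out half the observations as a test set (split fraction and seed are arbitrary here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.5, random_state = 1)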
Next we fit a ridge regression model on the training set, and evaluate its MSE on the test set, using λ = 4 :
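A sketch of that fit (the names ridge2 and pred2 are just illustrative):
ridge2 = Ridge(alpha = 4, normalize = True)
ridge2.fit(X_train, y_train)            # fit on the training set
pred2 = ridge2.predict(X_test)          # predict on the test set
mean_squared_error(y_test, pred2)       # test MSE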
The test MSE when alpha = 4 is 106216. Now let's see what happens if we use a huge value of alpha, say 10^10:
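For example (again, the object names are illustrative):
ridge3 = Ridge(alpha = 10**10, normalize = True)
ridge3.fit(X_train, y_train)
pred3 = ridge3.predict(X_test)
mean_squared_error(y_test, pred3)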
This big penalty shrinks the coefficients to a very large degree, essentially reducing the fit to a model containing just the
intercept. This over-shrinking makes the model more biased, resulting in a higher test MSE.
Okay, so fitting a ridge regression model with alpha = 4 leads to a much lower test MSE than fitting a model with just an
intercept. We now check whether there is any benefit to performing ridge regression with alpha = 4 instead of just performing
least squares regression. Recall that least squares is simply ridge regression with alpha = 0.
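One way to check, sketched here by reusing the illustrative ridge2 object from above, is to turn the penalty off and compare test MSEs:
ridge2.set_params(alpha = 0)            # alpha = 0 gives ordinary least squares
ridge2.fit(X_train, y_train)
mean_squared_error(y_test, ridge2.predict(X_test))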
Instead of arbitrarily choosing alpha = 4 , it would be better to use cross-validation to choose the tuning parameter alpha. We
can do this using the cross-validated ridge regression function, RidgeCV(). By default, the function performs generalized
cross-validation (an efficient form of LOOCV), though this can be changed using the argument cv.
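A sketch of the cross-validated fit, where alphas is the grid defined earlier and the scoring choice and object name are assumptions:
ridgecv = RidgeCV(alphas = alphas, scoring = 'neg_mean_squared_error', normalize = True)
ridgecv.fit(X_train, y_train)
ridgecv.alpha_                          # value of alpha with the smallest cross-validation error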
Therefore, we see that the value of alpha that results in the smallest cross-validation error is 0.57. What is the test MSE
associated with this value of alpha?
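A sketch of that calculation, which also defines the ridge4 estimator that the next cell refits on the full data:
ridge4 = Ridge(alpha = ridgecv.alpha_, normalize = True)
ridge4.fit(X_train, y_train)
mean_squared_error(y_test, ridge4.predict(X_test))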
This represents a further improvement over the test MSE that we got using alpha = 4 . Finally, we refit our ridge regression
model on the full data set, using the value of alpha chosen by cross-validation, and examine the coefficient estimates.
ridge4.fit(X, y)
pd.Series(ridge4.coef_, index = X.columns)
As expected, none of the coefficients are exactly zero - ridge regression does not perform variable selection!
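We now ask whether the lasso can do as well. The coefficient-path loop below needs a Lasso estimator and an empty list of coefficients; a minimal sketch (the max_iter value is an assumption to help the solver converge):
lasso = Lasso(max_iter = 10000)         # max_iter raised to help convergence (an assumption)
coefs = []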
for a in alphas:
    lasso.set_params(alpha=a)
    lasso.fit(scale(X_train), y_train)
    coefs.append(lasso.coef_)
ax = plt.gca()
ax.plot(alphas*2, coefs)
ax.set_xscale('log')
plt.axis('tight')
plt.xlabel('alpha')
plt.ylabel('weights')
Notice in the coefficient plot that, depending on the choice of tuning parameter, some of the coefficients are exactly equal
to zero. We now perform 10-fold cross-validation to choose the best alpha, refit the model, and compute the associated test
error:
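A sketch of the cross-validation step: cv = 10 follows the text, alphas = None lets LassoCV choose its own grid, and max_iter is again an assumption:
lassocv = LassoCV(alphas = None, cv = 10, max_iter = 100000)
lassocv.fit(X_train, y_train)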
lasso.set_params(alpha=lassocv.alpha_)
lasso.fit(X_train, y_train)
mean_squared_error(y_test, lasso.predict(X_test))
This is substantially lower than the test set MSE of the null model and of least squares, and only a little worse than the test
MSE of ridge regression with alpha chosen by cross-validation.
However, the lasso has a substantial advantage over ridge regression in that the resulting coefficient estimates are sparse.
Here we see that 13 of the 19 coefficient estimates are exactly zero:
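One way to inspect the fitted coefficients:
# Coefficients set exactly to zero were dropped by the lasso
pd.Series(lasso.coef_, index = X.columns)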
Your turn!
Now it's time to test out these approaches (ridge regression and the lasso) and evaluation methods (validation set, cross-
validation) on other datasets. You may want to work with a team on this portion of the lab. You may use any of the datasets
included in ISLR, or choose one from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html).
Download a dataset, and try to determine the optimal set of parameters for modeling it! You are free to use the same dataset
you used in Lab 9, or you can choose a new one.