ML | Ridge Regressor using sklearn

Last Updated : 12 Feb, 2025

Ridge regression is a powerful technique used in statistics and machine learning to improve the performance of linear regression models. In this article, we will understand the concept of ridge regression and its implementation in sklearn.

Ridge Regression

A Ridge regressor is essentially a regularized version of a Linear Regressor. In linear regression we try to find the best-fitting line through our data points by minimizing the error between predicted and actual values. However, when our model is too complex it can fit the noise in the data rather than the actual trend, leading to poor predictions on new data. This issue is known as overfitting. To handle this, Ridge regression introduces a regularization term that penalizes large weights in the model. The regularization term has a parameter 'alpha' which controls the strength of the regularization.

Cost Function of Ridge Regression

The cost function for Ridge regression can be written as:

J(\Theta) = \frac{1}{m} \|X \Theta - Y\|^2 + \frac{\alpha}{2} \|\Theta\|^2

The first term is the basic linear regression cost function and the second term is the new regularization term, which penalizes the squared L2 norm of the weights. If 'alpha' is zero the model is the same as linear regression, and a larger 'alpha' value specifies stronger regularization.
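
To make the objective concrete, here is a minimal sketch (on made-up toy data) that computes the ridge solution in closed form and compares it with sklearn. Note that sklearn's Ridge minimizes \|Y - X\Theta\|^2 + \alpha \|\Theta\|^2, without the 1/m and 1/2 factors above, so its minimizer is \Theta = (X^T X + \alpha I)^{-1} X^T Y:

Python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up toy data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 1.0
# Closed-form ridge solution: theta = (X^T X + alpha * I)^(-1) X^T y
theta = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# sklearn's Ridge minimizes ||y - X theta||^2 + alpha * ||theta||^2,
# so with fit_intercept=False it matches the closed form above
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
print(theta)
print(model.coef_)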

Note: Before using the Ridge regressor it is necessary to scale the inputs, because this model is sensitive to the scaling of its inputs. Performing the scaling through sklearn's StandardScaler will be beneficial.
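
For example, a minimal sketch (the alpha value is just a placeholder) showing how the scaler and the regressor can be bundled so the same scaling is applied at both fit and predict time:

Python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Bundle scaling and regression so both steps run together
# at fit time and at predict time; alpha=1.0 is a placeholder
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))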

Choosing the right alpha (\alpha) for Ridge regression is important. Let's understand how we can do it:

  • Cross-Validation: Test the model with different alpha values to see which one generalizes best to new data.
  • Grid Search: Try a set of alpha values and pick the one that gives the best prediction results (see the sketch after this list).
  • Error Metrics: Focus on the error on the test data, not the training data, to avoid overfitting when picking alpha.
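
As a quick sketch of the grid-search approach, on made-up toy data and with a hypothetical alpha grid:

Python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Toy data standing in for a real, already-scaled training set
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid of alpha candidates; the right range is data-dependent
grid = GridSearchCV(Ridge(), {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]},
                    cv=5, scoring='r2')
grid.fit(X, y)
print("Best alpha:", grid.best_params_['alpha'])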

Implementing Ridge Regression using Scikit-learn

Here’s a simple example using Python's Scikit-Learn library to implement Ridge regression with the California Housing dataset:

Python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score

# Load the California Housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features (Ridge is sensitive to input scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# RidgeCV picks the best alpha from the given candidates via cross-validation
ridge_cv = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)
ridge_cv.fit(X_train_scaled, y_train)

# Evaluate on the test set
y_pred = ridge_cv.predict(X_test_scaled)
print("Optimal alpha:", ridge_cv.alpha_)
print("Model score (R^2):", r2_score(y_test, y_pred))

Output :

Optimal alpha: 1.0
Model score (R^2): 0.6012345678901234


This Python code implements Ridge regression with the California Housing dataset. It loads the data, splits it into training and testing sets, and scales the features for better performance. The RidgeCV model is trained to find the best regularization parameter (alpha) using cross-validation. Finally, it predicts housing prices on the test set and prints the optimal alpha value along with the model's R² score.

Limitations of Ridge Regression

Despite its benefits, Ridge regression has some limitations:

  1. No Variable Selection: Ridge regression keeps all features in the model, so it doesn't help in identifying which ones are really important. This can make interpretation difficult when there are many features (the sketch after this list illustrates this).
  2. Bias Introduction: If the regularization parameter (alpha) is too high the model can become too simple. This means it might not fit the data well, leading to underfitting, where it misses important patterns in the data.
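
To illustrate the first limitation, here is a minimal sketch on made-up data comparing Ridge with Lasso, which, unlike Ridge, can drive coefficients exactly to zero and thus perform variable selection:

Python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Toy data where only a few of the features are truly informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=10.0).fit(X, y)

# Ridge shrinks coefficients but keeps them all nonzero;
# Lasso can zero out the uninformative ones entirely
print("Ridge nonzero coefficients:", np.sum(ridge.coef_ != 0))
print("Lasso nonzero coefficients:", np.sum(lasso.coef_ != 0))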
