Normal and Shrinkage Linear Discriminant Analysis for Classification in Scikit Learn
In this article, we will look at the difference between normal and shrinkage Linear Discriminant Analysis for classification, and implement both using the scikit-learn library in Python. But first, let's understand what LDA is.
What is Linear discriminant analysis (LDA)?
Linear discriminant analysis (LDA) is a supervised learning algorithm that projects the data onto a lower-dimensional space and separates the classes using a linear decision boundary. LDA is commonly used for classification tasks, where the goal is to predict the class label of a sample based on its features.
In LDA, the projection onto a lower-dimensional space is performed by finding a set of directions in the original feature space that maximizes the separation between the classes. These directions, known as the discriminant directions, are calculated using the class means and covariances.
Once the discriminant directions are found, the data is projected onto the space spanned by these directions and a linear decision boundary is constructed to separate the classes. The decision boundary is a hyperplane that is orthogonal to the discriminant directions and maximally separates the classes.
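To make the projection idea concrete, here is a minimal sketch on synthetic data (made up for illustration) that projects three classes onto the at most n_classes - 1 discriminant directions using fit_transform():
Python3
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
# Three classes of 50 samples each, shifted apart along every feature
y = np.repeat(np.arange(3), 50)
X = rng.randn(150, 5) + y[:, None]

# With 3 classes, LDA can find at most 3 - 1 = 2 discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, y)
print(X_proj.shape)  # (150, 2)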
The LinearDiscriminantAnalysis class in scikit-learn has two main modes of operation: normal and shrinkage. In normal mode, LDA assumes that the class covariance matrices are equal and estimates them with the pooled sample covariance of the training data. In shrinkage mode, LDA uses a shrinkage estimator to regularize the covariance matrix and improve the stability of the model.
Performing linear discriminant analysis (LDA) for classification in scikit-learn involves the following steps:
- Import the LinearDiscriminantAnalysis class from sklearn.discriminant_analysis module.
- Generate or load the data for the classification task. The data should be a 2D array of feature values and a 1D array of class labels.
- Split the data into training and test sets using the train_test_split() function from the sklearn.model_selection module.
- Create an instance of the LinearDiscriminantAnalysis class and specify any desired hyperparameters, such as the solver (the solver parameter) and the shrinkage value (the shrinkage parameter).
- Fit the LinearDiscriminantAnalysis estimator to the training data using the fit() method.
- Use the estimator to make predictions on the test set using the predict() method.
- Evaluate the performance of the model by calculating metrics such as classification accuracy or the confusion matrix.
Here is a complete code of how to use the LinearDiscriminantAnalysis class to perform LDA for classification in scikit-learn:
Python3
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Generate random data for a binary classification task
X = np.random.randn(100, 10)
y = np.random.randint(2, size=100)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit a normal-mode LDA estimator and evaluate it on the test set
estimator = LinearDiscriminantAnalysis(shrinkage=None)
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)
print(estimator.score(X_test, y_test))
Output:
0.5
This code fits a LinearDiscriminantAnalysis estimator to the training data in normal mode and uses it to make predictions on the test set. The classification accuracy is then printed with the estimator's score() method; since the features here are pure noise, it hovers around chance level (0.5).
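Step 7 above also mentioned the confusion matrix. As a short sketch, the same predictions can be evaluated with accuracy_score() and confusion_matrix() from sklearn.metrics, reusing y_test and y_pred from the code above:
Python3
from sklearn.metrics import accuracy_score, confusion_matrix

# Both reuse y_test and y_pred from the previous example
print(accuracy_score(y_test, y_pred))    # same value as estimator.score()
print(confusion_matrix(y_test, y_pred))  # rows: true labels, columns: predictions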
Shrinkage Linear Discriminant Analysis:
Shrinkage linear discriminant analysis (LDA) is a variant of LDA that uses a shrinkage estimator to regularize the covariance matrices of the classes. In normal LDA, the covariance matrices are estimated using the sample covariance of the training data, which can be unstable and lead to overfitting, especially when the number of features is large relative to the number of samples. Shrinkage LDA addresses this issue by using a shrinkage estimator, such as the Ledoit-Wolf estimator, to regularize the covariance matrices and improve their stability.
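To illustrate what shrinkage does, here is a sketch that uses sklearn.covariance directly (separate from LDA itself): the Ledoit-Wolf estimator blends the sample covariance with a scaled identity matrix, which noticeably improves its conditioning when samples are scarce.
Python3
import numpy as np
from sklearn.covariance import LedoitWolf, empirical_covariance

rng = np.random.RandomState(0)
X = rng.randn(30, 20)  # few samples relative to the number of features

emp = empirical_covariance(X)   # plain sample covariance
lw = LedoitWolf().fit(X)        # shrunk covariance estimate

print(lw.shrinkage_)                   # shrinkage intensity chosen by the Ledoit-Wolf formula
print(np.linalg.cond(emp))             # ill-conditioned sample covariance
print(np.linalg.cond(lw.covariance_))  # better-conditioned shrunk estimate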
To use shrinkage mode, set the solver parameter of the LinearDiscriminantAnalysis estimator to 'lsqr' or 'eigen' (the default 'svd' solver does not support shrinkage) and specify a value for the shrinkage parameter, either a float between 0 and 1 or 'auto'. For example:
# Create a LinearDiscriminantAnalysis estimator
# with shrinkage and fit it to the training data
estimator = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.5)
estimator.fit(X_train, y_train)
This code creates a LinearDiscriminantAnalysis estimator that uses shrinkage mode with a shrinkage value of 0.5 and fits it to the training data. Here is a complete example using automatic shrinkage:
Python3
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Generate random data for a binary classification task
X = np.random.randn(100, 10)
y = np.random.randint(2, size=100)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit an LDA estimator with automatic (Ledoit-Wolf) shrinkage
estimator = LinearDiscriminantAnalysis(solver='eigen', shrinkage='auto')
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)
print(estimator.score(X_test, y_test))
Output:
0.43333333333333335
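On the random data used above, accuracy stays near chance regardless of shrinkage. Shrinkage tends to pay off when there are many features and few samples. The following comparison sketch uses synthetic data (the 0.6 class shift is an arbitrary choice for illustration); on such data the shrunk estimator often scores higher than plain LDA, though results vary from run to run:
Python3
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
n_samples, n_features = 60, 40
y = rng.randint(2, size=n_samples)
# A weak class signal spread across all features
X = rng.randn(n_samples, n_features) + 0.6 * y[:, None]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Compare no shrinkage against automatic (Ledoit-Wolf) shrinkage
for shrinkage in [None, 'auto']:
    clf = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=shrinkage)
    clf.fit(X_train, y_train)
    print(shrinkage, clf.score(X_test, y_test))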