Blind source separation using FastICA in Scikit Learn
Last Updated: 10 Feb, 2025
FastICA is the most popular and one of the fastest algorithms for performing Independent Component Analysis. It can be used to separate the individual signals that make up a mixed signal.
Independent Component Analysis (ICA) is a method that searches for mutually independent, non-Gaussian latent variables; the components of the observed multivariate data are assumed to be linear combinations of these variables.
FastICA comes in two variants (a minimal usage sketch follows this list):
- Deflation-based FastICA, where the components are extracted one by one.
- Symmetric FastICA, where the components are estimated simultaneously.
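In scikit-learn, these two variants are selected through the algorithm parameter of FastICA ("deflation" and "parallel" respectively). A minimal sketch, using made-up non-Gaussian toy data:
Python
# Choosing the FastICA variant in scikit-learn: "deflation" extracts
# components one by one, "parallel" (symmetric) estimates them jointly.
import numpy as np
from sklearn.decomposition import FastICA

X = np.random.default_rng(0).standard_t(5, size=(1000, 3))  # toy non-Gaussian data

ica_deflation = FastICA(n_components=3, algorithm="deflation",
                        whiten="unit-variance", random_state=0)
ica_symmetric = FastICA(n_components=3, algorithm="parallel",
                        whiten="unit-variance", random_state=0)

S_deflation = ica_deflation.fit_transform(X)  # components found one by one
S_symmetric = ica_symmetric.fit_transform(X)  # components found simultaneously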
FastICA has two prominent features: it can use any nonlinearity (contrast) function, and the deflation-based version optimizes the order in which components are extracted.
Mathematical Approach
FastICA can be used to separate all the individual signals from a mixed signal. Let us assume a set of individual source signals [Tex]f(s) [/Tex] where [Tex]f(s) = (f_1(s),…,f_n(s))^{T} [/Tex]. These signals are mixed using a matrix [Tex]G = [g_{ij}] ∈ R^{m×n} [/Tex], which produces a mixed signal [Tex]x(s) = (x_1(s),……,x_m(s))^{T} [/Tex].
- Generally [Tex]n=m [/Tex].
- If [Tex]m>n [/Tex], then the system of equations is overdetermined and a conventional linear method such as least squares can unmix the signals (sketched after this list).
- If [Tex]n>m [/Tex], then the system is underdetermined and a non-linear method must be used to recover the sources.
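For illustration, a minimal NumPy sketch of the overdetermined case, where a least-squares solve recovers the sources (the mixing matrix G and the source signals are invented for this example):
Python
# Overdetermined case (m > n): recover sources by least squares.
import numpy as np

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 500))    # n = 2 non-Gaussian source signals
G = rng.normal(size=(4, 2))       # mixing matrix: m = 4 observations
x = G @ s                         # mixed signals, shape (4, 500)

# Least-squares unmixing: solves min ||G y - x||^2 for y.
y, *_ = np.linalg.lstsq(G, x, rcond=None)
print(np.allclose(y, s))          # True in this noise-free example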
The FastICA algorithm is then used to unmix these signals, since several multidimensional signals may be present in the mixture. Blind source separation can effectively unmix them with the help of the FastICA algorithm by determining an un-mixing matrix [Tex]U = [U_{ij}] ∈ R^{n×m}. [/Tex] It then recovers an approximation of the original signals,
[Tex]y(s) = (y_1(s),…,y_n(s))^{T} \\ y(s) = U\cdot x(s) [/Tex]
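When the mixing matrix is square and known, the un-mixing matrix is simply its inverse; here is a toy NumPy check of the relation above (all values invented; in practice G is unknown, which is exactly what FastICA estimates):
Python
# Square case (n = m): the un-mixing matrix U is the inverse of G.
import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1, 1, size=(3, 1000))  # three source signals
G = rng.normal(size=(3, 3))             # known square mixing matrix
x = G @ s                               # mixed signals

U = np.linalg.inv(G)                    # un-mixing matrix
y = U @ x                               # recovered signals
print(np.allclose(y, s))                # True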
We will see the steps of the FastICA algorithm later in this article.
Applications of FastICA
FastICA is used in various real-life settings. Given a sound wave mixed from multiple independent sources, FastICA can recover each individual signal wave along with its source. For example, if a room full of people is recorded by a number of microphones, FastICA can separate each speaker's voice from the mixed sound waves captured by the microphones.
Blind Source Separation
When both the sources of the signals and the mixing methodology are completely unknown, the blind source separation technique comes into play. As the name suggests, it separates signals 'blindly': almost nothing is known about the mixed signals, yet we need to separate them. In real life there are many situations where a mixed signal must be decomposed into its individual signals, and the mixture is also likely to contain noise. Blind source separation handles exactly these situations, where separation is required but very little is known about the input signal.
It is used in many domains, such as audio source separation (speech separation and the music industry), medical tests (electroencephalography, Magnetic Resonance Imaging (MRI), electrocardiography, etc.), and communication applications.
FastICA Algorithm
FastICA consists of three main steps:
1. Pre-whitening the data:
If the input data matrix is [Tex]X = (x_{ij})\; \epsilon \; R^{N\times M} [/Tex], it should be centered and whitened before the FastICA algorithm is applied to it.
- Centering: Centering the data means subtracting the mean from each variable, which makes the expected value of each row equal to 0. If [Tex]X_{i} [/Tex] denotes the [Tex]i^{th} [/Tex] row with [Tex]M [/Tex] variables, then each element is updated as [Tex]x_{ij} = x_{ij} – \frac{1}{M}\sum_{j}x_{ij} [/Tex], where [Tex]i \;\epsilon \;(1,2,…,N) [/Tex] and [Tex]j \; \epsilon \; (1,2,…,M) [/Tex].
- Whitening: In whitening, we linearly transform the centered data so that its components become uncorrelated and each has variance equal to 1. This is done via the eigenvalue decomposition of the covariance matrix of the centered data X, i.e. [Tex]E\left \{ XX^{T}\right \} = EDE^{T} [/Tex], where E is the eigenvector matrix and D is the diagonal matrix of eigenvalues (a sketch of both steps follows this list).
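Here is a minimal NumPy sketch of the two pre-processing steps, assuming rows are variables and columns are samples (the data matrix is random, for illustration only):
Python
# Pre-whitening: center each row, then decorrelate via eigendecomposition.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 1000)) * np.array([[1.0], [3.0], [0.5]])

# Centering: subtract each row's mean so every row has mean 0.
Xc = X - X.mean(axis=1, keepdims=True)

# Whitening: eigendecompose the covariance E{XX^T} = E D E^T ...
cov = Xc @ Xc.T / Xc.shape[1]
D, E = np.linalg.eigh(cov)

# ... then rescale so the covariance of the result is the identity.
Xw = np.diag(D ** -0.5) @ E.T @ Xc
print(np.allclose(Xw @ Xw.T / Xw.shape[1], np.eye(3)))  # True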
2. For a single unit:
Next, an iterative algorithm finds the direction of the weight vector that maximizes a measure of non-Gaussianity of the projection of the pre-whitened data matrix. It initializes the weight vector randomly, then averages over all column vectors of the matrix X. If the weight vector w has converged, it is returned; otherwise the averaging step is repeated. To measure non-Gaussianity, FastICA uses a nonlinear, nonquadratic function f(q) together with its first derivative g(q) and second derivative g'(q).
Non-Gaussianity: non-Gaussianity measures how far data departs from a normal distribution, and it is commonly quantified by kurtosis. For Gaussian variables the (excess) kurtosis is zero, and its magnitude grows as the distribution becomes more non-Gaussian. If the mean or the noise of a dataset varies with time, the variables become non-stationary (their values change over time), which can introduce errors; maximizing non-Gaussianity is the principle FastICA uses to correct for this and separate the components.
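As a quick illustration, excess kurtosis is near zero for Gaussian samples and clearly positive for a heavy-tailed, non-Gaussian distribution such as the Laplace; a small sketch using scipy.stats:
Python
# Excess kurtosis as a non-Gaussianity measure: ~0 for Gaussian data,
# clearly positive for heavy-tailed (e.g. Laplace) data.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
print(kurtosis(rng.normal(size=100_000)))   # close to 0
print(kurtosis(rng.laplace(size=100_000)))  # close to 3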
For general-purpose use, common choices of the function are:
[Tex]f(q) = \log\cosh(q) \\ f'(q) = \tanh(q) \\ f''(q) = 1-\tanh^2(q)[/Tex]
or
[Tex]f(q) = -e^{\frac{-q^2}{2}} \\ f'(q) = qe^{\frac{-q^2}{2}} \\ f''(q) = (1-q^2)e^{\frac{-q^2}{2}}[/Tex]
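These correspond to the "logcosh" and "exp" contrast functions that scikit-learn's FastICA exposes through its fun parameter. A small NumPy sketch of the first one and its derivatives:
Python
# The log-cosh contrast function and its first two derivatives,
# as used by FastICA's non-Gaussianity measure (fun="logcosh").
import numpy as np

def f(q):
    return np.log(np.cosh(q))

def g(q):        # first derivative f'(q)
    return np.tanh(q)

def g_prime(q):  # second derivative f''(q)
    return 1.0 - np.tanh(q) ** 2

q = np.linspace(-3, 3, 7)
print(g(q))
print(g_prime(q))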
3. For several units:
The single-unit iterative algorithm estimates only one weight vector, which extracts a single component. Estimating additional, mutually independent components (i.e. components whose values are unaffected by the other components) requires repeating the algorithm to obtain linearly independent projection vectors. The algorithm takes the desired number of components and the pre-whitened matrix as input, and outputs the un-mixing matrix (each column projects onto one independent component) and the matrix of independent components. Pseudocode for this algorithm is given below:
for p in 1 to C:
    [Tex]w_p\leftarrow [/Tex] random vector of length N
    while [Tex]w_p [/Tex] changes:
        [Tex]w_{p}\leftarrow\frac{1}{M}\ Xg(w_p^{T}X)^{T}\ – \frac{1}{M}\ g'(w_p^{T}X){1_M}{w_p}[/Tex]
        [Tex]w_p\leftarrow\ w_p – \sum_{j=1}^{p-1}\ (w_p^{T}w_j)w_j [/Tex]
        [Tex]w_p\leftarrow\frac{w_p}{||w_p||}[/Tex]
output [Tex]W\leftarrow\ [w_1,….,w_C][/Tex]
output [Tex]S\leftarrow\ W^{T}X[/Tex]
Note: here C = number of desired components, X = pre-whitened data matrix whose M columns are samples, W = un-mixing matrix, and S = matrix of independent components.
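For concreteness, here is a hedged NumPy translation of the pseudocode above; it assumes X is already pre-whitened, with rows as components and columns as samples, and is a sketch rather than scikit-learn's production implementation:
Python
# Deflationary FastICA sketch: estimates C weight vectors one by one.
import numpy as np

def fastica_deflation(X, C, max_iter=200, tol=1e-6, seed=0):
    """X: pre-whitened data, shape (N, M). Returns W (N, C) and S (C, M)."""
    N, M = X.shape
    rng = np.random.default_rng(seed)
    W = np.zeros((N, C))
    g = np.tanh                                # contrast derivative f'
    g_prime = lambda q: 1.0 - np.tanh(q) ** 2  # second derivative f''
    for p in range(C):
        w = rng.normal(size=N)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            # One-unit update: E{X g(w^T X)} - E{g'(w^T X)} w
            w_new = (X @ g(w @ X)) / M - g_prime(w @ X).mean() * w
            # Decorrelate against previously found components (Gram-Schmidt).
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol  # equal up to sign
            w = w_new
            if converged:
                break
        W[:, p] = w
    return W, W.T @ X  # un-mixing matrix and recovered components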
Now we will walk through an example showing how FastICA can be used for blind source separation, step by step.
Independent Component Analysis (ICA) is used to estimate sound sources effectively from noisy measurements. Let us assume three different musical instruments are playing simultaneously and three microphones record the mixed signals. ICA is used here to recover the sources, i.e. which part of the signal is played by which instrument. We will also show that PCA fails to recover the instruments, since the underlying signals reflect non-Gaussian processes. The implementation proceeds step by step:
Importing Python libraries and generating sample data
Using Python libraries such as NumPy, SciPy, Matplotlib, and scikit-learn, we can perform these computations easily and efficiently. We then generate three types of signals: sinusoidal, square, and saw-tooth.
Python
# importing libraries
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
from sklearn.decomposition import FastICA, PCA
# generating sample data
np.random.seed(20)
numberOfSamples = 10000
timeRange = np.linspace(0, 8, numberOfSamples)
# Sinusoidal signal
signal1 = np.sin(2 * timeRange)
# Square signal
signal2 = np.sign(np.sin(3 * timeRange))
# Saw-tooth signal
signal3 = signal.sawtooth(2 * np.pi * timeRange)
signalSummation = np.c_[signal1, signal2, signal3]
# Adding some noise
signalSummation += 0.4 * np.random.normal(size=signalSummation.shape)
# Standardization of data
signalSummation /= signalSummation.std(axis=0)
# Mixing the data
mixMatrix = np.array([[1, 1, 1], [0.8, 2, 1.2], [1.6, 1.2, 2.4]])
# Generate observations
obsvGenerate = np.dot(signalSummation, mixMatrix.T)
Computing ICA
Now we will fit the ICA model using FastICA and, as mentioned earlier, also fit a PCA model for comparison.
Python
# Fitting ICA and PCA models
# Compute ICA
ica = FastICA(n_components=3, whiten="arbitrary-variance")
signalRecont = ica.fit_transform(obsvGenerate) # Reconstruct signals
mixMatrixEst = ica.mixing_ # Get estimated mixing matrix
assert np.allclose(obsvGenerate, np.dot(
signalRecont, mixMatrixEst.T) + ica.mean_)
# compute PCA
pca = PCA(n_components=3)
# Reconstruct signals based on orthogonal components
orthosignalrecont = pca.fit_transform(obsvGenerate)
Visualizing and comparing the results graphically
Now we will plot the graph with the values obtained, which shows how effectively ICA performs blind source separation of the signals while PCA fails to do so.
Python
# Plot results
plt.figure()
models = [obsvGenerate, signalSummation, signalRecont, orthosignalrecont]
names = ["Observations (mixed signal)",
"True Sources", "ICA recovered signals",
"PCA recovered signals",
]
colors = ["yellow", "green", "cyan"]
for ii, (model, name) in enumerate(zip(models, names), 1):
plt.subplot(4, 1, ii)
plt.title(name)
for sig, color in zip(model.T, colors):
plt.plot(sig, color=color)
plt.tight_layout()
plt.show()
Output:
[Figure: Blind source separation: stacked plots of the mixed observations, true sources, ICA recovered signals, and PCA recovered signals]
From the graph it is clear that ICA separates all three signals effectively, while PCA cannot.