Cross-correlation Analysis in Python
Last Updated: 17 May, 2024
Cross-correlation analysis is a powerful technique in signal processing and time series analysis used to measure the similarity between two series at different time lags. It reveals how one series (reference) is correlated with the other (target) when shifted by a specific amount. This information is valuable in various domains, including finance (identifying stock market correlations), neuroscience (analyzing brain activity), and engineering (evaluating system responses).
In this article, we'll explore four methods for performing cross-correlation analysis in Python, providing clear explanations and illustrative examples.
Understanding Cross-correlation
Cross-correlation measures the similarity between two sequences as a function of the displacement of one relative to the other. It is denoted by $R_{XY}(\tau)$, where $\tau$ is the time (or spatial) lag between the two datasets. Key characteristics of cross-correlation analysis:
- Time series data: it is typically applied to data collected over time, such as stock prices, temperature readings, or sound waves.
- Compares similarity at different lags: by shifting one series relative to the other (much like sliding one comb past another), it measures how well the two line up at each lag.
- Ranges from -1 to 1 (when normalized): a value of 1 means the series match perfectly (like perfectly aligned combs), 0 means no linear correlation, and -1 means they are exact opposites (like combs shifted so that the teeth of one line up with the gaps of the other).
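As a worked definition, one common way to write the normalized cross-correlation at lag $\tau$ for two series $x$ and $y$ of length $N$ is shown below. Conventions differ on the direction of the shift and on the normalization, so treat this as one choice among several. At $\tau = 0$ it reduces to the Pearson correlation coefficient, which is the quantity each of the four methods in this article computes.

$$R_{XY}(\tau) = \frac{\sum_{t} (x_{t+\tau} - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{N} (x_t - \bar{x})^2 \; \sum_{t=1}^{N} (y_t - \bar{y})^2}}$$

Here $\bar{x}$ and $\bar{y}$ are the sample means, and the numerator runs over every $t$ for which both $x_{t+\tau}$ and $y_t$ exist. With this convention, a peak at a positive $\tau$ means that $x$ follows $y$ by $\tau$ samples.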
Implementation of Cross-correlation Analysis in Python
There are four main ways to compute the correlation between two signals in Python. Each example below computes the correlation coefficient at zero lag (no shift between the signals):
- Manual Python function: using basic Python functions and loops.
- NumPy: using NumPy's fast vectorized operations (np.corrcoef).
- SciPy: using SciPy's statistics module (scipy.stats.pearsonr).
- Statsmodels: using Statsmodels' regression tools (OLS).
Method 1. Cross-correlation Analysis Using Python
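To demonstrate the implementation, let's generate a dataset of two time series signals, signal1 and signal2, each built from a combination of sine and cosine functions with added noise. This dataset simulates real-world scenarios in which signals often exhibit complex patterns and noise. Both signals share the same sine and cosine components and differ only in their random noise, so they should be strongly correlated.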
The code defines two functions: mean, which computes the average of a list, and cross_correlation, which takes two signals x and y and computes their correlation coefficient at zero lag:
- mean(x) and mean(y): calculate the mean of each signal.
- sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)): calculates the numerator of the formula by summing the products of the deviations of corresponding elements of x and y from their means.
- x_sq_diff and y_sq_diff: the sums of squared deviations of each signal from its mean.
- math.sqrt(x_sq_diff * y_sq_diff): calculates the denominator by taking the square root of the product of the two sums of squared deviations.
Python
import math
import random
# Time axis: 100 samples spaced 0.1 apart
t = [i * 0.1 for i in range(100)]
# Two signals with the same sine/cosine components, each with independent Gaussian noise
signal1 = [math.sin(2 * math.pi * 2 * i) + 0.5 * math.cos(2 * math.pi * 3 * i) + random.normalvariate(0, 0.1) for i in t]
signal2 = [math.sin(2 * math.pi * 2 * i) + 0.5 * math.cos(2 * math.pi * 3 * i) + random.normalvariate(0, 0.1) for i in t]
# Calculate the mean of a sequence
def mean(arr):
    return sum(arr) / len(arr)
# Calculate the normalized correlation coefficient at zero lag (Pearson r)
def cross_correlation(x, y):
    # Calculate means
    x_mean = mean(x)
    y_mean = mean(y)
    # Numerator: sum of products of deviations from the means
    numerator = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations
    x_sq_diff = sum((a - x_mean) ** 2 for a in x)
    y_sq_diff = sum((b - y_mean) ** 2 for b in y)
    denominator = math.sqrt(x_sq_diff * y_sq_diff)
    correlation = numerator / denominator
    return correlation
correlation = cross_correlation(signal1, signal2)
print('Manual Correlation:', correlation)
Output (exact values vary from run to run because the noise is random and unseeded):
Manual Correlation: 0.9837294963190838
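The function above evaluates the correlation only at zero lag. A minimal sketch of extending it to several lags in plain Python follows. The names pearson, lagged_correlation, and cross_correlation_at_lags are hypothetical helpers, and recomputing Pearson r on the overlapping samples at each lag is one convention among several (library routines such as np.correlate normalize differently).

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
    x_sq_diff = sum((a - x_mean) ** 2 for a in xs)
    y_sq_diff = sum((b - y_mean) ** 2 for b in ys)
    return numerator / math.sqrt(x_sq_diff * y_sq_diff)


def lagged_correlation(x, y, lag):
    """Correlation between x[t + lag] and y[t], using only the overlapping samples."""
    n = len(x)
    if lag >= 0:
        return pearson(x[lag:], y[:n - lag])
    return pearson(x[:n + lag], y[-lag:])


def cross_correlation_at_lags(x, y, max_lag):
    """Map each lag in [-max_lag, max_lag] to its correlation coefficient."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 <= max_lag <= len(x) - 2:
        raise ValueError("max_lag must leave at least two overlapping samples")
    return {lag: lagged_correlation(x, y, lag) for lag in range(-max_lag, max_lag + 1)}


# Usage: delayed is base shifted right by 3 samples (circularly), so delayed[t] = base[t - 3]
random.seed(0)
base = [random.gauss(0, 1) for _ in range(200)]
delayed = base[-3:] + base[:-3]

correlations = cross_correlation_at_lags(delayed, base, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print('Best lag:', best_lag)  # Best lag: 3
print('Correlation at best lag:', round(correlations[best_lag], 6))
```

Because delayed[t + 3] equals base[t] on the whole overlap, the correlation at lag 3 is essentially 1, while the other lags compare unrelated noise and stay small.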
Method 2. Cross-correlation Analysis Using NumPy
NumPy's corrcoef function returns the correlation matrix of its inputs; the off-diagonal element [0, 1] is the correlation coefficient between signal1 and signal2.
Python
import numpy as np
# time array
t = np.arange(0, 10, 0.1)
# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
# corrcoef returns a 2x2 matrix; element [0, 1] is the correlation between signal1 and signal2
numpy_correlation = np.corrcoef(signal1, signal2)[0, 1]
print('NumPy Correlation:', numpy_correlation)
Output:
NumPy Correlation: 0.9796920509627758
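np.corrcoef yields a single zero-lag coefficient. To see the correlation at every lag, one option is np.correlate with mode="full" applied to mean-centered signals, normalized so that the zero-lag value equals the Pearson coefficient. The helper normalized_xcorr is a hypothetical name, and this (biased) normalization is one common choice rather than the only one.

```python
import numpy as np


def normalized_xcorr(x, y):
    """Normalized cross-correlation of two equal-length signals at every lag.

    Returns (lags, r) where r[i] estimates the correlation between
    x[t + lags[i]] and y[t]. Dividing by sqrt(sum(xo**2) * sum(yo**2))
    makes the lag-0 value equal to the Pearson coefficient.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    xo = x - x.mean()
    yo = y - y.mean()
    # mode="full" evaluates every overlap; output index i corresponds to lag i - (n - 1)
    r = np.correlate(xo, yo, mode="full") / np.sqrt(np.sum(xo ** 2) * np.sum(yo ** 2))
    lags = np.arange(-(n - 1), n)
    return lags, r


# Usage: signal_b is signal_a delayed (circularly) by 7 samples, so signal_b[t] = signal_a[t - 7]
rng = np.random.default_rng(42)
signal_a = rng.normal(0, 1, 300)
signal_b = np.roll(signal_a, 7)

lags, r = normalized_xcorr(signal_b, signal_a)
print('Lag with highest correlation:', lags[np.argmax(r)])
print('Zero-lag value matches np.corrcoef:',
      np.isclose(r[lags == 0][0], np.corrcoef(signal_b, signal_a)[0, 1]))
```

The usage example uses white noise on purpose: the article's signal1 and signal2 repeat every 10 samples (1 s at 0.1 s spacing), so their cross-correlation shows a peak every 10 lags rather than a single, unambiguous maximum.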
Method 3. Cross-correlation Analysis Using SciPy
SciPy's pearsonr function (from scipy.stats) computes the Pearson correlation coefficient between signal1 and signal2, which measures the strength of the linear relationship between two datasets. It also returns a p-value, which the code below discards.
Python
import numpy as np
from scipy.stats import pearsonr
# time array
t = np.arange(0, 10, 0.1)
# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
# pearsonr returns the correlation coefficient and a p-value; only the coefficient is kept
scipy_correlation, _ = pearsonr(signal1, signal2)
print('SciPy Correlation:', scipy_correlation)
Output:
SciPy Correlation: 0.9865169592702046
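pearsonr likewise reports only the zero-lag coefficient. For lag analysis, SciPy's signal module provides scipy.signal.correlate together with scipy.signal.correlation_lags (the latter assumes SciPy 1.6 or newer); they accept inputs of different lengths and can use an FFT for long signals. The sketch below estimates how many samples one signal is delayed relative to another; estimate_delay is a hypothetical helper and the data are synthetic.

```python
import numpy as np
from scipy import signal


def estimate_delay(reference, target):
    """Estimate the delay (in samples) of target relative to reference.

    A positive result means target is a delayed copy of reference.
    """
    ref = np.asarray(reference, dtype=float)
    tgt = np.asarray(target, dtype=float)
    # Remove the means so the peak reflects shape rather than a constant offset
    ref = ref - ref.mean()
    tgt = tgt - tgt.mean()
    # corr[i] is the sum over valid m of tgt[m + lags[i]] * ref[m]
    corr = signal.correlate(tgt, ref, mode="full")
    lags = signal.correlation_lags(len(tgt), len(ref), mode="full")
    return int(lags[np.argmax(corr)])


# Usage: target is the reference preceded by 12 extra noise samples, i.e. delayed by 12
rng = np.random.default_rng(1)
reference = rng.standard_normal(500)
target = np.concatenate([rng.standard_normal(12), reference])

print('Estimated delay (samples):', estimate_delay(reference, target))
```

Pairing correlate with correlation_lags avoids computing the lag offsets by hand, which is easy to get wrong when the two inputs have different lengths.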
Method 4. Cross-correlation Analysis Using Statsmodels
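Statsmodels' OLS class fits a linear regression of signal1 on signal2. With an intercept term included (via sm.add_constant), the regression's R-squared equals the square of the Pearson correlation coefficient, so its square root, signed by the slope, recovers the correlation. Without the intercept, OLS reports an uncentered R-squared, which is not a correlation coefficient.
Python
import numpy as np
import statsmodels.api as sm
# time array
t = np.arange(0, 10, 0.1)
# Generate signals
signal1 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
signal2 = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 3 * t) + np.random.normal(0, 0.1, len(t))
# Regress signal1 on signal2; add_constant prepends an intercept column
results = sm.OLS(signal1, sm.add_constant(signal2)).fit()
# For a simple regression with an intercept, R-squared = r**2; the slope's sign gives the sign of r
statsmodels_correlation = np.sign(results.params[1]) * np.sqrt(results.rsquared)
print('Statsmodels Correlation:', statsmodels_correlation)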
Output:
Statsmodels Correlation: 0.9730755677920275
Conclusion
The manual implementation, NumPy, SciPy, and Statsmodels all yield correlation coefficients close to 1, indicating a strong positive correlation between signal1 and signal2. This is expected: both signals share the same sine and cosine components and differ only in their independent random noise. Python therefore offers several routes to the same result, from dependency-free loops to optimized library functions.