Kendall Correlation Testing in R Programming
Last Updated: 10 Oct, 2024
Correlation is a statistical measure of how strongly two variables are related. The idea also extends to sets of more than two variables. For instance, to find out whether the heights of fathers and sons are related, one can calculate a correlation coefficient. The coefficient lies between -1 and +1: its sign gives the direction of the relationship and its magnitude gives the strength. It is a scaled version of covariance and is therefore dimensionless. There are mainly two types of correlation:
- Parametric Correlation - Pearson correlation (r): It measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
- Non-Parametric Correlation - Kendall (tau) and Spearman (rho): They are rank-based correlation coefficients and are known as non-parametric correlations.
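The difference between the two families can be sketched with a small example. The data below are an illustrative assumption (not taken from this article): y increases monotonically with x, but not along a straight line.

```r
# Illustrative data (assumed): y grows monotonically but non-linearly with x
x <- 1:10
y <- x^3

pearson  <- cor(x, y, method = "pearson")
kendall  <- cor(x, y, method = "kendall")
spearman <- cor(x, y, method = "spearman")

# The rank-based coefficients equal 1 because the ordering of y matches the
# ordering of x exactly; Pearson's r stays below 1 because the relationship
# is not linear.
print(c(pearson = pearson, kendall = kendall, spearman = spearman))
```

The rank-based coefficients only look at the ordering of the values, which is why they are unaffected by the cubic transformation.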
What is Kendall's Tau?
Kendall’s tau is a measure of correlation that assesses the ordinal relationship between two variables. It is based on the difference between the number of concordant and discordant pairs in the dataset. Kendall Rank Correlation is a rank-based correlation coefficient, also known as non-parametric correlation. The formula for calculating Kendall Rank Correlation is as follows:
\[
\tau = \frac{\text{Number of concordant pairs} - \text{Number of discordant pairs}}{\frac{n(n - 1)}{2}}
\]
where,
- Concordant Pair: A pair of observations (x1, y1) and (x2, y2) that satisfies
  - x1 > x2 and y1 > y2, or
  - x1 < x2 and y1 < y2
- Discordant Pair: A pair of observations (x1, y1) and (x2, y2) that satisfies
  - x1 > x2 and y1 < y2, or
  - x1 < x2 and y1 > y2
- n: Total number of samples
Note: A pair for which x1 = x2 or y1 = y2 is tied. Tied pairs are classified as neither concordant nor discordant and are ignored when counting.
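The formula can be checked by counting pairs directly. The following is a minimal sketch on a hypothetical five-point sample without ties (the vectors and variable names are assumptions made for illustration); it compares the hand count with the value returned by cor().

```r
# Hypothetical small sample without ties (assumed for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

n <- length(x)
concordant <- 0
discordant <- 0

# Visit every unordered pair (i, j) with i < j
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    # Product of signs is +1 for a concordant pair, -1 for a discordant pair
    s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
    if (s > 0) concordant <- concordant + 1
    if (s < 0) discordant <- discordant + 1
  }
}

tau_manual  <- (concordant - discordant) / (n * (n - 1) / 2)
tau_builtin <- cor(x, y, method = "kendall")

print(c(concordant = concordant, discordant = discordant))  # 8 and 2
print(c(manual = tau_manual, builtin = tau_builtin))        # both 0.6
```

Of the 10 possible pairs, 8 are concordant and 2 are discordant, so tau = (8 - 2) / 10 = 0.6.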
R’s base distribution supports calculating Kendall's tau through the cor() and cor.test() functions. Optionally, you can install additional visualization packages such as ggpubr for enhanced plots.
install.packages("ggpubr")
Let’s go through Kendall correlation testing in the R programming language step by step:
Step 1: Creating a Dataset
Let’s create a sample dataset to work with.
R
# Sample data
set.seed(123)
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)
data <- data.frame(x, y)
Step 2: Computing Kendall Correlation using the cor() function
You can compute Kendall’s tau using the cor() function by specifying the method as "kendall".
R
# Calculate Kendall correlation
kendall_corr <- cor(data$x, data$y, method = "kendall")
kendall_corr
Output:
[1] 1
Step 3: Hypothesis Testing with cor.test()
To conduct a hypothesis test and obtain the p-value for Kendall correlation, use the cor.test() function:
R
# Perform Kendall correlation test
kendall_test <- cor.test(data$x, data$y, method = "kendall")
kendall_test
Output:
Kendall's rank correlation tau
data: data$x and data$y
T = 45, p-value = 5.511e-07
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
1
The output shows a perfect positive Kendall's tau (1): all 45 pairs are concordant (T = 45), and the very small p-value indicates statistical significance. In general:
- Positive values indicate a positive monotonic relationship, which is stronger the closer tau is to +1.
- Negative values indicate a negative monotonic relationship, which is stronger the closer tau is to -1.
- Values near 0 indicate little to no monotonic relationship.
- Always consider the p-value to assess the significance of the relationship.
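When the result is needed programmatically rather than as printed text, the individual pieces can be read from the object returned by cor.test(). The sketch below reuses the dataset from Step 1; the significance level alpha = 0.05 and the variable names tau_hat, t_stat, p_val, and significant are assumptions chosen for illustration.

```r
# Same sample data as in Step 1
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

kendall_test <- cor.test(x, y, method = "kendall")

# cor.test() returns a list of class "htest"; pull out the individual parts
tau_hat <- unname(kendall_test$estimate)   # estimated tau
t_stat  <- unname(kendall_test$statistic)  # test statistic T
p_val   <- kendall_test$p.value            # two-sided p-value

alpha <- 0.05  # assumed significance level
significant <- p_val < alpha

print(c(tau = tau_hat, T = t_stat, p.value = p_val))
print(significant)
```

Reading the components this way avoids parsing the printed report and makes it easy to apply the same decision rule to many variable pairs.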
Step 4: Visualizing Kendall Correlation
We can visualize the correlation using a scatter plot and annotate it with Kendall’s tau using the ggpubr package.
R
# Load required library
library(ggpubr)
# Scatter plot with Kendall correlation coefficient
ggscatter(data, x = "x", y = "y",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "kendall",
          xlab = "X Values", ylab = "Y Values",
          title = "Kendall Correlation Plot")
Output:
This will produce a scatter plot with a regression line and display the Kendall correlation coefficient.
Conclusion
Kendall correlation, or Kendall’s tau, provides a robust non-parametric measure of association between two variables. It is particularly useful for ordinal data and datasets with ties, where it is often preferred over measures such as Pearson's or Spearman's coefficient. R offers convenient functions like cor() and cor.test() to compute Kendall’s tau and test its significance.
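The point about ties can be illustrated with a short sketch. The ordinal ratings below are hypothetical (assumed data, not from this article). Because an exact p-value is not available when ties are present, the example passes exact = FALSE so that cor.test() uses its normal approximation directly.

```r
# Hypothetical ordinal ratings (assumed data) containing repeated values
rating_a <- c(1, 2, 2, 3, 3, 3, 4, 5)
rating_b <- c(1, 1, 2, 2, 3, 4, 4, 5)

# exact = FALSE requests the normal approximation instead of the exact test
tie_test <- cor.test(rating_a, rating_b, method = "kendall", exact = FALSE)

tie_test$estimate   # tau estimate in the presence of ties
tie_test$statistic  # reported as z under the normal approximation
tie_test$p.value
```

Leaving exact unset on tied data still returns a result, but R reports a warning that the exact p-value cannot be computed.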