Kendall Correlation Testing in R Programming

Last Updated : 10 Oct, 2024

Correlation is a statistical measure of how strongly two variables are related. It can also be computed pairwise across several variables. For instance, to find out whether the heights of fathers and sons are related, one can calculate a correlation coefficient. The coefficient lies between -1 and +1. It is a scaled, dimensionless version of covariance and conveys both the direction and the strength of the relationship. There are two main types of correlation:

Parametric Correlation - Pearson correlation (r): It measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
Non-Parametric Correlation - Kendall (tau) and Spearman (rho): These are rank-based correlation coefficients and are known as non-parametric correlations.
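The difference between the two families can be sketched with base R's cor() function. The data below are hypothetical and chosen only for illustration: y increases with x at every step, but not along a straight line.

```r
# Hypothetical data: y grows monotonically, but not linearly, with x
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")   # sensitive to linearity
spearman <- cor(x, y, method = "spearman")  # rank-based
kendall  <- cor(x, y, method = "kendall")   # rank-based

round(c(pearson = pearson, spearman = spearman, kendall = kendall), 3)
```

Because the ordering of y matches the ordering of x exactly, both rank-based coefficients equal 1, while the Pearson coefficient stays below 1 since the relationship is not linear.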

What is Kendall's Tau?

Kendall’s tau is a measure of correlation that assesses the ordinal relationship between two variables. It is based on the difference between the number of concordant and discordant pairs in the dataset. Kendall Rank Correlation is a rank-based correlation coefficient, also known as non-parametric correlation. The formula for calculating Kendall Rank Correlation is as follows:

\[ \tau = \frac{\text{Number of concordant pairs} - \text{Number of discordant pairs}}{\frac{n(n - 1)}{2}} \]

where,
Concordant Pair: A pair of observations (x1, y1) and (x2, y2) that follows the property
x1 > x2 and y1 > y2 or
x1 < x2 and y1 < y2
Discordant Pair: A pair of observations (x1, y1) and (x2, y2) that follows the property
x1 > x2 and y1 < y2 or
x1 < x2 and y1 > y2
n: Total number of observations
Note: Pairs for which x1 = x2 or y1 = y2 are tied. They are neither concordant nor discordant and do not contribute to the numerator.
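The definitions above can be checked by counting pairs directly. The following is a minimal sketch on a small hypothetical sample without ties; the variable names (pairs, s, tau_manual) are illustrative only.

```r
# Hypothetical small sample with no ties
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

n <- length(x)
pairs <- combn(n, 2)  # each column holds the indices (i, j) of one pair

# The product of signs is +1 for a concordant pair and -1 for a discordant pair
s <- sign(x[pairs[1, ]] - x[pairs[2, ]]) * sign(y[pairs[1, ]] - y[pairs[2, ]])

concordant <- sum(s > 0)  # 8 pairs
discordant <- sum(s < 0)  # 2 pairs

# Formula from the text: (concordant - discordant) / (n(n - 1) / 2)
tau_manual <- (concordant - discordant) / (n * (n - 1) / 2)
tau_manual                      # 0.6
cor(x, y, method = "kendall")   # agrees with tau_manual when there are no ties
```

Of the 10 possible pairs, only (observation 1, observation 2) and (observation 3, observation 4) are discordant, which gives (8 - 2) / 10 = 0.6.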

R’s base functions include support for calculating Kendall's tau using the cor() and cor.test() functions. Optionally, you can install additional visualization packages such as ggpubr for enhanced plots.

install.packages("ggpubr")

Let's discuss Kendall correlation testing in the R programming language step by step:

Step 1: Creating a Dataset

Let’s create a sample dataset to work with.

R

# Sample data (fixed values, so no random seed is needed)
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

data <- data.frame(x, y)

Step 2: Computing Kendall Correlation using `cor()` function

You can compute Kendall’s tau using the cor() function and specifying the method as "kendall".

R

# Calculate Kendall correlation
kendall_corr <- cor(data$x, data$y, method = "kendall")
kendall_corr

Output:

[1] 1

Step 3: Hypothesis Testing with `cor.test()`

To conduct a hypothesis test and obtain the p-value for Kendall correlation, use the cor.test() function:

R

# Perform Kendall correlation test
kendall_test <- cor.test(data$x, data$y, method = "kendall")
kendall_test

Output:

	Kendall's rank correlation tau

data:  data$x and data$y
T = 45, p-value = 5.511e-07
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau 
  1 

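The output shows a Kendall's tau of 1, the maximum possible value: both x and y increase strictly from one observation to the next, so all 45 pairs are concordant (the statistic T = 45 is the count of concordant pairs). The p-value of 5.511e-07 is far below 0.05, so the association is statistically significant.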

Positive values indicate a positive monotonic relationship, which is stronger the closer tau is to +1.
Negative values indicate a negative monotonic relationship, which is stronger the closer tau is to -1.
Values near 0 indicate little to no monotonic relationship.
Always consider the p-value to assess the significance of the relationship.
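To apply these rules in code, the estimate and p-value can be read from the object that cor.test() returns. This is a minimal sketch that recreates the sample data; the 0.05 threshold (alpha) is an assumed significance level, not something fixed by the test.

```r
# Same sample data as in Step 1
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

kendall_test <- cor.test(x, y, method = "kendall")

tau_hat <- unname(kendall_test$estimate)  # Kendall's tau
p_val   <- kendall_test$p.value           # two-sided p-value

alpha <- 0.05  # assumed significance level
direction <- if (tau_hat > 0) "positive" else if (tau_hat < 0) "negative" else "no"

if (p_val < alpha) {
  message("Significant ", direction, " monotonic association (tau = ", round(tau_hat, 3), ")")
} else {
  message("No significant monotonic association at alpha = ", alpha)
}
```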

Step 4: Visualizing Kendall Correlation

We can visualize the correlation using a scatter plot and annotate it with Kendall’s tau using the ggpubr package.

R

# Load required library
library(ggpubr)

# Scatter plot with Kendall correlation coefficient
ggscatter(data, x = "x", y = "y", 
          add = "reg.line", conf.int = TRUE, 
          cor.coef = TRUE, cor.method = "kendall",
          xlab = "X Values", ylab = "Y Values",
          title = "Kendall Correlation Plot")

Output:

[Figure: scatter plot titled "Kendall Correlation Plot", X Values against Y Values]

This will produce a scatter plot with a trendline and display the Kendall correlation coefficient.
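Since ggpubr is optional, a similar plot can be sketched with base R graphics alone. This is an illustrative alternative, not the ggpubr output shown above; it draws the points and prints tau under the title, without a regression line.

```r
# Same sample data as in Step 1
x <- c(12, 25, 35, 47, 52, 68, 70, 85, 90, 100)
y <- c(15, 22, 37, 40, 48, 60, 67, 80, 95, 105)

tau <- cor(x, y, method = "kendall")

# Base R scatter plot with the same title and axis labels
plot(x, y, pch = 19,
     main = "Kendall Correlation Plot",
     xlab = "X Values", ylab = "Y Values")

# Annotate the plot with the Kendall coefficient just below the title
mtext(sprintf("Kendall tau = %.2f", tau), side = 3, line = 0.3)
```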

Conclusion

Kendall correlation, or Kendall’s tau, provides a robust non-parametric measure of association between two variables. It is particularly useful for ordinal data, small samples, and datasets with ties, where it is often preferred over Pearson or Spearman correlation. R offers convenient functions like cor() and cor.test() to compute Kendall’s tau and test its significance.
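For data with ties, cor.test() cannot compute an exact p-value and warns about it by default. The sketch below uses hypothetical ordinal ratings (rating_a and rating_b are made-up names and values) and passes exact = FALSE to request the normal approximation explicitly.

```r
# Hypothetical ordinal ratings containing repeated values (ties)
rating_a <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 5)
rating_b <- c(1, 1, 2, 2, 3, 4, 3, 4, 4, 5)

# exact = FALSE asks for the normal approximation, which avoids the
# "Cannot compute exact p-value with ties" warning
tied_test <- cor.test(rating_a, rating_b, method = "kendall", exact = FALSE)

tied_test$estimate   # Kendall's tau for the tied data
tied_test$statistic  # reported as z (normal approximation) instead of T
tied_test$p.value
```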


AmiyaRanjanRout
