How to Calculate Matthews Correlation Coefficient in R
Last Updated: 15 Mar, 2024
The correlation coefficient is a statistical measure used to quantify the relationship between two variables. It indicates the strength and direction of the linear association between them. The range of coefficient values is from -1 to 1.
It is denoted by r:
- r = 1 indicates a perfect positive linear relationship.
- r = −1 indicates a perfect negative linear relationship.
- r = 0 indicates no linear relationship.
The most commonly used correlation coefficient is the Pearson correlation coefficient, which is calculated using the following formula:
r = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
Where:
- r is the Pearson correlation coefficient.
- xᵢ and yᵢ are the individual data points.
- x̄ and ȳ are the means of the variables x and y, respectively.
This formula computes the correlation coefficient r between two variables x and y.
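As a minimal sketch, the formula can be translated directly into base R and checked against the built-in cor() function. The helper name pearson_r and the sample vectors are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: Pearson's r computed directly from the formula above
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations of x from its mean
  dy <- y - mean(y)  # deviations of y from its mean
  sum(dx * dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

x <- c(1, 2, 3, 4, 5)
pearson_r(x, 2 * x + 3)      # perfect positive linear relationship: 1
pearson_r(x, -0.5 * x + 10)  # perfect negative linear relationship: -1

# On arbitrary data the helper agrees with R's built-in cor()
y <- c(2.1, 3.9, 6.2, 7.8, 10.5)
all.equal(pearson_r(x, y), cor(x, y))  # TRUE
```

Any exact linear relationship gives r = 1 or r = -1 depending on the sign of the slope, whatever the slope's size.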
Consider the heights and weights of five students:
| Student | Height (X), inches | Weight (Y), pounds |
|---|---|---|
| 1 | 63 | 127 |
| 2 | 65 | 140 |
| 3 | 67 | 155 |
| 4 | 69 | 160 |
| 5 | 71 | 170 |
Now calculate the correlation coefficient between height and weight for these students.
Step 1: First, calculate the means x̄ and ȳ:
x̄ = (63 + 65 + 67 + 69 + 71) / 5 = 335 / 5 = 67 inches
ȳ = (127 + 140 + 155 + 160 + 170) / 5 = 752 / 5 = 150.4 pounds
Step 2: Next, substitute the deviations from the means into the formula:
r = \frac{(63-67)(127-150.4)+(65-67)(140-150.4)+(67-67)(155-150.4)+(69-67)(160-150.4)+(71-67)(170-150.4)}{\sqrt{(63-67)^2+(65-67)^2+(67-67)^2+(69-67)^2+(71-67)^2}\,\sqrt{(127-150.4)^2+(140-150.4)^2+(155-150.4)^2+(160-150.4)^2+(170-150.4)^2}}
r = \frac{93.6+20.8+0+19.2+78.4}{\sqrt{40}\,\sqrt{1153.2}} = \frac{212}{\sqrt{46128}}
r ≈ 212 / 214.77 ≈ 0.987
So, the correlation coefficient r ≈ 0.987, which agrees with the value returned by R below.
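The intermediate quantities in Steps 1 and 2 can be checked with a few lines of base R. This is only a verification sketch; the variable names x_bar, y_bar, ss_x and ss_y are illustrative.

```r
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)

# Step 1: means
x_bar <- mean(height)  # 67
y_bar <- mean(weight)  # 150.4

# Step 2: numerator and the two sums of squared deviations
numerator <- sum((height - x_bar) * (weight - y_bar))  # 212
ss_x <- sum((height - x_bar)^2)                        # 40
ss_y <- sum((weight - y_bar)^2)                        # 1153.2

r <- numerator / (sqrt(ss_x) * sqrt(ss_y))
round(r, 4)  # 0.9871
```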
In R, you can calculate the Pearson correlation coefficient with the cor() function:
R
# Sample data for heights and weights of students
height <- c(63, 65, 67, 69, 71)
weight <- c(127, 140, 155, 160, 170)
# Calculate correlation coefficient using cor() function
correlation_coefficient <- cor(height, weight)
# Print the correlation coefficient
print(correlation_coefficient)
Output:
[1] 0.9870827
Here, height and weight are two vectors holding the heights and weights of the five students.
- We use the cor() function to calculate the correlation coefficient between the two variables. By default, cor() computes the Pearson coefficient (method = "pearson").
- The result is stored in the variable correlation_coefficient.
- Finally, we print the correlation coefficient using print().
What is the Matthews Correlation Coefficient (MCC)?
The Matthews correlation coefficient (MCC) is a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is regarded as a balanced measure even if the classes are of very different sizes. MCC is particularly useful when classes are imbalanced.
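To see why MCC is preferred for imbalanced classes, consider a hypothetical test set of 1,000 cases with only 20 positives, and a classifier that flags just 2 cases as positive (both correctly). The counts are invented for illustration: accuracy looks excellent, while MCC reflects that most positives are missed.

```r
# Hypothetical counts: 20 actual positives, 980 actual negatives
TP <- 2    # positives correctly flagged
FN <- 18   # positives missed
FP <- 0    # negatives wrongly flagged
TN <- 980  # negatives correctly left alone

accuracy <- (TP + TN) / (TP + TN + FP + FN)
mcc <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

round(accuracy, 3)  # 0.982
round(mcc, 3)       # 0.313
```

Accuracy is dominated by the 980 easy negatives, whereas MCC uses all four counts and therefore stays low.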
MCC values range from -1 to 1, where
- 1 indicates a perfect prediction
- 0 indicates a prediction no better than random
- -1 indicates total disagreement between prediction and observation
MCC is calculated using the following formula:
MCC = \frac{(TP \times TN - FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
Where:
- TP (True Positives) is the number of correctly predicted positive examples.
- TN (True Negatives) is the number of correctly predicted negative examples.
- FP (False Positives) is the number of incorrectly predicted positive examples.
- FN (False Negatives) is the number of incorrectly predicted negative examples.
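The formula translates into a small base-R function. The name mcc_from_counts is hypothetical, and returning 0 when any of the four sums in the denominator is zero is an assumed convention (the formula itself is undefined in that case). The counts are converted to double so that the product of the four sums cannot overflow R's integer type.

```r
# Hypothetical helper: MCC from the four confusion-matrix counts
mcc_from_counts <- function(TP, TN, FP, FN) {
  # Convert to double so the product of four sums cannot overflow integers
  TP <- as.numeric(TP); TN <- as.numeric(TN)
  FP <- as.numeric(FP); FN <- as.numeric(FN)
  denom <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Assumed convention: MCC is undefined when a row or column sum is 0; return 0
  if (denom == 0) return(0)
  (TP * TN - FP * FN) / denom
}

mcc_from_counts(TP = 10, TN = 10, FP = 0, FN = 0)   # perfect prediction: 1
mcc_from_counts(TP = 0, TN = 0, FP = 10, FN = 10)   # total disagreement: -1
mcc_from_counts(TP = 5, TN = 5, FP = 5, FN = 5)     # no better than chance: 0
```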
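Consider the following confusion matrix for a binary classifier (rows are actual classes, columns are predicted classes):
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 50 | 10 |
| Actual Positive | 5 | 135 |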
In this confusion matrix:
- True Negatives (TN) = 50
- False Positives (FP) = 10
- False Negatives (FN) = 5
- True Positives (TP) = 135
To calculate MCC, substitute these counts into the formula:
MCC = (TP × TN - FP × FN) / √[(TP + FP)(TP + FN)(TN + FP)(TN + FN)]
MCC = (135 × 50 - 10 × 5) / √[(135 + 10)(135 + 5)(50 + 10)(50 + 5)]
MCC = (6750 - 50) / √[(145)(140)(60)(55)]
MCC = 6700 / √66990000 ≈ 6700 / 8184.74
MCC ≈ 0.8186
So, the Matthews correlation coefficient (MCC) for this example is approximately 0.8186.
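The same arithmetic can be reproduced in R as a quick check, using the counts read from the confusion matrix above:

```r
# Counts from the example confusion matrix
TP <- 135; TN <- 50; FP <- 10; FN <- 5

numerator   <- TP * TN - FP * FN                                    # 6700
denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))  # about 8184.74
mcc_value   <- numerator / denominator
round(mcc_value, 4)  # 0.8186
```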
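Calculating the Matthews Correlation Coefficient in R
To calculate the Matthews correlation coefficient (MCC) in the R programming language, we can use the mcc() function from the 'mltools' package or the mcc() function from the 'pracma' package, or compute it by hand from a confusion matrix created with the 'caret' package.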
Calculate Matthews Correlation Coefficient in R Using 'mltools' package
R
install.packages("mltools")
library(mltools)
# 20 actual positives (1) followed by 380 actual negatives (0)
actual <- rep(c(1, 0), times = c(20, 380))
# One prediction per label: 15 TP, 5 FN, 5 FP, 375 TN
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))
# Compute MCC from the predicted and actual labels
mcc(preds, actual)
Output:
[1] 0.7368421
- install.packages("mltools"): installs the mltools package from CRAN. It only needs to be run once; running it again reinstalls the package.
- library(mltools): loads the mltools package into the R session, making its functions available for use.
- actual <- rep(c(1, 0), times = c(20, 380)) creates a vector actual of 400 labels: 20 instances of class 1 followed by 380 instances of class 0, using the rep() function to repeat values.
- preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375)) creates a vector preds of 400 predicted labels: 15 instances of class 1 (actual positives predicted correctly), 5 instances of class 0 (actual positives that were missed), 5 instances of class 1 (actual negatives wrongly flagged) and 375 instances of class 0. The two vectors must have the same length, one prediction per actual label.
- mcc(preds, actual) calculates the Matthews correlation coefficient between the predicted and actual labels using the mcc() function from the 'mltools' package. Here TP = 15, FN = 5, FP = 5 and TN = 375, so MCC = (15 × 375 - 5 × 5) / √(20 × 20 × 380 × 380) = 5600 / 7600 ≈ 0.7368.
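For a package-free cross-check, label vectors can also be tabulated with base R's table(). This sketch assumes labels coded as 0/1 with 1 as the positive class, for 20 positives and 380 negatives where 15 positives and 5 negatives are predicted positive; fixing the factor levels keeps the table 2 x 2 even if one class never appears in the predictions. The name mcc_check is illustrative.

```r
# 20 actual positives followed by 380 actual negatives
actual <- rep(c(1, 0), times = c(20, 380))
# Predictions of the same length: 15 TP, 5 FN, 5 FP, 375 TN
preds <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 375))

# Rows = predicted, columns = actual; fixed levels guarantee a 2 x 2 table
tab <- table(predicted = factor(preds, levels = c(0, 1)),
             actual    = factor(actual, levels = c(0, 1)))

# Read the counts by level name and convert to double
TP <- as.numeric(tab["1", "1"])
TN <- as.numeric(tab["0", "0"])
FP <- as.numeric(tab["1", "0"])
FN <- as.numeric(tab["0", "1"])

mcc_check <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
mcc_check  # 0.7368421
```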
Calculate Matthews Correlation Coefficient in R Using 'pracma' package
R
# Install and load required packages
install.packages("pracma")
library(pracma)
# Create some example data
actual <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)
# Calculate MCC
mcc_value <- mcc(as.logical(actual), as.logical(predicted))
# Print MCC
print(mcc_value)
Output:
[1] 0.1666667
- install.packages("pracma"): installs the pracma package from CRAN. It only needs to be run once; running it again reinstalls the package.
- library(pracma): loads the pracma package into the R session, making its functions available for use.
- actual <- c(1, 0, 1, 0, 1) creates a vector actual containing the actual labels, where 1 represents one class and 0 represents the other.
- predicted <- c(1, 0, 0, 1, 1) creates a vector predicted containing the predicted labels corresponding to the actual labels.
- mcc_value <- mcc(as.logical(actual), as.logical(predicted)) converts both vectors to logical (1 becomes TRUE, 0 becomes FALSE) and calculates the MCC between them with mcc(). Here TP = 2, TN = 1, FP = 1 and FN = 1, so MCC = (2 × 1 - 1 × 1) / √(3 × 3 × 2 × 2) = 1/6 ≈ 0.1667.
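For binary labels coded as 0 and 1, MCC is mathematically identical to the Pearson correlation coefficient between the two vectors (also known as the phi coefficient), so base R's cor() reproduces the value without any extra package:

```r
actual    <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)
# Pearson correlation of the 0/1 vectors equals MCC
cor(actual, predicted)  # 0.1666667
```

One caveat: if either vector contains only one class, its standard deviation is zero and cor() returns NA with a warning.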
Calculate Matthews Correlation Coefficient in R Using 'caret' package
R
# Install and load the required package
install.packages("caret")
library(caret)
# Generate example data
actual <- c(1, 0, 1, 0, 1) # Actual labels
predicted <- c(1, 0, 0, 1, 1) # Predicted labels
# Create confusion matrix (rows = predicted, columns = actual)
conf_matrix <- confusionMatrix(as.factor(predicted), as.factor(actual))
# Extract values from the confusion matrix, treating class 1 as positive.
# as.numeric() avoids integer overflow in the product below on large data sets.
TP <- as.numeric(conf_matrix$table[2, 2]) # True Positives
TN <- as.numeric(conf_matrix$table[1, 1]) # True Negatives
FP <- as.numeric(conf_matrix$table[2, 1]) # False Positives
FN <- as.numeric(conf_matrix$table[1, 2]) # False Negatives
# Calculate Matthews correlation coefficient (MCC)
mcc_value <- (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
# Print MCC
print(mcc_value)
Output:
[1] 0.1666667
- install.packages("caret"): installs the caret package from CRAN. It only needs to be run once; running it again reinstalls the package.
- library(caret): loads the caret package into the R session, making its functions available for use.
- actual <- c(1, 0, 1, 0, 1): creates a vector actual containing the actual labels, where 1 represents one class and 0 represents the other.
- predicted <- c(1, 0, 0, 1, 1): creates a vector predicted containing the predicted labels corresponding to the actual labels.
- conf_matrix <- confusionMatrix(as.factor(predicted), as.factor(actual)): creates a confusion matrix with the confusionMatrix() function from the caret package. The predicted labels are passed first and the actual (reference) labels second, so in conf_matrix$table the rows are predictions and the columns are actual labels.
- TP, TN, FP and FN: the true positives ([2, 2]), true negatives ([1, 1]), false positives ([2, 1]) and false negatives ([1, 2]) read from conf_matrix$table, with class 1 treated as positive.
- The final calculation applies the MCC formula given earlier in this article to these four counts.
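One detail worth knowing: with two factor levels, confusionMatrix() treats the first level (here "0") as the positive class by default, whereas the code above reads TP from the "1" row and column. For MCC this choice does not matter, because swapping the positive and negative classes exchanges TP with TN and FP with FN, which leaves the formula unchanged. A short base-R check, using a hypothetical mcc_counts helper:

```r
# Hypothetical helper: MCC from the four counts
mcc_counts <- function(TP, TN, FP, FN) {
  (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
}

actual    <- c(1, 0, 1, 0, 1)
predicted <- c(1, 0, 0, 1, 1)

# Treat 1 as the positive class
m1 <- mcc_counts(TP = sum(predicted == 1 & actual == 1),
                 TN = sum(predicted == 0 & actual == 0),
                 FP = sum(predicted == 1 & actual == 0),
                 FN = sum(predicted == 0 & actual == 1))

# Treat 0 as the positive class instead
m0 <- mcc_counts(TP = sum(predicted == 0 & actual == 0),
                 TN = sum(predicted == 1 & actual == 1),
                 FP = sum(predicted == 0 & actual == 1),
                 FN = sum(predicted == 1 & actual == 0))

c(m1, m0)  # both 0.1666667
```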
Conclusion
In conclusion, the Matthews correlation coefficient (MCC) is a robust metric for evaluating binary classification models in R. Because it uses all four cells of the confusion matrix (true and false positives and negatives), it remains informative even with imbalanced data, making it a reliable tool for assessing model performance and comparing different algorithms or experiments.