A correlation coefficient measures how strong the relationship between two variables is. There are several formulas for computing one; among the most popular is Pearson’s correlation (also known as Pearson’s r), which measures linear association and is commonly used alongside linear regression. Pearson’s correlation coefficient is denoted by the symbol “r” (written as “R” in the calculations in this article). The formula returns a value between -1 and +1. Here,
- +1 indicates a perfect positive linear relationship
- -1 indicates a perfect negative linear relationship
- 0 indicates no linear relationship at all
The linear correlation coefficient reflects both the direction and the strength of the linear relationship between two variables x and y. Values close to -1 indicate a strong negative correlation, and values close to +1 indicate a strong positive correlation. If r is 0, there is no linear correlation; this is also known as zero correlation.
The “crude estimates” for interpreting the strength of a correlation using Pearson’s r:
| r value | Crude estimate |
| --- | --- |
| +.70 or higher | Very strong positive relationship |
| +.40 to +.69 | Strong positive relationship |
| +.30 to +.39 | Moderate positive relationship |
| +.20 to +.29 | Weak positive relationship |
| +.01 to +.19 | No or negligible relationship |
| 0 | No relationship [zero correlation] |
| -.01 to -.19 | No or negligible relationship |
| -.20 to -.29 | Weak negative relationship |
| -.30 to -.39 | Moderate negative relationship |
| -.40 to -.69 | Strong negative relationship |
| -.70 or lower | Very strong negative relationship |
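The bands in the table above can be expressed as a small lookup. The following is only a sketch: the function name `interpret_r` is hypothetical, and it assumes that values falling between two printed bands (for example 0.195) belong to the band below the next cutoff.

```python
def interpret_r(r: float) -> str:
    """Map a Pearson r value to the crude labels from the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relationship (zero correlation)"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # the bands are symmetric, so only the magnitude matters
    if size >= 0.70:
        return f"very strong {direction} relationship"
    if size >= 0.40:
        return f"strong {direction} relationship"
    if size >= 0.30:
        return f"moderate {direction} relationship"
    if size >= 0.20:
        return f"weak {direction} relationship"
    return "no or negligible relationship"


for value in (0.64, -0.29, 0.05):
    print(value, "->", interpret_r(value))
```

These cutoffs are rules of thumb rather than statistical tests, so the labels are best read as rough descriptions.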
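The formula used to get the linear correlation coefficient of the data is:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
Here n is the number of (x, y) pairs, and the whole product of the two bracketed terms sits under the square root.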
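As an illustration, the formula can be translated almost term by term into Python. This is a minimal sketch: the function name `pearson_r` and the sample data are assumptions made for this example, not part of the article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums used in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Illustrative data (not from the article): y = 2x + 1, a perfect line.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0
```

The guard on the denominator matters because a variable that never changes has zero spread, which makes the ratio undefined.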
Types of Linear Correlation Coefficients
The linear correlation coefficient is given by Pearson’s r, whose value ranges from -1 to +1.
Depending on the sign of r, there are three types of linear correlation:
- Positive values indicate a positive correlation (0 < r ≤ 1)
- Negative values indicate a negative correlation (-1 ≤ r < 0)
- A value of 0 indicates no correlation (r = 0)
Positive correlation: In a positive correlation, both variables move in the same direction. If one increases, the other also tends to increase, and if one decreases, the other also tends to decrease. Whenever r is positive, it shows a positive relationship.
Negative correlation: In a negative correlation, the variables move in opposite directions. If one increases, the other tends to decrease, and if one decreases, the other tends to increase. Whenever r is negative, it shows a negative relationship.
No correlation: When there is no linear association between the variables, they are said to have no correlation. In this case, their correlation coefficient r is 0. The variables may still be related in a non-linear way.
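The three cases can be seen on small made-up datasets. The sketch below uses the equivalent deviation-from-the-mean form of Pearson's r, because it makes the sign easy to read: the numerator is positive when x and y tend to be above or below their means together. The data and the name `pearson_r` are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r written with deviations from the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Positive when x and y deviate from their means in the same direction.
    co_movement = sum(a * b for a, b in zip(dx, dy))
    return co_movement / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]     # tends to rise with x
falling = [9, 7, 6, 4, 1]    # tends to fall as x rises
unrelated = [3, 1, 4, 1, 3]  # no linear trend in either direction

for name, y in [("rising", rising), ("falling", falling), ("unrelated", unrelated)]:
    print(f"{name}: r = {pearson_r(x, y):.3f}")
```

For the third dataset the result is zero up to floating-point rounding, so it may print as 0.000 or -0.000.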
Problem 1: Calculate the correlation coefficient for the following data:
X = 12, 16, 4, 8
and
Y = 15, 20, 5, 10
Solution:
Given variables are,
X = 12, 16, 4, 8
and
Y = 15, 20, 5, 10
To find the correlation coefficient, first construct a table with columns for XY, X², and Y², then add up the values in each column to get the sums required by the formula.
| X | Y | XY | X² | Y² |
| --- | --- | --- | --- | --- |
| 12 | 15 | 180 | 144 | 225 |
| 16 | 20 | 320 | 256 | 400 |
| 4 | 5 | 20 | 16 | 25 |
| 8 | 10 | 80 | 64 | 100 |
| ∑x = 40 | ∑y = 50 | ∑xy = 600 | ∑x² = 480 | ∑y² = 750 |
∑xy = 600
∑x = 40
∑y = 50
∑x² = 480
∑y² = 750
n = 4
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [4(600) - (40)(50)] / √{[4(480) - (40)²][4(750) - (50)²]}
R = 400 / √{[320][500]}
R = 400 / 400
R = 1
It shows that the relationship between the variables of the data is a perfect (very strong) positive relationship.
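The table-building step can also be done programmatically. The sketch below assumes the data X = 12, 16, 4, 8 and Y = 15, 20, 5, 10, which are the values consistent with the sums ∑x = 40, ∑y = 50, ∑xy = 600 and ∑y² = 750 used in the solution; the variable names are chosen for illustration.

```python
from math import sqrt

# Assumed data: X = 12, 16, 4, 8 and Y = 15, 20, 5, 10.
xs = [12, 16, 4, 8]
ys = [15, 20, 5, 10]
n = len(xs)

# One row per data point: X, Y, XY, X², Y².
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
# Column totals: ∑x, ∑y, ∑xy, ∑x², ∑y².
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, sep="\t")  # totals: 40 50 600 480 750

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print("r =", r)  # r = 1.0
```

Here every Y value is exactly 1.25 times its X value, which is why the points lie on a straight line and r comes out as 1.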
Problem 2: Find the value of the correlation coefficient from the following table:
| SUBJECT | AGE X | GLUCOSE LEVEL Y |
| --- | --- | --- |
| 1 | 42 | 98 |
| 2 | 23 | 68 |
| 3 | 22 | 73 |
| 4 | 47 | 79 |
| 5 | 50 | 88 |
| 6 | 60 | 82 |
Solution:
Make a table from the given data and add three more columns for XY, X², and Y². Then add up the values in each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y², with n = 6.
| SUBJECT | AGE X | GLUCOSE LEVEL Y | XY | X² | Y² |
| --- | --- | --- | --- | --- | --- |
| 1 | 42 | 98 | 4116 | 1764 | 9604 |
| 2 | 23 | 68 | 1564 | 529 | 4624 |
| 3 | 22 | 73 | 1606 | 484 | 5329 |
| 4 | 47 | 79 | 3713 | 2209 | 6241 |
| 5 | 50 | 88 | 4400 | 2500 | 7744 |
| 6 | 60 | 82 | 4920 | 3600 | 6724 |
| ∑ | 244 | 488 | 20319 | 11086 | 40266 |
∑xy = 20319
∑x = 244
∑y = 488
∑x² = 11086
∑y² = 40266
n = 6
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [6(20319) - (244)(488)] / √{[6(11086) - (244)²][6(40266) - (488)²]}
R = 2842 / √{[6980][3452]}
R = 2842 / 4908.662
R = 0.5790
It shows that the relationship between the variables of the data is a strong positive relationship.
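A hand-built table is easy to get wrong in a single product or sum, so it can help to recompute the intermediate terms directly from the six (age, glucose level) pairs given in the problem. The sketch below does that; the variable names are chosen for illustration.

```python
from math import sqrt

# Data from the Problem 2 table.
ages = [42, 23, 22, 47, 50, 60]
glucose = [98, 68, 73, 79, 88, 82]
n = len(ages)

sum_xy = sum(x * y for x, y in zip(ages, glucose))
numerator = n * sum_xy - sum(ages) * sum(glucose)
x_term = n * sum(x * x for x in ages) - sum(ages) ** 2        # n∑x² - (∑x)²
y_term = n * sum(y * y for y in glucose) - sum(glucose) ** 2  # n∑y² - (∑y)²
r = numerator / sqrt(x_term * y_term)

print("∑xy =", sum_xy)
print("numerator =", numerator)
print("bracketed terms =", x_term, y_term)
print("r =", round(r, 4))
```

Printing each intermediate value separately makes it possible to compare them line by line with the hand calculation.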
Problem 3: Calculate the correlation coefficient for the following data:
X = 21,31,25,40,47,38
and
Y = 70,55,60,78,66,80
Solution:
Given variables are,
X = 21,31,25,40,47,38
and
Y = 70,55,60,78,66,80
To find the correlation coefficient, first construct a table with columns for XY, X², and Y², then add up the values in each column to get the sums required by the formula.
| X | Y | XY | X² | Y² |
| --- | --- | --- | --- | --- |
| 21 | 70 | 1470 | 441 | 4900 |
| 31 | 55 | 1705 | 961 | 3025 |
| 25 | 60 | 1500 | 625 | 3600 |
| 40 | 78 | 3120 | 1600 | 6084 |
| 47 | 66 | 3102 | 2209 | 4356 |
| 38 | 80 | 3040 | 1444 | 6400 |
| ∑x = 202 | ∑y = 409 | ∑xy = 13937 | ∑x² = 7280 | ∑y² = 28365 |
∑xy = 13937
∑x = 202
∑y = 409
∑x² = 7280
∑y² = 28365
n = 6
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [6(13937) - (202)(409)] / √{[6(7280) - (202)²][6(28365) - (409)²]}
R = 1004 / √{[2876][2909]}
R = 1004 / 2892.452938
R = 0.3471
It shows that the relationship between the variables of the data is a moderate positive relationship.
Problem 4: Calculate the correlation coefficient for the following data:
X= 12, 10, 42, 27,35,56
and
Y = 13, 15, 56, 34,65,26
Solution:
Given variables are,
X= 12, 10, 42, 27,35,56
and
Y = 13, 15, 56, 34,65,26
To find the correlation coefficient, first construct a table with columns for XY, X², and Y², then add up the values in each column to get the sums required by the formula.
| X | Y | XY | X² | Y² |
| --- | --- | --- | --- | --- |
| 12 | 13 | 156 | 144 | 169 |
| 10 | 15 | 150 | 100 | 225 |
| 42 | 56 | 2352 | 1764 | 3136 |
| 27 | 34 | 918 | 729 | 1156 |
| 35 | 65 | 2275 | 1225 | 4225 |
| 56 | 26 | 1456 | 3136 | 676 |
| ∑x = 182 | ∑y = 209 | ∑xy = 7307 | ∑x² = 7098 | ∑y² = 9587 |
∑xy= 7307
∑x=182
∑y=209
∑x² =7098
∑y² =9587
n =6
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [6(7307) - (182)(209)] / √{[6(7098) - (182)²][6(9587) - (209)²]}
R = 5804 / √{[9464][13841]}
R= 5804/11445.139
R=0.5071
It shows that the relationship between the variables of the data is a strong positive relationship.
Problem 5: State whether each of the following correlation coefficients indicates a positive or a negative relationship, and how strong it is:
0.64
0.46
-0.29
-0.95
Solution:
Using the crude estimates for interpreting Pearson’s r, each coefficient is classified as follows:
0.64
The relationship between the variables is a strong positive relationship.
0.46
The relationship between the variables is a strong positive relationship.
-0.29
The relationship between the variables is a weak negative relationship.
-0.95
The relationship between the variables is a very strong negative relationship.
Problem 6: Calculate the correlation coefficient for the following data:
X = 10, 13, 15 ,17 ,19
and
Y = 5,10,15,20,25.
Solution:
Given variables are,
X = 10, 13, 15 ,17 ,19
and
Y = 5,10,15,20,25.
To find the correlation coefficient, first construct a table with columns for XY, X², and Y², then add up the values in each column to get the sums required by the formula.
| X | Y | XY | X² | Y² |
| --- | --- | --- | --- | --- |
| 10 | 5 | 50 | 100 | 25 |
| 13 | 10 | 130 | 169 | 100 |
| 15 | 15 | 225 | 225 | 225 |
| 17 | 20 | 340 | 289 | 400 |
| 19 | 25 | 475 | 361 | 625 |
| ∑x = 74 | ∑y = 75 | ∑xy = 1220 | ∑x² = 1144 | ∑y² = 1375 |
∑xy = 1220
∑x = 74
∑y = 75
∑x² = 1144
∑y² = 1375
n = 5
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [5(1220) - (74)(75)] / √{[5(1144) - (74)²][5(1375) - (75)²]}
R = 550 / √{[244][1250]}
R = 550 / 552.268
R = 0.9959
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 7: Find the value of the correlation coefficient from the following table:
| SUBJECT | AGE X | WEIGHT Y |
| --- | --- | --- |
| 1 | 40 | 99 |
| 2 | 25 | 79 |
| 3 | 22 | 69 |
| 4 | 54 | 89 |
Solution:
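Make a table from the given data and add three more columns for XY, X², and Y². Then add up the values in each column.
| SUBJECT | AGE X | WEIGHT Y | XY | X² | Y² |
| --- | --- | --- | --- | --- | --- |
| 1 | 40 | 99 | 3960 | 1600 | 9801 |
| 2 | 25 | 79 | 1975 | 625 | 6241 |
| 3 | 22 | 69 | 1518 | 484 | 4761 |
| 4 | 54 | 89 | 4806 | 2916 | 7921 |
| ∑ | 141 | 336 | 12259 | 5625 | 28724 |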
∑xy = 12259
∑x = 141
∑y = 336
∑x² = 5625
∑y² = 28724
n = 4
Put all the values in Pearson’s correlation coefficient formula:
R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
R = [4(12259) - (141)(336)] / √{[4(5625) - (141)²][4(28724) - (336)²]}
R = 1660 / √{[2619][2000]}
R = 1660 / 2288.668
R = 0.7253
It shows that the relationship between the variables of the data is a very strong positive relationship.
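Two sanity checks are useful in any calculation of this kind. Each bracketed term under the square root equals n² times a variance, so it can never be negative, and the final value of r must lie between -1 and +1. A result that breaks either rule points to an arithmetic slip in the sums. The sketch below builds both checks into a function; the name `checked_pearson_r` is hypothetical.

```python
from math import sqrt


def checked_pearson_r(xs, ys):
    """Pearson's r with checks that catch common hand-calculation errors."""
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    x_term = n * sum(x * x for x in xs) - sum(xs) ** 2
    y_term = n * sum(y * y for y in ys) - sum(ys) ** 2
    # n∑x² - (∑x)² is n² times the variance of x, so it cannot be negative.
    if x_term < 0 or y_term < 0:
        raise ValueError("negative term under the root: recheck the sums")
    if x_term == 0 or y_term == 0:
        raise ValueError("r is undefined when either variable is constant")
    r = numerator / sqrt(x_term * y_term)
    # Allow a tiny tolerance for floating-point rounding at the boundaries.
    if abs(r) > 1.0 + 1e-12:
        raise ValueError("r outside [-1, 1]: recheck the sums")
    return r


# Data from the Problem 7 table.
ages = [40, 25, 22, 54]
weights = [99, 79, 69, 89]
print(round(checked_pearson_r(ages, weights), 4))
```

With integer inputs the intermediate terms are exact, so the first check can only fail if the sums themselves were entered incorrectly elsewhere; with floating-point data a small tolerance may be needed there as well.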
Practice Problems
1. Calculate r for the data points: (1,2), (2,3), (3,5), (4,4), (5,6)
2. Find the correlation coefficient for: (0,1), (1,3), (2,4), (3,5), (4,8)
3. Determine r for: (-2,4), (-1,1), (0,0), (1,1), (2,4)
4. Calculate the correlation coefficient for: (1,10), (2,8), (3,6), (4,4), (5,2)
5. Find r for: (0,0), (1,2), (2,4), (3,6), (4,8)
6. Determine the correlation coefficient for: (1,1), (2,4), (3,9), (4,16), (5,25)
7. Calculate r for: (10,11), (20,22), (30,31), (40,43), (50,52)
8. Find the correlation coefficient for: (1,5), (2,7), (3,6), (4,8), (5,9)
9. Determine r for: (100,102), (200,199), (300,301), (400,398), (500,503)
10. Calculate the correlation coefficient for: (1,1), (2,1), (3,2), (4,3), (5,5)
Summary
The Linear Correlation Coefficient, also known as Pearson’s correlation coefficient, is a measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1, where -1 indicates a perfect negative linear relationship, 0 indicates no linear relationship, and 1 indicates a perfect positive linear relationship. The formula involves calculating the covariance of the two variables and dividing it by the product of their standard deviations. This coefficient is widely used in statistics, data analysis, and various scientific fields to quantify the degree of linear dependence between pairs of observations.
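The covariance form mentioned in the summary and the raw-sums form used in the worked problems give the same number. The sketch below compares the two on the Problem 4 data; the function names are illustrative, and the population versions (dividing by n) are used, although dividing by n - 1 throughout would give the same ratio.

```python
from math import sqrt


def pearson_from_covariance(xs, ys):
    """r = cov(x, y) / (std(x) * std(y)), using population statistics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return covariance / (std_x * std_y)


def pearson_from_sums(xs, ys):
    """r from the raw sums ∑x, ∑y, ∑xy, ∑x², ∑y²."""
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    x_term = n * sum(x * x for x in xs) - sum(xs) ** 2
    y_term = n * sum(y * y for y in ys) - sum(ys) ** 2
    return numerator / sqrt(x_term * y_term)


# Data from Problem 4.
xs = [12, 10, 42, 27, 35, 56]
ys = [13, 15, 56, 34, 65, 26]
print(round(pearson_from_covariance(xs, ys), 4))  # 0.5071
print(round(pearson_from_sums(xs, ys), 4))        # 0.5071
```

The raw-sums form is convenient for hand calculation, while the covariance form makes the meaning of the coefficient easier to see.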