ABM 401 Lesson 12
Lesson 12
Correlation: Meaning and Calculations
(Karl Pearson’s Method)
Objectives
INTRODUCTION
TYPES OF CORRELATION
Linear correlation is a correlation in which the regression line, the line that best
describes the relationship between the two variables, is a straight line, so that any
change in the magnitude of one variable is accompanied by a proportional change in the
magnitude of the other variable.
The data can be represented by the ordered pairs (x, y) where x is the independent or
explanatory variable and y is the dependent, or response variable.
Correlation can be: (i) positive, negative, or absent; (ii) linear or non-linear;
(iii) simple, partial, or multiple.
(i) Positive Correlation: When higher magnitudes on variable ‘Y’ occur along with
higher magnitudes on variable 'X' and the lower magnitudes on both also co-occur, then
they vary together positively, and we denote this situation as positive co-variation or
positive correlation. This can be shown in the following figure 1:
If x and y move closely together in the same direction, the variables x and y are said
to have a strong positive linear correlation. In this case, r is close to +1. This is
shown in the following figure 2.
(ii) Negative Correlation: The second possibility is that two variables vary inversely or
oppositely. That is, the higher magnitudes of variable 'Y' go along with the lower
magnitudes of variable 'X' and vice versa. Then, we denote this situation as negative
co-variation or negative correlation. This can be seen in the following figure 3:
If x and y move closely together in opposite directions, the variables x and y are said
to have a strong negative linear correlation. In this case, r is close to -1. This is
shown in the following figure 4.
Figure 3: No correlation
(iv) Linear Correlation: If the amount of change in one variable tends to bear a constant
ratio to the amount of change in the other variable, the correlation is said to be
linear. It can be positive or negative.
DEGREES OF CORRELATION
There are five degrees of correlation. These are: (i) Perfect correlation, (ii) High degree
correlation, (iii) Moderate degree correlation, (iv) Low degree correlation, and (v)
Absence of correlation. The numerical values of these are tabulated below.
Degree of correlation    Positive    Negative
Perfect                     +1          -1
Absence                      0           0
a linear relationship between two variables. The symbol r represents the sample
correlation coefficient." The most widely used mathematical method for measuring the
intensity or the magnitude of the linear relationship between two variables was
suggested by Karl Pearson. The formulae for calculating correlation in different
conditions are tabulated below.
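Direct Method (when deviations are taken from the actual mean):

$$ r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y} \quad \text{or} \quad r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} $$

Where: dx = X - (mean of X), dy = Y - (mean of Y), N is the number of pairs of
observations, and \sigma_x, \sigma_y are the standard deviations of the X and Y series.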
Example 1: Calculate coefficient of correlation between age of husband and age of wife
from the following data:
Age of Wife:     17 20 22 27 21 29 26 30 28 30
Age of Husband:  22 27 28 28 29 30 31 34 25 36
Solution:
  (X)    dx = X - 25    dx^2    (Y)    dy = Y - 29    dy^2    dxdy
   17        -8          64      22        -7          49      56
   20        -5          25      27        -2           4      10
   22        -3           9      28        -1           1       3
   27         2           4      28        -1           1      -2
   21        -4          16      29         0           0       0
   29         4          16      30         1           1       4
   26         1           1      31         2           4       2
   30         5          25      34         5          25      25
   28         3           9      25        -4          16     -12
   30         5          25      36         7          49      35
  250         0         194     290         0         150     121
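The totals in the last row of the table are all that the direct method needs. As a minimal sketch, assuming the standard direct-method formula r = Σdxdy / √(Σdx² · Σdy²), the calculation can be checked in Python. The function name `pearson_r` is illustrative and not part of the lesson.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r by the direct method (deviations from the actual means)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_dx2 = sum(d * d for d in dx)       # sum of squared X deviations
    sum_dy2 = sum(d * d for d in dy)       # sum of squared Y deviations
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))  # sum of products
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Data of Example 1
wife = [17, 20, 22, 27, 21, 29, 26, 30, 28, 30]
husband = [22, 27, 28, 28, 29, 30, 31, 34, 25, 36]

r_example1 = pearson_r(wife, husband)
print(round(r_example1, 2))
```

With Σdxdy = 121, Σdx² = 194 and Σdy² = 150, this gives r = 121 / √29100 ≈ 0.71, a positive correlation between the two ages.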
Example 2: Find out the correlation between the height of father and height of son from
the following data:
Height of Father (inches):  65 66 67 65 68 69 71 73
Height of Son (inches):     67 68 66 68 72 70 71 70
Solution:
  (X)    dx = X - 68    dx^2    (Y)    dy = Y - 69    dy^2    dxdy
   65        -3           9      67        -2           4       6
   66        -2           4      68        -1           1       2
   67        -1           1      66        -3           9       3
   65        -3           9      68        -1           1       3
   68         0           0      72         3           9       0
   69         1           1      70         1           1       1
   71         3           9      71         2           4       6
   73         5          25      70         1           1       5
  544         0          58     552         0          30      26
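Since the deviations here are taken from the actual means (68 and 69), the totals of the table can be substituted into the direct-method formula. The following is a worked sketch under that assumption, using only the column totals shown above:

```latex
r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}
  = \frac{26}{\sqrt{58 \times 30}}
  = \frac{26}{\sqrt{1740}}
  \approx \frac{26}{41.71}
  \approx 0.62
```

This indicates a positive correlation between the height of father and the height of son.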
                                                                  X Series    Y Series
Number of items                                                      15          15
Mean                                                                 25          18
Sum of squares of deviations from their respective means            136         138
Sum of products of deviations of X and Y series from their means         122
Here:
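With these summary figures, and assuming the direct-method formula applies because the deviations are taken from the respective means, the coefficient can be worked out as follows (a sketch of the arithmetic, not the lesson's own printed solution):

```latex
\sum dx\,dy = 122, \qquad \sum dx^2 = 136, \qquad \sum dy^2 = 138

r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}
  = \frac{122}{\sqrt{136 \times 138}}
  = \frac{122}{\sqrt{18768}}
  \approx \frac{122}{137.0}
  \approx 0.89
```

Note that the number of items (15) and the means are not needed once the sums of squares and products of deviations are given.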
Example 4: Calculate the coefficient of correlation between weight and income from the
following data. What are your conclusions?
Solution:
Weight (X)    dx = X - 150    dx^2    Income (Y)    dy = Y - 300    dy^2     dxdy
   120            -30          900       100           -200         40000    6000
   130            -20          400       200           -100         10000    2000
   140            -10          100       300              0             0       0
   150              0            0       400            100         10000       0
   160             10          100       500            200         40000    2000
   170             20          400       600            300         90000    6000
   N=6            -30         1900       N=6            300        190000   16000
Here the deviations are taken from assumed means, so the short-cut method is applied.
Since weight and income have no cause-and-effect relationship as such, calculating the
correlation between these two variables conveys no meaningful information.
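The short-cut calculation on the totals of the table can be sketched as below. This assumes the standard assumed-mean formula r = [N Σdxdy - Σdx Σdy] / √([N Σdx² - (Σdx)²] [N Σdy² - (Σdy)²]); the function name `pearson_r_shortcut` is illustrative and not taken from the lesson.

```python
from math import sqrt

def pearson_r_shortcut(n, sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy):
    """Pearson's r from deviations taken about assumed means."""
    # The correction terms (sum_dx * sum_dy, sum_dx**2, sum_dy**2) adjust for
    # the fact that the assumed means differ from the actual means.
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt((n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Column totals of Example 4 (assumed means: 150 for weight, 300 for income)
r_example4 = pearson_r_shortcut(
    n=6, sum_dx=-30, sum_dy=300, sum_dx2=1900, sum_dy2=190000, sum_dxdy=16000
)
print(r_example4)
```

The numerator is 6 × 16000 - (-30)(300) = 105000 and the denominator is √(10500 × 1050000) = 105000, so r = 1. Such a perfect coefficient shows only that the figures move together; by itself it does not establish any cause-and-effect relationship between weight and income.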
Example 5: Find out if there is any correlation between age & illiteracy from the
following information:
Solution: Firstly we should find the number of illiterates per thousand in each group.
 Age     M.V. (X)   dx (A=35)   dx^2   Illiterates per '000 (Y)   dy (A=77)   dy^2   dxdy
 0-10        5        -30        900             83                   6        36    -180
10-20       15        -20        400             75                  -2         4      40
20-30       25        -10        100             75                  -2         4      20
30-40       35          0          0             60                 -17       289       0
40-50       45         10        100             80                   3         9      30
50-60       55         20        400             67                 -10       100    -200
60-70       65         30        900            100                  23       529     690
 N=7                    0       2800            N=7                   1       971     400
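Because the deviations are taken from assumed means (A = 35 and A = 77), the totals can be substituted into the assumed-mean (short-cut) formula. The following is a worked sketch under that assumption, using only the totals in the last row of the table:

```latex
r = \frac{N \sum dx\,dy - \sum dx \sum dy}
         {\sqrt{\left[N \sum dx^2 - (\sum dx)^2\right]\left[N \sum dy^2 - (\sum dy)^2\right]}}
  = \frac{7 \times 400 - 0 \times 1}
         {\sqrt{\left[7 \times 2800 - 0^2\right]\left[7 \times 971 - 1^2\right]}}
  = \frac{2800}{\sqrt{19600 \times 6796}}
  \approx \frac{2800}{11541.3}
  \approx 0.24
```

On these figures there is only a low degree of positive correlation between age and illiteracy.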
After the coefficient of correlation has been calculated, the next step is to find out
the extent to which it is dependable. For this purpose the probable error of the
coefficient of correlation is calculated. Adding the probable error to, and subtracting
it from, the coefficient of correlation gives two limits within which the value of the
coefficient can reasonably be expected to lie. That is, if another random sample were
selected from the same universe, the coefficient of correlation between the two
variables in the new sample would be expected to fall within the limits so established.
The formula for calculating the probable error of Karl Pearson's coefficient of
correlation is:

$$ P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} $$

where r is the coefficient of correlation and N is the number of pairs of observations.
To make calculations easy, 2/3 may be used in place of 0.6745; the result changes only
slightly. The limits of r for any random sample from the universe are determined as
under:

$$ \rho = r \pm P.E. $$
If the value of r is less than the probable error, there is no evidence of correlation.
If the value of r is more than six times the probable error, the correlation is
significant.
If the probable error is small and the coefficient of correlation is 0.5 or more, the
correlation is generally considered to be significant.
Probable error as a measure for interpreting coefficient of correlation should be
used only when a sample study is being made and the sample is unbiased and
representative.
Probable error as a measure for interpreting coefficient of correlation should be
used only when the number of pairs of observations is large. If n is small probable
error may give misleading conclusions. In case n is small standard error is used.
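The rules above can be sketched in a few lines of Python, assuming the usual formula P.E. = 0.6745 (1 - r²) / √N. The function names, the use of the absolute value of r for negative coefficients, the "inconclusive" label for cases the rules do not cover, and the sample size N = 25 are all assumptions made for illustration.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of Pearson's r for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def interpret(r, n):
    """Return the probable error, the limits r - P.E. and r + P.E., and a verdict."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "no evidence of correlation"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        # Between P.E. and 6 x P.E.: not covered by the rules in the text.
        verdict = "inconclusive"
    return pe, (r - pe, r + pe), verdict

# Hypothetical illustration: r = 0.8 from an assumed N = 25 pairs
pe, limits, verdict = interpret(0.8, 25)
print(f"P.E. = {pe:.3f}, limits = ({limits[0]:.3f}, {limits[1]:.3f}), {verdict}")
```

For r = 0.8 and N = 25 the probable error is about 0.049, so six times the probable error is about 0.29 and r comfortably exceeds it.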
Solution:
Here: r = 0.8 is more than six times the PE, i.e. (0.049 × 6) = 0.294. As such coefficient
of correlation is significant.
(ii) First of all we shall find out the value of r with the help of P.E. and N given in the
problem.
COEFFICIENT OF DETERMINATION
The nature and the extent of the relationship between two variables are indicated by the
coefficient of correlation. An effective way of interpreting correlation is by way of
the coefficient of determination, which is defined as the ratio of the explained
variance to the total variance. If this ratio is multiplied by 100, it gives the
percentage of the variance in Y that is associated with the variance in X, or vice
versa. Thus,

$$ r^2 = \frac{\text{Explained variance}}{\text{Total variance}} $$
For example, if r = 0.6, then r² = 0.36, which multiplied by 100 gives 36%. This means
that 36% of the variance in the relative series is explained by the subject series, and
the remaining 64% of the variance is due to other factors.
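The link between r² and the ratio of explained to total variance can be checked numerically. The sketch below reuses the ages from Example 1 and assumes that "explained variance" means the variation of the values predicted by the least-squares regression line of Y on X; the variable names are illustrative.

```python
# Ages from Example 1: X = age of wife, Y = age of husband
wife = [17, 20, 22, 27, 21, 29, 26, 30, 28, 30]
husband = [22, 27, 28, 28, 29, 30, 31, 34, 25, 36]

n = len(wife)
mean_x = sum(wife) / n
mean_y = sum(husband) / n

sxx = sum((x - mean_x) ** 2 for x in wife)       # sum of squared X deviations
syy = sum((y - mean_y) ** 2 for y in husband)    # total variation in Y
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wife, husband))

# Least-squares regression line of Y on X
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in wife]

# Variation of the predicted values about the mean of Y
explained = sum((p - mean_y) ** 2 for p in predicted)

ratio = explained / syy              # explained variance / total variance
r_squared = sxy ** 2 / (sxx * syy)   # square of Pearson's r

print(round(ratio, 3), round(r_squared, 3))
```

Both quantities come to about 0.503, so on these data roughly 50% of the variance in the husbands' ages is associated with the variance in the wives' ages.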
SUMMARY
X 10 12 15 18 25 35 45 50 55 65
Y 5 7 13 15 20 21 29 30 36 44
Ans. = 0.98
8. Calculate the coefficient of correlation from the following data of marks obtained in
Commerce (X) and Economics (Y):
X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70
Ans. 0.611
9. From the following data find out if there is any relationship between density of
population and death rate:
11. What is the significance of the coefficient of correlation (r) for the following value
based on the number of observations (a) 50 and (b) 500. (r= 0.4)
SUGGESTED READINGS
Elhance DN: Fundamentals of Statistics
Gupta SP: Statistical Methods
Gupta BN: Statistics
Nagar KN: Fundamentals of Statistics
Varshney RD: Fundamentals of Statistics
Nagar AL: Fundamentals of Statistics