ABM 401 Lesson 12

Lesson 12
Correlation: Meaning and Calculations
(Karl Pearson’s Method)
Objectives

To enable students to understand:

Correlation: its meaning, and more precisely linear correlation;
Calculation of correlation by Karl Pearson’s method;
The concepts of probable error, standard error, and coefficient of determination.

INTRODUCTION

In statistics, correlation (often measured as a correlation coefficient) indicates the
strength and direction of a linear relationship between two random variables. In general
statistical usage, correlation or co-relation refers to the departure of two random
variables from independence. In lay terms, correlation is the relationship that
exists between two or more variables. For example:
Relationship between height and age.
Relationship between price and demand of commodity.
Relationship between dose of insulin and blood sugar.
Features of Correlation Coefficient:
The main characteristics of Correlation Coefficient are:
The sample correlation coefficient is represented by r (the population coefficient is represented by the lowercase Greek letter rho, ρ).
It ranges between -1 and +1.
The closer the value is to -1, the stronger the negative linear relationship.
The closer the value is to +1, the stronger the positive linear relationship.
The closer the value is to 0, the weaker the linear relationship.
Uses of Correlation:
Economic theory and business studies examine relationships between variables such as price and
quantity demanded.
Correlation analysis helps in deriving precisely the degree and the directions of a
relationship.


The effect of correlation is to reduce the range of uncertainty of our prediction.


It gives the direction as well as the degree of the relationship between the variables.
Helps in estimating the value of the dependent variable from the known values of the
independent variable.
Limitations
It assumes a linear relationship between the variables.
Its calculation is time-consuming.
It is affected by extreme values.

TYPES OF CORRELATION

Linear correlation is a correlation in which the regression line, the line that best describes
the relationship between the two variables, is a straight line, so that for any change in the
magnitude of one variable there tends to be a proportional change in the magnitude of the other
variable. The data can be represented by the ordered pairs (x, y), where x is the independent or
explanatory variable and y is the dependent or response variable.

Correlation can be: (i) Positive, negative and absence of correlation; (ii) Linear or non-
linear; (iii) Simple, partial, and Multiple.
(i) Positive Correlation: When higher magnitudes on variable ‘Y’ occur along with
higher magnitudes on variable 'X' and the lower magnitudes on both also co-occur, then
they vary together positively, and we denote this situation as positive co-variation or
positive correlation. This can be shown in the following figure 1:

Figure 1: Positive Linear Correlation


If the plotted points lie close to one another along an upward-sloping line, so that x and y
move in the same direction, the variables x and y are said to have a strong positive linear
correlation. In this case, r is close to +1. This is shown in the following figure 2.

Figure 2: Strong positive correlation (r = 0.81)

(ii) Negative Correlation: The second possibility is that two variables vary inversely or
oppositely. That is, the higher magnitudes of variable 'Y' go along with the lower
magnitudes of variable 'X' and vice versa. Then, we denote this situation as negative
co-variation or negative correlation. This can be seen in the following figure 3:

Figure 3: Negative Linear Correlation


If the plotted points lie close to one another along a downward-sloping line, so that x and y
move in opposite directions, the variables x and y are said to have a strong negative linear
correlation. In this case, r is close to -1. This is shown in the following figure 4.

Figure 4: Strong negative correlation (r = -0.92)

(iii) Absence of Correlation: If there is no linear correlation, or only a very weak one, r is
close to 0 and this is called absence of correlation. This is shown in the following figure 5.

Figure 5: No correlation

(iv) Linear Correlation: If the amount of change in one variable tends to bear a constant
ratio to the amount of change in the other variable, the correlation is said to be linear. It
can be positive or negative.


(v) Nonlinear Correlation: Correlation is said to be nonlinear or curvilinear if the
amount of change in one variable does not tend to bear a constant ratio to the amount of
change in the other variable. It can also be positive or negative.
(vi) Simple Correlation: When only two variables are studied, it is a case of simple
correlation. In other words, if we study the correlation between variables X and Y, it is
called simple correlation.
(vii) Partial Correlation: When we study the correlation between two variables, one dependent
and one independent, while leaving aside other variables that are also correlated with the
variable under study, it is called partial correlation. For example, X is an independent
variable and Y1, Y2, Y3, and Y4 are dependent variables. Here, if we study the correlation
between variables X and Y1 only, it is called partial correlation.
(viii) Multiple Correlation: When three or more variables are studied simultaneously, it is
called multiple correlation, e.g. the correlation between X and Y1, Y2, Y3, and so on.

DEGREES OF CORRELATION

There are five degrees of correlation. These are: (i) Perfect correlation, (ii) High degree
correlation, (iii) Moderate degree correlation, (iv) Low degree correlation, and (v)
Absence of correlation. The numerical values of these are tabulated below.

Degree      Positive          Negative
Perfect     +1                -1
High        +0.75 to +1       -0.75 to -1
Moderate    +0.25 to +0.75    -0.25 to -0.75
Low         0 to +0.25        0 to -0.25
Absence     0                 0
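
The table of degrees can be turned into a small lookup. The following Python sketch is only an illustration: the function name is hypothetical, and because the table's ranges share their end points (0.75 and 0.25), it assumes that a boundary value belongs to the higher degree.

```python
def degree_of_correlation(r):
    """Return the degree label from the table above for a coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "absence of correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1:
        degree = "perfect"
    elif size >= 0.75:   # assumption: 0.75 itself is counted as high
        degree = "high"
    elif size >= 0.25:   # assumption: 0.25 itself is counted as moderate
        degree = "moderate"
    else:
        degree = "low"
    return f"{degree} {sign} correlation"


# The r values quoted with figures 2 and 4, plus a few others.
for r in (0.81, -0.92, 0.6, 0.1, 0.0):
    print(r, "->", degree_of_correlation(r))
```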

KARL PEARSON’S COEFFICIENT OF CORRELATION


Interpreting correlation using a scatter plot shown above can be subjective. A more
precise way to measure the type and strength of a linear correlation between two
variables is to calculate the correlation coefficient. To explain correlation coefficient we
can say that "The correlation coefficient is a measure of the strength and the direction of


a linear relationship between two variables. The symbol r represents the sample
correlation coefficient." The most widely used mathematical method for measuring the
intensity or the magnitude of the linear relationship between two variables was
suggested by Karl Pearson. The formulae for calculating correlation in different
conditions are tabulated below.

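FORMULAE FOR CALCULATING KARL PEARSON’S COEFFICIENT OF CORRELATION

Direct Method (when deviations are taken from the actual mean):

$$r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y} \quad \text{or} \quad r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$

Short Cut Method (when deviations are taken from an assumed mean):

$$r = \frac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\;\sqrt{N\sum dy^2 - (\sum dy)^2}}$$

In Case of Grouped Data or Bi-variate Distribution:

$$r = \frac{N\sum f\,dx\,dy - \sum f\,dx \sum f\,dy}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\;\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}}$$

Where:
dx, dy = deviations of the X and Y values from their actual (or assumed) means;
N = number of pairs of observations (in grouped data, the total frequency);
$\sigma_x$, $\sigma_y$ = standard deviations of the X and Y series;
f = frequency of the cell in a bi-variate table.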
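
As a rough illustration of the direct method, the Python sketch below takes deviations from the actual means and divides the sum of their products by the square root of the product of the sums of squared deviations. The function name `pearson_r_direct` is hypothetical, and the data are the ages of wives and husbands used in Example 1.

```python
import math


def pearson_r_direct(xs, ys):
    """Karl Pearson's r with deviations taken from the actual means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when a series has no variation")
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)


wife_age = [17, 20, 22, 27, 21, 29, 26, 30, 28, 30]
husband_age = [22, 27, 28, 28, 29, 30, 31, 34, 25, 36]
print(round(pearson_r_direct(wife_age, husband_age), 3))
```

The function rejects a series with no variation, because the denominator would then be zero and r is not defined.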


Example 1: Calculate coefficient of correlation between age of husband and age of wife
from the following data:
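Age of Wife      17  20  22  27  21  29  26  30  28  30
Age of Husband   22  27  28  28  29  30  31  34  25  36

Solution:
  (X)    dx    dx²    (Y)    dy    dy²    dxdy
   17    -8     64     22    -7     49      56
   20    -5     25     27    -2      4      10
   22    -3      9     28    -1      1       3
   27     2      4     28    -1      1      -2
   21    -4     16     29     0      0       0
   29     4     16     30     1      1       4
   26     1      1     31     2      4       2
   30     5     25     34     5     25      25
   28     3      9     25    -4     16     -12
   30     5     25     36     7     49      35
  250     0    194    290     0    150     121

Here deviations are taken from the actual means, which are $\bar{X} = 250/10 = 25$ and
$\bar{Y} = 290/10 = 29$.

Now, by formula:

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{121}{\sqrt{194 \times 150}} = \frac{121}{170.59} = 0.709$$

The same value is obtained from the other form of the formula, with $\sigma_x = \sqrt{194/10} = 4.405$
and $\sigma_y = \sqrt{150/10} = 3.873$:

$$r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y} = \frac{121}{10 \times 4.405 \times 3.873} = 0.709$$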


Example 2: Find out the correlation between the height of father and height of son from
the following data:
Height of Father (inches)   65  66  67  65  68  69  71  73
Height of Son (inches)      67  68  66  68  72  70  71  70

Solution:
  (X)   dx (68)   dx²    (Y)   dy (69)   dy²   dxdy
   65     -3       9      67     -2       4      6
   66     -2       4      68     -1       1      2
   67     -1       1      66     -3       9      3
   65     -3       9      68     -1       1      3
   68      0       0      72      3       9      0
   69      1       1      70      1       1      1
   71      3       9      71      2       4      6
   73      5      25      70      1       1      5
  544      0      58     552      0      30     26

Here, $\bar{X} = 544/8 = 68$ and $\bar{Y} = 552/8 = 69$. Now, by formula:

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{26}{\sqrt{58 \times 30}} = \frac{26}{41.71} = 0.623$$

Example 3: From the given information calculate coefficient of correlation

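                                                                    X Series   Y Series
Number of Items                                                        15         15
Mean                                                                   25         18
Sum of squares of deviations from their respective means              136        138
Sum of products of deviations of X and Y series from their means          122

Here: $\sum dx^2 = 136$, $\sum dy^2 = 138$ and $\sum dx\,dy = 122$, so

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137.0} = 0.89$$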


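Example 4: Calculate the coefficient of correlation between weight and income from the
following data. What are your conclusions?

Weight (Kg.)   120  130  140  150  160  170
Income (Rs.)   100  200  300  400  500  600

Solution:
Weight (X)   dx (150)    dx²    Income (Y)   dy (300)      dy²     dxdy
   120         -30       900       100         -200       40000    6000
   130         -20       400       200         -100       10000    2000
   140         -10       100       300            0           0       0
   150           0         0       400          100       10000       0
   160          10       100       500          200       40000    2000
   170          20       400       600          300       90000    6000
   N=6         -30      1900       N=6          300      190000   16000

Here the deviations have been taken from assumed means (150 for weight and 300 for income).
So, we will apply the short cut method:

$$r = \frac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\;\sqrt{N\sum dy^2 - (\sum dy)^2}} = \frac{6 \times 16000 - (-30)(300)}{\sqrt{6 \times 1900 - (-30)^2}\;\sqrt{6 \times 190000 - (300)^2}} = \frac{105000}{\sqrt{10500}\,\sqrt{1050000}} = +1$$

Although r = +1 indicates perfect positive correlation in these data, weight and income have
no cause and effect relationship as such, so calculating the correlation between these two
variables makes no sense.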
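
The short cut method can be sketched in Python as below. The function name `pearson_r_shortcut` and its parameters are hypothetical. The sketch applies the assumed-mean formula r = (NΣdxdy − ΣdxΣdy) / (√(NΣdx² − (Σdx)²) × √(NΣdy² − (Σdy)²)) to the weight and income data of Example 4, and then repeats the calculation with different assumed means to show that the choice of assumed mean does not change r.

```python
import math


def pearson_r_shortcut(xs, ys, assumed_x, assumed_y):
    """Karl Pearson's r with deviations taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
                   * math.sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator


weight = [120, 130, 140, 150, 160, 170]
income = [100, 200, 300, 400, 500, 600]

r_a = pearson_r_shortcut(weight, income, 150, 300)   # assumed means of Example 4
r_b = pearson_r_shortcut(weight, income, 140, 400)   # any other assumed means
print(round(r_a, 4), round(r_b, 4))
```

Both calls give the same coefficient, up to floating-point rounding, which is why any convenient value may be used as the assumed mean.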

Example 5: Find out if there is any correlation between age & illiteracy from the
following information:

Age 0-10 10-20 20-30 30-40 40-50 50-60 60-70


Population (thousands) 120 100 80 50 25 15 5
Illiterates (hundreds) 100 75 60 30 20 10 5


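Solution: First we find the number of illiterates per thousand of population in each age group,
e.g. (i) 10000/120 = 83, (ii) 7500/100 = 75, (iii) 6000/80 = 75, and so on.

 Age     M.V. (X)   dx (A=35)    dx²    Illiterates per '000 (Y)   dy (A=77)    dy²    dxdy
 0-10       5         -30        900              83                    6        36    -180
10-20      15         -20        400              75                   -2         4      40
20-30      25         -10        100              75                   -2         4      20
30-40      35           0          0              60                  -17       289       0
40-50      45          10        100              80                    3         9      30
50-60      55          20        400              67                  -10       100    -200
60-70      65          30        900             100                   23       529     690
 N=7                    0       2800                                    1       971     400

As deviations are taken from assumed means, the short cut method applies:

$$r = \frac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\;\sqrt{N\sum dy^2 - (\sum dy)^2}} = \frac{7 \times 400 - (0)(1)}{\sqrt{7 \times 2800 - 0^2}\;\sqrt{7 \times 971 - 1^2}} = \frac{2800}{140 \times 82.44} = 0.243$$

There is thus a low degree of positive correlation between age and illiteracy.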
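
The two steps of this solution, converting the raw figures to illiterates per thousand and then applying the assumed-mean formula, can be sketched in Python as follows. The variable names are hypothetical; the rates are rounded to whole numbers as in the table, and the assumed means 35 and 77 are those shown in the column headings.

```python
import math

mid_values = [5, 15, 25, 35, 45, 55, 65]               # mid-points of the age groups
population_thousands = [120, 100, 80, 50, 25, 15, 5]
illiterates_hundreds = [100, 75, 60, 30, 20, 10, 5]

# Illiterates per thousand of population: (hundreds x 100) / thousands.
per_thousand = [round(i * 100 / p)
                for i, p in zip(illiterates_hundreds, population_thousands)]

n = len(mid_values)
dx = [x - 35 for x in mid_values]      # deviations from assumed mean A = 35
dy = [y - 77 for y in per_thousand]    # deviations from assumed mean A = 77
sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

r = (n * sum_dxdy - sum_dx * sum_dy) / (
    math.sqrt(n * sum_dx2 - sum_dx ** 2) * math.sqrt(n * sum_dy2 - sum_dy ** 2)
)
print(per_thousand)
print(sum_dx, sum_dx2, sum_dy, sum_dy2, sum_dxdy)
print(round(r, 3))
```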

PROBABLE ERROR OF COEFFICIENT OF CORRELATION

After the coefficient of correlation has been calculated, the next step is to find out the
extent to which it is dependable. For this purpose the probable error of the coefficient of
correlation is calculated. If the probable error is added to and subtracted from the
coefficient of correlation, it gives two limits within which we can reasonably expect the
value of the coefficient of correlation to vary. It means that if another random sample were
selected from the same universe, the coefficient of correlation between the two variables in
the new sample would be expected to fall within the limits so established. The formula for
calculating the probable error of Karl Pearson's coefficient of correlation is:

$$\text{P.E.}_r = 0.6745 \times \frac{1 - r^2}{\sqrt{N}}$$

where r is the coefficient of correlation and N is the number of pairs of observations.

To make calculations easy, we may use 2/3 in place of 0.6745. It will not materially affect
the result. The limits of the correlation in the universe, for any random sample drawn from
it, shall be determined as under:

$$\rho = r \pm \text{P.E.}_r$$

Properties of Probable Error:

If the value of r is less than the probable error, there is no evidence of correlation.
If the value of r is more than six times the probable error, the correlation is
significant.
If the probable error is small and the coefficient of correlation is 0.5 or more, it
is generally considered to be significant.
Probable error as a measure for interpreting coefficient of correlation should be
used only when a sample study is being made and the sample is unbiased and
representative.
Probable error as a measure for interpreting coefficient of correlation should be
used only when the number of pairs of observations is large. If n is small probable
error may give misleading conclusions. In case n is small standard error is used.

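Example 6: Calculate the probable error if the coefficient of correlation is 0.92 and the
number of pairs of items is 25.

Solution:

$$\text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.92)^2}{\sqrt{25}} = 0.6745 \times \frac{0.1536}{5} = 0.021$$

Maximum limit = r + P.E. or 0.92 + 0.021 = 0.941
Minimum limit = r - P.E. or 0.92 - 0.021 = 0.899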
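
The probable error and the two limits can be computed with a few lines of Python. This is a minimal sketch; the function names `probable_error` and `correlation_limits` are hypothetical, and it uses the formula P.E. = 0.6745 × (1 − r²)/√N with the figures of Example 6.

```python
import math


def probable_error(r, n):
    """Probable error of Karl Pearson's r for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def correlation_limits(r, n):
    """Return (minimum limit, maximum limit), i.e. r - P.E. and r + P.E."""
    pe = probable_error(r, n)
    return r - pe, r + pe


pe = probable_error(0.92, 25)
low, high = correlation_limits(0.92, 25)
print(round(pe, 3), round(low, 3), round(high, 3))
```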


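Example 7: Comment on the significance of correlation if:
(i) N = 25, r = 0.8; and
(ii) N = 100, PE = 0.04.
Solution:
(i) Given N = 25, r = 0.8

$$\text{P.E.} = 0.6745 \times \frac{1 - (0.8)^2}{\sqrt{25}} = 0.6745 \times \frac{0.36}{5} = 0.049$$

Here r = 0.8 is more than six times the PE, i.e. (0.049 × 6) = 0.294. As such the coefficient
of correlation is significant.
(ii) First of all we shall find out the value of r with the help of the P.E. and N given in
the problem.

$$0.04 = 0.6745 \times \frac{1 - r^2}{\sqrt{100}} \;\Rightarrow\; 1 - r^2 = \frac{0.04 \times 10}{0.6745} = 0.593 \;\Rightarrow\; r = \sqrt{0.407} = 0.638$$

Here r = 0.638 is more than six times the PE, i.e. (0.04 × 6) = 0.24. As such the coefficient
of correlation is significant.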
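
Both parts of this example can be checked with a short Python sketch. The function names are hypothetical. The two decision rules are those listed under Properties of Probable Error; the label "inconclusive" for values of r between one and six times the probable error is an assumption of this sketch, since the lesson states no rule for that range.

```python
import math


def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def r_from_probable_error(pe, n):
    """Solve the probable error formula for r (non-negative root)."""
    r_squared = 1 - pe * math.sqrt(n) / 0.6745
    if not 0 <= r_squared <= 1:
        raise ValueError("no valid r for this P.E. and N")
    return math.sqrt(r_squared)


def significance(r, pe):
    """Apply the lesson's two rules; the middle label is this sketch's own."""
    if abs(r) < pe:
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"   # assumption: the lesson gives no rule here


# (i) N = 25, r = 0.8
pe_i = probable_error(0.8, 25)
print(round(pe_i, 3), significance(0.8, pe_i))

# (ii) N = 100, P.E. = 0.04
r_ii = r_from_probable_error(0.04, 100)
print(round(r_ii, 3), significance(r_ii, 0.04))
```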

COEFFICIENT OF DETERMINATION

The nature and the extent of the relationship between two variables are indicated by the
coefficient of correlation. An effective way of interpreting correlation is by way of the
coefficient of determination. The coefficient of determination is defined as the ratio of
the explained variance to the total variance. If this ratio is multiplied by 100, it gives
the percentage of the variance in Y which is associated with the variance in X, or
vice versa. Thus,

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

For example, if r = 0.6, then r² = 0.36. If it is multiplied by 100, it will be 36%. It means
36% of the variance in the relative series has been explained by the subject series and
the remaining 64% of the variance is due to other factors.
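
To see why r² can be read as the ratio of explained variance to total variance, the Python sketch below fits a least-squares straight line of Y on X and compares the explained share of the variation in Y with the square of r. The least-squares fit is brought in here only for illustration and is not part of the lesson's method; the function names are hypothetical and the data are the ages from Example 1.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def explained_share(xs, ys):
    """Explained variation / total variation for a least-squares line of Y on X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - mean_y) ** 2 for f in fitted)   # variation explained by the line
    total = sum((y - mean_y) ** 2 for y in ys)           # total variation in Y
    return explained / total


wife_age = [17, 20, 22, 27, 21, 29, 26, 30, 28, 30]
husband_age = [22, 27, 28, 28, 29, 30, 31, 34, 25, 36]

r = pearson_r(wife_age, husband_age)
share = explained_share(wife_age, husband_age)
print(round(r ** 2, 4), round(share, 4))
```

The two printed values agree, so for these data about half of the variance in the husbands' ages is associated with the variance in the wives' ages.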

SUMMARY

The coefficient of correlation is calculated to study the extent or degree of correlation
between two variables. The existence of correlation between two variables does
not mean that their relationship is functional or constant. If the value of one variable is
known, it is not always possible to obtain the exact value of the other variable. This can
be done only where there is a functional relationship between the two variables. Karl Pearson,
the great statistician, has given a formula for the calculation of the coefficient of correlation.
According to it, the coefficient of correlation of two variables is obtained by dividing the
sum of the products of the corresponding deviations of the various items of the two
series from their respective means by the product of their standard deviations and the
number of pairs of observations.
REVIEW QUESTIONS

1. What is meant by correlation? Explain its types, merits and demerits.


2. What is correlation? Explain its features, uses and degrees.
3. How do we calculate Karl Pearson’s coefficient of correlation? Explain its methods in
detail.
4. Write a detailed note on meaning, properties and uses of probable error.
5. Calculate the correlation coefficient for the advertising expenditures and company
sales data. (Figures in Lakh Rs.)
Advertising Exp. 2.4 1.6 2.0 2.6 1.4 1.6 2.0 2.2
Sales 225 184 220 240 180 184 186 215
Ans. = 0.913

6. Find the coefficient of correlation from the following table:


X 300 350 400 450 500 550 600 650 700
Y 800 900 1000 1100 1200 1300 1400 1500 1600
Ans. = 1


7. Calculate the coefficient of correlation from the following data:

X 10 12 15 18 25 35 45 50 55 65
Y 5 7 13 15 20 21 29 30 36 44
Ans. = 0.98
8. Calculate the coefficient of correlation from the following data of marks obtained in
Commerce (X) and Economics (Y):

X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70
Ans. 0.611
9. From the following data find out if there is any relationship between density of
population and death rate:

Zone Area (Sq. Km.) Population No. of Deaths


P 200 40,000 480
Q 150 75,000 1200
R 120 72,000 1080
S 80 20,000 280

10. If r = 0.82, N = 10, calculate the probable Error. Ans. = 0.07

11. What is the significance of the coefficient of correlation r = 0.4 when it is
based on (i) 50 and (ii) 500 observations?

Ans. = (i) P.E. = 0.08, r not significant; (ii) P.E. = 0.025, r is significant

SUGGESTED READINGS
Elhance DN: Fundamentals of Statistics
Gupta SP: Statistical Methods
Gupta BN: Statistics
Nagar KN: Fundamentals of Statistics
Varshney RD: Fundamentals of Statistics
Nagar AL: Fundamentals of Statistics
