Correlation Analysis

Uploaded by

Suravi Baid

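Definition. The mean of the product of the deviation scores $(x_i - \bar{x})$ and $(y_i - \bar{y})$ is called the covariance of X and Y, i.e.

$$\mathrm{Cov}(X, Y) \text{ or } C_{xy} = \frac{1}{N}\sum (x_i - \bar{x})(y_i - \bar{y}) \qquad \ldots(i)$$

It is easy to see that

$$\mathrm{Cov}(X, Y) = \frac{1}{N}\left(\sum x_i y_i - \frac{1}{N}\sum x_i \sum y_i\right) \qquad \ldots(ii)$$

If $x_i - \bar{x}$, $y_i - \bar{y}$ are small fractionless numbers, it is easy to use formula (i); if x, y are small numbers, it is easier to calculate Cov(X, Y) using formula (ii); in other cases, we can assume means A and B, use $u = x - A$, $v = y - B$, then

$$\mathrm{Cov}(X, Y) = \frac{1}{N}\left(\sum uv - \frac{1}{N}\sum u \sum v\right) \qquad \ldots(iii)$$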
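The three ways of computing covariance described above can be compared numerically. The following is a minimal Python sketch, not part of the text; the function names are our own, the divisor is N (not N - 1) as in the definition, and the data set is a small illustrative one.

```python
def cov_deviation(xs, ys):
    """Formula (i): mean of the products of deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def cov_raw(xs, ys):
    """Formula (ii): (1/N) * (sum(xy) - sum(x) * sum(y) / N)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / n


def cov_assumed_mean(xs, ys, a, b):
    """Formula (iii): shift by assumed means a and b, then use formula (ii)."""
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    return cov_raw(us, vs)


xs = [3, 4, 5, 6, 7]
ys = [8, 7, 6, 5, 4]
# All three values agree; any choice of assumed means gives the same result.
print(cov_deviation(xs, ys), cov_raw(xs, ys), cov_assumed_mean(xs, ys, 4, 7))
```

The assumed means 4 and 7 are deliberately not the true means, to illustrate that covariance does not depend on the choice of origin.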

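ILLUSTRATIVE EXAMPLES

Example 1. Find the covariance between x and y when $\sum x = 50$, $\sum y = -30$, $\sum xy = 115$ and n = 10.

Solution. Given $\sum x = 50$, $\sum y = -30$, $\sum xy = 115$ and n = 10.

$$\mathrm{Cov}(x, y) = \frac{1}{n}\left(\sum xy - \frac{1}{n}\sum x \sum y\right) = \frac{1}{10}\left(115 - \frac{1}{10} \times 50 \times (-30)\right) = \frac{1}{10}(115 + 150) = \frac{265}{10} = 26.5$$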
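Example 2. Find Cov(X, Y) between X and Y if $\sum u_i v_i = 50$ and n = 15, where $u_i$ and $v_i$ are deviations of the X- and Y-series from their respective means.

Solution. Given that $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$ and $\sum u_i v_i = 50$, i.e. $\sum (x_i - \bar{x})(y_i - \bar{y}) = 50$.

Here, n = 15.

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{50}{15} = \frac{10}{3} = 3.33 \text{ (approx.)}$$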

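Example 3. If $\sum (x_i - 2) = 10$, $\sum (y_i - 5) = 20$, $\sum x_i y_i = 148$ and n = 5, find Cov(x, y).

Solution. Given n = 5, $\sum x_i y_i = 148$.

$\sum (x_i - 2) = 10 \Rightarrow \sum x_i - 2n = 10 \Rightarrow \sum x_i - 2 \times 5 = 10 \Rightarrow \sum x_i = 20$,

$\sum (y_i - 5) = 20 \Rightarrow \sum y_i - 5n = 20 \Rightarrow \sum y_i - 5 \times 5 = 20 \Rightarrow \sum y_i = 45$.

$$\mathrm{Cov}(x, y) = \frac{1}{n}\left(\sum xy - \frac{1}{n}\sum x \sum y\right) = \frac{1}{5}\left(148 - \frac{1}{5} \times 20 \times 45\right) = \frac{1}{5}(148 - 180) = -\frac{32}{5} = -6.4$$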
UNDERSTANDING ISC MATHEMATICS-XI
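Example 4. If $\sum x_i = 15$, $\sum y_i = 40$, $\sum (x_i - 2)(y_i - 3) = 20$ and n = 10, find Cov(x, y).

Solution. Given $\sum x_i = 15$, $\sum y_i = 40$, n = 10.

$\sum (x_i - 2)(y_i - 3) = 20$

$\Rightarrow \sum x_i y_i - 3\sum x_i - 2\sum y_i + (2 \times 3)n = 20$

$\Rightarrow \sum x_i y_i - 3 \times 15 - 2 \times 40 + 6 \times 10 = 20$

$\Rightarrow \sum x_i y_i = 20 + 45 + 80 - 60 = 85$.

$$\mathrm{Cov}(x, y) = \frac{1}{n}\left(\sum xy - \frac{1}{n}\sum x \sum y\right) = \frac{1}{10}\left(85 - \frac{1}{10} \times 15 \times 40\right) = \frac{1}{10}(85 - 60) = \frac{25}{10} = 2.5$$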
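When only the totals $\sum x$, $\sum y$, $\sum xy$ and n are known, as in Examples 1, 3 and 4, formula (ii) is a one-line computation. The sketch below is an illustrative check in Python; the function name `cov_from_sums` is our own and is not taken from the text.

```python
def cov_from_sums(n, sum_x, sum_y, sum_xy):
    """Covariance from totals: (1/n) * (sum_xy - sum_x * sum_y / n)."""
    return (sum_xy - sum_x * sum_y / n) / n


example_1 = cov_from_sums(10, 50, -30, 115)  # worked value in the text: 26.5
example_3 = cov_from_sums(5, 20, 45, 148)    # worked value in the text: -6.4
example_4 = cov_from_sums(10, 15, 40, 85)    # worked value in the text: 2.5
print(example_1, example_3, example_4)
```

In Examples 3 and 4 the totals passed in (20, 45 and 85) are the ones derived in the solutions, not the shifted sums given in the questions.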

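Example 5. Find Cov(X, Y) for the following data:

| X | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|
| Y | 8 | 7 | 6 | 5 | 4 |

Solution. Here N = 5. Construct the following table:

|    |    |    |    |    |    | Total |
|----|----|----|----|----|----|-------|
| x  | 3  | 4  | 5  | 6  | 7  | 25    |
| y  | 8  | 7  | 6  | 5  | 4  | 30    |
| xy | 24 | 28 | 30 | 30 | 28 | 140   |

Here $\sum x = 25$, $\sum y = 30$, $\sum xy = 140$.

$$\therefore \mathrm{Cov}(X, Y) = \frac{1}{N}\left(\sum xy - \frac{1}{N}\sum x \sum y\right) = \frac{1}{5}\left(140 - \frac{1}{5} \times 25 \times 30\right) = \frac{1}{5}(140 - 150) = -2$$

Hence, we see a negative relation between X and Y.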

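Example 6. Compute Cov(X, Y) for the following pairs of observations:

(15, 44), (20, 43), (25, 45), (30, 37), (40, 34), (50, 37).

Solution. Here $\bar{x}$ = mean of values of X-variable

$$= \frac{15 + 20 + 25 + 30 + 40 + 50}{6} = \frac{180}{6} = 30,$$

and $\bar{y}$ = mean of values of Y-variable

$$= \frac{44 + 43 + 45 + 37 + 34 + 37}{6} = \frac{240}{6} = 40.$$

Construct the following table:

| x  | $x - \bar{x}$ | y  | $y - \bar{y}$ | $(x - \bar{x})(y - \bar{y})$ |
|----|-----|----|----|------|
| 15 | -15 | 44 | 4  | -60  |
| 20 | -10 | 43 | 3  | -30  |
| 25 | -5  | 45 | 5  | -25  |
| 30 | 0   | 37 | -3 | 0    |
| 40 | 10  | 34 | -6 | -60  |
| 50 | 20  | 37 | -3 | -60  |
| Total |  |    |    | -235 |

$$\text{So } \mathrm{Cov}(X, Y) = \frac{1}{N}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{6}(-235) = -39.17$$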
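Example 7. Calculate the covariance for the following bivariate data:

| X | 11 | 12 | 13 | 14 | 15 | 17 | 18 | 19 | 20 | 21 |
|---|----|----|----|----|----|----|----|----|----|----|
| Y | 14 | 8  | 12 | 21 | 19 | 19 | 23 | 22 | 17 | 25 |

Solution. Assume mean A = 16 for the X-variate and B = 19 for the Y-variate.

| x  | $u = x - 16$ | y  | $v = y - 19$ | uv  |
|----|----|----|-----|-----|
| 11 | -5 | 14 | -5  | 25  |
| 12 | -4 | 8  | -11 | 44  |
| 13 | -3 | 12 | -7  | 21  |
| 14 | -2 | 21 | 2   | -4  |
| 15 | -1 | 19 | 0   | 0   |
| 17 | 1  | 19 | 0   | 0   |
| 18 | 2  | 23 | 4   | 8   |
| 19 | 3  | 22 | 3   | 9   |
| 20 | 4  | 17 | -2  | -8  |
| 21 | 5  | 25 | 6   | 30  |
| Total | 0 |  | -10 | 125 |

$$\mathrm{Cov}(X, Y) = \frac{1}{N}\left(\sum uv - \frac{1}{N}\sum u \sum v\right) = \frac{1}{10}\left(125 - \frac{1}{10} \times 0 \times (-10)\right) = 12.5$$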

EXERCISE 2.1

1. Find Cov(X, Y) between x and y when $\sum x_i = 50$, $\sum y_i = 30$, $\sum x_i y_i = -115$ and n = 10.

2. Find covariance between x and y when $\sum x_i = 52$, $\sum y_i = 60$, $\sum x_i y_i = 274$ and n = 10.

3. If $\sum x_i = 100$, $\sum y_i = 140$, $\sum (x_i - 10)(y_i - 15)$ … 10, find covariance between x and y.

4. Find Cov(X, Y) between X and Y if $\sum u_i v_i = 55$ and n = 11, where $u_i$ and $v_i$ are deviations of X and Y series from their respective means.

5. Find the covariance of the data given below:
(1, 5), (2, 7), (3, 9), (4, 11), (5, 10), (6, 9), (7, 8), (8, 7), (9, 6), (10, 5).

6. Calculate the covariance of the following bivariate data:

| X | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|---|----|----|----|----|----|----|----|----|----|----|----|----|
| Y | 78 | 72 | 66 | 60 | 54 | 48 | 42 | 36 | 30 | 24 | 18 | 12 |

7. Calculate the covariance of observations (3, 5), (6, 7), (9, 9), (12, 11), (15, 13), (18, 15), (21, 17), (24, 19) using assumed means A = 13 and B = 12.

8. Calculate covariance for the following data:

X: 1 3 4 6 7 8
Y: 16 9 4 1 1 4 9 16

9. Prove that though covariance is independent of the choice of origin, it depends upon the scale. If u = ax + b, v = cy + d, show that cov(u, v) = a.c. cov(x, y).
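2.4 KARL PEARSON'S COEFFICIENT OF CORRELATION

Though covariance is independent of the choice of the origin, it depends on the scale of measurement. To standardise it further, we use the following formula for (Karl Pearson's) coefficient of correlation (sometimes called Product Moment Correlation).

$$r \text{ or } \rho(x, y) = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}\,x}\,\sqrt{\mathrm{Var}\,y}}$$

It is easy to see that if u = ax + b and v = cy + d (with a and c non-zero), then

$$\rho(u, v) = \frac{\mathrm{Cov}(u, v)}{\sigma_u \sigma_v} = \frac{\frac{1}{N}\sum (u - \bar{u})(v - \bar{v})}{\sqrt{\frac{1}{N}\sum (u - \bar{u})^2}\,\sqrt{\frac{1}{N}\sum (v - \bar{v})^2}}$$

$$= \frac{\frac{1}{N}\sum a(x - \bar{x})\,c(y - \bar{y})}{\sqrt{\frac{1}{N}\sum a^2 (x - \bar{x})^2}\,\sqrt{\frac{1}{N}\sum c^2 (y - \bar{y})^2}} = \frac{ac}{|ac|} \cdot \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} = \pm\rho(x, y)$$

Therefore, the coefficient of correlation is independent of the choice of origin and scale; only its sign changes when a and c have opposite signs. Note that $-1 \le r \le 1$ (proof is beyond the scope of this book).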
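The independence from origin and scale can be checked numerically. The following is a small Python sketch under our own assumptions: the function name `pearson_r`, the data set and the constants in the linear changes of variable are all illustrative and not from the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [12, 13, 14, 15, 16, 17, 18]
ys = [14, 17, 18, 19, 20, 24, 28]
r = pearson_r(xs, ys)

us = [2 * x + 3 for x in xs]         # u = 2x + 3
vs = [5 * y - 7 for y in ys]         # v = 5y - 7, same sign of scale as u
ws = [-5 * y - 7 for y in ys]        # w = -5y - 7, opposite sign of scale

r_same = pearson_r(us, vs)           # equals r
r_flipped = pearson_r(us, ws)        # equals -r
print(r, r_same, r_flipped)
```

The third value shows why the sign of the scale factors matters: reversing the direction of one variable reverses the sign of r but leaves its magnitude unchanged.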

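If $x - \bar{x}$, $y - \bar{y}$ are small fractionless numbers, we use

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} \qquad \ldots(i)$$

If x, y are small numbers, we use

$$r = \frac{\sum xy - \frac{1}{N}\sum x \sum y}{\sqrt{\sum x^2 - \frac{1}{N}\left(\sum x\right)^2}\,\sqrt{\sum y^2 - \frac{1}{N}\left(\sum y\right)^2}} \qquad \ldots(ii)$$

Otherwise, we use assumed means A and B, and $u = x - A$, $v = y - B$,

$$r = \frac{\sum uv - \frac{1}{N}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{N}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{N}\left(\sum v\right)^2}} \qquad \ldots(iii)$$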
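The raw-sum and assumed-mean formulas have the same shape, so one routine can serve both. The sketch below is an illustration only; the function names and the sample data are our own choices, and the assumed means 2 and 5 are arbitrary.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """r from totals; with totals of u and v this is the assumed-mean formula."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


def pearson_from_data(xs, ys, a=0, b=0):
    """Shift by assumed means a and b, total everything, apply the formula."""
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    return pearson_from_sums(
        len(us), sum(us), sum(vs),
        sum(u * u for u in us), sum(v * v for v in vs),
        sum(u * v for u, v in zip(us, vs)),
    )


xs = [5, 4, 3, 2, 1]
ys = [4, 2, 10, 8, 6]
r_raw = pearson_from_data(xs, ys)            # raw values, no shift
r_shifted = pearson_from_data(xs, ys, 2, 5)  # assumed means A = 2, B = 5
print(r_raw, r_shifted)
```

Both calls return the same value, since shifting the origin changes the individual totals but not the combination that appears in the formula.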

Some remarks regarding coefficient of correlation

1. The square of r, i.e. $r^2$, is called the coefficient of determination. Obviously $0 \le r^2 \le 1$. Variation between X and Y is indicated by $r^2$ and not r. For example, if r = 0.9, there is a strong positive relation between X and Y, but as $r^2 = (0.9)^2 = 0.81$, only 81 percent of the variation in Y is explained by variation in X.

2. Correlation is said to be of high degree if $\frac{3}{4} \le |r| \le 1$, of moderate degree if $\frac{1}{4} \le |r| < \frac{3}{4}$ and of low degree if $0 \le |r| < \frac{1}{4}$.

3. If X and Y are independent variables then cov(X, Y) = 0 and coefficient of correlation r = 0. Conversely, if r = 0, then X and Y have no linear relation. However, Y may still have a curved relation with X. For example, for observations (-4, 16), (-3, 9), (-2, 4), (-1, 1), (1, 1), (2, 4), (3, 9), (4, 16), we find that r = 0 (Do it!). However, we also see that $Y = X^2$. Hence, though r = 0, we can still accurately predict the value of Y, given the value of X.

4. The correlation coefficient is highly abused by researchers and advertisers. It may or may not indicate a cause and effect relationship. For example, in any school, you will find a high positive correlation between children's shoe size and spelling ability. Does it mean that bigger feet lead to better brains or that if you learn to spell better, your feet will get bigger? Maybe a third factor, that is, age of children, affects both these factors.
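The observations quoted in remark 3 make a convenient check that r = 0 does not rule out an exact non-linear relation. This is a minimal Python sketch of that computation; the variable names are our own.

```python
import math

xs = [-4, -3, -2, -1, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # Y = X^2 exactly

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / math.sqrt(s_xx * s_yy)
print(r)  # zero: no linear relation, although Y is fully determined by X
```

The positive and negative x values cancel in the sum of products, which is why the covariance vanishes even though every point lies on the parabola.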
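ILLUSTRATIVE EXAMPLES

Example 1. Find the coefficient of correlation between X and Y when Cov(X, Y) = -2.75, Var(X) = 6.25 and Var(Y) = 20.25.

Solution.

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} = \frac{-2.75}{\sqrt{6.25} \times \sqrt{20.25}} = \frac{-2.75}{2.5 \times 4.5} = -\frac{275}{25 \times 45} = -\frac{11}{45} = -0.2444 \text{ (approx.)}$$

Example 2. Find the correlation coefficient between X and Y when $\sum x = 125$, $\sum y = 100$, $\sum x^2 = 650$, $\sum y^2 = 464$, $\sum xy = 508$ and n = 25.

Solution.

$$r(X, Y) = \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}\,\sqrt{\sum y^2 - \frac{1}{n}\left(\sum y\right)^2}} = \frac{508 - \frac{1}{25} \times 125 \times 100}{\sqrt{650 - \frac{1}{25}(125)^2}\,\sqrt{464 - \frac{1}{25}(100)^2}}$$

$$= \frac{508 - 500}{\sqrt{650 - 625}\,\sqrt{464 - 400}} = \frac{8}{\sqrt{25}\,\sqrt{64}} = \frac{8}{5 \times 8} = 0.2$$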
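Example 3. Calculate the correlation coefficient from the following data:

$\sum x = 100$, $\sum y = 150$, $\sum (x - 10)^2 = 180$, $\sum (y - 15)^2 = 215$, $\sum (x - 10)(y - 15) = 60$ and n = 10.

Solution. Let u = x - 10 and v = y - 15, n = 10 (given).

$\therefore \sum u = \sum (x - 10) = \sum x - 10n = 100 - 10 \times 10 = 0$,

$\sum v = \sum (y - 15) = \sum y - 15n = 150 - 15 \times 10 = 0$.

Also $\sum u^2 = 180$, $\sum v^2 = 215$ and $\sum uv = 60$.

$$r = \frac{\sum uv - \frac{1}{n}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{n}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{n}\left(\sum v\right)^2}} = \frac{60 - \frac{1}{10} \times 0 \times 0}{\sqrt{180 - \frac{1}{10} \times 0^2}\,\sqrt{215 - \frac{1}{10} \times 0^2}}$$

$$= \frac{60}{\sqrt{180 \times 215}} = \frac{60}{\sqrt{38700}} = \frac{6}{\sqrt{387}} = 0.305 \text{ (approx.)}$$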

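Example 4. Find Karl Pearson's coefficient of correlation between X and Y for the following data:

| X | 5 | 4 | 3  | 2 | 1 |
|---|---|---|----|---|---|
| Y | 4 | 2 | 10 | 8 | 6 |

Solution. Here N = 5, and X, Y are small numbers. So we use formula (ii).

We construct the following table:

| x | $x^2$ | y | $y^2$ | xy |
|---|----|----|-----|----|
| 5 | 25 | 4  | 16  | 20 |
| 4 | 16 | 2  | 4   | 8  |
| 3 | 9  | 10 | 100 | 30 |
| 2 | 4  | 8  | 64  | 16 |
| 1 | 1  | 6  | 36  | 6  |
| Total: 15 | 55 | 30 | 220 | 80 |

$$r = \frac{\sum xy - \frac{1}{N}\sum x \sum y}{\sqrt{\sum x^2 - \frac{1}{N}\left(\sum x\right)^2}\,\sqrt{\sum y^2 - \frac{1}{N}\left(\sum y\right)^2}} = \frac{80 - \frac{1}{5}(15)(30)}{\sqrt{55 - \frac{1}{5}(15)^2}\,\sqrt{220 - \frac{1}{5}(30)^2}} = \frac{-10}{\sqrt{10}\,\sqrt{40}} = -\frac{10}{20} = -0.5$$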
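Example 5. Calculate coefficient of correlation from the following data:

| X | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
|---|----|----|----|----|----|----|----|
| Y | 14 | 17 | 18 | 19 | 20 | 24 | 28 |

Solution. Here $\bar{X} = \frac{\sum X}{N} = \frac{105}{7} = 15$, $\bar{Y} = \frac{\sum Y}{N} = \frac{140}{7} = 20$.

Also $X - \bar{X}$, $Y - \bar{Y}$ are small fractionless numbers, so we use formula (i).

We construct the following table:

| X  | $X - \bar{X}$, i.e. X - 15 | $(X - \bar{X})^2$ | Y  | $Y - \bar{Y}$, i.e. Y - 20 | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
|----|----|---|----|----|----|----|
| 12 | -3 | 9 | 14 | -6 | 36 | 18 |
| 13 | -2 | 4 | 17 | -3 | 9  | 6  |
| 14 | -1 | 1 | 18 | -2 | 4  | 2  |
| 15 | 0  | 0 | 19 | -1 | 1  | 0  |
| 16 | 1  | 1 | 20 | 0  | 0  | 0  |
| 17 | 2  | 4 | 24 | 4  | 16 | 8  |
| 18 | 3  | 9 | 28 | 8  | 64 | 24 |
| Total |  | 28 |  |  | 130 | 58 |

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} = \frac{58}{\sqrt{28}\,\sqrt{130}} = \frac{58}{\sqrt{3640}} = 0.961$$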

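Example 6. Find the Karl Pearson's coefficient of correlation between x and y for the following data:

| x | 16 | 18 | 21 | 20 | 22 | 26 | 27 | 15 |
|---|----|----|----|----|----|----|----|----|
| y | 22 | 25 | 24 | 26 | 25 | 30 | 33 | 14 |

Solution. Assume mean A = 20 for the x-variate and B = 25 for the y-variate; we shall use formula (iii).

| x  | $u = x - 20$ | $u^2$ | y  | $v = y - 25$ | $v^2$ | uv |
|----|----|----|----|-----|-----|----|
| 16 | -4 | 16 | 22 | -3  | 9   | 12 |
| 18 | -2 | 4  | 25 | 0   | 0   | 0  |
| 21 | 1  | 1  | 24 | -1  | 1   | -1 |
| 20 | 0  | 0  | 26 | 1   | 1   | 0  |
| 22 | 2  | 4  | 25 | 0   | 0   | 0  |
| 26 | 6  | 36 | 30 | 5   | 25  | 30 |
| 27 | 7  | 49 | 33 | 8   | 64  | 56 |
| 15 | -5 | 25 | 14 | -11 | 121 | 55 |
| Total | 5 | 135 |  | -1 | 221 | 152 |

Hence,

$$\rho(x, y) = \frac{\sum uv - \frac{1}{N}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{N}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{N}\left(\sum v\right)^2}} = \frac{152 - \frac{1}{8}(5)(-1)}{\sqrt{135 - \frac{1}{8}(5)^2}\,\sqrt{221 - \frac{1}{8}(-1)^2}} = \frac{1221}{\sqrt{1055}\,\sqrt{1767}} = 0.894$$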
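Example 7. Find the correlation coefficient between the heights of husbands and wives based on the following data (given in inches) and interpret the result.

| Couple | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Height of husband | 76 | 75 | 75 | 72 | 72 | 71 | 71 | 70 | 68 | 68 | 68 | 68 | 67 | 67 | 62 |
| Height of wife | 71 | 70 | 70 | 67 | 71 | 65 | 65 | 67 | 64 | 65 | 65 | 66 | 63 | 65 | 61 |

Solution. We shall use assumed means A = 70, B = 66, and use formula (iii).

| Couple | x | $u = x - A = x - 70$ | $u^2$ | y | $v = y - B = y - 66$ | $v^2$ | uv |
|----|----|----|----|----|----|----|----|
| 1  | 76 | 6  | 36 | 71 | 5  | 25 | 30 |
| 2  | 75 | 5  | 25 | 70 | 4  | 16 | 20 |
| 3  | 75 | 5  | 25 | 70 | 4  | 16 | 20 |
| 4  | 72 | 2  | 4  | 67 | 1  | 1  | 2  |
| 5  | 72 | 2  | 4  | 71 | 5  | 25 | 10 |
| 6  | 71 | 1  | 1  | 65 | -1 | 1  | -1 |
| 7  | 71 | 1  | 1  | 65 | -1 | 1  | -1 |
| 8  | 70 | 0  | 0  | 67 | 1  | 1  | 0  |
| 9  | 68 | -2 | 4  | 64 | -2 | 4  | 4  |
| 10 | 68 | -2 | 4  | 65 | -1 | 1  | 2  |
| 11 | 68 | -2 | 4  | 65 | -1 | 1  | 2  |
| 12 | 68 | -2 | 4  | 66 | 0  | 0  | 0  |
| 13 | 67 | -3 | 9  | 63 | -3 | 9  | 9  |
| 14 | 67 | -3 | 9  | 65 | -1 | 1  | 3  |
| 15 | 62 | -8 | 64 | 61 | -5 | 25 | 40 |
| Total |  | 0 | 194 |  | 5 | 127 | 140 |

$$r = \frac{\sum uv - \frac{1}{N}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{N}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{N}\left(\sum v\right)^2}} = \frac{140 - \frac{1}{15}(0)(5)}{\sqrt{194 - \frac{1}{15}(0)^2}\,\sqrt{127 - \frac{1}{15}(5)^2}} = 0.898 \text{ (approx.)},$$

which is a strong positive correlation.

This shows that tall men usually marry tall women and short men marry short women (called assortative mating).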
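The column totals and the final coefficient for the height data can be re-derived mechanically. The following Python sketch repeats the assumed-mean computation with A = 70 and B = 66; it is an illustrative check rather than part of the text, and the variable names are our own.

```python
import math

husbands = [76, 75, 75, 72, 72, 71, 71, 70, 68, 68, 68, 68, 67, 67, 62]
wives = [71, 70, 70, 67, 71, 65, 65, 67, 64, 65, 65, 66, 63, 65, 61]

A, B = 70, 66                      # assumed means used in the solution
us = [x - A for x in husbands]
vs = [y - B for y in wives]
n = len(us)

sum_u, sum_v = sum(us), sum(vs)
sum_u2 = sum(u * u for u in us)
sum_v2 = sum(v * v for v in vs)
sum_uv = sum(u * v for u, v in zip(us, vs))

r = (sum_uv - sum_u * sum_v / n) / math.sqrt(
    (sum_u2 - sum_u ** 2 / n) * (sum_v2 - sum_v ** 2 / n)
)
print(sum_u, sum_u2, sum_v, sum_v2, sum_uv)
print(round(r, 3))
```

The coefficient comes out close to 0.9, which supports the interpretation of a strong positive correlation.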


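Example 8. The following table shows the percentage of cats killed while falling from various storeys from skyscrapers in New York. Calculate the correlation coefficient, draw a scatter diagram and comment.

| Fallen from number of storeys | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8 | 9 |
|---|---|---|----|----|----|----|----|---|---|
| Percentage killed             | 3 | 9 | 12 | 15 | 18 | 15 | 12 | 9 | 3 |

Solution. We use assumed means A = 5 and B = 12.

| x | $u = x - 5$ | $u^2$ | y  | $v = y - 12$ | $v^2$ | uv  |
|---|----|----|----|----|----|-----|
| 1 | -4 | 16 | 3  | -9 | 81 | 36  |
| 2 | -3 | 9  | 9  | -3 | 9  | 9   |
| 3 | -2 | 4  | 12 | 0  | 0  | 0   |
| 4 | -1 | 1  | 15 | 3  | 9  | -3  |
| 5 | 0  | 0  | 18 | 6  | 36 | 0   |
| 6 | 1  | 1  | 15 | 3  | 9  | 3   |
| 7 | 2  | 4  | 12 | 0  | 0  | 0   |
| 8 | 3  | 9  | 9  | -3 | 9  | -9  |
| 9 | 4  | 16 | 3  | -9 | 81 | -36 |
| Total | 0 | 60 |  | -12 | 234 | 0 |

$$\therefore r = \frac{\sum uv - \frac{1}{N}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{N}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{N}\left(\sum v\right)^2}} = 0, \text{ as both } \sum uv = 0 \text{ and } \sum u = 0.$$

Fig. 2.4. Scatter diagram (horizontal axis: number of storeys fallen, 1 to 9).

Though r = 0 means there is no linear relationship between X and Y, the scatter diagram clearly shows a non-linear relationship between X and Y. This shows that up to the 5th storey, the percentage of cats getting killed increases, but thereafter it decreases, so much so that only 3% of cats are killed when they fall from the 9th storey. Cats have highly flexible bodies; when they fall from higher storeys, they get sufficient time to stretch their bodies like parachutes, which saves them from getting killed or even from getting their legs broken.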

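Example 9. From the following data, find the values of a, b and Karl Pearson's coefficient of correlation:

| x | 10 | 13 | 16 | a  | 25 | 26 | 30 |
|---|----|----|----|----|----|----|----|
| y | 6  | 8  | 10 | 12 | b  | 15 | 19 |

Given that $\bar{x} = 20$ and $\bar{y} = 12$.

Solution. Here, n = 7.

Given $\bar{x} = 20 \Rightarrow \frac{10 + 13 + 16 + a + 25 + 26 + 30}{7} = 20 \Rightarrow 120 + a = 140 \Rightarrow a = 20$;

$\bar{y} = 12 \Rightarrow \frac{6 + 8 + 10 + 12 + b + 15 + 19}{7} = 12 \Rightarrow 70 + b = 84 \Rightarrow b = 14$.

We construct the following table:

| x  | $u = x - 20$ | $u^2$ | y  | $v = y - 12$ | $v^2$ | uv |
|----|-----|-----|----|----|----|----|
| 10 | -10 | 100 | 6  | -6 | 36 | 60 |
| 13 | -7  | 49  | 8  | -4 | 16 | 28 |
| 16 | -4  | 16  | 10 | -2 | 4  | 8  |
| 20 | 0   | 0   | 12 | 0  | 0  | 0  |
| 25 | 5   | 25  | 14 | 2  | 4  | 10 |
| 26 | 6   | 36  | 15 | 3  | 9  | 18 |
| 30 | 10  | 100 | 19 | 7  | 49 | 70 |
| Total | $\sum u = 0$ | $\sum u^2 = 326$ |  | $\sum v = 0$ | $\sum v^2 = 118$ | $\sum uv = 194$ |

$$r = \frac{\sum uv - \frac{1}{n}\sum u \sum v}{\sqrt{\sum u^2 - \frac{1}{n}\left(\sum u\right)^2}\,\sqrt{\sum v^2 - \frac{1}{n}\left(\sum v\right)^2}} = \frac{194 - \frac{1}{7} \times 0 \times 0}{\sqrt{326 - \frac{1}{7} \times 0^2}\,\sqrt{118 - \frac{1}{7} \times 0^2}}$$

$$= \frac{194}{\sqrt{326 \times 118}} = \frac{97}{\sqrt{163 \times 59}} = \frac{97}{\sqrt{9617}} = \frac{97}{98.066} = 0.989 \text{ (approx.)}$$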
Example 10. A student, while calculating the correlation coefficient between two variables x and y for 25 pairs of observations, obtained the following results:

$\sum x = 125$, $\sum y = 100$, $\sum x^2 = 650$, $\sum y^2 = 460$ and $\sum xy = 508$.

On rechecking, it was found that he had wrongly copied two pairs as (6, 14) and (8, 6) whereas the correct values were (8, 12) and (6, 8). Calculate the correct correlation coefficient between x and y.

Solution. Correct $\sum x = 125 - 6 - 8 + 8 + 6 = 125$

Correct $\sum y = 100 - 14 - 6 + 12 + 8 = 100$

Correct $\sum x^2 = 650 - 6^2 - 8^2 + 8^2 + 6^2 = 650$

Correct $\sum y^2 = 460 - 14^2 - 6^2 + 12^2 + 8^2 = 460 - 196 - 36 + 144 + 64 = 436$

Correct $\sum xy = 508 - (6)(14) - (8)(6) + (8)(12) + (6)(8) = 508 - 84 - 48 + 96 + 48 = 520$.

$\therefore$ Correct coefficient of correlation

$$= \frac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}\,\sqrt{\sum y^2 - \frac{1}{n}\left(\sum y\right)^2}} = \frac{520 - \frac{1}{25}(125)(100)}{\sqrt{650 - \frac{1}{25}(125)^2}\,\sqrt{436 - \frac{1}{25}(100)^2}} = \frac{20}{\sqrt{25}\,\sqrt{36}} = \frac{20}{30} = 0.667$$
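The correction procedure used here (subtract the contribution of each wrong pair from every total, add that of each correct pair, then recompute r) can be sketched in Python as follows. The helper names `adjust_sums` and `pearson_from_sums` are our own and only illustrate the arithmetic of the solution.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson's r from the five totals and the number of pairs n."""
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


def adjust_sums(sums, remove=(), add=()):
    """Return (sum_x, sum_y, sum_x2, sum_y2, sum_xy) after swapping pairs."""
    sx, sy, sx2, sy2, sxy = sums
    for sign, pairs in ((-1, remove), (+1, add)):
        for x, y in pairs:
            sx += sign * x
            sy += sign * y
            sx2 += sign * x * x
            sy2 += sign * y * y
            sxy += sign * x * y
    return sx, sy, sx2, sy2, sxy


reported = (125, 100, 650, 460, 508)  # totals as first obtained by the student
corrected = adjust_sums(
    reported, remove=[(6, 14), (8, 6)], add=[(8, 12), (6, 8)]
)
r = pearson_from_sums(25, *corrected)
print(corrected, round(r, 3))
```

Since the number of pairs is unchanged by a swap, n stays at 25; only the totals are adjusted.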
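Example 11. Coefficient of correlation between x and y for 50 observations is 0.3, $\bar{x} = 10$, $\bar{y} = 6$, $\sigma_x = 3$ and $\sigma_y = 2$. Later on, it was discovered that one pair of values (10, 6) was inaccurate and hence weeded out. Calculate the coefficient of correlation between the remaining 49 pairs of values.

Solution. Given n = 50, $\bar{x} = 10$, $\bar{y} = 6$

$\Rightarrow \frac{1}{50}\sum x = 10$, $\frac{1}{50}\sum y = 6 \Rightarrow \sum x = 10 \times 50 = 500$, $\sum y = 6 \times 50 = 300$.

Using the formula $r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$, we get

$$0.3 = \frac{\frac{1}{50}\sum xy - 10 \times 6}{3 \times 2} \qquad (\because \sigma_x = 3, \sigma_y = 2)$$

$\Rightarrow 1.8 = \frac{1}{50}\sum xy - 60 \Rightarrow \frac{1}{50}\sum xy = 61.8 \Rightarrow \sum xy = 50 \times 61.8 = 3090$.

$\sigma_x = 3 \Rightarrow \frac{1}{50}\sum x^2 - (\bar{x})^2 = 3^2 \Rightarrow \frac{1}{50}\sum x^2 - 10^2 = 9 \Rightarrow \sum x^2 = 109 \times 50 = 5450$;

$\sigma_y = 2 \Rightarrow \frac{1}{50}\sum y^2 - (\bar{y})^2 = 2^2 \Rightarrow \frac{1}{50}\sum y^2 - 6^2 = 4 \Rightarrow \sum y^2 = 40 \times 50 = 2000$.

When the inaccurate observation (10, 6) is weeded out, remaining observations n = 49.

New $\sum x = 500 - 10 = 490$, $\sum y = 300 - 6 = 294$,

$\sum x^2 = 5450 - 10^2 = 5350$, $\sum y^2 = 2000 - 6^2 = 1964$,

$\sum xy = 3090 - 10 \times 6 = 3030$.

Using the formula $r = \dfrac{\sum xy - \frac{1}{n}\sum x \sum y}{\sqrt{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}\,\sqrt{\sum y^2 - \frac{1}{n}\left(\sum y\right)^2}}$, we get

$$r_{\text{new}} = \frac{3030 - \frac{1}{49} \times 490 \times 294}{\sqrt{5350 - \frac{1}{49}(490)^2}\,\sqrt{1964 - \frac{1}{49}(294)^2}} = \frac{3030 - 2940}{\sqrt{5350 - 4900}\,\sqrt{1964 - 1764}} = \frac{90}{\sqrt{450}\,\sqrt{200}} = \frac{90}{\sqrt{90000}} = \frac{90}{300} = \frac{3}{10} = 0.3$$

We find that the coefficient of correlation remains the same.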
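The steps of this solution (recover the totals from the means, standard deviations and r, remove the pair, recompute) can be traced in a few lines of Python. This is only an illustrative sketch; the variable names are our own.

```python
import math

n, r = 50, 0.3
x_bar, y_bar, sd_x, sd_y = 10, 6, 3, 2

# Recover the totals from the summary statistics.
sum_x = n * x_bar
sum_y = n * y_bar
sum_x2 = n * (sd_x ** 2 + x_bar ** 2)
sum_y2 = n * (sd_y ** 2 + y_bar ** 2)
sum_xy = n * (r * sd_x * sd_y + x_bar * y_bar)

# Weed out the inaccurate pair (10, 6).
x0, y0 = 10, 6
m = n - 1
sum_x -= x0
sum_y -= y0
sum_x2 -= x0 ** 2
sum_y2 -= y0 ** 2
sum_xy -= x0 * y0

r_new = (sum_xy - sum_x * sum_y / m) / math.sqrt(
    (sum_x2 - sum_x ** 2 / m) * (sum_y2 - sum_y ** 2 / m)
)
print(round(r_new, 4))
```

The coefficient is unchanged because the removed pair coincides with the point of means: it contributes nothing to the sum of products of deviations or to either sum of squared deviations, and removing it leaves both means as they were.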

EXERCISE 2.2

1. Find $\rho(x, y)$ if cov(x, y) = -16.5, var(x) = 2.25 and var(y) = 144.

2. The coefficient of correlation between two variables X and Y is 0.64. Their covariance is 16. The variance of X is 9. Find the standard deviation of Y-series.

3. If n = 10, $\sum x = 26$, $\sum y = -27$, $\sum x^2 = 226$, $\sum y^2 = 267$, $\sum xy = 7$, find the covariance and the correlation coefficient.

4. Find the coefficient of correlation between x and y when n = 10, $\sum x = 60$, $\sum y = 60$, $\sum x^2 = 400$, $\sum y^2 = 580$ and $\sum xy = 305$.

5. Calculate the coefficient of correlation between x and y, when $\sum x = 375$, $\sum y = 270$, $\sum (x - \bar{x})^2$ = …, $\sum (y - \bar{y})^2 = 138$, $\sum (x - \bar{x})(y - \bar{y}) = 122$ and n = 15.

6. Find the coefficient of correlation between X and Y from the following data using Karl Pearson's method.

| X | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Y | 2 | 5 | 3 | 8 | 7 |

7. Compute Karl Pearson's Coefficient of Correlation between sales and expenditures of a firm for six months.

| Sales (in lakh of ₹)       | 18 | 20 | 27 | 20 | 21 | 29 |
|---|----|----|----|----|----|----|
| Expenditure (in lakh of ₹) | 23 | 27 | 28 | 28 | 29 | 30 |

8. Calculate Karl Pearson's coefficient of correlation between x and y for the following data and interpret the result:
(1, 6), (2, 5), (3, 7), (4, 9), (5, 8), (6, 10), (7, 11), (8, 13), (9, 12).

9. Calculate Karl Pearson's coefficient of correlation between x and y for the following data:

X: 6 2 4 9 1 3 5
Y: 13 8 12 15 9 10 11 16

10. Find Karl Pearson's coefficient of correlation between X and Y for the following data:

| X | 16 | 18 | 21 | 20 | 22 | 26 | 27 | 15 |
|---|----|----|----|----|----|----|----|----|
| Y | 22 | 25 | 24 | 26 | 25 | 30 | 33 | 14 |

11. The weights of sons and fathers (in kilograms) are given below:

| Weight of father | 65 | 66 | 67 | 67 | 68 | 69 | 70 | 72 |
|---|----|----|----|----|----|----|----|----|
| Weight of son    | 67 | 68 | 65 | 68 | 72 | 72 | 69 | 71 |

Find the coefficient of correlation.

12. Calculate Karl Pearson's coefficient of correlation from the following data and interpret the result:

| Serial number of student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in mathematics | 15 | 18 | 21 | 24 | 27 | 30 | 36 | 39 | 42 | 48 |
| Marks in statistics  | 25 | 25 | 27 | 27 | 31 | 33 | 35 | 41 | 41 | 45 |

13. Calculate Karl Pearson's coefficient of correlation between the marks in English and Mathematics obtained by 10 students:

English: 20 13 18 21 11 12 17 14 19 15
Mathematics: 17 12 23 25 14 19 21 22 19

14. Find Karl Pearson's coefficient of correlation from the given data:

| X | 21  | 24  | 26  | 29  | 32  | 43  | 25  | 30  | 35  | 37  |
|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Y | 120 | 123 | 125 | 128 | 131 | 142 | 124 | 129 | 134 | 136 |

15. From the following table, calculate the Karl Pearson's coefficient of correlation:

X: 6 2 10 4
Y: 11 7

Arithmetic means of X and Y series are 6 and 8 respectively.

16. While calculating the coefficient of correlation between two variables x and y for 12 pairs of observations, the following data was obtained:
$\sum x = 30$, $\sum y = 5$, $\sum x^2 = 670$, $\sum y^2 = 285$, $\sum xy = 334$.
On rechecking, it was found that one pair (11, 4) was wrongly copied while the correct pair being (10, 14). Find the correct value of coefficient of correlation.

2.5 SPEARMAN'S RANK CORRELATION COEFFICIENT

Sometimes, it is difficult to give numerical values to a quality, e.g. honesty, beauty, intelligence etc. Sometimes, though it may be possible to quantify the variable, we may choose to grade it in terms of ranks, by using numbers 1, 2, ..., n, assigning rank 1 to the highest (or lowest) value and rank 2 to the next highest (or next lowest) value and so on. If two corresponding sets of values x and y are ranked in such manner, the Edward Spearman's coefficient of rank correlation, denoted by r, is given by

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where d = difference between ranks of corresponding x and y,
n = number of pairs of values (x, y) in the data.

As an example, let 5 students be ranked in Maths and Physics as

| Student | A | B | C | D | E |
|---------|---|---|---|---|---|
| Maths   | 1 | 2 | 3 | 4 | 5 |
| Physics | 1 | 2 | 3 | 4 | 5 |

Then we see that d = 0 for each pair, so r = +1.

Now let the same 5 students be ranked in Maths and Sports as

| Student | A | B | C | D | E |
|---------|---|---|---|---|---|
| Maths   | 1 | 2 | 3 | 4 | 5 |
| Sports  | 5 | 4 | 3 | 2 | 1 |

Then we have differences d = -4, -2, 0, 2, 4, so $\sum d^2 = 40$,

$$\text{and } r = 1 - \frac{6 \times 40}{5(25 - 1)} = 1 - \frac{240}{120} = 1 - 2 = -1.$$

Thus, we see that r = +1 when ranks are in complete agreement, and in the same direction. Also r = -1 when ranks are in complete agreement but in the opposite direction. Otherwise, r varies between -1 and +1.
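The rank correlation formula above is easy to evaluate by machine. Here is a minimal Python sketch, assuming the ranks are already assigned and contain no ties; the function name `spearman_from_ranks` is our own.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Spearman's r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) for untied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


maths = [1, 2, 3, 4, 5]
physics = [1, 2, 3, 4, 5]   # same order as Maths
sports = [5, 4, 3, 2, 1]    # exactly reversed order

r_physics = spearman_from_ranks(maths, physics)  # complete agreement
r_sports = spearman_from_ranks(maths, sports)    # complete reversal
print(r_physics, r_sports)
```

The two calls reproduce the extreme cases of the two illustrations, +1 and -1; any other arrangement of the ranks gives a value strictly between them.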

Now, let us assume that the five students are given marks as follows in Maths and English:

| Student | A  | B  | C  | D  | E  |
|---------|----|----|----|----|----|
| Maths   | 90 | 80 | 70 | 60 | 50 |
| English | 90 | 70 | 80 | 60 | 50 |
