Probability and Statistics

7. Correlation and Regression

Correlation is a mathematical measure of linear relationship between two variables. It gives the strength and direction of the linear relationship between the two variables. If a change in one variable X produces a change in variable Y, then X and Y are said to be correlated. When an increase in the value of one variable increases the value of the other, the two are said to be positively correlated; when an increase in one causes a decrease in the other, they are called negatively correlated. Two variables may be highly, slightly or moderately correlated. The measure of such a correlation is called the correlation coefficient and is usually denoted by ρ.

7.1. Covariance

Definition 1. If X and Y are two random variables, then the covariance between them is defined as

Cov(X, Y) = E[(X − E[X])(Y − E[Y])],

provided that the expectations of X and Y exist. In fact,

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
          = E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
          = E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
          = E[XY] − E[X]E[Y]  .........(1)

Remark: If X and Y are independent, then E[XY] = E[X]E[Y], so from (1) we have Cov(X, Y) = 0.

7.2. Correlation Coefficient

Definition 2. The correlation coefficient between X and Y is usually denoted by ρ and is defined as

ρ = Cov(X, Y) / (σx σy),

where σx and σy are the standard deviations of X and Y respectively. Sometimes it is also denoted by r.

Remark: −1 ≤ ρ ≤ 1.

Theorem 1. Let X and Y be jointly distributed random variables such that the variances of X and Y exist finitely. Let µ1, µ2 be the means and σ1², σ2² the variances of X and Y respectively, and let ρ be the correlation coefficient between X and Y. If E[Y|X] is linear in X, then

a) E[Y|X] = µ2 + ρ (σ2/σ1)(X − µ1)
b) E[Var[Y|X]] = σ2² (1 − ρ²)

Proof: a) Suppose X and Y are continuous random variables having joint pdf f(x, y), with marginal pdf f1(x) of X and f2(y) of Y. Since E[Y|X] is linear in X, let E[Y|X = x] = a + bx.



"' ff!x.y)
I_.,,Y dy - a + hx
:::) ,CxJ :. (3) ⇒ P a1 az + µ, µ2 = aµ1 + b(a1 2 + µ/ ) ............(4)

⇒ (,>yf(x,y)dy = (a+/Jx) fi (x) ··········( I)


On solving (2) and (4) simu ltaneously, we get
Oz a
lntcgaratc w.r.l x on hoth sides or (I) a = µ2 - p-µ , and b = p ..l.
C11 C11

⇒ I~., r:,Y f (x,y) dydx = f~co(a + bx)f1 (x) dx

⇒ J~ ylJ~ f (x, y )dx] dy = a f~00 f1 (x) dx +


00 00 b) The conditional variance or Y given X = xis given by
bf~00 x f 1 (x)dx
V[Y\X] = E[Y - E[YIX]]2 = f_"Jy - E[Yl x]]2 f(y lx)dy
⇒ (.,Y fz(y)dy = a. 1 + b E[X] 2
J
= _oo [ y- µz -p -" 2 (X-µ1) ] -
co ~
f (x.y)
AW
dy ..........(5)
⇒ f' y fz(y)dy = a.1 + b E[X]
00
As the variance is non-negative and is at most function of x
⇒ £[Y] =a+ b E[X] alone.
⇒ µ 2 = a + bµ 1 where µ1 = E[X] and µz = E[Y] •··········<2) :. On multiplying equation (5) by [ 1 (x) and integrating
w.r.t x on both sides,we get
Now firstly multiply equation (I) by x and then integrate
w.r.t x, we have r'.,, V[Yj x ]fi(x)dx = I.:I.: [r - µz- p~(X - µ1)r f(x, y )dy dx
E[V[Y\X]
(., f~co xy f (x, y)dydx = J:' x(a + bx)/ (x) dx
⇒ (.,f:' xy f(x,y)dydx
00 1
=J_
00

00 L: [(y- µz)2 - 2p :: (y - µz)(x - µ1) +

= a f:' 00
00

f:'
x/1 (x) dx + b 00 x 2f1 (x) dx
(P ::f (x - µ 1 ) 2 ] f( x, y )dydx
= E[(y- µ 2 )2] - 2p CTz E[(X - µ1)(Y - µz)] +
⇒ E[XY] = a £(X] + b £(X 2 ] ···········<3)
01

pz CTz: E[(x - µ1)2]


As p is the correlation coefficient between X and Y CTt
2 az2
Cov( X,Y) E[XY) - E[X)E(Y) = az2 - 2p -CTz pa1 az + P 2 CT1
2

:. p = --;;;;- = U1 Uz C11 111

= az2 - 2p2 az2 + p2 az2


= a/ (1 - p 2 )

176 Probability and Statistics Correlation and Regression 177


I Jenee proved .
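Theorem 1 can be sanity-checked by simulation. For jointly normal X and Y the conditional mean E[Y|X] is exactly linear in X, so part (b) predicts that the mean squared residual about the line µ2 + ρ(σ2/σ1)(X − µ1) equals σ2²(1 − ρ²). A minimal pure-Python sketch, with illustrative parameter values not taken from the text:

```python
import random

random.seed(0)
mu1, mu2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6  # illustrative parameters
n = 200_000

resid_sq = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = mu1 + s1 * z1
    # construct Y jointly normal with X, correlation rho
    y = mu2 + s2 * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)
    # squared residual about the regression line of Theorem 1(a)
    resid_sq += (y - (mu2 + rho * (s2 / s1) * (x - mu1))) ** 2

emp = resid_sq / n                 # Monte Carlo estimate of E[Var[Y|X]]
theory = s2 ** 2 * (1 - rho ** 2)  # Theorem 1(b)
print(emp, theory)
```

With these parameters the theoretical value is 0.25 × 0.64 = 0.16, and the simulated average lands within about one percent of it.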
Examples

Example 1. Let X and Y have joint p.m.f. given as

(x, y)     (0,0)   (0,1)   (0,2)   (1,1)   (1,2)   (2,2)
p(x, y)    1/12    2/12    1/12    3/12    4/12    1/12

and zero otherwise. Find the correlation coefficient between X and Y.

Solution: The marginal p.m.f.s can be obtained as

X\Y      0       1       2       p1(x)
0        1/12    2/12    1/12    4/12
1        0       3/12    4/12    7/12
2        0       0       1/12    1/12
p2(y)    1/12    5/12    6/12

with Σ p1(x) = 1 and Σ p2(y) = 1.

E[X] = Σ x p1(x) = 0 · (4/12) + 1 · (7/12) + 2 · (1/12) = 9/12 = 3/4

E[X²] = Σ x² p1(x) = 0² · (4/12) + 1² · (7/12) + 2² · (1/12) = 11/12

σx² = E[X²] − (E[X])² = 11/12 − (3/4)² = 17/48

Similarly E[Y] = Σ y p2(y) = 0 · p2(0) + 1 · p2(1) + 2 · p2(2) = 17/12

E[Y²] = Σ y² p2(y) = 0² · p2(0) + 1² · p2(1) + 2² · p2(2) = 29/12

σy² = E[Y²] − (E[Y])² = 29/12 − (17/12)² = 59/144

E[XY] = Σx Σy xy p(x, y) = Σx x [1 · p(x, 1) + 2 · p(x, 2)]
      = 1 · p(1, 1) + 2 · p(1, 2) + 2 · p(2, 1) + 4 · p(2, 2)
      = 3/12 + 8/12 + 0 + 4/12 = 15/12 = 5/4

The correlation coefficient between X and Y is given by

ρ = (E[XY] − E[X]E[Y]) / (σx σy) = (5/4 − (3/4)(17/12)) / (√(17/48) · √(59/144)) = (3/16) / 0.381 ≈ 0.49
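The moment calculations in Example 1 can be reproduced with exact rational arithmetic; a minimal sketch:

```python
from fractions import Fraction as F
from math import sqrt

# joint p.m.f. from Example 1
pmf = {(0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(1, 12),
       (1, 1): F(3, 12), (1, 2): F(4, 12), (2, 2): F(1, 12)}

def E(g):
    """Expectation of g(x, y) under the joint p.m.f."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
var_x = E(lambda x, y: x * x) - EX ** 2
var_y = E(lambda x, y: y * y) - EY ** 2
cov = EXY - EX * EY

rho = float(cov) / sqrt(float(var_x) * float(var_y))
print(EX, EY, cov, round(rho, 2))  # → 3/4 17/12 3/16 0.49
```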
Example 2. Let X and Y have joint pdf given by

f(x, y) = 2,  0 < x < y < 1
        = 0,  otherwise

Show that the correlation coefficient of X and Y is 1/2.

Solution: The marginal pdf of X is

f1(x) = ∫_x^1 f(x, y) dy = ∫_x^1 2 dy = 2(1 − x);  0 < x < 1

E[X] = ∫_0^1 x f1(x) dx = 2 ∫_0^1 x(1 − x) dx = 1/3

E[X²] = ∫_0^1 x² f1(x) dx = 2 ∫_0^1 x²(1 − x) dx = 1/6

σx² = E[X²] − (E[X])² = 1/6 − (1/3)² = 1/18

The marginal pdf of Y is given by

f2(y) = ∫_0^y f(x, y) dx = ∫_0^y 2 dx = 2y;  0 < y < 1

E[Y] = ∫_0^1 y f2(y) dy = 2 ∫_0^1 y² dy = 2/3

E[Y²] = ∫_0^1 y² f2(y) dy = 2 ∫_0^1 y³ dy = 1/2

σy² = E[Y²] − (E[Y])² = 1/2 − (2/3)² = 1/18

E[XY] = ∫_0^1 ∫_0^y xy f(x, y) dx dy = 2 ∫_0^1 ∫_0^y xy dx dy = 1/4

The correlation coefficient between X and Y is given by

ρ = (E[XY] − E[X]E[Y]) / (σx σy) = (1/4 − (1/3)(2/3)) / (√(1/18) · √(1/18)) = (1/36) / (1/18) = 1/2

7.3. Regression

Regression is the statistical tool to measure the average relationship between two or more variables. In earlier times it was used by Sir Francis Galton in the study of heredity. Nowadays it is used in many branches, for example to predict the potential sales of a new product in terms of its price, a patient's weight in terms of the number of weeks he or she has been on a diet, the per capita consumption of certain foods in terms of their nutritional values, etc.

7.3.1. Bivariate Regression

Definition 3. Let X and Y be two variables which are jointly distributed. Then the regression equation of Y on X is denoted by µY|X and is defined as

µY|X = E[Y|X = x] = Σy y p(x, y) / p1(x)    for the discrete case
                  = ∫ y f(x, y) dy / f1(x)   for the continuous case

where f(x, y) and p(x, y) denote the joint pdf and p.m.f., and f1(x), p1(x) denote the marginal densities.

Similarly, the regression equation of X on Y is denoted by µX|Y and is defined as

µX|Y = E[X|Y = y] = Σx x p(x, y) / p2(y)    for the discrete case
                  = ∫ x f(x, y) dx / f2(y)   for the continuous case
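Definition 3 can be evaluated numerically. For the density f(x, y) = 2 on 0 < x < y < 1 (Example 2 of Section 7.2), Y given X = x is uniform on (x, 1), so the regression curve is E[Y|X = x] = (1 + x)/2. A midpoint-rule sketch of ∫ y f(x, y) dy / f1(x):

```python
def f(x, y):
    # joint pdf of Example 2, Section 7.2
    return 2.0 if 0 < x < y < 1 else 0.0

def reg_curve(x, n=20_000):
    """mu_{Y|X}(x) = (integral of y f(x, y) dy) / (integral of f(x, y) dy), midpoint rule."""
    h = 1.0 / n
    num = sum((i + 0.5) * h * f(x, (i + 0.5) * h) * h for i in range(n))
    den = sum(f(x, (i + 0.5) * h) * h for i in range(n))
    return num / den

for x in (0.2, 0.5, 0.8):
    print(round(reg_curve(x), 3), (1 + x) / 2)  # numeric value vs exact (1 + x)/2
```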
Remarks

1. The graph of µX|Y or µY|X is called the regression curve of X on Y or of Y on X respectively. Either one or both of the regressions may be linear.

2. When the regressions of X on Y and of Y on X are linear, they are called linear regressions.

Theorem 2. Let µ1, µ2 be the means and σ1², σ2² the variances of X and Y respectively, and let ρ be the correlation coefficient between X and Y. If the regression of Y on X is linear, then

µY|X = µ2 + ρ (σ2/σ1)(x − µ1)

Proof: We prove the result for the continuous case. For the discrete case, the integrals can be replaced by summation signs and the proof goes through in the same way.

Let the regression equation of Y on X be linear, say

µY|X = a + bx ⇒ ∫ y f(y|x) dy = a + bx
⇒ ∫ y f(x, y) dy / f1(x) = a + bx
⇒ ∫ y f(x, y) dy = (a + bx) f1(x)  .........(1)

Integrating both sides of (1) with respect to x:

⇒ ∫∫ y f(x, y) dy dx = ∫ (a + bx) f1(x) dx
⇒ ∫ y f2(y) dy = a ∫ f1(x) dx + b ∫ x f1(x) dx = a · 1 + b E[X]
⇒ E[Y] = a + b E[X]
⇒ µ2 = a + b µ1, where µ1 = E[X] and µ2 = E[Y]  .........(2)

Now multiply equation (1) by x and then integrate with respect to x:

∫∫ xy f(x, y) dy dx = ∫ x (a + bx) f1(x) dx = a ∫ x f1(x) dx + b ∫ x² f1(x) dx
⇒ E[XY] = a E[X] + b E[X²]  .........(3)

As ρ is the correlation coefficient between X and Y,

ρ = Cov(X, Y)/(σ1 σ2) = (E[XY] − E[X]E[Y])/(σ1 σ2)
⇒ E[XY] = ρ σ1 σ2 + µ1 µ2

∴ (3) ⇒ ρ σ1 σ2 + µ1 µ2 = a µ1 + b (σ1² + µ1²)  .........(4)

Solving (2) and (4) simultaneously, we get

a = µ2 − ρ (σ2/σ1) µ1  and  b = ρ (σ2/σ1)

Hence proved.
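As a quick check of Theorem 2, take the density f(x, y) = 2 on 0 < x < y < 1 (Example 2 of Section 7.2), whose moments were computed there: µ1 = 1/3, µ2 = 2/3, σ1² = σ2² = 1/18, ρ = 1/2. A sketch in exact arithmetic:

```python
from fractions import Fraction as F

# moments of f(x, y) = 2 on 0 < x < y < 1 (Example 2 of Section 7.2)
mu1, mu2 = F(1, 3), F(2, 3)
var1, var2 = F(1, 18), F(1, 18)
rho = F(1, 2)

# sigma2/sigma1 = sqrt(var2/var1); here var1 == var2, so the ratio is exactly 1
sd_ratio = 1 if var1 == var2 else float(var2 / var1) ** 0.5

b = rho * sd_ratio   # slope     b = rho * sigma2 / sigma1
a = mu2 - b * mu1    # intercept a = mu2 - b * mu1

print(a, b)  # → 1/2 1/2
```

So Theorem 2 gives µY|X = 1/2 + x/2 = (1 + x)/2, which matches the exact conditional mean of Y (uniform on (x, 1)).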
Remark: If the regression of X on Y is linear, then similarly we have

µX|Y = µ1 + ρ (σ1/σ2)(y − µ2)

Examples

Example 1. Let X and Y be two random variables having joint density

f(x, y) = x e^(−x(1+y)),  x > 0, y > 0
        = 0,  otherwise

Find the regression equation of Y on X.

Solution: The marginal density of X is given by

f1(x) = ∫_0^∞ f(x, y) dy = ∫_0^∞ x e^(−x(1+y)) dy = x e^(−x) [e^(−xy)/(−x)]_0^∞ = e^(−x);  x > 0

The regression equation of Y on X is given by

µY|X = E[Y|X = x] = ∫_0^∞ y f(x, y) dy / f1(x) = ∫_0^∞ x y e^(−x(1+y)) dy / e^(−x) = ∫_0^∞ x y e^(−xy) dy = 1/x

which is the required regression equation of Y on X.

Example 2. Let the random variables X and Y be jointly distributed as

f(x, y) = 1,  −x < y < x, 0 < x < 1
        = 0,  otherwise

Find the regression equation of Y on X.

Solution: The marginal density of X is given by

f1(x) = ∫_{−x}^x f(x, y) dy = ∫_{−x}^x 1 dy = 2x;  0 < x < 1

The regression equation of Y on X is given by

µY|X = E[Y|X = x] = ∫_{−x}^x y f(x, y) dy / f1(x) = ∫_{−x}^x y dy / 2x = 0

which is the required regression equation of Y on X.

Example 3. Given the joint density

f(x, y) = 6x,  0 < x < y < 1
        = 0,  otherwise

find the regression line of X on Y.

Solution: The marginal densities are given by

f1(x) = ∫_x^1 f(x, y) dy = ∫_x^1 6x dy = 6x(1 − x);  0 < x < 1

f2(y) = ∫_0^y f(x, y) dx = ∫_0^y 6x dx = 3y²;  0 < y < 1

µ1 = E[X] = ∫_0^1 x f1(x) dx = ∫_0^1 6x²(1 − x) dx = 6 ∫_0^1 (x² − x³) dx

= 6 (x³/3 − x⁴/4)|_0^1 = 1/2

E[X²] = ∫_0^1 x² f1(x) dx = ∫_0^1 6x³(1 − x) dx = 6 ∫_0^1 (x³ − x⁴) dx = 6 (x⁴/4 − x⁵/5)|_0^1 = 3/10

σ1² = E[X²] − (E[X])² = 3/10 − 1/4 = 1/20

µ2 = E[Y] = ∫_0^1 y f2(y) dy = ∫_0^1 3y³ dy = 3 · (y⁴/4)|_0^1 = 3/4

E[Y²] = ∫_0^1 y² f2(y) dy = ∫_0^1 3y⁴ dy = 3 · (y⁵/5)|_0^1 = 3/5

σ2² = E[Y²] − (E[Y])² = 3/5 − 9/16 = 3/80

E[XY] = ∫_0^1 ∫_0^y xy f(x, y) dx dy = ∫_0^1 ∫_0^y 6x²y dx dy = ∫_0^1 2y⁴ dy = 2/5

∴ ρ = Cov(X, Y)/(σ1 σ2) = (E[XY] − E[X]E[Y])/(σ1 σ2) = (2/5 − (1/2)(3/4)) / (√(1/20) · √(3/80)) = (1/40)/(√3/40) = 1/√3 ≈ 0.58

The regression line of X on Y is given by

µX|Y = µ1 + ρ (σ1/σ2)(y − µ2)
⇒ µX|Y = 0.5 + (2/3)(y − 0.75)

7.4. Rank Correlation Coefficient

This is a non-parametric measure of association. It is also known as Spearman's rank correlation coefficient.

Definition 4. Let {(xi, yi); i = 1, 2, 3, ..., n} be the given set of paired data. Then the rank correlation coefficient, denoted by rs, is defined by

rs = 1 − 6 Σ di² / (n(n² − 1))

where di is the difference of the ranks assigned to xi and yi.

Notes:

1. When there are ties in rank, we assign the tied observations the mean of the ranks that they jointly occupy.

2. When there are no ties in rank, rs = r (the correlation coefficient).

Example 4. The number of hours of study of 10 students for an examination and the scores obtained are as follows:

No. of hrs. studied (x):  8   5   11  13  10  5   18  15  2   8
Score (y):                56  44  79  72  54  94  85  33  70  65

Calculate the rank correlation coefficient.

Solution: The ranking of the x's and y's is given below.

Rank of x (R1)   Rank of y (R2)   d = R1 − R2   d²
6.5              7                −0.5          0.25
8.5              9                −0.5          0.25
4                3                 1            1
3                4                −1            1
5                8                −3            9
8.5              1                 7.5          56.25
1                2                −1            1
2                10               −8            64
10               5                 5            25
6.5              6                 0.5          0.25

Here Σ d² = 158.

The rank correlation coefficient is given by

rs = 1 − 6 Σ d² / (n(n² − 1)) = 1 − (6 × 158)/(10(10² − 1)) = 1 − 948/990 ≈ 0.04
= 1 - -n(nZ-1) 10(102- 1)

Exercise 7.1

1. Let X and Y have joint p.m.f. given as

(x, y)     (1,1)   (1,2)   (1,3)   (2,1)   (2,2)   (2,3)
p(x, y)    2/15    4/15    3/15    1/15    1/15    4/15

and zero elsewhere. Find the correlation coefficient between X and Y.

2. Let X and Y be jointly distributed as

f(x, y) = 2 − x − y,  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
        = 0,  otherwise

a) Find the covariance between X and Y.
b) Find the correlation of X and Y.

3. The joint density of two random variables X and Y is given by

f(x, y) = (2x + y)/210,  2 < x < 6, 0 < y < 5
        = 0,  otherwise

Find the regression line of Y on X.

4. Let the joint density of (X, Y) be

f(x, y) = 8xy,  0 < x < y < 1
        = 0,  otherwise

a) Find the correlation coefficient of X and Y.
b) Find both the regression lines.

5. The rankings of 10 students in two subjects A and B are as follows (the ranking table is not legible in the source). Calculate the rank correlation coefficient.

Answers

1. 0.246
2. a) −1/144  b) −1/11
3. µY|X = 2.698 − 0.040(x − 4.254)
4. a) 0.49  b) µY|X = 0.8 + 0.364(x − 0.53), µX|Y = 0.53 + 0.667(y − 0.8)
5. 0.3
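Exercise 2's answers can be verified numerically with a midpoint-rule double integral over the unit square; a sketch:

```python
from math import sqrt

def f(x, y):
    # density of Exercise 2 on the unit square
    return 2 - x - y

def E(g, n=400):
    """E[g(X, Y)] by the midpoint rule over [0, 1] x [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * f(x, y)
    return total * h * h

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - EX * EY
var_x = E(lambda x, y: x * x) - EX ** 2
var_y = E(lambda x, y: y * y) - EY ** 2
rho = cov / sqrt(var_x * var_y)
print(round(cov, 5), round(rho, 4))  # close to -1/144 ≈ -0.00694 and -1/11 ≈ -0.0909
```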
