7) Correlation and Regression

Correlation is a mathematical measure of the linear relationship between two variables. It gives the strength and direction of the linear relationship between the two variables. If a change in one variable X produces a change in the variable Y, then X and Y are said to be correlated. When an increase in the value of one variable increases the value of the other, the two are said to be positively correlated, and when an increase in one causes a decrease in the other, they are called negatively correlated. Two variables may be highly, slightly or moderately correlated. The measure of such a correlation is called the correlation coefficient and is usually denoted by ρ.

7.1. Covariance

Definition 1. If X and Y are two random variables, then the covariance between them is defined as
Cov(X, Y) = E[(X − E[X])(Y − E[Y])],
provided that the expectations of X and Y exist.

In fact,
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
= E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
= E[XY] − E[X]E[Y] ..........(I)

Remark: If X and Y are independent, then E[XY] = E[X]E[Y].
∴ From (I) we have Cov(X, Y) = 0.

7.2. Correlation Coefficient

Definition 2. The correlation coefficient between X and Y is usually denoted by ρ and is defined as
ρ = Cov(X, Y) / (σX σY),
where σX and σY are the standard deviations of X and Y respectively. Sometimes it is also denoted by r.

Remark: −1 ≤ ρ ≤ 1.

Theorem 1. Let X and Y be jointly distributed random variables such that the variances of X and Y exist finitely. Let μ1, μ2 be the means and σ1², σ2² the variances of X and Y respectively, and let ρ be the correlation coefficient between X and Y. If E[Y|X] is linear in X, then
a) E[Y|X] = μ2 + ρ (σ2/σ1)(X − μ1)
b) E(Var[Y|X]) = σ2² (1 − ρ²)

Proof: a) Suppose X and Y are continuous random variables having joint pdf f(x, y); let the marginal pdf of X be f1(x) and that of Y be f2(y).

Since E[Y|X] is linear in X, let E[Y|X] = a + bx.
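The definition of covariance and identity (I) are easy to check numerically on a small joint distribution. A minimal Python sketch (the pmf below is illustrative, not from the text):

```python
from fractions import Fraction as F
from math import sqrt

# Hypothetical joint pmf over four (x, y) points, chosen so X and Y are
# positively correlated; used to check Cov(X, Y) = E[XY] - E[X]E[Y].
pmf = {(0, 0): F(3, 8), (1, 1): F(3, 8), (0, 1): F(1, 8), (1, 0): F(1, 8)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov_def = expect(lambda x, y: (x - ex) * (y - ey))   # by Definition 1
cov_short = expect(lambda x, y: x * y) - ex * ey     # by identity (I)
assert cov_def == cov_short

var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
rho = float(cov_def) / sqrt(float(var_x * var_y))
assert -1 <= rho <= 1                                # the remark on rho
print(cov_def, rho)                                  # 1/8 0.5
```

Exact fractions keep the two covariance formulas comparable without rounding error.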

Correlation and Regression 175


⇒ ∫ y f(x, y)/f1(x) dy = a + bx
⇒ ∫ y f(x, y) dy = (a + bx) f1(x) ..........(1)

Integrate w.r.t. x on both sides of (1):
⇒ ∫∫ y f(x, y) dy dx = ∫ (a + bx) f1(x) dx
⇒ ∫ y [∫ f(x, y) dx] dy = a ∫ f1(x) dx + b ∫ x f1(x) dx
⇒ ∫ y f2(y) dy = a·1 + b E[X]
⇒ E[Y] = a + b E[X]
⇒ μ2 = a + b μ1, where μ1 = E[X] and μ2 = E[Y] ..........(2)

Now first multiply equation (1) by x and then integrate w.r.t. x; we have
∫∫ xy f(x, y) dy dx = ∫ x(a + bx) f1(x) dx = a ∫ x f1(x) dx + b ∫ x² f1(x) dx
⇒ E[XY] = a E[X] + b E[X²] ..........(3)

As ρ is the correlation coefficient between X and Y,
ρ = Cov(X, Y)/(σ1 σ2) = (E[XY] − E[X]E[Y])/(σ1 σ2)
⇒ E[XY] = ρ σ1 σ2 + μ1 μ2
∴ (3) ⇒ ρ σ1 σ2 + μ1 μ2 = a μ1 + b(σ1² + μ1²) ..........(4)

On solving (2) and (4) simultaneously, we get
a = μ2 − ρ (σ2/σ1) μ1 and b = ρ σ2/σ1
∴ E[Y|X] = a + bX = μ2 + ρ (σ2/σ1)(X − μ1)

b) The conditional variance of Y given X = x is given by
V[Y|X] = E[(Y − E[Y|X])²] = ∫ (y − E[Y|x])² f(y|x) dy
= ∫ [y − μ2 − ρ (σ2/σ1)(x − μ1)]² f(x, y)/f1(x) dy ..........(5)

The variance is non-negative and is a function of x alone.
∴ On multiplying equation (5) by f1(x) and integrating w.r.t. x on both sides, we get
E[V[Y|X]] = ∫∫ [y − μ2 − ρ (σ2/σ1)(x − μ1)]² f(x, y) dy dx
= ∫∫ [(y − μ2)² − 2ρ (σ2/σ1)(y − μ2)(x − μ1) + ρ² (σ2/σ1)² (x − μ1)²] f(x, y) dy dx
= E[(Y − μ2)²] − 2ρ (σ2/σ1) E[(X − μ1)(Y − μ2)] + ρ² (σ2²/σ1²) E[(X − μ1)²]
= σ2² − 2ρ (σ2/σ1) ρ σ1 σ2 + ρ² (σ2²/σ1²) σ1²
= σ2² − 2ρ² σ2² + ρ² σ2²
= σ2² (1 − ρ²)

176 Probability and Statistics Correlation and Regression 177


Hence proved.
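Theorem 1 can be sanity-checked by simulation in a case where E[Y|X] is known to be linear. For X and Z independent standard normals and Y = ρX + √(1 − ρ²) Z (an assumed construction for illustration, not from the text), E[Y|X] = ρX exactly, and since σ2² = 1 here, part (b) predicts E[Var[Y|X]] = 1 − ρ².

```python
import random
from math import sqrt

# Monte Carlo check of Theorem 1(b) for a standard bivariate normal pair.
# With Y = rho*X + sqrt(1 - rho^2)*Z, the residual Y - E[Y|X] has mean-square
# 1 - rho^2, matching sigma2^2 (1 - rho^2) since sigma2 = 1.
random.seed(0)
rho, n = 0.6, 200_000
resid_sq = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = rho * x + sqrt(1 - rho * rho) * random.gauss(0, 1)
    resid_sq += (y - rho * x) ** 2      # (Y - E[Y|X])^2
print(resid_sq / n)                     # close to 1 - rho^2 = 0.64
```

The sample average converges to 0.64 at the usual 1/√n Monte Carlo rate.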
Examples

Example 1. Let X and Y have joint p.m.f. given as

(x, y):    (0,0)   (0,1)   (0,2)   (1,1)   (1,2)   (2,2)
p(x, y):   1/12    2/12    1/12    3/12    4/12    1/12

and zero otherwise. Find the correlation coefficient between X and Y.

Solution: The marginal pmfs can be obtained as

X\Y      0       1       2       p1(x)
0        1/12    2/12    1/12    4/12
1        0       3/12    4/12    7/12
2        0       0       1/12    1/12
p2(y)    1/12    5/12    6/12    Σ p1(x) = Σ p2(y) = 1

E[X] = Σx x p1(x) = 0·(4/12) + 1·(7/12) + 2·(1/12) = 9/12 = 3/4
E[X²] = Σx x² p1(x) = 0²·(4/12) + 1²·(7/12) + 2²·(1/12) = 11/12
σX² = E[X²] − [E[X]]² = 11/12 − (3/4)² = 17/48

Similarly E[Y] = Σy y p2(y) = 0·(1/12) + 1·(5/12) + 2·(6/12) = 17/12
E[Y²] = Σy y² p2(y) = 0²·(1/12) + 1²·(5/12) + 2²·(6/12) = 29/12
σY² = E[Y²] − [E[Y]]² = 29/12 − (17/12)² = 59/144

E[XY] = Σx Σy xy p(x, y) = 1·1·(3/12) + 1·2·(4/12) + 2·2·(1/12) = 15/12 = 5/4

The correlation coefficient between X and Y is given by
ρ = (E[XY] − E[X]E[Y]) / (σX σY) = (5/4 − (3/4)(17/12)) / (√(17/48) · √(59/144)) = (3/16) / 0.381 ≈ 0.49
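The moments and the value ρ ≈ 0.49 can be cross-checked exactly with a short script (a verification sketch, not part of the text):

```python
from fractions import Fraction as F
from math import sqrt

# Exact moment check for Example 1's pmf (values as given in the table).
pmf = {(0, 0): F(1, 12), (0, 1): F(2, 12), (0, 2): F(1, 12),
       (1, 1): F(3, 12), (1, 2): F(4, 12), (2, 2): F(1, 12)}
assert sum(pmf.values()) == 1

E = lambda g: sum(p * g(x, y) for (x, y), p in pmf.items())
ex, ey = E(lambda x, y: x), E(lambda x, y: y)
vx = E(lambda x, y: x * x) - ex ** 2        # 17/48
vy = E(lambda x, y: y * y) - ey ** 2        # 59/144
cov = E(lambda x, y: x * y) - ex * ey       # 3/16
rho = float(cov) / sqrt(float(vx * vy))
print(ex, ey, cov, round(rho, 2))           # 3/4 17/12 3/16 0.49
```

Working in exact fractions until the final square root avoids any rounding in the intermediate moments.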
Example 2. Let X and Y have joint pdf given by
f(x, y) = 2, 0 < x < y < 1; 0 otherwise.
Show that the correlation coefficient of X and Y is 1/2.
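Before the analytic solution, the claim can be checked by simulation. Since f(x, y) = 2 on 0 < x < y < 1 is the uniform density on that triangle, the pair (min(U, V), max(U, V)) of two independent uniforms has exactly this law (a standard order-statistics fact; the check itself is a sketch, not part of the text):

```python
import random

# Sample (X, Y) uniformly on the triangle 0 < x < y < 1 via order statistics,
# then estimate the correlation coefficient of X and Y.
random.seed(1)
n = 200_000
xs, ys = [], []
for _ in range(n):
    u, v = random.random(), random.random()
    xs.append(min(u, v))
    ys.append(max(u, v))
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
rho = cov / (vx * vy) ** 0.5
print(round(rho, 2))   # close to 1/2
```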

Solution: The marginal pdf of X is
f1(x) = ∫_x^1 f(x, y) dy = ∫_x^1 2 dy = 2(1 − x); 0 < x < 1

E[X] = ∫_0^1 x f1(x) dx = 2 ∫_0^1 x(1 − x) dx = 1/3
E[X²] = ∫_0^1 x² f1(x) dx = 2 ∫_0^1 x²(1 − x) dx = 1/6
σX² = E[X²] − [E[X]]² = 1/6 − (1/3)² = 1/18

The marginal pdf of Y is given by
f2(y) = ∫_0^y f(x, y) dx = ∫_0^y 2 dx = 2y; 0 < y < 1

E[Y] = ∫_0^1 y f2(y) dy = 2 ∫_0^1 y² dy = 2/3
E[Y²] = ∫_0^1 y² f2(y) dy = 2 ∫_0^1 y³ dy = 1/2
σY² = E[Y²] − [E[Y]]² = 1/2 − (2/3)² = 1/18

E[XY] = ∫_0^1 ∫_0^y xy f(x, y) dx dy = 2 ∫_0^1 ∫_0^y xy dx dy = 1/4

The correlation coefficient between X and Y is given by
ρ = (E[XY] − E[X]E[Y]) / (σX σY) = (1/4 − (1/3)(2/3)) / (√(1/18) · √(1/18)) = (1/36)/(1/18) = 1/2

7.3. Regression

Regression is the statistical tool to measure the average relationship between two or more variables. In earlier times it was used by Sir Francis Galton in the study of heredity. Nowadays it is used in many branches, e.g. to predict the potential sales of a new product in terms of its price, a patient's weight in terms of the number of weeks he or she has been on a diet, the per capita consumption of certain foods in terms of their nutritional values, etc.

7.3.1. Bivariate Regression

Definition 3. Let X and Y be two variables which are jointly distributed. Then the regression equation of Y on X is denoted by μY|X and is defined as

μY|X = E[Y|X] = Σy y p(x, y) / p1(x)    for the discrete case
μY|X = E[Y|X] = ∫ y f(x, y) dy / f1(x)    for the continuous case

where f(x, y), p(x, y) denote the joint pdf and pmf and f1(x), p1(x) denote the marginal densities.

Similarly, the regression equation of X on Y is denoted by μX|Y and is defined as


μX|Y = Σx x p(x, y) / p2(y)    for the discrete case
μX|Y = ∫ x f(x, y) dx / f2(y)    for the continuous case
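The discrete case of Definition 3 can be sketched directly: condition on one coordinate and average the other. A minimal Python sketch on a hypothetical pmf (the pmf and the helper `regression` are illustrative, not from the text):

```python
from fractions import Fraction as F
from collections import defaultdict

# Illustrative joint pmf; negatively associated so the regressions are visible.
pmf = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(3, 8), (1, 1): F(1, 8)}

def regression(pmf, on_first=True):
    """Return mu_{Y|X} (or mu_{X|Y}) as a dict: conditioning value -> mean.

    Implements sum_y y p(x, y) / p1(x) for the discrete case of Definition 3.
    """
    num, den = defaultdict(F), defaultdict(F)
    for (x, y), p in pmf.items():
        cond, resp = (x, y) if on_first else (y, x)
        num[cond] += p * resp
        den[cond] += p
    return {c: num[c] / den[c] for c in den}

print(regression(pmf, on_first=True))    # mu_{Y|X}(0) = 3/4, mu_{Y|X}(1) = 1/4
print(regression(pmf, on_first=False))   # mu_{X|Y}(0) = 3/4, mu_{X|Y}(1) = 1/4
```

The symmetry of the two printed regressions here is a property of this particular pmf, not of regression in general.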
Remarks:
1. The graph of μX|Y or μY|X is called the regression curve of X on Y or of Y on X respectively. Any one or both of the regressions may be linear.
2. When the regressions of X on Y and Y on X are linear, they are called linear regressions.

Theorem 2. Let μ1, μ2 be the means and σ1², σ2² the variances of X and Y respectively, and let ρ be the correlation coefficient between X and Y. If the regression of Y on X is linear, then prove that
μY|X = μ2 + ρ (σ2/σ1)(x − μ1).

Proof: We will prove the result for the continuous case. For the discrete case, the integrals can be replaced by summation signs and the proof goes through in a similar way.

Let the regression equation of Y on X be linear:
μY|X = a + bx ⇒ ∫ y f(y|x) dy = a + bx
⇒ ∫ y f(x, y)/f1(x) dy = a + bx
⇒ ∫ y f(x, y) dy = (a + bx) f1(x) ..........(1)

Integrate w.r.t. x on both sides of (1):
⇒ ∫∫ y f(x, y) dy dx = ∫ (a + bx) f1(x) dx = a ∫ f1(x) dx + b ∫ x f1(x) dx
⇒ ∫ y f2(y) dy = a·1 + b E[X]
⇒ E[Y] = a + b E[X]
⇒ μ2 = a + b μ1, where μ1 = E[X] and μ2 = E[Y] ..........(2)

Now first multiply equation (1) by x and then integrate w.r.t. x; we have
∫∫ xy f(x, y) dy dx = ∫ x(a + bx) f1(x) dx = a ∫ x f1(x) dx + b ∫ x² f1(x) dx
⇒ E[XY] = a E[X] + b E[X²] ..........(3)

As ρ is the correlation coefficient between X and Y,
ρ = Cov(X, Y)/(σ1 σ2) = (E[XY] − E[X]E[Y])/(σ1 σ2)
⇒ E[XY] = ρ σ1 σ2 + μ1 μ2
∴ (3) ⇒ ρ σ1 σ2 + μ1 μ2 = a μ1 + b(σ1² + μ1²) ..........(4)

On solving (2) and (4) simultaneously, we get
a = μ2 − ρ (σ2/σ1) μ1 and b = ρ σ2/σ1

∴ μY|X = a + bx = μ2 + ρ (σ2/σ1)(x − μ1). Hence proved.
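Theorem 2's coefficients a = μ2 − ρ(σ2/σ1)μ1 and b = ρ σ2/σ1 are exactly the least-squares intercept and slope computed from moments. A Monte Carlo sketch on an assumed linear model (the model and its parameters are illustrative, not from the text):

```python
import random
from math import sqrt

# Simulate Y = 1.5 + 0.5*X + noise, then recover intercept and slope using
# only the moment formulas from Theorem 2: b = rho*s2/s1, a = my - b*mx.
random.seed(2)
n = 100_000
data = []
for _ in range(n):
    x = random.gauss(3, 2)
    data.append((x, 1.5 + 0.5 * x + random.gauss(0, 1)))
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
cov = sum((x - mx) * (y - my) for x, y in data) / n
vx = sum((x - mx) ** 2 for x, _ in data) / n
vy = sum((y - my) ** 2 for _, y in data) / n
rho = cov / sqrt(vx * vy)
b = rho * sqrt(vy) / sqrt(vx)     # equals cov / vx
a = my - b * mx
print(round(a, 1), round(b, 1))   # approximately 1.5 and 0.5
```

Note that ρ √(vy)/√(vx) simplifies to cov/vx, the familiar least-squares slope.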
Remark: If the regression of X on Y is linear, then similarly we have
μX|Y = μ1 + ρ (σ1/σ2)(y − μ2).

Examples

Example 1. Let X and Y be two random variables having joint density
f(x, y) = x e^(−x(1+y)), x > 0, y > 0; 0 otherwise.
Find the regression equation of Y on X.

Solution: The marginal density of X is given by
f1(x) = ∫_0^∞ f(x, y) dy = x e^(−x) [−e^(−xy)/x]_0^∞ = e^(−x); x > 0

The regression equation of Y on X is given by
μY|X = E[Y|X = x] = ∫_0^∞ y f(x, y) dy / f1(x) = ∫_0^∞ x y e^(−x(1+y)) dy / e^(−x) = ∫_0^∞ x y e^(−xy) dy = 1/x

which is the required regression equation of Y on X.

Example 2. Let the random variables X and Y be jointly distributed as
f(x, y) = 1, −x < y < x, 0 < x < 1; 0 otherwise.
Find the regression equation of Y on X.

Solution: The marginal density of X is given by
f1(x) = ∫_{−x}^{x} f(x, y) dy = ∫_{−x}^{x} 1 dy = 2x; 0 < x < 1

The regression equation of Y on X is given by
μY|X = E[Y|X = x] = ∫_{−x}^{x} y dy / (2x) = 0

which is the required regression equation of Y on X.

Example 3. Given the joint density
f(x, y) = 6x, 0 < x < y < 1; 0 otherwise,
find the regression line of X on Y.

Solution: The marginal density of X is given by
f1(x) = ∫_x^1 6x dy = 6x·y|_x^1 = 6x(1 − x); 0 < x < 1
f2(y) = ∫_0^y 6x dx = 6·(x²/2)|_0^y = 3y²; 0 < y < 1

μ1 = E[X] = ∫_0^1 x f1(x) dx = ∫_0^1 6x²(1 − x) dx = 6 (x³/3 − x⁴/4)|_0^1 = 1/2
1
E[X²] = ∫_0^1 x² f1(x) dx = ∫_0^1 6x³(1 − x) dx = 6 (x⁴/4 − x⁵/5)|_0^1 = 3/10
σ1² = E[X²] − (E[X])² = 3/10 − 1/4 = 1/20

μ2 = E[Y] = ∫_0^1 y f2(y) dy = ∫_0^1 3y³ dy = 3 (y⁴/4)|_0^1 = 3/4
E[Y²] = ∫_0^1 y² f2(y) dy = ∫_0^1 3y⁴ dy = 3 (y⁵/5)|_0^1 = 3/5
σ2² = E[Y²] − (E[Y])² = 3/5 − 9/16 = 3/80

E[XY] = ∫_0^1 ∫_0^y xy f(x, y) dx dy = ∫_0^1 ∫_0^y 6x²y dx dy = ∫_0^1 2y⁴ dy = 2/5

∴ ρ = Cov(X, Y)/(σ1 σ2) = (E[XY] − E[X]E[Y])/(σ1 σ2) = (2/5 − (1/2)(3/4)) / (√(1/20) · √(3/80)) = (1/40)/(√3/40) = 1/√3 ≈ 0.58

The regression line of X on Y is given by
μX|Y = μ1 + ρ (σ1/σ2)(y − μ2)
⇒ μX|Y = 0.5 + (2/3)(y − 0.75)

7.4. Rank Correlation Coefficient

This is called a non-parametric measure of association. It is also known as Spearman's rank correlation coefficient.

Definition 4. Let {(xi, yi); i = 1, 2, 3, ..., n} be the given set of paired data. Then the rank correlation coefficient, denoted by rs, is defined by
rs = 1 − 6 Σ di² / (n(n² − 1)),
where di is the difference of the ranks assigned to xi and yi.

Note:
1. When there are ties in rank, we assign the tied observations the mean of the ranks that they jointly occupy.
2. When there are no ties in rank, rs = r (the correlation coefficient).

Example 4. The numbers of hours of study of 10 students for an examination and the scores obtained are as follows:

No. of hrs studied (x):  8   5   11  13  10  5   18  15  2   8
Score (y):               56  44  79  72  54  94  85  33  70  65

Calculate the rank correlation coefficient.

Solution: The ranking of the x's and y's is given below.

Rank of x (R1)   Rank of y (R2)   d = R1 − R2   d²
6.5              7                −0.5          0.25
8.5              9                −0.5          0.25
4                3                1             1
3                4                −1            1
5                8                −3            9
8.5              1                7.5           56.25
1                2                −1            1
2                10               −8            64
10               5                5             25
6.5              6                0.5           0.25

Here Σ d² = 158.

The rank correlation coefficient is given by
rs = 1 − 6 Σ d² / (n(n² − 1)) = 1 − (6 · 158)/(10(10² − 1)) = 1 − 948/990 ≈ 0.04
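Example 4 can be reproduced end to end, including the mean-of-ranks tie handling from Note 1 (a verification sketch; the helper `ranks` is not from the text):

```python
# Spearman rank correlation for Example 4's data; ties get the mean of the
# ranks they jointly occupy, as in Note 1.
x = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
y = [56, 44, 79, 72, 54, 94, 85, 33, 70, 65]

def ranks(v):
    """Rank 1 = largest value; tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: -v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and v[order[j]] == v[order[i]]:
            j += 1                        # extend over the tied group
        mean_rank = (i + 1 + j) / 2       # mean of positions i+1 .. j
        for k in order[i:j]:
            r[k] = mean_rank
        i = j
    return r

rx, ry = ranks(x), ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(x)
rs = 1 - 6 * d2 / (n * (n * n - 1))
print(d2, round(rs, 2))                   # 158.0 0.04
```

The sum Σ d² = 158 matches the table, and rs is unchanged if ranks are assigned ascending instead of descending, since only the differences d enter the formula.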
! i
Exercise 7.1

1. Let X and Y have joint p.m.f. given as

(x, y):    (1,1)   (1,2)   (1,3)   (2,1)   (2,2)   (2,3)
p(x, y):   2/15    4/15    3/15    1/15    1/15    4/15

and zero elsewhere. Find the correlation coefficient between X and Y.

2. Let X and Y be jointly distributed as
f(x, y) = 2 − x − y, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1; 0 otherwise.
a) Find the covariance between X and Y.
b) Find the correlation of X and Y.

3. The joint density of two random variables X and Y is given by
f(x, y) = (2x + y)/210, 2 < x < 6, 0 < y < 5; 0 otherwise.
Find the regression line of Y on X.

4. Let the joint density of (X, Y) be
f(x, y) = 8xy, 0 < x < y < 1; 0 otherwise.
a) Find the correlation coefficient of X and Y.
b) Find both the regression lines.

5. Given the rankings of 10 students in two subjects A and B, calculate the rank correlation coefficient.

Answers
1. 0.246
2. a) −1/144  b) −1/11
3. μY|X = 2.698 + 0.0027(x − 4.253)
4. a) 0.49  b) μY|X = 0.8 + 0.267(x − 0.53), μX|Y = 0.53 + 0.898(y − 0.8)
5. 0.3
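The answers to Exercise 2 can be verified exactly. For f(x, y) = 2 − x − y on the unit square, every needed moment is a polynomial integral with a closed form (a verification sketch, assuming this reading of the density):

```python
from fractions import Fraction as F

# E[x^a y^b] under f(x, y) = 2 - x - y on [0, 1]^2, integrated term by term:
# int x^a y^b (2 - x - y) dx dy over the unit square.
def moment(a, b):
    return (F(2, (a + 1) * (b + 1))
            - F(1, (a + 2) * (b + 1))
            - F(1, (a + 1) * (b + 2)))

assert moment(0, 0) == 1                  # the density integrates to 1
ex, ey = moment(1, 0), moment(0, 1)       # both 5/12
cov = moment(1, 1) - ex * ey              # -1/144
vx = moment(2, 0) - ex ** 2               # 11/144
vy = moment(0, 2) - ey ** 2               # 11/144
rho = cov / vx                            # vx == vy, so rho = cov / vx exactly
print(cov, rho)                           # -1/144 -1/11
```

Because vx equals vy here, the correlation comes out as an exact fraction with no square root needed.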
