7) Correlation and Regression
Correlation is a mathematical measure of the linear relationship between two variables. It gives the strength and direction of that relationship. If a change in one variable X produces a change in the variable Y, then X and Y are said to be correlated. When an increase in the value of one variable increases the value of the other, the two are said to be positively correlated; when an increase in one causes a decrease in the other, they are said to be negatively correlated. Two variables may be highly, moderately or slightly correlated. The measure of such a correlation is called the correlation coefficient and is usually denoted by $\rho$.

7.1. Covariance
Definition 1. If X and Y are two random variables, then the covariance between them is defined as
$$\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])],$$
provided that the expectations of X and Y exist.
In fact,
$$\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY - X E[Y] - Y E[X] + E[X]E[Y]] = E[XY] - E[X]E[Y]. \qquad (1)$$
When $E[XY] = E[X]E[Y]$, as is the case for independent X and Y, from (1) we have $\operatorname{Cov}(X, Y) = 0$.

The correlation coefficient between X and Y is usually denoted by $\rho$ and is defined as
$$\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y},$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y respectively. Sometimes it is also denoted by 'r'.

Remark: $-1 \le \rho \le 1$

Theorem 1. Let X and Y be jointly distributed random variables whose variances exist and are finite. Let $\mu_1, \mu_2$ be the means and $\sigma_1^2, \sigma_2^2$ the variances of X and Y respectively, and let $\rho$ be the correlation coefficient between X and Y. If $E[Y|X]$ is linear in X, then
a) $E[Y|X] = \mu_2 + \rho \dfrac{\sigma_2}{\sigma_1}(X - \mu_1)$
b) $E(\operatorname{Var}[Y|X]) = \sigma_2^2 (1 - \rho^2)$

Proof: a) Suppose X and Y are continuous random variables having joint pdf $f(x, y)$, and let the marginal pdf of X be $f_1(x)$ and that of Y be $f_2(y)$.
Since $E[Y|X]$ is linear in X, let $E[Y|X = x] = a + bx$.
…
$$\cdots = a \int_{-\infty}^{\infty} x f_1(x)\,dx + b \int_{-\infty}^{\infty} x^2 f_1(x)\,dx$$
$$\Rightarrow E[XY] = a E[X] + b E[X^2] \qquad (3)$$
…
b) Using the result of part a),
$$E(\operatorname{Var}[Y|X]) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Big[(y - \mu_2) - \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1)\Big]^2 f(x, y)\,dy\,dx$$
$$= E[(Y - \mu_2)^2] - 2\rho \frac{\sigma_2}{\sigma_1} E[(X - \mu_1)(Y - \mu_2)] + \rho^2 \frac{\sigma_2^2}{\sigma_1^2} E[(X - \mu_1)^2]$$
$$= \sigma_2^2 - 2\rho^2 \sigma_2^2 + \rho^2 \sigma_2^2 = \sigma_2^2 (1 - \rho^2).$$
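The definition of covariance, its expanded form (1) and the correlation coefficient can be checked on a small discrete distribution. The sketch below is only an illustration: the joint p.m.f. `JOINT_PMF` and the helper names `expect`, `covariance`, `covariance_shortcut` and `correlation` are assumptions introduced here, not part of the text.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint p.m.f. chosen for illustration only (not from the text).
JOINT_PMF = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}


def expect(g, pmf):
    """Return E[g(X, Y)] for a finite joint p.m.f. given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())


def covariance(pmf):
    """Cov(X, Y) straight from the definition E[(X - E[X])(Y - E[Y])]."""
    mean_x = expect(lambda x, y: x, pmf)
    mean_y = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mean_x) * (y - mean_y), pmf)


def covariance_shortcut(pmf):
    """Cov(X, Y) from the expanded form E[XY] - E[X]E[Y]."""
    mean_x = expect(lambda x, y: x, pmf)
    mean_y = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: x * y, pmf) - mean_x * mean_y


def correlation(pmf):
    """rho = Cov(X, Y) / (sigma_X * sigma_Y), returned as a float."""
    mean_x = expect(lambda x, y: x, pmf)
    mean_y = expect(lambda x, y: y, pmf)
    var_x = expect(lambda x, y: (x - mean_x) ** 2, pmf)
    var_y = expect(lambda x, y: (y - mean_y) ** 2, pmf)
    return float(covariance(pmf)) / sqrt(float(var_x * var_y))


cov = covariance(JOINT_PMF)
rho = correlation(JOINT_PMF)
print(cov, covariance_shortcut(JOINT_PMF), rho)
```

Exact fractions are used for the moments so that the two covariance formulas can be compared with `==` rather than with a floating-point tolerance; only the final square root is taken in floating point.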
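$$\cdots = \frac{3}{12} = \frac{1}{4}$$
$$E[X^2] = \sum_{x = 0, 1, 2} x^2 p_1(x) = 0^2 \cdot p_1(0) + 1^2 \cdot p_1(1) + 2^2 \cdot p_1(2) = \frac{[\ldots]}{12}$$

Example 2. Let X and Y have joint pdf given by
$$f(x, y) = \begin{cases} 2, & 0 < x < y,\ 0 < y < 1 \\ 0, & \text{otherwise} \end{cases}$$
Show that the correlation coefficient of X and Y is $\frac{1}{2}$.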
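The claim of Example 2 can be checked with exact arithmetic. The sketch below assumes the closed form $E[X^a Y^b] = 2/((a+1)(a+b+2))$, obtained by integrating $2x^a y^b$ over $0 < x < y$ and then over $0 < y < 1$; the function name `moment` is a hypothetical helper introduced only for this check.

```python
from fractions import Fraction


def moment(a, b):
    """E[X^a Y^b] for f(x, y) = 2 on 0 < x < y < 1.

    Integrating 2 * x^a * y^b over x in (0, y) and then y in (0, 1)
    gives 2 / ((a + 1) * (a + b + 2)).
    """
    return Fraction(2, (a + 1) * (a + b + 2))


mean_x, mean_y = moment(1, 0), moment(0, 1)
var_x = moment(2, 0) - mean_x ** 2
var_y = moment(0, 2) - mean_y ** 2
cov_xy = moment(1, 1) - mean_x * mean_y

# rho^2 stays an exact fraction; rho itself has the sign of the covariance.
rho_squared = cov_xy ** 2 / (var_x * var_y)
print(mean_x, mean_y, var_x, var_y, cov_xy, rho_squared)
```

Since the covariance is positive and $\rho^2 = \frac{1}{4}$, the check agrees with $\rho = \frac{1}{2}$.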
178 Probability and Statistics
Solution: The marginal pdf of X is
$$f_1(x) = \int_x^1 f(x, y)\,dy; \quad 0 < x < 1$$
…
$$\sigma_1^2 = E[X^2] - (E[X])^2 = \frac{1}{6} - \Big(\frac{1}{3}\Big)^2 = \frac{1}{18}$$
The marginal pdf of Y is given by
$$f_2(y) = \int_0^y f(x, y)\,dx; \quad 0 < y < 1$$
…
$$\rho = \frac{E[XY] - E[X]E[Y]}{\sigma_1 \sigma_2} = \frac{\frac{1}{4} - \frac{1}{3} \cdot \frac{2}{3}}{\frac{1}{\sqrt{18}} \cdot \frac{1}{\sqrt{18}}} = \frac{1}{2}$$

… price, the patient's weight in terms of the number of weeks he or she has been on a diet, the per capita consumption of certain foods in terms of their nutritional values, etc.

7.… Bivariate Regression
Definition 3. Let X and Y be two variables which are …
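Proof: We will prove the result for the continuous case. For the discrete case, the integrals are replaced by summations and the proof proceeds in a similar way.
Let the regression equation of Y on X be linear. Then
$$\mu_{Y|x} = a + bx \;\Rightarrow\; \int_{-\infty}^{\infty} y f(y|x)\,dy = a + bx$$
$$\Rightarrow \int_{-\infty}^{\infty} y f(x, y)\,dy = (a + bx) f_1(x). \qquad (1)$$
Integrating both sides of (1) with respect to x,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (a + bx) f_1(x)\,dx$$
$$\Rightarrow \mu_2 = a + b\mu_1 \qquad (2)$$
Multiplying both sides of (1) by x and integrating with respect to x,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dy\,dx = a \int_{-\infty}^{\infty} x f_1(x)\,dx + b \int_{-\infty}^{\infty} x^2 f_1(x)\,dx$$
$$\Rightarrow E[XY] = a E[X] + b E[X^2] \qquad (3)$$
As $\rho$ is the correlation coefficient between X and Y,
$$\therefore \rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_1 \sigma_2} = \frac{E[XY] - E[X]E[Y]}{\sigma_1 \sigma_2}$$
$$\therefore (3) \Rightarrow \rho \sigma_1 \sigma_2 + \mu_1 \mu_2 = a\mu_1 + b(\sigma_1^2 + \mu_1^2) \qquad (4)$$
On solving (2) and (4) simultaneously, we get
$$a = \mu_2 - \rho \frac{\sigma_2}{\sigma_1} \mu_1 \quad \text{and} \quad b = \rho \frac{\sigma_2}{\sigma_1}.$$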
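The solution for a and b can be checked by substituting it back into the two equations it was solved from. The sketch below takes equation (2) to be $\mu_2 = a + b\mu_1$ (the result of integrating (1) over x) and equation (4) as stated in the proof; the function name `regression_coefficients` and the numerical moments are hypothetical, chosen only to exercise the formulas.

```python
from math import isclose


def regression_coefficients(mu1, mu2, sigma1, sigma2, rho):
    """Return (a, b) for the line mu_{Y|x} = a + b*x from the five moments."""
    b = rho * sigma2 / sigma1
    a = mu2 - b * mu1
    return a, b


# Hypothetical moments, chosen only to exercise the formulas.
mu1, mu2, sigma1, sigma2, rho = 2.0, 5.0, 1.5, 3.0, 0.6
a, b = regression_coefficients(mu1, mu2, sigma1, sigma2, rho)

# Equation (2): mu2 = a + b*mu1
eq2_holds = isclose(mu2, a + b * mu1)
# Equation (4): rho*sigma1*sigma2 + mu1*mu2 = a*mu1 + b*(sigma1^2 + mu1^2)
eq4_holds = isclose(rho * sigma1 * sigma2 + mu1 * mu2,
                    a * mu1 + b * (sigma1 ** 2 + mu1 ** 2))
print(a, b, eq2_holds, eq4_holds)
```

Substituting a and b gives the regression line in the form $\mu_{Y|x} = \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x - \mu_1)$, so the line always passes through the point of means $(\mu_1, \mu_2)$.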
The regression equation of Y on X is given by
$$\mu_{Y|x} = E[Y | X = x] = \frac{\int_0^{\infty} y f(x, y)\,dy}{f_1(x)} = \frac{\int_0^{\infty} x y e^{-x(1+y)}\,dy}{e^{-x}} = \int_0^{\infty} x y e^{-xy}\,dy = \frac{1}{x},$$
which is the required regression equation of Y on X.

Solution: The marginal density of X is given by
$$f_1(x) = \int_x^1 f(x, y)\,dy = \int_x^1 6x\,dy = 6x \cdot y \Big|_x^1 = 6x(1 - x); \quad 0 < x < 1$$
The marginal density of Y is given by
$$f_2(y) = \int_0^y f(x, y)\,dx = \int_0^y 6x\,dx = 6 \cdot \frac{x^2}{2} \Big|_0^y = 3y^2; \quad 0 < y < 1$$
$$E[X^2] = 6 \int_0^1 (x^3 - x^4)\,dx = 6 \Big(\frac{x^4}{4} - \frac{x^5}{5}\Big) \Big|_0^1 = \frac{3}{10}$$
$$\therefore \sigma_1^2 = E[X^2] - (E[X])^2 = \frac{3}{10} - \frac{1}{4} = \frac{1}{20}$$
$$\mu_2 = E[Y] = \int_0^1 y f_2(y)\,dy = \int_0^1 3y^3\,dy = 3 \cdot \frac{y^4}{4} \Big|_0^1 = \frac{3}{4}$$
$$E[Y^2] = \int_0^1 y^2 f_2(y)\,dy = \int_0^1 3y^4\,dy = 3 \cdot \frac{y^5}{5} \Big|_0^1 = \frac{3}{5}$$
$$\therefore \sigma_2^2 = E[Y^2] - (E[Y])^2 = \frac{3}{5} - \frac{9}{16} = \frac{3}{80}$$

Definition 4. Let $\{(x_i, y_i);\ i = 1, 2, 3, \ldots, n\}$ be the given set of paired data. Then the rank correlation coefficient, denoted by $r_s$, is defined by
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)},$$
where $d_i$ is the difference of the ranks assigned to $x_i$ and $y_i$.

2. When there are no ties in rank, then $r_s = r$ (the correlation coefficient).

Example 4. The number of hours of study of 10 students for an examination and the scores obtained are as follows:

No. of hrs. studied (x):   8   5  11  13  10   5  18  15   2   8
Score (y):                56  44  79  72  54  94  85  33  70  65

Ranking each series (tied values receive the average of the ranks they occupy) and squaring the rank differences, here $\sum d^2 = 158$.
The rank correlation coefficient is given by
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \cdot 158}{10(10^2 - 1)} = 1 - \frac{948}{990} \approx 0.04$$
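The rank correlation formula can be evaluated directly on the data of Example 4 as printed. The following is a minimal sketch: the helper names `average_ranks` and `spearman_rs` are introduced here, and ranking from 1 for the largest value is an assumption (ranking from the smallest gives the same squared differences). Tied values receive the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from 1 (largest) to n; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Return (sum of d^2, r_s) with r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rank_x = average_ranks(xs)
    rank_y = average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data as printed in Example 4.
hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
scores = [56, 44, 79, 72, 54, 94, 85, 33, 70, 65]
sum_d2, rs = spearman_rs(hours, scores)
print(sum_d2, round(rs, 2))
```

With these ten pairs the sum of squared rank differences is 158, and the formula evaluates to $1 - 948/990$, about 0.04.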
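Exercise 7.1

1. Let X and Y have joint p.m.f. given as

(x, y):    (1,1)  (1,2)  (1,3)  (2,1)  (2,2)  (2,3)
p(x, y):     2      4      3      1      …      …

(Only the entries 2, 4, 3, 1 of the p.m.f. row survive in the source; their common denominator and the last two entries are missing.)

5. The rankings of 10 students in two subjects A and B are as follows:

[Table of ranks in subjects A and B: not recoverable from the source.]

Calculate the rank correlation coefficient.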