
Unit 2-Part 3-Linear Regression

- The coefficient of correlation (r) between x and y values is 0.5 - The equation of the regression line of y on x is: Y = 1.1 + 1.3X - The equation of the regression line of x on y is: X = 0.5Y + 0.5

Uploaded by

sasuke Uchiha
Copyright
© © All Rights Reserved

Regression

Example
• A researcher believes that there is a linear relationship between the BMI (kg/m²) of pregnant mothers and the birth weight (BW, in kg) of their newborns.

• A researcher also says that there is a linear relationship between study hours and the student's result.
Study Hours   Regents Score
3             80
5             90
2             75
6             80
7             90
1             50
2             65
7             85
1             40
7             100
Scatter Diagram
• A scatter diagram is a graphical method to display the relationship between two variables.

• A scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane.

• Y is called the dependent variable.

• X is called the independent variable.


Is there a linear relationship between study hours and result?
• Scatter diagrams are important for initial exploration of the relationship between two quantitative variables.

• In the above example, we may wish to summarize this relationship by a straight line drawn through the scatter of points.
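As a numerical companion to the scatter diagram, the following minimal sketch computes Karl Pearson's coefficient of correlation for the study-hours table. The function name `pearson_r` is an illustrative choice, and the formula used, r = cov(x, y) / (σx σy), is the one these notes define in the Regression section.

```python
from math import sqrt

# Study hours and Regents scores, copied from the table in these notes.
hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]

def pearson_r(xs, ys):
    """Pearson correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors in cov and in the two SDs cancel, so raw sums suffice.
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 supports summarizing the scatter by a straight line; for this table r comes out at roughly 0.85.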
Regression
• Regression is the estimation or prediction of unknown values of one variable from known values of another variable.

• The variable whose value is to be predicted is called the dependent variable, and the variable used for prediction is called the independent variable.

• If the scatter diagram indicates some relationship between two variables x and y, then the dots of the scatter diagram will be concentrated around a curve.

• This curve is called the curve of regression, and the relationship is said to be expressed by means of curvilinear regression.

• In the particular case when the curve is a straight line, it is called a line of regression and the regression is said to be linear.
• The equation of the line of regression of y on x is
$y = a + bx$ (eq. 1),
where y is the dependent variable and x is the independent variable.
• The line of regression always passes through the point $(\bar{x}, \bar{y})$, so $\bar{y} = a + b\bar{x}$ (eq. 2).
• $b_{yx}$ is the slope of the line (b = coefficient of regression): $b_{yx} = r\,\sigma_y / \sigma_x$,
where r is Karl Pearson's coefficient of correlation,
$r = \operatorname{cov}(x, y) / (\sigma_x \sigma_y)$
line of regression of y on x
• The equation of the line of regression of x on y is
$x = a + by$ (eq. 1),
where x is the dependent variable and y is the independent variable.
• The line of regression always passes through the point $(\bar{x}, \bar{y})$, so $\bar{x} = a + b\bar{y}$ (eq. 2).
• $b_{xy}$ is the slope of the line: $b_{xy} = r\,\sigma_x / \sigma_y$
line of regression of x on y
Example

The model has a deterministic and a probabilistic component.

[Figure: house cost plotted against house size. Most lots sell for $25,000, and building a house costs about $75 per square foot, so House cost = 25000 + 75(Size).]
Example

However, house costs vary even among houses of the same size! Since costs behave unpredictably, we add a random component.

[Figure: house cost plotted against house size. Most lots sell for $25,000; House cost = 25000 + 75(Size).]
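The two components of the house-cost model can be sketched in code. The lot price and the cost per square foot come from the slide; the size of the random component (`NOISE_SD`) and the choice of a normal distribution are assumptions made only for illustration.

```python
import random

LOT_COST = 25_000     # from the slide: most lots sell for $25,000
COST_PER_SQFT = 75    # from the slide: about $75 per square foot
NOISE_SD = 10_000     # assumption: spread of the random component, in dollars

def deterministic_cost(size_sqft):
    """Deterministic component: House cost = 25000 + 75(Size)."""
    return LOT_COST + COST_PER_SQFT * size_sqft

def simulated_cost(size_sqft, rng):
    """Deterministic component plus a random component (assumed normal)."""
    return deterministic_cost(size_sqft) + rng.gauss(0, NOISE_SD)

rng = random.Random(0)  # fixed seed so the run is repeatable
size = 2000
print("deterministic:", deterministic_cost(size))
costs = [simulated_cost(size, rng) for _ in range(3)]
for cost in costs:
    print(f"same-size house: {cost:,.0f}")
```

Houses of the same size share the deterministic value but differ in the simulated cost, which is the behaviour the random component is meant to capture.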
Estimating the Coefficients

• The estimates are determined by
  • drawing a sample from the population of interest,
  • calculating sample statistics,
  • producing a straight line that cuts into the data.

[Figure: scatter of sample points on x-y axes.]
Question: What should be considered a good line?
The Least Squares (Regression) Line

A good line is one that minimizes the sum of squared differences between the points and the line.
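The least squares criterion can be checked numerically. The sketch below fits the line to the study-hours table from earlier in these notes and compares its sum of squared differences with that of a few nudged lines; the function names and the particular nudges are illustrative choices.

```python
hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared vertical differences between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept a and slope b of the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b

a, b = least_squares_line(hours, scores)
best = sum_squared_errors(a, b, hours, scores)

# Any other line should do worse; try a few nudged intercepts and slopes.
nudges = [(1, 0), (-1, 0), (0, 0.5), (0, -0.5), (2, -0.3)]
for da, db in nudges:
    other = sum_squared_errors(a + da, b + db, hours, scores)
    print(f"a{da:+}, b{db:+}: SSE {other:.1f} vs least squares {best:.1f}")
```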
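y on x:
$y = a + bx$; substitute the values of a and b:

$a = \bar{y} - b\bar{x}$, $b = r\,\sigma_y / \sigma_x$

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

x on y:
$x = a + by$; substitute the values of a and b:

$a = \bar{x} - b\bar{y}$, $b = r\,\sigma_x / \sigma_y$

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$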
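For readers who want the missing step, here is a sketch of where the values of a and b for the line of y on x come from under the least squares criterion. The symbol S for the sum of squared differences is introduced here for illustration, and variances and covariance are taken with divisor n.

```latex
\begin{aligned}
S(a, b) &= \sum_{i=1}^{n} (y_i - a - b x_i)^2 \\
\frac{\partial S}{\partial a} = 0 &\;\Rightarrow\; \sum y_i = n a + b \sum x_i
  \;\Rightarrow\; a = \bar{y} - b\bar{x} \\
\frac{\partial S}{\partial b} = 0 &\;\Rightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2 \\
\text{substituting } a: \quad
b &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
   = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}
   = r\,\frac{\sigma_y}{\sigma_x}
\end{aligned}
```

Interchanging the roles of x and y gives the corresponding values for the line of x on y.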
Regression
LINE OF REGRESSION:
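• The equation of the line of regression of y on x is $y = a + bx$.

• $\bar{y} = a + b\bar{x}$ (as the line of regression passes through $(\bar{x}, \bar{y})$), so

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

• The equation of the line of regression of x on y is

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

where $r = \operatorname{cov}(x, y) / (\sigma_x \sigma_y)$

Both lines pass through $(\bar{x}, \bar{y})$.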
Regression Coefficients

• The slope of the line of regression of Y on X is also called the coefficient of regression of Y on X.

• Regression coefficient of Y on X: $b_{yx} = r\,\sigma_y / \sigma_x$

• Regression coefficient of X on Y: $b_{xy} = r\,\sigma_x / \sigma_y$


Properties of Regression Coefficients
1. The correlation coefficient (r) is the geometric mean of the regression coefficients:
$b_{yx} = r\,\sigma_y / \sigma_x$ (eq. 1) and $b_{xy} = r\,\sigma_x / \sigma_y$ (eq. 2), so

$b_{yx} \times b_{xy} = r^2$
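The product rule follows in one line from the two definitions. The remark about the sign is a standard fact added here for completeness: both regression coefficients carry the sign of cov(x, y), so r takes that sign as well.

```latex
b_{yx}\, b_{xy}
  = \left(r\,\frac{\sigma_y}{\sigma_x}\right)\left(r\,\frac{\sigma_x}{\sigma_y}\right)
  = r^2
\qquad\Longrightarrow\qquad
r = \pm\sqrt{b_{yx}\, b_{xy}},
\quad \text{with the sign shared by } b_{yx} \text{ and } b_{xy}.
```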
Example 1
Obtain the following for the given data:
1) The least squares regression line of y on x
2) The line of regression of x on y
3) An estimate of y for x = 8
4) An estimate of x for y = 2
5) The coefficient of correlation (r)

x   3 4 5 6 4 5 6 7
y   3 5 3 2 3 4 6 6
x     y     $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
3     3     -2            -1            4                 1                 2
4     5     -1            1             1                 1                 -1
5     3     0             -1            0                 1                 0
6     2     1             -2            1                 4                 -2
4     3     -1            -1            1                 1                 1
5     4     0             0             0                 0                 0
6     6     1             2             1                 4                 2
7     6     2             2             4                 4                 4
Sum   Sum                               Sum               Sum               Sum
= 40  = 32                              = 12              = 16              = 6

Mean of x: $\bar{x} = \sum x / n = 40/8 = 5$
Mean of y: $\bar{y} = \sum y / n = 32/8 = 4$

Coefficient of regression of y on x is:
$b_{yx} = \sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2 = 6/12 = 0.5$

Line of regression of y on x is:
Y - 4 = 0.5 (X - 5)
Y - 4 = 0.5 X - 2.5
Y = 0.5 X - 2.5 + 4
Y = 1.5 + 0.5 X
Also, the coefficient of regression of x on y is:
$b_{xy} = \sum (x-\bar{x})(y-\bar{y}) / \sum (y-\bar{y})^2 = 6/16 = 0.375$

Line of regression of x on y is:

X - 5 = 0.375 (Y - 4)
X = 5 - 1.5 + 0.375 Y
X = 3.5 + 0.375 Y
Find Y when X = 8:

Putting X = 8 into the line of regression of Y on X:

Y = 0.5(8) + 1.5 = 5.5

Find X when Y = 2:

Putting Y = 2 into the line of regression of X on Y:

X = 3.5 + 0.375 * 2
X = 4.25

Coefficient of correlation (r):

As $b_{yx} \times b_{xy} = r^2$,
$r = (0.5 \times 0.375)^{1/2} = 0.43$
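The whole of Example 1 can be re-checked with a short script. This is a sketch that mirrors the deviation table; the variable names are illustrative, and `copysign` is used so that r takes the sign of the sum of cross-products.

```python
from math import copysign, sqrt

x = [3, 4, 5, 6, 4, 5, 6, 7]
y = [3, 5, 3, 2, 3, 4, 6, 6]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Column sums of the deviation table.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b_yx = sxy / sxx                 # slope of y on x
b_xy = sxy / syy                 # slope of x on y
a_yx = mean_y - b_yx * mean_x    # intercept of y on x
a_xy = mean_x - b_xy * mean_y    # intercept of x on y

y_at_8 = a_yx + b_yx * 8         # estimate of y for x = 8
x_at_2 = a_xy + b_xy * 2         # estimate of x for y = 2
r = copysign(sqrt(b_yx * b_xy), sxy)

print(f"y = {a_yx} + {b_yx} x")
print(f"x = {a_xy} + {b_xy} y")
print(f"y(8) = {y_at_8}, x(2) = {x_at_2}, r = {r:.2f}")
```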
Example 2

For the following values of x and y:

x   1 2 3 4 5
y   2 5 3 8 7

• Calculate Karl Pearson's coefficient of correlation (r)
• Obtain the equation of the regression line of x on y
• Obtain the equation of the regression line of y on x
• Find r using either of the following:
  $r = \operatorname{cov}(x, y) / (\sigma_x \sigma_y)$
  or
  find $b_{yx}$ and $b_{xy}$, then find r [as $b_{yx} \times b_{xy} = r^2$]
$x_i$   $y_i$   $x_i-\bar{x}$   $y_i-\bar{y}$   $(x_i-\bar{x})^2$   $(y_i-\bar{y})^2$   $(x_i-\bar{x})(y_i-\bar{y})$
1       2       -2              -3              4                   9                   6
2       5       -1              0               1                   0                   0
3       3       0               -2              0                   4                   0
4       8       1               3               1                   9                   3
5       7       2               2               4                   4                   4
Sum     Sum                                     Sum                 Sum                 Sum
= 15    = 25                                    = 10                = 26                = 13

Mean of x: $\bar{x} = \sum x / n = 15/5 = 3$
Mean of y: $\bar{y} = \sum y / n = 25/5 = 5$

For the regression line calculation:
$b_{yx} = \sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2 = 13/10 = 1.3$
$b_{xy} = \sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (y_i-\bar{y})^2 = 13/26 = 0.5$

Line of regression of y on x is

Y - 5 = 1.3 (X - 3)
Y = 1.3 X - 3.9 + 5
Y = 1.1 + 1.3 X
Line of regression of x on y is

X - 3 = 0.5 (Y - 5)
X = 0.5 Y + 0.5

Calculate the value of r:
As $b_{yx} \times b_{xy} = r^2$,
$r = (1.3 \times 0.5)^{1/2} = +0.806$
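Example 2 mentions two routes to r. The sketch below takes the direct route, r = cov(x, y) / (σx σy), and then recovers the regression coefficients from r, so the two routes can be compared. It assumes population (divisor n) variances via `statistics.pstdev`; using the sample versions throughout would give the same r and the same slopes, because the divisors cancel.

```python
from statistics import mean, pstdev

x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]

mean_x, mean_y = mean(x), mean(y)

# Population covariance and standard deviations (divisor n).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / len(x)
sd_x, sd_y = pstdev(x), pstdev(y)

r = cov_xy / (sd_x * sd_y)   # direct formula
b_yx = r * sd_y / sd_x       # regression coefficient of y on x
b_xy = r * sd_x / sd_y       # regression coefficient of x on y

print(f"r = {r:.3f}, b_yx = {b_yx:.2f}, b_xy = {b_xy:.2f}")
print(f"b_yx * b_xy = {b_yx * b_xy:.3f}, r^2 = {r * r:.3f}")
```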
Example 3

The regression line of y on x for certain bivariate data is 5y + 3x = 52, and the line of regression of x on y is 2x + y = 30. Find:

i) The arithmetic means of x and y
ii) The coefficient of correlation between x and y
iii) The most probable value of y when x = 10
iv) The most probable value of x when y = 5
Example 3
i) Both lines pass through the point of means, so solve the given equations:

3x + 5y = 52 → 3x + 5y = 52
2x + y = 30 → 10x + 5y = 150 (multiply by 5)
Subtract the first equation from the second: 7x = 98 → x = 14

Calculate y: 2 * 14 + y = 30 (put x = 14 in the second equation)
y = 2

AM of x = 14 and AM of y = 2

ii) Y on X:
5y + 3x = 52 → 5y = 52 - 3x → y = (52/5) + (-3/5) x, which has the form $y = a + b_{yx} x$, so $b_{yx} = -3/5$
X on Y:
2x + y = 30 → 2x = 30 - y → x = 15 + (-1/2) y, which has the form $x = a + b_{xy} y$, so $b_{xy} = -1/2$
As $b_{yx} \times b_{xy} = r^2$, $r^2 = 0.3$. Both regression coefficients are negative, so r is negative:
r = -0.5477

iii) The most probable value of y when x = 10:

y = -3/5 x + 52/5 = -3/5 * 10 + 52/5 = 4.4

iv) The most probable value of x when y = 5:

2x + y = 30 → x = 15 - 5/2 = 12.5
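Example 3 can be verified in code. In this sketch each line is stored as a hypothetical triple (A, B, C) meaning A·x + B·y = C, and the point of means is found with Cramer's rule rather than the elimination used on the slide; both give the same intersection.

```python
from math import copysign, sqrt

# Each line is stored as (A, B, C), meaning A*x + B*y = C.
y_on_x = (3, 5, 52)   # 5y + 3x = 52
x_on_y = (2, 1, 30)   # 2x + y = 30

def intersection(line1, line2):
    """Solve the 2x2 linear system by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

# Both regression lines pass through (mean_x, mean_y).
mean_x, mean_y = intersection(y_on_x, x_on_y)

b_yx = -y_on_x[0] / y_on_x[1]            # from y = C/B - (A/B) x
b_xy = -x_on_y[1] / x_on_y[0]            # from x = C/A - (B/A) y
r = copysign(sqrt(b_yx * b_xy), b_yx)    # r shares the sign of the slopes

y_at_10 = (y_on_x[2] - y_on_x[0] * 10) / y_on_x[1]   # use y on x to predict y
x_at_5 = (x_on_y[2] - x_on_y[1] * 5) / x_on_y[0]     # use x on y to predict x

print(mean_x, mean_y, round(r, 4), y_at_10, x_at_5)
```

Note the design choice in the last two lines: y is predicted from the line of y on x and x from the line of x on y, never by inverting the other line.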
Example 4
• In a partially destroyed lab record of an analysis of correlation data, only the following results are readable:
  • Variance of x = 9
  • Regression equations: 8x - 10y + 66 = 0 and 40x - 18y = 214
Find the value of
a) the correlation coefficient between x and y,
b) the mean values of x and y,
c) the standard deviation of y.

Answers: r = 0.6, mean of x = 13, mean of y = 17, SD of y = 4

a) Let us consider
Line of y on x: 8x - 10y + 66 = 0 → y = 6.6 + 0.8 x → $b_{yx} = 0.8$
Line of x on y: 40x - 18y = 214 → x = (214/40) + (18/40) y → $b_{xy} = 18/40$
As $b_{yx} \times b_{xy} = r^2$,
$r^2 = 0.8 \times 18/40 = 0.36$ → r = +0.6 (positive, since both regression coefficients are positive)

b) Since both lines pass through the point $(\bar{x}, \bar{y})$, solving the two equations 8x - 10y + 66 = 0 and 40x - 18y = 214 gives
$\bar{x} = 13$ and $\bar{y} = 17$

c) Variance = SD², so $\sigma_x = \sqrt{9} = 3$
$b_{yx} = r\,\sigma_y / \sigma_x$
$0.8 = 0.6 \times \sigma_y / 3$
$\sigma_y = 0.8 \times 3 / 0.6 = 4$
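The solution assumes that the first equation is the line of y on x. A quick way to justify that reading, added here as a sanity check that is not on the slide, is to try both assignments and keep the one whose product of regression coefficients does not exceed 1, since r² cannot be larger than 1. The variable names are illustrative.

```python
from math import sqrt

var_x = 9
# Line 1: 8x - 10y + 66 = 0      Line 2: 40x - 18y = 214

# Reading A: line 1 is y on x, line 2 is x on y.
b_yx_a = 8 / 10     # y = 6.6 + 0.8 x
b_xy_a = 18 / 40    # x = 214/40 + (18/40) y

# Reading B: the roles are swapped.
b_xy_b = 10 / 8     # x = -66/8 + (10/8) y
b_yx_b = 40 / 18    # y = -214/18 + (40/18) x

r_sq_a = b_yx_a * b_xy_a
r_sq_b = b_yx_b * b_xy_b
print(f"reading A: r^2 = {r_sq_a:.2f}")
print(f"reading B: r^2 = {r_sq_b:.2f}  (greater than 1, so not admissible)")

# Continue with reading A; both slopes are positive, so r is positive.
r = sqrt(r_sq_a)
sd_x = sqrt(var_x)
sd_y = b_yx_a * sd_x / r    # from b_yx = r * sd_y / sd_x
print(f"r = {r:.1f}, sd_y = {sd_y:.1f}")
```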
Mean x= 67.657
Mean y = 68.70
r=0.513
Practice Questions
• The two regression lines obtained from certain data were y = x + 5 and 16x = 9y - 94. Find the variance of X, given that the variance of Y is 16. Also find the covariance between X and Y.
(Ans: variance of X = 9, cov = 9)
(Hint: $\operatorname{cov}(x, y) / (\sigma_x \sigma_y) = r$)
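Given: y = x + 5 → $b_{yx} = 1$
16x = 9y - 94 → x = (9/16) y - 94/16 → $b_{xy} = 9/16$
As $b_{yx} \times b_{xy} = r^2$,
$r = (1 \times 9/16)^{1/2} = 3/4$ (eq. 1)

Variance(y) = 16, so $\sigma_y = (16)^{1/2} = 4$ (eq. 2)

$b_{yx} = r\,\sigma_y / \sigma_x$ → $1 = (3/4)(4) / \sigma_x$ → $\sigma_x = 3$, so variance of X = 9
$\operatorname{cov}(x, y) = r\,\sigma_x \sigma_y = (3/4)(3)(4) = 9$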
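The stated answers (variance of X = 9, cov = 9) can be reproduced with a few lines of arithmetic. This sketch also cross-checks the covariance a second way, using cov(x, y) = b_xy · var(y), which follows from b_xy = r σx / σy and the hint; the variable names are illustrative.

```python
from math import sqrt

b_yx = 1         # from y = x + 5
b_xy = 9 / 16    # from 16x = 9y - 94
var_y = 16

r = sqrt(b_yx * b_xy)        # both slopes are positive, so r is positive
sd_y = sqrt(var_y)
sd_x = r * sd_y / b_yx       # rearranged from b_yx = r * sd_y / sd_x
var_x = sd_x ** 2

cov_from_r = r * sd_x * sd_y     # from the hint: cov / (sd_x * sd_y) = r
cov_from_slope = b_xy * var_y    # cross-check via b_xy = cov / var_y

print(f"r = {r}, var_x = {var_x}, cov = {cov_from_r}")
```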
Practice Questions

• Find the regression line of y on x for the following data:

x   1 3 4 6 8 9 11 14
y   1 2 4 4 5 7 8 9

Ans: y = 0.636 x + 0.548
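One way to check this answer is with the raw-sum form of the slope, b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²), which is algebraically equivalent to the deviation form used in the worked examples. In this sketch the intercept comes out near 0.545 when b is kept unrounded; the quoted 0.548 appears when b is first rounded to 0.636.

```python
x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Raw-sum form of the least squares slope and intercept of y on x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n

a_rounded_slope = sum_y / n - 0.636 * sum_x / n   # intercept if b is rounded first

print(f"y = {b:.3f}x + {a:.3f}")                  # y = 0.636x + 0.545
print(f"with b rounded to 0.636: a = {a_rounded_slope:.3f}")
```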
