Introduction To Regression
Correlation measures the direction and strength of the relationship between two
variables. Knowing the degree of association between them, we can predict the value of
one variable from a given value of the other. For example, demand and supply are
correlated, so we can find the expected demand for a given supply in the market.
Regression analysis is widely used for deriving an appropriate functional relationship
between variables. It helps us to estimate one variable, the dependent variable,
from the other, the independent variable. The prediction is based on the average
relationship derived statistically by regression analysis.
The literal meaning of regression is ‘moving backward’, ‘going back’ or ‘return to the
mean value’.
“Regression is a technique which estimates the value of the unknown from the known values.
Regression is also defined as predicting or estimating the dependent values with the
help of independent values.” In regression analysis there are two types of variables. The
variable whose value is influenced or is to be predicted is called the dependent variable, and
the variable which influences the value or is used for prediction is called the independent
variable. In regression analysis the independent variable is also known as the regressor,
predictor or explanatory variable; the dependent variable is also known as the explained
variable.
Uses of regression analysis
1. Regression analysis is used in almost every field where two or more related
variables have a tendency to go back to the average. It is very useful for prediction
in statistics, economics, the natural and physical sciences and many other applied
fields. It is widely adopted for predicting sales, production or demand in any business
entity that plans for a better profit.
2. Regression analysis predicts the unknown value of one variable from the known values
of the other variable.
3. We can calculate the coefficient of correlation with the help of the regression coefficients.
4. Regression analysis is used in the statistical estimation of demand curves, supply curves,
production functions, cost functions, consumption functions, etc.
Correlation and Regression
Correlation establishes the degree of association between two variables, while
regression establishes a functional relationship between the dependent and the
independent variable, so that the value of the unknown (dependent) variable can be
estimated from the known values of the independent variable. Correlation precedes
regression analysis.
FOR PRIVATE CIRCULATION ONLY
Module 5 QUANTITATIVE TECHNIQUES II
2. Correlation finds the degree of relationship between the variables; regression
indicates the cause-and-effect relationship between the variables.
3. Correlation is used for testing and verifying the relation between the variables;
regression, besides verification, is used for predicting the unknown from the known values.
Regression Equations
Regression equations are the algebraic expressions of the regression lines. Since there are
two regression lines, there are two regression equations. The regression equation of
X on Y describes the variation in the values of X for given changes in Y, and the
regression equation of Y on X describes the variation in the values of Y for given
changes in X.
Regression Equation of Y on X
Yc = a + bX
In this equation ‘a’ and ‘b’ are two unknown constants (fixed numerical values) which
determine the position of the line completely. The constant ‘a’ determines the level of the
fitted line, i.e., the value of Yc when X = 0 (the Y-intercept). The constant ‘b’ determines
the slope of the line, i.e., the change in Y for a unit change in X.
Once the values of the constants ‘a’ and ‘b’ are obtained, the line is completely
determined. These values are found by the method of least squares, which states that
the line should be drawn through the plotted points in such a manner that the sum of
the squares of the vertical deviations of the actual Y values from the estimated Y values
is the least. In other words, to obtain the line which fits the points best,
Σ(Y - Yc)² should be a minimum. Such a line is known as the line of best fit.
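As a rough illustration of the least-squares principle, the Python sketch below computes a and b in closed form and then checks that nudging either constant increases the sum of squared deviations. The function names `fit_line` and `sse` and the five data points are assumptions made for this illustration; they are not part of the module.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Yc = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations over sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the two means
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical deviations of Y from Yc = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)
print(f"Yc = {a:.2f} + {b:.2f}X, sum of squares = {best:.2f}")

# Shifting the level or the slope away from the fitted values makes the fit worse.
print(sse(a + 0.1, b, xs, ys) > best, sse(a, b + 0.1, xs, ys) > best)
```

The comparison in the last line is only a spot check on two alternative lines; the general claim rests on the least-squares derivation itself.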
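In the formulas below, dx = X - X̄ and dy = Y - Ȳ are the deviations of X and Y from their means.

Regression equation of X on Y:

$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) \quad\text{or}\quad X - \bar{X} = b_{xy}(Y - \bar{Y}) \quad\text{or}\quad X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$

$r\frac{\sigma_x}{\sigma_y}$ is the regression coefficient of x on y, or it can be represented as $b_{xy}$:

$$r\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy}{\sum dy^2} = b_{xy}$$

Regression equation of Y on X:

$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \quad\text{or}\quad Y - \bar{Y} = b_{yx}(X - \bar{X}) \quad\text{or}\quad Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})$$

$r\frac{\sigma_y}{\sigma_x}$ is the regression coefficient of y on x, or it can be represented as $b_{yx}$:

$$r\frac{\sigma_y}{\sigma_x} = \frac{\sum dx\,dy}{\sum dx^2} = b_{yx}$$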
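The two forms of each regression coefficient agree because of how r and the standard deviations are defined. A short derivation, assuming dx and dy are deviations from the actual means and N is the number of pairs of observations:

```latex
r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}, \qquad
\sigma_x^2 = \frac{\sum dx^2}{N}, \qquad
\sigma_y^2 = \frac{\sum dy^2}{N}

b_{xy} = r\,\frac{\sigma_x}{\sigma_y}
       = \frac{\sum dx\,dy}{N\,\sigma_y^2}
       = \frac{\sum dx\,dy}{\sum dy^2}

b_{yx} = r\,\frac{\sigma_y}{\sigma_x}
       = \frac{\sum dx\,dy}{N\,\sigma_x^2}
       = \frac{\sum dx\,dy}{\sum dx^2}
```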
Demand 12 13 15 13 12 20 20
Supply 45 40 43 37 40 39 43
Solution : The demand is taken as X series and the supply as Y
Computation of regression equations
X      Y      dx = X - 15   dx²   dy = Y - 41   dy²   dxdy
12     45     -3            9     4             16    -12
13     40     -2            4     -1            1     2
15     43     0             0     2             4     0
13     37     -2            4     -4            16    8
12     40     -3            9     -1            1     3
20     39     5             25    -2            4     -10
20     43     5             25    2             4     10
Total  105    287           ∑dx = 0   ∑dx² = 76   ∑dy = 0   ∑dy² = 46   ∑dxdy = 1
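$$\bar{X} = \frac{\sum X}{N} = \frac{105}{7} = 15, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{287}{7} = 41$$

Regression equation of X on Y:

$$X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$
$$X - 15 = \frac{1}{46}(Y - 41)$$
$$X = 0.022(Y - 41) + 15$$
$$X = 0.022Y + 14.098$$

Regression equation of Y on X:

$$Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})$$
$$Y - 41 = \frac{1}{76}(X - 15)$$
$$Y = 0.013(X - 15) + 41$$
$$Y = 0.013X + 40.805$$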
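The hand computation above can be cross-checked with a short Python sketch that takes deviations from the actual means and forms both regression coefficients. The function name `regression_lines` is an assumption for this illustration, not part of the module.

```python
def regression_lines(xs, ys):
    """Return (bxy, byx, mean_x, mean_y) using deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(p * q for p, q in zip(dx, dy))
    bxy = sum_dxdy / sum(q * q for q in dy)  # regression coefficient of X on Y
    byx = sum_dxdy / sum(p * p for p in dx)  # regression coefficient of Y on X
    return bxy, byx, mean_x, mean_y


demand = [12, 13, 15, 13, 12, 20, 20]  # X series
supply = [45, 40, 43, 37, 40, 39, 43]  # Y series
bxy, byx, mean_x, mean_y = regression_lines(demand, supply)

# Each intercept follows from the line passing through (mean_x, mean_y).
print(f"X = {bxy:.3f}Y + {mean_x - bxy * mean_y:.3f}")
print(f"Y = {byx:.3f}X + {mean_y - byx * mean_x:.3f}")
```

Because the script keeps full precision, its intercepts can differ in the third decimal place from values obtained by hand with coefficients rounded to three decimals.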
EXAMPLE 2:
The following data relate to the ages of husbands and wives. Obtain the two regression
equations and determine the most likely age of the husband when the wife's age is 25 years
and the most likely age of the wife when the husband's age is 30 years. Also determine the
coefficient of correlation.
Age of husbands   27 25 29 28 30 33 37 35 40 42
Age of wives      24 20 27 25 24 28 34 28 44 38
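Solution: The ages of husbands are taken as X and the ages of wives as Y.
X     Y     dx = X - 32.6   dx²     dy = Y - 29.2   dy²      dxdy
27    24    -5.6            31.36   -5.2            27.04    29.12
25    20    -7.6            57.76   -9.2            84.64    69.92
29    27    -3.6            12.96   -2.2            4.84     7.92
28    25    -4.6            21.16   -4.2            17.64    19.32
30    24    -2.6            6.76    -5.2            27.04    13.52
33    28    0.4             0.16    -1.2            1.44     -0.48
37    34    4.4             19.36   4.8             23.04    21.12
35    28    2.4             5.76    -1.2            1.44     -2.88
40    44    7.4             54.76   14.8            219.04   109.52
42    38    9.4             88.36   8.8             77.44    82.72
∑X = 326   ∑Y = 292   ∑dx = 0   ∑dx² = 298.4   ∑dy = 0   ∑dy² = 483.6   ∑dxdy = 349.8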
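$$\bar{X} = \frac{\sum X}{N} = \frac{326}{10} = 32.6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{292}{10} = 29.2$$

Regression equation of X on Y:

$$X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$
$$X - 32.6 = \frac{349.8}{483.6}(Y - 29.2)$$
$$X = 0.723(Y - 29.2) + 32.6$$
$$X = 0.723Y + 11.488$$

Regression equation of Y on X:

$$Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})$$
$$Y - 29.2 = \frac{349.8}{298.4}(X - 32.6)$$
$$Y = 1.172(X - 32.6) + 29.2$$
$$Y = 1.172X - 9.007$$

The regression coefficients are

$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{349.8}{483.6} = 0.723, \qquad b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{349.8}{298.4} = 1.172$$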
Thus the likely age of the husband when the wife's age is 25 years is
$$X = 0.723Y + 11.488 = 0.723(25) + 11.488 = 29.6 \text{ years}$$
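The likely age of the wife when the husband's age is 30 years is found in the same way, by
substituting X = 30 in the regression equation of Y on X, which gives about 26.15 years.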
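Both predictions, and the coefficient of correlation the example asks for, can be sketched in Python from the raw ages. The helper names `predict_husband` and `predict_wife` are assumptions for this illustration. The sketch takes r as the geometric mean of the two regression coefficients, carrying their common sign.

```python
import math

husbands = [27, 25, 29, 28, 30, 33, 37, 35, 40, 42]  # X
wives = [24, 20, 27, 25, 24, 28, 34, 28, 44, 38]     # Y

n = len(husbands)
mean_x, mean_y = sum(husbands) / n, sum(wives) / n
sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(husbands, wives))
sum_dx2 = sum((x - mean_x) ** 2 for x in husbands)
sum_dy2 = sum((y - mean_y) ** 2 for y in wives)

bxy = sum_dxdy / sum_dy2  # regression coefficient of X on Y
byx = sum_dxdy / sum_dx2  # regression coefficient of Y on X


def predict_husband(wife_age):
    """Estimate X from Y with the regression line of X on Y."""
    return mean_x + bxy * (wife_age - mean_y)


def predict_wife(husband_age):
    """Estimate Y from X with the regression line of Y on X."""
    return mean_y + byx * (husband_age - mean_x)


# r is the geometric mean of the two coefficients and takes their common sign.
r = math.copysign(math.sqrt(bxy * byx), sum_dxdy)

print(f"husband's age for wife aged 25: {predict_husband(25):.1f}")
print(f"wife's age for husband aged 30: {predict_wife(30):.1f}")
print(f"r = {r:.3f}")
```

Note that each prediction uses its own line: X on Y to estimate the husband's age, Y on X to estimate the wife's age.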
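$$\bar{X} = \frac{\sum X}{N} = \frac{114}{6} = 19, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{108}{6} = 18$$

Regression equation of X on Y:

$$X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$
$$X - 19 = \frac{97}{216}(Y - 18)$$
$$X = 0.449(Y - 18) + 19$$
$$X = 0.449Y + 10.917$$

Regression equation of Y on X:

$$Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})$$
$$Y - 18 = \frac{97}{46}(X - 19)$$
$$Y = 2.109(X - 19) + 18$$
$$Y = 2.109X - 22.065$$

The regression coefficients are

$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{97}{216} = 0.449, \qquad b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{97}{46} = 2.109$$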
Example 4: A company wants to assess the impact of exports on its annual profit. The
following table presents the information for the last eight years.
Year                       2009 2010 2011 2012 2013 2014 2015 2016
Exports (Rs. '000)         10   8    5    10   9    5    8    7
Annual profit (Rs. '000)   40   50   43   59   60   45   40   40
Estimate the regression equations and predict the annual profit for 2017 for an allocated
sum of Rs. 12,000 of exports.
Solution: Let the exports be taken as x and the annual profit as y. Deviations are taken
from the assumed means 8 (for x) and 43 (for y).
Year    x     y     dx = x - 8   dx²   dy = y - 43   dy²   dxdy
2009    10    40    2            4     -3            9     -6
2010    8     50    0            0     7             49    0
2011    5     43    -3           9     0             0     0
2012    10    59    2            4     16            256   32
2013    9     60    1            1     17            289   17
2014    5     45    -3           9     2             4     -6
2015    8     40    0            0     -3            9     0
2016    7     40    -1           1     -3            9     3
∑x = 62   ∑y = 377   ∑dx = -2   ∑dx² = 28   ∑dy = 33   ∑dy² = 625   ∑dxdy = 40
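$$\bar{X} = \frac{\sum X}{N} = \frac{62}{8} = 7.75, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{377}{8} = 47.125$$

Regression coefficient of x on y:

$$b_{xy} = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \frac{(\sum dy)^2}{N}} = \frac{40 - \frac{(-2)(33)}{8}}{625 - \frac{(33)^2}{8}} = \frac{40 + 8.25}{625 - 136.125} = \frac{48.25}{488.875} = 0.099$$

Regression coefficient of y on x:

$$b_{yx} = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \frac{(\sum dx)^2}{N}} = \frac{40 - \frac{(-2)(33)}{8}}{28 - \frac{(-2)^2}{8}} = \frac{40 + 8.25}{28 - 0.5} = \frac{48.25}{27.5} = 1.755$$

Regression equation of x on y:

$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
$$X - 7.75 = 0.099(Y - 47.125)$$
$$X = 0.099Y + 3.085$$

Regression equation of y on x:

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$Y - 47.125 = 1.755(X - 7.75)$$
$$Y = 1.755X + 33.524$$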
The annual profit for exports of Rs. 12 thousand is
Y = 1.755(12) + 33.524 = Rs. 54.584 thousand.
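The shortcut formulas with correction terms can be sketched in Python as follows. The function name `coefficients_assumed_mean` is an assumption for this illustration, and 8 and 43 are the assumed means implied by the tabulated deviations. The sketch also recomputes the coefficients with the actual means as the origin, to show that the choice of assumed means does not change the result.

```python
import math


def coefficients_assumed_mean(xs, ys, ax, ay):
    """Regression coefficients from deviations taken about assumed means ax, ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    # The correction terms remove the effect of using assumed, not actual, means.
    num = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    bxy = num / (sum(q * q for q in dy) - sum(dy) ** 2 / n)
    byx = num / (sum(p * p for p in dx) - sum(dx) ** 2 / n)
    return bxy, byx


exports = [10, 8, 5, 10, 9, 5, 8, 7]      # x, Rs. '000
profit = [40, 50, 43, 59, 60, 45, 40, 40]  # y, Rs. '000

bxy, byx = coefficients_assumed_mean(exports, profit, ax=8, ay=43)
mean_x = sum(exports) / len(exports)
mean_y = sum(profit) / len(profit)

# Predicted profit for exports of 12 (thousand), from the line of y on x.
predicted = mean_y + byx * (12 - mean_x)
print(f"bxy = {bxy:.3f}, byx = {byx:.3f}, predicted profit = {predicted:.2f}")

# Using the actual means as the origin gives the same coefficients.
alt_bxy, alt_byx = coefficients_assumed_mean(exports, profit, mean_x, mean_y)
print(math.isclose(bxy, alt_bxy), math.isclose(byx, alt_byx))
```

Full-precision arithmetic may differ slightly in the last decimals from a hand calculation that rounds the coefficient before substituting.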
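Regression equation of X on Y:

$$X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$
$$X - 25 = \frac{169}{136}(Y - 18)$$
$$X = 1.243Y - 1.243 \times 18 + 25$$
$$X = 1.243Y + 2.626$$

Regression equation of Y on X:

$$Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})$$
$$Y - 18 = \frac{169}{286}(X - 25)$$
$$Y = 0.591X + 3.225$$

The coefficient of correlation:

$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{1.243 \times 0.591} = 0.857$$

There is a high degree of correlation.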
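The step from the two regression coefficients to r follows from their definitions in terms of r and the standard deviations. A brief derivation, using the same symbols as the regression equations:

```latex
b_{xy}\, b_{yx}
  = \left(r\,\frac{\sigma_x}{\sigma_y}\right)\left(r\,\frac{\sigma_y}{\sigma_x}\right)
  = r^2
\quad\Longrightarrow\quad
r = \pm\sqrt{b_{xy}\, b_{yx}}
```

The sign of r is the common sign of the two regression coefficients; here both are positive, so r is positive.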
Example 6: From the following data on rainfall and production of rice, find the most likely
production corresponding to a rainfall of 35 inches.
        Rainfall (inches)   Production (tonnes)
Mean    25                  50
SD      6                   8
Coefficient of correlation = +0.85
Solution: Rainfall is taken as X and production as Y.

Regression equation of X on Y:

$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
$$X - 25 = 0.85 \times \frac{6}{8}(Y - 50)$$
$$X = 0.638Y - 31.875 + 25$$
$$X = 0.638Y - 6.875$$

Regression equation of Y on X:

$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$Y - 50 = 0.85 \times \frac{8}{6}(X - 25)$$
$$Y = 1.133X - 28.333 + 50$$
$$Y = 1.133X + 21.667$$

The most likely production of rice for a rainfall of 35 inches is
$$Y = 1.133(35) + 21.667 = 61.322 \text{ tonnes}$$
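When only the means, standard deviations and r are known, the line of Y on X can be built directly from them. The following Python sketch does this for the rainfall data; the function name `line_from_summary` is an assumption for this illustration.

```python
def line_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (slope, intercept) of the regression line of Y on X."""
    slope = r * sd_y / sd_x               # byx = r * (sigma_y / sigma_x)
    intercept = mean_y - slope * mean_x   # the line passes through the two means
    return slope, intercept


# Rainfall (X): mean 25, SD 6. Production (Y): mean 50, SD 8. r = +0.85.
slope, intercept = line_from_summary(25, 50, 6, 8, 0.85)
production = intercept + slope * 35

print(f"Y = {slope:.3f}X + {intercept:.3f}")
print(f"estimated production at 35 inches: {production:.2f} tonnes")
```

Swapping the roles of the two variables (passing the Y statistics first) would give the line of X on Y instead.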
Example 7: The coefficient of correlation between the ages of boys and girls in a
community was found to be +0.89. The average age of boys was 13 years and that of girls 10
years. Their standard deviations were 3 and 2 years respectively. Find with the help of
regression equations:
(a) The expected age of a boy when the girl’s age is 17.
(b) The expected age of a girl when the boy’s age is 18.
Solution:
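Let the boys' age be X and the girls' age be Y.

$$\bar{X} = 13, \quad \bar{Y} = 10, \quad \sigma_x = 3, \quad \sigma_y = 2, \quad r = 0.89$$

Regression equation of X on Y:

$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
$$X - 13 = 0.89 \times \frac{3}{2}(Y - 10)$$
$$X = 1.335Y - 13.35 + 13$$
$$X = 1.335Y - 0.35$$

Regression equation of Y on X:

$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$Y - 10 = 0.89 \times \frac{2}{3}(X - 13)$$
$$Y = 0.593X - 7.709 + 10$$
$$Y = 0.593X + 2.291$$

(a) The expected age of a boy when the girl’s age is 17 is
$$X = 1.335Y - 0.35 = 1.335(17) - 0.35 = 22.345 \text{ years}$$
(b) The expected age of a girl when the boy’s age is 18 is
$$Y = 0.593X + 2.291 = 0.593(18) + 2.291 = 12.965 \text{ years}$$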
Example 8: The following calculations have been made for the closing prices of eight stocks (Y)
on the National Stock Exchange on a certain day, along with the volume of sales in
thousands of shares (X). From these calculations find the regression equation of the volume
of shares on the stock prices (X on Y).
∑X = 56, ∑Y = 40, ∑XY = 364, ∑X² = 524, ∑Y² = 256
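Solution:

$$\bar{X} = \frac{\sum X}{N} = \frac{56}{8} = 7, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{8} = 5$$

The totals given are raw sums, not sums of deviations, so they are first converted to
deviations about the means:

$$\sum dx\,dy = \sum XY - N\bar{X}\bar{Y} = 364 - 8(7)(5) = 84, \qquad \sum dy^2 = \sum Y^2 - N\bar{Y}^2 = 256 - 8(5)^2 = 56$$

Regression equation of X on Y:

$$X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})$$
$$X - 7 = \frac{84}{56}(Y - 5) = 1.5(Y - 5)$$
$$X = 1.5Y - 0.5$$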
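Since Example 8 supplies raw totals (∑XY, ∑Y²) rather than sums of deviations, a sketch that first converts them to deviations about the means may be helpful. The function name `x_on_y_from_sums` is an assumption for this illustration, and the sketch treats the given figures as raw totals.

```python
def x_on_y_from_sums(n, sum_x, sum_y, sum_xy, sum_y2):
    """Regression line of X on Y from raw totals; returns (bxy, intercept)."""
    mean_x, mean_y = sum_x / n, sum_y / n
    # Convert the raw totals to sums of deviations about the means.
    sum_dxdy = sum_xy - n * mean_x * mean_y
    sum_dy2 = sum_y2 - n * mean_y ** 2
    bxy = sum_dxdy / sum_dy2
    return bxy, mean_x - bxy * mean_y


bxy, intercept = x_on_y_from_sums(n=8, sum_x=56, sum_y=40, sum_xy=364, sum_y2=256)
print(f"X = {bxy:.3f}Y {intercept:+.3f}")
```

Dividing ∑XY by ∑Y² directly, without the conversion, would only be valid if both totals had been computed from deviations about the means.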
Exercise
1. The average daily wage of the working class in Nagpur is Rs. 12 and in Delhi
Rs. 18. Their respective standard deviations are Rs. 2 and Rs. 3, and the coefficient of
correlation is 0.67. Find the most likely wage in Delhi corresponding to a wage of
Rs. 20 in Nagpur.
4. Price indices of cotton and wool are given below for 6 months of a year. Obtain the
equations of regression between the indices.
Price index of cotton (X)   78 77 85 88 87 82
Price index of wool (Y)     84 82 82 85 89 90
6. The following table gives the aptitude test scores and the productivity indices of 10
workers selected at random.
Aptitude score (X)       60 62 65 70 72 48 53 73 65 82
Productivity index (Y)   68 60 62 80 85 40 52 62 60 81
Calculate the two regression equations and estimate the productivity index of a worker
whose test score is 92.
7. To study the relationship between expenditure on accommodation X and
expenditure on food and entertainment Y , an enquiry into 50 families gave the
following results:
∑X = 8500, ∑Y = 9600, σx = 60, σy = 20 and r = 0.6
Estimate the expenditure on food and entertainment when expenditure on
accommodation is Rs.200.
8. The following are the data on business turnover and staff of a company for the eight
years from 2002 to 2009:
Year                             2002  2003  2004  2005  2006  2007  2008  2009
Business turnover (Rs crores)    45    50    60    75    80    110   150   170
Staff                            2,600 3,000 3,100 3,530 3,850 4,300 5,870 7,150
Fit a suitable regression equation to estimate manpower in terms of business turnover.
Estimate the staff requirement when the business turnover reaches Rs. 200 crores.
9. Calculate the two regression equations of X on Y and Y on X from the data given
below taking deviations from actual means of X and Y:
Price (Rs) 10 12 13 12 16 15
Amount demanded 40 38 43 45 37 43
Estimate the likely demand when the price is Rs.20.
10. An industrial engineer collected the following data on the experience and performance
rating of 8 operators:
Operator              1  2  3  4  5  6  7  8
Experience (years)    16 12 18 4  3  10 5  12
Performance rating    87 88 89 68 58 80 70 85
(a) Do the data give evidence that experience improves performance?
(b) Estimate the performance rating of an operator having (i) 9 years (ii) 15 years of
experience.
Answer
1. Y = 26.04 (for X = 20)
2. 28.6
3. Marks in Maths = 94
4. X = 0.478Y + 42.084, Y = 0.265X + 63.365
5. X = 2.19Y - 65.25, Y = 0.037X + 35.39, r = 0.901
6. X = 0.596Y + 26.26, Y = 1.168X - 10.92
7. Y = 158 + 0.2X, Y = 198
8. Y = 33.24X + 1100.3; 7748.3