Introduction To Regression

1) Regression analysis establishes a functional relationship between dependent and independent variables to estimate unknown variable values from known independent variable values. 2) Correlation measures the strength and direction of a relationship between variables, while regression establishes a predictive relationship to estimate dependent variables. 3) The document provides examples of using regression analysis to predict sales, production, demand and other business variables, as well as supply curves, cost functions, and consumption based on known variable values. Formulas for regression equations and calculating regression coefficients are also presented.

Uploaded by

Sara Gomez
Copyright
© All Rights Reserved
Module 5 QUANTITATIVE TECHNIQUES II

Introduction to Regression
Correlation measures the direction and the strength of the relationship between
variables. Knowing the degree of association between two variables, we can predict
the value of one variable from a given value of the other. For example, demand and
supply are correlated, so we can find the expected demand for a given supply to meet
market needs.
Regression analysis is widely used for deriving an appropriate functional relationship
between the variables. It helps us to estimate one variable or the dependent variable
from the other variable or independent variable. The prediction is based on average
relationship arrived at statistically by regression analysis.
The literal meaning of regression is ‘moving backward’, ‘going back’ or ‘return to the
mean value’.
“Regression is a technique which estimates the value of the unknown from the known
values. Regression is also defined as predicting or estimating the dependent values
with the help of independent values.” In regression analysis there are two types of
variables. The variable whose value is influenced or is to be predicted is called the
dependent variable, and the variable which influences the value or is used for
prediction is called the independent variable. In regression analysis the independent
variable is also known as the regressor, predictor or explanatory variable, while the
dependent variable is also known as the explained variable.
Uses of regression analysis

1. Regression analysis is used in almost every field where two or more related
variables have a tendency to go back to their averages. It is very useful for
prediction in statistics, economics, the natural and physical sciences and many
other applied fields. It is widely adopted for predicting sales, production or
demand in any business entity that plans for a better profit.
2. Regression analysis predicts the unknown value of one variable from the known
values of another variable.
3. We can calculate the coefficient of correlation with the help of the regression
coefficients.
4. Regression analysis is used in the statistical estimation of demand curves, supply
curves, production functions, cost functions, consumption functions, etc.
Correlation and Regression
The correlation establishes the relation or the degree of association between the two
variables while regression establishes a functional relationship between the dependent
and the independent variable so that the values of unknown variables can be estimated
from the known values of independent variables. Correlation precedes the regression
analysis.

FOR PRIVATE CIRCULATION ONLY

The differences between correlation and regression are:

1. Correlation: Correlation is the relationship between two or more variables, either
   in the same direction or in the opposite direction.
   Regression: Regression means going back, and it establishes the relationship
   between the known and unknown variables.

2. Correlation: It finds the degree of relationship between the variables.
   Regression: It indicates the cause and effect relationship between the variables.

3. Correlation: It is used for testing and verifying the relation between them.
   Regression: Besides verification, it is used for prediction of the unknown from
   the known values.

4. Correlation: The coefficient of correlation is a relative value and it ranges
   from +1 to -1.
   Regression: The regression coefficient is an absolute measure. We estimate the
   unknown value from known values.

5. Correlation: As it is confined to only linear relationships, the application is
   limited.
   Regression: It has a wider application, as it studies both linear and non-linear
   relationships.

6. Correlation: If the coefficient of correlation is positive, then the variables are
   positively related, and vice versa.
   Regression: The regression coefficient explains that the decrease in one variable
   is associated with the increase in the other.

Regression Equations

Regression equations are the algebraic expressions of the regression lines. Since there are
two regression lines, there are two regression equations. The regression equation of
X on Y is used to describe the variation in the values of X for given changes in Y, and the
regression equation of Y on X is used to describe the variation in the values of Y for given
changes in X.

Regression Equation of Y on X

The regression equation of Y on X is expressed as follows:

Yc = a + bX

where Yc is the estimated value of the dependent variable Y and X is the independent variable.

In this equation ‘a’ and ‘b’ are two unknown constants (fixed numerical values) which
determine the position of the line completely. The constant ‘a’ determines the level of the
fitted line, i.e., the value of Yc when X = 0 (the Y-intercept), and the constant ‘b’
determines the slope of the line, i.e., the change in Y for a unit change in X.


Once the values of the constants ‘a’ and ‘b’ are obtained, the line is completely
determined. These values are found by the method of least squares, which states that
the line should be drawn through the plotted points in such a manner that the sum of
the squares of the vertical deviations of the actual Y values from the estimated Y
values is the least. In other words, in order to obtain the line which fits the points
best, ∑(Y - Yc)² should be a minimum. Such a line is known as the line of best fit.
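The least-squares idea above can be sketched in a few lines of Python. This is a minimal illustration rather than the handout's own procedure: the function names `fit_line` and `sum_squared_deviations` and the five data points are assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Yc = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations divided by sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line always passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical deviations of Y from the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data only (not taken from the handout).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Any other line gives a larger sum of squared deviations.
shifted = sum_squared_deviations(xs, ys, a + 0.1, b)
tilted = sum_squared_deviations(xs, ys, a, b - 0.1)
print(a, b, best, shifted, tilted)
```

Nudging either constant away from the fitted values increases the sum of squared deviations, which is what "line of best fit" means here.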

Regression equation - Deviations taken from arithmetic mean

This method is much easier and simpler than the previous method, which is a tedious
one. We find the regression equations by taking deviations of X and Y from their
respective means, written dx and dy.

Deviation taken from arithmetic mean of X on Y.

1. Regression equation of X on Y

\[
X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})
\quad\text{or}\quad
X - \bar{X} = b_{xy}(Y - \bar{Y})
\quad\text{or}\quad
X - \bar{X} = \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y})
\]

$r\frac{\sigma_x}{\sigma_y}$ is the regression coefficient of X on Y; it can be represented as $b_{xy}$:

\[
b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy}{\sum dy^2}
\]

Deviation taken from arithmetic mean of Y on X.

2. Regression equation of Y on X

\[
Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})
\quad\text{or}\quad
Y - \bar{Y} = b_{yx}(X - \bar{X})
\quad\text{or}\quad
Y - \bar{Y} = \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X})
\]

$r\frac{\sigma_y}{\sigma_x}$ is the regression coefficient of Y on X; it can be represented as $b_{yx}$:

\[
b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum dx\,dy}{\sum dx^2}
\]

Properties of regression coefficient


The relation between the coefficient of correlation and the coefficients of regression is given by

\[
r = \pm\sqrt{b_{xy}\, b_{yx}}
\]

where r takes the same sign as the two regression coefficients.

(a) If bxy is positive, byx is also positive.
(b) If bxy is negative, byx is also negative.
(c) If one regression coefficient is greater than unity, then the other regression
coefficient must be less than unity, because their product equals r², which cannot
exceed 1.
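These properties can be checked numerically. The sketch below is only an illustration: the helper names and the small negatively related data set are assumptions, not part of the handout.

```python
import math


def regression_coefficients(xs, ys):
    """Return (bxy, byx) using deviations from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sum_dy2, sum_dxdy / sum_dx2


def correlation_from_coefficients(bxy, byx):
    """r = +/- sqrt(bxy * byx), carrying the common sign of the coefficients."""
    magnitude = math.sqrt(bxy * byx)
    return -magnitude if bxy < 0 else magnitude


# Illustrative data only: Y falls as X rises.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 3]

bxy, byx = regression_coefficients(xs, ys)
r = correlation_from_coefficients(bxy, byx)
print(bxy, byx, r)
```

Here both coefficients come out negative, one of them exceeds 1 in magnitude while the other does not, and their product stays below 1, in line with properties (a) to (c).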
Example 1: Calculate the two regression equations of X on Y and Y on X from the data
given below, taking deviations from the actual means of the X and Y variables.


Demand   12   13   15   13   12   20   20
Supply   45   40   43   37   40   39   43

Solution: The demand is taken as the X series and the supply as the Y series.

Computation of regression equations

  X      Y     dx = X - 15    dx²    dy = Y - 41    dy²    dxdy
 12     45         -3           9         4          16     -12
 13     40         -2           4        -1           1       2
 15     43          0           0         2           4       0
 13     37         -2           4        -4          16       8
 12     40         -3           9        -1           1       3
 20     39          5          25        -2           4     -10
 20     43          5          25         2           4      10

∑X = 105, ∑Y = 287, ∑dx = 0, ∑dx² = 76, ∑dy = 0, ∑dy² = 46, ∑dxdy = 1

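\[
\bar{X} = \frac{\sum X}{N} = \frac{105}{7} = 15, \qquad
\bar{Y} = \frac{\sum Y}{N} = \frac{287}{7} = 41
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y}) \\
X - 15 &= \frac{1}{46}(Y - 41) \\
X &= 0.022(Y - 41) + 15 \\
X &= 0.022Y + 14.098
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X}) \\
Y - 41 &= \frac{1}{76}(X - 15) \\
Y &= 0.013(X - 15) + 41 \\
Y &= 0.013X + 40.805
\end{aligned}
\]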
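The sums used in Example 1 can be reproduced with a short script. This is a sketch for checking the arithmetic; the variable names are assumptions chosen for the illustration.

```python
demand = [12, 13, 15, 13, 12, 20, 20]   # X series
supply = [45, 40, 43, 37, 40, 39, 43]   # Y series

n = len(demand)
mean_x = sum(demand) / n
mean_y = sum(supply) / n

# Deviations from the actual means.
dx = [x - mean_x for x in demand]
dy = [y - mean_y for y in supply]

sum_dxdy = sum(u * v for u, v in zip(dx, dy))
sum_dx2 = sum(u * u for u in dx)
sum_dy2 = sum(v * v for v in dy)

bxy = sum_dxdy / sum_dy2   # regression coefficient of X on Y
byx = sum_dxdy / sum_dx2   # regression coefficient of Y on X

# Intercepts of X = bxy*Y + c1 and Y = byx*X + c2, without intermediate rounding.
intercept_x_on_y = mean_x - bxy * mean_y
intercept_y_on_x = mean_y - byx * mean_x

print(sum_dxdy, sum_dx2, sum_dy2)
print(round(bxy, 3), round(intercept_x_on_y, 3))
print(round(byx, 3), round(intercept_y_on_x, 3))
```

Because the script does not round the coefficients before computing the intercepts, its intercepts can differ in the second decimal place from a hand calculation that first rounds the coefficients to three places.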

Example 2:
The following data relate to the ages of husbands and wives. Obtain the two regression
equations and determine the most likely age of the husband when the age of the wife is
25 years, and the most likely age of the wife when the age of the husband is 30 years.
Also determine the coefficient of correlation.

Age of husbands   27   25   29   28   30   33   37   35   40   42
Age of wives      24   20   27   25   24   28   34   28   44   38

Solution: The age of the husband is taken as X and the age of the wife is taken as Y.
  X      Y     dx = X - 32.6    dx²      dy = Y - 29.2    dy²       dxdy
 27     24        -5.6         31.36        -5.2          27.04     29.12
 25     20        -7.6         57.76        -9.2          84.64     69.92
 29     27        -3.6         12.96        -2.2           4.84      7.92
 28     25        -4.6         21.16        -4.2          17.64     19.32
 30     24        -2.6          6.76        -5.2          27.04     13.52
 33     28         0.4          0.16        -1.2           1.44     -0.48
 37     34         4.4         19.36         4.8          23.04     21.12
 35     28         2.4          5.76        -1.2           1.44     -2.88
 40     44         7.4         54.76        14.8         219.04    109.52
 42     38         9.4         88.36         8.8          77.44     82.72

∑X = 326, ∑Y = 292, ∑dx = 0, ∑dx² = 298.4, ∑dy = 0, ∑dy² = 483.6, ∑dxdy = 349.8

\[
\bar{X} = \frac{\sum X}{N} = \frac{326}{10} = 32.6, \qquad
\bar{Y} = \frac{\sum Y}{N} = \frac{292}{10} = 29.2
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y}) \\
X - 32.6 &= \frac{349.8}{483.6}(Y - 29.2) \\
X &= 0.723(Y - 29.2) + 32.6 \\
X &= 0.723Y + 11.488
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X}) \\
Y - 29.2 &= \frac{349.8}{298.4}(X - 32.6) \\
Y &= 1.172(X - 32.6) + 29.2 \\
Y &= 1.172X - 9.007
\end{aligned}
\]

The regression coefficients are

\[
b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{349.8}{483.6} = 0.723, \qquad
b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{349.8}{298.4} = 1.172
\]

The correlation coefficient is found through the regression coefficients:

\[
r = \sqrt{b_{xy}\, b_{yx}} = \sqrt{0.723 \times 1.172} = 0.921
\]

Thus the likely age of the husband when the age of the wife is 25 years is

\[
X = 0.723Y + 11.488 = 0.723(25) + 11.488 = 29.6 \text{ years}
\]

The likely age of the wife when the age of the husband is 30 years is

\[
Y = 1.172X - 9.007 = 1.172(30) - 9.007 = 26.15 \text{ years}
\]
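The two predictions in Example 2 can be checked with the sketch below, which works from the raw ages and keeps full precision throughout. The function and variable names are assumptions made for the illustration.

```python
import math

husbands = [27, 25, 29, 28, 30, 33, 37, 35, 40, 42]   # X
wives = [24, 20, 27, 25, 24, 28, 34, 28, 44, 38]      # Y

n = len(husbands)
mean_x = sum(husbands) / n
mean_y = sum(wives) / n

sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(husbands, wives))
sum_dx2 = sum((x - mean_x) ** 2 for x in husbands)
sum_dy2 = sum((y - mean_y) ** 2 for y in wives)

bxy = sum_dxdy / sum_dy2
byx = sum_dxdy / sum_dx2


def predict_x(y):
    """Estimate X (husband's age) from Y using the regression of X on Y."""
    return mean_x + bxy * (y - mean_y)


def predict_y(x):
    """Estimate Y (wife's age) from X using the regression of Y on X."""
    return mean_y + byx * (x - mean_x)


husband_for_wife_25 = predict_x(25)
wife_for_husband_30 = predict_y(30)

# Correlation coefficient computed directly from the sums of deviations.
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

print(round(husband_for_wife_25, 2), round(wife_for_husband_30, 2), round(r, 3))
```

Note that each prediction uses its own regression line: X on Y when the husband's age is the unknown, and Y on X when the wife's age is the unknown.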


Example 3: The following data relate to advertising expenditure (in lakhs of rupees) and
the corresponding sales (in crores of rupees).

Advertising expenditure   15   16   19   20   21   23
Sales                      9   12   17   23   21   26

Estimate:
1. The sales corresponding to an advertising expenditure of Rs. 35 lakhs.
2. The advertising expenditure for a sales target of Rs. 30 crores.

Solution: Let advertising expenditure be denoted as X and sales as Y.
Calculation of regression equations

  X      Y     dx = X - 19    dx²    dy = Y - 18    dy²    dxdy
 15      9         -4          16        -9          81      36
 16     12         -3           9        -6          36      18
 19     17          0           0        -1           1       0
 20     23          1           1         5          25       5
 21     21          2           4         3           9       6
 23     26          4          16         8          64      32

∑X = 114, ∑Y = 108, ∑dx = 0, ∑dx² = 46, ∑dy = 0, ∑dy² = 216, ∑dxdy = 97

\[
\bar{X} = \frac{\sum X}{N} = \frac{114}{6} = 19, \qquad
\bar{Y} = \frac{\sum Y}{N} = \frac{108}{6} = 18
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y}) \\
X - 19 &= \frac{97}{216}(Y - 18) \\
X &= 0.449(Y - 18) + 19 \\
X &= 0.449Y + 10.917
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X}) \\
Y - 18 &= \frac{97}{46}(X - 19) \\
Y &= 2.109(X - 19) + 18 \\
Y &= 2.109X - 22.065
\end{aligned}
\]

The regression coefficients are

\[
b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{97}{216} = 0.449, \qquad
b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{97}{46} = 2.109
\]

The correlation coefficient is found through the regression coefficients:

\[
r = \sqrt{b_{xy}\, b_{yx}} = \sqrt{0.449 \times 2.109} = 0.973
\]

1. The likely sales corresponding to an advertising expenditure of Rs. 35 lakhs is

\[
Y = 2.109(35) - 22.065 = 51.75 \text{ (Rs. crores)}
\]

2. The advertising expenditure corresponding to sales of Rs. 30 crores is

\[
X = 0.449Y + 10.917 = 0.449(30) + 10.917 = 24.387 \text{ (Rs. lakhs)}
\]
Regression equation - short cut method or deviations taken from assumed means

When the actual means of the X and Y series are fractions, the calculation of the
deviations becomes tedious, and hence the deviations are taken from assumed means.
The regression equations keep the same form, and the values of the regression
coefficients are calculated as follows:

Regression equation of X on Y:

\[
X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad
b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}
\]

Regression equation of Y on X:

\[
Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad
b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}
\]

where dx = X - A and dy = Y - B, and A and B are the assumed means of the X and Y
series respectively.

Example 4: A company wants to assess the impact of exports on its annual profit. The
following table presents the information for the last eight years.

Year                      2009  2010  2011  2012  2013  2014  2015  2016
Exports (Rs.’000)           10     8     5    10     9     5     8     7
Annual profit (Rs.’000)     40    50    43    59    60    45    40    40

Estimate the regression equation and predict the annual profit for 2017 for an allocated
sum of Rs.12,000 of exports.

Solution: Let the exports be taken as X and the annual profits as Y. Deviations are
taken from the assumed means 8 (for X) and 43 (for Y).

Year     X     Y    dx = X - 8    dx²    dy = Y - 43    dy²    dxdy
2009    10    40        2           4        -3            9     -6
2010     8    50        0           0         7           49      0
2011     5    43       -3           9         0            0      0
2012    10    59        2           4        16          256     32
2013     9    60        1           1        17          289     17
2014     5    45       -3           9         2            4     -6
2015     8    40        0           0        -3            9      0
2016     7    40       -1           1        -3            9      3

∑X = 62, ∑Y = 377, ∑dx = -2, ∑dx² = 28, ∑dy = 33, ∑dy² = 625, ∑dxdy = 40

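\[
\bar{X} = \frac{\sum X}{N} = \frac{62}{8} = 7.75, \qquad
\bar{Y} = \frac{\sum Y}{N} = \frac{377}{8} = 47.125
\]

Regression coefficient of X on Y:

\[
b_{xy} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}
= \frac{40 - \dfrac{(-2)(33)}{8}}{625 - \dfrac{(33)^2}{8}}
= \frac{40 + 8.25}{625 - 136.125}
= \frac{48.25}{488.875} = 0.099
\]

Regression coefficient of Y on X:

\[
b_{yx} = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sum dx^2 - \dfrac{(\sum dx)^2}{N}}
= \frac{40 - \dfrac{(-2)(33)}{8}}{28 - \dfrac{(-2)^2}{8}}
= \frac{40 + 8.25}{28 - 0.5}
= \frac{48.25}{27.5} = 1.755
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= b_{xy}(Y - \bar{Y}) \\
X - 7.75 &= 0.099(Y - 47.125) \\
X &= 0.099Y + 3.085
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= b_{yx}(X - \bar{X}) \\
Y - 47.125 &= 1.755(X - 7.75) \\
Y &= 1.755X + 33.524
\end{aligned}
\]

The annual profit for exports of Rs.12,000 (X = 12) is
Y = 1.755(12) + 33.524 = 54.584 (in thousands of rupees).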
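The short-cut computation in Example 4 can be sketched as follows. The function name `shortcut_coefficients` is an assumption for the illustration; the assumed means 8 and 43 are the values consistent with the tabulated deviations. The second call shows that a different choice of assumed means (here 0 and 0) leads to the same coefficients.

```python
def shortcut_coefficients(xs, ys, assumed_x, assumed_y):
    """Return (bxy, byx) using deviations from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    # The correction terms compensate for using assumed rather than actual means.
    numerator = sum_dxdy - sum_dx * sum_dy / n
    bxy = numerator / (sum_dy2 - sum_dy ** 2 / n)
    byx = numerator / (sum_dx2 - sum_dx ** 2 / n)
    return bxy, byx


exports = [10, 8, 5, 10, 9, 5, 8, 7]        # X, Rs.'000
profit = [40, 50, 43, 59, 60, 45, 40, 40]   # Y, Rs.'000

bxy, byx = shortcut_coefficients(exports, profit, 8, 43)
bxy_alt, byx_alt = shortcut_coefficients(exports, profit, 0, 0)

mean_x = sum(exports) / len(exports)
mean_y = sum(profit) / len(profit)

# Predicted annual profit (Rs.'000) when exports are 12 (Rs.'000).
predicted_profit = mean_y + byx * (12 - mean_x)

print(round(bxy, 3), round(byx, 3), round(predicted_profit, 2))
```

The assumed means only affect how convenient the hand arithmetic is; the regression equations themselves are always written around the actual means.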


Example 5: Calculate the coefficient of correlation and the regression equations between
the X and Y series from the following data:

                                       X series    Y series
Number of pairs of observations           15
Arithmetic mean                            25          18
Sum of squares of deviations from
arithmetic mean                           286         136

Summation of products of deviations of the X and Y series from their respective
arithmetic means = 169

Solution:
Let the given data be written in notation:

\[
N = 15, \quad \bar{X} = 25, \quad \bar{Y} = 18, \quad \sum dx^2 = 286, \quad \sum dy^2 = 136, \quad \sum dx\,dy = 169
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y}) \\
X - 25 &= \frac{169}{136}(Y - 18) \\
X &= 1.243Y - 1.243 \times 18 + 25 \\
X &= 1.243Y + 2.626
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= \frac{\sum dx\,dy}{\sum dx^2}(X - \bar{X}) \\
Y - 18 &= \frac{169}{286}(X - 25) \\
Y &= 0.591X + 3.225
\end{aligned}
\]

The coefficient of correlation is

\[
r = \sqrt{b_{xy}\, b_{yx}} = \sqrt{1.243 \times 0.591} = 0.857
\]

There is a high degree of correlation.

Example 6: From the following data on rainfall and production of rice, find the most
likely production corresponding to a rainfall of 35 inches.

         Rainfall (inches)    Production (tonnes)
Mean            25                    50
SD               6                     8

Coefficient of correlation = +0.85

Solution: Rainfall is taken as X and production as Y.

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) \\
X - 25 &= 0.85 \times \frac{6}{8}(Y - 50) \\
X &= 0.638Y - 31.875 + 25 \\
X &= 0.638Y - 6.875
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \\
Y - 50 &= 0.85 \times \frac{8}{6}(X - 25) \\
Y &= 1.133X - 28.333 + 50 \\
Y &= 1.133X + 21.667
\end{aligned}
\]

The most likely production of rice for a rainfall of 35 inches is

\[
Y = 1.133(35) + 21.667 = 61.322 \text{ tonnes}
\]
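When only the means, standard deviations and the coefficient of correlation are given, as in Example 6, the regression coefficients follow directly from bxy = r·σx/σy and byx = r·σy/σx. The sketch below illustrates this; the function name `regression_from_summary` is an assumption for the example, and no intermediate rounding is applied.

```python
def regression_from_summary(sd_x, sd_y, r):
    """Return (bxy, byx) from the standard deviations and correlation."""
    bxy = r * sd_x / sd_y
    byx = r * sd_y / sd_x
    return bxy, byx


mean_rain, mean_prod = 25, 50   # inches, tonnes
sd_rain, sd_prod = 6, 8
r = 0.85

bxy, byx = regression_from_summary(sd_rain, sd_prod, r)

# Production is the unknown, so the regression of Y on X is the one to use.
production_at_35 = mean_prod + byx * (35 - mean_rain)

print(round(bxy, 4), round(byx, 4), round(production_at_35, 2))
```

The product of the two coefficients equals r², which gives a quick consistency check on any hand calculation of this kind.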

Example 7: The coefficient of correlation between the ages of boys and girls in a
community was found to be +0.89. The average age of boys was 13 years and that of girls
10 years. Their standard deviations were 3 and 2 years respectively. Find with the help
of regression equations:
(a) The expected age of a boy when the girl’s age is 17.
(b) The expected age of a girl when the boy’s age is 18.

Solution:
Let the boys’ age be X and the girls’ age be Y.

\[
\bar{X} = 13, \quad \bar{Y} = 10, \quad \sigma_x = 3, \quad \sigma_y = 2, \quad r = 0.89
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) \\
X - 13 &= 0.89 \times \frac{3}{2}(Y - 10) \\
X &= 1.335Y - 13.35 + 13 \\
X &= 1.335Y - 0.35
\end{aligned}
\]

Regression equation of Y on X:

\[
\begin{aligned}
Y - \bar{Y} &= r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) \\
Y - 10 &= 0.89 \times \frac{2}{3}(X - 13) \\
Y &= 0.593X - 7.709 + 10 \\
Y &= 0.593X + 2.291
\end{aligned}
\]

(a) The expected age of a boy when the girl’s age is 17 is
X = 1.335Y - 0.35 = 1.335(17) - 0.35 = 22.345 years

(b) The expected age of a girl when the boy’s age is 18 is
Y = 0.593X + 2.291 = 0.593(18) + 2.291 = 12.965 years
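Example 8: The following calculations have been made for the closing prices of eight
stocks (Y) on the National Stock Exchange on a certain day, along with the volume of
sales in thousands of shares (X). From these calculations find the regression equation
of volume of shares on stock prices.
∑X = 56, ∑Y = 40, ∑XY = 364, ∑X² = 524, ∑Y² = 256

Solution:

\[
\bar{X} = \frac{\sum X}{N} = \frac{56}{8} = 7, \qquad
\bar{Y} = \frac{\sum Y}{N} = \frac{40}{8} = 5
\]

The given totals are sums of the original values, not of deviations from the means, so
they are first converted:

\[
\sum dx\,dy = \sum XY - N\bar{X}\bar{Y} = 364 - 8(7)(5) = 84, \qquad
\sum dy^2 = \sum Y^2 - N\bar{Y}^2 = 256 - 8(5)^2 = 56
\]

Regression equation of X on Y:

\[
\begin{aligned}
X - \bar{X} &= \frac{\sum dx\,dy}{\sum dy^2}(Y - \bar{Y}) \\
X - 7 &= \frac{84}{56}(Y - 5) \\
X &= 1.5Y - 1.5 \times 5 + 7 \\
X &= 1.5Y - 0.5
\end{aligned}
\]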
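When a problem supplies totals of the original values (∑X, ∑Y, ∑XY, ∑Y²) rather than deviations from the means, the deviation sums can be recovered with the identities ∑dxdy = ∑XY - N·X̄·Ȳ and ∑dy² = ∑Y² - N·Ȳ². The sketch below applies them to the totals of Example 8; the variable names are assumptions for the illustration.

```python
n = 8
sum_x, sum_y = 56, 40
sum_xy, sum_y2 = 364, 256

mean_x = sum_x / n
mean_y = sum_y / n

# Convert raw totals into sums of deviations from the actual means.
sum_dxdy = sum_xy - n * mean_x * mean_y
sum_dy2 = sum_y2 - n * mean_y ** 2

# Regression of X (volume) on Y (closing price): X = bxy*Y + intercept.
bxy = sum_dxdy / sum_dy2
intercept = mean_x - bxy * mean_y

print(sum_dxdy, sum_dy2, bxy, intercept)
```

Dividing the raw totals ∑XY by ∑Y² directly would not give the regression coefficient, because neither total is measured from the means.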

Exercise

1. The average daily wages for working class in Nagpur is Rs.12 and for that in Delhi
Rs.18, their respective standard deviations are Rs.2 and Rs.3 and the coefficient of
correlation is 0.67. Find the most likely wage in Delhi corresponding to the wage of
Rs.20 in Nagpur.

2. Given the following values, find the expected value of X when Y is 12


Average of X series = 25 Average of Y series = 22

S.D of X series = 4 S.D of Y series = 5

3. The coefficient of correlation between marks obtained in Mathematics and marks
obtained in English is -0.4; the average marks are respectively 80 and 50. The standard
deviations of marks in Mathematics and English are 15 and 10 respectively. Estimate the
marks in Mathematics of a student who has secured 64 marks in English.

4. Prices indices of cotton and wool are given below for 6 months of a year. Obtain the
equations of regression between the indices.
Prices index of cotton (X) 78 77 85 88 87 82
Prices index of wool (Y) 84 82 82 85 89 90

5. The following table gives the relative values of two variables :


X 42 44 58 55 89 98 66
Y 56 49 53 58 65 76 51
Determine the regression equations which may be associated with these values and
calculate Karl Pearson’s coefficient of correlation

6. The following table gives the aptitude test scores and the productivity indices of 10
workers selected at random.
Aptitude scores (X)      60 62 65 70 72 48 53 73 65 82
Productivity index (Y)   68 60 62 80 85 40 52 62 60 81
Calculate the two regression equations and estimate the productivity index of a worker
whose test score is 92.
7. To study the relationship between expenditure on accommodation X and
expenditure on food and entertainment Y , an enquiry into 50 families gave the
following results:
∑X=8500, ∑y =9600, σx = 60 , σy = 20 and r = 0.6
Estimate the expenditure on food and entertainment when expenditure on
accommodation is Rs.200.

8. Following are the data on business turnover and the staff of a company for eight
years from 2002 to 2009:
Year                            2002   2003   2004   2005   2006   2007   2008   2009
Business turnover (Rs crores)     45     50     60     75     80    110    150    170
Staff                          2,600  3,000  3,100  3,530  3,850  4,300  5,870  7,150
Fit a proper regression equation to estimate manpower in terms of business turnover.
Estimate the staff requirement when the business turnover reaches Rs.200 crores.

9. Calculate the two regression equations of X on Y and Y on X from the data given
below taking deviations from actual means of X and Y:
Price (Rs) 10 12 13 12 16 15
Amount demanded 40 38 43 45 37 43
Estimate the likely demand when the price is Rs.20.

10. An industrial engineer collected the following data on experience and performance
rating of 8 operators:
Operator               1   2   3   4   5   6   7   8
Experience (years)    16  12  18   4   3  10   5  12
Performance rating    87  88  89  68  58  80  70  85
(a) Does the data give evidence that experience improves performance?
(b) Estimate the performance rating of an operator having (i) 9 years and (ii) 15 years of
experience.

Answer
1. Y = 26.04 (when X = 20)  2. 28.6  3. Marks in Maths = 94
4. X = 4.78Y + 42.084, Y = 0.265X + 63.365
5. X = 2.19Y - 65.25, Y = 0.037X + 35.39, r = 0.901
6. X = -0.596Y + 26.26, Y = 1.168X - 10.92
7. Y = 158 + 0.2X, Y = 198
8. Y = 33.24X + 1100.3; 7748.3
9. X = 17.92 - 0.12Y, Y = 44.25 - 0.25X; when X is 20, Y = 39.25
10. Y = 69.67 + 1.133X
