Regression and
Multiple Regression Analysis
Regression
- a technique used for the modeling and analysis of
numerical data consisting of values of a dependent
variable (response variable) and of one or more
independent variables (explanatory variables).
It can be used for prediction (including forecasting of
time-series data), inference, hypothesis testing, and
modeling of causal relationships.
Regression concepts were first published in the early 1800s,
by Legendre in 1805 and by Gauss in 1809.
Applications
Applications of regression are numerous and occur in
almost every field, including:
- engineering,
- physical sciences,
- economics,
- management,
- life and biological sciences, and
- social sciences.
In fact, regression analysis may be the most widely used
statistical technique.
Types of Regression Models
Regression models are classified by the number of independent variables:
- One independent variable: simple regression (linear or non-linear).
- Two or more independent variables: multiple regression (linear or non-linear).
Simple linear regression model:
A regression model that involves only one independent
variable.
The form can be expressed as
Yi = β0 + β1Xi + ei,   i = 1, 2, 3, ..., n
Here, Yi = the response (dependent) variable,
Xi = the independent variable,
ei = error or disturbance term.
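As a minimal illustration of fitting this model, the two parameters can be estimated in closed form; the sketch below uses made-up numbers, not data from this document:

import numpy as np

# Hypothetical data for one independent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates of β1 (slope) and β0 (intercept)
Sxx = np.sum((x - x.mean()) ** 2)              # corrected sum of squares of x
Sxy = np.sum((x - x.mean()) * (y - y.mean()))  # corrected cross-product
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
print(b0, b1)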
Multiple linear regression model:
A regression model that involves more than one
regressor (independent) variable.
The general form can be expressed as
Yi = β0 + β1Xi1 + β2Xi2 + … + βkXik + ei,   i = 1, 2, 3, ..., n
Here, Yi = the response (dependent) variable,
Xi1, ..., Xik = the independent (regressor) variables,
ei = error or disturbance term.
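For reference, the same model can be written compactly in matrix notation (a standard equivalence, added here for clarity):
Y = Xβ + e,
where Y is the n×1 vector of responses, X is the n×(k+1) design matrix whose first column is all ones, β = (β0, β1, ..., βk)′ is the parameter vector, and e is the n×1 vector of errors.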
Objectives
1. The general purpose of regression (multiple
regression) is to learn about the relationship between
several independent or predictor variables and a
dependent variable.
2. The specific objectives of regression are:
• To estimate the unknown parameters in the
regression model (fitting the model to the data).
• To predict or forecast the response variable; these
predictions are helpful in planning projects.
Underlying Principles
The Gaussian, standard, or classical linear
regression model (CLRM), which is the
foundation/cornerstone of most econometric theory,
rests on several assumptions:
Assumption 1: The regression model is linear in the
parameters
Assumption 2: X values are fixed in repeated sampling
Assumption 3: Zero mean value of the disturbance (error) term
Assumption 4: Homoscedasticity, i.e., equal variance of the disturbances:
Var(ei | Xi) = σ² (a constant)
Assumption 5: No autocorrelation between the
disturbances (error).
Assumption 6: Zero covariance between ei and Xi , or
Cov (ei, Xi) = 0
Assumption 7: There are no perfect linear
relationships among the
independent variables.
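As a rough illustration of how Assumptions 3-5 might be probed in practice, the sketch below inspects a vector of residuals from a fitted model; the numbers are hypothetical, and the Durbin-Watson statistic used for Assumption 5 is a standard diagnostic not named in this document:

import numpy as np

# Hypothetical residuals from a fitted regression model
e = np.array([0.3, -0.5, 0.1, 0.4, -0.2, -0.1])

print(e.mean())        # Assumption 3: sample mean of residuals should be near zero
print(e.var(ddof=1))   # Assumption 4: compare this variance across ranges of X

# Assumption 5: Durbin-Watson statistic; values near 2 suggest no autocorrelation
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(dw)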
Methods of Estimation
Here we just name some well-known methods for
estimating the regression model:
1. The method of moments
2. The method of least squares
3. The method of maximum likelihood
The ordinary least squares (OLS) method of
estimation is the most popular one; it is widely used
because of its flexibility.
The main aim of the least squares method is to estimate the
parameters of the linear regression model by minimizing
the error sum of squares.
The Ordinary Least Squares (OLS)
Consider a multiple linear model of the form
Y = β0 + β1X1 + β2X2 + … + βkXk + e
We may write the sample regression model as follows:
yi = β0 + β1xi1 + β2xi2 + … + βkxik + ei,   i = 1, 2, ..., n
The least-squares function is
S = ∑(i=1 to n) ei² = ∑(i=1 to n) [ yi − β0 − ∑(j=1 to k) βj xij ]²
The least-squares estimators of β0, β1, ..., βk are obtained by
minimizing this function S with respect to β0, β1, ..., βk.
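A minimal sketch of this minimization in Python, using the closed-form solution of the normal equations (X′X)β̂ = X′y; the data values are hypothetical:

import numpy as np

# Hypothetical data: n = 6 observations, k = 2 regressors
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([3.1, 3.9, 7.0, 7.8, 11.2, 11.9])

# Design matrix with a leading column of ones for the intercept β0
X = np.column_stack([np.ones_like(x1), x1, x2])

# Minimizing S leads to the normal equations (X'X) β = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # estimates of β0, β1, β2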
Techniques for determining the model's accuracy:
i) Standard error of the coefficients
ii) t-test of the coefficients
iii) Coefficient of determination, R²
iv) Residual standard deviation
v) ANOVA for overall measures
(i) The standard error of a coefficient is represented by
se(β̂i) = √( MSres / Sxx )
MSres: residual mean square
Sxx: corrected sum of squares of the independent variable
(ii) t-test of the coefficients
• Suppose that we wish to test the hypothesis
that the slope equals a constant, say βi0.
The appropriate hypotheses are:
H0: βi = βi0
H1: βi ≠ βi0,
where we have specified a two-sided alternative.
The t statistic is defined as follows:
t0 = (β̂i − βi0) / √( MSres / Sxx ) = (β̂i − βi0) / se(β̂i)
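In multiple regression the same test uses se(β̂i) = √( MSres · Cii ), where Cii is the i-th diagonal element of (X′X)⁻¹; a minimal sketch of this standard construction (not taken verbatim from the document) is:

import numpy as np

def t_stats(X, y):
    """OLS estimates, standard errors, and t statistics for H0: beta_i = 0."""
    n, p = X.shape                           # p = k + 1 parameters incl. intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                 # least-squares estimates
    resid = y - X @ beta
    ms_res = resid @ resid / (n - p)         # residual mean square MSres
    se = np.sqrt(ms_res * np.diag(XtX_inv))  # se(beta_i) = sqrt(MSres * Cii)
    return beta, se, beta / se               # t0 = beta_i / se(beta_i)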
iii) Coefficient of determination:
R² is a PRE (proportional-reduction-in-error) measure of association:
R² = 1 − SSres / SStot,
where SSres is the residual sum of squares and SStot is the total sum of squares.
iv) Residual standard deviation:
the standard deviation of the residuals (residuals = differences
between observed and predicted values). For a model with k
regressors it is calculated as follows:
s = √( ∑(yi − ŷi)² / (n − k − 1) )
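A minimal sketch computing both the residual standard deviation and R² from observed and fitted values (the function and argument names are illustrative):

import numpy as np

def fit_summary(y, y_hat, k):
    """Residual standard deviation and R² for a model with k regressors."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)         # sum of squared residuals
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total variation in y
    resid_sd = np.sqrt(ss_res / (n - k - 1))  # residual standard deviation
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    return resid_sd, r2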
(v) ANOVA for overall measures
The analysis of variance table divides the total variation in
the dependent variable into two components:
1st component: variation that can be attributed to the regression
model (labeled Regression);
2nd component: variation that cannot (labeled Residual).
*If the significance level for the F-test is small (less than
0.05), then the hypothesis that there is no (linear)
relationship can be rejected, and the multiple correlation
coefficient can be called statistically significant. The F
statistic can be written as
F0 = MSr / MSres
MSr: regression mean square
MSres: residual mean square
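A minimal sketch of the ANOVA decomposition and the overall F statistic, following the standard formulas above (names are illustrative):

import numpy as np

def anova_f(y, y_hat, k):
    """Overall F statistic: regression mean square over residual mean square."""
    n = len(y)
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
    ss_reg = ss_tot - ss_res                # attributed to the regression model
    ms_reg = ss_reg / k                     # regression mean square, df = k
    ms_res = ss_res / (n - k - 1)           # residual mean square, df = n - k - 1
    return ms_reg / ms_res                  # compare with F(k, n - k - 1)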
Literature on Applications of the OLS Method:
Here we have considered a multiple linear regression model in seven
variables (one response and six regressors).
The model can be written in the linear form
Y = β0 + β1X1 + β2X2 + … + β6X6 + e
Y = Overall rating of job being done by supervisor
X1 = Handles employee complaints
X2 = Does not allow special privileges
X3 = Opportunity to learn new things
X4 = Raises based on performance
X5 = Too critical of poor performance
X6 = Rate of advancing to better jobs
e = Error term
β0, β1, β2,….,β6 are the unknown parameters.
Our ultimate goal is to estimate the unknown parameters from the model.
Data Source: http://www.ilr.cornell.edu/hadi/rabe4
For estimating the model we used SPSS version 11.5. The outputs
obtained from SPSS 11.5 are given below:
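As an aside, an equivalent fit could be reproduced today in Python with statsmodels; this sketch assumes the data from the URL above have been saved locally as a CSV with columns Y, X1, ..., X6 (the filename is hypothetical):

import pandas as pd
import statsmodels.api as sm

# Hypothetical local copy of the supervisor-rating data set
data = pd.read_csv("supervisor.csv")
X = sm.add_constant(data[["X1", "X2", "X3", "X4", "X5", "X6"]])
model = sm.OLS(data["Y"], X).fit()
print(model.summary())   # coefficient table, t tests, R², and the ANOVA F statistic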
Summary of coefficients

Model        Coefficient   Std. Error   t        Sig.
(Constant)   10.787        11.589       .931     .362
X1           .613          .161         3.809    .001
X2           -.073         .136         -.538    .596
X3           .320          .169         1.901    .040
X4           .082          .221         .369     .715
X5           .038          .147         .261     .796
X6           -.217         .178         -1.218   .236
From the summary of coefficients table we see that the variables
X1 and X3 are significant compared with the other variables.
The R² value = 0.73 and the standard error of the estimate = 7.06.
This high value of R² implies that the fitted model is
appropriate for this data set.
ANOVA

Model        Sum of Squares   df   Mean Square   F        Sig.
Regression   3147.966         6    524.661       10.502   .000
Residual     1149.000         23   49.957
Total        4296.967         29
From the ANOVA table we can also conclude that the overall fit
of the model is appropriate (F = 10.502, significant at α = 0.01).
Conclusion
1. Regression lets us learn about the relationship between several
independent variables and a dependent variable.
2. Regression can estimate the unknown parameters of a
regression model.
3. It can also be used to forecast the response variable, and
these predictions are helpful in planning projects.