RESEARCH METHODS LESSON 18 - Multiple Regression
MULTIPLE REGRESSION
Reality in the public sector is complex. A problem often has several
possible causes, and a solution may likewise depend on several factors.
Statistical techniques are therefore needed that can handle several variables
at once.
Ordinary least squares linear regression is the most widely used type of
regression for predicting the value of one dependent variable from the value of
one independent variable. It is also widely used for predicting the value of one
dependent variable from the values of two or more independent variables.
When there are two or more independent variables, it is called multiple
regression.
The steps in multiple regression are basically the same as in simple regression.
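As a sketch of how such a fit is computed, an ordinary least squares regression with two independent variables can be solved with NumPy's least-squares routine. The data here are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: 6 observations, two independent variables (X1, X2)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.0, 6.0, 10.0, 11.0, 15.0, 16.0])

# Add a column of ones so the fit includes an intercept (a)
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: solves for [a, b1, b2]
coefs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
a, b1, b2 = coefs
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
# → a = 1.75, b1 = 1.75, b2 = 0.75
```

Statistical packages report the same coefficients along with the standard errors and significance tests discussed below.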
R2: the proportion of the variance in the values of the dependent variable (Y)
explained by all the independent variables (Xs) in the equation together.
This is sometimes reported as adjusted R2, a correction that reflects the
number of variables in the equation.
In the example below, the dependent variable (Y) is a state's annual traffic
fatalities, and the independent variables are population (X1), days of snow
(X2), and average highway speed in mph (X3), giving the equation
Y = a + b1X1 + b2X2 + b3X3.
a=1.4
This is the number of traffic fatalities that would be expected if all three
independent variables were equal to zero (no population, no days snowed, and
zero average speed).
b1=.00029
If X2 and X3 remain the same, this indicates that for each extra person in the
population, the number of yearly traffic fatalities increases by .00029.
b2=2.4
If X1 and X3 remain the same, this indicates that for each extra day of snow, Y
increases by 2.4 additional traffic fatalities.
b3= 10.3
If X1 and X2 remain the same, this indicates that for each mph increase in
average speed, Y increases by 10.3 traffic fatalities.
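Once the coefficients are known, the equation can predict Y for any combination of X values. A minimal sketch using the lesson's coefficients; the population, snow-day, and speed figures for the state are illustrative assumptions, not data from the lesson:

```python
# The lesson's equation: Y = a + b1*X1 + b2*X2 + b3*X3
a, b1, b2, b3 = 1.4, 0.00029, 2.4, 10.3

def predicted_fatalities(population, days_of_snow, avg_speed_mph):
    """Plug values of the three independent variables into the equation."""
    return a + b1 * population + b2 * days_of_snow + b3 * avg_speed_mph

# Illustrative state: 1,000,000 people, 20 snow days, average speed 60 mph
y_hat = predicted_fatalities(1_000_000, 20, 60)
print(round(y_hat, 1))  # → 957.4
```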
s.e.b1=.00003
s.e.b2=.62
s.e.b3=1.1
Dividing b3 by s.e.b3 gives a t-score of 9.36; p<.01. The t-score indicates
that the b3 coefficient is significantly different from zero, so the variable
should remain in the equation.
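The same t-score calculation applies to all three coefficients. A short sketch using the standard errors reported above:

```python
# Coefficients and standard errors from the example above
coefficients    = {"b1": 0.00029, "b2": 2.4, "b3": 10.3}
standard_errors = {"b1": 0.00003, "b2": 0.62, "b3": 1.1}

# t-score = coefficient divided by its standard error
t_scores = {name: coefficients[name] / standard_errors[name]
            for name in coefficients}
for name, t in t_scores.items():
    print(f"{name}: t = {t:.2f}")  # b3 gives t = 9.36, as in the text
```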
R2 = .78
We can explain 78% of the difference in annual fatality rates among states if we
know the states' populations, days of snow, and average highway speeds.
F is statistically significant.
Conclusion: Reject the null hypothesis and accept the research hypothesis.
Make recommendations for management implications and further research.
Just as with simple regression, multiple regression will not explain the
relationship of the independent variables to the dependent variable well if
those relationships are not linear.
Ordinary least squares linear multiple regression assumes that the
independent (X) variables are measured at the interval or ratio level. If the
variables are not, then multiple regression will produce more errors of
prediction. When nominal-level variables are used, they are called "dummy"
variables. They take the value of 1 to represent the presence of some quality,
and the value of zero to indicate the absence of that quality (for example,
smoker=1, non-smoker=0). Ordinal variables may be coded as ranks (for
example, staff=1, supervisor=2, manager=3). The interpretation of the
coefficients is more problematic with independent variables measured at the
nominal or ordinal level.
Regression with only one dependent and one independent variable
normally requires a minimum of 30 observations. A good rule of thumb is to
add at least an additional 10 observations for each additional independent
variable added to the equation.
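The rule of thumb above amounts to simple arithmetic:

```python
def minimum_observations(num_independent_vars):
    """Rule of thumb above: 30 observations for one independent
    variable, plus at least 10 for each additional one."""
    return 30 + 10 * (num_independent_vars - 1)

print(minimum_observations(1))  # simple regression → 30
print(minimum_observations(3))  # e.g. population, snow days, speed → 50
```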
Many difficulties tend to arise when there are more than five independent
variables in a multiple regression equation. One of the most frequent is the
problem that two or more of the independent variables are highly correlated to
one another. This is called multicollinearity. If a correlation coefficient matrix
with all the independent variables indicates correlations of .75 or higher, then
there may be a problem with multicollinearity. When two variables are highly
correlated, they are basically measuring the same phenomenon. When one
enters into the regression equation, it tends to explain most of the variance in
the dependent variable that is related to that phenomenon. This leaves little
variance to be explained by the second independent variable.
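A correlation-matrix screen for multicollinearity might look like this sketch; the data are simulated so that X1 and X2 deliberately measure nearly the same thing:

```python
import numpy as np

# Simulated data: x2 is deliberately an almost-exact copy of x1,
# while x3 is an independent measure
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)

# Correlation matrix of the three independent variables
corr = np.corrcoef([x1, x2, x3])
print(np.round(corr, 2))

# Flag pairs at or above the .75 threshold mentioned above
high = [(i, j) for i in range(3) for j in range(i + 1, 3)
        if abs(corr[i, j]) >= 0.75]
print("possible multicollinearity between variable pairs:", high)
```

Here only the X1/X2 pair is flagged; in practice one of the two offending variables is usually dropped from the equation.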
STANDARDIZED REGRESSION
In a second example, suppose graduate-level GPA (Y) is predicted from
undergraduate GPA, GRE score, and years out of college. We cannot compare
the sizes of the three coefficients, because the independent variables are
measured on different scales. Undergraduate GPA is measured on a scale from
0.0 to 4.0. GRE score is measured on a scale from 0 to 1600. Years out of
college is measured on a scale from 0 to 20. We cannot directly tell which
independent variable has the most effect on Y (graduate-level GPA). However,
it is possible to transform the coefficients into standardized regression
coefficients, written as the Greek letter beta. The standardized regression
coefficients in any one regression equation are measured on the same scale,
with a mean of zero and a standard deviation of 1.
They are then directly comparable to one another, with the largest
coefficient indicating which independent variable has the greatest influence on
the dependent variable.
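One common way to standardize multiplies each b by the ratio of its variable's standard deviation to the standard deviation of Y (beta = b * s_x / s_y); this formula, and all of the b values and standard deviations below, are illustrative assumptions rather than figures from the lesson:

```python
# Standardized coefficient: beta = b * (s_x / s_y)
# All b values and standard deviations here are illustrative assumptions
b = {"undergrad_gpa": 0.40, "gre": 0.0010, "years_out": -0.02}
sd_x = {"undergrad_gpa": 0.5, "gre": 120.0, "years_out": 4.0}
sd_y = 0.45  # assumed standard deviation of graduate GPA (Y)

beta = {name: b[name] * sd_x[name] / sd_y for name in b}
for name, value in beta.items():
    print(f"{name}: beta = {value:.3f}")

# The largest absolute beta marks the most influential independent variable
most_influential = max(beta, key=lambda name: abs(beta[name]))
print("most influential:", most_influential)  # → undergrad_gpa
```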