
Lesson 18: MULTIPLE REGRESSION

MULTIPLE REGRESSION

    Reality in the public sector is complex. Often there may be several
possible causes associated with a problem, and likewise there may be several
factors necessary for a solution. Complex statistical applications are needed
which can:

1. deal with interval and ratio level variables
2. assess causal linkages
3. forecast future outcomes

    Ordinary least squares linear regression is the most widely used type of
regression for predicting the value of one dependent variable from the value of
one independent variable. It is also widely used for predicting the value of one
dependent variable from the values of two or more independent variables.
When there are two or more independent variables, it is called multiple
regression.
 

STEPS IN MULTIPLE REGRESSION

The steps in multiple regression are basically the same as in simple regression.

1. State the research hypothesis
2. State the null hypothesis
3. Gather the data
4. Assess each variable separately first (obtain measures of central
tendency and dispersion; frequency distributions; graphs); is the
variable normally distributed?
5. Assess the relationship of each independent variable, one at a time,
with the dependent variable (calculate the correlation coefficient;
obtain a scatter plot); are the two variables linearly related?
6. Assess the relationships between all of the independent variables with
each other (obtain a correlation coefficient matrix for all the
independent variables); are the independent variables too highly
correlated with one another?
7. Calculate the regression equation from the data
8. Calculate and examine appropriate measures of association and tests
of statistical significance for each coefficient and for the equation as a
whole
9. Accept or reject the null hypothesis
10. Reject or accept the research hypothesis
11. Explain the practical implications of the findings
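
The assessment and calculation steps (4 through 8) can be carried out in any
standard statistics package. Below is a minimal sketch in Python using pandas
and statsmodels; the data file and column names are hypothetical placeholders,
not part of the lesson.

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical data set; the file and column names are placeholders
    df = pd.read_csv("highway_data.csv")

    # Step 4: assess each variable separately
    print(df.describe())     # measures of central tendency and dispersion

    # Steps 5 and 6: correlations of each pair of variables
    print(df.corr())         # correlation coefficient matrix

    # Steps 7 and 8: the regression equation, t-tests, R2, and F
    X = sm.add_constant(df[["population", "snow_days", "avg_mph"]])
    model = sm.OLS(df["fatalities"], X).fit()
    print(model.summary())   # constant, slopes, standard errors, t, p, R2, F
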
ELEMENTS OF A MULTIPLE REGRESSION EQUATION

Y = a + b1X1 + b2X2 + b3X3

Y is the value of the dependent variable: what is being predicted or explained

a (Alpha) is the Constant or intercept

b1 is the slope (b coefficient) for X1

X1 is the first independent variable that is explaining the variance in Y

b2 is the slope (b coefficient) for X2

X2 is the second independent variable that is explaining the variance in Y

b3 is the slope (b coefficient) for X3

X3 is the third independent variable that is explaining the variance in Y

s.e.b1 standard error of coefficient b1

s.e.b2 standard error of coefficient b2

s.e.b3 standard error of coefficient b3

R2 The proportion of the variance in the values of the dependent variable (Y)
explained by all the independent variables (Xs) in the equation together;
sometimes this is reported as adjusted R2, when a correction has been made to
reflect the number of variables in the equation.

F Whether the equation as a whole is statistically significant in explaining Y
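
Each of these elements can be read off of a fitted model. A minimal sketch,
using the hypothetical statsmodels result ("model") from the sketch in the
steps section above:

    a = model.params["const"]            # the constant or intercept
    slopes = model.params.drop("const")  # b1, b2, b3
    std_errors = model.bse               # s.e.b1, s.e.b2, s.e.b3
    r_squared = model.rsquared           # R2 (model.rsquared_adj is adjusted R2)
    f_statistic = model.fvalue           # F for the equation as a whole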

Example: The Department of Highway Safety wants to understand the
influence of various factors on the number of annual highway fatalities.

Hypothesis: Number of annual fatalities is affected by total population, days of
snow, and average MPH on highways.

Null hypothesis: Number of annual fatalities is not affected by total population,
days of snow, or average MPH on highways.

Dependent variable: Y is the number of traffic fatalities in a state in a given
year.

Independent variables: X1 is the state's total population; X2 is the number of
days it snowed; X3 is the average speed drivers were driving at for that year.

Equation: Y = 1.4 + .00029 X1 + 2.4 X2 + 10.3 X3

Predicted value of Y: If X1=3,000,000, X2=2, and X3=65, then

Y = 1.4 + .00029 (3,000,000) + 2.4 (2) + 10.3 (65) = 1545.7
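
As a quick arithmetic check, the predicted value can be computed directly; this
is a plain translation of the equation above into Python:

    a, b1, b2, b3 = 1.4, 0.00029, 2.4, 10.3
    X1, X2, X3 = 3_000_000, 2, 65

    Y = a + b1*X1 + b2*X2 + b3*X3
    print(Y)   # 1.4 + 870.0 + 4.8 + 669.5 = 1545.7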

a=1.4

This is the number of traffic fatalities that would be expected if all three
independent variables were equal to zero (no population, no days snowed, and
zero average speed).

b1=.00029

If X2 and X3 remain the same, this indicates that for each extra person in the
population, the number of yearly traffic fatalities increases by .00029.

b2=2.4

If X1 and X3 remain the same, this indicates that for each extra day of snow, Y
increases by 2.4 additional traffic fatalities.

b3= 10.3

If X1 and X2 remain the same, this indicates that for each mph increase in
average speed, Y increases by 10.3 traffic fatalities.

s.e.b1=.00003

Dividing b1 by s.e.b1 gives us a t-score of 9.66; p<.01. The t-score indicates
that the slope of the b coefficient is significantly different from zero, so the
variable should be in the equation.
 

s.e.b2=.62

Dividing b2 by s.e.b2 gives us a t-score of 3.87; p<.01. The t-score indicates
that the slope of the b coefficient is significantly different from zero, so the
variable should be in the equation.

s.e.b3=1.1
Dividing b3 by s.e.b3 gives us a t-score of 9.36; p<.01. The t-score indicates
that the slope of the b coefficient is significantly different from zero, so the
variable should be in the equation.
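
Each t-score is simply the coefficient divided by its standard error. A minimal
sketch using the values quoted above:

    coefficients    = {"b1": 0.00029, "b2": 2.4, "b3": 10.3}
    standard_errors = {"b1": 0.00003, "b2": 0.62, "b3": 1.1}

    for name, b in coefficients.items():
        t = b / standard_errors[name]
        print(name, round(t, 2))
    # b1: 9.67 (the 9.66 above reflects rounding), b2: 3.87, b3: 9.36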

R2 = .78

We can explain 78% of the variation in annual fatalities among states if we
know the states' populations, days of snow, and average highway speeds.

F is statistically significant.

The equation as a whole helps us to understand the dependent variable (Y).

Conclusion:  Reject the null hypothesis and accept the research hypothesis.
Make recommendations for management implications and further research.
 

PROBLEMS WITH MULTIPLE REGRESSION

    Just as with simple regression, multiple regression will not be good at
explaining the relationship of the independent variables to the dependent
variable if those relationships are not linear.

    Ordinary least squares linear multiple regression is used to predict
dependent variables measured at the interval or ratio level. If the dependent
variable is not measured at this level, then other, more specialized regression
techniques must be used.

    Ordinary least squares linear multiple regression assumes that the
independent (X) variables are measured at the interval or ratio level. If the
variables are not, then multiple regression will result in more errors of
prediction. When nominal level variables are used, they are called "dummy"
variables. They take the value of 1 to represent the presence of some quality,
and the value of zero to indicate the absence of that quality (for example,
smoker=1, non-smoker=0). Ordinal variables may be coded as ranks (for
example, staff=1, supervisor=2, manager=3). The interpretation of the
coefficients is more problematic with independent variables measured at the
nominal or ordinal level.
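
Coding a nominal and an ordinal variable this way is mechanical. A minimal
sketch in Python with pandas; the data values are hypothetical:

    import pandas as pd

    df = pd.DataFrame({"smoking": ["smoker", "non-smoker", "smoker"],
                       "position": ["staff", "manager", "supervisor"]})

    # Dummy variable: 1 = presence of the quality, 0 = absence
    df["smoker"] = (df["smoking"] == "smoker").astype(int)

    # Ordinal variable coded as ranks
    df["position_rank"] = df["position"].map(
        {"staff": 1, "supervisor": 2, "manager": 3})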

    Regression with only one dependent and one independent variable
normally requires a minimum of 30 observations. A good rule of thumb is to
add at least an additional 10 observations for each additional independent
variable added to the equation.
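
This rule of thumb amounts to 30 observations plus 10 for each independent
variable beyond the first; a small helper function makes the arithmetic
explicit:

    def minimum_observations(num_independent_variables: int) -> int:
        # 30 observations for the first independent variable,
        # plus at least 10 for each additional one
        return 30 + 10 * (num_independent_variables - 1)

    print(minimum_observations(3))   # 50 observations for three variables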

    The number of independent variables in the equation should be limited
by two factors. First, the independent variables should be included in the
equation only if they are based on the researcher's theory about what factors
influence the dependent variable. Second, variables that do not contribute very
much to explaining the variance in the dependent variable (i.e., to the total R2)
should be eliminated.

    Many difficulties tend to arise when there are more than five independent
variables in a multiple regression equation. One of the most frequent is the
problem that two or more of the independent variables are highly correlated to
one another. This is called multicollinearity. If a correlation coefficient matrix
with all the independent variables indicates correlations of .75 or higher, then
there may be a problem with multicollinearity. When two variables are highly
correlated, they are basically measuring the same phenomenon. When one
enters into the regression equation, it tends to explain most of the variance in
the dependent variable that is related to that phenomenon. This leaves little
variance to be explained by the second independent variable.
 

Signs of multicollinearity include:

1) none of the t-ratios of the coefficients are statistically significant, but
the F-test for the equation as a whole is significant;
2) adding an additional independent variable to the equation radically
changes either the size or the sign (plus or minus) of the coefficients
associated with the other independent variables.

If multicollinearity is discovered, the researcher may drop one of the two
variables that are highly correlated, or simply leave them in and note that
multicollinearity is present.
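
The correlation-matrix screen for multicollinearity is easy to automate. A
minimal sketch with pandas; the data values below are hypothetical:

    import pandas as pd

    ivs = pd.DataFrame({"population": [3.0e6, 1.2e6, 5.4e6, 0.8e6],
                        "snow_days":  [2, 40, 15, 60],
                        "avg_mph":    [65, 58, 62, 55]})

    corr = ivs.corr()
    print(corr.round(2))   # correlation coefficient matrix

    # Flag any pair of independent variables correlated at .75 or higher
    for a in corr.columns:
        for b in corr.columns:
            if a < b and abs(corr.loc[a, b]) >= 0.75:
                print("possible multicollinearity:", a, "and", b)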
 

STANDARDIZED REGRESSION

    In multiple regression, the relative size of the coefficients does not,
by itself, tell us which independent variable matters most. For example, say
that we want to predict the graduate grade point averages of students who are
newly admitted to the MPA Program. We use their undergraduate GPA, their GRE
scores, and the number of years they have been out of college as independent
variables. We obtain the following regression equation:

Y = 1.437 + (.367) (UG-GPA) + (.00099) (GRE score) + (-.014) (years out of
college)

    We cannot compare the size of the various coefficients because the three
independent variables are measured on different scales. Undergraduate GPA is
measured on a scale from 0.0 to 4.0. GRE score is measured on a scale from 0
to 1600. Years out of college is measured on a scale from 0 to 20. We cannot
directly tell which independent variable has the most effect on Y (graduate level
GPA). However, it is possible to transform the coefficients into standardized
regression coefficients, which are written as the Greek letter beta. The
standardized regression coefficients in any one regression equation are the
coefficients that would result if every variable were first rescaled to the
same scale, with a mean of zero and a standard deviation of 1.

They are then directly comparable to one another, with the largest
coefficient indicating which independent variable has the greatest influence on
the dependent variable.
 

Variable Name               Non-Standardized     Standardized
                            Coefficient (b)      Coefficient (beta)
Undergraduate GPA           .367                 +.291
GRE score                   .00099               +.175
Years out of college        -.014                -.122
Intercept or Constant (a)   1.437                n/a
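
A standardized coefficient can be computed from its unstandardized counterpart
as beta = b * (standard deviation of X / standard deviation of Y). A minimal
sketch; the standard deviations below are hypothetical values chosen only so
the result matches the undergraduate GPA row of the table:

    b_ug_gpa    = 0.367    # unstandardized coefficient from the table
    sd_ug_gpa   = 0.40     # hypothetical SD of undergraduate GPA (X)
    sd_grad_gpa = 0.504    # hypothetical SD of graduate GPA (Y)

    beta_ug_gpa = b_ug_gpa * sd_ug_gpa / sd_grad_gpa
    print(round(beta_ug_gpa, 3))   # +.291, matching the table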

    The convention of using b to denote non-standardized regression
coefficients and beta to denote standardized coefficients is not always
respected. The one difference between non-standardized and standardized
regression is that standardized regression does not have an a term (a
constant). If there is no a term (no constant), then the regression
coefficients have been standardized. If there is an a term, then the
regression coefficients have not been standardized.
