Assumptions of Multiple Regression
This tutorial should be looked at in conjunction with the previous tutorial on Multiple
Regression. Please access that tutorial now, if you haven’t already.
When running a Multiple Regression, there are several assumptions that you need to check
your data meet in order for your analysis to be reliable and valid. This tutorial will talk you
through these assumptions and how they can be tested using SPSS.
This tutorial will use the same example seen in the Multiple Regression tutorial. It aims to
investigate how revision intensity and subject enjoyment (the IVs/predictor variables) may
predict students' exam score (the DV/outcome variable).
The analysis for this tutorial is all done using the SPSS file ‘Week 6 MR Data.sav’.
The Assumptions
Assumption #1: The relationship between the IVs and the DV is linear.
The first assumption of Multiple Regression is that the relationship between the IVs and the DV
can be characterised by a straight line. A simple way to check this is by producing scatterplots
of the relationship between each of our IVs and our DV.
To produce a scatterplot, CLICK on the Graphs menu option and SELECT Chart Builder
In the Chart Builder, SELECT the Scatter/Dot option from the Gallery options in the bottom
half of the dialog box. Then drag and drop the Simple Scatterplot icon into the Chart Preview
window.
Next, we need to tell SPSS what to draw. To do this, drag and drop the DV (Exam Score) onto
the graph's Y-Axis and one of the IVs (in this case, Hours spent revising) onto the graph's
X-Axis.
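The scatterplot is a visual check. If you export the data from SPSS, a quick numeric companion is Pearson's r, sketched below in Python with made-up numbers standing in for Hours spent revising and Exam Score (a high |r| suggests a linear relationship, though it cannot rule out every curved pattern):

```python
import numpy as np

# Made-up values standing in for the tutorial's variables:
# hours spent revising (IV) and exam score (DV).
hours_revising = np.array([2, 4, 5, 7, 8, 10, 12, 14], dtype=float)
exam_score = np.array([35, 44, 50, 58, 61, 70, 78, 85], dtype=float)

# Pearson's r: values near +1 or -1 indicate that a straight line
# describes the relationship well.
r = np.corrcoef(hours_revising, exam_score)[0, 1]
print(f"r = {r:.3f}")
```

As with the scatterplots, you would repeat this for each IV against the DV.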
If we fit a line of best fit through the scatterplot, each data point will lie some vertical
distance from that line. These distances are known as residuals. Each data point has an
associated residual, and these play an important role in the assumptions of multiple regression.
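To make the idea concrete, here is a small Python sketch (made-up data, numpy only) that fits a two-predictor regression by least squares and computes the residuals; the variable names are hypothetical stand-ins for the tutorial's IVs and DV:

```python
import numpy as np

# Made-up data: two predictors (revision hours, enjoyment rating)
# and the exam-score outcome.
X = np.array([[2, 7], [4, 3], [5, 8], [7, 4],
              [8, 9], [10, 5], [12, 8], [14, 6]], dtype=float)
y = np.array([35, 44, 50, 58, 61, 70, 78, 85], dtype=float)

# Add an intercept column and fit by ordinary least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# A residual is the observed value minus the model's predicted value.
residuals = y - X1 @ coef
print(residuals.round(2))
```

With an intercept in the model, the residuals always sum to (numerically) zero, which is a handy sanity check.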
To test the next assumptions of multiple regression, we need to re-run our regression in SPSS.
To do this, CLICK on the Analyze menu, SELECT Regression and then Linear.
This opens the main Regression dialog box. As we haven’t shut SPSS down since running our
multiple regression (in the previous tutorial), SPSS remembers the options we chose for
running our analysis. Exam Score is still selected as our DV, and Revision Intensity and
Subject Enjoyment are entered as the predictors (or IVs).
To test the next assumption, CLICK on the Plots option in the main Regression Dialog box.
Assumption #4: The variance of the residuals is constant.
This is called homoscedasticity, and is the assumption that the variation in the residuals (or
amount of error in the model) is similar at each point across the model. In other words, the
spread of the residuals should be fairly constant at each point of the predictor variables (or
across the linear model). We can get an idea of this by looking at our original scatterplot… but
to properly test this, we need to ask SPSS to produce a special scatterplot for us that includes
the whole model (and not just the individual predictors).
To test the 4th assumption, we need to plot the standardised values our model would predict,
against the standardised residuals obtained.
To do this, first CLICK on the ZPRED variable and MOVE it across to the X-axis. Next, SELECT the
ZRESID variable and MOVE it across to the Y-axis.
Significant outliers and influential data points can place undue influence on your model, making
it less representative of your data as a whole. To identify any particularly influential data
points, first CLICK the SAVE option in the main Regression dialog box.
SPSS now produces both the results of the multiple regression, and the output for assumption
testing. This tutorial will only go through the output that can help us assess whether or not the
assumptions have been met.
For the assumption of no multicollinearity to be met (i.e. the IVs should not correlate too
strongly with each other), we want VIF scores to be well below 10 and tolerance scores to be
above 0.2, which is the case in this example.
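If you want to see what SPSS is computing here, VIF can be reproduced by hand: regress each IV on the remaining IVs and take 1 / (1 − R²); tolerance is simply 1 / VIF. A Python sketch with made-up predictor data (numpy only):

```python
import numpy as np

def vif_and_tolerance(X):
    """For each column of X, regress it on the other columns (plus an
    intercept); VIF = 1 / (1 - R^2) and tolerance = 1 / VIF."""
    n, k = X.shape
    results = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        vif = 1.0 / (1.0 - r2)
        results.append((vif, 1.0 / vif))
    return results

# Made-up predictor data (revision hours, enjoyment rating).
X = np.array([[2, 7], [4, 3], [5, 8], [7, 4],
              [8, 9], [10, 5], [12, 8], [14, 6]], dtype=float)
results = vif_and_tolerance(X)
for vif, tol in results:
    print(f"VIF = {vif:.2f}, tolerance = {tol:.2f}")
```

Note that with only two IVs, both VIF values come out the same, since each is based on the same pairwise R².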
To check the next assumption, we need to look at the Model Summary box. Here, we can use
the Durbin-Watson statistic to test the assumption that our residuals are independent (or
uncorrelated). This statistic can vary from 0 to 4. For assumption #3 to be met, we want this
value to be close to 2. Values below 1 and above 3 are cause for concern and may render your
analysis invalid.
In this case, the value is 1.931, so we can say this assumption has been met.
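The Durbin-Watson statistic itself is easy to compute from the residuals: the sum of squared successive differences divided by the sum of squared residuals. A Python sketch on simulated (made-up) residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: ranges from 0 to 4, and sits near 2
    when successive residuals are uncorrelated."""
    diffs = np.diff(residuals)
    return (diffs ** 2).sum() / (residuals ** 2).sum()

# Simulated residuals: independent draws, so the statistic should
# come out close to 2.
rng = np.random.default_rng(0)
resid = rng.normal(size=100)
dw = durbin_watson(resid)
print(f"Durbin-Watson = {dw:.3f}")
```

Try replacing the simulated residuals with a strongly trending sequence: the statistic will drop well below 2, flagging positive autocorrelation.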
Assumption #4: The variance of the residuals is constant.
To test the fourth assumption, you need to look at the final graph of the output. This tests the
assumption of homoscedasticity, which is the assumption that the variation in the residuals (or
amount of error in the model) is similar at each point of the model.
This graph plots the standardised values our model would predict, against the standardised
residuals obtained. As the predicted values increase (along the X-axis), the variation in the
residuals should be roughly similar. If everything is ok, this should look like a random array of
dots. If the graph looks like a funnel shape, then it is likely that this assumption has been
violated.
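The same plot data can be reproduced outside SPSS. The sketch below (Python, made-up data) fits the model, then z-scores the predicted values and residuals. Note that SPSS's ZRESID divides residuals by the standard error of the estimate rather than z-scoring them, so the numbers differ slightly, but the shape of the plot (random cloud versus funnel) is what matters:

```python
import numpy as np

# Made-up data: two predictors and the exam-score outcome.
X = np.array([[2, 7], [4, 3], [5, 8], [7, 4],
              [8, 9], [10, 5], [12, 8], [14, 6]], dtype=float)
y = np.array([35, 44, 50, 58, 61, 70, 78, 85], dtype=float)

# Fit the regression and get predicted values and residuals.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
fitted = X1 @ coef
resid = y - fitted

# Standardise both axes (simple z-scores).
zpred = (fitted - fitted.mean()) / fitted.std(ddof=1)
zresid = (resid - resid.mean()) / resid.std(ddof=1)

# Plotting zpred (x) against zresid (y) should give a random cloud;
# a funnel shape would suggest heteroscedasticity.
for p, q in zip(zpred, zresid):
    print(f"{p:6.2f}  {q:6.2f}")
```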
It is important that you flag any violations of your assumptions when writing up the results of
your multiple regression analysis. In this case:
Assumption #1: The relationship between the IVs and the DV is linear.
Scatterplots show that this assumption has been met (although you would need to
formally test each IV yourself).
You have now seen how to test the assumptions of multiple regression using SPSS. Why not
download the data file used in this example and try to produce the output yourself?
Remember, it is important to report any violations of these assumptions when writing up your
results, so readers know that the results should be interpreted with caution.