
CHAPTER 8

CORRELATION AND REGRESSION ANALYSIS

Regression analysis allows us to predict one variable from information we have about other variables. In this chapter, linear regression is discussed. Linear regression is a type of analysis that is performed on interval and ratio variables (labeled “scale” variables in SPSS Statistics). However, it is possible to incorporate data from variables with lower levels of measurement (i.e., nominal and ordinal variables) through the use of dummy variables. We will begin with a bivariate regression example and then add some more detail to the analysis.

BIVARIATE REGRESSION
In the case of bivariate regression, researchers are interested in predicting the value of the dependent variable, Y, from the information they have about the independent variable, X. We will use the example below, in which respondent’s real annual income is predicted from number of years of education. Choose the following menus to begin the bivariate regression analysis:

Analyze → Regression → Linear . . .

127
Copyright ©2020 by SAGE Publications, Inc.
This work may not be reproduced or distributed in any form or by any means without express written permission of the publisher.
The “Linear Regression” dialog box will appear. Initially, select the variables of interest and drag them into the appropriate areas for dependent and independent variables. The variable “REALRINC,” respondent’s actual annual income, should be moved to the “Dependent” area, and “EDUC,” respondent’s number of years of education, should be moved to the “Independent(s)” area. Now, simply click “OK.” The following SPSS Statistics output will be produced:

In the first column of the “Model Summary” box, the output will yield Pearson’s r (in the column labeled “R”), followed in the next column by r-square (r²). SPSS Statistics also computes an adjusted r² for those interested in using that value. R-square, like lambda, gamma, Kendall’s tau-b, and Somers’ d, is a PRE (proportional reduction in error) statistic that reveals the proportional reduction in error achieved by introducing the independent variable(s). In this case, r² = .083, which means that 8.3% of the variation in real annual income is explained by the variation in years of education. Although this percentage might seem low, consider that years of education is only one factor among many that contribute to income, including major field of study, schools attended, prior and continuing experience, region of the country, gender, race/ethnicity, and so on. We will examine gender (sex) later in this chapter to demonstrate multiple regression.
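To see the PRE logic concretely, one can compare the error made by predicting every case at the mean of Y with the error left after fitting the regression line; the proportional reduction between the two is r². A minimal sketch in Python, using invented numbers rather than the GSS data:

```python
# Illustrate r-squared as a PRE (proportional reduction in error) statistic.
# These data are invented for illustration; they are not the GSS values.
educ = [8, 10, 12, 12, 14, 16, 16, 18, 20]   # years of education (X)
inc = [12, 15, 22, 20, 28, 35, 31, 42, 50]   # income in $1,000s (Y)

n = len(educ)
mean_x = sum(educ) / n
mean_y = sum(inc) / n

# OLS slope and intercept: b = cov(X, Y) / var(X), a = mean_y - b * mean_x
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(educ, inc))
sxx = sum((x - mean_x) ** 2 for x in educ)
b = sxy / sxx
a = mean_y - b * mean_x

# Error without X: squared deviations from the mean of Y (total sum of squares).
sse_mean = sum((y - mean_y) ** 2 for y in inc)
# Error with X: squared residuals around the regression line.
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(educ, inc))

# r-squared is the proportional reduction in error.
r_squared = (sse_mean - sse_line) / sse_mean
print(round(r_squared, 3))
```

The same value results from squaring Pearson’s r, which is the identity SPSS relies on when it reports both “R” and “R Square” in the Model Summary.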

ANOVA (analysis of variance) values, including the F statistic, are given in the
above table of the linear regression output.

128   USING IBM ® SPSS ® STATISTICS FOR RESEARCH METHODS AND SOCIAL SCIENCE STATISTICS
The coefficients table reveals the actual regression coefficients for the regression equation, as well as their statistical significance. The coefficients are given in the “B” column, under “Unstandardized Coefficients.” In this case, the b value for number of years of education completed is 2,933.597. The a value, or constant, is −17,734.68. By looking in the last column (“Sig.”), you can see that both values are statistically significant (p = .000). Remember, the p value refers to the probability that the result is due to chance, so smaller numbers are better. The standard in the social sciences is usually .05; a result is deemed statistically significant if the p value is less than .05. We would write the regression equation describing the model computed by SPSS Statistics as follows:

Ŷ = bX + a → Ŷ = $2,933.60X* − $17,734.68*

*Statistically significant at the p ≤ .05 level.

The coefficient in the bivariate regression model above can be interpreted to mean that each additional year of education provides a $2,933.60 predicted increase in real annual income. The constant gives the predicted real annual income when years of education is zero; however, as is often the case with a regression equation, that may be beyond the range of the data for reasonable prediction. In other words, if no one in the sample had zero or near-zero years of education, the range of the data upon which the prediction was calculated did not include such values, and we should be cautious in making predictions at those levels.
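The fitted equation can be applied directly to generate predictions. A short Python sketch using the coefficients reported above (the function name here is ours, not SPSS’s):

```python
# Apply the fitted bivariate equation from the text:
#   predicted income = 2,933.597 * years_of_education - 17,734.68
b = 2933.597    # slope for EDUC (from the SPSS coefficients table)
a = -17734.68   # constant

def predicted_income(years_of_education):
    """Predicted real annual income for a given number of years of education."""
    return b * years_of_education + a

# A prediction well within the range of the data:
print(round(predicted_income(16), 2))
# Each additional year of education adds exactly b dollars to the prediction:
print(round(predicted_income(17) - predicted_income(16), 2))
```

Note that calling this function with 0 years of education reproduces the constant, which, as discussed above, may lie outside the range of the data used to fit the model.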
CORRELATION

Information about correlation tells us the extent to which variables are related. Below, the Pearson method of computing correlation is requested through SPSS Statistics. To examine a basic correlation between two variables, use the following menus:

Analyze → Correlate → Bivariate . . .



In the “Bivariate Correlations” dialog box, choose the variables you wish to examine. In the preceding case, “MALE” (a dummy variable representing sex, described in further detail below, under “Multiple Regression”) and “EDUC,” representing years of education, have been selected. For now, “MALE” is a recoded version of the sex variable, where a male respondent is coded as 1 and a female respondent is coded as 0. Thus, a “1” indicates “male” and a “0” indicates “not male,” so the mean of this variable can be read as the proportion of male respondents. This allows us to treat a nominal dichotomy as an interval/ratio variable and then use it in regression and correlation analysis. Use the following menus to create the male dummy variable:

Transform → Recode into Different Variables . . .

Select SEX, and then add the name and label, as above. Now click “Old and New Values . . .” Enter the recoding instructions, as illustrated below.

Now, click “Continue,” and then click “OK” in the first dialog box. The new
variable, “MALE,” will be created. Be sure to do the appropriate fine-tuning for
this new variable (e.g., eliminate decimal places, because there are only two pos-
sible values this variable can take: 0 and 1) in the Variable View window.
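The menu-driven recode above can be mirrored in code. A sketch assuming SEX is coded 1 = male and 2 = female (the typical GSS coding; check your codebook), with everything else treated as missing:

```python
# Mirror the Recode into Different Variables step: SEX -> MALE dummy.
# Assumes SEX is coded 1 = male, 2 = female (check your codebook).
def recode_male(sex):
    """Return 1 for male, 0 for female, None for missing values."""
    if sex == 1:
        return 1
    if sex == 2:
        return 0
    return None  # system- or user-missing stays missing

sex_values = [1, 2, 2, 1, None]
male = [recode_male(s) for s in sex_values]
print(male)  # [1, 0, 0, 1, None]
```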
Returning to the correlation exercise, the output that results is shown in the
following table:

Note that in the output, the correlation is an extremely small −.12, which is not statistically significant (p = .513). This tells us that being male is not correlated with having completed a greater number of years of education.

It is also possible to produce partial correlations. Suppose you are interested in examining the correlation between real annual income and education. Further suppose you wish to determine the way that sex affects that correlation. Use the following menus to produce a partial correlation:

Analyze → Correlate → Partial . . .
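Behind this menu, a first-order partial correlation can be computed from the three zero-order Pearson correlations. A Python sketch with invented data (not the GSS values):

```python
# First-order partial correlation from three zero-order Pearson correlations:
#   r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
def pearson_r(xs, ys):
    """Zero-order Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def partial_r(x, y, z):
    """Correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / (((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5)

# Invented data: education (x), income in $1,000s (y), male dummy (z).
educ = [8, 10, 12, 14, 16, 18, 20]
inc = [14, 18, 20, 27, 30, 38, 44]
male = [0, 1, 0, 1, 0, 1, 1]
print(round(partial_r(educ, inc, male), 3))
```

The same formula generalizes to more than one control variable by iterating it, which is what SPSS does when several variables are placed in the “Controlling for” area.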



In the “Partial Correlations” dialog box, you will be able to select the variables among which you wish to examine a correlation. You will also be able to select the control variable, around which partial correlations will be computed. In this case, years of education (“EDUC”) and real annual income (“REALRINC”) have been selected for correlation analysis. The control variable is “MALE.” (It is also possible to include more than one control variable.)
SPSS Statistics provides the following output:

Here, the partial correlation is noteworthy, at .302, and is statistically significant (p = .000). This is indicative of a relationship between education and income, controlling for sex.

Correlation information about variables is useful to have before constructing regression models. Should you want to know more, many textbooks in statistics and research methods have detailed discussions about how this information aids in regression analysis.

MULTIPLE REGRESSION

Now, suppose a researcher wished to include one or more additional independent variables in a bivariate regression analysis. This is very easy to do using SPSS Statistics. All you need to do is move the additional variables into the “Independent(s)” area in the “Linear Regression” dialog box, as seen below:

Analyze → Regression → Linear . . .

Because linear regression requires interval-ratio variables, one must take care when incorporating variables such as sex, race/ethnicity, religion, and the like. By creating dummy variables from the categories of these nominal variables, you can add this information to the regression equation.
To do this, use the recode function (for more information about recoding variables, see Chapter 2, “Transforming Variables”). Create a dichotomous variable for all but one category, the “omitted” comparison category or attribute, and insert each of those dichotomies into the “Independent(s)” area. The number of dummy variables necessary for a given variable will be equal to K − 1, where K is the number of categories of the variable. Dichotomies are an exception to the cumulative property of levels of measurement, which tells us that variables measured at higher levels can be treated at lower levels but not vice versa. Dichotomies, typically considered categorical or nominal, can be “coded” to be treated as if they are at any level of measurement.

For the case of sex, we already have a dichotomy exclusive of transgender categories and other conditions, so the recoding just changes this to one variable: “MALE.” (Alternatively, you could have changed it to “FEMALE.”) The coding should be binary: 1 for respondents possessing the attribute, 0 for respondents not possessing it. Now, as was entered into the previous dialog box, just select the new recoded variable, “MALE,” from the variable bank on the left and drag it into the “Independent(s)” area on the right. You may need to set the variable property to scale in the Variable View tab of the Data Editor window so that SPSS Statistics will allow that variable to be included in the regression analysis. Newer versions of SPSS Statistics track variable types and often will not allow you to include variables with lower levels of measurement in analyses requiring variables with higher levels of measurement.

After recoding as necessary and dragging your variables of interest into their respective areas, click the “Plots . . .” button, and you will be shown the “Linear Regression: Plots” dialog box:
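The K − 1 rule can be illustrated with a small helper that builds one dummy per category except the omitted comparison group. The region variable below is hypothetical, purely for illustration:

```python
# Create K - 1 dummy variables for a nominal variable, omitting one
# comparison category. The region variable here is hypothetical.
def make_dummies(values, omitted):
    """Return {category: 0/1 column} for every category except `omitted`."""
    categories = sorted(set(values))
    return {
        cat: [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != omitted
    }

region = ["south", "west", "north", "south", "east", "west"]
dummies = make_dummies(region, omitted="north")  # "north" is the comparison group
print(sorted(dummies))     # K - 1 = 3 dummy variables
print(dummies["south"])    # 1 where the case is "south", else 0
```

Each case scores 0 on every dummy exactly when it belongs to the omitted category, which is why one category must be left out: a full set of K dummies would be perfectly collinear with the constant.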

Here, you can avail yourself of a couple of useful graphics: a histogram and a normal probability plot. Click each box to request them. Then click “Continue.”

When you are returned to the “Linear Regression” dialog box, select the “Statistics . . .” button. The following dialog box will appear:

There are a number of options, including descriptive statistics, that you may select to be included in the SPSS Statistics linear regression output. For now, leave the defaults checked as shown, and click “Continue” in this box; then click “OK” when returned to the “Linear Regression” dialog box.
On the next page, you will find tables from the SPSS Statistics output that results. The first table reports the descriptive statistics that were requested. The next two tables give the same sort of information as before in the bivariate regression case: Pearson’s r (correlation coefficient), r² (PRE), and ANOVA (analysis of variance) values.

In this case, r² = .115, which means that 11.5% of the variation in respondents’ real annual income (“REALRINC”) is explained by the variation in the independent variables: years of education (“EDUC”) and sex (“MALE”).
The “Coefficients” table (on page 136), again, provides the information that can be used to construct the regression model and equation. Note that the dummy variable, “MALE,” was statistically significant.

Ŷ = b₁X₁ + b₂X₂ + a → Ŷ = $3,045.39X₁* + $10,619.76X₂* − $24,512.82

*Statistically significant at the p ≤ .05 level.

The X₁ coefficient (“EDUC,” years of education) can be interpreted to mean that each additional year of education provides a $3,045.39 predicted increase in real annual income. The X₂ coefficient (“MALE,” the dummy variable for sex) can be interpreted to mean that men have a predicted real annual income $10,619.76 higher than women’s in this prediction model. In this case, both independent variables are statistically significant, with p = .000.
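As in the bivariate case, the fitted equation can be used to generate predictions. A sketch using the coefficients reported above; because the model is additive (no interaction term), the male-female gap is the same at every level of education:

```python
# Apply the two-predictor equation reported in the text:
#   predicted income = 3,045.39 * EDUC + 10,619.76 * MALE - 24,512.82
b_educ, b_male, a = 3045.39, 10619.76, -24512.82

def predicted_income(educ_years, male):
    """Predicted real annual income from years of education and the MALE dummy."""
    return b_educ * educ_years + b_male * male + a

# Holding education constant, the predicted male-female gap equals b_male:
gap = predicted_income(14, 1) - predicted_income(14, 0)
print(round(gap, 2))
```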

The two graphics that follow show a histogram of the regression standardized residuals for the dependent variable and the observed by expected cumulative probability for the dependent variable, real annual income.

[Histogram. Dependent Variable: R’s income in constant $. X-axis: Regression Standardized Residual; y-axis: Frequency. Mean = −2.21E−16, Std. Dev. = 0.999, N = 1,631.]

[Normal P–P Plot of Regression Standardized Residual. Dependent Variable: R’s income in constant $. X-axis: Observed Cum Prob; y-axis: Expected Cum Prob.]
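Standardized residuals are the raw regression residuals rescaled to mean 0 and standard deviation about 1, which is what the histogram displays. (SPSS scales by the standard error of the estimate, which is why its reported Std. Dev. is slightly below 1; the sketch below simply uses the sample standard deviation of the residuals, with invented data.)

```python
# Standardize regression residuals: (residual - mean) / standard deviation.
# Invented data for illustration; not the GSS values plotted in the text.
educ = [8, 10, 12, 14, 16, 18, 20]
inc = [15, 17, 24, 25, 31, 36, 45]

n = len(educ)
mx, my = sum(educ) / n, sum(inc) / n
b = sum((x - mx) * (y - my) for x, y in zip(educ, inc)) / sum((x - mx) ** 2 for x in educ)
a = my - b * mx

residuals = [y - (a + b * x) for x, y in zip(educ, inc)]
mean_r = sum(residuals) / n
sd_r = (sum((r - mean_r) ** 2 for r in residuals) / (n - 1)) ** 0.5
standardized = [(r - mean_r) / sd_r for r in residuals]

# OLS residuals sum to (numerically) zero, so the standardized mean is ~0,
# matching the tiny Mean = -2.21E-16 shown on the SPSS histogram.
print(abs(sum(standardized) / n) < 1e-9)
```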
It is possible to add additional variables to your linear regression model, such as those in the dialog box featured below. Interval-ratio variables may be included, as well as dummy variables, along with others such as interaction variables. Interaction variables may be computed using the compute function (in the “Transform” menu). More information about computing variables can be found in Chapter 2, “Transforming Variables.” The computation would consist of: Variable 1 × Variable 2 = Interaction Variable.

Access the full 2016 data file and the 1972–2016 Cumulative Codebook at the student study site: study.sagepub.com/wagner7e.
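The computation described above is a simple element-wise product, as in this sketch with invented values:

```python
# Compute an interaction variable as Variable 1 x Variable 2, mirroring the
# Transform -> Compute Variable step described in the text. Invented values.
educ = [10, 12, 14, 16]
male = [1, 0, 1, 0]

educ_x_male = [e * m for e, m in zip(educ, male)]
print(educ_x_male)  # nonzero only for male respondents
```

Entering this product alongside its two components lets the slope for education differ between men and women, which an additive model cannot do.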
