Regression Coefficient
The following is a discussion of the regression coefficient without all the formulas.
Correlation and Regression
• The goal of explanatory research is to tell us why things exist
as they do. The use of true experimental designs is a powerful
method for conducting causal research. There are, however,
many types of social phenomena for which it is not practical or
ethical to conduct social experiments.
Typical Line
• Earlier, when we were describing one variable, we wanted to
have a model that represented the typical value. That model
was a single value. When we have two variables, we want to
find the line that is typical of or best represents all the
observations.
• In the scatterplot on the next slide, does it appear that there is
a linear relationship?
Equation for a Line
• If you said yes, then you were correct.
                      B       Std. Error   t        Sig.
(Constant)            1.395   .835         1.672    .101
Poverty Rate: 2008    .698    .063         11.034   .000
Dependent Variable: Births to Teenage Mothers as Percent of All Births: 2007
• Looking at the effect of the independent variable, poverty, on
the dependent variable, there is a positive relationship (.698),
and according to the t-test (11.034) it is statistically
significant (.000).
• The coefficient for poverty (.698) suggests that as
poverty increases so does teen pregnancy.
Equation for a Line
• If you said yes, then you were correct.

                      B       Std. Error   t        Sig.
(Constant)            1.395   .835         1.672    .101
Poverty Rate: 2008    .698    .063         11.034   .000
Dependent Variable: Births to Teenage Mothers as Percent of All Births: 2007
• Back to the regression formula: Y = a + bX
• The constant is a, the intercept
• Y = a + b(X)
• Y = 1.395 + .698(poverty rate)
• So you could enter a poverty rate to calculate the predicted Y
• However, it is important to remember that there is error in these
estimates: a standard error of .835 for the intercept and .063 for the
independent variable
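As a quick illustration (not part of the original slides), the fitted equation above can be used to compute a predicted value; the intercept and slope are taken directly from the coefficients table.

```python
# Predicted teen-birth percentage from the fitted line on the slide:
# Y = 1.395 + .698 * (poverty rate)
def predicted_teen_birth_pct(poverty_rate):
    intercept = 1.395  # "(Constant)" from the coefficients table
    slope = 0.698      # coefficient on Poverty Rate: 2008
    return intercept + slope * poverty_rate

# Example: a state with a 10 percent poverty rate
print(predicted_teen_birth_pct(10.0))  # 8.375
```

Remember the caution in the slide: each of these estimates carries a standard error, so the prediction is an estimate, not an exact value.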
Equation for a Line
• Outliers: Describing data with a line--making a linear model of
the data--is a powerful and useful statistical technique.
However, there are important cautions to keep in mind when
calculating correlations and when using regression to estimate
slopes. Outliers, for example, can pull the fitted line toward
themselves and distort both the correlation and the estimated slope.
• Other Than Linear Relationships: As noted many times in this
chapter, there is a link between the correlation and linear
relationships. If there is a linear relationship--one that can be
represented by a straight line--then the correlation will be
high. If the correlation is low, however, there is no evidence of
a linear relationship between the variables, although there may
be other kinds of relationships (curvilinear, for example).
Caution Inferring Cause
• When we observe a relationship between two variables, it is
frequently tempting to infer that one variable is causing the other
one.
• For example, suppose a medical survey reports a negative
correlation between the amount of red meat eaten and age of
death--the more red meat eaten the younger the age of death, on
average.
• It is tempting to conclude that eating the red meat causes people to
die younger of various diseases. Although a causal link is one
possibility, there are other plausible alternative explanations. It may
be that some other variable, related to both, is actually the cause.
• For example, eating red meat might be associated with getting less
exercise and it might actually be the lack of exercise that is the
cause of the later health problems. The causal ambiguity is more
obvious in the following true example.
Caution Inferring Cause
• In any major city there is a positive correlation between the
monthly sales of ice cream cones and the monthly number of
suicides.
• It seems farfetched to propose that eating ice cream causes
suicides, but what is going on? We don't really believe that
banning ice cream sales would reduce suicide rates.
• Most likely, a third variable--average monthly temperature--is
responsible for the relationship between ice cream and
suicides. Hotter temperatures may cause all sorts of people to
eat more ice cream and, unfortunately, they also may cause
some depressed people to become even more desperate.
Caution Inferring Cause
• Many textbooks emphasize: Correlation does not imply
causation.
• Although following that rule will generally get you in less
trouble than its opposite, it is too strong a statement.
• If one of the variables is under the control of the researcher,
then correlation does imply causation. For example, suppose an
experimenter sets the stress level applied to steel ingots and
records how long each ingot lasts.
• The correlation in that case does allow the researcher to
conclude that increasing the stress level causes ingots to fail
sooner. But if instead both variables are simply observed and
not controlled, then remember: Correlation does not imply
causation.
MAIN POINTS
• Correlation and regression measure and describe the co-
relationship between two variables.
• Control
• Although two variables may be associated, they are not
necessarily causally related. By using methods that control other
factors, researchers are able to obtain evidence about whether
an independent variable has a causal influence on a dependent
variable.
What does this tell us? Looking at the regression coefficient for
the initiative (7.301) with a standard error of .997, what would
this suggest? First, we must check whether the variable is
statistically significant, because if it is not, then it does not matter
what the regression coefficient is; all we would be able to say is
that there is no statistical relationship between the two variables.

We know from the t statistic and Sig. that there is a statistically
significant relationship and that it is a positive one: 7.301 is
positive and, as noted above, statistically significant (.000). So
this would allow us to infer that states with the initiative have
higher turnout than states that do not. Further analysis would be
necessary to be certain, but it appears that turnout would be
between about 6.3 and 8.3 percent higher.
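The 6.3-to-8.3 range in the text appears to be the coefficient plus or minus one standard error. A minimal sketch of that arithmetic, using the values quoted above:

```python
# The slide's "about 6.3 and 8.3 percent" range: coefficient +/- one
# standard error, using the values quoted in the text.
b = 7.301   # regression coefficient for the initiative dummy
se = 0.997  # its standard error
low, high = b - se, b + se
print(round(low, 3), round(high, 3))  # 6.304 8.298
```

Note that a conventional 95 percent confidence interval would instead use roughly b ± 2 × SE, giving a wider range.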
Should we also include number of initiatives on the ballot to see
if that makes a difference? Why not!
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .366a   .134       .129                9.30567
a. Predictors: (Constant), initnumb, duminit
What does this tell us? The regression coefficient for the
initiative is 7.154 with a standard error of 1.146, and the regression
coefficient for the number of initiatives is .069 with a standard
error of .262. What would this suggest? First, we must check to
see if the variables are statistically significant, because if they are
not, then it does not matter what the regression coefficients are. All
we would be able to say is that there is no statistical relationship
between those variables and turnout.
Coefficientsa

Model            Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                 B         Std. Error          Beta
1  (Constant)    36.323    .680                                            53.377   .000
   duminit       7.154     1.146               .358                        6.245    .000
   initnumb      .069      .262                .015                        .262     .794
a. Dependent Variable: turnout
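As a sanity check (an illustration, not part of the original slides), each t statistic in the table is simply the coefficient divided by its standard error:

```python
# t statistic = B / Std. Error, using the values from the coefficients table
coeffs = {
    "(Constant)": (36.323, 0.680),
    "duminit":    (7.154, 1.146),
    "initnumb":   (0.069, 0.262),
}
for name, (b, se) in coeffs.items():
    print(f"{name}: t = {b / se:.3f}")
```

Small discrepancies from the printed table (for example, 7.154 / 1.146 ≈ 6.243 versus the table's 6.245) arise because B and the standard error are themselves rounded in the output. Note also that initnumb's t of .262 falls well short of significance (Sig. = .794), so the number of initiatives does not appear to matter once the initiative dummy is included.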