Statistics Chapter 4
Chapter 4: Describing the Relation between Two Variables
Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved
4.1 Scatter Diagrams and Correlation
Learning Objectives
1. Draw and interpret scatter diagrams
2. Describe the properties of the linear correlation coefficient
3. Compute and interpret the linear correlation coefficient
4. Determine whether a linear relation exists between two
variables
5. Explain the difference between correlation and causation
4.1.1 Draw and Interpret Scatter Diagrams
The response variable is the variable whose value can be
explained by the value of the explanatory or predictor
variable.
A scatter diagram is a graph that shows the relationship
between two quantitative variables measured on the same
individual. Each individual in the data set is represented by a
point in the scatter diagram. The explanatory variable is
plotted on the horizontal axis, and the response variable is
plotted on the vertical axis.
EXAMPLE Drawing and Interpreting a Scatter Diagram
The data shown below are based on a study for drilling rock. The
researchers wanted to determine whether the time it takes to dry drill a
distance of 5 feet in rock increases with the depth at which the drilling
begins. So, depth at which drilling begins is the explanatory variable, x,
and time (in minutes) to drill five feet is the response variable, y.
Draw a scatter diagram of the data.
Depth at Which Drilling Begins, x (in feet)    Time to Drill 5 Feet, y (in minutes)
35     5.88
50     5.99
75     6.74
95     6.1
120    7.47
130    6.93
145    6.42
155    7.97
160    7.92
175    7.62
185    6.89
190    7.9
Source: Penner, R., and Watts, D.G. “Mining Information.”
The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.
Various Types of Relations in a Scatter Diagram
Two variables that are linearly related are positively
associated when above-average values of one variable are
associated with above-average values of the other variable
and below-average values of one variable are associated
with below-average values of the other variable. That is, two
variables are positively associated if, whenever the value of
one variable increases, the value of the other variable also
increases.
Two variables that are linearly related are negatively
associated when above-average values of one variable are
associated with below-average values of the other variable.
That is, two variables are negatively associated if, whenever
the value of one variable increases, the value of the other
variable decreases.
4.1.2 Describe the Properties of the Linear Correlation Coefficient
Properties of the Linear Correlation Coefficient
1. The linear correlation coefficient is always between −1 and 1,
inclusive. That is, −1 ≤ r ≤ 1.
2. If r = + 1, then a perfect positive linear relation exists between
the two variables.
3. If r = −1, then a perfect negative linear relation exists between
the two variables.
4. The closer r is to +1, the stronger the evidence is of a positive
association between the two variables.
5. The closer r is to −1, the stronger the evidence is of a negative
association between the two variables.
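The first three properties can be checked numerically. The slide that defines r did not survive extraction, so the sketch below assumes the standard sample formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name `linear_correlation` and the small data sets are made up for illustration.

```python
from math import sqrt

def linear_correlation(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]  # illustrative x values, not from the slides

# Points lying exactly on a rising line: perfect positive linear relation.
r_up = linear_correlation(xs, [2 * x + 1 for x in xs])
# Points lying exactly on a falling line: perfect negative linear relation.
r_down = linear_correlation(xs, [10 - 3 * x for x in xs])
# Scattered points with an upward trend: r falls strictly between 0 and 1.
r_mixed = linear_correlation(xs, [2, 1, 4, 3, 5])

print(r_up, r_down, r_mixed)
```

The two exact lines give r = +1 and r = −1, while the scattered data give a value inside the interval, in line with properties 1 through 3.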
4.1.3 Compute and Interpret the Linear Correlation Coefficient
EXAMPLE Determining the Linear Correlation Coefficient
Determine the linear correlation coefficient of the drilling data.
Depth at Which Drilling Begins, x (in feet)    Time to Drill 5 Feet, y (in minutes)
35     5.88
50     5.99
75     6.74
95     6.1
120    7.47
130    6.93
145    6.42
155    7.97
160    7.92
175    7.62
185    6.89
190    7.9
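One way to carry out this computation is sketched below. The worked solution on the slides is not present in the extracted text, so this assumes the usual z-score form of the coefficient, r = Σ z_x z_y / (n − 1), with sample standard deviations. The variable names are illustrative.

```python
from statistics import mean, stdev

# Drilling data from the example: depth (feet) and time to drill 5 feet (minutes).
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
minutes = [5.88, 5.99, 6.74, 6.1, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.9]

n = len(depth)
x_bar, y_bar = mean(depth), mean(minutes)
s_x, s_y = stdev(depth), stdev(minutes)  # sample standard deviations

# Sum the products of the z-scores, then divide by n - 1.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(depth, minutes)) / (n - 1)

print(round(r, 3))  # 0.773
```

A value of about 0.773 points to a fairly strong positive linear association between starting depth and drilling time.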
IN CLASS ACTIVITY
Correlation
Randomly select six students from the class and have them determine their
at-rest pulse rates and then discuss the following:
1. When determining each at-rest pulse rate, would it be better to count beats for
30 seconds and multiply by 2 or count beats for 1 full minute?
Explain. What are some other ways to find the at-rest pulse rate?
Do any of these methods have an advantage?
2. What effect will physical activity have on pulse rate?
3. Do you think the at-rest pulse rate will have any effect on the pulse rate after
physical activity? If so, how? If not, why not?
Have the same six students jog in place for 3 minutes and then immediately
determine their pulse rates using the same technique as for the at-rest pulse
rates.
4. Draw a scatter diagram for the pulse data using the at-rest data as the
explanatory variable.
5. Comment on the relationship, if any, between the two variables. Is this
consistent with your expectations?
6. Based on the graph, estimate the linear correlation coefficient for the data.
Then compute the correlation coefficient and compare it to your estimate.
4.1.4 Determine whether a Linear Relation Exists between Two Variables
EXAMPLE Does a Linear Relation Exist?
Table II Critical Values for Correlation Coefficient
4.1.5 Explain the Difference between Correlation and Causation
Table 4
4.2 Least-squares Regression
Learning Objectives
1. Find the least-squares regression line and use the line to
make predictions
2. Interpret the slope and the y-intercept of the least-squares
regression line
3. Compute the sum of squared residuals
EXAMPLE Finding an Equation that Describes Linearly Related Data
Using the following sample data:
x    0      2      3      5      6      6
y    5.8    5.7    5.2    2.8    1.9    2.2
(b) Graph the equation on the scatter diagram.
4.2.1 Find the Least-Squares Regression Line and Use the Line to Make Predictions
The difference between the observed value of y and the predicted value of y is
the error, or residual.
Using the line from the last example, and the predicted value at x = 3:
residual = observed y − predicted y
= 5.2 − 4.75
= 0.45
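The residual arithmetic can be reproduced for every point in the sample. The equation of the line from the last example is missing from the extracted text, so the sketch below assumes it is the line through the sample points (2, 5.7) and (6, 1.9), that is, y = −0.95x + 7.6. That choice is an assumption, though it does give the predicted value 4.75 at x = 3 used above.

```python
# Sample data from the example.
xs = [0, 2, 3, 5, 6, 6]
ys = [5.8, 5.7, 5.2, 2.8, 1.9, 2.2]

# Assumption: the line passes through the sample points (2, 5.7) and (6, 1.9).
x1, y1, x2, y2 = 2, 5.7, 6, 1.9
slope = (y2 - y1) / (x2 - x1)      # -0.95
intercept = y1 - slope * x1        # 7.6

def predict(x):
    """Predicted y on the assumed line."""
    return slope * x + intercept

# Residual = observed y - predicted y, for each data point.
residuals = [y - predict(x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x}  observed={y}  predicted={predict(x):.2f}  residual={e:+.2f}")
```

The residuals at x = 2 and at the point (6, 1.9) are zero because the assumed line passes through those two points. A positive residual means the observed value lies above the line.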
EXAMPLE Finding the Least-squares Regression Line
Using the drilling data
(a) Find the least-squares regression line.
(b) Predict the drilling time if drilling starts at 130 feet.
(c) Is the observed drilling time at 130 feet above or below average?
(d) Draw the least-squares regression line on the scatter diagram of the
data.
Depth at Which Drilling Begins, x (in feet)    Time to Drill 5 Feet, y (in minutes)
35     5.88
50     5.99
75     6.74
95     6.1
120    7.47
130    6.93
145    6.42
155    7.97
160    7.92
175    7.62
185    6.89
190    7.9
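Parts (a) through (c) can be sketched in a few lines. The formulas on the slides are not in the extracted text, so this assumes the standard least-squares estimates: slope b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept b0 = ȳ − b1·x̄. It also computes the sum of squared residuals. Variable names are illustrative.

```python
# Drilling data from the example.
depth = [35, 50, 75, 95, 120, 130, 145, 155, 160, 175, 185, 190]
minutes = [5.88, 5.99, 6.74, 6.1, 7.47, 6.93, 6.42, 7.97, 7.92, 7.62, 6.89, 7.9]

n = len(depth)
x_bar = sum(depth) / n
y_bar = sum(minutes) / n

# (a) Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(depth, minutes))
s_xx = sum((x - x_bar) ** 2 for x in depth)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# (b) Predicted drilling time when drilling starts at 130 feet.
predicted_130 = b0 + b1 * 130

# (c) Compare the observed time at 130 feet with the prediction.
observed_130 = minutes[depth.index(130)]
position = "above" if observed_130 > predicted_130 else "below"

# Sum of squared residuals for the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(depth, minutes))

print(f"y-hat = {b1:.4f}x + {b0:.4f}")
print(f"predicted at 130 ft: {predicted_130:.2f} min; observed {observed_130} is {position} average")
print(f"sum of squared residuals: {sse:.4f}")
```

The slope rounds to 0.0116, and the observed time of 6.93 minutes at 130 feet falls below the predicted value, so that observation is below average for its depth.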
4.2.2 Interpret the Slope and the y-Intercept of the Least-Squares Regression Line
Interpretation of Slope:
The slope of the regression line is 0.0116. For each additional foot
of depth at which drilling begins, the time to drill five feet increases
by 0.0116 minutes, on average.
4.3 Diagnostics on the Least-squares Regression Line
Learning Objectives
4.3.1 Compute and Interpret the Coefficient of Determination
The data below are based on a study for drilling rock. The researchers
wanted to determine whether the time it takes to dry drill a distance of
5 feet in rock increases with the depth at which the drilling begins. So,
depth at which drilling begins is the predictor variable, x, and time (in
minutes) to drill five feet is the response variable, y.
Depth at Which Drilling Begins, x (in feet)    Time to Drill 5 Feet, y (in minutes)
35     5.88
50     5.99
75     6.74
95     6.1
120    7.47
130    6.93
145    6.42
155    7.97
160    7.92
DATA SET A DATA SET B DATA SET C
X Y X Y X Y
3.6 8.9 3.1 8.9 2.8 8.9
8.3 15.0 9.4 15.0 8.1 15.0
0.5 4.8 1.2 4.8 3.0 4.8
1.4 6.0 1.0 6.0 8.3 6.0
8.2 14.9 9.0 14.9 8.2 14.9
5.9 11.9 5.0 11.9 1.4 11.9
4.3 9.8 3.4 9.8 1.0 9.8
8.3 15.0 7.4 15.0 7.9 15.0
0.3 4.7 0.1 4.7 5.9 4.7
6.8 13.0 7.5 13.0 5.0 13.0
Draw a scatter diagram for each of these data sets. For each data
set, the variance of y is 17.49.
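The three data sets share the same y values, so they differ only in how much of the variation in y a line in x can account for. The sketch below fits a least-squares line to each set and computes the coefficient of determination, assuming the standard definition R² = 1 − (sum of squared residuals) / (total sum of squares). The function name `r_squared` is illustrative.

```python
from statistics import variance

# The y column is identical in data sets A, B, and C.
y = [8.9, 15.0, 4.8, 6.0, 14.9, 11.9, 9.8, 15.0, 4.7, 13.0]
x_sets = {
    "A": [3.6, 8.3, 0.5, 1.4, 8.2, 5.9, 4.3, 8.3, 0.3, 6.8],
    "B": [3.1, 9.4, 1.2, 1.0, 9.0, 5.0, 3.4, 7.4, 0.1, 7.5],
    "C": [2.8, 8.1, 3.0, 8.3, 8.2, 1.4, 1.0, 7.9, 5.9, 5.0],
}

def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(xs, ys))  # unexplained
    sst = sum((b - y_bar) ** 2 for b in ys)                      # total
    return 1 - sse / sst

var_y = variance(y)  # sample variance of y, shared by all three sets
r2 = {name: r_squared(xs, y) for name, xs in x_sets.items()}

print(f"variance of y: {var_y:.2f}")
for name, value in r2.items():
    print(f"data set {name}: R^2 = {value:.3f}")
```

Data set A has R² very close to 1, data set B is somewhat lower, and data set C is close to 0, even though the variance of y is the same in all three.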
4.4 Contingency Tables and Association
Example: Data Information
A professor at a community college in New Mexico conducted a study to assess
the effectiveness of delivering an introductory statistics course via
traditional lecture-based, online (no classroom instruction), and hybrid
(online course with weekly meetings) methods. The grades students received
in each of the courses were tallied.
4.4.1 Compute the Marginal Distribution of a Variable
EXAMPLE Determining Frequency Marginal Distributions
4.4.2 Use the Conditional Distribution to Identify Association among Categorical Data
4.4.3 Explain Simpson’s Paradox
EXAMPLE Illustrating Simpson’s Paradox
Insulin dependent (or Type 1) diabetes is a disease that results in
the permanent destruction of insulin-producing beta cells of the
pancreas. Type 1 diabetes is lethal unless treatment with insulin
injections replaces the missing hormone. Individuals with insulin
independent (or Type 2) diabetes can produce insulin internally.
The data shown in the table below represent the survival status of
902 patients with diabetes by type over a 5-year period.
However, Type 2 diabetes is usually contracted after the age of
40. If we account for the variable age and divide our patients into
two groups (those 40 or younger and those over 40), we obtain
the data in the table below.
Simpson’s Paradox describes a situation in which an
association between two variables inverts or goes away
when a third variable is introduced to the analysis.
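The reversal can be seen in a small numeric sketch. The counts below are hypothetical and chosen only to make the effect visible. They are not the diabetes data from the example, whose tables did not survive extraction. The labels "A", "B", "group 1", and "group 2" are likewise made up.

```python
# Hypothetical (successes, total) counts for two treatments in two subgroups.
# These numbers are invented for illustration only.
counts = {
    "group 1": {"A": (90, 100), "B": (800, 1000)},
    "group 2": {"A": (300, 1000), "B": (20, 100)},
}

def rate(successes, total):
    """Relative frequency of success."""
    return successes / total

# Conditional success rates within each subgroup (third variable included).
within = {
    group: {t: rate(*pair) for t, pair in by_treatment.items()}
    for group, by_treatment in counts.items()
}

# Success rates after combining the subgroups (third variable ignored).
overall = {}
for t in ("A", "B"):
    successes = sum(counts[g][t][0] for g in counts)
    total = sum(counts[g][t][1] for g in counts)
    overall[t] = rate(successes, total)

for group, rates in within.items():
    print(f"{group}: A = {rates['A']:.2f}, B = {rates['B']:.2f}")
print(f"combined: A = {overall['A']:.3f}, B = {overall['B']:.3f}")
```

In each hypothetical subgroup A has the higher success rate, yet B has the higher rate once the subgroups are combined, because most of A's cases fall in the subgroup where success is rare.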