Unit 4 Correlation and Linear Regression
IB Maths AA
Correlation and Linear Regression
Unit 4 - Statistics
TOK Connections
Bivariate Data (2 variables)
However! Data in the Real World is never perfect.
[Figure: scatter plot of Test Score (y-axis, 0 to 70) against Study time (x-axis, 0 to 120)]
Scatter Plot
➢ A scatter diagram is a way of graphing bivariate data
○ One variable will be on the x-axis and the other will be on the y-axis
Scatter Plot
With Bivariate Data we create a Scatter Plot.
- Plot one set of data on the x-axis
- Plot the other set of data on the y-axis
Scatter Plot
What can we say about this Correlation?
- Is it (approximately) Linear?
- How “Strong” is it? (does it follow an exact curve, or only loosely)
- What direction is the relation going in?
Lines of Best Fit
A line of best fit is a straight line drawn through the plotted data that represents
the general trend of the data.
Lines of Best Fit
Process of drawing a Line of Best Fit:
1) Find the average (mean) of each data set
2) Plot the coordinate (x̄, ȳ) on your scatter plot (mark it differently from the other
points)
3) Draw a Line of Best Fit that goes through this “Average Point” and
follows the general trend of the data
Lines of Best Fit
Example: A company records the amount of money it spends on advertising and the
number of products it sells in store. It wants to see if there is a relationship between
these sets of data. Its records are below:
Average x (Advertising) = (45+55+47+75+90+100+100+95+88+50+45+58)/12 ≈ 70.67
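The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names `advertising` and `mean_advertising` are illustrative choices, and the values are the twelve advertising figures from the calculation above.

```python
# Advertising values taken from the worked calculation above.
advertising = [45, 55, 47, 75, 90, 100, 100, 95, 88, 50, 45, 58]

# The mean is the sum of the values divided by the number of values.
mean_advertising = sum(advertising) / len(advertising)

print(round(mean_advertising, 2))  # prints 70.67
```

The mean of the second data set (products sold) would be found the same way, giving the y-coordinate of the “Average Point”.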
Lines of Best Fit
Draw Scatter Plot and Average Point
Draw Line of Best Fit through Average Point
Describing Correlations
Descriptors of Correlation
Direction: Positive, Negative, No Correlation
Strength: Strong, Moderate, Weak
Pearson’s Product-Moment Correlation Coefficient (r)
The PMCC, also called “Pearson’s Coefficient” or simply denoted by r, is a value between -1 and 1
that quantitatively tells us how strong our linear correlation is.
The closer r is to either 1 or -1, the stronger the correlation between the data. A value
close to 0 indicates little or no linear correlation.
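To see where the value of r comes from, here is a minimal Python sketch of the standard product-moment formula (the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the data sets `hours`, `rising` and `falling` are hypothetical, made up only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: built from the sums of squared deviations of each variable.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data, invented purely for illustration.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # exactly linear and increasing
falling = [10, 8, 6, 4, 2]   # exactly linear and decreasing

print(pearson_r(hours, rising))   # perfect positive correlation, r = 1
print(pearson_r(hours, falling))  # perfect negative correlation, r = -1
```

Real data will rarely sit exactly on a line, so r will usually fall strictly between -1 and 1.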
Pearson’s Coefficient (r)
These are “approximate” boundaries for r and for what counts as “strong”, “moderate” and
“weak”. There are no exact ranges, but this gives you an idea.
Calculating Pearson's Coefficient
You must use the Graphing Calculator to calculate r.
http://www.youtube.com/watch?v=DBWAmboDVtg
Linear Regression (Line of Best Fit on a Calculator)
We can also perform a “Linear Regression” on our data. This means finding the
equation of the Line of Best Fit, the line that fits the data most closely. Pearson’s
Coefficient then tells us how well a straight line describes the data.
Regression
The equation of linear regression
is found by minimising the sum of
the squares of the vertical distances
(residuals) between the points of the
data set and the line.
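What the calculator does can be sketched in Python using the usual least-squares formulas, which minimise the sum of the squared residuals. This is an illustrative sketch, not the calculator's actual routine: the function names `least_squares_line` and `sum_squared_residuals` and the data `xs`, `ys` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    gradient = s_xy / s_xx
    # Choosing the intercept this way forces the line through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

def sum_squared_residuals(xs, ys, gradient, intercept):
    """Return the total of the squared vertical distances from the points to the line."""
    return sum((y - (gradient * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
print(a, b)  # gradient and y-intercept of the regression line

# Any other line, such as one with a slightly steeper gradient, gives a larger total.
print(sum_squared_residuals(xs, ys, a, b))
print(sum_squared_residuals(xs, ys, a + 0.1, b))
```

The regression line found this way always passes through the “Average Point” (x̄, ȳ), which is why that point is used when drawing a line of best fit by hand.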
Practice
A teacher is interested in the relationship between the number of hours her students spend on a
phone per day and the number of hours they spend on a computer. She takes a sample of nine
students and records the results in the table below.
a) Draw a scatter diagram for the data, where x is hours spent on the phone and y is hours spent
on the computer.
b) The relationship can be modelled by the regression equation . Find the value of r, the
correlation coefficient.
c) Comment on the relationship.
d) If a student spent 5 hours on their phone, how long would we expect them to spend on the computer?
Practice
Revision Village - https://www.revisionvillage.com/ib-math/analysis-and-approaches-hl/questionbank/statistics-and-probability/bivariate-statistics/
Questions: 1, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14
Summary
The gradient of a straight line tells us how much y increases for each unit increase in x.
So the gradient of the line of regression tells us how much the dependent variable
increases for each unit increase in the independent variable.
The y-intercept of the line of regression gives the value of the dependent variable when
the independent variable is 0.
This value is often an extrapolation and so should be used cautiously.
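As an illustration of these points, the sketch below uses a regression line to make a prediction and flags when the x-value lies outside the range of the original data. Everything here is hypothetical: the function name `predict`, the line y = 0.6x + 2.2, and the assumed data range of 1 to 5.

```python
def predict(x, gradient, intercept, x_min, x_max):
    """Return the predicted y and whether x lies outside the observed data range."""
    y = gradient * x + intercept
    is_extrapolation = x < x_min or x > x_max
    return y, is_extrapolation

# Hypothetical regression line y = 0.6x + 2.2, assumed fitted to x-values from 1 to 5.
gradient, intercept = 0.6, 2.2

# x = 3 is inside the data range, so this is an interpolation.
inside = predict(3, gradient, intercept, 1, 5)

# x = 0 gives the y-intercept, but 0 is outside the data range: an extrapolation.
at_zero = predict(0, gradient, intercept, 1, 5)

print(inside)
print(at_zero)
```

Here each extra unit of x adds 0.6 to the predicted y (the gradient), while the prediction at x = 0 is the y-intercept and is flagged as an extrapolation.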