Course 10-Part 1

8STT117

Session 10

Simple Linear Regression
Part 1
Plan

Part 1:
• Context
• Simple Linear Regression Model
• Estimation of the regression line
Part 2:
• Confidence interval on the parameters of the regression line
• Hypothesis tests on the parameters of the regression line
• Pointwise prediction, interval prediction
• Coefficient of determination and correlation

Context
In statistics, many problems consist of describing the relationship
between two statistical variables, for example:

• the number of years of experience and the number of errors committed;
• a driver's age and the number of car accidents;
• the weight and height of students;
• sales volume and advertising expenses;
• the number of hours of study and the number of correct answers;
• …

Context
In this kind of problem, the main questions we want to answer are:

1. Is there a relationship or dependence between the statistical variables?
2. If such a relationship exists, is it linear?
3. If a dependence exists, how can it be expressed by a mathematical equation?
4. If the mathematical equation relating the variables exists, how can the
values of one variable be predicted from the known values of the other
variable (or variables)?
5. If the relationship exists, is it strong or weak?

Context

To answer questions 1 to 4, we use a statistical tool called:

Regression Analysis

To answer question 5, we use a statistical tool called:

Correlation Analysis

Regression Analysis

Regression Analysis

Definition
Regression analysis is a statistical method for studying the type of
relationship that may exist between a variable whose values we want to
explain and one or more other variables used for that explanation.

Regression analysis studies the relationship between a variable we wish
to «predict» and other variables that are more easily observed.

Regression Analysis
Example
The rent (Y) of an apartment depends on its surface area, its distance
from the city centre, its number of rooms, its level of soundproofing…

$Y = f(X_1, X_2, X_3, \ldots, X_n)$

where Y is the rent of the apartment, X1 its surface area, X2 its distance
from the city centre, X3 its number of rooms, and so on.

Regression Analysis

• Regression analysis can be used to develop an equation showing how the
variables are related.
• The variable being predicted is called the dependent variable and is
denoted by y.
• The variables used to predict the value of the dependent variable are
called the independent (or explanatory) variables and are denoted by x.

Simple Linear Regression

• Simple linear regression involves one independent variable (x) and one
dependent variable (y).

• The relationship between the two variables is approximated by a
straight line.

• Regression analysis involving two or more independent variables
(x1, x2, …, xn) is called multiple regression.

• This course deals with simple linear regression.

Simple Linear Regression

1. Is there a relationship or dependence between the statistical variables?

Scatter plot

• Given a sample of n pairs of observations (x1, y1), …, (xn, yn), we can
represent each pair as a point in the two-dimensional space (X, Y). Such a
graph is called a scatter plot.

Simple Linear Regression
Example

• Study results: for 13 students, we record the number of hours devoted to
preparing for a final exam in statistics and the number of correct answers
obtained:

Simple Linear Regression

[Figure: scatter plot of the number of correct answers (y) against the number of hours of study (x), annotated "Linear relation". Is there a relationship or dependence between the statistical variables?]

Scatter plot
A scatter plot gives us an idea of the relationship that may exist between
X and Y, and of the quality of our estimate of this relationship (if it
exists…).

[Figure panels:]
• We can assume a strong linear relationship. Good estimate of this
relationship.
• We can assume a strong linear relationship. Poor estimate of this
relationship.

Scatter plot

[Figure panels:]
• We can assume a strong relationship, but it is nonlinear.
• Approximately linear relationship.
• If there is a relationship, it is not clear what it is; there is a great
deal of noise. No estimate can be good.
• If the slope of the estimated line is zero, we conclude that X has no
(linear) effect on Y, i.e. there is no linear relationship.

Simple Linear Regression
Goal:

Once the graphical representation is made, it is easy to suspect whether a
certain relationship exists between the two variables studied. The next
step is to express this relationship by a mathematical equation:

$Y = f(X)$

Simple Linear Regression
The equation that describes how Y is related to X and an error term is
called the regression model:

$Y = f(X) = \beta_0 + \beta_1 X + \varepsilon$

where:
Y = dependent variable
X = independent variable
$\beta_0$ and $\beta_1$ = parameters of the model
$\varepsilon$ = a random variable called the error term
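To make the roles of $\beta_0$, $\beta_1$ and $\varepsilon$ concrete, here is a minimal Python sketch that generates observations from the population model. It assumes, for illustration only, normally distributed errors with mean 0 and standard deviation `sigma` (this part of the course does not specify a distribution for $\varepsilon$). The function name `simulate_sample` and the parameter values are hypothetical.

```python
import random

def simulate_sample(beta0, beta1, sigma, xs, seed=None):
    """Draw one y per x from Y = beta0 + beta1*X + eps.

    Assumption (illustration only): eps ~ Normal(0, sigma^2).
    """
    rng = random.Random(seed)  # local generator, so a fixed seed is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]

# With sigma = 0 there is no error term: the points lie exactly on the line.
print(simulate_sample(beta0=2.0, beta1=0.5, sigma=0.0, xs=xs))  # [2.5, 3.0, 3.5, 4.0, 4.5]

# With sigma > 0 the points scatter around the (unknown in practice) line.
noisy = simulate_sample(beta0=2.0, beta1=0.5, sigma=1.0, xs=xs, seed=42)
```

In practice only the noisy pairs are observed; $\beta_0$ and $\beta_1$ stay hidden, which is why they have to be estimated.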

Estimation of the Regression Line
This population model, $Y = \beta_0 + \beta_1 X + \varepsilon$, is unknown
because the parameters $\beta_0$ and $\beta_1$, as well as the error term
$\varepsilon$, are unknown.

We approximate this population model by another linear model, called the
estimated model:

$\hat{Y} = b_0 + b_1 X$

where $b_1$ and $b_0$ are our estimates of $\beta_1$ and $\beta_0$
respectively:

• $b_1$ is the slope of the line,
• $b_0$ is the y-intercept.

Estimation of the Regression Line

Least Squares Method

Goal: this method constructs the regression line for which the sum of the
squared vertical distances between the line and each observed point (yi)
is as small as possible.
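To make "sum of squared vertical distances" concrete, the following Python sketch computes it for a few candidate lines on three made-up points (1, 1), (2, 3), (3, 2), chosen only for illustration. The function name `sse` (sum of squared errors) is our own. For these points the least squares line is $\hat{y} = 1 + 0.5x$, and the other candidates give a larger sum.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared vertical distances between the points and the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points, for illustration only.
xs = [1, 2, 3]
ys = [1, 3, 2]

print(sse(1.0, 0.5, xs, ys))  # 1.5  (least squares line y = 1 + 0.5x)
print(sse(0.0, 1.0, xs, ys))  # 2.0  (line y = x)
print(sse(2.0, 0.0, xs, ys))  # 2.0  (horizontal line y = 2)
```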

Estimation of the Regression Line
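Least Squares Method

[Figure: observed points and a fitted line; the vertical distance between each observed point and the line is the residual ei.]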

Least Squares Principle
The objective of the least squares method is to determine the regression
line that minimizes

$\min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where:
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
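A short derivation sketch of where the least squares formulas come from (standard calculus, not shown on the original slides): write the sum of squared residuals as a function of $b_0$ and $b_1$ and set both partial derivatives to zero. This assumes the $x_i$ are not all equal, so the denominator of $b_1$ is nonzero.

```latex
S(b_0, b_1) = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2

\frac{\partial S}{\partial b_0} = -2 \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial S}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \left(y_i - b_0 - b_1 x_i\right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2

\text{Substituting } b_0 = \bar{y} - b_1 \bar{x} \text{ and } \sum_{i=1}^{n} x_i = n\bar{x}:
\quad b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}
```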

Least Squares Principle

How do we calculate the coefficients b0 and b1?

The point estimates of the regression line parameters obtained by the least
squares method are:

$b_0 = \bar{y} - b_1 \bar{x}$

$b_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}$

where n is the sample size.

Estimation of the Regression Line (b1, b0)

• Studying-result example (continued):

 i    xi   yi   xi·yi   xi²
 1     4    5     20     16
 2     8    8     64     64
 3     5    7     35     25
 4     9    9     81     81
 5    12   10    120    144
 6     7    7     49     49
 7     3    4     12      9
 8     4    4     16     16
 9    10    8     80    100
10     3    1      3      9
11    10    9     90    100
12     7    6     42     49
13     9    8     72     81
 Σ    91   86    684    743

n = 13, $\bar{x} = 91/13 = 7$, $\bar{y} = 86/13 = 6.6154$

$b_1 = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \dfrac{684 - 13 \times 7 \times 6.6154}{743 - 13 \times 7^2} = 0.7736$

$b_0 = \bar{y} - b_1\bar{x} = 6.6154 - 0.7736 \times 7 = 1.2$

Estimated regression line: $\hat{y} = 1.2 + 0.7736\,x$
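The arithmetic of this example can be checked in Python. The sketch below uses exact fractions (the standard-library `fractions` module), so there is no rounding until the very end. It gives $b_1 = 82/106 = 41/53$, and the intercept rounds to 1.2 as on the slide.

```python
from fractions import Fraction

# Hours of study (x) and number of correct answers (y) for the 13 students.
xs = [4, 8, 5, 9, 12, 7, 3, 4, 10, 3, 10, 7, 9]
ys = [5, 8, 7, 9, 10, 7, 4, 4, 8, 1, 9, 6, 8]

n = len(xs)
x_bar = Fraction(sum(xs), n)                 # 91/13 = 7
y_bar = Fraction(sum(ys), n)                 # 86/13
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 684
sum_x2 = sum(x * x for x in xs)              # 743

b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # (684 - 602) / (743 - 637)
b0 = y_bar - b1 * x_bar

print(b1, round(float(b1), 4))  # 41/53 0.7736
print(round(float(b0), 4))      # 1.2003
```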

Estimation of the Regression Line
Example 2:
A company wants to study the relationship between its weekly advertising
spending and its sales volume. The following data were collected over the
last 10 weeks:

Advertising cost (X) (M$):   4     2    2.5   2    3    5    1    5.5   3.5   4.5
Sales volume (Y) (M$):      49.5  41   43    39   46   53   38   54    48.5  51.5

From the above data, determine the point estimates of the regression line
parameters.

Estimation of the Regression Line

[Figure: scatter plot of sales volume (Y, M$) against advertising cost (X, M$).]

Estimation of the Regression Line
Answer

Advertising cost   Sales volume
xi (M$)            yi (M$)         xi·yi     xi²
4                  49.5            198       16
2                  41              82        4
2.5                43              107.5     6.25
2                  39              78        4
3                  46              138       9
5                  53              265       25
1                  38              38        1
5.5                54              297       30.25
3.5                48.5            169.75    12.25
4.5                51.5            231.75    20.25
Sum                                1605      128

Mean: $\bar{x} = 3.3$, $\bar{y} = 46.35$

Estimation of the Regression Line
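Answer

$b_1 = \dfrac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \dfrac{1605 - 10 \times 3.3 \times 46.35}{128 - 10 \times (3.3)^2} = \dfrac{75.45}{19.1} \approx 3.95$

$b_0 = \bar{y} - b_1\bar{x} = 46.35 - 3.95 \times 3.3 \approx 33.31$

$\hat{Y} = 33.31 + 3.95X$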
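A quick Python check of Example 2's arithmetic, using the values from the answer table. Note that the first week's sales volume is 49.5 M$, the value consistent with $\bar{y} = 46.35$ and $\sum x_i y_i = 1605$.

```python
# Advertising cost X and sales volume Y (both in M$) for the 10 weeks of Example 2.
xs = [4, 2, 2.5, 2, 3, 5, 1, 5.5, 3.5, 4.5]
ys = [49.5, 41, 43, 39, 46, 53, 38, 54, 48.5, 51.5]

n = len(xs)
x_bar = sum(xs) / n                          # 3.3
y_bar = sum(ys) / n                          # 46.35
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 1605
sum_x2 = sum(x * x for x in xs)              # 128

b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # 75.45 / 19.1
b0 = y_bar - b1 * x_bar

print(round(b1, 2), round(b0, 2))  # 3.95 33.31
```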

Estimation of the Regression Line

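Interpretation

$\hat{Y} = 33.31 + 3.95X$

• $\hat{Y}$: estimate of the sales for a given value of advertising costs.
• 33.31: the y-intercept.
• 3.95: the slope, i.e. the increase in sales volume (Y) for a one-unit
increase in advertising costs (X): each additional 1 M$ of advertising is
associated with about 3.95 M$ of additional estimated sales.
• X: advertising cost.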
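The slope interpretation can be seen numerically: the difference between two estimates one unit of X apart is exactly $b_1$. A minimal sketch, using the rounded coefficients from the slide; the function name `predict_sales` is hypothetical. Only values of X inside the observed range (1 to 5.5 M$) are used, since estimates outside it would be extrapolation.

```python
def predict_sales(advertising_cost, b0=33.31, b1=3.95):
    """Estimated sales volume (M$) for a given advertising cost (M$): Y_hat = b0 + b1*X."""
    return b0 + b1 * advertising_cost

print(round(predict_sales(3.0), 2))                       # 45.16
# Raising advertising from 3 to 4 M$ raises the estimate by b1.
print(round(predict_sales(4.0) - predict_sales(3.0), 2))  # 3.95
```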

Estimation of the Regression Line
Properties:

• The regression line always passes through the point $(\bar{x}, \bar{y})$:
$\bar{y} = b_0 + b_1\bar{x}$.
• The sum of the residuals is always zero: $\sum_{i=1}^{n} e_i = 0$.
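Both properties can be checked numerically on the studying-result data. The sketch below fits the line with the centred form $b_1 = \sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2$, which is algebraically equal to the formula on the slides. The helper name `fit` is our own. Comparisons use a tolerance because floating-point sums are not exact.

```python
import math

def fit(xs, ys):
    """Least squares fit via the centred form; returns (b0, b1, x_bar, y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1, x_bar, y_bar

# Studying-result data (hours of study, correct answers).
xs = [4, 8, 5, 9, 12, 7, 3, 4, 10, 3, 10, 7, 9]
ys = [5, 8, 7, 9, 10, 7, 4, 4, 8, 1, 9, 6, 8]

b0, b1, x_bar, y_bar = fit(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Property 1: the line passes through (x_bar, y_bar).
print(math.isclose(b0 + b1 * x_bar, y_bar))             # True
# Property 2: the residuals sum to zero (up to rounding).
print(math.isclose(sum(residuals), 0.0, abs_tol=1e-9))  # True
```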

Estimation of the Regression Line

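[Figure: the estimated regression line for the advertising example passes through the point $(\bar{x}, \bar{y}) = (3.3, 46.35)$.]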

See the next presentation (Part 2):
• Confidence interval on the parameters of the regression line
• Hypothesis tests on the parameters of the regression line
• Pointwise prediction, interval prediction
• Coefficient of determination and correlation

Thank you
