Chapter 2 Simple Linear Regression

This chapter discusses simple linear regression including an overview of linear regression, the least squares method, assumptions of simple linear regression, and computation of simple linear regression. Simple linear regression investigates the dependence of a dependent variable on an independent variable using a straight line. The chapter provides examples and applications of simple linear regression.


FEM 2063 - Data Analytics

CHAPTER 2
SIMPLE LINEAR REGRESSION

At the end of this chapter students should be able to understand simple linear regression.
OVERVIEW
➢2.1 Background
➢2.2 Introduction
➢2.3 Regression
➢2.4 Least Squares Method
➢2.5 Simple Linear Regression (SLR)
➢2.6 ANOVA
➢2.7 Model Evaluation
➢2.8–2.9 Applications/Examples
2.1 BACKGROUND - WHAT IS LINEAR REGRESSION (LR)?
Linear regression is a linear model: a model that assumes a linear relationship between the input variable (x) and the single output variable (y).
LR describes a relation between variables in which changes in some variables may "explain" the changes in other variables.
An LR model estimates the nature of the relationship between independent and dependent variables.

Examples:
• Does a change in class size affect students' marks?
• Does cholesterol level depend on age, sex, or amount of exercise?
2.1 What is Linear Regression (LR)?
Investigating the dependence of one variable (the dependent variable) on one or more other variables (the independent variables) using a straight line.

[Scatter plots: strong relationships show points lying close to a straight line; weak relationships show points widely scattered about the line.]
2.1 What is the LR model used for?
Linear regression models are used to show or predict the relationship between two variables or factors. The factor that is being predicted is called the dependent variable.
Example of Linear Regression
Regression analysis is used in statistics to find trends in data. For example, you might guess that there is a connection between how much you eat and how much you weigh; regression analysis can help you quantify that.
2.1 LR model - how does it work?
Linear Regression is the process of finding a line that best
fits the data points available on the plot, so that we can use
it to predict output values for inputs that are not present in
the data set we have, with the belief that those outputs
would fall on the line.

2.1 REGRESSION APPLICATIONS
THREE MAJOR APPLICATIONS
• description
• control
• prediction
Linear regression has many practical uses. Most applications fall into one of two broad categories: if the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables; if the goal is explanation, it can be used to quantify the strength of the relationship between the response and the explanatory variables.
2.1 WHAT IS REGRESSION USED FOR?
1. Predictive Analytics: forecasting future opportunities and risks is the most prominent application of regression analysis in business.
2. Operational Efficiency: regression models can also be used to optimize business processes.
3. Supporting Decisions: businesses today are overloaded with data on finances, operations and customer purchases. Executives now lean on data analytics to make informed business decisions that have statistical significance, reducing reliance on intuition and gut feel.
4. Correcting Errors: regression is not only great for lending empirical support to management decisions but also for identifying errors in judgment.
5. New Insights: over time businesses have gathered a large volume of unorganized data that has the potential to yield valuable insights.
2.1 EXAMPLE LR IN FORECASTING

© 2019 Petroliam Nasional Berhad (PETRONAS) | 9


2.1 TYPES OF REGRESSION

Regression models are classified by the number of explanatory variables:
• 1 variable: Simple regression – Linear or Non-Linear
• 2+ variables: Multiple regression – Linear or Non-Linear
2.1 TYPES OF REGRESSION

Simple regression: one predictor, e.g. EDUCATION → Y (Income)
Multiple regression: several predictors, e.g. EDUCATION, SEX, EXPERIENCE, AGE → Y (Income)
2.1 CORRELATION
 Correlation is a statistical technique that can show whether and how strongly pairs of variables are related.
 The range of possible values is from -1 to +1.
 The correlation is high if observations lie close to a straight line (i.e., values close to +1 or -1) and low if observations are widely scattered (correlation value close to 0).

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]
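The correlation formula above can be computed directly from its definition. A minimal sketch in Python, using the car-plant production and electricity-usage data from Example 1 later in this chapter:

```python
# Pearson correlation coefficient r, computed from its definition.
# Data: plant production (x, $million) and electricity usage (y) from Example 1.
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator: sum of cross-deviations; denominator: square roots of squared deviations
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))  # 0.896 -> close to +1, a strong positive linear relationship
```

This matches the "Multiple R" value (0.8956) reported in the Excel output for Example 1.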
2.1 CORRELATION VS LINEAR REGRESSION
2.2 INTRODUCTION – SIMPLE LINEAR REGRESSION
• Quantitative analysis uses current information to predict future behavior.
• Current information is usually in the form of a set of data.
• When the data form a set of pairs of numbers, we may interpret them as representing the observed values of an independent (predictor) variable x and a dependent (response) variable y.
2.2 INTRODUCTION – SIMPLE
LINEAR REGRESSION
The goal is to find a functional relation between the response
variable y and the predictor variable x.
𝑦 = 𝑓(𝑥)
SELECTION of independent variable(s)
- choose the most important predictor variable(s).

SCOPE of model
- we may need to restrict the coverage of the model to some interval or region of values of the independent variable(s), depending on the needs/requirements.
2.3 REGRESSION - POPULATION & SAMPLE
2.3 REGRESSION - REGRESSION MODEL
General regression model:

𝑌 = 𝛽₀ + 𝛽₁𝑋 + 𝜀

where β₀ and β₁ are unknown parameters, X is a known constant, and the deviations ε are independent N(0, σ²).
The values of the regression parameters β₀ and β₁ are not known; we estimate them from data.
β₁ indicates the change in the mean response per unit increase in X.
2.3 REGRESSION - REGRESSION LINE
• If the scatter plot of our sample data suggests a linear relationship between the two variables, i.e.

ŷ = β̂₀ + β̂₁x

the relationship can be summarized by a straight-line plot.
• The least squares method gives us the "best" estimated line for our set of sample data.
• The least squares method is a statistical procedure to find the best fit for a set of data points by minimizing the sum of the squared offsets (residuals) of the points from the plotted curve. Least squares regression is used to predict the behavior of dependent variables.
2.4 Least Squares Method
• 'Best fit' means the differences between actual y values and predicted y values are a minimum.
• But positive differences offset negative ones, so square the errors!

The LS method minimizes the Sum of the Squared Errors (SSE).
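The minimizing property can be demonstrated numerically: the least-squares coefficients give a smaller SSE than any perturbed pair of coefficients. A minimal sketch with a small hypothetical data set (the x and y values below are illustrative only, not from this chapter):

```python
# Sketch: the least-squares line minimizes the sum of squared errors (SSE).
# Hypothetical data, for demonstration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept from the deviation formulas
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

def sse(b0_, b1_):
    """Sum of squared differences between actual and predicted y."""
    return sum((yi - (b0_ + b1_ * xi)) ** 2 for xi, yi in zip(x, y))

# Any perturbation of the fitted coefficients increases (or equals) the SSE.
best = sse(b0, b1)
assert all(sse(b0 + d0, b1 + d1) >= best
           for d0 in (-0.5, 0.0, 0.5) for d1 in (-0.2, 0.0, 0.2))
```

Because the SSE is a convex quadratic in (β̂₀, β̂₁), the least-squares solution is its unique global minimum, so every perturbation in the grid check above must do at least as badly.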
2.4 ASSUMPTIONS IN SLR
◼ Independent observations: observations are independent of each other.
◼ Linear relationship: the relationship between X and the mean of Y is linear.
◼ Normal distribution of error terms: the residuals 𝜀ᵢ are normally distributed.
◼ No auto-correlation: the residuals are independent of each other; no serial correlation in the values of the residuals 𝜀ᵢ.

(For multiple regression, two further assumptions apply: Multivariate Normality – the residuals are normally distributed; and No Multicollinearity – the independent variables are not highly correlated with each other.)
2.4 ASSUMPTIONS IN SLR

Serial correlation is the relationship between a given variable and a lagged version of itself over various time intervals. It measures the relationship between a variable's current value and its past values. A variable that is serially correlated indicates that it may not be random.
2.5 SLR - COMPUTATION
• Write an estimated regression line based on sample data as

ŷ = β̂₀ + β̂₁x

• The method of least squares chooses the values of β̂₀ and β̂₁ to minimize the sum of squared errors (SSE).
Example 1:
The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant production. The data is given below.

Production ($million) (x): 4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.70 5.61 4.90 4.20
Electricity Usage (y):     2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53

Estimate the linear regression equation y = β₀ + β₁x.
Example – manual calculations

x:  4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.70 5.61 4.90 4.20   Σx = 58.62
y:  2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53   Σy = 34.15
xy: 11.18 8.09 10.65 14.02 16.86 15.22 16.82 20.17 14.24 18.29 13.08 10.63   Σxy = 169.25
x²: 20.34 12.82 18.58 25.60 31.81 24.90 27.98 33.99 22.09 31.47 24.01 17.64   Σx² = 291.23

Sxx = Σxᵢ² − (Σxᵢ)²/n = 291.23 − (58.62)²/12 = 4.8723
Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n = 169.25 − (58.62)(34.15)/12 = 2.43045

β̂₁ = Sxy / Sxx = 0.4988
β̂₀ = ȳ − β̂₁x̄ = (34.15/12) − (0.4988)(58.62/12) = 0.4091

Estimated Regression Line: ŷ = 0.4091 + 0.4988x
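The manual calculation above can be reproduced in a few lines of Python; a minimal sketch using the same Sxx/Sxy formulas:

```python
# Reproducing the manual least-squares computation for Example 1.
x = [4.51, 3.58, 4.31, 5.06, 5.64, 4.99, 5.29, 5.83, 4.7, 5.61, 4.9, 4.2]
y = [2.48, 2.26, 2.47, 2.77, 2.99, 3.05, 3.18, 3.46, 3.03, 3.26, 2.67, 2.53]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

s_xx = sum_x2 - sum_x ** 2 / n      # ~4.8723
s_xy = sum_xy - sum_x * sum_y / n   # ~2.43045
b1 = s_xy / s_xx                    # ~0.4988
b0 = sum_y / n - b1 * (sum_x / n)   # ~0.409
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```

The unrounded intercept is 0.40905 (the slide's 0.4091 comes from plugging in the rounded slope 0.4988); both agree with the Excel output (intercept 0.409048, slope 0.498830).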
2.5 SLR - ESTIMATION OF MEAN RESPONSE
• The fitted regression line can be used to estimate the mean value of y for a given value of x.
• Example: the weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

  y     x
 1250   41
 1380   54
 1425   63
 1425   54
 1450   48
 1300   46
 1400   62
 1510   61
 1575   64
 1650   71
2.5 SLR – ESTIMATION OF MEAN RESPONSE
• From the previous table:
n = 10, Σx = 564, Σx² = 32604, Σy = 14365, Σxy = 818755

• The least squares estimates of the regression coefficients are:

β̂₁ = (nΣxy − Σx Σy) / (nΣx² − (Σx)²) = (10(818755) − (564)(14365)) / (10(32604) − (564)²) = 10.8

β̂₀ = ȳ − β̂₁x̄ = 1436.5 − 10.8(56.4) = 828
2.5 SLR – ESTIMATION OF MEAN RESPONSE
• The estimated regression function is:

ŷ = 828 + 10.8x
Sales = 828 + 10.8 × Expenditure

This means that if the weekly advertising expenditure is increased by $1 we would expect the weekly sales to increase by $10.80.
For $50 of expenditure, the estimated sales are:

Sales = 828 + 10.8(50) = 1368

This is called the point estimate (forecast) of the mean response (sales).
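The coefficient estimates and the point forecast above can be checked directly from the summary statistics; a minimal sketch:

```python
# Verifying the advertising-expenditure regression from its summary statistics.
n = 10
sum_x, sum_x2 = 564, 32604
sum_y, sum_xy = 14365, 818755

# Slope and intercept from the summary-statistic formulas
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * (sum_x / n)
print(round(b1, 2), round(b0, 1))   # ~10.79 and ~828.1, matching the rounded 10.8 and 828

# Point estimate of mean sales at x = 50, using the slide's rounded coefficients
forecast = 828 + 10.8 * 50
print(forecast)  # 1368.0
```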
2.6 ANOVA
• ANOVA (analysis of variance) is the term for statistical analyses of the different sources of variation.
• It partitions the sums of squares and degrees of freedom associated with the response variable.
2.6 ANOVA – SST, SSE & SSR
The measure of total variation, denoted by SST, is the sum of the squared deviations:

SST = Σ(yᵢ − ȳ)²

If SST = 0, all observations are the same (no variability). The greater SST, the greater the variation among the y values.
In a regression model, the measure of variation is that of the y observations' variability around the fitted line:

yᵢ − ŷᵢ
2.6 ANOVA – SST, SSE & SSR
▪ Sum of Squares Total (SST):
- measures how much variance is in the dependent variable
- made up of the SSE and SSR:

SST = Σᵢ(yᵢ − ȳ)² = Σᵢ(yᵢ − ŷᵢ)² + Σᵢ(ŷᵢ − ȳ)²

SST = SSE + SSR

degrees of freedom: n − 1 = (n − 2) + 1
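The identity SST = SSE + SSR can be verified numerically on the advertising data from Section 2.5; a minimal sketch:

```python
# Numerical check of the decomposition SST = SSE + SSR for a least-squares fit,
# using the weekly advertising expenditure (x) and sales (y) data from Section 2.5.
x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit the least-squares line
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained variation

# The decomposition holds exactly (up to floating-point rounding)
assert abs(sst - (sse + ssr)) < 1e-8 * sst
```

The decomposition is exact only for the least-squares fit: the cross term Σ(yᵢ − ŷᵢ)(ŷᵢ − ȳ) vanishes because the residuals are orthogonal to the fitted values.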
2.6 ANOVA - MEAN SQUARES (MS)

• A sum of squares divided by its degrees of freedom is called a mean square (MS).

• Mean square regression (MSR): MSR = SSR / 1

• Mean square error (MSE): MSE = SSE / (n − 2)
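A minimal sketch of the mean-square computations, using the SSR and SSE values from the Example 1 Excel ANOVA table shown later in this chapter (n = 12):

```python
# Mean squares and F statistic from the Example 1 ANOVA sums of squares.
ssr, sse, n = 1.212381668, 0.299109998, 12

msr = ssr / 1          # mean square regression (1 degree of freedom)
mse = sse / (n - 2)    # mean square error (n - 2 degrees of freedom)
f = msr / mse          # F statistic used in Section 2.7

print(round(mse, 5), round(f, 2))  # 0.02991 40.53, matching the Excel output
```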
2.7 Model Evaluation
SLR model evaluation uses the software output:

(i) standard error of estimate (s)
(ii) coefficient of determination (R²)
(iii) hypothesis tests
    a) the t-test of the slope
    b) the F-test of the slope
(iv) p-value
2.7 Model Evaluation
(i) Standard error of estimate (s)
➢ Compute the error variance estimate by σ̂² = SSE / (n − 2); the standard error of estimate is s = σ̂, where

SSE = Σᵢ(yᵢ − ŷᵢ)²

➢ The smaller the SSE, the more successful the Linear Regression Model is in explaining y.
2.7 Model Evaluation
(ii) Coefficient of Determination
➢ Coefficient of determination:

R² = SSR/SST = 1 − SSE/SST

➢ R² is the proportion of variability in the observed dependent variable that is explained by the linear regression model.
➢ The coefficient of determination measures the strength of the linear relationship.
➢ The greater R², the more successful the Linear Model.
➢ An R² value close to 1 indicates a good fit; a value close to 0 indicates a poor fit.
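A minimal check that the two forms of R² agree, using the Example 1 ANOVA sums of squares from the Excel output shown later in this chapter:

```python
# Coefficient of determination from the Example 1 ANOVA table.
sst = 1.511491667   # total sum of squares
sse = 0.299109998   # residual (error) sum of squares
ssr = sst - sse     # regression sum of squares

r2 = 1 - sse / sst
assert abs(r2 - ssr / sst) < 1e-12   # both forms of the formula agree
print(round(r2, 4))  # 0.8021 -> the model explains ~80% of the variability
```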
2.7 Model Evaluation
(iii) The Hypothesis Test
Hypothesis testing: a decision-making procedure about the null hypothesis.

◼ The Null Hypothesis (H0):
the hypothesis that cannot be viewed as false unless sufficient evidence to the contrary is obtained.

◼ The Alternative Hypothesis (H1):
the hypothesis against which the null hypothesis is tested, and which is viewed as true when H0 is declared false.
2.7 Model Evaluation
(iii) The Hypothesis Test

Hypothesis test
• A process that uses sample statistics to test a claim about the value of a population parameter.
• Example: an automobile manufacturer advertises that its new hybrid car has a mean mileage of 50 miles per gallon. To test this claim, a sample would be taken. If the sample mean differs enough from the advertised mean, you can decide the advertisement is wrong.
2.7 MODEL EVALUATION
(III) THE HYPOTHESIS TEST
• One-sided (tailed) lower-tail test:   H0: μ ≥ μ0 (or μ = μ0)   H1: μ < μ0

• One-sided (tailed) upper-tail test:   H0: μ ≤ μ0 (or μ = μ0)   H1: μ > μ0

• Two-sided (tailed) test:   H0: μ = μ0   H1: μ ≠ μ0

Note: μ0 is the value given/assumed for the parameter μ.
2.7 Model Evaluation
(iii) The Hypothesis Test

[Rejection-region diagrams: one-sided (tailed) upper-tail test, one-sided (tailed) lower-tail test, and two-sided (tailed) test]
2.7 Model Evaluation
(iii) The Hypothesis Test
➢ Equivalence of F-test and t-test:
for a given α level, the F-test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-sided t-test.

➢ Thus, at a given level, we can use either the t-test or the F-test for testing β1 = 0 versus β1 ≠ 0.

➢ The t-test is more flexible since it can be used for one-sided tests as well.
2.7 Model Evaluation
(iii) The hypothesis test – t-test
➢ t-test to check for an adequate relationship between x and y.

➢ Test the hypothesis
H0: β1 = 0 (no relationship between x and y)
H1: β1 ≠ 0 (there is a relationship between x and y)

➢ Test statistic (T-distribution):

T = (β̂1 − β1) / √(σ̂²/Sxx) = (β̂1 − β1) / se(β̂1)

➢ Critical region: |T| > t(α/2, n−2).
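A minimal sketch of the slope t-test for Example 1, computing the standard error and test statistic under H0 (β1 = 0) from the values already worked out in this chapter:

```python
# t-test of the slope for Example 1: T = b1_hat / se(b1_hat),
# with se(b1_hat) = sqrt(MSE / Sxx).
b1_hat = 0.498830121            # estimated slope (from the manual calculation)
mse = 0.299109998 / 10          # MSE = SSE / (n - 2), with n = 12
s_xx = 4.8723                   # from the manual calculation

se_b1 = (mse / s_xx) ** 0.5     # ~0.07835, matching the Excel "Standard Error"
t_stat = b1_hat / se_b1         # test statistic under H0: beta1 = 0

print(round(t_stat, 2))  # 6.37 > t(0.025, 10) = 2.228, so reject H0
```

This reproduces the t Stat of 6.37 shown in the Excel output; since it exceeds the critical value 2.228, H0 is rejected at the 5% level.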
2.7 Model Evaluation
(iii) Model evaluation – t-test

The t-test is used for inference on an individual regression coefficient.
2.7 Model Evaluation
(iii) The hypothesis test – F-test
• In order to construct a statistical decision rule, we need to know the distribution of our test statistic F:

F = MSR / MSE

• When H0 is true, the test statistic F follows the F-distribution with 1 and n−2 degrees of freedom: F(α; 1, n−2).
2.7 Model Evaluation
(iii) The hypothesis test – F-test
• This time we will use the F-test. The null and alternative hypotheses are:
H0: β1 = 0
Ha: β1 ≠ 0
◼ Construction of the decision rule:
at the α = 5% level, reject H0 if F > F(α; 1, n−2).

◼ Large values of F support Ha; values of F near 1 support H0.
2.7 Model Evaluation
(iii) The evaluation – F-test
➢ The F-test is also used for inference on the multiple linear regression model.
2.7 Model Evaluation - (iv) P-Value
In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct.
A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis. When the p-value is small we reject the null hypothesis H0.
2.7 Model Evaluation
(iv) Model Evaluation – p-value
The p-value can also be used for inference on an individual regression coefficient.
Excel steps and outputs follow.
2.8 Example 1
The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant production. The data is given below.

Production (x) ($M):         4.51 3.58 4.31 5.06 5.64 4.99 5.29 5.83 4.70 5.61 4.90 4.20
Electricity Usage (y) (kWh): 2.48 2.26 2.47 2.77 2.99 3.05 3.18 3.46 3.03 3.26 2.67 2.53

i. Estimate the linear regression equation.
ii. Find the standard error of estimate of this regression.
iii. Determine the coefficient of determination of this regression.
iv. Test for significance of regression at the 5% significance level.
Set up the hypothesis

H0: β1 = 0 (there is no relationship between x and y)
There is no relationship between Production and Electricity Usage

H1: β1 ≠ 0 (the straight-line model is adequate)
There is a relationship between Production and Electricity Usage
Excel Results
Regression Statistics
Multiple R 0.895605603
R Square 0.802109396
Adjusted R Square 0.782320336
Standard Error 0.172947969
Observations 12

ANOVA
df SS MS F Significance F
Regression 1 1.212381668 1.21238 40.53297031 8.1759E-05
Residual 10 0.299109998 0.02991
Total 11 1.511491667

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.409048191 0.385990515 1.05974 0.314189743 -0.450992271 1.269088653 -0.45099227 1.269088653
X Variable 1 0.498830121 0.078351706 6.36655 8.1759E-05 0.324251642 0.673408601 0.32425164 0.673408601
Excel Results : Regression Line
[Production Line Fit Plot: observed and predicted electricity usage vs production, with fitted line y = 0.4988x + 0.409, R² = 0.8021]
2.8 Example 1 - Summary
Estimated Regression Line: ŷ = 0.4091 + 0.4988x
Electricity usage = 0.4091 + 0.4988 × Production
Standard Error of Estimate = 0.173
Coefficient of Determination R² = 0.802

α = 0.05; t(α/2, n−2) = t(0.025, 10) = 2.228
T = 6.37; critical region: |T| > t(α/2, n−2).

Since 6.37 > 2.228, reject H0; thus, Electricity usage does depend on the level of Production.
2.8 Example 1 - Summary
• Using the F-test, the null and alternative hypotheses are:
H0: β1 = 0
Ha: β1 ≠ 0

• α = 0.05. Since n = 12, we require F(0.05; 1, 10).
From the table, F(0.05; 1, 10) = 4.96.
The decision rule is to reject H0, since:
F = 40.53 > 4.96

In conclusion, there is also a linear association between electricity usage and the level of production.
2.8 Example 1 - Interpretation
• Production coefficient (β̂1 = 0.4988): each unit increase in Production ($million) adds about 0.4988 to Electricity usage.
• β̂1 > 0 (positive relationship): Electricity usage increases with the increase in Production.
• Intercept coefficient (β̂0 = 0.409): the Electricity usage when Production equals zero.
• R Square = 0.802: indicates that the model explains 80% of the total variability in the electricity usage around its mean (good fit).
• P-value < 0.05: the regression is significant; the change in production impacts the electricity usage.
2.9 Example 2 - Application of SLR to Hydraulic-calibration data
Example: given data on Permeability and Reservoir Quality Index (RQI), investigate the dependence of RQI (Y) on Permeability (X).
Set up the hypothesis:

H0: β1 = 0 (there is no relationship between x and y)
There is no relationship between RQI and Permeability

H1: β1 ≠ 0 (the straight-line model is adequate)
There is a relationship between RQI and Permeability
Excel Results – Example 2
Regression Statistics
Multiple R 0.680322
R Square 0.462837
Adjusted R
Square 0.461716
Standard Error 0.40947
Observations 481

ANOVA
df SS MS F Significance F
Regression 1 69.19926 69.19926 412.7226 1.22E-66
Residual 479 80.31167 0.167665
Total 480 149.5109

Coefficients Standard Error t Stat P-value


Intercept 0.309739 0.019769 15.66798 5.73E-45
Permeability
(md) 0.00171 8.42E-05 20.31558 1.22E-66

Excel Results – Example 2
[Permeability (md) Line Fit Plot: observed and predicted RQI vs Permeability (md), with fitted line y = 0.3097 + 0.0017x, R² = 0.4628]
2.9 Example 2 - Interpretation of the results
• Permeability (md) coefficient (β̂1 = 0.0017): each unit increase in Permeability adds 0.0017 to the RQI value when all other variables are fixed.
• β̂1 > 0 (positive relationship): RQI increases with the increase in Permeability.
• Intercept coefficient (β̂0 = 0.309): the value of RQI when Permeability equals zero.
• R Square = 0.462837: indicates that the model explains 46% of the total variability in the RQI values around its mean.
• P-value < 0.05: the regression is significant.