STAT22209 - Chapter 02-Regression Analyisis - 2022

Regression analysis

Uploaded by

Hasitha Dhananjaya

Advanced Statistics II

( PST22209/ FST 22209/ ESNRM22209)

R.M. KAPILA RATHNAYAKA

B.Sc. Special (Math. & Stat. ) (Ruhuna), M.Sc. (Industrial Mathematics) (USJ),
M.Sc. (Stat. ) (WHUT, China),
Ph.D. (Applied Statistics, WHUT)
Introduction to Correlation and Regression Analysis

Chapter 01
The Regression Analysis
• Regression analysis is a powerful statistical method that allows you to examine the relationship between two or more variables of interest.

• A regression analysis generates an equation that describes the statistical relationship between one or more predictors and the response variable, and that can be used to predict new observations.

• It is a statistical tool used to determine the probable change in one variable for a given amount of change in another. That is, the value of the unknown variable can be estimated from the known value of another variable.
Regression Equation

• The regression equation is the algebraic expression of the regression lines.

Y = a + bX

• X - independent variable; Y - dependent variable

• a - intercept on the Y axis; b - slope of the line

• Dependent variable: the main factor that you're trying to understand or predict.

• Independent variables: the factors that you hypothesize have an impact on your dependent variable.
• There are two methods of obtaining the regression line:
1) The scatter diagram method
2) The method of least squares

The scatter diagram method
• A scatter diagram is the simplest method for representing data.

• Suppose the two variables are X and Y and there are n pairs of values (x1, y1), (x2, y2), ..., (xn, yn).

• Generally, the independent variable is plotted along the horizontal (X) axis and the dependent variable along the vertical (Y) axis.

• Plotting your data is the first step in figuring out whether there is a relationship between your independent and dependent variables.

– Calculate the mean point $(\bar{X}, \bar{Y})$.
– Plot the paired observations.
– Then draw the line through the mean point.

Variable Y (X , Y) Y

a X

Variable X

Y
b
X
Y a  bX
Example:
The data given below were collected from 6 persons in a department of Physical Sciences and Technology and refer to their years of service and monthly income. Plot the values and obtain the regression line of Y on X.

Employee                   A    B    C    D    E    F
Years of Service (X)       2    3    5    6    8    9
Income (in 1000 Rs.) (Y)   5    6    7    8    12   14

Why should your organization use regression analysis?

• Regression analysis is a helpful statistical method that can be used across an organization to determine the degree to which particular independent variables influence dependent variables.
The Method of Ordinary Least Squares

• In ordinary least squares (OLS) regression, the estimated equation is the one that minimizes the sum of the squared distances between the sample's data points and the values predicted by the equation.
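
To make the objective concrete, here is a minimal Python sketch that computes the sum of squared distances for two candidate lines on the years-of-service data from the example above. The function name `residual_sum_of_squares` and both candidate lines are illustrative assumptions, not part of the original slides; OLS selects the line for which this quantity is smallest.

```python
# Years of service (X) and income in 1000 Rs. (Y) from the example above.
X = [2, 3, 5, 6, 8, 9]
Y = [5, 6, 7, 8, 12, 14]


def residual_sum_of_squares(a, b, xs, ys):
    """Sum of squared vertical distances between the data and the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical candidate lines, chosen only for illustration.
rss_first = residual_sum_of_squares(2.0, 1.0, X, Y)   # line Y = 2 + 1.0 X
rss_second = residual_sum_of_squares(1.0, 1.5, X, Y)  # line Y = 1 + 1.5 X

print(rss_first, rss_second)  # prints 15.0 8.75
```

Of these two candidates the second fits better in the least-squares sense, but neither is claimed to be the OLS line; the least-squares estimates are the values of a and b that make this sum as small as possible.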
The Classical Assumptions
Assumption 1: The disturbances $u_i$ have zero mean, i.e., $E(u_i) = 0$ for every $i = 1, 2, \dots, n$.

• This assumption is needed to ensure that on average we are on the true line.

Assumption 2: The disturbances have a constant variance, i.e., $\mathrm{var}(u_i) = \sigma^2$ for every $i = 1, 2, \dots, n$. This ensures that every observation is equally reliable.

Assumption 3: The disturbances are not correlated, i.e., $E(u_i u_j) = 0$ for $i \neq j$, $i, j = 1, 2, \dots, n$.

Assumption 4: The explanatory variable X is non-stochastic, i.e., fixed in repeated samples, and hence not correlated with the disturbances. Also, $\sum_{i=1}^{n} (X_i - \bar{X})^2 / n \neq 0$ and has a finite limit as $n$ tends to infinity.
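Least Squares Estimation

• Least squares minimizes the residual sum of squares, where the residuals are given by

$e_i = Y_i - a - b X_i, \quad i = 1, 2, \dots, n,$

and $a$ and $b$ denote guesses on the regression parameters $\alpha$ and $\beta$, respectively.

• The residual sum of squares, denoted by

$RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - b X_i)^2,$

is minimized by the two first-order conditions:

$\frac{\partial RSS}{\partial a} = -2 \sum_{i=1}^{n} (Y_i - a - b X_i) = 0 \quad (1)$

$\frac{\partial RSS}{\partial b} = -2 \sum_{i=1}^{n} X_i (Y_i - a - b X_i) = 0 \quad (2)$

• Equations (1) and (2) are called the least-squares equations for estimating the parameters of a line.

• The least-squares equations are linear in $a$ and $b$ and hence can be solved simultaneously. The solutions are

$b = \frac{\sum X_i Y_i - (\sum X_i)(\sum Y_i)/n}{\sum X_i^2 - (\sum X_i)^2/n}, \qquad a = \frac{\sum Y_i}{n} - b \, \frac{\sum X_i}{n}.$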
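
The closed-form least-squares solution can be sketched in a few lines of Python. In computational form the slope is b = (ΣXY − ΣX·ΣY/n) / (ΣX² − (ΣX)²/n) and the intercept is a = ΣY/n − b·ΣX/n. The function name `fit_line` and the small data set are assumptions made for illustration; this is one straightforward way to code the formulas, not a reference implementation.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (sum_xy - sum_x * sum_y / n) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # intercept
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X, used only as a sanity check.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints 1.0 2.0
```

The explicit check on the denominator covers the case where every X value is the same, in which no unique line exists.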
Example:
The data given below were collected from 6 persons from SUSL and refer to their years of service and monthly income.

Employee                   A    B    C    D    E    F
Years of Service (X)       2    3    5    6    8    9
Income (in 1000 Rs.) (Y)   5    6    7    8    12   14

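X     Y     XY    X^2
2     5     10    4
3     6     18    9
5     7     35    25
6     8     48    36
8     12    96    64
9     14    126   81

$\sum X = 33, \quad \sum Y = 52, \quad \sum XY = 333, \quad \sum X^2 = 219, \quad n = 6$

$b = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n} \quad \& \quad a = \frac{\sum Y}{n} - b \, \frac{\sum X}{n}$

$b = \frac{333 - (33 \times 52)/6}{219 - 33^2/6} = \frac{47}{37.5} \approx 1.253 \quad \& \quad a = \frac{52}{6} - \frac{47}{37.5} \times \frac{33}{6} \approx 1.773$

$\therefore Y = 1.773 + 1.253 X$
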
Example: Test score and sales data of salesmen.

Salesman           A    B    C    D    E    F    G    H    I    J
Test Score (X)     50   80   60   70   90   60   80   50   70   90
Sales ('000) (Y)   3.5  7.0  5.0  6.0  5.0  4.0  6.0  4.0  5.5  4.0

From the above data, calculate the regression line of Y on X and estimate the probable weekly sales volume for a score of 100 in the intelligence test.

$n = 10, \quad \sum X = 700, \quad \sum Y = 50$, so $\bar{X} = 70$ and $\bar{Y} = 5$.

With deviations $x = X - \bar{X}$ and $y = Y - \bar{Y}$: $\sum x^2 = 2000, \quad \sum xy = 70$.

$b = \frac{\sum xy}{\sum x^2} = \frac{70}{2000} = 0.035, \qquad a = \bar{Y} - b\bar{X} = 5 - 0.035(70) = 2.55$

$\therefore Y = 0.035 X + 2.55$

$X = 100 \Rightarrow Y = 0.035(100) + 2.55 = 6.05$

Thus, the most probable weekly sales volume for a salesman who scores 100 in the intelligence test is 6.05 thousand (6050).
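
The figures in this example can be checked with a short Python sketch that uses deviations from the means. The variable names are illustrative assumptions; the data are the test scores and sales from the table. Running it should reproduce b = 0.035, a = 2.55 and a predicted value of 6.05 at a score of 100.

```python
# Test scores (X) and weekly sales in thousands (Y) from the table above.
scores = [50, 80, 60, 70, 90, 60, 80, 50, 70, 90]
sales = [3.5, 7.0, 5.0, 6.0, 5.0, 4.0, 6.0, 4.0, 5.5, 4.0]

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(sales) / n

# Sums of products and squares of deviations from the means.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, sales))
sum_xx = sum((x - mean_x) ** 2 for x in scores)

b = sum_xy / sum_xx        # slope
a = mean_y - b * mean_x    # intercept
predicted = a + b * 100    # estimated sales for a test score of 100

print(round(b, 3), round(a, 2), round(predicted, 2))
```

Note that a score of 100 lies outside the observed range of 50 to 90, so the prediction is an extrapolation of the fitted line.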

Example
A sample of 6 persons was selected; their ages (x variable) and weights (y variable) are shown in the following table.

Find the regression equation and the predicted weight when age is 8.5 years.

Serial no.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13
Exercise 2
• The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

B.P. (y)   Age (x)   B.P. (y)   Age (x)
128        46        120        20
136        53        128        43
146        60        141        63
124        20        126        26
143        63        134        53
130        43        128        31
124        26        136        58
121        19        132        46
126        31        140        58
123        23        144        70

1. Find the correlation between age and blood pressure using the simple and Spearman's correlation coefficients, and comment.

2. Find the regression equation.

3. What is the predicted blood pressure for a man aged 25 years?
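
As a hedged aid for the correlation part of this exercise, the sketch below shows one way to compute the simple (Pearson) coefficient and Spearman's rank coefficient in plain Python. The function names and the small demonstration data are assumptions introduced for illustration; Spearman's coefficient is computed here as the Pearson coefficient of the ranks, with tied values given their average rank (the exercise data contain ties).

```python
from math import sqrt


def pearson(xs, ys):
    """Simple (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical monotone but non-linear data (y = x squared), for illustration only.
demo_x = [1, 2, 3, 4, 5]
demo_y = [1, 4, 9, 16, 25]
print(round(pearson(demo_x, demo_y), 4), round(spearman(demo_x, demo_y), 4))
```

On the demonstration data Spearman's coefficient is 1 because the relationship is perfectly monotone, while Pearson's coefficient is slightly below 1 because the relationship is not a straight line.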
Regression validation
• Model validation is possibly the most important step in the model-building sequence.

• Many statistical tools for model validation can be found in the literature.

• The primary tool for most process modeling applications, however, is graphical residual analysis.
Residual Plots
• A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis.

• If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.
[Figure: chart displaying the residual (e) against the independent variable (X) as a residual plot.]

• The residual plot shows a fairly random pattern:
– the first residual is positive,
– the next two are negative,
– the fourth is positive,
– and the last residual is negative.

• This random pattern indicates that a linear model provides a decent fit to the data.
Residual Plots
• The residual plots show three typical patterns.

• The first plot shows a random pattern, indicating a good fit for a linear model.

• The other plot patterns are non-random (U-shaped and inverted U), suggesting a better fit for a non-linear model.
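
The residuals behind such a plot can be computed as in the following minimal Python sketch. The data are hypothetical and roughly linear, and the helper name `fit_line` is an assumption for illustration. The sketch fits the line, forms the residuals e = Y − (a + bX), and lists their signs in order of X; drawing the residuals against X with any plotting library would then give the residual plot itself.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX (deviation form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical, roughly linear data used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]

print(signs)
print(abs(sum(residuals)) < 1e-9)  # least-squares residuals sum to zero
```

For these data the signs alternate with no systematic curve, which is the kind of pattern a linear model should leave behind; a run of same-signed residuals at both ends with the opposite sign in the middle would suggest a U-shaped or inverted-U pattern.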
What Is R-squared?
• R-squared is a statistical measure of how close the data are to the fitted regression line.
• The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model.

Total Variation = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$

Explained Variation = $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$

R-squared = Explained Variation / Total Variation
What Is R-squared?
• R-squared is always between 0 and 100%.

• 0% indicates that the model explains none of the variability of the response data around its mean.

• 100% indicates that the model explains all the variability of the response data around its mean.

• In general, the higher the R-squared, the better the model fits your data.
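
As a rough illustration, the sketch below computes R-squared for the test-score and sales data from the earlier example, taking R-squared as the explained variation (squared deviations of the fitted values from the mean of Y) divided by the total variation (squared deviations of the observed values from the mean of Y). The variable names are assumptions made for this sketch.

```python
# Test scores (X) and weekly sales in thousands (Y) from the earlier example.
scores = [50, 80, 60, 70, 90, 60, 80, 50, 70, 90]
sales = [3.5, 7.0, 5.0, 6.0, 5.0, 4.0, 6.0, 4.0, 5.5, 4.0]

n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(sales) / n

# Least-squares slope and intercept.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, sales))
     / sum((x - mean_x) ** 2 for x in scores))
a = mean_y - b * mean_x
fitted = [a + b * x for x in scores]

total_variation = sum((y - mean_y) ** 2 for y in sales)
explained_variation = sum((f - mean_y) ** 2 for f in fitted)
unexplained_variation = sum((y - f) ** 2 for y, f in zip(sales, fitted))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))
```

For these data R-squared comes out at roughly 0.21, so the fitted line explains only about a fifth of the variation in sales. The explained and unexplained parts add up to the total variation, which is why R-squared always lies between 0 and 1.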
• The regression model on the left accounts for 38.0% of the variance, while the one on the right accounts for 87.4%.

• The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line.

• Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.
Exercise
• The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

B.P. (y)   Age (x)   B.P. (y)   Age (x)
128        46        120        20
136        53        128        43
146        60        141        63
124        20        126        26
143        63        134        53
130        43        128        31
124        26        136        58
121        19        132        46
126        31        140        58
123        23        144        70

1. Find the correlation between age and blood pressure using the simple or Spearman's correlation coefficient, and comment.

2. Find the regression equation.

3. Calculate R-squared and comment.

4. What is the predicted blood pressure for a man aged 25 years?
