STAT22209 - Chapter 02 - Regression Analysis - 2022
Chapter 02
The Regression Analysis
• Regression analysis is a powerful statistical method that allows you to
examine the relationship between two or more variables of interest.
Y = a + bX
• X: independent variable; Y: dependent variable; a: intercept; b: slope
• Suppose the two variables are X and Y and there are n pairs of values:
(x1, y1), (x2, y2), ..., (xn, yn)
• Generally, the independent variable is plotted along the
horizontal (X) axis and the dependent variable is plotted along the
vertical (Y) axis.
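[Figure: scatter plot with Variable X on the horizontal axis and Variable Y on the vertical axis, showing a point (X, Y) and the fitted line Y = a + bX, where a is the intercept and b = ΔY/ΔX is the slope.]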
Example:
The data below were collected from 6 employees of a
Department of Physical Sciences and Technology and give their
years of service and monthly income. Plot the values and
find the regression line of Y on X.
Employee                   A    B    C    D    E    F
Years of Service (X)       2    3    5    6    8    9
Income (in 1000 Rs.) (Y)   5    6    7    8   12   14
08/04/2024 17
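X     Y     XY    X²
2     5     10    4
3     6     18    9
5     7     35    25
6     8     48    36
8     12    96    64
9     14    126   81
ΣX = 33   ΣY = 52   ΣXY = 333   ΣX² = 219

With n = 6 pairs, the least-squares slope b and intercept a are

$$b = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \qquad \text{and} \qquad a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$

$$b = \frac{333 - \frac{33 \times 52}{6}}{219 - \frac{33^2}{6}} = \frac{333 - 286}{219 - 181.5} = \frac{47}{37.5} \approx 1.253$$

$$a = \frac{52}{6} - \frac{47}{37.5} \times \frac{33}{6} \approx 1.773$$

Y = 1.773 + 1.253X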
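The sum formulas for b and a translate directly into a few lines of code. The following is a minimal Python sketch and not part of the original notes; the function name `fit_line` is a hypothetical helper, and the Income values are taken exactly as given in the Years of Service / Income data table.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX, via the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = [sum(XY) - sum(X)sum(Y)/n] / [sum(X^2) - (sum(X))^2/n]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Data table from the example: years of service (X) and income in 1000 Rs. (Y)
years = [2, 3, 5, 6, 8, 9]
income = [5, 6, 7, 8, 12, 14]

a, b = fit_line(years, income)
print(f"Y = {a:.3f} + {b:.3f} X")
```

Running the sketch on the data table is a quick way to check each column total and the final coefficients of a hand calculation.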
Example: Test score and sales data of salesmen.
Salesman           A    B    C    D    E    F    G    H    I    J
Test Score (X)     50   80   60   70   90   60   80   50   70   90
Sales ('000) (Y)   3.5  7.0  5.0  6.0  5.0  4.0  6.0  4.0  5.5  4.0
Y = 0.035X + 2.55
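The fitted line for the salesmen data can be checked with the equivalent deviation form of the formulas, b = Sxy / Sxx and a = Ȳ - bX̄. The sketch below is illustrative only: `fit_line_centered` is a hypothetical helper name, and the test score of 75 used for the prediction is an assumed value that does not appear in the data.

```python
def fit_line_centered(xs, ys):
    """Least-squares fit of Y = a + bX using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Test scores (X) and sales in thousands (Y) for salesmen A to J
scores = [50, 80, 60, 70, 90, 60, 80, 50, 70, 90]
sales = [3.5, 7.0, 5.0, 6.0, 5.0, 4.0, 6.0, 4.0, 5.5, 4.0]

a, b = fit_line_centered(scores, sales)
print(f"Y = {b:.3f} X + {a:.2f}")  # prints: Y = 0.035 X + 2.55

# Predicted sales for a hypothetical test score of 75 (assumed, not in the data)
new_score = 75
print(f"Predicted sales at X = {new_score}: {a + b * new_score:.3f}")
```

The deviation form gives the same coefficients as the sum formulas; it is often easier by hand when the means are round numbers, as they are here (X̄ = 70, Ȳ = 5).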
Example
A sample of 6 persons was selected; their ages (x
variable) and weights are shown in the following
table.
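Total Variation = $\sum (Y - \bar{Y})^2$
Explained Variation = $\sum (\hat{Y} - \bar{Y})^2$
where $\hat{Y} = a + bX$ is the fitted value and $\bar{Y}$ is the mean of Y.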
What Is R-squared?
• R-squared is always between 0 and 100%.
• In general, the higher the R-squared, the better the model fits
your data.
• The regression model on the left accounts for 38.0% of the variance while the one
• The more variance that is accounted for by the regression model, the closer the
data points will fall to the fitted regression line.
• Theoretically, if a model could explain 100% of the variance, the fitted values
would always equal the observed values and, therefore, all the data points would
fall on the fitted regression line.
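R-squared can be computed as the ratio of explained variation to total variation. The sketch below assumes the usual definitions (total variation Σ(Y - Ȳ)², explained variation Σ(Ŷ - Ȳ)² with Ŷ the fitted value) and applies them to the test score and sales data from the earlier example; `variation_summary` is a hypothetical helper name.

```python
def variation_summary(xs, ys):
    """Return (total variation, explained variation, R-squared) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Fit Y = a + bX by least squares
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)          # sum of (Y - Ybar)^2
    explained = sum((f - y_bar) ** 2 for f in fitted)  # sum of (Yhat - Ybar)^2
    return total, explained, explained / total


# Test scores (X) and sales in thousands (Y) from the salesmen example
scores = [50, 80, 60, 70, 90, 60, 80, 50, 70, 90]
sales = [3.5, 7.0, 5.0, 6.0, 5.0, 4.0, 6.0, 4.0, 5.5, 4.0]

total, explained, r2 = variation_summary(scores, sales)
print(f"Total variation     = {total:.2f}")
print(f"Explained variation = {explained:.2f}")
print(f"R-squared           = {r2:.1%}")
```

For these data the fitted line accounts for roughly 21% of the variation in sales, so most of the variation is left unexplained by test score alone.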
Exercise
• The following are the age (in years) and systolic
blood pressure of 20 apparently healthy adults.
B.P (y)   Age (x)   B.P (y)   Age (x)
128       46        120       20
136       53        128       43
146       60        141       63
124       20        126       26
143       63        134       53
130       43        128       31
124       26        136       58
121       19        132       46
126       31        140       58
123       23        144       70
1. Find the correlation between age and
blood pressure using the simple (Pearson) or Spearman's
correlation coefficient, and comment.
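One way to approach the exercise is sketched below in Python. It assumes that the "simple" coefficient means Pearson's product-moment coefficient and that Spearman's coefficient is computed as Pearson's coefficient on ranks, with tied values given the average of their ranks. The function names are hypothetical helpers, not part of the original notes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def average_ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson's coefficient applied to average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Age (x) and systolic blood pressure (y), left column pair then right column pair
age = [46, 53, 60, 20, 63, 43, 26, 19, 31, 23,
       20, 43, 63, 26, 53, 31, 58, 46, 58, 70]
bp = [128, 136, 146, 124, 143, 130, 124, 121, 126, 123,
      120, 128, 141, 126, 134, 128, 136, 132, 140, 144]

print(f"Pearson r  = {pearson(age, bp):.3f}")
print(f"Spearman r = {spearman(age, bp):.3f}")
```

Because both age and blood pressure contain repeated values, the average-rank approach is used here rather than the shortcut formula based on squared rank differences, which is exact only when there are no ties.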