Regression Model

Predictive Models

Y = f(x1, x2, …, xn) + ε


• Y and X are continuous: Regression Models
• Y is continuous, X is categorical: ANOVA
• Y is categorical, X is continuous: Logistic Regression; Classification techniques; Cluster Analysis
• Y is categorical, X is categorical: Regression Trees
Regression Analysis
• Existence & degree of association : Correlation
• Extent of causal relationship: Regression

• Simple linear regression model: y = a + b x + ε
• Estimated y is ŷ = a + b x
Least squares method
• If yᵢ = a + b xᵢ + eᵢ, i.e., actual y = ŷ + error value
• Then minimize the squared sum of eᵢ:

  Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ [yᵢ − (a + b xᵢ)]²
• Solving the following two normal equations for a and b:

  Σy = n a + b Σx
  Σxy = a Σx + b Σx²

• Alternatively,

  b = Sxy / Sxx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]

  a = Σy/n − b Σx/n
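These closed-form estimates can be sketched in a few lines of Python (the function name `simple_linear_fit` is ours, not from the slides):

```python
def simple_linear_fit(x, y):
    """Least-squares estimates a, b for y ~ a + b*x using the
    sums-of-squares formulas above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n  # S_xy
    s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n                 # S_xx
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n  # a = ybar - b * xbar
    return a, b
```

For example, the points (1, 3), (2, 5), (3, 7) lie exactly on y = 1 + 2x, so the fit returns a = 1, b = 2.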
Coefficient of Determination

  R² = Sxy² / (Sxx Syy) = [Σxy − (Σx)(Σy)/n]² / { [Σx² − (Σx)²/n] [Σy² − (Σy)²/n] }

• R²: proportion of the variation in the values of y explained by the regression model.
• 0 ≤ R² ≤ 1
• R² = 1 indicates the regression line is a perfect estimate of the linear relationship between x and y.
• R² = 0 indicates no linear relationship.
Example: In this table y is the purity of oxygen produced in a chemical
distillation process, and x is the percentage of hydrocarbons that are
present in the main condenser of the distillation unit.
Obs. No. | Hydrocarbon Level, x (%) | Purity, y (%) | Obs. No. | Hydrocarbon Level, x (%) | Purity, y (%)

1 0.99 90.01 11 1.19 93.54


2 1.02 89.05 12 1.15 92.52
3 1.15 91.43 13 0.98 90.56
4 1.29 93.74 14 1.01 89.54
5 1.46 96.73 15 1.11 89.85
6 1.36 94.45 16 1.20 90.39
7 0.87 87.59 17 1.26 93.25
8 1.23 91.77 18 1.32 93.41
9 1.55 99.42 19 1.43 94.98
10 1.40 93.65 20 0.95 87.33
b = 14.94748
a = 74.28331
R² = 0.877

• The regression coefficient b = 14.947 indicates the change in purity (%) per unit change in hydrocarbon level.
• Coefficient of determination R² = 0.877: 87.7% of the variation in purity (%) is explained by the estimated line; the remaining 12.3% of the variation may be explained by other variables, by errors in measurement, or by both.
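The fitted values above can be reproduced directly from the table with the sums-of-squares formulas from the earlier slides:

```python
# Hydrocarbon level x (%) and oxygen purity y (%) from the table above.
x = [0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23, 1.55, 1.40,
     1.19, 1.15, 0.98, 1.01, 1.11, 1.20, 1.26, 1.32, 1.43, 0.95]
y = [90.01, 89.05, 91.43, 93.74, 96.73, 94.45, 87.59, 91.77, 99.42, 93.65,
     93.54, 92.52, 90.56, 89.54, 89.85, 90.39, 93.25, 93.41, 94.98, 87.33]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
s_xx = sum(xi * xi for xi in x) - sum_x ** 2 / n
s_yy = sum(yi * yi for yi in y) - sum_y ** 2 / n

b = s_xy / s_xx                   # slope: change in purity per unit hydrocarbon
a = sum_y / n - b * sum_x / n     # intercept
r2 = s_xy ** 2 / (s_xx * s_yy)    # coefficient of determination
print(round(b, 5), round(a, 5), round(r2, 3))
```

Running this reproduces the slide's values b ≈ 14.947, a ≈ 74.283, R² ≈ 0.877.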
Multiple Linear Regression
More than one independent variable is involved in expressing a single dependent variable, in an attempt to increase the accuracy of the estimates. It is expressed as

  y = a + b₁x₁ + b₂x₂ + …… + bₖxₖ + ε

where x₁, x₂, ……, xₖ are independent variables, y is the dependent variable, b₁, b₂, ……, bₖ are the regression coefficients to be estimated, and ε is the error term.
Broad steps involved in developing a linear multiple regression model

1. Hypothesize the form of the model. This involves the choice of the independent variables to be included in the model.
2. Estimate the unknown parameters a, b₁, b₂, …, bₖ.
3. Make inferences on the estimates.

Scatterplot Matrix (SPLOM)
• A scatterplot matrix plots all possible combinations of two or more numeric variables against one another.
• The plots are arranged in rows and columns, with the same number of rows and columns as there are variables. When you have many variables to plot against each other in scatterplots, it is logical to arrange the plots in rows and columns, using a common vertical scale for all plots within a row (and a common horizontal scale within each column). All complete x-y pairs within each plot are used; that is, pairwise deletion is used for missing data.
     Density   Machine direction strength   Cross direction strength
0    0.801     121.41                       70.42
1    0.824     127.70                       72.47
2    0.841     129.20                       78.20
3    0.816     131.80                       74.89
4    0.840     135.10                       71.21
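As a sketch, the paper-property table above can be drawn as a SPLOM with pandas' `scatter_matrix` (this assumes pandas and matplotlib are available; they are not part of the slides):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import pandas as pd
from pandas.plotting import scatter_matrix

# Paper data from the table above.
df = pd.DataFrame({
    "Density": [0.801, 0.824, 0.841, 0.816, 0.840],
    "Machine direction strength": [121.41, 127.70, 129.20, 131.80, 135.10],
    "Cross direction strength": [70.42, 72.47, 78.20, 74.89, 71.21],
})
# One row and one column of panels per variable -> a 3x3 grid of axes,
# with histograms of each variable on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
```

Each off-diagonal panel shares its vertical scale with the other panels in its row, matching the description above.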
LEAST SQUARES ESTIMATION OF THE PARAMETERS

The least squares function is

  L = Σᵢ₌₁ⁿ εᵢ² = Σᵢ₌₁ⁿ (yᵢ − a − b₁xᵢ₁ − …… − bₖxᵢₖ)²

The normal equations, written in matrix form, are

  (XᵀX) b̂ = Xᵀ y,  so that  b̂ = (XᵀX)⁻¹ Xᵀ y
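A self-contained sketch of solving the normal equations (XᵀX)b̂ = Xᵀy, using the paper data above. Treating machine-direction strength as the response y and density and cross-direction strength as the regressors is our assumption, purely for illustration:

```python
def gauss_solve(A, v):
    """Solve A m = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Design matrix with an intercept column of 1s; the role assignment
# (machine-direction strength as y) is assumed for illustration.
X = [[1, 0.801, 70.42], [1, 0.824, 72.47], [1, 0.841, 78.20],
     [1, 0.816, 74.89], [1, 0.840, 71.21]]
y = [121.41, 127.70, 129.20, 131.80, 135.10]

k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
bhat = gauss_solve(XtX, Xty)  # [a, b1, b2]
```

The fitted coefficients satisfy the normal equations, which gives a built-in check on the solver.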
Coefficient of Determination
• R²: proportion of the variation in the values of y explained by the regression model.
• 0 ≤ R² ≤ 1
• R² = 1 indicates the regression model is a perfect estimate of the linear relationship between y and the regressors.
• R² = 0 indicates no linear relationship.
I. Test for significance of regression

This test determines whether a linear relationship exists between the response variable y and the set of regressor variables x₁, x₂, …, xₖ.

The hypotheses are

  H₀ : b₁ = b₂ = …… = bₖ = 0
  H₁ : bⱼ ≠ 0 for at least one j
ANOVA for testing significance of regression

Source of Variation | Sum of Squares | df    | Mean Sum of Squares | F₀      | p-value
Regression          | SSR            | k     | MSR = SSR/k         | MSR/MSE |
Error               | SSE            | n−k−1 | MSE = SSE/(n−k−1)   |         |
Total               | TSS            | n−1   |                     |         |

n is the number of data points in the sample.

  SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

  TSS = Σᵢ₌₁ⁿ (yᵢ − ȳ)² = Σᵢ₌₁ⁿ yᵢ² − (Σᵢ₌₁ⁿ yᵢ)² / n

  SSR = TSS − SSE


From the F table we get F₍α, k, n−k−1₎ = F_table.

If F₀ > F_table, then reject H₀; or, if p-value < α, then reject H₀.
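The ANOVA quantities above can be sketched as a small helper (the function name and the example numbers in the usage note are ours; `yhat` stands for fitted values from any regression):

```python
def anova_f(y, yhat, k):
    """SSE, TSS, SSR and the F statistic for significance of regression,
    for n observations and k regressors."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error SS
    tss = sum((yi - ybar) ** 2 for yi in y)               # total SS
    ssr = tss - sse                                       # regression SS
    f0 = (ssr / k) / (sse / (n - k - 1))                  # MSR / MSE
    return sse, tss, ssr, f0
```

For instance, with observed y = [1, 2, 3, 5], hypothetical fitted values ŷ = [1.1, 1.9, 3.2, 4.8] and k = 1, this gives SSE = 0.10, TSS = 8.75 and F₀ = 173.0, which would then be compared against F₍α, k, n−k−1₎ from the table.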


II. Tests on individual regression coefficients

Such tests are useful in determining the potential contribution of each of the regressor variables in the regression model. The model might be more effective with the inclusion of an additional variable, or perhaps with the deletion of one or more of the regressors present in the model.

Hypotheses:

  H₀ : bⱼ = 0
  H₁ : bⱼ ≠ 0

Test statistic:

  t₀ = b̂ⱼ / se(b̂ⱼ)

If |t₀| > t₍α/2, n−k−1₎, or if the (two-sided) p-value < α, then reject H₀.


III. Confidence interval for the dependent variable

A 100(1−α)% CI on the dependent variable is given by

  ŷ − t₍α/2, n−k−1₎ se(ŷ) ≤ y ≤ ŷ + t₍α/2, n−k−1₎ se(ŷ)

where se(ŷ) = √MSE is the standard error of estimate.


IV. Confidence intervals for individual regression coefficients

A 100(1−α)% CI on the regression coefficient bⱼ is given by

  b̂ⱼ − t₍α/2, n−k−1₎ se(b̂ⱼ) ≤ bⱼ ≤ b̂ⱼ + t₍α/2, n−k−1₎ se(b̂ⱼ)
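Sections II and IV can be sketched together. The slides do not give a formula for se(b̂ⱼ); we use the standard expression se(b̂ⱼ) = √(MSE·Cⱼⱼ), where Cⱼⱼ is the j-th diagonal element of (XᵀX)⁻¹, and the t table value must be supplied by hand since Python's standard library has no t distribution:

```python
import math

def coef_inference(bhat_j, mse, c_jj, t_crit):
    """t statistic and 100(1-alpha)% CI for one regression coefficient.
    c_jj is the j-th diagonal element of (X'X)^-1 (standard formula,
    not stated in the slides); t_crit is t_{alpha/2, n-k-1} from a table."""
    se = math.sqrt(mse * c_jj)  # se(bhat_j)
    t0 = bhat_j / se
    return t0, (bhat_j - t_crit * se, bhat_j + t_crit * se)
```

With made-up numbers b̂ⱼ = 5, MSE = 4, Cⱼⱼ = 0.25 and t_crit = 2, the standard error is 1, so t₀ = 5 and the CI is (3, 7).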
Standardized regression coefficients
• The regression model is estimated using standardized data.
• Dimensionless regression coefficients can help to compare the relative importance of each variable.
• If |b̂ⱼ| > |b̂ᵢ|, then regressor xⱼ produces a larger effect than regressor xᵢ.
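A quick sketch of fitting on standardized data (function name is ours). For a single regressor, the standardized slope equals the correlation coefficient r, which gives a handy check:

```python
import statistics

def standardized_slope(x, y):
    """Slope of y on x after standardizing both to mean 0, stdev 1.
    For a single regressor this equals the correlation coefficient r."""
    mx, sx = statistics.mean(x), statistics.stdev(x)
    my, sy = statistics.mean(y), statistics.stdev(y)
    zx = [(v - mx) / sx for v in x]
    zy = [(v - my) / sy for v in y]
    # Intercept is 0 for standardized data; slope = sum(zx*zy) / sum(zx^2)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
```

For perfectly linear data such as x = [1, 2, 3, 4], y = [2, 4, 6, 8], the standardized slope is exactly 1.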
EX: The data shown in the table represent the thrust of a jet-turbine engine (y) and six candidate regressors: x1 = primary speed of rotation, x2 = secondary speed of rotation, x3 = fuel flow rate, x4 = pressure, x5 = exhaust temperature, and x6 = ambient temperature at time of test.

(a) Fit a multiple linear regression model with the above data and interpret the results.
(b) Fit a multiple linear regression model using x3 = fuel flow rate, x4 = pressure, and x5 = exhaust temperature as the regressors, and interpret the results.
(c) Refit the model using y∗ = ln(y) as the response variable and x3∗ = ln(x3) as the regressor (along with x4 and x5). How does it compare with the previously fitted regression model?
Obs y x1 x2 x3 x4 x5 x6
1 4540 2140 20640 30250 205 1732 99
2 4315 2016 20280 30010 195 1697 100
3 4095 1905 19860 29780 184 1662 97
4 3650 1675 18980 29330 164 1598 97
5 3200 1474 18100 28960 144 1541 97
6 4833 2239 20740 30083 216 1709 87
7 4617 2120 20305 29831 206 1669 87
8 4340 1990 19961 29604 196 1640 87
9 3820 1702 18916 29088 171 1572 85
10 3368 1487 18012 28675 149 1522 85
11 4445 2107 20520 30120 195 1740 101
12 4188 1973 20130 29920 190 1711 100
13 3981 1864 19780 29720 180 1682 100
14 3622 1674 19020 29370 161 1630 100
15 3125 1440 18030 28940 139 1572 101
16 4560 2165 20680 30160 208 1704 98
17 4340 2048 20340 29960 199 1679 96
18 4115 1916 19860 29710 187 1642 94
19 3630 1658 18950 29250 164 1576 94
20 3210 1489 18700 28890 145 1528 94
21 4330 2062 20500 30190 193 1748 101
22 4119 1929 20050 29960 183 1713 100
23 3891 1815 19680 29770 173 1684 100
24 3467 1595 18890 29360 153 1624 99
25 3045 1400 17870 28960 134 1569 100
26 4411 2047 20540 30160 193 1746 99
27 4203 1935 20160 29940 184 1714 99
28 3968 1807 19750 29760 173 1679 99
29 3531 1591 18890 29350 153 1621 99
30 3074 1388 17870 28910 133 1561 99
31 4350 2071 20460 30180 198 1729 102
32 4128 1944 20010 29940 186 1692 101
33 3940 1831 19640 29750 178 1667 101
34 3480 1612 18710 29360 156 1609 101
35 3064 1410 17780 28900 136 1552 101
36 4402 2066 20520 30170 197 1758 100
37 4180 1954 20150 29950 188 1729 99
38 3973 1835 19750 29740 178 1690 99
39 3530 1616 18850 29320 156 1616 99
40 3080 1407 17910 28910 137 1569 100
