
Linear Regression

Vineet Padmanabhan Nair

School of Computer and Information Sciences


University of Hyderabad
Hyderabad.

May 19, 2022



Linear Regression-Sample Data

Figure: Example Data
Linear Regression-Sample Data . . .

The Advertising data set consists of the sales of a particular product
in 200 different markets, along with advertising budgets for the
product in each of those markets for three different media: TV,
radio, and newspaper.
The data are displayed in the previous slide. It is not possible for our
client to directly increase sales of the product.
On the other hand, they can control the advertising expenditure
in each of the three media.
Therefore, if we determine that there is an association between
advertising and sales, then we can instruct our client to adjust
advertising budgets, thereby indirectly increasing sales.



Linear Regression-Sample Data . . .

The advertising budgets are input variables, while sales is an
output variable.
The input variables are typically denoted using the symbol X,
with a subscript to distinguish them. So X1 might be the TV
budget, X2 the radio budget, and X3 the newspaper budget.
The inputs go by different names, such as predictors,
independent variables, features, or sometimes just
variables.
The output variable in this case, sales, is often called the
response or dependent variable, and is typically denoted using
the symbol Y .



Linear Regression
The simplest deterministic mathematical relationship between two
variables x and y is a linear relationship

y = β0 + β1 x (1)

sales ≈ β0 + β1 × TV

The set of pairs (x, y) for which y = β0 + β1 x determines a
straight line with slope β1 and y-intercept β0 .
The slope of a line is the change in y for a 1-unit increase in x.
For example, if y = −3x + 10, then y decreases by 3 when x
increases by 1, so the slope is -3. The y-intercept is the height at
which the line crosses the vertical axis and is obtained by setting
x = 0 in the equation.
β0 and β1 are two unknown constants that represent the
intercept and slope terms in the linear model.
Together, β0 and β1 are known as the model coefficients or
parameters.
Linear Regression . . .

Once we have used our training data to produce estimates β̂0 and
β̂1 for the model coefficients, we can predict future sales on the
basis of a particular value of TV advertising by computing

ŷ = βˆ0 + βˆ1 x (2)

where ŷ indicates a prediction of Y on the basis of X = x


The variable whose value is fixed by the experimenter will be
denoted by x and will be called the independent, predictor, or
explanatory variable.
For fixed x, the second variable will be random; we denote this
random variable and its observed value by Y and y, respectively,
and refer to it as the dependent or response variable.
Usually observations will be made for a number of settings of the
independent variable.
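
To make equation (2) concrete, here is a minimal Python sketch of how a prediction would be computed once coefficient estimates are in hand. The values used for β̂0 and β̂1 are purely hypothetical placeholders, not estimates computed from the Advertising data.

# Sketch of prediction with a fitted simple linear regression model.
# The coefficient values below are illustrative placeholders only.
beta0_hat = 7.0    # hypothetical intercept estimate
beta1_hat = 0.05   # hypothetical slope estimate (sales per unit of TV budget)

def predict_sales(tv_budget):
    """Return y_hat = beta0_hat + beta1_hat * x for a given TV budget x."""
    return beta0_hat + beta1_hat * tv_budget

print(predict_sales(100.0))  # predicted sales when the TV budget is 100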



Linear Regression . . .

Let x1 , x2 , . . . , xn denote values of the independent variable for
which observations are made, and let Yi and yi , respectively,
denote the random variable and observed value associated with xi .
The available bivariate data then consists of the n pairs
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ).
A picture of this data called a scatterplot gives preliminary
impressions about the nature of any relationship.
In such a plot, each (xi , yi ) is represented as a point plotted on a
two-dimensional coordinate system.



Example Data

Figure: Example Data

Thus (x1 , y1 ) = (.40, 1.02), (x5 , y5 ) = (.57, 1.52), and so on.


Several observations have identical x values yet different y values
(x8 = x9 = .75, but y8 = 1.80 and y9 = 1.74). Thus the value of y
is not determined solely by x but also by various other factors.


Example Data . . .

Figure: Example Data

There is a strong tendency for y to increase as x increases.


It appears that the value of y could be predicted from x by
finding a line that is reasonably close to the points in the plot
In other words, there is evidence of a substantial (though not
perfect) linear relationship between the two variables.
The horizontal and vertical axes in the scatterplot intersect at
the point (0, 0). In many data sets, the values of x or y or the
values of both variables differ considerably from zero relative to
the range(s) of the values.
Example Data . . .

Figure: Example Data



Example Data . . .

Figure: Example Data



The Simple Linear Regression/probabilistic
Model

For the deterministic model y = β0 + β1 x, the actual observed value
of y is a linear function of x. The appropriate generalization of this to
a probabilistic model assumes that the expected value of Y is a linear
function of x, but that for fixed x the variable Y differs from its
expected value by a random amount.

Y = β0 + β1 X + ε (3)

The variable ε is usually referred to as the random deviation or
random error term in the model.
Without ε, any observed pair (x, y) would correspond to a point
falling exactly on the line y = β0 + β1 x, called the true (or
population) regression line.



Figure: Points corresponding to observations from the simple linear
regression model



The Simple Linear Regression/probabilistic
Model

The inclusion of the random error term allows (x, y) to fall either
above the true regression line (when ε > 0) or below the line
(when ε < 0).
The points (x1 , y1 ), . . . , (xn , yn ) resulting from n independent
observations will then be scattered about the true regression line,
as illustrated in the previous figure.



Implications of the model equation
Y = β0 + β1 x + ε

Let x∗ denote a particular value of the independent variable x, and let
µY ·x∗ = the expected (or mean) value of Y when x has the value x∗
σ²Y ·x∗ = the variance of Y when x has the value x∗
For example, if x = applied stress (kg/mm²) and y = time-to-failure,
then µY ·20 would denote the expected value of time-to-failure when
the applied stress is 20 kg/mm².
(Alternative notation is E(Y | x∗ ) and V (Y | x∗ ).)
Consider an entire population of (x, y) pairs. Then µY ·x∗ is the
mean of all y values for which x = x∗ , and
σ²Y ·x∗ is a measure of how much these values of y spread out
about the mean value.
If, for example, x = age of a child and y = vocabulary size, then
µY ·5 is the average vocabulary size for all 5-year-old children in
the population and σ²Y ·5 describes the amount of variability in
vocabulary size for this part of the population.
Implications of the model equation
Y = β0 + β1 x + ε

Once x is fixed, the only randomness on the right-hand side of
the model equation Y = β0 + β1 x + ε is in the random error ε,
and its mean value and variance are 0 and σ², respectively,
whatever the value of x. This implies that
µY ·x∗ = E(β0 + β1 x∗ + ε) = β0 + β1 x∗ + E(ε) = β0 + β1 x∗
σ²Y ·x∗ = V (β0 + β1 x∗ + ε) = V (β0 + β1 x∗ ) + V (ε) = 0 + σ² = σ²
Replacing x∗ in µY ·x∗ by x gives the relation µY ·x = β0 + β1 x,
which says that the mean value of Y , rather than Y itself, is a
linear function of x.
The true regression line y = β0 + β1 x is thus the line of mean
values.
The second relation states that the amount of variability in the
distribution of Y values is the same at each different value of x
(homogeneity of variance).



Figure: (a) Distribution of ε (b) Distribution of Y for different values of x



Implications of the model equation
Y = β0 + β1 x + ε

Figure: A simulated data set. Left: The red line represents the true
relationship, f (X) = 2 + 3X

Implications of the model equation
Y = β0 + β1 x + ε

The true relationship is generally not known for real data.


In real applications, we have access to a set of observations from
which we can compute the least squares line (will be explained
shortly)
The population regression line is unobserved. In the right-hand
panel of the Figure given in the previous slide, ten different data
sets were generated from the model Y = β0 + β1 X + ε and the
corresponding ten least squares lines were plotted.
Different data sets generated from the same true model result in
slightly different least squares lines, but the unobserved
population regression line does not change.
We only have one data set, so what does it mean that two
different lines describe the relationship between the predictor and
the response?
This is similar to the standard statistical approach of using
information from a sample to estimate characteristics of a large population.
Implications of the model equation
Y = β0 + β1 x + ε

Suppose we want to estimate the population mean µ of some random
variable Y , where µ is unknown.
If n observations y1 , . . . , yn from Y are available, then a reasonable
estimate is µ̂ = ȳ, where ȳ = (1/n) Σᵢ yi is the sample mean.
Though the sample mean and the population mean are different, in
general the sample mean will provide a good estimate of the
population mean.
Similarly β0 and β1 in linear regression define the population
regression line.



Implications of the model equation
Y = β0 + β1 x + ε

The analogy between linear regression and estimation of the mean
of a random variable is an apt one, based on the concept of bias.
If we use the sample mean µ̂ to estimate µ, this estimate is
unbiased, in the sense that on average we expect µ̂ to equal µ.
This means that on the basis of one particular set of observations
y1 , . . . , yn , µ̂ might overestimate µ, and on the basis of another set
of observations, µ̂ might underestimate µ.
But if we could average a huge number of estimates of µ obtained
from a huge number of sets of observations, then this average
would exactly equal µ.
Hence, an unbiased estimator does not systematically over- or
under-estimate the true parameter.



Implications of the model equation
Y = β0 + β1 x + ε

If we estimate β0 and β1 on the basis of a particular data set,
then our estimates will not be exactly equal to β0 and β1 .
But if we could average the estimates obtained over a huge
number of data sets, then the average of these estimates would be
spot on!
In fact, we can see from the right- hand panel of the Figure given
earlier that the average of many least squares lines, each
estimated from a separate data set, is pretty close to the true
population regression line.
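
A small simulation illustrates this point. The sketch below (assuming NumPy is available; the sample size, noise level and number of data sets are arbitrary choices) repeatedly generates data from the model with true line f(X) = 2 + 3X used in the earlier figure, fits a least squares line to each data set, and averages the estimates; the averages come out close to β0 = 2 and β1 = 3.

import numpy as np

rng = np.random.default_rng(0)
true_b0, true_b1 = 2.0, 3.0          # population regression line y = 2 + 3x
n, n_datasets = 100, 1000            # illustrative sample size and repetitions

slopes, intercepts = [], []
for _ in range(n_datasets):
    x = rng.uniform(-2, 2, size=n)
    y = true_b0 + true_b1 * x + rng.normal(0.0, 2.0, size=n)  # add random error
    b1, b0 = np.polyfit(x, y, 1)                              # least squares fit
    slopes.append(b1)
    intercepts.append(b0)

print(np.mean(intercepts), np.mean(slopes))  # both averages close to 2 and 3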



Estimating Model parameters

The values of β0 , β1 and σ² will almost never be known to an
investigator.
The idea is to estimate the model parameters as well as the true
regression line from the sample data consisting of n observed
pairs (x1 , y1 ), . . . , (xn , yn ).
These observations are assumed to have been obtained
independently of one another.
That is, yi is the observed value of Yi , where Yi = β0 + β1 xi + εi .
The n deviations ε1 , ε2 , . . . , εn are independent random variables.
Independence of Y1 , Y2 , . . . , Yn follows from independence of the εi 's.



Figure: Two different estimates of the true regression line

Principle of Least Squares


According to this principle, a line provides a good fit to the data if
the vertical distances (deviations) from the observed points to the line
are small. The measure of the goodness of fit is the sum of the
squares of these deviations. The best-fit line is then the one having
the smallest possible sum of squared deviations.



Figure: Deviations of observed data from the line y = b0 + b1 x



Principle of Least Squares

The vertical deviation of the point (xi , yi ) from the line y = b0 + b1 x
is
height of point − height of line = yi − (b0 + b1 xi )
The sum of squared vertical deviations from the points
(x1 , y1 ), . . . , (xn , yn ) to the line is then

f (b0 , b1 ) = Σᵢ [yi − (b0 + b1 xi )]²   (sum over i = 1, . . . , n)

The point estimates of β0 and β1 , denoted by βˆ0 and βˆ1 and called
the least squares estimates, are those values that minimize
f (b0 , b1 ). That is, βˆ0 and βˆ1 are such that f (βˆ0 , βˆ1 ) ≤ f (b0 , b1 ) for
any b0 and b1 . The estimated regression line or least squares line is
then the line whose equation is

y = βˆ0 + βˆ1 x



Linear Regression - Derivation

f (b0 , b1 ) = Σᵢ [yi − (b0 + b1 xi )]²

We have control only over b0 and b1 , the constants of the
regression model.
Choose b0 and b1 so that the summation will be as small as
possible.
In order to minimize, we take the partial derivatives of f with
respect to b0 and b1 .



Minimising values of b0 , b1 are found by taking partial derivatives of
f (b0 , b1 ) with respect to both b0 and b1 , equating them to 0 and
solving the resulting equations.

f (b0 , b1 ) = Σᵢ [yi − (b0 + b1 xi )]²

Intercept: ∂f (b0 , b1 )/∂b0 = ∂ Σᵢ (yi − b0 − b1 xi )² / ∂b0
= −2 Σᵢ (yi − b0 − b1 xi ), so Σᵢ 2(yi − b0 − b1 xi )(−1) = 0

Slope: ∂f (b0 , b1 )/∂b1 = ∂ Σᵢ (yi − b0 − b1 xi )² / ∂b1
= −2 Σᵢ xi (yi − b0 − b1 xi ), so Σᵢ 2(yi − b0 − b1 xi )(−xi ) = 0.



Solve for Intercept

0 = −2 Σᵢ (yi − b0 − b1 xi )

0 = Σᵢ yi − Σᵢ b0 − b1 Σᵢ xi   (multiplying by −1/2 and distributing the sum)

N b0 = Σᵢ yi − b1 Σᵢ xi

b0 = (Σᵢ yi )/N − b1 (Σᵢ xi )/N = ȳ − b1 x̄



Solve for Slope
0 = −2 Σᵢ xi (yi − b0 − b1 xi )

0 = Σᵢ xi yi − b0 Σᵢ xi − b1 Σᵢ xi²

b1 Σᵢ xi² = Σᵢ xi yi − b0 Σᵢ xi

b1 Σᵢ xi² = Σᵢ xi yi − [ (Σᵢ yi )/N − b1 (Σᵢ xi )/N ] Σᵢ xi

b1 Σᵢ xi² = Σᵢ xi yi − (Σᵢ yi )(Σᵢ xi )/N + b1 (Σᵢ xi )²/N

b1 [ Σᵢ xi² − (Σᵢ xi )²/N ] = Σᵢ xi yi − (Σᵢ yi )(Σᵢ xi )/N

b1 = [ Σᵢ xi yi − (Σᵢ yi )(Σᵢ xi )/N ] / [ Σᵢ xi² − (Σᵢ xi )²/N ]



The Gist

b1 = β̂1 = Σᵢ (xi − x̄)(yi − ȳ) / Σᵢ (xi − x̄)² = Sxy / Sxx

Sxy = Σᵢ xi yi − (Σᵢ xi )(Σᵢ yi )/N   (the numerator)

Sxx = Σᵢ xi² − (Σᵢ xi )²/N   (the denominator)

b0 = β̂0 = ( Σᵢ yi − β̂1 Σᵢ xi )/N = ȳ − β̂1 x̄
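
These formulas translate directly into code. The following is a minimal Python sketch (assuming NumPy; the function and variable names are mine, not from the slides) computing Sxy, Sxx, β̂1 and β̂0 for a data set given as two arrays.

import numpy as np

def least_squares_fit(x, y):
    """Return (b0_hat, b1_hat) using the closed-form formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # Sxy, the numerator
    sxx = np.sum(x * x) - np.sum(x) ** 2 / n          # Sxx, the denominator
    b1 = sxy / sxx                                    # slope estimate
    b0 = np.mean(y) - b1 * np.mean(x)                 # intercept estimate
    return b0, b1

# Example usage with made-up data:
b0, b1 = least_squares_fit([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(b0, b1)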



Example
The cetane number is a critical property in specifying the ignition
quality of a fuel used in a diesel engine. Determination of this number
for a biodiesel fuel is expensive and time consuming. The article
Relating the Cetane Number of Biodiesel Fuels to Their Fatty Acid
Composition: A Critical Study (J. of Automobile Engr., 2009:
565-583) included the following data on x = iodine value (g) and
y = cetane number for a sample of 14 biofuels. The iodine value is
the amount of iodine necessary to saturate a sample of 100 g of oil.
The article’s authors fit the simple linear regression model to this
data. Let us see how that can be done :



Example Continued . . .
Calculating the column sums gives Σ xi = 1307.5, Σ yi = 779.2,
Σ xi² = 128,913.93, Σ yi² = 43,745.22, Σ xi yi = 71,347.30, from which

Sxx = 128,913.93 − (1307.5)²/14 = 6802.7693

Sxy = 71,347.30 − (1307.5)(779.2)/14 = −1424.41429

The estimated slope of the true regression line (i.e., the slope of the
least squares line) is

β̂1 = Sxy / Sxx = −1424.41429 / 6802.7693 = −.20938742

We estimate that the expected change in true average cetane number
associated with a 1 g increase in iodine value is −.209, i.e., a decrease
of .209. Since x̄ = 93.392857 and ȳ = 55.657143, the estimated
intercept of the true regression line (i.e., the intercept of the least
squares line) is

β̂0 = ȳ − β̂1 x̄ = 55.657143 − (−.20938742)(93.392857) = 75.212432

The equation of the estimated regression line (least squares line) is
y = 75.212 − .2094x, exactly that reported in the cited article.
The estimated regression line can immediately be used for two
different purposes. For a fixed x value x∗ , βˆ0 + βˆ1 x∗ (the height of the
line above x∗ ) gives either (1) a point estimate of the expected value
of Y when x = x∗ or (2) a point prediction of the Y value that will
result from a single new observation made at x = x∗ . For instance, a
point estimate of the true average cetane number for all biofuels whose
iodine value is 100 is
µ̂Y ·100 = β̂0 + β̂1 (100) = 75.212 − .2094(100) = 54.27
If a single biofuel sample whose iodine value is 100 is to be selected,
54.27 is also a point prediction for the resulting cetane number.
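
The same arithmetic in Python, using the estimates reported above for the cetane example:

b0_hat, b1_hat = 75.212, -0.2094   # least squares estimates from the cetane example
x_star = 100                       # iodine value at which we estimate/predict
y_hat = b0_hat + b1_hat * x_star   # point estimate of the mean (and point prediction)
print(y_hat)                       # prints 54.272, i.e. about 54.27 as above
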
The least squares line should not be used to make a prediction for an
x value much beyond the range of the data, such as x = 40 or x = 150
in the Example given above. The danger of extrapolation is that
the fitted relationship (a line here) may not be valid for such x values.



Estimating σ² and σ

The parameter σ² determines the amount of variability inherent
in the regression model. A large value of σ² will lead to observed
(xi , yi )'s that are typically quite spread out about the true
regression line, whereas when σ² is small the observed points will
tend to fall very close to the true line.
The fitted (or predicted) values ŷ1 , . . . , ŷn are obtained by
successively substituting x1 , . . . , xn into the equation of the
estimated regression line:

ŷ1 = β̂0 + β̂1 x1 , ŷ2 = β̂0 + β̂1 x2 , . . . , ŷn = β̂0 + β̂1 xn

The residuals are the differences y1 − ŷ1 , y2 − ŷ2 , . . . , yn − ŷn
between the observed and fitted y values.



In words

The predicted value ŷi is the value of y that we would predict or
expect when using the estimated regression line with x = xi .
ŷi is the height of the estimated regression line above the value xi
for which the ith observation was made.
The residual yi − ŷi is the vertical deviation between the point
(xi , yi ) and the least squares line - a positive number if the point
lies above the line and a negative number if it lies below the line.



Fitted Values

Relevant summary statistics (n = 20):

Σ xi = 2817.9, Σ yi = 1574.8, Σ xi² = 415,949.85,
Σ xi yi = 222,657.88, Σ yi² = 124,039.58,
x̄ = 140.895, ȳ = 78.74, Sxx = 18,921.8295, Sxy = 776.434

β̂1 = 776.434 / 18,921.8295 = .04103377 ≈ .041
β̂0 = 78.74 − (.04103377)(140.895) = 72.958547 ≈ 72.96
Fitted Values . . .
The least squares line is y = 72.96 + .041x
The fitted values are calculated from ŷi = 72.958547 + .04103377 xi
ŷ1 = 72.958547 + .04103377(125.3) ≈ 78.100, y1 − ŷ1 ≈ −.200

Nine of the 20 residuals are negative, so the corresponding nine points
in a scatterplot of the data lie below the estimated regression line.
Error Sum of Squares
SSE = Σᵢ (yi − ŷi )² = Σᵢ [yi − (β̂0 + β̂1 xi )]²

and the estimate of σ² is

σ̂² = s² = SSE / (n − 2) = Σᵢ (yi − ŷi )² / (n − 2)

The divisor n − 2 in s² is the number of degrees of freedom (df)
associated with SSE and the estimate s². To obtain s², the two
parameters β0 and β1 must first be estimated, which results in a loss
of 2 df (just as in one-sample problems µ had to be estimated, giving
an estimated variance s² = Σᵢ (xi − x̄)² / (n − 1) based on n − 1 df).

The error sum of squares for the filtration rate-moisture content data
as given in the earlier example is

SSE = (−.200)² + (−.188)² + . . . + (1.099)² = 7.968

estimate of σ²: σ̂² = s² = 7.968 / (20 − 2) = .4427

estimated standard deviation: σ̂ = s = √.4427 = .665

.665 is the magnitude of a typical deviation from the estimated
regression line - some points are closer to the line than this and
others are farther away.
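
A short Python sketch of this computation (assuming NumPy; the names are mine): it forms the fitted values and residuals for given coefficient estimates and returns SSE, s² and s. Applied to the filtration rate-moisture content data it would reproduce the values 7.968, .4427 and .665 quoted above.

import numpy as np

def error_sum_of_squares(x, y, b0_hat, b1_hat):
    """Return (SSE, s2, s) computed from the residuals y_i - y_hat_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = b0_hat + b1_hat * x    # fitted values
    residuals = y - y_hat          # observed minus fitted
    sse = np.sum(residuals ** 2)   # error sum of squares
    s2 = sse / (len(x) - 2)        # estimate of sigma^2, n - 2 degrees of freedom
    return sse, s2, np.sqrt(s2)

# Example usage with made-up data:
print(error_sum_of_squares([1, 2, 3, 4], [1.1, 2.3, 2.8, 4.2], b0_hat=0.1, b1_hat=1.0))
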
Error Sum of Squares . . .

Computation of SSE from the defining formula involves much tedious
arithmetic, because both the predicted values and residuals must first
be calculated. Use of the following computational formula does not
require these quantities.

SSE = Σᵢ yi² − β̂0 Σᵢ yi − β̂1 Σᵢ xi yi = Syy − β̂1 Sxy

This follows from substituting ŷi = β̂0 + β̂1 xi into Σᵢ (yi − ŷi )²,
squaring the summand, carrying the sum through the resulting three
terms, and simplifying.

These computational formulas are especially sensitive to the effects of
rounding in β̂0 and β̂1 , so carrying as many digits as possible in
intermediate computations will protect against round-off error.



The necessary summary quantities are n = 14, Σ xi = 890,
Σ xi² = 67,182, Σ yi² = 103.54, Σ yi = 37.6, Σ xi yi = 2234.30.

Sxx = 10,603.4285714, Sxy = −155.98571429, β̂1 = −.0147109,
β̂0 = 3.6209072.

SSE = 103.54 − (3.6209072)(37.6) − (−.0147109)(2234.30) = .2624532

The same value results from

SSE = Syy − β̂1 Sxy = 103.54 − (37.6)²/14 − (−.0147109)(−155.98571429)

Thus s² = .2624532/12 = .0218711 and s = .1479. When β̂0 and β̂1 are
rounded to three decimal places in the first computational formula for
SSE, the result is

SSE = 103.54 − (3.621)(37.6) − (−.015)(2234.30) = .905

which is more than three times the correct value.
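
The round-off effect can be checked directly from the summary quantities given above; the short Python computation below evaluates the computational formula once with the full-precision estimates and once with the rounded ones.

sum_y2, sum_y, sum_xy = 103.54, 37.6, 2234.30   # summary quantities from the example

def sse(b0_hat, b1_hat):
    # computational formula: SSE = sum(y^2) - b0_hat*sum(y) - b1_hat*sum(x*y)
    return sum_y2 - b0_hat * sum_y - b1_hat * sum_xy

print(sse(3.6209072, -0.0147109))   # about .2624 (full-precision estimates)
print(sse(3.621, -0.015))           # about .905  (rounded estimates, badly inflated)
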
Total Sum of Squares

Figure: Using the model to explain y variation: (a) data for which all
variation is explained; (b) data for which most variation is explained; (c)
data for which little variation is explained



The error sum of squares SSE can be interpreted as a measure of how
much variation in y is left unexplained by the model - that is, how
much cannot be attributed to a linear relationship. A quantitative
measure of the total amount of variation in observed y values is given
by the total sum of squares

SST = Syy = Σᵢ (yi − ȳ)² = Σᵢ yi² − (Σᵢ yi )²/n

1. Total sum of squares is the sum of squared deviations about the
sample mean of the observed y values.
2. The same number ȳ is subtracted from each yi in SST, whereas
SSE involves subtracting each different predicted value ŷi from
the corresponding observed yi .
3. SSE is the sum of squared deviations about the least squares line
y = β̂0 + β̂1 x.
4. SST is the sum of squared deviations about the horizontal line at
height ȳ (since then the vertical deviations are yi − ȳ).



Least versus Horizontal

The sum of squared deviations about the least squares line is smaller
than the sum of squared deviations about any other line, so SSE < SST
unless the horizontal line itself is the least squares line.
The ratio SSE/SST is the proportion of total variation that cannot be
explained by the simple linear regression model, and
1 − SSE/SST (a number between 0 and 1) is the proportion of observed y
variation explained by the model.
The coefficient of determination

r² = 1 − SSE / SST
It is interpreted as the proportion of observed y variation that can be
explained by the simple linear regression model (attributed to an
approximate linear relationship between y and x).
The higher the value of r2 , the more successful is the simple
linear regression model in explaining y variation.
When regression analysis is done by a statistical computer
package, either r2 or 100r2 (the percentage of variation explained
by the model) is a prominent part of the output.
If r2 is small, an analyst will usually want to search for an
alternative model (either a nonlinear model or a multiple
regression model that involves more than a single independent
variable) that can more effectively explain y variation.



The iodine value - cetane number data from the earlier example has a
reasonably high r² value. For instance,

β̂0 = 75.212432, β̂1 = −.20938742, Σ yi = 779.2,
Σ yi² = 43,745.22, Σ xi yi = 71,347.30.

We have

SST = 43,745.22 − (779.2)²/14 = 377.174

SSE = 43,745.22 − (75.212432)(779.2) − (−.20938742)(71,347.30) = 78.920

The coefficient of determination is then

r² = 1 − SSE/SST = 1 − (78.920)/(377.174) = .791
That is, 79.1% of the observed variation in cetane number is attributable to
(can be explained by) the simple linear regression relationship between
cetane number and iodine value (r2 values are even higher than this in
many scientific contexts, but social scientists would typically be ecstatic at
a value anywhere near this large).
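
These figures can be verified with a few lines of Python using only the summary quantities quoted above (the variable names are mine):

n = 14
sum_y, sum_y2, sum_xy = 779.2, 43745.22, 71347.30
b0_hat, b1_hat = 75.212432, -0.20938742

sst = sum_y2 - sum_y ** 2 / n                    # total sum of squares, about 377.174
sse = sum_y2 - b0_hat * sum_y - b1_hat * sum_xy  # error sum of squares, about 78.92
r2 = 1 - sse / sst                               # coefficient of determination, about .791
print(sst, sse, r2)
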
Regression Sum of Squares

The coefficient of determination can be written in a slightly different
way by introducing a third sum of squares - the regression sum of squares,

SSR = Σᵢ (ŷi − ȳ)² = SST − SSE

Regression sum of squares is interpreted as the amount of total
variation that is explained by the model. Then we have

r² = 1 − SSE/SST = (SST − SSE)/SST = SSR/SST

the ratio of explained variation to total variation.
