Simple Linear Regression Analysis
Regression analysis is a statistical technique that attempts to explore and model the relationship between two or more variables. For
example, an analyst may want to know if there is a relationship between road accidents and the age of the driver. Regression analysis
forms an important part of the statistical analysis of the data obtained from designed experiments and is discussed briefly in this chapter.
Every experiment analyzed in a Weibull++ DOE folio includes regression results for each of the responses. These results,
along with the results from the analysis of variance (explained in the One Factor Designs and General Full Factorial Designs chapters),
provide information that is useful to identify significant factors in an experiment and explore the nature of the relationship between these
factors and the response. Regression analysis forms the basis for all Weibull++ DOE folio calculations
related to the sum of squares used in the analysis of variance. The reason for this is explained in Appendix B. Additionally, DOE folios
also include a regression tool to see if two or more variables are related, and to explore the nature of the relationship between them.
This chapter discusses simple linear regression analysis while a subsequent chapter focuses on multiple linear regression analysis.
As an example, consider an experiment in which the yield of a chemical process is recorded at a number of temperature settings. This data can be entered in the DOE folio as shown in the following figure:
$$y = \beta_0 + \beta_1 x + \epsilon$$

The above equation is the linear regression model that can be used to explain the relation between $x$ and $y$ that is seen on the scatter plot above. In this model, the mean value of $y$ (abbreviated as $E(y)$) is assumed to follow the linear relation:

$$E(y) = \beta_0 + \beta_1 x$$

The actual values of $y$ (which are observed as yield from the chemical process from time to time and are random in nature) are assumed to be the sum of the mean value, $E(y)$, and a random error term, $\epsilon$:

$$y = E(y) + \epsilon = \beta_0 + \beta_1 x + \epsilon$$

The regression model here is called a simple linear regression model because there is just one independent variable, $x$, in the model. In regression models, the independent variables are also referred to as regressors or predictor variables. The dependent variable, $y$, is also referred to as the response. The slope, $\beta_1$, and the intercept, $\beta_0$, of the line are called regression coefficients. The slope, $\beta_1$, can be interpreted as the change in the mean value of $y$ for a unit change in $x$.

The random error term, $\epsilon$, is assumed to follow the normal distribution with a mean of 0 and variance of $\sigma^2$. Since $y$ is the sum of this random term and the mean value, $E(y)$, which is a constant, the variance of $y$ at any given value of $x$ is also $\sigma^2$. Therefore, at any given value of $x$, say $x_i$, the dependent variable $y$ follows a normal distribution with a mean of $\beta_0 + \beta_1 x_i$ and a standard deviation of $\sigma$. This is illustrated in the following figure.
The true regression line is usually not known. However, the regression line can be estimated by estimating the coefficients $\beta_1$ and $\beta_0$ for the observed data. The estimates, $\hat{\beta}_1$ and $\hat{\beta}_0$, are calculated using least squares:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

where $\bar{y}$ is the mean of all the observed values and $\bar{x}$ is the mean of all values of the predictor variable at which the observations were taken. $\hat{\beta}_1$ is calculated using the first of these relations and $\hat{\beta}_0$ is then calculated using the second.

Once $\hat{\beta}_0$ and $\hat{\beta}_1$ are known, the fitted regression line can be written as:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where $\hat{y}$ is the fitted or estimated value based on the fitted regression model. It is an estimate of the mean value, $E(y)$. The fitted value, $\hat{y}_i$, for a given value of the predictor variable, $x_i$, may be different from the corresponding observed value, $y_i$. The difference between the two values is called the residual, $e_i$:

$$e_i = y_i - \hat{y}_i$$
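To make the estimation procedure concrete, the following minimal sketch computes the least squares estimates, fitted values and residuals with NumPy. It is not part of the original article, and the x and y arrays are hypothetical stand-ins for the temperature and yield data in the preceding table:

```python
import numpy as np

# Hypothetical stand-ins for the temperature (x) and yield (y) data
x = np.array([50.0, 60.0, 70.0, 80.0, 90.0, 100.0])
y = np.array([122.0, 139.0, 155.0, 173.0, 188.0, 205.0])

x_bar, y_bar = x.mean(), y.mean()

# Least squares estimates: beta1_hat = Sxy / Sxx, beta0_hat = y_bar - beta1_hat * x_bar
S_xy = np.sum((x - x_bar) * (y - y_bar))
S_xx = np.sum((x - x_bar) ** 2)
beta1_hat = S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values and residuals
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```

Later sketches in this chapter reuse the names defined here (x, y, y_hat, residuals, S_xx, beta0_hat, beta1_hat).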
The least squares estimates of the regression coefficients can be obtained for the data in the preceding table as follows:
Once the fitted regression line is known, the fitted value of $y$ corresponding to any observed data point can be calculated. For example, the fitted value corresponding to the 21st observation in the preceding table is:

$$\hat{y}_{21} = \hat{\beta}_0 + \hat{\beta}_1 x_{21}$$

The observed response at this point is $y_{21}$. Therefore, the residual at this point is:

$$e_{21} = y_{21} - \hat{y}_{21}$$
In DOE folios, fitted values and residuals can be calculated. The values are shown in the figure below.
t Tests
The $t$ tests are used to conduct hypothesis tests on the regression coefficients obtained in simple linear regression. A statistic based on the $t$ distribution is used to test the two-sided hypothesis that the true slope, $\beta_1$, equals some constant value, $\beta_{1,0}$. The statements for the hypothesis test are expressed as:

$$H_0: \beta_1 = \beta_{1,0}$$
$$H_1: \beta_1 \neq \beta_{1,0}$$

The test statistic, $T_0$, follows a $t$ distribution with $(n - 2)$ degrees of freedom, where $n$ is the total number of observations. The null hypothesis, $H_0$, is accepted if the calculated value of the test statistic is such that:

$$-t_{\alpha/2, n-2} < T_0 < t_{\alpha/2, n-2}$$

where $-t_{\alpha/2, n-2}$ and $t_{\alpha/2, n-2}$ are the critical values for the two-sided hypothesis. $t_{\alpha/2, n-2}$ is the percentile of the $t$ distribution corresponding to a cumulative probability of $(1 - \alpha/2)$ and $\alpha$ is the significance level.
If the value of $\beta_{1,0}$ used is zero, then the hypothesis tests for the significance of regression. In other words, the test indicates if the fitted regression model is of value in explaining variations in the observations or if you are trying to impose a regression model when no true relationship exists between $x$ and $y$. Failure to reject $H_0: \beta_1 = 0$ implies that no linear relationship exists between $x$ and $y$.
This result may be obtained when the scatter plots of $y$ against $x$ are as shown in (a) and (b) of the following figure. (a) represents the case where no model exists for the observed data. In this case you would be trying to fit a regression model to noise or random variation. (b) represents the case where the true relationship between $x$ and $y$ is not linear. (c) and (d) represent the cases when $H_0: \beta_1 = 0$ is rejected, implying that a model does exist between $x$ and $y$. (c) represents the case where the linear model is sufficient, while (d) represents the case where a higher order model may be needed.
The test statistic for this test is:

$$T_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{se(\hat{\beta}_1)}$$

where $\hat{\beta}_1$ is the least squares estimate of $\beta_1$, and $se(\hat{\beta}_1)$ is its standard error, which is calculated using:

$$se(\hat{\beta}_1) = \sqrt{\frac{MS_E}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}$$
Example
The test for the significance of regression for the data in the preceding table is illustrated in this example. The test is carried out using the $t$ test on the coefficient $\beta_1$. The hypothesis to be tested is $H_0: \beta_1 = 0$. To calculate the statistic to test $H_0$, the estimate, $\hat{\beta}_1$, and the standard error, $se(\hat{\beta}_1)$, are needed. The value of $\hat{\beta}_1$ was obtained in this section. The standard error can be calculated as follows:

Then, the test statistic can be calculated using the following equation:

The $p$ value corresponding to this statistic based on the $t$ distribution with 23 ($n - 2 = 25 - 2 = 23$) degrees of freedom can be obtained as follows:
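The same calculation can be reproduced in code. This sketch continues from the least squares snippet above (reusing x, residuals, S_xx and beta1_hat); the values it prints depend on the hypothetical data used there, not on the table in the article:

```python
import numpy as np
from scipy import stats

# Test H0: beta1 = 0 using T0 = beta1_hat / se(beta1_hat)
n = len(x)                               # total number of observations
MSE = np.sum(residuals ** 2) / (n - 2)   # error mean square, estimates sigma^2
se_beta1 = np.sqrt(MSE / S_xx)           # standard error of the slope estimate

T0 = beta1_hat / se_beta1

# Two-sided p value from the t distribution with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(T0), df=n - 2)
print(f"T0 = {T0:.3f}, p value = {p_value:.4f}")
```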
In Weibull++ DOE folios, information related to the $t$ test is displayed in the Regression Information table as shown in the following figure. In this table the test for $\beta_1$ is displayed in the row for the term Temperature because $\beta_1$ is the coefficient that represents the variable temperature in the regression model. The columns labeled Standard Error, T Value and P Value represent the standard error, the test statistic for the $t$ test and the $p$ value for the test, respectively. These values have been calculated for $\beta_1$ in this example. The Coefficient column represents the estimate of the regression coefficients. The Effect column represents values obtained by multiplying the coefficients by a factor of 2. This value is useful in the case of two level factorial experiments and is explained in Two Level Factorial Experiments. The Low Confidence and High Confidence columns represent the limits of the confidence intervals for the regression coefficients and are explained in Confidence Interval on Regression Coefficients.
Analysis of Variance Approach to Test the Significance of Regression

The analysis of variance (ANOVA) is another method to test for the significance of regression. As the name implies, this approach uses the variance of the observed data to determine if a regression model can be applied to the observed data. The observed variance is partitioned into components that are then used in the test for significance of regression.
Sum of Squares
The total variance (i.e., the variance of all of the observed data) is estimated using the observed data. As mentioned in Statistical Background, the variance of a population can be estimated using the sample variance, which is calculated using the following relationship:

$$s^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}$$

The quantity in the numerator of the previous equation is called the sum of squares. It is the sum of the squares of the deviations of all the observations, $y_i$, from their mean, $\bar{y}$. In the context of ANOVA this quantity is called the total sum of squares (abbreviated $SS_T$) because it relates to the total variance of the observations. Thus:

$$SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$$
When you attempt to fit a regression model to the observations, you are trying to explain some of the variation of the observations using this model. If the regression model is such that the resulting fitted regression line passes through all of the observations, then you would have a "perfect" model (see (a) of the figure below). In this case the model would explain all of the variability of the observations. Therefore, the model sum of squares (also referred to as the regression sum of squares and abbreviated $SS_R$) equals the total sum of squares; i.e., the model explains all of the observed variance:

$$SS_R = SS_T$$

For the perfect model, the regression sum of squares, $SS_R$, equals the total sum of squares, $SS_T$, because all estimated values, $\hat{y}_i$, will equal the corresponding observations, $y_i$. $SS_R$ can be calculated using a relationship similar to the one for obtaining $SS_T$ by replacing $y_i$ with $\hat{y}_i$ in the relationship for $SS_T$. Therefore:

$$SS_R = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$
Based on the preceding discussion of ANOVA, a perfect regression model exists when the fitted regression line passes through all
observed points. However, this is not usually the case, as seen in (b) of the following figure.
The number of degrees of freedom associated with $SS_T$, $dof(SS_T)$, is $(n - 1)$. The total variability of the observed data (i.e., the total sum of squares, $SS_T$) can be written using the portion of the variability explained by the model, $SS_R$, and the portion unexplained by the model, $SS_E$, as:

$$SS_T = SS_R + SS_E$$

The above equation is also referred to as the analysis of variance identity and can be expanded as follows:

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
As mentioned previously, mean squares are obtained by dividing the sum of squares by the respective degrees of freedom. For example, the error mean square, $MS_E$, can be obtained as:

$$MS_E = \frac{SS_E}{dof(SS_E)} = \frac{SS_E}{n - 2}$$

The error mean square is an estimate of the variance, $\sigma^2$, of the random error term, $\epsilon$, and can be written as:

$$\hat{\sigma}^2 = MS_E$$

Similarly, the regression mean square, $MS_R$, can be obtained by dividing the regression sum of squares by the respective degrees of freedom as follows:

$$MS_R = \frac{SS_R}{dof(SS_R)} = \frac{SS_R}{1}$$
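In code, the ANOVA decomposition follows directly from the fitted values. This sketch continues the earlier snippets (reusing y, y_hat and n; the data remain hypothetical):

```python
import numpy as np

# ANOVA identity: SST = SSR + SSE
SST = np.sum((y - y.mean()) ** 2)       # total sum of squares, dof = n - 1
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares, dof = 1
SSE = np.sum((y - y_hat) ** 2)          # error sum of squares, dof = n - 2

MSR = SSR / 1         # regression mean square
MSE = SSE / (n - 2)   # error mean square, an estimate of sigma^2
print(f"SST = {SST:.2f}, SSR = {SSR:.2f}, SSE = {SSE:.2f}")
```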
F Test
To test the hypothesis $H_0: \beta_1 = 0$, the statistic used is based on the $F$ distribution. It can be shown that if the null hypothesis is true, then the statistic:

$$F_0 = \frac{MS_R}{MS_E}$$

follows the $F$ distribution with 1 degree of freedom in the numerator and $(n - 2)$ degrees of freedom in the denominator. $H_0$ is rejected if the calculated statistic, $F_0$, is such that $F_0 > f_{\alpha, 1, n-2}$, where $f_{\alpha, 1, n-2}$ is the percentile of the $F$ distribution corresponding to a cumulative probability of $(1 - \alpha)$ and $\alpha$ is the significance level.
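The corresponding calculation in code, continuing from the sums of squares sketch above (MSR, MSE and n are reused; this is a hedged sketch, not the DOE folio's implementation):

```python
from scipy import stats

# F test for significance of regression: reject H0: beta1 = 0 when F0
# exceeds the (1 - alpha) percentile of the F(1, n - 2) distribution
F0 = MSR / MSE
p_value_F = stats.f.sf(F0, dfn=1, dfd=n - 2)   # P(F > F0)
print(f"F0 = {F0:.3f}, p value = {p_value_F:.4f}")
```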
Example
The analysis of variance approach to test the significance of regression can be applied to the yield data in the preceding table. To calculate the statistic, $F_0$, for the test, the sums of squares have to be obtained. They can be calculated as shown next. The total sum of squares can be calculated as:

Knowing the sums of squares, the statistic to test $H_0: \beta_1 = 0$ can be calculated as follows:
Assuming that the desired significance level is 0.1, since the $p$ value < 0.1, $H_0: \beta_1 = 0$ is rejected, implying that a relation does exist between temperature and yield for the data in the preceding table. Using this result along with the scatter plot of the above figure, it can be concluded that the relationship between temperature and yield is linear. This result is displayed in the ANOVA table as shown in the following figure. Note that this is the same result that was obtained from the $t$ test in the section t Tests. The ANOVA and Regression Information tables in Weibull++ DOE folios represent two different ways to test for the significance of the regression model. In the case of multiple linear regression models these tables are expanded to allow tests on the individual variables used in the model. This is done using extra sum of squares. Multiple linear regression models and the application of extra sum of squares in the analysis of these models are discussed in Multiple Linear Regression Analysis.
Confidence Intervals in Simple Linear Regression

A 100($1 - \alpha$) percent confidence interval on the mean response at a given value of the predictor variable, $x_0$, is:

$$\hat{y}_0 \pm t_{\alpha/2, n-2}\sqrt{MS_E\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$$

It can be seen that the width of the confidence interval depends on the value of $x_0$: the interval is narrowest at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases.
For the data in the preceding table, assume that a new value of the yield is observed after the regression model is fit to the data. This new observation is independent of the observations used to obtain the regression model. If $x_p$ is the level of the temperature at which the new observation was taken, then the estimate for this new value based on the fitted regression model is:

$$\hat{y}_p = \hat{\beta}_0 + \hat{\beta}_1 x_p$$

If a confidence interval needs to be obtained on $\hat{y}_p$, then this interval should include both the error from the fitted model and the error associated with future observations. This is because $\hat{y}_p$ represents the estimate for a value of $y$ that was not used to obtain the regression model. The confidence interval on $\hat{y}_p$ is referred to as the prediction interval. A 100($1 - \alpha$) percent prediction interval on a new observation is obtained as follows:

$$\hat{y}_p \pm t_{\alpha/2, n-2}\sqrt{MS_E\left(1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}}\right)}$$
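As a sketch of how these limits can be computed, the snippet below continues the earlier ones (reusing beta0_hat, beta1_hat, x_bar, S_xx, MSE and n); the new temperature level x_p is hypothetical:

```python
import numpy as np
from scipy import stats

# 95% prediction interval for a new observation at a hypothetical level x_p
x_p = 85.0
y_p_hat = beta0_hat + beta1_hat * x_p   # point estimate for the new value

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
margin = t_crit * np.sqrt(MSE * (1 + 1 / n + (x_p - x_bar) ** 2 / S_xx))
print(f"95% prediction interval: ({y_p_hat - margin:.2f}, {y_p_hat + margin:.2f})")
```

Dropping the leading 1 inside the square root gives the narrower confidence interval on the mean response instead.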
Example

To illustrate the calculation of confidence intervals, the 95% confidence interval on the mean response at a given temperature level, $x_0$, for the data in the preceding table is obtained in this example. A 95% prediction interval is also obtained assuming that a new observation for the yield was made at the same temperature level.

The lower and upper 95% limits on the mean response are 199.95 and 205.2, respectively. The estimated value based on the fitted regression model for the new observation at this level is:
Coefficient of Determination (R-sq)

The coefficient of determination, $R^2$, is a measure of the amount of variability in the data accounted for by the regression model. As mentioned previously, the total variability of the data is measured by the total sum of squares, $SS_T$. The amount of this variability explained by the regression model is the regression sum of squares, $SS_R$. The coefficient of determination is the ratio of the regression sum of squares to the total sum of squares:

$$R^2 = \frac{SS_R}{SS_T}$$
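Continuing the running sketch (SSR and SST come from the ANOVA snippet above, so the printed value reflects the hypothetical data, not the table):

```python
# Coefficient of determination: the fraction of total variability
# explained by the regression model
R2 = SSR / SST
print(f"R-sq = {R2:.4f}")
```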
Therefore, 98% of the variability in the yield data is explained by the regression model, indicating a very good fit of the model. It may appear that larger values of $R^2$ indicate a better fitting regression model. However, $R^2$ should be used cautiously as this is not always the case. The value of $R^2$ increases as more terms are added to the model, even if the new term does not contribute significantly to the model. Therefore, an increase in the value of $R^2$ cannot be taken as a sign that the new model is superior to the older model. Adding a new term may make the regression model worse if the error mean square, $MS_E$, for the new model is larger than the $MS_E$ of the older model, even though the new model will show an increased value of $R^2$. In the results obtained from the DOE folio, $R^2$ is displayed as R-sq under the ANOVA table (as shown in the figure below), which displays the complete analysis sheet for the data in the preceding table.
The other values displayed with $R^2$ are S, R-sq(adj), PRESS and R-sq(pred). These values measure different aspects of the adequacy of the regression model. For example, the value of S is the square root of the error mean square, $\sqrt{MS_E}$, and represents the "standard error of the model." A lower value of S indicates a better fitting model. The values of S, R-sq and R-sq(adj) indicate how well the model fits the observed data. The values of PRESS and R-sq(pred) are indicators of how well the regression model predicts new observations. R-sq(adj), PRESS and R-sq(pred) are explained in Multiple Linear Regression Analysis.
Residual Analysis

In the simple linear regression model the true error terms, $\epsilon_i$, are never known. The residuals, $e_i$, may be thought of as the observed error terms that are similar to the true error terms. Since the true error terms, $\epsilon_i$, are assumed to be normally distributed with a mean of zero and a variance of $\sigma^2$, in a good model the observed error terms (i.e., the residuals, $e_i$) should also follow these assumptions. Thus the residuals in the simple linear regression should be normally distributed with a mean of zero and a constant variance of $\sigma^2$. Residuals are usually plotted against the fitted values, $\hat{y}_i$, against the predictor variable values, $x_i$, and against time or run-order sequence, in addition to the normal probability plot. Plots of residuals are used to check for the following:

1. Residuals follow the normal distribution.
2. Residuals have a constant variance.
3. The regression function is appropriate (the residuals show no systematic pattern).
4. Residuals are independent of one another (no time or run-order effects).
Examples of residual plots are shown in the following figure. (a) is a satisfactory plot with the residuals falling in a horizontal band with no systematic pattern. Such a plot indicates an appropriate regression model. (b) shows residuals falling in a funnel shape. Such a plot indicates an increase in the variance of the residuals, so the assumption of constant variance is violated here. A transformation on $y$ may be helpful in this case (see Transformations). If the residuals follow the pattern of (c) or (d), then this is an indication that the linear regression model is not adequate. Addition of higher order terms to the regression model or a transformation on $x$ or $y$ may be required in such cases. A plot of residuals may also show a pattern as seen in (e), indicating that the residuals increase (or decrease) as the run order sequence or time progresses. This may be due to factors such as operator learning or instrument creep and should be investigated further.
Residual plots for the data of the preceding table are shown in the following figures. The first is the normal probability plot; it can be observed that the residuals follow the normal distribution and the assumption of normality is valid here. In the second figure the residuals are plotted against the fitted values, $\hat{y}_i$, and in the third the residuals are plotted against the run order. Both of these plots show that the 21st observation seems to be an outlier. Further investigations are needed to study the cause of this outlier.
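Diagnostic plots like the ones described above can be produced with matplotlib and scipy; a minimal sketch, reusing residuals, y_hat and n from the earlier snippets:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Three standard residual diagnostics: normal probability plot,
# residuals vs. fitted values, residuals vs. run order
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

stats.probplot(residuals, dist="norm", plot=axes[0])
axes[0].set_title("Normal probability plot")

axes[1].scatter(y_hat, residuals)
axes[1].axhline(0, linestyle="--")
axes[1].set(xlabel="Fitted value", ylabel="Residual", title="Residuals vs. fitted")

axes[2].plot(range(1, n + 1), residuals, marker="o")
axes[2].axhline(0, linestyle="--")
axes[2].set(xlabel="Run order", ylabel="Residual", title="Residuals vs. run order")

plt.tight_layout()
plt.show()
```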
Lack-of-Fit Test

As mentioned in Analysis of Variance Approach, a perfect regression model results in a fitted line that passes exactly through all observed data points. This perfect model will give us a zero error sum of squares ($SS_E = 0$). Thus, no error exists for the perfect model. However, if you record the response values for the same values of $x$ a second time, in conditions maintained as strictly identical as possible to the first time, the observations from the second time will not all fall along the perfect model. The deviations in the observations recorded the second time constitute the "purely" random variation or noise. The sum of squares due to pure error (abbreviated $SS_{PE}$) quantifies these variations. $SS_{PE}$ is calculated by taking repeated observations at some or all values of $x$ and adding up the squares of the deviations at each level of $x$ using the respective repeated observations at that $x$ value.
Assume that there are $m$ levels of $x$ and repeated observations are taken at each of these levels. The data is collected as shown next:

The sum of squares of the deviations from the mean of the observations at the $i$th level of $x$, $SS_i$, can be calculated as:

$$SS_i = \sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

where $\bar{y}_i$ is the mean of the $n_i$ repeated observations corresponding to $x_i$ ($i = 1, 2, \ldots, m$). The number of degrees of freedom associated with $SS_i$ is $(n_i - 1)$.

The total sum of square deviations (or $SS_{PE}$) for all levels of $x$ can be obtained by summing the deviations for all $x_i$ as shown next:

$$SS_{PE} = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$$

If all $n_i$ are equal to $n_o$ (i.e., $n_o$ repeated observations are taken at all levels of $x$), then $n = m \cdot n_o$ and the degrees of freedom associated with $SS_{PE}$ are:

$$dof(SS_{PE}) = \sum_{i=1}^{m}(n_i - 1) = m(n_o - 1)$$
When repeated observations are used for a perfect regression model, the sum of squares due to pure error, $SS_{PE}$, is also considered as the error sum of squares, $SS_E$. For the case when repeated observations are used with imperfect regression models, there are two components of the error sum of squares, $SS_E$. One portion is the pure error due to the repeated observations. The other portion is the error that represents variation not captured because of the imperfect model. The second portion is termed the sum of squares due to lack-of-fit (abbreviated $SS_{LOF}$) to point to the deficiency in fit due to the departure from the perfect-fit model. Thus, for an imperfect regression model:

$$SS_E = SS_{PE} + SS_{LOF}$$
Knowing $SS_E$ and $SS_{PE}$, the sum of squares due to lack-of-fit, $SS_{LOF}$, can be obtained by subtraction. The degrees of freedom associated with $SS_{LOF}$ can be obtained in a similar manner using subtraction. For the case when repeated observations are taken at all levels of $x$, the number of degrees of freedom associated with $SS_{LOF}$ is:

$$dof(SS_{LOF}) = dof(SS_E) - dof(SS_{PE}) = (n - 2) - (n - m) = m - 2$$
The magnitude of $SS_{LOF}$ or $MS_{LOF}$ will provide an indication of how far the regression model is from the perfect model. An $F$ test exists to examine the lack-of-fit at a particular significance level. The quantity $MS_{LOF}/MS_{PE}$ follows an $F$ distribution with $(m - 2)$ degrees of freedom in the numerator and $(n - m)$ degrees of freedom in the denominator when all $n_i$ equal $n_o$. The test statistic for the lack-of-fit test is:

$$F_0 = \frac{MS_{LOF}}{MS_{PE}} = \frac{SS_{LOF}/(m - 2)}{SS_{PE}/(n - m)}$$

If the calculated value of the statistic exceeds the critical value $f_{\alpha, m-2, n-m}$, it will lead to the rejection of the hypothesis that the model adequately fits the data.
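The following self-contained sketch shows one way to carry out this test in Python; the replicated arrays are hypothetical and stand in for a response measured twice at each level of $x$:

```python
import numpy as np
from scipy import stats

# Hypothetical replicated data: two observations of the response at each level of x
x_rep = np.array([50, 50, 60, 60, 70, 70, 80, 80, 90, 90], dtype=float)
y_rep = np.array([122, 125, 139, 136, 155, 159, 173, 170, 188, 192], dtype=float)

n_total = len(y_rep)
levels = np.unique(x_rep)
m = len(levels)

# Least squares fit to all observations
b1 = (np.sum((x_rep - x_rep.mean()) * (y_rep - y_rep.mean()))
      / np.sum((x_rep - x_rep.mean()) ** 2))
b0 = y_rep.mean() - b1 * x_rep.mean()

# Error sum of squares from the fitted line
SS_E = np.sum((y_rep - (b0 + b1 * x_rep)) ** 2)

# Pure-error sum of squares: squared deviations from the mean at each level
SS_PE = sum(np.sum((y_rep[x_rep == lv] - y_rep[x_rep == lv].mean()) ** 2)
            for lv in levels)

# Lack-of-fit sum of squares by subtraction, then the F statistic
SS_LOF = SS_E - SS_PE
F0 = (SS_LOF / (m - 2)) / (SS_PE / (n_total - m))
p_value = stats.f.sf(F0, dfn=m - 2, dfd=n_total - m)
print(f"F0 = {F0:.3f}, p value = {p_value:.4f}")
```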
Example
Assume that a second set of observations is taken for the yield data of the preceding table. The resulting observations are recorded in the following table. To conduct a lack-of-fit test on this data, the statistic $F_0 = MS_{LOF}/MS_{PE}$ can be calculated as shown next.
Using the fitted values, the error sum of squares, $SS_E$, can be obtained as follows:

The error sum of squares, $SS_E$, can now be split into the sum of squares due to pure error, $SS_{PE}$, and the sum of squares due to lack-of-fit, $SS_{LOF}$. $SS_{PE}$ can be calculated as follows, using the number of levels, $m$, and the number of repeated observations, $n_i$, for this example:

The test statistic for the lack-of-fit test can now be calculated as:

Since $F_0 < f_{\alpha, m-2, n-m}$, we fail to reject the hypothesis that the model adequately fits the data. The $p$ value for this case is:
Transformations
The linear regression model may not be directly applicable to certain data. Non-linearity may be detected from scatter plots, may be known through the underlying theory of the product or process, or may be known from past experience. Transformations on either the predictor variable, $x$, or the response variable, $y$, may often be sufficient to make the linear regression model appropriate for the transformed data. If it is known that the data follows the logarithmic distribution, then a logarithmic transformation on $y$ (i.e., $y^* = \ln(y)$) might be useful. For data following the Poisson distribution, a square root transformation ($y^* = \sqrt{y}$) is generally applicable. Transformations on $y$ may also be applied based on the type of scatter plot obtained from the data. The following figure shows a few such examples.
The Box-Cox method may also be used to automatically identify a suitable power transformation for the data based on the relation:

$$y^{(\lambda)} = y^{\lambda}$$

Here the parameter $\lambda$ is determined using the given data such that $SS_E$ is minimized (details on this method are presented in One Factor Designs).
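For a quick experiment outside the DOE folio, scipy provides a Box-Cox helper. Note that it chooses $\lambda$ by maximizing the log-likelihood rather than by minimizing $SS_E$ directly, so this sketch (with hypothetical data) is an approximation of the idea rather than the method described above:

```python
import numpy as np
from scipy import stats

# Hypothetical positive-valued response data
y = np.array([122.0, 139.0, 155.0, 173.0, 188.0, 205.0])

# boxcox returns the transformed data and the lambda that maximizes
# the Box-Cox log-likelihood
y_transformed, lam = stats.boxcox(y)
print(f"Suggested lambda = {lam:.3f}")
```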