Module 6A Estimating Relationships

The document discusses regression analysis techniques for estimating relationships between variables and making predictions. It defines regression analysis and its uses in predictive analytics. Key aspects covered include formulating linear regression models, interpreting estimated coefficients, assessing goodness of fit using measures like R-squared and standard error, and conducting tests of significance on estimated relationships. Examples are provided to illustrate estimating and interpreting a simple linear regression model.

REGRESSION ANALYSIS:

ESTIMATING RELATIONSHIPS

Predictive Analytics
Learning Objective:
At the end of the lesson, the student should be able to:
• Recall how to estimate and interpret a linear regression model.
• Interpret goodness-of-fit measures.
• Conduct tests of significance.
• Estimate and interpret multiple linear regression models.
Regression Analysis
▪ One of the most widely used techniques in predictive analytics.
▪ Used to capture the relationship between two or more variables and
to predict the outcome of a target variable based on several input
variables.
▪ The hypothesized relationship may be linear, quadratic, or some other
form.
▪ It also allows us to make assessments and robust predictions by
determining which of the relationships matter most or can be
ignored.
Examples of nonlinear relationships
THE LINEAR REGRESSION MODEL
We formulate a linear model that relates the outcome of a target
variable (also called a response, criterion, or dependent variable) to
one or more other input variables (called stimulus, predictor, or
independent variables).
Consequently, we use the information on the predictor variables
to describe and/or predict changes in the response variable.

Business Analytics by Jaggia et al 2021


THE LINEAR REGRESSION MODEL

If the value of the response variable is uniquely determined by the
values of the predictor variables (as in the physical sciences), the
relationship is deterministic. In most fields of research, the relationship
is stochastic due to the omission of relevant factors that influence the
response variable.

Business Analytics by Jaggia et al 2021


Regression Analysis: Types of relationships

Source: Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.
Simple Linear Regression Equation
A common approach to obtaining estimates for the coefficients is to use
the ordinary least squares (OLS) method. OLS estimators have many desirable
properties if certain assumptions hold.
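For reference, the simple linear regression model and its estimated (sample) counterpart are commonly written as

y = β0 + β1 x + ε        (population model)
ŷ = b0 + b1 x            (sample regression equation)

where the OLS estimates are b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1 x̄.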

Business Analytics by Jaggia et al 2021


Business Analytics by Jaggia et al 2021; Business Analytics: data analysis and decision making by Albright & Winston, 2020
Indicators of Linear Relationship: Scatter plots
Scatter plots provide graphical indications of relationships, whether they are linear, non-linear, or essentially non-existent.

In Excel, use either of these functions to get the value of the correlation coefficient:
1. =CORREL(array1, array2)
2. =PEARSON(array1, array2)
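Outside Excel, the same Pearson correlation can be computed in a few lines of Python; a minimal sketch, using the study-hours data that appears in Example 1 later in this module:

import numpy as np

# study hours and exam scores for the 10 students in Example 1
hours = np.array([1, 5, 7, 8, 10, 11, 14, 15, 15, 19])
score = np.array([53, 74, 59, 43, 56, 84, 96, 69, 84, 83])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson correlation coefficient r
r = np.corrcoef(hours, score)[0, 1]
print(round(r, 4))   # roughly 0.63 for these data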
Indicators of Linear relationship: Correlation coefficients
▪ The analysis of bivariate data typically begins with a scatter plot
that displays each observed pair of data (x, y) as a dot on the x-y
plane.
▪ Correlation coefficients are numerical measures that indicate the
strength of LINEAR relationships between pairs of variables.
▪ Correlation is only concerned with the strength of the
relationship.
▪ No causal effect is implied with correlation.
▪ If there is a non-linear relationship, as suggested by the
scatterplot, the correlation can be completely misleading.
Interpreting an Estimated Regression Equation
The slope tells us how much, and in what direction, the dependent or response
variable will change for each one unit increase in the predictor variable. On the
other hand, the intercept is meaningful only if the predictor variable would
reasonably have a value equal to zero.
Equation:
𝑆𝑎𝑙𝑒𝑠 = 268 + 7.37 𝐴𝑑𝑠
Interpretation:
Each extra P1 million of advertising will generate P7.37 million of sales on average.
The firm would average P268 million of sales with zero advertising. However, the
intercept may not be meaningful because Ads = 0 may be outside the range of
observed data.
Interpreting an Estimated Regression Equation
Other examples:
Prediction Using Regression
One of the main uses of regression is to make predictions. Once we have a fitted
regression equation that shows the estimated relationship between X and Y, we can
plug in any value of X (within the range of our sample x values) to obtain the
prediction for Y.
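For example, using the fitted equation from the earlier slide, Sales = 268 + 7.37 Ads, an advertising budget of P10 million (a hypothetical value assumed to lie within the sample range) gives a predicted sales figure of 268 + 7.37(10) = 341.7, or about P341.7 million.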
Example 1
Table 1 shows the Excel output for estimating the linear regression model
Earnings = β0 + β1 Cost + β2 Grad + β3 Debt + β4 City + ε

where Earnings is annual post-college earnings (in $), Cost is the average annual
cost (in $), Grad is the graduation rate (in %), Debt is the percentage of students
paying down debt (in %), and City assumes a value of 1 if the college is located in a
city, 0 otherwise.
a. What is the sample regression equation?
b. Interpret the slope coefficients.
c. Predict annual post-college earnings if a college’s average annual cost is
$25,000, its graduation rate is 60%, its percentage of students paying down
debt is 80%, and it is located in a city.
Table 1
Example 1 Answers
a. The sample regression equation is assembled from the coefficient estimates reported in Table 1.
b. All coefficients are positive, suggesting a positive influence of each predictor variable on the response variable.
Example 1.
Does the number of hours a student studies affect his or her
exam score? Shown in the table are the data for 10 students.

Student  Hours, X  Score, Y
1        1         53
2        5         74
3        7         59
4        8         43
5        10        56
6        11        84
7        14        96
8        15        69
9        15        84
10       19        83
Example 1 Excel output
For the scatterplot:
1. Highlight X array and
Y array.
2. Choose Insert.
3. Choose Scatter among
the chart types
available.
4. Edit the axis labels.
Example 1 Excel output
For regression:
1. Go to Data, choose
Data Analysis.
2. Choose Regression
among the Data
Analysis Tools.
3. Fill in the necessary fields.
4. Click OK.
Example 1 Excel output

The regression equation is:


𝑆𝑐𝑜𝑟𝑒 = 49.477 + 1.9641 ∗ 𝑋 (ℎ𝑜𝑢𝑟𝑠)
Example 1 Excel output

Intercept = 49.477
Example 1 Interpretation of coefficients
𝑆𝑐𝑜𝑟𝑒 = 49.477 + 1.9641 ∗ 𝑋 (ℎ𝑜𝑢𝑟𝑠)

▪ b1 measures the change in the average value of Y that results from a one-unit change in X.
▪ Here, b1 = 1.9641 tells us that the mean exam score increases by 1.9641 points, on average, for each additional hour of studying for the examination.
▪ Also, b0 = 49.477 tells us that a student who did not study would expect a score of about 49.
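The Excel estimates above (and the goodness-of-fit and test statistics discussed on the next slides) can also be reproduced outside Excel; a minimal Python sketch using statsmodels, assuming the ten observations listed earlier:

import numpy as np
import statsmodels.api as sm

# study-hours data from Example 1
hours = np.array([1, 5, 7, 8, 10, 11, 14, 15, 15, 19])
score = np.array([53, 74, 59, 43, 56, 84, 96, 69, 84, 83])

# add a constant column so the fitted model includes an intercept
X = sm.add_constant(hours)
model = sm.OLS(score, X).fit()

print(model.params)     # intercept ~ 49.477, slope ~ 1.9641
print(model.rsquared)   # ~ 0.394
print(model.bse)        # standard errors; slope SE ~ 0.861
print(model.pvalues)    # slope p-value ~ 0.052
print(model.summary())  # full table comparable to Excel's regression output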
Model Selection
• We cannot assess how well the predictor variables explain
the variation in the response variable by simply observing
the sample regression equation.
• Goodness-of-fit measures summarize how well the sample regression
equation fits the data.
• Among these are the standard error of the estimate, the
coefficient of determination, and the adjusted R squared.
Assessing Fit: Coefficient of determination, R2
▪ The coefficient of determination is the portion of the total
variation in the dependent variable that is explained by the
variation in the independent variable.
▪ It is also called r-squared and is obtained using the equation below. The
closer the value of r² is to 1, the better the fit.

r² = SSR / SST = regression sum of squares / total sum of squares

0 ≤ r² ≤ 1
Example 1 Coefficient of determination

r² = SSR / SST = 1020.341 / 2588.90 = 0.3941
39.41% of the variation in scores
is explained by the variation in
study hours.
Standard Error of Estimate
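For a simple linear regression, the standard error of the estimate is commonly computed as

S_YX = sqrt( SSE / (n − 2) ) = sqrt( (SST − SSR) / (n − 2) ),

so for Example 1, S_YX = sqrt( (2588.90 − 1020.341) / 8 ) ≈ 14.00.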
Example 1 Standard error of estimate

𝑆𝑌𝑋 = 14.002
Comparing Standard Errors
𝑆𝑌𝑋 is a measure of the variation of observed Y values from the
regression line.
The magnitude of 𝑆𝑌𝑋 should always be judged relative to the
size of the Y values in the sample data.
Adjusted R squared
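The adjusted R squared is commonly computed as

Adjusted R² = 1 − (1 − R²) (n − 1) / (n − k − 1),

where n is the sample size and k is the number of predictor variables.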
Tests of Significance
We test for joint and individual significance to determine
whether there is evidence of a linear relationship between the
response variable and the predictor variables.
For the tests to be valid, the OLS estimators b1, b2, …, bk
must be normally distributed. This condition is satisfied if the
random error term is normally distributed. If we cannot
assume normality of the errors, then the tests are valid
only for large sample sizes.
Test of Joint Significance or Overall Fit
Table 1
Test of Individual Significance:
▪ The t-test for a population slope is used to determine if there
is a linear relationship between X and Y.
▪ Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test of Individual Significance:
Example 1 Inferences about the slope
The estimated regression equation is:

𝑆𝑐𝑜𝑟𝑒 = 49.477 + 1.9641 ∗ 𝑋 (ℎ𝑜𝑢𝑟𝑠)

The slope of this model is 1.9641.


Is there a relationship between the study
hours and the student’s exam score?
Example 1 Excel output

t = (b1 − 0) / S(b1) = (1.9641 − 0) / 0.8610 = 2.281
df = n − 2 = 10 − 2 = 8
p-value: =T.DIST.2T(2.281, 8) ≈ 0.052
Example 1 Inference about slope

p-value = T.DIST.2T(2.281, 8) ≈ 0.052

• H0 : β1 = 0
• H1 : β1 ≠ 0

Do not reject the null hypothesis since p > α (0.052 > 0.05).

There is not sufficient evidence that study hours affect exam scores.


EXERCISES
Multiple Regression
▪ Multiple regression extends simple regression to include several
independent variables (called predictors or explanatory variables).
▪ It is required when a single-predictor model is inadequate to describe
the relationship between the response variable (Y) and its potential
predictors (X1, X2, X3, …).
▪ The interpretation is similar to simple regression since simple
regression is a special case of multiple regression.
Limitations of Simple Regression
▪ Multiple relationships usually exist.
▪ The estimates are biased if relevant predictors are omitted.
▪ The lack of fit (low R-squared) does not show that X is unrelated to Y
if the true model is multivariate.
▪ Simple regression is only used when there is a compelling need for a
simple model, or when other predictors have only modest effects and a
simple logical predictor “stands out” as doing a very good job all by
itself.
Fitted regression: comparison between
a 1-predictor model versus a 2-predictor model
Characteristics of Multiple Regression
▪ Graphically, you are no longer fitting a line to a set of points. If there
are 2 explanatory variables, you are fitting a plane to the data in 3-
dimensional space.
▪ The regression is still estimated by the least-squares method – that is,
by minimizing the sum of squared residuals.
▪ There is a slope term for each explanatory variable in the equation.
▪ The standard error of estimate and R2 measures are almost exactly the
same as in simple regression.
▪ Many types of explanatory variables can be included in the regression
equation.
Assessing over-all model fit: F-test for joint significance
▪ Before determining which, if any, of the individual predictors
are significant, we perform a global test for overall fit using
the F-test.
▪ The F-test for joint significance is often regarded as a test of
the over-all usefulness of a regression. It determines whether
the predictor variables have a joint statistical influence on y,
the response variable.
Example 1.
A distributor of frozen dessert pies wants to evaluate factors
thought to influence demand.
- the dependent variable is pie sales (units per week)
- the independent variables are price (in USD) and
advertising cost (in hundred USD)

The data are collected for 15 weeks.


Example 1. Obtain the multiple regression equation.

Week  Pie Sales  Price, $  Advertising Costs ($100s)
1     350        5.50      3.3
2     460        7.50      3.3
3     350        8.00      3.0
4     430        8.00      4.5
5     350        6.80      3.0
6     380        7.50      4.0
7     430        4.50      3.0
8     470        6.40      3.7
9     450        7.00      3.5
10    490        5.00      4.0
11    340        7.20      3.5
12    300        7.90      3.2
13    440        5.90      4.0
14    450        5.00      3.5
15    300        7.00      2.7

Multiple regression equation:
Sales = b0 + b1(Price) + b2(Ads cost), or Sales = b0 + b1 X1 + b2 X2
where X1 = Price and X2 = Ads cost.
Use Excel to generate the output

Multiple regression equation:

Sales = 306.526 − 24.975(X1) + 74.131(X2)


Interpretation of the regression coefficients

Sales = 306.526 − 24.975(X1 ) + 74.131(X2 )

b1 = −24.975: Sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.

b2 = 74.131: Sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising cost, net of the effects of changes due to price.
Predict sales for a week in which the selling price is $6.50 and
the advertising cost is $420:

Sales = 306.526 − 24.975(X1) + 74.131(X2)
      = 306.526 − 24.975(6.50) + 74.131(4.20)
      = 455.54

Note that advertising is in $100s, so $420 means X2 = 4.20.
Predicted sales is about 455.5 pies.
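A minimal Python sketch (assuming the 15 weekly observations in the table above) that reproduces the fitted equation and the prediction:

import numpy as np
import statsmodels.api as sm

# pie sales data: weekly sales, price ($), and advertising ($100s)
sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490,
                  340, 300, 440, 450, 300])
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
                  7.20, 7.90, 5.90, 5.00, 7.00])
ads   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
                  3.5, 3.2, 4.0, 3.5, 2.7])

# design matrix with an intercept column, then Price and Ads cost
X = sm.add_constant(np.column_stack([price, ads]))
model = sm.OLS(sales, X).fit()

b0, b1, b2 = model.params          # ~ 306.526, -24.975, 74.131
predicted = b0 + b1 * 6.50 + b2 * 4.20
print(round(predicted, 1))         # ~ 455.5 pies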
Example 1.
Using the pie sales data above, conduct a test to determine whether the
predictor variables are jointly significant in explaining pie sales at the
0.05 level of significance.
ASSESSING OVERALL FIT: F-test for significance
For a regression with k predictors, the hypotheses to be tested are:

Ho: All the true coefficients are zero (𝛽1 = 𝛽2 = ⋯ = 𝛽𝑘 = 0)


H1: At least one of the coefficients is nonzero.
ASSESSING OVERALL FIT: F-test for significance

F = MSR / MSE = 14730.013 / 2252.776 = 6.539

The p-value is 0.012. Reject the null hypothesis at α = 0.05.

There is sufficient evidence that at least one independent variable affects Y.
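As a check, the reported p-value can be computed directly from the F distribution with k = 2 and n − k − 1 = 12 degrees of freedom; a short Python sketch:

from scipy.stats import f

# right-tail probability of F = 6.539 with (2, 12) degrees of freedom
p_value = f.sf(6.539, 2, 12)
print(round(p_value, 3))   # ~ 0.012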
COEFFICIENT OF MULTIPLE DETERMINATION
▪ The coefficient of multiple determination reports the
proportion of total variation in Y that is explained by the
variation of all predictor variables taken together.
▪ It measures the goodness of linear fit.
▪ It is also called R-squared and is obtained by:

R² = SSR / SST = regression sum of squares / total sum of squares

0 ≤ R² ≤ 1
ASSESSING OVERALL FIT: Coeff. of Multiple Determination

R² = SSR / SST = 29460.027 / 56493.333 = 0.521
52.1% of the variation in pie sales
is explained by the variation in
selling price and advertising cost.
ADJUSTED R2
▪ R-squared increases when a new predictor variable X is
added to the model.
▪ This can be a disadvantage when comparing models.
▪ What is the net effect of adding a new variable?
▪ We lose a degree of freedom when a new variable is
added.
▪ Did the new X variable add enough independent power to
offset the loss of one degree of freedom?
ADJUSTED R2
▪ The adjusted R2 shows the proportion of variation in Y explained by
all X variables adjusted for the number of X variables used.

▪ It penalizes excessive use of unimportant predictor variables.


▪ It is smaller than R2.
▪ It is used to monitor whether the equation is getting better or
worse as more variables are added.
Adjusted R2

Adjusted 𝑅2 = 0.442
44.2% of the variation in pie sales is explained by
the variation in selling price and advertising cost,
taking into account the sample size and number
of predictor variables.
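Using the formula for adjusted R² with n = 15 and k = 2:

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (1 − 0.521)(14/12) ≈ 0.44,

which matches the reported 0.442 up to rounding of R².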
How many predictors?
▪ One way to prevent overfitting the model is to limit the
number of predictors based on the sample size.

▪ These rules are merely suggestions.


TEST OF INDIVIDUAL SIGNIFICANCE OF PREDICTORS
▪ We are usually interested in testing each estimated coefficient
to see whether it is significantly different from zero, that is, if a
predictor variable helps explain the variation in Y.
▪ Use t-tests of individual variable slopes.
▪ Shows if there is a linear relationship between the variables Y
and Xi.
▪ Hypotheses: H0: βj = 0 versus H1: βj ≠ 0
▪ Test statistic: t = (bj − 0) / S(bj)
Significance of Price as a predictor
For Price: t = (−24.975 − 0) / 10.832 = −2.306, p = 0.040 < α = 0.05
Significance of advertising cost as a predictor
For Ads cost: t = (74.131 − 0) / 25.967 = 2.855, p = 0.014 < α = 0.05
Reject the null hypothesis for both
variables. There is sufficient evidence that
both price and advertising cost affect pie
sales at the 0.05 level of significance.
SIGNIFICANCE OF PREDICTORS

Business Analytics by Jaggia et al

REPORTING REGRESSION RESULTS
Regression results are often reported in a “user-friendly” table. Table
6.11 reports the regression results for the three models that attempt to
explain annual post-college earnings (Earnings).

Business Analytics by Jaggia et al


REPORTING REGRESSION RESULTS

For Model 1, the predictor variable is the average annual cost (Cost).

For Model 2, the predictors are Cost, the graduation rate (Grad), and the percentage of students paying down debt (Debt).

For Model 3, the predictors are Cost, Grad, Debt, and whether or not a college is located in a city (City equals 1 if a city location, 0 otherwise).
Business Analytics by Jaggia et al
Assumptions of Regression (L.I.N.E)
▪ Linearity – the relationship between X and Y is linear
▪ Independence of errors – the error values (difference between
observed and estimated values) are statistically independent.
▪ Normality of error – the error values are normally distributed
for any given value of X
▪ Equal variance or homoskedasticity – the probability
distribution of the errors has constant variance.
Checking the assumptions by examining the residuals

Residual Analysis for Linearity: plot X against the residuals.

Aside from visually examining the scatter plot of the independent and dependent variables to assess linearity, the scatter plot of the independent variable versus the residuals may also be examined. Curved patterns in the residual plot indicate that the relationship is not linear, and another model should be used.
Checking the assumptions by examining the residuals

Residual Analysis for Equal Variance: plot X against the residuals, or plot the predicted values against the residuals. The spread of the residuals should be roughly constant across the plot.
Checking the assumptions by examining the residuals
Residual Analysis for Normality:
1. Examine the Stem-and-Leaf Display of the Residuals
2. Examine the Box-and-Whisker Plot of the Residuals
3. Examine the Histogram of the Residuals
4. Construct a normal probability plot.
5. Construct a Q-Q plot.
Checking the assumptions by examining the residuals

If the residuals are normal, the normal probability plot and the Q-Q plot should be approximately linear.
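A minimal Python sketch of these normality checks, assuming `model` is a fitted statsmodels OLS result such as the one in the earlier pie-sales sketch:

import matplotlib.pyplot as plt
import scipy.stats as stats

# `model` is assumed to be a fitted statsmodels OLS result from earlier
residuals = model.resid

# histogram of the residuals
plt.hist(residuals, bins=10)
plt.title("Histogram of residuals")
plt.show()

# normal probability (Q-Q) plot; points should fall near a straight line
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()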
Checking the assumptions by examining the residuals
What can we do when residuals are not normal?
1. Consider trimming outliers – but only if they clearly are
mistakes.
2. Can you increase the sample size? If so, it will help assure
asymptotic normality of the estimates.
3. You could try a logarithmic transformation of the variables.
However, this is a new model specification with a different
interpretation of coefficients.
4. You could do nothing, just be aware of the problem.
Checking the assumptions by examining the residuals

Residual Analysis for Independence of Errors: plot the time series of X against the residuals.

Independence of errors means that the distribution of errors is random and is not influenced by or correlated with the errors in prior observations.
Checking the assumptions by examining the residuals

Residual Analysis for Independence of Errors: plot the time series of X against the residuals.

Clearly, independence can be checked when we know the order in which the observations were made. The opposite of independence is autocorrelation.
Measuring Autocorrelation
▪ Another way of checking for independence of errors is by
testing the significance of the Durbin Watson Statistic.
▪ The Durbin-Watson statistic detects the presence of autocorrelation.
▪ It is used when data are collected over time to detect the
presence of autocorrelation.
▪ Autocorrelation exists if residuals in one time period are
related to residuals in another period.
Measuring Autocorrelation
▪ Cross-sectional data may exhibit autocorrelation, but usually
it is an artifact of the order of data entry and so may be
ignored.
▪ The absence of autocorrelation is usually valid for cross-
sectional data, but it is often violated for time series data.
Measuring Autocorrelation
▪ The presence of autocorrelation of errors (or residuals)
violates the regression assumption that residuals are
statistically independent.
The Durbin-Watson, DW, Statistic
▪ The DW statistic is used to test for autocorrelation.
H0: residuals are not correlated
H1: autocorrelation is present

D = Σ (i = 2 to n) (e_i − e_(i−1))² / Σ (i = 1 to n) e_i²

▪ The possible range is 0 ≤ D ≤ 4.
▪ D should be close to 2 if H0 is true.
▪ D less than 2 may signal positive autocorrelation; D greater than 2 may signal negative autocorrelation.
▪ The value of DW can be obtained from software like SPSS, Gretl, and JASP.
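A minimal Python sketch of the formula above; the residual values here are placeholders, not output from the module's examples:

import numpy as np

def durbin_watson(residuals):
    """D = sum_{i=2..n} (e_i - e_{i-1})^2 / sum_{i=1..n} e_i^2"""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# hypothetical residuals; a value of D near 2 suggests no autocorrelation
e = [1.2, -0.8, 0.5, -1.1, 0.9, -0.3, 0.7, -0.6]
print(round(durbin_watson(e), 3))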
Common Violation: Absence of Multicollinearity
▪ Perfect multicollinearity exists when two or more predictor
variables have an exact linear relationship.
▪ Multicollinearity does not violate any of the assumptions.
▪ Its presence results in imprecise estimates of the slope coefficients; that is, the separate influences of the predictor variables become difficult to determine.

Business Analytics by Jaggia et al


How to detect multicollinearity
▪ Examine the correlations between the predictors. Multicollinearity is severe if the sample correlation coefficient between any two predictor variables is more than 0.80 or less than −0.80.
▪ If we find a high R² and a significant F statistic coupled with individually insignificant predictors, multicollinearity may be an issue.
▪ The variance inflation factor (VIF), available in JASP or Gretl, is another measure that can detect a high correlation between predictors. As a rule of thumb, VIF should be less than 10 for cross-sectional data. (A sketch for computing VIF appears below.)
▪ The problem is remedied by dropping one of the collinear predictors.

Business Analytics by Jaggia et al

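A minimal sketch for computing VIFs in Python, using the pie-sales predictors as an illustration (JASP and Gretl report the same quantity from their regression output):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
                  7.20, 7.90, 5.90, 5.00, 7.00])
ads   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
                  3.5, 3.2, 4.0, 3.5, 2.7])

# VIFs are computed column by column from the design matrix (with constant);
# only the predictors' VIFs matter, and values above 10 suggest severe multicollinearity
X = sm.add_constant(np.column_stack([price, ads]))
for i, name in enumerate(["const", "price", "ads"]):
    print(name, round(variance_inflation_factor(X, i), 2))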

Example 1 Excel output for assessing assumptions
The residual plot shows
that the assumptions of
linearity and constant
variance are satisfied.

The assumption of
normality of residuals is
satisfied since the points
follow a straight line.
Example 1 in JASP: Correlation Analysis
Example 1 in JASP: Regression analysis with assumption checks
Example 1 in Gretl: Regression analysis
Example 1 in Gretl: Output
Example 1 in Gretl: Assumption checks
Strategies when performing regression analysis
▪ Start with a scatter plot of X on Y to observe possible
relationship.
▪ Perform residual analysis to check the assumptions.
▪ Plot the residuals vs X to check for violations of
assumptions such as equal variance.
▪ Use a histogram, stem and leaf display, box and whisker
plot or normal probability plot of the residuals to uncover
possible non-normality.
Strategies when performing regression analysis
▪ If there is any violation of any assumption, use alternative
methods or models.
▪ If there is no evidence of assumption violation, then test for
the significance of the regression coefficients.
▪ Avoid making predictions or forecasts outside the relevant
range.
Exercises:
Given data on the selling price of a home (in thousands of $), home size (in sq ft), lot size (in thousands of sq ft), and the number of bathrooms:
a. Obtain the regression equation using
Price as response variable and the rest as
predictors.
b. Describe the results in terms of model fit.
c. Describe the results in terms of the
coefficient of determination.
d. Interpret the different coefficients.
Some helpful sources:
▪ The Four Assumptions of Linear Regression – Statology
