For time series data, use a line chart.

Common trendline models:
Linear: y = a + bx
Logarithmic: y = a + b ln(x)
Polynomial (2nd order): y = ax² + bx + c
Polynomial (3rd order): y = ax³ + bx² + cx + d
Power: y = ax^b
Exponential: y = ab^x (the base of natural logarithms, e = 2.71828…, is often used for the constant b)

To fit a trendline in Excel, right-click on the data series and choose Add Trendline from the pop-up menu, then check the boxes Display Equation on chart and Display R-squared value on chart.

R² (R-squared) is a measure of the "fit" of the line to the data.
◦ The value of R² will be between 0 and 1.
◦ A value of 1.0 indicates a perfect fit, in which all data points lie on the line; the larger the value of R², the better the fit.

Linear demand function: Sales = 20,512 − 9.5116 × price

Line chart of historical crude oil prices: Excel's Trendline tool is used to fit various functions to the data.
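The same kinds of fits can be reproduced outside Excel. Below is a minimal Python sketch, using a made-up placeholder series rather than the actual crude oil prices, that fits polynomial trendlines of increasing order and an exponential trendline and reports R² for each:

```python
import numpy as np

# Placeholder series standing in for a column of historical prices;
# substitute the real data before drawing any conclusions.
y = np.array([61.8, 63.0, 65.9, 64.1, 67.4, 70.2, 73.9, 74.0, 78.6, 81.3])
x = np.arange(1, len(y) + 1)              # time index 1, 2, 3, ...

def r_squared(y_actual, y_fitted):
    """R^2 = 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_actual - y_fitted) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Polynomial trendlines of increasing order (order 1 = linear).
for order in range(1, 5):
    coeffs = np.polyfit(x, y, order)       # least-squares polynomial fit
    print(f"order {order}: R^2 = {r_squared(y, np.polyval(coeffs, x)):.3f}")

# Exponential trendline y = a*e^(bx): fit a straight line to ln(y).
# (R^2 here is computed on the original scale, so it may differ slightly
# from what a particular tool reports for its exponential trendline.)
b, ln_a = np.polyfit(x, np.log(y), 1)
print(f"exponential: R^2 = {r_squared(y, np.exp(ln_a) * np.exp(b * x)):.3f}")
```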
Trendline fits for the crude oil price data:
Exponential: y = 50.49e^(0.021x), R² = 0.664
Logarithmic: y = 13.02 ln(x) + 39.60, R² = 0.382
Polynomial (2nd order): y = 0.13x² − 2.399x + 68.01, R² = 0.905
Polynomial (3rd order): y = 0.005x³ − 0.111x² + 0.648x + 59.497, R² = 0.928
Power: y = 45.96x^0.0169, R² = 0.397

The R² value will continue to increase as the order of the polynomial increases; that is, a 4th-order polynomial will provide a better fit than a 3rd-order one, and so on. However, higher-order polynomials will generally not be very smooth and will be difficult to interpret visually.
◦ Thus, we don't recommend going beyond a third-order polynomial when fitting data. Use your eye to make a good judgment!

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between a dependent (ratio) variable and one or more independent, or explanatory, variables (ratio or categorical), all of which are numerical.
Simple linear regression involves a single independent variable; multiple regression involves two or more independent variables.

Simple linear regression finds a linear relationship between one independent variable X and one dependent variable Y. First prepare a scatter plot to verify that the data has a linear trend; use alternative approaches if the data is not linear.

Example: the size of a house is typically related to its market value.
X = square footage
Y = market value ($)
The scatter plot of the full data set (42 homes) indicates a linear trend.
Market value = a + b × square feet
Two possible lines are shown below.
Line A is clearly a better fit to the data.
We want to determine the best regression line.
Market value = $32,673 + $35.036 × square feet
◦ The estimated market value of a home with 2,200 square feet would be:
market value = $32,673 + $35.036 × 2,200 = $109,752
The regression model explains variation in market value due to the size of the home. It provides better estimates of market value than simply using the average.

Simple linear regression model: Y = β0 + β1X + ε

We estimate the parameters β0 and β1 from the sample data, giving the estimated regression line Ŷ = b0 + b1X.
Let Xi be the value of the independent variable for the ith observation. When the value of the independent variable is Xi, then Ŷi = b0 + b1Xi is the estimated value of Y for Xi.

Residuals are the observed errors associated with estimating the value of the dependent variable using the regression line: ei = Yi − Ŷi.

To run the regression in Excel: Data > Data Analysis > Regression
◦ Input Y Range (with header)
◦ Input X Range (with header)
◦ Check Labels
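The same estimates and residuals can be computed directly from the least-squares formulas; here is a minimal numpy sketch using placeholder numbers rather than the actual 42-home data set:

```python
import numpy as np

# Placeholder sample standing in for the Home Market Value data.
square_feet  = np.array([1500, 1600, 1750, 1800, 2000, 2200, 2400])
market_value = np.array([88000, 91000, 95000, 99000, 102000, 109000, 115000])

x, y = square_feet, market_value

# Least-squares estimates: b1 = S_xy / S_xx, b0 = ybar - b1 * xbar.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x          # estimated values Yhat_i = b0 + b1 * X_i
residuals = y - y_hat        # observed errors e_i = Y_i - Yhat_i

print(f"b0 = {b0:,.2f}, b1 = {b1:.3f}")
print("residuals:", np.round(residuals, 1))
```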
Excel outputs a table with many useful regression statistics:
◦ Multiple R: |r|, where r is the sample correlation coefficient. The value of r varies from −1 to +1 (r is negative if the slope is negative).
◦ R Square: the coefficient of determination, R², which varies from 0 (no fit) to 1 (perfect fit).
◦ Adjusted R Square: adjusts R² for the sample size and the number of X variables.
◦ Standard Error: variability between the observed and predicted Y values. This is formally called the standard error of the estimate, SYX.

For the home market value data, 53% of the variation in market values can be explained by home size. The standard error of $7,287 is less than the standard deviation (not shown) of $10,553.

Residual = actual Y value − predicted Y value
Standard residual = residual / standard deviation of the residuals
Rule of thumb: standard residuals outside of ±2 or ±3 are potential outliers.
Excel provides a table and a plot of residuals.
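The outlier rule of thumb is easy to automate once actual and predicted values are available; a small sketch following the definition above (the arrays are placeholders, not values from the example):

```python
import numpy as np

def flag_outliers(actual, predicted, cutoff=2.0):
    """Standard residual = residual / standard deviation of the residuals;
    values beyond +/- cutoff (2 or 3) are flagged as potential outliers."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    standard_residuals = residuals / residuals.std(ddof=1)
    return standard_residuals, np.abs(standard_residuals) > cutoff

# Placeholder actual and predicted Y values.
actual    = [95000, 102000, 88000, 131000, 99000, 104000]
predicted = [97000, 100500, 90000,  99000, 98500, 105000]

std_res, is_outlier = flag_outliers(actual, predicted)
for z, flag in zip(std_res, is_outlier):
    print(f"standard residual = {z:6.2f}{'  <- potential outlier' if flag else ''}")
```

Note that different tools may compute standard residuals with a slightly different denominator, so values can differ a little from this simple definition.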
This point has a standard residual of 4.53.

Assumptions of regression analysis:
◦ Linearity: examine the scatter diagram (should appear linear) and the residual plot (should appear random).
◦ Normality of errors: view a histogram of the standard residuals; regression is robust to departures from normality.
◦ Homoscedasticity: variation about the regression line is constant; examine the residual plot.
◦ Independence of errors: successive observations should not be related. This is important when the independent variable is time.

Checking the assumptions for the home market value regression:
◦ Linearity: linear trend in the scatterplot, no pattern in the residual plot.
◦ Normality of errors: the residual histogram appears slightly skewed, but this is not a serious departure.
◦ Homoscedasticity: the residual plot shows no serious difference in the spread of the data for different X values.
◦ Independence of errors: because the data is cross-sectional, we can assume this assumption holds.

A linear regression model with more than one independent variable is called a multiple linear regression model. We estimate the regression coefficients, called partial regression coefficients, b0, b1, b2, …, bk, and then use the model:
Ŷ = b0 + b1X1 + b2X2 + … + bkXk
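A multiple linear regression of this form can be fit with statsmodels; here is a minimal sketch using a small synthetic data set with generic column names (X1, X2, X3, Y are placeholders, not variables from the examples):

```python
import pandas as pd
import statsmodels.api as sm

# Small synthetic data set; substitute real data and real column names.
data = pd.DataFrame({
    "X1": [12, 15, 11, 18, 14, 16, 13, 17],
    "X2": [3.2, 2.8, 3.9, 2.1, 3.0, 2.5, 3.5, 2.2],
    "X3": [40, 55, 38, 62, 47, 58, 42, 60],
    "Y":  [71, 78, 66, 88, 75, 82, 70, 86],
})

X = sm.add_constant(data[["X1", "X2", "X3"]])   # adds the intercept term b0
model = sm.OLS(data["Y"], X).fit()

print(model.params)          # b0 and the partial regression coefficients b1, b2, b3
print("R^2 =", round(model.rsquared, 3),
      "  adjusted R^2 =", round(model.rsquared_adj, 3))
print(model.pvalues)         # p-values used to judge significance
```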
The partial regression coefficients represent the expected change in the dependent variable when the associated independent variable is increased by one unit while the values of all other independent variables are held constant.

Example: predict student graduation rates using several indicators. Regression model:
The value of R² indicates that 53% of the variation in the dependent variable is explained by these independent variables. All coefficients are statistically significant.

A good regression model should include only significant independent variables. However, it is not always clear exactly what will happen when we add or remove variables from a model; variables that are (or are not) significant in one model may (or may not) be significant in another.
◦ Therefore, you should not consider dropping all insignificant variables at one time, but rather take a more structured approach.

Adding an independent variable to a regression model will always result in R² equal to or greater than the R² of the original model. Adjusted R² reflects both the number of independent variables and the sample size and may either increase or decrease when an independent variable is added or dropped. An increase in adjusted R² indicates that the model has improved.

A systematic approach (sketched in code after the list):
1. Construct a model with all available independent variables. Check for significance of the independent variables by examining the p-values.
2. Identify the independent variable having the largest p-value that exceeds the chosen level of significance.
3. Remove the variable identified in step 2 from the model and evaluate adjusted R². (Don't remove all variables with p-values that exceed α at the same time; remove only one at a time.)
4. Continue until all variables are significant.
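A hedged sketch of this one-variable-at-a-time procedure using statsmodels, with a synthetic stand-in data set and hypothetical column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant variable one at a time until every
    remaining variable has p-value <= alpha, re-fitting at each step
    and reporting adjusted R^2 as each variable is removed."""
    X = sm.add_constant(X)
    while True:
        model = sm.OLS(y, X).fit()
        pvalues = model.pvalues.drop("const")     # ignore the intercept
        if pvalues.empty or pvalues.max() <= alpha:
            return model
        worst = pvalues.idxmax()
        print(f"dropping {worst} (p = {pvalues[worst]:.3f}); "
              f"adjusted R^2 before dropping = {model.rsquared_adj:.4f}")
        X = X.drop(columns=[worst])               # remove only one variable, then re-fit

# Synthetic stand-in for the Banking Data set; names are hypothetical.
rng = np.random.default_rng(1)
n = 100
data = pd.DataFrame({
    "Age":       rng.normal(45, 10, n),
    "Education": rng.normal(14, 2, n),
    "Income":    rng.normal(60, 15, n),
    "HomeValue": rng.normal(180, 40, n),
})
data["Balance"] = 2 + 0.5 * data["Age"] + 0.8 * data["Income"] + rng.normal(0, 5, n)

final = backward_eliminate(data[["Age", "Education", "Income", "HomeValue"]],
                           data["Balance"])
print(final.params)
```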
Banking Data: Home Value has the largest p-value; drop it and re-run the regression.

Bank regression after removing Home Value:
Adjusted R² improves slightly, and all X variables are significant.

Multicollinearity occurs when there are strong correlations among the independent variables, so that they can predict each other better than the dependent variable.
◦ When significant multicollinearity is present, it becomes difficult to isolate the effect of one independent variable on the dependent variable, the signs of coefficients may be the opposite of what they should be (making the regression coefficients difficult to interpret), and p-values can be inflated.

Correlations exceeding ±0.7 may indicate multicollinearity. The variance inflation factor (VIF) is a better indicator, but it is not computed in Excel.
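Since Excel does not compute VIFs, they can be obtained with statsmodels; a sketch on a synthetic stand-in data set with hypothetical column names, constructed so that two of the variables are strongly correlated:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic independent variables; Wealth is built from Income on purpose
# so that the two are strongly correlated.
rng = np.random.default_rng(7)
n = 100
income = rng.normal(60, 15, n)
X = pd.DataFrame({
    "Age":    rng.normal(45, 10, n),
    "Income": income,
    "Wealth": 5 * income + rng.normal(0, 10, n),
})

# Pairwise correlations: values beyond +/-0.7 hint at multicollinearity.
print(X.corr().round(2))

# Variance inflation factors; a common rule of thumb treats VIFs above
# roughly 5 to 10 as a sign of problematic multicollinearity.
X_const = sm.add_constant(X)
for i, name in enumerate(X_const.columns):
    if name != "const":
        print(name, round(variance_inflation_factor(X_const.values, i), 2))
```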
Colleges and Universities correlation matrix: none of the correlations exceed the recommended threshold of ±0.7.

Banking Data correlation matrix: large correlations exist. If we remove Wealth from the model, the adjusted R² drops to 0.9201, but we discover that Education is no longer significant. Dropping Education and leaving only Age and Income in the model results in an adjusted R² of 0.9202. However, if we remove Income from the model instead of Wealth, the adjusted R² drops only to 0.9345, and all remaining variables (Age, Education, and Wealth) are significant.

Identifying the best regression model often requires experimentation and trial and error. The independent variables selected should make sense in attempting to explain the dependent variable.
◦ Logic should guide your model development. In many applications, behavioral, economic, or physical theory might suggest that certain variables should belong in a model.
Additional variables increase R² and, therefore, help to explain a larger proportion of the variation.
◦ Even though a variable with a large p-value is not statistically significant, its apparent insignificance could simply be the result of sampling error, and a modeler might wish to keep it.
Good models are as simple as possible (the principle of parsimony).

Regression analysis requires numerical data. Categorical data can be included as independent variables, but it must be coded numerically using dummy variables. For a variable with two categories, code one category as 0 and the other as 1.

Employee Salaries provides data for 35 employees.
Predict Salary using Age and MBA (coded as yes = 1, no = 0):
Salary = 893.59 + 1044.15 × Age + 14,767.23 × MBA
◦ If MBA = 0: Salary = 893.59 + 1044.15 × Age
◦ If MBA = 1: Salary = 15,660.82 + 1044.15 × Age
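Dummy coding and prediction with the fitted equation are straightforward to reproduce; a small sketch with placeholder rows (the coefficients are the ones reported above):

```python
import pandas as pd

# Placeholder rows standing in for the Employee Salaries data.
employees = pd.DataFrame({
    "Age": [29, 34, 41, 38],
    "MBA": ["yes", "no", "yes", "no"],
})

# Dummy-code the two-category variable: yes = 1, no = 0.
employees["MBA"] = employees["MBA"].map({"yes": 1, "no": 0})

# Predicted salary from the fitted model reported above.
employees["PredictedSalary"] = (893.59
                                + 1044.15 * employees["Age"]
                                + 14767.23 * employees["MBA"])
print(employees)
```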