Notes 1
2. Prediction: predict new outcomes for a new set of inputs using a built model.
Examples:
(1) drug overdose and life expectancy
(2) high school grade point average (GPA) and college entrance test score
(3) age, gender, physical activity, etc. and body mass index (BMI)
For a start, are you aware of...
▶ Scatterplots
▶ Linear models
[Figures: histogram of Body Height; boxplot of Height vs Gender (Women, Men); plot of a relationship with Celsius on the x-axis]
The Celsius relationship is perfect (deterministic); we are not interested in that.
Statistical relationships:
Heights (h) and weights (w) of 10 students. Which line do you think
best summarizes the trend between height and weight?
w = -266.5 + 6.1h
w = -331.2 + 7.1h
[Figure: scatterplot of Weight vs Height for the 10 students, with both candidate lines drawn]
Let’s continue with the previous example of 10 students. In order to
examine which of the two lines is a better fit, we first need to
introduce some common notation:
Ŷi = b0 + b1 xi
1. In general, when we use Ŷi = b0 + b1 xi to predict the actual
response Yi , we make a prediction error (or residual error) of
size:
ei = Yi − Ŷi
ei is called the prediction error for data point i.
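As a minimal sketch of this notation, the following computes the prediction error ei = Yi − Ŷi for each point. The heights and weights here are hypothetical (the notes' raw data are not shown); the intercept and slope are taken from the first candidate line above.

```python
# Hypothetical (height, weight) data -- NOT the notes' 10 students.
heights = [63, 67, 70, 72]
weights = [127, 145, 158, 170]

# Fitted line Yhat_i = b0 + b1 * x_i (first candidate line from the notes)
b0, b1 = -266.5, 6.1

# Prediction error (residual) for each data point: e_i = Y_i - Yhat_i
for h, w in zip(heights, weights):
    w_hat = b0 + b1 * h   # predicted weight
    e = w - w_hat         # prediction error
    print(f"height={h}: predicted={w_hat:.1f}, actual={w}, residual={e:.1f}")
```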
2. A line that fits the data “best” will be one for which the n
prediction errors are as small as possible in some overall sense.
Continuing with the example of 10 students:
[Figure: scatterplot of Weight vs Height with the line w = -266.5 + 6.1h; the residuals e8 and e10 are marked as vertical distances from the line]
Least Squares Regression
w = -266.5 + 6.1h
w = -331.2 + 7.1h
[Figure: scatterplot of Weight vs Height with both candidate lines drawn]
For the dashed line:
Σi=1..n ei² = Σi=1..n (Yi − Ŷi)² = 118.81 + . . . + 44.89 = 766.5
For the solid line:
Σi=1..n ei² = Σi=1..n (Yi − Ŷi)² = 47.076 + . . . + 201.924 = 599.8
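This comparison can be sketched in code. The data below are hypothetical (the notes' 10 students are not listed), so the sums of squared errors will not match the 766.5 and 599.8 above; the point is only that whichever candidate line yields the smaller sum fits better in the least squares sense.

```python
# Hypothetical (height, weight) pairs -- NOT the notes' 10 students.
data = [(64, 121), (66, 140), (68, 150), (70, 156), (72, 175), (74, 190)]

def sse(b0, b1, pts):
    """Sum of squared prediction errors for the line yhat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in pts)

# Compare the two candidate lines from the notes:
print("w = -266.5 + 6.1h :", sse(-266.5, 6.1, data))
print("w = -331.2 + 7.1h :", sse(-331.2, 7.1, data))
```

The "best" line by the least squares criterion is the one minimizing this sum over all possible (b0, b1).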
What Do b0 and b1 Estimate?
Suppose that we are interested in the relationship between high school gpa (x) and
college entrance test score (Y ) in a population of 200 students.
If we know the information of every student, we can get the following “population
regression line” by connecting the mean college entrance test score at each gpa level.
[Figure: population regression line connecting the mean college entrance test score at each GPA level]
E(Yi ) = β0 + β1 xi ,
Yi = β0 + β1 xi + εi , i = 1, . . . , n,
where εi is a random error term with E(εi) = 0.
The “sample regression line” is Ŷi = b0 + b1 xi and the prediction error ei = Yi − Ŷi is a
surrogate of εi .
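To see how the sample line estimates the population line, we can simulate data from the model Yi = β0 + β1 xi + εi and compute the least squares estimates b0 and b1 from their closed-form formulas. The population values β0 = 2 and β1 = 5 here are arbitrary choices for illustration.

```python
import random

random.seed(1)

# Population model: Y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, 1)
beta0, beta1 = 2.0, 5.0                      # assumed population parameters
x = [i / 10 for i in range(1, 41)]           # predictor values (like GPA)
y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]

# Closed-form least squares estimates:
#   b1 = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)
#   b0 = ybar - b1 * xbar
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
print(b0, b1)  # b0, b1 should land close to beta0 = 2 and beta1 = 5
```

With more data (larger n), b0 and b1 concentrate ever more tightly around β0 and β1.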