Inference in regression

Brian Caffo, Jeff Leek and Roger Peng

Johns Hopkins Bloomberg School of Public Health

Recall our model and fitted values

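Consider the model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

$\epsilon_i \sim N(0, \sigma^2)$.
We assume that the true model is known.
We assume that you've seen confidence intervals and hypothesis tests before.
$\hat \beta_0 = \bar Y - \hat \beta_1 \bar X$
$\hat \beta_1 = Cor(Y, X) \frac{Sd(Y)}{Sd(X)}$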
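Here is a minimal R sketch that checks the two estimator formulas numerically. It uses simulated data. The true values $\beta_0 = 1$, $\beta_1 = 2$, $\sigma = 0.5$, $n = 50$ and the seed are arbitrary assumptions, not values from the slides. The sketch compares the closed-form estimates with those from `lm()`. The `Sim` suffix on the variable names only keeps them from overwriting the diamond variables used later.

```r
# Simulated data from Y_i = beta0 + beta1 * X_i + eps_i, eps_i ~ N(0, sigma^2);
# beta0 = 1, beta1 = 2, sigma = 0.5 and n = 50 are illustrative choices
set.seed(1)
nSim <- 50
xSim <- runif(nSim, 0, 1)
ySim <- 1 + 2 * xSim + rnorm(nSim, sd = 0.5)

# Closed-form least squares estimates from the slide
beta1Sim <- cor(ySim, xSim) * sd(ySim) / sd(xSim)
beta0Sim <- mean(ySim) - beta1Sim * mean(xSim)

# The same estimates computed by lm()
fitSim <- lm(ySim ~ xSim)
print(c(beta0Sim, beta1Sim))
print(unname(coef(fitSim)))
```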

Review

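Statistics like $\frac{\hat \theta - \theta}{\hat \sigma_{\hat \theta}}$ often have the following properties.

1. With the true standard error $\sigma_{\hat \theta}$ in the denominator, the statistic is normally distributed; with the sample estimate $\hat \sigma_{\hat \theta}$, it has a finite sample Student's T distribution (under normality assumptions).
2. It can be used to test $H_0 : \theta = \theta_0$ versus $H_a : \theta >, <, \neq \theta_0$.
3. It can be used to create a confidence interval for $\theta$ via $\hat \theta \pm Q_{1-\alpha/2} \hat \sigma_{\hat \theta}$, where $Q_{1-\alpha/2}$ is the relevant quantile from either a normal or T distribution.

In the case of regression with iid sampling assumptions and normal errors, our inferences will follow very similarly to what you saw in your inference class.
We won't cover asymptotics for regression analysis. It suffices to say that the normal results hold in large samples under assumptions on how the X values are collected, on the iid sampling model, and on the mean model. They can then be used to create confidence intervals and perform hypothesis tests.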
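The three properties above can be sketched in R as a small helper. `tInference` is a hypothetical function written for illustration and is not part of any package. As a usage check, it is applied to a one-sample mean on simulated data (an assumption), where it should reproduce `t.test()`.

```r
# Hypothetical helper: t-based test and confidence interval for an estimate
# thetaHat with estimated standard error seHat and df degrees of freedom
tInference <- function(thetaHat, seHat, df, theta0 = 0, level = 0.95) {
  tStat <- (thetaHat - theta0) / seHat          # standardized statistic
  q <- qt(1 - (1 - level) / 2, df = df)         # Q_{1 - alpha/2}
  list(
    tStat = tStat,
    # two-sided p-value for Ha: theta != theta0
    pTwoSided = 2 * pt(abs(tStat), df = df, lower.tail = FALSE),
    # thetaHat +/- Q_{1 - alpha/2} * seHat
    confInt = thetaHat + c(-1, 1) * q * seHat
  )
}

# Usage on a one-sample mean (simulated data), which t.test() also handles
set.seed(2)
z <- rnorm(20, mean = 0.3)
res <- tInference(mean(z), sd(z) / sqrt(length(z)), df = length(z) - 1)
tt <- t.test(z)
print(res)
print(tt)
```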

Standard errors (conditioned on X)

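$$
\begin{aligned}
Var(\hat \beta_1) &= Var\left(\frac{\sum_{i=1}^n (Y_i - \bar Y)(X_i - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right) \\
&= \frac{Var\left(\sum_{i=1}^n Y_i (X_i - \bar X)\right)}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2} \\
&= \frac{\sum_{i=1}^n \sigma^2 (X_i - \bar X)^2}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2} \\
&= \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar X)^2}
\end{aligned}
$$

The second line uses $\sum_{i=1}^n (X_i - \bar X) = 0$, so the $\bar Y$ term drops out of the numerator. The third line uses the independence of the $Y_i$ given $X$, each with variance $\sigma^2$.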
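A Monte Carlo sketch makes the derivation concrete. It assumes simulated data with $X$ held fixed across replicates, matching the conditioning on $X$. The parameter values, sample size, replicate count and seed are arbitrary choices. The empirical variance of $\hat \beta_1$ should land close to $\sigma^2 / \sum_{i=1}^n (X_i - \bar X)^2$.

```r
# Monte Carlo check of Var(beta1Hat | X) = sigma^2 / sum((X_i - Xbar)^2).
# X is drawn once and held fixed; only the errors are redrawn.
# beta0 = 1, beta1 = 2, sigma = 1.5, n = 30 and 10000 replicates are arbitrary.
set.seed(3)
nSim <- 30
sigmaTrue <- 1.5
xSim <- runif(nSim, 0, 10)
ssxSim <- sum((xSim - mean(xSim))^2)

slopes <- replicate(10000, {
  ySim <- 1 + 2 * xSim + rnorm(nSim, sd = sigmaTrue)
  # beta1Hat computed from the first line of the derivation
  sum((ySim - mean(ySim)) * (xSim - mean(xSim))) / ssxSim
})

empiricalVar <- var(slopes)
theoreticalVar <- sigmaTrue^2 / ssxSim
print(c(empirical = empiricalVar, theoretical = theoreticalVar))
```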

Results
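$\sigma_{\hat \beta_1}^2 = Var(\hat \beta_1) = \sigma^2 / \sum_{i=1}^n (X_i - \bar X)^2$
$\sigma_{\hat \beta_0}^2 = Var(\hat \beta_0) = \left(\frac{1}{n} + \frac{\bar X^2}{\sum_{i=1}^n (X_i - \bar X)^2}\right)\sigma^2$
In practice, $\sigma$ is replaced by its estimate $\hat \sigma = \sqrt{\frac{1}{n-2}\sum_{i=1}^n e_i^2}$, where the $e_i$ are the residuals.
It's probably not surprising that under iid Gaussian errors

$$\frac{\hat \beta_j - \beta_j}{\hat \sigma_{\hat \beta_j}}$$

follows a $t$ distribution with $n-2$ degrees of freedom, and is approximately normal for large $n$.

This can be used to create confidence intervals and perform hypothesis tests.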
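The following sketch shows these formulas at work outside the diamond example. It uses simulated data, and all parameter values and names are illustrative assumptions. It computes $\hat \sigma$, both standard errors, and the slope's t statistic and p-value by hand. It then compares them with `summary(lm(...))$coefficients`.

```r
# Simulated data (illustrative values, not the diamond data)
set.seed(4)
nSim <- 40
xSim <- rnorm(nSim, mean = 5)
ySim <- -3 + 0.8 * xSim + rnorm(nSim)

fitSim <- lm(ySim ~ xSim)
eSim <- resid(fitSim)
sigmaHat <- sqrt(sum(eSim^2) / (nSim - 2))   # estimate of sigma on n - 2 df
ssxSim <- sum((xSim - mean(xSim))^2)

# Standard errors from the Results slide, with sigma replaced by sigmaHat
seBeta1Sim <- sigmaHat / sqrt(ssxSim)
seBeta0Sim <- sigmaHat * sqrt(1 / nSim + mean(xSim)^2 / ssxSim)

# t statistic for H0: beta_1 = 0 and its two-sided p-value on n - 2 df
tBeta1Sim <- coef(fitSim)[[2]] / seBeta1Sim
pBeta1Sim <- 2 * pt(abs(tBeta1Sim), df = nSim - 2, lower.tail = FALSE)

coefSim <- summary(fitSim)$coefficients
print(coefSim)
print(c(seBeta0Sim, seBeta1Sim, tBeta1Sim, pBeta1Sim))
```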

Example diamond data set

library(UsingR); data(diamond)   # diamond data: price in (Singapore) dollars, size in carats
y <- diamond$price; x <- diamond$carat; n <- length(y)
# Least squares estimates
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
# Residuals and the residual standard deviation (n - 2 degrees of freedom)
e <- y - beta0 - beta1 * x
sigma <- sqrt(sum(e^2) / (n-2))
ssx <- sum((x - mean(x))^2)
# Standard errors of the intercept and the slope
seBeta0 <- (1 / n + mean(x) ^ 2 / ssx) ^ .5 * sigma
seBeta1 <- sigma / sqrt(ssx)
# t statistics for H0: beta_j = 0 and two-sided p-values
tBeta0 <- beta0 / seBeta0; tBeta1 <- beta1 / seBeta1
pBeta0 <- 2 * pt(abs(tBeta0), df = n - 2, lower.tail = FALSE)
pBeta1 <- 2 * pt(abs(tBeta1), df = n - 2, lower.tail = FALSE)
# Coefficient table laid out like summary(lm(...))$coefficients
coefTable <- rbind(c(beta0, seBeta0, tBeta0, pBeta0), c(beta1, seBeta1, tBeta1, pBeta1))
colnames(coefTable) <- c("Estimate", "Std. Error", "t value", "P(>|t|)")
rownames(coefTable) <- c("(Intercept)", "x")

Example continued
coefTable

            Estimate Std. Error t value   P(>|t|)
(Intercept)   -259.6      17.32  -14.99 2.523e-19
x             3721.0      81.79   45.50 6.751e-40

fit <- lm(y ~ x)
summary(fit)$coefficients

            Estimate Std. Error t value  Pr(>|t|)
(Intercept)   -259.6      17.32  -14.99 2.523e-19
x             3721.0      81.79   45.50 6.751e-40

Getting a confidence interval

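sumCoef <- summary(fit)$coefficients
# 95% interval for the intercept: estimate +/- t quantile (n - 2 df) * standard error
sumCoef[1,1] + c(-1, 1) * qt(.975, df = fit$df.residual) * sumCoef[1, 2]

[1] -294.5 -224.8

# 95% interval for the slope
sumCoef[2,1] + c(-1, 1) * qt(.975, df = fit$df.residual) * sumCoef[2, 2]

[1] 3556 3886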

With 95% confidence, we estimate that a 0.1 carat increase in diamond size is associated with an increase of 355.6 to 388.6 (Singapore) dollars in expected price. This is the slope interval divided by 10.
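R's `confint()` builds the same t-based interval for every coefficient. Here is a minimal sketch that compares it with the manual slope interval. It uses simulated data, an assumption that lets the snippet run without the UsingR package.

```r
# Simulated data (illustrative values, not the diamond data)
set.seed(5)
xSim <- runif(30)
ySim <- 2 + 3 * xSim + rnorm(30, sd = 0.4)
fitSim <- lm(ySim ~ xSim)

# Manual 95% interval for the slope, as on the slide
sumCoefSim <- summary(fitSim)$coefficients
manualCI <- sumCoefSim[2, 1] +
  c(-1, 1) * qt(.975, df = fitSim$df.residual) * sumCoefSim[2, 2]

# confint() returns t-based intervals for all coefficients at once
builtinCI <- confint(fitSim, level = 0.95)
print(manualCI)
print(builtinCI)
```

Applied to the diamond fit, `confint(fit)` should return the same two intervals as the manual calculation.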

Prediction of outcomes
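Consider predicting $Y$ at a value of $X$. Examples include predicting the price of a diamond given the carat, or predicting the height of a child given the height of the parents.
The obvious estimate for prediction at point $x_0$ is

$$\hat \beta_0 + \hat \beta_1 x_0$$

A standard error is needed to create a prediction interval.
There's a distinction between intervals for the regression line at point $x_0$ and the prediction of what a $y$ would be at point $x_0$.
Line at $x_0$, se:

$$\hat \sigma \sqrt{\frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}$$

Prediction interval se at $x_0$:

$$\hat \sigma \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar X)^2}{\sum_{i=1}^n (X_i - \bar X)^2}}$$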
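The two standard errors can be checked against `predict()` with `se.fit = TRUE`. That call returns the standard error of the fitted line (`se.fit`) and $\hat \sigma$ (`residual.scale`). The sketch below uses simulated data; the new points `x0` and all parameter values are hypothetical.

```r
# Simulated data (illustrative values)
set.seed(6)
nSim <- 25
xSim <- runif(nSim, 0, 2)
ySim <- 1 + 4 * xSim + rnorm(nSim)
fitSim <- lm(ySim ~ xSim)

sigmaHat <- summary(fitSim)$sigma     # residual standard error, n - 2 df
ssxSim <- sum((xSim - mean(xSim))^2)
x0 <- c(0.5, 1.5)                     # hypothetical new x values

# Standard error for the regression line at x0
seLine <- sigmaHat * sqrt(1 / nSim + (x0 - mean(xSim))^2 / ssxSim)
# Standard error for predicting a new y at x0 (extra "1 +" for the new error)
sePred <- sigmaHat * sqrt(1 + 1 / nSim + (x0 - mean(xSim))^2 / ssxSim)

p <- predict(fitSim, newdata = data.frame(xSim = x0), se.fit = TRUE)
print(cbind(seLine, seFit = p$se.fit, sePred))
```

Adding $\hat \sigma^2$ to the squared line standard error gives the extra `1 +` term in the prediction formula.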

Plotting the prediction intervals

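plot(x, y, frame=FALSE,xlab="Carat",ylab="Dollars",pch=21,col="black", bg="lightblue", cex=2)
abline(fit, lwd = 2)
# Fitted line evaluated on a fine grid of carat values
xVals <- seq(min(x), max(x), by = .01)
yVals <- beta0 + beta1 * xVals
# se1: standard error of the regression line; se2: standard error for predicting a new y
se1 <- sigma * sqrt(1 / n + (xVals - mean(x))^2/ssx)
se2 <- sigma * sqrt(1 + 1 / n + (xVals - mean(x))^2/ssx)
# Bands at +/- 2 standard errors (roughly 95%): inner pair for the line, outer pair for prediction
lines(xVals, yVals + 2 * se1)
lines(xVals, yVals - 2 * se1)
lines(xVals, yVals + 2 * se2)
lines(xVals, yVals - 2 * se2)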

[Figure: diamond price (Dollars) against Carat, with the fitted line and the ±2 standard error bands for the line and for prediction.]
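The plotting code uses 2 as a stand-in for the 0.975 quantile of the t distribution with $n - 2$ degrees of freedom. This short sketch shows how close that approximation is; the degrees of freedom listed are arbitrary examples.

```r
# Exact 0.975 t quantiles for a few (arbitrary) degrees of freedom,
# to compare with the factor 2 used in the plotting code
dfs <- c(5, 10, 30, 100, 1000)
tQuantiles <- qt(0.975, df = dfs)
print(round(tQuantiles, 3))
```

For small samples the quantile is noticeably larger than 2. Replacing `2` with `qt(.975, df = n - 2)` in the `lines()` calls gives exact 95% bands under the model assumptions.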

Discussion
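Both intervals have varying widths.
The width is smallest at the mean of the Xs, where the $(x_0 - \bar X)^2$ term in the standard error vanishes.
We are quite confident in the regression line, so that interval is very narrow.
If we knew $\beta_0$ and $\beta_1$, this interval would have zero width.
The prediction interval must incorporate the variability in the data around the line.
Even if we knew $\beta_0$ and $\beta_1$, this interval would still have width, because a new $Y$ still varies around the line with standard deviation $\sigma$.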
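The discussion points can be checked numerically. This sketch uses simulated data with arbitrary assumed values. It evaluates both 95% bands from `predict()` on a grid and checks that both bands are narrowest near $\bar X$. It also checks that the prediction band never drops below $2\, t_{n-2,0.975}\, \hat \sigma$, which is the width left once the line's own uncertainty terms are removed.

```r
# Simulated data (illustrative values)
set.seed(7)
nSim <- 30
xSim <- runif(nSim, 0, 4)
ySim <- 2 + xSim + rnorm(nSim, sd = 0.7)
fitSim <- lm(ySim ~ xSim)

# 95% bands on a fine grid of new x values
gridSim <- data.frame(xSim = seq(0, 4, by = 0.01))
confBand <- predict(fitSim, gridSim, interval = "confidence")
predBand <- predict(fitSim, gridSim, interval = "prediction")
widthConf <- confBand[, "upr"] - confBand[, "lwr"]
widthPred <- predBand[, "upr"] - predBand[, "lwr"]

# Both widths are smallest at the grid point nearest mean(xSim)
print(c(meanX = mean(xSim),
        argminConf = gridSim$xSim[which.min(widthConf)],
        argminPred = gridSim$xSim[which.min(widthPred)]))

# Prediction band width with the 1/n and (x0 - Xbar)^2 terms removed
floorPred <- 2 * qt(0.975, df = nSim - 2) * summary(fitSim)$sigma
print(c(minWidthPred = min(widthPred), floor = floorPred))
```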

In R
newdata <- data.frame(x = xVals)
# predict() returns columns fit, lwr and upr (95% level by default)
p1 <- predict(fit, newdata, interval = "confidence")
p2 <- predict(fit, newdata, interval = "prediction")
plot(x, y, frame=FALSE,xlab="Carat",ylab="Dollars",pch=21,col="black", bg="lightblue", cex=2)
abline(fit, lwd = 2)
lines(xVals, p1[,2]); lines(xVals, p1[,3])   # confidence band for the line
lines(xVals, p2[,2]); lines(xVals, p2[,3])   # prediction band for a new y

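In R

[Figure: diamond price (Dollars) against Carat, with the fitted line and the confidence and prediction bands computed by predict().]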
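The bands that `predict()` draws match the manual `se1`/`se2` bands from the earlier plotting slide once the factor 2 is replaced by the exact t quantile. The sketch below checks this on simulated data; all values and the `Sim`-suffixed names are illustrative assumptions.

```r
# Simulated data (illustrative values)
set.seed(8)
nSim <- 35
xSim <- runif(nSim, 1, 3)
ySim <- -1 + 2.5 * xSim + rnorm(nSim, sd = 0.6)
fitSim <- lm(ySim ~ xSim)

xVals0 <- seq(1, 3, by = 0.1)
newdataSim <- data.frame(xSim = xVals0)
sigmaHat <- summary(fitSim)$sigma
ssxSim <- sum((xSim - mean(xSim))^2)
q <- qt(0.975, df = nSim - 2)            # exact quantile in place of 2
yHat <- coef(fitSim)[[1]] + coef(fitSim)[[2]] * xVals0

# Manual standard errors, as on the plotting slide
se1Sim <- sigmaHat * sqrt(1 / nSim + (xVals0 - mean(xSim))^2 / ssxSim)
se2Sim <- sigmaHat * sqrt(1 + 1 / nSim + (xVals0 - mean(xSim))^2 / ssxSim)

# Bands computed by predict()
p1Sim <- predict(fitSim, newdataSim, interval = "confidence")
p2Sim <- predict(fitSim, newdataSim, interval = "prediction")
print(head(cbind(manualLwr = yHat - q * se2Sim, p2Sim)))
```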
