Annotated Stata Output
This page shows an example of a simple regression analysis, with footnotes explaining the output.
The analysis uses a data file of test scores from 400 elementary schools, predicting api00 (the
academic performance index) from enroll (the number of students enrolled) using the following
Stata commands.
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi
regress api00 enroll
The output of these commands is shown below, followed by explanations of the output.
Output
      Source^a |      SS^b     df^c       MS^d         Number of obs   =    400
--------------+-------------------------------        F(1, 398)^f     =  44.83
        Model |  817326.293     1  817326.293         Prob > F^f      = 0.0000
     Residual |   7256345.7   398  18232.0244         R-squared^g     = 0.1012
--------------+-------------------------------        Adj R-squared^h = 0.0990
        Total |  8073671.99   399  20234.7669         Root MSE^i      = 135.03

------------------------------------------------------------------------------
     api00^j |    Coef.^k  Std. Err.^l    t^m  P>|t|^m  [95% Conf. Interval]^n
-------------+----------------------------------------------------------------
      enroll |  -.1998674   .0298512    -6.70   0.000    -.2585532   -.1411817
       _cons |   744.2514   15.93308    46.71   0.000     712.9279    775.5749
------------------------------------------------------------------------------
Footnotes
a. This is the source of variance: Model, Residual, and Total. The Total variance is
partitioned into the variance which can be explained by the independent variables (Model)
and the variance which is not explained by the independent variables (Residual). Note that
the Sums of Squares for the Model and Residual add up to the Total, reflecting this partition.
b. These are the Sums of Squares associated with the three sources of variance: Total, Model,
and Residual. They can be computed in many ways. Conceptually, the formulas can be
expressed as:
SSTotal. The total variability around the mean: Σ(Y - Ybar)^2.
SSResidual. The sum of squared errors in prediction: Σ(Y - Ypredicted)^2.
SSModel. The improvement in prediction from using the predicted value of Y rather than just
using the mean of Y. Hence, this is the sum of squared differences between the predicted
value of Y and the mean of Y: Σ(Ypredicted - Ybar)^2. Another way to think of this is that
SSModel = SSTotal - SSResidual; equivalently, SSTotal = SSModel + SSResidual. Note
that SSModel / SSTotal is equal to .10, the value of R-Square. This is because R-Square is
the proportion of the variance explained by the independent variables, and hence can be
computed as SSModel / SSTotal.
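As a quick check, the decomposition and the value of R-Square can be reproduced from the Sums
of Squares in the output using Stata's display command:
display 817326.293 + 7256345.7
display 817326.293 / (817326.293 + 7256345.7)
The first line returns the Total Sum of Squares (about 8073672), and the second returns
R-Square (about .1012, which rounds to the .10 quoted above).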
c. These are the degrees of freedom associated with the sources of variance. The total
variance has N-1 degrees of freedom. In this case, there were N=400 observations, so the DF
for total is 399. The model degrees of freedom correspond to the number of predictors
minus 1 (K-1). You may think this would be 1-1=0 (since there was 1 independent variable in
the model statement, enroll). But the intercept is automatically included in the model (unless
you explicitly omit it). Including the intercept, there are 2 predictors, so the model
has 2-1=1 degree of freedom. The Residual degrees of freedom is the DF total minus the DF
model: 399 - 1 = 398.
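If you have just run the regress command, these degrees of freedom are also stored in Stata's
returned results, so the following is a quick check (assuming the regression above was run in
the current session):
display e(df_m)
display e(df_r)
These should display 1 and 398, respectively.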
d. These are the Mean Squares, the Sum of Squares divided by their respective DF. For the
Model, 817326.293 / 1 equals 817326.293. For the Residual, 7256345.7 / 398 equals
18232.0244. These are calculated so that the F ratio can be formed, dividing the Mean Square
Model by the Mean Square Residual to test the significance of the predictor(s) in the model.
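Both divisions can be verified directly in Stata:
display 817326.293 / 1
display 7256345.7 / 398
The second line returns approximately 18232.0244.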
f. The F Value is the Mean Square Model (817326.293) divided by the Mean Square Residual
(18232.0244), yielding F=44.83. The p value associated with this F value is very small
(0.0000). These values are used to answer the question "Do the independent variables
reliably predict the dependent variable?". The p value is compared to your alpha level
(typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably
predict the dependent variable". You could say that the variable enroll can be used to reliably
predict api00 (the dependent variable). If the p value were greater than 0.05, you would say
that the independent variable does not show a significant relationship with the dependent
variable, or that the independent variable does not reliably predict the dependent variable.
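As a sketch of how these values fit together, the F ratio and its p value can be reproduced in
Stata (Ftail() returns the upper-tail probability of the F distribution):
display 817326.293 / 18232.0244
display Ftail(1, 398, 44.83)
The first line returns approximately 44.83; the second returns a p value far below .05, which
the output rounds to 0.0000.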
g. R-Square is the proportion of variance in the dependent variable (api00) which can be
predicted from the independent variable (enroll). This value indicates that 10% of the
variance in api00 can be predicted from the variable enroll.
h. Adjusted R-square. As predictors are added to the model, each predictor will explain some
of the variance in the dependent variable simply due to chance. One could continue to add
predictors to the model which would continue to improve the ability of the predictors to
explain the dependent variable, although some of this increase in R-square would be simply
due to chance variation in that particular sample. The adjusted R-square attempts to yield a
more honest value to estimate the R-squared for the population. The value of R-square was
.10, while the value of Adjusted R-square was .099. Adjusted R-squared is computed using
the formula 1 - ( (1-Rsq)*(N-1)/(N-k-1) ). From this formula, you can see that when the
number of observations is small and the number of predictors is large, there will be a much
greater difference between R-square and adjusted R-square, because the ratio (N-1)/(N-k-1)
will be much greater than 1 and adjusted R-squared will be much smaller than unadjusted R-
squared. By contrast, when the number of observations is very large compared to the number
of predictors, the value of R-square and adjusted R-square will be much closer because the
ratio (N-1)/(N-k-1) will approach 1.
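Plugging the values from this analysis into that formula (N = 400, k = 1, R-square = .1012)
gives a quick check:
display 1 - (1 - .1012)*(400 - 1)/(400 - 1 - 1)
This returns approximately .099, matching the Adjusted R-square in the output.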
i. Root MSE is the standard deviation of the error term, and is the square root of the Mean
Square Residual (or Error).
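This can be confirmed from the Mean Square Residual shown in the output:
display sqrt(18232.0244)
This returns approximately 135.03, the Root MSE.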
j. This column shows the dependent variable at the top (api00) with the predictor variables
below it (enroll). The last variable (_cons) represents the constant, also referred to in
textbooks as the Y intercept, the height of the regression line when it crosses the Y axis.
k. These are the values for the regression equation for predicting the dependent variable from
the independent variable. The regression equation is presented in many different ways, for
example...
Ypredicted = b0 + b1*x1
This estimate tells you about the relationship between the independent variable and the
dependent variable. This estimate indicates the amount of increase in api00 that would be
predicted by a 1 unit increase in the predictor. Note: If an independent variable is not
significant, the coefficient is not significantly different from 0, which should be taken into
account when interpreting the coefficient. (See the columns with the t value and p value
about testing whether the coefficients are significant).
enroll - The coefficient (parameter estimate) is -.20. So, for every unit increase in enroll, a
.20 unit decrease in api00 is predicted.
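Substituting the estimates into the equation above gives Ypredicted = 744.2514 - .1998674*enroll.
For example, for a hypothetical school with an enrollment of 500 (an illustrative value, not one
taken from the data file):
display 744.2514 - .1998674*500
This returns a predicted api00 of about 644.3.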
l. These are the standard errors associated with the coefficients. The standard error is used
for testing whether the parameter is significantly different from 0 by dividing the parameter
estimate by the standard error to obtain a t value (see the column with t values and p values).
The standard errors can also be used to form a confidence interval for the parameter, as
shown in the last 2 columns of this table.
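For example, dividing the enroll coefficient by its standard error reproduces the t value shown
in the next column:
display -.1998674 / .0298512
This returns approximately -6.70.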
m. These columns provide the t value and two-tailed p value used in testing the null hypothesis
that the coefficient/parameter is 0. If you use a two-tailed test, then you would compare each p
value to your preselected value of alpha. Coefficients having p values less than alpha are
significant. For example, if you chose alpha to be 0.05, coefficients having a p value of 0.05
or less would be statistically significant (i.e., you can reject the null hypothesis and say that
the coefficient is significantly different from 0). If you use a one-tailed test (i.e., you predict
that the parameter will go in a particular direction), then you can divide the p value by 2
before comparing it to your preselected alpha level. With a two-tailed test and alpha of 0.05,
you can reject the null hypothesis that the coefficient for enroll is equal to 0: the coefficient
of -.20 is significantly different from 0. Using a two-tailed test and alpha of 0.01, the p value
of 0.000 is smaller than 0.01, so the coefficient for enroll would still be significant at the
0.01 level.
The constant (_cons) is significantly different from 0 at the 0.05 alpha level. However,
having a significant intercept is seldom interesting.
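The two-tailed p value itself can be recovered from the t value with Stata's ttail() function,
which returns the upper-tail probability of the t distribution:
display 2*ttail(398, 6.70)
This returns a value far smaller than .05, which the output displays as 0.000.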
n. This shows a 95% confidence interval for the coefficient. This is very useful as it helps
you understand how high and how low the actual population value of the parameter might
be. Such confidence intervals help you to put the estimate from the coefficient into
perspective by seeing how much the value could vary.
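The interval can be reconstructed from the coefficient, its standard error, and the .975
quantile of the t distribution with 398 degrees of freedom (invttail() returns the inverse
upper-tail t):
display -.1998674 - invttail(398, .025)*.0298512
display -.1998674 + invttail(398, .025)*.0298512
These return approximately -.2586 and -.1412, matching the interval shown in the output.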