
Introduction to Regression Analysis
NARMINA RUSTAMOVA
ADA

Simple Linear Regression Model

y_i = β0 + β1 x_i + u_i,

where
y_i - dependent variable or regressand
x_i - independent variable or regressor
β0 - intercept parameter
β1 - slope parameter
u_i - error term or disturbance
i = 1, ..., n - number of observations

Assumptions
1. y = β0 + β1 x + u holds in the population
2. {(x_i, y_i): i = 1, ..., n} is a random sample from the model above, implying uncorrelated errors: Cov(u_i, u_j) = 0 for all i ≠ j
3. {x_i: i = 1, ..., n} are not all identical, implying the sample variance of x is positive
4. E[u|x] = 0 for all x (zero average error), implying E[u] = 0 and Cov(u, x) = 0
5. Var[u|x] = σ² for all x, implying Var[u] = σ² (homoscedasticity)
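A minimal Stata sketch (an addition, not part of the original slides) illustrating these assumptions: the sample size, seed, and parameter values 2 and 0.5 are illustrative assumptions, and regress should recover them approximately.

* simulate data satisfying assumptions 1-5, then estimate the model
clear
set obs 200
set seed 12345
generate x = rnormal(10, 3)    // regressor with positive sample variance (assumption 3)
generate u = rnormal(0, 2)     // error drawn independently of x: E[u|x] = 0, Var[u|x] = 4 (assumptions 4-5)
generate y = 2 + 0.5*x + u     // population model y = b0 + b1*x + u (assumption 1)
regress y x                    // estimates should be close to 2 and 0.5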

Example: Campaign expenditures and election outcomes

Model:
voteA = β0 + β1 x expendA + β2 x expendB + β3 x prtystrA + u,

where
voteA is the percent of the vote received by Candidate A,
expendA and expendB are campaign expenditures by Candidates A and B,
prtystrA is a measure of party strength for Candidate A (the percent of the most recent presidential vote that went to A's party).

Stata Step 1: Import and use data

Stata > File > Import > Excel Spreadsheet > Browse & Import first row as variable names
or use a command:
import excel "C:\Users\Narmina\Documents\present\data.xlsx", sheet("Sheet1") firstrow

Data Editor > Save > Name & Location
or use a command:
save "C:\Users\Narmina\Documents\present\data.dta"

File > Open > Name & Location
or use a command:
use "C:\Users\Narmina\Documents\present\data.dta"

Stata Step 2: Estimate the model

Statistics > Linear regression > dependent variable & independent variables
or use a command:
regress vote_A expend_A expend_B prtystr_A

Result:
vote_A = 33.27 + 0.03 x expend_A - 0.03 x expend_B + 0.34 x prtystr_A
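After regress, the stored results can be used directly; this sketch (an addition, not from the slides) shows the fitted coefficients via _b[] and fitted values via predict. The new variable names are illustrative.

* coefficients are stored in _b[] after regress
display _b[_cons]
display _b[expend_A]
* fitted values and residuals for each observation
predict vote_A_hat, xb
predict resid_A, residuals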

Stata Step 3: Output

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =   74.27
       Model |  27555.6163     3  9185.20542           Prob > F      =  0.0000
    Residual |  20901.6323   169  123.678298           R-squared     =  0.5687
-------------+------------------------------           Adj R-squared =  0.5610
       Total |  48457.2486   172  281.728189           Root MSE      =  11.121

------------------------------------------------------------------------------
      vote_A |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    expend_A |   .0349245   .0033695    10.36   0.000     .0282728    .0415762
    expend_B |  -.0349236   .0030012   -11.64   0.000    -.0408482   -.0289989
   prtystr_A |    .342515   .0879518     3.89   0.000     .1688893    .5161407
       _cons |   33.26714   4.416778     7.53   0.000     24.54797     41.9863
------------------------------------------------------------------------------

Coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant. Example: an increase in Candidate A's expenditures by $1 increases the percent of the vote he/she receives by about 0.035 percentage points.
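As an illustrative addition (not on the slides), the marginal-effect interpretation can be checked from the stored estimates; the 100-unit increase below is an arbitrary example value. Run these lines right after regress.

* effect on vote_A of a one-unit increase in expend_A, other regressors held constant
display _b[expend_A]
* predicted change in vote share for a 100-unit increase in expend_A
display 100 * _b[expend_A]
* the same linear combination with a standard error and confidence interval
lincom 100*expend_A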

Stata Step 3: Output

(Same regression output as above.)

If p < .05, a coefficient is statistically significantly different from zero. Equivalently, with 169 degrees of freedom a coefficient is significant at the 5% level (two-tailed) when |t| > 1.97; all four t-statistics here are well above that threshold.
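A small sketch (added, not from the slides) for checking the significance cutoffs used above; run the test command right after regress.

* two-tailed 5% critical value of the t distribution with 169 degrees of freedom
display invttail(169, 0.025)
* two-tailed p-value implied by the expend_A t-statistic
display 2*ttail(169, 10.36)
* Wald test that the expend_A coefficient is zero (F equals t squared for a single restriction)
test expend_A = 0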

Stata Step 3: Output

(Same regression output as above.)

The "R-squared" row reports the coefficient of determination: the proportion of the variance in the dependent variable that is explained by the independent variables. Here the independent variables explain about 56.9% of the variability of the dependent variable.
The adjusted R-squared is a modified version of R-squared that accounts for the number of predictors in the model: it increases only if a new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance.
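As an added check (not on the slides), both quantities can be recomputed from the ANOVA table or read from the results stored by regress.

* R-squared = 1 - SS_residual / SS_total
display 1 - 20901.6323/48457.2486
* adjusted R-squared = 1 - (SS_residual/(n-k-1)) / (SS_total/(n-1))
display 1 - (20901.6323/169)/(48457.2486/172)
* the same values are stored by regress
display e(r2)
display e(r2_a)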

Stata Step 3: Output

(Same regression output as above.)

The F-statistic tests whether the explanatory variables as a group explain a statistically significant share of the variation in the dependent variable. Here F(3, 169) = 74.27 with Prob > F = 0.0000, so the regressors are jointly significant.
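An added sketch (not from the slides): the overall F-statistic is the ratio of the model and residual mean squares, and the same joint hypothesis can be tested with a Wald test after regress.

* F(3, 169) = MS_model / MS_residual
display 9185.20542/123.678298
* p-value of the overall F-statistic
display Ftail(3, 169, 74.27)
* joint test that all slope coefficients are zero (run after regress)
test expend_A expend_B prtystr_A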

Stata: Plotting Regression Line

Graphics > Twoway graph > Create > Fit plots & line
or use a command:
twoway (lfit vote_A expend_A)

To combine the scatter plot and the fitted line:
twoway (scatter vote_A expend_A) (lfit vote_A expend_A)

[Figure: scatter plot of vote_A against expend_A with the fitted values line overlaid.]
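A sketch of common refinements (the axis titles, legend text, and export file name are assumptions, not from the slides):

* combined scatter and fitted line with axis titles and a legend, then export the graph
twoway (scatter vote_A expend_A) (lfit vote_A expend_A), ///
    xtitle("expend_A") ytitle("vote_A") ///
    legend(order(1 "vote_A" 2 "Fitted values"))
graph export "voteA_expendA.png", replace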
