How To Use Dummy X Variables
HoToUseDummyXVariables.doc
Person is from the Midwest. Then this person is predicted to make 10.2 - 0.9 =
9.3 (dollars per hour), since the values for Northeast and South are zero.
Example:
Predicted Wage = 12.7 + 2.1 Education - 1.8 Female
Holding Education constant, Females make $1.80 per hour less than Males (the
base case).
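As a minimal sketch of how this prediction works, the Python function below plugs values into the example equation. The function name `predicted_wage` and the choice of 12 years of education are assumptions made for illustration; only the coefficients come from the example.

```python
def predicted_wage(education, female):
    """Predicted hourly wage from the example equation.

    female is a dummy variable: 1 for women, 0 for men (the base case).
    """
    return 12.7 + 2.1 * education - 1.8 * female


# Compare a man and a woman with the same education (12 is an arbitrary choice).
wage_male = predicted_wage(12, female=0)
wage_female = predicted_wage(12, female=1)
gap = wage_female - wage_male
print(f"Male: {wage_male:.2f}  Female: {wage_female:.2f}  Difference: {gap:.2f}")
```

Because Education is the same in both calls, the difference is just the coefficient on Female, whatever education level is chosen.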
Predicted Wage from Regression of Wage on Education, Female, and Female*Education (Interaction Term)

Education    Female = 0    Female = 1    F-M Differential
    8         $  7.12       $  4.13         $ (2.99)
    9         $  8.00       $  5.17         $ (2.83)
   10         $  8.88       $  6.22         $ (2.67)
   11         $  9.76       $  7.26         $ (2.51)
   11.5       $ 10.20       $  7.78         $ (2.43)
   12         $ 10.65       $  8.30         $ (2.35)
   13         $ 11.53       $  9.34         $ (2.19)
   14         $ 12.41       $ 10.38         $ (2.02)
   16         $ 14.17       $ 12.47         $ (1.70)
   18         $ 15.93       $ 14.55         $ (1.38)

[Figure: Male and Female Regression Lines. Predicted wage (vertical axis, $2 to $18) plotted against Education (horizontal axis, 8 to 18), with one line for Male and one for Female.]
A very useful approach to interpretation is to create a table and compute the predicted Y
for given values of the included Xs.
See Section 8.5 for more.
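The table-building approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the handout's own procedure: the names `predicted_wage` and `wage_table` are made up, and the coefficients are the rounded values from the interaction regression reported later in this document (Predicted Wage = 0.076 + 0.88 Education - 4.28 Female + 0.16 Female*Education), so the output can differ by a few cents from a table built with unrounded estimates.

```python
# Rounded coefficients from the regression of Wage on Education, Female,
# and Female*Education reported in this document.
B0, B_EDUC, B_FEMALE, B_INTERACT = 0.076, 0.88, -4.28, 0.16


def predicted_wage(education, female):
    """Predicted wage; female is 1 for women and 0 for men."""
    return (B0 + B_EDUC * education + B_FEMALE * female
            + B_INTERACT * female * education)


def wage_table(education_levels):
    """Return rows of (education, male wage, female wage, F-M differential)."""
    rows = []
    for educ in education_levels:
        male = predicted_wage(educ, 0)
        female = predicted_wage(educ, 1)
        rows.append((educ, male, female, female - male))
    return rows


rows = wage_table([8, 10, 12, 14, 16, 18])
print(f"{'Education':>9} {'Male':>8} {'Female':>8} {'F-M':>8}")
for educ, male, female, diff in rows:
    print(f"{educ:>9} {male:>8.2f} {female:>8.2f} {diff:>8.2f}")
```

Laying the predictions side by side makes the role of the interaction term visible: the female-male differential is not constant but shrinks as Education rises.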
NOTE: Sometimes, when you add an interaction term, the coefficient on the dummy
variable takes a counterintuitive sign. For example, adding Black*Education
might make the coefficient on the Black dummy variable positive. But the negative
coefficient on Black*Education must also be included when figuring out the effect of
being black on predicted wage, and you will then obtain the expected result that blacks
make less than non-blacks.
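To make the note concrete, here is a small Python sketch. The two coefficients are HYPOTHETICAL numbers picked only to show a positive dummy coefficient combined with a negative interaction coefficient; they are not estimates from this handout or from any data set, and `black_effect` is an illustrative name.

```python
# HYPOTHETICAL coefficients, chosen only to illustrate the point; they are
# not estimates from the handout or from any data set.
B_BLACK = 1.5        # coefficient on the Black dummy (positive)
B_INTERACT = -0.25   # coefficient on Black*Education (negative)


def black_effect(education):
    """Total effect of Black = 1 on predicted wage at a given education level."""
    return B_BLACK + B_INTERACT * education


for educ in (8, 12, 16):
    print(f"Education {educ}: effect of being black = {black_effect(educ):+.2f}")
```

With these made-up numbers the total effect is negative at every education level shown, even though the dummy coefficient on its own is positive.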
3) How to test hypotheses with dummy variables and interaction terms?
The F-test is the way to do this.
Obtain the SSR for the restricted and unrestricted models, compute the F-statistic
(properly adjusting for the degrees of freedom in numerator and denominator), then find
the P-value.
Example using data and results from Female.xls (in the Chapter 8\ExcelFiles folder):
In a study of the determinants of wages, we want to see whether being female matters
after controlling for education. We think that being female affects both the slope and
intercept of the relationship between wages and education. Here are results from the
unrestricted regression:
Predicted Wage = 0.076 + 0.88 Education - 4.28 Female + 0.16 Female*Education
SSR = 419141
n = 8546
The corresponding restricted regression results are:
Predicted Wage = -1.53 + 0.92 Education
SSR = 430216
n = 8546
(This regression is not in Female.xls, but it is very easy to run.)
For a test that there is no difference between male and female wages after controlling for
education, the null hypothesis says that the coefficients on Female AND
Female*Education together equal zero.
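The test statistic is

\[
F = \frac{(430216 - 419141)/2}{419141/8542} \approx \frac{5537.5}{49.07} \approx 113,
\]

where 2 is the number of restrictions under the null and 8542 = 8546 - 4 is the degrees of freedom of the unrestricted regression.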
The P-value is very small. We would reject the null hypothesis that being female doesn't
matter for wage determination, after controlling for education.
See Section 17.7 for more.
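The F-test steps can be sketched in Python using only the SSR and n values reported in the example. The function names are assumptions for illustration. The P-value helper relies on a closed form for the upper tail of the F distribution that is valid only when the numerator has exactly 2 degrees of freedom, as it does here; for other cases a statistics package would be needed. Computed from these rounded SSR values, the statistic comes out at roughly 113.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, num_restrictions, n, k_unrestricted):
    """F-statistic for comparing a restricted model to an unrestricted one."""
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / (n - k_unrestricted)
    return numerator / denominator


def p_value_two_restrictions(f_stat, denom_df):
    """Upper-tail probability of an F(2, denom_df) distribution.

    This closed form holds only when the numerator has exactly 2 degrees
    of freedom, as in this example.
    """
    return (1 + 2 * f_stat / denom_df) ** (-denom_df / 2)


n = 8546   # observations
k = 4      # coefficients in the unrestricted model (intercept plus three slopes)
f_stat = f_statistic(430216, 419141, 2, n, k)
p_value = p_value_two_restrictions(f_stat, n - k)
print(f"F = {f_stat:.1f}, P-value = {p_value:.3g}")
```

The P-value is vanishingly small, which is consistent with rejecting the null that the Female and Female*Education coefficients are both zero.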
4) How to create a double-log functional form with dummy variables?
You cannot take the natural log of a dummy variable because ln(0) is undefined.
Thus, you cannot create a completely double-log specification when you have dummy
independent variables.
What is usually done is to take the natural log of the Y and continuous X variables,
leaving the dummy variables untransformed.
Example (based on Duquette (1999)):
Own Units (or Levels) Specification:
\[ \text{CharitableContributions}_i = \beta_0 + \beta_1\,\text{Price}_i + \beta_2\,\text{DisposableIncome}_i + \beta_3\,\text{Married}_i + \varepsilon_i \]
All variables are continuous except Married. Price measures the after-tax cost of a
contribution. For example, if my tax bill falls by 25 cents for every dollar I give, the
price is 0.75.
To make this a log-transformed model we take the natural logs of the continuous
variables and leave the dummy variables alone.
Log Specification:
\[ \ln \text{CharitableContributions}_i = \beta_0 + \beta_1 \ln \text{Price}_i + \beta_2 \ln \text{DisposableIncome}_i + \beta_3\,\text{Married}_i + \varepsilon_i \]
The b1 and b2 estimates could be interpreted as price and income elasticities, respectively.
The estimate of b3 (multiplied by 100) would be roughly the percentage difference in
charitable contributions for married taxpayers versus unmarried taxpayers. See the answer
to the next question to learn how to precisely interpret the coefficient estimate for Married.
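The transformation itself can be sketched in Python: log the dependent variable and the continuous regressors, and pass the dummy through unchanged. The two records below are HYPOTHETICAL values invented for illustration (they are not Duquette's data), and the field names and the `log_transform` helper are assumptions.

```python
import math

# HYPOTHETICAL observations for illustration only (not Duquette's data).
taxpayers = [
    {"contributions": 500.0, "price": 0.75, "income": 40000.0, "married": 1},
    {"contributions": 200.0, "price": 0.85, "income": 25000.0, "married": 0},
]


def log_transform(record):
    """Log the continuous variables; leave the Married dummy untransformed."""
    return {
        "ln_contributions": math.log(record["contributions"]),
        "ln_price": math.log(record["price"]),
        "ln_income": math.log(record["income"]),
        "married": record["married"],  # ln(0) is undefined, so keep 0/1
    }


transformed = [log_transform(r) for r in taxpayers]
for row in transformed:
    print(row)
```

The transformed records would then be used in place of the original variables when running the regression.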
\[ \frac{PCG_{Married=1} - PCG_{Married=0}}{PCG_{Married=0}} \]

We can express this percentage difference in predicted values as

\[ \frac{PCG_{Married=1}}{PCG_{Married=0}} - 1. \]

Now what we actually obtain from the log-specified regression equation is:

\[ \ln(PCG_{Married=1}) - \ln(PCG_{Married=0}) = \ln\!\left(\frac{PCG_{Married=1}}{PCG_{Married=0}}\right) = b_3 = 0.46. \]

By the properties of logs, it then follows that we can take the anti-log to obtain

\[ \frac{PCG_{Married=1}}{PCG_{Married=0}} = \exp(b_3) = 1.58. \]

Therefore,

\[ \frac{PCG_{Married=1} - PCG_{Married=0}}{PCG_{Married=0}} = \frac{PCG_{Married=1}}{PCG_{Married=0}} - 1 = \exp(b_3) - 1 = \exp(0.46) - 1 = 1.58 - 1 = 0.58. \]
Or, 58 percent. That is, married taxpayers, on average, make charitable contributions
58 percent higher than unmarried taxpayers, holding income and price constant.
The smaller the coefficient in absolute value, the closer the approximation will be to
the exact computation, because exp(x) - 1 gets closer to x as x approaches zero.
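A short Python sketch shows how the approximation and the exact computation diverge as the coefficient grows. The function names are assumptions for illustration; 0.46 is the b3 value used above, while 0.02 and 0.10 are arbitrary smaller coefficients added for comparison.

```python
import math


def exact_percent_difference(b):
    """Exact percentage difference implied by a dummy coefficient b in a log-Y model."""
    return 100 * (math.exp(b) - 1)


def approximate_percent_difference(b):
    """Common approximation: read the coefficient directly as a percentage."""
    return 100 * b


for b in (0.02, 0.10, 0.46):
    exact = exact_percent_difference(b)
    approx = approximate_percent_difference(b)
    print(f"b = {b:.2f}: approximate {approx:.1f}%, exact {exact:.1f}%")
```

For small coefficients the two figures nearly coincide, while at 0.46 the approximation (46 percent) falls well short of the exact value (about 58 percent).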
Reference:
Duquette, Christopher M. (1999) "Is Charitable Giving by Nonitemizers Responsive to
Tax Incentives? New Evidence." National Tax Journal 52(2): 195-206.