TT220 5 Apr19-1
Murdock
April 5, 2019, 9:10 – 11:00 am
U of T E-MAIL: ______________________
SURNAME (LAST NAME):
GIVEN NAME (FIRST NAME):
UTORID (e.g. LIHAO118):
Instructions:
You have 110 minutes. Keep these test papers and the Supplement closed and face up on your desk until the
start of the test is announced. You must stay for a minimum of 60 minutes.
You may use a non-programmable calculator.
There are 5 questions (most with multiple parts) with varying point values worth a total of 100 points.
This test includes these 8 pages plus the Supplement. The Supplement contains the aid sheets (formulas,
Normal, Student t, and F tables) and readings, figures, tables, and other materials required for some test
questions. For each question referencing this Supplement, carefully review all materials. The Supplement will
NOT be graded: write your answers on these test papers. When we announce the end of the test, hand these
test papers to us (you keep the Supplement).
Write your answers clearly, completely, and concisely in the designated space provided immediately after
each question. An answer guide ends each question to let you know what is expected: for example, a
quantitative analysis (which shows your work), a fully-labelled graph, and/or sentences.
o Anything requested by the question and/or the answer guide is required. Similarly, limit yourself to
the answer guide. For example, if the answer guide does not request sentences, provide only what is
requested (e.g. quantitative analysis).
Your entire answer must fit in the designated space provided immediately after each question. No extra
space/pages are possible. You cannot use blank space for other questions nor can you write answers on the
Supplement. Write in PENCIL and use an ERASER as needed so that you can fit your final answer (including
work and reasoning) in the appropriate space. Questions give more blank space than is needed for an answer
(with typical handwriting) worth full marks. Follow the answer guides and avoid excessively long answers.
(1) See Supplement for Question (1): The 2018 World Happiness Report.
(a) [10 pts] See Figure 2.2 and, below it, the details for Canada and Japan. What is the 95% CI estimate of the
DIFFERENCE in mean happiness between Canada and Japan? Answer with a quantitative analysis.
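One way to approach Part (a), sketched in Python under stated assumptions: each interval in Figure 2.2 is taken to be symmetric, mean ± 1.96 × SE (an assumption; with samples above 2,000 the t and z critical values are nearly identical), and the Canadian and Japanese samples are treated as independent. The point estimate is then the CI midpoint, the SE is the half-width divided by 1.96, and the two SEs combine as for a difference between two independent means.

```python
import math

Z_95 = 1.96  # assumption: Figure 2.2's 95% CIs are symmetric, z-based intervals

def mean_and_se(lower: float, upper: float, z: float = Z_95) -> tuple[float, float]:
    """Recover the point estimate (midpoint) and standard error from a symmetric CI."""
    mean = (lower + upper) / 2
    se = (upper - lower) / (2 * z)
    return mean, se

def diff_ci(ci_a: tuple[float, float], ci_b: tuple[float, float],
            z: float = Z_95) -> tuple[float, float, float]:
    """Difference mean_a - mean_b and its 95% CI, for two independent samples."""
    mean_a, se_a = mean_and_se(ci_a[0], ci_a[1], z)
    mean_b, se_b = mean_and_se(ci_b[0], ci_b[1], z)
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # variances of independent means add
    return diff, diff - z * se_diff, diff + z * se_diff

canada = (7.2363, 7.4207)  # n = 2,027
japan = (5.8333, 5.9967)   # n = 3,008
diff, lo, hi = diff_ci(canada, japan)
print(f"Canada - Japan: {diff:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

Under these assumptions the interval comes out at roughly [1.29, 1.54] ladder points, which excludes zero.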
(b) [6 pts] See Table A7. Define $abroad$ to equal 1 for respondents with family abroad and 0 otherwise. Define
$happiness$ as the life evaluation score (0-10 scale). What would be the OLS equation $\widehat{happiness} = b_0 + b_1\,abroad$?
Also, what would be the sample size ($n$) for that OLS regression? Answer with the values of $b_0$, $b_1$, and $n$.
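Part (b) rests on a general property of OLS with a single 0/1 regressor: the intercept equals the mean of y for the group coded 0, the slope equals the difference in group means, and n counts both groups together. The sketch below checks this on a small made-up dataset; the numbers are hypothetical and are not taken from Table A7.

```python
# Hypothetical data for illustration only (NOT from Table A7)
abroad = [0, 0, 0, 0, 1, 1, 1]
happiness = [5.0, 6.0, 7.0, 6.0, 7.5, 6.5, 7.0]

def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept b0 and slope b1 of y-hat = b0 + b1*x by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def group_mean(flag: int) -> float:
    """Mean happiness for respondents with abroad == flag."""
    values = [h for a, h in zip(abroad, happiness) if a == flag]
    return sum(values) / len(values)

b0, b1 = ols_simple(abroad, happiness)
n = len(happiness)  # every respondent in either group enters the regression
print(f"b0 = {b0:.3f} vs. mean(abroad=0) = {group_mean(0):.3f}")
print(f"b1 = {b1:.3f} vs. difference in means = {group_mean(1) - group_mean(0):.3f}")
print(f"n = {n}")
```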
(2) See Supplement for Question (2): Air Pollution in Tianjin, China.
(a) [6 pts] Which of these summarizes the regression results for Tianjin? Explain. Answer with 2 – 3 sentences.
(b) [4 pts] How should we interpret the value of -14.73599? Answer with 1 – 2 sentences.
(3) [18 pts] See Supplement for Question (3): California Energy. Compare and contrast the results in boldface for
constr_01_04 in Regression #1 and Regression #2. For each, interpret the results in boldface. Also, explain why the
results in boldface are similar or different across the regressions. Which one of these two regressions offers a better
answer to the primary research question in this journal article? Why is it better? Answer with 8 – 10 sentences.
(4) See Supplement for Question (4): Correlation matrix.
(a) [5 pts] Is the correlation between y and x1 statistically significant? If so, at which of these common significance
levels: 10%, 5%, 1%, or 0.1%? Answer with a quantitative analysis & 1 sentence.
(b) [5 pts] Is the correlation between y and x2 statistically significant? If so, at which of these common significance
levels: 10%, 5%, 1%, or 0.1%? Answer with a quantitative analysis & 1 sentence.
(c) [7 pts] True/False/Explain: “A multiple regression of y on x1, x2, and x3 would allow us to check how y is related
to each x variable and whether or not each correlation is statistically significant.” Answer with 2 – 3 sentences.
(5) See Supplement for Question (5): Parents’ Beliefs About Their Children’s Academic Ability.
(a) [14 pts] Is there a statistically significant difference between the control group and treatment group in the mean
overall score? What is the P-value? Is the answer about whether or not there is a statistically significant difference
surprising or expected? Explain. Answer with hypotheses in formal notation, a quantitative analysis & 2 – 3
sentences.
(b) [8 pts] Using Regression (1) in Table 1, draw ONE graph showing how predicted endline beliefs relate to overall
scores for each group: the control group and treatment group. Label the axes, specify which line belongs to which
group, and clearly write the numeric values of the intercept and slope of each. Answer with a fully-labelled graph.
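A sketch for Part (b), assuming Regression (1) has the form endline beliefs = b0 + b1·treat + b2·overall score + b3·(treat × overall score). Table 1 is not reproduced here, so that form, the class and field names, and the coefficient values below are all assumptions or placeholders. Setting treat = 0 gives the control line and treat = 1 gives the treatment line.

```python
from dataclasses import dataclass

@dataclass
class InteractionFit:
    """Hypothetical container for an assumed model:
    endline_beliefs = b0 + b1*treat + b2*overall_score + b3*(treat*overall_score)."""
    b0: float  # constant
    b1: float  # treat
    b2: float  # overall_score
    b3: float  # treat x overall_score interaction

    def line(self, treat: int) -> tuple[float, float]:
        """(intercept, slope) of predicted endline beliefs against overall score."""
        return self.b0 + self.b1 * treat, self.b2 + self.b3 * treat

# Placeholder coefficients to show the arithmetic; substitute Table 1, Column (1).
fit = InteractionFit(b0=1.0, b1=2.0, b2=3.0, b3=4.0)
for group, treat in [("control", 0), ("treatment", 1)]:
    intercept, slope = fit.line(treat)
    print(f"{group}: predicted beliefs = {intercept:g} + {slope:g} x overall score")
```

Under this assumed form the two slopes differ by exactly b3, so whether they differ significantly comes down to the t test on the interaction coefficient.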
(c) [8 pts] What is the model for Regression (1) in Table 1? Next, continuing with Part (b), is there a statistically
significant difference in the slopes of the two lines? Answer with a formal regression model, hypotheses in formal
notation, a quantitative analysis & 1 sentence.
(d) [9 pts] All things considered, which column of results in Table 1 is the one that the reader should focus on? Why?
Make sure to consider the $R^2$ in your assessment. Answer with a clear choice of Column (1), (2), or (3)
& 3 – 4 sentences.
The pages of this supplement will NOT be graded: write your answers on the test papers. Supplement: Page 1 of 12
This Supplement contains the aid sheets (formulas, Normal, Student t, and F tables) and readings, figures, tables, and
other materials for some test questions. For each question referencing this Supplement, carefully review all materials.
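Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} X_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$    Sample s.d.: $s = \sqrt{s^2}$
Sample coefficient of variation: $CV = \frac{s}{\bar{X}}$
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \frac{\sum_{i=1}^{n} X_i Y_i}{n-1} - \frac{\sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{n(n-1)}$
Sample interquartile range: $IQR = Q3 - Q1$
Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y}$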
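The aid sheet gives the sample variance and the sample covariance each in a definitional form (deviations from the mean) and a shortcut form (raw sums). A quick Python check on a toy dataset, made up for illustration, confirms the two forms agree and computes r = s_xy/(s_x s_y) and the CV.

```python
import math

def sample_stats(x: list[float], y: list[float]) -> dict[str, float]:
    """Sample variance, covariance, correlation, and CV, each with n - 1 in the denominator."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Definitional forms: squared or cross-multiplied deviations from the mean
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    # Shortcut forms: raw sums only, algebraically identical
    s2_x_short = (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)
    s_xy_short = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)
    return {
        "s2_x": s2_x, "s2_x_short": s2_x_short,
        "s_xy": s_xy, "s_xy_short": s_xy_short,
        "r": s_xy / math.sqrt(s2_x * s2_y),
        "cv_x": math.sqrt(s2_x) / x_bar,
    }

stats = sample_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # toy data
print(stats)
```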
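Addition rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$    Conditional probability: $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$
Complement rules: $P(A^C) = P(A') = 1 - P(A)$    $P(A^C|B) = P(A'|B) = 1 - P(A|B)$
Multiplication rule: $P(A \text{ and } B) = P(A|B)P(B) = P(B|A)P(A)$
If $X$ is Uniform ($X \sim U[a, b]$) then $f(x) = \frac{1}{b-a}$ and $E[X] = \frac{a+b}{2}$ and $V[X] = \frac{(b-a)^2}{12}$
CI estimator (two population proportions): $(\hat{P}_1 - \hat{P}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$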
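Inference about comparing two population means, independent samples, unequal variances:
$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$    CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Degrees of freedom: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Inference about comparing two population means, independent samples, assuming equal variances:
$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$    CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$    Degrees of freedom: $\nu = n_1 + n_2 - 2$
Pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$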
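The two independent-samples procedures on this sheet differ only in the standard error and the degrees of freedom. This sketch (function names and inputs are illustrative) computes both from summary statistics under H0: μ1 − μ2 = 0; when s1 = s2 and n1 = n2 the two coincide, which makes a handy sanity check.

```python
import math

def welch_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Unequal variances: t statistic (H0: mu1 - mu2 = 0) and Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x1_bar - x2_bar) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2):
    """Equal variances: t statistic (H0: mu1 - mu2 = 0) using the pooled variance."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1_bar - x2_bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical summary statistics: equal s and n, so both procedures should agree
print(welch_t(10.0, 2.0, 20, 9.0, 2.0, 20))
print(pooled_t(10.0, 2.0, 20, 9.0, 2.0, 20))
# Unequal spreads and sizes: Welch df falls between min(n1, n2) - 1 and n1 + n2 - 2
print(welch_t(10.0, 1.0, 10, 9.0, 5.0, 40))
```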
Inference about comparing two population means, paired data: ($n$ is the number of pairs and $d_i = X_{1i} - X_{2i}$)
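SIMPLE REGRESSION:
Standard deviation of residuals: $s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}}$    Standard error of slope: $s.e.(b_1) = s_{b_1} = \frac{s_e}{\sqrt{(n-1)s_x^2}}$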
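$t$ test statistic: $t = \frac{b_1 - \beta_1}{s.e.(b_1)}$    CI estimator: $b_1 \pm t_{\alpha/2}\, s.e.(b_1)$    Degrees of freedom: $\nu = n - 2$
Prediction interval: $\hat{y} \pm t_{\alpha/2}\, s_e\sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n-1)s_x^2}}$
or $\hat{y} \pm t_{\alpha/2}\sqrt{[s.e.(b_1)]^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n} + s_e^2}$    Degrees of freedom: $\nu = n - 2$
Confidence interval estimator of the expected value of $y$: $\hat{y} \pm t_{\alpha/2}\, s_e\sqrt{\frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n-1)s_x^2}}$
or $\hat{y} \pm t_{\alpha/2}\sqrt{[s.e.(b_1)]^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n}}$    Degrees of freedom: $\nu = n - 2$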
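The two forms of each interval for simple regression are algebraically identical because $[s.e.(b_1)]^2 = s_e^2/((n-1)s_x^2)$. This sketch, with hypothetical inputs ($n = 30$, so $\nu = 28$ and the Student t table value $t_{0.025} = 2.048$), checks that numerically and shows the prediction interval is always the wider of the two.

```python
import math

def interval_half_widths(s_e, n, x_bar, s2_x, x_g, t_crit):
    """Half-widths of the CI for E[y | x_g] and of the prediction interval for a new y at x_g,
    each computed with both forms on the aid sheet."""
    se_b1 = s_e / math.sqrt((n - 1) * s2_x)  # standard error of the slope
    lever = (x_g - x_bar) ** 2 / ((n - 1) * s2_x)
    ci_form1 = t_crit * s_e * math.sqrt(1 / n + lever)
    pi_form1 = t_crit * s_e * math.sqrt(1 + 1 / n + lever)
    ci_form2 = t_crit * math.sqrt(se_b1 ** 2 * (x_g - x_bar) ** 2 + s_e ** 2 / n)
    pi_form2 = t_crit * math.sqrt(se_b1 ** 2 * (x_g - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)
    return ci_form1, ci_form2, pi_form1, pi_form2

# Hypothetical inputs: n = 30, so nu = 28 and t_{0.025} = 2.048
ci1, ci2, pi1, pi2 = interval_half_widths(s_e=2.0, n=30, x_bar=10.0, s2_x=4.0, x_g=12.0, t_crit=2.048)
print(f"CI half-width: {ci1:.4f} (form 1), {ci2:.4f} (form 2)")
print(f"PI half-width: {pi1:.4f} (form 1), {pi2:.4f} (form 2)")
```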
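MULTIPLE REGRESSION:
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$    $Adj.\ R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = R^2 - \frac{k(1-R^2)}{n-k-1}$
Residuals: $e_i = y_i - \hat{y}_i$    Standard deviation of residuals: $s_e = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-k-1}}$
Standard error of slope: $s.e.(b_j) = s_{b_j}$ (for multiple regression, must be obtained from technology)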
Supplement for Question (1): The 2018 World Happiness Report by John F. Helliwell, Richard Layard, and Jeffrey D. Sachs
(http://worldhappiness.report/) documents and analyzes individual well-being across countries and over time.
Excerpt (p. 104): [We measure happiness, also called “life evaluations,” with] the question “Please imagine a
ladder with steps numbered from zero at the bottom to ten at the top. Suppose we say that the top of the
ladder represents the best possible life for you, and the bottom of the ladder represents the worst possible life
for you. On which step of the ladder would you say you personally feel you stand at this time, assuming that
the higher the step the better you feel about your life, and the lower the step the worse you feel about it?
Which step comes closest to the way you feel?”
Surveys ask this Cantril ladder question to a fresh random sample of people in each year and each country. Below are
pieces of Figure 2.2 and an excerpt that explains it. (The full figure is three pages long and shows 156 countries.)
Excerpt (p. 21): Figure 2.2 shows the average ladder score (answer to the Cantril ladder question, on a scale of
0 to 10) for each country, averaged over the years 2015-2017. The total sample sizes are reported in the
statistical appendix, and are reflected in Figure 2.2 by the horizontal lines showing the 95% confidence regions.
Details for Canada and Japan: For 2015-2017 combined, the sample size for Canada (ranked 7) is 2,027 and the values of
the 95% CI shown in the figure are: [7.2363, 7.4207]. The sample size for Japan (ranked 54) is 3,008 and the values of the
95% CI shown in the figure are: [5.8333, 5.9967].
Notes: Includes Venezuela, Brazil, Mexico, Costa Rica, Argentina, Bolivia, Chile, Colombia, Ecuador, El Salvador, Guatemala,
Honduras, Nicaragua, Panama, Paraguay, Peru, and Uruguay and excludes the foreign-born in each country of interview.
Supplement for Question (2): Recall Zheng and Kahn (2017), “A New Era of Pollution Progress in Urban China?” PM10
measures air pollution in micrograms per cubic meter of air (μg/m³). The regression below uses 10 years of data,
2003 through 2012, for the city of Tianjin. The variable trend is 1 for the year 2003, 2 for 2004, 3 for 2005, …, and 10
for 2012. The variable trend_sq is trend squared. One value below is in boldface for easy reference.
------------------------------------------------------------------------------
pm10 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
trend | -14.73599 3.364972 -4.38 0.003 -22.69289 -6.779099
trend_sq | 1.117 .2981239 3.75 0.007 .412049 1.821951
_cons | 143.0097 8.05707 17.75 0.000 123.9577 162.0616
------------------------------------------------------------------------------
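With a quadratic trend, the slope of the fitted curve changes with trend: d(pm10)/d(trend) = b1 + 2·b2·trend. So −14.73599 is the slope at trend = 0 (the year 2002, just before the data begin), and the fitted curve bottoms out where the slope is zero, at trend = −b1/(2·b2). The sketch below plugs in the coefficients from the output above to trace the fitted values and slopes year by year.

```python
B0, B1, B2 = 143.0097, -14.73599, 1.117  # _cons, trend, trend_sq from the output above

def fitted_pm10(trend: float) -> float:
    """Fitted PM10 (μg/m³) from the quadratic trend regression."""
    return B0 + B1 * trend + B2 * trend ** 2

def slope_at(trend: float) -> float:
    """Slope of the fitted curve, d(pm10)/d(trend) = B1 + 2*B2*trend."""
    return B1 + 2 * B2 * trend

turning_point = -B1 / (2 * B2)  # slope is zero here; a minimum because B2 > 0

for year, trend in zip(range(2003, 2013), range(1, 11)):
    print(year, round(fitted_pm10(trend), 1), round(slope_at(trend), 2))
print("fitted minimum at trend =", round(turning_point, 2))
```

The turning point lands at trend ≈ 6.6, between 2008 and 2009.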
Supplement for Question (3): Recall Levinson (2016) “How Much Energy Do Building Energy Codes Save? Evidence from
California Houses” (https://www.aeaweb.org/articles?id=10.1257/aer.20150102). Below are two multiple regressions from
Excel with parts in boldface for easy reference. In both Regression #1 and #2, the y variable is ln_elec_mmbtu, which is
the natural log of annual household electricity use in MMBTUs. yr_2009 is a survey year dummy (=1 if 2009 RASS data,
=0 if 2003 RASS data). A set of dummy variables record when the house was constructed, with before 1940 serving as
the reference (omitted) category. (For example, constr_40_49 =1 if constructed from 1940-1949, =0 otherwise.)
Regression #1: (Dependent variable is the natural logarithm of annual household electricity use in MMBTUs)
Regression Statistics
R Squared          0.086038686
Observations       14045

ANOVA
              df       SS             MS             F              Significance F
Regression    12       316.1656134    26.34713445    110.0789512    4.9946E-263
Residual      14032    3358.525736    0.239347615
Total         14044    3674.69135
Regression #2: (Dependent variable is the natural logarithm of annual household electricity use in MMBTUs)
Regression Statistics
R Squared          0.34926507
Observations       14045

ANOVA
              df       SS             MS             F              Significance F
Regression    16       1283.441333    80.21508331    470.5727886    0
Residual      14028    2391.250017    0.170462647
Total         14044    3674.69135
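The ANOVA tables of both regressions are internally consistent with the reported R Squared: R² = SS Regression / SS Total and F = MS Regression / MS Residual. The sketch below recomputes both, plus adjusted R² (not reported in the excerpt), from the df and SS columns alone.

```python
def anova_summary(df_reg: int, ss_reg: float, df_res: int, ss_res: float) -> dict[str, float]:
    """R^2, adjusted R^2, and overall F from the df and SS columns of an ANOVA table."""
    ss_tot, df_tot = ss_reg + ss_res, df_reg + df_res  # df_tot = n - 1
    return {
        "r2": ss_reg / ss_tot,
        "adj_r2": 1 - (ss_res / df_res) / (ss_tot / df_tot),
        "f": (ss_reg / df_reg) / (ss_res / df_res),  # MS Regression / MS Residual
    }

reg1 = anova_summary(12, 316.1656134, 14032, 3358.525736)
reg2 = anova_summary(16, 1283.441333, 14028, 2391.250017)
for name, res in [("Regression #1", reg1), ("Regression #2", reg2)]:
    print(name, {k: round(v, 4) for k, v in res.items()})
```

Adjusted R² comes out at about 0.085 for Regression #1 and 0.349 for Regression #2; with 14,045 observations, the penalty for the extra regressors is negligible.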
Supplement for Question (4): Consider the following correlation matrix for observational data with 100 observations
and four variables.
correlate y x1 x2 x3;
(obs=100)
| y x1 x2 x3
-------------+------------------------------------
y | 1.0000
x1 | -0.2892 1.0000
x2 | 0.1891 -0.0182 1.0000
x3 | 0.1423 -0.0384 0.3216 1.0000
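A sample correlation can be tested against H0: ρ = 0 with $t = r\sqrt{(n-2)/(1-r^2)}$ on $n - 2$ degrees of freedom. The sketch below applies that to the two correlations with y above (n = 100, so ν = 98). In place of the printed Student t table, it gets a two-sided P-value by integrating the t density numerically with Simpson's rule; that numerical step is an illustrative substitute, not a method from the Supplement.

```python
import math

def t_stat_for_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0 against H1: rho != 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P-value: integrates the Student t density from 0 to |t| with Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 2 * (0.5 - total * h / 3)

N = 100
LEVELS = [0.10, 0.05, 0.01, 0.001]
for pair, r in [("y, x1", -0.2892), ("y, x2", 0.1891)]:
    t = t_stat_for_r(r, N)
    p = t_two_sided_p(t, N - 2)
    sig = [f"{a:.1%}" for a in LEVELS if p < a]
    print(f"r({pair}) = {r:+.4f}: t = {t:.3f}, P-value = {p:.4f}, significant at {sig or 'none'}")
```

Comparing |t| with the table's critical values for ν = 98 leads to the same conclusions as the computed P-values.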
Supplement for Question (5): In the article “Parents’ Beliefs About Their Children’s Academic Ability: Implications for
Educational Investments,” (https://www.aeaweb.org/articles?id=10.1257/aer.20171172) Dizon-Ross (2019) shows how
providing parents with clear, understandable information about their child’s academic progress can help parents make
more informed decisions. She does a field experiment with 5,268 children from 39 randomly selected primary schools in
two districts (Machinga and Balaka) in Malawi. Here is a quick summary.
The researchers have the scores (out of 100 points) for every student in the subjects of math, English, and
Chichewa. The overall score, which measures academic performance, is the average of these three scores.
Schools are required to send school report cards home to parents. However, these are often lost (students do
not deliver them to their parents) or are unclear to uneducated parents.
In the baseline survey, researchers asked parents about beliefs about their children’s academic performance.
The 5,268 children are randomly divided into a treatment group (𝑛 = 2,614) and control group (𝑛 = 2,654).
o For the treatment group, parents received an additional and specially designed report card on the child’s
performance, which the researchers carefully designed to be clear for all parents including uneducated
ones. Further, a trained person walked parents through each number.
o For the control group, parents only had the usual information (i.e. school report cards that they may
never have received or may not be able to understand).
In the endline survey, researchers asked parents again about beliefs about their child’s academic performance
by asking them to imagine a new test taken that day. These endline beliefs are also measured out of 100 points.
Next are STATA summaries of key variables, separately for the control group and the treatment group.
Control group:
Overall Score
-------------------------------------------------------------
Percentiles Smallest
1% 9 0
5% 18 2
10% 24 2 Obs 2,654
25% 35 2 Sum of Wgt. 2,654
Treatment group:
Overall Score
-------------------------------------------------------------
Percentiles Smallest
1% 10 0
5% 18 0
10% 24 0 Obs 2,614
25% 34 0 Sum of Wgt. 2,614
Control group:
Endline Beliefs (out of 100 points)
-------------------------------------------------------------
Percentiles Smallest
1% 20 0
5% 35 1
10% 40 2 Obs 2,642
25% 50 4 Sum of Wgt. 2,642
Note: There is an outlier in the posted replication data, which is shown in boldface above.
Treatment group:
Endline Beliefs (out of 100 points)
-------------------------------------------------------------
Percentiles Smallest
1% 15 0
5% 25 0
10% 30 0 Obs 2,602
25% 45 1 Sum of Wgt. 2,602
Excerpt (p. 21): I now examine whether information changes beliefs by looking at the impact of information on
mean beliefs measured at endline. Recall that, unlike beliefs measured at baseline, the beliefs question asked
at endline was not asking about last-term test scores; instead, it asked how well parents thought their child
would do on a hypothetical test taken that same day. The prediction is thus that providing information should
decrease the gap between parents’ endline beliefs and their child’s last-term scores. Information cuts the gap
nearly in half.
Table 1 below comes from Tables 2 and C.7 in the original paper as well as direct analysis of the replication data.
Notes: Shows OLS results for three regressions. Data sources are baseline survey, baseline test score data, and
endline survey data. Each observation is a child. The dependent variable is the parent’s endline beliefs about
the child’s performance (out of 100 points) on a hypothetical test taken the same day as the endline survey.
Overall score is the child’s actual average overall score (out of 100 points) on the subjects of math, English,
and Chichewa. Treat is a dummy that equals 1 for those children in the treatment group and equals 0
otherwise. Standard errors are in parentheses. Regression (1) includes an outlier: the observation for
Household id=968 and reference child=2 with endline beliefs of 350, which is outside the possible range of
values (0 to 100). Regression (2) excludes that outlier. Regression (3) keeps that outlier but includes a dummy
variable for it.