ECO220Y1Y, Test #5, Prof. Murdock
April 5, 2019, 9:10 – 11:00 am

U of T E-MAIL:

SURNAME (LAST NAME):

GIVEN NAME (FIRST NAME):

UTORID: (e.g. LIHAO118)

Instructions:
• You have 110 minutes. Keep these test papers and the Supplement closed and face up on your desk until the start of the test is announced. You must stay for a minimum of 60 minutes.
• You may use a non-programmable calculator.
• There are 5 questions (most with multiple parts) with varying point values, worth a total of 100 points.
• This test includes these 8 pages plus the Supplement. The Supplement contains the aid sheets (formulas, Normal, Student t, and F tables) and the readings, figures, tables, and other materials required for some test questions. For each question referencing the Supplement, carefully review all materials. The Supplement will NOT be graded: write your answers on these test papers. When we announce the end of the test, hand these test papers to us (you keep the Supplement).
• Write your answers clearly, completely, and concisely in the designated space provided immediately after each question. An answer guide ends each question to let you know what is expected: for example, a quantitative analysis (which shows your work), a fully-labelled graph, and/or sentences.

o Anything requested by the question and/or the answer guide is required. Similarly, limit yourself to the answer guide. For example, if the answer guide does not request sentences, provide only what is requested (e.g. quantitative analysis).

o Marking TAs are instructed to accept all reasonable rounding.

• Your entire answer must fit in the designated space provided immediately after each question. No extra space or pages are possible. You cannot use blank space for other questions, nor can you write answers on the Supplement. Write in PENCIL and use an ERASER as needed so that you can fit your final answer (including work and reasoning) in the appropriate space. Questions give more blank space than is needed for an answer (with typical handwriting) worth full marks. Follow the answer guides and avoid excessively long answers.
(1) See Supplement for Question (1): The 2018 World Happiness Report.
(a) [10 pts] See Figure 2.2 and, below it, the details for Canada and Japan. What is the 95% CI estimate of the
DIFFERENCE in mean happiness between Canada and Japan? Answer with a quantitative analysis.

(b) [6 pts] See Table A7. Define 𝑎𝑏𝑟𝑜𝑎𝑑 to equal 1 for respondents with family abroad and 0 otherwise. Define ℎ𝑎𝑝𝑝𝑖𝑛𝑒𝑠𝑠 as the life evaluation score (0-10 scale). What would be the OLS equation $\widehat{happiness} = b_0 + b_1\, abroad$? Also, what would be the sample size ($n$) for that OLS regression? Answer with the values of $b_0$, $b_1$, and $n$.
(2) See Supplement for Question (2): Air Pollution in Tianjin, China.

(a) [6 pts] Which of these summarizes the regression results for Tianjin? Explain. Answer with 2 – 3 sentences.

(b) [4 pts] How should we interpret the value of -14.73599? Answer with 1 – 2 sentences.
(3) [18 pts] See Supplement for Question (3): California Energy. Compare and contrast the results in boldface for
constr_01_04 in Regression #1 and Regression #2. For each, interpret the results in boldface. Also, explain why the
results in boldface are similar or different across the regressions. Which one of these two regressions offers a better
answer to the primary research question in this journal article? Why is it better? Answer with 8 – 10 sentences.
(4) See Supplement for Question (4): Correlation matrix.

(a) [5 pts] Is the correlation between y and x1 statistically significant? If so, at which of these common significance
levels: 10%, 5%, 1%, or 0.1%? Answer with a quantitative analysis & 1 sentence.

(b) [5 pts] Is the correlation between y and x2 statistically significant? If so, at which of these common significance
levels: 10%, 5%, 1%, or 0.1%? Answer with a quantitative analysis & 1 sentence.

(c) [7 pts] True/False/Explain: “A multiple regression of y on x1, x2, and x3 would allow us to check how y is related
to each x variable and whether or not each correlation is statistically significant.” Answer with 2 – 3 sentences.
(5) See Supplement for Question (5): Parents’ Beliefs About Their Children’s Academic Ability.

(a) [14 pts] Is there a statistically significant difference between the control group and treatment group in the mean
overall score? What is the P-value? Is the answer about whether or not there is a statistically significant difference
surprising or expected? Explain. Answer with hypotheses in formal notation, a quantitative analysis & 2 – 3
sentences.
(b) [8 pts] Using Regression (1) in Table 1, draw ONE graph showing how predicted endline beliefs relate to overall
scores for each group: the control group and treatment group. Label the axes, specify which line belongs to which
group, and clearly write the numeric values of the intercept and slope of each. Answer with a fully-labelled graph.

(c) [8 pts] What is the model for Regression (1) in Table 1? Next, continuing with Part (b), is there a statistically
significant difference in the slopes of the two lines? Answer with a formal regression model, hypotheses in formal
notation, a quantitative analysis & 1 sentence.
(d) [9 pts] All things considered, which column of results in Table 1 is the one that the reader should focus on? Why? Make sure to consider the $R^2$ in your assessment. Answer with a clear choice of Column (1), (2), or (3) & 3 – 4 sentences.
The pages of this supplement will NOT be graded: write your answers on the test papers. Supplement: Page 1 of 12

This Supplement contains the aid sheets (formulas, Normal, Student t, and F tables) and readings, figures, tables, and
other materials for some test questions. For each question referencing this Supplement, carefully review all materials.

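Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$
Sample s.d.: $s = \sqrt{s^2}$

Sample coefficient of variation: $CV = \frac{s}{\bar{X}}$
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i}{n-1} - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n(n-1)}$

Sample interquartile range: $IQR = Q3 - Q1$
Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{X})^2 \sum_{i=1}^{n} (y_i - \bar{Y})^2}}$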
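The descriptive-statistics formulas above can be checked numerically. This is a minimal sketch using only the Python standard library and a small hypothetical dataset (not from the test); it computes the sample variance both ways so you can confirm that the definitional and shortcut forms agree.

```python
import math


def describe(xs, ys):
    """Aid-sheet descriptive statistics for paired samples xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample variance: definitional form and the computational shortcut form.
    var_def = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_short = sum(x * x for x in xs) / (n - 1) - sum(xs) ** 2 / (n * (n - 1))
    s_x = math.sqrt(var_def)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return {
        "mean": x_bar,
        "var": var_def,
        "var_shortcut": var_short,
        "sd": s_x,
        "cv": s_x / x_bar,
        "cov": s_xy,
        "r": s_xy / (s_x * s_y),
    }


# Hypothetical data, for illustration only.
summary = describe([2.0, 4.0, 4.0, 5.0, 7.0], [1.0, 3.0, 2.0, 5.0, 6.0])
```

The two variance forms differ only by floating-point rounding; the shortcut form is the one that is convenient on a calculator.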

Addition rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Conditional probability: $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$

Complement rules: $P(A^C) = P(A') = 1 - P(A)$    $P(A^C|B) = P(A'|B) = 1 - P(A|B)$
Multiplication rule: $P(A \text{ and } B) = P(A|B)P(B) = P(B|A)P(A)$
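A quick way to see the probability rules working together is exact arithmetic with fractions. The probabilities below are hypothetical, chosen only for illustration; in this example A and B happen to be independent, since P(A|B) = P(A).

```python
from fractions import Fraction

# Hypothetical probabilities for two events A and B.
p_a = Fraction(1, 2)
p_b = Fraction(1, 3)
p_a_and_b = Fraction(1, 6)

p_a_or_b = p_a + p_b - p_a_and_b        # addition rule: 2/3
p_a_given_b = p_a_and_b / p_b           # conditional probability: 1/2
p_not_a_given_b = 1 - p_a_given_b       # complement rule, conditional on B: 1/2
p_b_given_a = p_a_and_b / p_a           # 1/3

# Multiplication rule: both factorizations recover P(A and B) exactly.
assert p_a_given_b * p_b == p_b_given_a * p_a == p_a_and_b
```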

Expected value: $E[X] = \mu = \sum_{\text{all } x} x\,p(x)$    Variance: $V[X] = E[(X - \mu)^2] = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 p(x)$

Covariance: $COV[X,Y] = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY} = \sum_{\text{all } x} \sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,p(x,y)$

Laws of expected value:
$E[c] = c$
$E[X + c] = E[X] + c$
$E[cX] = cE[X]$
$E[a + bX + cY] = a + bE[X] + cE[Y]$

Laws of variance:
$V[c] = 0$
$V[X + c] = V[X]$
$V[cX] = c^2 V[X]$
$V[a + bX + cY] = b^2 V[X] + c^2 V[Y] + 2bc \cdot COV[X,Y]$
$V[a + bX + cY] = b^2 V[X] + c^2 V[Y] + 2bc \cdot SD(X) \cdot SD(Y) \cdot \rho$, where $\rho = CORRELATION[X,Y]$

Laws of covariance:
$COV[X, c] = 0$
$COV[a + bX, c + dY] = bd \cdot COV[X,Y]$
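The laws of expected value, variance, and covariance can be verified by brute force on a small joint distribution. The joint probabilities p(x, y) and the constants a, b, c below are hypothetical; the point of the sketch is that computing V[a + bX + cY] directly matches both versions of the variance law.

```python
import math

# Hypothetical joint pmf p(x, y) for two discrete random variables.
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}


def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)                             # 0.7
mu_y = expect(lambda x, y: y)                             # 0.5
var_x = expect(lambda x, y: (x - mu_x) ** 2)              # 0.21
var_y = expect(lambda x, y: (y - mu_y) ** 2)              # 0.25
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))     # 0.05
rho = cov_xy / math.sqrt(var_x * var_y)

a, b, c = 5.0, 2.0, -3.0
w_mean = expect(lambda x, y: a + b * x + c * y)
# Variance of W = a + bX + cY computed directly from its definition...
w_var_direct = expect(lambda x, y: (a + b * x + c * y - w_mean) ** 2)
# ...and from the two forms of the variance law.
w_var_cov = b**2 * var_x + c**2 * var_y + 2 * b * c * cov_xy
w_var_rho = b**2 * var_x + c**2 * var_y + 2 * b * c * math.sqrt(var_x) * math.sqrt(var_y) * rho
```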
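Combinatorial formula: $C_x^n = \frac{n!}{x!(n-x)!}$    Binomial probability: $p(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$ for $x = 0, 1, 2, \dots, n$

If $X$ is Binomial ($X \sim B(n, p)$) then $E[X] = np$ and $V[X] = np(1 - p)$

If $X$ is Uniform ($X \sim U[a, b]$) then $f(x) = \frac{1}{b - a}$ and $E[X] = \frac{a + b}{2}$ and $V[X] = \frac{(b - a)^2}{12}$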
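The Binomial and Uniform results can be sketched the same way: summing over the Binomial pmf recovers np and np(1 − p), and the Uniform moments follow from the closed forms. The parameters n = 10, p = 0.3 and the interval [0, 12] are hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3   # hypothetical parameters
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(pmf))                # should equal np = 3.0
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))   # should equal np(1 - p) = 2.1


def uniform_moments(a, b):
    """Density height, mean, and variance of U[a, b]."""
    return 1 / (b - a), (a + b) / 2, (b - a) ** 2 / 12
```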

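Sampling distribution of $\bar{X}$:
$\mu_{\bar X} = E[\bar X] = \mu$
$\sigma^2_{\bar X} = V[\bar X] = \frac{\sigma^2}{n}$
$\sigma_{\bar X} = SD[\bar X] = \frac{\sigma}{\sqrt{n}}$

Sampling distribution of $\hat{P}$:
$\mu_{\hat P} = E[\hat P] = p$
$\sigma^2_{\hat P} = V[\hat P] = \frac{p(1-p)}{n}$
$\sigma_{\hat P} = SD[\hat P] = \sqrt{\frac{p(1-p)}{n}}$

Sampling distribution of $(\hat{P}_2 - \hat{P}_1)$:
$\mu_{\hat P_2 - \hat P_1} = E[\hat P_2 - \hat P_1] = p_2 - p_1$
$\sigma^2_{\hat P_2 - \hat P_1} = V[\hat P_2 - \hat P_1] = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$
$\sigma_{\hat P_2 - \hat P_1} = SD[\hat P_2 - \hat P_1] = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

Sampling distribution of $(\bar{X}_1 - \bar{X}_2)$, independent samples:
$\mu_{\bar X_1 - \bar X_2} = E[\bar X_1 - \bar X_2] = \mu_1 - \mu_2$
$\sigma^2_{\bar X_1 - \bar X_2} = V[\bar X_1 - \bar X_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
$\sigma_{\bar X_1 - \bar X_2} = SD[\bar X_1 - \bar X_2] = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Sampling distribution of $\bar{X}_d$, paired ($d = X_1 - X_2$):
$\mu_{\bar X_d} = E[\bar X_d] = \mu_1 - \mu_2$
$\sigma^2_{\bar X_d} = V[\bar X_d] = \frac{\sigma_d^2}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\,COV[X_1, X_2]}{n}$
$\sigma_{\bar X_d} = SD[\bar X_d] = \frac{\sigma_d}{\sqrt{n}} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2 - 2\,COV[X_1, X_2]}{n}}$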
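The sampling distribution of the sample mean can be illustrated by simulation: draw many samples, compute each mean, and compare the spread of those means with σ/√n. This sketch assumes a Normal population with hypothetical μ = 50, σ = 10, and n = 25, and fixes the random seed so the run is reproducible; the simulated values only approximate the theory.

```python
import random
import statistics

random.seed(220)                 # fixed seed so the run is reproducible
mu, sigma, n = 50.0, 10.0, 25    # hypothetical population and sample size
reps = 20000                     # number of simulated samples

# Each entry is the mean of one simulated sample of size n.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

sim_mean = statistics.fmean(means)    # should be close to mu
sim_sd = statistics.stdev(means)      # should be close to sigma / sqrt(n)
theory_sd = sigma / n**0.5            # 10 / 5 = 2.0
```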


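Inference about a population proportion:

$z$ test statistic: $z = \frac{\hat{P} - p_0}{\sqrt{p_0(1 - p_0)/n}}$    CI estimator: $\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$

Inference about comparing two population proportions:

$z$ test statistic under Null hypothesis of no difference: $z = \frac{\hat{P}_2 - \hat{P}_1}{\sqrt{\bar{P}(1 - \bar{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$    Pooled proportion: $\bar{P} = \frac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1 + n_2}$

CI estimator: $(\hat{P}_2 - \hat{P}_1) \pm z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$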
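The proportion formulas translate directly into small functions. This is a sketch, not a library API: the function names are hypothetical, z_{α/2} = 1.96 (95%) is assumed as the default, and, as on the aid sheet, the two-sample z statistic uses the pooled proportion (valid under the null of no difference) while the CI uses the unpooled standard error.

```python
import math

Z_95 = 1.96   # z_{alpha/2} for a 95% interval, from the Normal table


def one_prop_z(p_hat, p0, n):
    """z statistic for H0: p = p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def one_prop_ci(p_hat, n, z=Z_95):
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def two_prop_z(x1, n1, x2, n2):
    """z for H0: p2 - p1 = 0, using the pooled proportion; x1, x2 are success counts."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    return (p2 - p1) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))


def two_prop_ci(x1, n1, x2, n2, z=Z_95):
    """CI for p2 - p1, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    half = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p2 - p1) - half, (p2 - p1) + half
```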

Inference about the population mean:

$t$ test statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$    CI estimator: $\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$    Degrees of freedom: $\nu = n - 1$
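For the population mean, the only input the Python standard library cannot supply is the t critical value, so this sketch takes t_{α/2} as an argument read from the Student t table. The data and the hypothesized mean μ_0 = 3 are hypothetical.

```python
import math


def one_sample_t(xs, mu0, t_crit):
    """t statistic and CI for a population mean.

    t_crit is t_{alpha/2} with nu = n - 1, read from the Student t table."""
    n = len(xs)
    x_bar = sum(xs) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    se = s / math.sqrt(n)
    return (x_bar - mu0) / se, (x_bar - t_crit * se, x_bar + t_crit * se)


# Hypothetical data; t_{0.025} with nu = 4 is 2.776.
t_stat, ci = one_sample_t([2.0, 4.0, 4.0, 5.0, 7.0], mu0=3.0, t_crit=2.776)
```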

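Inference about comparing two population means, independent samples, unequal variances:

$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$    CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Degrees of freedom: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$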

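Inference about comparing two population means, independent samples, assuming equal variances:

$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$    CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$    Degrees of freedom: $\nu = n_1 + n_2 - 2$

Pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$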
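The two least intuitive pieces of the independent-samples formulas are the Welch degrees of freedom and the pooled variance. A minimal sketch of both, with hypothetical function names and inputs; note that the Welch ν never exceeds n_1 + n_2 − 2, the equal-variances value.

```python
def welch_df(s1, n1, s2, n2):
    """Welch (unequal variances) degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def pooled_variance(s1, n1, s2, n2):
    """Pooled variance s_p^2 for the equal-variances procedure (nu = n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Hypothetical summary statistics: s1 = 2, n1 = 4 and s2 = 4, n2 = 6.
nu_welch = welch_df(2.0, 4, 4.0, 6)
s2_pooled = pooled_variance(2.0, 4, 4.0, 6)
```

With equal standard deviations and equal sample sizes, the Welch ν equals n_1 + n_2 − 2 exactly; it falls below that value as the two groups become less alike.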

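Inference about comparing two population means, paired data ($n$ is the number of pairs and $d = X_1 - X_2$):

$t$ test statistic: $t = \frac{\bar{X}_d - (\mu_1 - \mu_2)_0}{s_d/\sqrt{n}}$    CI estimator: $\bar{X}_d \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$    Degrees of freedom: $\nu = n - 1$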
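The paired procedure reduces to a one-sample t on the differences d = X_1 − X_2. A sketch with hypothetical before/after values:

```python
import math


def paired_t(x1, x2, mu_d0=0.0):
    """Paired t statistic on d = X1 - X2, and its degrees of freedom (n_pairs - 1)."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    return (d_bar - mu_d0) / (s_d / math.sqrt(n)), n - 1


# Hypothetical paired measurements (e.g. the same five units measured twice).
t_stat, df = paired_t([10, 12, 9, 14, 11], [8, 11, 9, 11, 10])
```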

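SIMPLE REGRESSION:

Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$    OLS line: $\hat{y}_i = b_0 + b_1 x_i$    $b_1 = \frac{s_{xy}}{s_x^2} = r\frac{s_y}{s_x}$    $b_0 = \bar{Y} - b_1\bar{X}$

Coefficient of determination: $R^2 = (r)^2$    Residuals: $e_i = y_i - \hat{y}_i$

Standard deviation of residuals: $s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}$    Standard error of slope: $s.e.(b_1) = s_{b_1} = \frac{s_e}{\sqrt{(n - 1)s_x^2}}$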
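The simple-regression formulas can be computed from scratch. This sketch (hypothetical data, standard library only) returns b_0, b_1, R², s_e, s.e.(b_1), and the t statistic for the null β_1 = 0; it relies on the identity (n − 1)s_x² = Σ(x_i − X̄)².

```python
import math


def simple_ols(xs, ys):
    """OLS fit of y on x with the aid-sheet summary quantities."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)       # equals (n - 1) s_x^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in resid)
    sst = sum((y - y_bar) ** 2 for y in ys)
    s_e = math.sqrt(sse / (n - 2))
    se_b1 = s_e / math.sqrt(s_xx)
    return {"b0": b0, "b1": b1, "r2": 1 - sse / sst, "s_e": s_e,
            "se_b1": se_b1, "t_b1": b1 / se_b1}   # t_b1 tests H0: beta_1 = 0


fit = simple_ols([2.0, 4.0, 4.0, 5.0, 7.0], [1.0, 3.0, 2.0, 5.0, 6.0])
```

Because this is a simple regression, `fit["r2"]` equals the square of the sample correlation r.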

Inference about the population slope:

$t$ test statistic: $t = \frac{b_1 - \beta_{1,0}}{s.e.(b_1)}$    CI estimator: $b_1 \pm t_{\alpha/2}\, s.e.(b_1)$    Degrees of freedom: $\nu = n - 2$

Standard error of slope: $s.e.(b_1) = s_{b_1} = \frac{s_e}{\sqrt{(n - 1)s_x^2}}$

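Prediction interval for $y$ at given value of $x$ ($x_g$):

$\hat{y} \pm t_{\alpha/2}\, s_e\sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n - 1)s_x^2}}$
or $\hat{y} \pm t_{\alpha/2}\sqrt{[s.e.(b_1)]^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n} + s_e^2}$

Degrees of freedom: $\nu = n - 2$

Confidence interval for predicted mean at given value of $x$ ($x_g$):

$\hat{y} \pm t_{\alpha/2}\, s_e\sqrt{\frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n - 1)s_x^2}}$
or $\hat{y} \pm t_{\alpha/2}\sqrt{[s.e.(b_1)]^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n}}$    Degrees of freedom: $\nu = n - 2$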
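The two interval formulas differ only by the extra 1 under the square root, which is why a prediction interval for an individual y is always wider than the confidence interval for the predicted mean at the same x_g. A sketch with hypothetical data, taking t_{α/2} with ν = n − 2 as an input from the t table:

```python
import math


def intervals_at(xs, ys, x_g, t_crit):
    """Prediction interval for y and CI for mean y at x = x_g (nu = n - 2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)      # equals (n - 1) s_x^2
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_g
    core = 1 / n + (x_g - x_bar) ** 2 / s_xx      # shared term under both square roots
    pi_half = t_crit * s_e * math.sqrt(1 + core)
    ci_half = t_crit * s_e * math.sqrt(core)
    return y_hat, (y_hat - pi_half, y_hat + pi_half), (y_hat - ci_half, y_hat + ci_half)


# Hypothetical data; t_{0.025} with nu = 3 is 3.182.
y_hat, pred_int, mean_ci = intervals_at(
    [2.0, 4.0, 4.0, 5.0, 7.0], [1.0, 3.0, 2.0, 5.0, 6.0], x_g=4.4, t_crit=3.182)
```

Evaluating at x_g = X̄ makes the second term in `core` vanish, so both intervals are narrowest there.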

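SIMPLE & MULTIPLE REGRESSION:

Model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$

$SST = \sum_{i=1}^{n} (y_i - \bar{Y})^2 = SSR + SSE$    $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{Y})^2$    $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

$s_e^2 = MSE = \frac{SSE}{n - k - 1}$    $s_e = \sqrt{MSE} = \text{Root MSE}$    $MSR = \frac{SSR}{k}$

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$    $Adj.\ R^2 = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)} = R^2 - \frac{k}{n - k - 1}(1 - R^2)$

Residuals: $e_i = y_i - \hat{y}_i$    Standard deviation of residuals: $s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - k - 1}} = \sqrt{\frac{SSE}{n - k - 1}}$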

Inference about the overall statistical significance of the regression model:

$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{(SST - SSE)/k}{SSE/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$

Numerator degrees of freedom: $\nu_1 = k$    Denominator degrees of freedom: $\nu_2 = n - k - 1$

Inference about the population slope for explanatory variable $j$:

$t$ test statistic: $t = \frac{b_j - \beta_{j,0}}{s_{b_j}}$    CI estimator: $b_j \pm t_{\alpha/2}\, s_{b_j}$    Degrees of freedom: $\nu = n - k - 1$

Standard error of slope: $s.e.(b_j) = s_{b_j}$ (for multiple regression, must be obtained from technology)

Supplement for Question (1): The 2018 World Happiness Report by John F. Helliwell, Richard Layard, and Jeffrey D. Sachs
(http://worldhappiness.report/) documents and analyzes individual well-being across countries and over time.

Excerpt (p. 104): [We measure happiness, also called “life evaluations,” with] the question “Please imagine a
ladder with steps numbered from zero at the bottom to ten at the top. Suppose we say that the top of the
ladder represents the best possible life for you, and the bottom of the ladder represents the worst possible life
for you. On which step of the ladder would you say you personally feel you stand at this time, assuming that
the higher the step the better you feel about your life, and the lower the step the worse you feel about it?
Which step comes closest to the way you feel?”

Surveys ask this Cantril ladder question to a fresh random sample of people in each year and each country. Below are
pieces of Figure 2.2 and an excerpt that explains it. (The full figure is three pages long and shows 156 countries.)

Excerpt (p. 21): Figure 2.2 shows the average ladder score (answer to the Cantril ladder question, on a scale of
0 to 10) for each country, averaged over the years 2015-2017. The total sample sizes are reported in the
statistical appendix, and are reflected in Figure 2.2 by the horizontal lines showing the 95% confidence regions.

Details for Canada and Japan: For 2015-2017 combined, the sample size for Canada (ranked 7) is 2,027 and the values of
the 95% CI shown in the figure are: [7.2363, 7.4207]. The sample size for Japan (ranked 54) is 3,008 and the values of the
95% CI shown in the figure are: [5.8333, 5.9967].
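One way to connect these details to Question (1)(a) is to back out each country's mean and standard error from its CI and then combine them. This sketch assumes the figure's 95% CIs are symmetric, of the form mean ± 1.96 × SE (with n in the thousands, t_{α/2} is essentially 1.96), and that the two countries' samples are independent; it illustrates the arithmetic and is not an official answer key.

```python
import math

Z = 1.96  # z_{0.025}; with n in the thousands, t_{0.025} is essentially the same


def mean_and_se(lo, hi, z=Z):
    """Point estimate and standard error implied by a symmetric CI (mean +/- z*SE)."""
    return (lo + hi) / 2, (hi - lo) / (2 * z)


canada_mean, canada_se = mean_and_se(7.2363, 7.4207)   # n = 2,027
japan_mean, japan_se = mean_and_se(5.8333, 5.9967)     # n = 3,008

diff = canada_mean - japan_mean
se_diff = math.sqrt(canada_se**2 + japan_se**2)        # independent samples
ci_diff = (diff - Z * se_diff, diff + Z * se_diff)
print(f"difference = {diff:.4f}, 95% CI = [{ci_diff[0]:.4f}, {ci_diff[1]:.4f}]")
```

Under these assumptions the difference is 1.4135 and the interval is roughly [1.29, 1.54]. An equivalent route recovers each sample standard deviation as SE × √n and plugs it into the unequal-variances CI formula on the aid sheet.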

Below is an excerpt of Table A7 on p. 109.

Notes: Includes Venezuela, Brazil, Mexico, Costa Rica, Argentina, Bolivia, Chile, Colombia, Ecuador, El Salvador, Guatemala,
Honduras, Nicaragua, Panama, Paraguay, Peru, and Uruguay and excludes the foreign-born in each country of interview.

Supplement for Question (2): Recall Zheng and Kahn (2017) “A New Era of Pollution Progress in Urban China?” PM10 measures air pollution in micrograms per cubic meter of air (μg/m³). The regression below uses 10 years of data, 2003 through 2012, for the city of Tianjin. The variable trend is 1 for the year 2003, 2 for 2004, 3 for 2005, …, and 10 for 2012. The variable trend_sq is trend squared. One value below is in boldface for easy reference.

      Source |       SS           df       MS      Number of obs   =        10
-------------+----------------------------------   F(2, 7)         =     12.29
       Model |  1153.57945         2  576.789726   Prob > F        =    0.0051
    Residual |  328.492494         7  46.9274991   R-squared       =    0.7784
-------------+----------------------------------   Adj R-squared   =    0.7150
       Total |  1482.07195         9  164.674661   Root MSE        =    6.8504

------------------------------------------------------------------------------
        pm10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trend |  -14.73599   3.364972    -4.38   0.003    -22.69289   -6.779099
    trend_sq |      1.117   .2981239     3.75   0.007      .412049    1.821951
       _cons |   143.0097    8.05707    17.75   0.000     123.9577    162.0616
------------------------------------------------------------------------------
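Several numbers in the Tianjin output can be reproduced from its ANOVA table with the multiple-regression formulas on the aid sheet (k = 2 explanatory variables, n = 10). The sketch also evaluates the slope of the fitted quadratic, b_1 + 2b_2 × trend, which shows that the coefficient on trend alone equals the fitted slope only where trend = 0.

```python
import math

# ANOVA values copied from the Tianjin output.
ssr, sse, sst = 1153.57945, 328.492494, 1482.07195
n, k = 10, 2

msr = ssr / k
mse = sse / (n - k - 1)
r2 = ssr / sst                                        # about 0.7784
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))    # about 0.7150
f_stat = msr / mse                                    # about 12.29, with (2, 7) df
root_mse = math.sqrt(mse)                             # about 6.8504

# Fitted quadratic trend: pm10_hat = b0 + b1*trend + b2*trend^2.
b0, b1, b2 = 143.0097, -14.73599, 1.117


def fitted_slope(trend):
    """Derivative of the fitted curve with respect to trend: b1 + 2*b2*trend."""
    return b1 + 2 * b2 * trend


turning_point = -b1 / (2 * b2)   # trend value where the fitted slope is zero (about 6.6)
```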

Supplement for Question (3): Recall Levinson (2016) “How Much Energy Do Building Energy Codes Save? Evidence from
California Houses” (https://www.aeaweb.org/articles?id=10.1257/aer.20150102). Below are two multiple regressions from
Excel with parts in boldface for easy reference. In both Regression #1 and #2, the y variable is ln_elec_mmbtu, which is
the natural log of annual household electricity use in MMBTUs. yr_2009 is a survey year dummy (=1 if 2009 RASS data,
=0 if 2003 RASS data). A set of dummy variables record when the house was constructed, with before 1940 serving as
the reference (omitted) category. (For example, constr_40_49 =1 if constructed from 1940-1949, =0 otherwise.)

Regression #1: (Dependent variable is the natural logarithm of annual household electricity use in MMBTUs)
Regression Statistics
R Squared 0.086038686
Observations 14045

ANOVA
df SS MS F Significance F
Regression 12 316.1656134 26.34713445 110.0789512 4.9946E-263
Residual 14032 3358.525736 0.239347615
Total 14044 3674.69135

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 2.80091723 0.014998686 186.7441713 0 2.77151781 2.830316651
yr_2009 0.099643802 0.008411839 11.84566212 3.22698E-32 0.083155478 0.116132125
constr_40_49 0.026738028 0.021701186 1.232099856 0.217932431 -0.015799184 0.06927524
constr_50_59 0.110574793 0.01757221 6.292594734 3.21478E-10 0.076130924 0.145018662
constr_60_69 0.244376666 0.017853879 13.68759493 2.25322E-42 0.209380687 0.279372644
constr_70_74 0.276537056 0.020694214 13.36301306 1.75802E-40 0.235973642 0.317100469
constr_75_77 0.346758564 0.023614619 14.68406364 1.8657E-48 0.300470769 0.393046359
constr_78_82 0.366688048 0.021044291 17.42458582 2.73986E-67 0.325438439 0.407937658
constr_83_92 0.391032426 0.017953747 21.77998972 1.8423E-103 0.355840693 0.426224159
constr_93_97 0.378969248 0.022834935 16.59602942 2.85445E-61 0.334209738 0.423728758
constr_98_00 0.380513957 0.024034536 15.83196574 5.71125E-56 0.333403067 0.427624846
constr_01_04 0.394942885 0.025326069 15.59432225 2.27406E-54 0.345300419 0.44458535
constr_05_08 0.377592192 0.032209664 11.72294738 1.36488E-31 0.314456965 0.440727418

Regression #2 also includes the following variables: cool_deg_days (cooling degree days in 100s for that year and house
location), ln_sq_feet (natural log of house size in 1,000s of square feet), ln_num_res (natural log of the number of
residents living at this address), and central_ac (=1 if house has central air conditioning, =0 otherwise).

Regression #2: (Dependent variable is the natural logarithm of annual household electricity use in MMBTUs)
Regression Statistics
R Squared 0.34926507
Observations 14045

ANOVA
df SS MS F Significance F
Regression 16 1283.441333 80.21508331 470.5727886 0
Residual 14028 2391.250017 0.170462647
Total 14044 3674.69135

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 2.299491392 0.014919424 154.1273549 0 2.270247335 2.32873545
cool_deg_days 0.023994715 0.001136583 21.11128134 2.03848E-97 0.021766861 0.026222568
ln_sq_feet 0.43683163 0.009618505 45.41575184 0 0.41797808 0.45568518
ln_num_res 0.256371747 0.006796996 37.71838984 1.2487E-296 0.24304873 0.269694764
central_ac 0.211006685 0.008275383 25.4981183 3.1576E-140 0.194785834 0.227227537
yr_2009 0.051487662 0.007339257 7.015378481 2.39835E-12 0.037101743 0.065873582
constr_40_49 0.029368085 0.018332682 1.601952457 0.109188639 -0.006566412 0.065302581
constr_50_59 0.037917498 0.014877628 2.548625141 0.010825406 0.008755366 0.06707963
constr_60_69 0.078540997 0.015253059 5.149196341 2.65146E-07 0.04864297 0.108439023
constr_70_74 0.077211207 0.017697734 4.362773726 1.29341E-05 0.042521293 0.11190112
constr_75_77 0.093859498 0.020283551 4.627370234 3.73654E-06 0.054101039 0.133617957
constr_78_82 0.088566174 0.018226966 4.859073978 1.19211E-06 0.052838896 0.124293453
constr_83_92 0.059345294 0.015944254 3.722048946 0.000198385 0.028092434 0.090598154
constr_93_97 0.000500876 0.020048108 0.02498371 0.980068313 -0.038796084 0.039797836
constr_98_00 -0.030807821 0.021152757 -1.456444687 0.145292111 -0.072270041 0.010654399
constr_01_04 -0.052840257 0.022329139 -2.366426071 0.017974268 -0.096608342 -0.009072172
constr_05_08 -0.131700471 0.028135997 -4.680853193 2.88357E-06 -0.18685077 -0.076550172
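Because the dependent variable is a natural log, a dummy coefficient b translates into an exact percentage difference of 100 × (e^b − 1) relative to the omitted category (houses built before 1940), holding the other regressors fixed; 100 × b is the usual approximation. This sketch applies that conversion to the constr_01_04 estimates and 95% CI bounds copied from the two regressions; the dictionary layout is just one convenient choice.

```python
import math

# constr_01_04: (coefficient, lower 95%, upper 95%) copied from the two Excel outputs.
estimates = {
    "Regression #1": (0.394942885, 0.345300419, 0.44458535),
    "Regression #2": (-0.052840257, -0.096608342, -0.009072172),
}


def pct_effect(b):
    """Exact % difference in y implied by a dummy coefficient b when y is in natural logs."""
    return 100 * (math.exp(b) - 1)


effects = {name: pct_effect(b) for name, (b, _, _) in estimates.items()}
effect_cis = {name: (pct_effect(lo), pct_effect(hi)) for name, (_, lo, hi) in estimates.items()}

for name in estimates:
    lo, hi = effect_cis[name]
    print(f"{name}: {effects[name]:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")
```

Because e^b is monotonic, converting the CI endpoints gives a valid interval for the percentage difference, though it is no longer symmetric around the point estimate.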

Supplement for Question (4): Consider the following correlation matrix for observational data with 100 observations
and four variables.

correlate y x1 x2 x3;

(obs=100)
             |        y       x1       x2       x3
-------------+------------------------------------
           y |   1.0000
          x1 |  -0.2892   1.0000
          x2 |   0.1891  -0.0182   1.0000
          x3 |   0.1423  -0.0384   0.3216   1.0000
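The significance of a sample correlation can be tested with t = r√(n − 2)/√(1 − r²) and ν = n − 2; this is the same t as for the slope in a simple regression of y on that x. The sketch below computes it for the (y, x1) and (y, x2) entries. Because the Python standard library has no Student t CDF, it approximates the two-sided P-value by integrating the t density numerically with Simpson's rule, which is an assumption of this sketch rather than a method from the Supplement.

```python
import math


def corr_t(r, n):
    """t statistic for H0: rho = 0; degrees of freedom nu = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_two_sided_p(t, df, steps=2000):
    """Approximate two-sided P-value for Student t via Simpson's rule on the density."""
    # Log of the t-density normalizing constant (lgamma avoids overflow for large df).
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                   # P(0 < T < |t|)
    return 1 - 2 * area


n = 100
results = {}
for pair, r in {"y,x1": -0.2892, "y,x2": 0.1891}.items():
    t = corr_t(r, n)
    results[pair] = (t, t_two_sided_p(t, n - 2))
```

Comparing each P-value with 0.10, 0.05, 0.01, and 0.001 gives the strongest common significance level that applies; the same comparison can be made with the t statistic against critical values from the Student t table at ν = 98.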

Supplement for Question (5): In the article “Parents’ Beliefs About Their Children’s Academic Ability: Implications for
Educational Investments,” (https://www.aeaweb.org/articles?id=10.1257/aer.20171172) Dizon-Ross (2019) shows how
providing parents with clear, understandable information about their child’s academic progress can help parents make
more informed decisions. She does a field experiment with 5,268 children from 39 randomly selected primary schools in
two districts (Machinga and Balaka) in Malawi. Here is a quick summary.

• The researchers have the scores (out of 100 points) for every student in the subjects of math, English, and Chichewa. The overall score, which measures academic performance, is the average of these three scores.
• Schools are required to send school report cards home to parents. However, these are often lost (students do not deliver them to their parents) or are unclear to uneducated parents.
• In the baseline survey, researchers asked parents about their beliefs regarding their children’s academic performance.
• The 5,268 children are randomly divided into a treatment group (𝑛 = 2,614) and a control group (𝑛 = 2,654).
o For the treatment group, parents received an additional, specially designed report card on the child’s performance, which the researchers carefully designed to be clear to all parents, including uneducated ones. Further, a trained person walked parents through each number.
o For the control group, parents had only the usual information (i.e. school report cards that they may never have received or may not be able to understand).
• In the endline survey, researchers again asked parents about their beliefs regarding their child’s academic performance, this time by asking them to imagine a new test taken that day. These endline beliefs are also measured out of 100 points.

Next are Stata summaries of key variables, separately for the control group and the treatment group.

Control group:
                        Overall Score
-------------------------------------------------------------
      Percentiles      Smallest
 1%            9              0
 5%           18              2
10%           24              2       Obs                 2,654
25%           35              2       Sum of Wgt.         2,654

50%           47                      Mean             47.13075
                        Largest       Std. Dev.        17.44873
75%           59             98
90%           70             98       Variance         304.4582
95%           76             99       Skewness         .0712081
99%           88            100       Kurtosis         2.701315

Treatment group:
                        Overall Score
-------------------------------------------------------------
      Percentiles      Smallest
 1%           10              0
 5%           18              0
10%           24              0       Obs                 2,614
25%           34              0       Sum of Wgt.         2,614

50%           46                      Mean             46.35731
                        Largest       Std. Dev.        17.53062
75%           58             95
90%           70             96       Variance         307.3227
95%           76             97       Skewness         .1070127
99%           88            100       Kurtosis         2.651868
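A sketch of the unequal-variances comparison of mean overall scores, using only the summary statistics in the two tables above. The hypotheses assumed here are H_0: μ_T − μ_C = 0 versus H_1: μ_T − μ_C ≠ 0. With ν in the thousands, the Student t is essentially standard Normal, so the P-value uses `statistics.NormalDist` as an approximation.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the two Stata tables above.
mean_c, sd_c, n_c = 47.13075, 17.44873, 2654   # control
mean_t, sd_t, n_t = 46.35731, 17.53062, 2614   # treatment

v_c, v_t = sd_c**2 / n_c, sd_t**2 / n_t
se = math.sqrt(v_c + v_t)
t_stat = (mean_t - mean_c) / se                # H0: mu_T - mu_C = 0
df = (v_c + v_t) ** 2 / (v_c**2 / (n_c - 1) + v_t**2 / (n_t - 1))   # Welch df

# Normal approximation to the two-sided P-value (t with df in the thousands).
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
```

Random assignment is what makes a comparison of baseline performance across the two groups informative: with a well-executed randomization, any difference in mean overall score should be small and due to chance.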


Control group:
                Endline Beliefs (out of 100 points)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           20              0
 5%           35              1
10%           40              2       Obs                 2,642
25%           50              4       Sum of Wgt.         2,642

50%           65                      Mean             63.56283
                        Largest       Std. Dev.        17.65563
75%           75            100
90%           85            100       Variance         311.7213
95%           90            100       Skewness          1.28091
99%           95            350       Kurtosis          28.7242

Note: There is an outlier in the posted replication data: the largest value listed above, 350, which is outside the possible range of 0 to 100.

Treatment group:
                Endline Beliefs (out of 100 points)
-------------------------------------------------------------
      Percentiles      Smallest
 1%           15              0
 5%           25              0
10%           30              0       Obs                 2,602
25%           45              1       Sum of Wgt.         2,602

50%           55                      Mean             56.14105
                        Largest       Std. Dev.        18.44784
75%           70            100
90%           80            100       Variance         340.3227
95%           85            100       Skewness        -.1299682
99%           95            100       Kurtosis         2.635569


Next, consider the excerpt below.

Excerpt (p. 21): I now examine whether information changes beliefs by looking at the impact of information on
mean beliefs measured at endline. Recall that, unlike beliefs measured at baseline, the beliefs question asked
at endline was not asking about last-term test scores; instead, it asked how well parents thought their child
would do on a hypothetical test taken that same day. The prediction is thus that providing information should
decrease the gap between parents’ endline beliefs and their child’s last-term scores. Information cuts the gap
nearly in half.

Table 1 below comes from Tables 2 and C.7 in the original paper as well as direct analysis of the replication data.

Table 1: Information Treatment Effects

Dependent variable: Endline Beliefs (out of 100 points)
                                   (1)         (2)         (3)
Explanatory variables:
Treat × Overall Score            0.405       0.408       0.408
                               (0.024)     (0.023)     (0.023)
Overall Score                    0.305       0.303       0.303
                               (0.017)     (0.017)     (0.017)
Treat                          -25.998     -26.023     -26.023
                               (1.208)     (1.167)     (1.167)
Dummy (HH=968, Ref=2)                -           -     284.147
                                                      (14.812)
Constant                        49.188      49.213      49.213
                               (0.859)     (0.830)     (0.830)
Observations                     5,244       5,243       5,244
R-squared                       0.3095      0.3228      0.3548

Notes: Shows OLS results for three regressions. Data sources are baseline survey, baseline test score data, and
endline survey data. Each observation is a child. The dependent variable is the parent’s endline beliefs about
the child’s performance (out of 100 points) on a hypothetical test taken the same day as the endline survey.
Overall score is the child’s actual average overall score (out of 100 points) on the subjects of math, English,
and Chichewa. Treat is a dummy that equals 1 for those children in the treatment group and equals 0
otherwise. Standard errors are in parentheses. Regression (1) includes an outlier: the observation for
Household id=968 and reference child=2 with endline beliefs of 350, which is outside the possible range of
values (0 to 100). Regression (2) excludes that outlier. Regression (3) keeps that outlier but includes a dummy
variable for it.
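The rows of Table 1 are consistent with a model of the form Endline Beliefs = β_0 + β_1 Overall Score + β_2 Treat + β_3 (Treat × Overall Score) + ε. Under that reading, the control line has intercept b_0 and slope b_1, while the treatment line has intercept b_0 + b_2 and slope b_1 + b_3, so the t statistic on the interaction tests whether the two slopes differ. The sketch copies the Regression (1) estimates; the function and variable names are hypothetical.

```python
# Regression (1) estimates copied from Table 1.
b0, b_score, b_treat, b_inter = 49.188, 0.305, -25.998, 0.405
se_inter = 0.024   # standard error of the Treat x Overall Score coefficient


def predicted_belief(score, treat):
    """Fitted endline belief from Regression (1); treat is 0 (control) or 1 (treatment)."""
    return b0 + b_score * score + b_treat * treat + b_inter * treat * score


control_line = (b0, b_score)                        # (intercept, slope) = (49.188, 0.305)
treatment_line = (b0 + b_treat, b_score + b_inter)  # (intercept, slope) = (23.19, 0.71)
t_slopes = b_inter / se_inter                       # H0: beta_3 = 0
crossing_score = -b_treat / b_inter                 # overall score where the two lines meet
```

Here ν = n − k − 1 = 5,244 − 3 − 1 = 5,240, so a t statistic near 16.9 lies far beyond any critical value in the Student t table. The lines cross at an overall score of about 64.2: below that score, the fitted treatment-group beliefs are lower than the control-group beliefs.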
