Exercise - 03 - Machine Learning
Submission by Thursday, 23 May 2024, 8:30 A.M. followed by a tutorial at the same time.
Attempt the tasks in groups of at most three individuals.
7. Programming task I.
The dataset Arrests (Arrests for Marijuana Possession) from the R package effects describes
police treatment of individuals arrested in Toronto for simple possession of small quantities
of marijuana. The data are part of a larger dataset featured in a series of articles in the
Toronto Star newspaper. Analyze these data using logistic regression to identify whether
released (i.e., whether the arrestee was released with a summons; a factor with levels No and
Yes) is statistically related to the following variables:
• Note: dichotomize the outcome Improved into 1 = Better (i.e., combine Some and
Marked) and 0 = None.
Hint: load the data using the R code data("Arthritis", package = "vcd").
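The dichotomization described in the note can be sketched in R as follows. This is a minimal illustration rather than a required solution: the helper name dichotomize and the column name Better are made up here, and the levels None, Some, and Marked are taken from the note. The Arthritis data are only touched if the vcd package is installed.

```r
# Hypothetical helper: 1 = Better (Some or Marked), 0 = None.
dichotomize <- function(improved) {
  as.integer(as.character(improved) != "None")
}

# Toy check on a hand-made factor with the three levels named in the note.
toy <- factor(c("None", "Some", "Marked", "None"),
              levels = c("None", "Some", "Marked"))
dichotomize(toy)

# Apply to the real data only if vcd is available (assumption: the outcome
# column is called Improved, as in the note).
if (requireNamespace("vcd", quietly = TRUE)) {
  data("Arthritis", package = "vcd")
  Arthritis$Better <- dichotomize(Arthritis$Improved)
  print(table(Arthritis$Improved, Arthritis$Better))
}
```

The cross-table at the end is a quick way to confirm that Some and Marked both map to 1 and None maps to 0.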
9. Poisson regression
(a) Suppose $Y$ takes values $0, 1, 2, \ldots$ with probability density $f(y)$ and mean $\theta$. Calculate
$E(Y \mid y > 0)$. Hint: Calculate the mean of a truncated Poisson distribution.
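An answer to (a) can be sanity-checked numerically by summing the truncated distribution directly. The sketch below assumes f is the Poisson density suggested by the hint; the function name truncated_mean, the cutoff kmax, and the value theta = 2 are illustrative choices, and the infinite sum is approximated by a finite one.

```r
# Mean of Y given Y > 0, computed from the definition:
# sum over k >= 1 of k * f(k), divided by P(Y > 0) = 1 - f(0).
truncated_mean <- function(theta, kmax = 200) {
  k <- 1:kmax
  sum(k * dpois(k, theta)) / (1 - dpois(0, theta))
}

truncated_mean(2)

# Independent Monte Carlo check: simulate, drop the zeros, average.
set.seed(1)
y <- rpois(1e5, lambda = 2)
mean(y[y > 0])
```

The two printed numbers should agree closely, and a closed-form answer derived by hand should match both.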
(b) Assume now that $y$ takes the values $0, 1, 2, \ldots$ with hurdle density
$$P[y = 0] = f_1(0)$$
and
$$P[y = k] = \frac{1 - f_1(0)}{1 - f_2(0)}\, f_2(k), \quad k = 1, 2, \ldots,$$
where the density $f_2(y)$ has untruncated mean $\theta_2$, that is, $\sum_{k=0}^{\infty} k f_2(k) = \theta_2$. Find $E(Y)$.
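The hurdle density in (b) can be explored numerically before deriving E(Y) by hand. The sketch below assumes, purely for illustration, that f2 is a Poisson density with mean theta2 and that f1(0) is a fixed number p0; the function names, the cutoff kmax, and the parameter values are made up here. It checks that the probabilities sum to one and approximates E(Y) by a finite sum.

```r
# Hurdle pmf: P[y = 0] = p0, and for k >= 1 the Poisson probabilities
# f2(k) are rescaled by (1 - p0) / (1 - f2(0)).
hurdle_pmf <- function(k, p0, theta2) {
  pos <- (1 - p0) / (1 - dpois(0, theta2)) * dpois(k, theta2)
  ifelse(k == 0, p0, pos)
}

# E(Y) approximated by a finite sum over 0..kmax.
hurdle_mean <- function(p0, theta2, kmax = 200) {
  k <- 0:kmax
  sum(k * hurdle_pmf(k, p0, theta2))
}

sum(hurdle_pmf(0:200, p0 = 0.3, theta2 = 1.5))  # should be very close to 1
hurdle_mean(p0 = 0.3, theta2 = 1.5)
```

Comparing hurdle_mean against a hand-derived expression for several values of p0 and theta2 is a cheap way to catch algebra slips.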
(c) Introducing regressors, assume the zeros are given by a logit model and the positives
by a Poisson model, that is,
$$f_1(0) = \frac{1}{1 + \exp(X'\beta_1)}, \qquad f_2(k) = \frac{\exp[-\exp(X'\beta_2)]\,[\exp(X'\beta_2)]^{k}}{k!}, \quad k = 1, 2, \ldots;$$
give an expression for $E[y \mid X]$.
(d) Hence obtain an expression for $dE[y \mid X]/dX$ for the hurdle model.