Qualification Exam
1 Statistical Models and Methods
(a) Write down the Bayes classifier f : X → Y (the classifier that minimizes the expected
loss E[L(Y, f(X))]) for binary classification with Y ∈ {−1, +1} under a non-0/1 loss, where
a is the loss for falsely predicting negative and b is the loss for falsely predicting positive.
Simplify the classification rule as much as you can.
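For orientation, a hedged sketch of where (a) should land (the posterior shorthand η(x) = P(Y = +1 | X = x) is ours, not the exam's): predicting +1 at x incurs conditional expected loss b(1 − η(x)) while predicting −1 incurs a·η(x), so minimizing pointwise gives, in LaTeX,

    f(x) = \begin{cases} +1 & \text{if } \eta(x) \ge \frac{b}{a+b}, \\ -1 & \text{otherwise;} \end{cases}

the familiar 1/2 threshold of the 0/1 loss shifts to b/(a + b).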
(b) If P(X | Y = y) is a multivariate Gaussian and assuming the 0/1 loss, write the Bayes
classifier as f(X) = sign(h(X)) and simplify h as much as possible. What is the geometric
shape of the decision boundary?
(c) Repeat (b) when the two Gaussians have identical covariance matrices. What is the
geometric shape of the decision boundary?
(d) Repeat (b) when the two Gaussians both have covariance matrix equal to the identity
matrix. Describe the geometric shape of the decision boundary as much as possible.
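For (b)–(d), a minimal numerical sketch of the discriminant (assuming scipy is available; all parameter values below are made-up illustrations, not from the exam). With unequal covariances h is quadratic in x, so the boundary {x : h(x) = 0} is a quadric; with a shared covariance the quadratic terms cancel and the boundary is a hyperplane, which for identity covariances and equal priors is the perpendicular bisector of the segment between the two means.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative (assumed) class-conditional parameters and priors.
    mu_p, mu_m = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
    cov_p = np.array([[2.0, 0.3], [0.3, 1.0]])  # class +1 covariance
    cov_m = np.eye(2)                           # class -1 covariance
    pi_p, pi_m = 0.5, 0.5

    def h(x):
        """Log posterior odds; the Bayes rule under 0/1 loss is sign(h(x))."""
        return (multivariate_normal.logpdf(x, mu_p, cov_p) + np.log(pi_p)
                - multivariate_normal.logpdf(x, mu_m, cov_m) - np.log(pi_m))

    print(np.sign(h(np.array([0.2, -0.5]))))  # Bayes prediction at one point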
3. Multiclass classification
Multiclass classification tries to assign one of several class labels (rather than binary labels) to
an object. Can you give two ways to use binary classifiers to solve a multiclass classification
problem? What are the pros and cons of these methods (e.g., in terms of computational
complexity or the applicability of the method)? Besides using binary classifiers, do you have
any other ideas for how to build a multiclass classifier?
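The two standard reductions are one-vs-rest (K binary problems, each over all n examples) and one-vs-one (K(K − 1)/2 binary problems, each over only two classes' examples, combined by voting). A minimal one-vs-rest sketch, with scikit-learn's LogisticRegression standing in for an arbitrary binary classifier (an assumption, not part of the question):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    class OneVsRest:
        """Train one binary classifier per class (class k vs. the rest) and
        predict the class whose classifier is most confident."""
        def __init__(self, make_binary=lambda: LogisticRegression()):
            self.make_binary = make_binary

        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.models_ = [self.make_binary().fit(X, (y == k).astype(int))
                            for k in self.classes_]
            return self

        def predict(self, X):
            # Signed confidence of each "k vs. rest" classifier, one column per class.
            scores = np.column_stack([m.decision_function(X) for m in self.models_])
            return self.classes_[np.argmax(scores, axis=1)]

For the last part of the question, natively multiclass methods such as decision trees, nearest neighbors, or multinomial (softmax) logistic regression are natural answers.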
Consider the following hierarchical model:
• Random variable X has a Gaussian distribution N(0, σ^2);
• Given random variable X = x, random variable Y has a Gaussian distribution N(x, σ^2);
• Given random variable Y = y, random variable Z is a mixture of two Gaussians with
density
Given n i.i.d. samples z_1, . . . , z_n from the mixture density (1), answer the following questions.
Suppose we observe
X_i = θ + e_i, i = 1, . . . , n,
and assume
e_i = α e_{i−1} + β e_{i−2} + ε_i, i = 1, . . . , n, e_0 = 0, e_{−1} = 0,
with the ε_i's i.i.d. standard normal and α and β known constants. What is the maximum
likelihood estimate of θ? Carefully justify each step of your derivation/calculation.
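A hedged sketch of the computation (the whitening argument is standard; the function name and code are ours). Unrolling the recursion gives ε_i = (X_i − θ) − α(X_{i−1} − θ) − β(X_{i−2} − θ), with the lagged terms dropped for i = 1, 2 because e_0 = e_{−1} = 0. Each ε_i is therefore linear in θ, the log-likelihood is −(1/2) Σ ε_i^2 up to a constant, and the MLE is an ordinary least-squares coefficient:

    import numpy as np

    def mle_theta(x, alpha, beta):
        """MLE of theta for X_i = theta + e_i with AR(2) errors
        e_i = alpha*e_{i-1} + beta*e_{i-2} + eps_i, e_0 = e_{-1} = 0,
        eps_i i.i.d. N(0, 1), alpha and beta known.  Whitening makes each
        eps_i = y_i - c_i*theta, so theta_hat solves a least-squares problem."""
        n = len(x)
        y = np.empty(n)  # part of eps_i not involving theta
        c = np.empty(n)  # coefficient of theta in eps_i = y_i - c_i*theta
        for i in range(n):  # i = 0 here corresponds to index 1 in the problem
            x1 = x[i - 1] if i >= 1 else 0.0  # lagged terms vanish at the start
            x2 = x[i - 2] if i >= 2 else 0.0
            y[i] = x[i] - alpha * x1 - beta * x2
            c[i] = 1.0 - alpha * (i >= 1) - beta * (i >= 2)
        return float(np.dot(c, y) / np.dot(c, c))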
2 Learning Theory
1. VC-dimension
(a) What is the VC-dimension of axis-parallel rectangles in R^3? Specifically, a legal target
function is specified by three intervals [x_min, x_max], [y_min, y_max], and [z_min, z_max], and classifies
an example (x, y, z) as positive iff x ∈ [x_min, x_max], y ∈ [y_min, y_max], and z ∈ [z_min, z_max].
(A brute-force shattering check is sketched after this problem.)
(b) Describe the importance of VC-dimension for Machine Learning.
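While reasoning about (a), small candidate sets can be tested mechanically, a sketch assuming numpy is available (the function name is ours): a labeling of a finite point set is realizable by an axis-parallel box iff the minimal bounding box of the positive points contains no negative point, so shattering reduces to checking all 2^m labelings.

    import itertools
    import numpy as np

    def shattered_by_boxes(points):
        """True iff axis-parallel boxes shatter the given points (rows)."""
        pts = np.asarray(points, dtype=float)
        for labels in itertools.product([False, True], repeat=len(pts)):
            mask = np.array(labels)
            pos, neg = pts[mask], pts[~mask]
            if len(pos) == 0:
                continue  # an empty box realizes the all-negative labeling
            lo, hi = pos.min(axis=0), pos.max(axis=0)
            if np.all((neg >= lo) & (neg <= hi), axis=1).any():
                return False  # some negative point lies in every valid box
        return True

    # The six points +/- e_j on the coordinate axes pass this test,
    # which is the lower-bound half of the argument:
    print(shattered_by_boxes(np.vstack([np.eye(3), -np.eye(3)])))  # True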
2. Mistake-bound model.
(a) k-CNF is the class of Conjunctive Normal Form formulas in which each clause has size
at most k; e.g., x_4 ∧ (x_1 ∨ x_2) ∧ (x_2 ∨ x̄_3 ∨ x_5) is a 3-CNF. Give an algorithm to learn
5-CNF formulas over n boolean features in the mistake-bound model. Your algorithm should
run in polynomial time per example (so the "halving algorithm" is not allowed). How many
mistakes does it make at most? (One standard approach is sketched after this problem.)
(b) What is the relationship between the mistake-bound model and the PAC learning model?
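For (a), a sketch of the standard elimination algorithm (class and function names are ours; written for generic k, with k = 5 the case asked about). The hypothesis starts as the conjunction of all clauses with at most k literals. It never predicts positive wrongly, because its clause set always contains the target's clauses; each mistake on a positive example deletes at least one clause, so the mistake bound is the initial clause count, O((2n)^k) = O(n^5) here, with polynomial time per example for fixed k.

    import itertools

    def all_clauses(n, k):
        """All disjunctions of at most k literals over variables 0..n-1.
        A literal is (index, sign); a clause is a frozenset of literals."""
        lits = [(i, s) for i in range(n) for s in (True, False)]
        return {frozenset(combo)
                for size in range(1, k + 1)
                for combo in itertools.combinations(lits, size)}

    def satisfies(clause, x):
        return any(x[i] == s for i, s in clause)

    class KCNFLearner:
        def __init__(self, n, k=5):
            self.clauses = all_clauses(n, k)  # O((2n)^k) clauses initially

        def predict(self, x):
            return all(satisfies(c, x) for c in self.clauses)

        def update(self, x, label):
            # Only a mistake on a positive example changes the hypothesis:
            # every clause falsified by that example is removed.
            if label and not self.predict(x):
                self.clauses = {c for c in self.clauses if satisfies(c, x)}

    # Toy run against a 2-CNF target, x_0 AND (x_1 OR NOT x_2):
    learner = KCNFLearner(n=3, k=2)
    target = lambda x: x[0] and (x[1] or not x[2])
    for x in itertools.product([False, True], repeat=3):
        learner.update(x, target(x))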
3. Consistency problem for 2-term DNF formulas
(a) Prove that the consistency problem for 2-term DNF formulas is NP-hard.
(b) Is the class of 2-term DNF formulas PAC-learnable? Explain why or why not.
3 Decision Processes
The theme is scalability, and you aren’t getting out of it.
identifiably different properties? If you believe that different options do not serve different purposes, argue for that position as well.