Qualification Exam Question: 1 Statistical Models and Methods

This document contains a qualification exam question covering three topics: [1] Statistical Models and Methods, [2] Learning Theory, and [3] Decision Processes. For each topic, it lists several core concepts and problems, asking the test taker to define terms, derive solutions, compare approaches, and discuss tradeoffs. The exam questions gauge understanding of fundamental machine learning techniques like cross-validation, Bayes classifiers, kernel methods, VC dimension, reinforcement learning approaches, and more.


Qualification Exam Question

1 Statistical Models and Methods


1.1 Core
1. Cross-validation We would like to perform k-fold cross-validation. What should k be?
Discuss the pros and cons of large or small values of k.
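As a concrete reference point for this question, a minimal sketch of k-fold cross-validation (the `fit`/`error` interface and the toy data are illustrative placeholders, not part of the exam):

```python
import random

def k_fold_cv(data, k, fit, error):
    """Estimate generalization error by k-fold cross-validation.

    data: list of (x, y) examples; fit: trains a model on a list of
    examples; error: evaluates a model on held-out examples.
    """
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k roughly equal folds
    errors = []
    for i in range(k):
        held_out = folds[i]
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        model = fit(train)
        errors.append(error(model, held_out))
    # Large k (extreme: leave-one-out) means larger training sets (low bias)
    # but k model fits and higher-variance estimates; small k is cheaper but
    # pessimistically biased because each training set is smaller.
    return sum(errors) / k

# Toy usage: the "model" is just the mean label, error is squared deviation.
data = [(x, 2.0 * x) for x in range(20)]
fit = lambda train: sum(y for _, y in train) / len(train)
error = lambda m, held: sum((y - m) ** 2 for _, y in held) / len(held)
cv_err = k_fold_cv(data, k=5, fit=fit, error=error)
```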
2. Bayes classifier

(a) Write down the Bayes classifier f : X → Y (the classifier that minimizes the expected loss E[L(Y, f(X))]) for binary classification Y ∈ {−1, +1} with a non-0-1 loss (a is the loss for falsely predicting negative and b is the loss for falsely predicting positive). Simplify the classification rule as much as you can.
(b) If P(X|Y = y) is a multivariate Gaussian and assuming the 0/1 loss, write the Bayes classifier as f(X) = sign(h(X)) and simplify h as much as possible. What is the geometric shape of the decision boundary?
(c) Repeat (b) when the two Gaussians have identical covariance matrices. What is the geometric shape of the decision boundary?
(d) Repeat (b) when the two Gaussians have covariance matrices equal to the identity matrix. Describe the geometric shape of the decision boundary as much as possible.
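For part (a), a hedged sketch of where the derivation leads, writing η(x) = P(Y = +1 | X = x) and comparing the conditional risks of the two possible predictions:

```latex
% Risk of predicting -1 is a * P(Y=+1 | x); risk of predicting +1 is b * P(Y=-1 | x).
f(x) = \begin{cases}
  +1 & \text{if } a\,\eta(x) \ge b\,\bigl(1 - \eta(x)\bigr)
       \iff \eta(x) \ge \dfrac{b}{a+b},\\
  -1 & \text{otherwise,}
\end{cases}
```

which recovers the familiar 1/2 threshold when a = b (the 0-1 loss).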

3. Multiclass classification
Multiclass classification tries to assign one of several class labels (rather than binary labels) to an object. Can you give two ways to use binary classifiers to solve the multiclass classification problem? What are the pros and cons of these different methods (e.g., in terms of computational complexity or the applicability of the method)? Besides using binary classifiers, do you have any other ideas on how to build a multiclass classifier?
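Two standard reductions the question is pointing at are one-vs-rest and one-vs-one; a minimal sketch with stub classifiers (the `train_binary`/`train_pair` interfaces and the toy 1-d "classifiers" are illustrative assumptions):

```python
from collections import Counter
from itertools import combinations

def one_vs_rest(classes, train_binary, x):
    """Train one binary classifier per class (class vs. everything else)
    and predict the class with the highest score: |classes| classifiers.
    train_binary(c) returns a scoring function for 'is x in class c'."""
    scores = {c: train_binary(c)(x) for c in classes}
    return max(scores, key=scores.get)

def one_vs_one(classes, train_pair, x):
    """Train one binary classifier per pair of classes and predict by
    majority vote: |classes| choose 2 classifiers, each trained on less data.
    train_pair(a, b) returns a function voting a or b."""
    votes = Counter(train_pair(a, b)(x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy usage with stub classifiers that key off distance on a 1-d feature.
classes = [0, 1, 2]
train_binary = lambda c: (lambda x: -abs(x - c))   # closer to c => higher score
train_pair = lambda a, b: (lambda x: a if abs(x - a) < abs(x - b) else b)
pred_ovr = one_vs_rest(classes, train_binary, 1.2)
pred_ovo = one_vs_one(classes, train_pair, 1.2)
```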

1.2 Methods and Models


1. Kernel methods
Consider two machine learning models for 2-class classification. The first is a support vector
machine with Gaussian kernel. The second is kernel discriminant analysis (a Bayes classifier
with a kernel density estimator for each class), where the bandwidth may vary for each
dimension, and possibly also for each data point. Which is the more expressive, or powerful
model? Compare and discuss the pros and cons of each.
2. Bayes Rule
Let φ(y; µ, σ²) = (2πσ²)^{−1/2} exp(−(y − µ)²/(2σ²)) denote the density of a random variable y with a Gaussian distribution N(µ, σ²). Suppose that we have three related random variables, X, Y and Z,

• Random variable X has a Gaussian distribution N (0, σ 2 );
• Given random variable X = x, random variable Y has a Gaussian distribution N (x, σ 2 );
• Given random variable Y = y, random variable Z is a mixture of two Gaussians with
density

p(z|Y = y) = (1 − α)φ(z; 0, σ 2 ) + αφ(z; y, σ 2 ). (1)

• Conditioned on random variable Y , random variables X and Z are independent.

Given n i.i.d. samples z1, . . . , zn from the mixture density (1), answer the following questions.

(a) If n = 2, derive the posterior distribution of X conditioned on (z1, . . . , zn) exactly, up to a scalar multiple.
(b) If n = 10 (or in general when n is large), what is the computational problem associated with computing the posterior distribution of X?
(c) Propose approximation algorithms to deal with the computational problem when n is large.
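A small numerical illustration of the issue behind part (b): conditioned on Y = y, the joint likelihood of (z1, . . . , zn) expands into 2^n Gaussian terms, one per assignment of each zi to the background or the y-centered component, so the exact posterior over X is a mixture whose size grows exponentially in n (the sample values and α, σ below are placeholders):

```python
import math
from itertools import product

def phi(y, mu, var):
    """Gaussian density N(mu, var) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def likelihood_terms(zs, y, alpha, var):
    """Expand prod_i [(1-alpha) phi(z_i; 0, var) + alpha phi(z_i; y, var)]
    into its 2^n summands, one per component assignment of each z_i."""
    terms = []
    for assign in product([0, 1], repeat=len(zs)):  # 0 = background, 1 = y
        t = 1.0
        for z, a in zip(zs, assign):
            t *= alpha * phi(z, y, var) if a else (1 - alpha) * phi(z, 0, var)
        terms.append(t)
    return terms

zs = [0.5, -1.0, 2.0]                     # toy sample, n = 3
terms = likelihood_terms(zs, y=1.0, alpha=0.3, var=1.0)
n_terms = len(terms)                      # already 2**3 = 8 summands
total = sum(terms)                        # equals the compact product form:
direct = 1.0
for z in zs:
    direct *= 0.7 * phi(z, 0, 1.0) + 0.3 * phi(z, 1.0, 1.0)
```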

3. Dependent noise model


Let X1 , . . . , Xn be n determinations of a physical constant θ. Consider the model,

Xi = θ + ei , i = 1, . . . , n

and assume
ei = αei−1 + βei−2 + εi , i = 1, . . . , n, e0 = 0, e−1 = 0
with the εi 's i.i.d. standard normal, and α and β known constants. What is the maximum likelihood estimate of θ? Carefully justify each step of your derivation/calculation.
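A hedged sketch of the structure one should reach: since α and β are known, stacking X = (X1, . . . , Xn)ᵀ gives X ∼ N(θ1, Σ) with Σ computable from the AR(2) recursion, so maximizing the Gaussian likelihood reduces to generalized least squares:

```latex
\hat{\theta}_{\mathrm{MLE}}
  = \arg\min_{\theta}\; (X - \theta\mathbf{1})^{\top} \Sigma^{-1} (X - \theta\mathbf{1})
  = \frac{\mathbf{1}^{\top} \Sigma^{-1} X}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}.
```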

2 Learning Theory
1. VC dimension
(a) What is the VC-dimension of axis-parallel rectangles in R^3? Specifically, a legal target function is specified by three intervals [xmin, xmax], [ymin, ymax], and [zmin, zmax], and classifies an example (x, y, z) as positive iff x ∈ [xmin, xmax], y ∈ [ymin, ymax], and z ∈ [zmin, zmax].
(b) Describe the importance of VC-dimension for Machine Learning.

2. Mistake-bound model.
(a) k-CNF is the class of Conjunctive Normal Form formulas in which each clause has size at most k. E.g., x4 ∧ (x1 ∨ x2) ∧ (x2 ∨ x̄3 ∨ x5) is a 3-CNF. Give an algorithm to learn 5-CNF formulas over n Boolean features in the mistake-bound model. Your algorithm should run in polynomial time per example (so the “halving algorithm” is not allowed). How many mistakes does it make at most?
(b) What is the relationship between the mistake-bound model and the PAC learning model?
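A hedged sketch of the standard elimination learner this question is after, shown with k = 2 over three variables to keep the run small (the same scheme applies to 5-CNF; clause and example representations here are illustrative choices):

```python
from itertools import combinations, product

def all_clauses(n, k):
    """All clauses (disjunctions) of at most k literals over x1..xn.
    A literal is (index, polarity); a clause is a frozenset of literals."""
    lits = [(i, b) for i in range(n) for b in (True, False)]
    cls = set()
    for size in range(1, k + 1):
        for combo in combinations(lits, size):
            if len({i for i, _ in combo}) == size:  # no repeated variable
                cls.add(frozenset(combo))
    return cls

def satisfies(clause, x):
    return any(x[i] == b for i, b in clause)

def learn_kcnf(n, k, examples):
    """Mistake-bound learner: start with every clause of size <= k and
    delete clauses falsified by positive examples. The hypothesis is always
    at least as strict as the target, so mistakes occur only on positives
    and each removes a clause: at most |all_clauses(n, k)| = O((2n)^k)."""
    h = all_clauses(n, k)
    mistakes = 0
    for x, label in examples:
        pred = all(satisfies(c, x) for c in h)
        if pred != label:
            mistakes += 1  # can only be predicting negative on a positive
        if label:  # keep h consistent with every positive example
            h = {c for c in h if satisfies(c, x)}
    return h, mistakes

# Toy run: target 2-CNF (x1 or x2) and (not x3) over n = 3 variables.
target = lambda x: (x[0] or x[1]) and (not x[2])
examples = [(x, target(x)) for x in product([False, True], repeat=3)]
h, mistakes = learn_kcnf(3, 2, examples)
consistent = all(all(satisfies(c, x) for c in h) == y for x, y in examples)
```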

3. Consistency Problem for 2-term DNF formulas
(a) Prove that the consistency problem for 2-term DNF formulas is NP-hard.
(b) Is the class of 2-term DNF formulas PAC-learnable? Explain why or why not.

3 Decision Processes
The theme is scalability, and you aren’t getting out of it.

1. Scaling up reinforcement learning


Machine learning algorithms have traditionally had difficulty scaling to large problems. In
classification and traditional supervised learning this problem arises with data that exist in
very high dimensional spaces or when there are many data points for computing, for example,
estimates of conditional densities. In reinforcement learning this is also the case, arising when,
for example, there are many, many states or when actions are at a very low level of abstraction.

• Typical approaches to addressing such problems in RL include function approximation and problem decomposition. Compare and contrast these two approaches. What problems of scale do these approaches address? What are their strengths and weaknesses? Are they orthogonal approaches? Can they work well together?
• What are the differences between hierarchical and modular reinforcement learning? Explain both the theoretical and practical limits of these approaches.

2. Learning from demonstrations


As in Question 1, machine learning algorithms have traditionally had difficulty scaling to large problems; in reinforcement learning this arises when there are very many states or when actions are at a very low level of abstraction.
Imagine that we want to leverage domain knowledge from humans in order to attack this problem of scalability. One mechanism we might use is Learning from Demonstration, where humans
demonstrate correct behavior; however, complex tasks can require more examples of complete
behavior than is practical to obtain. Given that you will only be able to extract so much time
from your human teachers, what are at least two ways you might still take advantage of their
ability to give demonstrations, even for complex tasks? For each proposed method, describe
strengths and possible pitfalls.

3. Learning with Options


As in the previous questions, reinforcement learning has difficulty scaling when there are very many states or when actions are at a very low level of abstraction.
One mechanism for addressing such concerns in RL is to use so-called options. Options are a
mechanism for incorporating temporally-extended actions into the RL framework.

• Formally define an option.


• What are the advantages and limits of options? Be specific.
• Describe at least two ways one might automate the process of creating options.
• Although options are defined in a very specific way, would you argue that different options might serve different purposes? If so, do these different kinds of options have identifiably different properties? If you believe that different options do not serve different purposes, argue for that position as well.
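For orientation only, a minimal sketch of the standard option triple ⟨I, π, β⟩ from Sutton, Precup and Singh as a data structure (the class and field names are illustrative, not a standard API):

```python
from dataclasses import dataclass
from typing import Any, Callable, Set

State = Any
Action = Any

@dataclass
class Option:
    """An option <I, pi, beta> (Sutton, Precup & Singh, 1999):
    I: initiation set of states where the option may be invoked,
    pi: intra-option policy mapping states to actions,
    beta: per-state termination probability in [0, 1]."""
    initiation_set: Set[State]
    policy: Callable[[State], Action]
    termination: Callable[[State], float]

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

# Toy usage: a "go right until the wall" option on a 1-d corridor 0..4.
go_right = Option(
    initiation_set={0, 1, 2, 3},
    policy=lambda s: "right",
    termination=lambda s: 1.0 if s == 4 else 0.0,
)
ok = go_right.can_start(2)
```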
