Midterm (1)
9) When the trained system matches the training set perfectly, overfitting may
occur. (T)
10) Algorithms for supervised learning are not directly applicable to unsupervised
learning. (T)
Question 2
[Figure involving X1 not reproduced]
1. Predicted label for k = 1:
(a) positive (b) negative
2. Predicted label for k = 3:
(a) positive (b) negative
3. Predicted label for k = 5:
(a) positive (b) negative
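Since the figure is not reproduced, the exam's actual predictions cannot be recovered here, but the mechanics of k-NN are simple: find the k training points nearest the query and take a majority vote of their labels. A minimal sketch, assuming hypothetical 2D training points and Euclidean distance:

# k-NN majority-vote sketch; the training points below are hypothetical
# stand-ins for the exam's missing figure.
from collections import Counter
import math

train = [((1.0, 1.0), "positive"), ((2.0, 1.5), "positive"),
         ((0.0, 0.5), "negative"), ((3.0, 3.0), "negative"),
         ((1.5, 2.0), "negative")]

def knn_predict(query, k):
    # Sort training points by distance to the query point.
    nearest = sorted(train, key=lambda p: math.dist(query, p[0]))
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3, 5):
    print(k, knn_predict((1.2, 1.2), k))

Note how the prediction can flip as k grows: with k = 1 only the single closest point matters, while larger k smooths the vote over more neighbors.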
Question 3
Assume the following data [training table not reproduced].
Construct a parametric classifier using Naïve Bayes to predict whether a new
instance
X = (Give Birth = "yes", Can Fly = "no", Live in Water = "yes", Have Legs = "no")
is a mammal or a non-mammal.
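A minimal sketch of the Naïve Bayes decision rule for this instance. The priors and class-conditional probabilities below are hypothetical placeholders; on the exam they would be estimated as relative frequencies from the missing training table.

# Naive Bayes: pick the class maximizing P(class) * prod P(x_i | class).
# All probability values here are made-up placeholders.
priors = {"mammal": 0.35, "non-mammal": 0.65}

# P(attribute value | class) for the query
# X = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no).
likelihoods = {
    "mammal":     [0.85, 0.85, 0.30, 0.70],
    "non-mammal": [0.10, 0.75, 0.60, 0.40],
}

scores = {}
for c in priors:
    score = priors[c]
    for p in likelihoods[c]:
        score *= p  # conditional-independence ("naive") assumption
    scores[c] = score

print(max(scores, key=scores.get))  # class with the larger score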
Question 4 (short answer)
11) The training error of a 1-NN classifier is 0. (true/false) Explain.
True: each training point is its own nearest neighbor (assuming no two identical
points carry different labels), so 1-NN classifies the training data perfectly.
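A quick check of this claim, using scikit-learn's k-NN implementation on a small made-up dataset:

# Hypothetical demo: 1-NN scores 100% on its own training set.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
y = np.array([0, 1, 1, 0])

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.score(X, y))  # 1.0, i.e., training error 0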
12) Consider a naive Bayes classifier with 3 boolean input variables X1, X2, X3 and
one boolean output Y. How many parameters must be estimated to train such a
naive Bayes classifier? (List them.)
Solution:
For a naive Bayes classifier, we need to estimate P(Y = 1), P(X1 = 1 | Y = 0), P(X2 = 1 | Y = 0),
P(X3 = 1 | Y = 0), P(X1 = 1 | Y = 1), P(X2 = 1 | Y = 1), and P(X3 = 1 | Y = 1). The remaining
probabilities follow from the constraint that probabilities sum to 1.
So 7 independent parameters must be estimated (8 if P(Y = 0) is counted separately,
although it is determined by P(Y = 1)).
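In general, for n boolean inputs and a boolean output, the count is 2n + 1 independent parameters: 1 for P(Y = 1), plus n for P(Xi = 1 | Y = 0), plus n for P(Xi = 1 | Y = 1). For n = 3 this gives 2(3) + 1 = 7.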
13) The depth of a learned decision tree can be larger than the number of training
examples used to create the tree. (true/false) Explain.
False: each split of the tree must correspond to at least one training example;
therefore, if there are n training examples, a path in the tree can have length at most n.
14) We consider the following models of logistic regression for binary classification
with the sigmoid function

g(z) = 1 / (1 + e^(-z))

Model 1: P(y = 1 | x) = g(w1 x1 + w2 x2)
Model 2: P(y = 1 | x) = g(w0 + w1 x1 + w2 x2)
[Training examples not reproduced; the third example is x(3) = (0, 0).]
Does it matter how the third example is labeled in Model 1? That is, would the learned
value of w = (w1, w2) be different if we changed the label of the third example to -1?
Does it matter in Model 2? Briefly explain your answer. (Hint: think of the decision
boundary in the 2D plane.)
It does not matter in Model 1 because x(3) = (0, 0) makes w1 x1 + w2 x2 zero for any w,
so the third example's contribution to the likelihood does not depend on w. But it does
matter in Model 2: the intercept w0 lets the decision boundary move off the origin, so
the label at (0, 0) influences the learned parameters.
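A quick numerical illustration of the Model 1 argument, with made-up weight values:

# At x = (0, 0), the no-intercept model outputs g(0) = 0.5 for any weights,
# so that example's likelihood is constant in w. Weights are hypothetical.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x3 = (0.0, 0.0)
for w1, w2 in [(0.5, -1.0), (3.0, 2.0), (-7.0, 4.0)]:
    z = w1 * x3[0] + w2 * x3[1]   # always 0 at the origin
    print((w1, w2), sigmoid(z))   # always 0.5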
15) Briefly describe the difference between a maximum likelihood hypothesis and a
maximum a posteriori hypothesis.
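The maximum likelihood (ML) hypothesis maximizes only how probable the observed
data are under the hypothesis, h_ML = argmax_h P(D | h), while the maximum a
posteriori (MAP) hypothesis also weights each hypothesis by its prior probability,
h_MAP = argmax_h P(D | h) P(h). The two coincide when the prior P(h) is uniform
over the hypothesis space.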