
Midterm 2018/2019

Question 1: Mark each statement with T or F on the right side: [5 marks]

1) In supervised learning, the learning algorithm detects similarity between
different training data inputs. ( F )
2) We can get multiple local optimum solutions if we solve a linear regression
problem by minimizing the sum of squared errors using gradient descent. ( F )
(illustrated by the sketch after this question)
3) When a decision tree is grown to full depth, it is more likely to fit the noise in
the data. ( T )

4) When the feature space is larger, overfitting is more likely. ( T )

5) Since classification is a special case of regression, logistic regression is a
special case of linear regression. ( F )

6) Gradient descent will always find the global optimum. ( F )

7) Overfitting indicates limited generalization. ( T )

8) In Support Vector Machines (SVM), inputs are mapped to a lower-dimensional
space where the data becomes likely to be linearly separable. ( F )

9) When the trained system matches the training set perfectly, overfitting may
occur. ( T )

10) Algorithms for supervised learning are not directly applicable to unsupervised
learning. ( T )
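
Below is a minimal sketch (Python, synthetic data) illustrating statement 2: the
sum-of-squared-errors objective of linear regression is convex, so gradient descent
reaches the same single global optimum from any starting point. The data, learning
rate, and step count are arbitrary choices made only for the illustration.

import numpy as np

# Synthetic linear-regression problem (made up for the illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                   # 50 examples, 2 features
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=50)     # noisy targets

def gradient_descent(w0, lr=0.05, steps=500):
    w = w0.astype(float)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

w_closed_form = np.linalg.lstsq(X, y, rcond=None)[0]
for start in (np.zeros(2), np.array([10.0, 10.0]), np.array([-5.0, 3.0])):
    print(start, "->", gradient_descent(start), "closed form:", w_closed_form)
# Every starting point converges to (approximately) the same weights: one global optimum.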

Question 2

In the figure we depict training data and a single test point for the task of
classification given two continuous attributes X1 and X2. For each value of
k, circle the label predicted by the k-nearest neighbor classifier for the
depicted test point.

[Figure: training points and the test point plotted over axes X1 and X2]

1. Predicted label for k = 1:
(a) positive (b) negative
2. Predicted label for k = 3:
(a) positive (b) negative
3. Predicted label for k = 5:
(a) positive (b) negative
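
The figure's points are not reproduced above, so the coordinates and labels below are
hypothetical; this is only a sketch of the distance-then-majority-vote rule that the
question asks you to apply for k = 1, 3, 5.

import numpy as np

# Hypothetical training data (NOT the points from the exam figure).
train_X = np.array([[1.0, 1.0], [2.0, 1.5], [3.0, 3.0], [5.0, 4.0], [6.0, 5.0]])
train_y = np.array(["positive", "positive", "negative", "negative", "negative"])
test_x = np.array([2.5, 2.0])

def knn_predict(k):
    dists = np.linalg.norm(train_X - test_x, axis=1)   # Euclidean distances to the test point
    nearest_labels = train_y[np.argsort(dists)[:k]]    # labels of the k nearest points
    labels, counts = np.unique(nearest_labels, return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote

for k in (1, 3, 5):
    print(f"k = {k}: predicted label = {knn_predict(k)}")
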
Question 3
Assume the following data

Construct a parametric classifier using Naïve Bayes to predict whether a new
instance
X = (Give Birth = "yes", Can Fly = "no", Live in Water = "yes", Have Legs = "no")
will be a mammal or a non-mammal.
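
The data table referred to above is not reproduced here, so the probabilities in the
sketch below are placeholders; with the real table they would be estimated as relative
frequencies. The sketch only shows the Naïve Bayes decision rule: compare
P(class) multiplied by the product of P(attribute value | class) for the two classes.

# Hypothetical estimates (NOT taken from the exam's table).
p_mammal, p_non_mammal = 0.35, 0.65
cond_mammal = {"give_birth=yes": 0.85, "can_fly=no": 0.95,
               "live_in_water=yes": 0.20, "have_legs=no": 0.15}
cond_non_mammal = {"give_birth=yes": 0.10, "can_fly=no": 0.75,
                   "live_in_water=yes": 0.45, "have_legs=no": 0.30}

def score(prior, conds):
    s = prior
    for p in conds.values():   # naive (conditional independence) assumption
        s *= p
    return s

s_m = score(p_mammal, cond_mammal)
s_n = score(p_non_mammal, cond_non_mammal)
print("score(mammal) =", s_m, " score(non-mammal) =", s_n)
print("prediction:", "mammal" if s_m > s_n else "non-mammal")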

Question 4: Short answers
11) The training error of a 1-NN classifier is 0. (true/false) Explain.
True: each training point is its own nearest neighbor (at distance zero), so the 1-NN
classifier reproduces every training label and achieves perfect classification on the
training data.
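
A quick numerical check of this claim on a tiny, made-up training set (assuming no
duplicate points carrying conflicting labels):

import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

errors = 0
for i, x in enumerate(X):
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argmin(dists)            # the point itself, at distance 0
    errors += int(y[nearest] != y[i])
print("1-NN training errors:", errors)    # prints 0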

12) Consider a naive Bayes classifier with 3 boolean input variables, X1, X2 and X3, and
one boolean output, Y. How many parameters must be estimated to train such a
naive Bayes classifier? (List them.)

Solution:
For a naive Bayes classifier, we need to estimate P(Y = 1), P(X1 = 1 | Y = 0), P(X2 = 1 | Y = 0),
P(X3 = 1 | Y = 0), P(X1 = 1 | Y = 1), P(X2 = 1 | Y = 1), and P(X3 = 1 | Y = 1). The remaining
probabilities follow from the constraint that complementary probabilities sum to 1, so 7
parameters suffice (8 if P(Y = 0) is counted as a separate parameter, even though it is
determined by P(Y = 1)).

13) The depth of a learned decision tree can be larger than the number of training
examples used to create the tree. (true/false) Explain.

False: each split of the tree must correspond to at least one training example; therefore,
if there are n training examples, a path in the tree can have length at most n.

14) We consider the following models of logistic regression for binary classification,
with the sigmoid function

g(z) = 1 / (1 + e^{-z})

We have three training examples:

Does it matter how the third example is labeled in Model 1? That is, would the learned
value of w = (w1, w2) be different if we changed the label of the third example to -1?
Does it matter in Model 2? Briefly explain your answer. (Hint: think of the decision
boundary in the 2D plane.)
It does not matter in Model 1 because x(3) = (0, 0) makes w1x1 + w2x2 always zero, so the
third example's contribution to the likelihood does not depend on the value of w. But it
does matter in Model 2.
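
A small sketch of this point, assuming Model 1 has the form g(w1*x1 + w2*x2) and
Model 2 adds a bias term w0 (these forms are inferred from the hint and the solution,
since the model definitions are not reproduced above):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x3 = np.array([0.0, 0.0])
for w in (np.array([1.0, -2.0]), np.array([100.0, 50.0])):
    print("no bias term:    ", sigmoid(w @ x3))        # always 0.5, whatever w is
    print("with bias w0=1.5:", sigmoid(1.5 + w @ x3))  # depends on the learned bias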

15) Briefly describe the difference between a maximum likelihood hypothesis and a
maximum a posteriori hypothesis.
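
No solution is shown above; as a sketch of the standard distinction: the maximum
likelihood (ML) hypothesis maximizes the likelihood of the data alone,

h_ML = argmax_h P(D | h),

while the maximum a posteriori (MAP) hypothesis also weighs the prior over hypotheses,

h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h),

so the two coincide when the prior P(h) is uniform.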
