CS725 2021 Midsem
Mixed ML Bag
QUESTION=singlecorrect, MARKS=1
[True or False] The model y = w0 + w1x1 + w2x2 + (2w3 + 3w4)x3 (where w0, w1, w2, w3 and w4
are model parameters) is a linear regression model.
OPTIONS=
A. True
B. False
ANSWER=A
QUESTION=singlecorrect, MARKS=1
[True or False] There are multiple local optima one could reach by using gradient
descent to optimize the least squares objective function for linear regression.
OPTIONS=
A. True
B. False
ANSWER=B
QUESTION=singlecorrect, MARKS=1
[True or False] For a binary classification problem, the true error of a hypothesis function can be
lower than its training error on a training set.
OPTIONS=
A. True
B. False
ANSWER=A
QUESTION=singlecorrect, MARKS=1
Compared to the variance of the MLE estimate θ_MLE, one expects the variance of the MAP
estimate to be ___________. Fill in the blank with one of the options below.
OPTIONS=
A. Lower
B. Higher
C. Same
D. Can’t say
ANSWER=A
QUESTION=numeric, MARKS=1
Say you have two coins. Coin 1 will land on heads with a probability of p, while coin 2 will land
on heads with a probability of 2p. You toss both coins and record the following observations:
Coin 1: Head
Coin 1: Head
Coin 2: Tail
Coin 2: Tail
Coin 2: Head
Coin 2: Head
What is the maximum likelihood estimate for p? Write your answer as a decimal rounded to
two decimal places.
ANSWER=[0.33,0.33]
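A quick sanity check: the likelihood of the six tosses is L(p) = p^2 · (2p)^2 · (1 − 2p)^2, and setting the derivative of log L to zero gives p = 1/3. A minimal Python sketch (not part of the original paper) confirming this numerically:

import numpy as np

# Likelihood of the observations: coin 1 shows H,H and coin 2 shows T,T,H,H.
# P(H | coin 1) = p and P(H | coin 2) = 2p, so L(p) = p^2 * (2p)^2 * (1 - 2p)^2.
p = np.linspace(1e-6, 0.5 - 1e-6, 100000)  # 2p must stay within (0, 1)
likelihood = p**2 * (2 * p)**2 * (1 - 2 * p)**2
print(round(float(p[np.argmax(likelihood)]), 2))  # -> 0.33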
QUESTION=singlecorrect, MARKS=1
[True or False] A decision tree of depth four will never have a training error that is higher than
that of a decision tree of depth one (when trained on the same dataset).
OPTIONS=
A. True
B. False
ANSWER=A
QUESTION=setcorrect, MARKS=2
Consider a decision tree with n leaves, for a binary classification task (with labels “+” and “−”).
Suppose a dataset is such that, for i = 1 to n, a fraction p_i of the samples fall into the i-th leaf.
Also suppose that of the samples which fall into the i-th leaf, a fraction q_i have the positive
label. Suppose you assign labels to the leaves of the decision tree in order to minimize the
classification error on this dataset. Choose the right expression(s) for the resulting error rate
from below.
OPTIONS=
A. [expression lost in source]
B. Σ_i p_i · min(q_i, 1 − q_i)
C. [expression lost in source]
ANSWER=B
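For intuition: labeling each leaf with its majority class makes leaf i err on exactly its minority fraction, so it contributes p_i · min(q_i, 1 − q_i) to the overall error. A small Python illustration (the p_i and q_i values below are made up):

# Error rate of a tree whose leaves are labeled by majority vote:
# leaf i holds a p_i fraction of the data, of which q_i is positive,
# and the best label for it errs on min(q_i, 1 - q_i) of its samples.
p = [0.5, 0.3, 0.2]  # hypothetical leaf fractions (must sum to 1)
q = [0.9, 0.2, 0.5]  # hypothetical positive-label fractions per leaf
error = sum(pi * min(qi, 1 - qi) for pi, qi in zip(p, q))
print(round(error, 2))  # 0.5*0.1 + 0.3*0.2 + 0.2*0.5 = 0.21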
QUESTION=setcorrect, MARKS=1
Let us recall the setting we encountered in class where features are duplicated. Consider a
training instance with K binary features X_i ∈ {0,1}, i ∈ {1, …, K} and a class label Y ∈
{0,1}. Say we duplicate each feature so that each training instance has 2K binary features with
X_{K+i} = X_i. (In the case of ties, assume that the class label of 1 will be chosen.) For a perceptron
classifier, select all the true statements below that compare test accuracy using the original
feature set and the duplicate feature set.
OPTIONS=
A. Compared to the duplicate feature set, the test accuracy could be higher using the
original feature set.
B. Compared to the original feature set, the test accuracy could be higher using the
duplicate feature set.
C. Both feature sets will yield the same test accuracy.
ANSWER=C
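The reason the accuracies coincide: starting from zero weights, both copies of each feature receive identical updates, so the learned weight vector always has the form (w, w), every activation is exactly doubled, and no prediction (or update) ever changes. A minimal Python sketch of this, reusing the four training points from the next question as a toy dataset:

import numpy as np

def perceptron(X, y, epochs=10):
    # Basic perceptron: predict sign(w.x), update w += y*x on each mistake.
    # Ties (w.x == 0) are broken in favor of the +1 class, as assumed above.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w >= 0 else -1
            if pred != yi:
                w += yi * xi
    return w

X = np.array([[-2.0, 1.0], [2.0, 1.0], [4.0, -2.0], [4.0, 2.0]])
y = np.array([1, -1, -1, 1])
w1 = perceptron(X, y)                  # original features
w2 = perceptron(np.hstack([X, X]), y)  # every feature duplicated
print(w1, w2)                          # w2 is (w1, w1): doubled activations
print((np.sign(X @ w1) == np.sign(np.hstack([X, X]) @ w2)).all())  # -> True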
QUESTION=fillblanks, BLANKS=2, MARKS=1
Consider the following labeled training instances for a binary classification task:
(-2,1) 1
(2,1) -1
(4,-2) -1
(4,2) 1
Say that the perceptron classifier only has two weights w1 and w2 corresponding to X1 and X2 ,
respectively. Let w1 and w2 be initialized to w1 = 1 and w2 = -1. After processing the first
training instance with feature values (-2,1), what would be the new values of the weight vector?
Fill in the blanks with the exact values: w1 = _______ and w2 = _________.
BLANK=numeric, MARKS=0.5, ANSWER='-1'
BLANK=numeric, MARKS=0.5, ANSWER='0'
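To check: with w = (1, −1), the activation on x = (−2, 1) is 1·(−2) + (−1)·1 = −3, so the prediction is −1 while the true label is +1, and the mistake triggers the update w ← w + y·x = (1 − 2, −1 + 1) = (−1, 0). The same step in Python:

import numpy as np

w = np.array([1.0, -1.0])            # initial weights (w1, w2)
x, label = np.array([-2.0, 1.0]), 1  # first training instance and its label
pred = 1 if w @ x >= 0 else -1       # activation is -3, so predict -1
if pred != label:                    # misclassified: apply perceptron update
    w += label * x
print(w)                             # -> [-1.  0.]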
================================================================================
[Question text lost in source; each part asked for a set of features.]
Set of features:
QUESTION=paper, MARKS=1
Set of features:
QUESTION=paper, MARKS=1
Set of features:
================================================================================
[The ridge objective, given as an image in the original, is missing here; for training points
(x_i, y_i) it has the standard form J(w) = Σ_i (y_i − wx_i)² + λw².]
Here, λ ≥ 0 is the ridge regression coefficient. Each y_i is assumed to be generated such that
y_i = wx_i + ε_i, where ε_i ~ N(0,1) (i.e. ε_i is zero-mean unit-variance Gaussian noise) and w is
the true (unknown) linear relationship that we would like to estimate.
QUESTION=paper, MARKS=2
Derive a closed-form expression for the ridge estimate ŵ in terms of α, β and λ, where
[the definitions of α and β are lost in the source].
QUESTION=paper, MARKS=2
What is E[ŵ], where the expectation is taken with respect to all y_i's? Write down your result in
terms of w, α, β and λ. Hint: E[ε_i] = 0.
QUESTION=paper, MARKS=1
[Question text lost in source.]
================================================================================
QUESTION=numeric, MARKS=2
What is the information gain at the root node for the given split? [The tree figure is lost in the
source.] For all log values, use base 2. You can use the approximate value log2(3) ≈ 1.6. Write
your answer as a decimal, rounded to one decimal place (i.e. a single digit after the decimal
point).
ANSWER=[0.2,0.2]
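The split figure has not survived, but the stated answer of 0.2 is consistent with a root node holding 32 positive and 32 negative examples that splits into children (24,8) and (8,24), which also matches the node labels referenced in the next two questions. A Python check under that assumption:

from math import log2

def entropy(pos, neg):
    # Binary entropy of a node with `pos` positive and `neg` negative examples.
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

# Assumed split (figure missing): root (32,32) -> children (24,8) and (8,24).
root, left, right = (32, 32), (24, 8), (8, 24)
n = root[0] + root[1]
children = sum((a + b) / n * entropy(a, b) for a, b in (left, right))
print(round(entropy(*root) - children, 1))  # -> 0.2 (exact logs give ~0.189)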
QUESTION=fillblanks, BLANKS=2, MARKS=1
For what A = _______ and B = _________ will we get the smallest possible value of information
gain at the node marked 24,8?
BLANK=numeric, MARKS=0.5, ANSWER=’12’
BLANK=numeric, MARKS=0.5, ANSWER=’4’
QUESTION=fillblanks, BLANKS=2, MARKS=1
For what G = _______ and H = _________ will we get the largest possible value of information
gain at the node marked 8,24?
BLANK=numeric, MARKS=0.5, ANSWER='8','0'
BLANK=numeric, MARKS=0.5, ANSWER='0','24'
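These two answers instantiate general facts: information gain is zero (its minimum) when each child preserves the parent's class ratio, and maximal when the split separates the classes perfectly. A Python check, where the child (A,B) of the node marked (24,8), or (G,H) of the node marked (8,24), leaves the remaining counts to its sibling:

from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

def gain(parent, child):
    # Information gain when `parent` splits into `child` and its sibling,
    # which receives whatever counts remain.
    sib = (parent[0] - child[0], parent[1] - child[1])
    n = parent[0] + parent[1]
    avg = sum((a + b) / n * entropy(a, b) for a, b in (child, sib))
    return entropy(*parent) - avg

print(gain((24, 8), (12, 4)))  # A=12, B=4: children mirror the parent -> 0.0
print(gain((8, 24), (8, 0)))   # G=8, H=0 (or G=0, H=24): pure split, max gain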
================================================================================
QUESTION=module, COUNT=2
Naive Bayes
QUESTION=numeric, MARKS=2
Consider a binary classification problem with two binary input features X1 and X2. Suppose the
target Y ∈ {0,1} is in fact XOR(X1, X2). That is, Y = 1 iff exactly one of X1 and X2 equals one.
Compute the error probability of a Naive Bayes classifier trained on (an infinite amount of) data
drawn from a distribution such that the following hold: P(X1 = 0, X2 = 0) = ⅓, P(X1 = 0, X2 = 1) = ¼
and P(Y = 0) = ½. Express your answer as a decimal rounded to two decimal places. [This
problem might be a bit lengthy to solve.]
ANSWER=[0.17,0.17]
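A sketch of the computation behind this answer: the XOR constraint plus the given probabilities determine the full joint distribution; training Naive Bayes on infinite data amounts to reading off the exact class priors and per-feature conditionals; and the resulting classifier turns out to misclassify only the input (1,1), so the error is P(1,1) = 1/6 ≈ 0.17. A Python verification:

from fractions import Fraction as F

# The XOR target makes y a function of (x1, x2). The given constraints pin
# down the joint: P(0,0)=1/3 and P(0,1)=1/4 are stated, P(Y=0)=1/2 forces
# P(1,1) = 1/2 - 1/3 = 1/6, and normalization then forces P(1,0) = 1/4.
joint = {(0, 0): F(1, 3), (0, 1): F(1, 4), (1, 0): F(1, 4), (1, 1): F(1, 6)}

def label(x):  # y = XOR(x1, x2)
    return x[0] ^ x[1]

# Infinite-data Naive Bayes: exact class priors and per-feature conditionals.
prior = {y: sum(p for x, p in joint.items() if label(x) == y) for y in (0, 1)}
cond = {(i, v, y): sum(p for x, p in joint.items()
                       if x[i] == v and label(x) == y) / prior[y]
        for i in (0, 1) for v in (0, 1) for y in (0, 1)}

error = F(0)
for x, p in joint.items():
    score = {y: prior[y] * cond[(0, x[0], y)] * cond[(1, x[1], y)] for y in (0, 1)}
    if max(score, key=score.get) != label(x):  # only (1,1) is misclassified
        error += p
print(error, float(error))  # -> 1/6, 0.1666..., which rounds to 0.17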
QUESTION=numeric, MARKS=1
Say you have a 4-class problem where the class label Y ∈ {0,1,2,3} and each training example
X has three attributes, X1, X2, X3 ∈ {0,1,2}, each taking one of three values. For a Naive Bayes
classifier, how many parameters would you need to estimate?
ANSWER=[27,27]
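The count decomposes as 4 − 1 = 3 free parameters for the prior P(Y) plus 4 × 3 × (3 − 1) = 24 free parameters for the conditionals P(Xi | Y), giving 27 in total. In code:

# Free parameters of Naive Bayes: the class prior P(Y) contributes
# (classes - 1), and each of the `attrs` conditionals P(X_i | Y = y)
# contributes (vals - 1) free parameters per class.
classes, attrs, vals = 4, 3, 3
print((classes - 1) + classes * attrs * (vals - 1))  # -> 27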