Machine Learning Unit 4 MCQ
1) [True or False] The k-NN algorithm does
more computation at test time than at
train time.
A) TRUE
B) FALSE
Solution: A
The training phase of the algorithm
consists only of storing the feature
vectors and class labels of the training
samples.
In the testing phase, a test point is
classified by assigning the label that
is most frequent among the k training
samples nearest to that query point,
hence the higher computation.
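The split between a trivial training phase and an expensive testing phase can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name SimpleKNN, the method names, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


class SimpleKNN:
    """Toy brute-force k-NN classifier (hypothetical name, for illustration)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the feature vectors and class labels.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, query):
        # All the work happens at query time: one distance per stored sample.
        dists = [(math.dist(query, x), label) for x, label in zip(self.X, self.y)]
        dists.sort(key=lambda pair: pair[0])
        nearest_labels = [label for _, label in dists[: self.k]]
        # Majority vote among the k nearest neighbours.
        return Counter(nearest_labels).most_common(1)[0][0]


# Made-up data: three points near the origin ("a"), three near (5, 5) ("b").
model = SimpleKNN(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict_one((0.5, 0.5)))
print(model.predict_one((5.5, 5.5)))
```

Note that fit does no arithmetic at all, while predict_one touches every stored sample.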
3) Which of the following distance
metrics cannot be used in k-NN?
A) Manhattan
B) Minkowski
C) Tanimoto
D) Jaccard
E) Mahalanobis
F) All can be used
Solution: F
All of these can be used as a distance
metric for k-NN.
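Two of the less familiar metrics in the list can be sketched in a few lines. The function names and inputs below are illustrative assumptions; the sketch shows the Jaccard distance on sets and the Tanimoto distance on vectors, which coincide when the vectors are binary membership indicators.

```python
def jaccard_distance(s, t):
    """1 - |intersection| / |union|; assumes the union is non-empty."""
    s, t = set(s), set(t)
    return 1 - len(s & t) / len(s | t)


def tanimoto_distance(a, b):
    """1 - a.b / (|a|^2 + |b|^2 - a.b); assumes the vectors are not both zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    return 1 - dot / (norm_a + norm_b - dot)


# The binary vectors encode membership of items 0..3 in the two sets.
print(jaccard_distance({0, 1, 2}, {1, 2, 3}))
print(tanimoto_distance([1, 1, 1, 0], [0, 1, 1, 1]))
```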
4) Which of the following options is true
about the k-NN algorithm?
A) It can be used for classification
B) It can be used for regression
C) It can be used in both classification
and regression
Solution: C
We can also use k-NN for regression
problems. In this case the prediction can
be based on the mean or the median of
the k-most similar instances.
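A minimal sketch of k-NN regression, assuming a brute-force neighbour search and made-up data. The function name knn_regress and its parameters are hypothetical; the agg argument switches between the mean and the median mentioned above.

```python
import math
from statistics import mean, median


def knn_regress(train_X, train_y, query, k=3, agg=mean):
    # Rank training points by distance to the query, then aggregate
    # the target values of the k closest ones.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return agg([train_y[i] for i in order[:k]])


train_X = [(1,), (2,), (3,), (10,)]
train_y = [1.0, 2.0, 6.0, 50.0]

print(knn_regress(train_X, train_y, (2,), k=3, agg=mean))    # mean of 2.0, 1.0, 6.0
print(knn_regress(train_X, train_y, (2,), k=3, agg=median))  # median of the same values
```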
5) Which of the following statements
are true about the k-NN algorithm?
1. k-NN performs much better if all of
the data have the same scale
2. k-NN works well with a small number
of input variables (p), but struggles
when the number of inputs is very
large
3. k-NN makes no assumptions about
the functional form of the problem
being solved
A) 1 and 2
B) 1 and 3
C) Only 1
D) All of the above
Solution: D
All three statements describe well-known
properties of the k-NN algorithm
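Statement 1 can be illustrated with a small sketch. The data (an age column and an income column) and the helper names are assumptions invented for the example; min-max scaling is just one possible way to bring features onto the same scale.

```python
import math


def min_max_scale(points):
    # Rescale every column to [0, 1]; assumes no column is constant.
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def nearest_index(query, candidates):
    return min(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))


# (age, income); the first point is the query, the rest are candidates.
points = [(25, 50000), (60, 50100), (26, 51000), (40, 60000)]

raw_nearest = nearest_index(points[0], points[1:])
scaled = min_max_scale(points)
scaled_nearest = nearest_index(scaled[0], scaled[1:])

print(raw_nearest)     # index of the nearest candidate on raw features
print(scaled_nearest)  # index of the nearest candidate after scaling
```

On the raw features the income column dominates the distance, so the 60-year-old with a similar income is picked; after scaling, the 26-year-old becomes the nearest neighbour.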
6) Which of the following machine
learning algorithms can be used for
imputing missing values of both
categorical and continuous variables?
A) K-NN
B) Linear Regression
C) Logistic Regression
Solution: A
The k-NN algorithm can be used for imputing
missing values of both categorical and
continuous variables.
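One way k-NN imputation could work is sketched below. Everything here is an assumption for illustration: the function name knn_impute, its parameters, and the toy rows. The sketch fills a continuous gap with the neighbours' mean and a categorical gap with their most frequent value.

```python
import math
from collections import Counter
from statistics import mean


def knn_impute(rows, target, feature_cols, missing_col, k=2, categorical=False):
    """Fill target[missing_col] from the k complete rows nearest to target.

    Distances use only the numeric columns listed in feature_cols.
    """
    def dist(row):
        return math.dist(
            [row[i] for i in feature_cols], [target[i] for i in feature_cols]
        )

    neighbours = sorted(rows, key=dist)[:k]
    values = [row[missing_col] for row in neighbours]
    if categorical:
        return Counter(values).most_common(1)[0][0]  # mode of the neighbours
    return mean(values)                              # mean of the neighbours


# Columns: x, y, amount (continuous), colour (categorical).
rows = [
    (1.0, 1.0, 10.0, "red"),
    (1.2, 0.9, 12.0, "red"),
    (8.0, 9.0, 50.0, "blue"),
]
target = (1.1, 1.0, None, None)

print(knn_impute(rows, target, (0, 1), missing_col=2))
print(knn_impute(rows, target, (0, 1), missing_col=3, categorical=True))
```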
7) Which of the following is true about
Manhattan distance?
A) It can be used for continuous
variables
B) It can be used for categorical
variables
C) It can be used for categorical as well
as continuous
D) None of these
Solution: A
Manhattan Distance is designed for
calculating the distance between real
valued features.
8) Which of the following distance
measures do we use in the case of
categorical variables in k-NN?
1. Hamming Distance
2. Euclidean Distance
3. Manhattan Distance
A) 1
B) 2
C) 3
D) 1 and 2
E) 2 and 3
F) 1,2 and 3
Solution: A
Both Euclidean and Manhattan
distances are used for
continuous variables, whereas Hamming
distance is used for categorical
variables.
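For categorical features, the Hamming distance simply counts the positions at which two records disagree. A minimal sketch, with a hypothetical function name and made-up records:

```python
def hamming(a, b):
    # Number of positions at which the two records differ.
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(x != y for x, y in zip(a, b))


print(hamming(("red", "small", "round"), ("red", "large", "round")))
print(hamming(("red", "small", "round"), ("blue", "large", "square")))
```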
9) Which of the following will be the
Euclidean Distance between the two
data points A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
sqrt( (1-2)^2 + (3-3)^2) = sqrt(1^2 + 0^2)
=1
10) Which of the following will be the
Manhattan Distance between the two
data points A(1,3) and B(2,3)?
A) 1
B) 2
C) 4
D) 8
Solution: A
|1-2| + |3-3| = 1 + 0 = 1
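Both worked answers for A(1,3) and B(2,3) can be checked with a small sketch of the Minkowski distance, which gives the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The function name is an assumption for the example; the second pair of points is added only because the two metrics differ there.

```python
def minkowski(a, b, p):
    # p = 1 gives Manhattan distance, p = 2 gives Euclidean distance.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


A, B = (1, 3), (2, 3)
print(minkowski(A, B, p=2))  # Euclidean distance between A and B
print(minkowski(A, B, p=1))  # Manhattan distance between A and B

# The two metrics agree here only because A and B differ in one coordinate.
print(minkowski((0, 0), (3, 4), p=2))
print(minkowski((0, 0), (3, 4), p=1))
```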
Context: 11-12
Suppose you are given the following
data, where x and y are the 2 input
variables and Class is the dependent
variable.
Below is a scatter plot which shows the
above data in 2D space.
A) k1 > k2> k3
B) k1<k2
C) k1 = k2 = k3
D) None of these
Solution: D
Value of k is highest in k3, whereas in k1
it is lowest
22) Which of the following values of k
would give the lowest leave-one-out
cross-validation accuracy for the
following graph?
A) 1
B) 2
C) 3
D) 5
Solution: B
If you keep the value of k as 2, it gives
the lowest cross validation accuracy.
You can try this out yourself.
23) A company has built a kNN
classifier that gets 100% accuracy on
training data. When they deployed this
model on the client side, it was found
that the model is not at all accurate.
Which of the following might have gone
wrong?
Note: The model was deployed successfully
and no technical issues were found on the
client side except the model
performance
A) It is probably an overfitted model
B) It is probably an underfitted model
C) Can’t say
D) None of these
Solution: A
An overfitted model seems to
perform well on training data, but it is
not generalized enough to give the same
results on new data.
24) You are given the following 2
statements. Which of these is/are
true in the case of k-NN?
1. With a very large value of k, we
may include points from other classes
in the neighborhood.
2. With too small a value of k, the
algorithm is very sensitive to noise
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the options are true and are self
explanatory.
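Both effects can be seen on a toy one-dimensional data set. The data, the deliberately mislabelled point, and the function name are assumptions made up for this sketch.

```python
from collections import Counter


def knn_predict_1d(train, query, k):
    # Brute-force k-NN on scalar features with a majority vote.
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


train = [
    (1, "a"), (2, "a"), (3, "a"),
    (2.1, "b"),                          # noisy label inside the "a" region
    (8, "b"), (9, "b"), (10, "b"), (11, "b"),
]
query = 2.08

print(knn_predict_1d(train, query, k=1))  # decided by the single noisy point
print(knn_predict_1d(train, query, k=3))  # the local "a" majority wins
print(knn_predict_1d(train, query, k=8))  # distant "b" points outvote the "a" region
```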
25) Which of the following statements
is true for k-NN classifiers?
A) The classification accuracy is better
with larger values of k
B) The decision boundary is smoother
with smaller values of k
C) The decision boundary is linear
D) k-NN does not require an explicit
training step
Solution: D
Option A: This is not always true. You
have to ensure that the value of k is
neither too high nor too low.
Option B: This statement is not true. The
decision boundary becomes more jagged
with smaller values of k
Option C: This statement is not true. The
decision boundary is generally not linear
Option D: This statement is true
26) True-False: It is possible to
construct a 2-NN classifier by using the
1-NN classifier?
A) TRUE
B) FALSE
Solution: A
You can implement a 2-NN classifier by
ensembling 1-NN classifiers
27) In k-NN what will happen when you
increase/decrease the value of k?
A) The boundary becomes smoother
with increasing value of K
B) The boundary becomes smoother
with decreasing value of K
C) Smoothness of the boundary doesn’t
depend on the value of K
D) None of these
Solution: A
The decision boundary would become
smoother by increasing the value of K
28) Following are two statements
given for the k-NN algorithm. Which of the
statement(s) is/are true?
1. We can choose optimal value of k
with the help of cross validation
2. Euclidean distance treats each
feature as equally important
A) 1
B) 2
C) 1 and 2
D) None of these
Solution: C
Both the statements are true
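Statement 1 can be sketched with leave-one-out cross validation on a toy data set. The data, the candidate values of k, and the function names are assumptions for this example; a real project would more likely rely on a library routine.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loocv_accuracy(data, k):
    # Hold out each point in turn and predict it from all the others.
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)


data = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
]

scores = {k: loocv_accuracy(data, k) for k in (1, 3, 7)}
best_k = max(scores, key=scores.get)  # first k with the highest accuracy
print(scores, best_k)
```

With k = 7 every held-out point is outvoted by the four points of the other cluster, which is why very large values of k score poorly here.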
Context 29-30:
Suppose, you have trained a k-NN model
and now you want to get the prediction
on test data. Before getting the
prediction suppose you want to
calculate the time taken by k-NN for
predicting the class for test data.
Note: Calculating the distance between
2 observations will take D time.
29) What would be the time taken by 1-NN
if there are N (very large)
observations in test data?
A) N*D
B) N*D*2
C) (N*D)/2
D) None of these
Solution: A
Each distance computation takes D time
and N is very large, so the total time is
about N*D and option A is correct
30) What would be the relation between
the time taken by 1-NN, 2-NN, and 3-NN?
A) 1-NN >2-NN >3-NN
B) 1-NN < 2-NN < 3-NN
C) 1-NN ~ 2-NN ~ 3-NN
D) None of these
Solution: C
The same distances are computed for any
value of k, so the time taken by the kNN
algorithm is approximately the same.
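A small sketch, assuming a brute-force neighbour search, shows why the value of k barely changes the cost: the number of distance computations depends only on the number of stored points. The function name, the counter dictionary, and the data are hypothetical.

```python
import math


def knn_neighbours(train, query, k, counter):
    # Brute force: one distance per stored point, regardless of k.
    dists = []
    for x in train:
        counter["distances"] += 1
        dists.append(math.dist(x, query))
    order = sorted(range(len(train)), key=dists.__getitem__)
    return order[:k]


train = [(i, i % 7) for i in range(100)]
counts = {}
for k in (1, 2, 3):
    counter = {"distances": 0}
    knn_neighbours(train, (50, 3), k, counter)
    counts[k] = counter["distances"]

print(counts)  # the same number of distance computations for every k
```

Only the final slice depends on k, so the dominant cost is shared by 1-NN, 2-NN, and 3-NN.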