UNIT-VI
INSTANCE-BASED LEARNING
6.1 INTRODUCTION
Each instance x is described by the feature vector

\langle a_1(x), a_2(x), \ldots, a_n(x) \rangle

where a_r(x) denotes the value of the rth attribute of instance x. Then the distance between two instances x_i and x_j is defined to be d(x_i, x_j), where

d(x_i, x_j) \equiv \sqrt{ \sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2 }
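As a quick sketch (the function name and tuple representation here are illustrative, not from the text), this distance can be computed directly from the attribute vectors:

```python
import math

def euclidean_distance(xi, xj):
    # d(xi, xj) = square root of the sum over attributes r of (ar(xi) - ar(xj))^2
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

print(euclidean_distance((1, 3), (2, 3)))  # 1.0
```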
For example, the following figure illustrates the operation of the k-nearest neighbor algorithm for the case where the instances are points in a two-dimensional space and where the target function is Boolean-valued. The positive and negative training examples are shown by "+" and "-" respectively, and a query point x_q is shown as well. Note that the 1-nearest neighbor algorithm (k = 1) classifies x_q as a positive example in this figure, whereas the 5-nearest neighbor algorithm (k = 5) classifies it as a negative example.
Figure: k-NEAREST NEIGHBOR. A set of positive and negative training examples is shown on the left, along with a query instance x_q to be classified.
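A minimal sketch of the algorithm just described, assuming instances are tuples of reals and labels are arbitrary hashable values (all names here are illustrative):

```python
import math
from collections import Counter

def knn_classify(query, examples, k):
    """examples: list of (attribute_vector, label) pairs.
    Return the majority label among the k training examples nearest to query."""
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    neighbors = sorted(examples, key=lambda e: dist(query, e[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

With training data arranged as in the figure, knn_classify(xq, data, 1) and knn_classify(xq, data, 5) can return different labels, which is exactly the k = 1 versus k = 5 behavior noted above.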
A refinement of this algorithm is to weight the contribution of each of the k nearest neighbors according to its distance from the query point x_q, giving greater weight to closer neighbors:

\hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))

where

w_i \equiv \frac{1}{d(x_q, x_i)^2}
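A sketch of this distance-weighted voting under the w_i = 1/d(x_q, x_i)^2 weighting above (the exact-match shortcut handles d = 0; names are illustrative):

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, examples, k):
    dist = lambda a, b: math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    neighbors = sorted(examples, key=lambda e: dist(query, e[0]))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        d = dist(query, x)
        if d == 0.0:
            return label               # query coincides with a stored instance
        votes[label] += 1.0 / d ** 2   # w_i = 1 / d(x_q, x_i)^2
    return max(votes, key=votes.get)
```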
In locally weighted linear regression, the target function f is approximated near x_q using a linear function of the form

\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_n a_n(x)

Three possible criteria for defining the error E(x_q) as a function of the query point x_q are:
1. Minimize the squared error over just the k nearest neighbors:

E_1(x_q) \equiv \frac{1}{2} \sum_{x \in \text{k nearest nbrs of } x_q} \left( f(x) - \hat{f}(x) \right)^2
2. Minimize the squared error over the entire set D of training examples, while weighting the error of each training example by some decreasing function K of its distance from x_q:

E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))
3. Combine 1 and 2:

E_3(x_q) \equiv \frac{1}{2} \sum_{x \in \text{k nearest nbrs of } x_q} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))
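A minimal sketch of criterion 3, assuming a Gaussian kernel K(d) = e^{-d^2/(2\sigma^2)} and solving the resulting weighted least-squares problem directly with NumPy (the function name and sigma parameter are illustrative):

```python
import numpy as np

def lwr_predict(query, X, y, k, sigma=1.0):
    """Fit w0 + w . x to the k nearest neighbors of query, weighting each
    squared error by K(d) = exp(-d^2 / (2 sigma^2)), then evaluate at query."""
    query = np.asarray(query, dtype=float)
    d = np.linalg.norm(X - query, axis=1)        # d(x_q, x) for every training point
    idx = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    w = np.exp(-d[idx] ** 2 / (2 * sigma ** 2))  # kernel weights K(d(x_q, x))
    A = np.hstack([np.ones((k, 1)), X[idx]])     # design matrix with a bias column
    sw = np.sqrt(w)[:, None]                     # weighted least squares via scaling
    coef, *_ = np.linalg.lstsq(A * sw, y[idx] * sw.ravel(), rcond=None)
    return coef[0] + query @ coef[1:]
```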
Another approach is to learn a global approximation to the target function as a linear combination of local kernel functions, a radial basis function network:

\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))

where each x_u is an instance from X and where the kernel function K_u(d(x_u, x)) is defined so that it decreases as the distance d(x_u, x) increases. Here k is a user-provided constant that specifies the number of kernel functions to be included.
Even though \hat{f}(x) is a global approximation to f(x), the contribution from each of the K_u(d(x_u, x)) terms is localized to a region near the point x_u. It is common to choose each kernel function K_u(d(x_u, x)) to be a Gaussian function centered at the point x_u with some variance \sigma_u^2:

K_u(d(x_u, x)) = e^{-\frac{d^2(x_u, x)}{2\sigma_u^2}}
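A sketch of evaluating such an approximation, assuming the centers x_u, variances, and weights have already been learned (all parameter names are illustrative):

```python
import numpy as np

def rbf_predict(x, centers, sigmas, w0, weights):
    """f_hat(x) = w0 + sum over u of weights[u] * exp(-d^2(centers[u], x) / (2 * sigmas[u]^2))"""
    d2 = np.sum((centers - np.asarray(x, dtype=float)) ** 2, axis=1)
    return w0 + weights @ np.exp(-d2 / (2 * sigmas ** 2))
```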
Instance-based methods such as k-nearest neighbor and locally weighted regression share three key properties:
i. They are lazy learning methods that defer the decision of how to generalize beyond the training data until a new query instance is observed.
ii. They classify new query instances by analyzing similar instances while ignoring instances that are very different from the query.
iii. They represent instances as real-valued points in an n-dimensional Euclidean space.
Case-based reasoning (CBR) is a learning paradigm based on the first two of these principles, but not the third. The CADET system employs case-based reasoning to assist in the conceptual design of simple mechanical devices.
Figure: A stored case and a new problem. The top half of the figure describes a typical
design fragment in the case library of CADET. The function is represented by the graph of
qualitative dependencies among the T-junction variables (described in the text). The bottom
half of the figure shows a typical design problem.
UNIT-VI
SECTION-A
Objective Questions
1. The k-NN algorithm does more computation at test time than at training time. [ ]
A. TRUE B. FALSE
2. Which of the following statements is/are true about the k-NN algorithm? [ ]
1. k-NN performs much better if all of the data have the same scale
2. k-NN works well with a small number of input variables (p), but struggles when the number of inputs is very large
3. k-NN makes no assumptions about the functional form of the problem being solved
A. 1 and 2 B. 1 and 3
C. Only 1 D. All of the above
4. What will be the Euclidean distance between the two data points A(1,3) and B(2,3)? [ ]
A. 1 B. 2 C. 4 D. 8
5. When you find noise in the data, which of the following options would you consider in k-NN? [ ]
A. I will increase the value of k
B. I will decrease the value of k
C. Noise does not depend on the value of k
D. None of these.
6. k-NN is very likely to overfit due to the curse of dimensionality. Which of the following options would you consider to handle this problem? [ ]
8. You are given the following two statements. Which of these is/are true in the case of k-NN? [ ]
1. In case of a very large value of k, we may include points from other classes in the neighborhood.
2. In case of a too small value of k, the algorithm is very sensitive to noise.
A. 1 B. 2 C. 1 and 2 D. None of these
9. In k-NN, what will happen when you increase/decrease the value of k? [ ]
A. The boundary becomes smoother with increasing value of k
B. The boundary becomes smoother with decreasing value of k
C. Smoothness of the boundary doesn't depend on the value of k
D. None of these
SECTION-B
Descriptive Questions
1. Write the disadvantages of instance-based learning.
2. Why are instance-based learning algorithms sometimes referred to as lazy learning algorithms?
3. Explain the distance-weighted nearest neighbor algorithm.
4. Illustrate the k-nearest neighbor classifier with a suitable example.
5. Write a short note on Lazy and Eager Learning.
6. Describe the method of learning using locally weighted linear regression.
7. Explain the case-based reasoning learning paradigm.
8. Discuss remarks on lazy and eager learning.
9. List out eager and lazy learning algorithms.
10. Write the differences between Lazy and Eager Learning methods.
11. What is the curse of dimensionality?