Instance Based Learning

Naveen Pragallapati

Unit-4

Part-2: Instance Based Techniques

K-Nearest Neighbour (k-NN) Learning:


k-NN is an instance-based learning algorithm. It belongs to the family of lazy learning
algorithms because no explicit model is built during the training phase. Instead of learning a
parametric model, k-NN directly uses the training instances to predict outcomes for unseen
data.

Core Idea:
1. Given a query instance, the algorithm finds the k-nearest neighbours from the training
data (based on a distance metric).
2. The predicted class or value for the query instance is based on the majority class (for
classification) or average value (for regression) of the neighbours.
Algorithm
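As a concrete illustration of the procedure above, here is a minimal Python sketch of the standard discrete-valued k-NN classification algorithm (the helper and variable names, such as euclidean, X_train and y_train, are illustrative assumptions, not part of the original notes):

import numpy as np
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def knn_classify(x_q, X_train, y_train, k=3):
    # Distances from the query x_q to every stored training instance
    distances = [euclidean(x_q, x_i) for x_i in X_train]
    # Indices of the k nearest training instances
    nearest = np.argsort(distances)[:k]
    # Final line of the algorithm: majority vote over the k neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]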

Note:
The k-NEAREST NEIGHBOR algorithm is easily adapted to approximating continuous-valued
target functions. To approximate a real-valued target function $f : \mathbb{R}^n \to \mathbb{R}$, we replace the final
line of the above algorithm by the line

$\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{k} f(x_i)}{k}$
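A minimal sketch of this continuous-valued variant, reusing the import and the euclidean helper from the classification sketch above:

def knn_regress(x_q, X_train, y_train, k=3):
    # Same neighbour search as the classification version; the prediction
    # is now the mean of the k nearest real-valued targets
    distances = [euclidean(x_q, x_i) for x_i in X_train]
    nearest = np.argsort(distances)[:k]
    return sum(y_train[i] for i in nearest) / k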

A Note on Terminology
Much of the literature on nearest-neighbour methods and weighted local regression uses a
terminology that has arisen from the field of statistical pattern recognition. In reading that
literature, it is useful to know the following terms:
1. Regression means approximating a real-valued target function.
2. Residual is the error $\hat{f}(x) - f(x)$ in approximating the target function.
3. Kernel function is the function of distance that is used to determine the weight of
each training example. In other words, the kernel function is the function K such that
$w_i = K(d(x_i, x_q))$.
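As a concrete illustration, two commonly used kernel functions written as a small Python sketch (the specific kernels and parameter values are assumptions, not prescribed by the notes):

import numpy as np

def inverse_distance_kernel(d, eps=1e-8):
    # Weight grows as the distance shrinks; eps guards against division by zero
    return 1.0 / (d ** 2 + eps)

def gaussian_kernel(d, sigma=1.0):
    # Smoothly decaying weight; sigma controls how quickly influence falls off
    return np.exp(-(d ** 2) / (2 * sigma ** 2))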

Distance Metrics:
The choice of distance metric significantly impacts the performance of k-NN.
Common distance metrics include:
1. Euclidean Distance (used for continuous data):
$d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$

2. Manhattan Distance (L1 norm):
$d(x, y) = \sum_{j=1}^{n} |x_j - y_j|$

3. Hamming Distance (for categorical data):
Counts the number of mismatched attributes: $d(x, y) = \sum_{j=1}^{n} \mathbb{1}[x_j \neq y_j]$
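A small sketch of these three metrics in Python (the helper names are illustrative, assuming NumPy):

import numpy as np

def euclidean_distance(x, y):
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def manhattan_distance(x, y):
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

def hamming_distance(x, y):
    # Number of attribute positions at which the two instances differ
    return sum(a != b for a, b in zip(x, y))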

Choosing the Value of k:


1. Small k (e.g., k = 1): High sensitivity to noise, leading to overfitting.
2. Large k: More robust but may lead to underfitting.
3. Rule of Thumb:
A common practice is to set k = √n, where n is the size of the training dataset (for example, n = 100 gives k ≈ 10, often rounded to an odd value to avoid ties in voting).
Weighted k-NN:
In some cases, closer neighbours are given more weight in the prediction.
Example: Use an inverse distance weighting scheme where closer points have higher
influence, replacing the final line of the k-NN algorithm by

$\hat{f}(x_q) \leftarrow \underset{v \in V}{\arg\max} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$, where $w_i \equiv \dfrac{1}{d(x_q, x_i)^2}$

We can distance-weight the instances for real-valued target functions in a similar fashion,
replacing the final line of the algorithm in this case by

$\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
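A minimal sketch of both distance-weighted variants, assuming the euclidean helper defined earlier (illustrative code, not taken from the notes):

def weighted_knn_classify(x_q, X_train, y_train, k=3):
    distances = [euclidean(x_q, x_i) for x_i in X_train]
    nearest = np.argsort(distances)[:k]
    # w_i = 1 / d(x_q, x_i)^2; an exact match returns its label directly
    scores = {}
    for i in nearest:
        if distances[i] == 0:
            return y_train[i]
        w = 1.0 / distances[i] ** 2
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + w
    return max(scores, key=scores.get)

def weighted_knn_regress(x_q, X_train, y_train, k=3):
    distances = [euclidean(x_q, x_i) for x_i in X_train]
    nearest = np.argsort(distances)[:k]
    if any(distances[i] == 0 for i in nearest):
        # Exact match: return the average of the coinciding targets
        return np.mean([y_train[i] for i in nearest if distances[i] == 0])
    weights = np.array([1.0 / distances[i] ** 2 for i in nearest])
    targets = np.array([y_train[i] for i in nearest])
    return float(np.sum(weights * targets) / np.sum(weights))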

Strengths and Weaknesses:


Strengths:
1. Simple and intuitive.
2. No explicit training phase and little parameter tuning (beyond choosing k).
3. Flexible: can handle multi-class classification and regression.
Weaknesses:
1. Computationally expensive during prediction (distances must be computed to all stored instances).
2. Performance depends heavily on the choice of k and distance metric.
3. Sensitive to irrelevant features and noisy data.

Applications of k-NN:


1. Image Recognition: Used to classify images based on pixel similarity.
2. Recommender Systems: Identifying similar users or products for recommendations.
3. Anomaly Detection: Identifying outliers by measuring how close an instance is to its
nearest neighbours.

Example Problem 1:
Given a dataset with three attributes (Age, Income, and Credit Score), predict whether a
person will buy a product (Yes/No).

Steps:
1. Compute the Euclidean distance between the query instance and all training points.
2. Choose k = 3 (find the 3 nearest neighbours).
3. Use majority voting to assign a class label (Yes/No).
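The following sketch applies these steps to a small hypothetical dataset (the values are illustrative assumptions, not the figures from the original example):

import numpy as np
from collections import Counter

# Hypothetical training data: (Age, Income in $1000s, Credit Score), with label
X_train = np.array([
    [25, 40, 600],
    [35, 60, 650],
    [45, 80, 700],
    [50, 90, 720],
    [23, 35, 580],
])
y_train = np.array(["No", "Yes", "Yes", "Yes", "No"])

x_q = np.array([30, 55, 640])          # query person

# Step 1: Euclidean distances to all training points
distances = np.sqrt(np.sum((X_train - x_q) ** 2, axis=1))

# Step 2: k = 3 nearest neighbours
nearest = np.argsort(distances)[:3]

# Step 3: majority vote
prediction = Counter(y_train[nearest]).most_common(1)[0][0]
print(prediction)

Note that in practice the attributes should be rescaled (e.g., min-max normalization) before computing distances; otherwise the attribute with the largest numeric range (Credit Score here) dominates the Euclidean distance.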

Example Problem-2:
Solve the Example Problem-1 above using weighted k-NN.

Example Problem-3:

Weighted k-NN Regression Prediction: 297598.77

Summary of Predictions:

Summary:
1. k-NN is a powerful, non-parametric algorithm that works well for classification and
regression.
2. Its main drawback is the high computational cost during prediction, especially with
large datasets.
3. k-NN performs better when the feature space is small and the relevant features are
carefully selected.

Locally Weighted Regression (LWR):


Locally Weighted Regression (LWR) is an instance-based learning algorithm that fits a
separate model for each query point by weighting nearby data points more heavily than
distant ones. Unlike global models like linear regression, LWR constructs a local
approximation specific to each query, making it a non-parametric learning approach. It’s
particularly useful for non-linear data patterns where global models cannot accurately
capture the relationships between variables.

Applications of LWR:


 Robot control: Used to model dynamics and predict local behaviour in robotics.
 Time-series forecasting: Helps predict future values by fitting local trends to historical
data.
 Geospatial data modelling: Captures local variations in spatial data, like temperature
or pollution levels.

Training Algorithm for Locally Weighted Regression (LWR) Using Gradient Descent:
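A minimal sketch of this procedure, assuming a Gaussian kernel and a local linear model trained by gradient descent on the weighted squared error (the learning rate, iteration count, and other details are illustrative assumptions):

import numpy as np

def gaussian_kernel(d, tau=1.0):
    # Influence of a training point decays with its distance from the query
    return np.exp(-(d ** 2) / (2 * tau ** 2))

def lwr_predict(x_q, X, y, tau=1.0, lr=0.01, n_iters=500):
    # Add a bias term to every instance and to the query
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    xq = np.hstack([1.0, x_q])

    # Kernel weights: nearby points dominate the local fit
    dists = np.linalg.norm(X - x_q, axis=1)
    k = gaussian_kernel(dists, tau)

    # Gradient descent on the locally weighted squared error
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residuals = Xb @ w - y
        grad = Xb.T @ (k * residuals) / len(y)
        w -= lr * grad

    # The local model is used only to answer this one query
    return float(xq @ w)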

Summary:
This algorithm fits a local linear model for each query point by iteratively updating the weights
w using gradient descent. The kernel function ensures that only nearby points have significant
influence, making the regression localized. The training process continues until the model
converges, after which it can predict values based on the optimized weights.

Locally Weighted Regression Example (Using Gradient Descent):
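As a stand-in for the worked figures, a small hypothetical usage of the lwr_predict sketch above (the data and parameter values are illustrative assumptions):

import numpy as np

# Noisy samples of a non-linear function y = sin(x) on [0, 6]
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 40).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

x_q = np.array([2.5])
# Local prediction near x = 2.5, roughly sin(2.5)
print(lwr_predict(x_q, X, y, tau=0.5, lr=0.1, n_iters=5000))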



Radial Basis Function (RBF) Networks:


 An RBF network is a type of artificial neural network used for function approximation.
 RBF networks model the target function as a weighted sum of radial basis functions,
typically Gaussian functions.
Key Structure:
 Input Layer: Passes input to the hidden layer without weights.
 Hidden Layer: Each neuron represents a radial basis function.
 Output Layer: Combines the hidden layer outputs linearly to generate predictions.

How RBF Networks Work in Learning:



Training Process for RBF Networks:



Summary:
1. Select centres: Place RBF neurons at key points (possibly through clustering).
2. Calculate activations: Use the Gaussian kernel to compute how much influence each
RBF neuron has for a given input.
3. Optimize weights: Use least squares to find the weights that minimize the prediction
error.
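A compact sketch of these three steps, assuming centres chosen as a random subset of the training data (a simple stand-in for a clustering step such as k-means) and a Gaussian basis with a shared width heuristic; all names and details are illustrative assumptions:

import numpy as np

def train_rbf(X, y, n_centres=10, seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: select centres (random subset instead of clustering, for brevity)
    idx = rng.choice(len(X), size=n_centres, replace=False)
    centres = X[idx]

    # Shared Gaussian width: a common heuristic based on the centre spread
    d_max = np.max(np.linalg.norm(centres[:, None] - centres[None, :], axis=-1))
    sigma = max(d_max / np.sqrt(2 * n_centres), 1e-8)

    # Step 2: hidden-layer activations for every training instance
    dists = np.linalg.norm(X[:, None] - centres[None, :], axis=-1)
    Phi = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    Phi = np.hstack([np.ones((len(X), 1)), Phi])        # bias term

    # Step 3: output weights by linear least squares
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centres, sigma, w

def rbf_predict(x, centres, sigma, w):
    d = np.linalg.norm(centres - x, axis=-1)
    phi = np.concatenate([[1.0], np.exp(-(d ** 2) / (2 * sigma ** 2))])
    return float(phi @ w)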

A radial basis function network. Each hidden unit produces an activation determined by a Gaussian function centered at
some instance x_u. Therefore, its activation will be close to zero unless the input x is near x_u. The output unit produces a linear combination
of the hidden unit activations. Although the network shown here has just one output, multiple output units can also
be included.

Diagram Summary
In the diagram, you would typically see:
 Input nodes connected to each hidden node (RBF neuron).
 RBF neurons in the hidden layer, each receiving the input vector and computing its
activation.
 Weights associated with each connection from hidden neurons to the output layer.
 A single output node that aggregates the weighted activations to produce the predicted
value.
This architecture enables RBF networks to model nonlinear relationships by combining local
approximations (via Gaussian kernels) with a global linear combination at the output layer.

Note:

Solution:

Steps to follow:

Case-Based Reasoning (CBR)


Case-Based Reasoning is an approach in machine learning where new problems are solved by
referencing or adapting solutions from previously encountered cases or examples. Unlike
methods that generalize from training data to build a model, CBR relies on a memory of
specific instances.
Key Steps in Case-Based Reasoning
1. Retrieve: Given a new problem, retrieve the most similar cases from memory. This involves
defining a similarity measure that can match the new case with those stored in memory (a
minimal retrieval sketch is given after these steps).
2. Reuse: Once a similar case or set of cases is retrieved, reuse the solution (or parts of it) for
the new problem. This step may require adaptation if the old solution doesn't exactly fit the
new context.
3. Revise: After applying the retrieved solution, test it. If necessary, revise or adapt the
solution to improve it for the current problem.
4. Retain: Once the solution is successfully applied, it is stored as a new case in memory for
future use. This retention allows the system to improve its knowledge base over time.
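As an illustration of the retrieve step, a minimal similarity-based case retrieval sketch (the case structure, feature encoding, and similarity measure are hypothetical, not taken from the notes):

import numpy as np

# Hypothetical case memory: each case pairs a problem description
# (a numeric feature vector, here flow rate and pressure) with a stored solution
case_memory = [
    {"features": np.array([10.0, 5.0]), "solution": "small irrigation pump design"},
    {"features": np.array([50.0, 20.0]), "solution": "industrial pump design"},
]

def retrieve(query_features, memory, k=1):
    # Rank stored cases by Euclidean similarity to the new problem
    dists = [np.linalg.norm(c["features"] - query_features) for c in memory]
    order = np.argsort(dists)[:k]
    return [memory[i] for i in order]

# New problem: 10 L/min flow rate, 8 psi pressure
best = retrieve(np.array([10.0, 8.0]), case_memory)[0]
print(best["solution"])   # the closest prior case, to be reused and revised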
Advantages of CBR
1. Efficiency with Limited Data: CBR works well even with a smaller number of cases,
making it suitable for applications where data is sparse.
2. Incremental Learning: New cases are continually added to the memory, allowing the
system to learn over time without retraining.
3. Interpretability: Since solutions are adapted from actual cases, CBR offers
interpretable outcomes and explanations.
Applications of CBR
CBR is widely used in domains where historical cases are accessible, such as:

 Medical diagnosis, where patient cases help diagnose similar future patients.
 Technical support and troubleshooting, where past solutions can be adapted for new
issues.
 Legal reasoning, where previous legal cases inform judgments in new cases.
Summary
CBR's strength lies in its ability to adapt previous knowledge directly to new problems, which
is especially powerful in contexts where cases do not follow a strict generalization rule. It is
ideal for tasks where exceptions are common or complex adaptations are needed.

Example: CADET (a case-based design system)
Imagine CADET's library includes a case of a small irrigation pump with a flow rate of 10 liters
per minute and a pressure of 5 psi. The new problem requires a pump with 10 liters per minute
but with a higher pressure of 8 psi.
1. Retrieve: CADET retrieves the small irrigation pump case, recognizing that it meets the flow
rate requirement.
2. Reuse: CADET reuses much of the design, such as the general structure and configuration.
3. Revise: To meet the higher-pressure requirement, CADET modifies the pump by increasing
the impeller size or using a more powerful motor, ensuring it can achieve 8 psi.
4. Retain: The revised pump design is stored as a new case with specifications for a 10 L/min,
8 psi water pump.
Adaptation Techniques in CADET
CADET's adaptation is based on both similarity metrics and specific engineering rules, such as:

 Increasing power to handle higher pressures.
 Altering materials based on durability requirements for different pressures or flow
rates.

Remarks on Lazy and Eager Learning


Eager Learning: In eager learning, the model is constructed in advance of any query. This
involves generalizing from the training data to create a model that can be used for prediction.
Eager learners typically build a complete model during the training phase, which can be
computationally expensive but allows for quick predictions once the model is built. Examples
include decision trees, neural networks, and support vector machines.
Lazy Learning: In contrast, lazy learning does not construct a general model until a query is
made. Instead, it retains the training instances and uses them directly to make predictions.
This can be more efficient in terms of the time taken during the training phase but may result
in slower predictions because it must process the training data at query time. Examples of lazy
learning include k-nearest neighbours (k-NN) and locally weighted regression.

Key Characteristics:


1. Generalization vs. Memorization:

 Eager learning emphasizes generalization, aiming to abstract patterns from the
training data.
 Lazy learning focuses on memorization, retaining the original instances for later use.

2. Time Complexity:

 Eager learners require more time and computational resources during the training
phase, as they need to analyse and construct a model.
 Lazy learners are quick to train, as they simply store the training data but may require
more time to make predictions since they analyse the stored data at query time.
3. Memory Usage:

 Eager learning typically uses less memory at query time since it works with a model
rather than storing all instances.
 Lazy learning may require significant memory if the training dataset is large, as it must
keep all instances accessible for querying.
4. Flexibility:

 Eager learners can sometimes struggle with changes in the underlying data
distribution, as retraining the model is necessary.
 Lazy learners can adapt to changes in the data more easily since they can incorporate
new instances dynamically during prediction.
5. Performance and Application Context:

 Eager learning methods may perform better in scenarios where the dataset is large,
and a general model can effectively capture the relationships in the data.
 Lazy learning may excel in cases where data is sparse or when predictions must be
made based on local relationships in the data.

Use Cases:

 Eager learning is often used in applications where the cost of computation during
training can be justified by the need for fast predictions, such as in online services.
 Lazy learning is useful in applications where real-time updates are critical, such as
recommendation systems that adapt to user preferences.
Conclusion:
Both lazy and eager learning methods have their advantages and disadvantages, and the
choice between them often depends on the specific problem context, the size and nature of
the dataset, and the computational resources available. Understanding these concepts is
crucial for selecting appropriate learning algorithms in practical machine learning
applications.
