Module-5_Notes_13-12-2024.docx
Syllabus
Clustering: Introduction, Types of clustering, partitioning methods of clustering
(k means, k-medoids), Hierarchical methods
Textbook-3 Chapter 13
Machine Learning by Vincy Joseph and Anuradha Srinivasaraghavan,
ISBN-10: 8126578513, ISBN-13: 9788126578511
Instance-based learning: introduction, k-nearest neighbour learning (review),
locally weighted regression, radial basis function, case-based reasoning
From Javatpoint
Types of Clustering Methods
Clustering methods are broadly divided into hard clustering (each data point belongs
to only one group) and soft clustering (a data point may belong to more than one group).
Beyond this distinction, various other approaches to clustering exist. Below are the main
clustering methods used in machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
In this type, the dataset is divided into a set of k groups, where k is the pre-defined
number of groups. The cluster centres are created in such a way that each data point is
closer to its own cluster centre than to the centroid of any other cluster.
Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, so
arbitrarily shaped clusters can be formed as long as the dense regions are connected.
The algorithm identifies regions of high density in the dataset and joins them into
clusters; the dense areas in data space are separated from each other by sparser areas.
These algorithms can face difficulty in clustering the data points if the dataset has
varying densities and high dimensions.
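A minimal DBSCAN sketch follows; the eps and min_samples values and the synthetic data are illustrative assumptions:

```python
# Minimal DBSCAN sketch: density-based clustering with noise detection.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# A dense blob plus a few scattered outliers.
X = np.vstack([rng.normal(0, 0.3, (80, 2)), rng.uniform(-4, 4, (10, 2))])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)

# Points labelled -1 are treated as noise (not assigned to any cluster).
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("clusters found:", n_clusters)
print("noise points:", int(np.sum(db.labels_ == -1)))
```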
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data are divided based on the
probability that they belong to a particular distribution. The grouping is done by
assuming one or more distributions, most commonly the Gaussian distribution.
The example of this type is the Expectation-Maximization Clustering
algorithm that uses Gaussian Mixture Models (GMM).
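A minimal sketch of EM-based clustering with scikit-learn's GaussianMixture, on assumed synthetic data:

```python
# Minimal EM / Gaussian Mixture Model sketch (distribution model-based clustering).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(4, 1.5, (60, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = gmm.predict(X)            # hard assignment: most likely component
probs = gmm.predict_proba(X[:3])   # soft assignment: probability per component
print(labels[:10])
print(probs.round(3))
```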
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, since
there is no requirement to pre-specify the number of clusters to be created. In this
technique, the dataset is divided into clusters to create a tree-like structure, which is
also called a dendrogram. The desired number of clusters is then obtained by cutting the
tree at the appropriate level. The most common example of this method is
the Agglomerative Hierarchical algorithm.
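A minimal agglomerative clustering sketch (scikit-learn; the synthetic data and the linkage choice are assumptions):

```python
# Minimal agglomerative (bottom-up) hierarchical clustering sketch.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])

# Cutting the tree at 2 clusters; the full dendrogram can also be built
# with scipy.cluster.hierarchy (see the linkage sketch later in these notes).
agg = AgglomerativeClustering(n_clusters=2, linkage="average").fit(X)
print(agg.labels_[:10])
```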
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more
than one group or cluster. Each data point has a set of membership coefficients, which
depend on its degree of membership in each cluster. The Fuzzy C-means algorithm is an
example of this type of clustering; it is sometimes also known as the Fuzzy k-means
algorithm.
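The following is a minimal from-scratch sketch of the Fuzzy C-Means update equations (illustrative only; in practice a library such as scikit-fuzzy would normally be used, and the data here are assumed synthetic points):

```python
# Minimal from-scratch Fuzzy C-Means sketch (soft clustering).
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Random initial membership matrix, rows normalised to sum to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]            # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / np.sum((dist[:, :, None] / dist[:, None, :]) ** (2 / (m - 1)), axis=2)
    return centers, U

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(5, 1, (40, 2))])
centers, U = fuzzy_c_means(X)
print(centers)          # cluster centres
print(U[:3].round(2))   # membership coefficients of the first three points
```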
Clustering Algorithms
Clustering algorithms can be grouped according to the cluster models explained
above. Many clustering algorithms have been published, but only a few are commonly
used. The choice of algorithm depends on the kind of data being used: some algorithms
require the number of clusters to be specified in advance, whereas others work by
finding the minimum distance between observations in the dataset.
Here we are discussing mainly popular Clustering algorithms that are widely used in
machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering
algorithms. It partitions the samples into clusters of roughly equal variance. The
number of clusters must be specified in advance. It is fast and computationally
cheap, with roughly linear complexity O(n).
2. Mean-shift algorithm: The mean-shift algorithm tries to find dense areas within a
smooth density estimate of the data points. It is an example of a centroid-based
model that works by updating candidate centroids to be the mean of the points
within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of
Applications with Noise. It is an example of a density-based model similar to
the mean-shift, but with some remarkable advantages. In this algorithm, the
areas of high density are separated by the areas of low density. Because of this,
the clusters can be found in any arbitrary shape.
4. Expectation-Maximization Clustering using GMM: This algorithm can be
used as an alternative to the k-means algorithm, or in cases where k-means may
fail. In GMM, the data points are assumed to be Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The Agglomerative hierarchical
algorithm performs the bottom-up hierarchical clustering. In this, each data
point is treated as a single cluster at the outset and then successively merged.
The cluster hierarchy can be represented as a tree-structure.
6. Affinity Propagation: It differs from the other clustering algorithms in that it does
not require the number of clusters to be specified. In this method, data points
exchange messages in pairs until convergence. Its O(N²T) time complexity (N data
points, T iterations) is the main drawback of this algorithm. A minimal usage sketch
is shown below.
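The sketch (scikit-learn; the synthetic data and default parameters are assumptions):

```python
# Minimal Affinity Propagation sketch: no need to specify the number of clusters.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (30, 2)),
               rng.normal(4, 0.5, (30, 2)),
               rng.normal(8, 0.5, (30, 2))])

ap = AffinityPropagation(random_state=0).fit(X)
print("clusters found:", len(ap.cluster_centers_indices_))
print(ap.labels_[:10])
```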
From ChatGPT
Clustering in Machine Learning
Clustering is an unsupervised learning technique used in Machine Learning (ML) to
group similar data points into clusters or groups. It is particularly useful when labels
or outputs are not available, allowing the algorithm to infer the structure within the
data. Clustering is used in various applications such as market segmentation,
document categorization, image compression, and anomaly detection.
Key Features of Clustering
1. Unsupervised Learning: No predefined labels or categories; patterns are
discovered based on the data.
2. Similarity/Dissimilarity: Clusters are formed based on similarity or
dissimilarity metrics (e.g., Euclidean distance, Cosine similarity).
3. Group Representation: Each cluster represents a subset of data with similar
characteristics.
2. Hierarchical Clustering
• Description: Builds a hierarchy of clusters using a tree-like structure called a
dendrogram.
• Types:
o Agglomerative (Bottom-Up):
▪ Starts with each data point as a separate cluster.
▪ Merges the closest pairs of clusters iteratively until one cluster
remains.
o Divisive (Top-Down):
▪ Starts with one large cluster and divides it into smaller clusters
recursively.
• Distance Metrics: Linkage methods determine how to merge or divide
clusters:
o Single Linkage: Minimum distance between clusters.
o Complete Linkage: Maximum distance between clusters.
o Average Linkage: Average distance between clusters.
• Pros:
o Does not require specifying the number of clusters in advance.
o Useful for visualizing cluster structures.
• Cons:
o Computationally expensive for large datasets.
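To illustrate the linkage methods listed above, here is a minimal SciPy sketch that builds the merge tree with single, complete, and average linkage and then cuts it into two clusters (the data and the cut level are assumptions):

```python
# Minimal scipy sketch: hierarchical clustering with different linkage methods.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 0.4, (20, 2)), rng.normal(3, 0.4, (20, 2))])

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                     # build the merge tree (dendrogram)
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut it into 2 clusters
    print(method, np.bincount(labels))
```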
3. Density-Based Clustering
• Description: Identifies clusters based on areas of high data density and
separates low-density areas as noise.
• Algorithm Examples:
o DBSCAN (Density-Based Spatial Clustering of Applications with
Noise):
▪ Groups points that are closely packed together.
▪ Handles noise and outliers effectively.
o OPTICS (Ordering Points to Identify the Clustering Structure):
▪ Extends DBSCAN by handling varying densities.
• Pros:
o Can find arbitrarily shaped clusters.
o Handles noise and outliers well.
• Cons:
o Struggles with varying density clusters.
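A minimal OPTICS sketch with scikit-learn, on assumed data containing two very different densities:

```python
# Minimal OPTICS sketch: like DBSCAN, but better suited to varying densities.
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(7)
# One tight cluster and one much looser cluster.
X = np.vstack([rng.normal(0, 0.2, (60, 2)), rng.normal(5, 1.5, (60, 2))])

opt = OPTICS(min_samples=10).fit(X)
print("labels found (-1 means noise):", np.unique(opt.labels_))
```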
4. Model-Based Clustering
• Description: Assumes the data is generated by a mixture of underlying
probability distributions (e.g., Gaussian distributions) and fits the data to these
models.
• Algorithm Examples:
o Gaussian Mixture Models (GMM):
▪ Uses a probabilistic model to represent each cluster as a Gaussian
distribution.
▪ Determines the likelihood of data points belonging to each
cluster.
• Pros:
o Handles overlapping clusters.
o Provides a probabilistic measure of membership.
• Cons:
o Assumes a specific distribution, which may not always fit the data.
5. Grid-Based Clustering
• Description: Divides the data space into a finite number of cells (grids) and
then clusters the cells based on the density of data points.
• Algorithm Examples:
o STING (Statistical Information Grid):
▪ Summarizes statistical data in grids and performs clustering
hierarchically.
o CLIQUE (Clustering In QUEst):
▪ Combines grid-based and density-based clustering for high-
dimensional data.
• Pros:
o Efficient for large datasets.
o Handles high-dimensional data.
• Cons:
o Sensitive to grid size.
6. Spectral Clustering
• Description: Uses the eigenvalues of a similarity matrix to reduce dimensions
and group data points in lower-dimensional space.
• Algorithm Examples:
o Constructs a similarity graph from the dataset.
o Performs clustering using graph partitioning techniques.
• Pros:
o Effective for non-convex clusters.
o Handles complex data structures.
• Cons:
o Computationally expensive for large datasets.
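A minimal spectral clustering sketch with scikit-learn on non-convex (concentric-circle) data, where a centroid-based method would typically fail; the parameter values are assumptions:

```python
# Minimal spectral clustering sketch on non-convex data.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_circles

# Two concentric circles: not separable by centroid-based methods like K-Means.
X, _ = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit(X)
print(np.bincount(sc.labels_))   # roughly 100 points per ring
```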
Clustering is a versatile and powerful tool for exploring and understanding data, and
the choice of clustering method should align with the data characteristics and desired
outcomes.
7. Image Processing: Clustering can be used to group similar images together,
classify images based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression
patterns and identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer
behavior, identify patterns in stock market data, and analyze risk in investment
portfolios.
10. Customer Service: Clustering is used to group customer inquiries and
complaints into categories, identify common issues, and develop targeted
solutions.
11. Manufacturing: Clustering is used to group similar products together,
optimize production processes, and identify defects in manufacturing
processes.
12. Medical diagnosis: Clustering is used to group patients with similar symptoms
or diseases, which helps in making accurate diagnoses and identifying effective
treatments.
13. Fraud detection: Clustering is used to identify suspicious patterns or
anomalies in financial transactions, which can help in detecting fraud or other
financial crimes.
14. Traffic analysis: Clustering is used to group similar patterns of traffic data,
such as peak hours, routes, and speeds, which can help in improving
transportation planning and infrastructure.
15. Social network analysis: Clustering is used to identify communities or groups
within social networks, which can help in understanding social behavior,
influence, and trends.
16. Cybersecurity: Clustering is used to group similar patterns of network traffic
or system behavior, which can help in detecting and preventing cyberattacks.
17. Climate analysis: Clustering is used to group similar patterns of climate data,
such as temperature, precipitation, and wind, which can help in understanding
climate change and its impact on the environment.
18. Sports analysis: Clustering is used to group similar patterns of player or team
performance data, which can help in analyzing player or team strengths and
weaknesses and making strategic decisions.
19. Crime analysis: Clustering is used to group similar patterns of crime data,
such as location, time, and type, which can help in identifying crime hotspots,
predicting future crime trends, and improving crime prevention strategies.
Comparison of K-Means and K-Medoids:

Definition
K-Means: A partition-based clustering algorithm that minimizes the sum of squared distances between data points and cluster centroids.
K-Medoids: A partition-based clustering algorithm that minimizes the sum of distances between data points and the most centrally located data point (the medoid) in each cluster.

Cluster Representation
K-Means: Centroids, the geometric centre of the cluster, which may not be an actual data point.
K-Medoids: Medoids, actual data points from the dataset, representing the cluster centre.

Distance Metric
K-Means: Typically uses Euclidean distance.
K-Medoids: Can use various distance metrics (e.g., Manhattan, Euclidean).

Objective Function
K-Means: Minimize the sum of squared distances of points from their cluster centroids.
K-Medoids: Minimize the sum of absolute distances of points from their cluster medoids.

Robustness to Outliers
K-Means: Sensitive to outliers, as centroids can be skewed by extreme values.
K-Medoids: Robust to outliers, since medoids are actual data points and are not affected by extreme values.

Data Type Suitability
K-Means: Works best with numerical data where centroids are meaningful.
K-Medoids: Suitable for numerical and categorical data, as medoids can represent any data type.

Algorithm Complexity
K-Means: Generally faster: O(n × k × i), where n is the number of data points, k is the number of clusters, and i is the number of iterations.
K-Medoids: Slower due to the exhaustive search for medoids: O(n² × k × i).

Initialization
K-Means: Requires initialization of k centroids (random selection or algorithms like K-Means++).
K-Medoids: Requires initialization of k medoids (often done by randomly selecting k points).

Convergence
K-Means: Converges faster, but might get stuck in local minima due to sensitivity to initialization.
K-Medoids: Converges more reliably to a solution since medoids are data points, but can take more iterations.

Scalability
K-Means: Scalable to large datasets; efficient with high-dimensional data.
K-Medoids: Less scalable due to computational complexity with large datasets.

Cluster Shape
K-Means: Assumes clusters are spherical and evenly sized.
K-Medoids: Can handle arbitrarily shaped clusters better in some cases.

Applications
K-Means: Image compression, document clustering, customer segmentation.
K-Medoids: Medical diagnosis (categorical data), robust clustering in noisy datasets, gene expression analysis.

Advantages
K-Means: Fast and efficient for large datasets; easy to implement and understand.
K-Medoids: Handles noise and outliers better; works with mixed data types.

Disadvantages
K-Means: Sensitive to outliers; limited to numerical data; assumes spherical clusters.
K-Medoids: Computationally intensive; slower for large datasets.
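To make the contrast concrete, here is a minimal from-scratch K-Medoids sketch (alternating assignment and medoid update, using Manhattan distance). It is illustrative only; a library implementation such as scikit-learn-extra's KMedoids would normally be preferred, and the synthetic data are an assumption:

```python
# Minimal from-scratch K-Medoids sketch (alternating assignment / medoid update).
import numpy as np

def k_medoids(X, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Manhattan distances; k-medoids can work with any distance metric.
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    labels = np.argmin(D[:, medoids], axis=1)
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)   # assign points to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimising total distance to its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(6, 1, (40, 2))])
medoids, labels = k_medoids(X)
print("medoid points (actual data points):\n", X[medoids])
```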
Algorithm: Locally Weighted Regression (LWR)
Description: Fits a local model for each query instance based on nearby training data.
Use Cases: Regression problems where relationships vary across the space.
o Performance degrades as the number of features increases, making it
hard to define meaningful distances.
4. No Abstraction:
o Doesn't provide insights or a generalized model of the data.
Instance-based approaches can construct a different approximation to the target function
for each distinct query instance that must be classified. In fact, many techniques construct
only a local approximation to the target function that applies in the neighborhood of the new
query instance, and never construct an approximation designed to perform well over the
entire instance space. This has significant advantages when the target function is very
complex, but can still be described by a collection of less complex local approximations.
How Instance-Based Learning Works
1. Training Phase:
• No explicit training occurs.
• The algorithm stores the entire training dataset or a subset of it.
2. Prediction Phase:
• When a new instance (query) is presented, the algorithm compares it to
the stored instances.
• Based on the similarity between the query and the training instances, the
algorithm predicts the output (classification or regression).
3. Learning Algorithm:
• Relies on a distance function to measure how close the query instance
is to the stored examples.
• Common algorithms use techniques like nearest neighbor or weighted
contributions from neighbors.
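A minimal k-nearest-neighbour sketch with scikit-learn illustrates this: the fit step merely stores the examples, and distances are computed when a query arrives (the dataset and the value of k are assumptions):

```python
# Minimal k-NN sketch: all "learning" is deferred to prediction time.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# "Training" only stores the examples; distances are computed at query time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```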
Advantages of Instance-Based Learning
1. Simple Implementation:
o Requires minimal effort for training since it stores instances and defers
learning to prediction time.
2. Flexibility:
o Can adapt quickly to new data as it doesn't rely on a fixed model.
3. Handles Complex Relationships:
o Effective for non-linear and highly variable data distributions.
4. Intuitive:
o Easy to interpret since predictions are based directly on stored examples.
o Identify characters or objects by comparing with known examples.
4. Text Classification:
o Classify documents or emails (e.g., spam detection).
Generalization
Instance-based (lazy) learning: Local; predictions depend on nearby data points.
Model-based (eager) learning: Global; captures general trends in the data.

Adaptability
Instance-based (lazy) learning: High; can quickly incorporate new data.
Model-based (eager) learning: Low; requires retraining to incorporate new data.

Computation
Instance-based (lazy) learning: Expensive during prediction (similarity computation).
Model-based (eager) learning: Expensive during training (model optimization).
Linear Regression is a supervised learning algorithm used for computing linear relationships
between input (X) and output (Y). Ordinary linear regression fits a linear hypothesis to the
data and adjusts its parameters so as to minimize a squared-error cost over the whole
training set.
This algorithm cannot be used for making predictions when there exists a non-linear
relationship between X and Y; in such cases, locally weighted linear regression is used.
Locally Weighted Linear Regression:
From ML | Locally weighted Linear Regression - GeeksforGeeks
NOTE: For Locally Weighted Linear Regression, the data must always be available on
the machine as it doesn’t learn from the whole set of data in a single shot. Whereas, in
Linear Regression, after training the model the training set can be erased from the
machine as the model has already learned the required parameters.
Points to remember:
1. Locally weighted linear regression is a supervised learning algorithm.
2. It is a non-parametric algorithm.
3. There is no training phase; all the work is done during the testing phase, i.e., while
making predictions.
4. The dataset must always be available for predictions.
5. Locally weighted regression methods are a generalization of k-Nearest Neighbour.
6. In locally weighted regression, an explicit local approximation of the target function
is constructed for each query instance.
7. The local approximation of the target function takes a simple form, such as a
constant, linear, or quadratic function, combined with localized kernel (weighting)
functions.
A minimal from-scratch sketch of LWLR is given below.
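This sketch uses a Gaussian kernel; the bandwidth tau and the synthetic data are assumptions, not from the notes:

```python
# Minimal from-scratch locally weighted linear regression (LWLR) sketch.
import numpy as np

def lwlr_predict(x_query, X, y, tau=0.5):
    # Design matrix with a bias column.
    Xb = np.column_stack([np.ones(len(X)), X])
    xq = np.array([1.0, x_query])
    # Gaussian weights: training points near the query get high weight.
    w = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares, solved fresh for every query point.
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq @ theta

rng = np.random.default_rng(9)
X = np.linspace(0, 10, 100)
y = np.sin(X) + rng.normal(0, 0.1, 100)   # clearly non-linear data

print(lwlr_predict(2.0, X, y))   # close to sin(2.0), roughly 0.91
print(lwlr_predict(5.0, X, y))   # close to sin(5.0), roughly -0.96
```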
From Locally Weighted Linear Regression - Javatpoint
Applications of Locally Weighted Linear Regression
1. Time Series Analysis: LWLR is particularly useful in time series analysis,
where the relationship between variables may change over time. By adapting to
the local patterns and trends, LWLR can capture the dynamics of time-varying
data and make accurate predictions.
2. Anomaly Detection: LWLR can be employed for anomaly detection in various
domains, such as fraud detection or network intrusion detection. By identifying
deviations from the expected patterns in a localized manner, LWLR helps
detect abnormal behavior that may go unnoticed using traditional regression
models.
3. Robotics and Control Systems: In robotics and control systems, LWLR can
be utilized to model and predict the behavior of complex systems. By adapting
to local conditions and variations, LWLR enables precise control and decision-
making in dynamic environments.
Benefits of Locally Weighted Linear Regression
1. Improved Predictive Accuracy: By considering local patterns and
relationships, LWLR can capture subtle nuances in the data that might be
overlooked by global regression models. This results in more accurate
predictions and better model performance.
2. Flexibility and Adaptability: LWLR can adapt to different regions of the
dataset, making it suitable for complex and non-linear relationships. It offers
flexibility in capturing local variations, allowing for more nuanced analysis and
insights.
3. Interpretable Results: Despite its adaptive nature, LWLR still provides
interpretable results. The localized models offer insights into the relationships
between variables within specific regions of the data, aiding in the
understanding of complex phenomena.
Locally Weighted Linear Regression in Python | by Suraj Verma | Towards Data
Science
A better-modelled picture of LWLR is given in the article linked above.
What is your intuition, and what happens at the junctions/transitions from one linear
segment to another?
Radial basis function
From Radial Basis Functions: Types, Advantages, and Use Cases | HackerNoon
All of you please listen to the audio of the article by the creator of this
webpage
This is an introductory article explaining the basic intuition, mathematical idea & scope of
radial basis functions in the development of predictive machine learning models.
Table of Contents
1. Introduction
2. Basic intuition of a Radial Basis Function
3. Types of Radial Basis Function
4. The concept of the RBF Network
5. Scope & Advantages of RBF
6. Conclusion
7. References
Introduction
In machine learning, problem-solving based on hyperplane-based algorithms heavily depends
upon the distribution of the data points in the space. However, it is a known fact that real-
world data rarely follows theoretical assumptions.
There are many transformation functions that can convert the natural shape of the data
points into theoretically recommended distributions while preserving the hidden patterns of
the data. The radial basis function is one such well-known function, discussed in many
machine learning textbooks. In this article, we will learn about the basic intuition, types, and
usage of the radial basis function.
The Basic Intuition of a Radial Basis Function
The radial basis function is a mathematical function that takes a real-valued input and
outputs a real value based on the distance between the input point, projected into space,
and a fixed reference point (the centre) placed elsewhere.
This function is popularly used in many machine learning and deep learning algorithms such
as Support Vector Machines, Artificial Neural Networks, etc.
Let us understand the concept and the usage of this mathematical function.
In real-time, whenever we solve complex machine learning problems using algorithms such
as SVM, we need to project all of our data points in an imaginary multidimensional space
where each feature will be a dimension.
Let's assume we have a classification problem to predict whether a student will pass or fail
the examination.
Let’s consider that our data points look like this where-
• The green colour represents the students who passed the examination
• The red colour represents the students who failed the examination
Now, SVM will create a hyperplane that travels through these 3 dimensions in order to
differentiate the failed and passed students-
So, technically, the model now understands that every data point that falls on one side of
the hyperplane belongs to the students who passed the exam, and vice versa.
In our example it was easy to create the hyperplane because a linear, straight hyperplane
was enough to separate the two categories. But in real, complex projects these assumptions
are often violated. In particular, when there are hundreds of independent variables, the data
points are rarely linearly separable, which makes it difficult to create an optimal hyperplane.
In such scenarios, researchers usually apply the Radial basis function to each of the data
points so that they will be able to pass a linear hyperplane across the data points to easily
solve the problem.
Consider that our data points are looking like this in the space-
It is clear that we cannot use a linear hyperplane such that it can group the data points
according to their classes.
Researchers often project such data points into a much higher-dimensional space so that
the distances between them increase, and then apply a function (RBF or another function)
to build a hyperplane. However, moving to higher dimensions is not mandatory; it is
ultimately the decision of the statistician or researcher who understands the patterns in
the data.
Next, we have to mark an imaginary point in the space like this wherever we need.
After that, we draw concentric circles around this imaginary point. The distance between
the centre and any data point positioned on the boundary of a circle is called the radius.
The value of a radial basis function therefore depends only on this radius.
Gaussian Radial Basis Function
A widely used radial basis function is the Gaussian: φ(r) = exp(−(εr)²), where
• r is the distance of a point from the chosen centre
• ε is a constant (the shape parameter) that controls how quickly the function decays
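A small numpy sketch of this Gaussian RBF, evaluated around an assumed centre point:

```python
# Minimal numpy sketch of the Gaussian radial basis function
# phi(r) = exp(-(eps * r)^2), evaluated around a chosen centre point.
import numpy as np

def gaussian_rbf(x, centre, eps=1.0):
    r = np.linalg.norm(x - centre, axis=-1)   # distance of each point from the centre
    return np.exp(-(eps * r) ** 2)

centre = np.array([0.0, 0.0])
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 2.0]])
# Values shrink smoothly towards 0 as points move away from the centre.
print(gaussian_rbf(points, centre, eps=0.5))
```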
Intuitively, these functions carry out two different processes on the data points in the
space: expansion and compression.
After the expansion and compression, the data points would have been transformed like this-
Now, we can easily construct a linear hyperplane that can classify the data points like this-
The Concept of the RBF Network
Sometimes, RBFs are also used within artificial neural networks that have one hidden layer.
In such networks, RBFs serve as the activation functions of the hidden layer. Besides the
hidden layer, there is an input layer containing one neuron per feature variable, and the
output layer forms a weighted sum of the hidden-layer outputs to produce the network
output.
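A minimal RBF-network sketch under these assumptions (centres chosen by K-Means, Gaussian activations, and a linear output layer fitted by least squares; the data are synthetic and the parameter values are illustrative):

```python
# Minimal RBF-network sketch: Gaussian hidden layer + linear output layer.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(10)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.05, 200)

# Hidden-layer centres: a common (but not mandatory) choice is K-Means centroids.
centres = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_
eps = 1.0

def hidden(X):
    r = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-(eps * r) ** 2)            # one Gaussian activation per centre

H = np.column_stack([np.ones(len(X)), hidden(X)])    # add a bias unit
w, *_ = np.linalg.lstsq(H, y, rcond=None)             # fit the output weights

pred = H @ w
print("mean squared error:", np.mean((pred - y) ** 2))
```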
This function is available as an inbuilt library in most data science-oriented programming
languages such as Python or R. Hence, it is easy to implement this once you understand the
theoretical intuition. I have added the links to some of the advanced materials in the
references section where you can deep dive into the complex calculations if you are
interested.
References
1. Radial Basis Functions, Wikipedia.
2. Radial Basis Function Networks (archived 2014-04-23 at the Wayback Machine).
3. Broomhead, David H.; Lowe, David (1988). "Multivariable Functional Interpolation
and Adaptive Networks". Complex Systems. 2: 321–355.
4. Powell, Michael J. D. (1977). "Restart procedures for the conjugate gradient
method". Mathematical Programming. 12 (1): 241–254. doi:10.1007/bf01593790.
5. Sahin, Ferat (1997). A Radial Basis Function Approach to a Color Image
Classification Problem in a Real-Time Industrial Application (M.Sc. thesis). Virginia
Tech. p. 26. hdl:10919/36847. ("Radial basis functions were first introduced by
Powell to solve the real multivariate interpolation problem.")
Case-based reasoning
What is case-based reasoning?
Case-based reasoning (CBR) is a problem-solving approach in artificial intelligence and
cognitive science that uses past solutions to solve similar new problems. It is an experience-
based technique that adapts previously successful solutions to new situations. The process is
primarily memory-based, modeling the reasoning process on the recall and application of past
experiences.
The CBR process generally involves four steps:
1. Retrieval: Gathering from memory an experience closest to the current problem.
2. Reuse: Suggesting a solution based on the experience and adapting it to meet the
demands of the new problem.
3. Revision: Evaluating the use of the solution in the new context.
4. Retaining: Storing this new problem-solving method in the memory system.
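A minimal, hypothetical Python sketch of the retrieve-reuse-retain loop; the tiny help-desk case base and the string-similarity measure below are assumptions for illustration only:

```python
# Minimal case-based reasoning sketch: retrieve, reuse, retain.
from difflib import SequenceMatcher

case_base = [
    {"problem": "printer not responding over network", "solution": "restart print spooler"},
    {"problem": "laptop battery drains quickly",       "solution": "reduce screen brightness"},
    {"problem": "email attachments fail to upload",    "solution": "clear browser cache"},
]

def similarity(a, b):
    # A simple text-similarity measure standing in for a real case-matching function.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def solve(new_problem):
    best = max(case_base, key=lambda c: similarity(c["problem"], new_problem))  # Retrieval
    solution = best["solution"]                                                 # Reuse
    # Revision (checking the solution in the new context) would happen here.
    case_base.append({"problem": new_problem, "solution": solution})            # Retaining
    return solution

print(solve("network printer is not printing"))
```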
CBR differs from other AI approaches, such as knowledge-based systems, in that it doesn't
rely solely on general knowledge of a problem domain or making associations. Instead, it
employs the specific knowledge of previously experienced, concrete problem situations. This
approach offers incremental, sustained learning as each time a problem is solved, a new
experience is retained and can be reused.
CBR is used in various areas, including pattern recognition, diagnosis, troubleshooting, and
planning. It's considered easier to maintain compared to rule-based expert systems. However,
it's important to note that while CBR is a powerful method for computer reasoning, it also has
its limitations and is not suitable for all types of problems.
What are the benefits of using case-based reasoning?
1. Ease of Knowledge Acquisition — CBR simplifies the process of knowledge
acquisition, as it relies on specific instances of problem-solving rather than abstract
rules or models.
2. Efficiency and Quality — It can improve the efficiency and quality of problem-
solving by adapting solutions that have been successful in the past.
3. Flexibility — CBR is adaptable to a wide range of tasks and domains, making it a
versatile tool in various fields.
4. Human-Like Reasoning — It allows machines to reason more like humans by
understanding and applying knowledge from past cases.
5. Learning Capability — CBR systems can learn incrementally as each new case is
solved and retained, enhancing their problem-solving capabilities over time.
6. Ease of Maintenance — Compared to rule-based systems, CBR systems are
generally easier to maintain because they do not require extensive rule management.
7. Intuitive Approach — The process of CBR is intuitive and mirrors human problem-
solving by using precedents, which can make development and maintenance easier.
What are some of the challenges associated with case-based reasoning?
1. Handling Large Case Bases — CBR can struggle with managing and searching
through large case bases efficiently.
2. Dynamic Domain Problems — CBR may not be suitable for problems in dynamic
domains where the conditions change rapidly.
3. Storage and Processing — Storing a large number of cases can require significant
storage space, and finding similar cases can be time-consuming.
4. Case Creation — Cases may need to be manually created, which can be labor-
intensive and error-prone.
5. Adaptation Challenges — Adapting retrieved cases to new problems can be difficult,
especially if the new problem is significantly different from past cases.
6. Robustness — CBR systems may lack robustness, as the absence of even one piece
of data can disrupt the retrieval process.
How can case-based reasoning be used in AI applications?
Case-based reasoning (CBR) can be utilized in various AI applications due to its ability to
solve problems by adapting solutions from similar past cases. Here are some ways CBR is
applied:
1. Diagnosis — In healthcare, CBR can assist in diagnosing diseases by comparing
current patient data with historical cases.
2. Financial Decision Making — Financial institutions use CBR for loan approvals,
risk assessments, and investment strategies by analyzing past financial cases.
3. Legal Reasoning — CBR aids in legal case analysis by referencing similar past legal
cases to inform decisions.
4. Customer Support — Help-desk systems employ CBR to provide solutions to
customer issues based on previously resolved cases.
5. Manufacturing — Advanced manufacturing processes can benefit from CBR by
troubleshooting and process control based on past incidents.
6. E-commerce — CBR can enhance self-service and e-commerce applications by
personalizing recommendations based on customer history.
What is the future of case-based reasoning?
The future of CBR looks promising due to its flexibility, accuracy, and simplicity, which
make it an attractive AI approach for various domains. It is expected to grow in popularity
and be increasingly used in areas such as:
• Self-service Applications — CBR can power self-service systems in e-commerce,
providing personalized experiences based on past user interactions.
• Web Applications — The adaptability of CBR to new areas like web applications
suggests its potential for broader application in online services.
• Risk Monitoring and Defense — The efficiency of CBR in risk monitoring and
defense indicates its continued relevance in sectors requiring high-stakes decision-
making.
• AI Accessibility — The relative simplicity of CBR makes it accessible for businesses
and organizations new to AI, suggesting its role in democratizing AI usage.