ANN-unit 4

Self-Organizing Maps (SOMs), or Kohonen Maps, are unsupervised machine learning algorithms used for dimensionality reduction and visualization of high-dimensional data by mapping it onto a low-dimensional grid of neurons. The training process involves initializing weight vectors, presenting input data, and adjusting weights based on similarity, allowing for clustering and outlier detection. Additionally, Learning Vector Quantization (LVQ) is a supervised algorithm that classifies data using prototype vectors, with both SOMs and LVQs being applicable in various fields such as image analysis and speech recognition.

Uploaded by

Neelesh Bhardwaj

UNIT-IV

Self-Organization Maps (SOM)


Self Organizing Map (or Kohonen Map or SOM)

Self-Organizing Map (SOM), also known as Kohonen Map after its inventor Teuvo Kohonen, is
an unsupervised machine learning algorithm used for dimensionality reduction and visualization
of high-dimensional data. It is a type of neural network that learns to map high-dimensional input
data onto a low-dimensional grid of neurons, typically a two-dimensional grid.

SOM works by creating a grid of neurons, where each neuron is associated with a weight vector
of the same dimensionality as the input data. During training, the algorithm adjusts the weight
vectors of the neurons in a way that ensures that nearby neurons in the grid respond similarly to
similar inputs. This results in a mapping of the high-dimensional input space onto a low-
dimensional grid, where each neuron represents a region of the input space.

SOM training involves two main phases: initialization and adaptation. During the initialization
phase, the weight vectors of the neurons are randomly initialized. During the adaptation phase,
the algorithm iteratively presents the input data to the network and updates the weight vectors of
the neurons based on how well they match the input data. The update rule is such that neurons
that are close to the input data in the high-dimensional space will have their weight vectors
adjusted more than neurons that are far away.

After training, the SOM can be used for data visualization, clustering, and outlier detection. Data
visualization involves mapping the input data onto the low-dimensional grid of neurons, which
allows us to visualize the high-dimensional data in a 2D or 3D space. Clustering involves
grouping similar input data points into clusters based on the neuron they activate in the SOM.
Outlier detection involves identifying input data points that do not fit well into any of the
clusters. SOMs have been successfully used in various applications such as image analysis,
natural language processing, and pattern recognition. They are particularly useful for visualizing
and exploring high-dimensional data in a low-dimensional space.

How does a SOM work?


Self-Organizing Maps (SOMs) are a type of unsupervised machine learning algorithm used for
dimensionality reduction, data visualization, and clustering. The basic idea behind SOMs is to
map high-dimensional input data onto a low-dimensional grid of neurons, typically a two-
dimensional grid.

Here are the basic steps involved in training a SOM:

Initialization: The first step in training a SOM is to initialize the weight vectors of the neurons in
the grid. These weight vectors are randomly assigned and have the same dimensionality as the
input data.

Input presentation: Next, the algorithm iteratively presents the input data to the network one at a
time. For each input data point, the algorithm finds the neuron with the weight vector that is
closest to the input data point in the high-dimensional space. This neuron is called the "winner"
neuron.

Neighborhood function: The algorithm then adjusts the weight vectors of the winner neuron and
its neighbors in the grid based on a neighborhood function. The neighborhood function is
typically a Gaussian function that assigns greater weight to neurons that are closer to the winner
neuron in the grid.

Learning rate: The amount by which the weight vectors are adjusted is determined by a learning
rate, which gradually decreases over time. This means that early in training, the weight vectors
are adjusted by large amounts, while later in training, they are adjusted by smaller amounts.

Repeat: Steps 2-4 are repeated for a fixed number of iterations or until convergence criteria are
met.

During training, the SOM gradually organizes the neurons in the grid to reflect the structure of
the input data. Neurons that are close to each other in the grid respond similarly to similar inputs,
which allows us to visualize the high-dimensional data in a 2D or 3D space.


Algorithm

Here is the algorithm for Self-Organizing Maps (SOM):

Initialization: Initialize the weight vectors for each neuron in the grid with random values of the
same dimensionality as the input data.

Input presentation: Present an input data point to the SOM.

Similarity computation: Compute the similarity between the input data point and the weight
vectors of all the neurons in the grid. Typically, Euclidean distance is used to measure the
similarity between the input data point and the weight vector of each neuron.

Find the winner neuron: Find the neuron with the weight vector closest to the input data point in
the high-dimensional space. This neuron is called the "winner" neuron.

Update the weight vectors: Update the weight vectors of the winner neuron and its neighbors in
the grid based on a neighborhood function. The neighborhood function assigns greater weight to
neurons that are closer to the winner neuron in the grid. The weight vectors are updated
according to the following equation:

w_{i,j}(t+1) = w_{i,j}(t) + α(t)h_{i,j}(t)(x-w_{i,j}(t))

where w_{i,j}(t) is the weight vector of the neuron at position (i,j) in the grid at time t, α(t) is the
learning rate at time t, h_{i,j}(t) is the neighborhood function for the neuron at position (i,j) in
the grid at time t, and x is the input data point.

Update the learning rate: Update the learning rate α(t) according to some predetermined
schedule, typically decreasing over time.

Update the neighborhood function: Update the neighborhood function h_{i,j}(t) according to
some predetermined schedule, typically decreasing over time.

Repeat: Steps 2-7 are repeated for a fixed number of iterations or until convergence criteria are
met.
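The algorithm above can be sketched compactly in code. Below is a minimal NumPy implementation of the update rule w_{i,j}(t+1) = w_{i,j}(t) + α(t)h_{i,j}(t)(x − w_{i,j}(t)); the grid size, decay schedules, and hyperparameter values are illustrative choices, not values fixed by the text.

```python
import numpy as np

def train_som(data, grid_h=10, grid_w=10, n_iters=1000,
              alpha0=0.5, sigma0=3.0, seed=0):
    """Train a SOM on `data` (n_samples x n_features) using
    w(t+1) = w(t) + alpha(t) * h(t) * (x - w(t)), with a Gaussian
    neighborhood h and exponentially decaying alpha(t) and sigma(t)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Step 1: random weight initialization
    weights = rng.random((grid_h, grid_w, n_features))
    # Grid coordinates of every neuron, used by the neighborhood function
    gy, gx = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    lam = n_iters / np.log(sigma0)          # decay time constant (one common choice)

    for t in range(n_iters):
        alpha = alpha0 * np.exp(-t / lam)   # Step 6: decaying learning rate
        sigma = sigma0 * np.exp(-t / lam)   # Step 7: shrinking neighborhood
        x = data[rng.integers(len(data))]   # Step 2: present one input
        # Steps 3-4: Euclidean distances to all neurons, find the winner
        dists = np.linalg.norm(weights - x, axis=2)
        wi, wj = np.unravel_index(np.argmin(dists), dists.shape)
        # Step 5: Gaussian neighborhood around the winner, then weight update
        grid_dist2 = (gy - wi) ** 2 + (gx - wj) ** 2
        h = np.exp(-grid_dist2 / (2 * sigma ** 2))
        weights += alpha * h[:, :, None] * (x - weights)
    return weights
```

After training, mapping a point to its closest neuron gives the visualization/clustering use described above.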


Learning Vector Quantization

Learning Vector Quantization (LVQ) is a supervised learning algorithm used for classification
tasks. It is similar to the Self-Organizing Map (SOM) algorithm, but instead of mapping input
data onto a low-dimensional grid of neurons, LVQ maps input data onto a set of prototype
vectors, each representing a class.

LVQ works by first initializing a set of prototype vectors, one for each class. During training, the
algorithm iteratively presents the input data to the network and adjusts the prototype vectors
based on how well they match the input data. The update rule is such that the prototype vector
closest to the input data is adjusted to be more like the input data, while the prototype vectors of
other classes remain unchanged.

LVQ training involves two main phases: initialization and adaptation. During the initialization
phase, the prototype vectors are randomly initialized. During the adaptation phase, the algorithm
iteratively presents the input data to the network and updates the prototype vectors based on how
well they match the input data.

After training, the LVQ can be used for classification by assigning an input data point to the
class represented by the closest prototype vector. LVQ can also be extended to handle multi-
class classification problems using techniques such as one-vs-all and one-vs-one.

LVQ has been successfully used in various applications such as speech recognition, handwritten
digit recognition, and image classification. It is particularly useful for problems with a small
number of classes and when interpretability of the model is important.

How does Learning Vector Quantization work?

Learning Vector Quantization (LVQ) is a type of supervised machine learning algorithm used for
classification tasks. The basic idea behind LVQ is to map the input data onto a set of prototype
vectors, and use these prototype vectors to classify new input data.
Here are the basic steps involved in training an LVQ model:

Initialization: Initialize the prototype vectors randomly or using a clustering algorithm like k-means.

Input presentation: Present an input data point to the LVQ model.

Similarity computation: Compute the similarity between the input data point and each prototype vector in the set. Typically, Euclidean distance is used to measure the similarity between the input data point and each prototype vector.

Find the winner prototype: Find the prototype vector that is closest to the input data point in the high-dimensional space. This prototype vector is called the "winner" prototype.

Update the prototype vectors: Update the weights of the winner prototype and its neighbors in
the prototype set based on a learning rate and a neighborhood function. The learning rate
determines the amount of change that should be made to the winner prototype and its neighbors,
while the neighborhood function determines the size of the neighborhood around the winner
prototype. The weight vectors are updated according to the following equation:

w_{i}(t+1) = w_{i}(t) + α(t)h_{i}(t)(x-w_{i}(t))

where w_{i}(t) is the weight vector of the i-th prototype in the set at time t, α(t) is the learning
rate at time t, h_{i}(t) is the neighborhood function for the i-th prototype in the set at time t, and
x is the input data point.

Repeat: Steps 2-5 are repeated for a fixed number of iterations or until convergence criteria are met.

During training, the LVQ model gradually organizes the prototype vectors in the set to reflect the structure of the input data. Each prototype vector in the set corresponds to a different class, and the goal of training is to move the prototypes so that they are closer to the input data points from their corresponding class.

After training, the LVQ model can be used to classify new input data by finding the prototype vector that is closest to the input data point in the high-dimensional space. The input data point is then assigned to the class of that closest prototype vector.

LVQ is a simple yet effective algorithm for classification tasks. However, it is limited by the fact that it requires a fixed number of prototype vectors, which must be chosen before training begins. Additionally, the performance of the algorithm depends on the initial choice of prototype vectors.
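The procedure above can be sketched in code. The classic LVQ1 variant uses only the winner (i.e., the neighborhood term h is 1 for the winner and 0 elsewhere) and, being supervised, moves the winner toward the input when their classes match and away when they differ; the sketch below implements that variant, with hyperparameter values chosen purely for illustration.

```python
import numpy as np

def train_lvq1(X, y, n_protos_per_class=1, n_iters=100, alpha0=0.3, seed=0):
    """Classic LVQ1: only the winning prototype moves -- toward the
    input if its class label matches, away from it otherwise."""
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    for c in np.unique(y):                  # init: random samples of each class
        idx = rng.choice(np.where(y == c)[0], n_protos_per_class, replace=False)
        protos.append(X[idx])
        proto_labels += [c] * n_protos_per_class
    protos = np.vstack(protos).astype(float)
    proto_labels = np.array(proto_labels)

    for t in range(n_iters):
        alpha = alpha0 * (1 - t / n_iters)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            d = np.linalg.norm(protos - X[i], axis=1)
            w = int(np.argmin(d))           # winner prototype
            sign = 1.0 if proto_labels[w] == y[i] else -1.0
            protos[w] += sign * alpha * (X[i] - protos[w])
    return protos, proto_labels

def predict_lvq(protos, proto_labels, X):
    """Assign each row of X to the class of its nearest prototype."""
    d = np.linalg.norm(protos[None, :, :] - X[:, None, :], axis=2)
    return proto_labels[np.argmin(d, axis=1)]
```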

Two basic feature mapping models are:

Self-Organizing Maps (SOMs): SOMs are also known as Kohonen maps, after the name of their
inventor. They are a type of unsupervised neural network that maps high-dimensional input data
onto a low-dimensional grid of neurons. The neurons in the grid are connected to their neighbors,
and each neuron has a weight vector that is initialized randomly or using a clustering algorithm.
During training, the weights of the neurons are updated based on the similarity between their
weight vectors and the input data. This process gradually organizes the neurons in the grid to
reflect the structure of the input data, with nearby neurons representing similar input data.

Radial Basis Function Networks (RBFNs): RBFNs are a type of supervised neural network that
use radial basis functions as activation functions. The basic idea behind RBFNs is to map input
data onto a set of prototype vectors, and use these prototype vectors to classify new input data.
The prototype vectors are learned using a clustering algorithm such as k-means, and each
prototype vector is associated with a radial basis function. During training, the weights of the
radial basis functions are adjusted to minimize the difference between the network output and the
true output for the training data. The RBFNs are useful for non-linear classification problems, as
the radial basis functions allow for non-linear decision boundaries.
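A minimal RBFN of this kind can be sketched as follows, assuming the centers have already been chosen (for example by k-means) and the width sigma is fixed; only the linear output weights are fitted, here by least squares.

```python
import numpy as np

def train_rbfn(X, y, centers, sigma=1.0):
    """Fit the output weights of an RBF network with fixed Gaussian
    centers: hidden activations phi = exp(-||x - c||^2 / (2 sigma^2)),
    output weights solved by least squares."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-d**2 / (2 * sigma**2))          # hidden-layer activations
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear output weights
    return W

def predict_rbfn(X, centers, sigma, W):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d**2 / (2 * sigma**2)) @ W
```

Because the hidden layer is non-linear, even a problem like XOR, which no linear boundary separates, is fitted exactly once the four points themselves are used as centers.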

This chapter spans 5 parts:

1. What are Self-Organizing Maps?

2. K-Means Clustering Technique.

3. SOMs Network Architecture.

4. How Self Organizing Maps work.

5. Practical Implementation of SOMs.

1. What are Self-Organizing Maps?


Self-Organizing Maps (SOMs), also known as Kohonen maps, are a type of unsupervised neural
network that can be used for dimensionality reduction, data visualization, and clustering. SOMs
were invented by Finnish professor Teuvo Kohonen in the 1980s.
The basic idea behind SOMs is to map high-dimensional input data onto a low-dimensional grid
of neurons. Each neuron in the grid has a weight vector that is initialized randomly or using a clustering algorithm. During training, the weights of the neurons are updated based on the
similarity between their weight vectors and the input data. This process gradually organizes the
neurons in the grid to reflect the structure of the input data, with nearby neurons representing
similar input data.
The SOM algorithm involves several steps:
Initialization: Initialize the SOM grid with random weights or using a clustering algorithm.
Input: Present an input vector to the SOM.
Similarity: Compute the distance between the input vector and the weight vector of each neuron.
Winner: Identify the neuron with the closest weight vector as the winner neuron.
Update: Update the weights of the winner neuron and its neighbors to move them closer to the
input vector.
Repeat: Repeat steps 2-5 for all input vectors and for a number of iterations.
The SOM algorithm can be used for a variety of tasks, such as clustering, data visualization, and
feature extraction. For example, by visualizing the SOM grid, we can identify clusters of similar
input data and their corresponding neuron weights. Additionally, by using the SOM as a feature
extractor, we can reduce the dimensionality of high-dimensional input data while preserving the
underlying structure of the data.

2. K-Means Clustering Technique

2.1: What is k-Means?

K-Means clustering aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.

2.2: How does k-Means clustering work?

The k-Means clustering algorithm attempts to split a given anonymous data set (a set containing no information as to class identity) into a fixed number (k) of clusters.

Initially, k so-called centroids are chosen. A centroid is a data point (imaginary or real) at the center of a cluster.

Dataset Description:

This dataset has three attributes: the first is the item, which is our target for grouping similar items into clusters; the second and third attributes are the informative values of that item.

In the first step, take any rows at random as the initial centroids; let's suppose we take row 1 and row 3.

Now compare these centroid values with each row of our data using the Euclidean distance formula.

Let's solve the rows one by one:


Row 1 (A)

Row 2 (B)

Row 3 (C)

Row 4 (D)

Row 5 (E)

Now

Based on these distances, A and B belong to Cluster 1, while C, D, and E belong to Cluster 2, as shown in the table below.

Now calculate the centroids of Clusters 1 and 2, reassign each point to the closest mean, and repeat until the new centroids equal the previous ones.

Now find the centroids of Cluster 1 and Cluster 2:

New Centroid
X1 = (1, 0.5)
X2 = (1.7, 3.7)
Previous Centroid
X1 = (1, 1)
X2 = (0, 2)
If the new centroid values equal the previous centroid values, the clusters are final; otherwise, repeat the step until the new values equal the previous ones. In our case, the new centroid values are not equal to the previous centroids.

Now reassign each point to the cluster with the closest mean, using

X1 = (1, 0.5)
X2 = (1.7, 3.7)

and the same procedure as above.

Based on the closest distances, A, B, and C belong to Cluster 1, and D and E form Cluster 2.

The means of Clusters 1 and 2 are:

X1 (Cluster 1) = (0.7, 1)

X2 (Cluster 2) = (2.5, 4.5)

New Centroid

X1 = (0.7, 1)

X2 = (2.5, 4.5)
Previous Centroid
X1 = (1, 0.5)
X2 = (1.7, 3.7)
If the new centroid values equal the previous centroid values, the clusters are final; otherwise, repeat the step until they do. In our case, the new centroid values are again not equal to the previous centroids.

Now recalculate the clusters using the closest mean, following the same steps.

Based on the closest distances, A, B, and C again belong to Cluster 1, and D and E form Cluster 2.

The means of Clusters 1 and 2 are:

X1 (Cluster 1) = (0.7, 1)
X2 (Cluster 2) = (2.5, 4.5)

New Centroid
X1 = (0.7, 1)
X2 = (2.5, 4.5)

Previous Centroid
X1 = (0.7, 1)
X2 = (2.5, 4.5)
Here the new centroid values are equal to the previous values, hence our clusters are final: A, B, and C belong to Cluster 1, and D and E belong to Cluster 2, as shown in the figure.
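The worked example can be replayed in code. The original data table is only shown as a figure, so the 2-D values below are a reconstruction chosen to be consistent with every centroid printed above; they are an assumption, not the original data.

```python
import numpy as np

# Hypothetical 2-D values for items A-E, reconstructed so that every
# centroid reported in the worked example comes out as printed above
# (the original data table is a figure and is not in the text).
points = np.array([[1, 1],    # A  (row 1 -> first initial centroid)
                   [1, 0],    # B
                   [0, 2],    # C  (row 3 -> second initial centroid)
                   [2, 4],    # D
                   [3, 5]])   # E
centroids = points[[0, 2]].astype(float)       # rows 1 and 3

while True:
    # assign every point to its nearest centroid (Euclidean distance)
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # recompute the cluster means
    new = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new, centroids):            # centroids repeated -> done
        break
    centroids = new

print(np.round(centroids, 1))  # final means: (0.7, 1) and (2.5, 4.5)
print(labels)                  # A, B, C in cluster 0; D, E in cluster 1
```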

3. Self-Organizing Map Network Architecture


The architecture of a Self-Organizing Map (SOM) consists of an input layer and a 2D grid of
output neurons. The input layer receives the input data, which is typically a high-dimensional
vector. The output layer consists of a 2D grid of neurons, where each neuron has a weight vector
with the same number of dimensions as the input data. The output neurons are arranged in a grid,
typically in a rectangular or hexagonal shape.
During training, the weight vectors of the output neurons are adjusted based on the similarity
between their weight vectors and the input data. The similarity is typically measured using a
distance metric such as Euclidean distance. The neuron with the weight vector closest to the
input data is known as the "winning" neuron. The weight vectors of the winning neuron and its
neighbors are updated to move them closer to the input data.
The SOM architecture has a number of advantages over other neural network architectures. It is
well-suited for visualizing high-dimensional data in a low-dimensional space, and it can be used
for unsupervised clustering and classification tasks. Additionally, because the SOM is a type of
unsupervised neural network, it does not require labeled data for training. This makes it useful
for tasks such as anomaly detection and exploratory data analysis.

However, the SOM architecture also has some limitations. Because the SOM is an unsupervised
network, it is not suitable for tasks that require labeled data, such as classification or regression.
Additionally, because the SOM is a type of neural network, it can be computationally expensive
to train and may require large amounts of training data.

4. How do Self-Organizing Maps work?

Self-Organizing Maps (SOMs) work by transforming high-dimensional input data into a lower-
dimensional representation, while preserving the structure and topology of the input data. The
SOM algorithm is an unsupervised learning method that uses a 2D grid of output neurons to
represent the input data in a low-dimensional space.
The SOM algorithm involves the following steps:
Initialization: Initialize the weight vectors of the output neurons randomly or using a clustering
algorithm.
Input: Present an input vector to the SOM.
Similarity: Compute the distance between the input vector and the weight vector of each neuron
in the grid.
Winner: Identify the neuron with the closest weight vector as the winner neuron.
Update: Update the weights of the winner neuron and its neighboring neurons to move them
closer to the input vector. The update rule is typically based on a neighborhood function that
decreases with distance from the winner neuron.
Repeat: Repeat steps 2-5 for all input vectors and for a number of iterations.
Over time, the SOM algorithm organizes the output neurons in the grid to reflect the structure of
the input data. Similar input vectors are mapped to nearby neurons in the grid, while dissimilar
input vectors are mapped to distant neurons. This allows the SOM to be used for tasks such as
clustering, visualization, and data compression.
The SOM algorithm has several advantages over other unsupervised learning methods, such as k-
means clustering or principal component analysis (PCA). It is capable of preserving the topology
of the input data, which can be useful for tasks such as visualizing high-dimensional data in a
low-dimensional space. Additionally, because the SOM is a type of neural network, it can be
used for nonlinear dimensionality reduction, which is not possible with linear methods such as
PCA.
Learning Algorithm in Details.

Now it’s time for us to learn how SOMs learn. Are you ready? Let’s begin. Right here we have a
very basic self-organizing map.

Our input vectors amount to three features, and we have nine output nodes.

That being said, it might confuse you to see how this example shows three input nodes producing
nine output nodes. Don’t get puzzled by that. The three input nodes represent three columns
(dimensions) in the dataset, but each of these columns can contain thousands of rows. The output nodes in a SOM, by contrast, are always arranged in a two-dimensional grid.

Now we'll redraw this SOM in a form more familiar from the supervised machine learning methods (artificial, convolutional, and recurrent neural networks) discussed in earlier chapters.

Consider the structure of a self-organizing map with 3 visible input nodes and 9 output nodes connected directly to the inputs, as shown in the figure below.

Our input node values are:

Now let’s take a look at each step in detail.


Step 1: Initializing the Weights

Now, let’s take the topmost output node and focus on its connections with the input nodes. As you
can see, there is a weight assigned to each of these connections.

Again, the word “weight” here carries a whole other meaning than it did with artificial and convolutional neural networks. For instance, with artificial neural networks we multiplied the input
node’s value by the weight and, finally, applied an activation function. With SOMs, on the other
hand, there is no activation function.
Weights are not separate from the nodes here. In a SOM, the weights belong to the output node
itself. Instead of being the result of adding up the weights, the output node in a SOM contains the
weights as its coordinates. Carrying these weights, it sneakily tries to find its way into the input
space.
In this example, we have a 3D dataset, and each of the input nodes represents one coordinate (dimension) of the data.
The SOM would compress these into a single output node that carries three weights. If we happen
to deal with a 20-dimensional dataset, the output node, in this case, would carry 20 weight
coordinates.
Each of these output nodes does not exactly become part of the input space, but tries to integrate into it nevertheless, finding an imaginary place for itself.

We have randomly initialized the values of the weights (close to 0 but not 0).

Step 2: Calculating the Best Matching Unit

The next step is to go through our dataset. For each of the rows in our dataset, we’ll try to find the
node closest to it.

Say we take row number 1, and we extract its value for each of the three columns we have. We’ll
then want to find which of our output nodes is closest to that row.

To determine the best matching unit, one method is to iterate through all the nodes and calculate
the Euclidean distance between each node’s weight vector and the current input vector. The node
with a weight vector closest to the input vector is tagged as the BMU.

The Euclidean distance is given as:

Dist = sqrt( Σ_i (X_i − W_i)² )

where X is the current input vector and W is the node’s weight vector.
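Finding the BMU with this distance can be sketched as follows; the example weight values are hypothetical, not the ones from the figures below.

```python
import numpy as np

def best_matching_unit(x, weights):
    """Return (index, distance) of the node whose weight vector is
    closest to the input vector x, using Euclidean distance."""
    dists = np.linalg.norm(weights - x, axis=1)
    j = int(np.argmin(dists))
    return j, float(dists[j])

# Hypothetical example: 9 output nodes, 3 weights each (values are
# illustrative, not the ones from the figures below).
rng = np.random.default_rng(0)
weights = rng.random((9, 3))
x = np.array([0.7, 0.6, 0.9])
bmu, dist = best_matching_unit(x, weights)
```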

Let’s calculate the Best Matching Unit using the distance formula.

For 1st Nodes:

For 2nd Nodes:

For 3rd Nodes:

We calculate all the remaining nodes in the same way, as you can see below.

As we can see, node number 3 is the closest with a distance of 0.4. We will call this node our
BMU (best-matching unit).

What happens next?

To understand this next part, we’ll need to use a larger SOM.

By now you should understand the difference between weights in the SOM context and the weights we were used to when dealing with supervised machine learning.

The red circle in the figure above represents this map’s BMU. Now, the new SOM will have to
update its weights so that it is even closer to our dataset’s first row. The reason we need this is
that our input nodes cannot be updated, whereas we have control over our output nodes.

In simple terms, our SOM is drawing closer to the data point by stretching the BMU towards it.
The end goal is to have our map as aligned with the dataset as we see in the image on the far right

Step 3: Calculating the size of the neighborhood around the BMU

This is where things start to get more interesting! In each iteration, after the BMU has been determined, the next step is to calculate which of the other nodes are within the BMU’s neighborhood. All these nodes will have their weight vectors altered in the next step. So how do we do that? Well, it’s not too difficult: first, you calculate what the radius of the neighborhood should be, and then it’s a simple application of good old Pythagoras to determine whether each node is within the radial distance or not.

The radius shrinks exponentially with each iteration:

σ(t) = σ0 exp(−t / λ), where t = 0, 1, 2, 3…

Here σ0 is the initial radius of the neighborhood and λ is a time constant.

Figure below shows how the neighborhood decreases over time after each iteration

Over time the neighborhood will shrink to the size of just one node… the BMU.

Now that we know the radius, it’s a simple matter to iterate through all the nodes in the lattice to determine whether they lie within the radius or not. If a node is found to be within the neighborhood, then its weight vector is adjusted as follows in Step 4.
How to set the radius value in the self-organizing map?
It depends on the range and scale of your input data. If you are mean-zero standardizing your
feature values, then try σ=4. If you are normalizing feature values to a range of [0, 1] then you can
still try σ=4, but a value of σ=1 might be better. Remember, you have to decrease the learning rate
α and the size of the neighborhood function with increasing iterations, as none of the metrics stay
constant throughout the iterations in SOM.
It also depends on how large your SOM is. If it’s a 10 by 10, then use for example σ=5.
Otherwise, if it’s a 100 by 100 map, use σ=50.
In unsupervised classification, σ is sometimes based on the Euclidean distance between the
centroid of the first and second closest clusters.
Step 4: Adjusting the Weights
Every node within the BMU’s neighborhood (including the BMU) has its weight vector adjusted
according to the following equation:
New Weight = Old Weight + Learning Rate × (Input Vector − Old Weight)

W(t+1) = W(t) + L(t) ( V(t) − W(t) )

where t represents the time-step and L is a small variable called the learning rate, which decreases with time. What this equation says is that the newly adjusted weight for the node is equal to the old weight (W), plus a fraction L of the difference between the input vector (V) and the old weight.
So, in our example, node 3 is the Best Matching Unit (as found in Step 2), with the following weights:

Learning rate = 0.5


So we update those weights according to the above equation.

For W3,1:
New Weight = Old Weight + Learning Rate (Input Vector1 − Old Weight)
New Weight = 0.39 + 0.5 (0.7 − 0.39)
New Weight = 0.545

For W3,2:
New Weight = Old Weight + Learning Rate (Input Vector2 − Old Weight)
New Weight = 0.42 + 0.5 (0.6 − 0.42)
New Weight = 0.51

For W3,3:
New Weight = Old Weight + Learning Rate (Input Vector3 − Old Weight)
New Weight = 0.45 + 0.5 (0.9 − 0.45)
New Weight = 0.675
Updated weights:

So in this way we update the weights.
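The three updates above can be checked with a few lines of code:

```python
# Reproducing the Step 4 arithmetic (learning rate L = 0.5,
# input vector V = (0.7, 0.6, 0.9), BMU weights from Step 2).
old_w = [0.39, 0.42, 0.45]
v     = [0.70, 0.60, 0.90]
L     = 0.5
new_w = [w + L * (x - w) for w, x in zip(old_w, v)]
print([round(w, 3) for w in new_w])   # -> [0.545, 0.51, 0.675]
```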


The decay of the learning rate is calculated each iteration using the following equation:

L(t) = L0 exp(−t / λ)

where L0 is the initial learning rate and λ is the same time constant used for the radius.

As training goes on, the neighborhood gradually shrinks. By the end of training, the neighborhood has shrunk to a single node, the BMU.

The influence rate shows the amount of influence a node’s distance from the BMU has on its learning. In the simplest form, the influence rate is equal to 1 for all the nodes close to the BMU and zero for the others, but a Gaussian function is common too. Finally, from a random distribution of weights and through many iterations, the SOM arrives at a map of stable zones. In the end, interpretation of the data must be done by a human, but the SOM is a great technique for revealing the invisible patterns in the data.
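The decay schedules discussed above can be sketched as follows; σ0, L0, and the time constant λ are free choices, not values fixed by the text.

```python
import math

# Illustrative decay schedules; sigma0, L0 and the time constant lam
# are free choices, not values fixed by the text.
def sigma(t, sigma0=4.0, lam=100.0):
    """Neighborhood radius, shrinking each iteration t."""
    return sigma0 * math.exp(-t / lam)

def learning_rate(t, L0=0.5, lam=100.0):
    """Learning rate, decaying on the same schedule."""
    return L0 * math.exp(-t / lam)

def influence(grid_dist2, t):
    """Gaussian influence of a node at squared grid distance
    grid_dist2 from the BMU, at iteration t."""
    s = sigma(t)
    return math.exp(-grid_dist2 / (2 * s * s))
```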
5. Practical Implementation of SOMs
Fraud Detection
According to a recent report published by Markets & Markets, the Fraud Detection and
Prevention Market is going to be worth USD 33.19 Billion by 2021. This is a huge industry and
the demand for advanced Deep Learning skills is only going to grow. That’s why we have
included this case study in this chapter.

The business challenge here is about detecting fraud in credit card applications. We will be creating a Deep Learning model for a bank, given a dataset that contains information on customers applying for an advanced credit card.

This is the data that customers provided when filling in the application form. Our task is to detect potential fraud within these applications. That means that, by the end of the challenge, we will come up with an explicit list of customers who potentially cheated on their applications.

Algorithm for training


Step 1 − Initialize the weights, the learning rate α, and the neighborhood topological scheme.
Step 2 − Continue steps 3−9 while the stopping condition is not true.
Step 3 − Continue steps 4−6 for every input vector x.
Step 4 − Calculate the square of the Euclidean distance for j = 1 to m:
D(j) = Σi (xi − wij)²,  for i = 1 to n
Step 5 − Obtain the winning unit J where D(j) is minimum.
Step 6 − Calculate the new weight of the winning unit (and its neighbors) by the following relation:
wij(new) = wij(old) + α [xi − wij(old)]
Step 7 − Update the learning rate α by the following relation:
α(t + 1) = 0.5 α(t)
Step 8 − Reduce the radius of the topological scheme.
Step 9 − Check the stopping condition for the network.
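The nine steps above can be sketched as a minimal NumPy training loop. The 1-D grid of m = 10 units, the random data, and the decay factors are illustrative assumptions, not part of the algorithm statement.

```python
import numpy as np

# Minimal sketch of the SOM training steps on a 1-D grid of m units.
rng = np.random.default_rng(0)
m, n_features = 10, 3
weights = rng.random((m, n_features))          # Step 1: initialize weights
alpha, radius = 0.5, 3.0                       # learning rate and radius

data = rng.random((50, n_features))
for epoch in range(20):                        # Step 2: until stopping condition
    for x in data:                             # Step 3: for every input vector
        d = ((weights - x) ** 2).sum(axis=1)   # Step 4: squared Euclidean distance
        j = int(np.argmin(d))                  # Step 5: winning unit J
        for k in range(m):                     # Step 6: update winner and grid
            if abs(k - j) <= radius:           #         neighbours within the radius
                weights[k] += alpha * (x - weights[k])
    alpha *= 0.5                               # Step 7: decay the learning rate
    radius = max(radius * 0.9, 0.0)            # Step 8: shrink the radius
# Step 9: stopping condition here is simply a fixed number of epochs
```

A 2-D grid works the same way; only the grid-distance test in Step 6 changes.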
SOM Algorithm
The Self-Organizing Map (SOM) algorithm is a type of artificial neural network that is used for
unsupervised learning. The SOM algorithm is designed to learn and organize complex patterns
and relationships in high-dimensional data sets.
The SOM algorithm works by creating a two-dimensional grid of nodes, each of which holds a
weight vector of the same dimensionality as the input data. The nodes are connected to each
other in a network, and each node's weight vector is randomly initialized.

During training, the SOM algorithm iteratively adjusts the weight vectors of the nodes based on
the input data. For each input data point, the SOM algorithm identifies the node whose weight
vector is closest to the input data point (the Best Matching Unit), and then updates the weight
vectors of that node and its neighbors, with the size of each update falling off with the
neighbor's grid distance from the winning node.
Over time, the SOM algorithm creates a map of the input data that is organized based on the
similarity of the input data points. Data points that are similar are grouped together in the same
region of the map, while dissimilar data points are separated.
The SOM algorithm has many applications, including image and signal processing, data
visualization, and clustering. It is particularly useful for exploratory data analysis, where the goal
is to identify patterns and relationships in the data without prior knowledge of the underlying
structure.

Properties of Feature Map

The properties of a feature map depend on the type of feature map being used, but in general, a
feature map should have the following properties:
Topological ordering: The neurons in the feature map should be arranged in a way that preserves
the topology of the input data. This means that similar input data should be mapped to nearby
neurons in the feature map.
Dimensionality reduction: The feature map should reduce the dimensionality of the input data,
while preserving the important features. This makes it easier to visualize and analyze high-
dimensional data.
Nonlinear mapping: The feature map should be capable of nonlinear mapping, which allows it to
capture complex relationships between the input data and the output neurons.
Robustness: The feature map should be robust to noise and outliers in the input data. This means
that small changes in the input data should not result in large changes in the output.
Generalization: The feature map should be capable of generalizing to new input data that was not
seen during training. This means that it should not overfit to the training data.
Computational efficiency: The feature map should be computationally efficient, meaning that it
can be trained and used with reasonable computational resources.
Different types of feature maps may prioritize some of these properties over others, depending
on the specific application. For example, a Self-Organizing Map (SOM) prioritizes topological
ordering and dimensionality reduction, while a Convolutional Neural Network (CNN) prioritizes
nonlinear mapping and robustness to local input variations.

Computer Simulations

Computer simulations refer to the use of computer programs to model or simulate the behavior
of real-world systems or processes. These simulations can be used for a wide range of purposes,
such as understanding complex phenomena, predicting future outcomes, optimizing system
performance, and testing hypotheses.
Computer simulations can be divided into different categories based on the level of abstraction
and the type of modeling used. Some common types of computer simulations include:
Physical simulations: These simulations model physical systems, such as the behavior of fluids,
the motion of objects, or the interactions of particles. Physical simulations can be used to study
the behavior of complex systems that are difficult or impossible to study experimentally.
Computational simulations: These simulations use mathematical models and algorithms to
simulate the behavior of systems. Computational simulations can be used to study a wide range
of phenomena, including weather patterns, financial markets, and biological systems.
Agent-based simulations: These simulations model the behavior of individual agents or entities,
such as people, animals, or cells. Agent-based simulations can be used to study the behavior of
complex systems that arise from the interactions of many individual entities.
Game simulations: These simulations model strategic interactions between individuals or groups,
such as in economic markets, political systems, or social networks. Game simulations can be
used to study the behavior of complex systems that arise from the interactions of self-interested
actors.
Computer simulations can be used to gain insights into complex systems that would be difficult
or impossible to study experimentally. They can also be used to optimize system performance,
design new technologies, and test hypotheses in a controlled environment. However, computer
simulations also have limitations, such as the need for accurate and reliable input data, the
complexity of modeling real-world systems, and the potential for model biases and errors.

Learning Vector Quantization

Learning Vector Quantization (LVQ) is a type of artificial neural network that is commonly used
for classification and pattern recognition tasks. LVQ is a supervised learning algorithm that
learns from a labeled dataset, where each data point is associated with a specific class or
category.

The basic idea behind LVQ is to create a set of prototypes, or reference vectors, that represent
the different classes in the dataset. During training, the algorithm adjusts the prototypes so that
they become better representatives of their respective classes. This is done by presenting the
algorithm with examples from the dataset, and then adjusting the prototypes based on how well
they match the examples.
The LVQ algorithm can be broken down into the following steps:
Initialization: The algorithm starts by randomly selecting a set of prototypes from the input data.
Training: The algorithm iteratively presents the training data to the prototypes and adjusts them
based on how well they match the input. During each iteration, the algorithm selects a random
data point from the training set and computes its distance to each of the prototypes. The
prototype that is closest to the input is known as the Best Matching Unit (BMU).
Prototype update: The algorithm updates the weights of the BMU so that it better represents
its class: the prototype is moved toward the input if its class label matches the input's label,
and away from the input if it does not. The amount of weight adjustment is determined by a
learning rate parameter that gradually decreases over time.
Repeat: The algorithm repeats steps 2 and 3 for a fixed number of iterations or until the
prototypes converge.
Once the prototypes have been trained, they can be used to classify new data points. During
classification, the algorithm computes the distance between the input and each prototype, and
assigns the input to the class of the prototype that is closest to it.
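The LVQ1 procedure described above can be sketched as follows; the toy two-class Gaussian data, the initial learning rate of 0.3, and the 0.95 decay factor are assumptions for illustration only.

```python
import numpy as np

# Sketch of LVQ1: one prototype per class, attracted to correctly
# matched inputs and repelled from mismatched ones.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Initialization: one prototype per class, taken from the data
protos = np.array([X[0].copy(), X[20].copy()])
proto_labels = np.array([0, 1])

alpha = 0.3
for epoch in range(30):
    for i in rng.permutation(len(X)):
        d = ((protos - X[i]) ** 2).sum(axis=1)
        bmu = int(np.argmin(d))                 # Best Matching Unit
        if proto_labels[bmu] == y[i]:
            protos[bmu] += alpha * (X[i] - protos[bmu])   # attract
        else:
            protos[bmu] -= alpha * (X[i] - protos[bmu])   # repel
    alpha *= 0.95                               # learning rate decays over time

def classify(x):
    """Assign x to the class of the nearest prototype."""
    return proto_labels[int(np.argmin(((protos - x) ** 2).sum(axis=1)))]

print(classify(np.array([0.1, 0.0])), classify(np.array([2.1, 1.9])))  # 0 1
```

Because the two clusters are well separated, each prototype settles near its own cluster and classification reduces to a nearest-prototype lookup.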
LVQ is a relatively simple and computationally efficient algorithm that can be used for a wide
range of classification tasks. It is particularly well-suited for datasets that have well-defined class
boundaries and where the number of classes is relatively small. However, LVQ may not be as
effective for datasets that have complex or overlapping class boundaries, or where the number of
classes is very large.

Adaptive Pattern Classification

Adaptive pattern classification is a type of machine learning technique that is used to classify
patterns or data points based on their features or attributes. It is a supervised learning technique
that requires a labeled dataset for training.
The adaptive aspect of this technique refers to the ability of the classifier to update its internal
model as new data is presented. This means that the classifier can learn from experience and
improve its accuracy over time.
The basic idea behind adaptive pattern classification is to use a set of training data to build a
model that can classify new data points. The model typically consists of a set of rules or decision
boundaries that are used to map input data to specific output classes. During training, the
algorithm adjusts the model parameters based on how well it performs on the training data.
There are many different algorithms that can be used for adaptive pattern classification,
including decision trees, support vector machines, neural networks, and k-nearest neighbor
classifiers. Each of these algorithms has its own strengths and weaknesses, and the choice of
algorithm depends on the specific requirements of the classification task.

One of the key advantages of adaptive pattern classification is its ability to learn from new data.
This means that the classifier can adapt to changes in the underlying data distribution and
improve its accuracy over time. This is particularly important in applications where the data
distribution may change over time, such as in online advertising or fraud detection.
Adaptive pattern classification can be used in a wide range of applications, including image and
speech recognition, natural language processing, and anomaly detection. It is an important tool
for solving many real-world classification problems, and its effectiveness depends on the quality
of the training data and the choice of algorithm.
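One simple way to make the adaptive idea above concrete is a nearest-centroid classifier whose class centroids are updated incrementally as new labelled points arrive; the class name and the tiny two-class stream below are illustrative assumptions, not a standard API.

```python
import numpy as np

# Sketch of an adaptive classifier: class centroids are refined with a
# running-mean update each time a new labelled example is observed.
class AdaptiveCentroid:
    def __init__(self, n_classes, n_features):
        self.centroids = np.zeros((n_classes, n_features))
        self.counts = np.zeros(n_classes)

    def update(self, x, label):
        # Incremental (running-mean) update: the model adapts online.
        self.counts[label] += 1
        self.centroids[label] += (x - self.centroids[label]) / self.counts[label]

    def predict(self, x):
        d = ((self.centroids - x) ** 2).sum(axis=1)
        return int(np.argmin(d))

clf = AdaptiveCentroid(n_classes=2, n_features=2)
for x, label in [([0.0, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 1.1], 1), ([0.9, 1.0], 1)]:
    clf.update(np.array(x), label)
print(clf.predict(np.array([0.1, 0.1])))   # 0
```

Because `update` can be called at any time, the decision boundary keeps tracking the data as its distribution drifts, which is exactly the adaptivity the section describes.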

