unit 2
Unsupervised learning
Unlike supervised learning, unsupervised learning provides no teacher: the machine is given
no labeled training examples. The machine must therefore find the hidden structure in
unlabeled data by itself.
For instance, suppose the machine is given a collection of images of dogs and cats that it
has never seen before.
The machine has no idea about the features of dogs and cats, so it cannot label the images
as ‘dogs’ and ‘cats’. But it can group them according to their similarities, patterns, and
differences, so the collection can still be split into two parts. The first may
contain all the pictures with dogs in them and the second may contain all the pictures with
cats in them. The machine learned nothing beforehand, which means there was no training data
and there were no examples.
It allows the model to work on its own to discover patterns and information that was
previously undetected. It mainly deals with unlabelled data.
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
This network is just like a single-layer feed-forward network with feedback connections
between the outputs. The connections between the outputs are inhibitory, which is
shown by dotted lines: the competing units suppress one another rather than support each other.
Basic Concept of Competitive Learning Rule
As said earlier, there is competition among the output nodes. The main concept is that,
during training, the output unit with the highest activation for a given input pattern
is declared the winner. This rule is also called winner-takes-all because only the winning
neuron is updated and the rest of the neurons are left unchanged.
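The winner-takes-all rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name winner_take_all_step, the learning rate lr, and the choice of moving the winner's weights a fraction of the way toward the input are assumptions made for the example.

```python
def winner_take_all_step(weights, x, lr=0.5):
    """One competitive-learning step; only the winning unit's weights change."""
    # Activation of each output unit: dot product of its weight vector with x.
    activations = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    # The unit with the highest activation is declared the winner.
    winner = max(range(len(weights)), key=lambda j: activations[j])
    # Assumed update: move the winner's weights a fraction lr toward the input.
    weights[winner] = [w_i + lr * (x_i - w_i)
                       for w_i, x_i in zip(weights[winner], x)]
    return winner


weights = [[1.0, 0.0], [0.0, 1.0]]   # two output units, two inputs
winner = winner_take_all_step(weights, [0.2, 0.8])
```

Here the second unit wins and its weights move toward the input, while the first unit's weights stay exactly as they were.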
In fixed-weight competitive networks, the weights remain fixed even during the training
process. The idea of competition is used among neurons to enhance the contrast in their
activation functions. Two such networks are the Maxnet and the Hamming network.
In most neural networks that use unsupervised learning, it is essential to compute
distances and perform comparisons.
Max Net
This is also a fixed-weight network, which serves as a subnet for selecting the node with
the highest input. All the nodes are fully interconnected, and the weights on all these
interconnections are symmetrical.
When a net is trained to classify the input signal into one of the output categories A, B, C,
D, E, J, or K, the net sometimes responds that the signal is both a C and a K, or both an
E and a K, or both a J and a K, due to similarities in these character pairs. In this case it
is better to include additional structure in the net to force it to make a definitive
decision. The mechanism by which this can be accomplished is called competition.
The most extreme form of competition among a group of neurons is called Winner-Take-All,
where only one neuron (the winner) in the group has a nonzero output
signal when the competition is completed. An example of this is the MAXNET.
Architecture
It uses an iterative process in which each node receives inhibitory
inputs from all other nodes through connections. The single node whose value is maximum
ends up active as the winner, and all other nodes become inactive.
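The iterative inhibition can be sketched as follows. This is a minimal sketch under stated assumptions: each node keeps its own activation with weight 1 and is inhibited by every other node with a small weight epsilon, and activations are clipped at zero. The function name maxnet, the default value of epsilon, and the max_iters safety bound are hypothetical choices for the example.

```python
def maxnet(activations, epsilon=None, max_iters=1000):
    """Iterate mutual inhibition until at most one node stays positive."""
    m = len(activations)
    if epsilon is None:
        epsilon = 1.0 / (2 * m)  # assumed default; should satisfy 0 < epsilon < 1/m
    a = list(activations)
    for _ in range(max_iters):
        # Stop when only one node (the winner) is still active.
        if sum(1 for v in a if v > 0) <= 1:
            break
        total = sum(a)
        # Each node is inhibited by the sum of all the other activations.
        a = [max(0.0, v - epsilon * (total - v)) for v in a]
    return a


result = maxnet([0.2, 0.4, 0.6, 0.8], epsilon=0.2)
```

In this example the first three activations are driven to zero and only the node that started with the largest value remains positive. If two nodes tie for the maximum, neither can win, which is why the loop carries an iteration limit.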
Hamming networks
In a Hamming network, every given input vector is
clustered into one of several groups. Following are some important features of Hamming
Networks −
Hamming Distance
The Hamming distance between two bipolar vectors x and y is the number of components in
which they differ. Let a be the number of components in which the vectors agree and d the
number in which they differ, so that d is the Hamming distance and x.y = a - d. Since the
total number of components is n, we have n = a + d,
i.e., d = n - a
On simplification, we get
x.y = a - (n - a)
x.y = 2a - n
2a = x.y + n
a = 1/2 x.y + 1/2 n
From the above equation, it is clear that the weights can be set to one-half the
exemplar vector and the bias can be set initially to n/2.
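The relation a = 1/2 x.y + 1/2 n can be checked numerically. The sketch below assumes bipolar (+1/-1) vectors; the function name hamming_net_scores and the sample exemplars are made up for illustration.

```python
def hamming_net_scores(exemplars, x):
    """Net input of each unit: the number of components in which x agrees
    with that unit's exemplar (weights = exemplar / 2, bias = n / 2)."""
    n = len(x)
    scores = []
    for e in exemplars:
        weights = [e_i / 2 for e_i in e]   # one-half the exemplar vector
        bias = n / 2
        scores.append(bias + sum(w_i * x_i for w_i, x_i in zip(weights, x)))
    return scores


exemplars = [[1, -1, -1, -1], [-1, -1, -1, 1]]
x = [1, 1, -1, -1]
scores = hamming_net_scores(exemplars, x)       # agreements a per exemplar
distances = [len(x) - a for a in scores]        # Hamming distance d = n - a
```

For this input the first exemplar agrees in 3 components and the second in 1, so the first unit has the larger net input and the smaller Hamming distance.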
Kohonen Self- Organizing Feature Map
The self-organizing map makes topologically ordered mappings between input data and
processing elements of the map. Topologically ordered implies that if two inputs have
similar characteristics, the most active processing elements responding to those inputs are
located close to each other on the map. The weight vectors of the processing elements are
organized in ascending or descending order: Wi < Wi+1 for all values of i, or Wi > Wi+1 for
all values of i (this definition is valid for a one-dimensional self-organizing map only).
It was introduced by the Finnish professor and researcher Dr. Teuvo Kohonen in 1982. The
self-organizing map is an unsupervised learning model proposed for applications in
which maintaining a topology between input and output spaces is important.
The entire learning process occurs without supervision because the nodes are
self-organizing. They are also known as feature maps, as they essentially retain the
features of the input data and group themselves according to the similarity
between each other. The map has practical value for visualizing complex or large quantities
of high-dimensional data and showing the relationships between them in a low-dimensional,
usually two-dimensional, field to check whether the given unlabeled data have any structure.
A Self-Organizing Map (SOM) differs from typical artificial neural networks (ANNs) both
in its architecture and algorithmic properties. Its structure consists of a single-layer linear
2D grid of neurons, rather than a series of layers. All the nodes on this lattice are connected
directly to the input vector, but not to each other. This means the nodes don't know the values
of their neighbors, and only update the weights of their connections as a function of the
given input. The grid itself is the map that organizes itself at each iteration as a function
of the input data. As such, after clustering, each node has its own coordinate (i, j), which
enables one to calculate the Euclidean distance between two nodes.
Algorithm:
Step:1
Step:2
Step:3
Step:4
Calculate the Euclidean distance between the weight vector w_ij and the input vector x(t)
connected with the first node, where t, i, j = 0.
Step:5
Track the node that generates the smallest distance t.
Step:6
Calculate the overall Best Matching Unit (BMU), that is, the node with the smallest
distance among all the calculated ones.
Step:7
Find the topological neighborhood βij(t) of the BMU and its radius σ(t) in the Kohonen map.
Step:8
Repeat for all nodes in the BMU neighborhood: update the weight vector w_ij of each
node in the neighborhood of the BMU by adding a fraction of the difference between the
input vector x(t) and the weight w(t) of the neuron:
w_ij(new) = w_ij(old) + alpha[x_i - w_ij(old)]
Step:9
Repeat the complete iteration until reaching the selected iteration limit t = n.
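The steps of the algorithm can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. Several details are assumptions that the notes do not fix: weights start at random values, one input vector is drawn at random per iteration, the learning rate alpha and the radius sigma both decay linearly, and the neighborhood is every node whose grid distance to the BMU is within sigma. The names train_som and find_bmu are hypothetical.

```python
import math
import random


def find_bmu(weights, x):
    """Return the grid coordinate (i, j) whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def train_som(data, rows, cols, n_iters=200, alpha0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2          # assumed initial neighborhood radius
    # Assumed initialization: every node (i, j) gets a random weight vector.
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    for t in range(n_iters):              # Step 9: stop at the iteration limit
        x = rng.choice(data)              # assumed: one random input per iteration
        bmu = find_bmu(weights, x)        # Steps 4-6: smallest Euclidean distance
        decay = 1.0 - t / n_iters         # assumed linear decay schedule
        alpha = alpha0 * decay
        sigma = sigma0 * decay            # Step 7: neighborhood radius
        for node in list(weights):        # Step 8: update the BMU neighborhood
            if math.dist(node, bmu) <= sigma:
                weights[node] = [w + alpha * (x_i - w)
                                 for w, x_i in zip(weights[node], x)]
    return weights
```

A call such as train_som(data, rows=3, cols=3) returns a dictionary from grid coordinates to weight vectors, and find_bmu then maps any new input to its best matching node.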
This topology has 18 nodes in the distance-2 grid, 12 nodes in the distance-1 grid, and 6
nodes in the distance-0 grid, which means each successive hexagonal grid differs by 6
nodes. The winning unit is indicated by #.
Learning Vector Quantization (LVQ) is a type of artificial neural network that is
also inspired by biological models of neural systems. It is a prototype-based supervised
classification algorithm and trains its network through a competitive learning
algorithm similar to the Self-Organizing Map. It can also deal with multiclass classification
problems. LVQ has two layers: the input layer and the output layer.
The architecture of LVQ is determined by the number of classes in the input
data and the number n of input features for any sample.
Suppose the input data has size (m, n), where m is the number of training examples and
n is the number of features in each example, with a label vector of size (m, 1). First, LVQ
initializes the weights of size (n, c) from the first c training samples with
different labels; these samples are then discarded from the training set. Here, c is the
number of classes. It then iterates over the remaining input data and, for each training
example, updates the winning vector (the weight vector with the shortest distance, e.g.
Euclidean distance, from the training example).
Here alpha is the learning rate at time t, j denotes the winning vector, i denotes the i-th
feature of the training example, and k denotes the k-th training example from the input data.
After training the LVQ network, the trained weights are used for classifying new examples. A
new example is labelled with the class of the winning vector.
Algorithm LVQ :
Step 3: Update the weights of the winning unit wi using the following conditions:
if T = J then wi(new) = wi(old) + α[x - wi(old)]
if T ≠ J then wi(new) = wi(old) - α[x - wi(old)]
Step 4: Check the stopping condition; if it is false, repeat the above steps.
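The update conditions of Step 3 can be illustrated with a short Python sketch. It assumes that T is the target class of the training example and J is the class of the winning unit, that the winner is chosen by Euclidean distance, and that the stopping condition is a fixed number of epochs. The names train_lvq and predict_lvq are hypothetical, and the initial prototypes are passed in by the caller.

```python
import math


def nearest(prototypes, x):
    """Index of the prototype (weight vector) closest to x."""
    return min(range(len(prototypes)), key=lambda k: math.dist(prototypes[k], x))


def train_lvq(samples, labels, prototypes, proto_labels, alpha=0.1, epochs=10):
    protos = [list(p) for p in prototypes]      # work on a copy
    for _ in range(epochs):                     # assumed stopping condition
        for x, target in zip(samples, labels):
            j = nearest(protos, x)              # winning unit
            # Same class: move toward x. Different class: move away from x.
            sign = 1.0 if proto_labels[j] == target else -1.0
            protos[j] = [w + sign * alpha * (x_i - w)
                         for w, x_i in zip(protos[j], x)]
    return protos


def predict_lvq(protos, proto_labels, x):
    """A new example gets the class of its winning vector."""
    return proto_labels[nearest(protos, x)]
```

For example, with prototypes initialized at one sample per class, train_lvq pulls each prototype toward the samples of its own class, and predict_lvq labels a new point by its nearest prototype.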
Counter propagation network
Counter propagation networks (CPNs) were proposed by Hecht-Nielsen in 1987. They are
multilayer networks based on combinations of the input, output, and clustering layers.
Applications of counter propagation nets include data compression, function approximation,
and pattern association. The counter propagation network is basically constructed from an
instar-outstar model. This model is a three-layer neural network that performs input-output
data mapping, producing an output vector y in response to an input vector x, on the basis of
competitive learning. The three layers in an instar-outstar model are the input layer, the
hidden (competitive) layer, and the output layer.
There are two stages in the training process of a counter propagation net. The
input vectors are clustered in the first stage. In the second stage of training, the weights
from the cluster layer units to the output units are tuned to obtain the desired response.
There are two types of counter propagation network: full CPN and forward-only CPN.
Full CPN
• The full CPN can produce a correct output even when it is given an input vector
that is partially incomplete or incorrect.
• In the first phase, the training vector pairs are used to form clusters using either the dot
product or the Euclidean distance.
• During the second phase, the weights are adjusted between the cluster units and output units.
• The model that connects the input layer to the hidden layer is called the instar model, and
the model that connects the hidden layer to the output layer is called the outstar model.
• The weights are updated in both the instar model (first phase) and the outstar model (second
phase).
• The network is fully interconnected.
• In the first phase, the active units are the units in the x-input, z-cluster, and y-input layers.
• The winning unit uses the standard Kohonen learning rule for its weight update.
• In the second phase, only the winning unit J remains active in the cluster layer.
• The weights from the winning cluster unit J to the output units are adjusted so that the
vector of activations of the units in the y-output layer, y*, is an approximation of the input
vector y, and x* is an approximation of the input vector x.
Training Algorithm
• x* - Approximation to vector x.
• y* - Approximation to vector y.
A simplified version of the full CPN is the forward-only CPN. The forward-only CPN uses only
the x vectors to form the clusters on the Kohonen units during phase I training. In the
forward-only CPN, the input vectors are first presented to the input units. The weights
between the input layer and cluster layer are trained first; then the weights between the
cluster layer and output layer are trained. This is a specific competitive network in which
the targets are known.
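The two training phases of the forward-only CPN can be sketched as below. This is a simplified sketch under assumptions: the cluster weights are initialized from the first few input vectors, the winner is chosen by Euclidean distance, both learning rates are constant, and each phase runs for a fixed number of epochs. The names train_forward_cpn and cpn_predict, and the parameters alpha and beta, are hypothetical.

```python
import math


def train_forward_cpn(xs, ys, n_clusters, alpha=0.5, beta=0.5, epochs=20):
    # Assumed initialization: cluster weights start at the first input vectors.
    v = [list(x) for x in xs[:n_clusters]]                 # input -> cluster
    w = [[0.0] * len(ys[0]) for _ in range(n_clusters)]    # cluster -> output

    def winner(x):
        return min(range(n_clusters), key=lambda j: math.dist(v[j], x))

    # Phase I: cluster the x vectors only (Kohonen-style update of the winner).
    for _ in range(epochs):
        for x in xs:
            j = winner(x)
            v[j] = [v_i + alpha * (x_i - v_i) for v_i, x_i in zip(v[j], x)]

    # Phase II: tune the winner's output weights toward the known target y.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            j = winner(x)
            w[j] = [w_i + beta * (y_i - w_i) for w_i, y_i in zip(w[j], y)]
    return v, w


def cpn_predict(v, w, x):
    """Output for x is the output-weight vector of its winning cluster unit."""
    j = min(range(len(v)), key=lambda k: math.dist(v[k], x))
    return w[j]
```

After training, cpn_predict returns the stored output of whichever cluster unit wins for the given input, so nearby inputs map to the same output vector.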
It consists of three layers: the input layer, cluster layer, and output layer. Its architecture
resembles the back-propagation network, but in the CPN there are interconnections between
the units in the cluster layer.
The activation function is similar to that of the full CPN.
Basics of Adaptive Resonance Theory (ART) Architecture
Adaptive resonance theory describes a type of neural network that is self-organizing and
competitive. It can be of both types: unsupervised (ART1, ART2, ART3, etc.) or
supervised (ARTMAP). Generally, the supervised algorithms are named with the suffix “MAP”.
The basic ART model is unsupervised in nature and consists of:
• F1 layer or the comparison field(where the inputs are processed)
• F2 layer or the recognition field (which consists of the clustering units)
• The Reset Module (that acts as a control mechanism)
The F1 layer accepts the inputs, performs some processing, and transfers the result to the
F2 layer unit that best matches it. There exist two sets of weighted
interconnections for controlling the degree of similarity between the units in the F1 and
F2 layers. The F2 layer is a competitive layer. The cluster unit with the largest net input
becomes the candidate to learn the input pattern first, and the rest of the F2 units are
ignored. The reset unit decides whether or not the cluster unit is allowed to learn the input
pattern, depending on how similar its top-down weight vector is to the input vector.
This is called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or new
information. Higher vigilance produces more detailed memories, lower vigilance produces
more general memories.
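The effect of the vigilance parameter can be seen in a heavily simplified ART1-style sketch for binary inputs. It is an illustration under assumptions rather than the full ART1 algorithm: candidates are ranked by overlap normalized by prototype size, the vigilance test compares the matched fraction of the input against the vigilance value, and a unit that passes learns by fast learning (its prototype becomes the component-wise AND with the input). The name art1_cluster and the constant 0.5 in the ranking are hypothetical choices.

```python
def art1_cluster(patterns, vigilance=0.7):
    """Cluster binary patterns; returns (assignments, prototypes)."""
    prototypes = []      # top-down weight vectors, one per F2 cluster unit
    assignments = []
    for x in patterns:
        norm_x = sum(x)

        def net_input(j):
            overlap = sum(p & x_i for p, x_i in zip(prototypes[j], x))
            return overlap / (0.5 + sum(prototypes[j]))   # assumed ranking rule

        # F2 competition: try candidates from the largest net input downwards.
        order = sorted(range(len(prototypes)), key=net_input, reverse=True)
        chosen = None
        for j in order:
            match = [p & x_i for p, x_i in zip(prototypes[j], x)]
            # Vigilance test: is the prototype similar enough to the input?
            if norm_x > 0 and sum(match) / norm_x >= vigilance:
                prototypes[j] = match      # fast learning
                chosen = j
                break
            # Otherwise the reset module rejects unit j and the next is tried.
        if chosen is None:                 # no unit passed: commit a new one
            prototypes.append(list(x))
            chosen = len(prototypes) - 1
        assignments.append(chosen)
    return assignments, prototypes


patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
coarse, _ = art1_cluster(patterns, vigilance=0.6)
fine, _ = art1_cluster(patterns, vigilance=0.9)
```

With these four patterns, the lower vigilance value groups them into two clusters and the higher value into three, matching the statement that higher vigilance produces more detailed memories.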
Generally, two types of learning exist: slow learning and fast learning. In fast learning,
the weight update during resonance occurs rapidly. It is used in ART1. In slow learning, the
weight change occurs slowly relative to the duration of the learning trial. It is used in ART2.
Application of ART:
ART stands for Adaptive Resonance Theory. ART neural networks, used for fast, stable
learning and prediction, have been applied in different areas. Applications include
target recognition, face recognition, medical diagnosis, signature verification, and mobile
robot control.
Target recognition:
A fuzzy ARTMAP neural network can be used for automatic classification of targets based
on their radar range profiles. Tests on synthetic data show that fuzzy ARTMAP can result in
substantial savings in memory requirements compared to k-nearest-neighbor (kNN)
classifiers. The use of multiwavelength profiles improves the performance of
both kinds of classifiers.
Medical diagnosis:
Signature verification:
Automatic signature verification is a well-known and active area of research with various
applications such as bank check confirmation, ATM access, etc. The network
is trained using ART1 with global features as the input vector, and the verification and
recognition phase uses a two-step process. In the first step, the input vector is matched
against the stored reference vector, which was used as the training set, and in the second
step, cluster formation takes place.
Mobile robot control:
Nowadays, we see a wide range of robotic devices. Their software part, called artificial
intelligence, is still a field of research. The human brain is an interesting subject as a
model for such an intelligent system. Inspired by the structure of the human brain, the
artificial neural network emerged. Similar to the brain, the artificial neural network contains
numerous simple computational units, neurons, that are interconnected to allow
the transfer of signals from neuron to neuron. Artificial neural networks are used to
solve different problems with good outcomes compared to other decision algorithms.
Limitations of Adaptive Resonance Theory
Some ART networks are inconsistent (like Fuzzy ART and ART1), as they depend upon the
order of the training data or upon the learning rate.
Special Networks
There are several different architectures for ANNs, each with their own strengths and
weaknesses. Some of the most common architectures include:
Feedforward Neural Networks: This is the simplest type of ANN architecture, where the
information flows in one direction from input to output. The layers are fully connected,
meaning each neuron in a layer is connected to all the neurons in the next layer.
Recurrent Neural Networks (RNNs): These networks have a “memory” component, where
information can flow in cycles through the network. This allows the network to process
sequences of data, such as time series or speech.
Convolutional Neural Networks (CNNs): These networks are designed to process data with
a grid-like topology, such as images. The layers consist of convolutional layers, which learn
to detect specific features in the data, and pooling layers, which reduce the spatial
dimensions of the data.
Autoencoders: These are neural networks that are used for unsupervised learning. They
consist of an encoder that maps the input data to a lower-dimensional representation and a
decoder that maps the representation back to the original data.
Generative Adversarial Networks (GANs): These are neural networks that are used for
generative modeling. They consist of two parts: a generator that learns to generate new data
samples, and a discriminator that learns to distinguish between real and generated data.
• Interconnections
• Activation functions
• Learning rules
Interconnections:
Interconnection can be defined as the way processing elements (neurons) in an ANN are
connected to each other. Hence, the arrangement of these processing elements and the
geometry of the interconnections are essential in an ANN.
These arrangements always have two layers that are common to all network architectures:
the input layer and the output layer, where the input layer buffers the input signal and the
output layer generates the output of the network. The third layer is the hidden layer, whose
neurons belong neither to the input layer nor to the output layer. These neurons are
hidden from the people who are interfacing with the system and act as a black box to them.
Adding hidden layers with neurons increases the system’s computational and processing
power, but training the system becomes more complex at the
same time. There exist five basic types of neuron connection architecture:
In this type of network, we have only two layers, the input layer and the output layer, but
the input layer does not count because no computation is performed in it. The output
layer is formed when different weights are applied to the input nodes and the cumulative
effect per node is taken. The neurons of the output layer then collectively compute the
output signals.
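The computation in such a two-layer arrangement can be sketched as a weighted sum per output node followed by an activation. The sketch assumes a simple step activation with a threshold; the function name single_layer_forward and the example weights are made up for illustration.

```python
def single_layer_forward(x, weights, biases, threshold=0.0):
    """Compute the output signals of a single-layer feed-forward network.

    weights[k] holds the weights from every input node to output node k.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        # Cumulative effect per output node: weighted sum of the inputs.
        net = b + sum(w * x_i for w, x_i in zip(w_row, x))
        # Assumed step activation: fire when the net input exceeds the threshold.
        outputs.append(1 if net > threshold else 0)
    return outputs


y = single_layer_forward([1, 0], weights=[[1, 1], [-1, -1]], biases=[0, 0.5])
```

The input layer only passes x through. All computation happens in the output layer, which is why the input layer is not counted.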
When outputs can be directed back as inputs to nodes in the same layer or a preceding layer,
the result is a feedback network. Recurrent networks are feedback networks with closed
loops. The above figure shows a single recurrent network having a single neuron with
feedback to itself.
4. Single-layer recurrent network
The above network is a single-layer network with a feedback connection in which the
processing element’s output can be directed back to itself or to another processing element
or both. A recurrent neural network is a class of artificial neural networks where
connections between nodes form a directed graph along a sequence. This allows it to
exhibit dynamic temporal behavior for a time sequence. Unlike feedforward neural
networks, RNNs can use their internal state (memory) to process sequences of inputs.
In this type of network, a processing element's output can be directed to processing elements
in the same layer and in the preceding layer, forming a multilayer recurrent network. Such
networks perform the same task for every element of a sequence, with the output depending on
the previous computations. Inputs are not needed at each time step. The main feature of a
Recurrent Neural Network is its hidden state, which captures some information about a
sequence.
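How the hidden state carries information from one step to the next can be shown with a small sketch. It assumes a single recurrent layer with a tanh activation and a hidden state that starts at zero; the function name rnn_forward and the weight names w_xh, w_hh, and b_h are hypothetical.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a simple recurrent layer over a sequence and return every hidden state."""
    h = [0.0] * len(b_h)                   # assumed initial hidden state
    states = []
    for x in inputs:
        # The same weights are applied at every step; the new state depends on
        # the current input and on the previous hidden state.
        h = [math.tanh(b_h[k]
                       + sum(w_xh[k][i] * x[i] for i in range(len(x)))
                       + sum(w_hh[k][j] * h[j] for j in range(len(h))))
             for k in range(len(b_h))]
        states.append(h)
    return states


states = rnn_forward([[1.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0])
```

In this example the second input is zero, yet the second hidden state is nonzero because it still depends on the first step through the feedback weight.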