RBF and Unsupervised Learning
Unsupervised Learning
Radial Basis Function (RBF)
★ Introduction to RBF
★ Components and Structures of RBF
★ Information Processing of an RBF network
★ Training of an RBF network
★ Comparing RBF networks and Multilayer perceptron
What is a Radial Basis Function (RBF) network?
➔ Radial basis function networks (RBF networks) are a paradigm of neural
networks, which was developed considerably later than that of
perceptrons.
➔ Like perceptrons, the RBF networks are built in layers. But in this case,
they have exactly three layers, i.e. only one single layer of hidden
neurons.
➔ Like perceptrons, the networks have a feedforward structure and their
layers are completely linked. Here, the input layer again does not
participate in information processing. The RBF networks are - like MLPs -
universal function approximators.
Despite all things in common:
What is the difference between RBF networks and MLP?
● The difference lies in the information processing itself and in the
computational rules within the neurons outside of the input layer. So, in a
moment we will define a type of neuron that we have not encountered so far.
● An RBFN performs classification by measuring the input’s similarity to
examples from the training set. Each RBFN neuron stores a “prototype”,
which is just one of the examples from the training set. When we want to
classify a new input, each neuron computes the Euclidean distance between
the input and its prototype. Roughly speaking, if the input more closely
resembles the class A prototypes than the class B prototypes, it is classified as
class A.
Components and structure of an RBF network
RBF Network Architecture
It consists of an input vector, a layer of RBF neurons, and an output layer with one node per
category or class of data.
➔ The Input Vector
The input vector is the n-dimensional vector that we are trying to classify. The
entire input vector is shown to each of the RBF neurons.
➔ Hidden neurons
Hidden neurons are also called RBF neurons, and the layer in which they are
located is referred to as the RBF layer. Each RBF neuron stores a “prototype”
vector, which is just one of the vectors from the training set. Each RBF neuron
compares the input vector to its prototype, and outputs a value between 0 and 1
which is a measure of similarity. If the input is equal to the prototype, then the
output of that RBF neuron will be 1. As the distance between the input and
prototype grows, the response falls off exponentially towards 0.
➔ Output neurons:
◆ In an RBF network, the output neurons use only the identity as activation
function and a weighted sum as propagation function.
◆ The output of the network consists of a set of nodes, one per category that
we are trying to classify.
◆ Each output node computes a sort of score for the associated category.
◆ Typically, a classification decision is made by assigning the input to the
category with the highest score.
◆ The score is computed by taking a weighted sum of the activation values
from every RBF neuron. By weighted sum we mean that an output node
associates a weight value with each of the RBF neurons, and multiplies the
neuron’s activation by this weight before adding it to the total response.
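The scoring step described above can be sketched in a few lines. This is a minimal illustration, assuming the RBF activations have already been computed; the function names `output_scores` and `classify` and the example numbers are hypothetical, not taken from the slides.

```python
def output_scores(activations, weights):
    """Return one score per category.

    activations: outputs of the RBF neurons (one value per hidden neuron).
    weights: weights[c][j] is the weight from RBF neuron j to output node c.
    """
    # Each output node multiplies every activation by its weight and sums.
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


def classify(activations, weights):
    """Assign the input to the category with the highest score."""
    scores = output_scores(activations, weights)
    return max(range(len(scores)), key=scores.__getitem__)


# Three RBF neurons, two categories (illustrative values only).
activations = [0.9, 0.1, 0.2]
weights = [
    [1.0, 0.0, -0.5],   # output node for category 0
    [-0.5, 1.0, 1.0],   # output node for category 1
]
print(output_scores(activations, weights))
print(classify(activations, weights))  # category 0 has the higher score
```

Because the output nodes use the identity activation, the score is just the weighted sum itself; no further nonlinearity is applied.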
RBF Neuron Activation Function
◆ Each RBF neuron computes a measure of the similarity between the input
and its prototype vector (taken from the training set).
◆ Input vectors which are more similar to the prototype return a result closer to
1.
◆ There are different possible choices of similarity functions, but the most
popular is based on the Gaussian.
◆ Below is the equation for a Gaussian with a one-dimensional input:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where x is the input, mu is the mean, and sigma is the standard deviation. This
produces the familiar bell curve shown below, which is centered at the mean, mu (in
the below plot the mean is 5 and sigma is 1).
The RBF neuron activation function is slightly different, and is typically written as:
$$\phi(x) = e^{-\beta \lVert x - \mu \rVert^2}$$
In the Gaussian distribution, mu (µ) refers to the mean of the distribution. Here, it is
the prototype vector which is at the center of the bell curve.
For the activation function, phi (φ), we aren’t directly interested in the value of the
standard deviation, sigma (σ), so we make a couple of simplifying modifications.
➔ The first change is that we’ve removed the outer coefficient, 1 / (sigma * sqrt(2
* pi)). This term normally controls the height of the Gaussian. Here, though, it is
redundant with the weights applied by the output nodes. During training, the
output nodes will learn the correct coefficient or “weight” to apply to the
neuron’s response.
➔ The second change is that we’ve replaced the inner coefficient,
1 / (2 * sigma^2), with a single parameter ‘beta’. This beta coefficient controls
the width of the bell curve. Again, in this context, we don’t care about the value
of sigma, we just care that there’s some coefficient which is controlling the width
of the bell curve. So we simplify the equation by replacing the term with a single
variable.
It’s important to note that the underlying metric here for evaluating the similarity between an
input vector and a prototype is the Euclidean distance between the two vectors.
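Putting the two simplifications together, the activation amounts to phi(x) = exp(-beta * ||x - mu||^2), where ||x - mu|| is the Euclidean distance between the input and the prototype. A minimal sketch, assuming inputs and prototypes are plain Python sequences of numbers; the function name `rbf_activation` is hypothetical:

```python
import math


def rbf_activation(x, prototype, beta):
    """Gaussian RBF response of one neuron.

    x and prototype are equal-length sequences of numbers;
    beta controls the width of the bell curve.
    """
    # Squared Euclidean distance between the input and the prototype.
    sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, prototype))
    return math.exp(-beta * sq_dist)


print(rbf_activation([1.0, 2.0], [1.0, 2.0], beta=0.5))  # input equals prototype: 1.0
print(rbf_activation([1.0, 2.0], [2.0, 4.0], beta=0.5))  # farther away: smaller response
```

A larger beta narrows the bell curve, so the response drops faster as the input moves away from the prototype.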
➔ Also, each RBF neuron will produce its largest response when the input is equal
to the prototype vector. This allows us to take the response as a measure of
similarity, and to sum the results from all of the RBF neurons.
➔ As we move out from the prototype vector, the response falls off exponentially.
Recall from the RBFN architecture illustration that the output node for each
category takes the weighted sum of every RBF neuron in the network–in other
words, every neuron in the network will have some influence over the
classification decision.
➔ The exponential fall off of the activation function, however, means that the
neurons whose prototypes are far from the input vector will actually contribute
very little to the result.
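A small numeric illustration of this fall-off, assuming a Gaussian activation of the form exp(-beta * distance^2) with beta = 1 (the value of beta and the distances are chosen only for illustration):

```python
import math

beta = 1.0
distances = [0.0, 1.0, 2.0, 3.0]

# Activation of a neuron whose prototype lies at each distance from the input.
activations = [math.exp(-beta * d ** 2) for d in distances]

for d, a in zip(distances, activations):
    print(f"distance {d:.0f}: activation {a:.6f}")
# distance 0: activation 1.000000
# distance 1: activation 0.367879
# distance 2: activation 0.018316
# distance 3: activation 0.000123
```

At three units of distance the neuron's response is already about four orders of magnitude smaller than at the prototype, so its term in the weighted sum is negligible unless its output weight is very large.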
Training the RBF Network
➔ The training process for an RBFN consists of selecting three sets of parameters:
the prototypes (mu) and beta coefficient for each of the RBF neurons, and the
matrix of output weights between the RBF neurons and the output nodes.
★ Selecting the Prototypes
➢ It seems like there’s pretty much no “wrong” way to select the prototypes for
the RBF neurons. In fact, two possible approaches are to create an RBF
neuron for every training example, or to just randomly select k prototypes
from the training data.
➢ The reason the requirements are so loose is that, given enough RBF neurons,
an RBFN can define any arbitrarily complex decision boundary. In other
words, you can always improve its accuracy on the training set by using more
RBF neurons, at the cost of more computation and a higher risk of overfitting.
➢ One of the approaches for making an intelligent selection of prototypes is to
perform k-Means clustering on the training set and to use the cluster
centers as the prototypes.
➢ When applying k-means, we first want to separate the training examples by
category; we don’t want the clusters to include data points from multiple
classes.
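The per-class clustering idea can be sketched as follows. This is a rough illustration rather than a production implementation: the helper names (`kmeans`, `prototypes_per_class`) are hypothetical, the k-means routine is deliberately simplified (it starts from the first k points and runs a fixed number of iterations), and the toy data is made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iterations=20):
    """Simplified k-means; assumes len(points) >= k."""
    centers = [list(p) for p in points[:k]]  # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers


def prototypes_per_class(samples, labels, k):
    """Run k-means separately on each class so no cluster mixes classes."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: kmeans(pts, k) for y, pts in by_class.items()}


samples = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5], [5, 6], [-5, -5], [-5, -6]]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
protos = prototypes_per_class(samples, labels, k=2)
print(protos)
```

Each class ends up with its own k cluster centers, and each center becomes the prototype of one RBF neuron.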
★ Selecting Beta Values
➢ If you use k-means clustering to select your prototypes, then one simple
method for specifying the beta coefficients is to set sigma equal to the
average distance between all points in the cluster and the cluster center,
and then compute beta = 1 / (2 * sigma^2).
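As a small worked sketch of this rule: sigma is the mean distance from the cluster's points to its center, and beta follows from the relation beta = 1 / (2 * sigma^2) given for the inner coefficient of the Gaussian. The function name `beta_for_cluster` and the sample points are hypothetical.

```python
import math


def beta_for_cluster(points, center):
    """Compute beta for one RBF neuron from its k-means cluster.

    sigma is the average Euclidean distance between the cluster's points
    and the cluster center; beta = 1 / (2 * sigma^2).
    """
    sigma = sum(math.dist(p, center) for p in points) / len(points)
    if sigma == 0:
        # Assumption: a cluster whose points all coincide with the center
        # has no usable width, so the caller must choose beta another way.
        raise ValueError("cluster has zero spread; beta is undefined")
    return 1.0 / (2.0 * sigma ** 2)


# Four points, each exactly 1 unit away from the center (1, 0).
points = [[0, 0], [2, 0], [1, 1], [1, -1]]
print(beta_for_cluster(points, center=[1, 0]))  # sigma = 1, so beta = 0.5
```

Tight clusters give a small sigma and therefore a large beta, which makes the corresponding neuron respond only to inputs very close to its prototype.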
Introduction to Unsupervised Learning
● In unsupervised learning, we train a model to solve a problem without us knowing the correct answer. In
fact, unsupervised learning is typically used for problems where there isn't one correct answer, but instead,
better and worse solutions.
● In unsupervised, or self-organized, learning, there is no external teacher or critic to oversee the learning
process, rather, provision is made for a task-independent measure of the quality of representation that the
network is required to learn, and the free parameters of the network are optimized with respect to that
measure.
● To perform unsupervised learning, we may use a competitive-learning rule. For example, we may use a
neural network that consists of two layers—an input layer and a competitive layer.
● The input layer receives the available data. The competitive layer consists of neurons that compete with each
other (in accordance with a learning rule) for the “opportunity” to respond to features contained in the input
data.
● In its simplest form, the network operates in accordance with a “winner-takes-all” strategy. In such a
strategy, the neuron with the greatest total input “wins” the competition and turns on; all the other neurons in
the network then switch off.
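The winner-takes-all step can be sketched as below. This only illustrates how the winner is selected; it does not include a weight-update rule, since none is specified here. The function name `winner_takes_all` and the weights are made up for illustration.

```python
def winner_takes_all(inputs, weights):
    """Return a 0/1 output per competitive neuron; only the winner is 1.

    weights[j] holds the connection weights from the input layer to
    competitive neuron j.
    """
    # Total input of each competitive neuron (weighted sum of the inputs).
    totals = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    winner = max(range(len(totals)), key=totals.__getitem__)
    return [1 if j == winner else 0 for j in range(len(totals))]


weights = [
    [0.2, 0.9],
    [0.8, 0.1],
    [0.5, 0.5],
]
print(winner_takes_all([1.0, 0.0], weights))  # [0, 1, 0]
```

With the input [1.0, 0.0], the second neuron has the greatest total input (0.8), so it alone turns on.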
Knowledge Representation in Learning
Knowledge refers to stored information or models used by a person or machine to interpret, predict, and
appropriately respond to the outside world.
By the very nature of it, therefore, knowledge representation is goal directed. In real-world applications of “intelligent”
machines, it can be said that a good solution depends on a good representation of knowledge (Woods, 1986). So, it is with
neural networks.
A major task for a neural network is to learn a model of the world (environment) in which it is embedded, and to
maintain the model sufficiently consistently with the real world so as to achieve the specified goals of the application
of interest.
Knowledge of the world consists of two kinds of information:
1. The known world state, represented by facts about what is and what has been known; this form of knowledge
is referred to as prior information.
2. Observations (measurements) of the world, obtained by means of sensors designed to probe the
environment, in which the neural network is supposed to operate.
Ordinarily, these observations are inherently noisy, being subject to errors due to sensor noise and system
imperfections. In any event, the observations so obtained provide the pool of information, from which the examples
used to train the neural network are drawn.
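The examples can be labeled or unlabeled. In labeled examples (supervised learning), each example representing an
input signal is paired with a corresponding desired response, whereas unlabeled examples consist of
different realizations of the input signal all by itself.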
Unsupervised Learning Structure
Rules of Knowledge Representation in Learning
The subject of how knowledge is actually represented inside an artificial network is, however, very
complicated. Nevertheless, there are four rules for knowledge representation that are of a general
commonsense nature, as described next.
Rule 1. Similar inputs (i.e., patterns drawn) from similar classes should usually produce similar
representations inside the network, and should therefore be classified as belonging to the same class.
Rule 2. Items to be categorized as separate classes should be given widely different representations in
the network.
Rule 3. If a particular feature is important, then there should be a large number of neurons involved
in the representation of that item in the network.
Rule 4. Prior information and invariances should be built into the design of a neural network whenever
they are available, so as to simplify the network design by its not having to learn them.
Unsupervised learning tasks
Unsupervised learning tasks can be broadly divided into 3 categories:
● Association rule mining
● Recommendation system
● Clustering
Association Rule Mining
Suppose we have transactional data, for example records of products sold. We want to know whether there is any hidden
relationship between buyers and products, or between one product and another, that can be leveraged to increase sales.
Extracting these relationships is the core of Association Rule Mining.
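As a very rough illustration of a product-to-product relationship, the sketch below counts how often each pair of products appears in the same transaction. This is only a co-occurrence count, not a full association rule mining algorithm; the function name `pair_counts` and the baskets are made up.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of products."""
    counts = Counter()
    for basket in transactions:
        # Sort and deduplicate so ("bread", "butter") is always the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
counts = pair_counts(transactions)
print(counts.most_common(1))  # [(('bread', 'butter'), 3)]
```

Pairs that co-occur far more often than others, such as bread and butter here, are candidates for the kind of hidden relationship described above.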
Recommendation System
A Recommendation System is, in a sense, an extension of Association Rule Mining (ARM): it builds on the relationships
that ARM extracts. Recommendation Systems work on transactional data, be it financial transactions, e-commerce, or
grocery shop transactions.
Clustering
Clustering can be done on any data where we do not have the class or label information. We want to group the data such that
observations with similar properties belong to the same cluster/group: the inter-cluster distance should be maximum, while
the intra-cluster distance should be minimum.
Clustering is a form of unsupervised machine learning in which observations are grouped into clusters based on similarities
in their data values, or features.
This kind of machine learning is considered unsupervised because it does not make use of previously known label values to
train a model; in a clustering model, the label is the cluster to which the observation is assigned, based purely on its features.
Training a clustering model
There are multiple algorithms you can use for clustering. One of the most commonly used algorithms is K-Means clustering that, in its
simplest form, consists of the following steps:
1. The feature values are vectorized to define n-dimensional coordinates (where n is the number of features).
2. You decide how many clusters you want to use to group the inputs, and call this value k. For example, to create three
clusters, you would use a k value of 3. Then k points are plotted at random coordinates. These points will ultimately
be the center points for each cluster, so they're referred to as centroids.
3. Each data point is assigned to its nearest centroid.
4. Each centroid is moved to the center of the data points assigned to it, that is, to the mean of their coordinates.
5. After moving the centroid, the data points may now be closer to a different centroid, so the data points are reassigned
to clusters based on the new closest centroid.
6. The centroid movement and cluster reallocation steps are repeated until the clusters become stable or a pre-determined
maximum number of iterations is reached.
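The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the initial centroids are drawn uniformly within the range of the data, an empty cluster simply keeps its previous centroid, and the names `kmeans`, `max_iterations`, and `seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    dims = len(points[0])  # step 1: each point is an n-dimensional vector

    # Step 2: place k centroids at random coordinates within the data range.
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    centroids = [[rng.uniform(lows[d], highs[d]) for d in range(dims)]
                 for _ in range(k)]

    assignments = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop once the clusters are stable.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = [sum(p[d] for p in members) / len(members)
                                for d in range(dims)]
    return centroids, assignments


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

The result depends on the random starting centroids, which is why practical implementations usually run the algorithm several times and keep the best clustering.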
Type of Unsupervised learning
A Self-Organizing Map (SOM) is a type of artificial neural network that is also inspired by biological models of neural
systems from the 1970s. It follows an unsupervised learning approach and trains its network through a competitive
learning algorithm.
SOM is used for clustering and for mapping (dimensionality reduction): it maps multidimensional data onto a
lower-dimensional space, which reduces complex problems to a form that is easier to interpret.
SOM has two layers; one is the Input layer and the other one is the Output layer. The architecture of the Self
Organizing Map with two clusters and n input features of any sample is given below:
Suppose the input data has size (m, n), where m is the number of training examples and n is the number of features in
each example. First, the weights of size (n, C) are initialized, where C is the number of clusters. Then, iterating over the
input data, for each training example the winning vector (the weight vector with the shortest distance, e.g.,
Euclidean distance, from the training example) is updated. The weight updating rule is given by:
$$w_{ij}(t+1) = w_{ij}(t) + \alpha(t)\,\bigl(x_i^{k} - w_{ij}(t)\bigr)$$
where alpha is the learning rate at time t, j denotes the winning vector, i denotes the ith feature of the training example, and
k denotes the kth training example from the input data. After training the SOM network, the trained weights are used for
clustering new examples. A new example falls in the cluster of its winning vector.
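The training loop just described can be sketched as below. It is a simplified illustration: each weight vector is stored as one list per cluster, the learning rate is held constant instead of varying with time, only the winning vector is updated, and the initial weights are passed in by the caller. The names `winner` and `train_som` are hypothetical.

```python
def winner(weights, x):
    """Index of the weight vector closest (Euclidean distance) to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((xi - w) ** 2 for xi, w in zip(x, weights[j])),
    )


def train_som(data, weights, alpha=0.5, epochs=1):
    """Move the winning weight vector toward each training example."""
    weights = [list(w) for w in weights]  # work on a copy
    for _ in range(epochs):
        for x in data:
            j = winner(weights, x)
            # w_ij <- w_ij + alpha * (x_i - w_ij) for every feature i.
            weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]
    return weights


data = [[0.0, 0.0], [1.0, 1.0]]
initial = [[0.2, 0.2], [0.8, 0.8]]   # C = 2 clusters, n = 2 features
trained = train_som(data, initial, alpha=0.5, epochs=1)
print(trained)

# A new example falls in the cluster of its winning vector.
print(winner(trained, [0.1, 0.0]))  # 0
print(winner(trained, [0.9, 1.0]))  # 1
```

After one pass, each weight vector has moved halfway toward the example it won, which is the effect of alpha = 0.5 in the update rule.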
ART
Basic adaptive resonance theory (ART) networks use an unsupervised learning technique. The terms “adaptive” and
“resonance” suggest that these networks are open to new learning (i.e., adaptive) without discarding previous or old
information (i.e., resonance).
ART networks are known to solve the stability-plasticity dilemma: stability refers to their ability to retain what
they have learned, and plasticity refers to their flexibility to gain new information.
Because of this, ART networks are always able to learn new input patterns without forgetting the past.
An input is presented to the network and the algorithm checks whether it fits into one of the already stored
clusters. If it fits, the input is added to the cluster that matches it best; otherwise a new cluster is formed.
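The "fit or create" behaviour can be illustrated with a heavily simplified sketch. This is not an actual ART network: it replaces the matching test with a plain Euclidean distance threshold, and the names `assign_or_create` and `max_distance` are hypothetical choices made for this example.

```python
import math


def assign_or_create(clusters, x, max_distance):
    """Add x to the best-matching cluster, or start a new one.

    clusters is a list of dicts with a "center" and a list of "members".
    Returns the index of the cluster that x ended up in.
    """
    best, best_d = None, None
    for idx, cluster in enumerate(clusters):
        d = math.dist(x, cluster["center"])
        if best_d is None or d < best_d:
            best, best_d = idx, d

    if best is not None and best_d <= max_distance:
        # The input fits an existing cluster: add it and refresh the center.
        members = clusters[best]["members"]
        members.append(x)
        clusters[best]["center"] = [sum(col) / len(members) for col in zip(*members)]
        return best

    # Nothing matches well enough: form a new cluster around the input.
    clusters.append({"center": list(x), "members": [x]})
    return len(clusters) - 1


clusters = []
inputs = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.0, 5.5], [0.2, 0.1]]
labels = [assign_or_create(clusters, x, max_distance=1.0) for x in inputs]
print(labels)  # [0, 0, 1, 1, 0]
```

Existing clusters are never discarded when a new pattern arrives, which mirrors the idea of learning new inputs without forgetting old ones.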
Applications of Unsupervised Learning
▪ Grocery shop or e-commerce store/marketplace: Extract association rules from customers' transactional data and
recommend products for consumers to buy.
▪ Social media platform: Extract relationships between different users to suggest products or services. Recommend new people to
connect with.
▪ Services: Recommendations of travel services, houses to rent, or matchmaking services.
▪ Banking: Cluster customers based on their financial transactions. Cluster fraudulent transactions for fraud detection.
▪ Politics: Cluster voters' opinions about the chances of a win for a particular party.
▪ Data Visualization: With clustering and t-distributed Stochastic Neighbor Embedding (t-SNE), we can visualize high-dimensional
data. Also, this can be used for dimensionality reduction.
▪ Entertainment: Recommendations for movies and music, as Netflix and Amazon do.
▪ Image segmentation: Cluster image portions based on nearest pixel values.
▪ Content: Personalized newspapers, recommendations of Web pages, e-learning applications, and email filters.
▪ Structural discovery: With clustering, we can discover hidden structure in the data, for example by clustering Twitter data
for sentiment analysis.