Deep Learning: Prof. Naveen Ghorpade
An autoencoder learns to copy its input to its output. It has an internal (hidden)
layer that describes a code used to represent the input.
It consists of two main parts: an encoder that maps the input into the
code, and a decoder that maps the code to a reconstruction of the original
input.
Performing the copying task perfectly would simply duplicate the signal, which
is why autoencoders are usually restricted in ways that force them to
reconstruct the input only approximately, preserving the most relevant aspects
of the data in the copy.
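A minimal sketch of this encoder/decoder structure (PyTorch is an assumed choice of framework here, and the layer sizes are illustrative, not prescribed by the lecture):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_inputs=784, n_code=32):
        super().__init__()
        # Encoder: maps the input x to the internal code h.
        self.encoder = nn.Sequential(nn.Linear(n_inputs, n_code), nn.Sigmoid())
        # Decoder: maps the code h back to a reconstruction of x.
        self.decoder = nn.Sequential(nn.Linear(n_code, n_inputs), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code)
```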
Their most traditional application was dimensionality reduction or feature
learning, but more recently the autoencoder concept has become more widely
used for learning generative models of data.
Some of the most powerful AI systems of the 2010s involved sparse autoencoders.
Auto Encoders
We have a spatially continuous input space, in which our input vectors live.
The aim is to map from this to a low-dimensional, spatially discrete output
space, the topology of which is formed by arranging a set of neurons in a grid.
Our SOM provides such a nonlinear transformation, called a feature map.
The stages of the SOM algorithm can be summarised as follows:
1. Initialization – Choose random values for the initial weight vectors wj.
2. Sampling – Draw a sample training input vector x from the input space.
3. Matching – Find the winning neuron I(x) with weight vector closest to input
vector.
4. Updating – Apply the weight update equation Δwj = η(t) · Tj,I(x)(t) · (x - wj), which moves the winning neuron and its grid neighbours towards the input.
5. Continuation – keep returning to step 2 until the feature map stops changing.
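A compact NumPy sketch of these five stages. The grid size, learning rate and Gaussian neighbourhood width are illustrative assumptions; in practice the learning rate and neighbourhood usually shrink over time.

```python
import numpy as np

def train_som(data, grid=(10, 10), n_steps=5000, lr=0.5, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # 1. Initialization: random weight vectors w_j, one per grid cell.
    weights = rng.random((grid[0], grid[1], n_features))
    # Grid coordinates of each neuron (used for the neighbourhood function).
    coords = np.stack(np.meshgrid(np.arange(grid[0]), np.arange(grid[1]),
                                  indexing="ij"), axis=-1)
    for t in range(n_steps):
        # 2. Sampling: draw a training vector x from the input space.
        x = data[rng.integers(len(data))]
        # 3. Matching: the winning neuron I(x) has the closest weight vector.
        dists = np.linalg.norm(weights - x, axis=-1)
        winner = np.unravel_index(np.argmin(dists), grid)
        # 4. Updating: move each w_j towards x, weighted by a Gaussian
        #    neighbourhood T_{j,I(x)} centred on the winner.
        grid_dist2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
        T = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * T[..., None] * (x - weights)
        # 5. Continuation: loop back to step 2 (here, for a fixed number of steps).
    return weights
```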
Note on Biases
Regularized training of an autoencoder typically results in hidden unit biases that take on large
negative values.
The negative biases are a natural result of using a hidden layer whose responsibility is to both
represent the input data and act as a selection mechanism that ensures sparsity of the
representation.
These negative biases, however, impede the learning of data distributions whose intrinsic
dimensionality is high.
An activation function that decouples these two roles of the hidden layer allows
representations to be learned on data with very high intrinsic dimensionality, where
standard autoencoders typically fail.
Since the decoupled activation function acts like an implicit regularizer, the model can be trained
by minimizing the reconstruction error of training data, without requiring any additional
regularization.
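One way such a decoupled activation can be written is a thresholded-linear unit: a threshold handles selection (which units fire), while the transmitted value stays purely linear (the representation), so no large negative bias is needed to enforce sparsity. This is only a sketch in that spirit; the exact function used in the work summarized above may differ, and the threshold value below is an illustrative assumption.

```python
import torch

def thresholded_linear(pre_activation, threshold=1.0):
    # Selection: a unit is active only if its linear response exceeds the threshold.
    mask = (pre_activation > threshold).float()
    # Representation: active units transmit the unshifted linear value.
    return mask * pre_activation
```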
Training an Auto-Encoder
Step 1 - We start with an array where the rows (observations) correspond to the users and the columns (the
features) correspond to the movies. Each cell (u, i) contains the rating (from 1 to 5, or 0 if there is no rating) of
movie i by user u.
Step 2 - The first user goes into the network. The input vector x = (r1, r2, ..., rm) contains this user's ratings for
all the movies.
Step 3 - The input vector x is encoded into a vector z of lower dimension by a mapping function f (e.g. the
sigmoid function):
z = f(Wx + b), where W is the matrix of input weights and b the bias vector.
Step 4 - z is then decoded into the output vector y, of the same dimension as x, aiming to replicate the input
vector x.
Step 5 - The reconstruction error d(x, y) = ||x - y|| is computed. The goal is to minimize it.
Step 6 - Backpropagation. From right to left, the error is backpropagated. The weights are updated according
to how much they are responsible for the error and the learning rate decides how much we update the weights.
Step 7 - Repeat steps 1 to 6 and update the weights after each observation (online/stochastic learning), or repeat
steps 1 to 6 but update the weights only after a batch of observations (batch learning).
Step 8 - When the whole training set has passed through the ANN, that counts as one epoch. Repeat for more epochs.
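A condensed PyTorch sketch of steps 1 to 8 on a ratings matrix. The layer sizes, learning rate, number of epochs and the random stand-in data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# ratings: users x movies matrix, entries 1..5, 0 where there is no rating
# (random data here stands in for a real dataset).
n_users, n_movies = 100, 50
ratings = torch.randint(0, 6, (n_users, n_movies)).float()

model = nn.Sequential(
    nn.Linear(n_movies, 10), nn.Sigmoid(),   # encoder: x -> z = f(Wx + b)
    nn.Linear(10, n_movies),                 # decoder: z -> y, same size as x
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for epoch in range(20):                      # step 8: one pass over the data = one epoch
    for x in ratings:                        # step 2: one user (observation) at a time
        y = model(x)                         # steps 3-4: encode, then decode
        mask = x > 0                         # ignore movies the user has not rated
        loss = criterion(y[mask], x[mask])   # step 5: reconstruction error d(x, y)
        optimizer.zero_grad()
        loss.backward()                      # step 6: backpropagate the error
        optimizer.step()                     # step 7: online update after each observation
```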
Overcomplete hidden layers
Sparse autoencoders have more hidden nodes than input nodes, yet they can
still discover important features from the data.
A sparsity constraint is introduced on the hidden layer to prevent the output
layer from simply copying the input data.
Sparse autoencoders add a sparsity penalty, Ω(h), which pushes the hidden
activations towards values close to (but not exactly) zero. The sparsity penalty is
applied on the hidden layer in addition to the reconstruction error. This helps
prevent overfitting.
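A sketch of adding such a penalty to the training objective, using an L1 penalty on the hidden activations as one common choice of Ω(h); the penalty weight is an illustrative assumption.

```python
import torch

def sparse_loss(x, y, h, penalty_weight=1e-3):
    # Reconstruction error plus the sparsity penalty Omega(h) on the hidden layer.
    reconstruction = torch.mean((x - y) ** 2)
    omega = h.abs().mean()          # L1 penalty pushes hidden activations towards zero
    return reconstruction + penalty_weight * omega
```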
Sparse autoencoders can also keep only the highest activation values in the hidden
layer and zero out the rest of the hidden nodes. This prevents the autoencoder from
using all of the hidden nodes at once and forces only a reduced number of hidden
nodes to be used. Because hidden nodes are activated and deactivated for each row
in the dataset, each observation is encoded by a different subset of hidden nodes.
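A sketch of the "keep the highest activations, zero out the rest" idea as a top-k selection on the hidden layer (the value of k is an illustrative assumption):

```python
import torch

def top_k_sparsify(h, k=5):
    # Keep only the k largest activations per example and zero out the rest,
    # so each row of the dataset uses a different small subset of hidden nodes.
    values, indices = torch.topk(h, k, dim=-1)
    sparse_h = torch.zeros_like(h)
    return sparse_h.scatter(-1, indices, values)
```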
Sparse auto-encoders
An auto-encoder takes the input image or vector and learns a code dictionary
that changes the raw input from one representation to another.
A sparse autoencoder adds a sparsity enforcer that directs a single-layer
network to learn a code dictionary which minimizes the error in
reproducing the input while restricting the number of code words used for the
reconstruction.
The sparse autoencoder consists of a single hidden layer, which is connected to
the input vector by a weight matrix forming the encoding step.
The hidden layer then outputs a reconstruction vector, using a tied weight
matrix to form the decoder.
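A sketch of this single-hidden-layer, tied-weight construction in PyTorch (layer sizes and initialization are illustrative assumptions); the decoder reuses the transpose of the encoder's weight matrix W.

```python
import torch
import torch.nn as nn

class TiedSparseAutoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_hidden=1024):
        super().__init__()
        # One weight matrix W: used as-is for encoding, transposed for decoding.
        self.W = nn.Parameter(torch.randn(n_hidden, n_inputs) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(n_hidden))
        self.b_dec = nn.Parameter(torch.zeros(n_inputs))

    def forward(self, x):
        h = torch.sigmoid(x @ self.W.t() + self.b_enc)   # encoding step
        return h @ self.W + self.b_dec                   # decoder with tied weights
```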
Applications of Auto Encoders
Dimensionality Reduction
Image Compression
Image Denoising
Feature Extraction
Image generation
Sequence to sequence prediction
Recommendation system
Dimensionality Reduction
In autoencoders where the size of the hidden layer is smaller than the input layer,
we force the network to learn important features by reducing the hidden layer size.
Also, a network with too much capacity (deep and highly nonlinear) may not learn
anything useful, because it can simply copy the input through.
Dimension reduction methods are based on the assumption that the dimension of
the data is artificially inflated and that its intrinsic dimension is much lower.
As we increase the number of layers in an autoencoder, the size of the hidden
layer has to decrease. If the size of the hidden layer becomes smaller
than the intrinsic dimension of the data, the result is a loss of
information.
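A brief sketch of using a trained encoder for dimensionality reduction (sizes are illustrative assumptions; the bottleneck should not be smaller than the data's intrinsic dimension):

```python
import torch
import torch.nn as nn

# Stand-in data: n_samples x 64-dimensional inputs.
data = torch.randn(1000, 64)

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))
decoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 64))

# ... after training encoder/decoder to minimize reconstruction error ...
with torch.no_grad():
    reduced = encoder(data)   # each sample compressed to 2 features
print(reduced.shape)          # torch.Size([1000, 2])
```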
Image Compression