Q Learning
Consider the image below. You can see a dog in a room that has to perform an
action, which is fetching. The dog is the agent; the room is the environment it
has to work in, and the action to be performed is fetching.
If the correct action is performed, we will reward the agent. If it performs the
wrong action, we will not give it any reward or give it a negative reward, like a
scolding.
Q-Learning is a model-free reinforcement learning algorithm that finds the best next
action, given the current state. While learning, the agent may explore by choosing
actions at random, but its aim is to maximize the total reward.
Model-free means that the agent does not build a model of the environment to predict
its expected response. Instead of relying on such predictions, it learns directly from
the rewards it receives through trial and error.
The Bellman Equation is used to determine the value of a particular state and
deduce how good it is to be in that state or to take an action from it. The optimal
state is the one with the highest value.
The equation is given below. It uses the current state and the reward associated
with that state, along with the maximum expected future reward and a discount rate,
which determines how important future rewards are relative to the current state, to
compute the new value for the agent's state-action pair. The learning rate determines
how fast or slowly the model learns.
Figure 6: Bellman Equation
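Written out, the update described above, with learning rate α, discount rate γ, and reward r, is commonly given as:

Q^{new}(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \big]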
While running our algorithm, we will come across various solutions and the
agent will take multiple paths. How do we find out the best among them? This
is done by tabulating our findings in a table called a Q-Table.
A Q-Table helps us to find the best action for each state in the environment. We
use the Bellman Equation at each state to get the expected future state and
reward and save it in a table to compare with other states.
Let us create a Q-Table for an agent that has to learn to run, fetch and sit on
command. The steps taken to construct a Q-Table are:
Step 1: Create an initial Q-Table with all values initialized to 0
When we initially start, the values of all states and rewards will be 0. Consider
the Q-Table shown below, which shows a dog simulator learning to perform
actions:
Step 2: Choose an action and perform it. Update values in the table
This is the starting point. We have performed no other action as of yet. Let us
say that we want the agent to sit initially, which it does. The table will change
to:
Step 3: Get the value of the reward and calculate the Q-value using the Bellman
Equation
For the action performed, we need to calculate the value of the actual reward
and the Q(S, A) value.
Step 4: Continue the same until the table is filled or an episode ends
The agent continues taking actions; for each action, the reward and Q-value
are calculated and the table is updated.
Figure 10: Final Q-Table at end of an episode
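As a concrete illustration, below is a minimal tabular sketch of the update loop described above; the state names, action names and parameter values are illustrative assumptions.

import numpy as np

# Hypothetical states and actions for the dog simulator.
states = ["start", "command_given", "task_done"]
actions = ["run", "fetch", "sit"]

q_table = np.zeros((len(states), len(actions)))  # Step 1: all values start at 0
alpha, gamma = 0.1, 0.9                          # learning rate and discount rate

def update_q(state, action, reward, next_state):
    # Bellman update: move Q(S, A) toward the reward plus the
    # discounted best value achievable from the next state.
    best_next = np.max(q_table[next_state])
    q_table[state, action] += alpha * (reward + gamma * best_next - q_table[state, action])

# e.g. the agent sits (action 2) in the starting state (state 0) and is rewarded.
update_q(state=0, action=2, reward=1, next_state=1)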
One of the most popular policy gradient methods is the REINFORCE algorithm,
which uses the likelihood ratio trick to estimate the gradient of the expected
cumulative reward. Another popular method is the actor-critic algorithm, which
combines a policy network (the actor) with a value function network (the critic)
to estimate the gradient more efficiently.
More recently, researchers have proposed advanced policy gradient methods that
incorporate techniques such as trust region optimization, natural gradient, and
importance sampling to improve the stability and convergence of the algorithms.
Overall, policy gradient methods have proven to be effective in solving complex
reinforcement learning problems, and they continue to be an active area of
research in deep learning.
Here are some popular policy gradient methods in deep reinforcement learning:
REINFORCE: This is a simple but effective policy gradient algorithm that uses
Monte Carlo estimation to estimate the expected reward for each action taken in
a given state, and then uses these estimates to update the policy parameters in the
direction of the estimated gradient.
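In symbols, the score-function (likelihood ratio) estimate that REINFORCE uses for the gradient of the expected return J(θ) can be written as:

\nabla_\theta J(\theta) \approx \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t, \qquad G_t = \sum_{k \ge t} \gamma^{\,k-t}\, r_k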
The actor network learns the policy, which is a mapping from states to actions.
The policy can be either deterministic or stochastic. The actor network is
typically implemented as a neural network that takes the current state as input
and outputs the actions to be taken.
The critic network, on the other hand, learns an estimate of the value function,
which measures how good a state is in terms of expected future rewards. The
value function is used to evaluate the quality of the actions taken by the actor
network, and to provide feedback to the actor network to improve its
performance.
During training, the actor network takes actions in the environment and receives
feedback in the form of rewards and the next state. The critic network then
evaluates the quality of the actions taken by the actor network and provides
feedback to improve the actor's performance. The feedback signal can be in the
form of the advantage function, which measures how much better or worse an
action is compared to the average action.
The actor and critic networks are trained using a combination of policy gradient
and value-based methods. The policy gradient method is used to update the
actor network to maximize the expected future rewards, while the value-based
method is used to update the critic network to minimize the difference between
the predicted and actual values.
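The sketch below shows one way such an update can look in practice; it is a minimal single-step PyTorch example with discrete actions, and the network sizes and hyperparameters are illustrative assumptions rather than a full training loop.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # Maps a state to a distribution over actions (the policy).
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

class Critic(nn.Module):
    # Maps a state to an estimate of its value.
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, state):
        return self.net(state).squeeze(-1)

def actor_critic_step(actor, critic, actor_opt, critic_opt,
                      state, action, reward, next_state, gamma=0.99):
    # state/next_state: float tensors; action: integer tensor; reward: float.
    value = critic(state)
    with torch.no_grad():
        target = reward + gamma * critic(next_state)      # TD target
    advantage = (target - value).detach()                 # how much better than expected
    critic_loss = (target - value).pow(2).mean()          # value-based update for the critic
    actor_loss = -(actor(state).log_prob(action) * advantage).mean()  # policy gradient for the actor
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()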
Batch vs online learning: The actor-critic algorithm can be used for both batch
learning (where the agent learns from a fixed dataset of experiences) and online
learning (where the agent learns from experience as it interacts with the
environment). Online learning is generally more efficient, as it allows the agent
to adapt to changes in the environment and learn from new experiences.
Convergence and stability: The actor-critic algorithm can suffer from issues of
convergence and stability, especially when dealing with high-dimensional state
and action spaces. To address these issues, various modifications have been
proposed, such as using trust region optimization, clipped surrogate objectives,
or target networks.
Anyone who needs the original data can reconstruct it from the compressed data
using the decoder part of an autoencoder.
The encoder part of the network is used for encoding and sometimes even for
data compression purposes although it is not very effective as compared to other
general compression techniques like JPEG. Encoding is achieved by the
encoder part of the network which has a decreasing number of hidden units in
each layer. Thus this part is forced to pick up only the most significant and
representative features of the data. The second half of the network performs
the Decoding function. This part has an increasing number of hidden units in
each layer and thus tries to reconstruct the original input from the encoded data.
Thus Auto-encoders are an unsupervised learning technique.
Example: See the code below. In an autoencoder, the training data is fitted to itself;
that is why, instead of fitting X_train to Y_train, we use X_train in both
places.
Python3
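A minimal sketch along these lines is given below; it assumes Keras, flattened 784-dimensional inputs, and preloaded X_train and X_test arrays, all of which are illustrative choices.

from tensorflow.keras import layers, models

# Build a simple dense autoencoder (layer sizes are illustrative).
inputs = layers.Input(shape=(784,))
encoded = layers.Dense(128, activation='relu')(inputs)
code = layers.Dense(32, activation='relu')(encoded)        # compressed representation
decoded = layers.Dense(128, activation='relu')(code)
outputs = layers.Dense(784, activation='sigmoid')(decoded)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')

# X_train is used as both the input and the target: the data is fitted to itself.
autoencoder.fit(X_train, X_train, epochs=10, batch_size=256,
                validation_data=(X_test, X_test))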
Step 1: Encoding the input data. The Auto-encoder first encodes the input data
using the initialized weights and biases.
Step 2: Decoding the input data. The Auto-encoder tries to reconstruct the
original input from the encoded data to test the reliability of the encoding.
Step 3: Backpropagating the error. After the reconstruction, the loss function is
computed to determine the reliability of the encoding. The error generated is
backpropagated.
The above-described training process is reiterated several times until an
acceptable level of reconstruction is reached.
After the training process, only the encoder part of the Auto-encoder is retained
to encode a similar type of data used in the training process. The different ways
to constrain the network are:-
Keep small Hidden Layers: If the size of each hidden layer is
kept as small as possible, then the network will be forced to pick up
only the representative features of the data thus encoding the data.
Regularization: In this method, a loss term is added to the cost
function which encourages the network to train in ways other than
copying the input (a short sketch follows this list).
Denoising: Another way of constraining the network is to add
noise to the input and teach the network how to remove the noise from
the data.
Tuning the Activation Functions: This method involves changing the activation
functions of various nodes so that a majority of the nodes are dormant, thus
effectively reducing the size of the hidden layers.
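As a small illustration of the regularization idea, a Keras layer can carry an L1 activity penalty that is added to the cost function; the layer size and penalty weight below are illustrative assumptions.

from tensorflow.keras import layers, regularizers

# A small code layer whose activations are penalised, discouraging the
# network from simply copying its input through to the output.
code = layers.Dense(32, activation='relu',
                    activity_regularizer=regularizers.l1(1e-5))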
The different variations of Auto-encoders are:-
Denoising Auto-encoder: This type of auto-encoder works on a
partially corrupted input and trains to recover the original undistorted
image. As mentioned above, this method is an effective way to constrain
the network from simply copying the input.
Sparse Auto-encoder: This type of auto-encoder typically contains
more hidden units than the input but only a few are allowed to be
active at once. This property is called the sparsity of the network. The
sparsity of the network can be controlled by either manually zeroing
the required hidden units, tuning the activation functions or by adding
a loss term to the cost function.
Variational Auto-encoder: This type of auto-encoder makes strong assumptions
about the distribution of latent variables and uses the Stochastic Gradient
Variational Bayes estimator in the training process. It assumes that the data is
generated by a directed graphical model p_θ(x|z) and that the encoder learns an
approximation q_φ(z|x) to the posterior distribution p_θ(z|x), where φ and θ are
the parameters of the encoder and the decoder respectively.
Below is the basic intuition of how to build the autoencoder model and fit
X_train to itself. The model has three parts (a code sketch follows their description):
1. Encoder
2. Code
3. Decoder
The Encoder layer compresses the input image into a latent space
representation. It encodes the input image as a compressed representation in a
reduced dimension.
The Code layer represents the compressed input fed to the decoder layer.
The decoder layer decodes the encoded image back to the original dimension.
The decoded image is reconstructed from the latent space representation and is
a lossy reconstruction of the original image.
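A small functional-API sketch of these three parts is given below; the 784 and 32 dimensions are illustrative assumptions, and the stand-alone encoder model is kept so it can be reused later for feature extraction.

from tensorflow.keras import layers, models

input_img = layers.Input(shape=(784,))                           # 1. Encoder input
code = layers.Dense(32, activation='relu')(input_img)            # 2. Code: compressed latent representation
reconstruction = layers.Dense(784, activation='sigmoid')(code)   # 3. Decoder output

autoencoder = models.Model(input_img, reconstruction)  # full encoder -> code -> decoder model
encoder = models.Model(input_img, code)                # encoder kept separately for later use

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(X_train, X_train, epochs=10, batch_size=256, shuffle=True)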
Convolutional autoencoding:
VARIATIONAL AUTOENCODING:
The main idea behind VAEs is to learn a probabilistic model of the data, where
the lower-dimensional representation is treated as a random variable with a
known probability distribution. The encoder network maps the input data to the
parameters of this probability distribution, while the decoder network samples
from the distribution to generate new data points.
The encoder network consists of one or more layers of neural networks that map
the input data to the mean and standard deviation of the probability distribution.
The standard deviation is used to ensure that the encoded data has some
randomness and variability.
During training, the VAE learns to optimize the parameters of the encoder and
decoder networks to minimize the difference between the input data and the
reconstructed data. The VAE also learns to minimize the difference between the
probability distribution defined by the encoder network and a known prior
probability distribution.
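Put as a single loss, the two terms described above (reconstruction error and divergence from the prior) are usually written as the negative evidence lower bound:

\mathcal{L}(\theta, \phi; x) = -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] + \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)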
The VAE is trained with stochastic gradient descent using a technique called the
reparameterization trick, which makes the sampling step differentiable. The
reparameterization trick involves sampling from a standard normal distribution and
then transforming the sample using the mean and standard deviation output by the
encoder network.
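In symbols, the trick samples ε from a standard normal distribution and forms the latent code as:

z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)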
The VAE has many applications in deep learning, including image and video
generation, anomaly detection, and dimensionality reduction. It has also been
used in combination with other deep learning techniques, such as generative
adversarial networks (GANs) and adversarial autoencoders, to improve its
performance and stability.
GLOSSARY:
1. Components: a generator network, which produces synthetic data, and a
discriminator network, which tries to tell real data from the generated data.
2. Training: the two networks are trained in an adversarial setting; the generator
tries to fool the discriminator while the discriminator learns to distinguish real
samples from fake ones.
3. Applications:
Image synthesis
Text-to-Image synthesis
Image-to-Image translation
Anomaly detection
Data augmentation
4. Limitations:
Training can be unstable and prone to mode collapse, where the
generator produces limited variations of synthetic data.
GANs can be difficult to train and require a lot of computational
resources.
GANs can generate unrealistic or irrelevant synthetic data if the
generator and discriminator are not properly trained.
Generative Adversarial Networks (GANs) are a powerful class of neural
networks that are used for unsupervised learning. They were developed and
introduced by Ian J. Goodfellow in 2014. GANs are basically made up of two
competing neural network models which are able to analyze, capture and copy
the variations within a dataset. Why were GANs developed in the first place? It
has been noticed that most mainstream neural nets can be easily fooled into
misclassifying things by adding only a small amount of noise to the original data.
Surprisingly, the model after adding noise often has higher confidence in the
wrong prediction than it had when it predicted correctly. The reason for this
adversarial weakness is that most machine learning models learn from a limited
amount of data, which is a huge drawback, as it makes them prone to overfitting.
Also, the mapping between the input and the output is almost linear. Although it
may seem that the boundaries of separation between the various classes are
linear, in reality they are composed of linear pieces, and even a small change in
a point in the feature space might lead to misclassification of data. How do
GANs work? Generative Adversarial Networks (GANs) can be broken down
into three parts:
Generative: To learn a generative model, which describes how
data is generated in terms of a probabilistic model.
Adversarial: The training of a model is done in an adversarial
setting.
Networks: Use deep neural networks as the artificial intelligence
(AI) algorithms for training purposes.
In GANs, there is a generator and a discriminator. The Generator generates fake
samples of data(be it an image, audio, etc.) and tries to fool the Discriminator.
The Discriminator, on the other hand, tries to distinguish between the real and
fake samples. The Generator and the Discriminator are both Neural Networks
and they both run in competition with each other in the training phase. The steps
are repeated several times, and with each repetition the Generator and the
Discriminator get better and better at their respective jobs. The working can
be visualized by the diagram given below:
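This competition is usually summarised by the minimax objective from the original GAN paper, where G is the generator and D is the discriminator:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]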
Once an autoencoder is trained, the encoder can be used to extract features from
new input data points. The lower-dimensional representation generated by the
encoder can be used as a new set of features for downstream machine learning
tasks, such as classification or clustering.
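For instance, assuming a stand-alone encoder model like the one sketched earlier has been trained, and a hypothetical array X_new of new data points, the extracted features are simply the encoder's outputs:

features = encoder.predict(X_new)   # lower-dimensional representation of new data
# 'features' can now be fed to a classifier or a clustering algorithm.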
One advantage of using autoencoders for feature extraction is that they can learn
nonlinear relationships in the data, which traditional linear methods may not be
able to capture. Autoencoders can also be used to remove noise or redundancy
from the data, which can improve the performance of downstream machine
learning tasks.
Transfer learning: Autoencoders can be used for transfer learning, which is the
process of reusing learned features from one task to another. By pretraining an
autoencoder on a large dataset, the learned features can be used to initialize the
weights of a neural network for a new task, which can improve the performance
of the network.
Autoencoders are Neural Networks which are commonly used for feature
selection and extraction. However, when there are more nodes in the hidden
layer than there are inputs, the network risks learning the so-called "Identity
Function", also called the "Null Function", meaning that the output equals the
input, making the Autoencoder useless.
When calculating the Loss function, it is important to compare the output values
with the original input, not with the corrupted input. That way, the risk of
learning the identity function instead of extracting features is eliminated.
Denoising Autoencoders are thus an important and crucial tool for feature
selection and extraction.
The structure of a DAE
Note that a DAE is customised to its training data: given that we train a DAE on a
specific set of data, it will be optimised to remove noise from similar data. For
example, if we train it to remove noise from a collection of images, it will work
well on similar images but will not be suitable for cleaning text data.
Unlike Undercomplete AE, we may use the same or higher number of neurons
within the hidden layer, making the DAE overcomplete.
The second difference comes from not using identical inputs and outputs.
Instead, the outputs are the original data (e.g., images), while the inputs contain
data with some added noise.
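A minimal sketch of this setup, assuming an autoencoder model like the one built earlier and an X_train array scaled to the range [0, 1], looks as follows:

import numpy as np

# Corrupt the inputs with Gaussian noise; the targets stay clean.
noise = np.random.normal(loc=0.0, scale=0.1, size=X_train.shape)
X_train_noisy = np.clip(X_train + noise, 0.0, 1.0)

# Inputs are the noisy images, targets are the original images, so the loss
# compares each reconstruction with the clean original, not the corrupted input.
autoencoder.fit(X_train_noisy, X_train, epochs=10, batch_size=256)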
SPARSE AUTOENCODER: