Generative_Models

Generative models

Uploaded by

g4gowthamkumar
Copyright
© © All Rights Reserved

GENERATIVE MODELS

A BRANCH OF UNSUPERVISED LEARNING
TECHNIQUES IN MACHINE LEARNING
AUTOENCODERS
• Autoencoders are neural networks that learn a compressed
representation of the input in order to later reconstruct it, so
they can be used for dimensionality reduction.
• Autoencoders, like algorithms such as PCA, can be used to deal
with the curse of dimensionality.
• An autoencoder, by design, reduces data dimensions by learning
how to ignore the noise in the data.
• An autoencoder accepts input, compresses it, and then
recreates the original input.
• This is an unsupervised technique because all you need is the
original data, without any labels of known, correct results.
• The two main uses of an autoencoder are to compress data
to two (or three) dimensions so it can be graphed, and to
compress and decompress images or documents, which
removes noise in the data.
AUTOENCODERS

An autoencoder neural network is an unsupervised machine
learning algorithm that applies backpropagation, setting the
target values to be equal to the inputs.

Autoencoders are used to reduce the size of our inputs into a
smaller representation. If anyone needs the original data, they
can reconstruct it (approximately) from the compressed data.
ARCHITECTURE OF AUTOENCODERS

An autoencoder consists of
three components:
1. Encoder
2. Code
3. Decoder
AUTOENCODER COMPONENTS:
An autoencoder consists of 4 main parts:
1- Encoder: the part in which the model learns how to reduce the
input dimensions and compress the input data into an encoded
representation.
2- Bottleneck: the layer that contains the compressed
representation of the input data. This is the lowest-dimensional
representation of the input data.
3- Decoder: the part in which the model learns how to reconstruct
the data from the encoded representation so that it is as close to
the original input as possible.
4- Reconstruction Loss: the method that measures how well the
decoder is performing and how close the output is to the original
input.
• Here is an example of the input/output image from the MNIST
dataset to an autoencoder.
EXAMPLE
• Given an input image X, the encoder compresses the data (also
called dimension reduction): it selects the most informative
features (colour, size, shades, shape, etc.) and stores a highly
compressed representation in a space called the bottleneck or
latent space. This is the encoding process.
• The decoder then takes the latent vector from the bottleneck
and produces the output image X’.
• With Loss = L(X, X’), we train the model to minimize the
reconstruction loss. Because the network learns this encoding
and decoding automatically, it is known as an autoencoder.
• The model tries to produce output that is close to the original
data.
• This is very useful for compressing and denoising data.
• Autoencoders help us store high-volume data compactly and
also help with dimension reduction.
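The encode, decode, and minimize-the-reconstruction-loss cycle described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the deck's own implementation: it assumes a purely linear encoder and decoder, mean squared error as L(X, X’), and synthetic 4-dimensional data; the names W_enc, W_dec and reconstruction_loss are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 200 points in 4-D that lie
# close to a 2-D subspace, so a 2-unit bottleneck can represent them well.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 4))

# Encoder and decoder are single linear layers; the 2-unit code is the bottleneck.
W_enc = 0.1 * rng.normal(size=(4, 2))
W_dec = 0.1 * rng.normal(size=(2, 4))


def reconstruction_loss(X, X_hat):
    """Mean squared error between the input X and its reconstruction X'."""
    return np.mean((X - X_hat) ** 2)


initial_loss = reconstruction_loss(X, (X @ W_enc) @ W_dec)

lr = 0.05
for step in range(2000):
    code = X @ W_enc        # encoding: 4-D input -> 2-D code
    X_hat = code @ W_dec    # decoding: 2-D code -> 4-D reconstruction
    err = X_hat - X         # the target is the input itself
    # Gradients of the mean squared error with respect to both weight matrices.
    grad_dec = code.T @ err * (2.0 / X.size)
    grad_enc = X.T @ (err @ W_dec.T) * (2.0 / X.size)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final_loss = reconstruction_loss(X, (X @ W_enc) @ W_dec)
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

A real autoencoder would add non-linear activations and more layers; the linear version is used here only because its gradients are short enough to write by hand.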
WHY AUTOENCODERS ARE PREFERRED OVER
PCA
• An autoencoder can learn non-linear transformations with a
non-linear activation function and multiple layers.
• It doesn’t have to use dense layers. It can use convolutional
layers, which are better suited to video, image and series data.
• It is more efficient to learn several layers with an
autoencoder than to learn one huge transformation with PCA.
• An autoencoder provides a representation as the output of
each layer.
• It can make use of pre-trained layers from another
model to apply transfer learning to enhance the
encoder/decoder.
PROPERTIES OF AUTOENCODERS:

• Data-specific: Autoencoders are only able to compress data
similar to what they have been trained on (for example, an
encoder trained on cat images may not work well with donkey
images).
• Lossy: The decompressed outputs will be degraded
compared to the original inputs.
• Unsupervised learning.
HYPERPARAMETERS OF
AUTOENCODERS:
• Code size: It represents the number of nodes in the middle layer.
Smaller size results in more compression.
• Number of layers: The autoencoder can consist of as many layers
as we want.
• Number of nodes per layer: The number of nodes per layer
decreases with each subsequent layer of the encoder, and increases
back in the decoder. The decoder is symmetric to the encoder in
terms of the layer structure.
• Loss function: We either use mean squared error or binary cross-
entropy. If the input values are in the range [0, 1] then we typically
APPLICATIONS OF AUTOENCODERS

• Image colouring.
• Feature variation: The autoencoder extracts only the required
features of an image and generates the output by removing any noise
or unnecessary interruption.
• Dimensionality reduction: The reconstructed image is the same as
our input but with reduced dimensions. It helps in providing a
similar image with a reduced pixel value.
• Denoising images: A denoising autoencoder is trained to
reconstruct the original input from a noisy version.
• Data compression and image generation
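For the denoising application listed above, the only change to training is the data pairing: the network sees a corrupted input but is scored against the clean original. The sketch below shows just that pairing. It assumes Gaussian noise and a mean squared error loss, and the array `clean` is random stand-in data rather than real images; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 8 flattened images with pixel values in [0, 1] (assumed data).
clean = rng.random((8, 16))

# Corrupt the inputs; the noise level 0.2 is an arbitrary choice for the example.
noisy = np.clip(clean + 0.2 * rng.normal(size=clean.shape), 0.0, 1.0)


def denoising_loss(reconstruction, clean_target):
    """Mean squared error against the CLEAN image, not the noisy input."""
    return np.mean((reconstruction - clean_target) ** 2)


# A denoising autoencoder would be trained on pairs (input=noisy, target=clean).
# Returning the noisy input unchanged leaves a non-zero loss to be reduced.
baseline = denoising_loss(noisy, clean)
```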
CONVOLUTIONAL AUTOENCODERS

• A CNN can also be used as an autoencoder for image noise
reduction or coloring.
• When a CNN is used for image noise reduction or coloring, it is
applied in an autoencoder framework, i.e., the CNN is used in
the encoding and decoding parts of an autoencoder.
DEEP GENERATIVE MODELS

• Generative models learn the distribution of the training data and help in
generating new data points by sampling from the learned distribution.
• Deep generative models (DGMs) are neural networks with many hidden layers
trained to approximate complicated, high-dimensional probability
distributions using a large number of samples.
• Once trained successfully, DGMs can be used to estimate the likelihood of
each observation and to create new samples from the underlying distribution.
• Generative models are not classification models.
APPROACHES TO GENERATIVE MODELS

• Generative Adversarial Networks (GANs)


• Variational Autoencoders (VAEs)

GANs are often considered to produce sharper, more realistic
samples than variational autoencoders.
VARIATIONAL AUTOENCODER

• Deep neural autoencoders and deep neural variational autoencoders share
similarities in architecture, but are used for different purposes.
• Autoencoders usually work with either numerical data or image data.
• Three common uses of autoencoders are data visualization, data denoising, and
data anomaly detection.
• Variational autoencoders usually work with either image data or text
(documents) data.
• The most common use of variational autoencoders is for generating new image
or text data.
VARIATIONAL AUTOENCODER

• A variational autoencoder assumes that the source data has
some sort of underlying probability distribution (such as
Gaussian) and then attempts to find the parameters of the
distribution.
• Implementing a variational autoencoder is much more
challenging than implementing an autoencoder.
• The one main use of a variational autoencoder is to generate
new data that’s related to the original source data.
• VAEs are generative autoencoders, meaning they can
generate new instances that look similar to the original dataset
used for training.
• A VAE learns the probability distribution of the data.
• A variational autoencoder is a generative system, and serves
a similar purpose as a generative adversarial network
(although GANs work quite differently).
EXAMPLE
• Let us understand how new data is generated. Say we have an
image of a celebrity face from which our encoder model has to
recognize the important features.
• For every feature, we have a probability distribution.
• Our goal is to produce new data from the current data, for
example a new face from the current face.
• How do faces differ? Skin tone, eye colour, hair colour, and many other
features are different. But overall, the list of features remains the
same.
• Since the encoder describes each feature with two parameters, a mean
and a standard deviation, we have a range of values for each feature,
rather than a single value, to provide to the decoder.
• The basic difference between an autoencoder and a variational
autoencoder is the latter's ability to provide continuous data, or a
range of data, in the latent space, which helps us generate new data or
a new image.
• Next, we draw random samples from the latent-space
distributions defined by the mean and variance and pass them
to the decoder to generate data.
• Still, we do not get the desired result unless we train the
model to improve with new samples every time.
• Since this is not a one-time activity, we need to train the
model.
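The sampling step in this example, drawing a latent value from a mean and a standard deviation and handing it to the decoder, can be sketched as follows. The form z = mu + sigma * eps (with eps drawn from a standard normal) is a common way to write the draw and is an assumption here, not something the slides specify; sample_latent, mu and sigma are illustrative names, and the decoder itself is left out.

```python
import numpy as np


def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) for every latent feature."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one face: two latent features, each
# described by a mean and a standard deviation.
mu = np.array([[0.5, -1.0]])
sigma = np.array([[1.0, 0.3]])

# Each call gives a different point in the latent space; passing each one to
# the decoder would give a different generated face.
z_first = sample_latent(mu, sigma, rng)
z_second = sample_latent(mu, sigma, rng)
```

Because every draw lands somewhere different within the range set by the mean and standard deviation, the decoder is exercised over a continuous region of the latent space instead of a single point.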
GENERATIVE ADVERSARIAL
NETWORKS (GAN)
• Generative Adversarial Networks (GANs) are a powerful class of neural networks
that are used for unsupervised learning.

• GANs were first described in a 2014 paper by Ian Goodfellow and colleagues. A
standardized and much more stable model was proposed by Alec Radford and colleagues
in 2016, known as DCGAN (Deep Convolutional Generative Adversarial Networks).

• A GAN automatically discovers and learns patterns in input data.


GENERATIVE ADVERSARIAL
NETWORKS
• The two neural networks that make up a GAN are referred to as the
generator and the discriminator. For image data, the generator is
typically a deconvolutional (transposed-convolutional) neural network
and the discriminator is a convolutional neural network.
• Generative adversarial networks (GANs) are algorithmic
architectures that use two neural networks, pitting one against the
other (thus the “adversarial”) in order to generate new, synthetic
instances of data that can pass for real data.
• They are used widely in image generation, video generation and
voice generation.
THE WORKING CAN BE VISUALIZED

• Generative: To learn a generative model, which describes how
data is generated in terms of a probabilistic model.
• Adversarial: The training of a model is done in an
adversarial setting.
• Networks: Use deep neural networks as the artificial
intelligence (AI) algorithms for training purposes.
LOSS FUNCTIONS
EXAMPLE
• Generator:
• It is trained to generate new data. For example, in computer
vision it generates new images that resemble existing real-world
images.
• Discriminator:
• It compares those images with real-world examples and
classifies them as real or fake.
HOW DO GANS WORK?
• In GANs, there is a generator and a discriminator.
• The Generator generates fake samples of data (be it an image, audio,
etc.) and tries to fool the Discriminator.
• The Discriminator, on the other hand, tries to distinguish between the
real and fake samples.
• The Generator and the Discriminator are both neural networks, and they
run in competition with each other in the training phase.
• The steps are repeated several times, and the Generator and
Discriminator get better at their respective jobs with each
repetition.
STEPS FOR TRAINING
• Define The Problem – define the problem and collect data.
• Choose Architecture Of GAN – Depending on your problem,
choose what your GAN should look like.
• Train Discriminator On Real Data – Train the discriminator on
real data, n times, to predict it as real.
• Generate Fake Inputs For Generator – Generate fake samples
from the generator.
• Train Discriminator On Fake Data – Train the discriminator to
predict the generated data as fake.
• Train Generator With The Output Of Discriminator – After
getting the discriminator predictions, train the generator to fool
the discriminator.
TRAINING A GAN HAS TWO PARTS:

• Part 1: The Discriminator is trained while the Generator is idle.
• In this phase, the Generator is only forward propagated to produce
samples; no backpropagation is applied to it.
• The Discriminator is trained on real data for n epochs to see if it
can correctly predict them as real.
• In this phase, the Discriminator is also trained on the fake data
generated by the Generator to see if it can correctly predict
them as fake.
TRAINING A GAN HAS TWO PARTS:

• Part 2: The Generator is trained while the Discriminator is idle.
• After the Discriminator has been trained on the fake data from the
Generator, we take its predictions and use them to train the
Generator, so that it improves on its previous state and gets better
at fooling the Discriminator.
• The above method is repeated for a few epochs, and the fake data is
then checked manually to see if it seems genuine. If it seems
acceptable, training is stopped; otherwise, it is allowed to continue
for a few more epochs.
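The two alternating parts above can be sketched with a deliberately tiny GAN whose gradients are short enough to write by hand. Everything in it is an assumption made for illustration: the real data is one-dimensional and drawn from N(4, 1), the generator is G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and the generator update maximizes log D(G(z)). It is a sketch of the alternation only and makes no claim about convergence.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on a 1-D toy problem."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # generator parameters:     G(z) = a*z + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Part 1: train the discriminator; the generator is only run forward.
        x_real = rng.normal(4.0, 1.0, size=batch)
        x_fake = a * rng.standard_normal(batch) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradient of -log D(x_real) - log(1 - D(x_fake)) with respect to w and c.
        grad_w = np.mean((d_real - 1.0) * x_real) + np.mean(d_fake * x_fake)
        grad_c = np.mean(d_real - 1.0) + np.mean(d_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # Part 2: train the generator; the discriminator is held fixed.
        z = rng.standard_normal(batch)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradient of -log D(G(z)) with respect to a and b (chain rule through D).
        grad_a = np.mean((d_fake - 1.0) * w * z)
        grad_b = np.mean((d_fake - 1.0) * w)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator: G(z) = {a:.2f}*z + {b:.2f}")
```

Note that the generator's gradient contains the factor w: the only training signal the generator receives comes through the discriminator.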
GAN TRAINING

• Each side of the GAN can overpower the other.


• If the discriminator is too good, it will return values so close
to 0 or 1 that the generator will struggle to read the gradient.
• If the generator is too good, it will persistently exploit
weaknesses in the discriminator that lead to false negatives.
• This may be mitigated by the nets’ respective learning rates.
The two neural networks must have a similar “skill level.”
GAN TRAINING
• When training begins, the generator produces obviously fake
data, and the discriminator quickly learns to tell that it's fake.
• As training progresses, the generator gets closer to producing
output that can fool the discriminator.
• Finally, if generator training goes well, the discriminator gets
worse at telling the difference between real and fake. It starts to
classify fake data as real, and its accuracy decreases.
• Both the generator and the discriminator are neural
networks.
• The generator output is connected directly to the
discriminator input. Through backpropagation, the
discriminator's classification provides a signal that the
generator uses to update its weights.
LOSS/BACK PROPAGATION
UP SAMPLING

• A standard convolutional classifier takes an image and
downsamples it to produce a probability.
• Generator models in the GAN architecture must instead
upsample their input in order to generate an output image. The
upsampling layer is a simple layer with no weights that
doubles the dimensions of its input and can be used in a
generative model when followed by a traditional
convolutional layer.
• A simple version of an unpooling, or opposite-pooling, layer is called an
upsampling layer.
• It works by repeating the rows and columns of the input.
• A more elaborate approach is to perform a backwards convolutional
operation. It was originally referred to as a deconvolution, which is
incorrect; it is more commonly called a fractional convolutional layer or a
transposed convolutional layer.
• Both of these layers can be used on a GAN to perform the required
upsampling operation to transform a small input into a large image output.
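The weight-free upsampling described above, repeating the rows and columns of the input, can be shown directly with NumPy. This is a small sketch of the idea on a single 2-D array, not a framework layer; the function name upsample_nearest is illustrative.

```python
import numpy as np


def upsample_nearest(x, factor=2):
    """Repeat every row and every column `factor` times (no learned weights)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


small = np.array([[1, 2],
                  [3, 4]])
large = upsample_nearest(small)
print(large)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

A 2x2 input becomes 4x4: both dimensions are doubled and nothing is learned, which is why the slides pair this layer with a following convolutional layer.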
GAN CHALLENGES

• GANs take a long time to train. On a single GPU a GAN might take hours,
and on a single CPU more than a day.
• Vanishing gradient:
• Research has suggested that if your discriminator is too good, then
generator training can fail due to vanishing gradients. In effect, an optimal
discriminator doesn't provide enough information for the generator to
make progress.
• Conversely, if the discriminator is too weak, the generator can fool it
too easily with unrealistic images.
SOLUTION

• Wasserstein loss: The Wasserstein loss is designed to
prevent vanishing gradients even when you train the
discriminator to optimality.
• Modified minimax loss: The original GAN paper proposed
a modification to minimax loss to deal with vanishing
gradients.
LOSS FUNCTIONS

• A GAN can have two loss functions: one for generator training and
one for discriminator training
• GANs try to replicate a probability distribution.
• They should therefore use loss functions that reflect the distance
between the distribution of the data generated by the GAN and the
distribution of the real data.
• minimax loss:
• Wasserstein loss:
CROSS ENTROPY LOSS
MINIMAX LOSS

• The generator tries to minimize the following function while
the discriminator tries to maximize it:
  E_x[log(D(x))] + E_z[log(1 − D(G(z)))]
• The formula derives from the cross-entropy between the real
and generated distributions.
• D(x) is the discriminator's estimate of the probability that real data instance
x is real.
• E_x is the expected value over all real data instances.
• G(z) is the generator's output when given noise z.
• D(G(z)) is the discriminator's estimate of the probability that a fake instance
is real.
• E_z is the expected value over all random inputs to the generator (in effect,
the expected value over all generated fake instances G(z)).
• The generator can't directly affect the log(D(x)) term in the function, so, for
the generator, minimizing the loss is equivalent to minimizing
log(1 − D(G(z))).
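To make the minimax function concrete, the two expectations can be estimated as batch averages of the discriminator's outputs. The sketch below assumes those outputs are already available as arrays of probabilities; the function name minimax_value and the example numbers are illustrative, and the small clipping constant is only there to avoid log(0).

```python
import numpy as np


def minimax_value(d_real, d_fake, eps=1e-12):
    """Batch estimate of E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))


# Discriminator outputs when it separates real from fake well ...
confident = minimax_value(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
# ... and when the generator has fooled it into answering 0.5 everywhere.
fooled = minimax_value(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

The value is higher (closer to 0) when the discriminator is doing well and lower when it is fooled, which matches the roles above: the discriminator pushes the value up and the generator pushes it down.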
MODIFIED MINIMAX LOSS

• The original GAN paper notes that the above minimax loss
function can cause the GAN to get stuck in the early stages of
GAN training when the discriminator's job is very easy.
• The paper therefore suggests modifying the generator loss so
that the generator tries to maximize log D(G(z))
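Why the modification helps early in training can be checked numerically. Assuming the discriminator ends in a sigmoid, D = sigmoid(logit), the sketch below compares the gradient the generator receives at the discriminator's logit under the two objectives. The derivatives are worked out by hand, and the function names are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def grad_minimax(logit):
    """d/d(logit) of log(1 - sigmoid(logit)): the original generator loss."""
    return -sigmoid(logit)


def grad_modified(logit):
    """d/d(logit) of -log(sigmoid(logit)): generator maximizes log D(G(z))."""
    return sigmoid(logit) - 1.0


# Early in training the discriminator easily rejects fakes: D(G(z)) is near 0.
logit = -6.0
print(sigmoid(logit))        # about 0.0025
print(grad_minimax(logit))   # about -0.0025: almost no signal
print(grad_modified(logit))  # about -0.9975: a strong signal
```

When the discriminator confidently rejects a fake, the original loss gives a gradient close to zero, while the modified loss still gives one close to 1 in magnitude.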
WASSERSTEIN LOSS

• This loss function depends on a modification of the GAN
scheme (called "Wasserstein GAN" or "WGAN") in which the
discriminator does not actually classify instances.
• For each instance it outputs a number.
• This number does not have to be less than 1 or greater
than 0, so we can't use 0.5 as a threshold to decide whether
an instance is real or fake.
• Discriminator training just tries to make the output bigger for real
instances than for fake instances.
• Because it can't really discriminate between real and fake, the WGAN
discriminator is actually called a "critic" instead of a "discriminator".
• Critic loss: D(x) − D(G(z))
• The critic tries to maximize this function. In other words, it
tries to maximize the difference between its output on real instances
and its output on fake instances.
• Generator loss: D(G(z))
• The generator tries to maximize this function. In other words, it
tries to maximize the critic's output for its fake instances.
• D(x) is the critic's output for a real instance.
• G(z) is the generator's output when given noise z.
• D(G(z)) is the critic's output for a fake instance.
• The output of critic D does not have to be between 1 and 0.
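The two Wasserstein quantities are simple averages of the critic's raw outputs, as the sketch below shows. The critic scores are made-up numbers chosen for the example, and the function names are illustrative; a full WGAN also constrains the critic, which is outside what these slides cover.

```python
import numpy as np


def critic_objective(critic_real, critic_fake):
    """D(x) - D(G(z)), averaged over a batch; the critic tries to maximize it."""
    return np.mean(critic_real) - np.mean(critic_fake)


def generator_objective(critic_fake):
    """D(G(z)), averaged over a batch; the generator tries to maximize it."""
    return np.mean(critic_fake)


# Hypothetical critic scores: unbounded numbers, not probabilities.
critic_real = np.array([3.2, 2.7, 4.1])
critic_fake = np.array([-1.5, 0.3, -0.8])

gap = critic_objective(critic_real, critic_fake)
gen = generator_objective(critic_fake)
```

Here the scores fall well outside [0, 1], so no 0.5 threshold applies; only the gap between real and fake scores matters.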
APPLICATIONS

• Prediction Of Next Frame In A Video

• Text to Image Generation

• Enhancing The Resolution of an Image

• Image To Image Translation

• Data set augmentation


• Medical image synthesis.
VAE VS GAN
• Both VAEs and GANs are very exciting approaches to learning the
underlying data distribution using unsupervised learning.
• GANs often yield sharper, more realistic samples than VAEs.
RESEARCH SCOPE

• GANs struggle to capture the positioning of objects, that is,
how many times an object should occur at a given location. So
there is scope for a hybrid approach combining GANs and
capsule networks.
• Medical data synthesis.
• Image denoising.
• Autoencoder feature extraction for classification. Hybrid
approach of discriminative and generative models.
• https://www.youtube.com/watch?v=ZD7HtL1gook
• https://www.youtube.com/watch?v=-UDvuBpcCXw
• https://developers.google.com/machine-learning/gan/loss
• THANK YOU
All the best dear
students
