Gen AI 10-1

The document presents an overview of advanced Generative Adversarial Networks (GANs), focusing on various types such as CycleGANs, Pix2Pix, StyleGANs, and DCGANs. It discusses concepts like mode collapse, mini-batch GANs, and conditional GANs, detailing their architectures, training processes, and advantages. Additionally, it highlights techniques for improving GAN performance, such as image de-duplication and the use of conditional information for more controlled data generation.


National University of Computer and Emerging Sciences

Advanced Generative Adversarial Networks

AI-4009 Generative AI

Dr. Akhtar Jamil


Department of Computer Science

4/1/2024 Presented by Dr. AKHTAR JAMIL 1


Goals
• Today's Lecture
– CycleGANs
– Pix2Pix
– StyleGANs
– BigGANs
– WGANs
– Super-resolution GAN (SRGAN)



Mode Collapse
• The generator converges to the same few outputs (e.g., the same faces).
• [Figure: generated distribution vs. data distribution]
• Sometimes this is hard to tell, since one sees only what is generated, not what is missed.
Solutions
• Use Mini Batch GANs
• Supervision with Labels



Mini Batch Generative Adversarial Networks (Mini Batch GANs)
• Mini Batch GANs are a variation of Generative Adversarial
Networks (GANs).
• In traditional GANs, the discriminator evaluates each sample
individually to decide whether it is real or fake.
• In Mini Batch GANs, the discriminator evaluates samples in small
groups, or "mini-batches".
– This concept is called mini-batch discrimination.
– The discriminator assesses not only the authenticity of individual samples but also the diversity
between samples within a batch.
Mini Batch GANs on MNIST

Deconv Conv, BN,


Tanh/Sigmoid Conv, BN, Reshape, FC,
FC, BN, Deconv LReLU FC, BN,
LReLU Sigmoid
Reshape BN, ReLU LReLU

100 7x7x16 14x14x8 28x28x1 14x14x8 7x7x16 256 1

Generator
Discriminator
Mini Batch GANs
• Generator in Batches
– The generator can produce batches of images at once instead of single
images.
– Processing in batches allows for more efficient computation.
– A batch exposes the discriminator to a more diverse set of samples, which
supports more comprehensive training.
• Mini-batch Discrimination
– The discriminator computes a measure of how similar the samples within
the same batch are to one another.
– This is done by computing distances or similarities between samples in the feature space.
– The measure indicates how diverse the generated images are within each batch;
low diversity is a sign of mode collapse.
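
The batch-level similarity measure described above can be sketched as follows. This is a minimal pure-Python illustration, assuming a commonly used formulation in which each sample's features are projected by a tensor T and compared to the other samples with an L1 distance. The function name `minibatch_closeness`, the fixed tensor `T_identity`, and the toy batches are hypothetical; in a real model T is learned and the scores are appended to the discriminator's features.

```python
import math

def minibatch_closeness(features, T):
    """Return, for each sample, B closeness scores to the rest of the batch.

    features: N samples x A features (list of lists) from a discriminator layer.
    T: projection tensor of shape A x B x C (nested lists); learned in practice.
    """
    n, a = len(features), len(T)
    b, c = len(T[0]), len(T[0][0])
    # Project every sample to a B x C matrix: M[i][p][q] = sum_k f[i][k] * T[k][p][q]
    M = [[[sum(features[i][k] * T[k][p][q] for k in range(a)) for q in range(c)]
          for p in range(b)]
         for i in range(n)]
    scores = []
    for i in range(n):
        row = []
        for p in range(b):
            total = 0.0
            for j in range(n):
                if j == i:
                    continue  # skip self-similarity (it only adds a constant)
                l1 = sum(abs(M[i][p][q] - M[j][p][q]) for q in range(c))
                total += math.exp(-l1)  # near-identical rows add ~1, distant rows ~0
            row.append(total)
        scores.append(row)
    return scores

# Hypothetical fixed projection (A=2, B=1, C=2) that simply copies the features.
T_identity = [[[1.0, 0.0]], [[0.0, 1.0]]]
collapsed = [[1.0, 2.0]] * 4  # four identical samples, as in mode collapse
diverse = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0], [15.0, 15.0]]
collapsed_scores = minibatch_closeness(collapsed, T_identity)
diverse_scores = minibatch_closeness(diverse, T_identity)
```

A collapsed batch yields high closeness scores and a diverse batch yields scores near zero, which gives the discriminator a signal it can use to reject batches with too little variety.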



Mini Batch GANs
• Backpropagation:
– Based on the discriminator's feedback, the GAN updates both
the generator and the discriminator.
– The generator learns to produce more diverse and realistic
samples.
• Iteration:
– This process repeats over many cycles.
– The generator and discriminator iteratively improve in
response to each other's feedback.



Advantages of Mini Batch GANs
1. Mini Batch GANs encourage the generator to produce a diverse
range of outputs.
2. Training is more stable because the discriminator works on a batch instead of
individual samples.
3. Mini batches can lead to faster convergence of the network.
4. Mini-batch discrimination provides more information to the
generator, which can help it generate higher-quality samples.



Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks



DCGANs
• Paper: Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks.
• Supervised learning with convolutional networks (CNNs) has
seen huge adoption in computer vision applications.
• Comparatively, unsupervised learning with CNNs has received
less attention.
• DCGANs apply CNNs to unsupervised learning.
• DCGANs build upon the original Generative Adversarial
Networks (GANs) by incorporating convolutional layers, making
them better suited to image processing.



DCGANs
• DCGANs consist of two main components:
• Generator: This network generates new data instances (e.g.,
images) from random noise.
– It typically uses transposed convolutional layers to progressively upsample
the input noise to an image.
• Discriminator: This network evaluates the authenticity of images.
– It takes an image (either from the training set or generated by the
generator) and outputs the probability of the image being real (as opposed
to being generated).
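
To make the progressive upsampling concrete, the sketch below computes the spatial size after each transposed convolution using the standard output-size formula for dilation 1. The layer settings (kernel 4, stride 2, padding 1, starting from a 1x1 noise "image") are assumed for illustration and are not taken from the slides.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def upsampling_path(layers, start=1):
    """Spatial size after each (kernel, stride, padding) transposed-conv layer."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes

# Assumed generator layout: one 1x1 -> 4x4 projection, then four 2x upsampling layers.
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
path = upsampling_path(layers)  # [1, 4, 8, 16, 32, 64]
```

Each stride-2 layer doubles the height and width, so the noise vector grows step by step into a full-size image.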

Details of Adversarial Training
• The authors trained DCGANs on three datasets:
– Large-scale Scene Understanding (LSUN)
– Imagenet-1k
– Faces dataset.
• No pre-processing was applied to training images besides scaling
to the range of the tanh activation function [-1, 1].
• In LeakyReLU, the slope of the leak was set to 0.2.
• Adam optimizer with tuned hyperparameters.
• Learning rate set to 0.0002, momentum term β1 = 0.5 (reduced from the default 0.9).
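
The scaling step and the LeakyReLU slope mentioned above can be sketched in a few lines. The helper names are hypothetical, and the sketch assumes 8-bit input pixels in [0, 255].

```python
def scale_to_tanh_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to the tanh output range [-1, 1]."""
    return pixel / 127.5 - 1.0

def leaky_relu(x, slope=0.2):
    """LeakyReLU: identity for positive inputs, a small slope for negative ones."""
    return x if x > 0 else slope * x

scaled = [scale_to_tanh_range(p) for p in (0, 255)]
```

Matching the data range to the generator's tanh output means real and generated images live on the same scale when the discriminator compares them.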



Image De-duplication Process
• It is a method used to decrease the likelihood that the GAN
memorizes and directly replicates its training images.
• The model learns to understand and recreate the underlying data
distribution without copying exact details.
• Before de-duplication, images from the training set are
center-cropped and then resized
to a standard size of 32x32 pixels.
• A 3072-128-3072 de-noising, dropout-regularized ReLU autoencoder
is used.
– The input and output have 3072 = 32 x 32 x 3 values; the hidden code layer has 128 nodes.



Image De-duplication Process
• The autoencoder compresses and then reconstructs the images,
helping to remove noise and unnecessary details.
• Binarization and semantic hashing:
– After training, the code-layer activations z are used to represent each image.
– The values of z are made binary (0 or 1) by thresholding: values above the
threshold are set to 1, and those below are set to 0.
– The result of binarization acts as a semantic hash: similar
images are likely to have similar binary codes, allowing for
efficient comparison and de-duplication.



Image De-duplication Process
• De-duplication:
– Using these binary codes, images are compared and de-duplicated.
– Visual inspection of hash collisions: the authors manually
inspected cases where different images had the same binary code
(hash collisions).
– The technique identified and removed about 275,000
near-duplicate images from the dataset.
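
The binarize-then-compare step can be sketched as below. This is only an illustration under stated assumptions: the latent codes are tiny hand-written 3-dimensional vectors rather than 128-dimensional autoencoder codes, the threshold of 0 is assumed, and images are treated as duplicates only when their binary codes match exactly.

```python
def binarize(code, threshold=0.0):
    """Turn a real-valued latent code into a tuple of bits by thresholding."""
    return tuple(1 if value > threshold else 0 for value in code)

def deduplicate(codes, threshold=0.0):
    """Return the indices of the first image seen for each binary hash."""
    first_seen = {}
    for index, code in enumerate(codes):
        # setdefault keeps the earliest index for a given hash.
        first_seen.setdefault(binarize(code, threshold), index)
    return sorted(first_seen.values())

# Toy latent codes: images 0 and 1 activate the same units, image 2 differs.
latent_codes = [
    [0.9, 0.0, 0.3],
    [0.8, 0.0, 0.4],
    [0.0, 0.7, 0.0],
]
kept = deduplicate(latent_codes)  # [0, 2]
```

Because each hash lookup takes constant time on average, the whole pass is linear in the number of images, which is what makes this approach practical for large datasets.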



DCGANs
• Convolutional Layers: DCGANs use convolutional layers (in the
discriminator) and transposed convolutional layers (in the
generator).
• Working Process:
– The generator creates images from random noise.
– The discriminator assesses these images and real images from
the dataset, trying to distinguish between the two.
– The output of the discriminator is a probability score that
represents how likely it is that the image is real.



Training of DCGANs
1. Training the Discriminator:
– In each training step, the discriminator is trained first.
– It is provided a batch of real images (labeled as real) and a batch of fake
images generated by the generator (labeled as fake).
– The goal is to maximize the probability of correctly classifying both real
and fake images.
2. Training the Generator:
– After updating the discriminator, the generator is trained.
– It generates a batch of images, which are then passed to the
discriminator.
– The generator's goal is to minimize the likelihood that the discriminator
correctly identifies the images as fake (Adversarial).
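
The alternating update order described above can be illustrated with a deliberately tiny GAN on 1-D data, using hand-derived gradients so that no deep learning library is needed. Everything here is an assumption made for illustration: the linear generator g(z) = a*z + b, the logistic discriminator D(x) = sigmoid(w*x + c), the real data distribution N(4, 0.5), and the non-saturating generator loss -log D(g(z)).

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_toy_gan(steps, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D data."""
    rng = random.Random(seed)
    a, b, w, c = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # 1) Discriminator step: descend -[log D(real) + log(1 - D(fake))].
        grad_w = grad_c = 0.0
        for x in real:  # real samples are labeled 1
            p = sigmoid(w * x + c)
            grad_w += -(1.0 - p) * x
            grad_c += -(1.0 - p)
        for x in fake:  # generated samples are labeled 0
            p = sigmoid(w * x + c)
            grad_w += p * x
            grad_c += p
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # 2) Generator step: descend -log D(g(z)) with the discriminator held fixed.
        grad_a = grad_b = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            p = sigmoid(w * (a * z + b) + c)
            grad_a += -(1.0 - p) * w * z
            grad_b += -(1.0 - p) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch
    return a, b, w, c
```

Calling `train_toy_gan(steps=1)` runs one discriminator update followed by one generator update; after that single round the discriminator weight `w` is positive (real samples are larger than fake ones) and the generator offset `b` has moved toward the real data.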



Training of DCGANs
1. Adversarial Process:
– As training progresses, the generator gets better at producing realistic images,
so the discriminator must get better at distinguishing fake images from real
ones, and vice versa.
– This competition drives both networks to improve.
2. Backpropagation and Optimization:
– Both networks use backpropagation to update their weights.
– This iterative process of alternating between training the discriminator and the
generator continues until a stopping criterion is met (like a fixed number of
epochs or a desired level of performance).
3. Loss Functions:
– A commonly used loss function is binary cross-entropy.
– The generator's loss is based on how well it tricks the discriminator.
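
The binary cross-entropy losses mentioned above can be written out directly from discriminator output probabilities. This is a minimal sketch: the function names are hypothetical, the clipping constant `eps` is an assumed numerical safeguard, and the generator loss shown is the common non-saturating variant that labels generated samples as real.

```python
import math

def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for one probability and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Real samples should score 1, generated samples should score 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Low when the discriminator scores generated samples close to 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)
```

For example, a discriminator that outputs 0.5 for everything gives `discriminator_loss` equal to 2 ln 2, and the generator loss falls as the discriminator's scores on generated samples rise.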



Key Features
• Stability:
– DCGANs often offer more stable training compared to traditional GANs,
partly due to their convolutional nature.
• Hyperparameters:
– Tuning hyperparameters (like learning rates, batch size, etc.) is crucial for
effective training.
• Mode Collapse:
– A common issue with GANs, including DCGANs, is mode collapse, where
the generator produces limited varieties of outputs.



Key challenges
• Some strategies to mitigate mode collapse:
• Adding Noise to Inputs:
– Introducing noise to the inputs of the discriminator can prevent the discriminator
from becoming too confident.
– This uncertainty can prevent the generator from exploiting weaknesses in the
discriminator, leading to a more stable and diverse output.
• Regularization Techniques:
– Applying regularization methods like gradient penalty or weight normalization
can help in stabilizing the training and thus prevent mode collapse.
• Using Different Architectures or Loss Functions:
– Sometimes, simply changing the architecture of the GAN or using a different
loss function can mitigate mode collapse.
• Conditional GANs (supervision with labels)
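
As one concrete reading of the "adding noise to inputs" strategy, the sketch below adds Gaussian noise to every sample shown to the discriminator and shrinks the noise level over training. The linear decay schedule, the starting value of 0.1, and the function names are assumptions for illustration only.

```python
import random

def noise_sigma(step, total_steps, start_sigma=0.1):
    """Linearly decay the noise level from start_sigma to 0 (assumed schedule)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return start_sigma * remaining

def add_input_noise(batch, sigma, rng):
    """Add independent Gaussian noise to every value of every sample."""
    return [[value + rng.gauss(0.0, sigma) for value in sample] for sample in batch]

rng = random.Random(0)
batch = [[0.2, -0.4], [0.9, 0.1]]
noisy = add_input_noise(batch, noise_sigma(step=0, total_steps=100), rng)
```

The same function would be applied to both real and generated batches before they reach the discriminator, so that neither kind of input is easier to recognize because of the noise itself.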



Guidelines for DCGANs
• Architecture guidelines for stable Deep Convolutional GANs
– Replace any pooling layers with strided convolutions (discriminator) and
fractional-strided convolutions (generator).
– Use Batchnorm in both the generator and the discriminator.
– Remove fully connected hidden layers for deeper architectures.
– Use ReLU activation in generator for all layers except for the output, which
uses Tanh.
– Use LeakyReLU activation in the discriminator for all layers.



Conditional Generative Adversarial Nets (Conditional GANs)



Conditional GANs
• Conditional Generative Adversarial Nets (Conditional
GANs) are an extension of the original Generative Adversarial
Networks (GANs) framework.
• They incorporate conditional information into the data generation
process.
• Both the generator and the discriminator are provided with
additional conditional data,
– such as class labels or part of the data.
• This makes the generated data specific to the given
condition,
– allowing more controlled and diverse data generation.



Conditional GANs
• Input with Condition:
– Both the generator and the discriminator receive additional conditional
information y.
– This could be a one-hot encoded vector representing class labels, text
descriptions, or any other form of auxiliary data.
• Generator:
– The generator G takes a noise vector z and conditional information y to
produce data G(z|y).
– The output must not only be realistic but also match the given condition.
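
One simple way to supply the condition is to one-hot encode the label and concatenate it with the noise vector, as sketched below. The concatenation itself, the 100-dimensional uniform noise, and the 10 classes are assumptions for illustration; other designs first pass z and y through separate layers.

```python
import random

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

def generator_input(label, num_classes=10, noise_dim=100, rng=random):
    """Build the conditioned generator input [z, y]."""
    z = [rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]
    y = one_hot(label, num_classes)
    return z + y

g_in = generator_input(label=3, rng=random.Random(0))
```

The discriminator can be conditioned the same way, by concatenating y with the real or generated sample it is asked to judge.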



Conditional GANs
• Discriminator:
– The discriminator D also receives the conditional information y alongside the
real data or the generated data from the generator.
– Its task is to determine whether the given data is real or fake and whether it
corresponds to the given condition.
– The discriminator assesses D(x, y), where x is either real or generated data.

• Objective Function:
– The loss function encourages the generator to create data that can fool the
discriminator into believing it is real and correctly conditioned.
– The discriminator learns to distinguish between real and fake data and to check that the generated data
adheres to the conditional context.



Conditional GANs
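• Conventional GANs define the objective function as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

• For conditional GANs, the objective function becomes:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x, y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y), y))]$$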


Conditional GANs
• Backpropagation and Training:
– The networks are trained using backpropagation and gradient descent methods.
– The generator and discriminator are trained alternately: the discriminator is
trained by showing it real data with the correct condition and fake data with the
intended condition, while the generator is trained to produce data that the
discriminator will think is real.
• Learning Conditional Distributions:
– Over time, the generator learns to produce data that is conditioned on the
additional information, effectively learning the conditional distribution of the data.
– The discriminator gets better at evaluating the authenticity of the data and its
alignment with the given condition.



Conditional GANs
• Diverse and Controlled Generation:
– By conditioning on different y, the generator can produce diverse results
tailored to specific conditions, providing more control over the data
generation process.



Conditional GANs
• Use cases:
• Image-to-image translation
– Object reconstruction from edges, photo synthesis from label maps, and image
colorization.
• Creating images from text
– Create high-quality photos based on text.
• Video generation
– Predict future frames of a video based on a selection of previous images.
• Face generation
– Generate images of faces with specific attributes, such as hair color, eye color,
or a smile.



Conditional GAN Experimental Results: Unimodal
• A conditional adversarial net was trained on MNIST images
conditioned on their class labels, encoded as one-hot vectors.
• In the generator net, a noise prior z with dimensionality 100 is
drawn from a uniform distribution.
• z and y are mapped to separate hidden layers with Rectified Linear
Unit (ReLU) activation, with layer sizes 200 and 1000 respectively.
• Both hidden layers are then combined and fed to a second hidden ReLU layer of
dimensionality 1200.
• A final sigmoid unit layer produces the
784-dimensional MNIST samples.
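
The layer sizes listed above can be tallied to get a feel for the size of this generator. The sketch assumes that every layer is fully connected with a bias term, that the two first-stage hidden layers are combined by concatenation (200 + 1000 = 1200 inputs to the joint layer), and that y has 10 entries for the ten MNIST digit classes; the names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

Z_DIM, Y_DIM = 100, 10            # noise size; one-hot label size for 10 digits
Z_HIDDEN, Y_HIDDEN = 200, 1000    # first-stage hidden layers for z and y
JOINT_HIDDEN, OUT_DIM = 1200, 784  # joint hidden layer; 28 x 28 output pixels

layers = {
    "z -> hidden": dense_params(Z_DIM, Z_HIDDEN),
    "y -> hidden": dense_params(Y_DIM, Y_HIDDEN),
    "concat -> joint hidden": dense_params(Z_HIDDEN + Y_HIDDEN, JOINT_HIDDEN),
    "joint hidden -> output": dense_params(JOINT_HIDDEN, OUT_DIM),
}
total_params = sum(layers.values())
```

Under these assumptions almost all of the parameters sit in the joint hidden layer and the output layer, while the two input branches are comparatively small.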



Conditional GAN Experimental Results: Unimodal
• Discriminator:
– The discriminator maps x to a maxout layer with 240 units and 5
pieces.
– It maps y to a maxout layer with 50 units and 5 pieces.
– Both hidden layers are mapped to a joint maxout layer with 240
units and 4 pieces before being fed to the sigmoid layer.



Maxout layer
• A maxout unit computes several linear functions of its input (referred
to as "pieces") and outputs the maximum value among
them.
• If a maxout layer has, say, 240 units and 5
pieces, each unit computes 5 linear functions of the layer input and outputs the
maximum of these 5 values.
• This lets each unit learn its own piecewise-linear activation, so the network can represent more complex functions
than traditional units and handle
non-linear phenomena without relying on predefined
activation functions like ReLU or sigmoid.
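
A maxout unit is short enough to write out in full. In this sketch each unit holds k weight vectors and k biases (one per piece) and returns the largest of the k resulting values; the function names and the example weights are hypothetical.

```python
def maxout_unit(x, weights, biases):
    """One maxout unit: the max over k affine functions ("pieces") of x."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )

def maxout_layer(x, layer_weights, layer_biases):
    """Apply several maxout units to the same input vector."""
    return [maxout_unit(x, w, b) for w, b in zip(layer_weights, layer_biases)]

# Two pieces with weights +1 and -1 make the unit compute the absolute value.
abs_weights = [[1.0], [-1.0]]
abs_biases = [0.0, 0.0]
# Two pieces with weights 1 and 0 make the unit behave like ReLU.
relu_weights = [[1.0], [0.0]]
relu_biases = [0.0, 0.0]
```

With these hand-picked weights the same kind of unit reproduces |x| or ReLU, which shows how maxout can learn an activation shape instead of having one fixed in advance.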



Image-to-Image Translation with Conditional GANs
• Image-to-Image Translation with Conditional Adversarial Networks
– Download Paper: https://arxiv.org/pdf/1611.07004v3.pdf
– Paper Code: https://github.com/phillipi/pix2pix
• Conditional adversarial networks as a general-purpose solution to
image-to-image translation problems
• This approach is effective at synthesizing photos from label maps,
reconstructing objects from edge maps, and colorizing images
• pix2pix software



Image-to-Image Translation with Conditional GANs
• Many problems in image processing and computer vision can be posed as
“translation” problems
– Translate an input image into a corresponding output image.
• This is similar to translating between English and French.
• A scene may be rendered as an RGB image, an edge map, a
semantic label map, etc.
– Automatic image-to-image translation is defined as the task of translating
one possible representation of a scene into another.
– Sufficient training data is needed.



Image-to-Image Translation with Conditional GANs
• In this paper, GANs are used in the conditional setting.
• Just as GANs learn a generative model of data, conditional GANs
(cGANs) learn a conditional generative model.
• This makes cGANs suitable for image-to-image translation tasks,
where we condition on an input image and generate a
corresponding output image.



Image-to-Image Translation with Conditional GANs

Training a conditional GAN to map edges→photo.



Image-to-Image Translation with Conditional GANs
• Objective:
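– The objective of a conditional GAN can be expressed as

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$$

– Here x is the input image, y is the target output image, and z is a random noise vector; G tries to minimize this objective while D tries to maximize it.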



References
• Conditional Generative Adversarial Nets (Conditional GANs)
• Image-to-Image Translation with Conditional Adversarial Networks



Thank You ☺