Lec19 - GANs

Generative Adversarial Networks (GANs) are a class of machine learning frameworks where two neural networks, a generator and discriminator, compete against each other. The generator tries to produce realistic samples from noise to fool the discriminator, while the discriminator tries to distinguish between real samples and generated samples. GANs have been applied to generate images, text, and other types of data.


Generative Adversarial Networks (GANs)
From Ian Goodfellow et al.

A short tutorial by Binglin, Shashank & Bhargav

Adapted for Purdue MA 598, Spring 2019 from
http://slazebni.cs.illinois.edu/spring17/lec11_gan.pptx
Outline
• Part 1: Review of GANs

• Part 2: Some challenges with GANs

• Part 3: Applications of GANs


GAN’s Architecture

[Diagram: a real sample x is fed to the Discriminator D, producing D(x); noise z is fed to the Generator G, producing G(z), which is fed to D, producing D(G(z)).]

• z is some random noise (Gaussian/Uniform).
• z can be thought of as the latent representation of the image.
https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
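The data flow above can be sketched numerically. Below is a minimal NumPy sketch with hypothetical one-layer G and D (untrained random weights, purely to illustrate x → D(x) and z → G(z) → D(G(z))); it is not the architecture from any specific paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical one-layer generator: maps noise z (dim 4) to a "sample" (dim 8).
Wg = rng.normal(size=(8, 4))

def G(z):
    return np.tanh(Wg @ z)

# Hypothetical one-layer discriminator: maps a sample to a probability in (0, 1).
Wd = rng.normal(size=(8,))

def D(x):
    return sigmoid(Wd @ x)

z = rng.normal(size=4)   # latent noise
fake = G(z)              # G(z): generated sample
score = D(fake)          # D(G(z)): probability the sample is real
```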
Training Discriminator

https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
Training Generator

https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-models-and-adversarial-training-upc-2016
GAN’s formulation

• min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

• It is formulated as a minimax game, where:
  • The Discriminator is trying to maximize its reward V(D, G)
  • The Generator is trying to minimize the Discriminator’s reward (or maximize its loss)

• The Nash equilibrium of this particular game is achieved at:
  • p_g = p_data
  • D(x) = 1/2 for all x

[Diagram: alternating Discriminator updates and Generator updates.]
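The equilibrium can be checked numerically: for discrete p_data and p_g, plugging the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) into the value function gives V(D*, G), which bottoms out at −log 4 exactly when p_g = p_data. A small sketch:

```python
import math

def value(p_data, p_g):
    """V(D*, G) = sum_x [ p_data(x) log D*(x) + p_g(x) log(1 - D*(x)) ],
    with the optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    v = 0.0
    for pd, pg in zip(p_data, p_g):
        d_star = pd / (pd + pg)
        v += pd * math.log(d_star) + pg * math.log(1.0 - d_star)
    return v

p_data = [0.5, 0.5]
print(value(p_data, [0.9, 0.1]))   # mismatched generator -> larger value
print(value(p_data, p_data))       # equilibrium: -log 4 ~ -1.386
```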
Vanishing gradient strikes back again…

• Generator’s loss: log(1 − D(G(z)))

• ∇ log(1 − D(G(z))) goes to 0 if the Discriminator is confident, i.e. D(G(z)) → 0

• Minimize −log D(G(z)) for the Generator instead (keep the Discriminator as it is)
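The saturation is visible by differentiating both generator losses with respect to the Discriminator’s logit t, where D(G(z)) = sigmoid(t):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# t is the discriminator's logit for a generated sample: D(G(z)) = sigmoid(t).
# Saturating loss  log(1 - D(G(z))):  d/dt = -sigmoid(t)       -> 0  as t -> -inf
# Heuristic loss  -log D(G(z)):       d/dt = -(1 - sigmoid(t)) -> -1 as t -> -inf
t = -10.0                          # discriminator confident the sample is fake
grad_saturating = -sigmoid(t)
grad_heuristic = -(1.0 - sigmoid(t))
print(grad_saturating)             # tiny: vanishing signal for the generator
print(grad_heuristic)              # close to -1: strong signal
```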


CIFAR

Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.
DCGAN: Bedroom images

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
Deep Convolutional GANs (DCGANs)

Key ideas:
• Replace FC hidden layers with convolutions
  • Generator: fractional-strided convolutions
• Use Batch Normalization after each layer
• Inside the Generator:
  • Use ReLU for hidden layers
  • Use Tanh for the output layer

[Figure: Generator architecture.]

Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv:1511.06434 (2015).
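A fractional-strided (transposed) convolution grows the spatial size as out = (in − 1)·stride − 2·pad + kernel; with DCGAN’s usual kernel 4, stride 2, padding 1, every generator layer doubles the resolution. A quick check:

```python
def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output spatial size of a fractionally-strided (transposed) convolution."""
    return (size - 1) * stride - 2 * pad + kernel

# The DCGAN generator doubles the resolution at every layer: 4 -> 8 -> 16 -> 32 -> 64.
sizes = [4]
for _ in range(4):
    sizes.append(deconv_out(sizes[-1]))
print(sizes)  # [4, 8, 16, 32, 64]
```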
Part 2
• Training Challenges
• Non-Convergence
• Mode-Collapse
• Proposed Solutions
• Supervision with Labels
• Mini-Batch GANs
• Modification of GAN’s losses
• Discriminator (EB-GAN)
• Generator (InfoGAN)
Non-Convergence

• Deep Learning models (in general) involve a single player
  • The player tries to maximize its reward (minimize its loss).
  • Use SGD (with Backpropagation) to find the optimal parameters.
  • SGD has convergence guarantees (under certain conditions).
  • Problem: With non-convexity, we might converge to local optima.

• GANs instead involve two (or more) players
  • The Discriminator is trying to maximize its reward.
  • The Generator is trying to minimize the Discriminator’s reward:
    min_G max_D V(D, G)
  • SGD was not designed to find the Nash equilibrium of a game.
  • Problem: We might not converge to the Nash equilibrium at all.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Non-Convergence

• Consider the toy game V(x, y) = x·y, where x tries to minimize V and y tries to maximize it

• The gradient dynamics are dx/dt = −y, dy/dt = x

• The differential equation’s solution has sinusoidal terms

• Even with a small learning rate, it will not converge
Goodfellow, Ian. "NIPS 2016 Tutorial: Generative Adversarial Networks." arXiv preprint arXiv:1701.00160 (2016).
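This behaviour is easy to reproduce: simultaneous gradient steps on the toy game V(x, y) = xy spiral away from the equilibrium at the origin, since each step multiplies the distance to (0, 0) by sqrt(1 + lr²):

```python
import math

# Toy two-player game: V(x, y) = x * y.
# x descends on V, y ascends on V -- simultaneous gradient steps.
x, y, lr = 1.0, 1.0, 0.1
radii = [math.hypot(x, y)]
for _ in range(200):
    x, y = x - lr * y, y + lr * x   # dV/dx = y, dV/dy = x
    radii.append(math.hypot(x, y))

# The iterates spiral outward and never converge to the equilibrium (0, 0).
print(radii[0], radii[-1])
```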
Mode-Collapse

• Generator fails to output diverse samples

[Figure: target multi-modal distribution and expected output vs. the actual generator output, which collapses onto a single mode.]
Metz, Luke, et al. "Unrolled Generative Adversarial Networks." arXiv preprint arXiv:1611.02163 (2016).
How to reward sample diversity?

• At Mode Collapse,
  • Generator produces good samples, but only a very few of them.
  • Thus, Discriminator can’t tag them as fake.

• To address this problem,
  • Let the Discriminator know about this edge-case.

• More formally,
  • Let the Discriminator look at the entire batch instead of single examples
  • If there is a lack of diversity, it will mark the examples as fake

• Thus,
  • Generator will be forced to produce diverse samples.
Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Mini-Batch GANs

• Extract features that capture diversity in the mini-batch
  • e.g., L2 norm of the difference between all pairs from the batch

• Feed those features to the Discriminator along with the image

• Feature values will differ between diverse and non-diverse batches
  • Thus, the Discriminator will rely on those features for classification

• This, in turn,
  • Will force the Generator to match those feature values with the real data
  • Will generate diverse batches

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
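A minimal sketch of such a diversity feature, using the plain pairwise-L2 variant mentioned above (the paper’s actual minibatch discrimination uses a learned distance through a tensor, which this simplification omits):

```python
import numpy as np

def minibatch_l2_features(batch):
    """For each sample, sum of L2 distances to every other sample in the batch.
    A collapsed (non-diverse) batch yields near-zero features."""
    diffs = batch[:, None, :] - batch[None, :, :]   # (B, B, d) pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)          # (B, B) pairwise L2 distances
    return dists.sum(axis=1)                        # (B,) one feature per sample

diverse = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
collapsed = np.tile([[0.5, 0.5]], (4, 1))
print(minibatch_l2_features(diverse))    # clearly positive values
print(minibatch_l2_features(collapsed))  # all zeros: the tell-tale of collapse
```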
Supervision with Labels

• Label information of the real data might help

[Diagram: a standard Discriminator D outputs Real/Fake; a label-aware Discriminator D instead outputs one of the real classes (Car, Dog, Human, …) or Fake.]
• Empirically generates much better samples

Salimans, Tim, et al. "Improved techniques for training gans." Advances in Neural Information Processing Systems. 2016.
Alternate view of GANs

• In the original formulation, the labels are Real = 1, Fake = 0

• In this formulation, the Discriminator’s strategy was to output high probabilities for real samples and low probabilities for fake ones

• Alternatively, we can flip the binary classification labels, i.e. Fake = 1, Real = 0

• In this new formulation, the Discriminator’s strategy will be to output high values for fake samples and low values for real ones
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Alternate view of GANs (Contd.)

• If all we want to encode is “high values for fake, low values for real”, we don’t need probabilities at all

• Now, we can replace cross-entropy with any loss function (e.g. Hinge Loss)

• And thus, instead of outputting probabilities, the Discriminator just has to output:
  • High values for fake samples
  • Low values for real samples
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Energy-Based GANs

• Modified game plan:
  • Generator will try to generate samples with low energy values
  • Discriminator will try to assign high scores to fake samples

• Use an AutoEncoder inside the Discriminator

• Use the Mean-Squared Reconstruction error as the Discriminator’s output (energy):
  • High Reconstruction Error for Fake samples
  • Low Reconstruction Error for Real samples
Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
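A toy sketch of the energy computation, with a hypothetical untrained linear autoencoder standing in for the EB-GAN discriminator (random weights, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear autoencoder in place of the EB-GAN discriminator:
# Enc: R^8 -> R^3, Dec: R^3 -> R^8 (untrained random weights).
Enc = rng.normal(size=(3, 8))
Dec = rng.normal(size=(8, 3))

def energy(x):
    """Discriminator score = mean-squared reconstruction error of x.
    Training drives this low for real samples and high for fake ones."""
    recon = Dec @ (Enc @ x)
    return float(np.mean((x - recon) ** 2))

x = rng.normal(size=8)
print(energy(x))   # a non-negative scalar
```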
More Bedrooms…

Zhao, Junbo, Michael Mathieu, and Yann LeCun. "Energy-based generative adversarial network." arXiv preprint arXiv:1609.03126 (2016)
Feature parameterization
3D Faces

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
How to reward Disentanglement?

• Disentanglement means individual dimensions independently capturing key attributes of the image

• Let’s partition the noise vector into 2 parts:
  • z vector will capture slight variations in the image
  • c vector will capture the main attributes of the image
    • e.g. Digit, Angle and Thickness of images in MNIST

• If the c vector captures the key variations in the image, will c and G(z, c) be highly correlated or weakly correlated?

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets
Mutual Information

• Mutual Information captures the mutual dependence between two variables

• Mutual information between two variables X and Y is defined as:

  I(X; Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

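For discrete variables the definition can be evaluated directly from a joint probability table:

```python
import math

def mutual_information(joint):
    """I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats."""
    px = [sum(row) for row in joint]            # marginal p(x)
    py = [sum(col) for col in zip(*joint)]      # marginal p(y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi

independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent
correlated = [[0.5, 0.0], [0.0, 0.5]]        # Y fully determined by X
print(mutual_information(independent))       # 0.0
print(mutual_information(correlated))        # log 2 ~ 0.693
```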
InfoGAN

• We want to maximize the mutual information between c and G(z, c)

• Incorporate it in the value function of the minimax game:

  min_G max_D V_I(D, G) = V(D, G) − λ I(c; G(z, c))

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
InfoGAN

• I(c; G(z, c)) is hard to maximize directly, so maximize its Variational Lower Bound instead:

  I(c; G(z, c)) ≥ E_{c∼p(c), x∼G(z,c)}[log Q(c|x)] + H(c)

• Q is an auxiliary distribution (a neural network) that shares layers with the Discriminator D

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximization Generative
Adversarial Nets, NIPS (2016).
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• Coupled GAN
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Conditional GANs

• Simple modification to the original GAN framework that conditions the model on additional information for better multi-modal learning.

• Lends itself to many practical applications of GANs when we have explicit supervision available.

Image Credit: Figure 2 in Odena, A., Olah, C. and Shlens, J., 2016. Conditional image synthesis with auxiliary classifier GANs.  arXiv preprint arXiv:1610.09585.

Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets”. arXiv preprint arXiv:1411.1784 (2014).
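The conditioning itself is typically just concatenation of a label encoding with the inputs of both G and D; a minimal sketch (the 100-dim noise size is illustrative, not mandated by the paper):

```python
import numpy as np

def conditional_input(z, label, num_classes):
    """Concatenate noise z with a one-hot class label, in the spirit of
    Mirza & Osindero's conditional GAN, where both G and D see the label."""
    onehot = np.zeros(num_classes)
    onehot[label] = 1.0
    return np.concatenate([z, onehot])

z = np.random.default_rng(0).normal(size=100)
gen_in = conditional_input(z, label=7, num_classes=10)   # e.g. MNIST digit "7"
print(gen_in.shape)   # (110,)
```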
Conditional GANs

MNIST digits generated conditioned on their class label.

Figure 2 in the original paper.

Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
Image-to-Image Translation

Figure 1 in the original paper.


Link to an interactive demo of this paper
https://affinelayer.com/pixsrv/
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Image-to-Image Translation

• Architecture: DCGAN-based architecture

• Training is conditioned on the images from the source domain.

• Conditional GANs provide an effective way to handle many complex domains without worrying about designing structured loss functions explicitly.

Figure 2 in the original paper.
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. “Image-to-image translation with conditional adversarial networks”. arXiv preprint arXiv:1611.07004. (2016).
Text-to-Image Synthesis

Motivation:

• Given a text description, generate images closely associated with it.

• Uses a conditional GAN with the generator and discriminator being conditioned on a “dense” text embedding.

Figure 1 in the original paper.

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Text-to-Image Synthesis

Figure 2 in the original paper.

Positive Example:
• Real Image, Right Text

Negative Examples:
• Real Image, Wrong Text
• Fake Image, Right Text
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. “Generative adversarial text to image synthesis”. ICML (2016).
Face Aging with Conditional GANs
• Differentiating Feature: Uses an Identity Preservation Optimization using an
auxiliary network to get a better approximation of the latent code (z*) for an
input image.
• Latent code is then conditioned on a discrete (one-hot) embedding of age
categories.

Figure 1 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Face Aging with Conditional GANs

Figure 3 in the original paper.

Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). “Face Aging With Conditional Generative Adversarial Networks”. arXiv preprint arXiv:1702.01983.
Part 3
• Conditional GANs
• Applications
• Image-to-Image Translation
• Text-to-Image Synthesis
• Face Aging
• Advanced GAN Extensions
• LAPGAN – Laplacian Pyramid of Adversarial Networks
• Adversarially Learned Inference
• Summary
Laplacian Pyramid of Adversarial
Networks

Figure 1 in the original paper. (Edited for simplicity)

• Based on the Laplacian Pyramid representation of images (Burt & Adelson, 1983).


• Generate high resolution (dimension) images by using a hierarchical system of GANs
• Iteratively increase image resolution and quality.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Laplacian Pyramid of Adversarial Networks

Figure 1 in the original paper.

Image Generation using a LAPGAN:

• The first Generator generates the base image from a random noise input.
• Subsequent Generators iteratively generate the difference image, conditioned on the previous (smaller) image.
• This difference image is added to an up-scaled version of the previous smaller image.

Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
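The sampling procedure above can be sketched with random stand-in outputs for the generators (nearest-neighbour upsampling here is a simplification of the pyramid’s actual upscale operator):

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling (stand-in for the pyramid's upscale step)."""
    return np.kron(img, np.ones((2, 2)))

rng = np.random.default_rng(0)

# Hypothetical 3-level pyramid: an 8x8 base image plus difference images at 16 and 32.
base = rng.normal(size=(8, 8))            # from the coarsest generator
diffs = [rng.normal(size=(16, 16)),       # from the conditional generators
         rng.normal(size=(32, 32))]

img = base
for h in diffs:
    img = upsample2x(img) + h             # upscale, then add the difference image
print(img.shape)   # (32, 32)
```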
Laplacian Pyramid of Adversarial
Networks

Figure 2 in the original paper.

Training Procedure:
Models at each level are trained independently to learn the required representation.
Denton, E.L., Chintala, S. and Fergus, R., 2015. “Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks”. NIPS (2015)
Adversarially Learned Inference

• Basic idea is to learn an encoder/inference network along with the generator network.

• Consider the following joint distributions over x (image) and z (latent variables):
  • q(x, z) = q(x) q(z|x) — encoder distribution
  • p(x, z) = p(z) p(x|z) — generator distribution
Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference

Figure 1 in the original paper: the Discriminator network receives (x, z) pairs drawn either from the Encoder/Inference network or from the Generator network.

• The minimax game is played over these joint pairs:

  min_G max_D V(D, G) = E_{q(x)}[log D(x, G_z(x))] + E_{p(z)}[log(1 − D(G_x(z), z))]

  where G_z(x) ∼ q(z|x) is the encoder’s sample and G_x(z) ∼ p(x|z) is the generator’s.

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Adversarially Learned Inference

• The Nash equilibrium yields matching distributions:
  • Joint: q(x, z) = p(x, z)
  • Marginals: q(x) = p(x) and q(z) = p(z)
  • Conditionals: q(z|x) = p(z|x) and p(x|z) = q(x|z)

• The inferred latent representation successfully reconstructed the original image.
• The representation was useful in downstream semi-supervised tasks.

Dumoulin, Vincent, et al. “Adversarially learned inference”. arXiv preprint arXiv:1606.00704 (2016).


Summary

• GANs are generative models that are implemented using two stochastic neural network modules: a Generator and a Discriminator.
• The Generator tries to generate samples from random noise as input.
• The Discriminator tries to distinguish the Generator’s samples from samples drawn from the real data distribution.
• Both networks are trained adversarially (in tandem), each trying to fool the other. In this process, both models become better at their respective tasks.
Why use GANs for Generation?
• Can be trained using back-propagation for Neural Network based
Generator/Discriminator functions.
• Sharper images can be generated.
• Faster to sample from the model distribution: single forward pass
generates a single sample.
Reading List
• Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. Generative adversarial nets, NIPS (2014).
• Goodfellow, Ian NIPS 2016 Tutorial: Generative Adversarial Networks, NIPS (2016).
• Radford, A., Metz, L. and Chintala, S., Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint
arXiv:1511.06434. (2015).
• Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. Improved techniques for training gans. NIPS (2016).
• Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. 
InfoGAN: Interpretable Representation Learning by Information Maximization Generative Adversarial Nets, NIPS (2016).
• Zhao, Junbo, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126 (2016).
• Mirza, Mehdi, and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
• Liu, Ming-Yu, and Oncel Tuzel. Coupled generative adversarial networks. NIPS (2016).
• Denton, E.L., Chintala, S. and Fergus, R., 2015. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. NIPS (2015)
• Dumoulin, V., Belghazi, I., Poole, B., Lamb, A., Arjovsky, M., Mastropietro, O., & Courville, A. Adversarially learned inference. arXiv preprint
arXiv:1606.00704 (2016).

Applications:
• Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004. (2016).
• Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. Generative adversarial text to image synthesis. ICML (2016).
• Antipov, G., Baccouche, M., & Dugelay, J. L. (2017). Face Aging With Conditional Generative Adversarial Networks. arXiv preprint arXiv:1702.01983.
Questions?
