lec_12_generative_adversarial_networks
12.1
Generative Adversarial Networks
Recap: Latent Variable Models
LVMs map between observation space x ∈ R^D and latent space z ∈ R^Q:
(fw : x ↦ z)    gw : z ↦ x̂
I One latent variable gets associated with each data point in the training set
I The latent vectors are smaller than the observations (Q < D) ⇒ compression
I Models are linear or non-linear, deterministic or stochastic, with/without encoder
A little taxonomy:

                         Deterministic                    Probabilistic
Linear                   Principal Component Analysis     Probabilistic PCA
Non-Linear w/ Encoder    Autoencoder                      Variational Autoencoder
Non-Linear w/o Encoder                                    Generative Adversarial Networks
Generative Models
I The term generative model refers to any model that takes a dataset drawn from
pdata and learns a probability distribution pmodel to represent pdata
I In some cases, the model estimates pmodel explicitly and therefore allows for
evaluating the (approximate) likelihood/density pmodel (x) of a sample x
I In other cases, the model is only able to generate samples from pmodel
I GANs are prominent examples of this family of implicit models
I They provide a framework for training models without explicit likelihood
Generative Adversarial Networks
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 8
Generative Adversarial Networks
D and G play the following two-player minimax game with value function V (D, G):

min_G max_D V (D, G) = Ex∼pdata [log D(x)] + Ez∼p(z) [log(1 − D(G(z)))]

We train D to assign probability one to samples from pdata and zero to samples from
pmodel , and G to fool D such that it assigns probability one to samples from pmodel .
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 9
Generative Adversarial Networks
[Diagram: the generator network G maps latent samples z to fake samples x̂;
the discriminator network D receives real samples x and fake samples x̂ and
outputs the probability that its input is real]
I Theoretical analysis shows that this minimax game recovers pmodel = pdata
if G and D are given enough capacity and assuming that D∗ can be reached
I In practice, however, we must use iterative numerical optimization, and optimizing
D in the inner loop to completion is computationally prohibitive and would lead to
overfitting on finite datasets
I Therefore, we resort to alternating optimization:
I k steps of optimizing D (typically k ∈ {1, . . . , 5})
I 1 step of optimizing G (using a small enough learning rate)
I This way, we maintain D near its optimal solution as long as G changes slowly
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 11
Algorithm
While not converged do
1. For k steps do
   1.1 Draw B training samples {x1 , . . . , xB } from pdata (x)
   1.2 Draw B latent samples {z1 , . . . , zB } from p(z)
   1.3 Update the discriminator D by ascending its stochastic gradient:
       ∇wD 1/B ∑b=1..B [log D(xb ) + log(1 − D(G(zb )))]
2. Draw B latent samples {z1 , . . . , zB } from p(z)
3. Update the generator G by descending its stochastic gradient:
       ∇wG 1/B ∑b=1..B log(1 − D(G(zb )))
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 12
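As a concrete illustration, here is a minimal PyTorch sketch of this alternating optimization. All names (G, D, dataloader, latent_dim) and hyperparameters are assumptions for illustration, not part of the original algorithm; D is assumed to output probabilities, and the generator update uses the non-saturating loss discussed on the next slide.

```python
import torch

def train_gan(G, D, dataloader, latent_dim, k=1, epochs=10, device="cpu"):
    """Alternating GAN training: k discriminator steps per generator step."""
    opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
    for _ in range(epochs):
        for x in dataloader:                    # x: batch drawn from pdata
            x = x.to(device)
            B = x.size(0)
            for _ in range(k):                  # k steps of optimizing D
                z = torch.randn(B, latent_dim, device=device)
                # ascend log D(x) + log(1 - D(G(z))) = descend its negation
                loss_D = -(torch.log(D(x) + 1e-8).mean()
                           + torch.log(1 - D(G(z).detach()) + 1e-8).mean())
                opt_D.zero_grad(); loss_D.backward(); opt_D.step()
            # 1 step of optimizing G (non-saturating "gradient trick" loss)
            z = torch.randn(B, latent_dim, device=device)
            loss_G = -torch.log(D(G(z)) + 1e-8).mean()
            opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```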
The Gradient Trick
Early in training, when G is poor, D can reject generated samples with high
confidence, so log(1 − D(G(z))) saturates and provides vanishing gradients.
Instead of training G to minimize log(1 − D(G(z))), we can train it to maximize
log D(G(z)): this yields the same fixed point but much stronger gradients early in learning.
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 13
Expressiveness
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 14
1D Example
A simple example with linear generator:

x ∼ N (µ, σ)
z ∼ N (0, 1)
G(z) = w0^G + w1^G z
D(x) = σ(w0^D + w1^D x + w2^D x²)

I Here, we consider the data distribution and the prior as two different 1D
Gaussians and initialize G(z) = z and D(x) = σ(x), i.e., pmodel (x) = N (x | 0, 1)
I The goal is to learn wG and wD such that pmodel (x) = pdata (x) = N (x | µ, σ)
I Remark: The x² term is needed to provide gradients for the second moment
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 15
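A runnable sketch of this toy problem under the setup above (batch size, learning rates, and iteration count are assumptions):

```python
import torch

mu, sigma = 4.0, 0.5                                    # pdata = N(mu, sigma)
wG = torch.tensor([0.0, 1.0], requires_grad=True)       # init G(z) = z
wD = torch.tensor([0.0, 1.0, 0.0], requires_grad=True)  # init D(x) = sigmoid(x)

G = lambda z: wG[0] + wG[1] * z
D = lambda x: torch.sigmoid(wD[0] + wD[1] * x + wD[2] * x**2)

opt_D = torch.optim.Adam([wD], lr=1e-2)
opt_G = torch.optim.Adam([wG], lr=1e-3)         # small enough G learning rate
for it in range(2500):
    x = mu + sigma * torch.randn(64)            # samples from pdata
    z = torch.randn(64)                         # samples from p(z) = N(0, 1)
    loss_D = -(torch.log(D(x)) + torch.log(1 - D(G(z).detach()))).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    z = torch.randn(64)
    loss_G = -torch.log(D(G(z))).mean()         # non-saturating generator loss
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
# at convergence, wG ≈ (mu, sigma), i.e. pmodel(x) ≈ N(x | mu, sigma)
```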
1D Example
[Plot sequence at iterations 0, 500, 1000, 1500, 2000, 2500, plus a second run at
iterations 0 and 4000: the densities pdata(x), pmodel(x), and p(z) over x, with the
discriminator output D(x) overlaid]
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014.
Theoretical Results
Generative Adversarial Networks
D and G play the following two-player minimax game with value function V (D, G):

min_G max_D V (D, G) = Ex∼pdata [log D(x)] + Ez∼p(z) [log(1 − D(G(z)))]

We train D to assign probability one to samples from pdata and zero to samples from
pmodel , and G to fool D such that it assigns probability one to samples from pmodel .
Optimal Discriminator
Proposition 1. For any given generator G, the optimal discriminator D is:

D∗G (x) = pdata (x) / (pdata (x) + pmodel (x))

Proof. The training criterion for the discriminator D is to maximize (wrt. D):

V (D, G) = ∫x pdata (x) log(D(x)) dx + ∫z p(z) log(1 − D(G(z))) dz
         = ∫x pdata (x) log(D(x)) + pmodel (x) log(1 − D(x)) dx

For any (a, b) ∈ R² \ {(0, 0)}, the function y ↦ a log(y) + b log(1 − y) achieves its
maximum in [0, 1] at a/(a + b). The discriminator D does not need to be defined outside
Supp(pdata ) ∪ Supp(pmodel ), where pdata = 0 and pmodel = 0, concluding the proof.
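A quick numerical sanity check of this pointwise argument (illustrative; the two Gaussians playing the role of pdata and pmodel are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

y = np.linspace(1e-6, 1 - 1e-6, 200001)
for x in np.linspace(-5.0, 5.0, 11):
    a = norm.pdf(x, loc=1.0, scale=1.0)       # pdata(x)
    b = norm.pdf(x, loc=0.0, scale=2.0)       # pmodel(x)
    # maximizer of a*log(y) + b*log(1 - y) over [0, 1] should be a/(a + b)
    y_star = y[np.argmax(a * np.log(y) + b * np.log(1 - y))]
    assert abs(y_star - a / (a + b)) < 1e-4   # matches D*_G(x)
```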
Global Optimality
Theorem 1. The global minimum of the virtual training criterion

V (D∗G , G) = Ex∼pdata [log D∗G (x)] + Ex∼pmodel [log(1 − D∗G (x))]
            = Ex∼pdata [log (pdata (x) / (pdata (x) + pmodel (x)))]
              + Ex∼pmodel [log (pmodel (x) / (pdata (x) + pmodel (x)))]

is achieved for pmodel = pdata , where D∗G = 1/2 and V (D∗G , G) = − log 4 ≈ −1.386.
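The key step behind this result (spelled out in the original paper, not on the slide) is to add and subtract log 4 and identify the Jensen-Shannon divergence:

```latex
V(D_G^*, G)
  = -\log 4
  + \mathrm{KL}\left(p_{\mathrm{data}} \,\middle\|\, \frac{p_{\mathrm{data}} + p_{\mathrm{model}}}{2}\right)
  + \mathrm{KL}\left(p_{\mathrm{model}} \,\middle\|\, \frac{p_{\mathrm{data}} + p_{\mathrm{model}}}{2}\right)
  = -\log 4 + 2 \, \mathrm{JSD}(p_{\mathrm{data}} \,\|\, p_{\mathrm{model}})
```

Since the JSD is non-negative and zero exactly when the two distributions coincide, the global minimum − log 4 is attained precisely at pmodel = pdata.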
Proposition 2. If G and D have enough capacity, and at each update step the
discriminator D is allowed to reach D = D∗G , and pmodel is updated so as to improve

V (D∗G , pmodel ) = Ex∼pdata [log D∗G (x)] + Ex∼pmodel [log(1 − D∗G (x))]

then pmodel converges to pdata .

Proof. As a function of pmodel , the criterion satisfies

V (D∗G , pmodel ) ∝ sup_D ∫x pmodel (x) log(1 − D(x)) dx

The argument of the supremum is convex in pmodel . The supremum doesn't
change convexity, thus V (D∗G , pmodel ) is also convex in pmodel with global optimum
pmodel = pdata , as shown in Theorem 1.
Assumptions
Remarks on Assumptions:
I The theoretical results above make very strong assumptions:
I The generator G and discriminator D have enough capacity
I The discriminator D reaches its optimum D∗G at every outer iteration
I We directly optimize pmodel instead of its parameters w
I In practice, G and D have finite capacity, D is optimized for only k steps, and
using a neural network to define G introduces critical points in parameter space
I Thus, in practice, GANs often do not converge to pdata and might oscillate
I However, neural networks work well in practice, and balancing the updates of G
and D keeps D close to D∗G in order to backpropagate meaningful gradients to G
1D Example
[Plot sequence at iterations 0, 500, 1000, 1500, 2000, 2500: pdata(x), pmodel(x),
and p(z) over x, with D(x) and the optimal discriminator D∗G(x) overlaid; the value
function decreases as V(G, D∗G) = −0.939, −1.296, −1.363, −1.381, −1.386, −1.386,
approaching the optimum − log 4 ≈ −1.386 from Theorem 1]
Empirical Results
Visualization of Samples from the Model
[Sample grids: CIFAR-10 (ConvNet) and CIFAR-10 (MLP)]
I The rightmost column shows the nearest training example of the neighboring
generated sample, demonstrating that the model has not memorized the training set
Likelihood Estimates
I For GANs as for VAEs, the likelihood of test samples cannot be computed
I However, a rough performance estimate can be obtained by fitting a Gaussian
Parzen window to the samples generated with G and reporting the log-likelihood
27
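A possible implementation of such an estimate (a sketch; the function name is an assumption, and in the paper the bandwidth σ is chosen by cross-validation on a validation set):

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_likelihood(x_test, x_gen, sigma):
    """Mean log-likelihood of x_test [M, D] under a Gaussian Parzen window
    with bandwidth sigma centered on generated samples x_gen [N, D]."""
    N, D = x_gen.shape
    # log p(x) = logsumexp_n(-||x - g_n||^2 / (2 sigma^2))
    #            - log N - (D / 2) log(2 pi sigma^2)
    sq_dists = ((x_test[:, None, :] - x_gen[None, :, :]) ** 2).sum(-1)
    log_const = np.log(N) + 0.5 * D * np.log(2 * np.pi * sigma**2)
    return float(np.mean(logsumexp(-sq_dists / (2 * sigma**2), axis=1)
                         - log_const))
```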
Latent Space Interpolations
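The slide shows images decoded along straight lines between latent codes; a minimal sketch of how such interpolations are produced (G and the latent codes are illustrative assumptions):

```python
import torch

def latent_interpolation(G, z0, z1, steps=8):
    """Decode images along the straight line between latent codes z0 and z1."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * z0.dim()))
    zs = (1 - alphas) * z0 + alphas * z1   # [steps, ...latent shape]
    with torch.no_grad():
        return G(zs)                       # images morphing from G(z0) to G(z1)
```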
Mode Collapse
One major failure mode of GANs is mode collapse, where the generator learns to
produce high-quality samples with very low variability, covering only a fraction of pdata :
[Figure (aiden.nibali.org): a bimodal temperature distribution with modes at the
South Pole and Alice Springs; the generator collapses onto one mode, e.g., the
discriminator can't distinguish Antarctic temperatures but learns that all Australian
temperatures are real, so the generator jumps to that mode, and the cycle repeats]
Mode Collapse
Strategies for avoiding mode collapse:
I Encourage diversity: Minibatch discrimination allows the discriminator to
compare samples across a batch to determine whether the batch is real or fake
I Anticipate counterplay: Look into the future, e.g., via unrolling the discriminator,
and anticipate the opponent's response when updating the generator parameters
I Experience replay: Hopping back and forth between modes can be reduced by
showing old fake samples to the discriminator once in a while (see the sketch below)
I Multiple GANs: Train multiple GANs and hope that together they cover all modes
I Optimization objective: Wasserstein GANs, gradient penalties, . . .
https://aiden.nibali.org/blog/2017-01-18-mode-collapse-gans/
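A sketch of the experience replay idea mentioned above (the ring-buffer design, capacity, and mixing fraction are assumptions):

```python
import random
import torch

class FakeSampleBuffer:
    """Keeps old generator samples and mixes them into D's fake batches."""
    def __init__(self, capacity=1024):
        self.buffer, self.capacity = [], capacity

    def push(self, fakes):
        for f in fakes.detach().cpu():
            if len(self.buffer) < self.capacity:
                self.buffer.append(f)
            else:  # overwrite a random old entry once the buffer is full
                self.buffer[random.randrange(self.capacity)] = f

    def mix(self, fakes, frac=0.5):
        """Replace a fraction of the current fake batch with stored samples."""
        k = min(int(frac * fakes.size(0)), len(self.buffer))
        if k == 0:
            return fakes
        old = torch.stack(random.sample(self.buffer, k)).to(fakes.device)
        out = fakes.clone()
        out[:k] = old
        return out
```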
Unrolled Generative Adversarial Networks
I Top: Unrolled GAN with 10 unrolling steps. Note that unrolled GANs require
backpropagating the generator gradient through the unrolled optimization.
I Bottom: Vanilla GAN. The generator cycles through the modes of the data
distribution, never converges to a fixed distribution, and only ever assigns
significant probability mass to a single data mode at once.
Metz, Poole, Pfau and Sohl-Dickstein: Unrolled Generative Adversarial Networks. ICLR, 2017. 31
Discussion
Advantages:
I A wide variety of functions and distributions can be modeled (flexibility)
I Only backpropagation is required for training the model (no Markov chain sampling)
I No approximation to the likelihood is required (unlike the variational bound in VAEs)
I Samples are often more realistic than those of VAEs (though VAEs are improving as well)
Disadvantages:
I No explicit representation of pmodel
I The likelihood of a sample cannot be evaluated
I The discriminator and generator must be balanced well during training
to ensure convergence to pdata and to avoid mode collapse
I Many tricks required: https://towardsdatascience.com/10-lessons
12.2
GAN Developments
DCGAN
Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 35
DCGAN Samples
I Bedroom samples from a DCGAN look much better than those of a vanilla GAN
Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 36
DCGAN Interpolations
I Interpolations between two random latent codes morph TVs into windows etc.
Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 37
DCGAN Arithmetics
Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 38
GAN Explosion
I https://github.com/hindupuravinash/the-gan-zoo
I https://github.com/soumith/ganhacks
Fréchet Inception Distance
I Evaluating the performance of generative models without exact likelihood is hard
I The Fréchet inception distance (FID) is a metric used to assess the quality of
images created by the generator of a generative adversarial network (GAN)
I The FID compares the distribution of generated images with the distribution of
real images based on deep features of a pre-trained Inception v3 network
I The FID metric is the Fréchet distance between two multivariate Gaussian
distributions: N (µm , Σm ), fit to the Inception v3 features of the images
generated by the GAN, and N (µd , Σd ), fit to the Inception v3 features
computed over the training set:

FID = ‖µm − µd ‖² + tr(Σm + Σd − 2 (Σm Σd )^(1/2))
Heusel, Ramsauer, Unterthiner, Nessler and Hochreiter: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017. 40
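A possible implementation, assuming the Inception v3 features have already been extracted into [N, D] arrays (function and argument names are illustrative):

```python
import numpy as np
from scipy import linalg

def fid(feats_gen, feats_real):
    """Frechet distance between Gaussians fit to two feature sets [N, D]."""
    mu_m, mu_d = feats_gen.mean(axis=0), feats_real.mean(axis=0)
    sigma_m = np.cov(feats_gen, rowvar=False)
    sigma_d = np.cov(feats_real, rowvar=False)
    covmean = linalg.sqrtm(sigma_m @ sigma_d)   # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real                  # discard numerical noise
    return float(((mu_m - mu_d) ** 2).sum()
                 + np.trace(sigma_m + sigma_d - 2.0 * covmean))
```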
Fréchet Inception Distance
I The FID measures image fidelity but cannot measure or prevent mode collapse
Heusel, Ramsauer, Unterthiner, Nessler and Hochreiter: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017. 41
Wasserstein GAN
I Low-dimensional manifolds in high-dimensional space often have little overlap
I The discriminator of a vanilla GAN saturates if the supports do not overlap
I WGAN uses the Earth Mover's distance, which can handle such scenarios
Arjovsky, Chintala and Bottou: Wasserstein GAN. Arxiv, 2017. 42
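For comparison with the vanilla losses above, a sketch of the WGAN critic and generator objectives, with the weight clipping of the original paper (the clip value 0.01 is the paper's default; names are assumptions):

```python
import torch

def critic_loss(D, x_real, x_fake):
    # maximize E[D(x_real)] - E[D(x_fake)]  ->  minimize the negation
    return -(D(x_real).mean() - D(x_fake).mean())

def generator_loss(D, x_fake):
    return -D(x_fake).mean()

def clip_critic_weights(D, c=0.01):
    # enforce the Lipschitz constraint via weight clipping (WGAN, 2017)
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-c, c)
```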
Gradient Penalties and Convergence
Mescheder, Geiger and Nowozin: Which Training Methods for GANs do actually Converge? ICML, 2018. 43
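The R1 regularizer analyzed in this paper penalizes the discriminator's gradient on real data; a minimal sketch (gamma is a hyperparameter, and D is assumed to return a scalar per sample):

```python
import torch

def r1_penalty(D, x_real, gamma=10.0):
    """R1 regularizer: (gamma / 2) * E[ ||grad_x D(x)||^2 ] on real data."""
    x_real = x_real.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(D(x_real).sum(), x_real, create_graph=True)
    return 0.5 * gamma * grad.pow(2).flatten(start_dim=1).sum(dim=1).mean()
```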
Gradient Penalties and Convergence
Mescheder, Geiger and Nowozin: Which Training Methods for GANs do actually Converge? ICML, 2018. 44
CycleGAN
Zhu, Park, Isola and Efros: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV, 2017. 45
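The core addition of CycleGAN to the adversarial losses is a cycle-consistency term; a sketch using the paper's L1 formulation, with generators G: X→Y and F: Y→X (lam = 10 is the paper's weighting):

```python
import torch

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    """L1 cycle loss: F(G(x)) should recover x and G(F(y)) should recover y."""
    return lam * ((F(G(x)) - x).abs().mean() + (G(F(y)) - y).abs().mean())
```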
CycleGAN
Zhu, Park, Isola and Efros: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV, 2017. 46
Progressive Growing of GANs
I Grow the generator and discriminator resolution by adding layers during training
Karras, Aila, Laine and Lehtinen: Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR, 2018. 47
BigGAN
Brock, Donahue and Simonyan: Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2019. 48
BigGAN
Brock, Donahue and Simonyan: Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2019. 49
StyleGAN / StyleGAN2
https://youtu.be/c-NJtV9Jvp0
Karras, Laine, Aittala, Hellsten, Lehtinen and Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR, 2020. 50
StyleGAN / StyleGAN2
http://thispersondoesnotexist.com/
Karras, Laine, Aittala, Hellsten, Lehtinen and Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR, 2020. 52
12.3
Research at AVG
Intelligent systems interact with a 3D world
3D Reconstruction
3D Representations
Key Idea:
I Do not represent the 3D shape explicitly
I Instead, represent the surface implicitly
as the decision boundary of a non-linear classifier:
[Diagram: (3D location, condition, e.g., an image) → occupancy probability]
Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019. 58
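A simplified sketch of this idea (the actual paper uses conditional batch normalization and ResNet blocks; this plain MLP and its dimensions are assumptions):

```python
import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    """Classify 3D points as inside/outside, conditioned on a code c."""
    def __init__(self, c_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + c_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, p, c):
        # p: [B, N, 3] query locations, c: [B, c_dim] condition (e.g. image code)
        c = c[:, None, :].expand(-1, p.size(1), -1)
        logits = self.net(torch.cat([p, c], dim=-1)).squeeze(-1)
        return torch.sigmoid(logits)   # occupancy probability in [0, 1]
```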
Occupancy Networks
Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019. 59
Occupancy Flow
Niemeyer, Mescheder, Oechsle and Geiger: Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics. ICCV, 2019. 60
Conditional Surface Light Fields
[Video: manipulating the illumination of the reconstructed 3D geometry]
Oechsle, Niemeyer, Mescheder, Strauss and Geiger: Learning Implicit Surface Light Fields. 3DV, 2020. 61
Convolutional Occupancy Networks
Room-Level Reconstruction
Peng, Niemeyer, Mescheder, Pollefeys and Geiger: Convolutional Occupancy Networks. ECCV, 2020. 62
Generative Radiance Fields
[Diagram: ray sampling and 3D point sampling feed a conditional radiance field;
volume rendering along the sampled rays yields a predicted patch, which a patch
discriminator compares against real patches]
Schwarz, Liao, Niemeyer, Geiger: GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis. NeurIPS, 2020 63
Neural Parts
Paschalidou, Katharopoulos, Geiger and Fidler: Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks, 2021. 64
Counterfactual Generative Networks
[Diagram: CGN architecture built from BigGAN and U2-Net components together with a cGAN]
Sauer and Geiger: Counterfactual Generative Networks. ICLR, 2021.
Label Efficient Visual Abstractions
Behl, Chitta, Prakash, Ohn-Bar and Geiger: Label Efficient Visual Abstractions for Autonomous Driving. IROS, 2020. 66
Neural Attention Fields for End-to-End Autonomous Driving
Chitta, Prakash and Geiger: Neural Attention Fields for End-to-End Autonomous Driving. 2021. 67
Visit our Research Blog
http://autonomousvision.github.io
Thank you for listening!
Good luck for the exam!