
Deep Learning

Lecture 12 – Generative Adversarial Networks

Prof. Dr.-Ing. Andreas Geiger


Autonomous Vision Group
University of Tübingen / MPI-IS
Agenda

12.1 Generative Adversarial Networks

12.2 GAN Developments

12.3 Research at AVG

2
12.1
Generative Adversarial Networks
Recap: Latent Variable Models
LVMs map between observation space x ∈ RD and latent space z ∈ RQ :

fw : x 7→ z    gw : z 7→ x̂
I One latent variable gets associated with each data point in the training set
I The latent vectors are smaller than the observations (Q < D) ⇒ compression
I Models are linear or non-linear, deterministic or stochastic, with/without encoder

A little taxonomy:

                           Deterministic                   Probabilistic
Linear                     Principal Component Analysis    Probabilistic PCA
Non-Linear w/ Encoder      Autoencoder                     Variational Autoencoder
Non-Linear w/o Encoder     Gen. Adv. Networks

4
Generative Models

I The term generative model refers to any model that takes a dataset drawn from
pdata and learns a probability distribution pmodel to represent pdata
I In some cases, the model estimates pmodel explicitly and therefore allows for
evaluating the (approximate) likelihood/density pmodel (x) of a sample x
I In other cases, the model is only able to generate samples from pmodel
I GANs are prominent examples of this family of implicit models
I They provide a framework for training models without explicit likelihood

5
Generative Models

[Figure: taxonomy of generative models — Goodfellow: Tutorial on Generative Adversarial Networks, 2017]

6
Generative Adversarial Networks
Generative Adversarial Networks

I VAEs approximate the intractable likelihood using a recognition model


I GANs give up on explicitly modeling the density/likelihood
I Instead, they use an adversarial process in which two models (“players”) are
trained simultaneously, also referred to as two-player game:
I A generator G that captures the data distribution
I A discriminator D that estimates whether a sample came from the data distribution
I The goal of G is to maximize the probability of D making a mistake – to fool it
I Backpropagation can be used to optimize this two-player game
I No approximate inference or slow Markov chains are necessary
I Theoretical results (optimality, convergence) require strong assumptions

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 8
Generative Adversarial Networks

Let x ∈ RD denote an observation and p(z) a prior over latent variables z ∈ RQ .


Let GwG : RQ 7→ RD denote the generator network with induced distribution pmodel .
Let DwD : RD 7→ [0, 1] denote the discriminator network which outputs a probability.

D and G play the following two-player minimax game with value function V (D, G):

G∗, D∗ = argmin_G argmax_D V(D, G)

V(D, G) = E_{x∼pdata} [log D(x)] + E_{z∼p(z)} [log(1 − D(G(z)))]

We train D to assign probability one to samples from pdata and zero to samples from
pmodel , and G to fool D such that it assigns probability one to samples from pmodel .

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 9
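As a concrete reading of this objective, the value function can be estimated from a single minibatch. Below is a minimal PyTorch sketch (not part of the lecture), assuming `D` outputs probabilities in (0, 1) and `G` maps latent codes to observations.

```python
import torch

def value_estimate(D, G, x_real, z):
    """Monte Carlo estimate of V(D, G) from one minibatch.

    x_real: samples from p_data(x); z: samples from the prior p(z).
    D is trained to ascend this quantity, G to descend it.
    """
    term_data  = torch.log(D(x_real)).mean()        # E_{x~p_data}[log D(x)]
    term_model = torch.log(1 - D(G(z))).mean()      # E_{z~p(z)}[log(1 - D(G(z)))]
    return term_data + term_model
```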
Generative Adversarial Networks

Generator Discriminator
Network Network

Discriminator
Network

I The generator and discriminator can be implemented as MLPs, ConvNets, RNNs


I The discriminator network can be considered a learned loss function on x̂
I After training, the generator is kept to represent pmodel and sample from pmodel
Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 10
Generative Adversarial Networks

I Theoretical analysis shows that this minimax game recovers pmodel = pdata
if G and D are given enough capacity and assuming that D∗ can be reached
I In practice, however, we must use iterative numerical optimization and optimizing
D in the inner loop to completion is computationally prohibitive and would lead to
overfitting on finite datasets
I Therefore, we resort to alternating optimization:
I k steps of optimizing D (typically k ∈ {1, . . . , 5})
I 1 step of optimizing G (using a small enough learning rate)
I This way, we maintain D near its optimal solution as long as G changes slowly

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 11
Algorithm
While not converged do
  1. For k steps do
     1.1 Draw B training samples {x1, . . . , xB} from pdata(x)
     1.2 Draw B latent samples {z1, . . . , zB} from p(z)
     1.3 Update the discriminator D by ascending its stochastic gradient:

         ∇_wD (1/B) Σ_{b=1}^{B} [ log D(xb) + log(1 − D(G(zb))) ]

  2. Draw B latent samples {z1, . . . , zB} from p(z)
  3. Update the generator G by descending its stochastic gradient:

         ∇_wG (1/B) Σ_{b=1}^{B} log(1 − D(G(zb)))

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 12
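A minimal PyTorch sketch of this alternating scheme on toy 2D data is shown below. It is an illustrative reading of the pseudocode above, not the authors' reference code; the network sizes, learning rates, and the synthetic data distribution are assumptions.

```python
import torch
import torch.nn as nn

Q, B, k = 8, 64, 1                    # latent dim, batch size, discriminator steps (assumed)
G = nn.Sequential(nn.Linear(Q, 64), nn.ReLU(), nn.Linear(64, 2))
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_D = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_G = torch.optim.SGD(G.parameters(), lr=1e-3)
eps = 1e-8                            # numerical stability inside the logs

def sample_data(n):                   # stand-in for drawing from p_data(x)
    return 0.5 * torch.randn(n, 2) + torch.tensor([2.0, -1.0])

for step in range(10000):
    # 1. k steps of ascending the discriminator objective
    for _ in range(k):
        x, z = sample_data(B), torch.randn(B, Q)
        loss_D = -(torch.log(D(x) + eps) + torch.log(1 - D(G(z).detach()) + eps)).mean()
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2./3. one step of descending the generator objective log(1 - D(G(z)))
    z = torch.randn(B, Q)
    loss_G = torch.log(1 - D(G(z)) + eps).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```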
The Gradient Trick

I Early in training, when G is poor, D rejects samples with high confidence
I Thus log(1 − D(G(z))) saturates
I Instead of training G to minimize log(1 − D(G(z))), we can train G to maximize log(D(G(z)))
I This results in the same fixed point but provides stronger gradients early on during training

[Plot: log(D(G(z))) and log(1 − D(G(z))) as functions of D(G(z)) ∈ [0, 1]]

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 13
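In code, the trick changes only the generator objective; a small sketch (PyTorch, assuming `D`, `G`, and a latent batch `z` as in the earlier training-loop sketch):

```python
import torch

# Saturating loss from the original minimax game: gradients vanish when D(G(z)) is near 0
loss_G_saturating = torch.log(1 - D(G(z))).mean()      # minimize

# Non-saturating alternative: same fixed point, stronger gradients early in training
loss_G_nonsaturating = -torch.log(D(G(z))).mean()      # minimize (i.e. maximize log D(G(z)))
```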
Expressiveness

I Similar to a VAE decoder, the generator in GANs is very expressive


I Consider random samples p(z) = N (0, I) mapped via GwG (z) = z/10 + z/‖z‖
I The generator GwG (z) is a neural network with learned parameters wG

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 14
1D Example
A simple example with linear generator:

x ∼ N (µ, σ)
z ∼ N (0, 1)
G(z) = w0G + w1G z
D(x) = σ(w0D + w1D x + w2D x²)

I Here, we consider the data distribution and the prior as two different 1D
Gaussians and initialize G(z) = z and D(x) = σ(x), i.e., pmodel (x) = N (x|0, 1)
I The goal is to learn wG and wD such that pmodel (x) = pdata (x) = N (x|µ, σ)
I Remark: The x² term is needed to provide gradients for the second moment

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 15
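For illustration only (this is not the lecture's code), the specific generator and discriminator of this 1D example could be written as follows; the parameter names mirror the slide, and training proceeds with the alternating updates shown earlier.

```python
import torch

class G1D(torch.nn.Module):
    """Linear generator G(z) = w0 + w1 * z."""
    def __init__(self):
        super().__init__()
        self.w0 = torch.nn.Parameter(torch.zeros(1))
        self.w1 = torch.nn.Parameter(torch.ones(1))     # init G(z) = z, i.e. pmodel = N(0, 1)

    def forward(self, z):
        return self.w0 + self.w1 * z

class D1D(torch.nn.Module):
    """Discriminator D(x) = sigmoid(w0 + w1*x + w2*x^2); the x^2 term provides
    gradients for matching the second moment."""
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.tensor([0.0, 1.0, 0.0]))   # init D(x) = sigmoid(x)

    def forward(self, x):
        return torch.sigmoid(self.w[0] + self.w[1] * x + self.w[2] * x ** 2)
```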
1D Example

[Plots: pdata(x), pmodel(x), p(z), and the discriminator D(x) at iterations 0, 500, 1000, 1500, 2000 and 2500 of training]

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 16

1D Example

[Plots: the same quantities shown at iterations 0 and 4000]

Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville, Bengio: Generative Adversarial Networks. NIPS, 2014. 17
Theoretical Results
Generative Adversarial Networks

Let x ∈ RD denote an observation and p(z) a prior over latent variables z ∈ RQ .


Let GwG : RQ 7→ RD denote the generator network with induced distribution pmodel .
Let DwD : RD 7→ [0, 1] denote the discriminator network which outputs a probability.

D and G play the following two-player minimax game with value function V (D, G):

G∗, D∗ = argmin_G argmax_D V(D, G)

V(D, G) = E_{x∼pdata} [log D(x)] + E_{z∼p(z)} [log(1 − D(G(z)))]

We train D to assign probability one to samples from pdata and zero to samples from
pmodel , and G to fool D such that it assigns probability one to samples from pmodel .

19
Optimal Discriminator
Proposition 1. For any given generator G, the optimal discriminator D is:

    D∗_G(x) = pdata(x) / (pdata(x) + pmodel(x))

Proof. The training criterion for the discriminator D is to maximize (wrt. D):

    V(D, G) = ∫_x pdata(x) log(D(x)) dx + ∫_z p(z) log(1 − D(G(z))) dz
            = ∫_x [ pdata(x) log(D(x)) + pmodel(x) log(1 − D(x)) ] dx

For any (a, b) ∈ R² \ {(0, 0)}, the function y 7→ a log(y) + b log(1 − y) achieves its
maximum in [0, 1] at a/(a + b). The discriminator D does not need to be defined outside
Supp(pdata) ∪ Supp(pmodel), where pdata = 0 and pmodel = 0, concluding the proof. □
20
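The key step of the proof — that y 7→ a log(y) + b log(1 − y) is maximized at a/(a + b) — is easy to verify numerically; a tiny illustrative sketch:

```python
import numpy as np

a, b = 0.7, 0.3                         # play the roles of pdata(x) and pmodel(x) at a fixed x
y = np.linspace(1e-4, 1 - 1e-4, 100000)
f = a * np.log(y) + b * np.log(1 - y)
print(y[np.argmax(f)], a / (a + b))     # both approximately 0.7
```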
Global Optimality
Theorem 1. The global minimum of the virtual training criterion

    V(D∗_G, G) = E_{x∼pdata}[log D∗_G(x)] + E_{x∼pmodel}[log(1 − D∗_G(x))]
               = E_{x∼pdata}[ log ( pdata(x) / (pdata(x) + pmodel(x)) ) ]
                 + E_{x∼pmodel}[ log ( pmodel(x) / (pdata(x) + pmodel(x)) ) ]

is achieved for pmodel = pdata, where D∗_G = 1/2 and V(D∗_G, G) = − log 4 ≈ −1.386.

Proof. Reformulation in terms of the non-negative Jensen-Shannon divergence yields:

    V(D∗_G, G) = KL(pdata, pdata + pmodel) + KL(pmodel, pdata + pmodel)
               = − log 4 + KL(pdata, (pdata + pmodel)/2) + KL(pmodel, (pdata + pmodel)/2)
               = − log 4 + JSD(pdata, pmodel)  □
21
Convergence

Proposition 2. If G and D have enough capacity, and at each update step the
discriminator D is allowed to reach D = D∗_G, and pmodel is updated to improve

    V(D∗_G, pmodel) = E_{x∼pdata}[log D∗_G(x)] + E_{x∼pmodel}[log(1 − D∗_G(x))]
                    ∝ sup_D ∫_x pmodel(x) log(1 − D(x)) dx

then pmodel converges to pdata.

Proof. The argument of the supremum is convex in pmodel. The supremum doesn't
change convexity, thus V(D∗_G, pmodel) is also convex in pmodel with global optimum
pmodel = pdata as shown in Theorem 1. □

22
Assumptions
Remarks on Assumptions:
I The theoretical results above make very strong assumptions:
I The generator G and discriminator D have enough capacity
I The discriminator D reaches its optimum D∗_G at every outer iteration
at every outer iteration
I We directly optimize pmodel instead of its parameters w
I In practice, G and D have finite capacity, D is optimized for only k steps, and
using a neural network to define G introduces critical points in parameter space.
I Thus, in practice, GANs often do not converge to pdata and might oscillate
I However, neural networks work well in practice, and balancing the updates of G
and D keeps D close to D∗_G in order to backpropagate meaningful gradients to G

I See also: https://srome.github.io/An-Annotated-Proof

23
1D Example

[Plots: pdata(x), pmodel(x), p(z), the learned discriminator D(x) and the optimal discriminator D∗_G(x) at iterations 0, 500, 1000, 1500, 2000 and 2500, with V(G, D∗_G) = 0.939, 1.296, 1.363, 1.381, 1.386 and 1.386, respectively]

24
Empirical Results
Visualization of Samples from the Model

[Sample grids: Toronto Faces (MLP), MNIST (MLP), CIFAR-10 (MLP), CIFAR-10 (ConvNet)]

I The rightmost column shows the training example nearest to the neighboring generated sample
26
Likelihood Estimates

I For GANs as for VAEs, the likelihood of test samples cannot be computed
I However, a rough performance estimate can be obtained by fitting a Gaussian
Parzen window to the samples generated with G and reporting the log-likelihood

27
Latent Space Interpolations

I Digits obtained by linearly interpolating between coordinates in latent space

28
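Such interpolations are cheap to produce once training is done; a minimal sketch, assuming a trained generator `G` and latent dimension `Q`:

```python
import torch

z0, z1 = torch.randn(Q), torch.randn(Q)                  # two random latent codes
with torch.no_grad():
    frames = [G(((1 - t) * z0 + t * z1).unsqueeze(0))     # linear interpolation in z-space
              for t in torch.linspace(0, 1, steps=8)]
```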
Mode Collapse
One major failure mode of GANs is mode collapse, where the generator learns to
produce high-quality samples with very low variability, covering only a fraction of pdata:

1. The generator learns to fool the discriminator by producing values close to Antarctic temperatures
2. The discriminator can't distinguish Antarctic temperatures but learns that all Australian temperatures are real
3. The generator learns that it should produce Australian temperatures and abandons the Antarctic mode
4. The discriminator can't distinguish Australian temperatures but learns that all Antarctic temperatures are real
5. Repeat

[Illustration: bimodal temperature distribution (°C) with a South Pole mode near −20 °C and an Alice Springs mode near 35 °C]
29
Mode Collapse
Strategies for avoiding mode collapse:
I Encourage diversity: Minibatch discrimination allows the discriminator to
compare samples across a batch to determine whether the batch is real or fake (a simplified sketch follows below)
I Anticipate counterplay: Look into the future, e.g., via unrolling the discriminator,
and anticipate counterplay when updating generator parameters
I Experience replay: Hopping back and forth between modes can be minimised by
showing old fake samples to the discriminator once in a while
I Multiple GANs: Train multiple GANs and hope that they cover all modes.
I Optimization Objective: Wasserstein GANs, Gradient penalties, . . .

https://aiden.nibali.org/blog/2017-01-18-mode-collapse-gans/
30
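As a concrete example of the "encourage diversity" idea, the minibatch standard-deviation feature (a simplified relative of minibatch discrimination, used e.g. in later progressive-growing GANs) gives the discriminator batch-level statistics; a hedged sketch, assuming image-shaped discriminator features:

```python
import torch

def minibatch_stddev_feature(h):
    """Append the average per-feature standard deviation across the batch as one
    extra constant channel, so the discriminator can detect low-variability batches.

    h: discriminator feature maps of shape (B, C, H, W).
    """
    std = h.std(dim=0)                                 # (C, H, W): spread across the batch
    mean_std = std.mean()                              # one scalar summary of batch diversity
    extra = mean_std * torch.ones_like(h[:, :1])       # broadcast to shape (B, 1, H, W)
    return torch.cat([h, extra], dim=1)                # (B, C + 1, H, W)
```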
Unrolled Generative Adversarial Networks

I Top: Unrolled GAN with 10 unrolling steps. Note that unrolled GANs require
backpropagating the generator gradient through the unrolled optimization.
I Bottom: Vanilla GAN. The generator cycles through the modes of the data
distribution, never converges to a fixed distribution, and only ever assigns
significant probability mass to a single data mode at once.

Metz, Poole, Pfau and Sohl-Dickstein: Unrolled Generative Adversarial Networks. ICLR, 2017. 31
Discussion
Advantages:
I A wide variety of functions and distributions can be modeled (flexibility)
I Only backpropagation required for training the model (no sampling)
I No approximation to the likelihood required as in VAEs
I Samples often more realistic than those of VAEs (but VAEs progress as well)

Disadvantages:
I No explicit representation of pmodel
I Sample likelihood cannot be evaluated
I The discriminator and generator must be balanced well during training
to ensure convergence to pdata and to avoid mode collapse
I Many tricks required: https://towardsdatascience.com/10-lessons
32
12.2
GAN Developments
GAN Developments

34
DCGAN

Architecture Guidelines for stable Deep Convolutional GANs:


I Replace any pooling layers with strided convolutions (discriminator) and
fractional strided convolutions for upsampling (generator)
I Use batch normalization in both the generator and the discriminator
I Remove fully connected hidden layers for deeper architectures
I Use ReLU activations in the generator except for the output which uses tanh
I Use Leaky ReLU activations in the discriminator for all layers
Found to work well through systematic experimentation in the DCGAN paper.

Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 35
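Following these guidelines, a DCGAN-style generator and discriminator for 64×64 RGB images might look like the sketch below. This is an illustration of the guidelines, not the paper's exact architecture; the latent size `nz` and channel widths `ngf`, `ndf` are assumed values.

```python
import torch.nn as nn

nz, ngf, ndf = 100, 64, 64   # latent size and base channel widths (assumptions)

# Generator: fractionally-strided convs + BatchNorm + ReLU, tanh output, no pooling or FC layers.
# Input: latent code of shape (B, nz, 1, 1); output: image of shape (B, 3, 64, 64).
G = nn.Sequential(
    nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False), nn.BatchNorm2d(ngf * 8), nn.ReLU(True),      # 4x4
    nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 4), nn.ReLU(True),  # 8x8
    nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf * 2), nn.ReLU(True),  # 16x16
    nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False), nn.BatchNorm2d(ngf), nn.ReLU(True),          # 32x32
    nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False), nn.Tanh(),                                         # 64x64
)

# Discriminator: strided convs + BatchNorm + LeakyReLU, no pooling or FC layers.
D = nn.Sequential(
    nn.Conv2d(3, ndf, 4, 2, 1, bias=False), nn.LeakyReLU(0.2, True),                                    # 32x32
    nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 2), nn.LeakyReLU(0.2, True),     # 16x16
    nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 4), nn.LeakyReLU(0.2, True), # 8x8
    nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(ndf * 8), nn.LeakyReLU(0.2, True), # 4x4
    nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                                           # 1x1 probability
)
```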
DCGAN Samples

I Bedroom samples from a DCGAN look much better compared to vanilla GAN

Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 36
DCGAN Interpolations

I Interpolations between two random latent codes morph TVs into windows etc.

Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 37
DCGAN Arithmetics

I Vector arithmetic on averaged z vectors of the samples (+ Gaussian noise)

Radford, Metz and Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR, 2016. 38
GAN Explosion

I https://github.com/hindupuravinash/the-gan-zoo
I https://github.com/soumith/ganhacks
39
Fréchet Inception Distance
I Evaluating the performance of generative models without exact likelihood is hard
I The Fréchet inception distance (FID) is a metric used to assess the quality of
images created by the generator of a generative adversarial network (GAN)
I The FID compares the distribution of generated images with the distribution of
real images based on deeper features of a pre-trained Inception v3 network
I The FID metric is the Fréchet distance between two multidimensional Gaussian
distributions: N (µm , Σm ), the distribution of the Inception v3 features of the
images generated by a GAN and N (µd , Σd ), the distribution of the Inception v3
features computed over the training set:

FID = ‖µm − µd‖₂² + Tr(Σm + Σd − 2(Σm Σd)^(1/2))

Heusel, Ramsauer, Unterthiner, Nessler and Hochreiter: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017. 40
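Given feature means and covariances for generated and real images, the FID formula can be evaluated directly; a minimal NumPy/SciPy sketch (the Inception v3 feature extraction itself is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu_m, sigma_m, mu_d, sigma_d):
    """FID between N(mu_m, Sigma_m) (generated features) and N(mu_d, Sigma_d) (real features)."""
    diff = mu_m - mu_d
    covmean = linalg.sqrtm(sigma_m @ sigma_d)      # matrix square root of Sigma_m Sigma_d
    if np.iscomplexobj(covmean):                   # drop tiny imaginary parts from numerics
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma_m + sigma_d - 2.0 * covmean))
```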
Fréchet Inception Distance

I The FID measures image fidelity but cannot measure or prevent mode collapse

Heusel, Ramsauer, Unterthiner, Nessler and Hochreiter: GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. NIPS, 2017. 41
Wasserstein GAN

I Low dimensional manifolds in high dimension space often have little overlap
I The discriminator of a vanilla GAN saturates if there is no overlapping support
I WGANs use the Earth Mover's distance, which can handle such scenarios
Arjovsky, Chintala and Bottou: Wasserstein GAN. Arxiv, 2017. 42
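For comparison with the vanilla GAN objective, a hedged sketch of the original WGAN losses (weight-clipping variant) is shown below, assuming a critic `f` without a sigmoid output:

```python
import torch

def wgan_critic_loss(f, x_real, x_fake):
    # Maximize E[f(x_real)] - E[f(x_fake)]; return the negative for a gradient-descent optimizer
    return -(f(x_real).mean() - f(x_fake).mean())

def wgan_generator_loss(f, x_fake):
    return -f(x_fake).mean()                 # push generated samples towards higher critic scores

def clip_critic_weights(f, c=0.01):
    # Crude enforcement of the Lipschitz constraint used in the original WGAN
    with torch.no_grad():
        for p in f.parameters():
            p.clamp_(-c, c)
```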
Gradient Penalties and Convergence

I Adding a gradient penalty wrt. the gradients of D stabilizes GAN training:

    V(D, G) = E_{x∼pdata(x)} [ log D(x) − λ‖∇x D(x)‖² ] + E_{z∼p(z)} [log(1 − D(G(z)))]

Mescheder, Geiger and Nowozin: Which Training Methods for GANs do actually Converge? ICML, 2018. 43
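In PyTorch, the penalty on the discriminator's gradients at real samples can be implemented with a double-backward pass; a minimal sketch (assuming `D` outputs probabilities, as in the earlier sketches):

```python
import torch

def d_loss_with_gradient_penalty(D, G, x_real, z, lam=10.0, eps=1e-8):
    x_real = x_real.detach().requires_grad_(True)
    d_real = D(x_real)

    # lambda * ||grad_x D(x)||^2 on real samples, averaged over the batch
    grad = torch.autograd.grad(d_real.sum(), x_real, create_graph=True)[0]
    penalty = grad.reshape(grad.size(0), -1).pow(2).sum(dim=1).mean()

    d_fake = D(G(z).detach())
    # Negative of the penalized value function: minimizing this ascends V(D, G)
    return -(torch.log(d_real + eps).mean() - lam * penalty
             + torch.log(1 - d_fake + eps).mean())
```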
Gradient Penalties and Convergence

I A DCGAN with gradient penalties leads to high-quality samples without hacks

Mescheder, Geiger and Nowozin: Which Training Methods for GANs do actually Converge? ICML, 2018. 44
CycleGAN

CycleGAN Image-to-Image Translation:


I Learn forward and backward mappings between two domains (domains = latents)
I Use cycle-consistency and adversarial losses to constrain this mapping

Zhu, Park, Isola and Efros: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV, 2017. 45
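The cycle-consistency term is compact to write down; an illustrative sketch, assuming generators `G: X → Y` and `F: Y → X` are already defined and the adversarial losses are handled separately:

```python
import torch

def cycle_consistency_loss(G, F, x, y, lam=10.0):
    # x from domain X, y from domain Y; L1 reconstruction error after a round trip
    loss_x = (F(G(x)) - x).abs().mean()     # X -> Y -> X
    loss_y = (G(F(y)) - y).abs().mean()     # Y -> X -> Y
    return lam * (loss_x + loss_y)
```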
CycleGAN

Zhu, Park, Isola and Efros: Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. ICCV, 2017. 46
Progressive Growing of GANs

I Grow the generator and discriminator resolution by adding layers during training

Karras, Aila, Laine and Lehtinen: Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR, 2018. 47
BigGAN

I Scale class-conditional GANs to ImageNet (512²) without progressive growing


I Key: more parameters, larger minibatches, orthogonal regularization of G
I Explore variants of spectral normalization and gradient penalties for D
I Analyze trade-off between stability (regularization) and performance (FID)

Brock, Donahue and Simonyan: Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2019. 48
BigGAN

I Monitor singular values of weight matrices of generator and discriminator


I Found early stopping leads to better FID scores compared to regularizing D

Brock, Donahue and Simonyan: Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR, 2019. 49
StyleGAN / StyleGAN2

https://youtu.be/c-NJtV9Jvp0
Karras, Laine, Aittala, Hellsten, Lehtinen and Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR, 2020. 50
StyleGAN / StyleGAN2

I Complex stochastic variation with different realizations of input noise


Karras, Laine, Aittala, Hellsten, Lehtinen and Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR, 2020. 51
StyleGAN / StyleGAN2

http://thispersondoesnotexist.com/
Karras, Laine, Aittala, Hellsten, Lehtinen and Aila: Analyzing and Improving the Image Quality of StyleGAN. CVPR, 2020. 52
12.3
Research at AVG
Intelligent systems interact with a 3D world
3D Reconstruction

[Pipeline: Input Images → Neural Network → 3D Reconstruction]

56
3D Representations

Voxels Points Meshes


[Maturana et al., IROS 2015] [Fan et al., CVPR 2017] [Groueix et al., CVPR 2018]
57
Occupancy Networks

Key Idea:
I Do not represent 3D shape explicitly
I Instead, consider surface implicitly
as decision boundary of a non-linear classifier:

[Inputs: 3D Location and Condition (e.g., Image) → Output: Occupancy Probability]

Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019. 58
Occupancy Networks

Mescheder, Oechsle, Niemeyer, Nowozin and Geiger: Occupancy Networks: Learning 3D Reconstruction in Function Space. CVPR, 2019. 59
Occupancy Flow

Niemeyer, Mescheder, Oechsle and Geiger: Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics. ICCV, 2019. 60
Conditional Surface Light Fields
[Pipeline: Input Image + 3D Geometry → Neural Network; applications: manipulating the illumination, changing the viewpoint]

Oechsle, Niemeyer, Mescheder, Strauss and Geiger: Learning Implicit Surface Light Fields. 3DV, 2020. 61
Convolutional Occupancy Networks

Room-Level Reconstruction

Peng, Niemeyer, Mescheder, Pollefeys and Geiger: Convolutional Occupancy Networks. ECCV, 2020. 62
Generative Radiance Fields

[GRAF pipeline: ray sampling and 3D point sampling → conditional radiance field → volume rendering → predicted patch; a patch discriminator compares predicted patches against real patches sampled from images]

Schwarz, Liao, Niemeyer, Geiger: GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis. NeurIPS, 2020 63
Neural Parts

Paschalidou, Katharopoulos, Geiger and Fidler: Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks, 2021. 64
Counterfactual Generative Networks

[Architecture diagram: CGN built from BigGAN, U2-Net and cGAN components]
Sauer and Geiger: Counterfactual Generative Networks. ICLR, 2021. 65


Label Efficient Visual Abstractions

Behl, Chitta, Prakash, Ohn-Bar and Geiger: Label Efficient Visual Abstractions for Autonomous Driving. IROS, 2020. 66
Neural Attention Fields for End-to-End Autonomous Driving

Chitta, Prakash and Geiger: Neural Attention Fields for End-to-End Autonomous Driving. 2021. 67
Visit our Research Blog

http://autonomousvision.github.io
68
Thank you for listening!
Good luck for the exam!
