Lecture 11: Generative Models
Supervised Learning
Data: (x, y), where x is data and y is the label.
Goal: Learn a function to map x -> y.
Examples: classification, regression, object detection, semantic segmentation, image captioning, etc.
(Example: an image mapped to the label "Cat". This image is CC0 public domain.)
Unsupervised Learning
Data: x. Just data, no labels!
Goal: Model p(x).
Examples: clustering (e.g. K-means), dimensionality reduction, feature learning, density estimation (1-d or 2-d), etc.
(K-means clustering image and 2-d density images are CC0 public domain.)
Generative Models
Given training data ~ pdata(x), learn a model pmodel(x), then sample from it.
Objectives:
1. Learn pmodel(x) that approximates pdata(x)
2. Sample new x from pmodel(x)
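One common way to make objective 1 precise (a standard maximum-likelihood formulation, added here for concreteness rather than taken from the slides) is:

\theta^{*} \;=\; \arg\max_{\theta}\; \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log p_{\mathrm{model}}(x;\theta)\big] \;\approx\; \arg\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\log p_{\mathrm{model}}\big(x^{(i)};\theta\big)

Explicit-density models optimize this directly; other families (as in the taxonomy below) only approximate it or avoid it entirely.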
Taxonomy of Generative Models
- Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN), NICE / RealNVP, Glow, Ffjord
- Approximate density: Variational (Variational Autoencoder), Markov Chain (Boltzmann Machine)
- Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
PixelRNN: an RNN models pixels autoregressively. Hidden states h0, h1, h2, ... are updated as pixels x1, x2, ..., xn-1 are fed in, and each step predicts the next pixel x2, x3, ..., xn.
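The tractable density these autoregressive models maximize is the chain-rule factorization of the image likelihood (the standard PixelRNN/PixelCNN formulation):

p(x) \;=\; \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})

Training maximizes the likelihood of the training images under this factorization; generation is sequential, one pixel at a time, which is why sampling is slow.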
(Taxonomy of Generative Models figure repeated: PixelRNN/CNN sit under tractable density; next up are Variational Autoencoders, under approximate density.)
Variational Autoencoders define an intractable density: we cannot optimize it directly, so we derive and optimize a lower bound on the likelihood instead. Why a latent z? Some background first: autoencoders.
An autoencoder maps input data through an encoder to a lower-dimensional feature vector z, and through a decoder back to a reconstruction of the input.
Q: Why dimensionality reduction?
A: We want the features to capture meaningful factors of variation in the data.
Train the encoder and decoder so that the features can reconstruct the original input data; after training, throw away the decoder.
The encoder can then be used to initialize a supervised model: put a classifier on top of the features, train for the final task (sometimes with small data), and fine-tune the encoder jointly with the classifier to produce the predicted label.
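A minimal PyTorch sketch of this pipeline; the layer sizes and training details are illustrative assumptions, not from the slides:

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, feature_dim=32):
        super().__init__()
        # Encoder: input data -> lower-dimensional features z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )
        # Decoder: features z -> reconstructed input
        self.decoder = nn.Sequential(
            nn.Linear(feature_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = Autoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)           # stand-in minibatch of flattened images
x_hat = model(x)
loss = ((x_hat - x) ** 2).mean()  # L2 reconstruction loss; no labels used
loss.backward()
opt.step()
# After training: discard model.decoder, keep model.encoder to initialize
# a supervised classifier and fine-tune it on a (possibly small) labeled set.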
Variational Autoencoders assume the training data is generated from an unobserved latent representation z: sample z from the true prior, then sample x from the true conditional, which the decoder network represents as pθ(x|z).
Posterior density: pθ(z|x) = pθ(x|z) pθ(z) / pθ(x) is intractable, because the data likelihood pθ(x) requires integrating over all z.
Solution: In addition to modeling pθ(x|z) with the decoder, learn an encoder qɸ(z|x) that approximates the true posterior pθ(z|x).
We will see that the approximate posterior lets us derive a tractable lower bound on the data likelihood, which we can optimize.
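For reference, here is the decomposition this derivation arrives at, written out explicitly (the standard bound from Kingma and Welling):

\log p_{\theta}(x) \;=\; \underbrace{\mathbb{E}_{z\sim q_{\phi}(z|x)}\big[\log p_{\theta}(x\mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_{\phi}(z\mid x)\,\|\,p(z)\big)}_{\text{ELBO: tractable, optimize this}} \;+\; \underbrace{D_{\mathrm{KL}}\big(q_{\phi}(z\mid x)\,\|\,p_{\theta}(z\mid x)\big)}_{\ge 0,\ \text{intractable}}

Because the last KL term is nonnegative, the first two terms form a lower bound (the ELBO) on log pθ(x), and training maximizes it jointly over θ and ɸ.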
We want to maximize the data likelihood. Term by term:
- The decoder network gives pθ(x|z), so we can compute an estimate of the first term through sampling (we need some trick to differentiate through the sampling).
- The KL term between the Gaussian encoder distribution and the z prior has a nice closed-form solution!
- pθ(z|x) is intractable (as we saw earlier), so we can't compute the last KL term :( But we know KL divergence is always >= 0.
Putting it together, the training-time forward pass:
- Input data goes into the encoder network (parameters ɸ), which outputs the approximate posterior qɸ(z|x); the KL term makes this approximate posterior distribution close to the prior.
- Sample z from qɸ(z|x).
- z goes into the decoder network (parameters θ), which outputs pθ(x|z); the first term reconstructs the input data.
For every minibatch of input data: compute this forward pass, and then backprop!
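A minimal PyTorch sketch of one such minibatch step, using the reparameterization trick to differentiate through the sampling; the architecture and the Bernoulli decoder are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.mu = nn.Linear(h, z_dim)       # mean of q_phi(z|x)
        self.logvar = nn.Linear(h, z_dim)   # log-variance of q_phi(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x):
        hid = self.enc(x)
        mu, logvar = self.mu(hid), self.logvar(hid)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
        # so the sampling step is differentiable w.r.t. mu and logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar      # decoder outputs logits for p_theta(x|z)

def neg_elbo(x, logits, mu, logvar):
    # Reconstruction term (Bernoulli decoder assumed here).
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
    # Closed-form KL(q_phi(z|x) || N(0, I)) for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl                       # minimizing this maximizes the ELBO

vae = VAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = torch.rand(64, 784)                     # stand-in minibatch with values in [0, 1]
logits, mu, logvar = vae(x)
loss = neg_elbo(x, logits, mu, logvar)
opt.zero_grad(); loss.backward(); opt.step()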
Generating data: after training, sample z from the prior and pass it through the decoder network to sample x from pθ(x|z) (Kingma and Welling, "Auto-Encoding Variational Bayes", ICLR 2014).
Varying z1 and z2 shows that different dimensions of z encode interpretable factors of variation (e.g. head pose).
(Taxonomy of Generative Models figure repeated; next: Generative Adversarial Networks.)
So far, we could not optimize the likelihood directly and instead derived and optimized a lower bound on it. What if we give up on explicitly modeling density, and just want the ability to sample?
Solution: sample from a simple distribution we can easily sample from, e.g. random noise, and learn a transformation to the training distribution. Input: random noise z goes into a generator network; output: a sample from the training distribution. But we don't know which sample z maps to which training image, so we can't learn by reconstructing training images.
Solution: use a discriminator network to tell whether the generated image is within the data distribution ("real") or not ("fake"); its gradient is what trains the generator.
Training GANs: Two-player game (Ian Goodfellow et al., "Generative Adversarial Nets", NIPS 2014)
Random noise z -> Generator Network -> fake images; fake and real images -> Discriminator Network -> real or fake. (Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.)
(Same setup, with the discriminator learning signal and the generator learning signal both coming from the real/fake output of the discriminator network.)
The minimax objective combines a generator objective and a discriminator objective:
- Discriminator (θd) wants to maximize the objective so that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake).
- Generator (θg) wants to minimize the objective so that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real).
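Written out, the standard minimax objective from Goodfellow et al. 2014 that both players optimize:

\min_{\theta_g}\,\max_{\theta_d}\;\Big[\,\mathbb{E}_{x\sim p_{\mathrm{data}}}\log D_{\theta_d}(x)\;+\;\mathbb{E}_{z\sim p(z)}\log\big(1-D_{\theta_d}(G_{\theta_g}(z))\big)\Big]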
Alternate between:
1. Gradient ascent on discriminator
2. Gradient descent on generator
In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator (move to the right on the x-axis), but the gradient in this region is relatively flat; the gradient signal is dominated by the region where the sample is already good.
Instead of minimizing the likelihood of the discriminator being correct, now maximize the likelihood of the discriminator being wrong. Same objective of fooling the discriminator, but now there is a higher gradient signal for bad samples (high gradient signal where samples are still bad, low gradient signal where they are already good) => works much better! Standard in practice.
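A minimal sketch of the alternating updates in PyTorch, using the non-saturating generator loss (maximize log D(G(z))); the architectures and hyperparameters are illustrative assumptions, not from the lecture:

import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, x_dim = 64, 784
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))  # outputs a logit
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(x_real):
    b = x_real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

    # 1. Gradient ascent on the discriminator:
    #    maximize log D(x) + log(1 - D(G(z))), i.e. minimize BCE against real/fake labels.
    z = torch.randn(b, z_dim)
    x_fake = G(z).detach()                      # don't backprop into G here
    loss_d = F.binary_cross_entropy_with_logits(D(x_real), ones) + \
             F.binary_cross_entropy_with_logits(D(x_fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2. Non-saturating generator update:
    #    maximize log D(G(z)) instead of minimizing log(1 - D(G(z))),
    #    which gives a stronger gradient while samples are still bad.
    z = torch.randn(b, z_dim)
    loss_g = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

train_step(torch.rand(32, x_dim) * 2 - 1)       # stand-in minibatch of real data in [-1, 1]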
GAN training can be unstable; follow-up work (e.g. Wasserstein GAN, BEGAN) alleviates this problem, with better stability!
After training, use the generator network to generate new images: random noise z -> Generator Network -> new images; the Discriminator Network and its real-or-fake output are no longer needed. (Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.)
Generative Adversarial Nets (Ian Goodfellow et al., NIPS 2014)
Generated samples, with the nearest neighbor from the training set shown for comparison. Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
Generated samples (CIFAR-10), again with the nearest neighbor from the training set for comparison.
Generative Adversarial Nets: Convolutional Architectures
Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
Samples from the model look much better! (Radford et al, ICLR 2016)
Interpolating between random points in latent space. (Radford et al, ICLR 2016)
Generative Adversarial Nets: Interpretable Vector Math (Radford et al, ICLR 2016)
Take samples from the model for "smiling woman", "neutral woman", and "neutral man". Average the z vectors for each group, then do arithmetic in z space: smiling woman - neutral woman + neutral man, which decodes to a smiling man.
Another example: glasses man - no glasses man + no glasses woman, which decodes to a woman with glasses. (Radford et al, ICLR 2016)
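A small sketch of both latent-space operations (interpolation and vector arithmetic), assuming a trained generator G that maps a batch of z vectors to images; the function and variable names here are illustrative:

import torch

@torch.no_grad()
def interpolate(G, z_a, z_b, steps=8):
    # Walk along the straight line between two random latent points and decode each step.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    zs = (1 - alphas) * z_a + alphas * z_b
    return G(zs)                                  # a row of smoothly varying images

@torch.no_grad()
def vector_math(G, z_smiling_woman, z_neutral_woman, z_neutral_man):
    # Average the z vectors for each concept, then do arithmetic in z space:
    # smiling woman - neutral woman + neutral man, decoded to (hopefully) a smiling man.
    z = z_smiling_woman.mean(0) - z_neutral_woman.mean(0) + z_neutral_man.mean(0)
    return G(z.unsqueeze(0))

Here z_smiling_woman, z_neutral_woman, and z_neutral_man would each be a small batch of latent vectors whose decoded samples were judged to show that attribute.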
2017: Explosion of GANs
"The GAN Zoo": https://github.com/hindupuravinash/the-gan-zoo
See also https://github.com/soumith/ganhacks for tips and tricks for training GANs.
Examples: better training and generation; text -> image synthesis; source -> target domain transfer.
2019: BigGAN
Scene graphs to GANs
HYPE: Human eYe Perceptual Evaluations
hype.stanford.edu
Zhou, Gordon, Krishna et al. HYPE: Human eYe Perceptual Evaluations, NeurIPS 2019 Figures copyright 2019. Reproduced with permission.
GANs
Don't work with an explicit density function.
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.
Pros:
- Beautiful, state-of-the-art samples!
Cons:
- Trickier / more unstable to train
- Can’t solve inference queries such as p(x), p(z|x)
Taxonomy of Generative Models
Generative models:
- Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN), NICE / RealNVP, Glow, Ffjord
- Approximate density: Variational (Variational Autoencoder), Markov Chain (Boltzmann Machine)
- Markov Chain: GSN
- Direct: GAN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Useful Resources on Generative Models
Next: Detection and Segmentation
Classification (no spatial extent); Semantic Segmentation (no objects, just pixels); Object Detection (multiple objects); Instance Segmentation (multiple objects). (This image is CC0 public domain.)