Lecture 11

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 11 - 1 May 14, 2020

Administrative

● A3 is out. Due May 27.


● Milestone deadline moved from May 18 to May 20.
○ Read the website page for the milestone requirements.
○ You need to finish data preprocessing and initial results by then.
● Don't discuss the exam yet, since some people might be taking make-ups.
● Anonymous midterm survey: the link will be posted on Piazza today.

Supervised vs Unsupervised Learning
Supervised Learning

Data: (x, y)
x is data, y is label

Goal: Learn a function to map x -> y

Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.

[Figure examples of supervised tasks: Classification (label: Cat); Object Detection (DOG, DOG, CAT); Semantic Segmentation (GRASS, CAT, TREE, SKY); Image captioning ("A cat sitting on a suitcase on the floor", caption generated using neuraltalk2). Images are CC0 public domain.]

Unsupervised Learning

Data: x
Just data, no labels!

Goal: Learn some underlying hidden structure of the data

Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.

[Figure examples of unsupervised tasks: K-means clustering (image CC0 public domain); Principal Component Analysis, projecting 3-d data to 2-d (dimensionality reduction; image from Matthias Scholz, CC0 public domain); 1-d and 2-d density estimation, i.e. modeling p(x) (2-d density images CC0 public domain).]
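Clustering is the first unsupervised example listed. As a minimal sketch (not code from the lecture), here is Lloyd's algorithm for k-means on 2-D points using only the Python standard library. The function name `kmeans`, the fixed iteration count, and initializing from the first k points are illustrative assumptions chosen for determinism.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step.

    Initialization takes the first k points (an assumption for determinism;
    practical implementations usually use random or k-means++ initialization).
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


# No labels are used: the two groups are discovered from x alone.
data = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centers, assignment = kmeans(data, k=2)
```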

Generative Modeling
Given training data, generate new samples from the same distribution.

[Figure: training data ~ $p_{\text{data}}(x)$ → learning → $p_{\text{model}}(x)$ → sampling → new samples]

Objectives:
1. Learn $p_{\text{model}}(x)$ that approximates $p_{\text{data}}(x)$
2. Sample new x from $p_{\text{model}}(x)$

Formulate as density estimation problems:
- Explicit density estimation: explicitly define and solve for $p_{\text{model}}(x)$.
- Implicit density estimation: learn a model that can sample from $p_{\text{model}}(x)$ without explicitly defining it.
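To make the two objectives concrete, here is a minimal sketch of explicit density estimation in the simplest possible setting, assuming $p_{\text{model}}$ is a 1-d Gaussian: learning is maximum likelihood (sample mean and population variance), and sampling draws new x from the fitted model. The function names and toy data are hypothetical; image models replace the Gaussian with a deep network.

```python
import random
import statistics
from typing import List, Tuple


def fit_gaussian(data: List[float]) -> Tuple[float, float]:
    """Objective 1 (learning): maximum-likelihood mean and std of a 1-d Gaussian."""
    mu = statistics.fmean(data)
    # The MLE variance divides by N (population variance), not N - 1.
    sigma = statistics.pstdev(data, mu)
    return mu, sigma


def sample_model(mu: float, sigma: float, n: int, seed: int = 0) -> List[float]:
    """Objective 2 (sampling): draw new x from p_model(x) = N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


training_data = [2.1, 1.9, 2.4, 1.6, 2.0, 2.0]  # toy samples standing in for p_data(x)
mu, sigma = fit_gaussian(training_data)
new_samples = sample_model(mu, sigma, n=5)
```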

Why Generative Models?

- Realistic samples for artwork, super-resolution, colorization, etc.
- Learn useful features for downstream tasks such as classification.
- Getting insights from high-dimensional data (physics, medical imaging, etc.)
- Modeling the physical world for simulation and planning (robotics and reinforcement learning applications)
- Many more ...
Figures from L-R are copyright: (1) Alec Radford et al. 2016; (2) Phillip Isola et al. 2017, reproduced with authors' permission; (3) BAIR Blog.

Taxonomy of Generative Models

Generative models
- Explicit density
  - Tractable density
    - Fully Visible Belief Nets: NADE, MADE, PixelRNN/CNN
    - NICE / RealNVP
    - Glow
    - Ffjord
  - Approximate density
    - Variational: Variational Autoencoder
    - Markov Chain: Boltzmann Machine
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.

Today: discuss the 3 most popular types of generative models (PixelRNN/CNN, Variational Autoencoders, GANs).


Fully visible belief network (FVBN)
Explicit density model
Use the chain rule to decompose the likelihood of an image x into a product of 1-d distributions:

$$p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$

Here p(x) is the likelihood of image x, and each factor is the probability of the i'th pixel value given all previous pixels.

Then maximize the likelihood of the training data.

The distribution over pixel values is complex => express it using a neural network!
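The chain-rule factorization can be sketched directly: the log-likelihood of an image is the sum of per-pixel conditional log-probabilities, visited in a fixed raster order. In this minimal sketch the "image" is a flattened list of binary pixels, and `p_pixel_on` is a hypothetical hand-written logistic function of the mean of the previous pixels that stands in for the neural network; it is not the PixelRNN/PixelCNN architecture.

```python
import math
from typing import List


def p_pixel_on(previous: List[int], w: float = 2.0, b: float = -1.0) -> float:
    """Hypothetical conditional p(x_i = 1 | x_1, ..., x_{i-1}).

    A real FVBN uses a neural network here; this logistic function of the
    mean of the previous pixels is only a stand-in.
    """
    context = sum(previous) / len(previous) if previous else 0.0
    return 1.0 / (1.0 + math.exp(-(w * context + b)))


def log_likelihood(image: List[int]) -> float:
    """log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}) via the chain rule."""
    total = 0.0
    for i, pixel in enumerate(image):
        p_on = p_pixel_on(image[:i])  # condition only on earlier pixels
        total += math.log(p_on if pixel == 1 else 1.0 - p_on)
    return total


# Training would maximize the sum of log_likelihood over all training images.
ll = log_likelihood([0, 1, 1, 0])
```

Because every factor is a proper distribution over one pixel, the product sums to 1 over all possible images, which is what makes the density tractable.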

Recurrent Neural Network
[Figure: an unrolled RNN. At each step the RNN receives the previous hidden state (h0, h1, h2, h3, ...) and the current input (x1, x2, x3, ..., xn-1), and predicts the next element (x2, x3, x4, ..., xn).]

PixelRNN [van den Oord et al. 2016]

Generate image pixels starting from the corner.

Dependency on previous pixels is modeled using an RNN (LSTM).

Drawback: sequential generation is slow in both training and inference!

PixelCNN [van den Oord et al. 2016]

Still generate image pixels starting from the corner.

Dependency on previous pixels is now modeled using a CNN over a context region (masked convolution).
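Masked convolution is what enforces the "previous pixels only" dependency. A minimal sketch, assuming a single-channel k x k kernel and raster-scan order: the mask zeroes every kernel weight after the center position, and a type "A" mask (used in the first layer, so a pixel cannot see its own value) also zeroes the center, while type "B" (later layers) keeps it. The function name `causal_mask` is hypothetical, and the per-color-channel masking used in the paper is omitted.

```python
from typing import List


def causal_mask(k: int, mask_type: str = "A") -> List[List[int]]:
    """Binary k x k mask for a raster-scan (row-major) masked convolution.

    1 = weight allowed, 0 = weight forced to zero.
    Type "A" also hides the center pixel; type "B" lets it through.
    """
    if k % 2 == 0 or mask_type not in ("A", "B"):
        raise ValueError("k must be odd and mask_type must be 'A' or 'B'")
    c = k // 2
    mask = [[1] * k for _ in range(k)]
    for r in range(k):
        for col in range(k):
            after_center = r > c or (r == c and col > c)
            at_center = r == c and col == c
            if after_center or (mask_type == "A" and at_center):
                mask[r][col] = 0
    return mask


for row in causal_mask(3, "A"):
    print(row)
```

Multiplying the kernel elementwise by this mask before each convolution means the output at a pixel only uses pixels above it, or to its left in the same row, so all positions can be computed in parallel during training.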

Training is faster than PixelRNN: the convolutions can be parallelized, since the context region values are known from the training images.

Generation is still slow: for a 32x32 image, we need to run a forward pass of the network 1024 times to generate a single image.
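The slow generation follows from the sampling loop: each pixel needs its own forward pass, because its conditional depends on the pixels sampled before it. A minimal sketch, assuming a hypothetical `model_forward` that returns p(pixel = 1) for the next position given the partial image, with binary single-channel pixels for simplicity:

```python
import random
from typing import Callable, List, Tuple


def generate(height: int, width: int,
             model_forward: Callable[[List[int]], float],
             seed: int = 0) -> Tuple[List[int], int]:
    """Sample pixels one at a time in raster order; returns (image, n_passes)."""
    rng = random.Random(seed)
    image: List[int] = []
    n_passes = 0
    for _ in range(height * width):
        p_on = model_forward(image)  # one full network evaluation per pixel
        n_passes += 1
        image.append(1 if rng.random() < p_on else 0)
    return image, n_passes


# Hypothetical stand-in for a trained PixelCNN: it ignores the context entirely.
_, passes = generate(32, 32, model_forward=lambda partial: 0.5)
print(passes)  # 1024 forward passes for one 32x32 single-channel image
```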

Generation Samples
[Figure: generated samples on 32x32 CIFAR-10 and 32x32 ImageNet]

PixelRNN and PixelCNN
Pros:
- Can explicitly compute likelihood p(x)
- Easy to optimize
- Good samples
Con:
- Sequential generation => slow

Improving PixelCNN performance:
- Gated convolutional layers
- Short-cut connections
- Discretized logistic loss
- Multi-scale
- Training tricks
- Etc…
See:
- van den Oord et al. NIPS 2016
- Salimans et al. 2017 (PixelCNN++)

So far...
PixelRNN/CNNs define a tractable density function and optimize the likelihood of the training data:
$$p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
Variational Autoencoders (VAEs) define an intractable density function with latent z:
$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$
This cannot be optimized directly; instead, derive and optimize a lower bound on the likelihood.

Why latent z?

Some background first: Autoencoders
Unsupervised approach for learning a lower-dimensional feature representation from unlabeled training data.

[Figure: Input data x → Encoder → Features z → Decoder]

z is usually smaller than x (dimensionality reduction).
Q: Why dimensionality reduction?
A: We want the features to capture meaningful factors of variation in the data.

How to learn this feature representation?
Train such that the features can be used to reconstruct the original data. "Autoencoding": encoding the input itself.
Encoder: 4-layer conv. Decoder: 4-layer upconv.

[Figure: Input data x → Encoder → Features z → Decoder → Reconstructed input data x̂]

L2 loss function (doesn't use labels!):
$$\|x - \hat{x}\|^2$$
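A minimal sketch of the reconstruction objective, assuming a toy linear encoder and decoder in place of the 4-layer conv/upconv networks: encode x to a smaller z, decode back to x̂, and score with the L2 loss. The weights are hand-picked for illustration, not learned; training would adjust them to minimize this loss over the unlabeled dataset.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def l2_loss(x: Vector, x_hat: Vector) -> float:
    """Squared L2 reconstruction error ||x - x_hat||^2 (uses no labels)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Hypothetical linear autoencoder: 4-d input -> 2-d features z -> 4-d output.
encoder: Matrix = [[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]]
decoder: Matrix = [[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]]

x = [1.0, 1.0, 3.0, 3.0]     # lies in the 2-d subspace this code can capture
z = matvec(encoder, x)        # features: [1.0, 3.0]
x_hat = matvec(decoder, z)    # reconstruction: [1.0, 1.0, 3.0, 3.0]
loss = l2_loss(x, x_hat)      # 0.0
```

Inputs outside the subspace the 2-d code can represent reconstruct imperfectly, which is exactly what the loss penalizes.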

After training, throw away the decoder.
[Figure: Input data → Encoder → Features]

Transfer from a large, unlabeled dataset to a small, labeled dataset: the encoder can be used to initialize a supervised model.
[Figure: Input data → Encoder → Features → Classifier → Predicted label (bird, plane, dog, deer, truck), trained with a loss function (Softmax, etc.)]
Train for the final task (sometimes with small data), fine-tuning the encoder jointly with the classifier.

Autoencoders can reconstruct data, and can learn features to initialize a supervised model.
Features capture factors of variation in the training data.
But we can't generate new images from an autoencoder because we don't know the space of z.
How do we make an autoencoder a generative model?

Variational Autoencoders
Probabilistic spin on autoencoders: will let us sample from the model to generate data!

Assume the training data $\{x^{(i)}\}_{i=1}^{N}$ is generated from the distribution of an unobserved (latent) representation z:
[Figure: sample $z^{(i)}$ from the true prior $p_{\theta^*}(z)$, then sample $x^{(i)}$ from the true conditional $p_{\theta^*}(x \mid z^{(i)})$]

Intuition (remember from autoencoders!): x is an image, and z is the latent factors used to generate x: attributes, orientation, etc.

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

We want to estimate the true parameters θ* of this generative model given training data x.

How should we represent this model?
- Choose the prior p(z) to be simple, e.g. Gaussian. This is reasonable for latent attributes, e.g. pose, how much smile.
- The conditional p(x|z) is complex (it generates an image) => represent it with a neural network (the decoder network).

How to train the model?
Learn model parameters to maximize the likelihood of the training data:
$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$
Q: What is the problem with this?
A: Intractable!

Variational Autoencoders: Intractability
Data likelihood:
$$p_\theta(x) = \int p_\theta(z)\, p_\theta(x \mid z)\, dz$$
✔ pθ(z): simple Gaussian prior
✔ pθ(x|z): decoder neural network
✗ The integral: intractable to compute p(x|z) for every z! Monte Carlo estimation is too high variance.
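To see why naive Monte Carlo estimation struggles, consider a toy 1-d model where the integral is known in closed form, assuming p(z) = N(0, 1) and p(x|z) = N(x; z, σ²), so that p(x) = N(x; 0, 1 + σ²). The estimate p(x) ≈ (1/K) Σ_k p(x|z_k) with z_k ~ p(z) wastes most samples when p(x|z) is sharply peaked, because few z_k land near x. This model and the function names are illustrative assumptions, not the VAE decoder.

```python
import math
import random
from statistics import NormalDist


def true_px(x: float, sigma: float) -> float:
    """Closed-form marginal for this toy model: p(x) = N(x; 0, 1 + sigma^2)."""
    return NormalDist(0.0, math.sqrt(1.0 + sigma ** 2)).pdf(x)


def mc_px(x: float, sigma: float, k: int, seed: int) -> float:
    """Naive Monte Carlo: average p(x | z_k) over z_k drawn from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(k):
        z = rng.gauss(0.0, 1.0)
        total += NormalDist(z, sigma).pdf(x)
    return total / k


x, sigma = 2.0, 0.1
exact = true_px(x, sigma)
# With few samples, estimates from different seeds scatter widely around `exact`.
small = [mc_px(x, sigma, k=20, seed=s) for s in range(10)]
```

With 20 samples, roughly half of the estimates contain no z_k near x at all, so the estimator only becomes reliable with very many samples; for high-dimensional z this gets far worse.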

Posterior density:
$$p_\theta(z \mid x) = \frac{p_\theta(x \mid z)\, p_\theta(z)}{p_\theta(x)}$$
The posterior density is also intractable, because it contains the intractable data likelihood pθ(x) in the denominator.

Solution: In addition to modeling pθ(x|z), learn qɸ(z|x) that approximates the true posterior pθ(z|x).

We will see that the approximate posterior allows us to derive a lower bound on the data likelihood that is tractable, which we can optimize.

Variational inference approximates the unknown posterior distribution using only the observed data x.

Variational Autoencoders
We derive the lower bound by rewriting the log data likelihood log pθ(x) as an expectation over z ~ qɸ(z|x). Taking the expectation with respect to z (using the encoder network) will come in handy later: it lets us write nice KL terms.
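The derivation can be written out as follows, a reconstruction of the standard argument from Kingma and Welling in which $x^{(i)}$ is one training example and every expectation is over $z \sim q_\phi(z \mid x^{(i)})$:

```latex
\begin{aligned}
\log p_\theta(x^{(i)})
  &= \mathbf{E}_{z \sim q_\phi(z \mid x^{(i)})}\!\left[\log p_\theta(x^{(i)})\right]
     && \text{(} p_\theta(x^{(i)}) \text{ does not depend on } z \text{)} \\
  &= \mathbf{E}_z\!\left[\log \frac{p_\theta(x^{(i)} \mid z)\, p_\theta(z)}{p_\theta(z \mid x^{(i)})}\right]
     && \text{(Bayes' rule)} \\
  &= \mathbf{E}_z\!\left[\log \frac{p_\theta(x^{(i)} \mid z)\, p_\theta(z)}{p_\theta(z \mid x^{(i)})}
     \cdot \frac{q_\phi(z \mid x^{(i)})}{q_\phi(z \mid x^{(i)})}\right]
     && \text{(multiply by 1)} \\
  &= \mathbf{E}_z\!\left[\log p_\theta(x^{(i)} \mid z)\right]
     - \mathbf{E}_z\!\left[\log \frac{q_\phi(z \mid x^{(i)})}{p_\theta(z)}\right]
     + \mathbf{E}_z\!\left[\log \frac{q_\phi(z \mid x^{(i)})}{p_\theta(z \mid x^{(i)})}\right]
     && \text{(logarithms)} \\
  &= \underbrace{\mathbf{E}_z\!\left[\log p_\theta(x^{(i)} \mid z)\right]
     - D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z)\right)}_{\mathcal{L}(x^{(i)}, \theta, \phi)}
     + \underbrace{D_{KL}\!\left(q_\phi(z \mid x^{(i)}) \,\|\, p_\theta(z \mid x^{(i)})\right)}_{\geq 0} \\[6pt]
\Rightarrow\ \log p_\theta(x^{(i)}) &\geq \mathcal{L}(x^{(i)}, \theta, \phi),
  \qquad
  \theta^*, \phi^* = \arg\max_{\theta, \phi} \sum_{i=1}^{N} \mathcal{L}(x^{(i)}, \theta, \phi)
\end{aligned}
```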

The log data likelihood log pθ(x) splits into three terms:
- First term: the decoder network gives pθ(x|z), so we can compute an estimate of this term through sampling (we need some trick to differentiate through sampling).
- Second term: this KL term (between the Gaussians for the encoder and the z prior) has a nice closed-form solution!
- Third term: pθ(z|x) is intractable (saw earlier), so we can't compute this KL term :( But we know KL divergence is always >= 0.

We want to maximize the data likelihood. Because the third term is >= 0, the first two terms form a tractable lower bound which we can take the gradient of and optimize! (pθ(x|z) differentiable, KL term differentiable)
- First term: reconstruct the input data.
- Second term: make the approximate posterior distribution close to the prior.

Since we're modeling the probabilistic generation of data, the encoder and decoder networks are probabilistic:
- Encoder network qɸ(z|x) (parameters ɸ): outputs the mean and (diagonal) covariance of z | x.
- Decoder network pθ(x|z) (parameters θ): outputs the mean and (diagonal) covariance of x | z.

Variational Autoencoders
Putting it all together: maximizing the likelihood lower bound

Let's look at computing the KL divergence between the estimated posterior and the prior given some data. The input data x passes through the encoder network, which outputs the mean and (diagonal) covariance of qɸ(z|x). The KL term that makes the approximate posterior distribution close to the prior has an analytical solution.
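For a diagonal Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior $p(z) = \mathcal{N}(0, I)$ (the usual choice, assumed here), the analytical solution is $D_{KL} = \tfrac{1}{2}\sum_j (\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1)$. A minimal sketch; the function name and the log-variance parameterization (common in practice for numerical stability) are assumptions:

```python
import math
from typing import List


def kl_diag_gaussian_to_std_normal(mu: List[float], log_var: List[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Summed over latent dimensions j:
        0.5 * (mu_j^2 + sigma_j^2 - log sigma_j^2 - 1)
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


# Zero when the approximate posterior already equals the prior...
print(kl_diag_gaussian_to_std_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
# ...and positive otherwise, which is what pulls q(z|x) toward p(z).
print(kl_diag_gaussian_to_std_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5
```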

Next, sample z from qɸ(z|x). Sampling directly is not part of the computation graph! Use the reparameterization trick to make sampling differentiable: sample ε ~ N(0, I) as an input to the graph, then compute z = μ + σ ⊙ ε from the encoder's mean μ and standard deviation σ. This step is part of the computation graph.
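A minimal sketch of the reparameterization trick, assuming the encoder outputs a mean and a log-variance per latent dimension: the randomness ε ~ N(0, I) enters as an input, and z = μ + σ ⊙ ε is a deterministic function of μ and σ once ε is fixed. Plain Python has no autodiff, so the final comment only notes the gradients an autodiff framework would propagate; the function name is hypothetical.

```python
import math
import random
from typing import List, Tuple


def reparameterize(mu: List[float], log_var: List[float],
                   rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return z = mu + sigma * eps and the noise eps used to make it.

    eps ~ N(0, I) is the only random input; given eps, z is a
    deterministic, differentiable function of (mu, sigma).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps


mu, log_var = [0.5, -1.0], [0.0, math.log(4.0)]  # sigma = [1.0, 2.0]
z, eps = reparameterize(mu, log_var, random.Random(0))
# Pathwise gradients: dz_j/dmu_j = 1 and dz_j/dsigma_j = eps_j.
```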

Then pass z through the decoder network to get pθ(x|z), and maximize the likelihood of the original input being reconstructed.

For every minibatch of input data: compute this forward pass, and then backprop!
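Putting the steps together for one example, here is a minimal sketch of the per-example loss (the negative lower bound) that the forward pass computes. To stay self-contained it assumes a 1-d latent, a hypothetical linear "encoder" and "decoder" with fixed parameters, a Gaussian decoder with fixed variance, and a single reparameterized sample; a real VAE uses neural networks and averages this loss over the minibatch before backprop.

```python
import math
import random


def negative_elbo(x: float, rng: random.Random,
                  enc_w: float = 0.5, enc_log_var: float = -1.0,
                  dec_w: float = 2.0, dec_sigma: float = 0.5) -> float:
    """One-sample estimate of -L(x) = -E_q[log p(x|z)] + KL(q(z|x) || N(0, 1))."""
    # Encoder: parameters of q(z|x) (hypothetical linear map for the mean).
    mu, log_var = enc_w * x, enc_log_var
    # KL between the Gaussian q(z|x) and the N(0, 1) prior, in closed form.
    kl = 0.5 * (mu ** 2 + math.exp(log_var) - log_var - 1.0)
    # Reparameterized sample z = mu + sigma * eps.
    z = mu + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)
    # Decoder: p(x|z) = N(x; dec_w * z, dec_sigma^2); log-density of the input.
    x_mean = dec_w * z
    log_px_given_z = (-0.5 * math.log(2 * math.pi * dec_sigma ** 2)
                      - (x - x_mean) ** 2 / (2 * dec_sigma ** 2))
    # Reconstruction term + KL term; minimizing this maximizes the bound.
    return -log_px_given_z + kl


rng = random.Random(0)
minibatch = [0.3, -1.2, 2.0]
loss = sum(negative_elbo(x, rng) for x in minibatch) / len(minibatch)
```

In this linear Gaussian toy the true marginal is p(x) = N(x; 0, dec_w² + dec_sigma²), so one can check that the expected loss never drops below -log p(x).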

Variational Autoencoders: Generating Data!
Our assumption about the data generation process: sample z from the true prior, then sample x from the true conditional, which is represented by the decoder network.

Now, given a trained VAE: use the decoder network and sample z from the prior!
- Sample z from the prior N(0, I).
- Sample x|z from the distribution output by the decoder network.
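Generation needs only the prior and the decoder. A minimal sketch of this two-step (ancestral) sampling, assuming a latent z ~ N(0, I), a hypothetical `toy_decoder` standing in for the trained decoder network's mean output, and a fixed decoder standard deviation:

```python
import random
from typing import Callable, List

Vec = List[float]


def generate_samples(n: int, latent_dim: int,
                     decoder_mean: Callable[[Vec], Vec],
                     decoder_std: float, seed: int = 0) -> List[Vec]:
    """Sample z ~ N(0, I), then x ~ N(decoder_mean(z), decoder_std^2 I)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]       # prior
        x = [rng.gauss(m, decoder_std) for m in decoder_mean(z)]  # p(x|z)
        samples.append(x)
    return samples


def toy_decoder(z: Vec) -> Vec:
    """Hypothetical 'trained' decoder mapping a 2-d z to a 3-d x."""
    return [z[0], z[1], z[0] + z[1]]


xs = generate_samples(4, latent_dim=2, decoder_mean=toy_decoder, decoder_std=0.1)
```

No encoder is involved: once training is done, new data comes purely from the prior pushed through the decoder.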

Data manifold for 2-d z: vary z1 and z2 and decode each point.
- Diagonal prior on z => independent latent variables.
- Different dimensions of z encode interpretable factors of variation: in the face example, varying z1 changes the degree of smile and varying z2 changes the head pose.
- z is also a good feature representation that can be computed using qɸ(z|x)!
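A data-manifold figure like this is made by decoding a regular grid of z values. One common way to build the grid for a 2-d Gaussian latent, used for the learned-manifold figures in Kingma and Welling's paper, is to space points evenly in the unit square and map each coordinate through the inverse CDF of the standard normal, so the grid follows the prior's density. A minimal sketch; the function name and grid size are assumptions:

```python
from statistics import NormalDist
from typing import List, Tuple


def latent_grid(n: int) -> List[List[Tuple[float, float]]]:
    """n x n grid of 2-d latent codes covering the N(0, I) prior.

    Evenly spaced points strictly inside (0, 1) are mapped through the
    standard normal inverse CDF, so z1 and z2 vary independently
    (matching the diagonal prior). Row index varies z1, column varies z2.
    """
    std_normal = NormalDist(0.0, 1.0)
    # Avoid 0 and 1, where the inverse CDF is infinite.
    quantiles = [(i + 0.5) / n for i in range(n)]
    values = [std_normal.inv_cdf(q) for q in quantiles]
    return [[(z1, z2) for z2 in values] for z1 in values]


grid = latent_grid(5)
# Each grid[r][c] would be fed to the decoder; the decoded images tile the figure.
```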

[Figure: VAE samples generated on Labeled Faces in the Wild and on 32x32 CIFAR-10]

Variational Autoencoders
Probabilistic spin on traditional autoencoders => allows generating data
Defines an intractable density => derive and optimize a (variational) lower bound
Pros:
- Principled approach to generative models
- Interpretable latent space
- Allows inference of q(z|x), which can be a useful feature representation for other tasks
Cons:
- Maximizes a lower bound of the likelihood: okay, but not as good for evaluation as PixelRNN/PixelCNN
- Samples are blurrier and of lower quality compared to the state of the art (GANs)
Active areas of research:
- More flexible approximations, e.g. a richer approximate posterior instead of a diagonal Gaussian, e.g., Gaussian Mixture Models (GMMs), Categorical Distributions
- Learning disentangled representations

So far...
PixelCNNs define a tractable density function and optimize the likelihood of the training data.
VAEs define an intractable density function with latent z; it cannot be optimized directly, so we derive and optimize a lower bound on the likelihood instead.
What if we give up on explicitly modeling density, and just want the ability to sample?
GANs: not modeling any explicit density function!

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 11 - 94 May 14, 2020


Ian Goodfellow et al., “Generative
Generative Adversarial Networks Adversarial Nets”, NIPS 2014

Problem: Want to sample from complex, high-dimensional training distribution. No direct


way to do this!

Solution: Sample from a simple distribution we can easily sample from, e.g. random noise.
Learn transformation to training distribution.

Q: What can we use to


represent this complex
transformation?

Fei-Fei Li, Ranjay Krishna, Danfei Xu Lecture 11 - 95 May 14, 2020


Generative Adversarial Networks (Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014)

Problem: We want to sample from a complex, high-dimensional training distribution. There is no direct way to do this!

Solution: Sample from a simple distribution that is easy to sample from, e.g. random noise, and learn a transformation to the training distribution.

Q: What can we use to represent this complex transformation?
A: A neural network!

[Figure: Input: random noise z -> Generator Network -> Output: sample from training distribution]
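Here is a minimal sketch of "sample simple noise, then transform it." It assumes a hypothetical fixed affine `toy_generator` in place of a trained neural network. The code never writes down the output density; it only samples z and pushes each sample through the transformation.

```python
import random

def sample_noise(rng, n):
    """Sample n values of z from a simple distribution: the standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def toy_generator(z, scale=0.5, shift=1.0):
    """Hypothetical generator: an affine map of the noise.

    In a GAN, scale and shift would be replaced by the parameters of a
    neural network, trained so that its outputs look like training data.
    """
    return scale * z + shift

rng = random.Random(0)
samples = [toy_generator(z) for z in sample_noise(rng, 10_000)]
mean = sum(samples) / len(samples)
```

In this case the outputs are distributed as N(1, 0.25). A GAN has to discover a suitable transformation from data alone, which is what the rest of the section addresses.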



But we don't know which sample z maps to which training image, so we can't learn by reconstructing training images.

Solution: Use a discriminator network to tell whether the generated image is within the data distribution or not.

[Figure: Input: random noise z -> Generator Network -> Output: sample from training distribution -> Discriminator Network -> Real? Fake?; the gradient from the discriminator is passed back to the generator.]
Training GANs: Two-player game (Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014)

Discriminator network: try to distinguish between real and fake images


Generator network: try to fool the discriminator by generating real-looking images

[Figure: Random noise z -> Generator Network -> fake images (from generator). Fake images and real images (from training set) -> Discriminator Network -> real or fake. The real/fake decision provides both the discriminator learning signal and the generator learning signal.]
Fake and real images copyright Emily Denton et al. 2015. Reproduced with permission.
Train jointly in a minimax game.

Minimax objective function:
$$\min_{\theta_g} \max_{\theta_d} \left[ \mathbb{E}_{x \sim p_{data}} \log D_{\theta_d}(x) + \mathbb{E}_{z \sim p(z)} \log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right) \right]$$

The outer minimization over θg is the generator objective; the inner maximization over θd is the discriminator objective.
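In practice, both expectations in the minimax objective are estimated on minibatches. The following minimal sketch assumes the discriminator outputs are already available as probabilities in (0,1). The lists below are hypothetical values, not outputs of a real model.

```python
import math

def gan_value(d_real, d_fake):
    """Minibatch estimate of E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1)
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A confused discriminator that outputs 0.5 everywhere scores -log 4.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident, correct discriminator scores close to the maximum value 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator's updates push this value up toward 0, and the generator's updates push it down.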

The discriminator outputs a likelihood in (0,1) that an image is real. In the objective, D(x) is the discriminator output for real data x, and D(G(z)) is the discriminator output for generated fake data G(z).

- Discriminator (θd) wants to maximize the objective such that D(x) is close to 1 (real) and D(G(z)) is close to 0 (fake)
- Generator (θg) wants to minimize the objective such that D(G(z)) is close to 1 (the discriminator is fooled into thinking the generated G(z) is real)
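The discriminator's side of the game is ordinary binary classification. Maximizing the objective is the same as minimizing binary cross-entropy with label 1 for real images and label 0 for generated ones. The following small sketch checks this numerically on hypothetical discriminator outputs.

```python
import math

def bce(p, label):
    """Binary cross-entropy of predicted probability p against a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def disc_loss(d_real, d_fake):
    """Mean BCE with real images labeled 1 and generated images labeled 0."""
    real = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real + fake

def gan_value(d_real, d_fake):
    """Minibatch estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    return (sum(math.log(p) for p in d_real) / len(d_real)
            + sum(math.log(1.0 - p) for p in d_fake) / len(d_fake))

d_real, d_fake = [0.9, 0.7, 0.6], [0.2, 0.4, 0.1]
# disc_loss == -gan_value, so maximizing one is minimizing the other.
gap = disc_loss(d_real, d_fake) + gan_value(d_real, d_fake)
```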

Alternate between:
1. Gradient ascent on discriminator
$$\max_{\theta_d} \left[ \mathbb{E}_{x \sim p_{data}} \log D_{\theta_d}(x) + \mathbb{E}_{z \sim p(z)} \log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right) \right]$$
2. Gradient descent on generator
$$\min_{\theta_g} \mathbb{E}_{z \sim p(z)} \log\left(1 - D_{\theta_d}(G_{\theta_g}(z))\right)$$
In practice, optimizing this generator objective does not work well! When a sample is likely fake, we want to learn from it to improve the generator (move to the right on the x-axis, D(G(z))). But the gradient in this region is relatively flat! The gradient signal is dominated by the region where the sample is already good.
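One way to see the flat gradient is to assume that the discriminator ends in a sigmoid, D(G(z)) = σ(a) with logit a. This assumption is not stated on the slide. Using σ'(a) = σ(a)(1 - σ(a)), the derivative of the generator objective with respect to the logit is:

```latex
\frac{\partial}{\partial a} \log\left(1 - \sigma(a)\right)
  = \frac{-\sigma(a)\left(1 - \sigma(a)\right)}{1 - \sigma(a)}
  = -\sigma(a)
  = -D(G(z))
```

So for a sample that the discriminator confidently rejects (D(G(z)) close to 0), almost no gradient flows back through the logit to the generator.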

2. Instead: Gradient ascent on generator, different objective
$$\max_{\theta_g} \mathbb{E}_{z \sim p(z)} \log D_{\theta_d}(G_{\theta_g}(z))$$

Instead of minimizing the likelihood of the discriminator being correct, now maximize the likelihood of the discriminator being wrong. This keeps the same objective of fooling the discriminator. The difference is a high gradient signal for bad samples (D(G(z)) near 0) and a low gradient signal for samples that are already good (D(G(z)) near 1). This works much better and is standard in practice.
Putting it together: GAN training algorithm (Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014)

The algorithm takes k discriminator steps per generator step. Some find k = 1 more stable, while others use k > 1; there is no best rule. Followup work (e.g. Wasserstein GAN, BEGAN) alleviates this problem with better training stability.

Arjovsky et al. "Wasserstein GAN." arXiv preprint arXiv:1701.07875 (2017)
Berthelot et al. "BEGAN: Boundary Equilibrium Generative Adversarial Networks." arXiv preprint arXiv:1703.10717 (2017)
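The following minimal sketch runs the alternating algorithm on a deliberately tiny problem, in pure Python with hand-derived gradients. Every modeling choice here is a hypothetical assumption for illustration, not the lecture's setup:

- 1-D "images" drawn from N(4, 1).
- A one-parameter generator G(z) = z + shift with z ~ N(0, 1).
- A logistic discriminator D(x) = σ(w·x + b).

The discriminator takes k ascent steps on the minimax objective per generator step. The generator ascends the alternative objective log D(G(z)).

```python
import math
import random

def sigmoid(a):
    # Numerically stable logistic: math.exp only ever sees a non-positive argument.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def discriminator(d, x):
    w, b = d
    return sigmoid(w * x + b)

def disc_objective(d, real, fake):
    """Minibatch estimate of E[log D(x)] + E[log(1 - D(G(z)))] (for monitoring)."""
    return (sum(math.log(discriminator(d, x)) for x in real) / len(real)
            + sum(math.log(1.0 - discriminator(d, x)) for x in fake) / len(fake))

def disc_grad(d, real, fake):
    """Gradient of disc_objective with respect to (w, b)."""
    gw = gb = 0.0
    for x in real:   # d/da log sigmoid(a) = 1 - D
        r = (1.0 - discriminator(d, x)) / len(real)
        gw += r * x
        gb += r
    for x in fake:   # d/da log(1 - sigmoid(a)) = -D
        f = -discriminator(d, x) / len(fake)
        gw += f * x
        gb += f
    return gw, gb

def gen_grad(shift, d, z_batch):
    """Gradient of mean log D(G(z)) with respect to shift, for G(z) = z + shift."""
    w, _ = d
    # d/dshift log sigmoid(w * (z + shift) + b) = (1 - D(G(z))) * w
    return sum((1.0 - discriminator(d, z + shift)) * w for z in z_batch) / len(z_batch)

def train_gan(steps=2000, k=1, m=64, lr=0.05, seed=0):
    rng = random.Random(seed)
    shift = 0.0      # generator parameter
    d = [0.0, 0.0]   # discriminator parameters (w, b)
    for _ in range(steps):
        for _ in range(k):  # k discriminator steps per generator step
            real = [rng.gauss(4.0, 1.0) for _ in range(m)]
            fake = [rng.gauss(0.0, 1.0) + shift for _ in range(m)]
            gw, gb = disc_grad(d, real, fake)
            d[0] += lr * gw  # gradient ascent on the discriminator
            d[1] += lr * gb
        z_batch = [rng.gauss(0.0, 1.0) for _ in range(m)]
        shift += lr * gen_grad(shift, d, z_batch)  # gradient ascent on log D(G(z))
    return shift, d
```

With a discriminator this simple, `shift` may keep oscillating around the data mean instead of settling. This is a miniature version of the instability noted above. Real GANs use neural networks for both players and compute gradients with an autodiff framework rather than by hand.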

[Figure: the two-player diagram (random noise z -> Generator Network -> fake images; fake images and real images from the training set -> Discriminator Network -> real or fake).]
After training, use the generator network to generate new images.
Generative Adversarial Nets (Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014)
[Figures: generated samples, and generated samples on CIFAR-10, each shown next to the nearest neighbor from the training set.]
Figures copyright Ian Goodfellow et al., 2014. Reproduced with permission.
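The nearest-neighbor column checks that the generator is not simply copying training images. The following minimal sketch performs that lookup. It assumes images are flattened into equal-length lists of pixel values; squared Euclidean (L2) distance is an assumption for this sketch.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two flattened images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbor(sample, training_set):
    """Return (index, distance) of the training image closest to sample."""
    best = min(range(len(training_set)),
               key=lambda i: squared_l2(sample, training_set[i]))
    return best, squared_l2(sample, training_set[best])

# Hypothetical 4-pixel "images".
train = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
generated = [0.9, 1.0, 0.8, 1.1]
idx, dist = nearest_neighbor(generated, train)
```

If generated samples were nearly identical to their nearest training neighbors, that would suggest memorization rather than generalization.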

Generative Adversarial Nets: Convolutional Architectures

Generator is an upsampling network with fractionally-strided convolutions


Discriminator is a convolutional network

Radford et al, “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”, ICLR 2016
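A fractionally-strided (transposed) convolution upsamples. Each input value is multiplied by the kernel and added into the output, at a position that advances by `stride` per input. The following minimal sketch is a 1-D, single-channel version in pure Python; real generators use 2-D, multi-channel layers in a deep learning framework. The output length is (n - 1)·stride - 2·padding + kernel_size.

```python
def conv_transpose_1d(x, kernel, stride=2, padding=0):
    """1-D transposed convolution by scatter-add.

    Input position i adds x[i] * kernel[k] to output position
    i * stride + k - padding; positions outside the output are dropped.
    """
    out_len = (len(x) - 1) * stride - 2 * padding + len(kernel)
    out = [0.0] * out_len
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            j = i * stride + k - padding
            if 0 <= j < out_len:
                out[j] += xi * wk
    return out

# Two inputs upsampled to five outputs with stride 2 and a length-3 kernel.
print(conv_transpose_1d([1.0, 2.0], [1.0, 1.0, 1.0]))  # -> [1.0, 1.0, 3.0, 2.0, 2.0]
```

With kernel size 4, stride 2 and padding 1, the length exactly doubles ((n - 1)·2 - 2 + 4 = 2n). This makes it a convenient upsampling setting, though these hyperparameters are an assumption and are not taken from the slide.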

Samples from the model look much better! (Radford et al., ICLR 2016)

Interpolating between random points in latent space (Radford et al., ICLR 2016)
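Interpolating in latent space means picking two noise vectors z0 and z1, walking along the straight line between them, and feeding each intermediate point to the generator. The following minimal sketch covers the linear interpolation step only; the generator call is omitted. Spherical interpolation is a common alternative for Gaussian latents and is not shown.

```python
def lerp(z0, z1, t):
    """Point a fraction t of the way from z0 to z1 (t in [0, 1])."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def interpolation_path(z0, z1, steps):
    """`steps` evenly spaced latent vectors from z0 to z1, endpoints included."""
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]

# Hypothetical 2-D latent vectors; each point would be passed to the generator.
path = interpolation_path([0.0, 2.0], [4.0, -2.0], steps=5)
```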

Generative Adversarial Nets: Interpretable Vector Math (Radford et al., ICLR 2016)
Samples from the model, in three columns: smiling woman, neutral woman, neutral man. Average the Z vectors in each column, then do arithmetic:
smiling woman - neutral woman + neutral man = smiling man

glasses man - no glasses man + no glasses woman = woman with glasses (Radford et al., ICLR 2016)
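The vector arithmetic operates on latent vectors, not on pixels. You average the z vectors of several samples showing each concept, combine the averages, and decode the result with the generator. The following minimal sketch uses hypothetical 3-D latent vectors; a real DCGAN latent is much larger, and the final generator call is omitted.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length latent vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

# Hypothetical latent vectors for samples judged to show each concept.
smiling_woman = mean_vector([[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]])
neutral_woman = mean_vector([[1.0, 0.0, 0.0], [1.0, 0.0, 2.0]])
neutral_man = mean_vector([[0.0, 0.0, 4.0], [2.0, 0.0, 4.0]])

# smiling woman - neutral woman + neutral man ~ smiling man
z_smiling_man = add(sub(smiling_woman, neutral_woman), neutral_man)
```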

2017: Explosion of GANs
“The GAN Zoo”: https://github.com/hindupuravinash/the-gan-zoo
See also: https://github.com/soumith/ganhacks for tips and tricks for training GANs

2017: Explosion of GANs
Better training and generation
LSGAN, Mao 2017. Wasserstein GAN, Arjovsky 2017. Improved Wasserstein GAN, Gulrajani 2017. Progressive GAN, Karras 2018.

2017: Explosion of GANs
Many GAN applications
- Source->Target domain transfer: CycleGAN. Zhu et al. 2017.
- Text -> Image Synthesis: Reed et al. 2017.
- Pix2pix. Isola 2017. Many examples at https://phillipi.github.io/pix2pix/

2019: BigGAN

Brock et al., 2019

Scene graphs to GANs
Specifying exactly what kind of image you want to generate.
The explicit structure in scene graphs provides better image generation for complex scenes.
Johnson et al. Image Generation from Scene Graphs, CVPR 2018. Figures copyright 2019. Reproduced with permission.

HYPE: Human eYe Perceptual Evaluations
hype.stanford.edu
Zhou, Gordon, Krishna et al. HYPE: Human eYe Perceptual Evaluations, NeurIPS 2019. Figures copyright 2019. Reproduced with permission.

GANs
Don't work with an explicit density function.
Take a game-theoretic approach: learn to generate from the training distribution through a 2-player game.

Pros:
- Beautiful, state-of-the-art samples!

Cons:
- Trickier / more unstable to train
- Can't solve inference queries such as p(x), p(z|x)

Active areas of research:
- Better loss functions, more stable training (Wasserstein GAN, LSGAN, many others)
- Conditional GANs, GANs for all kinds of applications

Taxonomy of Generative Models
Generative models
- Explicit density
  - Tractable density: Fully Visible Belief Nets (NADE, MADE, PixelRNN/CNN), NICE / RealNVP, Glow, Ffjord
  - Approximate density: Variational (Variational Autoencoder), Markov Chain (Boltzmann Machine)
- Implicit density
  - Direct: GAN
  - Markov Chain: GSN
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Useful Resources on Generative Models

CS 236: Deep Generative Models (Stanford)

CS 294-158 Deep Unsupervised Learning (Berkeley)

Next: Detection and Segmentation
[Figure: Classification (CAT; no spatial extent); Semantic Segmentation (GRASS, CAT, TREE, SKY; no objects, just pixels); Object Detection (DOG, DOG, CAT; multiple objects); Instance Segmentation (DOG, DOG, CAT; multiple objects). This image is CC0 public domain.]
