
Machine Learning II

3. Advanced architectures
3.3 Generative networks
3.3.1 Explicit density
Image and Video Processing Group, TSC, UPC

3.3 Generative
Outline

• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)

2 3.3 Generative
Supervised vs Unsupervised training

Supervised learning
• Data: (x, y), x is data, y is label
• Goal: Learn a function to map x -> y
• Examples: Classification, regression, object detection, semantic segmentation, image captioning, etc.

Unsupervised learning
• Data: (x), just data, no labels!
• Goal: Learn some underlying hidden structure of the data
• Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.

3 3.3 Generative
Generative models (1)

[Figure: classification and regression examples per data type.
Text: probability of being a potential customer; “What language?”.
Image: Jim Carrey.
Audio: language translation.]

Discriminative Modeling: p(y|x)

4 3.3 Generative
Generative models (2)

[Figure: classification, regression and generative examples per data type.
Text: prob. of being a potential customer; “What language?”; generated text: “What about Ron magic?” offered Ron. To Harry, Ron was loud, slow and soft bird. Harry did not like to think about birds.
Image: Jim Carrey.
Audio: language translation; music composer and interpreter (MuseNet sample).]

Discriminative Modeling: p(y|x)    Generative Modeling: p(x)

5 3.3 Generative
Generative models (3)

• Model data distribution so that we can sample new samples out of the distribution

[Figure: Training Dataset pdata(x) → Learn → Modeled Data Space pmodel(z) → Sample out → Generated Samples]

6 3.3 Generative
Generative models (4)

• Model data distribution so that we can sample new samples out of the distribution

[Figure: Training Dataset pdata(x) → Learn → Modeled Data Space pmodel(z) → Interpolated Samples]

7 3.3 Generative
Generative models (5)

• We can generate any type of data, like speech waveform samples or image pixels.

[Figure: a speech waveform with M samples = M dimensions; a 32x32 image with 3 channels is scanned and unrolled into M = 32x32x3 = 3072 dimensions]
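As a minimal illustration of this scan-and-unroll step (a sketch assuming NumPy, with a random array standing in for a real image):

```python
import numpy as np

# Toy 32x32 RGB image standing in for a real training sample.
image = np.random.rand(32, 32, 3)

# Scan and unroll: flatten the pixel grid into one M-dimensional vector.
x = image.reshape(-1)
print(x.shape)  # (3072,) -> M = 32 * 32 * 3 dimensions
```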

8 3.3 Generative
Generative models (6)

• Our learned model should be able to make up new samples from the distribution, not just copy and paste existing samples!

Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)

9 3.3 Generative
Why Generative Models? (1)
• Model very complex and high-dimensional distributions.

#StyleGAN2 Karras, Tero, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Analyzing and
improving the image quality of stylegan." CVPR 2020. [code]

10 3.3 Generative
Why Generative Models? (2)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• possibly perform data augmentation
• simulate possible futures for learning algorithms
• Fill blanks in the data

#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. Learning temporal coherence via self-supervision for GAN-based video generation. ACM Transactions on Graphics, 2020.
#SEGAN S. Pascual, A. Bonafonte, J. Serrà. SEGAN: Speech Enhancement Generative Adversarial Network, INTERSPEECH, 2017.

11 3.3 Generative
Why Generative Models? (3a)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Example: edit pictures with guidance (photoshop super pro level)

L Chai, J Wulff, P Isola. “Using latent space regression to analyze and leverage compositionality in GANs”.
ICLR 2021. [tweet]

12 3.3 Generative
Why Generative Models? (3b)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Transfer characteristics from one sample to another

#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.

13 3.3 Generative
Why Generative Models? (3c)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Generate images from a description

“An astronaut riding a horse in a photorealistic style”

#DALL·E 2 A. Ramesh, P. Dhariwal, A. Nichol, et al. “Hierarchical Text-Conditional Image Generation with CLIP Latents”, 2022.

14 3.3 Generative
Types of Generative Models
• Explicit Density
• Tractable density
• Fully visible belief networks (FVBNs)
• Approximate Density
• Variational Autoencoders
• Implicit Density
• Generative Adversarial Networks (GANs)
• Denoising
• Diffusion models

Ian Goodfellow, NIPS 2016 Tutorial: Generative Adversarial Networks.

15 3.3 Generative
Outline

• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)

16 3.3 Generative
Explicit density model: Tractable density (1)
• Distributions over high-dimensional objects are very sparse!!
• Take for instance MNIST with 28x28 black & white images

[Figure: Training Dataset pdata(x) → Learn]

• $2^{28 \times 28} = 2^{784} \approx 10^{236}$ parameters (space of possible x)
• However, only 10 possible digits (0, 1, 2, …, 9)
• Idea: write as a product of simpler terms using conditional probability

$$p(x_1, x_2) = p(x_1)\, p(x_2 \mid x_1)$$

17 3.3 Generative
Explicit density model: Tractable density (2)
• Use the chain rule to decompose the likelihood of a sample x into a product of 1D distributions:

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$

Left-hand side: likelihood of sample x (a sentence, an image). Each factor: probability of the i-th value (word, char, pixel) given all previous ones (words, chars, pixels).
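For a concrete (tiny) case, a sample with three values factorizes as:

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2)$$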

18 3.3 Generative
Explicit density model: Tractable density (3)
• Use the chain rule to decompose the likelihood of a sample x into a product of 1D distributions:

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})$$

[Figure: can you guess the next pixel (in red) conditioned on the previous ones?]

• Still complex!
• Use neural networks to approximate the conditional distributions: $p_\theta(x_i \mid x_1, \ldots, x_{i-1})$
• Then maximize the likelihood of the training data
• Estimate it as a distribution peaked at the $n$ data points we have

19 3.3 Generative
Explicit density model: Tractable density (4)
Do you recall any network architecture we can use to approximate the conditional probability?

• Do not use an encoder, but feed directly a seed (random noise).
• Do not use argmax, but use the softmax output as the conditional probability, and then SAMPLE! (See the sketch below.)
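A minimal sketch of that sampling loop in PyTorch, assuming a hypothetical `model` that maps a sequence of integer values to logits over the next value at each position (shape `(batch, length, num_values)`), e.g. an RNN or PixelRNN-style network:

```python
import torch

def autoregressive_sample(model, seed, steps):
    """Generate one value at a time, sampling each x_i from the softmax over
    p(x_i | x_1, ..., x_{i-1}) instead of taking the argmax."""
    x = seed                                           # (1, t0) seed prefix (random noise / start token)
    for _ in range(steps):
        logits = model(x)[:, -1, :]                    # logits for the next position
        probs = torch.softmax(logits, dim=-1)          # conditional distribution p(x_i | x_<i)
        nxt = torch.multinomial(probs, num_samples=1)  # SAMPLE, do not argmax
        x = torch.cat([x, nxt], dim=1)                 # feed it back for the next step
    return x
```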
20 3.3 Generative
Tractable density: Summary

• Can explicitly compute the likelihood $p_\theta(x)$
• Explicit likelihood of training data gives a good evaluation metric and relatively good samples
• Sequential generation is very slow

21 3.3 Generative
Outline

• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)

22 3.3 Generative
Autoencoder vs Variational Autoencoder (1)
• Autoencoders
• Predict at the output the same input data
• Do not need labels

[Figure: $\mathbf{x}$ → Encoder $e$ → $\mathbf{z}$ → Decoder $d$ → $\hat{\mathbf{x}}$]

$$\mathcal{L} = \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \|\mathbf{x} - d(\mathbf{z})\|^2 = \|\mathbf{x} - d(e(\mathbf{x}))\|^2$$
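A minimal sketch of this encoder/decoder setup and loss in PyTorch (layer sizes are illustrative assumptions, not taken from the slides):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder e(x): compress the input into a low-dimensional code z.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Decoder d(z): reconstruct the input from the code.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

x = torch.rand(8, 784)                        # a batch of flattened 28x28 images
x_hat = Autoencoder()(x)
loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # reconstruction loss ||x - d(e(x))||^2
```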

23 3.3 Generative
Autoencoder vs Variational Autoencoder (2)
• How do we generate data with autoencoders?
• Sample z (random) and use the decoder only
• However, autoencoders “just” memorize codes 𝐳 from our training samples!

[Figure: “Generate”: sample 𝐳 (random) → Decoder → $\hat{\mathbf{x}}$, discarding the Encoder]

24 3.3 Generative
Autoencoder vs Variational Autoencoder (3)
• How do we generate data with autoencoders?
• Sample z (random) and use the decoder only
• However, autoencoders “just” memorize codes 𝐳 from our training samples!

Image from J. Rocca, “Understanding Variational Autoencoders (VAEs)”, 2019.

25 3.3 Generative
Variational Autoencoder (1)
• Training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable the generative process
• Introduce a restriction on 𝐳, such that our data points 𝐱 are distributed in a latent space (manifold) following a specified probability density function (normally 𝑁(0, 𝐼))

[Figure: 𝐱 → Encoder → 𝐳 ~ 𝑁(0, 𝐼) → Decoder → $\hat{\mathbf{x}}$]

26 3.3 Generative
Variational Autoencoder (2)
• Intuition behind normally distributed z vectors: any output distribution can be achieved from the simple 𝑁(0, 𝐼) with powerful non-linear mappings.

[Figure: samples from N(0, I) transformed by a non-linear mapping into a complex target distribution]

27 3.3 Generative
Variational Autoencoder (3)
• Multivariate normal distribution (dimensions k = 2)

Notation: $N(\boldsymbol{\mu}, \boldsymbol{\sigma}^2)$
Location: $\boldsymbol{\mu} \in \mathbb{R}^k$
Covariance: $\boldsymbol{\sigma}^2 \in \mathbb{R}^{k \times k}$

Source: Wikipedia. Image by Bscan

28 3.3 Generative
Variational inference (1)
• Variational Autoencoders: a.k.a. where Bayesian theory (probability) and deep learning collide
• We want to maximize the likelihood of the training data:

$$p_\theta(\mathbf{x}) = \int p_\theta(\mathbf{z})\, p_\theta(\mathbf{x} \mid \mathbf{z})\, d\mathbf{z}$$

• What is the problem with this? Intractable
  • $p_\theta(\mathbf{z})$: simple Gaussian prior
  • $p_\theta(\mathbf{x} \mid \mathbf{z})$: decoder neural network
  • The integral is intractable: we cannot compute $p_\theta(\mathbf{x} \mid \mathbf{z})$ for every z
• The posterior is also intractable:

$$p_\theta(\mathbf{z} \mid \mathbf{x}) = p_\theta(\mathbf{x} \mid \mathbf{z})\, p(\mathbf{z}) / p(\mathbf{x})$$
29 3.3 Generative
Variational inference (2)
• Solution
• Use the encoder network $q_\phi(\mathbf{z} \mid \mathbf{x})$ to approximate $p_\theta(\mathbf{z} \mid \mathbf{x})$
• This will allow us to derive a lower bound on the data likelihood

[Figure: 𝐱 → Encoder $q_\phi(z|x)$ → z → Decoder $p_\theta(x|z)$ → $\hat{\mathbf{x}}$]

30 3.3 Generative
Variational inference (3)
• Let’s analyse the (log) data likelihood:

$$
\begin{aligned}
\log p_\theta(\mathbf{x})
&= \mathbb{E}_{\mathbf{z} \sim q_\phi(\mathbf{z}|\mathbf{x})}\!\left[\log p_\theta(\mathbf{x})\right]
  \quad \text{(taking the expectation wrt. } \mathbf{z}\text{; } p_\theta(\mathbf{x}) \text{ does not depend on } \mathbf{z}\text{)} \\
&= \mathbb{E}_{\mathbf{z}}\!\left[\log \frac{p_\theta(\mathbf{x}|\mathbf{z})\, p_\theta(\mathbf{z})}{p_\theta(\mathbf{z}|\mathbf{x})}\right]
  \quad \text{(Bayes' rule)} \\
&= \mathbb{E}_{\mathbf{z}}\!\left[\log \frac{p_\theta(\mathbf{x}|\mathbf{z})\, p_\theta(\mathbf{z})}{p_\theta(\mathbf{z}|\mathbf{x})} \cdot \frac{q_\phi(\mathbf{z}|\mathbf{x})}{q_\phi(\mathbf{z}|\mathbf{x})}\right]
  \quad \text{(multiply by a constant)} \\
&= \mathbb{E}_{\mathbf{z}}\!\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right]
  - \mathbb{E}_{\mathbf{z}}\!\left[\log \frac{q_\phi(\mathbf{z}|\mathbf{x})}{p_\theta(\mathbf{z})}\right]
  + \mathbb{E}_{\mathbf{z}}\!\left[\log \frac{q_\phi(\mathbf{z}|\mathbf{x})}{p_\theta(\mathbf{z}|\mathbf{x})}\right]
  \quad \text{(expectations wrt. } \mathbf{z} \text{ let us write KL terms)} \\
&= \mathbb{E}_{\mathbf{z}}\!\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right]
  - D_{KL}\!\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z})\right)
  + D_{KL}\!\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x})\right)
\end{aligned}
$$

The first two terms form a tractable lower bound; the last term contains the intractable $p_\theta(\mathbf{z}|\mathbf{x})$, but KL >= 0, so:

$$\log p_\theta(\mathbf{x}) \;\geq\; \mathbb{E}_{\mathbf{z}}\!\left[\log p_\theta(\mathbf{x}|\mathbf{z})\right] - D_{KL}\!\left(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p_\theta(\mathbf{z})\right) = -\mathcal{L}(x, \theta, \phi)$$

31 3.3 Generative
Variational inference (4)

$$\mathcal{L}(x, \theta, \phi) = D_{KL}\!\left(q_\phi(z|x) \,\|\, p_\theta(z)\right) - \mathbb{E}_{z}\!\left[\log p_\theta(x|z)\right]$$

• First term: regularization of our latent representation → the NEURAL ENCODER projects over the prior.
• Second term: reconstruction loss of our data given the latent space → NEURAL DECODER reconstruction loss!

Training: minimize the loss (maximize the lower bound)

$$\theta^*, \phi^* = \arg\min_{\theta, \phi} \sum_{i=1}^{n} \mathcal{L}(x_i, \theta, \phi)$$
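One step the slides leave implicit: if we assume a Gaussian decoder $p_\theta(x|z) = N(x; \hat{x}, I)$, with $\hat{x}$ the decoder output, then up to an additive constant

$$-\log p_\theta(x \mid z) = \tfrac{1}{2}\,\|x - \hat{x}\|^2 + \text{const.},$$

which is why the reconstruction term turns into the squared error used in the next slides.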

32 3.3 Generative
Variational inference (5)
• Putting it all together:
• Encoder / Decoder setup with Variational Autoencoder losses to regularize and reconstruct.

[Figure: 𝐱 → Encode $q_\phi(z|x)$ → $(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x)$ → SAMPLE $z \sim N(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x)$ → Decode $p_\theta(x|z)$ → $\hat{\mathbf{x}}$]

Regularization term (closed form for Gaussians):

$$D_{KL}\!\left(N(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x) \,\|\, N(0, I)\right) = \frac{1}{2}\left(\mathrm{tr}(\boldsymbol{\sigma}^2_x) + \boldsymbol{\mu}_x^{T}\boldsymbol{\mu}_x - k - \log\det(\boldsymbol{\sigma}^2_x)\right)$$

Reconstruction term:

$$\mathbb{E}_{z}\!\left[\log p_\theta(x|z)\right] \approx -\|\mathbf{x} - \hat{\mathbf{x}}\|^2$$
33 3.3 Generative
Variational inference (6)
• Putting it all together:
• Encoder / Decoder setup with Variational Autoencoder losses to regularize and reconstruct.

[Figure: 𝐱 → Encode $q_\phi(z|x)$ → $(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x)$ → SAMPLE z → Decode $p_\theta(x|z)$ → $\hat{\mathbf{x}}$]

$$\mathcal{L} = \frac{1}{2}\left(\mathrm{tr}(\boldsymbol{\sigma}^2_x) + \boldsymbol{\mu}_x^{T}\boldsymbol{\mu}_x - k - \log\det(\boldsymbol{\sigma}^2_x)\right) + \lambda\,\|\mathbf{x} - \hat{\mathbf{x}}\|^2$$
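A minimal sketch of this combined loss in PyTorch, assuming the encoder outputs per-dimension `mu` and `log_var` (a common diagonal-covariance parameterization, so the trace and log-determinant become simple sums):

```python
import torch

def vae_loss(x, x_hat, mu, log_var, lam=1.0):
    """0.5 * (tr(sigma^2) + mu^T mu - k - log det(sigma^2)) + lambda * ||x - x_hat||^2."""
    # Closed-form KL between N(mu, diag(sigma^2)) and the prior N(0, I).
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var, dim=1)
    recon = torch.sum((x - x_hat) ** 2, dim=1)   # reconstruction term
    return (kl + lam * recon).mean()
```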

34 3.3 Generative
Intuition about regularization (1)
• Two main properties:
• Continuity: two close points in the latent space should not give two completely different contents once decoded
• Completeness: for a chosen distribution, a point sampled in the latent space should give “meaningful” content once decoded

Image from J. Rocca, “Understanding Variational Autoencoders (VAEs)”, 2019.

35 3.3 Generative
Intuition about regularization (2)
• Need to regularise both the covariance matrix and the mean of the distributions returned by the encoder:
• Avoid distributions with very different means or tiny variances
• Regularization tends to create a “gradient” over the information encoded in the latent space

Image from J. Rocca, “Understanding Variational Autoencoders (VAEs)”, 2019.

36 3.3 Generative
Reparametrization trick (1)
• We cannot backpropagate through sampling as it is not differentiable!

$$\mathcal{L} = \frac{1}{2}\left(\mathrm{tr}(\boldsymbol{\sigma}^2_x) + \boldsymbol{\mu}_x^{T}\boldsymbol{\mu}_x - k - \log\det(\boldsymbol{\sigma}^2_x)\right) + \lambda\,\|\mathbf{x} - \hat{\mathbf{x}}\|^2$$

[Figure: 𝐱 → Encode $q_\phi(z|x)$ → $(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x)$ → SAMPLE z → Decode $p_\theta(x|z)$ → $\hat{\mathbf{x}}$; the gradient $\partial\mathcal{L}/\partial\phi$ cannot flow back through the sampling step]

37 3.3 Generative
Reparametrization trick (2)
Solution
• Sample $\boldsymbol{\varepsilon} \sim N(0, I)$ and define z from it, multiplying by $\boldsymbol{\sigma}_x$ and adding $\boldsymbol{\mu}_x$:

$$z = \boldsymbol{\mu}_x + \boldsymbol{\sigma}_x \boldsymbol{\varepsilon}$$

[Figure: 𝐱 → Encode $q_\phi(z|x)$ → $(\boldsymbol{\mu}_x, \boldsymbol{\sigma}^2_x)$; $\boldsymbol{\varepsilon}$ is SAMPLED from $N(0, I)$ outside the network, so the gradient $\partial\mathcal{L}/\partial\phi$ flows through $z = \boldsymbol{\mu}_x + \boldsymbol{\sigma}_x\boldsymbol{\varepsilon}$ back into Encode; then Decode $p_\theta(x|z)$ → $\hat{\mathbf{x}}$]
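A minimal sketch of the trick, assuming the same `mu` / `log_var` encoder outputs as in the loss sketch above:

```python
import torch

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I): sampling is moved into eps,
    so gradients can flow back through mu and sigma to the encoder."""
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn_like(sigma)
    return mu + sigma * eps
```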
38 3.3 Generative
Generative behaviour (1)
• Sample from the prior, discarding the encoder path:

[Figure: example for MNIST: $z \sim N(0, I)$ → Decoder $p_\theta(x|z)$ → generated digits]
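A minimal generation sketch, assuming a trained decoder module `decoder` that maps a latent vector to a flattened image (names and sizes are illustrative):

```python
import torch

def generate(decoder, num_samples=16, latent_dim=32):
    """Sample z from the prior N(0, I) and decode it into new samples."""
    z = torch.randn(num_samples, latent_dim)   # prior sample; no encoder needed
    with torch.no_grad():
        return decoder(z)                      # e.g. (num_samples, 784) images
```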

39 3.3 Generative
Generative behaviour (2)
• Sample from prior, discarding the encoder path:

Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]

40 3.3 Generative
Variational Autoencoders: Summary

• Probabilistic spin to autoencoders
• Allows generation of data
• Defines an intractable density → optimize a variational lower bound
• Allows inference of $q_\phi(\mathbf{z}|\mathbf{x})$
• Maximizes a lower bound of the likelihood
• OK, but not as good as the full maximization of $p_\theta(x)$
• Samples tend to be blurrier and of lower quality compared to GANs
• Active area of research
• Richer approximations such as Gaussian Mixture Models
• Incorporating structure in latent variables

41 3.3 Generative
Want to know more?

● Carl Doersch, Tutorial on Variational Autoencoders (2016)
● Variational Autoencoder: Intuition and Implementation
● Jaan Altosaar, Tutorial - What is a variational autoencoder?

42 3.3 Generative
Questions?

https://imatge.upc.edu/web/people/javier-ruiz-hidalgo
43 3.3 Generative
