AA2 3.3.1 Generative Explicit Density 2024
3. Advanced architectures
3.3 Generative networks
3.3.1 Explicit density
Image and Video Processing Group, TSC, UPC
Outline
• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)
Supervised vs Unsupervised training
Generative models (1)
[Figure: discriminative modeling learns p(y|x): map an input to an answer, e.g. classifying an image as "Jim Carrey" or answering "What language?" (classification / regression).]
Generative models (2)
[Figure: the same examples recast as generative modeling: instead of p(y|x), model the distribution of the data itself.]
Generative models (3)
[Figure: learn a model of the data distribution from the training samples, then sample from it to output generated samples.]
Generative models (4)
[Figure: learn a model of the data distribution, then interpolate within it to obtain interpolated samples.]
Generative models (5)
[Figure: each image is a single point in a very high-dimensional space (M pixel samples = M dimensions), e.g. 32 × 32 pixels × 3 channels = 3072 dimensions.]
Generative models (6)
Figure from NIPS 2016 Tutorial: Generative Adversarial Networks (I. Goodfellow)
Why Generative Models? (1)
• Model very complex and high-dimensional distributions.
#StyleGAN2 Karras, Tero, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. "Analyzing and Improving the Image Quality of StyleGAN." CVPR 2020. [code]
Why Generative Models? (2)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• possibly perform data augmentation
• simulate possible futures for learning algorithms
• Fill blanks in the data
#TecoGAN Chu, M., Xie, Y., Mayer, J., Leal-Taixé, L., & Thuerey, N. "Learning temporal coherence via self-supervision for GAN-based video generation." ACM Transactions on Graphics, 2020.
#SEGAN S. Pascual, A. Bonafonte, J. Serrà. "SEGAN: Speech Enhancement Generative Adversarial Network." INTERSPEECH, 2017.
Why Generative Models? (3a)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Example: edit pictures with guidance (Photoshop super-pro level)
L. Chai, J. Wulff, P. Isola. "Using latent space regression to analyze and leverage compositionality in GANs." ICLR 2021. [tweet]
Why Generative Models? (3b)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Transfer characteristics from one sample to another
#EDN Chan, C., Ginosar, S., Zhou, T., & Efros, A. A. Everybody dance now. ICCV 2019.
Why Generative Models? (3c)
• Model very complex and high-dimensional distributions.
• Be able to generate realistic synthetic samples
• Manipulate real samples with the assistance of the generative model
• Generate images from a description
Prompt: "An astronaut riding a horse in a photorealistic style"
#DALL·E 2 A. Ramesh, P. Dhariwal, A. Nichol, et al. "Hierarchical Text-Conditional Image Generation with CLIP Latents." 2022.
Types of Generative Models
• Explicit Density
  • Tractable density
    • Fully visible belief networks (FVBNs)
  • Approximate Density
    • Variational Autoencoders
• Implicit Density
  • Generative Adversarial Networks (GANs)
• Denoising Diffusion Models
Outline
• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)
Explicit density model: Tractable density (1)
• Distributions over high-dimensional objects are very sparse!
• Take for instance MNIST, with 28×28 black-and-white images
[Figure: learn p_data(x) from the training dataset.]
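A quick back-of-the-envelope check of how sparse this is (not on the slide): even restricting every one of the 28 × 28 pixels to be binary gives 2^784 ≈ 10^236 possible images, while MNIST supplies only 60,000 training samples, so the observed data occupies a vanishingly small fraction of the input space.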
Explicit density model: Tractable density (2)
• Use chain rule to decompose likelihood of sample x
into a product of 1D distributions:
p(x) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i−1})
Explicit density model: Tractable density (3)
• Use chain rule to decompose likelihood of sample x
into a product of 1D distributions:
p(x) = ∏_{i=1}^{n} p(x_i | x_1, …, x_{i−1})
Explicit density model: Tractable density (4)
Do you recall any network architecture we can use to
approximate the conditional probability?
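One common answer is an autoregressive network (e.g. a recurrent net, or the masked convolutions of PixelRNN/PixelCNN) that predicts each dimension from the previous ones. Below is a minimal sketch, not the course's reference code, assuming binarized 28×28 MNIST images flattened into a 784-step pixel sequence; the PixelLSTM name and all sizes are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelLSTM(nn.Module):
    """Models p(x) = prod_i p(x_i | x_1, ..., x_{i-1}) over flattened binary pixels."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)            # logit of p(x_i = 1 | x_<i)

    def forward(self, x):                          # x: (B, 784) binary pixels
        # shift right so that pixel i only sees pixels 1..i-1
        start = torch.zeros(x.size(0), 1, device=x.device)
        h, _ = self.lstm(torch.cat([start, x[:, :-1]], dim=1).unsqueeze(-1))
        return self.out(h).squeeze(-1)             # (B, 784) logits

model = PixelLSTM()
x = (torch.rand(8, 784) > 0.5).float()             # toy batch of "binarized images"
logits = model(x)
# negative log-likelihood = sum of the 784 conditional Bernoulli terms
nll = F.binary_cross_entropy_with_logits(logits, x, reduction='sum') / x.size(0)
nll.backward()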
Outline
• Unsupervised learning
• Generative Models
• Explicit density
• Tractable density
• Variational autoencoders
• Implicit density
• Generative Adversarial Networks (GANs)
Autoencoder vs Variational Autoencoder (1)
• Autoencoders
• Predict at the output the same input data
• Do not need labels
[Figure: x → Encoder e → z → Decoder d → x̂]
ℒ = ‖x − x̂‖² = ‖x − d(z)‖² = ‖x − d(e(x))‖²
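A minimal sketch of this setup, assuming flattened 784-dimensional (MNIST-like) inputs and a 2-dimensional code z; the architecture and sizes are illustrative, not the course's reference implementation.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 2))   # e(x) -> z
decoder = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 784))   # d(z) -> x_hat

x = torch.rand(8, 784)                        # a toy batch; note: no labels are needed
x_hat = decoder(encoder(x))
loss = ((x - x_hat) ** 2).sum(dim=1).mean()   # L = ||x - d(e(x))||^2, averaged over the batch
loss.backward()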
Autoencoder vs Variational Autoencoder (2)
• How do we generate data with autoencoders?
• Sample z (random) and use the decoder only
• However, autoencoders "just" memorize codes 𝐳 from our training samples!
[Figure: to "generate", sample a random z and run only the decoder to obtain x̂.]
Autoencoder vs Variational Autoencoder (3)
• How do we generate data with autoencoders?
• Sample z (random) and use the decoder only
• However, autoencoders "just" memorize codes 𝐳 from our training samples!
Variational Autoencoder (1)
• Training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable the generative process
• Introduce a restriction in 𝐳, such that our data points 𝐱 are
distributed in a latent space (manifold) following a
specified probability density function (normally 𝑁(0, 𝐼))
[Figure: x → Encoder → z ∼ N(0, I) → Decoder → x̂]
Variational Autoencoder (2)
• Intuition behind normally distributed z vectors: any output
distribution can be achieved from the simple 𝑁(0, 𝐼) with
powerful non-linear mappings.
[Figure: samples from N(0, I) mapped through a non-linear function onto a complex target distribution.]
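A quick illustration of this intuition (a sketch, not from the slides): a fixed non-linear mapping g pushes samples of the simple N(0, I) prior onto a much more complex distribution, here a noisy ring; in a VAE the decoder learns such a mapping.

import torch

z = torch.randn(10_000, 2)                       # z ~ N(0, I) in 2D
radius = 3.0 + 0.1 * z[:, 1]                     # hand-made non-linear map g(z)
angle = torch.atan2(z[:, 1], z[:, 0])
x = torch.stack([radius * torch.cos(angle), radius * torch.sin(angle)], dim=1)
print(x.mean(dim=0), x.norm(dim=1).mean())       # mean ≈ 0, radius ≈ 3: a ring, not a Gaussian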
Variational Autoencoder (3)
• Multivariate normal distribution (dimensions k = 2)
Notation: N(μ, σ²)
Location: μ ∈ ℝ^k
Covariance: σ² ∈ ℝ^{k×k}
Variational inference (1)
• Variational Autoencoders: a.k.a. where Bayesian theory (probability) and deep learning collide
• We want to maximize likelihood of training data:
p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz
where p_θ(x|z) is the decoder neural network.
• The integral is intractable: it would require evaluating p(x|z) for every z
• The posterior is also intractable:
p_θ(z|x) = p_θ(x|z) p(z) / p(x)
Variational inference (2)
• Solution
• Use the encoder network q_φ(z|x) to approximate p_θ(z|x)
• This will allow us to derive a lower bound on the data
likelihood
[Figure: x → encoder q_φ(z|x) → z → decoder p_θ(x|z) → x̂]
Variational inference (3)
• Let's analyse the (log) data likelihood, taking the expectation w.r.t. z ∼ q_φ(z|x) (p_θ(x) does not depend on z):

log p_θ(x) = E_{z∼q_φ(z|x)}[log p_θ(x)]
= E_z[ log ( p_θ(x|z) p_θ(z) / p_θ(z|x) ) ]    (Bayes' rule)
= E_z[ log ( p_θ(x|z) p_θ(z) q_φ(z|x) / ( p_θ(z|x) q_φ(z|x) ) ) ]    (multiply by a constant)
= E_z[log p_θ(x|z)] − E_z[ log ( q_φ(z|x) / p_θ(z) ) ] + E_z[ log ( q_φ(z|x) / p_θ(z|x) ) ]    (the expectation w.r.t. z lets us write Kullback-Leibler terms)
= E_z[log p_θ(x|z)] − D_KL( q_φ(z|x) ‖ p_θ(z) ) + D_KL( q_φ(z|x) ‖ p_θ(z|x) )
≥ E_z[log p_θ(x|z)] − D_KL( q_φ(z|x) ‖ p_θ(z) ) = −ℒ(x, θ, φ)

The last KL term is ≥ 0 (and intractable), so dropping it leaves a tractable lower bound (the ELBO) on the data likelihood; training maximizes it, i.e. minimizes ℒ(x, θ, φ).
Variational inference (4)
Variational inference (5)
• Putting it all together:
• Encoder / Decoder setup with Variational Autoencoder losses to
regularize and reconstruct.
[Figure: x → encoder q_φ(z|x) → (μ_x, σ_x²) → SAMPLE z ∼ N(μ_x, σ_x²) → decoder p_θ(x|z) → x̂]
ℒ = ½ [ tr(σ_x²) + μ_xᵀ μ_x − k − log det(σ_x²) ] + λ ‖x − x̂‖²
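A minimal sketch of this loss (assuming, as is common, that the encoder outputs log σ_x² for numerical stability; names and shapes are illustrative):

import torch

def vae_loss(x, x_hat, mu, log_var, lam=1.0):
    # closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the k latent dims:
    # 0.5 * sum_i (sigma_i^2 + mu_i^2 - 1 - log sigma_i^2)
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var, dim=1)
    rec = torch.sum((x - x_hat).pow(2), dim=1)          # squared reconstruction error
    return (kl + lam * rec).mean()

x, x_hat = torch.rand(8, 784), torch.rand(8, 784)       # toy batch, 784-dim inputs
mu, log_var = torch.zeros(8, 2), torch.zeros(8, 2)      # 2-dim latent
print(vae_loss(x, x_hat, mu, log_var))                  # KL term is 0 when mu = 0, sigma = 1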
Intuition about regularization (1)
• Two main properties:
• Continuity: two close points in the latent space should not give two completely different contents once decoded
• Completeness: for a chosen distribution, a point sampled in the latent space should give "meaningful" content once decoded
Intuition about regularization (2)
• Need to regularise both the covariance matrix and the mean
of the distributions returned by the encoder:
• Avoid distributions with very different means or tiny variances
• Regularization tends to create a “gradient” over the
information encoded in the latent space
Reparametrization trick (1)
• We cannot backpropagate through sampling as it is not
differentiable!
ℒ = ½ [ tr(σ_x²) + μ_xᵀ μ_x − k − log det(σ_x²) ] + λ ‖x − x̂‖²
[Figure: x → encoder q_φ(z|x) → (μ_x, σ_x²) → SAMPLE → z → decoder p_θ(x|z) → x̂; the gradient ∂ℒ/∂φ cannot flow back through the sampling node.]
Reparametrization trick (2)
Solution
• Sample ε ∼ N(0, I) and define z from it, multiplying by σ_x and adding μ_x:
z = μ_x + σ_x ε
[Figure: x → encoder q_φ(z|x) → (μ_x, σ_x²); ε ∼ N(0, I) is sampled outside the network and z = μ_x + σ_x ε feeds the decoder p_θ(x|z) → x̂, so ∂ℒ/∂φ can now flow through μ_x and σ_x.]
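A minimal sketch of the trick (toy values, illustrative names): the randomness now enters only through ε, which needs no gradient, while μ_x and σ_x stay on the differentiable path.

import torch

mu = torch.zeros(8, 2, requires_grad=True)       # encoder outputs (toy values)
log_var = torch.zeros(8, 2, requires_grad=True)

sigma = (0.5 * log_var).exp()                    # sigma = exp(log_var / 2)
eps = torch.randn_like(sigma)                    # eps ~ N(0, I), sampled outside the graph
z = mu + sigma * eps                             # z ~ N(mu, sigma^2), differentiable in mu, sigma

z.sum().backward()                               # gradients now reach mu and log_var
print(mu.grad.shape, log_var.grad.shape)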
Generative behaviour (1)
• Sample from prior, discarding the encoder path:
[Figure: example for MNIST: sample z ∼ N(0, I) and decode with p_θ(x|z) to generate new digits.]
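A minimal sketch of the generative path (the decoder here is an untrained stand-in; with a trained p_θ(x|z), the same two sampling lines produce new digits):

import torch
import torch.nn as nn

decoder = nn.Sequential(                          # stand-in for a trained decoder p_theta(x|z)
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 28 * 28), nn.Sigmoid()
)

with torch.no_grad():
    z = torch.randn(16, 2)                        # z ~ N(0, I), the encoder is not used at all
    samples = decoder(z).view(16, 28, 28)         # 16 generated 28x28 images
print(samples.shape)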
Generative behaviour (2)
• Sample from prior, discarding the encoder path:
Vahdat, Arash, and Jan Kautz. "NVAE: A deep hierarchical variational autoencoder." NeurIPS 2020. [code]
Variational Autoencoders: Summary
Want to know more?
Questions?
https://imatge.upc.edu/web/people/javier-ruiz-hidalgo