lecture16 GAN cont

The document discusses Generative Adversarial Networks (GANs) and their training processes, including the roles of the generator and discriminator. It introduces Wasserstein Loss as an alternative to binary cross-entropy loss, addressing issues like vanishing gradients. Additionally, it covers conditional and controllable generation techniques, evaluation metrics like Frechet Inception Distance (FID), and the Pix2Pix model for image-to-image translation.


COMP 4102 – Computer Vision

Generative Adversarial Networks (cont.)


Majid Komeili

* Unless otherwise noted, all material posted for this course is copyright of the
instructor and cannot be reused or reposted without the instructor's written permission.
Housekeeping Items

▪ GAN Tutorial: Thursday March 13 (tomorrow) at 5 pm over Zoom.
▪ The tutorial will be recorded.

Majid Komeili, Carleton University 2


Generative Adversarial Networks
Discriminator learns to distinguish real from fake.
Generator learns to generate fakes, from random noise, that look real.
Generator and Discriminator learn from the competition with each other.
Ian Goodfellow et al., “Generative Adversarial Nets”, NIPS 2014
Training the Discriminator
Recall: BCE = −Σᵢ₌₁ⁿ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]

Update the discriminator weights θ using backprop, computing BCE with labels 1 for real samples x and 0 for fake samples G(z):

Minimize over θ:   𝒥_D = 𝔼_{x∈𝒟}[−log D(x)] + 𝔼_z[−log(1 − D(G(z)))]

The first term is the negative log-probability of D predicting that real-world data x is genuine; the second is the negative log-probability of D predicting that generated data is not genuine.
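The discriminator objective above can be checked numerically. A minimal pure-Python sketch — the `bce` helper and the toy D scores are hypothetical, for illustration only:

```python
import math

def bce(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy summed over samples, matching the slide's formula.
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(y_true, y_pred))

# Discriminator objective on a toy batch: real samples labelled 1, fakes labelled 0.
d_real = [0.9, 0.8]   # hypothetical D(x) scores on two real images
d_fake = [0.2, 0.1]   # hypothetical D(G(z)) scores on two fake images
j_d = bce([1, 1], d_real) + bce([0, 0], d_fake)
```

A perfect discriminator (D(x) = 1 on reals, D(G(z)) = 0 on fakes) drives both terms of `j_d` to zero.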
Training the Generator
Minimize over θ:   𝒥_D = 𝔼_{x∈𝒟}[−log D(x)] + 𝔼_z[−log(1 − D(G(z)))]

Recall: BCE = −Σᵢ₌₁ⁿ [yᵢ log(ŷᵢ) + (1 − yᵢ) log(1 − ŷᵢ)]

Feed random noise z through the Generator G_φ and the Discriminator D_θ; backprop through the discriminator but do not update its parameters θ. Update the generator weights φ using backprop, computing BCE with all labels equal to real (1):

Minimize over φ:   𝒥_G = 𝔼_z[−log D(G(z))]

We may also define 𝒥_G as 𝒥_G = −𝒥_D = const. + 𝔼_z[log(1 − D(G(z)))]

Therefore, the entire cost function for the GAN can be written as min_θ max_φ 𝒥_D:

min_θ max_φ   𝔼_{x∈𝒟}[−log D_θ(x)] + 𝔼_z[−log(1 − D_θ(G_φ(z)))]
Training GANs
Original GAN loss:

min_θ max_φ   𝔼_{x∈𝒟}[−log D_θ(x)] + 𝔼_z[−log(1 − D_θ(G_φ(z)))]

▪ For each iteration
  ▪ Repeat K times
    ▪ Draw z⁽¹⁾, z⁽²⁾, …, z⁽ⁿ⁾ (random noise) and generate n fake samples.
    ▪ Draw x⁽¹⁾, x⁽²⁾, …, x⁽ⁿ⁾ from the training set.
    ▪ Update the discriminator by gradient descent using:
      θ_new = θ_old − ∇_θ (1/n) Σᵢ₌₁ⁿ [−log D_θ(x⁽ⁱ⁾) − log(1 − D_θ(G_φ(z⁽ⁱ⁾)))]
  ▪ Draw z⁽¹⁾, z⁽²⁾, …, z⁽ⁿ⁾ (random noise) and generate n fake samples.
  ▪ Update the generator by gradient descent using the modified generator loss:
      φ_new = φ_old − ∇_φ (1/n) Σᵢ₌₁ⁿ [−log D_θ(G_φ(z⁽ⁱ⁾))]
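The alternating schedule above (K discriminator updates, then one generator update, per iteration) can be sketched as a loop. `d_step` and `g_step` are hypothetical stand-ins for one gradient update each:

```python
import random

def train_gan(d_step, g_step, iters=3, k=2, n=4):
    # Alternating updates as on the slide: K discriminator steps, then one
    # generator step, per outer iteration.
    schedule = []
    for _ in range(iters):
        for _ in range(k):
            z = [random.gauss(0, 1) for _ in range(n)]  # noise batch -> n fakes
            x = [random.gauss(2, 1) for _ in range(n)]  # batch from the "training set"
            d_step(x, z)                                # one discriminator update
            schedule.append("D")
        z = [random.gauss(0, 1) for _ in range(n)]      # fresh noise for the generator
        g_step(z)                                       # one generator update
        schedule.append("G")
    return schedule

schedule = train_gan(lambda x, z: None, lambda z: None)
```

With `iters=3, k=2`, the update order is D, D, G repeated three times — the discriminator is trained more often because its gradient signal also drives the generator.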
Issue with BCE loss
▪ Generally, the discriminator's task is easier than the generator's.
▪ If the discriminator becomes too strong relative to the generator, the BCE
loss saturates (flat regions).
▪ Consequently, there is no gradient signal for the generator.



Wasserstein Loss
▪ Wasserstein Loss is an alternative to the BCE loss.
▪ It is based on the Earth Mover's Distance.
▪ Instead of a discriminator, we have a critic c:

min_g max_c   𝔼[c(x)] − 𝔼[c(G(z))]

▪ The output of W-Loss can be any real value (i.e. the output of a linear layer rather than a sigmoid), representing how real or fake an image is.
▪ Helps with vanishing gradients and mode collapse.
▪ The critic should satisfy the 1-Lipschitz continuity condition. Two common ways to enforce this condition:
  ▪ Weight clipping to [−β, β].
  ▪ Gradient penalty: add a regularization term that penalizes the critic when its gradient norm is higher than 1, defined as (‖∇f(x̂)‖₂ − 1)².

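As a rough sketch of these ideas: a toy linear critic with the W-Loss objective and a numerically estimated gradient penalty. The critic's weight (1.5) is made up for illustration; a real critic is a neural network:

```python
def critic(x, w=1.5, b=0.0):
    # Toy linear critic c(x) = w*x + b: an unbounded real score, no sigmoid.
    return w * x + b

def w_loss(reals, fakes):
    # Critic objective to MAXIMIZE: E[c(x)] - E[c(G(z))].
    return (sum(critic(x) for x in reals) / len(reals)
            - sum(critic(g) for g in fakes) / len(fakes))

def grad_penalty(x_hat, eps=1e-5):
    # (|grad c(x_hat)| - 1)^2 at an interpolate x_hat, via central differences.
    g = (critic(x_hat + eps) - critic(x_hat - eps)) / (2 * eps)
    return (abs(g) - 1.0) ** 2
```

Because this toy critic has slope 1.5 > 1, it violates 1-Lipschitz continuity, so the penalty is positive everywhere and would push the slope back toward 1.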


WGAN vs GAN

Discriminator/Critic:
  GAN:   min_D (1/n) Σᵢ₌₁ⁿ [−log D(x⁽ⁱ⁾) − log(1 − D(G(z⁽ⁱ⁾)))]
  WGAN:  max_c (1/n) Σᵢ₌₁ⁿ [c(x⁽ⁱ⁾) − c(G(z⁽ⁱ⁾))]

Generator:
  GAN:   min_G (1/n) Σᵢ₌₁ⁿ [−log D(G(z⁽ⁱ⁾))]
  WGAN:  max_G (1/n) Σᵢ₌₁ⁿ c(G(z⁽ⁱ⁾))

Conditional Generation
• Specify which class we want the Generator to generate images from.
• Generator input: random noise concatenated with a one-hot vector (e.g. [0, 1, 0, 0, 0, 0]) indicating the class.
• Discriminator input: the RGB image stacked with one-hot class maps: a matrix full of ones for the indicated class and matrices full of zeros for the other classes.
• Discriminator label: 1 if real and from the correct class, 0 otherwise.
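A minimal sketch of how the conditional inputs above might be assembled; the function names are my own, not from the slides:

```python
def one_hot(label, num_classes):
    # One-hot class vector, e.g. label 1 of 3 -> [0.0, 1.0, 0.0].
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def generator_input(noise, label, num_classes):
    # Condition the generator: append the one-hot class vector to the noise.
    return noise + one_hot(label, num_classes)

def discriminator_input(image_hw, label, num_classes):
    # Condition the discriminator: one-hot "class maps" to stack on the image --
    # a matrix of ones for the indicated class, matrices of zeros for the rest.
    h, w = image_hw
    return [[[float(c == label)] * w for _ in range(h)]
            for c in range(num_classes)]
```

In practice these class maps are concatenated channel-wise with the RGB image before the first convolution.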


Conditional vs Controllable Generation
▪ Conditional Generation
  ▪ Generate images from a desired class.
  ▪ e.g. generate a sample from the class "dog".
▪ Controllable Generation
  ▪ Generate images with a desired feature.
  ▪ e.g. eyeglasses when generating face images.



Controllable Generation
▪ Goal: generate images with a desired feature (e.g. eyeglasses).

Random noise → Generator → Classifier (pre-trained eyeglass detector)

https://arxiv.org/pdf/1907.10786.pdf



Controllable Generation
▪ Update the random noise z based on the gradient from a pre-trained
classifier.

z₀ (random noise) → Generator → Classifier (pre-trained eyeglass detector) → P(eyeglasses) = 0.01

https://arxiv.org/pdf/1907.10786.pdf



Controllable Generation
▪ Update the random noise z based on the gradient from a pre-trained
classifier.

z₁ → Generator → Classifier (pre-trained eyeglass detector) → P(eyeglasses) = 0.4

Backprop through the generator and the classifier but do not update their weights; update only z using backprop.

https://arxiv.org/pdf/1907.10786.pdf



Controllable Generation
▪ Update the random noise z based on the gradient from a pre-trained
classifier.

z₂ → Generator → Classifier (pre-trained eyeglass detector) → P(eyeglasses) = 0.8

Backprop through the generator and the classifier but do not update their weights; update only z using backprop.

https://arxiv.org/pdf/1907.10786.pdf



Controllable Generation
▪ Update the random noise z based on the gradient from a pre-trained
classifier.

z₃ → Generator → Classifier (pre-trained eyeglass detector) → P(eyeglasses) = 0.99

Backprop through the generator and the classifier but do not update their weights; update only z using backprop.

https://arxiv.org/pdf/1907.10786.pdf

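The z-update loop across the last few slides can be sketched with toy stand-ins for the frozen networks. Both functions below are hypothetical 1-D caricatures of a generator and a pre-trained eyeglass detector; the gradient is estimated by finite differences to stand in for backprop:

```python
import math

def generator(z):
    # Toy frozen "generator": maps the latent to an eyeglass-intensity feature.
    return 2.0 * z

def classifier(x):
    # Toy frozen detector: sigmoid score standing in for P(eyeglasses | image).
    return 1.0 / (1.0 + math.exp(-x))

def steer_latent(z, steps=50, lr=0.5, eps=1e-5):
    # Gradient ASCENT on P(eyeglasses) with respect to z only; the generator's
    # and classifier's weights are never updated, exactly as on the slides.
    for _ in range(steps):
        grad = (classifier(generator(z + eps))
                - classifier(generator(z - eps))) / (2 * eps)
        z += lr * grad
    return z
```

Starting from a latent with a low score, each update nudges z so that the generated image scores higher under the frozen detector — mirroring the 0.01 → 0.4 → 0.8 → 0.99 progression above.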


Evaluating GANs
▪ Evaluation is a challenge because there is no ground truth.
▪ Fidelity
  ▪ Quality of the generated images: how realistic are they?
  ▪ Compare more abstract features (e.g. for generated dogs: two eyes, a nose, four legs, …).
▪ Diversity
  ▪ The range of images the generator can generate.



Frechet Inception Distance (FID)
▪ Goal: compare the statistics of a set of fake images versus the statistics of a set of real images in an embedding space.
▪ Embedding space: the Inception-v3 model is used, taking the last pooling layer prior to the output (2048 features).
▪ The activations are summarized as a multivariate Gaussian: calculate the mean and covariance of the images in the embedding space.
▪ The distance between these two distributions is then calculated using the Fréchet distance.

▪ Lower FID = closer distributions.

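As a sketch, here is the univariate special case of the Fréchet distance between two Gaussians; the real FID applies the multivariate version to the 2048-dimensional Inception-v3 feature statistics:

```python
import math

def fid_1d(mu_r, var_r, mu_f, var_f):
    # Univariate Frechet distance between N(mu_r, var_r) and N(mu_f, var_f):
    #   d^2 = (mu_r - mu_f)^2 + var_r + var_f - 2*sqrt(var_r * var_f)
    # Real FID replaces the last term with 2*Tr((Sigma_r Sigma_f)^(1/2)) over
    # the covariance matrices of the Inception-v3 embeddings.
    return (mu_r - mu_f) ** 2 + var_r + var_f - 2.0 * math.sqrt(var_r * var_f)
```

Identical distributions give FID = 0; shifting the mean or changing the spread of the fake distribution both increase it, which is why lower FID means closer distributions.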


Image-to-image Translation
▪ A wide range of important image-to-image problems exists in the space of
computer graphics, image processing, and computer vision.

Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." 2017.
Image-to-image Translation

• Using naive loss functions often produces blurry images → use a GAN.

Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." 2017.



Image-to-image Translation
▪ GANs are generative models that learn a mapping from
random noise vector z to output image y:
G:z→y
▪ Training data: a set of real data

▪ In contrast, conditional GANs learn a mapping from an observed image x and a random noise vector z to y:
G : {x, z} → y
▪ We condition the generated image y on the input image x.
▪ Training data: pairs of source and target images.



Pix2Pix Model
GAN loss function (x: real image, z: noise, G(z): fake image; training dataset: images x):

min_θ max_φ   𝔼_{x∈𝒟}[−log D_θ(x)] + 𝔼_z[−log(1 − D_θ(G_φ(z)))]

Equivalently, writing y for a real image (y: real image, z: noise, G(z): fake image; training dataset: images y):

min_φ max_θ   𝔼_{y∈𝒟}[log D_θ(y)] + 𝔼_z[log(1 − D_θ(G_φ(z)))]

cGAN loss function, as seen in the original paper (x: source image, y: real target image, z: noise, G(x, z): fake target image; training dataset: pairs of (x, y) images):

min_φ max_θ   𝔼_{x,y}[log D_θ(x, y)] + 𝔼_{x,z}[log(1 − D_θ(G_φ(x, z)))]

The generator tries to minimize this function against an adversarial discriminator that tries to maximize it.

• cGANs were first proposed in Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
• Pix2Pix is a cGAN proposed in Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." 2017.


Pix2Pix Model
x: source image; y: real target image; z: noise; G(x, z): fake target image. Training dataset: pairs of (x, y) images.

• To force the generator to produce outputs that are near the ground truth and to reduce blurriness, the authors added an L1 loss:

ℒ_L1(G) = 𝔼_{x,y,z}[‖y − G(x, z)‖₁]

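A minimal sketch of the combined generator objective, assuming a scalar discriminator score and flattened images; the helper names are mine, and λ = 100 follows the paper:

```python
import math

def l1_loss(y, y_hat):
    # Mean absolute error between the ground-truth target y and G(x, z).
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def pix2pix_gen_loss(d_score_fake, y, y_hat, lam=100.0, eps=1e-12):
    # Generator objective: fool the discriminator on the fake target (BCE term),
    # plus lambda * L1 pulling the output toward the ground truth.
    adv = -math.log(d_score_fake + eps)
    return adv + lam * l1_loss(y, y_hat)
```

The large λ means the L1 term dominates early training, which is what suppresses blurry averages while the adversarial term sharpens details.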


PatchGAN discriminator
▪ Instead of predicting whether the entire image is real or fake, the discriminator predicts whether each NxN patch of the image is real or fake (each output unit has an NxN receptive field).

▪ Pix2Pix uses a PatchGAN discriminator.

Demir, U., & Unal, G. (2018). Patch-based image inpainting with generative adversarial networks. arXiv preprint arXiv:1803.07422.

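The per-patch idea can be sketched with a sliding window over a toy image; the default `score` stand-in (the patch mean) replaces a learned patch discriminator:

```python
def patch_decisions(image, patch=4, stride=4, score=None):
    # Slide an NxN window over the image and emit one real/fake score per patch;
    # a PatchGAN loss averages this grid instead of using one whole-image score.
    score = score or (lambda p: sum(sum(r) for r in p) / (len(p) * len(p[0])))
    h, w = len(image), len(image[0])
    grid = []
    for i in range(0, h - patch + 1, stride):
        row = [score([r[j:j + patch] for r in image[i:i + patch]])
               for j in range(0, w - patch + 1, stride)]
        grid.append(row)
    return grid

ones = [[1.0] * 8 for _ in range(8)]   # an 8x8 toy "image"
grid = patch_decisions(ones)           # 2x2 grid of per-patch scores
```

In a real PatchGAN the grid is produced in one shot by a fully convolutional network, so the receptive field of each output unit plays the role of the window here.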


Pix2pix: Cityscapes labels→photo



Pix2pix: edges→handbags



Pix2pix: Image Inpainting



Cycle GAN
▪ Image-to-image translation in the absence of paired examples.

Cannot use Pix2pix!


Unpaired Image to Image Translation



CycleGAN
▪ Two mapping functions (Generators)
▪ G: X → Y
▪ F: Y → X

▪ Two Discriminators:
▪ 𝐷𝑦 : Discriminates y from G(x)
▪ 𝐷𝑥 : Discriminates x from F(y)



CycleGAN

Forward cycle-consistency loss:  x → G(x) → F(G(x)) ≈ x
Backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y



CycleGAN
▪ Adversarial loss (for the mapping G: X → Y and its discriminator D_Y; similarly for F and D_X):

min_G max_{D_Y}   ℒ_GAN(G, D_Y, X, Y) = 𝔼_y[log D_Y(y)] + 𝔼_x[log(1 − D_Y(G(x)))]

▪ Cycle-consistency loss:

ℒ_cyc(G, F) = 𝔼_x[‖F(G(x)) − x‖₁] + 𝔼_y[‖G(F(y)) − y‖₁]

▪ Final loss:

ℒ(G, F, D_X, D_Y) = ℒ_GAN(G, D_Y, X, Y) + ℒ_GAN(F, D_X, Y, X) + λ ℒ_cyc(G, F)
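The cycle-consistency term can be sketched directly, assuming scalar "images" and hypothetical mappings G and F:

```python
def cycle_loss(x_batch, y_batch, G, F, lam=10.0):
    # L_cyc = E[|F(G(x)) - x|] + E[|G(F(y)) - y|], weighted by lambda in the
    # full objective (lambda = 10 in the CycleGAN paper).
    fwd = sum(abs(F(G(x)) - x) for x in x_batch) / len(x_batch)
    bwd = sum(abs(G(F(y)) - y) for y in y_batch) / len(y_batch)
    return lam * (fwd + bwd)

def G(v): return v + 1.0   # hypothetical X -> Y mapping
def F(v): return v - 1.0   # hypothetical inverse Y -> X mapping

loss = cycle_loss([0.0, 1.0], [0.0, 1.0], G, F)  # perfect inverses -> 0.0
```

When G and F are exact inverses the loss is zero; any translation that cannot be undone is penalized, which is what replaces paired supervision.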


CycleGAN

▪ Horse to zebra



CycleGAN

▪ Orange to Apple



CycleGAN
▪ Smartphone snaps to professional DSLR photographs

