Diffusion Models
Forward diffusion
Backward denoising
2
Diffusion models
Overview
◦ Forward diffusion process
◦ Input: a clean image; Output: pure Gaussian noise
◦ A Markovian process iteratively adds Gaussian noise to the clean image,
progressively destroying it until only pure Gaussian noise remains
◦ Backward denoising
◦ Input: noise; Output: denoised image
◦ Reverses the forward diffusion process
◦ The noise is sequentially removed until the original image is recreated
◦ Inference time
◦ images are generated by gradually reconstructing them starting from random
white noise.
◦ Training time
◦ The noise subtracted at each time step is estimated via a neural network,
typically based on a U-Net architecture
◦ allowing the spatial dimensions of the image to be preserved
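A minimal sketch of the two processes in PyTorch (assuming a standard DDPM formulation; eps_model is a hypothetical U-Net noise predictor, and the schedule tensors betas, alphas, alpha_bar are passed in):

import torch

def forward_noising(x0, t, alpha_bar):
    # forward diffusion: jump directly to step t of the noising process (t is an integer timestep)
    eps = torch.randn_like(x0)
    xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return xt, eps

@torch.no_grad()
def reverse_denoising(eps_model, shape, betas, alphas, alpha_bar):
    # backward denoising: start from pure Gaussian noise and sequentially remove the estimated noise
    x = torch.randn(shape)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, torch.tensor([t]))
        mean = (x - betas[t] / (1 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise   # sigma_t^2 = beta_t, one common choice
    return x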
3
Diffusion models
Applications and categories
◦ Applications
◦ Generative modeling tasks
◦ image generation
◦ image inpainting
◦ image editing
◦ image-to-image translation
◦ Discriminative tasks: using latent representation learned by diffusion models
◦ image segmentation
◦ classification
◦ anomaly detection
◦ Categories
◦ Denoising diffusion probabilistic models (DDPMs)
◦ Latent variable models that estimate the data distribution through a sequence of latent variables.
◦ DDPMs can be viewed as a special kind of variational auto-encoders
◦ Noise conditioned score networks (NCSNs)
◦ Based on training a shared neural network via score matching to estimate the score function of the perturbed data distribution at different noise levels.
◦ Stochastic differential equations (SDEs)
◦ Modeling diffusion via forward and reverse SDEs
4
Diffusion models
Applications
5
Denoising diffusion probabilistic models
Details
◦ Forward diffusion process
◦ Starting from a clean image x_0
◦ Adding noise following a Gaussian distribution at each step:
q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) x_{t−1}, β_t I);  q(x_t | x_0) = N(x_t; √(ᾱ_t) x_0, (1 − ᾱ_t) I), where α_t = 1 − β_t and ᾱ_t = ∏_{s≤t} α_s
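A short sketch of this forward process (assuming the linear beta schedule used in the original DDPM paper; both the single Markov step and the equivalent closed form are shown):

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)    # alpha_bar_t = prod_{s<=t} alpha_s

def q_step(x_prev, t):
    # single Markov step q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I)
    return (1 - betas[t]).sqrt() * x_prev + betas[t].sqrt() * torch.randn_like(x_prev)

def q_closed_form(x0, t):
    # closed form q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)
    return alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * torch.randn_like(x0)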
6
Diffusion models
Training objective
◦ Problem: training by directly maximizing the data likelihood p_θ(x_0) is intractable
◦ Solution: minimize a variational lower bound of the negative log-likelihood:
L_vlb = E_q[ −log p_θ(x_0 | x_1) + Σ_{t>1} D_KL(q(x_{t−1} | x_t, x_0) ‖ p_θ(x_{t−1} | x_t)) + D_KL(q(x_T | x_0) ‖ p(x_T)) ]
◦ the first term is the probability of x_0 conditioned on x_1; each KL term measures the difference between the estimated denoised image distribution and the true one
7
Diffusion models
Training Backward denoising process
◦ Loss function for training the backward denoising process
p_θ(x_{t−1} | x_t) = N(x_{t−1}; μ_θ(x_t, t), Σ_θ(x_t, t))
◦ fix Σ_θ(x_t, t) as a constant: Σ_θ(x_t, t) = σ_t² I
◦ the KL term then reduces to a weighted mean-squared error between the posterior mean and the predicted mean:
D_KL(q(x_{t−1} | x_t, x_0) ‖ p_θ(x_{t−1} | x_t)) = E_q[ (1 / (2σ_t²)) ‖μ̃_t(x_t, x_0) − μ_θ(x_t, t)‖² ] + C
◦ parameterize μ_θ(x_t, t) as (1 / √α_t) (x_t − (β_t / √(1 − ᾱ_t)) ε_θ(x_t, t))
◦ ε_θ is a non-linear function (the U-Net) predicting the added noise from x_t and t; training the mean estimator μ_θ to predict μ̃_t amounts to training ε_θ to predict ε at every step
◦ the loss becomes: E_{x_0, ε}[ (β_t² / (2σ_t² α_t (1 − ᾱ_t))) ‖ε − ε_θ(√(ᾱ_t) x_0 + √(1 − ᾱ_t) ε, t)‖² ]
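In practice the weighting factor is often dropped, giving the "simple" noise-prediction objective; a hedged sketch of one training step (eps_model is a hypothetical U-Net taking the noisy image and the timestep):

import torch
import torch.nn.functional as F

def ddpm_training_step(eps_model, x0, alpha_bar, optimizer):
    # sample a random timestep and noise, then regress the injected noise
    t = torch.randint(0, len(alpha_bar), (x0.shape[0],))
    eps = torch.randn_like(x0)
    abar = alpha_bar[t].view(-1, 1, 1, 1)
    xt = abar.sqrt() * x0 + (1 - abar).sqrt() * eps
    loss = F.mse_loss(eps_model(xt, t), eps)   # || eps - eps_theta(x_t, t) ||^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss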
8
Diffusion models
Training process
9
Diffusion models on
image-to-image translation
10
Vanilla conditional diffusion models
◦ Translate from source domain to target domain
◦ Intuition
◦ A source image can be provided as conditioning input to the reverse steps
◦ The reverse transition probability is modified to p_θ(x_{t−1} | x_t, y), where y is the source image
◦ training process
◦ maintain the forward process
◦ directly inject the condition into the training objective: E_{x_0, ε, t}[ ‖ε − ε_θ(x_t, y, t)‖² ]
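A sketch of how the condition could enter the objective (channel-wise concatenation of the source image is one common choice, not necessarily the one used on this slide; eps_model is hypothetical):

import torch
import torch.nn.functional as F

def conditional_denoising_loss(eps_model, x0_target, y_source, t, alpha_bar):
    # forward-diffuse only the target image; the clean source image conditions every reverse step
    eps = torch.randn_like(x0_target)
    abar = alpha_bar[t].view(-1, 1, 1, 1)
    xt = abar.sqrt() * x0_target + (1 - abar).sqrt() * eps
    # inject the condition by concatenating the source image to the noisy target along channels
    eps_hat = eps_model(torch.cat([xt, y_source], dim=1), t)
    return F.mse_loss(eps_hat, eps)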
11
UNIT-DDPM
overview
◦ Objective
◦ training a generative model to infer the joint distribution of images over both domains as a Markov chain
◦ by minimising a denoising score matching objective conditioned on the other domain
◦ Method
◦ update both domain translation models simultaneously
◦ generate target domain images by a denoising Markov Chain Monte Carlo approach that is conditioned on the input source-domain
images.
Sasaki, Hiroshi, Chris G. Willcocks, and Toby P. Breckon. "Unit-ddpm: Unpaired image translation with denoising diffusion probabilistic models." arXiv
preprint arXiv:2104.05358 (2021).
12
UNIT-DDPM
architecture
◦ Given
◦ images in source domain ; images in target domain
◦ Training objective
◦ optimise the reverse denoising process in both domains:
◦ optimise the translation function
◦ Method
◦ add constraint on
◦ vanilla denoising process:
◦ conditional denoising process
◦ similarly,
denoising loss: +
13
UNIT-DDPM
training domain translation functions
denoising loss: +
14
UNIT-DDPM
training domain translation functions
◦ cycle-consistency loss
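The exact UNIT-DDPM losses are omitted on these slides; the sketch below only mirrors the structure described in words (two conditional denoisers, two translation functions, denoising plus cycle-consistency terms), and all names are hypothetical:

import torch
import torch.nn.functional as F

def unit_ddpm_losses(eps_A, eps_B, g_AB, g_BA, xA, xB, t, alpha_bar):
    # translate each clean image into the other domain to serve as the condition
    xB_fake, xA_fake = g_AB(xA), g_BA(xB)
    abar = alpha_bar[t].view(-1, 1, 1, 1)
    epsA, epsB = torch.randn_like(xA), torch.randn_like(xB)
    xA_t = abar.sqrt() * xA + (1 - abar).sqrt() * epsA
    xB_t = abar.sqrt() * xB + (1 - abar).sqrt() * epsB
    # denoising losses, each conditioned on the translated image from the other domain
    l_denoise = (F.mse_loss(eps_A(xA_t, xB_fake, t), epsA)
                 + F.mse_loss(eps_B(xB_t, xA_fake, t), epsB))
    # cycle-consistency loss on the translation functions
    l_cycle = F.l1_loss(g_BA(xB_fake), xA) + F.l1_loss(g_AB(xA_fake), xB)
    return l_denoise, l_cycle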
15
UNIT-DDPM
overall training and inference processes
16
UNIT-DDPM
Results
17
Syn-Diff
Overview
◦ Adversarial diffusion modeling
◦ SynDiff leverages a conditional diffusion process that progressively maps noise and source images
onto the target image.
◦ For fast and accurate image sampling during inference, large diffusion steps are taken with
adversarial projections in the reverse diffusion direction
◦ To enable training on unpaired datasets, a cycle-consistent architecture is devised with coupled diffusive
and non-diffusive modules
◦ Evaluation
◦ MRI and MRI-CT translation
Özbey, Muzaffer, et al. "Unsupervised medical image translation with adversarial diffusion models." arXiv preprint
arXiv:2207.08208 (2022).
18
Syn-Diff
Overview
vanilla conditional diffusion models
Syn-Diff
19
Syn-Diff
Architecture
◦ Problem: when the step size k is large, the normality assumption for the
reverse transition probabilities does not hold
◦ Solution: leverage an adversarial projector that uses a
generator and a discriminator
◦ Generator
◦ produces an estimate of the clean target image given the noisy sample and the source image
◦ synthesizes the denoised sample from this estimate
◦ Discriminator
◦ distinguishes between the actual and the synthesized denoised samples
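A hedged sketch of one adversarial reverse jump of size k (module names are hypothetical and the actual SynDiff parameterization and losses are more involved):

import torch
import torch.nn.functional as F

def adversarial_reverse_step(G, D, x_t, y_source, t, k, alpha_bar):
    # the generator produces an estimate of the clean target image from the noisy sample and the source image
    x0_hat = G(x_t, y_source, t)
    # sample the large jump x_t -> x_{t-k} from the Gaussian posterior around that estimate (requires t >= k)
    ab_t, ab_s = alpha_bar[t], alpha_bar[t - k]
    a_ts = ab_t / ab_s
    mean = (ab_s.sqrt() * (1 - a_ts) * x0_hat + a_ts.sqrt() * (1 - ab_s) * x_t) / (1 - ab_t)
    var = (1 - ab_s) * (1 - a_ts) / (1 - ab_t)
    x_prev_fake = mean + var.sqrt() * torch.randn_like(x_t)
    # the discriminator distinguishes actual noisy samples x_{t-k} from synthesized ones
    d_fake = D(x_prev_fake, x_t, t)
    g_adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return x_prev_fake, g_adv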
20
Syn-Diff
Evaluation
21
Syn-Diff
Evaluation
22
CoLa-Diff
◦ diffusion-based multi-modality MRI synthesis model
◦ main ideas
◦ introduce brain region masks as priors on the density distribution to guide the diffusion process and better maintain the anatomical
structure
◦ employ auto-weight adaptation to better leverage multi-modal information
23
CoLa-Diff
evaluation
24
Brownian Bridge Diffusion Models
◦ Problem: Existing works treat image-to-image translation as conditional image generation
◦ since the conditioning input does not explicitly appear in the training objective, it is difficult to guarantee that the diffusion can finally reach the desired conditional
distribution.
◦ Approach:
◦ directly build the mapping between the input and the output domains through a Brownian Bridge stochastic process, rather
than a conditional generation process.
Li, Bo, et al. "BBDM: Image-to-image translation with Brownian bridge diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
25
Brownian Bridge Diffusion Models
theoretical foundation
◦ Brownian Bridge:
◦ a continuous-time stochastic model, where the probability distribution during the diffusion process is conditioned on the starting
and ending states.
◦ the state distribution at each time step of a Brownian bridge process starting from point x_0 at t = 0 and ending at point y at t = T is:
q(x_t | x_0, y) = N(x_t; (1 − t/T) x_0 + (t/T) y, (t(T − t)/T) I)
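A small sketch of sampling from that bridge marginal (x0 is the starting image at t = 0, y the ending image at t = T):

import torch

def brownian_bridge_sample(x0, y, t, T):
    # q(x_t | x0, y) = N((1 - t/T) x0 + (t/T) y, (t (T - t) / T) I)
    m_t = t / T
    mean = (1 - m_t) * x0 + m_t * y
    var = t * (T - t) / T
    return mean + (var ** 0.5) * torch.randn_like(x0)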
26
Brownian Bridge Diffusion Models
pipeline
◦ Given an image in the source domain
1. extract its latent feature using the encoder of a pre-trained VQGAN
2. use the proposed Brownian Bridge process to map it to the corresponding latent representation in the target domain
3. obtain the translated image using the decoder of the pre-trained VQGAN
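A sketch of these three steps (vqgan and bridge_reverse are hypothetical handles for the pre-trained VQGAN and the learned reverse Brownian Bridge process):

import torch

@torch.no_grad()
def bbdm_translate(vqgan, bridge_reverse, x_source):
    z_source = vqgan.encode(x_source)      # 1. latent feature from the pre-trained VQGAN encoder
    z_target = bridge_reverse(z_source)    # 2. map the latent via the reverse Brownian Bridge process
    return vqgan.decode(z_target)          # 3. decode the translated latent with the VQGAN decoder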
27
Brownian Bridge Diffusion Models
training process
28
Brownian Bridge Diffusion Models
training process
◦ Reverse process
◦ the reference image is only set as the start point of the reverse diffusion
◦ it will not be utilized as a conditional input in the prediction network at each step
◦ Training objective
Recall of UNIT-DDPM
vanilla denoising process:
conditional denoising process
29
Brownian Bridge Diffusion Models
evaluation
30
Energy-Guided Stochastic Differential Equations
Problem and motivation
◦ To learn a mapping from image domain A to B in an unpaired manner by using score-based
diffusion models.
◦ Existing methods totally ignore the training data in the source domain, leading to sub-optimal
solutions for unpaired image-to-image translation.
◦ To employ an energy function pretrained on both the source and target domains to guide
the inference process of a pretrained SDE for realistic and faithful unpaired image-to-image translation.
EGSDE: Unpaired Image-to-Image Translation via Energy-Guided Stochastic Differential Equations (NIPS, 2022)
31
Score-based Diffusion Models
overview
32
Score-based Diffusion Models
◦ SBDMs in Unpaired Image to Image Translation
◦ The translated image should be realistic for the target domain and faithful for the source image.
◦ Related works:
◦ ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (Proceedings of the IEEE/CVF
International Conference on Computer Vision, 2021)
◦ SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations (International Conference
on Learning Representations, 2021).
◦ Existing works generated samples in the target domain from the backward SDE for realism.
◦ For faithfulness, they refined the samples by adding information extracted from the noisy source image of
the forward SDE, or started generation from the noisy source image of the forward SDE.
◦ These works did not leverage the training data in the source domain
33
Energy-Guided Stochastic Differential Equations
◦ Approach:
◦ Employs an energy function pre-trained across the two domains to guide the inference process of a pretrained SDE for both
realism and faithfulness.
◦ Defines a valid conditional distribution by compositing a pretrained SDE and a pretrained energy function:
◦ The energy function is designed to encourage the sample to retain the domain-independent features and discard the domain-
specific ones to improve both the faithfulness and realism.
◦ Method:
◦ Design the energy function:
◦ The domain-independent features should be preserved while the domain-specific features should be changed accordingly.
The energy function:
34
Energy-Guided Stochastic Differential Equations
◦ Method
◦ Design the energy function:
◦ The cosine similarity between the domain-specific features extracted from the generated sample and the source image:
◦ The negative squared L2 distance between the domain-independent features extracted from the generated sample and source image:
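A hedged sketch of these two terms written as an energy to be minimized (the feature extractors f_specific, f_independent and the weights are hypothetical placeholders):

import torch
import torch.nn.functional as F

def egsde_energy(x_t, y_t, t, f_specific, f_independent, lam_s=1.0, lam_i=1.0):
    # realism term: cosine similarity between domain-specific features of the sample and the
    # perturbed source image -- minimizing it encourages discarding domain-specific content
    s = F.cosine_similarity(f_specific(x_t, t).flatten(1), f_specific(y_t, t).flatten(1), dim=1)
    # faithfulness term: squared L2 distance between domain-independent features --
    # minimizing it encourages preserving domain-independent content
    d = (f_independent(x_t, t) - f_independent(y_t, t)).flatten(1).pow(2).sum(dim=1)
    return lam_s * s + lam_i * d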
35
Energy-Guided Stochastic Differential Equations
◦ Method
◦ Design the energy function:
Because of poor performance in experiments, they consider a simpler energy function:
36
Energy-Guided Stochastic Differential Equations
◦ Backbone:
37
Energy-Guided Stochastic Differential Equations
◦ Backbone:
38
Energy-Guided Stochastic Differential Equations
◦ Algorithmic flow
39
Energy-Guided Stochastic Differential Equations
evaluation
◦ Experiments with the AFHQ dataset (IEEE/CVF CVPR, 2020):
40
Energy-Guided Stochastic Differential Equations
evaluation
◦ Experiments with 108 dataset
41
GAN-based Image2Image translation
42
Uncertainty-guided Progressive GANs
◦ Problem and motivation
◦ To learn a mapping from image domain A to B in a paired manner
◦ The generation of high-quality images remains unguided without specifically attending to poorly translated regions
◦ To exploit the correlation between estimated uncertainty and prediction error.
◦ Approach
◦ Model the underlying per-pixel residual distribution as a zero-mean generalized Gaussian distribution (GGD).
◦ The network learns to predict the optimal scale (α) and shape (β) of the GGD for every pixel, then estimates the output image along
with the aleatoric uncertainty at each generation phase (see the sketch below).
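A sketch of the per-pixel negative log-likelihood such a network could be trained with, assuming the residual y − ŷ follows a zero-mean GGD with predicted scale α and shape β maps:

import math
import torch

def ggd_nll(y_true, y_pred, alpha, beta, eps=1e-6):
    # zero-mean GGD: p(r) = beta / (2 alpha Gamma(1/beta)) * exp(-(|r| / alpha)^beta)
    alpha, beta = alpha.clamp(min=eps), beta.clamp(min=eps)
    r = (y_true - y_pred).abs()
    nll = (r / alpha).pow(beta) - torch.log(beta) + torch.log(alpha) \
          + torch.lgamma(1.0 / beta) + math.log(2.0)
    return nll.mean()   # large predicted alpha / heavy tails flag uncertain, poorly translated pixels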
43
Uncertainty-guided Progressive GANs
◦ Model structure
◦ The framework is composed of a sequence of GANs; each GAN consists of a generator-discriminator pair
◦ All the discriminators are patch discriminators
◦ Generators are modified U-Nets, where the head is split into three to estimate the parameters of the GGD
44
Uncertainty-guided Progressive GANs
evaluation
◦ Experiments in PET to CT translation
◦ Comparison benchmarks
◦ Pix2Pix: Image-to-Image Translation with Conditional Adversarial Networks (IEEE CVPR, 2017)
◦ PAN: Perceptual Adversarial Networks for Image-to-Image Transformation (IEEE TIP, 2018)
◦ MedGAN: Medical Image Translation using GANs (Computerized Medical Imaging and Graphics, 2020)
45
Uncertainty-guided Progressive GANs
evaluation
◦ Experiments with 108 dataset
46