
Published in: Computer Graphics Forum (author's version)

Eurographics Symposium on Rendering 2024
E. Garces and E. Haines (Guest Editors)

A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis
Y. Poirier-Ginter1,2, A. Gauthier1, J. Philip3, J.-F. Lalonde2 and G. Drettakis1
1 Inria, Université Côte d'Azur, France; 2 Université Laval, Canada; 3 Adobe Research, United Kingdom

Figure 1: Our method produces relightable radiance fields directly from a single-illumination multi-view dataset, by using priors from generative data in place of an actual multi-illumination capture.

Abstract
Relighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single illumination condition; it is especially hard for full scenes containing multiple objects. We introduce a method to create
relightable radiance fields using such single-illumination data by exploiting priors extracted from 2D image diffusion models.
We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by light direction, allowing us to augment a
single-illumination capture into a realistic – but possibly inconsistent – multi-illumination dataset from directly defined light
directions. We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct
control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on
light direction. To enforce multi-view consistency and overcome inaccuracies we optimize a per-image auxiliary feature vector.
We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully
exploits 2D diffusion model priors to allow realistic 3D relighting for complete scenes.
Keywords: NeRF, Radiance Field, Relighting

1. Introduction

Radiance fields have recently revolutionized 3D scene capture from images [MST∗20]. Such captures typically involve a multi-view set of photographs taken under the same lighting conditions. Relighting such radiance fields is hard since lighting and material properties are entangled (e.g., is this a shadow or simply a darker color?) and the inverse problem ill-posed.

One approach to overcome this difficulty is to capture a multi-illumination dataset which better conditions the inverse problem but comes at the cost of a heavy capture setup [DHT∗00]. Another option is to use priors, which is typically done by training a neural network on synthetic data to predict intrinsic properties or relit images. However, creating sufficiently large, varied and photorealistic 3D scenes is both challenging and time-consuming. As such, methods relying on these—or simpler—priors often demonstrate




results on isolated masked objects [BBJ∗21], or make simplifying assumptions such as distant environment lighting [BJB∗21, ZSD∗21]. Other methods have handled more complex illumination models, including full scenes [PMGD21, PGZ∗19], but can be limited in the complexity of the geometry and materials that must be reconstructed well. Finally, methods that depend on accurate estimates of surface normals [JLX∗23, GGL∗23] often produce limited levels of realism when relighting.

At the other end of the spectrum, diffusion models (DMs, e.g., [RBL∗22]), trained on billions of natural images, have shown exceptional abilities to capture real image distribution priors and can synthesize complex lighting effects. While recent progress shows they can be controlled in various ways [ZRA23], extracting lighting-specific priors from these models, especially for full 3D scenes, has not yet been demonstrated.

In this paper, we build on these observations and present a new method that demonstrates that it is possible to create relightable radiance fields for complete scenes from single low-frequency lighting condition captures by exploiting 2D diffusion model priors. We first propose to fine-tune a pre-trained DM conditioned on the dominant light source direction. For this, we leverage a dataset of images with many lighting conditions of the same scene [MGAD19], which enables the DM to produce relit versions of an image with explicit control over the dominant lighting direction. We use this 2D relighting network to augment any standard multi-view dataset taken under single lighting by generating multiple relit versions of each image, effectively transforming it into a multi-illumination dataset. Given this augmented dataset, we train a new relightable radiance field with direct control on lighting direction, which in turn enables realistic interactive relighting of full scenes with lighting and camera view control in real time for low-frequency lighting. We build on 3D Gaussian Splatting [KKLD23], enhancing the radiance field with a small Multi-Layer Perceptron and an auxiliary feature vector to account for the approximate nature of the generated lightings and to handle lighting inconsistencies between views.

In summary, our contributions are:
• A new 2D relighting neural network with direct control on lighting direction, created by fine-tuning a DM with multi-lighting data.
• A method to augment a single-lighting multi-view capture to an approximate multi-lighting dataset, by exploiting the 2D relighting network.
• An interactive relightable radiance field that provides direct control on lighting direction, and corrects for inconsistencies in the neural relighting.

We demonstrate our solution on synthetic and real indoor scenes, showing that it provides realistic relighting of multi-view datasets captured under a single lighting condition in real time.

2. Related Work

Our method proposes a relightable radiance field. We review work on radiance fields and their relightable variants, and discuss diffusion models and the fine-tuning methods we build on.

2.1. Radiance Fields

Radiance field methods have revolutionized 3D scene capture using multi-view datasets (photos or video) as input. In particular, Neural Radiance Fields (NeRFs) [MST∗20] learn to synthesize novel views of a given scene by regressing its radiance from a set of input images (multiple photos or videos of a 3D scene). Structure from motion [Ull79, SF16] is used to estimate the camera poses for all images, and rays are cast through the center of all pixels. A multi-layer perceptron (MLP) c_θ parameterized by 3D position and view direction is used to represent the radiance and opacity of the scene. The optimization objective is simply the mean squared error:

L_NeRF = E_{o,d,c∗}[ ||c_θ(o, d) − c∗||₂² ],

where o is a ray's origin, d its direction, and c∗ the target RGB color value of its corresponding pixel. The predicted color for that pixel is obtained by integrating a color field c_θ weighted by a density field σ_θ following the equation of volume rendering. The original NeRF was slow to train and to render; a vast number of methods [TTM∗22] have been proposed to improve the original technique, e.g., acceleration structures [MESK22], antialiasing [BMT∗21], handling larger scenes [BMV∗22], etc. Recently, 3D Gaussian Splatting (3DGS) [KKLD23] introduced an explicit, primitive-based representation of radiance fields. The anisotropic nature of the 3D Gaussians allows the efficient representation of fine detail, and the fast GPU-accelerated rasterization used allows real-time rendering. We use 3DGS to represent radiance fields mainly for performance, but any other radiance representation, e.g., [CXG∗22], could be used instead. Radiance fields are most commonly used in the context of single-light condition captures, i.e., the images are all captured under the same lighting. As a result, there is no direct way to change the lighting of captured scenes, severely restricting the utility of radiance fields compared to traditional 3D graphics assets. Our method uses diffusion models to simulate multi-light conditions from a single-light capture, thus allowing the relighting of radiance fields.

2.2. Single Image Relighting

Single image relighting approaches have mostly been restricted to human faces, with the most recent methods using generative priors [WZL∗08, SYH∗17, SKCJ18, FRV∗23, PTS23, PLMZ23]. Recently, human body relighting has also been studied [KE19, LSY∗21], as their structure also allows for the design of effective priors. Because they are much less constrained, relighting generic scenes from a single image is a much harder problem that has eluded researchers until recently. While some approaches have focused on specific relighting effects such as shadows [LLZ∗20, SLZ∗22, SZB21, VZG∗23], they are applicable solely for the purpose of object compositing. Full scene relighting has been explored by Murmann et al. [MGAD19], who present a dataset of real indoor scenes lit by multiple lighting directions. They show that training a U-net on their dataset allows for full scene relighting—in this work, we also leverage their dataset but train a more powerful ControlNet [ZRA23] for the relighting task. Other works include [TÇE∗21], who focus on sky relighting, and [LSB∗22, ZLZ∗22], which leverage image-to-image translation. Methods for outdoor scenes have also been proposed [YME∗20, LGZ∗20, YS22]. Of note, SIMBAR [ZTS∗22] and

OutCast [GRP22] produce realistic, user-controllable, hard cast shadows from the sun. In contrast, we focus on indoor scenes, which often exhibit soft shadows and more complex lighting effects. Finally, the concurrent work of Zeng et al. [ZDP∗24] uses diffusion models to relight isolated objects using environment maps. In contrast to these solutions, we focus on cluttered indoor scenes.

2.3. Multi-view Relighting

While single-view methods produce good results on restricted datasets such as faces, they are often limited by the lack of accurate geometry, required to simulate light transport. To this point, multi-view data can provide a more accurate and complete geometric reconstruction. For example, Philip et al. [PMGD21, PGZ∗19] build on multi-view stereo (MVS) reconstruction of the scene, and learn a prior from synthetic data rendered under multiple lighting conditions. Despite correcting for many of the reconstruction artifacts, these methods are restricted by the quality of MVS reconstruction. Nimier et al. [NDDJK21] also present a scene-scale solution but require a complex pipeline that optimizes in texture space. Gao et al. [GCD∗20] use a rough proxy and neural textures to allow object relighting.

More recently, radiance fields have also been used as a geometric representation for relighting. Most methods work on the simple case of a single isolated object, while we target larger scenes. Such methods typically assume lighting to be distant, often provided as an environment map. NeRFactor [ZSD∗21] uses a Bi-Directional Reflectance Distribution Function (BRDF) prior from measurements and estimates normals and visibility, while NERV [SDZ∗21] and Zhang et al. [ZSH∗22] predict visibility to allow indirect illumination estimation. NERD [BBJ∗21], PhySG [ZLW∗21], and DE-NeRF [WSLG23] use efficient physically-based material and lighting models to decompose a radiance field into spatially varying BRDFs, while Neural-PIL [BJB∗21] learns the illumination integration and low-dimensional BRDF priors using auto-encoders. TensorIR [JLX∗23] uses a mixed radiance and physics-based formulation to recover intrinsic properties. NeRO [LWL∗23] focuses on specular objects, showing very promising results. Relightable Gaussians [GGL∗23] use the more recent 3D Gaussian representation along with ray-tracing to estimate properties of objects. GS-IR [LZF∗24], GaussianShader [JTL∗24] and GIR [SWW∗23] also build on 3D Gaussian splatting, proposing different approaches to estimate more reliable normals while approximating visibility and indirect illumination; these work well for isolated objects under distant lighting. However, these methods struggle with more complex scene-scale input and near-field illumination, but can work on or be adapted to both single and multi-illumination input data.

Feeding multi-view multi-illumination data to a relightable radiance field indeed enables better relighting, but at the cost of highly-controlled capture conditions [XZC∗23, ZCD∗23, TDMS∗23] or an extended dataset of unconstrained illuminations [BEK∗22, LGF∗22]. In our method, we use a Diffusion Model to simulate multi-illumination data, lifting the capture constraints while benefiting from the lighting variations. Another body of work [MHS∗22, HHM22, YZL∗22, ZYL∗23, LWC∗23, LCL∗23] achieves object or scene relighting from multi-view images by extracting traditional 3D assets (meshes and SVBRDFs) and applying physically-based rendering algorithms. IBL-NeRF [CKK23] allows for scene-scale material estimation but bakes the illumination into a prefiltered light-field, which prevents relighting. Recently, NeRF-OSR [RES∗22], I²-SDF [ZHY∗23], and Wang et al. [WSG∗23] focused on scene-scale, single-illumination scene relighting using both implicit and explicit representations. While they can achieve reasonable results, they often lack overall realism, exhibiting bumpy or overly smooth shading during relighting. In contrast, our use of diffusion priors provides realistic-looking output.

2.4. Diffusion Models

Diffusion Models (DMs) [SDWMG15, HJA20] made it possible to train generative models on diverse, high-resolution datasets of billions of images. These models learn to invert a forward diffusion process that gradually transforms images into isotropic Gaussian noise by adding random Gaussian noise ϵ_t ∼ N(0, I) to an image in T steps. DMs train a neural network g_φ with parameters φ to denoise with the objective:

L_Diffusion = E_{x,ϵ,1≤t≤T}[ ||g_φ(x_t | t) − y_t||₂² ],

in which the target y_t is often set to ϵ. After training, sampling can be performed step by step, by predicting x_{t−1} from x_t for each timestep t, which is expensive since T can be high (e.g., 1000); faster alternatives include deterministic DDIM [SME20] sampling, which can perform sampling of comparable quality with fewer steps (i.e., 10-50× larger steps). Stable Diffusion [RBL∗22] performs denoising in a lower-dimensional latent space, by first training a variational encoder to compress images; for instance, in Stable Diffusion XL [PEL∗23], images are mapped to a latent space of size R^{128×128×4}. In a pre-pass, the dataset is compressed using this autoencoder, and a text-conditioned diffusion model is then trained directly in this latent space.

Diffusion models have an impressive capacity to synthesize highly realistic images, typically conditioned on text prompts. The power of DMs lies in the fact that the billions of images used for training contain an extremely rich representation of the visual world. However, extracting the required information for specific tasks without incurring the (unrealistic) cost of retraining DMs is not straightforward. A set of recent methods shows that it is possible to fine-tune DMs with a typically much shorter training process to perform specific tasks (e.g., [GAA∗23, RLJ∗23]). A notable example is ControlNet [ZRA23], which proposed an efficient method for fine-tuning Stable Diffusion with added conditioning. In particular, they demonstrated conditional generation from depth, Canny edges, etc., with and without text prompts; we build on this solution for our 2D relighting method.

In a similar spirit, there has been significant evidence in recent years that the latent spaces of generative models encode material information [BMHF23, BF24]. Recent work shows the potential to fine-tune DMs to allow direct material editing [SJL∗23]. Nonetheless, we are unaware of published methods that use DM fine-tuning to perform realistic relighting of full and cluttered scenes.
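For concreteness, the forward process and the ε target described above can be sketched in a few lines of self-contained Python; the linear β schedule below is a generic textbook choice, not the schedule of any particular released model:

```python
import math
import random

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar[t] = prod_{s<=t}(1 - beta_s): the fraction of signal
    variance remaining after t noising steps (decays from ~1 to ~0)."""
    alpha_bar, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar

def forward_noise(x0, t, alpha_bar, rng):
    """x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; returns x_t
    and the noise eps, which is the usual training target y_t."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]
    return xt, eps

rng = random.Random(0)
abar = make_alpha_bar()
x0 = [0.5, -0.2, 0.8]                  # toy three-pixel "image"
xt, eps = forward_noise(x0, 999, abar, rng)
# At t = 999, abar is tiny, so x_t is essentially pure noise.
```

A network g_φ trained to recover eps from x_t can then be sampled step by step (or with fewer DDIM steps) to invert the process.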

Figure 2: We use the single-view, multi-illumination dataset of Murmann et al. [MGAD19] to train ControlNet [ZRA23] on single-view supervised relighting. The network accepts an image (along with its estimated depth map) and a target light direction as input, and produces a relit version of the same scene under the desired target lighting.
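The inputs named in this caption can be summarized as a small sketch; the class and field names below are illustrative placeholders, not the authors' code:

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class RelightingInput:
    """Conditioning of the relighting network as described in Fig. 2."""
    image: List[List[Vec3]]    # H x W RGB input image
    depth: List[List[float]]   # H x W estimated depth map
    light_dir: Vec3            # target dominant light direction

def normalized(v: Vec3) -> Vec3:
    """Light directions carry no magnitude, so keep them unit length."""
    n = (v[0] ** 2 + v[1] ** 2 + v[2] ** 2) ** 0.5
    return (v[0] / n, v[1] / n, v[2] / n)

req = RelightingInput(image=[[(0.5, 0.5, 0.5)]],
                      depth=[[1.0]],
                      light_dir=normalized((1.0, 0.0, 1.0)))
```

The network's output is then an image of the same scene lit from `light_dir`; the depth map is only a conditioning signal.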

3. Method
Our method is composed of three main parts. First, we create a 2D relighting neural network with direct control of lighting direction (Sec. 3.1). Second, we use this network to augment a multi-view capture with single lighting into a multi-lighting dataset; the resulting dataset can be used to create a radiance field representation of the 3D scene (Sec. 3.2). Finally, we create a relightable radiance field that accounts for inaccuracies in the synthesized relit input images and provides a multi-view consistent lighting solution (Sec. 3.3).

Figure 3: Top row: five diffuse spheres rendered with our optimized lighting direction and shading parameters — the direction is indicated by a blue dot at the point of maximum specular intensity; Bottom row: the corresponding target gray spheres obtained by averaging the diffuse spheres captured in all scenes. We found the lighting directions by minimizing the L1 distance between the top and bottom rows.

3.1. Single-View Relighting with 2D Diffusion Priors

Relighting a scene captured under a single lighting condition is severely underconstrained, given the lighting/material ambiguity, and thus requires priors about how appearance changes with illumination. Arguably, large DMs must internally encode such priors since they can generate realistic complex lighting effects, but existing architectures do not allow for explicit control over lighting.
We propose to provide explicit control over lighting by fine-tuning a pre-trained Stable Diffusion (SD) [RBL∗22] model using ControlNet [ZRA23] on a multi-illumination dataset. As illustrated in Fig. 2, the ControlNet accepts as input an image as well as a target light direction, and produces a relit version of the same scene under the desired lighting. To train the ControlNet, we leverage the dataset of Murmann et al. [MGAD19], which contains N = 1015 real indoor scenes captured from a single viewpoint, each lit under M = 25 different, controlled lighting directions. We only keep the 18 non-front-facing light directions.

3.1.1. Lighting Direction

To capture the scenes using similar light directions, Murmann et al. relied on a camera-mounted directional flash controlled by a servo motor. A pair of diffuse and metallic spheres is also visible in each scene; we leverage the former to obtain the effective lighting directions. Using as target the average of all diffuse spheres produced by the same flash direction, we find the lighting direction l ∈ R³ which best reproduces this target when rendering a gray ball with a simplistic Phong shading model. More specifically, we minimize the L1 error when jointly optimizing for an ambient light term and shading parameters (albedo, specular intensity and hardness, as well as a Fresnel coefficient). Fig. 3 illustrates this process.

3.1.2. Controlling Relighting Diffusion

We train ControlNet to predict relit versions of the input image by conditioning it on a target lighting direction. Let us denote X a set of images of a given scene in the multi-light dataset of Murmann et al. [MGAD19], where each image X_k ∈ X has an associated light direction l_k. Our approach, illustrated in Fig. 2, trains on pairs of lighting directions of the same scene (including the identity pair). The denoising objective becomes

L_2D = E_{ϵ,X,t,i,j}[ ||g_ψ(X_{t,i}; t, X_j, D_j, l_i) − y_t||₂² ],    (1)

where X_{t,i} is the noisy image at timestep t ∈ [1, T], where i, j ∈ [1, M], and where ψ are the ControlNet optimizable parameters only. X_j is another image from the set and D_j is its depth map (obtained with the approach of Ke et al. [KOH∗24]); both are given as input to the ControlNet subnetwork. In short, the network is trained to denoise input image X_i given its light direction l_i while conditioned on the image X_j corresponding to another lighting direction l_j of the same scene. Here, we do not use text conditioning: the empty text string is provided to the network.

Specifically, the light direction l_i is encoded using the first 4 bands of spherical harmonics, following the method of Müller et al. [MESK22]. The resulting vector is added to the timestep embedding prior to feeding it to the layers of ControlNet's trainable copy.

Figure 5: Importance of conserving edge sharpness when relighting. Left: input image. Middle: naive ControlNet relighting; note how the edges do not match the input and how the text is illegible. Right: our final relighting after fine-tuning the conditional decoder network from [ZFC∗23].
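For illustration, the 16-coefficient direction encoding (SH bands 0-3) could look as follows; the constants use one common real-SH convention from graphics code, and the authors' exact sign/ordering convention is not specified in the text:

```python
def sh16(d):
    """First 4 real spherical-harmonic bands (1 + 3 + 5 + 7 = 16
    coefficients) evaluated at unit direction d = (x, y, z)."""
    x, y, z = d
    xx, yy, zz = x * x, y * y, z * z
    return [
        0.28209479177387814,                        # band 0 (constant)
        -0.48860251190291987 * y,                   # band 1
        0.48860251190291987 * z,
        -0.48860251190291987 * x,
        1.0925484305920792 * x * y,                 # band 2
        -1.0925484305920792 * y * z,
        0.31539156525252005 * (2.0 * zz - xx - yy),
        -1.0925484305920792 * x * z,
        0.5462742152960396 * (xx - yy),
        -0.5900435899266435 * y * (3.0 * xx - yy),  # band 3
        2.890611442640554 * x * y * z,
        -0.4570457994644658 * y * (4.0 * zz - xx - yy),
        0.3731763325901154 * z * (2.0 * zz - 3.0 * xx - 3.0 * yy),
        -0.4570457994644658 * x * (4.0 * zz - xx - yy),
        1.445305721320277 * z * (xx - yy),
        -0.5900435899266435 * x * (xx - 3.0 * yy),
    ]

# A light pointing along +z, encoded before being added to the timestep
# embedding (any learned projection to the embedding width is omitted).
enc = sh16((0.0, 0.0, 1.0))
```

Because the encoding is smooth in the direction, nearby light directions receive nearby conditioning vectors, which suits the low-frequency relighting targeted here.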
3.1.3. Improving the Diffusion Quality

Since ControlNet was not specifically designed for relighting, adapting it naively as described above leads to inaccurate colors and a loss in contrast (see Fig. 4), as well as distorted edges (see Fig. 5). These errors also degrade multi-view consistency.

We adopt two strategies to improve coloration and contrast. First, we follow the recommendations of [LLLY23] to improve image brightness—we found them to also help for color. In particular, using the "v-parameterized" objective y_t = √ᾱ_t · ϵ − √(1 − ᾱ_t) · x, instead of the more usual y_t = ϵ, proved critical; in this equation, 1 − ᾱ_t gives the variance of the noise at timestep t. Second, after sampling, we color-match predictions to the input image to compensate for the difference between the color distribution of the training data and that of the scene. This is done by subtracting the per-channel mean and dividing by the standard deviation for the prediction, then adding the mean and standard deviation of the input, in the LAB colorspace. These statistics are computed over all 18 lighting conditions together (i.e., the mean over all lighting directions) to conserve relative brightness across all conditions. Fig. 4 shows the effect of these changes; without them, the bottle is blue instead of green and overall contrast is poor.

Figure 4: Importance of post-relighting color and contrast adjustments. Left: input image. Middle: naive ControlNet relighting; the bottle has the wrong color and the contrast is poor. Right: our relighting after training with [LLLY23] and after color-matching the input.

To correct the distorted edges, we adapt the asymmetric autoencoder approach of Zhu et al. [ZFC∗23], which consists in conditioning the latent space decoder on the (masked) input image for the inpainting task. In our case, we ignore the masking and fine-tune the decoder on the multi-illumination dataset [MGAD19]. At each fine-tuning step, we encode an image and condition the decoder on an image from the same scene with another random lighting direction. The decoder is fine-tuned with the Adam optimizer at a learning rate of 10⁻⁴ for 20k steps when training at resolution 768 × 512, and for 50k steps at resolution 1536 × 1024. Note that this step is independent of the ControlNet training. Fig. 5 shows the effect of these changes; note how the edges are wobbly and the text is illegible without them.

Example relighting results obtained using our 2D relighting network on images outside of the dataset are shown in Fig. 6. Observe how the relit images produced by our method are highly realistic and light directions are consistently reproduced across scenes. A naive solution for radiance field relighting would be to apply this 2D network to each synthesized novel view. However, the ControlNet is not multi-view consistent, and such a naive solution results in significant flickering. Please see the accompanying video for a clear illustration.

3.2. Augmenting Multi-View/Single-Lighting Datasets

Given a multi-view set I of images of a scene captured under the same lighting (suitable for training a radiance field model), we now leverage our light-conditioned ControlNet model to synthetically relight each image in I. We assume the 3D pose of each image I_i ∈ I is known a priori, for example via Colmap [SF16, SZPF16]. We then simply relight each I_i ∈ I to the corresponding 18 known light directions in the dataset from Murmann et al. [MGAD19] (excluding the directions where the flash points forward; see Sec. 3.1). We now have a full multi-lighting, multi-view dataset. This process is illustrated in Fig. 7.

3.3. Training a Lighting-Consistent Radiance Field

Given the generated multi-light, multi-view dataset, we now describe our solution to provide a relightable radiance field. In particular, we build on the 3DGS framework of Kerbl et al. [KKLD23]. Our requirements are twofold: first, define an augmented radiance field that can represent lighting conditions from different lighting directions; second, allow direct control of the lighting direction used for relighting.

The original 3DGS [KKLD23] radiance field uses spherical harmonics (SH) to represent view-dependent illumination. To encode varying illumination, we replace the SH coefficients with a 3-layer MLP c_θ of width 128 which takes as input the light direction along


Figure 6: Relighting results with our light-conditioned ControlNet. From a single input image (left column), the ControlNet can generate
realistic relit versions for different target light directions (other columns). Please notice realistic changes in highlights for different light
directions (top row), as well as the synthesis of cast shadows (bottom row).

Figure 7: Given a multi-view, single-illumination dataset we use our relighting ControlNet to generate a multi-view, multi-illumination dataset.
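The augmentation in Fig. 7 amounts to a simple loop over views and target light directions. In this sketch, `relight` and `estimate_depth` are placeholders for the fine-tuned ControlNet and the monocular depth estimator; they are not a real API:

```python
def augment(views, light_dirs, relight, estimate_depth):
    """Relight every input view toward every target light direction,
    turning a single-illumination capture into an approximate
    multi-illumination dataset."""
    dataset = []
    for view in views:
        depth = estimate_depth(view)          # conditioning signal
        for l in light_dirs:
            dataset.append((relight(view, depth, l), l))
    return dataset

# Toy run with stand-ins: V views x 18 rear-facing directions.
views = ["view%02d" % i for i in range(3)]
dirs = ["dir%02d" % i for i in range(18)]
out = augment(views, dirs,
              relight=lambda v, d, l: (v, l),   # dummy "relit image"
              estimate_depth=lambda v: None)
```

Each relit image keeps its original camera pose, so the augmented set can directly train the radiance field described next; inconsistencies between the generated lightings are handled there.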

with the viewing direction. Both vectors have a size of 16 after encoding.

Since light directions are computed with respect to a local camera reference frame (c.f. Sec. 3.1), we subsequently register them to the world coordinate system (obtained from Colmap) by rotating them according to their (known) camera rotation parameters:

l′ = R_i l,    (2)

where R_i is the 3 × 3 camera-to-world rotation matrix of image I_i from its known pose.

We condition the MLP with the spherical harmonics encoding of the globally consistent lighting direction l′, which enables training a 3DGS representation on our multi-lighting dataset. While this strategy works well for static images, it results in inconsistent lighting across views despite accounting for camera rotation in Eq. 2. Radiance fields like 3DGS rely on multi-view consistency, and breaking it introduces additional floaters and holes in surfaces.

To allow the neural network to account for this inconsistency and correct accordingly, we optimize a per-image auxiliary latent vector a of size 128. Similar approaches for variable appearance have been used for NeRFs [MBRS∗21]. Therefore, in addition to the lighting direction l′, we condition the MLP with per-view auxiliary parameters a:

c(o, d) = Σ_{g=1}^{G} w_g c_θ(x_g, d | a_v, l′),    (3)

where g ∈ [1, G] sums over the G gaussians (see [KKLD23]), x_g/w_g are their features/weights, d is the view direction, o the ray origin, and c is the predicted pixel color. Note that for novel views at

inference, we use as latent vector the mean of all training view latents, i.e. E_v[a_v].

We first train 3DGS with the unlit images as a "warmup" stage for 5K iterations, then train the full multi-illumination solution for another 25K iterations, using all 18 back-facing light directions (see Sec. 3.1). The multi-illumination nature of the training results in an increase in "floaters". As observed by Philip and Deschaintre [PD23], floaters are often present close to the input cameras; the explicit nature of 3DGS allows us to reduce these effectively. In particular, we calculate a znear value for each camera by taking the z value of the 1st percentile of the nearest SfM points and scaling this value down by 0.9. During training, at each step, all gaussian primitives that project within the view frustum of a camera but are located in front of its znear plane are culled. Finally, given the complexity of modeling variable lighting, we observed that the optimization sometimes converges to blurry results. To counter this, we overweight three front-facing views (left, right, and center), by optimizing for one of these views every three iterations. This provides a marginal improvement in results; all images shown are computed with this method, but it is optional.

Figure 8: Overview of our radiance field training scheme. To alleviate potential inconsistencies in lighting directions, we condition our 3DGS-based radiance field both on the illumination direction encoding and on optimized auxiliary vectors (one per training image). These vectors model the differences between predictions and let us fit each view to convergence.

The full method for relightable radiance fields is shown in Fig. 8. At inference, we can directly choose a lighting direction, and use efficient 3DGS rendering for interactive updates with modified light-

use Stable Diffusion [RBL∗22] v2.1 as a backbone. Our source code and datasets will be released upon publication.

We first present the results of our 3D relightable radiance field, both for synthetic and real-world scenes. We then present a quantitative and qualitative evaluation of our method by comparing it to previous work, and finally present an ablation of the auxiliary vector a from Sec. 3.3.

4.1. Test Datasets

Since there are no real multi-view multi-illumination indoor datasets of full scenes available for our evaluation, we use synthetic scenes to allow quantitative evaluation. For this purpose, we designed 4 synthetic test scenes (KITCHEN, LIVINGROOM, OFFICE, BEDROOM). They were created in Blender by downloading artist-made 3D rooms from Evermotion and modifying them to increase clutter: in each room, we gathered objects and placed them on a table or a countertop. We also created simpler, diffuse-only versions to evaluate how scene clutter affects the relighting results. For each synthetic scene, we first built a standard multi-view (single-lighting) dataset consisting of 4 camera sweeps (left-to-right, at varying elevations) of 50 frames for training, and one (at a different elevation) of 100 frames for testing. We simulated the light direction of the 2D training dataset with a spotlight of intensity 2 kW and radius 0.1, located on top of the camera and pointing away from it. We used the same set of camera flash directions as in the dataset of Murmann et al. [MGAD19]. We then render all frames at 736 × 512 using the Cycles path tracer. Please note that the effective lighting direction will be dependent on the exact configuration of the room. This configuration is our best effort to produce a ground truth usable for comparison.

In addition, we also captured a set of real scenes (KETTLE, HOT PLATES, PAINT GUN, CHEST OF DRAWERS and GARAGE WALL), for which we performed a standard radiance-field multi-view capture, taking between 90–150 images of the environment in an approximate sphere (or hemisphere) around the scene center of interest.

4.2. 3D Relighting Results

We begin by showing qualitative results on the set of real scenes that we captured. Here, we used a resolution of 1536 × 1024, training for 150K iterations. We show qualitative results for these scenes using our 3D relightable radiance field in Fig. 9. In addition, we also show results for two scenes from the MipNeRF360 dataset, namely COUNTER and ROOM.

As our method is lightweight and only adds a small MLP over the core 3DGS architecture, it runs interactively for both novel view synthesis and relighting at 30fps on an A6000 GPU. Memory usage is comparable to the original 3DGS. Please see the video for interac-
ing. Our latent vectors and floater removal remove most, but not all, tive relighting results on these scenes and additional synthetic scenes.
artifacts introduced by the multi-view inconsistencies; this can be We see that our method produces realistic and plausible relighting
seen in the ablations at the end of the supplemental video. results. Also, note that our solution is temporally consistent.

4. Results and Evaluation 4.3. Evaluation


Our method was implemented by leveraging publicly available im- Baselines. We compare our results to the method of [PMGD21]
plementations of ControlNet [ZRA23] and 3DGS [KKLD23]. We which is specifically designed for complete scenes, Ten-
8 of 14 Y. Poirier-Ginter et al. / A Diffusion Approach to Radiance Field Relighting
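The near-camera floater culling described in the training procedure above (a per-camera znear taken from the 1st percentile of SfM point depths, scaled by 0.9) can be sketched in plain Python. This is our own minimal illustration, not the authors' code: the helper names, the nearest-rank percentile, and the reduction of frustum culling to a simple depth test are simplifications we introduce here.

```python
import math

def camera_znear(depths, percentile=1.0, scale=0.9):
    """Per-camera near plane: take the camera-space depth of the SfM point
    at the given (low) percentile of the nearest points, then scale it
    down by 0.9 as in the text. `depths` holds positive camera-space z
    values of the scene's SfM points as seen from this camera."""
    pts = sorted(d for d in depths if d > 0)
    k = max(0, math.ceil(percentile / 100.0 * len(pts)) - 1)  # nearest-rank
    return scale * pts[k]

def cull_mask(gaussian_depths, znear):
    """True for Gaussian primitives in front of the camera's znear plane,
    i.e. the ones removed from this camera's view at each training step."""
    return [z < znear for z in gaussian_depths]
```

In the full method this test is applied at every training step and for every camera, and only to primitives that actually project inside that camera's view frustum.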

Figure 9: Qualitative relighting results for the real scenes, from left to right: CHEST OF DRAWERS, KETTLE, MIPNERF ROOM and GARAGE WALL, for a moving light source. The lighting direction is indicated by the gray ball in the lower right. Please see the supplemental video for more results. Please note how the highlights (left group) and shadows (right group) have changed.

Figure 10: Qualitative comparison on the real scene KETTLE. From left to right, from the same viewpoint: input lighting condition (view reconstructed using 3D Gaussian Splatting), target lighting, our relighting, and the relighting of Philip et al. [PMGD21]. Top and bottom rows are two different lighting conditions. Philip et al. [PMGD21] exhibits far more geometry and shading artifacts compared to our method; in particular, imprecise MVS preprocessing results in missing geometry.

Method →              Ours                  OutCast [GRP22]       R3DGS [GGL∗23]        TensoIR [JLX∗23]
Scene ↓               PSNR↑  LPIPS↓ SSIM↑   PSNR↑  LPIPS↓ SSIM↑   PSNR↑  LPIPS↓ SSIM↑   PSNR↑  LPIPS↓ SSIM↑
Simple Bedroom        20.57  0.156  0.868   17.24  0.207  0.808   17.79  0.174  0.830   15.77  0.471  0.595
Simple Kitchen        17.45  0.154  0.855   17.91  0.205  0.822   18.55  0.197  0.807   20.52  0.382  0.701
Simple Livingroom     22.12  0.136  0.884   21.09  0.125  0.878   20.34  0.166  0.857   17.45  0.444  0.598
Simple Office         18.59  0.131  0.868   18.97  0.196  0.811   20.40  0.173  0.808   18.22  0.446  0.644
Complex Bedroom       17.70  0.145  0.791   15.26  0.221  0.694   16.69  0.186  0.741   14.42  0.434  0.555
Complex Kitchen       19.28  0.152  0.811   18.44  0.178  0.771   19.28  0.168  0.755   16.70  0.471  0.533
Complex Livingroom    18.61  0.163  0.800   17.94  0.187  0.783   18.39  0.175  0.770   16.82  0.382  0.602
Complex Office        20.20  0.096  0.858   17.22  0.169  0.781   18.93  0.144  0.776   15.78  0.468  0.529

Table 1: Quantitative results of our 3D relighting on the synthetic datasets (where ground truth is available), compared to previous work, from left to right: OutCast [GRP22] (run on individual images from 3DGS [KKLD23]), Relightable 3D Gaussians [GGL∗23], and TensoIR [JLX∗23]. Arrows indicate whether higher (↑) or lower (↓) is better. Results are color coded by best, second-, and third-best.
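As a concrete illustration of the PSNR column reported above, here is a sketch with our own helper, not the authors' evaluation code; SSIM and LPIPS [ZIE∗18] are typically taken from existing library implementations rather than re-derived.

```python
import math

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given here as flat sequences of values in [0, max_val]."""
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
```

Higher is better (↑), and identical images give infinite PSNR; LPIPS, by contrast, is a perceptual distance, so lower is better (↓).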

Figure 11: Comparative results of our method on synthetic scenes where (approximate) ground truth is available, compared to previous methods; columns from left to right: GT, Ours, OutCast, R3DG, TensoIR. Our approach is closer to the ground-truth lighting, capturing the overall appearance in a realistic manner.
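For context on the quantitative comparisons: the evaluation protocol (see Baselines in Sec. 4.3) normalizes each method's prediction to the ground truth's channel-wise statistics before computing metrics. The statistic-matching step can be sketched as follows; the helper name is ours, and the paper applies this per channel in LAB space, a color-space conversion we omit here.

```python
import statistics

def match_channel_stats(pred_channels, gt_channels, eps=1e-8):
    """For each channel (a flat list of values): subtract the prediction's
    mean, divide out its standard deviation, then multiply by the ground
    truth's standard deviation and add the ground truth's mean."""
    out = []
    for pred, gt in zip(pred_channels, gt_channels):
        p_mean, p_std = statistics.fmean(pred), statistics.pstdev(pred)
        g_mean, g_std = statistics.fmean(gt), statistics.pstdev(gt)
        out.append([(v - p_mean) / (p_std + eps) * g_std + g_mean for v in pred])
    return out
```

After this step every method's prediction shares the ground truth's global channel statistics, so the metrics compare lighting structure rather than overall color balance.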

soIR [JLX∗23] and Relightable Gaussians [GGL∗23]. Given that most other methods do not handle full scenes well, we also create a new baseline by first training 3DGS [KKLD23] on the input data and rendering a test path using novel view synthesis; we then use OutCast [GRP22] to relight each individual rendered frame using the target direction. We trained TensoIR [JLX∗23] using the default configuration, but modified the “density_shift” parameter from −10 to −8 to achieve the best results on our data. For Relightable 3D Gaussians [GGL∗23], we train their “Stage 1” for 30K iterations and “Stage 2” for an additional 10K to recover the BRDF parameters. We then relight the scenes using 360° environment maps rendered in Blender, using a generic empty room and a camera/flash setup similar to that of the ground truth. Finally, to improve the baselines, we normalize the predictions of all methods: we first subtract the channel-wise mean and divide out the channel-wise standard deviation, and then multiply and add the corresponding parameters of the ground truths. These operations are performed in LAB space for all methods.

Experimental methodology. We use our synthetic test scenes to provide quantitative results. To compare the methods, we rendered 200 novel views with 18 different lighting directions and evaluate the relighting quality of each method by computing standard image quality metrics. Given the complexity of the setup for [PMGD21], we only show qualitative results for one real scene, in Fig. 10. Here, our method was trained at 768 × 512 resolution for 200K iterations, with a batch size of 8 and a learning rate of 10^-4.

Results. We present quantitative results in Table 1, reporting per-scene results for the following image quality metrics: PSNR, SSIM, and LPIPS [ZIE∗18]. The results demonstrate that our method outperforms all others in all but a few scenarios, where it still achieves competitive performance.

Qualitative comparisons are shown in Fig. 11: on the left we show the ground-truth relit image rendered in Blender, followed by our results as well as those of OutCast [GRP22], Relightable 3D Gaussians [GGL∗23] and TensoIR [JLX∗23]. Please refer to the supplementary HTML viewer for more results. We clearly see that our method is closer to the ground truth, visually confirming the quantitative results in Tab. 1. TensoIR has difficulty reconstructing the geometry, and Relightable 3D Gaussians tend to have a “splotchy” look due to inaccurate normals. OutCast has difficulty with the overall lighting condition and can add incorrect shadows, but in many cases produces convincing results since it operates in image space. Our results show that by using the diffusion prior we manage to achieve realistic relighting, surpassing the state of the art.

Our method was trained for indoor scenes; Fig. 13 gives additional ControlNet results on out-of-distribution samples, showing that it can generalize to some extent to unseen scenes and lighting conditions, although the realism is lower than for in-distribution samples.

Figure 13: Results of our 2D relighting network on out-of-distribution images (StyleGAN-generated woman and MipNeRF360 BICYCLE, GARDEN, and STUMP). On human faces, ControlNet may change the expression as well as the lighting, or create excessive shininess; on outdoor scenes, while the overall lighting direction is plausible, the network fails to generate sufficiently hard shadows.

5. Conclusion

We have presented the first method to effectively leverage the strong prior of large generative diffusion models in the context of radiance field relighting. Rather than relying on accurate geometry, material and/or lighting estimation, our approach models realistic illumination directly, by leveraging a general-purpose single-view, multi-illumination dataset and fine-tuning a large pretrained generative model. Our results show that we can synthesize realistic relighting of captured scenes, while allowing interactive novel-view synthesis by building on such priors. Our method shows levels of realism for relighting that surpass the state of the art for cluttered indoor scenes (as opposed to isolated objects).

Figure 12: Example limitations of our approach, with our prediction (top) vs. ground truth (bottom). Our ControlNet mistakenly produces a shadow at the top of the image while there should not be any (red arrow), presumably assuming the presence of another top shelf. Additionally, the highlight position is somewhat incorrect (yellow arrow), ostensibly because we define light direction in a manner that is not fully physically accurate.

One limitation of the proposed method is that it does not enforce physical accuracy: the target light direction is noisy, and the ControlNet relies mostly on its powerful Stable Diffusion prior to relight, rather than performing physics-based reasoning. For example, Fig. 12 shows that ControlNet can hallucinate shadows from unseen geometry where there should not be any. Given that we define the light direction in a manner that is not fully physically accurate, the positioning of highlights can also be inaccurate, as shown in Fig. 12. In addition, the appearance embeddings correct for global inconsistencies only indirectly and do not explicitly rely on the learned 3D representation of the radiance field. As a result, our method does not always remove or move shadows in a fully accurate, physically-based manner. While our method clearly demonstrates that 2D diffusion model priors can be used for realistic relighting, the ability to perform more complex relighting (rather than just changing the light direction) requires significant future research, e.g., by using more general training data as well as ways to encode and decode complex lighting.

An interesting direction for future work would be to enforce multi-view consistency more explicitly in the ControlNet, e.g., by leveraging single-illumination multi-view data. Another interesting direction is to develop solutions that guide the predicted relighting to make it more accurate, leveraging the 3D geometric information available in the radiance field more explicitly.

Acknowledgements This research was funded by the ERC Advanced grant FUNGRAPH No 788065 http://fungraph.

inria.fr/, supported by NSERC grant DGPIN 2020-04799 and the Digital Research Alliance Canada. The authors are grateful to Adobe and NVIDIA for generous donations, and the OPAL infrastructure from Université Côte d’Azur. Thanks to Georgios Kopanas and Frédéric Fortier-Chouinard for helpful advice.

References

[BBJ∗21] Boss M., Braun R., Jampani V., Barron J. T., Liu C., Lensch H.: Nerd: Neural reflectance decomposition from image collections. In IEEE/CVF Int. Conf. Comput. Vis. (2021). 2, 3

[BEK∗22] Boss M., Engelhardt A., Kar A., Li Y., Sun D., Barron J. T., Lensch H. P., Jampani V.: SAMURAI: Shape and material from unconstrained real-world arbitrary image collections. In Adv. Neural Inform. Process. Syst. (2022). 3
[BF24] Bhattad A., Forsyth D. A.: Stylitgan: Prompting stylegan to produce new illumination conditions. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2024). 3

[BJB∗21] Boss M., Jampani V., Braun R., Liu C., Barron J., Lensch H.: Neural-pil: Neural pre-integrated lighting for reflectance decomposition. In Adv. Neural Inform. Process. Syst. (2021). 2, 3

[BMHF23] Bhattad A., McKee D., Hoiem D., Forsyth D.: Stylegan knows normal, depth, albedo, and more. 3

[BMT∗21] Barron J. T., Mildenhall B., Tancik M., Hedman P., Martin-Brualla R., Srinivasan P. P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In IEEE/CVF Int. Conf. Comput. Vis. (2021). 2

[BMV∗22] Barron J. T., Mildenhall B., Verbin D., Srinivasan P. P., Hedman P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022). 2

[CKK23] Choi C., Kim J., Kim Y. M.: IBL-NeRF: Image-based lighting formulation of neural radiance fields. Comput. Graph. Forum 42, 7 (2023). 3

[CXG∗22] Chen A., Xu Z., Geiger A., Yu J., Su H.: Tensorf: Tensorial radiance fields. In Eur. Conf. Comput. Vis. (2022). 2

[DHT∗00] Debevec P., Hawkins T., Tchou C., Duiker H.-P., Sarokin W., Sagar M.: Acquiring the reflectance field of a human face. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (2000), pp. 145–156. 1

[FRV∗23] Futschik D., Ritland K., Vecore J., Fanello S., Orts-Escolano S., Curless B., Sýkora D., Pandey R.: Controllable light diffusion for portraits. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023). 2

[GAA∗23] Gal R., Arar M., Atzmon Y., Bermano A. H., Chechik G., Cohen-Or D.: Encoder-based domain tuning for fast personalization of text-to-image models. ACM Trans. Graph. 42, 4 (2023), 1–13. 3

[GCD∗20] Gao D., Chen G., Dong Y., Peers P., Xu K., Tong X.: Deferred neural lighting: free-viewpoint relighting from unstructured photographs. ACM Trans. Graph. 39, 6 (Nov 2020). 3

[GGL∗23] Gao J., Gu C., Lin Y., Zhu H., Cao X., Zhang L., Yao Y.: Relightable 3d gaussian: Real-time point cloud relighting with brdf decomposition and ray tracing. arXiv:2311.16043 (2023). 2, 3, 8, 10

[GRP22] Griffiths D., Ritschel T., Philip J.: Outcast: Outdoor single-image relighting with cast shadows. In Comput. Graph. Forum (2022), vol. 41, pp. 179–193. 3, 8, 10

[HHM22] Hasselgren J., Hofmann N., Munkberg J.: Shape, light, and material decomposition from images using monte carlo rendering and denoising. Adv. Neural Inform. Process. Syst. 35 (2022), 22856–22869. 3

[HJA20] Ho J., Jain A., Abbeel P.: Denoising diffusion probabilistic models. In Adv. Neural Inform. Process. Syst. (2020). 3

[JLX∗23] Jin H., Liu I., Xu P., Zhang X., Han S., Bi S., Zhou X., Xu Z., Su H.: Tensoir: Tensorial inverse rendering. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023). 2, 3, 8, 10

[JTL∗24] Jiang Y., Tu J., Liu Y., Gao X., Long X., Wang W., Ma Y.: Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces, 2024. 3

[KE19] Kanamori Y., Endo Y.: Relighting humans: occlusion-aware inverse rendering for full-body human images. ACM Trans. Graph. 37, 6 (2019). 2

[KKLD23] Kerbl B., Kopanas G., Leimkühler T., Drettakis G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (July 2023). 2, 5, 6, 7, 8, 10

[KOH∗24] Ke B., Obukhov A., Huang S., Metzger N., Daudt R. C., Schindler K.: Repurposing diffusion-based image generators for monocular depth estimation. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2024). 5

[LCL∗23] Liang R., Chen H., Li C., Chen F., Panneer S., Vijaykumar N.: Envidr: Implicit differentiable renderer with neural environment lighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 3

[LGF∗22] Li Q., Guo J., Fei Y., Li F., Guo Y.: Neulighting: Neural lighting for free viewpoint outdoor scene relighting with unconstrained photo collections. In SIGGRAPH Asia 2022 Conference Papers (2022). 3

[LGZ∗20] Liu A., Ginosar S., Zhou T., Efros A. A., Snavely N.: Learning to factorize and relight a city. In Eur. Conf. Comput. Vis. (2020). 2

[LLLY23] Lin S., Liu B., Li J., Yang X.: Common diffusion noise schedules and sample steps are flawed. arXiv preprint arXiv:2305.08891 (2023). 5

[LLZ∗20] Liu D., Long C., Zhang H., Yu H., Dong X., Xiao C.: Arshadowgan: Shadow generative adversarial network for augmented reality in single light scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2020). 2

[LSB∗22] Li Z., Shi J., Bi S., Zhu R., Sunkavalli K., Hašan M., Xu Z., Ramamoorthi R., Chandraker M.: Physically-based editing of indoor scene lighting from a single image. In Eur. Conf. Comput. Vis. (2022). 2

[LSY∗21] Lagunas M., Sun X., Yang J., Villegas R., Zhang J., Shu Z., Masia B., Gutierrez D.: Single-image full-body human relighting. Eur. Graph. Symp. Render. (2021). 2

[LWC∗23] Li Z., Wang L., Cheng M., Pan C., Yang J.: Multi-view inverse rendering for large-scale real-world indoor scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023). 3

[LWL∗23] Liu Y., Wang P., Lin C., Long X., Wang J., Liu L., Komura T., Wang W.: Nero: Neural geometry and brdf reconstruction of reflective objects from multiview images. ACM Trans. Graph. (2023). 3

[LZF∗24] Liang Z., Zhang Q., Feng Y., Shan Y., Jia K.: Gs-ir: 3d gaussian splatting for inverse rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024). 3

[MBRS∗21] Martin-Brualla R., Radwan N., Sajjadi M. S. M., Barron J. T., Dosovitskiy A., Duckworth D.: NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021). 6

[MESK22] Müller T., Evans A., Schied C., Keller A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41, 4 (July 2022), 102:1–102:15. 2, 5

[MGAD19] Murmann L., Gharbi M., Aittala M., Durand F.: A multi-illumination dataset of indoor object appearance. In IEEE/CVF Int. Conf. Comput. Vis. (2019). 2, 4, 5, 7

[MHS∗22] Munkberg J., Hasselgren J., Shen T., Gao J., Chen W., Evans A., Müller T., Fidler S.: Extracting Triangular 3D Models, Materials, and Lighting From Images. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (June 2022), pp. 8280–8290. 3

[MST∗20] Mildenhall B., Srinivasan P. P., Tancik M., Barron J. T., Ramamoorthi R., Ng R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In Eur. Conf. Comput. Vis. (2020). 1, 2

[NDDJK21] Nimier-David M., Dong Z., Jakob W., Kaplanyan A.: Material and lighting reconstruction for complex indoor scenes with texture-space differentiable rendering. Comput. Graph. Forum (2021). 3

[PD23] Philip J., Deschaintre V.: Floaters no more: Radiance field gradient scaling for improved near-camera training. In EGSR Conference proceedings DL-track (2023), The Eurographics Association. 7

[PEL∗23] Podell D., English Z., Lacey K., Blattmann A., Dockhorn T., Müller J., Penna J., Rombach R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952 (2023). 3

[PGZ∗19] Philip J., Gharbi M., Zhou T., Efros A. A., Drettakis G.: Multi-view relighting using a geometry-aware network. ACM Trans. Graph. 38, 4 (2019), 78–1. 2, 3
[PLMZ23] Papantoniou F. P., Lattas A., Moschoglou S., Zafeiriou S.: Relightify: Relightable 3d faces from a single image via diffusion models. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 2

[PMGD21] Philip J., Morgenthaler S., Gharbi M., Drettakis G.: Free-viewpoint indoor neural relighting from multi-view stereo. ACM Trans. Graph. 40, 5 (2021), 1–18. 2, 3, 7, 8, 10

[PTS23] Ponglertnapakorn P., Tritrong N., Suwajanakorn S.: Difareli: Diffusion face relighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 2

[RBL∗22] Rombach R., Blattmann A., Lorenz D., Esser P., Ommer B.: High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022). 2, 3, 4, 7

[RES∗22] Rudnev V., Elgharib M., Smith W., Liu L., Golyanik V., Theobalt C.: Nerf for outdoor scene relighting. In Eur. Conf. Comput. Vis. (2022). 3

[RLJ∗23] Ruiz N., Li Y., Jampani V., Pritch Y., Rubinstein M., Aberman K.: Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023), pp. 22500–22510. 3

[SDWMG15] Sohl-Dickstein J., Weiss E., Maheswaranathan N., Ganguli S.: Deep unsupervised learning using nonequilibrium thermodynamics. 3

[SDZ∗21] Srinivasan P. P., Deng B., Zhang X., Tancik M., Mildenhall B., Barron J. T.: Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021). 3

[SF16] Schönberger J. L., Frahm J.-M.: Structure-from-motion revisited. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2016). 2, 5

[SJL∗23] Sharma P., Jampani V., Li Y., Jia X., Lagun D., Durand F., Freeman W. T., Matthews M.: Alchemist: Parametric control of material properties with diffusion models. arXiv preprint arXiv:2312.02970 (2023). 3

[SKCJ18] Sengupta S., Kanazawa A., Castillo C. D., Jacobs D. W.: Sfsnet: Learning shape, reflectance and illuminance of faces in the wild. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2018). 2

[SLZ∗22] Sheng Y., Liu Y., Zhang J., Yin W., Oztireli A. C., Zhang H., Lin Z., Shechtman E., Benes B.: Controllable shadow generation using pixel height maps. In Eur. Conf. Comput. Vis. (2022). 2

[SME20] Song J., Meng C., Ermon S.: Denoising diffusion implicit models. In Int. Conf. Learn. Represent. (2020). 3

[SWW∗23] Shi Y., Wu Y., Wu C., Liu X., Zhao C., Feng H., Liu J., Zhang L., Zhang J., Zhou B., Ding E., Wang J.: Gir: 3d gaussian inverse rendering for relightable scene factorization. arXiv (2023). 3

[SYH∗17] Shu Z., Yumer E., Hadap S., Sunkavalli K., Shechtman E., Samaras D.: Neural face editing with intrinsic image disentangling. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2017). 2

[SZB21] Sheng Y., Zhang J., Benes B.: Ssn: Soft shadow network for image compositing. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021). 2

[SZPF16] Schönberger J. L., Zheng E., Pollefeys M., Frahm J.-M.: Pixelwise view selection for unstructured multi-view stereo. In Eur. Conf. Comput. Vis. (2016). 5

[TÇE∗21] Türe M., Çiklabakkal M. E., Erdem A., Erdem E., Satilmiş P., Akyüz A. O.: From noon to sunset: Interactive rendering, relighting, and recolouring of landscape photographs by modifying solar position. In Comput. Graph. Forum (2021), vol. 40, pp. 500–515. 2

[TDMS∗23] Toschi M., De Matteo R., Spezialetti R., De Gregorio D., Di Stefano L., Salti S.: Relight my nerf: A dataset for novel view synthesis and relighting of real world objects. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (June 2023), pp. 20762–20772. 3

[TTM∗22] Tewari A., Thies J., Mildenhall B., Srinivasan P., Tretschk E., Yifan W., Lassner C., Sitzmann V., Martin-Brualla R., Lombardi S., et al.: Advances in neural rendering. In Comput. Graph. Forum (2022), vol. 41, Wiley Online Library, pp. 703–735. 2

[Ull79] Ullman S.: The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B, Biological Sciences 203, 1153 (January 1979), 405–426. 2

[VZG∗23] Valença L., Zhang J., Gharbi M., Hold-Geoffroy Y., Lalonde J.-F.: Shadow harmonization for realistic compositing. In SIGGRAPH Asia 2023 Conference Papers (2023). 2

[WSG∗23] Wang Z., Shen T., Gao J., Huang S., Munkberg J., Hasselgren J., Gojcic Z., Chen W., Fidler S.: Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023). 3

[WSLG23] Wu T., Sun J.-M., Lai Y.-K., Gao L.: De-nerf: Decoupled neural radiance fields for view-consistent appearance editing and high-frequency environmental relighting. In ACM SIGGRAPH (2023). 3

[WZL∗08] Wang Y., Zhang L., Liu Z., Hua G., Wen Z., Zhang Z., Samaras D.: Face relighting from a single image under arbitrary unknown lighting conditions. IEEE Trans. Pattern Anal. Mach. Intell. 31, 11 (2008), 1968–1984. 2

[XZC∗23] Xu Y., Zoss G., Chandran P., Gross M., Bradley D., Gotardo P.: Renerf: Relightable neural radiance fields with nearfield lighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 3

[YME∗20] Yu Y., Meka A., Elgharib M., Seidel H.-P., Theobalt C., Smith W. A.: Self-supervised outdoor scene relighting. In Eur. Conf. Comput. Vis. (2020). 2

[YS22] Yu Y., Smith W. A. P.: Outdoor inverse rendering from a single image using multiview self-supervision. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7 (2022), 3659–3675. 2

[YZL∗22] Yao Y., Zhang J., Liu J., Qu Y., Fang T., McKinnon D., Tsin Y., Quan L.: Neilf: Neural incident light field for physically-based material estimation. In Eur. Conf. Comput. Vis. (2022). 3

[ZCD∗23] Zeng C., Chen G., Dong Y., Peers P., Wu H., Tong X.: Relighting neural radiance fields with shadow and highlight hints. In ACM SIGGRAPH 2023 Conference Proceedings (2023). 3

[ZDP∗24] Zeng C., Dong Y., Peers P., Kong Y., Wu H., Tong X.: Dilightnet: Fine-grained lighting control for diffusion-based image generation. In ACM SIGGRAPH 2024 Conference Proceedings (2024). 3

[ZFC∗23] Zhu Z., Feng X., Chen D., Bao J., Wang L., Chen Y., Yuan L., Hua G.: Designing a better asymmetric vqgan for stable diffusion, 2023. 5

[ZHY∗23] Zhu J., Huo Y., Ye Q., Luan F., Li J., Xi D., Wang L., Tang R., Hua W., Bao H., et al.: I2-sdf: Intrinsic indoor scene reconstruction and editing via raytracing in neural sdfs. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023). 3

[ZIE∗18] Zhang R., Isola P., Efros A. A., Shechtman E., Wang O.: The unreasonable effectiveness of deep features as a perceptual metric. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2018). 10

[ZLW∗21] Zhang K., Luan F., Wang Q., Bala K., Snavely N.: PhySG: Inverse rendering with spherical gaussians for physics-based material editing and relighting. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021). 3

[ZLZ∗22] Zhu Z.-L., Li Z., Zhang R.-X., Guo C.-L., Cheng M.-M.: Designing an illumination-aware network for deep image relighting. IEEE Trans. Image Process. 31 (2022), 5396–5411. 2

[ZRA23] Zhang L., Rao A., Agrawala M.: Adding conditional control to text-to-image diffusion models. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 2, 3, 4, 7

[ZSD∗21] Zhang X., Srinivasan P. P., Deng B., Debevec P., Freeman W. T., Barron J. T.: Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. 40, 6 (2021), 1–18. 2, 3

© 2024 Eurographics - The European Association for Computer Graphics and John Wiley & Sons Ltd.

[ZSH∗22] Zhang Y., Sun J., He X., Fu H., Jia R., Zhou X.: Modeling indirect illumination for inverse rendering. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022). 3

[ZTS∗22] Zhang X., Tseng N., Syed A., Bhasin R., Jaipuria N.: Simbar: Single image-based scene relighting for effective data augmentation for automated driving vision tasks. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (June 2022). 2

[ZYL∗23] Zhang J., Yao Y., Li S., Liu J., Fang T., McKinnon D., Tsin Y., Quan L.: Neilf++: Inter-reflectable light fields for geometry and material estimation. In IEEE/CVF Int. Conf. Comput. Vis. (2023). 3
