Relighting Radiance Fields
Eurographics Symposium on Rendering 2024 (author's version)
E. Garces and E. Haines (Guest Editors)
Figure 1: Our method produces relightable radiance fields directly from a single-illumination multi-view dataset, by using priors from generative data in place of an actual multi-illumination capture.
Abstract
Relighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single
illumination condition; it is especially hard for full scenes containing multiple objects. We introduce a method to create
relightable radiance fields using such single-illumination data by exploiting priors extracted from 2D image diffusion models.
We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by light direction, allowing us to augment a
single-illumination capture into a realistic – but possibly inconsistent – multi-illumination dataset from directly defined light
directions. We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct
control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on
light direction. To enforce multi-view consistency and overcome inaccuracies we optimize a per-image auxiliary feature vector.
We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully
exploits 2D diffusion model priors to allow realistic 3D relighting for complete scenes.
Keywords: NeRF, Radiance Field, Relighting
results on isolated masked objects [BBJ∗21], or make simplifying assumptions such as distant environment lighting [BJB∗21, ZSD∗21]. Other methods have handled more complex illumination models, including full scenes [PMGD21, PGZ∗19], but can be limited in the complexity of the geometry and materials that must be reconstructed well. Finally, methods that depend on accurate estimates of surface normals [JLX∗23, GGL∗23] often produce limited levels of realism when relighting.

At the other end of the spectrum, diffusion models (DMs, e.g., [RBL∗22]), trained on billions of natural images, have shown exceptional abilities to capture real image distribution priors and can synthesize complex lighting effects. While recent progress shows they can be controlled in various ways [ZRA23], extracting lighting-specific priors from these models, especially for full 3D scenes, has not yet been demonstrated.

In this paper, we build on these observations and present a new method that demonstrates that it is possible to create relightable radiance fields for complete scenes from single low-frequency lighting condition captures by exploiting 2D diffusion model priors. We first propose to fine-tune a pre-trained DM conditioned on the dominant light source direction. For this, we leverage a dataset of images with many lighting conditions of the same scene [MGAD19], which enables the DM to produce relit versions of an image with explicit control over the dominant lighting direction. We use this 2D relighting network to augment any standard multi-view dataset taken under single lighting by generating multiple relit versions of each image, effectively transforming it into a multi-illumination dataset. Given this augmented dataset, we train a new relightable radiance field with direct control over lighting direction, which in turn enables realistic interactive relighting of full scenes with lighting and camera view control in real time for low-frequency lighting. We build on 3D Gaussian Splatting [KKLD23], enhancing the radiance field with a small Multi-Layer Perceptron and an auxiliary feature vector to account for the approximate nature of the generated lightings and to handle lighting inconsistencies between views.

2.1. Radiance Fields

Radiance field methods have revolutionized 3D scene capture using multi-view datasets (photos or video) as input. In particular, Neural Radiance Fields (NeRFs) [MST∗20] learn to synthesize novel views of a given scene by regressing its radiance from a set of input images (multiple photos or videos of a 3D scene). Structure from motion [Ull79, SF16] is used to estimate the camera poses for all images, and rays are cast through the center of all pixels. A multi-layer perceptron (MLP) c_θ parameterized by 3D position and view direction is used to represent the radiance and opacity of the scene. The optimization objective is simply the mean squared error:

$$\mathcal{L}_{\mathrm{NeRF}} = \mathbb{E}_{o,d,c^*}\big[\, \lVert c_\theta(o, d) - c^* \rVert_2^2 \,\big],$$

where o is a ray's origin, d its direction, and c* the target RGB color value of its corresponding pixel. The predicted color for that pixel is obtained by integrating a color field c_θ weighted by a density field σ_θ following the equation of volume rendering. The original NeRF was slow to train and to render; a vast number of methods [TTM∗22] have been proposed to improve the original technique, e.g., acceleration structures [MESK22], antialiasing [BMT∗21], handling larger scenes [BMV∗22], etc. Recently, 3D Gaussian Splatting (3DGS) [KKLD23] introduced an explicit, primitive-based representation of radiance fields. The anisotropic nature of the 3D Gaussians allows the efficient representation of fine detail, and the fast GPU-accelerated rasterization used allows real-time rendering. We use 3DGS to represent radiance fields mainly for performance, but any other radiance representation, e.g., [CXG∗22], could be used instead. Radiance fields are most commonly used in the context of single-light-condition captures, i.e., the images are all captured under the same lighting. As a result, there is no direct way to change the lighting of captured scenes, severely restricting the utility of radiance fields compared to traditional 3D graphics assets. Our method uses diffusion models to simulate multi-light conditions from a single-light capture, thus allowing the relighting of radiance fields.
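As a concrete reference for the objective above, here is a minimal NumPy sketch of volume rendering a single ray followed by the mean-squared-error loss; the random samples stand in for the outputs of the MLP c_θ and the density σ_θ, and the sampling strategy is a placeholder rather than the scheme used by any particular NeRF implementation.

```python
import numpy as np

def volume_render(rgb, sigma, t_vals):
    """Composite per-sample colors 'rgb' (S,3) and densities 'sigma' (S,)
    along a ray sampled at depths 't_vals' (S,) -- standard NeRF quadrature."""
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)   # distances between samples
    alpha = 1.0 - np.exp(-sigma * deltas)                # per-segment opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-10]))  # transmittance
    weights = alpha * trans                              # contribution of each sample
    return (weights[:, None] * rgb).sum(axis=0)          # predicted pixel color

def mse_loss(pred_rgb, target_rgb):
    """L_NeRF = E[ || c_theta(o, d) - c* ||_2^2 ] over sampled rays."""
    return np.mean((pred_rgb - target_rgb) ** 2)

# Toy usage: random values standing in for the MLP outputs along one ray.
S = 64
t_vals = np.linspace(2.0, 6.0, S)
rgb = np.random.rand(S, 3)
sigma = np.random.rand(S) * 5.0
pixel = volume_render(rgb, sigma, t_vals)
print(mse_loss(pixel, np.array([0.5, 0.5, 0.5])))
```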
OutCast [GRP22] produces realistic, user-controllable, hard cast shadows from the sun. In contrast, we focus on indoor scenes, which often exhibit soft shadows and more complex lighting effects. Finally, the concurrent work of Zeng et al. [ZDP∗24] uses diffusion models to relight isolated objects using environment maps, whereas we target cluttered indoor scenes.

2.3. Multi-view Relighting

While single-view methods produce good results on restricted datasets such as faces, they are often limited by the lack of accurate geometry, required to simulate light transport. To this point, multi-view data can provide a more accurate and complete geometric reconstruction. For example, Philip et al. [PMGD21, PGZ∗19] build on multi-view stereo (MVS) reconstruction of the scene, and learn a prior from synthetic data rendered under multiple lighting conditions. Despite correcting for many of the reconstruction artifacts, these methods are restricted by the quality of MVS reconstruction. Nimier et al. [NDDJK21] also present a scene-scale solution but require a complex pipeline that optimizes in texture space. Gao et al. [GCD∗20] use a rough proxy and neural textures to allow object relighting.

More recently, radiance fields have also been used as a geometric representation for relighting. Most methods work on the simple case of a single isolated object, while we target larger scenes. Such methods typically assume lighting to be distant, often provided as an environment map. NeRFactor [ZSD∗21] uses a Bi-Directional Reflectance Distribution Function (BRDF) prior from measurements and estimates normals and visibility, while NeRV [SDZ∗21] and Zhang et al. [ZSH∗22] predict visibility to allow indirect illumination estimation. NeRD [BBJ∗21], PhySG [ZLW∗21], and DE-NeRF [WSLG23] use efficient physically-based material and lighting models to decompose a radiance field into spatially varying BRDFs, while Neural-PIL [BJB∗21] learns the illumination integration and low-dimensional BRDF priors using auto-encoders. TensoIR [JLX∗23] uses a mixed radiance and physics-based formulation to recover intrinsic properties. NeRO [LWL∗23] focuses on specular objects, showing very promising results. Relightable Gaussians [GGL∗23] use the more recent 3D Gaussian representation along with ray-tracing to estimate properties of objects. GS-IR [LZF∗24], GaussianShader [JTL∗24] and GIR [SWW∗23] also build on 3D Gaussian splatting, proposing different approaches to estimate more reliable normals while approximating visibility and indirect illumination; these work well for isolated objects under distant lighting. However, these methods struggle with more complex scene-scale input and near-field illumination, but can work with, or be adapted to, both single- and multi-illumination input data.

Feeding multi-view multi-illumination data to a relightable radiance field indeed enables better relighting, but at the cost of highly-controlled capture conditions [XZC∗23, ZCD∗23, TDMS∗23] or an extended dataset of unconstrained illuminations [BEK∗22, LGF∗22]. In our method, we use a Diffusion Model to simulate multi-illumination data, lifting the capture constraints while benefiting from the lighting variations. Another body of work [MHS∗22, HHM22, YZL∗22, ZYL∗23, LWC∗23, LCL∗23] achieves object or scene relighting from multi-view images by extracting traditional 3D assets (meshes and SVBRDFs) and applying physically-based rendering algorithms. IBL-NeRF [CKK23] allows for scene-scale material estimation but bakes the illumination into a prefiltered light field, which prevents relighting. Recently, NeRF-OSR [RES∗22], I²-SDF [ZHY∗23], and Wang et al. [WSG∗23] focused on scene-scale, single-illumination relighting of scenes using both implicit and explicit representations. While they can achieve reasonable results, they often lack overall realism, exhibiting bumpy or overly smooth shading during relighting. In contrast, our use of diffusion priors provides realistic-looking output.

2.4. Diffusion Models

Diffusion Models (DMs) [SDWMG15, HJA20] made it possible to train generative models on diverse, high-resolution datasets of billions of images. These models learn to invert a forward diffusion process that gradually transforms images into isotropic Gaussian noise, by adding random Gaussian noise ϵ_t ∼ N(0, I) to an image in T steps. DMs train a neural network g_φ with parameters φ to learn to denoise with the objective:

$$\mathcal{L}_{\mathrm{Diffusion}} = \mathbb{E}_{x,\epsilon,1\le t\le T}\big[\, \lVert g_\phi(x_t \mid t) - y_t \rVert_2^2 \,\big],$$

in which the target y_t is often set to ϵ. After training, sampling can be performed step-by-step, by predicting x_{t−1} from x_t for each timestep t, which is expensive since T can be high (e.g., 1000); faster alternatives include deterministic DDIM [SME20] sampling, which can perform sampling of comparable quality with fewer steps (i.e., 10-50× larger steps). Stable Diffusion [RBL∗22] performs denoising in a lower-dimensional latent space, by first training a variational encoder to compress images; for instance, in Stable Diffusion XL [PEL∗23], images are mapped to a latent space of size ℝ^(128×128×4). In a pre-pass, the dataset is compressed using this autoencoder, and a text-conditioned diffusion model is then trained directly in this latent space.

Diffusion models have an impressive capacity to synthesize highly realistic images, typically conditioned on text prompts. The power of DMs lies in the fact that the billions of images used for training contain an extremely rich representation of the visual world. However, extracting the required information for specific tasks, without incurring the (unrealistic) cost of retraining DMs, is not straightforward. A set of recent methods show that it is possible to fine-tune DMs with a typically much shorter training process to perform specific tasks (e.g., [GAA∗23, RLJ∗23]). A notable example is ControlNet [ZRA23], which proposed an efficient method for fine-tuning Stable Diffusion with added conditioning. In particular, they demonstrated conditional generation from depth, Canny edges, etc., with and without text prompts; we will build on this solution for our 2D relighting method.

In a similar spirit, there has been significant evidence in recent years that latent spaces of generative models encode material information [BMHF23, BF24]. Recent work shows the potential to fine-tune DMs to allow direct material editing [SJL∗23]. Nonetheless, we are unaware of published methods that use DM fine-tuning to perform realistic relighting of full and cluttered scenes.
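For concreteness, a minimal PyTorch sketch of the denoising objective above with the common choice y_t = ϵ; the tiny fully-connected network and linear noise schedule are illustrative stand-ins, not the Stable Diffusion architecture.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule (DDPM-style)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product \bar{alpha}_t

class TinyDenoiser(nn.Module):
    """Stand-in for g_phi: predicts the noise added to x_t, conditioned on t."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))
    def forward(self, x_t, t):
        t_feat = t.float().unsqueeze(-1) / T     # crude timestep embedding
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def diffusion_loss(model, x0):
    """L_Diffusion = E_{x, eps, t}[ || g_phi(x_t | t) - eps ||^2 ] with y_t = eps."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward diffusion in closed form
    return ((model(x_t, t) - eps) ** 2).mean()

# Toy usage on 8-dimensional "images".
model = TinyDenoiser(dim=8)
print(diffusion_loss(model, torch.randn(4, 8)))
```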
Figure 2: We use the single-view, multi-illumination dataset of Murmann et al. [MGAD19] to train ControlNet [ZRA23] on single-view supervised relighting. The network accepts an image (along with its estimated depth map) and a target light direction as input, and produces a relit version of the same scene under the desired target lighting.
3. Method
Our method is composed of three main parts. First, we create a 2D
relighting neural network with direct control of lighting direction
(Sec. 3.1). Second, we use this network to augment a multi-view
capture with single lighting into a multi-lighting dataset, by using
our relighting network. The resulting dataset can be used to create
a radiance field representation of the 3D scene (Sec. 3.2). Finally,
we create a relightable radiance field that accounts for inaccuracies
in the synthesized relit input images and provides a multi-view
consistent lighting solution (Sec. 3.3).

Figure 3: Top row: five diffuse spheres rendered using our optimized lighting direction and shading parameters; the direction is indicated by a blue dot at the point of maximum specular intensity. Bottom row: the corresponding target gray spheres, obtained by averaging the diffuse spheres captured in all scenes. We find the lighting directions by minimizing the L1 distance between the top and bottom rows.

3.1. Single-View Relighting with 2D Diffusion Priors

Relighting a scene captured under a single lighting condition is severely underconstrained, given the lighting/material ambiguity, and thus requires priors about how appearance changes with illumination. Arguably, large DMs must internally encode such priors, since they can generate realistic complex lighting effects, but existing architectures do not allow for explicit control over lighting.

We propose to provide explicit control over lighting by fine-tuning a pre-trained Stable Diffusion (SD) [RBL∗22] model using ControlNet [ZRA23] on a multi-illumination dataset. As illustrated in Fig. 2, the ControlNet accepts as input an image as well as a target light direction, and produces a relit version of the same scene under the desired lighting. To train the ControlNet, we leverage the dataset of Murmann et al. [MGAD19], which contains N = 1015 real indoor scenes captured from a single viewpoint, each lit under M = 25 different, controlled lighting directions. We only keep the 18 non-front-facing light directions.

3.1.1. Lighting Direction

To capture the scenes using similar light directions, Murmann et al. relied on a camera-mounted directional flash controlled by a servo motor. A pair of diffuse and metallic spheres is also visible in each scene; we leverage the former to obtain the effective lighting directions. Using as target the average of all diffuse spheres produced by the same flash direction, we find the lighting direction l ∈ ℝ³ which best reproduces this target when rendering a gray ball with a simplistic Phong shading model. More specifically, we minimize the L1 error while jointly optimizing for an ambient light term and shading parameters (albedo, specular intensity and hardness, as well as a Fresnel coefficient). Fig. 3 illustrates this process.

3.1.2. Controlling Relighting Diffusion

We train ControlNet to predict relit versions of the input image by conditioning it on a target lighting direction. Let us denote by X a set of images of a given scene in the multi-light dataset of Murmann et al. [MGAD19], where each image X_k ∈ X has an associated light direction l_k. Our approach, illustrated in Fig. 2, trains on pairs of lighting directions of the same scene (including the identity pair). The denoising objective becomes

$$\mathcal{L}_{\mathrm{2D}} = \mathbb{E}_{\epsilon,X,t,i,j}\big[\, \lVert g_\psi(X_{t,i};\, t, X_j, D_j, l_i) - y_t \rVert_2^2 \,\big], \quad (1)$$

where X_{t,i} is the noisy image at timestep t ∈ [1, T], where i, j ∈ [1, M], and where ψ are the ControlNet optimizable parameters only. X_j is another image from the set and D_j is its depth map (obtained
Figure 6: Relighting results with our light-conditioned ControlNet. From a single input image (left column), the ControlNet can generate
realistic relit versions for different target light directions (other columns). Please notice realistic changes in highlights for different light
directions (top row), as well as the synthesis of cast shadows (bottom row).
Figure 7: Given a multi-view, single-illumination dataset we use our relighting ControlNet to generate a multi-view, multi-illumination dataset.
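To make the augmentation step implied by Fig. 7 concrete, here is a minimal Python sketch of the loop that turns a single-illumination capture into a multi-illumination dataset; `relight` is a hypothetical wrapper around the fine-tuned relighting ControlNet and is not part of any released code.

```python
def augment_to_multi_illumination(views, relight, light_dirs):
    """Relight every input view with the 2D relighting network for each target
    light direction, producing a (possibly view-inconsistent) multi-illumination
    dataset for radiance-field training. Each 'view' is a dict with the captured
    image, its depth map, and its camera pose."""
    augmented = []
    for view in views:
        for l in light_dirs:                      # target directions in the camera frame
            relit = relight(view["image"], view["depth"], l)
            augmented.append({"image": relit, "pose": view["pose"], "light": l})
    return augmented

# Toy usage with placeholder data; 'fake_relight' stands in for the diffusion model.
views = [{"image": f"view_{k}.png", "depth": f"depth_{k}.png", "pose": k} for k in range(3)]
light_dirs = [(-0.5, 0.5, 0.7), (0.0, 0.5, 0.87), (0.5, 0.5, 0.7)]
fake_relight = lambda img, depth, l: (img, l)
print(len(augment_to_multi_illumination(views, fake_relight, light_dirs)))  # 9 relit images
```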
with the viewing direction. Both vectors have a size of 16 after encoding.

Since light directions are computed with respect to a local camera reference frame (cf. Sec. 3.1), we subsequently register them to the world coordinate system (obtained from COLMAP) by rotating them according to their (known) camera rotation parameters:

$$l' = R_i\, l, \quad (2)$$

where R_i is the 3 × 3 camera-to-world rotation matrix of image I_i from its known pose.

Radiance fields like 3DGS rely on multi-view consistency, and breaking it introduces additional floaters and holes in surfaces. To allow the neural network to account for this inconsistency and correct accordingly, we optimize a per-image auxiliary latent vector a of size 128. Similar approaches for variable appearance have been used for NeRFs [MBRS∗21]. Therefore, in addition to the lighting direction l', we condition the MLP with per-view auxiliary parameters a.
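A compact PyTorch sketch of the appearance model suggested by this section: the camera-space light direction is rotated into world space (Eq. 2) and fed, together with a per-image auxiliary latent, to a small MLP that predicts per-Gaussian color. The layer sizes and the direction encoding are assumptions based on the text (16-dimensional direction encoding, 128-dimensional auxiliary vector); this is an illustrative sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

def to_world(light_cam, R_c2w):
    """Eq. (2): rotate a camera-space light direction into world coordinates."""
    return R_c2w @ light_cam

class RelightableAppearance(nn.Module):
    """Small MLP predicting per-Gaussian color from its feature, the world-space
    light direction l', and a per-image auxiliary latent a (one row per training view)."""
    def __init__(self, num_images, feat_dim=32, aux_dim=128, enc_dim=16):
        super().__init__()
        self.aux = nn.Embedding(num_images, aux_dim)    # optimized per-image vectors
        self.light_enc = nn.Linear(3, enc_dim)          # stand-in for the direction encoding
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + enc_dim + aux_dim, 128), nn.ReLU(),
            nn.Linear(128, 3), nn.Sigmoid())            # RGB in [0, 1]

    def forward(self, gaussian_feat, light_world, image_idx):
        n = gaussian_feat.shape[0]
        l = self.light_enc(light_world).expand(n, -1)   # same light for all Gaussians
        a = self.aux(image_idx).expand(n, -1)           # per-view correction latent
        return self.mlp(torch.cat([gaussian_feat, l, a], dim=-1))

# Toy usage: 1000 Gaussians, light given in the camera frame of training view 7.
model = RelightableAppearance(num_images=100)
R = torch.eye(3)                                        # camera-to-world rotation of that view
l_world = to_world(torch.tensor([0.0, 0.3, 0.95]), R)
colors = model(torch.randn(1000, 32), l_world, torch.tensor(7))
print(colors.shape)  # torch.Size([1000, 3])
```

At render time the per-image latent can be dropped or averaged, so that only the light direction drives the appearance.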
use Stable Diffusion [RBL∗ 22] v2.1 as a backbone. Our source code
and datasets will be released upon publication.
We first present the results of our 3D relightable radiance field,
both for synthetic and real-world scenes. We then present a quanti-
tative and qualitative evaluation of our method by comparing it to
previous work and finally present an ablation of the auxiliary vector
a from Sec. 3.3.
Figure 9: Qualitative relighting results for the real scenes, from left to right: CHEST OF DRAWERS, KETTLE, MIPNERF ROOM and GARAGE WALL, for a moving light source. The lighting direction is indicated on the gray ball in the lower right. Please see the supplemental video for more results. Please note how the highlights (left group) and shadows (right group) have changed.

Figure 10: Qualitative comparison on the real scene KETTLE. From left to right, from the same viewpoint: input lighting condition (view reconstructed using 3D Gaussian Splatting), target lighting, our relighting, Philip et al. [PMGD21] relighting. Top and bottom rows are two different lighting conditions. Philip et al. [PMGD21] exhibits many more geometry and shading artifacts compared to our method; in particular, imprecise MVS preprocessing results in missing geometry.
Method → | Ours | OutCast [GRP22] | R3DGS [GGL∗23] | TensoIR [JLX∗23]
Scene ↓ / Metrics | PSNR↑ LPIPS↓ SSIM↑ | PSNR↑ LPIPS↓ SSIM↑ | PSNR↑ LPIPS↓ SSIM↑ | PSNR↑ LPIPS↓ SSIM↑
Simple Bedroom | 20.57 0.156 0.868 | 17.24 0.207 0.808 | 17.79 0.174 0.830 | 15.77 0.471 0.595
Simple Kitchen | 17.45 0.154 0.855 | 17.91 0.205 0.822 | 18.55 0.197 0.807 | 20.52 0.382 0.701
Simple Livingroom | 22.12 0.136 0.884 | 21.09 0.125 0.878 | 20.34 0.166 0.857 | 17.45 0.444 0.598
Simple Office | 18.59 0.131 0.868 | 18.97 0.196 0.811 | 20.40 0.173 0.808 | 18.22 0.446 0.644
Complex Bedroom | 17.70 0.145 0.791 | 15.26 0.221 0.694 | 16.69 0.186 0.741 | 14.42 0.434 0.555
Complex Kitchen | 19.28 0.152 0.811 | 18.44 0.178 0.771 | 19.28 0.168 0.755 | 16.70 0.471 0.533
Complex Livingroom | 18.61 0.163 0.800 | 17.94 0.187 0.783 | 18.39 0.175 0.770 | 16.82 0.382 0.602
Complex Office | 20.20 0.096 0.858 | 17.22 0.169 0.781 | 18.93 0.144 0.776 | 15.78 0.468 0.529
Table 1: Quantitative results of our 3D relighting on the synthetic datasets (where ground truth is available), compared to previous work, from left to right: OutCast [GRP22] (run on individual images rendered from 3DGS [KKLD23]), Relightable 3D Gaussians [GGL∗23], and TensoIR [JLX∗23]. Arrows indicate whether higher or lower (↑/↓) is better. Results are color-coded by best, second-best, and third-best.
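For reference, one way to compute the three metrics reported in Table 1 with common Python packages (scikit-image for PSNR/SSIM, the lpips package for LPIPS [ZIE∗18]); the paper's exact evaluation protocol and LPIPS backbone may differ.

```python
import numpy as np
import torch
import lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt, lpips_net):
    """pred, gt: float RGB images in [0, 1], shape (H, W, 3). Returns (PSNR, LPIPS, SSIM)."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=1.0)
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_t = lambda x: torch.from_numpy(x).permute(2, 0, 1)[None].float() * 2.0 - 1.0
    lp = lpips_net(to_t(pred), to_t(gt)).item()
    return psnr, lp, ssim

lpips_net = lpips.LPIPS(net="vgg")   # downloads network weights on first use
pred = np.random.rand(128, 128, 3).astype(np.float32)
gt = np.clip(pred + 0.05 * np.random.randn(128, 128, 3).astype(np.float32), 0, 1)
print(evaluate(pred, gt, lpips_net))
```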
Figure 11: We show comparative results of our method on synthetic scenes where the (approximate) ground truth is available (left), and compare to previous methods. Our approach is closer to the ground truth lighting, capturing the overall appearance in a realistic manner.
We compare our method with TensoIR [JLX∗23] and Relightable Gaussians [GGL∗23]. Given that most other methods do not handle full scenes well, we also create a new baseline by first training 3DGS [KKLD23] on the input data and rendering a test path using novel view synthesis; we then use OutCast [GRP22] to relight each individual rendered frame using the target direction. We trained TensoIR [JLX∗23] using the default configuration but modified the "density_shift" parameter from −10 to −8 to achieve best results on our data. For Relightable 3D Gaussians [GGL∗23], we train their "Stage 1" for 30K iterations and "Stage 2" for an additional 10K to recover the BRDF parameters. We then relight the scenes using 360° environment maps rendered in Blender using a generic empty room and a similar camera/flash setup for ground truth. Finally, to improve the baselines we normalize the predictions of all methods: we first subtract the channel-wise mean and divide out the channel-wise standard deviation, and then multiply and add the corresponding parameters of the ground truths. These operations are performed in LAB space for all methods.

…multi-illumination dataset and fine-tuning a large pretrained generative model. Our results show that we can synthesize realistic relighting of captured scenes, while allowing interactive novel-view synthesis by building on such priors. Our method shows levels of realism for relighting that surpass the state of the art for cluttered indoor scenes (as opposed to isolated objects).
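A short sketch of this normalization step, assuming float RGB images in [0, 1] and using scikit-image for the LAB conversion; it mirrors the description above (matching channel-wise mean and standard deviation to the ground truth in LAB space) rather than the authors' exact script.

```python
import numpy as np
from skimage.color import rgb2lab, lab2rgb

def match_lab_statistics(pred_rgb, gt_rgb):
    """Normalize a prediction so its per-channel LAB mean/std match the ground truth:
    subtract the prediction's channel-wise mean, divide by its channel-wise std,
    then rescale and shift with the ground truth's std and mean."""
    pred_lab = rgb2lab(pred_rgb)
    gt_lab = rgb2lab(gt_rgb)
    mu_p, std_p = pred_lab.mean(axis=(0, 1)), pred_lab.std(axis=(0, 1)) + 1e-8
    mu_g, std_g = gt_lab.mean(axis=(0, 1)), gt_lab.std(axis=(0, 1))
    matched = (pred_lab - mu_p) / std_p * std_g + mu_g
    return np.clip(lab2rgb(matched), 0.0, 1.0)

# Toy usage with random images standing in for a relit prediction and its ground truth.
pred = np.random.rand(64, 64, 3)
gt = np.random.rand(64, 64, 3)
print(match_lab_statistics(pred, gt).shape)
```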
Figure 13: We show the results of our 2D relighting network on out-of-distribution images (StyleGAN-generated woman and MipNeRF360
BICYCLE, GARDEN, and STUMP). On human faces, ControlNet may change the expression as well as the lighting, or create excessive shininess;
on outdoor scenes, while the overall lighting direction is plausible, the network fails to generate sufficiently hard shadows.
References

[BF24] Bhattad A., Forsyth D. A.: StyLitGAN: Prompting StyleGAN to produce new illumination conditions. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2024).

[BJB∗21] Boss M., Jampani V., Braun R., Liu C., Barron J., Lensch H.: Neural-PIL: Neural pre-integrated lighting for reflectance decomposition. In Adv. Neural Inform. Process. Syst. (2021).

[BMHF23] Bhattad A., McKee D., Hoiem D., Forsyth D.: StyleGAN knows normal, depth, albedo, and more.

[BMT∗21] Barron J. T., Mildenhall B., Tancik M., Hedman P., Martin-Brualla R., Srinivasan P. P.: Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In IEEE/CVF Int. Conf. Comput. Vis. (2021).

[BMV∗22] Barron J. T., Mildenhall B., Verbin D., Srinivasan P. P., Hedman P.: Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022).

[CKK23] Choi C., Kim J., Kim Y. M.: IBL-NeRF: Image-based lighting formulation of neural radiance fields. Comput. Graph. Forum 42, 7 (2023).

[CXG∗22] Chen A., Xu Z., Geiger A., Yu J., Su H.: TensoRF: Tensorial radiance fields. In Eur. Conf. Comput. Vis. (2022).

[DHT∗00] Debevec P., Hawkins T., Tchou C., Duiker H.-P., Sarokin W., Sagar M.: Acquiring the reflectance field of a human face. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques (2000), pp. 145–156.

[FRV∗23] Futschik D., Ritland K., Vecore J., Fanello S., Orts-Escolano S., Curless B., Sýkora D., Pandey R.: Controllable light diffusion for portraits. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023).

[GAA∗23] Gal R., Arar M., Atzmon Y., Bermano A. H., Chechik G., Cohen-Or D.: Encoder-based domain tuning for fast personalization of text-to-image models. ACM Trans. Graph. 42, 4 (2023), 1–13.

[GCD∗20] Gao D., Chen G., Dong Y., Peers P., Xu K., Tong X.: Deferred neural lighting: Free-viewpoint relighting from unstructured photographs. ACM Trans. Graph. 39, 6 (Nov. 2020).

[GGL∗23] Gao J., Gu C., Lin Y., Zhu H., Cao X., Zhang L., Yao Y.: Relightable 3D Gaussian: Real-time point cloud relighting with BRDF decomposition and ray tracing. arXiv:2311.16043 (2023).

[GRP22] Griffiths D., Ritschel T., Philip J.: OutCast: Outdoor single-image relighting with cast shadows. Comput. Graph. Forum 41 (2022), 179–193.

[HHM22] Hasselgren J., Hofmann N., Munkberg J.: Shape, light, and material decomposition from images using Monte Carlo rendering and denoising. Adv. Neural Inform. Process. Syst. 35 (2022), 22856–22869.

[HJA20] Ho J., Jain A., Abbeel P.: Denoising diffusion probabilistic models. In Adv. Neural Inform. Process. Syst. (2020).

[JLX∗23] Jin H., Liu I., Xu P., Zhang X., Han S., Bi S., Zhou X., Xu Z., Su H.: TensoIR: Tensorial inverse rendering. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023).

[JTL∗24] Jiang Y., Tu J., Liu Y., Gao X., Long X., Wang W., Ma Y.: GaussianShader: 3D Gaussian splatting with shading functions for reflective surfaces, 2024.

[KE19] Kanamori Y., Endo Y.: Relighting humans: Occlusion-aware inverse rendering for full-body human images. ACM Trans. Graph. 37, 6 (2019).

[KKLD23] Kerbl B., Kopanas G., Leimkühler T., Drettakis G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42, 4 (July 2023).

[KOH∗24] Ke B., Obukhov A., Huang S., Metzger N., Daudt R. C., Schindler K.: Repurposing diffusion-based image generators for monocular depth estimation. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2024).

[LCL∗23] Liang R., Chen H., Li C., Chen F., Panneer S., Vijaykumar N.: ENVIDR: Implicit differentiable renderer with neural environment lighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023).

[LGF∗22] Li Q., Guo J., Fei Y., Li F., Guo Y.: NeuLighting: Neural lighting for free viewpoint outdoor scene relighting with unconstrained photo collections. In SIGGRAPH Asia 2022 Conference Papers (2022).

[LGZ∗20] Liu A., Ginosar S., Zhou T., Efros A. A., Snavely N.: Learning to factorize and relight a city. In Eur. Conf. Comput. Vis. (2020).

[LLLY23] Lin S., Liu B., Li J., Yang X.: Common diffusion noise schedules and sample steps are flawed. arXiv:2305.08891 (2023).

[LLZ∗20] Liu D., Long C., Zhang H., Yu H., Dong X., Xiao C.: ARShadowGAN: Shadow generative adversarial network for augmented reality in single light scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2020).

[LSB∗22] Li Z., Shi J., Bi S., Zhu R., Sunkavalli K., Hašan M., Xu Z., Ramamoorthi R., Chandraker M.: Physically-based editing of indoor scene lighting from a single image. In Eur. Conf. Comput. Vis. (2022).

[LSY∗21] Lagunas M., Sun X., Yang J., Villegas R., Zhang J., Shu Z., Masia B., Gutierrez D.: Single-image full-body human relighting. Eur. Graph. Symp. Render. (2021).

[LWC∗23] Li Z., Wang L., Cheng M., Pan C., Yang J.: Multi-view inverse rendering for large-scale real-world indoor scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023).

[LWL∗23] Liu Y., Wang P., Lin C., Long X., Wang J., Liu L., Komura T., Wang W.: NeRO: Neural geometry and BRDF reconstruction of reflective objects from multiview images. ACM Trans. Graph. (2023).

[LZF∗24] Liang Z., Zhang Q., Feng Y., Shan Y., Jia K.: GS-IR: 3D Gaussian splatting for inverse rendering. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2024).

[MBRS∗21] Martin-Brualla R., Radwan N., Sajjadi M. S. M., Barron J. T., Dosovitskiy A., Duckworth D.: NeRF in the Wild: Neural radiance fields for unconstrained photo collections. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021).

[MESK22] Müller T., Evans A., Schied C., Keller A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41, 4 (July 2022), 102:1–102:15.

[MGAD19] Murmann L., Gharbi M., Aittala M., Durand F.: A multi-illumination dataset of indoor object appearance. In IEEE/CVF Int. Conf. Comput. Vis. (2019).

[MHS∗22] Munkberg J., Hasselgren J., Shen T., Gao J., Chen W., Evans A., Müller T., Fidler S.: Extracting triangular 3D models, materials, and lighting from images. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (June 2022), pp. 8280–8290.

[MST∗20] Mildenhall B., Srinivasan P. P., Tancik M., Barron J. T., Ramamoorthi R., Ng R.: NeRF: Representing scenes as neural radiance fields for view synthesis. In Eur. Conf. Comput. Vis. (2020).

[NDDJK21] Nimier-David M., Dong Z., Jakob W., Kaplanyan A.: Material and lighting reconstruction for complex indoor scenes with texture-space differentiable rendering. Comput. Graph. Forum (2021).

[PD23] Philip J., Deschaintre V.: Floaters no more: Radiance field gradient scaling for improved near-camera training. In EGSR Conference Proceedings DL-track (2023), The Eurographics Association.

[PEL∗23] Podell D., English Z., Lacey K., Blattmann A., Dockhorn T., Müller J., Penna J., Rombach R.: SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv:2307.01952 (2023).

[PGZ∗19] Philip J., Gharbi M., Zhou T., Efros A. A., Drettakis G.: Multi-view relighting using a geometry-aware network. ACM Trans. Graph. 38, 4 (2019).

[PLMZ23] Papantoniou F. P., Lattas A., Moschoglou S., Zafeiriou S.: Relightify: Relightable 3D faces from a single image via diffusion models. In IEEE/CVF Int. Conf. Comput. Vis. (2023).

[PMGD21] Philip J., Morgenthaler S., Gharbi M., Drettakis G.: Free-viewpoint indoor neural relighting from multi-view stereo. ACM Trans. Graph. 40, 5 (2021), 1–18.

[PTS23] Ponglertnapakorn P., Tritrong N., Suwajanakorn S.: DiFaReli: Diffusion face relighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023).

[RBL∗22] Rombach R., Blattmann A., Lorenz D., Esser P., Ommer B.: High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022).

[RES∗22] Rudnev V., Elgharib M., Smith W., Liu L., Golyanik V., Theobalt C.: NeRF for outdoor scene relighting. In Eur. Conf. Comput. Vis. (2022).

[RLJ∗23] Ruiz N., Li Y., Jampani V., Pritch Y., Rubinstein M., Aberman K.: DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023), pp. 22500–22510.

[SDWMG15] Sohl-Dickstein J., Weiss E., Maheswaranathan N., Ganguli S.: Deep unsupervised learning using nonequilibrium thermodynamics.

[SDZ∗21] Srinivasan P. P., Deng B., Zhang X., Tancik M., Mildenhall B., Barron J. T.: NeRV: Neural reflectance and visibility fields for relighting and view synthesis. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021).

[SF16] Schönberger J. L., Frahm J.-M.: Structure-from-motion revisited. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2016).

[SJL∗23] Sharma P., Jampani V., Li Y., Jia X., Lagun D., Durand F., Freeman W. T., Matthews M.: Alchemist: Parametric control of material properties with diffusion models. arXiv:2312.02970 (2023).

[SKCJ18] Sengupta S., Kanazawa A., Castillo C. D., Jacobs D. W.: SfSNet: Learning shape, reflectance and illuminance of faces in the wild. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2018).

[SLZ∗22] Sheng Y., Liu Y., Zhang J., Yin W., Oztireli A. C., Zhang H., Lin Z., Shechtman E., Benes B.: Controllable shadow generation using pixel height maps. In Eur. Conf. Comput. Vis. (2022).

[SME20] Song J., Meng C., Ermon S.: Denoising diffusion implicit models. In Int. Conf. Learn. Represent. (2020).

[SWW∗23] Shi Y., Wu Y., Wu C., Liu X., Zhao C., Feng H., Liu J., Zhang L., Zhang J., Zhou B., Ding E., Wang J.: GIR: 3D Gaussian inverse rendering for relightable scene factorization. arXiv (2023).

[SYH∗17] Shu Z., Yumer E., Hadap S., Sunkavalli K., Shechtman E., Samaras D.: Neural face editing with intrinsic image disentangling. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2017).

[SZB21] Sheng Y., Zhang J., Benes B.: SSN: Soft shadow network for image compositing. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021).

[SZPF16] Schönberger J. L., Zheng E., Pollefeys M., Frahm J.-M.: Pixelwise view selection for unstructured multi-view stereo. In Eur. Conf. Comput. Vis. (2016).

[TÇE∗21] Türe M., Çıklabakkal M. E., Erdem A., Erdem E., Satılmış P., Akyüz A. O.: From noon to sunset: Interactive rendering, relighting, and recolouring of landscape photographs by modifying solar position. Comput. Graph. Forum 40 (2021), 500–515.

[TDMS∗23] Toschi M., De Matteo R., Spezialetti R., De Gregorio D., Di Stefano L., Salti S.: ReLight My NeRF: A dataset for novel view synthesis and relighting of real world objects. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (June 2023), pp. 20762–20772.

[TTM∗22] Tewari A., Thies J., Mildenhall B., Srinivasan P., Tretschk E., Yifan W., Lassner C., Sitzmann V., Martin-Brualla R., Lombardi S., et al.: Advances in neural rendering. Comput. Graph. Forum 41 (2022), 703–735.

[Ull79] Ullman S.: The interpretation of structure from motion. Proceedings of the Royal Society of London. Series B, Biological Sciences 203, 1153 (January 1979), 405–426.

[VZG∗23] Valença L., Zhang J., Gharbi M., Hold-Geoffroy Y., Lalonde J.-F.: Shadow harmonization for realistic compositing. In SIGGRAPH Asia 2023 Conference Papers (2023).

[WSG∗23] Wang Z., Shen T., Gao J., Huang S., Munkberg J., Hasselgren J., Gojcic Z., Chen W., Fidler S.: Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023).

[WSLG23] Wu T., Sun J.-M., Lai Y.-K., Gao L.: DE-NeRF: Decoupled neural radiance fields for view-consistent appearance editing and high-frequency environmental relighting. In ACM SIGGRAPH (2023).

[WZL∗08] Wang Y., Zhang L., Liu Z., Hua G., Wen Z., Zhang Z., Samaras D.: Face relighting from a single image under arbitrary unknown lighting conditions. IEEE Trans. Pattern Anal. Mach. Intell. 31, 11 (2008), 1968–1984.

[XZC∗23] Xu Y., Zoss G., Chandran P., Gross M., Bradley D., Gotardo P.: ReNeRF: Relightable neural radiance fields with nearfield lighting. In IEEE/CVF Int. Conf. Comput. Vis. (2023).

[YME∗20] Yu Y., Meka A., Elgharib M., Seidel H.-P., Theobalt C., Smith W. A.: Self-supervised outdoor scene relighting. In Eur. Conf. Comput. Vis. (2020).

[YS22] Yu Y., Smith W. A. P.: Outdoor inverse rendering from a single image using multiview self-supervision. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7 (2022), 3659–3675.

[YZL∗22] Yao Y., Zhang J., Liu J., Qu Y., Fang T., McKinnon D., Tsin Y., Quan L.: NeILF: Neural incident light field for physically-based material estimation. In Eur. Conf. Comput. Vis. (2022).

[ZCD∗23] Zeng C., Chen G., Dong Y., Peers P., Wu H., Tong X.: Relighting neural radiance fields with shadow and highlight hints. In ACM SIGGRAPH 2023 Conference Proceedings (2023).

[ZDP∗24] Zeng C., Dong Y., Peers P., Kong Y., Wu H., Tong X.: DiLightNet: Fine-grained lighting control for diffusion-based image generation. In ACM SIGGRAPH 2024 Conference Proceedings (2024).

[ZFC∗23] Zhu Z., Feng X., Chen D., Bao J., Wang L., Chen Y., Yuan L., Hua G.: Designing a better asymmetric VQGAN for StableDiffusion, 2023.

[ZHY∗23] Zhu J., Huo Y., Ye Q., Luan F., Li J., Xi D., Wang L., Tang R., Hua W., Bao H., et al.: I²-SDF: Intrinsic indoor scene reconstruction and editing via raytracing in neural SDFs. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2023).

[ZIE∗18] Zhang R., Isola P., Efros A. A., Shechtman E., Wang O.: The unreasonable effectiveness of deep features as a perceptual metric. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2018).

[ZLW∗21] Zhang K., Luan F., Wang Q., Bala K., Snavely N.: PhySG: Inverse rendering with spherical Gaussians for physics-based material editing and relighting. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2021).

[ZLZ∗22] Zhu Z.-L., Li Z., Zhang R.-X., Guo C.-L., Cheng M.-M.: Designing an illumination-aware network for deep image relighting. IEEE Trans. Image Process. 31 (2022), 5396–5411.

[ZRA23] Zhang L., Rao A., Agrawala M.: Adding conditional control to text-to-image diffusion models. In IEEE/CVF Int. Conf. Comput. Vis. (2023).

[ZSD∗21] Zhang X., Srinivasan P. P., Deng B., Debevec P., Freeman W. T., Barron J. T.: NeRFactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. 40, 6 (2021), 1–18.

[ZSH∗22] Zhang Y., Sun J., He X., Fu H., Jia R., Zhou X.: Modeling indirect illumination for inverse rendering. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022).

[ZTS∗22] Zhang X., Tseng N., Syed A., Bhasin R., Jaipuria N.: SIMBAR: Single image-based scene relighting for effective data augmentation for automated driving vision tasks. In IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2022).

[ZYL∗23] Zhang J., Yao Y., Li S., Liu J., Fang T., McKinnon D., Tsin Y., Quan L.: NeILF++: Inter-reflectable light fields for geometry and material estimation. In IEEE/CVF Int. Conf. Comput. Vis. (2023).