DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models (CVPR 2023)
Abstract
Various hand-crafted regularizers and learned priors have been proposed to tackle these issues: hand-engineered priors to constrain the scene geometry [2, 21], learned priors that force plausible renderings from arbitrary views [21], and methods that use single image depth and normal estimation [38, 46] to provide high-level constraints on the estimated scene geometry. However, there are no approaches that learn a joint probability distribution of the scene geometry and color.

Our contribution is leveraging denoising diffusion models (DDMs) as a learned prior over color and geometry. Specifically, we use an existing synthetic dataset to generate a dataset of RGBD patches to train our DDM. DDMs do not predict a probability for the RGBD patch distribution. Rather, they provide the gradient of the log-probability of the RGBD patch distribution, i.e. stepping in the negative direction of the noise predicted by the DDM is equivalent to moving towards the modes of the RGBD patch distribution. As NeRFs are trained with stochastic gradient descent, gradients of log-probabilities are sufficient, as they can be backpropagated to NeRF networks during training to act as a regularizer; probabilities are not required for this purpose. We demonstrate that the DDM gradient encourages NeRFs to fit density and color fields that are more physically plausible on the LLFF and DTU datasets.

2. Related work

Geometry modeling The geometry of the scene can be modeled as a density field [17], occupancy field [22, 23] or signed distance field [40, 43, 44]. Geometry models can be rendered using differentiable surface/volumetric rendering, so that the training loss for a NeRF model is the photometric reconstruction loss [17]. Signed distance fields also require regularization with an Eikonal loss [6] to constrain the distance field to be valid. Our regularizer operates on rendered color and depth patches, so it can be applied to any geometry representation.

Field representation NeRFs [17] represent geometry with a multi-layer perceptron that is queried with a 3D coordinate. Positional encoding of coordinates, where coordinate values are evaluated with sinusoids at different frequencies, allows modeling of high-frequency density signals with MLPs [35]. Alternatively, [7, 29] encode scalar opacity and spherical harmonic coefficients in a sparse voxel representation, and show that novel views can be synthesized without MLPs. Similarly, Neural Sparse Voxel Fields [15] stores feature encodings in a sparse voxel octree structure that can be trilinearly interpolated and passed through an MLP to predict density and color, thus improving the modeling capacity and rendering speed of NeRFs. MVSNeRF [3] predicts a volume of feature encodings by constructing a 3D cost volume and processing it with 3D CNNs. Density and color MLPs trilinearly interpolate the feature encoding volume to train NeRFs. The 3D CNN can be pretrained on a large number of scenes, which allows faster convergence on novel scenes.

Instant Neural Graphics Primitives [19] uses multi-scale hash tables to store feature encodings of all coordinates in a fixed memory block. This allows storing features at varying spatial resolutions, and consequently reduces the size of the MLP that models geometry and color. With a GPU-optimized implementation, Instant NGP can train NeRFs in minutes without quality degradation. Our contribution is in the priors used for NeRF optimization, and hence our method is agnostic to the underlying geometry representation. As Instant NGP is fast to train and render, we use it as a backbone for our experiments.

Density regularization Mip-NeRF 360 [2] proposes a density regularizer that encourages compactness of the density along conical frustums. In addition to our learned regularizer, we use the density regularizer of [2] as it helps to sharpen the distribution of densities along sampled rays.

Regularization with loss terms Loss terms to regularize NeRFs can play an important role in the final result, as they provide additional supervision to under-constrained geometry and color fields. Some regularizers are hand-crafted to encourage depth and normal smoothness, e.g. [2, 21, 23, 48]. In [11], a semantic loss is introduced to make high-level semantic attributes consistent across renderings from random views. In [27] a loss term regularizes rendered depth maps with depths estimated using Structure-from-Motion and depth completion methods. MonoSDF [46] regularizes occupancy fields with loss terms that incorporate depth and normal maps predicted with a single-image depth prediction model. Similarly, [38] introduces loss terms that use a single-image normal prediction model to regularize rendered normal maps. While all these approaches introduce high-level geometric supervision to NeRFs, the predicted depth and normals are fixed during NeRF fitting and hence the depth and normal models provide a unimodal prior over geometry. Furthermore, the additional supervision is not adapted to the NeRF reconstructions and hence the monocular depth and normal predictions are trusted blindly.

Regularization with Normalizing Flows RegNeRF [21] uses a 2D depth patch smoothness prior and a normalizing flow model as a learned prior over 2D RGB patches. The color patches are rendered while fitting the NeRF and a term proportional to the log probability density assigned to the patch by the normalizing flow model is added to the loss function.

However, the underlying cause of NeRF’s dramatic performance degradation in the few-view case is that the geometry is poor, so we argue that it is preferable to regularize the geometry directly, rather than indirectly via RGB patches. By learning a distribution over RGBD patches we also benefit from the fact that color and depth are strongly correlated, and therefore attempting to regularize them separately discards information.
Figure 2. Illustration of our method. The scene is sampled with training-view rays and rays originating from random patches. Color and density are predicted by MLPs for the 3D points sampled along the rays. Volumetric rendering is used to estimate the expected color C(r) and depth D(r), as well as the weights of color contributions {wi} and positions of samples {ti}. These estimates are used to compute gradients of losses that are backpropagated to the color and density MLPs. The DDM model ϵθ uses RGBD patches to predict color and density gradients that are passed to the MLPs directly. Instant NGP’s multi-scale hash table of feature encodings is not illustrated for simplicity.
RegNeRF [21] uses MLPs to model color and density fields, hence during NeRF training the patch rendering cost can extend NeRF training time substantially. Thus, RegNeRF renders 8 × 8 patches for the prior model, which severely limits the amount of context visible to the normalizing flow model. We use Instant NGP for our NeRF representation, which has a fast rendering time, allowing us to model priors over 48 × 48 patches.

Normalizing flows are generative models that learn to transform a simple probability distribution into a more complex data distribution [13]. The model is built of blocks that fulfil the requirements of (i) preserving the number of dimensions of input and output features; (ii) being invertible, i.e. the input to the block can be calculated from the output; and (iii) the Jacobian of each block must be tractable so that the log probability density can be computed. These constraints can lead to trade-offs in which model expressiveness is sacrificed for tractability. Diffusion models do not have such constraints on their structures and may therefore be more suitable to model data priors.

Denoising Diffusion Models DDMs [8, 20, 31] are powerful generative models that learn to estimate gradients of the log data distribution. Once trained, Langevin dynamics sampling [42] can be used to generate novel samples by performing a sequence of denoising steps starting from a random sample of a standard Gaussian distribution. Denoising Diffusion Models have successfully been used to learn and sample images [8, 34], video [9], speech [4, 14], etc. Recently, multiple DDM-based models were proposed for the task of text-to-image synthesis, e.g. DALL-E 2 [25] and Imagen [28]. Concurrently to our work, Dreamfusion [24] has incorporated Imagen into NeRF optimization to generate novel 3D assets from a text input. Unlike our work, they use DDMs to guide optimization of NeRFs to match input text, while we use DDMs to regularize NeRFs given input training images.

3. Method

We start by covering preliminaries like NeRF and DDM training. Next, we describe the relation of DDMs to the gradient of the log-likelihood of the data, and show how we incorporate DDMs as NeRF regularizers. An overview of our method is shown in Fig. 2.

3.1. NeRFs

Given a set of images of a scene I with camera intrinsic parameters and poses, we are interested in optimizing a density field σ : ℝ³ → ℝ⁺ and color field c : ℝ³ × S² → [0, 1]³, where the density field can be evaluated at any 3D coordinate (x, y, z) ∈ ℝ³ and the color field can be evaluated at any 3D coordinate and viewing direction d ∈ S².

The density and color fields can be used to synthesize views of the scene from arbitrary cameras using differentiable rendering techniques. The expected color C(r) of a ray r(t) = o + td can be estimated using discrete samples t_{0:N} (where t_{i+1} > t_i > 0), so

\mathbf{C}(\mathbf{r}) \approxeq \sum_{i=1}^N w_i \, \mathbf{c}(\mathbf{r}(t_i), \mathbf{d}) + \Big( 1 - \sum_{i=1}^N w_i \Big) \mathbf{c}_\text{bg} , \quad (1)

where the weights of color contributions are w_i = T(t_i) ρ(t_i), defined with

\rho(t_i) = 1 - \exp\big( -\sigma(\mathbf{r}(t_i)) (t_{i+1} - t_i) \big) \quad (2)
and

T(t_i) = \prod_{j=1}^{i-1} \big( 1 - \rho(t_j) \big) \quad (3)

is the accumulated transmittance function, i.e. the probability of the ray r(t) starting at camera center o and reaching coordinate r(t_i) without being absorbed. The c_bg is the background color, which we set to white.

Similarly, one can compute the expected depth as

\mathbf{D}(\mathbf{r}) = \frac{\sum_{i=1}^N w_i t_i}{\sum_{i=1}^N w_i} . \quad (4)
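To make Eqs. (1)–(4) concrete, here is a minimal sketch of the discrete quadrature along a single ray in PyTorch. It is an illustration only: the function and argument names (render_ray, sigma, rgb, t, c_bg) are ours, not taken from the paper or from torch-ngp, and the handling of the last interval length is an assumption.

```python
import torch

def render_ray(sigma, rgb, t, c_bg):
    """Discrete volume rendering along one ray, following Eqs. (1)-(4).

    sigma: (N,) densities sigma(r(t_i)) at the sample positions
    rgb:   (N, 3) colors c(r(t_i), d)
    t:     (N,) sorted sample positions t_i along the ray
    c_bg:  (3,) background color (white in the paper)
    """
    # interval lengths t_{i+1} - t_i; the last one is repeated so shapes match
    delta = t[1:] - t[:-1]
    delta = torch.cat([delta, delta[-1:]])
    rho = 1.0 - torch.exp(-sigma * delta)                          # Eq. (2)
    # accumulated transmittance T(t_i) = prod_{j<i} (1 - rho(t_j)), Eq. (3)
    T = torch.cumprod(torch.cat([torch.ones(1), 1.0 - rho[:-1]]), dim=0)
    w = T * rho                                                    # weights w_i
    color = (w[:, None] * rgb).sum(0) + (1.0 - w.sum()) * c_bg     # Eq. (1)
    depth = (w * t).sum() / w.sum().clamp_min(1e-8)                # Eq. (4)
    return color, depth, w

# toy usage: 64 random samples on a ray, white background
t = torch.sort(torch.rand(64) * 4.0 + 0.1).values
color, depth, w = render_ray(torch.rand(64) * 2.0, torch.rand(64, 3), t, torch.ones(3))
```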
The density and color fields are optimized to reduce the photometric reconstruction loss, e.g. the L2 difference between input images and renderings from the same views is

\mathcal{L}_\text{photo}(\sigma, \mathbf{c}) = \sum_{i=1}^{|\mathcal{I}|} || I_i - \mathbf{C}_i ||_2 . \quad (5)

The weights of color contributions w_i in Eq. 5 can be regularized to have a compact distribution [2]:

\mathcal{L}_\text{dist} = \frac{1}{D(\mathbf{r})} \Big( \sum_{i, j} w_i w_j \Big| \frac{t_i + t_{i+1}}{2} - \frac{t_j + t_{j+1}}{2} \Big| + \frac{1}{3} \sum_{i=1}^N w_i^2 (t_{i+1} - t_i) \Big) , \quad (6)

where we deviate from the original formulation by dividing through by the expected depth for the ray, which has the effect of increasing the strength of this regularizer for geometry that is close to the camera.

We also encourage the weights to sum to unity, because in real scenes we always expect a ray to be absorbed fully by the scene geometry:

\mathcal{L}_\text{fg} = \Big( 1 - \sum_{i=1}^N w_i \Big)^2 . \quad (7)

In the few-view case, NeRFs frequently collapse to a degenerate solution in which each camera is fully or partially “covered up” with a copy of the corresponding training image. To prevent this, we introduce a regularization approach in which the placement of density that is contained in only one view frustum is penalized as

\mathcal{L}_\text{fr} = \sum_i w_i \, \mathbf{1}(n_i \leq 1), \quad (8)

where n_i is the number of training view frustums in which the point along the ray r(t_i) is contained, so that only weights which lie in fewer than two training frustums are included in the sum. This reflects our prior that most of the scene should be within the frustum of more than one of the training views.

Combining these geometric regularizers into a loss function already gives a very strong baseline,

\mathcal{L}_\text{geom} = \mathcal{L}_\text{photo} + \lambda_\text{fg} \mathcal{L}_\text{fg} + \lambda_\text{fr} \mathcal{L}_\text{fr} + \lambda_\text{dist} \mathcal{L}_\text{dist} . \quad (9)

The λ coefficients control the contributions of the regularizers. In our experiments we refer to this combination of losses as our “geometric baseline”.
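For illustration, the per-ray regularizers of Eqs. (6)–(8) and the combined objective of Eq. (9) can be written roughly as follows. This is a sketch under our own assumptions: here t holds the N+1 interval boundaries, n_views (the per-sample frustum counts n_i) is assumed to be precomputed, and the default λ values are placeholders rather than the paper's settings.

```python
import torch

def distortion_loss(w, t, depth):
    """Depth-normalized distortion regularizer, Eq. (6).

    w: (N,) weights, t: (N+1,) sample interval boundaries, depth: scalar D(r).
    """
    mid = 0.5 * (t[:-1] + t[1:])                  # interval midpoints
    pairwise = (w[:, None] * w[None, :] * (mid[:, None] - mid[None, :]).abs()).sum()
    self_term = (w ** 2 * (t[1:] - t[:-1])).sum() / 3.0
    return (pairwise + self_term) / depth.clamp_min(1e-8)

def foreground_loss(w):
    """Encourage the weights to sum to one, Eq. (7)."""
    return (1.0 - w.sum()) ** 2

def frustum_loss(w, n_views):
    """Penalize density visible in at most one training frustum, Eq. (8)."""
    return (w * (n_views <= 1).float()).sum()

def geometric_loss(photo, w, t, depth, n_views, lam_fg=1.0, lam_fr=1.0, lam_dist=1e-4):
    """Eq. (9): photometric loss plus the three geometric regularizers."""
    return (photo
            + lam_fg * foreground_loss(w)
            + lam_fr * frustum_loss(w, n_views)
            + lam_dist * distortion_loss(w, t, depth))
```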
3.2. Score functions and DDMs

Per Bayes’ theorem, the a posteriori probability of density and color fields given training views I is

p(\sigma, \mathbf{c} \,|\, \mathcal{I}) \propto p(\mathcal{I} \,|\, \sigma, \mathbf{c}) \, p(\sigma, \mathbf{c}), \quad (10)

where we drop the normalizing constant since it depends only on I. The log-posterior is

\log(p(\mathcal{I} \,|\, \sigma, \mathbf{c})) + \log(p(\sigma, \mathbf{c})) . \quad (11)

In practice, we are interested in maximizing p(σ, c | I) with stochastic gradient descent, which only requires computation of the gradient of the log-likelihood ∇_{σ,c} log(p(I | σ, c)) and the gradient of the log-prior ∇_{σ,c} log(p(σ, c)), i.e. the score function. Notice that explicit computation of the probabilities of the density and color fields p(σ, c) is not required. Below, we describe how DDMs are learned and their relation to the score function.

The forward diffusion process progressively adds small Gaussian noise to a data sample x_0 ∼ q(x) to produce progressively noisier versions, so

\mathbf{x}_\tau = \sqrt{\alpha_\tau} \mathbf{x}_{\tau-1} + \sqrt{\beta_\tau} \epsilon_{\tau-1}, \quad (12)

where ϵ_{τ−1} ∼ N(0, I) and α_τ = 1 − β_τ, i.e. the variances {β_τ}_{τ=1}^T control the noise schedule. As the noise function is Gaussian, it follows from the reparameterization trick that

q(\mathbf{x}_\tau \,|\, \mathbf{x}_0) = \mathcal{N}\big( \mathbf{x}_\tau ; \sqrt{\bar{\alpha}_\tau} \, \mathbf{x}_0, (1 - \bar{\alpha}_\tau) \mathbf{I} \big), \quad (13)

where ᾱ_τ = \prod_{s=0}^{\tau} α_s, allowing efficient generation of noised samples for arbitrary τ. As T → ∞ the distribution of noised samples x_T is equivalent to an isotropic unit Gaussian.

The DDM [8, 20, 31] is tasked to learn the reverse diffusion process:

p(\mathbf{x}_{\tau-1} \,|\, \mathbf{x}_\tau) = \mathcal{N}\big( \mathbf{x}_{\tau-1} ; \mathbf{\mu}(\mathbf{x}_\tau, \tau), \tilde{\beta}_\tau \mathbf{I} \big), \quad (14)

where β̃_τ = (1 − ᾱ_{τ−1}) β_τ / (1 − ᾱ_τ).

Since x_τ is available as input to µ(x_τ, τ), the mean µ(x_τ, τ) can be computed by predicting noise ϵ_{τ−1} from the noised input [8]:

\mathbf{\mu}(\mathbf{x}_\tau, \tau) = \frac{1}{\sqrt{\alpha_\tau}} \Big( \mathbf{x}_\tau - \frac{\beta_\tau}{\sqrt{1 - \bar{\alpha}_\tau}} \, \epsilon_\theta(\mathbf{x}_\tau, \tau) \Big), \quad (15)
using a neural network ϵ_θ(x_τ, τ).

Thus, one can learn the reverse diffusion process by training a neural network ϵ_θ(x_τ, τ) to estimate noise given a noised input and noise-level using the loss function:

\mathbb{E}_{\mathbf{x}_0, \epsilon} \left[ \frac{\beta_\tau}{2 \alpha_\tau (1 - \bar{\alpha}_\tau)} \, || \epsilon - \epsilon_\theta\big( \sqrt{\bar{\alpha}_\tau} \, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_\tau} \, \epsilon, \tau \big) || \right] , \quad (16)

where ϵ ∼ N(0, I). Fig. 3 (a) illustrates the forward and backwards processes.

Figure 3. (a) Illustration of forward and reverse diffusion processes. (b) Example RGBD patches in the training set of the DDM model extracted from the Hypersim dataset. (c) Example RGBD patches generated with our DDM model trained on the Hypersim dataset. Depths are shown as normalized inverse depths for visualization purposes. The noise in the samples is due to noise that is injected during the sampling process.

Importantly, it was shown in [8, 37] that a DDM noise estimator has a connection to score matching [10, 32, 33] and is proportional to the score function:

\epsilon_\theta(\mathbf{x}_\tau, \tau) \propto -\nabla_\mathbf{x} \log p(\mathbf{x}) . \quad (17)

Hence, taking steps in the negative direction to the noise predicted by the model is equivalent to moving towards the modes of the data distribution. This can be used to generate samples from the data distribution using Langevin dynamics [8, 32, 42].
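As a concrete illustration, one DDM training step on a batch of RGBD patches could look as follows, combining the closed-form noising of Eq. (13) with the noise-prediction objective of Eq. (16) (here with the per-step weighting dropped, as in the simplified objective of [8]). eps_model stands for any network with the ϵ_θ(x_τ, τ) interface and is an assumption, not the exact architecture used in the paper.

```python
import torch

def ddm_training_step(eps_model, x0, alpha_bar, optimizer):
    """One denoising diffusion training step on a batch of RGBD patches.

    x0:        (B, 4, 48, 48) clean RGBD patches
    alpha_bar: (T,) cumulative products of alpha_tau (the noise schedule)
    """
    B, T = x0.shape[0], alpha_bar.shape[0]
    tau = torch.randint(0, T, (B,), device=x0.device)    # random timestep per patch
    a_bar = alpha_bar[tau].view(B, 1, 1, 1)
    eps = torch.randn_like(x0)                           # eps ~ N(0, I)
    # Eq. (13): sample x_tau directly from x_0
    x_tau = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps
    loss = (eps - eps_model(x_tau, tau)).pow(2).mean()   # simplified Eq. (16)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```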
In this work, we want to use a DDM model as a score function estimator to regularize NeRF reconstructions according to Eq. 11. Hence, we model a prior over (σ, c) by modeling the score function over the distribution of RGBD patches ϵ_θ({C(r), D(r) | r ∈ P}), where P is a set of rays that pass through a random 48 × 48 patch of pixels cast from a random camera. To allow control of the magnitude of the gradients, we further normalize the output of ϵ_θ({C(r), D(r) | r ∈ P}), and refer to this regularization function as ϵ_θ (see supplementary for details).

To train our DDM we use Hypersim [26], a photorealistic synthetic dataset for indoor scene understanding with ground truth images and depth maps. Specifically, we sample 48 × 48 patches of images and depth maps to generate training data for the DDM (removing problematic images and scenes as per the dataset instructions); see Fig. 3(b) for examples. Fig. 3(c) shows samples of RGBD patches generated by our DDM model. The quality of the samples indicates that the DDM successfully learns the data distribution of the RGBD Hypersim patches.
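A hedged sketch of how such 48 × 48 RGBD training patches might be cut from one image/depth pair is given below; the channel layout and the absence of any depth normalization are our own simplifications, not the paper's exact preprocessing.

```python
import numpy as np

def sample_rgbd_patches(rgb, depth, n_patches=16, size=48, rng=None):
    """Cut random size x size RGBD patches from one Hypersim-style frame.

    rgb:   (H, W, 3) float array in [0, 1]
    depth: (H, W) float array of depths
    Returns an (n_patches, size, size, 4) array with depth as the 4th channel.
    """
    rng = rng or np.random.default_rng()
    H, W = depth.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, H - size + 1)
        x = rng.integers(0, W - size + 1)
        rgbd = np.concatenate([rgb[y:y + size, x:x + size],
                               depth[y:y + size, x:x + size, None]], axis=-1)
        patches.append(rgbd)
    return np.stack(patches)
```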
3.3. Regularizing NeRFs with DDMs

The gradient of the log-posterior (11), which forms our loss function, is

\nabla \log p(\sigma, \mathbf{c} \,|\, \mathcal{I}) = \nabla \log p(\sigma, \mathbf{c}) + \nabla \log p(\mathcal{I} \,|\, \sigma, \mathbf{c}) . \quad (18)

By plugging (17) into the above, we can use a diffusion model as a prior over (σ, c). For the second term on the RHS we use the loss in Eq. 9, resulting in the following gradient for our loss function:

\nabla \mathcal{L} = \nabla \mathcal{L}_\text{photo} + \lambda_\text{fg} \nabla \mathcal{L}_\text{fg} + \lambda_\text{fr} \nabla \mathcal{L}_\text{fr} + \lambda_\text{dist} \nabla \mathcal{L}_\text{dist} - \lambda_\text{DDM} \, \epsilon_\theta , \quad (19)

where λ_DDM controls the weight of our regularizer.

During NeRF optimization we compute the gradient of the loss as per Eq. 19 and backpropagate as usual to obtain gradients for the NeRF density and color field parameters.
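To show how Eq. (19) can be realized in practice, here is a rough PyTorch sketch of one NeRF optimization step: the geometric losses are backpropagated normally, and the DDM prior enters as an extra gradient pushed directly into the rendered RGBD patch. The sign convention, the normalization of the DDM output and all names are our own reading of the description above ("stepping in the negative direction of the predicted noise moves towards the modes"), not code from the paper.

```python
import torch

def nerf_step_with_ddm_prior(geom_loss, rgbd_patch, eps_model, tau, lam_ddm, optimizer):
    """One optimization step applying the combined gradient of Eq. (19).

    geom_loss:  scalar L_geom (Eq. 9) built from rendered training rays
    rgbd_patch: (4, 48, 48) differentiable RGBD patch rendered from a random camera
    """
    optimizer.zero_grad()
    geom_loss.backward(retain_graph=True)          # gradients of the geometric terms
    with torch.no_grad():
        eps = eps_model(rgbd_patch.detach()[None], tau)[0]
        eps = eps / eps.norm().clamp_min(1e-8)     # normalized regularization function
    # Inject lam_ddm * eps as d(loss)/d(patch): the descent step then nudges the
    # patch along -eps_theta, i.e. towards the modes of the RGBD prior (Eq. 17).
    rgbd_patch.backward(gradient=lam_ddm * eps)
    optimizer.step()
```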
3.4. Implementation Details

We use the training protocol of [8, 39] to train our DDM model. We optimize the DDM for 650,000 steps with batch size 32 on 1 GPU.

We use the torch-ngp [36] implementation of Instant NGP [19] with the tiny-cuda-nn [18] back-end as the NeRF model for our experiments. NeRFs are optimized for 12,000 steps, where the first 2500 steps are optimized with λ_dist = 0 and the diffusion time parameter τ smoothly interpolates from 0.1 to 0; hence we set ᾱ_τ = cos(0.5π(τ + 0.008)/1.008) and other variables are derived accordingly. By scheduling τ this way the diffusion model is conditioned to expect progressively less noisy inputs as the NeRF trains and generates increasingly more accurate colors and depths. After 3000 steps, λ_dist linearly increases from 0 until it reaches its maximum value at 8000 steps, where the maximum value is 1 × 10⁻⁴ for the DTU dataset and 1.5 × 10⁻⁵ for the LLFF dataset. We empirically found that this schedule of τ and regularization weights produces the best results. On a single Nvidia A100 GPU our NeRF model trains in approximately 30 minutes per scene.

Furthermore, 25% of the time we use a training pose for patch rendering, and sample the RGB component of the RGBD patch directly from the training image. This is helpful in the early stages, when NeRF renderings are not yet accurate.
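Written out, the scheduling described above amounts to two small functions. The paper states the endpoints (τ from 0.1 towards 0, λ_dist ramping from step 3000 to 8000) but not the exact interpolation, so the linear ramps and the function names below are assumptions.

```python
import math

def tau_schedule(step, total_steps=12000, tau_start=0.1):
    """Diffusion time parameter and the corresponding alpha_bar.

    A linear decay of tau over training is assumed here; the paper only says
    the interpolation from 0.1 to 0 is 'smooth'.
    """
    tau = max(0.0, tau_start * (1.0 - step / total_steps))
    alpha_bar = math.cos(0.5 * math.pi * (tau + 0.008) / 1.008)
    return tau, alpha_bar

def lambda_dist_schedule(step, lam_max, ramp_start=3000, ramp_end=8000):
    """lambda_dist: 0 before step 3000, then a linear ramp to lam_max by step 8000."""
    if step < ramp_start:
        return 0.0
    if step >= ramp_end:
        return lam_max
    return lam_max * (step - ramp_start) / (ramp_end - ramp_start)

# lam_max is 1e-4 for DTU and 1.5e-5 for LLFF in our experiments.
```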
Method | Setting | PSNR ↑ (3 / 6 / 9-view) | SSIM ↑ (3 / 6 / 9-view) | LPIPS ↓ (3 / 6 / 9-view) | Average ↓ (3 / 6 / 9-view)

LLFF:
mip-NeRF [1] | Optimized per Scene | 14.62 20.87 24.26 | 0.351 0.692 0.805 | 0.495 0.255 0.172 | 0.246 0.114 0.073
DietNeRF [11] | Optimized per Scene | 14.94 21.75 24.28 | 0.370 0.717 0.801 | 0.496 0.248 0.183 | 0.240 0.105 0.073
PixelNeRF ft [45] | DTU + ft per Scene | 16.17 17.03 18.92 | 0.438 0.473 0.535 | 0.512 0.477 0.430 | 0.217 0.196 0.163
MVSNeRF ft [3] | DTU + ft per Scene | 17.88 19.99 20.47 | 0.584 0.660 0.695 | 0.327 0.264 0.244 | 0.157 0.122 0.111
RegNeRF [21] | Optimized per Scene | 19.08 21.10 24.86 | 0.587 0.760 0.820 | 0.336 0.206 0.161 | 0.146 0.086 0.067
Geometric Baseline | Optimized per Scene | 19.88 24.28 25.10 | 0.590 0.765 0.802 | 0.192 0.101 0.084 | 0.118 0.071 0.060
DiffusioNeRF (Ours) | Optimized per Scene | 19.79 23.79 25.02 | 0.568 0.747 0.785 | 0.209 0.114 0.096 | 0.127 0.075 0.064

DTU:
mip-NeRF [1] | Optimized per Scene | 8.68 16.54 23.58 | 0.571 0.741 0.879 | 0.353 0.198 0.092 | 0.323 0.148 0.056
DietNeRF [11] | Optimized per Scene | 11.85 20.63 23.83 | 0.633 0.778 0.823 | 0.314 0.201 0.173 | 0.243 0.101 0.068
PixelNeRF ft [45] | DTU + ft per Scene | 18.95 20.56 21.83 | 0.710 0.753 0.781 | 0.269 0.223 0.203 | 0.125 0.104 0.090
MVSNeRF ft [3] | DTU + ft per Scene | 18.54 20.49 22.22 | 0.769 0.822 0.853 | 0.197 0.155 0.135 | 0.113 0.089 0.069
RegNeRF [21] | Optimized per Scene | 18.89 22.20 24.93 | 0.745 0.841 0.884 | 0.190 0.117 0.089 | 0.112 0.071 0.047
Geometric Baseline | Optimized per Scene | 13.60 16.43 22.01 | 0.661 0.759 0.853 | 0.212 0.147 0.071 | 0.185 0.092 0.056
DiffusioNeRF (Ours) | Optimized per Scene | 16.20 20.34 25.18 | 0.698 0.818 0.883 | 0.160 0.093 0.046 | 0.135 0.052 0.033

Table 1. DiffusioNeRF vs. SOTA in the novel view synthesis task on the LLFF and DTU datasets with few input views [21, 45]. We report scores on PSNR, SSIM, LPIPS and Average metrics averaged over all 8 scenes when NeRFs are fitted with 3, 6 and 9 training views. For each view/metric combination the first and second scores are highlighted.
4. Experiments

Datasets We experiment on two datasets: LLFF and DTU. The LLFF [16] dataset has 8 scenes with 20-62 images per scene captured with a handheld camera. The scenes are reconstructed with COLMAP [30] to estimate camera intrinsics, camera poses and the 3D bounds of the scenes. A few images are used for training and the test images are used to evaluate novel view synthesis quality. We select LLFF for evaluations as it allows comparison against other SOTA NeRF models, such as RegNeRF [21].

The DTU [12] dataset consists of images of objects placed on a table against a black background. Images and depth maps are captured with a structured light scanner mounted on an industrial robot arm. The dataset provides images, poses, and ground truth point clouds for evaluation. For novel-view synthesis in the few-view setting on DTU, we use the test set of 15 scans of PixelNeRF [45], allowing comparison against other methods.

We use the test set of 15 scans defined in [23, 43, 46] to evaluate geometry quality, e.g. via the surface method of evaluation as described in UNISURF [23]. Traditionally, geometry estimated by the density field of a NeRF may not allow accurate surface reconstruction compared to occupancy and SDF-based approaches [23], which score higher on DTU, e.g. [23, 43, 44, 46].

Metrics For the task of novel-view synthesis, hold-out views of the scene are used as ground truth to compare against synthesized views. Image similarity metrics such as PSNR, SSIM [41] and LPIPS [47] are measured for each test view and the average score per scene is reported. We also report an “Average” score, specifically the geometric mean of the three metrics as per [1]: \sqrt[3]{10^{-\text{PSNR}/10} \cdot \sqrt{1 - \text{SSIM}} \cdot \text{LPIPS}}.

For the geometry estimation task, we convert an isosurface of the density field into a mesh using marching cubes. The mesh is culled to retain only parts that are visible in at least one training view and the background surfaces are masked out. We then sample the mesh to generate a point cloud, and report the average chamfer L1 distance between the estimated and ground truth point clouds.
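As a small worked example, the “Average” score defined above reduces to a one-line function (note that the paper averages this per scene, so plugging in already-averaged PSNR/SSIM/LPIPS values from Table 1 will not exactly reproduce its Average column):

```python
def average_metric(psnr, ssim, lpips):
    """Geometric mean of 10^(-PSNR/10), sqrt(1 - SSIM) and LPIPS, as per [1]."""
    return (10.0 ** (-psnr / 10.0) * (1.0 - ssim) ** 0.5 * lpips) ** (1.0 / 3.0)

print(round(average_metric(20.0, 0.75, 0.15), 3))  # hypothetical metric values
```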
4.1. Evaluations

Table 1 shows a comparison of our geometric baseline and our model against SOTA methods on the LLFF and DTU datasets when trained with 3, 6 and 9 views. When the number of views is low, the regularizer can have a large impact on the final result, which allows easier comparison of regularizers. As seen from Table 1, the geometric baseline and our method both compare favorably to other methods, achieving the best scores in the PSNR, LPIPS and Average metrics. Our geometric baseline has higher metrics on LLFF, however there are artifacts in the generated test views that can be seen in Fig. 4. Our diffusion model-based method generates more plausible depths compared to the geometric baseline, see Section 4.2. One side-effect is over-smoothing of thin structures (e.g. the top row in Fig. 4). It is also noteworthy that test views contain parts of the scene that are not visible in any of the training views. These occluded parts of the scene can impact reconstruction scores significantly (see supplementary for details).

Table 2 shows an evaluation of reconstruction quality on 15 scans of the DTU dataset when NeRFs are fitted with all views.
Figure 4. Qualitative results for the task of novel view synthesis on the LLFF dataset (columns: Ground Truth, RegNeRF, Geometric Baseline, DiffusioNeRF (Ours)). NeRF models are trained with 3 views and rendered from one of the test views. Our DDM model encourages more realistic geometry as seen in the depth maps.
SDF-based Methods | Mean Chamfer-L1 ↓ || NeRF-based Methods | Mean Chamfer-L1 ↓
UNISURF [23] | 1.02 || Instant NGP [19] | 1.71
NeuS [40] | 0.84 || NeRF [17] | 1.49
VolSDF [43] | 0.86 || Geometric Baseline | 1.36
MonoSDF [46] | 0.73 || DiffusioNeRF | 1.21

Table 2. DiffusioNeRF vs. SOTA in geometry reconstruction on the DTU dataset with all views [5].

In the large number of views regime, the priors are less important as training views provide more information about the scene. Nevertheless, the priors should not introduce any undesirable artifacts and can help with ambiguous regions such as a textureless table. Despite the DDM being trained on images of indoor room-sized scenes, it shows good generalization to the object-centric reconstruction task. Our density-based method performs adequately when compared to occupancy and SDF-based methods.

In Fig. 5 the qualitative results indicate that density-based methods struggle with shiny objects (rows 2 and 4) but can have higher fidelity geometry on diffuse and textured surfaces (rows 1 and 3). The textured regions alone are not sufficient for high quality output, e.g. our geometric baseline struggles to complete the geometry of a house in row 1, and our DDM model provides a complementary signal to the geometric regularizers, resulting in fewer holes and smoother surfaces.

4.2. Ablation studies

In Table 3 we show contributions of each of our optimization terms evaluated on the LLFF and DTU datasets for novel view synthesis and reconstruction quality. As reported, the geometric baseline scores favorably on the LLFF dataset, but has issues in geometry as reflected in the DTU scores. Qualitative results in Fig. 4 demonstrate that the geometry estimated by the geometric baseline is not realistic, even if the appearance scores are high. Our DDM-based approach improves on DTU scores, but its performance on the novel view synthesis metrics is hampered by its tendency to introduce details in areas of the scene that are not pictured in any training view.

In Table 3 we also show ablations of some of the finer details of our model. This table suggests that a model trained on 24 × 24 patches outperforms a model trained on 48 × 48 patches on LLFF, but underperforms on DTU.

The ablations show the significance of feeding patches from input images to the DDM 25% of the time during NeRF fitting. It can be especially important early on, when rendered patches are very different from input images.

Unsurprisingly, reducing the amount of training data for the DDM (only using 20% of the Hypersim scenes) slightly reduces the scores. The RGB-only regularization with DDMs is similar to RegNeRF’s normalizing flow model regularization, but with larger patch sizes. Interestingly, the RGBD regularizer trained with 20% of the data is still better than the RGB-only regularizer that was trained with 100% of the data. The last two rows of the ablation show that careful scheduling of τ and DDM gradient weights is necessary to produce good results. This is an active area of research, having previously been noted in [24]. The DDM weight λDDM trades off the accuracy of reconstruction around thin structures against the overall depth smoothness.
Figure 5. Qualitative comparison of our method against SOTA on geometry reconstruction evaluated on the DTU dataset (scans 24, 69, 83 and 110; columns: RGB, NeuS [40], VolSDF [43], MonoSDF [46], Geometric Baseline, Ours).
Method | LLFF Average ↓ (3-view / 6-view / 9-view) | DTU Average ↓ (3-view / 6-view / 9-view) | DTU Chamfer-L1 ↓ (all views)
∇L = ∇Lphoto 0.210 0.128 0.090 0.203 0.142 0.119 2.87
∇L = ∇Lphoto + λfg ∇Lfg 0.210 0.128 0.090 0.195 0.126 0.092 1.71
∇L = ∇Lphoto + λfg ∇Lfg + λfr ∇Lfr 0.135 0.089 0.072 0.215 0.128 0.093 1.71
∇L = ∇Lphoto + λfg ∇Lfg + λfr ∇Lfr − λDDM ϵθ 0.145 0.085 0.066 0.190 0.097 0.072 1.67
∇L = ∇Lphoto + λfg ∇Lfg + λfr ∇Lfr + λdist ∇Ldist 0.118 0.071 0.060 0.185 0.092 0.056 1.36
∇L = ∇Lphoto + λfg ∇Lfg + λfr ∇Lfr + λdist ∇Ldist − λDDM ϵθ 0.127 0.075 0.064 0.135 0.052 0.033 1.21
DDM regularizer using 24x24 patches 0.126 0.074 0.061 0.195 0.068 0.043 1.22
24x24 patch DDM & NeRF fitted with 4 × λDDM 0.129 0.074 0.062 0.260 0.080 0.050 1.22
Patches from input images are not given to DDM 0.139 0.078 0.066 0.159 0.063 0.049 1.91
DDM trained with 20% of Hypersim scenes 0.132 0.078 0.066 0.163 0.057 0.035 1.65
RGB-only DDM regularizer 0.134 0.083 0.070 0.189 0.081 0.058 1.31
τ = 0 (no schedule) during NeRF fitting 0.137 0.081 0.067 0.152 0.055 0.042 1.31
NeRF fitted with 4 × λDDM 0.146 0.088 0.076 0.220 0.134 0.071 2.56
Table 3. Ablation study of our method. Note that for DTU, λfr is set to 0, hence the 2nd and 3rd rows have identical scores on DTU.
Geometric baseline corresponds to the model in the 5th row.
In this paper we address the problem of regularization of NeRFs. Our approach uses a DDM trained on RGBD patches to approximate a score function, i.e. the gradient of the logarithm of an RGBD patch distribution. Experimentally, we demonstrate that the proposed regularization scheme improves performance on novel view synthesis and 3D reconstruction.

While we show regularization using color and depth patches as input, the proposed framework is versatile and can be used to regularize the 3D voxel grid of densities, density weights sampled along the ray, etc. Indeed, instead of generating RGBD patches, we can generate 3D voxel blocks of densities to train a DDM and use it during NeRF optimization to regularize the density field directly.

One avenue of future work is formulating a principled combination of the DDM gradient with the NeRF objective to avoid heuristics-based τ and gradient scheduling.

Our work is focused on NeRF optimization; however, the general approach of using DDMs as a regularizer could potentially be used for other tasks that are optimized with gradient descent, e.g. self-supervised monocular depth estimation [5], or self-supervised stereo matching [49, 50].

Acknowledgements We thank Niantic colleagues, especially Gabriel Brostow, for discussions and suggestions. We are also grateful for Jiaxiang Tang’s Pytorch implementation of Instant-NGP [36], Phil Wang’s implementation of DDM [39], and to Thomas Müller for tiny-cuda-nn [18].
References

[1] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. In ICCV, 2021. 6
[2] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. In CVPR, 2022. 2, 4
[3] Anpei Chen, Zexiang Xu, Fuqiang Zhao, Xiaoshuai Zhang, Fanbo Xiang, Jingyi Yu, and Hao Su. MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo. In ICCV, 2021. 2, 6
[4] Nanxin Chen, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, and William Chan. WaveGrad: Estimating Gradients for Waveform Generation. In ICLR, 2020. 3
[5] Clément Godard, Oisin Mac Aodha, and Gabriel J. Brostow. Unsupervised Monocular Depth Estimation with Left-Right Consistency. In CVPR, 2017. 7, 8
[6] Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. Implicit Geometric Regularization for Learning Shapes. In ICML, 2020. 2
[7] Peter Hedman, Pratul P. Srinivasan, Ben Mildenhall, Jonathan T. Barron, and Paul Debevec. Baking Neural Radiance Fields for Real-Time View Synthesis. ICCV, 2021. 2
[8] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. In NeurIPS, 2020. 3, 4, 5
[9] Jonathan Ho, Tim Salimans, Alexey A Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video Diffusion Models. In ICLR Workshop on Deep Generative Models for Highly Structured Data, 2022. 3
[10] Aapo Hyvärinen. Estimation of Non-Normalized Statistical Models by Score Matching. Journal of Machine Learning Research, 2005. 5
[11] Ajay Jain, Matthew Tancik, and Pieter Abbeel. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis. In ICCV, 2021. 2, 6
[12] Rasmus Jensen, Anders Dahl, George Vogiatzis, Engil Tola, and Henrik Aanæs. Large Scale Multi-view Stereopsis Evaluation. In CVPR, 2014. 6
[13] Ivan Kobyzev, Simon JD Prince, and Marcus A Brubaker. Normalizing Flows: An Introduction and Review of Current Methods. IEEE TPAMI, 2020. 3
[14] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. DiffWave: A Versatile Diffusion Model for Audio Synthesis. In ICLR, 2020. 3
[15] Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, and Christian Theobalt. Neural Sparse Voxel Fields. In NeurIPS, 2020. 2
[16] Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines. ACM TOG, 2019. 1, 6
[17] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV, 2020. 1, 2, 7
[18] Thomas Müller. Tiny CUDA Neural Network Framework, 2021. github.com/nvlabs/tiny-cuda-nn. 5, 8
[19] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM TOG, 2022. 2, 5, 7
[20] Alexander Quinn Nichol and Prafulla Dhariwal. Improved Denoising Diffusion Probabilistic Models. In ICML, 2021. 3, 4
[21] Michael Niemeyer, Jonathan T. Barron, Ben Mildenhall, Mehdi S. M. Sajjadi, Andreas Geiger, and Noha Radwan. RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs. In CVPR, 2022. 1, 2, 3, 6
[22] Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. In CVPR, 2020. 2
[23] Michael Oechsle, Songyou Peng, and Andreas Geiger. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. In ICCV, 2021. 2, 6, 7
[24] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. DreamFusion: Text-to-3D using 2D Diffusion. In ICLR, 2023. 3, 7
[25] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical Text-conditional Image Generation with CLIP Latents. arXiv preprint arXiv:2204.06125, 2022. 3
[26] Mike Roberts, Jason Ramapuram, Anurag Ranjan, Atulit Kumar, Miguel Angel Bautista, Nathan Paczan, Russ Webb, and Joshua M. Susskind. Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding. In ICCV, 2021. 5
[27] Barbara Roessle, Jonathan T. Barron, Ben Mildenhall, Pratul P. Srinivasan, and Matthias Nießner. Dense Depth Priors for Neural Radiance Fields from Sparse Input Views. In CVPR, 2022. 2
[28] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In NeurIPS, 2022. 3
[29] Sara Fridovich-Keil and Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance Fields without Neural Networks. In CVPR, 2022. 2
[30] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-Motion Revisited. In CVPR, 2016. 6
[31] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In ICML, 2015. 3, 4
[32] Yang Song and Stefano Ermon. Generative Modeling by Estimating Gradients of the Data Distribution. In NeurIPS, 2019. 5
[33] Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced Score Matching: A Scalable Approach to Density and Score Estimation. In Uncertainty in Artificial Intelligence, 2020. 5
[34] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR, 2021. 3
[35] Matthew Tancik, Pratul P. Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan T. Barron, and Ren Ng. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. In NeurIPS, 2020. 2
[36] Jiaxiang Tang. Torch-NGP: a PyTorch implementation of Instant-NGP, 2022. github.com/ashawkey/torch-ngp. 5, 8
[37] Pascal Vincent. A Connection Between Score Matching and Denoising Autoencoders. Neural Computation, 2011. 5
[38] Jiepeng Wang, Peng Wang, Xiaoxiao Long, Christian Theobalt, Taku Komura, Lingjie Liu, and Wenping Wang. NeuRIS: Neural Reconstruction of Indoor Scenes Using Normal Priors. In ECCV, 2022. 2
[39] Phil Wang. Denoising Diffusion Probabilistic Model in Pytorch, 2022. github.com/lucidrains/denoising-diffusion-pytorch. 5, 8
[40] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction. In NeurIPS, 2021. 2, 7, 8
[41] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE TIP, 2004. 6
[42] Max Welling and Yee W Teh. Bayesian Learning via Stochastic Gradient Langevin Dynamics. In ICML, 2011. 3, 5
[43] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume Rendering of Neural Implicit Surfaces. In NeurIPS, 2021. 2, 6, 7, 8
[44] Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Basri Ronen, and Yaron Lipman. Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance. In NeurIPS, 2020. 2, 6
[45] Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelNeRF: Neural Radiance Fields from One or Few Images. In CVPR, 2021. 6
[46] Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction. In NeurIPS, 2022. 2, 6, 7, 8
[47] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In CVPR, 2018. 6
[48] Xiuming Zhang, Pratul P Srinivasan, Boyang Deng, Paul Debevec, William T Freeman, and Jonathan T Barron. NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination. ACM TOG, 2021. 2
[49] Yiran Zhong, Yuchao Dai, and Hongdong Li. Self-Supervised Learning for Stereo Matching with Self-Improving Ability. arXiv preprint arXiv:1709.00930, 2017. 8
[50] Chao Zhou, Hong Zhang, Xiaoyong Shen, and Jiaya Jia. Unsupervised Learning of Stereo Matching. In ICCV, 2017. 8