Multi-Scale 3D Gaussian Splatting For Anti-Aliased Rendering
Figure 1. The rendering quality and speed of the original 3D Gaussian splatting [12] deteriorate severely at low resolutions or from distant cameras due to aliasing. Conversely, our multi-scale 3D Gaussian representation utilizes selective rendering to achieve faster (160%-2400% at 128× resolution) and more accurate rendering at lower resolutions.
with an interval of one-pixel size. The signal can be considered as the 3D scene, represented implicitly as in NeRF or explicitly as in 3D Gaussians. When part of the 3D scene is represented with high details but rendered at low resolution or from distant positions, the disparity between the low sampling frequency and the high signal frequency culminates in aliasing artifacts. A naive solution is to render at high resolution and subsequently downscale the rendered image to a lower resolution. However, this solution is not viable for scenes containing both near and far regions, which are very common. Since the 3D Gaussian splatting algorithm cannot accommodate varying resolutions within a single image, rendering the entire image at an even higher resolution for the sake of faraway regions is neither time nor memory efficient.
We postulate that the pronounced aliasing artifacts observed when rendering with 3D Gaussians, as opposed to other techniques such as NeRF, are primarily attributable to the splatting of small Gaussians. 3D regions with intricate details are represented with a large number of small Gaussians. When rendering these regions at low resolution or from a distant view, many splatted small Gaussians are crammed into one pixel, and the pixel color of this region is therefore dominated by the front-most Gaussian, even if this Gaussian is much smaller than the others and not at the center. This problem is further aggravated by the low pass filter in [12, 19], which is applied to each individual Gaussian with the intention of mitigating aliasing on edges at high resolutions. This problem is explained in more detail in Sec. 3.2.

In addition to the aliasing artifacts, the rendering speed of 3D Gaussians also suffers at low resolution. The number of 3D Gaussians that need to be rendered remains constant at lower resolutions, but they are concentrated in fewer pixels. Gaussians that are splatted to the same pixel cannot be rendered in parallel, so image rendering is even slower at lower resolutions, in contrast to NeRF, whose rendering time decreases linearly with decreasing resolution. Hence, although aliasing is not a problem exclusive to 3D Gaussian splatting, it is more prominent in this representation and more difficult to tackle.
Contributions. To mitigate the aliasing problem for 3D Gaussian splatting, we propose novel multi-scale 3D Gaussians that represent the scene at different levels of detail (LOD), as shown in Fig. 2. This is inspired by the mipmap and LOD algorithms widely used in computer graphics, which pre-compute textures and polygons at different scales to be rendered at different resolutions and distances. Similarly, we add larger, coarser Gaussians for lower resolutions by aggregating the smaller, finer Gaussians from higher resolutions. Depending on the pixel coverage of the splatted Gaussians during rendering, only a subset of the Gaussians is used. A simplified explanation for this is that the coarse Gaussians are used to render low-resolution images and the fine Gaussians are used to render high-resolution images. With fewer than 5% additional Gaussians and a similar training time, our method achieves 13%-66% PSNR and 160%-2400% rendering speed improvements at 4×-128× scale rendering on the Mip-NeRF 360 dataset [3], while maintaining comparable rendering quality and speed at 1× scale.
Figure 2. Overall pipeline of our algorithm. At the early stage of training (left), small Gaussians below a certain size threshold in each voxel are aggregated, enlarged, and inserted into the scene at different resolution scales. During rendering (right), the multi-scale Gaussians of the appropriate "pixel coverage" at the current render resolution are selected for rendering. If the rendering resolution scale equals the scale of the Gaussians, the expected "pixel coverage" range of the Gaussians is updated accordingly. (Panel labels: "Aggregate and Insert", "Selective Rendering", "Render at a Given Resolution", "Calculate Pixel Coverage S_k", "Render all Gaussians of Appropriate Size", "Filter if too Small", "Scale 1x", "Scale 2x".)
2. Related Works

2.1. Anti-Aliasing in Computer Graphics

Aliasing is a long-standing problem in computer graphics when rendering a scene to a discrete image. Traditional anti-aliasing techniques primarily target mesh representations. Supersampling Anti-Aliasing (SSAA) [5] renders the scene at a higher resolution before downscaling, which demands significantly more time and memory and is therefore rarely used in real-time applications. The Multisample Anti-Aliasing (MSAA) [1, 5] algorithm selectively supersamples pixels on the edges, reducing resource and time consumption. This technique is not well suited to 3D Gaussian splatting because of its requirement for regular grids and its lack of support for variable sampling resolution at different pixels. The more recent Fast Approximate Anti-Aliasing (FXAA) [11, 14] is a post-processing algorithm that smooths jagged edges after the image is rendered. Unfortunately, this technique is also unsuitable for the Gaussian representation, as the front-most Gaussian dominates the pixel color and produces chunky artifacts instead of the jagged artifacts seen in mesh rendering.

In contrast to the supersampling methods mentioned above, our method takes its inspiration from the hierarchical mipmap [18] and level of detail (LOD) [7, 9] algorithms to address aliasing for 3D Gaussians. Mipmaps use multi-scale textures for rendering at different resolutions or from different distances. LOD algorithms represent the models in a scene at different complexities to be rendered at different distances. Both techniques not only mitigate the aliasing effect by reducing the complexity of the scene representation, but also enhance rendering speed, particularly for large-scale scenes.

2.2. Anti-Aliasing in Neural Representation

The recent success of neural representations, especially Neural Radiance Fields (NeRF) [6, 15, 16], has also inspired works that develop anti-aliasing algorithms for neural representations beyond the traditional mesh representation. Mip-NeRF [2, 3] employs low pass filters on the positional encoding of the input spatial coordinates to reduce the scene signal frequency. Building on the hash grid representation used by InstantNGP [16], which has no position encoding, Zip-NeRF [4] proposes a multi-sampling strategy in the conical frustum instead of along the camera ray, at the cost of 6× rendering time. Similar to the mipmap algorithm in mesh texture rendering, Tri-MipNeRF [10] and MipGrid [17] propose multi-scale feature grids for rendering at different resolutions or distances.

Conversely, 3D Gaussian splatting [12] presents unique anti-aliasing challenges due to its distinct scene representation. It has no positional encoding or feature grid, and its requirement for regular grids conflicts with the more flexible multi-sampling strategies. The concentration of small Gaussians in detail-rich regions exacerbates aliasing and speed issues, more so than in NeRF representations. To the best of our knowledge, we are the first to propose an anti-aliasing algorithm for scene reconstruction using 3D Gaussian splatting.
3. Preliminaries

3.1. 3D Gaussian Splatting

3D Gaussian splatting was first proposed in EWA Splatting [19] and later used by [12] for scene reconstruction and novel view synthesis. The scene is represented by a set of K 3D Gaussians {G_{V̂_k, μ̂_k}, σ_k, c_k | k ∈ [1, K]} with variance V̂_k, center μ̂_k, density σ_k, and color c_k. During rendering, the 3D Gaussians are splatted onto the 2D screen by the perspective transformation to form 2D Gaussians G_{V_k, μ_k}. The image is then divided into 16×16 regular tiles, and all 2D Gaussians touching each tile are sorted based on their original depth. The color of each pixel in the tile is then rasterized by sequentially alpha blending the 2D Gaussians from front to back.
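To make the compositing rule concrete, the following minimal NumPy sketch implements per-pixel front-to-back alpha blending (our own single-pixel illustration, not the tile-based CUDA rasterizer of [12]):

    import numpy as np

    def composite_front_to_back(opacities, colors):
        """Alpha-blend per-pixel Gaussian contributions sorted front to back.

        opacities: (K,) alpha of each splatted Gaussian evaluated at this pixel.
        colors:    (K, 3) RGB color of each Gaussian.
        """
        pixel = np.zeros(3)
        transmittance = 1.0                  # T_k, fraction of light not yet absorbed
        for alpha, color in zip(opacities, colors):
            pixel += transmittance * alpha * color
            transmittance *= (1.0 - alpha)
            if transmittance < 1e-4:         # early termination, as tile rasterizers do
                break
        return pixel

    # Example: a semi-transparent red Gaussian in front of a blue one.
    print(composite_front_to_back(np.array([0.6, 0.9]),
                                  np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])))

The sequential update of the transmittance is the reason the Gaussians hitting one pixel cannot be blended in parallel, which matters for the speed analysis below.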
3.2. Cause of Aliasing in 3D Gaussian Splatting

Aliasing can occur when sampling a continuous signal g(x) with a discrete sampling function δ_s(x, Δx) = Σ_{n=−∞}^{∞} δ(x − n·Δx), where δ is an impulse function. The result of the sampling in the spatial domain is:

    g_s(x) = \delta_s(x, \Delta x) \cdot g(x).  (1)

Converted into the frequency domain using the Fourier transform operator F, this sampled function becomes:

    \mathcal{F}[g_s(u)] = \frac{1}{\Delta x} \sum_{k=-\infty}^{\infty} \delta\!\left(u - \frac{k}{\Delta x}\right) * \mathcal{F}[g(x)] = \frac{1}{\Delta x} \sum_{k=-\infty}^{\infty} G\!\left(u - \frac{k}{\Delta x}\right).  (2)

When the highest frequency component f_max of the signal is greater than half of the sampling frequency f_s = 1/Δx, the terms G(u − k/Δx) in the summation overlap with each other and cause the sampled signal to diverge from the actual signal. This phenomenon is the aliasing effect, and the minimum sampling frequency needed to avoid it is f_Ny = 2·f_max, known as the Nyquist frequency.
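As a concrete numerical illustration of this condition (our own self-contained NumPy example, not part of the method): sampling a 3 Hz sinusoid at 8 Hz stays above the Nyquist rate, while sampling it at 4 Hz folds it down to an apparent 1 Hz alias.

    import numpy as np

    f_max = 3.0                      # highest frequency component of g(x), in Hz
    for f_s in (8.0, 4.0):           # 8 Hz > f_Ny = 2 * f_max = 6 Hz; 4 Hz is below
        # The spectrum replicas G(u - k/dx) place the apparent frequency at the
        # distance from f_max to the nearest multiple of f_s.
        f_apparent = abs(f_max - round(f_max / f_s) * f_s)
        status = "no alias" if f_apparent == f_max else "aliased"
        print(f"f_s = {f_s:.0f} Hz: apparent frequency {f_apparent:.0f} Hz ({status})")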
The EWA splatting [19] used by 3D Gaussian splatting [12] also tries to mitigate the aliasing problem by applying a low pass filter to each Gaussian independently. Specifically, it applies a Gaussian kernel h(x) as the low pass filter on the splatted 2D signal g_c(x) to produce a band-limited signal:

    g_c'(x) = g_c(x) * h(x) = \sum_k \sigma_k c_k T_k \int_{R^2} q_k(\eta)\, h(x - \eta)\, d\eta = \sum_k \sigma_k c_k T_k \cdot (q_k * h)(x),  (3)

where R² is the range of one pixel, q_k(x) is the 2D integrated Gaussian kernel, and σ_k, c_k, T_k are the opacity, color, and transmittance at each Gaussian, respectively. By combining the reconstruction Gaussian kernel G_{V_k} and the low pass Gaussian kernel G_{V_h} with covariance matrices V_k and V_h,

    g_c'(x) = \sum_k \alpha_k \cdot (\mathcal{G}_{V_k} * \mathcal{G}_{V_h})(x) = \sum_k \alpha_k \cdot \mathcal{G}_{V_k + V_h}(x),  (4)

where α_k represents all coefficients invariant of x at each Gaussian, and V_h is determined by the screen pixel size. A simple way to understand this is that the covariance of each 3D Gaussian is increased based on the screen pixel size.

This method of applying a low pass filter to each 3D Gaussian independently helps to smooth the edges of the Gaussians when the Gaussians are not too small compared to the pixel size. However, it also gives rise to two substantial issues at low resolutions:
1. V_h added to the original covariance V_k effectively increases the extent of each Gaussian, especially when V_h is large compared to V_k at low resolutions. Small Gaussians in the front then dominate the color of the pixel and cause the severe artifacts shown in Fig. 7 (see also the numerical sketch below).
2. The number of Gaussians involved in the sequential sum Σ_k for each pixel increases with decreasing image resolution. Due to the incremental calculation of the transmittance T_k, the rendering is even slower at lower resolutions.
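The inflation described in issue 1 can be checked numerically. The following NumPy sketch (our own illustration, not code from [12]) dilates a unit-sized screen-space Gaussian by a fixed low pass kernel and reports how much its extent grows as the rendering resolution decreases; the 0.3 px² diagonal value mirrors, to our understanding, the dilation used in the released implementation of [12]:

    import numpy as np

    V_k = np.diag([1.0, 1.0])     # splatted Gaussian with std = 1 px at 1x scale
    V_h = np.diag([0.3, 0.3])     # pixel-sized low pass kernel (assumed value)

    for scale in (1, 4, 16):      # increasingly downsampled render resolutions
        # In downsampled screen space the Gaussian shrinks in the new pixel
        # units, while V_h stays on the order of one (new) pixel.
        V_screen = V_k / scale**2
        std_raw = np.sqrt(V_screen[0, 0])
        std_filtered = np.sqrt(V_screen[0, 0] + V_h[0, 0])   # Eq. (4): V_k + V_h
        print(f"{scale:>2}x: std {std_raw:.3f} px -> {std_filtered:.3f} px "
              f"({std_filtered / std_raw:.1f}x wider)")

At 1× the filter barely changes the Gaussian, but at 16× its extent grows almost ninefold, which is exactly how a tiny front-most Gaussian ends up covering, and dominating, a whole pixel.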
4. Our Method

4.1. Multi-Scale Gaussians Based on Pixel Coverage

To mitigate the aliasing artifacts of 3D Gaussians [12] while avoiding the two problems of the EWA splatting [19], we introduce multi-scale 3D Gaussians (cf. Fig. 2) that tackle the problem at the scene level instead of at each individual Gaussian. The 3D scene is represented with Gaussians from 4 levels of detail, corresponding to the 1×, 4×, 16×, and 64× downsampled resolutions. Small finer-level Gaussians are aggregated to create larger Gaussians for coarser levels during training. Each 3D Gaussian G_k^l belongs to one of the levels l and is included or excluded independently during rendering based on its "pixel coverage".

Pixel Coverage of Gaussian. The "pixel coverage" of a Gaussian reflects the size of the Gaussian when splatted onto the screen space, compared to the pixel size at the current rendering resolution. The "pixel coverage" S_k of a splatted 2D Gaussian G_{(μ_k, V_k)} is defined as the length of its horizontal or vertical axis up to the low-opacity level set, whichever is smaller, as shown in Fig. 3. The pixel coverage is measured in pixel count, and the opacity threshold σ_T is set to 1/255.

Figure 3. The pixel coverage of a 3D Gaussian is its horizontal or vertical size, whichever is smaller, measured by the level set σ_k G_{(μ_k, V_k)} = σ_T.
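One plausible implementation of this definition, assuming the 2D covariance is given in pixel units at the current render resolution, reads the axis extents off the marginal variances (a sketch under these assumptions, not the paper's exact code):

    import numpy as np

    SIGMA_T = 1.0 / 255.0

    def pixel_coverage(V2d, opacity):
        """Approximate pixel coverage S_k of a splatted 2D Gaussian.

        V2d:     (2, 2) screen-space covariance in px^2 at the render resolution.
        opacity: peak opacity sigma_k of the Gaussian.
        Returns 0 if the Gaussian never reaches the sigma_T level set.
        """
        if opacity <= SIGMA_T:
            return 0.0
        # Mahalanobis radius of the level set sigma_k * exp(-r^2 / 2) = sigma_T.
        r = np.sqrt(2.0 * np.log(opacity / SIGMA_T))
        width  = 2.0 * r * np.sqrt(V2d[0, 0])   # horizontal extent in pixels
        height = 2.0 * r * np.sqrt(V2d[1, 1])   # vertical extent in pixels
        return min(width, height)               # whichever is smaller (Fig. 3)

    print(pixel_coverage(np.array([[4.0, 0.0], [0.0, 1.0]]), opacity=0.8))  # ~6.5 px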
The pixel coverage approximates the extent of a splatted 2D Gaussian in the spatial domain. During rendering from a given camera direction, the color of each splatted Gaussian is constant within this pixel coverage. As a result, the pixel coverage approximates the inverse of the highest frequency component in this region, f_max = 1/S_k. Compared to the sampling frequency of f_s = 1 px⁻¹ during rasterization, a signal frequency of f_max > f_s/2 can cause the sampling to fall below the Nyquist frequency needed to avoid aliasing.

Consequently, Gaussians with pixel coverage S_k < S_T = 2 px should be filtered out during rendering to avoid aliasing. Since the 3D Gaussian representation does not encode signals of different frequencies at different Gaussians, naively filtering out the small Gaussians results in holes or missing parts in the scene, as shown in Fig. 4. To address this issue, we propose to aggregate the small Gaussians into large Gaussians that encode the low-frequency signal. These large Gaussians appear when the small Gaussians are filtered out.

Figure 4. Missing parts caused by naive small-Gaussian filtering at different resolution scales.

Aggregate to Insert Large Gaussians. All 3D Gaussians initialized from the input point cloud at the start of the training belong to the finest level l = 1. They are densified by splitting and cloning as in [12], and all the densified Gaussians inherit the same level. After the warm-up stage of the first 1,000 iterations, we introduce coarse-level Gaussians by aggregating fine-level Gaussians that are too small, as visualized in Fig. 5 and described in Algorithm 1. The procedure is outlined as follows:
1. For all levels {l_m | 2 ≤ l_m ≤ l_max}, we render all 3D Gaussians from levels [1, l_m − 1] at the 4^{l_m − 1} times downsampled resolution. [...] the attributes of all Gaussians within each voxel are aggregated to create a new Gaussian using average pooling, [...] (see Sec. 9 of the supplementary material for the complete procedure).

Figure 5. Large Gaussians are created by aggregating the small Gaussians in each voxel below the pixel coverage threshold, and then enlarging them by the pixel coverage multiplier. (Panel labels: "Select Small Gaussians in Voxel", "Aggregate as Average Gaussian", "Enlarge by".)

(Figure panel labels: "Filtered at Low Resolution: Gaussian Too Small", "Selected at Appropriate Resolution", "Filtered at High Resolution: Gaussian Too Large".)

Algorithm 2 Selective Rendering Based on Pixel Coverage
1:  procedure SelectiveRender(G_{1:K}^{1:l_max}, scale l_r)
2:    S_{1:K} = PixelCoverage(G_{1:K}^{1:l_max})
3:    G_1 = {G_k | S_k / S_k^max ≤ S_rel^max, ∀k}
4:    G_2 = {G_k | S_k / S_k^min ≥ S_rel^min ∨ S_k ≥ S_T, ∀k}
5:    G_{l_r} = {G_k^l | l = l_r, ∀k}
6:    for G_k^l ∈ G_{l_r} do
7:      UpdateRange(S_k^max, S_k^min, S_k)
8:    end for
9:    return Render(G_1 ∩ G_2, l_r)
10: end procedure
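A NumPy sketch of the selection step in Algorithm 2 follows; the concrete threshold values s_rel_max and s_rel_min are our own illustrative choices (the excerpt does not state the ones used in the paper), and the range-update rule is our reading of line 7:

    import numpy as np

    S_T = 2.0   # minimum pixel coverage (2 px) from the Nyquist argument above

    def selective_render_mask(S, S_max, S_min, s_rel_max=4.0, s_rel_min=0.5):
        """Sets G_1 and G_2 of Algorithm 2 as a boolean mask.

        S:            (K,) pixel coverage at the current render resolution.
        S_max, S_min: (K,) recorded coverage range of each Gaussian.
        """
        g1 = S / S_max <= s_rel_max                 # not too large for this scale
        g2 = (S / S_min >= s_rel_min) | (S >= S_T)  # not too small for this scale
        return g1 & g2                              # G_1 intersect G_2

    def update_range(S, S_max, S_min, levels, render_level):
        """Line 7: Gaussians of the current render level extend their expected
        pixel coverage range with the newly observed coverage."""
        own = levels == render_level
        np.maximum(S_max, S, out=S_max, where=own)
        np.minimum(S_min, S, out=S_min, where=own)

    # Example: the third Gaussian's coverage grew far beyond its recorded
    # range, so it is excluded at this resolution.
    S = np.array([1.0, 3.0, 20.0])
    print(selective_render_mask(S, S_max=np.array([2.0, 4.0, 4.0]),
                                S_min=np.array([0.5, 2.0, 2.0])))  # [ True  True False]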
Table 1. Quantitative comparison and ablation study on the Mip-NeRF 360 dataset [3] at various downsampled scales, with time in “ms”.
Scale 1x 4x 16x 64x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 27.52 0.142 10.5 22.50 0.137 9.3 17.79 0.149 27.9 15.23 N.A. 103.3
3DGS + MS Train 27.35 0.155 11.3 23.50 0.126 7.7 20.21 0.115 22.8 19.38 N.A. 84.8
3DGS + Filter Small 27.40 0.153 10.0 23.81 0.149 5.4 20.02 0.186 4.8 17.38 N.A. 4.6
3DGS + Insert Large 18.02 0.604 9.7 18.75 0.531 2.5 20.23 0.256 2.7 21.53 N.A. 7.1
Our Full Method 27.39 0.155 9.1 24.82 0.132 5.4 24.75 0.066 4.9 25.35 N.A. 4.9
Table 2. Quantitative comparison and ablation study on the Tanks and Temples dataset [13] at various downsampled scales, with time in “ms”.
Scale 1x 4x 16x 64x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 23.74 0.096 6.5 19.70 0.105 11.1 15.61 0.068 43.4 13.88 N.A. 82.6
3DGS + MS Train 22.97 0.118 6.0 21.46 0.086 9.6 18.56 0.049 37.4 16.54 N.A. 71.7
3DGS + Filter Small 23.78 0.100 5.6 20.12 0.107 4.5 17.41 0.072 4.4 14.95 N.A. 4.7
3DGS + Insert Large 10.84 0.697 5.1 11.15 0.703 1.7 11.73 0.447 1.7 12.62 N.A. 2.5
Our Full Method 23.46 0.111 7.6 21.92 0.087 4.7 20.91 0.034 4.8 19.67 N.A. 5.9
Table 3. Quantitative comparison and ablation study on the Deep Blending dataset [8] at various downsampled scales, with time in “ms”.
Scale 1x 4x 16x 64x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 29.65 0.094 8.6 27.48 0.066 7.5 22.06 0.067 20.7 17.75 N.A. 59.7
3DGS + MS Train 29.46 0.102 6.6 28.18 0.062 5.3 24.13 0.055 14.3 20.03 N.A. 41.3
3DGS + Filter Small 29.68 0.095 6.7 28.26 0.064 4.2 24.52 0.078 3.6 18.29 N.A. 3.2
3DGS + Insert Large 20.59 0.379 4.6 20.83 0.336 1.6 21.29 0.143 2.1 20.10 N.A. 4.2
Our Full Method 29.70 0.096 7.4 28.43 0.064 3.9 27.66 0.036 3.4 25.70 N.A. 3.4
The pixel coverage range of each Gaussian allows the model to maintain multi-scale Gaussians for different levels of detail. The appropriate subset of Gaussians is chosen for rendering at different resolutions and distances. More and smaller Gaussians, which encode the high-frequency information, are rendered at high resolution, while fewer and larger Gaussians, which encode the low-frequency information, are rendered at low resolution for less aliasing and faster rendering.

5. Experiments

In this section, we present a comprehensive evaluation of our proposed model, which is built on the implementation framework of the official release of the 3D Gaussian Splatting code. To achieve a similar training time as the baseline model, our models are trained for 40,000 iterations with all other hyper-parameters unchanged. All rendering speeds are measured on a single RTX 3090 GPU. We evaluate the performance of the vanilla 3D Gaussian Splatting [12] algorithm and our model on the multi-scale Mip-NeRF 360 [3], Tanks and Temples [13], and Deep Blending [8] datasets, aligned with the data used by the original paper. These datasets cover a wide range of object-centric, indoor, and outdoor scenes.

Our evaluation focuses on the rendering quality and speed at multiple downsampling scales of 1×, 4×, 16×, and 64× derived from the test views. The rendering quality is measured in PSNR and LPIPS, while the speed is measured in per-image rendering time. This multi-scale evaluation is aimed at simulating the rendering performance in scenarios of low-resolution imaging or capture from distant cameras. More detailed evaluations, including results for more resolution scales and a per-scene decomposition, are included in the supplementary materials due to the space constraint. Additionally, the supplementary materials include a video that offers an intuitive qualitative comparison of the two algorithms, vividly demonstrating the improvement of our algorithm in quality and speed from multiple viewpoints.
Quantitative Comparison. As shown in Tab. 1, Tab. 2, and Tab. 3, our method achieves substantial quality and speed improvements over the original 3D Gaussian Splatting [12] at lower resolutions. The quality and speed improvements become more pronounced as the resolution decreases, with the most noticeable gains of 6-10 dB PSNR and 20-30× speed at the 64× resolution scale. As the resolution decreases, the original splatting algorithm slows down while our method accelerates. The rendering quality and speed at the original resolution (1×) remain comparable, indicating the effectiveness of our multi-scale Gaussians in representing both the high and low resolutions together.
Figure 8. Qualitative comparison on the Tanks and Temples dataset [13] at different resolution scales. (Row labels: "Ground Truth", "3D Gaussian Splatting", "Multi-Scale 3D Gaussian Splatting, Ours".)
Qualitative Comparison. We present the qualitative comparison with the original 3D Gaussian Splatting [12] in Fig. 7, Fig. 8, and Fig. 9. At higher resolutions (1×-8×), both our method and the original algorithm render the novel views rather faithfully. However, as the resolution reduces further (16×-64×), the original splatting algorithm produces severe artifacts, where the foreground becomes larger and larger, dominating the pixel colors, as explained in Sec. 3.2. In contrast, the images rendered by our method closely resemble the ground truth across all resolution scales.
Figure 9. Qualitative comparison on the Deep Blending dataset [8] at different resolution scales. (Row labels as in Fig. 8.)
Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Supplementary Material
8. Video Comparison

To better demonstrate the improvement of our algorithm in quality and speed at different resolutions, we include a video comparing our results with the original 3D Gaussian Splatting [12] on multiple scenes from different views and resolutions.

9. Details of Gaussian Aggregation Algorithm

Due to the space constraint of the main paper, some details of the Gaussian aggregation process are omitted. In this section, we elaborate further with some examples to help readers understand and reproduce our work. The process consists of the following steps:

Render at Lower Resolution. Since we want to insert large Gaussians of an appropriate size to be rendered at lower resolutions, we need to aggregate small Gaussians to form large Gaussians. Because pixel coverage is used to determine whether a Gaussian is too small, we first need to render all Gaussians to calculate their pixel coverage at all training cameras. For all coarse levels l_m = [2, l_max], we render all Gaussians from levels [1, l_m − 1] at the 4^{l_m − 1} times downsampled resolution. For example, to add large Gaussians for level 4, we render all Gaussians from levels 1 to 3 at the 64× downsampled resolution from all training cameras. A Gaussian splatted to any of the training cameras with a pixel coverage S_k smaller than S_T is considered too small and is included in the next step of aggregation, as in the sketch below.
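A minimal NumPy sketch of this selection, assuming the per-camera coverages have already been computed (e.g., with the pixel_coverage sketch from Sec. 4.1); the function name is our own placeholder:

    import numpy as np

    S_T = 2.0

    def too_small_anywhere(coverage_per_camera):
        """Flag Gaussians considered too small for aggregation.

        coverage_per_camera: (num_cameras, K) pixel coverage S_k of every
        Gaussian splatted into every training camera at the 4^(l_m - 1)x
        downsampled resolution. A Gaussian qualifies if its coverage drops
        below S_T in ANY camera.
        """
        return (np.asarray(coverage_per_camera) < S_T).any(axis=0)

    # Example with 2 cameras and 3 Gaussians: the second Gaussian is small
    # in one view only, which already includes it for aggregation.
    cov = np.array([[5.0, 1.5, 8.0],
                    [6.0, 4.0, 7.0]])
    print(too_small_anywhere(cov))   # [False  True False]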
Unbounded Scene Normalization. The Gaussians can be located anywhere in (−∞, ∞) in unbounded scenes. This is not suitable for the later voxelization, as only a limited number of voxels can be used. To normalize the unbounded space, the center region and the outer region are handled differently. The space bounded by an axis-aligned cube of length B, defined by the span of all training cameras, is considered the center region, and the rest is considered the outer region. To preserve the structure in the center region, the coordinates are linearly scaled from [−B, B] to [−1, 1]. To normalize the unbounded outer region, the coordinates are non-linearly contracted from (−∞, ∞) to (−2, 2). The exact normalization is as follows:

    x_{\text{norm}} = \begin{cases} x / B, & \text{if } \max(|x|) \le B \\ 2 - B/x, & \text{otherwise} \end{cases}  (7)
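A direct NumPy translation of Eq. (7); the per-coordinate application and the explicit sign handling for negative coordinates are our reading of the printed formula:

    import numpy as np

    def normalize_unbounded(x, B):
        """Map unbounded coordinates into (-2, 2) following Eq. (7).

        Values inside [-B, B] are scaled linearly to [-1, 1]; outer values
        are contracted non-linearly toward +-2.
        """
        x = np.asarray(x, dtype=float)
        inner = np.abs(x) <= B
        contracted = np.sign(x) * (2.0 - B / np.maximum(np.abs(x), B))
        return np.where(inner, x / B, contracted)

    # Continuous at |x| = B (both branches give +-1); approaches +-2 at infinity.
    print(normalize_unbounded(np.array([0.5, -1.0, 4.0, -100.0]), B=1.0))
    # -> [ 0.5  -1.    1.75 -1.99]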
but the image rendered at high resolution is over-smoothed.
Voxelization. After the Gaussian positions are normal- This is caused by the finer level Gaussians not filtered out
ized to [−2, 2], they need to be voxelized so that all Gaus- but optimized together with the inserted large Gaussians at
9
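Putting the voxelization and pooling together, the following condensed NumPy sketch pools only position and scaling for brevity (the paper also averages rotation, spherical harmonics features, and opacity); we read the empirical (400/l_m)³ as the total voxel count, i.e., 400/l_m voxels per axis, and the function name is our own:

    import numpy as np
    from collections import defaultdict

    S_T = 2.0   # target pixel coverage of the inserted coarse Gaussians

    def aggregate_in_voxels(positions, scalings, coverages, level_m, extent=2.0):
        """Voxel-group small Gaussians and average-pool them into coarse ones.

        positions: (N, 3) normalized centers in (-2, 2); scalings: (N, 3)
        scale parameters; coverages: (N,) pixel coverage at the level-l_m
        resolution.
        """
        voxels_per_axis = int(400 / level_m)
        voxel_size = 2.0 * extent / voxels_per_axis
        buckets = defaultdict(list)
        for i, p in enumerate(positions):
            key = tuple(np.floor((p + extent) / voxel_size).astype(int))
            buckets[key].append(i)

        new_positions, new_scalings = [], []
        for idx in buckets.values():
            S_avg = coverages[idx].mean()
            grow = S_T / max(S_avg, 1e-6)            # enlarge so coverage ~= S_T
            new_positions.append(positions[idx].mean(axis=0))
            new_scalings.append(scalings[idx].mean(axis=0) * grow)
        return np.array(new_positions), np.array(new_scalings)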
Figure 10. Qualitative ablation results of our proposed method on the “Bicycle” scene.
10. Qualitative Ablation Study

To better compare the effectiveness of each of our proposed modules qualitatively, we present the rendering results of our method and various ablation models in Fig. 10–14. The ablation model design follows the experiment section in the main paper. Specifically, the “+MS Train” model is trained using multi-scale images, but the Gaussians are of only a single scale, as in 3D Gaussian Splatting [12]. The low-resolution performance is slightly improved, but the rendering speed is as slow as the original method. The “+Filter Small” model filters the small Gaussians based on the pixel coverage on top of the multi-scale training. It significantly accelerates the low-resolution rendering process, but the scene has some parts missing, as shown in the rendered images. The rendered images also have artifacts such as black dots at low resolutions, caused by the filtered small Gaussians. The “+Insert Large” model inserts the large Gaussians from aggregation on top of the multi-scale training. It has good rendering speed and quality at low resolutions, but the images rendered at high resolution are over-smoothed. This is caused by the finer-level Gaussians not being filtered out but being optimized together with the inserted large Gaussians at low resolutions. Our “Full Method” overcomes the weaknesses of the ablation models and produces high-quality rendering at fast speed at both high and low resolutions. The filtering of small Gaussians improves the speed, and the insertion of large Gaussians improves the quality at low resolutions. The qualitative ablation supports the effectiveness of our proposed components.

11. Quantitative Results on More Resolutions

We present the quantitative results of our method, the original 3D Gaussian Splatting [12], and the various ablation methods at more downsampled resolutions. The resolutions include some that are not used during training, which demonstrates the performance and robustness of our model. The experiments are conducted on the Mip-NeRF 360 dataset [3] as shown in Tab. 4, the Tanks and Temples dataset [13] as shown in Tab. 5, and the Deep Blending dataset [8] as shown in Tab. 6.

12. Per-Scene Quantitative Results

We present the per-scene decomposition of the quantitative results of our method and the original 3D Gaussian splatting [12] at various resolutions. The experiments are carried out on the Mip-NeRF 360 dataset [3] as shown in Tab. 7, the Tanks and Temples dataset [13] as shown in Tab. 8, and the Deep Blending dataset [8] as shown in Tab. 9. The scenes tested follow the experiments carried out in the original 3D Gaussian splatting paper [12].
Figure 11. Qualitative ablation results of our proposed method on the “Counter” scene.
Table 4. Quantitative comparison and ablation study on the Mip-NeRF 360 dataset [3] at more downsampled scales, with time in “ms”.
Scale 1x 2x 4x 8x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 27.52 0.142 10.5 25.96 0.124 8.0 22.50 0.137 9.3 19.79 0.154 14.6
3DGS + MS Train 27.35 0.155 11.3 26.33 0.128 7.3 23.50 0.126 7.7 21.38 0.131 12.1
3DGS + Filter Small 27.40 0.153 10.0 26.42 0.129 6.8 23.81 0.149 5.4 21.73 0.175 5.1
3DGS + Insert Large 18.02 0.604 9.7 18.28 0.593 3.4 18.75 0.531 2.5 19.39 0.419 2.2
Our Method 27.39 0.155 9.1 26.44 0.134 6.3 24.82 0.132 5.4 24.44 0.112 5.1
Scale 16x 32x 64x 128x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 17.79 0.149 27.9 16.30 0.084 55.2 15.23 N.A. 103.3 14.55 N.A. 123.2
3DGS + MS Train 20.21 0.115 22.8 19.80 0.060 45.6 19.38 N.A. 84.8 18.75 N.A. 100.1
3DGS + Filter Small 20.02 0.186 4.8 18.81 0.090 4.4 17.38 N.A. 4.6 16.13 N.A. 4.8
3DGS + Insert Large 20.23 0.256 2.7 21.17 0.081 4.6 21.53 N.A. 7.1 20.25 N.A. 9.4
Our Method 24.75 0.066 4.9 25.06 0.025 4.7 25.35 N.A. 4.9 22.55 N.A. 5.0
Figure 12. Qualitative ablation results of our proposed method on the “Garden” scene.
Table 5. Quantitative comparison and ablation study on the Tanks and Temples dataset [13] at more downsampled scales, with time in “ms”.
Scale 1x 2x 4x 8x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 23.74 0.096 6.5 22.55 0.080 7.1 19.70 0.105 11.1 17.34 0.117 21.5
3DGS + MS Train 22.97 0.118 6.0 23.04 0.083 6.3 21.46 0.086 9.6 20.18 0.080 18.5
3DGS + Filter Small 23.78 0.100 5.6 22.76 0.079 5.1 20.12 0.107 4.5 18.62 0.122 4.4
3DGS + Insert Large 10.84 0.697 5.1 10.96 0.719 2.4 11.15 0.703 1.7 11.40 0.631 1.6
Our Method 23.46 0.111 7.6 22.44 0.095 5.6 21.92 0.087 4.7 20.88 0.082 4.6
Scale 16x 32x 64x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 15.61 0.068 43.4 14.45 N.A. 70.9 13.88 N.A. 82.6
3DGS + MS Train 18.56 0.049 37.4 17.41 N.A. 61.7 16.54 N.A. 71.7
3DGS + Filter Small 17.41 0.072 4.4 16.05 N.A. 4.5 14.95 N.A. 4.7
3DGS + Insert Large 11.73 0.447 1.7 12.14 N.A. 2.1 12.62 N.A. 2.5
Our Method 20.91 0.034 4.8 21.01 N.A. 5.4 19.67 N.A. 5.9
Figure 13. Qualitative ablation results of our proposed method on the “Treehill” scene.
Table 6. Quantitative comparison and ablation study on the Deep Blending dataset [8] at more downsampled scales, with time in “ms”.
Scale 1x 2x 4x 8x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 29.65 0.094 8.6 29.41 0.065 6.6 27.48 0.066 7.5 24.67 0.076 11.3
3DGS + MS Train 29.46 0.102 6.6 29.42 0.069 4.8 28.18 0.062 5.3 26.15 0.065 8.0
3DGS + Filter Small 29.68 0.095 6.7 29.53 0.064 4.9 28.26 0.064 4.2 26.51 0.082 3.8
3DGS + Insert Large 20.59 0.379 4.6 20.67 0.381 2.2 20.83 0.336 1.6 21.07 0.263 1.7
Our Method 29.70 0.096 7.4 29.58 0.065 4.8 28.43 0.064 3.9 27.59 0.063 3.6
Scale 16x 32x 64x
Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
3D Gaussian[12] 22.06 0.067 20.7 19.74 N.A. 36.3 17.75 N.A. 59.7
3DGS + MS Train 24.13 0.055 14.3 22.09 N.A. 24.8 20.03 N.A. 41.3
3DGS + Filter Small 24.52 0.078 3.6 22.01 N.A. 3.3 18.29 N.A. 3.2
3DGS + Insert Large 21.29 0.143 2.1 21.14 N.A. 2.8 20.10 N.A. 4.2
Our Method 27.66 0.036 3.4 27.22 N.A. 3.3 25.70 N.A. 3.4
Figure 14. Qualitative ablation results of our proposed method on the “Truck” scene.
Table 7. Per-scene performance decomposition on the Mip-NeRF 360 dataset [3]. Time measured in “ms”.
Scale 1x 4x 16x 64x 128x
Scene Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
garden 3D-GS[12] 27.27 0.070 15.0 20.42 0.136 14.4 16.74 0.166 48.8 14.92 N.A. 200.9 14.29 N.A. 245.0
garden Ours 27.16 0.080 11.8 23.99 0.112 7.8 26.41 0.044 7.5 24.79 N.A. 8.6 21.19 N.A. 9.6
flowers 3D-GS[12] 21.41 0.309 9.1 18.89 0.239 8.8 15.46 0.165 24.9 13.90 N.A. 93.2 13.69 N.A. 112.2
flowers Ours 21.11 0.333 8.1 20.83 0.234 5.7 21.97 0.093 5.1 22.69 N.A. 4.9 21.82 N.A. 5.0
treehill 3D-GS[12] 22.60 0.274 10.0 21.63 0.232 9.7 18.71 0.193 24.6 16.19 N.A. 90.6 15.52 N.A. 97.0
treehill Ours 22.64 0.291 8.7 22.31 0.239 5.8 23.55 0.072 5.4 24.28 N.A. 4.9 22.27 N.A. 5.0
bicycle 3D-GS[12] 25.15 0.164 18.8 19.71 0.178 15.5 16.27 0.215 43.9 14.99 N.A. 163.8 15.15 N.A. 187.0
bicycle Ours 24.44 0.210 13.4 24.76 0.131 7.4 25.00 0.081 6.4 26.02 N.A. 6.5 21.56 N.A. 6.9
counter 3D-GS[12] 29.15 0.099 7.5 24.81 0.084 6.4 17.94 0.101 19.2 14.32 N.A. 60.4 13.39 N.A. 74.6
counter Ours 29.17 0.100 6.6 26.77 0.076 3.3 23.44 0.057 2.8 24.59 N.A. 2.7 21.14 N.A. 2.7
kitchen 3D-GS[12] 31.70 0.064 9.3 23.95 0.081 8.5 18.50 0.093 35.4 15.00 N.A. 124.4 14.15 N.A. 150.3
kitchen Ours 31.64 0.064 8.1 25.93 0.089 4.2 24.16 0.049 3.9 25.35 N.A. 3.3 21.50 N.A. 3.2
room 3D-GS[12] 31.63 0.093 8.0 26.60 0.057 5.1 19.50 0.096 12.0 15.50 N.A. 49.2 14.37 N.A. 70.8
room Ours 31.51 0.094 6.6 28.95 0.053 3.1 28.15 0.025 2.9 25.77 N.A. 2.9 21.82 N.A. 2.9
stump 3D-GS[12] 26.75 0.138 10.6 22.24 0.152 10.1 18.57 0.188 26.5 17.33 N.A. 95.2 16.97 N.A. 114.0
stump Ours 26.59 0.152 12.9 23.52 0.150 8.2 25.22 0.112 7.2 29.22 N.A. 7.1 29.09 N.A. 7.2
bonsai 3D-GS[12] 32.04 0.065 6.0 24.23 0.075 5.3 18.43 0.126 15.4 14.95 N.A. 52.4 13.46 N.A. 57.9
bonsai Ours 32.27 0.067 5.5 26.32 0.106 3.3 24.87 0.062 2.8 25.40 N.A. 2.9 22.53 N.A. 2.8
Table 8. Per-scene performance decomposition on the Tanks and Temples dataset [13]. Time measured in “ms”.
Scale 1x 4x 16x 64x
Scene Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
truck 3D-GS[12] 25.39 0.064 7.3 19.97 0.103 11.3 15.69 0.064 49.2 14.20 N.A. 89.1
truck Ours 24.94 0.078 9.0 23.67 0.059 5.4 22.62 0.024 6.0 19.99 N.A. 8.6
train 3D-GS[12] 22.09 0.129 5.8 19.42 0.108 10.9 15.54 0.072 37.6 13.57 N.A. 76.1
train Ours 21.98 0.144 6.2 20.17 0.114 3.9 19.21 0.044 3.5 19.36 N.A. 3.3
Table 9. Per-scene performance decomposition on the Deep Blending dataset [8]. Time measured in “ms”.
Scale 1x 4x 16x 64x
Scene Metric PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓ PSNR↑ LPIPS↓ Time↓
drjohnson 3D-GS[12] 29.14 0.106 10.1 27.23 0.079 9.3 22.73 0.078 26.3 18.60 N.A. 67.6
drjohnson Ours 29.19 0.108 8.6 27.96 0.078 4.4 26.80 0.051 3.9 27.19 N.A. 3.8
playroom 3D-GS[12] 30.15 0.082 7.0 27.72 0.053 5.7 21.40 0.056 15.0 16.89 N.A. 51.8
playroom Ours 30.20 0.084 6.2 28.89 0.051 3.4 28.53 0.020 3.0 24.22 N.A. 3.0
References

[1] Kurt Akeley. Reality engine graphics. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pages 109–116, New York, NY, USA, 1993. Association for Computing Machinery.
[2] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. ICCV, 2021.
[3] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022.
[4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-NeRF: Anti-aliased grid-based neural radiance fields. ICCV, 2023.
[5] Kristof Beets and David L. Barron. Super-sampling anti-aliasing analyzed. 2000.
[6] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In European Conference on Computer Vision (ECCV), 2022.
[7] Carl Erikson. Polygonal simplification. Technical Report 96-016, 1996.
[8] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Trans. Graph., 37(6), 2018.
[9] Tan Kim Heok and D. Daman. A review on level of detail. In Proceedings, International Conference on Computer Graphics, Imaging and Visualization (CGIV 2004), pages 70–75, 2004.
[10] Wenbo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, and Yuewen Ma. Tri-MipRF: Tri-Mip representation for efficient anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19774–19783, 2023.
[11] Jorge Jimenez, Diego Gutiérrez, Jason Yang, Alexander Reshetov, Pete Demoreuille, Tobias Berghoff, Cedric Perthuis, Henry Yu, Morgan McGuire, Timothy Lottes, Hugh Malan, and Emil Persson. Filtering approaches for real-time anti-aliasing. ACM SIGGRAPH 2011 Courses, SIGGRAPH '11, 2011.
[12] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
[13] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4), 2017.
[14] Timothy Lottes. FXAA. Technical report, NVIDIA, 2011.
[15] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[16] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
[17] Seungtae Nam, Daniel Rho, Jong Hwan Ko, and Eunbyung Park. Mip-Grid: Anti-aliased grid representations for neural radiance fields. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[18] Lance Williams. Pyramidal parametrics. SIGGRAPH Comput. Graph., 17(3):1–11, 1983.
[19] M. Zwicker, H. Pfister, J. van Baar, and M. Gross. EWA volume splatting. In Proceedings Visualization (VIS '01), pages 29–538, 2001.