Space-Optimized Texture Atlases

MASTER THESIS

STUDENT: JONÀS MARTÍNEZ BAYONA   DIRECTOR: CARLOS ANDÚJAR GRAN   DATE: 8/09/2009
Acknowledgments
We would like to thank TeleAtlas for providing the three-dimensional model and facade textures of Barcelona, and the Institut Cartogràfic de Catalunya for the aerial photos.
Contents

I   Abstract

II  Introduction

III State of the art

1  Image comparison metrics
   1.1  Spatial domain metrics
        1.1.1  Mean squared error
        1.1.2  Normalized MSE
        1.1.3  Template matching
   1.2  Spatial-frequency domain metrics
        Mannos-Sakrison's filter
        Daly's filter
        Ahumada's filter
   1.3  Perceptually-based metrics
        The human visual system
        Visible differences predictor
   1.4  Tone mapping
        Single scale tone reproduction operators
        Multi scale tone reproduction operators

2  Image compression
   2.1  Lossy compression methods
   2.2  Lossless compression methods
   2.3  Compression of images with repetitive patterns
   Exact algorithms

IV
   Predicting the minimum size of texture atlas
   Texture atlas binary tree
   Optimizing texture space

Results

9  Test specifications
   9.1  Test model
   9.2  Hardware tested

10 Space compression
   10.1  Image downsampling
         10.1.1  Downsampling function of test images
         10.1.2  Reconstruction of test images with varying RMSE visual tolerance
         10.1.3  Reconstruction of test images with varying HVSE visual tolerance
         10.1.4  Test images compression results
         10.1.5  Image downsampling results of test model
   10.2  Packing
   10.3  Encoding texture chart coordinates

VII Appendix
List of Figures

Images analyzed with RMSE
Mannos-Sakrison contrast sensitivity function
Cross section of the human eye (illustration by Mark Erickson)
Block diagram of the Visible Differences Predictor. Heavy arrows indicate parallel processing of the spatial frequency and orientation channels.
S3TC lookup table
Example of a texture atlas
TileTree by Lefebvre et al. [45]. Left: a torus is textured by a TileTree. Middle: the TileTree positions square texture tiles around the surface using an octree. Right: the tile map holding the set of square tiles.
Distance-dependent texture selection proposed by Buchholz and Döllner [6]
First-Fit Decreasing Height algorithm
The Fekete and Schepers modeling approach
Texture atlas creation scheme
Example of search space reduction using binary search. Each point of the square represents a texture size. Lower row: alternating search on w, h; both approaches are not guaranteed to find the same (wo, ho).
RMSE error of the subsampling of a facade detail
Framework of the CIELAB colour model
HVSE error of the subsampling of a facade detail
Binary tree structure
Packed texture coordinates on the atlas
Encoding of a compressed texture coordinate
Uninitialized texels at the 2x2 and 1x1 mipmaps for an atlas containing 8x8 and 4x4 textures
Bilinear filtering of lower mip-levels accesses texels from unrelated neighbouring textures
Correct bilinear filtering scheme in a 16x16 chart with 2 border texels and 2 mip-levels
Incorrect bilinear filtering scheme in a 16x16 chart with 1 border texel and 2 mip-levels
Hierarchical texture atlas representation
Screen projection factor scheme
Terrain rendering LOD example
Thumbnails of a set of TeleAtlas textures
Window downsampling function
Rocktile downsampling function
Bricktile downsampling function
Fabrictile downsampling function
Textureatlas downsampling function
Boat downsampling function
Crayons downsampling function
Aerial photo downsampling function
Window reconstruction using RMSE
Rocktile reconstruction using RMSE
Bricktile reconstruction using RMSE
Fabrictile reconstruction using RMSE
Textureatlas reconstruction using RMSE
Boat reconstruction using RMSE
Crayons reconstruction using RMSE
Aerialphoto reconstruction using RMSE
Window reconstruction using HVSE
Fabrictile reconstruction using HVSE
Textureatlas reconstruction using HVSE
Crayons reconstruction using HVSE
Boat reconstruction using HVSE
Texture set 1 packing
Texture set 2 packing
Texture set 3 packing
Texture set 4 packing
Texture set 5 packing
Texture packing results
Chart encoding performance (vertices/s)
Chart encoding performance (fragments/s)
Chart encoding performance (frames/s)
Framerate evolution of walkthrough
Barcelona snapshot 1
Barcelona snapshot 2
Barcelona snapshot 3
Barcelona snapshot 4
Barcelona snapshot 5
Barcelona snapshot 6
Some characteristic colours of Barcelona facades
List of Tables

1  Space and processing overheads of the three options considered for tiling periodic images
2  Area precision and mip-level associated to each atlas tree level of our implementation
3  Test model geometry information
4  Test model texture information
5  Hardware specifications
6  Test image specifications
7  Texture compression with the whole city
8  City downsampling RMSE 5%
9  City downsampling RMSE 10%
10 City downsampling RMSE 20%
11 City downsampling HVSE 10%
12 City downsampling HVSE 30%
13 City downsampling HVSE 50%
14 Texture set packing occupancy
15 Chart encoding performance (for more information on the encoding techniques see Table 1)
16 Resulting framerate for each technique (walkthrough)
List of Algorithms

Subsampling image in one direction with error metric
Compression of an image with error metric
Texture atlas bin packing
Inserting an image
Optimizing texture stretch
Texture wrapping vertex program (see Section 8.1.1)
Texture wrapping fragment program (see Section 8.1.3)
Quadtree generation (see Section 8.2.1)
Part I
Abstract
Texture atlas parameterization provides an effective way to map a variety of colour and data attributes from 2D texture domains onto polygonal surface meshes. Most of the existing literature focuses on how to build seamless texture atlases for continuous photometric detail, but little effort has been devoted to devising efficient techniques for encoding self-repeating, discontinuous signals such as building facades. We present a perception-based scheme for generating space-optimized texture atlases specifically designed for intentionally non-bijective parameterizations. Our scheme combines within-chart tiling support with intelligent packing and perceptual measures for assigning texture space in accordance with the amount of information content of the image and its saliency. We demonstrate our optimization scheme in the context of real-time navigation through a gigatexel urban model of a European city. Our scheme achieves significant compression ratios and speed-up factors with visually indistinguishable results.

We developed a technique that generates space-optimized texture atlases for the particular encoding of discontinuous signals projected onto geometry. The scene is partitioned using a texture atlas tree that contains a texture atlas for each node. The leaf nodes of the tree contain scene geometry. The level of detail is controlled by traversing the tree and selecting the appropriate texture atlas for a given viewer position and orientation. In a preprocessing step, the textures associated with each texture atlas node of the tree are packed. Textures are resized according to a given user-defined texel size and the size of the geometry they are projected onto. We also use perceptual measures to assign texture space in accordance with image detail. We also explore different techniques for supporting texture wrapping of discontinuous signals, which involved the development of efficient techniques for compressing texture coordinates on the GPU.
Our approach supports texture filtering and DXTC compression without noticeable artifacts. We have implemented a prototype version of our space-optimized texture atlases technique and used it to render the 3D city model of Barcelona, achieving interactive rendering frame rates. The whole model was composed of more than three million triangles and contained more than twenty thousand different textures representing the building facades, with an average original resolution of 512×512 pixels per texture. Our scheme achieves compression ratios of up to 100:1 together with significant speed-up factors.
Part II
Introduction
Heavily-textured models involving a large number of periodic textures are often encountered in many computer graphics applications, including architectural visualization, urban modelling and virtual earth globes. Factoring out repeated content through texture tiling helps to reduce image data by several orders of magnitude. Currently available urban city models, for example, make extensive use of periodic textures to encode highly-detailed repeating patterns on building facades. Even though the current trend in urban modeling is to use real photographs, the limitations of current acquisition technology make this option feasible only for a few singular buildings, and it is obviously not applicable to ancient sites and non-existing cities. As an example, the most detailed 3D model of the city of Barcelona available today makes use of a library of 23,939 distinct textures, most of them periodic, to represent 93,111 buildings, in addition to 28 singular buildings for which real photographs are used.

Real-time rendering of detailed textured models is still a challenging problem in computer graphics. Rendering acceleration methods for geometry data (such as maximizing the hit rate of the geometry cache [36], reducing pixel overdraw, and level-of-detail rendering) are beneficial for finely-tessellated, geometrically-complex models, but not for urban mass models with a few planar polygons and most details stored in texture maps and displacement maps. Image-based methods simplify scene parts by replacing them with impostors, at the expense of high storage requirements. View frustum culling and occlusion culling techniques can provide speed-ups of several orders of magnitude, but are mostly effective for indoor scenes, being of little value e.g. for overhead views of outdoor scenes. The use of large collections of highly-detailed, periodic textures in models representing man-made structures poses several additional problems:
Per-corner attributes. The dominance of sharp edges over smooth edges requires per-corner (rather than per-vertex) attribute binding. A corner is a vertex/polygon pair, and typical corner attributes are color, normal, and texture coordinates. Per-corner attribute binding requires much more storage (and often memory bandwidth) than per-vertex binding. For example, a triangle mesh with T triangles and V vertices has 3T ≈ 6V corners, and a quad mesh with Q quads has 4Q ≈ 4V corners, so per-corner binding stores six (resp. four) times more attributes than per-vertex binding. For example, a triangle mesh with per-corner normals and texture coordinates typically requires 3·32·V bits for vertex coordinates and 6·(2+3)·32·V bits for per-corner attributes. Vertex Buffer Objects (VBOs) are one of the most efficient mechanisms for submitting geometry: VBOs allow vertex array data to be stored in high-performance graphics memory on the server side and promote efficient data transfer. Unfortunately, VBOs only support per-vertex binding (the OpenGL specification does not support multi-index vertex arrays). This means that the only way to use VBOs to render primitives with per-corner attributes is to replicate all per-corner attributes, which in the example above would require 6·(2+3)·32·V bits.
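The storage gap above can be checked with a quick back-of-the-envelope calculation. The sketch below assumes 32-bit floats, 2D texture coordinates and 3D normals, as in the example:

```python
# Per-vertex vs. per-corner attribute storage for a triangle mesh,
# assuming 32-bit floats, 2D texture coordinates and 3D normals.

def per_vertex_bits(num_vertices: int) -> int:
    # One (2 + 3)-float attribute record per vertex.
    return (2 + 3) * 32 * num_vertices

def per_corner_bits(num_vertices: int) -> int:
    # A typical triangle mesh has 3T ~= 6V corners, so per-corner
    # binding replicates each attribute record about six times.
    return 6 * (2 + 3) * 32 * num_vertices

if __name__ == "__main__":
    V = 1_000_000
    print(per_vertex_bits(V) // (8 * 10**6), "MB per-vertex")   # 20 MB
    print(per_corner_bits(V) // (8 * 10**6), "MB per-corner")   # 120 MB
```

For a million-vertex mesh this is the difference between 20 MB and 120 MB of attribute data, which is why replicating per-corner attributes into VBOs is so costly.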
Rendering models using a large collection of texture maps requires many texture switches, which are costly state changes. State-sorting can be performed so that all objects with similar textures are drawn together, but the remaining number of texture switches can still be too high. The traditional approach to avoid texture switching is to pack multiple textures into a single texture atlas. Unfortunately, OpenGL's GL_REPEAT wrapping mode cannot be used within a chart inside a texture atlas, and thus tiled textures need to be unfolded on the texture atlas, increasing the required space by several orders of magnitude. Some new GPUs provide a feature called texture arrays, which helps reduce texture switching, but all the charts are required to have exactly the same size and the maximum number of charts is limited by the maximum texture resolution rather than by the available memory.
Contribution
In this thesis we present an algorithm for creating a space-optimized texture atlas from a heavily-textured polygonal model with per-polygon textures. Novel elements include:
A new pipeline to generate a space-optimized texture atlas. Our approach has been conceived to be integrated into a texture LOD management system.

An efficient algorithm for resizing each chart in accordance with the object-space size of the surface the chart is mapped onto, and the perceptual importance of the chart's contents under given viewing conditions.

A BSP-based algorithm for packing rectangular charts into a single texture atlas which minimizes unused space. In fact, our algorithm always achieves 100% atlas coverage.
Several shader techniques providing within-chart tiling support for periodic textures. Our strategy avoids unfolding periodic textures. Factoring repeated content through texture tiling helps to reduce image data by several orders of magnitude when compared to conventional texture atlases.
Part III
State of the art
In the following sections we review previous work related to texture atlas generation and parametrization techniques, which provides background for our packing algorithm. We also introduce previous work on image comparison metrics and image compression techniques strongly related to our perceptually-driven texture compression. A summary of urban and terrain visualization techniques is also presented.
1 Image comparison metrics

Given two images a and b, the value returned by a metric M(a, b) is called the magnitude of difference (MOD) of M. However, the MODs returned by different metrics are not directly comparable. Individual metrics may measure different properties of the images concerned and operate in different sub-domains. Hualin et al. [80] present a study of image comparison metrics to quantify the magnitude of difference between a visualization of a computer simulation and a photographic image captured from an experiment. Normalization of MODs is thus necessary to make them comparable. In this section, we describe and discuss several metrics classified into three categories: spatial domain, spatial-frequency and perceptually-based approaches.
1.1 Spatial domain metrics

This group of metrics operates in the spatial domain of images and derives an evaluation by examining statistical properties of the images.
1.1.1 Mean squared error

The mean squared error (MSE) of an estimator is one of many ways to quantify the amount by which an estimator differs from the true value of the quantity being estimated. MSE measures the average of the square of the error, i.e. the amount by which the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In the case of image analysis, we examine the MOD between two images pixel by pixel in the form of the squared error of a pair of pixel intensities, and derive its measurement as:
$MSE = \frac{1}{WH}\sum_{x=1}^{W}\sum_{y=1}^{H}(a_{x,y}-b_{x,y})^2$  and  $RMSE = \sqrt{MSE}$

where $W$ and $H$ are the image width and height, and $a_{x,y}$ and $b_{x,y}$ are the intensities of pixel $(x, y)$ in $a$ and $b$, respectively. The Root Mean Squared Error is simply the square root of the MSE. With colour images, the difference between two pixels can be calculated using different colour spaces (RGB, LUV, etc.).

MSE is one of the simplest and most popular comparison metrics because of its analytical tractability, but it has long been accepted that it is inaccurate in predicting perceived distortion [31]. This is clearly illustrated in the following paradoxical example. Figures 1b and 1c were created by adding different types of distortion to the original image, which is shown in Figure 1a. The root mean squared error (RMSE) between each of the distorted images and the original was computed. The RMSE between the more distorted image and the original is 8.5, while the RMSE between the less distorted image and the original is 9.0. Although the RMSE of Figure 1b is less than that of Figure 1c, the distortion introduced in the first is more visible than the distortion added to the second. We see that the root mean squared error is a poor indicator of perceptual image fidelity.

Zhou Wang et al. [73] state that MSE is not suitable in the context of measuring the visual perception of image fidelity. The following implicit assumptions are provided in order to demonstrate this poor behaviour:

1. Signal fidelity is independent of temporal or spatial relationships between the samples of the original signal. In other words, if the original and distorted signals are randomly re-ordered in the same way, then the MSE between them will remain unchanged.

2. Signal fidelity is independent of any relationship between the original signal and the error signal. For a given error signal, the MSE remains unchanged regardless of which original signal it is added to.

3. Signal fidelity is independent of the signs of the error signal samples.

4. All signal samples are equally important to signal fidelity. Since all image pixels are treated equally in the formulation of the MSE, image content-dependent variations in image fidelity cannot be accounted for.
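As a concrete illustration, here is a minimal NumPy sketch of MSE/RMSE, together with a check of assumption 1 above: identically re-ordering both images leaves the MSE unchanged.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equal-sized greyscale images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def rmse(a, b):
    """Root mean squared error."""
    return float(np.sqrt(mse(a, b)))

rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(8, 8))
b = rng.integers(0, 256, size=(8, 8))

# Re-ordering both images with the same permutation does not change the
# MSE: the metric is blind to spatial structure.
perm = rng.permutation(a.size)
a2 = a.ravel()[perm].reshape(a.shape)
b2 = b.ravel()[perm].reshape(b.shape)
assert np.isclose(mse(a, b), mse(a2, b2))
```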
1.1.2 Normalized MSE

Normalized MSE tries to reduce the sensitivity of MSE to global shifts of image intensities. The mean of each compared image is first normalized to 0 by shifting the intensity of each pixel of $a$ as $a'_{x,y} = a_{x,y} - \mu_a$, where $\mu_a$ is the mean intensity of $a$. The image is further scaled as $a''_{x,y} = a'_{x,y}/s_a$, where $s_a$ is the standard deviation of $a$, after which the variance of the image is 1. The MSE is then computed on the normalized images.
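A sketch of this normalization, showing that a global intensity shift (or a uniform gain) no longer affects the score:

```python
import numpy as np

def normalize(img):
    """Zero-mean, unit-variance version of an image."""
    img = np.asarray(img, dtype=np.float64)
    return (img - img.mean()) / img.std()

def normalized_mse(a, b):
    """MSE computed on mean/variance-normalized images."""
    return float(np.mean((normalize(a) - normalize(b)) ** 2))

a = np.arange(16.0).reshape(4, 4)
# A global brightness shift leaves the normalized MSE at zero.
assert np.isclose(normalized_mse(a, a + 50.0), 0.0)
```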
1.1.3 Template matching

Template matching is a commonly used technique in pattern recognition [33]. The basic method uses a convolution mask (template), tailored to a specific feature of the search image, which we want to detect. The convolution output will be highest at places where the image structure matches the mask structure, where large image values get multiplied by large mask values.
In order to compare two images, we examine the cross-correlation (or auto-correlation) sequences to determine whether a test image contains a template image. Consider two images without any object shift. The conventional cross-correlation function of two images $a$ and $b$ is:

$\gamma(a,b) = \sum_{x=1}^{W}\sum_{y=1}^{H} a_{x,y}\, b_{x,y}$

However, this value depends on the absolute intensities of $a$ and $b$. The MOD metric is therefore defined as the normalized cross-correlation:

$\gamma(a,b) = \dfrac{\sum_{x,y}(a_{x,y}-\mu_a)(b_{x,y}-\mu_b)}{\sqrt{\sum_{x,y}(a_{x,y}-\mu_a)^2\;\sum_{x,y}(b_{x,y}-\mu_b)^2}}$

where $\mu_a$ and $\mu_b$ are the mean intensities of $a$ and $b$, respectively.
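The normalized cross-correlation can be written directly from the formula above; it returns 1 for identical images and is invariant to positive linear intensity changes:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation (correlation coefficient) of two images."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

a = np.arange(16.0).reshape(4, 4)
assert np.isclose(ncc(a, a), 1.0)
assert np.isclose(ncc(a, 2.0 * a + 10.0), 1.0)  # gain/offset invariance
```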
1.2 Spatial-frequency domain metrics

Many traditional image comparison metrics operate entirely in the spatial-frequency domain. Images are first normalized and transformed to the Fourier domain using the FFT. A contrast sensitivity function (CSF), which models the sensitivity to spatial frequencies, is then applied to the resultant magnitudes. The MOD between the two resultant images is then measured using MSE. In this section, we first introduce the CSF and examine different CSF filters in the literature.

Mannos-Sakrison's filter

The Mannos-Sakrison CSF is applied to each Fourier magnitude $f_{u,v}$, where $r = \sqrt{u^2 + v^2}$ is the radial spatial frequency of the coefficient at $(u, v)$, using the function $M$:

$M(r) = 2.6\,(0.0192 + 0.114\,r)\,e^{-(0.114\,r)^{1.1}}$

$M$ has a peak of value 1 approximately at $f = 8.0$ cycles/degree, and is meaningless for frequencies above 60 cycles/degree. Figure 2 shows the contrast sensitivity function $M$.
1.2.3 Daly's filter

The Visible Differences Predictor (VDP) proposed by Daly [16] is one of the most well-established algorithms for evaluating image fidelity. It is presented in more depth in Section 1.3.2. Rushmeier et al. [65] adapted the CSF of the VDP in a spatial-frequency domain pipeline for evaluating rendering quality against a captured image. The CSF is applied to filter the Fourier magnitudes $f_{u,v}$, and has the following form:

$D(r) = \left(\frac{0.008}{r^3} + 1\right)^{-0.2} \cdot 1.42\,r\,e^{-0.3r}\sqrt{1 + 0.06\,e^{0.3r}}$
1.2.4 Ahumada's filter

Ahumada [40] proposed a CSF that is a balanced difference of two Gaussians:

$A(r) = a_c\,e^{-(r/f_c)^2} - a_s\,e^{-(r/f_s)^2}$

where $f_c$ and $f_s$ are the center and surround lowpass cut-off frequencies, and $a_c$ and $a_s$ are the center and surround amplitudes. A default set of parameters is $a_c = 1$, $a_s = 0.685$, $f_c = 97.32$ and $f_s = 12.16$, but these values can be modified (see [40] for further details). Like the Mannos-Sakrison and Daly filters, Ahumada's filter is most sensitive to the middle range of spatial frequencies.
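The general spatial-frequency pipeline of Section 1.2 can be sketched with Ahumada's filter as the CSF. Note that the mapping from pixel frequencies to cycles/degree depends on viewing distance and display resolution, so the scale used below (`max_cpd`) is an assumed placeholder, not a calibrated value:

```python
import numpy as np

def ahumada_csf(r, a_c=1.0, a_s=0.685, f_c=97.32, f_s=12.16):
    """Balanced difference-of-Gaussians CSF over radial frequency r."""
    return a_c * np.exp(-(r / f_c) ** 2) - a_s * np.exp(-(r / f_s) ** 2)

def csf_weighted_mse(a, b, max_cpd=60.0):
    """FFT both images, weight coefficients by the CSF, compare with MSE."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    h, w = a.shape
    fy = np.fft.fftfreq(h)[:, None]   # cycles/pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    # Assumed mapping: Nyquist (0.5 cycles/pixel) -> max_cpd cycles/degree.
    r = np.sqrt(fx ** 2 + fy ** 2) * 2.0 * max_cpd
    weight = ahumada_csf(r)
    diff = (np.fft.fft2(a) - np.fft.fft2(b)) * weight
    return float(np.mean(np.abs(diff) ** 2))

a = np.random.default_rng(1).random((16, 16))
assert csf_weighted_mse(a, a) == 0.0
```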
1.3 Perceptually-based metrics

A collection of image comparison metrics, commonly called perceptually-based or HVS metrics, have been developed to simulate some features of the HVS. Many HVS metrics adopt a pipeline which in principle can be viewed as an extension of the general pipeline described in Section 1.2. These metrics are of great value for photorealistic rendering. Knowledge of the behaviour of the HVS can be used to speed up rendering by focusing computational effort on areas of an image with perceivable errors. Accounting for such HVS limitations enables computational effort to be shifted away from areas of a scene deemed to have a visually insignificant effect on the solution's appearance, and into those areas that are most noticeable. Several attempts have been made to incorporate what is known about the HVS into the image synthesis process.
1.3.1 The human visual system

Perception is the process by which humans, and other organisms, interpret and organize sensation in order to understand their surrounding environment. The response of the human eye to light is a complex, still not well understood process. It is difficult to quantify due to the high level of interaction between the visual system and complex brain functions. A sketch of the anatomical components of the human eye is shown in Figure 3.

Figure 3: Cross section of the human eye (illustration by Mark Erickson)
The main structures are the iris, lens, pupil, cornea, retina, vitreous humor, optic disk and optic nerve. The path of light through the visual system begins at the pupil, is focused by the lens, then passes onto the retina, which covers the back surface of the eye. The retina is a mesh of photoreceptors, which receive light and pass the stimulus on to the brain. Several concepts arise when talking about the Human Visual System:

Visual acuity. The human eye is less sensitive to gradual and sudden changes in brightness in the image plane but has higher sensitivity to intermediate changes. Acuity decreases with increasing distance.

Depth perception. The ability to see the world in three dimensions and to perceive distance. Images projected onto the retina are two-dimensional; from these flat images three-dimensional worlds are constructed.

Perceptual constancy. The tendency to perceive objects as stable despite changes in the actual pattern of light falling on the retina.

A number of psychophysical experimental studies have demonstrated many features of how the Human Visual System works. However, problems arise when trying to generalize these results for use in computer graphics. This is because experiments are usually conducted under limited laboratory conditions and are typically designed to explore a single dimension of the HVS.
1.3.2 Visible differences predictor

Daly's VDP [16] is an HVS-based image quality metric which takes two images as input and produces a probability map for difference detection as output. It consists of three main functional components, namely amplitude non-linearity, contrast sensitivity function and detection mechanisms.

Amplitude non-linearity simulates the adaptation of the HVS to local luminance. It applies a non-linear response function to each pixel in the input images, assuming that the adaptation results from an observer fixating a small image area.

A contrast sensitivity function simulates the variations in visual sensitivity of the HVS, and models the variations as a function of spatial frequency. The process is similar to that described in Section 1.2.3, applying an FFT, followed by Daly's CSF, to each image.

Detection mechanisms simulate the spatial-frequency selectivity of the HVS by decomposing each image into 31 independent streams. Multiple detection mechanisms are then applied to the corresponding streams of the two images. These mechanisms include computation of contrasts, application of a masking function to increase the threshold of detectability, and use of a psychometric function to predict the probability of detecting a difference at every location in each stream. Finally, the detection probabilities for all streams are combined into a single image that describes the overall probability for every location.

Figure 4: Block diagram of the Visible Differences Predictor. Heavy arrows indicate parallel processing of the spatial frequency and orientation channels.
1.4 Tone mapping

Tone mapping is a technique used to map one set of colours to another, often to approximate the appearance of high dynamic range images on media with a more limited dynamic range. Print-outs, CRT or LCD monitors, and projectors all have a limited dynamic range which is inadequate to reproduce the full range of light intensities present in natural scenes. Essentially, tone mapping addresses the problem of strong contrast reduction from the scene values to the displayable range while preserving the image details and colour appearance important to appreciate the original scene content. The human eye is sensitive to relative luminance rather than absolute luminance. Taking advantage of this allows the overall subjective impression of a real environment to be replicated on some display media.

Tone reproduction operators can be classified according to the manner in which values are transformed. Single-scale operators proceed by applying the same scaling transformation to each pixel in the image; that scaling only depends on the current level of adaptation, and not on the real-world luminance. Multi-scale operators take a differing approach and may apply a different scale to each pixel in the image. Tone reproduction operators are useful for giving a measure of the perceptible difference between two luminance levels at a given level of adaptation. This function can then be used to guide algorithms where there is a need to determine whether some process would be noticeable or not to the end user.
1.4.1 Single scale tone reproduction operators
Tumblin and Rushmeier [71] were the first to apply the dynamics of tone reproduction to realistic image synthesis. Building on the brightness model of Stevens and Stevens [67], they proposed a tone reproduction operator to match the brightness of the real scene to the brightness of the computed image displayed on a CRT.
Applying Stevens' equation, which relates brightness to target luminance, the perceived brightness Bw of a real-world luminance Lw, for an observer adapted to a luminance La(w), is computed as:

Bw = 10^β(La(w)) · Lw^α(La(w))

where

α(La(w)) = 0.4 log10(La(w)) + 1.519
β(La(w)) = −0.4 (log10 La(w))² − 0.218 log10 La(w) + 6.1642

If it is assumed that a display observer viewing a CRT screen adapts to a luminance La(d), the brightness Bd of a displayed luminance value Ld can be similarly expressed using α(La(d)) and β(La(d)). To preserve the appearance of the scene, Bw must equal Bd, which yields the display luminance:

Ld = 10^((β(La(w)) − β(La(d))) / α(La(d))) · Lw^(α(La(w)) / α(La(d)))
This represents the concatenation of the real-world observer and the inverse display observer model. Ward [74] proposed a linear transform that achieves a similar result at a reduced computational expense, transforming real-world luminances Lw to display luminances Ld through a scaling factor m:

Ld = m · Lw
The consequence of adaptation can be thought of as a shift in the absolute difference. The scaling factor m dictates how to map luminances from the world to the display such that a Just Noticeable Difference (JND) in world luminances maps to a JND in display luminances (a JND is the smallest detectable difference between a starting and secondary level of a particular sensory stimulus, luminance in this case).

A critical aspect of tone mapping is the visual model used. As we move through different environments or look from place to place within a single environment, our eyes adapt to the prevailing conditions of illumination, both globally and within local regions of the visual field. These adaptation processes have dramatic effects on the visibility and appearance of objects and on our visual performance.
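As an illustration, a minimal sketch of the Tumblin and Rushmeier brightness matching, using the Stevens α and β coefficient functions quoted above; function and variable names are ours:

```python
import math

def stevens_alpha(la):
    # Exponent of Stevens' brightness function at adaptation luminance la.
    return 0.4 * math.log10(la) + 1.519

def stevens_beta(la):
    # Offset term of Stevens' brightness function at adaptation luminance la.
    return -0.4 * math.log10(la) ** 2 - 0.218 * math.log10(la) + 6.1642

def tumblin_rushmeier(lw, la_w, la_d):
    """Map a world luminance lw (observer adapted to la_w) to a display
    luminance (observer adapted to la_d) by equating the two Stevens
    brightnesses, Bw = Bd."""
    aw, ad = stevens_alpha(la_w), stevens_alpha(la_d)
    bw, bd = stevens_beta(la_w), stevens_beta(la_d)
    return 10 ** ((bw - bd) / ad) * lw ** (aw / ad)
```

When the two adaptation luminances coincide, the operator reduces to the identity, as expected from the derivation.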
2 Image compression
Texture mapping is employed on rendering systems to increase the visual complexity of a scene without increasing its geometric complexity [61]. Texture compression can help to achieve higher graphics quality with given memory and bandwidth or reduce memory and bandwidth consumption without degrading quality too much. There are many compression techniques for images, most of which are geared towards compression for storage or transmission. In choosing a compression scheme for texture mapping there are several issues to consider:
Decoding speed. A critical feature of the compression scheme is fast decompression, so that the time necessary to access a single texture pixel is not severely impacted.
Random access. Since the order in which texels are accessed during rendering is hard to predict, texture compression schemes must provide fast random access to pixels in the texture.
Compression rate and visual quality. While lossless compression schemes preserve the exact content of a texture, they achieve much lower compression rates than lossy schemes. However, using a lossy compression scheme introduces errors into the textures.
Encoding speed. Encoding and decoding are asymmetric processes: decoding speed is essential, while fast encoding is useful but not necessary.

Image compression can be lossy or lossless. Lossless compression is sometimes preferred for artificial images such as technical drawings, icons or comics, because lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossless compression methods may also be preferred for high-value content, such as medical imagery or image scans made for archival purposes. We classify image compression techniques in the next two sections. Finally we discuss several approaches to compress textures with repeated content.

Texture compression is a specialized form of image compression designed for storing texture maps in 3D computer graphics rendering systems. Unlike conventional image compression algorithms, texture compression algorithms are optimized for random access. Most texture compression algorithms involve some form of fixed-rate lossy vector quantization of small fixed-size blocks of pixels into small fixed-size blocks of coding bits, sometimes with additional pre-processing and post-processing steps. Block Truncation Coding is a very simple example of this family of algorithms. Because their data access patterns are well-defined, texture decompression may be executed on-the-fly during rendering as part of the overall graphics pipeline, reducing overall bandwidth and storage needs throughout the graphics system. As well as texture maps, texture compression may also be used to encode other kinds of rendering maps, including bump maps and surface normal maps. Texture compression may also be used together with other forms of map processing such as MIP maps and anisotropic filtering.
2.1 Lossy compression methods
Indexed colour. The colours of the image are restricted to a small palette, so each pixel stores a short index into a colour lookup table instead of a full colour value.
Vector quantization. Images can be compressed using a codebook of image blocks [30, 4]. Vector quantization uses small blocks (e.g. 4×4 pixels), because with larger blocks the codebook cannot capture the variety of block patterns present in the image without growing excessively.
Chroma subsampling. This technique takes advantage of the fact that the eye perceives spatial changes of brightness more sharply than those of colour, by averaging or dropping some of the chrominance information in the image [10].
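As a toy sketch of the averaging variant, here is 4:2:0-style subsampling of a chroma plane: each 2×2 block is replaced by its mean (assumes even dimensions; names are illustrative):

```python
def chroma_420(plane):
    """Average each 2x2 block of a chroma plane, given as a list of
    equal-length rows of samples. Assumes even width and height."""
    out = []
    for r in range(0, len(plane), 2):
        row = []
        for c in range(0, len(plane[0]), 2):
            # Replace the 2x2 block by the mean of its four samples.
            block = [plane[r][c], plane[r][c + 1],
                     plane[r + 1][c], plane[r + 1][c + 1]]
            row.append(sum(block) / 4)
        out.append(row)
    return out
```

The luma plane is kept at full resolution; only the chroma planes are reduced this way.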
Block decomposition. The image is decomposed into small blocks encoded with a few local colours. S3 Texture Compression (S3TC), sometimes also called DXTn or DXTC, is an implementation of the block decomposition technique. It breaks a texture map into 4×4 blocks of texels. For opaque texture maps, each of these texels is represented by two bits in a bitmap, for a total of 32 bits. In addition to the bitmap, each block also has two representative 16-bit colours in RGB565 format associated with it. These two explicitly encoded colours, plus two additional colours that are derived by uniformly interpolating the explicitly encoded colours, form a four-colour lookup table. This lookup table is used to determine the actual colour at any texel in the block. In total, the 16 texels are encoded using 64 bits, or an average of 4 bits per texel. Decoding blocks compressed in S3TC format is straightforward. A two-bit index is assigned to each of the 16 texels. A four-colour lookup table (see Figure 5) is then used to determine which 16-bit colour value should be used for each texel. The decoder can be operated at very high speeds and replicated to allow parallel decoding for very high performance solutions.
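The decoding procedure just described can be sketched as follows; this is a hedged illustration of the opaque (four-colour) DXT1 mode only, with helper names of our own:

```python
import struct

def rgb565_to_rgb888(c):
    # Expand a packed 16-bit RGB565 colour to three 8-bit channels.
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return (r * 255 // 31, g * 255 // 63, b * 255 // 31)

def decode_dxt1_block(block):
    """Decode one 64-bit S3TC/DXT1 block (two RGB565 colours followed by
    32 bits of 2-bit texel indices) into a 4x4 grid of RGB triples.
    Assumes the opaque mode."""
    c0, c1, bits = struct.unpack("<HHI", block)
    col0, col1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    # Two extra palette entries are interpolated from the explicit colours.
    col2 = tuple((2 * a + b) // 3 for a, b in zip(col0, col1))
    col3 = tuple((a + 2 * b) // 3 for a, b in zip(col0, col1))
    palette = [col0, col1, col2, col3]
    # Each texel's 2-bit index selects one of the four palette colours.
    return [[palette[(bits >> (2 * (4 * y + x))) & 0x3] for x in range(4)]
            for y in range(4)]
```

Since decoding is a table lookup per texel, it maps naturally onto the fast, parallel hardware decoders mentioned above.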
Transform coding. A transform coding scheme applies a mathematical transform to the image and chooses which of the resulting information to discard, thereby lowering its bandwidth. The remaining information can then be compressed via a variety of methods. When the output is decoded, the result may not be identical to the original input, but is expected to be close enough for the purpose of the application. The common JPEG image format is an example of transform coding, one that examines small blocks of the image and averages out the colour using a discrete cosine transform to form an image with far fewer colours in total.
Fractal compression. The fractal compression technique [37] relies on the fact that in certain images, parts of the image resemble other parts of the same image. Fractal algorithms convert these parts, or more precisely, geometric shapes, into mathematical data called fractal codes which are used to recreate the encoded image. Fractal compression differs from pixel-based compression schemes such as JPEG, GIF and MPEG since no pixels are saved. Once an image has been converted into fractal code its relationship to a specific resolution has been lost; it becomes resolution independent.
2.2 Lossless compression methods
Run-length encoding. Runs of identical values are stored as a single value together with a repetition count, rather than as the original sequence.
Entropy encoding. Entropy coding creates and assigns a unique prefix code to each unique symbol that occurs in the input. These entropy encoders then compress data by replacing each fixed-length input symbol by the corresponding variable-length prefix codeword. The length of each codeword is approximately proportional to the negative logarithm of its probability; therefore, the most common symbols use the shortest codes.
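A compact sketch of the classic entropy coder, Huffman coding; the helper below computes only the per-symbol code lengths, which is the property discussed above (names are our own):

```python
import heapq

def huffman_code_lengths(freqs):
    """Build a Huffman tree over a dict of symbol frequencies and return
    the prefix-code length per symbol: frequent symbols get short codes."""
    # Each heap entry: (total frequency, tie-breaker, {symbol: depth}).
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        fa, _, la = heapq.heappop(heap)   # two least frequent subtrees
        fb, _, lb = heapq.heappop(heap)
        # Merging pushes every contained symbol one level deeper.
        merged = {s: d + 1 for s, d in {**la, **lb}.items()}
        heapq.heappush(heap, (fa + fb, tie, merged))
        tie += 1
    return heap[0][2]
```

With frequencies 5, 2, 1 the most frequent symbol receives a 1-bit code and the other two 2-bit codes, matching the negative-log-probability intuition.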
Dictionary-based encoding. Lempel-Ziv-Welch (LZW) is a widely used dictionary-based algorithm. A particular LZW compression algorithm takes each input sequence of bits of a given length and creates an entry in a table for that particular bit pattern, consisting of the pattern itself and a shorter code. As input is read, any pattern that has been read before results in the substitution of the shorter code.
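The scheme just described can be sketched as follows, operating on bytes rather than arbitrary-length bit patterns (a simplification of ours):

```python
def lzw_compress(data):
    """Minimal LZW sketch over bytes: grow a dictionary of seen sequences,
    emitting the code of the longest known prefix at each step."""
    # Initialise the table with all single-byte patterns (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    out = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate           # pattern seen before: keep extending
        else:
            out.append(table[current])    # emit code for the known prefix
            table[candidate] = next_code  # register the new, longer pattern
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out
```

On `b"ABABABA"` the repeated "AB"/"ABA" patterns are replaced by the newly created codes 256 and 258, so seven input bytes become four output codes.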
2.3 Compression of images with repetitive patterns
Compression of images with repetitive patterns is an interesting problem in the field of image compression. Repetitive patterns can be readily observed in man-made environments (buildings, wallpapers, floors, tiles, windows, fabric, pottery and decorative arts) and in nature (the arrangement of atoms, honeycombs, animal fur, gait patterns, feathers, leaves, waves of the ocean, and patterns of sand dunes). To simulate the real world on computers faithfully, textures with repetitive patterns deserve special attention.

Humans have an innate ability to perceive and take advantage of symmetry [50]. Rao and Lohse [64] show that regularity plays an important role in human texture perception. Mathematically speaking, regular texture refers to periodic patterns that present non-trivial translation symmetry, with the possible addition of rotation, reflection and glide-reflection symmetries [57]. When studying periodic patterns, a useful fact from mathematics is the answer to Hilbert's 18th problem: there is only a finite number of symmetry groups for all possible periodic patterns in dimension n. When n = 1 there are seven frieze groups, and when n = 2 there are 17 wallpaper groups. Here "group" refers to the symmetry group of a periodic pattern, composed of the transformations that keep the pattern invariant.

Most of the work on compressing near-regular images is related to the field of texture synthesis: the process of algorithmically constructing a large digital image from a small digital sample image by taking advantage of its structural content. Procedural texture synthesis [59] is based on generative models of texture. The sample-based approach does not require any previous texture model, yet has the capability of reproducing textures with an appearance similar to the input sample [34]. The class of textures that yield good results for texture-synthesis algorithms remains an open question. Lin et al. [51] compare several texture-synthesis algorithms.
Their results show that general-purpose synthesis algorithms fail to preserve the structural regularity on more than 40% of the tested samples. These results demonstrate that faithful near-regular texture synthesis remains a challenging problem. Vector quantization (see page 17) is a good option when compressing near-regular textures, but the difficulty is that even if the image content is highly repetitive, the rigid placement of the blocks implies that they will most often be unique. Other techniques try to make the placement of the blocks more adaptive. Liu et al. [77] and Hays et al. [38] analyze near-regular textures to infer lattice structures and local deformations. Epitomic analysis of the image is also used [39]. The epitome of an image is its miniature, condensed version containing the essence of the textural and shape properties of the image, and is built from patches of various sizes from the input image. Wang et al. [72] propose an epitomic analysis that enables random-access reconstruction of the image, making it more suitable for interactive applications.
3 Texture atlases
The lack of batching decreases the performance of graphical applications. A batch consists of a number of render-state changes followed by a draw-call. Submitting hundreds, or worse, thousands of batches per frame inevitably makes an application CPU-limited due to inherent driver overhead.

Figure 6: Example of a texture atlas
The use of texture atlases reduces the number of batches caused by having to repeatedly bind different textures. The atlas comprises a set of charts, each of which maps a connected part of the 3D surface (a patch) onto a piece of the 2D texture domain. Models using these packed atlases need to remap their texture coordinates.

One way proposed to generate an atlas is to make a chart for each triangle. The simplified triangles are then packed into texture space and sampled to generate texture maps [66]. However, seams may appear between triangles due to bilinear interpolation between adjacently packed triangle charts. Alternatively, triangles may be clustered into patches which are then parameterized as charts [22]. However, if the matching boundary edges differ in length or orientation in the texture domain, it is still difficult to eliminate subtle seams along the boundaries (even if a one-texel padding is applied just outside the charts).

Recognizing that seams are an important problem with atlases, various approaches have been developed to minimize their effect. The seams may be forced into regions of high negative curvature [49], making them less apparent. As an alternative, an image fidelity metric [79] can be used to minimize the visual effect of seams. Work on procedural solid texturing has produced a multi-resolution texture atlas [9], which uses standard mip-mapping on graphics hardware. This texture atlas has several desirable properties, including control of the sampling rate across the surface and efficient use of the entire texture space. However, the scheme used still generates seams between charts except at the highest-resolution mipmap level.

Newer approaches avoid seams by parameterizing the surface onto regular charts [62]. While stored discontinuously, neighbouring charts have corresponding samples and a continuous interpolation can be defined along the surface. To avoid splitting the geometry along chart boundaries, Tarini et al. [69] parameterize surfaces on the faces of a regular polycube: a set of fixed-size cubes surrounding the object. Polycube maps define a continuous, tileable texture space. However, the fixed resolution has to be carefully chosen to match the geometric features, the construction requires manual intervention, and a triangle mesh is required to encode the parameterization. To enable texturing of implicit surfaces and avoid explicit parameterization altogether, Benson et al. [5] and DeBry et al. [18] proposed to encode texture data in an octree surrounding the surface. This provides low-distortion and adaptive texturing, at the expense of a space and time overhead. Such methods are particularly well suited for interactive painting on 3D objects, where the intrinsic adaptive sampling of the octree structure reduces the waste of memory exhibited by fixed-resolution 2D maps. More recently, Lefebvre et al. [45] (see Figure 7) and Lefohn et al. [47] have proposed GPU
implementation of octree-textures, encoding them in simple 2D or 3D textures, adapted to efficient access by the fragment shader.

Figure 7: TileTree by Lefebvre et al. [45]. Left: A torus is textured by a TileTree. Middle: The TileTree positions square texture tiles around the surface using an octree. Right: The tile map holding the set of square tiles.
Most of the existing literature focuses on how to build seamless texture atlases for continuous photometric detail, but little effort has been devoted to devising efficient techniques for encoding self-repeating, discontinuous signals such as building facades. When each polygon of the input model is assigned a different texture, seam artifacts do not occur and simpler strategies for packing the charts into an atlas are possible. In this thesis we focus on this type of model.
4.1 Terrain LOD algorithms
Terrain LOD algorithms use a hierarchy of mesh renement operations to adapt the surface tessellation. Algorithms can be categorized by the structure of these hierarchies as follows:
Irregular meshes. This technique requires the tracking of mesh adjacencies and refinement dependencies but provides the best approximation for a given number of faces. Some hierarchies visually soften the transition between two levels of triangulation [15], while others allow arbitrary connectivities [17].
Bin-tree hierarchies. The regular structure of bin-tree hierarchies enables compact data layout and traversal algorithms [52]. However, these semi-regular meshes still involve random-access memory references and immediate-mode rendering.
Bin-tree regions. Bin-tree regions [48] define coarser-grain refinement operations on regions arranged in a bin-tree structure. Precomputed triangulated regions are uploaded to buffers in video memory.
Tiled blocks. Tiled blocks partition the terrain into square patches that are tessellated at different resolutions [35, 12]. The main challenge is to stitch the block boundaries seamlessly.
Geometry clipmaps. Geometry clipmaps [54] cache the terrain in a set of nested regular grids centered about the viewer. These grids represent filtered versions of the terrain at power-of-two resolutions, and are stored as vertex buffers in video memory. As the viewpoint moves, the clipmap levels shift and are incrementally refilled with data. Rather than defining a world-space quadtree, the geometry clipmap defines a hierarchy centered about the viewer. The approach has parallels with the LOD treatment of images in texture mapping, as it is based on texture clipmaps [68] (see Section 4.2).
4.2
Huge urban models often require textures that exceed the main memory capacity. A common method for dealing with large textures subdivides a huge texture image into small tiles of sizes directly supportable by typical graphics hardware. This approach provides good paging granularity for the system, both from disk to main memory and from main memory to texture memory.

The problem of texture complexity has been addressed in several approaches. The clipmaps of Tanner et al. [68] use a dynamic texture representation that efficiently caches textures of arbitrarily large size in a finite amount of physical memory. Cline and Egbert [13] proposed a software approach for large texture handling: at runtime, they determine the appropriate mipmap level for a group of polygons based on the projected screen size of the polygons and the corresponding area in texture space. In another terrain viewing application, Lindstrom et al. [60] use the angle at which textures are viewed to reduce texture requests compared to using a simple distance metric.
Lefebvre et al. [46] proposed a GPU-based approach for large-scale texture management of arbitrary meshes. The novel idea of their approach is to render the texture coordinates of the visible geometry into texture space to determine the necessary texture tiles for each frame. However, the method requires a cost-intensive fragment shader, and the geometry has to be rendered multiple times per frame.

Carr and Hart [9] introduced a texture atlas for real-time procedural solid texturing. They partition the mesh surface into a hierarchy of surface regions that correspond to rectangular sections of the texture atlas. This structure supports mipmapping of the texture atlas because textures of triangles are merged only for contiguous regions on the mesh surface. Frueh et al. [27] described an approach to create texture maps for 3D city models. The technique includes the creation of a specialized texture atlas for building facades and supports efficient rendering for virtual fly-throughs. Hesina et al. [29] described a texture caching approach for complex textured city models; their approach is restricted to interactive walk-throughs. Lakhia [42] proposed an out-of-core rendering engine which applies the cost-and-benefit approach of the Adaptive Display algorithm by Funkhouser and Séquin [28] to hierarchical levels of detail (HLOD) [23], achieving interactive rendering of detailed city models. To support texturing, they store downsampled versions of the original scene textures with each HLOD. Buchholz and Döllner [6] presented a level-of-detail texturing technique that creates a hierarchical data structure for all textures used by scene objects (see Figure 8) and derives texture atlases at different resolutions. At runtime, their texturing technique requires only a small set of these texture atlases, which represent scene textures at an appropriate size depending on the current camera position and screen resolution.
Figure 8: Distance-dependent texture selection proposed by Buchholz and Döllner [6]
Another way to handle large textures is the use of a large virtual texture built as a stochastic tiling of a small set of texture image tiles [14]. The tiles may be filled with texture, patterns, or geometry that, when assembled, create a continuous representation. Li-Yi Wei [75] extended tile-based texture mapping on graphics hardware.
O(n log n) time, where n is the number of elements to be packed. The algorithm can be made much more effective by first sorting the list of elements into decreasing order, although this does not guarantee an optimal solution, and for longer lists may increase the running time of the algorithm. Most of the contributions in the literature are devoted to the case where the items to be packed have a fixed orientation with respect to the stock unit, i.e. one is not allowed to rotate them.
5.1 Two-dimensional models
The first attempt to model two-dimensional packing problems was made by Gilmore and Gomory [58]. They proposed a column generation approach based on the enumeration of all subsets of items (patterns) that can be packed into a single bin. Let Aj be a binary column vector of n elements aij, where aij takes the value 1 if item i belongs to the j-th pattern, and the value 0 otherwise. With A the matrix composed of all possible Aj columns, the model is:

min Σj xj

subject to:

Σj aij xj = 1,  (i = 1, ..., n)
xj ∈ {0, 1}

where xj takes the value 1 if pattern j belongs to the solution, and 0 otherwise. It is easy to see the immense number of columns that can appear in A; the standard approach is therefore to dynamically generate columns when needed.
Beasley [3] considered a two-dimensional cutting problem in which a profit is associated with each item, and the objective is to pack a maximum-profit subset of items into a single bin.

Figure 9: The Fekete and Schepers modeling approach
A completely different modeling approach has been proposed by Fekete and Schepers [26], through a graph-theoretical characterization of the packing of a set of items into a single bin. Let Gw = (V, Ew) (resp. Gh = (V, Eh)) be the graph with a vertex vi for each item and an edge (vi, vj) whenever the projections of items i and j onto the horizontal (resp. vertical) axis overlap (see Figure 9). They proved that, if the packing is feasible, then:

1. For each stable set S of Gw, Σ(vi ∈ S) wi ≤ W (and, symmetrically, for each stable set S of Gh, Σ(vi ∈ S) hi ≤ H);

2. Ew ∩ Eh = ∅.
5.2 Approximation algorithms
In the next sections we present off-line algorithms (algorithms which have full knowledge of the input). Two classical constructive heuristics and some metaheuristic techniques are presented.
5.2.1 Strip packing
Coffman et al. [21] extended two classical approximation algorithms for one dimension to the two-dimensional strip packing problem. They assume that the items are sorted by non-increasing height and packed in levels:
The Next-Fit Decreasing Height (NFDH) algorithm packs the next item, left justified, on the current level (initially, the bottom of the strip), if it fits. Otherwise, the level is closed, a new current level is created (as a horizontal line drawn on the top of the tallest item packed on the current level), and the item is packed, left justified, on it.

The First-Fit Decreasing Height (FFDH) algorithm packs the next item, left justified, on the first level where it fits, if any. If no level can accommodate it, a new level is created as in NFDH. Figure 10 shows the result of an FFDH packing.
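A minimal sketch of FFDH under the stated assumptions (items given as (width, height) pairs, each narrower than the strip; data layout and names are our own):

```python
def ffdh(items, strip_width):
    """First-Fit Decreasing Height. Returns a list of levels, each stored
    as [used_width, level_height, placed_items]. Assumes every item fits
    within the strip width."""
    # Sort items by non-increasing height.
    items = sorted(items, key=lambda wh: wh[1], reverse=True)
    levels = []
    for w, h in items:
        for level in levels:
            if level[0] + w <= strip_width:  # first level where the item fits
                level[0] += w
                level[2].append((w, h))
                break
        else:
            # No existing level can accommodate the item: open a new level
            # whose height is that of its first (tallest) item.
            levels.append([w, h, [(w, h)]])
    return levels
```

The total strip height used is simply the sum of the level heights, which is the quantity the approximation bounds in [21] refer to.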
5.2.2 Bin packing
Chung et al. [24] studied the following two-phase approach. The first phase of the Hybrid First-Fit (HFF) algorithm consists of executing the FFDH algorithm of the previous section to obtain a strip packing. Consider now the one-dimensional instance obtained by defining one item per level, with size equal to the level height, and bin capacity H. It is clear that any solution to this instance provides a solution to the two-dimensional problem. Hence, the second phase of HFF obtains a two-dimensional solution by solving the induced one-dimensional instance through the First-Fit Decreasing one-dimensional algorithm. The same idea can also be used in conjunction with NFDH and BFDH. The time complexity of the resulting algorithm remains O(n log n).
5.2.3 Metaheuristics
Metaheuristic techniques are nowadays a frequently used tool for the approximate solution of hard combinatorial optimization problems. We refer to Aarts and Lenstra [20] for an introduction to this area.
The search space includes both feasible solutions and solutions in which some of the items overlap. During the search, the objective function is the total pairwise overlapping area, and the neighbourhood contains all the solutions corresponding to vertical or horizontal item shifts. As soon as a new feasible solution improving the current one is found, an upper bound is fixed to its height.
5.3 Exact algorithms
An enumerative approach for finding the optimal solution of bin packing was proposed by Martello et al. [56]. Initially, the items are sorted by non-increasing height, and a reduction procedure determines the optimal packing of a subset of items in a portion of the strip, thus possibly reducing the instance size. Fekete and Schepers [25] recently developed an enumerative approach, based on their model (see Section 5.1), for the exact solution of the problem of packing a set of items into a single bin, determining through binary search the minimum height such that all the items can be packed into a single bin of base W and height H.
Part IV
Space-optimized texture atlases
In this section we present an efficient technique for generating space-optimized texture atlases for the particular case of 3D buildings. Our method allows the visualization of a huge city in real-time. We also encode self-repeating and discontinuous signals such as building facades, reducing the spatial redundancy, and compress these self-repeating details according to perceptual measures.

Texture atlases are a well-known technique used to reduce the lack of batching (see Section 3). With the introduction of Texture Arrays it is possible to store a collection of images with identical size and format, arranged in layers. An array texture is accessed as a single unit in a programmable shader, using a single coordinate vector: a single layer is selected, and that layer is then accessed as though it were a one- or two-dimensional texture. Some articles use this technique to replace the typical texture atlas; Sylvain Lefebvre [44] presents an approach to display properly filtered tilings. Although the benefits of texture arrays are clear (transparent integration, only one binding per array), we still use texture atlases for two main reasons:

1. Texture arrays require all the layered images to have the same dimensions. Our approach supports varying texture dimensions to benefit from perceptual-driven and user-defined texel size compression (see Sections 7.1 and 7.2).

2. Texture arrays require DirectX 10 or higher, disabling compatibility with several graphics cards.
6.1
Our optimization scheme takes as input a tuple (M, I), where M is a textured polygonal mesh and I = {Ii} is the set of texture images used by M. M contains well-defined edges and thus texture coordinates (s, t) are specified per-corner. We also allow input (s, t) coordinates in M to be outside the range [0, 1] to support repeating textures. Without loss of generality we assume s ≥ 0, t ≥ 0.
In order to facilitate the integration of our scheme with LOD techniques, the user must provide a parameter l with an upper bound of the desired texel size in object space. This parameter can be easily computed from the viewing range associated with each geometry LOD level, so that the screen projection of each texel approximately matches one on-screen pixel. We introduce in this Section a new pipeline, depicted on Figure 11, to generate a texture atlas: 1.
Each input image Ii is downsampled according to the user-defined texel size l (see Section 7.1). The downsampling accounts for the number of times each texture is repeated and the ws × hs surface patch each tile is mapped onto. For example, the texture atlas for a detailed LOD level may use l = 5 cm/texel, while the atlas for a coarser LOD level may use l = 50 cm/texel.
3. Texture images are further downsampled by per-image stretch factors according to perceptual-based image saliency measures (see Section 7.2). This allows us to further reduce the size of some textures with little perceptual impact on the rendered model.

4. Texture packing. Texture images are packed into a single texture atlas while minimizing unused space (see Section 7.3.2). We restrict ourselves to rectangular tiles, as most tiled textures used in modeling are rectangular to mimic the REPEAT wrapping modes of 3D APIs.

5.
6.2 Real-time visualization
Once all the required texture atlases have been created, we are able to visualize the whole three-dimensional model in real-time. Three different parts of this process are described:
Texture wrapping. This stage, described in Section 8.1, involves the process of wrapping each chart of the atlas onto the polygons of the scene, achieving a wrapping scheme able to compress the repetitive patterns efficiently. We present a wrapping scheme supporting mip-mapping and DXTC compression.
Building visualization. The set of input buildings is partitioned using the quadtree presented in Section 8.2.1. The control of level of detail is done with a texture atlas tree, which guarantees a maximum number of texture batches, increasing the performance.
Terrain visualization. Each tile has an associated set of textures from lowest to highest resolution. The control of level of detail is done in a way similar to the building visualization.
Let wi × hi be the dimensions of each input image Ii, and let ws × hs be the (object-space) size of the surface patch Ii is mapped onto. If l·wi > ws or l·hi > hs, the image is downsampled using a bilinear filter. Since the resulting downsampled image respects the user-defined texel size in object space, we can safely assume that the image detail lost in this step will be visually indistinguishable under the viewing conditions associated with the LOD level the texture atlas will be used for.
For a set of polygons P = {p1, p2, ..., pn} mapped onto the same texture, we assume all of them have a similar area precision (meters/pixel). We want to obtain the new dimensions of the texture that will be inserted in the atlas, fitting the new texel size l. The texel size lp of a polygon p is:

lp_width(p) = p_width / p_textureWidth
lp_height(p) = p_height / p_textureHeight

and we want these values to match the user-defined texel size l. Notice that we take the minimum between the downsampled dimension and the available texture space, to avoid the use of unnecessary memory space.
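The downsampling rule can be sketched as follows; the function name and the rounding policy are illustrative assumptions of ours, not the thesis' exact implementation:

```python
def atlas_dimensions(wi, hi, ws, hs, l):
    """Given an input texture of wi x hi texels mapped onto a ws x hs
    (object-space) surface patch, return the texture dimensions that
    respect an upper bound l on the texel size, never exceeding the
    source resolution."""
    # Target resolution so that one texel covers l object-space units.
    target_w = ws / l
    target_h = hs / l
    # Take the minimum with the available texture size: going past the
    # source resolution would only waste atlas space.
    return (min(wi, max(1, round(target_w))),
            min(hi, max(1, round(target_h))))
```

For instance, a 512×512 facade texture covering a 12.8 m × 6.4 m patch at l = 5 cm/texel would be reduced to 256×128 texels.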
7.2 Perceptual-driven texture compression
In Section 1 we reviewed different metrics used to obtain a measure of the similarity between two images. We have also seen that different metrics may measure different properties, so it is difficult to compare them. In our case we want to measure the error introduced by the further subsampling of an image: given an image and a maximum error threshold, we want to return a new subsampled image with an error not greater than the selected maximum error. The objective is to construct a perceptual-driven texture compression bounded by a given maximum error. This lets us decrease the image quality within a tolerance. In the next sections we present a generic texture compression approach and two different image comparison metrics used by our method.

Given an input image Ii and an error threshold ε, our algorithm computes a subsampled image Io such that the difference between Ii and Io, measured by some perceptual-based metric M, does not exceed ε. Therefore our algorithm must search for a pair (wo, ho) representing the final size of Io; we restrict (wo, ho) to a minimum size of 4 × 4 texels. An error value of zero means that the two images are identical, and an error value of one means that they are totally different from the viewpoint of the metric. So we must fulfil the following condition:

M(Ii, Io) ≤ ε, with M(Ii, Io) ∈ [0, 1]    (1)
Since analytical formulae expressing the perceptual image difference in terms of the downsampling factor are rather complex, we adopt a much simpler approach: our algorithm performs a dichotomic (binary) search for the optimum (wo, ho) values. A first option is to perform a dichotomic search for wo ∈ [4..wi] satisfying the error threshold, and then repeat the process for ho ∈ [4..hi]. This algorithm requires log2(wi) + log2(hi) comparisons; for a 512×512 input size, this amounts to 18 image difference evaluations per input image. However, since (wo, ho) are correlated, this approach tends to produce anisotropically-scaled images depending on which coordinate is optimized first. Therefore we adopted a different approach, consisting in searching for the optimum (wo, ho) simultaneously, using wo and ho alternately at each step of the binary search (i.e. instead of recursively splitting the 1D interval [4..wi] and then the 1D interval [4..hi], we split the 2D rectangular interval [4..wi] × [4..hi] alternating horizontal and vertical subdivisions). An example of how the search space for the optimum (wo, ho) is reduced at each step of the binary search is shown in Figure 12.

To approximate the dimensions of the output image Io we use a binary search to detect the minimum size whose error is less than or equal to the desired threshold. The algorithm first decreases the image width while the error does not exceed the threshold (see Algorithm 1), and then decreases the height using the same method. The subsampling of an image is done using a bilinear filter. For better results, a second pass is also done decreasing first the image height and then the width, and the smaller result is kept (see Algorithm 2). In practice we have not seen significant differences between the output images when swapping the order of the decrease. In order to make the comparison between the subsampled image and the original image possible, we resize the subsampled image back to the original size, also using a bilinear filter.
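As a rough sketch of the alternating search, the following Python fragment walks the 2D interval [4..wi] × [4..hi], halving one dimension per step. The `error(w, h)` callback is hypothetical: it stands in for evaluating M between the input image and the candidate subsampled at (w, h), and is assumed to decrease monotonically with size.

```python
def find_output_size(wi, hi, error, threshold, min_dim=4):
    """Alternating binary search for the smallest (wo, ho) with error <= threshold."""
    lo_w, hi_w = min_dim, wi
    lo_h, hi_h = min_dim, hi
    turn = 0  # 0: subdivide the width interval, 1: the height interval
    while lo_w < hi_w or lo_h < hi_h:
        if turn == 0 and lo_w < hi_w:
            mid = (lo_w + hi_w) // 2
            if error(mid, hi_h) <= threshold:
                hi_w = mid       # still within tolerance: try a smaller width
            else:
                lo_w = mid + 1   # too much error: keep more resolution
        elif turn == 1 and lo_h < hi_h:
            mid = (lo_h + hi_h) // 2
            if error(hi_w, mid) <= threshold:
                hi_h = mid
            else:
                lo_h = mid + 1
        turn = 1 - turn
    return hi_w, hi_h
```

With a perfectly tolerant metric the search collapses to the minimum 4×4 size, as expected.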
Figure 12: Example of search space reduction using binary search. Each point of the square represents a texture size (w, h). Upper row: search on w (first three steps) followed by search on h (three more steps). Lower row: alternating search on w, h. Note that in general both strategies converge to different final sizes.
Algorithm 1 Subsampling an image in one direction with error metric
function subsample_image (I, ε, direction)
    lower = 0 {lower bound}
    upper = 1 {upper bound}
    εc = 0 {current error}
    while εc ∉ [ε − δ, ε + δ] do {δ is a tolerance threshold}
        c = (lower + upper) / 2
        subsample I along direction with scale factor c and compute εc
        if εc > ε + δ then
            lower = c {too much error: subsample less}
        else if εc < ε − δ then
            upper = c {unused error margin: subsample more}
        else
            record the current result as the last valid subsampled image
        end if
    end while
    return last valid subsampled image

Algorithm 2 Compression of an image with error metric
function visual_metric_compression (image, error)
    subsampled_wh = subsample_image (image, error, width)
    subsampled_wh = subsample_image (subsampled_wh, error, height)
    subsampled_hw = subsample_image (image, error, height)
    subsampled_hw = subsample_image (subsampled_hw, error, width)
    if size (subsampled_wh) > size (subsampled_hw) then
        return subsampled_hw
    else
        return subsampled_wh
    end if
7.2.2 Root mean squared error metric (RMSE)

The first error metric we tested for the compression is the mean squared error, which operates on the spatial domain (see Section 1.1.1). For a pair of pixels we take their intensity and calculate the squared deviation. The global mean squared error is a value between 0 and 1 that quantifies the amount by which the estimator differs from the true value of the quantity being estimated. We try to reduce the sensitivity of RMSE to a global shift of image intensities by using the normalized squared error metric (see Section 1.1.2) instead of the typical RMSE.

In Figure 13 we see an estimation of the error caused by subsampling using the RMSE. The original image is a facade detail shown on the lower right part of the figure. The upper left graphic shows the error due to the width subsampling, the upper right one the error due to the height subsampling, and the lower left graphic shows the simultaneous subsampling of width and height. We see that the error tends to increase too slowly as we increase the subsampling level. This is a well-known problem of RMSE (see Zhou Wang et al. [73]). The difference between one pixel and the corresponding subsampled and resampled pixel of a lower detail level changes slowly: the subsampled colour is interpolated using a bilinear filter and tends to be similar to the pixel with more level of detail. In fact the comparison is done in a local manner, and this is not a desirable property if we are trying to quantify the error of subsampling. So we clearly see that the mean squared error does not distinguish well the critical decrease of detail and the perceptual impact it has on the user. More comparisons that illustrate this behaviour are shown in Section 10.1.

Figure 13: RMSE error of the subsampling of a facade detail
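A minimal Python sketch of this kind of spatial-domain comparison (an illustration, not the thesis implementation; images are flat lists of intensities in [0, 1]):

```python
import math

def rmse(ref, test):
    """Root mean squared error between two equally sized intensity images."""
    assert len(ref) == len(test) and ref
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.sqrt(mse)  # stays in [0, 1] for intensities in [0, 1]
```

An image compared with itself yields 0; an all-black image against an all-white one yields 1, the two extremes of the normalized error range used above.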
7.2.3 Human visual system error metric (HVSE)
In order to improve on the sensitivity of the RMSE metric we turned to the HVS-based metrics (see Section 1.3). Our human visual system metric is based on the paper of Yee et al. [78], which describes a perceptually based image comparison process that can be used to tell when images are perceptually identical even though they contain imperceptible differences. Their technique has shown much utility in the production testing of rendering software and focuses on the VDP (see Section 1.3.2). The VDP gives the per-pixel probability of detection given two images to be compared and a set of viewing conditions. Daly's model takes into account three factors of the Human Visual System (HVS) that reduce the sensitivity to error. The first, amplitude non-linearity, states that the sensitivity of the HVS to contrast changes decreases with increasing light levels. This means that humans are more likely to notice a change of a particular magnitude in low light conditions than the same change in a brighter environment. Secondly, the sensitivity decreases with increasing spatial frequency. For example, a needle is harder to spot in a haystack than on a white piece of paper due to the higher spatial frequency of the haystack. Finally, the last effect, masking, takes into account the variations in sensitivity due to the signal content of the background. Yee et al. [78] also use an abridged version of the VDP in the same way as Ramasubramanian et al. [63], in which they drop the orientation computation when calculating spatial frequencies, and extend the VDP by including the colour domain in computing the differences.

Figure 14: Framework of the CIELAB colour model
We assume that the reference image and the image to be compared are in the RGB colour space, so the first step is the conversion of the images into the CIELAB colour space, a colour space designed to be perceptually uniform, where the Euclidean distance between two colours corresponds to perceptual distance (see Figure 14). To compute the threshold elevation factor F, a spatial frequency hierarchy is constructed from the luminance channel, computed using the Laplacian pyramid of Burt and Adelson [7]. The pyramid enables computing the spatial frequencies present in the image to determine how much the sensitivity to contrast changes decreases with increasing frequency. The pyramid is constructed by convolving the luminance channel with a separable filter. Following Ramasubramanian et al. [63], we compute the normalized Contrast Sensitivity Function (CSF) multiplied by the masking function given in [16] to obtain the combined threshold elevation factor F.
We compute some of the intermediate variables from the field of view (fov) and the image width:

    pixelsPerDegree = width / (fov · 180/π)
    cyclesPerDegree = pixelsPerDegree / 2

where fov is the horizontal field of view in radians and width is the number of pixels across the screen.
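The two viewing-condition quantities above can be sketched in Python; the radians-to-degrees conversion reflects our reading of the units stated in the text.

```python
import math

def pixels_per_degree(width, fov):
    """Pixels spanned by one visual degree; fov is the horizontal FOV in radians."""
    return width / math.degrees(fov)

def cycles_per_degree(width, fov):
    """Highest representable spatial frequency: half the pixels per degree."""
    return pixels_per_degree(width, fov) / 2.0
```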
The top level of the Laplacian pyramid corresponds to frequencies at cyclesPerDegree cycles per degree, and each level thereafter is half the frequency of the preceding level. Typical values for fov and width are given in the next section. The CSF is computed as a function of cycles per degree and luminance.

Finally, two tests are performed to know if the images are perceptually different. If the difference of luminance between two corresponding pixels in the reference and test images is greater than a threshold, then the first test fails; the threshold is defined by the average of the pixels in a one degree radius around the pixel. The second test compares the a and b channels of the reference and test images, using a scale factor that turns off the colour test in the mesopic and scotopic luminance ranges (night time light levels), where colour vision starts to degrade.

We have made some modifications to the original algorithm of Yee et al. [78] to adapt it to our requirements. We do not want to decide only whether two images are different: we want to quantify this difference. So we take the total number of pixels that failed the HVSE test and divide it by the total number of pixels of the image, giving us a value between 0 and 1. A value of 0 then means that the images are identical and 1 that they are totally different in terms of human perception.

Figure 15: HVSE error of the subsampling of a facade detail

In Figure 15 we see an estimation of the error caused by subsampling using the HVSE, with the same presentation scheme as used in Figure 13. The luminance for the HVS calculations has been set to a default of 100 cd/m². Note that the y axis of Figure 15 spans a larger range than the y axis of Figure 13. As we see, HVSE has more sensitivity to the effect of subsampling than RMSE in all three tests and is more reactive to the increase of subsampling. As we have more error range with HVSE, we also have more resolution, and furthermore the metric is designed to behave in accordance with the human visual system. So we decided to use HVSE for our compression scheme based on visual tolerance, although it is slower than the classical RMSE. In Section 10.1 we show more examples of this low RMSE sensitivity.
7.3 Texture atlas packing

We need to pack a set of rectangular textures of varying sizes into a single texture atlas. The size of a texture atlas must be a power of two to make an efficient use of texture hardware memory. We use a Binary Space Partition [70] to define the space occupied by each input texture. An overview of our bin packing is shown in Algorithm 3. The input parameters are a collection of textures, an area precision (meters per pixel) and a visual metric tolerance. We first sort the input textures from the biggest to the smallest; this is a common step, also used by strip bin packing [21], to optimize the occupied space. The next step consists in predicting the minimum size of the texture atlas and increasing it while we do not have enough space. Finally we optimize the size of the inserted textures, growing them progressively. In the next sections we explain each of these steps in detail.
Algorithm 3 Overview of the atlas packing
Sort textures from biggest to smallest
Calculate the minimum size of texture atlas
valid_size = Insert all the textures in the atlas
while (not valid_size) do
    Increase the size of the texture atlas to the next valid size
    valid_size = Insert all the textures in the atlas
end while
Optimize the size of the inserted textures
7.3.1 Predicting the minimum size of the texture atlas

We define the initial size of the texture atlas taking into account the sum of the sizes of all input textures. Let I = i1, i2, ..., in be the set of input textures, and let

    s = ⌈ log2( Σ_{j=1}^{n} iwidth_j · iheight_j ) ⌉

    width = 2^⌈s/2⌉        height = 2^⌊s/2⌋

This formula returns a width and height that are powers of two and a minimal size able to contain the input textures. To increase the size of the texture atlas to the next greater valid size, we just add one unit to the exponent s.
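The prediction can be sketched in a few lines of Python (our reading of the formula above: the exponent of the covering power of two is split between width and height):

```python
import math

def predict_atlas_size(textures):
    """Smallest power-of-two (width, height) whose area covers all input textures."""
    total = sum(w * h for w, h in textures)  # total texel area to pack
    s = math.ceil(math.log2(total))          # exponent of the covering power of two
    return 2 ** math.ceil(s / 2), 2 ** (s // 2)
```

For example, four 256×256 textures occupy 2^18 texels, so the prediction is a 512×512 atlas; adding one unit to s would yield 1024×512.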
7.3.2 Texture atlas binary tree

A node of the texture atlas binary tree defines a rectangular region of the texture atlas. The root of the tree defines the whole texture space. A leaf node is either filled with a texture or unoccupied; otherwise it has two child nodes that together cover all the space of the parent node (see Figure 16). We call the insert function (see Algorithm 4), passing it the root node and the width and height of the texture to insert. If the node is too small we just return. We accept the node if the texture fits it perfectly. Otherwise we split the node in the direction (by width or height) that provides more space for the input image and recursively call the insert function. So the insertion takes O(log(n)) time.
Algorithm 4 Inserting an image
function insert (node)
    if node is empty and is a leaf then
        if node is too small then
            return
        end if
        if the image fits the node perfectly then
            store the image in node and return
        end if
        split node by width or height
        insert (node.left)
    else
        insert (node.left)
        insert (node.right)
    end if

7.3.3 Optimizing texture space
We also optimize the occupied space of the input textures after we have inserted them into the hierarchy. Since we have the restriction of only dealing with texture sizes that are powers of two, a considerable amount of unused space may appear. The optimization process has two steps, explained below. First we perform a binary search, taking as lower bound the current size of each inserted texture and as upper bound the maximum size of the texture; we iterate, trying to optimize the stretch and obtain the maximum occupied space (see Algorithm 6). Then, for each pair of leaves of the BSP tree where one is filled with a texture and the other is empty, we expand the filled one to occupy the empty space (see Algorithm 5). This enables us to obtain a final 100% occupation of the texture atlas. Packing results are described in Section 10.2.
Algorithm 5 Expanding filled leaves
for each pair of sibling leaves do
    if right node is empty and left node is filled then
        Expand left node to occupy also the right node
    end if
end for

Algorithm 6 Optimizing texture stretch
repeat
    for each texture do
        New texture size equals (lower bound size + upper bound size) / 2
    end for
    Insert stretched textures to atlas
    for each texture do
        if atlas is valid then
            lower bound = new texture size
        else
            upper bound = new texture size
        end if
    end for
until convergence of lower and upper bounds
8.1.1 Texture coordinates

We have tested several ways to encode the texture coordinates of the charts packed into a texture atlas. For any chart we have the following parameters (all dimensions are in the normalized range [0..1]):

1. Origin of the chart O
2. Size of the chart S
Our texture atlases support periodic texture tiling by mimicking OpenGL's GL_REPEAT wrapping mode. Since all the charts are rectangular, this support can be efficiently implemented in a fragment shader with minimum processing overhead. Suppose that, in the input model, a primitive had assigned a periodic texture T. Let O, S be the origin and size of the chart of T inside the atlas (see Figure 17). Typically, texture atlas construction implies replacing the local (s, t) coordinates by global (s', t') coordinates to reflect the new parameterization. In our texture atlases all charts are axis-aligned rectangles, so conversion to global coordinates can be done using this equation:

    ( s' )   ( Sx  0   Ox ) ( s )
    ( t' ) = ( 0   Sy  Oy ) ( t )
                            ( 1 )

Unfortunately, the above conversion from local to global coordinates is valid only for input coordinates (s, t) in the range [0, 1].
Table 1: Space and processing overheads of the three options considered for tiling periodic images

    Option                          App→GPU overhead    VBO compatibility
    (O, S) as a uniform             4 floats/T          No
    (O, S) as vertex attributes     12 floats/T         Yes
    (O, S) packed into (s, t)       none                Yes
For tiled textures we keep the original (s, t) coordinates, interpolated on a per-fragment basis, and let the fragment shader perform the conversion to global coordinates using this straightforward formula:

    ( s' )   ( Sx  0   Ox ) ( fract(s) )
    ( t' ) = ( 0   Sy  Oy ) ( fract(t) )        (2)
                            (    1     )

where now (s, t) are the interpolated per-fragment coordinates: we are multiplying the fractional part of the original texture coordinates by the chart size and then adding the offset of the origin. The fragment shader computes the global (s', t') coordinates and uses them for the texture lookup. Algorithm 7 of the appendix shows the simple GLSL shader code of the texture mapping using the mapping function of Equation 2.

With the standard texture lookup functions the implicit level of detail is selected as follows: for a texture that is not mip-mapped, the texture is used directly, but if it is mip-mapped and running in a fragment shader, the LOD computed by the implementation is used to do the texture lookup. If it is mip-mapped and running on the vertex shader, then the base texture is used. This is not valid using a texture atlas, because we are referencing a local coordinate for each chart and we need to specify the derivatives using the local coordinates. So we enable the extension GL_ARB_shader_texture_lod in order to use the function texture2DGrad, which performs a texture lookup with explicit gradients (visit www.opengl.org/registry/doc/GLSLangSpec.Full.1.40.05.pdf for more information).

We have explored three different options for passing the tuple (O, S) to the GPU (see Table 1). In the following discussion we use GLSL terminology. The first option is to encode (O, S) in a vec4 uniform per primitive. This implies an overhead of storing and transmitting 4 floats per triangle. Unfortunately, the current OpenGL specification does not support binding uniform variables to buffer objects, i.e., uniforms cannot be modified in a vertex array. Therefore this approach is not compatible with using Vertex Buffer Objects (VBOs) for rendering groups of primitives, which is the most efficient rendering mode in current graphics hardware.

A second option is to send (O, S) as vertex attributes. Attributes can be bound to buffer objects, and hence this option is compatible with VBOs. The memory requirements for the application are the same as in the previous option. The transmission overhead, though, depends on the rendering mode: for VBOs, the attribute must be specified on a per-corner basis (OpenGL only supports a single index buffer for vertex arrays). This leads to a transmission overhead of 12 floats per triangle.

The third option we considered attempts to minimize space and transmission overheads (see Section 8.1.2). A key observation is that texture coordinates are represented in 32-bit single precision floating-point format. However, the maximum texture size supported by state-of-the-art graphics hardware is 4096×4096. Addressing a texel on such a texture requires only 12 bits. Therefore, we can use the unused bits to encode (O, S) into the (s, t) coordinates themselves.
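Equation 2 in Python form, as a CPU-side sketch of what the fragment shader does (`fract` follows the GLSL definition x − floor(x); the function name is ours):

```python
import math

def atlas_coords(s, t, origin, size):
    """Map local, possibly tiled (s, t) into the chart's global atlas coordinates."""
    fract = lambda x: x - math.floor(x)  # GLSL fract()
    ox, oy = origin
    sx, sy = size
    return sx * fract(s) + ox, sy * fract(t) + oy
```

Coordinates one full tile apart land on the same atlas texel, which is exactly the GL_REPEAT behaviour being emulated.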
8.1.2 Texture coordinate compression

We make some assumptions on the size and position of the charts in order to allow texture coordinate compression. First of all we consider that the maximum size of a texture atlas is 4096x4096 pixels and the maximum size of each chart is 512x512 pixels. Also the size of the atlas and of the charts must be a multiple of four, so we only need 10 bits to encode the origin and 7 bits to encode the size. About the original (s, t) coordinates, the integer part must be in the [0..63] range (6 bits) and the fractional part in the [0..511] range (9 bits). All these restrictions allow us to encode with two unsigned integers (32 bits each) all the required information for texture mapping (see Figure 18):

1. Original texture coordinate s or t (15 bits)
   (a) Integer part (6 bits)
   (b) Fractional part (9 bits)
2. The origin of the chart, Owidth or Oheight (10 bits)
3. The size of the chart, Swidth or Sheight (7 bits)

So we only have to send two packed components per vertex. We have seen that generally this memory layout fits well with our test application, but the amount of bits used for each encoded attribute can be modified to match the requirements of another application.

Figure 18: Encoding of a compressed texture coordinate
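A Python sketch of one packed component; the field order and the size bias are our assumptions (origins and sizes are multiples of four, and sizes are stored as size/4 − 1 so that the 512-texel maximum fits in 7 bits):

```python
def pack(coord_int, coord_frac, origin, size):
    """Pack one texture-coordinate component plus chart origin/size into 32 bits."""
    assert 0 <= coord_int < 64 and 0 <= coord_frac < 512
    assert origin % 4 == 0 and 0 <= origin < 4096
    assert size % 4 == 0 and 4 <= size <= 512
    return (coord_int << 26) | (coord_frac << 17) | ((origin // 4) << 7) | (size // 4 - 1)

def unpack(word):
    """Recover (integer part, fractional part, origin, size) from one 32-bit word."""
    return (word >> 26, (word >> 17) & 0x1FF,
            ((word >> 7) & 0x3FF) * 4, ((word & 0x7F) + 1) * 4)
```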
8.1.3 Texture coordinate decompression

The decompression of the packed coordinates (see Section 8.1.2) is done in the vertex program stage. The two encoded unsigned integers are decompressed using the integer operators provided by the extension EXT_gpu_shader4 of Nvidia (visit http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt for more information). Algorithm 7 of the appendix shows the simple GLSL shader code of the decompression of coordinates.
8.1.4 Mip-mapping the texture atlas

Mip-mapped textures are essential for achieving any kind of rendering performance. Packing mip-mapped textures into an atlas, however, seems to imply that the mipmaps of the charts combine, until eventually the lowest mip-level is a single 1x1 texel. We see that the use of texture mipmapping in combination with a texture atlas is a more complex problem than using a single image. So the first problem arises when we want to determine the lowest mip-level that we are able to use. In Figure 19 we see a texture atlas with a set of charts represented by different colours. When we reach the 1:8 reduction level, texels contain an incorrect mix of the upper level charts.

Figure 19: Uninitialized texels at the 2x2 and 1x1 mipmaps for an atlas containing 8x8 and 4x4 textures
When we pack textures directly into an atlas, their texels are never combined (just copied), and no smearing or cross-pollution occurs. But when we generate the mip-map chain, uninitialized texels may appear. In the technical report of Nvidia [1] related to texture atlases, a simple solution for this problem is proposed: even generating mip-map chains of atlases on the fly with a two-by-two box filter does not pollute mip-maps with neighbouring texels, if the atlas is a power-of-two texture and contains only power-of-two textures that do not unnecessarily cross power-of-two lines. As the various mip-levels are generated, texels of separate textures do not combine. Also, because textures can differ in size and large textures have longer mip-chains than smaller textures, the largest texture packed into an atlas determines the number of mip-map levels in the atlas. So they abridge the mip-chain of an atlas to the length of the mip-chain of the smallest texture contained in the atlas. However, that would typically have severe performance and image quality implications. Furthermore, the restriction that each chart must have power-of-two dimensions drastically reduces the available chart sizes.

More problems arise when we want to support texture filtering: the method used to determine the texture colour for a texture-mapped pixel, using the colours of nearby texels. Artifacts may appear at the borders as we use texels from foreign textures to filter. For the highest resolution mip-level, a possible solution is to clamp the texture coordinates to the chart edge, sampling a border texel at its center: even when bilinear filtering (in this method the four texels nearest to the pixel center are sampled at the closest mipmap level, and their colours are combined by a weighted average according to distance) is enabled, only that texel contributes to the filtered output. So the new texture coordinates must be inset half a texel from each chart edge, i.e. (s, t) must lie in [1/(2w), 1 − 1/(2w)] × [1/(2h), 1 − 1/(2h)] within the chart, where w and h are the chart dimensions in texels.
While bilinear filtering of the highest resolution mip-level is safe, anisotropic filtering of the same mip-level does potentially access unrelated neighbouring texels. Worse, bilinear and anisotropic filtering of all lower mip-maps also access unrelated neighbouring texels, as Figure 20 demonstrates. The leftmost figure shows the sampling of corner texels at the highest mip-level clamping to the edge, so we have no neighbour contribution to the filtering. But in the two next figures, corresponding to lower mip-levels, the same coordinates are no longer dead-center, producing an incorrect filtering.
Figure 20: Bilinear filtering of lower mip-levels accesses texels from unrelated neighbouring textures
Taking into account all the problems mentioned, we developed a mipmapping and texture atlas filtering scheme capable of doing bilinear filtering and generating a mip-map chain without the restriction of having only power-of-two chart dimensions. The first constraint is introduced by the texture coordinate compression (see Section 8.1.2): each chart must have dimensions that are a multiple of four. This restriction is much less strict than requiring a power-of-two dimension: for example, in a texture atlas with 1024 pixels of width, we have 256 possible dimensions with our method, while with the power-of-two restriction we have only ten. The maximum number of mip-map levels per atlas is two, as the smallest dimension we may have is four. But that is not a significant problem, as we are using a progressive representation of the level of detail through a hierarchy (see Section 8.2). We use a two-by-two box filter for the mip-map chain generation, guaranteeing that texels of separate textures do not combine as long as we use no more than two mip-levels. To provide bilinear filtering we considered two different techniques:

1. The first solution is to use a fragment shader to clamp atlas coordinates to the corresponding charts, taking into account which mip-level the texture operation is about to access. For the highest mip-level the atlas coordinates remain unchanged, yet for lower mip-levels the atlas coordinates are remapped closer to the center. This technique requires pixel shader 2.0 support and a comparatively complex and expensive shader.

2. The second solution is to pad textures and their mip-chains with border texels. That consumes more space, but provides a transparent integration in the texture filtering pipeline, and the mipmap chain does not involve any special requirement aside from the typical box subsampling. In the next section we explain this approach and its integration with our technique in detail.
8.1.5 Chart padding

We add a border to each chart that replicates the repeating effect. Let (Owidth, Oheight) be the chart origin and (Dwidth, Dheight) its size; then, for a border of b texels, the padded origin (Ofwidth, Ofheight) and size (Dfwidth, Dfheight) are:

    (Ofwidth, Ofheight) = (Owidth − b, Oheight − b)
    (Dfwidth, Dfheight) = (Dwidth + 2b, Dheight + 2b)

The cost in extra texels of this padding is:

    (Dwidth + 2b)(Dheight + 2b) − Dwidth · Dheight = 2b (Dwidth + Dheight) + 4b²

In our case b = 2, so the size penalty is approximately four times the sum of the width and height of a chart. We are sacrificing spatial cost to transparently integrate the method in the graphics pipeline. This amount of extra size will not be critical as long as the maximum number of mip-levels per atlas is two.
This padding enables us to select the available mip-levels depending on the border. Let M be the number of mip-map levels supported (where 0 is the base texture) and fb(M) the required border width in texels:

    fb(M) = 0          if M = 0
    fb(M) = 2^(M−1)    if M > 0        (3)
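Equation 3 as a one-liner (the function name is ours):

```python
def border_texels(m):
    """Required padding border for a chart that must survive m mip-map levels."""
    return 0 if m == 0 else 2 ** (m - 1)
```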
If we do not have mip-maps, no border is required thanks to the clamp-to-edge technique. If we have one mip-map, a border of one texel is enough; with two mip-levels we use a border of two texels. We want the interpolation in the lower mip-levels to take only the colours of the chart (not those of the neighbours). As we see in Figure 21, we have a 16x16 chart and we want to support two mip-levels, so we add a border of two texels. As we reduce the resolution through the lower mip-levels, the corner sampling points clamped to the edge never reach beyond the chart's own border texels. This property ensures that the final interpolated colour of the borders always takes the proper colours.

Figure 21: Correct bilinear filtering scheme in a 16x16 chart with 2 border texels and 2 mip-levels
In Figure 22 we see an incorrect bilinear filtering scheme using a border of only one texel while storing two mip-levels. In the first mip-level reduction the corners still get valid colours, but at the lowest mip-level the corners take colours from neighbouring charts, giving incorrect results.

Figure 22: Incorrect bilinear filtering scheme in a 16x16 chart with 1 border texel and 2 mip-levels
Finally, the required border for a given mip-level M is given by fb(M) and is the result of a power-of-two exponential function (see Equation 3).
8.1.6 DXTC compression

The DXTC system introduced in Section 2.1 provides a method to compress textures and decompress them in real time. Furthermore, graphics hardware has optimized support to make this decompression fast. Thanks to these advantages, we decided to encode our texture atlases using a DXTC compression scheme.
But a problem appeared when we enabled one or two mip-levels per atlas: even though we applied the constraints described in Section 8.1.4, small artifacts appeared at some boundaries of the textures. Although in most cases the error was almost imperceptible, we tried to find the underlying problem. The problem is clearly located in the compression stage, as we do not have any boundary error without DXTC compression. If we analyze the scheme used by DXTC, we see that it converts a 4x4 block of pixels into a 64-bit or 128-bit quantity, resulting in a lossy compression algorithm. So we have a lossy mixing for each 4x4 cell of the image. In the case of DXT1 (used to encode only colour) we store 16
input pixels in 64 bits of output, consisting of two 16-bit RGB 5:6:5 colour values (c0 and c1) and a 4x4 two-bit lookup table. In the decompression stage, if the first colour value c0 is numerically greater than the second colour value c1, then:

    c2 = (2/3) c0 + (1/3) c1        c3 = (1/3) c0 + (2/3) c1

Otherwise, if c0 ≤ c1:

    c2 = (1/2) c0 + (1/2) c1

and c3 is transparent black, corresponding to the premultiplied alpha format. The lookup table is then consulted to determine the colour value for each pixel, with a value of 0 corresponding to c0 and a value of 3 corresponding to c3. DXT1 does not store alpha data, enabling higher compression ratios.
So, taking into account that the cell used for the block decomposition has a dimension of 4x4, we introduce the following additional condition: for every mip-level, the block decomposition must affect only one chart, so the mixing is never done between neighbouring charts. This ensures that the boundaries will not have artifacts, because there is no incorrect interaction between charts. So, instead of having dimensions that are a multiple of four, the constraint differs depending on the maximum available mip-level M: what we want is to have, at the lowest mip-level, a set of charts with dimensions that are a multiple of four. Let fd(M) be a function that returns the required dimension multiple for a maximum mip-level M:

    fd(M) = 4 · 2^M        (4)

We see that this constraint is less flexible than the one introduced in Section 8.1.4. If we do not have mip-maps (M = 0) the dimensions must be a multiple of four, but for a maximum mip-level of one or two we have the condition of having dimensions that are a multiple of eight and sixteen, respectively. For two levels the restriction leaves fewer dimensions available for each chart, thus losing some compression power. However, there are two points that make this preferable to not using texture compression:

1. Higher compression ratios, varying between 4:1 and 8:1, which fully compensate the cost of truncating the charts to a valid size.

2. Thanks to the hierarchical scheme (see Section 8.2), a maximum of two mip-levels is required for each texture atlas node of the quadtree. This reduces the strength of the limitation described in Equation 4, making DXTC compression suitable for the atlas.
8.2
Real-time visualization
We developed a method to visualize a huge collection of buildings with several textures in real time using our space-optimized texture atlases. Most of the performance gain comes from the fact that the number of texture switches is heavily decreased. Furthermore, we can guarantee that this number will never exceed a given threshold thanks to the space subdivision. In the next sections we present the hierarchical representation of the set of buildings and the level of detail technique used for the texture mapping of repetitive details.
8.2.1 Texture atlas tree
A texture atlas tree, introduced by Buchholz and Döllner [6], defines a quadtree subdivision of the scene in the x-z-plane and a corresponding hierarchy of texture atlases. Each node represents a part of the scene geometry and stores its bounding box. The atlas of an inner node contains downsampled versions of the atlases of its child nodes. The scene geometry is stored in the leaf nodes. For each frame, the tree provides a small collection of texture atlases containing each visible texture of the scene at an appropriate resolution. The computation of the necessary texture resolution is explained in more detail in Section 8.2.2.

Our texture atlas compression algorithm has been designed to be used in combination with a hierarchical subdivision of the scene. In the case of urban models, we use a quadtree encoding a hierarchy of multiresolution texture atlases. This hierarchy can be seen as a coarse-level collection of mipmapped texture atlases (we say it is coarse-level because all primitives associated with a quadtree node share the same mipmap level). The geometry associated with each quadtree node is rendered using a pre-filtered texture atlas whose charts have been scaled down so that their size approximately matches the size of the screen-projection of the polygons they are mapped onto (see Section 7.1), under the viewing conditions causing the quadtree node to be selected for drawing. We use a texture atlas tree to represent all the buildings, and consider that they are placed in a two-dimensional space and classified by building height. Each building is assigned to a quadrant according to the proximity of the quadrant center. Algorithm 9, referred to in the annex, clearly illustrates this process.
However, there are some differences between the texture atlas tree proposed by [6] and our system. Their scheme requires that all the atlases have equal size, but our system does not have this limitation. Furthermore, with their solution two sets of texture coordinates are specified for the triangles, one referring to the original textures and the other referring to the texture atlas of the leaf node; we only use one set of compressed texture coordinates (see Section 8.1.2) per atlas. In Figure 23 we see an example of a texture atlas tree using a bin-tree hierarchy. As we see, this system provides a level of detail technique that minimizes the lack of texture batching. We also see that the total number of texture atlases required for a quadtree of depth n is Σ_{i=0}^{n} 4^i. For each node we store a value that defines the area precision (meters/texel) of the associated
texture atlas. In the preprocessing stage, where we build each texture atlas, we use this area precision to scale the charts to an appropriate dimension. We rst dene the area precision of the leaf nodes of the texture atlas tree and then recursively set the upper levels with a lower area precision. Let
L [1..Nr ] be the current level in the atlas hierarchy (where L = 1 refers to the leaf nodes and L = Nr to the global root node), l the area precision selected for the level 1 and (L) a function that returns the area precision for a given level L. We set this function as: (L) = L1 l > 1 is a scale factor of the subsampling force. The clever selection of the deep tree N and is a key to achieve better visual results and performance. We also have to consider the mipmapping when selecting : dierent mip-levels may be applied to a level L considering the area precision (L + 1) of the upper level. For our implementation, we set = 8, Nr = 2 and l = 0.05. That means that the highest precision is set to 0.05 m/texel and we do not subdivide more than two levels on the quadtree. The
Where the value of factor progressively subsamples lower resolution levels multiplying by eight the area precision. Buchholz and Dllner also add a texture border area (see Section 21) to the required texture area to avoid mixing adjacent textures in downsampled versions of the texture atlas. But for their implementation they use a constant border width of eight texels guaranteeing the avoidance of texture pollution for the rst three down-sampling levels. Indeed, that supposes heavy waste of memory. So we decided to minimize the amount of memory used for the mipmapping, and designed an adaptive scheme capable to use texture atlas with dierent mip-levels. We will need more mip-levels in the texture atlas with highest area precision than for example in the texture atlas of the parent node, where in our implementation reaches a 3.2 meters/texel area precision. Let mip-levels associated to a level texture atlas. So we dene
Let Mip(L) be the number of mip-levels associated to a level L of the tree. Seeing how the mipmap subsampling is done (each mip-level divides the dimensions of the texture atlas by two), we can limit the necessary number of mip-levels as:

    Mip(L) = 0                if L = N
    Mip(L) = log2(σ) − 1      if L < N

For the node with the lowest resolution we do not need any mipmap. For the next nodes with more precision, Mip(L) is defined by the scale factor between the lower and upper levels of resolution. We see that this scale is in fact the scale factor σ = ε(L+1)/ε(L). So in our case we have a fixed requirement on the number of mip-levels for all the levels (Mip(L) = 2) except for the lowest resolution level. We are therefore storing the pyramid of subsampled texture atlases explicitly in the quadtree hierarchy and implicitly in the mip-levels associated to each texture atlas. Table 2 shows the resulting ε(L) and Mip(L) values associated to each level used.
Table 2: Area precision and mip-levels associated to each atlas tree level of our implementation

    Tree level   ε(L) (m/texel)   Mip(L)
    0            0.05             2
    1            0.4              2
    2            3.2              0
artifacts are less noticeable than in the highest resolution level, where we use two mip-levels.
8.2.2
Building rendering
The rendering is performed by a top-down traversal of the texture atlas tree. For each traversed node we perform a view frustum test: if the bounding box is completely outside the view frustum, the node is skipped. If it is visible, we compute the point of the node's bounding box nearest to the viewer and its distance d. Then we compute the screen projection factor S_proj given an area precision. For simplicity, we assume identical scale factors for both screen axes (see Figure 24).

The screen projection of a line segment of world length w viewed from the distance d has maximum size if it is oriented orthogonally to the viewing direction. Let f be the field of view angle. Then:

    S_proj = w / (2 d tan(f / 2))     (5)

If S_proj satisfies the threshold t_res, the current level of resolution is bound and used to wrap the textures. The texture resolution is chosen in a way that the texel-per-pixel ratio is always near one (t_res = 1), so the necessary texture data remains small. However, the texel-per-screen-pixel ratio can be increased to decrease the texture quality and increase the performance.

Figure 24: Screen projection factor scheme
The quadtree subdivision guarantees that for a given maximum depth we also have a maximum number of texture binding switches: for a depth n, at most 4^n. So if our quadtree has a maximum depth of two, the maximum number of texture batches is only sixteen. In some applications with several textures this means a very significant performance gain. Furthermore, for huge environments this batching bound is rarely reached, because it only occurs when we are seeing the whole scene and require the maximum resolution for each node.
8.2.3
Terrain visualization
We render the terrain using aerial photos projected on a planar surface. We use a regular subdivision of the terrain into tiles. Each tile has an associated collection of photos with different resolutions that represent the tile region. To increase the performance, we use a quadtree where the leaves are the tiles and the root represents the whole terrain. The rendering is performed by a top-down traversal of the tile tree. As with building rendering (see Section 8.2.2), for each traversed node we perform a view frustum test and discard it if it is completely outside the view frustum. When we reach the visible leaves, we have to select the appropriate resolution of the texture. We also use the screen projection factor S_proj (see Equation 5) with t_res = 1: let t_length be the tile length and d the length of the vector joining the viewer and the nearest point of the tile; the required resolution S_res is then derived from S_proj, giving the required resolution for a given distance and viewer orientation. To take advantage of the graphics pipeline, we recommend using power-of-two textures to represent the different levels of detail of each tile. In our implementation, we used a resolution of 512x512 for the highest level of detail and 32x32 for the lowest. In Figure 25 we can see a simple 2D scheme of the level of detail used in terrain rendering: the nearest tiles use higher-resolution textures and, as we get farther away, lower-resolution textures are used.

Figure 25: Terrain rendering LOD example
One advantage of loading each resolution version of a tile separately (instead of storing it implicitly in a mipmap pyramid) is that at each rendering frame we only load the necessary texture tiles. We use a simple texture cache scheme that loads the used texture tiles into memory and unloads them when the cache reaches a predefined maximum amount of memory or number of texture units.
Part V
Results
9 Test specifications
9.1 Test model
The model we used to test our technique covers an area of 234.4 km². Tables 3 and 4 show the geometry and texture information of the test model, respectively. Figure 26 shows a set of thumbnails of the facade textures.

Table 3: Test model geometry information

    Triangles
    Vertices
    Buildings
    Building blocks
    Terrain area (km²)
    Total surface area, facades+ceilings (km²)

Table 4: Test model texture information

    Number of distinct textures
    Memory space without compression
    Memory space with DXT1 compression
    Memory space with JPEG compression
    Average resolution (cm²/texel)
    Needed texture memory space for all the city without wrapping
    Needed memory space for all the city without compression
    Needed memory space for all the city with DXT1 compression
9.2
Hardware tested
The hardware used for all the tests was:

Table 5: Hardware specifications

    GPU   Nvidia GeForce GTX 260
    CPU   Intel Core 2 Quad 2.33 GHz
    RAM   3.25 GB
10 Space compression
This section analyzes the image downsampling and compression of a set of eight test images, and the compression achieved on the test model varying the tolerance and the error metric. The specifications of the test images are as follows:

Table 6: Test image specifications

    Name           Dimensions   Snapshot
    Window         512x512      (a)
    Rocktile       500x375      (b)
    Bricktile      499x366      (c)
    Fabrictile     400x300      (d)
    Textureatlas   400x400      (e)
    Crayons        516x341      (f)
    Boat           500x336      (g)
    Aerialphoto    521x512      (h)
10.1
Image downsampling
As said in Section 7.2, we see that HVSE is more sensitive to the effect of downsampling than RMSE. That is a desirable property, because we want to distinguish between different tolerance levels. RMSE reacts poorly to the downsampling even with a reduction of 90% or greater. However, we also see that in some cases HVSE is highly sensitive to even a little downsampling. That may be a restriction if we define error thresholds not greater than the initial minimum threshold. For example, in Figure 30, reducing the width and height by only 1% we already obtain more than 50% error, so if we do not have a greater error threshold we will not obtain any compression. In Figure 33 we have a different case: both metrics start with a similar error but, again, HVSE is more sensitive as the downsampling increases. In general, as depicted in Section 1, we see that RMSE is not suitable in the context of measuring the visual perception of image fidelity: since all image pixels are treated equally, image content-dependent variations in image fidelity cannot be accounted for. Sections 10.1.2 and 10.1.3 also include more tests that demonstrate this poor behaviour.
We see that RMSE shrinks the textures far too fast as the visual error tolerance increases. In some cases, even at 10% error tolerance we obtain a full downsampling to a 1x1 chart size; with 20%, all the tests downsample to 1x1. This implies that we have a limited range of image compression using the method described in Section 7.2.
We use different tolerance values for the tests of this section and Section 10.1.2 because RMSE reacts slowly to the downsampling and has less error tolerance resolution. As we see, because HVSE is more sensitive to the downsampling, it provides more resolution in the range of error metric tolerances; that is also clearly illustrated in Section 10.1.1. Compared with RMSE, which downsamples to a 1x1 chart by the time it reaches 20% error tolerance, we have more range using HVSE. In some cases the compression only starts at tolerance levels greater than 30-40% (e.g. in Figure 50), so HVSE seems to be more restrictive in the initial steps of the user-defined error tolerance. This happens when the images contain more visually perceptual information (see Section 7.2.3 for a detailed explanation of the criteria). We also detect that the stretching is sometimes anisotropic depending on the downsampling direction (e.g. in Figure 45), meaning that the downsampling process distinguishes between the amount of information provided separately by the width and the height.
The following tables use this notation:

- View size: size after the match to view conditions (see Section 7.1).
- View CR: compression ratio due to the match to view conditions.
- Visual size: size after the saliency matching.
- Visual CR: compression ratio with respect to the view size.
- Total CR: total compression ratio achieved.
Window compression results

0.05 m/texel (view size 400x300, view CR 2.18:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   350x75       4.57:1     9.98:1
    RMSE 10%  12x1         10000:1    21845:1
    RMSE 20%  1x1          120000:1   262144:1
    HVSE 10%  398x298      1.01:1     2.21:1
    HVSE 20%  398x298      1.01:1     2.21:1
    HVSE 30%  398x298      1.01:1     2.21:1
    HVSE 40%  225x290      1.83:1     4.01:1
    HVSE 50%  100x103      11.65:1    25.45:1

0.4 m/texel (view size 50x37, view CR 141.69:1):
    RMSE 10%  6x1          308.33:1   43960:1
    RMSE 20%  1x1          1850:1     262144:1
    HVSE 10%  48x35        1.10:1     156:1
    HVSE 20%  48x35        1.10:1     156:1
    HVSE 50%  37x5         10:1       1416:1

3.2 m/texel (view size 6x4):
    RMSE 10%  6x4          1:1        10922:1
    RMSE 20%  3x4          2:1        21845:1
    HVSE 10%  6x4          1:1        10922:1
    HVSE 20%  4x4          1.5:1      16384:1
    HVSE 30%  4x3          2:1        21845:1
    HVSE 40%  4x2          3:1        32768:1
    HVSE 50%  4x2          3:1        32768:1
Rocktile compression results

0.05 m/texel (view size 400x300, view CR 1.56:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   375x298      1.07:1     1.67:1
    RMSE 10%  56x93        23:1       36:1
    RMSE 20%  1x1          120000:1   187500:1
    HVSE 10%  362x262      1.26:1     1.97:1
    HVSE 20%  201x126      4.73:1     7.40:1
    HVSE 30%  104x63       18.31:1    28.61:1
    HVSE 40%  62x30        64.51:1    100.8:1
    HVSE 50%  39x12        256.4:1    400:1

0.4 m/texel (view size 50x37):
    RMSE 5%   48x35        1.10:1     111.6:1
    RMSE 10%  25x18        4.11:1     416:1
    RMSE 20%  1x1          1850:1     187500:1
    HVSE 10%  48x35        1.1:1      111.6:1
    HVSE 20%  48x35        1.1:1      111.6:1
    HVSE 30%  43x35        1.22:1     124.58:1
    HVSE 40%  37x18        2.77:1     281.5:1
    HVSE 50%  6x13         23.71:1    2403.85:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        7812.5:1
    RMSE 10%  6x4          1:1        7812.5:1
    RMSE 20%  3x4          2:1        15625:1
    HVSE 10%  6x4          1:1        7812.5:1
    HVSE 20%  6x4          1:1        7812.5:1
    HVSE 30%  3x2          4:1        31250:1
    HVSE 40%  1x2          12:1       93750:1
    HVSE 50%  1x2          12:1       93750:1
Bricktile compression results

0.05 m/texel (view size 400x300, view CR 1.52:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   25x150       32:1       48.1:1
    RMSE 10%  1x1          120000:1   182634:1
    RMSE 20%  1x1          120000:1   182634:1
    HVSE 10%  312x298      1.29:1     1.96:1
    HVSE 20%  150x253      3.16:1     4.81:1
    HVSE 30%  73x164       10.02:1    15.25:1
    HVSE 40%  40x103       29.12:1    44.32:1
    HVSE 50%  21x58        98.52:1    149.94:1

0.4 m/texel (view size 50x37):
    RMSE 5%   1x18         102.7:1    10146:1
    RMSE 10%  1x1          1850:1     182634:1
    RMSE 20%  1x1          1850:1     182634:1
    HVSE 10%  48x35        1.1:1      108.71:1
    HVSE 20%  48x35        1.1:1      108.71:1
    HVSE 30%  37x35        1.42:1     141:1
    HVSE 40%  3x35         17.61:1    1739:1
    HVSE 50%  1x1          1850:1     182634:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        7609.75:1
    RMSE 10%  6x4          1:1        7609.75:1
    RMSE 20%  1x4          6:1        45658:1
    HVSE 10%  6x4          1:1        7609:1
    HVSE 20%  1x4          6:1        45658:1
    HVSE 30%  1x1          24:1       182634:1
    HVSE 40%  1x1          24:1       182634:1
    HVSE 50%  1x1          24:1       182634:1
Fabrictile compression results

0.05 m/texel (view size 400x300, view CR 1:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   225x187      2.85:1     2.85:1
    RMSE 10%  1x9          13333:1    13333:1
    RMSE 20%  1x1          120000:1   120000:1
    HVSE 10%  398x298      1.01:1     1.01:1
    HVSE 20%  398x298      1.01:1     1.01:1
    HVSE 30%  398x298      1.01:1     1.01:1
    HVSE 40%  343x248      1.41:1     1.41:1
    HVSE 50%  243x154      3.2:1      3.2:1

0.4 m/texel (view size 50x37):
    RMSE 5%   25x18        4.11:1     4.11:1
    RMSE 10%  1x4          462.5:1    30000:1
    RMSE 20%  1x1          1850:1     120000:1
    HVSE 10%  48x35        1.1:1      71.4:1
    HVSE 20%  48x35        1.1:1      71.4:1
    HVSE 30%  48x35        1.1:1      71.4:1
    HVSE 40%  48x35        1.1:1      71.4:1
    HVSE 50%  48x35        1.1:1      71.4:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        5000:1
    RMSE 10%  6x4          1:1        5000:1
    RMSE 20%  3x4          2:1        10000:1
    HVSE 10%  6x3          1:1        5000:1
    HVSE 20%  4x4          1.5:1      7500:1
    HVSE 30%  4x3          2:1        10000:1
    HVSE 40%  4x3          2:1        10000:1
    HVSE 50%  4x3          2:1        10000:1
Textureatlas compression results

0.05 m/texel (view size 400x300, view CR 1.33:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   398x298      1.01:1     1.34:1
    RMSE 10%  125x56       17.14:1    22.85:1
    RMSE 20%  1x1          120000:1   160000:1
    HVSE 10%  400x300      1:1        1.33:1
    HVSE 20%  400x300      1:1        1.33:1
    HVSE 30%  398x298      1.01:1     1.34:1
    HVSE 40%  262x243      1.88:1     2.51:1
    HVSE 50%  131x152      6.02:1     8.03:1

0.4 m/texel (view size 50x37):
    RMSE 5%   48x34        1.13:1     98:1
    RMSE 10%  25x18        4.11:1     355:1
    RMSE 20%  1x1          1850:1     160000:1
    HVSE 10%  50x37        1:1        64.86:1
    HVSE 20%  50x37        1:1        64.86:1
    HVSE 30%  50x37        1:1        64.86:1
    HVSE 40%  48x35        1.10:1     95.23:1
    HVSE 50%  48x35        1.10:1     95.23:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        6666:1
    RMSE 10%  6x4          1:1        6666:1
    RMSE 20%  3x4          2:1        13333:1
    HVSE 10%  6x4          1:1        6666:1
    HVSE 20%  4x4          1.5:1      10000:1
    HVSE 30%  4x3          2:1        13333:1
    HVSE 40%  3x3          2.66:1     17777:1
    HVSE 50%  3x3          2.66:1     17777:1
Crayons compression results

0.05 m/texel (view size 400x300, view CR 1.46:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   350x225      1.52:1     2.23:1
    RMSE 10%  87x56        24.63:1    36.11:1
    RMSE 20%  1x3          20000:1    58652:1
    HVSE 10%  400x300      1:1        1.46:1
    HVSE 20%  399x298      1.01:1     1.48:1
    HVSE 30%  275x298      1.46:1     2.14:1
    HVSE 40%  153x199      3.94:1     5.77:1
    HVSE 50%  90x121       11.01:1    16.15:1

0.4 m/texel (view size 50x37):
    RMSE 5%   48x36        1.1:1      104.73:1
    RMSE 10%  48x18        2.14:1     203.65:1
    RMSE 20%  1x2          925:1      87978:1
    HVSE 10%  50x37        1:1        95.11:1
    HVSE 20%  48x35        1.1:1      104.73:1
    HVSE 30%  48x34        1.1:1      107:1
    HVSE 40%  48x34        1.1:1      107:1
    HVSE 50%  48x34        1.1:1      107:1

3.2 m/texel (view size 6x4):
    RMSE 10%  6x4          1:1        7331:1
    RMSE 20%  3x4          2:1        14663:1
    HVSE 10%  6x4          1:1        7331:1
    HVSE 20%  4x4          1.5:1      10997:1
    HVSE 30%  4x3          2:1        14663:1
    HVSE 40%  4x3          2:1        14663:1
    HVSE 50%  4x3          2:1        14663:1
Boat compression results

0.05 m/texel (view size 400x300, view CR 1.4:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   100x75       16:1       22.4:1
    RMSE 10%  4x11         2727.27:1  3818.18:1
    RMSE 20%  1x1          120000:1   168000:1
    HVSE 10%  143x65       12.91:1    18.07:1
    HVSE 20%  60x22        90.9:1     127.27:1
    HVSE 30%  32x12        312.5:1    437.5:1
    HVSE 40%  7x18         952.3:1    1333:1
    HVSE 50%  4x12         2500:1     3500:1

0.4 m/texel (view size 50x37):
    RMSE 5%   25x18        4.11:1     373.33:1
    RMSE 10%  12x9         17.12:1    1555.56:1
    RMSE 20%  1x1          1850:1     168000:1
    HVSE 10%  50x37        1:1        90.81:1
    HVSE 20%  25x27        2.74:1     248.88:1
    HVSE 30%  14x13        10.16:1    923:1
    HVSE 40%  9x6          34.25:1    3111:1
    HVSE 50%  4x4          115.62:1   10500:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        7000:1
    RMSE 10%  6x4          1:1        7000:1
    RMSE 20%  3x4          2:1        14000:1
    HVSE 10%  6x4          1:1        7000:1
    HVSE 20%  4x4          1.5:1      10500:1
    HVSE 30%  4x3          2:1        14000:1
    HVSE 40%  4x3          2:1        14000:1
    HVSE 50%  3x3          2.66:1     18666:1
Aerialphoto compression results

0.05 m/texel (view size 400x300, view CR 2.18:1):
              Visual size  Visual CR  Total CR
    RMSE 5%   400x300      1:1        2.18:1
    RMSE 10%  398x262      1.15:1     2.51:1
    RMSE 20%  1x1          120000:1   262144:1
    HVSE 10%  400x300      1:1        2.18:1
    HVSE 20%  400x300      1:1        2.18:1
    HVSE 30%  400x300      1:1        2.18:1
    HVSE 40%  312x267      1.44:1     3.14:1
    HVSE 50%  225x196      2.72:1     5.94:1

0.4 m/texel (view size 50x37):
    RMSE 5%   25x18        4.11:1     582.54:1
    RMSE 10%  1x1          1850:1     262144:1
    RMSE 20%  1x1          1850:1     262144:1
    HVSE 10%  50x37        1:1        141.69:1
    HVSE 20%  50x37        1:1        141.69:1
    HVSE 30%  48x27        1.42:1     202.27:1
    HVSE 40%  43x4         10.75:1    1524:1
    HVSE 50%  1x4          462.5:1    65536:1

3.2 m/texel (view size 6x4):
    RMSE 5%   6x4          1:1        10922:1
    RMSE 10%  6x4          1:1        10922:1
    RMSE 20%  1x4          6:1        65536:1
    HVSE 10%  6x4          1:1        10922:1
    HVSE 20%  3x4          2:1        21845:1
    HVSE 30%  1x1          24:1       262144:1
    HVSE 40%  1x1          24:1       262144:1
    HVSE 50%  1x1          24:1       262144:1
We see that RMSE achieves high compression ratios because of its low sensitivity to the downsampling effect, as also demonstrated in Sections 10.1.1, 10.1.2 and 10.1.3. This commonly entails a heavy loss of image information for error metric tolerances of 10% or even less. In the case of HVSE, we see in some cases a low reaction to downsampling at the lowest tolerance level, but the correlation between the reduction of size and the user-defined error tolerance is higher than with RMSE. We find compression using HVSE most useful for the levels with the highest area precision. In the test examples, using 0.05 m/texel area precision, the compression ratio with respect to the compression to match view conditions ranges from 1:1 to 2500:1, and in the average case HVSE reacts much more to the downsampling effect than RMSE. It is interesting to see that in some cases the visual size is greater when downsampling the levels with less area precision (e.g. the Aerialphoto using RMSE at 20% and 3.2 m/texel has a visual size of 1x4, but using 0.05 m/texel just 1x1). This behaviour appears only when using RMSE and is another sign of its poor ability to measure the visual perception of image fidelity.
Table 7: City view matching compression ratios

                   0.05 m/texel  0.4 m/texel  3.2 m/texel  Total
    View matching  14.22:1       1611:1       379000:1     42.28:1

We see that the view matching downsampling involves a significant compression of the space used to represent each LOD level. The next tables show the compression obtained by introducing the image saliency matching, giving the total compression ratio per atlas level and for the whole city.

Tables 8-10: City downsampling with RMSE (total CR)

              0.05 m/texel  0.4 m/texel  3.2 m/texel  Total
    RMSE 5%   35.92:1       1632:1       391738:1     105:1
    RMSE 10%  475:1         1864:1       391738:1     1135:1
    RMSE 20%  69102:1       28224:1      391738:1     57192:1

We see that RMSE obtains excessively high compression ratios thanks to its poor reaction to the downsampling effect. At the 20% error tolerance the total CR reaches 57192:1, causing an unacceptable loss of information.

Tables 11-13: City downsampling with HVSE (total CR)

              0.05 m/texel  0.4 m/texel  3.2 m/texel  Total
    HVSE 10%  25:1          1708:1       399599:1     75:1
    HVSE 30%  110:1         4291:1       399599:1     323:1
    HVSE 50%  362:1         7720:1       399499:1     1037:1
As we see, image saliency matching is also capable of compressing the space under different metric tolerances while maintaining the visual fidelity. The total compression ratios go from 75:1 at 10% tolerance to 1037:1 at 50%, and are better correlated with the tolerance than the RMSE results.
10.2
Packing
The next figures show, for different packing sets, the texture atlas obtained using four different configurations. An atlas is optimized if it uses Algorithm 6 and stretched if it uses Algorithm 5, both presented in Section 7.3.3. A close-up view is also shown in order to appreciate the unused space between the packed charts.

Figure 51: Texture set 1 packing
Table 14 and Figure 56 show, for each texture set, the occupancy achieved by each of the four packing methods. We see that the best option is optimized and stretched, which always obtains 100% coverage of the atlas.

Figure 56: Texture packing results
Table 14: Texture set packing occupancy

    Texture set                    1          2          3          4          5
    Resolution                     2048x2048  2048x2048  4096x4096  4096x4096  4096x4096
    Number of subtextures          192        378        1107       1689       2897
    meters/texel                   0.02       0.03       0.03       0.04       0.05
    Not optimized / not stretched  72.37%     81.85%     75.56%     83.65%     92.20%
    Not optimized / stretched      78.14%     85.44%     77.96%     85.58%     94.32%
    Optimized / not stretched      89.48%     95.29%     95.34%     96.36%     96.96%
    Optimized / stretched          100%       100%       100%       100%       100%
10.3 Chart encoding performance
We have tested the three chart encoding options listed in Table 1 of Section 8.1.1. We rendered the whole test model (see Section 9.1) to extract the results shown in Table 15. The vertices per second are measured using a 1x1 viewport with the camera facing the whole geometrically dense city. The frames per second are measured with the maximum size of the viewport (in our case 1440x900), also facing the whole city. Finally, the fragments per second are measured rendering one building, facing a textured facade, with the maximum size of the viewport. Since Option 1 in Table 1 does not have VBO support, we adapted its rendering scheme to use VBOs to make it comparable with the other two options; we introduced the additional condition of having a single chart per vertex buffer, which increases the number of required VBOs and the fragmentation, reducing the performance.

The results are better than we predicted. The option that uses packed coordinates (Option 3) has the highest vertices/s and frames/s rates although it has to decompress each vertex (see Section 8.1.3). Option 2 does not have any decompression step but has 20.76% and 23.67% lower vertices/s and frames/s rates, respectively. Option 1 fails both geometry-intensive tests but has the best fragments/s rate because it has the simplest fragment shader. So we see that Option 3 performs better by reducing the memory bandwidth, even at an additional decompression cost per vertex.

Table 15: Chart encoding performance (for more information on the encoding techniques see Table 1)

    Encoding technique  Vertices/s   Fragments/s    Frames/s
    Option 1            44,032,002   1,570,752,000  14.57
    Option 2            436,739,971  1,382,832,000  132.37
    Option 3            527,426,950  1,362,096,000  163.71
11 Time Performance
To evaluate the time performance we have defined a walkthrough of the city going from a far view to a close view of a building. We show the results for each rendering technique. When not using a texture atlas, we tried to load the original textures, but it was simply impossible to load all of them, so we switched to subsampled 256x256 versions with DXT1 compression for the results provided in this section. We do not use the visual improvements (see Section VII) and we disable terrain rendering in order to evaluate just the texture atlas rendering. Figure 61 shows the evolution of the framerate during the walkthrough; the snapshots show the evolution of the walkthrough in steps of two seconds.

Table 16: Resulting framerate for each technique (walkthrough)

    Technique                     Minimum FPS  Maximum FPS  Average FPS
    Encoding option 1             20           30.36        24.63
    Encoding option 2             136          210.16       174.55
    Encoding option 3             158          251.5        206.89
    VBO / No texture atlas        10.3         14.22        12.34
    Immediate / No texture atlas  1.16         1.59         1.16
As in Section 10.3, we see that encoding option 3 has the best framerate, followed by encoding option 2; both evolve similarly during the walkthrough. However, encoding option 1 fails the test (it obtains FPS close to the two options that do not use a texture atlas) because of the heavy VBO fragmentation. Comparing the average FPS of encoding option 2 and the technique that uses VBOs without a texture atlas, we have a speed-up factor of 17, very significant in a real-time rendering system.
12 Selected snapshots
The next figures are selected snapshots of the city using the visual improvements described in Section VII and with terrain rendering enabled.

Figure 62: Barcelona snapshot 1
Part VI
Conclusions
We have developed a novel technique that generates space-optimized texture atlases encoding repetitive texture details of the geometry, and implemented an application able to render a huge city with thousands of textures at interactive framerates. Our contributions are summarized as follows:
- An algorithm for resizing each chart in accordance with the object-space size of the surface the chart is mapped onto and its perceptual importance under given viewing conditions.
- An algorithm to pack rectangular charts into a single texture that minimizes the unused space.
- A compressed texture coordinate format designed to support tiled textures, avoiding the unfolding of periodic textures.
- Several shader techniques providing within-chart tiling support and decompression of texture coordinates.
- Full support for DXTC formats, avoiding artifacts due to the texture atlas compression.
- A texture atlas hierarchy supporting implicit mipmap levels per atlas and providing explicit user-defined texture LOD.
- A visual improvement, in the particular case of the city of Barcelona, mixing the extracted characteristic colours with the facade details.
Part VII
Appendix
#extension GL_EXT_gpu_shader4 : enable
// Bit mask of each packed coordinate:
// bits 31..22 (10 bits): O/4; all subimages start at multiple-of-four positions
// bits 21..15 ( 7 bits): S/4; all subimages have multiple-of-four dimensions
// bits 14.. 9 ( 6 bits): integer part of original tex coordinate (max repeat factor: 63)
// bits  8.. 0 ( 9 bits): fract part of original tex coordinate (max subimage size: 512)
const uint exp2_22 = 4194304u;
const uint exp2_15 = 32768u;
const uint exp2_9  = 512u;

attribute uvec2 input;       // compressed texture coordinates (one packed uint per axis)
varying vec4 rc;             // decoded chart origin (st) and size (pq) in atlas space
uniform vec2 textureSize;    // texture atlas size
uniform int border;          // texture atlas chart border size

void main() {
    // Decode the input packed coordinates
    uvec2 rem;
    rc.st = vec2((input / exp2_22 * 4u) + border) / textureSize;
    rem = input % exp2_22;
    rc.pq = vec2((rem / exp2_15 * 4u) - 2u * border) / textureSize;
    rem = rem % exp2_15;
    gl_TexCoord[0].st = vec2(rem / exp2_9) + vec2(rem % exp2_9) / vec2(exp2_9);
    gl_Position = ftransform();
}
Algorithm 8
#extension GL_ARB_shader_texture_lod : enable
varying vec4 rc;           // the decompressed chart origin (st) and size (pq)
uniform sampler2D tex;     // the input texture atlas

void main() {
    gl_FragColor = texture2DGrad(tex,
        rc.pq * fract(gl_TexCoord[0].st) + rc.st,
        dFdx(gl_TexCoord[0].st), dFdy(gl_TexCoord[0].st));
}
Quadtree generation

Algorithm 9: Quadtree generation (see Section 8.2.1)

// Quadtree constructor. Each variable beginning with "quadtree" refers to a
// member of the quadtree object being built; center is the center of the
// node's bounding box.
constructor Quadtree(B, n)
    if n = 0 then
        // Base case: store the buildings in this leaf
        quadtree.B = B
    else
        // Recursive case
        for h = 0..4 do
            // For each child, collect the buildings lying in its quadrant
            vector<Building> subset
            for i = 0..B.size() do
                c = 0
                if B[i].center.x > center.x then
                    c = c + 1
                end if
                if B[i].center.z > center.z then
                    c = c + 2
                end if
                if c = h then
                    subset.push(B[i])
                end if
            end for
            quadtree.children[h] = new QuadtreeNode(subset, n - 1)
        end for
    end if
Visual improvements
We have noticed that rendering the Barcelona model using the provided textures is quite unrealistic: the textures contain several details, but in some cases the colours used do not correspond to the characteristic colours of Barcelona facades. Thanks to Jordi Moyes and Carlos Andujar, the most frequent colours of the buildings have been extracted. In total we have twenty-four different colours, some of them represented in Figure 68.

Figure 68: Some characteristic colours of Barcelona facades
We want to mix these colours with the facade textures to improve the visual realism. The mixing is done in the fragment program stage. We mathematically define a subdivision of the XZ coordinates (the Y axis represents the building height) that changes every t meters, where t is a variable step parameter. Let C be the collection of extracted colours and p_frag the position of the fragment; the subdivision assigns to each fragment a colour_mix from C, which is the colour used for the mixing with the texture colour.
For the shading we only consider local diffuse and ambient terms, and we try to place the light position in a similar way to how it appears to be in the aerial photos.
References
[1] Improve batching using texture atlases. Technical report, Nvidia Corporation, 2004. [2] Carlos Andujar and Jonas Martinez. Locally-adaptive texture compression.
[3] J.E. Beasley. An exact two-dimensional non-guillotine cutting tree search procedure.
[4] Andrew C. Beers, Maneesh Agrawala, and Navin Chaddha. Rendering from compressed textures.
pages 373-378, August 1996. [5] David Benson and Joel Davis. Octree textures.
[6] Henrik Buchholz and Jürgen Döllner. View-dependent rendering of multiresolution texture-atlases. In
[7] Peter J. Burt and Edward H. Adelson. The Laplacian pyramid as a compact image code. pages 671-679, 1987. [8] Graham Campbell, Thomas A. DeFanti, Jeff Frederiksen, Stephen A. Joyce, and Lawrence A. Leske. Two bit/pixel full color encoding.
[9] Nathan A. Carr and John C. Hart. Meshed atlases for real-time procedural solid texturing.
ACM
[11] K Chiu, M Herf, P Shirley, S Swamy, C Wang, and K Zimmerman. Spatially nonuniform scaling functions for high contrast images. In
[12] Paolo Cignoni, Fabio Ganovelli, Enrico Gobbetti, Fabio Marton, Federico Ponchio, and Roberto
[13] David Cline and Parris K. Egbert. Interactive display of very large textures. In
Computer Society Press. [14] Michael F. Cohen, Jonathan Shade, Stefan Hiller, and Oliver Deussen. Wang tiles for image and texture generation.
[15] Daniel Cohen-Or and Yishay Levanoni. Temporal continuity of levels of detail in Delaunay triangulated terrain. In VIS '96: Proceedings of the 7th conference on Visualization '96, pages 37-42, Los Alamitos, CA, USA, 1996. IEEE Computer Society Press.
[16] Scott Daly. The visible differences predictor: an algorithm for the assessment of image fidelity. pages 179-206, 1993.
[17] Leila De Floriani, Paola Magillo, and Enrico Puppo. Building and traversing a surface at variable resolution. In VIS '97: Proceedings of the 8th conference on Visualization '97, pages 103., Los Alamitos, CA, USA, 1997. IEEE Computer Society Press.
[18] David DeBry, Jonathan Gibbs, Devorah DeLeon Petty, and Nate Robins. Painting and rendering textures on unparameterized models. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 763-768, New York, NY, USA, 2002. ACM. [19] K. Dowsland. Some experiments with simulated annealing techniques for packing problems.
Eu-
Wiley, 1997.
[21] E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and R. E. Tarjan. Performance bounds for level-oriented two-dimensional packing algorithms.
[22] Matthias Eck, Tony DeRose, Tom Duchamp, Hugues Hoppe, Michael Lounsbery, and Werner Stuetzle. Multiresolution analysis of arbitrary meshes. In SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 173-182, New York, NY, USA, 1995. ACM.
[23] Carl Erikson, Dinesh Manocha, and William V. Baxter, III. HLODs for faster display of large static and dynamic environments. In I3D '01: Proceedings of the 2001 symposium on Interactive 3D graphics, pages 111-120, New York, NY, USA, 2001. ACM.
[24] SIAM J. on Algebraic and Discrete Methods, 3:66-76, 1982.
[25] Sándor P. Fekete and Jörg Schepers. On more-dimensional packing III: Exact algorithms, 1997.
[26] Sándor P. Fekete and Jörg Schepers. On more-dimensional packing I: Modeling, 1997.
[27] Christian Frueh, Russell Sammon, and Avideh Zakhor. Automated texture mapping of 3d city models with oblique aerial imagery. In 3DPVT '04: Proceedings of the 3D Data Processing, Visualization, and Transmission, 2nd International Symposium, pages 396-403, Washington, DC, USA, 2004. IEEE Computer Society.
[28] Thomas A. Funkhouser and Carlo H. Séquin. Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments. In SIGGRAPH '93: Proceedings of the 20th annual conference on Computer graphics and interactive techniques, pages 247-254, New York, NY, USA, 1993. ACM.
[29] Gerd Hesina, Stefan Maierhofer, and Robert F. Tobler. Texture management for high-quality city walk-throughs. Proceedings of CORP - International Symposium on Information and Communication Technologies in Urban and Spatial Planning, 25:305-308, 2004.
[30] Allen Gersho and Robert M. Gray. Vector quantization and signal compression. Kluwer Academic Publishers, Norwell, MA, USA, 1991.
[31] Bernd Girod. What's wrong with mean-squared error? pages 207-220, 1993.
[32] Andrew S. Glassner.
San Francisco, CA, USA, 1994.
[33] Rafael C. Gonzalez and Richard E. Woods. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2006.
[34] David J. Heeger and James R. Bergen. Pyramid-based texture analysis/synthesis. In SIGGRAPH '95: Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 229-238, New York, NY, USA, 1995. ACM.
[35] Lewis E. Hitchner and Michael W. McGreevy. Methods for user-based reduction of model complexity for virtual planetary exploration.
[36] Hugues Hoppe. Optimization of mesh locality for transparent vertex caching. pages 269-276, 1999.
[37] Arnaud E. Jacquin. Image coding based on a fractal theory of iterated contractive image transformations. IEEE Transactions on Image Processing, 1(1):18-30, 1992.
[38] James Hays, Marius Leordeanu, Alexei A. Efros, and Yanxi Liu. Discovering texture regularity as a higher-order correspondence problem. ECCV, 3952:522-535, 2006.
[39] Nebojsa Jojic, Brendan J. Frey, and Anitha Kannan. Epitomic analysis of appearance and shape. In ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision, page 34, Washington, DC, USA, 2003. IEEE Computer Society.
[40] A. J. Ahumada Jr. Simplified vision models for image quality assessment. In SID International Symposium Digest of Technical Papers, 1996.
[41] G. Knittel, A. Schilling, A. Kugler, and W. Strasser. Hardware for superior texture performance.
[42] In 3DPVT '04: Proceedings of the 3D Data Processing, Visualization, and Transmission, 2nd International Symposium, pages 275–282, Washington, DC, USA, 2004. IEEE Computer Society.
[43] … tone reproduction operator for high dynamic range scenes.
[44] Shader X6.
[45] In I3D '07: Proceedings of the 2007 symposium on Interactive 3D graphics and games, pages 25–31, New York, NY, USA, 2007. ACM.
[46] Sylvain Lefebvre, Jerome Darbon, and Fabrice Neyret. Unified texture management for arbitrary meshes. Technical report, INRIA, 2004.
[47] Aaron E. Lefohn, Shubhabrata Sengupta, Joe Kniss, Robert Strzodka, and John D. Owens. Glift: Generic, efficient, random-access GPU data structures.
[48] Joshua Levenberg. Fast view-dependent level-of-detail rendering using cached geometry. In VIS '02: Proceedings of the conference on Visualization '02, 2002. IEEE Computer Society.
[49] Bruno Lévy, Sylvain Petitjean, Nicolas Ray, and Jérôme Maillot. Least squares conformal maps for automatic texture atlas generation.
[50] M. Leyton.
[51] Wen-Chieh Lin, James H. Hays, Chenyu Wu, Vivek Kwatra, and Yanxi Liu. A comparison study of four texture synthesis algorithms on regular and near-regular textures. Technical Report CMU-RI-TR-04-01, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2004.
[52] Peter Lindstrom, David Koller, William Ribarsky, Larry F. Hodges, Nick Faust, and Gregory A. Turner. Real-time, continuous level of detail rendering of height fields. In SIGGRAPH '96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 109–118, New York, NY, USA, 1996. ACM.
[54] Frank Losasso and Hugues Hoppe. Geometry clipmaps: terrain rendering using nested regular grids. In SIGGRAPH '04: ACM SIGGRAPH 2004 Papers, pages 769–776, New York, NY, USA, 2004. ACM.
[55] J. L. Mannos and D. J. Sakrison. The effects of a visual fidelity criterion on the encoding of images. IEEE Transactions on Information Theory, 20:522, 1974.
[56] Silvano Martello, Michele Monaci, and Daniele Vigo. An exact approach to the strip-packing problem. INFORMS J. on Computing, 15(3):310–319, 2003.
[57] Willard Miller. Symmetry Groups and Their Applications.
[58] P. C. Gilmore and R. E. Gomory. A linear programming approach to the cutting stock problem. Operations Research, 9:849–859, 1961.
[59] Ken Perlin. An image synthesizer. In SIGGRAPH '85: Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 287–296, New York, NY, USA, 1985.
[60] Peter Lindstrom, David Koller, Larry F. Hodges, William Ribarsky, Nick Faust, and Gregory Turner. Level-of-detail management for real-time rendering of phototextured terrain. Technical report, Georgia Institute of Technology, 1995.
[61] Paul S. Heckbert. Survey of texture mapping. Technical report, Pixar.
[62] Budirijanto Purnomo, Jonathan D. Cohen, and Subodh Kumar. Seamless texture atlases. In SGP '04: Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, pages 65–74, New York, NY, USA, 2004. ACM.
[63] Mahesh Ramasubramanian, Sumanta N. Pattanaik, and Donald P. Greenberg. A perceptually based physical error metric for realistic image synthesis. In SIGGRAPH '99: Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 73–82, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.
[64] A. Ravishankar Rao and Gerald L. Lohse. Identifying high level features of texture perception.
[65] Holly E. Rushmeier, Gregory J. Ward, Christine D. Piatko, Phil Sanders, and Bert Rust. Comparing real and synthetic images: Some ideas about metrics. In Rendering Techniques, pages 82–91, 1995.
[66] Marc Soucy, Guy Godin, and Marc Rioux. A texture-mapping approach for the compression of colored 3D triangulations.
[67] J. C. Stevens and S. S. Stevens. Brightness function: Effects of adaptation.
[68] In SIGGRAPH '98: Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 151–158, New York, NY, USA, 1998. ACM.
[69] Marco Tarini, Kai Hormann, Paolo Cignoni, and Claudio Montani. Polycube-maps. In Proceedings of SIGGRAPH 2004, pages 853–860, 2004.
[70] William C. Thibault and Bruce F. Naylor. Set operations on polyhedra using binary space partitioning trees.
[72] Huamin Wang, Yonatan Wexler, Eyal Ofek, and Hugues Hoppe. Factoring repeated content within and among images. In SIGGRAPH '08: ACM SIGGRAPH 2008 papers, New York, NY, USA, 2008. ACM.
[73] Zhou Wang and Alan C. Bovik. Mean squared error: Love it or leave it? IEEE Signal Processing Magazine, 26(1):98–117, 2009.
[74] Gregory J. Ward. The radiance lighting simulation and rendering system. In SIGGRAPH '94: Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pages 459–472, New York, NY, USA, 1994. ACM.
[75] Li-Yi Wei. Tile-based texture mapping on graphics hardware. In HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 55–63, New York, NY, USA, 2004. ACM.
[76] Lance Williams. Pyramidal parametrics. SIGGRAPH Comput. Graph., 17(3):1–11, 1983.
[77] Yanxi Liu, Wen-Chieh Lin, and James H. Hays. Near-regular texture analysis and manipulation.
[78] Yangli Hector Yee and Anna Newman. A perceptual metric for production testing. In SIGGRAPH '04: ACM SIGGRAPH 2004 Sketches, page 121, New York, NY, USA, 2004. ACM.
[79] Eugene Zhang, Konstantin Mischaikow, and Greg Turk. Feature-based surface parameterization and texture mapping. ACM Trans. Graph., 24(1):1–27, 2005.
[80] H. Zhou, M. Chen, and M. F. Webster. Comparative evaluation of visualization and experimental results using image comparison metrics. In VIS '02: Proceedings of the conference on Visualization '02, pages 315–322, Washington, DC, USA, 2002. IEEE Computer Society.
[81] Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, 1978.