Turning Mobile Phones Into 3D Scanners
Abstract
(see Fig. 1). Its main contribution is the development of an efficient and accurate scheme for integrating multiple stereo-based depth hypotheses into a compact and consistent 3D model. Thereby, various criteria based on local geometry orientation, underlying camera setting and photometric evidence are evaluated to judge the reliability of each measurement. Based on that, the proposed fusion technique verifies the integrity of the depth estimates and resolves visibility conflicts. We demonstrate the performance of the developed method within a framework for real-time 3D reconstruction on a mobile phone and show that the accuracy of the system can be improved while retaining its interactive rate.

2. Related Work

As the current paper deals with the problem of depth map fusion, which is a classical problem in multi-view 3D reconstruction, it is related to a myriad of works on binocular and multi-view stereo. We refer to the benchmarks in [16], [17] and [18] for a representative list. However, most of those methods are not applicable to our particular scenario as they are not incremental in nature or do not meet the efficiency requirements of embedded systems. In the following, we will focus only on approaches which are conceptually related to ours.

Building upon pioneering work on reconstruction with a hand-held camera [10], Pollefeys et al. [11] presented a complete pipeline for real-time video-based 3D acquisition. The system was developed with a focus on capturing large-scale urban scenes by means of multiple video cameras mounted on a driving vehicle. Yet, despite its real-time performance, the applicability of the system to a live scenario is not straightforward. Nevertheless, we drew some inspiration from the utilized depth map fusion scheme, originally published in [4]. The first methods for real-time interactive 3D reconstruction were proposed by Newcombe et al. [5] and Stuehmer et al. [19]. In both works, a 3D representation of the scene is obtained by estimating depth maps from multiple views and converting them to triangle meshes based on the respective neighborhood connectivity. Even though these techniques cover our context, they are designed for high-end computers and are not functional on mobile devices due to some time-consuming optimization operations. Another approach for live video-based 3D reconstruction, which is conceptually similar to ours, was proposed by Vogiatzis and Hernandez [21]. Here, the captured scene is represented by a point cloud where each generated 3D point is obtained as a probabilistic depth estimate by fusing measurements from different views. Similar to the already discussed methods, this one also requires substantial computational resources. Another key difference to our framework is the utilization of a marker to estimate camera poses, which entails considerable limitations in terms of usability. Recently, the work of Pradeep et al. [12] appeared. It presents another pipeline for real-time 3D reconstruction from monocular video input based on volumetric depth-map fusion. Again, those techniques are developed for high-end computers and have never been demonstrated on embedded systems.

Probably the most similar method to ours was proposed in [22] and subsequently generalized in [3, 1]. Therein, a system for interactive in-hand scanning of objects was demonstrated. Similar to the approach presented in this paper, it relies on a surfel representation of the modeled 3D object. However, the developed fusion scheme is designed for measurements stemming from active sensors, which are considerably more accurate than stereo-based ones. Therefore, the employed confidence estimation is quite different from the one proposed in the current paper.

Recently, the first works on live 3D reconstruction on mobile devices appeared. Wendel et al. [23] rely on a distributed framework with a variant of [2] on a micro air vehicle. A tablet computer is used merely for visualization while all demanding computations are performed on a separate server machine. Sankar et al. [15] proposed a system for interactively creating and navigating through visual tours. Thereby, an approximate geometry of indoor environments is generated based on strong planar priors and some user interaction. Pan et al. [8] demonstrated an automatic system for 3D reconstruction capable of operating entirely on a mobile phone. However, the generated 3D models are not very precise due to the sparse nature of the approach. Prisacariu et al. [13] presented a shape-from-silhouette framework running in real time on a mobile phone. Despite the impressive performance, the method suffers from the known weaknesses of silhouette-based techniques, e.g. the inability to capture concavities. Tanskanen et al. [20] developed a dense stereo-based system for 3D reconstruction capable of interactive rates on a mobile phone. We use a similar system as a starting point and show that considerable accuracy improvements can be achieved by integrating the proposed approach without affecting its interactive potential.

3. Multi-Resolution Depth Map Computation

In the first stage of the 3D modeling pipeline, depth maps are created from a set of keyframes and corresponding calibration information and camera poses. Here, we adopt the methodology proposed in [20]. Apart from being efficient and accurate, it is particularly appealing due to the potential of the utilized multi-resolution depth map computation scheme for implementation on mobile GPUs. In the following, we outline the procedure for the sake of completeness. More details can be found in [20].

A camera motion tracking system produces a series of keyframes and associated camera poses which are provided to a dense modeling module. As abrupt jumps in the camera motion cannot be expected, a straightforward strategy is to maintain a sliding window containing the most recent keyframes and use them not only for stereo matching but also to check consistency between different depth maps. Pursuing an interactive framework on mobile devices, binocular stereo instead of multi-view stereo is applied to minimize the memory access overhead. In particular, a newly arrived keyframe is used as a reference image and is matched to an appropriate image in the current buffer. Thereby, a multi-resolution scheme for the depth map computation is employed to reduce the computational time and to avoid local maxima of the photoconsistency score along the considered epipolar segments. When moving from one resolution level to the next, the depth range is restricted based on the depth estimates at neighboring pixels. Additionally, computations are limited to pixels exhibiting sufficient local image texturedness within regions where the current 3D model has not reached the desired degree of maturity. The result is a depth map possibly corrupted by noise due to motion blur, occlusions, lack of texture, presence of slanted surfaces etc. A very efficient and effective filtering procedure is applied to remove the outliers. Thereby, the consistency of each depth measurement is tested for agreement with the other depth maps within the sliding window. If a sufficient number of confirmations is reached, the measurement is retained, otherwise it is discarded as an outlier. Subsequently, the depth map is smoothed by applying bilateral filtering to improve the precision of the depth values.
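For illustration, the sliding-window consistency test described above can be sketched as follows. This is a minimal sketch under assumed data structures (a simple pinhole model with x_cam = R·X + t and dense row-major depth buffers) and placeholder thresholds; it is not the actual implementation of [20].

```cpp
#include <cmath>
#include <vector>

// Minimal pinhole camera: world-to-camera rotation R (row-major 3x3),
// translation t, and intrinsics (fx, fy, cx, cy). Illustrative only.
struct Camera {
    float R[9], t[3];
    float fx, fy, cx, cy;
};

struct DepthMap {
    int width, height;
    std::vector<float> depth;   // row-major; 0 marks an invalid pixel
    Camera cam;
};

// Back-project pixel (u, v) with depth d into world coordinates: X = R^T * (x_cam - t).
static void backproject(const Camera& c, float u, float v, float d, float X[3]) {
    float xc = (u - c.cx) / c.fx * d;
    float yc = (v - c.cy) / c.fy * d;
    float zc = d;
    float px = xc - c.t[0], py = yc - c.t[1], pz = zc - c.t[2];
    X[0] = c.R[0] * px + c.R[3] * py + c.R[6] * pz;
    X[1] = c.R[1] * px + c.R[4] * py + c.R[7] * pz;
    X[2] = c.R[2] * px + c.R[5] * py + c.R[8] * pz;
}

// Project a world point into a camera; returns false if the point is behind it.
static bool project(const Camera& c, const float X[3], float& u, float& v, float& d) {
    float xc = c.R[0] * X[0] + c.R[1] * X[1] + c.R[2] * X[2] + c.t[0];
    float yc = c.R[3] * X[0] + c.R[4] * X[1] + c.R[5] * X[2] + c.t[1];
    float zc = c.R[6] * X[0] + c.R[7] * X[1] + c.R[8] * X[2] + c.t[2];
    if (zc <= 0.0f) return false;
    u = c.fx * xc / zc + c.cx;
    v = c.fy * yc / zc + c.cy;
    d = zc;
    return true;
}

// Invalidate depth values of the reference map that are not confirmed by at least
// minConfirmations other depth maps in the sliding window (up to a relative tolerance).
void filterBySlidingWindow(DepthMap& ref, const std::vector<DepthMap>& window,
                           float relTol = 0.02f, int minConfirmations = 2) {
    for (int v = 0; v < ref.height; ++v) {
        for (int u = 0; u < ref.width; ++u) {
            float d = ref.depth[v * ref.width + u];
            if (d <= 0.0f) continue;
            float X[3];
            backproject(ref.cam, (float)u, (float)v, d, X);
            int confirmations = 0;
            for (const DepthMap& other : window) {
                float uo, vo, dproj;
                if (!project(other.cam, X, uo, vo, dproj)) continue;
                int ui = (int)std::round(uo), vi = (int)std::round(vo);
                if (ui < 0 || vi < 0 || ui >= other.width || vi >= other.height) continue;
                float dother = other.depth[vi * other.width + ui];
                if (dother <= 0.0f) continue;
                if (std::fabs(dother - dproj) < relTol * dproj) ++confirmations;
            }
            if (confirmations < minConfirmations)
                ref.depth[v * ref.width + u] = 0.0f;  // reject as outlier
        }
    }
}
```

The relative tolerance and the number of required confirmations are the natural tuning knobs of such a filter and would be matched to the noise level of the stereo module.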
The final output of this stage is a series of partial depth maps. We build upon this scheme and additionally compute a normal vector for each depth measurement by applying a local plane fitting procedure. Isolated points with insufficient support within the neighborhood are discarded. In the next stage, all those measurements are merged into a unified 3D model of the scene.
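The normal computation can be illustrated as follows. A full least-squares plane fit would take the smallest eigenvector of the local scatter matrix; the sketch below uses the common finite-difference approximation (cross product of the two local tangent directions) as a lightweight stand-in and applies the insufficient-support test to the 3×3 neighborhood. The grid layout, window size and support threshold are assumptions for illustration.

```cpp
#include <cmath>
#include <vector>

// 3D point with a validity flag, e.g. obtained by back-projecting a depth map.
struct Point3 {
    float x = 0.f, y = 0.f, z = 0.f;
    bool valid = false;
};

using PointGrid = std::vector<std::vector<Point3>>;  // [row][col]

// Estimate a unit normal at (r, c) from the local neighborhood of back-projected
// points. Pixels with too few valid neighbors are rejected, mirroring the
// "insufficient support" test. The result may still need to be flipped so that it
// points toward the camera.
bool estimateNormal(const PointGrid& P, int r, int c, int minSupport, float n[3]) {
    if (P.empty()) return false;
    const int rows = (int)P.size(), cols = (int)P[0].size();
    if (r < 1 || c < 1 || r + 1 >= rows || c + 1 >= cols || !P[r][c].valid) return false;

    // Count valid points in the 3x3 neighborhood (support check).
    int support = 0;
    for (int dr = -1; dr <= 1; ++dr)
        for (int dc = -1; dc <= 1; ++dc)
            if (P[r + dr][c + dc].valid) ++support;
    if (support < minSupport) return false;

    const Point3 &l = P[r][c - 1], &rg = P[r][c + 1], &u = P[r - 1][c], &d = P[r + 1][c];
    if (!l.valid || !rg.valid || !u.valid || !d.valid) return false;

    // Tangent vectors along the image rows and columns.
    float tx[3] = {rg.x - l.x, rg.y - l.y, rg.z - l.z};
    float ty[3] = {d.x - u.x, d.y - u.y, d.z - u.z};

    // Normal = normalized cross product of the tangents.
    n[0] = tx[1] * ty[2] - tx[2] * ty[1];
    n[1] = tx[2] * ty[0] - tx[0] * ty[2];
    n[2] = tx[0] * ty[1] - tx[1] * ty[0];
    float len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (len < 1e-12f) return false;
    for (int i = 0; i < 3; ++i) n[i] /= len;
    return true;
}
```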
4. Confidence-Based Depth Map Fusion

A central issue in the design of a depth map fusion approach is the representation of the modeled scene. While triangle meshes constitute a common geometric representation, they do not seem well-suited for interactive applications running in real time since considerable efforts are needed to guarantee the integrity and consistency of the mesh topology after adding, updating or removing any vertices. Note that the user is expected to make use of the live visual feedback and recapture certain parts of the scene until the desired surface quality is reached. For that reason, we rely on a surfel representation [9]. A surfel s_j consists of a position p_j, a normal vector N_j, a color C_j and a confidence score c_j which is defined as the difference between a cumulative inlier and outlier weight, i.e. c_j = W_j^(in) − W_j^(out). Additional attributes like local patch radius or visibility information could be maintained if needed. The utilized surfel representation offers the required resilience since the unstructured set of surfels can easily be kept consistent throughout any modifications.

The proposed depth map fusion approach relies on the following scheme: When a new depth map becomes available, a weight is assigned to each pixel measurement reflecting its expected accuracy. Based on this input, the surfel model is modified by adding new surfels, and updating or removing existing ones. In the following, these steps are explained in more detail.
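For concreteness, the surfel record described above might look as follows in code; field types and layout are illustrative assumptions rather than the actual implementation.

```cpp
#include <cstdint>

// Schematic surfel record, following the attributes listed above.
struct Surfel {
    float position[3];      // p_j
    float normal[3];        // N_j (unit length)
    uint8_t color[3];       // C_j (RGB)
    float inlierWeight;     // W_j^(in), cumulative weight of confirming measurements
    float outlierWeight;    // W_j^(out), cumulative weight of conflicting measurements

    // Confidence score c_j = W_j^(in) - W_j^(out); recomputed whenever either
    // cumulative weight changes.
    float confidence() const { return inlierWeight - outlierWeight; }

    // Optional attributes (local patch radius, visibility information, ...) could be added here.
};
```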
4.1. Confidence-Based Weighting

The accuracy of a depth measurement obtained from stereo matching depends on many factors, e.g. inherent scene texture, geometry orientation, camera noise, distance between the scene and the camera device etc. In an effort to capture all those aspects, we assign different weights to each estimate and combine them subsequently to obtain a final weighting score that expresses our confidence in the particular depth value.

Geometry-Based Weights. The accuracy of a depth measurement depends on the local surface orientation at that point. The depth measurement is more accurate when the observed geometry is fronto-parallel and less accurate at grazing viewing angles. As a local normal vector is computed for each depth estimate, those cases can be identified by considering the scalar product between the normal and the respective viewing direction of the camera. If n_x ∈ S^2 denotes the normal vector and v_x ∈ S^2 stands for the normalized reversed viewing direction of the camera for a pixel x ∈ Z^2 within the image domain, we define a geometry-based weight at x as

    w_g(x) = (⟨n_x, v_x⟩ − cos(θ_max)) / (1 − cos(θ_max))  if ∠(n_x, v_x) ≤ θ_max,  and  w_g(x) = 0  otherwise,    (1)

where θ_max is a critical angle at which the measurements are considered unreliable and is set to 80° throughout all experiments. The weight defined in (1) takes on values within [0, 1]. Note that it does not directly depend on the depth estimates. However, there is an indirect relation as the computation of the normal vectors relies on them.
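A direct transcription of (1) is straightforward; the sketch below assumes unit-length input vectors and uses the 80° threshold from the text.

```cpp
#include <cmath>

// Geometry-based weight of Eq. (1). n and v are unit vectors (the surface normal and
// the reversed viewing direction); thetaMaxDeg is the critical angle (80 deg in the text).
float geometryWeight(const float n[3], const float v[3], float thetaMaxDeg = 80.0f) {
    const float kPi = 3.14159265358979f;
    const float cosAngle = n[0] * v[0] + n[1] * v[1] + n[2] * v[2];  // <n_x, v_x>
    const float cosMax = std::cos(thetaMaxDeg * kPi / 180.0f);
    // Angle between n and v exceeds theta_max: measurement considered unreliable.
    if (cosAngle < cosMax) return 0.0f;
    return (cosAngle - cosMax) / (1.0f - cosMax);
}
```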
Camera-Based Weights. The accuracy of a depth measurement obtained from binocular stereo depends on the utilized camera setting. For example, short baselines imply high depth imprecision, as large changes of the depth along the visual rays result in small projection footprints on the image plane of the non-reference camera. Analogously, increasing the image resolution or moving the camera closer to the scene leads to more accurate depth estimates. Based on these observations, a camera-based weight could be defined by measuring the depth deviation corresponding to a certain shift (for example one pixel) along the respective epipolar line. Yet, this cannot be realized efficiently since it involves an additional triangulation operation. A further complication is the discrepancy between viewing ray traversal and pixel sampling. Instead, we invert the inference and measure the pixel shift that a certain offset along the ray produces. More concretely, the offset along the visual rays is set to 1/600 of the depth range. Then, a camera-based weight at a pixel x is defined as

    w_c(x) = 1 − e^(−λ·δ_x),    (2)

where λ ∈ R is a parameter specifying the penalizing behavior of the term and is set to 5.0 throughout all experiments, and δ_x denotes the induced shift measured in pixel coordinates. Note that w_c ∈ [0, 1] is inversely proportional to the estimated depths, i.e. larger depths get lower weights and smaller depths get higher weights. This corresponds to the intuition that parts of the scene closer to the camera are expected to be reconstructed more accurately than parts further away from the camera. Moreover, the length of the baseline is also taken into account by the formulation in (2). In particular, depth maps obtained from short baselines will generally be weighted lower.
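One way to evaluate (2) is sketched below: the measured point and a second point offset along the same viewing ray by 1/600 of the depth range are projected into the non-reference view, and the resulting pixel displacement drives the exponential. The pinhole helper and parameter defaults mirror the description above but are otherwise illustrative.

```cpp
#include <cmath>

// Minimal pinhole model of the non-reference camera: x_cam = R * X + t,
// pixel = (fx * x/z + cx, fy * y/z + cy). Illustrative only.
struct PinholeCamera {
    float R[9], t[3], fx, fy, cx, cy;
};

static bool projectToPixel(const PinholeCamera& c, const float X[3], float uv[2]) {
    float x = c.R[0]*X[0] + c.R[1]*X[1] + c.R[2]*X[2] + c.t[0];
    float y = c.R[3]*X[0] + c.R[4]*X[1] + c.R[5]*X[2] + c.t[1];
    float z = c.R[6]*X[0] + c.R[7]*X[1] + c.R[8]*X[2] + c.t[2];
    if (z <= 0.f) return false;
    uv[0] = c.fx * x / z + c.cx;
    uv[1] = c.fy * y / z + c.cy;
    return true;
}

// Camera-based weight of Eq. (2). 'origin' and 'dir' define the viewing ray of the
// reference camera (dir normalized), 'depth' is the measured depth, 'depthRange' the
// currently considered depth interval. The ray offset is 1/600 of the depth range and
// lambda = 5.0, as in the text.
float cameraWeight(const float origin[3], const float dir[3], float depth,
                   float depthRange, const PinholeCamera& other,
                   float lambda = 5.0f) {
    const float offset = depthRange / 600.0f;
    float P0[3], P1[3], uv0[2], uv1[2];
    for (int i = 0; i < 3; ++i) {
        P0[i] = origin[i] + depth * dir[i];
        P1[i] = origin[i] + (depth + offset) * dir[i];
    }
    if (!projectToPixel(other, P0, uv0) || !projectToPixel(other, P1, uv1))
        return 0.0f;  // not observable in the second view
    const float du = uv1[0] - uv0[0], dv = uv1[1] - uv0[1];
    const float delta = std::sqrt(du * du + dv * dv);  // pixel shift delta_x
    return 1.0f - std::exp(-lambda * delta);
}
```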
Photoconsistency-Based Weights. Probably the most straightforward criterion to judge the accuracy of a depth measurement is its photoconsistency score. However, this is also the least discriminative criterion since the provided depth maps are already checked for consistency and filtered; thus, the respective matching scores are expected to be high. The easiest way to obtain the photoconsistency value for a depth estimate is to use the one delivered by the stereo module. Yet, as normal information is available at that point, a more accurate measure can be employed. Here, we adopt normalized cross-correlations (NCC) over 5×5 patches, where the provided normal vectors are leveraged to warp the patches from the reference image to the second view. Then, for a pixel x we specify

    w_ph(x) = NCC(x)  if NCC(x) ≥ thr,  and  w_ph(x) = 0  otherwise,    (3)

as the photoconsistency-based weight. Thereby, thr is a threshold parameter set to 0.65 throughout all experiments, and NCC(x) denotes the NCC score for the depth and the normal at x. Again, we have w_ph ∈ [0, 1]. It should be noted that the computation of the photoconsistency-based weights is more time-consuming than that of the geometry-based and the camera-based ones while having the least contribution to the final weighting values. For this reason, it could be omitted when more efficiency is required.
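A plain 5×5 NCC, thresholded as in (3), can be sketched as follows. The normal-guided patch warping described above is omitted here for brevity, the images are assumed to be grayscale float buffers, and the corresponding pixel in the second view is assumed to be given by the depth estimate.

```cpp
#include <cmath>
#include <vector>

struct GrayImage {
    int width, height;
    std::vector<float> data;               // row-major grayscale intensities
    float at(int x, int y) const { return data[y * width + x]; }
};

// Photoconsistency-based weight of Eq. (3): NCC over a 5x5 patch around (x0, y0) in the
// reference image and (x1, y1) in the second view, thresholded at 'thr' (0.65 in the text).
float photoconsistencyWeight(const GrayImage& ref, int x0, int y0,
                             const GrayImage& other, int x1, int y1,
                             float thr = 0.65f) {
    const int r = 2;  // 5x5 patch
    if (x0 < r || y0 < r || x0 + r >= ref.width || y0 + r >= ref.height ||
        x1 < r || y1 < r || x1 + r >= other.width || y1 + r >= other.height)
        return 0.0f;

    const int n = (2 * r + 1) * (2 * r + 1);
    float meanA = 0.f, meanB = 0.f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            meanA += ref.at(x0 + dx, y0 + dy);
            meanB += other.at(x1 + dx, y1 + dy);
        }
    meanA /= n; meanB /= n;

    float cov = 0.f, varA = 0.f, varB = 0.f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            const float a = ref.at(x0 + dx, y0 + dy) - meanA;
            const float b = other.at(x1 + dx, y1 + dy) - meanB;
            cov += a * b; varA += a * a; varB += b * b;
        }
    if (varA <= 1e-12f || varB <= 1e-12f) return 0.0f;  // textureless patch
    const float ncc = cov / std::sqrt(varA * varB);
    return (ncc >= thr) ? ncc : 0.0f;                    // threshold as in Eq. (3)
}
```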
Figure 2. Confidence-based weighting of depth measurements. The reference image of a stereo pair and corresponding color-coded weights for the computed depth estimates. Green represents high weighting, red represents low weighting. Note that pixels where the local normal vector points away from the camera get small weights. Also, more distant measurements tend to be weighted low.

The last step is to combine all weight estimates and to provide a final overall weight to each depth measurement in the provided depth map. To this end, for each x we set

    w(x) = w_g(x) · w_c(x) · w_ph(x).    (4)

The overall weight lies in [0, 1] and will be high only when all three weights, the geometry-based one, the camera-based one and the photoconsistency-based one, are high. In other words, a measurement is considered accurate only if it is accurate from a geometric, stereoscopic and photometric point of view.

Fig. 2 shows an example of the estimated weighting for a depth map capturing a small church figurine. For all depth measurements the corresponding weights are computed according to (4). Note that the effects of applying the geometry and the camera term are clearly visible. Indeed, pixels where the local normal vector points away from the camera get small weights. Also, more distant measurements tend to be weighted low. The effect of applying the photoconsistency term is less noticeable.
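Assembling the final per-pixel weight map of (4) is then a simple product of the three terms; in the sketch below they are abstracted as callables so the block stays independent of the concrete implementations above.

```cpp
#include <functional>
#include <vector>

// Combine the geometry-, camera- and photoconsistency-based terms into the final
// per-pixel weight of Eq. (4). The individual terms are passed as callables taking
// pixel coordinates, so this sketch is agnostic to how they are implemented.
std::vector<float> combineWeights(int width, int height,
                                  const std::function<float(int, int)>& wg,
                                  const std::function<float(int, int)>& wc,
                                  const std::function<float(int, int)>& wph) {
    std::vector<float> w(static_cast<size_t>(width) * height, 0.0f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            w[y * width + x] = wg(x, y) * wc(x, y) * wph(x, y);  // w = w_g * w_c * w_ph
    return w;
}
```

As noted above, the photoconsistency factor could simply be replaced by 1 when more efficiency is required.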
4.2. Measurement Integration

When a new depth map becomes available and confidence weights are assigned to all measurements, the provided data is used to update the current surfel cloud. This is done using three basic operations: surfel addition, surfel update and surfel removal. New surfels are created for parts of the depth map that are not explained by the current model. Surfels that are in correspondence with the input depth map are updated by integrating the respective depth and normal estimates. Surfels with a confidence value below a certain threshold are removed from the cloud. In the following, these operations are explained in more detail.

Surfel addition. Surfels are added in those parts where the depth map is not covered by model surfels. Of course, for the initial depth map all measurements will create new surfels. For each newly created surfel the position and normal vector are set according to the depth and normal estimate of the measurement. The color is set to the color of the respective image pixel. The cumulative inlier weight is initialized with the weight of the depth measurement and the cumulative outlier weight with zero.

Surfel update. If the projection of a surfel coincides with a provided depth measurement, the surfel is updated. Let s_j = (p_j, N_j, C_j, W_j^(in), W_j^(out), c_j) be the surfel of interest. If there are multiple surfels along the same visual ray, we take the one closest to the camera center that is expected to be visible. Additionally, we maintain a state vector X_j = (p_1, p_2, p_3, θ, φ) ∈ R^5 encoding its current position and normal. Thereby, the normal is represented by means of a polar angle θ and an azimuth angle φ. When a new surfel is created, a spherical coordinate system is generated with the provided normal estimate as the first base vector. Let x = π(p_j) be the projection of the surfel onto the image plane of the current frame and let d(p_j) be its depth with respect to the camera center. At x the given depth map provides a depth measurement d_x and a normal measurement n_x. In addition to that, we get a weight w(x) reflecting the accuracy of the estimates. Now, we have to update the surfel based on this input.

Figure 3. Different cases for a surfel update. Red denotes the incoming measurement, dark red the surfel. (a) The measurement is in front of the observed surfel. There is no visibility conflict. (b) The measurement is behind the observed surfel. There is a visibility conflict. (c) The measurement and the observed surfel match. (d) The depths of the measurement and the observed surfel match but not their normals. There is a visibility conflict. See text for more details.

There are four different update cases (see Fig. 3), where ε denotes the relative depth tolerance used to decide whether the measurement and the surfel match:

(1) (d(p_j) − d_x) / d(p_j) ≥ ε: The depth measurement occludes the model surfel. By itself this is not a visibility conflict since the depth map could capture a different part of the surface. The dashed line in Fig. 3(a) shows a potential visibility configuration. In fact, this is the most delicate case as both the surfel and the measurement could be outliers. Here, we just ignore the depth measurement and do not perform any surfel update. Note that this could cause problems when parts of the surface are acquired which are in the line of sight of already reconstructed ones (with the same orientation). However, this is unlikely to occur in practice as the user usually captures more accessible parts first before moving to locations that are more difficult to reach.

(2) (d_x − d(p_j)) / d(p_j) ≥ ε: The depth measurement is behind the model surfel. This is a clear visibility conflict. In this case we add the measurement's weight to the cumulative outlier weight of the surfel, i.e.

    W_j^(out) ← W_j^(out) + w(x).    (5)

(3) |d(p_j) − d_x| / d(p_j) < ε and ∠(N_j, n_x) ≤ 45°: The measurement and the model surfel match, both in terms of depth and normal orientation. Then, the surfel position and normal are updated accordingly. In particular, we compute a running weighted average

    X_j ← (W_j^(in) · X_j + w(x) · X_x) / (W_j^(in) + w(x)),
    W_j^(in) ← W_j^(in) + w(x),    (6)

where the pixel's depth d_x and normal n_x are converted into a state vector X_x.

(4) |d(p_j) − d_x| / d(p_j) < ε and ∠(N_j, n_x) > 45°: The measurement and the model surfel match in terms of depth but the orientations of their normals deviate from each other. We consider this as a visibility conflict and increment the cumulative outlier weight according to (5).

Recall that there are two additional attributes to each surfel: a color C_j and a confidence score c_j. The color is set to the color of the pixel with the largest weight w(x) used in the fusion process for the surfel. The confidence measure is defined as the difference between the cumulative inlier weight and the cumulative outlier weight, i.e. c_j = W_j^(in) − W_j^(out), and has to be updated each time one of those values is modified.

Surfel removal. Surfels are removed from the cloud during the acquisition process if their confidence falls below a threshold. We set this threshold to 0.5 throughout all conducted experiments. Note that the removal of surfels opens up gaps that can be filled by new, more accurate surfels.
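The four update cases can be summarized roughly as follows. The relative depth tolerance is an assumed placeholder, and the running average of (6) is applied to the position and the re-normalized normal directly instead of going through the polar/azimuth state vector; the case logic otherwise follows the description above.

```cpp
#include <cmath>

// Compact surfel state used in this sketch (cf. the record in Sec. 4).
struct SurfelState {
    float position[3];
    float normal[3];        // unit length
    float inlierWeight;     // W^(in)
    float outlierWeight;    // W^(out)
};

// Measurement at the pixel x = pi(p_j): back-projected 3D point, normal and weight w(x).
struct Measurement {
    float point[3];
    float normal[3];        // unit length
    float depth;            // d_x
    float weight;           // w(x)
};

static float dot3(const float a[3], const float b[3]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Apply one of the four update cases to a surfel with depth surfelDepth = d(p_j).
// relTol is the relative depth tolerance and maxNormalDeg the 45-degree normal test.
void updateSurfel(SurfelState& s, float surfelDepth, const Measurement& m,
                  float relTol = 0.05f /* assumed value */, float maxNormalDeg = 45.0f) {
    const float relDiff = (surfelDepth - m.depth) / surfelDepth;

    if (relDiff >= relTol) {
        // Case (1): measurement occludes the surfel, possibly a different part of the
        // surface; ignore the measurement.
        return;
    }
    if (-relDiff >= relTol) {
        // Case (2): measurement behind the surfel; visibility conflict, Eq. (5).
        s.outlierWeight += m.weight;
        return;
    }

    const float cosMax = std::cos(maxNormalDeg * 3.14159265f / 180.0f);
    if (dot3(s.normal, m.normal) < cosMax) {
        // Case (4): depths match but normals deviate; visibility conflict, Eq. (5).
        s.outlierWeight += m.weight;
        return;
    }

    // Case (3): depths and normals match; running weighted average, Eq. (6).
    const float wOld = s.inlierWeight, wNew = m.weight, wSum = wOld + wNew;
    for (int i = 0; i < 3; ++i) {
        s.position[i] = (wOld * s.position[i] + wNew * m.point[i]) / wSum;
        s.normal[i]   = (wOld * s.normal[i]   + wNew * m.normal[i]) / wSum;
    }
    // Re-normalize the averaged normal (stand-in for the polar/azimuth state update).
    const float len = std::sqrt(dot3(s.normal, s.normal));
    if (len > 1e-12f)
        for (int i = 0; i < 3; ++i) s.normal[i] /= len;
    s.inlierWeight = wSum;
}
```

A surfel whose confidence W^(in) − W^(out) subsequently drops below the removal threshold would then be deleted, as described above.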
Figure 4. Confidence evolution during reconstruction. Visualized are the color-coded confidence scores of the generated surfels for consecutive frames of a real-world sequence. Green represents high confidence, red represents low confidence. An input image from the same viewpoint can be seen in Fig. 2. Note how the confidence values of surfels, seen from different directions, increase in the course of reconstruction.
One could wonder why the normals are integrated in the proposed depth map fusion scheme. In fact, they could be obtained in a post-processing step by considering the neighborhood of each point within the point cloud. There are two main reasons for this design decision. First, the normal information is useful as it captures the local geometric structure of each depth measurement and enables the identification of accidental matches like in the case depicted in Fig. 3(d). Second, the proposed scheme allows us to leverage the neighborhood relation between different measurements, provided by the camera sensor. Moreover, note that the proposed depth map fusion procedure is incremental and lends itself to online applications. Also, it allows reconstructed parts of the scene to be recaptured by providing additional depth data and improving the accuracy of the respective subset of the surfel cloud.

Fig. 4 depicts the evolution of the confidence scores of the generated surfels for consecutive frames of a real-world sequence. Note that the confidence values are small for newly created surfels but increase in the course of the acquisition process if they are observed from other viewpoints.

5. Experimental Results

We validate the proposed confidence-based depth map fusion scheme by comparing it to two state-of-the-art real-time capable alternatives. Furthermore, we demonstrate its performance by integrating it into a system for live 3D reconstruction running on a mobile phone.

5.1. Comparison to Alternative Techniques

For the sake of comparison we implemented two alternative techniques meeting the efficiency requirements of the application at hand.

The first one is the merging method used in [20]. Thereby, the interconnection between the different input depth maps is exploited merely to identify inconsistencies and to filter out outliers. All consistent depth measurements are back-projected to 3D and merged into a unified point cloud. Moreover, a coverage mask based on photometric criteria is estimated in each step to reduce the generation of redundant points. See [20] for more details.

To evaluate the viability of the confidence-based weighting approach, we combined the developed fusion scheme with the weight computation proposed in [4]. The basic idea of this strategy is to judge the accuracy of each depth measurement by analyzing the photoconsistency distribution along the respective visual rays. Rays with a single sharp maximum are expected to provide more accurate estimates than those exhibiting a shallow maximum or several local maxima. More details can be found in [4].

Fig. 5 shows the reconstructions generated by applying all three techniques to a real-world image sequence. One of the input images can be seen in Fig. 2. Camera poses were obtained by applying a version of [2]. Note that the approach of [20] does not explicitly estimate normals for the generated point cloud. Therefore, for the purpose of rendering we assigned to each point a normal vector based on the depth map that was used to create it. For the other two approaches we used the normal estimates obtained online from the fusion process. It is evident that while all three methods achieve a high degree of completeness, the proposed one with confidence-based weighting outperforms the others in terms of accuracy. The technique in [20] produces an oversampling of the scene and is more sensitive to noise than the other two as each 3D point is based on a single depth measurement. This proves the importance of a depth map fusion scheme. Moreover, the reconstruction obtained with the proposed confidence-based weighting is significantly more accurate than the one relying on the weighting of [4], which validates the deployment of geometric and camera-based criteria in the depth integration process.
Figure 5. Comparison to alternative techniques. From left to right: reconstructions with the depth map merging technique in [20], the developed fusion scheme with the weighting suggested in [4], and the complete approach proposed in this paper. One of the images in the input sequence can be seen in Fig. 2. The reconstructions contain 311135, 161647 and 181077 points, respectively. While all three methods achieve a high degree of completeness, the proposed approach with confidence-based weighting outperforms the other two in terms of accuracy.
5.2. Live 3D Reconstruction on a Mobile Phone

Pursuing a system for live 3D reconstruction running on mobile phones as a primary goal, we integrated the proposed method into the framework of [20]. This substantially improved its accuracy while adding a negligible overhead of less than a second per processed image. In the following, multiple reconstructions of real-world objects, generated interactively on a Samsung Galaxy SIII and a Samsung Galaxy Note 3, are depicted.

Fig. 6 depicts the reconstruction of a fabric toy of a hippopotamus. Expectedly, homogeneous regions (e.g. on the ball) lead to holes in the 3D model. However, the well-textured head of the hippopotamus is reconstructed at high geometric precision.

Fig. 7 shows the reconstruction of a relief on a decoration vase. The model was captured outdoors under sunlight conditions. Note that this is a known failure case for many active sensors.

Figure 7. Relief. Rendering of the reconstructed surfel cloud with colors and shading, and a reference image of the object. The model was captured outdoors.

The capabilities of current mobile devices for in-hand scanning are further demonstrated in Fig. 8, which visualizes the reconstruction of a Buddha statue in a museum. Even though the generated point cloud exhibits a substantial amount of high-frequency noise, many small-scale details like the wrinkles of the clothing or the facial features are captured in the reconstruction.
6. Conclusion

We presented an efficient and accurate method for confidence-based depth map fusion. At its core is a two-stage approach where confidence-based weights, which reflect the expected accuracy, are first assigned to each depth measurement and subsequently integrated into a unified and consistent 3D model. Thereby, the maintained 3D representation in the form of a surfel cloud is updated dynamically so as to resolve visibility conflicts and ensure the integrity of the reconstruction. The advantages of the proposed approach in terms of accuracy improvements are highlighted by a comparison to alternative techniques which meet the underlying efficiency requirements. Additionally, the potential of the developed method is emphasized by integrating it into a state-of-the-art system for live 3D reconstruction running on a mobile phone and demonstrating its performance on multiple real-world objects.

Acknowledgments

We thank Lorenz Meier for helping with the supplementary material. This work is funded by the ETH Zurich Postdoctoral Fellowship Program, the Marie Curie Actions for People COFUND Program and ERC grant no. 210806.
References

[1] M. Keller, D. Lefloch, M. Lambers, S. Izadi, T. Weyrich, and A. Kolb. Real-time 3D reconstruction in dynamic scenes using point-based fusion. In 3DV, pages 1–8, 2013.
[2] G. Klein and D. Murray. Parallel tracking and mapping on a camera phone. In ISMAR, pages 83–86, 2009.
[3] M. Krainin, P. Henry, X. Ren, and D. Fox. Manipulator and object tracking for in-hand 3D object modeling. Int. J. Rob. Res., 30(11):1311–1327, 2011.
[4] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nistér, and M. Pollefeys. Real-time visibility-based fusion of depth maps. In IEEE International Conference on Computer Vision (ICCV), pages 1–8, 2007.
[5] R. A. Newcombe and A. J. Davison. Live dense reconstruction with a single moving camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[6] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In IEEE International Conference on Computer Vision (ICCV), pages 2320–2327, 2011.
[7] R. A. Newcombe et al. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, pages 127–136, 2011.
[8] Q. Pan, C. Arth, E. Rosten, G. Reitmayr, and T. Drummond. Rapid scene reconstruction on mobile phones from panoramic images. In ISMAR, pages 55–64, 2011.
[9] H. Pfister, M. Zwicker, J. van Baar, and M. Gross. Surfels: Surface elements as rendering primitives. In SIGGRAPH, pages 335–342, 2000.
[10] M. Pollefeys, L. Van Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. Int. J. Comput. Vision, 59(3):207–232, 2004.
[11] M. Pollefeys et al. Detailed real-time urban 3D reconstruction from video. Int. J. Comput. Vision, 78(2-3):143–167, 2008.
[12] V. Pradeep, C. Rhemann, S. Izadi, C. Zach, M. Bleyer, and S. Bathiche. MonoFusion: Real-time 3D reconstruction of small scenes with a single web camera. In ISMAR, pages 83–88, 2013.
[13] V. A. Prisacariu, O. Kaehler, D. Murray, and I. Reid. Simultaneous 3D tracking and reconstruction on a mobile phone. In ISMAR, 2013.
[14] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. In SIGGRAPH, pages 438–446, New York, NY, USA, 2002. ACM.
[15] A. Sankar and S. Seitz. Capturing indoor scenes with smartphones. In ACM Symposium on User Interface Software and Technology, 2012.
[16] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision, 47(1-3):7–42, Apr. 2002.
[17] S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski. A comparison and evaluation of multi-view stereo reconstruction algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 519–528, 2006.
[18] C. Strecha, W. von Hansen, L. V. Gool, P. Fua, and U. Thoennessen. On benchmarking camera calibration and multi-view stereo for high resolution imagery. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 2008.
[19] J. Stuehmer, S. Gumhold, and D. Cremers. Real-time dense geometry from a handheld camera. In Pattern Recognition (Proc. DAGM), pages 11–20, 2010.
[20] P. Tanskanen, K. Kolev, L. Meier, F. Camposeco, O. Saurer, and M. Pollefeys. Live metric 3D reconstruction on mobile phones. In IEEE International Conference on Computer Vision (ICCV), 2013.
[21] G. Vogiatzis and C. Hernandez. Video-based, real-time multi-view stereo. Image Vision Comput., pages 434–441, 2011.
[22] T. Weise, T. Wismer, B. Leibe, and L. V. Gool. In-hand scanning with online loop closure. In IEEE International Workshop on 3-D Digital Imaging and Modeling, 2009.
[23] A. Wendel, M. Maurer, G. Graber, T. Pock, and H. Bischof. Dense reconstruction on-the-fly. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1450–1457, 2012.