
A Generative Modelling Technique for 3D Reconstruction from a Single 2D Image

Saurabh Kumar Singh
Dept. of Computer Science & Engineering
IIT (BHU), Varanasi
Varanasi, India
[email protected]

Shrey Tanna
Dept. of Computer Science & Engineering
IIT (BHU), Varanasi
Varanasi, India
[email protected]

Abstract—3D object reconstruction is the task of predicting the 3D model of an object given a set of 2D images. In this paper, we propose an approach to solving this problem given a single 2D image, making use of several deep learning techniques. Our model consists of two parts. The first part generates multiple images of the object from different viewpoints. We include this part because reconstructing a 3D object directly from a single 2D image is quite difficult, whereas the same task is much easier given multiple images that capture different views of the same object; likewise, predicting an image from a different viewpoint is much easier than predicting the whole 3D object from an input image. The second part uses a network consisting of an Encoder, a Decoder (or Generator), and a Discriminator to predict the complete 3D voxel grid of the object. In this way, we achieve significant improvements over existing techniques.

Index Terms—Reconstruction, GANs, CNNs, Neural networks, Voxel

I. INTRODUCTION
The rapid development of fields like robotics, design, virtual reality, and medical imaging requires a crucial understanding of the shapes of objects, and hence 3D understanding of objects has evolved into an interesting problem for computer vision researchers in its own right. As computer storage capacity increases, the amount of visual data increases proportionally, and studying the shape features of this data has become a pressing concern. The problem would be far easier if we could somehow extract a 3D model from the data. This creation of a 3D model from a given set of images is called 3D reconstruction. Let us understand this with an example. Say there is a bus standing on the road, and we take a picture of it using a camera. The picture we get is a 2D image. If we are given this photo and asked to predict its 3D model, the task is quite tough. It can be seen that 3D reconstruction is just the reverse of taking a 2D photo of a 3D scene. When we photograph an object, we only get its projection onto a given plane, and the loss of most of the depth information makes the problem challenging. A human asked to predict the 3D counterpart of a given image can manage it because he or she has seen several similar pictures over a lifetime. This hints that a machine, too, can be trained with lots of visual data to accomplish the task. Since a given point on a 2D image may correspond to any depth in the 3D model, it is very difficult to predict the depth using only one image; however, the position of the point can be found as the intersection of projection rays provided we have two images of that object.

In this paper, we present a two-stage approach. The first stage focuses on generating several images with different viewpoints from a single image, using convolutional neural networks. The second stage constructs a 3D model from these multiple-view images using generative adversarial networks. This paper presents an improvement over existing deep learning methods [1], [2], which is well reflected in our results.

The organization of the paper is as follows: Section II presents a summary of the literature survey, covering methods for multiple-view image generation as well as non-neural-network-based and neural-network-based methods of 3D reconstruction. The proposed approach is discussed in Section III. The experimental analysis and results are discussed in Section IV. Finally, the paper is concluded in Section V.

II. RELATED WORK

A. Multiple views prediction from a single image

Several approaches have been studied for obtaining unseen views of a given image using image transformation or neural network methods. Transforming autoencoders [3] show how neural networks deal with variations in the orientation, scale, position, and lighting of an image. The Deep Convolutional Inverse Graphics Network (DC-IGN) [4] can generate images of the same object with a different pose, and similarly with different lighting, given an input image. It consists of multiple layers of convolution and de-convolution operators and is trained using the Stochastic Gradient Variational Bayes (SGVB) algorithm [5]. Such tasks are particularly popular in the field of face recognition and manipulation, as factors like identity, view, and illumination are coupled in face images, and disentangling them is a major challenge. As an example, the Multi-View Perceptron (MVP) [6] disentangles the face identity and predicts features under different viewpoints. Another similar problem is comparing two images taken from different views, which turns out to be quite challenging because features are not stable under large viewpoint changes. [7] solves this problem by synthesizing the features for other views with the help of a 3D model collection of related objects, based on which feature sets for the two images are created and compared.

To render rotated objects from a single image, [8] trains a network end to end: given an image, it produces the view from a viewpoint differing by a fixed angle. A discrete set of views of the given image may therefore be generated by applying the network repeatedly, although this may lead to error accumulation for large angles. In contrast, [2] can generate views while varying the angle continuously.

B. Neural Network Based 3D Reconstruction

1) RNN-based 3D object reconstruction: Recurrent neural networks have been used for reconstructing a 3D object from both single-image and multi-image input. The paper [9] proposes a novel recurrent neural network architecture, called the 3D Recurrent Reconstruction Neural Network (3D-R2N2), for 3D object reconstruction. This network can be fed one image or multiple images, taken from arbitrarily oriented viewpoints, and the procedure needs no pre-processing elements such as segmentation, viewpoint labels, or keypoints. The method uses recurrent neural networks like LSTMs and GRUs, as well as convolutional neural networks.

The network performs object reconstruction from single- or multi-view images and consists of three components: a 2D Convolutional Neural Network (2D-CNN), a 3D Convolutional LSTM (3D-LSTM), and a 3D Deconvolutional Neural Network (3D-DCNN).

2) CNN-based 3D object reconstruction: Convolutional neural networks are used in most computer vision problems; here we discuss some methods that use only CNNs for 3D reconstruction. The paper [10] explores the point set representation of a 3D object, predicted from a single image of that object. A point cloud is chosen as the output representation because it is simple and uniform and allows geometric transformations and deformations with little manipulation. Also, in cases where the ground truth is ambiguous, this representation provides the best output. To be capable of this, the model analyzes the visible parts of the object with the help of the input image and then makes an informed guess about the rest.

The network outputs a point set representation of a 3D object given a single image of that object as input. The 3D shape of the object is represented as an unordered point set S = \{(x_i, y_i, z_i)\}_{i=1}^{N}, where N is a constant, usually taken as 1024. Such a representation admits geometric transformations (like rotation, scaling, translation, or their combinations) by simple matrix algebra, as the sketch below illustrates. There is an encoder and a decoder (or predictor) in the network.
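As a quick illustration of that matrix-algebra convenience (our own sketch, not code from [10]), an N x 3 point set can be rotated, scaled, and translated in a single expression:

```python
import numpy as np

def transform_points(points, rotation, scale=1.0, translation=(0.0, 0.0, 0.0)):
    """Rotate, scale, and translate an (N, 3) point set in one matrix expression."""
    return scale * points @ rotation.T + np.asarray(translation)

# Example: rotate a random stand-in point set (N = 1024) by 30 degrees about z.
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
S = np.random.rand(1024, 3)
S_transformed = transform_points(S, Rz, scale=0.5, translation=(1.0, 0.0, 0.0))
```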
3) GAN based 3D Reconstruction: Generative adversarial networks [11] have several advantages over other methods: they can produce high-quality, realistic objects and outperform several other methods. Modified versions of GANs are also available, as mentioned below. WGAN and IWGAN [12] describe a method for stable training of GANs via an alternative training procedure, which includes weight clipping or enforcing a Lipschitz constraint. DCGAN [13] uses convolutional networks for implementing GANs. Stacked GAN [14] is trained to invert the hierarchical representations of a bottom-up discriminative network and is able to generate images of much higher quality than GANs without stacking. Further methods for improving GANs are described in [15], [16], and [17]. [1] presents an approach for generating a 3D object from a probabilistic space using GANs and CNNs. [18] also achieves this, but uses IWGAN. MarrNet [19] feeds 2.5D sketches such as a normal map, a depth map, and a silhouette map to the network in addition to the single input image in order to improve on 3D-GAN [1]. Most of this 3D work uses the ShapeNet dataset [20].

[21] trained generative up-convolutional neural networks that are able to generate images of objects given the object style, viewpoint, and color. [22] presented a method for joint analysis and synthesis of geometrically diverse 3D shape families. [23], [24] proposed to learn a joint embedding of 3D shapes and synthesized images; [25], [26] focused on learning discriminative representations for 3D object recognition; [27], [28], [29] discussed 3D object reconstruction from in-the-wild images, possibly with a recurrent network; and [24], [30] explored autoencoder-based networks for learning voxel-based object representations. [8] proposed a novel recurrent convolutional encoder-decoder network that is trained end to end on the task of rendering rotated objects starting from a single image.

C. Non-neural Network Based 3D Reconstruction

Many approaches are also available that do not involve any neural network techniques.

[31] presented an automated pipeline that takes pixels as information sources and produces 3D surfaces of different object categories from pictures of scenes. Their methodology uses deformable 3D models that can be learned from the 2D annotations available in existing object detection datasets. [32] proposed a two-step method: initially, they employ orthogonal matching pursuit to choose the single CAD model in the dictionary closest to the projected image; finally, they use their graph embedding based on local dense correspondence to allow for sparse linear combinations of the CAD models.

III. PROPOSED APPROACH

In this section, we introduce our approach to solving the 3D reconstruction problem, which divides naturally into the two parts discussed in the following sections.

A. Multiple view images generation step

The first part of the approach is to generate multiple view images for the given image. Several images with different viewpoints are produced using the multi-view 3D (mv3D) CNN network inspired by the paper [2]. This network takes an image as well as a viewpoint vector as input and outputs the required image. The viewpoint is described using a vector containing five elements: the sine and cosine of the azimuthal angle, the sine and cosine of the elevation angle, and the distance from the object center, as in the sketch below.
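To make this parameterization concrete, here is a minimal sketch of how such a five-element vector could be assembled; the helper name and the example angles are ours, not taken from [2]:

```python
import numpy as np

def viewpoint_vector(azimuth_deg, elevation_deg, distance):
    """(sin az, cos az, sin el, cos el, distance) encoding of a viewpoint."""
    az, el = np.deg2rad(azimuth_deg), np.deg2rad(elevation_deg)
    return np.array([np.sin(az), np.cos(az), np.sin(el), np.cos(el), distance],
                    dtype=np.float32)

# Eleven viewpoints around the object, as used later in this section; the
# uniform azimuth spacing and fixed elevation/distance are illustrative choices.
views = [viewpoint_vector(az, 20.0, 2.0) for az in np.linspace(0.0, 300.0, 11)]
```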

Fig. 1. First part of the network, showing multi-view images as the output of the mv3D network, given a single input image along with different viewpoint vectors.

The network used is an encoder-decoder network. A pair of an image and a viewpoint vector (x_i, v_i) is input to the network, and the network aims to predict the required image (with ground truth f_i). The encoder consists of five convolutional layers (stride 2) followed by a fully connected layer at the end. The viewpoint vector is processed independently by fully connected layers, and the result is merged with the encoder output. The decoder then takes the resulting vector through five deconvolutional layers. Deconvolution here is the reverse of convolution: each pixel is replaced with a 2 × 2 pixel window, with the original pixel value in the top-left position and the others filled with zero, as sketched below.
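The following PyTorch sketch shows the overall shape of such an encoder-decoder, including the zero-fill "deconvolution" just described. The layer widths and hidden sizes are our guesses for illustration; [2] should be consulted for the exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def zero_fill_unpool(x):
    """Each pixel becomes a 2x2 window: original value top-left, zeros elsewhere."""
    n, c, h, w = x.shape
    out = x.new_zeros(n, c, h * 2, w * 2)
    out[:, :, ::2, ::2] = x
    return out

class MV3DSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: five stride-2 convolutions, then a fully connected layer.
        ch = [3, 32, 64, 128, 256, 512]                       # widths are guesses
        self.enc = nn.ModuleList(
            nn.Conv2d(ch[i], ch[i + 1], 3, stride=2, padding=1) for i in range(5))
        self.enc_fc = nn.Linear(512 * 8 * 8, 1024)            # assumes 256x256 input
        # The viewpoint vector is processed independently, then merged.
        self.view_fc = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 64))
        self.merge_fc = nn.Linear(1024 + 64, 512 * 8 * 8)
        # Decoder: five unpool-plus-convolution stages back to image resolution.
        dch = [512, 256, 128, 64, 32, 3]
        self.dec = nn.ModuleList(
            nn.Conv2d(dch[i], dch[i + 1], 3, padding=1) for i in range(5))

    def forward(self, image, viewpoint):
        h = image
        for conv in self.enc:
            h = F.relu(conv(h))
        h = F.relu(self.enc_fc(h.flatten(1)))
        v = self.view_fc(viewpoint)
        h = F.relu(self.merge_fc(torch.cat([h, v], dim=1))).view(-1, 512, 8, 8)
        for conv in self.dec:
            h = conv(zero_fill_unpool(h))                     # "deconvolution" step
        return h                                              # (N, 3, 256, 256)
```

Training this sketch against the target view with the loss of Eq. (1) below then amounts to a sum-reduced squared error between the predicted and ground-truth images.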
To train the network, any pair of snapshots from the 3D model of some object, along with the angle information, is taken as an input-output pair. Training is done by minimizing the squared Euclidean loss on the required image:

L = \sum_i \| f_i - \hat{f}_i \|_2^2    (1)

where \hat{f}_i is the output image and f_i is the ground-truth image for input image x_i and viewpoint vector v_i.

For a given image x_i, we make 11 pairs of inputs (x_i, v_{i1}), (x_i, v_{i2}), ..., (x_i, v_{i11}) using 11 different viewpoint vectors v_{i1}, v_{i2}, ..., v_{i11}. As can be seen in Fig. 1, 11 images with different viewpoints are produced in our implementation; however, this number can be changed as appropriate. Now, the original image x_i, along with the 11 produced images \hat{f}_{i1}, \hat{f}_{i2}, ..., \hat{f}_{i11}, is taken as a whole to form a 256 × 256 × 12 block by keeping only the L-channel of each image (a sketch of this assembly follows). This block is then taken as the input to a 3D-GAN network, which constitutes the second part of our solution, described in the next section.
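A minimal sketch of this block assembly, assuming the L-channel refers to CIELAB lightness as computed by scikit-image (the helper name and channel order are ours):

```python
import numpy as np
from skimage.color import rgb2lab

def make_input_block(original, generated_views):
    """Stack the L-channels of the original image and its 11 generated views
    into a 256 x 256 x 12 block."""
    images = [original] + list(generated_views)       # 12 RGB arrays, 256 x 256 x 3
    l_channels = [rgb2lab(img)[:, :, 0] for img in images]
    return np.stack(l_channels, axis=-1).astype(np.float32)

# Example with random stand-in images:
orig = np.random.rand(256, 256, 3)
views = [np.random.rand(256, 256, 3) for _ in range(11)]
block = make_input_block(orig, views)                 # shape (256, 256, 12)
```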
B. 3D model construction step

In this section, we discuss the second part of the network, which takes the output of the first part as input and outputs the required 3D voxel grid.

The network consists of three components: an Encoder (E), a Generator (G), and a Discriminator (D). The encoder takes the 12 × 256 × 256 block (the set of 12 images' L-channels) as input. It consists of five 3D convolutional layers whose numbers of channels, kernel sizes, and strides are, respectively, (64, 11, 4), (128, 5, 2), (256, 5, 2), (512, 5, 2), and (400, 8, 1). Finally, we split the output of the last layer (a 400-dimensional vector) into two halves forming a mean vector and a standard deviation vector, from which a 200-dimensional vector is sampled. This vector is used as the input to the generator. The generator (or decoder) consists of three 3D convolutional layers with channels, kernel sizes, and strides of (128, 10, 2), (64, 4, 2), and (1, 4, 2), respectively. It outputs a 20 × 20 × 20 voxel grid, which is taken as input by the discriminator to check whether it is real or fake by estimating a score against the ground truth. The discriminator consists of three 3D convolutional layers with channels, kernel sizes, and strides of (64, 4, 2), (128, 4, 2), and (2, 2, 1), respectively. It outputs a value between 0 and 1, denoting the probability of the 3D voxel grid being real or fake. We also add a batch normalization layer and a leaky ReLU layer (slope 0.2) between the convolutional layers throughout; a runnable sketch of the three components follows.
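The sketch below mirrors the layer specifications just listed. The paper does not state padding values, how the 400-dimensional output is reduced, or that the generator layers are transposed convolutions, so those details (the paddings, the (1, 8, 8) final encoder kernel, the reparameterized sampling, and the discriminator head) are our reconstructions, chosen so the tensor shapes work out to the stated 20 × 20 × 20 grid.

```python
import torch
import torch.nn as nn

def block3d(cin, cout, k, s, p):
    # Conv3d + batch norm + leaky ReLU(0.2), as described in the text.
    return nn.Sequential(nn.Conv3d(cin, cout, k, stride=s, padding=p),
                         nn.BatchNorm3d(cout), nn.LeakyReLU(0.2))

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            block3d(1, 64, 11, 4, 5), block3d(64, 128, 5, 2, 2),
            block3d(128, 256, 5, 2, 2), block3d(256, 512, 5, 2, 2),
            nn.Conv3d(512, 400, (1, 8, 8)))          # depth axis is already 1 here

    def forward(self, x):                            # x: (N, 1, 12, 256, 256)
        h = self.net(x).flatten(1)                   # (N, 400)
        mu, log_sigma = h[:, :200], h[:, 200:]       # mean and (log) std halves
        z = mu + torch.exp(log_sigma) * torch.randn_like(mu)   # 200-dim sample
        return z, mu, log_sigma

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(200, 128, 10, stride=2, padding=2),
            nn.BatchNorm3d(128), nn.LeakyReLU(0.2),  # -> (128, 6, 6, 6)
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=2),
            nn.BatchNorm3d(64), nn.LeakyReLU(0.2),   # -> (64, 10, 10, 10)
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),
            nn.Sigmoid())                            # -> (1, 20, 20, 20) occupancies

    def forward(self, z):                            # z: (N, 200)
        return self.net(z.view(-1, 200, 1, 1, 1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(block3d(1, 64, 4, 2, 1),     # 20^3 -> 10^3
                                 block3d(64, 128, 4, 2, 1),   # 10^3 -> 5^3
                                 nn.Conv3d(128, 2, 2))        # -> (2, 4, 4, 4)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(2 * 4 * 4 * 4, 1), nn.Sigmoid())

    def forward(self, v):                            # v: (N, 1, 20, 20, 20)
        return self.head(self.net(v))                # real/fake probability
```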
The overall loss function is made up of the following components.

The first is the 3D-GAN loss, which is a minimax loss: it involves updating the two components G and D in alternating turns, minimizing the generator loss and maximizing the discriminator objective. The discriminator aims to maximize the probability of assigning the correct label to the training examples and to the output of G, so we can write the loss function for D as:

L_{Disc} = -\mathbb{E}_x[\log D(x)] - \mathbb{E}_z[\log(1 - D(G(z)))]    (2)

where \mathbb{E} is the mathematical expectation operator, x is the 3D voxel grid input to D, and z is the vector (the output of E) input to G. The generator aims to minimize the probability of its samples being labelled fake by D, so we can write the loss function for G as:

L_{Gen} = \mathbb{E}_z[\log(1 - D(G(z)))]    (3)

The second is the Kullback-Leibler divergence loss, which aims to confine the distribution of the output of the encoder E:

L_{KL} = D_{KL}(\mathcal{N}(\mu, \sigma) \| \mathcal{N}(0, I))    (4)

where D_{KL}(P \| Q) is the divergence between P and Q, \mathcal{N}(\mu, \sigma) denotes the normal distribution with mean \mu and standard deviation \sigma (produced by the encoder), and \mathcal{N}(0, I) denotes the standard normal distribution.

The third loss is the 3D model reconstruction loss, which is the squared Euclidean distance between the generator output and the target shape:

L_{3D} = \| y - \hat{y} \|_2^2    (5)

where y is the target 3D shape of an image and \hat{y} is the 3D output of the generator.
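In code, these three components might look as follows (a sketch under the same assumptions as above; binary cross-entropy realizes Eq. (2), and the KL term uses the closed form for a diagonal Gaussian):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake):
    # Eq. (2): -E_x[log D(x)] - E_z[log(1 - D(G(z)))], via binary cross-entropy.
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
            + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

def generator_gan_loss(d_fake):
    # Eq. (3): E_z[log(1 - D(G(z)))], which the generator minimizes.
    return torch.log(1.0 - d_fake + 1e-8).mean()

def kl_loss(mu, log_sigma):
    # Eq. (4): closed-form KL(N(mu, sigma) || N(0, I)) for a diagonal Gaussian.
    return 0.5 * torch.sum(mu ** 2 + torch.exp(2 * log_sigma)
                           - 2 * log_sigma - 1, dim=1).mean()

def reconstruction_loss(voxels_pred, voxels_target):
    # Eq. (5): squared Euclidean distance between generated and target grids.
    return torch.sum((voxels_target - voxels_pred) ** 2, dim=(1, 2, 3, 4)).mean()
```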

Fig. 2. Second part of the network, showing its three components: Encoder, Generator, and Discriminator.

In each iteration of the training step, the appropriate set of loss functions is used for updating each particular component of the network. We have used the ADAM optimizer for training, setting the parameters to α = 0.00008, β1 = 0.5, and β2 = 0.9. We have trained the algorithm for 1200 epochs, keeping the batch size at 256.
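Below is a sketch of one such training iteration, reusing the modules and loss helpers sketched above. The exact assignment of losses to components is not spelled out in the paper, so the split here (D trained on Eq. (2); E and G trained jointly on Eqs. (3)-(5)) is the usual VAE-GAN recipe rather than a confirmed detail:

```python
import torch

E, G, D = Encoder(), Generator(), Discriminator()    # from the sketch above
adam = lambda m: torch.optim.Adam(m.parameters(), lr=8e-5, betas=(0.5, 0.9))
opt_E, opt_G, opt_D = adam(E), adam(G), adam(D)

block = torch.randn(4, 1, 12, 256, 256)   # stand-in batch of L-channel blocks
real = torch.rand(4, 1, 20, 20, 20)       # stand-in ground-truth voxel grids

z, mu, log_sigma = E(block)
fake = G(z)

# Discriminator update, Eq. (2).
loss_d = discriminator_loss(D(real), D(fake.detach()))
opt_D.zero_grad(); loss_d.backward(); opt_D.step()

# Encoder and generator update, Eqs. (3) + (4) + (5).
loss_eg = (generator_gan_loss(D(fake)) + kl_loss(mu, log_sigma)
           + reconstruction_loss(fake, real))
opt_E.zero_grad(); opt_G.zero_grad()
loss_eg.backward()
opt_E.step(); opt_G.step()
```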
IV. EXPERIMENTS

A. Dataset and System Description

The 3D CAD models are downloaded from the ShapeNet [20] website. These models are rendered as images under different illumination levels, sizes, azimuthal angles, and elevation angles. The images are then resized by center-cropping them to 256 × 256 (a sketch of this step is shown below). These images and their corresponding voxel grids are used as the dataset for the task. We have trained the model on the car dataset.

We have used a system with three Graphics Processing Units (GPUs) for training. One of them is an Nvidia Titan Xp with 12 GB RAM, 12196 MB FB memory, and 256 MB BAR1 memory; the other two are Nvidia GeForce GTX 1080 Ti cards with 11 GB RAM, 11178 MB FB memory, and 256 MB BAR1 memory.
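A minimal sketch of the cropping step (our reading of the description; the function name is hypothetical):

```python
from PIL import Image

def center_crop_256(path):
    """Center-crop a rendered image to 256 x 256."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    left, top = (w - 256) // 2, (h - 256) // 2
    return img.crop((left, top, left + 256, top + 256))
```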
B. Results

We show the qualitative results for 3D reconstruction from a single image in Fig. 3. In each row, the image on the left is the input image, and the other images show two arbitrary views of the corresponding 3D output. As can be seen, the results never miss a major part of the object, even a thin one, which was not the case with some other methods; this is easily seen in the output for the tractor.

V. CONCLUSION AND FUTURE WORK

In this paper, we present an approach that simplifies the problem of reconstructing a 3D object from a single image to that of reconstructing a 3D object from multiple images captured from different angles. For this task, we use the multi-view 3D CNN network inspired by [2]. The actual 3D generation after this step is done by the GAN: the encoder of this GAN takes the L-channels of several multi-view images as its input, and the decoder consists of three 3D convolutional layers, which output a voxel grid.

The approaches proposed so far, including ours, are likely to work only for a single class of images: the network is trained on a given class of objects and gives the desired output only for images taken from that class. Developing a robust solution that works for all classes of images remains a challenging problem.

ACKNOWLEDGEMENTS

This paper and the project would not have been possible without the support of our mentor, Dr. Pratik Chattopadhyay, Assistant Professor, CSE, IIT (BHU), Varanasi. His valuable discussions, insightful suggestions, and knowledge kept our work on track.

REFERENCES

[1] Jiajun Wu, Chengkai Zhang, Tianfan Xue, Bill Freeman, and Josh Tenenbaum. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in Neural Information Processing Systems, pages 82–90, 2016.
[2] Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. Multi-view 3d models from single images with a convolutional network. In European Conference on Computer Vision, pages 322–337. Springer, 2016.
[3] Geoffrey E Hinton, Alex Krizhevsky, and Sida D Wang. Transforming auto-encoders. In International Conference on Artificial Neural Networks, pages 44–51. Springer, 2011.
[4] Tejas D Kulkarni, William F Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In Advances in Neural Information Processing Systems, pages 2539–2547, 2015.
[5] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[6] Zhenyao Zhu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Multi-view perceptron: a deep model for learning face identity and view representations. In Advances in Neural Information Processing Systems, pages 217–225, 2014.
[7] Hao Su, Fan Wang, Li Yi, and Leonidas Guibas. 3d-assisted image feature synthesis for novel views of an object. arXiv preprint arXiv:1412.0003, 2014.
[8] Jimei Yang, Scott E Reed, Ming-Hsuan Yang, and Honglak Lee. Weakly-supervised disentangling with recurrent transformations for 3d view synthesis. In Advances in Neural Information Processing Systems, pages 1099–1107, 2015.
[9] Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In European Conference on Computer Vision, pages 628–644. Springer, 2016.
[10] Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 605–613, 2017.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[12] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
[13] Wei Fang, Feihong Zhang, Victor S Sheng, and Yewen Ding. A method for improving cnn-based image recognition using dcgan. Computers, Materials & Continua, 57:167–178, 2018.

[14] Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, and Serge Belongie. Stacked generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5077–5086, 2017.
[15] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
[16] Ian Goodfellow. Nips 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
[17] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in Neural Information Processing Systems, pages 2234–2242, 2016.
[18] Edward Smith and David Meger. Improved adversarial systems for 3d object generation and reconstruction. arXiv preprint arXiv:1707.09557, 2017.
[19] Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, William T Freeman, and Joshua B Tenenbaum. Marrnet: 3d shape reconstruction via 2.5d sketches. In Advances in Neural Information Processing Systems, 2017.
[20] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
[21] Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, and Thomas Brox. Learning to generate chairs, tables and cars with convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):692–705, 2017.
[22] Haibin Huang, Evangelos Kalogerakis, and Benjamin Marlin. Analysis and synthesis of 3d shape families via deep-learned generative models of surfaces. In Computer Graphics Forum, volume 34, pages 25–38. Wiley Online Library, 2015.
[23] Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, and Leonidas J Guibas. Joint embeddings of shapes and images via cnn image purification. ACM Transactions on Graphics (TOG), 34(6):234, 2015.
[24] Rohit Girdhar, David F Fouhey, Mikel Rodriguez, and Abhinav Gupta. Learning a predictable and generative vector representation for objects. In European Conference on Computer Vision, pages 484–499. Springer, 2016.
[25] Hao Su, Charles R Qi, Yangyan Li, and Leonidas J Guibas. Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In Proceedings of the IEEE International Conference on Computer Vision, pages 2686–2694, 2015.
[26] Charles R Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J Guibas. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5648–5656, 2016.
[27] Jiajun Wu, Tianfan Xue, Joseph J Lim, Yuandong Tian, Joshua B Tenenbaum, Antonio Torralba, and William T Freeman. Single image 3d interpreter network. In European Conference on Computer Vision, pages 365–382. Springer, 2016.
[28] Yu Xiang, Wongun Choi, Yuanqing Lin, and Silvio Savarese. Data-driven 3d voxel patterns for object category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1903–1911, 2015.
[29] Christopher B Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, and Silvio Savarese. 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In European Conference on Computer Vision, pages 628–644. Springer, 2016.
[30] Abhishek Sharma, Oliver Grau, and Mario Fritz. Vconv-dae: Deep volumetric shape learning without object labels. In European Conference on Computer Vision, pages 236–250. Springer, 2016.
[31] Abhishek Kar, Shubham Tulsiani, Joao Carreira, and Jitendra Malik. Category-specific object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1966–1974, 2015.
[32] Chen Kong, Chen-Hsuan Lin, and Simon Lucey. Using locally corresponding cad models for dense 3d reconstructions from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4857–4865, 2017.

Fig. 3. Results showing 3D reconstruction from a single image. In each row, the image on the left is the input image, and the other images show two arbitrary views of the corresponding 3D output.
