Fig. 1. Point cloud segmentation using the proposed neural network. Bottom: schematic neural network architecture. Top: Structure of the feature spaces
produced at different layers of the network, visualized as the distance from the red point to all the rest of the points (shown left-to-right are the input
and layers 1–3; rightmost figure shows the resulting segmentation). Observe how the feature space structure in deeper layers captures semantically similar
structures such as wings, fuselage, or turbines, despite a large distance between them in the original input space.
Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNNs to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.

CCS Concepts: • Computing methodologies → Neural networks; Point-based models; Shape analysis;

Additional Key Words and Phrases: Point cloud, classification, segmentation

The authors acknowledge the generous support of Army Research Office Grant No. W911NF-12-R-0011, of Air Force Office of Scientific Research Award No. FA9550-19-1-0319, of National Science Foundation Grant No. IIS-1838071, of ERC Consolidator Grant No. 724228 (LEMAN), from an Amazon Research Award, from the MIT-IBM Watson AI Laboratory, from the Toyota-CSAIL Joint Research Center, from the Skoltech-MIT Next Generation Program, and from a Google Faculty Research Award. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these organizations.

Authors' addresses: Y. Wang, Y. Sun, S. E. Sarma, and J. M. Solomon, Massachusetts Institute of Technology; emails: [email protected], {yb_sun, sesarma, jsolomon}@mit.edu; Z. Liu, The Chinese University of Hong Kong; email: [email protected]; M. M. Bronstein, Imperial College London / USI Lugano; email: [email protected].
ACM Reference format:
Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, and Justin M. Solomon. 2019. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 38, 5, Article 146 (October 2019), 12 pages. https://doi.org/10.1145/3326362

1 INTRODUCTION
Point clouds, or scattered collections of points in 2D or 3D, are arguably the simplest shape representation; they also comprise the output of 3D sensing technology, including LiDAR scanners and stereo reconstruction. With the advent of fast 3D point cloud acquisition, recent pipelines for graphics and vision often process point clouds directly, bypassing expensive mesh reconstruction or denoising due to efficiency considerations or instability of these techniques in the presence of noise. A few of the many recent applications of point cloud processing and analysis include indoor navigation (Zhu et al. 2017), self-driving vehicles (Liang et al. 2018; Qi et al. 2017a; Wang et al. 2018b), robotics (Rusu et al. 2008b), and shape synthesis and modeling (Golovinskiy et al. 2009; Guerrero et al. 2018).

These modern applications demand high-level processing of point clouds. Rather than identifying salient geometric features like corners and edges, recent algorithms search for semantic cues and affordances. These features do not fit cleanly into the frameworks of computational or differential geometry and typically require learning-based approaches that derive relevant information through statistical analysis of labeled or unlabeled datasets.

In this article, we primarily consider point cloud classification and segmentation, two model tasks in point cloud processing. Traditional methods for solving these problems employ handcrafted features to capture geometric properties of point clouds (Lu et al. 2014; Rusu et al. 2009, 2008a). More recently, the success of deep neural networks for image processing has motivated a data-driven approach to learning features on point clouds. Deep point cloud processing and analysis methods are developing rapidly and outperform traditional approaches in various tasks (Chang et al. 2015).

Adaptation of deep learning to point cloud data, however, is far from straightforward. Most critically, standard deep neural network models require input data with regular structure, while point clouds are fundamentally irregular: Point positions are continuously distributed in space, and any permutation of their ordering does not change the spatial distribution. One common approach to processing point cloud data with deep learning models is to first convert the raw point cloud into a volumetric representation, namely a 3D grid (Maturana and Scherer 2015; Wu et al. 2015). This approach, however, usually introduces quantization artifacts and excessive memory usage, making it difficult to capture high-resolution or fine-grained features.

State-of-the-art deep neural networks are designed specifically to handle the irregularity of point clouds, directly manipulating raw point cloud data rather than passing to an intermediate regular representation. This approach was pioneered by PointNet (Qi et al. 2017b), which achieves permutation invariance of points by operating on each point independently and subsequently applying a symmetric function to accumulate features. Various extensions of PointNet consider neighborhoods of points rather than acting on each independently (Qi et al. 2017c; Shen et al. 2017); these allow the network to exploit local features, improving upon the performance of the basic model. These techniques largely treat points independently at local scale to maintain permutation invariance. This independence, however, neglects the geometric relationships among points, a fundamental limitation that prevents these models from capturing local features.

To address these drawbacks, we propose a novel, simple operation, called EdgeConv, which captures local geometric structure while maintaining permutation invariance. Instead of generating point features directly from their embeddings, EdgeConv generates edge features that describe the relationships between a point and its neighbors. EdgeConv is designed to be invariant to the ordering of neighbors, and thus is permutation invariant. Because EdgeConv explicitly constructs a local graph and learns the embeddings for the edges, the model is capable of grouping points both in Euclidean space and in semantic space.

EdgeConv is easy to implement and integrate into existing deep learning models to improve their performance. In our experiments, we integrate EdgeConv into the basic version of PointNet without using any feature transformation. We show the resulting network achieves state-of-the-art performance on several datasets, most notably ModelNet40 and S3DIS for classification and segmentation.

Key Contributions. We summarize the key contributions of our work as follows:

• We present a novel operation for learning from point clouds, EdgeConv, to better capture local geometric features of point clouds while still maintaining permutation invariance.
• We show the model can learn to semantically group points by dynamically updating a graph of relationships from layer to layer.
• We demonstrate that EdgeConv can be integrated into multiple existing pipelines for point cloud processing.
• We present extensive analysis and testing of EdgeConv and show that it achieves state-of-the-art performance on benchmark datasets.

2 RELATED WORK
Hand-Crafted Features. Various tasks in geometric data processing and analysis—including segmentation, classification, and matching—require some notion of local similarity between shapes. Traditionally, this similarity is established by constructing feature descriptors that capture local geometric structure. Countless papers in computer vision and graphics propose local feature descriptors for point clouds suitable for different problems and data structures. A comprehensive overview of hand-designed point features is out of the scope of this article, but we refer the reader to Biasotti et al. (2016), Guo et al. (2014), and Van Kaick et al. (2011) for discussion.

Broadly speaking, one can distinguish between extrinsic and intrinsic descriptors. Extrinsic descriptors usually are derived from the coordinates of the shape in 3D space and include classical methods like shape context (Belongie et al. 2001), spin images (Johnson and Hebert 1999), integral features (Manay et al. 2006), distance-based descriptors (Ling and Jacobs 2007), point feature histograms (Rusu et al. 2009, 2008a), and normal histograms (Tombari et al. 2011), to name a few.
Fig. 2. Left: Computing an edge feature, e_ij (top), from a point pair, x_i and x_j (bottom). In this example, h_Θ(·) is instantiated using a fully connected layer, and the learnable parameters are its associated weights. Right: The EdgeConv operation. The output of EdgeConv is calculated by aggregating the edge features associated with all the edges emanating from each connected vertex.
Intrinsic descriptors treat the 3D shape as a manifold whose metric structure is discretized as a mesh or graph; quantities expressed in terms of the metric are invariant to isometric deformation. Representatives of this class include spectral descriptors such as global point signatures (Rustamov 2007), the heat and wave kernel signatures (Aubry et al. 2011; Sun et al. 2009), and variants (Bronstein and Kokkinos 2010). Most recently, several approaches wrap machine learning schemes around standard descriptors (Guo et al. 2014; Shah et al. 2013).

Deep Learning on Geometry. Following the breakthrough results of convolutional neural networks (CNNs) in vision (Krizhevsky et al. 2012; LeCun et al. 1989), there has been strong interest in adapting such methods to geometric data. Unlike images, geometry usually does not have an underlying grid, requiring new building blocks that replace convolution and pooling, or an adaptation to a grid structure.

As a simple way to overcome this issue, view-based (Su et al. 2015; Wei et al. 2016) and volumetric representations (Klokov and Lempitsky 2017; Maturana and Scherer 2015; Tatarchenko et al. 2017; Wu et al. 2015)—or their combination (Qi et al. 2016)—"place" geometric data onto a grid. More recently, PointNet (Qi et al. 2017b, 2017c) exemplifies a broad class of deep learning architectures on non-Euclidean data (graphs and manifolds) termed geometric deep learning (Bronstein et al. 2017). These date back to early methods to construct neural networks on graphs (Scarselli et al. 2009), recently improved with gated recurrent units (Li et al. 2016) and neural message passing (Gilmer et al. 2017). Bruna et al. (2013) and Henaff et al. (2015) generalized convolution to graphs via the Laplacian eigenvectors (Shuman et al. 2013). Computational drawbacks of this foundational approach were alleviated in follow-up works using polynomial (Defferrard et al. 2016; Kipf and Welling 2017; Monti et al. 2017b, 2018) or rational (Levie et al. 2017) spectral filters that avoid Laplacian eigendecomposition and guarantee localization.

An alternative definition of non-Euclidean convolution employs spatial rather than spectral filters. The Geodesic CNN (GCNN) is a deep CNN on meshes generalizing the notion of patches using local intrinsic parameterization (Masci et al. 2015). Its key advantage over spectral approaches is better generalization as well as a simple way of constructing directional filters. Follow-up work proposed different local charting techniques using anisotropic diffusion (Boscaini et al. 2016) or Gaussian mixture models (Monti et al. 2017a; Veličković et al. 2017). In Halimi et al. (2018) and Litany et al. (2017b), a differentiable functional map (Ovsjanikov et al. 2012) layer was incorporated into a geometric deep neural network, allowing intrinsic structured prediction of correspondence between nonrigid shapes.

The last class of geometric deep learning approaches attempts to pull back a convolution operation by embedding the shape into a domain with shift-invariant structure such as the sphere (Sinha et al. 2016), torus (Maron et al. 2017), plane (Ezuz et al. 2017), sparse network lattice (Su et al. 2018), or spline (Fey et al. 2018).

Finally, we should mention geometric generative models, which attempt to generalize models such as autoencoders, variational autoencoders (VAEs) (Kingma and Welling 2013), and generative adversarial networks (GANs) (Goodfellow et al. 2014) to the non-Euclidean setting. One of the fundamental differences between these two settings is the lack of canonical order between the input and the output vertices, thus requiring an input-output correspondence problem to be solved. In 3D mesh generation, it is commonly assumed that the mesh is given and its vertices are canonically ordered; the generation problem thus amounts only to determining the embedding of the mesh vertices. Kostrikov et al. (2017) proposed Surface Networks based on the extrinsic Dirac operator for this task. Litany et al. (2017a) introduced an intrinsic VAE for meshes and applied it to shape completion; a similar architecture was used by Ranjan et al. (2018) for 3D face synthesis. For point clouds, multiple generative architectures have been proposed (Fan et al. 2017; Li et al. 2018b; Yang et al. 2018).

3 OUR APPROACH
We propose an approach inspired by PointNet and convolution operations. Instead of working on individual points like PointNet, however, we exploit local geometric structures by constructing a local neighborhood graph and applying convolution-like operations on the edges connecting neighboring pairs of points, in the spirit of graph neural networks. We show in the following that such an operation, dubbed edge convolution (EdgeConv), has properties lying between translation invariance and non-locality.

Unlike graph CNNs, our graph is not fixed but rather is dynamically updated after each layer of the network. That is, the set of k-nearest neighbors of a point changes from layer to layer of the network and is computed from the sequence of embeddings. Proximity in feature space differs from proximity in the input, leading to nonlocal diffusion of information throughout the point cloud.

As a connection to existing work, Non-local Neural Networks (Wang et al. 2018a) explored similar ideas in the video recognition field, and follow-up work by Xie et al. (2018) proposed using non-local blocks to denoise feature maps to defend against adversarial attacks.
Fig. 3. Model architectures: The model architectures used for classification (top branch) and segmentation (bottom branch). The classification model takes as input n points, calculates an edge feature set of size k for each point at an EdgeConv layer, and aggregates features within each set to compute EdgeConv responses for corresponding points. The output features of the last EdgeConv layer are aggregated globally to form a 1D global descriptor, which is used to generate classification scores for c classes. The segmentation model extends the classification model by concatenating the 1D global descriptor and all the EdgeConv outputs (serving as local descriptors) for each point. It outputs per-point classification scores for p semantic labels. ⊕: concatenation. Point cloud transform block: The point cloud transform block is designed to align an input point set to a canonical space by applying an estimated 3 × 3 matrix. To estimate the 3 × 3 matrix, a tensor concatenating the coordinates of each point and the coordinate differences between it and its k neighboring points is used. EdgeConv block: The EdgeConv block takes as input a tensor of shape n × f, computes edge features for each point by applying a multi-layer perceptron (MLP) with the numbers of layer neurons defined as {a_1, a_2, ..., a_n}, and generates a tensor of shape n × a_n after pooling among neighboring edge features.
3.1 Edge Convolution
Consider an F-dimensional point cloud with n points, denoted by X = {x_1, ..., x_n} ⊆ R^F. In the simplest setting of F = 3, each point contains 3D coordinates x_i = (x_i, y_i, z_i); it is also possible to include additional coordinates representing color, surface normal, and so on. In a deep neural network architecture, each subsequent layer operates on the output of the previous layer, so more generally the dimension F represents the feature dimensionality of a given layer.

We compute a directed graph G = (V, E) representing local point cloud structure, where V = {1, ..., n} and E ⊆ V × V are the vertices and edges, respectively. In the simplest case, we construct G as the k-nearest neighbor (k-NN) graph of X in R^F. The graph includes self-loops, meaning each node also points to itself. We define edge features as e_ij = h_Θ(x_i, x_j), where h_Θ : R^F × R^F → R^F′ is a nonlinear function with a set of learnable parameters Θ.

Finally, we define the EdgeConv operation by applying a channel-wise symmetric aggregation operation □ (e.g., ∑ or max) on the edge features associated with all the edges emanating from each vertex. The output of EdgeConv at the i-th vertex is thus given by

    x′_i = □_{j:(i,j)∈E} h_Θ(x_i, x_j).    (1)

Making an analogy to convolution on images, we regard x_i as the central pixel and {x_j : (i, j) ∈ E} as a patch around it (see Figure 2). Overall, given an F-dimensional point cloud with n points, EdgeConv produces an F′-dimensional point cloud with the same number of points.

Choice of h and □. The choice of the edge function and the aggregation operation has a crucial influence on the properties of EdgeConv. For example, when x_1, ..., x_n represent image pixels on a regular grid and the graph G has connectivity representing patches of fixed size around each pixel, the choice θ_m · x_j as the edge function and sum as the aggregation operation yields standard convolution:

    x′_im = ∑_{j:(i,j)∈E} θ_m · x_j.    (2)

Here, Θ = (θ_1, ..., θ_M) encodes the weights of M different filters. Each θ_m has the same dimensionality as x, and · denotes the Euclidean inner product.

A second choice of h is

    h_Θ(x_i, x_j) = h_Θ(x_i),    (3)

encoding only global shape information oblivious of the local neighborhood structure. This type of operation is used in PointNet, which can thus be regarded as a special case of EdgeConv.

A third choice of h, adopted by Atzmon et al. (2018), is

    h_Θ(x_i, x_j) = h_Θ(x_j)    (4)
and

    x′_im = ∑_{j∈V} h_θ(x_j) g(u(x_i, x_j)),    (5)

where g is a Gaussian kernel and u computes pairwise distances in Euclidean space.

A fourth option is

    h_Θ(x_i, x_j) = h_Θ(x_j − x_i).    (6)

This encodes only local information, treating the shape as a collection of small patches and losing global structure.

Finally, a fifth option, which we adopt in this article, is an asymmetric edge function

    h_Θ(x_i, x_j) = h̄_Θ(x_i, x_j − x_i).    (7)

This explicitly combines global shape structure, captured by the coordinates of the patch centers x_i, with local neighborhood information, captured by x_j − x_i. In particular, we can define our operator by taking

    e′_ijm = ReLU(θ_m · (x_j − x_i) + ϕ_m · x_i),    (8)

which can be implemented as a shared MLP, and

    x′_im = max_{j:(i,j)∈E} e′_ijm,    (9)

where Θ = (θ_1, ..., θ_M, ϕ_1, ..., ϕ_M).
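To make the construction concrete, here is a minimal PyTorch sketch of Equations (8) and (9). This is an illustration rather than the released implementation; the knn helper, the (batch, points, channels) tensor layout, and the use of a single shared linear layer on the concatenated pair [x_i, x_j − x_i] (which realizes both θ_m and ϕ_m at once) are our own assumptions.

```python
import torch
import torch.nn as nn

def knn(x, k):
    # x: (B, N, F). Returns indices (B, N, k) of each point's k nearest
    # neighbors under squared Euclidean distance (self-loops included).
    inner = torch.matmul(x, x.transpose(1, 2))            # (B, N, N)
    sq = (x ** 2).sum(dim=-1, keepdim=True)               # (B, N, 1)
    pairwise = sq - 2 * inner + sq.transpose(1, 2)        # ||x_i - x_j||^2
    return pairwise.topk(k, dim=-1, largest=False).indices

class EdgeConv(nn.Module):
    """Sketch of Eq. (8)-(9): edge features from [x_i, x_j - x_i] through a
    shared MLP, aggregated by a channel-wise max over the k neighbors."""
    def __init__(self, in_dim, out_dim, k=20):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Linear(2 * in_dim, out_dim),
            nn.BatchNorm1d(out_dim),
            nn.ReLU(),
        )

    def forward(self, x):                                  # x: (B, N, F)
        B, N, F = x.shape
        idx = knn(x, self.k)                               # (B, N, k)
        neighbors = torch.gather(
            x.unsqueeze(1).expand(B, N, N, F), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, F))     # x_j: (B, N, k, F)
        center = x.unsqueeze(2).expand_as(neighbors)       # x_i, repeated
        edge = torch.cat([center, neighbors - center], -1) # (B, N, k, 2F)
        edge = self.mlp(edge.reshape(-1, 2 * F)).reshape(B, N, self.k, -1)
        return edge.max(dim=2).values                      # Eq. (9)
```

A layer such as EdgeConv(3, 64) maps an n × 3 cloud to n × 64 features; because max ranges over an unordered neighbor set, the output does not depend on how the neighbors are listed.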
3.2 Dynamic Graph Update
Our experiments suggest that it is beneficial to recompute the graph using nearest neighbors in the feature space produced by each layer. This is a crucial distinction of our method from graph CNNs working on a fixed input graph. Such a dynamic graph update is the reason for the name of our architecture, the Dynamic Graph CNN (DGCNN). With dynamic graph updates, the receptive field is as large as the diameter of the point cloud, while being sparse.

At each layer, we have a different graph G^(l) = (V^(l), E^(l)), where the l-th layer edges are of the form (i, j_i1), ..., (i, j_ik_l) such that x^(l)_ji1, ..., x^(l)_jik_l are the k_l points closest to x^(l)_i. Put differently, our architecture learns how to construct the graph G used in each layer rather than taking it as a fixed constant constructed before the network is evaluated. In our implementation, we compute a pairwise distance matrix in feature space and then take the closest k points for each point.
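Continuing the sketch above, the dynamic update amounts to letting each EdgeConv layer call knn on its own input features; the layer widths and k below are illustrative assumptions, not the paper's exact configuration.

```python
class DGCNNBackbone(nn.Module):
    """Stack of EdgeConv layers with per-layer graph recomputation: each
    layer's k-NN graph is built from the features the previous layer
    produced, so neighborhoods change from layer to layer."""
    def __init__(self, dims=(3, 64, 64, 128), k=20):
        super().__init__()
        self.layers = nn.ModuleList(
            EdgeConv(d_in, d_out, k) for d_in, d_out in zip(dims, dims[1:]))

    def forward(self, x):                    # x: (B, N, 3) input coordinates
        features = []
        for layer in self.layers:
            x = layer(x)                     # knn runs on the current features
            features.append(x)
        return torch.cat(features, dim=-1)   # concatenated local descriptors
```

A classification head in the style of Figure 3 would max-pool these per-point features over all n points into a single global descriptor and apply fully connected layers to produce the c class scores.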
3.3 Properties
Permutation Invariance. Consider the output of a layer,

    x′_i = max_{j:(i,j)∈E} h_Θ(x_i, x_j),    (10)

and a permutation operator π. The output of the layer x′_i is invariant to permutation of the inputs x_j because max is a symmetric function (other symmetric functions also apply). The global max pooling operator used to aggregate point features is also permutation-invariant.

Translation Invariance. Our operator has a "partial" translation invariance property, in that our choice of edge function, Equation (7), explicitly exposes the part of the function that can be translation-dependent and optionally can be disabled. Consider a translation T applied to x_j and x_i; we can show that part of the edge feature is preserved when shifting by T. In particular, for the translated point cloud, we have

    e′_ijm = θ_m · (x_j + T − (x_i + T)) + ϕ_m · (x_i + T)
           = θ_m · (x_j − x_i) + ϕ_m · (x_i + T).

If we only consider x_j − x_i by taking ϕ_m = 0, then the operator is fully invariant to translation. In this case, however, the model reduces to recognizing an object based on an unordered set of patches, ignoring the positions and orientations of patches. With both x_j − x_i and x_i as input, the model takes into account the local geometry of patches while keeping global shape information.
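Both properties can be spot-checked numerically on the sketches above (eval() freezes the batch-norm statistics so the layer is a fixed function):

```python
torch.manual_seed(0)
layer = EdgeConv(3, 32, k=8).eval()
x = torch.randn(1, 64, 3)

# Permutation invariance: permuting the input points permutes the output
# rows but leaves each point's feature unchanged.
perm = torch.randperm(64)
out, out_perm = layer(x), layer(x[:, perm])
assert torch.allclose(out[:, perm], out_perm, atol=1e-5)

# "Partial" translation invariance: only the phi_m . x_i half of the edge
# feature changes under x -> x + T, so outputs generally differ unless
# the x_i input is disabled (cf. taking phi_m = 0 above).
```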
ACM Transactions on Graphics, Vol. 38, No. 5, Article 146. Publication date: October 2019.
146:6 • Y. Wang et al.
Fig. 4. Structure of the feature spaces produced at different stages of our shape classification neural network architecture, visualized as the distance from the red point to the rest of the points. For each set, Left: Euclidean distance in the input R³ space; Middle: Distance after the point cloud transform stage, amounting to a global transformation of the shape; Right: Distance in the feature space of the last layer. Observe how in the feature space of deeper layers semantically similar structures such as shelves of a bookshelf or legs of a table are brought close together, although they are distant in the original space.
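In the same spirit, the distance maps of Figures 1 and 4 can be approximated from the sketches above: embed the cloud, pick a reference ("red") point, and plot the per-point feature-space distances (the backbone choice here is illustrative).

```python
backbone = DGCNNBackbone().eval()
with torch.no_grad():
    feats = backbone(x)                          # (1, N, D) per-point features
ref = 0                                          # index of the reference point
dist = (feats[0] - feats[0, ref]).norm(dim=-1)   # (N,) distances to visualize
```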
3.4 Comparison to Existing Methods
DGCNN is related to two classes of approaches, PointNet and graph CNNs, which we show to be particular settings of our method. We summarize the different methods in Table 1.

PointNet is a special case of our method with k = 1, yielding a graph with an empty edge set E = ∅. The edge function used in PointNet is h_Θ(x_i, x_j) = h_Θ(x_i), which considers global but not local geometry. PointNet++ tries to account for local structure by applying PointNet in a local manner. In our parlance, PointNet++ first constructs the graph according to the Euclidean distances between the points, and in each layer applies a graph coarsening operation. For each layer, some points are selected using farthest point sampling (FPS); only the selected points are preserved while the others are discarded after this layer. In this way, the graph becomes smaller after the operation applied at each layer. In contrast to DGCNN, PointNet++ computes pairwise distances using point input coordinates, and hence their graphs are fixed during training. The edge function used by PointNet++ is h_Θ(x_i, x_j) = h_Θ(x_j), and the aggregation operation is also a max.

Among graph CNNs, MoNet (Monti et al. 2017a), ECC (Simonovsky and Komodakis 2017), Graph Attention Networks (Veličković et al. 2017), and the concurrent work of Atzmon et al. (2018) are the most related approaches. Their common denominator is a notion of a local patch on a graph, in which a convolution-type operation can be defined.¹

Specifically, Monti et al. (2017a) use the graph structure to compute a local "pseudo-coordinate system" u in which the neighborhood vertices are represented; the convolution is then defined as an M-component Gaussian mixture,

    x′_im = ∑_{j:(i,j)∈E} θ_m · (x_j ⊙ g_w_n(u(x_i, x_j))),    (11)

where g is a Gaussian kernel, ⊙ is the elementwise (Hadamard) product, {w_1, ..., w_N} encode the learnable parameters of the Gaussians (means and covariances), and {θ_1, ..., θ_M} are the learnable filter coefficients. Equation (11) is an instance of our general operation, Equation (1), with a particular edge function

    h_θ_m,w_n(x_i, x_j) = θ_m · (x_j ⊙ g_w_n(u(x_i, x_j)))

and □ = ∑. Again, their graph structure is fixed, and u is constructed based on the degrees of nodes.

Atzmon et al. (2018) can be seen as a special case of Monti et al. (2017a) with g as predefined Gaussian functions. Removing learnable parameters (w_1, ..., w_N) and constructing a dense

¹ Simonovsky and Komodakis (2017) and Veličković et al. (2017) can be considered instances of Monti et al. (2017a), with the difference that the weights are constructed employing features from adjacent nodes instead of graph structure; Atzmon et al. (2018) is also similar except that the weighting function is hand-designed.
Table 2. Classification results on ModelNet40.

Method                                     Mean Class Accuracy (%)   Overall Accuracy (%)
3DShapeNets (Wu et al. 2015)               77.3                      84.7
VoxNet (Maturana and Scherer 2015)         83.0                      85.9
Subvolume (Qi et al. 2016)                 86.0                      89.2
VRN (single view) (Brock et al. 2016)      88.98                     —
VRN (multiple views) (Brock et al. 2016)   91.33                     —
ECC (Simonovsky and Komodakis 2017)        83.2                      87.4
PointNet (Qi et al. 2017b)                 86.0                      89.2
PointNet++ (Qi et al. 2017c)               —                         90.7
Kd-Net (Klokov and Lempitsky 2017)         —                         90.6
PointCNN (Li et al. 2018a)                 88.1                      92.2
PCNN (Atzmon et al. 2018)                  —                         92.3
Ours (baseline)                            88.9                      91.7
Ours                                       90.2                      92.9
Ours (2,048 points)                        90.7                      93.5

Table 4. Ablation study of our model. CENT denotes centralization, DYN denotes dynamical graph recomputation, and MPOINTS denotes experiments with 2,048 points.

CENT   DYN   MPOINTS   Mean Class Accuracy (%)   Overall Accuracy (%)
                       88.9                      91.7
x                      89.3                      92.2
x      x               90.2                      92.9
x      x     x         90.7                      93.5

Table 5. Results of our model with different numbers of nearest neighbors.

Number of nearest neighbors (k)   Mean Class Accuracy (%)   Overall Accuracy (%)
5                                 88.0                      90.5
10                                88.9                      91.4
20                                90.2                      92.9
40                                89.4                      92.4
Fig. 5. Left: Results of our model tested with random input dropout. The model is trained with the number of points being 1,024 and k being 20. Right: Point clouds with different numbers of points. The numbers of points are shown below the bottom row.

Fig. 7. Comparison of part segmentation results. For each set, from left to right: PointNet, ours, and ground truth.

…density. Note that PCNN (Atzmon et al. 2018) uses additional augmentation techniques like randomly sampling 1,024 points out of 1,200 points during both training and testing.
Table 6. Part segmentation results on ShapeNetPart. Metric is mIoU (%) on points.

                 mean   aero   bag    cap    car    chair  earphone  guitar  knife  lamp   laptop  motor  mug    pistol  rocket  skateboard  table
# shapes                2690   76     55     898    3758   69        787     392    1547   451     202    184    283     66      152         5271
PointNet         83.7   83.4   78.7   82.5   74.9   89.6   73.0      91.5    85.9   80.8   95.3    65.2   93.0   81.2    57.9    72.8        80.6
PointNet++       85.1   82.4   79.0   87.7   77.3   90.8   71.8      91.0    85.9   83.7   95.3    71.6   94.1   81.3    58.7    76.4        82.6
Kd-Net           82.3   80.1   74.6   74.3   70.3   88.6   73.5      90.2    87.2   81.0   94.9    57.4   86.7   78.1    51.8    69.9        80.3
LocalFeatureNet  84.3   86.1   73.0   54.9   77.4   88.8   55.0      90.6    86.5   75.2   96.1    57.3   91.7   83.1    53.9    72.5        83.8
PCNN             85.1   82.4   80.1   85.5   79.5   90.8   73.2      91.3    86.0   85.0   95.7    73.2   94.8   83.3    51.0    75.0        81.8
PointCNN         86.1   84.1   86.45  86.0   80.8   90.6   79.7      92.3    88.4   85.3   96.1    77.2   95.3   84.2    64.2    80.0        83.0
Ours             85.2   84.0   83.4   86.7   77.8   90.6   74.7      91.2    87.5   82.8   95.7    66.3   94.9   81.1    63.5    74.5        82.6
Fig. 9. Left: The mean IoU (%) improves as the ratio of kept points increases. Points are dropped from one of six sides (top, bottom, left, right, front, and back) randomly during the evaluation process. Right: Part segmentation results on partial data. Points in each row are dropped from the same side. The keep ratio is shown below the bottom row. Note that the segmentation results for the turbines improve as more points are included.
Table 7. 3D semantic segmentation results on S3DIS. MS+CU denotes multi-scale block features with consolidation units; G+RCU denotes grid-blocks with recurrent consolidation units.

Method                                  Mean IoU (%)   Overall Accuracy (%)
PointNet (baseline) (Qi et al. 2017b)   20.1           53.2
PointNet (Qi et al. 2017b)              47.6           78.5
MS + CU(2) (Engelmann et al. 2017)      47.8           79.2
G + RCU (Engelmann et al. 2017)         49.7           81.1
PointCNN (Li et al. 2018a)              65.39          —
Ours                                    56.1           84.1
Fig. 10. Semantic segmentation results. From left to right: PointNet, ours, ground truth, and point cloud with original color. Notice that our model outputs smoother segmentation results; for example, the wall (cyan) in the top two rows and the chairs (red) and columns (magenta) in the bottom two rows.
3D shapes from 16 object categories, annotated with 50 parts in total. We sampled 2,048 points from each training shape, and most sampled point sets are labeled with fewer than six parts. We follow the official train/validation/test split scheme of Chang et al. (2015) in our experiments.

Architecture. The network architecture is illustrated in Figure 3 (bottom branch). After a spatial transformer network, three EdgeConv layers are used. A shared fully connected layer (1,024) aggregates information from the previous layers. Shortcut connections are used to include all the EdgeConv outputs as local feature descriptors. At last, three shared fully connected layers (256, 256, 128) are used to transform the pointwise features. Batch norm, dropout, and ReLU are included in a similar fashion to our classification network.

Training. The same training setting as in our classification task is adopted. A distributed training scheme is further implemented on two NVIDIA TITAN X GPUs to maintain the training batch size.

Results. We use Intersection-over-Union (IoU) on points to evaluate our model and compare with other benchmarks. We follow the same evaluation scheme as PointNet: The IoU of a shape is computed by averaging the IoUs of the different parts occurring in that shape, and the IoU of a category is obtained by averaging the IoUs of all the shapes belonging to that category. The mean IoU (mIoU) is finally calculated by averaging the IoUs of all the testing shapes. We compare our results with PointNet (Qi et al. 2017b), PointNet++ (Qi et al. 2017c), Kd-Net (Klokov and Lempitsky 2017), LocalFeatureNet (Shen et al. 2017), PCNN (Atzmon et al. 2018), and PointCNN (Li et al. 2018a). The evaluation results are shown in Table 6. We also visually compare the results of our model and PointNet in Figure 7. More examples are shown in Figure 6.
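Spelled out, this evaluation protocol reads as follows. The sketch mirrors the PointNet scheme as we understand it; treating a part absent from both prediction and ground truth as IoU = 1 is a convention of the PointNet evaluation code, stated here as an assumption.

```python
import numpy as np

def shape_iou(pred, gt, part_ids):
    # pred, gt: (num_points,) integer part labels for one shape.
    # part_ids: the part labels belonging to this shape's category.
    ious = []
    for p in part_ids:
        inter = np.sum((pred == p) & (gt == p))
        union = np.sum((pred == p) | (gt == p))
        ious.append(1.0 if union == 0 else inter / union)  # absent part -> 1
    return np.mean(ious)

def category_iou(shapes):
    # shapes: list of (pred, gt, part_ids) tuples for one category.
    return np.mean([shape_iou(p, g, ids) for p, g, ids in shapes])
```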
Intra-cloud Distances. We next explore the relationships between different point clouds captured using our features. As shown in Figure 8, we take one red point from a source point cloud and compute its distance in feature space to points in other point clouds from the same category. An interesting finding is that although the points come from different sources, they are close to each other if they belong to semantically similar parts. For this experiment, we evaluate the features after the third layer of our segmentation model.

Segmentation on Partial Data. Our model is robust to partial data. We simulate an environment in which part of the shape is dropped from one of six sides (top, bottom, right, left, front, and back) with different percentages. The results are shown in Figure 9. On the left, the mean IoU versus the "keep ratio" is shown. On the right, the results for an airplane model are visualized.
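A plausible reading of this protocol, as a sketch (sorting along an axis and keeping a fraction of the coordinates is our assumption about how a "side" is dropped):

```python
import numpy as np

def drop_side(points, keep_ratio, axis=0, from_max=True):
    """Simulate partial data: drop points from one side of the shape.
    Six sides correspond to the three axes times two directions."""
    order = np.argsort(points[:, axis])      # sort along the chosen axis
    if from_max:
        order = order[::-1]                  # keep the high-coordinate side
    kept = order[: int(keep_ratio * len(points))]
    return points[kept]

# Example: keep 75% of the points, dropping from the top (largest z).
# partial = drop_side(cloud, keep_ratio=0.75, axis=2, from_max=False)
```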
4.5 Indoor Scene Segmentation
Data. We evaluate our model on the Stanford Large-Scale 3D Indoor Spaces Dataset (S3DIS) (Armeni et al. 2016) for a semantic scene segmentation task. This dataset includes 3D scan point clouds for 6 indoor areas comprising 272 rooms in total. Each point belongs to one of 13 semantic categories—e.g., board, bookcase, chair, ceiling, and beam—plus clutter. We follow the same setting as Qi et al. (2017b), where each room is split into blocks of area 1m × 1m, and each point is represented as a 9D vector (XYZ, RGB, and normalized spatial coordinates). We sampled 4,096 points for each block during the training process, and all points are used for testing. We also use the same sixfold cross validation over the six areas, and the average evaluation results are reported.
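A sketch of this block preparation (the block size, 9D features, and 4,096-point sampling are from the text above; normalizing by the room's bounding box is our assumption about the "normalized spatial coordinates"):

```python
import numpy as np

def make_blocks(points, colors, num_samples=4096, block=1.0):
    # points: (N, 3) XYZ of one room; colors: (N, 3) RGB in [0, 1].
    room_min, room_max = points.min(0), points.max(0)
    for x0 in np.arange(room_min[0], room_max[0], block):
        for y0 in np.arange(room_min[1], room_max[1], block):
            mask = ((points[:, 0] >= x0) & (points[:, 0] < x0 + block) &
                    (points[:, 1] >= y0) & (points[:, 1] < y0 + block))
            idx = np.where(mask)[0]
            if idx.size == 0:
                continue
            idx = np.random.choice(idx, num_samples,
                                   replace=idx.size < num_samples)
            xyz, rgb = points[idx], colors[idx]
            # 9D vector per point: XYZ, RGB, and coordinates normalized
            # to [0, 1] with respect to the whole room.
            norm = (xyz - room_min) / np.maximum(room_max - room_min, 1e-6)
            yield np.concatenate([xyz, rgb, norm], axis=1)  # (num_samples, 9)
```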
The model used for this task is similar to the part segmentation model, except that a probability distribution over semantic object classes is generated for each input point and no categorical vector is used here. We compare our model with both PointNet (Qi et al. 2017b) and the PointNet baseline, where additional point features (local point density, local curvature, and normal) are used to construct handcrafted features and then fed to an MLP classifier. We further compare our work with Engelmann et al. (2017) and PointCNN (Li et al. 2018a). Engelmann et al. (2017) present network architectures to enlarge the receptive field over the 3D scene. Two different approaches are proposed in their work: MS+CU for multi-scale block features with consolidation units, and G+RCU for grid-blocks with recurrent consolidation units. We report evaluation results in Table 7 and visually compare the results of PointNet and our model in Figure 10.

5 DISCUSSION
In this work, we propose a new operator for learning on point clouds and show its performance on various tasks. Our model suggests that local geometric features are important to 3D recognition tasks, even after introducing machinery from deep learning.

While our architectures easily can be incorporated as-is into existing pipelines for point cloud-based graphics, learning, and vision, our experiments also indicate several avenues for future research and extension. Some details of our implementation could be revised and/or re-engineered to improve efficiency or scalability, e.g., incorporating fast data structures rather than computing pairwise distances to evaluate k-nearest neighbor queries. We also could consider higher-order relationships between larger tuples of points, rather than considering them pairwise. Another possible extension is to design a non-shared transformer network that works on each local patch differently, adding flexibility to our model.

Our experiments suggest that intrinsic features can be equally valuable if not more valuable than point coordinates; developing a practical and theoretically justified framework for balancing intrinsic and extrinsic considerations in a learning pipeline will require insight from theory and practice in geometry processing. Given this, we will consider applications of our techniques to more abstract point clouds coming from applications like document retrieval and image processing rather than 3D geometry; beyond broadening the applicability of our technique, these experiments will provide insight into the role of geometry in abstract data processing.

REFERENCES
Iro Armeni, Ozan Sener, Amir R. Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 2016. 3D semantic parsing of large-scale indoor spaces. In Proceedings of the CVPR.
Matan Atzmon, Haggai Maron, and Yaron Lipman. 2018. Point convolutional neural networks by extension operators. ACM Trans. Graph. 37, 4, Article 71 (July 2018), 12 pages. DOI: https://doi.org/10.1145/3197517.3201301
Mathieu Aubry, Ulrich Schlickewei, and Daniel Cremers. 2011. The wave kernel signature: A quantum mechanical approach to shape analysis. In Proceedings of the ICCV Workshops.
Serge Belongie, Jitendra Malik, and Jan Puzicha. 2001. Shape context: A new descriptor for shape matching and object recognition. In Proceedings of the NIPS.
Silvia Biasotti, Andrea Cerri, A. Bronstein, and M. Bronstein. 2016. Recent trends, applications, and perspectives in 3D shape similarity assessment. Comput. Graph. Forum 35, 6 (2016), 87–119.
Davide Boscaini, Jonathan Masci, Emanuele Rodolà, and Michael Bronstein. 2016. Learning shape correspondence with anisotropic convolutional neural networks. In Proceedings of the NIPS.
Andrew Brock, Theodore Lim, James Millar Ritchie, and Nicholas J. Weston. 2016. Generative and discriminative voxel modeling with convolutional neural networks. In Proceedings of the NIPS.
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. 2017. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag. 34, 4 (2017), 18–42.
Michael M. Bronstein and Iasonas Kokkinos. 2010. Scale-invariant heat kernel signatures for non-rigid shape recognition. In Proceedings of the CVPR.
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2013. Spectral networks and locally connected networks on graphs. arXiv:1312.6203 (2013).
Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. 2015. ShapeNet: An information-rich 3D model repository. arXiv:1512.03012 (2015).
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the NIPS.
Francis Engelmann, Theodora Kontogianni, Alexander Hermans, and Bastian Leibe. 2017. Exploring spatial context for 3D semantic segmentation of point clouds. In Proceedings of the CVPR.
Danielle Ezuz, Justin Solomon, Vladimir G. Kim, and Mirela Ben-Chen. 2017. GWCNN: A metric alignment layer for deep shape analysis. Comput. Graph. Forum 36, 5 (2017), 49–57.
Haoqiang Fan, Hao Su, and Leonidas J. Guibas. 2017. A point set generation network for 3D object reconstruction from a single image. In Proceedings of the CVPR.
Matthias Fey, Jan Eric Lenssen, Frank Weichert, and Heinrich Müller. 2018. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In Proceedings of the CVPR.
Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for quantum chemistry. arXiv:1704.01212 (2017).
Aleksey Golovinskiy, Vladimir G. Kim, and Thomas Funkhouser. 2009. Shape-based recognition of 3D point clouds in urban environments. In Proceedings of the ICCV.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Proceedings of the NIPS.
Paul Guerrero, Yanir Kleiman, Maks Ovsjanikov, and Niloy J. Mitra. 2018. PCPNet: Learning local shape properties from raw point clouds. Comput. Graph. Forum 37, 2 (2018), 75–85. DOI: https://doi.org/10.1111/cgf.13343
Yulan Guo, Mohammed Bennamoun, Ferdous Sohel, Min Lu, and Jianwei Wan. 2014. 3D object recognition in cluttered scenes with local surface features: A survey. Trans. PAMI 36, 11 (2014), 2270–2287.
Oshri Halimi, Or Litany, Emanuele Rodolà, Alex Bronstein, and Ron Kimmel. 2018. Self-supervised learning of dense shape correspondence. arXiv:1812.02415 (2018).
M. Henaff, J. Bruna, and Y. LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv:1506.05163 (2015).
Andrew E. Johnson and Martial Hebert. 1999. Using spin images for efficient object recognition in cluttered 3D scenes. Trans. PAMI 21, 5 (1999), 433–449.
Diederik P. Kingma and Max Welling. 2013. Auto-encoding variational Bayes. arXiv:1312.6114 (2013).
Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the ICLR.
Roman Klokov and Victor Lempitsky. 2017. Escape from cells: Deep Kd-networks for the recognition of 3D point cloud models. In Proceedings of the ICCV.
Ilya Kostrikov, Zhongshi Jiang, Daniele Panozzo, Denis Zorin, and Joan Bruna. 2017. Surface networks. In Proceedings of the CVPR.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the NIPS.
Yann LeCun, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. 1989. Backpropagation applied to handwritten ZIP code recognition. Neural Comput. 1, 4 (1989), 541–551.
Ron Levie, Federico Monti, Xavier Bresson, and Michael M. Bronstein. 2017. CayleyNets: Graph convolutional neural networks with complex rational spectral filters. arXiv:1705.07664 (2017).
Chun-Liang Li, Manzil Zaheer, Yang Zhang, Barnabas Poczos, and Ruslan Salakhutdinov. 2018b. Point cloud GAN. arXiv:1810.05795 (2018).
Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. 2018a. PointCNN: Convolution on X-transformed points. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 820–830. Retrieved from http://papers.nips.cc/paper/7362-pointcnn-convolution-on-x-transformed-points.pdf.
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. 2016. Gated graph sequence neural networks. In Proceedings of the ICLR.
Ming Liang, Bin Yang, Shenlong Wang, and Raquel Urtasun. 2018. Deep continuous fusion for multi-sensor 3D object detection. In Proceedings of the ECCV.
Haibin Ling and David W. Jacobs. 2007. Shape classification using the inner-distance. Trans. PAMI 29, 2 (2007), 286–299.
Or Litany, Alex Bronstein, Michael Bronstein, and Ameesh Makadia. 2017a. Deformable shape completion with graph convolutional autoencoders. arXiv:1712.00268 (2017).
Or Litany, Tal Remez, Emanuele Rodolà, Alex M. Bronstein, and Michael M. Bronstein. 2017b. Deep functional maps: Structured prediction for dense shape correspondence. In Proceedings of the ICCV.
I. Loshchilov and F. Hutter. 2017. SGDR: Stochastic gradient descent with warm restarts. In Proceedings of the ICLR.
Min Lu, Yulan Guo, Jun Zhang, Yanxin Ma, and Yinjie Lei. 2014. Recognizing objects in 3D point clouds with multi-scale local features. Sensors 14, 12 (2014), 24156–24173.
Siddharth Manay, Daniel Cremers, Byung-Woo Hong, Anthony J. Yezzi, and Stefano Soatto. 2006. Integral invariants for shape matching. Trans. PAMI 28, 10 (2006), 1602–1618.
Haggai Maron, Meirav Galun, Noam Aigerman, Miri Trope, Nadav Dym, Ersin Yumer, Vladimir G. Kim, and Yaron Lipman. 2017. Convolutional neural networks on surfaces via seamless toric covers. In Proceedings of the SIGGRAPH.
Jonathan Masci, Davide Boscaini, Michael Bronstein, and Pierre Vandergheynst. 2015. Geodesic convolutional neural networks on Riemannian manifolds. In Proceedings of the 3dRR.
Daniel Maturana and Sebastian Scherer. 2015. VoxNet: A 3D convolutional neural network for real-time object recognition. In Proceedings of the IROS.
Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein. 2017a. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the CVPR.
F. Monti, M. M. Bronstein, and X. Bresson. 2017b. Geometric matrix completion with recurrent multi-graph neural networks. In Proceedings of the NIPS.
Federico Monti, Karl Otness, and Michael M. Bronstein. 2018. MotifNet: A motif-based graph convolutional network for directed graphs. arXiv:1802.01572 (2018).
Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. 2012. Functional maps: A flexible representation of maps between shapes. Trans. Graph. 31, 4 (2012), 30.
Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, and Leonidas J. Guibas. 2017a. Frustum PointNets for 3D object detection from RGB-D data. arXiv:1711.08488 (2017).
Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2017b. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the CVPR.
Charles R. Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. 2016. Volumetric and multi-view CNNs for object classification on 3D data. In Proceedings of the CVPR.
Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017c. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the NIPS.
Anurag Ranjan, Timo Bolkart, Soubhik Sanyal, and Michael J. Black. 2018. Generating 3D faces using convolutional mesh autoencoders. arXiv:1807.10267 (2018).
Raif M. Rustamov. 2007. Laplace-Beltrami eigenfunctions for deformation invariant shape representation. In Proceedings of the SGP.
Radu Bogdan Rusu, Nico Blodow, and Michael Beetz. 2009. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the ICRA.
Radu Bogdan Rusu, Nico Blodow, Zoltan Csaba Marton, and Michael Beetz. 2008a. Aligning point cloud views using persistent feature histograms. In Proceedings of the IROS.
Radu Bogdan Rusu, Zoltan Csaba Marton, Nico Blodow, Mihai Dolha, and Michael Beetz. 2008b. Towards 3D point cloud-based object maps for household environments. Robot. Auton. Syst. J. 56, 11 (Nov. 2008), 927–941.
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Trans. Neural Networks 20, 1 (2009), 61–80.
Syed Afaq Ali Shah, Mohammed Bennamoun, Farid Boussaid, and Amar A. El-Sallam. 2013. 3D-Div: A novel local surface descriptor for feature matching and pairwise range image registration. In Proceedings of the ICIP.
Yiru Shen, Chen Feng, Yaoqing Yang, and Dong Tian. 2017. Neighbors do help: Deeply exploiting local structures of point clouds. arXiv:1712.06760 (2017).
David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 30, 3 (2013), 83–98.
Martin Simonovsky and Nikos Komodakis. 2017. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the CVPR.
Ayan Sinha, Jing Bai, and Karthik Ramani. 2016. Deep learning 3D shape surfaces using geometry images. In Proceedings of the ECCV.
Hang Su, Varun Jampani, Deqing Sun, Subhransu Maji, Evangelos Kalogerakis, Ming-Hsuan Yang, and Jan Kautz. 2018. SPLATNet: Sparse lattice networks for point cloud processing. In Proceedings of the CVPR. 2530–2539.
Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. 2015. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the CVPR.
Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. 2009. A concise and provably informative multi-scale signature based on heat diffusion. Comput. Graph. Forum 28, 5 (2009), 1383–1392.
Maxim Tatarchenko, Alexey Dosovitskiy, and Thomas Brox. 2017. Octree generating networks: Efficient convolutional architectures for high-resolution 3D outputs. In Proceedings of the ICCV.
Federico Tombari, Samuele Salti, and Luigi Di Stefano. 2011. A combined texture-shape descriptor for enhanced 3D feature matching. In Proceedings of the ICIP.
Oliver Van Kaick, Hao Zhang, Ghassan Hamarneh, and Daniel Cohen-Or. 2011. A survey on shape correspondence. Comput. Graph. Forum 30, 6 (2011), 1681–1707.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph attention networks. arXiv:1710.10903 (2017).
Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, and Raquel Urtasun. 2018b. Deep parametric continuous convolutional neural networks. In Proceedings of the CVPR.
Xiaolong Wang, Ross Girshick, Abhinav Gupta, and Kaiming He. 2018a. Non-local neural networks. In Proceedings of the CVPR.
Lingyu Wei, Qixing Huang, Duygu Ceylan, Etienne Vouga, and Hao Li. 2016. Dense human body correspondences using convolutional networks. In Proceedings of the CVPR.
Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In Proceedings of the CVPR.
Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. 2018. Feature denoising for improving adversarial robustness. arXiv:1812.03411 (2018).
Yaoqing Yang, Chen Feng, Yiru Shen, and Dong Tian. 2018. FoldingNet: Point cloud auto-encoder via deep grid deformation. In Proceedings of the CVPR.
Li Yi, Vladimir G. Kim, Duygu Ceylan, I. Shen, Mengyan Yan, Hao Su, Cewu Lu, Qixing Huang, Alla Sheffer, Leonidas Guibas, et al. 2016. A scalable active framework for region annotation in 3D shape collections. Trans. Graph. 35, 6 (2016), 210.
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi. 2017. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the ICRA.

Received January 2019; revised May 2019; accepted June 2019