Lecture 16 Hao

This document discusses 3D deep learning and summarizes key challenges and approaches. It covers applications of 3D data in robotics, augmented reality, autonomous driving, and medical imaging. Traditional 3D vision used multi-view geometry and physics, while 3D deep learning aims to acquire knowledge of the 3D world through learning. Representing 3D data for deep learning is challenging due to irregular geometric structures. The document discusses various network architectures for point clouds, meshes, volumes, and multi-view data. It also covers tasks like 3D classification, reconstruction, and generating point clouds from single images.

Uploaded by

Shashank Alok

3D Deep Learning

Hao Su

@Stanford CS231n Guest Lecture


Broad Applications of 3D data

• Robotics
• Augmented Reality
• Autonomous driving
• Medical Image Processing
Traditional 3D Vision
Multi-view Geometry: Physics based
3D Learning: Knowledge Based
Acquire Knowledge of 3D World by Learning
The Representation Challenge
of 3D Deep Learning

• Rasterized form (regular grids)
• Geometric form (irregular)
The Representation Challenge
of 3D Deep Learning

• Multi-view
• Volumetric
• Part Assembly
• Point Cloud
• Mesh (Graph CNN)
• Implicit Shape: F(x) = 0
The Richness of 3D Learning Tasks

3D Analysis

• Detection
• Classification
• Segmentation (object/scene)
• Correspondence
The Richness of 3D Learning Tasks
3D Synthesis

• Monocular 3D reconstruction
• Shape completion
• Shape modeling
Agenda

• 3D Classification

• 3D Reconstruction

• Others
Volumetric CNN
Can we use CNNs but avoid projecting the 3D
data to views first?

Straightforward idea: Extend 2D grids → 3D grids


Voxelization

Represent the occupancy of regular 3D grids
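
To make the occupancy representation concrete, here is a minimal sketch of voxelizing a point set into a binary grid. It is an illustration only: the function name `voxelize`, the bounding-box normalization, and the use of sampled points (rather than a mesh) are assumptions, not details from the lecture.

```python
def voxelize(points, resolution):
    """Map 3D points to a resolution^3 binary occupancy grid (nested lists)."""
    # Axis-aligned bounding box of the point cloud.
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    # One scale for all axes so the shape is not distorted.
    extent = max(maxs[a] - mins[a] for a in range(3)) or 1.0
    grid = [[[0] * resolution for _ in range(resolution)]
            for _ in range(resolution)]
    for p in points:
        idx = []
        for a in range(3):
            i = int((p[a] - mins[a]) / extent * resolution)
            idx.append(min(i, resolution - 1))  # clamp points on the max face
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.5, 0.5, 0.5), (0.51, 0.5, 0.5)]
grid = voxelize(points, 4)
occupied = sum(v for plane in grid for row in plane for v in row)
print(occupied)  # 3: two nearby points collapse into the same voxel
```

The last two points land in the same cell, which is a small instance of the information loss that a coarse grid introduces.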


3D CNN on Volumetric Data

3D convolution uses 4D kernels
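
One way to read "4D kernels": for a single output channel, the kernel has one axis for input channels plus three spatial axes. The sketch below is a naive, unoptimized version under those assumptions (no padding, stride 1, cross-correlation as is conventional in deep learning); the name `conv3d_single` is hypothetical.

```python
def conv3d_single(volume, kernel):
    """Valid 3D convolution (stride 1, no padding) for one output channel.

    volume: nested lists indexed [C][D][H][W]
    kernel: nested lists indexed [C][k][k][k]  (the 4D kernel)
    """
    C = len(volume)
    D, H, W = len(volume[0]), len(volume[0][0]), len(volume[0][0][0])
    k = len(kernel[0])
    out = [[[0.0] * (W - k + 1) for _ in range(H - k + 1)]
           for _ in range(D - k + 1)]
    for z in range(D - k + 1):
        for y in range(H - k + 1):
            for x in range(W - k + 1):
                acc = 0.0
                # Sum over input channels and the k*k*k spatial window.
                for c in range(C):
                    for dz in range(k):
                        for dy in range(k):
                            for dx in range(k):
                                acc += (volume[c][z + dz][y + dy][x + dx]
                                        * kernel[c][dz][dy][dx])
                out[z][y][x] = acc
    return out


ones_volume = [[[[1.0] * 3 for _ in range(3)] for _ in range(3)]]  # C=1, 3x3x3
ones_kernel = [[[[1.0] * 2 for _ in range(2)] for _ in range(2)]]  # C=1, 2x2x2
result = conv3d_single(ones_volume, ones_kernel)
print(result[0][0][0])  # 8.0: the sum of a 2x2x2 window of ones
```

The six nested loops make the cost visible: work grows with the cube of the resolution times the cube of the kernel size.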


Complexity Issue

AlexNet, 2012
Input resolution: 224x224
224x224 = 50176

3DShapeNets, 2015
Input resolution: 30x30x30
30x30x30 = 27000
Complexity Issue

Polygon Mesh → Occupancy Grid (30x30x30)

Information loss in voxelization


Idea 1: Learn to Project

Idea: “X-ray” rendering + Image (2D) CNNs

very low #param, very low computation

Many other works in autonomous driving use bird’s eye view for object detection

Su et al., “Volumetric and Multi-View CNNs for Object Classification on 3D Data”, CVPR 2016
More Principled: Sparsity of 3D Shapes

[Figure: occupancy at resolutions 32, 64, 128]
Store only the Occupied Grids

• Store the sparse surface signals


• Constrain the computation near the surface
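
The two bullets above can be sketched with a hash set of occupied voxel coordinates. This is an illustrative toy under stated assumptions (the names `sparse_voxels` and `occupied_neighbors` are hypothetical, and the "surface" is a flat sheet of sample points), not the data structure of any particular library.

```python
import itertools


def sparse_voxels(points, voxel_size):
    """Set of integer voxel coordinates that contain at least one point."""
    return {tuple(int(c // voxel_size) for c in p) for p in points}


def occupied_neighbors(voxels, v):
    """Occupied cells among the 26 neighbors of v; empty space is never visited."""
    found = []
    for off in itertools.product((-1, 0, 1), repeat=3):
        if off == (0, 0, 0):
            continue
        n = tuple(a + b for a, b in zip(v, off))
        if n in voxels:
            found.append(n)
    return found


# A flat sheet of samples at z = 0 inside the unit cube, voxelized at 8^3.
sheet = [(i / 16, j / 16, 0.0) for i in range(16) for j in range(16)]
voxels = sparse_voxels(sheet, 0.125)
occupancy = len(voxels) / 8 ** 3
print(len(voxels), occupancy)  # 64 0.125
```

Only 64 of the 512 cells are stored, and a neighborhood lookup such as `occupied_neighbors(voxels, (3, 3, 0))` touches only cells near the surface.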
Octree: Recursively Partition the Space

Each internal node has exactly eight children


Neighborhood searching: Hash table
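
A minimal sketch of the octree idea, assuming voxel coordinates are integers in [0, 2^depth) and that each level is stored as a hash set keyed by cell coordinates. The function names are hypothetical, and real implementations such as O-CNN use more compact keys than coordinate tuples.

```python
def build_octree(voxels, depth):
    """Return one hash set of non-empty cells per level (0 = root)."""
    levels = [set() for _ in range(depth + 1)]
    levels[depth] = set(voxels)
    for d in range(depth, 0, -1):
        # A parent cell is non-empty if any of its eight children is non-empty.
        levels[d - 1] = {(x >> 1, y >> 1, z >> 1) for (x, y, z) in levels[d]}
    return levels


def children(cell):
    """The eight child cells of a cell at the next finer level."""
    x, y, z = cell
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]


def has_neighbor(levels, d, cell, offset):
    """Constant-time neighbor test at level d via the hash set."""
    return tuple(c + o for c, o in zip(cell, offset)) in levels[d]


# Leaf voxels: a flat sheet at z = 0 in an 8^3 grid (depth 3).
leaves = {(x, y, 0) for x in range(8) for y in range(8)}
levels = build_octree(leaves, 3)
sizes = [len(level) for level in levels]
print(sizes)  # [1, 4, 16, 64]
```

Each internal cell has eight children geometrically, but only the non-empty ones (here four of the root's eight) need to be stored or refined further.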
Memory Efficiency

[Figure: GPU memory (GB, axis 0 to 6) vs. resolution (16^3, 32^3, 64^3, 128^3, 256^3) for Voxel CNN and O-CNN]
Implementation

• SparseConvNet
• https://github.com/facebookresearch/SparseConvNet
• Uses ResNet architecture
• State-of-the-art for 3D analysis
• Takes time to train

Graham et al., “Submanifold Sparse Convolutional Networks”, arXiv
Point Networks
Point cloud
(The most common 3D sensor data)
Directly Process Point Cloud Data

End-to-end learning for unstructured, unordered point data

PointNet → Object Classification

Qi, Charles R., et al. “PointNet: Deep learning on point sets for 3D classification and segmentation”, CVPR 2017
Zaheer, Manzil, et al. “Deep sets”, NeurIPS 2017
Permutation invariance
Point cloud: N orderless points, each represented by a D-dim coordinate

2D array representation: an N x D array represents the same set as any reordering of its N rows
Construct a Symmetric Function

Observe:
$f(x_1, x_2, \ldots, x_n) = \gamma \circ g(h(x_1), \ldots, h(x_n))$ is symmetric if $g$ is symmetric

Example input points: (1,2,3), (1,1,1), (2,3,2), (2,3,4)
Pipeline: h (applied to each point) → g (simple symmetric function) → γ
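
The observation can be checked numerically on the four example points. In this sketch, `h` and `gamma` are fixed toy functions standing in for learned networks, and element-wise max is chosen as the symmetric `g`; all three are assumptions made for illustration.

```python
import itertools


def h(point):
    """Per-point feature map (toy stand-in for a learned per-point network)."""
    x, y, z = point
    return (x + y + z, x * y, z * z)


def g(features):
    """Symmetric aggregation: element-wise max over all per-point features."""
    return tuple(max(column) for column in zip(*features))


def gamma(global_feature):
    """Post-processing of the pooled feature (toy stand-in for a learned network)."""
    return sum(global_feature)


def f(points):
    return gamma(g([h(p) for p in points]))


points = [(1, 2, 3), (1, 1, 1), (2, 3, 2), (2, 3, 4)]
outputs = {f(list(perm)) for perm in itertools.permutations(points)}
print(outputs)  # {31}: all 24 orderings give the same value
```

Because max ignores the order of its arguments, `f` returns the same value for every ordering of the input rows, whatever `h` and `gamma` are.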
PointNet (vanilla)
Limitations of PointNet
3D CNN (Wu et al.): Hierarchical feature learning, multiple levels of abstraction
PointNet (vanilla) (Qi et al.): Global feature learning, either one point or all points

• No local context for each point!
• Global feature depends on absolute coordinate. Hard to generalize to unseen scene configurations!
Points in Metric Space

• Learn “kernels” in 3D space and conduct convolution

• Kernels have compact spatial support

• For convolution, we need to find neighboring points

• Possible strategies for range query


• Ball query (results in more stable features)
• k-NN query (faster)
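
The two range-query strategies can be sketched with brute-force search. This is a minimal illustration (function names are hypothetical, and real systems use spatial indices or GPU kernels instead of a linear scan).

```python
import math


def ball_query(points, center, radius):
    """Indices of all points within a fixed radius of the center."""
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]


def knn_query(points, center, k):
    """Indices of the k nearest points, regardless of how far away they are."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], center))
    return order[:k]


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (5.0, 0.0, 0.0)]

dense_ball = ball_query(cloud, (0.0, 0.0, 0.0), 0.25)
dense_knn = knn_query(cloud, (0.0, 0.0, 0.0), 2)
sparse_ball = ball_query(cloud, (5.0, 0.0, 0.0), 0.25)
sparse_knn = knn_query(cloud, (5.0, 0.0, 0.0), 2)
print(dense_ball, dense_knn, sparse_ball, sparse_knn)
```

Around the isolated point, k-NN reaches out to a neighbor almost five units away while the ball query keeps a fixed spatial scale, which is one way to read the "more stable features" remark.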
PointNet v2.0: Multi-Scale PointNet

N points in (x,y) → N1 points in (x,y,f) → N2 points in (x,y,f’)

Repeat
• Sample anchor points
• Find neighborhood of anchor points
• Apply PointNet in each neighborhood to mimic convolution
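
One pass of the three repeated steps might look like the sketch below, using 2D points as on the slide. Several choices here are assumptions rather than details from the lecture: anchors are picked by farthest point sampling, neighborhoods come from a ball query, and the per-neighborhood "PointNet" is reduced to a max-pool over local coordinates and input features with no learned layers.

```python
import math


def farthest_point_sampling(points, m):
    """Pick m anchor indices, each as far as possible from those already chosen."""
    chosen = [0]
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def set_abstraction(points, feats, m, radius):
    """Sample anchors, group neighbors, max-pool each group into one feature."""
    new_points, new_feats = [], []
    for a in farthest_point_sampling(points, m):
        group = [i for i, p in enumerate(points)
                 if math.dist(p, points[a]) <= radius]
        # Coordinates relative to the anchor, concatenated with input features.
        local = [tuple(c - ca for c, ca in zip(points[i], points[a]))
                 + tuple(feats[i]) for i in group]
        new_feats.append(tuple(max(col) for col in zip(*local)))
        new_points.append(points[a])
    return new_points, new_feats


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.2, 1.0), (1.0, 1.1)]
fts = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
anchors, pooled = set_abstraction(pts, fts, m=2, radius=0.25)
print(anchors)
print(pooled)
```

Calling `set_abstraction` again on its own output gives the next level of the hierarchy (N1 points to N2 points).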
Point Convolution As Graph Convolution
• Points -> Nodes
• Neighborhood -> Edges
• Graph CNN for point cloud processing

Wang et al., “Dynamic Graph CNN for Learning on Point Clouds”, Transactions on Graphics, 2019

Liu et al., “Relation-Shape Convolutional Neural Network for Point Cloud Analysis”, CVPR 2019
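
The mapping from points to a graph can be sketched as follows. The k-NN edge rule and the max over neighbor offsets are simplifying assumptions loosely inspired by edge-based graph convolutions; they are not the exact operators of the papers cited above, and no learned weights are involved.

```python
import math


def knn_graph(points, k):
    """Points -> nodes, k nearest neighbors -> directed edges."""
    edges = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        edges[i] = others[:k]
    return edges


def edge_aggregate(points, edges):
    """Per node: its coordinates plus the element-wise max of neighbor offsets."""
    out = []
    for i in range(len(points)):
        offsets = [tuple(b - a for a, b in zip(points[i], points[j]))
                   for j in edges[i]]
        pooled = tuple(max(col) for col in zip(*offsets))
        out.append(tuple(points[i]) + pooled)
    return out


nodes = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
graph = knn_graph(nodes, k=2)
node_feats = edge_aggregate(nodes, graph)
print(graph)
print(node_feats[0])
```

Using offsets relative to each node keeps the aggregated part of the feature independent of where the neighborhood sits in absolute coordinates.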
Agenda

• 3D Classification

• 3D Reconstruction

• Others
Multi-View Stereo (MVS)
Reconstruct the dense 3D shape from a set of images
and camera parameters

1. Goldlucke et al. “A Super-resolution Framework for High-Accuracy Multiview Reconstruction”


Requirements of MVS

Requirements (per application): Range, Accuracy, Time Efficiency, Computation Efficiency

Applications:
• Remote Sensing
• Autonomous Driving
• AR/VR
• Robot Manipulation
• Inverse Engineering
Reconstruction from Photo-Consistency

NCC (Normalized Cross Correlation)

SSD (Sum of Squared Differences)

• Requires texture
• Sensitive to non-Lambertian areas
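
Both photo-consistency measures are short to write down. The sketch below treats a patch as a flat list of intensities; the function names and the choice to return 0.0 for a constant patch are assumptions made for illustration.

```python
import math


def ssd(a, b):
    """Sum of squared differences between two patches (lower is more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized cross correlation in [-1, 1] (higher is more similar)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # constant patch: NCC is undefined, reported here as 0
    return sum(x * y for x, y in zip(da, db)) / denom


patch = [10.0, 20.0, 30.0, 40.0]
brighter = [2.0 * v + 5.0 for v in patch]  # same texture, different exposure
flat = [7.0, 7.0, 7.0, 7.0]                # textureless region

print(ssd(patch, brighter))  # 4100.0
print(ncc(patch, brighter))
print(ncc(patch, flat))      # 0.0
```

NCC is unaffected by the gain and offset change that inflates SSD, and the constant patch shows why both measures need texture to be informative.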

Image source: UW CSE455


Cost-Volume-based MVS

1. Input: multi-view images and camera parameters
2. Build 3D cost volume in reference view frustum (top-down view of cost volume)
3. Fetch image features for each voxel
   • Voxels on the ground-truth surface show feature consistency
4. Dense 3D CNNs
Improve Output Resolution
• Differentiable soft-argmin to achieve sub-pixel accuracy.

[Figure: depth hypotheses d=1, d=2, d=3]

Kendall et al., “End-to-End Learning of Geometry and Context for Deep Stereo Regression”, ICCV 2017
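
A minimal sketch of soft-argmin over the depth hypotheses of one pixel: costs are turned into weights with a softmax over their negatives, and the output is the weighted mean depth. The function name and the example costs are assumptions; the cost values would normally come from the 3D CNN.

```python
import math


def soft_argmin(costs, depths):
    """Differentiable stand-in for argmin: softmax(-cost)-weighted mean depth."""
    lowest = min(costs)
    # Subtracting the lowest cost keeps exp() numerically stable.
    weights = [math.exp(-(c - lowest)) for c in costs]
    total = sum(weights)
    return sum(w / total * d for w, d in zip(weights, depths))


depths = [1.0, 2.0, 3.0]
symmetric = soft_argmin([5.0, 0.0, 5.0], depths)
skewed = soft_argmin([5.0, 0.0, 1.0], depths)
print(symmetric, skewed)
```

With symmetric costs the estimate sits at d=2; when d=3 is nearly as good as d=2 the estimate moves to a value between them, which a hard argmin over discrete hypotheses cannot produce.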
Reconstruction is More Complete
More Details from Point MVSNet

[Figure: Camp [2] vs. Ours]


Agenda

• 3D Classification

• 3D Reconstruction

• Others
From Single Image to Point Cloud
• It is possible to generate a set (permutation invariant)

Image → Deep Neural Network → Predicted set
$\{(x_1, y_1, z_1), (x_2, y_2, z_2), \ldots, (x_n, y_n, z_n)\}$

Distance between the predicted point set and the groundtruth point cloud
$\{(x'_1, y'_1, z'_1), (x'_2, y'_2, z'_2), \ldots, (x'_n, y'_n, z'_n)\}$

Fan et al., “A Point Set Generation Network for 3D Object Reconstruction from a Single Image”, CVPR 2017
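
The slide does not name the set distance. One common permutation-invariant choice is the Chamfer distance, sketched here as an assumption; averaging over each set (rather than summing) is also a choice made for this example.

```python
import math


def chamfer_distance(pred, gt):
    """Symmetric mean squared nearest-neighbor distance between two point sets."""
    pred_to_gt = sum(min(math.dist(p, q) ** 2 for q in gt) for p in pred) / len(pred)
    gt_to_pred = sum(min(math.dist(p, q) ** 2 for p in pred) for q in gt) / len(gt)
    return pred_to_gt + gt_to_pred


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reordered = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]   # same set, different order
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]     # moved one unit along z

print(chamfer_distance(predicted, reordered))  # 0.0
print(chamfer_distance(predicted, shifted))    # 2.0
```

The distance is zero for a reordered copy of the same set, so the network is not penalized for the order in which it emits points.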
From Image to Surface

• Learn to warp a plane to surface

Groueix et al., “AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation”, CVPR 2018
Yang, Yaoqing, et al. “FoldingNet: Point cloud auto-encoder via deep grid deformation”, CVPR 2018
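
The idea of warping a plane to a surface can be illustrated by sampling a 2D grid and mapping every sample to 3D. In the cited methods the map is a learned network conditioned on a shape code; in this sketch it is replaced by a fixed hand-written function (rolling the unit square into a cylinder), which is purely an assumption for illustration.

```python
import math


def grid_2d(n):
    """n x n samples of the unit square, the 'sheet of paper' to be folded."""
    return [(i / (n - 1), j / (n - 1)) for i in range(n) for j in range(n)]


def fold(uv):
    """Hand-written warp standing in for a learned decoder: square -> cylinder."""
    u, v = uv
    theta = 2.0 * math.pi * u
    return (math.cos(theta), math.sin(theta), v)


sheet = grid_2d(8)
surface = [fold(uv) for uv in sheet]
print(len(surface), surface[0])
```

Because every output point comes from a grid location, neighboring grid samples stay neighbors on the surface, which is what lets such methods produce surfaces rather than unstructured point sets.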
Structured Prediction: Part-based
Recursive Network for Hierarchical Graph AE

Li, Jun et al., “GRASS: Generative Recursive Autoencoders for Shape Structures”, Siggraph 2017
Mo, Kaichun et al., “StructureNet, a hierarchical graph network for learning PartNet shape generation”, Siggraph Asia 2019
Many More to Explore…

• Movable Part Segmentation
• Motion Parameter Estimation
• Long-horizon Planning
• Part Manipulation