Lecture 16
Hao Su
Broad Applications of 3D data
• Robotics
• Augmented Reality
• Autonomous driving
The Richness of 3D Learning Tasks
• 3D Analysis: Classification (object/scene), Detection, Segmentation, Correspondence
• 3D Synthesis (figure label: F(x) = 0)
• 3D Classification
• 3D Reconstruction
• Others
Volumetric CNN
Can we use CNNs but avoid projecting the 3D
data to views first?
Qi et al., “Volumetric and Multi-View CNNs for Object Classification on 3D Data”, CVPR 2016
Many other works in autonomous driving use a bird’s-eye view for object detection
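One answer to the question above is to voxelize the point cloud and run 3D convolutions directly on the occupancy grid. The sketch below shows both steps in NumPy under simple assumptions: points are already normalized to the unit cube, the grid is binary, and the convolution is a single-channel, zero-padded cross-correlation written with array slicing. The names `voxelize` and `conv3d_same` are illustrative, not from any particular library.

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int) -> np.ndarray:
    """Map an (N, 3) point cloud to a binary occupancy grid of shape (R, R, R).

    Assumes the points are already normalized to the unit cube [0, 1]^3.
    """
    idx = np.floor(points * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)  # a coordinate of exactly 1.0 falls into the last cell
    grid = np.zeros((resolution,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def conv3d_same(grid: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 3D cross-correlation with zero padding (odd kernel size assumed)."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(grid, p)
    X, Y, Z = grid.shape
    out = np.zeros(grid.shape, dtype=np.result_type(grid, kernel))
    # Accumulate one shifted copy of the grid per kernel offset.
    for dx in range(k):
        for dy in range(k):
            for dz in range(k):
                out += kernel[dx, dy, dz] * padded[dx:dx + X, dy:dy + Y, dz:dz + Z]
    return out

rng = np.random.default_rng(0)
pts = rng.random((1000, 3))
grid = voxelize(pts, 32)
feat = conv3d_same(grid, np.ones((3, 3, 3), dtype=np.float32))
print(grid.shape, feat.shape)  # (32, 32, 32) (32, 32, 32)
```

A real volumetric CNN stacks many such layers with learned multi-channel kernels; the box kernel here just counts occupied neighbors in a 3 × 3 × 3 window.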
More Principled: Sparsity of 3D Shapes
[Figure: occupancy of a shape voxelized at resolutions 32, 64, and 128]
Store only the occupied grid cells
[Figure: GPU memory (GB) vs. resolution (16^3 to 256^3) for O-CNN and Voxel CNN]
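The sparsity argument can be checked numerically. This sketch voxelizes the surface of a sphere (an assumed stand-in for a real shape) at resolutions 32, 64, and 128 and compares dense float32 storage with a coordinate-list layout that stores only occupied cells. The occupancy criterion (cell center within half a cell of the surface) and the per-cell byte counts are illustrative assumptions.

```python
import numpy as np

def sphere_shell_occupancy(resolution: int, radius: float = 0.4) -> np.ndarray:
    """Boolean (R, R, R) grid marking cells on a sphere surface centered in the unit cube.

    A cell is occupied if its center lies within half a cell width of the surface
    (a simple illustrative criterion, not a general-purpose voxelizer).
    """
    centers = (np.arange(resolution) + 0.5) / resolution
    x, y, z = np.meshgrid(centers, centers, centers, indexing="ij")
    dist = np.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2)
    return np.abs(dist - radius) < 0.5 / resolution

for r in (32, 64, 128):
    occ = sphere_shell_occupancy(r)
    n_occ = int(occ.sum())
    dense_bytes = r ** 3 * 4            # one float32 per cell, occupied or not
    sparse_bytes = n_occ * (3 * 4 + 4)  # int32 (i, j, k) plus one float32 per occupied cell
    print(f"R={r:4d}  occupied={n_occ / r ** 3:6.2%}  "
          f"dense={dense_bytes / 2 ** 20:7.2f} MiB  sparse={sparse_bytes / 2 ** 20:6.2f} MiB")
```

Because a surface covers roughly on the order of R² of the R³ cells, the occupied fraction roughly halves each time the resolution doubles, while dense memory grows eightfold.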
Implementation
• SparseConvNet
• https://github.com/facebookresearch/SparseConvNet
• Uses ResNet architecture
• State-of-the-art for 3D analysis
• Takes time to train
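SparseConvNet is built around submanifold sparse convolution, which computes outputs only at sites that are already active. The pure-Python sketch below illustrates that idea with a dictionary from voxel coordinates to a single feature value. It is not the SparseConvNet API; the function name is hypothetical, and real implementations handle many channels and precompute neighbor lookups on the GPU.

```python
from itertools import product
from typing import Dict, Tuple

Coord = Tuple[int, int, int]

def submanifold_conv(active: Dict[Coord, float],
                     weights: Dict[Coord, float]) -> Dict[Coord, float]:
    """Single-channel sparse 3D convolution evaluated only at active sites.

    `active` maps occupied voxel coordinates to a feature value; empty voxels are
    absent and treated as zero. `weights` maps kernel offsets such as (-1, 0, 1)
    to scalar weights. The output has exactly the same active sites as the input,
    so sparsity does not erode from layer to layer.
    """
    out: Dict[Coord, float] = {}
    for (x, y, z) in active:
        acc = 0.0
        for (dx, dy, dz), w in weights.items():
            acc += w * active.get((x + dx, y + dy, z + dz), 0.0)
        out[(x, y, z)] = acc
    return out

# A 3x3x3 box kernel and a short line of three occupied voxels along x.
box = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}
line = {(0, 0, 0): 1.0, (1, 0, 0): 1.0, (2, 0, 0): 1.0}
print(submanifold_conv(line, box))  # {(0, 0, 0): 2.0, (1, 0, 0): 3.0, (2, 0, 0): 2.0}
```

Work and memory scale with the number of occupied cells times the kernel size, not with the full R³ grid.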
PointNet: Object Classification
Permutation Invariance
Point cloud: N orderless points, each represented by a D-dim coordinate
2D array representation: an N × D array; any reordering of its rows describes the same point cloud
Construct a Symmetric Function
Observe:
$f(x_1, x_2, \dots, x_n) = \gamma \circ g(h(x_1), \dots, h(x_n))$ is symmetric if $g$ is symmetric
[Figure: h is applied independently to each point, e.g., (1,2,3), (1,1,1), (2,3,2), (2,3,4)]
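A minimal PointNet-style instance of the construction above, assuming h is one shared linear layer with ReLU, g is an element-wise max over points, and γ is one linear layer. The random weights stand in for learned ones, and the dimensions are illustrative. The check at the end shows that f is unchanged when the rows of the N × D array are shuffled, whereas the flattened array itself is not.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, C = 3, 16, 4                                      # input dim, per-point feature dim, output dim
W_h, b_h = rng.normal(size=(D, K)), rng.normal(size=K)  # hypothetical stand-ins for learned weights
W_gamma = rng.normal(size=(K, C))

def h(points: np.ndarray) -> np.ndarray:
    """Shared per-point map: the same linear layer + ReLU applied to every row."""
    return np.maximum(points @ W_h + b_h, 0.0)

def g(features: np.ndarray) -> np.ndarray:
    """Symmetric aggregation: element-wise max over the point axis."""
    return features.max(axis=0)

def gamma(global_feature: np.ndarray) -> np.ndarray:
    """Map applied after aggregation (a single linear layer here)."""
    return global_feature @ W_gamma

def f(points: np.ndarray) -> np.ndarray:
    return gamma(g(h(points)))

points = np.array([[1, 2, 3], [1, 1, 1], [2, 3, 2], [2, 3, 4]], dtype=float)
shuffled = points[[2, 0, 3, 1]]                          # same set, different row order
print(np.allclose(f(points), f(shuffled)))               # True
print(np.allclose(points.ravel(), shuffled.ravel()))     # False: flattening depends on order
```

Any symmetric g works in principle (sum and mean are also symmetric); max is the choice used in this sketch.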
PointNet (vanilla)
Limitations of PointNet
• Hierarchical feature learning: multiple levels of abstraction
• Global feature learning (PointNet): either one point or all points
Repeat
• Sample anchor points
• Find neighborhood of anchor points
• Apply PointNet in each neighborhood to mimic convolution
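The repeat loop above can be sketched as one set-abstraction level in the spirit of PointNet++. Assumptions: anchors are chosen with farthest point sampling, neighborhoods are fixed-radius balls, and the per-neighborhood PointNet is a single shared linear layer with ReLU followed by max pooling, with random weights in place of learned ones. Function names are illustrative.

```python
import numpy as np

def farthest_point_sampling(xyz: np.ndarray, m: int) -> np.ndarray:
    """Greedily pick m anchor indices, each as far as possible from those already chosen."""
    chosen = [0]                                    # start from an arbitrary point (index 0)
    min_dist = np.linalg.norm(xyz - xyz[0], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(xyz - xyz[nxt], axis=1))
    return np.array(chosen)

def ball_query(xyz: np.ndarray, anchors: np.ndarray, radius: float):
    """For each anchor, indices of all points within `radius` (the anchor itself is included)."""
    return [np.nonzero(np.linalg.norm(xyz - a, axis=1) <= radius)[0] for a in anchors]

def mini_pointnet(local: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Shared linear + ReLU on each neighbor, then max pooling: one vector per neighborhood."""
    return np.maximum(local @ W, 0.0).max(axis=0)

def set_abstraction(xyz, feats, m, radius, W):
    anchors = xyz[farthest_point_sampling(xyz, m)]
    out = []
    for a, idx in zip(anchors, ball_query(xyz, anchors, radius)):
        local = xyz[idx] - a                        # coordinates relative to the anchor
        if feats is not None:
            local = np.concatenate([local, feats[idx]], axis=1)
        out.append(mini_pointnet(local, W))
    return anchors, np.stack(out)

rng = np.random.default_rng(0)
pts = rng.random((512, 3))
l1_xyz, l1_feat = set_abstraction(pts, None, 128, 0.2, rng.normal(size=(3, 32)))
l2_xyz, l2_feat = set_abstraction(l1_xyz, l1_feat, 32, 0.4, rng.normal(size=(3 + 32, 64)))
print(l1_feat.shape, l2_feat.shape)  # (128, 32) (32, 64)
```

Each level has fewer anchors and larger neighborhoods, which gives the multiple levels of abstraction that a single global max pool lacks.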
Point Convolution As Graph Convolution
• Points -> Nodes
• Neighborhood -> Edges
• Graph CNN for point cloud processing
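Under the graph view, a layer only needs a neighborhood graph and an aggregation rule. The sketch below builds a k-nearest-neighbor graph and applies an EdgeConv-style update (the operator used in DGCNN): edge features [x_i, x_j - x_i] pass through a shared linear layer with ReLU and are max-pooled over each node's neighbors. The value of k, the single linear layer, and the random weights are illustrative assumptions.

```python
import numpy as np

def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
    """(N, k) array of neighbor indices per node (edges), excluding self-loops."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N) squared distances
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feats: np.ndarray, nbrs: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Edge feature [x_i, x_j - x_i] -> shared linear + ReLU -> max over neighbors j."""
    x_i = np.repeat(feats[:, None, :], nbrs.shape[1], axis=1)      # (N, k, F)
    x_j = feats[nbrs]                                               # (N, k, F)
    edge = np.concatenate([x_i, x_j - x_i], axis=-1)                # (N, k, 2F)
    return np.maximum(edge @ W, 0.0).max(axis=1)                    # (N, F_out)

rng = np.random.default_rng(1)
pts = rng.random((50, 3))
W = rng.normal(size=(6, 8))
nbrs = knn_graph(pts, 5)
out = edge_conv(pts, nbrs, W)
print(out.shape)  # (50, 8)
```

Because neighbors are found by distance and pooled with max, reordering the input points only reorders the output rows.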
• 3D Classification
• 3D Reconstruction
• Others
Multi-View Stereo (MVS)
Reconstruct the dense 3D shape from a set of images
and camera parameters
[Table: applications (Remote Sensing, Autonomous Driving, AR/VR, Robot Manipulation, Reverse Engineering) compared by Range, Accuracy, Time Efficiency, and Computation Efficiency]
Reconstruction from Photo-Consistency
• Requires texture
• Sensitive to non-Lambertian surfaces
[Figure: depth hypotheses d = 1, 2, 3]
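The photo-consistency idea can be illustrated with a toy depth sweep on one rectified scanline: each depth hypothesis d implies a disparity f·B/d, and the hypothesis whose shifted patch best matches the reference patch wins. Everything here is an assumption for illustration: 1D images, integer disparities, a sum-of-squared-differences cost, focal length times baseline of 12 px, and the hypotheses d = 1, 2, 3. The textureless case shows why the method requires texture.

```python
import numpy as np

def photo_consistency_depth(left, right, x, depths, focal, baseline, half_window=3):
    """Return the depth hypothesis with the lowest SSD cost around pixel x, plus all costs.

    Assumes a rectified pair: a point at left pixel x with depth d appears at
    right pixel x - focal * baseline / d (rounded to the nearest integer here).
    """
    offsets = np.arange(-half_window, half_window + 1)
    costs = []
    for d in depths:
        disp = int(round(focal * baseline / d))
        patch_l = left[x + offsets]
        patch_r = right[x - disp + offsets]
        costs.append(float(((patch_l - patch_r) ** 2).sum()))  # photo-consistency cost
    costs = np.array(costs)
    return depths[int(np.argmin(costs))], costs

rng = np.random.default_rng(0)
left = rng.random(100)                 # textured scanline
right = np.roll(left, -6)              # true disparity 6 px, i.e. depth 12 / 6 = 2
depths = np.array([1.0, 2.0, 3.0])
best, costs = photo_consistency_depth(left, right, 50, depths, focal=12.0, baseline=1.0)
print(best)                            # 2.0

flat = np.full(100, 0.5)               # textureless scanline: every hypothesis costs 0
_, flat_costs = photo_consistency_depth(flat, flat, 50, depths, focal=12.0, baseline=1.0)
```

On a non-Lambertian surface the two views would see different intensities for the same point, so even the correct hypothesis would not reach a low cost.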
• 3D Classification
• 3D Reconstruction
• Others
From Single Image to Point Cloud
• It is possible to generate a set (permutation invariant):
$\{(x'_1, y'_1, z'_1), (x'_2, y'_2, z'_2), \dots, (x'_n, y'_n, z'_n)\}$
• Distance: the generated set is compared with the ground-truth set using a set distance
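The slide leaves the distance unspecified; the Chamfer distance is one common permutation-invariant choice for comparing a predicted point set with a ground-truth set. This NumPy sketch uses squared Euclidean distances and averages each direction; other variants sum instead of averaging or use unsquared distances.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Each point is matched to its nearest neighbor in the other set; squared
    distances are averaged within each direction and the two directions summed.
    """
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

rng = np.random.default_rng(0)
pred = rng.random((64, 3))
print(chamfer_distance(pred, pred[rng.permutation(64)]))  # 0.0: row order does not matter
print(chamfer_distance(np.array([[0.0, 0.0, 0.0]]), np.array([[1.0, 0.0, 0.0]])))  # 2.0
```

Since only nearest-neighbor distances enter the loss, the network may output its n points in any order.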
Structured Prediction: Part-based
Li, Jun et al., “GRASS: Generative Recursive Autoencoders for Shape Structures”, Siggraph 2017
Mo, Kaichun et al., “StructureNet, a hierarchical graph network for learning PartNet shape generation”, Siggraph Asia 2019
• Long-horizon Planning
• Part Manipulation