Robot Learning With Implicit Representations
Animesh Garg
RSS 2022 Workshop
What is Implicit Neural Representation?
Explicit representations (Voxels, Point Clouds, Meshes):
✗ Memory
✗ Arbitrary Topologies
✗ Connectivity Structures
Implicit Representation:
✓ Continuous Representations
✓ “Infinite” Spatial Resolution
✓ Memory depends on signal complexity
✗ Not Analytically Tractable
𝑓: ℝⁿ → ℝᵐ, 𝑉 = 𝑓(𝑟)
Images: 𝑟: (𝑥, 𝑦), 𝑉: (𝑟, 𝑔, 𝑏)
3D Scenes and Shapes (as in NeRFs): 𝑟: (𝑥, 𝑦, 𝑧, 𝜃, 𝜙), 𝑉: (𝑟, 𝑔, 𝑏, 𝜎)
Trajectories: 𝑟: (𝑞) generalized coordinates, 𝑉: utility function
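To make the mapping 𝑉 = 𝑓(𝑟) concrete, here is a minimal sketch of an implicit representation as a queryable function: a tiny, randomly initialized MLP standing in for a trained one, mapping continuous pixel coordinates to RGB. All weights and names are illustrative, not from the talk.

```python
import numpy as np

# Sketch of an implicit representation f: R^n -> R^m.
# Here f maps a 2D coordinate r = (x, y) to a value V = (r, g, b),
# so the "image" can be queried at any continuous (even sub-pixel) coordinate.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 3)), np.zeros(3)

def f(r):
    """Map coordinates of shape (..., 2) to RGB values of shape (..., 3) in (0, 1)."""
    h = np.tanh(r @ W1 + b1)                      # hidden features
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid keeps colors valid

# "Infinite" spatial resolution: query arbitrary continuous coordinates.
coords = np.array([[0.5, 0.5], [0.123, 0.987]])
rgb = f(coords)   # shape (2, 3)
```

Memory depends on signal complexity here because the storage cost is the MLP weights, not the sampling grid.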
Implicit Representations in Visual Computing
Synergies Between Affordance and Geometry: 6-DOF Grasp Detection via Implicit Representations. In RSS, 2021.
3D Neural Scene Representations for Visuomotor Control. In CoRL, 2021.
Neural Descriptor Fields: SE(3)-Equivariant Object Representations for Manipulation. In ICRA, 2022.
Robot Learning with Implicit Representations
Algorithmic Development (perception and control)
+ Improved Simulation for Contact-rich Manipulation
Work in progress
Motivation
1. NeRF Acquisition
2. NeRF-2-NeRF Registration → Estimated Object Poses, Collected NeRFs in Memory
3. Combination & Rendering → Interactive & Interactable Simulation
What are Neural Radiance Fields (NeRFs)?
Training an MLP → Composition & Rendering
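The "Composition & Rendering" step can be sketched with NeRF's standard quadrature rule for alpha-compositing density/color samples along a ray; the sample values below are illustrative.

```python
import numpy as np

def composite(sigmas, colors, deltas):
    """Alpha-composite samples along one ray (standard NeRF quadrature):
    weight_i = T_i * (1 - exp(-sigma_i * delta_i)), with transmittance
    T_i = prod_{j<i} (1 - alpha_j)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0), weights

# Toy ray: density concentrated at the last (blue) sample.
sigmas = np.array([0.0, 5.0, 50.0])
colors = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
deltas = np.full(3, 0.1)
rgb, weights = composite(sigmas, colors, deltas)
```

The first sample has zero density, so it contributes nothing; the rendered color is dominated by the high-density blue sample.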
Registration Problem in NeRFs
Unsupervised Training to Find T - Objective Function?
View/RGB Difference; Correspondence Difference
Challenges:
• Even if the learned T is optimal, the error between rendered images is NOT zero, so we apply a robust function to the MSE.
• The scenes are only partially overlapping, which also calls for a robust loss.
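One common choice of robust function applied to the per-pixel error is a Huber loss, sketched below. The talk does not specify the exact robust function, so this particular choice (and the `delta` threshold) is an assumption.

```python
import numpy as np

def robust_photometric_loss(rendered, target, delta=0.1):
    """Huber loss on per-pixel error: quadratic (MSE-like) near zero, but
    linear for large errors, so pixels from non-overlapping regions of the
    two scenes dominate the objective less than under plain MSE."""
    err = rendered - target
    a = np.abs(err)
    quad = 0.5 * err ** 2                 # small-error branch
    lin = delta * (a - 0.5 * delta)       # large-error branch, linear growth
    return np.where(a <= delta, quad, lin).mean()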
[Figure: two random views; NeRF 1 without transform, NeRF 1 with transform (with sample points), overlap of fixed NeRF 2 and moving NeRF 1, fixed NeRF 2 (target)]
Different Lightings (Failure Case)
If we use only radiance for registration, then different lighting models on the object cause failure.
• Fix: use geometry features rather than radiance.
[Figure: two random views; (moving) NeRF 1 initial pose, NeRF 1 registration iterations, overlay of fixed NeRF 2 and moving NeRF 1, (fixed) NeRF 2 target]
Existing Grasping Methods
+ Table-top object grasping
+ Grasping in clutter
+ Bin-picking
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation. In ICCV, 2019.
Neural Motion Fields
Goal:
Learn a value function that can be used to plan a trajectory for grasping
We propose Neural Motion Fields, trained with a cross-entropy loss function.
More data helps reduce fine-grained rotation error on non-stationary objects.
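As an illustration of planning a grasp trajectory with a learned value function, here is a generic cross-entropy-method (CEM) search. This is a sketch only: the actual Neural Motion Fields planner may differ, and `value_fn` below is a toy stand-in with a known optimum, not a learned model.

```python
import numpy as np

def cem_plan(value_fn, dim, iters=20, pop=64, elite=8, seed=0):
    """Cross-entropy-method search for a parameter vector (e.g., a grasp or
    trajectory parameterization) that maximizes a value function: sample from
    a Gaussian, keep the top `elite` samples, refit the Gaussian, repeat."""
    rng = np.random.default_rng(seed)
    mu, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mu, std, size=(pop, dim))
        scores = value_fn(samples)                       # shape (pop,)
        top = samples[np.argsort(scores)[-elite:]]       # highest-value samples
        mu, std = top.mean(axis=0), top.std(axis=0) + 1e-6
    return mu

# Toy value function peaked at q* = (1, -2, 0.5):
target = np.array([1.0, -2.0, 0.5])
best = cem_plan(lambda q: -((q - target) ** 2).sum(axis=1), dim=3)
```

CEM is derivative-free, which is convenient when the value function is a neural network queried as a black box.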
Ablation Study on Number of Anchor Grasps
Motivation
Goal: Make SDF-based contact forces friendly to gradient-based optimization.
Why? Planning in high-dimensional contact-rich scenarios,
e.g., robotic grasping and manipulation with multi-finger hands.
Challenges
1. Contact sparsity
Only a fraction of possible contacts are active (in collision) at a given time.
Inactive contacts have no gradient.
Can’t follow gradient to create new contacts.
2. Local flatness
Often compute ground-truth SDF from mesh.
If closest point is on triangle face, surface normal gradient is 0.
Can’t follow gradient to improve contact normals.
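A minimal sketch of the "leaky gradient" idea, by analogy with leaky ReLU: give inactive (non-colliding) contacts a small nonzero slope so optimization can still pull bodies toward contact. The exact Grasp'D formulation may differ; `k` and `leak` are illustrative constants.

```python
import numpy as np

def leaky_contact_force(sdf_value, k=100.0, leak=0.01):
    """Penalty-style normal force magnitude from an SDF query.
    Standard penalty: force = k * max(0, -sdf), which has ZERO gradient
    whenever sdf > 0 (contact inactive). The leaky variant keeps a small
    slope `leak` outside contact, so the optimizer gets a signal to close
    the gap and create new contacts."""
    penetration = -sdf_value  # positive when in collision
    return k * np.where(penetration > 0.0, penetration, leak * penetration)

active = leaky_contact_force(-0.05)   # in collision: full-strength force
inactive = leaky_contact_force(0.1)   # out of collision: small, nonzero slope
```

The leaked branch is not physical; it exists only to shape the optimization landscape and would be masked out when simulating the final result.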
[Figure: proper gradient. From Werling, K., Omens, D., Lee, J., Exarchos, I., & Liu, C. K. Fast and Feature-Complete Differentiable Physics for Articulated Rigid Bodies with Contact.]
Challenge #2: Local flatness
SDF ground truth is often computed from a mesh.
But surface normal is constant on faces,
so contact normal (computed as positional derivative of SDF) has 0 gradient.
Figure from: Boubekeur, T., & Alexa, M. Phong Tessellation. ACM Transactions on Graphics, 27(5).
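The Phong-inspired fix to local flatness can be sketched as barycentric interpolation of vertex normals, so the contact normal varies smoothly across a triangle face instead of being piecewise constant. The vertex normals below are illustrative.

```python
import numpy as np

def phong_normal(bary, vertex_normals):
    """Interpolate unit vertex normals with barycentric weights (the idea
    behind Phong shading and the 'Phong SDF'): the resulting normal changes
    continuously across the face, so its gradient w.r.t. position is no
    longer zero on flat triangles."""
    n = bary @ vertex_normals          # (3,) weights times (3, 3) normals
    return n / np.linalg.norm(n)       # renormalize to unit length

# Toy triangle with slightly splayed vertex normals:
vn = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 0.98], [-0.2, 0.0, 0.98]])
vn /= np.linalg.norm(vn, axis=1, keepdims=True)
n0 = phong_normal(np.array([1.0, 0.0, 0.0]), vn)   # normal at vertex 0
n1 = phong_normal(np.array([0.0, 1.0, 0.0]), vn)   # normal at vertex 1
```

With flat shading both queries would return the same face normal; here they differ, which is exactly what gives the optimizer a usable gradient on contact normals.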
Challenge #3: Non-smooth geometry
Easy to optimize over surface of a spherical cow (⚽),
but most aren’t so smooth (🐄).
Grasps from the ObMan dataset [*]:
less contact → less stable (less contact = less friction)
less plausible (human grasping is contact-rich)
[*] Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M. J., Laptev, I., & Schmid, C. (2019). Learning joint reconstruction of hands and manipulated objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11807-11816).
[Figure: ObMan grasps vs. ours (~30x)]
Grasp’D: Take away
Goal: Make SDF-based contact forces friendly to gradient-based optimization.
Why? Planning in high-dimensional contact-rich scenarios,
e.g., robotic grasping and manipulation with multi-finger hands.
Challenges & Proposed Solutions
1. Contact sparsity → Leaky gradient
2. Local flatness → Phong SDF
3. Non-smooth object geometry → SDF Dilation
An example application: Generating contact-rich human & robotic grasps.
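The SDF-dilation idea can be sketched as a constant offset of the signed distance: the inflated surface rounds off sharp features (a Minkowski sum with a ball of radius delta), giving smoother contact geometry to optimize against, and the offset can be annealed back to zero. The sphere SDF below is a toy example, not a Grasp'D asset.

```python
import numpy as np

def dilated_sdf(sdf, delta):
    """Dilate (inflate) a shape by `delta`: the delta-level set of the
    original SDF becomes the zero-level set of the dilated one. Exterior
    corners of the original shape become rounded on the dilated surface."""
    return lambda x: sdf(x) - delta

# Toy SDF of a unit sphere centered at the origin:
sphere = lambda x: np.linalg.norm(x, axis=-1) - 1.0
inflated = dilated_sdf(sphere, 0.2)

# A point 1.1 from the origin is outside the sphere but inside the dilation,
# so a dilated contact model "sees" it before true contact occurs.
p = np.array([1.1, 0.0, 0.0])
```

Because each SDF query is just offset by a scalar, dilation composes cleanly with the leaky-gradient and Phong-normal fixes above.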
Robot Learning with Implicit Representations
Algorithmic Development (perception and control)
+ Improved Simulation for Contact-rich Manipulation
Animesh Garg
[email protected] | @animesh_garg