Jim Fan’s Post

Jim Fan

NVIDIA Senior Research Manager & Lead of Embodied AI (GEAR Lab). Stanford Ph.D. Building Humanoid Robots and Physical AI. OpenAI's first intern. Sharing insights on the bleeding edge of AI.

Not every foundation model needs to be gigantic. We trained a 1.5M-parameter neural network to control the body of a humanoid robot.

It takes a lot of subconscious processing for us humans to walk, maintain balance, and maneuver our arms and legs into desired positions. We capture this "subconsciousness" in HOVER, a single model that learns to coordinate a humanoid robot's motors to support both locomotion and manipulation.

We trained HOVER in NVIDIA Isaac, a GPU-powered simulation suite that accelerates physics to 10,000x real time. To put that number in perspective: the robots undergo 1 year of intense training in a virtual "dojo", yet it takes only ~50 minutes of wall-clock time on a single GPU. The neural net then transfers zero-shot to the real world, without finetuning.

HOVER can be *prompted* with various types of high-level motion instructions that we call "control modes". To name a few:
- Head and hand poses: captured by XR devices like Apple Vision Pro.
- Whole-body poses: via MoCap or an RGB camera.
- Whole-body joint angles: via an exoskeleton.
- Root velocity commands: via joysticks.

What HOVER enables:
- A unified interface for controlling the robot with whichever input devices are convenient at hand.
- An easier way to collect whole-body teleoperation data for training.
- A low-level backbone for an upstream Vision-Language-Action model, which provides motion instructions that HOVER translates to low-level motor signals at high frequency.

HOVER supports any humanoid that can be simulated in Isaac. Bring your own robot, and watch it come to life!

It's a big team effort from NVIDIA GEAR Lab and collaborators: Tairan He, Wenli Xiao, Toru Lin, Zhengyi Luo, Zhenjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiaolong Wang
Team leads: Jim Fan, Yuke Zhu
Website: https://lnkd.in/g6WrJyRC
Paper: https://lnkd.in/g99yWTPa
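To give a feel for how small 1.5M parameters is, here is a minimal sketch in PyTorch of a masked-command whole-body policy: proprioception plus a unified command vector (with a mask selecting the active control mode) in, joint targets out. All layer sizes, dimensions, and the masking scheme are illustrative assumptions, not HOVER's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the paper).
NUM_JOINTS = 19   # assumed humanoid actuator count
OBS_DIM = 63      # assumed proprioceptive observation size
CMD_DIM = 32      # assumed unified command vector (poses, joint angles, root velocity, ...)

class WholeBodyPolicy(nn.Module):
    """A plain MLP policy sketch: obs + masked command -> joint-position targets."""

    def __init__(self, hidden: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + 2 * CMD_DIM, hidden),  # command + its mode mask
            nn.ELU(),
            nn.Linear(hidden, hidden),
            nn.ELU(),
            nn.Linear(hidden, hidden),
            nn.ELU(),
            nn.Linear(hidden, NUM_JOINTS),              # joint-position targets
        )

    def forward(self, obs, cmd, mask):
        # Zero out command entries unused by the active control mode, and
        # concatenate the mask itself so the net knows which mode is active.
        x = torch.cat([obs, cmd * mask, mask], dim=-1)
        return self.net(x)

policy = WholeBodyPolicy()
n_params = sum(p.numel() for p in policy.parameters())
print(f"{n_params:,} parameters")  # ~1.3M with these sizes: roughly the scale in the post
```

Even at this scale, one forward pass is a few matrix multiplies, which is why such a policy can run at high control frequency on the robot.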

Ary Shalizi

Science, from artisanal to automated | High Content & High Throughput Assay Development | Laboratory Automation

4mo

Jim Fan so does it work on actual robots?

Mehmet Akdemir

Automotive Professional

4mo

Jim Fan But why a humanoid? Why constrain a robot's capabilities to the limits of human ergonomics? Try putting it on wheels, adding a third arm, and observing the efficiency; and please delete the head, it only raises the center of gravity. If you need a place for the robot's controller, try the bottom of the robot. I believe Lucas was a genius for making C-3PO the stupid one. Why do you think the ideal form of a universal robot is a human being?

For embodied agents / reinforcement learning models, we can train to SOTA with far fewer parameters than models for image generation or language modeling. It's interesting how a seemingly more complex task requires much less capacity to solve. An image model might need billions of params to generate in-distribution data from a PB-scale dataset, but for a highly complex physical problem the policies can be pennies in comparison. Scaling laws sort of apply, but returns diminish sharply after a certain point. I saw similar limits at Quilter when training RL placer agents: beyond a million parameters was unnecessarily large.
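The gap between policy networks and generative models is easy to quantify with a quick parameter count. The layer sizes below are made up for illustration, not taken from any specific system:

```python
def mlp_params(dims):
    """Weights + biases of a fully connected net with layer sizes `dims`."""
    return sum(i * o + o for i, o in zip(dims, dims[1:]))

# A typical small control policy: obs -> 3 hidden layers -> actions.
policy = mlp_params([64, 512, 512, 512, 20])
llm = 7_000_000_000  # a "small" modern language model, for comparison

print(policy)         # 568,852 parameters
print(llm // policy)  # the LLM is ~12,000x larger
```

The intuition from the comment holds: control quality saturates with network size long before anything like language-model scale is reached.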

Philippe Collard

Business samurai | General Manager, Rezoway USA

4mo

There is no use case for humanoid robots. As a matter of fact, they are a danger (physical) to real people (think babies, handicapped, etc). This is just sci fi fantasy work...how about doing something useful?

John DesJardins

Field CTO Akka | Passionate about global scale distributed applications | Lifelong Learner

4mo

I love seeing more focus on robotics! It has so much potential to transform and improve lives, such as for the disabled. Using AI to solve unsolved problems should be the top focus for the industry.

Percy Oliver Mueller

CEO at UTG, Chairman of Atomind Inc.

4mo

Progress is promising. We’re researching analog technology to train localized network-like systems that improve kinetic response in robotic joints. Additionally, we’re developing feedback sensors designed to relay information to local networks before reaching the central AI within nanoseconds. Reducing latency in a robot's wiring is critical, as every millisecond saved allows the AI to respond within the necessary timeframe—similar to the peripheral nervous system’s role in the human body, where rapid, localized responses enhance overall reaction speed. Analog systems operate in real time, and analog AI is expected to emerge within the next five years, further advancing these capabilities.

Theodore Tanner Jr.

Ship Software Change The World

4mo

Have been saying for a couple of years now and with results: Vertically (small model) trained then horizontally chained.

Eric Fraser

CTO of Dr. Lisa AI. Views expressed here are my own.

4mo

This is actually a more "native" use of GPUs than making them do transformer math for LLMs

Nyla Worker

AI product manager at Google | Ex-NVIDIA | Finalist Women in AI Innovator of the Year

3mo

Humans' key advantage is our one-shot ability to learn to grasp and interface with the world. How fast and precise is this in the real world? How close do you think we are to a foundation model that achieves fast one-shot reactions to the real world? Would be a good time to make a benchmark!
