Not every foundation model needs to be gigantic. We trained a 1.5M-parameter neural network to control the body of a humanoid robot.

It takes a lot of subconscious processing for us humans to walk, maintain balance, and maneuver our arms and legs into desired positions. We capture this “subconsciousness” in HOVER, a single model that learns how to coordinate the motors of a humanoid robot to support locomotion and manipulation.

We trained HOVER in NVIDIA Isaac, a GPU-powered simulation suite that accelerates physics to 10,000x faster than real time. To put that number in perspective: the robots undergo 1 year of intense training in a virtual “dojo”, yet it takes only ~50 minutes of wall-clock time on a single GPU. The neural net then transfers zero-shot to the real world without finetuning.

HOVER can be *prompted* with various types of high-level motion instructions that we call “control modes”. To name a few (see the sketch after this post):

- Head and hand poses - captured by XR devices like Apple Vision Pro.
- Whole-body poses - via MoCap or an RGB camera.
- Whole-body joint angles - via an exoskeleton.
- Root velocity commands - via joysticks.

What HOVER enables:

- A unified interface to control the robot with whichever input devices are convenient at hand.
- An easier way to collect whole-body teleoperation data for training.
- An upstream Vision-Language-Action model can provide motion instructions, which HOVER translates to low-level motor signals at high frequency.

HOVER supports any humanoid that can be simulated in Isaac. Bring your own robot, and watch it come to life!

It's a big team effort from NVIDIA GEAR Lab and collaborators: Tairan He, Wenli Xiao, Toru Lin, Zhengyi Luo, Zhenjia Xu, Zhenyu Jiang, Jan Kautz, Changliu Liu, Guanya Shi, Xiaolong Wang

Team leads: Jim Fan, Yuke Zhu

Website: https://lnkd.in/g6WrJyRC
Paper: https://lnkd.in/g99yWTPa
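[Editor's illustration] A minimal sketch of the "control mode" idea described above: one policy consumes all command channels, and a binary mask zeroes out the inactive ones so the same network serves every input device. This is not the actual HOVER API; all names, dimensions, and the architecture here are assumptions for illustration only.

```python
# Hypothetical sketch of a multi-mode humanoid policy (not HOVER's real code).
# A mask selects which high-level command channels the policy attends to.
import torch
import torch.nn as nn

COMMAND_DIMS = {
    "head_hand_pose": 18,   # e.g. 3 keypoints x (position + rotation)
    "whole_body_pose": 69,  # e.g. 23 link keypoints x 3D position
    "joint_angles": 23,     # one target per actuated joint
    "root_velocity": 3,     # vx, vy, yaw rate
}

class MaskedCommandPolicy(nn.Module):
    def __init__(self, proprio_dim=48, num_actions=23, hidden=512):
        super().__init__()
        cmd_dim = sum(COMMAND_DIMS.values())
        mask_dim = len(COMMAND_DIMS)
        self.net = nn.Sequential(
            nn.Linear(proprio_dim + cmd_dim + mask_dim, hidden),
            nn.ELU(),
            nn.Linear(hidden, hidden),
            nn.ELU(),
            nn.Linear(hidden, num_actions),  # joint position targets
        )

    def forward(self, proprio, commands, mask):
        # Zero out inactive command channels so one network handles all modes.
        masked = []
        for i, (name, dim) in enumerate(COMMAND_DIMS.items()):
            masked.append(commands[name] * mask[:, i:i + 1])
        x = torch.cat([proprio] + masked + [mask], dim=-1)
        return self.net(x)
```

A "joystick-only" prompt would then set the mask to activate just the root-velocity channel and zero out the rest, while a teleoperation prompt from a Vision Pro would activate only the head-and-hand-pose channel.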
Jim Fan But why a humanoid? Why constrain a robot's capability to the limits of human ergonomics? Try putting on wheels and adding a third arm, observe the efficiency, and please delete the head; it only raises the centre of gravity. If you need somewhere to place the robot's controller, try the bottom of the robot. I believe Lucas was a genius for making C-3PO the stupid one. Why do you think the ideal form of a universal robot is a human being?
For embodied agents / reinforcement learning models, we can train to SOTA with far fewer parameters than models for image generation or language modeling. It's interesting how a seemingly more complex task requires much less capacity to solve. An image model might need billions of params to generate in-distribution data from a PB-scale dataset, but for a highly complex physical problem the policies can be pennies in comparison. Scaling laws sort of apply, but returns diminish quickly after a certain point. I saw similar limits at Quilter when training RL placer agents: anything beyond a million parameters was unnecessarily large.
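[Editor's illustration] A rough back-of-the-envelope showing why control policies stay tiny. The layer sizes below are made up for illustration (not HOVER's or Quilter's actual architecture); the point is only that a typical MLP policy lands well under a couple of million parameters.

```python
# Parameter count for an illustrative MLP control policy (assumed sizes).
def mlp_params(layer_sizes):
    # weights + biases for each fully connected layer
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

obs_dim, act_dim = 138, 23
policy = [obs_dim, 512, 512, 256, act_dim]
print(mlp_params(policy))  # ~0.47M parameters, orders of magnitude below an LLM
```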
There is no use case for humanoid robots. As a matter of fact, they are a physical danger to real people (think babies, the handicapped, etc.). This is just sci-fi fantasy work... how about doing something useful?
I love seeing more focus on robotics! It has so much potential to transform and improve lives, such as for the disabled. Using AI to solve unsolved problems should be the top focus for the industry.
Progress is promising. We’re researching analog technology to train localized network-like systems that improve kinetic response in robotic joints. Additionally, we’re developing feedback sensors designed to relay information to local networks within nanoseconds, before it reaches the central AI. Reducing latency in a robot's wiring is critical, as every millisecond saved allows the AI to respond within the necessary timeframe, similar to the peripheral nervous system’s role in the human body, where rapid, localized responses enhance overall reaction speed. Analog systems operate in real time, and analog AI is expected to emerge within the next five years, further advancing these capabilities.
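[Editor's illustration] A minimal digital sketch of the two-tier idea described in the comment above: a fast local "reflex" loop handles joint feedback every step, while a slower central policy only refreshes targets occasionally. All rates, gains, and dynamics here are assumptions for illustration, not the commenter's analog hardware.

```python
# Two-rate control loop: 1 kHz local PD reflex, 50 Hz central target updates.
# Analogous to peripheral vs. central nervous system; all numbers are made up.
import numpy as np

def local_reflex(q, qd, q_target, kp=30.0, kd=1.0):
    # Fast local loop: PD torque from joint feedback, no central round-trip.
    return kp * (q_target - q) - kd * qd

def central_policy(q, qd, t):
    # Slow central loop: stand-in for the "AI" choosing new joint targets.
    return 0.3 * np.sin(0.5 * t) * np.ones_like(q)

dt_fast, slow_every = 0.001, 20          # 1 kHz reflex, central update every 20 steps
q, qd, q_target = np.zeros(3), np.zeros(3), np.zeros(3)
for step in range(1000):
    if step % slow_every == 0:
        q_target = central_policy(q, qd, step * dt_fast)
    tau = local_reflex(q, qd, q_target)
    qd += tau * dt_fast                  # toy unit-inertia dynamics
    q += qd * dt_fast
```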
I've been saying this for a couple of years now, and with results: vertically trained (small models), then horizontally chained.
This is actually a more "native" use of GPUs than making them do transformer math for LLMs
Humans' key advantage is our one-shot ability to learn to grasp and interface with the world. How fast and precise is this in the real world? How close do you think we are to a foundation model that achieves fast, one-shot reactions to the real world? It would be a good time to make a benchmark!
Jim Fan so does it work in actual robots?