NVIDIA Cosmos

Accelerate physical AI development with world foundation models.

Overview

What is NVIDIA Cosmos?

NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline, built to speed the development of physical AI systems such as autonomous vehicles (AVs) and robots.

Cosmos World Foundation Models Openly Available to Physical AI Developer Community

State-of-the-art models trained on millions of hours of driving and robotics video data to democratize physical AI development, available under an open model license.

The World Foundation Model Platform to Accelerate Physical AI Development

The new NVIDIA Cosmos platform accelerates the development of embodied physical AI systems such as robots and autonomous vehicles.

Benefits

Accelerate Physical AI Development With World Foundation Models

Cosmos provides developers with open and easy access to highly performant world foundation models and data pipelines, making physical AI development accessible to all.

Physics Aware

A suite of first-generation video models trained on 9,000 trillion tokens, including 20 million hours of robotics and driving data. The models generate high-quality videos from multimodal inputs such as images, text, or video.

Open

Cosmos WFMs and tokenizers are available under the NVIDIA Open Model License, enabling developers worldwide to build physical AI systems at scale without high entry costs.

Accelerate Data Processing and Curation

Speed up data curation by 20X with the NVIDIA NeMo Curator pipeline, which uses CUDA-X™ and NVIDIA AI-accelerated tooling to process over 100 PB of data. It provides out-of-the-box optimizations, minimizing the total cost of ownership (TCO) and accelerating time to market.

Develop Custom Models

The Cosmos tokenizer converts visual data into high-fidelity tokens with 8X better compression and 12X faster processing.

NVIDIA NeMo™ delivers accelerated training and fine-tuning to build multimodal generative AI models for physical AI.

Models

NVIDIA Cosmos World Foundation Models

A family of pre-trained models purpose-built for generating physics-aware videos and world states for physical AI development.


Family of State-of-the-Art Models

  • Autoregressive and diffusion models for Text-to-World and Video-to-World generation, available in sizes ranging from 4 billion to 14 billion parameters to suit various needs.
  • 12-billion-parameter upsampling model for refining text prompts, delivering enhanced accuracy and detail in generated outputs.
  • 7-billion-parameter model designed for decoding video sequences, optimized for augmented reality applications.

Inbuilt Guardrails

  • Pre-guard to filter brands, unsafe content, and harmful prompts within Cosmos-generated outputs.
  • Post-guard to remove questionable scenarios.
  • Guardrail to blur human faces.
  • Digital watermarks on synthetic videos generated from Preview APIs on the NVIDIA API catalog.

Benchmarks

Journey to Physical AI Performance

NVIDIA is working with the robotics and autonomous vehicle ecosystem to develop a set of benchmarks that reflect the unique requirements physical AI applications place on world foundation models.

Cosmos benchmarks are designed to evaluate the next generation of world models with advanced criteria like 3D consistency and physics alignment, essential for robotics and autonomous systems.

Compared to VideoLDM (VLDM), a baseline generative model for video synthesis, Cosmos WFMs excel in geometric accuracy with lower Sampson error and better temporal stability. Benchmarks also evaluate WFMs based on physical behaviors like gravity and collision dynamics.
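For readers unfamiliar with the metric, Sampson error is a standard first-order measure of how well a pair of matched image points satisfies the epipolar constraint between two views; lower values indicate better geometric consistency. The sketch below shows the textbook definition only. This page does not describe how the Cosmos benchmarks compute it, so the function names, the example fundamental matrix, and the sample points are illustrative assumptions.

```python
from __future__ import annotations

from typing import Sequence

Vec3 = Sequence[float]            # homogeneous image point (u, v, 1)
Mat3 = Sequence[Sequence[float]]  # 3x3 fundamental matrix


def mat_vec(m: Mat3, v: Vec3) -> list[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m: Mat3) -> list[list[float]]:
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def sampson_error(F: Mat3, x1: Vec3, x2: Vec3) -> float:
    """Textbook Sampson error for a correspondence x1 <-> x2 under F.

    (x2^T F x1)^2 / ((F x1)_0^2 + (F x1)_1^2 + (F^T x2)_0^2 + (F^T x2)_1^2)
    """
    Fx1 = mat_vec(F, x1)
    Ftx2 = mat_vec(transpose(F), x2)
    residual = sum(x2[i] * Fx1[i] for i in range(3))  # epipolar residual
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    if denom == 0.0:
        raise ValueError("degenerate correspondence: gradient norm is zero")
    return residual ** 2 / denom


# Illustrative fundamental matrix for two views related by a pure
# horizontal shift: matched points must lie on the same image row.
F_RECTIFIED = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

same_row = sampson_error(F_RECTIFIED, (10.0, 20.0, 1.0), (15.0, 20.0, 1.0))
two_rows_off = sampson_error(F_RECTIFIED, (10.0, 20.0, 1.0), (15.0, 22.0, 1.0))
```

In this toy setup a match on the same row scores 0.0 and a match two rows off scores 2.0. A benchmark would typically aggregate such scores over many matched points across generated frames, though the exact protocol is not given here.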

Cosmos WFMs consistently outperform VLDM on visual consistency, achieving up to 14X higher pose estimation success rates. While diffusion models deliver higher fidelity out of the box, autoregressive models deliver excellent performance for custom models.

Use Cases

How Developers Use NVIDIA Cosmos

See how developers across robotics, autonomous vehicles, and vision AI can use Cosmos to advance their work.

Video Search

Cosmos helps developers build bespoke datasets for their AI model training. Whether it’s snowy road footage for self-driving cars or busy warehouse scenes for robotics, Cosmos simplifies video tagging and search by understanding spatial and temporal patterns, making training data preparation easier.

This saves time, reduces costs, and helps deliver AI models that are highly relevant and impactful for real-world use.

Synthetic Data Generation

Ecosystem

Adopted by Leading Physical AI Innovators

Model developers from robotics, autonomous vehicles, and vision AI industries are using Cosmos to accelerate physical AI development.

  • 1X Technologies
  • Agile Robots
  • Agility Robotics
  • Figure AI
  • Foretellix
  • Fourier
  • Galbot
  • Hillbot
  • IntBot
  • Neura Robotics
  • Skild AI
  • Uber
  • Virtual Incision
  • Waabi
  • Wayve
  • Xpeng

Next Steps

Ready to Get Started?

Test drive a world foundation model in the NVIDIA API catalog or start building your world models using NVIDIA Cosmos.

Build Your Custom Models

NVIDIA NeMo provides an end-to-end pipeline to curate, tokenize, and fine-tune world models on any platform.

Start Curating Video Data For World Models

Use the accelerated data processing and curation pipeline, powered by NVIDIA NeMo Curator and optimized for NVIDIA data center GPUs.

Frequently Asked Questions