Physical AI

NVIDIA Cosmos

Develop world foundation models to advance physical AI.

Overview

What Is NVIDIA Cosmos?

NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline. It is built to power world model training and accelerate physical AI development for autonomous vehicles (AVs) and robots.

NVIDIA Powers Humanoid Robotics With Cloud-to-Robot Computing Platforms

New NVIDIA Isaac™ GR00T open models and GR00T-Dreams blueprint for generating synthetic data are advancing humanoid robot reasoning and behavior.

Scale Synthetic Data and Physical AI Reasoning With NVIDIA Cosmos

Explore the latest NVIDIA Cosmos WFMs for advanced reasoning and controllable synthetic data generation, enabling the next generation of AI-driven humanoids and autonomous vehicles.

Benefits

Accelerate Virtual World Generation for Physical AI

Cosmos provides developers with easy access to high-performance world foundation models, data pipelines, and tools to generate synthetic data and post-train for robotics and autonomous driving applications.

Physics First Data

World foundation models are pre-trained on 20 million hours of robotics and driving data to generate world states grounded in physics.

Open

Cosmos WFMs, guardrails, and tokenizers are licensed under the NVIDIA Open Model License, making them accessible to all physical AI developers.

Models

Cosmos World Foundation Models

A family of pretrained multimodal models that developers can use out-of-the-box for world generation and reasoning, or post-train to develop specialized physical AI models.

Cosmos Predict

Generalist model for fast, high-quality world generation and frame prediction from multimodal input. Trained on 9,000 trillion tokens of robotics and driving data and purpose-built for post-training.

Available as Cosmos NIM for accelerated inference anywhere.

Cosmos Transfer

Amplify input video to a variety of environments and lighting conditions for physics-aware world generation conditioned on ground-truth and structured inputs. Speed up controllable synthetic data generation by using ground-truth simulation from NVIDIA Omniverse™.

Cosmos Reason

Fully customizable, multimodal reasoning model that plans responses based on spatial and temporal understanding.

Trained using vision-language model post-training and reinforcement learning for chain-of-thought reasoning.

Cosmos Guardrail

Develop responsible models using Cosmos WFMs, with a pre-guard for filtering unsafe inputs and a post-guard for consistent and safe outputs.
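
The pre-guard and post-guard split can be pictured as a wrapper around the generation call. The sketch below is only an illustration in plain Python: `pre_guard`, `post_guard`, `guarded_generate`, the keyword blocklist, and the stand-in generator are all hypothetical names chosen for this example and are not the Cosmos Guardrail API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical blocklist; a real guardrail would rely on trained classifiers.
BLOCKED_TERMS = {"weapon", "explosive"}


@dataclass
class GuardResult:
    allowed: bool
    output: Optional[str] = None
    reason: Optional[str] = None


def pre_guard(prompt: str) -> bool:
    """Return True when the prompt contains no blocked term."""
    words = set(prompt.lower().split())
    return words.isdisjoint(BLOCKED_TERMS)


def post_guard(output: str) -> bool:
    """Return True when the generated output passes the same toy check."""
    return pre_guard(output)


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> GuardResult:
    """Call generate() only if the prompt passes, then check its output."""
    if not pre_guard(prompt):
        return GuardResult(allowed=False, reason="input rejected by pre-guard")
    output = generate(prompt)
    if not post_guard(output):
        return GuardResult(allowed=False, reason="output rejected by post-guard")
    return GuardResult(allowed=True, output=output)


def fake_world_model(prompt: str) -> str:
    """Stand-in for a world model call; it only echoes a description."""
    return f"video of {prompt}"
```

Keeping the two checks separate means an unsafe prompt never reaches the model, while a safe prompt that still yields an unsafe result is caught before it is returned.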

Tools

Post-train Cosmos World Foundation Models

Cosmos provides developers with open and highly performant data curation pipelines, tokenizers, a training framework, and post-training scripts to quickly and easily build specialized world models like policy models and vision-language-action (VLA) models for embodied AI.

Efficiently Tokenize Video Data

Use Cosmos tokenizers to generate image or video tokens at higher compression rates—for scalable, robust, and efficient development of large world models. Choose high-res or low-res variants for post-training Cosmos WFMs into specialized AI models.
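
To see why the compression rate matters, it helps to count tokens. The arithmetic below is a rough sketch that assumes a tokenizer downsamples a clip by one factor in time and another in each spatial dimension. The function names and the example factors are illustrative assumptions, not a statement of which variants the Cosmos tokenizers ship.

```python
import math


def latent_shape(frames: int, height: int, width: int,
                 t_factor: int, s_factor: int) -> tuple[int, int, int]:
    """Latent grid size after temporal and spatial downsampling."""
    return (math.ceil(frames / t_factor),
            math.ceil(height / s_factor),
            math.ceil(width / s_factor))


def token_count(frames: int, height: int, width: int,
                t_factor: int, s_factor: int) -> int:
    """Number of latent positions (tokens) produced for one clip."""
    t, h, w = latent_shape(frames, height, width, t_factor, s_factor)
    return t * h * w


def compression_ratio(frames: int, height: int, width: int,
                      t_factor: int, s_factor: int) -> float:
    """Input pixel positions divided by latent positions."""
    tokens = token_count(frames, height, width, t_factor, s_factor)
    return (frames * height * width) / tokens
```

For a 32-frame 720×1280 clip, assumed factors of 8 in time and 16 in space give 14,400 tokens, while factors of 4 and 8 give 115,200. The higher-compression setting leaves one-eighth as many tokens for the downstream model to process.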

Speed Up Data Curation

Speed up data curation by 20X with NVIDIA NeMo™ Curator, a pipeline of CUDA-X™ and NVIDIA AI-accelerated tooling that can process over 100PB of data. It provides out-of-the-box optimizations, minimizing the total cost of ownership (TCO) and accelerating time to market.

Fully Managed Development Support

NVIDIA DGX Cloud is a high-performance AI platform for accelerated training, enabling developers to curate data, post-train, and deploy video and world foundation models with a fully managed service.

Post-Training Script

Customize Cosmos WFMs for downstream physical AI use cases using PyTorch scripts. Post-train models to generate actions or text, or modify length, precision, view, and camera controls to match real-world scenarios and requirements.

Hardware

Get the Best Performance With NVIDIA AI

Cosmos WFMs are fully optimized for top-tier NVIDIA GPUs, including those built on the latest Blackwell architecture.

Run on NVIDIA Blackwell

For enterprises running massive, custom multimodal models, such as Cosmos world foundation models, NVIDIA's GB200 delivers industry-leading speed and scalability for billion-plus parameter workloads. Access it on NVIDIA DGX Cloud to develop next-generation AI superclusters and large-scale physical AI applications.

Physical AI developers can use server and workstation platforms with NVIDIA RTX PRO 6000 Blackwell GPUs and DGX Cloud to accelerate synthetic data generation using Omniverse and Cosmos. This combination lets you quickly generate physics-based synthetic data for advanced robotics, self-driving cars, and simulation workflows.


Use Cases

How Developers Use NVIDIA Cosmos

Accelerate downstream foundation model development to advance vision AI and embodied AI with synthetic data generation and post-training.

Synthetic Data Generation (SDG)

Omniverse creates realistic 3D scenes that can be used as input for Cosmos Transfer, which amplifies them across diverse, photorealistic environments and lighting. This process generates scalable, augmented data, removing the data bottleneck for more effective foundation model training.

Cosmos Reason can evaluate synthetic data by removing outputs that don’t meet post-training or evaluation requirements. It also generates captions to add context and help organize data, speeding up foundation model development for vision AI and embodied AI.
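
The filter-and-caption step can be sketched as a small function that takes a scorer and a captioner as plug-in callables. Everything here is hypothetical: `Clip`, `curate`, and the `score` and `caption` callables stand in for whatever model performs the evaluation, and the threshold is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Clip:
    path: str
    caption: str = ""


def curate(clips: Iterable[Clip],
           score: Callable[[Clip], float],
           caption: Callable[[Clip], str],
           threshold: float) -> list[Clip]:
    """Keep clips whose score meets the threshold and attach a caption."""
    kept = []
    for clip in clips:
        if score(clip) >= threshold:
            kept.append(Clip(path=clip.path, caption=caption(clip)))
    return kept
```

In a real workflow the `score` callable would wrap a reasoning model's judgment of each clip. Here it is left abstract so the sketch makes no claim about a specific API.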

Policy Model Initialization

A policy model guides a physical AI system’s behavior, ensuring that the system operates safely and in accordance with its goals. Cosmos Predict or Cosmos Reason can be post-trained into policy models to generate actions, saving the cost, time, and data needs of manual policy training.

Policy Model Evaluation

Cosmos WFMs accelerate policy evaluation by simulating real-world actions through video outputs, using Omniverse ground-truth physics for accuracy. Developers can build a vision-language-action (VLA) model using Cosmos Reason and use it to critique and drive actions. This simulation loop reduces the cost, time, and risk of real-world testing while improving policy precision.
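
The simulation loop described here has three parts: a policy proposes an action, a world model predicts what happens next, and a critic scores the result. The toy below shows that structure with a scalar state in place of video. All names (`rollout`, `evaluate`, `toward_zero`, `add_step`, `final_distance`) are hypothetical and are not part of Cosmos.

```python
from typing import Callable, Sequence

State = float
Action = float


def rollout(policy: Callable[[State], Action],
            world_step: Callable[[State, Action], State],
            start: State, horizon: int) -> list[State]:
    """Simulate the policy inside the world model for `horizon` steps."""
    state = start
    trajectory = [state]
    for _ in range(horizon):
        state = world_step(state, policy(state))
        trajectory.append(state)
    return trajectory


def evaluate(policy: Callable[[State], Action],
             world_step: Callable[[State, Action], State],
             critic: Callable[[Sequence[State]], float],
             starts: Sequence[State], horizon: int) -> float:
    """Average critic score over rollouts from several start states."""
    scores = [critic(rollout(policy, world_step, s, horizon)) for s in starts]
    return sum(scores) / len(scores)


# Toy stand-ins: the goal is to reach state 0.
def toward_zero(state: State) -> Action:
    return -0.5 * state          # move halfway to the goal


def add_step(state: State, action: Action) -> State:
    return state + action        # trivial "world model"


def final_distance(trajectory: Sequence[State]) -> float:
    return -abs(trajectory[-1])  # closer to 0 scores higher
```

Because every rollout happens inside the simulated world, candidate policies can be compared on the same start states without touching real hardware.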

Multiview Generation

Cosmos Predict can be post-trained to generate multiple views or diverse camera perspectives, enabling high-fidelity, temporally consistent, physics-based training data that contains up to 360° views from a single text, image, or video input.

This boosts model robustness, reduces edge-case failures, and accelerates development cycles for autonomous machines—lowering costs and delivering faster, safer deployments.

Our Commitment

Democratizing Trustworthy AI for the Physical AI Community

Cosmos models, guardrails, and tokenizers are available on Hugging Face and GitHub, with resources to tackle data scarcity in training physical AI models. We're committed to keeping Cosmos transparent, open, and built for all.

Ecosystem

Adopted by Leading Physical AI Innovators

Model developers from the robotics, autonomous vehicles, and vision AI industries are using Cosmos to accelerate physical AI development.

Next Steps

Ready to Get Started?

Test drive a world foundation model in the NVIDIA API catalog or start building your world models using Cosmos.

Post-Train WFMs

Cosmos WFMs are purpose-built for post-training, unlocking powerful, downstream world models that accelerate physical AI development.

Curate Video Data For World Models

Leverage an accelerated data processing and curation pipeline powered by NVIDIA NeMo Curator and optimized for NVIDIA data center GPUs.

Frequently Asked Questions

Start with the documentation. Cosmos world foundation models are openly available on Hugging Face, with inference and post-training scripts on GitHub. Developers can also use the Cosmos tokenizer from /NVIDIA/cosmos-tokenizer on GitHub and Hugging Face.

Cosmos world foundation models are available to all under the NVIDIA Open Model License.

PyTorch scripts are openly available for all Cosmos models for post-training. Please read the documentation for a step-by-step guide on post-training.

Yes, you can use Cosmos to build from scratch with your preferred foundation model or model architecture. Start by using NeMo Curator for video data pre-processing, then compress and decode your data with the Cosmos tokenizer. Once you have processed the data, you can train or fine-tune your model using NVIDIA NeMo.

Using NVIDIA NIM™ microservices, you can easily integrate your physical AI models into your applications across cloud, data centers, and workstations.

You can also use NVIDIA DGX Cloud to train AI models and deploy them anywhere at scale.

Omniverse creates realistic 3D simulations of real-world tasks by using different generative APIs, SDKs, and NVIDIA RTX rendering technology.

Developers can input Omniverse simulations as instruction videos to Cosmos Transfer models to generate controllable photoreal synthetic data.

Together, Omniverse provides the simulation environment before and after training, while Cosmos provides the foundation models to generate video data and train physical AI models.

Learn more about NVIDIA Omniverse.