NVIDIA Cosmos 3: The World’s First Fully Open Omnimodel

NVIDIA has released Cosmos 3, which the company calls an open world foundation model for physical AI. Cosmos 3 uses a mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction.
The company describes Cosmos 3 as the world's first fully open omnimodel, one that can understand and generate text, images, video, ambient sound and actions with what NVIDIA calls leading physics accuracy.
According to McKinsey, robotics could cross the gap from simulation to reality soon. The consultancy adds that robots now operate in dynamic settings where adaptability and autonomy are required.
Physical AI capabilities expand
NVIDIA says Cosmos 3 allows robots, autonomous vehicles or vision agents to work in real-world conditions even with limited training data and fragmented simulation stacks. The model's mixture-of-transformers architecture pairs a reasoning transformer with what the company describes as an expert generation transformer.
This structure allows Cosmos 3 to understand object interactions, motion and spatial-temporal relationships before generating video and action trajectories. The Cosmos platform now includes new datasets for robotics, physics, human motion, autonomous driving, warehouse safety and spatial reasoning.
The platform also offers new physical AI agent skills for neural scene reconstruction, defect-image generation and video augmentation.
According to Deloitte, with greater integration of AI capabilities in robotic systems and the emergence of specialised foundational models, robots could expand into multiple industries and applications, including smart factories.
Deloitte predicts that the cumulative installed capacity of industrial robots could reach 5.5 million globally by 2026.
Cosmos 3 Super, which is part of NVIDIA's lineup for post-training robotics and autonomous vehicle models, is best suited for applications that need the highest physics accuracy and generation quality.
How developers use Cosmos
Cosmos 3 can generate synthetic data and scene variations, then support post-training with embodiment-specific behaviour and environment data. Tasks range from pick-and-place to dexterous manipulation.
Developers can also use it as a vision language model or as the backbone for world action models. It can additionally serve as a world model or video foundation model that simulates physical environments and predicts future world states for training and evaluation.
Jensen Huang, Founder and Chief Executive of NVIDIA, says: "The big bang of physical AI is just around the corner thanks to breakthroughs in multimodal reasoning language, vision and world models.
"The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, AVs and vision AI that perceive, reason, plan and act in the physical world."
Physical AI developers are building on the Cosmos platform across industries. Robotics users include Agile Robots, Doosan Robotics, LG Electronics, Samsung Electronics and Skild AI.
Li Auto is using the platform for autonomous vehicles. Centific, Fogsphere, Linker Vision, Milestone Systems and Yuan are using the platform for vision AI agents to power industrial AI and smart space applications.
NVIDIA announced Cosmos 3 alongside the NVIDIA Cosmos Coalition, which the company describes as a global collaboration between world model builders and AI developers.
Members include Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI.
According to NVIDIA, the coalition will work on open world models across industries, and members can contribute models, research and evaluation techniques while using Cosmos 3 technologies.

