How is OpenAI's Sora 2 Model Redefining Generative Video AI?

Sam Altman, CEO and co-founder of OpenAI, releases Sora 2, a video and audio generation model with enhanced capabilities
OpenAI’s Sora 2 delivers physics-accurate video and audio generation with dialogue sync, cameo identity tools and creative AI safety features

The race to advance video generation systems capable of realistically simulating the physical world has accelerated rapidly across the tech industry over the past year.

OpenAI has introduced Sora 2, its latest video and audio generation model, which the company claims delivers enhanced physics simulation compared to previous iterations.

The upgrade forms part of what OpenAI refers to as world simulation technology, deploying neural networks to produce video that aligns more accurately with the laws of physics.

The system can now render complex physical scenarios such as gymnastics sequences or basketball rebounds, capturing dynamics of buoyancy and rigidity with greater precision.

According to the Sora team, this release represents a significant step forward from the original Sora, first unveiled in February 2024.

“The Sora team has been focused on training models with more advanced world simulation capabilities,” the team writes in an OpenAI blog post.

“We believe such systems will be critical for training AI models that deeply understand the physical world. 

“A major milestone for this is mastering pre-training and post-training on large-scale video data, which are in their infancy compared to language.”

Sora 2’s enhanced capabilities 

The OpenAI Sora team explains that earlier video generation systems often distorted objects and reshaped entire scenarios in order to fit text-based prompts.

In contrast, Sora 2 is designed to generate outcomes that adhere more closely to established physics constraints.

“Prior video models are overoptimistic – they will morph objects and deform reality to successfully execute upon a text prompt,” the team writes.

“For example, if a basketball player misses a shot, the ball may spontaneously teleport to the hoop. 

“In Sora 2, if a basketball player misses a shot, it will rebound off the backboard.”

The system is capable of following instructions across multiple shots while preserving consistency of elements within generated scenes.

It supports a range of visual styles, from photorealistic and cinematic to anime.

Alongside video content, the model can also produce audio components such as background soundscapes, dialogue and sound effects.

OpenAI has further added a new feature that enables users to embed recordings of people or objects directly into generated environments.

“By observing a video of one of our teammates, the model can insert them into any Sora-generated environment with an accurate portrayal of appearance and voice,” the team writes.

How does Sora 2 work?

The company has launched an iOS mobile app that offers access to Sora 2, currently available through an invite-only system.

The app features a tool called cameos, which requires users to provide a recorded video and audio sample for identity verification before their likeness can be used in generated content.

Through dedicated permission settings, users are able to retain control over how their digital likeness is applied.

“Only you decide who can use your cameo and you can revoke access or remove any video that includes it at any time,” the team says.

OpenAI has introduced what it calls a natural language recommender system, which leverages the company’s language models to let users guide their content feed using text commands.

According to the company, the algorithm is designed to prioritise material from followed accounts as well as videos intended to provide creative inspiration.

“We explicitly designed the app to maximise creation, not consumption,” the team says.

Sora 2’s safety features

The company has introduced safeguards covering content provenance, consent over likeness, protections for teens, content filtering, audio and user control.

Distinguishing AI content
Sora 2 embeds visible watermarks and C2PA metadata in every video, alongside internal tracing tools, to ensure AI-generated content is identifiable and accountable.

Sora 2’s top capabilities:
  • Physics-accurate video generation with realistic motion and outcomes
  • Audio and dialogue synchronisation alongside video content
  • Multi‑style output: photorealistic, cinematic and anime formats
  • Cameo feature with identity‑verified likeness control
  • Built‑in safety systems: watermarks, filters, parental controls

Consent-based likeness
Users control how their likeness is used through cameo features, with the ability to revoke access, review drafts, delete or report content and set custom preferences. Public figures are blocked unless they opt in.

Safeguards for teens
The platform restricts mature content, blocks adults from initiating teen contact and introduces parental controls via ChatGPT. Teen users face limits on scrolling and receive a non-personalised feed by default.

Filtering harmful content
Sora deploys layered defences to block unsafe prompts and outputs and filters feed content against global policies. It applies tighter rules because of the realism of its video, with human moderation supplementing automation.

Audio safeguards
Generated audio is reviewed for policy violations. The system blocks attempts to imitate living artists and honours takedown requests from creators.

User control and recourse
Users decide when to publish content, can remove or report videos and accounts and maintain control over visibility and interactions.

“Video models are getting very good, very quickly,” the team writes.
