OpenAI Expands into Next-Gen Audio AI With Three New Models

By Rithula Nisha

May 11, 2026

OpenAI API launches realtime audio for AI agents. Credit: OpenAI

OpenAI has launched three new audio AI models for developers to build real-time voice agents that can listen, reason, respond and act naturally.

OpenAI has released three audio models designed to handle real-time voice interactions. GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper could enable software systems to process spoken requests and respond whilst conversations are still taking place.

The models target developers building applications where users need to communicate by voice rather than text. This could include scenarios where typing is impractical or where live translation and transcription are required.

Voice models with reasoning

GPT-Realtime-2 is OpenAI's first voice model to include GPT-5-class reasoning capabilities. The system can work through requests whilst keeping the conversation flowing.

It handles interruptions and corrections during live interactions. The model calls tools and adjusts responses based on the context of the conversation as it unfolds.

The system could allow applications to move from simple voice commands to interactions where software reasons through multi-step requests. Developers are building voice-to-action patterns where systems complete tasks after interpreting spoken instructions.

A system-to-voice mode enables applications to convert content into spoken output. Users could search for travel options conversationally whilst the system manages connected tasks like rebooking hotels after flight changes.

Translation from more than 70 languages

GPT-Realtime-Translate processes speech from more than 70 input languages into 13 output languages. The model translates whilst speakers continue talking.

Three ways to build with voice AI. Credit: OpenAI

Deutsche Telekom is building customer support systems where users speak in their preferred language and the model translates the conversation in real time. The approach targets support operations, education platforms and media services with international audiences.

Vimeo uses the model to translate product education videos as they play. Global customers can hear content in their chosen language without waiting for separately produced versions.

Cobus Kok, VP AI Experiences at Priceline, says: "GPT-Realtime-2 stood out for how well it handles complex requests, coordinates multiple tool calls at once, and keeps the interaction feeling natural."

"For Penny, Priceline’s AI travel agent, that translates into quicker, more practical support by voice – especially when travellers need to adjust plans in real time."

Live transcription and workflow

GPT-Realtime-Whisper converts speech to text as speakers talk. The streaming transcription model operates with low latency.

The system could generate captions for broadcasts or produce summaries while meetings are in progress. Teams can integrate live speech into business workflows without waiting for recordings to finish processing.

Prateek Sachan, Co-Founder and CTO at BolnaAI, says: "Building voice AI for India means handling diverse regional phonetics. GPT-Realtime-Translate delivered 12.5% lower word error rates than any other model we tested."
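Word error rate (WER), the metric BolnaAI cites, is conventionally the number of word substitutions, deletions and insertions needed to turn a model's transcript into a reference transcript, divided by the number of words in the reference. The Python sketch below is a minimal illustration of that standard definition, not OpenAI's or BolnaAI's evaluation code; the function names and sample sentences are hypothetical. The quote does not say whether 12.5% is a relative or an absolute reduction, so the second helper shows the relative reading using made-up WERs of 0.16 and 0.14.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional drop of candidate relative to baseline (hypothetical helper)."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Hypothetical example: one substitution ("my" -> "the") and one deletion ("for")
    wer = word_error_rate("rebook my hotel for friday", "rebook the hotel friday")
    print(round(wer, 3))  # 0.4
    # Hypothetical WERs: a relative reading of "12.5% lower"
    print(round(relative_reduction(0.16, 0.14), 3))  # 0.125
```

Under the relative reading, a fall from 0.16 to 0.14 counts as a 12.5% reduction even though the absolute gap is only two percentage points.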

OpenAI has designed the models specifically for applications where voice is the primary interface. The company expects developers to build experiences where users speak naturally whilst software handles tasks in real time.

The Realtime API includes multiple layers of controls to prevent misuse. Active classifiers can halt sessions that violate content guidelines during live interactions.

Usage policies prohibit distributing outputs for spam or deceptive purposes, and developers are responsible for making sure end users know when they are interacting with an AI system.
