ChatGPT update enables AI chatbot to ‘see, hear, and speak’

OpenAI's ChatGPT chatbot will soon be able to hold voice conversations with users and interact using images, the company has said

ChatGPT, OpenAI's wildly popular large language model chatbot, will soon be able to hold voice conversations with users and interact using images, the company has revealed.

The company’s release of ChatGPT last year rapidly accelerated interest in generative AI. The tool can interact conversationally, answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.

In March, OpenAI announced the launch of GPT-4, the latest iteration of its deep learning model, which it says ‘exhibits human-level performance’ on various professional and academic benchmarks, from the US bar exam to SAT school exams.

“We are beginning to roll out new voice and image capabilities in ChatGPT,” OpenAI said in a blog post. “They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”

ChatGPT voice and image capabilities

According to OpenAI, users of ChatGPT will soon be able to engage in a back-and-forth conversation with the chatbot. The new voice capability, the company says, is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech. OpenAI collaborated with professional voice actors to create each of the voices, and uses Whisper, the company’s open-source speech recognition system, to transcribe spoken words into text.

Meanwhile, with image support, users can take pictures of things around them and ask the chatbot to "troubleshoot why your grill won't start, explore the contents of your fridge to plan a meal, or analyse a complex graph for work-related data".

Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models, OpenAI says, apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.

According to OpenAI, these voice and image capabilities in ChatGPT will be rolled out to Plus and Enterprise users over the coming weeks.

“OpenAI’s goal is to build AGI that is safe and beneficial,” it said. “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.”


