ChatGPT update enables AI chatbot to ‘see, hear, and speak’
ChatGPT, OpenAI's wildly popular chatbot built on a large language model, will soon be able to hold voice conversations with users and interact using images, the company has revealed.
The company’s release of ChatGPT last year has rapidly accelerated interest in generative AI, with the tool capable of interacting conversationally, answering follow-up questions, admitting its mistakes, challenging incorrect premises, and rejecting inappropriate requests.
In March, OpenAI announced the launch of GPT-4, the latest iteration of its deep learning model, which it says ‘exhibits human-level performance’ on various professional and academic benchmarks, from the US bar exam to SAT school exams.
“We are beginning to roll out new voice and image capabilities in ChatGPT,” OpenAI said in a blog post. “They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.”
ChatGPT voice and image capabilities
According to OpenAI, users of ChatGPT will soon be able to engage in a back-and-forth conversation with the chatbot. The new voice capability, the company says, is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech. OpenAI collaborated with professional voice actors to create each of the voices, and it uses Whisper, the company’s open-source speech recognition system, to transcribe users' spoken words into text.
Meanwhile, with image support, users can take pictures of things around them and ask the chatbot to “troubleshoot why your grill won’t start, explore the contents of your fridge to plan a meal, or analyse a complex graph for work-related data”.
Image understanding is powered by multimodal GPT-3.5 and GPT-4. These models, OpenAI says, apply their language reasoning skills to a wide range of images, such as photographs, screenshots, and documents containing both text and images.
According to OpenAI, these voice and image capabilities in ChatGPT will be rolled out to Plus and Enterprise users over the coming weeks.
“OpenAI’s goal is to build AGI that is safe and beneficial,” it said. “We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.”