How Google’s Gemini 2.0 AI Will Uplift Universal Assistance
Google has announced Gemini 2.0, the latest model in its line of large language models aimed at organising the world’s information.
Sundar Pichai, CEO of Google and its parent company Alphabet, said in a statement that Gemini 2.0 “will enable us to build new AI agents that bring us closer to our vision of a universal assistant” and noted that the model incorporates “new advances in multimodality – like native image and audio output – and native tool use.”
“If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful,” he said. “I can’t wait to see what this next era brings.”
Sundar said the new model’s capabilities are “underpinned by decade-long investments in our differentiated full-stack approach to AI innovation.” It is built on custom hardware like the company’s sixth-generation Tensor Processing Units (TPUs), which powered all of the training and inference for Gemini 2.0.
Gemini 2.0 Flash Available to Developers and Users
Google is also releasing Gemini 2.0 Flash, an experimental version of the model with “low latency and enhanced performance at the cutting edge of our technology, at scale,” according to Demis Hassabis, CEO of Google’s AI research unit DeepMind, and Koray Kavukcuoglu, Google DeepMind’s CTO.
“Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times,” they said. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed.”
The model is available now to developers via the Gemini API in Google AI Studio and Vertex AI, and to users of the Gemini AI chatbot. Gemini users globally can also access a chat-optimised version of the model by selecting it in the model dropdown on the desktop and mobile web versions of the app. It will be available in the Gemini mobile apps soon.
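For developers, the basic request pattern is a single generation call. The snippet below is a minimal sketch using Google’s Gen AI Python SDK (google-genai); the model identifier gemini-2.0-flash-exp and the placeholder API key are assumptions drawn from the launch-time developer documentation rather than a definitive reference.

```python
# Minimal sketch: text generation with Gemini 2.0 Flash via the google-genai SDK.
# The model name "gemini-2.0-flash-exp" and the API key handling are assumptions
# based on launch-time documentation; verify against current docs before use.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key for illustration

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarise Gemini 2.0's new multimodal capabilities in two sentences.",
)
print(response.text)
```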
Demis and Koray said that in addition to supporting multimodal inputs like images, video and audio, Gemini 2.0 Flash “now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio.” The model can also natively call tools like Google Search, code execution and third-party user-defined functions.
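As a rough illustration of that native tool use, the sketch below exposes a hypothetical get_order_status function to the model through the google-genai SDK, which can call it automatically when the prompt requires it; the helper function and the configuration shape are assumptions for illustration only, not Google’s reference implementation.

```python
# Sketch of native tool use: the model can decide to call a developer-supplied
# Python function. The helper below is hypothetical; passing callables via
# GenerateContentConfig(tools=...) reflects the google-genai SDK as documented
# around launch.
from google import genai
from google.genai import types

def get_order_status(order_id: str) -> str:
    """Return the shipping status for an order (stubbed for illustration)."""
    return f"Order {order_id} is out for delivery."

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Where is order 12345 right now?",
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
# The SDK can handle the function-call round trip automatically and return a
# final text answer that incorporates the tool's result.
print(response.text)
```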
To help developers build applications with the new model, Google is also releasing a Multimodal Live API that supports real-time audio and video streaming input, as well as the ability to use multiple combined tools.
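The streaming interaction model differs from a one-shot request: a session stays open and exchanges messages in both directions. The sketch below is an assumed, illustrative use of the Live API through the google-genai Python SDK; the connect, send and receive calls mirror SDK examples published around launch and may have changed since.

```python
# Sketch of a Multimodal Live API session: a bidirectional, low-latency stream
# rather than one-shot request/response. Method names (aio.live.connect, send,
# receive) follow google-genai SDK examples from around launch and should be
# treated as assumptions, not a definitive reference.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main() -> None:
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Give me a one-line status update.", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```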
Research Prototypes Showcase Agentic AI Abilities
Google also showcased several research prototypes built with Gemini 2.0 that aim to demonstrate the ‘agentic’ abilities of the model to take actions and accomplish tasks on behalf of users.
Project Astra, first introduced at the company’s I/O developer conference, is a prototype universal AI assistant that Google has been testing with a small group of users. The latest version, built with Gemini 2.0, features “better dialogue”, including the ability to converse in multiple languages, as well as new tool use capabilities, improved memory and lower latency.
“We’re working to bring these types of capabilities to Google products like Gemini app, our AI assistant, and to other form factors like glasses,” Sundar said. “And we’re starting to expand our trusted tester program to more people, including a small group that will soon begin testing Project Astra on prototype glasses.”
Another product, Project Mariner, is “an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser,” Demis and Koray said.
- 2x: Gemini 2.0 Flash outperforms the 1.5 Pro model on key benchmarks at twice the speed.
- 83.5%: Project Mariner, a research prototype built with Gemini 2.0, achieved a state-of-the-art result of 83.5% on the WebVoyager benchmark, which tests agent performance on real-world web tasks.
- 1bn: Google's AI Overviews feature in Search, which will incorporate Gemini 2.0 capabilities, now reaches 1 billion people.
Via an experimental Chrome browser extension, the agent is able to “understand and reason across information in your browser screen” and complete tasks for users.
The DeepMind executives said Project Mariner achieved state-of-the-art results on the WebVoyager benchmark, which tests AI agent performance on real-world web tasks. “It’s still early, but Project Mariner shows that it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and slow to complete tasks today, which will improve rapidly over time,” they said.
Finally, Jules is an experimental AI code agent that integrates with the GitHub software development platform. “It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision,” according to the DeepMind executives. “This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding.”
Gemini 2.0 Coming to More Google Products
Sundar said Gemini 2.0 is already being tested in a limited fashion in Google’s AI Overviews feature in Search, with the advanced reasoning capabilities of the model being used to “tackle more complex topics and multi-step questions including advanced math equations, multimodal queries and coding.”
“Early next year, we’ll expand Gemini 2.0 to more Google products,” he said.
“No product has been transformed more by AI than Search. Our AI Overviews now reach one billion people, enabling them to ask entirely new types of questions — quickly becoming one of our most popular Search features ever.”