How Google’s Gemini 2.0 AI Will Power Universal Assistants

Google's latest LLM, Gemini 2.0, introduces advanced multimodal capabilities, agentic AI abilities and enhanced performance for improved AI assistants

Google has announced Gemini 2.0, the latest model in its line of large language models aimed at organising the world’s information.

Sundar Pichai, CEO of Google and its parent company Alphabet, said in a statement that Gemini 2.0 “will enable us to build new AI agents that bring us closer to our vision of a universal assistant” and noted that the model incorporates “new advances in multimodality – like native image and audio output – and native tool use.”


“If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful,” he said. “I can’t wait to see what this next era brings.”

Sundar said the new model’s capabilities are “underpinned by decade-long investments in our differentiated full-stack approach to AI innovation.” It is built on custom hardware like the company’s sixth-generation Tensor Processing Units (TPUs), which powered all of the training and inference for Gemini 2.0.

Gemini 2.0 Flash Available to Developers and Users

Google is also releasing Gemini 2.0 Flash, an experimental version of the model with “low latency and enhanced performance at the cutting edge of our technology, at scale,” according to Demis Hassabis, CEO of Google’s AI research unit DeepMind, and Koray Kavukcuoglu, Google DeepMind’s CTO.


“Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times,” they said. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed.”

The model is available now to developers via Google’s AI APIs and to users of the Gemini AI chatbot. Gemini users globally can also access a chat-optimised version of the model by selecting it in the model dropdown on the desktop and mobile web versions of the app. It will be available in the Gemini mobile apps soon.
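
For developers, a basic text call to the new model through the Gemini API might look like the short Python sketch below; it assumes the google-genai Python SDK and the experimental model name “gemini-2.0-flash-exp”, both of which may differ from your setup.

```python
# Minimal sketch: calling Gemini 2.0 Flash through the Gemini API.
# Assumes the google-genai Python SDK (pip install google-genai) and the
# experimental "gemini-2.0-flash-exp" model name; treat both as assumptions.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder API key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarise what an agentic AI assistant can do, in two sentences.",
)
print(response.text)
```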

Demis and Koray said that in addition to supporting multimodal inputs like images, video and audio, Gemini 2.0 Flash “now supports multimodal output like natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio.” The model can also natively call tools like Google Search, code execution and third-party user-defined functions.
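
As a rough illustration of that native tool use, the sketch below asks the model to ground an answer with the built-in Google Search tool; the tool and configuration names follow the google-genai SDK as understood here and should be treated as assumptions rather than confirmed documentation.

```python
# Hedged sketch: enabling the model's native Google Search tool.
# Tool and config class names are assumptions based on the google-genai SDK;
# check the current SDK reference before relying on them.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder API key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="What did Google announce alongside Gemini 2.0?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```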

To help developers build applications with the new model, Google is also releasing a Multimodal Live API that supports real-time audio and video streaming input, as well as the ability to use multiple combined tools.
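
A very rough sketch of a text-only session against the Multimodal Live API is shown below; the connect, send and receive calls and the config keys are assumptions about the google-genai SDK’s async live interface and may not match the shipping API exactly.

```python
# Rough sketch of a streaming session with the Multimodal Live API.
# The live.connect / send / receive calls and config keys are assumptions
# about the google-genai async interface; verify against the current docs.
import asyncio

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder API key

async def main() -> None:
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp",
        config={"response_modalities": ["TEXT"]},
    ) as session:
        # Send a single text turn, then stream back the model's response.
        await session.send(input="Hello, Gemini.", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```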

Research Prototypes Showcase Agentic AI Abilities

Google also showcased several research prototypes built with Gemini 2.0 that aim to demonstrate the ‘agentic’ abilities of the model to take actions and accomplish tasks on behalf of users.


Project Astra, first introduced at the company’s I/O developer conference, is a prototype universal AI assistant that Google has been testing with a small group of users. The latest version, built with Gemini 2.0, features “better dialogue” with the ability to converse in multiple languages, as well as new tool use capabilities, improved memory and lower latency.

“We’re working to bring these types of capabilities to Google products like Gemini app, our AI assistant, and to other form factors like glasses,” Sundar said. “And we’re starting to expand our trusted tester program to more people, including a small group that will soon begin testing Project Astra on prototype glasses.”

Another product, Project Mariner, is “an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser,” Demis and Koray said.

Key facts
  • 2x: Gemini 2.0 Flash outperforms the 1.5 Pro model on key benchmarks at twice the speed.
  • 83.5%: Project Mariner, a research prototype built with Gemini 2.0, achieved a state-of-the-art result of 83.5% on the WebVoyager benchmark, which tests agent performance on real-world web tasks.
  • 1bn: Google's AI Overviews feature in Search, which will incorporate Gemini 2.0 capabilities, now reaches 1 billion people.

Via an experimental Chrome browser extension, the agent is able to “understand and reason across information in your browser screen” and complete tasks for users.

The DeepMind executives said Project Mariner achieved state-of-the-art results on the WebVoyager benchmark, which tests AI agent performance on real-world web tasks. “It’s still early, but Project Mariner shows that it’s becoming technically possible to navigate within a browser, even though it’s not always accurate and slow to complete tasks today, which will improve rapidly over time,” they said.

Finally, Jules is an experimental AI code agent that integrates with the GitHub software development platform. “It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision,” according to the DeepMind executives. “This effort is part of our long-term goal of building AI agents that are helpful in all domains, including coding.”

Gemini 2.0 Coming to More Google Products

Sundar said Gemini 2.0 is already being tested in a limited fashion in Google’s AI Overviews feature in Search, with the advanced reasoning capabilities of the model being used to “tackle more complex topics and multi-step questions including advanced math equations, multimodal queries and coding.”

If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful.

Sundar Pichai, CEO of Google and Alphabet

“Early next year, we’ll expand Gemini 2.0 to more Google products,” he said.

“No product has been transformed more by AI than Search. Our AI Overviews now reach one billion people, enabling them to ask entirely new types of questions — quickly becoming one of our most popular Search features ever.”
