Google unveils Gemini, its largest and most capable AI model

By Marcus Law

December 06, 2023

Gemini is Google's most capable and general model yet, with state-of-the-art performance across many leading benchmarks

Google says its Gemini AI model is built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code

Google has announced the much-anticipated launch of Gemini, its largest and most capable AI model.

According to Google, Gemini has sophisticated multimodal capabilities: it can handle human-style conversation, language and content, understand and interpret images and code, drive data and analytics, and be used by developers to create new AI apps and APIs.

According to Google DeepMind CEO Demis Hassabis, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.

“With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities,” he wrote in an announcement blog.

“Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.”

Gemini model boasts sophisticated multimodal reasoning capabilities

Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. This, Google says, makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.

According to Google, Gemini’s ability to extract insights from hundreds of thousands of documents by reading, filtering and understanding information will help deliver breakthroughs at digital speeds in fields from science to finance.

The model was trained to recognise and understand text, images, audio and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics.

Google says Gemini will bring improvements to Google’s existing AI and AI-enhanced products like Bard, Google Assistant and Search. Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more, in the biggest upgrade to Bard since it launched.

Google is also bringing Gemini to Pixel. The Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which powers new features such as Summarize in the Recorder app and is rolling out in Smart Reply in Gboard, starting with WhatsApp.

In the coming months, Google says Gemini will be available in more of its products and services like Search, Ads, Chrome and Duet AI.

Built with responsibility and safety at the core

Building upon Google’s AI Principles and the robust safety policies across its products, Google says it has added new protections to account for Gemini’s multimodal capabilities.

Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. The company has conducted novel research into potential risk areas like cyber-offense, persuasion and autonomy, and has applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues in advance of Gemini’s deployment.

“This is a significant milestone in the development of AI, and the start of a new era for us at Google as we continue to rapidly innovate and responsibly advance the capabilities of our models,” wrote Hassabis.

“We’ve made great progress on Gemini so far and we’re working hard to further extend its capabilities for future versions, including advances in planning and memory, and increasing the context window for processing even more information to give better responses.

“We’re excited by the amazing possibilities of a world responsibly empowered by AI — a future of innovation that will enhance creativity, extend knowledge, advance science and transform the way billions of people live and work around the world.”
