Google says Gemini, launching today inside the Bard chatbot, is its “most capable” AI model ever. It was trained on video, images, and audio as well as text.
INCREASING TALK OF artificial intelligence developing with potentially dangerous speed is hardly slowing things down. A year after OpenAI launched ChatGPT and triggered a new race to develop AI technology, Google today revealed an AI project intended to reestablish the search giant as the world leader in AI.
Gemini, a new type of AI model that can work with text, images, and video, could be the most important algorithm in Google’s history after PageRank, which vaulted the search engine into the public psyche and created a corporate giant.
An initial version of Gemini starts to roll out today inside Google’s chatbot Bard for the English language setting. It will be available in more than 170 countries and territories. Google says Gemini will be made available to developers through Google Cloud’s API from December 13. A more compact version of the model will from today power suggested messaging replies from the keyboard of Pixel 8 smartphones. Gemini will be introduced into other Google products including generative search, ads, and Chrome in “coming months,” the company says. The most powerful Gemini version of all will debut in 2024, pending “extensive trust and safety checks,” Google says.
“It’s a big moment for us,” Demis Hassabis, CEO of Google DeepMind, told WIRED ahead of today’s announcement. “We’re really excited by its performance, and we’re also excited to see what people are going to do building on top of that.”
Google describes Gemini as “natively multimodal,” because it was trained on images, video, and audio rather than on text alone, as is the case for the large language models at the heart of the recent generative AI boom. “It’s our largest and most capable model; it’s also our most general,” Eli Collins, vice president of product for Google DeepMind, said at a press briefing announcing Gemini.
Google says there are three versions of Gemini: Ultra, the largest and most capable; Nano, which is significantly smaller and more efficient; and Pro, of medium size and middling capabilities.
From today, Google’s Bard, a chatbot similar to ChatGPT, will be powered by Gemini Pro, a change the company says will make it capable of more advanced reasoning and planning. Also today, a specialized version of Gemini Pro is being folded into a new version of AlphaCode, a generative coding tool from Google DeepMind that is billed as a “research product.” The most powerful version of Gemini, Ultra, will be put inside Bard and made available through a cloud API in 2024.
Sissy Hsiao, vice president at Google and general manager for Bard, says the model’s multimodal capabilities have given Bard new skills and made it better at tasks such as summarizing content, brainstorming, writing, and planning. “These are the biggest single quality improvements of Bard since we’ve launched,” Hsiao says.
Google showed several demos illustrating Gemini’s ability to handle problems involving visual information. One saw the AI model respond to a video in which someone drew images, created simple puzzles, and asked for game ideas involving a map of the world. Two Google researchers also showed how Gemini can help with scientific research by answering questions about a research paper featuring graphs and equations.
Collins says that Gemini Pro, the model being rolled out this week, outscored the earlier model that initially powered ChatGPT, called GPT-3.5, on six out of eight commonly used benchmarks for testing the smarts of AI software.
Google says Gemini Ultra, the model that will debut next year, scores 90 percent, higher than any other model including GPT-4, on the Massive Multitask Language Understanding (MMLU) benchmark, developed by academic researchers to test language models on questions on topics including math, US history, and law.
“Gemini is state-of-the-art across a wide range of benchmarks—30 out of 32 of the widely used ones in the machine-learning research community,” Collins said. “And so we do see it setting frontiers across the board.”
OpenAI’s GPT-4, which currently powers the most capable version of ChatGPT, blew people’s socks off when it debuted in March of this year. It also prompted some researchers to revise their expectations of when AI would rival the breadth of human intelligence. OpenAI has described GPT-4 as multimodal and in September upgraded ChatGPT to process images and audio, but it has not said whether the core GPT-4 model was trained directly on more than just text. ChatGPT can also generate images with help from another OpenAI model called DALL-E 3.
Google today released a technical report that provides some details of Gemini’s inner workings. It does not disclose the specifics of the architecture, size of the AI model, or the collection of data used to train it.
The lengthy and expensive process of training large AI models on powerful computer chips means that Gemini likely cost hundreds of millions of dollars, AI experts say. Google is expected to have developed a novel design for the model and a new mix of training data. The company has accelerated the release of its AI technology and poured resources into several new AI efforts in an attempt to drown out the noise around OpenAI’s ChatGPT and reestablish itself as the world’s leading AI company.
“We’re in a kind of tit-for-tat arms race,” says Oren Etzioni, a professor emeritus at the University of Washington and former CEO of the Allen Institute for AI. “There’s no reason to disbelieve that Gemini does better than GPT-4 on these benchmarks, but the next version, GPT-5, will do better than that.”
Etzioni says giant models like Gemini are thought to cost hundreds of millions of dollars to build, but the ultimate prize could be billions or even trillions in revenue for the company that dominates in supplying AI through the cloud. “This is a take-no-prisoners, must-win war,” he says.
Google invented some key techniques at work in ChatGPT but was slow to release its own chatbot technology before OpenAI launched ChatGPT roughly a year ago, in part out of concern that such a chatbot could say unsavory or even dangerous things. The company says it has done its most comprehensive safety testing to date with Gemini, because of the model’s more general capabilities.
Gemini was tested using a data set of toxic model prompts developed by the Allen Institute for AI. Collins says the company is collaborating with external researchers to further “red-team” the model, pushing it to misbehave in order to uncover its weak points. Without providing specifics, Collins said Gemini’s greater power requires Google to “up the bar on the sort of quality and safety checking that we have to do.”
A lot is riding on the new algorithm for Google and its parent company Alphabet, which built up formidable AI research capabilities over the past decade. With millions of developers building on top of OpenAI’s algorithms, and Microsoft using the technology to add new features to its operating systems and productivity software, Google has been compelled to rethink its focus as never before.
The search company first announced that it was working on Gemini at its I/O conference in May, as the company scrambled to add generative AI to search to head off the popularity of ChatGPT and the threat that OpenAI’s technology might power up Microsoft’s Bing search engine. Google’s estimated share of the global search market still exceeds 90 percent, but the Gemini launch appears to show the company continuing to ramp up its response to ChatGPT.
Google DeepMind, the division that led development of Gemini, was created as part of that response by merging Google’s main AI research group, Google Brain, with its London-based AI unit, DeepMind, in April. But the Gemini project has drawn on researchers and engineers from across Google over the past few months. It made use of a recently upgraded version of Google’s custom silicon chips for training AI models, known as Tensor Processing Units (TPUs).