At the Google I/O developer conference in May 2023, CEO Sundar Pichai announced the company’s next artificial intelligence (AI) system, Gemini.
The large language model (LLM) is being developed by the Google DeepMind division (Brain Team + DeepMind). It could compete with and perhaps surpass AI systems like OpenAI’s ChatGPT.
While details remain scarce, here’s what we can piece together from the latest interviews and reports on Google Gemini.
Google Gemini will be multimodal
Pichai said Gemini combines the strengths of DeepMind’s AlphaGo system, known for mastering the complex game Go, with broad language modeling capabilities.
He said it is designed from the ground up to be multimodal, integrating text, images and other types of data. This may allow for more natural conversational skills.
Pichai also hinted at future abilities such as memory and planning that could enable activities requiring reasoning.
Gemini can use tools and APIs
In an update to his professional bio over the summer, Google chief scientist Jeffrey Dean said Gemini is one of the “next-generation multimodal models” he is co-leading.
The bio said Gemini will use Pathways, Google’s new AI infrastructure, to scale up training across different datasets.
This suggests that Gemini is potentially the largest language model created to date, likely exceeding the size of GPT-3, which has 175 billion parameters.
It will come in various sizes and capabilities
Further details came from Demis Hassabis, CEO of DeepMind.
In June, he told Wired that AlphaGo’s techniques, such as reinforcement learning and tree search, could give Gemini new skills such as reasoning and problem solving.
Hassabis said Gemini is a “series of models” that will be made available in different sizes and capabilities.
He also mentioned that Gemini could use memory, fact-checking against sources like Google Search, and better reinforcement learning to improve accuracy and reduce dangerous hallucinated content.
The first results of Gemini are promising
In a September interview with Time, Hassabis reiterated that Gemini aims to combine scalability and innovation.
He said the integration of planning and memory is in the early exploratory stages.
Hassabis also said that Gemini could use retrieval methods to output entire chunks of information, rather than generating text word by word, to improve factual coherence.
He revealed that Gemini builds on DeepMind’s multimodal work such as the Flamingo image captioning system.
Overall, Hassabis said Gemini is showing “very promising early results.”
Advanced chatbots as universal personal assistants
In an interview with Wired, published a few days later, Pichai gave the most unequivocal indication of how Gemini fits into Google’s product roadmap.
He said conversational AI systems like Bard are not “the end state” but benchmarks leading toward more advanced chatbots.
Pichai said Gemini and future iterations will eventually become “incredible universal personal assistants” integrated into people’s daily lives in areas such as travel, work and entertainment.
He reiterated that Gemini will combine the strengths of text and images, saying that today’s chatbots will “seem trivial” by comparison within a few years.
Competitors are interested in Gemini’s performance
OpenAI’s CEO tweeted what appeared to be a response to a paywalled article reporting that Google Gemini could outperform GPT-4.
Are the numbers wrong?
—Elon Musk (@elonmusk) August 30, 2023
There was no official response to Elon Musk’s follow-up question about whether the numbers provided by SemiAnalysis were correct.
Select companies have early access to Gemini
More clues about Gemini’s progress emerged this week: The Information reported that Google has given a small group of outside developers early access to Gemini.
This suggests that Gemini may soon be ready for a beta release and integration into services like Google Cloud Vertex AI.
Meta working on LLM to compete with OpenAI
While the news on Gemini is promising so far, Google isn’t the only company poised to launch a new LLM to compete with OpenAI.
According to the Wall Street Journal, Meta is also working on an AI model that could compete with the GPT model that powers ChatGPT.
Meta recently announced the release of Llama 2, an open source AI model, in collaboration with Microsoft. The company seems committed to responsibly creating more accessible AI.
The countdown to Google Gemini
What we know so far indicates that Gemini could represent a significant advance in natural language processing.
Merging DeepMind’s latest AI research with Google’s vast computational resources makes the potential impact difficult to overestimate.
If Gemini lives up to expectations, it could drive a shift in interactive AI, aligning with Google’s ambitions to “bring AI responsibly to billions of people.”
The latest news from Meta and Google comes just days after the first AI Insight Forum, where tech CEOs met privately with a group of US senators to discuss the future of artificial intelligence.