What is Google Gemini (formerly Bard)?
Google Gemini -- formerly known as Bard -- is an artificial intelligence (AI) chatbot tool designed by Google to simulate human conversations using natural language processing (NLP) and machine learning. In addition to supplementing Google Search, Gemini can be integrated into websites, messaging platforms or applications to provide realistic, natural language responses to user questions.
Google Gemini is a family of multimodal AI large language models (LLMs) that have capabilities in language, audio, code and video understanding.
Gemini 1.0 was announced on Dec. 6, 2023, and built by Alphabet's Google DeepMind business unit, which is focused on advanced AI research and development. Google co-founder Sergey Brin is credited with helping to develop the Gemini LLMs, alongside other Google staff.
At its release, Gemini was the most advanced set of LLMs at Google, powering Bard before Bard's renaming and superseding the company's Pathways Language Model 2 (PaLM 2). As was the case with PaLM 2, Gemini was integrated into multiple Google technologies to provide generative AI capabilities.
Gemini integrates NLP capabilities, which provide the ability to understand and process language. It uses these capabilities to interpret both input queries and data. It's able to understand and recognize images, enabling it to parse complex visuals, such as charts and figures, without the need for external optical character recognition (OCR). It also has broad multilingual capabilities for translation tasks and functionality across different languages.
Unlike prior AI models from Google, Gemini is natively multimodal, meaning it's trained end to end on data sets spanning multiple data types. As a multimodal model, Gemini enables cross-modal reasoning abilities. That means Gemini can reason across a sequence of different input data types, including audio, images and text. For example, Gemini can understand handwritten notes, graphs and diagrams to solve complex problems. The Gemini architecture supports directly ingesting text, images, audio waveforms and video frames as interleaved sequences.
How does Google Gemini work?
Google Gemini is first trained on a massive corpus of data. After training, the model uses several neural network techniques to understand content, answer questions, generate text and produce outputs.
Specifically, the Gemini LLMs use a transformer model-based neural network architecture. The Gemini architecture has been enhanced to process lengthy contextual sequences across different data types, including text, audio and video. Google DeepMind makes use of efficient attention mechanisms in the transformer decoder to help the models process long contexts, spanning different modalities.
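To make the idea of attention in a transformer decoder more concrete, the following is a minimal textbook sketch of scaled dot-product attention with a causal mask, written in plain Python. It is not Gemini's implementation, which Google has not published in code form and which reportedly relies on more efficient attention variants; the function names and the toy input are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention with a causal (decoder-style) mask.

    Each argument is a list of float vectors, one per sequence position.
    Position i may only attend to positions 0..i.
    """
    dim = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score the query against every key it is allowed to see.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out = [
            sum(w * values[j][d] for j, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy sequence: three positions with 2-dimensional embeddings (made up).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = causal_attention(x, x, x)
for position, vector in enumerate(result):
    print(position, [round(v, 3) for v in vector])
```

Because every position compares itself with all earlier positions, the cost of this basic form grows quadratically with sequence length, which is why long-context models need more efficient attention mechanisms.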
Gemini models have been trained on diverse multimodal and multilingual data sets of text, images, audio and video with Google DeepMind using advanced data filtering to optimize training. As different Gemini models are deployed in support of specific Google services, there's a process of targeted fine-tuning that can be used to further optimize a model for a use case. During both the training and inference phases, Gemini benefits from the use of Google's latest tensor processing unit chips, TPU v5, which are optimized custom AI accelerators designed to efficiently train and deploy large models.
A key challenge for LLMs is the risk of bias and potentially toxic content. According to Google, Gemini underwent extensive safety testing and mitigation around risks such as bias and toxicity to help provide a degree of LLM safety. To help further ensure Gemini works as it should, the models were tested against academic benchmarks spanning language, image, audio, video and code domains. Google has assured the public it adheres to a list of AI principles.
At launch on Dec. 6, 2023, Google announced Gemini as a series of model sizes, each designed for a specific set of use cases and deployment environments. The Ultra model is the top end and is designed for highly complex tasks. The Pro model is designed for performance and deployment at scale. As of Dec. 13, 2023, Google enabled access to Gemini Pro in Google Cloud Vertex AI and Google AI Studio. For code, a version of Gemini Pro powers the Google AlphaCode 2 generative AI coding technology.
The Nano model is targeted at on-device use cases. There are two different versions of Gemini Nano: Nano-1 is a 1.8 billion-parameter model, while Nano-2 is a 3.25 billion-parameter model. Among the places where Nano is being embedded is the Google Pixel 8 Pro smartphone.
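A rough back-of-the-envelope calculation shows why parameter counts of this size suit on-device use. The sketch below estimates the storage needed for the weights alone at several numeric precisions. The parameter counts come from the text above; the bit widths are assumptions chosen for illustration, and the figures ignore activations, caches and runtime overhead.

```python
# Parameter counts as stated in the article.
NANO_PARAMS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

# Assumed precisions for illustration only; the article does not state
# which precision the shipped models use.
BIT_WIDTHS = (16, 8, 4)


def weight_size_gb(param_count, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return param_count * bits_per_weight / 8 / 1e9


for name, params in NANO_PARAMS.items():
    for bits in BIT_WIDTHS:
        size = weight_size_gb(params, bits)
        print(f"{name} at {bits}-bit weights: about {size:.1f} GB")
```

Under these assumptions, halving the bits per weight halves the storage, which is the main reason smaller, lower-precision models are practical on a phone.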
When was Google Bard first released?
Google initially announced Bard, its AI-powered chatbot, on Feb. 6, 2023, with a vague release date. It opened access to Bard on March 21, 2023, inviting users to join a waitlist. On May 10, 2023, Google removed the waitlist and made Bard available in more than 180 countries and territories. Almost precisely a year after its initial announcement, Bard was renamed Gemini.
Many believed that Google felt the pressure of ChatGPT's success and positive press, leading the company to rush Bard out before it was ready. For example, during a live demo by Google and Alphabet CEO Sundar Pichai, Bard responded to a query with a wrong answer.
In the demo, a user asked Bard the question: "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" In Bard's response, it mentioned that the telescope "took the very first pictures of a planet outside of our own solar system." Astronomers quickly took to social media to point out that the first image of an exoplanet was taken by an earthbound observatory in 2004, making Bard's answer incorrect. The next day, Google lost $100 billion in market value -- a decline attributed to the embarrassing mistake.
Why did Google rename Bard to Gemini and when did it happen?
Bard was renamed Gemini on Feb. 8, 2024. Gemini was already the LLM powering Bard. Some believe rebranding the platform as Gemini might have been done to draw attention away from the Bard moniker and the criticism the chatbot faced when it was first released. It also simplified Google's AI effort and focused on the success of the Gemini LLM.
The name change also made sense from a marketing perspective, as Google aims to expand its AI services. It's a way for Google to increase awareness of its advanced LLM offering as AI democratization and advancements show no signs of slowing.
Who can use Google Gemini?
Gemini is widely available around the world. Gemini Pro is available in more than 230 countries and territories, while Gemini Advanced is available in more than 150 countries at the time of this writing. However, there are age limits in place to comply with laws and regulations that exist to govern AI.
Users must be at least 18 years old and have a personal Google account. However, age restrictions vary for the Gemini web app. Users in Europe must be 18 or older. In other countries where the platform is available, the minimum age is 13 unless otherwise specified by local laws. Also, users younger than 18 can only use the Gemini web app in English.
Is Gemini free to use?
When Bard became available, Google gave no indication that it would charge for use. Google has no history of charging customers for services, excluding enterprise-level usage of Google Cloud. The assumption was that the chatbot would be integrated into Google's basic search engine, and therefore be free to use.
After rebranding Bard to Gemini on Feb. 8, 2024, Google introduced a paid tier in addition to the free web application. Pro and Nano are currently free to use by registration. However, users can only get access to Ultra through the Gemini Advanced option for $20 per month. Users sign up for Gemini Advanced through a Google One AI Premium subscription, which also includes Google Workspace features and 2 TB of storage.
What can you use Gemini for? Use cases and applications
The Google Gemini models are used in many different ways, including text, image, audio and video understanding. The multimodal nature of Gemini also enables these different types of input to be combined for generating output.
Use cases
Businesses can use Gemini to perform various tasks that include the following:
- Text summarization. Gemini models can summarize content from different types of data.
- Text generation. Gemini can generate text based on user prompts. That text can also be driven by a Q&A-type chatbot interface.
- Text translation. The Gemini models have broad multilingual capabilities, enabling translation and understanding of more than 100 languages.
- Image understanding. Gemini can parse complex visuals, such as charts, figures and diagrams, without external OCR tools. It can be used for image captioning and visual Q&A capabilities.
- Audio processing. Gemini has support for speech recognition across more than 100 languages and audio translation tasks.
- Video understanding. Gemini can process and understand video clip frames to answer questions and generate descriptions.
- Multimodal reasoning. A key strength of Gemini is its use of multimodal AI reasoning, where different types of data can be mixed for a prompt to generate an output.
- Code analysis and generation. Gemini can understand, explain and generate code in popular programming languages, including Python, Java, C++ and Go.
Applications
Google developed Gemini as a foundation model to be widely integrated across various Google services. It's also available for developers to use in building their own applications. Applications that use Gemini include the following:
- AlphaCode 2. Google DeepMind's AlphaCode 2 code generation tool makes use of a customized version of Gemini Pro.
- Google Pixel. The Google-built Pixel 8 Pro smartphone is the first device engineered to run Gemini Nano. Gemini powers new features in existing Google apps, such as summarization in Recorder and Smart Reply in Gboard for messaging apps.
- Android 14. The Pixel 8 Pro is the first Android smartphone to benefit from Gemini. Android developers can build with Gemini Nano through the AICore system capability.
- Vertex AI. Google Cloud's Vertex AI service, which provides foundation models that developers can use to build applications, also provides access to Gemini Pro.
- Google AI Studio. Developers can build prototypes and apps with Gemini using the Google AI Studio web-based tool.
- Search. Google is experimenting with using Gemini in its Search Generative Experience to reduce latency and improve quality.
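For developers, a multimodal prompt is typically sent as a single request whose content is a list of parts. The sketch below only builds such a request body and does not call any service. The field names (`contents`, `parts`, `text`, `inline_data`, `mime_type`, `data`) reflect the Gemini API's REST format as understood at the time of writing and should be treated as assumptions to verify against the current API reference; the `build_request` helper and the placeholder image bytes are hypothetical.

```python
import base64
import json


def build_request(parts):
    """Build a generateContent-style JSON body from (kind, value) pairs.

    kind is "text" (value: str) or "image" (value: (mime_type, raw_bytes)).
    This helper is illustrative and is not part of any Google SDK.
    """
    body_parts = []
    for kind, value in parts:
        if kind == "text":
            body_parts.append({"text": value})
        elif kind == "image":
            mime_type, raw = value
            body_parts.append({
                "inline_data": {
                    "mime_type": mime_type,
                    # Binary data is base64-encoded so it can travel in JSON.
                    "data": base64.b64encode(raw).decode("ascii"),
                }
            })
        else:
            raise ValueError(f"unsupported part kind: {kind}")
    return {"contents": [{"parts": body_parts}]}


# Placeholder bytes stand in for a real chart image.
fake_png = b"\x89PNG placeholder"
request_body = build_request([
    ("text", "Summarize the trend shown in this chart."),
    ("image", ("image/png", fake_png)),
])
print(json.dumps(request_body, indent=2))
```

Keeping text and image parts in one ordered list mirrors the way a multimodal model receives mixed inputs as a single prompt.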
What are Gemini's limitations?
A few limitations might cause hesitation among potential end users. These include the following:
- Training data. Like all AI chatbots, Gemini must learn to give correct answers. To do this, the models must be trained on accurate information. However, they also must be able to identify incorrect or misleading information when it comes their way.
- Bias and potential harm. AI training is an endless, compute-intensive process because there's always new information to learn. Across all Gemini models, Google has claimed it has followed responsible development practices, including extensive evaluation to help limit the risk of bias and potential harm.
- Originality and creativity. There are limits on how original and creative the content Gemini produces can be. This is particularly the case with the free version, which has had trouble processing complicated prompts, with multiple steps and nuances, and producing adequate output. The free version is based on the Gemini Pro LLM, which is more limited in capabilities; the paid versions of the platform offer access to more advanced features.
What are the concerns about Gemini?
One concern about Gemini revolves around its potential to present biased or false information to users. Any bias inherent in the training data fed to Gemini could lead to wariness among users. For example, as is the case with all advanced AI software, training data that excludes certain groups within a given population will lead to skewed outputs.
The propensity of Gemini to generate hallucinations and other fabrications and pass them along to users as truthful is also a cause for concern. This has been one of the biggest risks with ChatGPT responses since its inception, as it is with other advanced AI tools. In addition, since Gemini doesn't always understand context, its responses might not always be relevant to the prompts and queries users provide.
What languages is Gemini available in?
Gemini can be used in more than 45 languages. It can translate text-based inputs into different languages with almost humanlike accuracy. Google plans to expand Gemini's language understanding capabilities and make it ubiquitous. However, there are important factors to consider, such as bans on LLM-generated content or ongoing regulatory efforts in various countries that could limit or prevent future use of Gemini.
Gemini offers other functionality across different languages in addition to translation. For example, it's capable of mathematical reasoning and summarization in multiple languages. It can also generate captions for an image in different languages.
Is image generation available in Gemini?
Upon Gemini's release, Google touted its ability to generate images the same way as other generative AI tools, such as DALL-E, Midjourney and Stable Diffusion. Gemini currently uses Google's Imagen 2 text-to-image model, which gives the tool image generation capabilities.
However, in late February 2024, Gemini's image generation feature was halted to undergo retooling after generated images were shown to depict factual inaccuracies. Google intends to improve the feature so that Gemini can remain multimodal in the long run.
Prior to Google pausing access to the image creation feature, Gemini's outputs ranged from simple to complex, depending on end-user inputs. Users could provide descriptive prompts to elicit specific images. A simple step-by-step process was required for a user to enter a prompt, view the image Gemini generated, edit it and save it for later use.
Gemini vs. GPT-3 and GPT-4
Google Gemini is a direct competitor to the GPT-3 and GPT-4 models from OpenAI. The following table compares some key features of Google Gemini and OpenAI products.
Feature | Gemini | GPT-3 and GPT-4 |
Developer | Google DeepMind | OpenAI |
Chatbot interface | Gemini; formerly Bard | ChatGPT |
Modality | Multimodal; trained on text, images, audio and video | Originally built as a text-only language model; GPT-4 is multimodal |
Model variations | Size-based variations, including Ultra, Pro and Nano | Optimizations for size, including GPT-3.5 Turbo and GPT-4 Turbo |
Context window length | 32,000 tokens | 32,000 tokens |
Google Gemini vs. ChatGPT
Both Gemini and ChatGPT are AI chatbots designed for interaction with people through NLP and machine learning. Both use an underlying LLM for generating and creating conversational text.
ChatGPT uses generative AI to produce original content. For example, users can ask it to write a thesis on the advantages of AI. Gemini uses generative AI as well. Both are geared to make search more natural and helpful as well as synthesize new information in their answers.
In January 2023, Microsoft signed a deal reportedly worth $10 billion with OpenAI to license and incorporate ChatGPT into its Bing search engine to provide more conversational search results, similar to Google Bard at the time. That opened the door for other search engines to license ChatGPT, whereas Gemini supports only Google.
Another similarity between the two chatbots is their potential to generate plagiarized content and their limited ability to control this issue. Neither Gemini nor ChatGPT has built-in plagiarism detection features that users can rely on to verify that outputs are original. However, separate tools exist to detect plagiarism in AI-generated content, so users have other options. Gemini is able to cite other content in its responses and link to sources. Gemini's double-check function provides URLs to the sources of information it draws from to generate content based on a prompt.
Alternatives to Google Gemini
Gemini didn't spring up in a vacuum. AI chatbots have been around for a while, in less versatile forms. Multiple startup companies have similar chatbot technologies, but without the spotlight ChatGPT has received.
Examples of Gemini chatbot competitors that generate original text or code, as mentioned by Audrey Chee-Read, principal analyst at Forrester Research, as well as by other industry experts, include the following.
Chatsonic
Marketed as a "ChatGPT alternative with superpowers," Chatsonic is an AI chatbot powered by Google Search with an AI-based text generator, Writesonic, that lets users discuss topics in real time to create text or images.
Claude
Anthropic's Claude is an AI-driven chatbot named after the underlying LLM powering it. It has undergone rigorous testing to ensure it's adhering to ethical AI standards and not producing offensive or factually inaccurate output.
Copy.ai
Copy.ai was originally built to aid sales and marketing teams. It generates original text, such as social media posts, blogs, emails and other types of content, and it also automates workflow tasks.
GitHub Copilot
GitHub Copilot specializes in code generation for developers. The aim is to simplify the otherwise tedious software development tasks involved in producing modern software. While it isn't meant for text generation, it serves as a viable alternative to ChatGPT or Gemini for code generation.
Jasper Chat
Jasper.ai's Jasper Chat is a conversational AI tool that's focused on generating text. It's aimed at companies looking to create brand-relevant content and have conversations with customers. It enables content creators to specify search engine optimization keywords and tone of voice in their prompts.
Microsoft Bing
Microsoft and its partnership with OpenAI offer exactly what Google does with Gemini: AI-powered search that recognizes natural language queries and gives natural language responses. When a user makes a search query, they receive the standard Bing search results and an answer generated by GPT-4, as well as the ability to interact with the AI regarding its response.
SpinBot
This generative AI tool specializes in original text generation as well as rewriting content and avoiding plagiarism. It handles other simple tasks to aid professionals in writing assignments, such as proofreading.
YouChat
YouChat is the AI chatbot from You.com, a search engine company based in California. YouChat answers questions and provides the citations for its answers so that users can review the sources and fact-check its responses.
Gemini's history and future
Gemini, under its original Bard name, was initially designed around search. It aimed to provide for more natural language queries, rather than keywords, for search. Its AI was trained around natural-sounding conversational queries and responses. Instead of giving a list of answers, it provided context to the responses. Bard was designed to help with follow-up questions -- something new to search. It also had a share-conversation function and a double-check function that helped users fact-check generated results.
Bard also integrated with several Google apps and services, including YouTube, Maps, Hotels, Flights, Gmail, Docs and Drive, enabling users to apply the AI tool to their personal content.
The first version of Bard used a lightweight version of LaMDA that required less computing power to scale to more concurrent users. The incorporation of the PaLM 2 language model enabled Bard to be more visual in its responses to user queries. Bard also incorporated Google Lens, letting users upload images in addition to written prompts. The later incorporation of the Gemini language model enabled more advanced reasoning, planning and understanding.
Then, as part of the initial launch of Gemini on Dec. 6, 2023, Google provided direction on the future of its next-generation LLMs. While Google announced Gemini Ultra, Pro and Nano that day, it did not make Ultra available at the same time as Pro and Nano. Initially, Ultra was only available to select customers, developers, partners and experts; it was fully released in February 2024.
The future of Gemini is also about a broader rollout and integrations across the Google portfolio. Gemini will eventually be incorporated into the Google Chrome browser to improve the web experience for users. Google has also pledged to integrate Gemini into the Google Ads platform, providing new ways for advertisers to connect with and engage users. The Duet AI assistant is also set to benefit from Gemini in the future.
On Feb. 15, 2024, Google announced early testing of Gemini 1.5. This version is optimized for a range of tasks in which it performs similarly to Gemini 1.0 Ultra, but with an added experimental feature focused on long-context understanding. According to Google, early tests show Gemini 1.5 Pro outperforming 1.0 Pro on about 87% of Google's benchmarks established for developing LLMs. Ongoing testing is expected until a full rollout of 1.5 Pro is announced.
Recent updates to Google Gemini
In May 2024, Google announced further advancements to Gemini 1.5 Pro at the Google I/O conference. Upgrades include performance improvements in translation, coding and reasoning features. The upgraded Gemini 1.5 Pro also has improved image and video understanding, including the ability to directly process voice inputs using native audio understanding. The model's context window was increased to 1 million tokens, enabling it to remember much more information when responding to prompts.
Also released in May was Gemini 1.5 Flash, a smaller model with a sub-second average first-token latency and a 1 million token context window.
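To give a sense of what a larger context window means in practice, the sketch below checks whether a document would fit in a 32,000-token window versus a 1 million-token window. The ratio of four characters per token is a common rough heuristic for English text and is an assumption here, not Gemini's actual tokenizer; the reserved output budget and the sample document size are also illustrative.

```python
import math

# Window sizes mentioned in this article.
CONTEXT_WINDOWS = {"32,000 tokens": 32_000, "1 million tokens": 1_000_000}

# Rough heuristic (assumption): about 4 characters per token in English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text):
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits(text, window, reserved_for_output=1_000):
    """Return True if the prompt plus an output budget fits in the window."""
    return estimate_tokens(text) + reserved_for_output <= window


# A stand-in for a long report of 600,000 characters.
document = "x" * 600_000
print("Estimated tokens:", estimate_tokens(document))
for label, window in CONTEXT_WINDOWS.items():
    print(label, "->", "fits" if fits(document, window) else "too long")
```

A real application would count tokens with the model's own tokenizer rather than a character-based estimate.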
In addition to the core model upgrades, Google announced new features to the Gemini API in May, including the following:
- Video frame extraction. Users can upload a video to generate content.
- Parallel function calling. Users can engage in more than one function call at a time.
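Parallel function calling can be pictured as the model asking for several tools in a single turn, with the application running them at the same time and returning all results together. The sketch below simulates only the application side. The tool functions, the shape of `requested_calls` and the `run_calls` helper are hypothetical stand-ins rather than the Gemini API's actual objects.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical local tools an application might expose to a model.
def get_weather(city):
    return {"city": city, "forecast": "sunny"}


def get_time(city):
    return {"city": city, "time": "09:00"}


TOOLS = {"get_weather": get_weather, "get_time": get_time}

# Assumed shape of a single model turn that requests two calls at once.
requested_calls = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_time", "args": {"city": "Tokyo"}},
]


def run_calls(calls):
    """Execute every requested call concurrently, preserving request order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["args"]) for c in calls]
        return [
            {"name": c["name"], "response": f.result()}
            for c, f in zip(calls, futures)
        ]


results = run_calls(requested_calls)
for item in results:
    print(item)
```

Running independent calls side by side reduces the number of round trips between the application and the model compared with handling one call per turn.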
The vendor plans to add context caching -- to ensure users only have to send parts of a prompt to a model once -- in June.
Previews of both Gemini 1.5 Pro and Gemini 1.5 Flash are available in over 200 countries and territories. These models will be generally available in June 2024.