Building a RAG System using Gemini API
Welcome to the first episode of AI Weekly with Krithi! In this series, we’ll explore various AI topics, tools, and techniques through practical demos.
Today, we’ll dive into building a Retrieval-Augmented Generation (RAG) system using the Gemini API.
We've all started using generative AI for everything, from writing emails to most of the LinkedIn posts I see here, haha. It has made our lives easier.
But here's a funny thought: with all our prompting, generative AI keeps learning, while our own learning quietly vanishes. Let's learn to avoid that. It's why Krithi stays a procrastinator when it comes to writing: even when I get busy with talks, I never ask, "Hey cool LLM, generate my post," because I don't want to claim computer-generated text as my own. So let's motivate ourselves, politely, to keep doing certain tasks on our own.
It's important to do things ourselves because that's how we grow and learn. When we rely too much on AI, we miss the opportunity to develop our own skills and creativity; working through a task ourselves helps us understand the process and sharpen our abilities.
I remember my professor saying, “Don’t go to GPT and ask, ‘Glen gave me one assignment, do it.’”
Even though generative AI has made our lives easier, I'd like to talk a bit about how it works. The LLMs we use today generate responses based only on the data they were trained on; their knowledge is frozen at training time.
LLMs
LLMs, or Large Language Models, are a type of AI that can understand and generate human-like text. They are the language specialists of generative AI: where generative AI broadly can create images, music, and more, LLMs focus on processing and producing language. They work by analyzing vast amounts of text data and learning patterns, grammar, and context, which lets them generate coherent and contextually relevant responses.
And as you've probably noticed, we don't get real-time updates from LLMs. Here's the hero that solves this problem: RAG.
RAG
RAG, or Retrieval-Augmented Generation, is a technique that combines the power of retrieval-based methods with generative models. It works by first retrieving relevant information from a large corpus of data and then using that information to generate more accurate and contextually relevant responses. This approach helps in providing up-to-date information and improves the overall quality of the generated content.
One of the main problems RAG solves is the limitation of LLMs in accessing real-time information. Traditional LLMs rely solely on their training data, which can become outdated. RAG addresses this by incorporating a retrieval mechanism that fetches the latest information from external sources, ensuring that the generated responses are current and accurate.
Additionally, RAG helps in handling ambiguous queries more effectively. By retrieving multiple relevant documents, the system can generate responses that consider various perspectives, leading to more comprehensive and nuanced answers.
To sum up RAG,
RAG enhances the capabilities of LLMs by combining retrieval and generation, providing real-time updates, and improving the quality of responses. It’s a powerful technique that addresses some of the key limitations of traditional generative models.
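To make that flow concrete, here's a toy sketch of the retrieve-then-generate loop in plain Python. The tiny corpus and the word-overlap retrieve function are illustrative stand-ins for real embeddings; the actual Gemini-based version is what we'll build below.

import re

# a toy "knowledge base" standing in for real documents
corpus = [
    "Our startup offers AI consulting and chatbot development.",
    "Office hours: Monday to Friday, 9am to 5pm.",
]

def tokenize(s):
    # lowercase word tokens with punctuation stripped
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(query, docs, k=1):
    # rank documents by how many words they share with the query
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def make_prompt(query, docs):
    # the retrieved context is pasted into the prompt the LLM will see
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(make_prompt("What services does the startup offer?", corpus))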
Come on, let's discuss our use case!
Let's say I'm working on an LLM RAG-powered chatbot for a startup. The base LLM can't provide information about the startup, so I've put that information in a PDF. Whenever clients use the chatbot on my website, it can draw on this PDF to answer their questions. This setup solves a specific use case for my startup.
How do LLMs and RAG help me here?
LLM (Large Language Model) generates human-like responses and understands natural language queries.
RAG (Retrieval-Augmented Generation) retrieves relevant information from external sources (like the PDF) and combines it with the LLM’s capabilities to provide accurate and contextually relevant responses.
This combination ensures that clients get precise information about my startup when they interact with the chatbot.
Come on, let's code it!
First step is to set up environment variables
But Why?
Environment variables keep your API keys secure and your code adaptable. Just swap in your unique key to maintain safety and flexibility!
By using environment variables, you can change configurations without modifying your code, which is especially useful when deploying applications across different environments. Plus, it makes your code more portable and easier to share with others, as they can set their own environment variables without needing to tweak the code.
import os
os.environ["GEMINI_API_KEY"] = "Your_Gemini_API_Key"
Don’t comment on this post saying the code doesn’t work. Just remember to replace "Your_Gemini_API_Key" with your unique Gemini API key. 😉
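If you'd rather not hard-code the key in your script at all, one common alternative (this assumes you install the python-dotenv package and keep the key in a .env file, neither of which is required for the rest of this tutorial) is:

# the .env file contains a line like: GEMINI_API_KEY=your_actual_key
from dotenv import load_dotenv
load_dotenv()  # reads .env and populates os.environ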
Second step is to extract text from a PDF using pypdf
This step reads a PDF file and extracts the text from each page, combining it into a single string. Make sure to install the pypdf library and provide the correct file path for your PDF.
from pypdf import PdfReader

def load_pdf(file_path):
    # read the PDF and concatenate the text of every page
    reader = PdfReader(file_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None for empty pages
    return text

text = load_pdf(file_path="/content/startup.pdf")
Third step is to split the text into chunks
It’s useful for breaking down large texts into manageable pieces for better retrieval and generation in RAG models.
import re

def split_text(text: str):
    # split on blank lines (newline, space, newline) and drop empty chunks
    chunks = re.split('\n \n', text)
    return [chunk for chunk in chunks if chunk != ""]

# chunk the PDF text so each piece can be embedded and stored separately
chunked_text = split_text(text)
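This splitter assumes the PDF text contains blank-line separators. If yours doesn't, a fixed-size chunker with overlap is a common fallback; this variant is my own illustrative sketch, not part of the pipeline we use here:

def split_text_fixed(text: str, chunk_size: int = 1000, overlap: int = 200):
    # slide a window across the text so neighbouring chunks share some context
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks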
Fourth step is to build the Gemini embedding function
This function uses the Gemini API to generate embeddings for the documents. Embeddings are numerical representations of text that capture its semantic meaning. In our scenario, this function helps in creating embeddings for the extracted text.
import os
import google.generativeai as genai
from chromadb import Documents, EmbeddingFunction, Embeddings

class GeminiEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        gemini_api_key = os.getenv("GEMINI_API_KEY")
        if not gemini_api_key:
            raise ValueError("Please provide GEMINI_API_KEY as an environment variable")
        genai.configure(api_key=gemini_api_key)
        model = "models/embedding-001"
        title = "Custom query"
        return genai.embed_content(model=model,
                                   content=input,
                                   task_type="retrieval_document",
                                   title=title)["embedding"]
If you face any issues, make sure you have installed the required packages:
!pip install google-generativeai
!pip install chromadb
!pip install --upgrade typing_extensions
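A quick smoke test for the embedding function (assuming GEMINI_API_KEY is set as in the first step) is to embed a sample sentence and check the output shape; models/embedding-001 should produce 768-dimensional vectors:

embed_fn = GeminiEmbeddingFunction()
vectors = embed_fn(["Hello, RAG!"])
print(len(vectors), len(vectors[0]))  # expect: 1 768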
Fifth step is creating and loading the ChromaDB collection
ChromaDB is used to store the documents and their embeddings. In our scenario, we create a ChromaDB collection to store the extracted text and its embeddings.
import chromadb

def create_chroma_db(documents, path, name):
    # create a persistent ChromaDB collection and add each chunk with a string id
    chroma_client = chromadb.PersistentClient(path=path)
    db = chroma_client.create_collection(name=name, embedding_function=GeminiEmbeddingFunction())
    for i, d in enumerate(documents):
        db.add(documents=d, ids=str(i))
    return db, name

# pass the chunked text (not the raw string) so each chunk is stored as a document
db, name = create_chroma_db(documents=chunked_text, path=r"C:\Repos\RAG\contents", name="rag_experiment")
def load_chroma_collection(path, name):
    # reopen an existing persistent collection with the same embedding function
    chroma_client = chromadb.PersistentClient(path=path)
    db = chroma_client.get_collection(name=name, embedding_function=GeminiEmbeddingFunction())
    return db

db = load_chroma_collection(r"C:\Repos\RAG\contents", name="rag_experiment")
Sixth step is retrieving the relevant passage
This step involves retrieving the most relevant passage from the ChromaDB collection based on the query. In our scenario, this helps in fetching the relevant information about the startup.
def get_relevant_passage(query, db, n_results):
    # return the n most similar chunks for the query
    passage = db.query(query_texts=[query], n_results=n_results)['documents'][0]
    return passage

relevant_text = get_relevant_passage("transformers", db, 3)
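If you want to sanity-check what came back, you can print the retrieved chunks (get_relevant_passage returns a list of chunk strings, one per result):

for i, chunk in enumerate(relevant_text, start=1):
    print(f"--- Chunk {i} ---")
    print(chunk[:200])  # first 200 characters of each retrieved passage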
Seventh step is defining the prompt
This function generates a prompt to answer a question using a reference passage.
def make_rag_prompt(query, relevant_passage):
    # strip quotes and newlines so the passage drops cleanly into the template
    escaped = relevant_passage.replace("'", "").replace('"', "").replace("\n", " ")
    prompt = ("""You are a helpful and informative bot that answers questions using text from the reference passage included below. \
Be sure to respond in a complete sentence, being comprehensive, including all relevant background information. \
However, you are talking to a non-technical audience, so be sure to break down complicated concepts and \
strike a friendly and conversational tone. \
If the passage is irrelevant to the answer, you may ignore it.
QUESTION: '{query}'
PASSAGE: '{relevant_passage}'
ANSWER:
""").format(query=query, relevant_passage=escaped)
    return prompt
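To see what the model will actually receive, you can build one prompt from the chunks retrieved earlier and print the start of it:

sample_prompt = make_rag_prompt("transformers", "".join(relevant_text))
print(sample_prompt[:300])  # preview the instructions plus the injected passage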
Eighth step is to generate the response
This code retrieves relevant text chunks from a database, creates a prompt, and uses a generative AI model to generate a response. It is designed to answer user queries using the Gemini API.
import google.generativeai as genai

def generate_response(prompt):
    gemini_api_key = os.getenv("GEMINI_API_KEY")
    if not gemini_api_key:
        raise ValueError("Gemini API Key not provided. Please provide GEMINI_API_KEY as an environment variable")
    genai.configure(api_key=gemini_api_key)
    model = genai.GenerativeModel('gemini-pro')
    answer = model.generate_content(prompt)
    return answer.text

def generate_answer(db, query):
    # retrieve the top 3 relevant text chunks
    relevant_text = get_relevant_passage(query, db, n_results=3)
    # join the retrieved chunks into a single passage for the prompt
    prompt = make_rag_prompt(query, relevant_passage="".join(relevant_text))
    answer = generate_response(prompt)
    return answer

db = load_chroma_collection(path=r"C:\Repos\RAG\contents",  # replace with the path of your persistent directory
                            name="rag_experiment")          # replace with your collection name

answer = generate_answer(db, query="what are the services offered")
print(answer)
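And if you'd like to try it interactively, a small terminal loop works too; this is just a convenience wrapper around generate_answer, assuming everything above has already run:

while True:
    query = input("Ask about the startup (or type 'quit'): ")
    if query.strip().lower() == "quit":
        break
    print(generate_answer(db, query))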
We successfully built a Retrieval-Augmented Generation (RAG) system using the Gemini API to enhance the capabilities of Large Language Models (LLMs). The system retrieves relevant information from external sources and combines it with the LLM's capabilities to provide accurate and contextually relevant responses, ensuring that clients get precise information about the startup when they interact with the chatbot.
Thank you for joining us on the first episode of AI Weekly with Krithi!
View the code here
I hope you found it informative and engaging.
See you next week for more exciting AI topics and practical demos.
Have a great week ahead and stay tuned!
Cheers,
Kiruthika.