Augmented Intelligence Newsletter (AiN) # 16: Retrieval Augmented Generation (RAG) Pattern in the context of Customer Care Use case


Welcome to Augmented Intelligence Newsletter (AiN) by C. Naseeb.

AiN Issue # 16

Thank you for reading my latest article on understanding the #RAG pattern (#retrievalaugmentedgeneration).

Hey, in this issue, I explain the #RAG pattern: what it is, illustrated with an insurance-policy example and a diagram of the flow.

In the previous issues, I wrote about the five pillars of #trustworthyAI: Transparency, Explainability, Fairness, Robustness, and Privacy.


RAG in the Context of Customer Care Use case


What is the RAG pattern?

Retrieval Augmented Generation (RAG) is a technique that retrieves data from outside a foundation model, e.g., from an organizational corpus or document database, and augments the prompt by injecting the relevant retrieved data into the context. RAG is more cost-effective and efficient than pre-training or fine-tuning foundation models. It is a bleeding-edge design pattern that enables LLMs to directly utilize specific data when responding to prompts.
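At its core, "injecting the retrieved data into the context" is just prompt construction. The minimal sketch below illustrates the idea; the function name, prompt wording, and example passage are illustrative assumptions, not any specific library's API:

```python
def augment_prompt(question: str, passages: list[str]) -> str:
    """Inject retrieved passages into the prompt as grounding context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved passage from an insurance document corpus.
prompt = augment_prompt(
    "What does my policy cover for dental care?",
    ["Policy A covers routine dental check-ups twice a year."],
)
```

The resulting string is what gets sent to the LLM, so the model answers from the retrieved passage rather than from its parametric knowledge alone.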


The RAG (Retrieval Augmented Generation) pattern combines Information Retrieval with Generative AI capability to provide answers instead of document matches. It can be combined with a Conversational AI front-end to provide a more engaging experience to the user. So, the conversational AI agent (for example, Watson Assistant) is integrated into the portal, where the user questions are handled by that agent. 

Retrieval Augmented Generation (RAG) was introduced by Meta AI researchers to address such knowledge-intensive tasks. RAG combines an information retrieval component with a text generator model. RAG can be fine-tuned, and its internal knowledge can be modified efficiently without retraining the entire model. The method takes an input and retrieves relevant documents from a source (e.g., an organizational database). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive to situations where facts evolve over time, which is very useful because an LLM's parametric knowledge is static. RAG allows language models to bypass retraining, enabling access to the latest information for generating reliable outputs via retrieval-based generation.
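The "facts evolve without retraining" point can be shown with a toy sketch. Here the retriever is a hypothetical keyword lookup over an in-memory dictionary standing in for a real document store; the generator is frozen, but updating the corpus changes the answer:

```python
# Toy corpus standing in for an organizational document database.
knowledge = {"deductible": "The 2023 deductible is 500 EUR."}

def retrieve(query: str) -> str:
    # Hypothetical keyword lookup standing in for a real retriever.
    return next(
        (text for key, text in knowledge.items() if key in query.lower()),
        "No matching document.",
    )

def answer(query: str) -> str:
    # The "generator" only sees the retrieved text; its own logic is frozen.
    return f"Based on our documents: {retrieve(query)}"

print(answer("What is my deductible?"))
# -> Based on our documents: The 2023 deductible is 500 EUR.

# Updating the corpus updates the answers; no retraining is needed.
knowledge["deductible"] = "The 2024 deductible is 450 EUR."
print(answer("What is my deductible?"))
# -> Based on our documents: The 2024 deductible is 450 EUR.
```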



@copyright: Chan Naseeb Figure 1: RAG Pattern for Customer Care



Example

1. A user is exploring health insurance policies on a company portal. The insurance company provides a search interface where users can ask questions and obtain answers from existing documents without involving a live agent. The user can pick from a set of commonly asked questions in the menu or ask any other question.

2. The user's question is sent to an API backend.

3. The API backend forwards the question to a document repository, such as Watson Discovery or any other searchable indexed database, to retrieve matching document chunks. The ranked chunks are then matched against the question to find the most appropriate chunk(s) containing the answer, using a similarity function over embeddings of the question and the document chunks stored in a vector DB.

4. The document chunk(s) containing the possible answer are sent to an LLM through an API call to 'generate' the answer text.

5. The generated answer is returned by the API backend and presented to the user as the final response.
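The ranking in step 3 can be sketched as follows. For a self-contained example, a word-overlap (Jaccard) score stands in for the embedding cosine similarity that a real vector DB would compute; all names and the sample chunks are illustrative:

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercased word set; a production system would use vector
    # embeddings from an embedding model instead of raw tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: str, b: str) -> float:
    # Jaccard word overlap standing in for embedding cosine similarity.
    wa, wb = tokens(a), tokens(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def top_chunk(question: str, chunks: list[str]) -> str:
    # Step 3: rank the retrieved chunks against the question.
    return max(chunks, key=lambda c: similarity(question, c))

chunks = [
    "Dental check-ups are covered twice per year under plan B.",
    "Claims must be filed within 90 days of treatment.",
]
best = top_chunk("Is dental care covered?", chunks)
# Step 4 would send `best` to the LLM as context to generate the answer.
```

Swapping the toy score for real embeddings changes only `similarity`; the ranking logic of the pipeline stays the same.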

Multimodal models like DALL-E, Midjourney, Flamingo, and GPT-4 generate outstanding images and text. The challenge with this approach is that the models store all their knowledge implicitly in the model parameters, which can cause increased training costs (e.g., more parameters and data) and hallucination (e.g., inaccurate generation due to lack of knowledge). This suggests the need for a more efficient approach, which I will try to discuss in one of the next issues, so stay tuned.

General-purpose language models can be fine-tuned to achieve several common tasks, such as sentiment analysis and named entity recognition. These tasks generally don't need further background knowledge. Building a language model-based system that accesses external knowledge sources to complete tasks is feasible for more complex and knowledge-intensive tasks. This enables more factual consistency, improves the generated responses' reliability, and helps mitigate the problem of "hallucination."


In the next one, I'll talk about #PromptEngineering.

Subscribe and view previous issues here

Subscribe to this newsletter or click 'Follow' to read my future articles. 

Enjoy the newsletter! Help us make it better by sharing it with your network.


Have a nice day! See you soon. - Chan




