2.5 Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by providing additional context from relevant documents to improve their responses to specific queries. The RAG process involves retrieving pertinent information, incorporating it into an enriched prompt, and then prompting the LLM for a more informed answer. This technique is transforming applications and web search by allowing LLMs to function as reasoning engines rather than mere knowledge stores.


We've already seen that prompting a large language model (LLM) can take you quite far, but there's a technique called Retrieval-Augmented Generation (RAG) that can significantly expand what an LLM can do by giving it additional knowledge beyond what it may have learned from data on the Internet or other open sources.

If you ask a general-purpose chat system a question like "Is there parking for employees?" it might answer something like, "I need more specific information about your workplace," because it doesn't know the parking policy for your company. RAG, however, can give the LLM that additional information, so that if you ask whether there's parking, it can refer to policies specific to your company.

How does it work? RAG has three steps. Step one is to take a question, such as "Is there
parking for employees?" and first look through a collection of documents that may have the
answer. For example, if your company has documents on employee benefits, leave policy,
facilities, and payroll processes, the RAG system would first identify which of these
documents is most relevant. Parking seems like a question about facilities, so hopefully, the
system would select the facilities document as the most relevant.
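
As a rough illustration of this retrieval step, here is a minimal Python sketch that picks the most relevant document by simple word overlap. The document names and contents are made up, and real systems typically rank documents with embedding-based similarity rather than word counting, so treat this purely as a stand-in.

```python
import re

# Toy document collection (invented for illustration).
documents = {
    "employee_benefits": "Health insurance, dental coverage, and retirement plans.",
    "leave_policy": "Vacation days, sick leave, and parental leave rules.",
    "facilities": "Office locations. Parking: all employees may park on levels one and two.",
    "payroll": "Pay schedule, direct deposit, and tax withholding.",
}

def words(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve_most_relevant(question: str, docs: dict[str, str]) -> tuple[str, str]:
    """Return the (name, text) of the document sharing the most words with the question.
    A stand-in for the embedding-based similarity search a real RAG system would use."""
    q = words(question)
    name = max(docs, key=lambda n: len(q & words(docs[n])))
    return name, docs[name]

name, text = retrieve_most_relevant("Is there parking for employees?", documents)
print(name)  # expected: "facilities"
```

Here the facilities document wins simply because it shares the words "parking" and "employees" with the question; an embedding-based retriever would make the same choice for semantic rather than purely lexical reasons.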

The second step is to incorporate the retrieved document or text into an updated prompt. For
instance, you might construct a prompt as follows: "Use the following pieces of context to
answer the question at the end." Then, you'd include relevant text from the facilities
documentation, such as the parking policy that states all employees may park on levels one
and two. This creates a long prompt that provides context for the LLM.
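
Concretely, the enriched prompt might be assembled along these lines; the template wording here is just one common pattern, not a fixed standard:

```python
# Assemble the augmented prompt from the retrieved text and the original question.
retrieved_text = "Parking: all employees may park on levels one and two."
question = "Is there parking for employees?"

prompt = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    f"Context:\n{retrieved_text}\n\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt)
```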

In practice, instead of dumping an entire long document into the prompt, you might extract
just the part that's most relevant to the question. You then append the original question, "Is
there parking for employees?" to the prompt. This is called Retrieval-Augmented Generation
because it generates an answer by augmenting the prompt with retrieved context or
information.
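
To show what "extracting just the relevant part" might look like, here is a small sketch that splits a document into sentence-sized chunks and keeps the one that best matches the question. Production systems usually chunk by paragraphs or token windows and rank chunks with embeddings; the sentence split and word-overlap score here are only for illustration.

```python
import re

def best_chunk(question: str, document: str) -> str:
    """Split a document into sentence-sized chunks and return the chunk
    sharing the most words with the question (a toy stand-in for
    embedding-based chunk ranking)."""
    chunks = re.split(r"(?<=[.!?])\s+", document)
    q_words = set(re.findall(r"[a-z]+", question.lower()))

    def overlap(chunk: str) -> int:
        return len(q_words & set(re.findall(r"[a-z]+", chunk.lower())))

    return max(chunks, key=overlap)

facilities_doc = (
    "Our offices are located downtown. "
    "Parking: all employees may park on levels one and two. "
    "The gym is open from 6 am to 10 pm."
)
print(best_chunk("Is there parking for employees?", facilities_doc))
# -> "Parking: all employees may park on levels one and two."
```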

The final step is to prompt the LLM with this enriched prompt. Ideally, the LLM will provide
a thoughtful answer, such as explaining where employees can park. In some applications
using RAG, the output shown to the user might also include a link to the original source
document that led to the generated answer, allowing users to verify the response by consulting
the source.
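
For the final step, the enriched prompt is sent to the LLM. As one possible sketch, here is what that call could look like with the OpenAI Python client; any chat-completion API works the same way, the model name is only an example, and the prompt is the one built above:

```python
# Send the enriched prompt to an LLM.
# Sketch using the OpenAI Python client (openai >= 1.0); assumes an
# OPENAI_API_KEY environment variable is set. Any chat-completion API would do.
from openai import OpenAI

prompt = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Context:\nParking: all employees may park on levels one and two.\n\n"
    "Question: Is there parking for employees?\n"
    "Answer:"
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name, not prescribed here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```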

RAG enables LLMs to have context or information beyond what they may have learned on
the open internet. For example, applications like Panda Chat AI, Chat PDF, and others let
users upload PDF files and ask questions. These tools use RAG to generate answers based on
the uploaded document. Similarly, other RAG applications answer questions based on website
content, such as Coursera Coach, Snapchat's chatbot, and HubSpot's chatbots.
RAG is also transforming web search. Microsoft Bing has a chat capability, Google has a
generative AI feature, and You.com, a web search engine started by a former PhD student,
Richard Socher, centers on a chat-like interface. These examples show how RAG is
revolutionizing how we interact with information.

A key takeaway is to think of LLMs not as knowledge stores but as reasoning engines. While
LLMs may have read a lot of text online, they don’t know everything. Instead, the RAG
approach provides relevant context in the prompt, asking the LLM to process the information
to reach an answer. This shifts the focus from memorization to reasoning.

Although LLM technology is still early and has limitations, viewing LLMs as reasoning
engines opens up exciting possibilities for new applications. Even if you're just using a web
interface, you can copy a piece of text into the prompt and use it as context for generating
answers, which is essentially an application of RAG.

RAG has proven useful in many applications, and I hope you find it valuable too. In the next
video, we'll explore another technique called fine-tuning, which expands what an LLM can
do. Thank you for watching, and I hope you clean up with this RAG stuff! See you in the next
video.
