Multi-Document Agentic RAG Using Llama-Index and Mistral - by Plaban Nayak - The AI Forum - May, 2024 - Medium
Introduction
Large language models (LLMs) have revolutionized the way we extract insights from
vast amounts of text data. In the domain of financial analysis, LLM applications are
being designed to assist analysts in answering complex questions about company
performance, earnings reports, and market trends.
One such application involves the use of a retrieval augmented generation (RAG)
pipeline to facilitate the extraction of information from financial statements and
other sources.
What is an Agent?
According to Llama-Index, an “agent” is an automated reasoning and decision
engine. It takes in a user input/query and can make internal decisions for executing
that query in order to return the correct result. The key agent components can
include, but are not limited to:
Choosing an external Tool to use + coming up with parameters for calling the Tool
Let’s break down how an LLM Agent could be developed to answer a complex analyst question, such as what a company said about its technological moats on its Q2 earnings call:
Planning: The LLM Agent first needs to understand the nature of the question and
create a plan to extract relevant information. This involves identifying key terms like
“Q2 earnings call” and “technological moats” and determining the best sources to
gather this information from.
Tailored Focus: The LLM Agent then focuses its attention on the specific aspects of the
question related to technological moats. This involves filtering out irrelevant
information and honing in on the details that are most pertinent to the analyst’s
inquiry.
Memory: The LLM Agent leverages its memory to recall relevant information from
past earnings calls, company reports, and other sources. This helps provide context and
background information to support its analysis.
Using Different Tools: The LLM Agent utilizes a range of tools and techniques to
extract and analyze information. This may include natural language processing (NLP)
algorithms, sentiment analysis, and topic modeling to gain deeper insights into the
earnings call.
Breaking Down Complex Questions: Finally, the LLM Agent breaks down the complex
question into simpler sub-parts, making it easier to extract relevant information and
provide a coherent answer.
Tool Calling
In a standard RAG pipeline, the LLM is used mainly to synthesize information.
Tool Calling, on the other hand, adds a layer of query understanding on top of a RAG pipeline, enabling users to ask complex queries and get back more precise results. It allows the LLM to figure out how to use a vector database rather than merely consuming its outputs.
Tool Calling enables the LLM to interact with external environments through a dynamic interface: the LLM not only chooses the appropriate tool but also infers the arguments needed to call it.
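As a quick illustration, here is a minimal sketch of tool calling with LlamaIndex. The add function and the arithmetic query are made up for illustration, and a MISTRAL_API_KEY environment variable is assumed; predict_and_call is the same method used in the code later in this post.

from llama_index.core.tools import FunctionTool
from llama_index.llms.mistralai import MistralAI

def add(a: int, b: int) -> int:
    """Add two integers and return the result."""
    return a + b

# FunctionTool infers the tool schema from the function signature and docstring
add_tool = FunctionTool.from_defaults(fn=add)

# hypothetical model choice; reads MISTRAL_API_KEY from the environment
llm = MistralAI(model="mistral-large-latest")

# the LLM picks the tool AND fills in its arguments from the natural-language query
response = llm.predict_and_call([add_tool], "What is 2117 + 8056?", verbose=True)
print(str(response))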
Agent Architecture
In LlamaIndex an agent consists of two components:
AgentRunner: the orchestrator that stores state (including conversational memory) and creates and maintains tasks.
AgentWorker: controls the step-wise execution of a task.
Source: Llama-Index
Calling the agent's query method queries the agent in a one-off manner and does not preserve state. This is where the memory aspect comes into the picture to maintain the conversation history. Here the agent maintains the chat history in a conversational memory buffer. By default, the memory buffer is a flat list of items that rolls over based on the context window size of the LLM. Therefore, when the agent decides to use a tool, it uses not only the current chat but also the previous conversation history to perform the next set of actions.
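A minimal sketch of making that buffer explicit (the token limit is an arbitrary illustrative value; the agent variable refers to the AgentRunner built later in this post):

from llama_index.core.memory import ChatMemoryBuffer

# rolling buffer: oldest messages are dropped once the token limit is reached
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

# agent.query(...) is one-off; agent.chat(...) reads from and writes to the buffer
# agent = AgentRunner(agent_worker, memory=memory)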
Mistral API: Developers can interact with Mistral through its API, which is similar in experience to OpenAI's API system. Its 32K-token context window allows precise information recall from large documents.
Code Implementation
The code below was implemented in Google Colab.
%%writefile requirements.txt
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai
!pip install -r requirements.txt
####################################################################
Successfully installed Unstructured-0.13.7 accelerate-0.30.1 asgiref-3.8.1 ...
!mkdir data
#
! wget "https://arxiv.org/pdf/1810.04805.pdf" -O ./data/BERT_arxiv.pdf
! wget "https://arxiv.org/pdf/2005.11401" -O ./data/RAG_arxiv.pdf
! wget "https://arxiv.org/pdf/2310.11511" -O ./data/self_rag_arxiv.pdf
! wget "https://arxiv.org/pdf/2401.15884" -O ./data/crag_arxiv.pdf
import nest_asyncio
nest_asyncio.apply()
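The imports and the document-loading step belonging to this cell are reconstructed below as a minimal sketch; loading the Self-RAG paper is an assumption that matches the node metadata shown in the output further down.

from typing import List, Optional

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition

# parse one of the downloaded papers into LlamaIndex Document objects
documents = SimpleDirectoryReader(input_files=["./data/self_rag_arxiv.pdf"]).load_data()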
splitter = SentenceSplitter(chunk_size=1024,chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Length of nodes : {len(nodes)}")
print(f"get the content for node 0 :{nodes[0].get_content(metadata_mode='all')}
###########################RESPONSE################################
Length of nodes : 43
get the content for node 0 : page_label: 1
file_name: self_rag_arxiv.pdf
file_path: data/self_rag_arxiv.pdf
file_type: application/pdf
file_size: 1405127
creation_date: 2024-05-11
last_modified_date: 2023-10-19

Preprint.
SELF-RAG: LEARNING TO RETRIEVE, GENERATE, AND CRITIQUE THROUGH SELF-REFLECTION
Akari Asai†, Zeqiu Wu†, Yizhong Wang†§, Avirup Sil‡, Hannaneh Hajishirzi†§
†University of Washington §Allen Institute for AI ‡IBM Research AI
{akari,zeqiuwu,yizhongw,hannaneh}@cs.washington.edu, [email protected]

ABSTRACT
Despite their remarkable capabilities, large language models (LLMs) often produce
responses containing factual inaccuracies due to their sole reliance on the parametric
knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach
that augments LMs with retrieval of relevant knowledge, decreases such issues.
However, indiscriminately retrieving and incorporating a fixed number of retrieved
passages, regardless of whether retrieval is necessary, or passages are relevant,
diminishes LM versatility or can lead to unhelpful response generation. We introduce
a new framework called Self-Reflective Retrieval-Augmented Generation (SELF-RAG)
that enhances an LM's quality and factuality through retrieval and self-reflection.
Our framework trains a single arbitrary LM that adaptively retrieves passages
on-demand, and generates and reflects on retrieved passages and its own generations
using special tokens, called reflection tokens. Generating reflection tokens makes
the LM controllable during the inference phase, enabling it to tailor its behavior
to diverse task requirements. Experiments show that SELF-RAG (7B and 13B parameters)
significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a
diverse set of tasks. Specifically, SELF-RAG outperforms ChatGPT and
retrieval-augmented Llama2-chat on Open-domain QA, reasoning and fact verification
tasks, and it shows significant gains in improving factuality and citation accuracy
for long-form generations relative to these models.

1 INTRODUCTION
State-of-the-art LLMs continue to struggle with factual errors (Mallen et al., 2023)
despite their increased model and data scale (Ouyang et al., 2022). Retrieval-Augmented
Generation (RAG) methods (Figure 1 left; Lewis et al. 2020; Guu et al. 2020) augment
the input of LLMs with relevant retrieved passages, reducing factual errors in
knowledge-intensive tasks (Ram et al., 2023; Asai et al., 2023a). However, these
methods may hinder the versatility of LLMs or introduce unnecessary or off-topic
passages that lead to low-quality generations (Shi et al., 2023) since they retrieve
passages indiscriminately regardless of whether the factual grounding is helpful.
Moreover, the output is not guaranteed to be consistent with retrieved relevant
passages, since the models are not explicitly trained to leverage and follow facts
from provided passages. This work introduces Self-Reflective Retrieval-augmented
Generation (SELF-RAG) to improve an LLM's generation quality, including its factual
accuracy, without hurting its versatility, via on-demand retrieval and self-reflection.
We train an arbitrary LM in an end-to-end manner to learn to reflect on its own
generation process given a task input by generating both task output and intermittent
special tokens (i.e., reflection tokens). Reflection tokens are categorized into
retrieval and critique tokens to indicate the need for retrieval and its generation
quality respectively (Figure 1 right). In particular, given an input prompt and
preceding generations, SELF-RAG first determines if augmenting the continued
generation with retrieved passages would be helpful. If so, it outputs a retrieval
token that calls a retriever model on demand (Step 1). Subsequently, SELF-RAG
concurrently processes multiple retrieved passages, evaluating their relevance and
then generating corresponding task outputs (Step 2). It then generates critique
tokens to criticize its own output and choose the best one in terms of factuality
and overall quality. This process differs from conventional RAG (...)
1 Our code and trained models are available at https://selfrag.github.io/.
arXiv:2310.11511v1 [cs.CL] 17 Oct 2023
import chromadb
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./chroma_db_mistral")
chroma_collection = db.get_or_create_collection("multidocument-agent")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
from llama_index.core import Settings

Settings.chunk_size = 1024
#
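The LLM and embedding model are wired into the global Settings next. This is a minimal sketch: the exact model names and the MISTRAL_API_KEY environment variable are assumptions, though both packages appear in the requirements above.

import os

from llama_index.core import Settings
from llama_index.llms.mistralai import MistralAI
from llama_index.embeddings.fastembed import FastEmbedEmbedding

# hypothetical model choices; any Mistral chat model with tool calling will do
llm = MistralAI(model="mistral-large-latest", api_key=os.environ["MISTRAL_API_KEY"])
Settings.llm = llm
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")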
Instantiate the Vector Query tool and Summary tool for a specific document
LlamaIndex Data Agents process natural language input to perform actions rather than just generating responses. The key to creating effective data agents lies in abstracting tools. But what exactly is meant by a tool in this context? Think of tools as API interfaces designed for agent interactions rather than human interfaces.
Core Concepts:
Tool Spec: This delves into the API specifics, presenting a comprehensive service API specification that can be translated into various Tools.
FunctionTool: Converts any user-defined function into a Tool, with the ability to infer the function's schema.
QueryEngineTool: Wraps an existing query engine so that it can be used by agents.
#instantiate Vectorstore
name = "BERT_arxiv"
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db_mistral")
#
# Define Vectorstore Autoretrieval tool
def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
    '''
    Perform a vector search over the index.
    query(str): query string to be embedded
    page_numbers(List[str]): list of page numbers to filter on;
                             leave blank to perform a vector search over all pages
    '''
    page_numbers = page_numbers or []
    metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]
    #
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
    )
    #
    response = query_engine.query(query)
    return response
#
# LlamaIndex FunctionTool wraps any Python function we feed it
vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}",
                                               fn=vector_query)
# Prepare Summary Tool
summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
                                                     use_async=True)
summary_query_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{name}",
    query_engine=summary_query_engine,
    description=("Use ONLY IF you want to get a holistic summary of the documents. "
                 "DO NOT USE if you have specific questions over the documents."),
)

response = llm.predict_and_call([vector_query_tool],
                                "Summarize the content in page number 2",
                                verbose=True)
######################RESPONSE###########################
Helper function to generate Vectorstore Tool and Summary tool for all the documents
def get_doc_tools(file_path: str, name: str):
    '''
    Get a vector query tool and a summary query tool from a document.
    '''
    # load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"Length of nodes : {len(nodes)}")
    # instantiate Vectorstore
    vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
    vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db_mistral")
    #
    # Define Vectorstore Autoretrieval tool
    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        '''
        Perform a vector search over the index.
        query(str): query string to be embedded
        page_numbers(List[str]): list of page numbers to filter on;
                                 leave blank to perform a vector search over all pages
        '''
        page_numbers = page_numbers or []
        metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]
        #
        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
        )
        #
        response = query_engine.query(query)
        return response
    #
    # LlamaIndex FunctionTool wraps any Python function we feed it
    vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}",
                                                   fn=vector_query)
    # Prepare Summary Tool
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
                                                         use_async=True)
    summary_query_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=("Use ONLY IF you want to get a holistic summary of the documents. "
                     "DO NOT USE if you have specific questions over the documents."),
    )
    return vector_query_tool, summary_query_tool
import os
root_path = "/content/data"
file_name = []
file_path = []
for file in os.listdir(root_path):
    if file.endswith(".pdf"):
        file_name.append(file.split(".")[0])
        file_path.append(os.path.join(root_path, file))
#
print(file_name)
print(file_path)
################################RESPONSE###############################
['self_rag_arxiv', 'crag_arxiv', 'RAG_arxiv', 'BERT_arxiv']
['/content/data/self_rag_arxiv.pdf',
 '/content/data/crag_arxiv.pdf',
 '/content/data/RAG_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf']
Note: FunctionTool expects a string that matches the pattern '^[a-zA-Z0-9_-]+$' for the tool name.
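If a document name may contain other characters, a small hypothetical helper can normalize it before it is used as a tool name:

import re

def safe_tool_name(name: str) -> str:
    # replace anything outside [a-zA-Z0-9_-] with an underscore
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

print(safe_tool_name("BERT arxiv (v2)"))  # BERT_arxiv__v2_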
papers_to_tools_dict = {}
for name,filename in zip(file_name,file_path):
vector_query_tool,summary_query_tool = get_doc_tools(filename,name)
papers_to_tools_dict[name] = [vector_query_tool,summary_query_tool]
####################RESPONSE###########################
Length of nodes : 43
...
Length of nodes : 28
Stuffing too many tool selections into the LLM prompt leads to the following issues:
The tools might not fit into the prompt, especially if the number of documents is large, since we model each document as a separate tool.
Cost and latency will spike owing to the increase in the number of tokens.
The prompt outline can also become confusing, resulting in the LLM not performing as instructed.
A solution is to perform RAG at the level of tools. To do this we will use the ObjectIndex class of Llama-Index; its construction is sketched after this list.
The ObjectIndex class allows for the indexing of arbitrary Python objects. As such, it is quite flexible and applicable to a wide range of use cases. As examples:
Using different vector stores as the storage backend, enhancing the flexibility and scalability of applications.
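A minimal sketch of building that tool-level index from the tools collected above, using ObjectIndex.from_objects with a VectorStoreIndex backend:

from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

# flatten the per-paper [vector_tool, summary_tool] pairs into a single list
all_tools = [t for tools in papers_to_tools_dict.values() for t in tools]

# index the tools themselves so only the relevant ones are retrieved per query
obj_index = ObjectIndex.from_objects(all_tools, index_cls=VectorStoreIndex)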
obj_retriever = obj_index.as_retriever(similarity_top_k=2)
tools = obj_retriever.retrieve("compare and contrast the papers self rag and corrective rag")
#
print(tools[0].metadata)
print(tools[1].metadata)
###################################RESPONSE###########################
ToolMetadata(description='Use ONLY IF you want to get a holistic summary of the ...
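The agent used for the queries below is assembled from the two components described in the architecture section. This is a minimal sketch assuming a function-calling worker over the tool retriever; the system prompt wording is illustrative.

from llama_index.core.agent import AgentRunner, FunctionCallingAgentWorker

# the worker retrieves candidate tools per query instead of seeing all of them
agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers. "
        "Always use the provided tools to answer a question; do not rely on prior knowledge."
    ),
    verbose=True,
)
agent = AgentRunner(agent_worker)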
Ask Query 1
#
response = agent.query("Compare and contrast self rag and crag.")
print(str(response))
##############################RESPONSE###################################
Added user message to memory: Compare and contrast self rag and crag.
=== LLM Response ===
Sure, I'd be happy to help you understand the differences between Self RAG and ...
Again, it's crucial to remember that both of these methods should only be used ...
Conclusion
Unlike the standard RAG pipeline — suitable for simple queries across a few
documents — this intelligent approach adapts based on initial findings to enhance
further data retrieval. Here we have developed an autonomous research agent,
enhancing our ability to engage with and analyze our data comprehensively.
References:
Llamaindex Agents