Self Checks
RAGFlow workspace code commit ID
cfdcceb
RAGFlow image version
v0.22.1
Other environment information
Actual behavior
When using the ragflow_retrieval MCP tool with a dataset containing more than 30 documents, chunks from documents 31+ do not contain document_metadata. The metadata is read from the document metadata cache, and that cache is built by calling the list documents endpoint without any pagination parameters, so the defaults of page=1 and page_size=30 apply:
ragflow/mcp/server/server.py, line 243 in 2fd5ac1:

    docs_res = self._get(f"/datasets/{dataset_id}/documents")

ragflow/api/apps/sdk/doc.py, line 547 in 2fd5ac1:

    page = int(q.get("page", 1))
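Fetching every page would remove the 30-document cut-off. The sketch below is not RAGFlow's code: get_page(page, page_size) is a hypothetical callable standing in for one call to the list documents endpoint, returning the documents on that page. It keeps requesting pages until one comes back shorter than page_size.

```python
from typing import Callable, Dict, List

# Hypothetical signature: get_page(page, page_size) returns the list of
# document dicts on that page; a short (or empty) page means there are no more.
PageFetcher = Callable[[int, int], List[Dict]]


def fetch_all_documents(get_page: PageFetcher, page_size: int = 30) -> List[Dict]:
    """Collect every document in a dataset by walking pages until a short page."""
    documents: List[Dict] = []
    page = 1
    while True:
        batch = get_page(page, page_size)
        documents.extend(batch)
        # A page with fewer than page_size items is the last one.
        if len(batch) < page_size:
            break
        page += 1
    return documents
```

If the endpoint also reports a total document count, stopping once that many documents have been collected could avoid one extra empty request when the count is an exact multiple of page_size. The short-page check is used here only because it makes no assumption about the response shape.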
Ultimately, that means we can only ever see the metadata for the first 30 documents. Even if the logic is changed to fetch every page when building the metadata cache, there is also a hard limit of 128 documents:
ragflow/mcp/server/server.py, line 60 in 2fd5ac1:

    _MAX_DOCUMENT_CACHE = 128

ragflow/mcp/server/server.py, line 104 in 2fd5ac1:

    if len(self._dataset_metadata_cache) > self._MAX_DATASET_CACHE:
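A size limit on the cache does not have to drop metadata if a cache miss falls back to fetching the document. The sketch below is a hypothetical LRU cache keyed by document ID, not the existing RAGFlow cache; loader(document_id) is an assumed function that fetches one document's metadata or returns None.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class BoundedMetadataCache:
    """Hypothetical LRU cache of document_id -> metadata with a loader fallback."""

    def __init__(self, max_entries: int, loader: Callable[[str], Optional[Dict]]):
        self.max_entries = max_entries
        self.loader = loader  # assumed: fetches one document's metadata, or None
        self._entries: "OrderedDict[str, Dict]" = OrderedDict()

    def get(self, document_id: str) -> Optional[Dict]:
        if document_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(document_id)
            return self._entries[document_id]
        # Cache miss: fetch instead of reporting "no metadata".
        metadata = self.loader(document_id)
        if metadata is not None:
            self._entries[document_id] = metadata
            # Evict least recently used entries once the bound is exceeded.
            while len(self._entries) > self.max_entries:
                self._entries.popitem(last=False)
        return metadata

    def __len__(self) -> int:
        return len(self._entries)
```

With this design a limit such as 128 only caps memory use: documents beyond it cost an extra request instead of losing their metadata.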
Expected behavior
When using the ragflow_retrieval tool, every chunk should include the relevant document_metadata.
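Expressed as code, the expected behavior amounts to setting document_metadata on every chunk, looking each document up at most once per response. This is a sketch under two assumptions: each chunk carries a document_id key, and lookup is a hypothetical function returning a document's metadata or None.

```python
from typing import Callable, Dict, Iterable, List, Optional


def attach_document_metadata(
    chunks: Iterable[Dict],
    lookup: Callable[[str], Optional[Dict]],
) -> List[Dict]:
    """Return copies of the chunks, each with a document_metadata key set."""
    seen: Dict[str, Optional[Dict]] = {}
    enriched: List[Dict] = []
    for chunk in chunks:
        doc_id = chunk.get("document_id")
        if doc_id is not None and doc_id not in seen:
            # Look up each document once, however many chunks it contributes.
            seen[doc_id] = lookup(doc_id)
        metadata = seen.get(doc_id) if doc_id is not None else None
        # Copy the chunk so the caller's input is not mutated.
        enriched.append({**chunk, "document_metadata": metadata})
    return enriched
```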
Steps to reproduce
1. Create a dataset.
2. Upload more than 30 documents.
3. Use the ragflow_retrieval MCP tool with a query that returns chunks from one of the later documents in the set.
4. Check the response and verify that those chunks do not include a document_metadata property.
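Step 4 can be checked mechanically. This hypothetical helper assumes the tool result text is a JSON list of chunk objects, each with an id key; the real ragflow_retrieval payload may be shaped differently, in which case the parsing line needs adjusting.

```python
import json
from typing import List


def chunks_missing_metadata(response_text: str) -> List[str]:
    """Return the ids of chunks whose document_metadata is absent or empty.

    Assumption: response_text is a JSON array of chunk objects with an "id" key.
    """
    chunks = json.loads(response_text)
    return [str(c.get("id")) for c in chunks if not c.get("document_metadata")]
```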
Additional information
No response