[Bug]: document_metadata missing for most chunks in large dataset

### Self Checks

- [x] I have searched the [existing issues](https://github.com/infiniflow/ragflow/issues), including closed ones.
- [x] I confirm that I am using English to submit this report ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Non-English title submissions will be closed directly ([Language Policy](https://github.com/infiniflow/ragflow/issues/5910)).
- [x] Please do not modify this template :) and fill in all the required fields.

### RAGFlow workspace code commit ID

cfdccebb

### RAGFlow image version

v0.22.1

### Other environment information

```Markdown

```

### Actual behavior

When using the `ragflow_retrieval` MCP tool with a dataset containing more than 30 documents, chunks from the 31st document onward do not contain `document_metadata`. The metadata is read from the document metadata cache, and that cache is built by calling the list documents endpoint with its default pagination parameters (`page=1`, `page_size=30`):
https://github.com/infiniflow/ragflow/blob/2fd5ac1031ad487388fc216d108225ea93def9ce/mcp/server/server.py#L243
https://github.com/infiniflow/ragflow/blob/2fd5ac1031ad487388fc216d108225ea93def9ce/api/apps/sdk/doc.py#L547

As a result, metadata is only ever available for the first 30 documents. Even if the logic is changed to fetch every page when building the metadata cache, there is also a hard limit of 128 documents:
https://github.com/infiniflow/ragflow/blob/2fd5ac1031ad487388fc216d108225ea93def9ce/mcp/server/server.py#L60
https://github.com/infiniflow/ragflow/blob/2fd5ac1031ad487388fc216d108225ea93def9ce/mcp/server/server.py#L104
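
To illustrate the pagination part of the problem, here is a minimal sketch of how a cache builder could walk every page instead of stopping after the first one. It is not the code in `server.py`. The `fetch_page` callable, the `fake_fetch_page` stand-in, and the `id` and `meta_fields` keys are all assumptions made for the example, and the sketch does not address the 128-document limit.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (page, page_size) -> list of document dicts.
PageFetcher = Callable[[int, int], List[dict]]


def build_metadata_cache(fetch_page: PageFetcher, page_size: int = 30) -> Dict[str, dict]:
    """Collect metadata for every document by requesting pages until one comes back short."""
    cache: Dict[str, dict] = {}
    page = 1
    while True:
        docs = fetch_page(page, page_size)
        for doc in docs:
            # "id" and "meta_fields" are assumed key names for this sketch.
            cache[doc["id"]] = doc.get("meta_fields", {})
        if len(docs) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            break
        page += 1
    return cache


def fake_fetch_page(page: int, page_size: int) -> List[dict]:
    """Stand-in for the list documents endpoint, serving 75 made-up documents."""
    all_docs = [{"id": f"doc-{i}", "meta_fields": {"n": i}} for i in range(1, 76)]
    start = (page - 1) * page_size
    return all_docs[start:start + page_size]


cache = build_metadata_cache(fake_fetch_page)
first_page_only = {doc["id"] for doc in fake_fetch_page(1, 30)}
```

With the stand-in data, `first_page_only` mirrors the current behavior (only `doc-1` through `doc-30` are visible), while `cache` covers all 75 documents.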

### Expected behavior

When using the `ragflow_retrieval` tool, every chunk should include the relevant `document_metadata`, regardless of how many documents the dataset contains.

### Steps to reproduce

```Markdown
1. Create a dataset.
2. Upload more than 30 documents.
3. Call the ragflow_retrieval MCP tool with a query that returns chunks from one of the later documents in the set.
4. Check the response and verify that those chunks do not include a document_metadata property.
```
```
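
Step 4 can be checked with a small helper along these lines. This is only a sketch: the chunk shape (a list of dicts with `id` and `document_id` keys) is an assumption about the tool's response, and `sample_chunks` is made-up data, not real output.

```python
from typing import List


def chunks_missing_metadata(chunks: List[dict]) -> List[str]:
    """Return the ids of chunks that have no document_metadata property."""
    return [
        chunk.get("id", "<unknown>")
        for chunk in chunks
        if "document_metadata" not in chunk
    ]


# Made-up chunks standing in for a parsed tool response.
sample_chunks = [
    {"id": "c1", "document_id": "doc-1", "document_metadata": {"author": "a"}},
    {"id": "c2", "document_id": "doc-31"},
]

missing = chunks_missing_metadata(sample_chunks)
```

An empty result would mean every returned chunk carries `document_metadata`, which is the expected behavior.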

### Additional information

_No response_

[Bug]: document_metadata missing for most chunks in large dataset #11533
