🔧 Built an automated chunking parameter optimizer for RAGFlow #13079
Replies: 3 comments 1 reply
-
Would it be feasible to submit a PR to include this in …?
-
This is excellent! Chunking parameter tuning is one of those things everyone knows matters but few actually measure. What I like:
Questions/suggestions:
The 15-25% improvement aligns with what we see. At Revolution AI we have found similar gains from systematic chunking optimization; default params are almost never optimal. Great tool for the community. Starred! ⭐
-
Automated chunking optimization is brilliant! At RevolutionAI (https://revolutionai.io) we tune RAG extensively. Key parameters to optimize:
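    # Prevent context loss at chunk boundaries
    chunk_size = 512  # example value, not a recommendation
    overlap = int(chunk_size * 0.1)  # 10-20% overlap; int() keeps it a whole number of tokens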
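The overlap rule above can be made concrete with a simple sliding-window chunker. This is a minimal sketch, assuming whitespace-split tokens; RAGFlow's own parsers and tokenizer work differently, and `chunk_tokens` and `overlap_ratio` are hypothetical names used only for illustration.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, overlap_ratio: float = 0.1) -> list[list[str]]:
    """Split tokens into windows of chunk_size; each window repeats `overlap` tokens of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    overlap = int(chunk_size * overlap_ratio)  # e.g. 10% of chunk_size, rounded down
    step = chunk_size - overlap  # how far the window advances each time
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog again".split()
    for chunk in chunk_tokens(words, chunk_size=4, overlap_ratio=0.25):
        print(chunk)
```

With `chunk_size=4` and `overlap_ratio=0.25` the window advances 3 tokens at a time, so each chunk repeats the last token of the previous one. Raising the overlap produces more (partly duplicated) chunks in exchange for losing less context at boundaries.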
Evaluation metrics:

    def evaluate_chunking(chunks, queries, ground_truth):
        # retrieve(), calculate_recall() and calculate_precision() are helpers not shown in this comment
        retrieved = retrieve(chunks, queries)
        recall = calculate_recall(retrieved, ground_truth)
        precision = calculate_precision(retrieved, ground_truth)
        return {"recall": recall, "precision": precision}

Would love to see this as a RAGFlow feature!
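The `evaluate_chunking` outline above relies on helpers that are not shown. One self-contained way to compute macro-averaged recall@k and precision@k is sketched below, assuming each query is labelled with answer snippets (strings) rather than chunk IDs, since chunk IDs change whenever the chunking parameters change. Unlike the outline, it takes already-retrieved chunk texts instead of calling a `retrieve` function; `recall_precision_at_k` and its arguments are hypothetical names.

```python
def recall_precision_at_k(
    retrieved: dict[str, list[str]],
    ground_truth: dict[str, list[str]],
    k: int = 5,
) -> dict[str, float]:
    """Macro-averaged recall@k and precision@k.

    retrieved:    query -> ranked list of retrieved chunk texts
    ground_truth: query -> answer snippets that should appear in the retrieved text
    """
    if k <= 0:
        raise ValueError("k must be positive")
    recalls: list[float] = []
    precisions: list[float] = []
    for query, snippets in ground_truth.items():
        if not snippets:
            continue  # skip queries without labelled answers
        top_k = retrieved.get(query, [])[:k]
        # Recall: share of answer snippets found in at least one top-k chunk.
        found = sum(1 for s in snippets if any(s in chunk for chunk in top_k))
        # Precision: share of the k slots filled by a chunk containing some snippet.
        useful = sum(1 for chunk in top_k if any(s in chunk for s in snippets))
        recalls.append(found / len(snippets))
        precisions.append(useful / k)
    n = len(recalls)
    if n == 0:
        return {"recall": 0.0, "precision": 0.0}
    return {"recall": sum(recalls) / n, "precision": sum(precisions) / n}
```

Here a chunk counts as relevant if it contains a labelled snippet verbatim; a real setup might use fuzzy matching or human relevance judgements instead. Running the same labelled queries against each chunking configuration gives comparable scores.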
-
Hey RAGFlow team and community! 👋
I built an open-source tool that automatically finds the optimal chunking parameters for your RAGFlow knowledge bases.
The problem
When building RAG apps with RAGFlow, chunking parameters (chunk_size, overlap, auto_questions, etc.) have a large impact on retrieval quality, but most of us just guess. Wrong parameters mean the wrong chunks get retrieved.
The solution
RAGFlow Optimizer runs automated experiments to systematically find the best configuration for each document type.
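The experiment loop can be pictured as a per-document-type grid search. The sketch below is an assumption about how such a search might look, not the tool's actual implementation: `score` is a placeholder callable that, in practice, would re-chunk a document type's files with the given configuration and measure retrieval quality on labelled queries. `grid_search`, `toy_score` and the search values are hypothetical, and the parameter names mirror the post rather than RAGFlow's API fields.

```python
from itertools import product
from typing import Callable

# Hypothetical config shape: parameter name -> integer value
Config = dict[str, int]


def grid_search(
    doc_types: list[str],
    search_space: dict[str, list[int]],
    score: Callable[[str, Config], float],
    default: Config,
) -> dict[str, tuple[Config, float]]:
    """Return, per document type, the best config and its relative gain over `default`."""
    names = list(search_space)
    results: dict[str, tuple[Config, float]] = {}
    for doc_type in doc_types:
        baseline = score(doc_type, default)
        best_config, best_score = default, baseline
        # Try every combination of parameter values (Cartesian product).
        for values in product(*(search_space[name] for name in names)):
            config = dict(zip(names, values))
            current = score(doc_type, config)
            if current > best_score:
                best_config, best_score = config, current
        # Relative improvement over the default, e.g. 0.2 means +20%.
        gain = (best_score - baseline) / baseline if baseline > 0 else float("nan")
        results[doc_type] = (best_config, gain)
    return results


def toy_score(doc_type: str, cfg: Config) -> float:
    """Made-up scoring function for demonstration only (not real measurements)."""
    if doc_type == "procedure":
        # Pretend generated questions help step-by-step docs and ~256-token chunks fit best.
        return 0.5 + 0.02 * cfg["auto_questions"] - abs(cfg["chunk_size"] - 256) / 10_000
    # Pretend generated questions hurt tables and ~512-token chunks fit best.
    return 0.6 - 0.03 * cfg["auto_questions"] - abs(cfg["chunk_size"] - 512) / 10_000


if __name__ == "__main__":
    space = {"chunk_size": [256, 512], "auto_questions": [0, 5]}
    default = {"chunk_size": 128, "auto_questions": 0}
    for doc_type, (cfg, gain) in grid_search(["procedure", "table"], space, toy_score, default).items():
        print(doc_type, cfg, f"{gain:+.1%}")
```

Keeping a separate best configuration per document type matters when a setting helps one type but hurts another. The relative gain, (best - default) / default, is one common way to express an improvement "over default settings". Exhaustive grid search grows multiplicatively with each parameter, so larger search spaces usually call for random or Bayesian search instead.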
How it works
    python main.py run

Key features
Quick start
Example results
On my test corpus, optimized configs improved retrieval by 15-25% over default settings:
auto_questions=5 improved retrieval for procedure docs but hurt performance on data tables.
Would love feedback from the community! What features would be most useful? PRs welcome.
⭐ https://github.com/stranger00135/ragflow-optimizer