This notebook implements an AI researcher that continuously searches for information based on a user query until the system is confident that it has gathered all the necessary details. It makes use of several services to do so:
- SERPAPI: To perform Google searches.
- Jina: To fetch and extract webpage content.
- OpenRouter (default model: `anthropic/claude-3.5-haiku`): To interact with an LLM for generating search queries, evaluating page relevance, and extracting context.
- Iterative Research Loop: The system refines its search queries iteratively until no further queries are required.
- Asynchronous Processing: Searches, webpage fetching, evaluation, and context extraction are performed concurrently to improve speed.
- Duplicate Filtering: Aggregates and deduplicates links within each round, ensuring that the same link isn’t processed twice.
- LLM-Powered Decision Making: Uses the LLM to generate new search queries, decide on page usefulness, extract relevant context, and produce a final comprehensive report.
- Gradio Interface: Use the `open-deep-researcher - gradio` notebook if you want to use this in a functional UI.
- API access and keys for:
  - OpenRouter API
  - SERPAPI API
  - Jina API
- Clone or Open the Notebook:
  - Download the notebook file or open it directly in Google Colab.
- Install `nest_asyncio`: Run the first cell to set up `nest_asyncio`.
- Configure API Keys:
  - Replace the placeholder values in the notebook for `OPENROUTER_API_KEY`, `SERPAPI_API_KEY`, and `JINA_API_KEY` with your actual API keys.
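
Before running the remaining cells, it can help to confirm that no key was left unset. The sketch below is one possible check, not part of the notebook: the `REPLACE_WITH_` placeholder prefix and the `find_unset_keys` helper are assumptions made for illustration, and the notebook's actual placeholder text may differ.

```python
from typing import Dict, List

# Hypothetical placeholder marker; adjust it to match the notebook's real placeholders.
PLACEHOLDER_PREFIX = "REPLACE_WITH_"

# The three key names come from the setup step above; the values are illustrative.
API_KEYS: Dict[str, str] = {
    "OPENROUTER_API_KEY": "REPLACE_WITH_YOUR_OPENROUTER_KEY",
    "SERPAPI_API_KEY": "REPLACE_WITH_YOUR_SERPAPI_KEY",
    "JINA_API_KEY": "REPLACE_WITH_YOUR_JINA_KEY",
}


def find_unset_keys(keys: Dict[str, str], prefix: str = PLACEHOLDER_PREFIX) -> List[str]:
    """Return the names of keys that are blank or still hold a placeholder value."""
    return [
        name
        for name, value in keys.items()
        if not value.strip() or value.startswith(prefix)
    ]


unset = find_unset_keys(API_KEYS)
if unset:
    print("Still unset:", ", ".join(unset))
```

Failing early here is cheaper than discovering a bad key after several search rounds have already started.
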
- Run the Notebook Cells: Execute all cells in order. The notebook will prompt you for:
  - A research query/topic.
  - An optional maximum number of iterations (default is 10).
- Follow the Research Process:
  - Initial Query & Search Generation: The notebook uses the LLM to generate initial search queries.
  - Asynchronous Searches & Extraction: It performs SERPAPI searches for all queries concurrently, aggregates unique links, and processes each link in parallel to determine page usefulness and extract relevant context.
  - Iterative Refinement: After each round, the aggregated context is analyzed by the LLM to determine if further search queries are needed.
  - Final Report: Once the LLM indicates that no further research is needed (or the iteration limit is reached), a final report is generated based on all gathered context.
- View the Final Report: The final comprehensive report is printed in the output.
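
The optional iteration prompt described above could be handled along these lines. This is a minimal sketch under assumptions: `parse_iteration_limit` is a hypothetical helper, and falling back to the default on blank, non-numeric, or non-positive input is an illustrative choice rather than the notebook's confirmed behavior.

```python
DEFAULT_MAX_ITERATIONS = 10  # default stated in the usage steps above


def parse_iteration_limit(raw: str, default: int = DEFAULT_MAX_ITERATIONS) -> int:
    """Turn the user's answer into a positive iteration limit.

    Blank, non-numeric, and non-positive answers fall back to the default
    (an assumption made for this sketch).
    """
    text = raw.strip()
    if not text:
        return default
    try:
        value = int(text)
    except ValueError:
        return default
    return value if value > 0 else default


# In a notebook cell this might be used as:
#   limit = parse_iteration_limit(input("Max iterations (default 10): "))
```
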
- Input & Query Generation: The user enters a research topic, and the LLM generates up to four distinct search queries.
- Concurrent Search & Processing:
  - SERPAPI: Each search query is sent to SERPAPI concurrently.
  - Deduplication: All retrieved links are aggregated and deduplicated within the current iteration.
  - Jina & LLM: Each unique link is processed concurrently to fetch webpage content via Jina, evaluate its usefulness with the LLM, and extract relevant information if the page is deemed useful.
- Iterative Refinement: The system passes the aggregated context to the LLM to determine if further search queries are needed. New queries are generated if required; otherwise, the loop terminates.
- Final Report Generation: All gathered context is compiled and sent to the LLM to produce a final, comprehensive report addressing the original query.
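
The control flow described above can be sketched with plain `asyncio`. Every helper here (`generate_queries`, `search`, `process_link`, `write_report`) is a hypothetical stand-in that returns canned data; in the notebook those roles are played by calls to OpenRouter, SERPAPI, and Jina. Only the shape of the loop is meant to carry over.

```python
import asyncio
from typing import List, Optional


async def generate_queries(topic: str, context: List[str]) -> List[str]:
    """Stand-in for the LLM: propose queries, or return [] when research is done."""
    if not context:
        return [f"{topic} overview", f"{topic} recent developments"]
    return []  # pretend the LLM judged the gathered context sufficient


async def search(query: str) -> List[str]:
    """Stand-in for one SERPAPI search: return result links for a query."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return ["https://example.com/shared", "https://example.com/" + query.replace(" ", "-")]


async def process_link(link: str, topic: str) -> Optional[str]:
    """Stand-in for fetching a page (Jina) and judging/extracting it (LLM)."""
    await asyncio.sleep(0)
    if link.endswith("/shared"):
        return None  # pretend the LLM judged this page not useful
    return f"context extracted from {link}"


async def write_report(topic: str, context: List[str]) -> str:
    """Stand-in for the final LLM call that writes the report."""
    return f"Report on {topic!r} from {len(context)} context snippet(s)."


async def research(topic: str, max_iterations: int = 10) -> str:
    context: List[str] = []
    for _ in range(max_iterations):
        queries = await generate_queries(topic, context)
        if not queries:
            break  # no further research needed
        # Send every query concurrently.
        results = await asyncio.gather(*(search(q) for q in queries))
        # Aggregate and deduplicate links within this round, preserving order.
        unique_links = list(dict.fromkeys(link for links in results for link in links))
        # Fetch, evaluate, and extract from each unique link concurrently.
        extracted = await asyncio.gather(*(process_link(link, topic) for link in unique_links))
        context.extend(item for item in extracted if item is not None)
    return await write_report(topic, context)


if __name__ == "__main__":
    print(asyncio.run(research("solid-state batteries")))
```

Inside a notebook cell, where an event loop is already running, `await research("solid-state batteries")` is the likelier form than `asyncio.run(...)`. Using `dict.fromkeys` for deduplication keeps the first occurrence of each link in its original order, which a plain `set` would not guarantee.
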
- RuntimeError with asyncio: If you encounter an error like `RuntimeError: asyncio.run() cannot be called from a running event loop`, ensure you have applied `nest_asyncio` as shown in the setup section.
- API Issues: Verify that your API keys are correct and that you are not exceeding any rate limits.
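
The `RuntimeError` mentioned above can be reproduced outside a notebook by calling `asyncio.run()` from code that is already running inside an event loop, which is the situation a notebook cell is in. The sketch below shows that, plus one way to apply the `nest_asyncio` patch defensively; `enable_nested_loops` is a hypothetical helper name, and the notebook's own first cell may apply the patch differently.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # An event loop is already running here, much like inside a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)


def enable_nested_loops() -> bool:
    """Apply nest_asyncio if it is installed; return whether the patch was applied."""
    try:
        import nest_asyncio  # third-party package set up by the notebook's first cell
    except ImportError:
        return False
    nest_asyncio.apply()
    return True
```

Calling `enable_nested_loops()` once, before any cell that uses `asyncio.run()`, mirrors what the setup step asks for.
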
Follow me on X for updates on this and other AI things I'm working on.
OpenDeepResearcher is released under the MIT License. See the LICENSE file for more details.