Problem Statement Pathway (1)
Agentic RAG adds autonomy to standard RAG systems. Instead of manually orchestrating each retrieval
step, an “agent” (your AI application) can:
2. About Pathway
About the Company: Pathway Technology Inc. (pathway.com) is the maker of the world’s fastest global
data processing engine (GitHub). With offices in the US, France, and Poland, our ~25-member team has
deep expertise from top AI labs such as Microsoft Research, Google Brain, and ETH Zurich. Many of our
members have worked at Google and hold degrees from prestigious institutions such as École Polytechnique,
UC Berkeley, CNRS, and HEC Paris; one even earned a PhD at just 20. Our CTO has co-authored notable
works with AI pioneers Geoffrey Hinton and Yoshua Bengio. Our leadership includes the co-founder of
Spoj.com (one of the earliest competitive programming platforms, with over 1M developers) and NK.pl
(Poland’s first social network, with 13.5M+ users), and our advisors include current and former leaders
of OpenAI, SAP, and DHL.
● https://github.com/pathwaycom/llm-app
● https://github.com/pathwaycom/pathway
Pathway is a Python data processing framework designed for analytics and AI pipelines over data
streams. It is the ideal solution for real-time processing use cases such as streaming ETL or
Retrieval-Augmented Generation (RAG) pipelines for unstructured, changing data.
Key Components of this definition:
1. Python Framework: Pathway's engine is written in Rust 🦀 for speed and efficiency, but you use it
through Python, so it is powerful yet simple to work with if you know Python.
2. Data Processing: Pathway excels at processing large-scale, real-time data and is recognized as
the world’s fastest data processing engine. As a developer, you can use it for tasks like
performing joins on incoming data streams (real-time data flows) or updating vector/hybrid
indexes in real time. These are just simple examples; its potential goes much further.
3. AI Pipelines Over Data Streams: Pathway helps AI systems learn from real-time data streams,
enabling applications like sentiment analysis, anomaly detection, and RAG pipelines that
automatically adapt to incoming data.
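To make “updating vector/hybrid indexes in real time” concrete, here is a minimal pure-Python sketch of what incremental indexing means: each document is keyed by an id, and an insert, edit, or deletion touches only that document's entry instead of triggering a full rebuild. This is not Pathway's API. `IncrementalIndex`, `embed`, and the bag-of-words vectors are hypothetical stand-ins; a real pipeline would use an embedding model and Pathway's own indexing.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing tokens read as 0 from a Counter, so only shared tokens contribute.
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class IncrementalIndex:
    """Hypothetical index: one vector per document id, patched when that document changes."""

    def __init__(self) -> None:
        self.entries: dict[str, tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Only the changed document is re-embedded; every other entry is left as-is.
        self.entries[doc_id] = (text, embed(text))

    def delete(self, doc_id: str) -> None:
        self.entries.pop(doc_id, None)

    def search(self, query: str, k: int = 1) -> list[tuple[str, float]]:
        q = embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, (_, vec) in self.entries.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


index = IncrementalIndex()
index.upsert("aml-policy", "Cash transfers above 10000 EUR require manual review.")
index.upsert("q3-memo", "Retail revenue grew in the third quarter.")
print(index.search("Which transfers require manual review?"))

# A revised policy arrives: one upsert, no full rebuild of the index.
index.upsert("aml-policy", "Cash transfers above 5000 EUR require manual review.")
# A retracted document is removed the same way.
index.delete("q3-memo")
print(index.entries)
```

The design choice being illustrated is keying everything by document id: because updates are expressed as upserts and deletes, the cost of a change is proportional to the change, not to the size of the corpus.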
3. Problem Statement
Title: Build a Real-Time Retrieval-Augmented Generation (RAG) App with Pathway
At Pathway, we offer several app templates with end-to-end executable code in Dockerized
environments. However, for this challenge, we expect participants to explore and build solutions using
their own expertise, rather than relying on a pre-defined template.
Objective:
Create a fully functional, end-to-end real-time RAG application that uses Pathway as its core
orchestrator for data ingestion, incremental indexing (vector/hybrid), and REST API deployment. You are
free to use any domain (health, finance, etc.) and any agentic framework (e.g., LangGraph, CrewAI,
AutoGen, OpenAI Swarm) if you want to add agent functionality. However, your solution must
demonstrate how Pathway’s real-time pipeline automatically adapts to data updates and provides fresh
context to the LLM or agent.
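The “automatically adapts to data updates” requirement can be pictured with the following sketch, which mirrors a folder of text files, applies only the files that changed since the last poll, and rebuilds the LLM prompt from the fresh state. It assumes plain polling and word-overlap retrieval purely for illustration. `LiveCorpus` and `build_prompt` are hypothetical names; in a real submission Pathway's connectors and index would replace both, and the prompt would be sent to an actual LLM.

```python
import re
import tempfile
from pathlib import Path


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LiveCorpus:
    """Hypothetical helper: mirrors a folder of .txt files and applies only the deltas."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder
        self.docs: dict[str, str] = {}

    def poll(self) -> tuple[list[str], list[str]]:
        # Re-read the folder (fine for a sketch; a real connector would react to change events).
        snapshot = {p.name: p.read_text() for p in self.folder.glob("*.txt")}
        changed = sorted(n for n, text in snapshot.items() if self.docs.get(n) != text)
        removed = sorted(n for n in self.docs if n not in snapshot)
        for name in changed:
            self.docs[name] = snapshot[name]
        for name in removed:
            del self.docs[name]
        return changed, removed

    def retrieve(self, question: str, k: int = 2) -> list[str]:
        # Rank documents by word overlap with the question (stand-in for vector search).
        q = tokens(question)
        ranked = sorted(self.docs.values(), key=lambda text: len(q & tokens(text)), reverse=True)
        return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    bullets = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{bullets}\n\nQuestion: {question}"


question = "What is the current policy rate?"
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    corpus = LiveCorpus(folder)
    (folder / "rates.txt").write_text("The policy rate is 4.50 percent.")
    first_events = corpus.poll()
    before = build_prompt(question, corpus.retrieve(question))
    # The source file is edited; the next poll picks up only that change.
    (folder / "rates.txt").write_text("The policy rate is 4.25 percent.")
    second_events = corpus.poll()
    after = build_prompt(question, corpus.retrieve(question))
    third_events = corpus.poll()  # nothing changed, so nothing is re-applied

print(after)
```

Whatever the real ingestion mechanism, the property worth demonstrating is the same: the prompt is assembled from the current state of the data at query time, so an edit to a source file changes the context the LLM sees without a redeploy.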
Minimum Requirements
Participants may choose any domain (finance, healthcare, e-commerce, etc.) where real-time data makes
a difference. For instance, in finance, potential applications include:
● Compliance: Automate the interpretation of new regulations (e.g., AML, MiFID) and flag changes
via alerts.
● Due Diligence: Extract key metrics from pitch decks or risk reports.
● Analyst Reports: Generate dynamic investment analyses that reference real-time data.
● Asset Management: Merge diverse ESG data into up-to-date compliance summaries.
● You may integrate an AI agent framework of your choice (e.g., LangGraph, CrewAI, AutoGen,
OpenAI Swarm).
● The agent can orchestrate multi-step actions, call external tools, or handle complex conversation
flows. This is optional, not a deal-breaker.
● Your real-time Pathway pipeline should still remain the single source of truth for up-to-date
contextual data.
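A framework-agnostic sketch of this pattern: the agent runs a multi-step plan over tools, and its only knowledge tool reads the live index on every call, so an update between two questions shows up in the second answer. The dict `LIVE_INDEX`, the tools, and the hard-coded two-step plan are all hypothetical; with LangGraph, CrewAI, AutoGen, or OpenAI Swarm an LLM would choose the tools, and the retrieval tool would query the Pathway pipeline instead of a dict.

```python
import re
from typing import Callable

# Hypothetical stand-in for the Pathway-maintained index (doc id -> text).
LIVE_INDEX: dict[str, str] = {
    "aml": "Cash transactions above 10000 EUR must be reported.",
    "kyc": "Customer identity must be re-verified every 24 months.",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str) -> list[str]:
    """Knowledge tool: reads LIVE_INDEX on every call, never a cached copy."""
    q = words(query)
    hits = [(len(q & words(text)), text) for text in LIVE_INDEX.values()]
    return [text for score, text in sorted(hits, reverse=True) if score > 0]


def extract_amounts(passages: list[str]) -> list[str]:
    """Post-processing tool: pulls EUR amounts out of retrieved passages."""
    return re.findall(r"\d+ EUR", " ".join(passages))


TOOLS: dict[str, Callable] = {"retrieve": retrieve, "extract_amounts": extract_amounts}


def run_agent(question: str) -> dict:
    # Two-step plan, hard-coded here; an LLM planner would pick these tools dynamically.
    passages = TOOLS["retrieve"](question)          # step 1: fresh context
    amounts = TOOLS["extract_amounts"](passages)    # step 2: structured extraction
    return {"question": question, "context": passages, "amounts": amounts}


first = run_agent("Which cash transactions must be reported?")
LIVE_INDEX["aml"] = "Cash transactions above 5000 EUR must be reported."  # data update
second = run_agent("Which cash transactions must be reported?")
print(first["amounts"], second["amounts"])
```

Because the agent keeps no private copy of the retrieved context between turns, the real-time pipeline remains the single source of truth.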
Non-Negotiable: Participants integrating an AI agent framework must deploy the custom agentic
workflow by exposing their agent logic via a REST API endpoint using, ensuring seamless
interaction with the real-time RAG pipeline powered by Pathway.
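The shape of such an endpoint can be illustrated with Python's standard-library `http.server`, exposing a stubbed agent at a hypothetical `/v1/answer` route that accepts JSON such as `{"question": "..."}`. This is only a sketch of the request/response contract: `answer` is a placeholder for real agent logic, and an actual submission would use whichever REST mechanism its Pathway deployment builds on.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(question: str) -> dict:
    """Hypothetical agent entry point; a real one would query the live Pathway pipeline."""
    return {"question": question, "answer": f"(stub) received: {question}"}


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/v1/answer":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(answer(payload.get("question", ""))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:  # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), AgentHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/v1/answer"
request = urllib.request.Request(
    url,
    data=json.dumps({"question": "Latest AML threshold?"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

server.shutdown()
server.server_close()
print(reply)
```

Keeping the agent behind a single JSON-in, JSON-out function makes it straightforward to swap the stub for a real agent without touching the HTTP layer.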
4. Overall Evaluation Framework
I. Technical Implementation (70%)
5. Bonus Ideas
● Automated Alerts: Trigger Slack/email alerts if a newly ingested document changes the answer
to a compliance query.
● Enhanced Summaries: Summarize key financial metrics (e.g., EBITDA, ROI) in a structured table.
● Extending Pathway: Extend an existing class in Pathway to unlock new capabilities.
● Handling Complex Data: Show how to process non-textual formats (tables, charts) using VLMs
(vision-language models) where pure text-based LLMs might struggle.
● Agentic Error Handling: Implement fallback mechanisms (e.g., switching to a backup data
source) if a primary feed fails.
● Cutting-Edge Implementations: Explore state-of-the-art models (e.g., Gemini 2.0) or design a
“Faraday cage” approach with zero external API dependency.
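The first bonus idea, alerting when a newly ingested document changes the answer to a compliance query, reduces to remembering the last answer for each watched query and re-evaluating after every ingest. In this sketch `AnswerWatcher` and `toy_answer` are hypothetical: `toy_answer` merely lists the EUR amounts found in the corpus in place of an LLM call, and `notify` appends to a list where a real system would post to Slack or send an email.

```python
import re
from typing import Callable


def toy_answer(query: str, docs: dict[str, str]) -> str:
    """Stand-in for an LLM answer: the EUR amounts in the corpus, or 'unknown'."""
    amounts = sorted({m for text in docs.values() for m in re.findall(r"\d+ EUR", text)})
    return ", ".join(amounts) or "unknown"


class AnswerWatcher:
    """Re-runs watched queries after each ingest and calls `notify` when an answer changes."""

    def __init__(self, answer_fn: Callable[[str, dict], str], notify: Callable[[str], None]) -> None:
        self.answer_fn = answer_fn
        self.notify = notify
        self.docs: dict[str, str] = {}
        self.last_answers: dict[str, str] = {}

    def watch(self, query: str) -> None:
        self.last_answers[query] = self.answer_fn(query, self.docs)

    def ingest(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for query, old in list(self.last_answers.items()):
            new = self.answer_fn(query, self.docs)
            if new != old:
                self.notify(f"Answer to {query!r} changed after ingesting {doc_id!r}: {old!r} -> {new!r}")
                self.last_answers[query] = new


alerts: list[str] = []
watcher = AnswerWatcher(toy_answer, alerts.append)
watcher.watch("What is the AML reporting threshold?")
watcher.ingest("memo", "Team offsite moved to Friday.")            # answer unchanged: no alert
watcher.ingest("aml", "Report cash transactions above 10000 EUR.")  # answer changes: alert
watcher.ingest("aml", "Report cash transactions above 5000 EUR.")   # changes again: alert
for line in alerts:
    print(line)
```

Exact string comparison is the simplest change test. LLM answers can vary in wording between runs, so a real system might compare extracted fields (such as the threshold value) rather than the full answer text.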