Inspiration

Every software team sits on data worth predicting from, yet most of that data never translates into real-world impact. Not because the ideas are missing, but because constructing a reliable, production-grade machine learning pipeline is itself a non-trivial engineering problem.

Understanding the dataset, designing preprocessing strategies, selecting appropriate algorithms, tuning hyperparameters, validating results, debugging failures, and ultimately deploying a model into a usable system requires expertise that most teams simply do not have readily available. As a result, promising ML initiatives are deprioritized, datasets remain untouched, and potential insights never materialize.

Existing AutoML solutions only partially address this gap. While they automate model training, they typically omit reasoning, interpretability, debugging, and deployment. The outcome is a model artifact and a set of metrics, without the context needed to trust or iterate on the results.

Flow ML was built around a different premise: a machine learning system should not merely output a model, but should produce a complete, auditable, and deployable artifact, while exposing and explaining every decision made throughout the pipeline.


What it does

Flow ML is an AI agent-driven AutoML platform in which a coordinated system of specialized agents executes the entire machine learning lifecycle, transforming raw tabular data into a trained, evaluated, and deployable model while maintaining full transparency.

Users upload a dataset, designate a target column, and initiate a pipeline execution. The system progresses through stages including dataset analysis, preprocessing, feature engineering, model selection, training, evaluation, and delivery. Each stage is orchestrated by a dedicated AI agent and visualized in real time.

A conversational orchestration layer allows users to interrogate decisions, modify pipeline behavior, and trigger controlled reruns. Each modification is tracked as a revision with explicit diffs and associated performance changes.
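The revision tracking described above can be sketched as a simple structural diff between pipeline configurations. This is an illustrative sketch, not Flow ML's actual implementation; the config shape and diff format are assumptions.

```python
def config_diff(old, new):
    """Explicit diff between two pipeline revisions (illustrative).

    Returns only the keys whose values changed, with before/after values,
    which is the kind of explicit diff a revision history can display.
    """
    changed = {}
    for key in old.keys() | new.keys():
        if old.get(key) != new.get(key):
            changed[key] = {"from": old.get(key), "to": new.get(key)}
    return changed


# Hypothetical revision: the user asked to swap the model family.
rev_1 = {"model": "xgboost", "learning_rate": 0.1}
rev_2 = {"model": "lightgbm", "learning_rate": 0.1}
diff = config_diff(rev_1, rev_2)
```

Pairing each such diff with the before/after evaluation metrics is what makes a revision comparable across runs.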

In addition to standard evaluation metrics, Flow ML generates measurable decision outputs by simulating prediction-driven strategies. Using model probabilities, the system produces signals and evaluates them using metrics such as hit rate, cumulative PnL, drawdown, and Sharpe-like ratios.
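The decision-output simulation can be sketched as follows. The threshold rule, flat-position assumption, and exact metric definitions here are illustrative assumptions, not Flow ML's actual strategy logic.

```python
import math

def simulate_signals(probs, returns, threshold=0.6):
    """Take the period's return when P(positive) exceeds the threshold; stay flat otherwise."""
    return [r if p > threshold else 0.0 for p, r in zip(probs, returns)]

def decision_metrics(pnl):
    """Hit rate, cumulative PnL, max drawdown, and a Sharpe-like ratio from a PnL series."""
    trades = [x for x in pnl if x != 0.0]
    hit_rate = sum(x > 0 for x in trades) / len(trades) if trades else 0.0
    # Cumulative PnL and the largest peak-to-trough drawdown along the equity curve.
    total, peak, drawdown = 0.0, float("-inf"), 0.0
    for x in pnl:
        total += x
        peak = max(peak, total)
        drawdown = max(drawdown, peak - total)
    # Sharpe-like ratio: mean per-period PnL over its standard deviation.
    mean = sum(pnl) / len(pnl)
    var = sum((x - mean) ** 2 for x in pnl) / len(pnl)
    sharpe_like = mean / math.sqrt(var) if var > 0 else 0.0
    return {"hit_rate": hit_rate, "cumulative_pnl": total,
            "max_drawdown": drawdown, "sharpe_like": sharpe_like}
```

For example, probabilities `[0.7, 0.4, 0.8, 0.9]` against returns `[0.02, -0.01, -0.01, 0.03]` produce three trades, two of them winners, for a hit rate of 2/3.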


How we built it

Flow ML is implemented as a multi-agent orchestration system with a FastAPI backend and a React + TypeScript frontend. A central orchestrator coordinates eight specialized AI agents, each responsible for a stage of the pipeline while sharing structured state across stages.
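The orchestrator pattern described above reduces to a loop over stage agents that read from and extend a shared state object. This is a minimal sketch under assumed names; the real agents build LLM prompts rather than returning placeholder strings.

```python
class Agent:
    """One pipeline stage. A real agent would build a prompt from prior
    outputs in the shared state and invoke an LLM; here we just record output."""
    def __init__(self, name):
        self.name = name

    def run(self, state):
        state[self.name] = f"{self.name} output (saw {len(state)} prior entries)"
        return state

class Orchestrator:
    """Runs the agents in order, threading shared structured state through each stage."""
    def __init__(self, agents):
        self.agents = agents

    def execute(self, dataset):
        state = {"dataset": dataset}
        for agent in self.agents:
            state = agent.run(state)  # each stage sees all prior outputs
        return state
```

The key design choice is that every agent sees the accumulated state of all prior stages, so later decisions (e.g. model selection) can be conditioned on earlier reasoning (e.g. dataset analysis).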

Each agent constructs structured prompts from prior outputs, invokes an LLM via OpenRouter, and parses responses into typed contracts. A multi-stage JSON recovery system ensures robustness against malformed outputs.
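A multi-stage JSON recovery system might look like the following sketch; the exact stages in Flow ML may differ, but the principle is to escalate from strict parsing to progressively more forgiving extraction.

```python
import json
import re

def recover_json(text):
    """Recover a JSON object from potentially malformed LLM output."""
    # Stage 1: the happy path, a direct parse.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Stage 2: strip markdown code fences the LLM may have wrapped around the JSON.
    stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        pass
    # Stage 3: extract the outermost {...} span from surrounding prose.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass
    raise ValueError("unrecoverable LLM output")
```

Parsing into typed contracts then happens on the recovered object, so downstream agents never see free-form text.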

Model training is handled through a comparator system that evaluates multiple candidate models in parallel. Each candidate includes a structured hyperparameter search space passed directly into Optuna, ensuring alignment between reasoning and optimization.
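The coupling between agent reasoning and Optuna can be sketched as a typed search-space contract translated into `trial.suggest_*` calls. The field names below are illustrative assumptions, and a dummy trial stands in for `optuna.trial.Trial` so the snippet runs without Optuna installed.

```python
import random

# Hypothetical search-space contract an agent might emit for one candidate model.
SPACE = {
    "n_estimators": {"type": "int", "low": 100, "high": 500},
    "learning_rate": {"type": "float", "low": 1e-3, "high": 0.3, "log": True},
    "max_depth": {"type": "categorical", "choices": [3, 5, 7]},
}

def suggest_params(trial, space):
    """Translate the agent's structured search space into Optuna suggest_* calls."""
    params = {}
    for name, spec in space.items():
        if spec["type"] == "int":
            params[name] = trial.suggest_int(name, spec["low"], spec["high"])
        elif spec["type"] == "float":
            params[name] = trial.suggest_float(
                name, spec["low"], spec["high"], log=spec.get("log", False))
        else:
            params[name] = trial.suggest_categorical(name, spec["choices"])
    return params

class DummyTrial:
    """Stand-in for optuna.trial.Trial so this sketch is self-contained."""
    def suggest_int(self, name, low, high):
        return random.randint(low, high)
    def suggest_float(self, name, low, high, log=False):
        return random.uniform(low, high)
    def suggest_categorical(self, name, choices):
        return random.choice(choices)
```

Because the agent emits the space as data rather than prose, the optimizer explores exactly the region the reasoning step proposed.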

A conversational orchestration layer translates natural language into structured revision plans. A controlled action registry enforces deterministic execution, while a dependency-aware rerun engine recomputes only affected stages.
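The dependency-aware rerun engine can be sketched as stale-set propagation over the stage graph. The stage names and edges below are assumptions for illustration; only stages downstream of the modified one are recomputed.

```python
# Assumed stage dependency graph: each stage lists the stages it depends on.
DEPENDS_ON = {
    "analysis": [],
    "preprocessing": ["analysis"],
    "feature_engineering": ["preprocessing"],
    "model_selection": ["feature_engineering"],
    "training": ["model_selection"],
    "evaluation": ["training"],
}

def stages_to_rerun(modified):
    """Return the modified stage plus everything transitively downstream of it."""
    stale, frontier = {modified}, [modified]
    while frontier:
        frontier.pop()
        # Any stage depending on a stale stage becomes stale itself.
        for stage, deps in DEPENDS_ON.items():
            if stage not in stale and any(d in stale for d in deps):
                stale.add(stage)
                frontier.append(stage)
    return stale
```

So a revision that edits feature engineering triggers reruns of model selection, training, and evaluation, while analysis and preprocessing results are reused as-is.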


Challenges we ran into

One challenge was achieving a real-time experience without introducing excessive system complexity. We used in-memory pipeline state and frontend polling, which enabled rapid iteration but introduced scalability limitations.

Another challenge was ensuring safe and predictable pipeline modifications. We addressed this by separating interpretation from execution, allowing the AI to generate structured intent while a deterministic system enforces safe operations.


Accomplishments that we're proud of

We transformed the machine learning pipeline from a black box into a fully interpretable system. Every decision, from model selection to feature engineering, is accompanied by explicit reasoning.

We achieved strong alignment between reasoning and execution by directly coupling model selection outputs with structured optimization search spaces.

We also built a revision system where every pipeline change is tracked, comparable, and reversible, enabling reproducible experimentation.


What we learned

We learned how to balance context across multiple AI agents. Passing too much information increases cost and noise, while too little reduces decision quality.

We also learned that AI systems are most effective when used for interpretation rather than execution. Introducing deterministic control layers significantly improved reliability.

Designing the dependency-aware rerun system helped us better understand pipeline dependencies and optimize recomputation.


What's next for Flow ML

We plan to introduce persistent pipeline versioning to enable cross-run comparisons and deeper analysis of performance changes.

We aim to deploy Flow ML as a hosted platform with multi-user support and persistent storage, while maintaining a self-hosted option for privacy-sensitive use cases.

We also plan to expand into deep learning and LLM fine-tuning, enabling more advanced modeling capabilities supported by GPU infrastructure.
