
OllamaFlow

Intelligent Load Balancing and Model Orchestration for Ollama


🚀 Scale Your Ollama Infrastructure

OllamaFlow is a lightweight, intelligent orchestration layer that transforms multiple Ollama instances into a unified, high-availability AI inference cluster. Whether you're scaling AI workloads across multiple GPUs or ensuring zero-downtime model serving, OllamaFlow has you covered.

Why OllamaFlow?

  • 🎯 Multiple Virtual Endpoints: Create multiple frontend endpoints, each mapping to its own set of Ollama backends (see the sketch after this list)
  • ⚖️ Smart Load Balancing: Distribute requests intelligently across healthy backends
  • 🔄 Automatic Model Sync: Ensure all backends have the required models - automatically
  • ❤️ Health Monitoring: Real-time health checks with configurable thresholds
  • 📊 Zero Downtime: Seamlessly handle backend failures without dropping requests
  • 🛠️ RESTful Admin API: Full control through a comprehensive management API

🎨 Key Features

Load Balancing

  • Round-robin and random distribution strategies
  • Request routing based on backend health and capacity
  • Automatic failover for unhealthy backends
  • Configurable rate limiting per backend

Model Management

  • Automatic model discovery across all backends
  • Intelligent synchronization - pulls missing models automatically
  • Dynamic model requirements - update required models on the fly (see the example after this list)
  • Parallel downloads with configurable concurrency
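
The last bullet is easiest to see with a concrete call. The sketch below updates a frontend's required model list through the admin API; the /v1.0/frontends/{identifier} route and payload shape are assumptions made by analogy with the backends examples in the Admin API section further down, not documented routes.

# Hypothetical: update a frontend's required models on the fly
# (the /v1.0/frontends route and payload shape are assumed, not confirmed by this README)
curl -X PUT \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "main-frontend", "RequiredModels": ["llama3", "mistral", "gemma2"]}' \
  http://localhost:43411/v1.0/frontends/main-frontend

Once the frontend's RequiredModels list changes, the automatic model sync described above is what pulls any newly required models onto every mapped backend.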

High Availability

  • Real-time health monitoring with customizable check intervals
  • Automatic failover for unhealthy backends
  • Request queuing during high load
  • Connection pooling for optimal performance

Enterprise Ready

  • Bearer token authentication for admin APIs
  • Comprehensive logging with syslog support
  • Docker and Docker Compose ready
  • SQLite database for configuration persistence

πŸƒ Quick Start

Using Docker (Recommended)

# Pull the image
docker pull jchristn/ollamaflow

# Run with default configuration
docker run -d \
  -p 43411:43411 \
  -v $(pwd)/ollamaflow.json:/app/ollamaflow.json \
  jchristn/ollamaflow
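
The same container also runs under Docker Compose. This is a minimal sketch rather than an official compose file: the image and port come from the command above, while the service name, volume path, and restart policy are placeholders.

# docker-compose.yml - minimal sketch, not an official compose file
services:
  ollamaflow:
    image: jchristn/ollamaflow
    ports:
      - "43411:43411"
    volumes:
      - ./ollamaflow.json:/app/ollamaflow.json
    restart: unless-stopped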

Using .NET

# Clone the repository
git clone https://github.com/jchristn/ollamaflow.git
cd ollamaflow/src

# Build and run
dotnet build
cd OllamaFlow.Server/bin/Debug/net8.0
dotnet OllamaFlow.Server.dll

βš™οΈ Configuration

OllamaFlow uses a simple JSON configuration file. Here's a minimal example:

{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 43411
  },
  "Logging": {
    "MinimumSeverity": "Info",
    "ConsoleLogging": true
  }
}

Frontend Configuration

Frontends define your virtual Ollama endpoints:

{
  "Identifier": "main-frontend",
  "Name": "Production Ollama Frontend",
  "Hostname": "*",
  "LoadBalancing": "RoundRobin",
  "Backends": ["gpu-1", "gpu-2", "gpu-3"],
  "RequiredModels": ["llama3", "mistral", "codellama"]
}

Backend Configuration

Backends represent your actual Ollama instances:

{
  "Identifier": "gpu-1",
  "Name": "GPU Server 1",
  "Hostname": "192.168.1.100",
  "Port": 11434,
  "MaxParallelRequests": 4,
  "HealthCheckUrl": "/",
  "UnhealthyThreshold": 2
}

📡 API Compatibility

OllamaFlow is fully compatible with the Ollama API, supporting the following endpoints (a sample request follows the list):

  • ✅ /api/generate - Text generation
  • ✅ /api/chat - Chat completions
  • ✅ /api/pull - Model pulling
  • ✅ /api/push - Model pushing
  • ✅ /api/show - Model information
  • ✅ /api/tags - List models
  • ✅ /api/ps - Running models
  • ✅ /api/embed - Embeddings
  • ✅ /api/delete - Model deletion
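
For example, a standard Ollama generate request can simply be pointed at OllamaFlow's port instead of an individual instance (the model name below is just an example and should be one of your required models):

# Same request shape as plain Ollama, aimed at OllamaFlow instead
curl http://localhost:43411/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'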

🔧 Advanced Features

Multi-Node Testing

Test with multiple Ollama instances using Docker Compose:

cd Docker
docker compose -f compose-ollama.yaml up -d

This spins up 4 Ollama instances on ports 11435-11438 for testing.
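
To point OllamaFlow at those test instances, the backend definitions might look roughly like this; the identifiers are placeholders, and "localhost" assumes the containers are reachable from the OllamaFlow host:

[
  { "Identifier": "ollama-1", "Hostname": "localhost", "Port": 11435 },
  { "Identifier": "ollama-2", "Hostname": "localhost", "Port": 11436 },
  { "Identifier": "ollama-3", "Hostname": "localhost", "Port": 11437 },
  { "Identifier": "ollama-4", "Hostname": "localhost", "Port": 11438 }
]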

Admin API

Manage your cluster programmatically:

# List all backends
curl -H "Authorization: Bearer your-token" \
  http://localhost:43411/v1.0/backends

# Add a new backend
curl -X PUT \
  -H "Authorization: Bearer your-token" \
  -H "Content-Type: application/json" \
  -d '{"Identifier": "gpu-4", "Hostname": "192.168.1.104", "Port": 11434}' \
  http://localhost:43411/v1.0/backends
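
Frontends can presumably be managed the same way; the route below is an assumption made by analogy with /v1.0/backends, so treat the Postman collection mentioned below as the authoritative reference.

# List all frontends (route assumed by analogy with /v1.0/backends)
curl -H "Authorization: Bearer your-token" \
  http://localhost:43411/v1.0/frontends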

A complete Postman collection (OllamaFlow.postman_collection.json) is included in the repository root with examples for all API endpoints, both Ollama-compatible and administrative APIs.

🤝 Contributing

We welcome contributions of all kinds:

  • πŸ› Bug fixes
  • ✨ New features
  • πŸ“š Documentation improvements
  • πŸ’‘ Feature requests

Please check out our Contributing Guidelines and feel free to:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📊 Performance

OllamaFlow adds minimal overhead to your Ollama requests:

  • < 1ms routing decision time
  • Negligible memory footprint (~50MB)
  • High throughput - handles thousands of requests per second
  • Efficient streaming support for real-time responses

πŸ›‘οΈ Security

  • Bearer token authentication for administrative APIs
  • Request source IP forwarding for audit trails
  • Configurable request size limits
  • No external dependencies for core functionality

🌟 Use Cases

  • GPU Cluster Management: Distribute AI workloads across multiple GPU servers
  • CPU Infrastructure: Perfect for dense CPU systems like Ampere processors
  • High Availability: Ensure your AI services stay online 24/7
  • Development & Testing: Easily switch between different model configurations
  • Cost Optimization: Maximize hardware utilization across your infrastructure
  • Multi-Tenant Scenarios: Isolate workloads while sharing infrastructure

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • The Ollama team for creating an amazing local AI runtime
  • All our contributors and users who make this project possible

Ready to scale your AI infrastructure?
Get started with OllamaFlow today!
