Fast Sandbox is a high-performance, cloud-native (Kubernetes-native) sandbox management system designed to provide millisecond-scale cold container startup and controlled self-healing capabilities for AI Agents, Serverless functions, and compute-intensive tasks.
By pre-warming "Agent Pod" resource pools and directly integrating with host-level container management, Fast Sandbox bypasses the significant overhead of traditional Kubernetes Pod creation, achieving ultra-fast task distribution with physical isolation.
- Fast-Path API: gRPC-based fast path supporting <50ms end-to-end startup latency. Dual-mode switching between Fast Mode (Agent-First, ultra-fast) and Strong Mode (CRD-First, strong consistency).

- Developer CLI (`fsb-ctl`): Docker-like command-line experience with interactive creation, configuration management, streaming log viewing (`logs -f`), and status queries.
- Zero-Pull Startup: Leverages Host Containerd Integration to launch microcontainers directly on the host, reusing node image cache.
- Smart Scheduling: Allocation algorithm based on Image Affinity and Atomic Slots, eliminating image pull latency and avoiding port conflicts.
- Resilient Design:
  - Controlled Self-Healing: Supports `AutoRecreate` policy and manual `resetRevision`.
  - Graceful Shutdown: Complete SIGTERM → SIGKILL flow preventing zombie processes.
  - Node Janitor: Independent DaemonSet for automatic orphan container and file cleanup.

The system uses a "centralized control plane decision, extreme data plane execution" architecture:

- Fast-Path Server (gRPC): Handles high-concurrency sandbox create/delete requests, direct CLI access
  - Port: `9090`
  - Services: `CreateSandbox`, `DeleteSandbox`, `UpdateSandbox`, `ListSandboxes`, `GetSandbox`
- SandboxController: Manages CRD state machine, Finalizer resource cleanup, and dual-mode consistency coordination
- SandboxPoolController: Manages Agent Pod resource pools (Min/Max capacity)
- Atomic Registry: In-memory state center supporting high-concurrency mutex allocation and image weight scoring
- Privileged Pods running on hosts, communicating via HTTP with the control plane
- Runtime Integration: Direct Containerd Socket access for container lifecycle and log persistence
- HTTP Server: Listens on port `5758`
  - `POST /api/v1/agent/create` - Create sandbox
  - `POST /api/v1/agent/delete` - Delete sandbox
  - `GET /api/v1/agent/status` - Get agent status
  - `GET /api/v1/agent/logs?follow=true` - Stream logs
- fsb-ctl: Developer CLI with `run`, `list`, `get`, `logs`, `delete` commands

make build
# Generates bin/fsb-ctl
export PATH=$PWD/bin:$PATH
fsb-ctl run my-sandbox
# Opens editor for configuration (image, ports, command, env)
fsb-ctl logs my-sandbox -f

You can also use Kubernetes CRD directly:

apiVersion: sandbox.fast.io/v1alpha1
kind: Sandbox
metadata:
  name: my-sandbox
  namespace: default
spec:
  image: alpine:latest
  exposedPorts: [8080]
  poolRef: default-pool
  consistencyMode: fast # or strong
  failurePolicy: AutoRecreate

- CLI → Controller gRPC request
- Registry allocates Agent
- Controller → Agent HTTP create request
- Agent starts container via Containerd
- Controller returns success to CLI
- Controller async creates K8s CRD

Latency: <50ms

Trade-off: a CRD creation failure may leave an orphaned container (cleaned up by the Janitor).

- CLI → Controller gRPC request
- Controller creates K8s CRD (Pending phase)
- Controller Watch triggers
- Controller → Agent HTTP create request
- Agent starts container
- CRD status updated to Running

Latency: ~200ms

Guarantee: Strong consistency, no orphans

| Flag | Default | Description |
|---|---|---|
| `--agent-port` | `5758` | Agent HTTP server port |
| `--metrics-bind-address` | `:9091` | Prometheus metrics endpoint |
| `--health-probe-bind-address` | `:5758` | Health check endpoint |
| `--fastpath-consistency-mode` | `fast` | Consistency mode: `fast` or `strong` |
| `--fastpath-orphan-timeout` | `10s` | Fast mode orphan cleanup timeout |

| Flag | Default | Description |
|---|---|---|
| `--containerd-socket` | `/run/containerd/containerd.sock` | Containerd socket path |
| `--http-port` | `5758` | HTTP server port |

| Variable | Description |
|---|---|
| `AGENT_CAPACITY` | Max sandboxes per agent (default: 5) |

service FastPathService {
rpc CreateSandbox(CreateRequest) returns (CreateResponse);
rpc DeleteSandbox(DeleteRequest) returns (DeleteResponse);
rpc UpdateSandbox(UpdateRequest) returns (UpdateResponse);
rpc ListSandboxes(ListRequest) returns (ListResponse);
rpc GetSandbox(GetRequest) returns (SandboxInfo);
}

- `FAST`: Create container first, async CRD write
- `STRONG`: Write CRD first, then create container
- `MANUAL`: Report status only, no auto-recovery
- `AUTO_RECREATE`: Automatically reschedule on failure

# All tests
go test ./... -v
# With coverage
go test ./... -coverprofile=coverage.out
# Specific module
go test ./internal/controller/agentpool/ -v

See docs/TESTING.md for detailed testing documentation.

# CPU profiling (fetch a 30-second profile and save it as cpu.prof)
go tool pprof -proto "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.prof
# View profile
go tool pprof -http=:8080 cpu.prof

See docs/PERFORMANCE.md for performance analysis.

- Phase 1: Core Runtime (Containerd) & gRPC framework
- Phase 2: Fast-Path API & Registry scheduling
- Phase 3: CLI (`fsb-ctl`) & interactive experience
- Phase 4: Log streaming & auto tunneling
- Phase 5: Unified logging (klog)
- Phase 6: Performance instrumentation & unit tests
- Phase 7: Custom volume mounting
- Phase 8: Container checkpoint/restore (CRIU)
- Phase 9: Web console & traffic proxy
- Phase 10: gVisor support for secure sandboxing
- Phase 11: CLI exec bash & Python SDK (Modal-like)
- Phase 12: GPU container support