This document provides an overview of the WINK Starter project, a demonstration trivia agent that showcases how to build background agents using the Weak Incentives (WINK) framework. This page introduces the system architecture, key components, and data flow patterns. For detailed setup instructions, see Getting Started. For implementation details of specific subsystems, refer to Agent Definition, Evaluation System, and Configuration.
Sources: README.md:1-6 AGENTS.md:1-8
The WINK Starter implements a secret trivia game where an AI agent possesses hidden knowledge that users must discover through natural language questions. The agent knows four secrets loaded via the skills system:
| Secret Category | Answer |
|---|---|
| Secret Number | 42 |
| Secret Word | banana |
| Secret Color | purple |
| Magic Phrase | Open sesame! |
When users ask questions like "What is the secret number?", the agent responds with the correct answer. The agent can also provide hints through custom tools and participates in a Lucky Dice mini-game that demonstrates tool policies.
This simple trivia concept naturally exercises all major WINK capabilities: skills for knowledge loading, custom tools for hint delivery, tool policies for ordering constraints, progressive disclosure for hiding complexity, feedback providers for behavioral guidance, and evaluators for testing correctness.
Sources: README.md:7-24 AGENTS.md:9-25
The WINK framework separates agent definition (what you own and iterate) from execution harness (what the runtime provides). This architectural separation enables portability across different execution environments while keeping your agent logic stable.
Your agent definition consists of:
- skills/secret-trivia/SKILL.md and workspace/CLAUDE.md (knowledge and persona)
- tools.py, sections.py, feedback.py, and evaluators.py (capabilities, structure, and guidance)

The runtime provides:

- ClaudeAgentSDKAdapter (the execution engine)
- TriviaMailboxes (message queues backed by Redis)
- Sandboxing, task completion checking, crash recovery, and guardrails

WINK's thesis is that harnesses evolve rapidly (and increasingly come from vendor runtimes), but your agent definition should remain stable. The framework makes the definition a first-class artifact you can version, test, and port across runtimes via adapters.
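The definition/harness split can be pictured as plain data consumed through an adapter interface. The sketch below is illustrative only: `AgentDefinition`, `HarnessAdapter`, and `EchoHarness` are hypothetical names, not the real WINK API.

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical sketch: the agent definition is inert data that any
# harness adapter can consume. A real adapter (e.g. ClaudeAgentSDKAdapter)
# would drive an LLM runtime instead of echoing.


@dataclass(frozen=True)
class AgentDefinition:
    skills: tuple[str, ...]   # e.g. ("skills/secret-trivia/SKILL.md",)
    workspace: str            # e.g. "workspace/CLAUDE.md"


class HarnessAdapter(Protocol):
    """Anything that can execute an AgentDefinition against a runtime."""

    def run(self, definition: AgentDefinition, question: str) -> str: ...


class EchoHarness:
    """Stand-in runtime used only to show the adapter seam."""

    def run(self, definition: AgentDefinition, question: str) -> str:
        return f"[{len(definition.skills)} skill(s) loaded] {question}"


definition = AgentDefinition(
    skills=("skills/secret-trivia/SKILL.md",),
    workspace="workspace/CLAUDE.md",
)
harness: HarnessAdapter = EchoHarness()
print(harness.run(definition, "What is the secret number?"))
```

Because the definition is just data, swapping `EchoHarness` for another adapter requires no change to the definition itself.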
Sources: README.md:26-39 pyproject.toml:6-9
Complete System Architecture: The system is organized into distinct layers. External interfaces (CLI commands) submit requests to Redis message queues. The worker.py::main() process runs AgentLoop for production requests and EvalLoop for evaluation requests, both executing concurrently via LoopGroup. The agent definition layer (developer-owned: skills/, tools.py, sections.py, feedback.py, evaluators.py, workspace/) feeds into the execution harness (framework-provided: ClaudeAgentSDKAdapter, SimpleTaskCompletionChecker, sandbox from isolation.py, prompt template builder). The harness communicates with the Anthropic API and writes observability data to debug_bundles/.
Sources: src/trivia_agent/worker.py:1-100 src/trivia_agent/dispatch.py:1-150 src/trivia_agent/mailboxes.py:1-80 src/trivia_agent/adapters.py:1-120 src/trivia_agent/isolation.py:1-60
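The "both loops in one process" pattern can be sketched with stdlib asyncio standing in for WINK's LoopGroup. Loop bodies and names here are simplified placeholders, not the project's actual worker code.

```python
import asyncio

# Illustrative stand-in for LoopGroup: the worker process runs the
# production loop and the evaluation loop concurrently until their
# queues drain.


async def agent_loop(requests: asyncio.Queue, results: list) -> None:
    while not requests.empty():
        question = await requests.get()
        results.append(f"answered: {question}")


async def eval_loop(samples: asyncio.Queue, results: list) -> None:
    while not samples.empty():
        sample = await samples.get()
        results.append(f"scored: {sample}")


async def worker_main() -> list:
    requests, samples = asyncio.Queue(), asyncio.Queue()
    await requests.put("What is the secret word?")
    await samples.put("secret-number sample")
    results: list = []
    # LoopGroup equivalent: run both loops concurrently in one process.
    await asyncio.gather(
        agent_loop(requests, results),
        eval_loop(samples, results),
    )
    return results


print(asyncio.run(worker_main()))
```

In the real worker the queues are Redis-backed and the loops run indefinitely; the structure, one process hosting both loops, is the point here.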
WINK Framework Separation: WINK separates what developers own (agent definition) from what the runtime provides (execution harness). The agent definition includes four layers: Knowledge (skills/, workspace/), Capability (tools.py with policies), Structure (sections.py with progressive disclosure), and Guidance (feedback.py, evaluators.py). The runtime provides: Execution Engine (ClaudeAgentSDKAdapter with planning/act loop), Safety & Isolation (sandbox from isolation.py, task completion checker), and Infrastructure (TriviaMailboxes from mailboxes.py, crash recovery, guardrails). This separation enables portability—the same agent definition can run on different harnesses via adapters.
Sources: src/trivia_agent/sections.py:1-250 src/trivia_agent/tools.py:1-180 src/trivia_agent/feedback.py:1-60 src/trivia_agent/evaluators.py:1-120 src/trivia_agent/adapters.py:1-120 src/trivia_agent/isolation.py:1-60 src/trivia_agent/mailboxes.py:1-80
The system uses an asynchronous, queue-based architecture with separate flows for regular questions and evaluations.
Request-Response Flow: Regular questions create unique reply queues (reply-{uuid}), get processed by AgentLoop (which loads the agent definition from skills/, sections.py, tools.py, feedback.py), interact with the Anthropic API via ClaudeAgentSDKAdapter, and return TriviaResponse objects. The evaluation flow reuses the same AgentLoop but wraps it in EvalLoop (from eval_loop.py), which compares the agent's output against expected answers using trivia_evaluator from evaluators.py. The evaluator applies two criteria (correctness and brevity) to produce a Score object. Both flows use Redis (TriviaMailboxes from mailboxes.py) for asynchronous communication with unique per-request reply channels.
Sources: src/trivia_agent/dispatch.py:1-150 src/trivia_agent/worker.py:1-100 src/trivia_agent/eval_loop.py:1-90 src/trivia_agent/evaluators.py:1-120 src/trivia_agent/mailboxes.py:1-80 src/trivia_agent/models.py:1-80
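The two-criteria evaluator described above can be sketched as follows. This is a hedged approximation of the shape, not the real `trivia_evaluator`: the word-count threshold and the `Score` fields are invented for illustration.

```python
from dataclasses import dataclass

# Sketch: combine a correctness check and a brevity check into a
# single immutable Score, mirroring the two criteria described above.


@dataclass(frozen=True)
class Score:
    correct: bool
    brief: bool
    details: str

    @property
    def passed(self) -> bool:
        return self.correct and self.brief


def trivia_evaluator(output: str, expected: str, max_words: int = 20) -> Score:
    correct = expected.lower() in output.lower()
    brief = len(output.split()) <= max_words  # brevity criterion (assumed limit)
    return Score(correct=correct, brief=brief,
                 details=f"correct={correct}, brief={brief}")


score = trivia_evaluator("The secret number is 42.", expected="42")
print(score.passed)  # → True
```

A frozen `Score` gives the eval loop a stable record it can serialize onto the `eval-reply-{id}` queue without worrying about downstream mutation.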
The agent is composed from static definitions that are dynamically assembled at runtime into a prompt template.
Agent Composition: The agent is composed from static definitions in skills/secret-trivia/SKILL.md, workspace/CLAUDE.md, tools.py, feedback.py, and evaluators.py. These feed into build_prompt_template which creates six sections defined in sections.py: QuestionSection (dynamic content), GameRulesSection (progressive disclosure with SectionVisibility.SUMMARY), HintsSection (with hint_lookup_tool), LuckyDiceSection (with dice tools and SequentialDependencyPolicy), TaskExamplesSection (demonstrations), and WorkspaceSection (per-request file access). At request time, section_params are bound and the template is rendered into a final prompt. During execution, ClaudeAgentSDKAdapter from adapters.py handles tool invocation, policy enforcement, and feedback injection.
Sources: src/trivia_agent/sections.py:1-250 src/trivia_agent/tools.py:1-180 src/trivia_agent/feedback.py:1-60 src/trivia_agent/evaluators.py:1-120 src/trivia_agent/adapters.py:1-120 skills/secret-trivia/SKILL.md:1-30 workspace/CLAUDE.md:1-10
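The ordering constraint that LuckyDiceSection attaches to its dice tools can be sketched as a small stateful policy: throwing is only valid after picking up. The class and method names below are assumptions for illustration, not WINK's actual policy API.

```python
# Minimal sketch of a sequential tool-dependency policy: the dependent
# tool (throw) is rejected until the prerequisite tool (pick up) has
# been called at least once.


class SequentialDependencyPolicy:
    def __init__(self, first: str, then: str) -> None:
        self.first, self.then = first, then
        self.seen_first = False

    def allow(self, tool_name: str) -> bool:
        if tool_name == self.first:
            self.seen_first = True
            return True
        if tool_name == self.then:
            return self.seen_first  # dependent tool needs the prerequisite
        return True  # unrelated tools are unaffected


policy = SequentialDependencyPolicy("pick_up_dice_tool", "throw_dice_tool")
print(policy.allow("throw_dice_tool"))    # → False (dice not picked up yet)
print(policy.allow("pick_up_dice_tool"))  # → True
print(policy.allow("throw_dice_tool"))    # → True (ordering satisfied)
```

In the real system the harness, not the tool, enforces the policy, so the agent cannot bypass it regardless of what the model proposes.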
The system uses strongly-typed data structures throughout the request/response pipeline.
Data Type Transformations: Input types (TriviaRequest, EvalRequest with Sample from models.py) are wrapped in AgentLoopRequest with metadata (UUID, deadline). Processing occurs within a Session context, which maintains tool call history and workspace state. The rendered prompt can trigger tool calls with typed parameters (HintLookupParams from tools.py) that return typed results (HintLookupResult, ThrowDiceResult) via a ToolContext. Final outputs diverge: regular requests produce AgentLoopResult containing TriviaResponse, while evaluation requests produce EvalResult containing Score objects (from evaluators.py) with detailed pass/fail information. All types use frozen dataclasses for immutability.
Sources: src/trivia_agent/models.py:1-80 src/trivia_agent/tools.py:1-180 src/trivia_agent/evaluators.py:1-120
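The frozen-dataclass convention can be sketched like this. Only `TriviaRequest`, `AgentLoopRequest`, and the UUID metadata are named in the text; the exact fields are assumptions for illustration.

```python
import dataclasses
import uuid

# Sketch of the typed request pipeline: a domain request wrapped in a
# loop-level envelope carrying routing metadata.


@dataclasses.dataclass(frozen=True)
class TriviaRequest:
    question: str


@dataclasses.dataclass(frozen=True)
class AgentLoopRequest:
    payload: TriviaRequest
    request_id: str  # the per-request UUID used for reply queues


request = AgentLoopRequest(
    payload=TriviaRequest(question="What is the secret color?"),
    request_id=str(uuid.uuid4()),
)

# Frozen dataclasses reject mutation, keeping in-flight requests
# immutable as they cross queue boundaries.
try:
    request.payload = TriviaRequest(question="tampered")
except dataclasses.FrozenInstanceError:
    print("immutable")
```

Immutability means a request dequeued by the worker is guaranteed to be byte-for-byte what the client submitted, which simplifies debugging from bundles.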
The system uses Redis-backed mailboxes for asynchronous communication:
| Queue Name | Type | Purpose | Defined In |
|---|---|---|---|
| trivia:requests | AgentLoopRequest | Production questions | mailboxes.py::TriviaMailboxes |
| trivia:eval:requests | EvalRequest | Evaluation samples | mailboxes.py::TriviaMailboxes |
| reply-{uuid} | AgentLoopResult | Per-request responses | Dynamically created |
| eval-reply-{id} | EvalResult | Per-evaluation scores | Dynamically created |
Each request creates a unique reply queue identified by UUID, enabling concurrent request processing without message conflicts. The TriviaMailboxes class in mailboxes.py encapsulates queue names and provides typed enqueue/dequeue operations.
Sources: src/trivia_agent/mailboxes.py:1-80 src/trivia_agent/config.py:1-50
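The per-request reply-queue pattern can be sketched with an in-memory stand-in for Redis. The `Mailboxes` class and `submit` helper below are illustrative, not the real `TriviaMailboxes` API; a real implementation would use Redis list operations instead of Python deques.

```python
import uuid
from collections import defaultdict, deque

# In-memory stand-in for Redis-backed mailboxes: each request gets a
# unique reply-{uuid} queue so concurrent callers never read each
# other's responses.


class Mailboxes:
    def __init__(self) -> None:
        self.queues: dict[str, deque] = defaultdict(deque)

    def enqueue(self, queue: str, message: object) -> None:
        self.queues[queue].append(message)

    def dequeue(self, queue: str) -> object:
        return self.queues[queue].popleft()


def submit(mail: Mailboxes, question: str) -> str:
    """Client side: enqueue a request and return its private reply queue."""
    reply_queue = f"reply-{uuid.uuid4()}"
    mail.enqueue("trivia:requests", {"question": question, "reply_to": reply_queue})
    return reply_queue


mail = Mailboxes()
reply_queue = submit(mail, "What is the magic phrase?")

# Worker side: pop the request and answer on its private reply queue.
request = mail.dequeue("trivia:requests")
mail.enqueue(request["reply_to"], "Open sesame!")

print(mail.dequeue(reply_queue))  # → Open sesame!
```

Because the reply channel is derived from a UUID, any number of clients can share the single `trivia:requests` queue without message conflicts.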
The codebase is organized to separate concerns between agent definition (what you customize), infrastructure (framework integration), and domain assets:
| Directory/File | Purpose | Key Contents |
|---|---|---|
| src/trivia_agent/worker.py | MainLoop entry point | main() function that starts the agent |
| src/trivia_agent/eval_loop.py | EvalLoop factory | eval_loop_factory() wrapper |
| src/trivia_agent/dispatch.py | Client CLI | main() for submitting requests |
| src/trivia_agent/models.py | Data structures | TriviaRequest, TriviaResponse |
| src/trivia_agent/sections.py | Prompt structure | QuestionSection, GameRulesSection, HintsSection, LuckyDiceSection |
| src/trivia_agent/tools.py | Tool implementations | hint_lookup_tool, pick_up_dice_tool, throw_dice_tool |
| src/trivia_agent/feedback.py | Behavioral guidance | TriviaHostReminder |
| src/trivia_agent/evaluators.py | Scoring logic | trivia_evaluator() |
| src/trivia_agent/adapters.py | Framework integration | ClaudeAgentSDKAdapter |
| src/trivia_agent/mailboxes.py | Message passing | TriviaMailboxes with Redis backend |
| src/trivia_agent/isolation.py | Sandbox configuration | Skills discovery, filesystem isolation |
| src/trivia_agent/config.py | Configuration loading | Redis URLs, queue names, directories |
| skills/secret-trivia/SKILL.md | Domain knowledge | Secret answers |
| workspace/CLAUDE.md | Agent persona | Trivia host instructions |
| debug_bundles/ | Observability artifacts | .zip files with execution traces |
| Makefile | Development orchestration | agent, dispatch, dispatch-eval targets |
The separation between worker.py (execution), sections.py/tools.py (definition), and adapters.py (integration) reflects WINK's architecture: you own the definition, the framework provides the execution harness, and adapters bridge the two.
Sources: README.md:41-65 AGENTS.md:49-64
All agent executions generate debug bundles as .zip archives in the debug_bundles/ directory. Each bundle contains the execution trace for a single run.
Bundles are named {run_id}_{timestamp}.zip and can be queried using the WINK CLI.
This observability layer is critical for debugging agent behavior, understanding tool usage patterns, and iterating on prompt design. See Debug Bundles for detailed documentation.
Sources: AGENTS.md:92-101 README.md:300
Sources: README.md:352-420