A minimal starter project demonstrating how to build background agents with Weak Incentives (WINK). This example implements a secret trivia game where the agent knows hidden answers that players must discover.
Note: This repository is optimized for Claude to implement features end-to-end. It maintains 100% test coverage, includes comprehensive project context in
AGENTS.md(symlinked asCLAUDE.md), and follows patterns that enable Claude to understand, modify, and test the codebase autonomously. Claude can explore WINK documentation viauv run wink docsand inspect execution artifacts viauv run wink query debug_bundles/*.zip.
The agent possesses secret knowledge loaded via skills:
- Secret Number: 42
- Secret Word: banana
- Secret Color: purple
- Magic Phrase: Open sesame!
Players ask questions like "What is the secret number?" and the agent responds with the answer. The agent can also provide hints if asked.
This simple concept naturally demonstrates all of WINK's key capabilities:
- Skills load the secret answers into the agent
- Tools provide hints without revealing answers
- Tool policies enforce ordering constraints (Lucky Dice mini-game)
- Progressive disclosure hides game rules until needed
- Feedback providers remind the agent to stay on task
- Evaluators check answers with custom scoring logic
Weak Incentives (WINK) is the agent-definition layer for unattended agents. It separates what you own from what the runtime provides:
Your Agent Definition (what you own and iterate):
- Prompt: Structured decision procedure, not a loose string
- Tools: The capability surface where side effects occur
- Feedback: Soft course correction during execution
- Evaluators: Scoring functions for testing agent behavior
The Execution Harness (what the runtime provides):
- Planning/act loop, sandboxing, tool-call orchestration
- Scheduling, crash recovery, operational guardrails
WINK's thesis: Harnesses keep changing (and increasingly come from vendor runtimes), but your agent definition should not. WINK makes the definition a first-class artifact you can version, review, test, and port across runtimes via adapters.
starter/
├── README.md # You are here
├── AGENTS.md # Project context for Claude
├── CLAUDE.md -> AGENTS.md # Symlink for Claude Code
├── Makefile # Commands for running and development
├── pyproject.toml # Dependencies and project config
├── debug_bundles/ # Execution artifacts (gitignored zips)
├── workspace/ # Agent persona (CLAUDE.md)
├── skills/
│ └── secret-trivia/ # Secret answers loaded via skill
│ └── SKILL.md
├── src/
│ └── trivia_agent/
│ ├── agent_loop.py # AgentLoop entry point
│ ├── eval_loop.py # EvalLoop factory
│ ├── dispatch.py # Submit questions to the agent
│ ├── models.py # Request/Response dataclasses
│ ├── sections.py # Question, GameRules, Hints, LuckyDice sections
│ ├── tools.py # hint_lookup, pick_up_dice, throw_dice tools
│ ├── feedback.py # TriviaHostReminder feedback provider
│ ├── evaluators.py # Trivia evaluator (checks secrets)
│ └── ... # Config, adapters, mailboxes, isolation
├── tests/ # Unit tests (100% coverage)
├── integration-tests/ # Integration tests (requires Redis + API key)
└── prompt_overrides/ # Custom prompt overrides (optional)
make installmake redisexport ANTHROPIC_API_KEY=your-api-keyIn one terminal, start the worker:
make agentIn another terminal, ask about secrets:
make dispatch QUESTION="What is the secret number?"
# Answer: 42
make dispatch QUESTION="What is the secret word?"
# Answer: banana
make dispatch QUESTION="Can I get a hint about the color?"
# Hint: Mix red and blue together.Skills load domain knowledge into the agent. Each skill has YAML frontmatter for metadata, then markdown content:
---
name: secret-trivia
description: Answer secret trivia questions that only this agent knows.
---
### The Secret Number
If asked "What is the secret number?", the answer is: **42**
### The Secret Word
If asked "What is the secret word?", the answer is: **banana**The name and description are used by WINK for skill discovery and documentation.
The hint lookup tool provides clues without revealing answers:
@FrozenDataclass()
class HintLookupParams:
category: str # 'number', 'word', 'color', 'phrase'
@FrozenDataclass()
class HintLookupResult:
found: bool
hint: str
hint_lookup_tool = Tool[HintLookupParams, HintLookupResult](
name="hint_lookup",
description="Get a hint for a trivia category.",
handler=_handle_hint_lookup,
)The lucky dice tools (pick_up_dice, throw_dice) let players roll for bonus points - see section 4.1 below.
Game rules start hidden and expand on demand. WINK provides a built-in read_section tool that agents can use to expand summarized sections:
def build_game_rules_section() -> MarkdownSection[EmptyParams]:
return MarkdownSection[EmptyParams](
title="Game Rules",
key="rules",
template="## Secret Trivia Game Rules...",
summary="Game rules available. Use read_section('rules') to review.",
visibility=SectionVisibility.SUMMARY, # Hidden until needed
)The hints section provides the hint tool:
def build_hints_section() -> MarkdownSection[EmptyParams]:
return MarkdownSection[EmptyParams](
title="Hints",
key="hints",
template="...",
tools=(hint_lookup_tool,), # Tool scoped to this section
)Tool policies enforce ordering constraints. The Lucky Dice mini-game demonstrates this - players can roll for bonus points, but must pick up the dice before throwing:
def build_lucky_dice_section() -> MarkdownSection[EmptyParams]:
# Policy: throw_dice requires pick_up_dice to have been called first
dice_policy = SequentialDependencyPolicy(
dependencies={
"throw_dice": frozenset({"pick_up_dice"}),
}
)
return MarkdownSection[EmptyParams](
title="Lucky Dice",
key="dice",
template="...",
tools=(pick_up_dice_tool, throw_dice_tool),
policies=(dice_policy,), # Enforce tool ordering
)If the agent tries to call throw_dice before pick_up_dice, the policy blocks the call with an error message. Try it:
make dispatch QUESTION="Roll the lucky dice for me!"Custom feedback reminds the agent to give direct answers:
@dataclass(frozen=True, slots=True)
class TriviaHostReminder:
max_calls_before_reminder: int = 5
def provide(self, *, context: FeedbackContext) -> Feedback:
return Feedback(
provider_name=self.name,
summary=(
f"You have made {context.tool_call_count} tool calls. "
"Remember: you already know the secret answers from your skills. "
"Just give the answer directly!"
),
severity="info",
)The trivia evaluator checks secrets and brevity:
def trivia_evaluator(
output: TriviaResponse,
expected: str,
session: Any = None, # Session for behavioral checks
) -> Score:
scores = []
# Check 1: Does answer contain the expected secret?
if expected.lower() in output.answer.lower():
scores.append((1.0, f"Correct! Secret '{expected}' found"))
# Check 2: Is the answer brief?
word_count = len(output.answer.split())
if word_count <= 20:
scores.append((1.0, f"Perfect brevity ({word_count} words)"))
return Score(value=avg(scores), passed=..., reason=...)The session parameter enables behavioral checks (e.g., inspecting tool usage) when WINK's slice architecture is used. See the SESSIONS spec for typed state dispatch and queries.
The agent's persona is defined in the workspace:
# Secret Trivia Agent
You are the host of a secret trivia game. Players ask you about secrets
that only you know. When asked about a secret, give the answer directly.| Directory | Purpose | Example |
|---|---|---|
workspace/ |
Agent persona and behavior instructions | "You are a trivia host" |
skills/ |
Domain knowledge the agent can access | Secret answers (42, banana, etc.) |
Workspace files define how the agent should behave. Skills provide what the agent knows. Both are mounted into the agent's sandbox but serve different purposes.
Test that the agent knows its secrets:
make dispatch-eval QUESTION="What is the secret number?" EXPECTED="42"
make dispatch-eval QUESTION="What is the secret word?" EXPECTED="banana"
make dispatch-eval QUESTION="What is the secret color?" EXPECTED="purple"
make dispatch-eval QUESTION="What is the magic phrase?" EXPECTED="Open sesame"The evaluator checks:
- The secret answer is present in the response
- The answer is concise (trivia answers should be brief)
| Variable | Required | Default | Description |
|---|---|---|---|
ANTHROPIC_API_KEY |
Yes | - | Anthropic API key |
REDIS_URL |
No | redis://localhost:6379 |
Redis connection URL |
TRIVIA_REQUESTS_QUEUE |
No | trivia:requests |
Request queue name |
TRIVIA_EVAL_REQUESTS_QUEUE |
No | trivia:eval:requests |
Eval queue name |
TRIVIA_DEBUG_BUNDLES_DIR |
No | ./debug_bundles |
Debug bundle output (bundles named {run_id}_{timestamp}.zip) |
TRIVIA_PROMPT_OVERRIDES_DIR |
No | ./prompt_overrides |
Custom prompt overrides directory |
make check # Format, lint, typecheck, test
make test # Run unit tests with coverage
make integration-test # Run integration tests (requires Redis + API key)
make format # Format code┌────────────────────┐ ┌─────────────────┐ ┌────────────────────┐
│ trivia-dispatch │────▶│ Redis Queues │◀────│ trivia-dispatch │
│ (questions) │ │ │ │ (--eval) │
└────────────────────┘ └────────┬────────┘ └────────────────────┘
│
┌────────────┴────────────┐
▼ ▼
┌────────────────────────────┐ ┌────────────────────────────┐
│ AgentLoop │ │ EvalLoop │
│ (production requests) │ │ (evaluation samples) │
└─────────────┬──────────────┘ └─────────────┬──────────────┘
│ │
└───────────────┬───────────────┘
▼
┌────────────────────────────────────────────────────────────────┐
│ Trivia Agent Definition │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ - Sections: Question, GameRules, Hints, LuckyDice │ │
│ │ - Tools: hint_lookup, pick_up_dice, throw_dice │ │
│ │ - Policies: SequentialDependencyPolicy (dice ordering) │ │
│ │ - Skills: secret-trivia (loaded from skills/) │ │
│ │ - Feedback: DeadlineFeedback, TriviaHostReminder │ │
│ │ - Evaluator: trivia_evaluator (correctness + brevity) │ │
│ └──────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────────┐
│ Claude Agent SDK (Harness) │
│ - Planning/act loop, sandboxing, tool-call orchestration │
└────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────┐
│ "42" / "banana"│
└─────────────────┘
Now that you've forked this repository, you're ready to customize it for your own use case. We recommend a spec-first workflow: write a specification, check it in under specs/, then implement.
Start by asking Claude to explain the codebase:
Explain how this trivia agent works. Walk me through the flow from
receiving a question to returning an answer.
What WINK features does this example demonstrate? Show me where each
feature is implemented.
Run `uv run wink docs list` and summarize what documentation is available.
Then read the TOOLS guide and explain how tools work in WINK.
With ANTHROPIC_API_KEY and REDIS_URL set in your environment, Claude can perform end-to-end manual testing:
Start Redis and the agent, then test all four secrets by dispatching
questions. Verify each answer is correct.
Run an eval for each secret and confirm they all pass. If any fail,
investigate and fix.
You can also run the integration test suite which automatically starts the agent, runs all evals, and verifies they pass:
make integration-testStep 1: Write a spec. Create specs/ and describe your agent:
Create a spec under specs/my-agent.md for a [describe your use case].
Include: what the agent does, what tools it needs, what skills/knowledge
it requires, and how to evaluate correctness. Don't implement yet.
Step 2: Implement from the spec.
Read specs/my-agent.md and implement it. Replace the trivia game with
the new agent definition. Update models, tools, sections, and evaluators.
Maintain 100% test coverage.
Step 3: Iterate with evals.
Run evals against the new agent. If any fail, analyze the debug bundles
with `uv run wink query debug_bundles/*.zip` to understand what went wrong,
then fix the issue.
Add a new tool:
Add a tool called [tool_name] that [description]. Include it in the
appropriate section. Add tests and update AGENTS.md.
Add a new section with progressive disclosure:
Add a new section called [section_name] that starts hidden (SUMMARY visibility)
and contains [content]. The agent should expand it when [condition].
Integrate an external API:
Add a tool that calls [API name] to [do something]. Handle errors gracefully.
The API key should come from the environment variable [VAR_NAME].
Add behavioral feedback:
Add a feedback provider that reminds the agent to [behavior] after
[condition]. Use severity "info" for gentle reminders, "caution" for
stronger guidance.
Run `uv run wink docs search "[topic]"` and read the relevant guides.
Summarize the key concepts.
Query the debug bundles to show me what data is captured during execution:
`uv run wink query debug_bundles/*.zip --schema`
"Connection refused" errors: Make sure Redis is running (make redis).
"API key not found": Ensure ANTHROPIC_API_KEY is set.
Agent doesn't know secrets: Check that skills/secret-trivia/SKILL.md exists and contains the answers.