feat: synthetic memory layer for compaction resistance (v0.11.19.0) #258
schneidermr wants to merge 9 commits into garrytan:main
Add file-backed memory system (.gstack/) that survives context window compaction. Includes shared protocol doc (lib/memory.md), init script, status viewer, and reset utility. Updates CLAUDE.md with memory docs.
Add memory initialization, finding persistence, checkpoint triggers, and completion handoff to /review, /qa, /investigate, /ship, and /retro. Add memory-informed insights to /retro. Add unit tests for all 3 utility scripts (15 tests). Add touchfiles TODO for future E2E coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add synthetic memory architecture section to ARCHITECTURE.md.
Synthetic Memory Layer: What This PR Changes
The problem
During long-running skills like
Scenario:
| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, initial scan | ~18k (9%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check (P1), unused import (P2) | ~45k (22%) | ✅ Normal |
| 11-15 | Security scan. Finds SQL injection in auth.py:42 (P0). User decides "skip CSS lint" | ~85k (42%) | ✅ Normal |
| 16-22 | Race condition analysis. Finds race in payment.py:187 (P1). Context getting heavy | ~140k (70%) | |
| ⚡ | Compaction triggered: context summarized to ~80k | | |
| 23-25 | Error handling check | ~105k (52%) | ❌ Degraded |
| 26-30 | Test coverage + final summary | ~155k (77%) | ❌ Degraded |
What got lost after compaction:
- ❌ Exact SQL injection line number (auth.py:42 → "auth module has issues")
- ❌ User's "skip CSS lint" decision: agent re-asks
- ❌ Null check finding from turn 6: omitted from final summary
- ❌ SKILL.md instruction "check for race conditions in DB writes": agent stops checking
- ⚠️ Race condition details compressed to "payment module may have concurrency issue"
Final report: mentions 1 finding (vaguely) instead of 4
With synthetic memory (this PR)
| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, init-memory.sh runs, session.json initialized | ~20k (10%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check → writes F001 to findings.md. Finds unused import → writes F002 | ~48k (24%) | ✅ Normal |
| 10 | 📋 Checkpoint: reads session.json, prints status, verifies state | | ✅ Synced |
| 11-15 | SQL injection found → writes F003 to findings.md immediately. User says "skip CSS lint" → logged to decisions.log | ~88k (44%) | ✅ Normal |
| 16-22 | Race condition found → writes F004 to findings.md. Checkpoint at turn 20 re-reads all state | ~145k (72%) | |
| ⚡ | Compaction triggered, but findings.md + session.json + decisions.log are untouched on disk | | |
| 23 | 📋 Post-compaction checkpoint: reads all files, recovers full state | | 🔵 Recovered |
| 24-30 | Continues error handling (knew it was pending from session.json). Final summary reads findings.md and reports all 4 findings with exact details | ~110k (55%) | ✅ Normal |
Final report: all 4 findings with exact line numbers, severities, and evidence
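The "writes F00N to findings.md immediately" step in the table could look roughly like the sketch below. The entry layout, the `append_finding` helper, and the ID-from-count scheme are assumptions made for illustration; the PR's actual format lives in lib/memory.md and may differ.

```python
from pathlib import Path


def append_finding(findings_file: Path, severity: str, location: str, summary: str) -> str:
    """Append one finding to an append-only markdown file and return its ID.

    Hypothetical entry format: "## F001 [P0] auth.py:42" followed by a summary line.
    """
    existing = findings_file.read_text(encoding="utf-8") if findings_file.exists() else ""
    # Count existing entries to derive the next sequential ID (F001, F002, ...).
    next_number = sum(1 for line in existing.splitlines() if line.startswith("## F")) + 1
    finding_id = f"F{next_number:03d}"
    entry = f"## {finding_id} [{severity}] {location}\n{summary}\n\n"
    # Append mode means earlier findings are never rewritten.
    with findings_file.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return finding_id
```

Because the file is only ever appended to, a finding written at turn 12 is still on disk verbatim after compaction, whatever the conversation summary says about it.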
Side-by-side outcome
| Information | Without | With |
|---|---|---|
| Finding count | | ✅ 4 of 4 reported |
| Finding details (line number, evidence) | ❌ Vague after compaction | ✅ Exact from findings.md |
| User scope decisions | ❌ Forgotten, re-asked | ✅ Read from decisions.log |
| Review checklist progress | | ✅ Resumes from session.json |
| Skill-specific instructions | ❌ May be summarized away | 🔵 Checkpoints re-anchor |
| Cross-skill handoff | ❌ No mechanism | ✅ handoff.md → /ship |
| /ship quality gate | ❌ No blocker awareness | ✅ Blocks on unresolved P0s |
| /investigate history | ❌ Re-tries disproven fixes | ✅ Hypothesis log on disk |
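The "/ship quality gate" row implies a scan of the findings file for open P0 entries. A minimal sketch of such a scan follows; the heading format (`## F003 [P0] auth.py:42`) and the `Status: resolved` marker are hypothetical and are not taken from the PR.

```python
import re

# Hypothetical heading format: "## F003 [P0] auth.py:42"
HEADING = re.compile(r"^## (F\d+) \[(P\d)\] (.+)$")


def unresolved_p0s(findings_text: str) -> list[str]:
    """Return IDs of P0 findings that have no 'Status: resolved' line."""
    blockers: list[str] = []
    current_id = ""
    current_is_open_p0 = False
    for line in findings_text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the previous entry.
            if current_is_open_p0:
                blockers.append(current_id)
            current_id = match.group(1)
            current_is_open_p0 = match.group(2) == "P0"
        elif line.strip().lower() == "status: resolved":
            current_is_open_p0 = False
    if current_is_open_p0:
        blockers.append(current_id)
    return blockers
```

A gate built on this would refuse to proceed whenever the returned list is non-empty and would print the blocking IDs.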
Token overhead
| | Tokens |
|---|---|
| Extra cost per session (memory ops) | ~3-5k (~6 file reads/writes for checkpoints) |
| Tokens saved (no re-asking, no redundant work) | ~8-15k (no repeated scope questions, no re-reading files, no re-running dropped checks) |
Net effect: saves tokens overall while preventing information loss.
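Taking the two ranges in the table at face value, the net saving can be bounded with simple arithmetic. The sketch below only restates the table's estimates; the function name is illustrative.

```python
# Figures taken from the table above, in tokens per session.
EXTRA_COST = (3_000, 5_000)      # memory operations: low and high estimate
TOKENS_SAVED = (8_000, 15_000)   # avoided re-asking and redundant work


def net_savings_range(cost: tuple[int, int], saved: tuple[int, int]) -> tuple[int, int]:
    """Return (worst case, best case) net tokens saved per session."""
    worst = saved[0] - cost[1]   # least saved minus highest cost
    best = saved[1] - cost[0]    # most saved minus lowest cost
    return worst, best


print(net_savings_range(EXTRA_COST, TOKENS_SAVED))  # (3000, 12000)
```

By these estimates the net saving is roughly 3-12k tokens per session, so the overhead pays for itself even in the worst case.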
What this doesn't fix
- Reasoning chains — the logical steps that led to a finding are still lost. The finding itself is saved, but "I noticed X → checked Y → found Z" gets compressed.
- Instruction compliance — if compaction degrades skill instructions enough, the agent may stop doing checkpoints (recursive problem). System-prompt pinning mitigates but can't fully solve.
- True isolation — all state still accumulates in one context window. Synthetic memory makes compaction survivable, not avoidable. For full isolation, you need a subagent architecture.
Files added/changed
.gstack/ ← new directory (gitignored), session-local state
├── session.json ← per-skill runtime state (phase, findings, progress)
├── findings.md ← append-only structured findings (source of truth)
├── handoff.md ← inter-skill context transfer
├── decisions.log ← append-only user decisions with timestamps
└── checkpoints/ ← periodic state snapshots
scripts/
├── init-memory.sh ← initializes .gstack/ directory
├── gstack-status.sh ← quick status of synthetic memory state
└── gstack-reset.sh ← archive + reset memory (start fresh)
lib/memory.md ← shared protocol (included by reference in skills)
Patched skills: /review, /qa, /investigate, /ship, /retro
Key design decisions
- findings.md wins over session.json — append-only is more durable than overwrite. session.json is a cache, findings.md is the audit trail.
- Write order: findings.md → decisions.log → session.json — durable logs first, cache last.
- Checkpoint every 5 tool calls — balances recovery frequency vs token overhead.
- .gstack/ is gitignored — session state is ephemeral, not project config.
- No new dependencies — just markdown, JSON, and bash. Fits gstack's zero-dependency philosophy.
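The write-order and checkpoint-cadence decisions above can be sketched as follows. This is an illustration under assumptions: the `persist` and `checkpoint_due` helpers are hypothetical, and in the PR these steps are carried out by the agent following lib/memory.md rather than by a Python module.

```python
import json
from pathlib import Path
from typing import Optional

CHECKPOINT_INTERVAL = 5  # from the PR: checkpoint every 5 tool calls


def persist(memory_dir: Path, finding: Optional[str], decision: Optional[str], session: dict) -> None:
    """Write the durable append-only logs first and the overwritten cache last."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    if finding is not None:
        with (memory_dir / "findings.md").open("a", encoding="utf-8") as handle:
            handle.write(finding.rstrip("\n") + "\n")
    if decision is not None:
        with (memory_dir / "decisions.log").open("a", encoding="utf-8") as handle:
            handle.write(decision.rstrip("\n") + "\n")
    # session.json is only a cache, so it is rewritten last; if this step
    # fails, the two logs above already hold the authoritative record.
    (memory_dir / "session.json").write_text(json.dumps(session, indent=2), encoding="utf-8")


def checkpoint_due(tool_calls: int) -> bool:
    """True on every CHECKPOINT_INTERVAL-th tool call (5, 10, 15, ...)."""
    return tool_calls > 0 and tool_calls % CHECKPOINT_INTERVAL == 0
```

Ordering the writes this way means an interruption can leave session.json stale but cannot leave it ahead of the logs, which matches the "findings.md wins" tiebreaker.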
Upstream advanced from v0.9.5.0 to v0.11.18.2 with significant changes (Ship With Teeth, CI tiers, telemetry, new skills). Keep all synthetic memory additions alongside upstream's new features:
- review: Finding Persistence + Test Coverage Diagram (both kept)
- retro: Global Retro mode + Memory-Informed Context (both kept)
- ship: Ship Metrics + Post-Ship Cleanup (both kept)
- CLAUDE.md: lib/ + cso/design-consultation/setup-deploy/.github/ (merged)
- .agents/: accept upstream deletion of old SKILL.md files, regenerated
- CHANGELOG: synthetic memory entry renumbered to v0.11.19.0
- All SKILL.md files regenerated from resolved templates
…ck/)
Redesign synthetic memory into two layers:
- Session state (~/.gstack/projects/$SLUG/): private, per-user, ephemeral
- Team knowledge (.gstack/): optionally committed, shared across team
Branch-scoped findings, anti-patterns registry from PR garrytan#403, markdown over JSON for reliability, checkpoint = print not copy.
Fix upstream preamble casing mismatch in skill-validation test.
Synthetic Memory: Design Comparison
This document compares three approaches to state persistence in gstack, explaining
The problem
Claude's context window silently compresses older messages during long-running
Three approaches
1. Upstream gstack persistence (
| Path | Purpose | Written by | Read by |
|---|---|---|---|
| ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl | Review pass/fail, ship overrides, ship metrics | /review, /ship | /ship (gate), /retro (trends) |
| ~/.gstack/analytics/skill-usage.jsonl | Skill invocation telemetry | Preamble hooks | /retro |
| ~/.gstack/greptile-history.md | Greptile false positive tracking | /review, /ship | /retro |
| ~/.gstack/config.yaml | User preferences | gstack-config | All skills |
Strengths:
- Never touches the repo — zero intrusiveness
- Branch-aware via $SLUG/$BRANCH naming
- JSONL format is structured and searchable
- Already integrated into /ship gates and /retro trends
Gaps:
- No within-session compaction resistance — only persists outcomes, not progress
- No granular finding details — just "3 critical, 2 informational"
- No user decision tracking — "what did the user approve?" is lost
- No skill-to-skill context transfer beyond pass/fail
2. Synthetic Memory v1 (original PR)
What it does: Project-local .gstack/ directory with file-backed state
that survives context compaction.
| Path | Purpose |
|---|---|
| .gstack/session.json | Full session state: skill, phase, turn count, findings array, decisions array, pending/completed checks |
| .gstack/findings.md | Append-only finding registry with structured format |
| .gstack/decisions.log | Append-only user decision audit trail |
| .gstack/handoff.md | Skill-to-skill context transfer document |
| .gstack/checkpoints/ | Periodic snapshots of session.json |
Strengths:
- Solves the core compaction problem — findings persist to disk immediately
- Dual-write with tiebreaker (findings.md wins over session.json)
- Checkpoint protocol re-injects state every 5 tool calls
- Skill handoff carries detailed context between invocations
Problems identified during review:
- session.json is fragile. Asking Claude to maintain a complex JSON file with arrays, increment counters, and move items between arrays on every tool call is expensive prompt real estate and error-prone. One malformed write corrupts the entire state file.
- No branch awareness. .gstack/findings.md is a flat file, so findings from different branches mix together. Switching branches means seeing irrelevant findings.
- checkpoints/ is overhead without value. Copying session.json every 5 tool calls creates files nobody reads. The checkpoint print (re-injecting state into context) is the valuable part; the file copy adds nothing.
- Everything is session-scoped. The entire .gstack/ directory is designed for a single user's active session. In a team setting, committing it causes merge conflicts on session.json and handoff.md (ephemeral per-user state). But some files (decisions.log, anti-patterns) would be valuable to share.
- Duplicate persistence. /review writes findings to both .gstack/findings.md (granular) and ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl (summary) with no link between them.
- Repo pollution. Creates a directory in the user's project that requires .gitignore maintenance. If forgotten, session state leaks into commits.
3. Synthetic Memory v2 (updated PR, redesigned)
Key insight: Files fall into two categories — session-scoped (ephemeral,
single-user) and knowledge-scoped (durable, team-valuable). v1 treated them
all the same. v2 puts them where they belong.
Session state → ~/.gstack/projects/$SLUG/ (private, never committed)
| Path | Purpose |
|---|---|
| state.md | Minimal session marker: skill, phase, turn (plain markdown, not JSON) |
| handoff.md | Skill-to-skill context transfer (deleted after consumption) |
| findings-$BRANCH.md | Branch-scoped granular findings |
Lives alongside upstream's existing $BRANCH-reviews.jsonl. Uses the same
$SLUG and $BRANCH variables via gstack-slug. Never touches the repo.
Why plain markdown instead of JSON? Claude reads and writes markdown far more
reliably than structured JSON with arrays. A corrupted markdown line doesn't
break the whole file. A corrupted JSON bracket does.
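The failure-mode difference can be shown with a small sketch. It assumes state.md holds simple `key: value` lines (the PR only says it is a three-line marker with skill, phase, and turn), and `parse_state_markdown` is a hypothetical reader, not part of gstack.

```python
import json


def parse_state_markdown(text: str) -> dict[str, str]:
    """Read 'key: value' lines, skipping any line that does not fit the pattern."""
    state: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            state[key.strip().lower()] = value.strip()
    return state


# The same bad write hits both formats: one delimiter goes missing.
damaged_markdown = "skill: review\nphase security\nturn: 17\n"    # colon lost on line 2
damaged_json = '{"skill": "review", "phase": "security", "turn": 17'  # closing brace lost

print(parse_state_markdown(damaged_markdown))  # {'skill': 'review', 'turn': '17'}

try:
    json.loads(damaged_json)
except json.JSONDecodeError as error:
    print("JSON state unreadable:", error.msg)
```

The line-oriented file loses one field and keeps the rest, while the JSON document is rejected as a whole.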
Why branch-scoped findings? findings-feat-auth.md and findings-feat-payments.md
don't interfere. Switching branches shows only relevant findings.
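A per-branch findings path could be derived as in the sketch below. The `findings_path` helper and its sanitizing rule (replacing characters such as "/" with "-") are assumptions; the real $SLUG and $BRANCH values come from gstack-slug, whose exact behavior is not shown in this PR.

```python
import re
from pathlib import Path


def findings_path(slug: str, branch: str, home: Path) -> Path:
    """Build <home>/.gstack/projects/<slug>/findings-<branch>.md.

    Assumption: characters unsafe in a file name (such as "/" in "feat/auth")
    are replaced with "-"; the real gstack-slug helper may behave differently.
    """
    safe_branch = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return home / ".gstack" / "projects" / slug / f"findings-{safe_branch}.md"


# Two branches of the same project resolve to two separate files.
print(findings_path("my-app", "feat/auth", Path.home()).name)      # findings-feat-auth.md
print(findings_path("my-app", "feat/payments", Path.home()).name)  # findings-feat-payments.md
```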
Team knowledge → .gstack/ (repo-level, optionally committed)
| Path | Purpose |
|---|---|
| decisions.log | Team decision history: what was approved, rejected, deferred |
| anti-patterns.md | Failed fixes that should never be re-attempted |
These files are valuable across the team:
- Alice's /review finds a P0 on Monday → Bob's /ship on Tuesday blocks on it
- A fix attempt that was disproved in /investigate is recorded → nobody wastes time trying it again
- "We decided to skip CSS linting" is visible to every team member
Default: gitignored (less intrusive for solo developers). Teams that want
shared knowledge can commit these two files.
What was kept from v1
- Checkpoint printing: every N tool calls, re-read files and print a status block into the conversation. This is the core compaction resistance mechanism.
- Findings as source of truth: if conversation memory disagrees with the findings file, the file wins.
- Decision logging: append-only audit trail of user decisions.
- Skill handoff: detailed context transfer between skill invocations.
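Checkpoint printing amounts to re-reading the files and emitting their contents as text, so the state re-enters the context window. A minimal sketch follows; the block layout and the `checkpoint_block` name are illustrative and not the format defined in lib/memory.md.

```python
from pathlib import Path


def checkpoint_block(state_file: Path, findings_file: Path, decisions_file: Path) -> str:
    """Re-read the memory files and build a status block to print into the conversation."""

    def read(path: Path) -> str:
        if not path.exists():
            return "(none)"
        return path.read_text(encoding="utf-8").strip() or "(none)"

    sections = [
        "=== CHECKPOINT ===",
        "State:",
        read(state_file),
        "Findings (the file is authoritative):",
        read(findings_file),
        "Decisions:",
        read(decisions_file),
        "=== END CHECKPOINT ===",
    ]
    return "\n".join(sections)
```

Nothing is copied or written here. The block is built from disk every time, which is why it stays correct even when the conversation's own recollection has been summarized away.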
What was cut from v1
- session.json → replaced by state.md (3 lines of markdown vs complex JSON)
- checkpoints/ directory → removed entirely (file copies added no value)
- init-memory.sh complexity → simplified to mkdir -p + touch
What was added in v2
- Anti-patterns tracking (inspired by PR #403, "feat: add /solve (ticket to PR) and /memory (persistent session memory)"): records failed fix attempts so future sessions never re-attempt them.
- JSONL linkage: when /review logs to upstream's JSONL, it includes a pointer to the findings file for traceability.
- Branch awareness: findings are scoped per branch via gstack-slug.
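The anti-patterns registry can be pictured as an append-only list that is consulted before a fix is attempted. The one-line entry format and both helper names below are hypothetical; the PR does not specify how anti-patterns.md is laid out.

```python
from pathlib import Path


def record_anti_pattern(registry: Path, fix: str, reason: str) -> None:
    """Append a failed fix attempt to the registry (hypothetical one-line format)."""
    registry.parent.mkdir(parents=True, exist_ok=True)
    with registry.open("a", encoding="utf-8") as handle:
        handle.write(f"- {fix} | failed because: {reason}\n")


def already_disproved(registry: Path, fix: str) -> bool:
    """Return True if this exact fix description is already in the registry."""
    if not registry.exists():
        return False
    wanted = fix.strip().lower()
    for line in registry.read_text(encoding="utf-8").splitlines():
        if line.startswith("- "):
            recorded_fix = line[2:].split(" | ", 1)[0].strip().lower()
            if recorded_fix == wanted:
                return True
    return False
```

Exact-string matching is the simplest possible lookup; in practice the agent reading the file would judge similarity itself, so this only illustrates the record-then-check flow.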
Storage layout comparison
v1 (original): v2 (redesigned):
.gstack/ (repo) .gstack/ (repo, optional commit)
├── session.json ← fragile ├── decisions.log ← team knowledge
├── findings.md ← flat └── anti-patterns.md ← team knowledge
├── decisions.log
├── handoff.md ← ephemeral ~/.gstack/projects/$SLUG/ (private, per-user)
└── checkpoints/ ← waste ├── state.md ← simple markdown
├── handoff.md ← ephemeral
├── findings-$BRANCH.md ← branch-scoped
└── $BRANCH-reviews.jsonl ← upstream (unchanged)
Design principles
- Use upstream's infrastructure where it exists. Don't duplicate what gstack-slug, gstack-review-log, and the JSONL system already provide.
- Session state is private. Live skill progress, turn counts, and handoff documents belong in the user's home directory, not the repo.
- Team knowledge is shareable. Decisions and anti-patterns compound across the team. Make them easy to commit without forcing it.
- Markdown over JSON for LLM-maintained files. Claude writes markdown reliably. Claude writes JSON with arrays unreliably. Design for the agent that actually maintains the files.
- Branch awareness by default. Findings from feat-auth shouldn't pollute feat-payments. Use the same $SLUG/$BRANCH scoping upstream already uses.
- Checkpoint = print, not copy. The value is re-injecting state into the context window. File copies are overhead without readers.
Summary
- .gstack/ directory survives context window compaction during long-running skills. Findings, decisions, and session state persist to disk instead of relying on conversation context alone.
- lib/memory.md: single source of truth for initialization, checkpoints, finding persistence, decision logging, and skill handoff
- init-memory.sh (idempotent setup), gstack-status.sh (quick state display), gstack-reset.sh (archive + reinitialize)
Test Coverage
Pre-Landing Review
Test plan
🤖 Generated with Claude Code