feat: synthetic memory layer for compaction resistance (v0.11.19.0) #258

Open
schneidermr wants to merge 9 commits into garrytan:main from bitkaio:feat/synthetic-memory

@schneidermr

Summary

  • Synthetic memory layer — file-backed .gstack/ directory survives context window compaction during long-running skills. Findings, decisions, and session state persist to disk instead of relying on conversation context alone.
  • Five skills patched — review, qa, investigate, ship, and retro all read/write synthetic memory with skill-specific protocols (screenshot memory for QA, hypothesis tracking for investigate, pre-ship validation for ship, etc.)
  • Shared protocol in lib/memory.md — single source of truth for initialization, checkpoints, finding persistence, decision logging, and skill handoff
  • Three utility scripts: init-memory.sh (idempotent setup), gstack-status.sh (quick state display), gstack-reset.sh (archive + reinitialize)
  • 15 unit tests covering init idempotency, JSON schema validity, status output parsing, finding count accuracy, and reset archive behavior
  • Dual-write with tiebreaker — findings written to both session.json and findings.md, with findings.md as the source of truth if they diverge

Test Coverage

  • Tests: 0 → 1 (+1 new test file, 15 test cases)
  • All new code paths covered: init, status, reset, archive, idempotency, JSON schema

Pre-Landing Review

  • Fixed grep substring matching bug in gstack-status.sh (RESOLVED also matched UNRESOLVED)
  • Fixed script path resolution (scripts run from user's project dir, not gstack install dir)
  • Fixed gstack-reset.sh re-init path with self-relative resolution
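
The substring bug in the first item is easy to reproduce. The following is a minimal sketch, assuming each finding carries a `Status:` line; the actual findings.md format is not shown in this PR, and `count_status` is a hypothetical helper name rather than the function used in gstack-status.sh.

```shell
#!/bin/sh
# Hypothetical helper: count findings whose status is exactly $1 in file $2.
# Assumed line format: "Status: RESOLVED" / "Status: UNRESOLVED".
count_status() {
  # -x matches whole lines and -F disables regex, so "RESOLVED" can no
  # longer match inside "UNRESOLVED". grep -c exits 1 on a zero count;
  # "|| true" keeps the helper usable under "set -e".
  grep -c -x -F "Status: $1" "$2" || true
}

demo_file="$(mktemp)"
printf '%s\n' 'Status: RESOLVED' 'Status: UNRESOLVED' 'Status: UNRESOLVED' > "$demo_file"

# The naive substring match overcounts: every line contains "RESOLVED".
naive="$(grep -c 'RESOLVED' "$demo_file")"
exact="$(count_status RESOLVED "$demo_file")"
echo "naive=$naive exact=$exact"   # prints: naive=3 exact=1
rm -f "$demo_file"
```

Whole-line matching is only one way to avoid the overlap; anchoring on a delimiter or using `grep -w` would also work if the real format differs.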

Test plan

  • All unit tests pass (15 tests, 0 failures)
  • init-memory.sh is idempotent (running twice doesn't corrupt state)
  • gstack-status.sh correctly counts RESOLVED vs UNRESOLVED findings
  • gstack-reset.sh archives before reinitializing
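
The idempotency and archive-before-reset properties in this test plan can be illustrated roughly as follows. This is a sketch, not the shipped scripts: `init_memory` and `reset_memory` are hypothetical stand-ins for init-memory.sh and gstack-reset.sh, and the seed JSON and the archive naming scheme are assumptions. Only the file names come from the PR.

```shell
#!/bin/sh
# Hypothetical stand-ins for init-memory.sh and gstack-reset.sh.
# Seed JSON content and archive naming are assumptions.

init_memory() {
  dir="$1"
  mkdir -p "$dir/checkpoints"
  # Seed session.json only if it is missing, so a second run keeps state.
  [ -f "$dir/session.json" ] || printf '{"findings": []}\n' > "$dir/session.json"
  # ">>" creates the file if missing and never truncates existing content.
  : >> "$dir/findings.md"
  : >> "$dir/decisions.log"
}

reset_memory() {
  dir="$1"
  if [ -d "$dir" ]; then
    # Archive first, then reinitialize, so nothing is deleted outright.
    # Note: two resets within the same second would collide on this name.
    mv "$dir" "$dir.archive.$(date -u +%Y%m%dT%H%M%SZ)"
  fi
  init_memory "$dir"
}

work="$(mktemp -d)"
init_memory "$work/.gstack"
echo "F001 null check" >> "$work/.gstack/findings.md"
init_memory "$work/.gstack"       # second run must not wipe F001
cat "$work/.gstack/findings.md"
reset_memory "$work/.gstack"      # old state moves to .gstack.archive.<timestamp>
ls -A "$work"
```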

🤖 Generated with Claude Code

schneidermr and others added 4 commits March 20, 2026 23:53
Add file-backed memory system (.gstack/) that survives context window
compaction. Includes shared protocol doc (lib/memory.md), init script,
status viewer, and reset utility. Updates CLAUDE.md with memory docs.
Add memory initialization, finding persistence, checkpoint triggers,
and completion handoff to /review, /qa, /investigate, /ship, and /retro.
Add memory-informed insights to /retro. Add unit tests for all 3
utility scripts (15 tests). Add touchfiles TODO for future E2E coverage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add synthetic memory architecture section to ARCHITECTURE.md.
@schneidermr
Author

Synthetic Memory Layer — What This PR Changes

The problem

During long-running skills like /review, /qa, and /investigate, Claude Code's context window fills up and gets compacted — silently summarized, losing critical details. The agent doesn't know it lost information and continues with degraded awareness.

Scenario: /review on a 40-file branch (30 turns)

Without synthetic memory

| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, initial scan | ~18k (9%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check (P1), unused import (P2) | ~45k (22%) | ✅ Normal |
| 11-15 | Security scan. Finds SQL injection in auth.py:42 (P0). User decides "skip CSS lint" | ~85k (42%) | ✅ Normal |
| 16-22 | Race condition analysis. Finds race in payment.py:187 (P1). Context getting heavy | ~140k (70%) | ⚠️ Pressure |
| | Compaction triggered: context summarized to ~80k | | |
| 23-25 | Error handling check | ~105k (52%) | Degraded |
| 26-30 | Test coverage + final summary | ~155k (77%) | Degraded |

What got lost after compaction:

  • ❌ Exact SQL injection line number (auth.py:42 → "auth module has issues")
  • ❌ User's "skip CSS lint" decision — agent re-asks
  • ❌ Null check finding from turn 6 — omitted from final summary
  • ❌ SKILL.md instruction "check for race conditions in DB writes" — agent stops checking
  • ⚠️ Race condition details compressed to "payment module may have concurrency issue"

Final report: mentions 1 finding (vaguely) instead of the 4 that were found


With synthetic memory (this PR)

| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, init-memory.sh runs, session.json initialized | ~20k (10%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check → writes F001 to findings.md. Finds unused import → writes F002 | ~48k (24%) | ✅ Normal |
| 10 | 📋 Checkpoint: reads session.json, prints status, verifies state | | ✅ Synced |
| 11-15 | SQL injection found → writes F003 to findings.md immediately. User says "skip CSS lint" → logged to decisions.log | ~88k (44%) | ✅ Normal |
| 16-22 | Race condition found → writes F004 to findings.md. Checkpoint at turn 20 re-reads all state | ~145k (72%) | ⚠️ Pressure |
| | Compaction triggered, but findings.md + session.json + decisions.log are untouched on disk | | |
| 23 | 📋 Post-compaction checkpoint: reads all files, recovers full state | | 🔵 Recovered |
| 24-30 | Continues error handling (knew it was pending from session.json). Final summary reads findings.md and reports all 4 findings with exact details | ~110k (55%) | ✅ Normal |

Final report: all 4 findings with exact line numbers, severities, and evidence


Side-by-side outcome

| Information | Without | With |
|---|---|---|
| Finding count | ⚠️ 1 of 4 reported | ✅ 4 of 4 reported |
| Finding details (line number, evidence) | ❌ Vague after compaction | ✅ Exact from findings.md |
| User scope decisions | ❌ Forgotten, re-asked | ✅ Read from decisions.log |
| Review checklist progress | ⚠️ Skips checks silently | ✅ Resumes from session.json |
| Skill-specific instructions | ❌ May be summarized away | 🔵 Checkpoints re-anchor |
| Cross-skill handoff | ❌ No mechanism | ✅ handoff.md → /ship |
| /ship quality gate | ❌ No blocker awareness | ✅ Blocks on unresolved P0s |
| /investigate history | ❌ Re-tries disproven fixes | ✅ Hypothesis log on disk |
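
The "/ship quality gate" row can be sketched as a small check over the findings file. This is an illustration only: `ship_gate` is a hypothetical function, and the pipe-delimited line format (ID, severity, status, location, summary) is an assumption, since the PR does not show how findings.md entries are laid out. The statuses in the demo are also made up for the example.

```shell
#!/bin/sh
# Hypothetical gate. Assumed findings format (not specified in the PR):
#   ID | severity | status | location | summary
ship_gate() {
  findings="$1"
  # awk splits on " | " and counts rows that are both P0 and UNRESOLVED.
  blockers="$(awk -F' [|] ' '$2 == "P0" && $3 == "UNRESOLVED" { n++ } END { print n + 0 }' "$findings")"
  if [ "$blockers" -gt 0 ]; then
    echo "BLOCKED: $blockers unresolved P0 finding(s)"
    return 1
  fi
  echo "OK: no unresolved P0 findings"
}

f="$(mktemp)"
cat > "$f" <<'EOF'
F003 | P0 | UNRESOLVED | auth.py:42 | SQL injection
F004 | P1 | UNRESOLVED | payment.py:187 | race condition
EOF
ship_gate "$f" || echo "ship aborted"
rm -f "$f"
```

Comparing the status field for equality, rather than searching for a substring, sidesteps the RESOLVED/UNRESOLVED overlap fixed in the pre-landing review.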

Token overhead

| | Tokens |
|---|---|
| Extra cost per session (memory ops) | ~3-5k (~6 file reads/writes for checkpoints) |
| Tokens saved (no re-asking, no redundant work) | ~8-15k (no repeated scope questions, no re-reading files, no re-running dropped checks) |

Net effect: by these estimates, roughly 3-12k tokens saved per session (8-15k saved minus 3-5k spent), while preventing information loss.

What this doesn't fix

  • Reasoning chains — the logical steps that led to a finding are still lost. The finding itself is saved, but "I noticed X → checked Y → found Z" gets compressed.
  • Instruction compliance — if compaction degrades skill instructions enough, the agent may stop doing checkpoints (recursive problem). System-prompt pinning mitigates but can't fully solve.
  • True isolation — all state still accumulates in one context window. Synthetic memory makes compaction survivable, not avoidable. For full isolation, you need a subagent architecture.

Files added/changed

.gstack/                          ← new directory (gitignored), session-local state
├── session.json                  ← per-skill runtime state (phase, findings, progress)
├── findings.md                   ← append-only structured findings (source of truth)
├── handoff.md                    ← inter-skill context transfer
├── decisions.log                 ← append-only user decisions with timestamps
└── checkpoints/                  ← periodic state snapshots
 
scripts/
├── init-memory.sh                ← initializes .gstack/ directory
├── gstack-status.sh              ← quick status of synthetic memory state
└── gstack-reset.sh               ← archive + reset memory (start fresh)
 
lib/memory.md                     ← shared protocol (included by reference in skills)
 
Patched skills: /review, /qa, /investigate, /ship, /retro

Key design decisions

  • findings.md wins over session.json — append-only is more durable than overwrite. session.json is a cache, findings.md is the audit trail.
  • Write order: findings.md → decisions.log → session.json — durable logs first, cache last.
  • Checkpoint every 5 tool calls — balances recovery frequency vs token overhead.
  • .gstack/ is gitignored — session state is ephemeral, not project config.
  • No new dependencies — just markdown, JSON, and bash. Fits gstack's zero-dependency philosophy.
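
The write order and checkpoint cadence above can be sketched as follows. The real protocol lives in lib/memory.md, which is not reproduced in this thread, so everything here is an assumption apart from the file names: `record` and `should_checkpoint` are hypothetical helpers, and the line formats and the `finding_count` cache field are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of "durable logs first, cache last".
MEM="$(mktemp -d)"   # stand-in for .gstack/ so the demo touches no real state

record() {
  finding="$1"; decision="$2"
  # 1. Durable, append-only logs first.
  [ -z "$finding" ]  || printf '%s\n' "$finding" >> "$MEM/findings.md"
  [ -z "$decision" ] || printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$decision" >> "$MEM/decisions.log"
  # 2. Cache last, derived from findings.md so the audit trail always wins.
  count="$(grep -c '' "$MEM/findings.md" 2>/dev/null || true)"
  printf '{"finding_count": %s}\n' "${count:-0}" > "$MEM/session.json"
}

# True on every 5th tool call (5, 10, 15, ...).
should_checkpoint() {
  [ $(( $1 % 5 )) -eq 0 ]
}

record "F003 P0 auth.py:42 SQL injection" ""
record "" "skip CSS lint"
cat "$MEM/session.json"           # prints: {"finding_count": 1}
should_checkpoint 10 && echo "checkpoint due"
```

Because the cache is rebuilt from findings.md on every write, an interrupted run leaves at worst a stale session.json, never a missing finding.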

@schneidermr schneidermr changed the title feat: synthetic memory layer for compaction resistance (v0.9.5.0) feat: synthetic memory layer for compaction resistance (v0.9.6.0) Mar 21, 2026
Upstream advanced from v0.9.5.0 to v0.11.18.2 with significant changes
(Ship With Teeth, CI tiers, telemetry, new skills). Keep all synthetic
memory additions alongside upstream's new features:

- review: Finding Persistence + Test Coverage Diagram (both kept)
- retro: Global Retro mode + Memory-Informed Context (both kept)
- ship: Ship Metrics + Post-Ship Cleanup (both kept)
- CLAUDE.md: lib/ + cso/design-consultation/setup-deploy/.github/ (merged)
- .agents/: accept upstream deletion of old SKILL.md files, regenerated
- CHANGELOG: synthetic memory entry renumbered to v0.11.19.0
- All SKILL.md files regenerated from resolved templates
…ck/)

Redesign synthetic memory into two layers:
- Session state (~/.gstack/projects/$SLUG/) — private, per-user, ephemeral
- Team knowledge (.gstack/) — optionally committed, shared across team

Branch-scoped findings, anti-patterns registry from PR garrytan#403,
markdown over JSON for reliability, checkpoint = print not copy.
Fix upstream preamble casing mismatch in skill-validation test.
@schneidermr
Author

Synthetic Memory: Design Comparison

This document compares three approaches to state persistence in gstack, explaining
why the synthetic memory layer was redesigned from v1 to v2.

The problem

Claude's context window silently compresses older messages during long-running
skills like /review, /qa, and /investigate. When this happens, specific
findings, user decisions, and session progress disappear — the agent forgets
what it already checked, re-investigates resolved issues, and loses track of
what the user approved or rejected. The longer the session, the worse it gets.

Three approaches

1. Upstream gstack persistence (~/.gstack/)

What it does: Stores cross-session history in the user's home directory,
scoped by project slug and branch.

| Path | Purpose | Written by | Read by |
|---|---|---|---|
| ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl | Review pass/fail, ship overrides, ship metrics | /review, /ship | /ship (gate), /retro (trends) |
| ~/.gstack/analytics/skill-usage.jsonl | Skill invocation telemetry | Preamble hooks | /retro |
| ~/.gstack/greptile-history.md | Greptile false positive tracking | /review, /ship | /retro |
| ~/.gstack/config.yaml | User preferences | gstack-config | All skills |

Strengths:

  • Never touches the repo — zero intrusiveness
  • Branch-aware via $SLUG/$BRANCH naming
  • JSONL format is structured and searchable
  • Already integrated into /ship gates and /retro trends

Gaps:

  • No within-session compaction resistance — only persists outcomes, not progress
  • No granular finding details — just "3 critical, 2 informational"
  • No user decision tracking — "what did the user approve?" is lost
  • No skill-to-skill context transfer beyond pass/fail

2. Synthetic Memory v1 (original PR)

What it does: Project-local .gstack/ directory with file-backed state
that survives context compaction.

| Path | Purpose |
|---|---|
| .gstack/session.json | Full session state: skill, phase, turn count, findings array, decisions array, pending/completed checks |
| .gstack/findings.md | Append-only finding registry with structured format |
| .gstack/decisions.log | Append-only user decision audit trail |
| .gstack/handoff.md | Skill-to-skill context transfer document |
| .gstack/checkpoints/ | Periodic snapshots of session.json |

Strengths:

  • Solves the core compaction problem — findings persist to disk immediately
  • Dual-write with tiebreaker (findings.md wins over session.json)
  • Checkpoint protocol re-injects state every 5 tool calls
  • Skill handoff carries detailed context between invocations

Problems identified during review:

  1. session.json is fragile. Asking Claude to maintain a complex JSON file
    with arrays, increment counters, and move items between arrays on every tool
    call consumes prompt real estate and is error-prone. One malformed write
    corrupts the entire state file.

  2. No branch awareness. .gstack/findings.md is a flat file — findings from
    different branches mix together. Switching branches means seeing irrelevant findings.

  3. checkpoints/ is overhead without value. Copying session.json every 5
    tool calls creates files nobody reads. The checkpoint print (re-injecting
    state into context) is the valuable part; the file copy adds nothing.

  4. Everything is session-scoped. The entire .gstack/ directory is designed
    for a single user's active session. In a team setting, committing it causes
    merge conflicts on session.json and handoff.md (ephemeral per-user state).
    But some files (decisions.log, anti-patterns) would be valuable to share.

  5. Duplicate persistence. /review writes findings to both .gstack/findings.md
    (granular) and ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl (summary) with
    no link between them.

  6. Repo pollution. Creates a directory in the user's project that requires
    .gitignore maintenance. If forgotten, session state leaks into commits.

3. Synthetic Memory v2 (updated PR, redesigned)

Key insight: Files fall into two categories — session-scoped (ephemeral,
single-user) and knowledge-scoped (durable, team-valuable). v1 treated them
all the same. v2 puts them where they belong.

Session state → ~/.gstack/projects/$SLUG/ (private, never committed)

| Path | Purpose |
|---|---|
| state.md | Minimal session marker: skill, phase, turn (plain markdown, not JSON) |
| handoff.md | Skill-to-skill context transfer (deleted after consumption) |
| findings-$BRANCH.md | Branch-scoped granular findings |

Lives alongside upstream's existing $BRANCH-reviews.jsonl. Uses the same
$SLUG and $BRANCH variables via gstack-slug. Never touches the repo.

Why plain markdown instead of JSON? Claude reads and writes markdown far more
reliably than structured JSON with arrays. A corrupted markdown line doesn't
break the whole file. A corrupted JSON bracket does.

Why branch-scoped findings? findings-feat-auth.md and findings-feat-payments.md
don't interfere. Switching branches shows only relevant findings.
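
As a rough illustration of the branch-scoped naming, the sketch below builds a per-branch findings path. `findings_path` is a hypothetical helper, not upstream's gstack-slug (whose output is not shown in this thread), and replacing "/" with "-" is an assumption inferred from the findings-feat-auth.md example. The slug `my-project` is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper that only illustrates the naming scheme.
findings_path() {
  slug="$1"; branch="$2"
  # Assumption: branch names like feat/auth are flattened to feat-auth.
  safe_branch="$(printf '%s' "$branch" | tr '/' '-')"
  printf '%s/.gstack/projects/%s/findings-%s.md\n' "$HOME" "$slug" "$safe_branch"
}

# Two branches of the same project resolve to two separate files.
findings_path my-project feat/auth
findings_path my-project feat/payments
```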

Team knowledge → .gstack/ (repo-level, optionally committed)

| Path | Purpose |
|---|---|
| decisions.log | Team decision history: what was approved, rejected, deferred |
| anti-patterns.md | Failed fixes that should never be re-attempted |

These files are valuable across the team:

  • Alice's /review finds a P0 on Monday → Bob's /ship on Tuesday blocks on it
  • A fix attempt that was disproved in /investigate is recorded → nobody wastes
    time trying it again
  • "We decided to skip CSS linting" is visible to every team member

Default: gitignored (less intrusive for solo developers). Teams that want
shared knowledge can commit these two files.
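
One way the anti-patterns registry could be consulted before retrying a fix is sketched below. The entry format (one "- " bullet per failed fix) is an assumption, the helper names are hypothetical, and the example fix text is invented for the demo.

```shell
#!/bin/sh
# Hypothetical helpers; anti-patterns.md entry format is assumed.
record_anti_pattern() {
  file="$1"; fix="$2"
  # Append only if the exact entry is not already present (-x whole line,
  # -F literal), so repeated runs do not duplicate entries.
  grep -q -x -F -e "- $fix" "$file" 2>/dev/null || printf '%s\n' "- $fix" >> "$file"
}

already_disproved() {
  grep -q -x -F -e "- $2" "$1" 2>/dev/null
}

ap="$(mktemp)"
record_anti_pattern "$ap" "retry with longer timeout"   # invented example entry
record_anti_pattern "$ap" "retry with longer timeout"   # no duplicate is written
if already_disproved "$ap" "retry with longer timeout"; then
  echo "skip: this fix was already disproved"
fi
rm -f "$ap"
```

Exact-line matching is deliberately strict; a real registry would likely need fuzzier matching, since two people rarely describe the same failed fix in identical words.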

What was kept from v1

  • Checkpoint printing — every N tool calls, re-read files and print a status
    block into the conversation. This is the core compaction resistance mechanism.
  • Findings as source of truth — if conversation memory disagrees with the
    findings file, the file wins.
  • Decision logging — append-only audit trail of user decisions.
  • Skill handoff — detailed context transfer between skill invocations.
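
Checkpoint printing, the first item above, might look roughly like this. The status block layout is an assumption; the PR only states that files are re-read and printed into the conversation rather than copied. `print_checkpoint` is a hypothetical name and the state.md contents are sample values.

```shell
#!/bin/sh
# Hypothetical checkpoint printer: re-read state from disk and print it.
print_checkpoint() {
  state="$1"; findings="$2"; decisions="$3"
  echo "=== CHECKPOINT ==="
  for f in "$state" "$findings" "$decisions"; do
    echo "--- $(basename "$f") ---"
    # Missing or empty files are reported rather than treated as errors.
    if [ -s "$f" ]; then cat "$f"; else echo "(empty)"; fi
  done
  echo "=== END CHECKPOINT ==="
}

d="$(mktemp -d)"
printf 'skill: review\nphase: error-handling\nturn: 23\n' > "$d/state.md"
printf 'F003 P0 auth.py:42 SQL injection\n' > "$d/findings-feat-auth.md"
# decisions.log does not exist yet, so it prints as "(empty)".
print_checkpoint "$d/state.md" "$d/findings-feat-auth.md" "$d/decisions.log"
```

Nothing is written during a checkpoint in this sketch; the only effect is that the printed block re-enters the context window.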

What was cut from v1

  • session.json → replaced by state.md (3 lines of markdown vs complex JSON)
  • checkpoints/ directory → removed entirely (file copies added no value)
  • init-memory.sh complexity → simplified to mkdir -p + touch

What was added in v2

Storage layout comparison

v1 (original):                          v2 (redesigned):

.gstack/                (repo)          .gstack/                (repo, optional commit)
├── session.json         ← fragile      ├── decisions.log        ← team knowledge
├── findings.md          ← flat         └── anti-patterns.md     ← team knowledge
├── decisions.log
├── handoff.md           ← ephemeral    ~/.gstack/projects/$SLUG/ (private, per-user)
└── checkpoints/         ← waste        ├── state.md             ← simple markdown
                                        ├── handoff.md           ← ephemeral
                                        ├── findings-$BRANCH.md  ← branch-scoped
                                        └── $BRANCH-reviews.jsonl ← upstream (unchanged)

Design principles

  1. Use upstream's infrastructure where it exists. Don't duplicate what
    gstack-slug, gstack-review-log, and the JSONL system already provide.

  2. Session state is private. Live skill progress, turn counts, and handoff
    documents belong in the user's home directory, not the repo.

  3. Team knowledge is shareable. Decisions and anti-patterns compound across
    the team. Make them easy to commit without forcing it.

  4. Markdown over JSON for LLM-maintained files. Claude writes markdown
    reliably. Claude writes JSON with arrays unreliably. Design for the agent
    that actually maintains the files.

  5. Branch awareness by default. Findings from feat-auth shouldn't pollute
    feat-payments. Use the same $SLUG/$BRANCH scoping upstream already uses.

  6. Checkpoint = print, not copy. The value is re-injecting state into the
    context window. File copies are overhead without readers.

@schneidermr schneidermr changed the title feat: synthetic memory layer for compaction resistance (v0.9.6.0) feat: synthetic memory layer for compaction resistance (v0.11.19.0) Mar 25, 2026