feat: synthetic memory layer for compaction resistance (v0.11.19.0) by schneidermr · Pull Request #258 · garrytan/gstack

schneidermr · 2026-03-20T23:02:36Z

Summary

Synthetic memory layer — file-backed .gstack/ directory survives context window compaction during long-running skills. Findings, decisions, and session state persist to disk instead of relying on conversation context alone.
Five skills patched — review, qa, investigate, ship, and retro all read/write synthetic memory with skill-specific protocols (screenshot memory for QA, hypothesis tracking for investigate, pre-ship validation for ship, etc.)
Shared protocol in lib/memory.md — single source of truth for initialization, checkpoints, finding persistence, decision logging, and skill handoff
Three utility scripts — init-memory.sh (idempotent setup), gstack-status.sh (quick state display), gstack-reset.sh (archive + reinitialize)
15 unit tests covering init idempotency, JSON schema validity, status output parsing, finding count accuracy, and reset archive behavior
Dual-write with tiebreaker — findings written to both session.json and findings.md, with findings.md as the source of truth if they diverge

Test Coverage

Tests: 0 → 1 (+1 new test file, 15 test cases)
All new code paths covered: init, status, reset, archive, idempotency, JSON schema

Pre-Landing Review

Fixed grep substring matching bug in gstack-status.sh (RESOLVED also matched UNRESOLVED)
Fixed script path resolution (scripts run from user's project dir, not gstack install dir)
Fixed gstack-reset.sh re-init path with self-relative resolution
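
The substring bug above is easy to reproduce: a plain search for RESOLVED also hits every UNRESOLVED line. The following is a minimal sketch of whole-word counting, written in Python for illustration rather than the bash used by gstack-status.sh. The `Status: ...` line format is an assumption; the real findings.md format is not shown in this description.

```python
import re

def count_statuses(findings_text: str) -> dict[str, int]:
    """Count RESOLVED and UNRESOLVED as whole words only."""
    resolved = len(re.findall(r"\bRESOLVED\b", findings_text))
    unresolved = len(re.findall(r"\bUNRESOLVED\b", findings_text))
    return {"resolved": resolved, "unresolved": unresolved}

def naive_resolved_count(findings_text: str) -> int:
    """The buggy approach: substring matching also counts UNRESOLVED."""
    return findings_text.count("RESOLVED")
```

In a shell script, one comparable approach is whole-word matching with `grep -w` (available in GNU and BSD grep); whether the actual fix uses that flag is not stated here.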

Test plan

All unit tests pass (15 tests, 0 failures)
init-memory.sh is idempotent (running twice doesn't corrupt state)
gstack-status.sh correctly counts RESOLVED vs UNRESOLVED findings
gstack-reset.sh archives before reinitializing
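
The idempotency and archive-before-reset properties in this test plan can be sketched as follows. This is a Python illustration, not the shipped bash scripts. The file names follow this PR description, while the archive directory name and the initial session.json content are hypothetical.

```python
import shutil
import time
from pathlib import Path

# File names follow the PR description; everything else is an assumption.
MEMORY_FILES = ("findings.md", "decisions.log", "handoff.md")

def init_memory(root: Path) -> Path:
    """Create .gstack/ and its files only if missing, so reruns are harmless."""
    mem = root / ".gstack"
    (mem / "checkpoints").mkdir(parents=True, exist_ok=True)
    for name in MEMORY_FILES:
        (mem / name).touch(exist_ok=True)  # never truncates existing content
    session = mem / "session.json"
    if not session.exists():
        session.write_text('{"findings": [], "decisions": []}\n')
    return mem

def reset_memory(root: Path) -> Path:
    """Move the current .gstack/ aside first, then reinitialize a fresh one."""
    mem = root / ".gstack"
    # Hypothetical archive name; nanosecond suffix avoids collisions.
    archive = root / f".gstack-archive-{time.time_ns()}"
    if mem.exists():
        shutil.move(str(mem), str(archive))
    init_memory(root)
    return archive
```

Calling `init_memory` twice leaves existing findings untouched, and `reset_memory` only creates new empty files after the old directory has been moved.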

🤖 Generated with Claude Code

Add file-backed memory system (.gstack/) that survives context window compaction. Includes shared protocol doc (lib/memory.md), init script, status viewer, and reset utility. Updates CLAUDE.md with memory docs.

Add memory initialization, finding persistence, checkpoint triggers, and completion handoff to /review, /qa, /investigate, /ship, and /retro. Add memory-informed insights to /retro. Add unit tests for all 3 utility scripts (15 tests). Add touchfiles TODO for future E2E coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add synthetic memory architecture section to ARCHITECTURE.md.

schneidermr · 2026-03-20T23:04:49Z

Synthetic Memory Layer — What This PR Changes

The problem

During long-running skills like /review, /qa, and /investigate, Claude Code's context window fills up and gets compacted — silently summarized, losing critical details. The agent doesn't know it lost information and continues with degraded awareness.

Scenario: `/review` on a 40-file branch (30 turns)

Without synthetic memory

| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, initial scan | ~18k (9%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check (P1), unused import (P2) | ~45k (22%) | ✅ Normal |
| 11-15 | Security scan. Finds SQL injection in `auth.py:42` (P0). User decides "skip CSS lint" | ~85k (42%) | ✅ Normal |
| 16-22 | Race condition analysis. Finds race in `payment.py:187` (P1). Context getting heavy | ~140k (70%) | ⚠️ Pressure |
| | ⚡ Compaction triggered: context summarized to ~80k | | |
| 23-25 | Error handling check | ~105k (52%) | ❌ Degraded |
| 26-30 | Test coverage + final summary | ~155k (77%) | ❌ Degraded |

What got lost after compaction:

❌ Exact SQL injection line number (auth.py:42 → "auth module has issues")
❌ User's "skip CSS lint" decision — agent re-asks
❌ Null check finding from turn 6 — omitted from final summary
❌ SKILL.md instruction "check for race conditions in DB writes" — agent stops checking
⚠️ Race condition details compressed to "payment module may have concurrency issue"

Final report: mentions 1 finding (vaguely) instead of 3

With synthetic memory (this PR)

| Turn | What happens | Context | Status |
|---|---|---|---|
| 1-3 | SKILL.md loads, `init-memory.sh` runs, session.json initialized | ~20k (10%) | ✅ Normal |
| 4-10 | Dep + type checks. Finds null check → writes F001 to findings.md. Finds unused import → writes F002 | ~48k (24%) | ✅ Normal |
| 10 | 📋 Checkpoint: reads session.json, prints status, verifies state | | ✅ Synced |
| 11-15 | SQL injection found → writes F003 to findings.md immediately. User says "skip CSS lint" → logged to decisions.log | ~88k (44%) | ✅ Normal |
| 16-22 | Race condition found → writes F004 to findings.md. Checkpoint at turn 20 re-reads all state | ~145k (72%) | ⚠️ Pressure |
| | ⚡ Compaction triggered, but findings.md + session.json + decisions.log are untouched on disk | | |
| 23 | 📋 Post-compaction checkpoint: reads all files, recovers full state | | 🔵 Recovered |
| 24-30 | Continues error handling (knew it was pending from session.json). Final summary reads findings.md and reports all 4 findings with exact details | ~110k (55%) | ✅ Normal |

Final report: all 4 findings with exact line numbers, severities, and evidence

Side-by-side outcome

| Information | Without | With |
|---|---|---|
| Finding count | ⚠️ 1 of 3 reported | ✅ 4 of 4 reported |
| Finding details (line number, evidence) | ❌ Vague after compaction | ✅ Exact from findings.md |
| User scope decisions | ❌ Forgotten, re-asked | ✅ Read from decisions.log |
| Review checklist progress | ⚠️ Skips checks silently | ✅ Resumes from session.json |
| Skill-specific instructions | ❌ May be summarized away | 🔵 Checkpoints re-anchor |
| Cross-skill handoff | ❌ No mechanism | ✅ handoff.md → /ship |
| /ship quality gate | ❌ No blocker awareness | ✅ Blocks on unresolved P0s |
| /investigate history | ❌ Re-tries disproven fixes | ✅ Hypothesis log on disk |
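
As an illustration of the "/ship quality gate" row, here is a minimal sketch of a gate that blocks on unresolved P0 findings. It assumes a hypothetical pipe-delimited line format (`ID | SEVERITY | STATUS | LOCATION | SUMMARY`); the real findings.md structure and the actual /ship logic are not shown in this comment.

```python
from typing import NamedTuple

class Finding(NamedTuple):
    id: str
    severity: str
    status: str
    location: str
    summary: str

def parse_findings(text: str) -> list[Finding]:
    """Parse hypothetical 'ID | SEVERITY | STATUS | LOCATION | SUMMARY' lines."""
    findings = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Lines that do not have exactly five fields are skipped, not fatal.
        if len(parts) == 5 and parts[0].startswith("F"):
            findings.append(Finding(*parts))
    return findings

def ship_blockers(text: str) -> list[Finding]:
    """Return unresolved P0 findings; a non-empty result should block /ship."""
    return [f for f in parse_findings(text)
            if f.severity == "P0" and f.status == "UNRESOLVED"]
```

Because the gate reads from disk instead of conversation context, it would still see F003 after compaction.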

Token overhead

| Item | Tokens |
|---|---|
| Extra cost per session (memory ops) | ~3-5k (~6 file reads/writes for checkpoints) |
| Tokens saved (no re-asking, no redundant work) | ~8-15k (no repeated scope questions, no re-reading files, no re-running dropped checks) |

Net effect: saves tokens overall while preventing information loss.
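
The net effect follows from subtracting the cost range from the savings range. A small sketch of that arithmetic, using the estimates above (which are estimates, not measurements):

```python
def net_savings(cost_range, saved_range):
    """Return (worst_case, best_case) net tokens saved per session."""
    cost_low, cost_high = cost_range
    saved_low, saved_high = saved_range
    # Worst case: smallest saving with largest cost; best case: the reverse.
    return (saved_low - cost_high, saved_high - cost_low)
```

With a cost of 3-5k and savings of 8-15k, `net_savings((3000, 5000), (8000, 15000))` gives a net saving between 3k and 12k tokens per session, so the balance stays positive even in the worst case.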

What this doesn't fix

Reasoning chains — the logical steps that led to a finding are still lost. The finding itself is saved, but "I noticed X → checked Y → found Z" gets compressed.
Instruction compliance — if compaction degrades skill instructions enough, the agent may stop doing checkpoints (recursive problem). System-prompt pinning mitigates but can't fully solve.
True isolation — all state still accumulates in one context window. Synthetic memory makes compaction survivable, not avoidable. For full isolation, you need a subagent architecture.

Files added/changed

.gstack/                          ← new directory (gitignored), session-local state
├── session.json                  ← per-skill runtime state (phase, findings, progress)
├── findings.md                   ← append-only structured findings (source of truth)
├── handoff.md                    ← inter-skill context transfer
├── decisions.log                 ← append-only user decisions with timestamps
└── checkpoints/                  ← periodic state snapshots
 
scripts/
├── init-memory.sh                ← initializes .gstack/ directory
├── gstack-status.sh              ← quick status of synthetic memory state
└── gstack-reset.sh               ← archive + reset memory (start fresh)
 
lib/memory.md                     ← shared protocol (included by reference in skills)
 
Patched skills: /review, /qa, /investigate, /ship, /retro

Key design decisions

findings.md wins over session.json — append-only is more durable than overwrite. session.json is a cache, findings.md is the audit trail.
Write order: findings.md → decisions.log → session.json — durable logs first, cache last.
Checkpoint every 5 tool calls — balances recovery frequency vs token overhead.
.gstack/ is gitignored — session state is ephemeral, not project config.
No new dependencies — just markdown, JSON, and bash. Fits gstack's zero-dependency philosophy.
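
The first two decisions (findings.md wins, durable logs written before the cache) can be sketched as below. This is a Python illustration under assumptions: finding lines are taken to start with an ID such as `F001`, session.json is assumed to hold a `findings` array of IDs, and the decisions.log step of the write order is omitted for brevity.

```python
import json
from pathlib import Path

def rebuild_cache(mem: Path) -> list[str]:
    """Tiebreaker: regenerate session.json's findings from findings.md."""
    log_path = mem / "findings.md"
    lines = log_path.read_text(encoding="utf-8").splitlines() if log_path.exists() else []
    ids = [ln.split()[0] for ln in lines if ln.startswith("F")]
    session_path = mem / "session.json"
    try:
        session = json.loads(session_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        session = {}  # a missing or corrupt cache is discarded, not trusted
    if not isinstance(session, dict):
        session = {}
    session["findings"] = ids
    session_path.write_text(json.dumps(session, indent=2), encoding="utf-8")
    return ids

def persist_finding(mem: Path, finding_id: str, summary: str) -> None:
    """Append to the durable log first, then refresh the session.json cache."""
    with (mem / "findings.md").open("a", encoding="utf-8") as log:
        log.write(f"{finding_id} {summary}\n")
    rebuild_cache(mem)
```

If the process dies between the two steps, the finding is already in the append-only log, and the next `rebuild_cache` call brings session.json back in line.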

Upstream advanced from v0.9.5.0 to v0.11.18.2 with significant changes (Ship With Teeth, CI tiers, telemetry, new skills). Keep all synthetic memory additions alongside upstream's new features:
- review: Finding Persistence + Test Coverage Diagram (both kept)
- retro: Global Retro mode + Memory-Informed Context (both kept)
- ship: Ship Metrics + Post-Ship Cleanup (both kept)
- CLAUDE.md: lib/ + cso/design-consultation/setup-deploy/.github/ (merged)
- .agents/: accept upstream deletion of old SKILL.md files, regenerated
- CHANGELOG: synthetic memory entry renumbered to v0.11.19.0
- All SKILL.md files regenerated from resolved templates

…ck/)

Redesign synthetic memory into two layers:
- Session state (~/.gstack/projects/$SLUG/): private, per-user, ephemeral
- Team knowledge (.gstack/): optionally committed, shared across team

Branch-scoped findings, anti-patterns registry from PR garrytan#403, markdown over JSON for reliability, checkpoint = print not copy.

Fix upstream preamble casing mismatch in skill-validation test.

schneidermr · 2026-03-25T15:23:41Z

Synthetic Memory: Design Comparison

This document compares three approaches to state persistence in gstack, explaining
why the synthetic memory layer was redesigned from v1 to v2.

The problem

Claude's context window silently compresses older messages during long-running
skills like /review, /qa, and /investigate. When this happens, specific
findings, user decisions, and session progress disappear — the agent forgets
what it already checked, re-investigates resolved issues, and loses track of
what the user approved or rejected. The longer the session, the worse it gets.

Three approaches

1. Upstream gstack persistence (`~/.gstack/`)

What it does: Stores cross-session history in the user's home directory,
scoped by project slug and branch.

| Path | Purpose | Written by | Read by |
|---|---|---|---|
| `~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl` | Review pass/fail, ship overrides, ship metrics | `/review`, `/ship` | `/ship` (gate), `/retro` (trends) |
| `~/.gstack/analytics/skill-usage.jsonl` | Skill invocation telemetry | Preamble hooks | `/retro` |
| `~/.gstack/greptile-history.md` | Greptile false positive tracking | `/review`, `/ship` | `/retro` |
| `~/.gstack/config.yaml` | User preferences | `gstack-config` | All skills |

Strengths:

Never touches the repo — zero intrusiveness
Branch-aware via $SLUG/$BRANCH naming
JSONL format is structured and searchable
Already integrated into /ship gates and /retro trends

Gaps:

No within-session compaction resistance — only persists outcomes, not progress
No granular finding details — just "3 critical, 2 informational"
No user decision tracking — "what did the user approve?" is lost
No skill-to-skill context transfer beyond pass/fail

2. Synthetic Memory v1 (original PR)

What it does: Project-local .gstack/ directory with file-backed state
that survives context compaction.

| Path | Purpose |
|---|---|
| `.gstack/session.json` | Full session state: skill, phase, turn count, findings array, decisions array, pending/completed checks |
| `.gstack/findings.md` | Append-only finding registry with structured format |
| `.gstack/decisions.log` | Append-only user decision audit trail |
| `.gstack/handoff.md` | Skill-to-skill context transfer document |
| `.gstack/checkpoints/` | Periodic snapshots of session.json |

Strengths:

Solves the core compaction problem — findings persist to disk immediately
Dual-write with tiebreaker (findings.md wins over session.json)
Checkpoint protocol re-injects state every 5 tool calls
Skill handoff carries detailed context between invocations

Problems identified during review:

session.json is fragile. Asking Claude to maintain a complex JSON file with arrays, increment counters, and move items between arrays on every tool call takes up expensive prompt real estate and is error-prone. One malformed write corrupts the entire state file.
No branch awareness. .gstack/findings.md is a flat file: findings from different branches mix together. Switching branches means seeing irrelevant findings.
checkpoints/ is overhead without value. Copying session.json every 5 tool calls creates files nobody reads. The checkpoint print (re-injecting state into context) is the valuable part; the file copy adds nothing.
Everything is session-scoped. The entire .gstack/ directory is designed for a single user's active session. In a team setting, committing it causes merge conflicts on session.json and handoff.md (ephemeral per-user state). But some files (decisions.log, anti-patterns) would be valuable to share.
Duplicate persistence. /review writes findings to both .gstack/findings.md (granular) and ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl (summary) with no link between them.
Repo pollution. Creates a directory in the user's project that requires .gitignore maintenance. If forgotten, session state leaks into commits.

3. Synthetic Memory v2 (updated PR, redesigned)

Key insight: Files fall into two categories — session-scoped (ephemeral,
single-user) and knowledge-scoped (durable, team-valuable). v1 treated them
all the same. v2 puts them where they belong.

Session state → `~/.gstack/projects/$SLUG/` (private, never committed)

| Path | Purpose |
|---|---|
| `state.md` | Minimal session marker: skill, phase, turn (plain markdown, not JSON) |
| `handoff.md` | Skill-to-skill context transfer (deleted after consumption) |
| `findings-$BRANCH.md` | Branch-scoped granular findings |

Lives alongside upstream's existing $BRANCH-reviews.jsonl. Uses the same
$SLUG and $BRANCH variables via gstack-slug. Never touches the repo.

Why plain markdown instead of JSON? Claude reads and writes markdown far more
reliably than structured JSON with arrays. A corrupted markdown line doesn't
break the whole file. A corrupted JSON bracket does.
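
A minimal sketch of that difference in failure modes. The `key: value` layout for state.md is an assumption (the PR only says it holds skill, phase, and turn in about three lines of markdown), and both parsers are hypothetical helpers, not part of gstack.

```python
import json

def parse_state_md(text: str) -> dict[str, str]:
    """Parse 'key: value' lines, skipping any line that does not fit."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            state[key.strip()] = value.strip()
    return state

def parse_state_json(text: str) -> dict:
    """Parse JSON state; any structural damage loses the whole document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}
```

Given a state.md whose middle line lost its colon, `parse_state_md` still recovers the skill and the turn. Given a JSON file missing its closing brace, `parse_state_json` recovers nothing.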

Why branch-scoped findings? findings-feat-auth.md and findings-feat-payments.md
don't interfere. Switching branches shows only relevant findings.
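
A sketch of how a per-branch findings path could be derived. In the PR the slug and branch come from gstack-slug, whose exact behavior is not described here; the sanitizing rule below (replace anything outside letters, digits, `.`, `_`, `-` with a dash) is purely an assumption for illustration.

```python
import re
from pathlib import Path

def findings_path(home: Path, slug: str, branch: str) -> Path:
    """Build <home>/.gstack/projects/<slug>/findings-<branch>.md for a branch."""
    # Hypothetical sanitizer: a branch like "feat/auth" becomes "feat-auth".
    safe_branch = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return home / ".gstack" / "projects" / slug / f"findings-{safe_branch}.md"
```

Each branch resolves to its own file, so appending a finding on one branch never touches the file another branch reads.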

Team knowledge → `.gstack/` (repo-level, optionally committed)

| Path | Purpose |
|---|---|
| `decisions.log` | Team decision history: what was approved, rejected, deferred |
| `anti-patterns.md` | Failed fixes that should never be re-attempted |

These files are valuable across the team:

Alice's /review finds a P0 on Monday → Bob's /ship on Tuesday blocks on it
A fix attempt that was disproved in /investigate is recorded → nobody wastes time trying it again
"We decided to skip CSS linting" is visible to every team member

Default: gitignored (less intrusive for solo developers). Teams that want
shared knowledge can commit these two files.
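
A minimal sketch of an append-only decision log with timestamps, plus a lookup a skill might run before re-asking the user a scope question. The line format (`timestamp [author] decision`) and both function names are hypothetical; the PR does not specify the decisions.log format.

```python
from datetime import datetime, timezone
from pathlib import Path

def log_decision(log_path: Path, decision: str, author: str) -> str:
    """Append one timestamped decision line; existing lines are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = f"{stamp} [{author}] {decision.strip()}"
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry + "\n")
    return entry

def decisions_matching(log_path: Path, keyword: str) -> list[str]:
    """Return logged decisions that mention a keyword (case-insensitive)."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [ln for ln in lines if keyword.lower() in ln.lower()]
```

For example, after `log_decision(path, "skip CSS lint", "alice")`, a later session can call `decisions_matching(path, "css")` and find the earlier answer on disk.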

What was kept from v1

Checkpoint printing: every N tool calls, re-read files and print a status block into the conversation. This is the core compaction resistance mechanism.
Findings as source of truth: if conversation memory disagrees with the findings file, the file wins.
Decision logging: append-only audit trail of user decisions.
Skill handoff: detailed context transfer between skill invocations.
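
The checkpoint-printing mechanism can be sketched as follows. In gstack the agent itself follows a written protocol; this Python version only illustrates the logic. The status block layout is hypothetical, and the interval of 5 comes from the v1 protocol described earlier.

```python
from pathlib import Path

def checkpoint_due(tool_calls: int, every: int = 5) -> bool:
    """True on every Nth tool call (N = 5 in the v1 protocol)."""
    return tool_calls > 0 and tool_calls % every == 0

def checkpoint_block(state_file: Path, findings_file: Path) -> str:
    """Re-read state from disk and format a status block to print into context."""
    if state_file.exists():
        state = state_file.read_text(encoding="utf-8").strip()
    else:
        state = "(no state)"
    if findings_file.exists():
        findings = findings_file.read_text(encoding="utf-8").splitlines()
    else:
        findings = []
    findings = [ln for ln in findings if ln.strip()]
    lines = ["=== CHECKPOINT ===", state, f"findings on disk: {len(findings)}"]
    lines += [f"  {ln}" for ln in findings]
    return "\n".join(lines)
```

The block is rebuilt from the files on every call, so what gets printed after compaction is the disk state, not whatever summary survived in the conversation.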

What was cut from v1

session.json → replaced by state.md (3 lines of markdown vs complex JSON)
checkpoints/ directory → removed entirely (file copies added no value)
init-memory.sh complexity → simplified to mkdir -p + touch

What was added in v2

Anti-patterns tracking (inspired by PR #403, "feat: add /solve (ticket to PR) and /memory (persistent session memory)"): records failed fix attempts so future sessions never re-attempt them.
JSONL linkage: when /review logs to upstream's JSONL, it includes a pointer to the findings file for traceability.
Branch awareness: findings are scoped per branch via gstack-slug.
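
A minimal sketch of the JSONL linkage idea: each review entry carries a pointer to the granular findings file, and a reader can follow it back. The field names (`skill`, `status`, `findings_file`) are hypothetical, and this does not use or reproduce upstream's gstack-review-log helper.

```python
import json
from pathlib import Path

def log_review(jsonl_path: Path, status: str, findings_file: Path) -> dict:
    """Append one review entry that points at the granular findings file."""
    entry = {
        "skill": "review",
        "status": status,
        "findings_file": str(findings_file),  # hypothetical field name
    }
    with jsonl_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry

def last_findings_file(jsonl_path: Path):
    """Follow the pointer from the newest entry that has one, else None."""
    lines = jsonl_path.read_text(encoding="utf-8").splitlines()
    for line in reversed(lines):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # one bad line does not invalidate the rest of the log
        if isinstance(entry, dict) and "findings_file" in entry:
            return Path(entry["findings_file"])
    return None
```

Entries written before the linkage existed have no pointer and are simply skipped by the reader.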

Storage layout comparison

v1 (original):                          v2 (redesigned):

.gstack/                (repo)          .gstack/                (repo, optional commit)
├── session.json         ← fragile      ├── decisions.log        ← team knowledge
├── findings.md          ← flat         └── anti-patterns.md     ← team knowledge
├── decisions.log
├── handoff.md           ← ephemeral    ~/.gstack/projects/$SLUG/ (private, per-user)
└── checkpoints/         ← waste        ├── state.md             ← simple markdown
                                        ├── handoff.md           ← ephemeral
                                        ├── findings-$BRANCH.md  ← branch-scoped
                                        └── $BRANCH-reviews.jsonl ← upstream (unchanged)

Design principles

Use upstream's infrastructure where it exists. Don't duplicate what gstack-slug, gstack-review-log, and the JSONL system already provide.
Session state is private. Live skill progress, turn counts, and handoff documents belong in the user's home directory, not the repo.
Team knowledge is shareable. Decisions and anti-patterns compound across the team. Make them easy to commit without forcing it.
Markdown over JSON for LLM-maintained files. Claude writes markdown reliably. Claude writes JSON with arrays unreliably. Design for the agent that actually maintains the files.
Branch awareness by default. Findings from feat-auth shouldn't pollute feat-payments. Use the same $SLUG/$BRANCH scoping upstream already uses.
Checkpoint = print, not copy. The value is re-injecting state into the context window. File copies are overhead without readers.

schneidermr and others added 4 commits March 20, 2026 23:53

feat: add synthetic memory infrastructure

aca4fad

Add file-backed memory system (.gstack/) that survives context window compaction. Includes shared protocol doc (lib/memory.md), init script, status viewer, and reset utility. Updates CLAUDE.md with memory docs.

chore: bump version and changelog (v0.9.5.0)

323984a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: sync documentation with shipped changes

415c54b

Add synthetic memory architecture section to ARCHITECTURE.md.

schneidermr added 2 commits March 21, 2026 09:27

merge: resolve conflicts with upstream/main (v0.9.4.1)

8c9df16

merge: resolve conflicts with upstream/main (v0.9.5.0)

461a475

schneidermr changed the title ~~feat: synthetic memory layer for compaction resistance (v0.9.5.0)~~ feat: synthetic memory layer for compaction resistance (v0.9.6.0) Mar 21, 2026

garrytan mentioned this pull request Mar 25, 2026

feat: add /solve (ticket to PR) and /memory (persistent session memory) #403

Closed

schneidermr added 3 commits March 25, 2026 07:51

docs: clarify .gstack/ vs ~/.gstack/ storage layers in memory protocol

bec29d0

schneidermr changed the title ~~feat: synthetic memory layer for compaction resistance (v0.9.6.0)~~ feat: synthetic memory layer for compaction resistance (v0.11.19.0) Mar 25, 2026

schneidermr wants to merge 9 commits into garrytan:main from bitkaio:feat/synthetic-memory
