Skip to content

refactor: event-driven BullMQ pipeline, remove MongoDB status polling#20

Draft
necipsagiro wants to merge 1 commit intofeat/proverfrom
refactor/event-driven-pipeline
Draft

refactor: event-driven BullMQ pipeline, remove MongoDB status polling#20
necipsagiro wants to merge 1 commit intofeat/proverfrom
refactor/event-driven-pipeline

Conversation

@necipsagiro
Copy link
Member

Experimental: This PR is not production-ready or audited. It aims to demonstrate how an event-driven version of the current polling-based pipeline would look. Review and further testing required before merging.

Summary

The previous architecture duplicated BullMQ's built-in capabilities in MongoDB:

  • status[] arrays on BlockEpoch/ProofEpoch tracked "waiting"/"processing"/"done" — but BullMQ already manages job states
  • failCount fields reimplemented retry logic — but BullMQ has attempts + exponential backoff
  • timeoutAt fields reimplemented timeout detection — but BullMQ has lockDuration + stalled detection
  • Master classes polled MongoDB in while(true) loops looking for "waiting" records — creating unnecessary load and latency
  • Crash recovery was broken: if a server crashed while a record was in "processing" status, it would stay stuck forever with no way to recover

What changed

MongoDB is now a pure data store. All job orchestration is handled by BullMQ.

Before After
Master polling loops (while(true) + sleep) Event-driven push (each worker triggers the next stage)
MongoDB status[] fields BullMQ job states
MongoDB failCount + manual retry BullMQ attempts: 3 + exponential backoff (10s → 20s → 40s)
MongoDB timeoutAt + manual timeout check BullMQ lockDuration (5min) + stalled detection (5s)
No crash recovery Automatic re-queue via BullMQ + startup recovery sweep
15 hardcoded aggregation patterns Generic binary tree formula
Mongoose transactions in workers Idempotent upserts

New files

  • processors/triggers.ts — Event-driven stage transitions. After a worker stores a proof, tryEnqueueAggregation() checks if the sibling proof exists and enqueues the merge. tryEnqueueSettlement() enqueues the settler when the root proof is ready.
  • processors/pipeline.tsPipelineManager replaces the three separate Master classes. Creates all BullMQ workers, handles events, graceful shutdown (SIGTERM/SIGINT).
  • processors/recovery.ts — Startup recovery sweep. Scans MongoDB for incomplete work (full epochs without proofs, sibling pairs without parent, unsettled roots) and enqueues missing jobs. Safe to run repeatedly thanks to deterministic job IDs.
  • processors/utils/jobOptions.ts — Shared BullMQ config and deterministic job ID generators (bp:{height}, agg:{height}:{index}, settle:{height}).

Deleted files

  • processors/base/Master.ts — Base polling class
  • processors/block-prover/master.ts — BlockProver polling loop
  • processors/aggregator/master.ts — Aggregator polling loop
  • processors/settler/master.ts — Settler polling loop
  • processors/block-prover/utils.ts — Status registration helpers
  • db/types.tsBlockStatus, ProofStatus, ProofKind enums
  • All associated test files

Schema simplifications

  • BlockEpoch: Removed status[], epochStatus, failCount, timeoutAt
  • ProofEpoch: Removed status[], kind, failCount, timeoutAt; added settled: boolean
  • Block: Removed status, timeoutAt

Test plan

  • Verify all remaining tests pass (npx vitest run)
  • Review idempotency: each worker skips if its output already exists
  • Review crash recovery: startup sweep catches incomplete work
  • Review deterministic job IDs prevent duplicate processing
  • Manual test: stop server mid-pipeline, restart, verify jobs resume

🤖 Generated with Claude Code

Remove redundant MongoDB status tracking (status[], failCount, timeoutAt, kind)
and Master polling classes. Each worker now triggers the next pipeline stage
upon completion via deterministic BullMQ jobs. Add startup recovery sweep
for crash resilience and PipelineManager for worker lifecycle management.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel
Copy link

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
pulsar Ready Ready Preview, Comment Mar 5, 2026 7:29am

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant