@aadamsx
@aadamsx aadamsx commented Dec 22, 2025

Summary

This PR addresses memory exhaustion issues that occur when running workflows with loops containing agent blocks that make many tool calls (e.g., MCP file operations).

Fixes #2525

Problem

Memory grew without bound in two areas during workflow execution:

  1. allIterationOutputs in LoopScope - every loop iteration pushed results to this array with no limit
  2. blockLogs in ExecutionContext - every block execution added logs with no pruning

This caused out-of-memory (OOM) crashes, even on systems with 64GB+ RAM, during long-running workflow executions with loops.

Solution

Added memory management with configurable limits in two places:

Loop Orchestrator (apps/sim/executor/orchestrators/loop.ts)

  • New addIterationOutputsWithMemoryLimit() method
  • Limits stored iterations to MAX_STORED_ITERATION_OUTPUTS (default: 100)
  • Monitors memory size with MAX_ITERATION_OUTPUTS_SIZE_BYTES (default: 50MB)
  • Discards oldest iterations when limits exceeded
  • Logs warning when truncation occurs

Block Executor (apps/sim/executor/execution/block-executor.ts)

  • New addBlockLogWithMemoryLimit() method
  • Limits stored logs to MAX_BLOCK_LOGS (default: 500)
  • Monitors memory size with MAX_BLOCK_LOGS_SIZE_BYTES (default: 100MB)
  • Periodic size checks every 50 logs to avoid frequent JSON serialization
  • Logs warning when truncation occurs
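
The two bounded stores described above share one shape: a count cap, a periodic size check, and a warning on truncation. A minimal sketch of that shape follows. It is not the PR's code: `BoundedBuffer`, `BufferLimits`, and the `warnings` array are hypothetical names, 50MB is assumed to mean 50 * 1024 * 1024 bytes, and the size check here is driven by a counter of additions rather than by the array length.

```typescript
// Hypothetical limits object; the PR keeps its limits in constants.ts.
interface BufferLimits {
  maxItems: number
  maxSizeBytes: number
  sizeCheckInterval: number
}

class BoundedBuffer<T> {
  // Warnings are collected here instead of being sent to a logger.
  readonly warnings: string[] = []
  private items: T[] = []
  private addsSinceSizeCheck = 0
  private readonly limits: BufferLimits

  constructor(limits: BufferLimits) {
    this.limits = limits
  }

  add(item: T): void {
    this.items.push(item)

    // Count limit: index 0 is the oldest entry, so slice from discardCount.
    if (this.items.length > this.limits.maxItems) {
      const discardCount = this.items.length - this.limits.maxItems
      this.items = this.items.slice(discardCount)
      this.warnings.push(`count limit exceeded, discarded ${discardCount} oldest`)
    }

    // Size limit: only serialize every sizeCheckInterval additions.
    this.addsSinceSizeCheck += 1
    if (this.addsSinceSizeCheck >= this.limits.sizeCheckInterval) {
      this.addsSinceSizeCheck = 0
      if (this.estimateSize() > this.limits.maxSizeBytes) {
        const discardCount = Math.ceil(this.items.length / 2)
        this.items = this.items.slice(discardCount)
        this.warnings.push(`size limit exceeded, discarded ${discardCount} oldest`)
      }
    }
  }

  snapshot(): readonly T[] {
    return this.items
  }

  private estimateSize(): number {
    try {
      // Rough estimate: two bytes per character of the JSON form.
      return JSON.stringify(this.items).length * 2
    } catch {
      // Unserializable data is treated as over the limit so cleanup still runs.
      return this.limits.maxSizeBytes + 1
    }
  }
}

// Usage with the loop-output defaults named in this PR.
const loopOutputs = new BoundedBuffer<number>({
  maxItems: 100,
  maxSizeBytes: 50 * 1024 * 1024,
  sizeCheckInterval: 10,
})
for (let i = 0; i < 250; i++) loopOutputs.add(i)
// loopOutputs.snapshot() now holds the 100 most recent values, 150 through 249.
```

The same class would cover block logs with `maxItems: 500`, a 100MB size limit, and `sizeCheckInterval: 50`.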

New Constants (apps/sim/executor/constants.ts)

  • MAX_STORED_ITERATION_OUTPUTS: 100
  • MAX_ITERATION_OUTPUTS_SIZE_BYTES: 50MB
  • MAX_BLOCK_LOGS: 500
  • MAX_BLOCK_LOGS_SIZE_BYTES: 100MB

Trade-offs

  • Final aggregated loop.results will contain only the most recent iterations (up to 100)
  • Block logs in execution data will contain only the most recent logs (up to 500)
  • Warnings are logged when truncation occurs, allowing users to see if limits were hit

Testing

  • For loop: Execute a workflow with 200+ loop iterations, verify memory doesn't grow unbounded
  • Agent in loop: Run a loop with agent blocks making 50+ tool calls per iteration, verify no OOM

Files Changed

  • apps/sim/executor/constants.ts - Added new configurable limits
  • apps/sim/executor/orchestrators/loop.ts - Added memory-bounded iteration storage
  • apps/sim/executor/execution/block-executor.ts - Added memory-bounded log storage

icecrasher321 and others added 7 commits December 18, 2025 16:23
…dioai#2481)

The realtime service network policy was missing the custom egress rules section
that allows configuration of additional egress rules via values.yaml. This caused
the realtime pods to be unable to connect to external databases (e.g., PostgreSQL
on port 5432) when using external database configurations.

The app network policy already had this section, but the realtime network policy
was missing it, creating an inconsistency and preventing the realtime service
from accessing external databases configured via networkPolicy.egress values.

This fix adds the same custom egress rules template section to the realtime
network policy, matching the app network policy behavior and allowing users to
configure database connectivity via values.yaml.
@vercel
vercel bot commented Dec 22, 2025

@aadamsx is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Contributor

greptile-apps bot commented Dec 22, 2025

Greptile Summary

This PR implements memory management for long-running workflow loops by adding bounded storage for iteration outputs and block logs.

Key Changes:

  • Added configurable memory limits in constants.ts (100 iterations/50MB for loops, 500 logs/100MB for blocks)
  • Implemented addIterationOutputsWithMemoryLimit() in loop orchestrator that truncates oldest iterations when limits exceeded
  • Implemented addBlockLogWithMemoryLimit() in block executor that discards older logs when limits exceeded
  • Both use periodic memory size checks (every 10 iterations / every 50 logs) to avoid frequent serialization overhead
  • Added custom egress rules support to Helm network policy (unrelated infrastructure change)

Considerations:

  • Memory size checks use JSON.stringify().length * 2 as an approximation, which may not reflect actual V8 heap usage
  • Periodic checking (modulo arithmetic) means memory can grow beyond limits between checks
  • Users lose access to older iteration results/logs after truncation (trade-off documented in PR)
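
To make the first consideration concrete, here is a small sketch of the length-times-2 estimate. The function name `estimateSizeBytes` and the sample values are illustrative only. The estimate measures the JSON text of a value, which generally differs from how the same data is laid out on the heap.

```typescript
// Same approximation the review describes: JSON length times 2.
function estimateSizeBytes(value: unknown): number {
  const json = JSON.stringify(value)
  // JSON.stringify returns undefined for values such as `undefined` or functions.
  return json === undefined ? 0 : json.length * 2
}

// {"text":"aaaa"} is 15 characters, so the estimate is 30.
const asciiEstimate = estimateSizeBytes({ text: 'aaaa' })

// [0,0,...,0] with 1000 zeros is 2001 characters, so the estimate is 4002,
// even though the array holds numbers rather than text.
const numbersEstimate = estimateSizeBytes(new Array<number>(1000).fill(0))
```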

Confidence Score: 4/5

  • safe to merge with minor timing edge case in memory checks
  • implementation correctly addresses the OOM issue with reasonable defaults, but periodic memory checking has a timing gap where memory can exceed limits between checks (especially after count-based truncation resets the modulo counter)
  • pay attention to apps/sim/executor/orchestrators/loop.ts due to memory check timing issue

Important Files Changed

Filename - Overview

  • apps/sim/executor/constants.ts - added memory limit constants for loop iterations and block logs with clear documentation
  • apps/sim/executor/orchestrators/loop.ts - implemented memory-bounded iteration storage with count and size limits; memory check timing may cause issues
  • apps/sim/executor/execution/block-executor.ts - implemented memory-bounded log storage with count and size limits, properly handles serialization failures

Sequence Diagram

sequenceDiagram
    participant WF as Workflow Executor
    participant BE as BlockExecutor
    participant LO as LoopOrchestrator
    participant LS as LoopScope
    participant CTX as ExecutionContext
    
    Note over WF,CTX: Loop Execution with Memory Management
    
    WF->>LO: initializeLoopScope(ctx, loopId)
    LO->>LS: create LoopScope
    LS-->>LO: scope with empty allIterationOutputs[]
    
    loop Each Iteration
        WF->>BE: execute(ctx, node, block)
        BE->>BE: createBlockLog()
        BE->>BE: addBlockLogWithMemoryLimit(ctx, blockLog)
        
        alt blockLogs.length > MAX_BLOCK_LOGS (500)
            BE->>CTX: slice and discard oldest logs
            BE->>BE: log warning
        end
        
        alt blockLogs.length % 50 === 0
            BE->>BE: estimateBlockLogsSize()
            alt size > MAX_BLOCK_LOGS_SIZE_BYTES (100MB)
                BE->>CTX: discard oldest half of logs
                BE->>BE: log warning
            end
        end
        
        BE->>CTX: blockLogs.push(blockLog)
        BE-->>WF: NormalizedBlockOutput
        
        WF->>LO: storeLoopNodeOutput(ctx, loopId, nodeId, output)
        LO->>LS: currentIterationOutputs.set(nodeId, output)
        
        WF->>LO: evaluateLoopContinuation(ctx, loopId)
        LO->>LO: collect iterationResults from currentIterationOutputs
        LO->>LO: addIterationOutputsWithMemoryLimit(scope, results)
        
        alt allIterationOutputs.length > MAX_STORED_ITERATION_OUTPUTS (100)
            LO->>LS: slice and discard oldest iterations
            LO->>LO: log warning
        end
        
        alt allIterationOutputs.length % 10 === 0
            LO->>LO: estimateObjectSize()
            alt size > MAX_ITERATION_OUTPUTS_SIZE_BYTES (50MB)
                LO->>LS: discard oldest half of iterations
                LO->>LO: log warning
            end
        end
        
        LO->>LS: allIterationOutputs.push(results)
        LO->>LS: currentIterationOutputs.clear()
        
        LO->>LO: evaluateCondition()
        alt condition false
            LO->>LO: createExitResult()
            LO-->>WF: shouldExit=true
        else condition true
            LO->>LS: increment iteration
            LO-->>WF: shouldContinue=true
        end
    end
    
    Note over WF,CTX: Memory bounded to ~150MB total

Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (5)

  1. apps/sim/executor/orchestrators/loop.ts, line 519-529 (link)

    logic: JSON.stringify() runs on EVERY iteration before size check, defeating the purpose of memory optimization. With 1000 iterations, this serializes potentially GBs of data 1000 times.

    Move size check to only run periodically (e.g., every 10 iterations) like block-executor.ts:

    Then in addIterationOutputsWithMemoryLimit at line 497:

    // Check memory size limit periodically (every 10 iterations to avoid frequent serialization)
    if (scope.allIterationOutputs.length % 10 === 0) {
      const estimatedSize = this.estimateObjectSize(scope.allIterationOutputs)
      if (estimatedSize > DEFAULTS.MAX_ITERATION_OUTPUTS_SIZE_BYTES) {
        // ... (the rest of the suggestion is not shown in this comment)
      }
    }

  2. apps/sim/executor/execution/block-executor.ts, line 704-710 (link)

    logic: returns max limit on error, which prevents cleanup and allows unbounded growth if serialization consistently fails

  3. apps/sim/executor/orchestrators/loop.ts, line 519-529 (link)

    logic: same issue - returning max limit prevents cleanup on serialization errors

  4. apps/sim/executor/orchestrators/loop.ts, line 483-494 (link)

    logic: slicing from discardCount removes NEWEST iterations instead of oldest. The intent is to keep the most recent 100 iterations.

  5. apps/sim/executor/execution/block-executor.ts, line 673-682 (link)

    logic: slicing from discardCount removes NEWEST logs instead of oldest

3 files reviewed, 5 comments

This change addresses memory exhaustion issues that occur when running
workflows with loops containing agent blocks that make many tool calls.

Problem:
Memory accumulated unbounded in two key areas:
1. allIterationOutputs in LoopScope - every iteration pushed results
2. blockLogs in ExecutionContext - every block execution added logs

Solution:
Added memory management with configurable limits in constants.ts:
- MAX_STORED_ITERATION_OUTPUTS (100) and MAX_ITERATION_OUTPUTS_SIZE_BYTES (50MB)
- MAX_BLOCK_LOGS (500) and MAX_BLOCK_LOGS_SIZE_BYTES (100MB)

Loop orchestrator (loop.ts):
- New addIterationOutputsWithMemoryLimit() method
- Periodic size checks (every 10 iterations) to avoid serialization overhead
- Discards oldest iterations when limits exceeded

Block executor (block-executor.ts):
- New addBlockLogWithMemoryLimit() method
- Periodic size checks (every 50 logs)
- Discards oldest logs when limits exceeded

Trade-offs:
- Final aggregated results contain only recent iterations
- Logs show warning when truncation occurs for debugging

Fixes simstudioai#2525
@aadamsx aadamsx force-pushed the fix/memory-accumulation-in-loops branch from 36bbe99 to 387efb6 Compare December 22, 2025 19:19
@aadamsx
Author

aadamsx commented Dec 27, 2025

Response to Greptile Review

The issues identified in the review on commit aae7c40 have been addressed in the current HEAD (387efb6):

✅ Performance Issue (Fixed)

Added periodic size checking - JSON.stringify() now only runs every 10 iterations for loop outputs and every 50 logs for block logs, rather than on every iteration.

✅ Error Handling (Fixed)

Changed error return to MAX + 1 instead of MAX so that serialization failures properly trigger cleanup.
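
A sketch of why the fallback value matters, assuming the limit is compared with a strict `>` as in the review's snippet. The `estimateSize` helper and its `fallback` parameter are hypothetical. A circular reference is used to force `JSON.stringify` to throw.

```typescript
const MAX_SIZE_BYTES = 50 * 1024 * 1024 // assumed byte value for "50MB"

// Hypothetical helper: returns `fallback` when serialization fails.
function estimateSize(value: unknown, fallback: number): number {
  try {
    const json = JSON.stringify(value)
    return json === undefined ? 0 : json.length * 2
  } catch {
    return fallback
  }
}

// JSON.stringify throws a TypeError on circular structures.
const circular: Record<string, unknown> = {}
circular['self'] = circular

// Returning MAX on error: the strict comparison is false, so no cleanup runs.
const cleanupWithMax = estimateSize([circular], MAX_SIZE_BYTES) > MAX_SIZE_BYTES

// Returning MAX + 1 on error: the comparison is true, so cleanup runs.
const cleanupWithMaxPlusOne = estimateSize([circular], MAX_SIZE_BYTES + 1) > MAX_SIZE_BYTES
```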

ℹ️ Array Slicing Logic (Already Correct)

The Greptile comment about array slicing is incorrect. The current logic is working as intended:

scope.allIterationOutputs = scope.allIterationOutputs.slice(discardCount)

When we have 110 items and MAX is 100:

  • discardCount = 110 - 100 = 10
  • slice(10) returns items at indices 10-109 (newest 100 items)
  • Items at indices 0-9 (oldest 10) are discarded

This correctly implements "discard oldest, keep newest" as stated in the docstring. The array is built by pushing new items, so index 0 is oldest and the last index is newest.
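
The arithmetic above can be checked with a standalone sketch. The array of integers stands in for iteration outputs, with each value equal to its original index.

```typescript
const MAX = 100

// 110 items pushed in order: value i was pushed i-th, so 0 is oldest and 109 is newest.
const outputs = Array.from({ length: 110 }, (_, i) => i)

const discardCount = outputs.length - MAX // 10
const kept = outputs.slice(discardCount) // drops indices 0-9, keeps 10-109

// For contrast, slice(0, MAX) would keep the oldest 100 and drop the newest 10.
const keptOldest = outputs.slice(0, MAX)
```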

@aadamsx aadamsx changed the base branch from main to staging December 27, 2025 03:07
@waleedlatif1 waleedlatif1 deleted the branch simstudioai:staging December 27, 2025 05:25
@waleedlatif1 waleedlatif1 reopened this Dec 27, 2025
Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. apps/sim/executor/orchestrators/loop.ts, line 496-497 (link)

    logic: memory check happens only on exact multiples of 10, missing iterations 101-109 after first truncation at iteration 100

    If iteration 100 hits the count limit and truncates to 100 items, iterations 101-109 skip the memory size check, so the next check does not happen until iteration 110, potentially allowing another 50MB+ of growth.
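
The interaction between a count cap and a `length % N` trigger can be explored with a simplified model. The sketch below is not the PR's code: it tracks only the array length, and `countSizeChecks` and the three trigger names are hypothetical. It suggests that once the array is pinned at the cap, a length-based trigger either fires on every addition or never fires, depending on whether the check runs after or before truncation, while a counter of additions keeps a steady cadence.

```typescript
type Trigger = 'length-after-truncate' | 'length-before-truncate' | 'add-counter'

// Counts how many size checks run over `totalAdds` additions.
function countSizeChecks(trigger: Trigger, totalAdds: number, cap = 100, interval = 10): number {
  let length = 0 // stands in for allIterationOutputs.length
  let checks = 0
  for (let added = 1; added <= totalAdds; added++) {
    length += 1 // push the new iteration
    if (trigger === 'length-before-truncate' && length % interval === 0) checks++
    if (length > cap) length = cap // count-based truncation
    if (trigger === 'length-after-truncate' && length % interval === 0) checks++
    if (trigger === 'add-counter' && added % interval === 0) checks++
  }
  return checks
}

// Over 200 additions with a cap of 100 and an interval of 10:
const afterTruncate = countSizeChecks('length-after-truncate', 200) // 110: every add once capped
const beforeTruncate = countSizeChecks('length-before-truncate', 200) // 10: none once capped
const addCounter = countSizeChecks('add-counter', 200) // 20: one per 10 additions
```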

4 files reviewed, 1 comment

Development

Successfully merging this pull request may close these issues.

Memory exhaustion in loop executions with agent blocks making tool calls