Run Lifecycle and State Machine

This document describes the complete lifecycle of a task run execution, the state machine governing its progression, and the dual-status model that tracks both run-level and execution-level states. For information about how runs are queued and dequeued, see Queue Management. For details on the systems that orchestrate these state transitions, see Run Engine Architecture.

Dual Status Model

The run engine employs a dual-status tracking system to separate concerns between the overall run state and the granular execution state:

TaskRunStatus - Stored on the TaskRun database record, represents the overall lifecycle state of the run. This is the primary status visible to users and used for filtering and reporting.

TaskRunExecutionStatus - Stored on TaskRunExecutionSnapshot records, represents the detailed execution state at specific points in time. Each snapshot captures a moment in the run's execution, allowing the system to track progress, recover from failures, and coordinate distributed operations.

This separation allows the engine to maintain a simple, user-facing status (TaskRun.status) while internally tracking more granular execution states through immutable snapshots. Multiple snapshots can exist for a single run, forming a complete audit trail of execution progress.

Sources: internal-packages/database/prisma/schema.prisma, internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:1-450
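As a rough illustration of this split, the sketch below keeps one mutable status on the run and appends frozen snapshots beside it. The `Run`, `Snapshot`, and `recordSnapshot` names are hypothetical, the status unions are trimmed to a few values, and the pairing of run status with execution status is an assumption made for the example; the real Prisma models are not reproduced here.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
type RunStatus = "PENDING" | "QUEUED" | "DEQUEUED" | "EXECUTING";
type ExecutionStatus = "RUN_CREATED" | "QUEUED" | "PENDING_EXECUTING" | "EXECUTING";

interface Run {
  id: string;
  status: RunStatus; // user-facing value, overwritten in place
}

interface Snapshot {
  readonly runId: string;
  readonly runStatus: RunStatus;
  readonly executionStatus: ExecutionStatus;
}

// Updates the run-level status and appends an immutable snapshot to the history.
function recordSnapshot(
  run: Run,
  history: Snapshot[],
  runStatus: RunStatus,
  executionStatus: ExecutionStatus
): Snapshot {
  run.status = runStatus;
  const snapshot: Snapshot = Object.freeze({ runId: run.id, runStatus, executionStatus });
  history.push(snapshot);
  return snapshot;
}

const exampleRun: Run = { id: "run_1", status: "PENDING" };
const exampleHistory: Snapshot[] = [];
recordSnapshot(exampleRun, exampleHistory, "PENDING", "RUN_CREATED");
recordSnapshot(exampleRun, exampleHistory, "QUEUED", "QUEUED");
recordSnapshot(exampleRun, exampleHistory, "DEQUEUED", "PENDING_EXECUTING");
recordSnapshot(exampleRun, exampleHistory, "EXECUTING", "EXECUTING");
```

After these calls the run carries a single status, while the history holds four snapshots that can be replayed in order.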

TaskRun Statuses

The TaskRunStatus enum defines the high-level states a run can be in:

Status	Description	Terminal	Next States
`PENDING`	Run is created and waiting to be queued	No	QUEUED, EXPIRED, PENDING_VERSION
`DELAYED`	Run is waiting for `delayUntil` time	No	PENDING, CANCELED
`PENDING_VERSION`	Waiting for a worker deployment to become available	No	QUEUED, SYSTEM_FAILURE
`QUEUED`	Enqueued and waiting to be dequeued by a worker	No	DEQUEUED, EXPIRED
`DEQUEUED`	Removed from queue, preparing to execute	No	EXECUTING
`EXECUTING`	Currently executing	No	WAITING_TO_RESUME, COMPLETED_SUCCESSFULLY, COMPLETED_WITH_ERRORS, CRASHED, SYSTEM_FAILURE, CANCELED
`WAITING_TO_RESUME`	Checkpointed and waiting to resume	No	EXECUTING, CANCELED
`COMPLETED_SUCCESSFULLY`	Completed without errors	Yes	None
`COMPLETED_WITH_ERRORS`	Completed but with errors after all retries	Yes	None
`SYSTEM_FAILURE`	Failed due to system/infrastructure issues	Yes	None
`CRASHED`	Worker crashed (OOM, segfault, etc.)	Yes	None
`EXPIRED`	TTL exceeded before execution	Yes	None
`TIMED_OUT`	Execution exceeded maxDuration	Yes	None
`CANCELED`	Explicitly canceled	Yes	None
`INTERRUPTED`	Interrupted (legacy, no longer used)	Yes	None

Sources: internal-packages/run-engine/src/engine/statuses.ts:44-61, internal-packages/database/prisma/schema.prisma
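The table can be read as a transition map. The sketch below transcribes the Next States column as written and derives two helpers from it. `NEXT_STATES`, `isTerminal`, and `canTransition` are hypothetical names for this example, not functions from statuses.ts.

```typescript
// Transcribed from the table above; illustrative only, not the engine's own data structure.
type TaskRunStatus =
  | "PENDING" | "DELAYED" | "PENDING_VERSION" | "QUEUED" | "DEQUEUED"
  | "EXECUTING" | "WAITING_TO_RESUME" | "COMPLETED_SUCCESSFULLY"
  | "COMPLETED_WITH_ERRORS" | "SYSTEM_FAILURE" | "CRASHED" | "EXPIRED"
  | "TIMED_OUT" | "CANCELED" | "INTERRUPTED";

const NEXT_STATES: Record<TaskRunStatus, TaskRunStatus[]> = {
  PENDING: ["QUEUED", "EXPIRED", "PENDING_VERSION"],
  DELAYED: ["PENDING", "CANCELED"],
  PENDING_VERSION: ["QUEUED", "SYSTEM_FAILURE"],
  QUEUED: ["DEQUEUED", "EXPIRED"],
  DEQUEUED: ["EXECUTING"],
  EXECUTING: [
    "WAITING_TO_RESUME", "COMPLETED_SUCCESSFULLY", "COMPLETED_WITH_ERRORS",
    "CRASHED", "SYSTEM_FAILURE", "CANCELED",
  ],
  WAITING_TO_RESUME: ["EXECUTING", "CANCELED"],
  // Terminal statuses have no outgoing transitions.
  COMPLETED_SUCCESSFULLY: [],
  COMPLETED_WITH_ERRORS: [],
  SYSTEM_FAILURE: [],
  CRASHED: [],
  EXPIRED: [],
  TIMED_OUT: [],
  CANCELED: [],
  INTERRUPTED: [],
};

// A status is terminal when the table lists no next states for it.
function isTerminal(status: TaskRunStatus): boolean {
  return NEXT_STATES[status].length === 0;
}

// True when the table lists `to` as a next state of `from`.
function canTransition(from: TaskRunStatus, to: TaskRunStatus): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}
```

A guard such as `canTransition("QUEUED", "EXECUTING")` returns false here because the table routes that path through `DEQUEUED`.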

TaskRunExecutionStatus

The TaskRunExecutionStatus enum defines the detailed execution states tracked in snapshots:

Status	Description	Dequeuable	Checkpointable	Holds Concurrency
`RUN_CREATED`	Initial snapshot created	No	Yes	No
`QUEUED`	Enqueued in run queue	Yes	Yes	Yes
`QUEUED_EXECUTING`	Queued while still executing (rare)	Yes	Yes	Yes
`PENDING_EXECUTING`	Dequeued, waiting for attempt to start	No	No	Yes
`EXECUTING`	Actively executing code	No	Yes	Yes
`EXECUTING_WITH_WAITPOINTS`	Executing but blocked on waitpoints	No	Yes	No (released)
`SUSPENDED`	Checkpointed and suspended	No	No	No (released)
`FINISHED`	Execution complete	No	No	No
`PENDING_CANCEL`	Cancellation requested	No	No	Yes

Sources: internal-packages/run-engine/src/engine/statuses.ts:1-62, internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:140-156
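One way to make these per-status properties queryable is a lookup record. The sketch below copies the three boolean columns from the table; `STATUS_TRAITS` and `statusesWith` are hypothetical names, and the engine may well express the same checks as separate predicate functions.

```typescript
type TaskRunExecutionStatus =
  | "RUN_CREATED" | "QUEUED" | "QUEUED_EXECUTING" | "PENDING_EXECUTING"
  | "EXECUTING" | "EXECUTING_WITH_WAITPOINTS" | "SUSPENDED" | "FINISHED"
  | "PENDING_CANCEL";

interface StatusTraits {
  dequeuable: boolean;
  checkpointable: boolean;
  holdsConcurrency: boolean;
}

// Values copied from the table above (illustrative, not the engine's source of truth).
const STATUS_TRAITS: Record<TaskRunExecutionStatus, StatusTraits> = {
  RUN_CREATED: { dequeuable: false, checkpointable: true, holdsConcurrency: false },
  QUEUED: { dequeuable: true, checkpointable: true, holdsConcurrency: true },
  QUEUED_EXECUTING: { dequeuable: true, checkpointable: true, holdsConcurrency: true },
  PENDING_EXECUTING: { dequeuable: false, checkpointable: false, holdsConcurrency: true },
  EXECUTING: { dequeuable: false, checkpointable: true, holdsConcurrency: true },
  EXECUTING_WITH_WAITPOINTS: { dequeuable: false, checkpointable: true, holdsConcurrency: false },
  SUSPENDED: { dequeuable: false, checkpointable: false, holdsConcurrency: false },
  FINISHED: { dequeuable: false, checkpointable: false, holdsConcurrency: false },
  PENDING_CANCEL: { dequeuable: false, checkpointable: false, holdsConcurrency: true },
};

// Lists every status for which the given trait is true, in table order.
function statusesWith(trait: keyof StatusTraits): TaskRunExecutionStatus[] {
  const all = Object.keys(STATUS_TRAITS) as TaskRunExecutionStatus[];
  return all.filter((status) => STATUS_TRAITS[status][trait]);
}
```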

Complete State Machine

The following diagram shows all possible state transitions for a task run, including both TaskRunStatus (shown in bold) and TaskRunExecutionStatus (shown in context):

Sources: internal-packages/run-engine/src/engine/index.ts:336-579, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:294-822, internal-packages/run-engine/src/engine/systems/dequeueSystem.ts:105-661

State Transitions by Phase

Triggering Phase

When a run is triggered via RunEngine.trigger(), it begins in one of two initial states:

Initial State	Condition	Execution Status
`DELAYED`	`delayUntil` parameter provided	`RUN_CREATED`
`PENDING`	No delay, or after delay expires	`RUN_CREATED` → `QUEUED`

The trigger operation creates the following records, then either schedules or enqueues the run:

A TaskRun record with the initial status
An initial TaskRunExecutionSnapshot with executionStatus: RUN_CREATED
An associated Waitpoint (type RUN) for parent tracking
If delayed, schedules enqueueDelayedRun job via DelayedRunSystem
If not delayed, immediately calls EnqueueSystem.enqueueRun()

Sources: internal-packages/run-engine/src/engine/index.ts:339-579, internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts:100-166
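The branch on `delayUntil` described above can be sketched as a small pure function. `TriggerParams`, `TriggerPlan`, and `planTrigger` are hypothetical names for this example. The real `RunEngine.trigger()` accepts many more parameters and performs database writes that are omitted here.

```typescript
// Hypothetical input: only the field that decides the initial state.
interface TriggerParams {
  delayUntil?: Date;
}

interface TriggerPlan {
  runStatus: "DELAYED" | "PENDING";
  executionStatus: "RUN_CREATED";
  // Which follow-up step the text above describes for this branch.
  nextStep: "scheduleEnqueueDelayedRun" | "enqueueRun";
}

// Chooses the initial statuses and follow-up step based on whether a delay was requested.
function planTrigger(params: TriggerParams): TriggerPlan {
  if (params.delayUntil !== undefined) {
    return {
      runStatus: "DELAYED",
      executionStatus: "RUN_CREATED",
      nextStep: "scheduleEnqueueDelayedRun",
    };
  }
  return { runStatus: "PENDING", executionStatus: "RUN_CREATED", nextStep: "enqueueRun" };
}
```

In both branches the first snapshot is `RUN_CREATED`; only the run status and the follow-up step differ.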

Queuing Phase

The EnqueueSystem transitions runs from initial states to the queue:

Enqueuing performs the following steps:

Creates new snapshot with executionStatus: QUEUED
Calculates queue timestamp: (queueTimestamp ?? createdAt) - priorityMs
Adds message to RunQueue (Redis-backed)
Allocates concurrency at environment and queue levels
If TTL specified, schedules expireRun job

Sources: internal-packages/run-engine/src/engine/systems/enqueueSystem.ts:25-104, internal-packages/run-queue/src/index.ts
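The queue timestamp formula above is plain arithmetic on millisecond values. The sketch below applies it to two runs. The `QueuedRun` shape and `queueScore` name are hypothetical, treating a missing `priorityMs` as 0 is an assumption, and so is the idea that the queue serves the lowest timestamp first.

```typescript
// Hypothetical shape holding only the fields the formula needs.
interface QueuedRun {
  id: string;
  createdAt: Date;
  queueTimestamp?: Date;
  priorityMs?: number;
}

// (queueTimestamp ?? createdAt) - priorityMs, in epoch milliseconds.
function queueScore(run: QueuedRun): number {
  const base = (run.queueTimestamp ?? run.createdAt).getTime();
  return base - (run.priorityMs ?? 0); // assumption: no priority means no offset
}

const sampleRuns: QueuedRun[] = [
  { id: "older", createdAt: new Date(1_000) },
  { id: "newer-high-priority", createdAt: new Date(2_000), priorityMs: 5_000 },
];

// Assumption: lower scores are dequeued first.
const dequeueOrder = sampleRuns
  .slice()
  .sort((a, b) => queueScore(a) - queueScore(b))
  .map((run) => run.id);
```

Subtracting `priorityMs` moves a run earlier in time, so under the lowest-first assumption the newer run with a 5,000 ms priority sorts ahead of the older run without one.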

Dequeuing Phase

The DequeueSystem removes runs from the queue and prepares them for execution:

Key validations during dequeue:

Latest snapshot must be in QUEUED or QUEUED_EXECUTING state
Background worker must exist for the run's locked version
Task must be registered in the deployment
Task queue must exist
Deployment must have an image reference (production only)

Sources: internal-packages/run-engine/src/engine/systems/dequeueSystem.ts:105-661, internal-packages/run-engine/src/engine/systems/dequeueSystem.ts:281-489
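The validations listed above can be sketched as a function that collects every failed check. The `DequeueCandidate` fields and the `dequeueProblems` name are hypothetical; how the real `DequeueSystem` reports or reacts to each failure is not shown in this document and is not modeled here.

```typescript
// Hypothetical flattened view of what the dequeue step needs to inspect.
interface DequeueCandidate {
  latestExecutionStatus: string;
  hasBackgroundWorker: boolean;
  taskRegistered: boolean;
  queueExists: boolean;
  isProduction: boolean;
  imageReference?: string;
}

// Returns a list of failed checks; an empty list means the run may proceed.
function dequeueProblems(candidate: DequeueCandidate): string[] {
  const problems: string[] = [];
  const status = candidate.latestExecutionStatus;
  if (status !== "QUEUED" && status !== "QUEUED_EXECUTING") {
    problems.push("latest snapshot is not QUEUED or QUEUED_EXECUTING");
  }
  if (!candidate.hasBackgroundWorker) {
    problems.push("no background worker for the locked version");
  }
  if (!candidate.taskRegistered) {
    problems.push("task is not registered in the deployment");
  }
  if (!candidate.queueExists) {
    problems.push("task queue does not exist");
  }
  // The image reference check applies to production only.
  if (candidate.isProduction && !candidate.imageReference) {
    problems.push("deployment has no image reference");
  }
  return problems;
}
```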

Execution Phase

The RunAttemptSystem manages the execution lifecycle:

Sources: internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:294-628, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:630-822

Waitpoint States

Waitpoints allow runs to pause execution while waiting for external events. The WaitpointSystem manages these states:

When a run enters EXECUTING_WITH_WAITPOINTS:

Concurrency tokens are released (environment and queue level)
Other runs can execute while this run waits
Multiple waitpoints can block a single run
Waitpoints can complete independently
When all waitpoints complete, continueRunIfUnblocked() is triggered

Sources: internal-packages/run-engine/src/engine/systems/waitpointSystem.ts:368-496, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts:499-737
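The "blocked until all waitpoints complete" rule can be illustrated with a small in-memory tracker. `WaitpointTracker` and its methods are hypothetical names; the real `WaitpointSystem` persists this state and also releases concurrency, which is left out of this sketch.

```typescript
type WaitState = "EXECUTING" | "EXECUTING_WITH_WAITPOINTS";

// Tracks which waitpoints still block a single run (in-memory sketch).
class WaitpointTracker {
  private pending: string[] = [];
  private state: WaitState = "EXECUTING";

  // Adds a blocking waitpoint and moves the run into the blocked state.
  block(waitpointId: string): void {
    if (this.pending.indexOf(waitpointId) === -1) {
      this.pending.push(waitpointId);
    }
    this.state = "EXECUTING_WITH_WAITPOINTS";
  }

  // Marks one waitpoint complete; returns true only when this call unblocked the run.
  complete(waitpointId: string): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((id) => id !== waitpointId);
    if (before > 0 && this.pending.length === 0) {
      this.state = "EXECUTING";
      return true;
    }
    return false;
  }

  currentState(): WaitState {
    return this.state;
  }
}
```

Completing one of two waitpoints leaves the run blocked; only the completion that empties the pending list returns true, which is the point at which `continueRunIfUnblocked()` would apply in the description above.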

Checkpoint and Resume States

The CheckpointSystem allows runs to suspend execution and free all resources:

Characteristics of a checkpointed run:

The run has status WAITING_TO_RESUME and its latest snapshot is SUSPENDED: a checkpoint exists and the run is fully suspended
Concurrency is completely released
Run must be re-queued to resume
On resume, checkpoint data (location, imageRef) is provided to worker
Worker restores state from checkpoint storage

Sources: internal-packages/run-engine/src/engine/systems/checkpointSystem.ts:36-249, internal-packages/run-engine/src/engine/systems/checkpointSystem.ts:254-365

Terminal States

Runs enter terminal states through RunAttemptSystem.completeRunAttempt():

Sources: internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:630-667, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:669-822, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts:824-1145

Execution Snapshots

TaskRunExecutionSnapshot records provide an immutable audit trail of run state changes:

Key properties:

Immutable: Snapshots are never modified after creation
Linked: Each snapshot references the previous via previousSnapshotId
Atomic: Creating a snapshot is atomic with state updates
Latest: getLatestExecutionSnapshot() finds the most recent valid snapshot
Historical: getExecutionSnapshotsSince() retrieves snapshot history

Snapshot creation flow:

ExecutionSnapshotSystem.createExecutionSnapshot() is called by any system changing state
New snapshot record is inserted with current executionStatus and runStatus
If the snapshot is valid and its status requires one, a heartbeat job is scheduled
executionSnapshotCreated event is emitted for realtime updates
Previous snapshot becomes part of the historical chain

Sources: internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:226-337, internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:95-113
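The linked, append-only nature of the snapshot chain can be sketched in memory as follows. `SnapshotChain`, `latestValid`, and `since` are hypothetical stand-ins for the behavior attributed above to `getLatestExecutionSnapshot()` and `getExecutionSnapshotsSince()`; the `isValid` field name and the generated ids are assumptions, and heartbeat scheduling and event emission are omitted.

```typescript
interface ExecutionSnapshot {
  readonly id: string;
  readonly previousSnapshotId: string | null;
  readonly executionStatus: string;
  readonly isValid: boolean; // assumed field name
}

class SnapshotChain {
  private snapshots: ExecutionSnapshot[] = [];

  // Appends a frozen snapshot that points at the previous one.
  append(executionStatus: string, isValid: boolean = true): ExecutionSnapshot {
    const count = this.snapshots.length;
    const previous = count > 0 ? this.snapshots[count - 1] : null;
    const snapshot: ExecutionSnapshot = Object.freeze({
      id: `snap_${count + 1}`,
      previousSnapshotId: previous ? previous.id : null,
      executionStatus,
      isValid,
    });
    this.snapshots.push(snapshot);
    return snapshot;
  }

  // Most recent snapshot that is marked valid, if any.
  latestValid(): ExecutionSnapshot | undefined {
    for (let i = this.snapshots.length - 1; i >= 0; i--) {
      if (this.snapshots[i].isValid) return this.snapshots[i];
    }
    return undefined;
  }

  // Snapshots created after the given snapshot id, oldest first.
  since(snapshotId: string): ExecutionSnapshot[] {
    const index = this.snapshots.map((s) => s.id).indexOf(snapshotId);
    return index === -1 ? [] : this.snapshots.slice(index + 1);
  }
}
```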

Concurrency and State

Different states consume or release concurrency tokens:

State	Environment Concurrency	Queue Concurrency	Notes
`RUN_CREATED`	No	No	Not yet queued
`QUEUED`	Yes	Yes	Holds both tokens
`PENDING_EXECUTING`	Yes	Yes	Still holding while starting
`EXECUTING`	Yes	Yes	Active execution
`EXECUTING_WITH_WAITPOINTS`	No	No	Released during wait
`SUSPENDED`	No	No	Released after checkpoint
`QUEUED_EXECUTING`	Yes	Yes	Rare state during re-queue
`FINISHED`	No	No	Execution complete

This design allows the system to maximize throughput:

Runs waiting on external events (EXECUTING_WITH_WAITPOINTS) don't block other runs
Checkpointed runs (SUSPENDED) completely free resources for other work
Concurrency is held only by runs that are queued, starting, executing, or pending cancellation

The concurrency release happens via:

RunQueue.releaseAllConcurrency(orgId, runId) for both levels
Called by CheckpointSystem and WaitpointSystem
Automatic acknowledgment via RunQueue.acknowledgeMessage() on completion

Sources: internal-packages/run-engine/src/engine/systems/checkpointSystem.ts:195-203, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts:431-440, internal-packages/run-queue/src/index.ts
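To show why releasing tokens during waits raises throughput, the sketch below models a single concurrency level with a fixed limit. `ConcurrencyLimiter` is a hypothetical in-memory class; it is not the Redis-backed `RunQueue`, and it does not distinguish environment-level from queue-level limits.

```typescript
// In-memory stand-in for one concurrency level (illustrative only).
class ConcurrencyLimiter {
  private holders: string[] = [];
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Grants a token if the run already holds one or capacity remains.
  tryAcquire(runId: string): boolean {
    if (this.holders.indexOf(runId) !== -1) return true;
    if (this.holders.length >= this.limit) return false;
    this.holders.push(runId);
    return true;
  }

  // Gives the token back, for example when a run blocks on a waitpoint or is checkpointed.
  release(runId: string): void {
    this.holders = this.holders.filter((id) => id !== runId);
  }

  held(): number {
    return this.holders.length;
  }
}

const limiter = new ConcurrencyLimiter(1);
const firstGranted = limiter.tryAcquire("run_a"); // run_a holds the only token
const secondBlocked = limiter.tryAcquire("run_b"); // no capacity left
limiter.release("run_a"); // run_a starts waiting and releases its token
const secondGranted = limiter.tryAcquire("run_b"); // run_b can now proceed
```

With a limit of 1, `run_b` is refused while `run_a` holds the token and admitted as soon as `run_a` releases it.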

Heartbeat System

The heartbeat system detects stalled runs and initiates recovery:

Heartbeat timeouts by status (from HeartbeatTimeouts):

Status	Default Timeout	Behavior on Timeout
`PENDING_EXECUTING`	60,000ms	Requeue (assume startup failure)
`PENDING_CANCEL`	60,000ms	Force cancel
`EXECUTING`	60,000ms	Mark as crashed (OOM/segfault)
`EXECUTING_WITH_WAITPOINTS`	60,000ms	Should not normally time out
`SUSPENDED`	600,000ms	Retry heartbeat with exponential backoff

The heartbeat mechanism:

ExecutionSnapshotSystem.createExecutionSnapshot() schedules initial heartbeat
ExecutionSnapshotSystem.heartbeatRun() reschedules on successful heartbeat
Worker calls heartbeatRun() periodically during execution
If not rescheduled, handleStalledSnapshot() is triggered
Recovery action depends on status and environment type

Sources: internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:308-319, internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts:339-394, internal-packages/run-engine/src/engine/types.ts:89-95, internal-packages/run-engine/src/engine/index.ts:199-204
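The timeout table can be expressed as a lookup plus a staleness check. The sketch below compares timestamps directly, which is a simplification: per the description above, the engine schedules and reschedules a job instead. `HEARTBEAT_TIMEOUTS_MS`, `SnapshotHeartbeat`, and `isStalled` are hypothetical names, and the values are the defaults from the table.

```typescript
type HeartbeatStatus =
  | "PENDING_EXECUTING" | "PENDING_CANCEL" | "EXECUTING"
  | "EXECUTING_WITH_WAITPOINTS" | "SUSPENDED";

// Default timeouts from the table above, in milliseconds.
const HEARTBEAT_TIMEOUTS_MS: Record<HeartbeatStatus, number> = {
  PENDING_EXECUTING: 60_000,
  PENDING_CANCEL: 60_000,
  EXECUTING: 60_000,
  EXECUTING_WITH_WAITPOINTS: 60_000,
  SUSPENDED: 600_000,
};

interface SnapshotHeartbeat {
  status: HeartbeatStatus;
  lastHeartbeatAt: number; // epoch milliseconds of the last successful heartbeat
}

// True when more time than the status allows has passed since the last heartbeat.
function isStalled(snapshot: SnapshotHeartbeat, now: number): boolean {
  return now - snapshot.lastHeartbeatAt > HEARTBEAT_TIMEOUTS_MS[snapshot.status];
}

// A successful heartbeat resets the clock for the same status.
function heartbeat(snapshot: SnapshotHeartbeat, now: number): SnapshotHeartbeat {
  return { status: snapshot.status, lastHeartbeatAt: now };
}
```

An `EXECUTING` snapshot that last reported at time 0 counts as stalled at 60,001 ms, while a `SUSPENDED` snapshot with the same timestamp does not until 600,000 ms have passed.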

State Machine Implementation Mapping

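The following table maps each state transition to the responsible code entity. The rows mix the two enums: names such as `PENDING_EXECUTING`, `SUSPENDED`, and `PENDING_CANCEL` are TaskRunExecutionStatus values, while names such as `DELAYED`, `CRASHED`, and `EXPIRED` are TaskRunStatus values.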

Transition	Triggered By	Primary System	Method
Initial → `DELAYED`	User trigger	`RunEngine`	`trigger()`
Initial → `PENDING`	User trigger	`RunEngine`	`trigger()`
`DELAYED` → `PENDING`	Scheduled job	`DelayedRunSystem`	`enqueueDelayedRun()`
`PENDING` → `QUEUED`	System	`EnqueueSystem`	`enqueueRun()`
`QUEUED` → `PENDING_EXECUTING`	Worker pull	`DequeueSystem`	`dequeueFromWorkerQueue()`
`PENDING_EXECUTING` → `EXECUTING`	Worker	`RunAttemptSystem`	`startRunAttempt()`
`EXECUTING` → `EXECUTING_WITH_WAITPOINTS`	Task code	`WaitpointSystem`	`blockRunWithWaitpoint()`
`EXECUTING_WITH_WAITPOINTS` → `EXECUTING`	Waitpoint completion	`WaitpointSystem`	`continueRunIfUnblocked()`
`EXECUTING` → `SUSPENDED`	Task code	`CheckpointSystem`	`createCheckpoint()`
`SUSPENDED` → `QUEUED`	Unblock/resume	`EnqueueSystem`	`enqueueRun()`
`PENDING_EXECUTING` → `EXECUTING` (resume)	Worker	`CheckpointSystem`	`continueRunExecution()`
`EXECUTING` → `COMPLETED_SUCCESSFULLY`	Task completion	`RunAttemptSystem`	`attemptSucceeded()`
`EXECUTING` → `COMPLETED_WITH_ERRORS`	Task error	`RunAttemptSystem`	`attemptFailed()`
`EXECUTING` → `EXECUTING` (retry)	Task error	`RunAttemptSystem`	`attemptFailed()` + `startRunAttempt()`
`EXECUTING` → `CRASHED`	Heartbeat timeout	`RunAttemptSystem`	`handleStalledSnapshot()`
`EXECUTING` → `PENDING_CANCEL`	User cancellation	`RunAttemptSystem`	`cancelRun()`
`PENDING_CANCEL` → `CANCELED`	Worker ACK	`RunAttemptSystem`	`cancelRun()` (finalize)
`PENDING` → `EXPIRED`	TTL timeout	`TtlSystem`	`expireRun()`
`PENDING` → `PENDING_VERSION`	No deployment	`DequeueSystem`	`#pendingVersion()`

All state transitions occur within a distributed lock acquired via RunLocker to ensure consistency across the cluster.

Sources: internal-packages/run-engine/src/engine/index.ts:337-579, internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts, internal-packages/run-engine/src/engine/systems/dequeueSystem.ts, internal-packages/run-engine/src/engine/systems/waitpointSystem.ts, internal-packages/run-engine/src/engine/systems/checkpointSystem.ts, internal-packages/run-engine/src/engine/systems/ttlSystem.ts, internal-packages/run-engine/src/engine/locking.ts

