
Releases: hatchet-dev/hatchet

v0.57.1

31 Mar 18:02
229674f

What's Changed

Small improvements to docs and minor bug fixes.

Bug Fixes

  • Fixes an issue where /hatchet/hatchet-migrate isn't set automatically as an entrypoint in the migration container (#1440)
  • Improves Typescript typing (#1444)
  • Improves error-handling in the Hatchet gRPC service when sending a task to a worker fails (#1441)

Full Changelog: v0.57.0...v0.57.1

v0.57.0 - v1 SDKs

28 Mar 04:57
024af22

V1 SDK Improvements

This release bundles SDK improvements for Go, Typescript, and Python.

Python SDK Highlights

The Python SDK has a number of notable highlights to showcase for V1. Many of them have been highlighted elsewhere, such as in the migration guide, on the Pydantic page, and in various examples. Here, we'll list out each of them, along with their motivations and benefits.

First and foremost: Many of the changes in the V1 Python SDK are motivated by improved support for type checking and validation across large codebases and in production use-cases. With that in mind, the main highlights in the V1 Python SDK are:

  1. Workflows are now declared with hatchet.workflow, which returns a Workflow object, or hatchet.task (for simple cases), which returns a Standalone object. Workflows then have their corresponding tasks registered with Workflow.task. The Workflow object (and the Standalone object) can be reused easily across the codebase, and has wrapper methods like run and schedule that make it easy to run workflows. In these wrapper methods, inputs to the workflow are type checked, and you no longer need to specify the name of the workflow to run as a magic string.
  2. Tasks have their inputs type checked, and inputs are now Pydantic models. The input field is either the model you provide to the workflow as the input_validator, or is an EmptyModel, which is a helper Pydantic model Hatchet provides and uses as a default.
  3. In the new SDK, we define the parents of a task as a list of Task objects rather than a list of strings. This also allows us to use ctx.task_output(my_task) to access the output of the my_task task in a downstream task, while allowing that output to be type checked correctly.
  4. In the new SDK, inputs are injected directly into the task as the first positional argument, so the signature of a task will now be Callable[[YourWorkflowInputType, Context]]. This replaces the old method of accessing workflow inputs via context.workflow_input(). (These points are illustrated in the sketch after this list.)
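
Putting those points together, here is a minimal sketch of the new declaration style. The names hatchet.workflow, input_validator, Workflow.task, parents, ctx.task_output, and run come from the notes above; the model, task names, and return values are purely illustrative and may not match your SDK version exactly.

from pydantic import BaseModel

from hatchet_sdk import Context, Hatchet

hatchet = Hatchet()

# Hypothetical Pydantic model used as the workflow's input_validator
class GreetingInput(BaseModel):
    name: str

greeting = hatchet.workflow(name="greeting", input_validator=GreetingInput)

@greeting.task()
def say_hello(input: GreetingInput, ctx: Context) -> dict:
    # The validated input arrives as the first positional argument
    return {"message": f"Hello, {input.name}!"}

@greeting.task(parents=[say_hello])
def shout(input: GreetingInput, ctx: Context) -> dict:
    # Typed access to a parent task's output via the Task object, not a magic string
    hello = ctx.task_output(say_hello)
    return {"message": hello["message"].upper()}

# Inputs are type checked here; no workflow-name string is needed
# result = greeting.run(GreetingInput(name="world"))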

Other Breaking Changes

There have been a number of other breaking changes throughout the SDK in V1.

Typing improvements:

  1. External-facing protobuf objects, such as StickyStrategy and ConcurrencyLimitStrategy, have been replaced by native Python enums to make working with them easier.
  2. All external-facing types that are used for triggering workflows, scheduling workflows, etc. are now Pydantic objects, as opposed to being TypedDicts.
  3. The return type of each Task is restricted to a JSONSerializableMapping or a Pydantic model, to better align with what the Hatchet Engine expects.
  4. The ClientConfig now uses Pydantic Settings, and we've removed the static methods on the Client for from_environment and from_config in favor of passing configuration in directly (see the sketch after this list). See the configuration example for more details.
  5. The REST API wrappers, which previously were under hatchet.rest, have been completely overhauled.
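
As a rough sketch of the new configuration style (the constructor parameter and the token and namespace field names here are assumptions for illustration; see the configuration example for the authoritative version):

from hatchet_sdk import ClientConfig, Hatchet

# ClientConfig is Pydantic Settings-based, so values can also come from the
# environment (e.g. HATCHET_CLIENT_TOKEN) instead of being passed explicitly.
hatchet = Hatchet(
    config=ClientConfig(
        token="<your-api-token>",   # assumed field name
        namespace="dev",            # assumed field name
    )
)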

Naming changes:

  1. We no longer have nested aio clients for async methods. Instead, async methods throughout the entire SDK are prefixed with aio_, similar to Langchain's use of the a prefix to indicate async. For example, to run a workflow, you may now use either workflow.run() or workflow.aio_run() (see the example after this list).
  2. All functions on Hatchet clients are now verbs. For instance, if something was named hatchet.nounVerb before, it will now be something more like hatchet.verb_noun. For example, hatchet.runs.get_result gets the result of a workflow run.
  3. timeout, the execution timeout of a task, has been renamed to execution_timeout for clarity.
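
For example, using the hypothetical greeting workflow from the sketch above, the same operation has a blocking and an awaitable form:

# Blocking call
result = greeting.run(GreetingInput(name="world"))

# Awaitable variant, per the aio_ prefix convention (from inside an async function)
result = await greeting.aio_run(GreetingInput(name="world"))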

Removals:

  1. sync_to_async has been removed. We recommend reading our asyncio documentation for guidance on handling blocking work in otherwise async tasks.
  2. The AdminClient has been removed and refactored into individual clients. For example, if you absolutely need to create a workflow run manually without using Workflow.run or Standalone.run, you can use hatchet.runs.create. This replaces the old hatchet.admin.run_workflow.

Other miscellaneous changes:

  1. As shown in the Pydantic example above, there is no longer a spawn_workflow(s) method on the Context. run is now the preferred method for spawning child workflows, and it automatically propagates the parent's metadata to the child workflow.
  2. All times and durations, such as execution_timeout and schedule_timeout, now accept datetime.timedelta objects instead of only strings (e.g. "10s" can be timedelta(seconds=10)); see the sketch below.
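
For instance, a hedged sketch reusing the hypothetical greeting workflow from above (exact parameter placement may differ):

from datetime import timedelta

@greeting.task(
    execution_timeout=timedelta(seconds=10),  # equivalent to "10s"
    schedule_timeout=timedelta(minutes=5),    # equivalent to "5m"
)
def slow_task(input: GreetingInput, ctx: Context) -> dict:
    return {"done": True}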

Other New Features

There are a handful of other new features that will make interfacing with the SDK easier, which are listed below.

  1. Concurrency keys using the input to a workflow are now checked for validity at runtime. If the workflow's input_validator does not contain a field that's used in a key, Hatchet will reject the workflow when it's created. For example, if the key is input.user_id, the input_validator Pydantic model must contain a user_id field.
  2. There is now an on_success_task on the Workflow object, which works just like an on-failure task, but it runs after all upstream tasks in the workflow have succeeded.
  3. We've exposed feature clients on the Hatchet client to make it easier to interact with and control your environment.

For example, you can write scripts to find all runs that match certain criteria, and replay or cancel them.

from datetime import datetime, timedelta

# Hatchet is importable from hatchet_sdk; we assume BulkCancelReplayOpts, RunFilter,
# and V1TaskStatus are exported alongside it (check the SDK for the exact import paths).
from hatchet_sdk import BulkCancelReplayOpts, Hatchet, RunFilter, V1TaskStatus

hatchet = Hatchet()

workflows = hatchet.workflows.list()
assert workflows.rows
workflow = workflows.rows[0]

workflow_runs = hatchet.runs.list(workflow_ids=[workflow.metadata.id])
workflow_run_ids = [workflow_run.metadata.id for workflow_run in workflow_runs.rows]

# Cancel a specific set of runs by id
bulk_cancel_by_ids = BulkCancelReplayOpts(ids=workflow_run_ids)
hatchet.runs.bulk_cancel(bulk_cancel_by_ids)

# Or cancel in bulk by filters
bulk_cancel_by_filters = BulkCancelReplayOpts(
    filters=RunFilter(
        since=datetime.today() - timedelta(days=1),
        until=datetime.now(),
        statuses=[V1TaskStatus.RUNNING],
        workflow_ids=[workflow.metadata.id],
        additional_metadata={"key": "value"},
    )
)
hatchet.runs.bulk_cancel(bulk_cancel_by_filters)

The hatchet client also has clients for workflows (declarations), schedules, crons, metrics (e.g. queue depth), events, and workers.

Typescript SDK Highlights

The Typescript SDK has a number of notable highlights to showcase for V1. Many of them have been highlighted elsewhere, such as in the migration guide, and in various examples. Here, we'll list out each of them, along with their motivations and benefits.

First and foremost: Many of the changes in the V1 Typescript SDK are motivated by improved support for type checking and inference across large codebases and in production use-cases. With that in mind, here are the main highlights:

  1. We've moved away from a pure object-based pattern to a factory pattern for creating your workflows and tasks. This allows for much more flexibility and type safety.

The simplest way to declare a workflow is with hatchet.task.

export const simple = hatchet.task({
  name: "simple",
  fn: (input: SimpleInput) => {
    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});

This returns an object that you can use to run the task with fully inferred types!

const input = { Message: "Hello, World!" };
// run now
const result = await simple.run(input);
const runReference = await simple.runNoWait(input);

// or in the future
const runAt = new Date(new Date().setHours(12, 0, 0, 0) + 24 * 60 * 60 * 1000);
const scheduled = await simple.schedule(runAt, input);
const cron = await simple.cron("simple-daily", "0 0 * * *", input);
  2. DAGs got a similar treatment and can be run the same way. DAGs are now a collection of tasks that are composed by calling .task on the Workflow object.

You can declare your types for DAGs. Output types are checked if there is a corresponding task name as a key in the output type.

type DagInput = {
  Message: string;
};

type DagOutput = {
  reverse: {
    Original: string;
    Transformed: string;
  };
};

export const dag = hatchet.workflow<DagInput, DagOutput>({
  name: "simple",
});

// Next, we declare the tasks bound to the workflow
const toLower = dag.task({
  name: "to-lower",
  fn: (input) => {
    return {
      TransformedMessage: input.Message.toLowerCase(),
    };
  },
});

// Then we declare a downstream task that consumes to-lower's output
dag.task({
  name: "reverse",
  parents: [toLower],
  fn: async (input, ctx) => {
    const lower = await ctx.parentOutput(toLower);
    return {
      Original: input.Message,
      Transformed: lower.TransformedMessage.split("").reverse().join(""),
    };
  },
});
  3. Logical organization of SDK features to make the SDK easier to understand and use.

We've exposed feature clients on the Hatchet client to make it easier to interact with and control your environment.

For example, you can write scripts to find all runs that match certain criteria, and replay or cancel them.

const hatchet = HatchetClient.init();
const { runs } = hatchet;

const allFailedRuns = await runs.list({
  statuses: [WorkflowRunStatus.FAILED],
});

// replay by ids
await runs.replay({ ids: allFailedRuns.rows?.map((r) => r.metadata.id) });

// or you can run bulk operations with filters directly
await runs.cancel({
  filters: {
    since: new Date("2025-03-27"),
    additionalMetadata: { user: "123" },
  },
});

The hatchet client also has clients for workflows (declarations), schedules,...


v0.56.3 - Benchmarking Service

27 Mar 04:36
b20b74d

Benchmarking Hatchet

Today, we're open-sourcing our new benchmarking container, which allows anyone to test the performance of their Hatchet setup. This load testing container can be run in pretty much any environment -- for example:

docker run -e HATCHET_CLIENT_TOKEN=your-token ghcr.io/hatchet-dev/hatchet/hatchet-loadtest -e "100" -d "60s" --level "warn" --slots "100"

Example Results

With our latest v1 Hatchet engine, we ran a series of internal benchmarks on an 8-vCPU database instance (Amazon RDS m7g.2xlarge), achieving a stable throughput of up to 2000 events/second. Beyond that, we've also tested higher throughput on larger DB instances (for example, up to 10k events/second on an m7g.8xlarge).

Here's a brief summary of our results:

  • Throughput: Scales smoothly up to 2000 events/s on m7g.2xlarge, leveling out at about 83% CPU utilization on the database.
  • Latency: For lower throughput (100-500 events/s), average execution time remains below 50ms.
  • Setup: Benchmarks were run against a Kubernetes cluster on AWS with 2 Hatchet engine replicas (c7i.4xlarge). The RDS database instance (m7g.2xlarge) was chosen to avoid disk/CPU contention under typical load.

The new engine design allows Hatchet to efficiently handle high volumes of background tasks with stable performance, making it well-suited for AI-driven workloads or large-scale async event processing.

Want to see more details or run your own benchmarks? Read our benchmarking guide for more information.

v0.55.26 - Hatchet v1 Engine

25 Mar 22:54
5062bf1

🪓 Launching our v1 Engine

March 24th, 2025

For the past several months, we've been working on a complete rewrite of the Hatchet queue with a focus on performance and a set of feature requests which weren't possible in the v0 architecture. The v1 engine has been available in preview for several weeks, and is currently running over 200 million tasks/month on Hatchet Cloud.

Migration Guide

To upgrade to the v1 engine, please see our migration guide in our docs.

In the rest of this document, we'll cover the major architectural improvements that we made to the Hatchet engine -- while you don't need to worry about these as a Hatchet user, perhaps they'll be interesting to other developers who are working on similar Postgres scaling problems.

TL;DR - nearly every usage pattern of Hatchet is faster and more efficient in v1, including usage of concurrency keys, high-throughput workflows, and long-running workflows. We've seen a 30% reduction in latency, a 10x increase in throughput, and a 5x reduction in CPU and IOPs load on the database.

Throughput Improvements

One of the main bottlenecks in the previous Hatchet engine was throughput -- we could only handle about 1k tasks/second on the ingestion side, even on a relatively large database, even though we could queue at a much faster rate once the work was ingested into the system.

Hatchet is a durable task queue built on Postgres, which means that every result and intermediate task event needs to be persisted to the database in a transactionally-safe manner. One of the main ways to increase throughput in Postgres is to perform batch inserts of data, which can considerably cut down on transaction overhead and round trips to the database.

While the v0 engine was utilizing batch inserts on a few high-volume tables, the v1 engine now uses batch inserts on almost every table in the system. This has allowed us to increase throughput by an order of magnitude -- we're now able to queue up to 10k tasks/second on a single Hatchet instance, and we've been able to sustain that rate over a 24 hour period.

However, throughput and latency are always a tradeoff -- so how did the Hatchet engine get faster as well? The answer comes down to dynamic buffering...

Latency Improvements

The v1 engine is able to achieve a 30% reduction in latency, from 30ms -> 20ms average queue time. The main reason for this is that we run many buffers in parallel in engine memory which are flushed on an interval.

At relatively low volume, batched inserts don't buy us much, so we'd rather write to the database with only one item in the buffer. When the first tasks enter the buffers, we flush them as soon as there's a single task in the buffer.

After we've flushed a buffer, we wait a minimum of 10ms before flushing out of the buffer again. This means that at a low volume of 100 tasks/s, we're still flushing to the database immediately. At a higher volume, we're batching tasks in the buffers before flushing them to the database.

This naturally isn't the full story -- if we receive too many tasks, we'd like our buffers to exert some sort of backpressure on the caller. So there's an internal saturation threshold on the buffers, where callers will be blocked from adding more tasks to the buffer until the buffer is flushed.

It turns out this strategy is extremely effective: it provides fast queueing at relatively low volume while keeping queue duration constant at much higher volume!
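
To make the flush policy concrete, here's an illustrative, single-threaded Python sketch of the idea; the actual engine implements this in Go with many buffers running concurrently, so treat this as a description rather than the real implementation:

import time

class DynamicBuffer:
    """Illustrative sketch of the flush policy described above."""

    def __init__(self, flush_fn, min_interval=0.010, saturation=1000):
        self.flush_fn = flush_fn          # batch-writes a list of items to the database
        self.min_interval = min_interval  # wait at least 10ms between flushes
        self.saturation = saturation      # backpressure threshold
        self.items = []
        self.last_flush = 0.0

    def add(self, item):
        # Backpressure: block the caller while the buffer is saturated
        while len(self.items) >= self.saturation:
            self._maybe_flush()
            time.sleep(0.001)
        self.items.append(item)
        self._maybe_flush()

    def _maybe_flush(self):
        # At low volume this flushes with a single item in the buffer; at high
        # volume the minimum interval lets items accumulate into larger batches
        if self.items and time.monotonic() - self.last_flush >= self.min_interval:
            batch, self.items = self.items, []
            self.flush_fn(batch)
            self.last_flush = time.monotonic()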

Deletion Improvements

One of the main bottlenecks in the v0 engine was deletion of old data. Not only did large Hatchet instances get bloated on disk quickly, but deletion operations would commonly time out. The reason for this is that we were relying on UUIDs as the primary key for most tables, whose indices are randomly distributed across pages in the database.

In the v1 engine, we've added table partitioning by range to almost every large table in the system. This has allowed us to drop old data much more efficiently, because we can simply delete the table partition which contains the old data.

Other Improvements

We've also made a number of other improvements to the engine, including:

  1. We changed our high-write and high-churn table definitions to use identity columns and tuned autovacuum settings on a per-table basis.

  2. We're now using separate tables for the actual queue and for listing workflow runs via the API. This allowed us to remove almost all indexes on the queue, which speeds up queueing significantly.

  3. We process status updates for tasks in a deferred fashion, using tables which are hash-partitioned by the workflow run id. This was one of the most difficult parts of the rewrite -- it turns out that computing task statuses is a surprisingly difficult problem when task events can arrive out-of-order. We were actually exploring the use of Clickhouse for storing task events and rendering task status, but it didn't perform the way that we were expecting.

  4. A single task implements a state machine, and the state machine transitions now use Postgres triggers. Using triggers keeps most updates and processing on the database server, which increases performance.

Get Started with the v1 Engine

The v1 engine is available on Hatchet Cloud today. To get started, simply create a new Hatchet Cloud account and start queuing tasks. If you're an existing Hatchet user or are self-hosting, please see our migration guide to get started.

v0.54.8

13 Feb 17:31
2ebd659

What's Changed

Full Changelog: v0.54.7...v0.54.8

v0.54.6

05 Feb 21:58
30c1a97

What's Changed

Full Changelog: v0.54.5...v0.54.6

v0.53.10

14 Jan 17:23
75657a1

What's Changed

  • chore(deps): bump github.com/go-playground/validator/v10 from 10.23.0 to 10.24.0 by @dependabot in #1182
  • Feat: Fix switching tenants not working by @mrkaye97 in #1172
  • fix: don't exit early out of queuer by @abelanger5 in #1184
  • chore(deps): bump google.golang.org/grpc from 1.69.2 to 1.69.4 by @dependabot in #1185
  • JFI: Fixing copy-paste bug in workflow settings by @mrkaye97 in #1183
  • fix: hard sticky strategy with no desired worker id by @abelanger5 in #1186

Full Changelog: v0.53.9...v0.53.10

v0.52.11

06 Dec 21:40
1499668

What's Changed

  • fix: duplicate cron expressions only cause a single trigger by @abelanger5 in #1101

Full Changelog: v0.52.10...v0.52.11

v0.52.6

05 Dec 21:56
db6558a

What's Changed

Full Changelog: v0.52.5...v0.52.6

v0.50.4

30 Oct 22:46
a9936ef

What's Changed

Full Changelog: v0.50.3...v0.50.4