
sangezar/avroalign


Avro Align (subprocess)

A lightweight CLI that coerces/aligns JSON documents to an Avro schema. It reads JSON from stdin and writes aligned JSON to stdout. Designed to be used standalone or as a Redpanda Connect (Benthos) subprocess processor.

Features

  • Soft coercion of common types:
    • Numbers from strings (e.g. "200" → 200)
    • Booleans from strings ("true"/"false"/"1"/"0"/…)
    • ISO-8601 timestamps → epoch millis/micros when Avro logical types are used
    • Optional: treat empty string as null within unions
  • Permissive record handling: unknown fields are preserved as-is
  • Union handling prioritizes non-null branches
  • Detection of Avro long logical types: TimestampMillis and TimestampMicros are supported
  • Schema source: inline JSON or Schema Registry (subject/version)
  • Caching of loaded schema with TTL and retryable HTTP requests
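
The soft coercion listed above can be illustrated with a minimal sketch. The helper names `coerceLong` and `coerceBool` are hypothetical and not taken from the tool's source, and only the four boolean spellings shown in the list ("true", "false", "1", "0") are handled here; the tool itself may accept more.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coerceLong converts a JSON-decoded value to int64 when it is a whole
// number or a numeric string such as "200". Hypothetical helper.
func coerceLong(v any) (int64, error) {
	switch x := v.(type) {
	case float64: // encoding/json decodes JSON numbers as float64
		if x != float64(int64(x)) {
			return 0, fmt.Errorf("%v is not an integer", x)
		}
		return int64(x), nil
	case string:
		return strconv.ParseInt(strings.TrimSpace(x), 10, 64)
	default:
		return 0, fmt.Errorf("cannot coerce %T to long", v)
	}
}

// coerceBool accepts real booleans plus a small set of string spellings.
// Hypothetical helper; the accepted spellings are an assumption.
func coerceBool(v any) (bool, error) {
	switch x := v.(type) {
	case bool:
		return x, nil
	case string:
		switch strings.ToLower(strings.TrimSpace(x)) {
		case "true", "1":
			return true, nil
		case "false", "0":
			return false, nil
		}
	}
	return false, fmt.Errorf("cannot coerce %v to boolean", v)
}

func main() {
	n, err := coerceLong("200")
	fmt.Println(n, err)
	b, err := coerceBool("true")
	fmt.Println(b, err)
}
```

Returning an error instead of a zero value leaves the choice of what to do with a bad value to the caller, which is where an option such as --on-error would apply.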

Build

Requires Go 1.22+.

  • Build linux/amd64 binary (default):
make build
# → bin/avroalign
  • Local development run (your current OS/arch):
go run . --help

Usage

The tool reads newline-delimited JSON objects from stdin and writes aligned JSON objects to stdout. On error it prints a message to stderr and exits with a non-zero code.
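
The stdin/stdout contract described above can be sketched as a line-oriented loop. This is an illustration built on the standard library, not the tool's actual code: `processLines`, `runOnString`, and the `alignFunc` type are hypothetical names, the alignment step is a placeholder, and skipping blank lines is an assumption.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// alignFunc stands in for the schema-driven alignment step (hypothetical).
type alignFunc func(map[string]any) (map[string]any, error)

// processLines reads one JSON object per line from r, applies align, and
// writes each result as one JSON line to w. It stops at the first error.
func processLines(r io.Reader, w io.Writer, align alignFunc) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 10*1024*1024) // allow long lines
	enc := json.NewEncoder(w)                          // Encode appends "\n"
	for sc.Scan() {
		line := sc.Bytes()
		if len(bytes.TrimSpace(line)) == 0 {
			continue // assumption: blank lines are skipped
		}
		var doc map[string]any
		if err := json.Unmarshal(line, &doc); err != nil {
			return fmt.Errorf("decode: %w", err)
		}
		out, err := align(doc)
		if err != nil {
			return fmt.Errorf("align: %w", err)
		}
		if err := enc.Encode(out); err != nil {
			return fmt.Errorf("encode: %w", err)
		}
	}
	return sc.Err()
}

// runOnString is a convenience wrapper for trying the loop on in-memory input.
func runOnString(input string, align alignFunc) (string, error) {
	var out bytes.Buffer
	err := processLines(strings.NewReader(input), &out, align)
	return out.String(), err
}

func main() {
	identity := func(m map[string]any) (map[string]any, error) { return m, nil }
	if err := processLines(os.Stdin, os.Stdout, identity); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // processing error
	}
}
```

Writing one encoded object per line keeps the output usable by any consumer that expects newline-delimited JSON.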

Examples

  • Inline schema:
echo '{"id":"123","active":"true"}' \
| ./bin/avroalign \
  --schema '{"type":"record","name":"User","fields":[{"name":"id","type":"string"},{"name":"active","type":"boolean"}]}'
  • Schema Registry (subject/version):
echo '{"id":"123","created_at":"2024-01-02T03:04:05Z"}' \
| ./bin/avroalign \
  --sr-url http://redpanda:8081 \
  --subject users-value \
  --version latest
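
For the Schema Registry case, the lookup by subject and version can be sketched as follows. The sketch assumes the registry exposes the Confluent-compatible endpoint `GET /subjects/{subject}/versions/{version}`, whose JSON response carries the schema text in a `schema` field. The function names are hypothetical, and plain `net/http` is used here rather than a retrying client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
)

// schemaURL builds the lookup URL for a subject/version pair.
func schemaURL(base, subject, version string) string {
	return strings.TrimRight(base, "/") +
		"/subjects/" + url.PathEscape(subject) +
		"/versions/" + url.PathEscape(version)
}

// parseSchemaResponse extracts the Avro schema text from a response body.
func parseSchemaResponse(body []byte) (string, error) {
	var resp struct {
		Schema string `json:"schema"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", fmt.Errorf("decode registry response: %w", err)
	}
	if resp.Schema == "" {
		return "", fmt.Errorf("registry response has no schema field")
	}
	return resp.Schema, nil
}

// fetchSchema performs the HTTP request and returns the schema text.
func fetchSchema(client *http.Client, base, subject, version string) (string, error) {
	res, err := client.Get(schemaURL(base, subject, version))
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("registry returned %s", res.Status)
	}
	return parseSchemaResponse(body)
}

func main() {
	// Offline demonstration: build the URL and parse a sample response body.
	// With a reachable registry, fetchSchema(http.DefaultClient, ...) would
	// combine both steps.
	fmt.Println(schemaURL("http://redpanda:8081", "users-value", "latest"))
	sample := []byte(`{"subject":"users-value","version":1,"id":1,"schema":"\"string\""}`)
	schema, err := parseSchemaResponse(sample)
	fmt.Println(schema, err)
}
```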

CLI flags

  • --schema string: Inline Avro schema JSON (overrides Schema Registry)
  • --sr-url string: Schema Registry URL, e.g. http://redpanda:8081
  • --subject string: Schema Registry subject
  • --version string: Schema version number or latest (default latest)
  • --cache-ttl duration: Schema cache TTL (default 10m)
  • --on-error string: Error strategy: fail | null | drop_field (default fail)
  • --numbers-from-strings bool: Coerce numbers from strings (default true)
  • --booleans-from-strings bool: Coerce booleans from strings (default true)
  • --parse-iso-timestamps bool: Parse ISO timestamps (default true)
  • --empty-string-as-null bool: Treat empty string as null in unions (default false)
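
As an illustration of how these flags and defaults fit together, here is a sketch using Go's standard `flag` package. Whether the tool uses this package is an assumption, and the `options` struct and `parseOptions` function are hypothetical names. Only the flag names, defaults, and the allowed --on-error values come from the list above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// options mirrors the documented flags (hypothetical struct).
type options struct {
	Schema, SRURL, Subject, Version, OnError string
	CacheTTL                                 time.Duration
	NumbersFromStrings, BooleansFromStrings  bool
	ParseISOTimestamps, EmptyStringAsNull    bool
}

// parseOptions parses args using the documented names and defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("avroalign", flag.ContinueOnError)
	fs.StringVar(&o.Schema, "schema", "", "inline Avro schema JSON")
	fs.StringVar(&o.SRURL, "sr-url", "", "Schema Registry URL")
	fs.StringVar(&o.Subject, "subject", "", "Schema Registry subject")
	fs.StringVar(&o.Version, "version", "latest", "schema version number or latest")
	fs.DurationVar(&o.CacheTTL, "cache-ttl", 10*time.Minute, "schema cache TTL")
	fs.StringVar(&o.OnError, "on-error", "fail", "fail | null | drop_field")
	fs.BoolVar(&o.NumbersFromStrings, "numbers-from-strings", true, "coerce numbers from strings")
	fs.BoolVar(&o.BooleansFromStrings, "booleans-from-strings", true, "coerce booleans from strings")
	fs.BoolVar(&o.ParseISOTimestamps, "parse-iso-timestamps", true, "parse ISO timestamps")
	fs.BoolVar(&o.EmptyStringAsNull, "empty-string-as-null", false, "treat empty string as null in unions")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	switch o.OnError {
	case "fail", "null", "drop_field":
	default:
		return o, fmt.Errorf("invalid --on-error value %q", o.OnError)
	}
	return o, nil
}

func main() {
	// A real CLI would pass os.Args[1:]; fixed arguments keep the sketch
	// self-contained.
	o, err := parseOptions([]string{
		"--sr-url", "http://redpanda:8081",
		"--subject", "users-value",
		"--empty-string-as-null=true",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2) // invalid arguments
	}
	fmt.Printf("%+v\n", o)
}
```

With the standard `flag` package, boolean flags take their value in the `--name=value` form; a separate `false` argument would be treated as a positional argument rather than the flag's value.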

Behavior details

  • Records: Unknown fields are kept as-is (permissive mode).
  • Unions: Non-null branches are tried first; when enabled, an empty string becomes null if null is a branch.
  • Int bounds: Values mapped to Avro int are validated against 32-bit bounds.
  • Long logicals: TimestampMillis/TimestampMicros are detected by type name and parsed from ISO-8601 or numeric epoch. Date/TimeMillis/TimeMicros are minimally passed through as long values.
  • Bytes: Accepts []byte and string (converted to bytes). The decimal logical type is not handled explicitly.
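
Three of the rules above (timestamp parsing, int bounds, and empty string as null in unions) can be sketched in a few lines. The function names are hypothetical, and ISO-8601 input is parsed with Go's RFC 3339 layout, which is an assumption about which subset of ISO-8601 is accepted.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// toTimestampMillis converts an RFC 3339 string or a numeric epoch value
// to epoch milliseconds.
func toTimestampMillis(v any) (int64, error) {
	switch x := v.(type) {
	case string:
		t, err := time.Parse(time.RFC3339Nano, x)
		if err != nil {
			return 0, err
		}
		return t.UnixMilli(), nil
	case float64: // JSON numbers decode as float64; treated as epoch millis
		return int64(x), nil
	}
	return 0, fmt.Errorf("cannot convert %T to timestamp-millis", v)
}

// toInt32 checks that n fits the 32-bit range of an Avro int.
func toInt32(n int64) (int32, error) {
	if n < math.MinInt32 || n > math.MaxInt32 {
		return 0, fmt.Errorf("%d is out of range for Avro int", n)
	}
	return int32(n), nil
}

// emptyAsNull maps "" to nil when the option is enabled and the union
// has a null branch; every other value is returned unchanged.
func emptyAsNull(v any, unionHasNull, enabled bool) any {
	if s, ok := v.(string); ok && s == "" && unionHasNull && enabled {
		return nil
	}
	return v
}

func main() {
	ms, err := toTimestampMillis("2024-01-02T03:04:05Z")
	fmt.Println(ms, err)
	_, err = toInt32(math.MaxInt32 + 1)
	fmt.Println(err)
	fmt.Println(emptyAsNull("", true, true) == nil)
}
```

For TimestampMicros the same parsing step would apply with `UnixMicro` in place of `UnixMilli`.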

Exit codes

  • 0 — success
  • 1 — processing error (decode/align/encode)
  • 2 — invalid arguments

Redpanda Connect (Benthos) integration

Use the subprocess processor to pipe messages through this binary.

pipeline:
  processors:
    - subprocess:
        name: /plugins/avroalign
        args:
          - --sr-url
          - http://redpanda:8081
          - --subject
          - edges_clean_access_log-value
          - --version
          - latest
          - --cache-ttl
          - 10m
          - --on-error
          - fail
          - --numbers-from-strings=true
          - --booleans-from-strings=true
          - --parse-iso-timestamps=true
          - --empty-string-as-null=false

Adjust the command path and flags for your deployment. Ensure each message is a single JSON object on one line (JSON Lines), since the binary reads newline-delimited JSON from stdin.

Module

  • Go module: avroalign
  • Dependencies: github.com/hamba/avro/v2, github.com/hashicorp/go-retryablehttp

License

TBD.

