A lightweight CLI that coerces/aligns JSON documents to an Avro schema. It reads JSON from stdin and writes aligned JSON to stdout. Designed to be used standalone or as a Redpanda Connect (Benthos) subprocess processor.
- Soft coercion of common types:
  - Numbers from strings (e.g. "200" → 200)
  - Booleans from strings ("true"/"false"/"1"/"0"/…)
  - ISO-8601 timestamps → epoch millis/micros when Avro logical types are used
  - Optional: treat empty string as null within unions
- Permissive record handling: unknown fields are preserved as-is
- Union handling prioritizes non-null branches
- Detection of Avro long logical types: TimestampMillis and TimestampMicros are supported
- Schema source: inline JSON or Schema Registry (subject/version)
- Caching of loaded schema with TTL and retryable HTTP requests
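
The soft coercion of numbers and booleans listed above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper names `coerceLong` and `coerceBool` are hypothetical, and only the four boolean spellings named explicitly above are handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// coerceLong converts a decoded JSON value to int64, accepting numeric
// strings such as "200". Hypothetical helper for illustration only.
func coerceLong(v any) (int64, error) {
	switch x := v.(type) {
	case float64: // encoding/json decodes every JSON number as float64
		if x != float64(int64(x)) {
			return 0, fmt.Errorf("non-integral number %v", x)
		}
		return int64(x), nil
	case string:
		return strconv.ParseInt(strings.TrimSpace(x), 10, 64)
	default:
		return 0, fmt.Errorf("cannot coerce %T to long", v)
	}
}

// coerceBool accepts real booleans plus the string spellings
// "true", "false", "1" and "0" (case-insensitive).
func coerceBool(v any) (bool, error) {
	switch x := v.(type) {
	case bool:
		return x, nil
	case string:
		switch strings.ToLower(strings.TrimSpace(x)) {
		case "true", "1":
			return true, nil
		case "false", "0":
			return false, nil
		}
	}
	return false, fmt.Errorf("cannot coerce %v to boolean", v)
}

func main() {
	n, _ := coerceLong("200")
	b, _ := coerceBool("true")
	fmt.Println(n, b) // prints: 200 true
}
```

Returning an error instead of silently passing the value through leaves the choice of what to do next (fail, null, or drop the field) to the caller.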
Requires Go 1.22+.
- Build linux/amd64 binary (default):
make build
# → bin/avroalign

- Local development run (your current OS/arch):
go run . --help

The tool reads newline-delimited JSON objects from stdin and writes aligned JSON objects to stdout. On error it prints a message to stderr and exits with a non-zero code.
- Inline schema:
echo '{"id":"123","active":"true"}' \
| ./bin/avroalign \
--schema '{"type":"record","name":"User","fields":[{"name":"id","type":"string"},{"name":"active","type":"boolean"}]}'

- Schema Registry (subject/version):
echo '{"id":"123","created_at":"2024-01-02T03:04:05Z"}' \
| ./bin/avroalign \
--sr-url http://redpanda:8081 \
--subject users-value \
--version latest

- --schema string: Inline Avro schema JSON (overrides Schema Registry)
- --sr-url string: Schema Registry URL, e.g. http://redpanda:8081
- --subject string: Schema Registry subject
- --version string: Schema version number or latest (default latest)
- --cache-ttl duration: Schema cache TTL (default 10m)
- --on-error string: Error strategy: fail|null|drop_field (default fail)
- --numbers-from-strings bool: Coerce numbers from strings (default true)
- --booleans-from-strings bool: Coerce booleans from strings (default true)
- --parse-iso-timestamps bool: Parse ISO timestamps (default true)
- --empty-string-as-null bool: Treat empty string as null in unions (default false)
- Records: Unknown fields are kept as-is (permissive mode).
- Unions: Non-null branches are tried first; when enabled, an empty string becomes null if null is a branch.
- Int bounds: Values mapped to Avro int are validated against 32-bit bounds.
- Long logicals: TimestampMillis/TimestampMicros are detected by type name and parsed from ISO-8601 or numeric epoch. Date/TimeMillis/TimeMicros are minimally passed through as long values.
- Bytes: Accepts []byte and string (converted to bytes). Decimal logical is not handled explicitly.
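
The int-bounds check and timestamp parsing described above can be illustrated with a short sketch. The helper names `toAvroInt` and `toEpoch` are hypothetical, and the example assumes RFC 3339 input (a common ISO-8601 profile); which ISO-8601 variants the tool itself accepts is not specified here.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// toAvroInt rejects values outside the 32-bit range of an Avro int.
func toAvroInt(n int64) (int32, error) {
	if n < math.MinInt32 || n > math.MaxInt32 {
		return 0, fmt.Errorf("value %d out of range for Avro int", n)
	}
	return int32(n), nil
}

// toEpoch converts an RFC 3339 string or a numeric epoch value into
// epoch milliseconds (micros=false) or microseconds (micros=true).
// Numeric input is assumed to already be in the target unit.
func toEpoch(v any, micros bool) (int64, error) {
	switch x := v.(type) {
	case string:
		t, err := time.Parse(time.RFC3339, x)
		if err != nil {
			return 0, err
		}
		if micros {
			return t.UnixMicro(), nil
		}
		return t.UnixMilli(), nil
	case float64: // JSON numbers decode as float64
		return int64(x), nil
	default:
		return 0, fmt.Errorf("cannot convert %T to a timestamp", v)
	}
}

func main() {
	ms, _ := toEpoch("2024-01-02T03:04:05Z", false)
	fmt.Println(ms) // prints: 1704164645000

	_, err := toAvroInt(1 << 31) // one past the largest 32-bit value
	fmt.Println(err != nil)      // prints: true
}
```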
- 0: success
- 1: processing error (decode/align/encode)
- 2: invalid arguments
Use the subprocess processor to pipe messages through this binary.
pipeline:
  processors:
    - subprocess:
        name: /plugins/avroalign
        args:
          - --sr-url
          - http://redpanda:8081
          - --subject
          - edges_clean_access_log-value
          - --version
          - latest
          - --cache-ttl
          - 10m
          - --on-error
          - fail
          - --numbers-from-strings=true
          - --booleans-from-strings=true
          - --parse-iso-timestamps=true
          - --empty-string-as-null=false

Adjust the command path and flags for your deployment. Ensure your messages are JSON lines.
- Go module: avroalign
- Dependencies: github.com/hamba/avro/v2, github.com/hashicorp/go-retryablehttp
TBD.