Self-describing binary format for arrays of integers, real numbers, complex numbers and strings, designed for object storage, database and single file.

YShoji-HEP/ArrayObject

Array Object

ArrayObject is a self-describing binary format for arrays of integers, floating-point numbers, complex numbers, and strings. It is designed for efficient object storage, database integration, and single-file usage.

ArrayObject is part of the dbgbb project.

Features

  • Self-describing data that can inflate into typed variables.
  • Simple, uniform arrays: no nested structures, tuples, or dataset names.
  • Generic integer and float types abstract away type size differences.
  • Automatic compression: variable-length encoding for numbers and dictionary coding for strings keep the data size small.
  • Seamless conversions to and from Vec<_>, [T; N], ndarray, and nalgebra.

Status

Note: This crate is under active development. Specifications may change.

Usage Example

```rust
use array_object::*;

fn main() {
    // Convert data into binary
    let original = vec![1u32, 2, 3, 4];
    let obj: ArrayObject = original.clone().try_into().unwrap();
    let packed = obj.pack(); // Converts data into Vec<u8>

    // Restore data
    let unpacked = ArrayObject::unpack(packed).unwrap();
    let inflated: Vec<u32> = unpacked.try_into().unwrap();
    assert_eq!(original, inflated);
}
```

Format Overview

ArrayObject automatically selects the optimal data format to minimize size. For detailed encoding specifications, see the spec.

Integer

Unsigned or signed, determined at construction. Zigzag encoding is used for signed integers.

  • Scalar:
    • Short Integer (5-bit): Stored with the footer; total size is one byte.
    • Variable Length (8 × n bits): Shortened to the smallest possible size.
  • Array:
    • Fixed Length (8, 16, 32, 64, 128-bit): All elements use the same size.
    • Variable Length (8, 16, 32, or 64–128-bit variable): Each group of four integers has a size indicator byte; longer integers use additional bytes.
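
To illustrate why zigzag encoding helps with shortening, here is a minimal sketch of the standard zigzag mapping together with a helper that finds the smallest byte count for a value. The function names (`zigzag_encode`, `zigzag_decode`, `min_bytes`) are hypothetical and not part of the crate's API, and the exact bit layout ArrayObject writes is defined by its spec, which may differ from this sketch.

```rust
/// Standard zigzag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
/// Small magnitudes, positive or negative, become small unsigned values.
fn zigzag_encode(n: i64) -> u64 {
    // `n >> 63` is an arithmetic shift: all ones for negatives, zero otherwise.
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Smallest number of whole bytes that can hold `z` (at least one).
/// Illustrative only; not necessarily how ArrayObject picks a width.
fn min_bytes(z: u64) -> usize {
    let bits = 64 - z.leading_zeros() as usize;
    ((bits + 7) / 8).max(1)
}
```

Without zigzag, a small negative number such as -3 has all of its high bits set in two's complement and would need the full width. After the mapping it becomes 5 and fits in a single byte.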

Float (Real, Complex)

32-bit and 64-bit floating-point numbers.

  • Scalar/Array:
    • Fixed Length (32, 64-bit): Uses the smallest size without precision loss.
    • Variable Length (32, 64-bit): Each group of four numbers has a size indicator byte.
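
One way to decide whether a 64-bit value can be stored in 32 bits "without precision loss" is a round-trip check, sketched below. This is an assumption about how such a test could be written, not a description of the crate's internals; `fits_in_f32` and `all_fit_in_f32` are hypothetical names.

```rust
/// True when `x` survives a round trip through f32 bit-for-bit.
/// Comparing bit patterns keeps -0.0 distinct from 0.0.
/// NaN payloads are not guaranteed to be preserved by the casts,
/// so NaN handling is left out of this sketch.
fn fits_in_f32(x: f64) -> bool {
    let narrowed = x as f32; // rounds to nearest; overflows to infinity
    (narrowed as f64).to_bits() == x.to_bits()
}

/// A fixed-length 32-bit layout is lossless only if every element fits.
fn all_fit_in_f32(xs: &[f64]) -> bool {
    xs.iter().all(|&x| fits_in_f32(x))
}
```

For example, 0.5 and -0.25 are exactly representable in 32 bits, while 0.1 is not, so an array containing 0.1 as an f64 would have to keep 64 bits for at least that element.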

String

Only UTF-8 strings are supported. The byte value 0xFF, which never occurs in valid UTF-8, is reserved internally.

  • Scalar: Stored as Vec<u8> binary data.
  • Array:
    • Joined: Strings concatenated with 0xFF marker.
    • Dictionary: Up to 256 unique strings; array stores references to the dictionary.
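
The two array layouts can be sketched roughly as follows. The sketch assumes that 0xFF is placed between entries and that dictionary references are one byte each (consistent with the 256-entry limit); both are assumptions, and the authoritative layout is in the spec. All function names are hypothetical.

```rust
/// Joined layout: entries concatenated with a 0xFF byte between them.
/// 0xFF cannot occur inside valid UTF-8, so it is unambiguous.
fn join_strings(items: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for (i, s) in items.iter().enumerate() {
        if i > 0 {
            out.push(0xFF);
        }
        out.extend_from_slice(s.as_bytes());
    }
    out
}

/// Inverse of `join_strings` for non-empty arrays.
/// (An empty buffer decodes as one empty string; a real format would
/// take the element count from its metadata instead.)
fn split_strings(bytes: &[u8]) -> Result<Vec<String>, std::string::FromUtf8Error> {
    bytes
        .split(|&b| b == 0xFF)
        .map(|chunk| String::from_utf8(chunk.to_vec()))
        .collect()
}

/// Dictionary layout: unique strings plus one-byte references.
/// Returns None when more than 256 unique strings are present.
fn dictionary_encode(items: &[&str]) -> Option<(Vec<String>, Vec<u8>)> {
    let mut dict: Vec<String> = Vec::new();
    let mut refs = Vec::with_capacity(items.len());
    for s in items {
        let pos = match dict.iter().position(|d| d.as_str() == *s) {
            Some(p) => p,
            None => {
                if dict.len() == 256 {
                    return None;
                }
                dict.push((*s).to_string());
                dict.len() - 1
            }
        };
        refs.push(pos as u8); // pos is at most 255 here
    }
    Some((dict, refs))
}
```

Dictionary coding pays off when the array repeats a few distinct values many times, since each repeat costs one byte regardless of the string's length.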

Roadmap

  • Support for [T; N]
  • Support for usize and isize
  • Serde integration
  • Implementations in Python, Julia, R, C++, Fortran, etc.
  • Support for half-precision and extended-precision floats

Q&A

When Is ArrayObject Useful?

ArrayObject is ideal for storing multi-dimensional arrays, similar to CSV files but with support for higher dimensions, reduced file size, and faster I/O. It is not designed for appending data or mixing types within a single object. Instead, it provides type abstraction, strict type checking, and type-dependent compression with minimal metadata.

How Does It Differ from Raw Binary Files?

Raw binary files lack type information and consistency checks, making interoperability difficult. ArrayObject enforces a well-defined specification, ensuring reliable data exchange regardless of language or platform.

How Does It Compare to Databases like HDF5?

ArrayObject is a compact, portable format for single arrays, not a database. It does not include metadata such as names or timestamps, which should be managed externally. Its simplicity makes it closer to CSV, but with strict typing and compression.

How Is It Different from Serialization Libraries like Serde?

ArrayObject prohibits nested structures and tuples, focusing on arrays of numbers or strings. Serialization frameworks typically add size indicators for each data item, while ArrayObject relies on storage systems for boundaries and uses a footer for metadata.

Why Support Complex Numbers?

Complex numbers are a natural extension of real numbers, with well-defined mappings. Many mathematical operations yield complex results, and explicit support simplifies data handling and conversion.

Why No Conversion from Vec<Vec<_>>?

Nested vectors may have rows of varying lengths, which is incompatible with ArrayObject's uniform array model. For fixed-length arrays, use ndarray or nalgebra. To combine a collection of compatible ArrayObject values, use .try_concat(). For data with varying lengths, store multiple ArrayObject values in object storage (see dbgbb).
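
The uniformity problem can be made concrete with a small sketch that uses only the standard library: a nested vector can be flattened into a data buffer plus a shape only when every row has the same length. `flatten_rectangular` is a hypothetical helper, not part of the crate, and handing the result to ndarray or ArrayObject is not shown here.

```rust
/// Flatten a rectangular Vec<Vec<T>> into row-major data plus [rows, cols].
/// Returns None for ragged input, which has no single well-defined shape.
fn flatten_rectangular<T: Clone>(nested: &[Vec<T>]) -> Option<(Vec<T>, [usize; 2])> {
    let cols = nested.first().map_or(0, |row| row.len());
    if nested.iter().any(|row| row.len() != cols) {
        return None;
    }
    let data = nested.iter().flat_map(|row| row.iter().cloned()).collect();
    Some((data, [nested.len(), cols]))
}
```

A conversion from Vec<Vec<_>> would have to perform this check at run time on every call, whereas ndarray and nalgebra carry the shape with the data.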
