Lingua is a library and specification for defining a universal message format for large language model APIs. It enables developers to write messages, model parameters, and tool definitions in a single format that can be translated to and from any model provider's API client-side with zero runtime overhead.
- You should be able to write messages, model parameters, and tool definitions in this format, and use them with any version of any model provider.
- The spec describes how message data is represented, and the implementation converts to-and-from model provider APIs and popular frameworks.
- ~Zero runtime overhead, because there is no execution logic. The sole purpose of this project is to define a universal message format that can be translated across different model providers.
- Framework. This project is explicitly not providing any higher-level abstractions, guidance on how to structure your app, or model execution support. Frameworks can build on top of Lingua to avoid reimplementing model-provider translation.
- Proxy. This format could be used as the foundation for a proxy implementation, but has no concept of actually running prompts or handling authentication.
- Optimization. Messages written in this format will execute exactly what you would expect from the model provider. Third-party optimizers can be built on the format, and those optimizers will naturally work across providers.
- Supports 100% of model-provider-specific quirks (e.g. cache breakpoints, stateful responses).
- Messages you write in this format should be safe to store and survive many years of changes in model behavior and API versions.
- Zero dependencies and support for many languages, including TypeScript, Python, Java, and Go. Ideally the library can be cross-compiled, or support for language N+1 is trivial for AI to generate.
- Code and tests are structured so that coding agents can efficiently add new providers and support new features.
- Has a precise definition of usage (token) reporting that can be used to compute cost from a standard price table across providers.
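As a rough illustration of the usage-reporting goal, the sketch below computes a cost from a unified usage record and a price table. The `Usage` and `Price` structs, their field names, and the `cost_usd` function are hypothetical and are not Lingua's actual types; real usage reporting would likely need more categories (for example cached tokens).

```rust
use std::collections::HashMap;

/// Hypothetical unified usage record (not Lingua's actual type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Hypothetical price entry, in USD per one million tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Price {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Look up `model` in the price table and compute the cost in USD.
/// Returns `None` when the model has no price entry.
pub fn cost_usd(prices: &HashMap<String, Price>, model: &str, usage: Usage) -> Option<f64> {
    let price = prices.get(model)?;
    let input = usage.input_tokens as f64 / 1_000_000.0 * price.input_per_mtok;
    let output = usage.output_tokens as f64 / 1_000_000.0 * price.output_per_mtok;
    Some(input + output)
}
```

Because the usage record has the same shape for every provider, a single price table keyed by model name is enough to price any response.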
Lingua Universal Format
↓
Capability Detection
↓
Provider Translators
↓
OpenAI │ Anthropic │ Google │ Bedrock │ ...
[ ... list the known capabilities ... ]
[ .. for each provider, list which capabilities are supported ... ]
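The capability lists above are still placeholders. As a minimal sketch of how a capability table could be queried, the code below uses two capability names borrowed from the quirks mentioned earlier (cache breakpoints, stateful responses). The `Capability` enum, the `CapabilityTable` type, and the provider names are assumptions for illustration only, and the sketch makes no claim about which provider supports what.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative capabilities (hypothetical, not Lingua's actual list).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    CacheBreakpoints,
    StatefulResponses,
}

/// Hypothetical registry mapping a provider name to the capabilities it supports.
#[derive(Debug, Default)]
pub struct CapabilityTable {
    supported: HashMap<String, HashSet<Capability>>,
}

impl CapabilityTable {
    /// Record that `provider` supports `capability`.
    pub fn declare(&mut self, provider: &str, capability: Capability) {
        self.supported
            .entry(provider.to_string())
            .or_default()
            .insert(capability);
    }

    /// Return the required capabilities that `provider` does not support.
    /// An unknown provider is treated as supporting nothing.
    pub fn missing(&self, provider: &str, required: &[Capability]) -> Vec<Capability> {
        let have = self.supported.get(provider);
        required
            .iter()
            .copied()
            .filter(|c| !have.map_or(false, |set| set.contains(c)))
            .collect()
    }
}
```

A translator could call `missing` before converting a message and decide whether to fail, degrade, or emulate the missing feature.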
lingua/
├── src/
│ ├── universal/ # Universal Lingua format definitions
│ ├── providers/ # Provider-specific API types
│ ├── translators/ # Translation logic between formats
│ ├── capabilities/ # Capability detection system
│ ├── wasm.rs # WebAssembly bindings
│ ├── python.rs # Python bindings (PyO3)
│ └── lib.rs # Main library entry
├── bindings/
│ ├── typescript/ # TypeScript/WASM bindings
│ └── python/ # Python bindings
├── examples/ # Usage examples
└── tests/typescript/ # TypeScript compatibility tests
Before building WASM bindings for the first time, install the required tools:
# Install WASM build tools (wasm32-unknown-unknown target, wasm-bindgen-cli)
make install-wasm-tools
# Or run the full setup script
./scripts/setup.sh
Use the Makefile for easy building:
# Show all available targets
make help
# Build all bindings
make all
# Build specific bindings
make typescript
make python
# Run tests
make test
make test-rust
make test-typescript
make test-python
# Clean build artifacts
make clean
cd bindings/typescript
npm install
npm run build
npm test
See bindings/typescript/README.md for details.
cd bindings/python
uv sync --extra dev
uv run pytest tests/
See bindings/python/README.md for details.
These two primitives make it very easy to, for example, reproduce how tool calls are represented in each of the major model providers, snapshot inputs/outputs, and then fill in logic to translate between them. Some of that has been handwritten, but a lot of it is generated by LLMs. In general, a goal of this project is to create enough scaffolding that you can open a coding agent and say "fix support for parallel tool calls", and there will be enough context for an LLM to add the test cases, generate snapshots, and make the necessary changes.
OpenAI and Anthropic do not have Rust SDKs, but they do publish OpenAPI specs, so we fetch those and use quicktype (with a few hacks) to generate Rust types. These types generally work but lack good discriminated unions (unlike their TypeScript counterparts). Google has protobufs, which we are able to convert to Rust types, and Bedrock actually publishes a Rust SDK.
Lingua employs a comprehensive testing strategy to ensure accurate and lossless conversion between provider-specific formats and the universal format.
The core testing approach uses roundtrip conversion tests to verify that data can be converted from provider format → universal format → provider format without loss:
Provider Payload → Universal ModelMessage → Provider Payload
(input) (conversion) (output)
Key test scenarios:
- Request Roundtrips:
  - openai_request → universal → openai_request (should be identical)
  - anthropic_request → universal → anthropic_request (should be identical)
- Response Roundtrips:
  - openai_response → universal → openai_response (should be identical)
  - anthropic_response → universal → anthropic_response (should be identical)
- Cross-Provider Compatibility:
  - openai_request → universal → anthropic_request (should be equivalent)
  - anthropic_response → universal → openai_response (should be equivalent)
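The roundtrip idea can be sketched in a few lines. Everything here is a simplified stand-in: `ProviderMessage`, `UniversalMessage`, `Role`, and the three functions are hypothetical names chosen for illustration, and real provider payloads are far richer than a role plus a text string.

```rust
/// Hypothetical provider-shaped message (a stand-in for a real provider type).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderMessage {
    pub role: String,
    pub content: String,
}

/// Hypothetical role enum for the universal format.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
}

/// Hypothetical universal message.
#[derive(Debug, Clone, PartialEq)]
pub struct UniversalMessage {
    pub role: Role,
    pub text: String,
}

/// Provider format -> universal format. Fails on roles the sketch does not know.
pub fn to_universal(msg: &ProviderMessage) -> Result<UniversalMessage, String> {
    let role = match msg.role.as_str() {
        "system" => Role::System,
        "user" => Role::User,
        "assistant" => Role::Assistant,
        other => return Err(format!("unknown role: {}", other)),
    };
    Ok(UniversalMessage { role, text: msg.content.clone() })
}

/// Universal format -> provider format.
pub fn from_universal(msg: &UniversalMessage) -> ProviderMessage {
    let role = match msg.role {
        Role::System => "system",
        Role::User => "user",
        Role::Assistant => "assistant",
    };
    ProviderMessage { role: role.to_string(), content: msg.text.clone() }
}

/// Provider -> universal -> provider. A lossless translator returns its input unchanged.
pub fn roundtrip(msg: &ProviderMessage) -> Result<ProviderMessage, String> {
    to_universal(msg).map(|universal| from_universal(&universal))
}
```

A roundtrip test then asserts that `roundtrip(&payload)` equals the original payload. For cross-provider tests the comparison is equivalence rather than identity, since two providers rarely share the same field layout.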
Tests use real API payloads captured from actual provider interactions:
- Payload Snapshots: Located in the payloads/snapshots/ directory with real request/response examples
- Comprehensive Coverage: Tests cover simple messages, tool calls, streaming responses, multi-modal content
- Version Tracking: Payloads are version-controlled to detect breaking changes in provider APIs
- Unit Tests: Individual conversion functions with synthetic data
- Integration Tests: Full roundtrip tests using real payload snapshots
- Compatibility Tests: Cross-provider conversion validation
- Regression Tests: Ensure updates don't break existing functionality
This strategy ensures Lingua maintains 100% fidelity when converting between provider formats while providing confidence that the universal format can represent any provider-specific capability.
Provider types can be automatically updated using GitHub Actions:
- Manual trigger: Go to Actions → "Update Provider Types" → Run workflow
- Choose providers: Select all, or specific providers like openai, anthropic
- Automatic PR: If changes are detected, a PR will be created automatically
The automation downloads the latest specifications, regenerates types, applies formatting, and creates a pull request for review.
- Show token accounting across providers. Ideally we give users a way to access the provider's native usage + a unified format.
- How does structured outputs + Anthropic work? Translate to tool, and parse the response? Does that require carrying some state across request/response? Maybe we can generate an object when performing the forward translation that can be used in the reverse translation.
- Audit and remove all remaining todo!() calls
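The structured-outputs question above is still open. One possible shape, sketched here purely as an assumption, is for the forward translation to return a small context object that the reverse translation consumes. All type names, the `forward` and `reverse` functions, and the synthetic tool name `structured_output` are hypothetical.

```rust
/// Hypothetical universal request that may ask for structured (JSON) output.
#[derive(Debug, Clone, PartialEq)]
pub struct UniversalRequest {
    pub prompt: String,
    pub output_schema: Option<String>,
}

/// Hypothetical request for a provider that offers tools but no structured output.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolOnlyRequest {
    pub prompt: String,
    pub tools: Vec<String>, // tool names only; schemas omitted for brevity
    pub forced_tool: Option<String>,
}

/// State produced by the forward translation and consumed by the reverse one.
#[derive(Debug, Clone, PartialEq)]
pub struct TranslationContext {
    pub synthetic_tool: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ProviderResponse {
    Text(String),
    ToolCall { name: String, arguments: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum UniversalResponse {
    Text(String),
    StructuredOutput(String),
    ToolCall { name: String, arguments: String },
}

const SYNTHETIC_TOOL: &str = "structured_output"; // hypothetical name

/// Forward translation: express the output schema as a forced tool call.
pub fn forward(req: &UniversalRequest) -> (ToolOnlyRequest, TranslationContext) {
    let synthetic_tool = req.output_schema.as_ref().map(|_| SYNTHETIC_TOOL.to_string());
    let provider_req = ToolOnlyRequest {
        prompt: req.prompt.clone(),
        tools: synthetic_tool.iter().cloned().collect(),
        forced_tool: synthetic_tool.clone(),
    };
    (provider_req, TranslationContext { synthetic_tool })
}

/// Reverse translation: unwrap calls to the synthetic tool, pass others through.
pub fn reverse(resp: ProviderResponse, ctx: &TranslationContext) -> UniversalResponse {
    match resp {
        ProviderResponse::Text(text) => UniversalResponse::Text(text),
        ProviderResponse::ToolCall { name, arguments } => {
            if ctx.synthetic_tool.as_deref() == Some(name.as_str()) {
                UniversalResponse::StructuredOutput(arguments)
            } else {
                UniversalResponse::ToolCall { name, arguments }
            }
        }
    }
}
```

Keeping the state in an explicit value, rather than inside the translator, means the caller decides how to carry it between the request and the response.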
Lingua supports optional provider dependencies through feature flags to minimize build time and binary size:
- openai - OpenAI API types and translators
- anthropic - Anthropic API types and translators
- google - Google Gemini API types and translators
- bedrock - Amazon Bedrock API types and translators (pulls in AWS SDK)
Default (all providers):
[dependencies]
lingua = "0.1.0"
Minimal (only OpenAI):
[dependencies]
lingua = { version = "0.1.0", default-features = false, features = ["openai"] }
Without AWS dependencies:
[dependencies]
lingua = { version = "0.1.0", default-features = false, features = ["openai", "anthropic", "google"] }
Only Bedrock:
[dependencies]
lingua = { version = "0.1.0", default-features = false, features = ["bedrock"] }
The translators and types are only available when their respective features are enabled:
#[cfg(feature = "openai")]
use lingua::translators::to_openai_format;
#[cfg(feature = "bedrock")]
use lingua::translators::to_bedrock_format_with_model;
🚧 In Development - Currently building the foundational types and translator architecture.
- Support parsing streaming responses and combining streaming messages into a single response.
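Streaming support is not implemented yet, so the following is only a sketch of what combining streamed chunks into a single message might look like. `StreamChunk`, `CombinedMessage`, and `combine` are hypothetical names, and real provider streams carry more event types (tool-call deltas, usage, and so on).

```rust
/// Hypothetical streaming event.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamChunk {
    TextDelta(String),
    Done { finish_reason: String },
}

/// Hypothetical result of folding a stream into one message.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CombinedMessage {
    pub text: String,
    pub finish_reason: Option<String>,
}

/// Fold chunks, in arrival order, into a single message.
pub fn combine<I>(chunks: I) -> CombinedMessage
where
    I: IntoIterator<Item = StreamChunk>,
{
    let mut message = CombinedMessage::default();
    for chunk in chunks {
        match chunk {
            // Text deltas are concatenated in the order they arrive.
            StreamChunk::TextDelta(delta) => message.text.push_str(&delta),
            // The final event records why generation stopped.
            StreamChunk::Done { finish_reason } => message.finish_reason = Some(finish_reason),
        }
    }
    message
}
```

Accepting any `IntoIterator` keeps the function usable with a buffered `Vec` in tests and with an iterator adapter over a live stream.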
This project aims to support the entire ecosystem of LLM providers. Contributions for new providers, capability detection improvements, and format enhancements are welcome.
Prerequisites: Rust toolchain, Node.js, pnpm.
Run ./scripts/setup.sh from the project root after cloning. If the script succeeds, you should be all set! Otherwise, follow the error messages.
TypeScript types for the universal Message format are automatically generated from Rust types using ts-rs:
# Generate TypeScript types from Rust
make generate-types
# Generated files: bindings/typescript/src/generated/*.ts
Important: After modifying Rust types in src/universal/, run make generate-types and commit the updated TypeScript files. CI will verify that generated types are up to date.
TBD