lmcpp is both an executable binary that can be run and a library that can be used in Rust programs.
Installing the lmcpp-server-cli and lmcpp-toolchain-cli executables
Assuming you have Rust/Cargo installed, run this command in a terminal:
cargo install lmcpp
This makes the lmcpp-server-cli and lmcpp-toolchain-cli commands available in your PATH, provided you allowed your PATH to be modified when installing Rust. Run cargo uninstall lmcpp to remove them.
Adding lmcpp library as a dependency
Run this command in a terminal, in your project's directory:
cargo add lmcpp
To add it manually, edit your project's Cargo.toml file and add to the [dependencies] section:
lmcpp = "0.1.1"
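Put together, the relevant section of Cargo.toml looks like this (version taken from the line above):

```toml
[dependencies]
lmcpp = "0.1.1"
```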
The lmcpp library will then be available to use in your project.
Readme
lmcpp – llama.cpp's llama-server for Rust
Fully Managed
Automated Toolchain – Downloads, builds, and manages the llama.cpp toolchain with LmcppToolChain.
Supported Platforms – Linux, macOS, and Windows with CPU, CUDA, and Metal support.
Multiple Versions – Each release tag and backend is cached separately, allowing you to install multiple versions of llama.cpp.
Blazing Fast UDS
UDS IPC – Integrates with llama-server's Unix-domain-socket client on Linux, macOS, and Windows.
Fast! – Is it faster than HTTP? Yes. Is it measurably faster? Maybe.
Fully Typed / Fully Documented
Server Args – All llama-server arguments are implemented by ServerArgs.
Endpoints – Each endpoint has request and response types defined.
Good Docs – Every parameter was researched to improve upon the original llama-server documentation.
lmcpp-toolchain-cli – Manage the llama.cpp toolchain: download, build, cache.
lmcpp-server-cli – Start, stop, and list servers.
Easy Web UI – Use LmcppServerLauncher::webui to start with HTTP and the Web UI enabled.
use lmcpp::*;

fn main() -> LmcppResult<()> {
    let server = LmcppServerLauncher::builder()
        .server_args(
            ServerArgs::builder()
                .hf_repo("bartowski/google_gemma-3-1b-it-qat-GGUF")?
                .build(),
        )
        .load()?;

    let res = server.completion(
        CompletionRequest::builder()
            .prompt("Tell me a joke about Rust.")
            .n_predict(64),
    )?;

    println!("Completion response: {:#?}", res.content);
    Ok(())
}
# With the default model:
cargo run --bin lmcpp-server-cli -- --webui

# Or with a specific model from a URL:
cargo run --bin lmcpp-server-cli -- --webui -u https://huggingface.co/bartowski/google_gemma-3-1b-it-qat-GGUF/blob/main/google_gemma-3-1b-it-qat-Q4_K_M.gguf

# Or with a specific local model:
cargo run --bin lmcpp-server-cli -- --webui -l /path/to/local/model.gguf
How It Works
Your Rust App
│
├─→ LmcppToolChain (downloads / builds / caches)
│ ↓
├─→ LmcppServerLauncher (spawns & monitors)
│ ↓
└─→ LmcppServer (typed handle over UDS*)
│
├─→ completion() → text generation
└─→ other endpoints → stuff
Endpoints ⇄ Typed Helpers
| Platform    | CPU | CUDA | Metal | Binary Sources     |
|-------------|-----|------|-------|--------------------|
| Linux x64   | ✅  | ✅   | –     | Pre-built + Source |
| macOS ARM   | ✅  | –    | ✅    | Pre-built + Source |
| macOS x64   | ✅  | –    | ✅    | Pre-built + Source |
| Windows x64 | ✅  | ✅   | –     | Pre-built + Source |