Model Specifications
Pricing · per 1M tokens

| Token type | Price per 1M tokens |
|---|---|
| Input tokens | $0.00 |
| Cached input tokens | $0.0000 |
| Output tokens | $0.00 |
Prices are derived from CI benchmarks (vLLM running on a DGX Spark at $0.26/hr). A free tier is available.
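Per-1M-token pricing converts to a request cost as `tokens / 1,000,000 × price`. A minimal sketch, using hypothetical non-zero prices for illustration (the table above currently lists $0.00 for all three token types):

```python
def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
                 price_in: float, price_cached: float, price_out: float) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (input_tokens * price_in
            + cached_tokens * price_cached
            + output_tokens * price_out) / 1_000_000

# Hypothetical prices ($ per 1M tokens), not taken from the table above.
cost = request_cost(1_200, 0, 300, price_in=0.26, price_cached=0.07, price_out=1.04)
```

Cached input tokens are billed at their own (typically lower) rate, which is why they appear as a separate row in the pricing table.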
API base URL
Compatible with the openai, litellm, and langchain client libraries.
All Available Models
Select a model to chat with, optionally comparing a primary model against a second one. A system prompt and a streaming toggle are also available.

Playground defaults: Temperature 0.70 · Max Tokens 512 · Top P 1.00
Suggested prompts: Quantization benefits · LLM haiku · FP8 vs NVFP4 · About DGX Spark
Powered by vLLM · OpenAI-compatible API
Endpoint
POST
OpenAI-compatible chat completions endpoint. Enable streaming by setting `stream: true` in the request body.
Model identifier
Code Examples
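A minimal request sketch using only the Python standard library. The base URL, API key, and model identifier below are placeholders; substitute the values shown on this page:

```python
import json
import urllib.request

# Placeholders -- substitute your actual base URL, key, and model identifier.
BASE_URL = "https://example.com/v1"
API_KEY = "sk-..."

def build_chat_request(model: str, messages: list, **params) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request(
    "my-model",  # hypothetical identifier
    [{"role": "user", "content": "Explain FP8 vs NVFP4 in one sentence."}],
    max_tokens=512,
    temperature=0.7,
)
# Sending it requires a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Because the API is OpenAI-compatible, the official openai SDK also works: pass the base URL and key to the client constructor instead of building requests by hand.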
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Full model identifier |
| `messages` | array | Array of `{role, content}` objects |
| `max_tokens` | integer | Maximum number of tokens to generate (default 512) |
| `temperature` | float | Sampling temperature, 0–2 (default 0.7) |
| `top_p` | float | Nucleus sampling probability mass (default 1.0) |
| `stream` | boolean | Stream the response via server-sent events (default false) |
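With `stream: true`, the response arrives as server-sent events: each event is a line of the form `data: {json chunk}`, and the stream ends with `data: [DONE]`. A sketch of extracting the text deltas, assuming the standard OpenAI chat-completions chunk shape (`choices[0].delta.content`):

```python
import json

def iter_deltas(sse_lines):
    """Yield content fragments from OpenAI-style streaming chunks.

    Each chunk arrives as a line 'data: {...}'; the stream ends with
    'data: [DONE]'. Lines not starting with 'data:' (e.g. keep-alives
    or blank separators) are ignored.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Two synthetic chunks followed by the terminator:
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(lines)))  # -> Hello
```

In practice you would iterate over the HTTP response body line by line rather than over a list, but the parsing logic is the same.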