Pricing · per token

Input tokens           $0.00    per 1M tokens
Cached input tokens    $0.0000  per 1M tokens
Output tokens          $0.00    per 1M tokens

Prices are derived from CI benchmarks (vLLM on DGX Spark at $0.26/hr). A free tier is available.
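As a rough sketch of how a per-token price can be derived from an hourly hardware rate: divide the rate by measured throughput to get cost per token, then scale to 1M tokens. The $0.26/hr figure comes from this page; the throughput below is an illustrative assumption, not a benchmark result.

```python
# Hypothetical cost model: per-1M-token price from an hourly hardware
# rate and a measured throughput. The 2,000 tok/s figure is an assumed
# example, not a real DGX Spark benchmark number.
def price_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# At an assumed 2,000 tok/s, $0.26/hr works out to roughly $0.036 per 1M tokens.
print(round(price_per_million_tokens(0.26, 2000.0), 4))
```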

API base URL

Compatible with the openai, litellm, and langchain client libraries.

Powered by vLLM · OpenAI-compatible API
Endpoint
POST

OpenAI-compatible chat completions. Streaming is supported via stream: true.

Model identifier
Code Examples
cURL
Python
Node.js
LiteLLM
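A minimal sketch of what the Python example might look like, using only the standard library. The base URL and model identifier below are placeholders (substitute the actual API base URL and model identifier shown above), and the request body mirrors the parameter table in the next section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # placeholder -- use the real API base URL
MODEL = "your-model-id"                # placeholder -- use the real model identifier

def build_chat_request(messages, model=MODEL, max_tokens=512,
                       temperature=0.7, top_p=1.0, stream=False):
    """Assemble a chat-completions payload matching the request parameters."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }

def chat(messages, **kwargs):
    """POST the payload to the chat completions endpoint and return the reply text."""
    payload = build_chat_request(messages, **kwargs)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# Example (requires a running server):
# print(chat([{"role": "user", "content": "Write a haiku about FP8."}]))
```

The same payload shape works from cURL, Node.js, or LiteLLM, since all of them speak the OpenAI chat-completions format.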
Request Parameters
Parameter    Type     Description
model        string   Full model identifier
messages     array    Array of {role, content} objects
max_tokens   integer  Max tokens to generate (default 512)
temperature  float    Sampling temperature 0–2 (default 0.7)
top_p        float    Nucleus sampling (default 1.0)
stream       boolean  Stream via SSE (default false)
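With stream: true, responses arrive as Server-Sent Events, each "data:" line carrying an OpenAI-style chunk whose delta holds a fragment of the reply. A minimal sketch of consuming such a stream, assuming the standard OpenAI chunk shape and a "data: [DONE]" terminator:

```python
import json

def iter_stream_deltas(lines):
    """Yield content fragments from OpenAI-style SSE lines ('data: {...}').

    Minimal sketch: assumes each event fits on one 'data:' line and the
    stream ends with 'data: [DONE]', as in the OpenAI chunk format.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Synthetic events in the shape a chat-completions stream uses:
events = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print("".join(iter_stream_deltas(events)))  # -> Hello
```

In a real client the lines would come from the HTTP response body rather than a list, but the parsing logic is the same.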