NVIDIA Dynamo is an open-source modular inference framework for serving generative AI models in distributed environments. It enables seamless scaling of inference workloads across large GPU fleets with dynamic resource scheduling, intelligent request routing, optimized memory management, and accelerated data transfer.
When serving the open-source DeepSeek-R1 671B reasoning model on NVIDIA GB200 NVL72, NVIDIA Dynamo increased the number of requests served by up to 30x, making it an ideal solution for AI factories aiming to run at the lowest possible cost while maximizing token revenue.
NVIDIA Dynamo supports all major AI inference backends and features large language model (LLM)-specific optimizations, such as disaggregated serving, which accelerate and scale AI reasoning models at the lowest cost and highest efficiency. It will be supported as part of NVIDIA AI Enterprise in a future release.