NVLink

Feb 13, 2025
Simplify System Memory Management with the Latest NVIDIA GH200 NVL2 Enterprise RA
NVIDIA Enterprise Reference Architectures (Enterprise RAs) can reduce the time and cost of deploying AI infrastructure solutions. They provide a streamlined...
8 MIN READ

Jan 24, 2025
Optimize AI Inference Performance with NVIDIA Full-Stack Solutions
The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing...
9 MIN READ

Dec 11, 2024
Deploying NVIDIA H200 NVL at Scale with New Enterprise Reference Architecture
Last month at the Supercomputing 2024 conference, NVIDIA announced the availability of NVIDIA H200 NVL, the latest NVIDIA Hopper platform. Optimized for...
8 MIN READ

Nov 21, 2024
Best Practices for Multi-GPU Data Analysis Using RAPIDS with Dask
As we move towards a more dense computing infrastructure, with more compute, more GPUs, accelerated networking, and so forth—multi-gpu training and analysis...
5 MIN READ

Nov 01, 2024
3x Faster AllReduce with NVSwitch and TensorRT-LLM MultiShot
Deploying generative AI workloads in production environments where user numbers can fluctuate from hundreds to hundreds of thousands – and where input...
5 MIN READ

Oct 28, 2024
NVIDIA GH200 Superchip Accelerates Inference by 2x in Multiturn Interactions with Llama Models
Deploying large language models (LLMs) in production environments often requires making hard trade-offs between enhancing user interactivity and increasing...
7 MIN READ

Oct 09, 2024
Boosting Llama 3.1 405B Throughput by Another 1.5x on NVIDIA H200 Tensor Core GPUs and NVLink Switch
The continued growth of LLMs capability, fueled by increasing parameter counts and support for longer contexts, has led to their usage in a wide variety of...
8 MIN READ

Oct 08, 2024
Bringing AI-RAN to a Telco Near You
Inferencing for generative AI and AI agents will drive the need for AI compute infrastructure to be distributed from edge to central clouds. IDC predicts that...
14 MIN READ

Sep 26, 2024
Low Latency Inference Chapter 2: Blackwell is Coming. NVIDIA GH200 NVL32 with NVLink Switch Gives Signs of Big Leap in Time to First Token Performance
Many of the most exciting applications of large language models (LLMs), such as interactive speech bots, coding co-pilots, and search, need to begin responding...
8 MIN READ

Sep 24, 2024
NVIDIA GH200 Grace Hopper Superchip Delivers Outstanding Performance in MLPerf Inference v4.1
In the latest round of MLPerf Inference – a suite of standardized, peer-reviewed inference benchmarks – the NVIDIA platform delivered outstanding...
7 MIN READ

Sep 16, 2024
Memory Efficiency, Faster Initialization, and Cost Estimation with NVIDIA Collective Communications Library 2.22
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes....
8 MIN READ

Sep 06, 2024
Enhancing Application Portability and Compatibility across New Platforms Using NVIDIA Magnum IO NVSHMEM 3.0
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on...
7 MIN READ

Sep 05, 2024
Low Latency Inference Chapter 1: Up to 1.9x Higher Llama 3.1 Performance with Medusa on NVIDIA HGX H200 with NVLink Switch
As large language models (LLMs) continue to grow in size and complexity, multi-GPU compute is a must-have to deliver the low latency and high throughput that...
5 MIN READ

Aug 28, 2024
Boosting Llama 3.1 405B Performance up to 1.44x with NVIDIA TensorRT Model Optimizer on NVIDIA H200 GPUs
The Llama 3.1 405B large language model (LLM), developed by Meta, is an open-source community model that delivers state-of-the-art performance and supports a...
7 MIN READ

Aug 12, 2024
NVIDIA NVLink and NVIDIA NVSwitch Supercharge Large Language Model Inference
Large language models (LLM) are getting larger, increasing the amount of compute required to process inference requests. To meet real-time latency requirements...
8 MIN READ

Jun 12, 2024
NVIDIA Sets New Generative AI Performance and Scale Records in MLPerf Training v4.0
Generative AI models have a variety of uses, such as helping write computer code, crafting stories, composing music, generating images, producing videos, and...
11 MIN READ