Technical Brief
GPU Positioning for Virtualized Compute and Graphics Workloads
Executive Summary
The NVIDIA virtual GPU (vGPU) solution provides a flexible way to accelerate virtualized
workloads – from Artificial Intelligence (AI) to Virtual Desktop Infrastructure (VDI). This
solution includes NVIDIA graphics processing units (GPUs) for virtualization and NVIDIA
software for virtualizing these GPUs.
Decoupling the GPU hardware and virtual GPU software options enables customers to
benefit from innovative features delivered in the software at a regular cadence, without
the need to purchase new GPU hardware. It also provides the flexibility for IT
departments to architect the optimal solution to meet the specific needs of users in
their environment.
The flexibility of the NVIDIA vGPU solution sometimes leads to the question, “How do I
select the right combination of NVIDIA GPUs and virtualization software that best meets
the requirements of my workloads?” In this technical brief, you will find guidance to help
you answer that question.
This guidance is based on factors such as raw performance, performance per dollar¹, and
overall cost effectiveness. It serves as a great starting point for understanding best
practices for accelerating workloads in a virtualized infrastructure. However, to
determine the NVIDIA virtual GPU solution that best meets your needs, you must test
the solution with your own workloads.
You should also consider other factors, such as which NVIDIA vGPU certified OEM server
to select, which NVIDIA GPUs are supported by that server, and any power and cooling
constraints.
Table 1 summarizes the NVIDIA vGPU solutions for virtualized workloads.
1 Performance per dollar is calculated by adding the estimated GPU street prices to the cost of a
4-year or 5-year subscription to NVIDIA virtual GPU software and dividing the total cost by the
number of users.
2 NVIDIA H100, A100 and NVIDIA A30 do not support graphics workloads.
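As a rough illustration of the cost model in footnote 1, the following Python sketch computes a per-user cost and a performance-per-dollar figure. All prices, user counts, and benchmark scores in it are hypothetical placeholders rather than NVIDIA pricing or published results, and the final division of a benchmark score by the per-user cost is an assumption about how the normalized charts later in this brief are derived.

```python
# Hypothetical illustration of the calculation described in footnote 1.
# All prices, user counts, and scores are placeholder values, not NVIDIA
# pricing or published benchmark data.

def cost_per_user(gpu_street_price, annual_license_per_user, years, users):
    """GPU street price plus the vGPU software subscription, divided by users."""
    total_cost = gpu_street_price + annual_license_per_user * years * users
    return total_cost / users

def performance_per_dollar(benchmark_score, per_user_cost):
    """Benchmark score divided by the per-user cost (assumed normalization)."""
    return benchmark_score / per_user_cost

if __name__ == "__main__":
    per_user = cost_per_user(
        gpu_street_price=3000.0,       # estimated street price (hypothetical)
        annual_license_per_user=50.0,  # per-user subscription (hypothetical)
        years=4,                       # four-year subscription term
        users=32,                      # concurrent users served by the board
    )
    print(f"Cost per user: ${per_user:,.2f}")
    print(f"Performance per dollar: {performance_per_dollar(1.0, per_user):.4f}")
```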
The GPU that best meets the requirements of your workloads depends on the
importance to you of factors such as raw performance, time-to-solution, performance
per dollar, performance per watt, form factor, and any power and cooling constraints.
NVIDIA L40
The NVIDIA® L40, based on the NVIDIA Ada Lovelace GPU architecture, delivers
unprecedented visual computing performance for the data center and provides
revolutionary neural graphics, compute, and AI capabilities to accelerate the most
demanding visual computing workloads. The L40 features 142 third-generation RT Cores
that enhance real-time ray tracing capabilities and 568 fourth-generation Tensor Cores
with support for the FP8 data format. These new features are combined with the latest
generation CUDA Cores and 48GB of graphics memory to accelerate visual computing
workloads from high-performance virtual workstation instances to large-scale digital
twins in NVIDIA Omniverse. With up to twice the performance of the previous generation
at the same power, the NVIDIA L40 is uniquely suited to provide the visual computing
power and performance required by the modern data center. When combined with
NVIDIA RTX™ Virtual Workstation (vWS) software, the NVIDIA L40 delivers powerful
virtual workstations from the data center or cloud to any device. Millions of creative and
technical professionals can access the most demanding applications from anywhere
with awe-inspiring performance that rivals physical workstations—all while meeting the
need for greater security.
3 Performance-optimized GPUs are designed to maximize raw performance for a specific class of
virtualized workload. They are typically recommended for the following classes of virtualized
workload:
> High-end virtual workstations running professional visualization applications.
> Compute-intensive workloads such as artificial intelligence, deep learning, or data science
workloads.
Density-optimized GPUs are designed to maximize the number of VDI users supported in a
server. They are typically recommended for knowledge worker virtual desktop infrastructure (VDI)
to run office productivity applications, streaming video, and the Windows OS.
NVIDIA L4
The NVIDIA Ada Lovelace L4 Tensor Core GPU delivers universal acceleration and energy
efficiency for video, AI, virtual workstations, and graphics applications in the
enterprise, in the cloud, and at the edge. And with NVIDIA’s AI platform and full-stack
approach, L4 is optimized for video and inference at scale for a broad range of AI
applications to deliver the best in personalized experiences. As the most efficient NVIDIA
accelerator for mainstream use, servers equipped with L4 power up to 120X higher AI
video performance over CPU solutions and 2.5X more generative AI performance, as well
as over 4X more graphics performance than the previous GPU generation. L4’s versatility
and energy-efficient, single-slot, low-profile form factor make it ideal for edge, cloud,
and enterprise deployments.
NVIDIA A40
Built on the RTX platform, the NVIDIA A40 GPU is uniquely positioned to power high-end
virtual workstations running professional visualization applications, accelerating the
most demanding graphics workloads. The second-generation RT Cores of the NVIDIA
A40 enable it to deliver massive speedups for workloads such as photorealistic rendering.
NVIDIA A16
The NVIDIA A16 is designed to provide the most cost-effective graphics
performance for knowledge worker VDI workloads. For these workloads, where users
are accessing office productivity applications, web browsers, and streaming video, the
most important consideration is achieving the best performance per dollar and the
highest user density per server. With four GPUs on each board, the NVIDIA A16 is ideal
for providing the best performance per dollar and a high number of users per GPU for
these workloads.
NVIDIA A10
The NVIDIA A10 is designed to provide cost-effective graphics performance for
accelerating and optimizing the performance of mixed workloads. When combined with
NVIDIA RTX vWS software, it accelerates graphics and video processing with AI on
mainstream enterprise servers. Its second-generation RT Cores make the NVIDIA A10
ideal for mainstream professional visualization applications running on high-performance
mid-range virtual workstations.
NVIDIA A2
The NVIDIA A2 is designed to provide cost-effective compute performance for deep
learning inference workloads and cost-effective graphics performance for knowledge
worker VDI workloads with vPC. It is a versatile, low-power, small-footprint, entry-level
GPU that delivers low cost per user.
Note: When choosing GPUs based on raw performance or performance per dollar, use these
results for general guidance only. All results are based on the workloads listed in Table 3,
which could differ from the applications being used in production.
4 Assumes that enough frame buffer is available on all vGPUs across all GPUs.
5 NVIDIA H100, A100 and NVIDIA A30 do not support graphics workloads.
Test Results
The GPUs that provide the best raw performance and cost effectiveness for knowledge
worker VDI workloads are listed in Table 4. For knowledge worker VDI workloads, the
principal factor in determining cost effectiveness is the combination of performance per
dollar and user density.
As more knowledge worker users are added to a server, the server consumes more CPU
resources. Adding an NVIDIA GPU for this workload conserves CPU resources by
offloading graphics rendering tasks to the GPU. As a result, user experience and
performance are improved for end users.
Table 5 assumes that each user requires a vGPU profile with 1GB of frame buffer.
However, to determine the profile sizes that provide the best user experience for the
users in your environment, you must conduct a proof of concept (POC).
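To make the frame buffer arithmetic concrete, here is a minimal sketch of how a per-user profile size translates into a per-board user ceiling. The board capacities in the example are illustrative assumptions; confirm actual frame buffer sizes against your GPU's specifications and validate achievable density in the POC.

```python
# Minimal sketch: users per board = total frame buffer / per-user vGPU profile size.
# The 1 GB profile matches the assumption stated for Table 5; the board frame
# buffer sizes below are illustrative examples only.

def users_per_board(total_frame_buffer_gb, profile_size_gb=1):
    """Upper bound on vGPU instances per board, limited by frame buffer alone."""
    return total_frame_buffer_gb // profile_size_gb

# Example: a board with 64 GB of total frame buffer could host at most 64 users
# with a 1 GB profile. Actual density also depends on CPU, system memory, and
# vGPU scheduler behavior, which this sketch ignores.
print(users_per_board(64, 1))   # -> 64
print(users_per_board(48, 2))   # -> 24, e.g. a 48 GB board with a 2 GB profile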
6 The maximum number of boards per server assumes a 2U server. Refer to the specifications for
your preferred OEM server to determine the maximum number of boards supported.
[Figure 1: Relative performance per dollar for knowledge worker VDI, normalized, for the L40, A40, A10, T4, L4, A2, and A16]
Figure 1 assumes an estimated GPU street price plus the cost of NVIDIA vPC software
with a four-year subscription, divided by the number of users.
Professional Graphics
GPU performance for professional graphics workloads was measured by using the
SPECviewperf 2020 (3840x2160) benchmark test. SPECviewperf 2020 is a standard
benchmark for measuring the graphics performance of professional applications. It
measures the 3D graphics performance of systems running under the OpenGL and
DirectX application programming interfaces.
Test Results
The GPUs that provide the best raw performance and cost effectiveness for
professional graphics workloads are listed in Table 6. For professional graphics
workloads, the principal factor in determining cost effectiveness is performance per
dollar.
[Figure 2: SPECviewperf 2020 geomean (normalized) for the A16, T4, L4, A10, A40, and L40]
[Figure 3: SPECviewperf 2020 performance per dollar (normalized) for the A16, T4, A40, L40, A10, and L4]
Figure 3 assumes an estimated GPU street price plus the cost of NVIDIA RTX vWS
software with a four-year subscription.
Test Results
The GPUs that provide the best raw performance and cost effectiveness for AI deep
learning training workloads are listed in Table 8. For AI deep learning training workloads,
the principal factor in determining cost effectiveness is time-to-solution.
[Figure 4: BERT Large fine-tune training performance (normalized) for the T4, L4, A10, A30, A40, L40, A100, and H100]
[Figure 5: BERT Large fine-tune training performance per dollar (normalized) for the T4, L4, A10, A30, A40, L40, A100, and H100]
Figure 5 assumes an estimated GPU street price plus the cost of NVIDIA AI Enterprise
software with a five-year subscription.
Test Results
The GPUs that provide the best raw performance and cost effectiveness for AI deep
learning inference workloads are listed in Table 10. For AI deep learning inference
workloads, the principal factor in determining cost effectiveness is the combination of
performance per dollar and the flexibility provided by support for the Multi-Instance
GPU (MIG) feature.
[Figure 6: BERT Large inference performance (normalized) for the A2, T4, L4, A10, A40, A30, L40, A100, and H100]
[Figure 7: BERT Large inference performance per dollar (normalized) for the A2, T4, L4, A10, A40, A30, L40, A100, and H100]
Figure 7 assumes an estimated GPU street price plus the cost of NVIDIA AI Enterprise
software with a five-year subscription.
Property             Value
Batch size           128
Integer data type    INT8
Sequence length      128
Precision            Mixed
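For context, the sketch below shows how the batch size and sequence length above map onto a BERT-Large inference call, using Hugging Face Transformers with PyTorch FP16 autocast as a simple stand-in for mixed precision. The published results were generated with an INT8-optimized inference stack, which this simplified example does not attempt to reproduce, and the model name bert-large-uncased is an illustrative choice.

```python
# Illustrative only: BERT-Large inference with the batch size and sequence
# length listed above. FP16 autocast stands in for mixed precision; the
# benchmark itself uses an INT8-optimized stack (e.g. TensorRT), not shown here.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

BATCH_SIZE = 128       # matches "Batch size" above
SEQUENCE_LENGTH = 128  # matches "Sequence length" above

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-large-uncased")
model = model.eval().to("cuda")

# A batch of identical placeholder sentences, padded to 128 tokens.
inputs = tokenizer(
    ["An example input sentence."] * BATCH_SIZE,
    padding="max_length",
    truncation=True,
    max_length=SEQUENCE_LENGTH,
    return_tensors="pt",
).to("cuda")

with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    logits = model(**inputs).logits

print(logits.shape)  # torch.Size([128, 2])
```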
NVIDIA GPU virtualization software products are optimized for different classes of
workload. Therefore, you should select the right NVIDIA GPU virtualization software
product on the basis of the workloads that your users are running.
Graphics Features and APIs NVIDIA RTX vWS NVIDIA vPC NVIDIA AI Enterprise
NVENC ✓ ✓ ✓
OpenGL extensions (WebGL) ✓ ✓
In-situ graphics/GL support ✓
RTX platform optimizations ✓
DirectX ✓ ✓
Vulkan support ✓ ✓
Profiles                         NVIDIA RTX vWS                        NVIDIA vPC    NVIDIA AI Enterprise
Maximum supported frame buffer   48 GB                                 2 GB          80 GB
Available profiles               0Q, 1Q, 2Q, 3Q, 4Q, 6Q, 8Q, 12Q,      0B, 1B, 2B    4C, 5C, 6C, 8C, 10C,
                                 16Q, 24Q, 32Q, 48Q                                  16C, 20C, 40C, 80C
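For readers scripting against these profile names, note the naming convention they follow: the number is the frame buffer size in gigabytes and the trailing letter identifies the software edition (Q for NVIDIA RTX vWS, B for NVIDIA vPC, C for compute profiles used with NVIDIA AI Enterprise), with the 0Q and 0B profiles as a special case carrying less than 1 GB of frame buffer. The small parsing helper below is illustrative only and not part of any NVIDIA tool.

```python
# Sketch: interpret a vGPU profile name such as "A40-8Q" or "A100-40C".
# Assumption (from the table above): the number is the frame buffer in GB and
# the suffix letter identifies the software edition. The 0Q and 0B profiles
# are a special case with less than 1 GB of frame buffer.
import re

EDITIONS = {"Q": "NVIDIA RTX vWS", "B": "NVIDIA vPC", "C": "NVIDIA AI Enterprise"}

def parse_profile(name):
    gpu, spec = name.split("-")
    match = re.fullmatch(r"(\d+)([QBC])", spec)
    size_gb, series = int(match.group(1)), match.group(2)
    return {"gpu": gpu, "frame_buffer_gb": size_gb, "edition": EDITIONS[series]}

print(parse_profile("A40-8Q"))    # {'gpu': 'A40', 'frame_buffer_gb': 8, 'edition': 'NVIDIA RTX vWS'}
print(parse_profile("A100-40C"))  # {'gpu': 'A100', 'frame_buffer_gb': 40, 'edition': 'NVIDIA AI Enterprise'}
```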
Optimal Workloads
Table 13 shows the different classes of workload for which NVIDIA GPU virtualization
software products are optimized.
Product Details
Each NVIDIA GPU virtualization software product is designed for a specific class of
workload.
NVIDIA Virtual PC
NVIDIA Virtual PC (vPC) software is designed for knowledge worker VDI workloads to
accelerate the following software and peripheral devices:
> Office productivity applications
> Streaming video
> The Windows OS
> Multiple monitors
> High-resolution monitors
> 2D electronic design automation (EDA)
NVIDIA AI Enterprise
NVIDIA AI Enterprise is designed for compute-intensive workloads, such as artificial
intelligence (AI), deep learning, data science, and high-performance computing (HPC)
workloads. It is a secure, end-to-end, cloud-native suite of AI software that includes an
extensive library of full-stack software such as:
> NVIDIA AI workflows
> Frameworks
> Pretrained models
> Infrastructure optimization
NVIDIA AI Enterprise accelerates the data science pipeline and streamlines the
development and deployment of production AI including generative AI, computer vision,
speech AI and more. Available in the cloud, the data center and at the edge, NVIDIA AI
Enterprise enables organizations to develop once and run anywhere. Global enterprise
support and regular security reviews ensure business continuity and keep AI projects on
track.
[Figure: Geomean (normalized) throughput with GPU sharing for A40-48Q (single VM), A40-24Q (two VMs), and A40-12Q (four VMs)]
The server configuration for measuring the effect of GPU sharing on overall throughput
is listed in Table 14.
[Figure: Samples per second (relative) for BERT-Large Offline and BERT-Large High Accuracy Offline, 3x A100-40C virtualized with NVIDIA AI Enterprise versus 3x A100 bare metal]
The server configuration for measuring the performance of NVIDIA AI Enterprise is listed
in Table 15.
[Figure: Sentences per second (relative performance) for a single node and 2-node, 3-node, and 4-node clusters]
The server configuration for measuring the multinode scaling performance of NVIDIA AI
Enterprise is listed in Table 16.
The NVIDIA solution for virtualized compute and graphics workloads offers unmatched
flexibility and performance when paired with GPUs based on the NVIDIA Hopper, Ada
Lovelace, and Ampere architectures. The solution is designed to meet the ever-shifting
workloads and organizational needs of today's enterprises.
For professional visualization workloads, the optimal GPU for each class of workload is
as follows:
> The NVIDIA L40 is uniquely positioned to power the most demanding graphics and
rendering workloads for dynamic virtual workstations.
> The NVIDIA L4 offers the best performance per dollar for professional graphics
workloads.
> If the infrastructure supports knowledge worker VDI workloads, the NVIDIA A16
provides the best performance per dollar, while also providing the best user density.
For AI workloads, including deep learning training and deep learning inferencing, the
optimal GPU for each class of workload is as follows:
> The NVIDIA H100 offers the best raw performance and cost effectiveness for
training and inference workloads. It is the most advanced data center GPU ever built,
delivering high performance with unprecedented acceleration.
The NVIDIA H100 GPU supports MIG, optimizing GPU utilization and providing flexibility
with dynamic reconfiguration of MIG instances.
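To show what reconfiguring MIG instances can look like operationally, the following Python sketch drives the standard nvidia-smi MIG commands. The GPU index and the 1g.10gb profile name are assumptions for illustration; list the profiles your GPU actually supports with nvidia-smi mig -lgip first, and note that these commands require administrative privileges.

```python
# Illustrative sketch of MIG reconfiguration via nvidia-smi, run as root.
# GPU index 0 and the 1g.10gb profile are assumptions; check the profiles
# supported on your GPU with `nvidia-smi mig -lgip` before creating instances.
import subprocess

def run(cmd):
    """Run a command, echoing it first, and fail loudly if it errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Enable MIG mode on GPU 0 (a GPU reset or reboot may be required).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# List the GPU instance profiles that this GPU supports.
run(["nvidia-smi", "mig", "-lgip"])

# Create a GPU instance using an example profile and its default compute
# instance (-C). Repeat with other profiles to partition the GPU further.
run(["nvidia-smi", "mig", "-i", "0", "-cgi", "1g.10gb", "-C"])

# To reconfigure later, destroy the compute and GPU instances, then recreate.
run(["nvidia-smi", "mig", "-i", "0", "-dci"])
run(["nvidia-smi", "mig", "-i", "0", "-dgi"])
```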
NVIDIA GPU virtualization software products are optimized for different classes of
workload. For details on how to best configure an accelerated virtualized infrastructure,
refer to the sizing guidelines for these GPU virtualization software products:
> NVIDIA vPC Windows 10 Profile Sizing Guidance
> NVIDIA RTX Virtual Workstation Application Sizing Guide
> NVIDIA AI Enterprise Sizing Guide
Although this technical brief provides general guidance on how to select the right
NVIDIA GPU and virtualization software for your workloads, actual results may vary
depending on the specific workloads that are being virtualized. To balance virtual
machine density (scalability) with required performance, conduct a proof of concept
(POC) with your production workloads. To allow the configuration to be optimized to
meet the requirements for performance and scale, analyze the utilization of all resources
of the system and gather subjective feedback from all stakeholders.
Other Resources
> Try NVIDIA vGPU for free
> Using NVIDIA Virtual GPUs to Power Mixed Workloads
> NVIDIA Virtual GPU Software Documentation
> NVIDIA vGPU Certified Servers
> NVIDIA LaunchPad: The End-to-End AI Platform
Trademarks
NVIDIA, the NVIDIA logo, NVIDIA CUDA, NVIDIA RTX, NVIDIA Turing, NVIDIA Volta, GPUDirect, NVLink, Quadro RTX, and TensorRT are trademarks and/or
registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective
companies with which they are associated.
MLPerf
The MLPerf name and logo are trademarks of MLCommons Association (“MLCommons”) in the United States and other countries.
OpenCL
OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc.
Copyright
© 2023 NVIDIA Corporation. All rights reserved.
TB-09867-001_v03