Nvidia Profiling Tools Keipert 10 4 22
KRISTOPHER KEIPERT
PROGRAMMING THE NVIDIA PLATFORM
CPU, GPU, and Network
ACCELERATION LIBRARIES
Core, Math, Communication, Data Analytics, AI, Quantum
NVIDIA HPC SDK
Available at developer.nvidia.com/hpc-sdk, on NGC, via Spack, and in the Cloud
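DEVELOPMENT
▪ Programming models: Standard C++ & Fortran, OpenACC & OpenMP, CUDA
▪ Compilers: nvcc, nvc, nvc++, nvfortran
▪ Core libraries: libcu++, Thrust, CUB
▪ Math libraries: cuBLAS, cuTENSOR, cuSPARSE, cuSOLVER, cuFFT, cuRAND
▪ Communication libraries: HPC-X (MPI, UCX, SHMEM, SHARP, HCOLL), NVSHMEM, NCCL
ANALYSIS
▪ Profilers: Nsight Systems, Nsight Compute
▪ Debugger: cuda-gdb (Host, Device)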
Start here: Nsight Systems
▪ Comprehensive system-level performance
▪ Re-check overall performance
NSIGHT SYSTEMS
System Profiler
Key Features:
▪ System-wide application algorithm tuning
▪ Multi-process tree support
▪ Locate optimization opportunities
  ▪ Visualize millions of events on a very fast GUI timeline
  ▪ Find gaps of unused CPU and GPU time
▪ Balance your workload across multiple CPUs and GPUs
  ▪ CPU algorithms, utilization and thread state
  ▪ GPU streams, kernels, memory transfers, etc.
▪ Command Line, Standalone, IDE Integration
OS: Linux (x86, Power, Arm SBSA, Tegra), Windows, MacOSX (host)
GPUs: Pascal+
Docs/product: https://developer.nvidia.com/nsight-systems
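As a minimal sketch of the command-line mode, the helper below assembles an `nsys profile` invocation. The binary name `./jacobi`, the report name `report`, and the chosen trace list are assumptions for illustration, not values taken from the slides; the flags themselves (`--trace`, `--output`, `--force-overwrite`) are standard `nsys profile` options.

```python
import os
import shlex
import shutil
import subprocess


def nsys_profile_cmd(app_argv, output="report", traces=("cuda", "nvtx", "osrt")):
    """Build an `nsys profile` command line for the given application argv."""
    return [
        "nsys", "profile",
        "--trace=" + ",".join(traces),   # APIs to trace on the timeline
        "--output", output,              # base name of the generated report
        "--force-overwrite", "true",     # replace an existing report of the same name
        *app_argv,
    ]


def profile(app_argv, **kwargs):
    """Run the profiler only if nsys and the target binary both exist."""
    cmd = nsys_profile_cmd(app_argv, **kwargs)
    if shutil.which("nsys") is None or not os.path.exists(app_argv[0]):
        return None  # nothing to run on this machine
    return subprocess.run(cmd, check=True)


# "./jacobi" is a placeholder binary name (assumption).
cmd = nsys_profile_cmd(["./jacobi"])
print(shlex.join(cmd))
```

The resulting report can then be opened in the Nsight Systems GUI for timeline inspection.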
Timeline view callouts: processes and threads, thread/core migration, thread state, CUDA and OpenGL API trace, cuDNN and cuBLAS trace, multi-GPU
ZOOM/FILTER TO EXACT AREAS OF INTEREST
NVTX: NVIDIA TOOLS EXTENSIONS
Code Annotation API
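One way to annotate host code is through the `nvtx` Python package, whose `annotate` object works as both a decorator and a context manager. The sketch below is illustrative only: the function names `relax` and `solve` and the toy 1D sweep are hypothetical, and a no-op stand-in is defined so the script still runs when the `nvtx` package is not installed.

```python
import contextlib

try:
    import nvtx  # NVTX Python bindings, if installed
    annotate = nvtx.annotate
except ImportError:
    class annotate(contextlib.ContextDecorator):
        """No-op stand-in so the script runs without the nvtx package."""

        def __init__(self, message=None, color=None):
            self.message = message

        def __enter__(self):
            return self

        def __exit__(self, *exc):
            return False


@annotate("relax", color="green")
def relax(grid):
    """One Jacobi-style sweep over a 1D list; boundary values are kept fixed."""
    interior = [0.5 * (grid[i - 1] + grid[i + 1]) for i in range(1, len(grid) - 1)]
    return [grid[0]] + interior + [grid[-1]]


def solve(grid, sweeps):
    """Wrap the whole solve in one range; each sweep shows up as a nested range."""
    with annotate("solve", color="blue"):
        for _ in range(sweeps):
            grid = relax(grid)
    return grid


result = solve([0.0, 0.0, 0.0, 1.0], sweeps=1)
print(result)
```

When the script runs under Nsight Systems with NVTX tracing enabled, the named ranges appear as labeled bars on the timeline.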
EXPERT SYSTEMS & STATISTICS
Built-in Data Analytics with Advice
MULTI-REPORT TILING
Visualize More Parallel Activity
▪ Open multiple reports
▪ Loaded on the same timeline, based on wall-clock time
APPLICATION PROFILES WITH NSIGHT SYSTEMS
• PyTorch
o DNN layer annotations are disabled by default
o Enable with "with torch.autograd.profiler.emit_nvtx():"
o Annotate manually with torch.cuda.nvtx.range_push / range_pop
o TensorRT backend is already annotated
• TensorFlow
o Annotated by default with NVTX in NVIDIA TF containers
o Set TF_DISABLE_NVTX_RANGES=1 to disable for production
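The PyTorch options above can be sketched as follows. This is a hedged illustration rather than a recommended training loop: `nvtx_range`, `train_step`, and `run` are hypothetical helper names, the forward and backward bodies are placeholders, and the NVTX calls are skipped when a CUDA-enabled PyTorch is not available.

```python
import contextlib

try:
    import torch
    HAVE_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    HAVE_CUDA = False


@contextlib.contextmanager
def nvtx_range(name):
    """Push/pop a named NVTX range; a no-op without CUDA-enabled PyTorch."""
    if HAVE_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if HAVE_CUDA:
            torch.cuda.nvtx.range_pop()


def train_step(step):
    """Manual annotation of one step and its phases."""
    with nvtx_range(f"step_{step}"):
        with nvtx_range("forward"):
            loss = float(step)  # placeholder for the model call and loss
        with nvtx_range("backward"):
            pass  # placeholder for loss.backward() and optimizer.step()
    return loss


def run(steps=3):
    """Enable per-operator NVTX ranges for the duration of the loop."""
    if HAVE_CUDA:
        layer_ranges = torch.autograd.profiler.emit_nvtx()
    else:
        layer_ranges = contextlib.nullcontext()
    with layer_ranges:
        return [train_step(s) for s in range(steps)]


losses = run()
print(losses)
```

Manual ranges give coarse, readable phases on the timeline, while `emit_nvtx()` adds fine-grained operator ranges at some extra overhead.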
NSIGHT COMPUTE
Kernel Profiling Tool
Key Features:
▪ Interactive CUDA API debugging and kernel profiling
▪ Built-in rules expertise
▪ Fully customizable data collection and display
▪ Command Line, Standalone, IDE Integration, Remote Targets
Docs/product: https://developer.nvidia.com/nsight-compute
▪ Targeted metric sections
▪ Customizable data collection and presentation
▪ Source metrics per instruction
▪ Metric heatmap to quickly identify hotspots
OCCUPANCY CALCULATOR
Model Hardware Usage and Identify Limiters
▪ Model theoretical hardware usage
▪ Understand limitations from hardware vs. kernel parameters
▪ Configure model to vary HW and kernel parameters
▪ Opened from an existing report or as a new activity
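The idea behind the calculator can be sketched with a simplified model: count how many blocks fit on one SM under each resource limit, take the smallest, and compare the resulting active warps with the hardware maximum. The per-SM limits below are assumed, illustrative values, and the model ignores register and shared-memory allocation granularity, so its numbers can differ from what Nsight Compute reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (assumed values for illustration only)."""
    max_warps: int = 64
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 164 * 1024
    warp_size: int = 32


def theoretical_occupancy(threads_per_block, regs_per_thread, smem_per_block,
                          hw=SMLimits()):
    """Return (occupancy, limiter) for one kernel launch configuration."""
    warps_per_block = -(-threads_per_block // hw.warp_size)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * hw.warp_size
    # Maximum resident blocks per SM allowed by each resource.
    blocks_by = {
        "warps": hw.max_warps // warps_per_block,
        "blocks": hw.max_blocks,
        "registers": hw.registers // regs_per_block,
        "shared memory": (hw.shared_mem_bytes // smem_per_block
                          if smem_per_block else hw.max_blocks),
    }
    limiter = min(blocks_by, key=blocks_by.get)  # first minimum wins on ties
    active_warps = blocks_by[limiter] * warps_per_block
    return active_warps / hw.max_warps, limiter


# 256 threads per block, 64 registers per thread, no shared memory.
occ, limiter = theoretical_occupancy(256, 64, 0)
print(f"occupancy={occ:.2f}, limited by {limiter}")
```

Varying `regs_per_thread` or the `SMLimits` fields mirrors what the calculator does when you change kernel or hardware parameters.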
HIERARCHICAL ROOFLINE
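A roofline bounds attainable performance by min(peak compute, arithmetic intensity × bandwidth); the hierarchical variant applies that bound once per memory level, using the bytes moved at that level. The sketch below uses made-up peak, bandwidth, and traffic numbers purely for illustration; they do not describe any particular GPU or kernel.

```python
def roofline_bound(flops, bytes_moved, peak_gflops, bandwidth_gbs):
    """Return (arithmetic intensity in FLOP/byte, attainable GFLOP/s)."""
    intensity = flops / bytes_moved
    return intensity, min(peak_gflops, intensity * bandwidth_gbs)


def hierarchical_roofline(flops, bytes_per_level, peak_gflops, bandwidth_per_level):
    """Apply the roofline bound separately for each memory level."""
    return {
        level: roofline_bound(flops, bytes_per_level[level],
                              peak_gflops, bandwidth_per_level[level])
        for level in bytes_per_level
    }


# Hypothetical machine: peak compute in GFLOP/s, bandwidths in GB/s.
PEAK = 10000.0
BANDWIDTH = {"L1": 16000.0, "L2": 4000.0, "HBM": 1500.0}

# Hypothetical kernel: total FLOPs and bytes moved at each level.
FLOPS = 8e9
BYTES = {"L1": 16e9, "L2": 8e9, "HBM": 4e9}

bounds = hierarchical_roofline(FLOPS, BYTES, PEAK, BANDWIDTH)
tightest = min(bounds, key=lambda level: bounds[level][1])
for level, (ai, gflops) in bounds.items():
    print(f"{level}: AI={ai:.2f} FLOP/byte, bound={gflops:.0f} GFLOP/s")
print("tightest bound:", tightest)
```

The level with the lowest bound indicates which part of the memory hierarchy is most likely limiting the kernel.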
• (Without the -k option, Nsight Compute will profile every kernel, which can take a long time.)
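To illustrate restricting collection to a single kernel, the helper below assembles an `ncu` command line. The kernel name `jacobi_kernel`, the binary `./jacobi`, and the report name are placeholders (assumptions); the options used (`--set`, `--export`, `--force-overwrite`, `--kernel-name`, `--launch-count`) are standard Nsight Compute CLI flags.

```python
import shlex


def ncu_cmd(app_argv, kernel=None, launch_count=None,
            output="profile", metric_set="full"):
    """Build an Nsight Compute command line, optionally limited to one kernel."""
    cmd = ["ncu", "--set", metric_set, "--export", output, "--force-overwrite"]
    if kernel is not None:
        cmd += ["--kernel-name", kernel]            # long form of -k
    if launch_count is not None:
        cmd += ["--launch-count", str(launch_count)]  # stop after N profiled launches
    return cmd + list(app_argv)


# Placeholder kernel and binary names (assumptions, not from the slides).
filtered = ncu_cmd(["./jacobi"], kernel="jacobi_kernel", launch_count=1)
unfiltered = ncu_cmd(["./jacobi"])
print(shlex.join(filtered))
print(shlex.join(unfiltered))
```

The filtered form profiles one launch of one kernel, while the unfiltered form collects the full metric set for every kernel launch in the run.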
CUDA-GDB
Command-Line and IDE Back-End Debugger
▪ https://github.com/NVIDIA/compute-sanitizer-samples
NSIGHT VISUAL STUDIO CODE EDITION
Session status
CUDA focus
https://developer.nvidia.com/nsight-visual-studio-code-edition
ADDITIONAL RESOURCES
▪ Sessions
  ▪ A41100 - CUDA: New Features and Beyond
  ▪ A41131 - Developing Efficient CUDA Kernels for Fourth-Generation Tensor Cores
▪ Labs
  ▪ DLIT41277 - Optimizing CUDA Machine Learning Codes with Nsight Profiling Tools
  ▪ DLIT41274 - Debugging and Analyzing Correctness of CUDA Applications
  ▪ DLIT41276 - Developer Tools Fundamentals for Ray Tracing using NVIDIA Nsight Graphics and NVIDIA Nsight Systems
▪ Ampere Architecture Detailed Blog
  ▪ NVIDIA Ampere Architecture In-Depth
▪ Developer Tools are free and packaged in the latest version of the CUDA Toolkit
  ▪ https://developer.nvidia.com/cuda-downloads
▪ Support is available via:
  ▪ https://forums.developer.nvidia.com/c/development-tools/
▪ More information at:
  ▪ https://developer.nvidia.com/tools-overview
HANDS-ON
/lus/eagle/projects/SDL_Workshop/jacobi