When it comes to developing and deploying advanced AI models, access to scalable, efficient GPU infrastructure is critical. But managing this infrastructure across cloud-native, containerized environments can be complex and costly. That’s where NVIDIA Run:ai can help. NVIDIA Run:ai is now generally available on AWS Marketplace, making it even easier for organizations to streamline their AI infrastructure management.
Built for Kubernetes-native environments, NVIDIA Run:ai acts as a control plane for GPU infrastructure, removing complexity and enabling organizations to scale AI workloads with speed, efficiency, and proper governance.
This post dives into how NVIDIA Run:ai orchestrates AI workloads and GPUs on Amazon Web Services (AWS), integrating with NVIDIA GPU-accelerated Amazon EC2 instances, Amazon Elastic Kubernetes Service (EKS), Amazon SageMaker HyperPod, AWS Identity and Access Management (IAM), Amazon CloudWatch, and other AWS-native services.
The challenge: efficient GPU orchestration at scale
Modern AI workloads—from large-scale training to real-time inference—require dynamic access to powerful GPUs. But in Kubernetes environments, native support for GPUs is limited. Common challenges include:
- Inefficient GPU utilization due to static allocation
- Lack of workload prioritization and preemption
- Limited visibility into GPU consumption
- Difficulty enforcing governance across teams and workloads
The NVIDIA Run:ai solution
NVIDIA Run:ai addresses these challenges with a Kubernetes-based AI orchestration platform designed specifically for AI/ML workloads. It introduces a virtual GPU pool, enabling dynamic, policy-based scheduling of GPU resources.
Key capabilities:
- Fractional GPU allocation: Share a single GPU across multiple inference jobs or Jupyter notebooks.
- Dynamic scheduling: Allocate full or fractional GPUs based on job priority, queueing, and availability.
- Workload-aware orchestration: Treat training, tuning, and inference differently, with policies optimized for each phase.
- Team-based quotas and isolation: Guarantee resources for teams or projects using fairshare or hard quotas.
- Multi-tenant governance: Ensure cost visibility and compliance in shared infrastructure environments.
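To make fractional allocation concrete, here is a minimal sketch of a pod manifest that asks for half a GPU. The annotation key (`gpu-fraction`) and scheduler name (`runai-scheduler`) follow NVIDIA Run:ai conventions, but you should verify both against the documentation for your installed Run:ai version.

```python
# Sketch: build a Kubernetes pod manifest requesting a fraction of one GPU
# via NVIDIA Run:ai. Annotation key and scheduler name are assumptions to
# check against your Run:ai version's documentation.

def fractional_gpu_pod(name: str, image: str, fraction: float) -> dict:
    """Build a pod manifest that asks the Run:ai scheduler for a
    fraction of a single GPU instead of a whole device."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Run:ai reads the requested GPU fraction from this annotation
            "annotations": {"gpu-fraction": str(fraction)},
        },
        "spec": {
            # Hand the pod to the Run:ai scheduler rather than the default one
            "schedulerName": "runai-scheduler",
            "containers": [{"name": name, "image": image}],
        },
    }

pod = fractional_gpu_pod("notebook", "jupyter/base-notebook", 0.5)
print(pod["metadata"]["annotations"]["gpu-fraction"])  # → 0.5
```

The same manifest shape works for any interactive workload, such as a Jupyter notebook that only needs part of a GPU.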

How NVIDIA Run:ai works on AWS
NVIDIA Run:ai integrates seamlessly with NVIDIA-powered AWS services to optimize performance and simplify operations:
1. GPU-accelerated Amazon EC2 instances in Kubernetes clusters (NVIDIA A10G, A100, H100, and others)
NVIDIA Run:ai schedules AI workloads on Kubernetes clusters deployed on EC2 instances with NVIDIA GPUs, maximizing GPU utilization through intelligent sharing and bin packing.
- Supports multi-GPU and multi-node training
- Enables time-slicing and GPU overcommit for interactive workloads
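The bin-packing idea behind GPU consolidation can be illustrated with a classic first-fit-decreasing heuristic. This is a simplified sketch, not Run:ai's actual scheduling algorithm:

```python
# Illustrative sketch (not Run:ai's actual algorithm): first-fit-decreasing
# bin packing, the textbook heuristic for consolidating fractional GPU
# requests onto as few physical GPUs as possible.

def pack_jobs(requests: list[float], gpu_capacity: float = 1.0) -> list[list[float]]:
    """Place each fractional-GPU request on the first GPU with room,
    considering larger requests first; returns one list per GPU used."""
    gpus: list[list[float]] = []
    for req in sorted(requests, reverse=True):
        for gpu in gpus:
            if sum(gpu) + req <= gpu_capacity:
                gpu.append(req)  # reuse a partially filled GPU
                break
        else:
            gpus.append([req])  # open a new GPU
    return gpus

# Six fractional jobs fit on three GPUs instead of six statically allocated ones.
print(len(pack_jobs([0.5, 0.25, 0.5, 0.25, 0.75, 0.5])))  # → 3
```

With static one-job-per-GPU allocation, the same six jobs would hold six devices; packing cuts that in half.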
2. Amazon EKS (Elastic Kubernetes Service)
NVIDIA Run:ai integrates natively with Amazon EKS, providing a robust scheduling and orchestration layer that’s purpose-built for AI workloads. It maximizes the utilization of GPU resources in Kubernetes clusters.
- Native integration of the NVIDIA Run:ai Scheduler with EKS
- Orchestrates and optimizes AI workloads using advanced GPU resource management for workloads on EKS
- Compatible with the NVIDIA GPU Operator, which automates the provisioning of GPU drivers, monitoring agents, and libraries across EKS nodes
3. Amazon SageMaker HyperPod
NVIDIA Run:ai integrates with Amazon SageMaker HyperPod to extend AI infrastructure seamlessly across on-premises and public or private cloud environments.
- Improves efficiency and flexibility when combined with NVIDIA Run:ai’s advanced AI workload and GPU orchestration platform
- Purpose-built for large-scale distributed training and inference
Integrating with Amazon CloudWatch
Monitoring GPU workloads at scale requires real-time observability. NVIDIA Run:ai can be integrated with Amazon CloudWatch to provide:
- Custom metrics: Push GPU-level usage metrics (such as memory utilization and time-slicing stats) to CloudWatch.
- Dashboards: Visualize GPU consumption per job, team, or project.
- Alarms: Trigger alerts based on underutilization, job failures, or quota breaches.
By combining NVIDIA Run:ai’s rich workload telemetry with CloudWatch’s analytics and alerting, users gain actionable insights into resource consumption and efficiency.
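A custom-metric push might look like the following sketch. The boto3 `put_metric_data` call is a real CloudWatch API; the namespace, metric names, and dimensions here are illustrative choices, not a documented Run:ai export format.

```python
# Sketch: shape per-job GPU telemetry as CloudWatch custom metrics.
# Namespace, metric names, and dimensions are illustrative assumptions;
# boto3's put_metric_data is the real CloudWatch publishing API.

def build_gpu_metrics(job: str, team: str, gpu_util: float, mem_util: float) -> list[dict]:
    """Shape GPU telemetry as CloudWatch MetricData entries keyed by job and team."""
    dims = [{"Name": "Job", "Value": job}, {"Name": "Team", "Value": team}]
    return [
        {"MetricName": "GPUUtilization", "Dimensions": dims, "Value": gpu_util, "Unit": "Percent"},
        {"MetricName": "GPUMemoryUtilization", "Dimensions": dims, "Value": mem_util, "Unit": "Percent"},
    ]

def push_gpu_metrics(cloudwatch_client, job: str, team: str, gpu_util: float, mem_util: float) -> None:
    """Publish the metrics; pass boto3.client("cloudwatch").
    Requires credentials allowing cloudwatch:PutMetricData."""
    cloudwatch_client.put_metric_data(
        Namespace="RunAI/GPU",  # illustrative custom namespace
        MetricData=build_gpu_metrics(job, team, gpu_util, mem_util),
    )
```

Usage would be `push_gpu_metrics(boto3.client("cloudwatch"), "bert-train", "nlp", 87.5, 62.0)`; the published metrics can then drive the dashboards and alarms listed above.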
Integrating with AWS IAM
Security and governance are foundational for AI infrastructure. NVIDIA Run:ai integrates with AWS IAM to:
- Manage secure access to AWS resources
- Enforce least-privilege access controls at the API, resource, and namespace levels within NVIDIA Run:ai
- Support auditing of access logs and API interactions for compliance and security
IAM integration ensures that only authorized users and services can access or manage NVIDIA Run:ai resources within your AWS environment.
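As a concrete example of least-privilege scoping, the sketch below builds an IAM policy that allows publishing custom CloudWatch metrics only into a single namespace. The `cloudwatch:namespace` condition key is standard IAM; the namespace value is an illustrative choice.

```python
# Sketch: a least-privilege IAM policy document that permits publishing
# CloudWatch custom metrics only to one namespace. The condition key
# cloudwatch:namespace is standard IAM; the namespace value is illustrative.

import json

def metrics_writer_policy(namespace: str) -> str:
    """Return a policy JSON allowing PutMetricData only for one namespace.
    PutMetricData has no resource-level scoping, so the restriction is
    expressed as a condition rather than a Resource ARN."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {"StringEquals": {"cloudwatch:namespace": namespace}},
        }],
    })

print(metrics_writer_policy("RunAI/GPU"))
```

Attaching a policy like this to the role that publishes GPU telemetry keeps that role from writing metrics anywhere else in the account.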
Example: multi-team GPU orchestration on EKS
Imagine an enterprise AI platform with three teams: natural language processing (NLP), computer vision, and generative AI. Each team needs guaranteed GPU access for training, while also running inference jobs on shared infrastructure.
With NVIDIA Run:ai:
- Each team receives a guaranteed quota and namespace with its own fairshare policy.
- Training jobs are queued and scheduled dynamically based on priority and available capacity.
- Interactive jobs use fractional GPUs, maximizing the return on scarce GPU resources.
- All usage is monitored in CloudWatch, and access is controlled via IAM roles.
This model allows AI teams to move faster without stepping on each other’s toes—or burning the budget on underutilized GPUs.
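The quota-plus-priority admission logic in this scenario can be modeled in a few lines. This is a deliberately simplified sketch, not Run:ai's scheduler: jobs are admitted in priority order as long as their team stays within its guaranteed quota.

```python
# Simplified model (not Run:ai's actual scheduler) of the multi-team
# setup above: each team has a guaranteed GPU quota, and queued jobs
# are admitted highest-priority-first within those quotas.

import heapq

def admit_jobs(jobs, quotas):
    """jobs: (priority, team, gpus) tuples, higher priority first.
    Returns the jobs admitted without any team exceeding its quota."""
    used = {team: 0 for team in quotas}
    # Negate priority for a max-heap; the index breaks ties deterministically.
    heap = [(-prio, i, team, gpus) for i, (prio, team, gpus) in enumerate(jobs)]
    heapq.heapify(heap)
    admitted = []
    while heap:
        neg_prio, _, team, gpus = heapq.heappop(heap)
        if used[team] + gpus <= quotas[team]:
            used[team] += gpus
            admitted.append((-neg_prio, team, gpus))
    return admitted

quotas = {"nlp": 4, "vision": 4, "genai": 8}
jobs = [(10, "nlp", 2), (5, "nlp", 4), (8, "genai", 8), (7, "vision", 2)]
print(admit_jobs(jobs, quotas))
# → [(10, 'nlp', 2), (8, 'genai', 8), (7, 'vision', 2)]
```

The second NLP job is queued rather than admitted because it would push the team past its 4-GPU quota; a fairshare scheduler like Run:ai's could still run it opportunistically on idle capacity and preempt it later.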

Get started
As enterprises scale their AI efforts, managing GPU infrastructure manually becomes unsustainable. NVIDIA Run:ai, in combination with NVIDIA technologies on AWS, offers a powerful orchestration layer that simplifies GPU management, boosts utilization, and accelerates AI innovation.
With native integration into EKS, EC2, IAM, SageMaker HyperPod, and CloudWatch, NVIDIA Run:ai provides a unified, enterprise-ready foundation for AI/ML workloads in the cloud.
To learn more or deploy NVIDIA Run:ai on your AWS environment, visit the NVIDIA Run:ai listing on AWS Marketplace or explore the NVIDIA Run:ai documentation.