When it comes to developing and deploying advanced AI models, access to scalable, efficient GPU infrastructure is critical. But managing this infrastructure across cloud-native, containerized environments can be complex and costly. That’s where NVIDIA Run:ai can help. NVIDIA Run:ai is now generally available on AWS Marketplace, making it even easier for organizations to streamline their AI infrastructure management.
Built for Kubernetes-native environments, NVIDIA Run:ai acts as a control plane for GPU infrastructure, removing complexity and enabling organizations to scale AI workloads with speed, efficiency, and proper governance.
This post dives into how NVIDIA Run:ai orchestrates AI workloads and GPUs on Amazon Web Services (AWS), integrating with NVIDIA GPU-accelerated Amazon EC2 instances, Amazon Elastic Kubernetes Service (EKS), Amazon SageMaker HyperPod, AWS Identity and Access Management (IAM), Amazon CloudWatch, and other AWS-native services.
The challenge: efficient GPU orchestration at scale
Modern AI workloads—from large-scale training to real-time inference—require dynamic access to powerful GPUs. But in Kubernetes environments, native support for GPUs is limited. Common challenges include:
- Inefficient GPU utilization due to static allocation
- Lack of workload prioritization and preemption
- Limited visibility into GPU consumption
- Difficulty enforcing governance across teams and workloads
The NVIDIA Run:ai solution
NVIDIA Run:ai addresses these challenges with a Kubernetes-based AI orchestration platform designed specifically for AI/ML workloads. It introduces a virtual GPU pool, enabling dynamic, policy-based scheduling of GPU resources.
Key capabilities:
- Fractional GPU allocation: Share a single GPU across multiple inference jobs or Jupyter notebooks.
- Dynamic scheduling: Allocate full or fractional GPUs based on job priority, queueing, and availability.
- Workload-aware orchestration: Treat training, tuning, and inference differently, with policies optimized for each phase.
- Team-based quotas and isolation: Guarantee resources for teams or projects using fairshare or hard quotas.
- Multi-tenant governance: Ensure cost visibility and compliance in shared infrastructure environments.
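To make fractional allocation concrete, here is a minimal sketch of a pod manifest that asks for half a GPU. The annotation key (`gpu-fraction`) and scheduler name (`runai-scheduler`) follow NVIDIA Run:ai conventions, but you should verify both against the documentation for your installed Run:ai version.

```python
# Sketch: build a Kubernetes pod manifest requesting a fraction of one GPU
# via NVIDIA Run:ai. Annotation key and scheduler name are assumptions to
# check against your Run:ai version's documentation.

def fractional_gpu_pod(name: str, image: str, fraction: float) -> dict:
    """Build a pod manifest that asks the Run:ai scheduler for a
    fraction of a single GPU instead of a whole device."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Run:ai reads the requested GPU fraction from this annotation
            "annotations": {"gpu-fraction": str(fraction)},
        },
        "spec": {
            # Hand the pod to the Run:ai scheduler rather than the default one
            "schedulerName": "runai-scheduler",
            "containers": [{"name": name, "image": image}],
        },
    }

pod = fractional_gpu_pod("notebook", "jupyter/base-notebook", 0.5)
print(pod["metadata"]["annotations"]["gpu-fraction"])  # → 0.5
```

The same manifest shape works for any interactive workload, such as a Jupyter notebook that only needs part of a GPU.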

How NVIDIA Run:ai works on AWS
NVIDIA Run:ai integrates seamlessly with NVIDIA-powered AWS services to optimize performance and simplify operations:
1. GPU-accelerated Amazon EC2 instances in Kubernetes clusters (NVIDIA A10G, A100, H100, and others)
NVIDIA Run:ai schedules AI workloads on Kubernetes clusters deployed on EC2 instances with NVIDIA GPUs, maximizing GPU utilization through intelligent sharing and bin packing.
- Supports multi-GPU and multi-node training
- Enables time-slicing and GPU overcommit for interactive workloads
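The bin-packing idea behind GPU consolidation can be illustrated with a classic first-fit-decreasing heuristic. This is a simplified sketch, not Run:ai's actual scheduling algorithm:

```python
# Illustrative sketch (not Run:ai's actual algorithm): first-fit-decreasing
# bin packing, the textbook heuristic for consolidating fractional GPU
# requests onto as few physical GPUs as possible.

def pack_jobs(requests: list[float], gpu_capacity: float = 1.0) -> list[list[float]]:
    """Place each fractional-GPU request on the first GPU with room,
    considering larger requests first; returns one list per GPU used."""
    gpus: list[list[float]] = []
    for req in sorted(requests, reverse=True):
        for gpu in gpus:
            if sum(gpu) + req <= gpu_capacity:
                gpu.append(req)  # reuse a partially filled GPU
                break
        else:
            gpus.append([req])  # open a new GPU
    return gpus

# Six fractional jobs fit on three GPUs instead of six statically allocated ones.
print(len(pack_jobs([0.5, 0.25, 0.5, 0.25, 0.75, 0.5])))  # → 3
```

With static one-job-per-GPU allocation, the same six jobs would hold six devices; packing cuts that in half.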
2. Amazon EKS (Elastic Kubernetes Service)
NVIDIA Run:ai integrates natively with Amazon EKS, providing a robust scheduling and orchestration layer that’s purpose-built for AI workloads. It maximizes the utilization of GPU resources in Kubernetes clusters.
- Native integration of the NVIDIA Run:ai Scheduler with EKS
- Orchestrates and optimizes AI workloads using advanced GPU resource management for workloads on EKS
- Compatible with the NVIDIA GPU Operator, which automates the provisioning of GPU drivers, monitoring agents, and libraries across EKS nodes
3. Amazon SageMaker HyperPod
NVIDIA Run:ai integrates with Amazon SageMaker HyperPod to extend AI infrastructure seamlessly across on-premises and public or private cloud environments.
- Improves efficiency and flexibility when combined with NVIDIA Run:ai’s advanced AI workload and GPU orchestration platform
- Purpose-built for large-scale distributed training and inference
Integrating with Amazon CloudWatch
Monitoring GPU workloads at scale requires real-time observability. NVIDIA Run:ai can be integrated with Amazon CloudWatch to provide:
- Custom metrics: Push GPU-level usage metrics (such as memory utilization and time-slicing stats) to CloudWatch.
- Dashboards: Visualize GPU consumption per job, team, or project.
- Alarms: Trigger alerts based on underutilization, job failures, or quota breaches.
By combining NVIDIA Run:ai’s rich workload telemetry with CloudWatch’s analytics and alerting, users gain actionable insights into resource consumption and efficiency.
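A custom-metric push might look like the following sketch. The boto3 `put_metric_data` call is a real CloudWatch API; the namespace, metric names, and dimensions here are illustrative choices, not a documented Run:ai export format.

```python
# Sketch: shape per-job GPU telemetry as CloudWatch custom metrics.
# Namespace, metric names, and dimensions are illustrative assumptions;
# boto3's put_metric_data is the real CloudWatch publishing API.

def build_gpu_metrics(job: str, team: str, gpu_util: float, mem_util: float) -> list[dict]:
    """Shape GPU telemetry as CloudWatch MetricData entries keyed by job and team."""
    dims = [{"Name": "Job", "Value": job}, {"Name": "Team", "Value": team}]
    return [
        {"MetricName": "GPUUtilization", "Dimensions": dims, "Value": gpu_util, "Unit": "Percent"},
        {"MetricName": "GPUMemoryUtilization", "Dimensions": dims, "Value": mem_util, "Unit": "Percent"},
    ]

def push_gpu_metrics(cloudwatch_client, job: str, team: str, gpu_util: float, mem_util: float) -> None:
    """Publish the metrics; pass boto3.client("cloudwatch").
    Requires credentials allowing cloudwatch:PutMetricData."""
    cloudwatch_client.put_metric_data(
        Namespace="RunAI/GPU",  # illustrative custom namespace
        MetricData=build_gpu_metrics(job, team, gpu_util, mem_util),
    )
```

Usage would be `push_gpu_metrics(boto3.client("cloudwatch"), "bert-train", "nlp", 87.5, 62.0)`; the published metrics can then drive the dashboards and alarms listed above.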
Integrating with AWS IAM
Security and governance are foundational for AI infrastructure. NVIDIA Run:ai integrates with AWS IAM to:
- Manage secure access to AWS resources
- Enforce least-privilege access controls at the API, resource, and namespace levels within NVIDIA Run:ai
- Support auditing of access logs and API interactions for compliance and security
IAM integration ensures that only authorized users and services can access or manage NVIDIA Run:ai resources within your AWS environment.
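As a concrete example of least-privilege scoping, the sketch below builds an IAM policy that allows publishing custom CloudWatch metrics only into a single namespace. The `cloudwatch:namespace` condition key is standard IAM; the namespace value is an illustrative choice.

```python
# Sketch: a least-privilege IAM policy document that permits publishing
# CloudWatch custom metrics only to one namespace. The condition key
# cloudwatch:namespace is standard IAM; the namespace value is illustrative.

import json

def metrics_writer_policy(namespace: str) -> str:
    """Return a policy JSON allowing PutMetricData only for one namespace.
    PutMetricData has no resource-level scoping, so the restriction is
    expressed as a condition rather than a Resource ARN."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "cloudwatch:PutMetricData",
            "Resource": "*",
            "Condition": {"StringEquals": {"cloudwatch:namespace": namespace}},
        }],
    })

print(metrics_writer_policy("RunAI/GPU"))
```

Attaching a policy like this to the role that publishes GPU telemetry keeps that role from writing metrics anywhere else in the account.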
Example: multi-team GPU orchestration on EKS
Imagine an enterprise AI platform with three teams: natural language processing (NLP), computer vision, and generative AI. Each team needs guaranteed GPU access for training, while also running inference jobs on shared infrastructure.
With NVIDIA Run:ai:
- Each team receives a guaranteed quota and namespace with its own fairshare policy.
- Training jobs are queued and scheduled dynamically based on priority and available capacity.
- Interactive jobs use fractional GPUs, maximizing the return on scarce GPU resources.
- All usage is monitored in CloudWatch, and access is controlled via IAM roles.
This model allows AI teams to move faster without stepping on each other’s toes—or burning the budget on underutilized GPUs.
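The quota-plus-priority admission logic in this scenario can be modeled in a few lines. This is a deliberately simplified sketch, not Run:ai's scheduler: jobs are admitted in priority order as long as their team stays within its guaranteed quota.

```python
# Simplified model (not Run:ai's actual scheduler) of the multi-team
# setup above: each team has a guaranteed GPU quota, and queued jobs
# are admitted highest-priority-first within those quotas.

import heapq

def admit_jobs(jobs, quotas):
    """jobs: (priority, team, gpus) tuples, higher priority first.
    Returns the jobs admitted without any team exceeding its quota."""
    used = {team: 0 for team in quotas}
    # Negate priority for a max-heap; the index breaks ties deterministically.
    heap = [(-prio, i, team, gpus) for i, (prio, team, gpus) in enumerate(jobs)]
    heapq.heapify(heap)
    admitted = []
    while heap:
        neg_prio, _, team, gpus = heapq.heappop(heap)
        if used[team] + gpus <= quotas[team]:
            used[team] += gpus
            admitted.append((-neg_prio, team, gpus))
    return admitted

quotas = {"nlp": 4, "vision": 4, "genai": 8}
jobs = [(10, "nlp", 2), (5, "nlp", 4), (8, "genai", 8), (7, "vision", 2)]
print(admit_jobs(jobs, quotas))
# → [(10, 'nlp', 2), (8, 'genai', 8), (7, 'vision', 2)]
```

The second NLP job is queued rather than admitted because it would push the team past its 4-GPU quota; a fairshare scheduler like Run:ai's could still run it opportunistically on idle capacity and preempt it later.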

Get started
As enterprises scale their AI efforts, managing GPU infrastructure manually becomes unsustainable. NVIDIA Run:ai, in combination with NVIDIA technologies on AWS, offers a powerful orchestration layer that simplifies GPU management, boosts utilization, and accelerates AI innovation.
With native integration into EKS, EC2, IAM, SageMaker HyperPod, and CloudWatch, NVIDIA Run:ai provides a unified, enterprise-ready foundation for AI/ML workloads in the cloud.
To learn more or deploy NVIDIA Run:ai on your AWS environment, visit the NVIDIA Run:ai listing on AWS Marketplace or explore the NVIDIA Run:ai documentation.