CMP372-P

Using AWS ParallelCluster to simplify HPC cluster management
Nathan Stornetta
Senior Product Manager, HPC
Amazon Web Services

Introduction to AWS ParallelCluster

Demo: Running CFD using AWS ParallelCluster

Scaling up fast with AWS ParallelCluster

AWS ParallelCluster best practices

Related breakouts
CMP402-R: Setting up and optimizing your HPC cluster on AWS
CMP408-R: Using Elastic Fabric Adapter to scale HPC workloads on AWS
CMP409-R: Selecting the right instance for your HPC workloads
CMP418-R: Using AWS ParallelCluster to simplify cluster management
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
We think the metric for success for any business should be time-to-results

“For every $1 spent on HPC, businesses see $463 in incremental revenues and $44 in incremental profit.”

[Figure: two panels, each with a "Cores" axis; the first panel is capped by a fixed datacenter capacity limit.]

• Finite capacity, usually with long queues to wait in
• Massive capacity when needed to speed up time to results, and agile environment when additional hardware and software experimentation is needed
Because a TCO analysis never tells the whole story
Lost productivity & longer time to results

72.8% of organizations that use HPC reported delayed or cancelled HPC jobs*

• Lost innovation: Questions are left unasked, experiments are left undone, and potential revenue is left on the table.
• Outdated technology: Almost 20% of the useful life of new technology/hardware is lost in the procurement process.
• Technical debt: Adapting newer algorithms to meet the requirements of an existing infrastructure means delays and below-par performance.
AWS services to get started with HPC on AWS
• Data management & data transfer: AWS DataSync, AWS Snowball, AWS Snowmobile, AWS Direct Connect
• Compute & networking: Amazon EC2 instances (CPU, GPU, FPGA), Amazon EC2 Spot, AWS Auto Scaling, placement groups, enhanced networking, Elastic Fabric Adapter
• Storage: Amazon EBS, Amazon FSx for Lustre, Amazon EFS, Amazon S3
• Automation & orchestration: AWS Batch, AWS ParallelCluster, NICE EnginFrame
• Visualization: NICE DCV, Amazon AppStream 2.0
• Supporting services: Amazon CloudWatch, AWS Identity and Access Management (IAM), AWS Budgets
Running HPC applications at extreme scale
Accelerating time to innovation

Single HPC cluster of 1 million vCPUs

“Storage technology is amazingly complex and we’re constantly pushing the limits of physics and engineering to deliver next-generation capacities and technical innovation. This successful collaboration with AWS shows the extreme scale, power and agility of cloud-based HPC to help us run complex simulations for future storage architecture analysis and materials science explorations. Using AWS to easily shrink simulation time from 20 days to 8 hours allows Western Digital R&D teams to explore new designs and innovations at a pace unimaginable just a short time ago.” – Steve Phillpott, CIO, Western Digital
Helping financial institutions
model investment risks
Run risk models
4,000 times faster
In hours, instead of months

Manage 50X the number of securities

• Easy cluster management
• Automatic resource scaling
• Seamless migration to the cloud
Easy cluster management

• `pcluster configure` to set up a cluster in minutes
• Use config files to define details of replicable clusters
• Launch, stop, and restart clusters on demand
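As a rough illustration of the config-file workflow, the sketch below uses Python's standard `configparser` to write a small, repeatable cluster definition. The section and key names are modelled on the ParallelCluster 2.x INI layout, and the instance types, key pair, VPC, and subnet values are placeholders. Treat all of them as assumptions to check against the documentation for the installed version.

```python
import configparser
import io


def build_cluster_config(region, key_name, vpc_id, subnet_id):
    """Return INI text for a small, replicable cluster definition.

    Section and key names are assumptions modelled on the
    ParallelCluster 2.x config layout; verify them before use.
    """
    config = configparser.ConfigParser()
    config["global"] = {"cluster_template": "default"}
    config["aws"] = {"aws_region_name": region}
    config["cluster default"] = {
        "key_name": key_name,
        "base_os": "alinux",
        "scheduler": "slurm",
        "master_instance_type": "c5.xlarge",
        "compute_instance_type": "c5n.18xlarge",
        "initial_queue_size": "0",   # start with no compute nodes
        "max_queue_size": "10",      # upper bound for scaling
        "vpc_settings": "default",
    }
    config["vpc default"] = {
        "vpc_id": vpc_id,
        "master_subnet_id": subnet_id,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    print(build_cluster_config("us-east-1", "my-key",
                               "vpc-PLACEHOLDER", "subnet-PLACEHOLDER"))
```

Keeping the definition in a file (or in a generator like this) is what makes a cluster replicable: the same text can be version-controlled and reused to launch an identical cluster later.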
Automatic resource scaling

• Scale up when jobs are waiting
• Scale down when the cluster is idle
• Your data storage and file system scale to match your compute
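The scale-up and scale-down behavior described above can be pictured with a small decision rule. This is a hypothetical sketch, not the scaling logic ParallelCluster actually runs: the `ClusterState` fields, the `desired_nodes` function, and the 10-minute idle timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    pending_jobs: int     # jobs waiting in the scheduler queue
    running_nodes: int    # compute nodes currently up
    idle_minutes: float   # minutes since any node last ran a job


def desired_nodes(state, max_nodes, nodes_per_job=1,
                  idle_timeout_minutes=10.0, min_nodes=0):
    """Hypothetical rule: grow while jobs wait, shrink once idle."""
    if state.pending_jobs > 0:
        # Scale up: add capacity for the waiting jobs, capped at the maximum.
        wanted = state.running_nodes + state.pending_jobs * nodes_per_job
        return min(wanted, max_nodes)
    if state.idle_minutes >= idle_timeout_minutes:
        # Scale down: the cluster has been idle long enough.
        return min_nodes
    # Nothing waiting and not idle for long: keep the current size.
    return state.running_nodes
```

For example, with three jobs waiting, two nodes running, and a maximum of four, the rule asks for four nodes; with nothing waiting and fifteen idle minutes, it falls back to the minimum.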
Seamless migration to the cloud

• Making HPC workloads cloud-native can take time and planning
• AWS ParallelCluster simplifies first steps to migrate HPC workloads
• Integrations simplify the transition to cloud-native HPC at your own pace
AWS ParallelCluster
ALINUX CENTOS 6/7 UBUNTU 16/18 DCV EFA OPENMPI INTELMPI NCCL

SLURM SGE TORQUE AWS BATCH

FSX EFS S3 EBS RAID

ON-DEMAND SPOT VPC & SUBNETS

AWS ParallelCluster works for a variety of use cases

• Optimizing production workloads
• Fast prototyping
• Bypassing queues

AWS ParallelCluster runs workloads across all industries

• Drug discovery, Genomics
• Risk, Quantitative Research
• Reservoir Modeling, Seismic Imaging
• CAE, CFD
• Weather modeling

AWS ParallelCluster runs workloads with varying compute and throughput characteristics

• Tightly coupled workloads
• Loosely coupled workloads
• Accelerated computing
• Visualization
• AI/ML
• High-volume data analytics

AWS ParallelCluster Architecture
[Architecture diagram. Labels: AWS Region; Availability Zone; VPC subnet; master server (scheduler, queue, NFS share); compute nodes; user (SSH); client; AWS CloudFormation template; Amazon EC2; AWS Auto Scaling.]

[Diagram. Labels: Amazon EC2 C5n instances + EFA; case data; compute suite; Amazon S3 bucket; Amazon Elastic Block Store (EBS).]
A story of an HPC research group
AWS ParallelCluster can scale in minutes to thousands of cores
AWS ParallelCluster can switch between compute nodes for rapid prototyping
AWS ParallelCluster makes elastic HPC easy
Use cluster placement groups to place instances even closer together

[Diagram: a cluster placement group inside an Availability Zone.]

Use Elastic Fabric Adapter (EFA) to scale tightly coupled workloads even further
Scale tightly coupled HPC applications on AWS

• Instance families: M5n/M5dn, R5n/R5dn, P3dn, G4dn(1), C5n, i3en
• NVIDIA V100 and T4 Tensor Core GPUs
• Custom Intel Xeon Scalable processor
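When picking compute nodes for a tightly coupled job, a quick check against the families named on this slide can catch an obvious mismatch. The sketch below is only that: the set is copied from the slide (a 2019 snapshot, with G4dn carrying a footnote there), the function names are hypothetical, and real EFA support also depends on instance size and Region, so the current EC2 documentation remains the authority.

```python
# Instance families named on the slide as EFA-capable (2019 snapshot).
EFA_FAMILIES_FROM_SLIDE = {
    "m5n", "m5dn", "r5n", "r5dn", "p3dn", "g4dn", "c5n", "i3en",
}


def instance_family(instance_type):
    """Return the family part of a type, e.g. 'c5n.18xlarge' -> 'c5n'."""
    return instance_type.split(".", 1)[0].lower()


def listed_for_efa(instance_type):
    """True if the instance family appears in the slide's EFA list."""
    return instance_family(instance_type) in EFA_FAMILIES_FROM_SLIDE
```

Under these assumptions, `listed_for_efa("c5n.18xlarge")` is true while `listed_for_efa("c5.18xlarge")` is false, since plain C5 is not in the list.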
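Choose compute instances suited for each workload
Categories and capabilities

• Choice of processor (AWS, Intel, AMD)
• Processor speed (up to 4.0 GHz)
• Memory (up to 12 TiB)
• Instance storage (HDD and NVMe)
• Accelerated computing (GPUs and FPGA)
• Networking (up to 100 Gbps)
• Size (Nano to 32xlarge)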
Choose a master node to match your cluster

• The master node orchestrates cluster scaling logic
  • Bigger clusters can require bigger master nodes
• Small master nodes have more limited network throughput
• DCV is managed through the master node
  • Consider GPU-based instances for graphics-intensive visualization
AWS ParallelCluster supports On-Demand Instances, Reserved Instances, and EC2 Spot pricing

• On-Demand for workloads that have to be done now
• Spot for fault-tolerant and flexible workloads
• Reserved Instances for predictable levels of workloads
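The three guidelines above can be read as a simple rule of thumb. The helper below is a hypothetical sketch of one way to encode them; the function name, its boolean inputs, and the order in which the rules are checked are assumptions, and real decisions would also weigh price, interruption rates, and commitment terms.

```python
def purchase_option(must_finish_now, fault_tolerant, predictable_baseline):
    """Map workload traits to a purchasing option, following the slide.

    Hypothetical rule of thumb; the precedence of the checks is an
    assumption made for this sketch.
    """
    if predictable_baseline:
        # Steady, predictable usage suits a reservation.
        return "reserved"
    if fault_tolerant and not must_finish_now:
        # Flexible work that survives interruption suits Spot.
        return "spot"
    # Everything else, especially urgent work, runs On-Demand.
    return "on-demand"
```

For instance, an urgent one-off simulation maps to "on-demand", while a restartable parameter sweep with no deadline maps to "spot".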
FSx for Lustre offers massively scalable file system performance

• Parallel file system
• SSD-based
• 100+ GiB/s throughput
• Millions of IOPS
• Supports hundreds of thousands of cores
• Consistent sub-millisecond latencies
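To show how a Lustre file system might be attached to a cluster definition, the sketch below adds an FSx section to existing INI text with Python's `configparser`. The key names (`shared_dir`, `storage_capacity`, `import_path`, `fsx_settings`) are modelled on the ParallelCluster 2.x config layout and should be treated as assumptions; the section label `scratch`, the mount directory, the capacity, and the bucket name are placeholders.

```python
import configparser
import io


def add_fsx_section(config_text, cluster_section, mount_dir,
                    capacity_gib, s3_import_path=None):
    """Return config text with an FSx for Lustre section attached.

    Key names are assumptions based on the 2.x INI layout; verify
    them against the documentation for your version.
    """
    config = configparser.ConfigParser()
    config.read_string(config_text)
    config["fsx scratch"] = {
        "shared_dir": mount_dir,                # mount point on the nodes
        "storage_capacity": str(capacity_gib),  # size in GiB
    }
    if s3_import_path is not None:
        # Optionally link the file system to data held in Amazon S3.
        config["fsx scratch"]["import_path"] = s3_import_path
    # Point the cluster section at the new file system section.
    config[cluster_section]["fsx_settings"] = "scratch"
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


if __name__ == "__main__":
    base = "[cluster default]\nscheduler = slurm\n"
    print(add_fsx_section(base, "cluster default", "/fsx", 3600,
                          "s3://example-bucket-placeholder"))
```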
Disable hyperthreading for improved performance

• Easily turn hyperthreading on or off with a single parameter
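The effect of that switch on schedulable cores is simple arithmetic. The sketch below assumes two hardware threads per physical core, which holds for many but not all EC2 instance types, and its function names are hypothetical. In the 2.x config format the switch is believed to be `disable_hyperthreading = true` in the cluster section; confirm the name for the installed version.

```python
def physical_cores(vcpus, threads_per_core=2):
    """Physical cores behind a vCPU count (assumes 2 threads per core)."""
    if vcpus % threads_per_core != 0:
        raise ValueError("vCPU count is not a multiple of threads per core")
    return vcpus // threads_per_core


def slots_per_node(vcpus, hyperthreading_disabled):
    """Scheduler slots per node: one per core if hyperthreading is off."""
    if hyperthreading_disabled:
        return physical_cores(vcpus)
    return vcpus
```

A c5n.18xlarge exposes 72 vCPUs, so with hyperthreading disabled each node offers 36 slots, one per physical core, which is often the better fit for floating-point-bound MPI codes.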
Custom AMIs provide additional flexibility and choice

• Install your own software on top of an existing AWS ParallelCluster AMI
• Bring your own AMI and add AWS ParallelCluster on top
HPC on AWS

Flexible configuration and virtually unlimited scalability to grow and shrink your infrastructure as your HPC workloads dictate, not the other way around
Thank you!
Nathan Stornetta
[email protected]

https://github.com/aws/aws-parallelcluster
