Using AWS ParallelCluster To Simplify HPC Cluster Management CMP372-P
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
HPC on AWS
[Figure: charts with "Cores" axes and a "capacity limit" annotation; the accompanying quotation survives only as its ending, "incremental profit."]
Finite capacity, usually with long queues to wait in
Massive capacity when needed to speed up time to results, and an agile environment when additional hardware and software experimentation is needed
Because a TCO analysis never tells the whole story
Lost productivity & longer time to results
AWS DataSync, AWS Snowball, AWS Snowmobile, AWS Direct Connect
Amazon EC2 instances (CPU, GPU, FPGA), Amazon EC2 Spot, AWS Auto Scaling, Placement groups, Enhanced networking, Elastic Fabric Adapter
Amazon EBS, Amazon FSx for Lustre, Amazon EFS, Amazon S3
AWS Batch, AWS ParallelCluster, NICE EnginFrame
NICE DCV, Amazon AppStream 2.0
AWS Budgets
Running HPC applications at extreme scale
Accelerating time to innovation
Single HPC cluster of 1 million vCPUs
“Storage technology is amazingly complex and we’re constantly pushing the limits
of physics and engineering to deliver next-generation capacities and technical
innovation. This successful collaboration with AWS shows the extreme scale,
power and agility of cloud-based HPC to help us run complex simulations for
future storage architecture analysis and materials science explorations. Using
AWS to easily shrink simulation time from 20 days to 8 hours allows Western
Digital R&D teams to explore new designs and innovations at a pace
unimaginable just a short time ago.” – Steve Phillpott, CIO, Western Digital
Helping financial institutions model investment risks
Run risk models 4,000 times faster
In hours, instead of months
Easy cluster management, automatic resource scaling, seamless migration to the cloud
Easy cluster management
`pcluster configure` to set up a cluster in minutes
Use config files to define details of replicable clusters
Launch, stop, and restart clusters on demand
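As an illustration of what a replicable cluster definition might look like, here is a minimal Python sketch that writes an INI-style config using section and key names from the ParallelCluster 2.x config format. The function name `build_cluster_config`, the instance types, and the `EXAMPLE` identifiers are all assumptions for illustration; they do not come from the slides.

```python
import configparser
import io


def build_cluster_config(region, key_name, vpc_id, subnet_id, max_nodes=10):
    """Return INI text in the style of a ParallelCluster 2.x config file.

    Hypothetical helper: every value below is a placeholder to adapt.
    """
    cfg = configparser.ConfigParser()
    cfg["global"] = {
        "cluster_template": "default",
        "update_check": "true",
        "sanity_check": "true",
    }
    cfg["aws"] = {"aws_region_name": region}
    cfg["cluster default"] = {
        "key_name": key_name,
        "base_os": "alinux",
        "scheduler": "slurm",
        "master_instance_type": "c5.xlarge",      # assumed choice
        "compute_instance_type": "c5n.18xlarge",  # assumed choice
        "initial_queue_size": "0",
        "max_queue_size": str(max_nodes),
        "vpc_settings": "public",
    }
    cfg["vpc public"] = {"vpc_id": vpc_id, "master_subnet_id": subnet_id}

    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


config_text = build_cluster_config(
    region="us-east-1",
    key_name="my-key-EXAMPLE",
    vpc_id="vpc-EXAMPLE",
    subnet_id="subnet-EXAMPLE",
)
print(config_text)
```

Saved to a file, a config like this would typically be passed to `pcluster create -c <config file> <cluster name>`, and the same file can be reused to recreate an identical cluster later.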
Automatic resource scaling
Scale up when jobs are waiting
Scale down when the cluster is idle
Your data storage and file system scale to match your compute
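The scale-up and scale-down behavior can be pictured with a toy decision rule. This is only a sketch of the idea, not ParallelCluster's actual scaling logic; the function `desired_compute_nodes` and all of its parameters are hypothetical.

```python
import math


def desired_compute_nodes(pending_slots, busy_nodes, slots_per_node,
                          min_nodes, max_nodes):
    """Toy rule: keep busy nodes, add enough nodes for waiting work,
    and clamp the result to the configured queue size limits."""
    extra = math.ceil(pending_slots / slots_per_node)
    needed = busy_nodes + extra
    return max(min_nodes, min(max_nodes, needed))


# Jobs are waiting: 100 queued slots on 36-slot nodes need 3 more nodes.
scale_up = desired_compute_nodes(100, busy_nodes=2, slots_per_node=36,
                                 min_nodes=0, max_nodes=10)

# Cluster is idle: nothing queued, nothing running, so shrink to the minimum.
scale_down = desired_compute_nodes(0, busy_nodes=0, slots_per_node=36,
                                   min_nodes=0, max_nodes=10)

# Demand exceeds the limit: the result is capped at max_nodes.
capped = desired_compute_nodes(1000, busy_nodes=0, slots_per_node=36,
                               min_nodes=0, max_nodes=10)

print(scale_up, scale_down, capped)
```

In a config file, the lower and upper bounds in this sketch would correspond roughly to settings such as `initial_queue_size` and `max_queue_size`.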
Seamless migration to the cloud
Making HPC workloads cloud-native can take time and planning
AWS ParallelCluster simplifies first steps to migrate HPC workloads
Integrations simplify the transition to cloud-native HPC at your own pace
AWS ParallelCluster
ALINUX, CENTOS 6/7, UBUNTU 16/18, DCV, EFA, OPENMPI, INTELMPI, NCCL
Drug discovery, Genomics
Risk, Quantitative Research
Reservoir Modeling, Seismic Imaging
[Architecture diagram labels: User, SSH, Availability Zone, VPC Subnet, Scheduler, Queue, NFS Share]
P3dn, G4dn, C5n, M5n/M5dn, R5n/R5dn, i3en
NVIDIA V100 and T4 Tensor Core GPUs
Custom Intel Xeon Scalable processor
Choose compute instances suited for each workload
Categories and capabilities: processor (AWS, Intel, AMD), memory (up to 12 TiB), network bandwidth (up to 100 Gbps), instance size (Nano to 32xlarge)
Choose a master node to match your cluster
https://github.com/aws/aws-parallelcluster