
AMERICAS

AIM301: Deep learning with TensorFlow, PyTorch, and MXNet on AWS
Shashank Prasanna
Sr. Developer Advocate, AI/ML
AWS

• Popular deep learning frameworks: TensorFlow, PyTorch, and MXNet
• Getting the most out of deep learning frameworks with Amazon SageMaker
• Summary
• Resources
• Q&A
Deep learning: ML with deep neural networks
Use cases: Recommendations, Forecasting, Image recognition, …
Data → Machine learning → Results
• Machine learning algorithms: K-nearest neighbors, XGBoost, Random forest, K-means, Factorization machines, Linear learner, PCA, Support vector machines
• Deep learning: Deep neural networks
• Results: Recommendations, Forecasts, Predictions, Trends and patterns
Challenges with deep learning
Many model architectures: difficult to get started
VGG, ResNet, ResNeXt, DenseNet, SqueezeNet, R-CNN, Faster R-CNN, SSD, YOLO, Seq2Seq, Transformers, and custom model architectures

Computationally intensive to train and deploy
• Needs high-performance CPUs and GPUs
• Needs fast access to GBs and TBs of data for training
• Training on hundreds of CPUs and GPUs requires infrastructure management

Difficult to host and manage models in production
• Difficult to deliver high-performance and low-latency predictions
• Scaling to thousands and millions of users requires infrastructure management
Deep learning frameworks
Building blocks for designing, training, and validating deep neural networks
• High-level programming APIs with Keras and Gluon
• Performance optimizations to take advantage of GPUs
• Low-level functions for research and development
• Ability to run training at scale (but you will have to manage infrastructure)
Deep learning needs more than just frameworks
• SageMaker Studio (IDE)
• ML services: Built-in algorithms, SageMaker notebooks, SageMaker Experiments, Model tuning, SageMaker Debugger, SageMaker Autopilot, Model hosting, SageMaker Model Monitor
• Frameworks
• Compute, networking, storage
Deep learning on AWS
Amazon SageMaker + deep learning frameworks + infrastructure services = record-setting performance at low cost
• 27 minutes: the record-setting time to train Mask R-CNN with TensorFlow using 24 P3dn.24xlarge instances with 192 total GPUs
• 62 minutes: the record-setting time to train BERT with TensorFlow using 256 P3dn.24xlarge instances with 2,048 GPUs
• 40% lower cost per inference for Inf1 instances compared to G4 instances, the lowest cost per inference in the cloud
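The GPU counts above follow from the 8 V100 GPUs in each P3dn.24xlarge instance. The sketch below reproduces that arithmetic and, under the simplifying assumption that every GPU is busy for the entire run, estimates GPU-hours. The GPU-hour figures are derived here for illustration; they are not quoted in the talk.

```python
# Illustrative arithmetic only: GPU counts and approximate GPU-hours for the
# training records above, assuming 8 GPUs per P3dn.24xlarge instance and
# (a simplification) that every GPU is busy for the whole run.
GPUS_PER_P3DN = 8


def total_gpus(instances: int, gpus_per_instance: int = GPUS_PER_P3DN) -> int:
    """Total GPUs in a homogeneous cluster."""
    return instances * gpus_per_instance


def gpu_hours(instances: int, minutes: float,
              gpus_per_instance: int = GPUS_PER_P3DN) -> float:
    """GPU-hours consumed if all GPUs run for the given wall-clock minutes."""
    return total_gpus(instances, gpus_per_instance) * minutes / 60


if __name__ == "__main__":
    for name, instances, minutes in [("Mask R-CNN", 24, 27), ("BERT", 256, 62)]:
        print(f"{name}: {total_gpus(instances)} GPUs, "
              f"~{gpu_hours(instances, minutes):.1f} GPU-hours")
```

Scaling out shortens wall-clock time, but the total GPU-hours (and therefore cost) grow with cluster size, which is why per-GPU efficiency matters at this scale.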
Amazon SageMaker framework optimizations
Full-stack optimizations: compute + networking + storage + frameworks
• High-performance: Amazon EC2 P3dn instances, Amazon S3, Amazon FSx for Lustre
• Cost-effective: Amazon EC2 G4 instances, Amazon EC2 Inf1 instances, Amazon Elastic Inference, AWS Neuron SDK
• Every framework: Deep learning framework containers
Getting the most out of deep learning frameworks for training with Amazon SageMaker
Training scripts → SageMaker SDK → fully managed and optimized Amazon SageMaker cluster
Two ways to scale deep learning with Amazon SageMaker
1. Bring your own training script (script mode)
2. Bring your own Docker container (BYOC)
1. Bring your own training script
Code files + AWS Deep Learning Containers (from the Amazon ECR container registry) → Amazon SageMaker SDK → fully managed SageMaker cluster, with Amazon S3 for storage
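In script mode, SageMaker runs your own training script inside an AWS Deep Learning Container. Below is a minimal sketch of such an entry point (a hypothetical `train.py`), assuming the SageMaker training toolkit conventions: hyperparameters arrive as command-line flags, and paths arrive in `SM_*` environment variables (`SM_MODEL_DIR`, `SM_CHANNEL_TRAINING` for an input channel named `training`, and `SM_NUM_GPUS`). The flag names `--epochs` and `--learning-rate` are illustrative choices, not required names.

```python
"""Minimal script-mode entry point sketch (hypothetical train.py).

Assumes SageMaker training toolkit conventions: hyperparameters are passed as
--name value flags, and paths are passed in SM_* environment variables.
"""
import argparse
import os


def parse_args(argv=None, env=None):
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator are forwarded as command-line flags.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    # Paths injected by SageMaker; the defaults let the script run locally too.
    parser.add_argument("--model-dir", default=env.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train", default=env.get("SM_CHANNEL_TRAINING", "./data"))
    parser.add_argument("--num-gpus", type=int,
                        default=int(env.get("SM_NUM_GPUS", "0")))
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Build the model with TensorFlow, PyTorch, or MXNet here, train on the
    # files under args.train, and save the result under args.model_dir.
    print(f"Training for {args.epochs} epochs on {args.num_gpus} GPU(s), "
          f"lr={args.learning_rate}, data from {args.train}, "
          f"saving to {args.model_dir}")


if __name__ == "__main__":
    main()
```

Because every path has a local default, the same script can be run directly on a workstation for prototyping before it is handed to a SageMaker estimator.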
2. Bring your own Docker container
Code files + custom container → Docker build → Amazon ECR container registry → Amazon SageMaker SDK → fully managed SageMaker cluster, with Amazon S3 for storage
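With BYOC, your container has to follow SageMaker's `/opt/ml` file layout itself. The sketch below assumes that documented layout: hyperparameters in `input/config/hyperparameters.json` (all values are strings), channel data under `input/data/<channel>`, model artifacts in `model/`, and a failure reason in `output/failure`. SageMaker starts a training container with the argument `train`. The `model.json` artifact is a hypothetical placeholder for whatever your framework actually saves, and the `prefix` parameter exists only so the code can be exercised against a temporary directory.

```python
"""Sketch of the /opt/ml layout a custom (BYOC) training container uses.

The prefix is a parameter so the same code can be tested locally.
"""
import json
import sys
from pathlib import Path

DEFAULT_PREFIX = Path("/opt/ml")


def load_hyperparameters(prefix: Path = DEFAULT_PREFIX) -> dict:
    """Hyperparameters arrive as a JSON object whose values are strings."""
    path = prefix / "input" / "config" / "hyperparameters.json"
    if not path.exists():
        return {}
    with path.open() as f:
        return json.load(f)


def channel_dir(name: str, prefix: Path = DEFAULT_PREFIX) -> Path:
    """Directory where File-mode data for an input channel is placed."""
    return prefix / "input" / "data" / name


def train(prefix: Path = DEFAULT_PREFIX) -> None:
    hp = load_hyperparameters(prefix)
    epochs = int(hp.get("epochs", "1"))  # convert string values explicitly
    model_dir = prefix / "model"
    model_dir.mkdir(parents=True, exist_ok=True)
    try:
        # Train on the files in channel_dir("training", prefix) here.
        # Placeholder artifact: a real container saves framework model files.
        (model_dir / "model.json").write_text(json.dumps({"epochs": epochs}))
    except Exception as exc:
        # Text written here is reported as the training job's failure reason.
        failure = prefix / "output" / "failure"
        failure.parent.mkdir(parents=True, exist_ok=True)
        failure.write_text(str(exc))
        raise


if __name__ == "__main__" and sys.argv[1:2] == ["train"]:
    train()
```

Anything written to the `model/` directory is what SageMaker packages and uploads to Amazon S3 when the job finishes, so that is the only place final artifacts need to go.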
Large training datasets: What are my options?
(TensorFlow, PyTorch, and MXNet on a fully managed and optimized Amazon SageMaker cluster)
1. Amazon S3, for moderate and large datasets
• File mode: Copy entire dataset to local volume
• Pipe mode: Stream dataset from Amazon S3
2. Amazon EFS, a scalable shared file system
• No downloading or streaming
• Share file system with other services
3. Amazon FSx for Lustre, a high-performance file system
• Optimized for high-performance computing
• Natively integrated with Amazon S3
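The practical difference between File mode and Pipe mode is how the training script reads data. A sketch, assuming fixed-size binary records (an illustrative choice) and the Pipe mode convention of one FIFO per channel per epoch, named like `training_0`: in File mode the script can list and reopen local files freely, while in Pipe mode it must consume a stream sequentially, without seeking.

```python
"""Contrast File mode and Pipe mode reading patterns (illustrative sketch).

Fixed-size binary records are an assumption made for this example.
"""
from pathlib import Path
from typing import BinaryIO, Iterator, List


def list_local_files(channel_dir: Path) -> List[Path]:
    """File mode: the whole dataset is on the local volume before training."""
    return sorted(p for p in channel_dir.iterdir() if p.is_file())


def iter_records(stream: BinaryIO, record_size: int) -> Iterator[bytes]:
    """Pipe mode: consume a non-seekable stream one record at a time."""
    while True:
        chunk = stream.read(record_size)
        if not chunk:
            return
        if len(chunk) < record_size:
            raise ValueError("truncated record at end of stream")
        yield chunk


def pipe_path(channel: str, epoch: int,
              prefix: Path = Path("/opt/ml/input/data")) -> Path:
    """Assumed FIFO naming: one pipe per channel per epoch, e.g. training_0."""
    return prefix / f"{channel}_{epoch}"
```

In a real Pipe mode job you would open `pipe_path("training", epoch)` in binary mode for each epoch and pass it to `iter_records`; because nothing is copied to disk first, training can start before the full dataset has been transferred.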
How do I choose the right instance for training?

                 P3.2xlarge   P3.8xlarge   P3.16xlarge   P3dn.24xlarge
GPUs             1 x V100     4 x V100     8 x V100      8 x V100
GPU memory       16 GB/GPU    16 GB/GPU    16 GB/GPU     32 GB/GPU
vCPUs            8            32           64            96
Memory (GiB)     61           244          488           768

P3dn.24xlarge: highest performance, optimized for distributed training
• 32 GB memory per GPU
• 100 Gbps network bandwidth
• Record-setting performance on Mask R-CNN and BERT

Typical uses:
• Distributed training and large-scale experiments
• Distributed training and multiple experiments
• Local mode training and prototyping (P3)
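One way to apply the table: pick the smallest instance that satisfies your per-node GPU count and per-GPU memory needs. The helper below is hypothetical (it is not a SageMaker API); its specs are copied from the table, and the `ml.` prefix follows SageMaker's instance-type naming.

```python
"""Pick the smallest P3-family instance that meets a GPU requirement.

Hypothetical helper; specs are copied from the table above.
"""
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    name: str
    gpus: int
    gpu_mem_gb: int  # memory per GPU
    vcpus: int
    mem_gib: int


P3_FAMILY = [
    Instance("ml.p3.2xlarge", 1, 16, 8, 61),
    Instance("ml.p3.8xlarge", 4, 16, 32, 244),
    Instance("ml.p3.16xlarge", 8, 16, 64, 488),
    Instance("ml.p3dn.24xlarge", 8, 32, 96, 768),
]


def smallest_fit(min_gpus: int, min_gpu_mem_gb: int = 16) -> Optional[Instance]:
    """Return the smallest instance meeting both requirements, else None."""
    # "Smallest" means fewest GPUs first, then fewest vCPUs.
    for inst in sorted(P3_FAMILY, key=lambda i: (i.gpus, i.vcpus)):
        if inst.gpus >= min_gpus and inst.gpu_mem_gb >= min_gpu_mem_gb:
            return inst
    return None
```

A `None` result means no single instance is large enough; requirements beyond 8 GPUs call for multiple instances (distributed training), which is where the P3dn.24xlarge's 100 Gbps networking matters.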

Choosing the right instance for inference deployments
• What is your target latency SLA for your application?
• Real-time inference or batch predictions?
• Popular deep learning framework model or custom code?

• CPU instances (C5, M5): small models, low throughput
• Elastic Inference (network-attached inference accelerator, e.g. eia1.medium): mid-sized models, low-latency budget with tolerance limits
• GPU instances (P3, G4): large models, high throughput, and low-latency access to CUDA
• Custom chip (Inf1): high throughput, best cost and performance in the cloud

Start small and size up if you need more capacity.
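The guidance above can be encoded as a simple lookup. This is a hypothetical simplification: the three inputs stand in for the slide's questions, and the category strings are taken from the slide.

```python
"""Encode the inference-instance guidance as a lookup (hypothetical helper)."""


def suggest_inference_option(model_size: str, high_throughput: bool,
                             needs_cuda: bool = False) -> str:
    if model_size not in {"small", "mid", "large"}:
        raise ValueError("model_size must be 'small', 'mid', or 'large'")
    # Large models or custom CUDA code: GPU instances.
    if needs_cuda or model_size == "large":
        return "GPU instances (P3, G4)"
    # High throughput at the best cost: the custom inference chip.
    if high_throughput:
        return "Custom chip (Inf1)"
    # Mid-sized models with some latency tolerance: an attached accelerator.
    if model_size == "mid":
        return "Elastic Inference accelerator (e.g. eia1.medium)"
    # Default: start small on CPU.
    return "CPU instances (C5, M5)"
```

The order of checks mirrors the advice to start small: CPU is the fallback, and accelerators are chosen only when model size, throughput, or a CUDA dependency calls for them.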

TensorFlow on AWS
https://aws.amazon.com/tensorflow/

AWS optimizations for TensorFlow available on Amazon EC2 and Amazon SageMaker
• AWS Deep Learning Containers for training and inference
• AWS Deep Learning AMIs (DLAMI)

Amazon SageMaker benefits for TensorFlow
• Built-in support for TensorBoard, Debugger, local mode, hyperparameter tuning, Managed Spot Training, Pipe mode, and Amazon Elastic Inference
• Distributed training: parameter server and Horovod
• Performance optimizations: GPUs, CPUs, and storage
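Horovod implements synchronous data-parallel training: each worker computes gradients on its own shard of the data, and an allreduce averages them so every worker applies the same update. The sketch below simulates that pattern in a single process on a toy one-parameter linear model (an illustrative assumption). It does not use Horovod itself, which performs the allreduce across processes and machines.

```python
"""Simulate synchronous data-parallel training (the pattern Horovod uses).

The allreduce is an in-process average here; Horovod performs it across
processes (e.g. with ring-allreduce).
"""
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def grad_mse(w: float, shard: Sequence[Sample]) -> float:
    """Gradient of mean((w*x - y)^2) with respect to w over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values: List[float]) -> float:
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)


def shard(data: Sequence[Sample], num_workers: int) -> List[List[Sample]]:
    """Round-robin split; equal shard sizes keep the average exact."""
    return [list(data[i::num_workers]) for i in range(num_workers)]


def train(data: Sequence[Sample], num_workers: int,
          lr: float = 0.01, steps: int = 200) -> float:
    shards = shard(data, num_workers)
    w = 0.0
    for _ in range(steps):
        # On a real cluster each worker computes its gradient in parallel.
        local_grads = [grad_mse(w, s) for s in shards]
        # Every worker applies the same averaged update, keeping weights in sync.
        w -= lr * allreduce_mean(local_grads)
    return w
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so adding workers changes throughput rather than the result of each step.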
Demo: TensorFlow + Amazon SageMaker
• Develop and test using local mode
• Large-scale hyperparameter optimization
• Large-scale distributed training
• Model hosting
PyTorch on AWS
https://aws.amazon.com/pytorch/

AWS optimizations for PyTorch available on Amazon EC2 and Amazon SageMaker
• AWS Deep Learning Containers for training and inference
• AWS Deep Learning AMIs (DLAMI)

PyTorch on Amazon SageMaker
• Debugger, local mode, hyperparameter tuning, Managed Spot Training, Pipe mode, and Amazon Elastic Inference
• Serving framework: TorchServe
• Distributed training: TorchElastic
• Performance optimizations: GPUs, CPUs, and storage
TorchServe
An open-source model serving library for PyTorch, built and maintained by AWS in collaboration with Facebook

aws.amazon.com/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/
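Once a model is registered with TorchServe, clients call its inference API over HTTP. A sketch, assuming TorchServe's default inference port 8080 and its `/predictions/{model_name}` route; `my_model` and the payload bytes are placeholders.

```python
"""Build and send a prediction request to a running TorchServe instance.

Assumes the default inference API port (8080) and /predictions/{model_name}.
"""
import urllib.request


def build_prediction_request(model_name: str, payload: bytes,
                             host: str = "localhost", port: int = 8080,
                             content_type: str = "application/octet-stream"
                             ) -> urllib.request.Request:
    url = f"http://{host}:{port}/predictions/{model_name}"
    return urllib.request.Request(url, data=payload, method="POST",
                                  headers={"Content-Type": content_type})


def predict(model_name: str, payload: bytes, timeout: float = 10.0) -> bytes:
    """Send the request; needs a TorchServe server with the model loaded."""
    req = build_prediction_request(model_name, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

`build_prediction_request` can be inspected without a server, while `predict` only works against a running TorchServe instance that has the named model registered.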
Apache MXNet on AWS
https://aws.amazon.com/mxnet/

AWS-optimized Apache MXNet
• AWS Deep Learning Containers for training and inference
• AWS Deep Learning AMIs (DLAMI)

Apache MXNet on Amazon SageMaker
• Debugger, local mode, hyperparameter tuning, Managed Spot Training, Pipe mode, Amazon Elastic Inference, and distributed training
• Language APIs: C++, JavaScript, Python, R, Julia, Scala, Clojure, and Perl
• Performance optimizations: GPUs, CPUs, EFA, and storage
Gluon domain-specific tools and libraries
• Computer vision: GluonCV
• Natural language processing: GluonNLP
• Probabilistic time series modeling: GluonTS (Deep factor, DeepAR, DeepState, Gaussian Processes Forecaster, Non-Parametric Time Series Forecaster, Feedforward (MLP), Transformer model, Wavenet, Seq-2-seq, Prophet, R Forecast)
AutoGluon: Open-source AutoML
github.com/awslabs/autogluon
• Tabular prediction
• Image classification
• Text classification
• Object detection
https://aws.amazon.com/blogs/opensource/machine-learning-with-autogluon-an-open-source-automl-library/
Recap: Challenges and solutions
Many model architectures
• TensorFlow, PyTorch, and MXNet offer pretrained models
• Gluon and Keras make it easy to develop custom networks
• Gluon libraries include over 200 pretrained models in CV and NLP

Computationally intensive to train and deploy
• Amazon SageMaker lets you leverage full-stack optimizations (compute + networking + storage + frameworks) for state-of-the-art performance

Difficult to host and manage models in production
• Deploy high-performance, low-latency inference endpoints with SageMaker using TensorFlow Serving, TorchServe, and Multi Model Server
• Reduce cost with Amazon Elastic Inference and Inf1 instances
Resources: Amazon SageMaker
• https://github.com/awslabs/amazon-sagemaker-examples
• https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
• https://sagemaker.readthedocs.io/en/stable/overview.html
Resources: Deep learning frameworks
• aws.amazon.com/tensorflow
• aws.amazon.com/pytorch
• aws.amazon.com/mxnet
• Deep Learning Containers images: docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
Resources: Blog posts and videos
• https://towardsdatascience.com/how-to-debug-machine-learning-models-to-catch-issues-early-and-often-5663f2b4383b
• https://towardsdatascience.com/a-quick-guide-to-distributed-training-with-tensorflow-and-horovod-on-amazon-sagemaker-dae18371ef6e
• https://www.youtube.com/watch?v=02Ft-rCssRs
Learn machine learning with AWS Training and Certification
Resources created by the experts at AWS to help you build and validate machine learning skills
• Explore tailored machine learning (ML) paths for business decision makers, data platform engineers, data scientists, and developers
• Learn at your convenience with 65+ free digital courses, or register for a live instructor-led class featuring hands-on labs and opportunities for practical application
• Take the AWS Certified Machine Learning – Specialty exam to validate expertise in building, training, tuning, and deploying ML models

Visit the ML learning paths at https://aws.training/ML

Thank you!
Shashank Prasanna
@shshnkp

linkedin.com/in/shashankprasanna

medium.com/@shashankprasanna
