AI from the Data Center to the Edge: An Optimized Path Using Intel Architecture
Legal information
These materials are provided for educational purposes only and are being provided subject to the CC BY-NC-ND 4.0 license, which can be found at the following location:
https://creativecommons.org/licenses/by-nc-nd/4.0/
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on
system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel
representative to obtain the latest forecast, schedule, specifications and roadmaps.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and
MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined
with other products.
Intel, the Intel logo, Arria, Myriad, Atom, Xeon, Core, Movidius, neon, Stratix, OpenCL, Celeron, Phi, VTune, Iris, OpenVINO, Nervana, Nauta, and nGraph are trademarks of Intel
Corporation in the U.S. and/or other countries.
2
Dataset citation
A Large and Diverse Dataset for Improved Vehicle Make and Model Recognition
F. Tafazzoli, K. Nishiyama and H. Frigui
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2017.
http://vmmrdb.cecsresearch.org/papers/VMMR_TSWC.pdf
3
Course completion certificate
• You have the option to receive an Intel® AI Course Completion Certificate upon completion of the end-of-course quiz.
• Before taking the quiz, you may need to disable ad blockers (Ghostery, uBlock, AdGuard, etc.).
4
Learning objective
Use the Intel hardware and software portfolio to demonstrate the data science process
• Gain a hands-on understanding of building a deep learning model and deploying it to the edge
₋ Train the model – obtain the graph and weights of the trained network
₋ Deploy the model on CPU, integrated graphics, and the Intel® Movidius™ Neural Compute Stick
5
Training Outline
1. Intel's AI Portfolio
• Hardware: From training to inference, with emphasis on 2nd Gen Intel® Xeon® Scalable processors
• Software: Frameworks, libraries, and tools optimized for Intel® Architecture
• Community resources: Intel Developer Zone resources
2. Exploratory Data Analysis
• Obtain a dataset
• Explore data visually to understand distribution
• Data reduction and addressing imbalances
3. Training the Models
• Infrastructure: Intel® AI DevCloud, Amazon Web Services*, Google Compute Engine*, Microsoft Azure*
• Process: Prepare and visualize the dataset, prepare it for consumption by the framework, hyperparameter tuning, training, validation
4. Model Analysis
• Check your scores
• Compare your results
• Hyperparameter tuning
• Pick the winner or go back to training
5. Deploy to the Edge / Inference
• Introduction to the Intel® OpenVINO™ Toolkit – capabilities and benefits
• Usage models
• Model Optimizer – optimize the model, generate hardware-agnostic Intermediate Representation (IR) files for prebuilt and custom models
• Inference Engine – deploy to CPU, integrated GPU, FPGA, and Intel® Movidius™ Neural Compute Stick
6
Prerequisites
• Basic understanding of AI principles, Machine Learning and Deep Learning
– Introduction to AI
– Machine Learning
– Deep Learning
https://software.intel.com/en‐us/ai/courses/artificial‐intelligence
https://software.intel.com/en‐us/ai/courses/machine‐learning
https://software.intel.com/en‐us/ai/courses/deep‐learning
https://software.intel.com/en‐us/ai/courses/tensorflow
7
8
Business imperative – The AI journey – Intel AI
9
© 2019 Intel Corporation
9
Why AI now?
Data deluge (2019): the average internet user generates ~25 GB of IP traffic per month1, while each day a smart car generates ~50 GB2, a smart hospital ~3 TB2, an airplane ~40 TB2, a smart factory ~1 PB2, and a city safety system ~50 PB2.
Analytics curve (increasing maturity and scale): Descriptive and Diagnostic analytics (hindsight/insight) → Predictive analytics (foresight) → Prescriptive analytics (forecast) → Cognitive analytics (act/adapt), with AI applied across the curve, spanning operational, business, and security insights.
1. Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
2. Source: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/547_11_10-15-DocumentsCisco_GCI_Deck_2014-2019_for_CKN__10NOV2015_.pdf 10
So what’s driving this AI surge?
Data, for one. In 2019, the average internet user will generate ~25GB of IP traffic
per month. By comparison, in a single day, a smart car will generate 2X that
amount of data (50 GB), a smart hospital will generate 120X (3TB or 3,000 GB), a
plane will generate 1,600X (40TB or 40,000 GB), a smart factory will generate
40,000X (1PB or 1,000,000 GB), and a city safety system will generate 800,000X
(50PB or 50,000,000 GB). And we’re only talking about 2019! When you consider
that there will be 3X more smart connected devices than the global population by
2022, a growth from 17.7 billion networked devices in 2019 to 28.5 billion in
2022, the quantity of data generated is difficult to fathom. This data contains a
treasure trove of valuable insights, in business, operations and security that we really want
to extract, and in order to do that efficiently we need tools like analytics and AI by our side.
And when you think about analyzing troves of data, the first thing that may come
to mind is “data analytics”, the longstanding but constantly evolving science that
companies leverage for insight, innovation, and competitive advantage. Analytics
has changed a lot over the years, but continues to advance through ‘more or less’
five stages of increasing scale & maturity:
10
• Descriptive & Diagnostic analytics, sometimes called “operational analytics”, help
us understand what happened and why.
• Predictive, Prescriptive & Cognitive analytics, sometimes called “advanced
analytics”, help us predict and plan for the future.
AI is its own category, applied to all phases of the analytics pipeline (especially more
advanced analytics), and a vital tool for reaching higher-maturity, larger-scale data analytics. And
with the recent breakthroughs in computation performance and an in‐pouring of innovation
into the analytics and AI realm, we now have the tools to extract valuable insights from troves of data.
10
What is AI?
AI ⊃ machine learning ⊃ deep learning.
Training: lots of tagged data (human, bicycle, strawberry) is fed through the network in a forward pass; the error between the prediction ("strawberry") and the label ("bicycle") is propagated backward and the model weights are updated.
Inference: a forward pass through the trained model classifies a new, untagged image ("bicycle").
So, what is AI, exactly? Well, we’re a long way from how Artificial Intelligence (AI for short)
is portrayed in science fiction movies. The definition continues to evolve, but
fundamentally, AI is the ability of machines to learn from experience, without explicit
programming, in order to perform functions typically associated with the human mind.
There's no one-size-fits-all approach to AI, so it's helpful to explore some of the more prominent approaches to AI.
One such leading approach is machine learning, a category that includes algorithms that improve with exposure to more data
over time, and there are countless such algorithms that perform functions like regression, classification, clustering,
decision trees, extrapolation, and more.
A fast-growing subset within the machine learning category is deep learning. This
approach uses layered neural networks that learn from vast amounts of data to
solve problems that are difficult to reverse engineer, such as computer vision,
speech recognition, and many more. With deep learning, we avoid some of the feature engineering
('reverse engineering') required with traditional machine learning algorithms, instead letting
the neural network automatically adjust and adapt to every new piece of training data.
Now, let’s explore the difference between the two key stages of deep
learning: training and inference.
11
In the example shown here, the job of the deep neural network is to
classify a picture into one of three different categories – a person, a
bicycle, or a strawberry.
First, a labeled image (e.g., a picture of a bicycle labeled as "bicycle") is input
into the network in a “forward pass” and the untrained network predicts
that the bicycle is a strawberry, which is an error. Next, in the “backward
pass”, the error propagates back through the network and the weights
(i.e., the interconnections between the artificial neurons) are updated to
account for the error. Once these updates have been made, the next time
the same image is passed into the network, it will be more likely to predict
that it’s a bicycle. Over billions upon billions of iterations like this
example, you end up with a “trained” neural network that can accurately
identify a given input image. When you’re satisfied with the trained
accuracy of your neural network, the model weights are frozen, and the
trained model can be used for inference.
Inference, in this example, is the process of feeding an unknown image
into the trained neural network and allowing it to “infer” what’s in that
image. If you did a good job training the model weights, it should predict
“bicycle” for an image of a bicycle. Inference is really just the “forward
pass" portion of the training phase, but while training is compute-intensive
and typically done in the data center, inference can take place
there or even in a smart car or on a smartphone. The compute demands
for inferencing really depend on the use case, and vary significantly in
throughput, latency, power and size. So while you could use the same
processor for inference as you do for training, it often makes sense to use
a different more efficient approach.
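To make the training/inference distinction concrete, here is a minimal, hypothetical Keras sketch of the same three-class example; the random placeholder arrays stand in for the tagged images and are not part of the course materials.

import numpy as np
from tensorflow.keras import layers, models

# Placeholder data: 100 tagged 224x224 RGB images, 3 classes (person, bicycle, strawberry)
x_train = np.random.rand(100, 224, 224, 3).astype("float32")
y_train = np.random.randint(0, 3, size=(100,))

# A small convolutional network (illustrative only, not the course's topology)
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Training: repeated forward passes plus backward passes that update the weights
model.fit(x_train, y_train, epochs=2, batch_size=16)

# Inference: a single forward pass through the now-frozen weights
new_image = np.random.rand(1, 224, 224, 3).astype("float32")
probabilities = model.predict(new_image)
print(probabilities.argmax(axis=1))  # predicted class index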
11
AI will transform
12
Which industries are the earliest adopters of AI? Generally, those segments with clear use
cases, high purchasing power, and high rewards for making decisions quickly and/or more
accurately will adopt AI fastest. Here are the segments that we believe will lead AI through
2020, ordered roughly by market opportunity (earliest at left).
Consumer
• Smart Assistants – personal assistant that anticipates, optimizes, automates daily life
(e.g. Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana, Facebook Jarvis
home automation, X.ai virtual assistant Amy)
• Chatbots – 24/7/365 no waiting access to an informative or helpful agent (e.g. WeChat,
Bank of America, Uber, Pizza Hut, Alaska Airlines, Amtrak, etc.)
• Search – ability to more intelligently search more data types including image, video,
context, etc (e.g. Improved Google search, Google Photos, ReSnap)
• Personalization – ability to automatically adjust content/recommendations to suit
individuals (e.g. Entefy, Netflix recommendation engine, Amazon personalized shopping
recommendations)
• Augmented Reality – overlay information on our field of view in real‐time to identify
interesting or undesirable things (e.g. Google Translate using a smartphone camera)
• Robots – personal robots that are able to perform household, yard, or other chores (e.g.
Jibo robot for day‐to‐day functions, Roomba follow‐ons)
12
Health
• Enhanced Diagnosis – a tool for doctors to augment their own diagnosis with more data,
experience, precision and accuracy (e.g. radiology image analysis, Journal of American
Medicine Association paper on retina scan for diabetic retinopathy, skin lesion
classification to recognize melanoma with 98% accuracy, medical history scraping,
treatment outcome prediction)
• Drug Discovery – computational drug discovery that intelligently hones in on the most
promising treatments (e.g. speeding pharma drug development)
• Patient Care – machines that aid with monitoring, treatment, and/or recovery of patients
(e.g. visual patient monitoring, autonomous robotic surgery, friendly medication and/or
physical therapy robots)
• Research – instantly sifting through hundreds of new research papers and clinical trials
that are published each day to make new connections (e.g. AI at University of North
Carolina’s Lineberger Comprehensive Cancer Center)
• Sensory Aids – filling in for various senses that are absent or challenged (e.g. visual aid,
audio aid)
Finance
• Algorithmic Trading – augment rule‐based algorithmic trading models and data sources
using AI (e.g. Kensho analysis of myriad data to predict stock movement)
• Fraud Detection – ability to identify fraudulent transactions and/or claims (e.g. USAA
identifies insurance fraud)
• Research – ability to intelligently assemble, parse, and extract meaning from troves of
data that influence asset prices (e.g. Quid, FSI firm reducing time to insight for portfolio
managers through smart knowledge management system)
• Personal Finance – smarter recommendations, lower risk lending, greater efficiency (e.g.
active portfolio recommendations, quickly parsing more data before issuing loan,
automatic reading of check scans, etc.)
• Risk Mitigation – detect risk factors and/or reduce the burden of regulation and minimize
errors through automated compliance (e.g. IBM+Promontory Financial Group using
natural language processing to detect excursions)
Retail
• Support – bots providing shopping, ordering and support in lifelike interaction (e.g. My
Starbucks Barista, KLM Dutch Airline customer support via social media, Nieman Marcus
visual search, Pizza Hut order pizza via bot, intelligent phone menu routing based on NLP,
ViSenze recommending similar items based on image, Adobe Digital's digital mirror that
recommends clothes)
• Experience – deliver winning consumer experiences in‐store (e.g. Amazon Go checkout‐
free grocery store, Macy’s mobile shopping assistant, Lowes Lowebots that roam stores
answering simple questions and tracking inventory)
• Marketing – precision marketing to consumers, promoting products and services how and
where they want to hear (e.g. North Face “Expert Personal Shopper” on website)
12
• Merchandising – better planning through accelerated and expanded insight into consumer
buying patterns (e.g. Stitch Fix virtual styling, Skechers.com analyzing clicks in real‐time to
bring similar catalog items forward, Wal‐mart pairing products that sell together,
Cosabella evolutionary website tweaks)
• Loyalty – transform the consumer experience through segmentation (e.g. Under Armour
health app that constantly collects user data to deliver personalized fitness
recommendations)
• Supply Chain – optimize the supply chain and inventory management for efficiency and
innovate new business models (e.g. OnProcess technology’s use of predictive analytics for
inventory management)
• Security – improve security of all consumer and business digital assets, such as real‐time
shoplifting/lifter detection, multi‐factor identity verification, data breach detection (e.g.
Mastercard pay with your face, Walmart facial recognition to catch shoplifters)
Government
• Defense – drones, connected soldiers, defense strategy (e.g. military/surveillance drones,
autonomous rescue vehicles, augmented connected soldier, real‐time threat assessment
and strategy recommendation)
• Data Insights – analyze massive amounts of data to identify opportunities/inefficiencies in
bureaucracy, cybersecurity threats and more, to ultimately implement better systems and
policies (e.g. MIT AI that detects cyber security threats)
• Crime Prevention – using AI to predict and help recover from disasters thanks to ability to
quickly process large amounts of unstructured data and optimize limited resources (e.g.
1Concern, BlueLineGrid)
• Safety & Security – crowd analytics, behavioral/sentiment analytics, social media
analytics, face/vehicle recognition, online identity recognition, real‐time video analytics,
using AI to predict and help recover from disasters thanks to ability to quickly process
large amounts of unstructured data and optimize limited resources (e.g. police analyzing
social media to adjust police presence, license plate readers in police cars, 1Concern,
BlueLineGrid)
• Resident Engagement – new tools to facilitate citizen engagement like chatbots, at‐risk
citizen identification, (e.g. Amelia chatbot in North London Enfield council, North Carolina
chatbot to help state employees with IT inquiries)
• Smarter Cities – traffic/pedestrian management, lighting management, weather
management, energy conservation, services analytics (e.g. San Francisco and Pittsburgh
using sensors and AI to optimize traffic flow)
Energy
• Oil & Gas Exploration – automated geophysical feature detection (e.g. oil & gas producers
using AI to augment traditional modeling & simulation)
• Smart Grid – predictive and real‐time intelligent generation, allocation, and storage of
power to meet variable demand (e.g. GridSense, SoloGrid)
• Operational Improvement – safety and efficiency improvements through predictive and/or
insightful AI (e.g. GE Oil and Gas using predictive analytics and AI to predict and preempt
potential operational problems)
12
• Conservation – intelligent buildings, computing and appliances that reduce power
consumption and are more efficient than producing another kWh of electricity (e.g.
Google DeepMind datacenter energy reductions)
Transport
• Automated Cars – autonomous cars driving on the roadways (e.g. BMW, Google, Uber,
many others)
• Automated Trucking – autonomous trucks driving on the roadways (e.g. Daimler)
• Aerospace – autonomous planes and other aerial vehicles (e.g. Boeing’s evolution of
autopilot and drones)
• Shipping – autonomous package delivery via drone or other vehicle (e.g. Amazon package
delivery drone)
• Search & Rescue – ability to deploy autonomous robot to search and rescue victims in
potentially hazardous environments (e.g. war casualty extraction, miner rescue,
firefighting, avalanche rescue)
Industrial
• Factory Automation – highly‐productive, efficient and safe factories with robots that can
see, hear and adapt to their environment to produce goods with incredible quality and
speed (e.g. assembly line)
• Predictive Maintenance – ability to detect patterns that indicate the likelihood of an
upcoming fault that would require maintenance (e.g. airline being able to adjust schedule
to perform preventive maintenance before a failure)
• Precision Agriculture – ability to deliver the precise amount of water, nutrients, sunlight,
weed killer, etc to a particular crop or individual plant (e.g. farmer using visual weed
search to zap only weeds with RoundUp, automated sorting of produce for market)
• Field Automation – ability to automate heavy equipment beyond the factory walls (e.g.
mining, excavation, construction, road repair)
Other
• Advertising – interactive ads, adaptive ads, personalized ads, real‐time ads (e.g. AdBrain,
MetaMarkets, Proximic, RocketFuel)
• Education – virtual mentors, foreign language instruction, automated study sheets,
personalized assignments, cheating detection, deliberate practice, machine‐to‐machine
instruction (e.g. Intelligent Tutor Systems, Content Technologies Inc, PR2 robot from
Cornell)
• Gaming – dynamic and interactive video game experiences (e.g. Xbox Kinect, Playstation
Eye, Wii)
• Professional & IT Services – sales, marketing, legal research, accounting/tax, assisted
counseling, customized IT recommendations (e.g. Pinsent Masons law firm that emulates
human decision‐making, Salesforce use of AI)
• Telco/Media – customized content/ads, network optimization, quality of service,
mobile/home security (e.g. media company customizing tv show recommendations and
ads, network operator ensuring efficient and high‐quality delivery/repair, wireless
12
company using multi‐factor security)
• Sports – intelligent analytics for injury prevention and betting (e.g. Kinduct injury
prevention, Microsoft Cortana predicting football games)
Here is an even broader list of industries that will be impacted by AI: Advertising, Aerospace,
Agriculture, Automotive, Building Automation, Business, Education, Fashion, Finance,
Gaming, Government, Healthcare, IT, Investment, Legal, Life Sciences, Logistics,
Manufacturing, Media & Entertainment, Oil/Gas/Mining, Real Estate, Retail, Sports & Fitness,
Telecommunications, Transportation
Sources: Intel forecast (IDC, GII Research, Tractica, Technavio, Market Research Store, Allied
Market Research, BCC Research)
12
Business imperative – The AI journey – Intel AI
13
© 2019 Intel Corporation
13
The AI journey: 1. Challenge, 2. Approach, 3. Values, 4. People, 5. Technology, 6. Data, 7. Model, 8. Deploy
14
Before we venture any further, it’s important to understand that implementing AI in your
organization will be a journey, and to think about which technology partner can help you
accelerate each step to take you full circle.
14
Business imperative – The AI journey – Intel AI
15
© 2019 Intel Corporation
15
The Intel AI commitment to our customers is simple: we’re here to help them
break barriers between AI theory and reality. With Intel AI, our customers can
simplify AI, choose any approach, tame their data deluge, speed up development,
deploy AI anywhere, and scale with confidence. There’s no other company on the
planet that brings these unique capabilities together to accelerate AI from start to
finish.
Third is our AI community, which brings it all together. Intel's community helps
customers truly move from data strategy to enterprise-scale AI deployment,
through AI direct engagements, market-ready solutions, reference designs, and
many more offerings across all industries.
For more information about all these and more, visit www.intel.ai
16
17
Deploy AI anywhere
WITH UNPRECEDENTED HARDWARE CHOICE
Device / Edge / Multi-cloud – add acceleration as needed:
• Dedicated media/vision acceleration and automated driving
• Dedicated and/or flexible DL training acceleration (NNP-L, FPGA)
• Dedicated DL inference acceleration (NNP-I)
• Graphics, media & analytics acceleration (GPU)
*FPGA: (1) First to market to accelerate evolving AI workloads (2) AI+other system level workloads like AI+I/O ingest, networking, security, pre/post-processing, etc (3) Low latency memory constrained workloads like RNN/LSTM
1. GNA = Gaussian Neural Accelerator
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. 18
Images are examples of intended applications but not an exhaustive list.
With Intel AI, you can deploy AI anywhere with unprecedented hardware choice.
As you can infer (pun intended), each AI use case has very different requirements in terms
of compute, power, size/form factor, latency, cost, resilience, etc. It’s helpful to break these
requirements down into a few buckets:
• On the top‐left is the device category, where endpoint uses with lower-power
interactive technology reside, such as personal computers, cameras, smart speakers,
drones, robots and more. For this category, the Intel® Atom™ and Intel® Core™
processors are frequently the host processor, but we’re seeing a growing demand in this
space for domain-specific inference SoCs tailored to individual applications
• For internet of things sensors (IOT) in security, home, retail, industrial and many
more verticals, high performance with very low power is crucial. For vision &
inference applications in drones and cameras for example, the Intel® Movidius™
Vision Processing Units (VPU) deliver high quality image recognition in a <1 watt
power envelope. You can experience this platform through the Intel®
Movidius™ Neural Compute Stick NCS (https://developer.movidius.com).
Similarly, for speech recognition in smart speakers and robots, for example, the
combination of the Intel® Atom™ processor with the Intel GNA (Gaussian Neural
Accelerator) enables always‐on listening using only milliwatts of power.
You can experience this platform through the Intel speech enabling developer kit
18
(https://software.intel.com/en‐us/iot/speech‐enabling‐dev‐kit).
• For self‐driving vehicles, Intel® Mobileye™ technology is your autonomous driving
solution – it’s a comprehensive self‐driving vehicle platform that’s been in
development for years. For transportation‐as‐a‐service oriented companies that
want control over their IP and access to the bare silicon, the Intel® Nervana™
Neural Network Processor for inference is the best solution.
• For personal computing, including desktops, laptops, convertibles, tablets,
smartphones and more, Intel is combining several of our dedicated accelerators
(Movidius VPU, GNA) with our CPU technology (Atom, Core) and integrated
processor graphics (Intel® Iris graphics) to deliver game‐changing display,
video/vision, AR/VR, speech and gesture capabilities.
• On the left‐middle is the edge, which could be a small distributed cluster located at a
company’s factories around the world, an aggregation point like a network video recorder
(NVR), a complex system like a car or MRI (magnetic resonance imaging) machine, or even
just a few servers or workstations acting as gateway devices. In other words, the “edge” is
a broad category for localized compute. For the most part, customers are doing all
their deep learning inferencing on Xeon/CPU, unless they’re consistently doing a
tremendous amount of it and/or have specific use case requirements, which drives
demand for general purpose acceleration. Even in that case, for customers who run into
problems running inference on CPU, upgrading to the latest generation Xeon and utilizing
the latest Intel‐optimized deep learning software (frameworks & topologies) can help
meet their demands.
• For dedicated inference applications, the Intel® Nervana™ Neural Network
Processor for inference will likely be the most efficient solution
• For vision & inference workloads with higher performance/watt requirements, the
Intel® Movidius™ vision processing unit (VPU) is a great option, available as a PCIe
add‐in card called the “Intel® Vision Accelerator Design with Intel® MovidiusTM
MyriadTM X VPU”
• For streaming latency‐bound workloads with ”real‐time” inference demands,
particularly in media & vision, the highly‐flexible Intel® Arria® 10 FPGA is another
option, available as a PCIe add‐in card called the "Intel® Vision Accelerator
Design with Intel® Arria® 10 FPGA”
• Finally, on the left‐bottom is multi‐cloud, which consists of the largest ‘hyperscale’
deployments such as public clouds (AWS, GCP, etc), communication service providers,
government labs, academic clusters, large enterprise IT (private and/or hybrid cloud) and
more. For the most part, customers are running their deep learning inferencing and
training on Xeon/CPU, unless they’re consistently doing a tremendous amount of it, which
drives demand for general purpose acceleration. Even in that case, for customers who run
into problems running on CPU, upgrading to the latest generation Xeon and utilizing the
latest Intel‐optimized deep learning software (frameworks & topologies) can help meet
their demands.
• For dedicated deep learning training environments, like those where that
18
workload is persistent and accounts for a large share of all compute cycles, the
Intel® Nervana™ Neural Network Processor (NNP) is your purpose‐built AI
accelerator solution with multi‐model and multi‐user support – coming in 2019.
• For customers with memory‐bandwidth bound (e.g. RNN/recurrent neural
networks) and/or flexible acceleration needs, the Intel® Stratix® 10 FPGA is an
option – especially with very specific custom use cases, unique IP flows,
data types, and/or multi‐function workload flows. If you understand how to use
FPGAs overall, similar to how they're used in other accelerated applications, then
this may be a good acceleration solution.
18
The deep learning myth
"A GPU is required for deep learning…" FALSE
The chart plots deep learning demand over time: most businesses sit in the DL demand zone today, and may reach a tipping point where they enter the acceleration zone and acceleration is needed1.
1”Most businesses” claim is based on survey of Intel direct engagements and internal market segment analysis
19
It’s a myth that you need to use GPUs for AI or deep learning.
As you see in this chart, most enterprises are below the blue line, successfully using Intel®
Xeon® processors for AI and deep learning inference, and are now increasingly using them
for deep learning training too, thanks to optimizations that have led to breakthrough
performance increases. This performance continues to improve over time, and many
enterprises will never need acceleration to meet their needs.
That said, at some point down the line in your AI journey, you may reach an inflection point
where acceleration does become necessary. This could be driven by a particular usage
model at initial deployment or once your application “takes off” with huge growth in
inference demand (e.g. your app explodes like Instagram). However, for the initial proof‐of‐
concept (POC) – which can take a few weeks to several months – don't waste your money on
a limited‐purpose GPU accelerator that will sit idle most of the time, and hardly save you
any time (if at all) due to the added time required to manage/deploy/duplicate data/etc. If
and when you are truly ready to benefit from acceleration, there will be exciting new
options on the market to select from, some of which we cover in this presentation.
So, as a rule of thumb, if you are like most enterprises and are just beginning your AI
journey, forget about acceleration and start with Xeon. It’s already the standard for deep
19
learning inference in the data center, and is now more capable than ever for deep learning
training thanks in large part to all the software optimizations in the past 1‐2 years. At some
point during the AI journey, you may need acceleration for a particular use case or because
deep learning has grown to be significant in your overall compute mix, but cross that bridge
when (and if) it comes… in fact, you may be more than satisfied with the continuous
extension of Xeon AI performance that comes with each new generation, especially now that
new AI features are being built into the silicon architecture.
19
20
Speed up development with open AI software
Toolkits (app developers):
• Intel® Distribution of OpenVINO™ Toolkit1 – deep learning inference deployment on CPU/GPU/FPGA/VPU for Caffe*, TensorFlow*, MXNet*, ONNX*, Kaldi*
• Nauta (Beta) – open source, scalable, and extensible distributed deep learning platform built on Kubernetes
Libraries (data scientists):
• Machine learning (ML) libraries for Python (Scikit-learn), R (Cart), and distributed use (MlLib on Spark*)
• Deep learning frameworks optimized for CPU & more (TensorFlow*, MXNet*, and others), with more framework optimizations to come
Kernels (library developers):
• Analytics & ML – Intel® Distribution for Python* (Intel distribution optimized for machine learning) and Intel® Data Analytics Acceleration Library (incl. machine learning)
• Deep learning – Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), open source DNN functions for CPU / integrated graphics
• Deep learning graph compiler – Intel® nGraph™ Compiler (Beta), an open source compiler for deep learning model computations optimized for multiple devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX)
1 An open source version is available at: 01.org/openvinotoolkit *Other names and brands may be claimed as the property of others.
Developer personas shown above represent the primary user base for each row, but are not mutually exclusive
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. 21
© 2019 Intel Corporation Optimization Notice
With Intel AI, you can speed up development with open AI software.
Intel is investing in AI tools that get the most out of, and streamline development
across, each hardware option in our portfolio. This ultimately accelerates total
time-to-solution.
• For application developers—those who deploy solutions using AI-based
algorithms—Intel has several tools to optimize performance and accelerate
time-to-solution. For deep learning, the Intel Distribution of OpenVINO Toolkit
facilitates model deployment for inference by converting and optimizing
trained models for whichever hardware target is downstream. It offers support
for models trained in TensorFlow, Caffe, and MXNet on CPU, integrated GPU,
VPU (Movidius Myriad 2/Neural Compute Stick), and FPGA. Intel also launched
the beta of a tool to help compress the end-to-end deep learning development
cycle. This open source, scalable and extensible distributed deep learning
platform, built on Kubernetes, is called Nauta (pronounced 'nau‧ta'; it means
'sailor' in Latin), formerly known as the Intel Deep Learning Studio. (A minimal OpenVINO inference sketch appears after this list.)
• For data scientists—those who create AI‐based algorithms—Intel contributes to and
optimizes a set of open source libraries that are widely used for machine and deep
learning. There are a number of such machine learning libraries that can be used to get
the most out of Intel hardware today, spanning Python, R, and Distributed. For deep
learning, Intel aims to ensure that all the major deep learning frameworks and
topologies run well on Intel hardware, and customers are of course free to
choose whichever frameworks best suit their needs. We’ve been directly optimizing
the most popular AI frameworks first, based on market demand, and producing huge
improvements. Today, we have many optimized topologies available for TensorFlow,
MXNet, Caffe2/PyTorch, and BigDL on Spark, and you can download and install the
optimized version of these frameworks by clicking on the link in this slide. Going forward,
we intend to enable even more frameworks like PaddlePaddle, CNTK & many more
through the Intel nGraph compiler.
• For library developers—those who develop and optimize APIs, libraries, and frameworks
to support new algorithms and topologies on the underlying hardware—Intel offers a host
of foundational building blocks to get the most out of our hardware. Beginning on the left
with the primitives category, the Data Analytics Acceleration Library and Intel Python
distribution are important building blocks for machine learning. The DNN (deep neural
network) open source libraries contain CPU‐optimized functions that are most relevant
for, you guessed it, deep learning model development. On the right side of this row is a
description of the Intel nGraph library (formerly the Nervana Graph), which takes the
computational graph from each deep learning framework and creates an intermediate
representation, which is executed by calling the math accelerator software libraries of
each Intel hardware target. This compiler reduces the need for framework and model
direct optimization for each hardware target using low‐level software and math
accelerator libraries. Today, it supports Intel Xeon CPUs, GPU (CUDA), and the Crest family,
with more hardware targets planned going forward.
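Tying this back to the OpenVINO deployment flow mentioned above for application developers: a minimal Python sketch of loading Model Optimizer IR files and running inference might look like the following. This is not from the course materials; the file names are placeholders, and the exact API differs across OpenVINO releases (this assumes a release that provides IECore.read_network).

import numpy as np
from openvino.inference_engine import IECore  # Python inference engine API

ie = IECore()
# IR files previously produced by the Model Optimizer (names are placeholders)
net = ie.read_network(model="frozen_model.xml", weights="frozen_model.bin")
input_blob = next(iter(net.input_info))
output_blob = next(iter(net.outputs))

# Choose the target: "CPU", "GPU" (integrated graphics), "MYRIAD" (Neural Compute Stick), ...
exec_net = ie.load_network(network=net, device_name="CPU")

# Most IR models expect NCHW input; the image here is a random placeholder
image = np.random.rand(1, 3, 224, 224).astype("float32")
result = exec_net.infer(inputs={input_blob: image})
print(result[output_blob].shape)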
21
22
Intel® AI Academy
FOR DEVELOPERS, STUDENTS, INSTRUCTORS AND STARTUPS
Learn: tutorials, webinars, student kits and support forums
Develop: AI DevCloud, your existing Intel® Xeon® processor-based cluster, or a public cloud service
software.intel.com/ai
So where can you get started? The Intel AI academy is a great place to start for developers,
students, instructors and startups. There, you can learn all about AI, download tools and
resources to begin development with AI, find course materials to teach others & spread the
knowledge, and share things that you've learned and created with the AI community. To
get started, go to software.intel.com/ai.
Share your AI projects with the world
Connect with developers, share your skills, gain reputation – get noticed with Intel® DevMesh
• 9K+ member profiles
• 1,900+ developer projects
• Community groups
• Developer blogs
• Community repos
• Developer speakership programs
devmesh.intel.com
24
AI Builders: ecosystem
Cross-vertical: OEMs, system integrators
Vertical: healthcare, financial services, retail, transportation, news/media & entertainment, agriculture, legal & HR, robotic process automation
Horizontal: business intelligence & analytics, vision, conversational bots, AI tools & consulting, AI PaaS
builders.intel.com/ai
And last but certainly not least, you can turn to Intel or one of our many ecosystem
partners to help you get started on your AI journey. Visit builders.intel.com/ai to find out
more about our 100-and-counting AI Builders partners.
Why Intel AI?
Partner with Intel to accelerate your AI journey:
• Simplify AI via our robust community
• Choose any approach, from analytics to deep learning
• Speed up development with open AI software
• Deploy AI anywhere with unprecedented HW choice
www.intel.ai
26
Resources
• Intel® AI Academy
https://software.intel.com/ai-academy
https://software.intel.com/ai-academy/students/kits/
• Intel® AI DevCloud
https://software.intel.com/ai-academy/tools/devcloud
https://communities.intel.com/community/tech/intel-ai-academy
• DevMesh
https://devmesh.intel.com
27
27
28
The AI journey with an Intel case study:
1. Challenge, 2. Approach, 3. Values, 4. People, 5. Technology, 6. Data, 7. Model, 8. Deploy
29
In the next four slides, we’ll walk through an Intel AI case study to illustrate this
journey.
29
Value vs. approach: survey the 70+ AI solutions in Intel's portfolio and rank the business value of each
1. Challenge
This journey began with a survey of potential AI opportunities, starting with
internal brainstorming, surveying the external landscape, and combing
through the 70+ solutions in the Intel AI builders program. Once we identified
some promising opportunities, the next step was to assess and rank the
business value of implementing each AI solution.
2. Approach
The next step was to work with Intel’s experts to identify the best approach
(analytics, ML, DL, etc.) and estimate the associated complexity/cost of each
solution. For example, building a new deep learning model from scratch is
more costly than building off an existing deep learning model, which in turn is
more costly than using a known machine learning method. When we then plotted the
top AI opportunities on a value/simplicity chart, it became clear which project
would deliver the highest ROI: automating underwater industrial defect
detection using deep learning image recognition.
3. Values
Before going any further down the chosen path, it’s important to assess the “other”
30
ramifications of an AI implementation, beyond the dollars and cents. In this case, we
discussed the legal, social, and ethical issues that may arise, what we could do to
mitigate them, and whether we had any showstopper risks. We also
documented the assessment and mitigation plan to revisit if/when this pilot
goes into production.
4. People
The next step was to secure organizational buy‐in and build up the right talent. This step
is crucial, because if key stakeholders aren’t ready to accept data‐driven insights, then all
the work ahead may be for naught. A classic example is the initial resistance to data
analytics in sports, where general managers and scouts scoffed at the idea of computer
algorithms outsmarting their years of experience and tribal knowledge. We used other
Intel AI solution briefs and customer testimonials to get buy‐in, as well as to ensure that the
organization was ready to embrace the fact that AI development is different, involving
more trial & error and uncertainty than traditional software development. Next, we
assessed the talent situation and determined that training up existing
developers through Intel’s free AI developer program was the best approach.
30
Intel AI case study (cont’d)
(Journey sidebar: Challenge, Approach, Values, Technology, Data, Model)
6. Data – Prepare data for model development, working with Intel and/or a partner to get the time-consuming data layer right (~12 weeks): Source Data → Transmit Data → Ingest Data → Cleanup Data → Integrate Data → Stage Data
7. Model – Develop the model by training, testing inference, and documenting results, working with Intel and/or a partner for the pilot (~12 weeks): Train Model (topology experiments) → Train (tune hyperparameters) → Test Inference → Document Results
Project breakdown is approximated based on engineering estimates for time spent on each step in this real customer POC/pilot; time distribution is expected to be similar but vary somewhat for other deep learning use cases 31
While the time slice breakdowns you’ll see on this slide are only based on this example,
other projects will vary slightly but generally follow the same process.
6. Data
One of the biggest barriers to AI, which is often overlooked, is getting your data ready.
From sourcing to storing to preparing/cleansing data for analysis, Intel worked with this
customer to get their data layer right – a stage that took about as long as the actual
model development itself!
7. Model
Once the data was ready, the team began experimenting with various topologies and
tuning hyperparameters through iterative training runs. Once a sufficiently high
accuracy was reached, the trained model was tested against a control data set, and
inference results achieved a high enough accuracy to proceed to the cleanup &
documentation phase. About 60% of the time was spent training, whereas the rest was
testing & documentation.
31
Intel AI case study (cont’d)
8. Deploy – Engage an Intel AI Builders partner to deploy & scale
Pipeline: drones → media store → data ingest → prepare data → training (model store) → inference (label store) → service layer → solution layer, providing real-time object detection and data collection.
Approximate scale shown in the original diagram: 10 drones with 10 cameras at 8 TB/day per camera, ~20M frames inferenced and labeled per day, 3x replication, one-day ingest retention, 1-year video retention, ~10k clips stored, intermittent inference use, and roughly 1 training run per month for <10 hours; ~110 nodes in total, including data ingestion (4 nodes), prepare data (2 nodes), training (16 nodes), label store (4 nodes), service layer (3 nodes plus 4 management nodes), and media store/server (3 nodes). Data-center nodes are 2S Intel® Xeon® Scalable (81xx/61xx) servers with 4 TB SSDs (up to 20 per media node); each drone carries 1x Intel® Core™ processor and 1x Intel® Movidius™ VPU.
Software: OpenVINO™ Toolkit, TensorFlow*, Intel® Movidius™ SDK, Intel® MKL-DNN
32
8. Deploy
The final (and arguably most complex) stage is to take the pilot to production,
deploying the model at scale.
In this case, our customer joined forces with a partner from the Intel AI Builders
program to put together a “real world” AI solution.
The colorful block diagram at the top is a functional description of each step in this
industrial defect detection scenario. There are 10 underwater drones that are
equipped with video cameras to monitor heavy industrial equipment in order to detect
potential defects. These drones capture videos of the underwater equipment, which is
then ingested into the data center. Those videos are stored, for use in re‐training the
model and future reference, as well as passed to the inference cluster to determine if
and where defects are. For re‐training, human experts label images where the
equipment was present or not, and where defects were present or not, in order to
continue building the dataset and achieve even higher levels of accuracy. The latest
trained models are stored, with one being deployed to perform object recognition
inference on the drone (to aim the cameras at the equipment itself), and the other
deployed to perform defect image recognition inference in the data center on the
ingested video streams. As possible defects are identified, the inference output is sent
32
to both the service layer (for human audit) and the solution layer, where it is used as part
of a larger decision process to determine whether to call a technician and/or shut down
the equipment.
The stacks at the bottom of this slide illustrate the infrastructure – both hardware and
software – underlying each colored step in the solution. This includes a whole lot of
Xeon‐based servers in the data center with SSD storage, Movidius VPU’s in the drones,
and Intel AI software like the OpenVINO toolkit, the Movidius SDK, and the latest Intel‐
optimized version of TensorFlow with MKL‐DNN.
THE BOTTOM LINE here is that AI in the real world is much more involved than in the lab,
and Intel & our partners are here to help you… not only with your deployment at scale, but
to accelerate each and every step in your AI journey. Next, we’ll see what Intel AI brings to
the table.
32
Key learning
33
33
Addressing the AI journey in the classroom
• An enterprise problem is too large and complex to address in a classroom
• Pick a smaller challenge and understand the steps to later apply to your enterprise problems
₋ Defining a challenge
₋ Technology choices
₋ Training a model and deploying it on CPU, integrated Graphics, Intel® Movidius™ Neural Compute Stick
34
34
35
The AI journey – steps we will cover in this course:
1. Challenge, 2. Approach, 3. Values, 4. People, 5. Technology, 6. Data, 7. Model, 8. Deploy
36
36
Step 1 – The challenge
37
37
38
Step 5 - Compute choices for training and inference
• Intel® AI DevCloud
• Microsoft Azure*
39
39
40
Intel® AI DevCloud
• A cloud-hosted hardware and software platform available to Intel® AI Academy members to learn, sandbox, and get started on artificial intelligence projects
• Intel® Xeon® Scalable processors (Intel® Xeon® Gold 6128 CPU @ 3.40 GHz, 24 cores with 2-way hyper-threading), 96 GB of on-platform RAM (DDR4), 200 GB of file storage
• https://software.intel.com/ai-academy/tools/devcloud
41
41
Optimized Software – No install required
• Intel® Distribution for Python* 2.7 and 3.6, including NumPy, SciPy, pandas, scikit-learn, Jupyter, matplotlib, and mpi4py
• More frameworks coming as they are optimized
• Intel® Parallel Studio XE Cluster Edition and the tools and libraries included with it:
₋ Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN)
42
42
DevCloud overview
43
43
44
Choosing your cloud compute
Amazon Web Services* (AWS), Microsoft Azure* (Azure), Google Compute Engine* (GCE):
₋ Better: Intel® Xeon® Scalable processors (code-named Skylake)
₋ Best: 2nd Gen Intel® Xeon® Scalable processors (code-named Cascade Lake)
45
Not all servers are equal; each CSP offers different server choices. Depending on your
favorite CSP, we recommend looking for these types of instances to get processors that
support the optimized software features, e.g., AVX-512 and/or VNNI.
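One quick, Linux-only way to check whether a given instance's processor exposes those features is to inspect the CPU flags reported by the kernel; a small sketch:

# Linux-only: look for the AVX-512 and VNNI feature flags in /proc/cpuinfo
with open("/proc/cpuinfo") as f:
    flags = f.read()

print("AVX-512F :", "avx512f" in flags)      # foundation AVX-512 (Skylake-SP and later)
print("VNNI     :", "avx512_vnni" in flags)  # Intel DL Boost / VNNI (Cascade Lake and later)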
45
46
System configuration
Supported hardware:
⁻ 6th to 8th generation Intel® Core™ processors and Intel® Xeon® processors
⁻ Intel® Pentium® processor N4200/5, N3350/5, or N3450/5 with Intel® HD Graphics
Supported operating systems:
⁻ Windows® 10 (64-bit)
⁻ Ubuntu* 16.04.3 LTS (64-bit)
⁻ CentOS* 7.4 (64-bit)
https://software.intel.com/en-us/openvino-toolkit/hardware
47
47
Create an Anaconda environment
1. Navigate to the root directory of the class.
3. Add this environment to the list of available environments you'll see in your Jupyter notebook.
4. Run source activate intel_dc2edge or conda activate intel_dc2edge to activate the environment.
Now you'll be able to use all the libraries you'll need to complete the exercises!
Note: if you run into any problems while creating the environment, deactivate and then delete the environment and start back at step 1.
48
48
49
Connect to your DevCloud account
Obtain an account on Intel® AI DevCloud
50
hub.colfaxresearch.com
50
Access your DevCloud Jupyter Notebook account
4. Enter the previously copied username and password to access your Jupyter notebook account.
5. Click on the 'New' menu on the right side of the page and select 'Terminal' to access the terminal.
51
51
52
Step 6 – Exploratory data analysis
₋ Preprocess
₋ Data augmentation
53
53
Obtain a starter dataset
• Look for existing datasets that are similar to or match the given problem
54
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8014855
54
Initial assessment of the dataset
A Large and Diverse Dataset for Improved Vehicle
Make and Model Recognition
₋ Large in scale and diversity
₋ Images are collected from Craigslist
₋ Contains 9170 classes
₋ Identified 76 Car Manufacturers
₋ 291,752 images in total
₋ Manufactured between 1950–2016
55
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8014855
55
Dataset for the stolen cars challenge
Hottest Wheels: The Most Stolen New And Used Cars In The U.S.
⁻ Chevrolet Silverado (2004): 30,056# (# indicates the number of cars of that model stolen in 2017)
56
The problem we are trying to solve is based on the hottest wheels – most stolen cars.
https://www.forbes.com/sites/jimgorzelany/2018/09/18/hottest‐wheels‐the‐most‐stolen‐
new‐and‐used‐cars‐in‐the‐u‐s/#3e9577545258
56
Prepare the dataset for the challenge
• Map vehicles from multiple model years to the corresponding stolen-car category (based on exterior similarity)
57
57
Preprocess the dataset
• Fetch and visually inspect a dataset
• Image Preprocessing
58
58
Inspect the dataset
• Visually Inspecting the Dataset
₋ Taking note of variances
› ¾ view
› Front view
› Back view
› Side View, etc.
› Image aspect ratio differs
• Sample Class name:
₋ Manufacturer
₋ Model
₋ Year
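As an illustration of this kind of visual inspection, here is a small matplotlib sketch; the dataset path and per-class folder layout are assumptions for illustration, not the course's exact structure.

import glob
import matplotlib.pyplot as plt
from PIL import Image

# Hypothetical layout: dataset/<Manufacturer_Model_Year>/*.jpg
sample_paths = glob.glob("dataset/*/*.jpg")[:9]

fig, axes = plt.subplots(3, 3, figsize=(9, 9))
for ax, path in zip(axes.ravel(), sample_paths):
    img = Image.open(path)
    ax.imshow(img)
    # The class name (manufacturer, model, year) comes from the folder name
    ax.set_title(path.split("/")[-2], fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()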
59
59
Data creation
• Honda Civic (1998)
60
We wanted an "others" category containing everything but the top 10 most stolen models. After
experimentation we discovered that it had too many similarities to the other 10 categories, so we
ended up retraining without the "others" category and got a significant improvement in prediction.
60
Preprocessing & Augmentation
Preprocessing
• Removes inconsistencies and incompleteness in the raw data and cleans it up for model consumption
• Techniques:
– Black background
– Rescaling, gray scaling
– Sample-wise centering, standard normalization
– Feature-wise centering, standard normalization
– RGB → BGR
Data augmentation
• Improves the quantity and quality of the dataset
• Helpful when the dataset is small or some classes have less data than others
• Techniques:
– Rotation
– Horizontal & vertical shift, flip
– Zooming & shearing
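A minimal sketch of several of these augmentation techniques using the Keras ImageDataGenerator referenced elsewhere in this course; the directory path and parameter values are illustrative, not the course's exact settings.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation: rotation, shifts, flips, zoom, shear; rescaling as preprocessing
datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # rescaling
    rotation_range=20,        # rotation
    width_shift_range=0.1,    # horizontal shift
    height_shift_range=0.1,   # vertical shift
    horizontal_flip=True,     # flip
    zoom_range=0.2,           # zooming
    shear_range=0.2,          # shearing
)

# Flow batches from a (hypothetical) directory of per-class image folders
train_gen = datagen.flow_from_directory(
    "dataset/train", target_size=(224, 224), batch_size=32, class_mode="categorical"
)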
61
61
Preprocessing
62
62
RGB channels
• Images are made of pixels
63
63
RGB – BGR
• Depending on the network choice, an RGB→BGR conversion is required (e.g., networks with Caffe-ported weights such as VGG16 expect BGR input; Keras' vgg16.preprocess_input performs this conversion):
>> from keras.applications.vgg16 import preprocess_input
>> keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
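Under the hood this conversion simply reverses the channel axis; a tiny illustrative sketch (the image array here is a random placeholder):

import numpy as np

rgb = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)  # H x W x RGB
bgr = rgb[..., ::-1]  # reverse the last (channel) axis: RGB -> BGR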
64
64
DATA Augmentation
• Oversample Minority Classes in Training
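One simple way to oversample minority classes (a sketch, not the course's code; the class names and file lists below are hypothetical) is to resample each class's examples up to the majority-class count:

import numpy as np
from sklearn.utils import resample

# Hypothetical lists of image file paths per class; some classes are much smaller than others
files_by_class = {
    "Honda_Civic_1998": ["civic_001.jpg", "civic_002.jpg"],                    # minority class
    "Chevrolet_Silverado_2004": [f"silv_{i:03d}.jpg" for i in range(50)],      # majority class
}

max_count = max(len(v) for v in files_by_class.values())
balanced = {
    cls: resample(paths, replace=True, n_samples=max_count, random_state=0)
    for cls, paths in files_by_class.items()
}
print({cls: len(paths) for cls, paths in balanced.items()})  # every class now has max_count samples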
65
65
summary
Before Preprocessing
After Preprocessing
66
66
67
Step 7 – The training/model phase
68
68
69
Decision metrics for choosing a framework
70
70
Optimized deep learning frameworks
INSTALL AN INTEL-OPTIMIZED FRAMEWORK AND FEATURED TOPOLOGY
SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MlLib on Spark, Mahout)
*Limited availability today 71
Other names and brands may be claimed as the property of others.
How do you unleash all that deep learning performance on the Intel® Xeon®
Processor? Well, you need to install an Intel-optimized framework to get started.
Intel aims to ensure that all major DL frameworks and topologies will run well on
Intel Architecture, and customers are free to choose whichever framework(s) best
suit their needs. We’ve been directly optimizing the most popular AI frameworks for Intel
Architecture (based on market demand) and producing huge speedups. We intend to
enable even more frameworks in the future through the Intel® nGraph™ Compiler. Please
note that each of these frameworks has a varying degree of optimization and
configuration protocols, so visit ai.intel.com/framework‐optimizations/ for full details. Of
special note is the BigDL framework that’s been getting a LOT of traction lately with
customers who want an easy way to achieve high‐performance deep learning on their
existing big data/analytics infrastructure. BigDL is a distributed deep learning library for
Spark that can run directly on top of existing Spark or Apache Hadoop* clusters with
support for Scala or Python programming languages.
ai.intel.com/framework-optimizations/
71
Caffe / TensorFlow / PyTorch frameworks
Developing deep neural network models can be done faster with machine learning
frameworks/libraries. There is a plethora of framework choices, and the decision of
which to choose is very important. Some of the criteria to consider are:
1. Open source availability and level of adoption
2. Optimizations on CPU
3. Graph visualization
4. Debugging
5. Library management
6. Inference target (CPU / integrated graphics / Intel® Movidius™ Neural Compute Stick / FPGA)
Considering all these factors, we decided to use the Google deep learning framework TensorFlow.
72
72
Why did we choose TensorFlow?
The choice of framework was based on:
Open source and high level of adoption
₋ Supports more features and has the 'contrib' package for creating additional models, which allows
support for more higher-level functions.
Optimizations on CPU
₋ TensorFlow with CPU optimizations can give up to 14x Speedup in Training and 3.2x Speedup in Inference!
TensorFlow is flexible enough to support experimentation with new deep learning models/topologies and
system level optimizations. Intel optimizations have been up-streamed and are part of public TensorFlow*
GitHub repo.
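If you want to take explicit advantage of those CPU optimizations, a common configuration sketch for TensorFlow 1.x (the era of this course) looks like the following; the thread counts and environment values are examples to tune for your own core count, not prescribed settings.

import os
import tensorflow as tf

# Typical environment settings for Intel-optimized TensorFlow (values are examples)
os.environ["OMP_NUM_THREADS"] = "24"          # roughly the number of physical cores
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"

config = tf.ConfigProto(
    intra_op_parallelism_threads=24,  # parallelism within a single op
    inter_op_parallelism_threads=2,   # parallelism across independent ops
)
sess = tf.Session(config=config)      # use this session for training/inference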
73
73
Why did we choose TensorFlow?
The choice of framework was based on:
• Graph Visualization: compared to its closest rivals like Torch and Theano, TensorFlow has better
computational graph visualization with Tensor Board.
• Debugging: TensorFlow uses its debugger called the ‘tfdbg’ TensorFlow Debugging, which lets you
execute subparts of a graph to observe the state of the running graphs.
• Library Management: TensorFlow has the advantage of the consistent performance, quick updates
and regular new releases with new features. This course uses Keras which will enable an easier
transition to TensorFlow 2.0 for training and testing models.
74
74
75
How to select a network?
We started this project with the plan for inference on an edge device in mind as our ultimate deployment
platform. To that end we always considered three things when selecting our topology or network: time to
train, size, and inference speed.
• Time to Train: Depending on the number of layers and computation required, a network can take a
significantly shorter or longer time to train. Computation time and programmer time are costly resources,
so we wanted a reduced training times.
• Size: Since we're targeting edge devices and an Intel® Movidius™ Neural Compute Stick, we must consider
the size of the network that is allowed in memory as well as supported networks.
• Inference Speed: Typically the deeper and larger the network, the slower the inference speed. In our use
case we are working with a live video stream; we want at least 10 frames per second on inference.
• Accuracy: It is equally important to have an accurate model. Even though, most pretrained models have
their accuracy data published, but we still need to discover how they perform on our dataset.
76
Inception v3 - VGG16 – MOBILENET networks
We decided to train our dataset on three networks that are currently supported on our edge devices
(CPU, Integrated GPU, Intel® Movidius™ Neural Compute Stick).
The model in the original paper* was trained on ResNet-50; however, ResNet-50 is not currently supported
on the Intel® Movidius™ Neural Compute Stick.
• Inception v3
• VGG16
• MobileNet
*http://vmmrdb.cecsresearch.org/papers/VMMR_TSWC.pdf
77
Inception v3
https://arxiv.org/abs/1512.00567
78
ImageNet 2015
Szegedy et al., 2014
Idea: the network should be able to use different receptive fields at the same layer
We also want computational efficiency
And sparse activations of groups of neurons
Hebbian principle: "fire together, wire together"
Solution: turn each layer into parallel branches of convolutions
Each branch handles a smaller portion of the workload
Concatenate the different branches at the end
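To make the branch-and-concatenate idea concrete, here is a minimal Keras sketch of an Inception-style block; the filter counts are illustrative and this simplified module is not the exact Inception v3 block.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x):
    # Parallel branches with different receptive fields
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)    # 1x1
    b2 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(96, 3, padding="same", activation="relu")(b2)   # 1x1 -> 3x3
    b3 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(96, 5, padding="same", activation="relu")(b3)   # 1x1 -> 5x5
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(32, 1, padding="same", activation="relu")(b4)   # pool -> 1x1
    # Concatenate the branches along the channel axis
    return layers.Concatenate()([b1, b2, b3, b4])

inputs = tf.keras.Input(shape=(224, 224, 3))
outputs = inception_block(inputs)
model = tf.keras.Model(inputs, outputs)
model.summary()
```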
78
VGG16
79
One of the first architectures to experiment with many layers (a "more is better" approach)
Uses multiple 3x3 convolutions to simulate larger kernels with fewer parameters:
two stacked 3x3 convolutions have the same receptive field as one 5x5
three stacked 3x3 convolutions have the same receptive field as one 7x7
For c input and output channels, one 3x3 convolution has 3x3xcxc = 9c² weights, so three stacked 3x3 layers use 27c² weights,
versus 7x7xcxc = 49c² weights for a single 7x7 layer.
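A quick numeric check of those weight counts (ignoring biases), assuming every layer maps c channels to c channels:

```python
def conv_weights(kernel, channels):
    # kernel x kernel convolution mapping `channels` inputs to `channels` outputs
    return kernel * kernel * channels * channels

c = 64  # example channel count
print(conv_weights(3, c))      # 9*c*c  = 36864  (one 3x3 layer)
print(3 * conv_weights(3, c))  # 27*c*c = 110592 (three stacked 3x3 layers)
print(conv_weights(7, c))      # 49*c*c = 200704 (one 7x7 layer)
```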
79
MOBILENET
https://arxiv.org/pdf/1704.04861.pdf
80
Picked initially due to its small size
Uses global hyperparameters that efficiently trade off between latency and accuracy
These hyperparameters allow the model builder to choose the right-sized model for their
application based on the constraints of the problem
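As an illustration of those global hyperparameters, Keras exposes MobileNet's width multiplier as the alpha argument and the input resolution via input_shape; the values below are example settings, not the ones used in this course.

```python
import tensorflow as tf

# Width multiplier alpha < 1.0 shrinks every layer, trading accuracy for latency/size.
small_mobilenet = tf.keras.applications.MobileNet(
    input_shape=(128, 128, 3),  # reduced input resolution
    alpha=0.5,                  # half-width model
    weights=None,               # train from scratch in this sketch
    classes=10,
)

full_mobilenet = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3),
    alpha=1.0,                  # full-width model
    weights=None,
    classes=10,
)

print(small_mobilenet.count_params(), full_mobilenet.count_params())
```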
80
Inception v3 - VGG16 - MOBILENET
After training and comparing the performance and results based on the previously discussed criteria, our
final choice of network was Inception v3.
• MobileNet was the least accurate model (74%) but had the smallest size (16 MB)
• VGG16 was the most accurate (89%) but the largest in size (528 MB)
81
As you will see in the hands-on section, your results will be similar.
81
summary
Based on your project's requirements, the choice of framework and topology will differ.
• Time to train
• Inference speed
• Acceptable accuracy
There is no one-size-fits-all approach to these choices, and some trial and error is involved in finding your
optimal solution.
82
83
Training and inference workflow
84
(Optional) Training using VGG16 and MobileNet
• Try out Optional-Training_VGG16.ipynb
85
Model analysis
• Understand how to interpret the results of the training
by analyzing our model with different metrics and
graphs
₋ Confusion Matrix
₋ Classification Report
₋ Precision-Recall Plot
₋ ROC Plot
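A minimal sketch of computing these metrics with scikit-learn (using made-up labels and scores rather than this course's model outputs):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc

# Illustrative ground-truth labels and model predictions for a 3-class problem
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1, 0, 2])

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["classA", "classB", "classC"]))

# ROC curves need per-class scores; here we use made-up probabilities for class 0 only
y_score_class0 = np.array([0.9, 0.4, 0.2, 0.1, 0.3, 0.2, 0.6, 0.1, 0.8, 0.2])
fpr, tpr, _ = roc_curve((y_true == 0).astype(int), y_score_class0)
print("AUC for class 0:", auc(fpr, tpr))
```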
86
87
Step 8 – The deployment phase
88
What does deployment/inference mean?
89
1. Inference is the process of running computations on a custom or specially
trained AI model, often in systems where size, power, and real-time performance
constraints must be satisfied to ensure success.
89
What is inference on the Edge?
90
• https://towardsdatascience.com/deep-learning-on-the-edge-9181693f466c
1. Bandwidth and Latency
2. Security and Decentralization
3. Job-Specific Usage (Customization)
90
91
92
People Counter Solution
(comes with the Intel® Distribution of OpenVINO™ toolkit installation)
DESCRIPTION: An application capable of counting the number of people in a given input video frame, keeping a cumulative count
of people detected so far, and reporting the duration for which each person was present on the screen. This solution can be
leveraged as a people-traffic monitor in retail stores. The data can be utilized by store owners to optimize staffing, analyze store
sections, identify the hours that bring in maximum traffic, etc. The application uses a "ResMobNet_v4 (LReLU) with
single SSD head" model as its backbone.
HARDWARE REQUIREMENTS: Intel® Core™ system, Intel integrated GPU, Movidius VPU
93
Micro Emotion Recognition Solution
(comes with the Intel® Distribution of OpenVINO™ toolkit installation)
DESCRIPTION: This application demonstrates how to create a micro emotion recognition solution using Intel® hardware and software
tools. This solution is capable of mapping emotions to five categories: 'neutral', 'happy', 'sad', 'surprise', 'anger'. It can be
leveraged in behavioral analysis solutions for the market research industry, where video feeds of customer product
interaction are captured and analyzed in the interest of optimizing marketing strategies. The application uses a pipeline of
two models: one with a default MobileNet backbone that uses depth-wise convolutions and another that is a fully
convolutional network.
USE CASES: Emotion recognition for interviews, market research, video surveillance
HARDWARE REQUIREMENTS: Intel® Core™ system, Intel integrated GPU, Movidius VPU
94
https://towardsdatascience.com/background-removal-with-deep-learning-c4f2104b3157
94
95
Pre-trained models optimized for Intel architecture
The OpenVINO™ toolkit includes optimized pre-trained models that can expedite development and improve deep learning inference on
Intel® processors. Use these models for development and production deployment without the need to search for or to train your own
models.
Pre-Trained Models
• Age & Gender
• Face Detection – standard & enhanced
• Head Position
• Human Detection – eye-level & high-angle detection
• Detect People, Vehicles & Bikes
• Emotion Recognition
• Identify Someone from Different Videos – standard & enhanced
• Vehicle Detection
• Retail Environment
• Pedestrian Detection
• Pedestrian & Vehicle Detection
• Person Attributes Recognition Crossroad
• License Plate Detection: small & front facing
• Vehicle Metadata
• Identify Roadside Objects
• Advanced Roadside Identification
• Person Detection & Action Recognition
• Person Re-identification – ultra small/ultra fast
• Face Re-identification
• Landmarks Regression
96
Save Time with Deep Learning Samples & Computer Vision Algorithms
Samples
Use the Model Optimizer and Inference Engine for both public models as well as Intel pre-trained models with these samples:
• Object Detection
• Standard & Pipelined Image Classification
• Security Barrier
• Object Detection for Single Shot Multibox Detector (SSD) using Async API
Computer Vision Algorithms
Get started quickly on your vision applications with highly optimized, ready-to-deploy, custom-built algorithms using the pre-trained models:
• Face Detector
• Age & Gender Recognizer
• Camera Tampering Detector
• Emotions Recognizer
97
Sharpen the difference
97
98
Deep Learning vs. Traditional Computer Vision
OpenVINO™ has tools for an end to end vision pipeline
Diagram: the OpenVINO™ toolkit covers the pipeline with OpenCV*, OpenVX*, the Intel® SDK for OpenCL™ Applications, and the Intel® Media SDK.
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Key Takeaways
‐ OpenVINO™ has tools for both traditional and deep learning CV
‐ Multiple Intel tools (Media SDK, OpenVINO™, ISS) work together to provide a complete
CV pipeline optimization solution
‐ Using OpenVINO™ allows developers to maximize hardware performance through a common
API without having to program to the metal
‐ It is easy to incorporate deep learning with the Deep Learning Deployment Toolkit
‐ Traditional CV and deep learning are not mutually exclusive
OpenCL™ is used:
• as a requirement for running with the GPU target (clDNN) on Intel® Processor Graphics
• for custom kernels
• for other kernels in non-inference pipeline stages, such as color conversions
99
100
Intel® Deep Learning Deployment Toolkit
Train – Train a DL model. Currently supported: Caffe*, MXNet*, TensorFlow*.
Prepare/Optimize – Model Optimizer: converting, optimizing, and preparing the model for inference (device-agnostic, generic optimization).
Inference – Inference Engine: a lightweight API to use in applications for inference.
Optimize/Heterogeneous – the Inference Engine supports multiple devices for heterogeneous flows (device-level optimization).
Extend – the Inference Engine supports extensibility and allows custom kernels for various devices.
101
Step 1 – Train a model
1. A trained model is the input to the Model Optimizer (MO).
2. Use the frozen graph (.pb file) from the Stolen Cars model training as input.
3. The Model Optimizer provides tools to convert a trained model to a frozen graph in the event that this
has not already been done (see the sketch below).
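As a rough sketch of what freezing a graph looks like in TensorFlow 1.x (the tiny model, node names, and file paths below are illustrative assumptions, not the course's actual Stolen Cars model):

```python
import tensorflow as tf

# Minimal TF1-style graph: a single dense layer ending in a named softmax op
x = tf.placeholder(tf.float32, shape=[None, 4], name="input")
w = tf.Variable(tf.random_normal([4, 3]), name="weights")
b = tf.Variable(tf.zeros([3]), name="bias")
logits = tf.matmul(x, w) + b
probs = tf.nn.softmax(logits, name="output")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # (training would happen here)

    # Fold the trained variables into constants so the graph is self-contained
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"])

    # Write the frozen graph to a .pb file for the Model Optimizer
    tf.train.write_graph(frozen_graph_def, "./models", "frozen_model.pb", as_text=False)
```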
102
103
Step 2 – Model Optimizer (MO)
Improve Performance with Model Optimizer
Diagram: the Model Optimizer takes a trained model, then analyzes, quantizes, optimizes the topology of, and converts it to produce an Intermediate Representation (IR) file.
104
Improve Performance with Model Optimizer (cont’d)
The Model Optimizer performs generic optimizations:
• Node merging
• Horizontal fusion
• FP16/FP32 quantization
Data type support on the CPU: FP32 – yes, FP16 – no.
105
Improve Performance with Model Optimizer
EXAMPLE
1. Remove Batch normalization stage.
2. Recalculate the weights to ‘include’ the operation.
3. Merge Convolution and ReLU into one optimized kernel.
• The Model Optimizer is easier to install and easier to use for optimizations
• It improves the performance and output of the deep learning model
• It is written in Python, for a more efficient workflow
• When the model uses standard layers, you get faster performance without the overhead of the
original frameworks
106
Processing standard layers
• To generate IR files, the MO must recognize the layers in the model
• Some layers are standard across frameworks and neural network topologies
₋ Caffe: https://software.intel.com/en-us/articles/OpenVINO-Using-Caffe
₋ TensorFlow: https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow
₋ MXNet: https://software.intel.com/en-us/articles/OpenVINO-Using-MXNet
107
Processing custom layers (optional)
• Custom layers are layers not included in the list of layers known to MO
• Register the custom layers as Custom and use the system Caffe to calculate the output shape of each Custom Layer
108
109
Optimal Model Performance Using the Inference Engine
TRANSFORM MODELS & DATA INTO RESULTS & INTELLIGENCE
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
110
Inference Engine
• The Inference Engine is a C++ library with a
set of C++ classes that application
developers use in their applications to infer
input data (images) and get the result.
• The library provides an API to read the
IR, set input and output formats, and
execute the model on various devices.
Diagram: Train a Model → Run Model Optimizer → IR (.xml + .bin) → Inference Engine → User Application
111
Layers Supported by Layer Type
• Convolution, Fully Connected, and Add layers are supported by the CPU, FPGA, GPU, and MyriadX plugins.
• Other layers (for example Permute, PriorBox, SimplerNMS, Detection Output, Memory/Delay Object, and Tile) are supported on a subset of devices; see the supported-devices documentation for the full matrix.
• Intel® Movidius™ Neural Compute Stick – Intel® Movidius™ Myriad™ VPU plugin
– Supports FP16 data types; FP11 is coming
– A set of 28 layers is supported on Intel® Movidius™ Myriad™ X; non-supported layers must be inferred through other Inference Engine (IE) plugins
https://docs.openvinotoolkit.org/latest/_docs_IE_DG_supported_plugins_Supported_Devices.html
https://github.com/01org/mkl-dnn
https://github.com/01org/clDNN
112
113
Install the Intel® OpenVINO™ Toolkit
• Installation instructions can be found on this link: https://software.intel.com/en-us/openvino-toolkit/choose-
download
• Follow the instructions for TensorFlow*
• Test out some of the samples before we begin
• Before running inference, you will need to convert the frozen graph obtained from training to Intermediate
Representation using the Model Optimizer (MO)
114
115
Generate optimized Intermediate Representation (IR) using MO
Configure the Model Optimizer for TensorFlow*:
⁻ Configure the Model Optimizer for the TensorFlow* framework by running the configuration bash script (Linux* OS) or batch file (Windows*
OS) from the <INSTALL_DIR>/deployment_tools/model_optimizer/install_prerequisites folder:
install_prerequisites_tf.sh
install_prerequisites_tf.bat
116
Generate optimized Intermediate Representation (IR) using MO
To convert a TensorFlow* model:
‾ Use the mo_tf.py script to convert a model by passing the path to the input .pb file; the output Intermediate Representation,
named result.xml and result.bin, is placed in the specified ../../models/ directory.
‾ The Model Optimizer can also be launched for a .pb file while reversing the channel order between RGB and BGR, specifying mean values for the
input, and setting the precision of the Intermediate Representation to FP16 (an illustrative command follows).
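As a hedged illustration of those two invocations (the file names, mean values, and output directory are placeholders; verify the flag names against the Model Optimizer documentation for your OpenVINO™ version):

```
# Basic conversion: .pb input, IR named result.xml/result.bin written to ../../models/
python3 mo_tf.py --input_model frozen_model.pb --model_name result --output_dir ../../models/

# Conversion with channel reversal (RGB<->BGR), per-channel mean values, and FP16 precision
python3 mo_tf.py --input_model frozen_model.pb --model_name result --output_dir ../../models/ \
    --reverse_input_channels --mean_values [123.68,116.78,103.94] --data_type FP16
```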
Note: The Model Optimizer does not reverse input channels from RGB to BGR by
default. To perform this reversal, manually specify the command-line parameter
--reverse_input_channels.
117
118
Hands-on Inference on the Edge - Tutorial:
• Introduction
• What it does
119
How it Works
120
OpenVINO™ App Execution flow
Load Plugin → Load Network → Load Model → Configure Input/Output → Prepare Input → Infer → Process Output
121
Steps to Inference
1. Load Plugin
2. Load Network
3. Load Model
4. Configure Input/Output
5. Prepare Input
6. Infer
7. Process Output
For example, loading the model onto the plugin: exec_net = plugin.load(network=net)
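Putting the steps together, a minimal sketch using the Python Inference Engine API of that era (the IR file names, input image, and device below are illustrative assumptions):

```python
import cv2
import numpy as np
from openvino.inference_engine import IENetwork, IEPlugin

# 1. Load the plugin for the target device ("CPU", "GPU", or "MYRIAD")
plugin = IEPlugin(device="CPU")

# 2./3. Read the network from the IR files and load it onto the plugin
net = IENetwork(model="result.xml", weights="result.bin")
exec_net = plugin.load(network=net)

# 4. Configure input/output blob names and read the expected input shape
input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape

# 5. Prepare input: resize and reorder HWC -> NCHW
frame = cv2.imread("car.jpg")
image = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape((n, c, h, w))

# 6. Infer and 7. process the output (here, just the top class index)
result = exec_net.infer(inputs={input_blob: image})
print("Top class:", int(np.argmax(result[output_blob])))
```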
122
Running inference on a Jupyter notebook
• You can also create IR files (bin/xml) by running the MO through a Jupyter notebook and infer using the
Inference Engine
• Refer to Part4-OpenVINO_Video_Inference.ipynb
⁻ Set the “arg_device” parameter to “CPU”, “GPU” or “MYRIAD” to run on the CPU, integrated graphics or the
Intel® Movidius™ Neural Compute Stick
123
124
The new 2nd Generation Intel® Xeon® Scalable processor family is drop-in
compatible with the previous Intel® Xeon® Scalable processor platform.
You can use it to:
‐ Achieve the deep learning performance you need
thanks to built‐in acceleration with Intel DL boost,
optimized DL SW frameworks, and the ability to
efficiently scale up to hundreds of nodes
‐ Lower TCO/increase utilization by sharing resources
between data center and AI workloads, with even
more agility thanks to new features like
IMT/ADQ/SST
‐ Confidently analyze your sensitive data with
hardware‐enhanced security including new features
like Intel SecL and TDT
‐ And so much more…
125
Intel® Deep Learning Boost (DL Boost)
FEATURING VECTOR NEURAL NETWORK INSTRUCTIONS (VNNI)
Diagram: the INT8 format – a sign bit plus mantissa, bits 07–00 (new).
126
Intel® Deep Learning Boost (VNNI) on the 2nd Generation Intel® Xeon® Scalable processor is
designed to deliver significant, more efficient deep learning (inference) acceleration.
• Intel® DL Boost (VNNI) is a new Intel®
Advanced Vector Extensions 512 (Intel® AVX-512)
instruction
• It is a fused multiply-add instruction, which
is often used in matrix manipulations as part
of deep learning inference
• The new VNNI instruction combines what
were three separate instructions into a single
processor instruction, saving clock cycles on
the processor
• VNNI can help to speed up image
classification, speech recognition,
language translation, object detection,
and more
126
127
workflow
The INT8 workflow is similar to the FP32 workflow, except for the use of the Calibration Tool.
128
Steps to convert a trained model and infer
The OpenVINO™ toolkit supports INT8 model inference on Intel processors:
₋ Convert the model from the original framework format using the Model Optimizer
tool. This outputs the model in Intermediate Representation (IR) format.
₋ Perform model calibration using the calibration tool within the Intel®
Distribution of OpenVINO™ toolkit. It accepts the model in IR format and is
framework-agnostic.
129
https://software.intel.com/en-us/articles/OpenVINO-ModelOptimizer
https://docs.openvinotoolkit.org/R5/_samples_calibration_tool_README.html
129
130
Course completion certificate
• You have the option to receive an Intel® AI Course Completion Certificate upon completion of
the end of the course quiz.
• Before taking the quiz, you may have to disable AdBlockers. (Ghostery, uBlock, AdGuard, etc.)
• Take the quiz
131
Take the Quiz https://intel.az1.qualtrics.com/jfe/form/SV_9EIVi2JXNF1ViiV
131
132
resources
• Intel® Distribution of OpenVINO™ Toolkit
• Learn more through the AI webinar series
133
https://software.intel.com/en-us/openvino-toolkit
https://github.com/NervanaSystems/coach
http://nlp_architect.nervanasys.com/
https://www.intel.ai/introducing-nauta/#gs.8hTP6kBc
https://software.intel.com/en-us/ai/frameworks/bigdl
https://software.intel.com/en-us/ai/frameworks/caffe
https://software.intel.com/en-us/ai/frameworks/tensorflow
https://software.intel.com/en-us/ai/courses/artificial-intelligence
https://software.intel.com/en-us/ai/courses/machine-learning
https://software.intel.com/en-us/ai/courses/deep-learning
https://software.intel.com/en-us/ai/courses/tensorflow
133
134