
Model Optimization for Edge ML

Machine Learning Model Optimization for Intelligent Edge

• Use cases built around IoT are endless.


• Devices are being used in homes to manage security, energy,
watering and appliances.
• Factories are optimizing operations and costs through
predictive maintenance.
• Cities are controlling traffic and applying IoT for public safety.
• Logistics companies are tracking shipments, doing fleet
management and optimizing routes.
• Restaurants are ensuring food safety in fridges and deep fryers,
retailers are deploying smart digital signage and implementing
advanced payment systems, and the list goes on.
Machine Learning Model Optimization for Intelligent
Edge

• To cope with this challenge, the "core" or "central" cloud
alone cannot deliver services at the scale and speed
expected in this era.
• Rather, support at the edge is going to be needed (in the
form of a separate cloud instance) to satisfy response
time demands and to deliver a superior user experience.
• In this work, we elaborate on the need for an edge,
focusing mainly on the AI/ML design strategy with the edge
playing a central role.
Smart devices, Intelligent Edge,
Continuous Learning
Edge ML Process
• The edge machine learning process spans three distinct tiers: the IoT devices, the Core cloud and the Edge cloud.
• First, the labelled (training) data is obtained separately, from one or more sources, outside of the main process.
• This data is then fed into ML algorithms for model training, which (for practical reasons) takes place in the back-end core.
• The core is positioned to do the heavy lifting of model training thanks to its (near) endless capacity and processing power.
• Once the model has reached acceptable accuracy, it is then deployed on the Edge to provide
data insights and real-time inferences based on the data collected locally from the devices.
• Model training, followed by pruning (trimming excess parameters) and validation, goes through an
iterative cycle to generate an optimal model before rolling it into production.
• This creates a closed-loop system in which user actions are captured and
fed back into the data set for re-training. This entails new features, modified
ground truth, or both.
• Re-training on the newer distribution helps improve accuracy and increases the capacity to handle a
wider range of data inputs.
Edge ML Subsystem Needs
• In a typical multi-tiered, multi-segment system, you need to
pay attention to each sub-system's characteristics.
• For example, the Edge compute layer has far less storage and
compute power compared to the back-end cloud data center.
• High performance and high throughput are expected of the
Edge to cater to real-time traffic with low-latency and quick
response time requirements.
• When it comes to re-training, tasks need to be carefully split
between the Edge and the Core cloud to maintain business
objectives while at the same time improving accuracy.
Continual Feedback loop

• The important ideas here are the feedback loop and the online model training/update stages.
• Pre-training and optimization take place in a full-scale central cloud.
• This way they can utilize its almost infinite compute power and storage.
• Improving model capacity with more factual field data results in a better-functioning
model at run time.
• The model can be shared across a number of (edge) devices and employed for a number of
tasks, with each device working in parallel.
Edge ML Approach Guidelines
1. Select the right chip set, with estimates for energy consumption and compute
performance
2. Understand storage requirements over time
3. Understand latency and throughput requirements
4. Understand your workload
5. Understand your traffic patterns
6. Understand your data growth
7. Understand memory/storage requirements
8. Hyper-parameter knowledge and tuning
9. Re-use space and reduce storage overuse/overkill
10. Train and save (at core), transfer (model) and load (at edge)
11. Containerization of AI services: well orchestrated and manageable
12. Reproducible and secure pipelines (such as using Kubeflow)
Benefits of Model Optimization
• Fewer resources required: models can be deployed to edge devices with
restrictions on processing, memory, or power consumption, for
example mobile and Internet of Things (IoT) devices.
• Efficiency: reduced model size can help improve productivity,
especially when deployed on the Edge.
• Latency: there's no round trip to a server, which also aids compliance.
• Privacy: no data needs to leave the device or edge gateway, hence
better security.
• Connectivity: an Internet connection isn't required for business
operation.
• Power consumption: matrix multiplication operations require
compute power. Fewer neurons mean less power consumption.
Model Optimization Techniques
Pruning
• Pruning describes a set of techniques to trim network size (by removing nodes, not layers) to
improve computational performance and sometimes resolution performance.
• The gist of these techniques is removing nodes from the network during training by
identifying those nodes which, if removed from the network, would not noticeably
affect network performance (i.e., resolution of the data).
• Even without using a formal pruning technique, you can get a rough idea of which
nodes are not important by looking at your weight matrix after training; look for weights
very close to zero, as it's the nodes on either end of those weights that are often
removed during pruning.
• Pruning neural networks is an old idea going back to 1990 (with Yann LeCun's
Optimal Brain Damage work) and before.
• The idea is that among the many parameters in the network, some are redundant
and don’t contribute a lot to the output.
• If you could rank the neurons in the network according to how much they contribute,
you could then remove the low ranking neurons from the network, resulting in a
smaller and faster network.
Pruning Algorithms
• By applying a pruning algorithm to your network during
training, you can approach an optimal network
configuration.
• The ranking of neurons can be done according to
the L1/L2 mean of their weights, their mean activations, the
number of times a neuron wasn't zero on some
validation set, and other creative methods.
• After pruning, the accuracy will drop (hopefully not
too much if the ranking is clever), and the network is
usually trained further to recover.
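As a rough illustration of magnitude-based ranking, the sketch below scores the neurons of a single dense layer by the L1 norm of their incoming weights and zeroes out the lowest-ranked fraction. The layer shape, the 30% pruning fraction and the prune_neurons helper are illustrative assumptions, not part of the lecture; in a real workflow the network would be fine-tuned afterwards to recover accuracy.

```python
# A minimal sketch of magnitude-based (structured) neuron pruning in NumPy.
import numpy as np

def prune_neurons(W, b, prune_fraction=0.3):
    """Zero out the lowest-ranked neurons of one dense layer.

    W: weight matrix of shape (inputs, neurons); b: bias vector (neurons,).
    Neurons are ranked by the L1 norm of their incoming weights; the
    lowest-ranked fraction is zeroed out, creating structured sparsity.
    """
    scores = np.abs(W).sum(axis=0)                 # L1 score per neuron
    n_prune = int(prune_fraction * W.shape[1])
    prune_idx = np.argsort(scores)[:n_prune]       # lowest-scoring neurons
    W_pruned, b_pruned = W.copy(), b.copy()
    W_pruned[:, prune_idx] = 0.0
    b_pruned[prune_idx] = 0.0
    return W_pruned, b_pruned

# Example: prune 30% of the neurons of a randomly initialised layer.
W = np.random.randn(128, 64).astype(np.float32)
b = np.zeros(64, dtype=np.float32)
W_p, b_p = prune_neurons(W, b, prune_fraction=0.3)
print("zeroed neurons:", int((np.abs(W_p).sum(axis=0) == 0).sum()))
```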
Pruning Algorithms (for Neural
Network)

Pruning for SVM: https://www.atlantis-press.com/article/10.pdf


Dimensionality Reduction
• When dealing with real-world problems, we often deal
with high-dimensional data that can run into millions of
data points.
• Depending upon your algorithm selection, techniques
like PCA (Principal Component Analysis) and RFE (Recursive
Feature Elimination) can prove quite useful in
reducing data dimensions and hence the model's space
requirements.
• This method is used primarily for non-neural-network-based
ML algorithms.
• Other techniques, such as pruning and the ones in forthcoming
sections, are more suitable for optimizing deep
learning models.
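A minimal sketch of this idea with scikit-learn: PCA reduces the feature dimensionality before a classical (non-neural-network) model is fit. The random data, the 0.95 variance target and the logistic-regression stage are illustrative assumptions, not values from the lecture.

```python
# Reduce input dimensionality with PCA before fitting a classical model.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = np.random.rand(1000, 300)            # 1000 samples, 300 raw features
y = np.random.randint(0, 2, size=1000)   # dummy binary labels

# Keep enough components to explain ~95% of the variance, shrinking the
# feature space (and the downstream model) considerably.
model = make_pipeline(PCA(n_components=0.95), LogisticRegression(max_iter=1000))
model.fit(X, y)
print("components kept:", model.named_steps["pca"].n_components_)
```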
Quantization
• Quantization techniques are particularly effective when applied during training and can
improve inference speed by reducing the number of bits used for model weights and
activations.
• For example, using an 8-bit fixed-point representation instead of 32-bit floats can speed up
model inference, reduce power and further reduce size by 4x (see the sketch after this list).
• We will look at several techniques and useful tips as we move on with our discussion.
1. Reduce parameter count with pruning and structured pruning. Practically, this means setting some of the
neural network parameters' values to zero, thus creating a sparse neural net (matrix).
Sparse matrices tend to compress better, resulting in overall model size
reduction.
2. Reduce representational precision with quantization. Quantizing deep neural networks
uses techniques that allow for reduced-precision representations of weights and,
optionally, activations, for both storage and computation. Weight pruning,
when combined with quantization, results in compound benefits.
3. Update the original model topology to a more efficient one with reduced parameters or
faster execution, for example tensor decomposition methods and distillation.
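To make the 4x size claim concrete, here is a minimal NumPy sketch of 8-bit affine quantization of a float32 weight tensor. The helper names and tensor shape are illustrative assumptions; production toolchains (such as TensorFlow Lite, covered later) handle this automatically.

```python
# 8-bit affine (asymmetric) quantization of a float32 weight tensor.
import numpy as np

def quantize_uint8(w):
    """Map float32 values to uint8 using a scale and a zero point."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0
    zero_point = np.round(-w_min / scale).astype(np.uint8)
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize(q, scale, zp)
print("size reduction:", w.nbytes / q.nbytes)            # ~4x (32-bit -> 8-bit)
print("max abs error:", float(np.abs(w - w_hat).max()))  # small quantization error
```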
Palettization
• Map the weights of a model to a discrete set of precomputed (or
learned) values.
• Inspired by an artist’s palette, the idea is to map many similar values
to one average or approximate value, then use those new values for
computing inference.
• In this way, palettization is similar in spirit to algorithmic
memoization, a dictionary, or a look-up table.
• Palettization can make a model smaller but does not make a model
faster since it incurs look-up time.
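A minimal sketch of the idea, assuming k-means clustering (via scikit-learn) as the way the palette is learned: each weight is replaced by an index into a 16-entry palette, so the tensor can be stored as 4-bit indices plus a small look-up table. The tensor shape and palette size are illustrative assumptions.

```python
# Palettization: cluster a layer's weights into a small palette of centroids.
import numpy as np
from sklearn.cluster import KMeans

w = np.random.randn(512, 128).astype(np.float32)

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(w.reshape(-1, 1))
palette = kmeans.cluster_centers_.flatten()               # 16 representative values
indices = kmeans.labels_.astype(np.uint8).reshape(w.shape)

# At inference time, weights are looked up (or reconstructed) from the palette.
w_palettized = palette[indices]
print("unique weight values:", np.unique(w_palettized).size)   # 16
print("max abs error:", float(np.abs(w - w_palettized).max()))
```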
Regularization
• One of the most common problems data science professionals face is overfitting.
• Avoiding overfitting can single-handedly improve your model's performance. L1 and L2
(weight decay) are the most common types of regularization.
• These update the general cost function by adding another term known as the
regularization term. Concretely: Cost function = Loss (say, binary cross-entropy) +
Regularization term.
• The regularization term pushes the weight matrices W to be reasonably close to zero. One piece of intuition is
that driving the weights of many hidden units close to zero essentially
zeroes out much of the impact of those hidden units.
• We can think of it as zeroing out, or at least reducing, the impact of a lot of the hidden
units, so you end up with what feels like a simpler network.
• It turns out that in practice the network still uses all the hidden units, but each
of them just has a much smaller effect.
• You do end up with a simpler network, as if you had a smaller network that is
therefore less prone to overfitting.
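A minimal Keras sketch of L2 (weight decay) regularization; the layer sizes and the 0.01 regularization factor are illustrative choices, not values from the lecture.

```python
# Add an L2 regularization term to the cost function via kernel_regularizer.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
])

# Cost function = binary cross-entropy loss + 0.01 * sum of squared weights,
# which pushes the weight matrices W towards zero during training.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```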
Hyper Parameter Tuning
• By tuning hyper-parameters, efficient networks can be engineered.
• This in turn results in superior run-time performance under
resource-constrained situations, such as on an IoT device or
MEC node.
• Some of the popular approaches are:
• Neural network depth — Concretely, the depth of a NN and the number of
neurons per hidden layer enhance the model's capability to work with more
complex decision boundaries. The selection of the number of neurons per
layer and the number of layers constitutes what's called the network
architecture. There is no hard and fast rule when it comes to deciding on the
hidden layer(s) dimensions; rather, your architectural choices will be based
on the empirical results obtained from different combinations. Typically,
you will treat the number of hidden layers as a tunable hyper-parameter.
Hyper Parameter Tuning
• Dropout ratio — Dropout is a technique to fight overfitting and improve neural
network generalization. It is one of the most interesting types of regularization
techniques and is frequently used in the field of deep learning.
• Dropout acts as a defensive mechanism against model over-fitting. At every iteration, it
randomly selects some nodes and removes them, along with all of their incoming and
outgoing connections. The dropout ratio is a hyper-parameter that controls the zeroing
out of neurons/weights and is supported by all major ML libraries such as Keras. The value
is normally set between 0.25 and 0.50.
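A minimal Keras sketch treating the number of hidden layers and the dropout ratio as tunable hyper-parameters; the input size, layer widths and candidate depths are illustrative assumptions, and in practice each configuration would be compared on a validation set.

```python
# Network depth and dropout ratio as tunable hyper-parameters.
import tensorflow as tf

def build_model(num_hidden_layers=2, units_per_layer=64, dropout_ratio=0.3):
    layers = [tf.keras.Input(shape=(32,))]
    for _ in range(num_hidden_layers):
        layers.append(tf.keras.layers.Dense(units_per_layer, activation="relu"))
        # Randomly zero out `dropout_ratio` of the activations during training.
        layers.append(tf.keras.layers.Dropout(dropout_ratio))
    layers.append(tf.keras.layers.Dense(10, activation="softmax"))
    return tf.keras.Sequential(layers)

# Compare candidate depths (smaller networks are cheaper to run at the edge).
for depth in (1, 2, 3):
    model = build_model(num_hidden_layers=depth, dropout_ratio=0.25)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    print("hidden layers:", depth, "parameters:", model.count_params())
```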
Hardware Acceleration
• At their core, neural networks are multi-dimensional arrays (matrices or
tensors) combined through mathematical operations like addition and
multiplication.
• Specialized hardware such as FPGAs, TPUs or GPUs rapidly manipulate
and alter memory to accelerate the overall process, i.e. model
training and execution.
• Edge TPU is Google’s purpose-built ASIC designed to run AI at the
edge. It delivers high performance in a small physical and power
footprint, enabling the deployment of high-accuracy AI at the edge.
• Other solutions, like AI2GO from Xnor.ai, provide pre-built, purpose-built
models that can run autonomously on small, inexpensive devices,
including the Raspberry Pi, with no connection to the Internet or a central
cloud needed.
Lightweight Frameworks
• In May 2017 Google introduced TensorFlow Lite for
mobile and edge device development.
• It is designed to make it easy to perform machine
learning at the edge, instead of sending data back and
forth to a server.
• TensorFlow Lite works with a huge range of devices,
from tiny microcontrollers to powerful mobile phones.
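A minimal sketch of running on-device inference with the TensorFlow Lite interpreter; "model.tflite" is a placeholder path and the dummy input is purely illustrative.

```python
# Load a converted TFLite model and run one inference locally (no server round trip).
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input of the right shape and dtype, then read the prediction.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction.shape)
```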
Putting it all together using Pipelines
Purpose Built Frameworks
• Frameworks like Learn2Compress from Google generalize this
learning by incorporating several state-of-the-art techniques
for compressing neural network models.
• It takes as input a large pre-trained TensorFlow model
provided by the user, performs training and
optimization and automatically generates ready-to-use
on-device models that are smaller in size, more
memory-efficient, more power-efficient and faster at
inference with minimal loss in accuracy.
TFLite-based Model Optimization
Types of optimization supported by
TFLite
• TensorFlow Lite currently supports optimization via
quantization, pruning and clustering.
• These are part of the TensorFlow Model Optimization Toolkit,
which provides resources for model optimization
techniques that are compatible with TensorFlow Lite.
Quantization
• Quantization works by reducing the precision of the numbers used to represent a model's
parameters, which by default are 32-bit floating point numbers, resulting in smaller model size
and faster computation.
• The following types of quantization are available in TensorFlow Lite:
Technique | Data requirements | Size reduction | Accuracy | Supported hardware
Post-training float16 quantization | No data | Up to 50% | Insignificant accuracy loss | CPU, GPU
Post-training dynamic range quantization | No data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android)
Post-training integer quantization | Unlabelled representative sample | Up to 75% | Small accuracy loss | CPU, GPU (Android), EdgeTPU
Quantization-aware training | Labelled training data | Up to 75% | Smallest accuracy loss | CPU, GPU (Android), EdgeTPU
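A minimal sketch of applying the post-training options from the table during TFLite conversion; "saved_model_dir", the output file name and the representative-dataset stub are placeholders for your own model and calibration data.

```python
# Post-training quantization during TensorFlow Lite conversion.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder

# Dynamic range quantization: needs no data and shrinks weights roughly 4x.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# For full integer quantization, additionally supply a small unlabelled
# representative sample so activation ranges can be calibrated, e.g.:
# def representative_dataset():
#     for img in rep_images[:100]:               # rep_images: your own samples
#         yield [img[None, ...].astype("float32")]
# converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```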
Quantization Decision Tree
Quantization Numbers
Model | Top-1 Accuracy (Original / Post-Training Quantized / Quantization-Aware Training) | Latency in ms (Original / Post-Training Quantized / Quantization-Aware Training) | Size in MB (Original / Optimized)
Mobilenet-v1-1-224 | 0.709 / 0.657 / 0.70 | 124 / 112 / 64 | 16.9 / 4.3
Mobilenet-v2-1-224 | 0.719 / 0.637 / 0.709 | 89 / 98 / 54 | 14 / 3.6
Inception_v3 | 0.78 / 0.772 / 0.775 | 1130 / 845 / 543 | 95.7 / 23.9
Resnet_v2_101 | 0.770 / 0.768 / N/A | 3973 / 2868 / N/A | 178.3 / 44.9
Full integer quantization: int16 activations,
int8 weights

• Quantization with int16 activations is a full integer quantization scheme with


activations in int16 and weights in int8.
• This mode can improve the accuracy of the quantized model in comparison to the full
integer quantization scheme with both activations and weights in int8, while keeping a
similar model size.
• It is recommended when activations are sensitive to quantization.
• NOTE: Currently only non-optimized reference kernel implementations are available in
TFLite for this quantization scheme, so by default the performance will be slow
compared to int8 kernels.
• Full advantages of this mode can currently be accessed via specialised hardware, or
custom software.
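A minimal sketch, assuming a SavedModel as input, of requesting the int16-activation/int8-weight scheme during conversion; the random calibration data stands in for a real representative sample, and "saved_model_dir" is a placeholder path.

```python
# 16x8 scheme: int16 activations, int8 weights, selected via target_spec.
import numpy as np
import tensorflow as tf

# Dummy calibration data; in practice use a small unlabelled representative sample.
calibration_data = np.random.rand(100, 224, 224, 3).astype("float32")

def representative_dataset():
    for sample in calibration_data:
        yield [sample[None, ...]]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_16x8_model = converter.convert()
```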
Development
workflow
• As a starting point, hosted models are studied to see if they
could work for the application.
• If not, the post-training quantization tool based approach is
typically adopted as it is broadly applicable and does not require
training data.
• For cases where the accuracy and latency targets are not met, or
hardware accelerator support is important,
quantization-aware training is considered the better option.
• Note: See additional optimization techniques under the
TensorFlow Model Optimization Toolkit.
• For further reduction in model size, pruning and/or clustering
prior to quantizing models is the typical approach.
Model Optimization Sequence
1. Choose the best model for the task: Depending on the task, you will need to make a
tradeoff between model complexity and size. If your task requires high accuracy, then you
may need a large and complex model. For tasks that require less precision, it is better to use
a smaller model because it not only uses less disk space and memory, but it is also
generally faster and more energy efficient.
2. Pre-optimized models: See if any existing TensorFlow Lite pre-optimized models provide
the efficiency required by your application.

3. Post-training tooling
If you cannot use a pre-trained model for your application, try using
TensorFlow Lite post-training quantization tools during TensorFlow Lite conversion, which can
optimize your already-trained TensorFlow model. See the post-training quantization tutorial to
learn more.
4. Training-time tooling
If the above simple solutions don't satisfy your needs, you may need to involve training-time
optimization techniques. Optimize further with our training-time tools and dig deeper.
Optimize further
• When pre-optimized models and post-training tools do not satisfy
your use case, the next step is to try the different training-time
tools.
• Training time tools piggyback on the model's loss function over the
training data such that the model can "adapt" to the changes
brought by the optimization technique.
• The starting point to use our training APIs is a Keras training script,
which can be optionally initialized from a pre-trained Keras model to
further fine tune.
• Training time tools available for you to try:
• Weight pruning
• Quantization
• Weight clustering
• Collaborative optimization
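A minimal sketch of one of these training-time tools, quantization-aware training with the TensorFlow Model Optimization Toolkit (tensorflow_model_optimization); the toy model and the commented-out fit call are illustrative, and in practice you would start from your own (optionally pre-trained) Keras model and data.

```python
# Quantization-aware training: fake-quantization ops are inserted so the model
# can "adapt" to quantization during fine-tuning.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

q_aware_model = tfmot.quantization.keras.quantize_model(base_model)
q_aware_model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
# q_aware_model.fit(x_train, y_train, epochs=1)  # fine-tune on your own data
q_aware_model.summary()
```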
TFLite Model Optimizations
Capabilities in the Pipeline
• Quantization
• Selective post-training quantization to exclude certain layers from
quantization.
• Quantization debugger to inspect quantization error losses for each layer.
• Applying quantization-aware training on more model coverage e.g.
TensorFlow Model Garden.
• Quality and performance improvements for post-training dynamic-range
quantization.
• Tensor Compression API to allow compression algorithms such as SVD.
• Pruning / sparsity
• Combine configurable training-time (pruning + quantization-aware training)
APIs.
• Increase sparsity application on TF Model Garden models.
• Sparse model execution support in TensorFlow Lite.
