Day5_03_Converting Neural Networks Model into Optimized Code
Uploaded by Sidi El Hacen. © All Rights Reserved.

Converting Neural Network Models into Optimized Code for MCUs: the STM32Cube.AI Tool

Amit Kumar
5th Aug 2021
STM32Cube.AI
STM32 comprehensive AI ecosystem

• Application frameworks: applicative examples (Function Packs)
• AI model converter: pre- and post-processing libraries, graph optimizer, memory optimizer, quantizer
• Edge hardware: STM32 series, Discovery kits, STM32 Nucleo boards, camera add-on
A tool to seamlessly integrate AI in your projects

From machine learning and deep learning models to a high-performance edge AI product:
1. Select MCU & upload your model
2. Optimize and validate
3. Generate project and deploy
The 3 pillars of STM32Cube.AI

Graph optimizer: automatically improve performance through graph simplifications & optimizations that benefit STM32 target HW architectures.
• Auto graph rewrite
• Node/operator fusion
• Layout optimization
• Constant folding…

Quantized model support: import your quantized ANN to be compatible with STM32 embedded architectures while keeping its performance.
• From FP32 to Int8
• Minimum loss of accuracy
• Code validation on target: latency, accuracy, memory usage
• Operator-level info to fine-tune memory footprint and computation

Memory optimizer: optimize memory allocation to get the best performance while respecting the constraints of your embedded design.
• Memory allocation
• Internal/external memory repartition
• Model-only update option

STM32Cube.AI is free of charge, available both as a graphical interface and as a command line.
Graph optimizer

Squeeze your graph to fit into an MCU!

• Fully automated process in the STM32Cube.AI workflow
• Your original graph is optimized at a very early stage for optimal integration into STM32 MCUs/MPUs
• Loss-less conversion
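The optimizer's passes are internal to the tool, but the flavor of a loss-less rewrite such as node/operator fusion can be sketched in plain NumPy. The example below folds a batch-normalization node into the preceding dense layer's constants; it is a hypothetical two-node graph illustrating the technique, not STM32Cube.AI's actual implementation:

```python
import numpy as np

def dense(x, W, b):
    return x @ W + b

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

def fold_bn_into_dense(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(Dense(x)) into a single Dense with rewritten constants."""
    s = gamma / np.sqrt(var + eps)       # per-output scale
    return W * s, (b - mean) * s + beta  # new weights, new bias

rng = np.random.default_rng(0)
W, b = rng.normal(size=(8, 4)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)

x = rng.normal(size=(3, 8))
ref = batchnorm(dense(x, W, b), gamma, beta, mean, var)
Wf, bf = fold_bn_into_dense(W, b, gamma, beta, mean, var)
fused = dense(x, Wf, bf)                 # one node instead of two
assert np.allclose(ref, fused)           # loss-less: same outputs
```

Two nodes become one, with identical outputs: exactly the kind of rewrite that shrinks both latency and footprint without touching accuracy.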
Quantized model support
Simply use quantized networks to reduce memory footprint and
inference time

STM32Cube.AI supports quantized neural network models with all parameter formats:
• FP32
• Int8
• Mixed binary Int1 to Int8 (QKeras*, Larq.dev*)

*Please contact [email protected] to request the relevant version of STM32Cube.AI

[Chart: latency & memory comparison for quantized models. The FP32 model sits near 700 kB of Flash; the Int8 and mixed Int1+Int8 versions are markedly smaller and faster.]
HW target: NUCLEO-STM32H743ZI2 at 480 MHz
Model: low-complexity handwritten digit reading
Accuracy: >97% for all quantized models
Test database: MNIST
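As a rough illustration of what the FP32-to-Int8 step buys, here is a minimal affine-quantization sketch in NumPy. The scale/zero-point scheme is a generic textbook one, not the tool's exact algorithm; storage shrinks 4x while the round-trip error stays within one quantization step:

```python
import numpy as np

def quantize_int8(w):
    """Asymmetric affine quantization of a float tensor to int8."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = int(round(-128 - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=1000).astype(np.float32)

q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)

assert q.nbytes == w.nbytes // 4           # 4x smaller storage
assert np.max(np.abs(w - w_hat)) <= scale  # error bounded by one step
```

On an MCU the smaller weights also mean int8 MAC instructions instead of FP32 math, which is where the latency gain in the chart comes from.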
Memory optimizer

Optimize performance easily with the memory allocation tool

Model memory allocation:
▪ Set your external memory
▪ Map in non-contiguous internal flash sections
▪ Partition internal vs external flash memories

Re-use the model input buffer to store activation data*:
▪ Minimize RAM requirements

Model RAM consumption per layer:
▪ Easily identify the most critical layers

Relocatable network:
▪ A separate binary is generated for the library and the network to enable standalone model upgrade

* Requires input and activation buffers in the same memory
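The allocation strategy itself is internal to the tool, but the core idea (activation buffers can be overlaid because their lifetimes do not overlap) can be sketched for a simple layer chain with hypothetical sizes:

```python
# For a simple feed-forward chain, only two activation buffers need to be
# live at once: a layer's input and its output. Peak RAM is therefore the
# max over consecutive pairs, not the sum of all activations.
def naive_ram(sizes):
    return sum(sizes)

def planned_ram(sizes):
    return max(a + b for a, b in zip(sizes, sizes[1:]))

acts = [3072, 16384, 8192, 4096, 10]   # bytes per layer output (hypothetical)
assert planned_ram(acts) == 16384 + 8192   # the two widest neighbors
assert planned_ram(acts) < naive_ram(acts)
```

Real models with branches need a proper lifetime analysis, but the per-layer RAM view in the tool is what lets you spot which "widest neighbors" dominate the peak.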


STM32Cube.AI
Get the best performance on STM32

[Chart: latency (ms), Flash and RAM compared for Image Classification v0.7 and Visual Wake Word v0.7, X-Cube.AI v7.2.0 vs TFLm v2.7.0; the lower the better. The reported bars favor X-Cube.AI on each metric.]

HW target: STM32H723 (Flash: 1 Mbyte, RAM: 564 Kbytes, Freq: 550 MHz)
SW versions: X-Cube.AI v7.2.0, TFLm v2.7.0
Making Edge AI possible with the whole STM32 portfolio
STM32Cube.AI is compatible with all STM32 series.

MPU:
• STM32MP1: 4158 CoreMark, up to 800 MHz Cortex-A7 + 209 MHz Cortex-M4

High-performance MCUs:
• STM32F2: up to 398 CoreMark, 120 MHz Cortex-M3
• STM32F4: up to 608 CoreMark, 180 MHz Cortex-M4
• STM32F7: 1082 CoreMark, 216 MHz Cortex-M7
• STM32H7: up to 3224 CoreMark, up to 550 MHz Cortex-M7 + 240 MHz Cortex-M4

Mainstream MCUs:
• STM32F0: 106 CoreMark, 48 MHz Cortex-M0
• STM32G0: 142 CoreMark, 64 MHz Cortex-M0+
• STM32F1: 177 CoreMark, 72 MHz Cortex-M3
• STM32F3: 245 CoreMark, 72 MHz Cortex-M4
• STM32G4: 569 CoreMark, 170 MHz Cortex-M4 (mixed-signal)

Ultra-low-power MCUs:
• STM32L0: 75 CoreMark, 32 MHz Cortex-M0+
• STM32L1: 93 CoreMark, 32 MHz Cortex-M3
• STM32L4: 273 CoreMark, 80 MHz Cortex-M4
• STM32L4+: 409 CoreMark, 120 MHz Cortex-M4
• STM32L5: 443 CoreMark, 110 MHz Cortex-M33
• STM32U5: 651 CoreMark, 160 MHz Cortex-M33

Wireless MCUs:
• STM32WL: 162 CoreMark, 48 MHz Cortex-M4 + 48 MHz Cortex-M0+ (radio co-processor)
• STM32WB: 216 CoreMark, 64 MHz Cortex-M4 + 32 MHz Cortex-M0+ (radio co-processor)


What's new in STM32Cube.AI v7.2.0?
The STM32Cube.AI development tool now supports deeply quantized neural networks.

Further reduce AI code with deep quantization:
• Support for mixed-precision quantization and binary neural networks (BNN) on STM32
• Support for pre-trained quantized models from QKeras and Larq

Improved performance tuning:
• New kernel performance enhancements for further optimization of memory footprint and power consumption

Up-to-date and improved code generation:
• Support for TensorFlow v2.9 models
• Support for new ONNX operators (refer to the documentation for the exhaustive list)
• Extended support of scikit-learn ML algorithms
STM32Cube.AI user flow (1/3)

1. Select MCU
   • Load the model and select an AI runtime
   • Analyze the minimal required footprint
   • Select corresponding STM32 MCUs
2. Optimize and validate
3. Generate project and integrate
STM32Cube.AI user flow (2/3)

1. Select MCU
2. Optimize and validate
   • Model complexity and footprint analysis
   • Fine-tune memory allocation with optimizations and the GUI
   • Optimize system parameters and the clock tree
   • Extend the model with your own custom layers
   • Validate on desktop with your own dataset
   • Validate on target and check inference time
3. Generate project and integrate
STM32Cube.AI user flow (3/3)

1. Select MCU
2. Optimize and validate
3. Generate project and integrate
   • Generate an application template
   • Integrate your application-specific code in your favorite IDE
   • Perform system tests
Possible conversion strategies:
Network code generation and interpreter

• The X-CUBE-AI Expansion Package integrates a specific path that generates a ready-to-use STM32 IDE project embedding the TensorFlow Lite for Microcontrollers runtime (also called TFLm) and its associated TFLite model. This can be considered an alternative to the default X-CUBE-AI solution for deploying an AI application based on a TFLite model.
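The trade-off between the two paths can be caricatured in a few lines of Python; the toy op list and the hand-written "generated" function below are illustrations of the two execution styles, not the real TFLm or X-CUBE-AI code:

```python
# A model stored as data (flatbuffer-style op list) run by a generic
# interpreter, vs the same model "generated" as a specialized function.
OPS = {"add": lambda x, c: x + c, "mul": lambda x, c: x * c}

model = [("mul", 2.0), ("add", 1.0)]   # serialized graph: y = 2x + 1

def interpret(model, x):               # TFLm-style: dispatch per node,
    for op, const in model:            # model stays data and can be swapped
        x = OPS[op](x, const)
    return x

def generated(x):                      # code-generation style: the graph
    return x * 2.0 + 1.0               # is baked into specialized code

assert interpret(model, 3.0) == generated(3.0) == 7.0
```

The interpreter keeps the model as replaceable data at the cost of per-node dispatch and carrying every built-in op; the generated path links only the ops actually used, which is why it is the more optimized route on a flash-constrained MCU.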

Possible conversion strategies:
Network code generation and interpreter

More flexible: TensorFlow Lite interpreter mode
• The pre-trained model is converted to a flatbuffer (.tflite).
• On the STM32 device, the model is interpreted and executed by the TFLite interpreter using pre-built and custom ops, on top of the user app, the STM32 BSP and the runtime.

More optimized: optimized C code generated by STM32Cube.AI
• The pre-trained model is converted to NN C files plus the STM32.AI library.
• On the STM32 device, the model is pre-compiled and linked only with the ops it uses, alongside the user app, the STM32 BSP and the runtime.
STM32Cube.AI vs TensorFlow Lite for Microcontrollers
Results with STM32Cube.AI v7.2

Image Classification on STM32U585 (NUCLEO-U575ZI-Q):
  X-CUBE-AI: 148 ms inference, 142 KiB Flash, 50 KiB RAM
  TFLM: 253 ms inference, 149 KiB Flash, 55 KiB RAM
  TFLM vs Cube.AI: +71% time, +5% Flash, +9% RAM

Visual Wake Word on STM32H7A3 (NUCLEO-H7A3ZI-Q):
  X-CUBE-AI: 53 ms inference, 301 KiB Flash, 58 KiB RAM
  TFLM: 72 ms inference, 381 KiB Flash, 102 KiB RAM
  TFLM vs Cube.AI: +35% time, +27% Flash, +75% RAM
Two user interfaces of X-CUBE-AI

• To provide a better user experience, X-CUBE-AI offers two user interfaces: a GUI and a CLI.
• The GUI can be used from within STM32CubeMX.
« stm32ai » main commands

Inputs: a pre-trained model (topology and weights); input/output custom data, or random data provided by the tool; for quantization, the pre-trained model with its quantization parameters (JSON file).

• analyze: memory requirements (ro/rw), processing requirements (MACC), complexity per layer
• validate: metrics report, confusion matrix, classification accuracy; validation on target or desktop
• generate: generates the C files and a binary file for the trained parameters
• quantize: produces a Keras reshaped pre-trained model (float) with quantization parameters
Generate command
• The 'generate' command generates the specialized network and data C files in the specified output directory.
• The '<name>.c/.h' files contain the topology of the C model (C-struct definitions of the tensors and the operators), including the embedded inference client API used to run the generated model on top of the optimized inference runtime library. The '<name>_data.c/.h' files are by default a simple C array with the data of the weight/bias tensors.
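As an illustration of what such a data file contains, here is a hypothetical sketch that renders a weight tensor as a C array; the layout, naming and formatting of the real generated files differ:

```python
import numpy as np

def weights_to_c_array(name, w):
    """Render a float tensor as a C array, similar in spirit to the
    <name>_data.c file the tool generates (exact layout differs)."""
    flat = ", ".join(f"{v:.8f}f" for v in w.ravel())
    return f"const float {name}_weights[{w.size}] = {{ {flat} }};"

w = np.array([[0.5, -1.25], [2.0, 0.0]], dtype=np.float32)
src = weights_to_c_array("network", w)
assert src.startswith("const float network_weights[4]")
assert "-1.25000000f" in src
```

Baking the parameters into a const array lets the linker place them in flash, so no file system or model parser is needed at runtime.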

Analyze command
• The 'analyze' command is the primary command to import, parse, check and render an uploaded pre-trained model. Its detailed report provides the main system metrics needed to know whether the generated code can be deployed on an STM32 device. It also includes rendering information by layer and/or operator.
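For a plain fully-connected model these metrics can be estimated by hand. The sketch below (hypothetical layer sizes, simplified accounting that ignores alignment and runtime overhead) mirrors the kind of numbers 'analyze' reports:

```python
# Back-of-the-envelope version of the metrics 'analyze' reports for a
# fully-connected network: ro = weight bytes, rw = activation bytes,
# MACC = multiply-accumulate count.
def dense_metrics(layer_sizes, bytes_per_param=4):
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    macc = sum(i * o for i, o in pairs)                 # one MAC per weight
    params = sum(i * o + o for i, o in pairs)           # weights + biases
    ro = params * bytes_per_param                       # read-only (flash)
    rw = max(i + o for i, o in pairs) * bytes_per_param # peak activations
    return macc, ro, rw

macc, ro, rw = dense_metrics([784, 32, 10])  # hypothetical MNIST-style MLP
```

Comparing ro against the target's flash and rw against its RAM is exactly the go/no-go check the report enables before any code is flashed.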

Validate command

• The 'validate' command imports, renders and validates the generated C files.

Validation
• Different metrics (and an associated computing flow) are used to evaluate the performance of the generated C files (the C model). The proposed metrics should be considered generic indicators that allow the predictions of the C model to be compared numerically against the predictions of the original model.
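A minimal sketch of such indicators, assuming two prediction arrays of shape (samples, classes); the metric names here are illustrative choices, not the tool's report fields:

```python
import numpy as np

def compare_predictions(ref, test):
    """Generic indicators for validating one model's outputs against
    another's: element-wise error and top-1 agreement."""
    l2 = float(np.sqrt(np.mean((ref - test) ** 2)))    # RMS error
    max_abs = float(np.max(np.abs(ref - test)))        # worst-case drift
    top1 = float(np.mean(ref.argmax(axis=1) == test.argmax(axis=1)))
    return l2, max_abs, top1

ref = np.array([[0.1, 0.9], [0.8, 0.2]])               # original model
test = ref + np.array([[0.01, -0.01], [0.0, 0.02]])    # C model, small drift
l2, max_abs, top1 = compare_predictions(ref, test)
assert top1 == 1.0   # numeric drift did not change any classification
```

Small element-wise differences are expected after code generation and quantization; what matters is that the aggregate indicators (and for classifiers, the confusion matrix) stay within acceptable bounds.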

Don't go alone

A network of companies is there to support you.

• Trust our authorized partners to ensure the success of your project. Learn more at st.com/stm32ai
• Would you like to discuss a co-development partnership for ML/AI projects? Contact us at [email protected]
Find out more at www.st.com/stm32ai

© STMicroelectronics - All rights reserved.


ST logo is a trademark or a registered trademark of STMicroelectronics International NV or its affiliates in the EU and/or other countries.
For additional information about ST trademarks, please refer to www.st.com/trademarks.
All other product or service names are the property of their respective owners.
