Day5_03: Converting Neural Network Models into Optimized Code
Amit Kumar
5th Aug 2021
STM32Cube.AI
A comprehensive STM32 AI ecosystem
[Ecosystem diagram: application frameworks and applicative examples (Function Packs) on top of an AI model converter, with pre- and post-processing libraries, a quantizer, a graph optimizer, and a memory optimizer, all targeting edge hardware]
A tool to seamlessly integrate AI (machine learning and deep learning) into your projects
The 3 pillars of STM32Cube.AI
• Graph optimizer: automatically improve performance through graph simplifications & optimizations that benefit STM32 target HW architectures
• Quantized model support: import your quantized ANN to be compatible with STM32 embedded architectures while keeping its performance
• Memory optimizer: optimize memory allocation to get the best performance while respecting the constraints of your embedded design

STM32Cube.AI is free of charge, available both as a graphical interface and as a command-line tool.
Graph optimizer
• Loss-less conversion: the graph simplifications do not change the model's numerical behavior
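One classic example of a loss-less graph simplification is folding a batch-normalization layer into the weights of the preceding layer. The sketch below shows the idea for a dense layer in plain Python; it is illustrative only, not ST's actual implementation.

```python
# Illustrative sketch of a loss-less graph optimization: folding a
# batch-normalization layer into the preceding dense layer's weights,
# so one layer (and its memory traffic) disappears from the graph.

def dense(x, w, b):
    """y[j] = sum_i x[i] * w[i][j] + b[j]"""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel normalization: gamma * (y - mean) / sqrt(var + eps) + beta"""
    return [gamma[j] * (yj - mean[j]) / (var[j] + eps) ** 0.5 + beta[j]
            for j, yj in enumerate(y)]

def fold_bn_into_dense(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w2, b2) such that dense(x, w2, b2) == batchnorm(dense(x, w, b))."""
    scale = [gamma[j] / (var[j] + eps) ** 0.5 for j in range(len(b))]
    w2 = [[w[i][j] * scale[j] for j in range(len(b))] for i in range(len(w))]
    b2 = [(b[j] - mean[j]) * scale[j] + beta[j] for j in range(len(b))]
    return w2, b2
```

After folding, the two-layer subgraph is replaced by a single dense layer that produces identical outputs, which is what "loss-less" means here.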
Quantized model support
Simply use quantized networks to reduce memory footprint and inference time.

STM32Cube.AI supports quantized neural network models with all parameter formats:
• FP32
• Int8
• Mixed binary Int1 to Int8 (QKeras*, Larq.dev*)

*Please contact [email protected] to request the relevant version of STM32Cube.AI.

[Chart: latency & memory comparison for quantized models; Flash (kB) vs latency (ms) for FP32, Int8, and Int1 + Int8 variants of the same model]
HW target: NUCLEO-STM32H743ZI2 @ 480 MHz
Model: low-complexity handwritten digit reading
Accuracy: >97% for all quantized models
Tested database: MNIST dataset
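As background on what "Int8" support means, here is a minimal sketch of 8-bit affine quantization, the common scheme used by upstream tools such as TensorFlow Lite. It is illustrative only; STM32Cube.AI imports models that were already quantized by such tools.

```python
# Minimal sketch of 8-bit affine quantization: real values are mapped
# to int8 through a scale and a zero-point. Weights shrink from
# 4 bytes (FP32) to 1 byte (int8), at the cost of a small rounding error.

def quantize_params(xmin, xmax, qmin=-128, qmax=127):
    """Derive scale and zero-point covering the range [xmin, xmax]."""
    scale = (xmax - xmin) / (qmax - qmin)
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zp, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zp))

def dequantize(q, scale, zp):
    return (q - zp) * scale

scale, zp = quantize_params(-1.0, 1.0)
vals = [-1.0, -0.5, 0.0, 0.25, 1.0]
restored = [dequantize(quantize(v, scale, zp), scale, zp) for v in vals]
# Each restored value differs from the original by at most ~scale/2.
```

The scatter plot above reflects exactly this trade: Int8 and Int1 + Int8 points need far less Flash and run faster, while accuracy stays above 97% for this model.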
Memory optimizer
[Bar chart comparing latency (ms), Flash, and RAM for the same model deployed with X-CUBE-AI v7.2.0 vs TFLite for Microcontrollers (TFLm) v2.7.0; lower is better for all three]
HW target: STM32H723 (Flash: 1 Mbyte, RAM: 564 Kbytes) @ 550 MHz
SW versions: X-Cube.AI v7.2.0, TFLm v2.7.0
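The intuition behind an activation-memory optimizer can be sketched in a few lines: in a sequential network only the current layer's input and output need to be alive at the same time, so activation buffers can be reused instead of allocated once per tensor. The tensor sizes below are invented for illustration, and the scheme is far simpler than what the tool actually does.

```python
# Why buffer reuse shrinks RAM: compare allocating one buffer per
# activation tensor against reusing buffers in a purely sequential
# (linear) graph, where peak memory is the largest input+output pair.

def naive_ram(tensor_sizes):
    """One dedicated buffer per intermediate tensor."""
    return sum(tensor_sizes)

def reuse_ram(tensor_sizes):
    """Linear graph: peak = max over layers of (input size + output size)."""
    return max(tensor_sizes[i] + tensor_sizes[i + 1]
               for i in range(len(tensor_sizes) - 1))

sizes = [3072, 16384, 8192, 4096, 256, 10]   # bytes per activation tensor (made up)
print(naive_ram(sizes), reuse_ram(sizes))    # reuse needs noticeably less RAM
```

Real allocators must also handle branches and residual connections, where several tensors overlap in time, but the principle is the same: pack live tensors into the smallest possible arena.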
Making Edge AI possible with all STM32 portfolio
STM32Cube.AI is compatible with all STM32 series
• STM32MP1 (MPU): 4158 CoreMark, up to 800 MHz Cortex-A7 + 209 MHz Cortex-M4
• STM32WL (wireless MCU): 162 CoreMark, 48 MHz Cortex-M4 + 48 MHz Cortex-M0+
• STM32WB (wireless MCU): 216 CoreMark, 64 MHz Cortex-M4 + 32 MHz Cortex-M0+
• Introducing support for mixed-precision quantization and binary neural networks (BNN) for STM32
• Supporting pre-trained quantized models from:
  • qKeras
  • Larq
• Addition of a new kernel performance enhancement for further optimization of memory footprint and power consumption
• Extended support of scikit-learn ML algorithms
• Support for TensorFlow v2.9 models
• Support for new ONNX operators (refer to the documentation for an exhaustive list)
STM32Cube.AI user flow (1/3)
STM32Cube.AI user flow (2/3)
Step 1: Select MCU
Step 2: Optimize and validate
• Model complexity and footprint analysis
• Fine-tune memory allocation with optimizations and the GUI
• Optimize system parameters and the clock tree
• Extend the model with your own custom layers
STM32Cube.AI user flow (3/3)
Step 1: Select MCU
Step 2: Optimize and validate
• Generate an application template
• Integrate your application-specific code in your favorite IDE
• Perform system tests
Possible conversion strategies:
Network code generation and interpreter
• The X-CUBE-AI Expansion Package integrates a specific path that generates a ready-to-use STM32 IDE project embedding a TensorFlow Lite for Microcontrollers run-time (also called TFLm) and its associated TFLite model. This can be considered an alternative to the default X-CUBE-AI solution for deploying an AI application based on a TFLite model.
Possible conversion strategies:
Network code generation and interpreter
• More flexible: TensorFlow Lite interpreter mode, with the TFLm run-time running on the STM32 device
• More optimized: optimized C code generated by STM32Cube.AI, with its own network run-time
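The trade-off between the two strategies can be shown with a toy contrast (the operations below are invented, not real kernels): an interpreter dispatches over a stored graph description at run time, while code generation emits a fixed call sequence for one specific graph.

```python
# Toy contrast: interpreter (flexible, per-op dispatch overhead) vs
# generated code (fixed call sequence, no dispatch, no graph storage).

OPS = {
    "add1": lambda x: x + 1,
    "double": lambda x: x * 2,
}

graph = ["add1", "double", "add1"]   # model description kept at run time

def run_interpreter(x, graph):
    """Walk the stored graph, looking up each op as we go."""
    for op in graph:
        x = OPS[op](x)
    return x

def run_generated(x):
    """What a code generator would emit for this particular graph."""
    x = x + 1
    x = x * 2
    return x + 1

assert run_interpreter(3, graph) == run_generated(3) == 9
```

The interpreter can run any graph built from its op set without recompiling; the generated version trades that flexibility for smaller footprint and faster execution, which is the distinction the slide draws.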
STM32Cube.AI vs TensorFlow Lite for Microcontrollers
Results with STM32Cube.AI v7.2
Model: image classification; MCU/board: STM32U585, NUCLEO-U575ZI-Q

Runtime            Inference time (ms)   Flash (KiB)   RAM (KiB)
X-CUBE-AI          148                   142           50
TFLM               253                   149           55
% TFLM vs Cube.AI  +71                   +5            +9
Two user interfaces of X-CUBE-AI
• To provide a better user experience, X-CUBE-AI offers two user interfaces: a GUI and a CLI
• The GUI can be used within STM32CubeMX
« stm32ai » main command(s)
Input: a pre-trained model (topology and weights)
Reported outputs:
• Memory requirements (ro/rw)
• Processing requirements (MACC)
• Complexity per layer
Analyze command
• The 'analyze' command is the primary command used to import, parse, check, and render an uploaded pre-trained model. Its detailed report provides the main system metrics needed to know whether the generated code can be deployed on an STM32 device. It also includes rendering information by layer and/or operator.
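The "processing requirements (MACC)" figure counts multiply-accumulate operations. The standard per-layer formulas can be sketched as follows; the layer shapes below are made up for illustration, and this is not ST's exact accounting.

```python
# Back-of-the-envelope MACC counting, the kind of figure 'analyze'
# reports per layer: one MACC = one multiply + one accumulate.

def dense_macc(n_in, n_out):
    """Fully connected layer: every input feeds every output."""
    return n_in * n_out

def conv2d_macc(h_out, w_out, k_h, k_w, c_in, c_out):
    """2-D convolution: one kernel window per output element."""
    return h_out * w_out * k_h * k_w * c_in * c_out

# A tiny hypothetical digit-reading CNN:
total = (conv2d_macc(26, 26, 3, 3, 1, 8)     # 3x3 conv, 1 -> 8 channels
         + conv2d_macc(11, 11, 3, 3, 8, 16)  # 3x3 conv, 8 -> 16 channels
         + dense_macc(16 * 5 * 5, 10))       # final classifier
print(total)
```

Dividing such a MACC count by the MCU's effective MACC/cycle throughput gives a rough latency estimate, which is why the metric matters for deployment decisions.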
Validate command
• The 'validate' command allows you to import, render, and validate the generated C files.
Validation
• This step covers the different metrics (and associated computing flows) used to evaluate the performance of the generated C files (the C model). The proposed metrics should be considered generic indicators that allow the predictions of the C model to be compared numerically against the predictions of the original model.
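Typical indicators for such a numerical comparison can be sketched as follows. This is illustrative; refer to the X-CUBE-AI documentation for the exact metrics the tool reports.

```python
# Generic error indicators for comparing C-model outputs against the
# original model's predictions on the same inputs.
import math

def mae(ref, out):
    """Mean absolute error."""
    return sum(abs(r - o) for r, o in zip(ref, out)) / len(ref)

def rmse(ref, out):
    """Root mean squared error."""
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref))

def l2r(ref, out):
    """Relative L2 error: ||ref - out|| / ||ref||."""
    num = math.sqrt(sum((r - o) ** 2 for r, o in zip(ref, out)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den

ref = [0.10, 0.70, 0.20]   # original-model output (e.g. softmax scores)
out = [0.11, 0.69, 0.20]   # C-model output (e.g. after int8 quantization)
print(mae(ref, out), rmse(ref, out), l2r(ref, out))
```

Small values on all three indicators give confidence that the generated C model numerically matches the original; for classifiers, top-class agreement over a test set is a useful complementary check.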
Don't go alone
Find out more at www.st.com/stm32ai