
Technology Trend of Edge AI

Yen-Lin Lee, Pei-Kuei Tsung, and Max Wu


MediaTek Inc.
{yenlin.lee, pei-kuei.tsung, max.wu}@mediatek.com

Abstract — Artificial intelligence (AI), defined as intelligence exhibited by machines, has many applications in today's society, including robotics, mobile devices, smart transportation, healthcare services, and more. Recently, major AI investments by both large companies and startups have been launched. Besides cloud-based solutions, AI on edge devices (Edge AI) offers the advantages of rapid response with low latency, high privacy, greater robustness, and more efficient use of network bandwidth. To enable Edge AI, new embedded system technologies are needed, including machine learning, neural network acceleration and reduction, and heterogeneous run-time mechanisms. This paper introduces the challenges and technology trends of Edge AI. In addition, it illustrates Edge AI solutions from MediaTek, including the dedicated AI processing unit (APU) and NeuroPilot technology, which provide superior Edge AI capability in a wide range of applications.

[Fig. 1. Edge AI opportunities different from cloud-based AI: applications such as ADAS, drones, smart glasses, AR/VR, smart home cameras, assistants, and sensors, mapped against the key requirements of latency, efficiency, availability, and privacy.]

[Fig. 2. Processor comparison for AI processing: CPU (control, serial computing), GPU (graphics, parallel computing), DSP/VPU (signal processing), and deep learning accelerator (DLA, special purpose); flexibility decreases and efficiency increases going from CPU to DLA.]
I. INTRODUCTION

In recent years, artificial intelligence (AI) has appeared in every technology field. From home electronics to complex simulation experiments on protein structure, AI or machine learning has been deployed to enhance the quality of computation and to create possibilities for new applications, such as face unlock on mobile phones or autonomous driving. However, high-performance machine learning or deep learning requires huge computation capability to handle complex training and inference methodologies and large datasets [1]. That is, in order to satisfy this computational demand, cloud servers have to provide very powerful computational capabilities. Hence, more and more non-traditional alternative solutions have appeared in recent years to execute AI computation tasks effectively. For example, Google provides the tensor processing unit (TPU) as a specialized computing unit for AI processing tasks [2]. NVIDIA has also introduced new GPU server architectures tailored to AI workloads [3].

The cloud-based ecosystem has demonstrated itself as a practical platform for serving some AI applications. However, the cloud-based solution has many limitations that might prevent its adoption for all AI applications. Taking autonomous driving as an example, the connection robustness and the latency to the server seriously affect the safety of the vehicle because of the time-to-collision. In addition, uploading personal information or recorded street-view video to the cloud raises privacy issues. Furthermore, internet connectivity is not always available everywhere. These issues lead to the requirement that AI computation must run on the edge devices themselves (Edge AI). In this paper, the design challenges and technology trends of Edge AI are discussed, and how MediaTek overcomes the challenges stated above is also introduced. By developing the dedicated AI processing unit (APU) and the NeuroPilot technology, MediaTek provides a ready-for-production solution for Edge AI.

The rest of the paper is organized as follows: First, the design challenges and current technology progress for Edge AI are illustrated in Section II. Then, MediaTek's approaches to Edge AI are discussed in Section III. Finally, Section IV concludes the paper.

II. EDGE AI DESIGN CHALLENGES AND TECHNOLOGIES

Figure 1 describes the opportunities and key requirements of Edge AI computing compared to the challenges of cloud-based AI frameworks. Different applications have different critical requirements, including latency, efficiency, availability, privacy, and so on. For vehicles or drones, moving speed limits the latency tolerance of the response; otherwise, a crash or accident will happen. Home assistant systems or devices always raise privacy concerns because the processed content touches on personal information. Furthermore, power efficiency is extremely important for Edge AI devices, especially wearables, in order to achieve longer usage duration. All these needs make Edge AI computing necessary and have brought it to the forefront. However, Edge AI also has its own design challenges that need to be addressed:

A. Power Efficiency and Different Types of AI Processors

The first and most important challenge is that edge devices have to provide enough computational capacity within specific limitations, such as thermal constraints or form factor size. Due to these limitations, Edge AI prefers to focus on the inference part and to leave the training stage in the cloud as usual. The inference computation in Edge AI can be handled by various computation units inside the device. Figure 2 compares the various embedded processors for AI computing and their pros and cons.

978-1-5386-4260-3/18/$31.00 ©2018 IEEE
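The bit-width reduction that Section II-A describes for DSPs and DLAs (fixed-point formats of 8 bits or lower [4]) can be illustrated with a short sketch. This is a generic symmetric int8 quantization in NumPy, not MediaTek's actual quantization tool; the single global scale and the function names are assumptions chosen for illustration.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of a float tensor to int8.

    Illustrative only: production toolchains typically calibrate
    per-channel scales on real data rather than using one global scale.
    """
    scale = np.max(np.abs(w)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# int8 storage is 4x smaller than float32, and the round-off
# error is bounded by half a quantization step.
assert q.nbytes == w.nbytes // 4
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

The storage shrinks fourfold and, more importantly on edge silicon, 8-bit multiply-accumulate units are far cheaper in area and energy than floating-point ones, which is the efficiency argument behind DSP and DLA designs.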


[Fig. 3. MediaTek NeuroPilot framework introduction: various AI applications (Face ID, face beautification, scene detection, gesture detection, system performance, voice recognition) built on NN frameworks (Google TensorFlow, Caffe, Amazon MXNet, Sony NNabla, and others) through MediaTek AI extensions; tool kits (model translator, optimizer, quantization) and application libraries (object detection, face detection/recognition, ...) sit on Android NN and a heterogeneous runtime over the CPU, GPU, and APU (AI Processing Unit).]

[Fig. 4. MediaTek provides broad-range electronic devices with the latest AI technologies running on different platforms and OSes: a product portfolio spanning APs/routers, smart devices, digital TVs, voice/codec engines, IoT, smartphones, and automotive, built on a technology portfolio of connectivity, multimedia, various processing units in the SoC, performance/power balance, and heterogeneous computing, across Android, Linux, RTOS, and other OSes.]

Although the CPU and GPU have served as computation resources for AI processing for a long period of time, their general-purpose architecture still incurs higher power consumption. For applications that require only short bursts of computation, a general-purpose processor can provide enough performance and takes advantage of time-to-market. However, for long-duration or sustained scenarios, such as AI post-processing on social video streaming, energy efficiency becomes very important and necessary. Specialized processors, such as DSP-based processors, have been adopted to achieve better power efficiency for specific applications at the cost of flexibility. For computation and power efficiency, fixed-point formats are widely used, and the bit-width might be configured to 8 bits or even lower in a DSP [4]. Today, deep learning accelerators (DLAs) are starting to be used in edge devices. A DLA can achieve the highest power efficiency by implementing the key computation operations as hardwired logic [5].

B. Computational Complexity and Efficiency

For Edge AI applications, the computational requirements are expressed in two different ways. First, the algorithm throughput needs to be sustainable over time to meet the real-time constraint without frame drops. Second, the processing latency through the overall algorithm pipeline should be low enough. Taking autonomous driving as an example, the latency of video processing should be less than 100 ms for safety [6].

These challenges drive innovation in AI processor architectures. The conventional parallel computing architecture usually leverages multiple threads to hide memory access latency. New on-chip memory architectures and data flow management need to be reconsidered because this latency cannot simply be hidden in AI computation. In order to achieve better throughput on pre-defined hardware, SoC vendors usually provide efficient pre-built libraries for the key AI computation operations. Benefiting from such AI libraries, developers can greatly reduce programming effort while still obtaining performance close to that of close-to-metal coding.

C. Privacy/Security

An edge device without any internet connection has high privacy protection. However, most consumer electronic devices need to support online applications beyond the Edge AI application alone. Therefore, it is necessary to enable a dedicated security zone to protect private information, such as fingerprint, voice, and face recognition data. The security zone might require an isolated hardware or software environment to avoid contamination.

III. MEDIATEK SOLUTIONS FOR EDGE AI

Facing the Edge AI challenges and opportunities, MediaTek has prepared corresponding solutions. As shown in Fig. 3, MediaTek provides both a hardware and a software environment to optimize Edge AI performance [7]. First of all, the dedicated APU is designed for better power efficiency: compared to CPU operation, up to 95% of energy consumption can be eliminated. Second, the heterogeneous runtime in the NeuroPilot software development kit (SDK) manages task scheduling among the CPU, GPU, and APU. Moreover, NeuroPilot supports current state-of-the-art AI frameworks, including Caffe, TensorFlow, MXNet, and NNabla. The toolchains in NeuroPilot, including a model translator, allow programmers to enable AI applications on devices. Finally, application libraries are offered to quickly connect PC-based prototyping to close-to-metal performance in Edge AI development. As a result, Fig. 4 shows the application scope of the MediaTek solution. With this core technology support, Edge AI can be realized in a wide range of applications.

IV. CONCLUSION

In this paper, the design challenges and technology trends in Edge AI are introduced. Compared to cloud-based AI, Edge AI has the advantages of low latency and privacy protection. Power efficiency and computation efficiency become necessary requirements for Edge AI. Finally, the MediaTek solutions are presented: the dedicated APU design and the NeuroPilot SDK enable Edge AI capability in a wide range of applications.

REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, 2015.
[2] N. P. Jouppi et al., "In-Datacenter Performance Analysis of a Tensor Processing Unit," arXiv:1704.04760.
[3] "NVIDIA Tesla V100 GPU Architecture," NVIDIA website, 2017.
[4] M. Courbariaux et al., "Training deep neural networks with low precision multiplications," arXiv:1412.7024.
[5] Y.-H. Chen et al., "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE International Solid-State Circuits Conference (ISSCC), 2016.
[6] S. Mochizuki et al., "A 197mW 70ms-Latency Full-HD 12-Channel Video-Processing SoC for Car Information Systems," IEEE International Solid-State Circuits Conference (ISSCC), 2016.
[7] https://www.MediaTek.tw/features/artificial-intelligence
