Design and Implementation of A Convolutional Neural Network On An Edge Computing Smartphone For Human Activity Recognition
Received August 29, 2019, accepted September 10, 2019, date of publication September 16, 2019,
date of current version September 27, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2941836
ABSTRACT Edge computing aims to integrate computing into everyday settings, enabling the system
to be context-aware and private to the user. With the increasing success and popularity of deep learning
methods, there is an increased demand to leverage these techniques in mobile and wearable computing
scenarios. In this paper, we present an assessment of a deep human activity recognition system's memory and execution time requirements when implemented on mid-range smartphone-class hardware, and the memory implications for embedded hardware. This paper presents the design of a convolutional neural network (CNN) in the context of a human activity recognition scenario. Here, the layers of the CNN automate feature learning, and we examine the influence of hyper-parameters, such as the number of filters and the filter size, on the performance of the CNN. The proposed CNN showed increased robustness, with a better capability of detecting activities with temporal dependence, compared to models using statistical machine learning techniques. The model obtained an accuracy of 96.4% in a five-class static and dynamic activity recognition scenario. We calculated the proposed model's memory consumption and execution time requirements for use on a mid-range smartphone. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracy within 2% of that of floating-point networks for the dense and convolutional neural network architectures. Almost all of the size and execution time reduction in the optimized model was achieved through weight quantization. We achieved a more than four-times reduction in model size when optimizing to 8 bits, which ensured a feasible model capable of fast on-device inference.
INDEX TERMS Convolutional neural networks, edge computing, TensorFlow Lite, activity recognition, deep learning.
VOLUME 7, 2019 133509
T. Zebin et al.: Design and Implementation of a CNN on an Edge Computing Smartphone for HAR
FIGURE 1. Layer-wise specification of the CNN architecture used for multi-class HAR classification.
layer was then flattened out to be used as an automatically extracted feature set by any other classifier. The six-channel input time series (segmented as 128 samples/window for each channel) from the 3-axis accelerometer and 3-axis gyroscope were processed by a number of convolution filters. These filters create a non-linear, distributed representation of the input and are of variable filter size. They are then applied over the entire input time series with a specific stride length. A max-pooling layer is then used to down-sample the temporal features (such as slopes/changes in the time-series signal) that the convolution layer has just extracted. For a given training dataset, our objective is to find the optimal parameters to minimize the difference between the input and the reconstructed output over the whole training set. To apply a CNN to human activity recognition, several design adjustments are needed for 1-D adaptation to the sensor data, such as input adaptation, pooling, and weight-sharing. The subsections below discuss the adaptations and processing stages of the proposed convolutional neural networks for the activity classification task at hand.

B. CNN FEATURE EXTRACTION
In the CNN architecture, the internal representation of the input is implicitly learned by the convolutional kernels.
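The 1-D pipeline described above (stacked convolution, max-pooling, and a flattened feature vector over 128-sample, six-channel windows) can be sketched in Keras. This is a minimal illustration only; the filter counts, kernel sizes, and strides below are assumptions, not the paper's exact hyper-parameters:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Input: windows of 128 time steps x 6 channels (3-axis accel + 3-axis gyro).
model = models.Sequential([
    layers.Input(shape=(128, 6)),
    # 1-D convolution slides each filter over the time axis with a given
    # stride; weight-sharing applies the same kernel at every time step.
    layers.Conv1D(filters=64, kernel_size=5, strides=1, activation='relu'),
    layers.BatchNormalization(),
    # Max-pooling down-samples the temporal features the convolution extracted.
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(filters=32, kernel_size=3, activation='relu'),
    layers.MaxPooling1D(pool_size=2),
    # Flatten into a feature vector usable by any downstream classifier.
    layers.Flatten(),
    layers.Dense(5, activation='softmax'),  # five activity classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```

The flattened output of the final pooling layer plays the role of the automatically extracted feature set discussed above.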
FIGURE 4. Effect of convolutional kernel size in CNN activity classification.
FIGURE 5. Average training set accuracy over 150 epochs for the proposed model. The batch normalized CNN is more stable in terms of accuracy than the generic CNN.
FIGURE 7. Steps for embedded adaptation of TensorFlow based deep learning models.
TABLE 5. Memory consumption and execution time summary by node type.

are efficient for a cross-platform serialization library but not suitable for embedded implementations [33], [34].
We presented the required stages of the workflow in Fig. 7, where the first block corresponds to the training stage of the deep learning model on TensorFlow and the second block encompasses the steps required to get the graph representation inference-ready on an embedded platform. The rest of the stages are required for optimization before the model can be deployed on an edge device.
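The two blocks of the Fig. 7 workflow can be sketched with the TFLite converter API. The tiny stand-in model and the file name are illustrative assumptions; the trained HAR model would take their place:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Block 1 stand-in: a trained TensorFlow/Keras model (training code omitted).
model = models.Sequential([
    layers.Input(shape=(128, 6)),
    layers.Conv1D(16, 3, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(5, activation='softmax'),
])

# Block 2: convert the graph into an inference-ready FlatBuffer that the
# TensorFlow Lite runtime on a mobile/embedded device can execute.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('har_model.tflite', 'wb') as f:  # file name is illustrative
    f.write(tflite_model)
```

The resulting `.tflite` file is the artifact that the remaining optimization stages operate on before deployment.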
FIGURE 8. (a) Screen-shot and (b) power profile of the HAR Android app running on a Samsung
A5 smartphone.
FIGURE 9. (a) Weight and activation quantization scheme, (b) Memory footprint of various deep learning models in terms of weight
and activation.
Fig. 8(b) shows the absolute power consumption of the implementation. We used Trepn Profiler [35] for these measurements. The average absolute power consumption when the app is running in inference mode is approximately 40 mW. In comparison, the same value measured for YouTube is around 116 mW, which suggests that our HAR Android app is relatively low-power and computationally efficient. For implementing the model inference stage on the device, we further reduced the model size by compressing weights and/or quantizing both weights and activations for faster inference, without re-training the model.

C. OPTIMIZATION THROUGH QUANTIZATION
In this section, multiple approaches for model quantization are discussed to demonstrate the performance impact of each approach. All these experimental scenarios were simulated on a conventional computer with a 2.4 GHz CPU and 32 GB of memory, and the quantized models were tested further on a Samsung A5 2017 smartphone (1.2 GHz quad-core CPU with 3 GB of memory) for their functionality in an activity recognition app using the TFLite library. In a conventional neural network layer implemented in TensorFlow or Keras with a floating-point representation, there are a number of constant weight tensors and variable input tensors stored as floating-point numbers. The forward pass function operates on the weights and inputs using floating-point arithmetic and stores the output in output tensors as floats. Post-training quantization techniques are simpler to use and allow for quantization with limited data [32], [36]. In this work, we explore the impact of quantizing the weights and activation functions separately. The results for the weight and activation quantization experiments are shown in Fig. 9(b) and Table 6. Moving from 32-bit float to fixed-point 8-bit representation leads to a 4× reduction in memory. Fig. 9(b) shows the actual memory footprint of various learning models, such as the dense neural network (DNN), the CNN, and an LSTM model built using the same dataset. As can be seen from the bar diagram in Fig. 9(a) and from the memory footprint column in Table 6, the CNN model was 7.62 times smaller when post-training quantization was applied to the weights and activations of the model. This version would also be more suitable for low-end microprocessors that do not support floating-point arithmetic.
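A minimal sketch of the post-training quantization step with the TFLite converter follows; the stand-in model and the random calibration windows are assumptions for illustration, not the paper's trained model or dataset:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Stand-in for the trained float32 HAR model.
model = models.Sequential([
    layers.Input(shape=(128, 6)),
    layers.Conv1D(16, 3, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(5, activation='softmax'),
])

# Weight quantization only: weights are stored as 8-bit values,
# giving roughly a 4x smaller model with no calibration data needed.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
weight_quantized_model = converter.convert()

# Quantizing the activations as well requires a small representative
# dataset so the converter can calibrate per-layer activation ranges.
def representative_data():
    for _ in range(10):
        yield [np.random.rand(1, 128, 6).astype(np.float32)]

converter2 = tf.lite.TFLiteConverter.from_keras_model(model)
converter2.optimizations = [tf.lite.Optimize.DEFAULT]
converter2.representative_dataset = representative_data
full_quantized_model = converter2.convert()
```

In practice the representative dataset would be a few hundred real sensor windows rather than random noise, so that the calibrated activation ranges match deployment conditions.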
Our experiments suggested that we can variably quantize (i.e., discretize) the range so that only the values with a more significant contribution to the model accuracy are recorded as float32, while the rest are rounded off to uint8 values, still taking advantage of the optimized model.

TABLE 6. Change of model accuracy due to quantization: 32-bit floating-point model accuracy and 8-bit quantized model accuracy.

In Table 6, we also observed a slight increase in prediction accuracy on the test dataset for the DNN and LSTM models, because of the salient and less noisy weight values available in the saved model for activity prediction. The inference time of the optimized and fine-tuned CNN was 0.23 seconds on average for the detection of a typical activity window. The table also suggests that minimizing the number of model parameters may not necessarily lead to the optimal network in terms of performance. This means a selective quantization approach (e.g. using k-means clustering) can make the process more efficient than straight linear quantization.

VII. CONCLUSION
We have presented a deep convolutional neural network model for the classification of five daily-life activities using raw accelerometer and gyroscope data of a wearable sensor as the input. Our experimental results demonstrate how these characteristics can be efficiently extracted by the automated feature engine in CNNs. The presented model obtained an accuracy of 96.4% in a five-class static and dynamic activity recognition scenario with a 20-volunteer custom dataset available at the GitHub repository for this research [37]. The proposed model showed increased robustness and has a better capability of detecting activities with temporal dependence compared to models using statistical machine learning techniques. Additionally, the batch normalized implementation made the network achieve stable training performance in almost four times fewer iterations. The proposed model has further been empirically analyzed for its memory requirement and execution time, with the objective of deploying the model on edge devices such as smartphones and wearables. We observed that most of the size and execution time reduction in the optimized model is due to weight quantization, potentially allowing the weights to be quantized differently to the activations and allowing further optimizations. In future, we would like to develop time-series counterparts of models with new and efficient architectures similar to ShuffleNet [24], facilitating pointwise group convolution and channel shuffle, to further reduce computation cost while maintaining accuracy. The proposed model has been validated and successfully implemented on a smartphone. The smartphone implementation uses real-time sensor data to predict the activity; a current limitation of this pre-trained model is that the classification accuracy decreases during activity transitions and in the case of sensor displacement. In the future, we will port this implementation to programmable devices such as ARM Cortex M-series [38] systems and SparkFun Apollo3 Edge platforms [6], with further development of the C++ API in the TensorFlow Lite framework [31]. The smartphone implementation presented in this study could be useful for making smart wearables and devices that are stand-alone from the cloud, potentially improving user privacy.

ACKNOWLEDGMENT
This work was carried out at the University of Manchester. Data and code supporting this publication can be obtained from T. Zebin's GitHub [37]. The steps to deploy the trained model within the Android application are explained in the video abstract accompanying this article.

REFERENCES
[1] J. Wang, Y. Chen, S. Hao, X. Peng, and L. Hu, ''Deep learning for sensor-based activity recognition: A survey,'' 2017, arXiv:1707.03502. [Online]. Available: https://arxiv.org/abs/1707.03502
[2] F. Gu, K. Khoshelham, S. Valaee, J. Shang, and R. Zhang, ''Locomotion activity recognition using stacked denoising autoencoders,'' IEEE Internet Things J., vol. 5, no. 3, pp. 2085–2093, Jun. 2018.
[3] C. A. Ronao and S.-B. Cho, ''Deep convolutional neural networks for human activity recognition with smartphone sensors,'' in Proc. Int. Conf. Neural Inf. Process. Cham, Switzerland: Springer, 2015, pp. 46–53. [Online]. Available: https://link.springer.com/chapter/10.1007/978-3-319-26561-2_6
[4] T. Zebin, P. J. Scully, and K. B. Ozanyan, ''Evaluation of supervised classification algorithms for human activity recognition with inertial sensors,'' in Proc. IEEE SENSORS Conf., Glasgow, U.K., Nov. 2017, pp. 1–3.
[5] N. Twomey, T. Diethe, I. Craddock, and P. Flach, ''Unsupervised learning of sensor topologies for improving activity recognition in smart environments,'' Neurocomputing, vol. 234, pp. 93–106, Apr. 2017.
[6] (2019). SparkFun Edge Development Board-Apollo3 Blue. [Online]. Available: https://learn.sparkfun.com/tutorials/using-sparkfun-edge-board-with-ambiq-apollo3-sdk/all
[7] M. Zeng, L. T. Nguyen, B. Yu, O. J. Mengshoel, J. Zhu, P. Wu, and J. Zhang, ''Convolutional neural networks for human activity recognition using mobile sensors,'' in Proc. 6th Int. Conf. Mobile Comput., Appl. Services, Nov. 2014, pp. 197–205.
[8] A. Bulling, U. Blanke, and B. Schiele, ''A tutorial on human activity recognition using body-worn inertial sensors,'' ACM Comput. Surv., vol. 46, no. 3, pp. 1–33, 2014.
[9] F. J. Ordóñez and D. Roggen, ''Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition,'' Sensors, vol. 16, no. 1, p. 115, 2016.
[10] C. A. Ronao and S.-B. Cho, ''Human activity recognition with smartphone sensors using deep learning neural networks,'' Expert Syst. Appl., vol. 59, pp. 235–244, Oct. 2016.
[11] T. Zebin, M. Sperrin, N. Peek, and A. J. Casson, ''Human activity recognition from inertial sensor time-series using batch normalized deep LSTM recurrent networks,'' in Proc. 40th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Jul. 2018, pp. 1–4.
[12] F. M. Rueda, R. Grzeszick, G. A. Fink, S. Feldhorst, and M. T. Hompel, ''Convolutional neural networks for human activity recognition using body-worn sensors,'' Informatics, vol. 5, no. 2, p. 26, 2018.
[13] M. Gochoo, T.-H. Tan, S.-H. Liu, F.-R. Jean, F. Alnajjar, and S.-C. Huang, ''Unobtrusive activity recognition of elderly people living alone using anonymous binary sensors and DCNN,'' IEEE J. Biomed. Health Inform., vol. 23, no. 2, pp. 693–702, Mar. 2018.
[14] D. Ravi, C. Wong, B. Lo, and G. Z. Yang, ''Deep learning for human activity recognition: A resource efficient implementation on low-power devices,'' in Proc. IEEE 13th Int. Conf. Wearable Implant. Body Sensor Netw., Jun. 2016, pp. 71–76.
[15] R. Chavarriaga, H. Sagha, A. Calatroni, S. T. Digumarti, G. Tröster, J. del R. Millán, and D. Roggen, ''The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition,'' Pattern Recognit. Lett., vol. 34, no. 15, pp. 2033–2042, Jan. 2009.
[16] A. Reiss and D. Stricker, ''Introducing a new benchmarked dataset for activity monitoring,'' in Proc. 16th Int. Symp. Wearable Comput., Jun. 2012, pp. 108–109.
[17] M. Lichman. (2013). UCI HAR Machine Learning Repository. [Online]. Available: https://archive.ics.uci.edu/ml/machine-learning-databases/00240/
[18] N. Y. Hammerla, S. Halloran, and T. Plotz, ''Deep, convolutional, and recurrent models for human activity recognition using wearables,'' in Proc. 25th Int. Joint Conf. Artif. Intell., New York, NY, USA, Jul. 2016, pp. 1533–1540.
[19] MPU-9150 Product Specification, InvenSense, San Jose, CA, USA, 2012.
[20] O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas, ''Window size impact in human activity recognition,'' Sensors, vol. 14, no. 4, pp. 6474–6499, Apr. 2014.
[21] J. Liono, A. K. Qin, and F. D. Salim, ''Optimal time window for temporal segmentation of sensor streams in multi-activity recognition,'' in Proc. 13th Int. Conf. Mobile Ubiquitous Syst., Comput., Netw. Services, 2016, pp. 10–19.
[22] F. Chollet. (2013). Keras: The Python Deep Learning Library. [Online]. Available: https://keras.io/
[23] A. Doherty, D. Jackson, N. Hammerla, T. Plötz, P. Olivier, M. H. Granat, T. White, V. T. van Hees, M. I. Trenell, C. G. Owen, S. J. Preece, R. Gillions, S. Sheard, and N. J. Wareham, ''Large scale population assessment of physical activity using wrist worn accelerometers: The UK biobank study,'' PLoS ONE, vol. 12, no. 2, 2017, Art. no. e0169649.
[24] X. Zhang, X. Zhou, M. Lin, and J. Sun, ''Shufflenet: An extremely efficient convolutional neural network for mobile devices,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6848–6856.
[25] I. Song, H.-J. Kim, and P. B. Jeon, ''Deep learning for real-time robust facial expression recognition on a smartphone,'' in Proc. IEEE Int. Conf. Consum. Electron., Jan. 2014, pp. 564–567.
[26] I. Andrey, ''Real-time human activity recognition from accelerometer data using convolutional neural networks,'' Appl. Soft Comput., vol. 62, pp. 915–922, Jan. 2017.
[27] C. K. Wong, H. M. Mentis, and R. Kuber, ''The bit doesn't fit: Evaluation of a commercial activity-tracker at slower walking speeds,'' Gait Posture, vol. 59, pp. 177–181, Jan. 2018.
[28] Y. Huang, J. Xu, B. Yu, and P. B. Shull, ''Validity of FitBit, Jawbone UP, Nike+ and other wearable devices for level and stair walking,'' Gait Posture, vol. 48, pp. 36–41, Jul. 2016.
[29] J. Huang, S. Lin, N. Wang, G. Dai, Y. Xie, and J. Zhou, ''TSE-CNN: A two-stage end-to-end CNN for human activity recognition,'' IEEE J. Biomed. Health Informat., to be published.
[30] E. Grolman, A. Finkelshtein, R. Puzis, A. Shabtai, G. Celniker, Z. Katzir, and L. Rosenfeld, ''Transfer learning for user action identification in mobile apps via encrypted traffic analysis,'' IEEE Intell. Syst., vol. 33, no. 2, pp. 40–53, Mar./Apr. 2018.
[31] (2019). TensorFlow Lite for Mobile and Embedded Learning. [Online]. Available: https://www.tensorflow.org/lite/microcontrollers/overview
[32] R. Krishnamoorthi, ''Quantizing deep convolutional networks for efficient inference: A whitepaper,'' Jun. 2018, arXiv:1806.08342. [Online]. Available: https://arxiv.org/abs/1806.08342
[33] J. Hanhirova, T. Kämäräinen, S. Seppälä, M. Siekkinen, V. Hirvisalo, and A. Ylä-Jääski, ''Latency and throughput characterization of convolutional neural networks for mobile computer vision,'' Mar. 2018, arXiv:1803.09492. [Online]. Available: https://arxiv.org/abs/1803.09492
[34] A. Ignatov, R. Timofte, W. Chou, K. Wang, M. Wu, T. Hartley, and L. Van Gool, ''AI benchmark: Running deep neural networks on Android smartphones,'' in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 288–314.
[35] (2019). Trepn Power Profiler–Qualcomm Developer Network. [Online]. Available: https://developer.qualcomm.com/software/trepn-power-profiler
[36] B. Moons, K. Goetschalckx, N. Van Berckelaer, and M. Verhelst, ''Minimum energy quantized neural networks,'' in Proc. 51st Asilomar Conf. Signals, Syst., Comput., Oct./Nov. 2017, pp. 1921–1925.
[37] T. Zebin. (2018). Deep Learning Demo. [Online]. Available: https://github.com/TZebin/Thesis-Supporting-Files/tree/master/Deep Learning Demo
[38] (2019). ARM Cortex-M Series Processors. [Online]. Available: https://developer.arm.com/ip-products/processors/cortex-m

TAHMINA ZEBIN received the undergraduate and M.S. degrees in applied physics, electronics and communication engineering from the University of Dhaka, Bangladesh, and the M.Sc. degree in digital image and signal processing from the University of Manchester, in 2012. Before joining as a Lecturer with the University of East Anglia, she was a Postdoctoral Research Associate in the EPSRC funded project Wearable Clinic: Self, Help and Care with the University of Manchester and was a Research Fellow in health innovation ecosystem with the University of Westminster. Her current research interests include advanced image and signal processing, human activity recognition, and risk prediction modeling from electronic health records using various statistical and deep learning techniques. She was a recipient of the President's Doctoral Scholarship, from 2013 to 2016, for conducting the Ph.D. in electrical and electronic engineering.

PATRICIA J. SCULLY received the Ph.D. degree in engineering from the University of Liverpool, Liverpool, U.K., in 1992, and held a Reader position with Liverpool John Moores University, in 2000. She was a Senior Lecturer/Associate Professor in sensor instrumentation with the University of Manchester, in 2002, before moving to NUI Galway, Ireland, in 2018. She is experienced in leading industrial and research council/government funded research projects at national and international levels and has research interests in sensors and monitoring for industrial processes, including optical fiber technology and photonic materials for sensors and devices, ranging from functional chemically sensitive optical coatings to laser-inscribed photonic and conducting structures in transparent materials that affect the properties of light.

NIELS PEEK received the M.Sc. degree in computer science and artificial intelligence, in 1994, and the Ph.D. degree in computer science, in 2000, from Utrecht University. From 2013 to 2017, he was the President of the Society for Artificial Intelligence in Medicine. He is currently a Professor of health informatics with the University of Manchester. He has coauthored more than 200 peer-reviewed scientific publications. His research interests include data-driven methods for health research, healthcare quality improvement, and computerized decision support. In 2018, he was an Elected Fellow of the American College of Medical Informatics and a Fellow of the Alan Turing Institute, the U.K.'s national institute for data science and artificial intelligence.

ALEXANDER J. CASSON received the master's degree in engineering science from the University of Oxford, Oxford, U.K., in 2006, and the Ph.D. degree in electronic engineering from Imperial College London, London, U.K., in 2010. Since 2013, he has been a Faculty Member with The University of Manchester, Manchester, U.K., where he leads a research team focusing on next generation wearable devices and their integration and use in the healthcare system. He has published over 100 articles on these topics. He is the Vice-Chair of the IET Healthcare Technologies Network and a Lead of the Manchester Bioelectronics Network.

KRIKOR B. OZANYAN received the M.Sc. degree in engineering physics (semiconductors) and the Ph.D. degree in solid-state physics, in 1980 and 1989, respectively. He is currently the Director of Research with the School of EEE, The University of Manchester, U.K. He has more than 300 publications in the areas of devices, materials, and systems for sensing and imaging. He is a Fellow of the Institution of Engineering and Technology, U.K., and the Institute of Physics, U.K. He was a Distinguished Lecturer of the IEEE Sensors Council and was the Editor-in-Chief of the IEEE SENSORS JOURNAL and the General Co-Chair of the IEEE Sensors Conferences in the last few years.