Development of a Hand Pose Recognition System on an Embedded Computer Using Artificial Intelligence
Abstract—The recognition of hand gestures is a very interesting research topic due to the growing demand in recent years in robotics, virtual reality, autonomous driving systems, human-machine interfaces and other new technologies. Despite several approaches for a robust recognition system, gesture recognition based on visual perception has many advantages over devices such as sensors or electronic gloves. This paper describes the implementation of a vision-based recognition system on an embedded computer for the recognition of 10 hand poses. Hand detection is achieved using a tracking algorithm, and classification is performed by a light convolutional neural network. Results show an accuracy of 94.50%, low power consumption and a near real-time response. Thereby, the proposed system could be applied in a large range of applications, from robotics to entertainment.

Index Terms—Gesture Recognition, Human-Machine Interaction, Recognition System, Hand Poses, Embedded Computer.

I. INTRODUCTION

Hand gesture recognition is an obvious strategy for building user-friendly interfaces between machines and users. In the near future, hand posture recognition technology would allow the operation of complex machines and smart devices through only a series of hand postures, finger and hand movements, eliminating the need for physical contact between man and machine. Hand gesture recognition on images from a common single camera is a difficult problem because of occlusions, variations in posture appearance, differences in hand anatomy, etc. Despite these difficulties, several approaches to gesture recognition on color images have been proposed during the last decade [1].

In recent years, Convolutional Neural Networks (CNNs) have become the state of the art for object recognition in computer vision [2]. In spite of the high potential of CNNs in object detection [3] [4] and image segmentation [2] tasks, only a few papers report successful results (a recent survey on hand gesture recognition [1] reports only one important work [5]). Some obstacles to the wider use of CNNs are high computational costs, the lack of sufficiently large datasets, as well as the lack of hand detectors appropriate for CNN-based classifiers. In [6], a CNN has been used for the classification of six hand poses to control robots using colored gloves. In more recent work [7], a CNN has been implemented on the Nao robot. In a recent work [8], a CNN has been trained on one million images. However, only a portion of the dataset, with 3361 manually labeled frames in 45 classes of sign language, is publicly available.

In this work we developed a system for hand pose recognition that works on embedded computers with limited computational resources and low power consumption. In order to accomplish these targets, we employ low-processing-cost algorithms and trained a light CNN, which was optimized to balance high accuracy, fast time response, low power consumption and low computational costs.

II. METHODOLOGY

A. Overview

The proposed system works with images captured from a standard CMOS camera and is executed on embedded computers with low computational resources, without GPU support, such as the Raspberry Pi, BeagleBone Board, Banana Pi, Intel Galileo Board, and others. Therefore, the main objectives of the proposed system are as follows: a high accuracy rate, fast time response, low power consumption and low computational costs.

The system is composed of three main steps: hand detection, hand region tracking and hand gesture recognition. In the first step, a Haar cascade classifier detects a basic hand shape in order to obtain a good initial hand detection. Then, this hand region is tracked using the MIL (Multiple Instance Learning) tracking algorithm. Finally, hand gesture recognition is performed by a trained Convolutional Neural Network. Since the steps described above are designed to consume few computational resources, the whole system is implemented on a personal computer and on a Raspberry Pi board. Fig. 1 shows the steps mentioned above.

Fig. 1: Diagram for the proposed system
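To make the pipeline above concrete, the following Python/OpenCV sketch shows one possible way to wire the three steps together with off-the-shelf components. It is a minimal illustration under stated assumptions, not the released implementation: the cascade file hand_cascade.xml, the classify_pose() helper standing in for the trained light CNN, and the 32x32 input size are placeholders, and the MIL tracker factory name differs between OpenCV versions.

    import cv2

    # Hypothetical stand-ins (not released with this work): a Haar cascade file
    # trained on the basic hand shape, and a classifier wrapping the light CNN.
    HAND_CASCADE_PATH = "hand_cascade.xml"

    def classify_pose(crop):
        # Placeholder for the trained light CNN; it would map the cropped hand
        # region to one of the 10 pose labels.
        return None

    hand_cascade = cv2.CascadeClassifier(HAND_CASCADE_PATH)
    cap = cv2.VideoCapture(0)   # standard CMOS camera
    tracker = None

    while True:
        grabbed, frame = cap.read()
        if not grabbed:
            break

        if tracker is None:
            # Step 1: Haar cascade detection of the basic hand shape.
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hands = hand_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                  minNeighbors=5)
            if len(hands) > 0:
                x, y, w, h = hands[0]
                # Step 2: track the detected hand region with the MIL tracker
                # (cv2.legacy.TrackerMIL_create() in some OpenCV 4.x builds).
                tracker = cv2.TrackerMIL_create()
                tracker.init(frame, (int(x), int(y), int(w), int(h)))
        else:
            tracked, box = tracker.update(frame)
            if tracked:
                x, y, w, h = [int(v) for v in box]
                crop = frame[max(y, 0):y + h, max(x, 0):x + w]
                if crop.size:
                    # Step 3: pose classification on the tracked hand region
                    # (the 32x32 input size is an assumption for illustration).
                    label = classify_pose(cv2.resize(crop, (32, 32)))
            else:
                tracker = None   # re-run detection when tracking is lost

    cap.release()

Re-running the Haar detector only when the tracker loses the hand is one way to keep the per-frame cost low enough for boards such as the Raspberry Pi.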
IV. CONCLUSION
In this paper, we introduced the implementation of a hand pose recognition system on a regular embedded computer. We demonstrated that our system is capable of recognizing 10 hand gestures with an accuracy of 94.50% on images captured from a single RGB camera, while drawing low power, about 0.690 W. In addition, the average time to process each 640x480 image on a Raspberry Pi 3 board is 351.2 ms. The results demonstrate that our recognition system is suitable for embedded applications in robotics, virtual reality, autonomous driving systems, human-machine interfaces and others.
REFERENCES
[1] Oyedotun, O., Khashman, A.: Deep learning in vision-based static hand gesture recognition. Neural Computing and Applications (2016) 1–11
[2] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS. (2012) 1097–1105
[3] Kwolek, B.: Face detection using convolutional neural networks and Gabor filters. In: Int. Conf. on Artificial Neural Networks, LNCS, vol. 3696, Springer (2005) 551–556
[4] Arel, I., Rose, D., Karnowski, T.: Research frontier: Deep machine learning - a new frontier in artificial intelligence research. Comp. Intell. Mag. 5(4) (2010) 13–18
[5] Tompson, J., Stein, M., Lecun, Y., Perlin, K.: Real-time continuous pose recovery of human hands using convolutional networks. ACM Trans. Graph. 33(5) (2014)
[6] Nagi, J., Ducatelle, F.: Max-pooling convolutional neural networks for vision-based hand gesture recognition. In: IEEE ICSIP. (2011) 342–347
[7] Barros, P., Magg, S., Weber, C., Wermter, S.: A multichannel convolutional neural network for hand posture recognition. In: 24th Int. Conf. on Artificial Neural Networks (ICANN), Cham, Springer (2014) 403–410
[8] Koller, O., Ney, H., Bowden, R.: Deep hand: How to train a CNN on 1 million hand images when your data is continuous and weakly labelled. In: IEEE Conf. on Computer Vision and Pattern Recognition (2016) 3793–3802
[9] Babenko, B., Yang, M.-H., Belongie, S.: Visual tracking with online multiple instance learning. In: IEEE CVPR (2009)
[10] Jones, M.J., Rehg, J.M.: Statistical color models with application to skin detection. Int. J. Comput. Vision 46(1) (2002) 81–96
[11] Núñez Fernández, D., Kwolek, B.: Hand posture recognition using convolutional neural network. In: Mendoza, M., Velastín, S. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2017. Lecture Notes in Computer Science, vol. 10657. Springer, Cham
[12] Szegedy, C., et al.: Going deeper with convolutions. In: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, MA (2015)
[13] Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. In: British Machine Vision Conference (2014) 6.1–6.12
[14] Lin, M., Chen, Q., Yan, S.: Network in network. In: Int. Conf. on Learning Representations (ICLR) (2014)