
Journal of Applied Science and Engineering, Vol. 23, No. 1, pp. 31-38 (2020) DOI: 10.6180/jase.202003_23(1).0004

Dynamic Gesture Recognition Based on Deep Learning in Human-to-Computer Interfaces

Jing Yu1, Hang Li2*, Shou-Lin Yin2*, Qingwu Shi4* and Shahid Karim3

1 Luxun Academy of Fine Arts, Shenyang 110034, P.R. China
2 Software College, Shenyang Normal University, Shenyang 110034, P.R. China
3 Institute of Image and Information Technology, Harbin Institute of Technology, Harbin 150000, P.R. China
4 College of Information Science & Electronic Technique, Jiamusi University

*Corresponding authors. E-mail: [email protected]; [email protected]; [email protected]

Abstract
Currently, gesture recognition provides a faster, simpler, more convenient, and more natural way for human-computer interaction, and it has attracted wide attention. Gesture recognition plays an important role in real life. The manual feature extraction in traditional gesture recognition methods is time-consuming and laborious. Moreover, to improve recognition accuracy, the quantity and quality of the extracted features must be very high, which is a bottleneck for traditional gesture recognition methods. Therefore, we propose a deep learning method for dynamic gesture recognition in Human-to-Computer interfaces. An improved inverted residual network architecture is used as the base of the SSD (Single Shot MultiBox Detector) network for feature extraction, and the convolution structure of the auxiliary layer is built from the inverted residual structure combined with dilated convolution. This exploits multi-scale information and reduces both the amount of calculation and the number of parameters. Transfer learning is used to optimize the trained network model so as to reduce the training time and improve convergence. Finally, experimental results show that the proposed method can recognize different gestures quickly and effectively.

Key Words: Gesture Recognition, Deep Learning, Human-to-Computer Interfaces, Feature Extraction

1. Introduction

The Human-to-Computer interface is the process by which people exchange information with computers in a certain way [1]. Recently, gesture interaction based on computer vision has become a research hotspot in the field of Human-to-Computer interfaces because of its convenient and simple equipment. HOG, SIFT and other traditional feature-based gesture recognition methods have low recognition accuracy, and it is difficult for them to identify multiple gesture targets in one image [2,3].

Molchanov [4] proposed an algorithm for drivers' hand gesture recognition from challenging depth and intensity data using 3D convolutional neural networks. This method combined information from multiple spatial scales for the final prediction. It also employed spatio-temporal data augmentation for more effective training and to reduce potential overfitting. Wilson [5] extended the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, it formulated an expectation-maximization (EM) method for training the parametric HMM. Caramiaux [6] presented a gesture recognition/adaptation system for human-computer interaction applications that, as a complement to gesture labeling, characterized the movement execution. It described a template-based recognition method that simultaneously aligned the input gesture
to the templates using a Sequential Monte Carlo inference technique. Many other approaches have also been proposed to detect gestures. However, problems such as low efficiency and high time consumption remain.

A deep learning model is a complex, multi-layer artificial neural network structure. Deep learning models have strong nonlinear modeling ability and use a general learning process to learn features from data. Compared with traditional hand-designed features, a deep learning model can express higher-level and more abstract internal features [7-9].

The deep convolutional neural network (CNN) is an effective deep learning method for image feature extraction. Because of its invariance to translation and rotation of image information, it has become a popular method in the fields of image processing and target recognition. At present, most research on gesture recognition focuses on recognizing a single hand, yet in the process of gesture interaction, two-handed operation and other people's hands often occur. For gesture recognition of multiple hands, this paper proposes a dynamic gesture recognition method based on a deep convolutional neural network. Our contributions are as follows:
1. Features are extracted by an improved inverted residual network architecture based on SSD.
2. The convolution structure of the auxiliary layer is built from the inverted residual structure combined with dilated convolution, which uses multi-scale information and reduces the amount of calculation and the number of parameters.
3. Transfer learning is used to optimize the trained network model so as to reduce the training time and improve convergence.
4. Experimental results show that the proposed method can recognize different gestures quickly and effectively.
The rest of this paper is organized as follows. In the next section, we introduce the proposed SSD-based method for gesture recognition in detail. Then, we present extensive experiments and analysis in Section 3. A conclusion is drawn in Section 4.

2. Gesture Recognition Model in Deep Learning

The main object recognition methods based on deep convolutional neural networks are RCNN, Fast RCNN, Faster RCNN and SSD, etc. [10-13]. When tested on the PASCAL VOC data set, the object recognition rate of Faster RCNN was 73.2%, with 7 frames of image recognized per second. The recognition rate of the SSD method was 72.1%, with 58 frames of image recognized per second; the recognition rate of Faster R-CNN was higher than that of SSD. The recognition rate of the YOLO method was 63.4%, and it could recognize 45 frames of image per second, a speed similar to that of the SSD method but with a significantly lower recognition rate. In this paper, a modified SSD (MSSD) model is selected as the recognition model.

2.1 SSD Network Structure
The SSD target detection model does not require time-consuming region generation and feature re-sampling steps. By directly convolving the whole image and predicting the categories and corresponding coordinates of the objects contained in it, the detection speed is greatly improved. Meanwhile, the accuracy of target detection is greatly improved by using small convolution kernels and multi-scale prediction.
The SSD network structure is divided into a Base network and an Auxiliary network. The Base network is a network that has high classification accuracy in the field of image classification, with its classification layer removed. The Auxiliary network is a convolutional network structure added on top of the Base network for target detection. The sizes of these layers gradually decrease so that multi-scale predictions can be made. Each added auxiliary network layer produces a fixed set of predictions through a series of convolution kernels. For an m × n × p feature layer (p is the channel number; m and n give the spatial size), each auxiliary network uses a 3 × 3 × p convolution kernel to predict a score for one class, and it predicts all the corresponding values at each of the m × n positions.
The SSD model predicts k bounding boxes at each position of the feature map, together with the score of an object appearing at that position and the offsets of the object position relative to the bounding box. Thus, c × k scores and 4k position offsets are predicted at each position of the feature map. For a feature map
with m × n size, it will predict (c + 4) × k × m × n outputs. Finally, non-maximum suppression is applied to obtain the final predicted values of object category and position information in the image.
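As a sanity check on this counting, the sketch below computes the number of outputs one detection layer produces; the feature-map size, default-box count and class count are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: number of predictions one SSD detection layer produces.
def predictions_per_map(m, n, k, c):
    """An m x n feature map with k default boxes per position and c classes
    yields c*k class scores and 4*k box offsets per position,
    i.e. (c + 4) * k * m * n outputs in total."""
    scores = c * k * m * n
    offsets = 4 * k * m * n
    return scores + offsets

# Example: a 19 x 19 map, 6 default boxes, 5 classes (4 gestures + background).
print(predictions_per_map(19, 19, 6, 5))  # (5 + 4) * 6 * 19 * 19 = 19494
```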
2.2 Modified SSD Network Structure
The SSD model uses the VGG network as its basic network. But the VGG network model has a large number of parameters and occupies most of the running time in the feature extraction process. Moreover, in the forward propagation process, the nonlinear transformations always cause information loss.
Shen [14] put forward the nonlinear activation function ReLU based on manifold learning theory: in high dimensions it retains information well, while in low dimensions it causes a greater loss of information. Therefore, the input layer should increase the feature dimension before the nonlinear transformation, and the output layer should use a linear activation function to reduce the feature dimension, so as to reduce the loss of information. On this basis the inverted residual block was proposed.
The down-sampling operation in the inverted residual structure causes a loss of feature information while increasing the receptive field of the convolution kernel. Therefore, we abandon the down-sampling operation in the convolution structure and introduce dilated convolution to solve this problem. Dilated convolution adds a dilation parameter to the original convolution operation: the convolution kernel is expanded to the corresponding scale, and the unused positions of the original kernel are filled with 0. Applying dilated convolution increases the receptive field of the convolution kernel without down-sampling. However, dilated convolution makes the convolution operate on discontinuous data, so small objects cannot be identified well. This paper adopts hierarchical feature fusion to solve the problems introduced by dilated convolution.
Hierarchical feature fusion sums the outputs of the convolution units in the dilated convolution layer step by step, and the results of the successive sums are combined by a concatenate operation to obtain the final output.
The inverted residual structure adopts ReLU6 as the activation function, whose output is

Y = min(max(X, 0), 6)  (1)

where Y is the output of the ReLU6 activation function and X is the input feature value. Compared with ReLU, ReLU6 is more robust in low-precision computing scenarios. In addition, 3 × 3 convolution kernels are used, and dropout and batch normalization are applied during training to reduce overfitting. The improved inverted residual structure is shown in Figure 1, where Dilated denotes dilated convolution, Linear is the linear activation function, HFF represents hierarchical feature fusion, and Dwise represents a depthwise-separable convolution structure.

Figure 1. Improved inverted residual network.
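For concreteness, here is a minimal PyTorch sketch of a block in the spirit of Figure 1, assembled from the description above (1 × 1 expansion with ReLU6, depthwise 3 × 3 dilated convolutions, hierarchical feature fusion, linear 1 × 1 projection). It is an illustration, not the authors' implementation; the expansion ratio, dilation rates and channel counts are assumed values.

```python
# Sketch of an improved inverted residual block (assumptions noted above).
import torch
import torch.nn as nn

class ImprovedInvertedResidual(nn.Module):
    def __init__(self, channels, expand_ratio=6, dilations=(1, 2, 3)):
        super().__init__()
        hidden = channels * expand_ratio
        # 1x1 expansion: raise the feature dimension before the nonlinearity.
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True))
        # Depthwise 3x3 dilated convolutions with multi-scale receptive
        # fields and no down-sampling; padding=d keeps the spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hidden, hidden, 3, padding=d, dilation=d,
                          groups=hidden, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True))
            for d in dilations])
        # 1x1 linear projection back down (no activation, limiting
        # the information loss of the nonlinear transformation).
        self.project = nn.Sequential(
            nn.Conv2d(hidden * len(dilations), channels, 1, bias=False),
            nn.BatchNorm2d(channels))

    def forward(self, x):
        h = self.expand(x)
        outs = [b(h) for b in self.branches]
        # Hierarchical feature fusion: running sums of the branch outputs,
        # then one concatenate over the partial sums.
        fused, acc = [], 0
        for o in outs:
            acc = acc + o
            fused.append(acc)
        y = self.project(torch.cat(fused, dim=1))
        return x + y  # residual connection

x = torch.randn(1, 32, 38, 38)
print(ImprovedInvertedResidual(32)(x).shape)  # torch.Size([1, 32, 38, 38])
```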


Combining the improved inverted residual structure, we modify the base layer and the auxiliary layer in the SSD model. (1) The original SSD uses the VGG network as the base layer for feature extraction, but the VGG model is not suitable for deployment on mobile devices. Therefore the inverted-residual MobileNetV2, which has fewer parameters, a smaller footprint and a faster running speed, is used as the SSD feature extraction network to reduce the size of the model and the amount of calculation. (2) The traditional convolutional structure used in the SSD auxiliary layer leads to a large number of parameters and a large amount of calculation. With the improved structure as its basic structure, the auxiliary network layer can reduce the information loss caused by nonlinear transformation in the learning process, and its convolution kernels have multi-scale receptive fields.

2.3 Loss Function in MSSD Network Structure
Generating the recognition box in the MSSD model is a regression process, and judging the category within the recognition box is a classification process. The total objective loss function is the weighted sum of the position loss (loc) and the confidence loss (conf):

L(c, l, g) = (1/N) (L_conf(c) + α L_loc(l, g))  (2)

where N is the number of default boxes matched to real boxes, and α = 1 is the weight term chosen according to the experimental situation. L_conf(c) is the Softmax cross-entropy classification loss, where c is the confidence of each category. In L_loc(l, g), l = (l_x, l_y, l_w, l_h) denotes the predicted box center (x, y), width (w) and height (h), and g = (g_x, g_y, g_w, g_h) represents the true center position (x, y), width (w) and height (h). The position loss is the smooth L1 loss over the matched (positive) boxes,

L_loc(l, g) = Σ_{i ∈ Pos} Σ_{m ∈ {x, y, w, h}} smoothL1(l_i^m - g_i^m)  (3)

where

smoothL1(x) = 0.5 x^2 if |x| < 1, and |x| - 0.5 otherwise.  (4)
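The sketch below shows how Eqs. (2)-(4) combine, under simplified assumptions: the matching of default boxes to ground truth is taken as given through a mask, and the hard negative mining that the original SSD applies to the confidence loss is omitted.

```python
# Sketch of the MSSD objective of Eqs. (2)-(4) (simplified; see lead-in).
import torch
import torch.nn.functional as F

def mssd_loss(cls_logits, loc_pred, cls_target, loc_target, pos, alpha=1.0):
    """cls_logits: (B, D, C) class scores; loc_pred, loc_target: (B, D, 4)
    box offsets; cls_target: (B, D) labels (0 = background);
    pos: (B, D) boolean mask of default boxes matched to ground truth."""
    n = pos.sum().clamp(min=1).float()                 # N in Eq. (2)
    # Softmax cross-entropy confidence loss over all boxes.
    l_conf = F.cross_entropy(cls_logits.flatten(0, 1),
                             cls_target.flatten(), reduction='sum')
    # Smooth L1 position loss over positive boxes only, Eqs. (3)-(4).
    l_loc = F.smooth_l1_loss(loc_pred[pos], loc_target[pos], reduction='sum')
    return (l_conf + alpha * l_loc) / n
```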
3. Experiments on Gesture Recognition

3.1 Data Set Analysis
In order to train the MSSD model, a gesture image data set taken from the first-person perspective is used. The experiment adopts the EgoHands gesture data set created by Indiana University [15]. EgoHands was captured with the wearable device Google Glass, with two people interacting with each other in the first-person view. The data set contains 4800 images, and each image contains 4 categories: own left hand (owlh), own right hand (owrh), opposite left hand (oplh) and opposite right hand (oprh). Each image labels the gesture region positions of the 4 categories, as shown in Figure 2. For the training of the MSSD model, the training set, verification set and test set are shown in Table 1.
Figure 2. Samples in EgoHands.

Table 1. Data sets used in this paper

Data set          Description                                            Number
Training set      Multiple scenes, multiple people, multiple activities  2500
Verification set  Same as above                                          500
Test set          Same as above                                          1000
3.2 Evaluation Index
In this paper, we adopt the following evaluation indexes to analyze the effectiveness of the proposed model.
1. IoU (intersection over union) is defined as the ratio of the intersection and the union of the areas occupied by two boxes [16]:

IoU = area(P ∩ GT) / area(P ∪ GT)  (5)

where P is the predicted box and GT is the ground-truth box.
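A direct implementation of Eq. (5) might look as follows; boxes are assumed to be axis-aligned and given as (x1, y1, x2, y2) corners, which is an assumed format.

```python
# Eq. (5): intersection over union of two axis-aligned boxes.
def iou(p, gt):
    ix1, iy1 = max(p[0], gt[0]), max(p[1], gt[1])
    ix2, iy2 = min(p[2], gt[2]), min(p[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (p[2] - p[0]) * (p[3] - p[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_p + area_gt - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```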
2. Precision and recall are two well-known quantitative indexes. The gesture recognition model classifies the contents of the identified boxes, predicts the probability of each of the four gesture categories, and takes the most likely one as the classification result:

Precision = TP / (TP + FP)  (6)

Recall = TP / (TP + FN)  (7)

F-score = 2 × Precision × Recall / (Precision + Recall)  (8)

where TP is the number of correctly detected gestures, FP is the number of other postures detected as gestures, and FN is the number of missed gestures. The F-score balances Precision and Recall; the closer it is to 1, the better the model.
3. mAP (mean Average Precision) is an index that reflects the global performance:

mAP = (1/C) Σ_{i=1}^{C} AP_i  (9)

where C is the number of categories and AP_i is the average precision of category i.
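Eqs. (6)-(9) translate directly into code; the counts and per-class AP values below are placeholders, not the paper's results.

```python
# Eqs. (6)-(9): precision, recall, F-score and mean average precision.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_score(p, r):
    return 2 * p * r / (p + r)

def mean_ap(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)

p, r = precision(90, 10), recall(90, 20)
print(round(f_score(p, r), 3))            # 0.857
print(mean_ap([0.95, 0.96, 0.94, 0.96]))  # 0.9525
```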
3.3 Fine-tuning Network and Transfer Learning
We first verify the effect of the IoU threshold on the recognition accuracy of the proposed method. In Figure 3, the blue bar is the recognition accuracy rate and the red bar is the recognition error rate. When IoU = 0.3, the recognition rate is high, but the error rate is high too. When IoU = 0.6, the result is similar to IoU = 0.9, but with IoU = 0.9 more time is needed to process one image. Therefore, we choose IoU = 0.6 in this paper.
For the gesture recognition problem in the gesture interaction process, the parameters of the MSSD model are changed as follows. The VGG-16 recognition model trained on the PASCAL VOC dataset is used to initialize the parameters of the basic network in the MSSD model; its first two layers are fixed and do not participate in back propagation. The targets to be identified are divided into four categories plus one background category, so the total number of categories is set to 5. The maximum number of recognition results per frame is set to 4, and the maximum number of recognition results per class is set to 1. This setting shows only the most likely recognition result for each gesture class, which greatly reduces false recognitions in each class. Training and testing of the MSSD model use the Caffe deep learning framework, and the graphics card is an NVIDIA GTX 1060. The original image size of the EgoHands dataset is 1240 × 720 pixels, which is adjusted to 600 × 600 during training. The training strategy is shown in Table 2. In this paper, fine-tuning and transfer learning are applied to the MSSD model network.
The size of the input image and the size of the feature map containing a true box affect the recognition accuracy of the MSSD model [17]. An added BN layer also affects the recognition accuracy of the deep learning model. This experiment therefore fine-tunes the MSSD model structure.
In the experiment, the size of the input image is adjusted from 1240 × 720 to 600 × 600 and 300 × 300; the trained models are denoted MSSD6 and MSSD3, respectively. In addition, each pixel in the Conv3×3 layer extracted from the VGG-16 basic network is given a box, and the Conv3×3 layer is also introduced into the calculation of the loss function and the back propagation process of box recognition; the training result is the MSSD+Conv3 model. The results are shown in Table 3 and Figure 4.
Transfer learning means that a learning algorithm can use the commonalities among different learning tasks to share statistical strength and transfer knowledge between tasks. Transfer learning can shorten the training time and improve the recognition rate of the model.
Bambach [18] proposed a model for EgoHands gesture recognition based on the Caffenet network. In our experiment, the basic network in the MSSD model was appropriately changed, and the parameters of the Caffenet model and of a residual network model (ResNet) were transferred to the MSSD model for training.
Figure 3. Effect of IoU on recognition.

Table 2. Parameters in the SSD model

Name                           Value
Size                           600 × 600 pixels
Learning rate                  10^-4
Forgetting rate                0.9
Weight decay                   5 × 10^-4
Image number in each iteration 3
Iteration number               64000
Table 3. mAP results with different models

Model         mAP/%   Average recognized images per second
MSSD6         91.3    10
MSSD3         89.5    12
MSSD + Conv3  84.9    9
MSSD + BN     70.8    5

Figure 4. Effect of different models on mAP.

In the experiment, the MSSD model is adjusted by changing the basic network from VGG to the top-5-layer network of the Caffenet model. Then the parameters of the Caffenet model in [18] are transferred to this basic network in the MSSD model to initialize it, and the network is trained; the training result is denoted the transfer Caffenet model. In addition, when the parameters of the Caffenet model (top-5 layer structure) are fixed so that they do not participate in back propagation, the training result is denoted the transfer Caffenet top-5 model.
The basic network of the MSSD model is also changed from VGG to a residual network with 101 layers. The parameters of the residual network trained on the PASCAL VOC data set are transferred to the basic network of the MSSD model for initialization, and then the network is trained; the training result is denoted the transfer Resnet101 model. The residual network is relatively complex, so in order to shorten the training time, the image size is adjusted to 256 × 256 for training. The training process of each transfer learning model is shown in Figure 5.
Table 4 gives the mAP results of the different transfer learning methods.

Figure 5. Effect of different transfer learning models on mAP.

Table 4. mAP results with different transfer learning models

Model                    mAP/%
MSSD6                    92.6
Transfer Caffenet top-5  91.7
Transfer Caffenet        86.2
Transfer Resnet101       73.4
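The transfer strategy described above (initialize the basic network from a pretrained model, then exclude the fixed layers from back propagation) can be sketched as follows. This is a PyTorch-style illustration, since the paper's experiments use Caffe; the parameter names, file name and two-layer cut-off are assumptions.

```python
# Sketch: transfer pretrained parameters and freeze the first layers.
import torch

def transfer_and_freeze(model, weight_path,
                        frozen_prefixes=('base.0', 'base.1')):  # assumed names
    state = torch.load(weight_path)  # e.g. a hypothetical 'pretrained.pth'
    # Copy over only the parameters whose names and shapes match.
    own = model.state_dict()
    own.update({k: v for k, v in state.items()
                if k in own and own[k].shape == v.shape})
    model.load_state_dict(own)
    # Freeze the first layers: they no longer receive gradients.
    for name, param in model.named_parameters():
        if name.startswith(frozen_prefixes):
            param.requires_grad = False
    return model
```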
3.4 Comparison Experiment
We conduct comparison experiments with four other state-of-the-art dynamic gesture recognition methods: RPS [19], GRM [20], FMCW [21] and LSPD [22]. The results are shown in Table 5 and Table 6. Figure 6 displays the mAP values for the four hands, and Figure 7 presents some gesture recognition results of our proposed method.
Our proposed MSSD method achieves better results on all hand categories in terms of mAP. Due to crossed hands covering a large area, the recognition result for such cases is not ideal, but it is still better than that of the other methods.

Table 5. Comparison results with different methods

Method  Four hands  Precision  Recall  F-score
RPS     owlh        91.73      87.58   89.54
        owrh        92.77      86.31   89.64
        oplh        90.88      85.23   87.46
        oprh        91.25      87.37   89.46
GRM     owlh        92.54      88.71   90.58
        owrh        94.63      90.28   92.86
        oplh        92.86      83.77   88.67
        oprh        93.78      89.65   92.07
FMCW    owlh        93.18      89.67   90.24
        owrh        94.71      90.58   92.45
        oplh        93.14      84.65   87.56
        oprh        94.87      88.56   92.15
LSPD    owlh        94.38      90.84   91.54
        owrh        95.37      91.57   92.84
        oplh        93.94      85.72   89.41
        oprh        95.88      90.63   93.72
MSSD6   owlh        95.21      91.62   93.79
        owrh        96.42      91.08   94.14
        oplh        94.83      90.58   93.18
        oprh        96.88      91.27   94.27
Table 6. mAP results with different methods

Model   mAP/%
RPS     78.6
GRM     84.3
FMCW    84.8
LSPD    85.2
MSSD6   88.9

Figure 6. Four hands' mAP values.

Figure 7. Part of the results: left, segmentation result; right, recognition result.

Through the wearable device, we captured first-person-view video, and 100 video frames were randomly selected as test images. The mAP obtained with the trained MSSD6 model is 93.2%, and it can recognize 20 pictures per second. The dynamic gesture recognition effect is good.

4. Conclusion

In this paper, we propose a modified deep learning model for dynamic gesture recognition in Human-to-Computer interfaces. Multiple gestures in an image can be recognized at the same time, and the average mAP of gesture recognition with the proposed MSSD6 model is larger than 90 percent. It can be used for real-time recognition in visual gesture interaction. Experiments show that the method in this paper can quickly and accurately recognize multiple gesture hands in video. In the future, we will design more advanced CNN networks to improve the accuracy of gesture recognition.

References

[1] Yang, C., J. Long, M. A. Urbin, et al. (2018) Real-time myocontrol of a human-computer interface by paretic muscles after stroke, IEEE Transactions on Cognitive & Developmental Systems 10(4), 1126-1132. doi: 10.1109/TCDS.2018.2830388
[2] Mert, A., and A. Akan (2018) Emotion recognition from EEG signals by using multivariate empirical mode decomposition, Pattern Analysis & Applications 21(1), 81-89. doi: 10.1007/s10044-016-0567-6
[3] Yu, J., H. Li, and S. L. Yin (2019) New intelligent interface study based on K-means gaze tracking, International Journal of Computational Science and Engineering 18(1), 12-20. doi: 10.1504/IJCSE.2019.096971
[4] Molchanov, P., S. Gupta, K. Kim, et al. (2015) Hand gesture recognition with 3D convolutional neural networks, Computer Vision & Pattern Recognition Workshops. doi: 10.1109/CVPRW.2015.7301342
[5] Wilson, A. D., and A. F. Bobick (2016) Parametric hidden Markov models for gesture recognition, IEEE Transactions on Pattern Analysis & Machine Intelligence 21(9), 884-900. doi: 10.1109/34.790429
[6] Caramiaux, B., N. Montecchio, and A. Tanaka (2014) Adaptive gesture recognition with variation estimation for interactive systems, ACM Transactions on Interactive Intelligent Systems 4(4), 1-34. doi: 10.1145/2643204
[7] Gao, J., P. Li, and Z. K. Chen (2019) A canonical polyadic deep convolutional computation model for big data feature learning in Internet of Things, Future Generation Computer Systems. doi: 10.1016/j.future.2019.04.048
[8] Lin, T., H. Li, and S. L. Yin (2018) Modified pyramid dual tree direction filter-based image de-noising via curvature scale and non-local mean multi-grade remnant filter, International Journal of Communication Systems 31(16). doi: 10.1002/dac.3486
[9] Yin, S. L., and J. Bi (2019) Medical image annotation based on deep transfer learning, Journal of Applied Science and Engineering 22(2), 385-390. doi: 10.6180/jase.201906_22(2).0020
[10] Yin, S. L., Y. Zhang, and S. Karim (2018) Large scale remote sensing image segmentation based on fuzzy region competition and Gaussian mixture model, IEEE Access 6, 26069-26080. doi: 10.1109/ACCESS.2018.2834960
[11] Yin, S. L., Y. Zhang, and S. Karim (2019) Region search based on hybrid CNN in optical remote sensing images under cloud computing environment, International Journal of Distributed Sensor Networks 15(5). doi: 10.1177/1550147719852036
[12] Ren, S., K. He, R. Girshick, et al. (2017) Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis & Machine Intelligence 39(6), 1137-1149. doi: 10.1109/TPAMI.2016.2577031
[13] Li, J., H. C. Wong, S. L. Lo, et al. (2018) Multiple object detection by deformable part-based model and R-CNN, IEEE Signal Processing Letters PP(99), 1-1. doi: 10.1109/LSP.2017.2789325
[14] Shen, J., J. Bu, B. Ju, et al. (2012) Refining Gaussian mixture model based on enhanced manifold learning, Neurocomputing 87(1), 19-25. doi: 10.1016/j.neucom.2012.01.029
[15] Bambach, S., S. Lee, D. J. Crandall, et al. (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions, 2015 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society. doi: 10.1109/ICCV.2015.226
[16] Lepetit-Aimon, G., R. Duval, and F. Cheriet (2018) Large receptive field fully convolutional network for semantic segmentation of retinal vasculature in fundus images, International Workshop on Computational Pathology, 201-209. doi: 10.1007/978-3-030-00949-6_24
[17] Liu, W., D. Anguelov, D. Erhan, et al. (2016) SSD: single shot MultiBox detector, European Conference on Computer Vision (ECCV), 21-37. doi: 10.1007/978-3-319-46448-0_2
[18] Bambach, S., S. Lee, D. J. Crandall, et al. (2015) Lending a hand: detecting hands and recognizing activities in complex egocentric interactions, 2015 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society. doi: 10.1109/ICCV.2015.226
[19] Zhou, Z., Z. Cao, and Y. Pi (2018) Dynamic gesture recognition with a Terahertz radar based on range profile sequences and Doppler signatures, Sensors 18(1), 10. doi: 10.3390/s18010010
[20] Verma, B., and A. Choudhary (2018) Framework for dynamic hand gesture recognition using Grassmann manifold for intelligent vehicles, IET Intelligent Transport Systems 12(7), 721-729. doi: 10.1049/iet-its.2017.0331
[21] Zhang, Z., Z. Tian, and Z. Mu (2018) Latern: dynamic continuous hand gesture recognition using FMCW radar sensor, IEEE Sensors Journal 18(8). doi: 10.1109/JSEN.2018.2808688
[22] Nguyen, X. S., L. Brun, O. Lezoray, et al. (2019) Skeleton-based hand gesture recognition by learning SPD matrices with neural networks, IEEE International Conference on Automatic Face & Gesture Recognition (FG). IEEE. doi: 10.1109/FG.2019.8756512

Manuscript Received: Jul. 22, 2019
Accepted: Oct. 19, 2019
