See discussions, stats, and author profiles for this publication at: [Link]

Action Classification Based on 2D Coordinates Obtained by Real-time Pose Estimation

Conference Paper · February 2019
Citations: 3 · Reads: 3,834
4 authors, including: Muthu Subash Kavitha (Nagasaki University) and Takio Kurita (Hiroshima University)

All content following this page was uploaded by Muthu Subash Kavitha on 01 May 2019.


Action Classification Based on 2D Coordinates Obtained by Real-time Pose Estimation

Siyi Shuai, Muthusubash Kavitha, Junichi Miyao, and Takio Kurita
Department of Information Engineering, Hiroshima University
1-4-1 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan
{m174409, kavitha, miyao, tkurita}@[Link]

Abstract—Human action classification is a significant problem in the field of computer vision. To retrieve essential information from a large number of videos, understanding their content is very important. In this study, we propose an approach that classifies human actions based on the coordinate information of body parts. The key coordinate points extracted from each frame by a real-time pose estimation algorithm are accumulated into a matrix. These accumulated coordinates are then fed to a convolutional neural network (CNN) to classify human actions, which is the main contribution of this study. The approach is designed to ignore the background and to consider only the movement information of the joints of the extracted poses. The CNN consists of three convolutional layers, a pooling layer, and a linear layer, which extract the most relevant features for classifying human actions. We use two benchmark datasets to validate the performance of the proposed approach. On the KTH dataset, the proposed approach achieves very high accuracy (100%) over six different types of actions, which is higher than other competitive approaches.

Index Terms—action classification, pose estimation, 2D coordinates, CNN

I. INTRODUCTION

Human action recognition, classification, and understanding in videos has been a significant research domain in computer vision. Recently, large-scale online video resources (e.g., YouTube) and datasets (e.g., ActivityNet) have become easily available. In order to control this information explosion, it is necessary to recognize and analyze video content for various purposes, such as search and recommendation. Many applications involve video understanding, intelligent search and retrieval, surveillance, and human-computer interaction. However, human activity recognition and classification in videos is considered one of the most challenging visual tasks.

Fig. 1: Results of human actions obtained using the real-time pose estimation algorithm. Different frames show different postures and joint positions: (a) boxing action and joint movement, (b) pull-up action and joint movement.

Human activity, no matter how common, is done for some purpose. For example, in order to accomplish a physical action, a person interacts with and gives feedback to the environment using the head, hands, arms, legs, body, etc. [1]. Different joint positions express different actions; hence, the movements of several joints of the human body can be considered as a human activity. An image-based human action is an immobile human posture, whereas a video-based human action consists of a sequence of such postures, as shown in figure 1. The size of the person appearing in different video frames is not always the same; if the person appears smaller, the range of joint movement is also smaller, which makes it hard to learn across different person sizes. Nevertheless, the joint position information in each video frame is significant for action classification, and there has been important progress on the related tasks of real-time human pose estimation and CNN-based human activity classification.

For real-time human pose estimation, progress has been achieved by using deep
neural networks. There are two different approaches to analyzing the human pose: top-down approaches and bottom-up approaches. Each has its own characteristics. In our research, we utilize the real-time human pose estimation method of [2]. In this study, we use the 2D coordinates obtained by this real-time pose estimation approach [2] as the cues for action classification. In the proposed approach, the 2D coordinates of a person in each frame of the video clips are extracted and accumulated into a matrix. These accumulated coordinates are then fed to a neural network to classify human activities. The performance of the proposed approach is evaluated on two different benchmark datasets.

This paper is organized as follows. Related work is briefly reviewed in section II. Section III explains the proposed approach. Experimental results and conclusions are presented in sections IV and V, respectively.

II. RELATED WORKS

It is worthwhile to study the problem of recognition and understanding of human activities in computer vision. We investigate and summarize the historical evolution of approaches related to action recognition [3], [1] and human pose estimation [4]. The review mainly focuses on recent and the most relevant methods.

A. Action recognition methods

Traditional action recognition methods largely pay attention to global video representations and achieved good results on datasets such as KTH [5], HMDB51, and UCF101. These approaches focused on exploiting local appearance and motion information such as histograms of oriented gradients (HOG), histograms of optical flow (HOF), motion boundary histograms (MBH), or dense trajectories. Furthermore, in order to aggregate and encode this information and produce a global video-level representation, they used spatio-temporal pyramids with bag-of-words (BoW) or Fisher vector based encoding. Finally, human action classification was achieved with traditional SVMs.

Among neural networks, CNNs in particular have been shown to reach great performance in action recognition [1]. A. Karpathy et al. [6] explored multiple approaches for frame-level fusion and utilized local spatio-temporal information through the connectivity of the CNN in the time domain. K. Simonyan et al. [7] proposed a two-stream CNN approach comprising spatial and temporal networks for action recognition; a CNN trained on multi-frame dense optical flow achieved great results. By using a separate CNN stream to learn frame-level spatial information, the combined two-stream CNN model showed better performance than the traditional methods. However, the aforementioned methods require computing optical flow separately before optimizing the parameters. In order to solve this issue, a 3D CNN was proposed by S. Ji et al. [8] for action recognition. The 3D CNN model extracts features along the spatio-temporal dimensions, and thus motion information can be captured across multiple adjacent frames. Recently, D. Tran et al. [9] proposed a method that can be applied to various video understanding tasks involving objects, actions, and scenes.

Recently, some research has also approached the action recognition task from the perspective of human skeletons. V. Raviteja et al. [11] proposed a new skeletal description using rotations and translations in 3D space, so that 3D geometric relationships between various body parts can be modeled. D. Yong et al. [12] divided the human skeleton into five parts on the basis of human physical structure and proposed an end-to-end hierarchical RNN structure constructed from five sub-nets.

Additionally, Long Short-Term Memory (LSTM) models have been used to store, modify, and access the internal state of memory cells, which allows them to discover long-range temporal relationships. Hence LSTMs achieve state-of-the-art results in various applications, such as handwriting recognition, segmentation of events, emotion detection, and speech recognition [10]. A complete overview of LSTMs is beyond the scope of this study, and hence we do not introduce them in detail.

B. Human pose estimation methods

Human pose estimation mainly focuses on finding and localizing the key points of an individual and describing human skeleton information using "parts" of the body [13]. Traditional human pose estimation methods are generally based on the idea of template matching using a geometrical prior model; the key points describe how to use the template [14]. It expresses the whole human body structure, including the key points of the body, the limb structures, and the relationships between the different limb structures. P. F. Felzenszwalb et al. [15] proposed a classic approach based on a pictorial structure model of the spatial correlations between the parts of the body. However, these methods [16] achieve good results only when the limbs of the person are visible in the images; hence they are easily influenced by errors such as double counting.

Recently, there has been huge interest in models that employ deep neural networks for the task of articulated pose estimation. These can be divided into two research directions: top-down and bottom-up methods. Top-down approaches first perform person detection and then apply pose estimation. W. Shih-En et al. [17] proposed a multi-stage CNN architecture that provides additional information regarding the co-occurrence, interdependence, and context of body parts in each stage of the network; thus the network can implicitly learn image-dependent spatial relationships between the body parts. Y. Chen et al. [18] addressed the difficulty of detecting different key points by handling simple and difficult key points separately. Common top-down approaches are prone to problems such as wrongly detected person positions and repeatedly detected persons; these issues cause key point detection errors or generate different key points for the same person. F. Hao-shu et al. [19] proposed a method that solves these problems using bottom-up methods; it mainly focuses on key point detection and clustering, and full poses are assembled after all key points are detected.
X. Fangting et al. [20] proposed a method that divides the human body into different parts; key points located at specific positions of the segmented areas are used to model the relationships between the key points of the divided body parts. Instead of using segmented areas to perform pose estimation, Zhe Cao et al. [2] map the relationships of key points into part affinity fields (PAFs), learning to associate different body parts with individuals in the image. The architecture jointly learns part locations and their associations through two branches of the same sequential prediction process. Furthermore, it maintains high accuracy while achieving real-time performance even with a large number of people in the image, reaching 8.8 fps on a video with 19 people. On the COCO 2016 key point challenge dataset, this architecture set the state of the art, and its results significantly exceed the previous state-of-the-art methods on the MPII multi-person dataset. We take advantage of this method to obtain the coordinate information of the body parts for classifying human actions.

III. ACTION CLASSIFICATION FRAMEWORK

A. Outline of the proposed method

The human pose estimation approach provides 2D information of body key points, i.e., the 2D joint positions in each frame. This 2D information expresses the status of each person in each frame, so a sequence of frames conveys the action information of each person. Different positions of the human joints lead to different actions: across different actions the joint positions are very different, whereas within the same type of action the joint information is very similar. For example, in actions such as "standing" and "riding", the positions of the arms and legs are quite different, whereas actions such as "walking" and "running" appear similar in real scenes from some camera angles, as shown in figure 2.

Fig. 2: Coordinate points of human joints: (a) standing, (b) riding, (c) walking, (d) running.

We consider the 2D coordinates of body key points as important features that can help to classify actions. Therefore, we take advantage of the results of the real-time pose estimation approach [2] to obtain 2D coordinate information. Even for the same action, the joint positions differ with the size of the person, so it is hard to use the raw position information directly. In order to overcome the different sizes of the person in each frame, we normalize all coordinates to a 1 × 1 grid and classify the human actions with a convolutional neural network model. The outline of the proposed approach is shown in figure 3.

B. Real-time pose estimation

The real-time pose estimation method [2] is used to obtain the 2D coordinate information of body key points. We use color video clips as inputs and produce the joint positions of a person as outputs. Each joint position of a person can be represented by (x_i, y_i). We extract 14 body key points, including the nose, neck, shoulders, elbows, wrists, hips, knees, and ankles; hence we set P equal to 14. The position of a person is expressed using all of the joint positions, and thus the feature vector for one frame is defined as

    x = (x_1, y_1, x_2, y_2, ..., x_P, y_P)^T    (1)

Consider a video clip with F frames, expressed as an F × 2P matrix, as shown in figure 4. The action over all frames of the video is accumulated in a matrix, which is expressed as

    X = [x_1, x_2, ..., x_F]^T    (2)

so that the f-th row of X (f = 1, ..., F) is the feature vector x_f^T of frame f. The accumulated matrix X generated from all frames is used as input to the convolutional neural network architecture, which is the main contribution of our study. Unlike approaches that use images as input to classify actions, we ignore the background and consider only the movement information of the joints of the extracted poses. The length of the feature vector in each frame is determined by the number of joint positions to be extracted. The matrix we designed makes it easy to follow the movement of all extracted coordinates in each frame, and it is suitable for describing the changes in each coordinate value over several consecutive frames.

C. Action classification using neural networks

We design a five-layer neural network to classify the actions. The proposed architecture includes three convolutional layers, max pooling, and a fully connected layer, as shown in figure 5. The convolutional layers extract the features. The information of all joint positions is important for action recognition, and hence the size of one side of the convolution filter is set to the length of the feature matrix that was accumulated, as shown in figure 4.
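As a concrete illustration of the accumulation step described above, the following Python sketch builds the F × 2P matrix X of equation (2) from hypothetical per-frame keypoints. The min-max style of the 1 × 1 normalization and the helper names are our assumptions; the paper does not specify the exact normalization.

```python
import numpy as np

P = 14  # number of body key points per frame

def normalize_frame(keypoints):
    """Scale one frame's (P, 2) keypoint array into a 1 x 1 grid
    (assumed min-max person-size normalization, Section III-A)."""
    kp = np.asarray(keypoints, dtype=float)
    mins = kp.min(axis=0)
    span = kp.max(axis=0) - mins
    span[span == 0] = 1.0          # avoid division by zero for degenerate poses
    return (kp - mins) / span      # every coordinate now lies in [0, 1]

def accumulate(frames):
    """Stack per-frame feature vectors x_f = (x_1, y_1, ..., x_P, y_P)
    into the F x 2P matrix X of equation (2)."""
    return np.stack([normalize_frame(f).reshape(-1) for f in frames])

# Hypothetical input: 60 frames of 14 (x, y) joint positions in pixels.
rng = np.random.default_rng(0)
frames = rng.uniform(0, 160, size=(60, P, 2))
X = accumulate(frames)
print(X.shape)  # (60, 28)
```

Each row of X is one frame's normalized pose, so adjacent rows capture the short-time joint movement that the convolution filters operate on.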
Fig. 3: Outline of the proposed human action classification approach.

Fig. 4: Matrix obtained from all extracted coordinates of all frames.

Fig. 5: Proposed convolutional neural network model.

The convolutional layers we designed can extract information about the shape of the pose as well as its short-time movement. The final fully connected layer classifies the actions. The convolutional layers with ReLU activation function are defined as

    x_j^l = f( Σ_{i ∈ X_j} x_i^{l-1} * w_{ij}^l + b_j^l )    (3)

where X_j is the set of indices of the input features, l denotes the current layer, and x_j^l is the feature map output of layer l. Also, x_i^0 represents the input matrix accumulating all joint positions of all frames. Each output feature of each layer is given an additive bias b. The activation function f is defined as

    f(x) = max(0, x)    (4)

In the last layer, the softmax function is used to classify the actions.

Let {(X_i, t_i) | i = 1, ..., N} be the set of training samples, where X_i is the matrix obtained from the extracted joint positions of the i-th video and t_i is the teacher signal represented as a one-hot vector. To train the parameters of the network, we use the standard softmax cross-entropy loss, defined as

    L = − Σ_{i=1}^{N} Σ_{j=1}^{K} t_{ij} log y_{ij}    (5)

where y_{ij} is the output of the network for the j-th class on the i-th sample.

IV. EXPERIMENTS

A. Dataset

In the field of action recognition and video understanding, KTH, UCF101, Hollywood, Sports-1M, and HMDB51 are commonly used datasets for confirming the performance of a developed approach. Among these, we use the KTH and UCF50 datasets to test our proposed model. The KTH video dataset contains six types of human actions: running, walking, boxing, jogging, hand waving, and hand clapping. The spatial resolution of each video is 160×120 pixels, with an average clip length of four seconds. It includes 600 video clips in total, each with a frame rate of 25 fps.

The UCF50 dataset contains 50 types of human actions. The proposed approach mainly focuses on classifying single-person actions in video clips; hence we use the videos from UCF50 that come under this criterion.
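The building blocks of equations (4) and (5) can be sketched in NumPy as follows. The batch values are illustrative, and the max-subtraction in the softmax is a standard numerical-stability device not stated in the paper.

```python
import numpy as np

def relu(x):
    """Activation of equation (4): f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def softmax(z):
    """Row-wise softmax producing class probabilities y_ij."""
    z = z - z.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, targets):
    """Softmax cross-entropy loss of equation (5):
    L = - sum_i sum_j t_ij * log(y_ij), with t the one-hot teacher signal."""
    y = softmax(logits)
    return -np.sum(targets * np.log(y + 1e-12))

# Hypothetical batch: N = 4 samples, K = 6 KTH classes, all labelled class 0.
logits = np.array([[5.0, 0, 0, 0, 0, 0]] * 4)
targets = np.eye(6)[[0, 0, 0, 0]]
loss = cross_entropy(logits, targets)
print(relu(np.array([-1.0, 2.0])))  # [0. 2.]
```

Because the logits strongly favor the correct class, the loss is small; a misclassified batch would yield a much larger value.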
We used 11 human actions: golf swing, playing violin, pommel horse, pull up, push ups, lunges, nun chucks, rope climbing, rock climbing indoor, tai chi, and jumping jack. This subset consists of 913 video clips in total, and each action includes 83 video clips.

B. Feature extraction

As explained in section III, the coordinates are extracted by applying the real-time pose estimation algorithm. The algorithm detects 18 key points in total from the body parts. The eyes and ears are not necessary, so they are not included in the feature vectors; finally, a single person is represented by a feature vector of 14 coordinates. The ksize and stride are set according to how many coordinates are to be extracted: for example, in the first CNN layer, the ksize and stride are set to [14, 2] and [2, 1], respectively, as shown in figure 6, where p and f are the numbers of coordinates and frames to be extracted.

Fig. 6: The ksize and stride setting of the first CNN layer as an example.

C. Classification results

We select one frame from every five frames and use it as input to the model. The input is an accumulated matrix consisting of 60 frames, and each frame is represented by one vector comprising 14 coordinates; therefore, the input size for one video clip is 60 × 28. We randomly divide the dataset into 70% for training and 30% for testing. The mini-batch size is set to 100. We use SGD as the optimizer and set the learning rate to 0.001. In order to prevent overfitting, dropout and weight decay are added before the linear layer; the dropout rate is 0.5 and the weight decay is 0.01.

In table I, we show the performance of our proposed method on the different datasets in terms of accuracy and loss values. The human action classification results of our proposed approach on the KTH dataset show very high performance, as shown in figure 7. Figure 8 shows the action classification results for eleven types of human actions from the UCF50 dataset; the precision, recall, and F-measure of our proposed method on these eleven action types are presented in table II. Table III compares the action classification accuracy on the KTH dataset with other competitive approaches; our proposed approach achieves the best accuracy and outperforms the previous approaches.

Fig. 7: Human action classification on the KTH dataset for a total of 5000 iterations: (a) accuracy on the train and test sets, (b) loss values on the train and test sets.

Fig. 8: Human action classification of eleven types of actions from the UCF50 dataset for a total of 5000 iterations: (a) accuracy on the train and test sets, (b) loss values on the train and test sets.
TABLE I: Performance of the proposed approach in terms of accuracy and loss values.

Measures         KTH     UCF50
Train accuracy   1.0     0.964
Train loss       0.065   0.118
Test accuracy    1.0     0.803
Test loss        0.085   0.667

TABLE II: Performance of the proposed approach on different types of human actions from the UCF50 dataset in terms of precision, recall, and F-measure.

Types of actions        Precision   Recall   F-measure
Pommel horse            0.91        0.91     0.91
Pull up                 0.67        0.77     0.71
Push ups                0.83        1.00     0.91
Golf swing              0.78        0.78     0.78
Playing violin          0.89        0.89     0.89
Nun chucks              0.85        0.85     0.85
Lunges                  0.86        0.86     0.86
Rope climbing           0.70        0.54     0.61
Rock climbing indoor    0.71        0.63     0.67
Tai chi                 0.86        0.86     0.86
Jumping jack            0.77        0.76     0.79

TABLE III: Comparison of the human action classification accuracy of our proposed method with state-of-the-art approaches on the KTH dataset.

Approach                   KTH
R. Mahdyar et al. [21]     95.6%
A. Fadwa et al. [22]       98.90%
Our proposed method        100%

V. CONCLUSION

In this study, we proposed a human action classification approach based on fourteen 2D coordinate points obtained with a real-time pose estimation algorithm. The designed multi-frame matrix composed of the extracted 2D coordinates of the human body is used as input to train the convolutional neural network model, which is the main contribution of this study. The matrix we designed makes it easy to follow the movement of all coordinates in each frame, and it is suitable for describing the changes in each coordinate value over several consecutive frames. The human action classification results of our proposed approach on six different types of human actions from the KTH dataset show very good performance, higher than the other competitive approaches. The human action classification performance on eleven different types of human actions from the UCF50 dataset yields very high to moderate results in terms of precision, recall, and F-measure values.

In the present study, we considered single-person action classification. In the future, we plan to extend the proposed classification approach to multi-person actions. Furthermore, we consider extending our approach by modeling the relationships of more joint points with other neural network architectures, such as graph convolutional neural networks, to generalize the performance of the results.

ACKNOWLEDGEMENT

This work was partly supported by JSPS KAKENHI Grant Number 16K00239.

REFERENCES

[1] Yu Kong and Yun Fu. Human action recognition and prediction: A survey. 2018.
[2] Zhe Cao, T. Simon, S.-E. Wei, and Y. Sheikh. Realtime multi-person 2D pose estimation using part affinity fields. In CVPR, 2017.
[3] F. Negin and F. Bremond. Human action recognition in videos: A survey. 2016.
[4] T. Moeslund, A. Hilton, and V. Kruger. A survey of advances in vision-based human motion capture and analysis. In CVIU, 2006.
[5] C. Schuldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR, 2004.
[6] A. Karpathy et al. Large-scale video classification with convolutional neural networks. In CVPR, pages 1725-1732, 2014.
[7] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, pages 568-576, 2014.
[8] S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):221-231, 2013.
[9] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, pages 4489-4497, 2015.
[10] J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, pages 4694-4702, 2015.
[11] V. Raviteja, Felipe Arrate, and Rama Chellappa. Human action recognition by representing 3D skeletons as points in a Lie group. In CVPR, 2014.
[12] D. Yong, Yun Fu, and Liang Wang. Hierarchical recurrent neural network for skeleton based action recognition. In CVPR, 2015.
[13] A. Bulat and G. Tzimiropoulos. Human pose estimation via convolutional part heatmap regression. In ECCV, 2016.
[14] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. In TPAMI, 2013.
[15] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. In IJCV, 2005.
[16] M. Andriluka, S. Roth, and B. Schiele. Pictorial structures revisited: People detection and articulated pose estimation. In CVPR, 2009.
[17] W. Shih-En, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh. Convolutional pose machines. In CVPR, 2016.
[18] C. Yilun, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, and Jian Sun. Cascaded pyramid network for multi-person pose estimation. 2017.
[19] F. Hao-Shu, Shuqin Xie, Yu-Wing Tai, and Cewu Lu. RMPE: Regional multi-person pose estimation. In ICCV, 2017.
[20] X. Fangting, Peng Wang, Xianjie Chen, and Alan Yuille. Joint multi-person pose estimation and semantic part segmentation. In ICCV, 2017.
[21] R. Mahdyar Ravanbakhsh, Hossein Mousavi, Mohammad Rastegari, Vittorio Murino, and Larry S. Davis. Action recognition with image based CNN features. 2015.
[22] A. Fadwa, Chunbo Bao, Arwa Mohammed Taqi, Mariofanna Milanova, and Nabeel Ghassan. Human actions recognition based on 3D deep neural network. In NTICT, 2017.
