Human Activity Classification
Aristos Athens, Zachary Blum, Navjot Singh
I. INTRODUCTION

Activity recognition is an important task in several healthcare applications. By continuously monitoring and analyzing user activity, it is possible to provide automated recommendations to both patients and doctors [2, 6]. There are also applications in consumer products, such as data logging for smart-watch health apps [3]. Common consumer devices such as smartphones and smart watches generally ship with IMUs (Inertial Measurement Units), which are packaged accelerometer and gyroscope sensors. Using the information provided by these IMUs, machine learning techniques can be used to train activity classifiers, giving users, doctors, and app developers access to an individual's lifestyle and activity choices. In this paper we examine two questions in parallel: what is the "best" classification technique, and how well can it perform with fewer features? To compare classification techniques we use a variety of metrics, including classification accuracy, required dataset size, and prediction speed. In particular, we examine the use of logistic regression, support vector machines (SVMs), various decision tree techniques, and neural networks. We then examine whether these techniques can perform just as well with reduced sensor channels, for example an IMU from a single body location vs. multiple IMUs placed across the body.
II. RELATED WORK

Because of its many applications, supervised human activity classification using sensor data is a relatively popular research area. Through our research, we have found that related articles and their approaches can generally be divided into three categories: Naive Bayes classification, SVMs/decision trees, and neural networks.

We consider the use of Naive Bayes as a classifier for human activity clever and interesting, since it is usually used for text classification. One such article that uses the Naive Bayes classifier is Long, Yin, and Aarts, 2009 [9]. This article was similar to our paper in that it included sensor data from multiple subjects for the purpose of multi-class activity classification, and it evaluated its models using cross-validation (and compared its results to those of other methods, including decision trees). One way in which this paper differs from our work (and is a strength of the paper) is its use of Principal Component Analysis before conducting Naive Bayes, to reduce the feature space and the correlation between the features. This classification model, however, only had an accuracy of around 80 percent.

One study that examined decision trees was [Parkka, 2006] [5]. Similar to our paper, the study used an ordinary decision tree grown via cross-validation, using the Gini loss for each split. One strength of the article is that, in addition to this "automatically generated decision tree," the researchers also created a "custom decision tree" using their expert knowledge and analysis of the sensor output. However, the classes in the article are improperly balanced (with one class accounting for between 50 and 60 percent of all of the data), which could affect the true test performance (assuming that the test data also suffers from the same problem).

The paper by Youngwook Kim and Hao Ling [7] discusses an interesting approach to human activity recognition using data obtained from a Doppler radar. Similar to our approach, this paper incorporates SVM models that are tuned through cross-validation over a range of hyperparameter values to pick an optimal one. However, the features and characteristics of the data are intrinsically different from the IMU and heart rate data that our paper deals with. For example, we do not have to take into consideration the angle of the subject with respect to the radar and apply a different model based on such edge cases. These variations in the data lead the authors to create a classification model consisting of both decision trees and SVMs. We believe our data is easier to classify and as such can achieve a higher classification rate than the 90% achieved in this paper.

We found that the state-of-the-art approach to supervised human activity classification generally involves neural networks, particularly convolutional neural networks (CNNs) and recurrent neural networks (RNNs). A number of studies [8] [12] use multiple datasets to classify similar activities. Both [Hamerla, 2016] and [Xue] incorporate CNNs and KNNs with moving windows in order to look at the activities of a user over a period of time, and they achieve accuracy results above 95%. Given the limited scope of CS 229, we did not try to emulate these techniques, but we believe they can produce more robust models, especially for similar activities like playing soccer or running, where it may be important to look over a history rather than a particular time-stamp.

III. DATASET AND FEATURES

We used the PAMAP2 dataset from the UCI repository of machine learning datasets [16, 15]. This dataset includes raw 9-axis IMU data streams (from the hand, chest, and ankle) as well as heart rate data from nine different subjects performing various activities. Each IMU provides temperature, 3-axis acceleration, 3-axis angular velocity, and 3-axis magnetometer data at a rate of 100 Hz. In total there are 1.9 million data points, each containing 52 features. In the dataset, each time-step is labeled with an activity ID, one of 12 different activities that the subjects were engaged in. The 12 activities are the following: ironing, walking, lying, standing, sitting, Nordic walking, vacuum cleaning, cycling, ascending stairs, descending stairs, running, and rope jumping.

We had to clean the data for a few reasons. The frequency of the time-stamps matches the highest-frequency sensors, the IMUs, which read every 0.01 seconds (100 Hz). However, the heart-rate monitor had a frequency of 9 Hz, meaning approximately 90% of the heart-rate column consisted of NaN values. We filled in missing heart-rate values by linearly interpolating between the nearest valid readings.
For the logistic regression, SVM, and neural network models, an additional preprocessing step of subtracting the training mean and dividing by the training standard deviation was added. We combined each subject's data into a single matrix and divided that into a training and a test set, using 85% of our data for training and 15% for testing. To avoid over-fitting on only a certain subset of classes, we randomly split the data between training and testing. To train our models, we performed 5-fold cross-validation on our training set to tune the hyperparameters for the given model. We then used this optimized model and analyzed its performance on the test set.
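A sketch of this preprocessing and splitting protocol using scikit-learn (the array names X and y, and the use of StandardScaler/train_test_split, are illustrative assumptions rather than our exact code):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: (n_samples, n_features) sensor matrix; y: activity IDs (assumed names).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, shuffle=True, random_state=0
)

# Fit the scaler on the training set only, so the test set is normalized
# with the training mean and training standard deviation.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```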
IV. METHODS

A. Regression

As a baseline measure, we incorporated a standard logistic regression model with a loss function of the form shown in (1). To make the model more robust against overfitting, L2 regularization was incorporated, with the C parameter inversely related to the strength of regularization. In order to pick the optimal value for C, the model was independently trained over a range of C values. The C value that produced the highest accuracy on the validation set was selected and evaluated on the test set.

J(\theta) = \sum_{i=0}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{1}{C} \|\theta\|_2^2    (1)

For each training phase, 5-fold cross-validation was incorporated in order to reduce the variance of the trained model. Stochastic Average Gradient (SAG) descent was selected as the solver for training, as it generally provides fast convergence for large feature-sets such as ours [10]. The SAG solver is only guaranteed to converge quickly if all features are on approximately the same scale; thus, the normalization preprocessing step described earlier is necessary.
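The corresponding scikit-learn call is sketched below; the C grid and variable names are illustrative assumptions rather than our exact configuration:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# L2-regularized logistic regression with the SAG solver; C is the
# inverse regularization strength, tuned by 5-fold cross-validation.
param_grid = {"C": [0.001, 0.01, 0.1, 1.0, 10.0]}  # assumed grid
search = GridSearchCV(
    LogisticRegression(solver="sag", penalty="l2", max_iter=1000),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```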
B. Support Vector Machine

We also wanted to create a non-linear classifier, in the hope that it could outperform the linear logistic regression model. An SVM was a natural choice, as it is fairly easy to implement with few parameters to tune. We created our SVMs by solving the following primal problem:

\min_{\gamma, w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \zeta_i
\quad \text{s.t.} \; y^{(i)} (w^T x^{(i)} + b) \geq 1 - \zeta_i, \; i = 1, \dots, m; \quad \zeta_i \geq 0, \; i = 1, \dots, m    (2)

and the decision function is defined as:

\mathrm{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + \rho \right)    (3)

Through training, an SVM model can create multiple hyperplanes to split the training set into its labeled categories. This is done through the use of kernels that transform the input data into a higher dimension, so that the data can then be linearly separated. Part of the advantage of SVMs is that only a fraction of the original training set (the n support vectors in eqn. (3)) has to be retained for evaluating the decision function during predictions. Another advantage is that various kernel functions can easily be tested, and the one that best fits a particular application selected. Thus, to select the optimal kernel function along with the C parameter specifying the soft-margin size, a grid search was performed over three standard kernel functions (polynomial, RBF, and linear) with a range of C values for each.
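A sketch of this kernel/C grid search with scikit-learn's SVC (the grid values are assumed for illustration):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search over the three kernels and a range of soft-margin C values.
param_grid = {
    "kernel": ["poly", "rbf", "linear"],
    "C": [0.1, 1.0, 10.0, 100.0],  # assumed range
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```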
C. Deep Learning

Based on our literature review, we believed that deep learning techniques should work well for this classification problem. We therefore implemented a multilayer perceptron (MLP) architecture for multi-class classification. A multilayer perceptron is a fully connected feedforward neural network with one input layer, one or more hidden layers, and one output layer. Formally, the MLP can be considered a function f: R^n → R^k, where n is the number of input features and k is the number of classes. Each hidden layer can be formalized as f: R^a → R^b, where a is the input size and b is the output size. In matrix notation:

f(x) = A_c (W_c x + b_c)    (4)

where x is the input vector, W_c is the weight matrix associated with layer c, b_c is the bias vector associated with layer c, and A_c is the activation function associated with layer c.

We use softmax as the final activation, so each prediction has size (k × 1), where k is the number of classes. The classifier predicts a score for each class, instead of simply producing a single class label; this can give a sense of how close we are to the correct label. We converted the true labels to one-hot encoding; that is, we took an (m × 1) array and made it (m × k), where m is the number of datapoints. We initially found that making the model deeper produced worse results, but increasing the number of neurons per layer improved the accuracy. Our final architecture is Layer1 → ReLU → Layer2 → ReLU → Layer3 → Softmax. The layers have weight sizes of Layer1(n, 512), Layer2(512, k). In order to generalize better we apply dropout after each hidden layer; this helps combat overfitting by suppressing each node with 50% probability. We tried different gradient descent rules, with the best results coming from the SGD optimizer. To evaluate loss we use categorical cross-entropy, which is as follows:

CE(y') = - \sum_{j=1}^{k} y_j \log(y'_j)    (5)

We use this in conjunction with softmax activation for the final layer, which gives us a probability, or confidence, for each prediction. The softmax output for the i-th element is as follows:

SM(y', i) = \frac{e^{y'_i}}{\sum_{j=1}^{k} e^{y'_j}}    (6)

This is advantageous because we have multiple classes, instead of a simple binary classification. We want the loss to give a sense of the degree of error for each category, instead of a simple yes or no. For example, assume we have 3 classes, and the first class is the correct label for this time step. Consider two example outputs: (0.98, 0.01, 0.01) and (0.51, 0.48, 0.01). The first output is clearly superior to the second, because it has a higher confidence for the correct class; however, both will "predict" the first class. If we used something like simple error rate for our loss, we would not be able to capture the confidence of our predictions. Using softmax activation with cross-entropy loss helps us capture this difference. Our final architecture is shown in Figure 1. This neural network was implemented using TensorFlow's Keras [1].
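A minimal Keras sketch of this architecture, following the two listed weight sizes Layer1(n, 512) and Layer2(512, k); the class count, learning rate, and training schedule are assumptions where the text leaves them unstated:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 52  # full feature-set (smaller for the reduced set)
n_classes = 12   # assumed from the activity labels described above

# Fully connected MLP: Dense+ReLU hidden layer with 50% dropout,
# softmax output, categorical cross-entropy loss, SGD optimizer.
model = models.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # assumed LR
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# y_train_onehot: (m, k) labels, e.g. via tf.keras.utils.to_categorical.
# model.fit(X_train, y_train_onehot, epochs=20, batch_size=128)  # assumed
```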
Fig. 1. We achieved our best results with this architecture. Our architecture had size Layer1(n, 512), Layer2(512, k), where n is the number of data features, ranging from 12 to 52, and k = 18 is the number of class labels. We use ReLU activation, softmax final activation, cross-entropy loss, and applied dropout to avoid overfitting.
D. Trees

Decision trees are a useful method for multi-class classification on nonlinear feature sets. Decision trees perform greedy "splits" on each feature of the data at a specific threshold. In order to choose a split, a decision tree seeks to maximize the difference between the loss of the parent node and the sum of the losses of the child nodes. The specific loss function we used was the Gini loss shown below, where p_{mk} is the proportion of examples of class k present in region R_m, and q_m is the proportion of examples in R_m from tree T with |T| different R_m regions [13]:

\sum_{m=1}^{|T|} q_m \sum_{k=1}^{K} p_{mk} (1 - p_{mk})    (7)

There are multiple methods of regularizing, or preventing overfitting in, decision trees, including setting a minimum size for leaf (terminal) nodes, setting a maximum tree depth, and setting a maximum number of nodes [11]. For this paper, we chose to regularize using the maximum tree depth. All of these decision tree classifiers were implemented using the scikit-learn library [14].

1) Ordinary Decision Trees: The ordinary decision tree classifier uses the above algorithm to create one tree, and we test the resulting tree on a test set. While this generally performs well (depending on the dataset), there are a few reasons to "ensemble," or combine, multiple decision trees, primarily the fact that individual decision trees have high variance, which can lead to low prediction accuracy [11].

2) Boosted Decision Trees: One form of ensembling is "boosting," which combines many "weak learners" (simple decision trees) in order to reduce bias in the model (at the expense of increasing variance). Boosting these weak decision trees is done with the aim of improving accuracy, though it may be prone to overfitting [11]. One specific boosting algorithm, used in this paper, is AdaBoost, as described in [4].

3) Random Forest: Another ensemble method for decision trees, aimed at improving prediction accuracy, is the random forest. Random forest is a form of bagging (bootstrap aggregation), which involves sampling with replacement from the original population for the purpose of reducing variance (at the expense of an increase in bias, increased computational cost, and decreased interpretability of the trees). For a random forest, a large number of decision trees are generated, and the variance is further reduced (by decorrelating the trees) by only considering a subset of the total number of features at each split in the decision tree [11].
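These three classifiers map directly onto scikit-learn estimators. A sketch follows, in which the depths and estimator counts are assumed example values, not our tuned hyperparameters:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Single tree, regularized by maximum depth; Gini impurity is the
# default split criterion in scikit-learn.
tree = DecisionTreeClassifier(criterion="gini", max_depth=10)  # assumed depth

# AdaBoost; scikit-learn's default weak learner is a depth-1 tree (a "stump").
boosted = AdaBoostClassifier(n_estimators=100)  # assumed count

# Random forest: bagged trees, each split considering a random feature subset.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt")

for clf in (tree, boosted, forest):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```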
V. RESULTS AND DISCUSSION

As discussed previously, we were interested in classification performance on both the full set of features and on a reduced set of features. We tried several feature combinations and discovered that we could get decent performance using just the hand IMU plus the heart rate sensor; the hand IMU performed better than any other individual IMU. This is a positive result, as we are particularly interested in applications where a user is holding a phone or wearing a smart watch. We will refer to the hand IMU plus heart rate data as the "reduced" or "limited" feature-set.

Our primary evaluation metric for all models was classification accuracy. This is simply the count of correctly classified data points divided by the count of classifications attempted. It is as follows:

f(y') = \frac{1}{m} \sum_{i=1}^{m} 1\{y'_i == y_i\}    (8)

where y' is our set of predicted labels and y is the set of true labels. We used various loss functions, as described in the subsection for each technique. The full results are shown in Figure 2 and explained in detail in the remainder of this section.

Fig. 2. Training and Test Results for all Methods, for both Full and Limited Feature-sets.

A. Logistic Regression

Through 5-fold cross-validation, in Fig. 3 we see that the optimal C value for both feature-sets occurs around 0.01. The SAG algorithm calculates the learning rate automatically, so one did not need to be selected beforehand [10]. With a chosen C value of 0.01, the limited feature-set model achieved 63.9% accuracy on the test set and the full feature-set model achieved 81.5% accuracy, both of which are very similar to the validation-set results. From the confusion matrix (not shown due to space constraints), the main sources of error for both feature-set models are activities like vacuum cleaning and ironing, Nordic walking and walking, or lying and sitting. As expected, these activities have similar body movements, so from the point of view of a sensor reading they may be hard to distinguish for a linear classifier.
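The per-activity error analysis above comes from a confusion matrix; a sketch of how such a matrix is produced with scikit-learn (clf stands in for any fitted classifier from the previous sections):

```python
from sklearn.metrics import confusion_matrix

# Rows are true activities, columns are predicted activities; large
# off-diagonal entries flag confusable pairs such as walking vs.
# Nordic walking or lying vs. sitting.
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
```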
C. Decision Trees

1) Ordinary Decision Trees: In order to tune the maximum-depth hyperparameter of the decision tree, we used scikit-learn's validation curve function to perform 5-fold cross-validation. The training and cross-validation curves for the limited feature-set are shown on the left side of Fig. 4 below as a function of maximum tree depth, and similar curves were produced for the full feature-set. It is

Fig. 4. Left: Training and Validation Curves for Limited Feature-set using Ordinary Decision Trees. Right: Training loss and accuracy for the Multilayer Perceptron neural net when trained on the reduced feature-set.
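A sketch of this max-depth sweep with scikit-learn's validation_curve (the depth range is an assumed example):

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

depths = np.arange(1, 21)  # assumed sweep range
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(criterion="gini"),
    X_train, y_train,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="accuracy",
)
# One curve per array: mean accuracy across the 5 folds at each depth.
print(train_scores.mean(axis=1), val_scores.mean(axis=1))
```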