IMU-Based Fitness Activity Recognition Using CNNs for Time
Series Classification
Philipp Niklas Müller *, Alexander Josef Müller, Philipp Achenbach and Stefan Göbel
Abstract: Mobile fitness applications provide the opportunity to show users real-time feedback
on their current fitness activity. For such applications, it is essential to accurately track the user’s
current fitness activity using available mobile sensors, such as inertial measurement units (IMUs).
Convolutional neural networks (CNNs) have been shown to produce strong results in different
time series classification tasks, including the recognition of daily living activities. However, fitness
activities can present unique challenges to the human activity recognition task (HAR), including
greater similarity between individual activities and fewer available data for model training. In this
paper, we evaluate the applicability of CNNs to the fitness activity recognition task (FAR) using IMU
data and determine the impact of input data size and sensor count on performance. For this purpose,
we adapted three existing CNN architectures to the FAR task and designed a fourth CNN variant,
which we call the scaling fully convolutional network (Scaling-FCN). We designed a preprocessing
pipeline and recorded a running exercise data set with 20 participants, in which we evaluated the
respective recognition performances of the four networks, comparing them with three traditional
machine learning (ML) methods commonly used in HAR. Although CNN architectures achieve at
least 94% test accuracy in all scenarios, two traditional ML architectures surpass them in the default
scenario, with support vector machines (SVMs) achieving 99.00 ± 0.34% test accuracy. The removal
of all sensors except one foot sensor reduced the performance of traditional ML architectures but
improved the performance of CNN architectures on our data set, with our Scaling-FCN reaching the
highest accuracy of 99.86 ± 0.11% on the test set. Our results suggest that CNNs are generally well suited for fitness activity recognition, and noticeable performance improvements can be achieved if sensors are dropped selectively, although traditional ML architectures can still compete with or even surpass CNNs when favorable input data are utilized.
Keywords: activity recognition; inertial measurement unit; deep learning; convolutional neural network; residual neural network; traditional machine learning; study
1. Introduction
Convolutional neural networks (CNNs) have been shown to produce strong results in various time series classification tasks [3], including human activity recognition (HAR) [4–6]. In the context of fitness activ-
ity recognition, they have been successfully applied to various different activities, such as
swing sports [7–9], skiing [10,11], beach volleyball [12], football [13], and exercising [14].
However, it is unclear how well CNN architectures translate to other fitness activities that
can present unique challenges, such as low availability of training data, small differences
between different activities, and limited processing power on mobile devices. Furthermore,
most of these papers focus on their respective use cases and thus do not compare their
CNN to other CNN architectures or traditional machine learning methods.
Therefore, this study aims to assess how CNN architectures can be adapted to the
mobile fitness activity recognition task using IMUs and how their results compare to
traditional machine learning. For this purpose, we propose a preprocessing pipeline,
adaptations to three existing CNN architectures, and a new CNN architecture that aims
to address the execution speed variability in fitness exercises. The performance of each
architecture is evaluated on a running exercise data set that was recorded in the context of
this study and compared to a baseline of three traditional machine learning models that are
commonly used in HAR. Lastly, performance changes are determined for varying numbers
of sensors and input data sizes.
1.1. Contributions
Our work provides the following key contributions to the field of human activity recognition:
1. An introduction of the Scaling-FCN architecture designed for sensor-based fitness
activity recognition with fixed time windows.
2. An introduction of a new public data set with IMU data of 20 participants for seven
different running exercises [15].
3. A recording and preprocessing pipeline for fitness activity data recorded with multiple
body-worn IMUs.
4. A detailed performance analysis of the Scaling-FCN compared to three existing CNN-
based architectures and three traditional machine learning architectures on the run-
ning exercise data set, focusing on the effect of different input data parameters.
2. Data Acquisition
A representative data set is essential not only to train a machine learning model but
also to assess its expected real-world performance. However, at the time of the study, we
were unable to find a single public human activity recognition (HAR) data set that met
the criteria for our study. In particular, we found that most data sets in the field of mobile
HAR, such as the one provided by Anguita et al. [16], only cover activities of daily living
and not fitness activities. Other data sets, such as the BasicMotions data set [17] and the
CounterMovementJump data set [18], feature relatively few activities and only a single
body-worn sensor. Furthermore, many public HAR data sets already consist of statistical
features such as mean, minimum, and maximum values across a recording and, therefore,
are not suitable for the CNN approaches evaluated in this study. The only data set that
we could find that satisfies the previous criteria, the daily and sports activities data set by
Barshan and Altun [19], consists of data from only eight subjects and primarily features
activities that are very different from each other and, therefore, are relatively simple to
classify. Plötz et al. [20] also acknowledge this lack of larger data sets in mobile HAR as
one of its main challenges, appealing for the development of such data sets.
Therefore, we recorded a running exercise data set that is publicly available at [15].
The data set consists of seven popular running exercises performed by 20 healthy subjects
(16 male, 4 female) between 16 and 31 years of age while wearing an IMU on each ankle and wrist
for a total of four IMUs (see Figure 1).
Figure 1. Position of the four GSDBLE sensor boards worn during data acquisition.
2.2. Hardware
Data were recorded using four GSDBLE sensor boards that have been developed for
mobile activity recognition in the context of Pascal Dornfeld’s thesis [21] (see Figure 2a).
They are powered by a CR2450 3 V lithium battery and record data with an LSM6DSL IMU
from STMicroelectronics. They use a Bluetooth low energy (BLE) connection to send their
accelerometer and gyroscope data alongside time stamps to a connected smartphone. The
sensor boards are contained in sweatband pockets (see Figure 2b) so they can be worn
without affecting the user’s mobility.
Subjects wore a total of four sensor boards during all recordings, one on each ankle
and one on each wrist (see Figure 1). All sensor boards were connected to a single
Huawei P20 smartphone that aggregated all their data using a custom recording applica-
tion. At the end of a recording, the application stored the sensor data, timestamps, and
metadata in a JavaScript Object Notation (JSON) format file. Timestamps are recorded in milliseconds, whereas accelerometer and gyroscope values are recorded as signed 16-bit integer values representing an acceleration of a ∈ [−16 g, +16 g] and an angular velocity of ω ∈ [−2000 dps, +2000 dps], respectively.
Figure 2. The data acquisition sensor setup. (a) GSDBLE sensor boards with coin batteries for scale.
(b) The sweatband with the GSDBLE sensor board (currently not in pocket) on top.
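For reference, raw readings can be converted back to physical units with scale factors that follow directly from the stated ranges. The sketch below (not part of the original recording application) illustrates this in Python:

```python
import numpy as np

G = 9.80665  # m/s^2 per g

def raw_to_physical(acc_raw: np.ndarray, gyro_raw: np.ndarray):
    """Convert signed 16-bit IMU readings to physical units.

    The full int16 range maps to ±16 g for the accelerometer and
    ±2000 dps for the gyroscope, as stated above.
    """
    acc_ms2 = acc_raw.astype(np.float64) / 32768.0 * 16.0 * G  # m/s^2
    gyro_dps = gyro_raw.astype(np.float64) / 32768.0 * 2000.0  # deg/s
    return acc_ms2, gyro_dps
```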
Table 1. Data loss (in %) during real-world usage performing fitness exercises over 40 s for each sensor and averaged across all sensors.

ODR (Hz)   Data Loss per Sensor (%)      Average Data Loss (%)
52         3.71 | 1.84 | 3.57 | 3.52     3.16
104        3.52 | 4.01 | 12.78 | 7.77    7.02
3. Preprocessing
To utilize the recorded data set in a machine learning (ML) classifier, it must be brought
into a suitable format. In addition, recorded data should first be cleansed to reduce the
impact of recording or transmission errors on classification performance.
Figure 3. Comparison of sensor data streams for equivalent movements. (a) Before timestamp
correction. (b) After timestamp correction.
3.3. Standardization
Many machine learning architectures require the input data to be normalized or standardized for optimal training. Since we have two different types of data, acceleration and angular velocity, we use the standardization formula V′ = (V − µ)/σ to scale all data to µ = 0 and σ = 1 to prevent one type of data from dominating the other during training. The
values for µ and σ were calculated once for all accelerometer data and once for all gyroscope
data in the data set and then applied to all samples. We decided against calculating µ and
σ for individual sensor positions and axes to preserve the relative differences in intensity.
In a real-time application, these values could be supplied alongside the model to ensure
consistent data values between training and inference.
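A minimal sketch of this per-modality standardization (function and variable names are ours, for illustration only):

```python
import numpy as np

def fit_standardizer(acc_train: np.ndarray, gyro_train: np.ndarray):
    """Compute one (mu, sigma) pair across all accelerometer data and one
    across all gyroscope data, regardless of sensor position or axis."""
    return (acc_train.mean(), acc_train.std()), (gyro_train.mean(), gyro_train.std())

def standardize(acc: np.ndarray, gyro: np.ndarray, acc_stats, gyro_stats):
    """Apply V' = (V - mu) / sigma with the stored statistics so that
    training and inference see identically scaled data."""
    (mu_a, sd_a), (mu_g, sd_g) = acc_stats, gyro_stats
    return (acc - mu_a) / sd_a, (gyro - mu_g) / sd_g
```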
3.4. Segmentation
To simulate the use case of a real-time application, data recordings have to be split into individual segments, each representing the most recent data available to the application at a given point in time. Each segment later serves as one input sample in our experiments.
For this purpose, we use a typical sliding-window approach, as shown in Figure 4.
Liu et al. [22] suggest that window length and overlap ratio are important parameters
for HAR modeling and real-time performance and should be chosen with the data set
and use case in mind, specifically mentioning the length of individual motions as useful a
priori information, which they determined for daily activities in their previous work [23].
We estimate the length of individual running exercise repetitions to be roughly 0.5 to
1.0 s based on our observations that participants were able to perform roughly 10 to
20 repetitions in each 10 s recording. As a result, we chose a baseline window size of one
second (52 timestamps) to ensure that each window contains at least one full repetition
and compared it to smaller window sizes in Section 6.2.1, which may be better suited for
real-time applications relying on low prediction latency, such as exergames. We use 75% as
the overlap ratio to ensure that we have a sufficient number of samples for model training
and to simulate a real-time application that requires regular prediction updates, resulting
in a stride of 250 ms for a window length of one second. As detailed in the next paragraph,
we strictly ensure that no data from the same user are shared between training, validation,
and test sets, preventing any potential data leakage through window overlap.
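Such a segmentation can be sketched as follows (a simplified stand-in for our actual implementation):

```python
import numpy as np

def sliding_windows(recording: np.ndarray, window: int = 52, overlap: float = 0.75):
    """Split a (timesteps, channels) recording into overlapping windows.

    With window=52 and overlap=0.75, the stride is 13 samples, matching
    the 250 ms prediction interval described above.
    """
    stride = max(1, round(window * (1.0 - overlap)))
    starts = range(0, recording.shape[0] - window + 1, stride)
    return np.stack([recording[s:s + window] for s in starts])
```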
Figure 4. A visualization of the sliding window segmentation with a window size of 13, a stride of
10, and an overlap of roughly 23%.
4. CNN Architectures
To account for the variety of CNN architectures available, we adapted three different
CNN architectures that have been shown to perform well in time series classification
tasks. Furthermore, we designed a fourth CNN architecture that utilizes data rescaling as
proposed by Cui et al. [24] but adapted to the fitness activity recognition task.
Modifications to existing architectures were made when necessary to allow inference
on mobile devices. Although recent work shows that current mobile phones are capable of
running image classification model inference fast enough for real-time applications [25–28],
and further optimizations are possible [29], implemented models should still aim to limit
parameter count to preserve battery life and save computational resources. This can be
particularly important in mobile HAR applications, where recognition models may run
for prolonged periods of time in the background. Since we furthermore expect that the
task of recognizing fitness activities will be less complex than the task of recognizing
images, for which models usually contain at least one million parameters [25], we set
one million as the upper limit for the parameter count of our models. In our empirical
hyperparameter optimization, all architectures were able to generate optimal results (within
the measurement error range) with between 207,360 and 302,400 parameters each, resulting
in the final parameter counts shown in Table 2. All CNN architectures were implemented
in PyTorch version 2.1 (https://pytorch.org/, accessed on 20 December 2023).
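To verify that a candidate model stays within this budget, the trainable parameters can be counted directly in PyTorch (a generic snippet, not specific to our models):

```python
import torch

def parameter_count(model: torch.nn.Module) -> int:
    """Total number of trainable parameters, e.g., to check the 1M budget."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```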
global pooling. ReLU is again used as the activation function, whereas a softmax layer
produces the final output.
4.4. Scaling-FCN
The speed of execution of fitness activities can differ significantly between individuals
based on factors such as fitness and motivation. Additionally, fitness activities often consist
of multiple overlapping movements that may be performed at varying time intervals. We
reflect this in our scaling fully convolutional network (Scaling-FCN), shown in Figure 5, by
using one-dimensional average pooling layers at the beginning of the network to rescale
the input data in its time dimension. Compared to approaches such as the multiscale
convolutional neural network of Cui et al. [24], our approach does not drop any data that
could contain important information and is comparable to filtering in image classification,
such as [37]. After scaling the input data to three different sizes using average pooling,
each scaled input is processed in parallel by three convolution layers with a kernel size of
3 and a padding of 1, after which two-dimensional average pooling is applied. The data
are then concatenated, fed into an additional convolution layer (kernel_size = 3, stride = 1,
padding = 1), and finally fed into a softmax layer to generate the output. Similarly to ResNet,
each convolution layer is followed by a batch normalization and a ReLU activation function.
Figure 5. The Scaling-FCN architecture with data shapes for the Running Exercise data set with seven
different classes.
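The following PyTorch sketch illustrates this structure. It is a simplified approximation rather than our exact implementation: the channel counts, the scale factors, the use of a single convolution block per branch instead of three, and the adaptive pooling used to align branch outputs before concatenation are all assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScalingFCNSketch(nn.Module):
    def __init__(self, in_channels=24, n_classes=7, hidden=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One branch per temporal scale: average-pool the time axis, then convolve.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AvgPool1d(s) if s > 1 else nn.Identity(),
                nn.Conv1d(in_channels, hidden, kernel_size=3, padding=1),
                nn.BatchNorm1d(hidden),
                nn.ReLU(),
            )
            for s in scales
        ])
        self.merge = nn.Sequential(
            nn.Conv1d(hidden * len(scales), hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, n_classes)

    def forward(self, x):  # x: (batch, channels, time), e.g., (N, 24, 52)
        target_len = x.shape[-1] // max(self.scales)
        # Pool every branch output to a common length so they can be concatenated.
        feats = [F.adaptive_avg_pool1d(b(x), target_len) for b in self.branches]
        z = self.merge(torch.cat(feats, dim=1))
        z = z.mean(dim=-1)  # global average pooling over time
        return F.log_softmax(self.classifier(z), dim=-1)
```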
5.1. Features
For traditional machine learning algorithms, it is important to generate features that
provide the model with the necessary information to differentiate classes. As surveys such
as those conducted by Lara et al. [39] and Cornacchia et al. [40] show, a large number of
different features in the time and frequency domains are being used for human activity
recognition. Barandas et al. [41] propose the TSFEL library containing 60 different features
for time series classification, which has been adopted in recent HAR work, such as that by Rodrigues et al. [42], Liu et al. [43,44], and Hartmann et al. [45].
As we do not have the computational resources to perform an extensive feature
selection procedure such as that detailed by Hui Liu [44] for multiple traditional machine
learning architectures, we instead compare feature sets that each comprise all features of
one category (statistical, temporal, and spectral) in TSFEL version 0.1.6 (https://github.com/fraunhoferportugal/tsfel, accessed on 8 December 2023), respectively, in addition
to all combinations of these feature sets. TSFEL currently supports 20 different statistical
features, 14 different temporal features, and 26 different spectral features.
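As an illustration of this setup (based on the public TSFEL API; the exact calls in our pipeline may differ), the statistical and temporal feature sets can be extracted from a single window as follows:

```python
import numpy as np
import tsfel

# Combine the statistical and temporal TSFEL feature configurations.
cfg = tsfel.get_features_by_domain("statistical")
cfg.update(tsfel.get_features_by_domain("temporal"))

# One segmented window: 52 samples x 24 channels (placeholder data here).
window = np.random.randn(52, 24)

# Extract the features at the 52 Hz output data rate.
features = tsfel.time_series_features_extractor(cfg, window, fs=52)
```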
Table 3 shows the prediction accuracies for all combinations of feature sets for each of
the three traditional machine learning architectures considered in this paper. For RF and
SVM, a combination of statistical and temporal features achieves the best results, whereas
for K-NN, this combination performs extremely close to the best-performing feature set
consisting exclusively of temporal features. Furthermore, we could observe that the spectral
features took significantly longer to generate than the statistical and temporal features.
Table 4 shows the average combined time spent on feature generation plus prediction
of a single label on a 4-core, 8-thread 2200 MHz Intel(R) Xeon(R) CPU. We expect these
to be roughly representative of modern higher-end smartphone CPUs that have similar
numbers of cores and frequencies, and we consider a prediction time of below 100 ms
to be acceptable for real-time usage. As a result, we use the feature set that combines
the 20 different statistical features and 14 different temporal features of the TSFEL library
in all subsequent experiments. We opted against using a different feature set consisting
of exclusively temporal features for K-NN as it only performed marginally better on the
validation sets, performed worse on the test set (not shown here), and would have made it
more difficult to compare the architectures’ behavior for different scenarios in Section 6.
Table 3. Average model validation accuracy over 16 models on different feature sets.
Table 4. Average prediction time including feature generation on different feature sets.
5.2. Hyperparameters
A grid search was performed to determine the optimal hyperparameters for each
architecture. A limit of 100 ms was set for the combined time of feature generation and
prediction to ensure that the resulting models would still be suitable for a real-world
use case. In practice, this limit did not have to be enforced, as none of the architectures
ever surpassed it for any given hyperparameter combination. Table 5 shows the three
architectures identified by their scikit-learn implementation and selected hyperparameter
values for the hyperparameters that were optimized.
Table 5. Selected hyperparameter values for each classifier, identified by its scikit-learn implementation.

Classifier               Hyperparameters
RandomForestClassifier   criterion = ‘entropy’, n_estimators = 400, max_depth = 10, min_samples_leaf = 1, min_samples_split = 4
SVC                      C = 1, gamma = ‘scale’, kernel = ‘linear’
KNeighborsClassifier     n_neighbors = 12, p = 2, weights = ‘uniform’
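Expressed as scikit-learn constructors, these selections read:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Hyperparameter values from Table 5; all other settings use scikit-learn defaults.
rf = RandomForestClassifier(criterion="entropy", n_estimators=400, max_depth=10,
                            min_samples_leaf=1, min_samples_split=4)
svm = SVC(C=1, gamma="scale", kernel="linear")
knn = KNeighborsClassifier(n_neighbors=12, p=2, weights="uniform")
```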
6. Results
The purpose of this study is to evaluate the usability of CNN architectures in the
context of IMU-based recognition of fitness activities. For this purpose, we first determine
a baseline using the traditional machine learning architectures presented in Section 5 that
are commonly used for human activity recognition. We then compare the performance of
these architectures with that of the CNN architectures presented in Section 4. Finally, we
assess the performance impact when there are fewer sensor data available for classification.
For each scenario and architecture, we performed a leave-one-subject-out cross-validation with the splits detailed in Section 3.4, resulting in 16 models each. CNNs
were trained with a batch size of 128, early stopping after 15 consecutive epochs with
no improvement, and a maximum of 1000 epochs. For each model trained during cross-
validation, we additionally measured its performance on the test set that was not used
during any training or hyperparameter optimization for any architecture. Therefore, we
report performance metrics as the mean and standard deviation of the 16 trained models.
As we have a well-balanced dataset, we will primarily present results using accuracy as a
metric instead of resorting to less intuitive metrics such as the F1 score.
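Schematically, the CNN training loop with these settings looks as follows (the helper functions are hypothetical placeholders, not our actual code):

```python
best_val_acc, patience, stale_epochs = 0.0, 15, 0
for epoch in range(1000):  # hard upper limit of 1000 epochs
    train_one_epoch(model, train_loader)   # hypothetical helper (batch size 128)
    val_acc = evaluate(model, val_loader)  # hypothetical helper
    if val_acc > best_val_acc:
        best_val_acc, stale_epochs = val_acc, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:  # 15 consecutive epochs without improvement
            break
```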
As seen in Table 7, CNN models generally stopped training well before the maximum
of 1000 epochs was reached, suggesting that no potential performance was lost due to
the maximum number of epochs. ResNet and Deep-CNN generally stopped training the
earliest, with an average of 286 and 263 epochs, respectively, whereas our Scaling-FCN
stopped the latest, with an average of 591 epochs.
When looking at the confusion matrices of all trained models’ predictions on the
test set (see Figure 6 for ResNet and Appendix A for other architectures), some common
misclassifications can be observed. Most notably, all models share prediction errors for high-
knee running, which is most commonly misclassified as heel-to-butt running, suggesting a
similarity in the generated sensor data. Although the FCN (see Figure A5) has the highest
misclassification rate for high-knee running, with only 77.26% accurately predicted, it is
the only architecture with 100% prediction accuracy for all other classes. The second most
commonly misclassified exercise is regular running, with up to 7.55% misclassifications in
the case of Deep-CNN.
Figure 6. Confusion matrix for the ResNet architecture across all models of the cross-validation for
the default scenario on the test data set.
Although this behavior was consistent for all CNN models, traditional machine
learning models did not share the same behavior and instead performed worse without
data from either wrist. Table 11 shows the performance of all CNN and traditional machine
learning models when only data from the right ankle are used. In this scenario, the test
accuracies of traditional machine learning models dropped to between 89.80% for RF and
96.20% for SVM. With a test accuracy of 99.86%, our Scaling-FCN performs extremely well
on this reduced problem, performing better than any other architecture in the process,
although other CNNs also achieve scores of at least 98.44%.
Interestingly, all CNN models now perform worse during cross-validation than on
the test set, suggesting that they may no longer overfit the training and validation sets.
However, when we analyzed the individual models trained during cross-validation, it
appeared that the data of a single participant could no longer be accurately classified
for any sensor combination excluding the left foot, resulting in below 70% accuracy for
that particular participant’s validation set for all architectures. Potential causes could be
sensor movement relative to the foot or inconsistent execution of the movement pattern
compared to other participants. Since neither was noticed during recording, we consider
this representative of a real-world use case.
Table 11. Average model accuracy over 16 models trained in a cross-validation using only data from
the right ankle.
In an attempt to find the absolute best model on our data set, we also checked for
combinations of the parameters assessed previously but could not find any combinations
for which the CNNs performed better than for a time window of 52 samples and three-
dimensional sensor data from a single ankle. In particular, all models performed worse
with four-dimensional sensor data than they did with three-dimensional sensor data when
only data from a single ankle were used.
Table 12. Average prediction time of each architecture on the test set, including feature generation for
traditional architectures. Pure prediction times are reported in brackets.
6.4. Limitations
Our results are primarily limited by the data set used in this work and by the specific
model architectures evaluated. We expect the impact of the time window length and
sensor positions, in particular, to be highly dependent on the activities in the data set.
For example, slower weightlifting activities would likely benefit from longer input time
windows and favor wrist sensor data over ankle sensor data. Therefore, while our results
suggest that these parameters may be used to optimize performance, this behavior needs
to be confirmed individually for future data sets.
Regarding the reference neural network architectures, we attempted to use established
architectures with as few changes as possible to fit the requirements detailed in Section 4
to ensure comparability, but we cannot exclude that further hyperparameter tuning or
architectural changes such as ResNeXt [36] produce better results than those presented in
this work. For traditional machine learning architectures, we further cannot guarantee that
the selected feature sets (see Section 5.1) are fully optimal, as our feature selection process
was limited by the computational resources available to us.
Lastly, all prediction times were recorded on a 4-core, 8-thread 2200 MHz Intel(R)
Xeon(R) desktop CPU. Real-world prediction times will highly depend on the respective
smartphone CPU as well as background processes. Furthermore, our prediction times may
benefit from a higher spatial and temporal locality of the accessed data compared to a
real-world use case. As a result, our prediction times are primarily useful for a comparison
between architectures and configurations and should not be taken as absolutes.
7. Conclusions
In this paper, we investigate the applicability of CNN-based architectures to the task
of IMU-based fitness activity recognition. For this purpose, we designed a preprocess-
ing pipeline, adapted three existing CNN architectures, and developed the Scaling-FCN
architecture. Furthermore, we recorded a new data set [15] consisting of IMU data for
seven different exercises performed by 20 participants, which is made publicly available.
We evaluated the four CNN architectures by comparing their performance with three
traditional machine learning architectures commonly used in human activity recognition
and assessing the impact that different input data parameters had on their performance.
The results of our evaluation suggest that CNN-based architectures are well suited
for IMU-based fitness activity recognition, consistently achieving strong results on our
data set across a number of different input data configurations. Although K-NN and SVM
outperform all CNNs on the test set when data from all four sensor positions are available,
achieving up to 99.00% accuracy, FCN, ResNet, and our Scaling-FCN are still within at
most three percentage points of K-NN and SVM on the test set while performing better on
the validation sets. On our data set, CNNs show particularly strong performance when
fewer input data are available, dropping by less than three percentage points when window
sizes are reduced from 1.0 to 0.25 s. When only data from a single foot are available,
CNNs outperform all traditional ML architectures on the test and validation sets, with our
Scaling-FCN achieving up to 99.86% test accuracy.
In future work, we plan to investigate the performance of the Scaling-FCN within our
pipeline on other data sets consisting of different fitness activities and ultimately apply
it within the context of a mobile fitness application to track the user’s fitness activity and
provide real-time feedback. As our data set is publicly available, we hope other scientists
can utilize it to evaluate their systems and provide reference data for different machine
learning architectures. Lastly, more research needs to be conducted to compare a larger
variety of machine learning architectures on the task of fitness activity recognition on
mobile devices, focusing on real-world applications.
Author Contributions: Conceptualization, P.N.M., A.J.M., P.A. and S.G.; methodology, P.N.M. and
A.J.M.; software, P.N.M. and A.J.M.; validation, P.N.M. and A.J.M.; formal analysis, P.N.M. and
A.J.M.; investigation, A.J.M.; resources, P.N.M. and S.G.; data curation, A.J.M.; writing—original draft
preparation, P.N.M. and A.J.M.; writing—review and editing, P.N.M., P.A. and S.G.; visualization,
P.N.M. and A.J.M.; supervision, P.N.M. and S.G.; project administration, P.N.M. and S.G.; funding
acquisition, S.G. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data set is available at https://figshare.com/articles/dataset/Running_Exercise_IMU_dataset/22117235 (accessed on 17 February 2023).
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AR augmented reality
TSC time series classification
HAR human activity recognition
FAR fitness activity recognition
ML machine learning
DL deep learning
ANN artificial neural network
CNN convolutional neural network
ResNet residual neural network
Deep-CNN deep convolutional neural network
FCN fully convolutional network
Scaling-FCN scaling fully convolutional network
RF random forest
SVM support vector machine
K-NN k-nearest neighbor
IMU inertial measurement unit
BLE Bluetooth low energy
ODR output data rate
JSON JavaScript Object Notation
Appendix A
Figure A1. Confusion matrix for the RF architecture across all models of the cross-validation for the
default scenario on the test data set.
Figure A2. Confusion matrix for the SVM architecture across all models of the cross-validation for
the default scenario on the test data set.
Figure A3. Confusion matrix for the K-NN architecture across all models of the cross-validation for
the default scenario on the test data set.
Figure A4. Confusion matrix for the Deep-CNN architecture across all models of the cross-validation
for the default scenario on the test data set.
Figure A5. Confusion matrix for the FCN architecture across all models of the cross-validation for
the default scenario on the test data set.
Figure A6. Confusion matrix for the Scaling-FCN architecture across all models of the cross-validation
for the default scenario on the test data set.
References
1. Müller, P.N.; Fenn, S.; Göbel, S. Javelin Throw Analysis and Assessment with Body-Worn Sensors. In Proceedings of the Serious
Games, Dublin, Ireland, 26–27 October 2023; Haahr, M., Rojas-Salazar, A., Göbel, S., Eds.; Lecture Notes in Computer Science;
Springer Nature Switzerland: Cham, Switzerland, 2023; pp. 36–49.
2. Nadeem, A.; Jalal, A.; Kim, K. Accurate Physical Activity Recognition Using Multidimensional Features and Markov Model for
Smart Health Fitness. Symmetry 2020, 12, 1766. [CrossRef]
3. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep Learning for Time Series Classification: A Review.
Data Min. Knowl. Discov. 2019, 33, 917–963. [CrossRef]
4. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, Convolutional, and Recurrent Models for Human Activity Recognition Using
Wearables. arXiv 2016, arXiv:1604.08880.
5. Liu, H.; Zhao, B.; Dai, C.; Sun, B.; Li, A.; Wang, Z. MAG-Res2Net: A Novel Deep Learning Network for Human Activity
Recognition. Physiol. Meas. 2023, 44, 115007. [CrossRef] [PubMed]
6. Mekruksavanich, S.; Jantawong, P.; Jitpattanakul, A. A Deep Learning-based Model for Human Activity Recognition Using
Biosensors Embedded into a Smart Knee Bandage. Procedia Comput. Sci. 2022, 214, 621–627. [CrossRef]
7. Tabrizi, S.S.; Pashazadeh, S.; Javani, V. Comparative Study of Table Tennis Forehand Strokes Classification Using Deep Learning
and SVM. IEEE Sens. J. 2020, 20, 13552–13561. [CrossRef]
8. Anand, A.; Sharma, M.; Srivastava, R.; Kaligounder, L.; Prakash, D. Wearable Motion Sensor Based Analysis of Swing Sports. In
Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico,
18–21 December 2017; pp. 261–267. [CrossRef]
9. Jiao, L.; Bie, R.; Wu, H.; Wei, Y.; Ma, J.; Umek, A.; Kos, A. Golf Swing Classification with Multiple Deep Convolutional Neural
Networks. Int. J. Distrib. Sens. Netw. 2018, 14, 1550147718802186. [CrossRef]
10. Rassem, A.; El-Beltagy, M.; Saleh, M. Cross-Country Skiing Gears Classification Using Deep Learning. arXiv 2017,
arXiv:1706.08924.
11. Brock, H.; Ohgi, Y.; Lee, J. Learning to Judge like a Human: Convolutional Networks for Classification of Ski Jumping Errors.
In Proceedings of the 2017 ACM International Symposium on Wearable Computers, Maui, HI, USA, 11–15 September 2017;
pp. 106–113. [CrossRef]
12. Kautz, T.; Groh, B.H.; Hannink, J.; Jensen, U.; Strubberg, H.; Eskofier, B.M. Activity Recognition in Beach Volleyball Using a
Deep Convolutional Neural Network: Leveraging the Potential of Deep Learning in Sports. Data Min. Knowl. Discov. 2017,
31, 1678–1705. [CrossRef]
13. Stoeve, M.; Schuldhaus, D.; Gamp, A.; Zwick, C.; Eskofier, B.M. From the Laboratory to the Field: IMU-Based Shot and Pass
Detection in Football Training and Game Scenarios Using Deep Learning. Sensors 2021, 21, 3071. [CrossRef]
14. Patalas-Maliszewska, J.; Pajak, I.; Krutz, P.; Pajak, G.; Rehm, M.; Schlegel, H.; Dix, M. Inertial Sensor-Based Sport Activity
Advisory System Using Machine Learning Algorithms. Sensors 2023, 23, 1137. [CrossRef]
15. Müller, P.N.; Müller, A.J. Running Exercise IMU Dataset. 2023. Available online: https://figshare.com/articles/dataset/Running_Exercise_IMU_Dataset/22117235 (accessed on 17 February 2023).
16. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human Activity Recognition Using
Smartphones. Comput. Intell. 2013, 3, 6.
17. Clements, J. BasicMotions Dataset. Available online: https://www.timeseriesclassification.com/description.php?Dataset=BasicMotions (accessed on 17 February 2023).
18. O’Reilly, M.; Le Nguyen, T. CounterMovementJump Dataset. Available online: http://timeseriesclassification.com/description.php?Dataset=CounterMovementJump (accessed on 17 February 2023).
19. Barshan, B.; Altun, K. Daily and Sports Activities. UCI Machine Learning Repository. 2013. Available online: https://archive.ics.uci.edu/dataset/256/daily+and+sports+activities (accessed on 17 February 2023).
20. Plötz, T.; Guan, Y. Deep Learning for Human Activity Recognition in Mobile Computing. Computer 2018, 51, 50–59. [CrossRef]
21. Dornfeld, P. Entwicklung eines Systems für die Mobile Sensordatenerfassung zur Erkennung von Ganzkörpergesten in Echtzeit.
Bachelor’s Thesis, TU Darmstadt, Darmstadt, Germany, 2019; p. 53.
22. Liu, H.; Xue, T.; Schultz, T. On a Real Real-Time Wearable Human Activity Recognition System. In Proceedings of the 16th
International Joint Conference on Biomedical Engineering Systems and Technologies, Lisbon, Portugal, 16–18 February 2023;
pp. 711–720. [CrossRef]
23. Liu, H.; Schultz, T. How Long Are Various Types of Daily Activities? Statistical Analysis of a Multimodal Wearable Sensor-based
Human Activity Dataset. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and
Technologies, Virtual Event, 9–11 February 2022; pp. 680–688. [CrossRef]
24. Cui, Z.; Chen, W.; Chen, Y. Multi-Scale Convolutional Neural Networks for Time Series Classification. arXiv 2016,
arXiv:1603.06995.
25. Sehgal, A.; Kehtarnavaz, N. Guidelines and Benchmarks for Deployment of Deep Learning Models on Smartphones as Real-Time
Apps. Mach. Learn. Knowl. Extr. 2019, 1, 450–465. [CrossRef]
26. Luo, C.; He, X.; Zhan, J.; Wang, L.; Gao, W.; Dai, J. Comparison and Benchmarking of AI Models and Frameworks on Mobile
Devices. arXiv 2020, arXiv:2005.05085.
27. Deng, Y. Deep Learning on Mobile Devices: A Review. In Proceedings of the Mobile Multimedia/Image Processing, Security,
and Applications 2019, Baltimore, MD, USA, 14–18 April 2019; Agaian, S.S., DelMarco, S.P., Asari, V.K., Eds.; SPIE: Baltimore,
MD, USA, 2019; p. 11. [CrossRef]
28. Ignatov, A.; Timofte, R.; Chou, W.; Wang, K.; Wu, M.; Hartley, T.; Van Gool, L. AI Benchmark: Running Deep Neural Networks on
Android Smartphones. In Proceedings of the Computer Vision–ECCV 2018 Workshops, Munich, Germany, 8–14 September 2019;
Leal-Taixé, L., Roth, S., Eds.; Springer International Publishing: Cham, Switzerland, 2019; Volume 11133, pp. 288–314. [CrossRef]
Sensors 2024, 24, 742 20 of 20
29. Niu, W.; Ma, X.; Wang, Y.; Ren, B. 26ms Inference Time for ResNet-50: Towards Real-Time Execution of All DNNs on Smartphone.
arXiv 2019, arXiv:1905.00571.
30. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
31. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in
Neural Information Processing Systems 25; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.:
Red Hook, NY, USA, 2012; pp. 1097–1105.
32. Lin, M.; Chen, Q.; Yan, S. Network In Network. arXiv 2014, arXiv:1312.4400.
33. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [CrossRef]
34. Wang, Z.; Yan, W.; Oates, T. Time Series Classification from Scratch with Deep Neural Networks: A Strong Baseline. In
Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017;
pp. 1578–1585. [CrossRef]
35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
36. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
37. van Noord, N.; Postma, E. Learning Scale-Variant and Scale-Invariant Features for Deep Image Classification. Pattern Recognit.
2017, 61, 583–592. [CrossRef]
38. Balkhi, P.; Moallem, M. A Multipurpose Wearable Sensor-Based System for Weight Training. Automation 2022, 3, 132–152.
[CrossRef]
39. Lara, O.D.; Labrador, M.A. A Survey on Human Activity Recognition Using Wearable Sensors. IEEE Commun. Surv. Tutor. 2013,
15, 1192–1209. [CrossRef]
40. Cornacchia, M.; Ozcan, K.; Zheng, Y.; Velipasalar, S. A Survey on Activity Detection and Classification Using Wearable Sensors.
IEEE Sens. J. 2017, 17, 386–403. [CrossRef]
41. Barandas, M.; Folgado, D.; Fernandes, L.; Santos, S.; Abreu, M.; Bota, P.; Liu, H.; Schultz, T.; Gamboa, H. TSFEL: Time Series
Feature Extraction Library. SoftwareX 2020, 11, 100456. [CrossRef]
42. Rodrigues, J.; Liu, H.; Folgado, D.; Belo, D.; Schultz, T.; Gamboa, H. Feature-Based Information Retrieval of Multimodal Biosignals
with a Self-Similarity Matrix: Focus on Automatic Segmentation. Biosensors 2022, 12, 1182. [CrossRef]
43. Liu, H.; Hartmann, Y.; Schultz, T. Motion Units: Generalized Sequence Modeling of Human Activities for Sensor-Based Activity
Recognition. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 23–27 August
2021; pp. 1506–1510. [CrossRef]
44. Liu, H. Biosignal Processing and Activity Modeling for Multimodal Human Activity Recognition. Ph.D. Thesis, Universität
Bremen, Bremen, Germany, 2021. [CrossRef]
45. Hartmann, Y.; Liu, H.; Schultz, T. Feature Space Reduction for Multimodal Human Activity Recognition. In Proceedings
of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020)—Volume 4:
BIOSIGNALS, Valletta, Malta, 24–26 February 2020; pp. 135–140. [CrossRef]
46. Hartmann, Y. Feature Selection for Multimodal Human Activity Recognition. Ph.D. Thesis, Universität Bremen, Bremen,
Germany, 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.