Redundant Feature Screening Method

This document presents a novel attention purification mechanism, MSAP, for human activity recognition (HAR) that effectively reduces feature redundancy while maintaining low resource consumption. The proposed model integrates deep learning techniques with a network correction module to enhance performance in sensor-based HAR tasks, addressing challenges such as noise interference and the balance between performance and efficiency. Experimental results demonstrate that the MSAP-DM model outperforms existing state-of-the-art methods across multiple datasets.


Redundant feature screening method for human activity recognition based on attention purification mechanism

Hanyu Liu, Boyang Zhao, Zhiqiong Wang* et al.

Abstract

In the field of sensor-based Human Activity Recognition (HAR), deep neural networks provide advanced technical support. Many studies have proven that recognition accuracy can be improved by increasing the depth or width of the network. However, for wearable devices, the balance between network performance and resource consumption is crucial. With minimum resource consumption as the basic principle, we propose a universal attention feature purification mechanism, called MSAP, which is suitable for multi-scale networks. The mechanism effectively solves the feature redundancy caused by the superposition of multi-scale features by means of an inter-scale attention screening and connection method. In addition, for deepening network layers, we have designed a network correction module that integrates seamlessly between layers of individual network modules to mitigate inherent problems in deep networks. A large number of experiments on four publicly available datasets show that the MSAP-DM model effectively reduces the redundant features in the sifted data and provides excellent performance with little resource consumption. In terms of evaluation indicators, it is better than other state-of-the-art methods.

1 Introduction

We are witnessing substantial growth in affordable, reliable, and proficient solutions as we move towards the era of autonomous systems. One such application is Human Activity Recognition (HAR), a domain which utilizes AI technology to unearth motion behavior patterns [Wang et al., 2019]. HAR, with its ability to classify human activity signals into different actions, has profound economic and research implications, particularly in intelligent healthcare. Predominantly, there exist two categories of HAR systems: vision-based and sensor-based [Sun et al., 2023b]. Whereas vision-based methods involve image processing and are susceptible to various environmental factors, sensor-based techniques are more reliable and cost-effective, and less vulnerable to external disturbances. Sensor-based HAR systems find extensive usage in various applications such as health monitoring, fall detection, athlete tracking, and electrocardiogram analysis [Qi and Su, 2022].

Classical machine learning methods such as decision trees (DT), support vector machines (SVM), random forests (RF), and naive Bayes (NB) have found widespread usage in initial sensor-based HAR tasks due to their low computational complexity and suitability for smaller datasets [Wang et al., 2016; Chen et al., 2017]. However, the inability of these conventional methods to capture complex relationships has necessitated the usage of deep learning methods. Progress in mobile sensing technology has eased the deployment of deep neural networks such as Convolutional Neural Networks, Recurrent Neural Networks, and Transformers in HAR [Zeng et al., 2014; Meng et al., 2018; Li et al., 2020]. These deep learning models, endowed with robust feature learning and complex temporal relationship modeling, thus offer distinct improvements in sensor-based HAR tasks over traditional methods.

Nonetheless, challenges exist in sensor-based HAR:

Difficulty balancing performance and efficiency: HAR is essentially a classification task. Despite successful feature extraction endeavours, issues persist in practical application; for example, confusion when learning similar action features, resulting in particular action categories being indistinguishable, or difficulty in applying individual learned features to others [Chen et al., 2021], thereby making it a challenging task. Common strategies to mitigate these issues include incorporating LSTM, Transformer, and other modules adept at extracting sequence features based on the convolutional network [Phyo et al., 2022; Qiao et al., 2020]; escalating the depth, width, and parameter quantity of the network [Mekruksavanich et al., 2022]; and utilizing traditional time series network models for feature extraction [Guan and Plötz, 2017; Zeng et al., 2018]. These methods frequently consume copious resources, posing challenges for wearable devices to maintain high efficiency with high accuracy.

Noise Interference: The accuracy of recognition in sensor-based HAR heavily relies on features extracted from raw signals which are often polluted by noise [Chen et al., 2021], thereby magnifying the challenge of feature extraction. During data acquisition, sensors can produce random noise due to technical errors and accuracy constraints. Furthermore, the preprocessing stage
might incorporate irrelevant information not linked to the actual signal, generating noise. Environmental noise and electromagnetic interference can also taint data. Traditional noise reduction methods such as Gaussian filters and wavelet filters are extensively used to process signal data amassed by sensors like ECGs. However, applying typical noise reduction methods to signals from different movements, such as three-axis acceleration data, is challenging [Lee et al., 2017]. Currently, a unified noise reduction method for diverse action signals in the HAR field is lacking.

This paper, therefore, makes three main contributions:

1. We introduce a lightweight attention purification module for multi-scale networks that can easily integrate with any multi-scale network, offering scalability and versatility. It efficiently curtails the network's overprocessing of redundant features with minimal resource consumption.

2. We enhance the soft threshold selection method of the residual shrinkage network to screen redundant features more proficiently and propose DRSN-M, which is optimal for HAR systems.

3. We carry out a comprehensive evaluation on public datasets. MSAP-DM exhibits excellent performance, and we compare it against a number of existing models to assess its practical value for HAR.

2 Related work

2.1 Deep feature extraction

The advent of myriad deep learning architectures in recent years has incited the formation of an array of resilient deep learning methods for HAR. Particularly, models employing Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have demonstrated potent results [Dang et al., 2020; Zeng et al., 2014]. Presently, the pivotal Multi-headed Self-attention mechanism within the Transformer structure has earned several leading-edge performances in this field. Sun et al. [Sun et al., 2023a] recently unveiled RetNet, which mimics a Transformer structure, distinctly offering three concurrent advantages unavailable with standard Transformers: parallelizable training, reduced inference cost, and exceptional performance. Furthermore, attention mechanisms are proven to be effective in augmenting the model's capacity to concentrate on crucial activity features and improving the interpretability of sensor-based HAR systems. Gao et al. [Gao et al., 2021] innovated a unique dual attention mechanism in their DanHAR approach, thereby presenting a framework that consolidates CNN channel attention and temporal attention. This intensifies the attention toward essential sensor patterns and time steps, culminating in substantial enhancements to the interpretability of multimodal HAR. Multiscale networks are currently advantageous in time-series analysis due to the potential to capture multi-scale information about an object, thereby providing a comprehensive description using the features extracted at various scales [Gao et al., 2019]. Tang et al. [Tang et al., 2023] proposed a Hierarchical Splitting (HS) method. The HS module enhances the power of multi-scale feature representation by capturing an expansive receptive field of human activities within the feature layer.

2.2 Denoising Methods for HAR

Numerous denoising methods are widely adopted in the domain of sensor signals, including low-pass filters, mean filters, linear filters, and Kalman filters. These methods typically display satisfactory performance for activity signals [Chen et al., 2021]. However, finding a universal filtering method that fits all circumstances is challenging in HAR systems, as the noise patterns for varied activities and sensors diverge significantly. Human activity encompasses a vast frequency range, extending from low-frequency everyday activities to high-frequency hand movements, all of which carry vital information. However, most filtering methods are only suitable to deal with noise in a specific frequency range. It is worth noting that there are currently some solutions. Zhang et al. [Zhang et al., 2017] formulated the Noise-Assisted Multivariate Empirical Mode Decomposition (NA-MEMD) method, intended for preprocessing multi-channel electromyogram (EMG) signals to extract statistical data delineating the temporal and spatial attributes of diverse muscle groups. Vijayvargiya et al. [Vijayvargiya et al., 2021] put forth a wavelet denoising approach using the Wavelet Decomposition-Enhanced Empirical Mode Decomposition (WD-EEMD) preprocessing technology to eliminate noise from the SEMG signals of calf muscles for activity detection.

3 Methods

The purpose of this paper is to demonstrate the excellent performance of the attention purification mechanism and the interlayer noise reduction network architecture by constructing a multi-scale CNN network. Our aim is to achieve higher performance in the sensor HAR system while consuming fewer resources. In a standard HAR task, we first need to process the raw signal data, which may be inertial sensor signals such as three-axis acceleration and angular velocity, or ECG or EMG signals. We utilize the sliding window method to segment the signal into overlapping windows. While we do not perform any noise reduction in the pre-processing stage, we incorporate this step in the feature extraction stage.

Let the input feature be $x_\tau \in \mathbb{R}^{C \times T}$, where $C$ is the number of feature channels and $T$ is the temporal length of each channel. In order to capture a wider range of features, we propose an MSAP structure that can be used for multi-scale networks in the feature extraction stage, which can easily adapt to various multi-scale networks and provide a more efficient connection method for complex networks. We split the feature map into $s$ subsets of feature maps $x_i$ with the same channel size. For the first subset of feature maps, we use convolution $K_i()$ and attention $A_i()$ for feature optimization, and then transfer the features to the next scale according to Figure 1.
[Figure 1: Overall pipeline of the proposed method for human activity recognition. Panel labels: MSAP, DRSN-M. Block labels: Input, Conv1d, 1×1, ECA, GAP, Adaptive Threshold, Soft Thresholding; repetition markers ×N and ×M.]
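The sliding-window segmentation used in the Methods overview can be illustrated with a short NumPy sketch. This is not the authors' preprocessing code: the function name `sliding_windows`, the 50% overlap, and the toy signal are assumptions made for illustration, and only the window length of 128 echoes the UCI-HAR entry of Table 1. Each resulting window has the $C \times T$ shape of the input feature.

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Cut a (C, L) multichannel recording into overlapping (C, T) windows.

    Returns an array of shape (num_windows, C, window_size).
    """
    n_channels, length = signal.shape
    if window_size > length:
        raise ValueError("window_size must not exceed the signal length")
    starts = range(0, length - window_size + 1, step)
    return np.stack([signal[:, s:s + window_size] for s in starts])

# Toy recording (hypothetical): 3 accelerometer axes, 300 samples.
signal = np.arange(3 * 300, dtype=np.float32).reshape(3, 300)

# Window of 128 samples with a step of 64, i.e. 50% overlap (assumed).
windows = sliding_windows(signal, window_size=128, step=64)
print(windows.shape)  # (3, 3, 128): three windows, each C=3 by T=128
```

A smaller step gives more overlap and more training windows at the cost of more correlated samples; the paper does not state the overlap it uses.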

For $x_i$ (see Figure 1), when passing to the next scale, $x_i$ and $x_{i+1}$ are superimposed and the features are filtered by the ECA attention, which can eliminate unnecessary redundant features of the scale. At the same time, $x_i$ is output at its own scale, which we define as $y_i$. The final ideal scale expression is $y_i = A_i[K_i(x_i) + K_{i-1}(x_{i-1}) + x_{i-1}]$. All outputs are then concatenated and passed into a 1×1 convolution. The DRSN module is added between MSAP modules with different channel sizes, which we define as DRSN-Modular (DM). In DM, we first decompose the input feature, which is mainly done by convolution, then filter the decomposed features within the threshold, and finally reconstruct all the filtered signals. The threshold is set by the ECA individually for each channel of the feature. During the construction process, we refer to a number of networks about multiscale, attention, and residual structures [Tang et al., 2023; Gao et al., 2019; Gao et al., 2021; Zhao et al., 2019; Sun et al., 2023a; Pramanik et al., 2023; Yang et al., 2020]. The result is a structure that is more efficient and capable of extracting features, while maintaining a complex MSAP and MSAP-DM network structure similar to the original structure.

3.1 Multi-scale Attention Feature Extraction

We designed an attention purification mechanism based on a multi-scale residual network. Firstly, we use a simple multi-scale residual network as the basic framework. Then, we added inter-channel correlations between each scale to allow the model to better focus on the combined information of features at multiple scales. Finally, we added an attention purification mechanism to reduce redundant features at multiple scales. Our model can capture important features at multiple scales while avoiding unnecessary and difficult-to-process features that exist at multiple scales. Fig. 1 shows the differences between the bottleneck block and the proposed MSAP module. After the 1 × 1 convolution, we evenly split the feature maps into $s$ feature map subsets, denoted by $x_i$, where $i \in \{1, 2, \ldots, s\}$. Each feature subset $x_i$ has the same channel size, equal to $1/s$ of the input feature map. Except for $x_1$, each $x_i$ has a corresponding 3 × 1 convolution, denoted by $K_i()$. We denote by $y_i$ the output of $K_i()$. The feature subset $x_i$ is added with the output of $K_{i-1}()$, and then fed into $K_i()$. To reduce parameters while increasing $s$, we omit the 3 × 1 convolution for $x_1$. Thus, $y_i$ can be written as:

$$
y_i =
\begin{cases}
x_i, & i = 1 \\
K_i(x_i), & i = 2 \\
K_i(x_i) + y_{i-1}, & 2 < i \le s.
\end{cases}
\tag{1}
$$

We capture features of different scales and process them using channel attention, denoted as $A_i()$. Then, in the next scale, we combine the pre-processed features $K_{i-1}(x_{i-1})$, the post-processed features $y_{i-1}$, and the features $K_i(x_i)$ of the current scale. This process continues until all scales of features have been processed. The formula for $y_i$ in this model is:

$$
y_i =
\begin{cases}
x_i, & i = 1 \\
A_i[K_i(x_i)], & i = 2 \\
A_i[K_i(x_i) + K_{i-1}(x_{i-1}) + y_{i-1}], & 2 < i \le s.
\end{cases}
\tag{2}
$$

This way, it preserves the original information of the features, and to some extent strengthens the weight of important features. Each newly applied attention can also further filter the accumulated features, reducing the relatively ineffective parts that were previously given higher weight, and boosting the relatively effective parts that were previously given lower weight. This approach can not only select effective features more accurately but also alleviate the problem of model overfitting.

3.2 Noise reduction module

Based on the Deep Residual Shrinkage Network (DRSN), we proposed a denoising network DRSN-M that can handle redundant features more efficiently, and successfully introduced it into the field of sensor-based HAR to solve the noise problem. The soft thresholding method has always been a key step in signal denoising [Vijayvargiya et al., 2021]. Generally, the original signal is transformed into a domain where near-zero features are not important, and then the soft threshold is applied to convert near-zero features to zero.
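Soft thresholding as described here shrinks every value toward zero by a threshold $\tau$ and zeroes anything whose magnitude is at most $\tau$. The NumPy sketch below illustrates the operation with one threshold per channel. It is an illustration under assumptions, not the DRSN-M module: the function name `soft_threshold` and the hand-picked thresholds are hypothetical, whereas in the paper the per-channel thresholds come from the ECA attention.

```python
import numpy as np

def soft_threshold(x, tau):
    """Piecewise soft thresholding.

    y = x - tau  if x >  tau
    y = 0        if -tau <= x <= tau
    y = x + tau  if x < -tau
    `tau` must be positive and broadcastable against `x`.
    """
    return np.where(x > tau, x - tau, np.where(x < -tau, x + tau, 0.0))

# Two channels with four time steps each (toy values).
x = np.array([[-2.0, -0.3, 0.4, 1.5],
              [-2.0, -0.3, 0.4, 1.5]])

# One hand-picked threshold per channel, shape (C, 1) so that it
# broadcasts along the time axis. These values are assumptions.
tau = np.array([[0.5],
                [1.0]])

y = soft_threshold(x, tau)
print(y)
```

With the larger threshold on the second channel, more of its small-magnitude values are removed and the surviving values are shrunk further, which is the per-channel behaviour attributed to DRSN-CW in the text.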
The soft threshold function, which sets near-zero features to zero, can be represented as:

$$
y =
\begin{cases}
x - \tau, & x > \tau \\
0, & -\tau \le x \le \tau \\
x + \tau, & x < -\tau.
\end{cases}
\tag{3}
$$

where $x$ is the input feature, $y$ is the output feature, and $\tau$ is the threshold, which is a positive parameter. The derivative can be expressed as:

$$
\frac{\partial y}{\partial x} =
\begin{cases}
1, & x > \tau \\
0, & -\tau \le x \le \tau \\
1, & x < -\tau.
\end{cases}
\tag{4}
$$

Zhao et al. [Zhao et al., 2019] proposed two structural modules: DRSN-CS and DRSN-CW. In general, DRSN-CW performs better because it allows attention to set thresholds for each channel individually. However, in deep residual networks, it is difficult to avoid the loss of effective features with a large number of denoising modules. Therefore, we have built a DRSN-M (DM) module, which is also based on the residual network structure. We define the MSAP network layers with the same convolutional kernel as an MSAP group. Each MSAP group uses a DM module for denoising. In order to reduce the impact of data dimensionality reduction on network resource consumption, we use a lightweight non-redundant ECA attention mechanism instead of the traditional SE attention mechanism and set parameters for feature processing according to the corresponding number of channels [Wang et al., 2020]. In summary, our model can solve the problem of denoising difficulties in HAR and the problem of the direct application of traditional DRSN networks. We combine it with the MSAP to form the MSAP-DM (MSAP & DRSN-M).

[Figure 2: In the DRSN-M architecture, the soft threshold module is no longer applied after every MSAP module, but only between MSAP groups with different convolution kernel sizes. Block labels: DRSN-M, GAP, ECA Block, σ, MSAP (×N), Block (a), Block (b), Block (c).]

3.3 Optimization

To optimize the tunable parameters of MSAP-DM, we develop a downstream classification learning task. Pseudo algorithms for training all networks in the framework are given in Algorithm 1.

Algorithm 1: The total flow pseudo algorithm
Input: network modules MSAP; initial network parameters θ_MSAP and θ_DRSN-M; optimization algorithm Adam.
Output: optimized parameters Θ
while network parameters not converged do
    x ← Conv(x)
    for m ← 1 to M do
        for n ← 1 to N do
            foreach i in scales do
                f_i ← MSAP_i(f_i)
            f ← Cat(Σ f_i)
        f̂ ← ECA(GAP(f))
        α ← Soft-threshold(f̂, GAP(f))
        f ← Shrinkage(α, f)
    y ← Conv(AAP(f))
    L ← (1/N) Σ WCE(y_τ, ŷ_τ)
    Θ ← Adam(L)

4 Experiment

4.1 Dataset used in the experiments

This section describes the experiments conducted and the specifics associated with them. In order to ensure an objective evaluation of our methodology, several pertinent aspects of the datasets employed are outlined below.

The PAMAP2 dataset [Reiss and Stricker, 2012], publicly available on the UCI repository, captures 18 diverse physical activities from nine subjects. These subjects wore three Inertial Measurement Units (IMUs) on their dominant wrist, chest, and ankle.

The WISDM dataset [Kwapisz et al., 2011], a notable HAR benchmark from the Wireless Sensor Data Mining Lab, includes six data attributes: user, activity, timestamp, and x, y, z accelerations. The data were collected from 29 volunteers performing activities such as walking, jogging, and stair climbing using an Android smartphone.

The OPPORTUNITY dataset [Chavarriaga et al., 2013] meticulously documents the activities of 12 subjects in a sensor-enriched environment. The dataset, simulating a real-life setting, includes data from 15 networked sensor systems with a total of 72 sensors of 10 different types.

The UCI-HAR dataset [Ignatov, 2018] consists of sensor recordings from 30 subjects performing routine activities. The data were collected using a waist-mounted smartphone, capturing three-axis linear acceleration and three-axis angular velocity signals at a constant rate of 50 Hz.

Before the datasets could be utilized for training, validation, and testing, they underwent rigorous preprocessing. Details of the data processing and the parameters for MSAP-DM are presented in Table 1.

4.2 Evaluation Metrics

To evaluate the performance of the proposed model for HAR, the following metrics were used: accuracy, F1-macro, and F1-weighted.
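A minimal NumPy sketch of how accuracy, F1-macro, and F1-weighted can be computed from predicted labels is given here for orientation. It rests on assumptions that the text does not spell out: accuracy is taken as the fraction of correctly classified samples in the multi-class setting, F1-macro as the unweighted mean of per-class F1 scores, and F1-weighted as the per-class F1 weighted by the class proportion $\omega_i$. The function name `classification_metrics` and the toy labels are hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Return (accuracy, F1-macro, F1-weighted) from integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predictions.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp

    # Per-class precision, recall and F1; empty denominators give 0.
    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    weights = cm.sum(axis=1) / cm.sum()  # omega_i: share of samples in class i
    accuracy = tp.sum() / cm.sum()
    return float(accuracy), float(f1.mean()), float((weights * f1).sum())

# Toy labels for three classes with unequal class sizes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
acc, f1_macro, f1_weighted = classification_metrics(y_true, y_pred, n_classes=3)
print(acc, f1_macro, f1_weighted)
```

Because the classes in the toy example are unbalanced, F1-macro and F1-weighted differ, which is the reason for reporting both on HAR datasets with skewed activity distributions.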
Table 1: Dataset Processing Details

Datasets PAMAP2 WISDM OPPORT. UCI-HAR


Sensor 40 3 72 9
Rate 33Hz 20Hz 30Hz 50Hz
Subject 9 29 12 30
Class 12 6 18 6
Window Size 171 90 128
Tr/Va/Te 8:1:1 7:2:1 7:2:1 8:1:1
Batch size 128 512 256 512
Width 10 8 4 8
Scale 4 4 12 8
Lr 0.001 0.001 0.001 0.001

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + FN + FP + TN} \\
\text{F1-macro} &= \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \\
\text{F1-weighted} &= \sum_i \frac{2 \times \omega_i \times (\text{Precision}_i \times \text{Recall}_i)}{\text{Precision}_i + \text{Recall}_i}
\end{aligned}
\tag{5}
$$

where TP and TN are the numbers of true positives and true negatives, respectively, and FN and FP are the numbers of false negatives and false positives. $\omega_i$ is the proportion of samples of class $i$.

4.3 Experimental environment and design

The experiments in this paper were conducted on the Kaggle platform. We used an NVIDIA P100 GPU (16 GB), with the default CPU and other devices. We divide the experiments into two parts: ablation experiments and comparison with related work. In the ablation experiment part, we first verify the rationality of constructing a multi-scale network. We then add the DRSN-M module and verify its effectiveness by comparing the performance of MSAP and MSAP-DM. The comparative experiments with related work are divided into two parts. We first test the performance and efficiency gap between classic high-performance HAR models and MSAP and MSAP-DM. Then, more detailed comparative experiments are conducted between MSAP-DM and more advanced models.

5 Results and Discussion

5.1 Ablation Experiment

The ablation experiment is divided into two parts. One is the effectiveness test of the single module, which is used to verify the rationality of the model construction. The other is the model sizing test, which seeks the best trade-off between efficiency and performance by adjusting the combination of hyperparameters that affect the model size.

Module validity test

We have set up different module combinations to study the feasibility of corresponding model performance and method combinations. First, we validate the rationality of constructing a multi-scale network. The experiments include three progressive multi-scale models proposed in the method section, denoted as Extra Addition. The second part is an extension based on the MSAP network, which validates the effectiveness of several optimization methods proposed by us and their improvements.

The model using a simple unrelated multi-scale network has already surpassed general neural networks such as ResNet in accuracy. However, after adding scale interconnections, the accuracy decreases. We believe this is due to excessive feature stacking at multiple scales, which results in feature redundancy; eliminating this redundancy is the task of the attention purification mechanism. In our attention purification network, the performance is significantly improved compared to the scale-connected variant, which should be attributed to the attention-based feature selection mechanism. In addition, the direct application of the DRSN noise reduction module to the MSAP network does not produce good results, and the network performance is reduced compared to the original MSAP. The DRSN-M (DM) we built solves this problem well by adjusting the noise reduction structure, and we have achieved considerable results, improving the accuracy by about 1.5 percentage points on WISDM and 3 percentage points on OPPORTUNITY.

[Figure 3: MSAP (left) and MSAP-DM (right) confusion matrices on the UCI-HAR and PAMAP2 datasets]

Fig. 3 shows the confusion matrices of MSAP (left) and MSAP-DM (right) on the UCI-HAR and PAMAP2 datasets. MSAP-DM achieves higher accuracy on more categories and effectively reduces the problems caused by confusing categories. For example, between the “Sitting” and “Standing” categories on UCI-HAR, MSAP has 11.28% confusion, resulting in an accuracy of 88.72% for “Sitting”, while MSAP-DM reduces this confusion to 4.10%. The
accuracy of the “Sitting” category also reached 95.90%, and the same situation also appeared between the “2”, “3”, “10” and “11” categories on PAMAP2. Compared with MSAP, the accuracy of MSAP-DM was significantly improved due to its stronger ability to distinguish complex tasks.

Table 2: Ablation Experiment

                                                WISDM                            OPPORTUNITY
Network          Extra Addition                 Accuracy  F1-macro  F1-weight    Accuracy  F1-macro  F1-weight
Multi-scale net  \                              96.03     94.47     96.01        82.43     77.08     83.69
Multi-scale net  Scale Connections              95.51     93.69     95.57        82.40     77.57     84.63
Multi-scale net  Attention Purification (AP)    96.85     95.44     96.89        84.39     78.35     84.90
MSAP             DRSN                           96.46     95.02     96.41        81.56     75.02     82.57
MSAP             DRSN-M (DM)                    98.24     97.09     98.14        87.55     82.17     87.60

Model performance and time test

It is worth noting that MSAP-DM achieves excellent results on the OPPORTUNITY dataset, but the model's optimal parameters on this dataset lead to a very large model size, which raises the question of whether such a performance improvement is worth its resource consumption. For the MSAP-DM model, the most important factors affecting model performance and efficiency are Scales and Width, and there is a certain correlation between them and model performance. To explore this problem, we set up a network scale experiment on OPPORTUNITY to study the above relationship, and optimized the parameters (Scales and Width) that mainly affect the model size in MSAP-DM, aiming to find the best trade-off between performance and efficiency.

[Figure 4: Performance of the MSAP-DM model on the OPPORTUNITY dataset at different model sizes]

Figure 4 shows the relationship between MSAP-DM network size, performance, and time consumption, as tested on the OPPORTUNITY dataset. It is evident that both the number of Scales and the Width of the Scales substantially influence the model's performance. An increased number of scales and larger widths can enhance performance (for instance, setting scales to 12 or 16 yields high accuracy), but a balancing act is needed. The model with scales of 16 and width 10, written (16, 10), also yields good accuracy, and has yielded our highest performance to date. However, the associated costs render the performance gains not worthwhile. On the other hand, configurations such as (4, 12) offer comparable performance to larger models (less than 0.5% difference), without significantly amplifying the cost. Consequently, even though larger parameter settings may yield better performance, their resource consumption may not be justified. As a result, we continue to prioritize parameter combinations that offer significant practical value.

5.2 Comparison with Existing Work

We have conducted a comparative analysis between the MSAP-DM models and several corresponding models in the HAR domain. This includes a variety of models ranging from early traditional networks to general-purpose CNN and LSTM series networks, as referred to in the works [Zeng et al., 2014; Dang et al., 2020; Xia et al., 2020; Dua et al., 2021]. Furthermore, we included high-performance residual network models as well as state-of-the-art sensor-based HAR methods. These models were re-implemented in accordance with our standard methodology. The results derived from our experiments are captured in Table 3. When compared with the other models, MSAP-DM delivered superior results.

In terms of performance, MSAP-DM stands out with an impressive accuracy of 93.33%, 98.24%, 86.23%, and 98.64% on the PAMAP2, WISDM, OPPORTUNITY, and UCI-HAR datasets respectively. Notably, MSAP-DM exhibits a significant improvement in accuracy compared to classical models. Furthermore, when compared with state-of-the-art models such as SE-Res2Net [Gao et al., 2019], ResNeXt [Mekruksavanich et al., 2022], Rev-Attention [Pramanik et al., 2023], and Gated-Res2Net [Yang et al., 2020], MSAP-DM still maintains its leading position. Figure 5 visually compares the performance and efficiency between MSAP-DM and advanced models. It is evident that MSAP-DM outperforms both multi-scale models Res2Net and SE-Res2Net by nearly 2% and 1.5% respectively. Additionally, compared to the more advanced grouping topology network ResNeXt and the traditional residual model, MSAP-DM demonstrates superior performance without increasing model complexity.
Table 3: Comparison of related work (Accuracy in %, Time in s)

                 PAMAP2            WISDM             OPPORTUNITY       UCI-HAR
Model            Accuracy  Time    Accuracy  Time    Accuracy  Time    Accuracy  Time
CNN              90.96     24      93.31     37      82.15     39      92.39     14
LSTM             91.51     71      96.71     101     81.65     68      95.52     12
LSTM-CNN         91.43     64      95.90     148     77.64     96      97.01     32
CNN-GRU          91.58     38      94.95     78      79.85     44      95.11     17
SE-Res2Net       92.25     157     95.52     527     82.15     178     96.60     83
ResNeXt          89.34     1706    96.67     1364    79.15     561     96.38     228
Gated-Res2Net    91.94     160     97.02     516     81.51     204     96.31     171
Rev-Attention    91.87     188     97.46     374     83.77     138     95.53     88
MSAP-DM          93.33     115     98.24     426     86.23     100     99.05     132

[Figure 5: Model performance analysis (Accuracy, F1-macro, F1-weighted and Time)]

[Figure 6: Loss curves of HAR models on the OPPORTUNITY validation set]

To provide a comprehensive evaluation of their performance during training, we present Figure 6, which illustrates the loss curves for state-of-the-art models on the OPPORTUNITY dataset across training and validation sets. In terms of efficiency, the classical models are remarkably fast, completing computation and data processing tasks swiftly. However, their limited capability to handle complex tasks or large datasets diminishes their value as a reference point. Conversely, in our advanced model, we have optimized the DRSN interlayer structure, which empowers MSAP-DM to outperform other models without significant time costs. Figure 7 shows the relationship between model performance and time. In the figure, the x axis is the total time spent by the model on the four datasets, and the y axis is the average accuracy of the model. It can be seen that the time cost of MSAP-DM is not high compared to a number of advanced models, and the performance is far beyond them.

Overall, MSAP-DM achieves substantial performance improvements with minimal overheads, allowing it to effectively tackle intricate tasks and yield superior results while maintaining optimal efficiency. More detailed experiments can be found in the attachment.

[Figure 7: Performance and efficiency analysis of HAR model]
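The two axes described for Figure 7 (total time over the four datasets and average accuracy) can be recomputed from the values in Table 3. The short Python sketch below does this under the assumption that the figure was derived directly from the table entries, which the text does not state explicitly; the dictionary layout and the function name `summarize` are illustrative.

```python
# (accuracy in %, time in s) per dataset, copied from Table 3 in the order
# PAMAP2, WISDM, OPPORTUNITY, UCI-HAR.
table3 = {
    "CNN":           [(90.96, 24),   (93.31, 37),   (82.15, 39),  (92.39, 14)],
    "LSTM":          [(91.51, 71),   (96.71, 101),  (81.65, 68),  (95.52, 12)],
    "LSTM-CNN":      [(91.43, 64),   (95.90, 148),  (77.64, 96),  (97.01, 32)],
    "CNN-GRU":       [(91.58, 38),   (94.95, 78),   (79.85, 44),  (95.11, 17)],
    "SE-Res2Net":    [(92.25, 157),  (95.52, 527),  (82.15, 178), (96.60, 83)],
    "ResNeXt":       [(89.34, 1706), (96.67, 1364), (79.15, 561), (96.38, 228)],
    "Gated-Res2Net": [(91.94, 160),  (97.02, 516),  (81.51, 204), (96.31, 171)],
    "Rev-Attention": [(91.87, 188),  (97.46, 374),  (83.77, 138), (95.53, 88)],
    "MSAP-DM":       [(93.33, 115),  (98.24, 426),  (86.23, 100), (99.05, 132)],
}

def summarize(rows):
    """Return (total time in s, mean accuracy in %) over all datasets."""
    accuracies = [acc for acc, _ in rows]
    times = [t for _, t in rows]
    return sum(times), sum(accuracies) / len(accuracies)

summary = {model: summarize(rows) for model, rows in table3.items()}

# Print models from highest to lowest mean accuracy.
for model, (total_time, mean_acc) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{model:14s} total time = {total_time:5d} s, mean accuracy = {mean_acc:.2f} %")
```

On these numbers MSAP-DM has the highest mean accuracy, with a total time below that of SE-Res2Net, Gated-Res2Net, Rev-Attention, and ResNeXt but above that of the classical CNN and recurrent baselines.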
6 Conclusion

In this study, we first explore the feature extraction
capability of multi-scale convolutional neural networks and
propose a multi-scale convolutional neural network with an
integrated attention feature purification module to enhance
the performance of multi-scale networks in sensor-based
HAR systems. We construct a cross-scale feature
association network while solving the problem of redundant
features that may affect performance during association,
resulting in broader and more efficient feature extraction
capabilities. Furthermore, we introduce the deep residual
shrinkage network into the HAR field to reduce redundant
features, adjusting its structure to be compatible
with HAR systems. Experiments show that our method
adds only a small resource cost while achieving a
large performance improvement and surpassing the previous
methods we compared against. We believe this work contributes
to the application of sensor-based HAR. In the future, we will
further explore deep feature extraction methods and
study more effective methods of denoising through network
models.