Figure 1: The overall pipeline of the proposed human activity recognition method.
1. For xi, when passing to the next scale, xi and xi+1 are superimposed, and the combined features are filtered by ECA attention, which eliminates the redundant features that are unnecessary at that scale. At the same time, xi is output at its own scale, and we define this output as yi. The ideal expression for each scale is yi = Ai[Ki(xi) + Ki−1(xi−1) + xi−1]. All outputs are then concatenated and passed into a 1×1 convolution. A DRSN module is added between MSAP modules with different channel sizes, which we define as the DRSN-Module (DM). In DM, we first decompose the input feature, mainly by convolution, then filter all decomposed features against a threshold, and finally reconstruct the filtered signals. The threshold is set individually by ECA for each channel of the feature. During the construction process, we referred to a number of networks on multi-scale, attention, and residual structures [Tang et al., 2023; Gao et al., 2019; Gao et al., 2021; Zhao et al., 2019; Sun et al., 2023a; Pramanik et al., 2023; Yang et al., 2020]. The result is a structure that is more efficient and better at extracting features, while keeping the MSAP and MSAP-DM network structures similar to the original design.

3.1 Multi-scale Attention Feature Extraction

We designed an attention purification mechanism based on a multi-scale residual network. First, we use a simple multi-scale residual network as the basic framework. Then, we add inter-channel correlations between the scales so that the model can better focus on the combined information of features at multiple scales. Finally, we add an attention purification mechanism to reduce redundant features across scales. Our model can capture important features at multiple scales while avoiding the unnecessary and difficult-to-process features that arise at multiple scales. Fig. 1 shows the differences between the bottleneck block and the proposed MSAP module. After the 1 × 1 convolution, we evenly split the feature maps into s feature map subsets, denoted by xi, where i ∈ {1, 2, . . . , s}. Each feature subset xi has a channel size equal to 1/s of the input feature map. Except for x1, each xi has a corresponding 3 × 1 convolution, denoted by Ki(). We denote by yi the output of Ki(). The feature subset xi is added with the output of Ki−1() and then fed into Ki(). To reduce parameters while increasing s, we omit the 3 × 1 convolution for x1. Thus, yi can be written as:

y_i = \begin{cases} x_i, & i = 1 \\ K_i(x_i), & i = 2 \\ K_i(x_i) + y_{i-1}, & 2 < i \le s \end{cases}    (1)

We capture features of different scales and process them with channel attention, denoted as Ai(). At the next scale, we combine the preceding scale's convolved features Ki−1(xi−1) and its attention-processed output yi−1 with the current scale's features Ki(xi). This process continues until all scales have been processed. The formula for yi in this model is:

y_i = \begin{cases} x_i, & i = 1 \\ A_i[K_i(x_i)], & i = 2 \\ A_i[K_i(x_i) + K_{i-1}(x_{i-1}) + y_{i-1}], & 2 < i \le s \end{cases}    (2)

In this way, the original information of the features is preserved, and the weight of important features is strengthened to some extent. Applying fresh attention at each scale also filters the current feature mix further, suppressing the relatively ineffective parts that were previously given high weight and boosting the relatively effective parts that were previously given low weight. This approach not only selects effective features more accurately but also alleviates model overfitting.
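To make the scale-wise computation of Eq. (2) concrete, the following is a minimal PyTorch sketch. The 1-D input layout (batch, channels, time), the layer sizes, the ECA kernel size, and the module names are illustrative assumptions, not the exact MSAP configuration; only the per-scale recurrence follows Eq. (2) literally.

```python
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient Channel Attention: channel weights from a 1-D conv over pooled features."""

    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                          # x: (batch, channels, time)
        w = self.gap(x)                            # (B, C, 1)
        w = self.conv(w.transpose(1, 2))           # (B, 1, C)
        w = torch.sigmoid(w).transpose(1, 2)       # (B, C, 1)
        return x * w                               # re-weight channels


class MSAPBlock(nn.Module):
    """Scale-wise computation of Eq. (2): split, convolve, purify with attention, fuse."""

    def __init__(self, channels: int, scales: int = 4):
        super().__init__()
        assert channels % scales == 0, "channels must be divisible by the number of scales"
        self.scales = scales
        width = channels // scales
        # K_i: one 3x1 convolution per subset except x_1
        self.convs = nn.ModuleList(
            [nn.Conv1d(width, width, kernel_size=3, padding=1) for _ in range(scales - 1)])
        # A_i: channel attention applied to every processed scale
        self.attns = nn.ModuleList([ECA(width) for _ in range(scales - 1)])
        self.fuse = nn.Conv1d(channels, channels, kernel_size=1)   # final 1x1 convolution

    def forward(self, x):                          # x: (batch, channels, time)
        xs = torch.chunk(x, self.scales, dim=1)    # even split into s subsets
        ys, k_prev = [xs[0]], None                 # y_1 = x_1 (no convolution for x_1)
        for i in range(1, self.scales):
            k_i = self.convs[i - 1](xs[i])         # K_i(x_i)
            if i == 1:
                y_i = self.attns[i - 1](k_i)       # A_i[K_i(x_i)]
            else:                                  # A_i[K_i(x_i) + K_{i-1}(x_{i-1}) + y_{i-1}]
                y_i = self.attns[i - 1](k_i + k_prev + ys[-1])
            ys.append(y_i)
            k_prev = k_i
        return self.fuse(torch.cat(ys, dim=1))     # concatenate all y_i and mix


# Example: a batch of 8 windows with 64 channels and 128 time steps.
out = MSAPBlock(channels=64, scales=4)(torch.randn(8, 64, 128))
print(out.shape)                                    # torch.Size([8, 64, 128])
```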
3.2 Noise reduction module

Figure 2: In the DRSN-M architecture, the soft-threshold module is no longer applied once after each MSAP module, but only between MSAP groups with different convolution kernel sizes.

Based on the Deep Residual Shrinkage Network (DRSN), we propose a denoising network, DRSN-M, that can handle redundant features more efficiently, and we introduce it into sensor-based HAR to address the noise problem. The soft-thresholding method has long been a key step in signal denoising [Vijayvargiya et al., 2021]. Generally, the original signal is transformed into a domain in which near-zero features are unimportant, and the soft threshold is then applied to set these near-zero features to zero. The soft threshold function can be represented as:

y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases}    (3)
where x is the input feature, y is the output feature, and τ is
the threshold, which is a positive parameter. The derivative
can be expressed as:
\frac{\partial y}{\partial x} = \begin{cases} 1, & x > \tau \\ 0, & -\tau \le x \le \tau \\ 1, & x < -\tau \end{cases}    (4)
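Eq. (3) and its gradient in Eq. (4) can be checked numerically with a few lines of PyTorch. The scalar threshold below is only for illustration; in DRSN-style modules the threshold is learned per channel.

```python
import torch


def soft_threshold(x: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    """Eq. (3): shrink |x| by tau and zero out everything inside [-tau, tau]."""
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)


x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)
y = soft_threshold(x, torch.tensor(1.0))
y.sum().backward()
print(y.detach())   # values shrink toward zero: [-1, 0, 0, 0, 1]
print(x.grad)       # matches Eq. (4): 1 outside [-tau, tau], 0 inside -> [1, 0, 0, 0, 1]
```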
Zhao et al. [Zhao et al., 2019] proposed two structural modules: DRSN-CS and DRSN-CW. In general, DRSN-CW performs better because it allows the attention to set a threshold for each channel individually. However, in deep residual networks it is difficult to avoid losing effective features when a large number of denoising modules are used. Therefore, we built the DRSN-M (DM) module, which is also based on the residual network structure. We define the MSAP network layers that share the same convolutional kernel as an MSAP group, and each MSAP group uses one DM module for denoising. To reduce the impact of data dimensionality reduction on network resource consumption, we use the lightweight, non-redundant ECA attention mechanism instead of the traditional SE attention mechanism, and we set the parameters for feature processing according to the corresponding number of channels [Wang et al., 2020]. In summary, our model addresses both the difficulty of denoising in HAR and the problems of directly applying traditional DRSN networks. We combine it with MSAP to form MSAP-DM (MSAP & DRSN-M).
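The sketch below shows one plausible reading of such a DM denoising step in PyTorch: a global-average-pooled magnitude statistic, an ECA-style 1-D convolution producing a per-channel scaling, and soft-threshold shrinkage of the feature map. The branch structure, kernel size, and exact placement between MSAP groups are assumptions for illustration rather than the verified DRSN-M configuration.

```python
import torch
import torch.nn as nn


class DMShrinkage(nn.Module):
    """Per-channel soft-threshold shrinkage driven by an ECA-style attention branch."""

    def __init__(self, k_size: int = 3):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, f):                                         # f: (batch, channels, time)
        stat = self.gap(torch.abs(f))                             # per-channel magnitude, (B, C, 1)
        scale = torch.sigmoid(self.conv(stat.transpose(1, 2)).transpose(1, 2))
        tau = scale * stat                                        # per-channel threshold
        return torch.sign(f) * torch.clamp(torch.abs(f) - tau, min=0.0)  # shrink toward zero


# Example: denoise the concatenated output of an MSAP group.
f = torch.randn(8, 64, 128)
print(DMShrinkage()(f).shape)   # torch.Size([8, 64, 128])
```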
3.3 Optimization

To optimize the tunable parameters of MSAP-DM, we develop a downstream classification learning task. The pseudo-code for training all networks in the framework is given in Algorithm 1.

Algorithm 1: The total flow pseudo algorithm
Input: network modules MSAP; initial network parameters θ_MSAP and θ_DRSN-M; optimization algorithm Adam.
Output: optimized parameters Θ
while network parameters not converged do
    x ← Conv(x)
    for m ← 1 to M do
        for n ← 1 to N do
            foreach i in scales do
                f_i ← MSAP_i(f_i)
            f ← Cat(f_i)
        f̂ ← ECA(GAP(f))
        α ← Soft-threshold(f̂, GAP(f))
        f ← Shrinkage(α, f)
    y ← Conv(AAP(f))
    L ← (1/N) Σ WCE(y_τ, ŷ_τ)
    Θ ← Adam(L)
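A hedged PyTorch sketch of this training procedure is given below, reading WCE in Algorithm 1 as a class-weighted cross-entropy loss. The stand-in classifier, class weights, learning rate, and fixed epoch count are illustrative placeholders for MSAP-DM and its actual hyperparameters.

```python
import torch
import torch.nn as nn

# Stand-in for MSAP-DM: any (batch, channels, time) -> (batch, classes) classifier fits this loop.
model = nn.Sequential(
    nn.Conv1d(9, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 6))

# Weighted cross-entropy (WCE): per-class weights counter class imbalance.
class_weights = torch.tensor([1.0, 1.0, 2.0, 1.0, 1.5, 1.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 9, 128)                  # synthetic windows: (batch, channels, time)
y = torch.randint(0, 6, (64,))               # synthetic activity labels

for epoch in range(10):                      # "while not converged" in Algorithm 1
    optimizer.zero_grad()
    loss = criterion(model(x), y)            # L = mean weighted cross-entropy over the batch
    loss.backward()
    optimizer.step()                         # Θ ← Adam(L)
```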
4 Experiment
4.1 Dataset used in the experiments

This section gives a comprehensive account of the conducted experiments and the specifics of the datasets employed. To ensure an objective evaluation of our methodology, several pertinent aspects of these datasets are outlined below.

The PAMAP2 dataset [Reiss and Stricker, 2012], publicly available on the UCI repository, captures 18 diverse physical activities from nine subjects. The subjects wore three Inertial Measurement Units (IMUs) on the dominant wrist, chest, and ankle.

The WISDM dataset [Kwapisz et al., 2011], a notable HAR benchmark from the Wireless Sensor Data Mining Lab, includes six data attributes: user, activity, timestamp, and x, y, z accelerations. The data were collected from 29 volunteers performing activities such as walking, jogging, and stair climbing using an Android smartphone.

The OPPORTUNITY dataset [Chavarriaga et al., 2013] meticulously documents the activities of 12 subjects in a sensor-enriched environment. The dataset, which simulates a real-life setting, includes data from 15 networked sensor systems with a total of 72 sensors of 10 different types.

The UCI-HAR dataset [Ignatov, 2018] consists of sensor recordings from 30 subjects performing routine activities. The data were collected using a waist-mounted smartphone, capturing three-axis linear acceleration and three-axis angular velocity signals at a constant rate of 50 Hz.

Before the datasets could be used for training, validation, and testing, they underwent rigorous preprocessing. Details of the data processing and the parameters of MSAP-DM are presented in Table 1.
Table 1: Dataset Processing Details
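Although the exact preprocessing settings are given in Table 1 rather than in the text, HAR pipelines of this kind typically segment the raw streams into fixed-length overlapping windows before training. The sketch below illustrates such a step with an assumed window length, 50% overlap, and majority-vote labelling; normalization and subject-wise splits are omitted.

```python
import numpy as np


def sliding_windows(signal: np.ndarray, labels: np.ndarray, win: int = 128, step: int = 64):
    """Segment a (T, C) sensor stream into overlapping windows labelled by majority vote."""
    xs, ys = [], []
    for start in range(0, len(signal) - win + 1, step):
        xs.append(signal[start:start + win])
        ys.append(np.bincount(labels[start:start + win]).argmax())  # majority label
    return np.stack(xs), np.array(ys)


# Example: 10,000 timesteps of 9-channel IMU data with integer activity labels.
X, y = sliding_windows(np.random.randn(10_000, 9), np.random.randint(0, 6, 10_000))
print(X.shape, y.shape)   # (155, 128, 9) (155,)
```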
4.2 Evaluation Metrics

To evaluate the performance of the proposed model for HAR, the following metrics were generally used:

Accuracy = \frac{TP + TN}{TP + FN + FP + TN}

F1\text{-macro} = \frac{2 \times (Precision \times Recall)}{Precision + Recall}    (5)

F1\text{-weighted} = \sum_i \frac{2 \times \omega_i \times (Precision_i \times Recall_i)}{Precision_i + Recall_i}

where TP and TN are the numbers of true positives and true negatives, respectively, FN and FP are the numbers of false negatives and false positives, and ω_i is the proportion of samples of class i.
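These metrics can be reproduced with scikit-learn as sketched below; note that f1_score with average="macro" computes per-class F1 scores and averages them (the standard macro definition), while average="weighted" weights each class by its support ω_i.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels and predictions for a 3-class activity problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

print(accuracy_score(y_true, y_pred))                 # overall accuracy
print(f1_score(y_true, y_pred, average="macro"))      # unweighted mean of per-class F1
print(f1_score(y_true, y_pred, average="weighted"))   # per-class F1 weighted by support
```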
4.3 Experimental environment and design

The experiments in this paper were conducted on the Kaggle platform, using an NVIDIA P100 GPU (16 GB) with the default CPU and other hardware. We divide the experiments into two parts: ablation experiments and comparisons with related work. In the ablation experiments, we first verify the rationality of constructing a multi-scale network, and then add the DRSN-M module and verify its effectiveness by comparing the performance of MSAP and MSAP-DM. The comparative experiments with related work are likewise divided into two parts: we first measure the performance and efficiency gap between classic high-performance HAR models and MSAP/MSAP-DM, and then conduct more detailed comparisons between MSAP-DM and more advanced models.
5 Results and Discussion
5.1 Ablation Experiment

The ablation experiment is divided into two parts. The first is an effectiveness test of each single module, used to verify the rationality of the model construction. The other part, a model sizing test, searches for the best trade-off between efficiency and performance by adjusting the combination of hyperparameters that affect the model size.

Module validity test
We set up different module combinations to study the feasibility of the corresponding model performance and method combinations. First, we validate the rationality of constructing a multi-scale network. The experiments include the three progressive multi-scale models proposed in the method section, denoted as Extra Addition. The second part is an extension based on the MSAP network, which validates the effectiveness of the several optimization methods we proposed and their improvements.

The model using a simple, unconnected multi-scale network already surpasses general neural networks such as ResNet in accuracy. However, after adding scale interconnections, the accuracy decreases. We believe this is due to excessive feature stacking at multiple scales, which creates feature redundancy that the attention purification mechanism must eliminate. From the experimental results, it is clear that the performance of our attention purification network (MSAP) is significantly improved compared with the former, which should be attributed to the attention-based feature selection mechanism. In addition, directly applying the DRSN noise reduction module to the MSAP network does not produce good results, and the network performance is reduced compared with the original MSAP. The DRSN-M (DM) we built solves this problem by adjusting the noise reduction structure, and it achieves considerable gains, improving accuracy by about 1.5% on WISDM and 3% on OPPORTUNITY.

Figure 3: MSAP (left) and MSAP-DM (right) confusion matrices on the UCI-HAR and PAMAP2 datasets.

Fig. 3 shows the confusion matrices of MSAP (left) and MSAP-DM (right) on the UCI-HAR and PAMAP2 datasets. MSAP-DM achieves higher accuracy in more categories and effectively reduces the problems caused by confusable categories. For example, between the "Sitting" and "Standing" categories on UCI-HAR, MSAP shows 11.28% confusion, resulting in an accuracy of 88.72% for "Sitting", while MSAP-DM reduces this confusion to 4.10%. The
Table 2: Ablation Experiment