
Article

High-Frequency Workpiece Image Recognition Model Based on Hybrid Attention Mechanism
Jiaqi Deng *, Chenglong Sun, Xin Liu, Gang Du, Liangzhong Jiang and Xu Yang

Southwest Electronics Technology Research Institute, Chengdu 610036, China


* Correspondence: [email protected]; Tel.: +86‐181‐8073‐9329

Abstract: High-frequency workpieces are specialized items characterized by complex internal textures and minimal variance in properties. Under intricate lighting conditions, existing mainstream image recognition models struggle with low precision when applied to the identification of high-frequency workpiece images. This paper introduces a high-frequency workpiece image recognition model based on a hybrid attention mechanism, HAEN. Initially, the high-frequency workpiece dataset is enhanced through geometric transformations, random noise, and random lighting adjustments to augment the model's generalization capabilities. Subsequently, lightweight convolution, including one-dimensional and dilated convolutions, is employed to enhance convolutional attention and reduce the model's parameter count, extracting original image features with robustness to strong lighting and mitigating the impact of lighting conditions on recognition outcomes. Finally, lightweight re-estimation attention modules are integrated at various model levels to reassess spatial information in feature maps and enhance the model's representation of depth channel features. Experimental results demonstrate that the proposed model effectively extracts features from high-frequency workpiece images under complex lighting, outperforming existing models in image classification tasks with a precision of 97.23%.

Keywords: high‐frequency workpiece; image recognition; illumination change; hybrid attention

Academic Editor: Pedro Couto

Received: 12 November 2024; Revised: 20 December 2024; Accepted: 24 December 2024; Published: 26 December 2024

Citation: Deng, J.; Sun, C.; Liu, X.; Du, G.; Jiang, L.; Yang, X. High-Frequency Workpiece Image Recognition Model Based on Hybrid Attention Mechanism. Appl. Sci. 2025, 15, 94. https://doi.org/10.3390/app15010094

Copyright: © 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Manufacturing has long been a major engine of global economic growth, profoundly impacting employment and social stability worldwide. With the growing trend of intelligent transformation in global manufacturing [1], the level of intelligence in processing and industrial production is also improving [2]. High-frequency workpieces are among the most crucial parts in aerospace equipment, and their processing quality, timeliness, and level of intelligence are significant factors affecting the development of the aerospace industry. To enhance the intelligence level of high-frequency workpiece processing, image recognition technology can be introduced into the processing workflow and combined with the Manufacturing Execution System (MES). Capturing data such as the type and dimensions of high-frequency workpieces through image recognition technology and transmitting it in real-time to the MES system supports quality control and process optimization during production, thereby improving the efficiency and quality of high-frequency workpiece processing.

Image recognition technology, based on computer vision, enables computers to mimic the human visual system to analyze and understand images or videos with the
help of deep learning and machine learning methods. This technology can automate and
enhance intelligence in various fields such as gesture recognition [3], face recognition [4],
vehicle identification [5], security monitoring [6], and industrial production [7]. Due to
the increased level of industrial automation and the growing variety of mechanical
workpieces on production lines, traditional manual identification methods can no longer
meet the needs of efficient production [8]. Therefore, workpiece recognition and detec‐
tion have become important applications of computer vision and deep learning tech‐
nology in the field of smart manufacturing [9].
In the 21st century, research in the field of workpiece recognition and detection has
primarily focused on the following four aspects: (1) The recognition and detection of
workpiece processing features. The processing‐feature‐recognition algorithm is one of
the key technologies for realizing the integration of computer‐aided design (CAD),
computer‐aided manufacturing (CAM), and computer‐aided process planning (CAPP)
systems [10]. It is used to identify processing features such as grooves, cavities, surfaces,
and holes in CAD models. Ning et al. [11] proposed a part processing‐feature‐recognition
method based on deep learning, and Wu et al. [12] proposed a graph neural network for
multi‐task processing feature recognition. (2) The recognition and detection of workpiece
posture. Workpiece posture recognition is an important part of modern intelligent pro‐
duction lines. In automated operations such as industrial robot grasping, assembly, and
welding, it is necessary to accurately obtain the position and posture information of the
workpiece to ensure the accuracy and stability of the operation. Yin et al. [13] proposed a
rotating workpiece position‐ and posture‐estimation algorithm based on image recogni‐
tion, and Zhang et al. [14] proposed a workpiece posture‐detection method that combines
small-sample learning and a lightweight deep network. (3) The recognition and detection
of workpiece types. Workpiece type recognition mainly uses computer vision technology
to extract and analyze the category of the workpiece from the image information of the
workpiece, providing important technical support for industrial production. Li Qi et al.
[15] proposed a part‐recognition and ‐classification system based on block principal
component analysis (PCA) and a support vector machine (SVM); Xu Wanze et al. [16]
proposed a metal‐part‐recognition algorithm based on ring template matching; Yin Kexin
et al. [17] proposed a high‐frequency component fast‐recognition algorithm based on hi‐
erarchical ring segmentation; Yang Tao et al. [18] proposed a high‐frequency component
deep learning algorithm with joint loss supervision; Zhang Pengfei et al. [19] proposed a
multi‐branch feature fusion convolutional neural network (Multi‐branch Feature Fusion
CNN, MFF‐CNN) for the automatic classification of main bearing cover parts; Yang Le et
al. [20] proposed improved Inception V3 [21] and Xception [22] for the recognition of
threaded connection parts; and Qiao et al. [23] proposed a method based on migration
component analysis for workpiece recognition. (4) Workpiece defect recognition and
detection. Workpiece defect recognition refers to the process of automatically detecting
and identifying defects on the surface or inside of a workpiece during the machining
process. Wang et al. [24] proposed a real‐time defect detection method for metal work‐
pieces, and Chen et al. [25] proposed an improved deep learning model for surface‐defect
detection for rectangular pipe workpieces.
Although the aforementioned workpiece recognition methods can solve problems
related to complex internal textures and small feature differences to some extent, their
research subjects are relatively simple and not effectively applicable to the recognition of
high‐frequency workpieces with complex intra‐class diversity, small inter‐class differ‐
ences, and varying poses and lighting. Thus, there remains significant research space for
high‐frequency workpiece image recognition under complex lighting. To address the
challenges of difficult recognition and low precision under complex lighting, this paper
proposes a high-frequency workpiece image recognition model based on a hybrid attention mechanism (Hybrid Attention EfficientNet, HAEN). The model is based on the base
network model EfficientNet-B0 [26]. Data enhancement was used to improve the model's
robustness. Then, a lightweight convolutional attention module was designed to extract
robust features from workpiece images under strong lighting, reducing the impact of
lighting variations on the recognition results of high‐frequency workpieces. Finally, us‐
ing lightweight re‐estimation attention modules [27], the network’s feature expression of
workpiece images was further enhanced. Experimental results on a laboratory‐produced
high‐frequency workpiece dataset show that the proposed model can automatically focus
on and extract features and is robust against strong lighting, demonstrating significant
advantages in recognition precision over other methods.

2. Approach
2.1. Overall Framework
The HAEN model proposed in this paper was improved based on EfficientNet‐B0.
The amount of workpiece data was increased through data augmentation, which improved the generalization ability of the model and reduced its sensitivity to image variations. The feature extraction ability and recognition performance of the network were
improved through two serial modules, the Improved Lightweight Convolutional Block
Attention Module (ILCBAM) and the lightweight re‐estimation attention module
(LRAM). The ILCBAM was added to the input part of the basic network, and the LRAM
was added to the output part of each group of MBConv convolution blocks. The HAEN
model framework and workflow are shown in Figure 1.

Figure 1. HAEN model framework.
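As a concrete illustration of this wiring, the following minimal PyTorch sketch (not the authors' released code) attaches an ILCBAM to the network input and an LRAM to the output of each MBConv group of a torchvision EfficientNet-B0 backbone. The ILCBAM and LRAM classes are assumed to be implemented as sketched in Sections 2.3 and 2.4, and the stage layout follows torchvision's implementation rather than the authors' exact code.

```python
import torch.nn as nn
from torchvision.models import efficientnet_b0

class HAENSketch(nn.Module):
    def __init__(self, num_classes=20, ilcbam=None, lram_factory=None):
        super().__init__()
        base = efficientnet_b0(weights=None)             # ImageNet weights could be loaded instead
        self.ilcbam = ilcbam if ilcbam is not None else nn.Identity()
        self.stem = base.features[0]                     # stem convolution, 3 -> 32 channels
        self.stages = nn.ModuleList(base.features[1:8])  # the 7 groups of MBConv blocks
        stage_channels = [16, 24, 40, 80, 112, 192, 320] # output channels of the 7 stages
        self.lrams = nn.ModuleList(
            [lram_factory(c) if lram_factory is not None else nn.Identity()
             for c in stage_channels])
        self.head = base.features[8]                     # final 1x1 convolution, 320 -> 1280
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1280, num_classes)

    def forward(self, x):                                # x: (B, 3, 224, 224)
        x = self.stem(self.ilcbam(x))                    # ILCBAM applied to the input image
        for stage, lram in zip(self.stages, self.lrams):
            x = lram(stage(x))                           # LRAM after each MBConv group
        x = self.head(x)
        x = self.pool(x).flatten(1)
        return self.classifier(x)
```

Under these assumptions, the full model would be instantiated as HAENSketch(num_classes=20, ilcbam=ILCBAM(3), lram_factory=LRAM), using the attention classes sketched below.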



2.2. Data Augmentation


In the production process of high‐frequency workpieces, it is necessary to collect
images before and after workpiece heat treatment. However, when using the image ac‐
quisition device to collect images of the workpiece, due to the influence of the on‐site
environment, lighting conditions, and placement angle, the collected workpiece images
can have problems such as uneven lighting, large lighting changes, and inconsistent
workpiece postures. In order to reduce the interference of the above situation on the
feature extraction and recognition of the later workpiece images, it is necessary to en‐
hance the data of the collected workpiece images to simulate the diversity in the real
scene, so as to improve the effect of later processing and recognition. The HAEN model
uses three data enhancement methods to cope with the complex changes in the image,
mainly including geometric transformation, random noise, and random light correction
processing.
Geometric transformation generates diverse images by randomly cropping the workpiece image and randomly rotating it between −45° and 45°. This introduces random viewing angles and, to a certain extent, eliminates geometric distortion caused by the placement angle, helping the deep learning model adapt to changes in workpiece perspective and posture and thereby improving its feature extraction and generalization capabilities. The effect is shown in Figure 2.

Figure 2. Geometric transformation. (a) Workpiece image, (b) random cropping, and (c) random rotation.
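As a small illustration, the geometric augmentations could be implemented with torchvision transforms roughly as follows; the crop scale range is an assumption, since the paper only specifies random cropping and rotation between −45° and 45°.

```python
from torchvision import transforms

# Random cropping plus random rotation in [-45°, 45°]; applied to PIL workpiece images.
geometric_aug = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),  # random cropping (scale range assumed)
    transforms.RandomRotation(degrees=45),                # random rotation between -45° and 45°
])
# augmented = geometric_aug(workpiece_image)              # workpiece_image: a PIL.Image
```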

Salt and pepper noise, also known as impulse noise, is a common visual disturbance
in digital images. It appears in certain areas of the image in a discrete and random man‐
ner. The pixels are obviously bright or dark and appear abnormal compared to other
pixels in the image. Gaussian noise is a specific type of random process whose charac‐
teristics are described by the normal distribution. The normal distribution is a
bell‐shaped probability distribution whose probability density function reaches a max‐
imum value near the mean and gradually decreases as it deviates from the mean. Its
mathematical expression is:

$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, (1)

where $\mu$ represents the mean of the distribution and $\sigma^2$ represents the variance, which determines the spread of the distribution.
The random noise augmentation introduced salt-and-pepper and Gaussian noise into the workpiece images, making the model less sensitive to small perturbations in the input data, which could improve the generalization and robustness of the model to a certain extent. The effect is shown in Figure 3.
Figure 3. Random noise. (a) Workpiece image, (b) salt and pepper noise, and (c) Gaussian noise.
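A minimal sketch of the two noise augmentations is given below; the impulse density and noise standard deviation are illustrative assumptions rather than values reported in the paper.

```python
import torch

def add_salt_pepper(img: torch.Tensor, amount: float = 0.02) -> torch.Tensor:
    """img: float tensor in [0, 1] with shape (C, H, W); impulse density `amount` is assumed."""
    noisy = img.clone()
    mask = torch.rand(img.shape[-2:])            # one impulse mask shared by all channels
    noisy[..., mask < amount / 2] = 0.0          # pepper: randomly darkened pixels
    noisy[..., mask > 1 - amount / 2] = 1.0      # salt: randomly brightened pixels
    return noisy

def add_gaussian(img: torch.Tensor, mu: float = 0.0, sigma: float = 0.05) -> torch.Tensor:
    """Additive noise drawn from N(mu, sigma^2), as in Equation (1), clipped back to [0, 1]."""
    return (img + mu + sigma * torch.randn_like(img)).clamp(0.0, 1.0)
```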

In order to reduce the interference of color cast and ambient light distribution on
post‐processing, we decided to perform random light correction on the collected work‐
piece samples. Random light correction includes mean white balance, grayscale world
assumption, color cast detection, and color correction based on image analysis. It can
adjust the color distribution in images with three types of lighting problems. Expanding
the dataset through different light processing helps to improve the adaptability and
generalization ability of the model to lighting changes in real industrial scenes. The effect
is shown in Figure 4.

Figure 4. Random light correction processing. (a) Workpiece image, (b) mean white balance, (c) grayscale world assumption, and (d) color cast correction.
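As an example of one of these corrections, the grayscale world assumption can be implemented as below; this is a generic sketch of the standard gray-world algorithm, not the authors' exact processing chain.

```python
import numpy as np

def gray_world_balance(img: np.ndarray) -> np.ndarray:
    """img: uint8 RGB image of shape (H, W, 3). Each channel is scaled so that its mean
    matches the global mean, following the gray-world assumption."""
    img_f = img.astype(np.float32)
    channel_means = img_f.reshape(-1, 3).mean(axis=0)     # mean of R, G, and B
    gain = channel_means.mean() / (channel_means + 1e-6)  # per-channel correction gain
    return np.clip(img_f * gain, 0, 255).astype(np.uint8)
```

During augmentation, one of the listed corrections (mean white balance, gray-world correction, or color cast correction) would presumably be chosen at random for each image.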

2.3. Improved Lightweight Convolutional Block Attention Module (ILCBAM)


In actual industrial production environments, the surface of high‐frequency work‐
pieces is easily affected by factors such as uneven illumination and large changes in il‐
lumination, resulting in light spots, shadows, and insufficient light in the collected
workpiece images. If low‐quality workpiece images are directly input into the network to
extract features, effective image features will not be obtained, and it will be difficult to
accurately identify the categories of the high‐frequency workpieces. In order to overcome
the influence of the above interference information on workpiece recognition, this paper
uses the ILCBAM to enhance the features of workpiece images.
Inspired by the Convolutional Block Attention Module [28] (CBAM) and Efficient
Channel Attention [29] (ECA), the ILCBAM was proposed, and its structure is shown in
Figure 5. This module uses the Improved Channel Attention Module (ICAM) and the
Improved Spatial Attention Module (ISAM) to replace the CAM and SAM, respectively,
in the CBAM.

Figure 5. ILCBAM structure diagram.

2.3.1. Improved Channel Attention Module (ICAM)


The collected workpiece image is an RGB three‐channel color image. Some color
channels contain information that is greatly affected by illumination changes, while some
color channels contain information that is insensitive to illumination changes. The pur‐
pose of the CAM is to explicitly establish the interdependence between color feature
channels in the original image and then automatically obtain the importance of each color
feature channel through learning. According to their importance, the attention to useful
color feature channels is increased, and the color feature channels that are unfavorable to
the current image recognition task are suppressed to minimize the impact of illumination
changes. Since the maximum pooling processing mechanism is used in the CAM, a large
amount of useful information is lost, so the ICAM uses global power pooling to replace
global maximum pooling and then uses one‐dimensional convolution to make the net‐
work focus on the learning of effective channels with less computation. Its structure is
shown in Figure 6.

Figure 6. ICAM structure diagram.

The input feature map $\mathbf{X}_1 \in \mathbb{R}^{C \times H \times W}$ is subjected to global power pooling and global average pooling to obtain the channel information description maps:

$z_p = \left( \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} \mathbf{X}_1(h, w)^{p} \right)^{1/p}$, (2)

$z_m = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} \mathbf{X}_1(h, w)$, (3)

where $C$, $H$, and $W$ represent the number of feature channels, height, and width of the image, respectively; $h$ and $w$ represent the coordinates in the height and width directions, respectively; and $p$ is set to 2 to highlight the local salient features.

$z_p$ and $z_m$ accumulate global information in different ways, and one-dimensional convolution is then performed on each. After the convolution and activation combination operation, the channel attention weight $\mathbf{M}_c \in \mathbb{R}^{C \times 1 \times 1}$ is obtained:

$\mathbf{M}_c = \sigma\left(F_k^p(z_p)\right) \otimes \sigma\left(F_k^m(z_m)\right)$, (4)

where $F_k^p(\cdot)$ and $F_k^m(\cdot)$ represent one-dimensional convolution operations, $k$ represents the convolution kernel size of the one-dimensional convolution, and $\sigma(\cdot)$ represents the sigmoid function.

The one-dimensional convolution kernel size $k$ can be adaptively calculated by Formula (5) [29]:

$k = \Psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$, (5)

where $C$ represents the number of channels, $\gamma$ and $b$ are set to 2 and 1, respectively, and $|\cdot|_{odd}$ denotes rounding to the nearest odd number.
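A possible PyTorch reading of the ICAM (Equations (2)-(5)) is sketched below: global power pooling with p = 2 and global average pooling, each followed by a one-dimensional convolution with the adaptively chosen kernel size k and a sigmoid, combined by element-wise multiplication into the channel weight and applied to the input. This is an interpretation of the text, not the authors' code.

```python
import math
import torch
import torch.nn as nn

def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Equation (5): k = |log2(C)/gamma + b/gamma| rounded to the nearest odd number."""
    k = int(round(abs(math.log2(channels) / gamma + b / gamma)))
    return k if k % 2 == 1 else k + 1

class ICAM(nn.Module):
    def __init__(self, channels: int, p: float = 2.0):
        super().__init__()
        k = adaptive_kernel_size(channels)
        self.p = p
        self.conv_power = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.conv_avg = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                                       # x: (B, C, H, W)
        z_p = x.pow(self.p).mean(dim=(2, 3)).pow(1.0 / self.p)  # Eq. (2): global power pooling
        z_m = x.mean(dim=(2, 3))                                # Eq. (3): global average pooling
        a_p = torch.sigmoid(self.conv_power(z_p.unsqueeze(1)))  # 1-D conv over channels, (B, 1, C)
        a_m = torch.sigmoid(self.conv_avg(z_m.unsqueeze(1)))
        m_c = (a_p * a_m).view(x.size(0), -1, 1, 1)             # Eq. (4): M_c, shape (B, C, 1, 1)
        return x * m_c                                          # X2 = X1 reweighted by channel
```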

2.3.2. Improved Spatial Attention Module (ISAM)


In view of the fact that the feature differences of different types of workpieces are
small, and the most important difference area only occupies a small part of the entire
image, 3 × 3 convolution was used to replace the 7 × 7 convolution in the SAM. At the
same time, in order to obtain multi‐scale context information, 3 × 3 dilated convolution
was added in parallel to design the ISAM model, whose structure is shown in Figure 7.
Since dilated convolution will produce a grid effect, different receptive fields are added
and combined. There are only 18 parameters in the entire convolution process, which
greatly reduces the number of parameters compared with the previous 49.

Figure 7. ISAM structure diagram.

The input feature map $\mathbf{X}_2 \in \mathbb{R}^{C \times H \times W}$ is subjected to maximum pooling and average pooling for each pixel across channels, and the two resulting feature maps are concatenated and passed through a 3 × 3 convolution and a 3 × 3 dilated convolution in parallel, where the dilation rates are set to 1 and 2, respectively. To ensure that the size of the feature map remains unchanged, the padding parameter needs to be set accordingly; finally, the spatial attention weight $\mathbf{M}_s \in \mathbb{R}^{1 \times H \times W}$ is obtained through activation and addition operations:

$\mathbf{M}_s(\mathbf{X}_2) = \sigma\left(F_{3 \times 3}\left(\left[P_{max}(\mathbf{X}_2); P_{avg}(\mathbf{X}_2)\right]\right)\right) \oplus \sigma\left(F_{3 \times 3}^{d}\left(\left[P_{max}(\mathbf{X}_2); P_{avg}(\mathbf{X}_2)\right]\right)\right)$, (6)

where $F_{3 \times 3}(\cdot)$ represents the 3 × 3 convolution, $F_{3 \times 3}^{d}(\cdot)$ represents the 3 × 3 dilated convolution, $P_{max}(\cdot)$ represents maximum pooling, and $P_{avg}(\cdot)$ represents average pooling.

The attention weights are applied to the input feature map, and the original features in the channel dimension are redefined to obtain the enhanced output features:

$\mathbf{X}_3 = \mathbf{X}_1 \otimes \mathbf{M}_c \otimes \mathbf{M}_s$ (7)
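Continuing the sketch, the ISAM of Equation (6) and the ILCBAM combination of Equation (7) might look as follows in PyTorch, reusing the ICAM class from the previous sketch; again, this is an interpretation of the description, not released code.

```python
import torch
import torch.nn as nn

class ISAM(nn.Module):
    def __init__(self):
        super().__init__()
        # 3x3 convolution (dilation 1) and 3x3 dilated convolution (dilation 2) in parallel;
        # the padding keeps the spatial size unchanged, as required in the text.
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False)
        self.dilated = nn.Conv2d(2, 1, kernel_size=3, padding=2, dilation=2, bias=False)

    def forward(self, x):                                     # x: (B, C, H, W)
        pooled = torch.cat([x.amax(dim=1, keepdim=True),      # per-pixel max over channels
                            x.mean(dim=1, keepdim=True)], 1)  # per-pixel mean over channels
        return torch.sigmoid(self.conv(pooled)) + torch.sigmoid(self.dilated(pooled))  # Eq. (6)

class ILCBAM(nn.Module):
    """Eq. (7): X3 = X1 * M_c * M_s, with X2 = X1 * M_c produced by the ICAM."""
    def __init__(self, channels: int):
        super().__init__()
        self.icam = ICAM(channels)   # ICAM as sketched in Section 2.3.1
        self.isam = ISAM()

    def forward(self, x):
        x2 = self.icam(x)            # channel-reweighted features X2
        return x2 * self.isam(x2)    # spatial reweighting gives X3
```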

2.4. Lightweight Re‐Evaluation Attention Module (LRAM)


The spatial information of different feature maps has visual and semantic differ‐
ences, which is crucial for extracting effective channel features. However, due to the high
spatial information dimension and large data volume of the feature map, the computa‐
tional cost of processing spatial information is high, making it difficult to train the neural
network and limiting the network’s application value. To solve this problem, this paper
proposes a lightweight re‐evaluation attention module (LRAM), which captures the
global range dependence between spatial positions and aims to enhance the network’s
representation ability by combining channel attention with feature map spatial infor‐
mation. Its structure is shown in Figure 8.

Figure 8. LRAM structure diagram.

As can be seen from Figure 8, the LRAM can be divided into two steps: spatial in‐
formation compression and channel feature extraction. In order to obtain channel atten‐
tion, the feature maps of different channels need to be re‐estimated. However, due to the
large spatial dimension of the front layer of the convolutional neural network, the com‐
plex feature information, and the high computational cost, it is difficult to perform the
re‐estimation operation directly. Therefore, average pooling is used to compress the spa‐
tial size of the feature map while retaining sufficient spatial information and providing a
lighter input for subsequent channel feature extraction.
Assume that the input feature map is $\mathbf{X}_1 \in \mathbb{R}^{C \times H \times W}$ and that $C$, $H$, and $W$ represent the number of feature channels, height, and width, respectively. The input feature map $\mathbf{X}_1$ is average-pooled along each feature map to obtain the intermediate layer feature map $\mathbf{R}$:

$\mathbf{R} = [r_1, r_2, \ldots, r_C] = F_{pool}(\mathbf{X}_1)$, (8)

where the height and width of each $r_n$, $n \in \{1, 2, \ldots, C\}$, are $H'$ and $W'$, respectively, with $H' < H$ and $W' < W$, and $F_{pool}(\cdot)$ represents average pooling.
After obtaining the intermediate feature map R, the importance of each feature map
channel 𝑟 is modeled through the global depth convolution kernel 𝑙 , whose structure
is shown in Figure 9. Global convolution in global depth convolution can directly extract
important channel features from all the spatial information of the feature map. At the
same time, the amount of calculation is significantly reduced through depth convolution
and the important information corresponding to different channels is independently
captured. Therefore, the use of global deep convolution can enable the network to itera‐
tively learn the optimal convolution‐kernel parameters, perceive and make full use of the
wider spatial information in the feature map, and at the same time reduce the computa‐
tional cost as much as possible so that the network can perform feature processing in a
more efficient manner.
Appl. Sci. 2025, 15, 94 9 of 20

Figure 9. Global depth convolution structure diagram.

Assume that $\mathbf{S}$ is the attention weight of $\mathbf{R}$; $\mathbf{S} \in \mathbb{R}^{C \times 1 \times 1}$ is obtained by global depth convolution of the intermediate feature map $\mathbf{R}$:

$\mathbf{S} = [s_1, s_2, \ldots, s_C] = F_{gdc}(\mathbf{R})$, (9)

where $F_{gdc}(\cdot)$ represents the global depthwise convolution.

Since $r_n$ and $l_n$ have the same size, the result of the depthwise convolution operation is a scalar, which is then passed through the activation function to obtain the attention weight $s_n$ of $r_n$:

$s_n = \sigma(r_n \odot l_n)$, (10)

where $\sigma(\cdot)$ represents the sigmoid function.

The feature maps of different channels are generated by different convolution kernels, which can capture different types of features and have different effects on network performance. Therefore, the LRAM enhances the channels that contribute to the classification task and weakens the useless channels according to the learned channel importance, so that the network selectively focuses on the effective channel features, and finally obtains the output feature map $\mathbf{X}_2 \in \mathbb{R}^{C \times H \times W}$:

$\mathbf{X}_2 = \mathbf{X}_1 \otimes \mathbf{S}$ (11)
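A compact way to realize the LRAM of Equations (8)-(11) in PyTorch is sketched below: adaptive average pooling compresses each feature map to H' × W' (7 × 7 per Section 3.3), and a depthwise convolution whose kernel covers the whole compressed map plays the role of the global depth convolution, yielding one attention scalar per channel. This is a plausible reading of the description, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LRAM(nn.Module):
    def __init__(self, channels: int, spatial: int = 7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(spatial)      # Eq. (8): R = F_pool(X1), size H' x W'
        # Depthwise convolution with kernel size H' x W' and groups = C: one kernel l_n per
        # channel, producing the scalar s_n = sigma(r_n (.) l_n) of Eqs. (9)-(10).
        self.global_dw = nn.Conv2d(channels, channels, kernel_size=spatial,
                                   groups=channels, bias=False)

    def forward(self, x):                              # x: (B, C, H, W)
        r = self.pool(x)                               # intermediate feature map R
        s = torch.sigmoid(self.global_dw(r))           # attention weight S, shape (B, C, 1, 1)
        return x * s                                   # Eq. (11): X2 = X1 reweighted by channel
```

In the wiring sketch of Section 2.1, one LRAM instance per MBConv group would be constructed with that group's output channel count.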

3. Experiments
3.1. Experimental Environment
The computer used for the experiments was configured as follows: the CPU was an
Intel(R) Core(TM) i5‐10400F (Intel Corporation, Santa Clara, CA, USA), the GPU was an
NVIDIA GeForce GTX 1660 SUPER (NVIDIA Corporation, Santa Clara, CA, USA), with
16 GB of RAM, running a Windows 10 system. The experiments were conducted using
Python 3.6, the PyTorch 1.2 deep learning framework, and the CUDA 10.2 deep learning
network acceleration library. The network input size was 224 × 224, initialized with
pre‐trained weights from ImageNet. An Adam optimizer was employed, using a
cross‐entropy loss function, with a batch size set at 8 and a total of 30 iterations. The
learning rate started at 10^{-4} and was divided by 10 every 10 iterations. Additionally,
k‐fold cross‐validation (with k = 5) was used to split and rotate the training and test sets
in a 4:1 ratio. Given that the goal of the designed HAEN model was to improve the pre‐
cision of high‐frequency workpiece classification in complex lighting environments,
classification precision was used as the performance metric of the model.
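For reference, the training configuration described above corresponds roughly to the following PyTorch loop; the data loader and model objects are placeholders, and the 5-fold splitting is assumed to be handled outside this function.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=30, device="cuda"):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, initial lr 1e-4
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
    for _ in range(epochs):                                    # 30 iterations over the data
        model.train()
        for images, labels in train_loader:                    # batch size 8, 224 x 224 inputs
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()                                       # divide lr by 10 every 10 iterations
    return model
```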

3.2. Experimental Dataset


The experimental dataset consisted of images of high‐frequency workpieces pro‐
duced by a certain company, from which 3600 images were selected to form the dataset,
encompassing 20 categories of workpieces. Data augmentation techniques such as geo‐
metric transformations, random noise, and random lighting correction were employed to
double the size of the dataset. Each category included 360 images, with each image hav‐
ing a resolution of 3822 × 2702 pixels.
In actual industrial production environments, the surfaces of high‐frequency work‐
pieces are easily affected by factors such as uneven illumination and large changes in il‐
lumination, resulting in light spots, shadows, and insufficient light in the collected
workpiece images, as shown in Figure 10. If low‐quality workpiece images are directly
input into the network to extract features, effective image features will not be obtained,
and it will be difficult to accurately identify the categories of the high‐frequency work‐
pieces. Figure 10 also shows three different categories of high‐frequency workpieces,
where the red circles indicate the subtle differences between the three categories. It can be
seen from Figure 10 that in each type of high‐frequency workpiece image, its interior has
complex texture characteristics; for different types of high‐frequency workpiece images,
there are small differences between the classes. Therefore, the recognition of
high‐frequency workpiece images under multiple types of complex illumination is an
extremely challenging task.


Figure 10. A part of the experimental data. In this context, images (a–c) represent the effects of the
same workpiece under the following different lighting conditions: (a) light spot, (b) shadow, and
(c) insufficient light. Images (d–f) are comparison images showing subtle differences between
different workpieces.

3.3. Model Parameter Selection Experiment


H′ and W′ in the LRAM are the height and width of the intermediate feature map R, which determine the size of the retained spatial information, further affecting the number of model parameters and classification performance. This section adjusts the spatial size of the LRAM intermediate feature map to study the impact of H′ and W′ on the model. As shown in Figure 1, the HAEN model has 7 groups of MBConv convolution blocks. When the input image size is 224 × 224, the spatial size of the output feature map of the last group of MBConv convolution blocks is 7 × 7. Therefore, in the experiment, the maximum value of 〈H′, W′〉 was 〈7, 7〉, and the experimental results are shown in Table 1. Among them, 〈H′, W′〉 = 〈1, 1〉 means that global average pooling was applied to the input feature map and the spatial information of the feature map was not used.

Table 1. The influence of H′ and W′ on the recognition results of high‐frequency workpieces.

〈H′, W′〉 Values    Precision/%    Parameters/M
〈7,7〉 97.23 6.05
〈5,5〉 96.93 5.69
〈3,3〉 96.74 5.45
〈1,1〉 96.41 5.33

From the experimental results, we can see that when 〈H′, W′〉 is 〈7, 7〉, the high-frequency workpiece recognition achieves the highest precision, while the number of model parameters increases only slightly. As 〈H′, W′〉 increases, the effect of workpiece recognition continues to improve, because a larger 〈H′, W′〉 enables the network to utilize more spatial information. Therefore, this paper selects 〈7, 7〉 as the spatial parameter of the intermediate feature map.

3.4. Recognition Precision Comparison Experiment


In order to verify the recognition effect of the model proposed in this paper, the
HAEN model was compared with the original network EfficientNet‐B0 and the recent
recognition model related to solving complex lighting problems on the high‐frequency
workpiece dataset. The comparison models included the following: (1) the basic model
was EfficientNet‐B0, denoted as Basis; (2) the recognition model that integrated the gra‐
dient feature [30], denoted as IGFN (Integrated Gradient Features Network); (3) the
model based on the attention mechanism [31], denoted as IAN (Improved AlexNet For
Fish Species Recognition); (4) the model based on Retinex and the attention mechanism
[32], denoted as IRN (Improved RegNet For Traffic Sign Recognition). In order to accu‐
rately test the difference in the effects of different models, the experimental environment
and experimental dataset in all model experiments were consistent with Sections 3.1 and
3.2, respectively. The comparison results of the high‐frequency workpiece recognition of
different models are shown in Table 2.

Table 2. High‐frequency workpiece recognition results of different models.

Model    Backbone Network    Parameter Quantity/M    Training Time/s    Precision/%
Basis    EfficientNet-B0     5.30                    2314               87.92
IGFN     SqueezeNet          5.43                    2563               91.78
IAN      AlexNet             240.17                  4620               90.85
IRN      RegNet              25.19                   2149               92.64
HAEN     EfficientNet-B0     6.05                    2405               97.23

The following can be seen from Table 2: (1) When directly using the EfficientNet-B0 network to classify high-frequency workpiece images, the recognition precision was 87.92%, which was the worst among all models and much lower than IGFN, IAN, and IRN. (2) Compared with IGFN, IAN, and IRN, the recognition precision of the HAEN model was higher by 5.45%, 6.38%, and 4.59%, respectively, indicating that the model
proposed in this article has significant advantages in the precision of high‐frequency
workpiece recognition. (3) In the HAEN model, the highest precision rate of 97.23% was
achieved on the high‐frequency workpiece dataset, indicating that the data enhancement
and hybrid attention mechanisms have high recognition precision. (4) The HAEN model
outperformed IAN and IRN in terms of parameter count and surpassed IGFN and IAN in
training time. Overall, the HAEN model is superior to other models according to various
metrics.

Table 3. Classification performance before and after model improvement.

Precision/%           Experiment 1    Experiment 2    Experiment 3    Experiment 4    Experiment 5
Before improvement    87.92           87.34           87.31           87.37           87.30
After improvement     97.23           97.16           97.20           97.15           97.17

In order to further analyze the recognition effects of different methods, Basis and the
HAEN model were compared by using a confusion matrix, and the comparison results
are shown in Figure 11. It can be intuitively seen from the figure that the HAEN model
showed a significant improvement in the classification precision of high‐frequency
workpieces compared with Basis. From the confusion matrix, it can be seen that the clas‐
sification results of the HAEN model are more concentrated on the diagonal, while the
proportion of misclassification on the non‐diagonal line is significantly reduced. The
classification precision of each category of workpieces was above 0.90, which shows that
through data enhancement and the introduction of the ILCBAM and LRAM, the influ‐
ence of illumination changes on the classification results was overcome and the percep‐
tion ability of Basis for strong illumination‐robust feature information was improved,
which could effectively solve the problem of high‐frequency workpieces being difficult to
accurately classify due to illumination changes.


Figure 11. Test set confusion matrix for (a) Basis and (b) HAEN.

From the confusion matrix of the HAEN model, it is evident that the two types of
workpieces with the highest misclassification rates are Type 8 and Type 15, as shown in
Figure 12.


Figure 12. Misclassified workpieces. (a) Type 8 and (b) Type 15.

The distinguishing features of these two types of workpieces share the following
common characteristics: the differential features occupy a small proportion of the entire
image and are easily confused with surrounding features under insufficient lighting
conditions, leading to misclassification by the model.

3.5. T‐Test
Given that model performance is influenced by hyperparameters, randomness in the
training process, and other factors, statistical significance tests were conducted. As
shown in Table 3, a paired‐sample t‐test was used to compare the performance of the
same dataset before and after model improvement.
The p‐value obtained from the t‐test was 7.15 × 10−11, which was significantly lower
than the set significance level (0.05) and indicated a significant difference between the
model before and after improvement. This confirmed that the improvements were sta‐
tistically significant.
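The paired test can be reproduced in outline with scipy as below, using the five runs listed in Table 3; note that the p-value reported above (7.15 × 10^-11) was presumably computed on the authors' full set of paired measurements, so these five values only illustrate the procedure.

```python
from scipy import stats

before = [87.92, 87.34, 87.31, 87.37, 87.30]   # precision before improvement (Table 3)
after = [97.23, 97.16, 97.20, 97.15, 97.17]    # precision after improvement (Table 3)

t_stat, p_value = stats.ttest_rel(after, before)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")  # p is far below the 0.05 significance level
```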

3.6. Ablation Experiment


3.6.1. Module Ablation Comparison Experiment
To validate the impact of the data augmentation, ILCBAM (Improved Lightweight Convolutional Block Attention Module), and LRAM (lightweight re-estimation attention module)
proposed in the HAEN model on the task of the fine‐grained classification of
high‐frequency workpieces, a variable control experiment was conducted by removing
each improvement module from the baseline model. The results are shown in Table 4.

Table 4. Impact of different modules on network recognition precision.

Data Enhancement ILCBAM LRAM Precision/%


× × × 87.92
√ × × 90.47
× √ × 96.10
× × √ 96.24
× √ √ 96.78
√ √ √ 97.23

From Table 4, the following can be observed: (1) Data augmentation methods in‐
creased the precision by 2.55%, indicating that introducing a wider variety of samples
allowed the model to learn richer features, thereby enhancing recognition performance.
(2) Adding the Improved Lightweight Convolutional Block Attention Module (ILCBAM) to the base
network, EfficientNet‐B0, increased precision by 8.18%, demonstrating that the
ILCBAM’s ability to perceive different color channel features helped overcome the effects
of lighting variations, aiding the network in capturing and learning the lighting‐robust
features of different types of workpieces, thus further improving the recognition per‐
formance of the workpieces. (3) Compared to the base model, the lightweight
re‐estimation attention module (LRAM) improved precision by 8.32%, indicating that the
LRAM enhanced the network’s capability to extract feature information, effectively
capturing the features of high‐frequency workpiece images. (4) By incorporating both the
ILCBAM and LRAM, the precision was further increased by 0.54%, showing that com‐
bining these two attention mechanisms further enhanced the recognition performance of
high‐frequency workpieces. (5) The HAEN model that integrated data augmentation, the
ILCBAM, and the LRAM achieved the highest precision of 97.23%, suggesting that in‐
troducing more enhanced samples on the backbone network of EfficientNet‐B0 and in‐
tegrating two different attention modules could better overcome the disruptive features
affecting workpiece classification and thus improved recognition precision.
The ROC and AUC curves of the HAEN model are shown in Figure 13. These fur‐
ther indicate that combining data augmentation, the ILCBAM, and the LRAM improved
the performance of the model.

Figure 13. ROC and AUC curves.

3.6.2. Bias–Variance Decomposition Experiment


Bias–variance decomposition is an important tool in machine learning and statistics for assessing model performance. It provides a better understanding of the performance limitations of a model, as shown in Figure 14.

Figure 14. Bias–Variance decomposition.

The experimental results show that the model’s bias and variance were stable across
different types of workpieces. Specifically, Type 8 workpieces exhibited higher bias and
variance. According to the confusion matrix, the recognition precision for this type was
90%, indicating that the model’s performance was not limited by underfitting or over‐
fitting.
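The paper does not state how the decomposition in Figure 14 was computed; one common choice for classifiers is the 0-1 loss decomposition over repeated training runs (for example, the five cross-validation folds), sketched below as an assumption.

```python
import numpy as np
from scipy import stats

def bias_variance_01(preds: np.ndarray, y_true: np.ndarray):
    """preds: (n_runs, n_samples) predicted labels from repeated runs; y_true: (n_samples,)."""
    main_pred = stats.mode(preds, axis=0, keepdims=False).mode  # most frequent prediction per sample
    bias = np.mean(main_pred != y_true)                         # main prediction disagrees with the truth
    variance = np.mean(preds != main_pred[None, :])             # runs disagree with the main prediction
    return bias, variance
```

Per-type values, as discussed above, would then be obtained by restricting the computation to samples of each workpiece category.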

3.6.3. Distortion Ablation Comparison Experiment


As noise‐type distortions were already added in the data augmentation phase, the
dataset underwent further distortion experiments with random occlusion and Gaussian
blur. The recognition results of high‐frequency workpieces under different distortion
conditions are shown in Table 5.

Table 5. Recognition results of high‐frequency workpieces under different distortion conditions.

Random Occlusion Gaussian Blur Precision/%


× × 97.23
√ × 97.34
× √ 97.31
√ √ 97.28

From Table 5, the following can be observed: (1) Adding random occlusion and
Gaussian blur separately increased the image classification precision by 0.11% and 0.08%
respectively, indicating that the model learned more variations in images during train‐
ing, which enhanced its generalization capability. (2) Introducing both random occlusion
and Gaussian blur together improved the precision by an additional 0.05%, demonstrat‐
ing the model’s robustness to occlusion and blur distortions.

3.6.4. Two‐Factor ANOVA Experiment


A two‐factor analysis of variance (ANOVA) was used to evaluate the differences in
model precision across different lighting conditions and workpiece types, as shown in
Table 6. Light spots, shadows, and insufficient lighting were considered as one inde‐
pendent variable, and workpiece types as another independent variable, with image
classification precision as the dependent variable.

Table 6. Two‐factor analysis of variance table.

Source         df    Sum_sq    Mean_sq    F        PR(>F)
C(Light) 2 0.0030 0.0015 1.5405 0.2273
C(Category) 19 0.0725 0.0038 3.9189 0.0002
Residual 38 0.0370 0.0010 NaN NaN

From Table 6, the following can be observed: (1) The p‐value for the lighting condi‐
tion group was 0.2273, which was greater than the significance level (0.05), indicating that
this factor did not have a significant impact on model precision. (2) The p‐value for the
workpiece type group was less than 0.05, indicating that this main effect significantly
impacted model precision.
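The analysis in Table 6 matches the output of a two-way ANOVA such as the statsmodels sketch below; the DataFrame layout and file name are assumptions about how the per-condition precisions were tabulated.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# One row per (lighting condition, workpiece category) cell, e.g.
# Light in {"spot", "shadow", "insufficient"}, Category in {1, ..., 20}, Precision in [0, 1].
df = pd.read_csv("precision_by_light_and_category.csv")  # hypothetical file name

model = ols("Precision ~ C(Light) + C(Category)", data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)             # sum_sq, df, F, PR(>F)
print(anova_table)                                        # mean_sq = sum_sq / df, as in Table 6
```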

3.7. Model Structure Comparison Experiment


To validate whether the ILCBAM and LRAM could enhance the model’s feature
extraction capabilities, a comparative experiment was conducted between the models
with these two lightweight attention modules and the original models. The comparative
models included ResNet34 [33], MobileNet_V2 [34], MobileNet_V3 [35], and Efficient‐
Net‐B0. The ILCBAM and LRAM were added to the input part and the output part of
each convolutional block of these networks, and experiments were conducted using the
same data augmentation and loss functions. The results are shown in Figure 15.

Figure 15. Recognition results of various models.

The experimental results demonstrate that compared to the base models ResNet34,
MobileNet_V2, MobileNet_V3, and EfficientNet‐B0, the modified models embedded with
the ILCBAM and LRAM increased the recognition precision of high-frequency workpieces by 0.0799, 0.1139, 0.0785, and 0.0931, respectively. Therefore, the combination of
ILCBAM and LRAM modules enhanced the network’s feature extraction capabilities,
reduced the impact of lighting variations on the recognition results of high‐frequency
workpieces, and effectively resolved the issues caused by complex lighting variations
that make it difficult to accurately classify workpiece images.

3.8. Attention Visualization Experiment


To verify the effectiveness of HAEN in extracting features robust to strong lighting, the gradient class activation heatmap (Grad-CAM [36]) method was applied for visualization experiments on both the Basis and HAEN models. The Grad-CAM method utilizes the gradient information from the last convolutional layer of the model, obtaining a
feature map that reflects the areas of interest through global average pooling and
weighted summation. This map highlights the regions in the input workpiece images
that play a key role in the model’s predictions. The Grad‐CAM visualization results are
shown in Figure 16.

Figure 16. Grad‐CAM visualization results. (a) Input image, (b) Basis, and (c) HAEN.
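A generic hook-based Grad-CAM sketch in PyTorch is given below for completeness; target_layer would be the last convolutional layer of the inspected model (a placeholder here), and this is a standard implementation rather than the authors' code.

```python
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """image: (1, 3, 224, 224) tensor. Returns an (H, W) heatmap normalized to [0, 1]."""
    activations, gradients = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: activations.update(feat=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(grad=go[0]))

    logits = model(image)
    class_idx = class_idx if class_idx is not None else logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()                             # gradients of the target class score
    h1.remove(); h2.remove()

    weights = gradients["grad"].mean(dim=(2, 3), keepdim=True)  # global average pooling of gradients
    cam = F.relu((weights * activations["feat"]).sum(dim=1))    # weighted sum over feature channels
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # rescale to [0, 1] for overlay
```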

Figure 16 intuitively shows the enhancement of the HAEN model’s ability to extract
high‐frequency workpiece features in three situations: light spots, shadows, and insuffi‐
cient light. It can be seen that the Basis model mainly focused on most areas of the
workpiece image, especially in shadows and insufficient light scenes, which made the
network unable to distinguish between the key information and redundant information
in the image, while the improved HAEN model made the network more focused on the
area where the boss was located, which was different from other types of workpieces.
Therefore, the HAEN model remains largely unaffected by complex lighting and can extract more distinctive workpiece image features.

4. Discussion
The number of image samples is one of the factors affecting the performance of
models based on deep learning. This study’s model, used for high‐frequency workpiece
image recognition, does not account for the potential scarcity of workpiece data samples
in actual production processes. When the number of samples is limited, CNNs struggle to
generalize data from the training set to the test set. Based on this, researchers have pro‐
posed few‐shot learning (FSL). To enable small‐sample models to effectively extract im‐
age features, Vilalta and Drissi [37] proposed a model based on meta‐learning, which
trains only the feature extraction capabilities of the model before recognizing new classes;
hence, these models are known as feature extractors. To endow models with the capabil‐
ity to analyze features after extraction, researchers have introduced metric‐based classi‐
fication algorithms. These place a metric unit after the feature extractor to compare dis‐
tances between the feature vectors of support‐set samples and query‐set samples within a
unified feature space, eventually outputting a probability distribution of categories to
complete classification. Wang et al. [38] proposed the non‐local network (NLN), which
extracts global features of images through non‐local operations, widely applied in
small‐sample object recognition and classification domains. Future research could con‐
sider improving the model presented in this paper based on these approaches to achieve
the precise recognition of high‐frequency workpiece images in low‐sample scenarios.

5. Conclusions
This paper addresses the issue of existing networks struggling to accurately differ‐
entiate high‐frequency workpieces due to complex lighting variations, proposing a
high‐frequency workpiece image recognition model based on a hybrid attention mecha‐
nism (HAEN). First, the necessity of data augmentation for high‐frequency workpiece
images was analyzed, and the data augmentation process was introduced; then, the
ILCBAM and LRAM were designed and integrated with the EfficientNet‐B0 backbone
network, enhancing the network’s feature extraction capabilities and reducing the impact
of lighting variations on workpiece classification results. Finally, through model param‐
eter selection experiments, model performance comparison experiments, model structure
comparison experiments, and attention visualization experiments, it was verified that
HAEN could automatically focus on and extract features robustly against strong light-
ing. It exhibited superior classification precision compared to other models, achieving
97.23% precision, meeting the demands for high‐frequency workpiece image recognition
in industrial scenarios.

Author Contributions: Conceptualization, J.D. and C.S.; methodology, J.D.; software, J.D. and X.L.;
validation, C.S.; writing—original draft preparation, J.D. and X.L.; writing—review and editing,
J.D. and G.D.; visualization, L.J. and X.Y.; supervision, C.S.; funding acquisition, C.S. All authors
have read and agreed to the published version of the manuscript.

Funding: This research was supported by the Fund of Key R&D Project of Sichuan Province Sci‐
ence and Technology Department, grant No. 2021YFN0020.

Institutional Review Board Statement: Not applicable.



Informed Consent Statement: Not applicable.

Data Availability Statement: The dataset that was generated and analyzed during this study is
available from the corresponding author upon reasonable request, but restrictions apply to data
reproducibility and commercially confident details.

Acknowledgments: The authors gratefully acknowledge the useful comments of the reviewers.

Conflicts of Interest: The authors declare no conflicts of interest.

References
1. Zhou, J.; Wen, X. Research on influencing factors and multiple driving paths of intelligent transformation in China’s manu‐
facturing industry. J. Comput. Methods Sci. Eng. 2021, 21, 1561–1573.
2. Li, C.‐M.; Li, D.‐N.; Chen, C.‐J.; Zhao, Z.‐X. Parts recognition based on convolutional neural network and virtual training data
sets. Modul. Mach. Tools Autom. Mach. Technol. 2021, 8, 40–43.
3. Song, Y.; Wu, L.; Zhao, Y.; Liu, P.; Lv, R.; Ullah, H. High‐Accuracy Gesture Recognition using Mm‐Wave Radar Based on
Convolutional Block Attention Module. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP),
Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 1485–1489.
4. Majidpour, J.; Jameel, S.K.; Qadir, J.A. Face identification system based on synthesizing realistic image using edge‐aided Gans.
Comput. J. 2023, 66, 61–69.
5. Sharma, P.; Singh, A.; Singh, K.K.; Dhull, A. Vehicle identification using modified region based convolution network for intel‐
ligent transportation system. Multimed. Tools Appl. 2022, 81, 34893–34917.
6. Vieira, J.C.; Sartori, A.; Stefenon, S.F.; Perez, F.L.; De Jesus, G.S.; Leithardt, V.R.Q. Low‐cost CNN for automatic violence
recognition on embedded system. IEEE Access 2022, 10, 25190–25202.
7. Duan, S.; Yin, C.; Liu, M. Recognition Algorithm Based on Convolution Neural Network for the Mechanical Parts. In Advanced
Manufacturing and Automation VIII; Springer: Singapore, 2019; pp. 337–347.
8. Gong, Y.; Wei, C.; Xia, M. Workpiece recognition technology based on improved convolutional neural network. J. Harbin Univ.
Commer. Nat. Sci. Ed. 2023, 39, 294–302.
9. Chen, C.; Abdullah, A.; Kok, S.H.; Tien, D.T.K. Review of industry workpiece classification and defect detection using deep
learning. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 329–340.
10. Zhang, H.; Zhang, S.; Zhang, Y.; Liang, J.; Wang, Z. Machining feature recognition based on a novel multi‐task deep learning
network. Robot. Comput.‐Integr. Manuf. 2022, 77, 102369.
11. Ning, F.; Shi, Y.; Cai, M.; Xu, W. Part machining feature recognition based on a deep learning method. J. Intell. Manuf. 2023, 34,
809–821.
12. Wu, H.; Lei, R.; Peng, Y.; Gao, L. AAGNet: A graph neural network towards multi‐task machining feature recognition. Robot.
Comput.‐Integr. Manuf. 2024, 86, 102661.
13. Yin, K.; Fang, J.; Mo, W.; Wang, H.; Fu, M.; Zhang, T. Research on Position and Posture Estimation of Rotated Workpiece Based
on Image Recognition. In Proceedings of the 2021 4th International Conference on Mechatronics, Robotics and Automation
(ICMRA), Zhanjiang, China, 22–24 October 2021; pp. 69–74.
14. Zhang, T.; Zheng, J.; Zou, Y. Fusing few‐shot learning and lightweight deep network method for detecting workpiece pose
based on monocular vision systems. Measurement 2023, 218, 113118.
15. Li, Q.; Wang, Y. Parts recognition and classification system based on block PCA and SVM. Mech. Eng. Autom. 2021, 4, 21–23+26.
16. Xu, W.; Li, B.; Ou, Y.; Luo, J. Recognition algorithm for metal parts based on ring template matching. Transducer Microsyst.
Technol. 2021, 40, 128–131. https://doi.org/10.13873/J.1000-9787(2021)02-0128-04.
17. Yin, K.; Ou, Y.; Li, B.; Lin, D. Fast identification algorithm of high frequency components based on ring segmentation. Mech.
Des. Manuf. 2022, 12, 196–200+206.
18. Yang, T.; Ou, Y.; Su, X.; Wu, X.; Li, B. High frequency workpiece deep learning recognition algorithm based on joint loss su‐
pervision. Mech. Manuf. Autom. 2023, 52, 30–33+47.
19. Zhang, P.; Shi, Z.; Li, X.; Ouyang, X. Main bearing cap classification and recognition algorithm based on deep learning. J. Graph.
2021, 42, 572–580.
20. Yang, L.; Gan, Z.; Li, Y.; Chao, X.; Zi, H.L.; Wang, X.S. Parts recognition based on improved convolutional neural network.
Instrum. Technol. Sens. 2022, 5, 82–87.

21. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Pro‐
ceedings of the 29th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30
June 2016; pp. 2818–2826.
22. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
23. Qiao, L.; Zhang, S.; Liu, C.; Jin, H.; Zhao, H.; Yao, J.; Cao, L.; Ji, Y. Workpiece classification based on transfer component anal‐
ysis. Wirel. Netw. 2024, 30, 4935–4947.
24. Wang, H.; Xu, X.; Liu, Y.; Lu, D.; Liang, B.; Tang, Y. Real‐time defect detection for metal components: a fusion of enhanced
Canny–Devernay and YOLOv6 algorithms. Appl. Sci. 2023, 13, 6898.
25. Chen, C.X.; Azman, A. Improved Deep Learning Model for Workpieces of Rectangular Pipeline Surface Defect Detection.
Computers 2024, 13, 30.
26. Tan, M.X.; Le, Q.V. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the Interna‐
tional Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
27. Shan, X.; Shen, Y.; Cai, H.; Wen, Y. Convolutional neural network optimization via channel reassessment attention module.
Digit. Signal Process. 2022, 123, 103408.
28. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: convolutional block attention module. In Proceedings of the European Confer‐
ence on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
29. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA‐Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 11534–11542.
30. Liu, X.L.; Li, T.H.; Zhang, M. Face recognition based on lightweight neural network integrating gradient features. Laser Optoe‐
lectron. Prog. 2020, 57, 84–89.
31. Ju, Z.Y.; Xue, Y.J. Fish species recognition using an improved AlexNet model. Optik 2020, 223, 165499.
32. Zhang, N.; Li, Z.G. A method for traffic sign recognition in weak light. Electron. Devices 2023, 46, 103–108.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Pro‐
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp.
4510–4520.
35. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching
for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 Octo‐
ber–2 November 2019; pp. 1314–1324.
36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad‐cam: Visual explanations from deep networks
via gradient‐based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29
October 2017; pp. 618–626.
37. Vilalta, R.; Drissi, Y. A perspective view and survey of meta-learning. Artif. Intell. Rev. 2002, 18, 77–95.
38. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non‐local neural networks. In Proceedings of the 2018 IEEE/CVF Conference Vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury
to people or property resulting from any ideas, methods, instructions or products referred to in the content.
