help of deep learning and machine learning methods. This technology can automate and
enhance intelligence in various fields such as gesture recognition [3], face recognition [4],
vehicle identification [5], security monitoring [6], and industrial production [7]. Due to
the increased level of industrial automation and the growing variety of mechanical
workpieces on production lines, traditional manual identification methods can no longer
meet the needs of efficient production [8]. Therefore, workpiece recognition and detec‐
tion have become important applications of computer vision and deep learning tech‐
nology in the field of smart manufacturing [9].
In the 21st century, research in the field of workpiece recognition and detection has
primarily focused on the following four aspects: (1) The recognition and detection of
workpiece processing features. The processing‐feature‐recognition algorithm is one of
the key technologies for realizing the integration of computer‐aided design (CAD),
computer‐aided manufacturing (CAM), and computer‐aided process planning (CAPP)
systems [10]. It is used to identify processing features such as grooves, cavities, surfaces,
and holes in CAD models. Ning et al. [11] proposed a part processing‐feature‐recognition
method based on deep learning, and Wu et al. [12] proposed a graph neural network for
multi‐task processing feature recognition. (2) The recognition and detection of workpiece
posture. Workpiece posture recognition is an important part of modern intelligent pro‐
duction lines. In automated operations such as industrial robot grasping, assembly, and
welding, it is necessary to accurately obtain the position and posture information of the
workpiece to ensure the accuracy and stability of the operation. Yin et al. [13] proposed a
rotating workpiece position‐ and posture‐estimation algorithm based on image recogni‐
tion, and Zhang et al. [14] proposed a workpiece posture‐detection method that combines
small-sample learning and a lightweight deep network. (3) The recognition and detection
of workpiece types. Workpiece type recognition mainly uses computer vision technology
to extract and analyze the category of the workpiece from the image information of the
workpiece, providing important technical support for industrial production. Li Qi et al.
[15] proposed a part‐recognition and ‐classification system based on block principal
component analysis (PCA) and a support vector machine (SVM); Xu Wanze et al. [16]
proposed a metal‐part‐recognition algorithm based on ring template matching; Yin Kexin
et al. [17] proposed a high‐frequency component fast‐recognition algorithm based on hi‐
erarchical ring segmentation; Yang Tao et al. [18] proposed a high‐frequency component
deep learning algorithm with joint loss supervision; Zhang Pengfei et al. [19] proposed a
multi‐branch feature fusion convolutional neural network (Multi‐branch Feature Fusion
CNN, MFF‐CNN) for the automatic classification of main bearing cover parts; Yang Le et
al. [20] proposed improved Inception V3 [21] and Xception [22] for the recognition of
threaded connection parts; and Qiao et al. [23] proposed a method based on migration
component analysis for workpiece recognition. (4) Workpiece defect recognition and
detection. Workpiece defect recognition refers to the process of automatically detecting
and identifying defects on the surface or inside of a workpiece during the machining
process. Wang et al. [24] proposed a real‐time defect detection method for metal work‐
pieces, and Chen et al. [25] proposed an improved deep learning model for surface‐defect
detection for rectangular pipe workpieces.
Although the aforementioned workpiece recognition methods can solve problems
related to complex internal textures and small feature differences to some extent, their
research subjects are relatively simple and not effectively applicable to the recognition of
high‐frequency workpieces with complex intra‐class diversity, small inter‐class differ‐
ences, and varying poses and lighting. Thus, there remains significant research space for
high‐frequency workpiece image recognition under complex lighting. To address the
challenges of difficult recognition and low precision under complex lighting, this paper
proposes a high‐frequency workpiece image recognition model based on a hybrid atten‐
tion mechanism (Hybrid Attention EfficientNet, HAEN). The model is based on the base
network model EfficientNet-B0 [26]. Data augmentation was used to improve the model's
robustness. Then, a lightweight convolutional attention module was designed to extract
robust features from workpiece images under strong lighting, reducing the impact of
lighting variations on the recognition results of high‐frequency workpieces. Finally, us‐
ing lightweight re‐estimation attention modules [27], the network’s feature expression of
workpiece images was further enhanced. Experimental results on a laboratory‐produced
high‐frequency workpiece dataset show that the proposed model can automatically focus
on and extract features and is robust against strong lighting, demonstrating significant
advantages in recognition precision over other methods.
2. Approach
2.1. Overall Framework
The HAEN model proposed in this paper is an improvement of EfficientNet-B0. Data augmentation was used to increase the amount of workpiece data, improve the generalization ability of the model, and reduce the model's sensitivity to image variations. The feature extraction ability and recognition performance of the network were
improved through two serial modules, the Improved Lightweight Convolutional Block
Attention Module (ILCBAM) and the lightweight re‐estimation attention module
(LRAM). The ILCBAM was added to the input part of the basic network, and the LRAM
was added to the output part of each group of MBConv convolution blocks. The HAEN
model framework and workflow are shown in Figure 1.
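To make the assembly concrete, the following is a minimal PyTorch sketch of how the two attention modules could be attached to an EfficientNet-B0 backbone, using torchvision's implementation as a stand-in; the module interfaces (`ilcbam`, `lrams`) and the use of `nn.Identity()` for stages without attention are assumptions, not the authors' exact implementation.

```python
import torch.nn as nn
from torchvision.models import efficientnet_b0  # torchvision's EfficientNet-B0 as a stand-in


class HAENSketch(nn.Module):
    """Hypothetical assembly: an ILCBAM on the network input, an LRAM after each
    feature stage, then the original pooling and classification head."""

    def __init__(self, num_classes, ilcbam, lrams):
        super().__init__()
        backbone = efficientnet_b0(weights="IMAGENET1K_V1")  # ImageNet pre-trained weights
        self.ilcbam = ilcbam                                 # attention applied to the input
        self.stages = nn.ModuleList(backbone.features)       # stem, MBConv groups, head conv
        self.lrams = nn.ModuleList(lrams)                    # one per stage; nn.Identity() to skip
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1280, num_classes)       # 1280 = EfficientNet-B0 feature width

    def forward(self, x):
        x = self.ilcbam(x)
        for stage, lram in zip(self.stages, self.lrams):
            x = lram(stage(x))
        x = self.pool(x).flatten(1)
        return self.classifier(x)
```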
Figure 2. Geometric transformation. (a) Workpiece image, (b) random cropping, and (c) random
rotation.
Salt and pepper noise, also known as impulse noise, is a common visual disturbance
in digital images. It appears in certain areas of the image in a discrete and random man‐
ner. The pixels are obviously bright or dark and appear abnormal compared to other
pixels in the image. Gaussian noise is a specific type of random process whose charac‐
teristics are described by the normal distribution. The normal distribution is a
bell‐shaped probability distribution whose probability density function reaches a max‐
imum value near the mean and gradually decreases as it deviates from the mean. Its
mathematical expression is:
$f(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$,  (1)
where μ represents the mean of the distribution and σ² represents the variance, which determines the spread of the distribution.
Random-noise augmentation introduced salt and pepper noise and Gaussian noise into the workpiece images, reducing the model's sensitivity to small perturbations in the input data, which could improve the generalization and robustness of the model to a certain extent.
The effect is shown in Figure 3.
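As an illustration, the following NumPy sketch applies the two kinds of random noise described above to a uint8 image; the noise amount and standard deviation are assumed defaults rather than values from the paper.

```python
import numpy as np


def add_salt_pepper(img, amount=0.02, rng=None):
    """Randomly set a fraction of pixels to pure black or white (impulse noise).
    `img` is an H x W x C uint8 array; `amount` is an assumed default."""
    rng = rng or np.random.default_rng()
    noisy = img.copy()
    mask = rng.random(img.shape[:2])
    noisy[mask < amount / 2] = 0          # pepper
    noisy[mask > 1 - amount / 2] = 255    # salt
    return noisy


def add_gaussian(img, mean=0.0, sigma=10.0, rng=None):
    """Add zero-mean Gaussian noise drawn from the distribution in Eq. (1);
    `sigma` is an assumed default."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(mean, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```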
Figure 3. Random noise. (a) Workpiece image, (b) salt and pepper noise, and (c) Gaussian noise.
In order to reduce the interference of color cast and ambient light distribution on
post‐processing, we decided to perform random light correction on the collected work‐
piece samples. Random light correction includes mean white balance, grayscale world
assumption, color cast detection, and color correction based on image analysis. It can
adjust the color distribution in images with three types of lighting problems. Expanding
the dataset through different light processing helps to improve the adaptability and
generalization ability of the model to lighting changes in real industrial scenes. The effect
is shown in Figure 4.
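For example, the grayscale-world assumption can be sketched as follows; the function name and the RGB channel order are assumptions.

```python
import numpy as np


def gray_world_balance(img):
    """Gray-world assumption: scale each channel so the channel means become equal.
    `img` is an H x W x 3 uint8 array, assumed to be in RGB order."""
    img_f = img.astype(np.float32)
    channel_means = img_f.reshape(-1, 3).mean(axis=0)
    gain = channel_means.mean() / (channel_means + 1e-6)   # per-channel correction gains
    return np.clip(img_f * gain, 0, 255).astype(np.uint8)
```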
Figure 4. Random light correction processing. (a) Workpiece image, (b) mean white balance, (c)
grayscale world assumption, and (d) color cast correction.
The ILCBAM uses an Improved Channel Attention Module (ICAM) and an Improved Spatial Attention Module (ISAM) to replace the CAM and SAM, respectively, in the CBAM.
[Figure: ILCBAM structure — the input X1 passes through the ICAM to give X2 and then through the ISAM to give the output X3.]
[Figure: ICAM structure — the input X1 (C × H × W) is reduced by global power pooling and global average pooling to two C × 1 × 1 descriptors, each passed through a Conv1D; after a Sigmoid, the channel attention weight Mc (C × 1 × 1) multiplies the input by channel.]
The input feature map $X_1 \in \mathbb{R}^{C \times H \times W}$ is subjected to global power pooling and global average pooling to obtain the channel information description maps:

$z_{p} = \left( \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} X_{1}(h, w)^{p} \right)^{1/p}$,  (2)

$z_{a} = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} X_{1}(h, w)$,  (3)

where C, H, and W represent the number of feature channels, the height, and the width of the image, respectively; h and w represent the coordinates in the height and width directions, respectively; and p is set to 2 to highlight the local salient features.
$z_p$ and $z_a$ accumulate global information in different ways; each is then passed through a one-dimensional convolution, and after the combined convolution and activation operation, the channel attention weight $M_c \in \mathbb{R}^{C \times 1 \times 1}$ is obtained.
The kernel size of the one-dimensional convolution is determined adaptively from the number of channels, where C represents the number of channels, γ and b are set to 2 and 1, respectively, and the subscript odd indicates rounding to the nearest odd number.
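A possible PyTorch sketch of this channel branch is shown below; the ECA-style kernel-size rule, and summing the power-pooling and average-pooling branches before the Sigmoid, are assumptions where the extracted text is incomplete.

```python
import math
import torch
import torch.nn as nn


class ICAMSketch(nn.Module):
    """Sketch of the improved channel attention: global power pooling (p = 2) and global
    average pooling, each followed by a 1-D convolution whose kernel size is derived from
    the channel count (gamma = 2, b = 1, rounded to the nearest odd number)."""

    def __init__(self, channels, gamma=2, b=1, p=2):
        super().__init__()
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 else k + 1                          # nearest odd kernel size
        self.p = p
        self.conv_pow = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)
        self.conv_avg = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                                   # x: (N, C, H, W)
        z_pow = x.pow(self.p).mean(dim=(2, 3)).pow(1.0 / self.p)   # global power pooling
        z_avg = x.mean(dim=(2, 3))                                  # global average pooling
        y = self.conv_pow(z_pow.unsqueeze(1)) + self.conv_avg(z_avg.unsqueeze(1))
        m_c = torch.sigmoid(y).squeeze(1).unsqueeze(-1).unsqueeze(-1)   # (N, C, 1, 1)
        return x * m_c                                      # multiply by channel
```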
[Figure: ISAM structure — channel-wise max pooling and average pooling of X2 (2 × H × W) pass in parallel through a 3 × 3 convolution and a 3 × 3 dilated convolution; the two Sigmoid outputs are added element-wise to give the spatial attention weight Ms (1 × H × W).]
$M_{s}(X_{2}) = \sigma\left(F_{3\times3}\left([P_{\max}(X_{2}); P_{avg}(X_{2})]\right)\right) \oplus \sigma\left(F_{3\times3}^{d}\left([P_{\max}(X_{2}); P_{avg}(X_{2})]\right)\right)$,  (6)

$X_{3} = X_{1} \otimes M_{c} \otimes M_{s}$,  (7)

where σ denotes the Sigmoid function; $F_{3\times3}$ and $F_{3\times3}^{d}$ denote the standard and dilated 3 × 3 convolutions, respectively; $P_{\max}$ and $P_{avg}$ denote channel-wise max pooling and average pooling; ⊕ denotes element-wise addition; and ⊗ denotes element-wise multiplication.
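A corresponding sketch of the spatial branch in Eqs. (6) and (7) might look as follows; the dilation rate of 2 is an assumption.

```python
import torch
import torch.nn as nn


class ISAMSketch(nn.Module):
    """Sketch of the improved spatial attention: channel-wise max and average pooling are
    concatenated and passed through a plain 3x3 convolution and a dilated 3x3 convolution
    in parallel; the two sigmoid responses are added to form the spatial weight M_s."""

    def __init__(self, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 3, padding=1, bias=False)
        self.dconv = nn.Conv2d(2, 1, 3, padding=dilation, dilation=dilation, bias=False)

    def forward(self, x2):                                  # x2: output of the channel branch
        pooled = torch.cat([x2.max(dim=1, keepdim=True).values,
                            x2.mean(dim=1, keepdim=True)], dim=1)   # (N, 2, H, W)
        m_s = torch.sigmoid(self.conv(pooled)) + torch.sigmoid(self.dconv(pooled))
        return x2 * m_s                                     # reweight spatially, as in Eq. (7)
```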
[Figure 8: LRAM structure — the input X1 (C × H × W) is average-pooled to the interlayer feature map R, a global depth convolution produces a C × 1 × 1 channel weight, and the output X2 is obtained by multiplying X1 by channel.]
As can be seen from Figure 8, the LRAM can be divided into two steps: spatial in‐
formation compression and channel feature extraction. In order to obtain channel atten‐
tion, the feature maps of different channels need to be re‐estimated. However, due to the
large spatial dimension of the front layer of the convolutional neural network, the com‐
plex feature information, and the high computational cost, it is difficult to perform the
re‐estimation operation directly. Therefore, average pooling is used to compress the spa‐
tial size of the feature map while retaining sufficient spatial information and providing a
lighter input for subsequent channel feature extraction.
Assume that the input feature map is $X_1 \in \mathbb{R}^{C \times H \times W}$ and that C, H, and W represent the number of feature channels, the height, and the width, respectively. The input feature map $X_1$ is average-pooled along each feature map to obtain the intermediate layer feature map R:

$R = [r_{1}, r_{2}, \ldots, r_{C}] = F_{pool}(X_{1})$,  (8)

where the height and width of $r_n$, $n \in \{1, 2, \ldots, C\}$, are $H'$ and $W'$, respectively, with $H' \ll H$ and $W' \ll W$, and $F_{pool}(\cdot)$ represents average pooling.
After obtaining the intermediate feature map R, the importance of each feature map channel $r_n$ is modeled through the global depth convolution kernel $l_n$, whose structure is shown in Figure 9. The global convolution in global depth convolution can directly extract important channel features from all the spatial information of the feature map. At the same time, the amount of computation is significantly reduced through depth convolution, and the important information corresponding to different channels is captured independently. Therefore, global depth convolution enables the network to iteratively learn the optimal convolution-kernel parameters and to perceive and make full use of the wider spatial information in the feature map, while keeping the computational cost as low as possible, so that the network can extract features more efficiently.
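Under these definitions, the LRAM can be sketched in PyTorch as below; the 7 × 7 pooled size follows the parameter-selection experiment in Section 3, and the Sigmoid gating is an assumption.

```python
import torch
import torch.nn as nn


class LRAMSketch(nn.Module):
    """Sketch of the lightweight re-estimation attention of Eq. (8): the feature map is
    average-pooled to a small H' x W' grid, then a depthwise convolution whose kernel
    covers the whole pooled map produces one weight per channel."""

    def __init__(self, channels, pooled_size=7):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(pooled_size)            # F_pool in Eq. (8)
        self.global_depth_conv = nn.Conv2d(channels, channels, pooled_size,
                                           groups=channels, bias=False)

    def forward(self, x1):                                       # x1: (N, C, H, W)
        r = self.pool(x1)                                        # intermediate feature map R
        s = torch.sigmoid(self.global_depth_conv(r))             # (N, C, 1, 1) channel weights
        return x1 * s                                            # multiply by channel
```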
[Figure 9: Global depth convolution — each channel map rn of R is convolved with its own global kernel ln to give a scalar sn; the resulting C × 1 × 1 attention weight S multiplies the input element-wise.]
3. Experiments
3.1. Experimental Environment
The computer used for the experiments was configured as follows: the CPU was an
Intel(R) Core(TM) i5‐10400F (Intel Corporation, Santa Clara, CA, USA), the GPU was an
NVIDIA GeForce GTX 1660 SUPER (NVIDIA Corporation, Santa Clara, CA, USA), with
16 GB of RAM, running a Windows 10 system. The experiments were conducted using
Python 3.6, the PyTorch 1.2 deep learning framework, and the CUDA 10.2 deep learning
network acceleration library. The network input size was 224 × 224, initialized with
pre‐trained weights from ImageNet. An Adam optimizer was employed, using a
cross‐entropy loss function, with a batch size set at 8 and a total of 30 iterations. The
learning rate started at 10⁻⁴ and was divided by 10 every 10 iterations. Additionally,
k‐fold cross‐validation (with k = 5) was used to split and rotate the training and test sets
in a 4:1 ratio. Given that the goal of the designed HAEN model was to improve the pre‐
cision of high‐frequency workpiece classification in complex lighting environments,
classification precision was used as the performance metric of the model.
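For reproducibility, the training configuration described above can be sketched as follows; the data-loading objects are placeholders and the helper name is hypothetical.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR


def train_one_fold(model, train_loader, epochs=30, device="cuda"):
    """Hypothetical training helper mirroring Section 3.1: Adam, cross-entropy loss,
    batch size 8 (set in the DataLoader), and an initial lr of 1e-4."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=1e-4)
    scheduler = StepLR(optimizer, step_size=10, gamma=0.1)   # divide the lr by 10 every 10 iterations
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model


# The 5-fold split (4:1 train/test rotation) could be driven by, e.g.,
# sklearn.model_selection.KFold(n_splits=5, shuffle=True).split(range(len(dataset))).
```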
Figure 10. A part of the experimental data. In this context, images (a–c) represent the effects of the
same workpiece under the following different lighting conditions: (a) light spot, (b) shadow, and
(c) insufficient light. Images (d–f) are comparison images showing subtle differences between
different workpieces.
From the experimental results, we can see that when ⟨H′, W′⟩ is ⟨7, 7⟩, the high-frequency workpiece recognition achieves the highest precision, while the number of model parameters increases only slightly. As ⟨H′, W′⟩ increases, the workpiece recognition performance continues to improve, because a larger ⟨H′, W′⟩ enables the network to utilize more spatial information. Therefore, this paper selects ⟨7, 7⟩ as the spatial size of the intermediate feature map.
Model | Backbone Network | Parameter Quantity/M | Training Time/s | Precision/%
Basis | EfficientNet-B0 | 5.30 | 2314 | 87.92
IGFN | SqueezeNet | 5.43 | 2563 | 91.78
IAN | AlexNet | 240.17 | 4620 | 90.85
IRN | RegNet | 25.19 | 2149 | 92.64
HAEN | EfficientNet-B0 | 6.05 | 2405 | 97.23
The following can be seen from Table 3: (1) When directly using the EfficientNet‐B0
network to classify high‐frequency workpiece images, the recognition precision was
87.92%, which was the worst among all models and much lower than IGFN, IAN, and
IRN. (2) Compared with IGFN, IAN, and IRN, the recognition precision of the HAEN
model was higher by 5.45%, 6.38%, and 4.59% respectively, indicating that the model
proposed in this article has significant advantages in the precision of high‐frequency
workpiece recognition. (3) The HAEN model achieved the highest precision, 97.23%, on the high-frequency workpiece dataset, indicating that the data augmentation and hybrid attention mechanisms together yield high recognition precision. (4) The HAEN model
outperformed IAN and IRN in terms of parameter count and surpassed IGFN and IAN in
training time. Overall, the HAEN model is superior to other models according to various
metrics.
In order to further analyze the recognition effects of different methods, Basis and the
HAEN model were compared by using a confusion matrix, and the comparison results
are shown in Figure 11. It can be intuitively seen from the figure that the HAEN model
showed a significant improvement in the classification precision of high‐frequency
workpieces compared with Basis. From the confusion matrix, it can be seen that the classification results of the HAEN model are more concentrated on the diagonal, while the proportion of misclassifications off the diagonal is significantly reduced. The
classification precision of each category of workpieces was above 0.90, which shows that
through data enhancement and the introduction of the ILCBAM and LRAM, the influ‐
ence of illumination changes on the classification results was overcome and the percep‐
tion ability of Basis for strong illumination‐robust feature information was improved,
which could effectively solve the problem of high‐frequency workpieces being difficult to
accurately classify due to illumination changes.
Figure 11. Test set confusion matrix for (a) Basis and (b) HAEN.
From the confusion matrix of the HAEN model, it is evident that the two types of
workpieces with the highest misclassification rates are Type 8 and Type 15, as shown in
Figure 12.
Figure 12. Misclassified workpieces. (a) Type 8 and (b) Type 15.
The distinguishing features of these two types of workpieces share the following
common characteristics: the differential features occupy a small proportion of the entire
image and are easily confused with surrounding features under insufficient lighting
conditions, leading to misclassification by the model.
3.5. T‐Test
Given that model performance is influenced by hyperparameters, randomness in the
training process, and other factors, statistical significance tests were conducted. As
shown in Table 3, a paired-sample t-test was used to compare the performance of the model on the same dataset before and after the improvement.
The p-value obtained from the t-test was 7.15 × 10⁻¹¹, which was significantly lower
than the set significance level (0.05) and indicated a significant difference between the
model before and after improvement. This confirmed that the improvements were sta‐
tistically significant.
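The test itself can be reproduced with SciPy as sketched below; the per-fold precision values shown are placeholders, not the paper's measurements.

```python
from scipy import stats

# Paired-sample t-test on per-fold precision, before vs. after the model improvement.
# The numbers below are illustrative placeholders only.
precision_basis = [0.878, 0.881, 0.876, 0.880, 0.884]
precision_haen = [0.971, 0.973, 0.970, 0.974, 0.972]

t_stat, p_value = stats.ttest_rel(precision_haen, precision_basis)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}")   # p < 0.05 indicates a significant difference
```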
From Table 4, the following can be observed: (1) Data augmentation methods in‐
creased the precision by 2.55%, indicating that introducing a wider variety of samples
allowed the model to learn richer features, thereby enhancing recognition performance.
(2) Adding the Improved Lightweight Convolutional Block Attention Module (ILCBAM) to the base
network, EfficientNet‐B0, increased precision by 8.18%, demonstrating that the
ILCBAM’s ability to perceive different color channel features helped overcome the effects
of lighting variations, aiding the network in capturing and learning the lighting‐robust
features of different types of workpieces, thus further improving the recognition per‐
formance of the workpieces. (3) Compared to the base model, the lightweight
re‐estimation attention module (LRAM) improved precision by 8.32%, indicating that the
LRAM enhanced the network’s capability to extract feature information, effectively
capturing the features of high‐frequency workpiece images. (4) By incorporating both the
ILCBAM and LRAM, the precision was further increased by 0.54%, showing that com‐
bining these two attention mechanisms further enhanced the recognition performance of
high‐frequency workpieces. (5) The HAEN model that integrated data augmentation, the
ILCBAM, and the LRAM achieved the highest precision of 97.23%, suggesting that introducing more augmented samples and integrating two different attention modules into the EfficientNet-B0 backbone could better overcome the disruptive features affecting workpiece classification and thus improved recognition precision.
The ROC and AUC curves of the HAEN model are shown in Figure 13. These fur‐
ther indicate that combining data augmentation, the ILCBAM, and the LRAM improved
the performance of the model.
The experimental results show that the model's bias and variance were generally stable across the different types of workpieces, with Type 8 workpieces exhibiting relatively higher bias and variance. According to the confusion matrix, the recognition precision for this type was 90%, indicating that the model's performance was not limited by underfitting or overfitting.
From Table 5, the following can be observed: (1) Adding random occlusion and
Gaussian blur separately increased the image classification precision by 0.11% and 0.08%
respectively, indicating that the model learned more variations in images during train‐
ing, which enhanced its generalization capability. (2) Introducing both random occlusion
and Gaussian blur together improved the precision by an additional 0.05%, demonstrat‐
ing the model’s robustness to occlusion and blur distortions.
From Table 6, the following can be observed: (1) The p‐value for the lighting condi‐
tion group was 0.2273, which was greater than the significance level (0.05), indicating that
this factor did not have a significant impact on model precision. (2) The p‐value for the
workpiece type group was less than 0.05, indicating that this main effect significantly
impacted model precision.
The experimental results demonstrate that compared to the base models ResNet34,
MobileNet_V2, MobileNet_V3, and EfficientNet‐B0, the modified models embedded with
the ILCBAM and LRAM increased the recognition precision of high‐frequency work‐
pieces by 0.0799, 0.1139, 0.0785, and 0.0931, respectively. Therefore, the combination of
ILCBAM and LRAM modules enhanced the network’s feature extraction capabilities,
reduced the impact of lighting variations on the recognition results of high‐frequency
workpieces, and effectively resolved the issues caused by complex lighting variations
that make it difficult to accurately classify workpiece images.
Figure 16. Grad‐CAM visualization results. (a) Input image, (b) Basis, and (c) HAEN.
Figure 16 intuitively shows the enhancement of the HAEN model’s ability to extract
high‐frequency workpiece features in three situations: light spots, shadows, and insuffi‐
cient light. It can be seen that the Basis model attended to most areas of the workpiece image, especially in shadow and insufficient-light scenes, so the network could not distinguish the key information from the redundant information in the image. The improved HAEN model, in contrast, made the network focus on the area containing the boss, the feature that distinguishes this workpiece from other types. Therefore, the HAEN model remained largely unaffected by complex lighting and could extract more distinctive workpiece image features.
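For reference, Grad-CAM heat maps like those in Figure 16 can be generated with a short hook-based routine such as the following sketch; the function name and choice of target layer are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM sketch: weight the target layer's activations by the spatially
    averaged gradients of the chosen class score. `image` has shape (1, 3, H, W)."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    logits = model(image)
    idx = int(logits.argmax(dim=1)) if class_idx is None else int(class_idx)
    model.zero_grad()
    logits[0, idx].backward()                                     # gradients of the class score
    h1.remove()
    h2.remove()
    weights = grads["v"].mean(dim=(2, 3), keepdim=True)           # channel-wise gradient averages
    cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))  # weighted activation map
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)     # normalized heat map
```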
4. Discussion
The number of image samples is one of the factors affecting the performance of
models based on deep learning. This study’s model, used for high‐frequency workpiece
image recognition, does not account for the potential scarcity of workpiece data samples
in actual production processes. When the number of samples is limited, CNNs struggle to
generalize data from the training set to the test set. Based on this, researchers have pro‐
posed few‐shot learning (FSL). To enable small‐sample models to effectively extract im‐
age features, Vilalta and Drissi [37] proposed a model based on meta‐learning, which
trains only the feature extraction capabilities of the model before recognizing new classes;
hence, these models are known as feature extractors. To endow models with the capabil‐
ity to analyze features after extraction, researchers have introduced metric‐based classi‐
fication algorithms. These place a metric unit after the feature extractor to compare dis‐
tances between the feature vectors of support‐set samples and query‐set samples within a
unified feature space, eventually outputting a probability distribution of categories to
complete classification. Wang et al. [38] proposed the non‐local network (NLN), which
extracts global features of images through non‐local operations, widely applied in
small‐sample object recognition and classification domains. Future research could con‐
sider improving the model presented in this paper based on these approaches to achieve
the precise recognition of high‐frequency workpiece images in low‐sample scenarios.
5. Conclusions
This paper addresses the issue of existing networks struggling to accurately differ‐
entiate high‐frequency workpieces due to complex lighting variations, proposing a
high‐frequency workpiece image recognition model based on a hybrid attention mecha‐
nism (HAEN). First, the necessity of data augmentation for high‐frequency workpiece
images was analyzed, and the data augmentation process was introduced; then, the
ILCBAM and LRAM were designed and integrated with the EfficientNet‐B0 backbone
network, enhancing the network’s feature extraction capabilities and reducing the impact
of lighting variations on workpiece classification results. Finally, through model param‐
eter selection experiments, model performance comparison experiments, model structure
comparison experiments, and attention visualization experiments, it was verified that
HAEN could automatically focus on and extract features robustly against strong light‐
ing. It exhibited superior classification precision compared to other models, achieving
97.23% precision, meeting the demands for high‐frequency workpiece image recognition
in industrial scenarios.
Author Contributions: Conceptualization, J.D. and C.S.; methodology, J.D.; software, J.D. and X.L.;
validation, C.S.; writing—original draft preparation, J.D. and X.L.; writing—review and editing,
J.D. and G.D.; visualization, L.J. and X.Y.; supervision, C.S.; funding acquisition, C.S. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was supported by the Fund of Key R&D Project of Sichuan Province Sci‐
ence and Technology Department, grant No. 2021YFN0020.
Data Availability Statement: The dataset that was generated and analyzed during this study is
available from the corresponding author upon reasonable request, but restrictions apply to data
reproducibility and commercially confident details.
Acknowledgments: The authors gratefully acknowledge the useful comments of the reviewers.
References
1. Zhou, J.; Wen, X. Research on influencing factors and multiple driving paths of intelligent transformation in China’s manu‐
facturing industry. J. Comput. Methods Sci. Eng. 2021, 21, 1561–1573.
2. Li, C.‐M.; Li, D.‐N.; Chen, C.‐J.; Zhao, Z.‐X. Parts recognition based on convolutional neural network and virtual training data
sets. Modul. Mach. Tools Autom. Mach. Technol. 2021, 8, 40–43.
3. Song, Y.; Wu, L.; Zhao, Y.; Liu, P.; Lv, R.; Ullah, H. High‐Accuracy Gesture Recognition using Mm‐Wave Radar Based on
Convolutional Block Attention Module. In Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP),
Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 1485–1489.
4. Majidpour, J.; Jameel, S.K.; Qadir, J.A. Face identification system based on synthesizing realistic image using edge‐aided Gans.
Comput. J. 2023, 66, 61–69.
5. Sharma, P.; Singh, A.; Singh, K.K.; Dhull, A. Vehicle identification using modified region based convolution network for intel‐
ligent transportation system. Multimed. Tools Appl. 2022, 81, 34893–34917.
6. Vieira, J.C.; Sartori, A.; Stefenon, S.F.; Perez, F.L.; De Jesus, G.S.; Leithardt, V.R.Q. Low‐cost CNN for automatic violence
recognition on embedded system. IEEE Access 2022, 10, 25190–25202.
7. Duan, S.; Yin, C.; Liu, M. Recognition Algorithm Based on Convolution Neural Network for the Mechanical Parts. In Advanced
Manufacturing and Automation VIII; Springer: Singapore, 2019; pp. 337–347.
8. Gong, Y.; Wei, C.; Xia, M. Workpiece recognition technology based on improved convolutional neural network. J. Harbin Univ.
Commer. Nat. Sci. Ed. 2023, 39, 294–302.
9. Chen, C.; Abdullah, A.; Kok, S.H.; Tien, D.T.K. Review of industry workpiece classification and defect detection using deep
learning. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 329–340.
10. Zhang, H.; Zhang, S.; Zhang, Y.; Liang, J.; Wang, Z. Machining feature recognition based on a novel multi‐task deep learning
network. Robot. Comput.‐Integr. Manuf. 2022, 77, 102369.
11. Ning, F.; Shi, Y.; Cai, M.; Xu, W. Part machining feature recognition based on a deep learning method. J. Intell. Manuf. 2023, 34,
809–821.
12. Wu, H.; Lei, R.; Peng, Y.; Gao, L. AAGNet: A graph neural network towards multi‐task machining feature recognition. Robot.
Comput.‐Integr. Manuf. 2024, 86, 102661.
13. Yin, K.; Fang, J.; Mo, W.; Wang, H.; Fu, M.; Zhang, T. Research on Position and Posture Estimation of Rotated Workpiece Based
on Image Recognition. In Proceedings of the 2021 4th International Conference on Mechatronics, Robotics and Automation
(ICMRA), Zhanjiang, China, 22–24 October 2021; pp. 69–74.
14. Zhang, T.; Zheng, J.; Zou, Y. Fusing few‐shot learning and lightweight deep network method for detecting workpiece pose
based on monocular vision systems. Measurement 2023, 218, 113118.
15. Li, Q.; Wang, Y. Parts recognition and classification system based on block PCA and SVM. Mech. Eng. Autom. 2021, 4, 21–23+26.
16. Xu, W.; Li, B.; Ou, Y.; Luo, J. Recognition algorithm for metal parts based on ring template matching. Transducer Microsyst.
Technol. 2021, 40, 128–131. https://doi.org/10.13873/J.1000-9787(2021)02-0128-04.
17. Yin, K.; Ou, Y.; Li, B.; Lin, D. Fast identification algorithm of high frequency components based on ring segmentation. Mech.
Des. Manuf. 2022, 12, 196–200+206.
18. Yang, T.; Ou, Y.; Su, X.; Wu, X.; Li, B. High frequency workpiece deep learning recognition algorithm based on joint loss su‐
pervision. Mech. Manuf. Autom. 2023, 52, 30–33+47.
19. Zhang, P.; Shi, Z.; Li, X.; Ouyang, X. Main bearing cap classification and recognition algorithm based on deep learning. J. Graph.
2021, 42, 572–580.
20. Yang, L.; Gan, Z.; Li, Y.; Chao, X.; Zi, H.L.; Wang, X.S. Parts recognition based on improved convolutional neural network.
Instrum. Technol. Sens. 2022, 5, 82–87.
21. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Pro‐
ceedings of the 29th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30
June 2016; pp. 2818–2826.
22. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 30th IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
23. Qiao, L.; Zhang, S.; Liu, C.; Jin, H.; Zhao, H.; Yao, J.; Cao, L.; Ji, Y. Workpiece classification based on transfer component anal‐
ysis. Wirel. Netw. 2024, 30, 4935–4947.
24. Wang, H.; Xu, X.; Liu, Y.; Lu, D.; Liang, B.; Tang, Y. Real‐time defect detection for metal components: a fusion of enhanced
Canny–Devernay and YOLOv6 algorithms. Appl. Sci. 2023, 13, 6898.
25. Chen, C.X.; Azman, A. Improved Deep Learning Model for Workpieces of Rectangular Pipeline Surface Defect Detection.
Computers 2024, 13, 30.
26. Tan, M.X.; Le, Q.V. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the Interna‐
tional Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
27. Shan, X.; Shen, Y.; Cai, H.; Wen, Y. Convolutional neural network optimization via channel reassessment attention module.
Digit. Signal Process. 2022, 123, 103408.
28. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: convolutional block attention module. In Proceedings of the European Confer‐
ence on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19.
29. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA‐Net: Efficient channel attention for deep convolutional neural networks.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 11534–11542.
30. Liu, X.L.; Li, T.H.; Zhang, M. Face recognition based on lightweight neural network integrating gradient features. Laser Optoe‐
lectron. Prog. 2020, 57, 84–89.
31. Ju, Z.Y.; Xue, Y.J. Fish species recognition using an improved AlexNet model. Optik 2020, 223, 165499.
32. Zhang, N.; Li, Z.G. A method for traffic sign recognition in weak light. Electron. Devices 2023, 46, 103–108.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
34. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Pro‐
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp.
4510–4520.
35. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching
for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 27 Octo‐
ber–2 November 2019; pp. 1314–1324.
36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad‐cam: Visual explanations from deep networks
via gradient‐based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29
October 2017; pp. 618–626.
37. Vilalta, R.; Drissi, Y. A perspective view and survey of meta-learning. Artif. Intell. Rev. 2002, 18, 77–95.
38. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local neural networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury
to people or property resulting from any ideas, methods, instructions or products referred to in the content.