
Neurocomputing 617 (2025) 129038

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

IncARMAG: A convolutional neural network with multi-level autoregressive moving average graph convolutional processing framework for medical image classification
Adrian S. Remigio
Analytics, Computing and Complex Systems lab, Asian Institute of Management, Makati City, Philippines

ARTICLE INFO

Communicated by Q. Huang

Keywords:
Medical image classification
Convolutional neural network
Graph convolution
Autoregressive moving average convolution
Feature embedding

ABSTRACT

Effective computer-aided detection and diagnosis algorithms are being sought to carry out complex pattern recognition with high precision while increasing efficiency and robustness through automation. In this study, we leveraged a self-constructing graph (SCG) and autoregressive moving average (ARMA) graph convolution to postprocess multi-level feature maps obtained from an Inception V3 network for medical image classification. The adopted SCG learns latent embeddings from the feature maps, which are subsequently used to construct a graph topology based on embedding similarity. The ARMA convolution then learns more flexible spectral filter responses that simultaneously take into account long-range dependencies from the SCG graph. The proposed IncARMAG model was evaluated on the 2D MedMNIST datasets, which cover several imaging modalities and classification tasks in the medical domain. Results from our experiments on the MedMNIST datasets showed that IncARMAG outperformed state-of-the-art models in many cases. IncARMAG also achieved the highest accuracy in an evaluation of various benchmark CNN, transformer, and hybrid models on four selected datasets. These results highlight IncARMAG as a strong candidate for medical image analysis involving classification tasks.

1. Introduction

Deep learning in computer-aided diagnosis and medical interventions has gained ever-increasing attention due to its capability to handle complex tasks with excellent performance. The complementary contributions of deep learning in medical imaging include automation of detection, decision-making support, and image quality improvement [1]. The most commonly used deep learning approach for medical computer vision tasks is the convolutional neural network (CNN). A CNN maps meaningful hierarchical features from imaging data through the successive application and optimization of cascaded convolutional filters [2,3]. In the feature extraction part, the number of learnable parameters does not scale with image size because the convolution weights are shared across spatial locations.

AlexNet [4] and VGG [5] were among the pioneering models in the early open image classification challenges, which instigated the rapid advance of deep learning in computer vision. In recent years, tremendous improvements in CNN architectures have been made, with emphasis on accuracy, computational efficiency, and scalability across domains [6]. For instance, progressively increasing the depth of convolutional layers initially increased accuracy, but accuracy declined beyond a certain depth due to the vanishing or exploding gradient problem during backpropagation. This problem was rectified in residual network (ResNet) models via identity mapping or shortcut connections [7]. Various CNN designs, such as EfficientNets, are based on neural architecture search methods that jointly optimize speed and accuracy [8,9]. Squeeze-and-excitation, partial dense, and parallel multi-resolution blocks are some examples of complex convolution blocks built to address various issues in CNN architectures [10–12].

One revolutionary concept in deep learning is the attention mechanism, which forms the building block of transformers. Transformers were adopted from natural language processing (NLP) for image classification, wherein an input image is divided into patches that are treated as tokens prior to feature embedding and processing [13]. Transformer-based models have been shown to be superior to CNN-based models in image classification accuracy. Since then, many advanced designs utilizing CNN and/or transformer components have emerged in an attempt to combine better inductive biases with attention information. The hierarchical vision transformer with shifted windows (Swin transformer) is an example architecture that complements the attention mechanism with the sliding-window approach of CNNs [14]. On the other hand, ConvNext attempts to imitate the self-attention properties of transformers using depthwise separable convolutions [15].
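The patch tokenization step can be illustrated with a minimal NumPy sketch. It assumes non-overlapping square patches and a channels-last image; the function name, the 16 × 16 patch size, and the random matrix standing in for the learned linear embedding are illustrative assumptions rather than details taken from [13].

```python
import numpy as np


def image_to_patch_tokens(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles and
    flatten each tile into one token vector (row-major over the patch grid)."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C)
    tiles = image.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    # One row per patch: (num_patches, p * p * C)
    return tiles.reshape(-1, patch * patch * c)


# Usage: a 224 x 224 RGB image with 16 x 16 patches gives a 14 x 14 grid of tokens.
rng = np.random.default_rng(0)
tokens = image_to_patch_tokens(rng.standard_normal((224, 224, 3)), patch=16)
# A learnable linear projection would then embed each token; a random matrix
# with an arbitrary embedding width of 64 stands in for it here.
embedded = tokens @ rng.standard_normal((tokens.shape[1], 64))
print(tokens.shape, embedded.shape)  # (196, 768) (196, 64)
```

Each token keeps all channels of its tile, so the token width is patch × patch × C, independent of the image size; only the number of tokens grows with the image.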

E-mail address: [email protected].

https://doi.org/10.1016/j.neucom.2024.129038
Received 13 June 2024; Received in revised form 11 November 2024; Accepted 22 November 2024
Available online 30 November 2024
0925-2312/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

The advent of geometric deep learning has brought the benefits of topological data analysis to high-dimensional learning problems, including computer vision [16]. For example, better representations can be identified by imposing properties from geometric priors such as symmetries, isomorphisms, and deformation stability [17]. Geometric learning can also establish topological structures that augment the low-dimensional projection with valuable information. Graph neural networks (GNNs) are a subset of geometric deep learning dedicated to representation learning on graphs [3]. A standard graph convolutional network (GCN) layer applies a simple linear transformation to the node features, followed by a summation of the transformed features over all one-hop neighbors. The graph attentional operator (GAT) replaces the simple summation in the GCN with a weighted summation whose weights are learned jointly by the GNN model [18]. Other graph convolutions take advantage of neural networks, such as a multilayer perceptron (MLP) or a gated recurrent unit (GRU), for potentially more expressive feature transformation [19,20]. The use of negative node samples in GNNs via a determinantal point process has also been explored and has shown the benefit of using negative relations [21].

Autoregressive moving average (ARMA) graph filters are built to approximate a spectral response function that is more flexible than those of standard polynomial graph filters, such as the Chebyshev filters [22]. Graph spectral frequency decomposition via the Laplacian operator indicates how much the features vary between connected nodes. ARMA graph filters approximate a given target response at higher frequencies better than classical polynomial filters do [23]. Thus, ARMA convolution can generate precise filters that extend to graphs with higher spectral frequencies, that is, graphs with edges between highly dissimilar node features.

GNNs can also handle regular grid-structured data by finding and constructing relational information among the spatial grid locations. One standard approach to extracting graph structure from images and/or feature maps is the use of superpixel clustering methods, in which distance or feature similarity measures are used. The preprocessing steps for graph construction from images usually involve superpixel construction and/or prior convolutional feature extraction to form the graph topology [24,25]. The self-constructing graph (SCG) is an algorithm that constructs a graph topology from the downsampled feature map by learning an embedding representation that is jointly optimized with the entire model [15]. SCG forms node connections based on the similarity of the latent embeddings, which enables it to learn non-local context information from long-range node connections. SCG was found to perform well on semantic image labeling and segmentation compared with purely CNN models.

Motivated by the advantages of learning relational information among feature maps [25], this study introduces a modified hybrid CNN-GNN model for medical image classification. As emphasized, CNN models have small receptive fields that limit the learning of global feature interactions per layer, specifically in earlier layers. Although attention mechanisms have overcome this problem, they lack the locality and translation-equivariance inductive biases of convolution. Hybrid CNN-GNNs are a potential approach to achieving both: the use of a CNN to preprocess features, alongside the node and edge permutation invariance of undirected GNNs, helps preserve inductive bias while allowing long-range dependencies to be captured. The SCG was leveraged to derive abstract graph topologies optimized for the task at hand. ARMA convolution was then applied to extract better feature embeddings corresponding to subgraphs with high spectral frequencies. Combining all the components, we henceforth refer to this architecture as IncARMAG. The key contributions of this paper are as follows:

1. The proposed model architecture utilizes multi-level feature map refinement with a GNN.
2. Instead of conventional GCNs, we used autoregressive moving average (ARMA) graph convolutions for the GNN-based processing of the multi-level graph topology.
3. Our proposed model was shown to outperform many state-of-the-art (SOTA) image classification models on many medical image classification datasets.
4. IncARMAG showed superior performance on most of the datasets compared to SOTA classification models.

2. Related works

2.1. Medical image classification

As vision models continue to advance, their application in the medical field becomes increasingly prevalent. Classification algorithms that were originally developed and/or benchmarked in other domains are continuously being adopted in the medical field. For example, ResNet models have been used in numerous medical image classification tasks such as cervical cancer identification [26], breast cancer detection [27], and pulmonary disease diagnosis [28]. Other models and learning methods have played prominent roles in advancing medical image classification. An important example is the development and/or integration of multiple architectural modules to combine their complementary benefits. Yengec-Tasdemir et al. [29] employed contrastive learning to improve the performance of various CNN models for polyp classification. Models developed for image classification are often utilized for more complex vision tasks, wherein they serve as backbone or feature extraction networks. One example of such an application in medical imaging is the use of a ResNet module in a feature pyramid network for tooth and pulp instance segmentation [30].

MedViT is a hybrid CNN-transformer model that combines the superior inductive bias of convolution with the global receptive field of the attention mechanism [31]. Its evaluation on the MedMNIST datasets, a repository of 2D and 3D image datasets from different medical modalities [32], showed a superior average accuracy of MedViT over ResNet, auto-sklearn, AutoKeras, and Google AutoML. Similarly, MedMamba models global relations within the multidirectional input images or feature maps via a multi-branch combination of convolution and a structured state–space sequence model [33]. This state–space sequence model is based on an iterative solution of a linear ordinary differential equation (ODE), which can also be generalized into a single 1-D global convolution.

2.2. GNN networks for image classification

As emphasized, graph representations can be obtained directly from images or feature maps through various methods. In the work of Monti et al. [34], learnable Gaussian kernels were applied to determine connections based on the geometric distances of superpixels in images. Using clustering methods such as SLIC and SEED, it is possible to directly generate graph representations from images. Avelar et al. used K-nearest neighbors (KNN) and the SLIC clustering algorithm to generate the graph topology from images. Graph attention network (GAT) layers were then used to encode node embeddings from the input superpixel graphs [35]. A hybrid GNN architecture of GCN, GraphSAGE, and ARMA convolution filters following SLIC clustering was proposed by Yao et al. [36] for hyperspectral image classification.

Numerous GNN model designs for vision tasks incorporate CNN or transformer architectures to optimize deep feature extraction. Contextual information from a series of images was modeled through a combination of CNN and GNN in the work of Campos et al. [37]. Node features and edges were calculated based on the initial CNN feature maps of bounding boxes, followed by subgraph integration and subsequent GNN layer operations. Vision GNN (ViG) stands as one of the most successful applications of GNN models in image classification. ViG


Fig. 1. The proposed IncARMAG architecture, consisting of the following stages: (1) an Inception V3 backbone, (2) a self-constructing graph module, (3) ARMA GNN processing, and (4) weighted concatenation and a classifier head.

introduced the grapher module, which alleviates over-smoothing better than the standard GCN layer by adding nonlinear activation, a residual connection, and an MLP layer [38]. In their benchmarking experiments on the ImageNet dataset, ViG outperformed representative CNN, MLP, and transformer models. In the study of Fei et al. [38], a high-level feature map from a ResNet backbone is used as the input to a multi-branch network comprising both local and global feature extractors. The local feature extractor has a simple convolution and MLP framework, whereas the global feature extractor path consists of GCN layers and multi-head self-attention pooling.

3. Methods

An overview of the proposed IncARMAG architecture is illustrated in Fig. 1. Details of each component of the IncARMAG model are discussed in the following subsections.

3.1. Deep feature extraction

The original motivation behind the Inception architecture is to capture the correlation patterns of feature maps at multiple scales, with the kernel size representing the range over which the correlated units are distributed [39]. Another advantage of the Inception block is the reduction of computational cost via depthwise filter concatenation and dimensionality reduction. In the improved Inception V3 model, computational cost was further reduced by factorizing the 5 × 5 convolutional kernel into a series of smaller and spatially asymmetric kernels [40].

Our backbone architecture is based on Inception V3, excluding the final two Inception blocks and the fully connected classifier head. Reducing computational cost was the primary reason for omitting the final two blocks. It was also argued that adding auxiliary classifiers that branch from low-level Inception blocks slightly boosts model performance, which can be caused by either the refinement of low-level features or the regularization by the auxiliary heads [41]. This inspired us to investigate the effect of replacing simple MLP layers with GNNs as auxiliary classifiers. Our IncARMAG architecture has three GNN processing branches that take feature maps from different stages of the Inception V3 backbone. We denote these feature maps as $F_\alpha$, $F_\beta$, and $F_\gamma$. $F_\gamma \in \mathbb{R}^{7\times7\times2048}$ corresponds to the feature map of the last Inception block, whereas $F_\alpha$ and $F_\beta$ are from preceding stages of the backbone network. Two variants of the IncARMAG model with different depth locations of $F_\alpha$ and $F_\beta$ were developed in this study.

3.2. Self-constructing graph topology

Prior to applying graph convolution to the extracted deep features, the feature maps must be converted into their representative graph topology. In general, a graph can be defined by an ordered pair, $G = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V}$ and $\mathcal{E}$ are the sets of nodes and edges, respectively. The graph topology can also be represented via the adjacency matrix, $A \in \mathbb{R}^{|\mathcal{V}|\times|\mathcal{V}|}$, where $|\mathcal{V}|$ is the number of nodes. The SCG algorithm, proposed by Liu et al. [15], was employed to automatically learn the adjacency matrix given the input feature map.

Fig. 2 provides a schematic description of the SCG module [15]. For any feature map, denoted here simply as $F$, an adaptive average


Fig. 2. The operations involved in the SCG algorithm. SCG outputs a graph, defined by a node feature matrix and an adjacency matrix. In addition, a residual prediction from SCG is added to the final GNN layers.

pooling was used to reduce the dimensions of the feature map from $F \in \mathbb{R}^{n\times n\times c}$ to $F' \in \mathbb{R}^{n'\times n'\times c}$, where $c$ is the number of convolutional filters, while $n$ and $n'$ are the feature dimensions before and after pooling, respectively. The pooled features are considered as node features in the graph, wherein each node's feature is the depth-wise channel vector at a particular 2D location in the feature map. This is equivalent to reshaping the pooled feature map: $F' \in \mathbb{R}^{n'\times n'\times c} \rightarrow \mathbb{R}^{N\times c}$, where $N = |\mathcal{V}| = n' \times n'$.

Generating the adjacency matrix via SCG requires embedding the feature map into a latent space, parameterized by mean, $\mu$, and standard deviation, $\sigma$, matrices. The mean matrix, $\mu$, is encoded from the pooled features using a convolution with a 3 × 3 kernel, whereas a 1 × 1 convolution with exponential activation is applied to obtain the $\sigma$ matrix. These matrices encode a latent embedding of the form:

$$Z = \mu + \sigma \cdot \mathcal{N}(0, I) = \mathrm{Conv}_{3\times3}(F') + \exp\left[\mathrm{Conv}_{1\times1}(F')\right] \cdot \mathcal{N}(0, I) \tag{1}$$

where $\mathcal{N}(0, I)$ is Gaussian noise that is present only during training. A connection between two nodes is established if the inner product of their latent embeddings is positive, leading to an initial graph adjacency matrix given by:

$$A = \mathrm{ReLU}\left(Z Z^{T}\right) \tag{2}$$

where ReLU is the rectified linear unit activation function [42]. Compared to the original graph auto-encoders, SCG introduced diagonal enhancement through the adaptive factor:

$$\gamma = 1 + \frac{\sqrt{N}}{\sum_{i=1}^{N} A_{ii} + \epsilon} \tag{3}$$

where $A_{ii}$ are the diagonal entries of the adjacency matrix and $\epsilon$ is a smoothing factor. Combining the diagonal enhancement with adjacency matrix normalization, the output adjacency matrix from SCG is:

$$\hat{A} = D^{-\frac{1}{2}} \left(A + \gamma \cdot \mathrm{diag}(A) + I\right) D^{-\frac{1}{2}} \tag{4}$$

where $D$ is the degree matrix, whose diagonal entries equal the degrees of the corresponding nodes.

In addition to learning the graph from the feature maps, a residual prediction term, given by:

$$\hat{y} = \gamma \cdot \mu \left(1 - \log \sigma\right) \tag{5}$$

is added to the node embeddings processed by the ARMA convolution layers. Thus, the output of the SCG block for each feature map includes both a graph, $G = \{F', \hat{A}\}$, and a residual prediction term, $\hat{y}$.

3.3. Autoregressive moving average GNN

In the ARMA GNN stage of our proposed architecture, the formulation of Bianchi et al. [22] was used. The flow of node feature processing following SCG is illustrated schematically in Fig. 3. Graph convolution can be treated in the graph spectral domain by applying a filter response function, $h(\lambda)$, where $\lambda$ is an eigenvalue of the Laplacian matrix, $L = D - A$. Graph filtering transforms an input graph signal with a single feature per node, $X \in \mathbb{R}^{N}$, into a filtered signal, $\tilde{X}$, via:

$$\tilde{X} = U \cdot H_d(\lambda) \cdot U^{T} X \tag{6}$$

where $U$ is the eigenvector matrix of $L$, and $H_d(\lambda)$ is the element-wise application of $h$ to the diagonal matrix of eigenvalues, $\lambda$ [3].

Because eigendecomposition is computationally expensive and the resulting embeddings are not localized, the polynomial filter, $h_{\mathrm{poly}}(\lambda) = \sum_{k=0}^{K} \omega_k \lambda^k$, was introduced, and the graph filtering relation in Eq. (6) becomes:

$$\tilde{X} = \sum_{k=0}^{K} \omega_k L^k X \tag{7}$$

For multi-channel node features, the weight, $\omega_k$, in Eq. (7) is replaced by a weight matrix, $W_k$. Several polynomial filters, such as the Chebyshev filter, were introduced to better model the desired filter responses while minimizing overfitting in the extrapolated frequency range.

As graphs can be interpreted in the spectral domain through the eigenvalues of the Laplacian matrix, several graph convolution operators based on polynomial filters were developed. One example is the Chebyshev polynomial filter, which replaces the simple $L^k$ with the Chebyshev polynomial $T_k(L)$ [43]. The graph convolutional network (GCN) is a simplification of Chebyshev spectral filtering with $K = 1$. The autoregressive moving average (ARMA) filter is based on the filter response relation:

$$X^{(l+1)} = \left(I + \sum_{k=0}^{K} q_k L^k\right)^{-1} \left(\sum_{k=0}^{K} p_k L^k\right) X^{(l)} \tag{8}$$

where $X^{(l)}$ and $X^{(l+1)}$ are the node feature embeddings at convolutional layers $l$ and $l + 1$, respectively, and $q_k$ and $p_k$ are filter coefficients [23].
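A small numerical check of Eqs. (6)–(8) may help make the three forms concrete. The NumPy sketch below is an illustration on an assumed toy graph (a 4-node path) with arbitrarily chosen scalar coefficients, not the study's implementation. It verifies that a polynomial response applied through the eigendecomposition matches the eigendecomposition-free form of Eq. (7), and that the ARMA response of Eq. (8) requires solving a linear system with the denominator matrix.

```python
import numpy as np


def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj


def spectral_filter(lap, x, h):
    """Eq. (6): X~ = U h(Lambda) U^T X, using a full eigendecomposition."""
    lam, u = np.linalg.eigh(lap)  # L is symmetric, so eigh applies
    return u @ np.diag(h(lam)) @ u.T @ x


def polynomial_filter(lap, x, w):
    """Eq. (7): X~ = sum_k w_k L^k X via repeated products (no eigendecomposition)."""
    out = np.zeros_like(x)
    lk_x = x.copy()  # holds L^k X, starting at k = 0
    for wk in w:
        out += wk * lk_x
        lk_x = lap @ lk_x
    return out


def arma_filter(lap, x, p, q):
    """Eq. (8): X~ = (I + sum_k q_k L^k)^{-1} (sum_k p_k L^k) X via a dense linear solve."""
    n = lap.shape[0]
    num = sum(pk * np.linalg.matrix_power(lap, k) for k, pk in enumerate(p))
    den = np.eye(n) + sum(qk * np.linalg.matrix_power(lap, k) for k, qk in enumerate(q))
    return np.linalg.solve(den, num @ x)


# Toy 4-node path graph with one scalar feature per node
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
lap = laplacian(adj)
x = np.array([1.0, -2.0, 0.5, 3.0])

w = [0.5, -0.2, 0.05]
poly_h = lambda lam: sum(wk * lam**k for k, wk in enumerate(w))
print(np.allclose(spectral_filter(lap, x, poly_h), polynomial_filter(lap, x, w)))  # True

p, q = [1.0, 0.3], [0.0, 0.5]
arma_h = lambda lam: (sum(pk * lam**k for k, pk in enumerate(p))
                      / (1 + sum(qk * lam**k for k, qk in enumerate(q))))
print(np.allclose(spectral_filter(lap, x, arma_h), arma_filter(lap, x, p, q)))  # True
```

Both checks print True because all three operators are functions of the same matrix $L$ and therefore share its eigenvectors. The dense solve in `arma_filter` costs on the order of $N^3$ operations for $N$ nodes, which is the cost that the recursive approximation of Eq. (9) is meant to avoid.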


Fig. 3. Schematic flow of node features in the ARMA convolution layer with an arbitrary number of recursions, $T$, and number of layers, $M$.

However, to avoid the intractable inverse matrix operation, the inverse term is approximated by an autoregressive term with an additional skip connection. The resulting ARMA convolution is given by the recursion relation:

$$\bar{X}^{(t+1)}_{(l)} = \sigma\left(\hat{L}\,\bar{X}^{(t)}_{(l)} W_l + X^{(0)}_{(l)} V_l\right) \tag{9}$$

where $\bar{X}^{(t)}_{(l)}$ is the node embedding estimate at iteration $t$ and layer $l$; $W$ and $V$ are learnable weight matrices; $\hat{L}$ is the modified Laplacian; and $\sigma$ is a nonlinear activation function.

Table 1
Details on the architectural design of the proposed model.

Hyper-parameter                           α           β          γ
SCG average pooling output size           (12,12)     (12,12)    (5,5)
SCG number of channels for μ and σ        512         512        1024
Number of channels for each ARMA layer    128         128        256
Dropout rate for ARMA                     0.3         0.3        0.3
Activation function used in ARMA          ReLU        ReLU       ReLU
Number of channels for MLP layer          128         128        256
Weighted concatenation coefficient        α = 0.25    β = 0.5    –
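Why an autoregressive recursion can replace the inverse in Eq. (8) can be seen in a simplified setting, assumed here only for illustration: scalar weights $W_l = w$ and $V_l = v$, an identity activation, and a propagation matrix $\hat{L}$ taken to be $D^{-1/2} A D^{-1/2}$ (the exact form of the modified Laplacian is an assumption of this sketch). Under these assumptions, and with $|w| < 1$, iterating Eq. (9) converges to the first-order rational filter $v\,(I - w\hat{L})^{-1} X^{(0)}$ without ever forming the inverse.

```python
import numpy as np


def normalized_adjacency(adj):
    """Assumed stand-in for the modified Laplacian: D^{-1/2} A D^{-1/2} (eigenvalues in [-1, 1])."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def linear_arma_recursion(l_hat, x0, w, v, steps):
    """Eq. (9) with scalar weights W = w, V = v and an identity activation."""
    x = x0.copy()
    for _ in range(steps):
        x = w * (l_hat @ x) + v * x0  # autoregressive term + skip connection to X^(0)
    return x


def arma1_closed_form(l_hat, x0, w, v):
    """Fixed point of the recursion: X = v (I - w L_hat)^{-1} X^(0), a first-order rational filter."""
    return v * np.linalg.solve(np.eye(l_hat.shape[0]) - w * l_hat, x0)


# Toy 5-node cycle graph; |w| < 1 keeps the spectral radius of w * L_hat below 1
adj = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
l_hat = normalized_adjacency(adj)
x0 = np.array([1.0, 0.0, -1.0, 2.0, 0.5])
target = arma1_closed_form(l_hat, x0, w=0.5, v=1.0)

for steps in (1, 3, 10, 50):
    err = np.linalg.norm(linear_arma_recursion(l_hat, x0, 0.5, 1.0, steps) - target)
    print(steps, f"{err:.2e}")
```

Because the error obeys $e_{t+1} = w\hat{L}\,e_t$, the printed error shrinks by at least a factor of $|w| = 0.5$ per iteration. With only $T = 3$ iterations, as used in IncARMAG, the layer computes an approximation of this rational response rather than its exact fixed point.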

Multiple stacks of $\bar{X}^{(t)}_{(l+1)}$ can be formed with ARMA graph convolution. With $t \in \{1, 2, \ldots, T\}$, the final node features from the $J$ parallel stacks are averaged to yield the output, $\tilde{X}$. The GNN layers in our proposed model use a special case of the ARMA convolution with $T = 3$ and $J = 1$, which reduces to a single-branch ARMA filtering layer. Referring to Eqs. (5) and (9), the ARMA filtering layer used in our proposed model can be expressed mathematically as:

$$X_{(l+1)} = \left(\bar{X}^{(3)}_{(l)} \circ \bar{X}^{(2)}_{(l)} \circ \bar{X}^{(1)}_{(l)}\right)\left(X_{(l)}\right) + \rho\,\hat{y} \tag{10}$$

where $\rho = 1$ if $X_{(l+1)}$ is from the final ARMA convolutional layer and $\rho = 0$ otherwise. Two ARMA GNN layers are used in each feature map processing stage.

Table 2
General information on the selected MedMNIST 2D datasets used for classification model benchmarking [32].

Dataset            Image modality           Classes   Train/val/test split
DermaMNIST         Dermatoscope             7         7,007/1,003/2,005
OCTMNIST           Retinal OCT              4         97,477/10,832/1,000
PneumoniaMNIST     Chest X-ray              2         4,708/524/624
BreastMNIST        Breast ultrasound        2         546/78/156
BloodMNIST         Blood cell microscope    8         11,959/1,712/3,421
OrganAMNIST        Abdominal CT             11        34,561/6,491/17,778
OrganCMNIST        Abdominal CT             11        12,975/2,392/8,268
OrganSMNIST        Abdominal CT             11        13,940/2,452/8,829
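Putting Sections 3.2 and 3.3 together, one feature-map branch (SCG, two ARMA layers with $T = 3$ and $J = 1$, the residual $\rho\hat{y}$ of Eq. (10), and node average pooling) can be sketched as a NumPy forward pass. This is a hypothetical illustration under stated assumptions, not the author's implementation: the input is an already pooled map, weights are random, dropout is omitted, the $\mu$ channel count is set equal to the ARMA output width so that $\hat{y}$ can be added directly, the degree matrix of Eq. (4) is taken from the diagonally enhanced adjacency, the SCG adjacency $\hat{A}$ serves as the propagation matrix in Eq. (9), and the first iteration uses its own input-sized weight (a common design choice for ARMA layers). All function and parameter names are made up for this sketch.

```python
import numpy as np


def conv3x3_same(x, w):
    """3x3 convolution with zero padding (cross-correlation, as in deep learning).
    x: (n, n, c_in), w: (3, 3, c_in, c_out) -> (n, n, c_out)."""
    n = x.shape[0]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((n, n, w.shape[-1]))
    for di in range(3):
        for dj in range(3):
            out += xp[di:di + n, dj:dj + n, :] @ w[di, dj]
    return out


def scg(f_pooled, w_mu, w_sigma, eps=1e-6, noise_rng=None):
    """Eqs. (1)-(5) on a pooled (n', n', c) map; noise is added only if noise_rng is given."""
    num_nodes = f_pooled.shape[0] * f_pooled.shape[1]
    mu = conv3x3_same(f_pooled, w_mu).reshape(num_nodes, -1)     # mean via 3x3 conv
    log_sigma = (f_pooled @ w_sigma).reshape(num_nodes, -1)      # 1x1 conv == per-pixel matmul
    sigma = np.exp(log_sigma)                                    # exponential activation
    z = mu if noise_rng is None else mu + sigma * noise_rng.standard_normal(mu.shape)  # Eq. (1)
    a = np.maximum(z @ z.T, 0.0)                                 # Eq. (2)
    gamma = 1.0 + np.sqrt(num_nodes) / (np.trace(a) + eps)       # Eq. (3)
    a_enh = a + gamma * np.diag(np.diag(a)) + np.eye(num_nodes)  # diagonal enhancement
    d_inv_sqrt = 1.0 / np.sqrt(a_enh.sum(axis=1))                # degrees of enhanced graph (assumed)
    a_hat = a_enh * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]    # Eq. (4)
    y_hat = gamma * mu * (1.0 - log_sigma)                       # Eq. (5)
    nodes = f_pooled.reshape(num_nodes, -1)                      # F' -> (N, c) node features
    return nodes, a_hat, y_hat


def arma_layer(x0, prop, w_first, w_rec, v, steps=3):
    """Single-stack (J = 1) ARMA layer: Eq. (9) iterated T = steps times with ReLU."""
    x = np.maximum(prop @ x0 @ w_first + x0 @ v, 0.0)            # t = 1 uses an input-sized weight
    for _ in range(steps - 1):
        x = np.maximum(prop @ x @ w_rec + x0 @ v, 0.0)           # t = 2..T share w_rec
    return x


def branch_forward(f_pooled, params, noise_rng=None):
    """One branch: SCG -> two ARMA layers (Eq. (10)) -> node average pooling."""
    nodes, a_hat, y_hat = scg(f_pooled, params["w_mu"], params["w_sigma"], noise_rng=noise_rng)
    h = arma_layer(nodes, a_hat, *params["arma1"])               # rho = 0
    h = arma_layer(h, a_hat, *params["arma2"]) + y_hat           # rho = 1: add SCG residual
    return h.mean(axis=0)


# Toy sizes: a 5 x 5 pooled map (as for F_gamma in Table 1) but only 8 input and 4 hidden channels
rng = np.random.default_rng(0)
n, c, d = 5, 8, 4
init = lambda *shape: 0.1 * rng.standard_normal(shape)
params = {
    "w_mu": init(3, 3, c, d), "w_sigma": init(c, d),
    "arma1": (init(c, d), init(d, d), init(c, d)),
    "arma2": (init(d, d), init(d, d), init(d, d)),
}
f_pooled = rng.standard_normal((n, n, c))
print(branch_forward(f_pooled, params).shape)  # (4,)
```

The usage lines print `(4,)`, the shape of the pooled branch vector that would then pass through the MLP and the weighted concatenation of Eq. (11).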

3.4. Feature aggregation and classification head

The output node features from each feature processing level are subjected to average pooling to yield a single feature vector. Feature vectors $h_\alpha$, $h_\beta$, and $h_\gamma$ correspond to the pooled node features following the GNN processing of $F_\alpha$, $F_\beta$, and $F_\gamma$, respectively. Each feature vector is passed through a single fully connected layer (MLP) for feature postprocessing. The feature vectors from the three GNN processing levels are combined through weighted concatenation:

$$h' = \mathrm{Concat}\left[\frac{\alpha}{\alpha+\beta+1}\,\mathrm{MLP}(h_\alpha),\ \frac{\beta}{\alpha+\beta+1}\,\mathrm{MLP}(h_\beta),\ \mathrm{MLP}(h_\gamma)\right] \tag{11}$$

Ultimately, the concatenated feature vector, $h'$, proceeds to the output softmax layer to yield the prediction score per class.

4. Experimental results

4.1. Architecture and training implementation

A summary of the standard architecture is shown in Table 1. To evaluate our proposed model in medical imaging, we used eight 2D imaging datasets from the MedMNIST V2 repository [32]. The datasets used in our experiments are described in Table 2. MedMNIST provides an excellent platform for evaluating classification models in medical imaging due to its diverse range of modalities. DermaMNIST involves the classification of several types of pigmented skin lesions from dermatoscope images. PneumoniaMNIST, OCTMNIST, and BreastMNIST are classification tasks for detecting the presence of a disease from pediatric chest X-ray, retinal optical coherence tomography (OCT), and breast ultrasound images, respectively. Normal peripheral blood cell classification using cell microscopy was also included as a benchmarking dataset.


Table 3
Comparison of the performance of IncARMAG with various published models that were evaluated on the DermaMNIST, OCTMNIST, PneumoniaMNIST, and BreastMNIST datasets.

Model                         DermaMNIST      OCTMNIST        PneumoniaMNIST  BreastMNIST
                              AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC
ResNet18 [32]                 0.920  0.754    0.958  0.763    0.956  0.864    0.891  0.833
ResNet50 [32]                 0.912  0.731    0.958  0.776    0.962  0.884    0.857  0.812
auto-sklearn [32]             0.902  0.719    0.887  0.601    0.942  0.855    0.836  0.803
AutoKeras [32]                0.915  0.749    0.955  0.763    0.947  0.878    0.871  0.831
Google AutoML Vision [32]     0.914  0.768    0.963  0.771    0.991  0.946    0.919  0.861
MedViT-T [31]                 0.914  0.768    0.961  0.767    0.993  0.949    0.934  0.896
MedViT-S [31]                 0.937  0.780    0.960  0.782    0.995  0.961    0.938  0.897
MedViT-L [31]                 0.920  0.773    0.945  0.761    0.991  0.921    0.929  0.883
EHDFL [45]                    0.917  0.769    0.917  0.769    0.968  0.883    0.894  0.897
IncARMAG (Ours)               0.938  0.802    0.985  0.873    0.938  0.885    0.916  0.872

Table 4
Comparison of the performance of IncARMAG with various published models that were evaluated on the BloodMNIST, OrganAMNIST, OrganCMNIST, and OrganSMNIST datasets.

Model                         BloodMNIST      OrganAMNIST     OrganCMNIST     OrganSMNIST
                              AUC    ACC      AUC    ACC      AUC    ACC      AUC    ACC
ResNet18 [32]                 0.998  0.958    0.997  0.935    0.994  0.920    0.974  0.782
ResNet50 [32]                 0.997  0.950    0.998  0.947    0.993  0.911    0.975  0.785
auto-sklearn [32]             0.984  0.878    0.963  0.762    0.976  0.829    0.945  0.672
AutoKeras [32]                0.998  0.961    0.994  0.905    0.990  0.879    0.974  0.813
Google AutoML Vision [32]     0.998  0.966    0.990  0.886    0.988  0.877    0.964  0.749
MedViT-T [31]                 0.996  0.950    0.995  0.931    0.991  0.901    0.972  0.789
MedViT-S [31]                 0.997  0.951    0.996  0.928    0.993  0.916    0.987  0.805
MedViT-L [31]                 0.996  0.954    0.997  0.943    0.994  0.922    0.973  0.806
EHDFL [45]                    –      –        0.997  0.936    0.994  0.916    0.974  0.784
IncARMAG (Ours)               0.999  0.987    0.993  0.927    0.995  0.934    0.975  0.801

Lastly, datasets on organ identification in abdominal computed tomography (CT) at different body planes are used for model performance benchmarking.

Although MedMNIST comes with different image dimensions, we limit our experiments to 224 × 224 images since they closely match the dimensions utilized in real clinical environments. The images in all datasets underwent z-normalization as a preprocessing step before being passed to the classification models. All models implemented in this study were trained and tested on the same sets of training and testing images, respectively. Our proposed model was trained using the adaptive moment estimation (Adam) optimizer [44] for 100 epochs with an initial learning rate of 1E-04 that decays by a factor of 0.1 at epoch milestones 50 and 75. A batch size of 16 was used for all implemented models and all datasets in this study. A weight decay regularization factor of 0.0001 was used for all models except the transformer-based models, in order to avoid nonfinite loss. Data augmentation was not employed during the training phase. All implementations were carried out using PyTorch and Torchvision on multiple Nvidia GeForce GTX 1080 GPUs.

4.2. Comparison with SOTA models

Tables 3 and 4 present the accuracy (ACC) and area under the receiver operating characteristic curve (AUC) of our proposed IncARMAG across eight MedMNIST datasets. For comparison, we cite the results of studies that have carried out experiments on the MedMNIST datasets. These include the ResNets and autoML models from the original MedMNIST study [32], MedViT [31], and evolutionary hybrid domain feature learning (EHDFL) [45]. IncARMAG achieved the highest accuracy and AUC on DermaMNIST, OCTMNIST, BloodMNIST, and OrganCMNIST, whereas MedViT dominated on PneumoniaMNIST, BreastMNIST, and OrganSMNIST. IncARMAG improved accuracy by more than 2 percentage points over the best competing model on DermaMNIST, OCTMNIST, and BloodMNIST. The largest gain in accuracy was observed on OCTMNIST (0.873 versus 0.782 for MedViT-S). However, a performance decline is noted on BreastMNIST, PneumoniaMNIST, and OrganAMNIST when comparing IncARMAG with the benchmark models. For OrganAMNIST, ResNet50 achieved the highest accuracy and AUC. Based on the ranking statistics in Tables 3 and 4, IncARMAG potentially matches the generalizability of MedViT in medical image classification. Despite not using any data augmentation, our proposed model still generates reliable predictions after training, with the exception of the pediatric pneumonia classification task in PneumoniaMNIST. IncARMAG also surpassed the autoML models and ResNet architectures on most of the MedMNIST datasets. Overall, IncARMAG delivered the best performance on 4 out of 8 datasets, with MedViT-S ranking as the top model on 3 out of 8 datasets, while ResNet50 secured the top spot on just one dataset. IncARMAG also has a slight advantage over MedViT in terms of parameter count: IncARMAG has 21.4 M parameters, while MedViT has 23 M.

To further assess the capabilities of our model, we trained and evaluated several CNN, transformer, and hybrid models on four MedMNIST datasets using the data preprocessing and training hyper-parameters described in Section 4.1. These models are the ResNet series [7], Inception V3 [41], EfficientNet V2 [8], ConvNext-B [6], ViT-B [46], MaxViT [47], Swin-S [48], DEIT [49], RIFormer [50], HorNet [51], and ViG [37]. These models were implemented using the torchvision and mmpretrain libraries [52]. Overall accuracy, macro recall, and macro precision are reported for each model. Overall accuracy offers general insight into model performance across the evaluation dataset, while macro precision and macro recall reflect false positive and false negative predictions, respectively, with less sensitivity to class imbalance [53].

The experimental results of our proposed and benchmarking models on the four selected MedMNIST datasets are shown in Tables 5 and 6, which show that IncARMAG outperforms the benchmarking models in terms of accuracy and precision. ResNet and Inception consistently attained above-median performance on the four datasets. The performance of transformer-based models varies


Fig. 4. Confusion matrix of IncARMAG predictions on the test set of (a) DermaMNIST, (b) PneumoniaMNIST, (c) BloodMNIST, and (d) BreastMNIST.

Table 5
Performance of IncARMAG and benchmarking models on the DermaMNIST and PneumoniaMNIST datasets.

Model              DermaMNIST             PneumoniaMNIST
                   Acc.   Prec.  Rec.     Acc.   Prec.  Rec.
ResNet101          0.757  0.575  0.569    0.865  0.826  0.995
ResNet152          0.783  0.628  0.604    0.859  0.819  0.995
Inception V3       0.782  0.587  0.609    0.864  0.822  0.997
EfficientNet V2    0.764  0.597  0.526    0.852  0.813  0.995
ConvNext           0.756  0.583  0.522    0.872  0.834  0.992
ViT-B              0.730  0.493  0.480    0.830  0.794  0.977
MaxViT             0.773  0.629  0.524    0.851  0.811  0.992
Swin-S             0.782  0.587  0.609    0.625  0.625  1.00
DEIT V3            0.732  0.514  0.509    0.835  0.800  0.982
RIFormer           0.730  0.485  0.430    0.825  0.791  0.980
HorNet             0.758  0.565  0.574    0.830  0.806  0.982
ViG                0.749  0.506  0.491    0.870  0.832  0.992
IncARMAG (Ours)    0.795  0.667  0.617    0.886  0.849  0.995

Table 6
Performance of IncARMAG and benchmarking models on the BloodMNIST and BreastMNIST datasets.

Model              BloodMNIST             BreastMNIST
                   Acc.   Prec.  Rec.     Acc.   Prec.  Rec.
ResNet101          0.985  0.984  0.986    0.814  0.876  0.868
ResNet152          0.985  0.984  0.986    0.821  0.898  0.851
Inception V3       0.984  0.985  0.986    0.840  0.908  0.868
EfficientNet V2    0.984  0.984  0.984    0.731  0.731  1.0
ConvNext           0.981  0.983  0.982    0.808  0.818  0.948
ViT-B              0.956  0.952  0.949    0.711  0.805  0.798
MaxViT             0.983  0.984  0.984    0.827  0.822  0.974
Swin-S             0.889  0.870  0.856    0.673  0.756  0.816
DEIT V3            0.958  0.960  0.954    0.750  0.800  0.877
RIFormer           0.949  0.960  0.953    0.699  0.734  0.921
HorNet             0.851  0.869  0.808    0.737  0.739  0.991
ViG                0.983  0.984  0.984    0.808  0.862  0.877
IncARMAG (Ours)    0.987  0.987  0.986    0.872  0.898  0.930

across the datasets, which can be attributed to the slower convergence

as a function of training data points [46]. The motivation behind some significant discrepancies in accuracy on two out of four of the
the SCG in IncARMAG and attention mechanism in transformer-based benchmark datasets.
models partially share some resemblance, which is set to allow learning Table 5 shows that although the overall accuracy of the models
long-range dependencies among the nodes for graphs or tokens for evaluated under the DermaMNIST dataset are reasonable in value, the
transformers. Thus, our proposed model leverages the similarity with precision and recall per class reveals unacceptable performance. These
attention mechanism while maintaining faster performance conver- results entail that model degeneracy and/or imbalanced class training
gence as demonstrated by CNN models. Although both IncARMAG may have occurred. Insights on the error distribution can be visualized
and ViG make use of GNN approach in their architecture, there are through the confusion matrices shown in Figs. 4(a)–4(d) for the four

datasets. Fig. 4(a) shows that the strong accuracy of IncARMAG on DermaMNIST owes partly to a slight prediction bias towards the majority class (melanocytic nevi) of this imbalanced dataset.

There is minimal to no improvement in classification accuracy over some of the benchmark models in the PneumoniaMNIST and BloodMNIST experiments. The confusion matrix of IncARMAG on BloodMNIST shows strong classification accuracy across all labels, as illustrated in Fig. 4(c). A large gap in accuracy separates IncARMAG from the benchmark models on BreastMNIST. Swin-S surpasses IncARMAG in recall on PneumoniaMNIST (1.00 vs. 0.995) only because it gives a completely degenerate prediction, with accuracy equal to precision (0.625), which leaves no false negatives. EfficientNet V2 behaves in the same way on BreastMNIST (accuracy and precision of 0.731, recall of 1.0). The ability of an image classifier to generalize to a diverse set of input data is crucial for robustness and reliability in real-world applications. A model's true performance is therefore demonstrated when it performs consistently well across a variety of datasets. Although Tables 3 and 4 contain instances of near-perfect predictions from the benchmark models, our proposed model also exhibits this level of performance while surpassing them on certain datasets.

4.3. Ablation experiments

Ablation experiments were also conducted to assess the impact of various architectural components on the overall model performance. These experiments were evaluated on four of the MedMNIST datasets: DermaMNIST, PneumoniaMNIST, BloodMNIST, and BreastMNIST. Two additional variants of our proposed model were investigated. The first variant, IncARMAG V2, has a single GNN-based feature processing module at the last feature map, 𝐹𝛾. We also varied the feature map levels, 𝐹𝛼 and 𝐹𝛽, that are connected to the GNN-based feature refinement: IncARMAG V3 takes 𝐹𝛼 and 𝐹𝛽 from the feature maps at Inception blocks 6 and 7, respectively. The evaluation results of these IncARMAG variants on the four MedMNIST datasets in Table 2 are shown in Fig. 5.

Fig. 5. Performance metrics of various IncARMAG variants evaluated on the test set of (a) DermaMNIST, (b) PneumoniaMNIST, (c) BloodMNIST, and (d) BreastMNIST. IncARMAG V2 has no feature processing at 𝐹𝛼 and 𝐹𝛽, while IncARMAG V3 has the feature processing frameworks moved to Inception blocks 7 and 8 for 𝐹𝛼 and 𝐹𝛽, respectively.

There is a clear benefit in using the multi-level feature processing framework on the DermaMNIST, PneumoniaMNIST, and BreastMNIST datasets. Adding auxiliary classifiers may increase accuracy, which can be ascribed to feature refinement and/or inherent regularization, as argued in the Inception V3 paper [41]. Comparison
of the results of Inception V3 in Tables 3–4 with Figs. 5(a) and 5(b) shows that using ARMA convolutional layers as multi-level auxiliary classifiers increases predictive performance over the classical Inception V3. However, there is no significant difference in accuracy between IncARMAG and IncARMAG V3 on any of the four datasets. Thus, the benefit of the auxiliary GNN classifiers in IncARMAG may be regarded more as added regularization, which encourages global feature diversity through the SCG and ARMA modules.

Experiments comparing ARMA convolution with other GNN convolution types in medical image classification were also conducted. In these experiments, the ARMA convolution was replaced with other GNN convolution types while the backbone, SCG, and classifier head architectures of the original IncARMAG were retained. The results of this ablation study in Tables 7 and 8 show that ARMA convolution performs best on the DermaMNIST and PneumoniaMNIST datasets, whereas GAT convolution achieves the best performance on the BloodMNIST and BreastMNIST datasets. Thus, on some datasets the robust weighted aggregation provided by GAT convolution is more important than the flexible spectral filtering of ARMA convolution. The results in Tables 9 and 10 indicate that excluding the adaptive diagonal enhancement in the SCG module has minimal effect on IncARMAG performance, except on DermaMNIST. On the other hand, the residual predictions from the SCG features have a noticeable impact on PneumoniaMNIST and BreastMNIST, as well as on the precision for DermaMNIST.

Table 7
Model performances with different types of GNN convolution layers for the DermaMNIST and PneumoniaMNIST datasets.
Layer type     DermaMNIST               PneumoniaMNIST
               Acc.    Prec.   Rec.     Acc.    Prec.   Rec.
GCN            0.763   0.582   0.526    0.864   0.824   0.995
GAT            0.791   0.638   0.607    0.857   0.818   0.992
Chebyshev      0.794   0.656   0.622    0.862   0.825   0.990
Transformer    0.790   0.657   0.599    0.867   0.813   0.995
GIN            0.793   0.666   0.608    0.885   0.849   0.992
ARMA           0.795   0.667   0.617    0.886   0.849   0.995

Table 8
Model performances with different types of GNN convolution layers for the BloodMNIST and BreastMNIST datasets.
Layer type     BloodMNIST               BreastMNIST
               Acc.    Prec.   Rec.     Acc.    Prec.   Rec.
GCN            0.987   0.987   0.987    0.865   0.897   0.921
GAT            0.989   0.990   0.989    0.891   0.906   0.947
Chebyshev      0.988   0.988   0.988    0.878   0.906   0.930
Transformer    0.987   0.988   0.987    0.885   0.929   0.912
GIN            0.988   0.988   0.988    0.885   0.914   0.930
ARMA           0.987   0.987   0.986    0.872   0.898   0.930

Table 9
Impact of adaptive diagonal enhancement and residual predictions on IncARMAG performance on the DermaMNIST and PneumoniaMNIST datasets.
Details                         DermaMNIST               PneumoniaMNIST
                                Acc.    Prec.   Rec.     Acc.    Prec.   Rec.
Without diagonal enhancement    0.782   0.597   0.570    0.885   0.822   0.995
Without residual predictions    0.792   0.609   0.615    0.862   0.822   0.995

Table 10
Impact of adaptive diagonal enhancement and residual predictions on IncARMAG performance on the BloodMNIST and BreastMNIST datasets.
Details                         BloodMNIST               BreastMNIST
                                Acc.    Prec.   Rec.     Acc.    Prec.   Rec.
Without diagonal enhancement    0.988   0.988   0.989    0.875   0.906   0.930
Without residual predictions    0.986   0.987   0.987    0.865   0.912   0.904

5. Discussion

Our extensive benchmarking experiments show that the proposed CNN-GNN achieves state-of-the-art performance across multiple medical imaging datasets. In line with the motivation behind the MedViT design, we provide further support for the advantages of combining the inductive bias of CNNs with the capture of long-range dependencies. However, the results also show that the choice of algorithm for feature extraction and for handling long-range dependencies has a significant impact on performance. The subpar performance of some hybrid CNN-Transformer models in Tables 5 and 6 demonstrates this. The variation of model performance with the type of graph convolution, despite the same backbone and graph construction, is another example. Various experimental evaluations of GCN, GAT, Chebyshev, GIN, and ARMA have shown that there is weak consensus on which convolutional layer generalizes best across different datasets [54–56]. This can be attributed to inherent dataset properties that may modulate the relative importance of the node feature filtering, feature aggregation, and update components.

One way to determine whether IncARMAG learned meaningful features from the images is to visualize feature activation heatmaps. In Fig. 6, we compare the feature activations of ViG and IncARMAG on several images using gradient-weighted class activation mapping (Grad-CAM). Grad-CAM first calculates class-specific importance weights for the feature maps. It then forms a weighted sum of the feature activation maps with these weights and passes the result through a ReLU [57].

The first row illustrates how the ViG model captures features localized in the melanoma region for prediction. IncARMAG, on the other hand, has better coverage of the region of interest but partially includes areas outside the melanoma region. For the PneumoniaMNIST image, IncARMAG localizes feature activation importance better than ViG. In this case, IncARMAG obtained hotspots on lung areas with blurry infiltrates. Both IncARMAG and ViG have strengths and weaknesses in terms of the extent and precision of feature activation coverage in the eosinophil image. Lastly, IncARMAG forms one aggregated hotspot that extends significantly beyond the breast tumor region, which is partially justified for distinguishing the region of interest from the background structures. Overall, IncARMAG shows wide feature activation coverage in class score prediction, with hotspots on the regions of interest.

It is imperative to develop classification models with high accuracy given the gravity of errors in the medical domain. This study has enhanced the foundational Inception model for medical image classification. For future work, additional data preprocessing techniques, such as data augmentation, should be considered to enhance IncARMAG performance. In addition, hyper-parameter optimization can be used to determine the optimal model hyper-parameters, including the SCG and GNN hidden dimensions and the number of iterations in each ARMA layer. The classification models we have reported still require further improvement on some of the datasets. Even so, IncARMAG suggests a direction for computer-aided classification of medical images that uses multi-level GNNs for feature processing. IncARMAG can also serve as groundwork for computer vision models on other medical imaging tasks, such as image segmentation and registration. For example, IncARMAG can serve as a backbone encoder network in an encoder–decoder segmentation architecture.

Translating these AI models to clinical utility requires substantial validation from prospective and retrospective studies. An initial challenge is to increase the number of samples and cases in order to improve the accuracy and generalizability of classification models, so that reliable predictions can be generated on unseen data [58]. Ensuring robust performance also requires addressing algorithmic biases and data drift.

Fig. 6. Grad-CAM visualization of ViG and IncARMAG feature activations on a sample image from each dataset. Red and blue in the heatmap correspond to the maximum and minimum activation values, respectively.
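The heatmaps in Fig. 6 follow the Grad-CAM weighting of [57]: class-specific weights for each feature map, a weighted sum of the maps, then a ReLU. The sketch below is a minimal NumPy illustration of that computation. It is not the author's implementation. It assumes the feature maps of the chosen layer (channels × height × width) and the gradients of the class score with respect to them have already been extracted from the network, for example with framework hooks. The function name `grad_cam` and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Minimal Grad-CAM heatmap for one image and one target class.

    activations: feature maps A^k of shape (K, H, W) from the chosen layer.
    gradients:   d(class score)/dA^k, same shape as `activations`.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Class-specific importance of each feature map: spatially averaged gradients.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over the channel axis, then ReLU to keep only regions
    # whose activation increases the class score.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    # Scale for display; an all-zero map is returned unchanged.
    return cam / peak if peak > 0 else cam


# Hypothetical toy input: two 2x2 feature maps with constant per-channel gradients.
A = np.stack([np.ones((2, 2)), np.array([[1.0, 0.0], [0.0, 0.0]])])
dY_dA = np.stack([np.full((2, 2), 1.0), np.full((2, 2), -2.0)])
heatmap = grad_cam(A, dY_dA)
print(heatmap)  # the top-left cell is zeroed by the ReLU
```

In practice the low-resolution map is upsampled to the input image size and overlaid on the image, which yields red-to-blue heatmaps like those in Fig. 6. A second channel with a negative weight suppresses a region even where the first channel is active. This is why the ReLU is applied only after the weighted sum.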

Once sufficient accuracy has been achieved, conducting prospective studies for clinical validation further strengthens the practical effectiveness of AI models in clinical settings [59].

6. Conclusion

The novelty of our proposed IncARMAG lies in the adoption of the SCG-ARMA GNN framework at different hierarchical feature levels to capture long-range representations. Our experiments have demonstrated that IncARMAG ranks among the top models in classifying medical images spanning different imaging modalities. In more extensive experiments on four medical imaging datasets, IncARMAG has also outperformed various state-of-the-art CNN and transformer-based models. The performance boost from the multi-level version of IncARMAG shows the advantage of using additional SCG-GNNs as auxiliary heads, which can be attributed to regularization and/or feature diversity.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of competing interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Adrian S. Remigio reports a relationship with Asian Institute of Management that includes: employment. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] J. Potočnik, S. Foley, E. Thomas, Current and potential applications of artificial intelligence in medical imaging practice: A narrative review, J. Med. Imag. Radiat. Sci. (2023) http://dx.doi.org/10.1016/j.jmir.2023.03.033.
[2] A. Shah, M. Shah, A. Pandya, R. Sushra, R. Sushra, M. Mehta, K. Patel, K. Patel, A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN), Clin. eHealth (2023).
[3] Y. Ma, J. Tang, Deep Learning on Graphs, Cambridge University Press, 2021.
[4] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012).
[5] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, http://dx.doi.org/10.48550/arXiv.1409.1556, arXiv.
[6] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A convnet for the 2020s, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976–11986.
[7] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[8] M. Tan, Q. Le, Efficientnetv2: Smaller models and faster training, in: International Conference on Machine Learning, PMLR, 2021, pp. 10096–10106.
[9] B. Zoph, Q.V. Le, Neural architecture search with reinforcement learning, 2016, http://dx.doi.org/10.48550/arXiv.1611.01578, arXiv preprint arXiv:1611.01578.
[10] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., Searching for mobilenetv3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[11] C.-Y. Wang, H.-Y.M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, I.-H. Yeh, CSPNet: A new backbone that can enhance learning capability of CNN, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 390–391.
[12] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, et al., Deep high-resolution representation learning for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 43 (10) (2020) 3349–3364, http://dx.doi.org/10.1109/TPAMI.2020.2983686.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[14] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[15] Q. Liu, M. Kampffmeyer, R. Jenssen, et al., Self-constructing graph convolutional networks for semantic labeling, in: IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, IEEE, 2020, pp. 1801–1804, http://dx.doi.org/10.1109/IGARSS39084.2020.9324719.
[16] Y. Singh, C. Farrelly, Q.A. Hathaway, A. Choudhary, G. Carlsson, B. Erickson, T. Leiner, The role of geometry in convolutional neural networks for medical imaging, Mayo Clin. Proc. Dig. Health 1 (4) (2023) 519–526.
[17] M.M. Bronstein, J. Bruna, T. Cohen, P. Veličković, Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021, arXiv preprint arXiv:2104.13478.
[18] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, et al., Graph attention networks, stat 1050 (20) (2017) 10–48550.
[19] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks? 2018, arXiv preprint arXiv:1810.00826.
[20] D. Buterez, J.P. Janet, S.J. Kiddle, D. Oglic, P. Liò, Graph neural networks with adaptive readouts, Adv. Neural Inf. Process. Syst. 35 (2022) 19746–19758.
[21] W. Duan, J. Xuan, M. Qiao, J. Lu, Learning from the dark: boosting graph convolutional neural networks with diverse negative samples, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 6, 2022, pp. 6550–6558.
[22] F.M. Bianchi, D. Grattarola, L. Livi, C. Alippi, Graph neural networks with convolutional arma filters, IEEE Trans. Pattern Anal. Mach. Intell. 44 (7) (2021) 3496–3507, http://dx.doi.org/10.1109/TPAMI.2021.3054830.
[23] N. Tremblay, P. Gonçalves, P. Borgnat, Design of graph filters and filterbanks, in: Cooperative and Graph Signal Processing, Elsevier, 2018, pp. 299–324.
[24] V. Vasudevan, M. Bassenne, M.T. Islam, L. Xing, Image classification using graph neural network and multiscale wavelet superpixels, Pattern Recognit. Lett. 166 (2023) 89–96, http://dx.doi.org/10.1016/j.patrec.2023.01.003.
[25] M. Krzywda, S. Łukasik, A.H. Gandomi, Graph neural networks in computer vision-architectures, datasets and common approaches, in: 2022 International Joint Conference on Neural Networks, IJCNN, IEEE, 2022, pp. 1–10, http://dx.doi.org/10.1109/IJCNN55064.2022.9892658.
[26] S. De, R.J. Stanley, C. Lu, R. Long, S. Antani, G. Thoma, R. Zuna, A fusion-based approach for uterine cervical cancer histology image classification, Comput. Med. Imag. Graph. 37 (7–8) (2013) 475–487, http://dx.doi.org/10.1016/j.compmedimag.2013.08.001.
[27] Z. Zhuang, Z. Yang, A.N.J. Raj, C. Wei, P. Jin, S. Zhuang, Breast ultrasound tumor image classification using image decomposition and fusion based on adaptive multi-model spatial feature fusion, Comput. Methods Programs Biomed. 208 (2021) 106221, http://dx.doi.org/10.1016/j.cmpb.2021.106221.
[28] D.M. Ibrahim, N.M. Elshennawy, A.M. Sarhan, Deep-chest: Multi-classification deep learning model for diagnosing COVID-19, pneumonia, and lung cancer chest diseases, Comput. Biol. Med. 132 (2021) 104348, http://dx.doi.org/10.1016/j.compbiomed.2021.104348.
[29] S.B. Yengec-Tasdemir, Z. Aydin, E. Akay, S. Dogan, B. Yilmaz, An effective colorectal polyp classification for histopathological images based on supervised contrastive learning, Comput. Biol. Med. 172 (2024) 108267, http://dx.doi.org/10.1016/j.compbiomed.2024.108267.
[30] W. Duan, Y. Chen, Q. Zhang, X. Lin, X. Yang, Refined tooth and pulp segmentation using U-Net in CBCT image, Dentomaxillofacial Radiol. 50 (6) (2021) 20200251.
[31] O.N. Manzari, H. Ahmadabadi, H. Kashiani, S.B. Shokouhi, A. Ayatollahi, MedViT: a robust vision transformer for generalized medical image classification, Comput. Biol. Med. 157 (2023) 106791, http://dx.doi.org/10.1016/j.compbiomed.2023.106791.
[32] J. Yang, R. Shi, D. Wei, Z. Liu, L. Zhao, B. Ke, H. Pfister, B. Ni, Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification, Sci. Data 10 (1) (2023) 41, http://dx.doi.org/10.1038/s41597-022-01721-8.
[33] Y. Yue, Z. Li, Medmamba: Vision mamba for medical image classification, 2024, http://dx.doi.org/10.48550/arXiv.2403.03849, arXiv preprint arXiv:2403.03849.
[34] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M.M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model cnns, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5115–5124.
[35] P.H. Avelar, A.R. Tavares, T.L. da Silveira, C.R. Jung, L.C. Lamb, Superpixel image classification with graph attention networks, in: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images, SIBGRAPI, IEEE, 2020, pp. 203–209, http://dx.doi.org/10.1109/SIBGRAPI51738.2020.00035.
[36] D. Yao, Z. Zhi-li, Z. Xiao-feng, C. Wei, H. Fang, C. Yao-ming, W.-W. Cai, Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification, Defence Technol. 23 (2023) 164–176, http://dx.doi.org/10.1016/j.dt.2022.02.007.
[37] K. Han, Y. Wang, J. Guo, Y. Tang, E. Wu, Vision gnn: An image is worth graph of nodes, Adv. Neural Inf. Process. Syst. 35 (2022) 8291–8303.
[38] Z. Fei, J. Guo, H. Gong, L. Ye, E. Attahi, B. Huang, A GNN architecture with local and global-attention feature for image classification, IEEE Access (2023) http://dx.doi.org/10.1109/ACCESS.2023.3285246.
[39] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9, http://dx.doi.org/10.1109/CVPR.2015.7298594.
[40] X. Ding, X. Zhang, J. Han, G. Ding, Scaling up your kernels to 31 × 31: Revisiting large kernel design in cnns, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11963–11975, http://dx.doi.org/10.1109/CVPR52688.2022.01166.
[41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[42] T.N. Kipf, M. Welling, Variational graph auto-encoders, 2016, http://dx.doi.org/10.48550/arXiv.1611.07308, arXiv preprint arXiv:1611.07308.
[43] M. Defferrard, X. Bresson, P. Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, Adv. Neural Inf. Process. Syst. 29 (2016).
[44] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, http://dx.doi.org/10.48550/arXiv.1412.6980, arXiv preprint arXiv:1412.6980.
[45] Q. Han, M. Hou, H. Wang, C. Wu, S. Tian, Z. Qiu, B. Zhou, EHDFL: Evolutionary hybrid domain feature learning based on windowed fast Fourier convolution pyramid for medical image classification, Comput. Biol. Med. 152 (2023) 106353, http://dx.doi.org/10.1016/j.compbiomed.2022.106353.
[46] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16 × 16 words: Transformers for image recognition at scale, 2020, http://dx.doi.org/10.48550/arXiv.2010.11929, arXiv preprint arXiv:2010.11929.
[47] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, Y. Li, Maxvit: Multi-axis vision transformer, in: European Conference on Computer Vision, Springer, 2022, pp. 459–479, http://dx.doi.org/10.1007/978-3-031-20053-3_27.
[48] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, et al., Swin transformer v2: Scaling up capacity and resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12009–12019, http://dx.doi.org/10.1109/CVPR52688.2022.01170.
[49] H. Touvron, M. Cord, H. Jego, DeiT III: Revenge of the ViT, in: European Conference on Computer Vision, 2022, pp. 516–533.
[50] J. Wang, S. Zhang, Y. Liu, T. Wu, Y. Yang, X. Liu, K. Chen, P. Luo, D. Lin, RIFormer: Keep your vision backbone effective but removing token mixer, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14443–14452, http://dx.doi.org/10.1109/CVPR52729.2023.01388.
[51] Y. Rao, W. Zhao, Y. Tang, J. Zhou, S.-N. Lim, J. Lu, HorNet: Efficient high-order spatial interactions with recursive gated convolutions, Adv. Neural Inf. Process. Syst. 35 (2022) 10353–10366.
[52] MMPreTrain Contributors, Openmmlab’s pre-training toolbox and benchmark, 2023.
[53] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.
[54] M. Brockschmidt, Gnn-film: Graph neural networks with feature-wise linear modulation, in: International Conference on Machine Learning, PMLR, 2020, pp. 1144–1152.
[55] X. Wang, M. Zhang, How powerful are spectral graph neural networks, in: International Conference on Machine Learning, PMLR, 2022, pp. 23341–23362.
[56] D. Bo, X. Wang, C. Shi, H. Shen, Beyond low-frequency information in graph convolutional networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 5, 2021, pp. 3950–3957.
[57] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626, http://dx.doi.org/10.1109/ICCV.2017.74.
[58] R.J. Ellis, R.M. Sander, A. Limon, Twelve key challenges in medical machine learning and solutions, Intell. Based Med. 6 (2022) 100068, http://dx.doi.org/10.1016/j.ibmed.2022.100068.
[59] E.H. Weissler, T. Naumann, T. Andersson, R. Ranganath, O. Elemento, Y. Luo, D.F. Freitag, J. Benoit, M.C. Hughes, F. Khan, et al., The role of machine learning in clinical research: transforming the future of evidence generation, Trials 22 (2021) 1–15, http://dx.doi.org/10.1186/s13063-021-05489-x.

Adrian S. Remigio received a B.Sc. degree in applied physics from the University of the Philippines Manila in 2017 and a master's degree in medical physics from the Royal Melbourne Institute of Technology University, Melbourne, Australia in 2022. He is currently a data scientist at the Analytics, Computing, and Complex Systems laboratory under the Asian Institute of Management. His research interests include computer vision, machine and deep learning, and differential equations applied to the biological and medical physics domains, such as imaging-based survival analysis and inter-patient image registration.

10 pages
Optimization of The Convolutional Neural Networks For Automatic Detection of Skin Cancer2020Open Medicine PolandOpen Access
No ratings yet
Optimization of The Convolutional Neural Networks For Automatic Detection of Skin Cancer2020Open Medicine PolandOpen Access
11 pages
27_A novel CNN Architecture with an efficient Channelization for Histopathological Medical Image Classification
No ratings yet
27_A novel CNN Architecture with an efficient Channelization for Histopathological Medical Image Classification
21 pages
1 s2.0 S0957417418307450 Main
No ratings yet
1 s2.0 S0957417418307450 Main
12 pages
ELYAN 2022 Computer Vision and Machine (VOR)
No ratings yet
ELYAN 2022 Computer Vision and Machine (VOR)
23 pages
icces51560.2020.9334594
No ratings yet
icces51560.2020.9334594
6 pages
A Deep Learning Paradigm For Medical Imagin - 2024 - Expert Systems With Applica
No ratings yet
A Deep Learning Paradigm For Medical Imagin - 2024 - Expert Systems With Applica
9 pages
Medical Image Segmentation With 3D Convolutional Neural Networks: A Survey
No ratings yet
Medical Image Segmentation With 3D Convolutional Neural Networks: A Survey
34 pages
Neural Architecture Search For Skin Lesion Classification
No ratings yet
Neural Architecture Search For Skin Lesion Classification
11 pages
Image
No ratings yet
Image
13 pages
Literature Survey: Performance Comparision of Residual Deep Network For The Brain Tumor Detection
No ratings yet
Literature Survey: Performance Comparision of Residual Deep Network For The Brain Tumor Detection
19 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
Loki 2
No ratings yet
Loki 2
15 pages
Final Research Paper
No ratings yet
Final Research Paper
5 pages
CB Insights - Generative AI Bible
No ratings yet
CB Insights - Generative AI Bible
122 pages
Trackformer
No ratings yet
Trackformer
16 pages
LLMs - Liquid Foundation Models
No ratings yet
LLMs - Liquid Foundation Models
13 pages
Indian Sign Language Interpretation and Sentence Formation: Disha Gangadia Varsha Chamaria Vidhi Doshi
No ratings yet
Indian Sign Language Interpretation and Sentence Formation: Disha Gangadia Varsha Chamaria Vidhi Doshi
6 pages
Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4
No ratings yet
Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4
12 pages
DIGU
No ratings yet
DIGU
26 pages
Thesis Aico Schreurs Cito
No ratings yet
Thesis Aico Schreurs Cito
42 pages
A Survey On Multimodal Bidirectional Machine Learning Translation of Image and Natural Language Processing
No ratings yet
A Survey On Multimodal Bidirectional Machine Learning Translation of Image and Natural Language Processing
14 pages
Paper Review
No ratings yet
Paper Review
31 pages
Parameter-Efficient Fine-Tuning For Large Models: A Comprehensive Survey
No ratings yet
Parameter-Efficient Fine-Tuning For Large Models: A Comprehensive Survey
42 pages
PGP Aiml2024
No ratings yet
PGP Aiml2024
22 pages
Scaling Down To Scale Up: A Guide To Parameter-Efficient Fine-Tuning
No ratings yet
Scaling Down To Scale Up: A Guide To Parameter-Efficient Fine-Tuning
21 pages
Large Language Models
100% (1)
Large Language Models
23 pages
Natural Language Processingand Sentiment Analysis
No ratings yet
Natural Language Processingand Sentiment Analysis
15 pages
Towards Greener LLMS: Bringing Energy-Efficiency To The Forefront of LLM Inference
No ratings yet
Towards Greener LLMS: Bringing Energy-Efficiency To The Forefront of LLM Inference
6 pages
Epgp ML Ai 1706605342150
No ratings yet
Epgp ML Ai 1706605342150
27 pages
Transformers - Intuitively and Exhaustively Explained - by Daniel Warfield - Towards Data Science
No ratings yet
Transformers - Intuitively and Exhaustively Explained - by Daniel Warfield - Towards Data Science
38 pages
Liao and Smidt - 2022 - Equiformer Equivariant Graph Attention Transformer
No ratings yet
Liao and Smidt - 2022 - Equiformer Equivariant Graph Attention Transformer
25 pages
Financial Time Series
No ratings yet
Financial Time Series
34 pages
Deep Learning Book PDF
No ratings yet
Deep Learning Book PDF
272 pages
KECReport
No ratings yet
KECReport
23 pages
AI Powered Microscopy Image Analysis For Parasitol
No ratings yet
AI Powered Microscopy Image Analysis For Parasitol
14 pages
Copia de Chat GPT Seo KINDLE
No ratings yet
Copia de Chat GPT Seo KINDLE
159 pages
Multimodal Fusion Research Papers Survey
No ratings yet
Multimodal Fusion Research Papers Survey
1 page
2023 Attention transformer mechanism and fusion-based deep learning architecture for MRI brain tumor classification system
No ratings yet
2023 Attention transformer mechanism and fusion-based deep learning architecture for MRI brain tumor classification system
13 pages
ChatGPT Is Not All You Need. A State of The Art Review of Large Generative AI Models
No ratings yet
ChatGPT Is Not All You Need. A State of The Art Review of Large Generative AI Models
22 pages