IncARMAG: A convolutional neural network with multi-level autoregressive moving average graph convolutional processing framework for medical image classification
Neurocomputing
Communicated by Q. Huang

Keywords: Medical image classification; Convolutional neural network; Graph convolution; Autoregressive moving average convolution; Feature embedding

Effective computer-aided detection and diagnosis algorithms are being sought to carry out complex pattern recognition with great precision while increasing efficiency and robustness through automation. In this study, we leveraged self-constructing graph (SCG) and autoregressive moving average (ARMA) graph convolution to postprocess multi-level feature maps obtained from an Inception V3 network for medical image classification. The adopted SCG learns latent embeddings from the feature maps, which are subsequently used to construct the graph topology based on embedding similarity. The ARMA convolution then learns to encode more flexible spectral filter responses that simultaneously take into account long-range dependencies from the SCG graph. The proposed IncARMAG model was evaluated on 2D MedMNIST datasets, which consist of several imaging modalities and classification tasks in the medical domain. Results from our experiments on MedMNIST datasets showed that IncARMAG outperformed state-of-the-art models in many cases. IncARMAG also achieved the highest accuracy in an evaluation against various benchmark CNN, transformer, and hybrid models on four selected datasets. These results highlight IncARMAG as a strong candidate for medical image analysis involving classification tasks.
https://doi.org/10.1016/j.neucom.2024.129038
Received 13 June 2024; Received in revised form 11 November 2024; Accepted 22 November 2024
Available online 30 November 2024
0925-2312/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
A.S. Remigio Neurocomputing 617 (2025) 129038
The advent of geometric deep learning has provided the benefits of topological data analysis for improved performance in high-dimensional learning, including computer vision [16]. For example, better representations can be identified by imposing properties from geometric priors such as symmetries, isomorphisms, and deformation stability [17]. Geometric learning can also establish topological structures that augment the low-dimensional projection with valuable information. Graph neural networks (GNNs) are a subset of geometric deep learning dedicated to representation learning on graphs [3]. A standard graph convolutional network (GCN) layer applies a simple linear transformation of features followed by a summation of the transformed features over all one-hop neighbors. The graph attentional operator (GAT) replaces the simple summation in the GCN with a weighted summation in which the weights are simultaneously learned by the GNN model [18]. Other graph convolutions take advantage of neural networks, such as a multilayer perceptron (MLP) or a gated recurrent unit (GRU), for potentially more expressive feature transformation [19,20]. The use of negative node samples in GNNs through a determinantal point process has also been explored and has shown the benefit of using negative relations [21].

Autoregressive moving average (ARMA) graph filters are built to approximate a spectral response function that is more flexible than standard polynomial graph filters, such as the Chebyshev filters [22]. Graph spectral frequency decomposition via application of the Laplacian operator indicates the variation in features between connected nodes. ARMA graph filters approximate a particular target response better at higher frequencies than a classical polynomial filter [23]. Thus, ARMA convolution can generate precise filters that extend to graphs with higher spectral frequencies, that is, graphs with edges between highly dissimilar node features.

GNNs can also handle regular grid-structured data by finding and constructing relational information among the spatial grids. One standard approach for extracting graph structure from images and/or feature maps is the use of superpixel clustering methods, which rely on distance or feature similarity measures. The preprocessing steps of graph construction from images usually involve superpixel construction and/or prior application of convolutional feature extraction to form the graph topology [24,25]. Self-constructing graph (SCG) is an algorithm that constructs the graph topology from the downsampled feature map by learning an embedding representation that is jointly optimized with the entire model [15]. SCG forms node connections based on the similarity of the latent embeddings, which enables it to learn non-local context information from long-range node connections. SCG was found to perform well on semantic image labeling and segmentation compared with purely CNN models.

Motivated by the advantages of learning relational information among feature maps [25], a modified hybrid CNN-GNN model for medical image classification was introduced in this study. As emphasized, CNN models have smaller receptive fields that limit learning global feature interactions per layer, specifically in earlier layers. Although attention mechanisms have overcome this problem, their inductive bias suffers in terms of locality and translation equivariance. Hybrid CNN-GNNs are a potential approach to addressing this; the use of a CNN for preprocessing features, together with the node and edge permutation invariance of undirected GNNs, helps preserve inductive bias while allowing long-range dependencies to be captured. Self-constructing graphs (SCG) were leveraged in order to derive abstract graph topologies optimized to the task at hand. The ARMA convolution was then applied to extract better feature embeddings corresponding to subgraphs with high spectral frequencies. Combining all the components, we henceforth refer to this architecture as IncARMAG. The key contributions of this paper are as follows:

1. The model architecture proposed in this study utilizes multi-level feature map refinement with GNNs.
2. Instead of conventional GCNs, we used autoregressive moving average (ARMA) graph convolutions for the GNN-based processing of the multi-level graph topology.
3. Our proposed model was shown to outperform many state-of-the-art (SOTA) image classification models on many medical image classification datasets.
4. IncARMAG showed superior performance on most of the datasets compared to SOTA classification models.

2. Related works

2.1. Medical image classification

As vision models continue to advance, their application in the medical field becomes increasingly prevalent. Classification algorithms that were originally developed and/or benchmarked in other domains are continuously being adopted in the medical field. For example, ResNet models have been used in numerous medical imaging classification tasks such as cervical cancer identification [26], breast cancer detection [27], and pulmonary disease diagnosis [28]. Other models and learning methods have played prominent roles in advancing medical image classification. An important example is the development and/or integration of multiple architectural modules so that their benefits complement one another. Yengec-Tasdemir et al. [29] employed contrastive learning to improve the performance of various CNN models for polyp classification. Models developed for image classification are often utilized for more complex vision tasks, wherein they serve as backbone or feature extraction networks. One example of such an application in medical imaging is the use of a ResNet module in a feature pyramid network for tooth and pulp instance segmentation [30].

MedViT is a hybrid CNN-transformer model that combines the superior inductive bias of convolution with the global receptive field of the attention mechanism [31]. Its evaluation on MedMNIST datasets, a repository of 2D and 3D image datasets from different medical modalities [32], showed a superior average accuracy of MedViT over ResNet, auto-sklearn, AutoKeras, and Google AutoML. Similarly, MedMamba models global relations within the multidirectional input images or feature maps via a multi-branch model of convolution and a structured state-space sequence model [33]. This state-space sequence model is based on an iterative solution of a linear ordinary differential equation (ODE), which can also be generalized into a single 1-D global convolution.

2.2. GNN networks for image classification

As emphasized, graph representations can be obtained directly from images or feature maps through various methods. In the work of Monti et al. [34], learnable Gaussian kernels were applied to determine the connections based on geometric distances of superpixels in images. Using clustering methods, such as SLIC and SEED, it is possible to directly generate graph representations from images. Avelar et al. used K-nearest neighbors (KNN) and the SLIC clustering algorithm to generate the graph topology from the images. Graph attention network (GAT) layers were then used to encode node embeddings from the input superpixel graphs [35]. A hybrid GNN architecture of GCN, GraphSAGE, and ARMA convolution filters following SLIC clustering was proposed by Yao et al. [36] for hyperspectral image classification.

Numerous GNN model designs for vision tasks incorporate CNN or transformer architectures to optimize deep feature extraction. Contextual information from a series of images was modeled through a combination of CNN and GNN in the work of Campos et al. [37]. Node features and edges were calculated based on the initial CNN feature maps of bounding boxes, followed by subgraph integration and subsequent GNN layer operations. Vision GNN (ViG) stands as one of the most successful applications of GNN models in image classification. ViG
introduced the grapher module, which better alleviates over-smoothing compared to the standard GCN layer by adding a nonlinear activation, a residual connection, and an MLP layer [38]. Based on benchmarking experiments on the ImageNet dataset, ViG outperformed representative CNN, MLP, and transformer models. In the study of Fei et al. [38], a high-level feature map from a ResNet backbone is treated as an input to a multi-branch network comprising both local and global feature extractors. The local feature extractor has a simple convolution and MLP framework, whereas the global feature extractor path consists of GCN layers and multi-head self-attention pooling.

3. Methods

An overview of the proposed IncARMAG architecture is illustrated in Fig. 1. Details of each component of the IncARMAG model are discussed in the following subsections.

Fig. 1. The proposed IncARMAG architecture, consisting of the following stages: (1) an Inception V3 backbone, (2) self-constructing graph module, (3) ARMA GNN processing, and (4) weighted concatenation and classifier head.

3.1. Deep feature extraction

The original motivation behind the Inception architecture is to capture the correlation patterns of feature maps at multiple scales, with the kernel size representing the range over which the correlated units are distributed [39]. Another advantage of the Inception block is the reduction of computational cost via depthwise filter concatenation and dimensionality reduction. In the improved Inception V3 model, computational cost was further reduced by factorizing the 5 × 5 convolutional kernel into a series of smaller and spatially asymmetric kernels [40].

Our backbone architecture is based on Inception V3, excluding the final two Inception blocks and the fully connected head classifiers. Reducing computational cost was the primary reason behind the omission of the final two layers. It was also argued that adding auxiliary classifiers that branch from low-level Inception blocks slightly boosts model performance, which can be caused by either the refinement of low-level features or the regularization by the auxiliary heads [41]. This inspired us to investigate the effect of replacing simple MLP layers with GNNs as auxiliary classifiers. Our IncARMAG architecture has three GNN processing branches that take feature maps from different stages of the Inception V3 backbone. We denote these feature maps as $F_\alpha$, $F_\beta$, and $F_\gamma$. $F_\gamma \in \mathbb{R}^{7 \times 7 \times 2048}$ corresponds to the feature map of the last Inception block, whereas $F_\alpha$ and $F_\beta$ are from the preceding stages of the backbone network. Two variants of the IncARMAG model with different depth locations of $F_\alpha$ and $F_\beta$ were developed in this study.

3.2. Self-constructing graph topology

Prior to applying graph convolution to the extracted deep features, the feature maps must be converted into their representative graph topology. In general, a graph can be defined by an ordered pair, $G = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V}$ and $\mathcal{E}$ are the sets of nodes and edges, respectively. The graph topology can also be represented via the adjacency matrix, $A \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$, where $|\mathcal{V}|$ is the number of nodes. The SCG algorithm, proposed by Liu et al. [15], was employed to automatically learn the adjacency matrix given the input feature map.

Fig. 2 provides a schematic description of the SCG module [15]. For any feature map, denoted here simply as $F$, an adaptive average
pooling was used to reduce the dimensions of the feature map from $F \in \mathbb{R}^{n \times n \times c}$ to $F' \in \mathbb{R}^{n' \times n' \times c}$, where $c$ is the number of convolutional filters, while $n$ and $n'$ are the feature dimensions before and after pooling, respectively. The pooled features are considered as node features in the graph, wherein each node has a node feature equal to the depth-wise channel vector at a particular 2D location in the feature map. This is equivalent to reshaping the pooled feature map: $F' \in \mathbb{R}^{n' \times n' \times c} \rightarrow \mathbb{R}^{N \times c}$, where $N = |\mathcal{V}| = n' \times n'$.

Fig. 2. The operations involved in the SCG algorithm. SCG generates a graph, defined by a node feature matrix and an adjacency matrix, as the output. In addition, the residual predictions from SCG are added to the final GNN layers.

Generating the adjacency graph via SCG requires embedding the feature map into a latent space, parameterized by mean, $\mu$, and standard deviation, $\sigma$, matrices. The mean matrix, $\mu$, is encoded from the pooled features using a convolution with 3 × 3 kernel size, whereas a 1 × 1 convolution with exponential activation is applied to obtain the $\sigma$ matrix. These matrices encode a latent embedding, expressed in the form:

$$Z = \mu + \sigma \cdot \mathcal{N}(0, I) \;\rightarrow\; \mathrm{Conv}_{3 \times 3}(F') + \exp\left[\mathrm{Conv}_{1 \times 1}(F')\right] \cdot \mathcal{N}(0, I) \tag{1}$$

where $\mathcal{N}(0, I)$ is Gaussian-distributed noise present during training time. A connection between two nodes is established if the inner product of their projected embeddings is positive, leading to an initial graph adjacency matrix given by:

$$A = \mathrm{ReLU}\left(Z Z^{T}\right) \tag{2}$$

where $\mathrm{ReLU}$ is the rectified linear unit activation function [42]. Compared to the original graph auto-encoders, SCG introduced diagonal enhancement through the adaptive factor:

$$\gamma = \sqrt{1 + \frac{N}{\sum_{i=1}^{N} A_{ii} + \epsilon}} \tag{3}$$

where $A_{ii}$ are the diagonal entries of the adjacency matrix, and $\epsilon$ is a smoothing factor. Combining the diagonal enhancement with the adjacency matrix normalization, the output adjacency matrix from SCG is:

$$\hat{A} = D^{-\frac{1}{2}} \left(A + \gamma \cdot \mathrm{diag}(A) + I\right) D^{-\frac{1}{2}} \tag{4}$$

where $D$ is the degree matrix with diagonal entries equal to the degree of the corresponding node.

In addition to learning the graph from the feature maps, a residual prediction term, given by:

$$\hat{y} = \gamma \cdot \mu \left(1 - \log \sigma\right) \tag{5}$$

is added to the processed node embeddings by the ARMA convolution layers. Thus, the output of the SCG block for each feature map includes both a graph, $G = \{F', \hat{A}\}$, and a residual prediction term, $\hat{y}$.

3.3. Autoregressive moving average GNN

In the ARMA GNN stage of our proposed architecture, the formulation of Bianchi et al. [22] was used. The schematic diagram of the flow of node feature processing following SCG is illustrated in Fig. 3. Graph convolution can be treated in the graph spectral domain by applying a filter response function, $h(\lambda)$, where $\lambda$ is an eigenvalue of the Laplacian matrix, $L = D - A$. Graph filtering transforms an input graph signal with a single feature per node, $X \in \mathbb{R}^{N}$, into a filtered signal, $\tilde{X}$, via:

$$\tilde{X} = U \cdot H_d(\lambda) \cdot U^{T} X \tag{6}$$

where $U$ is the eigenvector matrix of $L$, and $H_d(\lambda)$ is the element-wise application of $h$ on the diagonal matrix of eigenvalues, $\lambda$ [3].

Because eigendecomposition is computationally expensive and the resulting embedding is not localized, the polynomial filter, $h_{\mathrm{poly}}(\lambda) = \sum_{k=0}^{K} \omega_k \lambda^{k}$, was introduced, and the graph filtering relation in Eq. (6) becomes:

$$\tilde{X} = \sum_{k=0}^{K} \omega_k L^{k} X \tag{7}$$

For multi-channel node features, the weight, $\omega_k$, in Eq. (7) is replaced by a weight matrix, $W_k$. Several polynomial filters, such as the Chebyshev filter, were introduced to better model the desired filter responses while minimizing overfitting in the extrapolated frequency range.

As graphs can be interpreted in the spectral domain through the eigenvalues of the Laplacian matrix, several graph convolution operators based on polynomial filters were developed. One example is the Chebyshev polynomial filter, which replaces the simple $L^{k}$ with the Chebyshev function, $T_k(L)$ [43]. The graph convolution network (GCN) is a simplification of the Chebyshev spectral filtering with $K = 1$. The autoregressive moving average (ARMA) filter is based on the filter response relation:

$$X^{(l+1)} = \left(I + \sum_{k=0}^{K} q_k L^{k}\right)^{-1} \left(\sum_{k=0}^{K} p_k L^{k}\right) X^{(l)} \tag{8}$$

where $X^{(l)}$ and $X^{(l+1)}$ are the node feature embeddings at convolutional layers $l$ and $l + 1$, respectively; $q_k$ and $p_k$ are filter coefficients [23].
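For a small graph, the rational filter in Eq. (8) can be evaluated directly and compared against the equivalent spectral-domain form of Eq. (6). The NumPy sketch below is illustrative only: the function names, the 4-node path graph, and the coefficient values `p` and `q` are assumptions made for this example and are not taken from the IncARMAG implementation.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def matrix_poly(L, coeffs):
    """Evaluate sum_k coeffs[k] * L^k for k = 0..K."""
    out = np.zeros_like(L)
    power = np.eye(L.shape[0])
    for c in coeffs:
        out = out + c * power
        power = power @ L
    return out

def arma_direct(L, x, p, q):
    """Eq. (8): solve (I + sum_k q_k L^k) y = (sum_k p_k L^k) x."""
    lhs = np.eye(L.shape[0]) + matrix_poly(L, q)
    return np.linalg.solve(lhs, matrix_poly(L, p) @ x)

def arma_spectral(L, x, p, q):
    """Same filter applied through the eigendecomposition of L, as in Eq. (6)."""
    lam, U = np.linalg.eigh(L)
    num = sum(c * lam**k for k, c in enumerate(p))
    den = 1.0 + sum(c * lam**k for k, c in enumerate(q))
    return U @ ((num / den) * (U.T @ x))

# Hypothetical 4-node path graph and filter coefficients (illustrative values).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
L = laplacian(adj)
x = np.array([1.0, -2.0, 0.5, 3.0])
p = [0.5, 0.2, 0.1]
q = [0.0, 0.3, 0.05]

y_direct = arma_direct(L, x, p, q)
y_spectral = arma_spectral(L, x, p, q)
print(np.allclose(y_direct, y_spectral))
```

Both routes require either a dense linear solve or a full eigendecomposition, which is only practical for small graphs such as this one.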
Fig. 3. Schematic flow of node features in the ARMA convolution layer with an arbitrary number of recursions, $T$, and number of layers, $M$.

Table 1
Details on the architectural design of the proposed model.
| Hyper-parameter | $F_\alpha$ | $F_\beta$ | $F_\gamma$ |
| --- | --- | --- | --- |
| SCG average pooling output size | (12, 12) | (12, 12) | (5, 5) |
| SCG number of channels for $\mu$ and $\sigma$ | 512 | 512 | 1024 |
| Number of channels for each ARMA | 128 | 128 | 256 |
| Dropout rate for ARMA | 0.3 | 0.3 | 0.3 |
| Activation function used in ARMA | ReLU | ReLU | ReLU |
| Number of channels for MLP layer | 128 | 128 | 256 |
| Weighted concatenation coefficient | $\alpha = 0.25$ | $\beta = 0.5$ | – |

However, to avoid the intractable inverse matrix operation, the inverse term is approximated by an autoregressive term with an additional skip connection. The resulting ARMA convolution is given by the recursion relation:

$$X_{(l)}^{(t+1)} = \sigma\left(\hat{L} X_{(l)}^{(t)} W_l + X_{(l)}^{(0)} V_l\right) \tag{9}$$

where $X_{(l)}^{(t)}$ is the node embedding estimation at iteration $t$ and layer $l$; $W$ and $V$ are learnable weight matrices; and $\hat{L}$ is the modified Laplacian.
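The recursion in Eq. (9) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the random weights, and the choice of $\hat{L} = D^{-1/2} A D^{-1/2}$ as the modified Laplacian are assumptions for the example. ReLU is used as the activation, following Table 1, and the input and output channel counts are kept equal so that a single $W$ can be reused at every iteration.

```python
import numpy as np

def normalized_adjacency(adj):
    """Assumed form of the modified Laplacian: D^{-1/2} A D^{-1/2}."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def arma_recursion(L_hat, X0, W, V, num_iters):
    """Iterate X^(t+1) = ReLU(L_hat @ X^(t) @ W + X^(0) @ V), starting at X^(0)."""
    X = X0
    for _ in range(num_iters):
        X = np.maximum(L_hat @ X @ W + X0 @ V, 0.0)
    return X

# Hypothetical 4-node graph with 3 feature channels per node.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
L_hat = normalized_adjacency(adj)
X0 = rng.normal(size=(4, 3))       # initial node features X^(0)
W = 0.1 * rng.normal(size=(3, 3))  # weight on the propagated term
V = rng.normal(size=(3, 3))        # weight on the skip connection

out = arma_recursion(L_hat, X0, W, V, num_iters=4)
print(out.shape)  # (4, 3)
```

The skip term `X0 @ V` is re-injected at every iteration, so the initial node features keep contributing to the output regardless of how many recursions are run.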
3.4. Feature aggregation and classification head

The output node features from each feature processing level are subjected to average pooling to yield a single feature vector. Feature vectors $h_\alpha$, $h_\beta$, and $h_\gamma$ correspond to the pooled node features following the GNN processing of $F_\alpha$, $F_\beta$, and $F_\gamma$, respectively. Each feature vector is passed through a single fully connected layer (denoted MLP) for feature post-processing. The feature vectors from the three GNN processing levels are combined through weighted concatenation:

$$h' = \mathrm{Concat}\left[\frac{\alpha}{\alpha + \beta + 1}\,\mathrm{MLP}(h_\alpha),\; \frac{\beta}{\alpha + \beta + 1}\,\mathrm{MLP}(h_\beta),\; \mathrm{MLP}(h_\gamma)\right] \tag{11}$$

Ultimately, the concatenated feature vector, $h'$, proceeds to the output softmax layer to yield the prediction score per class.

4. Experimental results

4.1. Architecture and training implementation

A summary of the standard architecture is shown in Table 1. To evaluate our proposed model in medical imaging, we used eight 2D imaging datasets from the MedMNIST V2 repository [32]. The datasets used in our experiments are defined in Table 2. MedMNIST provides an excellent platform for evaluating classification models in medical imaging due to its diverse range of modalities. DermaMNIST involves the classification of several types of pigmented skin lesions from dermatoscope images. PneumoniaMNIST, OCTMNIST, and BreastMNIST are classification tasks for detecting the presence of a disease from pediatric X-ray, retinal optical coherence tomography (OCT), and breast ultrasound images, respectively. Normal peripheral blood cell classification using cell microscopy (BloodMNIST) was also included as a benchmark dataset.
Table 3
Comparison of the performance of IncARMAG with various published models that were evaluated on DermaMNIST, OCTMNIST, PneumoniaMNIST, and BreastMNIST datasets.
| Model | DermaMNIST AUC | DermaMNIST ACC | OCTMNIST AUC | OCTMNIST ACC | PneumoniaMNIST AUC | PneumoniaMNIST ACC | BreastMNIST AUC | BreastMNIST ACC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet18 [32] | 0.920 | 0.754 | 0.958 | 0.763 | 0.956 | 0.864 | 0.891 | 0.833 |
| ResNet50 [32] | 0.912 | 0.731 | 0.958 | 0.776 | 0.962 | 0.884 | 0.857 | 0.812 |
| auto-sklearn [32] | 0.902 | 0.719 | 0.887 | 0.601 | 0.942 | 0.855 | 0.836 | 0.803 |
| AutoKeras [32] | 0.915 | 0.749 | 0.955 | 0.763 | 0.947 | 0.878 | 0.871 | 0.831 |
| Google AutoML Vision [32] | 0.914 | 0.768 | 0.963 | 0.771 | 0.991 | 0.946 | 0.919 | 0.861 |
| MedViT-T [31] | 0.914 | 0.768 | 0.961 | 0.767 | 0.993 | 0.949 | 0.934 | 0.896 |
| MedViT-S [31] | 0.937 | 0.780 | 0.960 | 0.782 | 0.995 | 0.961 | 0.938 | 0.897 |
| MedViT-L [31] | 0.920 | 0.773 | 0.945 | 0.761 | 0.991 | 0.921 | 0.929 | 0.883 |
| EHDFL [45] | 0.917 | 0.769 | 0.917 | 0.769 | 0.968 | 0.883 | 0.894 | 0.897 |
| IncARMAG (Ours) | 0.938 | 0.802 | 0.985 | 0.873 | 0.938 | 0.885 | 0.916 | 0.872 |
Table 4
Comparison of the performance of IncARMAG with various published models that were evaluated on BloodMNIST, OrganAMNIST, OrganCMNIST, and OrganSMNIST datasets.
| Model | BloodMNIST AUC | BloodMNIST ACC | OrganAMNIST AUC | OrganAMNIST ACC | OrganCMNIST AUC | OrganCMNIST ACC | OrganSMNIST AUC | OrganSMNIST ACC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet18 [32] | 0.998 | 0.958 | 0.997 | 0.935 | 0.994 | 0.920 | 0.974 | 0.782 |
| ResNet50 [32] | 0.997 | 0.950 | 0.998 | 0.947 | 0.993 | 0.911 | 0.975 | 0.785 |
| auto-sklearn [32] | 0.984 | 0.878 | 0.963 | 0.762 | 0.976 | 0.829 | 0.945 | 0.672 |
| AutoKeras [32] | 0.998 | 0.961 | 0.994 | 0.905 | 0.990 | 0.879 | 0.974 | 0.813 |
| Google AutoML Vision [32] | 0.998 | 0.966 | 0.990 | 0.886 | 0.988 | 0.877 | 0.964 | 0.749 |
| MedViT-T [31] | 0.996 | 0.950 | 0.995 | 0.931 | 0.991 | 0.901 | 0.972 | 0.789 |
| MedViT-S [31] | 0.997 | 0.951 | 0.996 | 0.928 | 0.993 | 0.916 | 0.987 | 0.805 |
| MedViT-L [31] | 0.996 | 0.954 | 0.997 | 0.943 | 0.994 | 0.922 | 0.973 | 0.806 |
| EHDFL [45] | – | – | 0.997 | 0.936 | 0.994 | 0.916 | 0.974 | 0.784 |
| IncARMAG (Ours) | 0.999 | 0.987 | 0.993 | 0.927 | 0.995 | 0.934 | 0.975 | 0.801 |
Lastly, datasets on organ identification in abdominal computed tomography (CT) at different body planes were used for model performance benchmarking.

Although MedMNIST comes with different image dimensions, we limited our experiments to 224 × 224 images since they closely match the dimensions utilized in real clinical environments. The images in all datasets underwent z-normalization as a preprocessing step before being passed to the classification models. All models implemented in this study were trained and tested on the same sets of training and testing images, respectively. Our proposed model was trained using the adaptive moment estimation (Adam) optimizer [44] for 100 epochs with an initial learning rate of $10^{-4}$ that decays by a factor of 0.1 at epoch milestones 50 and 75. A batch size of 16 was used for all implemented models and for all datasets in this study. A regularization decay factor of 0.0001 was used for all models except for the transformer-based models in order to avoid nonfinite loss. Data augmentation was not employed during the training phase. All implementations were carried out using PyTorch and Torchvision on multiple Nvidia GeForce GTX 1080 GPUs.

4.2. Comparison with SOTA models

Tables 3 and 4 present the accuracy (ACC) and area under the receiver operating characteristic curve (AUC) of our proposed IncARMAG across eight MedMNIST datasets. For comparison, we cite the results of studies that have carried out experiments on MedMNIST datasets. These include ResNets and autoML models from the original MedMNIST study [32], MedViT [31], and evolutionary hybrid domain feature learning (EHDFL) [45]. IncARMAG achieved the highest accuracy and AUC on DermaMNIST, OCTMNIST, BloodMNIST, and OrganCMNIST, whereas the MedViT models dominated on PneumoniaMNIST, BreastMNIST, and OrganSMNIST. IncARMAG improves accuracy by more than 2% on DermaMNIST, OCTMNIST, and BloodMNIST. The largest increase in accuracy is observed on OCTMNIST (0.873 versus 0.782 for MedViT-S). However, a performance decline is noted on BreastMNIST, PneumoniaMNIST, and OrganAMNIST when comparing IncARMAG with the benchmark models. For OrganAMNIST, ResNet50 achieved the highest accuracy and AUC. The architecture of our IncARMAG potentially matches the generalizability of MedViT in medical image classification based on the ranking statistics of performance in Tables 3 and 4. Despite not using any data augmentation techniques, our proposed model is still capable of generating reliable predictions following model training, with the exception of the pediatric pneumonia classification task in PneumoniaMNIST. IncARMAG has also surpassed the performance of autoML models and ResNet architectures on most of the MedMNIST datasets. Overall, IncARMAG delivered the best performance on 4 out of 8 datasets, with MedViT-S ranking as the top model on 3 out of 8 datasets, while ResNet50 secured the top spot on just one dataset. IncARMAG also has a slight advantage over MedViT in terms of parameter count; IncARMAG has 21.4 M parameters, while MedViT has 23 M.

To further assess the capabilities of our model, we trained and evaluated several CNN, transformer, and hybrid-type models on four MedMNIST datasets using the data preprocessing and training hyper-parameters described in Section 4.1. These models are the ResNet series [7], Inception V3 [41], EfficientNet V2 [8], ConvNext-B [6], ViT-B [46], MaxViT [47], Swin-S [48], DEIT [49], RIFormer [50], HorNet [51], and ViG [37]. These models were implemented using the torchvision and mmpretrain libraries [52]. Overall accuracy, macro recall, and macro precision are reported for each model. Overall accuracy offers a general insight into the model performance across the evaluation dataset, while macro precision and recall highlight false positive and false negative predictions, respectively, with less sensitivity to class imbalances [53].

The experimental results of our proposed and benchmarking models for the four selected MedMNIST datasets are shown in Tables 5 and 6, which show that IncARMAG outperforms the counterpart benchmarking models in terms of accuracy and precision. ResNet and Inception have consistently attained above-median performance for the four datasets. The performance of transformer-based models varies across
Fig. 4. Confusion matrix of IncARMAG predictions on the test set of (a) DermaMNIST, (b) PneumoniaMNIST, (c) BloodMNIST, and (d) BreastMNIST.
Table 5
Performance of IncARMAG and benchmarking models on DermaMNIST and PneumoniaMNIST datasets.
| Model | DermaMNIST Acc. | DermaMNIST Prec. | DermaMNIST Rec. | PneumoniaMNIST Acc. | PneumoniaMNIST Prec. | PneumoniaMNIST Rec. |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet101 | 0.757 | 0.575 | 0.569 | 0.865 | 0.826 | 0.995 |
| ResNet152 | 0.783 | 0.628 | 0.604 | 0.859 | 0.819 | 0.995 |
| Inception V3 | 0.782 | 0.587 | 0.609 | 0.864 | 0.822 | 0.997 |
| EfficientNet V2 | 0.764 | 0.597 | 0.526 | 0.852 | 0.813 | 0.995 |
| ConvNext | 0.756 | 0.583 | 0.522 | 0.872 | 0.834 | 0.992 |
| ViT-B | 0.730 | 0.493 | 0.480 | 0.830 | 0.794 | 0.977 |
| MaxViT | 0.773 | 0.629 | 0.524 | 0.851 | 0.811 | 0.992 |
| Swin-S | 0.782 | 0.587 | 0.609 | 0.625 | 0.625 | 1.00 |
| DEIT V3 | 0.732 | 0.514 | 0.509 | 0.835 | 0.800 | 0.982 |
| RIFormer | 0.730 | 0.485 | 0.430 | 0.825 | 0.791 | 0.980 |
| HorNet | 0.758 | 0.565 | 0.574 | 0.830 | 0.806 | 0.982 |
| ViG | 0.749 | 0.506 | 0.491 | 0.870 | 0.832 | 0.992 |
| IncARMAG (Ours) | 0.795 | 0.667 | 0.617 | 0.886 | 0.849 | 0.995 |

Table 6
Performance of IncARMAG and benchmarking models on BloodMNIST and BreastMNIST datasets.
| Model | BloodMNIST Acc. | BloodMNIST Prec. | BloodMNIST Rec. | BreastMNIST Acc. | BreastMNIST Prec. | BreastMNIST Rec. |
| --- | --- | --- | --- | --- | --- | --- |
| ResNet101 | 0.985 | 0.984 | 0.986 | 0.814 | 0.876 | 0.868 |
| ResNet152 | 0.985 | 0.984 | 0.986 | 0.821 | 0.898 | 0.851 |
| Inception V3 | 0.984 | 0.985 | 0.986 | 0.840 | 0.908 | 0.868 |
| EfficientNet V2 | 0.984 | 0.984 | 0.984 | 0.731 | 0.731 | 1.0 |
| ConvNext | 0.981 | 0.983 | 0.982 | 0.808 | 0.818 | 0.948 |
| ViT-B | 0.956 | 0.952 | 0.949 | 0.711 | 0.805 | 0.798 |
| MaxViT | 0.983 | 0.984 | 0.984 | 0.827 | 0.822 | 0.974 |
| Swin-S | 0.889 | 0.870 | 0.856 | 0.673 | 0.756 | 0.816 |
| DEIT V3 | 0.958 | 0.960 | 0.954 | 0.750 | 0.800 | 0.877 |
| RIFormer | 0.949 | 0.960 | 0.953 | 0.699 | 0.734 | 0.921 |
| HorNet | 0.851 | 0.869 | 0.808 | 0.737 | 0.739 | 0.991 |
| ViG | 0.983 | 0.984 | 0.984 | 0.808 | 0.862 | 0.877 |
| IncARMAG (Ours) | 0.987 | 0.987 | 0.986 | 0.872 | 0.898 | 0.930 |
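Overall accuracy, macro precision, and macro recall of the kind reported in Tables 5 and 6 can all be derived from a confusion matrix such as those in Fig. 4. The sketch below shows one common way of computing them; the function name `macro_metrics` and the 3-class confusion matrix are hypothetical and are not taken from the reported experiments.

```python
import numpy as np

def macro_metrics(conf):
    """Return (accuracy, macro precision, macro recall).

    conf[i, j] is the number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    predicted = conf.sum(axis=0)  # column sums: samples predicted as each class
    actual = conf.sum(axis=1)     # row sums: samples belonging to each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / conf.sum()
    return accuracy, precision.mean(), recall.mean()

# Hypothetical 3-class confusion matrix (illustrative values only).
conf = [[8, 1, 1],
        [2, 6, 2],
        [0, 1, 9]]
acc, macro_prec, macro_rec = macro_metrics(conf)
print(round(acc, 3), round(macro_prec, 3), round(macro_rec, 3))
```

Because the macro average weights every class equally, a rare class with poor precision or recall lowers the score as much as a frequent one, which is why these metrics are less sensitive to class imbalance than overall accuracy.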
datasets. Fig. 4(a) shows that the strong accuracy of IncARMAG on DermaMNIST is partly owed to a slight prediction bias towards the prominent melanoma label in an imbalanced dataset.

There is minimal to no improvement in classification accuracy compared with some of the benchmark models for the PneumoniaMNIST and BloodMNIST experiments. The confusion matrix of IncARMAG on the BloodMNIST experiment shows strong classification accuracy across all labels, as illustrated in Fig. 4(c). A large gap in accuracy can be found between IncARMAG and the benchmark models on the BreastMNIST dataset (0.872 versus 0.840 for Inception V3, the next best model). Swin-S was able to surpass IncARMAG in terms of recall on PneumoniaMNIST because this model gives a completely degenerate prediction that produces no false negatives. The ability of an image classifier model to generalize to a diverse set of input data is crucial for robustness and reliability in various real-world applications. This implies that a model's true performance is demonstrated when it consistently performs well across a variety of datasets. Although there are instances in Tables 3 and 4 with near-perfect predictions from the benchmark models, our proposed model also exhibited this level of performance while surpassing them on certain datasets.

Fig. 5. Performance metrics of various IncARMAG variants evaluated on the test set of (a) DermaMNIST, (b) PneumoniaMNIST, (c) BloodMNIST, and (d) BreastMNIST. IncARMAG V2 has no feature processing at $F_\alpha$ and $F_\beta$, while IncARMAG V3 has feature processing frameworks moved to Inception blocks 7 and 8 for $F_\alpha$ and $F_\beta$, respectively.

4.3. Ablation experiments

Ablation experiments were also conducted to assess the impact of various architectural components on the overall model performance. These experiments were evaluated on four of the MedMNIST datasets: DermaMNIST, PneumoniaMNIST, BloodMNIST, and BreastMNIST. Two additional variants of our proposed model were investigated. One IncARMAG variant, denoted as IncARMAG V2, has a single GNN-based feature processing branch at the last feature map, $F_\gamma$. We have also explored varying the feature map levels, $F_\alpha$ and $F_\beta$, that were connected to the GNN-based feature refinement. IncARMAG V3 has $F_\alpha$ and $F_\beta$ equal to the feature maps at Inception blocks 6 and 7, respectively. The evaluation results of these IncARMAG variants on the four MedMNIST datasets in Table 2 are shown in Fig. 5.

There is a clear benefit in using the multi-level feature processing framework on the DermaMNIST, PneumoniaMNIST, and BreastMNIST datasets. Adding auxiliary classifiers may result in increased accuracy, which can be ascribed to the feature refinement and/or inherent regularization, as argued in the Inception V3 paper [41]. Comparison
Table 7
Model performances with different types of GNN convolution layers for DermaMNIST and PneumoniaMNIST datasets.

Layer type     DermaMNIST                PneumoniaMNIST
               Acc.    Prec.   Rec.      Acc.    Prec.   Rec.
GCN            0.763   0.582   0.526     0.864   0.824   0.995
GAT            0.791   0.638   0.607     0.857   0.818   0.992
Chebyshev      0.794   0.656   0.622     0.862   0.825   0.990
Transformer    0.790   0.657   0.599     0.867   0.813   0.995
GIN            0.793   0.666   0.608     0.885   0.849   0.992
ARMA           0.795   0.667   0.617     0.886   0.849   0.995

Table 9
Impact of adaptive diagonal enhancement and residual predictions in IncARMAG performance on DermaMNIST and PneumoniaMNIST datasets.

Details                         DermaMNIST                PneumoniaMNIST
                                Acc.    Prec.   Rec.      Acc.    Prec.   Rec.
Without diagonal enhancement    0.782   0.597   0.570     0.885   0.822   0.995
Without residual predictions    0.792   0.609   0.615     0.862   0.822   0.995

Table 10
Impact of adaptive diagonal enhancement and residual predictions in IncARMAG performance on BloodMNIST and BreastMNIST datasets.
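Tables 7 and 9 report accuracy, precision, and recall. As a reminder of how these three quantities relate, the following minimal sketch computes them for a binary task and shows why a recall close to 1 is not sufficient on its own: a degenerate classifier that always predicts the positive class has no false negatives at all. The labels are hypothetical toy values, not MedMNIST data, and the function name `classification_metrics` is introduced here only for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy labels: three positives, two negatives.
y_true = [1, 1, 1, 0, 0]
# Degenerate classifier: always predicts the positive class.
y_pred = [1] * len(y_true)

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)  # 0.6 0.6 1.0
```

With zero false negatives the recall is exactly 1.0, while accuracy and precision simply mirror the share of positives in the data. This is why recall is best read together with the other two columns.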
Fig. 6. Grad-CAM visualization of ViG and IncARMAG feature activations on a sample image from each dataset. Red and blue colors in the heatmap correspond to the maximum and minimum activation values, respectively.
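Because the ARMA layer is the graph convolution that scores highest in Table 7, a rough NumPy sketch of what such a layer computes may be useful. It follows the general recursive formulation of ARMA graph filters in [22] as commonly stated: each stack repeatedly propagates features over a normalized graph operator while re-injecting the input features through a skip connection, and the outputs of several parallel stacks are averaged. Everything specific here is an assumption made for illustration and is not taken from the IncARMAG implementation: the operator is taken to be D^{-1/2} A D^{-1/2}, the similarity graph is built as ReLU(Z Z^T) from random stand-in embeddings, and the sizes, weights, and function names are arbitrary.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    mask = deg > 0
    inv_sqrt[mask] = 1.0 / np.sqrt(deg[mask])
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def arma_stack(M, X, W_in, W, V, T):
    """One stack: T recursive updates, each with a skip connection to the input X."""
    H = np.maximum(M @ X @ W_in + X @ V, 0.0)    # first update maps F_in -> F_out
    for _ in range(T - 1):
        H = np.maximum(M @ H @ W + X @ V, 0.0)   # later updates share W
    return H


def arma_conv(A, X, stacks, T=2):
    """Average K parallel stacks; `stacks` is a list of (W_in, W, V) tuples."""
    M = normalized_adjacency(A)
    outputs = [arma_stack(M, X, W_in, W, V, T) for W_in, W, V in stacks]
    return np.mean(outputs, axis=0)


# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
n_nodes, f_in, f_out, K = 6, 8, 4, 3
Z = rng.normal(size=(n_nodes, 5))       # stand-in latent embeddings
A = np.maximum(Z @ Z.T, 0.0)            # assumed similarity graph: ReLU(Z Z^T)
X = rng.normal(size=(n_nodes, f_in))    # node features
stacks = [
    (0.1 * rng.normal(size=(f_in, f_out)),
     0.1 * rng.normal(size=(f_out, f_out)),
     0.1 * rng.normal(size=(f_in, f_out)))
    for _ in range(K)
]
out = arma_conv(A, X, stacks, T=3)
print(out.shape)  # (6, 4)
```

The skip term `X @ V` in every update is what distinguishes this recursion from simply stacking polynomial filters: the input signal stays present at each propagation step, which is the property usually credited for the more flexible filter response.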
The novelty of our proposed IncARMAG lies in the adoption of the SCG-ARMA GNN framework at different hierarchical feature levels to capture long-range representations. Our experiments have demonstrated that IncARMAG ranks as one of the top models in the classification of medical images that span different imaging modalities. IncARMAG has also outperformed various CNN and transformer-based state-of-the-art models based on more extensive experiments on four medical imaging datasets. The boost in performance with the multi-level version of IncARMAG demonstrates the advantage of using additional SCG-GNN modules as auxiliary heads, which can be attributed to regularization and/or feature diversity.

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Adrian S. Remigio reports a relationship with Asian Institute of Management that includes: employment. If there are other authors, they declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.
References

[1] J. Potočnik, S. Foley, E. Thomas, Current and potential applications of artificial intelligence in medical imaging practice: A narrative review, J. Med. Imag. Radiat. Sci. (2023) http://dx.doi.org/10.1016/j.jmir.2023.03.033.
[2] A. Shah, M. Shah, A. Pandya, R. Sushra, R. Sushra, M. Mehta, K. Patel, K. Patel, A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN), Clin. eHealth (2023).
[3] Y. Ma, J. Tang, Deep Learning on Graphs, Cambridge University Press, 2021.
[4] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012).
[5] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, http://dx.doi.org/10.48550/arXiv.1409.1556, arXiv.
[6] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A convnet for the 2020s, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976–11986.
[7] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[8] M. Tan, Q. Le, Efficientnetv2: Smaller models and faster training, in: International Conference on Machine Learning, PMLR, 2021, pp. 10096–10106.
[9] B. Zoph, Q.V. Le, Neural architecture search with reinforcement learning, 2016, http://dx.doi.org/10.48550/arXiv.1611.01578, arXiv preprint arXiv:1611.01578.
[10] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., Searching for mobilenetv3, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1314–1324.
[11] C.-Y. Wang, H.-Y.M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, I.-H. Yeh, CSPNet: A new backbone that can enhance learning capability of CNN, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 390–391.
[12] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, et al., Deep high-resolution representation learning for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 43 (10) (2020) 3349–3364, http://dx.doi.org/10.1109/TPAMI.2020.2983686.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Adv. Neural Inf. Process. Syst. 30 (2017).
[14] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[15] Q. Liu, M. Kampffmeyer, R. Jenssen, et al., Self-constructing graph convolutional networks for semantic labeling, in: IGARSS 2020-2020 IEEE International Geoscience and Remote Sensing Symposium, IEEE, 2020, pp. 1801–1804, http://dx.doi.org/10.1109/IGARSS39084.2020.9324719.
[16] Y. Singh, C. Farrelly, Q.A. Hathaway, A. Choudhary, G. Carlsson, B. Erickson, T. Leiner, The role of geometry in convolutional neural networks for medical imaging, Mayo Clin. Proc. Dig. Health 1 (4) (2023) 519–526.
[17] M.M. Bronstein, J. Bruna, T. Cohen, P. Veličković, Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021, arXiv preprint arXiv:2104.13478.
[18] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, Y. Bengio, et al., Graph attention networks, stat 1050 (20) (2017) 10–48550.
[19] K. Xu, W. Hu, J. Leskovec, S. Jegelka, How powerful are graph neural networks? 2018, arXiv preprint arXiv:1810.00826.
[20] D. Buterez, J.P. Janet, S.J. Kiddle, D. Oglic, P. Liò, Graph neural networks with adaptive readouts, Adv. Neural Inf. Process. Syst. 35 (2022) 19746–19758.
[21] W. Duan, J. Xuan, M. Qiao, J. Lu, Learning from the dark: boosting graph convolutional neural networks with diverse negative samples, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 6, 2022, pp. 6550–6558.
[22] F.M. Bianchi, D. Grattarola, L. Livi, C. Alippi, Graph neural networks with convolutional arma filters, IEEE Trans. Pattern Anal. Mach. Intell. 44 (7) (2021) 3496–3507, http://dx.doi.org/10.1109/TPAMI.2021.3054830.
[23] N. Tremblay, P. Gonçalves, P. Borgnat, Design of graph filters and filterbanks, in: Cooperative and Graph Signal Processing, Elsevier, 2018, pp. 299–324.
[24] V. Vasudevan, M. Bassenne, M.T. Islam, L. Xing, Image classification using graph neural network and multiscale wavelet superpixels, Pattern Recognit. Lett. 166 (2023) 89–96, http://dx.doi.org/10.1016/j.patrec.2023.01.003.
[25] M. Krzywda, S. Łukasik, A.H. Gandomi, Graph neural networks in computer vision-architectures, datasets and common approaches, in: 2022 International Joint Conference on Neural Networks, IJCNN, IEEE, 2022, pp. 1–10, http://dx.doi.org/10.1109/IJCNN55064.2022.9892658.
[26] S. De, R.J. Stanley, C. Lu, R. Long, S. Antani, G. Thoma, R. Zuna, A fusion-based approach for uterine cervical cancer histology image classification, Comput. Med. Imag. Graph. 37 (7–8) (2013) 475–487, http://dx.doi.org/10.1016/j.compmedimag.2013.08.001.
[27] Z. Zhuang, Z. Yang, A.N.J. Raj, C. Wei, P. Jin, S. Zhuang, Breast ultrasound tumor image classification using image decomposition and fusion based on adaptive multi-model spatial feature fusion, Comput. Methods Programs Biomed. 208 (2021) 106221, http://dx.doi.org/10.1016/j.cmpb.2021.106221.
[28] D.M. Ibrahim, N.M. Elshennawy, A.M. Sarhan, Deep-chest: Multi-classification deep learning model for diagnosing COVID-19, pneumonia, and lung cancer chest diseases, Comput. Biol. Med. 132 (2021) 104348, http://dx.doi.org/10.1016/j.compbiomed.2021.104348.
[29] S.B. Yengec-Tasdemir, Z. Aydin, E. Akay, S. Dogan, B. Yilmaz, An effective colorectal polyp classification for histopathological images based on supervised contrastive learning, Comput. Biol. Med. 172 (2024) 108267, http://dx.doi.org/10.1016/j.compbiomed.2024.108267.
[30] W. Duan, Y. Chen, Q. Zhang, X. Lin, X. Yang, Refined tooth and pulp segmentation using U-Net in CBCT image, Dentomaxillofacial Radiol. 50 (6) (2021) 20200251.
[31] O.N. Manzari, H. Ahmadabadi, H. Kashiani, S.B. Shokouhi, A. Ayatollahi, MedViT: a robust vision transformer for generalized medical image classification, Comput. Biol. Med. 157 (2023) 106791, http://dx.doi.org/10.1016/j.compbiomed.2023.106791.
[32] J. Yang, R. Shi, D. Wei, Z. Liu, L. Zhao, B. Ke, H. Pfister, B. Ni, Medmnist v2-a large-scale lightweight benchmark for 2d and 3d biomedical image classification, Sci. Data 10 (1) (2023) 41, http://dx.doi.org/10.1038/s41597-022-01721-8.
[33] Y. Yue, Z. Li, Medmamba: Vision mamba for medical image classification, 2024, http://dx.doi.org/10.48550/arXiv.2403.03849, arXiv preprint arXiv:2403.03849.
[34] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M.M. Bronstein, Geometric deep learning on graphs and manifolds using mixture model cnns, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5115–5124.
[35] P.H. Avelar, A.R. Tavares, T.L. da Silveira, C.R. Jung, L.C. Lamb, Superpixel image classification with graph attention networks, in: 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images, SIBGRAPI, IEEE, 2020, pp. 203–209, http://dx.doi.org/10.1109/SIBGRAPI51738.2020.00035.
[36] D. Yao, Z. Zhi-li, Z. Xiao-feng, C. Wei, H. Fang, C. Yao-ming, W.-W. Cai, Deep hybrid: multi-graph neural network collaboration for hyperspectral image classification, Defence Technol. 23 (2023) 164–176, http://dx.doi.org/10.1016/j.dt.2022.02.007.
[37] K. Han, Y. Wang, J. Guo, Y. Tang, E. Wu, Vision gnn: An image is worth graph of nodes, Adv. Neural Inf. Process. Syst. 35 (2022) 8291–8303.
[38] Z. Fei, J. Guo, H. Gong, L. Ye, E. Attahi, B. Huang, A GNN architecture with local and global-attention feature for image classification, IEEE Access (2023) http://dx.doi.org/10.1109/ACCESS.2023.3285246.
[39] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9, http://dx.doi.org/10.1109/CVPR.2015.7298594.
[40] X. Ding, X. Zhang, J. Han, G. Ding, Scaling up your kernels to 31 × 31: Revisiting large kernel design in cnns, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11963–11975, http://dx.doi.org/10.1109/CVPR52688.2022.01166.
[41] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[42] T.N. Kipf, M. Welling, Variational graph auto-encoders, 2016, http://dx.doi.org/10.48550/arXiv.1611.07308, arXiv preprint arXiv:1611.07308.
[43] M. Defferrard, X. Bresson, P. Vandergheynst, Convolutional neural networks on graphs with fast localized spectral filtering, Adv. Neural Inf. Process. Syst. 29 (2016).
[44] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2014, http://dx.doi.org/10.48550/arXiv.1412.6980, arXiv preprint arXiv:1412.6980.
[45] Q. Han, M. Hou, H. Wang, C. Wu, S. Tian, Z. Qiu, B. Zhou, EHDFL: Evolutionary hybrid domain feature learning based on windowed fast Fourier convolution pyramid for medical image classification, Comput. Biol. Med. 152 (2023) 106353, http://dx.doi.org/10.1016/j.compbiomed.2022.106353.
[46] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16 × 16 words: Transformers for image recognition at scale, 2020, http://dx.doi.org/10.48550/arXiv.2010.11929, arXiv preprint arXiv:2010.11929.
[47] Z. Tu, H. Talebi, H. Zhang, F. Yang, P. Milanfar, A. Bovik, Y. Li, Maxvit: Multi-axis vision transformer, in: European Conference on Computer Vision, Springer, 2022, pp. 459–479, http://dx.doi.org/10.1007/978-3-031-20053-3_27.
[48] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, et al., Swin transformer v2: Scaling up capacity and resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12009–12019, http://dx.doi.org/10.1109/CVPR52688.2022.01170.
[49] H. Touvron, M. Cord, H. Jego, DeiT III: Revenge of the ViT, in: European Conference on Computer Vision, 2022, pp. 516–533.
[50] J. Wang, S. Zhang, Y. Liu, T. Wu, Y. Yang, X. Liu, K. Chen, P. Luo, D. Lin, RIFormer: Keep your vision backbone effective but removing token mixer, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14443–14452, http://dx.doi.org/10.1109/CVPR52729.2023.01388.
[51] Y. Rao, W. Zhao, Y. Tang, J. Zhou, S.-N. Lim, J. Lu, HorNet: Efficient high-order spatial interactions with recursive gated convolutions, Adv. Neural Inf. Process. Syst. 35 (2022) 10353–10366.
[52] MMPreTrain Contributors, Openmmlab’s pre-training toolbox and benchmark, 2023.
[53] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830.
[54] M. Brockschmidt, Gnn-film: Graph neural networks with feature-wise linear modulation, in: International Conference on Machine Learning, PMLR, 2020, pp. 1144–1152.
[55] X. Wang, M. Zhang, How powerful are spectral graph neural networks, in: International Conference on Machine Learning, PMLR, 2022, pp. 23341–23362.
[56] D. Bo, X. Wang, C. Shi, H. Shen, Beyond low-frequency information in graph convolutional networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 5, 2021, pp. 3950–3957.
[57] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626, http://dx.doi.org/10.1109/ICCV.2017.74.
[58] R.J. Ellis, R.M. Sander, A. Limon, Twelve key challenges in medical machine learning and solutions, Intell. Based Med. 6 (2022) 100068, http://dx.doi.org/10.1016/j.ibmed.2022.100068.
[59] E.H. Weissler, T. Naumann, T. Andersson, R. Ranganath, O. Elemento, Y. Luo, D.F. Freitag, J. Benoit, M.C. Hughes, F. Khan, et al., The role of machine learning in clinical research: transforming the future of evidence generation, Trials 22 (2021) 1–15, http://dx.doi.org/10.1186/s13063-021-05489-x.

Adrian S. Remigio received a B.Sc. degree in applied physics from the University of the Philippines Manila in 2017 and his master’s degree in medical physics from the Royal Melbourne Institute of Technology University, Melbourne, Australia in 2022. He is currently a data scientist at the Analytics, Computing, and Complex Systems laboratory under the Asian Institute of Management. His research interests include computer vision, machine and deep learning, and differential equations applied to the biological and medical physics domains, such as imaging-based survival analysis and inter-patient image registration.