Recent Advances in Medical Image Classification
Abstract—Medical image classification is crucial for diagnosis and treatment, benefiting significantly from advancements in artificial intelligence. The paper reviews recent progress in the field, focusing on three levels of solutions: basic, specific, and applied. It highlights advances in traditional methods using deep learning models like Convolutional Neural Networks and Vision Transformers, as well as state-of-the-art approaches with Vision-Language Models. These models tackle the issue of limited labeled data and enhance and explain predictive results through Explainable Artificial Intelligence.

Keywords—Medical Image Classification (MIC); Artificial Intelligence (AI); Vision Transformer (ViT); Vision-Language Model (VLM); eXplainable AI (XAI)

I. INTRODUCTION

Medical Image Classification (MIC), a crucial integration of Artificial Intelligence (AI) and Computer Vision (CV), is revolutionizing image-based disease diagnosis. By categorizing medical images into specific disease classes, MIC enhances diagnostic accuracy and efficiency. Utilizing various imaging modalities like X-rays, CT scans, MRI, and ultrasound, MIC systems cater to specific clinical needs. Incorporating state-of-the-art technologies, MIC optimizes classification accuracy, leading to precise diagnoses and improved patient care.

1) The importance of MIC: The ability to interpret medical images accurately and efficiently is crucial for timely and effective patient care. However, manual image analysis can be time-consuming and prone to human error. MIC, leveraging AI and CV, offers automated analysis and classification of medical images, leading to several benefits:

a) Improved diagnostic accuracy: MIC systems can detect subtle patterns and features at the pixel level that may be missed by human observers, leading to more accurate diagnoses.

b) Reduced workload for physicians: Automating image analysis frees up valuable time for physicians, allowing them to focus on patient interaction and complex decision-making.

c) Enhanced efficiency: MIC systems can process large volumes of images quickly, leading to faster diagnoses and treatment decisions.

d) Improved patient outcomes: Ultimately, the improved accuracy and efficiency of MIC contribute to better patient outcomes and overall healthcare quality.

2) Challenges and the need for transparency: While MIC offers immense potential, challenges remain. Hospital overload, physician burnout, and the risk of misdiagnosis necessitate robust and reliable MIC systems. Transparency and explainability are crucial for building trust among stakeholders. Explainable AI (XAI) addresses this need by providing insights into the decision-making process of MIC models, allowing physicians to understand the rationale behind classifications and make informed decisions.

3) Advancements in MIC: Recent advancements in MIC have significantly enhanced its capabilities. Large-scale Medical Vision-Language Models (Med-VLMs) trained on extensive datasets of image-caption pairs enable a deeper understanding of visual information, leading to more accurate and generalizable models. Additionally, novel network architectures like transformers and multi-task learning approaches have further improved performance and efficiency. Few-shot and zero-shot learning have also made significant contributions to MIC. Few-shot learning allows models to classify images with minimal labeled examples, which is beneficial in fields where obtaining large labeled datasets is challenging. Zero-shot learning enables models to classify images from unseen classes by leveraging knowledge transfer from related tasks. Combined with Explainable AI (XAI) techniques, these approaches not only explain results and increase model reliability but also optimize outcomes, enhancing system accuracy and performance. This comprehensive understanding and improved reliability facilitate their integration into clinical practice with high confidence and precision, ultimately leading to better patient outcomes and more efficient healthcare processes.

4) Exploring MIC across three levels of solution: To fully grasp the current state of MIC, this paper delves into three distinct levels:

a) Level 1: Basic Models: This level examines the fundamental theoretical models underlying MIC, including learning models, basic network architectures, and XAI techniques.

b) Level 2: Task-Specific Models: This level explores specific theoretical models and network architectures tailored to particular MIC tasks, such as single-task and multi-task classification.

c) Level 3: Applications: This level surveys prominent applications of MIC within the medical community, highlighting recent research trends and real-world implementations.

5) Contributions and structure: This article makes several key contributions:
III. LEVEL 1 OF MIC (FUNDAMENTAL MODELS)

Level 1 includes learning models, fundamental network architectures and backbone DNNs, and XAI. This level plays an essential role in developing systems at the subsequent levels.

A. Learning Model

1) Unimodal learning in MIC: The evolution of learning models has significantly impacted the field of MIC, offering solutions to challenges like manual data labeling and limited generalization capacity. TABLE III provides a concise comparison of various unimodal learning models commonly employed in MIC, highlighting their key characteristics and suitability for different scenarios. Selecting an optimal learning model for MIC tasks (see TABLE III) requires careful consideration of data availability, labeling costs, privacy requirements, and performance expectations. While supervised learning is powerful when labeled data is abundant, data annotation limitations and privacy concerns necessitate exploring alternative paradigms. Semi-supervised, weakly-supervised, active learning, meta-learning, federated learning, and self-supervised learning offer promising avenues to address these challenges, fostering the development of more efficient and generalizable MIC systems. Leveraging these diverse approaches allows researchers and practitioners to unlock the full potential of MIC, ultimately leading to improved patient care and clinical outcomes.

2) Multimodal learning with Med-VLMs in MIC: Bridging the semantic gap between visual and textual information is crucial for effective MIC. VLMs integrate Computer Vision and Natural Language Processing, enabling a comprehensive understanding of medical data. This section explores the role of clinical and paraclinical data in Medical VLMs (Med-VLMs) and surveys SOTA Med-VLMs for MIC.

a) Clinical and paraclinical data in Med-VLMs: To better understand the distinct roles and characteristics of clinical and paraclinical data within Med-VLMs, TABLE I provides a comparative analysis. Clinical data provides valuable context for interpreting paraclinical images, while paraclinical data offers objective visualizations of potential abnormalities. Med-VLMs leverage both data types to enhance diagnostic accuracy and provide a holistic understanding of patient health.

b) State-of-the-Art (SOTA) Med-VLMs in MIC: Several advanced Med-VLMs have demonstrated remarkable performance in MIC tasks, utilizing sophisticated techniques such as transformer architectures, attention mechanisms, and pre-training on large datasets. TABLE V summarizes SOTA Med-VLMs for MIC.
TABLE II. OVERVIEW OF THE THREE-LEVEL SOLUTION FRAMEWORK FOR MEDICAL IMAGE CLASSIFICATION

Level 1: Learning model.
Specific solutions: Unimodal learning: supervised learning, unsupervised learning, semi-supervised learning, weakly supervised learning, active learning, meta-learning, federated learning, and self-supervised learning. Med-VLMs: BiomedCLIP [1], XrayGPT [2], M-FLAG [3], and MedBLIP [4]. Some remarkable methods: few-shot learning: BioViL-T [5], PM2 [6], and DeViDe [7]; zero-shot learning: MedCLIP [8], CheXZero [9], and MedKLIP [10].
Explanation: The evolution of learning models from unimodal to multimodal, exemplified by the emergence of Med-VLM, represents a significant advancement in the field. Few-shot and zero-shot learning models further enhance the ability to classify medical images with minimal or no labeled data, making them effective for rare and novel diseases.

Level 1: Architectures of fundamental networks and backbone DNN.
Specific solutions: CNN: VGGNet [11], GoogleNet [12], ResNet [13], and EfficientNet [14]. GNN: Graph Convolution Networks (GCN) [15] and GAT [16]. Transformer: ViT [17], DeiT [18], TransUnet [19], TransUnet+ [20], and TransUnet++ [21].
Explanation: Evolution of fundamental network architectures in image classification, including CNNs, GNNs, and Vision Transformers, as well as their respective backbone DNNs.

Level 1: XAI.
Specific solutions: For CNN: LIME [22], SHAP [23], and CAM-based methods (CAM [24], GradCAM [25], and GradCAM++ [26]). For Transformer: ProtoPFormer [27], X-Pruner [28], and GradCAM for Vision Transformer [29].
Explanation: XAI is applied to CNN architectures and Vision Transformers.

Level 2: Specific DNN architectures and Med-VLM for single task (classification).
Specific solutions: CNN: Unet [30], Unet++ [31], SNAPSHOT ENSEMBLE [32], and PTRN [33]. GNN: CCF-GNN [34] and GazeGNN [35]. Transformer: SEViT [36] and MedViT [37]. Med-VLM: BERTHop [38], KAD [39], CLIPath [40], and ConVIRT [41].
Explanation: Specialized network architectures and Med-VLMs have achieved high performance in MIC.

Level 2: Specific DNN architectures and Med-VLM for multitask (classification and segmentation).
Specific solutions: CNN: Mask-RCNN-X101 [42] and Cerberus [43]. GNN: MNC-Net [44] and AMTI-GCN [45]. Transformer: TransMT-Net [46] and CNN-VisT-MLP-Mixer [47]. Med-VLM: GLoRIA [48], ASG [49], MeDSLIP [50], SAT [51], CONCH [52], and ECAMP [53].
Explanation: MIC is advancing with multi-tasking. Classifying disease segments often excels over whole-image analysis. The advent of Med-VLMs for multi-tasking enhances the precision and depth of analyses.

Level 3: Specific applications.
Specific solutions: Breast cancer [54][55], tuberculosis [56], eye disease diagnosis [57][58], skin cancer diagnosis [59][60], bone disease [61]-[63], and other pathologies [64][65]: cancer, brain, tumor, lesion, lung, breast, eye, etc.
Explanation: Surveying prominent applications significant to the medical community, recent research trends in MIC (2020-2024), and cancer statistics for 2024.
Flexibility: Can quickly adapt to new tasks with minimal data, which is crucial in dynamic environments like medical imaging.

Disadvantages:

Performance: May be less effective compared to models trained on large, fully labeled datasets.

Complexity: Requires careful design of task sets for training to ensure generalizability and robustness. A minimal few-shot sketch follows.
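To make the few-shot setting concrete, the following sketch implements a prototypical-network-style classifier, a common few-shot approach rather than a method taken from the surveyed papers; the random features standing in for a pretrained medical image encoder, the episode sizes, and the class count are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_classify(support_emb, support_labels, query_emb, n_classes):
    """Classify query embeddings by distance to class prototypes.

    support_emb: (n_support, d) embeddings of the few labeled examples.
    support_labels: (n_support,) integer class ids in [0, n_classes).
    query_emb: (n_query, d) embeddings of unlabeled images.
    """
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)]
    )  # (n_classes, d)
    # Negative Euclidean distance acts as the class score.
    dists = torch.cdist(query_emb, prototypes)  # (n_query, n_classes)
    return F.softmax(-dists, dim=1)             # class probabilities per query

# Toy 3-way, 2-shot episode with random 128-d features standing in
# for the output of a pretrained medical image encoder.
torch.manual_seed(0)
support = torch.randn(6, 128)
labels = torch.tensor([0, 0, 1, 1, 2, 2])
queries = torch.randn(4, 128)
print(prototypical_classify(support, labels, queries, n_classes=3).shape)  # (4, 3)
```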
b) Zero-shot learning in MIC: Zero-shot learning (ZSL) enables the classification of unseen classes by leveraging semantic relationships between known and unknown classes. ZSL's core principle is to use auxiliary information, such as textual descriptions, to bridge the gap between seen and unseen classes, thereby expanding AI systems' diagnostic capabilities.

Core Principles:

Semantic Embeddings: Align visual features with semantic representations (e.g., word embeddings) to infer the class of unseen instances by creating a shared space where both visual and semantic data coexist.

Knowledge Transfer: Utilize knowledge from known classes to predict the properties of unknown classes based on their semantic descriptions, effectively transferring learned information across domains. A sketch of this shared-space classification appears below.
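These principles can be made concrete with a CLIP-style zero-shot classifier in the spirit of MedCLIP [8] and CheXZero [9]; the linear "encoders", feature dimensions, and candidate prompts below are illustrative stand-ins, not the published models.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for pretrained image/text encoders mapping into a shared space.
# Real Med-VLMs use a vision backbone (ResNet/ViT) and a text transformer.
image_encoder = torch.nn.Linear(1024, 256)  # image features -> shared space
text_encoder = torch.nn.Linear(768, 256)    # text features  -> shared space

def zero_shot_classify(image_feat, class_text_feats, temperature=0.07):
    """Score an image against text embeddings of candidate class prompts."""
    img = F.normalize(image_encoder(image_feat), dim=-1)       # (1, 256)
    txt = F.normalize(text_encoder(class_text_feats), dim=-1)  # (C, 256)
    logits = img @ txt.T / temperature                         # cosine similarity
    return logits.softmax(dim=-1)                              # probability per class

# One image vs. three prompts, e.g. "a chest X-ray showing {pneumonia|edema|no finding}".
image_feat = torch.randn(1, 1024)
prompt_feats = torch.randn(3, 768)  # would come from a text encoder over the prompts
print(zero_shot_classify(image_feat, prompt_feats))
```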
Common Models:

MedCLIP [8] uses contrastive learning from unpaired medical image-text data to improve representation learning and zero-shot prediction, achieving strong performance even with limited data.

CheXZero [9] is a deep learning model specifically for chest X-ray classification, utilizing pre-trained CNNs and fine-tuning on labeled data to achieve high accuracy in identifying thoracic diseases.

MedKLIP [10] leverages medical knowledge during language-image pre-training in radiology, enhancing its ability to handle unseen diseases in zero-shot tasks and maintaining strong performance even after fine-tuning.

These models represent significant advancements in medical image classification, demonstrating impressive results and addressing the unique challenges posed by healthcare data.

Advantages:

Scalability: Enables classification of novel classes without prior training examples, making it highly scalable and versatile.

Flexibility: Expands the diagnostic capabilities of AI systems to include rare and novel diseases, which are often not well-represented in training datasets.

Disadvantages:

Accuracy: Performance may be lower compared to models trained specifically on the classes of interest, particularly for highly dissimilar unseen classes.

Dependency on Semantic Descriptions: Requires accurate and rich semantic information to function effectively, which can be a limitation if such data is not available.

Overall, few-shot and zero-shot learning models address the challenge of limited labeled data in medical image classification. FSL adapts quickly to new tasks with minimal training samples, while ZSL uses semantic relationships to diagnose rare and novel diseases. Each approach has unique advantages and limitations that must be considered when designing MIC systems. Understanding these principles is crucial for developing effective and reliable MIC models.

B. Architectures of Fundamental Networks and Backbone DNN

MIC has significantly shifted from traditional machine learning methods to deep learning approaches. This review focuses on fundamental DL architectures commonly used in MIC, including Convolutional Neural Networks (CNNs), Graph Neural Networks (GNNs), and Transformers. These architectures have shown remarkable efficacy in automatically learning hierarchical feature representations and achieving state-of-the-art performance in various MIC tasks.

1) Convolutional Neural Networks (CNNs): CNNs have become the cornerstone of MIC due to their ability to automatically learn hierarchical feature representations. Inspired by the human visual cortex (Fig. 2 [66]), CNNs excel at capturing local features within images, making them ideal for tasks like disease detection, organ segmentation, and anomaly identification. This section explores the core components of CNNs and their contributions to feature extraction and classification, followed by a review of popular CNN architectures and their advancements in MIC.

a) Core components of CNNs: TABLE VI summarizes the core components of a CNN and their functions in feature extraction and class prediction. These components work synergistically to enable CNNs to learn intricate features from medical images, leading to accurate classification.
XrayGPT [2]: Language encoder: Vicuna; vision encoder: MedCLIP; fusion method: early fusion. Summarizes chest X-rays by aligning MedCLIP with Vicuna, with Vicuna fine-tuned on curated reports. Objective: hybrid, image-report matching with mixed objectives. Task: interactive summaries from radiology reports. Contribution: integration of medical knowledge through interactive summaries, enhancing the interpretability and usability of diagnostic results.

M-FLAG [3]: Language encoder: CXR-BERT (frozen); vision encoder: ResNet50; fusion method: late fusion. Uses a frozen language model with an orthogonality loss for a harmonized latent space. Objective: hybrid, image-text contrastive learning and language generation. Tasks: potential for classification, segmentation, and object detection. Results: outperforms existing MedVLP approaches with a 78% parameter reduction. Contribution: model optimization and efficiency, achieving high performance with reduced parameters.

MedBLIP [4]: Language encoder: BioMedLM; vision encoder: ViT-G14 (EVA-CLIP); fusion method: late fusion. Bootstraps VLP from 3D medical images and texts by combining pre-trained vision and language models. Objective: global and local contrastive learning. Results: SOTA zero-shot classification of Alzheimer's disease. Contribution: efficient 3D medical image processing facilitates classifying complex conditions with minimal labeled data.
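Several of the objectives above (e.g., M-FLAG's image-text contrastive learning, MedBLIP's global contrastive learning) build on a symmetric contrastive loss over paired images and reports. Below is a minimal InfoNCE-style sketch under assumed embedding shapes; it is not the exact loss of any cited model.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched image-report pairs share the same index."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.T / temperature               # (B, B) similarity matrix
    targets = torch.arange(img.size(0))              # the diagonal holds positives
    loss_i2t = F.cross_entropy(logits, targets)      # image -> matching report
    loss_t2i = F.cross_entropy(logits.T, targets)    # report -> matching image
    return (loss_i2t + loss_t2i) / 2

# Toy batch of 8 paired image/report embeddings.
torch.manual_seed(0)
loss = image_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(float(loss))
```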
b) Popular CNN architectures: A historical perspective: The evolution of CNN architectures has been driven by continuous innovation in addressing challenges and improving performance. TABLE VII highlights key milestones.

CNN architectures offer unique advantages and have demonstrably excelled in image classification tasks. Their capacity to learn intricate features and generalize to new data underscores their value in advancing image analysis and related research fields. Ongoing research promises further innovations in CNN architecture and training methodologies, leading to increasingly accurate and efficient image classification systems. This progress holds particular significance for the medical domain, where precise image classification can directly impact diagnosis and patient care.

Fig. 2. Illustration of convolutional neural networks (CNNs) inspired by biological visual mechanisms [66].

2) Graph Neural Networks (GNNs): Leveraging relationships in image data: GNNs offer a unique approach to image classification by representing images as graphs and exploiting the relationships between pixels or image regions. This allows GNNs to capture contextual information and learn more robust representations compared to traditional CNNs.

a) GNN variants and their advantages: Two prominent Graph Neural Network (GNN) variants demonstrate considerable potential in image classification: Graph Convolutional Networks (GCNs) and Graph Attention Networks (GATs).
TABLE VI. CNN COMPONENTS AND THEIR ROLES IN MIC

Convolutional Layer. Function: applies filters to extract local features (edges, textures). Role in MIC: hierarchical feature extraction, capturing spatial relationships.

Activation Function (e.g., ReLU). Function: introduces non-linearity for learning complex patterns. Role in MIC: enables complex decision boundaries for accurate classification.

Pooling Layer (e.g., Max Pooling). Function: down-samples feature maps to reduce dimensionality and improve invariance. Role in MIC: improves robustness to image variations and reduces computational cost.

Fully-Connected Layer. Function: integrates local features into global patterns for image understanding. Role in MIC: combines learned features for final class prediction.

Softmax Layer. Function: converts outputs into a probability distribution over predicted classes. Role in MIC: provides class probabilities for determining the most likely class.
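To make TABLE VI concrete, the following minimal PyTorch sketch wires these components into a toy classifier; the grayscale 224x224 input, channel counts, and two-class head are illustrative assumptions, not a recommended clinical architecture.

```python
import torch
import torch.nn as nn

class TinyMedCNN(nn.Module):
    """Minimal CNN mirroring TABLE VI: conv -> ReLU -> pool -> FC -> softmax."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # local feature extraction
            nn.ReLU(),                                   # non-linearity
            nn.MaxPool2d(2),                             # down-sampling / invariance
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, n_classes)  # global integration

    def forward(self, x):                  # x: (B, 1, 224, 224) grayscale scan
        h = self.features(x).flatten(1)
        return self.classifier(h)          # logits; softmax applied at inference

model = TinyMedCNN()
probs = model(torch.randn(2, 1, 224, 224)).softmax(dim=1)  # class probabilities
print(probs.shape)  # torch.Size([2, 2])
```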
TABLE VII. POPULAR CNN ARCHITECTURES AND THEIR ADVANCEMENTS

VGGNet (2014, [11]). Advancement: achieved SOTA performance with increased depth. Key technique: small 3x3 filters for deeper networks.

GoogleNet (2014, [12]). Advancement: further reduced error rates with an efficient architecture. Key technique: Inception modules, 1x1 convolutions, global average pooling.

ResNet (2015, [13]). Advancement: enabled training of very deep networks. Key technique: residual blocks with skip connections to address vanishing gradients.

EfficientNet (2019, [14]). Advancement: SOTA accuracy with fewer parameters. Key technique: compound scaling for optimal efficiency and performance.
GCNs [15], by generalizing the convolution operation to graph data, effectively capture the local graph structure and relationships between nodes. This capability allows GCNs to leverage the inherent structural information within images for improved classification.

GATs [16], on the other hand, introduce an attention mechanism to GNNs. This mechanism enables GATs to focus on relevant features within the graph, leading to improved feature extraction and, ultimately, enhanced prediction accuracy. By selectively attending to important features, GATs can make more informed decisions during image classification.

b) Benefits of GNNs for image classification:

Modeling complex relationships: GNNs excel at capturing intricate dependencies between image elements, leading to better understanding of image context.

Improved feature extraction: By considering relationships between nodes, GNNs can extract more informative and discriminative features for classification.

Enhanced robustness: GNNs are less susceptible to noise and variations in image data due to their focus on structural information.

c) Summary: GNNs offer a valuable complementary approach to CNNs for image classification, particularly when dealing with data where relationships between elements are crucial. Their ability to leverage graph structures and learn contextual representations opens new avenues for improving accuracy and robustness in image classification tasks.
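For illustration, a single GCN layer in the spirit of [15] can be written in a few lines; building the region graph from an image is assumed to happen upstream, and all shapes here are toy values.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Add self-loops, then symmetrically normalize by node degree.
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(norm @ x))  # aggregate, transform, activate

# Toy graph: 5 image regions (nodes) with 8-d features and a sparse adjacency.
torch.manual_seed(0)
x = torch.randn(5, 8)
adj = torch.tensor([[0, 1, 0, 0, 1],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [1, 0, 0, 1, 0]], dtype=torch.float)
print(GCNLayer(8, 16)(x, adj).shape)  # torch.Size([5, 16])
```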
3) Transformers: Expanding horizons in image classification: Transformers, initially designed for NLP, have emerged as powerful contenders in image classification. Unlike CNNs, transformers leverage self-attention mechanisms to capture global context and long-range dependencies within images, leading to richer feature representations.

a) Contributions of transformers to image classification:

Feature extraction: Vision transformers (ViTs [17]) split images into patches, embed them into vectors, and incorporate positional information. Self-attention mechanisms then assess the importance of each patch in relation to others, enabling the capture of global context and intricate features.

Class prediction: A classification head on top of the final transformer encoder layer predicts the image class based on the learned global context. Parallel processing of patches enhances computational efficiency compared to sequential CNNs. A minimal sketch of this pipeline follows.
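The patch-embedding and classification steps described above can be sketched schematically as follows; the tiny embedding size, depth, and two-class head are illustrative assumptions rather than the published ViT configuration.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Patchify -> embed -> [CLS]+positions -> transformer encoder -> class head."""
    def __init__(self, img=224, patch=16, dim=64, n_classes=2):
        super().__init__()
        n_patches = (img // patch) ** 2
        # A strided conv is the standard trick to split and embed patches at once.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # self-attention
        self.head = nn.Linear(dim, n_classes)                      # class prediction

    def forward(self, x):                                # x: (B, 3, 224, 224)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, 196, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])                  # classify from [CLS] token

print(TinyViT()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 2])
```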
b) Evolution of transformer architectures:

Vision Transformer (ViT [17]): Introduced the transformer architecture to image classification, achieving impressive performance with patch-based processing and self-attention.

Data-efficient image Transformers (DeiT [18]): Improved efficiency through knowledge distillation and efficient training strategies, achieving comparable results with fewer resources.

Specialized variants (e.g., TransUnet [19], TransUnet+ [20], and TransUnet++ [21]): Combine transformers with U-Net architectures for enhanced feature extraction and accurate segmentation in medical imaging tasks.

c) Addressing challenges: Techniques like dropout, regularization, and efficient optimization algorithms mitigate overfitting and manage computational complexity in transformers.

In summation, the choice of architecture depends on the specific task and dataset characteristics. CNNs excel at local feature extraction, GNNs leverage relationships within data, and transformers capture global context and long-range dependencies. Understanding these strengths and weaknesses empowers researchers to select the most appropriate architecture for their MIC tasks.

C. Explainable Artificial Intelligence (XAI) in MIC

XAI techniques are crucial for fostering trust and understanding in MIC systems. Despite achieving human-level accuracy, the integration of automated MIC into clinical practice has been limited due to the lack of explanations for algorithmic decisions. XAI methodologies provide insights into the rationale behind the classification results of DL models, such as CNNs
and Transformers, used in MIC tasks. By addressing the 'how' and 'why' behind predictive outcomes, XAI enhances the transparency and interpretability of MIC systems, contributing to their improved performance and acceptance in clinical settings.

1) XAI methods in CNNs and transformers: The field of XAI has witnessed significant advancements, particularly in the domain of MIC. This progress is evident in the evolution of XAI methods, transitioning from those primarily designed for CNNs to novel techniques tailored for Transformer architectures. The following tables provide a comparative analysis of recent advancements in XAI methods applied to CNNs (TABLE VIII) and Transformers (TABLE IX) within the MIC domain, along with techniques used to enhance system performance (TABLE X).

TABLE VIII. XAI METHODS FOR CNNS IN MIC
2) Discussion: The tables above illustrate the diverse range of XAI methods available for both CNNs and Transformers in MIC. While CNN-based methods like LIME, SHAP, and Grad-CAM variants have been widely explored, the emergence of Transformers has led to the development of novel techniques like ProtoPFormer and X-Pruner. These methods offer unique advantages in terms of interpretability and performance improvement.

a) Key observations:

Focus on Visual Explanations: Many XAI methods, particularly those applied to CNNs, emphasize visual explanations through saliency maps and other visualization techniques. This is crucial in MIC, where understanding the model's focus on specific image regions is essential for building trust and ensuring reliable diagnoses.

Evolution from Local to Global Explanations: XAI methods have progressed from providing local explanations for individual predictions (e.g., LIME) to offering global interpretations of model behavior (e.g., SHAP). This allows for a more comprehensive understanding of the decision-making process.

Integration with Model Optimization: Techniques like X-Pruner demonstrate the potential of integrating XAI with model optimization strategies like pruning. This allows for the development of more efficient and interpretable models.

b) Future directions:

Developing XAI methods specifically tailored for Transformer architectures: While existing techniques like Grad-CAM have been adapted for ViTs, further research is needed to explore methods that fully leverage the unique characteristics of Transformers.

Combining XAI with other AI advancements: Integrating XAI with areas like federated learning and continual learning can lead to more robust and adaptable medical image classification systems.

Standardization and Benchmarking: Establishing standardized evaluation metrics and benchmarks for XAI methods will facilitate fair comparisons and accelerate progress in the field.

c) Enhancing performance and accuracy in MIC with XAI: XAI techniques significantly improve the performance and accuracy of MIC systems by providing transparency and facilitating error detection and correction. These techniques help identify and rectify model shortcomings, leading to more reliable and effective MIC systems.

CNN-based XAI Techniques:

LIME creates interpretable models for individual predictions, helping to identify and correct misclassifications by highlighting important features.

SHAP provides a unified measure of feature importance, allowing for precise identification of influential features and potential sources of errors.

CAM-based Methods: These methods generate visual explanations by highlighting regions in the input image that influence the model's predictions, making it easier to spot and address inaccuracies; a minimal Grad-CAM sketch follows.
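As a concrete illustration of the CAM family, here is a minimal Grad-CAM sketch over a generic torchvision CNN; the ResNet-18 backbone and target layer are illustrative choices, not those of a specific cited study.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()   # any CNN classifier works here
acts, grads = {}, {}
layer = model.layer4                    # last conv block: spatial maps live here

layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224)         # stand-in for a preprocessed scan
score = model(x)[0].max()               # score of the predicted class
score.backward()                        # gradients w.r.t. the feature maps

weights = grads["g"].mean(dim=(2, 3), keepdim=True)       # GAP over gradients
cam = torch.relu((weights * acts["a"]).sum(dim=1))        # weighted map, (1, 7, 7)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)  # upsample to the input size to overlay as a heatmap
```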
Transformer-based XAI Techniques:

ProtoPFormer uses prototypical parts to explain predictions, aiding in the identification of errors by comparing new instances with learned prototypes.

X-Pruner prunes less important parts of the model, enhancing interpretability and helping to pinpoint and fix model weaknesses.

GradCAM for Vision Transformer adapts GradCAM for transformers, providing visual explanations that help in diagnosing and correcting errors in transformer-based MIC models.

Impact on MIC:

Error Detection: XAI techniques make it easier to identify misclassifications and understand why they occur, enabling targeted corrections.

Model Improvement: By revealing which features and regions are most influential, XAI helps refine model training and architecture, leading to better performance.

Trust and Reliability: Enhanced transparency builds trust among clinicians, ensuring that MIC systems are more likely to be adopted and relied upon in clinical settings.

Some recent XAI techniques: Recent studies have shown that using XAI methods such as Integrated Gradients can significantly enhance the performance of classification systems. A notable study by Apicella et al. (2023, [67]) investigated the application of Integrated Gradients, a technique from XAI, to enhance the performance of classification models. The study focused on three distinct datasets: Fashion-MNIST, CIFAR10, and STL10. Integrated Gradients was employed to identify and quantify the importance of input features contributing to the model's predictions. By analyzing these feature attributions, the researchers were able to pinpoint which features had the most significant impact on the model's output. The insights gained from Integrated Gradients were then used to refine the model by adjusting the model parameters and structure to better capture the critical features identified by the XAI method. The study demonstrated that, through this process of feature importance analysis and subsequent model optimization, the classification performance improved significantly across all tested datasets. This approach not only enhanced accuracy but also provided a deeper understanding of the model's decision-making process.
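For readers who want the attribution step in such a pipeline made explicit, Integrated Gradients can be approximated with a simple Riemann sum, as sketched below; the toy model and zero baseline are illustrative assumptions, and the model-refinement loop of [67] is not reproduced here.

```python
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """Approximate IG: (x - x0) * integral over a of grad f(x0 + a(x - x0))."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[:, target_class].sum()
        total_grads += torch.autograd.grad(score, point)[0]  # d score / d input
    return (x - baseline) * total_grads / steps  # per-pixel attribution

# Toy usage: a linear "model" over flattened 8x8 images.
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 3))
x = torch.randn(1, 1, 8, 8)
attr = integrated_gradients(model, x, target_class=0)
print(attr.shape)  # same shape as the input: which pixels drove class 0
```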
c) Anatomical structure guidance: ASG (IRA) and MeDSLIP leverage anatomical information to improve interpretability and clinical relevance, leading to more accurate classifications.

d) Multitask capabilities: Many of these models excel at both classification and segmentation tasks, providing a more comprehensive analysis of medical images.

e) Zero-shot and few-shot learning: Several models, including GLoRIA and SAT, demonstrate strong performance even with limited labeled data, making them valuable in scenarios with scarce data resources.

Significantly, Med-VLMs are revolutionizing MIC by leveraging the power of multimodal AI and multitask learning. These models offer enhanced diagnostic precision, efficiency, and interpretability, ultimately leading to improved patient care and outcomes. As research in this area continues, we can expect even more powerful and versatile Med-VLMs to emerge, further transforming the field of medical imaging and healthcare as a whole.
1 https://nihcc.app.box.com/v/ChestXray-NIHCC
2 https://stanfordmlgroup.github.io/competitions/mura/
3 https://clinicalcenter.nih.gov/
4 https://isdis.org/
5 https://camelyon17.grand-challenge.org/
6 https://aimi.stanford.edu/chexpert-chest-x-rays
7 https://physionet.org/content/mimic-cxr/2.0.0/
8 https://www.who.int/data/
9 https://aimi.stanford.edu/medical-imagenet
10 https://www.kaggle.com/datasets
11 https://paperswithcode.com/datasets
MRI: Produces detailed images of internal structures; assesses brain, spinal cord, joints, and organs. Advantages: superior soft tissue contrast, no ionizing radiation, multiplanar imaging, detects subtle abnormalities. Limitations: expensive, long scan times, contraindicated for patients with certain metallic implants. Radiation risk: none. Image detail: very high; best for soft tissue and organ visualization.

Ultrasound: Uses sound waves to produce real-time images; examines the abdomen, pelvis, and heart, and monitors fetal development. Advantages: real-time imaging, non-invasive, safe, portable, widely available, no ionizing radiation. Limitations: operator-dependent, limited penetration in obese patients, and less detailed images compared to other modalities. Radiation risk: none. Image detail: moderate; best for real-time imaging and pregnancy monitoring.
patient anxiety. Deep learning models have shown accuracy comparable to experienced radiologists, with some hybrid models outperforming human experts. The AI-STREAM study [55] aims to generate real-world evidence on the benefits and drawbacks of AI-based computer-aided detection/diagnosis (CADe/x) for breast cancer screening.

Tuberculosis Detection: AI-based CAD systems can assist in community-based active case finding for tuberculosis, especially in areas with limited access to experienced physicians. Okada et al. [56] demonstrated the applicability of AI-CAD for pulmonary tuberculosis in community-based active case finding, showing performance levels nearing human experts. This approach holds promise in triaging and screening for tuberculosis, with significant implications for addressing healthcare professional shortages in low- and middle-income countries. Such advancements contribute to the World Health Organization's goal of "Ending tuberculosis" by 2030.

Eye Disease Diagnosis: Google's deep learning analysis [57] achieved a detection sensitivity of about 98% in diagnosing eye diseases. AI analysis of fundus photographs [58] can assist in diagnosing not only eye diseases but also systemic conditions like heart disease, surpassing human capabilities.

Skin Cancer Diagnosis: AI demonstrates accuracy equivalent to or higher than dermatologists in diagnosing skin cancer, utilizing deep learning on large datasets of skin lesions. Studies have shown AI achieving diagnostic accuracy comparable to dermatologists [59] and even outperforming them in differentiating melanoma [60].
Bone Diseases: The use of AI, particularly deep learning, is gaining traction in the medical community for diagnosing and treating bone diseases. Recent applications focus on segmentation and classification of bone tumors and lesions in medical images. For instance, Zhan et al. [61] developed SEAGNET, a novel framework for segmenting malignant bone tumors. Yildiz Potter et al. [62] explored a multi-task learning approach for automated bone tumor segmentation and classification. Additionally, Ye et al. [63] investigated an ensemble multi-task deep learning framework for the detection, segmentation, and classification of bone tumors and infections using multi-parametric MRI. These studies highlight the potential of deep learning to significantly improve the accuracy and efficiency of diagnosing and treating bone diseases.

Other Pathological Applications: AI has demonstrated superior performance in detecting lymph node metastasis of breast cancer [64] and detecting diabetes from fundus photographs [65] with high sensitivity and specificity. These applications underscore AI's potential in enhancing the accuracy and efficiency of medical imaging diagnosis, ultimately improving patient outcomes and healthcare delivery.

Overall, AI-CAD systems have shown remarkable potential in various medical imaging applications, from breast cancer screening to tuberculosis detection, eye disease diagnosis, skin cancer diagnosis, and other pathological conditions. By leveraging the power of deep learning and large datasets, these systems can augment and enhance human expertise, leading to improved diagnostic accuracy, efficiency, and accessibility in healthcare.

C. Recent Research Trends in Medical Image Classification and Cancer Statistics (2020-2024)

Recent statistics from representative journals using keywords related to medical image classification cover the latest advancements from 2020 to 2024 (TABLE XVI). In addition, the 2024 Cancer Statistics [74] indicate a 33% decrease in cancer deaths in the U.S. since 1991, attributed to reduced smoking, earlier detection, and improved treatments. However, the incidence of six major cancers continues to rise, with colorectal cancer becoming a leading cause of death among men under 50. Efforts like the Persistent Poverty Initiative aim to mitigate the impact of poverty on cancer outcomes, emphasizing the need for increased investment in prevention and disparity reduction.

The report concludes with a projection of the top ten cancer types for new cases and deaths in the United States for 2024, underscoring the ongoing challenge and importance of advancements in medical imaging diagnosis.

TABLE XVI. FIVE-YEAR STATISTICS OF MEDICAL IMAGE CLASSIFICATION RESEARCH IN FOUR REPRESENTATIVE JOURNALS (2020-2024)

No. | Classes      | Springer | ScienceDirect | IEEE | PubMed
1   | cancer       | 4064     | 3474          | 748  | 291
2   | brain        | 3599     | 2984          | 523  | 112
3   | tumor        | 2440     | 2789          | 436  | 103
4   | lesion       | 2378     | 3035          | 286  | 81
5   | lung         | 2374     | 2102          | 433  | 98
6   | breast       | 2019     | 1815          | 309  | 110
7   | eye          | 1979     | 1602          | 144  | 39
8   | COVID        | 1894     | 1460          | 343  | 107
9   | skin         | 1865     | 1647          | 241  | 71
10  | heart        | 1547     | 1489          | 121  | 19
11  | AIDS         | 964      | 738           | 30   | 3
12  | liver        | 898      | 971           | 61   | 25
13  | bone         | 847      | 938           | 83   | 32
14  | cardiac      | 722      | 898           | 30   | 14
15  | prostate     | 581      | 638           | 25   | 14
16  | kidney       | 541      | 696           | 32   | 20
17  | tuberculosis | 471      | 321           | 49   | 4
18  | colorectal   | 442      | 494           | 35   | 22
19  | malaria      | 178      | 115           | 25   | 6

VI. CHALLENGES AND ADVANCEMENTS IN MIC

While MIC has experienced significant progress, challenges remain in data limitations, algorithm development, and healthcare integration. This section explores these challenges and proposes innovative solutions to advance the field.
[34] H. Wang et al., "CCF-GNN: A unified model aggregating appearance, microenvironment, and topology for pathology image classification," IEEE Trans. Med. Imaging, vol. 42, no. 11, pp. 3179–3193, 2023.
[35] B. Wang et al., "GazeGNN: A gaze-guided graph neural network for chest X-ray classification," arXiv [cs.CV], 2023.
[36] F. Almalik, M. Yaqub, and K. Nandakumar, "Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification," arXiv [cs.CV], 2022.
[37] O. N. Manzari, H. Ahmadabadi, H. Kashiani, S. B. Shokouhi, and A. Ayatollahi, "MedViT: A robust vision transformer for generalized medical image classification," Comput. Biol. Med., vol. 157, no. 106791, p. 106791, 2023.
[38] M. Monajatipoor, M. Rouhsedaghat, L. H. Li, C.-C. Jay Kuo, A. Chien, and K.-W. Chang, "BERTHop: An effective vision-and-language model for chest X-ray disease diagnosis," in Lecture Notes in Computer Science, Cham: Springer Nature Switzerland, 2022, pp. 725–734.
[39] X. Zhang, C. Wu, Y. Zhang, W. Xie, and Y. Wang, "Knowledge-enhanced visual-language pre-training on chest radiology images," Nat. Commun., vol. 14, no. 1, p. 4542, 2023.
[40] Z. Lai, Z. Li, L. C. Oliveira, J. Chauhan, B. N. Dugger, and C.-N. Chuah, "CLIPath: Fine-tune CLIP with visual feature fusion for pathology image analysis towards minimizing data collection efforts," 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 2366–2372, 2023.
[41] Y. Zhang, H. Jiang, Y. Miura, C. D. Manning, and C. P. Langlotz, "Contrastive Learning of Medical Visual Representations from Paired Images and Text," in Proceedings of the 7th Machine Learning for Healthcare Conference, 05–06 Aug 2022, vol. 182, pp. 2–25.
[42] C. E. von Schacky et al., "Multitask deep learning for segmentation and classification of primary bone tumors on radiographs," Radiology, vol. 301, no. 2, pp. 398–406, 2021.
[43] S. Graham et al., "One model is all you need: Multi-task learning enables simultaneous histology image segmentation and classification," Med. Image Anal., vol. 83, no. 102685, p. 102685, 2023.
[44] L. Huang, X. Ye, M. Yang, L. Pan, and S. H. Zheng, "MNC-Net: Multi-task graph structure learning based on node clustering for early Parkinson's disease diagnosis," Comput. Biol. Med., vol. 152, no. 106308, p. 106308, 2023.
[45] S. Jiang, Q. Feng, H. Li, Z. Deng, and Q. Jiang, "Attention based multi-task interpretable graph convolutional network for Alzheimer's disease analysis," Pattern Recognit. Lett., vol. 180, pp. 1–8, 2024.
[46] S. Tang et al., "Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images," Comput. Biol. Med., vol. 157, no. 106723, p. 106723, 2023.
[47] J. Tagnamas, H. Ramadan, A. Yahyaouy, and H. Tairi, "Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images," Vis. Comput. Ind. Biomed. Art, vol. 7, no. 1, 2024.
[48] S.-C. Huang, L. Shen, M. P. Lungren, and S. Yeung, "GLoRIA: A multimodal global-local representation learning framework for label-efficient medical image recognition," in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 3942–3951.
[49] Q. Li et al., "Anatomical Structure-Guided medical vision-language pre-training," arXiv [cs.CV], 2024.
[50] W. Fan et al., "MeDSLIP: Medical Dual-Stream Language-Image Pre-training for fine-grained alignment," arXiv [cs.CV], 2024.
[51] B. Liu et al., "Improving medical vision-language contrastive pretraining with semantics-aware triage," IEEE Trans. Med. Imaging, vol. 42, no. 12, pp. 3579–3589, 2023.
[52] M. Y. Lu et al., "A visual-language foundation model for computational pathology," Nat. Med., vol. 30, no. 3, pp. 863–874, 2024.
[53] R. Wang et al., "ECAMP: Entity-centered context-aware Medical Vision language pre-training," arXiv [cs.CV], 2023.
[54] R. C. Mayo, D. Kent, L. C. Sen, M. Kapoor, J. W. T. Leung, and A. T. Watanabe, "Reduction of false-positive markings on mammograms: A retrospective comparison study using an artificial intelligence-based CAD," J. Digit. Imaging, vol. 32, no. 4, pp. 618–624, 2019.
[55] Y. Chang et al., "Artificial intelligence for breast cancer screening in mammography (AI-STREAM): Preliminary interim analysis of a prospective multicenter cohort study," 2024.
[56] K. Okada et al., "Applicability of artificial intelligence-based computer-aided detection (AI-CAD) for pulmonary tuberculosis to community-based active case finding," Trop. Med. Health, vol. 52, no. 1, 2024.
[57] V. Gulshan et al., "Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs," JAMA, vol. 316, no. 22, p. 2402, 2016.
[58] R. Poplin et al., "Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning," Nat. Biomed. Eng., vol. 2, no. 3, pp. 158–164, 2018.
[59] A. Esteva et al., "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[60] H. A. Haenssle et al., "Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists," Ann. Oncol., vol. 29, no. 8, pp. 1836–1842, 2018.
[61] X. Zhan et al., "An intelligent auxiliary framework for bone malignant tumor lesion segmentation in medical image analysis," Diagnostics (Basel), vol. 13, no. 2, p. 223, 2023.
[62] I. Yildiz Potter et al., "Automated bone tumor segmentation and classification as benign or malignant using computed tomographic imaging," J. Digit. Imaging, vol. 36, no. 3, pp. 869–878, 2023.
[63] Q. Ye et al., "Automatic detection, segmentation, and classification of primary bone tumors and bone infections using an ensemble multi-task deep learning framework on multi-parametric MRIs: a multi-center study," Eur. Radiol., 2023.
[64] B. Ehteshami Bejnordi et al., "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer," JAMA, vol. 318, no. 22, p. 2199, 2017.
[65] M. D. Abràmoff, P. T. Lavin, M. Birch, N. Shah, and J. C. Folk, "Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices," NPJ Digit. Med., vol. 1, no. 1, p. 39, 2018.
[66] G. Zhu, B. Jiang, L. Tong, Y. Xie, G. Zaharchuk, and M. Wintermark, "Applications of deep learning to neuro-imaging techniques," Front. Neurol., vol. 10, p. 869, 2019.
[67] A. Apicella, L. Di Lorenzo, F. Isgrò, A. Pollastro, and R. Prevete, "Strategies to exploit XAI to improve classification systems," in Communications in Computer and Information Science, Cham: Springer Nature Switzerland, 2023, pp. 147–159.
[68] A. Apicella, S. Giugliano, F. Isgrò, A. Pollastro, and R. Prevete, "An XAI-based masking approach to improve classification systems," BEWARE@AI*IA, pp. 79–83, 2023.
[69] L. Dao and N. Q. Ly, "A comprehensive study on medical image segmentation using deep neural networks," Int. J. Adv. Comput. Sci. Appl., vol. 14, no. 3, 2023.
[70] T. Beyer et al., "What scans we will read: imaging instrumentation trends in clinical oncology," Cancer Imaging, vol. 20, no. 1, pp. 1–38, 2020.
[71] D. M. H. Nguyen et al., "LVM-Med: Learning large-scale self-supervised vision models for medical imaging via second-order graph matching," arXiv [cs.CV], 2023.
[72] M. Antonelli, A. Reinke, S. Bakas, K. Farahani, and M. Jorge Cardoso, "The Medical Segmentation Decathlon," Nature Communications, vol. 13, no. 1, p. 4128, 2022.
[73] H. Fujita, "AI-based computer-aided diagnosis (AI-CAD): the latest review to read first," Radiol. Phys. Technol., vol. 13, no. 1, pp. 6–19, 2020.
[74] R. L. Siegel, A. N. Giaquinto, and A. Jemal, "Cancer statistics, 2024," CA Cancer J. Clin., vol. 74, no. 1, pp. 12–49, 2024.
[75] H. E. Kim, A. Cosa-Linan, N. Santhanam, M. Jannesari, M. E. Maros, and T. Ganslandt, "Transfer learning for medical image classification: a literature review," BMC Med. Imaging, vol. 22, no. 1, 2022.
[76] M. Islam, H. Zunair, and N. Mohammed, "CosSIF: Cosine similarity-based image filtering to overcome low inter-class variation in synthetic medical image datasets," arXiv [cs.CV], 2023.
[77] T. Huynh, A. Nibali, and Z. He, "Semi-supervised learning for medical image classification using imbalanced training data," arXiv [cs.CV], 2021.
[78] P. Sreenivasulu and S. Varadarajan, "An efficient lossless ROI image compression using wavelet-based modified region growing algorithm," J. Intell. Syst., vol. 29, no. 1, pp. 1063–1078, 2019.
[79] H. Guan and M. Liu, "Domain adaptation for medical image analysis: A survey," arXiv [cs.CV], 2021.
[80] G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren, "Secure, privacy-preserving and federated machine learning in medical imaging," Nat. Mach. Intell., vol. 2, no. 6, pp. 305–311, 2020.
[81] M. U. Alam, J. R. Baldvinsson, and Y. Wang, "Exploring LRP and Grad-CAM visualization to interpret multi-label-multi-class pathology prediction using chest radiography," in 2022 IEEE 35th International Symposium on Computer-Based Medical Systems (CBMS), 2022, pp. 258–263.
[82] C. A. Ramezan, T. A. Warner, and A. E. Maxwell, "Evaluation of sampling and cross-validation tuning strategies for regional-scale machine learning classification," Remote Sens. (Basel), vol. 11, no. 2, p. 185, 2019.
[83] G. Joshi and M. Bhandari, "FDA approved Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices: An updated 2022 landscape," Research Square, 2022.