
Explainable Artificial Intelligence:

A Survey of Needs, Techniques, Applications, and Future Direction

Melkamu Mersha^a, Khang Lam^b, Joseph Wood^a, Ali AlShami^a, Jugal Kalita^a
^a College of Engineering and Applied Science, University of Colorado Colorado Springs, 80918, CO, USA
^b College of Information and Communication Technology, Can Tho University, Can Tho, 90000, Vietnam

Abstract
Artificial intelligence models encounter significant challenges due to their black-box nature, particularly in safety-critical domains
such as healthcare, finance, and autonomous vehicles. Explainable Artificial Intelligence (XAI) addresses these challenges by
providing explanations for how these models make decisions and predictions, ensuring transparency, accountability, and fairness.
Existing studies have examined the fundamental concepts of XAI, its general principles, and the scope of XAI techniques. However,
there remains a gap in the literature as there are no comprehensive reviews that delve into the detailed mathematical representa-
tions, design methodologies of XAI models, and other associated aspects. This paper provides a comprehensive literature review
encompassing common terminologies and definitions, the need for XAI, beneficiaries of XAI, a taxonomy of XAI methods, and
the application of XAI methods in different application areas. The survey is aimed at XAI researchers, XAI practitioners, AI model
developers, and XAI beneficiaries who are interested in enhancing the trustworthiness, transparency, accountability, and fairness of
their AI models.
Keywords: XAI, explainable artificial intelligence, interpretable deep learning, machine learning, neural networks, evaluation
methods, computer vision, natural language processing, NLP, transformers, time series, healthcare, and autonomous cars.

1. Introduction

Since the advent of digital computer systems, scientists have been exploring ways to automate human intelligence via computational representation and mathematical theory, eventually giving birth to a computational approach known as Artificial Intelligence (AI). AI and machine learning (ML) models are being widely adopted in various domains, such as web search engines, speech recognition, self-driving cars, strategy gameplay, image analysis, medical procedures, and national defense, many of which require high levels of security, transparent decision-making, and a responsibility to protect information [1, 2]. Nevertheless, significant challenges remain in trusting the output of these complex ML algorithms and AI models because the detailed inner logic and system architectures are obfuscated from the user by design.

AI has shown itself to be an efficient and effective way to handle many tasks at which humans usually excel. In fact, it has become pervasive, yet hidden from the casual observer, in our day-to-day lives. As AI techniques proliferate, the implementations are starting to outperform even the best expectations across many domains [3]. Since AI solves difficult problems, the methodologies used have become increasingly complex. A common analogy is that of the black box, where the inputs are well-defined, as are the outputs; however, the process between them is not transparent and cannot be easily understood by humans. The AI system does not usually provide any information about how it arrives at the decisions it makes. The systems and processes used in decision-making are often abstruse and contain uncertainty in how they operate. Since these systems impact lives, there is an emerging need to understand how decisions are made. Lack of such understanding makes it difficult to adopt such a powerful tool in industries that require sensitivity or that are critical to the survival of the species.

The black-box nature of AI models raises significant concerns, including the need for explainability, interpretability, accountability, and transparency. These aspects, along with legal, ethical, and safety considerations, are crucial for building trust in AI, not just among scientists but also among the wider public, regulators, and politicians who are increasingly attentive to new developments. With this in mind, there has been a shift from just relying on the power of AI to understanding and interpreting how AI has arrived at decisions, leading to terms such as transparency, explainability, interpretability, or, more generally, eXplainable Artificial Intelligence (XAI). A new approach is required to trust AI and ML models, and though much has been accomplished in the last decades, the interpretability and black-box issues are still prevalent [4, 5]. Attention given to XAI has grown steadily (Figure 1), and XAI has attracted a thriving number of researchers, though there still exists a lack of consensus regarding symbology and terminology. Contributions rely heavily on their own terminology or theoretical framework [6].

Researchers have been working to increase the interpretability of AI and ML models to gain better insight into black-box decision-making. Questions being explored include how to explain the decision-making process, approaches for interpretability and explainability, ethical implications, and detecting and
addressing potential biases or errors [1, 7]. These and other critical questions remain open and require further research. This survey attempts to address these questions and provide new insights to advance the adoption of explainable artificial intelligence among different stakeholders, including practitioners, educators, system designers, developers, and other beneficiaries.

Figure 1: Research publications in the explainability of artificial intelligence (XAI) field during the last few years.

A significant number of comprehensive studies on XAI have been released. XAI survey publications usually focus on describing basic XAI terminology, outlining the explainability taxonomy, presenting XAI techniques, investigating XAI applications, analyzing XAI opportunities and challenges, and proposing future research directions. Depending on the goals of each study, the researchers may concentrate on specific aspects of XAI. Some outstanding survey papers and their main contributions are as follows. Gilpin et al. [8] defined and distinguished the key concepts of XAI, while Adadi and Berrada [9] introduced criteria for developing XAI methods. Arrieta et al. [10] and Minh et al. [11] concentrated on XAI techniques. In addition to XAI techniques, Vilone and Longo [4] also explored the evaluation methods for XAI. Stakeholders, who benefit from XAI, and their requirements were examined by Langer et al. [12]. Speith [13] performed studies on the common XAI taxonomies and identified new approaches to build new XAI taxonomies. Räuker et al. [14] emphasized the inner interpretability of deep learning models. The use of XAI to enhance machine learning models is investigated in the study of Weber et al. [15]. Other works have discussed the applications of XAI in a variety of domains and tasks [16] or in specific domains, such as medicine [17, 18, 19], healthcare [20, 21, 22, 23], and finance [23]. Recently, Longo et al. [24] proposed a manifesto to govern XAI studies and introduced more than twenty open problems in XAI along with suggested solutions.

Our systematic review carefully analyzes more than two hundred studies in the domain of XAI. This survey provides a complete picture of XAI techniques for beginners and advanced researchers. It also covers explainable models, application areas, evaluation of XAI techniques, challenges, and future directions in the domain (Figure 2). The survey provides a comprehensive overview of XAI concepts, ranging from foundational principles to recent studies incorporating mathematical frameworks. In Figure 2, our "all you need here" overview shows how our survey offers a clear and systematic approach, enabling readers to understand the multifaceted nature of XAI. To the best of our knowledge, this is the first work to comprehensively review explainability across traditional neural network models, reinforcement learning models, and Transformer-based models (including large language models and Vision Transformer models), covering various application areas, evaluation methods, XAI challenges, and future research directions.

Figure 2: "All you need here" - A comprehensive overview of XAI concepts.

The main contributions of our work are presented below:

• Develop and present a comprehensive review of XAI that addresses and rectifies the limitations observed in previous review studies.

• Survey more than two hundred research articles in the XAI field in this comprehensive study.

• Discuss the advantages and drawbacks of each XAI technique in depth.

• Highlight the research gaps and challenges of XAI to strengthen future works.

The paper is organized into eight sections: Section 2 introduces relevant terminology and reviews the background and motivation of XAI research. Section 3 and Section 4 present types of explainability techniques and discussions on XAI techniques along different dimensions, respectively. Section 5 discusses XAI techniques in different applications. Section 6 and Section 7 present XAI evaluation methods and future research directions, respectively. Section 8 concludes the survey.

2. Background and Motivation

Black-box AI systems have become ubiquitous and are pervasively integrated into diverse areas. XAI has emerged as a
necessity to establish trustworthy and transparent models, ensure governance and compliance, and evaluate and improve the decision-making process of AI systems.

2.1. Basic Terminology

Before discussing XAI in depth, we briefly present the basic terminology used in this work.

AI systems can perform tasks that normally require human intelligence [25]. They can solve complex problems, learn from large amounts of data, make autonomous decisions, and understand and respond to challenging prompts using complex algorithms.

XAI systems refer to AI systems that are able to provide explanations for their decisions or predictions and give insight into their behaviors. In short, XAI attempts to answer the question "WHY did the AI system do X?". This can help build comprehension about the influences on a model and specifics about where a model succeeds or fails [11].

Trust is the degree to which people are willing to have confidence in the outputs and decisions provided by an AI system. A relevant question is: Does the user trust the output enough to perform real-world actions based upon it? [26].

Machine learning is a rapidly evolving field within computer science. It is a subset of AI that involves the creation of algorithms designed to emulate human intelligence by capturing data from surrounding environments and learning from such data using models, as discussed in the previous paragraph [27]. ML imitates the way humans learn, gradually improving accuracy over time based on experience. In essence, ML is about enabling computers to think and act with less human intervention by utilizing vast amounts of data to recognize patterns, make predictions, and take actions based on that data.

Models and algorithms are two different concepts; however, they are used together in the development of real-world AI systems. A model (in the context of machine learning) is a computational representation of a system whose primary purpose is to make empirical decisions and predictions based on the given input data (e.g., a neural network, decision tree, or logistic regression). In contrast, an algorithm is a set of rules or instructions used to perform a task. Models can be simple or complex, and they are trained on the input data to improve their accuracy in decision-making or prediction. Algorithms can also be simple or complex, but they are used to perform a specific task without any training. Models and algorithms differ by output, function, design, and complexity [28].

Deep learning refers to ML approaches for building multi-layer (or "deep") artificial neural network models that solve challenging problems. Specifically, multiple (and usually complex) layers of neural networks are used to extract features from data, where the layers between the input and output layers are "hidden" and opaque [29].

A black-box model refers to the lack of transparency and understanding of how an AI model works when making predictions or decisions. Extensive increases in the amount of data and the performance of computational devices have driven AI models to become more complex, to the point that neural networks have arguably become as opaque as the human brain [30]. The model accepts the input and gives the output or the prediction without any reasonable details about why and how the model made that prediction or decision. The black-box nature of AI models can be attributed to various factors, including model complexity, optimization algorithms, large and complex training data sets, and the algorithms and processes used to train the models. Deep neural AI models, in particular, exacerbate these concerns due to the design of deep neural networks (DNN), with components that remain hidden from human comprehension.

2.2. Need for Explanation

Black-box AI systems have become ubiquitous throughout society, extensively integrated in a diverse range of disciplines, and can be found permeating many aspects of daily activities. The need for explainability in real-world applications is multifaceted and essential for ensuring the performance and reliability of AI models while allowing users to work effectively with these models. XAI is becoming essential in building trustworthy, accountable, and transparent AI models to satisfy delicate application designs [31, 32].

Transparency: Transparency is the capability of an AI system to provide understandable and reasonable explanations of a model's decision or prediction process [4, 33, 34]. XAI systems explain how AI models arrive at their prediction or decision so that experts and model users can understand the logic behind the AI systems [17, 35], which is crucial for trustworthiness and transparency. Transparency has a meaningful impact on people's willingness to trust an AI system, whether through directly interpretable models or through the explanations provided by an XAI system [36]. For example, if a voice-to-text recognition system on a mobile device produces a wrong transcription, the consequences may not always be a big concern, although it may be irritating. This may also be the case in a chat program like ChatGPT if the questions and answers are "simple". In these cases, the need for explainability and transparency is less profound. In contrast, explainability and transparency are crucial in safety-critical systems such as autonomous vehicles, medical diagnosis and treatment systems, air traffic control systems, and military systems [2].

Governance and compliance issues: XAI enables governance in AI systems by confirming that decisions made by AI systems are ethical, accountable, transparent, and compliant with any laws and regulations. Organizations in domains such as healthcare and finance can be subject to strict regulations, requiring human understanding for certain types of decisions made by AI models [1, 37, 38]. For example, if someone is denied a loan by the bank's AI system, he or she may have the right to know why the AI system made this decision. Similarly, if a class essay is graded by an AI and the student gets a bad grade, an explanation may be necessary. Bias is often present in the nature of ML algorithms' training process, which is sometimes difficult to notice. This raises concerns about an algorithm acting in a discriminatory way. XAI has been found to serve as a potential remedy for mitigating issues of discrimination in the realms of law and regulation [39]. For instance,
if AI systems use sensitive and protected attributes (e.g., religion, gender, sexual orientation, and race) and make biased decisions, XAI may help identify the root cause of the bias and give insight to rectify the wrong decision. Hence, XAI can help promote compliance with laws and regulations regarding data privacy and protection, discrimination, safety, and reliability.

Model performance and debugging: XAI offers potential benefits in enhancing the performance of AI systems, particularly in terms of model design and debugging as well as decision-making processes [2, 40, 41]. The use of XAI techniques facilitates the identification and selection of relevant features for developing accurate and practical models. These techniques help tune hyperparameters such as the choice of activation functions, the number of layers, and learning rates to prevent underfitting or overfitting. The explanation also helps developers with bias detection in the decision-making process. If developers quickly detect the bias, they can adjust the system to ensure that outputs are unbiased and fair. XAI can enable developers to identify decision-making errors and correct them, helping develop more accurate and reliable models. Explanation can enable users to have more control over the models so as to be able to modify the input parameters and observe how parameter changes affect the prediction or decision. Users can also provide feedback to improve the model decision process based on the XAI explanation.

Reliability: ML models' predictions and outputs may result in unexpected failures. We need some control mechanisms or accountability to trust the AI models' predictions and decisions. For example, a wrong decision by a medical or self-driving black-box may result in high risk for the impacted human beings [31, 38].

Safety: In certain applications, such as self-driving cars or military drones, it is important to understand the decisions made by an AI system in order to ensure the safety, security, and lives of the humans involved [42].

Human-AI collaboration: XAI can facilitate collaboration between humans and AI systems by enabling humans to understand the reasoning behind an AI's actions [43].

2.3. Stakeholders of XAI

Broadly speaking, all users of XAI systems, whether direct or indirect, stand to benefit from AI technology. Some of the most common beneficiaries of the XAI system are identified in Figure 3.

Figure 3: XAI stakeholders/beneficiaries.

Society: XAI plays a significant role in fostering social collaboration and human-machine interactions [2] by increasing the trustworthiness, reliability, and responsibility of AI systems, helping reduce negative impacts such as unethical use of AI systems, discrimination, and biases. Hence, XAI promotes trust and the usage of models in society.

Governments and associated organizations: Governments and governmental organizations have become AI system users. Therefore, governments stand to benefit greatly from XAI systems. XAI can help develop the government's public policy decisions, such as public safety and resource allocation, by making them transparent, accountable, and explainable to society.

Industries: XAI is crucial for industries to provide transparent, interpretable, accountable, and trustable services and decision-making processes. XAI can also help industries identify and reduce the risk of errors and biases, improve regulatory compliance, enhance customer trust and confidence, facilitate innovations, and increase accountability and transparency.

Researchers and system developers: The importance of XAI to researchers and AI system developers cannot be overstated, as it provides critical insights that lead to improved model performance. Specifically, XAI techniques enable them to understand how AI models make decisions, and enable the identification of potential improvements and optimizations. XAI helps facilitate innovation and enhance the interpretability and explainability of the model. From a regulatory perspective, XAI can help enhance compliance with legal issues, in particular laws and regulations related to fairness, privacy, and security in the AI system. Finally, XAI can facilitate the debugging process critical to researchers and system developers, leading to the identification and correction of errors and biases.

2.4. Interpretability vs. Explainability

The concepts of interpretability and explainability are difficult to define rigorously. There is ongoing debate and research about the best ways to operationalize and measure these two concepts. Even terminology can vary or be used in contradictory ways, though the concepts of building comprehension about what influences a model, how influence occurs, and where the model performs well and fails, are consistent within the many definitions of these terms. Most studies at least agree that explainability and interpretability are related but distinct concepts. Previous work suggests that interpretability is not a monolithic concept but a combination of several distinct ideas
that must be disentangled before any progress can be made toward a rigorous definition [44]. Explainability is seen as a subset of interpretability, which is the overarching concept that encompasses the idea of opening the black box.

The very first definition of interpretability in ML systems is "the ability to explain or to present in understandable terms to a human" [39], while explainability is "the collection of features of the interpretable domain, that have contributed for a given example to produce a decision" [45]. As indicated by Fuhrman et al. [46], interpretability refers to "understanding algorithm output for end-user implementation" and explainability refers to "techniques applied by a developer to explain and improve the AI system". Gurmessa and Jimma [47] defined these concepts as "the extent to which human observers can understand the internal decision-making processes of the model" and "the provision of explanations for the actions or procedures undertaken by the model", respectively.

According to Das et al. [48], interpretability and explainability are the ability "to understand the decision-making process of an AI model" and "to explain the decision-making process of an AI model in terms understandable to the end user", respectively. Another study defines these two concepts as "the ability to determine cause and effect from a machine learning model" and "the knowledge to understand representations and how important they are to the model's performance", respectively [8]. AWS reports that interpretability is "to understand exactly why and how the model is generating predictions", whereas explainability is "how to take an ML model and explain the behavior in human terms".

The goal of explainability and interpretability is to make it clear to a user how the model arrives at its output, so that the user can understand and trust the model's decisions. However, there are no satisfactory functionally-grounded criteria or universally accepted benchmarks [49]. The most common definitions hold that interpretable ML models are those that are easy to understand and describe, while explainable ML models can provide an explanation for their predictions or decisions [50]. A model that is highly interpretable is one that is simple and transparent, and whose behavior can be easily understood and explained by humans. Conversely, a model that is not interpretable is one that is complex and opaque, and whose behavior is difficult for humans to understand or explain [51].

In general, interpretability is concerned with how a model works, while explainability is concerned with why a model makes a particular prediction or decision. Interpretability is crucial because it allows people to understand how a model is making predictions, which can help build trust in the model and its results. Explainability is important because it allows people to understand the reasoning behind a model's predictions, which can help identify any biases or errors in the model. Table 1 presents some representative XAI techniques and where they lie on the spectrum.

3. Categories of Explainability Techniques

In this section, we introduce a taxonomy for XAI techniques and use specific criteria for general categorization. These explainability criteria, such as scope, stage, result, and function [48, 10, 13], are what we believe to be the most important because they provide a systematic and comprehensive framework for understanding and evaluating different XAI techniques. We have developed this taxonomy through rigorous study and analysis of existing taxonomies, along with an extensive review of research literature pertinent to explainable artificial intelligence. We categorize our reviewed papers by the scope of explainability and the training level or stage. An explainability technique can be either global or local, and model-agnostic or model-specific, and it can explain the model's output or function [9].

3.1. Local and Global Explanation Techniques

Local and global approaches refer to the scope of explanations provided by an explainability technique. Local explanations are focused on explaining predictions or decisions made for a specific instance or input to a model [2, 10]. This approach is particularly useful for examining the behavior of the model in relation to local, individual predictions or decisions. Global techniques provide either an overview or a complete description of the model, but such techniques usually require knowledge of the input data, the algorithm, and the trained model [44]. A global explanation technique needs to understand the whole structure, features, weights, and other parameters of the model. In practice, global techniques are challenging to implement since complex models with multiple dimensions and millions of parameters and weights are difficult to understand.

3.2. Ante-hoc and Post-hoc Explanation Techniques

Ante-hoc and post-hoc explanation techniques are two different ways to explain the inner workings of AI systems. The critical difference between them is the stage at which they are implemented [7]. Ante-hoc XAI techniques are employed during the training and development stages of an AI system to make the model more transparent and understandable, whereas post-hoc explanation techniques are employed after the AI models have been trained and deployed to explain the model's prediction or decision-making process to the model users. Post-hoc explainability focuses on models that are not readily explainable by ante-hoc techniques. Ante-hoc and post-hoc explanation techniques can be employed in tandem to gain a more comprehensive understanding of AI systems, as they are mutually reinforcing [10]. Some examples of ante-hoc XAI techniques are decision trees, general additive models, and Bayesian models. Some examples of post-hoc XAI techniques are Local Interpretable Model-Agnostic Explanations (LIME) [26] and Shapley Additive Explanations (SHAP) [52].

Arrieta et al. [10] classify post-hoc explanation techniques into two categories:

• Model-specific approaches provide explanations for the predictions or decisions made by a specific AI model, based on the model's internal working structure and design. These techniques may not apply to other models with varying architectures, since they are designed for specific models [4]. However, a model-specific technique provides
good insights into how the model works and makes a decision. For example, neural networks, random forests, and support vector machine models require model-specific explanation methods. A model-specific technique in neural networks provides more comprehensive insights into the network structure, including how weights are allocated to individual neurons and which neurons are explicitly activated for a given instance.

• Model-agnostic approaches are applied to all AI models and provide explanations of the models without depending on an understanding of the model's internal working structure or design. This approach is used to explain complex models that are difficult to explain using ante-hoc explanation techniques. Model-agnostic approaches are model flexible, explanation flexible, and representation flexible, making them useful for a wide range of models. However, if the model is very complex, it may be hard to understand its behavior globally due to its flexibility and interpretability [51].

Table 1: Examples of representative XAI techniques and where they lie on the spectrum.

• Linear regression (closer to interpretability). How it works: uses a linear relationship between the input features and the target variables to make predictions. How to understand and explain: because the model is based on simple linear equations, it is easy for a human to understand and explain the relationship between the input features and the target variable; this is built into the model.

• Rule-based models (closer to interpretability). How it works: use a set of explicit rules to make predictions. How to understand and explain: because the rules are explicit and transparent, these models are both interpretable and explainable, as it is easy for a human to understand and explain the rules that the model is using.

• Decision trees (in the middle). How it works: use a series of simple decision rules to make predictions. How to understand and explain: the decision rules are based on the values of the input features; because it is easy to trace the model's predictions back to the input data and the decision rules, it is both interpretable and explainable.

• Feature importance analysis (closer to explainability). How it works: uses an algorithm to identify the most important features in a model's prediction or decision. How to understand and explain: because it provides a clear understanding of which features are most important, it is easy to trace the model's predictions back to the input features; this is usually post-hoc and not part of the model architecture, so it is more explainable than interpretable.

• Local interpretable model-agnostic explanations (LIME) (closer to explainability). How it works: uses an approximate model to provide explanations for the predictions of a complex ML model, approximating the complex model with a simple, interpretable model and providing explanations based on the simple model. How to understand and explain: because it provides explanations for the predictions of a complex model in a way that is understandable to a human and is model-agnostic, LIME is explainable.

3.3. Perturbation-based and Gradient-based XAI

Perturbation-based and gradient-based methods are two of the most common algorithmic design methodologies for developing XAI techniques. Perturbation-based methods operate by modifying the input data, while gradient-based methods calculate the gradients of the model's prediction with respect to its input data. Both techniques compute the importance of each input feature through different approaches and can be used for local and global explanations. Additionally, both techniques are generally model-agnostic.

Perturbation-based XAI methods use perturbations to determine the importance of each feature in the model's prediction process. These methods involve modifying the input data, such as removing certain input examples, masking specific input features, or adding noise over the input features, and then observing how the model's output changes as a result, analyzing the extent to which the output is affected by the change in the input data. By comparing the original output with the output from the modified input, it is possible to infer which features of the input data are most important for the model's prediction [26]. The importance value of each feature provides valuable insights into how the model made that prediction [52].
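To make the perturbation idea above concrete, the following is a minimal, illustrative sketch (not code from any surveyed work) of estimating feature importance by masking one feature at a time and measuring how much the model's predicted probability changes. The choice of model, dataset, and of replacing a feature with its training mean are assumptions made only for illustration; any model exposing a predict-probability function could be used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer

# Train a black-box model on a standard dataset.
X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def perturbation_importance(model, x, background):
    """Score each feature by how much masking it changes the prediction.

    x: a single instance (1-D array); background: training data used to
    supply a 'neutral' replacement value (here, the feature mean).
    """
    base = model.predict_proba(x.reshape(1, -1))[0, 1]
    scores = np.zeros(len(x))
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = background[:, i].mean()   # perturb one feature
        new = model.predict_proba(x_masked.reshape(1, -1))[0, 1]
        scores[i] = abs(base - new)             # importance = output change
    return scores

scores = perturbation_importance(model, X[0], X)
print(np.argsort(scores)[::-1][:5])  # indices of the five most influential features
```

More elaborate perturbation schemes (as in LIME and SHAP below) differ mainly in how the perturbed samples are generated and how the resulting output changes are aggregated into attributions.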
Hence, the explanation of the model is generated iteratively using perturbation-based XAI techniques such as LIME, SHAP, and counterfactual explanations.

Gradient-based XAI methods obtain the gradients of the model's prediction with respect to its input features. These gradients reflect the sensitivity of the model's output to changes in each input feature [53]. A higher gradient value for an input feature implies greater importance for the model's prediction. Gradient-based XAI methods are valuable for their ability to handle high-dimensional input spaces and their scalability to large datasets and models. These methods can help gain a deeper understanding of the model and detect errors and biases that decrease its reliability and accuracy, particularly in safety-critical applications such as health care and self-driving cars [2]. Class activation maps, integrated gradients, and saliency maps are among the most commonly used gradient-based XAI methods.

Figure 4 presents a summary of the explainability taxonomy discussed in this section. Figure 5 illustrates a chronological overview of the state-of-the-art XAI techniques focused on in this survey. Perturbation-based methods like LIME, SHAP, and counterfactual explanations, and gradient approaches, including LRP, CAM, and Integrated Gradients, have been selected for detailed discussion in this context. They serve as foundational frameworks upon which other techniques are built, highlighting their significance in the field. "Transformer Interpretability Beyond Attention Visualization" [54] and "XAI for Transformers: Better Explanations through Conservative Propagation" [55] are foundational works for discussing Transformer explainability, providing key insights and practices that serve as a baseline in this survey.

Figure 4: Explainability taxonomy.

Figure 5: Chronological order of state-of-the-art XAI techniques.

4. Detailed Discussions on XAI Techniques

XAI techniques differ in their underlying mathematical principles and assumptions, as well as in their applicability and limitations. We classify the widely used XAI techniques based on perturbation, gradient, and the use of the Transformer [56] architecture. The Transformer has become a dominant architecture in deep learning, whether in natural language processing, computer vision, time series data, or other areas. As a result, we include a separate section on Transformer explainability.

4.1. Perturbation-based Techniques

Perturbation-based XAI methods are used to provide local and global explanations of black-box models by making small and controlled changes to the input data to gain insights into how the model made a decision. This section discusses the most predominant perturbation-based XAI techniques, such as LIME, SHAP, and Counterfactual Explanations (CFE), including their mathematical formulations and underlying assumptions.

4.1.1. LIME

A standard definition of a black-box model f, where the internal workings are unknown, is f : X → Y, where X and Y represent the input and output spaces, respectively [48, 57]. Specifically, x ∈ X denotes an input instance, and y ∈ Y denotes the corresponding output or prediction. Let X′ be the set of perturbed and generated sample instances around the instance x, and x′ ∈ X′ an instance from this set. Another function g maps instances of X′ to a set of representations denoted as Y′ which are designed to be easily understandable or explainable: g : X′ → Y′, where y′ ∈ Y′ is an output from the set of possible outputs in Y′. The use of interpretable instances in Y′ allows for clearer insights into the model's prediction processes.

LIME [26] provides an explanation for each input instance x, where f(x) = y′ ≈ y is the prediction of the black-box model. The LIME model is g ∈ G, where g is an explanation model belonging to a set of interpretable models G; every g ∈ G is assumed to be "good enough" to be interpretable. To support this, LIME uses three important arguments: a measure of complexity Ω(g) of the explanation, ensuring it remains simple enough for human understanding; a proximity measure π_x(z) that quantifies the closeness between the original instance x and its perturbations; and a fidelity measure ζ(f, g, π_x), which assesses how well g approximates f's predictions, aiming for this value to be minimal to maximize the faithfulness of the explanation. The explanation produced by LIME is obtained by the following formula:

\xi(x) = \operatorname*{argmin}_{g \in G} \; \zeta(f, g, \pi_x) + \Omega(g).    (1)

Figure 6 illustrates the LIME model for explaining a prediction of a black-box model based on an instance. LIME can be considered a model-agnostic technique for generating explanations that can be used across different ML models. LIME is insightful in understanding the specific decisions of a model by providing local, individual instance explanations, and in detecting and fixing biases by identifying the most influential features for a particular decision made by a model [10].

Figure 6: Illustration of LIME detailing instance-level interpretations for predictions by a black-box model.
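As a concrete illustration of the local, perturbation-based optimization in Eq. (1), the sketch below uses the open-source lime package on a tabular classifier. It is only an illustrative example under stated assumptions (a scikit-learn random forest and the Iris dataset); it is not code from the surveyed works.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Black-box model f trained on tabular data.
data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# LIME perturbs the instance, weights neighbors by proximity (pi_x),
# and fits a simple interpretable surrogate g around x.
explainer = LimeTabularExplainer(
    training_data=data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

x = data.data[0]
explanation = explainer.explain_instance(
    data_row=x,
    predict_fn=model.predict_proba,   # only access to f's outputs is needed
    num_features=4,                   # complexity budget, playing the role of Omega(g)
)
print(explanation.as_list())          # (feature condition, local weight) pairs
```

Each returned pair is a locally faithful weight of the surrogate model g, which is what the optimization in Eq. (1) trades off against the complexity term Ω(g).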

4.1.2. SHAP

SHAP [52] is a model-agnostic method, applicable to any ML model, ranging from simple linear models to complex DNNs. This XAI technique employs contribution values as a means of explicating the extent to which features contribute to a model's output. The contribution value is then leveraged to explain the output of a given instance x. SHAP computes the average contribution of each feature over subsets of features by simulating the model's behavior for all combinations of feature values. The difference in output is computed when a feature is excluded or included in that output process. The resulting contribution values give a measure of the feature relevance, which is significant to the model's output [52, 58].

Assume f is the original or black-box model, g is the explanation model, M is the number of simplified input features, x is a single input, and x′ is a simplified input such that x = h_x(x′). Additive feature attribution methods, such as SHAP, have a linear-function model explanation with binary variables:

g(x') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,    (2)

where φ_0 is the default explanation when no binary features are present, z′_i ∈ {0, 1}^M and φ_i ∈ R. The SHAP model explanation must satisfy three properties to provide high accuracy [52]: (i) "local accuracy", which requires that the explanation model g(x′) matches the original model f(x); (ii) "missingness", which states that if the simplified inputs denote feature presence, then an absent feature's attributed influence should be 0 (more simply, if a feature is absent, it should have no impact on the model output); and (iii) "consistency", which requires that if the contribution of a simplified input increases or stays the same (regardless of the other inputs), then the input's attribution should not decrease.

SHAP leverages contribution values to explain the importance of each feature to a model's output. The explanation is based on Shapley values, which represent the average contribution of each feature over all possible subsets of features. However, in complex models, SHAP approximates the feature influences, which may result in less accurate explanations. SHAP's explanation is model-output dependent: if the model is biased, SHAP's explanation reflects the bias of the model's behavior.
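The following is a minimal sketch of how the Shapley-value attributions behind Eq. (2) are typically computed in practice with the open-source shap package; the tree-based model and dataset are assumptions made purely for illustration.

```python
import shap
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Black-box model f.
X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes (approximate) Shapley values phi_i for each feature,
# so that phi_0 + sum_i phi_i should match the model's prediction for the
# instance -- the "local accuracy" property described above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])                 # attributions for one instance

phi_0 = float(np.atleast_1d(explainer.expected_value)[0])  # baseline (default) explanation
prediction = model.predict(X[:1])[0]
reconstruction = phi_0 + shap_values[0].sum()

print("prediction:", prediction)
print("phi_0 + sum(phi_i):", reconstruction)               # approximately equal
print("per-feature contributions:", np.round(shap_values[0], 2))
```

The per-feature contributions are the φ_i of Eq. (2); summing them with the baseline φ_0 reconstructs the prediction, which is a convenient sanity check when applying SHAP to a new model.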
4.1.3. CFE

CFE [59] is used to explain the predictions made by an ML model using generated hypothetical scenarios to understand how the model's output is affected by changes in the input data. Standard classification models are trained to find the optimal set of weights ω:

\operatorname*{argmin}_{\omega} \; \zeta(f_\omega(x_i), y_i) + \rho(\omega),    (3)

where f is a model, ρ is the regularizer to prevent overfitting in the training process, y_i is the label for data point x_i, and ω represents the model parameters to be optimized. The argument argmin is the value of the variable that minimizes the given function; then,

\operatorname*{argmin}_{x'} \max_{\lambda} \; \lambda \, (f_\omega(x') - y')^2 + d(x_i, x').    (4)

This equation performs two different distance computations, comprising a quadratic distance between the model's output for the counterfactual x′ and the required output y′, and a distance d between the input data x_i to be explained and the counterfactual x′. The value of λ can be maximized by solving for x′ iteratively and increasing the value of λ until the closest solution is located. The distance function d (viz., the Manhattan distance) should be carefully chosen based on task-specific requirements.

The CFE technique offers valuable insights into the decision-making process of a model, thereby aiding in the identification of biases and errors present in the black-box model. Importantly, the interpretation of the results and the insights gained from the counterfactual explanations can be model-agnostic, while the generation of the counterfactuals may not be. In addition to being computationally expensive to generate counterfactuals, CFE is limited to individual instances and might not provide a general picture of the model's behavior. CFE is data-distribution dependent, which may make it inconsistent if the training data is incomplete or biased. Finally, CFE is sensitive to ethical concerns if counterfactuals make suggestions involving protected attributes (e.g., gender or race).

4.2. Gradient-based Techniques

Gradient-based techniques use the gradients of the output with respect to the input features. They can handle high-dimensional input spaces, are scalable to large datasets, provide deep insights into a model, and help detect errors and biases. Saliency Maps, Layer-wise Relevance Propagation (LRP), Class Activation Maps (CAM), and Integrated Gradients are the most common gradient-based XAI techniques, and they are good frameworks for building other techniques.

4.2.1. Saliency Map

Simonyan et al. [60] utilized a saliency map for model explanation for the first time in deep CNNs. As a visualization technique, a saliency map, which is a model-agnostic technique, highlights important features in an image classification model by computing the output's gradients for the input image and visualizing the most significant regions of that image for the model's prediction.

Suppose an image I_0, a specific class c, and a CNN classification model with the class score function S_c(I) (that is used to determine the score of the image) are analyzed. The pixels of I_0 are ranked based on their impact on this score S_c(I_0). The linear score model for the class c is obtained as follows:

S_c(I) = \omega_c^T I + b_c,    (5)

where I is a one-dimensional vectorized image, ω_c is the weight vector, and b_c is the model's bias. The magnitude of the weights ω specifies the relevance of the corresponding pixels of an image I for class c. In the case of non-linear functions, S_c(I) needs to be approximated in the neighborhood of I_0 with a linear function using a first-order Taylor expansion:

S_c(I) \approx \omega^T I + b,    (6)

where ω is the derivative of S_c with respect to the input image I at the particular point I_0:

\omega = \frac{\partial S_c}{\partial I} \Big|_{I_0}.    (7)

The saliency map is sensitive to noise in the input data, which may lead to incorrect explanations. The saliency map is only applicable to gradient-based models. It provides a local explanation for individual predictions without suggesting the global behavior of a model. The explanation of saliency maps is sometimes ambiguous, where multiple features are highlighted, particularly in complex images [10].
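Below is a minimal PyTorch sketch of the vanilla saliency map in Eqs. (5)-(7): the gradient of the class score with respect to the input pixels, reduced to a per-pixel importance magnitude. The pretrained ResNet-18 and the random input tensor are assumptions for illustration only; in practice the input would be a properly preprocessed image.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained CNN whose class score S_c(I) we differentiate.
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

# A stand-in for a preprocessed image (batch of 1, 3 x 224 x 224).
image = torch.rand(1, 3, 224, 224, requires_grad=True)

scores = model(image)                    # class scores S(I)
c = scores.argmax(dim=1).item()          # explain the predicted class c
scores[0, c].backward()                  # omega = dS_c / dI evaluated at I_0

# Saliency = max over color channels of |gradient|, one value per pixel.
saliency = image.grad.abs().max(dim=1)[0].squeeze(0)
print(saliency.shape)                    # torch.Size([224, 224])
```

The resulting map can be overlaid on the input image as a heatmap; pixels with large gradient magnitude are the ones to which the class score is most sensitive.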
4.2.2. LRP

The main goal of LRP [61] is to explain each input feature's contribution to the model's output by assigning a relevance score to each neuron. LRP, as visualized in Figure 7, propagates relevance scores backward through the network layers. Assigning a relevance score to each neuron allows for the determination of the contribution of each input feature to the output of the model.

Figure 7: Visualizing the LRP technique (adapted from [62]). Each neuron redistributes to the lower layer (R_j) the relevance score it receives from the higher layer (R_k).

LRP is subject to the conservation property, which means that the relevance score a neuron receives must be redistributed to the lower layer in an equal amount. Assume j and k are two consecutive layers, where layer k is closer to the output layer. The neurons in layer k have computed their relevance scores, denoted as (R_k)_k, propagating relevance scores to layer j. Then, the relevance score propagated to neuron j is computed using the following formula [62]:

R_j = \sum_k \frac{z_{jk}}{\sum_j z_{jk}} R_k,    (8)

where z_{jk} is the contribution of neuron j to R_k, and \sum_j z_{jk} is used to enforce the conservation property. In this context, a pertinent question arises: how is z_{jk}, which represents the contribution of a neuron j to a neuron k in the network, ascertained? LRP uses three significant rules to address this question [61].

• The basic rule redistributes the relevance score to the input features in proportion to their positive contribution to the output.

• The Epsilon rule uses an ϵ term to diminish relevance scores when contributions to neuron k are weak and contradictory.

• The Gamma rule uses a large value of γ to reduce negative contributions, lower noise, and enhance stability.

Overall, LRP is faithful, meaning that it does not introduce any bias into the explanation [61]. This is important for ensuring that the explanations are accurate and trustworthy. LRP is complex to implement and interpret, which requires a good understanding of the neural network's architecture. It is computationally expensive for large and complex models to compute the backpropagated relevance scores throughout all layers of the network. LRP is only applicable to backpropagation-based models like neural networks. It requires access to the internal structure and parameters of the model, which is sometimes impossible if a model is proprietary. LRP is a framework for other XAI techniques; however, there is a lack of standardization, which leads to inconsistent explanations across different implementations.

4.2.3. CAM

CAM [63] is an explanation technique typically used for CNNs and deep learning models applied to image data. For example, CAM can explain the predictions of a CNN model by indicating which regions of the input image the model is focusing on, or it can simply provide a heatmap for the output of the convolutional layer, as shown in Figure 8.

Let f_k(x, y) denote the activation of unit k in the last convolutional layer at location (x, y) in the given image; the global average pooling is computed by \sum_{x,y} f_k(x, y). Then, the input to the softmax function, called S_c, for a given class c is obtained using the following formula:

S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y) = \sum_{x,y} \sum_k w_k^c f_k(x, y),    (9)

where w_k^c is the weight relating to class c for unit k. We can compute the class activation map M_c for class c at each spatial element as \sum_k w_k^c f_k(x, y). Therefore, S_c for a given class c can be rewritten:

S_c = \sum_{x,y} M_c(x, y).    (10)

In the previous formula, M_c(x, y) shows the significance of the activation at spatial location (x, y) and how critical that location is for assigning the image to class c.

CAM is a valuable explanation technique for understanding the decision-making process of deep learning models applied to image data. However, it is important to note that CAM is model-specific, as it requires access to the architecture and weights of the CNN model being used.

Figure 8: The predicted class score is mapped back to the previous convolutional layer to generate the class activation maps (input image from the CIFAR-10 dataset).
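The sketch below illustrates Eqs. (9)-(10) directly: given the last convolutional feature maps f_k(x, y) and the classifier weights w_k^c of a GAP-based CNN, the class activation map is simply the weighted sum of the feature maps. The tensor shapes and random values are assumptions for illustration; in a real model, the feature maps and weights would come from the trained network (for instance, the layer before global average pooling and the final linear layer).

```python
import torch

# Assume a CNN ending in global average pooling + a linear classifier.
# feature_maps: activations f_k(x, y) of the last conv layer, shape (K, H, W).
# class_weights: classifier weights w_k^c, shape (num_classes, K).
K, H, W, num_classes = 512, 7, 7, 10
feature_maps = torch.rand(K, H, W)
class_weights = torch.rand(num_classes, K)

def class_activation_map(feature_maps, class_weights, c):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y) -- the per-location map of Eq. (10)."""
    w_c = class_weights[c]                                  # (K,)
    return torch.einsum("k,khw->hw", w_c, feature_maps)

c = 3
cam = class_activation_map(feature_maps, class_weights, c)  # (H, W) heatmap

# Consistency check with Eq. (9): summing the map over locations recovers the
# pre-softmax class score computed via global average pooling, up to the
# 1/(H*W) averaging constant.
gap = feature_maps.mean(dim=(1, 2))                         # (K,)
score_via_gap = (class_weights[c] * gap).sum()
print(torch.allclose(cam.sum() / (H * W), score_via_gap, atol=1e-5))
```

Upsampling the (H, W) map to the input resolution gives the familiar CAM heatmap highlighting the image regions responsible for class c.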
4.2.4. Integrated Gradients

Integrated Gradients [64] provides insights into the input-output behavior of DNNs, which is critical for improving and building transparent ML models. Sundararajan et al. [64] strongly advocated that all attribution methods must adhere to two axioms. The Sensitivity axiom is defined such that "an attribution method satisfies Sensitivity if for every input and baseline that differ in one feature but have different predictions, then the differing feature should be given a non-zero attribution". The violation of the Sensitivity axiom may expose the model to gradients being computed using non-relevant features. Thus, it is critical to control this sensitivity violation to ensure the attribution method is in compliance. The Implementation Invariance axiom is defined as "two networks are functionally equivalent if their outputs are equal for all inputs, despite having very different implementations. Attribution methods should satisfy Implementation Invariance, i.e., the attributions are always identical for two functionally equivalent networks". Suppose two neural networks perform the same task and generate identical predictions for all inputs. Then, any attribution method used on them should provide the same attribution values for each input to both networks, regardless of the differences in their implementation details. This ensures that the attributions are not affected by small changes in implementation details or architecture, thus controlling inconsistent or unreliable outputs. In this way, Implementation Invariance is critical to ensuring the consistency and trustworthiness of attribution methods.

Consider a function F : R^n → [0, 1], which represents a DNN. We take x ∈ R^n to be the input instance and x′ ∈ R^n to be the baseline input. In order to produce a counterfactual explanation, it is important to define the baseline as the absence of a feature in the given input. However, it may be challenging to identify the baseline in a very complex model. For instance, the baseline for image data could be a black image, while for NLP data, it could be a zero embedding vector, which is a vector of all zeroes used as a default value for words not found in the vocabulary. To obtain Integrated Gradients, we consider the straight-line path (in R^n) from x′ (the baseline) to the input instance x and compute the gradients at all points along this path. The accumulation of these gradients yields the Integrated Gradients. In other words, Integrated Gradients can be defined as the path integral of the gradients along the straight-line path from x′ to the input instance x.

The gradient of F(x) along the i-th dimension is given by \frac{\partial F(x)}{\partial x_i}, leading to the Integrated Gradient (IG) along the i-th dimension for an input x and baseline x′ to be described as:

IG_i(x) = (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F(x' + \alpha \times (x - x'))}{\partial x_i} \, d\alpha.    (11)

The Integrated Gradients XAI method satisfies several important properties, such as sensitivity, completeness, and implementation invariance. It can be applied to any differentiable model, making it a powerful and model-specific tool for explanation in a DNN [64].

4.3. XAI for Transformers

Transformers [56] have emerged as the dominant architecture in Natural Language Processing (NLP), computer vision, multi-modal reasoning tasks, and a diverse and wide array of applications such as visual question answering, image-text retrieval, cross-modal generation, and visual commonsense reasoning [65]. Predictions and decisions in Transformer-based architectures heavily rely on various intricate attention mechanisms, including self-attention, multi-head attention, and co-attention. Explaining these mechanisms presents a significant challenge due to their complexity. In this section, we explore the interpretability aspects of widely adopted Transformer-based architectures.

Gradient×Input [58, 66] and LRP [61] XAI techniques have been extended to explain Transformers [67, 68]. Attention rollout [69] and generic attention are newer techniques that aggregate attention information [65] to explain Transformers. LRP [61], Gradient×Input [53], Integrated Gradients [64], and SHAP [52] are designed based on the conservation axiom for the attribution of each feature. The conservation axiom states that each input feature contributes a portion of the predicted score at the output. LRP is employed to assign the model's output back to the input features by propagating relevance scores backward through the layers of a neural network, measuring their corresponding contributions to the final output. The relevance scores signify how much each feature at each layer contributes to the final prediction and decision.

The LRP framework is a baseline for developing various relevance propagation rules. Let us start the discussion by embedding Gradient×Input into the LRP framework to explain Transformers [55]. Assume (x_i)_i and (y_j)_j represent the input and output vectors of the neurons, respectively, and f is the output of the model. Gradient×Input attributions on these vector representations can be computed as:

R(x_i) = x_i \cdot (\partial f / \partial x_i) \quad \text{and} \quad R(y_j) = y_j \cdot (\partial f / \partial y_j).    (12)

The gradients at different layers are computed using the chain rule. This principle states that the gradient of the function f with respect to an input neuron x_i can be expressed as the sum of the products of two terms: the gradients of all connected neurons y_j with respect to x_i and the gradients of the function f with respect to those neurons y_j. This is mathematically represented as follows:

\frac{\partial f}{\partial x_i} = \sum_j \frac{\partial f}{\partial y_j} \frac{\partial y_j}{\partial x_i}.    (13)

We can convert the gradient propagation rule into an equivalent relevance propagation by inserting equation (12) into equation (13):

R(x_i) = \sum_j \frac{\partial y_j}{\partial x_i} \frac{x_i}{y_j} R(y_j),    (14)

with the convention 0/0 = 0. We can easily prove that \sum_i R(x_i) = \sum_j R(y_j), and if this condition holds true, conservation also holds. However, Transformers break this conservation rule. The following subsections discuss methods to improve the propagation rules [55].

4.3.1. Propagation in Attention Heads

Transformers work based on Query (Q), Key (K), and Value (V) matrices; consider the attention head, which uses these core components [56]. The attention heads have the following structure:

Y = \mathrm{softmax}\left(\frac{1}{\sqrt{d_k}} (X' W_Q)(X W_K)^\top\right) X,    (15)

where X = (x_i)_i and X′ = (x′_j)_j are input sequences, Y = (y_j)_j is the output sequence, W_{Q,K,V} are learned projection matrices, and d_k is the dimensionality of the Key vector. The previous equation can be rewritten as follows:

y_j = \sum_i x_i \, p_{ij},    (16)

where y_j is the output, p_{ij} = \frac{\exp(q_{ij})}{\sum_{i'} \exp(q_{i'j})} is the softmax computation, and q_{ij} = \frac{1}{\sqrt{d_k}} x_i^\top W_K W_Q^\top x'_j is the matching function between the two input sequences.
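To ground Eqs. (15)-(16), here is a small NumPy sketch of a single attention head that computes the matching scores q_ij, the softmax gating terms p_ij, and the outputs y_j = Σ_i x_i p_ij. The sequence lengths, dimensions, and random weights are assumptions made only for illustration; the conservative-propagation rule in the next subsection treats the resulting p_ij as the constants of a locally linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, d, d_k = 5, 4, 8, 8          # input length n, query length m, model dim d, key dim d_k
X = rng.normal(size=(n, d))        # input sequence (x_i)_i
X_prime = rng.normal(size=(m, d))  # second input sequence (x'_j)_j
W_Q = rng.normal(size=(d, d_k))
W_K = rng.normal(size=(d, d_k))

# Matching scores q_ij = (1/sqrt(d_k)) * x_i^T W_K W_Q^T x'_j  -- Eq. (16)
Q = X_prime @ W_Q                  # (m, d_k)
K = X @ W_K                        # (n, d_k)
q = (K @ Q.T) / np.sqrt(d_k)       # (n, m), entry [i, j]

# Gating terms p_ij: softmax over input positions i for each output j.
p = np.exp(q - q.max(axis=0, keepdims=True))
p = p / p.sum(axis=0, keepdims=True)           # columns sum to 1

# Outputs y_j = sum_i x_i * p_ij. The value projection is omitted here,
# matching the simplified head in Eq. (15), which reuses X as the values.
Y = p.T @ X                        # (m, d)
print(Y.shape, np.allclose(p.sum(axis=0), 1.0))
```

Because each output y_j is a p-weighted mixture of the inputs, freezing p_ij turns the head into a linear map, which is exactly what allows the canonical LRP rule for linear layers to be reused in Eq. (19) below.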
4.3.2. Propagation in LayerNorm

LayerNorm, or layer normalization, is a crucial component of Transformers used to stabilize and improve model training. LayerNorm performs the key centering and standardization operations, defined as follows:

    y_i = \frac{x_i - E[x]}{\sqrt{\epsilon + \mathrm{Var}[x]}},    (17)

where E[·] and Var[·] denote the mean and variance over all activations of the corresponding channel. The relevance propagation associated with Gradient×Input is characterized by the conservation equation

    \sum_i R(x_i) = \left(1 - \frac{\mathrm{Var}[x]}{\epsilon + \mathrm{Var}[x]}\right) \sum_i R(y_i),    (18)

where R(y_j) = x_j^\top (\partial f / \partial y_j). The propagation rules implied by equation (14) for attention heads and LayerNorm are replaced by ad-hoc propagation rules to ensure conservation. Hence, we make a locally linear expansion of the attention head by treating the gating terms p_{ij} as constants; these terms are then considered the weights of a linear layer that locally maps the input sequence x to the output sequence y. As a result, we can use the canonical LRP rule for linear layers:

    R(x_i) = \sum_j \frac{x_i \, p_{ij}}{\sum_{i'} x_{i'} \, p_{i'j}} \, R(y_j).    (19)

Recent studies such as attention rollout [69], generic attention [65], and Better Explanations through Conservative Propagation [55] have provided empirical evidence that it is possible to improve the explainability of Transformers.

4.4. Explainability in Reinforcement Learning

Reinforcement Learning (RL) is applied across various domains, including safety-critical areas such as autonomous vehicles, healthcare, and energy systems [70, 71]. In the domain of autonomous vehicles, RL is employed to refine adaptive cruise control and lane-keeping features by learning optimal decision-making strategies from simulations of diverse traffic scenarios [72]. Explainability in reinforcement learning concerns the ability to understand and explain the rationale behind the decisions made and actions taken by reinforcement learning models within their specified environments [73, 74, 75]. Post-hoc explanations, such as SHAP and LIME, can help us understand which features are most important for the decision-making process of an RL agent [76, 77]. Example-based explanation methods, such as trajectory analysis, provide insights into the decision-making process of an RL model by examining specific trajectories, i.e., sequences of states, actions, and rewards [78]. Visualization techniques enable us to understand and interpret RL models by visually representing the model's decision-making process [79]. Several explainability methods exist to interpret reinforcement learning models, including saliency maps, counterfactual explanations, policy distillation, attention mechanisms, human-in-the-loop approaches, query systems, and natural language explanations [80]. Explainability in reinforcement learning is crucial, particularly for safety-critical domains, due to the need for trust, safety assurance, regulatory compliance, ethical decision-making, model debugging, collaborative human-AI interaction, accountability, and AI model adoption and acceptance [73, 81, 82].

4.5. Summary

Applying XAI techniques can enhance transparency and trust in AI models by explaining their decision-making and prediction processes. These techniques can be classified into categories such as local or global, post-hoc or ante-hoc, model-specific or model-agnostic, and perturbation-based or gradient-based. We have added special subsections for reinforcement learning and Transformers due to their popularity and profound impact on applications of deep learning in a wide variety of areas. Table 2 summarizes the reviewed XAI techniques.

5. XAI Techniques in Application Areas

The area of XAI has been gaining attention in recent years due to the growing need for transparency and trust in ML models [26, 52]. XAI techniques are being used to explain the predictions of ML models [39, 2, 8]. These techniques can help identify errors and biases that decrease the reliability and accuracy of the models. This section explores the different XAI techniques used in natural language processing, computer vision, and time series analysis, and how they contribute to improving the trust, transparency, and accuracy of ML models in different application areas.

5.1. Explainability in Natural Language Processing

Natural language processing employs ML because it can help efficiently handle, process, and analyze the vast amounts of text data generated daily through human-to-human communication, chatbots, emails, and context generation software, to name a few areas [83]. One barrier to implementation is that such data are usually not inherently clean, so preprocessing and training are essential tasks for achieving accurate results with language models [84]. In NLP, language models can be classified into three categories: transparent architectures, neural network (non-Transformer) architectures, and Transformer architectures. Transparent models are straightforward and easy to understand due to their clear processing paths and direct interpretability. Models based on neural network (non-Transformer) architectures are often termed "black boxes" due to their multi-layered structures and non-linear processing. Transformer architectures utilize self-attention mechanisms to process sequences of data; their increased complexity and larger number of parameters often make Transformer-based models less interpretable, requiring advanced techniques to explain their decision-making processes.
Table 2: The XAI techniques discussed, the methods used, their advantages and disadvantages.

Perturbation-based techniques:
- LIME [26] (Scope: Local; Application: Model-Agnostic). Method: explains a prediction of a black-box model for a single instance. Advantages: understanding the specific decisions of a model; detecting and fixing biases. Disadvantages: computationally expensive; less effective for complex and high-dimensional data.
- SHAP [52] (Scope: Local; Application: Model-Agnostic). Method: leverages contribution values to explain the importance of each feature to a model's output. Advantages: applicable to any ML model. Disadvantages: computationally expensive; needs to be more scalable for highly complex models and large data; less accurate explanations in complex models.
- CFE [59] (Scope: Local/Global; Application: Model-Agnostic). Method: uses generated hypothetical scenarios to understand how the model's output is affected by changes in input data. Advantages: offers valuable insights into the decision-making process of a model. Disadvantages: computationally expensive to generate counterfactuals; limited to individual instances and does not describe the general behavior of the model; not consistent if the training data are incomplete or biased; sensitive to ethical concerns if counterfactuals make suggestions.

Gradient-based techniques:
- Saliency Map [60] (Scope: Local/Global; Application: Model-Agnostic). Method: highlights important features by obtaining the output's gradients with respect to the input image and visualizes the most significant regions of that image for the model's prediction. Advantages: quick insights; applicable to various models. Disadvantages: sensitive to noise in the input data; only applicable to gradient-based models; sometimes ambiguous when multiple features are highlighted, particularly in complex images.
- LRP [62] (Scope: Local/Global; Application: Model-Specific). Method: explains each input feature's contribution to the model's output by assigning a relevance score to each neuron. Advantages: faithful; does not introduce any bias into the explanation. Disadvantages: complex to implement and interpret; computationally expensive for large and complex models; only applicable to backpropagation-based models; lack of standardization; inconsistent explanations across different implementations.
- CAM [63] (Scope: Local; Application: Model-Specific). Method: explains predictions of a CNN by indicating which regions of the input image the model is focusing on, or simply provides a heatmap for the output of the convolutional layer. Advantages: a valuable XAI technique for understanding the decision-making process of DL models applied to image data. Disadvantages: requires access to the architecture and weights of the model used.
- Integrated Gradients [64] (Scope: Local; Application: Model-Specific). Method: attribution that adheres to the Sensitivity and Implementation Invariance axioms. Advantages: satisfies several important properties; applies to any model. Disadvantages: sensitivity to initialization; computational intensity.
- LRP-Conservation [55] (Scope: Local; Application: Model-Specific). Method: explains predictions of Transformers. Advantages: granular insights; clear attribution. Disadvantages: complexity and computational cost.
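As a concrete illustration of the LRP-Conservation entry above, the following sketch (illustrative code, not taken from [55]) applies the canonical LRP rule of equation (19) to an attention head, treating the gating terms p_{ij} as constant weights and redistributing the output relevance R(y_j) onto the inputs so that total relevance is conserved. For simplicity the inputs x_i are represented by scalar contributions.

```python
import numpy as np

def attention_lrp(x, p, R_y, eps=1e-12):
    """Redistribute output relevance R(y_j) onto inputs x_i via equation (19),
    holding the gating terms p_ij constant (locally linear view of the head).

    x   : (n,)   scalar input contributions x_i (e.g., summed over the embedding dim)
    p   : (n, m) attention weights p_ij
    R_y : (m,)   relevance of each output position y_j
    """
    z = x[:, None] * p                      # contribution of x_i to y_j
    denom = z.sum(axis=0, keepdims=True)    # sum over i' of x_i' p_i'j
    return (z / (denom + eps) * R_y).sum(axis=1)   # R(x_i)

# Conservation check: total relevance is preserved (up to the eps stabilizer)
rng = np.random.default_rng(1)
x, p = rng.random(5), rng.random((5, 3))
p /= p.sum(axis=0, keepdims=True)
R_y = rng.random(3)
R_x = attention_lrp(x, p, R_y)
assert np.isclose(R_x.sum(), R_y.sum(), atol=1e-6)
```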
The success of XAI techniques used in NLP applications is such as text classification, sentiment analysis, topic modeling,
heavily dependent on the quality of preprocessing and the type named entity recognition, and language generation [51, 86].
of text data used [2, 85]. This is important because XAI is SHAP computes the feature importance scores by generating
critical to developing reliable and transparent NLP models that a set of perturbations that remove one or more words from the
can be employed for real-world applications by allowing us to input text data. For each perturbation, SHAP computes the dif-
understand how a model arrived at a particular decision. This ference between the expected model output when the word is
section reviews some of the most common XAI techniques for included or not included, which is known as the Shapley value.
NLP. Figure 9 presents a taxonomy of NLP explainability. This approach then computes the importance of each word in
the original input text by combining and averaging the Shap-
ley values of all the perturbations. Finally, SHAP visualizes
the feature importance scores to indicate which words are more
useful in the model prediction process.
LRP: To apply LRP to NLP, one must encode preprocessed
input text as a sequence of word representations, such as word
embeddings, and feed them to a neural network [87]. The net-
work processes the embeddings using multiple layers and pro-
duces the model’s prediction. LRP then computes the relevance
scores by propagating the model’s output back through the net-
work layers. The relevance score for each word is normalized
using the sum of all relevance scores and multiplied by the
weight of that word in the original input text. This score re-
Figure 9: Taxonomy of explainability in natural language processing.
flects its contribution to the final prediction.
Integrated Gradients: Integrated Gradients are used in NLP
tasks such as text classification, sentiment analysis, and text
5.1.1. Explaining Neural Networks and Fine-Tuned Trans- summarization [88]. The Integrated Gradients technique com-
former Models: Insights and Techniques putes the integral of the gradients of the model’s prediction with
Transparent models are easy to interpret because their inter- the corresponding input text embeddings along the path from
nal mechanisms and decision-making processes are designed a baseline input to the original input. The baseline input is
to be inherently understandable [10]. Perturbation-based and a neutral or zero-embedding version of the input text that has
gradient-based techniques are the most commonly employed no relevant features for the prediction task. The difference be-
approaches for explaining neural network-based models and tween the input word embeddings and the baseline embeddings
fine-tuned transformer-based models. In this subsection, we is then multiplied by the integral to find the attribution scores
discuss some of the most common XAI techniques for neural for each word, which indicate the relevance of each word to the
network-based models and fine-tune Transformer-based mod- model’s prediction. Integrated Gradients output a heatmap that
els, used in NLP. highlights the most important words in the original input text
LIME: Discussed in Subsection 4.1.1, selects a feature, such based on the attribution scores [66]. This provides a clear visu-
as a word, from the original input text data and generates many alization of the words that were most significant for the model’s
perturbations around that feature by randomly removing or re- prediction, allowing users to understand how the model made
placing other features (i.e., other words). LIME trains a sim- that particular decision. IG can be used to identify the most
pler and explainable model using the perturbed data to generate important features in a sentence, understand how the model’s
feature importance scores for each word in the original input predictions change when different input features are changed,
text [26]. These scores indicate the contribution of each word and improve the transparency and interpretability of NLP mod-
to the black-box model prediction. LIME identifies and high- els [89].
lights the important words to indicate the impact of the model’s
prediction, as shown in Figure 10. 5.1.2. Prompt-Based Explainability for Transformer Models
In this subsection, we discuss some of the most com-
mon prompt-based explanation techniques, including Chain
of Thought (CoT), In-Context Learning (ICL), and interactive
prompts.
Chain of Thought: In the context of a large language model
(LLM) such as GPT-3[90], Chain of Thought prompts refer to
the input sequences intended to instruct the model using a se-
ries of intermediate reasoning steps to generate a coherent out-
Figure 10: LIME feature importance scores visualization. put [91, 92]. This technique helps enhance task performance
by providing a clear sequence of reasoning steps, making the
SHAP: SHAP [52] is a widely-used XAI technique in NLP model’s thought process more understandable to the audience
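As a concrete illustration of the perturbation-based NLP explanations discussed above, the sketch below applies LIME to a simple text classifier; the 20-newsgroups data, TF-IDF pipeline, and class names are illustrative assumptions rather than a setup used in the surveyed studies.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Train a simple black-box text classifier (TF-IDF + logistic regression)
cats = ["sci.med", "sci.space"]
train = fetch_20newsgroups(subset="train", categories=cats)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)

# LIME perturbs the input text by removing words, fits a local linear model,
# and reports per-word importance scores for this single prediction.
explainer = LimeTextExplainer(class_names=cats)
text = "The patient was prescribed a new medication after the diagnosis."
exp = explainer.explain_instance(text, clf.predict_proba, num_features=6)
print(exp.as_list())   # [(word, weight), ...] contributions toward class 1
```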
[93]. The gradient-based studies explored the impact of change- mantic priors and struggle to learn new mappings through the
of-thought prompting on the internal workings of LLMs by in- flipped labels. This learning capability demonstrates symbolic
vestigating the saliency scores of input tokens [94]. The scores reasoning in LLMs that extends beyond semantic priors, show-
are computed by identifying the input tokens (words or phrases) ing their ability to adapt to new, context-specific rules in input
and inputting them into the model to compute the output. The prompts, even when these rules are completely new or contra-
influence of each token is then calculated through backpropaga- dict pre-trained knowledge. Another study explores the work-
tion, utilizing the gradients of the model. The score reveals the ings of ICL in large language models by employing contrastive
impact of each input token on the model’s decision-making pro- demonstrations and analyzing saliency maps, focusing on sen-
cess at every intermediate step. By analyzing the step-by-step timent analysis tasks [99]. In this research, contrastive demon-
intermediate reasoning, users can gain a better understanding strations involve manipulating the input data through various
of how the model arrived at its decision, making it easier to approaches, such as flipping labels (from positive to negative or
interpret and trust the model’s outputs. vice versa), perturbing input text (altering the words or struc-
Perturbation-based studies on Chain of Thought explanation ture of the input sentences without changing their overall senti-
through the introduction of errors in few-shot prompts have pro- ment), and adding complementary explanations (providing con-
vided valuable insights into the internal working mechanism text and reasons along with the input text and flipped labels).
behind large language models [95, 96]. Counterfactual prompts Saliency maps are then applied to identify the parts of the input
have been suggested as a method of altering critical elements text that are most significant to the model’s decision-making
of a prompt, such as patterns and text, to assess their impact process. This method facilitates visualization of the impact that
on the output [95]. The study demonstrated that intermediate contrastive demonstrations have on the model’s behavior. The
reasoning steps primarily guide replicating patterns, text, and study revealed that the impact of contrastive demonstrations on
structures into factual answers. Measuring the faithfulness of model behavior varies depending on the size of the model and
a CoT, particularly within the context of LLMs, involves as- the nature of the task. This indicates that explaining in-context
sessing the accuracy and consistency with which the explana- learning’s effects requires a nuanced understanding that consid-
tions and reasoning process align with established facts, logical ers both the model’s architectural complexities and the specific
principles, and the predominant task objectives [97]. Several characteristics of the task at hand.
key factors are crucial when evaluating CoT faithfulness, in- ICL allows large language models to adapt their responses to
cluding logical consistency, factuality, relevance, completeness, the examples or instructions provided within the input prompts.
and transparency [97]. The assessment often requires qualita- Explainability efforts in LLMs aim to reveal how these models
tive evaluations by human judges and quantitative metrics that interpret and leverage in-context prompts, employing various
can be automatically calculated. The development of models techniques, such as saliency maps, contrastive demonstrations,
to measure faithfulness and the design of evaluation methods and feature attribution, to shed light on LLMs’ decision-making
remains an area of active research. processes. Understanding the workings of ICL in LLMs is cru-
Explaining In-Context Learning: In-context learning is a cial for enhancing model transparency, optimizing prompt de-
powerful mechanism for adapting the model’s internal behav- sign, and ensuring the reliability of model outputs across vari-
ior to the immediate context provided in the input prompt. ICL ous applications.
operates by incorporating examples or instructions directly into Explaining Interactive Prompt: Explaining Interactive
the prompt, guiding the model toward generating the desired Prompt is a technique that focuses on designing and us-
output for a specific task. This approach enables the model to ing prompts to interact effectively with large language mod-
understand and generate responses that are relevant to the speci- els [92, 100]. This method involves designing prompts that
fied task by leveraging the contextual prompts directly from the dynamically direct the conversation toward specific topics or
input. Several studies have focused on the explainability of how solicit explanations. Through the use of strategically designed
in-context learning influences the behavior of large language prompts, users can navigate the conversation with a model to
models, applying various techniques and experimental setups achieve more meaningful and insightful interactions, enhanc-
to elucidate this process. A recent study explores a critical as- ing the understanding of the model’s reasoning and decision-
pect of how ICL operates in large language models, focusing making process.
on the balance between leveraging semantic priors from pre- Several studies use various approaches to analyze and en-
training and learning new input-label mappings from examples hance the effectiveness of explaining interactive prompts. A
provided within prompts [98]. The study aims to understand study called TalkToModel introduced an interactive dialogue
whether the LLMs’ capability to adapt to new tasks through system designed to explain machine learning models under-
in-context learning is primarily due to the semantic priors ac- standable through natural language conversations or interactive
quired during pre-training or if they can learn new input-label prompts [100]. It evaluates the system’s language understand-
mappings directly from the examples provided in the prompts. ing capabilities, increasing deeper and more meaningful inter-
The experimental results revealed nuanced capabilities across actions between users and models through interactive prompts.
LLMs of different sizes. This approach enhances the interpretability of complex ma-
Larger LLMs showed a remarkable capability to override chine learning models’ behaviors and the model’s decision-
their semantic priors and learn new, contradictory input-label making process. The study called Prompt Pattern Catalog in-
mappings. In contrast, smaller LLMs rely more on their se- troduced a catalog designed to enhance prompt engineering by
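To make the prompt-based techniques above concrete, here is a minimal sketch of how a few-shot chain-of-thought prompt with explicit intermediate reasoning steps might be assembled; the example texts and the commented-out `query_llm` call are placeholders for whatever LLM interface is available, not an API from the surveyed papers.

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt: each in-context example
    shows intermediate reasoning steps before its final answer."""
    parts = []
    for ex in examples:
        parts.append(f"Q: {ex['question']}\nReasoning: {ex['reasoning']}\nA: {ex['answer']}\n")
    parts.append(f"Q: {question}\nReasoning:")
    return "\n".join(parts)

examples = [
    {
        "question": "The movie was slow, but the ending was brilliant. Sentiment?",
        "reasoning": "The reviewer criticizes the pacing yet praises the ending strongly; "
                     "the positive remark outweighs the negative one.",
        "answer": "positive",
    },
    {
        "question": "Great cast, terrible script, I walked out halfway. Sentiment?",
        "reasoning": "Despite praise for the cast, walking out signals strong dissatisfaction.",
        "answer": "negative",
    },
]

prompt = build_cot_prompt(examples, "The service was slow but the food made up for it. Sentiment?")
# response = query_llm(prompt)   # placeholder: send the prompt to any LLM of choice
print(prompt)
```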
systematically organizing and discussing various strategies for
constructing prompts [92]. This catalog aims to explain the
decision-making process of models more clearly. It provides
insights and methodologies for eliciting detailed, accurate, and
interpretable responses from models, thus improving the under-
standing of model behavior and decision-making logic.

5.1.3. Attention Mechanism


Attention mechanisms in the Transformer architecture enable
a model to focus selectively on different parts of the input se-
quence for each step of the output sequence, mimicking the way
humans pay attention to specific parts of a sentence [56]. Atten-
tion weights can be visualized to gain insights into the model’s
decision-making process, revealing which parts of the input it Figure 11: Taxonomy of explainability in computer vision.
considers significant. This visualization aids in understanding
how the model makes its decisions and explains the significance
assigned to various input segments, thereby enhancing the in- pixels as the most important for the model’s prediction, gener-
terpretability and transparency of black-box models. Attention- ating a heatmap that highlights these regions in the input image.
Viz [101], a visualization method for self-attention, highlights The resulting saliency heatmap provides important insights into
query and key embeddings, enabling global pattern analysis the CNN and aids in interpreting its decision-making processes.
across sequences and revealing deeper insights into Transform- The saliency map method removes less relevant regions (pix-
ers’ pattern identification and connection beyond previous visu- els), such as the image background, and identifies the most
alization methods. Another study introduces a method for ex- important regions of the input image for the model’s decision.
plaining predictions made by Transformer-based models, par- However, it is important to note that the saliency map only pro-
ticularly those using multi-modal data and co-attention mech- vides a local explanation by highlighting specific pixels of the
anisms [65]. This method provides generic explainability so- input image and does not provide a global explanation.
lutions for the three most common components of the Trans- Class Activation Maps: CNNs are powerful neural models
former architectures: pure self-attention, a combination of self- for image processing tasks, achieving state-of-the-art perfor-
attention and co-attention, and encoder-decoder attention. mance in various applications such as object recognition, seg-
mentation, and detection [104, 105]. However, their complex
5.2. Explainability in Computer Vision architecture and the high dimensionality of their learned fea-
tures make it challenging to understand how they make deci-
In computer vision, models can be categorized into Convo- sions. Using CAM for explaining the behavior of CNN mod-
lutional Neural Network-based models (CNNs) and attention- els is popular [106]. The CNN model is first trained with pre-
based models, such as Vision Transformers (ViTs), based on processed and labeled image data for image classification tasks.
their architecture. Accordingly, various XAI approaches are de- A weighted feature map is obtained by multiplying the fea-
signed to work with these model architectures. For CNNs, tech- ture map from the final convolutional layer with channel impor-
niques such as saliency maps, LRP, Integrated Gradients, and tance, which highlights the important regions of the input im-
CAM are the most commonly employed. On the other hand, age. The weighted feature map is then passed through a ReLU
for ViTs, methods like attention visualization, Attention Roll- activation function to keep only positive values. The resulting
out, Attention Flow, Counterfactual Visual Explanations, and positively weighted feature map is up-sampled to match the size
Feature Attribution are extensively used. Figure 11 presents a of the input image. Finally, CAM provides a visual output, as
taxonomy of vision explainability. shown in Figure 12, by highlighting the most important regions
of the original input image using this up-sampled feature map
5.2.1. Explainability of CNNs [63, 107]. CAM does not give specific information on a partic-
Saliency Maps: In computer vision, a saliency map helps ular pixel and how that pixel is important to the model’s pre-
identify the most important regions of an image for a deep diction. Nevertheless, this technique can be a valuable tool for
learning model’s prediction. Various methods have been pro- interpreting CNNs and enhancing their interpretability. Conve-
posed to obtain a saliency map, including deconvolutional net- niently, it can additionally be used to improve the robustness of
works, backpropagation, and guided backpropagation algo- the model by identifying the parts of the images that are irrele-
rithms [60, 102, 103]. To generate the saliency map, a pre- vant to the prediction and discarding them [108].
processed input image is fed into a pre-trained CNN model to Layer-wise Relevance Propagation: LRP computes the rel-
obtain the probability distribution over various classes of the evance scores for each pixel in an image by propagating a clas-
trained model. The output probability gradients are then com- sification model’s output back through the network layers. The
puted using the backpropagation approach, with higher gradient relevance score determines the contributions of each pixel to the
values indicating the relevance of each pixel to the model’s pre- final model’s prediction [61, 109]. LRP visualizes the weighted
diction. The saliency map detects the highest gradient value relevance scores as a heat map and highlights pixels in the in-
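The gradient-based saliency map described above can be computed in a few lines. The sketch below assumes a pretrained torchvision classifier and a preprocessed image tensor, which are illustrative choices rather than the exact setup of the cited works.

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()

# `image` stands in for a preprocessed input of shape (1, 3, 224, 224)
image = weights.transforms()(torch.rand(3, 256, 256)).unsqueeze(0)
image.requires_grad_(True)

# Backpropagate the top-class score to the input pixels
scores = model(image)
top_class = scores.argmax(dim=1)
scores[0, top_class].backward()

# Saliency: maximum absolute gradient across color channels per pixel
saliency = image.grad.abs().max(dim=1).values.squeeze(0)   # (224, 224) heatmap
```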
features is processed and integrated across the network’s layers.
Attention Visualization: Transformer Interpretability Be-
yond Attention Visualization is an advanced Transformer ex-
plainability technique that interprets the decision-making pro-
cess of Transformers, including ViT [54]. This method extends
beyond traditional attention mechanism visualizations to reveal
the nuanced roles of individual neurons, the strategic influence
of positional encodings, and the intricate layer-wise interactions
within the Transformer. Examining the comprehensive func-
tionalities and contributions of various transformer components
aims to provide a more complete explanation of the model’s
behavior than attention weights alone can provide. This holis-
tic approach enhances our understanding and interpretability of
Figure 12: Visualization of CAM (input image from CIFAR 10 dataset).
how these models process input features and make decisions.
Counterfactual Visual Explanations: Counterfactual Visual
Explanations (CVE) is a vision interpretability technique de-
put image to indicate the most important pixels for the model’s
signed to explain the decision-making processes of ViT models.
prediction. The primary advantage of LRP for image classifica-
This method involves changing specific parts of an input image
tion tasks is its ability to assign a relevance score for each pixel
and observing the subsequent changes in the model’s output
from the input image, allowing us to understand how much each
[111, 112]. This allows people to identify which image fea-
pixel contributed to the model’s output. LRP provides valuable
tures are most significant in the model’s decision-making pro-
information to interpret the model, offering insights into which
cess. CVE is a practical and insightful approach to understand-
pixels are most important for the final prediction. By analyzing
ing how ViT models process and interpret images, providing a
the relevance score, LRP helps interpret and understand how the
tangible means to explore and improve these complex models’
model made the decision and produced the output. Addition-
interpretability.
ally, LRP provides insight into each internal layer’s inner work-
ings, offering valuable insights into understanding the internal
workings of DNN and improving the model’s performance. 5.2.3. Model-agnostic Explainer for Vision Models
Model-agnostic approaches, such as LIME and SHAP, have
also been adapted to approximate ViTs’ behavior with more in-
5.2.2. Explainability of Vision Transformers
terpretable models, which is particularly useful for individual
Vision Transformers (ViTs) are a category of deep learning
predictions. This adaptation process requires considerations
models that adapt the Transformer architecture to the domain
such as the computational complexity due to the Transformer
of image recognition tasks [110]. These models handle images
structure, which may affect the feature attribution, the high di-
as sequences of patches and treat each patch similarly to how
mensionality of image data, the dependency between input fea-
tokens are treated in NLP.
tures in image data, which violates the independence assump-
Feature Attribution Techniques: Gradient-based Saliency tion in SHAP, and the choice of an appropriate surrogate model
Maps, Integrated Gradients, CAM, and LRP are among the that balances fidelity to the original model with interpretability.
most common feature attribution techniques employed to ex-
plain the decision-making process of complex models such as
CNNs and ViTs. Saliency Maps identify the pixels most signif- 5.3. Explainability in Time Series
icant to the model’s output [102]. Integrated Gradients trace the Time series forecasting models are widely used in various
contribution of each feature from a baseline to the actual input domains, such as business, finance, meteorology, and medicine,
[88]. LRP backpropagates the output decision to assign rele- to predict the future values of a target variable for a given en-
vance scores to individual input features [54]. CAM provides a tity at a specific time [113, 114]. Time series data refer to an
visual representation of which regions in the input image were ordered sequence of collected data points accumulated during
most relevant to a particular class by highlighting them [104]. regularly spaced time intervals. The predictive power of time
Thus, all these approaches provide insights into the model’s fo- series forecasting models is rooted in statistical and ML tech-
cus areas for specific features or regions. niques that analyze historical data. However, these models can
Attention Rollout and Attention Flow Methods: These be complex, unintuitive, and opaque, requiring the use of XAI
are Transformer explainability approaches designed for to ensure fairness and trustworthiness, as well as to improve
Transformer-based models [69]. They are designed to track and their accuracy [115]. The XAI techniques can be used to iden-
understand the flow of information from individual input fea- tify bias in the data, improve the accuracy of predictions, and
tures through the network’s self-attention mechanism. These make the model more interpretable. For instance, XAI can be
approaches help interpret the complexity of ViTs by addressing used to identify whether the model is biased against certain
layer-wise information mixing. They provide deeper insights groups of patients, improve the accuracy of the model’s pre-
into the decision-making process of the model by offering im- dictions by identifying the most important features for its pre-
proved methods for understanding how information from input dictions, and provide explanations for the model’s predictions.
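To illustrate the CAM computation described above, the sketch below forms the weighted sum of the final convolutional feature maps using the classifier weights of the predicted class, applies ReLU, and upsamples the result to the input size; the ResNet-style layer names are an assumption about the backbone, not a prescription from the cited works.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

feature_maps = {}
def hook(_, __, output):                 # capture the last conv block's activations
    feature_maps["maps"] = output        # shape (1, C, h, w)
model.layer4.register_forward_hook(hook)

image = torch.rand(1, 3, 224, 224)       # stand-in for a preprocessed input image
with torch.no_grad():
    logits = model(image)
cls = logits.argmax(dim=1).item()

# CAM: channel-wise weighted sum of feature maps using the FC weights of the class,
# followed by ReLU and upsampling to the input resolution
weights = model.fc.weight[cls]                            # (C,)
cam = (weights[:, None, None] * feature_maps["maps"][0]).sum(dim=0)
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=image.shape[-2:], mode="bilinear",
                    align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to a [0, 1] heatmap
```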
The use of time series forecasting models is ubiquitous in to identify the segments of the time series, which are then used
real-world datasets. Accurate time series forecasting of stock to produce the saliency mapping of the weight vector to the
prices, for example, can inform investment decisions and aid in original time series for visualization and easier explainabil-
risk management [116]. In meteorology and climate science, ity. In addition to CAM, there are other approaches for vi-
accurate time series forecasting can aid in weather prediction sualizing the intermediate activations of a DNN that utilizes
and disaster risk management [117]. Devices, such as sen- convolutional layers and GAP layers before the final output
sors, accurately record data points with time signatures for use layer [103, 123], and the visualization technique called De-
in a combination of statistical and ML techniques to analyze convolutional Networks [102]. Deconvolutional Networks can
historical data and make forecasts about future values. How- be repurposed for time series data by treating sequential data
ever, these models face challenges due to outliers, seasonal pat- points as channels, which enables the visualization and compre-
terns, temporal dependencies, and missing data, among other hension of hierarchical features learned by convolutional layers
factors. Therefore, the quality of the data and model architec- in a time series-specific context [124].
ture choices play a vital role in the accuracy of time series pre-
dictions, necessitating the use of XAI to provide human under- 5.3.3. TSViz
standing of the models’ performance.
TSViz (Time Series Visualization) is a model-agnostic set
of visualization techniques that help users explore and com-
5.3.1. Saliency Maps
prehend complex time series data, regardless of the type of
Generating saliency maps for time series data involves sev-
model used for analysis [125]. This XAI technique uses
eral essential preprocessing steps. Normalization is employed
various dimensionality reduction techniques, such as princi-
to ensure that all features are on the same scale, facilitating fair
pal component analysis (PCA [126]), t-distributed stochastic
comparisons between them. Reshaping transforms the data into
neighbor embedding (t-SNE [127]), and uniform manifold ap-
a structured representation, where each row corresponds to a
proximation and projection (UMAP [128]), to simplify high-
time step, and each column represents a feature. Windowing
dimensional time series data and generate visualizations that
and padding are crucial to capture local relationships within the
reveal trends, patterns, and correlations [129, 130]. These tech-
data [118], enabling the model to discern patterns and depen-
niques are especially useful for identifying complex relation-
dencies. The resulting saliency map assigns a value to each
ships that might be difficult to detect with traditional visualiza-
time step in the input time series, with higher values indicating
tion methods [131].
greater model reliance on that particular time step. This visu-
TSViz is a post-hoc, human-in-the-loop XAI tool that sup-
alization aids in comprehending the interplay among various
ports analysis and decision-making by human experts [125].
features and their impact on the final prediction [119].
Human interpretation and input are essential for comprehend-
Saliency maps face challenges in accurately identifying fea-
ing the data and visualizations generated, and users are re-
ture importance over time and sensitivity to the underlying
sponsible for determining metrics and making necessary adjust-
model architecture [120]. Although the concept of saliency
ments to the model [132]. TSViz enhances users’ ability to ana-
maps is model-agnostic, their specific implementation tends to
lyze and model time-series data through various visualizations,
be model-specific, which underscores their role as a valuable
including line plots, heatmaps, seasonal plots, autocorrelation
output standard for other XAI techniques [121]. In analyzing
plots, and spectral density plots. However, it does not replace
time series data, saliency maps offer potential insights into the
the importance of human expertise and judgement required for
decision-making process, facilitating the identification of data
effective decision-making and insight drawing [133]. Overall,
or model-related issues. However, their dependence on model
TSViz is an essential tool that empowers users to make data-
architecture emphasizes the need to complement their usage
driven decisions and predictions, gain a deeper understanding
with other XAI techniques in comprehensive data analysis.
of underlying systems and phenomena, and identify potential
5.3.2. CAM sources of bias in their models.
Although originally designed for image data, CAMs [63] can
be applied to time series data by treating each time step as a sep- 5.3.4. LIME
arate “channel” in an image, similar to different color channels, Applying LIME to time series data introduces a unique set
such as red, green, and blue [122]. The time series can be win- of challenges stemming from the inherent temporal dependen-
dowed of fixed size and then stacked as different channels in cies within such data. In contrast to static tabular data, time
the image. The size of the windows can be chosen based on series data necessitates an understanding of how events evolve
the length of the time series and the desired level of detail. For over time. By crafting local, interpretable models that approx-
example, if the time series is 1,000 time steps long, you could imate black-box model predictions within specific time seg-
use windows of size 100 time steps. The windows can be over- ments, LIME equips analysts with a powerful tool for decipher-
lapped by a certain amount. ing the temporal nuances that influence outcomes [134]. One
The last convolutional layer is then used to compute a way to do this is to use a sliding window of fixed length that
weighted sum of the feature maps, which is upsampled to the moves along the time series data, or to use an attention mech-
original image size to generate the CAM. This technique uti- anism that identifies which parts of the time series are most
lizes the weights from the global average pooling (GAP) layer relevant to the prediction at each time point [135].
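In the same spirit as the windowing strategies discussed above, a simple perturbation (occlusion) analysis can score how much each time segment contributes to a forecast; the forecasting model is left abstract here, and the window size and toy series are illustrative assumptions.

```python
import numpy as np

def window_importance(predict, series, window=100):
    """Occlusion-style relevance for time series: zero out one window at a time
    and measure how much the model's prediction changes."""
    base = predict(series)
    scores = []
    for start in range(0, len(series), window):
        perturbed = series.copy()
        perturbed[start:start + window] = 0.0        # occlude this segment
        scores.append(abs(predict(perturbed) - base))
    return np.array(scores)                          # one relevance score per window

# Toy stand-in for a trained forecaster: predicts the mean of the last 50 steps
predict = lambda s: s[-50:].mean()
series = np.sin(np.linspace(0, 20, 1000)) + 0.1 * np.random.randn(1000)
print(window_importance(predict, series, window=100))
```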
Let’s consider a specific example: a prediction model for the the robot. How much communication and explanation between
stock price on a particular day is explained using LIME [136]. a robot, patient, or physician is sufficient? The power of XAI
The most important time steps for the prediction is identified comes here to provide transparent, understandable, and inter-
by obtaining feature importance scores or time series cross- pretable explanations for the AI model decisions to build trust
validation. For example, we may find that the previous day’s between them. Hence, XAI is necessary in healthcare due to
stock price, trading volume, and news articles are the most im- its role in enhancing trust and confidence, ensuring faithfulness
portant features. Next, the time series is perturbed by randomly to ethical considerations, managing regulatory compliance, im-
changing the values of the most important time steps. The proving clinical decision-making, facilitating ongoing learning
amount of change is controlled by a hyperparameter to reduce and model fine-tuning, improving risk management and analy-
noise in the local model [51]. The perturbed time series is fed to sis, and enhancing collaborative communications between clin-
the black-box model and records the corresponding predictions. icians, patients, and AI models [138]. XAI has several crucial
This process is repeated for multiple perturbed instances. Then, applications in various aspects of healthcare, such as medical
we train a local, interpretable model using the perturbed time diagnosis, patient treatment, drug discovery and development,
series and their associated predictions. The coefficients of the clinical decision support, and risk assessment.
model can be examined to identify the most important features. Medical diagnosis: Medical data in healthcare are diverse
For instance, the previous day’s stock price has the highest co- and sophisticated, incorporating various types and sources, in-
efficient, indicating that it is the most important feature for the cluding imaging (like MRIs, CT scans, and X-rays), electronic
prediction. Finally, the coefficients of the local model are used health records, genomic data, laboratory test results, patient-
to explain the model’s prediction. If the coefficients show that reported data, wearable device data, social and behavioral data,
the previous day’s stock price was the most important feature, and pharmacological data. XAI improves disease diagnostics
we can say that the black-box model predicted the stock price by providing transparent interpretations of AI model decisions,
based on the previous day’s price. ensuring accurate disease identification, and predicting patient
outcomes through comprehensible explanations of the model
5.3.5. SHAP outputs for these complex medical data [22, 139]. XAI also
SHAP can be seamlessly applied to time series data, where it helps identify influential features from the complex medical
offers valuable global insights into a model’s decision-making data affecting the model decision-making process [18].
process over time. This capability aids in comprehending the Patient treatment: By examining the explanations of unique
evolving behavior of time series models and the identification health data, XAI helps design treatment plans for individual pa-
of trends or patterns influencing predictions. tients. It provides good insights into why a specific medicine
First, we train a time series model on a dataset of historical is suggested based on a patient’s medical data and diagno-
time series data. Once the model is trained, we can use SHAP sis [156].
to explain the output of the model for a new time series. To do Drug discovery and development: XAI plays a vital role
this, we need to calculate the SHAP values for each feature in in drug discovery and development in the pharmaceutical in-
the time series. For time series data, each feature corresponds to dustry [157]. It explains and provides insights into complex
a specific time step in the time series. To calculate the Shapley relationships among the molecular structures of drugs and their
values for a time series, we create a dataset that contains pertur- biological effects [158].
bations of the original time series. Each perturbation consists of Clinical decision support: XAI systems assist healthcare
a modified version of the original time series where the value of professionals by providing transparent and interpretable expla-
one time step is changed while keeping the values of all other nations of the decision-making processes of models handling
time steps fixed. These differences are used to calculate the complex data [159]. Clinicians can more easily understand
Shapley values for each time step in the time series. The Shap- complex cases by considering the influential features high-
ley values can be used to understand how the model works and lighted in the explanations provided by XAI systems [20].
to identify the most important features for the prediction. For Legal considerations: The use of XAI in healthcare raises
example, we can plot the Shapley values for each time step in several legal considerations. The model decision-making pro-
the time series to visualize the contribution of each time step to cess and how and why the model made that decision should be
the model’s output. transparent and understandable for healthcare professionals and
patients [22]. XAI systems should be ensured and safeguarded
for the privacy and security of medical and patient data [160].
5.4. Explainability in Healthcare
XAI systems mitigate the biases and ensure that AI-made deci-
Healthcare is a critical domain due to the high risks and com- sions are fair and reasonable across diverse patients [22]. Reg-
plexities of the medical decision-making process. For example, ulatory transparency and audits [161], medical device regula-
a study has found that human surgeon usually explains the de- tions [162], informed consent [163], liability and malpractice
tails of the surgery beforehand and has a 15% chance of causing concerns [164], and intellectual property rights [21] are among
death during the surgery. A robot surgeon has only a 2% chance the most critical legal considerations in the application of XAI
of death. A 2% risk of death is better than a 15% risk, but people in healthcare.
prefer the human surgeon [137]. The challenge is how to make Ethical considerations: The ethical considerations of XAI
the physicians and patients trust the deployed AI model within in healthcare are complicated and significant [165, 166]. It fo-
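Before turning to healthcare, the sketch below makes the per-time-step Shapley workflow described above concrete: each lag of the series is treated as a feature and SHAP's model-agnostic KernelExplainer estimates its contribution. The gradient-boosting forecaster and lag-matrix construction are illustrative assumptions.

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

# Build a lag matrix: each row holds the previous `lags` values; the target is the next value
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 30, 600)) + 0.1 * rng.standard_normal(600)
lags = 10
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

model = GradientBoostingRegressor().fit(X, y)

# KernelExplainer perturbs each lag (time step) and estimates its Shapley value
background = X[:100]
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[-1], nsamples=200)   # contribution of each time step
print(shap_values)
```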
Table 3: Summary of XAI techniques in healthcare.

1. Image data (X-rays, ultrasound, MRI and CT scans, etc.). XAI techniques: LRP, LIME, CAM, Saliency Maps, and Integrated Gradients. Application areas: Radiology, Pathology, Dermatology, Ophthalmology, and Cardiology. Benefit: interpretable image analysis. Papers: [140, 141, 142, 143].
2. Text data (clinical text, Electronic Health Records (EHRs), and case studies). XAI techniques: LIME, SHAP, Attention Mechanism, and Counterfactual Explanations. Application areas: Drug Safety and Medical Research. Benefit: interpretable text analysis. Papers: [144, 145, 146].
3. Structured (numeric) data (patient demographics, laboratory test results, pharmacy records, billing and claims). XAI techniques: LIME, SHAP, Decision Trees, Rule-based Systems, Counterfactual Explanations, Integrated Gradients, and BERT Explanations. Application areas: Patient Health Monitoring and Management, Epidemiology, and Clinical Trials and Research. Benefit: interpretable structured data analysis. Papers: [7, 146, 147].
4. Time series data (ECGs, EEGs, monitoring and wearable device data). XAI techniques: TSViz, LIME, CAM, SHAP, Feature Importance, and Temporal Output Explanation. Application areas: Neurology and EEG Monitoring, Patient Monitoring in Critical Care, and Cardiology and Heart Health Monitoring. Benefit: interpretable time series analysis. Papers: [148, 149].
5. Multi-modal data (telemedicine interactions: text, audio, video). XAI techniques: LIME, SHAP, Attention Mechanisms, Multi-modal Fusion, and Cross-modal Explanations. Application areas: Cancer Diagnosis and Treatment, Neurology and Brain Research, and Mental Health and Psychiatry. Benefit: interpretable multi-modal analysis. Papers: [150, 151].
6. Genetic data (genetic makeup). XAI techniques: Sensitivity Analysis, LIME, SHAP, and Gene Expression Network Visualization. Application areas: Genomic Medicine, Oncology, and Prenatal and Newborn Screening. Benefit: interpretable genetic analysis. Papers: [152, 153].
7. Audio data (heart and lung sounds). XAI techniques: Saliency Maps, LRP, SHAP, LIME, and Temporal Output Explanation. Application areas: Cardiology, Pulmonology, Mental Health, and Sleep Medicine. Benefit: interpretable audio analysis. Papers: [154, 155].

cuses on the sensitivity and importance of medical information, 5.5. Explainability in Autonomous Vehicles
the decision of the AI model, and the explanations of the de-
Autonomous vehicles use complex and advanced AI systems
ployed XAI system. Transparency and accountability [22], fair-
by integrating several deep-learning models that can handle var-
ness and bias mitigation [165], ethical frameworks and guide-
ious data types, such as images, videos, audio, and informa-
lines [167], privacy and confidentiality [168] are some of the
tion from LIDAR and radar [170]. These models utilize inputs
key ethical aspects of XAI in healthcare. Medical data are com-
from diverse sources, including cameras, sensors, and GPS, to
plex and diverse, requiring the use of a variety of XAI tech-
deliver safe and accurate navigation. A crucial consideration
niques to interpret it effectively. Table 3 summarizes various
is determining which data are most critical. What information
XAI techniques and their application areas in healthcare. XAI
takes precedence, and why? Understanding the importance of
faces several challenges in healthcare, such as the complexity
different data types is key to enhancing our models and learn-
and diversity of medical data, the complexity of AI models,
ing effectively from the gathered information [171]. To address
updating XAI explanations in line with the dynamic nature of
these questions and better decision-making processes of black-
healthcare, the need for domain-specific knowledge, balancing
box AI models, developers use the XAI approach to evaluate
accuracy and explainability, and adhering to ethical and legal
the AI systems. Implementing XAI in autonomous vehicles
implications [169, 21].
significantly contributes to human-centered design by promot-
ing trustworthiness, transparency, and accountability. This ap-
proach considers various perspectives, including psychological,
sociotechnical, and philosophical dimensions, as highlighted in
Shahin et al. [172]. Table 4 shows a summary of various XAI tively manage the complexities encountered in these situations.
techniques in autonomous vehicles, including visual, spatial, Human-AI decision-making (collaboration): In recent au-
temporal, audio, environmental, communication, genetic, and tonomous vehicle systems, machine learning models are uti-
textual. The advancements significantly enhance the AI-driven lized to assist users in making final judgments or decisions,
autonomous vehicle system, resulting in a multitude of com- representing a form of collaboration between humans and AI
prehensive, sustainable benefits for all stakeholders involved, systems [183]. With XAI, these systems can foster appropriate
as follows. reliance, as decision-makers may be less inclined to follow an
Trust: User trust is pivotal in the context of autonomous AI prediction if an explanation reveals flawed model reasoning
vehicles. Ribeiro et al. [26] emphasized this by stating: “If [184]. From the users’ perspective, XAI helps to build trust and
users do not trust a model or its predictions, they will not use confidence through this collaboration. In contrast, in terms of
it”. This underscores the essential need to establish trust in the developers and engineers, XAI helps to debug the model, iden-
models we use. XAI can significantly boost user trust by pro- tify the potential risks, and enhance the models and the vehicle
viding clear and comprehensible explanations of system pro- technology [185].
cesses [173]. Israelson et al. [174] highlighted the critical need
for algorithmic assurance to foster trust in human-autonomous 5.6. Explainability in AI for Chemistry and Material Science
system relationships, as evidenced in their thorough analysis.
In chemistry and material science, AI models are becoming
The importance of transparency in critical driving decisions,
increasingly sophisticated, enhancing their capability to predict
noting that such clarity is crucial for establishing trust in the
molecular structures, chemical reactions, and material behav-
autonomous capabilities of self-driving vehicles [175].
iors, as well as discover new materials [195]. Explainability
Safety and reliability: These are critical components and
in chemistry and material science increases beyond simply un-
challenges in developing autonomous driving technology [176].
derstanding and analyzing model outputs; it encompasses un-
Under the US Department of Transportation, the American Na-
derstanding the rationale behind the model predictions [196].
tional Highway Traffic Safety Administration (NHTSA) has es-
XAI techniques play a crucial role in obtaining meaningful in-
tablished specific federal guidelines for automated vehicle pol-
sights and causal relationships, interpreting complex molecular
icy to enhance traffic safety, as outlined in their 2016 policy
behaviors, optimizing material properties, and designing inno-
document [177]. In a significant development in March 2022,
vative materials through the application of AI models [197]. By
the NHTSA announced a policy shift allowing automobile man-
explaining how and why machine learning models make pre-
ufacturers to produce fully autonomous vehicles without tradi-
dictions or decisions, researchers and practitioners in the field
tional manual controls, such as steering wheels and brake ped-
can more confidently trust machine learning models for ana-
als, not only in the USA but also in Canada, Germany, the UK,
lytical investigations and innovations. This understanding is
Australia, and Japan [172]. Following this, The International
important to increasing trust, facilitating insights and discov-
Organization for Standardization (ISO) responded by adopting
eries, enabling validation and error analysis, and dealing with
a series of standards that address the key aspects of automated
regulatory and ethical considerations in AI models [198]. The
driving. These standards are designed to ensure high levels of
study, “CrabNet for Explainable Deep Learning in Materials
safety, quality assurance, efficiency, and the promotion of an
Science” [199], focuses on improving the compositionally re-
environmentally friendly transport system [178]. Besides, Kim
stricted attention-based network to produce meaningful mate-
et al. [179, 180] described the system’s capability to perceive
rial property-specific element representations. These represen-
and react to its environment: The system can interpret its oper-
tations facilitate the exploration of elements’ identities, similar-
ational surroundings and explain its actions, such as “stopping
ities, interactions, and behaviors within diverse chemical envi-
because the red signal is on”.
ronments [199]. Various model-agnostic and model-specific in-
Regulatory compliance and accountability: Public insti-
terpretability methods are employed in chemistry and material
tutions at both national and international levels have responded
science to explain the prediction of black-box models’ molec-
by developing regulatory frameworks aimed at overseeing these
ular structure, chemical reactions, and the relationship between
data-driven systems [172]. The foremost goal of these regula-
chemical composition [200, 201, 202] and design of new mate-
tions is to protect stakeholders’ rights and ensure their authority
rials [197].
over personal data. The European Union’s General Data Protec-
tion Regulation (GDPR) [181] exemplifies this, establishing the
“right to an explanation” for users. This principle underscores 5.7. Explainability in Physics-Aware AI
the importance of accountability, which merges social expec- Physics-aware artificial intelligence focuses on integrating
tations with legislative requirements in the autonomous driving physical laws and principles into machine learning models to
domain. XAI plays a pivotal role by offering transparent and in- enhance the predictability and robustness of AI models [203].
terpretable insights into the AI decision-making process, ensur- Explainability in physics-aware AI is crucial for understanding
ing compliance with legal and ethical standards. Additionally, and interpreting these models. It also bridges the gap between
achieving accountability is vital for addressing potential liabil- the black-box nature of AI models and physical understand-
ity and responsibility issues, particularly in post-accident in- ing, making them more transparent and trustworthy [204]. Sev-
vestigations involving autonomous vehicles, as highlighted by eral approaches exist to explain physics-aware AI models [205].
Burton et al. [182]. Clear accountability is essential to effec- Domain-specific explanation methods are designed for specific
Table 4: Summary of XAI techniques in autonomous vehicles.

1. Visual data. Sources: camera images and video streams. AI models: CNN, ViTs, RNN, or LSTM. XAI techniques: LRP, CAM, Saliency Maps, Integrated Gradients, Counterfactual Explanations. Key benefits: enhancing visual environmental interaction, understanding dynamic driving scenarios, enabling correct object detection, interpreting real-time decision-making, and supporting adaptive learning by providing insights into the AI model's predictions. Papers: [186, 187].
2. Spatial data. Sources: LIDAR and radar. AI models: CNN, DNN. XAI techniques: Feature Importance, SHAP, CAM, LRP, LIME. Key benefits: enhancing 3D space and object interactions; improving the safety, security, reliability, design, development, and troubleshooting of the car by providing insights into the AI's spatial data processing and model decision-making. Papers: [188, 189].
3. Temporal data (time series). Sources: sensors. AI models: RNN, LSTM, GRU. XAI techniques: TSViz, LIME, CAM, SHAP, LRP, Feature Importance, Temporal Output Explanation. Key benefits: provides insights into time-series data for reliable decision-making, identifies potential safety issues, enhances overall vehicle safety, and offers a deeper interpretation of the vehicle's actions over time for post-incident analysis. Papers: [190, 191].
4. Auditory data. Sources: microphone. AI models: CNN, RNN. XAI techniques: LRP, CAM, Saliency Maps, Attention Visualization. Key benefits: enhancing the vehicle's ability to understand and react to auditory signals, improving safety and security by providing insights into the AI's audio data decision-making process. Papers: [172].
5. Environmental data (weather and geolocation). Sources: GPS and cloud-based services. AI models: GNN, Random Forest, Gradient Boosting. XAI techniques: Rule-based Explanations, Decision Trees, SHAP, LIME, Counterfactual Explanations. Key benefits: enhanced decision-making; improved safety and efficiency; increased safety, security, and reliability by providing insights into how environmental factors and diverse conditions affect the AI model's decision-making process. Papers: [172, 192].
6. Vehicle telematics (engine and internal status). Sources: engine control unit, on-board diagnostics, sensors. AI models: DNN, SVM. XAI techniques: LRP, LIME, SHAP, Decision Trees, Rule-based Explanations, Counterfactual Explanations. Key benefits: helping interpret engine data and vehicle status information, predicting maintenance needs and potential issues, improving safety, reducing risk, and providing a clear view of vehicle health through insights gained from the AI model's decision-making process. Papers: [193].
7. Communication data (Vehicle-to-Everything, or V2X). Sources: personal devices, cloud, vehicles, cellular networks. AI models: Reinforcement Learning. XAI techniques: LRP, Saliency Maps, Counterfactual Explanations. Key benefits: provides insights into model decision-making that enhance trust and safety, clarifies interactions with external factors, and improves decision-making by interpreting complex V2X communications. Papers: [194].
8. All input types, sources, and models. XAI technique: Generative Language Model Explanation. Key benefits: provides textual explanations to model users that enable them to interpret diversified datasets and complex AI model decision-making processes. Papers: [185].

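To make the gradient-based entries in row 1 of Table 4 more concrete, the following minimal Python sketch computes a vanilla saliency map for a single camera frame. It is an illustrative assumption rather than the pipeline used in the cited systems: the pretrained ResNet-18 merely stands in for whatever perception network a vehicle actually deploys, and the input frame is assumed to be already normalized for that model.

import torch
import torchvision

# Illustrative perception model; a deployed AV stack would use its own network.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

def saliency_map(frame):
    # frame: a (3, H, W) tensor for one camera image, already normalized (assumed).
    x = frame.unsqueeze(0).requires_grad_(True)
    score = model(x).max()        # confidence of the top predicted class
    score.backward()              # gradient of that score with respect to the pixels
    # Per-pixel importance: largest absolute gradient across the color channels.
    return x.grad.abs().squeeze(0).max(dim=0).values  # shape (H, W)

Bright regions of the returned map indicate pixels whose small changes would most affect the prediction, which is the intuition behind the saliency-map row of the table.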
In materials science and chemistry, XAI techniques help researchers understand molecular structure, chemical reactions, and the relationship between chemical composition [200, 201, 202], and they support the design of new materials [197].

5.7. Explainability in Physics-Aware AI

Physics-aware artificial intelligence focuses on integrating physical laws and principles into machine learning models to enhance the predictability and robustness of AI models [203]. Explainability in physics-aware AI is crucial for understanding and interpreting these models. It also bridges the gap between the black-box nature of AI models and physical understanding, making them more transparent and trustworthy [204]. Several approaches exist to explain physics-aware AI models [205]. Domain-specific explanation methods are designed for specific domains, such as fluid dynamics, quantum mechanics, or material science [206, 207]. Model-agnostic explanations are also used to explain the general behavior of physics-aware AI models so that their decisions are understandable in various scenarios regardless of the specific model architecture [208, 209]. In the context of physics-aware AI, explainability offers several key advantages, such as enhancing trust and interpretability, ensuring physically plausible predictions, improving model architecture and debugging, providing domain-specific insights, bridging knowledge gaps, ensuring regulatory compliance, and facilitating human-AI collaboration [204, 210].
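As a simplified illustration of integrating a physical law into a learning objective, the sketch below adds a physics-residual term for an assumed first-order decay law du/dt = -k*u to an ordinary data-fitting loss. The network, the constant k, and the decay law itself are illustrative assumptions, not taken from the cited works; the per-point residual doubles as a physically grounded diagnostic of where the model's predictions stop being plausible.

import torch

k = 0.5  # assumed decay constant of the governing law du/dt = -k * u
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))

def physics_informed_loss(t_data, u_data, t_collocation):
    # Standard supervised data-fitting term.
    data_loss = torch.mean((net(t_data) - u_data) ** 2)
    # Physics residual: how strongly the prediction violates du/dt = -k * u
    # at collocation points where no labels are available.
    t = t_collocation.clone().requires_grad_(True)
    u = net(t)
    du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
    residual = du_dt + k * u
    physics_loss = torch.mean(residual ** 2)
    # The residual is returned as an interpretable, physically grounded diagnostic.
    return data_loss + physics_loss, residual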
6. XAI Evaluation Methods

XAI methods are essential in today's advancing AI world to ensure trust, transparency, and understanding of AI ethical decision-making, particularly in sensitive domains like healthcare, finance, military operations, autonomous systems, and legal issues. However, we need evaluation mechanisms to measure the generated explanations and ensure their quality, usefulness, and trustworthiness. XAI system evaluation methods are classified into human-centered and computer-centered categories based on their applications and methodologies to judge the effectiveness of XAI techniques [211, 212]. Figure 13 shows a simple taxonomy of XAI evaluation methods.

Figure 13: A suggested classification framework for assessing the efficacy of XAI systems (adapted from [211]).
6.1. Human-Centered Approach

The human-centered approach evaluates how the provided XAI explanations meet model users' needs, understanding levels, and objectives from the human perspective [213]. This approach is concerned with the comprehensibility, trust, user satisfaction, and decision-making support of the provided XAI explanations [214]. The generated explanations should be clear, brief, and easily understandable by end-users without technical knowledge. The explanation should also be relevant and represented by a clear and interpretable visualization mechanism. The explanations should build the end-users' trust by providing a transparent, consistent, and reliable decision-making process for the model [215]. Hence, the designed evaluation methods assess the users' trust and satisfaction level through surveys, interviews, questionnaires, behavioral analysis, and other handy tools. Users' satisfaction is the most essential aspect of these evaluation methods. The XAI system can be evaluated by collecting users' feedback and assessing their emotional responses [216]. The ease of using the XAI system and the usefulness of the generated explanations to end users also provide insights into the value of the explanation and the XAI system. XAI systems can further be assessed by evaluating how effectively the generated explanations support users' decision-making: how much do the XAI explanations contribute to the decision-making process, reduce errors, and enhance productivity? The human-centered evaluation method also assesses cognitive load, ensuring that the provided XAI explanations do not overload the user's cognitive processing capacity [215].
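As a hypothetical illustration of how such human-centered measurements might be aggregated, the short sketch below averages survey satisfaction scores, the change in task accuracy with and without explanations, and task completion time as a rough cognitive-load proxy. The field names and values are assumptions for illustration, not a standardized protocol from the literature.

from statistics import mean

# Hypothetical participant records from a user study.
participants = [
    {"satisfaction": 4, "acc_with_xai": 0.92, "acc_without_xai": 0.85, "time_s": 41.0},
    {"satisfaction": 5, "acc_with_xai": 0.88, "acc_without_xai": 0.80, "time_s": 37.5},
    {"satisfaction": 3, "acc_with_xai": 0.81, "acc_without_xai": 0.83, "time_s": 55.2},
]

avg_satisfaction = mean(p["satisfaction"] for p in participants)                          # trust / satisfaction proxy
avg_decision_gain = mean(p["acc_with_xai"] - p["acc_without_xai"] for p in participants)  # decision-support effect
avg_time = mean(p["time_s"] for p in participants)                                        # rough cognitive-load proxy

print(f"Mean satisfaction (1-5): {avg_satisfaction:.2f}")
print(f"Mean accuracy gain with explanations: {avg_decision_gain:+.3f}")
print(f"Mean task completion time (s): {avg_time:.1f}")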
6.2. Computer-Centered Approach

The computer-centered approach evaluates XAI techniques against technical and objective standards, without involving human interpretation [217]. It relies on quantifiable metrics such as fidelity, consistency, robustness, efficiency, and other dimensions [218].

6.2.1. Fidelity

Fidelity refers to how closely the provided XAI explanations reflect the actual decisions made by a model, focusing on the accuracy of representation, quantitative measurement, and the handling of complex models [219]. Does the explanation reflect the model's actual reasoning process? Does the explanation contain the essential information about complex models, such as deep learning models? High fidelity therefore indicates that the explanation is an accurate interpretation. Fidelity can be computed at the instance level using the following formula [220]:

S = 1 - \sum_{i=1}^{n} \frac{Y(x_i) - Y(x_i')}{|Y(x_i)|}    (20)

where n is the total number of inputs, x is the original input for the process instance, X' is the set of perturbations of x with x' in X', Y(x) is the model output given input x, and Y(x') is the model output given input x'.
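A minimal sketch of how an Eq. (20)-style fidelity score could be computed for a black-box model is given below; the predict and perturb callables are assumed interfaces supplied by the user, not part of any specific XAI library.

import numpy as np

def fidelity_score(predict, inputs, perturb, eps=1e-8):
    # predict: maps one input to a scalar model output (assumed interface).
    # perturb: returns a perturbed copy of one input (assumed interface).
    y_orig = np.array([predict(x) for x in inputs], dtype=float)
    y_pert = np.array([predict(perturb(x)) for x in inputs], dtype=float)
    # Accumulate the relative output change over all instances, as in Eq. (20).
    rel_change = (y_orig - y_pert) / (np.abs(y_orig) + eps)
    return 1.0 - float(rel_change.sum())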
6.2.2. Consistency

Consistency focuses on the stability and coherence of the explanations provided by the XAI system for the same input across different model runs. Stability, uniformity, and predictability are the most important aspects of consistency. The consistency of XAI systems can be determined in different ways [221]. The following equations compute stability and uniformity.

Stability can be represented by the variance in explanations over multiple runs with the same input:

\sigma^2_{exp} = \frac{1}{N} \sum_{i=1}^{N} (e_i - \bar{e})^2    (21)

where σ²_exp is the variance of the explanations, e_i is the explanation for the i-th run, ē is the average of all explanations, and N is the total number of runs.

Uniformity (U) quantifies how uniformly distributed the explanation features are. A higher value of U implies that the distribution of features in the explanation is closer to a uniform distribution:

U = 1 - \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(r_n - \frac{1}{N}\right)^2}    (22)

where r_n is the relevance of the n-th feature.
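The two consistency quantities can be computed directly from repeated explanation runs, as in the following sketch; normalizing the relevances before computing uniformity is an added assumption that keeps the comparison with the uniform distribution meaningful.

import numpy as np

def stability(explanations):
    # explanations: array of shape (num_runs, num_features), one attribution vector
    # per repeated run on the same input. Per-feature variance as in Eq. (21),
    # averaged over features; lower values mean more stable explanations.
    E = np.asarray(explanations, dtype=float)
    return float(np.mean(np.var(E, axis=0)))

def uniformity(relevances):
    # relevances: one explanation's feature relevances r_1..r_N, as in Eq. (22).
    r = np.asarray(relevances, dtype=float)
    r = r / (r.sum() + 1e-8)       # assumed normalization so relevances sum to 1
    N = r.size
    return float(1.0 - np.sqrt(np.mean((r - 1.0 / N) ** 2)))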
6.2.3. Robustness

The robustness of an XAI system assesses the resilience of explanations to changes in model behavior arising, for example, from input changes or adversarial attacks. Robustness also covers adaptability to model updates and generalizability: is the evaluation method applicable to various XAI systems on various platforms? Does the evaluation method continue to function if the XAI system is updated? The robustness of XAI systems can be computed with a perturbation approach using the following formulas [222].

Resistance to Input Perturbations (R) captures the change in explanations under slightly perturbed inputs:

R = 1 - \frac{1}{N} \sum_{i=1}^{N} \lVert exp(x_i) - exp(x_i') \rVert    (23)

where exp(x) is the explanation for input x, x_i' is the perturbation of x_i, and N is the total number of perturbations.

Adaptability to Model Changes (A) captures the change in explanations after the model is updated:

A = \frac{1}{N} \sum_{i=1}^{N} \lVert exp_m(x_i) - exp_{m'}(x_i) \rVert    (24)

where exp_m(x) is the explanation from model m, exp_{m'}(x) is the explanation from the updated model m', and N is the total number of samples.
6.2.4. Efficiency

The efficiency of the evaluation method concerns the computational capacity and resources, such as resource utilization and time, needed to generate explanations, as well as the scalability with which the method handles large-scale explanations. The following simple formulas represent computational speed and scalability, respectively [223].

Computational speed (C_s) is the rate at which an XAI system can generate explanations:

C_s = \frac{1}{T \times R}    (25)

where T is the time taken to generate an explanation and R is the computational resources used, such as memory or CPU cycles. A higher C_s indicates greater efficiency.

Scalability (S) is the ability of an XAI system to handle increasing volumes of input data and explanations:

S = \lim_{n \to \infty} \frac{P(n)}{n}    (26)

where P(n) is the system's performance measure as the input size n grows. The XAI system is scalable and efficient if S remains bounded as n increases.
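A rough way to instrument the computational-speed quantity is sketched below; approximating the resource term R by peak traced Python memory is an assumption made purely for illustration.

import time
import tracemalloc

def computational_speed(explain, x):
    # Eq. (25)-style estimate: inverse of (time taken x resources used), with the
    # resource term approximated by peak traced Python memory in MiB (an assumption).
    tracemalloc.start()
    start = time.perf_counter()
    explain(x)                      # generate one explanation
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    peak_mib = max(peak / 2**20, 1e-6)
    return 1.0 / (elapsed * peak_mib)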
6.2.5. Sufficiency

The sufficiency metric assesses the adequacy of rationales in supporting the model's decision-making process [224]. A rationale is a subset of the input data that a model identifies as critical for its decision-making process. This metric evaluates whether the rationales alone are sufficient for the model to maintain its prediction confidence. It measures the difference in the model's confidence between using the entire input and using just the rationale, and it is mathematically represented as:

sufficiency = m(x_i)_j - m(r_i)_j    (27)

where m(x_i)_j is the model's confidence score for the full input x_i and m(r_i)_j is the model's confidence score when only the rationale r_i is provided.

Measuring the difference in prediction confidence between the full input and the rationale helps determine whether the rationale alone can sustain the model's decision-making process. A small difference indicates high sufficiency, meaning the rationale captures most of the essential information. In contrast, a large difference suggests low sufficiency, indicating that the rationale may be missing important information or that the model relies on other parts of the input.

Alternative formulas can also be used to compute sufficiency.

Confidence Ratio (CR) calculates the ratio of the model's confidence with the rationale to its confidence with the full input:

CR = \frac{m(r_i)_j}{m(x_i)_j}    (28)

A CR close to 1 indicates the rationale is highly sufficient, a significantly lower CR (close to 0) suggests it is insufficient, and a CR greater than 1 might indicate potential overfitting or anomalies in the model's reliance on the rationale.

Percentage Drop in Confidence (PDC) measures the percentage decrease in confidence when using the rationale compared to the full input:

PDC = \left(1 - \frac{m(r_i)_j}{m(x_i)_j}\right) \times 100\%    (29)

A lower PDC indicates that the rationale is highly sufficient, with minimal loss of confidence, whereas a higher PDC suggests that the rationale is insufficient.
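The three sufficiency scores can be computed together once the model exposes a class-confidence function, as in the following sketch; the model_confidence interface is an assumed placeholder rather than an API of a specific framework.

def sufficiency_scores(model_confidence, full_input, rationale, class_j):
    # model_confidence(input, class_j) -> confidence for class j (assumed interface).
    p_full = model_confidence(full_input, class_j)
    p_rationale = model_confidence(rationale, class_j)
    sufficiency = p_full - p_rationale       # Eq. (27): smaller is better
    cr = p_rationale / max(p_full, 1e-8)     # Eq. (28): close to 1 is better
    pdc = (1.0 - cr) * 100.0                 # Eq. (29): smaller is better
    return sufficiency, cr, pdc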
7. Future Research Direction

Existing XAI systems face several challenges in various aspects, including design objectives, applications, standardization, model complexity, security, and evaluation metrics. Further research is required to overcome these challenges and advance the state-of-the-art. In this section, we discuss some of the most common XAI challenges and future research directions.

7.1. Model Complexity

XAI techniques are often less effective with highly complex models [48]. Developing AI models that reduce complexity without compromising much on accuracy is a challenging task. Model simplification, building hybrid models, and interactive explanations may be possible approaches to overcoming the existing challenges.

7.2. Building ML Models with Explanation

Building AI models with explanations is crucial for safety-critical application areas [225]. However, it is not only a technical challenge but also involves practical considerations such as ethical and legal issues. Building accurate AI models with explanations is complex and requires further research. Applying XAI in the training stage, using data-driven insights, understanding the model's predictions, and continuous interpretation may be possible approaches to building AI models with explanations [226].

7.3. Performance vs. Interpretability

The trade-off between performance and interpretability is one of the biggest challenges. Simplifying a model for better interpretability may reduce its accuracy. Performance is critical for time-sensitive applications and complex models, whereas interpretability and explainability are essential for safety-critical applications to trust a model. Hence, the recommended solution is to find a balance between performance and interpretability.

7.4. Standardization and Evaluation Methods

The right evaluation metrics are essential to measure the performance of XAI systems. AI models are designed to solve various problems with various design objectives [211], and these diversified AI systems require different types of XAI systems. Hence, applying the same evaluation metrics to different XAI systems can be challenging because the design objectives of XAI systems differ. For example, the design objectives of interpretability, accuracy, fairness, robustness, and transparency are different. Each of these XAI design objectives may require different evaluation metrics, and it is essential to select the evaluation metrics that align with the design objectives of the XAI system to address this challenge. Moreover, applying combinations of different evaluation metrics to measure the performance of XAI systems may be helpful.

7.5. Security and Privacy

The applications of complex AI systems have exhibited several challenges related to ethical codes, such as security, privacy, fairness, bias, accountability, and transparency. For example, the diversity of ethical issues is one of the main challenges, as current ethical studies have shown. XAI systems help investigate and explain how and why a model made a particular ethical decision. However, XAI systems themselves raise privacy, security, and other related challenges and require special consideration. XAI explanations may be causes of information leakage, model inversion attacks, adversarial attacks, and compromised explanation integrity. There is a trade-off between explainability and security and privacy. Balancing this trade-off using strategies including privacy preservation, selective explanation, anonymization, secure communication, and auditing techniques is crucial. Integrating differential privacy techniques, generating explanations that only highlight broad features, producing generalized explanations, and implementing access control mechanisms may also help overcome the trade-off between explainability and privacy.

7.6. Explainability of Multi-modal Models

Multimodal AI models are designed to process multiple data modalities, such as text, images, audio, video, and others. Explaining the fusion of modalities, intermodal relationships, scalability, data heterogeneity, and high dimensionality are some of the most complex challenges in the current state-of-the-art [65]. Therefore, designing combined XAI techniques may help address these challenges. The design process requires multi-disciplinary effort spanning machine learning, computer vision, natural language processing, and human-computer interaction expertise. In particular, large language models such as GPT are on the way to becoming Any-to-Any multimodal models [227]. Hence, Any-to-Any multimodal explanation techniques are required, but they are complex and challenging.

7.7. Real-time Explanation

Complex AI models have recently become more popular and are deployed in real-time situations in various non-safety and safety-critical application areas. Safety-critical application areas such as healthcare monitoring, autonomous cars, military operations, and robotics should provide real-time explanations to ensure safety. However, existing state-of-the-art XAI methods face several challenges in addressing this real-time explanation requirement. There are various factors behind these challenges. For example, a DNN may require numerous layers and thousands, millions, or billions of parameters to process the real-time input data, which is time-consuming, especially for large models. On the other hand, large volumes of data are generated continuously in real-time situations. For instance, autonomous cars have continuous and constant data streams from various sources, such as sensors, lidars, and radars, with latency constraints. Hence, processing and providing a model explanation for this large amount of data under latency constraints requires
efficient XAI algorithms and techniques in real time. Much effort is needed to address these challenges and meet the real-time requirements of safety-critical applications by considering model optimization, parallel processing, efficient XAI algorithms, hybrid approaches, and other handy techniques.

7.8. Multilingual and Multicultural Explanation

In recent years, large AI models that work across diverse languages and cultures have been employed. However, these AI models face several challenges because of users' expectations and their multilingual and multicultural nature. Hence, suitable XAI methods are crucial for providing meaningful model explanations across cultural variations, regional preferences, and language diversity, allowing us to address harmful biases and respect sensitive cultural norms in diverse societal environments.

8. Conclusion

The applications of complex AI models are widely integrated with our daily life activities in various aspects. Because of their complex nature, the demand for transparency, accountability, and trustworthiness has greatly increased, specifically when these complex models make automated decisions that impact our lives differently. XAI techniques are required to explain why the model made that decision or prediction. In this survey, we explored the standard definitions and terminologies, the need for XAI, the beneficiaries of XAI, techniques of XAI, and applications of XAI in various fields based on the current state-of-the-art XAI literature. The study focuses on post-hoc model explainability. The taxonomy is designed to provide high-level insights for each XAI technique. We classified XAI methods into different categories based on different perspectives, such as training stage, scope, and design methodology. In the training stage, ante-hoc and post-hoc explanation techniques are two different ways to explain the inner workings of AI systems. Perturbation-based and gradient-based methods are two of the most common algorithmic design methodologies for the development of XAI techniques. We discussed different perturbation-based and gradient-based XAI methods based on their underlying mathematical principles and assumptions, as well as their applicability and limitations. Furthermore, we discussed the usage of XAI techniques to explain the decisions and predictions of ML models employed in natural language processing and computer vision application areas. Our objective is to provide a comprehensive review of the latest XAI techniques, insights, and application areas to XAI researchers, XAI practitioners, AI model designers and developers, and XAI beneficiaries who are interested in enhancing the trustworthiness, transparency, accountability, and fairness of their AI models. We also highlighted research gaps and challenges of XAI to strengthen the existing XAI methods and to give future research directions in the field.

References

[1] A. Weller, Transparency: motivations and challenges, in: Explainable AI: interpreting, explaining and visualizing deep learning, Springer, 2019, pp. 23–40.
[2] W. Samek, T. Wiegand, K.-R. Müller, Explainable Artificial Intelligence: Understanding, visualizing and interpreting deep learning models, arXiv preprint arXiv:1708.08296 (2017).
[3] A. Shrivastava, P. Kumar, Anubhav, C. Vondrick, W. Scheirer, D. Prijatelj, M. Jafarzadeh, T. Ahmad, S. Cruz, R. Rabinowitz, et al., Novelty in image classification, in: A Unifying Framework for Formal Theories of Novelty: Discussions, Guidelines, and Examples for Artificial Intelligence, Springer, 2023, pp. 37–48.
[4] G. Vilone, L. Longo, Explainable Artificial Intelligence: a systematic review, arXiv preprint arXiv:2006.00093 (2020).
[5] G. Schwalbe, B. Finzel, A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts, Data Mining and Knowledge Discovery (2023) 1–59.
[6] G. Marcus, Deep learning: A critical appraisal, arXiv preprint arXiv:1801.00631 (2018).
[7] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models, ACM Computing Surveys (CSUR) 51 (5) (2018) 1–42.
[8] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: An overview of interpretability of machine learning, in: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, 2018, pp. 80–89.
[9] A. Adadi, M. Berrada, Peeking inside the black-box: a survey on Explainable Artificial Intelligence (XAI), IEEE Access 6 (2018) 52138–52160.
[10] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al., Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion 58 (2020) 82–115.
[11] D. Minh, H. X. Wang, Y. F. Li, T. N. Nguyen, Explainable Artificial Intelligence: a comprehensive review, Artificial Intelligence Review (2022) 1–66.
[12] M. Langer, D. Oster, T. Speith, H. Hermanns, L. Kästner, E. Schmidt, A. Sesing, K. Baum, What do we want from Explainable Artificial Intelligence (XAI)?–A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research, Artificial Intelligence 296 (2021) 103473.
[13] T. Speith, A review of taxonomies of Explainable Artificial Intelligence (XAI) methods, in: 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 2239–2250.
[14] T. Räuker, A. Ho, S. Casper, D. Hadfield-Menell, Toward transparent AI: A survey on interpreting the inner structures of deep neural networks, in: 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), IEEE, 2023, pp. 464–483.
[15] L. Weber, S. Lapuschkin, A. Binder, W. Samek, Beyond explaining: Opportunities and challenges of XAI-based model improvement, Information Fusion 92 (2023) 154–176.
[16] M. R. Islam, M. U. Ahmed, S. Barua, S. Begum, A systematic review of explainable artificial intelligence in terms of different application domains and tasks, Applied Sciences 12 (3) (2022) 1353.
[17] A. Holzinger, G. Langs, H. Denk, K. Zatloukal, H. Müller, Causability and explainability of artificial intelligence in medicine, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4) (2019) e1312.
[18] J. Lötsch, D. Kringel, A. Ultsch, Explainable Artificial Intelligence (XAI) in biomedicine: Making AI decisions trustworthy for physicians and patients, BioMedInformatics 2 (1) (2021) 1–17.
[19] R. González-Alday, E. García-Cuesta, C. A. Kulikowski, V. Maojo, A scoping review on the progress, applicability, and future of explainable artificial intelligence in medicine, Applied Sciences 13 (19) (2023) 10778.
[20] H. W. Loh, C. P. Ooi, S. Seoni, P. D. Barua, F. Molinari, U. R. Acharya, Application of Explainable Artificial Intelligence for healthcare: A systematic review of the last decade (2011–2022), Computer Methods and Programs in Biomedicine (2022) 107161.
[21] M. N. Alam, M. Kaur, M. S. Kabir, Explainable AI in Healthcare: Enhancing Transparency and Trust upon Legal and Ethical Consideration (2023).
[22] A. Albahri, A. M. Duhaim, M. A. Fadhel, A. Alnoor, N. S. Baqer, L. Alzubaidi, O. Albahri, A. Alamoodi, J. Bai, A. Salhi, et al., A sys-
tematic review of trustworthy and Explainable Artificial Intelligence in [46] J. D. Fuhrman, N. Gorre, Q. Hu, H. Li, I. El Naqa, M. L. Giger, A
healthcare: Assessment of quality, bias risk, and data fusion, Informa- review of explainable and interpretable AI with applications in COVID-
tion Fusion (2023). 19 imaging, Medical Physics 49 (1) (2022) 1–14.
[23] A. Saranya, R. Subhashini, A systematic review of Explainable Artificial [47] D. K. Gurmessa, W. Jimma, A comprehensive evaluation of explainable
Intelligence models and applications: Recent developments and future Artificial Intelligence techniques in stroke diagnosis: A systematic re-
trends, Decision Analytics Journal (2023) 100230. view, Cogent Engineering 10 (2) (2023) 2273088.
[24] L. Longo, M. Brcic, F. Cabitza, J. Choi, R. Confalonieri, J. Del Ser, [48] A. Das, P. Rad, Opportunities and challenges in Explainable Artificial
R. Guidotti, Y. Hayashi, F. Herrera, A. Holzinger, et al., Explainable Intelligence (XAI): A survey, arXiv preprint arXiv:2006.11371 (2020).
artificial intelligence (XAI) 2.0: A manifesto of open challenges and [49] R. Marcinkevičs, J. E. Vogt, Interpretability and explainability: A ma-
interdisciplinary research directions, Information Fusion (2024) 102301. chine learning zoo mini-tour, arXiv preprint arXiv:2012.01805 (2020).
[25] N. Bostrom, E. Yudkowsky, The ethics of Artificial Intelligence, in: Ar- [50] C. Rudin, Stop explaining black box machine learning models for high
tificial Intelligence Safety and Security, Chapman and Hall/CRC, 2018, stakes decisions and use interpretable models instead, Nature Machine
pp. 57–69. Intelligence 1 (5) (2019) 206–215.
[26] M. T. Ribeiro, S. Singh, C. Guestrin, “Why should I trust you?” Ex- [51] M. T. Ribeiro, S. Singh, C. Guestrin, Model-agnostic interpretability of
plaining the predictions of any classifier, in: Proceedings of the 22nd machine learning, arXiv preprint arXiv:1606.05386 (2016).
ACM SIGKDD International Conference on Knowledge Discovery and [52] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model
Data Mining, 2016, pp. 1135–1144. predictions, Advances in Neural Information Processing Systems 30
[27] I. El Naqa, M. J. Murphy, What is machine learning?, Springer, 2015. (2017).
[28] J. H. Moor, Three myths of computer science, The British Journal for [53] M. Ancona, E. Ceolini, C. Öztireli, M. Gross, Towards better under-
the Philosophy of Science 29 (3) (1978) 213–222. standing of gradient-based attribution methods for deep neural networks,
[29] A. Saxe, S. Nelli, C. Summerfield, If deep learning is the answer, what arXiv preprint arXiv:1711.06104 (2017).
is the question?, Nature Reviews Neuroscience 22 (1) (2021) 55–67. [54] H. Chefer, S. Gur, L. Wolf, Transformer interpretability beyond attention
[30] D. Castelvecchi, Can we open the black box of AI?, Nature News visualization, in: Proceedings of the IEEE/CVF Conference on Com-
538 (7623) (2016) 20. puter Vision and Pattern Recognition, 2021, pp. 782–791.
[31] D. Doran, S. Schulz, T. R. Besold, What does explainable AI re- [55] A. Ali, T. Schnake, O. Eberle, G. Montavon, K.-R. Müller, L. Wolf, XAI
ally mean? A new conceptualization of perspectives, arXiv preprint for Transformers: Better explanations through conservative propagation,
arXiv:1710.00794 (2017). in: International Conference on Machine Learning, PMLR, 2022, pp.
[32] P. P. Angelov, E. A. Soares, R. Jiang, N. I. Arnold, P. M. Atkinson, Ex- 435–451.
plainable artificial intelligence: an analytical review, Wiley Interdisci- [56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
plinary Reviews: Data Mining and Knowledge Discovery 11 (5) (2021) Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural
e1424. Information Processing Systems 30 (2017).
[33] F.-L. Fan, J. Xiong, M. Li, G. Wang, On interpretability of artificial [57] M. T. Ribeiro, S. Singh, C. Guestrin, Anchors: High-precision model-
neural networks: A survey, IEEE Transactions on Radiation and Plasma agnostic explanations, in: Proceedings of the AAAI Conference on Ar-
Medical Sciences 5 (6) (2021) 741–760. tificial Intelligence, Vol. 32, 2018.
[34] H. K. Dam, T. Tran, A. Ghose, Explainable software analytics, in: Pro- [58] M. Ancona, C. Oztireli, M. Gross, Explaining deep neural networks with
ceedings of the 40th International Conference on Software Engineering: a polynomial time algorithm for shapley value approximation, in: Inter-
New Ideas and Emerging Results, 2018, pp. 53–56. national Conference on Machine Learning, PMLR, 2019, pp. 272–281.
[35] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso- [59] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations with-
Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Dı́az-Rodrı́guez, out opening the black box: Automated decisions and the GDPR, Harv.
F. Herrera, Explainable artificial intelligence (xai): What we know and JL & Tech. 31 (2017) 841.
what is left to attain trustworthy artificial intelligence, Information fu- [60] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional net-
sion 99 (2023) 101805. works: Visualising image classification models and saliency maps, arXiv
[36] Y. Zhang, Q. V. Liao, R. K. Bellamy, Effect of confidence and expla- preprint arXiv:1312.6034 (2013).
nation on accuracy and trust calibration in AI-assisted decision making, [61] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller,
in: Proceedings of the 2020 Conference on Fairness, Accountability, and W. Samek, On pixel-wise explanations for non-linear classifier decisions
Transparency, 2020, pp. 295–305. by layer-wise relevance propagation, PloS one 10 (7) (2015) e0130140.
[37] M. I. Jordan, T. M. Mitchell, Machine learning: Trends, perspectives, [62] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller,
and prospects, Science 349 (6245) (2015) 255–260. Layer-wise relevance propagation: an overview, Explainable AI: inter-
[38] Y. Zhang, P. Tiňo, A. Leonardis, K. Tang, A survey on neural network in- preting, explaining and visualizing deep learning (2019) 193–209.
terpretability, IEEE Transactions on Emerging Topics in Computational [63] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep
Intelligence 5 (5) (2021) 726–742. features for discriminative localization, in: Proceedings of the IEEE
[39] F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable ma- Conference on Computer Vision and Pattern Recognition, 2016, pp.
chine learning, arXiv preprint arXiv:1702.08608 (2017). 2921–2929.
[40] Q. Zhang, Y. N. Wu, S.-C. Zhu, Interpretable convolutional neural net- [64] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep net-
works, in: Proceedings of the IEEE Conference on Computer Vision and works, in: International Conference on Machine Learning, PMLR, 2017,
Pattern Recognition, 2018, pp. 8827–8836. pp. 3319–3328.
[41] W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, K.-R. Müller, Ex- [65] H. Chefer, S. Gur, L. Wolf, Generic attention-model explainability for
plainable AI: interpreting, explaining and visualizing deep learning, Vol. interpreting bi-modal and encoder-decoder transformers, in: Proceed-
11700, Springer Nature, 2019. ings of the IEEE/CVF International Conference on Computer Vision,
[42] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, D. Mané, 2021, pp. 397–406.
Concrete problems in AI safety, arXiv preprint arXiv:1606.06565 [66] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features
(2016). through propagating activation differences, in: International Conference
[43] B. J. Dietvorst, J. P. Simmons, C. Massey, Algorithm aversion: people on Machine Learning, PMLR, 2017, pp. 3145–3153.
erroneously avoid algorithms after seeing them err., Journal of Experi- [67] E. Voita, D. Talbot, F. Moiseev, R. Sennrich, I. Titov, Analyzing multi-
mental Psychology: General 144 (1) (2015) 114. head self-attention: Specialized heads do the heavy lifting, the rest can
[44] Z. C. Lipton, The mythos of model interpretability: In machine learning, be pruned, arXiv preprint arXiv:1905.09418 (2019).
the concept of interpretability is both important and slippery., Queue [68] Z. Wu, D. C. Ong, On explaining your explanations of BERT: An empir-
16 (3) (2018) 31–57. ical study with sequence classification, arXiv preprint arXiv:2101.00196
[45] G. Montavon, W. Samek, K.-R. Müller, Methods for interpreting and un- (2021).
derstanding deep neural networks, Digital Signal Processing 73 (2018) [69] S. Abnar, W. Zuidema, Quantifying attention flow in transformers, arXiv
1–15. preprint arXiv:2005.00928 (2020).

[70] C. Rana, M. Dahiya, et al., Safety of autonomous systems using rein- [93] Y. W. Jie, R. Satapathy, G. S. Mong, E. Cambria, et al., How inter-
forcement learning: A comprehensive survey, in: 2023 International pretable are reasoning explanations from prompting large language mod-
Conference on Advances in Computation, Communication and Infor- els?, arXiv preprint arXiv:2402.11863 (2024).
mation Technology (ICAICCIT), IEEE, 2023, pp. 744–750. [94] S. Wu, E. M. Shen, C. Badrinath, J. Ma, H. Lakkaraju, Analyzing chain-
[71] C. Yu, J. Liu, S. Nemati, G. Yin, Reinforcement learning in healthcare: of-thought prompting in large language models via gradient-based fea-
A survey, ACM Computing Surveys (CSUR) 55 (1) (2021) 1–36. ture attributions, arXiv preprint arXiv:2307.13339 (2023).
[72] Y. Ye, X. Zhang, J. Sun, Automated vehicle’s behavior decision mak- [95] A. Madaan, A. Yazdanbakhsh, Text and patterns: For effective chain of
ing using deep reinforcement learning and high-fidelity simulation envi- thought, it takes two to tango, arXiv preprint arXiv:2209.07686 (2022).
ronment, Transportation Research Part C: Emerging Technologies 107 [96] B. Wang, S. Min, X. Deng, J. Shen, Y. Wu, L. Zettlemoyer, H. Sun,
(2019) 155–170. Towards understanding chain-of-thought prompting: An empirical study
[73] G. A. Vouros, Explainable deep reinforcement learning: state of the art of what matters, arXiv preprint arXiv:2212.10001 (2022).
and challenges, ACM Computing Surveys 55 (5) (2022) 1–39. [97] T. Lanham, A. Chen, A. Radhakrishnan, B. Steiner, C. Denison,
[74] P. Madumal, T. Miller, L. Sonenberg, F. Vetere, Explainable reinforce- D. Hernandez, D. Li, E. Durmus, E. Hubinger, J. Kernion, et al.,
ment learning through a causal lens, in: Proceedings of the AAAI Con- Measuring faithfulness in chain-of-thought reasoning, arXiv preprint
ference on Artificial Intelligence, Vol. 34, 2020, pp. 2493–2500. arXiv:2307.13702 (2023).
[75] E. Puiutta, E. M. Veith, Explainable reinforcement learning: A survey, [98] J. Wei, J. Wei, Y. Tay, D. Tran, A. Webson, Y. Lu, X. Chen, H. Liu,
in: International Cross-domain Conference for Machine Learning and D. Huang, D. Zhou, et al., Larger language models do in-context learn-
Knowledge Extraction, Springer, 2020, pp. 77–95. ing differently, arXiv preprint arXiv:2303.03846 (2023).
[76] A. Heuillet, F. Couthouis, N. Dı́az-Rodrı́guez, Collective explainable [99] Z. Li, P. Xu, F. Liu, H. Song, Towards understanding in-context learn-
AI: Explaining cooperative strategies and agent contribution in multia- ing with contrastive demonstrations and saliency maps, arXiv preprint
gent reinforcement learning with shapley values, IEEE Computational arXiv:2307.05052 (2023).
Intelligence Magazine 17 (1) (2022) 59–71. [100] D. Slack, S. Krishna, H. Lakkaraju, S. Singh, Explaining machine learn-
[77] A. Heuillet, F. Couthouis, N. Dı́az-Rodrı́guez, Explainability in deep ing models with interactive natural language conversations using Talk-
reinforcement learning, Knowledge-Based Systems 214 (2021) 106685. ToModel, Nature Machine Intelligence 5 (8) (2023) 873–883.
[78] G. Zhang, H. Kashima, Learning state importance for preference-based [101] C. Yeh, Y. Chen, A. Wu, C. Chen, F. Viégas, M. Wattenberg, Atten-
reinforcement learning, Machine Learning (2023) 1–17. tionVIX: A global view of Transformer attention, IEEE Transactions on
[79] L. Wells, T. Bednarz, Explainable AI and reinforcement learning—a sys- Visualization and Computer Graphics (2023).
tematic review of current approaches and trends, Frontiers in Artificial [102] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional
Intelligence 4 (2021) 550030. networks, in: Computer Vision–ECCV 2014: 13th European Confer-
[80] A. Alharin, T.-N. Doan, M. Sartipi, Reinforcement learning interpreta- ence, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I
tion methods: A survey, IEEE Access 8 (2020) 171058–171077. 13, Springer, 2014, pp. 818–833.
[81] V. Chamola, V. Hassija, A. R. Sulthana, D. Ghosh, D. Dhingra, B. Sik- [103] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for
dar, A review of trustworthy and Explainable Artificial Intelligence simplicity: The all convolutional net, arXiv preprint arXiv:1412.6806
(XAI), IEEE Access (2023). (2014).
[82] V. Lai, C. Chen, Q. V. Liao, A. Smith-Renner, C. Tan, Towards a sci- [104] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification
ence of Human-AI decision making: a survey of empirical studies, arXiv with deep convolutional neural networks, Communications of the ACM
preprint arXiv:2112.11471 (2021). 60 (6) (2017) 84–90.
[83] A. Torfi, R. A. Shirvani, Y. Keneshloo, N. Tavaf, E. A. Fox, Natural [105] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog-
language processing advancements by deep learning: A survey, arXiv nition, in: Proceedings of the IEEE Conference on Computer Vision and
preprint arXiv:2003.01200 (2020). Pattern Recognition, 2016, pp. 770–778.
[84] D. Jurafsky, J. H. Martin, Speech and Language Processing: An In- [106] S. Yang, P. Luo, C.-C. Loy, X. Tang, Wider face: A face detection bench-
troduction to Natural Language Processing, Computational Linguistics, mark, in: Proceedings of the IEEE Conference on Computer Vision and
and Speech Recognition. Pattern Recognition, 2016, pp. 5525–5533.
[85] J. P. Usuga-Cadavid, S. Lamouri, B. Grabot, A. Fortin, Using deep learn- [107] W. Yang, H. Huang, Z. Zhang, X. Chen, K. Huang, S. Zhang, Towards
ing to value free-form text data for predictive maintenance, International rich feature discovery with class activation maps augmentation for per-
Journal of Production Research 60 (14) (2022) 4548–4575. son re-identification, in: Proceedings of the IEEE/CVF Conference on
[86] S. Jain, B. C. Wallace, Attention is not explanation, arXiv preprint Computer Vision and Pattern Recognition, 2019, pp. 1389–1398.
arXiv:1902.10186 (2019). [108] P. Linardatos, V. Papastefanopoulos, S. Kotsiantis, Explainable AI: A
[87] S. Gholizadeh, N. Zhou, Model explainability in deep learning based review of machine learning interpretability methods, Entropy 23 (1)
natural language processing, arXiv preprint arXiv:2106.07410 (2021). (2020) 18.
[88] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep [109] D. Smilkov, N. Thorat, B. Kim, F. Viégas, M. Wattenberg, Smooth-
networks, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th grad: removing noise by adding noise, arXiv preprint arXiv:1706.03825
International Conference on Machine Learning, Vol. 70 of Proceedings (2017).
of Machine Learning Research, PMLR, 2017, pp. 3319–3328. [110] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai,
URL https://proceedings.mlr.press/v70/ T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al.,
sundararajan17a.html An image is worth 16x16 words: Transformers for image recognition at
[89] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller, Ex- scale, arXiv preprint arXiv:2010.11929 (2020).
plaining nonlinear classification decisions with deep taylor decomposi- [111] S. Verma, V. Boonsanong, M. Hoang, K. E. Hines, J. P. Dickerson,
tion, Pattern Recognition 65 (2017) 211–222. C. Shah, Counterfactual explanations and algorithmic recourses for ma-
[90] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, chine learning: A review, arXiv preprint arXiv:2010.10596 (2020).
A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models [112] R. Guidotti, Counterfactual explanations and how to find them: litera-
are few-shot learners, Advances in neural information processing sys- ture review and benchmarking, Data Mining and Knowledge Discovery
tems 33 (2020) 1877–1901. (2022) 1–55.
[91] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, [113] R. H. Shumway, D. S. Stoffer, D. S. Stoffer, Time series analysis and its
D. Zhou, et al., Chain-of-thought prompting elicits reasoning in large applications, Vol. 3, Springer, 2000.
language models, Advances in Neural Information Processing Systems [114] B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey,
35 (2022) 24824–24837. Philosophical Transactions of the Royal Society A 379 (2194) (2021)
[92] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, 20200209.
J. Spencer-Smith, D. C. Schmidt, A prompt pattern catalog to enhance [115] R. Verma, J. Sharma, S. Jindal, Time Series Forecasting Using Machine
prompt engineering with chatgpt, arXiv preprint arXiv:2302.11382 Learning, in: Advances in Computing and Data Sciences: 4th Interna-
(2023). tional Conference, ICACDS 2020, Valletta, Malta, April 24–25, 2020,

Revised Selected Papers 4, Springer, 2020, pp. 372–381. [138] R. Hamamoto, Application of artificial intelligence for medical research
[116] W. Bao, J. Yue, Y. Rao, A deep learning framework for financial time (2021).
series using stacked autoencoders and long-short term memory, PloS [139] S. Bharati, M. R. H. Mondal, P. Podder, A Review on Explainable Arti-
one 12 (7) (2017) e0180944. ficial Intelligence for Healthcare: Why, How, and When?, IEEE Trans-
[117] C. Huntingford, E. S. Jeffers, M. B. Bonsall, H. M. Christensen, actions on Artificial Intelligence (2023).
T. Lees, H. Yang, Machine learning and artificial intelligence to aid cli- [140] L. Li, M. Xu, H. Liu, Y. Li, X. Wang, L. Jiang, Z. Wang, X. Fan,
mate change research and preparedness, Environmental Research Let- N. Wang, A large-scale database and a CNN model for attention-
ters 14 (12) (2019) 124007. based glaucoma detection, IEEE transactions on Medical Imaging 39 (2)
[118] A. Farahat, C. Reichert, C. M. Sweeney-Reed, H. Hinrichs, Convo- (2019) 413–424.
lutional neural networks for decoding of covert attention focus and [141] Z. Bian, S. Xia, C. Xia, M. Shao, Weakly supervised vitiligo segmenta-
saliency maps for EEG feature visualization, Journal of Neural Engi- tion in skin image through saliency propagation, in: 2019 IEEE Interna-
neering 16 (6) (2019) 066010. tional Conference on Bioinformatics and Biomedicine (BIBM), IEEE,
[119] T. Huber, K. Weitz, E. André, O. Amir, Local and global explanations 2019, pp. 931–934.
of agent behavior: Integrating strategy summaries with saliency maps, [142] S. Rajaraman, S. Candemir, G. Thoma, S. Antani, Visualizing and ex-
Artificial Intelligence 301 (2021) 103571. plaining deep learning predictions for pneumonia detection in pediatric
[120] A. A. Ismail, M. Gunady, H. Corrada Bravo, S. Feizi, Benchmarking chest radiographs, in: Medical Imaging 2019: Computer-Aided Diagno-
deep learning interpretability in time series predictions, Advances in sis, Vol. 10950, SPIE, 2019, pp. 200–211.
Neural Information Processing Systems 33 (2020) 6441–6452. [143] G. Yang, F. Raschke, T. R. Barrick, F. A. Howe, Manifold Learning
[121] J. Cooper, O. Arandjelović, D. J. Harrison, Believe the HiPe: Hierarchi- in MR spectroscopy using nonlinear dimensionality reduction and un-
cal perturbation for fast, robust, and model-agnostic saliency mapping, supervised clustering, Magnetic Resonance in Medicine 74 (3) (2015)
Pattern Recognition 129 (2022) 108743. 868–878.
[122] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with [144] U. Ahmed, G. Srivastava, U. Yun, J. C.-W. Lin, EANDC: An explain-
deep neural networks: A strong baseline, in: 2017 International joint able attention network based deep adaptive clustering model for mental
Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1578–1585. health treatment, Future Generation Computer Systems 130 (2022) 106–
[123] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Towards bet- 113.
ter analysis of deep convolutional neural networks, International Confer- [145] Y. Ming, H. Qu, E. Bertini, Rulematrix: Visualizing and understanding
ence on Learning Representations (ICLR) (2015). classifiers with rules, IEEE Transactions on Visualization and Computer
[124] W. Song, L. Liu, M. Liu, W. Wang, X. Wang, Y. Song, Representation Graphics 25 (1) (2018) 342–352.
learning with deconvolution for multivariate time series classification [146] N. Rane, S. Choudhary, J. Rane, Explainable Artificial Intelligence
and visualization, in: Data Science: 6th International Conference of Pio- (XAI) in healthcare: Interpretable Models for Clinical Decision Sup-
neering Computer Scientists, Engineers and Educators, ICPCSEE 2020, port, Available at SSRN 4637897 (2023).
Taiyuan, China, September 18-21, 2020, Proceedings, Part I 6, Springer, [147] H. Magunia, S. Lederer, R. Verbuecheln, B. J. Gilot, M. Koeppen, H. A.
2020, pp. 310–326. Haeberle, V. Mirakaj, P. Hofmann, G. Marx, J. Bickenbach, et al.,
[125] S. A. Siddiqui, D. Mercier, M. Munir, A. Dengel, S. Ahmed, Tsviz: Machine learning identifies ICU outcome predictors in a multicenter
Demystification of deep learning models for time-series analysis, IEEE COVID-19 cohort, Critical Care 25 (2021) 1–14.
Access 7 (2019) 67027–67040. [148] A. Raza, K. P. Tran, L. Koehl, S. Li, Designing ecg monitoring
[126] C. Labrı́n, F. Urdinez, Principal component analysis, in: R for Political healthcare system with federated transfer learning and explainable AI,
Data Science, Chapman and Hall/CRC, 2020, pp. 375–393. Knowledge-Based Systems 236 (2022) 107763.
[127] L. Van Der Maaten, Accelerating t-SNE using tree-based algorithms, [149] F. C. Morabito, C. Ieracitano, N. Mammone, An explainable Artificial
The Journal of Machine Learning Research 15 (1) (2014) 3221–3245. Intelligence approach to study MCI to AD conversion via HD-EEG pro-
[128] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold ap- cessing, Clinical EEG and Neuroscience 54 (1) (2023) 51–60.
proximation and projection for dimension reduction, arXiv preprint [150] S. El-Sappagh, J. M. Alonso, S. R. Islam, A. M. Sultan, K. S. Kwak,
arXiv:1802.03426 (2018). A multilayer multimodal detection and prediction model based on ex-
[129] K. Agrawal, N. Desai, T. Chakraborty, Time series visualization using plainable artificial intelligence for Alzheimer’s disease, Scientific Re-
t-SNE and UMAP, Journal of Big Data 8 (1) (2021) 1–21. ports 11 (1) (2021) 2660.
[130] A. Roy, L. v. d. Maaten, D. Witten, UMAP reveals cryptic population [151] G. Yang, Q. Ye, J. Xia, Unbox the black-box for the medical explainable
structure and phenotype heterogeneity in large genomic cohorts, PLoS AI via multi-modal and multi-centre data fusion: A mini-review, two
genetics 16 (3) (2020) e1009043. showcases and beyond, Information Fusion 77 (2022) 29–52.
[131] M. Munir, Thesis approved by the Department of Computer Science of [152] J. B. Awotunde, E. A. Adeniyi, G. J. Ajamu, G. B. Balogun, F. A.
the TU Kaiserslautern for the award of the Doctoral Degree doctor of Taofeek-Ibrahim, Explainable Artificial Intelligence in Genomic Se-
engineering, Ph.D. thesis, Kyushu University, Japan (2021). quence for Healthcare Systems Prediction, in: Connected e-Health: In-
[132] E. Mosqueira-Rey, E. Hernández-Pereira, D. Alonso-Rı́os, J. Bobes- tegrated IoT and Cloud Computing, Springer, 2022, pp. 417–437.
Bascarán, Á. Fernández-Leal, Human-in-the-loop machine learning: a [153] A. Anguita-Ruiz, A. Segura-Delgado, R. Alcalá, C. M. Aguilera, J. Al-
state of the art, Artificial Intelligence Review (2022) 1–50. calá-Fdez, eXplainable Artificial Intelligence (XAI) for the identifica-
[133] U. Schlegel, D. A. Keim, Time series model attribution visualizations tion of biologically relevant gene expression patterns in longitudinal hu-
as explanations, in: 2021 IEEE Workshop on TRust and EXpertise in man studies, insights from obesity research, PLoS Computational Biol-
Visual Analytics (TREX), IEEE, 2021, pp. 27–31. ogy 16 (4) (2020) e1007792.
[134] G. Plumb, S. Wang, Y. Chen, C. Rudin, Interpretable decision sets: A [154] A. Troncoso-Garcı́a, M. Martı́nez-Ballesteros, F. Martı́nez-Álvarez,
joint framework for description and prediction, in: Proceedings of the A. Troncoso, Explainable machine learning for sleep apnea prediction,
24th ACM SIGKDD International Conference on Knowledge Discovery Procedia Computer Science 207 (2022) 2930–2939.
& Data Mining, ACM, 2018, pp. 1677–1686. [155] E. Tjoa, C. Guan, A survey on Explainable Artificial Intelligence (XAI):
[135] Z. C. Lipton, D. C. Kale, R. Wetzel, et al., Modeling missing data in clin- Toward medical XAI, IEEE Transactions on Neural Networks and
ical time series with rnns, Machine Learning for Healthcare 56 (2016) Learning Systems 32 (11) (2020) 4793–4813.
253–270. [156] J. Liao, X. Li, Y. Gan, S. Han, P. Rong, W. Wang, W. Li, L. Zhou, Artifi-
[136] H. Lakkaraju, S. H. Bach, J. Leskovec, Interpretable decision sets: A cial intelligence assists precision medicine in cancer treatment, Frontiers
joint framework for description and prediction, in: Proceedings of the in Oncology 12 (2023) 998222.
22nd ACM SIGKDD International Conference on Knowledge Discov- [157] H. Askr, E. Elgeldawi, H. Aboul Ella, Y. A. Elshaier, M. M. Gomaa,
ery and Data Mining, 2016, pp. 1675–1684. A. E. Hassanien, Deep learning in drug discovery: an integrative review
[137] C. Rudin, J. Radin, Why are we using black box models in AI when we and future challenges, Artificial Intelligence Review 56 (7) (2023) 5975–
don’t need to? A lesson from an explainable AI competition, Harvard 6037.
Data Science Review 1 (2) (2019) 1–9. [158] Q.-H. Kha, V.-H. Le, T. N. K. Hung, N. T. K. Nguyen, N. Q. K. Le,

Development and validation of an explainable machine learning-based [180] J. Kim, A. Rohrbach, Z. Akata, S. Moon, T. Misu, Y.-T. Chen, T. Darrell,
prediction model for drug–food interactions from chemical structures, J. Canny, Toward explainable and advisable model for self-driving cars,
Sensors 23 (8) (2023) 3962. Applied AI Letters 2 (4) (2021) e56.
[159] C. Panigutti, A. Beretta, D. Fadda, F. Giannotti, D. Pedreschi, A. Perotti, [181] P. Regulation, Regulation (EU) 2016/679 of the European Parliament
S. Rinzivillo, Co-design of human-centered, explainable ai for clinical and of the Council, Regulation (eu) 679 (2016) 2016.
decision support, ACM Transactions on Interactive Intelligent Systems [182] S. Burton, I. Habli, T. Lawton, J. McDermid, P. Morgan, Z. Porter, Mind
(2023). the gaps: Assuring the safety of autonomous systems from an engi-
[160] D. Saraswat, P. Bhattacharya, A. Verma, V. K. Prasad, S. Tanwar, neering, ethical, and legal perspective, Artificial Intelligence 279 (2020)
G. Sharma, P. N. Bokoro, R. Sharma, Explainable AI for healthcare 5.0: 103201.
opportunities and challenges, IEEE Access (2022). [183] V. Chen, Q. V. Liao, J. Wortman Vaughan, G. Bansal, Understanding the
[161] A. Ward, A. Sarraju, S. Chung, J. Li, R. Harrington, P. Heidenre- role of human intuition on reliance in human-AI decision-making with
ich, L. Palaniappan, D. Scheinker, F. Rodriguez, Machine learning and explanations, Proceedings of the ACM on Human-Computer Interaction
atherosclerotic cardiovascular disease risk prediction in a multi-ethnic 7 (CSCW2) (2023) 1–32.
population, NPJ Digital Medicine 3 (1) (2020) 125. [184] A. Bussone, S. Stumpf, D. O’Sullivan, The role of explanations on trust
[162] X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, F. Lu, Understand- and reliance in clinical decision support systems, in: 2015 International
ing adversarial attacks on deep learning based medical image analysis Conference on Healthcare Informatics, IEEE, 2015, pp. 160–169.
systems, Pattern Recognition 110 (2021) 107332. [185] J. Dong, S. Chen, M. Miralinaghi, T. Chen, P. Li, S. Labi, Why did the
[163] M. Sharma, C. Savage, M. Nair, I. Larsson, P. Svedberg, J. M. Nygren, AI make that decision? Towards an explainable artificial intelligence
Artificial intelligence applications in health care practice: scoping re- (XAI) for autonomous driving systems, Transportation Research Part C:
view, Journal of Medical Internet Research 24 (10) (2022) e40238. Emerging Technologies 156 (2023) 104358.
[164] G. Maliha, S. Gerke, I. G. Cohen, R. B. Parikh, Artificial intelligence and [186] H. Mankodiya, D. Jadav, R. Gupta, S. Tanwar, W.-C. Hong, R. Sharma,
liability in medicine, The Milbank Quarterly 99 (3) (2021) 629–647. Od-XAI: Explainable AI-based semantic object detection for au-
[165] J. Amann, A. Blasimme, E. Vayena, D. Frey, V. I. Madai, P. Consortium, tonomous vehicles, Applied Sciences 12 (11) (2022) 5310.
Explainability for artificial intelligence in healthcare: a multidisciplinary [187] M. M. Karim, Y. Li, R. Qin, Toward explainable artificial intelligence for
perspective, BMC Medical Informatics and Decision Making 20 (2020) early anticipation of traffic accidents, Transportation Research Record
1–9. 2676 (6) (2022) 743–755.
[166] A. Chaddad, J. Peng, J. Xu, A. Bouridane, Survey of explainable AI [188] A. S. Madhav, A. K. Tyagi, Explainable Artificial Intelligence (XAI):
techniques in healthcare, Sensors 23 (2) (2023) 634.
[167] A. Kerasidou, Ethics of artificial intelligence in global health: Explainability, algorithmic bias and trust, Journal of Oral Biology and Craniofacial Research 11 (4) (2021) 612–614.
[168] T. d. C. Aranovich, R. Matulionyte, Ensuring AI explainability in healthcare: problems and possible policy solutions, Information & Communications Technology Law 32 (2) (2023) 259–275.
[169] N. Anton, B. Doroftei, S. Curteanu, L. Catãlin, O.-D. Ilie, F. Târcoveanu, C. M. Bogdănici, Comprehensive review on the use of artificial intelligence in ophthalmology and future research directions, Diagnostics 13 (1) (2022) 100.
[170] A. K. Al Shami, Generating Tennis Player by the Predicting Movement Using 2D Pose Estimation, Ph.D. thesis, University of Colorado Colorado Springs (2022).
[171] A. AlShami, T. Boult, J. Kalita, Pose2Trajectory: Using transformers on body pose to predict tennis player’s trajectory, Journal of Visual Communication and Image Representation 97 (2023) 103954.
[172] S. Atakishiyev, M. Salameh, H. Yao, R. Goebel, Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions, arXiv preprint arXiv:2112.11561 (2021).
[173] D. Holliday, S. Wilson, S. Stumpf, User trust in intelligent systems: A journey over time, in: Proceedings of the 21st International Conference on Intelligent User Interfaces, 2016, pp. 164–168.
[174] B. W. Israelsen, N. R. Ahmed, “Dave... I can assure you... that it’s going to be all right...” A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships, ACM Computing Surveys (CSUR) 51 (6) (2019) 1–37.
[175] S. Atakishiyev, M. Salameh, H. Yao, R. Goebel, Towards safe, explainable, and regulated autonomous driving, arXiv preprint arXiv:2111.10518 (2021).
[176] A. Corso, M. J. Kochenderfer, Interpretable safety validation for autonomous vehicles, in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2020, pp. 1–6.
[177] D. V. McGehee, M. Brewer, C. Schwarz, B. W. Smith, et al., Review of automated vehicle technology: Policy and implementation implications, Tech. rep., Iowa Dept. of Transportation (2016).
[178] M. Rahman, S. Polunsky, S. Jones, Transportation policies for connected and automated mobility in smart cities, in: Smart Cities Policies and Financing, Elsevier, 2022, pp. 97–116.
[179] J. Kim, S. Moon, A. Rohrbach, T. Darrell, J. Canny, Advisable learning for self-driving vehicles by internalizing observation-to-action rules, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9661–9670.
connecting artificial decision-making and human trust in autonomous vehicles, in: Proceedings of Third International Conference on Computing, Communications, and Cyber-Security: IC4S 2021, Springer, 2022, pp. 123–136.
[189] U. Onyekpe, Y. Lu, E. Apostolopoulou, V. Palade, E. U. Eyo, S. Kanarachos, Explainable Machine Learning for Autonomous Vehicle Positioning Using SHAP, in: Explainable AI: Foundations, Methodologies and Applications, Springer, 2022, pp. 157–183.
[190] X. Cheng, J. Wang, H. Li, Y. Zhang, L. Wu, Y. Liu, A method to evaluate task-specific importance of spatio-temporal units based on explainable artificial intelligence, International Journal of Geographical Information Science 35 (10) (2021) 2002–2025.
[191] T. Rojat, R. Puget, D. Filliat, J. Del Ser, R. Gelin, N. Díaz-Rodríguez, Explainable Artificial Intelligence (XAI) on timeseries data: A survey, arXiv preprint arXiv:2104.00950 (2021).
[192] C. I. Nwakanma, L. A. C. Ahakonye, J. N. Njoku, J. C. Odirichukwu, S. A. Okolie, C. Uzondu, C. C. Ndubuisi Nweke, D.-S. Kim, Explainable Artificial Intelligence (XAI) for intrusion detection and mitigation in intelligent connected vehicles: A review, Applied Sciences 13 (3) (2023) 1252.
[193] J. Li, S. King, I. Jennions, Intelligent fault diagnosis of an aircraft fuel system using machine learning—a literature review, Machines 11 (4) (2023) 481.
[194] G. Bendiab, A. Hameurlaine, G. Germanos, N. Kolokotronis, S. Shiaeles, Autonomous vehicles security: Challenges and solutions using blockchain and artificial intelligence, IEEE Transactions on Intelligent Transportation Systems (2023).
[195] A. Maqsood, C. Chen, T. J. Jacobsson, The future of material scientists in an age of artificial intelligence, Advanced Science (2024) 2401401.
[196] F. Oviedo, J. L. Ferres, T. Buonassisi, K. T. Butler, Interpretable and explainable machine learning for materials science and chemistry, Accounts of Materials Research 3 (6) (2022) 597–607.
[197] G. Pilania, Machine learning in materials science: From explainable predictions to autonomous design, Computational Materials Science 193 (2021) 110360.
[198] K. Choudhary, B. DeCost, C. Chen, A. Jain, F. Tavazza, R. Cohn, C. W. Park, A. Choudhary, A. Agrawal, S. J. Billinge, et al., Recent advances and applications of deep learning methods in materials science, npj Computational Materials 8 (1) (2022) 59.
[199] A. Y.-T. Wang, M. S. Mahmoud, M. Czasny, A. Gurlo, CrabNet for explainable deep learning in materials science: bridging the gap between academia and industry, Integrating Materials and Manufacturing Innovation 11 (1) (2022) 41–56.
[200] K. Lee, M. V. Ayyasamy, Y. Ji, P. V. Balachandran, A comparison of explainable artificial intelligence methods in the phase classification of multi-principal element alloys, Scientific Reports 12 (1) (2022) 11591.
[201] J. Feng, J. L. Lansford, M. A. Katsoulakis, D. G. Vlachos, Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences, Science Advances 6 (42) (2020) eabc3204.
[202] T. Harren, H. Matter, G. Hessler, M. Rarey, C. Grebner, Interpretation of structure–activity relationships in real-world drug design data sets using explainable artificial intelligence, Journal of Chemical Information and Modeling 62 (3) (2022) 447–462.
[203] J. Willard, X. Jia, S. Xu, M. Steinbach, V. Kumar, Integrating physics-based modeling with machine learning: A survey, arXiv preprint arXiv:2003.04919 1 (1) (2020) 1–34.
[204] M. Datcu, Z. Huang, A. Anghel, J. Zhao, R. Cacoveanu, Explainable, physics-aware, trustworthy artificial intelligence: A paradigm shift for synthetic aperture radar, IEEE Geoscience and Remote Sensing Magazine 11 (1) (2023) 8–25.
[205] J. Willard, X. Jia, S. Xu, M. Steinbach, V. Kumar, Integrating scien-
tific knowledge with machine learning for engineering and environmen-
tal systems, ACM Computing Surveys 55 (4) (2022) 1–37.
[206] Z. Huang, X. Yao, Y. Liu, C. O. Dumitru, M. Datcu, J. Han, Physically
explainable CNN for SAR image classification, ISPRS Journal of Pho-
togrammetry and Remote Sensing 190 (2022) 25–37.
[207] J. Crocker, K. Kumar, B. Cox, Using explainability to design physics-
aware CNNs for solving subsurface inverse problems, Computers and
Geotechnics 159 (2023) 105452.
[208] S. Sadeghi Tabas, Explainable physics-informed deep learning for
rainfall-runoff modeling and uncertainty assessment across the continen-
tal United States (2023).
[209] R. Roscher, B. Bohn, M. F. Duarte, J. Garcke, Explainable machine
learning for scientific insights and discoveries, IEEE Access 8 (2020)
42200–42216.
[210] D. Tuia, K. Schindler, B. Demir, G. Camps-Valls, X. X. Zhu,
M. Kochupillai, S. Džeroski, J. N. van Rijn, H. H. Hoos, F. Del Frate,
et al., Artificial intelligence to advance earth observation: a perspective,
arXiv preprint arXiv:2305.08413 (2023).
[211] P. Lopes, E. Silva, C. Braga, T. Oliveira, L. Rosado, XAI Systems Eval-
uation: A Review of Human and Computer-Centred Methods, Applied
Sciences 12 (19) (2022) 9423.
[212] V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, K. Huang,
S. Scardapane, I. Spinelli, M. Mahmud, A. Hussain, Interpreting black-
box models: a review on explainable artificial intelligence, Cognitive
Computation 16 (1) (2024) 45–74.
[213] S. Mohseni, N. Zarei, E. D. Ragan, A multidisciplinary survey and
framework for design and evaluation of explainable AI systems, ACM
Transactions on Interactive Intelligent Systems (TiiS) 11 (3-4) (2021)
1–45.
[214] S. Mohseni, J. E. Block, E. D. Ragan, A human-grounded evaluation
benchmark for local explanations of machine learning, arXiv preprint
arXiv:1801.05075 (2018).
[215] D. Gunning, D. Aha, DARPA’s Explainable Artificial Intelligence (XAI)
program, AI Magazine 40 (2) (2019) 44–58.
[216] M. Nourani, S. Kabir, S. Mohseni, E. D. Ragan, The effects of meaning-
ful and meaningless explanations on trust and perceived system accuracy
in intelligent systems, in: Proceedings of the AAAI Conference on Hu-
man Computation and Crowdsourcing, Vol. 7, 2019, pp. 97–105.
[217] A. Hedström, L. Weber, D. Krakowczyk, D. Bareeva, F. Motzkus,
W. Samek, S. Lapuschkin, M. M.-C. Höhne, Quantus: An explainable
AI toolkit for responsible evaluation of neural network explanations and
beyond, Journal of Machine Learning Research 24 (34) (2023) 1–11.
[218] J. Zhou, A. H. Gandomi, F. Chen, A. Holzinger, Evaluating the quality
of machine learning explanations: A survey on methods and metrics,
Electronics 10 (5) (2021) 593.
[219] A. F. Markus, J. A. Kors, P. R. Rijnbeek, The role of explainability in
creating trustworthy artificial intelligence for health care: a comprehen-
sive survey of the terminology, design choices, and evaluation strategies,
Journal of Biomedical Informatics 113 (2021) 103655.
[220] M. Velmurugan, C. Ouyang, C. Moreira, R. Sindhgatta, Developing a
fidelity evaluation approach for interpretable machine learning, arXiv
preprint arXiv:2106.08492 (2021).
[221] W. Sun, Stability of machine learning algorithms, Ph.D. thesis, Purdue
University (2015).
[222] N. Drenkow, N. Sani, I. Shpitser, M. Unberath, A systematic review of robustness in deep learning for computer vision: Mind the gap?, arXiv preprint arXiv:2112.00639 (2021).
[223] G. Schryen, Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis, arXiv preprint arXiv:2212.11223 (2022).
[224] J. DeYoung, S. Jain, N. F. Rajani, E. Lehman, C. Xiong, R. Socher, B. C. Wallace, ERASER: A benchmark to evaluate rationalized NLP models, arXiv preprint arXiv:1911.03429 (2019).
[225] A. Thampi, Interpretable AI: Building explainable machine learning systems, Simon and Schuster, 2022.
[226] R. Dwivedi, D. Dave, H. Naik, S. Singhal, R. Omer, P. Patel, B. Qian, Z. Wen, T. Shah, G. Morgan, et al., Explainable AI (XAI): Core ideas, techniques, and solutions, ACM Computing Surveys 55 (9) (2023) 1–33.
[227] S. Wu, H. Fei, L. Qu, W. Ji, T.-S. Chua, NExT-GPT: Any-to-any multimodal LLM, arXiv preprint arXiv:2309.05519 (2023).