Explainable Artificial Intelligence: A Survey of Needs, Techniques, Applications, and Future Direction
Explainable Artificial Intelligence: A Survey of Needs, Techniques, Applications, and Future Direction
Melkamu Mershaa , Khang Lamb , Joseph Wooda , Ali AlShamia , Jugal Kalitaa
a College of Engineering and Applied Science, University of Colorado Colorado Springs, , 80918, CO, USA
b College of Information and Communication Technology, Can Tho University, , Can Tho, 90000, Vietnam
Abstract
Artificial intelligence models encounter significant challenges due to their black-box nature, particularly in safety-critical domains
such as healthcare, finance, and autonomous vehicles. Explainable Artificial Intelligence (XAI) addresses these challenges by
providing explanations for how these models make decisions and predictions, ensuring transparency, accountability, and fairness.
arXiv:2409.00265v2 [cs.AI] 13 Jan 2025
Existing studies have examined the fundamental concepts of XAI, its general principles, and the scope of XAI techniques. However,
there remains a gap in the literature as there are no comprehensive reviews that delve into the detailed mathematical representa-
tions, design methodologies of XAI models, and other associated aspects. This paper provides a comprehensive literature review
encompassing common terminologies and definitions, the need for XAI, beneficiaries of XAI, a taxonomy of XAI methods, and
the application of XAI methods in different application areas. The survey is aimed at XAI researchers, XAI practitioners, AI model
developers, and XAI beneficiaries who are interested in enhancing the trustworthiness, transparency, accountability, and fairness of
their AI models.
Keywords: XAI, explainable artificial intelligence, interpretable deep learning, machine learning, neural networks, evaluation
methods, computer vision, natural language processing, NLP, transformers, time series, healthcare, and autonomous cars.
1. Introduction tainty in how they operate. Since these systems impact lives,
it leads to an emerging need to understand how decisions are
Since the advent of digital computer systems, scientists have made. Lack of such understanding makes it difficult to adopt
been exploring ways to automate human intelligence via com- such a powerful tool in industries that require sensitivity or that
putational representation and mathematical theory, eventually are critical to the survival of the species.
giving birth to a computational approach known as Artificial The black-box nature of AI models raises significant con-
Intelligence (AI). AI and machine learning (ML) models are cerns, including the need for explainability, interpretability, ac-
being widely adopted in various domains, such as web search countability, and transparency. These aspects, along with legal,
engines, speech recognition, self-driving cars, strategy game- ethical, and safety considerations, are crucial for building trust
play, image analysis, medical procedures, and national de- in AI, not just among scientists but also among the wider pub-
fense, many of which require high levels of security, transpar- lic, regulators, and politicians who are increasingly attentive to
ent decision-making, and a responsibility to protect information new developments. With this in mind, there has been a shift
[1, 2]. Nevertheless, significant challenges remain in trusting from just relying on the power of AI to understanding and in-
the output of these complex ML algorithms and AI models be- terpreting how AI has arrived at decisions, leading to terms such
cause the detailed inner logic and system architectures are ob- as transparency, explainability, interpretability, or, more gener-
fuscated by the user by design. ally, eXplainable Artificial Intelligence (XAI). A new approach
AI has shown itself to be an efficient and effective way to is required to trust the AI and ML models, and though much
handle many tasks at which humans usually excel. In fact, it has been accomplished in the last decades, the interpretability
has become pervasive, yet hidden from the casual observer, in and black-box issues are still prevalent [4, 5]. Attention given
our day-to-day lives. As AI techniques proliferate, the imple- to XAI has grown steadily (Figure 1), and XAI has attracted a
mentations are starting to outperform even the best expectations thriving number of researchers, though there still exists a lack
across many domains [3]. Since AI solves difficult problems, of consensus regarding symbology and terminology. Contri-
the methodologies used have become increasingly complex. A butions rely heavily on their own terminology or theoretical
common analogy is that of the black box, where the inputs are framework [6].
well-defined, as are the outputs. However, the process is not Researchers have been working to increase the interpretabil-
transparent and cannot be easily understood by humans. The ity of AI and ML models to gain better insight into black-box
AI system does not usually provide any information about how decision-making. Questions being explored include how to ex-
it arrives at the decisions it makes. The systems and processes plain the decision-making process, approaches for interpretabil-
used in decision-making are often abstruse and contain uncer- ity and explainability, ethical implications, and detecting and
1
works. In Figure 2, our “all you need here” shows how our
survey offers a clear and systematic approach, enabling read-
ers to understand the multifaceted nature of XAI. To the best
of our knowledge, this is the first work to comprehensively re-
view explainability across traditional neural network models,
reinforcement learning models, and Transformer-based mod-
els (including large language models and Vision Transformer
models), covering various application areas, evaluation meth-
ods, XAI challenges, and future research directions.
addressing potential biases or errors [1, 7]. These and other crit-
ical questions remain open and require further research. This
survey attempts to address these questions and provide new in-
sights to advance the adoption of explainable artificial intelli-
gence among different stakeholders, including practitioners, ed-
ucators, system designers, developers, and other beneficiaries.
A significant number of comprehensive studies on XAI have
been released. XAI survey publications usually focus on de-
scribing XAI basic terminology, outlining the explainability
taxonomy, presenting XAI techniques, investigating XAI ap-
plications, analyzing XAI opportunities and challenges, and
proposing future research directions. Depending on the goals Figure 2: ‘All you need here’-A comprehensive overview of XAI concepts.
of each study, the researchers may concentrate on specific as-
pects of XAI. Some outstanding survey papers and their main The main contributions of our work are presented below:
contributions are as follows. Gilpin et al. [8] defined and distin-
• Develop and present a comprehensive review of XAI that
guished the key concepts of XAI, while Adadi and Berrada [9]
addresses and rectifies the limitations observed in previous
introduced criteria for developing XAI methods. Arrieta et
review studies.
al. [10] and Minh et al. [11] concentrated on XAI techniques.
In addition to XAI techniques, Vilone and Longo [4] also ex- • More than two hundred research articles were surveyed in
plored the evaluation methods for XAI. Stakeholders, who ben- this comprehensive study in the XAI field.
efit from XAI, and their requirements were examined by Langer
et al. [12]. Speith [13] performed studies on the common XAI • Discuss the advantages and drawbacks of each XAI tech-
taxonomies and identified new approaches to build new XAI nique in depth.
taxonomies. Räuker et al. [14] emphasized on inner inter-
pretability of the deep learning models. The use of XAI to • Highlight the research gaps and challenges of XAI to
enhance machine learning models is investigated in the study strengthen future works.
of Weber et al. [15]. People have discussed the applications of
The paper is organized into eight sections: Section 2 intro-
XAI in a variety of domains and tasks [16] or specific domains,
duces relevant terminology and reviews the background and
such as medicine [17, 18, 19], healthcare [20, 21, 22, 23], and
motivation of XAI research. Section 3 and Section 4 present
finance [23]. Recently, Longo et al. [24] proposed a manifesto
types of explainability techniques and discussions on XAI tech-
to govern the XAI studies and introduce more than twenty open
niques along different dimensions, respectively. Section 5 dis-
problems in XAI and their suggested solutions.
cusses XAI techniques in different applications. Section 6 and 7
Our systematic review carefully analyzes more than two hun-
present XAI evaluation methods and future research direction,
dred studies in the domain of XAI. This survey provides a com-
respectively. Section 8 concludes the survey.
plete picture of XAI techniques for beginners and advanced re-
searchers. It also covered explainable models, application ar-
eas, evaluation of XAI techniques, challenges, and future direc- 2. Background and Motivation
tions in the domain, Figure 2. The survey provides a compre-
hensive overview of XAI concepts, ranging from foundational Black-box AI systems have become ubiquitous and are per-
principles to recent studies incorporating mathematical frame- vasively integrated into diverse areas. XAI has emerged as a
2
necessity to establish trustworthy and transparent models, en- have arguably become as opaque as the human brain [30]. The
sure governance and compliance, and evaluate and improve the model accepts the input and gives the output or the prediction
decision-making process of AI systems. without any reasonable details about why and how the model
made that prediction or decision. The black-box nature of AI
2.1. Basic Terminology models can be attributed to various factors, including model
complexity, optimization algorithms, large and complex train-
Before discussing XAI in-depth, we briefly present the basic ing data sets, and the algorithms and processes used to train the
terminology used in this work. models. Deep neural AI models, in particular, exacerbate these
AI systems can perform tasks that normally require human concerns due to the design of deep neural networks (DNN),
intelligence [25]. They can solve complex problems, learn from with components that remain hidden from human comprehen-
large amounts of data, make autonomous decisions, and under- sion.
stand and respond to challenging prompts using complex algo-
rithms.
2.2. Need for Explanation
XAI systems refer to AI systems that are able to provide
explanations for their decisions or predictions and give insight Black-box AI systems have become ubiquitous throughout
into their behaviors. In short, XAI attempts to understand society, extensively integrated in a diverse range of disciplines,
“WHY did the AI system do X?”. This can help build compre- and can be found permeating many aspects of daily activities.
hensions about the influences on a model and specifics about The need for explainability in real-world applications is multi-
where a model succeeds or fails [11]. faceted and essential for ensuring the performance and reliabil-
Trust is the degree to which people are willing to have con- ity of AI models while allowing users to work effectively with
fidence in the outputs and decisions provided by an AI system. these models. XAI is becoming essential in building trustwor-
A relevant question is: Does the user trust the output enough to thy, accountable, and transparent AI models to satisfy delicate
perform real-world actions based upon it? [26]. application designs [31, 32].
Machine learning is a rapidly evolving field within com- Transparency: Transparency is the capability of an AI sys-
puter science. It is a subset of AI that involves the creation of tem to provide understandable and reasonable explanations of a
algorithms designed to emulate human intelligence by captur- model’s decision or prediction process [4, 33, 34]. XAI systems
ing data from surrounding environments and learning from such explain how AI models arrive at their prediction or decision so
data using models, as discussed in the previous paragraph [27]. that experts and model users can understand the logic behind
ML imitates the way humans learn, gradually improving accu- the AI systems [17, 35], which is crucial for trustworthiness and
racy over time based on experience. In essence, ML is about en- transparency. Transparency has a meaningful impact on peo-
abling computers to think and act with less human intervention ple’s willingness to trust the AI system by using directly inter-
by utilizing vast amounts of data to recognize patterns, make pretable models or availing XAI system explanations [36]. For
predictions, and take actions based on that data. example, if on a mobile device, voice-to-text recognition sys-
Models and algorithms are two different concepts. How- tems produce wrong transcription, the consequences may not
ever, they are used together in the development of real-world always be a big concern although it may be irritating. This may
AI systems. A model (in the context of machine learning) is a also be the case in a chat program like ChatGPT if the questions
computational representation of a system whose primary pur- and answers are “simple”. In this case, the need for explainabil-
pose is to make empirical decisions and predictions based on ity and transparency is less profound. In contrast, explainabil-
the given input data (e.g., neural network, decision tree, or lo- ity and transparency are crucial in critical safety systems such
gistic regression). In contrast, an algorithm is a set of rules or as autonomous vehicles, medical diagnosis and treatment sys-
instructions used to perform a task. The models can be simple tems, air traffic control systems, and military systems [2].
or complex, and trained on the input data to improve their accu- Governance and compliance issues: XAI enables gover-
racy in decision-making or prediction. Algorithms can also be nance in AI systems by confirming that decisions made by AI
simple or complex, but they are used to perform a specific task systems are ethical, accountable, transparent, and compliant
without any training. Models and algorithms differ by output, with any laws and regulations. Organizations in domains such
function, design, and complexity [28]. as healthcare and finance can be subject to strict regulations,
Deep learning refers to ML approaches for building multi- requiring human understanding for certain types of decisions
layer (or “deep”) artificial neural network models that solve made by AI models [1, 37, 38]. For example, if someone is
challenging problems. Specifically, multiple (and usually com- denied a loan by the bank’s AI system, he or she may have the
plex) layers of neural networks are used to extract features from right to know why the AI system made this decision. Simi-
data, where the layers between the input and output layers are larly, if a class essay is graded by an AI and the student gets
“hidden” and opaque [29]. a bad grade, an explanation may be necessary. Bias is often
A black-box model refers to the lack of transparency and present in the nature of ML algorithms’ training process, which
understanding of how an AI model works when making pre- is sometimes difficult to notice. This raises concerns about an
dictions or decisions. Extensive increases in the amount of data algorithm acting in a discriminatory way. XAI has been found
and performance of computational devices have driven AI mod- to serve as a potential remedy for mitigating issues of discrim-
els to become more complex, to the point that neural networks ination in the realms of law and regulation [39]. For instance,
3
if AI systems use sensitive and protected attributes (e.g., re-
ligion, gender, sexual orientation, and race) and make biased
decisions, XAI may help identify the root cause of the bias and
give insight to rectify the wrong decision. Hence, XAI can help
promote compliance with laws and regulations regarding data
privacy and protection, discrimination, safety, and reliability.
Model performance and debugging: XAI offers poten-
tial benefits in enhancing the performance of AI systems, par-
ticularly in terms of model design and debugging as well as
decision-making processes [2, 40, 41]. The use of XAI tech-
niques facilitates the identification and selection of relevant fea-
tures for developing accurate and practical models. These tech-
niques help tune hyperparameters such as choice of activation
functions, number of layers, and learning rates to prevent under-
fitting or overfitting. The explanation also helps the developers
with bias detection in the decision-making process. If the de-
velopers quickly detect the bias, they can adjust the system to
ensure that outputs are unbiased and fair. XAI can enable de-
velopers to identify decision-making errors and correct them,
helping develop more accurate and reliable models. Explana-
tion can enable users to have more control over the models so
as to be able to modify the input parameters and observe how Figure 3: XAI stakeholders/beneficiaries.
parameter changes affect the prediction or decision. Users can
also provide feedback to improve the model decision process
based on the XAI explanation. Industries: XAI is crucial for industries to provide trans-
Reliability: ML models’ predictions and outputs may result parent, interpretable, accountable, and trustable services and
in unexpected failures. We need some control mechanisms or decision-making processes. XAI can also help industries iden-
accountability to trust the AI models’ predictions and decisions. tify and reduce the risk of errors and biases, improve regulatory
For example, a wrong decision by a medical or self-driving compliance, enhance customer trust and confidence, facilitate
black-box may result in high risk for the impacted human be- innovations, and increase accountability and transparency.
ings [31, 38]. Researchers and system developers: The importance of
Safety: In certain applications, such as self-driving cars or XAI to researchers and AI system developers cannot be over-
military drones, it is important to understand the decisions made stated, as it provides critical insights that lead to improved
by an AI system in order to ensure the safety, security, and the model performance. Specifically, XAI techniques enable them
lives of humans involved [42]. to understand how AI models make decisions, and enable the
Human-AI collaboration: XAI can facilitate collaboration identification of potential improvement and optimization. XAI
between humans and AI systems by enabling humans to under- helps facilitate innovation and enhance the interpretability and
stand the reasoning behind an AI’s actions [43]. explainability of the model. From a regulatory perspective, XAI
can help enhance compliance with legal issues, in particular
2.3. Stakeholders of the XAI laws and regulations related to fairness, privacy, and security
Broadly speaking, all users of XAI systems, whether direct in the AI system. Finally, XAI can facilitate the debugging pro-
or indirect, stand to benefit from AI technology. Some of the cess critical to researchers and system developers, leading to
most common beneficiaries of the XAI system are identified in the identification and correction of errors and biases.
Figure 3.
Society: XAI plays a significant role in fostering social col- 2.4. Interpretability vs. Explainability
laboration and human-machine interactions [2] by increasing
the trustworthiness, reliability, and responsibility of AI systems, The concepts of interpretability and explainability are dif-
helping reduce the negative impacts such as unethical use of AI ficult to define rigorously. There is ongoing debate and re-
systems, discrimination, and biases. Hence, XAI promotes trust search about the best ways to operationalize and measure these
and the usage of models in society. two concepts. Even terminology can vary or be used in con-
Governments and associated organizations: Governments tradictory ways, though the concepts of building comprehen-
and governmental organizations have become AI system users. sion about what influences a model, how influence occurs, and
Therefore, the government will be greatly benefited by XAI where the model performs well and fails, are consistent within
systems. XAI can help develop the government’s public pol- the many definitions of these terms. Most studies at least agree
icy decisions, such as public safety and resource allocation, by that explainability and interpretability are related but distinct
making them transparent, accountable, and explainable to soci- concepts. Previous work suggests that interpretability is not a
ety. monolithic concept but a combination of several distinct ideas
4
that must be disentangled before any progress can be made to- plainability criteria, such as scope, stage, result, and function
ward a rigorous definition [44]. Explainability is seen as a sub- [48, 10, 13], are what we believe to be the most important be-
set of interpretability, which is the overarching concept that en- cause they provide a systematic and comprehensive framework
compasses the idea of opening the black-box. for understanding and evaluating different XAI techniques. We
The very first definition of interpretability in ML systems is have developed this taxonomy through rigorous study and anal-
“the ability to explain or to present in understandable terms to a ysis of existing taxonomies, along with an extensive review
human” [39], while explainability is “the collection of features of research literature pertinent to explainable artificial intelli-
of the interpretable domain, that have contributed for a given gence. We categorize our reviewed papers by the scope of
example to produce a decision” [45]. As indicated by Fuhrman explainability and training level or stage. The explainability
et al. [46], interpretability refers to “to understanding algorithm technique can be either global or local, and model-agnostic or
output for end-user implementation” and explainability refers model-specific, which can explain the model’s output or func-
to “techniques applied by a developer to explain and improve tion [9].
the AI system”. Gurmessa and Jimma [47] defined these con-
cepts as “the extent to which human observers can understand 3.1. Local and Global Explanation Techniques
the internal decision-making processes of the model” and “the Local and global approaches refer to the scope of explana-
provision of explanations for the actions or procedures under- tions provided by an explainability technique. Local explana-
taken by the model”, respectively. tions are focused on explaining predictions or decisions made
According to Das et al., [48], interpretability and explainabil- by a specific instance or input to a model [2, 10]. This approach
ity are the ability “to understand the decision-making process of is particularly useful for examining the behavior of the model
an AI model” and “to explain the decision-making process of an in relation to the local, individual predictions or decisions.
AI model in terms understandable to the end user”, respectively. Global techniques provide either an overview or a complete
Another study defines these two concepts as “the ability to de- description of the model, but such techniques usually require
termine cause and effect from a machine learning model” and knowledge of input data, algorithm, and trained model [44].
“the knowledge to understand representations and how impor- The global explanation technique needs to understand the
tant they are to the model’s performance”, respectively [8]. The whole structures, features, weights, and other parameters. In
AWS reports that interpretability is “to understand exactly why practice, global techniques are challenging to implement since
and how the model is generating predictions”, whereas explain- complex models with multiple dimensions, millions of param-
ability is “is how to take an ML model and explain the behavior eters, and weights are challenging to understand.
in human terms”.
The goal of explainability and interpretability is to make it 3.2. Ante-hoc and Post-hoc Explanation Techniques
clear to a user how the model arrives at its output, so that the Ante-hoc and post-hoc explanation techniques are two dif-
user can understand and trust the model’s decisions. However, ferent ways to explain the inner workings of AI systems. The
there are no satisfactory functionally-grounded criteria or uni- critical difference between them is the stage in which they
versally accepted benchmarks [49]. The most common defi- are implemented [7]. The ante-hoc XAI techniques are em-
nitions of interpretable ML models are those that are easy to ployed during the training and development stages of an AI sys-
understand and describe, while explainable ML models can tem to make the model more transparent and understandable,
provide an explanation for their predictions or decisions [50]. whereas the post-hoc explanation techniques are employed af-
A model that is highly interpretable is one that is simple and ter the AI models have been trained and deployed to explain
transparent, and whose behavior can be easily understood and the model’s prediction or decision-making process to the model
explained by humans. Conversely, a model that is not inter- users. Post-hoc explainability focuses on models which are
pretable is one that is complex and opaque, and whose behavior not readily explainable by ante-hoc techniques. Ante-hoc and
is difficult for humans to understand or explain [51]. post-hoc explanation techniques can be employed in tandem to
In general, interpretability is concerned with how a model gain a more comprehensive comprehension of AI systems, as
works, while explainability is concerned with why a model they are mutually reinforcing [10]. Some examples of ante-
makes a particular prediction or decision. Interpretability is hoc XAI techniques are decision trees, general additive models,
crucial because it allows people to understand how a model is and Bayesian models. Some examples of post-hoc XAI tech-
making predictions, which can help build trust in the model and niques are Local Interpretable Model-Agnostic Explanations
its results. Explainability is important because it allows peo- (LIME) [26] and Shapley Additive Explanations (SHAP) [52].
ple to understand the reasoning behind a model’s predictions, Arrieta et al. [10] classify the post-hoc explanation tech-
which can help identify any biases or errors in the model. Ta- niques into two categories:
ble 1 presents some representative XAI techniques and where
they lie on the spectrum. • Model-specific approaches provide explanations for the
predictions or decisions made by a specific AI model,
3. Categories of Explainability Techniques based on the model’s internal working structure and de-
sign. These techniques may not apply to other models with
In this section, we introduce a taxonomy for XAI techniques varying architectures, since they are designed for specific
and use specific criteria for general categorization. These ex- models [4]. However, a model-specific technique provides
5
Table 1: Examples of representative XAI techniques and where they lie on the spectrum.
Spectrum XAI techniques How does it work How to understand and explain
Because the model is based on simple linear
Use a linear relationship between equations, it is easy for a human to under-
Linear regression the input features and the target stand and explain the relationship between
variables to make predictions. the input features and the target variable.
This is built into the model.
Closer to
Because the rules are explicit and transpar-
Interpretability
ent, these models are both interpretable and
Rule-based mod- Use a set of explicit rules to make
explainable, as it is easy for a human to un-
els predictions.
derstand and explain the rules that the model
is using.
These decision rules are based on the val-
ues of the input features. Because it is easy
Use a series of simple decision
In the middle Decision trees to trace the model’s predictions back to the
rules to make predictions.
input data and the decision rules, it’s both
interpretable and explainable.
Because it provides a clear understanding of
which features are most important, it is easy
Use an algorithm to identify
Feature impor- to trace the model’s predictions back to the
the most important features in a
tance analysis input features. This is usually post-hoc and
model’s prediction or decision.
not part of the model architecture, so more
Closer to explainable then interpretable.
Explainability Use an approximate model to pro-
vide explanations for the predic-
Local inter- tions of a complex ML model. It Because it provides explanations for the pre-
pretable model- works by approximating the com- dictions of a complex model in a way that
agnostic explana- plex model with a simple, inter- is understandable to a human and is model-
tions pretable model, and providing ex- agnostic, LIME is explainable
planations based on the simple
model.
good insights into how the model works and makes a de- 3.3. Perturbation-based and Gradient-based XAI
cision. For example, neural networks, random forests, and
support vector machine models require model-specific ex- Perturbation-based and gradient-based methods are two of
planation methods. The model-specific technique in neu- the most common algorithmic design methodologies for devel-
ral networks provides more comprehensive insights into oping XAI techniques. Perturbation-based methods operate by
the network structure, including how weights are allocated modifying the input data, while gradient-based methods calcu-
to individual neurons and which neurons are explicitly ac- late the gradients of the model’s prediction with respect to its
tivated for a given instance. input data. Both techniques compute the importance of each
input feature through different approaches and can be used for
local and global explanations. Additionally, both techniques are
generally model-agnostic.
Perturbation-based XAI methods use perturbations to deter-
• Model-agnostic approaches are applied to all AI models mine the importance of each feature in the model’s prediction
and provide explanations of the models without depending process. These methods involve modifying the input data, such
on an understanding of the model’s internal working struc- as removing certain input examples, masking specific input fea-
ture or design. This approach is used to explain complex tures, producing noise over the input features, observing how
models that are difficult to explain using ante-hoc expla- the model’s output changes as a result, generating perturbations,
nation techniques. Model-agnostic approaches are model and analyzing the extent to which the output is affected by the
flexible, explanation flexible, and representation flexible, change of the input data. By comparing the original output with
making them useful for a wide range of models. However, the output from the modified input, it is possible to infer which
if the model is very complex, it may be hard to understand features of the input data are most important for the model’s
its behavior globally due to its flexibility and interpretabil- prediction [26]. The importance of each feature value provides
ity [51]. valuable insights into how the model made that prediction [52].
6
Hence, the explanation of the model is generated iteratively us- Y represent the input and output spaces, respectively [48, 57].
ing perturbation-based XAI techniques such as LIME, SHAP, Specifically, x ∈ X denotes an input instance, and y ∈ Y denotes
and counterfactual. the corresponding output or prediction. Let X ′ be the set of
Gradient-based XAI methods obtain the gradients of the perturbed and generated sample instances around the instance
model’s prediction with respect to its input features. These gra- x and x′ ∈ X ′ , an instance from this set. Another function g
dients reflect the sensitivity of the model’s output to changes in maps instances of X ′ to a set of representations denoted as Y ′
each input feature [53]. A higher gradient value for an input which are designed to be easily understandable or explainable:
feature implies greater importance for the model’s prediction. g : X ′ → Y ′ , where y′ ∈ Y ′ is an output from the set of possible
Gradient-based XAI methods are valuable for their ability to outputs in Y ′ . The use of interpretable instances in Y ′ allows
handle high-dimensional input space and scalability for large for clearer insights into the model’s prediction processes.
datasets and models. These methods can help gain a deeper un- LIME [26] provides an explanation for each input instance x,
derstanding of the model and detect errors and biases that de- where f (x) = y′ ≈ y is the prediction of the black-box model.
crease its reliability and accuracy, particularly in safety-critical The LIME model is g ∈ G where g is explanation model that be-
applications such as health care and self-driving cars [2]. Class longs to a set of interpretable models G. Let’s say every g < G
activation maps, integrated gradients, and saliency maps are is “good enough” to be interpretable. To prove this hypothesis,
among the most commonly used gradient-based XAI methods. LIME uses three important arguments: a measure of complexity
Figure 4 presents a summary of explainability taxonomy dis- Ω(g) of the explanation, ensuring it remains simple enough for
cussed in this section. human understanding; a proximity measure (π x (z)) that quanti-
Figure 5 illustrates a chronological overview of the state-of- fies the closeness between the original instance x and its pertur-
the-art XAI techniques focused on in this survey. Perturbation- bations; and a fidelity measure ζ( f, g, π x ) which assesses how
based methods like LIME, SHAP, Counterfactual explanations, well g approximates f ’s predictions, aiming for this value to be
and gradient approaches, including LRP, CAM, and Integrated minimal to maximize the faithfulness of the explanation. The
Gradients, have been selected for detailed discussion in this following formula achieves the explanation produced by LIME:
context. They serve as foundational frameworks upon which
other techniques are built, highlighting their significance in ξ(x) = argmin ζ( f, g, π x ) + Ω(g). (1)
the field. “Transformer Interpretability Beyond Attention Vi-
sualization” [54] and “XAI for Transformers: Better Explana- Figure 6 illustrates the LIME model for explaining a predic-
tions through Conservative Propagation” [55] are foundational tion of a black-box model based on an instance. LIME can be
works for discussing transformer explainability, providing key considered a model-agnostic technique for generating explana-
insights and practices that serve as a baseline in this survey. tions that can be used across different ML models. LIME is
insightful in understanding the specific decisions of a model by
providing local individual instance explanations, and in detect-
4. Detailed Discussions on XAI Techniques
ing and fixing biases by identifying the most influential feature
XAI techniques differ in their underlying mathematical prin- for a particular decision made by a model [10].
ciples and assumptions, as well as in their applicability and lim-
itations. We classify the widely used XAI techniques based on 4.1.2. SHAP
perturbation, gradient, and the use of the Transformer [56] ar- SHAP [52] is a model-agnostic method, applicable to any
chitecture. The Transformer has become a dominant architec- ML model, ranging from simple linear models to complex
ture in deep learning, whether it is in natural language process- DNN. This XAI technique employs contribution values as
ing, computer vision, time series data, or anything else. As a means for explicating the extent to which features contribute
result, we include a separate section on Transformer explain- to a model’s output. The contribution value is then leveraged to
ability. explain the output of a given instance x. SHAP computes the
average contribution of each feature through the subset of fea-
4.1. Perturbation-based Techniques tures by simulating the model’s behavior for all combinations
Perturbation-based XAI methods are used to provide local of feature values. The difference in output is computed when
and global explanations of the black-box models by making a feature is excluded or included in that output process. The
small and controlled changes to the input data to gain insights subsequent contribution values give a measure of the feature
into how the model made that decision. This section discusses relevance, which is significant to the model’s output [52, 58].
the most predominant perturbation-based XAI techniques, such Assume f is the original or black-box model, g is the expla-
as LIME, SHAP, and Counterfactual Explanations (CFE), in- nation model, M is the number of simplified input features, x is
cluding their mathematical formulation and underlying assump- a single input, and x′ is a simplified input such that x = h x (x′ ).
tions. Additive feature attribution methods, such as SHAP, have a lin-
ear function model explanation with binary variables.
4.1.1. LIME
M
A standard definition of a black-box model f , where the in-
X
g(x′ ) = ϕ0 + ϕi z′i , (2)
ternal workings are unknown, is f : X → Y, where X and i=1
7
Figure 4: Explainability taxonomy.
where ϕ0 is the default explanation when no binary features, features of influences that may result in less accurate explana-
z′i ∈ {0, 1} M and ϕi ∈ R. The SHAP model explanation must tions. SHAP’s explanation is model output dependent. If the
satisfy three properties to provide high accuracy [52]: (i) “lo- model is biased, SHAP’s explanation reflects the bias of the
cal accuracy” requires that the explanation model g(x′ ) matches model behavior.
the original model f (x), (ii) “missingness” which states, if the
simplified inputs denote feature presence, then its attribute in- 4.1.3. CFE
fluence would be 0. More simply, if a feature is absent, it should CFE [59] is used to explain the predictions made by the
have no impact on the model output, and (iii) “consistency” re- ML model using generated hypothetical scenarios to under-
quires that if the contribution of a simplified input increases or stand how the model’s output is affected by changes in input
stays the same (regardless of the other inputs), then the input’s data. The standard classification models are trained to find the
attribution should not decrease. the optimal set of weights w:
SHAP leverages contribution values to explain the impor- argminω ζ( fω (xi ), yi ) + ρ(w), (3)
tance of each feature to a model’s output. The explanation
is based on the Shapley values, which represent the average where f is a model, ρ is the regularizer to prevent overfitting
contribution of each feature over all possible subsets of fea- in the training process, yi is the label for data point xi , and w
tures. However, in complex models, SHAP approximates the represents the model parameters to be optimized. The argument
8
and visualizing the most significant regions of that image for
the model’s prediction.
Suppose an image I0 , a specific class c, and the CNN classi-
fication model with the class score function S c (I) (that is used
to determine the score of the image) are analyzed. The pixels
of I0 are ranked based on their impact on this score S c (I0 ). The
linear score model for the class c is obtained as follows:
9
LRP is subject to the conservation property, which means a
neuron that receives the relevance score must be redistributed
to the lower layer in an equal amount. Assume j and k are two
consecutive layers, where layer k is closer to the output layer.
The neurons in layer k have computed their relevance scores,
denoted as (Rk )k , propagating relevance scores to layer j. Then,
propagated relevance score to neuron R j is computed using the
following formula [62]:
X z jk
Rj = P Rk , (8)
k j z jk
P Figure 8: The predicted class score is mapped back to the previous convolu-
where z jk is the contribution of neuron j to Rk and j z jk is used tional layer to generate the class activation maps (input image from CIFAR 10
to enforce the conservation property. In this context, a pertinent dataset).
question arises as to how do we determine z jk , which represents
the contribution of a neuron j to a neuron k in the network, is
ascertained? LRP uses three significant rules to address this where wck is the weight relating to the class c for unit k. We can
question [61]. compute the class activation map Mc for class c of each special
element as k wck fk (x, y). Therefore, S c for a given class c can
P
• The basic rule redistributes the relevance score to the input be rewritten: X
features in proportion to their positive contribution to the Sc = Mc (x, y). (10)
output. x,y
• The Epsilon rule uses an ϵ to diminish relevance scores In the previous formula, Mc (x, y) shows the significance of the
when contributions to neuron k are weak and contradic- activation at spatial location (x, y), and it is critical to determine
tory. the class of the image to class c.
CAM is a valuable explanation technique for understanding
• The Gamma rule uses a large value of γ to reduces negative
the decision-making process of deep learning models applied
contribution or to lower noise and enhance stability.
to image data. However, it is important to note that CAM
Overall, LRP is faithful, meaning that it does not introduce is model-specific, as it requires access to the architecture and
any bias into the explanation [61]. This is important for en- weights of the CNN model being used.
suring that the explanations are accurate and trustworthy. LRP
is complex to implement and interpret, which requires a good 4.2.4. Integrated Gradients
understanding of the neural networks’ architecture. It is compu-
Integrated Gradients [64] provides insights into the input-
tationally expensive for large and complex models to compute
output behavior of DNNs which is critical in improving and
the backpropagating relevance scores throughout all layers of
building transparent ML models. Sundararajan et al. [64]
the networks. LRP is only applicable to backpropagation-based
strongly advocated that all attribution methods must adhere to
models like neural networks. It requires access to the internal
two axioms. The Sensitivity axiom is defined such that “an
structure and parameters of the model, which is sometimes im-
attribution method satisfies Sensitivity if for every input and
possible if a model is proprietary. LRP is a framework for other
baseline that differ in one feature but have different predictions,
XAI techniques. However, there is a lack of standardization,
then the differing feature should be given a non-zero attribu-
which leads to inconsistent explanations through different im-
tion”. The violation of the Sensitivity axiom may expose the
plementations.
model to gradients being computed using non-relevant features.
Thus, it is critical to control this sensitivity violation to assure
4.2.3. CAM
the attribution method is in compliance. The Implementation
CAM [63] is an explanation technique typically used for
Invariance axiom is defined as “two networks are functionally
CNN and deep learning models applied to image data. For ex-
equivalent if their outputs are equal for all inputs, despite hav-
ample, CAM can explain the predictions of a CNN model by
ing very different implementations. Attribution methods should
indicating which regions of the input image the model is focus-
satisfy Implementation Invariance, i.e., the attributions are al-
ing on, or it can simply provide a heatmap for the output of the
ways identical for two functionally equivalent networks”. Sup-
convolutional layer, as shown in Figure 8.
pose two neural networks perform the same task and gener-
Let fk (x, y) denote the activation of unit k in the last convo-
ate identical predictions for all inputs. Then, any attribution
lutional layer at location (x, y) in the given image, the global
P method used on them should provide the same attribution val-
average pooling is computed by x,y fk (x, y). Then, the input to
ues for each input to both networks, regardless of the differ-
the softmax function, called S c , for a given class c is obtained
ences in their implementation details. This ensures that the at-
using the following formula:
tributions are not affected by small changes in implementation
X X XX
Sc = wck fk (x, y) = wck fk (x, y), (9) details or architecture, thus controlling inconsistent or unreli-
k x,y x,y k able outputs. In this way, Implementation Invariance is critical
10
to ensuring consistency and trustworthiness of attribution meth- The relevance scores signify how much each feature at each
ods. layer contributes to the final prediction and decision.
Consider a function F: Rn → [0, 1], which represents a DNN. The LRP framework is a baseline for developing various rel-
We take x ∈ Rn to be the input instance and x′ ∈ Rn be the base- evance propagation rules. Let’s start the discussion by embed-
line input. In order to produce a counterfactual explanation, it ding Gradient×Input into the LRP framework to explain Trans-
is important to define the baseline as the absence of a feature in formers [55]. Assume (xi )i and (y j ) j represent the input and
the given input. However, it may be challenging to identify the output vectors of the neurons, respectively, and f is the out-
baseline in a very complex model. For instance, the baseline for put of the model. Gradient×Input attributions on these vector
image data could be black images, while for NLP data, it could representations can be computed as:
be a zero embedding vector, which is a vector of all zeroes used
as a default value for words not found in the vocabulary. To R(xi ) = xi · (∂ f /∂xi ) and R(y j ) = y j · (∂ f /∂y j ). (12)
obtain Integrated Gradients, we consider the straight-line path
(in Rn ) from x′ (the baseline) to the input instance x and com- The gradients at different layers are computed using the chain
pute the gradients at all points along this path. The collection rule. This principle states that the gradient of the function f
of these gradients provides the Integrated Gradients. In other with respect to an input neuron xi can be expressed as the sum
words, Integrated Gradients can be defined as the path integral of the products of two terms: the gradients of all connected
of the gradients along the straight-line path from x′ to the input neurons y j with respect to xi and the gradients of the function f
instance x. with respect to those neurons y j . This is mathematically repre-
The gradient of F(x) along the ith dimension is given by ∂F(x)
∂xi , sented as follows:
th
leading to the Integrated Gradient (IG) along the i dimension
for an input x and baseline x′ to be described as: ∂f X ∂ f ∂y j
= . (13)
∂xi j
∂y j ∂xi
1
∂F(x′ + α × (x − x′ ))
Z
IGi (x) = (xi − xi′ ) × dα. (11)
α=0 ∂xi We can convert the gradient propagation rule into an equivalent
relevance propagation by inserting equation (12) into equation
The Integral Gradient XAI method satisfies several important (13):
properties, such as sensitivity, completeness, and implementa- X ∂y j xi
tion details. It can be applied to any differential model, making R(xi ) = R(y j ), (14)
it a powerful and model-specific tool for explanation in a DNN j
∂xi y j
[64].
with the convention 0/0=0. We can prove that
=
P P
i R(xi ) j R(y j ) easily, and if this condition holds true,
4.3. XAI for Transformers conservation also holds true. However, Transformers break this
Transformers [56] have emerged as the dominant architec- conservation rule. The following subsections discuss methods
ture in Natural Language Processing (NLP), computer vision, to improve propagation rule [55].
multi-modal reasoning tasks, and a diverse and wide array of
applications such as visual question answering, image-text re- 4.3.1. Propagation in Attention Heads
trieval, cross-modal generation, and visual commonsense rea-
Transformers work based on Query (Q), Key (K), and
soning [65]. Predictions and decisions in Transformer-based
Value (V) matrices, and consider the attention head, which uses
architectures heavily rely on various intricate attention mech-
these core components [56]. The attention heads have the fol-
anisms, including self-attention, multi-head attention, and co-
lowing structure:
attention. Explaining these mechanisms presents a significant
challenge due to their complexity. In this section, we explore
1
the interpretability aspects of widely adopted Transformer- Y = so f tmax( √ (X ′ WQ )(XWK )τ )X, (15)
based architectures. dk
Gradient×Input [58], [66] and LRP [61] XAI techniques
where X = (xi )i and X ′ = (x′j ) j are input sequences, Y = (y j ) j
have been extended to explain Transformers [67], [68]. Atten-
is the sequence of the output, W{Q,K,V} are learned projection
tion rollouts [69] and generic attention are new techniques to
matrices, and dk is the dimensionality of the Key-vector. The
aggregate attention information [65] to explain the Transform-
previous equation is rewritten as follows:
ers. LRP [61], Gradient×Input [53], Integrated Gradients [64],
and SHAP [52] are designed based on the conservation ax- X
iom for the attribution of each feature. The conservation ax- yj = xi pi j , (16)
i
iom states that each input feature contributes a portion of the
predicted score at the output. LRP is employed to assign the exp(qi j )
model’s output back to the input features by propagating rele- where y j is the output, pi j = P
exp(qi′ j )
i′
is the softmax computa-
vance scores backward through the layers of a neural network, tion, and qi j = √1 xτ WK W τ x′ is the matching function between
dk i Q j
measuring their corresponding contributions to the final output. the two input sequences.
11
4.3.2. Propagation in LayerNorm counterfactual explanations, policy distillation, attention mech-
LayerNorm or Layer normalization is the crucial component anisms, Human-in-the-loop, query system, and natural lan-
in Transformers used to stabilize and improve the training of guage explanations [80]. Explainability in reinforcement learn-
models. LayerNorm is involved in the centering and standard- ing is crucial, particularly for safety-critical domains, due to
ization of key operations, defined as follows: the need for trust, safety assurance, regulatory compliance, eth-
ical decision-making, model debugging, collaborative human-
xi − E[x] AI interaction, accountability, and AI model adoption and ac-
yi = √ , (17)
ϵ + Var[x] ceptance [73, 81, 82].
where E[·] and Var[·] represent the mean and variance overall 4.5. Summary
activation of the corresponding channel. The relevance prop-
agation associated with Gradient×Input is represented by the Applying XAI techniques can enhance transparency and
conservation equation: trust in AI models by explaining their decision-making and
prediction processes. These techniques can be classified
X Var[x] X into categories such as local or global, post-hoc or ante-hoc,
R(xi ) = (1 − ) R(yi ), (18)
i
ϵ + Var[x] i model-specific or model-agnostic, and perturbation or gradient
methodology. We have added a special subsection for reinforce-
where R(y j ) = xτj (∂ f /∂y j ). The implied propagation rules in ment learning and Transformers due to their popularity and pro-
attention heads and LayerNorm in equation (14) are replaced found impact on applications of deep learning in a wide variety
by ad-hoc propagation rules to ensure conservation. Hence, we of areas. Table 2 summarizes the reviewed XAI techniques dis-
make a locally linear expansion of attention head by observing cussed.
the gating terms pi j as constant, and these terms are considered
as the weights of a linear layer which is locally mapping the
5. XAI Techniques in Application Areas
input sequence x into the output sequence y. As a result, we
can use the canonical LRP rule for linear layers as follows:
The area of XAI has been gaining attention in recent years
X xi pi j due to the growing need for transparency and trust in ML mod-
R(xi ) = P R(y j ). (19) els [26, 52]. XAI techniques are being used to explain the pre-
j i′ xi′ pi′ j
dictions of ML models [39, 2, 8]. These techniques can help
identify errors and biases that decrease the reliability and ac-
Recent studies such as Attention rollouts [69], generic atten- curacy of the models. This section explores the different XAI
tion [65], and Better Explanations through Conservative Propa- techniques used in natural language processing, computer vi-
gation [55] have provided empirical evidence that it is possible sion, and time series analysis, and how they contribute to im-
to improve the explainability of Transformers. proving the trust, transparency, and accuracy of ML models in
different application areas.
4.4. Explainability in Reinforcement Learning
5.1. Explainability in Natural Language Processing
Reinforcement Learning (RL) involves applications across
various domains, including safety-critical areas like au- Natural language processing employs ML, as it can help
tonomous vehicles, healthcare, and energy systems [70, 71]. efficiently handle, process, and analyze vast amounts of text
In the domain of autonomous vehicles, RL is employed to re- data generated daily through areas such as human-to-human
fine adaptive cruise control and lane-keeping features by learn- communication, chatbots, emails, and context generation soft-
ing optimal decision-making strategies from simulations of di- ware, to name a few [83]. One barrier to implementation is
verse traffic scenarios [72]. Explainability in reinforcement that such data are usually not inherently clean, and prepro-
learning concerns the ability to understand and explain the ra- cessing and training are essential tasks for achieving accurate
tionale behind the decisions made and actions taken by re- results with language models [84]. In NLP, language mod-
inforcement learning models within their specified environ- els can be classified into three categories: transparent archi-
ments [73, 74, 75]. Post-hoc explanations, such as SHAP and tectures, neural network (non-Transformer) architectures, and
LIME, can help us understand and explain which features are transformer architectures. Transparent models are straight-
most important for the decision-making process of an RL agent forward and easy to understand due to their clear processing
[76, 77]. Example-based explanation methods, like trajectory paths and direct interpretability. Models based on neural net-
analysis, help us to get insights into the decision-making pro- work (non-Transformer) architectures are often termed “black
cess of the RL model by examining specific trajectories, such boxes” due to their multi-layered structures and non-linear pro-
as sequences of states, actions, and rewards [78]. Visualiza- cessing. Transformer architectures utilize self-attention mech-
tion techniques enable us to understand and interpret the RL anisms to process sequences of data. The increased complexity
models by visually representing the model’s decision-making and larger number of parameters often make transformer-based
processing [79]. Several explainability methods exist to inter- models less interpretable, requiring advanced techniques to ex-
pret reinforcement learning models, including saliency maps, plain their decision-making processes.
12
Table 2: The XAI techniques discussed, the methods used, their advantages and disadvantages.
13
The success of XAI techniques used in NLP applications is heavily dependent on the quality of preprocessing and the type of text data used [2, 85]. This matters because XAI is critical to developing reliable and transparent NLP models that can be deployed in real-world applications, as it allows us to understand how a model arrived at a particular decision. This section reviews some of the most common XAI techniques for NLP. Figure 9 presents a taxonomy of NLP explainability.

Figure 9: Taxonomy of explainability in natural language processing.

5.1.1. Explaining Neural Networks and Fine-Tuned Transformer Models: Insights and Techniques

Transparent models are easy to interpret because their internal mechanisms and decision-making processes are designed to be inherently understandable [10]. Perturbation-based and gradient-based techniques are the most commonly employed approaches for explaining neural network-based and fine-tuned Transformer-based models. In this subsection, we discuss some of the most common XAI techniques for neural network-based and fine-tuned Transformer-based models used in NLP.

LIME: LIME, discussed in Subsection 4.1.1, selects a feature, such as a word, from the original input text and generates many perturbations around that feature by randomly removing or replacing other features (i.e., other words). LIME then trains a simpler, explainable model on the perturbed data to generate feature importance scores for each word in the original input text [26]. These scores indicate the contribution of each word to the black-box model's prediction. LIME identifies and highlights the important words to indicate their impact on the model's prediction, as shown in Figure 10.

Figure 10: LIME feature importance scores visualization.
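As a concrete illustration of the perturbation-and-surrogate procedure just described, the minimal sketch below applies the open-source `lime` package to a toy sentiment classifier; the tiny dataset, class names, and parameter values are illustrative assumptions rather than part of the surveyed methods.

```python
# A minimal sketch of LIME for text, assuming the `lime` and `scikit-learn`
# packages are installed; the toy data and model are illustrative only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the movie was wonderful", "a dull and boring film",
         "great acting and a moving story", "terrible plot, awful pacing"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Black-box text classifier: TF-IDF features + logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# LIME perturbs the input by removing words, fits a local linear surrogate,
# and returns per-word importance scores.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "the story was wonderful but the pacing was awful",
    model.predict_proba,   # callable returning class probabilities
    num_features=5,        # number of words to report
    num_samples=500,       # number of perturbed samples
)
print(explanation.as_list())  # [(word, weight), ...]
```

The returned (word, weight) pairs are what a visualization such as Figure 10 highlights over the original text.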
SHAP: SHAP [52] is a widely-used XAI technique in NLP tasks such as text classification, sentiment analysis, topic modeling, named entity recognition, and language generation [51, 86]. SHAP computes the feature importance scores by generating a set of perturbations that remove one or more words from the input text data. For each perturbation, SHAP computes the difference between the expected model output when the word is included or not included, which is known as the Shapley value. This approach then computes the importance of each word in the original input text by combining and averaging the Shapley values of all the perturbations. Finally, SHAP visualizes the feature importance scores to indicate which words are more useful in the model prediction process.
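To make the word-removal procedure concrete, the following from-scratch sketch estimates per-word Shapley values by Monte-Carlo sampling of word orderings; it mirrors the idea described above rather than the official `shap` library implementation, and the `predict` callable is an assumed interface.

```python
# A from-scratch sketch of word-removal Shapley estimation; `predict` is any
# callable mapping a sentence string to a scalar score (e.g., one class probability).
import random
from typing import Callable, List, Tuple

def shapley_word_scores(sentence: str, predict: Callable[[str], float],
                        num_samples: int = 200, seed: int = 0) -> List[Tuple[str, float]]:
    """Monte-Carlo estimate of per-word Shapley values.

    For random orderings of the words, measure the change in the model's score
    when each word is added to the words already included, and average these
    marginal contributions. `predict` must also handle the empty sentence,
    which serves as the baseline."""
    rng = random.Random(seed)
    words = sentence.split()
    totals = [0.0] * len(words)

    def score(included: List[bool]) -> float:
        return predict(" ".join(w for w, keep in zip(words, included) if keep))

    for _ in range(num_samples):
        order = list(range(len(words)))
        rng.shuffle(order)
        included = [False] * len(words)
        prev = score(included)
        for idx in order:
            included[idx] = True
            curr = score(included)
            totals[idx] += curr - prev
            prev = curr
    return list(zip(words, (t / num_samples for t in totals)))

# Example with a toy scoring function that rewards the word "wonderful".
toy_predict = lambda text: 0.9 if "wonderful" in text else 0.1
print(shapley_word_scores("the movie was wonderful", toy_predict, num_samples=50))
```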
LRP: To apply LRP to NLP, one must encode the preprocessed input text as a sequence of word representations, such as word embeddings, and feed them to a neural network [87]. The network processes the embeddings using multiple layers and produces the model's prediction. LRP then computes the relevance scores by propagating the model's output back through the network layers. The relevance score for each word is normalized using the sum of all relevance scores and multiplied by the weight of that word in the original input text; this score reflects the word's contribution to the final prediction.
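The following minimal sketch, written under the assumption of a small dense ReLU network and the epsilon propagation rule, illustrates how relevance can be redistributed layer by layer from the output back to the input features; it is not the full LRP machinery used for large NLP models.

```python
# A minimal sketch of epsilon-rule LRP for a small dense ReLU network operating
# on an (already pooled) sentence embedding; weights and sizes are illustrative.
import numpy as np

def lrp_epsilon(weights, biases, activations, relevance_out, eps=1e-6):
    """Propagate relevance from the output back to the input.

    weights[l], biases[l] define layer l; activations[l] is that layer's input.
    relevance_out is the relevance assigned to the network output
    (typically the score of the explained class)."""
    relevance = relevance_out
    for W, b, a in zip(reversed(weights), reversed(biases), reversed(activations)):
        z = a @ W + b                              # pre-activations of this layer
        z = z + eps * np.where(z >= 0, 1.0, -1.0)  # epsilon stabilizer
        s = relevance / z                          # distribute relevance over outputs
        relevance = a * (s @ W.T)                  # redistribute to the layer's inputs
    return relevance                               # one relevance value per input feature

# Toy 2-layer network on a 4-dimensional "sentence embedding".
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
x = rng.normal(size=4)
h = np.maximum(0.0, x @ W1 + b1)                   # forward pass with ReLU
y = h @ W2 + b2

R_input = lrp_epsilon([W1, W2], [b1, b2], [x, h], relevance_out=y)
print(R_input, R_input.sum(), y)                   # relevance approximately conserves y
```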
Integrated Gradients: Integrated Gradients (IG) is used in NLP tasks such as text classification, sentiment analysis, and text summarization [88]. The technique computes the integral of the gradients of the model's prediction with respect to the input text embeddings along the path from a baseline input to the original input. The baseline input is a neutral or zero-embedding version of the input text that carries no relevant features for the prediction task. The difference between the input word embeddings and the baseline embeddings is then multiplied by the integral to obtain attribution scores for each word, which indicate the relevance of each word to the model's prediction. Integrated Gradients outputs a heatmap that highlights the most important words in the original input text based on the attribution scores [66]. This provides a clear visualization of the words that were most significant for the model's prediction, allowing users to understand how the model made that particular decision. IG can be used to identify the most important features in a sentence, understand how the model's predictions change when different input features are changed, and improve the transparency and interpretability of NLP models [89].
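The sketch below approximates Integrated Gradients over word embeddings with a Riemann sum, assuming access to the model's gradient function; the linear toy model is illustrative only.

```python
# A minimal sketch of Integrated Gradients over word embeddings; `grad_f`
# is an assumed callable returning the gradient of the model score w.r.t.
# the embedding matrix.
import numpy as np

def integrated_gradients(embeddings, baseline, grad_f, steps=50):
    """Approximate IG attributions with a Riemann sum along the straight path
    from `baseline` to `embeddings` (both of shape [num_words, dim])."""
    total_grad = np.zeros_like(embeddings)
    for k in range(1, steps + 1):
        point = baseline + (k / steps) * (embeddings - baseline)
        total_grad += grad_f(point)
    avg_grad = total_grad / steps
    # Per-word attribution: (input - baseline) * average gradient, summed over dims.
    return ((embeddings - baseline) * avg_grad).sum(axis=1)

# Toy differentiable "model": score = sum(E @ w), so its gradient w.r.t. E is w.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
grad_f = lambda E: np.tile(w, (E.shape[0], 1))

embeddings = rng.normal(size=(5, 8))      # 5 words, 8-dim embeddings
baseline = np.zeros_like(embeddings)      # zero-embedding baseline, as in the text
print(integrated_gradients(embeddings, baseline, grad_f))  # one score per word
```

For this linear toy model the summed attributions exactly equal the difference between the scores at the input and at the baseline, which is the completeness property IG is designed to satisfy.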
5.1.2. Prompt-Based Explainability for Transformer Models

In this subsection, we discuss some of the most common prompt-based explanation techniques, including Chain of Thought (CoT), In-Context Learning (ICL), and interactive prompts.

Chain of Thought: In the context of a large language model (LLM) such as GPT-3 [90], Chain of Thought prompts are input sequences that instruct the model to generate a coherent output through a series of intermediate reasoning steps [91, 92]. This technique helps enhance task performance and, by providing a clear sequence of reasoning steps, makes the model's thought process more understandable to the audience [93].
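For concreteness, the snippet below assembles a few-shot Chain-of-Thought prompt of the kind described; the worked example and phrasing are illustrative assumptions, not a template prescribed by the cited studies.

```python
# A minimal sketch of a few-shot Chain-of-Thought prompt; the worked example
# and wording are illustrative only.
cot_prompt = """Q: A train travels 60 km in the first hour and 45 km in the second hour.
How far does it travel in total?
A: Let's think step by step.
Step 1: Distance in the first hour is 60 km.
Step 2: Distance in the second hour is 45 km.
Step 3: Total distance is 60 + 45 = 105 km.
The answer is 105 km.

Q: A shop sells 12 apples in the morning and 9 in the afternoon.
How many apples does it sell in total?
A: Let's think step by step.
"""
# The prompt would be sent to an LLM completion endpoint; the intermediate
# "Step ..." lines are what make the model's reasoning process inspectable.
print(cot_prompt)
```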
Gradient-based studies have explored the impact of chain-of-thought prompting on the internal workings of LLMs by investigating the saliency scores of input tokens [94]. The scores are computed by identifying the input tokens (words or phrases), feeding them into the model to compute the output, and then calculating the influence of each token through backpropagation, using the gradients of the model. The score reveals the impact of each input token on the model's decision-making process at every intermediate step. By analyzing the step-by-step intermediate reasoning, users can gain a better understanding of how the model arrived at its decision, making it easier to interpret and trust the model's outputs.
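The sketch below illustrates this gradient-based token-saliency computation in a hedged form: a tiny randomly initialized embedding-plus-linear model stands in for an LLM, and saliency is taken as gradient times input on the token embeddings; it does not reproduce the exact models or scores of the cited work.

```python
# A minimal sketch of gradient-based token saliency (gradient x input on the
# token embeddings); the tiny model is a stand-in for an LLM and is illustrative.
import torch

torch.manual_seed(0)
vocab, dim = 50, 16
embedding = torch.nn.Embedding(vocab, dim)
scorer = torch.nn.Linear(dim, 1)            # stand-in for the model head

token_ids = torch.tensor([[3, 17, 42, 8]])  # one "prompt" of four tokens
embeds = embedding(token_ids)               # [1, 4, dim]
embeds.retain_grad()                        # keep gradients on the embeddings

output = scorer(embeds).sum()               # scalar "prediction" to explain
output.backward()                           # backpropagate to the embeddings

# Saliency per token: gradient x input, summed over the embedding dimension.
saliency = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
print(saliency)                             # one influence score per input token
```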
Perturbation-based studies of Chain of Thought explanations, which introduce errors into few-shot prompts, have provided valuable insights into the internal working mechanisms of large language models [95, 96]. Counterfactual prompts have been suggested as a method of altering critical elements of a prompt, such as patterns and text, to assess their impact on the output [95]. The study demonstrated that the intermediate reasoning steps primarily guide the replication of patterns, text, and structures into factual answers. Measuring the faithfulness of a CoT, particularly within the context of LLMs, involves assessing the accuracy and consistency with which the explanations and reasoning process align with established facts, logical principles, and the predominant task objectives [97]. Several key factors are crucial when evaluating CoT faithfulness, including logical consistency, factuality, relevance, completeness, and transparency [97]. The assessment often requires qualitative evaluations by human judges as well as quantitative metrics that can be calculated automatically. The development of models to measure faithfulness and the design of evaluation methods remain areas of active research.
Explaining In-Context Learning: In-context learning is a powerful mechanism for adapting a model's internal behavior to the immediate context provided in the input prompt. ICL operates by incorporating examples or instructions directly into the prompt, guiding the model toward generating the desired output for a specific task. This approach enables the model to understand and generate responses relevant to the specified task by leveraging the contextual prompts directly from the input. Several studies have focused on explaining how in-context learning influences the behavior of large language models, applying various techniques and experimental setups to elucidate this process. A recent study explores a critical aspect of how ICL operates in large language models, focusing on the balance between leveraging semantic priors from pre-training and learning new input-label mappings from examples provided within prompts [98]. The study aims to understand whether the LLMs' capability to adapt to new tasks through in-context learning is primarily due to the semantic priors acquired during pre-training or whether they can learn new input-label mappings directly from the examples provided in the prompts. The experimental results revealed nuanced capabilities across LLMs of different sizes. Larger LLMs showed a remarkable capability to override their semantic priors and learn new, contradictory input-label mappings, whereas smaller LLMs relied more on their semantic priors and struggled to learn new mappings from the flipped labels. This learning capability demonstrates symbolic reasoning in LLMs that extends beyond semantic priors, showing their ability to adapt to new, context-specific rules in input prompts, even when these rules are completely new or contradict pre-trained knowledge. Another study explores the workings of ICL in large language models by employing contrastive demonstrations and analyzing saliency maps, focusing on sentiment analysis tasks [99]. In this research, contrastive demonstrations involve manipulating the input data through various approaches, such as flipping labels (from positive to negative or vice versa), perturbing input text (altering the words or structure of the input sentences without changing their overall sentiment), and adding complementary explanations (providing context and reasons along with the input text and flipped labels). Saliency maps are then applied to identify the parts of the input text that are most significant to the model's decision-making process. This method facilitates visualization of the impact that contrastive demonstrations have on the model's behavior. The study revealed that the impact of contrastive demonstrations on model behavior varies depending on the size of the model and the nature of the task, indicating that explaining in-context learning's effects requires a nuanced understanding that considers both the model's architectural complexities and the specific characteristics of the task at hand.

ICL allows large language models to adapt their responses to the examples or instructions provided within the input prompts. Explainability efforts in LLMs aim to reveal how these models interpret and leverage in-context prompts, employing various techniques, such as saliency maps, contrastive demonstrations, and feature attribution, to shed light on LLMs' decision-making processes. Understanding the workings of ICL in LLMs is crucial for enhancing model transparency, optimizing prompt design, and ensuring the reliability of model outputs across various applications.
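To illustrate the contrastive-demonstration setup described above, the short sketch below builds an in-context prompt whose demonstration labels can be flipped; the reviews, labels, and prompt format are illustrative assumptions rather than the exact setup of the cited study.

```python
# A minimal sketch of building contrastive (flipped-label) in-context
# demonstrations; the examples and label names are illustrative assumptions.
demonstrations = [
    ("The film was a delight from start to finish.", "positive"),
    ("I regretted buying a ticket for this movie.", "negative"),
    ("A warm, funny, and touching story.", "positive"),
]

FLIP = {"positive": "negative", "negative": "positive"}

def build_prompt(demos, query, flip_labels=False):
    """Concatenate labeled demonstrations and a query into one ICL prompt.
    With flip_labels=True the demonstration labels contradict semantic priors,
    which is how studies probe whether the model learns new input-label mappings."""
    lines = []
    for text, label in demos:
        shown = FLIP[label] if flip_labels else label
        lines.append(f"Review: {text}\nSentiment: {shown}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

regular_prompt = build_prompt(demonstrations, "An instant classic.", flip_labels=False)
flipped_prompt = build_prompt(demonstrations, "An instant classic.", flip_labels=True)
print(flipped_prompt)  # comparing model outputs on the two prompts reveals how
                       # strongly the model relies on priors vs. the demonstrations
```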
Explaining Interactive Prompts: Explaining interactive prompts is a technique that focuses on designing and using prompts to interact effectively with large language models [92, 100]. This method involves designing prompts that dynamically direct the conversation toward specific topics or solicit explanations. Through strategically designed prompts, users can navigate the conversation with a model to achieve more meaningful and insightful interactions, enhancing their understanding of the model's reasoning and decision-making process.

Several studies use various approaches to analyze and enhance the effectiveness of explaining interactive prompts. A study called TalkToModel introduced an interactive dialogue system designed to make machine learning models understandable through natural language conversations or interactive prompts [100]. It evaluates the system's language understanding capabilities, enabling deeper and more meaningful interactions between users and models through interactive prompts. This approach enhances the interpretability of complex machine learning models' behaviors and the model's decision-making process. The study called Prompt Pattern Catalog introduced a catalog designed to enhance prompt engineering by
systematically organizing and discussing various strategies for
constructing prompts [92]. This catalog aims to explain the
decision-making process of models more clearly. It provides
insights and methodologies for eliciting detailed, accurate, and
interpretable responses from models, thus improving the under-
standing of model behavior and decision-making logic.
Table 3: Various XAI techniques and their application areas in healthcare.

| No. | Data Type | Medical data | XAI Techniques | Application Areas | Benefits | Papers |
|-----|-----------|--------------|----------------|-------------------|----------|--------|
| 1 | Image | X-rays, ultrasound, MRI and CT scans, etc. | LRP, LIME, CAM, Saliency Maps, and Integrated Gradients | Radiology, Pathology, Dermatology, Ophthalmology, and Cardiology | Interpretable image analysis | [140, 141, 142, 143] |
| 2 | Text | Clinical text, Electronic Health Records (EHRs), and case studies | LIME, SHAP, Attention Mechanism, and Counterfactual Explanations | Drug Safety and Medical Research | Interpretable text analysis | [144, 145, 146] |
| 3 | Structured (Numeric) | Patient demographics, laboratory test results, pharmacy records, billing and claims | LIME, SHAP, Decision Trees, Rule-based Systems, Counterfactual Explanations, Integrated Gradients, and BERT Explanations | Patient Health Monitoring and Management, Epidemiology, and Clinical Trials and Research | Interpretable structured data analysis | [7, 146, 147] |
| 4 | Time series | ECGs, EEGs, monitoring and wearable device data | TSViz, LIME, CAM, SHAP, Feature Importance, and Temporal Output Explanation | Neurology and EEG Monitoring, Patient Monitoring in Critical Care, and Cardiology and Heart Health Monitoring | Interpretable time series analysis | [148, 149] |
| 5 | Multi-modal | Telemedicine interactions (text, audio, video) | LIME, SHAP, Attention Mechanisms, Multi-modal Fusion and Cross-modal Explanations | Cancer Diagnosis and Treatment, Neurology and Brain Research, and Mental Health and Psychiatry | Interpretable multi-modal analysis | [150, 151] |
| 6 | Genetic | Genetic makeup | Sensitivity Analysis, LIME, SHAP, and Gene Expression Network Visualization | Genomic Medicine, Oncology, and Prenatal and Newborn Screening | Interpretable genetic analysis | [152, 153] |
| 7 | Audio | Heart and lung sounds | Saliency Maps, LRP, SHAP, LIME, and Temporal Output Explanation | Cardiology, Pulmonology, Mental Health, and Sleep Medicine | Interpretable audio analysis | [154, 155] |
… focuses on the sensitivity and importance of medical information, the decision of the AI model, and the explanations of the deployed XAI system. Transparency and accountability [22], fairness and bias mitigation [165], ethical frameworks and guidelines [167], and privacy and confidentiality [168] are some of the key ethical aspects of XAI in healthcare. Medical data are complex and diverse, requiring a variety of XAI techniques to interpret them effectively. Table 3 summarizes various XAI techniques and their application areas in healthcare. XAI faces several challenges in healthcare, including the complexity and diversity of medical data, the complexity of AI models, keeping XAI explanations up to date with the dynamic nature of healthcare, the need for domain-specific knowledge, balancing accuracy and explainability, and adhering to ethical and legal requirements [169, 21].

5.5. Explainability in Autonomous Vehicles

Autonomous vehicles use complex and advanced AI systems that integrate several deep-learning models capable of handling various data types, such as images, videos, audio, and information from LIDAR and radar [170]. These models utilize inputs from diverse sources, including cameras, sensors, and GPS, to deliver safe and accurate navigation. A crucial consideration is determining which data are most critical: what information takes precedence, and why? Understanding the importance of different data types is key to enhancing our models and learning effectively from the gathered information [171]. To address these questions and better understand the decision-making processes of black-box AI models, developers use XAI approaches to evaluate the AI systems. Implementing XAI in autonomous vehicles significantly contributes to human-centered design by promoting trustworthiness, transparency, and accountability. This approach considers various perspectives, including psychological, sociotechnical, and philosophical dimensions, as highlighted in Shahin et al. [172].
Table 4 summarizes various XAI techniques in autonomous vehicles across visual, spatial, temporal, auditory, environmental, vehicle telematics, communication, and textual data. These advancements significantly enhance AI-driven autonomous vehicle systems, resulting in a multitude of comprehensive, sustainable benefits for all stakeholders involved, as follows.

Trust: User trust is pivotal in the context of autonomous vehicles. Ribeiro et al. [26] emphasized this by stating: "If users do not trust a model or its predictions, they will not use it". This underscores the essential need to establish trust in the models we use. XAI can significantly boost user trust by providing clear and comprehensible explanations of system processes [173]. Israelson et al. [174] highlighted the critical need for algorithmic assurance to foster trust in human-autonomous system relationships, as evidenced in their thorough analysis. Other work stresses the importance of transparency in critical driving decisions, noting that such clarity is crucial for establishing trust in the autonomous capabilities of self-driving vehicles [175].

Safety and reliability: These are critical components and challenges in developing autonomous driving technology [176]. Under the US Department of Transportation, the National Highway Traffic Safety Administration (NHTSA) has established specific federal guidelines for automated vehicle policy to enhance traffic safety, as outlined in its 2016 policy document [177]. In a significant development in March 2022, the NHTSA announced a policy shift allowing automobile manufacturers to produce fully autonomous vehicles without traditional manual controls, such as steering wheels and brake pedals, not only in the USA but also in Canada, Germany, the UK, Australia, and Japan [172]. Following this, the International Organization for Standardization (ISO) responded by adopting a series of standards that address the key aspects of automated driving. These standards are designed to ensure high levels of safety, quality assurance, and efficiency, and to promote an environmentally friendly transport system [178]. In addition, Kim et al. [179, 180] described a system's capability to perceive and react to its environment: the system can interpret its operational surroundings and explain its actions, such as "stopping because the red signal is on".

Regulatory compliance and accountability: Public institutions at both national and international levels have responded by developing regulatory frameworks aimed at overseeing these data-driven systems [172]. The foremost goal of these regulations is to protect stakeholders' rights and ensure their authority over personal data. The European Union's General Data Protection Regulation (GDPR) [181] exemplifies this, establishing the "right to an explanation" for users. This principle underscores the importance of accountability, which merges social expectations with legislative requirements in the autonomous driving domain. XAI plays a pivotal role by offering transparent and interpretable insights into the AI decision-making process, ensuring compliance with legal and ethical standards. Additionally, achieving accountability is vital for addressing potential liability and responsibility issues, particularly in post-accident investigations involving autonomous vehicles, as highlighted by Burton et al. [182]. Clear accountability is essential to effectively manage the complexities encountered in these situations.

Human-AI decision-making (collaboration): In recent autonomous vehicle systems, machine learning models are utilized to assist users in making final judgments or decisions, representing a form of collaboration between humans and AI systems [183]. With XAI, these systems can foster appropriate reliance, as decision-makers may be less inclined to follow an AI prediction if an explanation reveals flawed model reasoning [184]. From the users' perspective, XAI helps build trust and confidence through this collaboration; for developers and engineers, XAI helps debug the model, identify potential risks, and enhance the models and the vehicle technology [185].
Table 4: Summary of XAI techniques in autonomous vehicles.

| No. | Input Types | Sources | AI Models | XAI Techniques | Key Benefits | Papers |
|-----|-------------|---------|-----------|----------------|--------------|--------|
| 1 | Visual Data | Camera Images and Video Streams | CNN, ViTs, RNN, or LSTM | LRP, CAM, Saliency Maps, Integrated Gradients, Counterfactual Explanations | Enhancing visual environmental interaction, understanding dynamic driving scenarios, allowing correct object detection, interpreting real-time decision-making, and supporting the adaptive learning process by providing insights into the AI model's predictions | [186, 187] |
| 2 | Spatial Data | LIDAR and Radar | CNN, DNN | Feature Importance, SHAP, CAM, LRP, LIME | Enhancing 3D space and object interactions; improving safety, security, reliability, design, development, and troubleshooting of the car by providing insights into the AI's spatial data processing and model decision-making | [188, 189] |
| 3 | Temporal Data (Time-Series) | Sensors | RNN, LSTM, GRU | TSViz, LIME, CAM, SHAP, LRP, Feature Importance, Temporal Output Explanation | Provides insights into time-series data for reliable decision-making, identifying potential safety issues, enhancing overall vehicle safety, and a deeper interpretation of the vehicle's actions over time for post-incident analysis | [190, 191] |
| 4 | Auditory Data | Microphone | CNN, RNN | LRP, CAM, Saliency Maps, Attention Visualization | Enhancing the vehicle's ability to understand and react to auditory signals; improving safety and security by providing insights into the AI's audio data decision-making process | [172] |
| 5 | Environmental Data (Weather & Geolocation) | GPS and Cloud-based services | GNN, Random Forest, Gradient Boosting | Rule-based Explanations, Decision Trees, SHAP, LIME, Counterfactual Explanations | Enhanced decision-making; improving safety and efficiency; increasing safety, security, and reliability by providing insights into environmental factors and diverse conditions in the AI model's decision-making process | [172, 192] |
| 6 | Vehicle Telematics (engine & internal status) | Engine Control Unit, On-Board Diagnostics, Sensors | DNN, SVM | LRP, LIME, SHAP, Decision Trees, Rule-based Explanations, Counterfactual Explanations | Helping to interpret engine data and vehicle status information, predicting maintenance and potential issues, improving safety, reducing risk, and providing clear vehicle health status through insights gained from the AI model's decision-making process | [193] |
| 7 | Communication Data (Vehicle-to-Everything or V2X) | Personal devices, Cloud, Vehicles, Cellular networks | Reinforcement Learning | LRP, Saliency Maps, Counterfactual Explanations | Provides insights into model decision-making that enhance trust and safety, interactions with external factors, and improved decision-making by interpreting complex V2X communications | [194] |
| 8 | All | All | All | Generative Language Model Explanation | Provides textual explanations to model users that enable them to interpret diversified datasets and the complex AI model decision-making process | [185] |
5.6. Explainability in AI for Chemistry and Material Science

In chemistry and material science, AI models are becoming increasingly sophisticated, enhancing their capability to predict molecular structures, chemical reactions, and material behaviors, as well as to discover new materials [195]. Explainability in chemistry and material science extends beyond simply understanding and analyzing model outputs; it encompasses understanding the rationale behind the model predictions [196]. XAI techniques play a crucial role in obtaining meaningful insights and causal relationships, interpreting complex molecular behaviors, optimizing material properties, and designing innovative materials through the application of AI models [197]. By explaining how and why machine learning models make predictions or decisions, researchers and practitioners in the field can more confidently trust machine learning models for analytical investigations and innovations. This understanding is important for increasing trust, facilitating insights and discoveries, enabling validation and error analysis, and dealing with regulatory and ethical considerations in AI models [198]. The study "CrabNet for Explainable Deep Learning in Materials Science" [199] focuses on improving the compositionally restricted attention-based network to produce meaningful material property-specific element representations. These representations facilitate the exploration of elements' identities, similarities, interactions, and behaviors within diverse chemical environments [199]. Various model-agnostic and model-specific interpretability methods are employed in chemistry and material science to explain black-box models' predictions of molecular structure, chemical reactions, and relationships involving chemical composition [200, 201, 202], as well as the design of new materials [197].

5.7. Explainability in Physics-Aware AI

Physics-aware artificial intelligence focuses on integrating physical laws and principles into machine learning models to enhance the predictability and robustness of AI models [203]. Explainability in physics-aware AI is crucial for understanding and interpreting these models. It also bridges the gap between the black-box nature of AI models and physical understanding, making them more transparent and trustworthy [204]. Several approaches exist to explain physics-aware AI models [205]. Domain-specific explanation methods are designed for specific domains, such as fluid dynamics, quantum mechanics, or material science [206, 207]. Model-agnostic explanations are also used to explain the general behavior of physics-aware AI models to ensure their decisions are understandable in various scenarios regardless of the specific model architecture [208, 209]. In the context of physics-aware AI, explainability offers several key advantages, such as enhancing trust and interpretability, ensuring physically plausible predictions, improving model architecture and debugging, providing domain-specific insights, bridging knowledge gaps, ensuring regulatory compliance, and facilitating human-AI collaboration [204, 210].
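To make the idea of embedding physical laws into a model concrete, the sketch below adds a physics-based residual term to a standard data loss for a network approximating a damped oscillator; the network, equation, constants, and weighting are illustrative assumptions, not a method proposed in the cited works.

```python
# A minimal sketch of a physics-aware (physics-informed) loss in PyTorch,
# penalizing violations of a simple ODE (damped oscillator: u'' + c u' + k u = 0);
# the architecture, constants, and loss weighting are illustrative assumptions.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
c, k = 0.1, 4.0  # assumed physical constants

def physics_residual(t):
    """Residual of u'' + c u' + k u at collocation times t (t requires grad)."""
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]
    return d2u + c * du + k * u

# Toy observed data point and collocation points where the physics must hold.
t_data, u_data = torch.tensor([[0.0]]), torch.tensor([[1.0]])
t_coll = torch.linspace(0.0, 2.0, 32).reshape(-1, 1).requires_grad_(True)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    optimizer.zero_grad()
    data_loss = torch.mean((net(t_data) - u_data) ** 2)    # fit observations
    phys_loss = torch.mean(physics_residual(t_coll) ** 2)  # respect the physics
    loss = data_loss + 1.0 * phys_loss                     # weighted combination
    loss.backward()
    optimizer.step()
```

Because the physics residual is an explicit, inspectable quantity, plotting it over the input domain is itself a simple explanation of where and how strongly the model's predictions respect the assumed physical law.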
6. XAI Evaluation Methods

XAI methods are essential in today's advancing AI world to ensure trust, transparency, and understanding of ethical AI decision-making, particularly in sensitive domains like healthcare, finance, military operations, autonomous systems, and legal issues. However, we need evaluation mechanisms to measure the generated explanations and ensure their quality, usefulness, and trustworthiness. XAI evaluation methods are classified into human-centered and computer-centered categories based on their applications and the methodologies used to judge the effectiveness of XAI techniques [211, 212]. Figure 13 shows a simple taxonomy of XAI evaluation methods.

Figure 13: A suggested classification framework for assessing the efficacy of XAI systems (adapted from [211]).

6.1. Human-Centered Approach

In the human-centered approach, explanations should be represented by a clear and interpretable visualization mechanism. The explanations should build end-users' trust by providing a transparent, consistent, and reliable decision-making process for the model [215]. Hence, the designed evaluation methods assess users' trust and satisfaction through surveys, interviews, questionnaires, behavioral analysis, and other practical tools. User satisfaction is the most essential aspect of these evaluation methods; an XAI system can be evaluated by collecting users' feedback and assessing their emotional responses [216]. The ease of using the XAI system and the usefulness of the generated explanations to end users also provide insight into the value of the explanations and the XAI system. XAI systems can further be assessed by evaluating how effectively the generated explanations support users' decision-making: how much do the explanations inform the decision-making process, reduce errors, and enhance productivity? The human-centered evaluation method also assesses cognitive load, ensuring that the provided explanations do not overburden the user's cognitive processing capacity [215].

6.2.1. Fidelity

Fidelity refers to how closely the provided XAI explanations match the actual decisions made by a model, focusing on accuracy of representation, quantitative measurement, and the handling of complex models [219]. Does the explanation reflect the accurate reasoning process of the model? Does the explanation contain essential information about complex models, such as deep learning models? High fidelity thus reflects that the explanation is an accurate interpretation. Fidelity can be computed at the instance level using the following formula [220]:

S = 1 - \sum_{i=1}^{n} \frac{Y(x_i) - Y(x_i')}{|Y(x_i)|},    (20)

where n is the total number of inputs, x_i is the original input for the i-th process instance, x_i' \in X_i' is a perturbation of x_i, Y(x_i) is the model output given input x_i, and Y(x_i') is the model output given the perturbed input x_i'.
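A minimal sketch of Eq. (20) is shown below, assuming `model` is any callable that returns a scalar output and that one perturbed input is paired with each original input; these interface choices are illustrative assumptions.

```python
# A minimal sketch of the instance-level fidelity score in Eq. (20); `model`
# is assumed to be a callable returning a scalar output, and one perturbed
# input is paired with each original input.
from typing import Callable, Sequence

def fidelity_score(model: Callable, inputs: Sequence, perturbed: Sequence) -> float:
    """S = 1 - sum_i (Y(x_i) - Y(x_i')) / |Y(x_i)|."""
    total = 0.0
    for x, x_prime in zip(inputs, perturbed):
        y, y_prime = model(x), model(x_prime)
        total += (y - y_prime) / abs(y)   # assumes Y(x_i) != 0
    return 1.0 - total

# Example with a toy scalar model: perturbations that barely change the output
# yield a score close to 1, i.e., high fidelity under this metric.
model = lambda x: 2.0 * x[0] + 0.1 * x[1]
inputs = [(1.0, 3.0), (2.0, -1.0)]
perturbed = [(1.0, 2.5), (2.0, -0.5)]     # only the low-importance feature changed
print(fidelity_score(model, inputs, perturbed))
```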
tematic review of trustworthy and Explainable Artificial Intelligence in [46] J. D. Fuhrman, N. Gorre, Q. Hu, H. Li, I. El Naqa, M. L. Giger, A
healthcare: Assessment of quality, bias risk, and data fusion, Informa- review of explainable and interpretable AI with applications in COVID-
tion Fusion (2023). 19 imaging, Medical Physics 49 (1) (2022) 1–14.
[23] A. Saranya, R. Subhashini, A systematic review of Explainable Artificial [47] D. K. Gurmessa, W. Jimma, A comprehensive evaluation of explainable
Intelligence models and applications: Recent developments and future Artificial Intelligence techniques in stroke diagnosis: A systematic re-
trends, Decision Analytics Journal (2023) 100230. view, Cogent Engineering 10 (2) (2023) 2273088.
[24] L. Longo, M. Brcic, F. Cabitza, J. Choi, R. Confalonieri, J. Del Ser, [48] A. Das, P. Rad, Opportunities and challenges in Explainable Artificial
R. Guidotti, Y. Hayashi, F. Herrera, A. Holzinger, et al., Explainable Intelligence (XAI): A survey, arXiv preprint arXiv:2006.11371 (2020).
artificial intelligence (XAI) 2.0: A manifesto of open challenges and [49] R. Marcinkevičs, J. E. Vogt, Interpretability and explainability: A ma-
interdisciplinary research directions, Information Fusion (2024) 102301. chine learning zoo mini-tour, arXiv preprint arXiv:2012.01805 (2020).
[25] N. Bostrom, E. Yudkowsky, The ethics of Artificial Intelligence, in: Ar- [50] C. Rudin, Stop explaining black box machine learning models for high
tificial Intelligence Safety and Security, Chapman and Hall/CRC, 2018, stakes decisions and use interpretable models instead, Nature Machine
pp. 57–69. Intelligence 1 (5) (2019) 206–215.
[26] M. T. Ribeiro, S. Singh, C. Guestrin, “Why should I trust you?” Ex- [51] M. T. Ribeiro, S. Singh, C. Guestrin, Model-agnostic interpretability of
plaining the predictions of any classifier, in: Proceedings of the 22nd machine learning, arXiv preprint arXiv:1606.05386 (2016).
ACM SIGKDD International Conference on Knowledge Discovery and [52] S. M. Lundberg, S.-I. Lee, A unified approach to interpreting model
Data Mining, 2016, pp. 1135–1144. predictions, Advances in Neural Information Processing Systems 30
[27] I. El Naqa, M. J. Murphy, What is machine learning?, Springer, 2015. (2017).
[28] J. H. Moor, Three myths of computer science, The British Journal for [53] M. Ancona, E. Ceolini, C. Öztireli, M. Gross, Towards better under-
the Philosophy of Science 29 (3) (1978) 213–222. standing of gradient-based attribution methods for deep neural networks,
[29] A. Saxe, S. Nelli, C. Summerfield, If deep learning is the answer, what arXiv preprint arXiv:1711.06104 (2017).
is the question?, Nature Reviews Neuroscience 22 (1) (2021) 55–67. [54] H. Chefer, S. Gur, L. Wolf, Transformer interpretability beyond attention
[30] D. Castelvecchi, Can we open the black box of AI?, Nature News visualization, in: Proceedings of the IEEE/CVF Conference on Com-
538 (7623) (2016) 20. puter Vision and Pattern Recognition, 2021, pp. 782–791.
[31] D. Doran, S. Schulz, T. R. Besold, What does explainable AI re- [55] A. Ali, T. Schnake, O. Eberle, G. Montavon, K.-R. Müller, L. Wolf, XAI
ally mean? A new conceptualization of perspectives, arXiv preprint for Transformers: Better explanations through conservative propagation,
arXiv:1710.00794 (2017). in: International Conference on Machine Learning, PMLR, 2022, pp.
[32] P. P. Angelov, E. A. Soares, R. Jiang, N. I. Arnold, P. M. Atkinson, Ex- 435–451.
plainable artificial intelligence: an analytical review, Wiley Interdisci- [56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
plinary Reviews: Data Mining and Knowledge Discovery 11 (5) (2021) Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural
e1424. Information Processing Systems 30 (2017).
[33] F.-L. Fan, J. Xiong, M. Li, G. Wang, On interpretability of artificial [57] M. T. Ribeiro, S. Singh, C. Guestrin, Anchors: High-precision model-
neural networks: A survey, IEEE Transactions on Radiation and Plasma agnostic explanations, in: Proceedings of the AAAI Conference on Ar-
Medical Sciences 5 (6) (2021) 741–760. tificial Intelligence, Vol. 32, 2018.
[34] H. K. Dam, T. Tran, A. Ghose, Explainable software analytics, in: Pro- [58] M. Ancona, C. Oztireli, M. Gross, Explaining deep neural networks with
ceedings of the 40th International Conference on Software Engineering: a polynomial time algorithm for shapley value approximation, in: Inter-
New Ideas and Emerging Results, 2018, pp. 53–56. national Conference on Machine Learning, PMLR, 2019, pp. 272–281.
[35] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso- [59] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations with-
Moral, R. Confalonieri, R. Guidotti, J. Del Ser, N. Dı́az-Rodrı́guez, out opening the black box: Automated decisions and the GDPR, Harv.
F. Herrera, Explainable artificial intelligence (xai): What we know and JL & Tech. 31 (2017) 841.
what is left to attain trustworthy artificial intelligence, Information fu- [60] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional net-
sion 99 (2023) 101805. works: Visualising image classification models and saliency maps, arXiv
[36] Y. Zhang, Q. V. Liao, R. K. Bellamy, Effect of confidence and expla- preprint arXiv:1312.6034 (2013).
nation on accuracy and trust calibration in AI-assisted decision making, [61] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller,
in: Proceedings of the 2020 Conference on Fairness, Accountability, and W. Samek, On pixel-wise explanations for non-linear classifier decisions
Transparency, 2020, pp. 295–305. by layer-wise relevance propagation, PloS one 10 (7) (2015) e0130140.
[37] M. I. Jordan, T. M. Mitchell, Machine learning: Trends, perspectives, [62] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, K.-R. Müller,
and prospects, Science 349 (6245) (2015) 255–260. Layer-wise relevance propagation: an overview, Explainable AI: inter-
[38] Y. Zhang, P. Tiňo, A. Leonardis, K. Tang, A survey on neural network in- preting, explaining and visualizing deep learning (2019) 193–209.
terpretability, IEEE Transactions on Emerging Topics in Computational [63] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep
Intelligence 5 (5) (2021) 726–742. features for discriminative localization, in: Proceedings of the IEEE
[39] F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable ma- Conference on Computer Vision and Pattern Recognition, 2016, pp.
chine learning, arXiv preprint arXiv:1702.08608 (2017). 2921–2929.
[40] Q. Zhang, Y. N. Wu, S.-C. Zhu, Interpretable convolutional neural net- [64] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep net-
works, in: Proceedings of the IEEE Conference on Computer Vision and works, in: International Conference on Machine Learning, PMLR, 2017,
Pattern Recognition, 2018, pp. 8827–8836. pp. 3319–3328.
[41] W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, K.-R. Müller, Ex- [65] H. Chefer, S. Gur, L. Wolf, Generic attention-model explainability for
plainable AI: interpreting, explaining and visualizing deep learning, Vol. interpreting bi-modal and encoder-decoder transformers, in: Proceed-
11700, Springer Nature, 2019. ings of the IEEE/CVF International Conference on Computer Vision,
[42] D. Amodei, C. Olah, J. Steinhardt, P. Christiano, J. Schulman, D. Mané, 2021, pp. 397–406.
Concrete problems in AI safety, arXiv preprint arXiv:1606.06565 [66] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features
(2016). through propagating activation differences, in: International Conference
[43] B. J. Dietvorst, J. P. Simmons, C. Massey, Algorithm aversion: people on Machine Learning, PMLR, 2017, pp. 3145–3153.
erroneously avoid algorithms after seeing them err., Journal of Experi- [67] E. Voita, D. Talbot, F. Moiseev, R. Sennrich, I. Titov, Analyzing multi-
mental Psychology: General 144 (1) (2015) 114. head self-attention: Specialized heads do the heavy lifting, the rest can
[44] Z. C. Lipton, The mythos of model interpretability: In machine learning, be pruned, arXiv preprint arXiv:1905.09418 (2019).
the concept of interpretability is both important and slippery., Queue [68] Z. Wu, D. C. Ong, On explaining your explanations of BERT: An empir-
16 (3) (2018) 31–57. ical study with sequence classification, arXiv preprint arXiv:2101.00196
[45] G. Montavon, W. Samek, K.-R. Müller, Methods for interpreting and un- (2021).
derstanding deep neural networks, Digital Signal Processing 73 (2018) [69] S. Abnar, W. Zuidema, Quantifying attention flow in transformers, arXiv
1–15. preprint arXiv:2005.00928 (2020).
[70] C. Rana, M. Dahiya, et al., Safety of autonomous systems using rein- [93] Y. W. Jie, R. Satapathy, G. S. Mong, E. Cambria, et al., How inter-
forcement learning: A comprehensive survey, in: 2023 International pretable are reasoning explanations from prompting large language mod-
Conference on Advances in Computation, Communication and Infor- els?, arXiv preprint arXiv:2402.11863 (2024).
mation Technology (ICAICCIT), IEEE, 2023, pp. 744–750. [94] S. Wu, E. M. Shen, C. Badrinath, J. Ma, H. Lakkaraju, Analyzing chain-
[71] C. Yu, J. Liu, S. Nemati, G. Yin, Reinforcement learning in healthcare: of-thought prompting in large language models via gradient-based fea-
A survey, ACM Computing Surveys (CSUR) 55 (1) (2021) 1–36. ture attributions, arXiv preprint arXiv:2307.13339 (2023).
[72] Y. Ye, X. Zhang, J. Sun, Automated vehicle’s behavior decision mak- [95] A. Madaan, A. Yazdanbakhsh, Text and patterns: For effective chain of
ing using deep reinforcement learning and high-fidelity simulation envi- thought, it takes two to tango, arXiv preprint arXiv:2209.07686 (2022).
ronment, Transportation Research Part C: Emerging Technologies 107 [96] B. Wang, S. Min, X. Deng, J. Shen, Y. Wu, L. Zettlemoyer, H. Sun,
(2019) 155–170. Towards understanding chain-of-thought prompting: An empirical study
[73] G. A. Vouros, Explainable deep reinforcement learning: state of the art of what matters, arXiv preprint arXiv:2212.10001 (2022).
and challenges, ACM Computing Surveys 55 (5) (2022) 1–39. [97] T. Lanham, A. Chen, A. Radhakrishnan, B. Steiner, C. Denison,
[74] P. Madumal, T. Miller, L. Sonenberg, F. Vetere, Explainable reinforce- D. Hernandez, D. Li, E. Durmus, E. Hubinger, J. Kernion, et al.,
ment learning through a causal lens, in: Proceedings of the AAAI Con- Measuring faithfulness in chain-of-thought reasoning, arXiv preprint
ference on Artificial Intelligence, Vol. 34, 2020, pp. 2493–2500. arXiv:2307.13702 (2023).
[75] E. Puiutta, E. M. Veith, Explainable reinforcement learning: A survey, [98] J. Wei, J. Wei, Y. Tay, D. Tran, A. Webson, Y. Lu, X. Chen, H. Liu,
in: International Cross-domain Conference for Machine Learning and D. Huang, D. Zhou, et al., Larger language models do in-context learn-
Knowledge Extraction, Springer, 2020, pp. 77–95. ing differently, arXiv preprint arXiv:2303.03846 (2023).
[76] A. Heuillet, F. Couthouis, N. Dı́az-Rodrı́guez, Collective explainable [99] Z. Li, P. Xu, F. Liu, H. Song, Towards understanding in-context learn-
AI: Explaining cooperative strategies and agent contribution in multia- ing with contrastive demonstrations and saliency maps, arXiv preprint
gent reinforcement learning with shapley values, IEEE Computational arXiv:2307.05052 (2023).
Intelligence Magazine 17 (1) (2022) 59–71. [100] D. Slack, S. Krishna, H. Lakkaraju, S. Singh, Explaining machine learn-
[77] A. Heuillet, F. Couthouis, N. Dı́az-Rodrı́guez, Explainability in deep ing models with interactive natural language conversations using Talk-
reinforcement learning, Knowledge-Based Systems 214 (2021) 106685. ToModel, Nature Machine Intelligence 5 (8) (2023) 873–883.
[78] G. Zhang, H. Kashima, Learning state importance for preference-based [101] C. Yeh, Y. Chen, A. Wu, C. Chen, F. Viégas, M. Wattenberg, Atten-
reinforcement learning, Machine Learning (2023) 1–17. tionVIX: A global view of Transformer attention, IEEE Transactions on
[79] L. Wells, T. Bednarz, Explainable AI and reinforcement learning—a sys- Visualization and Computer Graphics (2023).
tematic review of current approaches and trends, Frontiers in Artificial [102] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional
Intelligence 4 (2021) 550030. networks, in: Computer Vision–ECCV 2014: 13th European Confer-
[80] A. Alharin, T.-N. Doan, M. Sartipi, Reinforcement learning interpreta- ence, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I
tion methods: A survey, IEEE Access 8 (2020) 171058–171077. 13, Springer, 2014, pp. 818–833.
[81] V. Chamola, V. Hassija, A. R. Sulthana, D. Ghosh, D. Dhingra, B. Sik- [103] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for
dar, A review of trustworthy and Explainable Artificial Intelligence simplicity: The all convolutional net, arXiv preprint arXiv:1412.6806
(XAI), IEEE Access (2023). (2014).
[82] V. Lai, C. Chen, Q. V. Liao, A. Smith-Renner, C. Tan, Towards a sci- [104] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification
ence of Human-AI decision making: a survey of empirical studies, arXiv with deep convolutional neural networks, Communications of the ACM
preprint arXiv:2112.11471 (2021). 60 (6) (2017) 84–90.
[83] A. Torfi, R. A. Shirvani, Y. Keneshloo, N. Tavaf, E. A. Fox, Natural [105] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog-
language processing advancements by deep learning: A survey, arXiv nition, in: Proceedings of the IEEE Conference on Computer Vision and
preprint arXiv:2003.01200 (2020). Pattern Recognition, 2016, pp. 770–778.
[84] D. Jurafsky, J. H. Martin, Speech and Language Processing: An In- [106] S. Yang, P. Luo, C.-C. Loy, X. Tang, Wider face: A face detection bench-
troduction to Natural Language Processing, Computational Linguistics, mark, in: Proceedings of the IEEE Conference on Computer Vision and
and Speech Recognition. Pattern Recognition, 2016, pp. 5525–5533.
[85] J. P. Usuga-Cadavid, S. Lamouri, B. Grabot, A. Fortin, Using deep learn- [107] W. Yang, H. Huang, Z. Zhang, X. Chen, K. Huang, S. Zhang, Towards
ing to value free-form text data for predictive maintenance, International rich feature discovery with class activation maps augmentation for per-
Journal of Production Research 60 (14) (2022) 4548–4575. son re-identification, in: Proceedings of the IEEE/CVF Conference on
[86] S. Jain, B. C. Wallace, Attention is not explanation, arXiv preprint Computer Vision and Pattern Recognition, 2019, pp. 1389–1398.
arXiv:1902.10186 (2019). [108] P. Linardatos, V. Papastefanopoulos, S. Kotsiantis, Explainable AI: A
[87] S. Gholizadeh, N. Zhou, Model explainability in deep learning based review of machine learning interpretability methods, Entropy 23 (1)
natural language processing, arXiv preprint arXiv:2106.07410 (2021). (2020) 18.
[88] M. Sundararajan, A. Taly, Q. Yan, Axiomatic attribution for deep [109] D. Smilkov, N. Thorat, B. Kim, F. Viégas, M. Wattenberg, Smooth-
networks, in: D. Precup, Y. W. Teh (Eds.), Proceedings of the 34th grad: removing noise by adding noise, arXiv preprint arXiv:1706.03825
International Conference on Machine Learning, Vol. 70 of Proceedings (2017).
of Machine Learning Research, PMLR, 2017, pp. 3319–3328. [110] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai,
URL https://proceedings.mlr.press/v70/ T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al.,
sundararajan17a.html An image is worth 16x16 words: Transformers for image recognition at
[89] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller, Ex- scale, arXiv preprint arXiv:2010.11929 (2020).
plaining nonlinear classification decisions with deep taylor decomposi- [111] S. Verma, V. Boonsanong, M. Hoang, K. E. Hines, J. P. Dickerson,
tion, Pattern Recognition 65 (2017) 211–222. C. Shah, Counterfactual explanations and algorithmic recourses for ma-
[90] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, chine learning: A review, arXiv preprint arXiv:2010.10596 (2020).
A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models [112] R. Guidotti, Counterfactual explanations and how to find them: litera-
are few-shot learners, Advances in neural information processing sys- ture review and benchmarking, Data Mining and Knowledge Discovery
tems 33 (2020) 1877–1901. (2022) 1–55.
[91] J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, [113] R. H. Shumway, D. S. Stoffer, D. S. Stoffer, Time series analysis and its
D. Zhou, et al., Chain-of-thought prompting elicits reasoning in large applications, Vol. 3, Springer, 2000.
language models, Advances in Neural Information Processing Systems [114] B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey,
35 (2022) 24824–24837. Philosophical Transactions of the Royal Society A 379 (2194) (2021)
[92] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, 20200209.
J. Spencer-Smith, D. C. Schmidt, A prompt pattern catalog to enhance [115] R. Verma, J. Sharma, S. Jindal, Time Series Forecasting Using Machine
prompt engineering with chatgpt, arXiv preprint arXiv:2302.11382 Learning, in: Advances in Computing and Data Sciences: 4th Interna-
(2023). tional Conference, ICACDS 2020, Valletta, Malta, April 24–25, 2020,
Revised Selected Papers 4, Springer, 2020, pp. 372–381. [138] R. Hamamoto, Application of artificial intelligence for medical research
[116] W. Bao, J. Yue, Y. Rao, A deep learning framework for financial time (2021).
series using stacked autoencoders and long-short term memory, PloS [139] S. Bharati, M. R. H. Mondal, P. Podder, A Review on Explainable Arti-
one 12 (7) (2017) e0180944. ficial Intelligence for Healthcare: Why, How, and When?, IEEE Trans-
[117] C. Huntingford, E. S. Jeffers, M. B. Bonsall, H. M. Christensen, actions on Artificial Intelligence (2023).
T. Lees, H. Yang, Machine learning and artificial intelligence to aid cli- [140] L. Li, M. Xu, H. Liu, Y. Li, X. Wang, L. Jiang, Z. Wang, X. Fan,
mate change research and preparedness, Environmental Research Let- N. Wang, A large-scale database and a CNN model for attention-
ters 14 (12) (2019) 124007. based glaucoma detection, IEEE transactions on Medical Imaging 39 (2)
[118] A. Farahat, C. Reichert, C. M. Sweeney-Reed, H. Hinrichs, Convo- (2019) 413–424.
lutional neural networks for decoding of covert attention focus and [141] Z. Bian, S. Xia, C. Xia, M. Shao, Weakly supervised vitiligo segmenta-
saliency maps for EEG feature visualization, Journal of Neural Engi- tion in skin image through saliency propagation, in: 2019 IEEE Interna-
neering 16 (6) (2019) 066010. tional Conference on Bioinformatics and Biomedicine (BIBM), IEEE,
[119] T. Huber, K. Weitz, E. André, O. Amir, Local and global explanations 2019, pp. 931–934.
of agent behavior: Integrating strategy summaries with saliency maps, [142] S. Rajaraman, S. Candemir, G. Thoma, S. Antani, Visualizing and ex-
Artificial Intelligence 301 (2021) 103571. plaining deep learning predictions for pneumonia detection in pediatric
[120] A. A. Ismail, M. Gunady, H. Corrada Bravo, S. Feizi, Benchmarking chest radiographs, in: Medical Imaging 2019: Computer-Aided Diagno-
deep learning interpretability in time series predictions, Advances in sis, Vol. 10950, SPIE, 2019, pp. 200–211.
Neural Information Processing Systems 33 (2020) 6441–6452. [143] G. Yang, F. Raschke, T. R. Barrick, F. A. Howe, Manifold Learning
[121] J. Cooper, O. Arandjelović, D. J. Harrison, Believe the HiPe: Hierarchi- in MR spectroscopy using nonlinear dimensionality reduction and un-
cal perturbation for fast, robust, and model-agnostic saliency mapping, supervised clustering, Magnetic Resonance in Medicine 74 (3) (2015)
Pattern Recognition 129 (2022) 108743. 868–878.
[122] Z. Wang, W. Yan, T. Oates, Time series classification from scratch with [144] U. Ahmed, G. Srivastava, U. Yun, J. C.-W. Lin, EANDC: An explain-
deep neural networks: A strong baseline, in: 2017 International joint able attention network based deep adaptive clustering model for mental
Conference on Neural Networks (IJCNN), IEEE, 2017, pp. 1578–1585. health treatment, Future Generation Computer Systems 130 (2022) 106–
[123] J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Towards bet- 113.
ter analysis of deep convolutional neural networks, International Confer- [145] Y. Ming, H. Qu, E. Bertini, Rulematrix: Visualizing and understanding
ence on Learning Representations (ICLR) (2015). classifiers with rules, IEEE Transactions on Visualization and Computer
[124] W. Song, L. Liu, M. Liu, W. Wang, X. Wang, Y. Song, Representation Graphics 25 (1) (2018) 342–352.
learning with deconvolution for multivariate time series classification [146] N. Rane, S. Choudhary, J. Rane, Explainable Artificial Intelligence
and visualization, in: Data Science: 6th International Conference of Pio- (XAI) in healthcare: Interpretable Models for Clinical Decision Sup-
neering Computer Scientists, Engineers and Educators, ICPCSEE 2020, port, Available at SSRN 4637897 (2023).
Taiyuan, China, September 18-21, 2020, Proceedings, Part I 6, Springer, [147] H. Magunia, S. Lederer, R. Verbuecheln, B. J. Gilot, M. Koeppen, H. A.
2020, pp. 310–326. Haeberle, V. Mirakaj, P. Hofmann, G. Marx, J. Bickenbach, et al.,
[125] S. A. Siddiqui, D. Mercier, M. Munir, A. Dengel, S. Ahmed, Tsviz: Machine learning identifies ICU outcome predictors in a multicenter
Demystification of deep learning models for time-series analysis, IEEE COVID-19 cohort, Critical Care 25 (2021) 1–14.
Access 7 (2019) 67027–67040. [148] A. Raza, K. P. Tran, L. Koehl, S. Li, Designing ecg monitoring
[126] C. Labrı́n, F. Urdinez, Principal component analysis, in: R for Political healthcare system with federated transfer learning and explainable AI,
Data Science, Chapman and Hall/CRC, 2020, pp. 375–393. Knowledge-Based Systems 236 (2022) 107763.
[127] L. Van Der Maaten, Accelerating t-SNE using tree-based algorithms, [149] F. C. Morabito, C. Ieracitano, N. Mammone, An explainable Artificial
The Journal of Machine Learning Research 15 (1) (2014) 3221–3245. Intelligence approach to study MCI to AD conversion via HD-EEG pro-
[128] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold ap- cessing, Clinical EEG and Neuroscience 54 (1) (2023) 51–60.
proximation and projection for dimension reduction, arXiv preprint [150] S. El-Sappagh, J. M. Alonso, S. R. Islam, A. M. Sultan, K. S. Kwak,
arXiv:1802.03426 (2018). A multilayer multimodal detection and prediction model based on ex-
[129] K. Agrawal, N. Desai, T. Chakraborty, Time series visualization using plainable artificial intelligence for Alzheimer’s disease, Scientific Re-
t-SNE and UMAP, Journal of Big Data 8 (1) (2021) 1–21. ports 11 (1) (2021) 2660.
[130] A. Roy, L. v. d. Maaten, D. Witten, UMAP reveals cryptic population [151] G. Yang, Q. Ye, J. Xia, Unbox the black-box for the medical explainable
structure and phenotype heterogeneity in large genomic cohorts, PLoS AI via multi-modal and multi-centre data fusion: A mini-review, two
genetics 16 (3) (2020) e1009043. showcases and beyond, Information Fusion 77 (2022) 29–52.
[131] M. Munir, Thesis approved by the Department of Computer Science of [152] J. B. Awotunde, E. A. Adeniyi, G. J. Ajamu, G. B. Balogun, F. A.
the TU Kaiserslautern for the award of the Doctoral Degree doctor of Taofeek-Ibrahim, Explainable Artificial Intelligence in Genomic Se-
engineering, Ph.D. thesis, Kyushu University, Japan (2021). quence for Healthcare Systems Prediction, in: Connected e-Health: In-
[132] E. Mosqueira-Rey, E. Hernández-Pereira, D. Alonso-Rı́os, J. Bobes- tegrated IoT and Cloud Computing, Springer, 2022, pp. 417–437.
Bascarán, Á. Fernández-Leal, Human-in-the-loop machine learning: a [153] A. Anguita-Ruiz, A. Segura-Delgado, R. Alcalá, C. M. Aguilera, J. Al-
state of the art, Artificial Intelligence Review (2022) 1–50. calá-Fdez, eXplainable Artificial Intelligence (XAI) for the identifica-
[133] U. Schlegel, D. A. Keim, Time series model attribution visualizations tion of biologically relevant gene expression patterns in longitudinal hu-
as explanations, in: 2021 IEEE Workshop on TRust and EXpertise in man studies, insights from obesity research, PLoS Computational Biol-
Visual Analytics (TREX), IEEE, 2021, pp. 27–31. ogy 16 (4) (2020) e1007792.
[134] G. Plumb, S. Wang, Y. Chen, C. Rudin, Interpretable decision sets: A [154] A. Troncoso-Garcı́a, M. Martı́nez-Ballesteros, F. Martı́nez-Álvarez,
joint framework for description and prediction, in: Proceedings of the A. Troncoso, Explainable machine learning for sleep apnea prediction,
24th ACM SIGKDD International Conference on Knowledge Discovery Procedia Computer Science 207 (2022) 2930–2939.
& Data Mining, ACM, 2018, pp. 1677–1686. [155] E. Tjoa, C. Guan, A survey on Explainable Artificial Intelligence (XAI):
[135] Z. C. Lipton, D. C. Kale, R. Wetzel, et al., Modeling missing data in clin- Toward medical XAI, IEEE Transactions on Neural Networks and
ical time series with rnns, Machine Learning for Healthcare 56 (2016) Learning Systems 32 (11) (2020) 4793–4813.
253–270. [156] J. Liao, X. Li, Y. Gan, S. Han, P. Rong, W. Wang, W. Li, L. Zhou, Artifi-
[136] H. Lakkaraju, S. H. Bach, J. Leskovec, Interpretable decision sets: A cial intelligence assists precision medicine in cancer treatment, Frontiers
joint framework for description and prediction, in: Proceedings of the in Oncology 12 (2023) 998222.
22nd ACM SIGKDD International Conference on Knowledge Discov- [157] H. Askr, E. Elgeldawi, H. Aboul Ella, Y. A. Elshaier, M. M. Gomaa,
ery and Data Mining, 2016, pp. 1675–1684. A. E. Hassanien, Deep learning in drug discovery: an integrative review
[137] C. Rudin, J. Radin, Why are we using black box models in AI when we and future challenges, Artificial Intelligence Review 56 (7) (2023) 5975–
don’t need to? A lesson from an explainable AI competition, Harvard 6037.
Data Science Review 1 (2) (2019) 1–9. [158] Q.-H. Kha, V.-H. Le, T. N. K. Hung, N. T. K. Nguyen, N. Q. K. Le,
Development and validation of an explainable machine learning-based prediction model for drug–food interactions from chemical structures, Sensors 23 (8) (2023) 3962.
[159] C. Panigutti, A. Beretta, D. Fadda, F. Giannotti, D. Pedreschi, A. Perotti, S. Rinzivillo, Co-design of human-centered, explainable AI for clinical decision support, ACM Transactions on Interactive Intelligent Systems (2023).
[160] D. Saraswat, P. Bhattacharya, A. Verma, V. K. Prasad, S. Tanwar, G. Sharma, P. N. Bokoro, R. Sharma, Explainable AI for healthcare 5.0: opportunities and challenges, IEEE Access (2022).
[161] A. Ward, A. Sarraju, S. Chung, J. Li, R. Harrington, P. Heidenreich, L. Palaniappan, D. Scheinker, F. Rodriguez, Machine learning and atherosclerotic cardiovascular disease risk prediction in a multi-ethnic population, NPJ Digital Medicine 3 (1) (2020) 125.
[162] X. Ma, Y. Niu, L. Gu, Y. Wang, Y. Zhao, J. Bailey, F. Lu, Understanding adversarial attacks on deep learning based medical image analysis systems, Pattern Recognition 110 (2021) 107332.
[163] M. Sharma, C. Savage, M. Nair, I. Larsson, P. Svedberg, J. M. Nygren, Artificial intelligence applications in health care practice: scoping review, Journal of Medical Internet Research 24 (10) (2022) e40238.
[164] G. Maliha, S. Gerke, I. G. Cohen, R. B. Parikh, Artificial intelligence and liability in medicine, The Milbank Quarterly 99 (3) (2021) 629–647.
[165] J. Amann, A. Blasimme, E. Vayena, D. Frey, V. I. Madai, P. Consortium, Explainability for artificial intelligence in healthcare: a multidisciplinary perspective, BMC Medical Informatics and Decision Making 20 (2020) 1–9.
[166] A. Chaddad, J. Peng, J. Xu, A. Bouridane, Survey of explainable AI techniques in healthcare, Sensors 23 (2) (2023) 634.
[167] A. Kerasidou, Ethics of artificial intelligence in global health: Explainability, algorithmic bias and trust, Journal of Oral Biology and Craniofacial Research 11 (4) (2021) 612–614.
[168] T. d. C. Aranovich, R. Matulionyte, Ensuring AI explainability in healthcare: problems and possible policy solutions, Information & Communications Technology Law 32 (2) (2023) 259–275.
[169] N. Anton, B. Doroftei, S. Curteanu, L. Catãlin, O.-D. Ilie, F. Târcoveanu, C. M. Bogdănici, Comprehensive review on the use of artificial intelligence in ophthalmology and future research directions, Diagnostics 13 (1) (2022) 100.
[170] A. K. Al Shami, Generating Tennis Player by the Predicting Movement Using 2D Pose Estimation, Ph.D. thesis, University of Colorado Colorado Springs (2022).
[171] A. AlShami, T. Boult, J. Kalita, Pose2Trajectory: Using transformers on body pose to predict tennis player’s trajectory, Journal of Visual Communication and Image Representation 97 (2023) 103954.
[172] S. Atakishiyev, M. Salameh, H. Yao, R. Goebel, Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions, arXiv preprint arXiv:2112.11561 (2021).
[173] D. Holliday, S. Wilson, S. Stumpf, User trust in intelligent systems: A journey over time, in: Proceedings of the 21st International Conference on Intelligent User Interfaces, 2016, pp. 164–168.
[174] B. W. Israelsen, N. R. Ahmed, “Dave... I can assure you... that it’s going to be all right...” A definition, case for, and survey of algorithmic assurances in human-autonomy trust relationships, ACM Computing Surveys (CSUR) 51 (6) (2019) 1–37.
[175] S. Atakishiyev, M. Salameh, H. Yao, R. Goebel, Towards safe, explainable, and regulated autonomous driving, arXiv preprint arXiv:2111.10518 (2021).
[176] A. Corso, M. J. Kochenderfer, Interpretable safety validation for autonomous vehicles, in: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), IEEE, 2020, pp. 1–6.
[177] D. V. McGehee, M. Brewer, C. Schwarz, B. W. Smith, et al., Review of automated vehicle technology: Policy and implementation implications, Tech. rep., Iowa Dept. of Transportation (2016).
[178] M. Rahman, S. Polunsky, S. Jones, Transportation policies for connected and automated mobility in smart cities, in: Smart Cities Policies and Financing, Elsevier, 2022, pp. 97–116.
[179] J. Kim, S. Moon, A. Rohrbach, T. Darrell, J. Canny, Advisable learning for self-driving vehicles by internalizing observation-to-action rules, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9661–9670.
[180] J. Kim, A. Rohrbach, Z. Akata, S. Moon, T. Misu, Y.-T. Chen, T. Darrell, J. Canny, Toward explainable and advisable model for self-driving cars, Applied AI Letters 2 (4) (2021) e56.
[181] P. Regulation, Regulation (EU) 2016/679 of the European Parliament and of the Council, Regulation (EU) 679 (2016) 2016.
[182] S. Burton, I. Habli, T. Lawton, J. McDermid, P. Morgan, Z. Porter, Mind the gaps: Assuring the safety of autonomous systems from an engineering, ethical, and legal perspective, Artificial Intelligence 279 (2020) 103201.
[183] V. Chen, Q. V. Liao, J. Wortman Vaughan, G. Bansal, Understanding the role of human intuition on reliance in human-AI decision-making with explanations, Proceedings of the ACM on Human-Computer Interaction 7 (CSCW2) (2023) 1–32.
[184] A. Bussone, S. Stumpf, D. O’Sullivan, The role of explanations on trust and reliance in clinical decision support systems, in: 2015 International Conference on Healthcare Informatics, IEEE, 2015, pp. 160–169.
[185] J. Dong, S. Chen, M. Miralinaghi, T. Chen, P. Li, S. Labi, Why did the AI make that decision? Towards an explainable artificial intelligence (XAI) for autonomous driving systems, Transportation Research Part C: Emerging Technologies 156 (2023) 104358.
[186] H. Mankodiya, D. Jadav, R. Gupta, S. Tanwar, W.-C. Hong, R. Sharma, Od-XAI: Explainable AI-based semantic object detection for autonomous vehicles, Applied Sciences 12 (11) (2022) 5310.
[187] M. M. Karim, Y. Li, R. Qin, Toward explainable artificial intelligence for early anticipation of traffic accidents, Transportation Research Record 2676 (6) (2022) 743–755.
[188] A. S. Madhav, A. K. Tyagi, Explainable Artificial Intelligence (XAI): connecting artificial decision-making and human trust in autonomous vehicles, in: Proceedings of Third International Conference on Computing, Communications, and Cyber-Security: IC4S 2021, Springer, 2022, pp. 123–136.
[189] U. Onyekpe, Y. Lu, E. Apostolopoulou, V. Palade, E. U. Eyo, S. Kanarachos, Explainable Machine Learning for Autonomous Vehicle Positioning Using SHAP, in: Explainable AI: Foundations, Methodologies and Applications, Springer, 2022, pp. 157–183.
[190] X. Cheng, J. Wang, H. Li, Y. Zhang, L. Wu, Y. Liu, A method to evaluate task-specific importance of spatio-temporal units based on explainable artificial intelligence, International Journal of Geographical Information Science 35 (10) (2021) 2002–2025.
[191] T. Rojat, R. Puget, D. Filliat, J. Del Ser, R. Gelin, N. Díaz-Rodríguez, Explainable Artificial Intelligence (XAI) on timeseries data: A survey, arXiv preprint arXiv:2104.00950 (2021).
[192] C. I. Nwakanma, L. A. C. Ahakonye, J. N. Njoku, J. C. Odirichukwu, S. A. Okolie, C. Uzondu, C. C. Ndubuisi Nweke, D.-S. Kim, Explainable Artificial Intelligence (XAI) for intrusion detection and mitigation in intelligent connected vehicles: A review, Applied Sciences 13 (3) (2023) 1252.
[193] J. Li, S. King, I. Jennions, Intelligent fault diagnosis of an aircraft fuel system using machine learning—a literature review, Machines 11 (4) (2023) 481.
[194] G. Bendiab, A. Hameurlaine, G. Germanos, N. Kolokotronis, S. Shiaeles, Autonomous vehicles security: Challenges and solutions using blockchain and artificial intelligence, IEEE Transactions on Intelligent Transportation Systems (2023).
[195] A. Maqsood, C. Chen, T. J. Jacobsson, The future of material scientists in an age of artificial intelligence, Advanced Science (2024) 2401401.
[196] F. Oviedo, J. L. Ferres, T. Buonassisi, K. T. Butler, Interpretable and explainable machine learning for materials science and chemistry, Accounts of Materials Research 3 (6) (2022) 597–607.
[197] G. Pilania, Machine learning in materials science: From explainable predictions to autonomous design, Computational Materials Science 193 (2021) 110360.
[198] K. Choudhary, B. DeCost, C. Chen, A. Jain, F. Tavazza, R. Cohn, C. W. Park, A. Choudhary, A. Agrawal, S. J. Billinge, et al., Recent advances and applications of deep learning methods in materials science, npj Computational Materials 8 (1) (2022) 59.
[199] A. Y.-T. Wang, M. S. Mahmoud, M. Czasny, A. Gurlo, CrabNet for explainable deep learning in materials science: bridging the gap between academia and industry, Integrating Materials and Manufacturing Innovation 11 (1) (2022) 41–56.
[200] K. Lee, M. V. Ayyasamy, Y. Ji, P. V. Balachandran, A comparison of
explainable artificial intelligence methods in the phase classification of multi-principal element alloys, Scientific Reports 12 (1) (2022) 11591.
[201] J. Feng, J. L. Lansford, M. A. Katsoulakis, D. G. Vlachos, Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences, Science Advances 6 (42) (2020) eabc3204.
[202] T. Harren, H. Matter, G. Hessler, M. Rarey, C. Grebner, Interpretation of structure–activity relationships in real-world drug design data sets using explainable artificial intelligence, Journal of Chemical Information and Modeling 62 (3) (2022) 447–462.
[203] J. Willard, X. Jia, S. Xu, M. Steinbach, V. Kumar, Integrating physics-based modeling with machine learning: A survey, arXiv preprint arXiv:2003.04919 1 (1) (2020) 1–34.
[204] M. Datcu, Z. Huang, A. Anghel, J. Zhao, R. Cacoveanu, Explainable, physics-aware, trustworthy artificial intelligence: A paradigm shift for synthetic aperture radar, IEEE Geoscience and Remote Sensing Magazine 11 (1) (2023) 8–25.
[205] J. Willard, X. Jia, S. Xu, M. Steinbach, V. Kumar, Integrating scientific knowledge with machine learning for engineering and environmental systems, ACM Computing Surveys 55 (4) (2022) 1–37.
[206] Z. Huang, X. Yao, Y. Liu, C. O. Dumitru, M. Datcu, J. Han, Physically explainable CNN for SAR image classification, ISPRS Journal of Photogrammetry and Remote Sensing 190 (2022) 25–37.
[207] J. Crocker, K. Kumar, B. Cox, Using explainability to design physics-aware CNNs for solving subsurface inverse problems, Computers and Geotechnics 159 (2023) 105452.
[208] S. Sadeghi Tabas, Explainable physics-informed deep learning for rainfall-runoff modeling and uncertainty assessment across the continental United States (2023).
[209] R. Roscher, B. Bohn, M. F. Duarte, J. Garcke, Explainable machine learning for scientific insights and discoveries, IEEE Access 8 (2020) 42200–42216.
[210] D. Tuia, K. Schindler, B. Demir, G. Camps-Valls, X. X. Zhu, M. Kochupillai, S. Džeroski, J. N. van Rijn, H. H. Hoos, F. Del Frate, et al., Artificial intelligence to advance earth observation: a perspective, arXiv preprint arXiv:2305.08413 (2023).
[211] P. Lopes, E. Silva, C. Braga, T. Oliveira, L. Rosado, XAI Systems Evaluation: A Review of Human and Computer-Centred Methods, Applied Sciences 12 (19) (2022) 9423.
[212] V. Hassija, V. Chamola, A. Mahapatra, A. Singal, D. Goel, K. Huang, S. Scardapane, I. Spinelli, M. Mahmud, A. Hussain, Interpreting black-box models: a review on explainable artificial intelligence, Cognitive Computation 16 (1) (2024) 45–74.
[213] S. Mohseni, N. Zarei, E. D. Ragan, A multidisciplinary survey and framework for design and evaluation of explainable AI systems, ACM Transactions on Interactive Intelligent Systems (TiiS) 11 (3-4) (2021) 1–45.
[214] S. Mohseni, J. E. Block, E. D. Ragan, A human-grounded evaluation benchmark for local explanations of machine learning, arXiv preprint arXiv:1801.05075 (2018).
[215] D. Gunning, D. Aha, DARPA’s Explainable Artificial Intelligence (XAI) program, AI Magazine 40 (2) (2019) 44–58.
[216] M. Nourani, S. Kabir, S. Mohseni, E. D. Ragan, The effects of meaningful and meaningless explanations on trust and perceived system accuracy in intelligent systems, in: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7, 2019, pp. 97–105.
[217] A. Hedström, L. Weber, D. Krakowczyk, D. Bareeva, F. Motzkus, W. Samek, S. Lapuschkin, M. M.-C. Höhne, Quantus: An explainable AI toolkit for responsible evaluation of neural network explanations and beyond, Journal of Machine Learning Research 24 (34) (2023) 1–11.
[218] J. Zhou, A. H. Gandomi, F. Chen, A. Holzinger, Evaluating the quality of machine learning explanations: A survey on methods and metrics, Electronics 10 (5) (2021) 593.
[219] A. F. Markus, J. A. Kors, P. R. Rijnbeek, The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies, Journal of Biomedical Informatics 113 (2021) 103655.
[220] M. Velmurugan, C. Ouyang, C. Moreira, R. Sindhgatta, Developing a fidelity evaluation approach for interpretable machine learning, arXiv preprint arXiv:2106.08492 (2021).
[221] W. Sun, Stability of machine learning algorithms, Ph.D. thesis, Purdue University (2015).
[222] N. Drenkow, N. Sani, I. Shpitser, M. Unberath, A systematic review of robustness in deep learning for computer vision: Mind the gap?, arXiv preprint arXiv:2112.00639 (2021).
[223] G. Schryen, Speedup and efficiency of computational parallelization: A unifying approach and asymptotic analysis, arXiv preprint arXiv:2212.11223 (2022).
[224] J. DeYoung, S. Jain, N. F. Rajani, E. Lehman, C. Xiong, R. Socher, B. C. Wallace, ERASER: A benchmark to evaluate rationalized NLP models, arXiv preprint arXiv:1911.03429 (2019).
[225] A. Thampi, Interpretable AI: Building explainable machine learning systems, Simon and Schuster, 2022.
[226] R. Dwivedi, D. Dave, H. Naik, S. Singhal, R. Omer, P. Patel, B. Qian, Z. Wen, T. Shah, G. Morgan, et al., Explainable AI (XAI): Core ideas, techniques, and solutions, ACM Computing Surveys 55 (9) (2023) 1–33.
[227] S. Wu, H. Fei, L. Qu, W. Ji, T.-S. Chua, NExT-GPT: Any-to-any multimodal LLM, arXiv preprint arXiv:2309.05519 (2023).