12 Diagnostics
Systematic Review
Interpretable Machine Learning Techniques in ECG-Based
Heart Disease Classification: A Systematic Review
Yehualashet Megersa Ayano 1 , Friedhelm Schwenker 2, * , Bisrat Derebssa Dufera 1 , Taye Girma Debelee 3,4
1 Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa 11760, Ethiopia
2 Institute of Neural Information, University of Ulm, 89069 Ulm, Germany
3 Ethiopian Artificial Intelligence Institute, Addis Ababa 40782, Ethiopia
4 College of Electrical and Computer Engineering, Addis Ababa Science and Technology University,
Addis Ababa 16417, Ethiopia
* Correspondence: [email protected]
Abstract: Heart disease is one of the leading causes of mortality throughout the world. Among the
different heart diagnosis techniques, an electrocardiogram (ECG) is the least expensive non-invasive
procedure. However, the following are challenges: the scarcity of medical experts, the complexity
of ECG interpretations, the manifestation similarities of heart disease in ECG signals, and heart
disease comorbidity. Machine learning algorithms are viable alternatives to the traditional diagnoses
of heart disease from ECG signals. However, the black box nature of complex machine learning
algorithms and the difficulty in explaining a model’s outcomes are obstacles for medical practitioners
in having confidence in machine learning models. This observation paves the way for interpretable
machine learning (IML) models as diagnostic tools that can build a physician’s trust and provide
evidence-based diagnoses. Therefore, in this systematic literature review, we studied and analyzed
the research landscape in interpretable machine learning techniques by focusing on heart disease
diagnosis from an ECG signal. In this regard, the contribution of our work is manifold; first, we
present an elaborate discussion on interpretable machine learning techniques. In addition, we identify
and characterize ECG signal recording datasets that are readily available for machine learning-based
tasks. Furthermore, we identify the progress that has been achieved in ECG signal interpretation
using IML techniques. Finally, we discuss the limitations and challenges of IML techniques in
interpreting ECG signals.
Keywords: interpretable; machine learning; IML; ECG; heart disease
Citation: Ayano, Y.M.; Schwenker, F.; Dufera, B.D.; Debelee, T.G. Interpretable Machine Learning
Techniques in ECG-Based Heart Disease Classification: A Systematic Review. Diagnostics 2023, 13, 111.
https://doi.org/10.3390/diagnostics13010111
1. Introduction
that ECG interpretation skills among medical doctors are poor. According to the study,
heart diseases such as acute myocardial infarction (AMI), ventricular tachycardia (VT), and
second-degree AV block were missed by the resident physicians in 13.4%, 44.1%, and 64.6% of cases,
respectively. In addition, the existence of different types of heart disease conditions poses a
challenge for making a diagnosis through reading an ECG signal, even by a well-trained
cardiologist. Moreover, the similarities of heart disease manifestations on ECG signals pose
extra challenges for properly distinguishing them. Apart from these challenges, the ECG
signal recording may show discrepancies for the same disease condition based on age, race,
and the overall physical conditions of patients [2].
To mitigate these challenges and aid physicians in the diagnosis of heart conditions, a
computerized interpretation of ECG records (CIE) was introduced [11]. However, studies
have shown significant inaccuracies of this method and limitations of computerized ECG
interpretation [12]. Thus, despite attempts to improve the accuracies of automated ECG
interpretation techniques, the final ECG interpretation still requires a physician re-read.
Furthermore, the lack of an internationally accepted standard for computerized ECG
interpretation poses a challenge to relying on CIE [11].
Figure 1. The placement of ECG electrodes on the chest, arms, and legs [16].
A single cycle of an ECG contains a pattern of waves, as shown in Figure 3. When the
sinoatrial (SA) node triggers an impulse, the atrial fibers depolarize to produce a potential
difference called a P wave, leading to atrial contraction. In a normal ECG, as shown in
Figure 3, a P wave has a duration of about 0.08 s [14]. A P wave is seen in leads II and V1.
Moreover, it is inverted in lead aVR and upright in leads I and II, as shown in
Figure 2.
After the atrial fiber depolarization, the impulse reaches the ventricular fibers and
rapidly depolarizes them. Since the ventricular walls are thick, the depolarization results
in more electrical changes; it is called the QRS complex, which consists of Q, R, and S waves.
The QRS complex also lasts for about 0.08 s [14]. Then, as the ventricles repolarize, a T wave
is produced. The T wave lasts about 0.16 s in a normal ECG. As Figure 3 shows, atrial
repolarization is missing from the pattern because it occurs at the same time as ventricular
fiber depolarization [14].
As shown in Figure 3, the PR interval is the period between the P wave and the QRS
complex. The PR interval indicates the impulse transmission times between the SA and
atrioventricular (AV) nodes. It contains atrial depolarization, contraction, and depolar-
ization waves via the conduction system. The ST segment, on the other hand, occurs
during the depolarization of the ventricular myocardium, and it lasts about 0.22 s. The QT
interval that lasts about 0.38 s is a period from the start of ventricular depolarization to
repolarization [14]. The TP segment is an isoelectric region that indicates the absence of a
substantial amount of potential difference in the ventricular myocardial cells. It is a resting
state of the ventricular myocardial cell and covers a time from the end of repolarization to
the onset of the next depolarization [17]. Any deviation from this normal cardiac cycle may
indicate heart disease and conduction system problems. As shown in Figure 4, for instance,
a QRS duration greater than 0.12 s, broad monophasic R waves in leads I, V5, and V6, and
the absence of Q waves in leads V5 and V6 are indications of the left bundle branch block
(LBBB) [2].
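The LBBB indications above can be expressed as a simple rule check. The following is a minimal sketch only: the function name, the boolean inputs, and the choice to require all three indications together are assumptions made for illustration, not a procedure described in the reviewed literature.

```python
# Nominal durations quoted in the text, in seconds (for reference only).
NOMINAL_S = {"P": 0.08, "QRS": 0.08, "T": 0.16, "ST": 0.22, "QT": 0.38}

# A QRS duration above this value is one of the LBBB indications in the text.
LBBB_QRS_THRESHOLD_S = 0.12


def suggests_lbbb(qrs_s, broad_r_in_i_v5_v6, q_absent_in_v5_v6):
    """Return True when all three LBBB indications hold.

    Requiring all three at once is an assumption of this sketch.
    """
    return bool(
        qrs_s > LBBB_QRS_THRESHOLD_S
        and broad_r_in_i_v5_v6
        and q_absent_in_v5_v6
    )


wide_qrs_case = suggests_lbbb(0.14, True, True)
normal_qrs_case = suggests_lbbb(NOMINAL_S["QRS"], True, True)
```

In practice the two morphological inputs would themselves come from a wave delineation step, which this sketch leaves out.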
Figure 4. A 12-lead ECG of a patient with exam_id of 1503778 diagnosed with LBBB [18].
explaining the ML model output developed for ECG signal-based heart disease classifi-
cation are investigated and presented in Section 5. Section 6 discusses the performance
evaluation methods for IML techniques focusing on ECG signal-based heart disease classi-
fication. The findings of this review work and existing challenges and future directions are
discussed in Sections 7 and 8. Finally, Section 9 presents the conclusion.
2. Related Work
This section discusses related systematic reviews to examine state-of-the-art research and
challenges in heart disease classification from ECG signals using interpretable machine
learning (IML)-based techniques. To the best of our knowledge, systematic
reviews that are related to IML-based heart disease classification from ECG signals are very
limited in number and scope. However, some works have investigated and discussed the
IML techniques from the point of view of healthcare applications, as well as the existing
challenges and future directions in the field of medicine [32,34–41].
Abdullah et al. [32] provided a comprehensive survey on the uses of IML techniques
in healthcare. The paper presented an in-depth theoretical discussion of the existing
well-known IML techniques. However, only a single piece of literature was reviewed
that focuses on the application of IML on ECG signal-based heart disease classification.
Similarly, Rasheed et al. [36] reviewed a single literature study on IML-based ECG signal
interpretation. However, they provide a comprehensive review of IML techniques that
explain the reason behind their decisions. Likewise, Yang et al. [37], Stiglic et al. [38], and
Jin et al. [41] did not provide reviews on the progress of interpretable techniques on ECG
signal-based heart disease diagnosis. Instead, they described the progress made in using in-
terpretable techniques in explaining black box ML models developed in different healthcare
solutions. In addition, Yang et al. [37] showcased the benefits of ML model interpretable
methods in explaining multi-modal and multi-fusion medical image segmentation. On the
other hand, Stiglic et al. [38] emphasized feature importance-based ML model explanations.
Jin et al. [41], in contrast, provided a discussion on the benefits and limitations of various
ML model interpretability techniques to acquaint researchers and practitioners with IML
in the fields of ML and medicine so that they can contribute to the field. However, the
mathematical foundations in ML interpretable methods are not briefly discussed in these
review works [36–38,41].
Du et al. [39] and Carvalho et al. [40] presented the need for explaining the predictions of
complex ML models by providing human-friendly explanations within societal ethics and
legal frameworks. In this regard, Du et al. [39] discussed some IML techniques and their
categorization. Moreover, they outlined challenges to be addressed while designing and
evaluating these techniques. Similarly, Carvalho et al. [40] provided an elaborate discussion
on the categorization of IML techniques and presented the need for explaining ML by
focusing on the societal impacts. In addition, the work focused on identifying the mechanism
for assessing the quality of the explanation and metrics to evaluate the explanations
provided by IML techniques.
Xiong et al. [34] reviewed the most popular deep learning algorithms for detecting
and locating myocardial infarctions. Furthermore, the paper discussed the necessity of the
model’s explainability for evidence-based medical diagnosis. However, the review did not
include a discussion on IML-based myocardial infarction detection techniques. Similarly,
Somani et al. [35] reviewed deep learning-based literature aimed at detecting and classifying
five types of heart disease from an ECG, including arrhythmia, cardiomyopathy,
myocardial ischemia, valvulopathy, and non-cardiac diseases. The article pinpointed the
potential of deep learning models in heart disease detection, especially for mass screening
purposes. However, only a very limited and shallow discussion on the interpretable model was
presented. A summary of related works is given in Table 1.
Author, Year | Contributions | Limitations
--- | --- | ---
Abdullah et al. [32], 2021 | • Presented a comprehensive survey on the uses of IML techniques in healthcare; • The paper presented an in-depth theoretical discussion of the existing well-known IML techniques. | • Only a single piece of literature was reviewed that focuses on the application of IML on ECG signal-based heart disease classification; • Limited discussion on how to evaluate the performance of IML techniques.
Xiong et al. [34], 2022 | • Reviewed the most popular deep learning algorithms for detecting and locating myocardial infarctions. | • Did not include a discussion on the interpretability of ML models used for myocardial infarction detection.
Rasheed et al. [36], 2021 | • Provided a comprehensive review of IML techniques. | • Reviewed a single piece of literature on IML-based ECG signal interpretation.
Yang et al. [37], 2022 | • Described the progress made in applying explainable AI in healthcare; • Showcased the importance of explainable AI in clinical scenarios. | • The review did not include literature on interpreting ML models designed for ECG signal-based heart disease classification.
3. Method
This section presents the methodology employed for reviewing the use of IML tech-
niques for the detection of heart disease using an ECG signal. To that end, the preferred
reporting items for systematic reviews and meta-analyses (PRISMA) [42,43] reporting tech-
nique is used to define the research questions, data sources (databases), and search strings
for this particular research study. Based on the PRISMA guideline, the following steps are
followed to accomplish our systematic review work.
• Defining the research questions;
• Based on the research questions, retrieving some keywords to create proper search strings;
• Identifying the databases for performing the search using the created search strings;
• Setting filtering criteria, including the chronological period, the quality, and the type
of literature to be included in the review;
• Skimming titles and abstracts to remove unrelated articles and duplicates from the pool
of papers;
• Defining more detailed suitability criteria and using them in a full paper reading of the
papers remaining from the previous steps;
• Analyzing and interpreting the articles that passed all the filtering procedures in
line with the research questions defined at the beginning;
• Reporting and evaluating the systematic review.
RQ1 | Are there any freely available heart ECG signal datasets? What are their characteristics? | • Identify heart ECG signal datasets; • The characteristics, nature, and important features of ECG
3. IEEE Xplore: this database contains high-quality technical literature in the fields of
electrical engineering, electronics, computer science, and other related fields;
4. ScienceDirect: this database provides access to journals and technical articles published
by Elsevier;
5. MDPI: a publisher of open-access peer-reviewed scientific journals;
6. Wiley Online Library: a repository of published articles in various disciplines,
including computational, intelligent systems, and life sciences;
7. SpringerLink: this database provides access to scientific articles published by
Springer Nature.
By rigorously following the steps listed above, our systematic review work is aimed at
achieving three targets: (1) to be used as a reference in the existing IML techniques that
use ECG signals for heart disease classification; (2) to help researchers in avoiding work
redundancy; (3) to aid researchers in the area to identify research gaps in an evidence-based
heart disease diagnosis using IML.
To meet these targets, this review first presents an elaborate discussion on interpretable
machine learning techniques. In addition, it identifies and characterizes heart
disease ECG signal datasets that are readily available for machine learning-based research.
Furthermore, it identifies the progress that has been achieved in ECG signal interpretation
using IML techniques in terms of different IML model performance measuring techniques.
Finally, it discusses the limitations and challenges of IML techniques in interpreting an
ECG signal.
The search strings used to find literature for this review were tailored to each of these
seven databases so that relevant literature from any of them would not be missed. As a result,
the search strings used for Google Scholar, ScienceDirect, PubMed, Wiley Online Library,
and SpringerLink are the following: [(“Explainable” OR “Interpretable”) AND (“Machine
learning Techniques” OR “Deep Learning Techniques”) AND (“Heart Disease”) AND
(“Electrocardiogram” OR “ECG”) AND (“Detection” OR “Classification”)], for IEEE Xplore
is: [(“All Metadata”: Interpretable) AND (“All Metadata”: Machine learning techniques)
OR (“All Metadata”: Deep learning techniques) AND (“All Metadata”: Heart disease
detection) AND (“All Metadata”: ECG signal)], and for MDPI is: [(“Interpretable OR
Explainable”) AND (“Machine learning” OR “Deep learning”) AND (“Heart disease”)
AND (“ECG signal”)].
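Boolean search strings of this shape can be assembled programmatically, which makes it easier to keep the per-database variants consistent. The sketch below rebuilds the first (general) search string from the text; the helper name `build_query` and the list layout are assumptions for illustration, and straight quotes stand in for the typographic quotes of the original.

```python
def build_query(groups):
    """Join OR-groups of quoted terms with AND, in square brackets."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return "[" + " AND ".join(clauses) + "]"


# Term groups copied from the general search string reported in the text.
GENERAL_GROUPS = [
    ["Explainable", "Interpretable"],
    ["Machine learning Techniques", "Deep Learning Techniques"],
    ["Heart Disease"],
    ["Electrocardiogram", "ECG"],
    ["Detection", "Classification"],
]

query = build_query(GENERAL_GROUPS)
print(query)
```

Each database still needs its own field syntax (for example, the "All Metadata" prefix used for IEEE Xplore), so the output is a starting point, not a portable query.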
The inclusion and exclusion criteria for the identified literature are indicated in Table 3.
On the other hand, Figure 5 shows the literature selection process for our systematic review.
Furthermore, the total number of journal articles identified for the quantitative analysis,
and the stages for the inclusion and exclusion criteria used in the selection process are
clearly shown in Figure 5.
Dataset | # of Leads | # of Records | # of Classes [Including Normal] | Samp. Freq. (Hz) | Website URL ¹
--- | --- | --- | --- | --- | ---
Hannun et al. [53] | Single lead | 91,232 | 12 | 200 | https://irhythm.github.io/cardiol_test_set/ ²
2017 PhysioNet Challenge [54,55] | Single lead | 8528 | 4 | 300 | https://archive.physionet.org/physiobank/database/challenge/2017/
2020 PhysioNet Challenge [47,56] | 12-lead | 43,101 | 111 | 257, 500 | https://physionet.org/content/challenge-2020/1.0.2/
Chapman University, Shaoxing People’s Hospital [57] | 12-lead | 10,646 | 11 | 500 | https://physionet.org/content/ecg-arrhythmia/1.0.0/#files-panel
China Physiological Signal Challenge [44] | 12-lead | 6877 | 9 | 500 | http://2018.icbeb.org/Challenge.html
PTB-XL ECG dataset [46,58] | 12-lead | 21,837 | 71 | 500 | https://physionet.org/content/ptb-xl/1.0.1/
Shandong Provincial Hospital [59] | 12-lead | 25,770 | 44 | 500 | https://springernature.figshare.com/collections/A_large-scale_multi-label_12-lead_electrocardiogram_database_with_standardized_diagnostic_statements/5779802/1
CODE dataset [18] | 12-lead | 2,322,513 | 7 | 300–600 | https://zenodo.org/record/4916206#.Y1eIWuxBxmo ³
¹ All website URLs were accessed on 25 October 2022. ² Only test data are available through this URL. The
complete dataset can be obtained upon request from Hannun et al. [53]. ³ Only 15% is available through this
URL. The complete dataset can be obtained upon request from Ribeiro et al. [18].
Table 5. Beat, rhythm, and signal quality level of the annotated heart ECG signal datasets.
Dataset | # of Leads | # of Records | Annotation Type | # of Classes [Including Normal] | Samp. Freq. (Hz) | Website URL ¹
--- | --- | --- | --- | --- | --- | ---
MIT-BIH Arrhythmia database [48] | 2 leads | 48 two-channel half-hour recordings | • Beat; • Rhythm; • Signal quality | • 20 classes of arrhythmia beats; • 15 classes of arrhythmia rhythms; • 5 classes of signal quality | 360 | https://physionet.org/content/mitdb/1.0.0/
MIT-BIH Atrial Fibrillation Database [49] | 2 leads | 25 two-channel 10-h recordings | • Rhythm | • 4 classes of rhythms | 250 | https://physionet.org/content/afdb/1.0.0/
MIT-BIH Normal Sinus Rhythm Database [50] | 2 leads | 18 two-channel 24-h recordings | • Beat; • Rhythm | • Normal beats and rhythms | 128 | https://physionet.org/content/nsrdb/1.0.0/
BIDMC-Congestive Heart failure (CHF) database [51] | 2 leads | 15 two-channel 20-h recordings | • Beat | • CHF (NYHA class 3–4) | 250 | https://physionet.org/content/chfdb/1.0.0/
Normal sinus rhythm RR interval database [52] | 2 leads | 54 two-channel half-hour recordings | • Beat | • Normal beats | 128 | https://physionet.org/content/nsr2db/1.0.0/
¹ All website URLs were accessed on 25 October 2022.
where $x_S$ represents the input feature values in a set $S$, $f_S(x_S)$ represents the marginal value
of $f$ for the features present in $S$, and $f_{S\cup\{i\}}(x_{S\cup\{i\}})$ denotes the marginal value of $f$ for the
feature values present in $S$ plus feature $X_i$. Thus, Equation (1) computes the disparity over
all possible subsets $S \subseteq F \setminus \{i\}$ weighted by the number of features in $S$ from the total
number of features, $F$.
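The subset enumeration described here can be written out directly for a small number of features. The sketch below assumes the standard Shapley weighting $|S|!\,(|F|-|S|-1)!/|F|!$ for each subset and uses a hypothetical additive value function (`toy_value`) in place of a trained model; both are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset S of the other features."""
    features = range(n_features)
    phi = [0.0] * n_features
    for i in features:
        others = [j for j in features if j != i]
        for size in range(len(others) + 1):
            # Standard weight |S|! (|F| - |S| - 1)! / |F|! for subsets of this size.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical additive "model": a subset's value is the sum of fixed contributions.
CONTRIB = [0.5, -0.2, 0.1]


def toy_value(subset):
    return sum(CONTRIB[j] for j in subset)


phi = shapley_values(toy_value, n_features=3)
```

For an additive value function the Shapley value of each feature equals its own contribution, which gives a quick sanity check. The nested loops visit $2^{|F|-1}$ subsets per feature, which is the exponential cost the text refers to.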
Though the interpretation obtained from the Shapley values of the features can be
comprehended and has been thoroughly tested for interpreting ECG-based ML models [68–71], the
Shapley technique still has limitations. The major challenge is the computational burden
associated with calculating Shapley values for all feature subsets, where the computational
complexity is exponential [72]. In addition, it does not consider the correlation between
the features. Instead, it treats all features as independent [66,73]. To mitigate
these limitations, techniques such as restricting the subset permutation using the causal
relationship of features [74] and incorporating the constraint of correlations among feature
values [75,76] have been proposed.
Moreover, to overcome the computational cost of Equation (1), kernel
SHAP [72] and treeSHAP [77] have been introduced. However, the computational
complexity of SHAP-based post hoc model explanation techniques remains high. In
addition, they can be tricked into rationalizing decisions made by an unfair black box ML
model; that is, they can be fooled [78].
A few studies have attempted to show the applicability of LIME in interpreting
the outputs of ECG signal-based heart disease classification ML models [80,81]. LIME
provides an easily understandable explanation, although it depends on the complexity
of the local surrogate models. The interpretations made by the local surrogate models
use features sampled from the original dataset. This process adds to the importance of
LIME techniques, specifically when complex features are employed to train the black box
ML model. However, the feature importance scores in LIME do not add up to
the prediction probabilities, which creates ambiguity. Moreover, they do not deliver a global
explanation of the learned complex ML model over the entire spectrum of feature values. In
addition, the random perturbation of feature instances makes LIME techniques unstable,
which poses challenges in reproducing the explanations. Furthermore,
LIME can be manipulated to hide biases [78]. As a result, different techniques have been
proposed in the literature to mitigate this instability and the resulting unfaithfulness of
LIME [82–85].
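The local surrogate idea can be illustrated with a few lines of NumPy. This is a simplified sketch in the spirit of LIME, not the LIME library itself: the Gaussian perturbation, the exponential distance kernel, the function name `local_surrogate`, and the logistic `black_box` are all assumptions chosen for illustration.

```python
import numpy as np


def local_surrogate(predict_fn, x, n_samples=500, scale=0.1, kernel_width=0.75, seed=0):
    """Fit a distance-weighted linear surrogate around x.

    Returns (coefficients, intercept) of the local linear model.
    """
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (an assumption of this sketch).
    Z = x + scale * rng.standard_normal((n_samples, x.size))
    y = predict_fn(Z)
    # Exponential kernel: perturbed points closer to x receive larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # Weighted least squares, implemented by scaling rows with sqrt(weight).
    A = np.hstack([Z, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1], coef[-1]


# Hypothetical black box: a logistic score that ignores the third feature.
def black_box(Z):
    return 1.0 / (1.0 + np.exp(-(2.0 * Z[:, 0] - 1.0 * Z[:, 1])))


x0 = np.array([0.2, -0.1, 0.4])
coefs, intercept = local_surrogate(black_box, x0)
coefs_other_seed, _ = local_surrogate(black_box, x0, seed=1)
```

Changing `seed` changes the sampled perturbations and therefore the fitted coefficients slightly, which is the source of the instability discussed above.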
$$\mathrm{PFI}(f, j) = \frac{1}{nk} \sum_{i=1}^{n} \sum_{l=1}^{k} \left[ L\left[y^{(i)}, f\left(x_j^{(\tau_l)(i)}\right)\right] - L\left[y^{(i)}, f\left(x^{(i)}\right)\right] \right] \tag{4}$$
where $\tau_l$ is a random permutation vector of instances in a dataset, $D = \{x^{(i)}, y^{(i)}\}_{i=1}^{n}$, with
$n$ instances for $l = 1, \ldots, k$ permutations. $L$ is a loss function linking the model output $f(x)$
to the target $y$. Thus, $L[y^{(i)}, f(x_j^{(\tau_l)(i)})]$ is the loss function linking the perturbed output
of the model $f(x_j^{(\tau_l)(i)}) = f(x_1^{(i)}, x_2^{(i)}, \ldots, x_j^{(\tau_l)(i)}, \ldots, x_M^{(i)})$ to the target $y^{(i)}$ with respect
to the perturbed feature $x_j$, and $L[y^{(i)}, f(x^{(i)})]$ gives a baseline loss linking the baseline
output of the model, $f(x^{(i)})$, to the target $y^{(i)}$ for the instance $i$.
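Equation (4) translates almost line by line into code. The following sketch assumes a per-instance loss function and a model exposed as a plain prediction function; the names `permutation_feature_importance`, `squared_error`, and `model`, and the synthetic data, are hypothetical.

```python
import numpy as np


def permutation_feature_importance(predict_fn, loss_fn, X, y, j, k=10, seed=0):
    """Mean increase in loss when column j is permuted, averaged over k permutations."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    baseline = loss_fn(y, predict_fn(X))  # per-instance baseline losses
    total = 0.0
    for _ in range(k):
        X_perm = X.copy()
        # Permute feature j across instances; all other features stay in place.
        X_perm[:, j] = X[rng.permutation(n), j]
        total += np.sum(loss_fn(y, predict_fn(X_perm)) - baseline)
    return total / (n * k)


def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2


# Hypothetical model that uses feature 0 and ignores feature 1.
def model(X):
    return 3.0 * X[:, 0]


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = 3.0 * X[:, 0]

pfi_used = permutation_feature_importance(model, squared_error, X, y, j=0)
pfi_unused = permutation_feature_importance(model, squared_error, X, y, j=1)
```

The feature that the model ignores gets an importance of exactly zero, while permuting the feature it relies on raises the loss. Note that the labels `y` are required, which is why PFI is tied to the development phase.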
PFI has been experimented with to explain the classification output of ML models; PFI can
give model-agnostic global insight into the black box model, $f$. It also takes into account
the dependency between features while determining their importance. In addition, it
avoids retraining a model with a different subset of features, which saves time and also
avoids ending up with a different model as a result of retraining. Furthermore, the
computational complexity associated with PFI is small enough to make the implementation
easy. However, PFI needs the labeled ground truth of a given instance to calculate the feature
importance. This limitation restricts PFI to the model’s development phase,
i.e., the training and testing of an ML model. Likewise, in situations where strongly
correlated features exist in a dataset, the result from PFI may be biased to the extent that
less important features can take the highest importance value [89].
where $w_k^c$ is the weight of the FC layer for filter $k$ and class $c$; $i$ and $j$ are indices of the
last feature map units; $c$ is the class category; and $k$ is a filter index.
The main aim of CAM is to find the contribution of the last feature maps that satisfy
$y^c = \sum_{i,j} L_{ij}^c$. Thus, the contribution of each unit in the last feature map, $L_{ij}^c$, can be obtained
from Equation (6), as shown in Equation (7):
In a single-dimensional time series signal, such as an ECG signal, the class activation
map for class $c$ at the specific temporal instance $t$ is as indicated in Equation (8):
where $F_t^k$ is the activation of filter $k$ in the last convolutional layer at the temporal instance $t$,
and $L_t^c$ indicates the importance of the activation at the temporal location $t$ leading to the
categorization of a signal into class $c$.
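For a one-dimensional signal, the class activation map reduces to a weighted sum of the last-layer activations at each time step. The sketch below assumes the usual form $L_t^c = \sum_k w_k^c F_t^k$ with sum pooling over time before the FC layer; the array shapes, random values, and function name are illustrative assumptions.

```python
import numpy as np


def class_activation_map_1d(feature_maps, fc_weights, class_idx):
    """Compute L_t^c = sum_k w_k^c * F_t^k for every time step t.

    feature_maps: (T, K) activations of the last convolutional layer.
    fc_weights:   (C, K) weights of the FC layer, one row per class.
    """
    return feature_maps @ fc_weights[class_idx]  # shape (T,)


rng = np.random.default_rng(0)
F = rng.random((50, 8))           # hypothetical activations: 50 time steps, 8 filters
W = rng.standard_normal((3, 8))   # hypothetical FC weights for 3 classes

cam = class_activation_map_1d(F, W, class_idx=1)

# With sum pooling over time, the class score equals the sum of the map.
class_score = W[1] @ F.sum(axis=0)
```

The last line mirrors the property that the class score is the sum of the per-location contributions, here over time steps instead of spatial indices.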
CAM has been used for interpreting an ECG signal classification result of a convo-
lutional neural network [114]. Accordingly, it allows the visualization of segments of an
ECG signal that the classification model mainly uses in its decision. Techniques such as
Grad-CAM [98,99,115–123], Grad-CAM++ [101,124], and guided Grad-CAM [125] have
been proposed for ECG signal-based heart disease classification. However, the linear
layers suppress the non-linearity of deep classifiers. In addition, the integration of CAM changes
the network architecture and needs retraining [126]. Moreover, these gradient-based CAMs
suffer from a gradient saturation problem that results in inaccurate localization of relevant
regions. In addition, the localization of the descriptive signal part is highly affected by
small perturbations of the input signal. Furthermore, the explanation is noisy and contains
discontinuities [126].
Saliency Maps
A feature saliency map highlights the regions of a signal that are most relevant for
categorizing the input signal into a given class. The saliency map can be built using
gradients of the output, $y^c(x)$, of an ML model over the input, $x$, for the class $c$ [102]. The
idea is that the class score $y^c$ can be approximated by using the first-order Taylor expansion
as given in Equation (9):
$$y^c(x) \approx w^T x + b \tag{9}$$
where $b$ is a scalar, and $w$, as indicated in Equation (10), is the gradient that provides an
explanation for the model classification outcome:
$$w = \frac{\partial y^c(x)}{\partial x} \tag{10}$$
Among other techniques, the saliency map can be generated using guided backpropa-
gation where the gradient of each neuron is calculated and those with the highest gradient
values are activated to form a heatmap [103]. The heatmap shows the most salient parts of
the signal that contribute most to classifying the input x to class c.
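Equation (10) can be checked numerically on a model whose gradient is known in closed form. The sketch below uses a hypothetical logistic scorer as the "model" and compares the analytic gradient with a central-difference estimate; it implements the plain gradient saliency, not guided backpropagation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def class_score(x, weights, bias):
    """Hypothetical class score y^c(x) of a logistic model."""
    return sigmoid(weights @ x + bias)


def saliency(x, weights, bias):
    """Analytic gradient of the class score with respect to the input x."""
    s = class_score(x, weights, bias)
    return s * (1.0 - s) * weights


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for t in range(x.size):
        step = np.zeros_like(x)
        step[t] = eps
        grad[t] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal(20)          # hypothetical 20-sample signal segment
weights = rng.standard_normal(20)
bias = 0.1

sal = saliency(x, weights, bias)
sal_check = numerical_gradient(lambda v: class_score(v, weights, bias), x)
heat = np.abs(sal)                   # magnitude is what a heatmap would display
```

For a deep network the same gradient would normally come from automatic differentiation; the finite-difference check is only practical for short inputs.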
Saliency maps were experimented with for explaining complex ML models in ECG
signal-based heart disease diagnosis [102,103,127,128]. Although the backpropagation
gradient saliency map can visually enhance regions of the input signal that contribute
the most to classification, it has certain limitations. First, the backpropagation saliency
suffers from a gradient saturation problem mainly because saliency maps are based on
input sensitivity [129]. Next, the generated gradient heatmap often does not explain
the direct relation to the classifier’s decision. Instead, it only indicates the important
signal segments used by the model for classification [130]. More importantly, the saliency
method is susceptible to small shifts in the input signal so that its explanation may not be
reliable [131].
$$R_i^{(l)} = \sum_{j} R_{i \leftarrow j}^{(l,l+1)} \quad \text{such that } i \text{ contributes to } j \tag{11}$$
The propagation of relevance scores $R_j$ of layer $l+1$ onto neurons of layer $l$ can be
achieved using different types of rules. Moreover, different rules can be used at each layer
of the network architecture [133]. One of the simplest rules is given in Equation (12) [132]:
$$R_i = \sum_{j} \frac{a_i w_{ij}}{\sum_{0,i} a_i w_{ij}} R_j \tag{12}$$
where $a_i$ is the activation of neuron $i$, $w_{ij}$ is the weight connecting neuron $i$ to neuron
$j$, and $\sum_{0,i}$ indicates the sum over all neurons $i$ in layer $l$. Moreover, the rule satisfies
the basic properties that deactivated neurons, neurons with no connection, and neurons with
zero weight receive no relevance.
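The rule in Equation (12) can be applied to a single dense layer in a few lines. In this sketch the bias term is omitted and a small constant guards against division by zero; both choices, along with the function name and the example values, are assumptions for illustration.

```python
import numpy as np


def lrp_0(activations, weights, relevance_out, eps=1e-12):
    """Redistribute relevance from layer l+1 onto layer l using Equation (12).

    activations:   (I,) activations a_i of layer l.
    weights:       (I, J) weights w_ij from layer l to layer l+1.
    relevance_out: (J,) relevance scores R_j of layer l+1.
    """
    z = activations[:, None] * weights        # contributions a_i * w_ij, shape (I, J)
    denom = z.sum(axis=0)                     # sum over i for each upper neuron j
    # Guard against division by zero (an assumption of this sketch).
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (z / denom) @ relevance_out        # R_i = sum_j (z_ij / denom_j) * R_j


rng = np.random.default_rng(0)
a = np.array([1.0, 0.0, 2.0])                 # the second neuron is deactivated
W = rng.random((3, 4)) + 0.1                  # hypothetical positive weights
R_out = np.array([0.1, 0.2, 0.3, 0.4])

R_in = lrp_0(a, W, R_out)
```

Two properties from the text are visible in the result: the deactivated neuron receives zero relevance, and the total relevance is conserved between the two layers.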
LRP has been used for interpreting the DL model output through heat mapping
the relevant regions of the input that contribute most to the output prediction. Because it
produces less noise around the target class and can show the parts of a signal that
negatively contribute to the output, LRP is superior to gradient-based explanation
techniques [133,134]. However, the heatmap produced by LRP is still noisy due to the
initialization of the non-target class to a zero relevance value. Moreover, it has limited
ability to discriminate between targets, producing identical heatmaps for different entities
in an input signal [135]. Furthermore, the selection of propagation rules is problem-dependent,
and obtaining the best parameters is non-trivial [136]. As a result, different techniques, such as
contrastive LRP [137], selective LRP [135], and softmax–gradient LRP [138], have been
proposed in the literature to alleviate these challenges.
Occlusion Map
The occlusion map is one of the attribution-based techniques where the model output
is explained by changing part of the input data with different values [139]. The input
can be altered on a specific location, for instance, in a time series signal such as an ECG
with total h time points, the alteration can cover certain time step durations (d) with an
occlusion window of (w). For a signal $x = \{t_1, t_2, \ldots, t_h\}$, the locally altered signal ($\hat{x}$) can
be obtained as in Equation (13) [139]:
$$\hat{x} = (x \odot m_1) + o_v \odot m_2 \tag{13}$$
where $m_1$ and $m_2$ are masks that complement each other, i.e., $m_2 = \neg m_1$, and $o_v$ are the
occluding values. The values for $m_1$, $m_2$, and $o_v$ are determined based on the required
modifications on $x$.
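Equation (13) and the sliding occlusion window can be sketched as follows. The score function, the constant occluding value, and the parameter names `width` and `stride` are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def occlude(x, start, width, value=0.0):
    """Build x_hat = x * m1 + value * m2, where m2 marks the window and m1 = not m2."""
    m2 = np.zeros(x.shape, dtype=bool)
    m2[start:start + width] = True
    m1 = ~m2
    return x * m1 + value * m2


def occlusion_map(score_fn, x, width, stride, value=0.0):
    """Drop in the model score when each window of the signal is occluded."""
    base = score_fn(x)
    starts = range(0, x.size - width + 1, stride)
    return np.array([base - score_fn(occlude(x, s, width, value)) for s in starts])


# Hypothetical score that depends only on samples 40..59 of the signal.
def toy_score(x):
    return float(x[40:60].sum())


signal = np.ones(100)
omap = occlusion_map(toy_score, signal, width=20, stride=20)
most_relevant_window = int(np.argmax(omap))
```

Each window costs one extra model evaluation, so the cost grows with the signal length divided by the stride, which is the computational burden noted for perturbation-based methods.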
Occlusion-based ML model interpretation algorithms are simple to implement.
Moreover, they can measure the marginal effects of each windowed region of the input signal,
provided that the segments of the input are independent [140,141]. In addition, unlike
gradient-based explanation techniques, the occlusion method can be used to interpret the output
of non-differentiable ML models [102]. However, similar to other perturbation-based model
output explanation methods, such as LIME and Shapley value maps, the computational
complexity associated with input occlusion is high [142,143].
Attention Mechanisms
Attention mechanisms are commonly used in time-series data because of their ability
to improve the limitation of traditional encoder–decoder-based models [106,144]. The
attention mechanism can be incorporated into ML networks and it allows the ML model
to focus on specific regions of an input signal that contribute most to the output
prediction [105,106,144–148]. Moreover, domain-specific knowledge can be integrated to guide
attention mechanisms so that the contribution of each segment of a signal in the model’s
classification output is captured [145].
The attention mechanism takes the encoder output (latent vector) as the input and
performs three consecutive computations, which are alignment scoring ($e_{ij}$), computing
attention weights, and attention score vector computation, as given in Equations (14),
(15), and (16) [149], respectively.
where $a$ is an alignment model whose score $e_{ij}$ measures how well the input around
position $j$ of the encoder’s hidden state $h_j$ matches the previous decoder hidden state $s_{i-1}$ at
position $i$ just before emitting. Then, the attention weight score ($\alpha_{ij}$) of each $h_j$ is computed
by applying an activation function, for instance, the softmax activation function, on the
alignment score as shown in Equation (15).
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})} \tag{15}$$
where $T$ is the number of the encoder’s hidden states. Finally, the attention score vector,
which is the output of the attention mechanism, is computed as a weighted sum of all
encoder hidden states, as shown in Equation (16).
$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j \tag{16}$$
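The three computations (alignment scoring, softmax weighting, and the weighted sum) fit in a short function. In this sketch a dot product stands in for the alignment model $a$, which is an assumption made for brevity; the hidden states are random placeholders.

```python
import numpy as np


def dot_score(s_prev, h):
    """Stand-in alignment model: a dot product (an assumption of this sketch)."""
    return float(s_prev @ h)


def attention_context(s_prev, H, score_fn=dot_score):
    """Return attention weights alpha_ij and the attention score vector c_i.

    s_prev: (d,) previous decoder hidden state s_{i-1}.
    H:      (T, d) encoder hidden states h_1..h_T.
    """
    e = np.array([score_fn(s_prev, h) for h in H])  # alignment scores e_ij
    e = e - e.max()                                  # shift for numerical stability
    alpha = np.exp(e) / np.exp(e).sum()              # softmax, Equation (15)
    context = alpha @ H                              # weighted sum, Equation (16)
    return alpha, context


rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))    # hypothetical: 6 encoder states of size 4
s_prev = rng.standard_normal(4)

alpha, context = attention_context(s_prev, H)
```

The weights `alpha` are what an attention-based explanation visualizes: one non-negative value per encoder position, summing to one.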
Based on the techniques employed for generating attention scores, attention mecha-
nisms are broadly classified into deterministic attention and stochastic attention [150]. In
deterministic attention, attention scores are calculated as the weighted sum of all hidden
states, whereas, in stochastic attention, attention scores are determined by selecting one of
the hidden states, h j .
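The difference between the two families can be illustrated with a short Python sketch. The function names and toy values are hypothetical, and sampling the selected hidden state with probability equal to its attention weight is an assumption made for this example; the text above only states that one hidden state is selected.

```python
import random

def deterministic_attention(weights, hidden_states):
    """Deterministic case: weighted sum of all hidden states."""
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]

def stochastic_attention(weights, hidden_states, rng):
    """Stochastic case: select a single hidden state.
    Assumption: the state is sampled with probability given by its attention weight."""
    return rng.choices(hidden_states, weights=weights, k=1)[0]

weights = [0.7, 0.2, 0.1]
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

soft = deterministic_attention(weights, hidden_states)
hard = stochastic_attention(weights, hidden_states, random.Random(0))
```

In this sketch, `soft` blends all hidden states, whereas `hard` is always exactly one of them.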
In addition to improving the performance of ML models in ECG signal-based heart disease
classification, the attention mechanism provides a means of interpreting the model’s
output [105–108,144]. However, the computational complexity associated with the
attention mechanism is a limitation that needs to be addressed [144].
segment of input instances or feature values on the output of the model [62,84]. Thus,
these techniques help to understand the causal relations between specific input instances
and their corresponding ML model outputs [39]. However, an explanation obtained
from these techniques is valid only for a single input instance and does not generalize. In
addition, the explanations obtained from these techniques lack stability; that is,
running the same technique repeatedly on the same input may produce different
explanations. Furthermore, the local surrogate model may approximate
the complex ML model spuriously, i.e., the explanation may have no real connection with
the ML model [158,159].
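The stability issue can be made concrete with a small perturbation-based sketch. Everything here is an assumption for illustration: `black_box` is a toy stand-in for a trained classifier, and `local_importance` is a simplified sampling scheme in the spirit of perturbation-based local explainers, not the algorithm of any specific method cited above.

```python
import random

def black_box(signal):
    """Toy stand-in for a trained classifier: mostly driven by the middle segment."""
    return 0.8 * max(signal[4:8]) + 0.1 * max(signal[0:4])

def local_importance(signal, n_segments, n_samples, seed):
    """Randomly zero out segments and compare the mean output when a segment
    is kept against the mean output when it is masked."""
    rng = random.Random(seed)
    seg_len = len(signal) // n_segments
    kept = [[] for _ in range(n_segments)]
    masked = [[] for _ in range(n_segments)]
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in range(n_segments)]
        perturbed = [x if mask[i // seg_len] else 0.0 for i, x in enumerate(signal)]
        y = black_box(perturbed)
        for s in range(n_segments):
            (kept if mask[s] else masked)[s].append(y)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return [mean(kept[s]) - mean(masked[s]) for s in range(n_segments)]

signal = [0.1, 0.2, 0.1, 0.0, 0.3, 1.0, 0.4, 0.1, 0.0, 0.1, 0.2, 0.1]
run_a = local_importance(signal, n_segments=3, n_samples=20, seed=0)
run_b = local_importance(signal, n_segments=3, n_samples=20, seed=1)
```

Because the perturbations are sampled at random, `run_a` and `run_b` will generally differ even though the input and the model are identical; fixing the seed or increasing `n_samples` is one simple way to reduce this variation.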
On the other hand, global model interpretation methods focus on answering the question ‘how does
an ML model make a prediction?’. These methods try to understand how subsets of the
model influence the model’s decisions. Global interpretability can be achieved by
training the model with interpretable constraints together with the input data [39]. It can also
be achieved by demonstrating the statistical contribution of each feature to the decision of
the underlying black box model. Furthermore, a global explanation can be obtained
by capturing the representations at the intermediate layers of complex DL models. Thus, these
techniques help to understand the inner working mechanisms of ML models and increase
the model’s transparency [39]. However, globally scoped interpretation techniques often
fail to explain a model output for a specific input instance. Nevertheless, methods
have been proposed in the literature for obtaining a global explanation of the black box
model by aggregating local explanations [160].
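One simple way to aggregate local explanations into a global one is to average the magnitude of each feature’s local attribution over many instances. The sketch below assumes this mean-absolute-value convention and uses made-up attribution values; it is not necessarily the aggregation scheme of [160].

```python
def aggregate_local_explanations(local_attributions):
    """Mean absolute attribution per feature across all explained instances."""
    n_instances = len(local_attributions)
    n_features = len(local_attributions[0])
    return [
        sum(abs(row[f]) for row in local_attributions) / n_instances
        for f in range(n_features)
    ]

# Hypothetical local attributions: one row per explained instance,
# one column per feature (values are illustrative only).
local_attributions = [
    [0.5, -0.1, 0.0],
    [-0.3, 0.2, 0.1],
    [0.4, 0.0, -0.1],
]

global_importance = aggregate_local_explanations(local_attributions)
# Feature indices ordered from most to least important overall
ranking = sorted(range(len(global_importance)), key=lambda f: -global_importance[f])
```

Taking absolute values prevents positive and negative local contributions from cancelling out, at the cost of discarding the direction of each feature’s effect.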
literature, methods have been proposed to mitigate the trade-off between model
performance and interpretability. One such method adds semantically meaningful
constraints to complex models to improve interpretability without a significant loss in
performance [91]. Moreover, domain-specific knowledge can be integrated with complex
ML models through attention mechanisms to improve interpretability, as discussed in
Section 5.2.3 of this article.
The post hoc explanation methods are usually applied after the ML model is trained
and provide an explanation without modifying the trained model. Moreover, the complex
ML model can be approximated by surrogate models, such as decision trees and shallow
neural networks. These surrogate models provide a global post hoc model-agnostic
explanation by mimicking the complex ML model [161–163]. These techniques are much more
flexible and can be applied to explain different black box ML models. However, post hoc
methods compromise the fidelity of the explanation and may fail to represent
the behavior of the complex ML model [39].
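The idea of a global surrogate, and the fidelity loss it can incur, can be sketched as follows. The example is hypothetical throughout: `black_box` is a toy two-feature rule standing in for a complex model, and the surrogate is a one-split decision tree (a stump) fitted to the black box’s own predictions rather than to ground-truth labels.

```python
def black_box(x):
    """Toy stand-in for a complex model: a linear rule over two features."""
    return 1 if x[0] + 0.5 * x[1] > 1.0 else 0

def fit_stump(X, y):
    """Fit a one-split decision tree that best reproduces the labels y."""
    best = (-1, None, None)  # (number of agreements, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            agree = sum((1 if x[f] > t else 0) == label for x, label in zip(X, y))
            if agree > best[0]:
                best = (agree, f, t)
    return best[1], best[2]

# Inputs on a small grid; the surrogate is trained on the black box outputs
grid = [i * 0.5 for i in range(5)]
X = [(a, b) for a in grid for b in grid]
y_black_box = [black_box(x) for x in X]

feature, threshold = fit_stump(X, y_black_box)

# Fidelity: fraction of inputs on which the surrogate agrees with the black box
fidelity = sum(
    (1 if x[feature] > threshold else 0) == label for x, label in zip(X, y_black_box)
) / len(X)
```

On this grid the stump splits on the first feature and agrees with the black box on 22 of the 25 inputs, so its explanation is simple to read but does not fully represent the model it mimics.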
Table 6. Summary of commonly used techniques for ML interpretation in ECG-based heart disease classification.
Table 8. Performance evaluation of visual observation-based IML techniques based on class activation maps.
Table 9. Performance evaluation of visual observation-based IML techniques based on occlusion maps, saliency maps, and LRP.
Table 10. Performance evaluation of IML techniques that analyze feature effects via SHAP, feature importance, and LIME.
$$J(\text{shapelets}, W_w) = \frac{(\text{shapelets} \cap W_w)}{(\text{shapelets} \cup W_w)} \tag{18}$$
Neves et al. [80] use the shapelet classifier [177,178] output as a ground truth to
measure the performance of IML methods. However, it is worth noting that the shapelet
classifier has its own performance issues. Thus, the result obtained from Equation (18)
may not faithfully measure the real performance of the IML methods.
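Equation (18) has the form of a Jaccard index, which is usually read as the size of the intersection divided by the size of the union. The sketch below assumes that both the shapelet and the explanation window are represented as sets of sample indices; the function name and the index ranges are hypothetical.

```python
def jaccard_index(shapelet_idx, window_idx):
    """Size of the intersection divided by the size of the union of two index sets."""
    union = shapelet_idx | window_idx
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(shapelet_idx & window_idx) / len(union)

# Hypothetical example: the shapelet covers samples 40..59 of a heartbeat,
# while the explanation highlights samples 50..69.
shapelet_idx = set(range(40, 60))
window_idx = set(range(50, 70))

score = jaccard_index(shapelet_idx, window_idx)
```

Here the two regions share 10 of 30 distinct samples, giving a score of one third; a score of 1 would mean that the explanation coincides exactly with the shapelet.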
The performance decrease-based approach does not need ground truth to measure the
performance of IML techniques. Consequently, the performance results obtained from
this approach may not reflect how the IML methods perform in practice.
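A performance decrease-based evaluation is commonly understood as masking the part of the input that an explanation marks as important and measuring how much the model output drops. The sketch below follows that reading as an assumption; `toy_model`, the zero fill value, and the example indices are all hypothetical.

```python
def toy_model(signal):
    """Toy stand-in for a classifier's confidence: depends only on samples 2 and 3."""
    return (signal[2] + signal[3]) / 10.0

def performance_decrease(model, signal, highlighted_idx, fill=0.0):
    """Drop in model output after masking the samples an explanation highlights."""
    masked = [fill if i in highlighted_idx else x for i, x in enumerate(signal)]
    return model(signal) - model(masked)

signal = [0.1, 0.2, 3.0, 2.5, 0.1, 0.2]

# An explanation that highlights the samples the model actually relies on
drop_relevant = performance_decrease(toy_model, signal, {2, 3})
# An explanation that highlights samples the model ignores
drop_irrelevant = performance_decrease(toy_model, signal, {0, 1})
```

A larger drop suggests that the explanation points at input the model actually uses. The measurement needs no annotated ground truth, but it only reflects the model’s own behavior, not clinical correctness.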
7. Discussion
The non-invasive nature of the ECG and its low cost have
made it one of the most commonly used tools in heart disease diagnosis. However, most
physicians, irrespective of their experience and specialty level, face challenges in accurately
reading ECG tracings. This challenge often arises from the many types of heart disease,
the indistinguishable manifestations of heart disease in an ECG tracing, and the variation
of ECG tracings with the patient’s age, race, and physical condition. Recently, ML-
based heart disease classification techniques using ECG tracings have been proposed in
the literature to aid physicians in reading an ECG tracing. However, the black box nature
of ML techniques has prevented physicians from knowing the reason behind the ML model’s
classification output and from confidently using the model’s results. As a result, different IML
techniques have been suggested for explaining ML model outputs. As shown in Figure 7,
the number of studies that proposed IML methods for interpreting the reason
behind an ML model’s heart disease classification (from an ECG signal) is
increasing, which indicates that this is an active research area.
This systematic review presented a thorough investigation of the IML methods used
to explain the heart disease classification outputs of black box ML models. Among
the IML techniques proposed in the literature, class activation maps and their variants,
such as Grad-CAM, guided Grad-CAM, and Grad-CAM++, took the lion’s share, as shown
in Figure 8. These techniques localize, in the form of heatmaps, the regions of an ECG
signal that the black box ML model used in producing its classification output. However, apart
from localization inaccuracy, an explanation presented as a heatmap might
not be well understood by expert physicians.
Similarly, most of the IML techniques proposed in the literature for explaining black
box heart disease classification ML models attempted to localize the segments of an ECG signal
that the ML model used for output prediction. However, for a physician who has no exposure
to the concepts of IML or machine learning, these types of explanations may not help
in obtaining an evidence-based diagnosis. In addition, the performances of these IML
techniques were not measured against ground truth, partially because of the unavailability
of annotated datasets and commonly agreed-on quantitative metrics. For instance,
the ECG heart disease datasets presented in Table 4 were annotated only by disease type
and did not incorporate clinical reasons or findings. To our knowledge, no publicly
available ECG heart disease dataset contains the clinical descriptions used for categorizing the
ECG tracings into their respective disease classes. Moreover, most IML methods proposed in
the literature for explaining ECG signal-based heart disease ML classification outputs
are adopted from computer vision and other applications where the model training data
are either images or tabular data.
Integrating IML methods into the workflow of ML model development for heart
disease classification from an ECG signal is in its infancy and not well tested. As
shown in Figure 9, almost half of the published articles integrated and tested their
proposed IML methods to explain the classification outputs of only two disease conditions.
[Figure 7: number of papers [#] and percentage [%] per year, 2018 to 2022.]
[Figure 8: number of papers per IML method (EB, LIME, LRP, OMs, FI, LIPs, SMs, AMs, SHAP, CAMs).]
Figure 9. Distribution of reviewed IML methods with respect to the number of disease classes.
9. Conclusions
Heart disease diagnosis from ECG tracings is difficult for physicians at all
levels of experience. This difficulty necessitates the use of ML models. However, the black
box nature of these ML models and their limited performances have reduced their
trustworthiness. Thus, interpreting the output of black box ML models is
undeniably useful in earning the trust of physicians. In this systematic review, we first
identified the available heart electrocardiogram diagnosis datasets. Then, we discussed the
taxonomy of IML methods in terms of the result presentation method, scope, specificity,
and complexity of the ML model. In addition, we briefly examined these methods along with
their strengths and weaknesses. Furthermore, we presented the progress made in integrating
IML methods into ECG signal-based heart disease diagnosis, as measured through a few established
performance evaluation metrics. Finally, we discussed the existing challenges in IML
techniques and their mitigation options.
The main findings of this review work, in terms of the research questions listed in
Section 3.1, are summarized as follows:
• RQ1: Are there any freely available heart ECG signal datasets? What are their characteristics?
As discussed in Section 4, there are several annotated heart disease ECG tracing
datasets in repositories. These datasets are composed of single-lead and 12-lead ECG
tracings sampled at different sampling frequencies. In addition, the number of
recordings and the annotated heart disease classes vary across datasets. Moreover,
the disease classes in these datasets are not balanced. Furthermore, some annotations
are at the heartbeat level, whereas others cover the whole ECG tracing. Above all, these
repositories are not fit for developing and testing IML methods, as they do not include
clinical reasoning, such as the location and morphological manifestations of abnormalities
in the ECG tracing.
• RQ2: What are IML techniques and commonly investigated interpretable techniques in ECG
signal-based heart disease diagnoses?
As discussed in Section 5, we identified IML methods and categorized them in a
taxonomy to discuss their working principles and identify their gaps. These IML methods
attempt to localize the regions of an ECG signal that contribute the most to the
classification process. However, they have limitations, such as computational complexity,
the gradient saturation problem, lack of generalization, and susceptibility to perturbations
of the input ECG signal.
• RQ3: What is the overall progress and performance of IML algorithms in providing evidence-
based heart disease diagnoses?
The methods proposed in the literature explain the ML model’s output in terms of
visual presentation, feature importance, internal ML model parameters, and factual
examples. However, the explanations provided are not easily understandable. In
addition, due to the lack of commonly agreed-upon performance evaluation metrics
and ground truth, the methods have not been rigorously evaluated.
• RQ4: Are there any limitations and challenges in IML-based heart disease classifications?
Section 8 identifies the existing challenges, such as the absence of standardized
evaluation metrics, the lack of well-defined use cases, limited explanation clarity, and the
lack of ground truth datasets. In addition, future directions are highlighted.
In conclusion, the promising results achieved so far should be strengthened by defining
the use cases of IML methods together with expert physicians. In addition, new techniques
should be designed, and existing ones need to be customized to achieve physician-level
reasoning behind ML model decisions. Furthermore, the research community has to devise
metrics for evaluating the performance of IML methods.
Author Contributions: Conceptualization, Y.M.A.; methodology, Y.M.A.; validation, F.S., T.G.D. and
B.D.D.; writing—Y.M.A.; writing—review and editing, F.S., T.G.D., and B.D.D. All authors have read
and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Fact Sheet: Cardiovascular Diseases. Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
(accessed on 23 May 2022).
2. Morris, F. ABC of Clinical Electrocardiography; Blackwell Pub: Oxford, UK, 2008.
3. Manda, Y.R.; Baradhi, K.M. Cardiac Catheterization Risks and Complications; StatPearls Publishing: Treasure Island, FL, USA, 2021.
4. Jørgensen, M.E.; Andersson, C.; Nørgaard, B.L.; Abdulla, J.; Shreibati, J.B.; Torp-Pedersen, C.; Gislason, G.H.; Shaw, R.E.; Hlatky,
M.A. Functional Testing or Coronary Computed Tomography Angiography in Patients With Stable Coronary Artery Disease.
J. Am. Coll. Cardiol. 2017, 69, 1761–1770. [CrossRef] [PubMed]
5. Syed, I.S.; Glockner, J.F.; Feng, D.; Araoz, P.A.; Martinez, M.W.; Edwards, W.D.; Gertz, M.A.; Dispenzieri, A.; Oh, J.K.; Bellavia, D.; et al.
Role of Cardiac Magnetic Resonance Imaging in the Detection of Cardiac Amyloidosis. JACC Cardiovasc. Imaging 2010, 3, 155–164.
[CrossRef]
6. Pannu, J.; Poole, S.; Shah, N.; Shah, N.H. Assessing Screening Guidelines for Cardiovascular Disease Risk Factors using Routinely
Collected Data. Sci. Rep. 2017, 7, 6488. [CrossRef] [PubMed]
7. Iragavarapu, T.; Radhakrishna, T.; Babu, K.J.; Sanghamitra, R. Acute coronary syndrome in young—A tertiary care centre
experience with reference to coronary angiogram. J. Pract. Cardiovasc. Sci. 2019, 5, 18. [CrossRef]
8. Rafie, N.; Kashou, A.H.; Noseworthy, P.A. ECG Interpretation: Clinical Relevance, Challenges, and Advances. Hearts 2021,
2, 505–513. [CrossRef]
9. Cook, D.A.; Oh, S.Y.; Pusic, M.V. Accuracy of Physicians’ Electrocardiogram Interpretations. JAMA Intern. Med. 2020, 180, 1461.
[CrossRef]
10. Higueras, J.; Gómez-Talavera, S.; Cañadas, V.; Bover, R.; P, M.L.; Gómez-Polo, J.C.; Olmos, C.; Fernandez, C.; Villacastín, J.;
Macaya, C. Expertise in Interpretation of 12-Lead Electrocardiograms of Staff and Residents Physician: Current Knowledge and
Comparison between Two Different Teaching Methods. J. Cardiol. Curr. Res. 2016, 5, 00160. [CrossRef]
11. Schläpfer, J.; Wellens, H.J. Computer-Interpreted Electrocardiograms. J. Am. Coll. Cardiol. 2017, 70, 1183–1192. [CrossRef]
12. Martínez-Losas, P.; Higueras, J.; Gómez-Polo, J.C.; Brabyn, P.; Ferrer, J.M.F.; Cañadas, V.; Villacastín, J.P. The influence of
computerized interpretation of an electrocardiogram reading. Am. J. Emerg. Med. 2016, 34, 2031–2032. [CrossRef]
13. Dey, S.; Pal, R.; Biswas, S. Deep Learning Algorithms for Efficient Analysis of ECG Signals to Detect Heart Disorders. In Biomedical
Engineering; IntechOpen: London, UK, 2022. [CrossRef]
14. Moini, J. Anatomy and Physiology; Jones and Bartlett Learning: Burlington, MA, USA, 2020; Chapter 18: The Heart, pp. 449–471.
15. Park, J.; An, J.; Kim, J.; Jung, S.; Gil, Y.; Jang, Y.; Lee, K.; young Oh, I. Study on the use of standard 12-lead ECG data for
rhythm-type ECG classification problems. Comput. Methods Programs Biomed. 2021, 21, 106521. [CrossRef]
16. Rawshani, A. The ECG Leads: Electrodes, Limb Leads, Chest (Precordial) Leads, 12-Lead ECG (EKG). Available online:
https://ecgwaves.com/topic/ekg-ecg-leads-electrodes-systems-limb-chest-precordial/ (accessed on 16 June 2022).
17. Rautaharju, P.M.; Surawicz, B.; Gettes, L.S. AHA/ACCF/HRS Recommendations for the Standardization and Interpretation of
the Electrocardiogram. Circulation 2009, 53, 982–991. [CrossRef]
18. Ribeiro, A.H.; Ribeiro, M.H.; Paixão, G.M.M.; Oliveira, D.M.; Gomes, P.R.; Canazart, J.A.; Ferreira, M.P.S.; Andersson, C.R.;
Macfarlane, P.W.; Meira, W.; et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat. Commun. 2020, 11,
1760. [CrossRef]
19. Siontis, K.C.; Noseworthy, P.A.; Attia, Z.I.; Friedman, P.A. Artificial intelligence-enhanced electrocardiography in cardiovascular
disease management. Nat. Rev. Cardiol. 2021, 18, 465–478. [CrossRef]
20. Alfaras, M.; Soriano, M.C.; Ortín, S. A Fast Machine Learning Model for ECG-Based Heartbeat Classification and Arrhythmia
Detection. Front. Phys. 2019, 7, 103. [CrossRef]
21. Kashou, A.H.; Ko, W.Y.; Attia, Z.I.; Cohen, M.S.; Friedman, P.A.; Noseworthy, P.A. A comprehensive artificial intelligence–enabled
electrocardiogram interpretation program. Cardiovasc. Digit. Health J. 2020, 1, 62–70. [CrossRef]
22. Hammad, M.; Maher, A.; Wang, K.; Jiang, F.; Amrani, M. Detection of abnormal heart conditions based on characteristics of ECG
signals. Measurement 2018, 125, 634–644. [CrossRef]
23. Aamir, K.M.; Ramzan, M.; Skinadar, S.; Khan, H.U.; Tariq, U.; Lee, H.; Nam, Y.; Khan, M.A. Automatic Heart Disease Detection
by Classification of Ventricular Arrhythmias on ECG Using Machine Learning. Comput. Mater. Contin. 2022, 71, 17–33. [CrossRef]
24. Zhang, X.; Gu, K.; Miao, S.; Zhang, X.; Yin, Y.; Wan, C.; Yu, Y.; Hu, J.; Wang, Z.; Shan, T.; et al. Automated detection of
cardiovascular disease by electrocardiogram signal analysis: A deep learning system. Cardiovasc. Diagn. Ther. 2020, 10, 227–235.
[CrossRef]
25. Śmigiel, S.; Pałczyński, K.; Ledziński, D. ECG Signal Classification Using Deep Learning Techniques Based on the PTB-XL Dataset.
Entropy 2021, 23, 1121. [CrossRef]
26. Ortín, S.; Soriano, M.C.; Alfaras, M.; Mirasso, C.R. Automated real-time method for ventricular heartbeat classification. Comput.
Methods Programs Biomed. 2019, 169, 1–8. [CrossRef]
27. Gao, J.; Zhang, H.; Lu, P.; Wang, Z. An Effective LSTM Recurrent Network to Detect Arrhythmia on Imbalanced ECG Dataset.
J. Healthc. Eng. 2019, 2019, 6320651. [CrossRef] [PubMed]
28. Feyisa, D.W.; Debelee, T.G.; Ayano, Y.M.; Kebede, S.R.; Assore, T.F. Lightweight Multireceptive Field CNN for 12-Lead ECG
Signal Classification. Comput. Intell. Neurosci. 2022, 2022, 8413294. [CrossRef] [PubMed]
29. Liu, X.; Wang, H.; Li, Z.; Qin, L. Deep learning in ECG diagnosis: A review. Knowl.-Based Syst. 2021, 227, 107187. [CrossRef]
30. Kashou, A.H.; Mulpuru, S.K.; Deshmukh, A.J.; Ko, W.Y.; Attia, Z.I.; Carter, R.E.; Friedman, P.A.; Noseworthy, P.A. An artificial
intelligence–enabled ECG algorithm for comprehensive ECG interpretation: Can it pass the ‘Turing test’? Cardiovasc. Digit. Health
J. 2021, 2, 164–170. [CrossRef] [PubMed]
31. Khan, A.H.; Hussain, M.; Malik, M.K. Cardiac Disorder Classification by Electrocardiogram Sensing Using Deep Neural Network.
Complexity 2021, 2021, 5512243. [CrossRef]
32. Abdullah, T.A.A.; Zahid, M.S.M.; Ali, W. A Review of Interpretable ML in Healthcare: Taxonomy, Applications, Challenges, and
Future Directions. Symmetry 2021, 13, 2439. [CrossRef]
33. Das, A.; Rad, P. Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey. arXiv 2020. arXiv:2006.11371.
34. Xiong, P.; Lee, S.M.Y.; Chan, G. Deep Learning for Detecting and Locating Myocardial Infarction by Electrocardiogram:
A Literature Review. Front. Cardiovasc. Med. 2022, 9, 860032. [CrossRef]
35. Somani, S.; Russak, A.J.; Richter, F.; Zhao, S.; Vaid, A.; Chaudhry, F.; Freitas, J.K.D.; Naik, N.; Miotto, R.; Nadkarni, G.N.; et al.
Deep learning and the electrocardiogram: Review of the current state-of-the-art. EP Europace 2021, 23, 1179–1191. [CrossRef]
36. Rasheed, K.; Qayyum, A.; Ghaly, M.; Al-Fuqaha, A.; Razi, A.; Qadir, J. Explainable, Trustworthy, and Ethical Machine Learning
for Healthcare: A Survey. Comput. Biol. Med. 2021, 106043. [CrossRef]
37. Yang, G.; Ye, Q.; Xia, J. Unbox the black box for the medical explainable AI via multi-modal and multi-centre data fusion:
A mini-review, two showcases and beyond. Inf. Fusion 2022, 77, 29–52. [CrossRef]
38. Stiglic, G.; Kocbek, P.; Fijacko, N.; Zitnik, M.; Verbert, K.; Cilar, L. Interpretability of machine learning-based prediction models in
healthcare. WIREs Data Min. Knowl. Discov. 2020, 10, e1379. [CrossRef]
39. Du, M.; Liu, N.; Hu, X. Techniques for interpretable machine learning. Commun. ACM 2019, 63, 68–77. [CrossRef]
40. Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics
2019, 8, 832. [CrossRef]
41. Jin, D.; Sergeeva, E.; Weng, W.H.; Chauhan, G.; Szolovits, P. Explainable deep learning in healthcare: A methodological survey
from an attribution view. WIREs Mech. Dis. 2022, 14, e1548. [CrossRef]
42. Brennan, S.E.; Munn, Z. PRISMA 2020: A reporting guideline for the next generation of systematic reviews. JBI Evid. Synth. 2021,
19, 906–908. [CrossRef]
43. Rethlefsen, M.L.; Kirtley, S.; Waffenschmidt, S.; Ayala, A.P.; Moher, D.; Page, M.J.; Koffel, J.B. PRISMA-S: An extension to the
PRISMA Statement for Reporting Literature Searches in Systematic Reviews. Syst. Rev. 2021, 10, 39. [CrossRef]
44. Liu, F.; Liu, C.; Zhao, L.; Zhang, X.; Wu, X.; Xu, X.; Liu, Y.; Ma, C.; Wei, S.; He, Z.; et al. An Open Access Database for Evaluating
the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection. J. Med. Imaging Health Inform. 2018,
8, 1368–1373. [CrossRef]
45. Tihonenko, V.; Khaustov, A.; Ivanov, S.; Rivin, A. St.-Petersburg Institute of Cardiological Technics 12-Lead Arrhythmia Database.
2007. Available online: https://physionet.org/content/incartdb/1.0.0/ (accessed on 25 October 2022). [CrossRef]
46. Wagner, P.; Strodthoff, N.; Bousseljot, R.D.; Samek, W.; Schaeffter, T. PTB-XL, a Large Publicly Available Electrocardiography
Dataset. 2020. PhysioNet. Available online: https://physionet.org/content/ptb-xl/1.0.1/ (accessed on 25 October 2022).
[CrossRef]
47. Perez Alday, E.A.; Gu, A.; Shah, A.; Liu, C.; Sharma, A.; Seyedi, S.; Bahrami Rad, A.; Reyna, M.; Clifford, G. Classification of
12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Available online: https://physionet.org/content/challenge-2020/1.0.2/
(accessed on 25 October 2022). [CrossRef]
48. Moody, G.B.; Mark, R.G. MIT-BIH Arrhythmia Database. 1992. Available online: https://physionet.org/content/mitdb/1.0.0/
(accessed on 25 October 2022).
49. Moody, G.B.; Mark, R.G. MIT-BIH Atrial Fibrillation Database. 1992. Available online: https://physionet.org/content/afdb/1.0.0/
(accessed on 25 October 2022). [CrossRef]
50. The Beth Israel Deaconess Medical Center, T.A.L. The MIT-BIH Normal Sinus Rhythm Database. 1990. Available online:
https://physionet.org/content/nsrdb/1.0.0/ (accessed on 25 October 2022). [CrossRef]
51. Baim, D.S.; Colucci, W.S.; Monrad, E.S.; Smith, H.S.; Wright, R.F.; Lanoue, A.; Gauthier, D.F.; Ransil, B.J.; Grossman, W.; Braunwald,
E. The BIDMC Congestive Heart Failure Database. 2000. Available online: https://physionet.org/content/chfdb/1.0.0/ (accessed
on 25 October 2022). [CrossRef]
52. Stein, P.; Goldsmith, R. Normal Sinus Rhythm RR Interval Database. 2003. Available online: https://physionet.org/content/nsr2db/1.0.0/
(accessed on 25 October 2022). [CrossRef]
53. Hannun, A.Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G.H.; Bourn, C.; Turakhia, M.P.; Ng, A.Y. Cardiologist-level arrhythmia
detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 2019, 25, 65–69. [CrossRef]
54. Clifford, G.; Liu, C.; Moody, B.; wei Lehman, L.; Silva, I.; Li, Q.; Johnson, A.; Mark, R. AF Classification from a Short Single Lead ECG
Recording: The Physionet Computing in Cardiology Challenge 2017. In Proceedings of the Computing in Cardiology Conference
(CinC), Computing in Cardiology, Rennes, France, 24–27 September 2017. [CrossRef]
55. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.;
Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet. Circulation 2000, 101, e215–e220. [CrossRef]
56. Alday, E.A.P.; Gu, A.; Shah, A.J.; Robichaux, C.; Wong, A.K.I.; Liu, C.; Liu, F.; Rad, A.B.; Elola, A.; Seyedi, S.; et al. Classification of
12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Meas. 2020, 41, 124003. [CrossRef] [PubMed]
57. Zheng, J.; Guo, H.; Chu, H. A Large Scale 12-Lead Electrocardiogram Database for Arrhythmia Study. 2022. Available online:
https://physionet.org/content/ecg-arrhythmia/1.0.0/ (accessed on 25 October 2022). [CrossRef]
58. Wagner, P.; Strodthoff, N.; Bousseljot, R.D.; Kreiseler, D.; Lunze, F.I.; Samek, W.; Schaeffter, T. PTB-XL, a large publicly available
electrocardiography dataset. Sci. Data 2020, 7, 154. [CrossRef]
59. Liu, H.; Wang, Y.; Chen, D.; Zhang, X.; Li, H.; Bian, L.; Shu, M.; Chen, D. A Large-Scale Multi-Label 12-Lead
Electrocardiogram Database with Standardized Diagnostic Statements, 2022. Mapping from Chinese ECG Statements to AHA Codes.
Figshare. Dataset. Available online: https://springernature.figshare.com/collections/A_large-scale_multi-label_12-lead_electrocardiogram_database_with_standardized_diagnostic_statements/5779802/1
(accessed on 22 December 2022). [CrossRef]
60. Shortliffe, E.H. Computer-Based Medical Consultations: Mycin; Elsevier: Amsterdam, The Netherlands, 1976. [CrossRef]
61. Watson, D.S. Conceptual challenges for interpretable machine learning. Synthese 2022, 200, 65. [CrossRef]
62. Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable Machine Learning—A Brief History, State-of-the-Art and Challenges. In ECML
PKDD 2020 Workshops; Springer International Publishing: Ghent, Belgium, 2020; pp. 417–431. [CrossRef]
63. Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; Yu, B. Definitions, methods, and applications in interpretable machine
learning. Proc. Natl. Acad. Sci. USA 2019, 116, 22071–22080. [CrossRef] [PubMed]
64. Arrieta, A.B.; Díaz-Rodríguez, N.; Ser, J.D.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.;
Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward
responsible AI. Inf. Fusion 2020, 58, 82–115. [CrossRef]
65. Belle, V.; Papantonis, I. Principles and Practice of Explainable Machine Learning. Front. Big Data 2021, 4, 39. [CrossRef]
66. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International
Conference on Neural Information Processing Systems; Red Hook, NY, USA, 4–9 December 2017; Curran Associates Inc.: New
York, NY, USA, 2017; NIPS’17, pp. 4768–4777.
67. Rothman, D. Hands-On Explainable AI (XAI) with Python; Packt Publishing: Birmingham, UK, 2020.
68. Angelaki, E.; Marketou, M.E.; Barmparis, G.D.; Patrianakos, A.; Vardas, P.E.; Parthenakis, F.; Tsironis, G.P. Detection of abnormal
left ventricular geometry in patients without cardiovascular disease through machine learning: An ECG-based approach. J. Clin.
Hypertens. 2021, 23, 935–945. [CrossRef]
69. Rouhi, R.; Clausel, M.; Oster, J.; Lauer, F. An Interpretable Hand-Crafted Feature-Based Model for Atrial Fibrillation Detection.
Front. Physiol. 2021, 12, 657304. [CrossRef]
70. Anand, A.; Kadian, T.; Shetty, M.K.; Gupta, A. Explainable AI decision model for ECG data of cardiac disorders. Biomed. Signal
Process. Control 2022, 75, 103584. [CrossRef]
71. Ibrahim, L.; Mesinovic, M.; Yang, K.W.; Eid, M.A. Explainable Prediction of Acute Myocardial Infarction Using Machine Learning
and Shapley Values. IEEE Access 2020, 8, 210410–210417. [CrossRef]
72. Aas, K.; Jullum, M.; Løland, A. Explaining individual predictions when features are dependent: More accurate approximations to
Shapley values. Artif. Intell. 2021, 298, 103502. [CrossRef]
73. Rozemberczki, B.; Watson, L.; Bayer, P.; Yang, H.T.; Kiss, O.; Nilsson, S.; Sarkar, R. The Shapley Value in Machine Learning. arXiv
2022, arXiv:2202.05594.
74. Frye, C.; Rowat, C.; Feige, I. Asymmetric Shapley Values: Incorporating Causal Knowledge into Model-Agnostic Explainability.
In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12
December 2020; Curran Associates Inc.: New York, NY, USA, 2020; NIPS’20.
75. Basu, I.; Maji, S. Multicollinearity Correction and Combined Feature Effect in Shapley Values. In Lecture Notes in Computer Science;
Springer International Publishing: Berlin/Heidelberg, Germany, 2022; pp. 79–90. [CrossRef]
76. Frye, C.; de Mijolla, D.; Begley, T.; Cowton, L.; Stanley, M.; Feige, I. Shapley Explainability on the Data Manifold. arXiv 2020,
arXiv:2006.01272.
77. Yang, J. Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. arXiv 2021, arXiv:2109.09847.
78. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP. In Proceedings of the AAAI/ACM Conference on
AI, Ethics and Society, New York, NY, USA, 7–9 February 2020; pp. 180–186. [CrossRef]
79. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of
the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17
August 2016; pp. 1135–1144. [CrossRef]
80. Neves, I.; Folgado, D.; Santos, S.; Barandas, M.; Campagner, A.; Ronzio, L.; Cabitza, F.; Gamboa, H. Interpretable heartbeat
classification using local model-agnostic explanations on ECGs. Comput. Biol. Med. 2021, 133, 104393. [CrossRef]
81. Bodini, M.; Rivolta, M.W.; Sassi, R. Interpretability Analysis of Machine Learning Algorithms in the Detection of ST-Elevation
Myocardial Infarction. In Proceedings of the 2020 Computing in Cardiology Conference (CinC), Computing in Cardiology, Rimini,
Italy, 14 September 2020. [CrossRef]
82. Zhou, Z.; Hooker, G.; Wang, F. S-LIME: Stabilized-LIME for Model Explanation; Association for Computing Machinery: New York,
NY, USA, 2021; KDD ’21, pp. 2429–2438. [CrossRef]
83. Visani, G.; Bagli, E.; Chesani, F. OptiLIME: Optimized LIME Explanations for Diagnostic Computer Algorithms. arXiv 2020,
arXiv:2006.05714.
84. Zafar, M.R.; Khan, N. Deterministic Local Interpretable Model-Agnostic Explanations for Stable Explainability. Mach. Learn.
Knowl. Extr. 2021, 3, 525–541. [CrossRef]
85. Shankaranarayana, S.M.; Runje, D. ALIME: Autoencoder Based Approach for Local Interpretability. In Intelligent Data Engineering
and Automated Learning—IDEAL 2019; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 454–463.
[CrossRef]
86. Fisher, A.; Rudin, C.; Dominici, F. All Models are Wrong, but Many are Useful: Learning a Variable’s Importance by Studying an
Entire Class of Prediction Models Simultaneously. J. Mach. Learn. Res. JMLR 2019, 20, 1–81.
87. Au, Q.; Herbinger, J.; Stachl, C.; Bischl, B.; Casalicchio, G. Grouped feature importance and combined features effect plot. Data
Min. Knowl. Discov. 2022, 36, 1401–1450. [CrossRef]
88. Sood, A.; Craven, M. Feature Importance Explanations for Temporal Black-Box Models. arXiv 2021, arXiv:2102.11934.
89. Hooker, G.; Mentch, L.; Zhou, S. Unrestricted permutation forces extrapolation: Variable importance requires at least one more
model, or there is no free variable importance. Stat. Comput. 2021, 31, 82. [CrossRef]
90. Izza, Y.; Ignatiev, A.; Marques-Silva, J. On Explaining Decision Trees. arXiv 2020, arXiv:2010.11034.
91. Zhang, Q.; Wu, Y.N.; Zhu, S.C. Interpretable Convolutional Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [CrossRef]
92. Masís, S. Interpretable Machine Learning with Python; Packt Publishing: Birmingham, UK, 2021.
93. Sagi, O.; Rokach, L. Approximating XGBoost with an interpretable decision tree. Inf. Sci. 2021, 572, 522–542. [CrossRef]
94. Rath, A.; Mishra, D.; Panda, G. Imbalanced ECG signal-based heart disease classification using ensemble machine learning
technique. Front. Big Data 2022, 5, 1021518. [CrossRef]
95. Zhang, W.; Li, R.; Shen, S.; Yao, J.; Peng, Y.; Chen, G.; Zhou, B.; Wang, Z. Interpretable Detection and Location of Myocardial
Infarction Based on Ventricular Fusion Rule Features. J. Healthc. Eng. 2021, 2021, 4123471. [CrossRef]
96. Maturo, F.; Verde, R. Pooling random forest and functional data analysis for biomedical signals supervised classification: Theory
and application to electrocardiogram data. Stat. Med. 2022, 41, 2247–2275. [CrossRef]
97. Hohman, F.; Kahng, M.; Pienta, R.; Chau, D.H. Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers.
IEEE Trans. Vis. Comput. Graph. 2019, 25, 2674–2693. [CrossRef]
98. Porumb, M.; Iadanza, E.; Massaro, S.; Pecchia, L. A convolutional neural network approach to detect congestive heart failure.
Biomed. Signal Process. Control 2020, 55, 101597. [CrossRef]
99. Jahmunah, V.; Ng, E.; Tan, R.S.; Oh, S.L.; Acharya, U.R. Explainable detection of myocardial infarction using deep learning models
with Grad-CAM technique on ECG signals. Comput. Biol. Med. 2022, 146, 105550. [CrossRef]
100. Hicks, S.A.; Isaksen, J.L.; Thambawita, V.; Ghouse, J.; Ahlberg, G.; Linneberg, A.; Grarup, N.; Strümke, I.; Ellervik, C.; Olesen,
M.S.; et al. Explaining deep neural networks for knowledge discovery in electrocardiogram analysis. Sci. Rep. 2021, 11, 10949.
[CrossRef]
101. Fang, R.; Lu, C.C.; Chuang, C.T.; Chang, W.H. A visually interpretable detection method combines 3-D ECG with a multi-VGG
neural network for myocardial infarction identification. Comput. Methods Programs Biomed. 2022, 219, 106762. [CrossRef]
102. Bodini, M.; Rivolta, M.W.; Sassi, R. Opening the black box: Interpretability of machine learning algorithms in electrocardiography.
Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2021, 379, 20200253. [CrossRef]
103. Bridge, J.; Fu, L.; Lin, W.; Xue, Y.; Lip, G.Y.H.; Zheng, Y. Artificial intelligence to detect abnormal heart rhythm from scanned
electrocardiogram tracings. J. Arrhythmia 2022, 38, 425–431. [CrossRef]
104. Strodthoff, N.; Wagner, P.; Schaeffter, T.; Samek, W. Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL.
IEEE J. Biomed. Health Inform. 2021, 25, 1519–1528. [CrossRef]
105. Mousavi, S.; Afghah, F.; Acharya, U.R. HAN-ECG: An interpretable atrial fibrillation detection model using hierarchical attention
networks. Comput. Biol. Med. 2020, 127, 104057. [CrossRef]
106. Jin, Y.; Liu, J.; Liu, Y.; Qin, C.; Li, Z.; Xiao, D.; Zhao, L.; Liu, C. A Novel Interpretable Method Based on Dual-Level Attentional
Deep Neural Network for Actual Multilabel Arrhythmia Detection. IEEE Trans. Instrum. Meas. 2022, 71, 2500311. [CrossRef]
107. Lee, H.; Shin, M. Learning Explainable Time-Morphology Patterns for Automatic Arrhythmia Classification from Short
Single-Lead ECGs. Sensors 2021, 21, 4331. [CrossRef]
108. Fu, L.; Lu, B.; Nie, B.; Peng, Z.; Liu, H.; Pi, X. Hybrid Network with Attention Mechanism for Detection and Location of
Myocardial Infarction Based on 12-Lead Electrocardiogram Signals. Sensors 2020, 20, 1020. [CrossRef]
109. Wickramasinghe, N.L.; Athif, M. Multi-label classification of reduced-lead ECGs using an interpretable deep convolutional
neural network. Physiol. Meas. 2022, 43, 064002. [CrossRef]
110. Zhang, D.; Yang, S.; Yuan, X.; Zhang, P. Interpretable deep learning for automatic diagnosis of 12-lead electrocardiogram. iScience
2021, 24, 102373. [CrossRef]
111. Rashed-Al-Mahfuz, M.; Moni, M.A.; Lio’, P.; Islam, S.M.S.; Berkovsky, S.; Khushi, M.; Quinn, J.M.W. Deep convolutional neural
networks based ECG beats classification to diagnose cardiovascular conditions. Biomed. Eng. Lett. 2021, 11, 147–162. [CrossRef]
[PubMed]
112. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. In
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June
2016. [CrossRef]
113. Goswami, M.; Boecking, B.; Dubrawski, A. Weak Supervision for Affordable Modeling of Electrocardiogram Data. AMIA Annu.
Symp. Proc. AMIA Symp. 2021, 2021, 536–545. [PubMed]
114. Goodfellow, S.D.; Goodwin, A.; Greer, R.; Laussen, P.C.; Mazwi, M.; Eytan, D. Towards Understanding ECG Rhythm Classification
Using Convolutional Neural Networks and Attention Mappings. In Proceedings of the 3rd Machine Learning for Healthcare
Conference, Palo Alto, CA, USA, 17–18 August 2018; Volume 85, pp. 83–101.
115. Wang, J.; Qiao, X.; Liu, C.; Wang, X.; Liu, Y.; Yao, L.; Zhang, H. Automated ECG classification using a non-local convolutional
block attention module. Comput. Methods Programs Biomed. 2021, 203, 106006. [CrossRef] [PubMed]
116. Raza, A.; Tran, K.P.; Koehl, L.; Li, S. Designing ECG monitoring healthcare system with federated transfer learning and explainable
AI. Knowl.-Based Syst. 2022, 236, 107763. [CrossRef]
117. M., G.; Ravi, V.; V, S.; E.A, G.; K.P, S. Explainable Deep Learning-Based Approach for Multilabel Classification of Electrocardiogram.
IEEE Trans. Eng. Manag. 2022, 1–13. [CrossRef]
118. Lopes, R.R.; Bleijendaal, H.; Ramos, L.A.; Verstraelen, T.E.; Amin, A.S.; Wilde, A.A.; Pinto, Y.M.; de Mol, B.A.; Marquering,
H.A. Improving electrocardiogram-based detection of rare genetic heart disease using transfer learning: An application to
phospholamban p.Arg14del mutation carriers. Comput. Biol. Med. 2021, 131, 104262. [CrossRef]
119. Li, D.; Wu, H.; Zhao, J.; Tao, Y.; Fu, J. Automatic Classification System of Arrhythmias Using 12-Lead ECGs with a Deep Neural
Network Based on an Attention Mechanism. Symmetry 2020, 12, 1827. [CrossRef]
120. Cho, Y.; Kwon, J.M.; Kim, K.H.; Medina-Inojosa, J.R.; Jeon, K.H.; Cho, S.; Lee, S.Y.; Park, J.; Oh, B.H. Artificial intelligence
algorithm for detecting myocardial infarction using six-lead electrocardiography. Sci. Rep. 2020, 10, 20495. [CrossRef]
121. Kwon, J.M.; Kim, K.H.; Jeon, K.H.; Lee, S.Y.; Park, J.; Oh, B.H. Artificial intelligence algorithm for predicting cardiac arrest
using electrocardiography. Scand. J. Trauma, Resusc. Emerg. Med. 2020, 28, 98. [CrossRef]
122. Sangha, V.; Mortazavi, B.J.; Haimovich, A.D.; Ribeiro, A.H.; Brandt, C.A.; Jacoby, D.L.; Schulz, W.L.; Krumholz, H.M.; Ribeiro,
A.L.P.; Khera, R. Automated multilabel diagnosis on electrocardiographic images and signals. Nat. Commun. 2022, 13, 1583.
[CrossRef]
123. Kwon, J.M.; Lee, S.Y.; Jeon, K.H.; Lee, Y.; Kim, K.H.; Park, J.; Oh, B.H.; Lee, M.M. Deep Learning–Based Algorithm for Detecting
Aortic Stenosis Using Electrocardiography. J. Am. Heart Assoc. 2020, 9, e014717. [CrossRef]
124. Jiang, M.; Qiu, Y.; Zhang, W.; Zhang, J.; Wang, Z.; Ke, W.; Wu, Y.; Wang, Z. Visualization deep learning model for automatic
arrhythmias classification. Physiol. Meas. 2022, 43, 085003. [CrossRef]
125. Aufiero, S.; Bleijendaal, H.; Robyns, T.; Vandenberk, B.; Krijger, C.; Bezzina, C.; Zwinderman, A.H.; Wilde, A.A.M.; Pinto, Y.M. A
deep learning approach identifies new ECG features in congenital long QT syndrome. BMC Med. 2022, 20, 162. [CrossRef]
126. Jung, H.; Oh, Y. Towards Better Explanations of Class Activation Mapping. arXiv 2021, arXiv:2102.05228.
127. Kwon, J.M.; Kim, K.H.; Medina-Inojosa, J.; Jeon, K.H.; Park, J.; Oh, B.H. Artificial intelligence for early prediction of
pulmonary hypertension using electrocardiography. J. Heart Lung Transplant. 2020, 39, 805–814. [CrossRef]
128. Jo, Y.Y.; Kwon, J.M.; Jeon, K.H.; Cho, Y.H.; Shin, J.H.; Lee, Y.J.; Jung, M.S.; Ban, J.H.; Kim, K.H.; Lee, S.Y.; et al. Detection
and classification of arrhythmia using an explainable deep learning model. J. Electrocardiol. 2021, 67, 124–132. [CrossRef]
129. Srinivas, S.; Fleuret, F. Full-Gradient Representation for Neural Network Visualization. In Proceedings of the 33rd International
Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle,
H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32,
pp. 4124–4133.
130. Mohamed, E.; Sirlantzis, K.; Howells, G. A review of visualisation-as-explanation techniques for convolutional neural networks
and their evaluation. Displays 2022, 73, 102239. [CrossRef]
131. Kindermans, P.J.; Hooker, S.; Adebayo, J.; Alber, M.; Schütt, K.T.; Dähne, S.; Erhan, D.; Kim, B. The (Un)reliability of Saliency
Methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer International Publishing: Berlin/Heidelberg,
Germany, 2019; pp. 267–280. [CrossRef]
132. Montavon, G.; Binder, A.; Lapuschkin, S.; Samek, W.; Müller, K.R. Layer-Wise Relevance Propagation: An Overview. In Explainable
AI: Interpreting, Explaining and Visualizing Deep Learning; Springer International Publishing: Berlin/Heidelberg, Germany, 2019;
pp. 193–209. [CrossRef]
133. Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Müller, K.R. Explaining Deep Neural Networks and Beyond: A Review of
Methods and Applications. Proc. IEEE 2021, 109, 247–278. [CrossRef]
134. Montavon, G.; Samek, W.; Müller, K.R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process.
2018, 73, 1–15. [CrossRef]
135. Jung, Y.J.; Han, S.H.; Choi, H.J. Explaining CNN and RNN Using Selective Layer-Wise Relevance Propagation. IEEE Access 2021,
9, 18670–18681. [CrossRef]
136. Huang, X.; Jamonnak, S.; Zhao, Y.; Wu, T.H.; Xu, W. A Visual Designer of Layer-wise Relevance Propagation Models. Comput.
Graph. Forum 2021, 40, 227–238. [CrossRef]
137. Gu, J.; Yang, Y.; Tresp, V. Understanding Individual Decisions of CNNs via Contrastive Backpropagation. In Asian Conference
on Computer Vision—ACCV, Perth, Australia, 4–6 December 2018; Jawahar, C.V., Li, H., Mori, G., Schindler, K., Eds.; Springer
International Publishing: Cham, Switzerland, 2019; pp. 119–134.
138. Iwana, B.K.; Kuroki, R.; Uchida, S. Explaining Convolutional Neural Networks using Softmax Gradient Layer-wise Relevance
Propagation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul,
Korea, 27–28 October 2019. [CrossRef]
139. Resta, M.; Monreale, A.; Bacciu, D. Occlusion-Based Explanations in Deep Recurrent Models for Biomedical Signals. Entropy
2021, 23, 1064. [CrossRef] [PubMed]
140. Ancona, M.; Ceolini, E.; Öztireli, C.; Gross, M. Towards better understanding of gradient-based attribution methods for Deep
Neural Networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC,
Canada, 30 April–3 May 2018; Conference Track Proceedings. OpenReview.net, 2018.
141. Bleijendaal, H.; Ramos, L.A.; Lopes, R.R.; Verstraelen, T.E.; Baalman, S.W.; Pool, M.D.O.; Tjong, F.V.; Melgarejo-Meseguer,
F.M.; Gimeno-Blanes, F.J.; Gimeno-Blanes, J.R.; et al. Computer versus cardiologist: Is a machine learning algorithm able to
outperform an expert in diagnosing a phospholamban p.Arg14del mutation on the electrocardiogram? Heart Rhythm 2021,
18, 79–87. [CrossRef] [PubMed]
142. Ivanovs, M.; Kadikis, R.; Ozols, K. Perturbation-based methods for explaining deep neural networks: A survey. Pattern Recognit.
Lett. 2021, 150, 228–234. [CrossRef]
143. Dissanayake, T.; Fernando, T.; Denman, S.; Sridharan, S.; Ghaemmaghami, H.; Fookes, C. A Robust Interpretable Deep Learning
Classifier for Heart Anomaly Detection Without Segmentation. IEEE J. Biomed. Health Inform. 2021, 25, 2162–2171. [CrossRef]
144. Li, R.; Zhang, X.; Dai, H.; Zhou, B.; Wang, Z. Interpretability Analysis of Heartbeat Classification Based on Heartbeat Activity’s
Global Sequence Features and BiLSTM-Attention Neural Network. IEEE Access 2019, 7, 109870–109883. [CrossRef]
145. Hong, S.; Xiao, C.; Ma, T.; Li, H.; Sun, J. MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography
Signals. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, International Joint
Conferences on Artificial Intelligence Organization, Vienna, Austria, 10–16 August 2019; pp. 5888–5894. [CrossRef]
146. Yao, Q.; Wang, R.; Fan, X.; Liu, J.; Li, Y. Multi-class Arrhythmia detection from 12-lead varied-length ECG using Attention-based
Time-Incremental Convolutional Neural Network. Inf. Fusion 2020, 53, 174–182. [CrossRef]
147. Elul, Y.; Rosenberg, A.A.; Schuster, A.; Bronstein, A.M.; Yaniv, Y. Meeting the unmet needs of clinicians from AI systems
showcased for cardiology with deep-learning–based ECG analysis. Proc. Natl. Acad. Sci. USA 2021, 118, e2020620118. [CrossRef]
148. Mousavi, S.S.; Afghah, F.; Razi, A.; Acharya, U.R. ECGNET: Learning where to attend for detection of atrial fibrillation with deep
visual attention. In Proceedings of the 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI),
Chicago, IL, USA, 19–22 May 2019. [CrossRef]
149. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the
3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015; Conference Track
Proceedings.
150. Hassanin, M.; Anwar, S.; Radwan, I.; Khan, F.S.; Mian, A. Visual Attention Methods in Deep Learning: An In-Depth Survey. arXiv
2022, arXiv:2204.07756.
151. Cai, C.J.; Jongejan, J.; Holbrook, J. The effects of example-based explanations in a machine learning interface. In Proceedings of
the 24th International Conference on Intelligent User Interfaces, Marina del Ray, CA, USA, 17–20 March 2019. [CrossRef]
152. Mochaourab, R.; Venkitaraman, A.; Samsten, I.; Papapetrou, P.; Rojas, C.R. Post Hoc Explainability for Time Series Classification:
Toward a signal processing perspective. IEEE Signal Process. Mag. 2022, 39, 119–129. [CrossRef]
153. Guidotti, R. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Min. Knowl. Discov.
2022. [CrossRef]
154. Han, X.; Hu, Y.; Foschini, L.; Chinitz, L.; Jankelson, L.; Ranganath, R. Deep learning models for electrocardiograms are susceptible
to adversarial attack. Nat. Med. 2020, 26, 360–363. [CrossRef]
155. Suresh, H.; Lewis, K.M.; Guttag, J.; Satyanarayan, A. Intuitively Assessing ML Model Reliability through Example-Based
Explanations and Editing Model Inputs. In Proceedings of the 27th International Conference on Intelligent User Interfaces,
Helsinki, Finland, 22–25 March 2022. [CrossRef]
156. Karlsson, I.; Rebane, J.; Papapetrou, P.; Gionis, A. Locally and globally explainable time series tweaking. Knowl. Inf. Syst. 2019,
62, 1671–1700. [CrossRef]
157. Verma, S.; Dickerson, J.; Hines, K. Counterfactual Explanations for Machine Learning: Challenges Revisited. arXiv 2021,
arXiv:2106.07756.
158. Maratea, A.; Ferone, A. Pitfalls of local explainability in complex black box models. In Proceedings of the WILF 2021, the 13th
International Workshop on Fuzzy Logic and Applications, Vietri sul Mare, Italy, 20–22 December 2021; Volume 3074.
159. Molnar, C.; König, G.; Herbinger, J.; Freiesleben, T.; Dandl, S.; Scholbeck, C.A.; Casalicchio, G.; Grosse-Wentrup, M.; Bischl, B.
General Pitfalls of Model-Agnostic Interpretation Methods for Machine Learning Models. In xxAI—Beyond Explainable AI; Lecture
Notes in Computer Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2022; Volume 13200, pp. 39–68.
[CrossRef]
160. Setzu, M.; Guidotti, R.; Monreale, A.; Turini, F.; Pedreschi, D.; Giannotti, F. GLocalX—From Local to Global Explanations of Black
Box AI Models. Artif. Intell. 2021, 294, 103457. [CrossRef]
161. Elshawi, R.; Al-Mallah, M.H.; Sakr, S. On the interpretability of machine learning-based model for predicting hypertension. BMC
Med. Inform. Decis. Mak. 2019, 19, 146. [CrossRef]
162. Marton, S.; Lüdtke, S.; Bartelt, C. Explanations for Neural Networks by Neural Networks. Appl. Sci. 2022, 12, 980. [CrossRef]
163. Jia, S.; Lin, P.; Li, Z.; Zhang, J.; Liu, S. Visualizing surrogate decision trees of convolutional neural networks. J. Vis. 2019,
23, 141–156. [CrossRef]
164. Krasteva, V.; Christov, I.; Naydenov, S.; Stoyanov, T.; Jekova, I. Application of Dense Neural Networks for Detection of Atrial
Fibrillation and Ranking of Augmented ECG Feature Set. Sensors 2021, 21, 6848. [CrossRef]
165. Hua, Q.; Yaqin, Y.; Wan, B.; Chen, B.; Zhong, Y.; Pan, J. An Interpretable Model for ECG Data Based on Bayesian Neural Networks.
IEEE Access 2021, 9, 57001–57009. [CrossRef]
166. Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the Quality of Machine Learning Explanations: A Survey on Methods
and Metrics. Electronics 2021, 10, 593. [CrossRef]
167. Chen, V.; Li, J.; Kim, J.S.; Plumb, G.; Talwalkar, A. Interpretable machine learning. Commun. ACM 2022, 65, 43–50. [CrossRef]
168. Petrutiu, S.; Sahakian, A.V.; Swiryn, S. The Long-Term AF Database, 2008. Available online:
https://physionet.org/content/ltafdb/1.0.0/ (accessed on 25 October 2022). [CrossRef]
169. Couderc, J. The telemetric and holter ECG warehouse initiative (THEW): A data repository for the design, implementation and
validation of ECG-related technologies. In Proceedings of the 2010 Annual International Conference of the IEEE Engineering in
Medicine and Biology, Buenos Aires, Argentina, 31 August–4 September 2010. [CrossRef]
170. Bousseljot, R.D.; Kreiseler, D.; Schnabel, A. The PTB Diagnostic ECG Database. 2004. Available online:
https://physionet.org/content/ptbdb/1.0.0/ (accessed on 25 October 2022). [CrossRef]
171. Deng, H.; Guo, P.; Zheng, M.; Huang, J.; Xue, Y.; Zhan, X.; Wang, F.; Liu, Y.; Fang, X.; Liao, H.; et al. Epidemiological Characteristics
of Atrial Fibrillation in Southern China: Results from the Guangzhou Heart Study. Sci. Rep. 2018, 8, 17829. [CrossRef] [PubMed]
172. Kim, Y.G.; Shin, D.; Park, M.Y.; Lee, S.; Jeon, M.S.; Yoon, D.; Park, R.W. ECG-ViEW II, a freely accessible electrocardiogram
database. PLoS ONE 2017, 12, e0176222. [CrossRef] [PubMed]
173. Megersa, Y.; Alemu, G. Brain tumor detection and segmentation using hybrid intelligent algorithms. In Proceedings of the
AFRICON 2015, Addis Ababa, Ethiopia, 14–17 September 2015. [CrossRef]
174. Waldamichael, F.G.; Debelee, T.G.; Ayano, Y.M. Coffee disease detection using a robust HSV color-based segmentation and
transfer learning for use on smartphones. Int. J. Intell. Syst. 2021, 37, 4967–4993. [CrossRef]
175. Anand, V.; Gupta, S.; Koundal, D.; Nayak, S.R.; Barsocchi, P.; Bhoi, A.K. Modified U-NET Architecture for Segmentation of Skin
Lesion. Sensors 2022, 22, 867. [CrossRef]
176. Amirkhani, D.; Bastanfard, A. An objective method to evaluate exemplar-based inpainted images quality using Jaccard index.
Multimed. Tools Appl. 2021, 80, 26199–26212. [CrossRef]
177. Ye, L.; Keogh, E. Time series shapelets. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining—KDD ’09, Paris, France, 28 June–1 July 2009. [CrossRef]
178. Liu, H.Y.; Gao, Z.Z.; Wang, Z.H.; Deng, Y.H. Time Series Classification with Shapelet and Canonical Features. Appl. Sci. 2022,
12, 8685. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.