
Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

LOGAN CUMMINS¹, ALEX SOMMERS¹, SOMAYEH BAKHTIARI RAMEZANI¹, SUDIP MITTAL¹, JOSEPH JABOUR², MARIA SEALE² and SHAHRAM RAHIMI¹
¹Department of Computer Science and Engineering, Mississippi State University, Mississippi State, MS 39762 USA
²U.S. Army Engineer Research and Development Center (ERDC), Vicksburg, MS 39180 USA
Corresponding author: Logan Cummins (e-mail: [email protected]).
arXiv:2401.07871v1 [cs.AI] 15 Jan 2024

This work by Mississippi State University was financially supported by the U.S. Department of Defense (DoD) High Performance Computing Modernization Program, through the U.S. Army Engineer Research and Development Center (ERDC) (#W912HZ21C0014). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Army ERDC or the U.S. DoD. The authors would also like to thank Mississippi State University's Predictive Analytics and Technology Integration (PATENT) Laboratory for its support.

ABSTRACT Predictive maintenance is a well-studied collection of techniques that aim to prolong the life of a mechanical system by using artificial intelligence and machine learning to predict the optimal time to perform maintenance. These methods allow maintainers of systems and hardware to reduce the financial and time costs of upkeep. As these methods are adopted for more serious and potentially life-threatening applications, the human operators need to trust the predictive system. This has attracted the field of Explainable AI (XAI), which introduces explainability and interpretability into the predictive system. XAI brings methods to the field of predictive maintenance that can amplify user trust while maintaining well-performing systems. This survey on explainable predictive maintenance (XPM) discusses and presents the current methods of XAI as applied to predictive maintenance while following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. We categorize the different XPM methods into groups that follow the XAI literature. Additionally, we include current challenges and a discussion on future research directions in XPM.

INDEX TERMS eXplainable Artificial Intelligence, XAI, Predictive Maintenance, Industry 4.0, Industry
5.0, Interpretable Machine Learning, PRISMA

I. INTRODUCTION

The history of technological advancements within the past couple of hundred years is well documented. These centuries and decades can be categorized into what are described as revolutions, i.e., Industrial Revolutions [1]. The most recent of these is generally agreed to be the fourth industrial revolution, or Industry 4.0 [1]–[4].

Industry 4.0 is characterized by bridging the gap between machinery through hardware and software connectivity [5]. This revolution is marked by the inclusion of human-machine interfaces, AI, and internet of things technologies [5]. Through these technologies, industry can become more automated and efficient, with new challenges that come with big data and cyber-physical systems. One of the problems created by this revolution centers on the optimization of mechanical systems.

One method of optimizing mechanical systems is to minimize the downtime the system may suffer due to breakdowns and repairs. To tackle this level of optimization, researchers of Industry 4.0 have developed the field of predictive maintenance (PdM). PdM encompasses many different problems in the field of maintenance, but an overarching representation of PdM involves monitoring the system in the present and alerting for any potential problems, such as a specific anomaly or time until failure [1], [6]. While this cyber-physical problem has been well studied from the perspective of deep learning models, statistical models, and more, the people impacted by these systems have received considerably less attention. This change of focus leads us to the fifth industrial revolution, or Industry 5.0.

While mechanical systems were the focus of the fourth industrial revolution, human-centered challenges have become the focus of the fifth. As described by Leng et al. [2], humans must be central in the processes related to these important decision-making systems. Nahavandi et al. [4] illustrate Industry 5.0 in the setting of a factory line: the human performs a task assisted by an artificially intelligent agent that can increase the productivity of the human.


As these systems move the focus away from mechanics and towards humans, a different area must be brought to the forefront. The way to address human-centered processes can be derived from the fields of eXplainable AI (XAI) and Interpretable Machine Learning (iML). XAI and iML are extensively researched from multiple fields on a wide array of problems, including the various problems in PdM. Our article's main contribution involves using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement to organize the XAI and iML works applied to PdM. We also describe and categorize the different methods, note challenges found in PdM, and provide key aspects to keep the field of Explainable Predictive Maintenance (XPM) moving forward.

The article is organized in the following manner. In Section II, important information surrounding explainability, Interpretable Machine Learning, and predictive maintenance is described. Section III describes the literature search performed, including identification, screening, and inclusion. In Sections IV, V and VI, the results of the literature review are categorized and discussed in detail. Section VII discusses challenges in the field that remain to be addressed, and Section VIII provides our closing remarks.

II. BACKGROUND
To accommodate readers of varying backgrounds, we briefly explain a couple of key topics needed for understanding the importance of this research, namely Explainable Artificial Intelligence (XAI), Interpretable Machine Learning (iML), and Predictive Maintenance (PdM). We will also discuss the distinction between XAI and iML to inform the readers of the perspective with which we evaluated the literature.

A. EXPLAINABILITY AND INTERPRETABILITY IN ARTIFICIAL INTELLIGENCE
The fine distinction between explainability and interpretability in the context of AI and ML has raised considerable debate [7]. While several researchers argue that the terms are synonymous, viewing them as interchangeable to simplify discussions [8]–[11], others assert that they capture distinct concepts [12]–[19]. Interestingly, a third perspective points out that one term is a subset of the other, adding another layer to the discourse [20]–[22].

To ensure clarity and coherence in this article, we consider explainability and interpretability to be related yet distinct. While there exists a certain degree of overlap, they emphasize different facets of machine learning.

1) Explainable Artificial Intelligence
The rapidly growing field of eXplainable Artificial Intelligence (XAI) aims to demystify AI systems by clarifying their reasoning mechanisms and subsequent outputs [7]. XAI methodologies can typically be classified based on features such as the scope of explanation, whether global or local, and the techniques employed for generating explanations, like feature perturbation. A unifying theme across these methods is the endeavor to interpret the workings of an already-trained model. As Sokol et al. succinctly put it, explainability is for the model's output [19]. From a more analytical standpoint, XAI predominantly encompasses post-hoc strategies to shed light on otherwise opaque, black-box models [16]. This paradigm is illustrated in Figure 1, where a model's explanations are constructed to enhance user comprehension.

FIGURE 1. Visualization of XAI Design Cycle

a: Model-Agnostic and Model-Specific.
Explainable methods can be categorized based on their suitability for addressing various types of black-box models. Methods that are applicable to models regardless of their architecture are called model-agnostic. Common methods that fall into this category are Shapley Additive Explanations (SHAP) [23] and Local Interpretable Model-agnostic Explanations (LIME) [24]. These methods and additional model-agnostic methods are described in Section V-A. The opposite of these methods are known as model-specific. Model-specific methods, such as Class Activation Mapping (CAM) [25] for Convolutional Neural Networks (CNNs), are designed to take advantage of the architecture itself to provide explainability. These methods and others are described in Section V-B.

b: Local Explanations and Global Explanations.
Another way of classifying explainable methods is by the scope of the explanation. These scopes are commonly described as either local or global. Local explanations aim at explaining the model's behavior for a single data point. Global explanations provide reasoning that represents the model's behavior for any data point.

c: XAI Example
To give a concrete example of XAI, a researcher may want to use a Long Short-Term Memory neural network for time-series analysis due to its temporal modeling capabilities [1], [6]. Common deep learning models like this one are not commonly interpretable, so to make it explainable, the researcher might consider using a simpler model, i.e., linear regression, decision tree, etc., to serve as a surrogate for post-hoc explanations. These explanations would then be presented to the user/developer/stakeholder to better explain the behavior of an inherently black-box architecture.

2) Interpretable Machine Learning
Interpretable Machine Learning (iML) describes ML models that are referred to as white- or gray-boxes [12], whose interpretability is enforced by architectural or functional constraints. Between the two, architectural constraints make models simple enough to understand, while physical constraints attempt to cast the model's computations in terms of real-world features. While XAI focuses on the model's output, iML focuses on the model itself [19]. This has also been described as intrinsic interpretability, to separate it from post-hoc explainability methods [22], [26]. As follows, this article will equate iML with models that are intrinsically interpretable through methods of structural constraints, physical bindings, etc. This can be seen in Figure 2, where there is no need for translating the model through an explainable method.

FIGURE 2. Visualization of interpretable ML Design Cycle

For a concrete example, a researcher may have a problem that could benefit from a simple logistic regression classifier. With such a simple architecture, the network itself would be interpretable, as it would be clear which inputs affect which outputs. One could also extrapolate the overarching equation if the network is simple enough. This illustrates inherent interpretability.
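The sketch below makes this concrete with scikit-learn; the feature names and data are hypothetical placeholders, and the point is that the fitted coefficients themselves are the interpretation:

    # Minimal sketch of inherent interpretability: the fitted weights of a
    # logistic regression are the explanation. Feature names and data are
    # hypothetical placeholders, not from any surveyed case study.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    names = ["vibration_rms", "bearing_temp", "motor_current"]
    X = np.random.rand(200, 3)                       # stand-in sensor readings
    y = (X[:, 0] + 0.5 * X[:, 1] > 0.9).astype(int)  # stand-in fault labels

    clf = LogisticRegression().fit(X, y)
    for name, coef in zip(names, clf.coef_[0]):
        # Sign and magnitude state how each input pushes the fault log-odds.
        print(f"{name}: {coef:+.3f}")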
B. PREDICTIVE MAINTENANCE
Predictive maintenance (PdM) is a subcategory of prognostics and health management (PHM) that has seen widespread attention in recent years [1], [22], [27], [28]. PdM utilizes AI and previous failure information from mechanical systems to predict a fault or downtime in the future [1], [6], [29]. PdM is implemented with a variety of tools, including anomaly detection, fault diagnosis, and prognosis [22], [28].

Anomaly detection and fault diagnosis have a very distinct difference: whereas anomaly detection aims at determining whether a fault occurred or not, fault diagnosis aims to identify the cause of a fault [28], [30]. This means that anomaly detection can be thought of as a binary classification problem, and fault diagnosis can be thought of as an extension of anomaly detection to a multi-classification problem. Finally, prognosis deals with predicting the remaining useful life (RUL) or time until failure [1], [6], [28]. This puts prognosis in the domain of regression problems. Now that these terms are defined and categorized into their different problems, we can discuss the PRISMA-compliant systematic search that we performed.

III. SYSTEMATIC SEARCH
We utilized the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [31], [32] to lay out a systemized methodology for performing a literature review. The full process can be seen in Fig. 3.

FIGURE 3. PRISMA Search

A. IDENTIFICATION
In identifying the potential databases, we focused on popular computer science publishers as well as general scientific publishers. We utilized the following databases for literature searches: IEEE Xplore, ACM Digital Library, ScienceDirect and Scopus, all of which were accessed on June 21, 2023. To capture as much as we could, we searched titles, keywords, and abstracts with two ideas in mind: XAI/iML, and PdM. In the former case, we used explainable OR interpretable OR xai to capture the first grouping of papers. This should gather papers with common phrases like explainable artificial intelligence, explainable machine learning, interpretable ML, XAI, etc. To capture the PdM aspect, we provided more explicit words so as to represent the research area better. We used prognos* OR diagnos* OR RUL OR remaining useful life OR predictive maintenance OR detection. This would capture ideas such as prognosis, prognostics, diagnosis, diagnostics, detection, etc.
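Assembled into one query, the search took roughly the following shape; the exact field syntax differs per database, and this Scopus-style rendering is only illustrative:

    TITLE-ABS-KEY(
        (explainable OR interpretable OR xai)
        AND (prognos* OR diagnos* OR RUL
             OR "remaining useful life"
             OR "predictive maintenance" OR detection)
    )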
In research, words like prognosis and diagnosis appear in medically related articles. This makes sense, as many can attest that they would go to their physician for a diagnosis. To minimize the inclusion of medical literature, ScienceDirect and Scopus were set to look at Engineering and Computer Science related articles only. Even with this selection, the initial pool of research was 6932 articles.

This narrowing down of papers was not as effective as we initially expected, as only the titles, keywords, and abstracts were checked. Prior to removing duplicates, we also removed articles that did not mention predictive maintenance inside of the article. After removing those papers and duplicates, the initial screening started with 296 articles.

B. EXCLUSION CRITERIA AND SCREENING
Our initial screenings involved skimming through the abstracts, main objectives, conclusions, and images of the articles. These initial screenings utilized the following exclusion criteria:
1) Neither XAI nor iML is a main focus of the article.
2) The article is not a PdM case study.
3) No explanation or interpretation is provided.

The need for the first two criteria is easily apparent. Many articles would mention one of the search terms from XAI/iML, but they would not fall into this category of work (n = 49). This would mainly emerge as using the word explainable or interpretable in a sentence of the abstract. Similarly, for the PdM case studies, many articles mention diagnosis, and such, in a sentence without it being the focus of the article (n = 97). However, the third criterion needs a more in-depth explanation.

When stating that an architecture is interpretable or explainable, a certain expectation is implanted in the reader's mind. This applies to any concept, whether it be computer science related or not. One of the expectations that we agreed upon was providing proof of interpretability or explainability. This would necessitate the explanation from the explainable method or the inherent interpretation of the interpretable model. With this expectation in mind, a few articles (n = 34) were removed before in-depth screening due to a mention of an explanatory method without any output of the said method. This finalized a screening population of 116 articles, which were sought for retrieval. Three were not retrieved by our resources; upon further examination, those three articles seem to lead to dead URLs.

For final assessment of eligibility, all of the resources were read. Many of the articles that were excluded were not available outside of a small preview. Of the remaining 113 articles, 11 were excluded for the following reasons:
• Three mention XAI/iML in the abstract but do not utilize any methods that we could find.
• Two were neither XAI nor iML. These mention search terms in the abstracts, but do not build on them.
• Three offer no interpretations of their interpretive method.
• Two mention PdM in the abstract but do not focus on PdM in an experiment.
• One was not a case study.

C. INCLUSION
After careful review of the articles, we finalized a population of 102 articles. Our findings and these articles are now discussed in Section IV.

IV. SEARCH RESULTS

FIGURE 4. Articles published per year in our inclusion results

To paint an overarching picture of our results, Fig. 4 shows a break-down of our inclusion population grouped by year. This shows a clear increasing trajectory in publications that can be explained by a few potential factors. Firstly, the popularity of predictive maintenance continues to increase, as shown in [1] and in Fig. 5, as we move to a big-data centric world in industry. This provides more opportunities to implement these very large and very complex neural architectures for making important decisions. The importance of these decisions leads to a second reason for increasing importance, trust.

FIGURE 5. Google Search Trend for PdM, XAI, and iML from our article years

Many articles discuss the importance of increasing the trust of the users in the model while decreasing the bias in black-box models [33]–[36]. Rojat et al. define trust as achieved once a model can effectively explain its decisions to a person [18]. This would necessitate some sort of explainable or inherently interpretable architecture that could give the users insight. Furthermore, Vollert et al. [22] even state that trust is a prerequisite for a successful data-driven application.

FIGURE 6. Distribution of XAI and iML in the search results

Looking at Fig. 6, our findings reflect the idea that XAI is slightly more popular than iML in PdM. One potential reason could be the desire to make use of the benefits of complex models. Many of the articles utilize architectures such as Deep Convolutional Neural Networks [37] or Long Short-term Memory Neural Networks [38] due to their high performance in the application. With the inherent black-box nature of these models, these researchers need post-hoc explainable methods. This desire for XAI over iML seems to affect specific PdM tasks more than others.

FIGURE 7. Papers per Anomaly Detection (AD), Fault Diagnosis (FD), and Prognosis (Prog)

FIGURE 8. Split between XAI and iML per category of predictive maintenance

The articles are categorized according to PdM task in Fig. 7, and those are further distinguished into XAI and iML within tasks in Fig. 8. Our article population reflects anomaly detection as the main task that utilizes XAI and iML. Fault diagnosis and prognosis are virtually the same in number of articles published within this population; however, Fig. 8 shows that the interest in XAI and iML is reversed in these groups. Succinctly, prognosis focuses on XAI, while diagnosis focuses on iML. We now describe the many methods that were applied to the varying datasets seen in Table 1. These methods are split between Section V for XAI methods and Section VI for iML methods. Additionally, specific articles of interest can be found in Table 4.

V. EXPLAINABLE AI IN PREDICTIVE MAINTENANCE
XAI in predictive maintenance captures a wide range of methods that can be categorized in several ways. To avoid repeating information, the methods are broken up into three subsections: model-agnostic, model-specific, and combination.

A. MODEL-AGNOSTIC
This section describes the explainable methods in our population, seen in Table 2, that could be applied to any architecture. These methods are colloquially known as model-agnostic explainable methods [149]. The methods found in this section can be applied to any architecture and consist of SHAP in Section V-A1, LIME in Section V-A2, and additional related methods.

1) Shapley Additive Explanations (SHAP).
SHAP values were introduced by Lundberg et al. as a unified measure of feature importance [23]. SHAP is based on three properties that are shared with classical Shapley value estimation: local accuracy, missingness, and consistency. Local accuracy refers to the ability of the simplified input to at least match the output of the input from the data. Missingness refers to the features that are missing from the simplified input; succinctly, if a feature is not useful to the explanation, then it is not useful to the model. Finally, consistency brings the idea that the importance of a feature should stay the same or increase regardless of the other features.
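As a concrete illustration of how these attributions are obtained in practice, the sketch below applies the shap library to a tree-based fault classifier; the data, feature count, and model choice are hypothetical placeholders rather than a setup from any surveyed article:

    # Hedged sketch: SHAP attributions for a tree-based fault classifier.
    # The data and labels are random stand-ins, not a real PdM dataset.
    import numpy as np
    import shap
    from sklearn.ensemble import GradientBoostingClassifier

    X = np.random.rand(500, 4)                    # stand-in sensor features
    y = (X[:, 0] * X[:, 2] > 0.4).astype(int)     # stand-in fault labels
    model = GradientBoostingClassifier().fit(X, y)

    explainer = shap.TreeExplainer(model)         # model-specific fast path
    shap_values = explainer.shap_values(X)        # one attribution per feature

    print(shap_values[0])                         # local explanation, sample 0
    print(np.abs(shap_values).mean(axis=0))       # global importance ranking

Summing the attributions for a sample and adding the explainer's expected value recovers that sample's prediction, which is the local accuracy property in action.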
By far, SHAP is the most used method seen in our sample. Moreover, SHAP is one of the few methods that has been applied to the problems of anomaly detection [72], [106]–[108], [132], fault diagnosis [130], [134], and prognosis [75], [77], [112]. This is likely due to its wide versatility as a model-agnostic method that can provide global explanations.

TABLE 1. Datasets from the Literature Search

Datasets | Articles
Bearings and PRONOSTIA [39], [40] | [41]–[53], [33], [37], [54], [55]
Vehicles or vehicle subsystem | [56]–[68]
CMAPSS [69] | [35]–[37], [70]–[79], [80]
General Machine Faults and Failures [81] | [42], [48], [82]–[86]
Trains | [34], [87]–[92]
Gearboxes [93] | [42], [45], [48], [94]
Artificial Dataset | [44], [95], [96]
Hot or Cold Rolling Steel | [72], [95], [97]
Mechanical Pump | [98]–[100]
Aircraft | [52], [101]
Amusement Park Rides | [102], [103]
Particle Accelerators | [104], [105]
Chemical plant | [106], [107]
Maritime | [108]–[110]
Semi-conductors [111] | [112], [113]
Air Conditioners | [56]
Hard Drives [114] | [38], [70], [115]
Tennessee Eastman Process [116] | [70]
Compacting Machines | [103]
UCI Machine Learning Repository [117] | [118]
Wind Turbines [119] | [120]–[122]
Transducers | [123]
Lithium-ion Batteries [124] | [37], [125], [126]
Heaters | [127]
Computer Numerical Control data | [128]
Textiles | [129]
Plastic Extruders | [130]
Press Machine | [131]
Coal Machinery | [132]
Refrigerators | [133]
Gas Compressors | [134]
Hydraulic Systems | [135]
Iron Making Furnaces | [136]
Cutting Tools | [137]
Power Lines [138] | [139]
Communication Equipment | [140]
Water Pump | [141]
Oil Drilling Equipment | [142]
Solenoid operated valves | [143]
Coal Conveyors | [144]
Temperature Monitoring Devices | [145]
Distillation Unit | [146]
Water Pipes [147] | [148]

TABLE 2. Explainable Methods from the Literature

Method | Articles
Shapley Values (V-A1) | [72], [95], [106]–[108], [130], [132], [36]–[38], [42], [66], [75]–[77], [112], [131], [134]
LIME (V-A2) | [35]–[38], [44], [50], [51], [54], [61], [66], [76], [84]
Feature Importance (V-A3) | [34], [54], [67], [85], [86], [92], [104], [110], [137], [139]
LRP (V-A4) | [37], [44], [68], [105], [126]
Rule-based (V-A5) | [65], [70], [71], [73]
CAM and GradCAM (V-B1) | [37], [44], [48], [63]
Surrogate (V-A6) | [82], [89], [95], [140]
Visualization (V-A9) | [47], [74], [109], [129]
DIFFI (V-B2) | [42], [102], [128]
Integrated Gradients (V-A7) | [94], [131]
Causal Inference (V-A8) | [88]
ACME (V-A10) | [103]
Statistics (V-A11) | [59]
SmoothGrad (V-A12) | [131]
Counterfactuals (V-A13) | [97]
LionForests (V-B3) | [96]
ELI5 (V-A14) | [51]
Saliency Maps (V-B4) | [37]
ARCANA (V-B5) | [120]

Steurtewagen et al. [134] created a framework for fault diagnosis that consists of three parts: data collection, prognosis, and diagnosis. Importantly, in the data collection phase, they received the reports that were associated with the faults. The prognosis section used an XGBoost algorithm to detect a fault occurring. The diagnosis utilized SHAP to determine the features that are important to the output of XGBoost. These features are validated using the reports that accompany the fault.

Choi et al. [107] proposed a method for explainable unsupervised anomaly detection to predict system shutdowns for chemical processes. Their method consisted of what they call a period-independent framework and a period-integrated framework. The period-independent framework searched for the best anomaly detection model and applied the explainable method. In the period-integrated framework, they applied real-time information to the model chosen from the previous framework. They found that the isolation forest provided the best results in the period-independent framework based on the number of unplanned shutdowns detected, and they utilized SHAP as an effective way of measuring root cause analysis.

Gashi et al. [112] conducted predictive maintenance on a multi-component system. Their objective was to model interdependencies and assess the significance of the interdependencies. Prior to training their Random Forest model, they used visual exploration to study interdependencies. They used two methods to justify the use of interdependencies: statistics and XAI. They used chi-squared testing to show that the performance of a model with interdependencies is better (p < 0.001). When applying SHAP to the random forest, they showed that the interdependency variables were usually among the top explainer features. This adds validity to SHAP as an explainable method in terms of the accuracy of its explanations.

2) Local Interpretable Model-agnostic Explanations (LIME).
LIME was introduced by Ribeiro et al. as a way of explaining any model using a local representation around the prediction [24]. This is done by sampling around the given input data and training a linear model with the sampled data. In doing this, they can generate an explanation that is faithful to that prediction while using only information gained from the original model.
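A minimal sketch of that procedure using the lime package on placeholder tabular data; the model, feature names, and class names are hypothetical:

    # Hedged sketch: a local LIME explanation for one instance of a
    # black-box classifier; data and names are illustrative placeholders.
    import numpy as np
    from lime.lime_tabular import LimeTabularExplainer
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.rand(500, 4)
    y = (X[:, 1] > 0.6).astype(int)
    model = RandomForestClassifier().fit(X, y)

    explainer = LimeTabularExplainer(
        X, feature_names=["f0", "f1", "f2", "f3"],
        class_names=["healthy", "faulty"], mode="classification")

    # Perturb around one instance, fit a local linear model, report weights.
    exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
    print(exp.as_list())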
Protopapadakis et al. [35] computed the RUL as applied to the CMAPSS turbofan dataset. They initially attempted to perform RUL prediction with two models, a random forest and a deep neural network. They found the random forest to perform poorly, which would lead to poor explanations. Their deep neural network achieved high performance, so they applied LIME. They compared two LIME explanations, one for early life and one for late life with a specific fault. They found that LIME was able to label the important features for failures that reflected the physical faults. Additionally, they showed that LIME would have a more difficult time labeling the important features when it was applied to segments with no faults, as anything could occur in the future.

Allah Bukhsh et al. [92] discussed multiple tree-based classifiers for predicting the need for maintenance events, i.e., anomaly detection, for train switches. From their pool of tree-based classifiers, including decision tree, random forest, and gradient boosted tree, they identified gradient boosted tree as the most accurate amongst the models when predicting if a problem would occur. In a separate test, they had the same models predict specific types of anomalies. In this experiment, random forest outperformed the rest. For interpretability, they implemented LIME to learn from the outputs of the random forest. The researchers intend that the output from LIME will help establish trust in the model for domain experts and decision makers.

3) Feature Importance.
Feature importance refers to the idea that some of the input features have more influence on the output than others. For example, when determining if an image is a dog, the background that has no pixels of the dog would potentially be less important than the pixels with the dog. Feature importance is typically assessed using techniques like SHAP and LIME, but various approaches exist in the literature.
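One widely used model-agnostic variant is permutation importance, sketched below with scikit-learn on placeholder data: shuffling a feature and measuring the resulting drop in score ranks that feature's influence.

    # Hedged sketch: permutation feature importance, a common model-agnostic
    # way to rank inputs; the data and model are illustrative placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    X = np.random.rand(400, 3)
    y = (X[:, 2] > 0.5).astype(int)
    model = RandomForestClassifier().fit(X, y)

    # Shuffle each feature in turn; the drop in score is its importance.
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    for i, score in enumerate(result.importances_mean):
        print(f"feature {i}: {score:.3f}")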
Many researchers have applied different methods of feature importance calculations. Bakdi et al. [110] tackled predictive maintenance for ship propulsion systems. They combined balanced random forest models and multi-instance learning to achieve a high true positive rate, which was then explained via Gini feature importance. Schmetz et al. [137] also applied Gini feature importance to verify a Tree Interpreter [150] for their random forest classifier.

Other researchers have ranked their features in different ways. Manco et al. [34] performed fault prediction for train systems, where they ranked time steps by how anomalous they were within a time window. This ranking was performed by mixture modeling of the prior probability of the trend with the probability of the trend being normal behavior. Marcato et al. [104] applied anomaly detection to particle accelerators, where permutation-based feature importance was used to guide further model development.

Finally, Voronov et al. [67] and Ghasemkhani et al. [86] each proposed different methods of calculating feature importance that tackle different problems. Voronov et al. proposed a forest-based variable selector called Variable Depth Distribution (VDD) that addressed the issue of variable interdependencies through clustering of features. The important features appeared in multiple clusters. Ghasemkhani et al. developed Balanced K-Star to deal with the imbalance problem commonly found in predictive maintenance. To add explainability, they applied chi-square to determine the important features in the machine failure.

4) Layer-wise Relevance Propagation (LRP).
LRP was introduced by Bach et al. [151] as an explainable method that assumes that a classifier can be decomposed into several layers of computation. LRP works with the concept of a relevance score that measures how important a feature is to an output. LRP works by extrapolating the relevance to the input layer by moving backwards through the architecture, starting at the output layer. The importance of an input feature can then be measured as a summation of the features it impacts through the architecture.
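A minimal numpy sketch of one backward LRP step through a single dense layer, using the common epsilon-stabilized redistribution rule; the activations, weights, and incoming relevance are placeholders:

    # Hedged sketch: one backward LRP step (epsilon rule) for a dense layer.
    import numpy as np

    def lrp_dense(a_in, W, R_out, eps=1e-6):
        z = a_in @ W                            # pre-activations of the layer
        z += eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by 0
        s = R_out / z                           # relevance per unit activation
        return a_in * (W @ s)                   # redistribute onto the inputs

    a_in = np.array([0.5, 1.2, 0.1])            # placeholder activations
    W = np.random.randn(3, 2)                   # placeholder weights (3 -> 2)
    R_out = np.array([0.7, 0.3])                # relevance from the layer above
    R_in = lrp_dense(a_in, W, R_out)
    print(R_in, R_in.sum())                     # approximately conserves R_out.sum()

Applying this rule layer by layer from the output back to the input yields the per-feature relevance scores described above.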
LRP falls into the category of model-agnostic, which can be seen in the use-cases in the literature. Felsberger et al. [105] applied LRP to multiple architectures including kNN, random forest, and CNN-based models. Through LRP, they found that the CNN architectures were learning important features, which led to higher performance. Han et al. [68] performed fault diagnosis for motors using the notable model LeNet [152]. Through the use of LRP, they were able to bring explainability to a notable architecture.

Wang et al. [126] proposed a method of using explainability to drive the training process. They utilized LRP to calculate feature importance for the training data. The importance calculations were embedded for optimizing the model's performance. They introduced this explainability-driven approach to the problem of aging batteries and showed its superb accuracy when compared to a data-driven approach.

5) Rule-based Explainers.
Rule-based explainers use a combination of the black-box model and the training data to create a series of IF-THEN rules. These rules are generally created using combinatorial logic (ANDs, ORs, and NOTs) to combine the features in the IF portion of the rules. The THEN portion of the rules is populated by the result from the model, usually a class or a predicted value. The rules are then presented as explanations or may be used as a replacement for the black-box model itself.

Even in rule-based explainers, there are numerous methods that have been used. Wu et al. [71] proposed the K-PdM (KPI-oriented PdM) framework, a cluster-based HMM based on key performance indicators (KPIs). A KPI is a vector of one feature of fine-grained deterioration, and a combination of KPIs reflects the health of a machine. The health was modeled as an HMM for each KPI. These HMMs were converted into a rule-based reasoning system for explainability.

Brunello et al. [70], [73] showed twice that temporal logic can be used in anomaly detection. Firstly, they showed that linear temporal logic could be added to an online system for monitoring failures [73]. They again showed that temporal logic could be used in a different approach to the same problem. Brunello et al. [70] created syntax trees that utilized bounded signal temporal logic statements. The trees were altered using an evolutionary approach to predict failure in the Backblaze Hard Drive [114], Tennessee Eastman Process [116], and CMAPSS [69] datasets, commonly used datasets for PdM of hard drives, electrical processes, and turbofans. This method led to great performance with rule-based explanations.

Ribeiro et al. [65] applied XAI to the online learning process using a Long Short-term Memory AutoEncoder (LSTM-AE) for modeling public transport faults. Simultaneously, the authors' system learned regression rules that explained the outputs of the model. While their system was learning to map the anomalies, the output of their model was fed into Adaptive Model Rules (AMRules), a stream rule learning algorithm. They applied their method to four public transport datasets, and they present the global and local rule-based explanations used in their system.

6) Surrogate Models.
Surrogate models are simpler models that are used to represent more complex models. These surrogate models generally take the form of simple decision trees and linear/logistic regression models. The simplistic nature of these models makes them interpretable; here, however, they are used as an explainable method for a black-box model.
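The sketch below shows the basic global-surrogate recipe with scikit-learn, on placeholder data: a shallow decision tree is trained to mimic a black-box model's outputs, and its printed rules serve as the explanation.

    # Hedged sketch: a global surrogate. Fit an interpretable decision tree
    # to mimic a black-box model; data and models are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = np.random.rand(500, 3)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    black_box = RandomForestClassifier().fit(X, y)

    # Train the surrogate on the black-box OUTPUTS, not the true labels.
    surrogate = DecisionTreeClassifier(max_depth=3).fit(X, black_box.predict(X))
    print(export_text(surrogate, feature_names=["f0", "f1", "f2"]))

    # Fidelity check: how often does the surrogate agree with the black box?
    print((surrogate.predict(X) == black_box.predict(X)).mean())

The fidelity check matters: a surrogate only explains the black box to the extent that it actually reproduces the black box's behavior.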
When utilizing a surrogate model as an explainable method, the surrogate model must be inherently interpretable, as a way of allowing an explanation to be gathered from the main model. Glock et al. [82] utilized two ARIMA models to explain a random forest model. One ARIMA model learned the same data as the random forest, and the second ARIMA model learned the residual errors from the random forest. While the random forest is not explainable, the two ARIMA models could show what the random forest could and could not learn.

Zhang et al. [140] proposed an alarm-fault association rule extraction based on feature importance and decision trees. Their process started with a weighted random forest. Feature selection was performed to gather the important features in the abnormal state. These features were used to create a series of C4.5 decision trees that model different features. Once their random forest was trained and predicted a fault, the decision tree with the highest accuracy could be used to extrapolate an explanation of the fault.

Errandonea et al. [89] tested XAI on edge computing with all possible models in H2O.ai's AutoML to perform their fault diagnosis. After determining the optimal architectures, they trained a decision tree surrogate model to add explainability to their AutoML process. By optimizing hardware and accuracy, they showed that explainable predictive maintenance could theoretically occur on edge computing devices.

7) Integrated Gradients.
Integrated gradients was introduced by Sundararajan et al. [153] to attribute the prediction of a deep architecture to its input features. They introduce two axioms, sensitivity and implementation invariance, to build their explainable method. Sensitivity is achieved if, for every input and baseline that differ in one feature but have different predictions, the differing feature is given a non-zero attribution. Implementation invariance means attributions are always identical for two functionally equivalent networks. With these axioms in mind, the integrated gradients are calculated via small summations through the layers' gradients.
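Concretely, the attribution for feature i integrates the model F's gradient along a straight-line path from a baseline x' to the input x, approximated in practice by a small Riemann sum (the standard formulation from Sundararajan et al. [153]):

    \mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\left(x' + \alpha (x - x')\right)}{\partial x_i} \, d\alpha
                     \approx \frac{(x_i - x'_i)}{m} \sum_{k=1}^{m} \frac{\partial F\left(x' + \tfrac{k}{m} (x - x')\right)}{\partial x_i}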

Hajgato et al. [94] introduced the PredMaX framework for predictive maintenance, which identified sensitive machine parts and clustered time periods. It works in two steps: a deep convolutional autoencoder was applied to the data, and clustering was performed on the latent space of the autoencoder. From the clusters, they showed which channels contribute to the transition from normal to abnormal. Additionally, the integrated gradients technique was used to extract the relevant sensor channels for a malfunctioning machine part.

8) Causal Inference.
Causality goes beyond the notion of statistical dependencies, as it shows a true relationship between two or more variables [154]. Causality can be measured in causal strength, which measures the change in the distribution of the remaining n-1 variables when one variable has been changed [154]. Causality is not an easy quality to analyze, as it can only be truly discovered by repeated observations of a phenomenon occurring given an event; however, causal inference has been a method of XAI that some researchers have utilized.

Trilla et al. [88] designed an anomaly detection framework based around a denoising variational autoencoder (VAE) and an MLP. They extracted intra-subsystem and inter-subsystem patterns by making the time series data into voxels. The VAE generalized the embeddings. Finally, the MLP was used to create a smooth diagnosis probabilistic function. They applied their method on a locomotion dataset and utilized causal inference via the Peter-Clark algorithm to answer the question "Did the VAE learn cause-effect relationships?" They found that the VAE could at best be described as modeling a correlation relationship, but this limitation was mainly attributed to limited data availability.

9) Visualization.
Visualization techniques do not take any one specific form. Generally, these visualizations take the form of visualizing weights; however, they may also take the form of visualizing specific examples. Whatever the case, these methods benefit the users by providing an image that enlightens the user to the inner workings of the architecture.

Visualizations can be utilized in many ways for explainability. Michalowska et al. [109] use visualizations to compare healthy and anomalous data. Costa et al. [74] utilized visualizations coupled with a recurrent variational encoder. They show that the latent space created by the encoder can add explainability: when input data with similar RULs pass through the encoder, the latent spaces are similar for those with similar RULs.

Xin et al. [47] aimed to address bearing fault diagnosis via a novel model named the logarithmic-short-time Fourier transform modified self-calibrated residual network (log-STFT-MSCResNet). The STFT extracts time-frequency features from raw signals to retain the physical meaning of fault signatures, which are visualized for explainability. The MSCResNet is used to enlarge the receptive field without introducing more parameters. With the combination of the two, they aim to have high accuracy even under unknown working conditions. They compared their model to popular models such as LSTM and ResNet18. log-STFT-MSCResNet performed among the best even under unknown working conditions, had a small number of features, and had a shorter training time than the others.

FIGURE 9. Use of XAI Methods

10) Accelerated Model-agnostic Explanations (ACME).
ACME was introduced by Dandolo et al. [155] as a method of quickly generating local and global feature importance measures based on perturbations of the data. For global explanations, they take a vector that holds the mean of each feature through the entire dataset; this is known as the baseline vector. Then a variable-quantile matrix is created that holds the different quantiles of the features. This matrix is used to gather predictions that would represent each quantile. The global feature importance is finally calculated for each feature by computing the standardized effect over each quantile. To get a local explanation, the baseline vector is replaced with the specific data point that is meant for explaining.

Anello et al. [103] applied ACME to the problem of anomaly detection to compare it to SHAP. They utilized isolation forest to detect anomalies, as it is commonly used for detecting outliers or anomalies. An anomaly score was used as a label for the time series to represent the problem as a regression task, which allows ACME to be applied. After applying SHAP and ACME to a roller coaster dataset and a compacting machine dataset, they found a drastic speed up by using ACME with all of the data, while SHAP would be slower even with access to 30% of the data.

11) Statistics.
As a method of explanation applied to the problem of predictive maintenance, statistical tests can be used to compare the distribution of the features between different classes.

Fan et al. [59] developed ML methods that take advantage of physics knowledge for added interpretability. Their case study was fault detection of leak-related faults in vehicle air systems. They applied three physics equations to their data that would model the air leakage. Moreover, they used that data in the training data of their kNN and MLP models. Results showed the physics-assisted models to outperform the non-assisted models.

12) Smooth Gradients (SmoothGrad).
SmoothGrad was developed by Smilkov et al. [156] to produce a gradient-based sensitivity map. The intuition behind SmoothGrad involves differentiating the predicting model with respect to the input. This derivative creates a sensitivity map that represents how much difference a change in each pixel of the input would make to the classification [156]. Moreover, this sensitivity map can ideally show regions that are key to the prediction.
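Because a single gradient map is visually noisy, SmoothGrad averages the sensitivity map M_c for class c over n noisy copies of the input (the formulation from Smilkov et al. [156]):

    \hat{M}_c(x) = \frac{1}{n} \sum_{k=1}^{n} M_c\left(x + \mathcal{N}(0, \sigma^2)\right)

where \sigma controls how much Gaussian noise is added to each copy.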
13) Counterfactuals.
Counterfactuals were introduced by Wachter et al. [157] to provide statements of the differences needed to gain the desirable outcome. This method also works by providing an explanation for the output of the model, but this extra capability makes counterfactuals very unique in the realm of XAI methods.
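In the original formulation of Wachter et al. [157], a counterfactual x' for an instance x is found by solving

    \arg\min_{x'} \max_{\lambda} \; \lambda \left( f(x') - y' \right)^2 + d(x, x')

where y' is the desired (e.g., healthy) output and d(x, x') is a distance term that keeps the counterfactual as close as possible to the original instance.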
Jakubowski et al. [97] developed a predictive maintenance solution for an industrial cold rolling operation. They utilize a semi-supervised algorithm based on a Physics-Informed Auto-Encoder (PIAE). This architecture was physics-informed by applying a list of equations at the beginning of their input data. The output of the equations was appended to the input data of their AE. Their model proved to be more accurate than a base AE. While PIAE has some interpretable aspects already, they applied counterfactuals as an explainability method to show the important features from their algorithm's decisions.

14) Explain Like I'm 5 (ELI5).
ELI5 is a popular method from GitHub [158] maintained by the user TeamHG-Memex and 15 other contributors. This Python library focuses on explaining the weights of a model, which also serves as a method for calculating feature importance. While maintaining original methods, ELI5 also provides implementations of other explainability methods.

B. MODEL-SPECIFIC
This section describes the explainable methods in our population that base the explanations on the properties of the architecture they intend to explain. These methods are known as model-specific [149]. Here we discuss methods that take advantage of the architecture for generating explanations, such as CAM and GradCAM in Section V-B1, DIFFI in Section V-B2, and more.

1) Class Activation Mapping (CAM) and Gradient-weighted Class Activation Mapping (GradCAM).
CAM was introduced by Zhou et al. [25] as a method of global explainability for convolutional neural networks (CNNs). The map that is created indicates the image regions that are used by the CNN to identify the target category. CAM does this by utilizing a global average pooling (GAP) layer in the CNN architecture, which outputs the spatial average of the feature map of the final layer. The higher values in the map are associated with the pixels in the image associated with the class label. Additionally, Selvaraju et al. [159] extend CAM to GradCAM by using the gradient information going into the last convolutional layer to understand the importance of the features.
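A minimal numpy sketch of the GradCAM combination step itself; the feature maps and gradients below are random placeholders standing in for values taken from a trained CNN:

    # Hedged sketch: the GradCAM combination step. A (feature maps) and
    # dS_dA (gradients of the class score w.r.t. A) are random placeholders.
    import numpy as np

    A = np.random.rand(8, 7, 7)       # K=8 feature maps from the last conv layer
    dS_dA = np.random.rand(8, 7, 7)   # gradient of the class score w.r.t. A

    alpha = dS_dA.mean(axis=(1, 2))   # GAP of gradients: one weight per map
    cam = np.maximum(np.tensordot(alpha, A, axes=1), 0.0)  # ReLU-weighted sum
    print(cam.shape)                  # (7, 7) coarse importance heat map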
GradCAM has been validated through different studies via comparison and metrics. Mey et al. [44] focus on the plausibility of XAI for explaining a CNN. They investigated GradCAM, LRP and LIME as methods of explaining a CNN for anomaly detection. They found non-distinguishable features highlighted by LRP, and they found unimportant features highlighted by LIME. GradCAM was able to highlight the important features that they labeled prior to CNN training. This could point towards model-specific methods outperforming model-agnostic methods when applicable.

Solis-Martin et al. [37] present a comparison of LIME, SHAP, LRP, Image-Specific Class Saliency (Saliency Maps) and GradCAM as applied to predictive maintenance datasets such as CMAPSS and batteries. They identify eight metrics for comparison: identity, separability, stability, selectivity, coherence, completeness, congruence, and acumen, an evaluation proposed by the authors. When comparing the different methods as applied to a CNN architecture, GradCAM performed the best with regard to the eight metrics.

Oh et al. [63] propose a fault detection and diagnosis framework that consists of a 1D-CNN for fault detection, class activation maps for fault diagnosis (the explainable method), and a VAE for implementing user feedback. The CNN utilizes a GAP layer as the output layer due to its ability to maintain the temporal information. This also allows them to use CAM as an explainable method as opposed to GradCAM. The VAE is utilized with the principle of Garbage-In, Garbage-Out logic to minimize the number of false positives and negatives that would be presented to the users. To verify their method, they apply it to the Ford Motor dataset, which is a vehicle engine dataset that contains an amount of noisy data. They show that their model is accurate even on noisy data, and they show that the VAE increases their accuracy. They also show via CAM that the anomalous data is linearly separable, which is reflected in the VAE.

2) Depth-based Isolation Forest Feature Importance (DIFFI).
DIFFI was introduced by Carletti et al. [160] as an explainable method for isolation forests. Isolation forests are an ensemble of isolation trees, which learn outliers by isolating them from the inliers. DIFFI relies on two hypotheses to define feature importance, where a feature must: induce the isolation of anomalous data points at small depth (i.e., close to the root), and produce a higher imbalance on anomalous data points while being useless on regular points [160]. These hypotheses would allow explanations for anomalous data, which would allow for explanations of outliers or faulty data.

Berno et al. [102] performed anomaly detection for automated rides at entertainment parks. They introduced the idea of providing extra focus to specific features by splitting their data into a multivariate set and many univariate sets based on prior knowledge. They utilized isolation forest to model the multivariate time series, with DIFFI explaining the output. They modeled the univariate time series with a Growing When Required (GWR) neural gas network. The multivariate analysis was used for determining anomalies within most of the variables, and the explanations were used to rank the features causing the anomaly.

Lorenti et al. [128] designed an unsupervised interpretable anomaly detection pipeline known as Continuous Unsupervised Anomaly Detection on Machining Operations (CUAD-MO). CUAD-MO consists of four parts: data segmentation and feature extraction, unsupervised feature selection via Forward Selection Component Analysis (FSCA), anomaly detection via Isolation Forest, and post-hoc explainability via DIFFI. Their feature extraction consisted of adding basic statistics and higher-order moments of the signals, such as kurtosis. FSCA iteratively selects features to maximize the amount of variance explained. Finally, the Isolation Forest is used to detect outliers, which are handled as faulty events. These are explained via DIFFI. They applied their method to two years of computer numerical control data, resulting in a 67% precision rate.

3) LionForests.
LionForests were introduced by Mollas et al. [161] as a local explanation method specifically for random forests. Their method follows these steps: estimating the minimum number of paths for the accurate answer, reducing the paths through association rules, clustering, random selection or distribution-based selection, extracting the feature-ranges, categorical handling of features, composing the interpretation, and visualizing the feature ranges. The outputs of their method are the interpretations in the form of IF-THEN rules and visualizations of the features.

Mylonas et al. [96] aimed to alleviate the non-explainable nature of random forests by applying an expanded version of LionForests to fault diagnosis. They expanded LionForests into the realm of multi-label classification by applying three different strategies: single label, predicted labelset, and label subsets. Single label aims at explaining every individual prediction (local); predicted labelset aims at explaining all predictions (global); and label subsets aim at explaining based on frequently appearing subsets of predictions. With their expansion, their attention is focused on multiple machine failure datasets, but specifically the AI4I dataset [162]. They utilized accuracy metrics such as precision, and they provided metrics for their explanations, such as length of explanations and coverage of data. One of the more notable elements of their work involves comparing their XAI algorithm to other algorithms, namely global and local surrogates and Anchors.

4) Saliency Maps.
Saliency maps were introduced by Simonyan et al. [163] as a method for explaining CNN outputs. Given an input and a model, saliency maps rank the pixels of the input based on their influence on the output of the model. This is done by approximating the output with a linear function in the neighborhood of the input, using the derivative of the scoring function with respect to the input. This approximation is the saliency map.
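A small self-contained sketch of that idea, using finite differences in place of a framework's autograd so nothing beyond numpy is assumed; the scoring function is a stand-in for a real network's class score:

    # Hedged sketch: a gradient saliency map via finite differences.
    # The score function is a placeholder for a trained model's class score.
    import numpy as np

    def score(x):                      # stand-in class score for an input
        w = np.linspace(-1, 1, x.size)
        return float(np.tanh(w @ x))

    def saliency(x, h=1e-4):
        grad = np.zeros_like(x)
        for i in range(x.size):        # numerically estimate d score / d x_i
            e = np.zeros_like(x)
            e[i] = h
            grad[i] = (score(x + e) - score(x - e)) / (2 * h)
        return np.abs(grad)            # rank inputs by influence magnitude

    x = np.random.rand(10)
    print(saliency(x))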
5) Autoencoder-based Anomaly Root Cause Analysis (ARCANA).
ARCANA was introduced by Roelofs et al. [120]. They noticed that autoencoders were a popular method of detecting anomalies in their target domain, wind turbines; however, by themselves, autoencoders are not interpretable. To overcome this lack of interpretability, they implement ARCANA as a way of explaining the cause of the reconstruction error of an autoencoder. ARCANA works by minimizing a loss function that is based on reconstruction. As opposed to measuring the difference between the output of the autoencoder and the input, they add a bias vector to the input data so as to have a corrected input. Moreover, the bias shows "incorrect" features based on the output; therefore, the bias would explain the behavior of the autoencoder by showing which features are making the output anomalous.

Roelofs et al. [120] also utilize their method for anomaly detection and root cause analysis for wind turbines. They verify that ARCANA provides the most important feature causing the issues with their wind turbines. This is done by first measuring the features' reconstruction errors. When performing ARCANA, the feature that shows the most importance is the same feature with the largest error. They then show that even when the feature does not appear in the reconstruction error, ARCANA is able to find feature importance in sensors that are applicable to known anomalies.

C. COMBINATION OF METHODS
This section describes the works that used multiple explainability methods. Some of these works simply note the differences between the different explainable methods; other works compared the methods to determine the better method. This section reviews the works that combine multiple methods without aiming to declare one method better than another.

Multiple explainable methods can be utilized in a stacked manner or in a simultaneous manner. The stacked manner involves using explainable methods sequentially. Jakubowski et al. [95] created a quasi-autoencoder for explainable anomaly detection. A surrogate XGBoost model was used as a way of simplifying the original model. They achieved a high R² score using this XGBoost model while adding explainability via TreeExplainer (SHAP).

More commonly, a simultaneous utilization of explainable methods appears in the literature, where the authors obtain multiple explanations from different methods. Khan et al. [36] found the best architecture for their problem of RUL prediction amongst random forest, SVM, gradient boosting, elastic net GLM, and an MLP regressor. After finding the MLP regressor to have the best performance, they used LIME and SHAP to explain the output. LIME and SHAP did not produce the same explanations, but their explanations were similar. Similarly, Jakubowski et al. [76] performed an experiment testing five architectures and using SHAP and LIME as explainers. They found that SHAP and LIME had different explanations throughout the different neural architectures, suggesting a fidelity concern between architectures.

Like the prior two, Serradilla et al. [51] performed remaining useful life prediction on a bushings testbed. They tested six different models and determined the random forest regressor to be the best. They then utilized two explainability methods (ELI5 and LIME) to show global and local feature importance as a way of driving model development. Brito et al. [42] performed a large experiment that applied many unsupervised learning algorithms for fault detection and fault diagnosis. They showed that Local-DIFFI and SHAP seemed to be mostly in agreement about the explanation for the model's output, but they did not move further in asking which is better.

Ferraro et al. [38] focused on analyzing the effectiveness of explainability methods on the predictions of a recurrent neural network based model for RUL prediction. Notably, the model performed well, but the focus was on the explainable methods SHAP and LIME. A quantitative analysis was performed using three metrics: identity, stability, and separability. This showed: (1) LIME was unable to give identical explanations for identical instances; (2) LIME more than SHAP gave similar explanations to instances in the same class; and (3) LIME and SHAP were able to give different explanations for instances in different classes.

Li et al. [66] aimed at integrating explainability into an AutoML environment used for vehicle data. They tested four different AutoML platforms: AutoSklearn, TPOT, H2O, and AutoKeras. They performed two different experiments in which they provided different subsections of their dataset, with both resulting in TPOT performing the best in accuracy. Finally, they apply LIME and SHAP to the resulting model to explain a local sample and the whole model. Their work results in a defined workflow for an automatic predictive maintenance system that includes explainability.

VI. INTERPRETABLE ML IN PREDICTIVE MAINTENANCE
Interpretable machine learning (iML) encompasses many methods whose inner workings are understandable without requiring a post-hoc method for explanation generation. These methods can be interpreted by the target audience without the need for separate methods to serve as a translator between the model and the person. iML methods mainly consist of architectures that can have human-readable outputs, such as rule-based systems; simple visual representations, such as decision trees and simple networks; or physical mappings that are intelligible to the user.

A. ATTENTION-BASED.
[...] mechanisms and could be used to add interpretability of why the network output the DTC.

For interpretable fault prediction, Wang et al. [56] proposed a two-stage method based on anomaly detection and anomaly accumulation. The anomaly detection module was made using a CT-GAN to train a discriminator on limited data, i.e., faults. The anomaly scores from the CT-GAN were fed into the anomaly accumulation module based on an Attention-LSTM. This modeled the temporal dependencies of the anomaly scores, while the attention mechanism was used to give importance to different anomalies at different time steps. Their model outperforms models such as SVM and LSTM on prediction and DTW on classification.

Xu et al. [133] were not only interested in anomaly detection, but also in anomaly precursor detection, i.e., early symptoms of an upcoming anomaly. They argued that detecting precursors is useful for early prediction of anomalies to better understand when and what kind of anomaly will occur. They proposed the Multi-instance Contrastive learning approach with Dual Attention (MCDA) to tackle the problem of anomaly precursor detection. MCDA combined multi-instance learning and tensorized LSTM with time-dependent correlation to learn the precursors. Additionally, the dual-attention module produced their interpretable results. This approach had high accuracy results, and their attention mechanism provided variables which are explanatory for the results. Importantly, they verified these explanations with domain experts.

B. FUZZY-BASED.
Fuzzy logic was introduced by Lotfi Zadeh [165] as a way of understanding the approximate mode of reasoning as opposed to the exact. Following this approximate model of understanding, all knowledge would come with a degree of confidence as opposed to a statement being 100% in a category. This adds some interesting and useful components
to machine learning as these in-between categories can be
A. ATTENTION. utilized in a way that is different from having all information
Attention was introduced by Vaswani et al. [164] as a method fall strictly into one category.
of natural language processing. This attention module gets Fuzzy-based methods apply fuzzy logic in different ways.
extended to introduce the transformer architecture that has led Lughofer et al. [53] and Kothamasu et al. [52] used type 1
to many famous models such as GPT. The weights from the fuzzy logic. Lughofer et al. proposed a framework of repre-
attention modules can be visualized to allow interpretation of sentation learning based on transfer of fuzzy classifiers. The
the aspects the architecture is focusing. transfer learning matched the distributions between the source
Xia et al. [58] and Hafeez et al. [64] tackled interpretable data and the target task using fuzzy rule activation. This was
fault diagnosis in two separate ways. Xia et al. looked at done by feeding the model all of the source data and the
hierarchical attention by grouping the features by systems and healthy data from the target domain. Through this training,
subsystems. They utilized BiLSTM encoders with attention the model classified unseen healthy and unhealthy data from
to obtain important features where the attention components the target task. Their model did not outperform all black box
added interpretability. Hafeez et al. created an architecture models; however, it was in the upper ranks of performance
known as the DTCEncoder to learn low level representations while bringing interpretability to the user.
of multivariate sequences with attention. It utilized the Diag- Additionally, Kothamasu et al. [52] presented a Mamdani
nostic Trouble Codes (DTC) commonly found in predictive neuro-fuzzy modeling approach for two use cases, bearing
maintenance problems as a class label for fault diagnosis. fault detection and aircraft engine fault diagnosis. They chose
Dense layers were used to translate the encoded latent space this model as it has the characteristics of being adaptive,
from DTCEncoder into a probability distribution for the dif- flexible, lucid, and robust. Their model consists of five layers:
ferent DTCs. The latent space was learned using attention input, linguistic term input, rules, linguistic terms output, and
12
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

FIGURE 10. Use of iML Methods

defuzzification. As the rules can become undistinguisable between different features where the links take the form
through training, they utilized Kullback-Leibler mean infor- of a link when discussing graphs or production rules when
mation to refine the rules. discussing production systems. These methods produce inter-
Fuzzy-based methods can also take the form of higher- pretation by providing these connections within the features,
order fuzzy logic as seen by Upasane et al. [123], [141]. They usually in the form of natural language.
proposed a type 2 fuzzy logic system for fault prediction Xia et al. [142] proposed a maintenance-oriented knowl-
to allow interpretability [123]. Additionally, the Big-Bang edge graph to apply for predictive maintenance of oil drilling
Big-Crunch (BB-BC) evolutionary algorithm was used for equipment. Once they had the maintenance-oriented knowl-
optimizing the number of antecedents of their fuzzy logic edge graph, an attention-based compressed relational graph
system. This was optimized for minimizing the RMSE of their convolutional network (ACRGCN) was used to predict solu-
system. Their system was able to get a very low RMSE with tions for different faults by predicting links between knowl-
100 rules and six antecedents per rule. edge. This method also explained faults due to its knowledge-
Upasane et al. [141] extended their previous work [123] to graph that maps different symptoms and maintenance re-
include most of the faults that can occur as well as proposing quirements. Even though knowledge-graphs have inherent
an explainable framework. While maintaining accuracy with interpretability, they created a question-answer system that
more faults is noteworthy, the experiment’s measurement of allowed the user to query the graph.
users’ trust was quite unique compared to the literature. They Salido et al. [99] created a fuzzy diagnosis system based
observed that 80% of the respondents agreed or strongly on knowledge-based networks (KBN) and genetic algorithms
agreed with having trust in the interpretable system. This trust (GA). The KBN constructed fuzzy rules using neural learning
is attributed to the explainable framework and interpretable where the input is the features and the following layers are OR
nature of their architecture; moreover, the interface is noted neurons and AND neurons. To determine the optimal number
to provide helpful insights to the users that would minimize of neurons, they used a GA. Importantly in their GA, they
downtime of the assets. added a metric to measure simplicity of their rules by making
more concise rules. With their architecture, they could 1)
C. KNOWLEDGE-BASED. detect a fault and 2) explain the fault using an IF-THEN rule
In this paper, knowledge-based approaches include meth- which can be used as a method of root cause analysis.
ods such as knowledge-graphs, knowledge-based systems, Cao et al. [113] created an approach based on knowledge-
knowledge graphs, etc. Knowledge-based approaches focus based systems for anomaly prediction. Their method is bro-
on a symbolic representation of the data that one can find in a ken into three parts: pruning of chronicle rule base, integra-
source of data. These representations consist of connections tion of expert rules, and predictive maintenance. Pruning of
13
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

chronicle rule base consists of mining the rules with frequent backward tracking. This network has 1) interpretable mean-
chronicle mining, translating the rules into SWRL rules, and ing from the wavelet kernel convolutional layer, 2) capsule
using accuracy (how many true rules) and coverage (how layers that allow decoupling of the compound fault, and 3)
many true encompassing rules) to select the best quality backward tracking which helps interpret output by focusing
rules. The integration of expert rules involved receiving input on the relationships between the features and health condi-
from the experts and placing the same restrictions on their tions. Not only was their framework able to achieve high
rules. Finally, the rules were used for anomaly prediction of accuracy on all conditions, including compound faults, but
semiconductors. also they showed that the backward tracking method can
decouple the capsule layers effectively.
TABLE 3. Interpretable Methods from the Literature Ben et al. [49] proposed a new architecture, SincNet, that
trains directly on the raw vibration signals to diagnose bearing
Method Articles
faults. Their architecture utilized interpretable digital filters
Attention (VI-A) [56], [58], [64], [78], [133]
Fuzzy (VI-B) [52], [53], [123], [141] for CNN architectures. They reduced the number of trainable
Knowledge-based (VI-C) [99], [113], [142] parameters and extracted meaningful representations by hav-
Sparse Networks (VI-M) [46], [100], [121] ing the predefined functions serve as the convolution. When
Interpretable Filters (VI-D) [45], [49], [60]
Decision Tree (VI-E) [57], [115] comparing the performance to a CNN, the SincNet had a
Fault Tree (VI-F) [79], [127] higher precision and reached convergence faster.
Physical Constraints (VI-G) [98], [143]
Statistical Model (VI-H) [41], [55]
Graph Attention Networks (VI-I) [87] E. DECISION TREES.
Gaussian Mixture Model (VI-J) [125] Decision trees encompass both classification and regression
Explainable Boosting Machine (VI-K) [76]
Hidden Markov Model (VI-L) [80]
trees that date back to the first regression tree algorithm
Prototype (VI-N) [62] proposed by Morgan and Songquist [167]. Decision trees
Signal Temporal Logic (VI-O) [136] create a tree-based architecture where each set of children
Digital Twin (VI-P) [144]
Symbolic Life Model (VI-Q) [33]
of each node is split using a feature. To produce an output,
Generalized Additive Model (VI-R) [43] a decision tree algorithm starts at the root of the tree and
MTS (VI-S) [118] proceeds down the tree by evaluating the feature that is used
k-Nearest Neighbors (VI-T) [145]
Rule-based Interpretations (VI-U) [146]
for splitting. The output corresponds to the final leaf node that
the decision trees reaches on its path.
Amram et al. [115] utilized two types of decision trees,
optimal classification trees [168] and optimal survival trees
D. INTERPRETABLE FILTERS. [169]. Their goals included predicting the RUL of long-term
Interpretable filters are a concept that brings specific wave- health of hard drives, predicting RUL of the short-term health
forms to a CNN architecture as a way of showing what signals of hard drives, predicting failure classification in short-term
are being learned. As explained in Ravanelli and Bengio health of the hard drives and performing similar experiments
[166], the first layer of a CNN appears to be important for with limited information. Their results showed that they could
waveform-based CNNs. In using these interpretable filters gather better results using separate models for the tasks as
that take the form of common waveforms, one can begin to opposed to using one model. They also showed the inter-
understand the behavior of the CNN if one understands the pretable methods shared many of the important features for
behavior of the waveform. the different tasks.
Li et al. [45] aimed to improve CNN-based methods for Panda et al. [57] aimed at tackling the problem of com-
PHM by addressing the black box problem. They proposed mercial vehicle predictive maintenance by designing an in-
the Continuous Wavelet Convolution (CWC) layer which is terpretable ML framework. To simplify their problem, they
designed to make the first layer of a CNN interpretable. solely looked at the air compressor system. By looking at the
It does this by using a library of filters that have physical air compressor system, they ran a broad experiment that an-
meanings which are convolved on the input signal. These con- alyzed different configurations of models and data. The C5.0
volutions can be traversed along the series and projected into with boosting model performed the best, and the inclusion
a two-dimensional time and scale dimension. Its performance of Diagnostic Trouble Codes with the sensor data raised the
was compared with a CNN with different wavelets, and their performance metrics.
findings were two-fold. Firstly, the performance of the CNN Simmons et al. [139] argued that the dynamics of a time-
with a CWC layer showed better performance than a CNN series are in themselves discriminative of health or failure.
without. Lastly, the CWC learned a well-defined waveform Additionally, the dynamics are interpretable because they
while the one without learned what looked to be a noisy and are derived directly from the information. These ideas were
uninterpretable representation. translated into the data mining domain by creating features
Li et al. [45] built on their previous work by examining that represent shorter time series in the temporal, spatial, and
compound faults. They designed an interpretable framework mixed domains. The features went through a rank-based se-
called wavelet capsule network (WavCapsNet) which utilizes lection process which gathered features that were statistically
14
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

different between classes. These features were used to train models aim to combine model-based and data-driven ap-
a Light Gradient Boosting Machine (LightGBM) which is a proaches by attaching the mathematical properties of the
type of gradient boosting decision tree introduced by Ke et al. system to the data in data-driven approaches [98].
[170]. This method allows for constant monitoring of feature Tod et al. [143] implemented a first-principle model-based
importance during training which can be used for interpreting approach to assess the health of solenoid operated valves.
the results. Compared to other first-principle models, their improved
model takes other degradation effects into account, namely
F. FAULT TREES. shading ring degradation and mechanical wear. The method
Fault trees were introduced by H.A. Watson at Bell Labs in extracts three condition indicators which allows them to de-
1961 [171]. Fault trees were introduced as an understandable tect problematic signals that can be directly mapped to phys-
model that can learn complex systems and perform root cause ical components through their model.
analysis. They are tree-like structures that are created using Wang et al. [55] performed fault diagnostics of wind tur-
different types of nodes: basic events, gate events, condition bines. Their method was an online method that detected
events, and transfer events. Basic events are the nodes that issues with bearings. Coupled with equations that represent
represent either a failure event or a normal operating event. the physical aspects of the bearings, they detected issues sur-
Gate events are the logic combining nodes and consists of rounding clearance of the bearings with high interpretability.
AND, OR, Inhibit, Priority and Exclusive OR. Condition Their interpretation specifically showed the different frequen-
events represent conditions that must occur for a gate event cies around the physical features of the bearings.
to occur. Transfer events are nodes that point to somewhere Xu et al. [98] propose the physics-constraint variational
else in the tree. With all of these gates, fault trees are able to neural network (PCVNN) as applied to external gear pumps.
learn root causes for different faults that can occur in a system. The PCVNN is physics-informed asymmetric autoencoder
Verkuil et al. [127] noticed that fault trees are made via where the encoder is a stacked CNN, BiLSTM, Attention
human intervention. With the idea of automating the process, network while the decoder is a generative physical model.
they applied the C4.5 tree combined with LIFT to create This would allow for an NN to learn the data, and it would
fault trees for domestic heaters. C4.5 is used to learn the allow the physical model to represent the learned patterns in
failure thresholds of the sensor data. LIFT creates fault trees a way that is consistent with the physics of the problem.
in an iterative process using the learned features. While they
do not provide a performance metric, they note that their H. STATISTICAL METHODS.
method cannot be optimal for the reasons of oversimplifying Statistical methods are using for explaining by analyzing
the problem and using a greedy heuristic. However, domain different features along different classes using statistical tests,
experts weighed in on the explanations provided in a positive such as Student’s t-test [172], Pearson’s chi-squared test
manner. [173], etc.
Waghen et al. [79] utilized fault trees to perform inter- Yao et al. [41] proposed a framework with interpretable
pretable time causality analysis. Their methodology consisted and automatic approaches that consisted of solely statistical
of building multiple logic trees for each subset of data. These processing. Their method proposed kurtosis-energy metric to
logic trees were aggregated into one fault tree representing define key sub-bands, a new health index of these sub-bands,
the multiple trees. They performed interpretatable time cause a joint statistical alarm and fault identification strategy. Addi-
analysis by going through each variable in the fault tree. By tionally, they proposed a health phase segmentation strategy
traversing the fault tree, they were able to extrapolate rules for health phase assessment and degradation pattern analy-
that can model the causality through time towards faults. sis. This method involved analyzing the data on the time-
frequency domain and suppressing the disturbing components
G. PHYSICAL CONSTRAINTS. such as noise. This analysis was able to help form the sub-
Physical constraints are used to bring real-life limitations to bands for monitoring the current state. If it fails statistical
the data-driven models. This can be in the form of mapping tests, then an anomaly is detected. They tested their method
the input and output of the architectures to physical compo- on the PHM 2012 rolling bearing dataset, and they reported
nents, or more commonly, utilizing known physics informa- very low false positives.
tion or equations about the real-life system in the architecture
of their model in some way. I. GRAPH ATTENTION NETWORKS (GATS).
The methods of applying physical constraints can be GATs were introduced by Velivckovic et al. [174] as a way
seen in different forms, namely model-based approaches and of combining self-attention layers with graph-structured data.
physics-informed approaches, which need to be differenti- This is done by applying attention layers where nodes can
ated. Model-based approaches are created to model a system attend whole neighborhoods of previous graph nodes. While
without the training of a network with the data provided, this comes with many benefits, the main two come from the
separate from data-based models [143]. These model-based benefits that other architectures gain from attention mecha-
approaches have physical constraints as they have to model nisms and the retraction of needing prior knowledge of the
the mathematical properties of the system. Physics-informed graph structure.
15
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

TABLE 4. Examples of Articles from Sample Population

Title Objective Contribution


Impact of Interdependencies: Multi-Component Perform predictive maintenance by modeling in- Showed with statistical significance that interde-
System Perspective Toward Predictive Mainte- terdependencies and test their importance pendency modeling increases performance and
nance Based on Machine Learning and XAI [112] understandability of a model
Explainable and Interpretable AI-Assisted Re- Compute RUL of the CMAPSS turbofan dataset Showed that LIME performed poorly when ap-
maining Useful Life Estimation for Aeroengines with LIME explaining the performance plied to segments with no faults but performed
[35] well when labeling features with failing se-
quences
Explainability-driven Model Improvement for Perform predictive maintenance by embedding Introduced the idea of explainability-driven train-
SOH Estimation of Lithium-ion Battery [126] explanations into the training loop ing for predictive maintenance
Online Anomaly Explanation: A Case Study on Apply XAI methods to the online learning pro- Showed that local and global explanations could
Predictive Maintenance [65] cess be added into the online learning paradigm
Explaining a Random Forest with the Difference Utilize two ARIMA surrogate models to explain Introduced a method of sandwiching a model
of Two ARIMA Models in an Industrial Fault the capabilities of a random forest model between two surrogates to show where a model
Detection Scenario [82] fails to perform well
Edge Intelligence-based Proposal for Onboard Test XAI on edge computing for fault diagnosis Provided a method of performing XAI in an
Catenary Stagger Amplitude Diagnosis [89] edge computing example coupled with AutoML
libraries
Explainable AI Algorithms for Vibration Data- Discover the plausibility of XAI methods ex- LRP showed non-distinguishable features, LIME
based Fault Detection: Use Case-adapted Meth- plaining the output of CNN architectures showed unimportant features, and GradCAM
ods and Critical Evaluation [44] showed the important features
On the Soundness of XAI in Prognostics and Compare different XAI methods for the Showed different metrics for comparing expla-
Health Management (PHM) [37] CMAPSS and lithium-ion battery dataset nations generated by different XAI methods and
showed GradCAM to perform the best on CNN
architectures
Interpreting Remaining Useful Life Estimations Perform RUL of bushings through multiple dif- Showed the importance of applying global and
Combining Explainable Artificial Intelligence ferent models and explanatory methods local explanations to interpret performances of
and Domain Knowledge in Industrial Machinery models from all aspects
[51]
Evaluating Explainable Artificial Intelligence Analyze the effectiveness of explainability meth- Utilized three metrics to compare explanations
Tools for Hard Disk Drive Predictive Mainte- ods for recurrent neural network based models for from LIME and SHAP and showed where each
nance [38] RUL prediction of them shine over the others
Automatic and Interpretable Predictive Mainte- Aimed to integrate explainability into an AutoML Defined a workflow for an automatic explainable
nance System [66] environment predictive maintenance system
DTCEncoder: A Swiss Army Knife Architec- Perform fault detection by classifying DTCs Designed the DTCEncoder that utilizes an atten-
ture for DTC Exploration, Prediction, Search and tion mechanism to provide an interpretable latent
Model interpretation [64] space as to why the a DTC is output
Deep Multi-Instance Contrastive Learning with Perform anomaly detection and anomaly precur- Performed anomaly precursor detection through
Dual Attention for Anomaly Precursor Detection sor detection multi-instance learning with verified explana-
[133] tions through domain experts
A Type-2 Fuzzy Based Explainable AI System Utilize an evolutionary algorithm to optimize Used a type 2 fuzzy logic system and evolution-
for Predictive Maintenance Within the Water their fuzzy logic system for fault prediction ary optimization to generate fuzzy rules for fault
Pumping Industry [141] prediction
Waveletkernelnet: An Interpretable Deep Neural Improve CNN-based methods for PHM Designed the Continuous Wavelet Convolution to
Network for Industrial Intelligent Diagnosis [45] add physical interpretations to the first layer of
CNN architectures
Restricted Sparse Networks for Rolling Bearing Perform fault detection using a sparse network Explored the Restricted-Sparse Frequency Do-
Fault Diagnosis, [46] main Space and used the transform into this space
to train a two-layer network that performs equal
to a CNN-LSTM
Interpretable and Steerable Sequence Learning Construct a deep learning model with built-in Introduced Prototype Sequence Network (ProS-
via Prototypes [62] interpretability for fault diagnosis via DTCs eNet) which uses prototype similarity in the train-
ing of the network and justified the interpretabil-
ity of their approach via a user study on Amazon
MTurk
Causal and Interpretable Rules for Time Series Perform predictive maintenance while utilizing Designed Case-crossover APriori algorithm
Analysis [146] causal rules for explanations for predictive maintenance which showed both
higher performance occurs when having rules
that are additive and subtractive to an output

Liu et al. [87] designed a framework for fault detection directed acyclic graph. The Disentangled Causal Attention
based around the Graph Convolutional Network and Graph (DC-Attention) aggregates the causal variables for generating
Attention Networks. They propose the Causal-GAT. Causal- representations of the effect variables. The DC-Attention out-
GAT is comprised of two parts: causal graph construction puts the system status (faulty or not faulty). They then utilize
and DC-Attention for extracting features and detection. The a custom loss function that calculates the distance between
causal graph construction uses causal discovery methods the current support of representations and its theoretically
and/or prior expertise to encode monitoring variables into a disentangled support.

16
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

J. GAUSSIAN MIXTURE MODEL. learning agent learned a policy for maintenance based on the
As described by Reynolds [175], Gaussian mixture mod- failures. The first challenge of this approach involves repre-
els (GMMs) is a probability density function designed as a senting predictive maintenance as a reinforcement learning
weighted sum of Gaussian component densities. The com- problem. This is done by representing the potential actions
ponent densities are created using the mean vector and co- as hold, repair, or replace, creating a reward function based
variance matrix of the data while the mixture weights are on holding, early replacement and replacement after failure,
estimated. GMMs are commonly used due to their capability and measuring the cost based on these reward functions. The
of representing information via a discrete set of Gaussian HMM is used for interpreting the output of their model by
functions to improve modeling of larger distributions. These observing the features that led the model into detecting a
models can be labeled as interpretable as the models directly failure state.
represent the distributions of the features. These models can
then be directly used to explain the features. M. SPARSE NETWORKS.
Csalodi et al. [125] performed survival analysis via a Sparse networks are neural networks that are limited in
Weibull distribution by representing the operation signals their architecture. Large deep neural networks are inherently
as a Gaussian mixture models and the parameters of the blackbox models; however, interpretable whitebox models
Weibull model via clustering. Specifically, their method used can take the form of very simple neural network models
an expectation-maximization algorithm which consists of two such as linear regression or logistic regression models. As the
parts. The expectation step determined the probability that models are simple, the impacts of the input features can be
a data point belongs to any cluster given the survival time seen as they are propagated through the network.
and parameters while assuming the clustering is correct. The Beretta et al. [121] utilized two different models for pre-
maximization step updated the parameters for the Gaussian dictive maintenance: a gradient-boosting regressor to model
mixture models and the Weibell distribution to better rep- the normal data and an isolation forest to model the fault
resent the data. When applying their method to lithium-ion data. The output of these are merged with a mean average
batteries, they represented distributions of unhealthy batter- of the temperature readings to create a score of failure. The
ies quite accurately while healthy batteries were less well- authors praise the simplicity of the algorithms as the source
represented. This occurred due to the large category of healthy of interpretability in their method.
data which was harder to represent in one small model while Pu et al. [46] explored a new frequency domain space they
the unhealthy data could be easily represented when isolated. call the restricted sparse frequency domain space (RSFDS)
for rolling bearing faults. The RSFDS breaks down the fea-
K. EXPLAINABLE BOOSTING MACHINE (EBM). tures into a space that is made of real and imaginary points.
EBM were introduced by Nori et al. [176] as a glassbox This space is able to visualize boundaries that have physical
model, another term for interpretable model, with similar ac- meanings to the faults. They use a simple two-layer neural
curacy to that of state-of-the-art blackbox algorithms. EBM is network to these points, and they achieve high performance
a type of generalized additive model that learns each feature’s equal to that of a CNN-LSTM with less memory and CPU
function using techniques such as bagging. Additionally, it usage.
can detect interactions between features and include those Langone et al. [100] proposed a model for interpretable
pairs of terms by learning functions of combinations of fea- anomaly prediction based on a logistic regression model with
tures. Because of its nature as an additive model, the features elastic net regularization. Their method is made of 3 steps:
can be explained by their impact on the outcome. data preparation, learning and refinement of the prediction
model. In the data preparation phase, they categorize the data
L. HIDDEN MARKOV MODEL. using included statistics, apply windowing to the data, and fi-
HMMs were introduced by Baum and Petrie [177] and can be nally mark the windows as either being anomalous or not. The
described as a statistical state-space algorithm [29]. HMMs learning phase consists of learning the relevant features from
represent the learning as a statistical process that transitions the windowed data. This includes considering the feature dis-
between states, and HMMs represent the output as separate tributions across failures and non-failures and measuring the
states that extend from the transitional states. HMMs, as a distance according to the Kolmogorov-Smirnov metric. The
statistical process, can discern hidden states from the data refinement of prediction model phase consists of the training
that may not be readily apparent. They are also capable of and utilization of the logistic regression model. Coupled with
learning combinations of sensor data, leveraging confounding elastic net regularization, this model selected a smaller subset
variables, and executing dimensionality reduction to simplify of the original data and captures the variable correlations.
the complexity of the data. [80]. They applied their method to a plunger pump in a chemical
Abbas et al. [80] combined the input-output HMM with plant and produced relative good and consistent scores.
reinforcement learning to make interpretable maintenance de-
cisions. Their hierarchical method consisted of two steps. The N. PROTOTYPE LEARNING.
input-output HMM filters the data and detects failure states. Prototype learning, as described by Ming et al. [62], is a form
Once the failure state was detected, the deep reinforcement of case-based reasoning that determines the output of an input
17
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

by comparison to a representative example. Determining the asset and a digital system that holds the information about
best prototypes is a problem itself, but the interpretability the physical system. Using digital twins, one can observe
it bring is apparent. The output of a specified input would the performance of the physical system without having the
be similar to its most similar prototype’s output; therefore, physically observe the asset.
the reason that the input data has a certain output is due Mahmoodian et al. [144] proposed the use of a digital
to the output of a very similar piece of data. This brings twin to monitor the infrastructure of a conveyor. Their digital
interpretability via comparison to the prototype. twin consists of taking in real data from different sensors and
Ming et al. [62] used the concept of prototype learning to simulating the data. This data is compared to the real time
construct a deep learning model with built-in interpretability. data to ensure the data is consistent. Their digital twin can
They introduced the prototype sequence network (ProSeNet) display the different information as well as receive input from
for a multi-class classification problem of fault diagnosis via the users to rate the explanations given. If it is seen as not
diagnostic trouble codes. The model consists of a sequence valid, the digital twin can run simulations surrounding that
encoder that is based on a recurrent architecture. The hidden data to increase its accuracy.
state is fed into a prototype layer that determines how similar
the hidden state is to prototypes in the form of a similarity Q. SYMBOLIC LIFE MODEL.
vector. The network then outputs a prediction probability for Symbolic life models aim to alleviate the black box effect by
the different classes based on the similarity vector. Inter- modeling the process learned by mapping relationships and
pretability can be conceived via the prototypes that are most results. Symbolic life models are a form of symbolic regres-
similar to the input. They justified the interpretability of their sion based on genetic programming. This method creates a
model by using Amazon MTurk and surveying the users about tree representation of an equation where the nodes are an in-
the interpretability. They also studied how the input of human put, a mathematical expression or a number. The output of the
knowledge would affect the interpretability. They showed that tree given an input is found by traversing the tree and perform-
including the human feedback improved the interpretability ing the mathematical expressions as nodes are expanded. The
of their network in a post-study of different Amazon MTurk genetic algorithm is used to perform crossovers and mutations
users. based on the different mathematical functions and numbers
where the goal is to maximize the tree’s performance on a
O. SIGNAL TEMPORAL LOGIC (STL). given dataset. For more detailed information, we recommend
Introduced by Maler and Nichovic [178], STL as a type of Augusto and Barbosa [180].
temporal logic that is used for dense-time real-valued signals. Ding et al. [33] proposed the use of symbolic life models,
STL is defined as predicates over atomic propositions. These specifically dynamic structure-adaptive symbolic approach
STL rules are formed by applying Boolean filters for these (DSASA), as a way of modeling RUL. DSASA combines the
atomic propositions that transforms a signal into a Boolean evolving methods of symbolic life models with the structure
signal. This involves considering: the filter that is being ap- of adaption methods. An initial symbolic life model is created
plied, the length of the signal, the sampling of the signal and from a genetic programming algorithm and run-to-failure
any additional desired samples. We refer the reader to Maler data. This is followed by the dynamic adjustment to the life
and Nichovic [178] Section 4 for an example. models based on the performance on real-time information.
Chen et al. [136] performed fault diagnosis on a furnace This creates groups of improved models that can all be used
using internet-of-things, reinforcement learning, and signal for prediction. The life models are interpretable as they are
temporal logic. Their algorithm takes in the STL grammar simple models that perform based on the physical constraints.
and labeled input data, and it outputs an optimal STL formula.
The agent chooses a formula from the agenda and adds it to R. GENERALIZED ADDITIVE MODEL (GAM).
a chart based on the current policy. The evaluator evaluates Introduced by Hastie and Tibshirani [181], GAMs are a way
the performance of the formula on the input. The learner of estimating a function by summing a list of nonlinear func-
updates the policy function according to the performance. tions in an iterative manner as to become better with accurate
The agenda is updated based on the formulas in the chart. local models as opposed to an overarching global model.
They utilize an MDP to construct the agenda-based formulas These local models are smoothed using a series of smoothing
while the reinforcement learning solves the problem. They functions. Additionally, these local models are independent
apply their method to multiple faults demonstrating good of one another as they are trained using single features. These
robustness results, fast runtimes, and statistically significant local models allow for interpretability as well as importance
performances. related to their impacts on the outcome of the GAM.
Yang et al. [43] introduced the Noise-Aware Sparse Gaus-
P. DIGITAL TWIN. sian Process as a way of solving the scalability and noise sen-
Digital twins originated in 2002 as described by Grieves and sitivity issues of normal Guassian Processes. Based on their
Vickers [179] as a way of creating a digital construct that NASGP algorithm, they developed an interpretable GAM that
describes a physical system. Moreover, digital twins consist uses additive kernels and individual features. They applied
of two systems: a physical system that is represented by the their method to the IEEE PHM 2012 data challenge in forms
18
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

of RUL prediction and fault diagnosis. Their method per- U. RULE-BASED INTERPRETATIONS.
formed well in comparison to other methods and allowed a Similar to rule-based explainers presented in V-A, rule-based
level of interpretability. interpretations involve utilizing rules that are learned from the
data. Unlike the rule-based explainers, rule-based interpreta-
S. MAHALANOBIS-TAGUCHI SYSTEM (MTS). tions remove the black-box from the problem. This allows the
rules to be directly learn from the information as opposed to
MTS was introduced by Taguchi and Jugulum [182] as a
learning from the black-box model and the data.
diagnosis and forecasting method. This method bases its
Dhaou et al. [146] proposed a novel approach that com-
discriminative power on the Mahalanobis distance calcula-
bines case-crossover research design with Apriori data min-
tion; this method cannot feasibly work if the classes cannot
ing. This combination resulted in the Case-crossover APriori
be distinguished this way. The feature space is reduced via
(CAP) algorithm for association and causal rules explanation.
orthogonal arrays and signal-to-noise ratios. The orthogonal
The case-crossover design describes the way of setting up the
array contains different subsets of the features. The signal-to-
problem. They ignored the group of data where nothing goes
noise ratio measures the abnormality of the feature. Finally,
wrong, and they focused on the subjects that have the class
the Mahalanobis distance is maximized by only including the
change. In the case of predictive maintenance, a class change
features whose signal-to-noise ratio increases the distance.
would be from healthy to failure data. The case-crossover
This maximized distance can be seen as the reason for a
design looks at the period prior to class change as the control
diagnosis, which is determined by the features that are used
group, and it looks at moments before the class change as the
to calculate the distance.
case period. These data points are combined with Association
Scott et al. [118] introduced use of the Mahalanobis- Rule Mining APriori to extract causal rules. These causal
Taguchi system for fault detection. MTS utilizes Mahalanobis rules can be both additive (predictive of truth) and subtractive
distance, orthogonal arrays, and signal-to-noise ratios for (predictive of falsehood). Their results show that both additive
multivariate diagnosis and prediction. The Mahalanobis space and subtractive rules help with performance, and they show
represents the stable operations andyields the difference of an their algorithm to outperform random forest on the same
observation from stable. The orthogonal arrays and signal-to- problem.
noise ratio is used to diagnose or identify variables responsi-
ble for the fault. This method was able to detect roughly 75% VII. CHALLENGES AND RESEARCH DIRECTIONS OF
of the faults tested. EXPLAINABLE PREDICTIVE MAINTENANCE
XAI and iML have been successfully utilized in predictive
T. K-NEAREST NEIGHBORS (KNN). maintenance on many accounts. Researchers have shown that
these methods can add to a prediction in a way that can be used
Originally introduced in 1951 by Fix and Hodges [183], kNN
for root cause analysis, validation of faults, etc. The main fo-
is a supervised learning algorithm that is based on grouping
cus of much of the research focuses on adding explainability
input data with the k most similar other pieces of input data.
to a complex and unexplainable problem. While an important
It represents the input data as a large feature space. The
aspect of this field of study, there are multiple facets to the
output of some input data is represented by its place in the
problem that generally go under-represented.
feature space in relation to the k closest other data points.
Small k values lead to less consideration for the output value
A. PURPOSE OF THE EXPLANATIONS
of the input data; however, it also leads to a more specific
All explanations serve one overarching purpose: produce rea-
output. Larger k values lead to considering more values when
sons that make the model’s functioning understandable. This
determining the output; however, too large k values will make
information transfer has taken form in visualizations of data
the output less meaningful.
distributions, visualizations of feature importance graphs,
Konovalenko et al. [145] used a modified kNN algorithm predictive rules, etc; however, the information is not specific
for generating decision support of temperature alarms. They to a target audience. To echo Neupane et al. [15], "explana-
tackled three problems associated with kNN: (1) the difficulty tions are not being designed around stakeholders". Not only
associated with sparse regions; (2) the blindness to class are the explanations not being designed for stakeholders, but
boundaries leading to misclassifications; and (3) sensitivity also many explanations do not have a target audience outside
to class overlap. These problems were addressed by adding of the implicit audience of the model’s designer.
principles of local similarity and neighborhood homogeneity. Barredo Arrieta et al. [184] provides a list of potential au-
Local similarity refers to the idea that a new data is closer to diences XAI can target. While they go into more detail, some
training samples with the same class label. Neighborhood ho- potential target audiences, especially for predictive mainte-
mogeneity is the idea that new data falls into a neighborhood nance, could be the data scientists and developers creating
where the class label represents the majority. This method is the predictive system, the project managers and stakeholders
interpretable through its ability to separate classes of data on in the project, or even the mechanics working on the physical
a small dimensional graph. systems. These different people may need different types of
explanations ranging from more explanations relating to the
19
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

physical and time domains to higher level abstract informa-


tion.
TABLE 5. Explanation Evaluation Metrics from [185]–[189]

Metrics Viewpoint Description


D Objective Difference between the model’s
performance and the explana-
tion’s performance
R Objective Number of rules in explanations
F Objective Number of features in explanation
S Objective The stability of the explanation
Sensitivity Objective Measure the degree in which ex-
planations are affected by small
changes to the test points
Robustness Objective Similar inputs should have similar FIGURE 11. Potential Audiences of Explainable Predictive Maintenance.
explanations Icons taken from 2
Monotonicity Objective Feature attributions should be
monotonic; otherwise, the correct
importance is not calculated Kadir et al. [188] propose a taxonomy of XAI evaluations
Explanation cor- Objective Sensitivity and Fidelity
rectness as they appeared in the literature. They identified 28 dif-
Fidelity Objective Explanations correctly describe ferent metrics through their literature search. These metrics
the model; features and their attri- are broken down into a taxonomy of how the analysis is
bution are correlated
Generalizability Objective How much one explanation in- performed. An example would be sensitivity analysis for local
forms about others explanations. Sensitivity analysis is broken down into the
Trust Subjective Measured through user question- removal of features and the addition of features. Each of these
naires
Effectiveness Subjective Measures the usefulness of the ex- categories then includes many methods that were used.
planations Hoffman et al. [189] express the importance of high quality
Satisfaction Subjective Ease of use explanations in XAI. If explanations are received well and
are valid, a user would be better equipped to trust and use
a system that employes the XAI process. This allows for
B. EVALUATION OF THE EXPLANATIONS multiple areas of evaluations including the goodness of the
In the literature presented above, there are over ten differ- explanation, the satisfaction the explanations provided to
ent evaluation metrics for the performance of the machine the users, the comprehension of the user, the curiosity that
learning algorithms, including RMSE, MAPE, FP, etc. This motivates the user, the trust and reliance the user has with
shows that the field has collectively come to an agreement on the AI, and the performance of the human-XAI system. They
how we should measure performance in a meaningful way. provide methods for measuring these metrics that are readily
The evaluation of the explanations has not received the same available.
attention as the performance of the algorithm even though
work has been done in defining these different metrics, some C. ADDITION OF HUMAN INVOLVEMENT
of which are seen in Table 5. The target audience of an explainable system is a human
Miller [185] provides one of the most in-depth descriptions subject whether a data scientist, a stakeholder, an engineer,
of various people’s needs regarding explanations. Miller has or other. Addressing the needs of different types of users of
provided many theoretical representations for explanation an explainable system is an important area of research that
including scientific explanations and data explanations. They is currently lacking. As seen in Fig. 11, different people on
also provide much more information including levels of ex- the same task have different goals and desires from predic-
planation that could be applicable to different types of users, tive maintenance. While compensating for these differences
structures of explanations that could impact the power of the would be difficult, we suggest a way to accomplish this,
explanations, and more. together with the resulting benefits.
Coroama and Groza [186] present 37 different metrics for First, a target audience for the explainable system should be
measuring the effectiveness of an explanation. The meth- identified, ensuring that a sample population of statistically
ods range from objective to subjective types. Each method significant size is used. Presenting the information to this
includes the property it measures and whether there is a sample population would bring many benefits to the XAI
systemic implementation. field as a whole. These include: making more quality metrics
Sisk et al. [187] present the case for human-centered eval- available, allowing researchers to discern which information
uations and objective evaluations for explainable methods. is more or less useful, and bringing more attention to cus-
Their human-centered evaluations aim at partitioning the tomizable explanations via the type of user. These would
users based on their wants from explainable systems. The push the field of XAI forward as well as push the field of
objective metrics provided involve many aspects of the ex-
planations including number of rules and number of features. 2 https://icons8.com/

20
Cummins et al.: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and Opportunities

predictive maintenance forward towards a human-AI teaming [3] F. Longo, A. Padovano, and S. Umbrello, ‘‘Value-Oriented and ethical
environment. technology engineering in industry 5.0: A Human-Centric perspective for
the design of the factory of the future,’’ NATO Adv. Sci. Inst. Ser. E Appl.
Sci., vol. 10, no. 12, p. 4182, Jun. 2020.
D. STUDY LIMITATIONS [4] S. Nahavandi, ‘‘Industry 5.0—a Human-Centric solution,’’ Sustain. Sci.
This study focuses on a small amount of potential XAI and Pract. Policy, vol. 11, no. 16, p. 4371, Aug. 2019.
[5] A. Lavopa and M. Delera, ‘‘What is the Fourth Industrial Revolution?
iML literature. While this survey reflects the work done as | Industrial Analytics Platform,’’ 2021. [Online]. Available: https:
applied to predictive maintenance, it does not reflect many of //iap.unido.org/articles/what-fourth-industrial-revolution
the applied XAI and iML algorithms that exist. It also does not [6] L. Cummins, B. Killen, K. Thomas, P. Barrett, S. Rahimi, and M. Seale,
reflect all of the applicable ML algorithms developed within ‘‘Deep learning approaches to remaining useful life prediction: A survey,’’
in 2021 IEEE Symposium Series on Computational Intelligence (SSCI),
the context of predictive maintenance. While we do not see 2021, pp. 1–9.
this as a detriment to the article presented, we do note that [7] T. Speith, ‘‘A review of taxonomies of explainable artificial intelligence
there are a number of popular methods of which the reader (xai) methods,’’ in Proceedings of the 2022 ACM Conference on Fairness,
Accountability, and Transparency, 2022, pp. 2239–2250.
may be aware that are not present. [8] J. Zhou, A. H. Gandomi, F. Chen, and A. Holzinger, ‘‘Evaluating the
quality of machine learning explanations: A survey on methods and
VIII. CONCLUSION metrics,’’ Electronics, vol. 10, no. 5, p. 593, 2021.
Over the last decade, predictive maintenance has occupied [9] M. Nauta, J. Trienes, S. Pathak, E. Nguyen, M. Peters, Y. Schmitt,
J. Schlötterer, M. van Keulen, and C. Seifert, ‘‘From anecdotal evidence
a considerable presence in the field of machine learning to quantitative evaluation methods: A systematic review on evaluating
research. As we move towards complex mechanical systems explainable ai,’’ ACM Computing Surveys, 2022.
with interdependencies that we struggle to explain, predictive [10] Y. Rong, T. Leemann, T.-t. Nguyen, L. Fiedler, T. Seidel, G. Kasneci,
and E. Kasneci, ‘‘Towards human-centered explainable ai: user studies
maintenance allows us to break down the mysticality of what for model explanations,’’ arXiv preprint arXiv:2210.11584, 2022.
could potentially go wrong in the system. Many of these [11] J. Sharma, M. L. Mittal, and G. Soni, ‘‘Condition-based maintenance us-
approaches move us closer to understanding the system while ing machine learning and role of interpretability: a review,’’ International
Journal of System Assurance Engineering and Management, Dec. 2022.
building a new system that we need to comprehend. Ex- [12] R. Marcinkevičs and J. E. Vogt, ‘‘Interpretable and explainable machine
plainable predictive maintenance and interpretable predictive learning: A methods-centric overview with concrete examples,’’ Wiley
maintenance aim at breaking down these new walls to bring Interdisciplinary Reviews: Data Mining and Knowledge Discovery, p.
e1493, 2023.
us closer to a clear understanding of the mechanical system.
[13] M. Clinciu and H. Hastie, ‘‘A survey of explainable ai terminology,’’
In this review, we provided a wide range of methods that in Proceedings of the 1st workshop on interactive natural language
are being used to tackle the problem of explainability. These technology for explainable artificial intelligence (NL4XAI 2019), 2019,
methods are broken down in XAI and iML approaches. In our pp. 8–13.
[14] S. S. Kim, E. A. Watkins, O. Russakovsky, R. Fong, and A. Monroy-
writing, XAI was broken-up into model-agnostic approaches Hernández, ‘‘" help me help the ai": Understanding how explainability
like SHAP, LIME and LRP, and model-specific approaches can support human-ai interaction,’’ in Proceedings of the 2023 CHI
like GradCAM and DIFFI. iML approaches all apply different Conference on Human Factors in Computing Systems, 2023, pp. 1–17.
[15] S. Neupane, J. Ables, W. Anderson, S. Mittal, and others, ‘‘Explainable
methods of applying inherently interpretable models to the intrusion detection systems (x-ids): A survey of current methods, chal-
problem of predictive maintenance. lenges, and opportunities,’’ IEEE, 2022.
Our systematic review of XAI and iML as applied to predictive maintenance revealed weak points in the field that can be addressed. Chief among them is the limited use of explanation-quality metrics in predictive maintenance, even though the XAI literature offers a number of metrics that do not require presenting the explanations to the target audience of the explainable system. We provided a list of potential metrics from the literature that can be applied to this domain.

Lastly, we provided a short description of how humans can be brought into the evaluation of explainable and interpretable methods. After defining the target audience, researchers can recruit a sample of that audience large enough for statistically meaningful conclusions. Presenting the explanations to that sample yields feedback that lets the field move towards explanations specified by, and suited to, their human audience.
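As a minimal sketch of what such a human-grounded evaluation could look like, the snippet below compares the usefulness ratings that two groups of study participants might assign to competing explanation styles using an independent two-sample t-test [172]. The group sizes, rating scale, and numbers are fabricated for illustration.

```python
# Hypothetical sketch of a human-grounded evaluation. The ratings are
# fabricated placeholders, not data from any study in this survey.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Suppose two groups of 40 technicians each rate explanation usefulness
# on a 1-7 Likert scale, one group per explanation style.
ratings_saliency = np.clip(rng.normal(loc=4.8, scale=1.0, size=40), 1, 7)
ratings_counterfactual = np.clip(rng.normal(loc=5.4, scale=1.0, size=40), 1, 7)

# The t-test asks whether the difference in mean ratings exceeds what
# sampling noise alone would plausibly produce.
t_stat, p_value = stats.ttest_ind(ratings_saliency, ratings_counterfactual)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) would indicate that the target audience genuinely distinguishes the two designs, turning anecdotal preference into the quantitative evidence the field currently lacks.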
REFERENCES
[1] S. B. Ramezani, L. Cummins, B. Killen, R. Carley, A. Amirlatifi, S. Rahimi, M. Seale, and L. Bian, "Scalability, explainability and performance of data-driven algorithms in predicting the remaining useful life: A comprehensive review," IEEE Access, 2023.
[2] J. Leng, W. Sha, B. Wang, P. Zheng, C. Zhuang, Q. Liu, T. Wuest, D. Mourtzis, and L. Wang, "Industry 5.0: Prospect and retrospect," Journal of Manufacturing Systems, vol. 65, pp. 279–295, Oct. 2022.
[8] J. Zhou, A. H. Gandomi, F. Chen, and A. Holzinger, "Evaluating the quality of machine learning explanations: A survey on methods and metrics," Electronics, vol. 10, no. 5, p. 593, 2021.
[9] M. Nauta, J. Trienes, S. Pathak, E. Nguyen, M. Peters, Y. Schmitt, J. Schlötterer, M. van Keulen, and C. Seifert, "From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI," ACM Computing Surveys, 2022.
[10] Y. Rong, T. Leemann, T.-t. Nguyen, L. Fiedler, T. Seidel, G. Kasneci, and E. Kasneci, "Towards human-centered explainable AI: User studies for model explanations," arXiv preprint arXiv:2210.11584, 2022.
[11] J. Sharma, M. L. Mittal, and G. Soni, "Condition-based maintenance using machine learning and role of interpretability: A review," International Journal of System Assurance Engineering and Management, Dec. 2022.
[12] R. Marcinkevičs and J. E. Vogt, "Interpretable and explainable machine learning: A methods-centric overview with concrete examples," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, p. e1493, 2023.
[13] M. Clinciu and H. Hastie, "A survey of explainable AI terminology," in Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019), 2019, pp. 8–13.
[14] S. S. Kim, E. A. Watkins, O. Russakovsky, R. Fong, and A. Monroy-Hernández, "'Help me help the AI': Understanding how explainability can support human-AI interaction," in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023, pp. 1–17.
[15] S. Neupane, J. Ables, W. Anderson, S. Mittal et al., "Explainable intrusion detection systems (X-IDS): A survey of current methods, challenges, and opportunities," IEEE Access, 2022.
[16] A. K. M. Nor, S. R. Pedapati, M. Muhammad, and V. Leiva, "Overview of explainable artificial intelligence for prognostic and health management of industrial assets based on preferred reporting items for systematic reviews and meta-analyses," Sensors, vol. 21, no. 23, Dec. 2021.
[17] S. Ali, T. Abuhmed, S. El-Sappagh, K. Muhammad, J. M. Alonso-Moral, R. Confalonieri, R. Guidotti, J. D. Ser, N. Díaz-Rodríguez, and F. Herrera, "Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence," Inf. Fusion, p. 101805, Apr. 2023.
[18] T. Rojat, R. Puget, D. Filliat, J. Del Ser, R. Gelin, and N. Díaz-Rodríguez, "Explainable artificial intelligence (XAI) on time series data: A survey," arXiv preprint arXiv:2104.00950, 2021.
[19] K. Sokol and P. Flach, "Explainability is in the mind of the beholder: Establishing the foundations of explainable artificial intelligence," arXiv preprint arXiv:2112.14466, 2021.
[20] T. A. Schoonderwoerd, W. Jorritsma, M. A. Neerincx, and K. Van Den Bosch, "Human-centered XAI: Developing design patterns for explanations of clinical decision support systems," International Journal of Human-Computer Studies, vol. 154, p. 102684, 2021.
[21] P. Lopes, E. Silva, C. Braga, T. Oliveira, and L. Rosado, "XAI systems evaluation: A review of human and computer-centred methods," Applied Sciences, vol. 12, no. 19, p. 9423, 2022.
[22] S. Vollert, M. Atzmueller, and A. Theissler, "Interpretable machine learning: A brief survey from the predictive maintenance perspective," in 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Sep. 2021, pp. 01–08.
[23] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," Advances in Neural Information Processing Systems, vol. 30, 2017.
[24] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[25] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Jun. 2016, pp. 2921–2929.
[26] C. Molnar, Interpretable Machine Learning. Lulu.com, 2020.
[27] S. B. Ramezani, L. Cummins, B. Killen, R. Carley, S. Rahimi, and M. Seale, "Similarity based methods for faulty pattern detection in predictive maintenance," in 2021 International Conference on Computational Science and Computational Intelligence (CSCI), 2021, pp. 207–213.
[28] Y. Wen, F. Rahman, H. Xu, and T.-L. B. Tseng, "Recent advances and trends of predictive maintenance from data-driven machine prognostics perspective," Measurement, vol. 187, p. 110276, Jan. 2022.
[29] S. B. Ramezani, B. Killen, L. Cummins, S. Rahimi, A. Amirlatifi, and M. Seale, "A survey of HMM-based algorithms in machinery fault prediction," in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021, pp. 1–9.
[30] K. L. Tsui, N. Chen, Q. Zhou, Y. Hai, W. Wang et al., "Prognostics and health management: A review on data driven approaches," Mathematical Problems in Engineering, vol. 2015, 2015.
[31] M. J. Page, D. Moher, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer, J. M. Tetzlaff, E. A. Akl, S. E. Brennan et al., "PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews," BMJ, vol. 372, 2021.
[32] N. R. Haddaway, M. J. Page, C. C. Pritchard, and L. A. McGuinness, "PRISMA2020: An R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and open synthesis," Campbell Systematic Reviews, vol. 18, no. 2, p. e1230, 2022.
[33] P. Ding, M. Jia, and H. Wang, "A dynamic structure-adaptive symbolic approach for slewing bearings' life prediction under variable working conditions," Structural Health Monitoring, vol. 20, no. 1, pp. 273–302, Jan. 2021.
[34] G. Manco, E. Ritacco, P. Rullo, L. Gallucci, W. Astill, D. Kimber, and M. Antonelli, "Fault detection and explanation through big data analysis on sensor streams," Expert Systems with Applications, vol. 87, pp. 141–156, 2017.
[35] G. Protopapadakis and A. I. Kalfas, "Explainable and interpretable AI-assisted remaining useful life estimation for aeroengines," ASME Turbo Expo 2022: Turbomachinery Technical Conference and Exposition, p. V002T05A002, Oct. 2022.
[36] T. Khan, K. Ahmad, J. Khan, I. Khan, and N. Ahmad, "An explainable regression framework for predicting remaining useful life of machines," in 2022 27th International Conference on Automation and Computing (ICAC). IEEE, 2022, pp. 1–6.
[37] D. Solís-Martín, J. Galán-Páez, and J. Borrego-Díaz, "On the soundness of XAI in prognostics and health management (PHM)," Information, vol. 14, no. 5, p. 256, Apr. 2023.
[38] A. Ferraro, A. Galli, V. Moscato, and G. Sperlì, "Evaluating explainable artificial intelligence tools for hard disk drive predictive maintenance," Artificial Intelligence Review, vol. 56, no. 7, pp. 7279–7314, Jul. 2023.
[39] P. Nectoux, R. Gouriveau, K. Medjaher, E. Ramasso, B. Chebel-Morello, N. Zerhouni, and C. Varnier, "PRONOSTIA: An experimental platform for bearings accelerated degradation tests," in IEEE International Conference on Prognostics and Health Management, PHM'12, 2012, pp. 1–8.
[40] H. Qiu, J. Lee, J. Lin, and G. Yu, "Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics," Journal of Sound and Vibration, vol. 289, no. 4, pp. 1066–1090, 2006. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0022460X0500221X
[41] R. Yao, H. Jiang, C. Yang, H. Zhu, and C. Liu, "An integrated framework via key-spectrum entropy and statistical properties for bearing dynamic health monitoring and performance degradation assessment," Mechanical Systems and Signal Processing, vol. 187, p. 109955, 2023.
[42] L. C. Brito, G. A. Susto, J. N. Brito, and M. A. Duarte, "An explainable artificial intelligence approach for unsupervised fault detection and diagnosis in rotating machinery," Mechanical Systems and Signal Processing, vol. 163, p. 108105, 2022.
[43] J. Yang, Z. Yue, and Y. Yuan, "Noise-aware sparse gaussian processes and application to reliable industrial machinery health monitoring," IEEE Transactions on Industrial Informatics, vol. 19, no. 4, pp. 5995–6005, 2022.
[44] O. Mey and D. Neufeld, "Explainable AI algorithms for vibration data-based fault detection: Use case-adapted methods and critical evaluation," Sensors, vol. 22, no. 23, 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/23/9037
[45] T. Li, Z. Zhao, C. Sun, L. Cheng, X. Chen, R. Yan, and R. X. Gao, "WaveletKernelNet: An interpretable deep neural network for industrial intelligent diagnosis," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 4, pp. 2302–2312, 2022.
[46] H. Pu, K. Zhang, and Y. An, "Restricted sparse networks for rolling bearing fault diagnosis," IEEE Transactions on Industrial Informatics, pp. 1–11, 2023.
[47] G. Xin, Z. Li, L. Jia, Q. Zhong, H. Dong, N. Hamzaoui, and J. Antoni, "Fault diagnosis of wheelset bearings in high-speed trains using logarithmic short-time Fourier transform and modified self-calibrated residual network," IEEE Transactions on Industrial Informatics, vol. 18, no. 10, pp. 7285–7295, 2022.
[48] L. C. Brito, G. A. Susto, J. N. Brito, and M. A. V. Duarte, "Fault diagnosis using explainable AI: A transfer learning-based approach for rotating machinery exploiting augmented synthetic data," Expert Systems with Applications, vol. 232, p. 120860, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0957417423013623
[49] F. Ben Abid, M. Sallem, and A. Braham, "An end-to-end bearing fault diagnosis and severity assessment with interpretable deep learning," Journal of Electrical Systems, vol. 18, no. 4, 2022.
[50] D. C. Sanakkayala, V. Varadarajan, N. Kumar, Karan, G. Soni, P. Kamat, S. Kumar, S. Patil, and K. Kotecha, "Explainable AI for bearing fault prognosis using deep learning techniques," Micromachines, vol. 13, no. 9, p. 1471, 2022.
[51] O. Serradilla, E. Zugasti, C. Cernuda, A. Aranburu, J. R. de Okariz, and U. Zurutuza, "Interpreting remaining useful life estimations combining explainable artificial intelligence and domain knowledge in industrial machinery," in 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2020, pp. 1–8.
[52] R. Kothamasu and S. H. Huang, "Adaptive Mamdani fuzzy model for condition-based maintenance," Fuzzy Sets and Systems, vol. 158, no. 24, pp. 2715–2733, 2007.
[53] E. Lughofer, P. Zorn, and E. Marth, "Transfer learning of fuzzy classifiers for optimized joint representation of simulated and measured data in anomaly detection of motor phase currents," Applied Soft Computing, vol. 124, p. 109013, 2022.
[54] A. L. Alfeo, M. G. C. A. Cimino, and G. Vaglini, "Degradation stage classification via interpretable feature learning," Journal of Manufacturing Systems, vol. 62, pp. 972–983, Jan. 2022.
[55] J. Wang, M. Xu, C. Zhang, B. Huang, and F. Gu, "Online bearing clearance monitoring based on an accurate vibration analysis," Energies, vol. 13, no. 2, p. 389, 2020.
[56] W. Wang, Z. Peng, S. Wang, H. Li, M. Liu, L. Xue, and N. Zhang, "IFP-ADAC: A two-stage interpretable fault prediction model for multivariate time series," in 2021 22nd IEEE International Conference on Mobile Data Management (MDM). IEEE, 2021, pp. 29–38.
[57] C. Panda and T. R. Singh, "ML-based vehicle downtime reduction: A case of air compressor failure detection," Engineering Applications of Artificial Intelligence, vol. 122, p. 106031, 2023.
[58] S. Xia, X. Zhou, H. Shi, S. Li, and C. Xu, "A fault diagnosis method with multi-source data fusion based on hierarchical attention for AUV," Ocean Engineering, vol. 266, p. 112595, 2022.
[59] Y. Fan, H. Sarmadi, and S. Nowaczyk, "Incorporating physics-based models into data driven approaches for air leak detection in city buses," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2022, pp. 438–450.
[60] W. Li, H. Lan, J. Chen, K. Feng, and R. Huang, "WavCapsNet: An interpretable intelligent compound fault diagnosis method by backward tracking," IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–11, 2023.
[61] G. B. Jang and S. B. Cho, "Anomaly detection of 2.4L diesel engine using one-class SVM with variational autoencoder," in Proc. Annual Conference of the Prognostics and Health Management Society, vol. 11, no. 1, 2019.
[62] Y. Ming, P. Xu, H. Qu, and L. Ren, "Interpretable and steerable sequence learning via prototypes," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 903–913.
[63] C. Oh, J. Moon, and J. Jeong, "Explainable process monitoring based on class activation map: Garbage in, garbage out," in IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, J. Gama, S. Pashami, A. Bifet, M. Sayed-Mouchawe, H. Fröning, F. Pernkopf, G. Schiele, and M. Blott, Eds. Cham: Springer International Publishing, 2020, pp. 93–105.
[64] A. B. Hafeez, E. Alonso, and A. Riaz, "DTCEncoder: A Swiss army knife architecture for DTC exploration, prediction, search and model interpretation," in 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2022, pp. 519–524.
[65] R. P. Ribeiro, S. M. Mastelini, N. Davari, E. Aminian, B. Veloso, and J. Gama, "Online anomaly explanation: A case study on predictive maintenance," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2022, pp. 383–399.
[66] X. Li, Y. Sun, and W. Yu, "Automatic and interpretable predictive maintenance system," in SAE Technical Paper Series, no. 2021-01-0247. 400 Commonwealth Drive, Warrendale, PA, United States: SAE International, Apr. 2021.
[67] S. Voronov, D. Jung, and E. Frisk, "A forest-based algorithm for selecting informative variables using variable depth distribution," Engineering Applications of Artificial Intelligence, vol. 97, p. 104073, 2021.
[68] J.-H. Han, S.-U. Park, and S.-K. Hong, "A study on the effectiveness of current data in motor mechanical fault diagnosis using XAI," Journal of Electrical Engineering & Technology, vol. 17, no. 6, pp. 3329–3335, Nov. 2022.
[69] A. Saxena and K. Goebel, "Turbofan engine degradation simulation data set," NASA Ames Prognostics Data Repository, vol. 18, 2008.
[70] A. Brunello, D. Della Monica, A. Montanari, N. Saccomanno, and A. Urgolo, "Monitors that learn from failures: Pairing STL and genetic programming," IEEE Access, 2023.
[71] Z. Wu, H. Luo, Y. Yang, P. Lv, X. Zhu, Y. Ji, and B. Wu, "K-PdM: KPI-oriented machinery deterioration estimation framework for predictive maintenance using cluster-based hidden Markov model," IEEE Access, vol. 6, pp. 41,676–41,687, 2018.
[72] J. Jakubowski, P. Stanisz, S. Bobek, and G. J. Nalepa, "Anomaly detection in asset degradation process using variational autoencoder and explanations," Sensors, vol. 22, no. 1, p. 291, 2021.
[73] A. Brunello, D. Della Monica, A. Montanari, and A. Urgolo, "Learning how to monitor: Pairing monitoring and learning for online system verification," in OVERLAY, 2020, pp. 83–88.
[74] N. Costa and L. Sánchez, "Variational encoding approach for interpretable assessment of remaining useful life estimation," Reliability Engineering & System Safety, vol. 222, p. 108353, 2022.
[75] M. Sayed-Mouchaweh and L. Rajaoarisoa, "Explainable decision support tool for IoT predictive maintenance within the context of Industry 4.0," in 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2022, pp. 1492–1497.
[76] J. Jakubowski, P. Stanisz, S. Bobek, and G. J. Nalepa, "Performance of explainable AI methods in asset failure prediction," in Computational Science – ICCS 2022. Springer International Publishing, 2022, pp. 472–485.
[77] E. Kononov, A. Klyuev, and M. Tashkinov, "Prediction of technical state of mechanical systems based on interpretive neural network model," Sensors, vol. 23, no. 4, Feb. 2023.
[78] T. Jing, P. Zheng, L. Xia, and T. Liu, "Transformer-based hierarchical latent space VAE for interpretable remaining useful life prediction," Advanced Engineering Informatics, vol. 54, p. 101781, Oct. 2022.
[79] K. Waghen and M.-S. Ouali, "A data-driven fault tree for a time causality analysis in an aging system," Algorithms, vol. 15, no. 6, p. 178, May 2022.
[80] A. N. Abbas, G. C. Chasparis, and J. D. Kelleher, "Interpretable input-output hidden Markov model-based deep reinforcement learning for the predictive maintenance of turbofan engines," in Big Data Analytics and Knowledge Discovery. Springer International Publishing, 2022, pp. 133–148.
[81] J. Brito and R. Pederiva, "Using artificial intelligence tools to detect problems in induction motors," in Proceedings of the 1st International Conference on Soft Computing and Intelligent Systems (International Session of 8th SOFT Fuzzy Systems Symposium) and 3rd International Symposium on Advanced Intelligent Systems (SCIS and ISIS 2002), vol. 1, 2002, pp. 1–6.
[82] A.-C. Glock, "Explaining a random forest with the difference of two ARIMA models in an industrial fault detection scenario," Procedia Computer Science, vol. 180, pp. 476–481, 2021.
[83] S. Matzka, "Explainable artificial intelligence for predictive maintenance applications," in 2020 Third International Conference on Artificial Intelligence for Industries (AI4I). IEEE, 2020, pp. 69–74.
[84] A. Torcianti and S. Matzka, "Explainable artificial intelligence for predictive maintenance applications using a local surrogate model," in 2021 4th International Conference on Artificial Intelligence for Industries (AI4I). IEEE, 2021, pp. 86–88.
[85] Y. Remil, A. Bendimerad, M. Plantevit, C. Robardet, and M. Kaytoue, "Interpretable summaries of black box incident triaging with subgroup discovery," in 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), Oct. 2021, pp. 1–10.
[86] B. Ghasemkhani, O. Aktas, and D. Birant, "Balanced K-Star: An explainable machine learning method for Internet-of-Things-enabled predictive maintenance in manufacturing," Machines, vol. 11, no. 3, p. 322, Feb. 2023.
[87] J. Liu, S. Zheng, and C. Wang, "Causal graph attention network with disentangled representations for complex systems fault detection," Reliability Engineering & System Safety, vol. 235, p. 109232, 2023.
[88] A. Trilla, N. Mijatovic, and X. Vilasis-Cardona, "Unsupervised probabilistic anomaly detection over nominal subsystem events through a hierarchical variational autoencoder," International Journal of Prognostics and Health Management, vol. 14, no. 1, 2023.
[89] I. Errandonea, P. Ciáurriz, U. Alvarado, S. Beltrán, and S. Arrizabalaga, "Edge intelligence-based proposal for onboard catenary stagger amplitude diagnosis," Computers in Industry, vol. 144, p. 103781, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0166361522001774
[90] B. Steenwinckel, D. De Paepe, S. V. Hautte, P. Heyvaert, M. Bentefrit, P. Moens, A. Dimou, B. Van Den Bossche, F. De Turck, S. Van Hoecke et al., "FLAGS: A methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning," Future Generation Computer Systems, vol. 116, pp. 30–48, 2021.
[91] H. Li, D. Parikh, Q. He, B. Qian, Z. Li, D. Fang, and A. Hampapur, "Improving rail network velocity: A machine learning approach to predictive maintenance," Transportation Research Part C: Emerging Technologies, vol. 45, pp. 17–26, 2014.
[92] Z. Allah Bukhsh, A. Saeed, I. Stipanovic, and A. G. Doree, "Predictive maintenance using tree-based classification techniques: A case of railway switches," Transp. Res. Part C: Emerg. Technol., vol. 101, pp. 35–54, Apr. 2019.
[93] P. Cao, S. Zhang, and J. Tang, "Preprocessing-free gear fault diagnosis using small datasets with deep convolutional neural network-based transfer learning," IEEE Access, vol. 6, pp. 26,241–26,253, 2018.
[94] G. Hajgató, R. Wéber, B. Szilágyi, B. Tóthpál, B. Gyires-Tóth, and C. Hős, "PredMaX: Predictive maintenance with explainable deep convolutional autoencoders," Advanced Engineering Informatics, vol. 54, p. 101778, Oct. 2022.
[95] J. Jakubowski, P. Stanisz, S. Bobek, and G. J. Nalepa, "Explainable anomaly detection for hot-rolling industrial process," in 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2021, pp. 1–10.
[96] N. Mylonas, I. Mollas, N. Bassiliades, and G. Tsoumakas, "Local multi-label explanations for random forest," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2022, pp. 369–384.
[97] J. Jakubowski, P. Stanisz, S. Bobek, and G. J. Nalepa, "Roll wear prediction in strip cold rolling with physics-informed autoencoder and counterfactual explanations," in 2022 IEEE 9th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2022, pp. 1–10.
[98] W. Xu, Z. Zhou, T. Li, C. Sun, X. Chen, and R. Yan, "Physics-constraint variational neural network for wear state assessment of external gear pump," IEEE Transactions on Neural Networks and Learning Systems, 2022.
[99] J. M. F. Salido and S. Murakami, "A comparison of two learning mechanisms for the automatic design of fuzzy diagnosis systems for rotating machinery," Applied Soft Computing, vol. 4, no. 4, pp. 413–422, 2004.
[100] R. Langone, A. Cuzzocrea, and N. Skantzos, "Interpretable anomaly prediction: Predicting anomalous behavior in industry 4.0 settings via regularized logistic regression tools," Data & Knowledge Engineering, vol. 130, p. 101850, 2020.
[101] V. M. Janakiraman, "Explaining aviation safety incidents using deep temporal multiple instance learning," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 406–415.
[102] M. Berno, M. Canil, N. Chiarello, L. Piazzon, F. Berti, F. Ferrari, A. Zaupa, N. Ferro, M. Rossi, and G. A. Susto, "A machine learning-based approach for advanced monitoring of automated equipment for the entertainment industry," in 2021 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT). IEEE, 2021, pp. 386–391.
[103] E. Anello, C. Masiero, F. Ferro, F. Ferrari, B. Mukaj, A. Beghi, and G. A. Susto, "Anomaly detection for the industrial internet of things: An unsupervised approach for fast root cause analysis," in 2022 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 2022, pp. 1366–1371.
[104] D. Marcato, G. Arena, D. Bortolato, F. Gelain, V. Martinelli, E. Munaron, M. Roetta, G. Savarese, and G. A. Susto, "Machine learning-based anomaly detection for particle accelerators," in 2021 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 2021, pp. 240–246.
[105] L. Felsberger, A. Apollonio, T. Cartier-Michaud, A. Müller, B. Todd, and D. Kranzlmüller, "Explainable deep learning for fault prognostics in complex systems: A particle accelerator use-case," in Machine Learning and Knowledge Extraction: 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020, Dublin, Ireland, August 25–28, 2020, Proceedings 4. Springer, 2020, pp. 139–158.
[106] P. Bellini, D. Cenni, L. A. I. Palesi, P. Nesi, and G. Pantaleo, "A deep learning approach for short term prediction of industrial plant working status," in 2021 IEEE Seventh International Conference on Big Data Computing Service and Applications (BigDataService). IEEE, 2021, pp. 9–16.
[107] H. Choi, D. Kim, J. Kim, J. Kim, and P. Kang, "Explainable anomaly detection framework for predictive maintenance in manufacturing systems," Applied Soft Computing, vol. 125, p. 109147, 2022.
[108] D. Kim, G. Antariksa, M. P. Handayani, S. Lee, and J. Lee, "Explainable anomaly detection framework for maritime main engine sensor data," Sensors, vol. 21, no. 15, p. 5200, 2021.
[109] K. Michałowska, S. Riemer-Sørensen, C. Sterud, and O. M. Hjellset, "Anomaly detection with unknown anomalies: Application to maritime machinery," IFAC-PapersOnLine, vol. 54, no. 16, pp. 105–111, 2021.
[110] A. Bakdi, N. B. Kristensen, and M. Stakkeland, "Multiple instance learning with random forest for event logs analysis and predictive maintenance in ship electric propulsion system," IEEE Trans. Ind. Inf., vol. 18, no. 11, pp. 7718–7728, Nov. 2022.
[111] M. McCann and A. Johnston, "SECOM," UCI Machine Learning Repository, 2008, DOI: https://doi.org/10.24432/C54305.
[112] M. Gashi, B. Mutlu, and S. Thalmann, "Impact of interdependencies: Multi-component system perspective toward predictive maintenance based on machine learning and XAI," Applied Sciences, vol. 13, no. 5, p. 3088, 2023.
[113] Q. Cao, C. Zanni-Merk, A. Samet, F. d. B. de Beuvron, and C. Reich, "Using rule quality measures for rule base refinement in knowledge-based predictive maintenance systems," Cybernetics and Systems, vol. 51, no. 2, pp. 161–176, 2020.
[114] A. Klein, "Hard drive failure rates: A look at drive reliability," Jul. 2021. [Online]. Available: https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2020/
[115] M. Amram, J. Dunn, J. J. Toledano, and Y. D. Zhuo, "Interpretable predictive maintenance for hard drives," Machine Learning with Applications, vol. 5, p. 100042, Sep. 2021.
[116] I. Katser, V. Kozitsin, V. Lobachev, and I. Maksimov, "Unsupervised offline changepoint detection ensembles," Applied Sciences, vol. 11, no. 9, p. 4280, 2021.
[117] D. Dua and C. Graff, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[118] K. Scott, D. Kakde, S. Peredriy, and A. Chaudhuri, "Computational enhancements to the Mahalanobis-Taguchi system to improve fault detection and diagnostics," in 2023 Annual Reliability and Maintainability Symposium (RAMS). IEEE, 2023, pp. 1–7.
[119] K. S. Hansen, N. Vasiljevic, and S. A. Sørensen, "Wind farm measurements," May 2021. [Online]. Available: https://data.dtu.dk/collections/Wind_Farm_measurements/5405418/3
[120] C. M. Roelofs, M.-A. Lutz, S. Faulstich, and S. Vogt, "Autoencoder-based anomaly root cause analysis for wind turbines," Energy and AI, vol. 4, p. 100065, 2021.
[121] M. Beretta, A. Julian, J. Sepulveda, J. Cusidó, and O. Porro, "An ensemble learning solution for predictive maintenance of wind turbines main bearing," Sensors, vol. 21, no. 4, Feb. 2021.
[122] M. Beretta, Y. Vidal, J. Sepulveda, O. Porro, and J. Cusidó, "Improved ensemble learning for wind turbine main bearing fault diagnosis," Applied Sciences, vol. 11, no. 16, 2021. [Online]. Available: https://www.mdpi.com/2076-3417/11/16/7523
[123] S. J. Upasane, H. Hagras, M. H. Anisi, S. Savill, I. Taylor, and K. Manousakis, "A big bang-big crunch type-2 fuzzy logic system for explainable predictive maintenance," in 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE). IEEE, 2021, pp. 1–8.
[124] P. M. Attia, A. Grover, N. Jin, K. A. Severson, T. M. Markov, Y.-H. Liao, M. H. Chen, B. Cheong, N. Perkins, Z. Yang et al., "Closed-loop optimization of fast-charging protocols for batteries with machine learning," Nature, vol. 578, no. 7795, pp. 397–402, 2020.
[125] R. Csalódi, Z. Bagyura, and J. Abonyi, "Mixture of survival analysis models-cluster-weighted Weibull distributions," IEEE Access, vol. 9, pp. 152,288–152,299, 2021.
[126] F. Wang, Z. Zhao, Z. Zhai, Z. Shang, R. Yan, and X. Chen, "Explainability-driven model improvement for SOH estimation of lithium-ion battery," Reliab. Eng. Syst. Saf., vol. 232, p. 109046, Apr. 2023.
[127] B. Verkuil, C. E. Budde, and D. Bucur, "Automated fault tree learning from continuous-valued sensor data: A case study on domestic heaters," arXiv preprint arXiv:2203.07374, 2022.
[128] L. Lorenti, G. De Rossi, A. Annoni, S. Rigutto, and G. A. Susto, "CUAD-Mo: Continuos unsupervised anomaly detection on machining operations," in 2022 IEEE Conference on Control Technology and Applications (CCTA). IEEE, 2022, pp. 881–886.
[129] B. A. ugli Olimov, K. C. Veluvolu, A. Paul, and J. Kim, "UzADL: Anomaly detection and localization using graph Laplacian matrix-based unsupervised learning method," Computers & Industrial Engineering, vol. 171, p. 108313, 2022.
[130] A. Lourenço, M. Fernandes, A. Canito, A. Almeida, and G. Marreiros, "Using an explainable machine learning approach to minimize opportunistic maintenance interventions," in International Conference on Practical Applications of Agents and Multi-Agent Systems. Springer, 2022, pp. 41–54.
[131] O. Serradilla, E. Zugasti, J. Ramirez de Okariz, J. Rodriguez, and U. Zurutuza, "Adaptable and explainable predictive maintenance: Semi-supervised deep learning for anomaly detection and diagnosis in press machine data," Applied Sciences, vol. 11, no. 16, p. 7376, 2021.
[132] M. Hermansa, M. Kozielski, M. Michalak, K. Szczyrba, Ł. Wróbel, and M. Sikora, "Sensor-based predictive maintenance with reduction of false alarms—a case study in heavy industry," Sensors, vol. 22, no. 1, p. 226, 2021.
[133] D. Xu, W. Cheng, J. Ni, D. Luo, M. Natsumeda, D. Song, B. Zong, H. Chen, and X. Zhang, "Deep multi-instance contrastive learning with dual attention for anomaly precursor detection," in Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). SIAM, 2021, pp. 91–99.
[134] B. Steurtewagen and D. Van den Poel, "Adding interpretability to predictive maintenance by machine learning on sensor data," Computers & Chemical Engineering, vol. 152, p. 107381, 2021.
[135] A. T. Keleko, B. Kamsu-Foguem, R. H. Ngouna, and A. Tongne, "Health condition monitoring of a complex hydraulic system using deep neural network and DeepSHAP explainable XAI," Advances in Engineering Software, vol. 175, p. 103339, 2023.
[136] G. Chen, M. Liu, and Z. Kong, "Temporal-logic-based semantic fault diagnosis with time-series data from industrial internet of things," IEEE Transactions on Industrial Electronics, vol. 68, no. 5, pp. 4393–4403, 2020.
[137] A. Schmetz, C. Vahl, Z. Zhen, D. Reibert, S. Mayer, D. Zontar, J. Garcke, and C. Brecher, "Decision support by interpretable machine learning in acoustic emission based cutting tool wear prediction," in 2021 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Dec. 2021, pp. 629–633.
[138] T. V. Addison Howard, Sohier Dane, "VSB power line fault detection," 2018. [Online]. Available: https://kaggle.com/competitions/vsb-power-line-fault-detection
[139] S. Simmons, L. Jarvis, D. Dempsey, and A. W. Kempa-Liehr, "Data mining on extremely long time-series," in 2021 International Conference on Data Mining Workshops (ICDMW), Dec. 2021, pp. 1057–1066.
[140] Y. Zhang, P. Wang, K. Liang, Y. He, and S. Ma, "An alarm and fault association rule extraction method for power equipment based on explainable decision tree," in 2021 11th International Conference on Power and Energy Systems (ICPES), Dec. 2021, pp. 442–446.
[141] S. J. Upasane, H. Hagras, M. H. Anisi, S. Savill, I. Taylor, and K. Manousakis, "A type-2 fuzzy based explainable AI system for predictive maintenance within the water pumping industry," IEEE Transactions on Artificial Intelligence, pp. 1–14, 2023.
[142] L. Xia, Y. Liang, J. Leng, and P. Zheng, "Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network," Reliab. Eng. Syst. Saf., vol. 232, p. 109068, Apr. 2023.
[143] G. Tod, A. P. Ompusunggu, and E. Hostens, "An improved first-principle model of AC powered solenoid operated valves for maintenance applications," ISA Trans., vol. 135, pp. 551–566, Apr. 2023.
[144] M. Mahmoodian, F. Shahrivar, S. Setunge, and S. Mazaheri, "Development of digital twin for intelligent maintenance of civil infrastructure," Sustain. Sci. Pract. Policy, vol. 14, no. 14, p. 8664, Jul. 2022.
[145] I. Konovalenko and A. Ludwig, "Generating decision support for alarm processing in cold supply chains using a hybrid k-NN algorithm," Expert Systems with Applications, vol. 190, p. 116208, 2022.
[146] A. Dhaou, A. Bertoncello, S. Gourvénec, J. Garnier, and E. Le Pennec, "Causal and interpretable rules for time series analysis," in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, ser. KDD '21. New York, NY, USA: Association for Computing Machinery, Aug. 2021, pp. 2764–2772.
[147] [Online]. Available: https://catalogue.data.wa.gov.au/dataset/water-pipe-wcorp-002
[148] P. Castle, J. Ham, M. Hodkiewicz, and A. Polpo, "Interpretable survival models for predictive maintenance," in 30th European Safety and Reliability Conference and 15th Probabilistic Safety Assessment and Management Conference, 2020, pp. 3392–3399.
[149] V. Belle and I. Papantonis, "Principles and practice of explainable machine learning," Front. Big Data, vol. 4, p. 688969, Jul. 2021.
[150] A. Saabas, "Interpreting random forests," Oct. 2014. [Online]. Available: http://blog.datadive.net/interpreting-random-forests/
[151] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek, "On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation," PLoS One, vol. 10, no. 7, p. e0130140, Jul. 2015.
[152] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[153] M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for deep networks," in Proceedings of the 34th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 2017, pp. 3319–3328.
[154] D. Janzing, D. Balduzzi, M. Grosse-Wentrup, and B. Schölkopf, "Quantifying causal influences," Annals of Statistics, vol. 41, no. 5, pp. 2324–2358, Oct. 2013.
[155] D. Dandolo, C. Masiero, M. Carletti, D. Dalle Pezze, and G. A. Susto, "AcME—Accelerated model-agnostic explanations: Fast whitening of the machine-learning black box," Expert Syst. Appl., vol. 214, p. 119115, Mar. 2023.
[156] D. Smilkov, N. Thorat, B. Kim, F. Viégas, and M. Wattenberg, "SmoothGrad: Removing noise by adding noise," arXiv preprint arXiv:1706.03825, 2017.
[157] S. Wachter, B. Mittelstadt, and C. Russell, "Counterfactual explanations without opening the black box: Automated decisions and the GDPR," Harv. JL & Tech., 2017.
[158] TeamHG-Memex, "TeamHG-Memex/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions." [Online]. Available: https://github.com/TeamHG-Memex/eli5
[159] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," Int. J. Comput. Vis., vol. 128, no. 2, pp. 336–359, Feb. 2020.
[160] M. Carletti, M. Terzi, and G. A. Susto, "Interpretable anomaly detection with DIFFI: Depth-based feature importance of isolation forest," Eng. Appl. Artif. Intell., vol. 119, p. 105730, Mar. 2023.
[161] I. Mollas, N. Bassiliades, and G. Tsoumakas, "Conclusive local interpretation rules for random forests," Data Min. Knowl. Discov., vol. 36, no. 4, pp. 1521–1574, Jul. 2022.
[162] "AI4I 2020 predictive maintenance dataset," UCI Machine Learning Repository, 2020, DOI: https://doi.org/10.24432/C5HS5C.
[163] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep inside convolutional networks: Visualising image classification models and saliency maps," arXiv preprint arXiv:1312.6034, 2013.
[164] A. Vaswani, N. Shazeer, N. Parmar et al., "Attention is all you need," Adv. Neural Inf. Process. Syst., 2017.
[165] L. A. Zadeh, "Fuzzy logic," Computer, vol. 21, no. 4, pp. 83–93, Apr. 1988.
[166] M. Ravanelli and Y. Bengio, "Speaker recognition from raw waveform with SincNet," in 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018, pp. 1021–1028.
[167] J. N. Morgan and J. A. Sonquist, "Problems in the analysis of survey data, and a proposal," Journal of the American Statistical Association, vol. 58, no. 302, pp. 415–434, 1963.
[168] D. Bertsimas and J. Dunn, "Optimal classification trees," Machine Learning, vol. 106, pp. 1039–1082, 2017.
[169] D. Bertsimas, J. Dunn, E. Gibson, and A. Orfanoudaki, "Optimal survival trees," Machine Learning, vol. 111, no. 8, pp. 2951–3023, 2022.
[170] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, "LightGBM: A highly efficient gradient boosting decision tree," Advances in Neural Information Processing Systems, vol. 30, 2017.
[171] H. A. Watson et al., "Launch control safety study," Bell Labs, 1961.
[172] Student, "The probable error of a mean," Biometrika, vol. 6, no. 1, pp. 1–25, 1908.
[173] K. Pearson, "X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling," The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 50, no. 302, pp. 157–175, 1900.
[174] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.
[175] D. A. Reynolds et al., "Gaussian mixture models," Encyclopedia of Biometrics, vol. 741, pp. 659–663, 2009.
[176] H. Nori, S. Jenkins, P. Koch, and R. Caruana, "InterpretML: A unified framework for machine learning interpretability," arXiv preprint arXiv:1909.09223, 2019.
[177] L. E. Baum and T. Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554–1563, 1966.
[178] O. Maler and D. Nickovic, "Monitoring temporal properties of continuous signals," in International Symposium on Formal Techniques in Real-Time and Fault-Tolerant Systems. Springer, 2004, pp. 152–166.
[179] M. Grieves and J. Vickers, "Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems," Transdisciplinary Perspectives on Complex Systems: New Findings and Approaches, pp. 85–113, 2017.
[180] D. A. Augusto and H. J. Barbosa, "Symbolic regression via genetic programming," in Proceedings, Vol. 1, Sixth Brazilian Symposium on Neural Networks. IEEE, 2000, pp. 173–178.
[181] T. Hastie and R. Tibshirani, "Generalized additive models," Stat. Sci., vol. 1, no. 3, pp. 297–310, Aug. 1986.
[182] G. Taguchi and R. Jugulum, The Mahalanobis-Taguchi Strategy: A Pattern Technology System. John Wiley & Sons, 2002.
[183] E. Fix and J. L. Hodges, "Discriminatory analysis. Nonparametric discrimination: Consistency properties," International Statistical Review / Revue Internationale de Statistique, vol. 57, no. 3, pp. 238–247, 1989.
[184] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Inf. Fusion, vol. 58, pp. 82–115, Jun. 2020.
[185] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artif. Intell., vol. 267, pp. 1–38, Feb. 2019.
[186] L. Coroama and A. Groza, "Evaluation metrics in explainable artificial intelligence (XAI)," in International Conference on Advanced Research in Technologies, Information, Innovation and Sustainability. Springer, 2022, pp. 401–413.
[187] M. Sisk, M. Majlis, C. Page, and A. Yazdinejad, "Analyzing XAI metrics: Summary of the literature review," TechRxiv preprint techrxiv.21262041.v1, 2022.
[188] M. A. Kadir, A. Mosavi, and D. Sonntag, "Assessing XAI: Unveiling evaluation metrics for local explanation, taxonomies, key concepts, and practical applications," engrXiv preprint, 2023.
[189] R. R. Hoffman, S. T. Mueller, G. Klein, and J. Litman, "Metrics for explainable AI: Challenges and prospects," arXiv preprint arXiv:1812.04608, 2018.

LOGAN CUMMINS (Member, IEEE) received their B.S. degree in Computer Science and Engineering from Mississippi State University. They are currently pursuing a Ph.D. degree in Computer Science at Mississippi State University with a minor in Cognitive Science.
They are a Graduate Research Assistant with the Predictive Analytics and Technology Integration (PATENT) Laboratory in collaboration with the Institute for Systems Engineering Research. Additionally, they perform research with the Social Therapeutic and Robotic Systems (STaRS) research lab. Their research interests include explainable artificial intelligence and its applications, cognitive science, and human-computer interactions as applied to human-agent teaming. They are a member of ACM and IEEE at Mississippi State University.

ALEXANDER SOMMERS (Member, IEEE) received his B.S. in Computer Science from Saint Vincent College and his M.S. in Computer Science from Southern Illinois University. He is pursuing a Ph.D. in Computer Science at Mississippi State University with a concentration in machine learning.
He is a Graduate Research Assistant in the Predictive Analytics and Technology Integration Laboratory (PATENT Lab), in collaboration with the Institute for Systems Engineering Research. His work concerns synthetic time-series generation and remaining-useful-life prediction. His interests are the application of machine learning to reliability engineering and lacuna discovery, respectively. He is a member of IEEE and ACM.

SOMAYEH BAKHTIARI RAMEZANI (Member, IEEE) received the B.S. degree in computer engineering and the M.S. degree in information technology engineering from the Iran University of Science and Technology, in 2004 and 2008, respectively. She is currently pursuing the Ph.D. degree in computer science with Mississippi State University.
She is a Graduate Research Assistant with the Predictive Analytics and Technology Integration (PATENT) Laboratory in collaboration with the Institute for Systems Engineering Research. Prior to joining Mississippi State University, in 2019, she was with several companies in the energy and healthcare sectors as an HPC programmer and a data scientist. She is a 2021 SIGHPC Computational and Data Science Fellow. Her research interests include probabilistic modeling and optimization of dynamic systems, the application of ML, quantum computation, and time-series segmentation in the healthcare sector. She is a member of ACM, the President of the ACM-W Student Chapter at Mississippi State University, and the Chair of the IEEE-WIE AG Mississippi section.

SUDIP MITTAL (Member, IEEE) is an Assistant Professor in the Department of Computer Science & Engineering at Mississippi State University. He graduated with a Ph.D. in Computer Science from the University of Maryland Baltimore County in 2019. His primary research interests are cybersecurity and artificial intelligence. Mittal's goal is to develop the next generation of cyber defense systems that help protect various organizations and people. At Mississippi State, he leads the Secure and Trustworthy Cyberspace (SECRETS) Lab and has published over 80 journal and conference papers in leading cybersecurity and AI venues. Mittal has received funding from the NSF, USAF, USACE, and various other Department of Defense programs. He also serves as a Program Committee member or Program Chair of leading AI and cybersecurity conferences and workshops. Mittal's work has been cited in the LA Times, Business Insider, WIRED, the Cyberwire, and other venues. He is a member of the ACM and IEEE.

JOSEPH JABOUR received his B.S. in Computer Science from the University of Mississippi in 2019, and is currently pursuing an M.S. in Computer Science from Mississippi State University. He is a Computer Scientist at the Information Technology Lab (ITL) of the Engineering Research and Development Center (ERDC) in Vicksburg, MS. He began working at the ERDC in 2019, and he has pursued research in the field of Artificial Intelligence and Machine Learning. Additionally, he has performed a significant amount of work in the fields of Data Visualization, Digital Twins, and many other forms of research. He has since presented at several nationally recognized conferences, has held a vice chair position in the ERDC Association of Computing Machinery, and is currently a facilitator of the ERDC ITL Field Training Exercise based on leadership principles from the Echelon Front. He has received awards for his research and development, including but not limited to the ERDC Award for Outstanding Innovation in Research and Development. He seeks to push past the forefront of technological development and innovation and endeavors to identify and implement solutions to our nation's leading causes of concern.

MARIA SEALE received the B.S. degree in computer science from the University of Southern Mississippi, in 1987, and the M.S. and Ph.D. degrees in computer science from Tulane University, in 1992 and 1995, respectively.
Prior to joining the Information Technology Laboratory, U.S. Army Engineer Research and Development Center (ERDC), in 2016, she held positions with the Institute for Naval Oceanography, the U.S. Naval Research Laboratory, and various private companies, as well as a tenured Associate Professorship with the University of Southern Mississippi. At ERDC, she has been involved with research in making scalable machine learning algorithms available on high-performance computing platforms and expanding the center's capabilities to manage and analyze very large data sets. Her research interests include natural language processing, machine learning, natural computing, high-performance data analytics, and prognostics and health management for engineered systems. She is a member of the Prognostics and Health Management Society, the American Society of Mechanical Engineers, and the Association of Computing Machinery.
SHAHRAM RAHIMI (Member, IEEE) is currently a Professor and the Head of the Department of Computer Science and Engineering, Mississippi State University. Prior to that, he led the Department of Computer Science, Southern Illinois University, for five years. He is also a recognized leader in the area of artificial and computational intelligence, with over 220 peer-reviewed publications and a few patents or pending patents in this area.
He is a member of the IEEE New Standards Committee in Computational Intelligence. He provides advice to staff and administration at the federal government on predictive analytics for foreign policy. He was a recipient of the 2016 Illinois Rising Star Award from ISBA, selected from hundreds of highly qualified candidates. His intelligent algorithm for patient flow optimization and hospital staffing is currently used in over 1000 emergency departments across the nation. He was named one of the top ten AI technologies for healthcare, in 2018, by HealthTech Magazine. He has secured over $20M of federal and industry funding as a PI or a co-PI in the last 20 years. He has also organized 15 conferences and workshops in the areas of computational intelligence and multi-agent systems over the past two decades. He has served as the Editor-in-Chief for two leading computational intelligence journals and is on the editorial board of several other journals.