Approach to provide interpretability in machine learning models for image classification
https://doi.org/10.1007/s44244-023-00009-z
RESEARCH
Abstract
One of the main reasons why machine learning (ML) methods are not yet widely used in productive business processes is the lack of confidence in the results of an ML model. To improve the situation, interpretability methods may be used, which provide insight into the internal structure of an ML model and into the criteria on which the model bases a certain prediction. This paper aims to review the state of the art in interpretability methods and to apply the selected methods to an industrial use case. Two methods, called LIME and SHAP, were selected from the literature and then implemented in a use case for image classification using a convolutional neural network. The research methodology consists of three parts: the first is the literature analysis, followed by the practical implementation of an ML model for image classification and the subsequent application of the interpretability methods; the third part is a multi-criteria comparison of the LIME and SHAP methods. This work enables companies to select the most effective interpretability method for their use case and also aims to increase companies’ motivation for using ML.
Keywords Machine learning · Interpretability methods · LIME · SHAP · Image classification · Convolutional neural networks · Multi-criteria analysis · Industry
results. There exist several methods concerning how to achieve those interpretations and how to understand them [5]. Among the most common are the Local Interpretable Model-agnostic Explanations (LIME) [6] and Shapley Additive Explanations (SHAP) [7] methods.

This research develops an approach to give interpretations of the predictions of image classification models with the LIME and SHAP methods. The methodology includes both theoretical (literature analysis and multicriteria evaluation) and experimental stages. The experimental setup consisted of three parts. The first step was the collection of image data for training and validating the model. The second step contained the design and training of an ML model to get prediction results. This was performed iteratively through alternating model parameters, adapting image data, and reducing environmental influences. Then, the result was validated and verified. The last part included interpreting the final model predictions to explain why the model gave an accurate prediction and to understand the mapping between the model structure and a prediction result. Therefore, the following research questions are addressed:

What are the current approaches to support interpretability in ML models for image classification? To answer this question, a literature analysis and a multicriteria comparison of the existing methods are done.

What is the most effective interpretability method for a specific use case? To answer this question, the developed ML model was explored with the selected interpretability methods, and its predictions were examined to compare the methods.

This paper is structured as follows. Section 1 introduces the topic and formulates the problem statement and research questions. Section 2 gives the machine learning fundamentals. It introduces the principles of ML, focusing on convolutional neural networks (CNNs) and image classification. Section 3 provides a theoretical overview of interpretability in ML models, focusing on the LIME and SHAP methods. Section 4 describes the implementation part. The methodology is discussed in detail, and the results are presented. Section 5 is the comparison and evaluation section. The interpretability methods are compared and evaluated based on metrics chosen from the literature research. Section 6 provides a conclusion that summarizes the outcome of the research and gives an outlook on future work.

2 Machine learning fundamentals

The ML approach consists in building a model based on a given data set to make predictions without an explicitly programmed algorithm. ML can help with a variety of different problems and has a lot of practical applications [8].

Classification algorithms solve the task of assigning a category to an item. Examples of that are classifying documents and assigning those to categories such as politics, sports, weather, etc., or classifying images and assigning those to categories such as a cat, a dog, or a bird. In some cases, the approach goes beyond discrete classes to predict real numbers with regression algorithms (e.g., stock values) [8, 9].

2.1 Convolutional neural networks

CNNs are deep Artificial Neural Networks (ANNs) that are trained with many layers [10]. The training process belongs to the category of supervised ML methods. They are mostly used for applications in computer vision, object recognition, biological computation, and image classification [11, 12]. The explanation of CNNs starts with an introduction to ANNs, proceeds with the different layers of a CNN, and finishes with using CNNs for image classification.

2.1.1 Artificial neural networks

ANNs were inspired by the real-world biological neural networks in the brain [13]. The neuron has so-called dendrites which act as the signal transmitters to the cell body. The cell body processes these signals, and the axon is responsible for sending these signals to other neurons. To make this possible, the axon terminals are connected to the dendrites of another neuron. An NN is made up of one-to-many neurons which are in communication with each other. What one neuron puts out can be used as input by another neuron [14].

An ANN can be compared to a directed graph, where the nodes are the neurons, and the edges are the links between the neurons. A neuron gets a weighted sum of the outputs of other neurons as input [15]. The perceptron is the simplest ANN, because it consists of one single neuron. It takes an input vector, computes a weighted sum, and applies a step function to the sum for the output.
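As a minimal illustration of the perceptron described above, the following sketch shows a single neuron in plain Python; the input values, weights and bias are made up for this example and are not taken from the paper.

```python
import numpy as np

def perceptron(x, w, b):
    """Single neuron: weighted sum of the inputs plus bias, followed by a step function."""
    weighted_sum = np.dot(w, x) + b
    return 1 if weighted_sum >= 0 else 0

# Example with two inputs and hand-picked weights
x = np.array([0.5, -1.0])   # input vector
w = np.array([0.8, 0.2])    # weights
b = -0.1                    # bias
print(perceptron(x, w, b))  # -> 1, because 0.4 - 0.2 - 0.1 = 0.1 >= 0
```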
The algorithm of an ANN uses mini-batches to work through the training data set. Each time the entire training data set has been worked through batch by batch, a so-called epoch is finished. A mini-batch starts at the input layer, where it enters the network. After that, it is passed to the first hidden layer, where the output of all neurons is calculated. This output is then passed to the next layer, and this process continues until all layers are done and the output layer is reached. The next step of the algorithm is to recognize output errors, which are identified through a loss function that compares the required and the actual output.
To further investigate the errors, every output connection is measured regarding how much it contributed to the made error. Thus, for every epoch, the algorithm makes a prediction and measures its error. After this, the algorithm goes through all layers backwards to measure the contribution of each connection to the made error. Based on that, it adapts the connection weights, which helps the network to produce fewer errors in the future [13].

2.1.2 CNN layers

CNNs are structured in layers. The first layer is convolutional, which focuses on detecting features. Those features can be lines, edges, colour, or other visual components. The filters detect the features, and the more filters exist in the layer, the more features can be determined. The filter is specified by a hyper-parameter that controls its width and height. Furthermore, there are weights defined between the convolutional layer and the previous layer. Those weights can be used in another convolutional layer, which reduces the processing time. The input 3D box has the same width and height as the image itself, and its depth is based on the image’s colour depth. The input for the next layer is a 3D box, characterized by the hyper-parameters of the previous layer [2, 16].

The purpose of a pooling layer is to decrease the size of a 3D box. It transforms each group of input points of an image into a single value. Those groups can be compared to the pixels in an image. Pooling layers do not have weights, and they do not influence the training of the network. There exist different pooling layers, where max pooling and average pooling are the most used ones. When using max pooling, each group of input points is transformed into a single pixel with the greatest value of the group. Average pooling works the same way, with the difference that the value of the transformed pixel is the average of the group [2, 16].
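A small numerical sketch of the difference between max and average pooling on a single 2×2 group of input points; the values are made up for illustration.

```python
import numpy as np

patch = np.array([[1, 3],
                  [2, 8]])          # one 2x2 group of input points

max_pooled = patch.max()            # -> 8   (greatest value of the group)
avg_pooled = patch.mean()           # -> 3.5 (average of the group)
print(max_pooled, avg_pooled)
```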
Another layer type is the dense layer, which connects every neuron with the neurons of the previous layer. In a CNN, this means that every output 3D box is connected to every neuron of the dense layer and passed through an activation function. Dense layers are often used at the end of a neural network. The last dense layer usually performs the classification task. Each classification class will then be one output neuron with the corresponding probability of the classification prediction [13, 16].

Dropout layers are necessary for preventing a neural network from overfitting. During the training phase, a dropout layer sets a number of input elements to zero, which can allow higher learning rates. During the test, validation, and production phases, the dropout layer is not used. Dropout layers are used in big networks, where model overfitting is likely [2].

A convolutional neural network is made by putting all these layers together. How many layers of which type are used depends on the use case [17].

2.2 Image classification with CNN

Image classification is one of the applications of a CNN. It takes an image as an input and generates an output that classifies the image to a certain class. The output also gives the probability of whether the image belongs to a certain class [18].

The classification task starts with data collection. Then, the data have to be split into a training data set, a test data set and a validation data set. The next step is to build the CNN according to the use case. Next, the model is trained with the training data set. During the training, the model is evaluated constantly with the validation data set [19, 20]. The model performance and accuracy are verified with the test data set. If the classification task is binary, a single output neuron is enough. The neuron output shows the probability of the classification. In the case of more than one label, more output neurons are needed [13].

3 Interpretability of machine learning models

ML algorithms are used as black boxes, and it is often not clear how they produce specific results. At the same time, and especially in critical applications, there is a need to trust the algorithm, so that it makes the correct output [21]. Interpretability in ML is the concept of understanding the prediction of a model [22].

There exist several interpretability methods for human-friendly explanations of ML models, especially of their prediction outcomes [23]. Without interpretability methods, models are often evaluated only with accuracy metrics. Explaining the prediction outcome means providing textual or visual tools to display the relationship between the prediction of the model and the data involved in training. The focus of this research lies in explaining the two methods LIME and SHAP [24].

3.1 LIME

LIME, which is short for Local Interpretable Model-agnostic Explanations, is a model-agnostic method which can be used to provide simple interpretations of model predictions. LIME is generally used for interpretations of models which are not highly complex [22, 25]. The method works with the surrogate model technique, which suggests using a simpler model to generate a prediction in exchange for the complex original model. For an ANN, this means that the original model is transferred to an interpretable model, which tries to replicate the behaviour of the original one. This interpretable simpler model then produces the explanations for the original NN model [26].
The overall idea behind LIME as an interpretability method may be explained based on Fig. 1. The model which needs to be interpreted is a black-box model. With LIME, the goal is to create a white-box model, which is simpler and, therefore, easier to use to provide local explanations for the original complex model. First, the influencing factors in the black-box model for a single point of interest need to be understood. The single point of interest, which is going to be explained, is marked as the black cross (Fig. 1). The nonlinear background in two colours represents the decisions made about whether an instance belongs to one of the two classes. Around the point of interest, a new data set is generated to create the linear white-box model (a surrogate model). This new data set consists of sample points represented as the dots in Fig. 1.

These sample points are weighted corresponding to their distance to the point of interest. The weight is graphically displayed by their size. A local linear model, which is displayed as the dotted line, is then trained and used for the approximation of the original model. Instead of linear models, decision trees can also be used as a choice for the local surrogate model. Therefore, a local interpretation can be achieved, but it is not globally correct [22, 27].
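The following sketch illustrates this surrogate idea in general terms: perturbed samples around a point of interest are weighted by an exponential kernel on their distance, and a linear model is fitted to the black-box predictions. All names (black_box_predict, kernel_width, the noise scale) are illustrative assumptions and not the LIME library itself.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate(x0, black_box_predict, n_samples=1000, kernel_width=0.75):
    # Perturb the point of interest with Gaussian noise to build a local data set
    X = x0 + np.random.normal(scale=0.5, size=(n_samples, x0.size))
    y = black_box_predict(X)                      # predictions of the complex model
    distances = np.linalg.norm(X - x0, axis=1)    # distance to the point of interest
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)  # closer samples weigh more
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(X, y, sample_weight=weights)    # weighted linear white-box model
    return surrogate.coef_                        # local feature effects

# Usage with a hypothetical black-box model:
# coefs = local_surrogate(np.array([1.0, 2.0]), my_model_predict)
```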
The LIME method is implemented as a library in Python and R for creating explanations of individual predictions. The input data format can be text, images, or tabular data. LIME can be used for multi-class predictions [23]. The method was proposed in 2016 by Ribeiro et al. [24]. LIME generates explanations for single instances by mapping input data to an interpretable representation. This could be pixels when working with images or a set of words when working with text data [26].

In more detail, LIME perturbs the input data samples and observes the change in the predictions to understand the model [22]. Here, perturbation is a method that is used to explain feature relevance. This is done by measuring alterations in the prediction outputs when features are modified. Approaches to explaining feature contribution include perturbation, replacing or ignoring nonsignificant features, and learning attribution masks. LIME requires a large number of randomly perturbed samples to compute local explanations for complex ANN models. To produce an actual explanation, a prediction of the class of every sample has to be done, which needs significant computational power in the case of a large-size network [26].

3.1.1 LIME for images

Using LIME for image interpretation differs from tabular or text data. Here, the target cannot be reached by perturbing individual pixels, because a huge number of pixels is involved in predicting the class. The approach here is to segment the image into so-called superpixels that may be turned on or off. During this process, several image variations are generated. The superpixels consist of a number of interconnected pixels having a similar colour. Superpixels can be turned on or off, where turned-off pixels are coloured in a specific colour, for example, grey [22].

The steps for creating an image interpretation with LIME are as follows. First, a prediction of the correct image class with the trained ML model is to be made. The class with the highest probability to be correctly predicted is used for the interpretation. To create an interpretation, LIME builds a new data set of random perturbations. A local surrogate model is fitted, where the computed superpixels for the input image are used to create the random perturbations. Those superpixels are computed with the quick shift algorithm. The number of perturbations is user-specific. The trained model predicts the class of each perturbed image. Before fitting the surrogate model, weights have to be applied that express the importance of the perturbed images that are close to the original image. This importance is calculated with a distance metric that gives the gap of each perturbation to the original image (when all superpixels are on). This weighting is done for all perturbed images in the data set. Then, the surrogate model is fitted with the perturbations, predictions, and weights. As a result, a factor for each superpixel is calculated, which states the effect of the superpixel on the prediction of the right class. The sorted factors are used to decide on the superpixels that are most important for the prediction. In addition, a heatmap can be overlayed to see the contribution of the superpixels.

This result is the LIME interpretation, where the shown superpixels have the strongest impact on the prediction. The result allows users to check that the model makes the prediction based on parts of the image that are significant for the predicted class. To validate the model, several sample images are to be interpreted, and the overall result must be evaluated [22, 28].
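A sketch of the steps above with the Python lime library; the trained Keras model `model`, the test image `image` and the parameter values are placeholders for this illustration, not the exact configuration used in the paper.

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def classifier_fn(images):
    # LIME passes a batch of perturbed images; return class probabilities
    return model.predict(images)

explainer = lime_image.LimeImageExplainer()          # uses quick shift superpixels by default
explanation = explainer.explain_instance(
    np.asarray(image, dtype=np.double),
    classifier_fn,
    top_labels=2,          # binary use case: "good" and "defect"
    hide_color=0,          # colour of turned-off superpixels
    num_samples=1000)      # number of random perturbations

# Show the superpixels that contributed most to the top predicted class
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False)
overlay = mark_boundaries(img / 255.0, mask)
```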
3.1.2 Advantages and disadvantages

Because of the representation of interpretable data, LIME is used frequently for image and text analysis. It is one of the few interpretability methods that can work with tabular data, text data and images. The inner algorithm of LIME is based on the generation of a simpler model to approximate the original model. It allows the use of the same local, interpretable model to do the interpretation [27, 29].

For an end user, it is easy to find an explanation in the interpreted parts of an image or text and to understand it. LIME is a good option for not very experienced ML users, because LIME produces human-friendly explanations.

At the same time, LIME might not be the best option to use when detailed prediction explanations are required. For tabular data, LIME may also not be the best option, since there are issues in finding the correct interpretable representation. At the same time, there have been attempts to solve this problem with several different transformation methods.

What should also be considered when working with LIME is that the needed data points are sparse and defining a local neighbourhood is not a simple task. If the neighbourhood is changed, the explanation results are different, causing instability in the method. Due to the unstable outcomes in the repeated sampling process, there is a need to check for every iteration whether the interpretations make sense. As a result, LIME is a promising method for high-dimensional data, and its issues are to be overcome in its further development phases [27, 29].

3.2 SHAP

SHAP is the abbreviation for SHapley Additive exPlanations. It is an interpretability method with the idea to explain ML models using a game theory approach [7]. This approach considers a feature of a data instance as a player and the prediction as a payout. The SHAP method works based on Shapley values, which are used to rank the ML model’s features. Overall, the SHAP approach considers all predictions, which makes it more reliable and ensures stable results as opposed to LIME. At the same time, this makes it more computationally time-consuming. SHAP can be used both for model-agnostic and model-specific techniques [23, 25]. It is a post-hoc method used mainly for local interpretations, but global ones are also possible. Before discussing the SHAP method, the Shapley values are considered in more detail.

3.2.1 Shapley values

Shapley values come from game theory. The relationship can be seen as the players being included in the model and the game reproducing the result of the ML model [22]. For example, Shapley values may be used to explain the predicted price of an apartment. Say we are looking at an apartment with the characteristics of being near a park, having 50 m2, being on the second floor, with no cats allowed, and a predicted price of €300,000. This prediction may be explained by looking at each of these feature values individually and explaining their contribution to the overall prediction. Say the amount of €310,000 is the average prediction the model made for all apartments. The goal now is to compare the average prediction to the actual prediction, with emphasis on how much each feature value contributed to the difference. For linear models, the effect is calculated through the weight of the feature times the feature value.

SHAP estimates the effect through local models and Shapley values, by assigning payouts to players depending on their contribution to the total payout. The game in this setting is the prediction task for a single element of the data set. The earning is the made prediction minus the average prediction for all elements. The players, as mentioned before, are the feature values of the element, which help to get the earnings. In the apartment example, the prediction of €300,000 was achieved through the feature values describing its characteristics. The difference to be explained is obtained by subtracting the average predicted €310,000 from the €300,000, resulting in a difference of − €10,000. The interpretation of the ML model could be, for example, that the park contributed €30,000, the area €10,000, the second floor €0 and the banned cat − €50,000, which results in the sum of − €10,000.

The Shapley value for one feature is calculated as the average marginal contribution of this feature value over all potential coalitions. A sample contribution of the banned-cat feature value appended to a coalition of "park nearby" and "area" can look like the following. A random apartment from the data set is selected, and its value for the floor feature is combined with the coalition of "park nearby", "banned cat" and "area". The previous second-floor feature value is thereby randomly replaced by a first-floor feature value. With this information, a price of €310,000 is predicted. The next step is to eliminate the banned-cat feature and substitute it with a randomly drawn allowed-cat or banned-cat feature value. Then, the price with the "park nearby" and "area" coalition may be €320,000. The banned-cat feature contributed − €10,000 to the prediction, calculated as €310,000 minus €320,000. These steps to create the interpretation are based on the values for the floor and cat features from the randomly selected apartment. The estimation improves with every sampling step, whereby the contributions are averaged. Such calculations need to be done for all potential coalitions, resulting in the Shapley value, which is the average of all contributions over all potential coalitions [29].
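Formally, the Shapley value of a feature j is exactly this average of marginal contributions over all possible coalitions S of the remaining features; the notation below is the standard game-theoretic formulation, added here for reference, where F is the set of all features and v(S) is the prediction obtained using only the feature values in coalition S:

```latex
\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[\, v(S \cup \{j\}) - v(S) \,\bigr]
```

In the apartment example, the individual contributions sum to the difference from the average prediction: €30,000 + €10,000 + €0 − €50,000 = − €10,000.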
Shapley values are often used in an approximate solution, because calculating them takes a lot of time due to the many possible coalitions. Furthermore, Shapley values tend to be misinterpreted as the difference between the predicted values before and after deleting a certain feature from the model. Instead, a Shapley value is the contribution of a feature value to the difference that exists between the mean and the actual prediction. This interpretability method uses all features, as opposed to LIME, where the option to select features is given. Shapley values deliver a full explanation, provided through the full distribution among all included feature values of the data instance. In addition, a comparison against a subset or a single data point is possible, not only against average predictions as LIME does [26, 29].

3.2.2 SHAP method

SHAP is implemented as a Python library. Due to the assumption that the local surrogate model does not have to be linear, SHAP is more time-consuming than LIME [30].

SHAP provides an interpretation based on explaining individual predictions. This is done by calculating the contribution of every feature to the prediction with the Shapley values. The feature values of an example data instance can be seen as players in a coalition according to the game theory approach. A player can be a cluster of feature values or a single feature value. As an example, the single feature values can be distributed over superpixels for images. SHAP has three required properties. The first one is local accuracy, to make the method efficient. The second one is missingness, which helps with keeping the local accuracy in place and is mostly relevant for constant features. The third property is consistency. It ensures that if the model changes so that the contribution of a feature value changes, then the Shapley value also changes accordingly [29].

There exist several approximation methods for calculating Shapley values, where KernelSHAP and TreeSHAP [26] are the most used ones. KernelSHAP is a kernel-based approximation approach, and TreeSHAP is efficient for tree-based models [29]. KernelSHAP is a combination of linear LIME and Shapley values and is a model-agnostic implementation of SHAP. It is an algorithm that makes the Shapley value approximation locally, referred to a data instance, by creating samples of possible coalitions. The kernel in this case has the function of weighting the coalitions [26, 31]. To be more specific, the first step to calculate the contribution of a feature to the prediction is to define sample coalitions and then to get a prediction for each one of them. Afterwards, the SHAP kernel applies weights, and the weighted linear model is fitted to return the Shapley values to be further processed. The difference to LIME is that SHAP applies weights to the sample instance based on the weight of the coalition, whereas LIME applies weights based on the distance to the original instance. The idea here is to learn about features when they are isolated. For example, when the coalition has only one feature, then the effect this feature has on the prediction can be derived. The same principle applies when the coalition consists of many features. TreeSHAP uses a model-specific approach and is faster than KernelSHAP. The principle of TreeSHAP is to generate computations down the tree at the same time. TreeSHAP may have the issue that a feature which does not influence the prediction can be assigned a value other than zero. This is the case when a feature correlates with another feature that significantly influences the prediction [22].
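For reference, the coalition weighting used by KernelSHAP (the Shapley kernel proposed in [31]) can be written as follows, where M is the number of features and |z'| is the number of features present in a coalition z':

```latex
\pi_{x}(z') = \frac{M-1}{\binom{M}{|z'|}\,|z'|\,\bigl(M-|z'|\bigr)}
```

Fitting a linear model to the coalition predictions with these weights and minimizing the weighted squared error recovers the Shapley values.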
3.2.3 SHAP for images

SHAP for images is used mostly for image classification. The idea here is to determine, for every pixel of the predicted image, the level of the pixel’s contribution to a certain class. To make the interpretation, an NN model for image classification is to be trained and then used for a prediction on a test set image for every class. Finally, the SHAP values are generated with the help of the SHAP library and visualized. The visualization of the images shows highlighted parts in shades of red and blue. The image labels show the predicted classes in descending order of probability. The red pixels indicate SHAP values that contributed in a positive way to the classification of the labelled class. The blue pixels indicate the opposite, i.e., they contribute negatively to the prediction [32].
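A minimal sketch of this workflow with the Python shap library; the trained Keras `model`, the `train_images` background sample and `test_images` are placeholders, and GradientExplainer is only one of several explainers suitable for deep image models.

```python
import numpy as np
import shap

# A small background sample from the training data approximates the expected model output
background = train_images[np.random.choice(train_images.shape[0], 100, replace=False)]

explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(test_images[:4])   # one array of SHAP values per output class

# Red pixels push the prediction towards the class, blue pixels push away from it
shap.image_plot(shap_values, test_images[:4])
```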
3.2.4 Conclusion

The SHAP interpretability method has a sound theoretical foundation. The method uses the distributed features to generate the Shapley values. SHAP relates to LIME, taking the interpretable ML area a step further towards being unified in an overall approach. The possibility of providing global and local explanations is also an advantage of SHAP. TreeSHAP allows a user to create fast interpretations. The slower KernelSHAP may be applied for the models where TreeSHAP is not feasible. KernelSHAP can be time-consuming for the computation of Shapley values if a lot of instances are involved. In addition, creating misleading interpretations is possible, for example, to hide biases [27, 29]. The final comparison and evaluation of the LIME and SHAP methods will be done in Sect. 5. The metrics will be derived from the literature research.
4 Implementation

For the implementation, a use case in the Smart Production Lab of the FH JOANNEUM University of Applied Sciences, Austria, was developed. The focus was on creating an ML model for image classification for quality control and on providing interpretability options for the developed model. The product of the selected use case was a watch, manufactured in the Smart Production Lab. One part of the product, namely, the watch stand, was chosen for quality control. The developed ML model classified two types of images. The image taken of the watch stand could be considered as “good”, which means that the product has the desired quality, so that it can be further processed to be assembled into the whole product. The other type of image classifies the products with defects, with the consequence that these products cannot be further processed. Therefore, the developed model distinguished between these two types, which resulted in a binary classification problem. To analyse why the developed ML model made a certain prediction, the methods LIME and SHAP were implemented. The results were compared to provide a selection approach, helping the industry in choosing a proper method based on the use case.

4.1 Machine learning model implementation

Figure 2 shows the methodology flowchart for the implementation stage of the ML model for image classification. It is segmented into four major phases, namely, collection, preprocessing, learning and evaluation, and prediction.

The starting point was to collect data by taking product pictures, changing the lighting conditions, and adjusting angles and perspectives. This process was done with products of the classes “good” and “defect”. The second phase started with splitting the image data into a train and a test set and labelling the images according to their class. The last step was data augmentation to prepare for building the CNN model. Layers and parameters were adjusted before the training started, and the trained model could then be validated. If the results were not acceptable, the adjustment was repeated. After this, the model was verified by classifying a test image data set to determine whether the model made the correct prediction. The individual steps and milestones are described next in more detail.

Fig. 2 Methodology flowchart of the model implementation

4.1.1 Experimental setup

The development of the ML model was done with a virtual machine on a High-Performance Cluster (HPC) provided by the FH JOANNEUM. The HPC is a network of servers that work in parallel to increase computing power and, therefore, make model training faster. To upload the image data to the HPC, the open-source server and client software FileZilla was used. Furthermore, Anaconda was applied as the environment management system for installing, running and updating packages and their dependencies. Conda was used in combination with the programming language Python. As a development environment, the Jupyter Notebook was applied, which is an open-source, browser-based tool that enables users to create documents with live code, text and visualizations. For the CNN implementation, the open-source platform TensorFlow was used. TensorFlow incorporates a lot of tools and libraries for ML applications. In detail, the module Keras, which runs on top of TensorFlow, was used. It is a deep learning Application Programming Interface, written in Python for simple, flexible and powerful use to solve ML tasks. Keras has a lot of libraries which can be used for the implementation of the ML model, especially for image classification.

For this use case, the images of the watch stands were made with the camera of a smartphone. The camera of the brand Sony has 64 megapixels, which was enough for the distinction between the classes. Furthermore, the images were downsized to accelerate the computations. A possible series of “good” products is shown in Fig. 3. Examples of defective parts are shown in Fig. 4.

These images were organized in a directory structure for the model training and its further steps. The main data directory had two subdirectories, called “train” and “test”. In addition, these two directories both had two subdirectories called “defect” and “good”. This structure is due to using Keras, because it recognizes the classes according to the directory names. 80% of the acquired data were used for model training and 20% for testing purposes.
The number of images taken for one class was 6007. This number was split into the “test” and “train” directories, resulting in 4806 pictures (both “defect” and “good”) in the “train” directory and 1201 images in the “test” directory (“defect” and “good”). In total, 12,014 images were taken to generate the data to create an image classification ML model. To speed up the process, image retrieval was performed by making a video of the product parts in different angles and positions with different lighting conditions. Then, the video frames were cut out to be used as the images.

Fig. 4 Series of product images of the class “defect”
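A sketch of how such a directory layout can be read with the Keras ImageDataGenerator, including simple augmentation; the paths, target image size and augmentation settings are illustrative assumptions, only the batch size of 50 is taken from the text.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Keras derives the class names ("defect", "good") from the subdirectory names
train_gen = ImageDataGenerator(rescale=1.0 / 255,
                               rotation_range=15,
                               horizontal_flip=True)   # light data augmentation
test_gen = ImageDataGenerator(rescale=1.0 / 255)

train_data = train_gen.flow_from_directory("data/train",
                                           target_size=(150, 150),
                                           batch_size=50,
                                           class_mode="binary")
test_data = test_gen.flow_from_directory("data/test",
                                         target_size=(150, 150),
                                         batch_size=50,
                                         class_mode="binary")
```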
4.1.3 Model description

The implementation of the ML model was based on Keras libraries. The model consists of the core data structure of Keras, namely, layers. The simple sequential model was used, which is a linear stack of several layers. The model consisted of three major convolutional layers, three max-pooling layers, one flatten layer and one dense layer. This structure had been chosen by an experimental approach. This means that during the development phase, different design structures were created and evaluated. Once the desired result with the intended accuracy was achieved, this model structure was selected to work with.

The next step was compiling the model to initiate the learning process. Then, the actual training could start. This was done by iterating over the training data with the fit() function. The training was done for 10 epochs, which means that the entire training data set was passed through the developed CNN ten times. For every epoch, the weights were changed to create higher accuracy outputs. To optimize passing the entire data set through the network, the data were divided into batches of size 50 (the number of images used at once to pass through the CNN). Iterations over the batches were used until all images from the data set were processed to complete one epoch. For this configuration, the training process took around 2 min and 20 s to complete. The entire process, from loading the images to training the model and verifying the results, took around 6 min and 30 s.
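A minimal sketch of a sequential Keras model with the layer types listed above; the filter counts, kernel sizes, input shape and optimizer are assumptions for illustration and not the exact configuration of the paper (train_data and test_data refer to the hypothetical generators from the previous sketch).

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(1, activation="sigmoid"),       # single output neuron for the binary task
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(train_data, validation_data=test_data, epochs=10)  # 10 epochs, as described
```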
The overall model accuracy at the end of the training with 10 epochs was about 99%. The corresponding confusion matrix for the model validation can be seen in Table 1. The model made a prediction for each of the 1201 «good» and 1201 «defect» test images. The model predicted 1197 of the images labelled as good to actually be good, and only four of these good images were predicted as defective ones. The second line of the confusion matrix shows that 1199 images labelled as «defect» were classified correctly, and only two of those were classified as «good».
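As a quick cross-check of the reported numbers, the confusion-matrix counts from the text give an overall accuracy of about 99.75%; this is a small sketch in which the matrix is reconstructed from the prose, so the orientation of Table 1 in the original may differ.

```python
import numpy as np

# Confusion matrix reconstructed from the counts in the text (rows: actual, columns: predicted)
cm = np.array([[1197,    4],    # actual "good":   1197 predicted good, 4 predicted defect
               [   2, 1199]])   # actual "defect": 2 predicted good, 1199 predicted defect

accuracy = np.trace(cm) / cm.sum()            # (1197 + 1199) / 2402
precision_defect = cm[1, 1] / cm[:, 1].sum()  # 1199 / 1203
recall_defect = cm[1, 1] / cm[1, :].sum()     # 1199 / 1201
print(round(accuracy, 4), round(precision_defect, 4), round(recall_defect, 4))
# -> 0.9975 0.9967 0.9983, consistent with the reported ~99% accuracy
```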
4.2 LIME interpretation methodology

The methodology of creating the interpretation with LIME included several steps (Fig. 5). The first step is to choose an image from the test data set. After this, the data were randomly perturbed by turning superpixels on and off, and those newly generated samples were weighted. The weighting of the samples was done by means of their proximity to the region of interest. Based on the new samples, a prediction was made with the original model. Thereafter, a new weighted model with the newly generated data set was trained and referred to as the interpretable or the local surrogate model. After the feature selection, the local surrogate model was interpreted with the prediction on the test image, and its result was displayed.
The idea behind the process is that LIME uses decision boundaries for the two classes. The prediction of an instance, also called a data point, then gets the explanation. This explanation is generated by creating a new data set of perturbations located around the data point. Every perturbation gets predicted by the ML model and classified into the class “good” or “defect”. Every perturbation has a level of importance. Perturbations are more important when the distance to the original data point is small. Next, these distances are used to calculate the weights. These preparations are used to create the local surrogate model. The key components to make the interpretation, therefore, are the newly generated data set, the predictions on the new samples and their weights.

Fig. 6 LIME interpretation image

The highlighted parts show the increase or decrease in the probability of the image being classified to the first or the second class. Another explanation method of LIME interpretation is to use a heatmap, which plots the explanation weights (see Fig. 7). The colorbar on the right displays the values of the weights in the image. To create this heatmap, every explanation weight was mapped to its corresponding superpixel and then plotted using different colours. With that, a more detailed insight into the interpretation can be given, showing how the superpixels contributed. For example, the light pink part in the middle is an important one, belonging to the positively contributing part.
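A sketch of how such a heatmap can be produced from a LIME explanation object, continuing the hypothetical `explanation` from the earlier LIME sketch; the plotting parameters are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Map each superpixel to its explanation weight for the top predicted class
label = explanation.top_labels[0]
weights = dict(explanation.local_exp[label])               # {superpixel_id: weight}
heatmap = np.vectorize(lambda s: weights.get(s, 0.0))(explanation.segments)

plt.imshow(heatmap, cmap="RdBu",
           vmin=-np.abs(heatmap).max(), vmax=np.abs(heatmap).max())
plt.colorbar()   # the colorbar shows the values of the weights, as in Fig. 7
plt.show()
```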
4.2.2 Result interpretation

Fig. 8 Example of LIME interpretations of the class “good”
4.4 Software implementation
that the highlighted pixels have exactly the opposite colour in the same place. This shows that if the image had been classified as “defect”, then the parts of the image that are actually considered “good” would in that case contribute negatively to the prediction.

Let us also consider the first image of the class “defect” in Fig. 13. This example has nearly the same amount of highlighted red and blue parts, which means that the ML model was not completely sure to which class the image belongs. The two holes in the product contributed positively to the correctly made prediction, since they were highlighted in a darker-shaded red. However, the two additional screws were highlighted blue, which means that they contributed in a negative way. This behaviour can be explained by the fact that “good” products have exactly two screws in the same position. These are the features that contributed to the “good” classification. Another aspect here is the shadow of the product, which showed these parts of the image as important for the prediction. Looking at sample images which were interpreted with SHAP, it becomes clear based on which features the ML model makes its classification prediction. Having this knowledge, the ML model can be improved further, for example, by eliminating shadows that interfere with the significant parts of the image.

5 Comparison and evaluation

This section compares the LIME and SHAP methods through functionality metrics derived from the literature.

5.1 Computational time to explain the prediction

The computational times needed to derive the explanation of the prediction were retrieved during the execution of the Python code for each method under the same controlled conditions. Two types of times were measured. The first one was the CPU time (execution time). It measures how much time has elapsed until the CPU has executed the core program (without initializing variables and plotting the result). The second one was the wall time (running time), which measures the total time to execute the program. Since the execution of the program was done on a high-performance cluster, where the work is spread over several cores, the wall time was smaller than the CPU time to compute the interpretation. For LIME, the CPU time for the same ML model was 12.2 s, and for SHAP, it was 11.8 s. The wall time for LIME was 4.8 s and for SHAP 2.29 s. Thus, for this specific use case, the program implementation of the SHAP method is slightly faster than LIME. In addition, based on the literature, LIME is generally faster when not used for image data, but for text or tabular data.
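CPU time and wall time can be measured in plain Python as sketched below (in a Jupyter notebook the %time magic reports both); `explain_prediction()` is a placeholder standing in for the LIME or SHAP call.

```python
import time

cpu_start, wall_start = time.process_time(), time.perf_counter()
explanation = explain_prediction(model, test_image)    # placeholder for the LIME or SHAP call
cpu_time = time.process_time() - cpu_start              # CPU (execution) time, summed over cores
wall_time = time.perf_counter() - wall_start            # wall (running) time

print(f"CPU time: {cpu_time:.2f} s, wall time: {wall_time:.2f} s")
```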
5.2 Presentation of the interpretation

Both methods present their results visually in a simple and easy-to-understand form. The representation of the explained prediction in both cases is an overlay on the original image from a test data set. Indicating the positively or negatively contributing parts is done by the colour palette, both for LIME and SHAP. However, LIME uses only two colours, which highlight either a strictly positive or negative contribution, whereas SHAP uses different shades of the two colours, which makes the interpretation more expressive. Thus, SHAP provides more information, and a conclusion can be drawn as to which positively contributing parts are more important than others. A similar option is provided by the LIME heatmap, by segmenting the superpixels according to their weights, but this is only to be seen as additional information. To conclude on the explanatory power, SHAP appears to have a more informative way of expressing and representing the interpretation of the ML model prediction.

5.3 Interpretability

LIME and SHAP are primarily local interpretability methods, meaning that they are unaware of the inner structure of the model. The option that SHAP provides for global interpretability is to sum all individual predictions of the SHAP values. Although both methods are primarily local, they use different approaches. LIME builds local surrogate linear models for each prediction that gets explained. This creates a white-box model from the initial black-box model. However, this approach is limited to the local neighbourhood of the model. SHAP uses the Shapley values to determine the average marginal contribution of all feature values over all possible coalitions. This means that SHAP investigates all possible predictions of the image or non-image data. This approach ensures that the interpretations of SHAP are accurate and consistent. In some literature sources, it is suggested that LIME is a subset of SHAP which lacks consistency or accuracy.

5.4 Applicability

LIME and SHAP are the most common methods in ML interpretability. Therefore, they are applicable to different use cases and different data types in industrial settings. Both methods have a Python implementation, which is currently the most used programming language in ML applications. We need to note that the selection of the method may depend on a specific ML algorithm. In this use case, both methods were applied for a CNN. At the same time, for a model built with the k-nearest neighbour algorithm, computing the SHAP values will take a long time in comparison with LIME.
Furthermore, using LIME and SHAP on Keras machine learning models works out of the box. However, LIME, for example, cannot be used on an XGBoost machine learning model without creating a workaround. In conclusion, LIME and SHAP are generally well-applicable in industrial settings in contrast to similar methods, for example, Anchors.

5.5 Replicability and reliability

Recomputing the SHAP values on similar images will always result in a similar explanation output. Thus, SHAP is a stable interpretability method, where the explanation output can be replicated. On the other hand, for LIME, explanation outputs for similar images can be different. This problem arises because of the rather weakly approximated local surrogate model in relation to the original black-box model. This is the case because of the perturbation step of LIME, which can differ when repeated. For our use case, LIME results differ slightly in the form of smaller boundary shifts between the positively and negatively contributing sides. Superpixels which were at the tipping point between a positive or negative contribution changed in the repeated sampling process. However, the differences in the results of the LIME interpretations were minimal.

5.6 Implementability effort

Both methods are easy to implement with a corresponding Python library. The programming effort depends on the industrial use case, the available data and the selected ML model. After the data preparation step, the right design is to be found for the ML model. In conclusion, the implementability effort of the LIME or SHAP methods for a programmer is quite similar.

5.7 Limitations

The implementation of the method should always be examined, and the results should not be blindly trusted. Due to sample variations, LIME lacks a guarantee of producing stable and consistent results for similar images. Furthermore, LIME limits itself to producing local surrogate models with different quality levels, since the fit of the data to the model cannot be controlled. There are no clear instructions on how many features to select for the local surrogate LIME model, which may result in either a too complex or a too simple interpretation. In addition, SHAP is generally known to be slower than LIME when many instances need to be computed. This can be a significant criterion for the selection of the corresponding method in the industrial context.

6 Conclusion and future work

The development of artificial intelligence and especially ML methods is fast-moving and makes enormous progress every year [33–37]. While in some industrial areas ML concepts are used productively, in others implementation is lagging. Industries which want to stay competitive must deal with the introduction of ML in their corporate processes, for example, in automated quality control. At the same time, ML is often used without proper interpretation of the models. The question of why the model makes a certain prediction cannot be answered. Therefore, the use of explainability and especially interpretability methods becomes an increasingly important approach, which should be established in every company that uses machine learning applications.

This work introduces ML interpretability methods for image classification in industrial use. Specifically, it implements and examines the LIME and SHAP methods. To limit the area of application, the focus was given to applying the methods to a binary image classification, which was performed using a CNN algorithm. Due to the limited scope, the research insights and recommendations are valid for the developed use case. At the same time, the proposed recommendations and evaluation scheme can be used in similar use cases in the industry.

Application of the interpretability methods to the predictions of an ML model provides a better understanding of how the model works. In our use case, based on the interpretability methods, the ML model for image classification was redesigned and reached an accuracy score of 99%. Another result is giving users insights to understand an incorrect image classification, for example, that the ML model may consider a shadow of a screw as a hole (which is a criterion for a “defect” product). In general, shadows of the product played a significant role in the ML prediction. From that, it can be concluded that the experimental setup should be improved by adding light sources to minimize the shadows.

The literature research provides the answer to the first research question and selects the approaches (the LIME and the SHAP methods). The results of the comparison of the methods showed that they produce similar results based on different approaches. The SHAP method is, in some aspects, superior to the LIME method. SHAP has a more elaborated theoretical foundation behind the computation of the interpretation values and produces more stable and consistent results. In addition, in the SHAP method, the visual interpretation is more detailed and reasonable, which leads to the possibility to draw richer conclusions.
While LIME was described in the literature sources as the faster alternative to SHAP, for the selected use case, SHAP was faster than LIME. Both methods operate at a local level, which means they interpret not the whole model, but only the final predictions. Based on these results, the second research question is answered with the selection of the SHAP method. It can be recommended as the most effective option for providing interpretability for an ML model for image classification in the industrial context.

Future work would be to evaluate the selected methods on different use cases and ML models. Further research must be done, since the existing interpretability methods are only at the beginning of their development and need to be evaluated in their industrial applications towards gaining an understanding of ML results.

Author contributions We acknowledge that the authors have contributed significantly and are in agreement with the content of the manuscript.

Funding Not applicable.

Availability of data and materials Data are available upon request to the authors.

Declarations

Competing interests We acknowledge that the authors have no competing interests.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Oks SJ, Frietzsche A, Lehmann C (2016) The digitalization of industry from a strategic perspective. In: Presented at the R&D management conference from science to society: innovation and value creation, Cambridge, United Kingdom
2. Bonaccorso G (2017) A gentle introduction to machine learning. In: Machine learning algorithms—a reference guide to popular algorithms for data science and machine learning, Birmingham, United Kingdom, pp 6–9
3. Zhang X (2020) Machine learning. A matrix algebra approach to artificial intelligence, 1st edn. Springer, Singapore, pp 223–224
4. Dosilovic FK, Brcic M, Hlupic N (2018) Explainable artificial intelligence: a survey. In: Presented at the 41st international convention on MIPRO, Opatija, Croatia, pp 210–215
5. Bhatt U et al (2019) Explainable machine learning in deployment. In: Presented at proceedings of the 2020 conference on fairness, accountability and transparency, Cambridge, United Kingdom
6. Ribeiro MTC (2021) Lime. https://github.com/marcotcr/lime. Accessed 2 Jan 2022
7. Lundberg S (2018) Shap documentation. https://shap.readthedocs.io/en/latest/index.html. Accessed 14 May 2022
8. Mohri M, Rostamizadeh A, Talwalkar A (2018) Introduction. Foundations of machine learning, 2nd edn. MIT Press, Cambridge, pp 2–3
9. Flach P (2012) The ingredients of machine learning. Machine learning—the art and science of algorithms that make sense of data, 1st edn. Cambridge University Press, Cambridge, p 14
10. Lindsay GW (2020) Convolutional neural networks as a model of the visual system: past, present, and future. J Cogn Neurosci 33:1–15
11. Abiyev RH, Ma'aitah MKS (2018) Deep convolutional neural networks for chest diseases detection. J Healthc Eng. https://doi.org/10.1155/2018/4168538
12. Zou L et al (2019) A technical review of convolutional neural network-based mammographic breast cancer diagnosis. Comput Math Methods Med 2019:1–16
13. Géron A (2019) Introduction to artificial neural networks with Keras. Hands-on machine learning with scikit-learn, Keras, and TensorFlow, 2nd edn. O'Reilly, Sebastopol, pp 277–291
14. Neapolitan RE, Jiang X (2018) Neural networks and deep learning. Artificial intelligence—with an introduction to machine learning, 2nd edn. CRC Press, Boca Raton, pp 373–379
15. Shai S-S, Shai B-D (2014) Neural networks. Understanding machine learning—from theory to algorithms. Cambridge University Press, New York, pp 228–230
16. Heaton J (2015) Convolutional neural networks. Artificial intelligence for humans volume 3: deep learning and neural networks. Heaton Research Inc., Chesterfield, pp 186–194
17. Raschka S, Vahid M (2017) Implementing a deep convolutional neural network using TensorFlow. Python machine learning, 2nd edn. Packt, Birmingham, pp 514–515
18. Bonner A (2019) The complete beginner's guide to deep learning: convolutional neural networks and image classification. https://towardsdatascience.com/wtf-is-image-classification-8e78a8235acb. Accessed 30 May 2022
19. Hossain A, Sajib SA (2019) Classification of image using convolutional neural network (CNN). Glob J Comp Sci Technol 19:1–7
20. Lee S (2020) How to train neural networks for image classification—Part 1. https://sandy-lee.medium.com/how-to-train-neural-networks-for-image-classification-part-1-21327fe1cc1. Accessed 30 May 2022
21. Rebala G, Ravi A, Churiwala S (2019) Machine learning definition and basics. An introduction to machine learning. Springer Press, Cham, pp 1–2
22. Nandi A, Pal AK (2022) Interpreting machine learning models. Apress, Bangalore, pp 141–278
23. Agarwal N, Das S (2020) Interpretable machine learning tools: a survey. In: Presented at the IEEE SSCI, pp 1528–1534. https://doi.org/10.1109/SSCI47803.2020.9308260
24. Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. arXiv preprint, pp 1–10
25. Das S et al (2020) Taxonomy and survey of interpretable machine learning method. In: Presented at the IEEE SSCI, pp 670–677
26. Kamath U, Liu J (2021) Explainable artificial intelligence: an introduction to interpretable machine learning. Springer Press, Cham, pp 192–224
27. Biecek P, Burzykowski T (2021) Explanatory model analysis—explore, explain and examine predictive models. CRC Press, Boca Raton, pp 95–115
28. Cian D, Gemert JV, Lengyel A (2020) Evaluating the performance of the LIME and Grad-CAM explanation methods on a LEGO multi-label image classification task. arXiv preprint
29. Molnar C (2021) Model-agnostic methods. Interpretable machine learning—a guide for making black box models explainable, 2nd edn. Christoph Molnar, Munich, pp 140–178
30. Nayak A (2019) Idea behind LIME and SHAP. https://towardsdatascience.com/idea-behind-lime-and-shap-b603d35d34eb. Accessed 29 July 2022
31. Lundberg S, Lee S-I (2017) A unified approach to interpreting model predictions. In: Proc. NIPS, Long Beach, CA, USA, pp 4768–4777
32. Zhang T (2021) Deep learning model interpretation using SHAP. https://towardsdatascience.com/deep-learning-model-interpretation-using-shap-a21786e91d16. Accessed 29 July 2022
33. Hartner R, Mezhuyev V (2022) Time series-based forecasting methods in production systems: a systematic literature review. Int J Ind Eng Manag 13(2):119–134. https://doi.org/10.24867/IJIEM-2022-2-306
34. Hartner R, Komar J, Mezhuyev V (2022) An approach for increasing the throughput of CNN-based quality inspection systems in constrained environments. In: 11th international conference on software and computer applications (ICSCA 2022), February 24–26, 2022, Melaka, Malaysia, pp 179–184. https://doi.org/10.1145/3524304.3524330
35. Mezhuyev V, Gunchenko YO, Shvorov SA, Chyrchenko DV (2020) A method for planning the routes of harvesting equipment. Autosoft. Advanced ICT and IoT technologies for the fourth industrial revolution, vol 25
36. Hartner R, Mezhuyev V, Tschandl M, Bischof C (2020) Data-driven digital shop floor management: a practical framework for implementation. In: ACM proceedings of the international conference ICSCA 2020, February 18–21, 2020, Langkawi, Malaysia, pp 41–45
37. Mueller C, Mezhuyev V (2022) AI models and methods in automotive manufacturing: a systematic literature review. In: Al-Emran M, Shaalan K (eds) Recent innovations in artificial intelligence and smart applications, vol 1061. Studies in computational intelligence. Springer, Cham. https://doi.org/10.1007/978-3-031-14748-7_1

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Anja Stadlhofer received a bachelor's degree in information management from FH Joanneum, Graz, Austria, and is currently studying in the master's program of international industrial management at FH Joanneum, Kapfenberg, Austria, where she focuses on the research field of machine learning. She is presently working as an IT project manager.

Vitaliy Mezhuyev Ph.D. (Educational Technology), Kyiv National Pedagogical University, Ukraine, 2002; ScD (Information Technology), Odesa National Technical University, Ukraine, 2012. Professor of informatics at Berdyansk State Pedagogical University, Ukraine, 2004–14. Professor of informatics at University Malaysia Pahang, 2014–19. From 2019 with the Institute of Industrial Management at FH Joanneum, Kapfenberg, Austria. Visiting professor at six international universities. Participated in multiple international scientific and industrial projects devoted to the design, development, and formal verification of computer systems. Published over 140 scientific papers in peer-reviewed journals, including highly reputed venues such as Complexity, Computers & Education, Cybernetics and Systems, IEEE Access, Information Management, Intelligent Automation and Soft Computing, and Technology in Society, among many others.