
Applying Genetic Programming to Improve Interpretability in Machine Learning Models


Leonardo Augusto Ferreira, Frederico Gadelha Guimarães
Machine Intelligence and Data Science (MINDS) Lab
Department of Electrical Engineering
Universidade Federal de Minas Gerais, UFMG
31270-000 Belo Horizonte – MG, Brazil
ORCID 0000-0001-9238-8839

Rodrigo Silva
Department of Computer Science
Universidade Federal de Ouro Preto, UFOP
35400-000 Ouro Preto - MG, Brasil
ORCID 0000-0003-2547-3835

Abstract—Explainable Artificial Intelligence (or xAI) has become an important research topic in the fields of Machine Learning and Deep Learning. In this paper, we propose a Genetic Programming (GP) based approach, named Genetic Programming Explainer (GPX), to the problem of explaining decisions computed by AI systems. The method generates a noise set located in the neighborhood of the point of interest, whose prediction should be explained, and fits a local explanation model for the analyzed sample. The tree structure generated by GPX provides a comprehensible analytical, possibly nonlinear, expression which reflects the local behavior of the complex model. We considered three machine learning techniques that can be recognized as complex black-box models, Random Forest, Deep Neural Network and Support Vector Machine, on twenty data sets for regression and classification problems. Our results indicate that GPX approximates the complex models more accurately than Lime and at least as accurately as decision trees. These results support the proposed approach as a novel way to deploy GP to improve interpretability.
Index Terms—Interpretability, Machine Learning, Genetic Programming, Explainability
I. INTRODUCTION
Advances in Machine Learning (ML) and Deep Learning (DL) have had a profound impact on science and technology. These techniques have had many recent successes, achieving unprecedented performance in tasks such as image classification, machine translation and speech recognition, to cite a few. The remarkable performance of Artificial Intelligence (AI) methods and the growing investment in AI worldwide will lead to an ever-increasing use of AI systems, with a significant impact on society and on everyday decisions. However, depending on the model used, understanding why it makes a certain prediction can be difficult. This is particularly the case with high-performing DL models and with ML models in general. The more complex the model, the more opaque its decisions are to human understanding. Black-box ML models are increasingly used in critical applications, leading to an urgent need for understanding and justifying their decisions. This difficulty is an important impediment to the adoption of ML, particularly DL, in domains such as healthcare [1], criminal justice and finance [2]. The term Explainable AI (or XAI) has been adopted by the community to refer to techniques that help a given audience understand the decisions or results of AI artifacts; this audience can include domain experts, regulatory agencies, managers, decision-makers, policy-makers or users affected by these decisions.
The interpretability problem can be understood as the technical challenge of explaining AI decisions, especially when the underlying AI technology is perceived as a black-box model. Humans tend to be less willing to accept decisions that are not directly interpretable and trustworthy [3]. Therefore, users should be able to understand a model's outputs in order to trust them. There are two types of trust, as described in [4]: trust in the model and trust in a prediction. Though similar, they are not the same thing. The first relates to whether someone will choose a model or not, whereas the second relates to whether someone will make a decision relying on that prediction.
Interpretability, as pointed out in [5], is associated with human perception, i.e., the ability to classify something or someone according to their main characteristics. Applied to ML models, this idea amounts to highlighting the main features that contributed to a prediction. Other works, such as [6] and [7], define interpretability as “the ability to explain or to present in understandable terms to a human”. In other words, a model can be considered explainable if its decisions are easy for a human to understand.
Another issue involving interpretability goes beyond trusting a model or a prediction. The European Union has recently introduced the General Data Protection Regulation (GDPR), as pointed out in [8] and [9]. The GDPR directly deals with subjects related to European citizens’ data; for example, it prohibits judgments based solely on automated decisions [10]. The European Union’s new GDPR has a major impact on deploying machine learning algorithms and AI-based systems. It restricts automated individual decision-making which “significantly affects” users. The law also creates a “right to explanation,” whereby someone has a right to be given an explanation for an output of the algorithm [11].
This work has been supported by the Brazilian agencies (i) National Council for Scientific and Technological Development (CNPq); (ii) Coordination for the Improvement of Higher Education (CAPES) and (iii) Foundation for Research of the State of Minas Gerais (FAPEMIG, in Portuguese). MINDS Laboratory – https://minds.eng.ufmg.br/
978-1-7281-6929-3/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on September 27,2021 at 12:41:14 UTC from IEEE Xplore. Restrictions apply.
There are many important results on interpretability and explainability in the literature. For instance, interpretable mimic learning [12] is an approach in which the behavior of a slow and complex model is approximated by a faster and simpler (also more transparent) model with comparable performance. The idea is that by mimicking the behavior of other complex models, one is able to compress the learning into a more interpretable model and derive rules or logical relationships. In this regard, it is possible to cite Lime (Local Interpretable Model-Agnostic Explanations) [4] and SHAP (SHapley Additive exPlanations) [13], which have been widely used for interpretability. Despite their success, both Lime and SHAP assume that a linear model is a good local representation of the original one. This simplification may cause, in some circumstances, a significant loss in the accuracy of the mimicking model, which may spoil the final interpretation.
In this paper we present an approach to interpretability based on Genetic Programming (GP), named Genetic Programming Explainer (GPX). GP can produce both linear and nonlinear models, which increases flexibility with respect to the aforementioned methods. The evolutionary process of the GP algorithm favors the selection of the most relevant features for the analyzed sample. Moreover, the tree structure representation naturally provides an analytical expression that reflects a local explanation of the model’s prediction. The main goal is to produce an accurate mimicking model which preserves the advantages of having a closed mathematical expression, such as readability and the ability to compute partial derivatives to assess the sensitivity of the output with respect to the input parameters. According to the taxonomy recently advocated by [14], the proposed approach can be categorized as a model-agnostic technique for post-hoc explainability, able to provide a local explanation for a specific prediction output given by a complex black-box model.
In this work we have considered the following pre-trained complex ML algorithms that can be recognized as black-box models: Random Forest [15], Support Vector Machines (SVM) and a Deep Neural Network (DNN) [16], on a number of different data sets (10 regression and 10 binary classification problems available in public repositories). These methods perform well on most ML problems; however, their explainability is low. For each pre-trained complex model, we compared GPX against other methods that could be used for generating local explanations: Lime and Decision Trees (for classification and regression). The statistical analysis shows that GPX was able to approximate the complex models better than Lime and at least as well as decision trees, providing an interpretable explanation in terms of a symbolic expression. Genetic Programming thus offers a new approach to interpretability, providing local explanations for the predictions of black-box models. In addition, we present two case studies to illustrate the proposed methodology and to serve as a guide for future use. The first one is about predicting home prices in the Boston area and the other one concerns the progression of diabetes.
In summary, the proposed approach aims to contribute to improving the interpretability of black-box ML by automatically generating model-agnostic explanations that fit the input-output decisions performed by the more complex model, which would otherwise be impossible to derive by usual analytical methods. The GP algorithm is applied locally and provides an analytical and visual explanation in terms of a human-readable expression represented as a tree. These evolved explanations can be easily interpreted and help in understanding these complex decisions.
This paper is organized as follows. Section II introduces some concepts of interpretability and positions our approach with respect to them. Section III presents our methodology: Section III-A reviews the main ideas of the GP algorithm and Section III-B describes how it is applied to provide local explanations. Section IV describes the experimental methodology, and Section V discusses the results compared with the state of the art. Section VI presents a case study on interpretability for a Random Forest regressor, and Section VII concludes the paper.
II. CONCEPTS OF INTERPRETABILITY
Humans are capable of making predictions about a subject and building a logical explanation to justify them. When a prediction is based on understandable choices, it gives the decision maker more confidence in the model [2]. On the other hand, Machine Learning and Deep Learning models are not able to provide the same level of confidence. Their complexity and huge number of parameters make them unintelligible to a human being, and for most purposes they are seen as black boxes. Humans tend to be resistant to techniques that are not well understood or that cannot be directly interpreted [5].
More recently, several strategies have been applied to understand how black-box models work. These strategies aim to decrease the opacity of artificial intelligence systems and to make these models more user-friendly. In order to address the opacity of some machine learning models, we must first introduce some concepts of interpretability [7], [2]:
• Comprehensibility: the ability of a learning algorithm to represent its learned knowledge in a human-understandable fashion, in such a way that the knowledge representation can be comprehended.
• Interpretability: the ability to explain or to provide the meaning in understandable terms to a human.
• Explainability: associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans.
• Transparency: a model is considered transparent if it is understandable by itself; for instance, a decision tree is a transparent model. The model must allow the reproduction of every calculation step, must have a clear explanation of its parameters and hyper-parameters, and must provide an explanation of the learning process. Complex models can therefore be considered black-box models because they lack transparency.

• Functionality: the model must provide an understandable output, offer visualization tools and ensure a local explanation [4], [13]. The outputs must be presented in a user-friendly way.
Post hoc explainability techniques employ other methods after the training step to analyze a given model. Different techniques can be used to enhance the interpretability of a black-box model, such as text explanations, visual explanations, local explanations, explanations by example, explanations by simplification and feature relevance explanations [5].
More specifically, post hoc explainability by means of local explanations is an approach applied to a prediction or a set of predictions of pre-trained black-box models; Lime [4] and SHAP [13], for example, fall into this category. Lime generates a local explanation model, while SHAP produces feature relevance explanations.
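The linear-surrogate assumption attributed to Lime and SHAP in the introduction can be made concrete with a short Python sketch. It samples a Gaussian neighborhood around a point, queries a stand-in black box, fits an ordinary least-squares model, and reports the surrogate's RMSE on those samples. It is deliberately simpler than Lime (no proximity weighting of the samples, no regularization), and the black-box function, the standard deviation `sigma`, the sample count `m` and the helper names are illustrative assumptions, not anything specified in this paper.

```python
import numpy as np


def black_box(X: np.ndarray) -> np.ndarray:
    # Hypothetical nonlinear "complex model" g, used only for illustration.
    return X[:, 0] ** 2 * X[:, 1]


def linear_surrogate(g, x, sigma=0.5, m=1000, seed=0):
    """Fit y ~ w.s + b to g on m Gaussian samples around x.

    Returns the weights w, the intercept b and the surrogate's RMSE
    with respect to g on those samples.
    """
    rng = np.random.default_rng(seed)
    S = rng.normal(loc=x, scale=sigma, size=(m, x.size))  # local neighborhood of x
    y = g(S)                                               # black-box predictions
    A = np.column_stack([S, np.ones(m)])                   # features plus bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)           # ordinary least squares
    rmse = float(np.sqrt(np.mean((A @ coef - y) ** 2)))
    return coef[:-1], coef[-1], rmse


x = np.array([2.0, 1.0])
w, b, err = linear_surrogate(black_box, x)
print("local weights:", w, "intercept:", b, "RMSE:", err)
```

For this quadratic black box the RMSE stays clearly above zero, whereas a black box that is itself linear (for example `lambda X: 3 * X[:, 0] - 2 * X[:, 1] + 1`) is recovered essentially exactly; that gap is the loss of local fidelity that motivates a nonlinear explainer.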
Fig. 1. This flowchart presents the steps of the evolutionary process in GP.
The approach presented in this paper can be considered a post hoc technique, since the GP algorithm is applied to a pre-trained model. The goal of the GP is to produce a local symbolic model that provides a visual explanation in terms of a human-readable expression represented as a tree. In this sense, the proposed method generates a knowledge representation that can be comprehended, yielding a local explanation model that is interpretable, transparent and functional.
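The idea of a local symbolic model represented as a tree can be illustrated with a minimal, self-contained Python sketch. It encodes the expression reported in Fig. 5, $f(s) = x_1 + (x_0 - (-9.558))$, as a binary tree, evaluates it on a noise set drawn from a Gaussian centered at a point $x$ with covariance $\sigma I_n$ (the construction of Section III-B), scores it against a black box with the RMSE, and approximates its partial derivatives by central finite differences. The `Node` class, the operator set, the stand-in black box `g` and the values of `x`, `sigma` and `m` are hypothetical; this is not the GPX or gplearn implementation, and a real closed-form expression would normally be differentiated analytically.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

import numpy as np

# Hypothetical function set for internal nodes: name -> (numpy function, infix symbol).
OPS: dict[str, tuple[Callable[[np.ndarray, np.ndarray], np.ndarray], str]] = {
    "add": (np.add, "+"),
    "sub": (np.subtract, "-"),
    "mul": (np.multiply, "*"),
}


@dataclass
class Node:
    """Internal node (op with two children) or terminal (feature index or constant)."""
    op: Optional[str] = None
    left: Optional[Node] = None
    right: Optional[Node] = None
    feature: Optional[int] = None
    const: Optional[float] = None

    def evaluate(self, S: np.ndarray) -> np.ndarray:
        """Evaluate the expression row-wise on an (m, n) sample matrix."""
        if self.op is not None:
            func, _ = OPS[self.op]
            return func(self.left.evaluate(S), self.right.evaluate(S))
        if self.feature is not None:
            return S[:, self.feature]
        return np.full(S.shape[0], self.const)

    def __str__(self) -> str:
        if self.op is not None:
            return f"({self.left} {OPS[self.op][1]} {self.right})"
        return f"x{self.feature}" if self.feature is not None else f"{self.const:g}"


def noise_set(x: np.ndarray, sigma: float, m: int, seed: int = 0) -> np.ndarray:
    """Draw m samples from N(x, sigma * I_n)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean=x, cov=sigma * np.eye(x.size), size=m)


def rmse(f_vals: np.ndarray, g_vals: np.ndarray) -> float:
    return float(np.sqrt(np.mean((f_vals - g_vals) ** 2)))


def numerical_gradient(tree: Node, s: np.ndarray, h: float = 1e-6) -> np.ndarray:
    """Central finite-difference estimate of the partial derivatives at point s."""
    s = np.asarray(s, dtype=float)
    grad = np.empty(s.size)
    for j in range(s.size):
        step = np.zeros(s.size)
        step[j] = h
        up = tree.evaluate((s + step)[None, :])[0]
        down = tree.evaluate((s - step)[None, :])[0]
        grad[j] = (up - down) / (2 * h)
    return grad


def g(S: np.ndarray) -> np.ndarray:
    # Hypothetical black box: the Fig. 5 expression plus a small nonlinear term.
    return S[:, 0] + S[:, 1] + 9.558 + 0.1 * np.sin(3 * S[:, 0])


# Fig. 5: f(s) = x1 + (x0 - (-9.558))
tree = Node("add", Node(feature=1), Node("sub", Node(feature=0), Node(const=-9.558)))
x = np.array([0.5, -1.0])
eta = noise_set(x, sigma=0.25, m=1000)

print(tree)                                # (x1 + (x0 - -9.558))
print(rmse(tree.evaluate(eta), g(eta)))    # at most 0.1 for this black box
print(numerical_gradient(tree, x))         # close to [1. 1.]
```

Because the tree is an explicit formula, its printed infix form is already the explanation; for this particular expression the exact gradient is [1, 1], which the finite-difference estimate reproduces.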
Explainability and interpretability are extremely relevant
issues nowadays. They are fundamental pieces for dealing
with several philosophical and ethical aspects of the interaction
between humans and AI artifacts.
III. EVOLVING EXPLANATIONS
A. Genetic Programming
Genetic Programming (GP) was developed by Koza [17] to evolve functional structures such as algebraic expressions, computer programs, and logical expressions [18]. GP works with a population of programs, usually initialized at random. In GP there is a problem to be solved, and the fitness of each individual is related to how well it solves this problem. The GP algorithm evolves this population until a satisfactory solution is found or another stopping criterion is met. Figure 1 presents all steps of the GP algorithm.
The GP algorithm is relatively simple to describe, as observed in Figure 1. The first step is to generate a random population with several individuals. The next step is to evaluate the fitness of each individual, which measures how well it solves the problem at hand. If some individual satisfies the stopping criterion, the algorithm ends. Otherwise, it goes to the selection step, in which the algorithm favors the better individuals based on the fitness evaluation. After the selection step, a genetic operator is chosen randomly to generate new individuals [17], [18].
In this work, genetic programming is used to evolve nonlinear symbolic expressions for regression and classification problems. These expressions are represented in the program as trees over which the genetic operations of mutation, crossover and reproduction are defined. Figure 2 illustrates the tree representation of a mathematical expression.
Fig. 2. This binary tree represents the expression $f(x) = x_0^2 + 0.5x_2 - 3$. The terminal nodes are constants or input variables, and the internal nodes are functions or operations.
The gplearn¹ Python library was used as the GP search engine. This library extends scikit-learn, a widely known Python library for ML.
¹ https://gplearn.readthedocs.io/
B. GP approach to local explanation
In this section, the proposed approach to the interpretability problem is described. The example discussed here helps in understanding the steps needed to achieve our local interpretation.
Let $x \in \mathbb{R}^n$ be an input fed to a complex pre-trained machine learning model. The first step in our method is to generate $m$ sample points around the input $x$. This set of samples, called the noise set, $\eta$, is created by sampling from a multivariate Gaussian distribution centered at $x$ with covariance matrix $\Sigma = \sigma I_n$, where $I_n$ is the $n \times n$ identity matrix and $\sigma$ is estimated from the training data.
The goal of GPX is to find a function $f^*: \mathbb{R}^n \to \mathbb{R}$ which is easy to interpret and, at the same time, mimics the behavior of the original complex model, $g: \mathbb{R}^n \to \mathbb{R}$, over the sample set $\eta$. Formally, the problem to be solved by GPX can be defined as follows:

$$f^* = \arg\min_{f \in \mathcal{F},\, s_i \in \eta} d\left([f(s_1), \ldots, f(s_m)] - [g(s_1), \ldots, g(s_m)]\right) \tag{1}$$
where:
• $\mathcal{F}$: set of all admissible functions $f: \mathbb{R}^n \to \mathbb{R}$;
• $\{s_1, s_2, \ldots, s_m\}$: samples from the noise set, $\eta$;
• $g(s_i)$: the prediction given by the complex ML model;
• $f(s_i)$: the prediction generated by a given individual in the population; and
• $d(\cdot)$: some distance metric, usually the $\ell_2$-norm or the root-mean-square error (RMSE).
Fig. 3. Suppose a classification problem in which there are 1,500 samples in the training data set, in 3 separate classes. Class 1 are triangles, class 2 circles and class 3 stars. This data set was created with scikit-learn, and the standard deviations of the classes are 1.0, 2.5 and 0.5, respectively.
As an example, Figure 3 illustrates some training data $X_{k \times n}$, where $k = 1{,}500$ samples and $n = 2$. Assume that a model, $g(\cdot)$, has been previously obtained from this data set. After defining $\sigma$, it is possible to generate the noise set, $\eta$, located in the neighborhood of the point of interest $x$ whose prediction should be explained. Figure 4 shows $x$ and the noise set, $\eta$, generated randomly around $x$.
The next step is to apply GP to try to find $f^*$ as defined in Equation (1).
Fig. 4. The bold X represents the input $x$ for a complex ML model, whose prediction should be interpreted. The set $\eta$ contains 100 samples classified by the complex model. This figure also presents the decision surface generated by the GP algorithm.
Figure 5 illustrates the best representation found for $f^*$ by the GP. For didactic reasons, this example was built with two dimensions only, and the produced representation for $f^*$ contains all the input features. It is important to highlight, however, that the GP is not obliged to use all the features. Thus, the produced model already indicates which variables are important. Besides, the returned closed mathematical expression allows the user to compute partial derivatives to assess the local importance of each feature.
Fig. 5. A tree structure representing the algebraic expression $f(s) = x_1 + (x_0 - (-9.558))$. This structure was generated by GPX and visualized via graphviz.
IV. EXPERIMENTAL METHODOLOGY
A. Data Sets
In this work twenty data sets were used: ten for classification and ten for regression. All data sets were extracted from well-known repositories such as the UCI Machine Learning Repository², OpenML³ and Kaggle⁴. Our selection was based on the variability of features across data sets and on the number of downloads. Table I lists the chosen data sets for the classification problems and the corresponding number of features used.
TABLE I
CLASSIFICATION DATA SETS
Data Set                                    Features
Diabetes [19]                               8
Steel Plates Faults [20]                    33
The Monk’s Problems 2 [21]                  6
Phoneme [22]                                5
Blood Transfusion Service Center [23]       4
Ozone Level Detection [24]                  72
Hill-Valley [25]                            100
EEG Eye State [25]                          14
Spambase [25]                               57
ILPD (Indian Liver Patient Data Set) [26]   10
Table II presents the data sets chosen for the regression problems and the corresponding number of features used.
² https://archive.ics.uci.edu/
³ https://www.openml.org
⁴ https://www.kaggle.com/

TABLE II
REGRESSION DATA SETS
Data Set                                       Features
Diabetes [25]                                  10
Boston [27]                                    13
Fetch California Housing [27]                  8
Bike Sharing in Washington D.C.                13
Red Wine Quality [28]                          11
House Sales in King County, USA                18
GPU Kernel Performance [29]                    14
Beer Consumption - São Paulo                   5
Houses to rent data                            8
Predicting Compressive Strength of Concrete    10
B. Complex, black-box models
There are several machine learning models which can be considered black boxes. In this work we have selected three of the most popular to serve as “complex models”, that is, models for which explanations have to be produced. These models are Random Forests, Deep Neural Networks and Support Vector Machines. They are briefly introduced below.
Random Forest is a type of tree ensemble which has shown improvements over the generalization of a single decision tree and has achieved remarkable results in the literature [15], [30]. Our experiments set Random Forest with up to two thousand trees. The more trees are used, the harder the model becomes to understand.
Deep neural networks (DNNs) [16] are artificial neural networks (ANNs) with multiple layers between the input and output layers. The multi-layer architecture is inspired by the structure of the human brain and has been shown to be quite effective in many difficult classification and regression problems [31]. This architecture, however, represents compositions of nonlinear functions which are not easy to interpret.
Finally, Support Vector Machines (SVMs) [32] are classifiers which find a separating hyperplane such that the distance from the hyperplane to the nearest data points on either side is maximized. In other words, given labeled training data, the algorithm outputs an optimal hyperplane which categorizes new examples. Support Vector Regression (SVR) is the extension of SVMs to regression problems [33]. The SVR tries to fit the error within a certain threshold and can also be considered opaque, since it is hard to interpret its decisions, especially when nonlinear kernel functions are used.
C. Competing Explainers
In this section we present the competing explainers. Just like our method, the idea is to sample a noise set, $\eta$, around the point of interest, $x$. The set $\eta$ is then used to build a local comprehensible model which will be used as an explainer. We say that an explainer, $f$, “understands” the complex model, $g$, if its error on the noise set, $\eta$, relative to the complex model predictions is small. For regression problems, this concept may be translated into the mean squared error between the model and the explainer, as follows:
$$u_r(f) = \frac{1}{|\eta|} \sum_{s_i \in \eta} \left(f(s_i) - g(s_i)\right)^2 \tag{2}$$
For classification problems, it may be formalized as the accuracy of the explainer with respect to the complex model's predictions:
$$u_c(f) = \frac{1}{|\eta|} \sum_{s_i \in \eta} h(s_i) \tag{3}$$
where
$$h(s_i) = \begin{cases} 1 & \text{if } f(s_i) = g(s_i) \\ 0 & \text{if } f(s_i) \neq g(s_i) \end{cases} \tag{4}$$
The first method, named Lime [4], generates an explainer based on a linear least squares method with $\ell_2$-norm regularization [34]. Lime uses this linear model to measure the local importance of each feature.
Another good candidate for explainer is the Decision Tree (DT). A DT is, as the name implies, a tree where each internal node tests a feature, each branch represents a decision, and each leaf represents a prediction. By following the path from the root to the leaf that produced the prediction, the user can obtain an explanation for that prediction.
The performance of Lime, DT and GPX (explained in Section III) as explainers of Support Vector Machines, Neural Networks and Random Forests is discussed next.
V. TESTING EXPLAINERS ACCURACY
A. Experimental Setup
In this section we test the ability of Lime, GPX and DTs to understand the different complex models discussed in Section IV-B. The experimental steps can be described as follows:
1) Divide all the data sets described in Section IV-A into training and test data. We used 80% for training and 20% for testing.
2) Train the complex models (described in Section IV-B) with the training data.
3) Using the trained complex models, predict the value of 100 random samples selected from the test set. At this point, we have six thousand predictions for the regression and classification problems together (20 data sets × 100 predictions × 3 complex models).
4) Build the three explainers (Lime, GPX, DT) for each of these six thousand predictions in order to measure which interpreter can better understand the black-box prediction. At this point we have 18,000 explainer evaluations (6,000 predictions × 3 explainers) to which a statistical analysis is applied.
The GPX hyper-parameters were set as follows:
• Population size: 100 individuals.
• Probability of crossover: 70%.
• Probability of hoist mutation: 5%. This mutation is called hoist mutation because a randomly chosen subtree is replaced by one of its own subtrees, which is thus hoisted up the tree.
• Probability of point mutation: 10%. This mutation selects a random node to be replaced.
• Search interval: [−100, 100].
The other hyper-parameters are set to the library's default values.
The noise set, $\eta$ (see Section III-B), contains the same one thousand samples for every explainer, so all explainers always access the same data. To measure how well the explainers understand the complex models, (2) and (3) are used for the regression and classification problems, respectively.
B. Results
Table III presents the average and the standard deviation of the error, computed with (2), for each explainer across the regression problems.
TABLE III
EXPLAINERS ERROR
Explainer    Average Error    Error Standard Deviation
DT           0.083            0.329
GPX          0.065            0.508
Lime         7.577            36.913
A pairwise permutation test⁵ was performed in order to test the hypothesis that the difference between the error means is zero. The results are shown in Table IV, where P.adjust represents the p-value adjusted with the Bonferroni method. At a 95% confidence level, the test does not detect a significant difference between the mean errors of DT and GPX. On the other hand, the results indicate that both DT and GPX understand the complex model better than Lime.
TABLE IV
PAIRWISE PERMUTATION TEST FOR THE ERROR DIFFERENCE
Comparison        Stat      P-value      P.adjust
DT - GPX = 0      1.599     0.1099       0.3297
DT - Lime = 0     -11.01    3.515e-28    1.054e-27
GPX - Lime = 0    -11.03    2.668e-28    8.004e-28
Table V presents the average and the standard deviation of the accuracy, computed with (3), for each explainer across the classification problems.
TABLE V
EXPLAINERS ACCURACY
Explainer    Average Accuracy    Accuracy Standard Deviation
DT           0.852               0.104
GPX          0.899               0.130
Lime         0.658               0.334
A pairwise permutation test⁶ was performed again in order to test the hypothesis that the difference between the accuracy means is zero. The results are shown in Table VI, where P.adjust represents the p-value adjusted with the Bonferroni method. At a 95% confidence level, the results indicate that, on average, GPX was better than both DT and Lime at understanding the complex models. DT, in turn, was better than Lime.
⁵ See https://rdrr.io/cran/rcompanion/man/pairwisePermutationTest.html
⁶ See https://rdrr.io/cran/rcompanion/man/pairwisePermutationTest.html
TABLE VI
PAIRWISE PERMUTATION TEST FOR THE ACCURACY DIFFERENCE
Comparison        Stat      P-value      P.adjust
DT - GPX = 0      -15.09    1.866e-51    5.598e-51
DT - Lime = 0     28.31     0            0.000e+00
GPX - Lime = 0    33.27     0            0.000e+00
The complete table of results as well as the analysis scripts are available at https://github.com/leauferreira/GpX.
Overall, these results show that, at least for the scenarios presented here, Lime's assumption that the complex model can be locally approximated by a linear model is not always adequate. DT and especially GPX were superior at understanding the complex model.
Having shown that the proposed methodology can indeed produce accurate local models, we present in the next section a demonstration of the use of GPX in practice.
VI. CASE STUDY ON INTERPRETABILITY FOR RANDOM FOREST REGRESSOR
In this case study we selected the Boston and the Diabetes data sets (see Table II) and the Random Forest Regressor with 2,000 estimators (trees) from scikit-learn. Both data sets (Diabetes and Boston) can be found in scikit-learn.
The Boston data set has 13 features and 506 samples, and the target consists of home prices in Boston. The data set was randomly split into training and test sets, with 80% of the samples for training and the remainder for testing. Apart from the number of trees, the other hyper-parameters were set to the library's default values. It is hard to analyze the joint decision made by 2,000 trees; therefore, the explainability of the Random Forest is low, although its performance is high.
After the training step we take the first sample in the test set, $x_1$, and apply the process described in Section III-B in order to create a noise set around $x_1$. Then, the GP algorithm is trained with the noise set, $\eta = \{s_1, \ldots, s_m\}$, where the targets are given by $g(s_i), \forall s_i \in \eta$, and $g$ is the Random Forest Regressor model.
Although the Boston data set has 13 features, Figure 6 shows that only two of them were chosen by the GP after the evolutionary process:
• PTRATIO: pupil-teacher ratio by town
• NOX: nitric oxides concentration (parts per 10 million)
In (5), PTRATIO is denoted $x_{ptratio}$ and NOX is denoted $x_{nox}$. The tree structure in Figure 6 represents the following equation:
$$f^*(s) = \frac{x_{ptratio}^2\, x_{nox}}{28.390} \tag{5}$$
Figure 7 presents the result of the same process applied in a different region, around sample $x_2$. It is possible
Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on September 27,2021 at 12:41:14 UTC from IEEE Xplore. Restrictions apply.
" #T
∗ xptratio xnox x2ptratio
∇f (s) = , (7)
14.195 28.390
By analyzing equations (5) and (7) it is easy to understand
how and by how much each feature affects housing prices.
The Random Forest Regressor has also been applied to the
Diabetes data set. This data set consists of 402 samples and
10 features. In this case, the target is a quantitative measure of
disease progression. The GPX was set with the same hyper-
parameters used for the Boston data set. Figure 8 presents the
result for the Diabetes data set.
Fig. 6. GP algorithm output for the Random Forest Regressor prediction
applied to instance x1 in Boston data set.

to observe that different features were chosen during the


evolutionary process. The features chosen for the solution
shown in Figure 7 were:
• PTRATIO: pupil-teacher ratio by town
• INDUS proportion of non-retail business acres per town
centres
• LSTAT lower status of the population

Fig. 8. GP algorithm output for the Random Forest Regressor prediction


applied to instance xd1 in Diabetes data set.

The evolutionary process chose a final tree with two fea-


tures:
• bmi: Body mass index
Fig. 7. GP algorithm output for the Random Forest Regressor prediction • S6 : blood serum measurements
applied to instance x2 in Boston data set.
The tree structure in Figure 8 represents the following
The expression represented in Figure 7 can be defined as: expression:

xindus −67.934xS6 + xbmi + 158.525 (8)


f ∗ (s) = + xptratio (6)
xlstat
where xbmi is bmi and xS6 is S6 .
Based on the equations (5) and (6) we can understand This expression shows for instance that decreasing body
the behavior around x1 and x2 . First of all, it is possible mass index is a relevant feature to decrease diabetes progres-
to observe that the feature PTRATIO (pupil-teacher ratio by sion for that specific patient.
town) is relevant to define home prices in the neighborhood
of both instances (x1 , x2 ). However, in the neighborhood of VII. C ONCLUSION
x2 , PTRATIO has more influence. This paper presented an approach to the interpretability
These results can help a decision-maker to understand which problem based on Genetic Programming. We discussed several
features contribute the most to the increase in the price in a concepts of interpretability and why this subject is relevant
neighborhood. Moreover, it is possible to know which features nowadays.
changed in order to increase or decrease home prices. In this The GP algorithm used was able to produce a non-linear
regard, it is useful to compute the gradient of the model algebraic expression as output, which in turn provides many
output with respect to the selected input parameters. Lets take opportunities for the interpretability of the more sophisticated
equation (5) as an example. Its gradient is given by: ML algorithms. It naturally selects the most important features

to build a local explanation around a given sample. In addition, the resulting analytic expression is easy to differentiate, which gives the sensitivity of the output with respect to each feature.
Examples using the classic Boston and Diabetes data sets show that the proposed approach can be seen as a source of interpretability and can help the decision-maker better understand how the complex model makes a decision.
We submitted the explainers GPX, LIME, and Decision Tree to a stress test in order to measure which one can best reproduce the black-box model locally. The statistical analysis showed that GPX, besides bringing a new approach to interpretability, presented better or at least similar results when compared with the state of the art.

REFERENCES
[1] Z.-H. Zhou, Y. Jiang, Y.-B. Yang, and S.-F. Chen, “Lung cancer cell identification based on artificial neural network ensembles,” Artificial Intelligence in Medicine, vol. 24, no. 1, pp. 25–36, 2002. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S093336570100094X
[2] S. Chakraborty, R. Tomsett, R. Raghavendra, D. Harborne, M. Alzantot, F. Cerutti, M. Srivastava, A. Preece, S. Julier, R. M. Rao, T. D. Kelley, D. Braines, M. Sensoy, C. J. Willis, and P. Gurram, “Interpretability of deep learning models: A survey of results,” pp. 1–6, Aug. 2017.
[3] J. Zhu, A. Liapis, S. Risi, R. Bidarra, and G. M. Youngblood, “Explainable AI for designers: A human-centered perspective on mixed-initiative co-creation,” in 2018 IEEE Conference on Computational Intelligence and Games (CIG), 2018, pp. 1–8.
[4] M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, pp. 1135–1144. [Online]. Available: https://doi.org/10.1145/2939672.2939778
[5] E. Tjoa and C. Guan, “A survey on explainable artificial intelligence (XAI): Towards medical XAI,” Oct. 2019.
[6] P. Hall and N. Gill, An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI. O’Reilly Media, 2018.
[7] C. Molnar, Interpretable Machine Learning. A Guide for Making Black Box Models Explainable, 2019, https://christophm.github.io/interpretable-ml-book/.
[8] C. Tankard, “What the GDPR means for businesses,” Network Security, vol. 2016, no. 6, pp. 5–8, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1353485816300563
[9] J. P. Albrecht, “How the GDPR will change the world,” European Data Protection Law Review, vol. 2, no. 3, 2016. [Online]. Available: https://doi.org/10.21552/EDPL/2016/3/4
[10] B. P. Evans, B. Xue, and M. Zhang, “What’s inside the black-box? A genetic programming method for interpreting complex machine learning models,” pp. 1012–1020, 2019. [Online]. Available: https://doi.org/10.1145/3321707.3321726
[11] B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation’,” AI Magazine, vol. 38, pp. 50–57, 2017.
[12] Z. Che, S. Purushotham, R. G. Khemani, and Y. Liu, “Interpretable deep models for ICU outcome prediction,” AMIA ... Annual Symposium proceedings. AMIA Symposium, vol. 2016, pp. 371–380, 2016.
[13] S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” pp. 4765–4774, 2017. [Online]. Available: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
[14] A. B. Arrieta, N. Díaz-Rodríguez, J. D. Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI,” Information Fusion, vol. 58, pp. 82–115, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1566253519308103
[15] F. Livingston, “Implementation of Breiman’s random forest machine learning algorithm,” ECE591Q Machine Learning Journal Paper, pp. 1–13, 2005.
[16] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. USA: Prentice Hall PTR, 1998.
[17] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA, USA: MIT Press, 1992.
[18] A. Gaspar-Cunha, R. Takahashi, and C. Antunes, Manual de computação evolutiva e metaheurística, ser. Ensino. Imprensa da Universidade de Coimbra / Coimbra University Press, 2012. [Online]. Available: https://books.google.com.br/books?id=9Di5CwAAQBAJ
[19] J. W. Smith, J. Everhart, W. Dickson, W. Knowler, and R. Johannes, “Using the ADAP learning algorithm to forecast the onset of diabetes mellitus,” Proceedings. Symposium on Computer Applications in Medical Care, pp. 261–265, Nov. 1988. [Online]. Available: https://europepmc.org/articles/PMC2245318
[20] M. Buscema, S. Terzi, and W. Tastle, “A new meta-classifier,” in 2010 Annual Meeting of the North American Fuzzy Information Processing Society, 2010, pp. 1–7.
[21] S. B. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. D. Jong, S. Dzeroski, S. E. Fahlman, D. Fisher, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R. Michalski, T. Mitchell, P. Pachowicz, Y. Reich, H. Vafaie, W. V. D. Welde, W. Wenzel, J. Wnek, and J. Zhang, “The MONK’s problems: A performance comparison of different learning algorithms,” Tech. Rep., 1991.
[22] J.-L. Voz, M. Verleysen, P. Thissen, and J.-D. Legat, “A practical view of suboptimal Bayesian classification with radial Gaussian kernels,” in Proceedings of the International Workshop on Artificial Neural Networks: From Natural to Artificial Neural Computation, ser. IWANN ’96. Berlin, Heidelberg: Springer-Verlag, 1995, pp. 404–411.
[23] I.-C. Yeh, K.-J. Yang, and T.-M. Ting, “Knowledge discovery on RFM model using Bernoulli sequence,” Expert Systems with Applications, vol. 36, no. 3, Part 2, pp. 5866–5871, 2009. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0957417408004508
[24] K. Zhang and W. Fan, “Forecasting skewed biased stochastic ozone days: Analyses, solutions and beyond,” Knowl. Inf. Syst., vol. 14, no. 3, pp. 299–326, Mar. 2008. [Online]. Available: https://doi.org/10.1007/s10115-007-0095-1
[25] D. Dua and C. Graff, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[26] B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu, “A critical comparative study of liver patients from USA and India: An exploratory analysis,” 2012.
[27] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. VanderPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine learning software: Experiences from the scikit-learn project,” in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 2013, pp. 108–122.
[28] P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, “Modeling wine preferences by data mining from physicochemical properties,” Decis. Support Syst., vol. 47, no. 4, pp. 547–553, 2009. [Online]. Available: http://dblp.uni-trier.de/db/journals/dss/dss47.html#CortezCAMR09
[29] R. Ballester-Ripoll, E. G. Paredes, and R. Pajarola, “Sobol tensor trains for global sensitivity analysis,” Reliability Engineering & System Safety, vol. 183, pp. 311–322, 2019.
[30] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[31] I. G. Goodfellow, Y. Bengio, and A. C. Courville, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
[32] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152.
[33] H. Drucker, C. J. Burges, L. Kaufman, A. J. Smola, and V. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems, 1997, pp. 155–161.
[34] A. N. Tikhonov, “Solution of incorrectly formulated problems and the regularization method,” Soviet Math. Dokl., vol. 4, pp. 1035–1038, 1963.

