TED: Teaching AI to Explain its Decisions

Noel C. F. Codella,∗ Michael Hind,∗ Karthikeyan Natesan Ramamurthy,∗


Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei, Aleksandra Mojsilović
IBM Research AI

∗ These authors contributed equally.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples.

1 Introduction

Machine learning based systems have proven to be quite effective for producing highly accurate results in several domains. This effectiveness is leading to wider adoption in higher stakes domains, which has the potential to lead to more accurate, consistent, and fairer decisions and the resulting societal benefits. However, given the higher stakes of these domains, there is a growing demand that these systems provide explanations for their decisions, so that necessary oversight can occur and a citizen’s due process rights are respected (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Vacca 2018; Campolo, Whittaker, and Crawford 2017; Kim 2017; Doshi-Velez et al. 2017; Wachter, Mittelstadt, and Floridi 2017a; Caruana et al. 2015; Varshney 2016).

The demand for explanation has manifested itself in new regulations that call for automated decision making systems to provide “meaningful information” on the logic used to reach conclusions (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Selbst and Powles 2017). Selbst and Powles (2017) interpret the concept of “meaningful information” as information that should be understandable to the audience (potentially individuals who lack specific expertise), is actionable, and is flexible enough to support various technical approaches.

Unfortunately, the advance in effectiveness of machine-learning techniques has coincided with increased complexity in the inner workings of these techniques. For some techniques, like deep neural networks or large random forests, even experts cannot explain how decisions are reached. Thus, we have a stronger need for explainable AI just when we have a greater gap in achieving it.

This has sparked a growing research community focused on this problem (Kim et al. 2017; Kim, Varshney, and Weller 2018). Most of this research attempts to explain the inner workings of a machine learning model either directly, indirectly via a simpler proxy model, or by probing the model with related inputs. This paper proposes a different approach that requires a model to jointly produce both a decision and an explanation, rather than exposing the inner details of how the model produces a decision. The explanation is not constrained to any particular format and can vary to accommodate user needs.

The main contributions of this work are as follows:

• A description of the challenges in providing meaningful explanations for machine learning systems.

• A new framework, called TED, that enables machine learning algorithms to provide meaningful explanations that match the complexity and domain of consumers.

• A simple instantiation of the framework that demonstrates the generality and simplicity of the approach.

• Two illustrative examples and results that demonstrate the effectiveness of the instantiation in providing meaningful explanations.

• A discussion of several possible extensions and open research opportunities that this framework enables.

The rest of this paper is organized as follows. Section 2 explores the challenges presented by the problem statement of providing explanations for AI decisions. Section 3 discusses related work. Section 4 describes our general approach, TED, and a simple instantiation for providing explanations that are understandable by the consumer, and discusses the advantages of the approach. Section 5 presents results from two examples that demonstrate the effectiveness of the simple instantiation. Section 6 discusses future directions and open issues for the TED approach. Section 7 draws conclusions.
2 Challenges to Providing AI Explanations

This section explores the challenges in providing meaningful explanations, which provide the motivation for the TED framework.

The concept of an explanation is probably as old as human communication. Intuitively, an explanation is the communication from one person (A) to another (B) that provides justification for an action or decision made by person A. Mathematicians use proofs to formally provide explanations. These are constructed using agreed-upon logic and formalism, so that any person trained in the field can verify if the proof/explanation is valid. Unfortunately, we do not have such formalism for non-mathematical explanations. Even in the judicial system, we utilize nonexpert jurors to determine if a defendant has violated a law, relying on their intuition and experience in weighing (informal) arguments made by prosecution and defense.

Since we do not have a satisfying formal definition for valid human-to-human explanations, developing one for system-to-human explanations is challenging (Kim 2017; Lipton 2016). Motivated by the concept of meaningful information (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Selbst and Powles 2017), we feel that explanations must have the following three characteristics:

Justification: An explanation needs to provide justification for a decision that increases trust in the decision. This often includes some information that can be verified by the consumer.

Complexity Match: The complexity of the explanation needs to match the complexity capability of the consumer (Kulesza et al. 2013; Dhurandhar et al. 2017). For example, an explanation in equation form may be appropriate for a statistician, but not for a nontechnical person (Miller, Howe, and Sonenberg 2017).

Domain Match: An explanation needs to be tailored to the domain, incorporating the relevant terms of the domain. For example, an explanation for a medical diagnosis needs to use terms relevant to the physician (or patient) who will be consuming the prediction.

There are at least four distinct groups of people who are interested in explanations for an AI system, with varying motivations.

Group 1: End User Decision Makers: These are the people who use the recommendations of an AI system to make a decision, such as physicians, loan officers, managers, judges, social workers, etc. They desire explanations that can build their trust and confidence in the system’s recommendations and possibly provide them with additional insight to improve their future decisions and understanding of the phenomenon.

Group 2: Affected Users: These are the people impacted by the recommendations made by an AI system, such as patients, loan applicants, employees, arrested individuals, at-risk children, etc. They desire explanations that can help them understand if they were treated fairly and what factor(s) could be changed to get a different result (Doshi-Velez et al. 2017).

Group 3: Regulatory Bodies: Government agencies, charged to protect the rights of their citizens, want to ensure that decisions are made in a safe and fair manner, and that society is not negatively impacted by the decisions.

Group 4: AI System Builders: Technical individuals (data scientists and developers) who build or deploy an AI system want to know if their system is working as expected, how to diagnose and improve it, and possibly gain insight from its decisions.

Understanding the motivations and expectations behind each group’s needs for an explanation will help to ensure a solution that satisfies these expectations. For example, Group 4 is likely to desire a more complex explanation of the system’s inner workings to take action. Group 3’s needs may be satisfied by showing that the overall process, including the training data, is fair and free of negative societal impact; they may not be able to consume the same level of complexity as Group 4. Group 1 will have a high need for domain sophistication, but will also have less tolerance for complex explanations. Finally, Group 2 will have the lowest threshold for both complexity and domain information. These are affected users, such as loan applicants, who need to have the reasons for their outcomes, such as loan denials, explained in a simple manner without industry terms or complex formulas.

In summary, outside of a logical proof, there is no clear definition of a valid explanation; it seems to be subjective to the consumer and circumstances. Furthermore, there is a wide diversity of potential consumers of explanations, with different needs, different levels of sophistication, and different levels of domain knowledge. This seems to make it impossible to produce a single meaningful explanation without any information about the consumer.

3 Related Work

Prior work in providing explanations can be partitioned into three areas:

1. Making existing or enhanced models interpretable, i.e., providing a precise description of how the model determined its decision (e.g., Ribeiro, Singh, and Guestrin 2016; Montavon, Samek, and Müller 2017; Lundberg and Lee 2017).

2. Creating a second, simpler-to-understand model, such as a small number of logical expressions, that mostly matches the decisions of the deployed model (e.g., Bastani, Kim, and Bastani 2018; Caruana et al. 2015).

3. Work in the natural language processing and computer vision domains that generates rationales/explanations derived from input text (e.g., Lei, Barzilay, and Jaakkola 2016; Ainur, Choi, and Cardie 2010; Hendricks et al. 2016).
The first two groups attempt to precisely describe how a machine learning decision was made, which is particularly relevant for AI system builders (Group 4). The insight into the inner workings of a model can be used to improve the AI system and may serve as the seeds for an explanation to a non-AI expert. However, work still remains to determine if these seeds are sufficient to satisfy the needs of the diverse collection of non-AI experts (Groups 1–3). Furthermore, when the underlying features are not human comprehensible, these approaches are inadequate for providing human consumable explanations.

The third group seeks to generate textual explanations with predictions. For text classification, this involves selecting the minimal necessary content from a text body that is sufficient to trigger the classification. For computer vision (Hendricks et al. 2016), this involves utilizing textual captions in training to automatically generate new textual captions of images that are both descriptive as well as discriminative. Although promising, it is not clear how these techniques generalize to other domains and if the explanations will be meaningful to the variety of explanation consumers described in Section 2.

Doshi-Velez et al. (2017) discuss the societal, moral, and legal expectations of AI explanations, provide guidelines for the content of an explanation, and recommend that explanations of AI systems be held to a similar standard as humans. Our approach is compatible with their view. Biran and Cotton (2017) provide an excellent overview and taxonomy of explanations and justifications in machine learning.

Miller (2017) and Miller, Howe, and Sonenberg (2017) argue that explainable AI solutions need to meet the needs of the users, an area that has been well studied in philosophy, psychology, and cognitive science. They provide a brief survey of the work in these fields most relevant to the area of explainable AI. They, along with Doshi-Velez and Kim (2017), call for more rigor in this area.

4 Teaching Explanations

Given the challenges to developing meaningful explanations for the diversity of consumers described in Section 2, we advocate a non-traditional approach. We suggest a high-level framework, with one simple instantiation, that we see as a promising complementary approach to the traditional “inside-out” approach to providing explanations.

To understand the motivation for the TED approach, consider the common situation when a new employee is being trained for their new job, such as a loan approval officer. The supervisor will show the new employee several example situations, such as loan applications, and teach them the correct action: approve or reject, and explain the reason for the action, such as “insufficient salary”. Over time, the new employee will be able to make independent decisions on new loan applications and will give explanations based on the explanations they learned from their supervisor. This is analogous to how the TED framework works. We ask the training dataset to teach us not only how to get to the correct answer (approve or reject), but also to provide the correct explanation, such as “insufficient salary”, “too much existing debt”, “insufficient job stability”, or “incomplete application”. From this training information, we will generate a model that, for new input, will predict answers and provide explanations based on the explanations it was trained on. Because these explanations are the ones that were provided by the training, they are relevant to the target domain and meet the complexity level of the explanation consumer.

Previous researchers have demonstrated that providing explanations with the training dataset may not add much of a burden to the training time and may improve the overall accuracy of the training data (Zaidan and Eisner 2007; 2008; Zhang, Marshall, and Wallace 2016; McDonnell et al. 2016).

4.1 TED Framework and a Simple Instantiation

The TED framework leverages existing machine learning technology in a straightforward way to generate a classifier that produces explanations along with classifications. To review, a supervised machine learning algorithm takes a training dataset that consists of a series of instances with the following two components:

X, a set of features (feature vector) for the particular entity, such as an image, a paragraph, or a loan application.

Y, a label/decision/classification for each feature vector, such as an image description, a paragraph summary, or a loan-approval decision.

The TED framework requires a third component:

E, an explanation for each decision, Y, which can take any form, such as a number, a text string, an image, or a video file. Unlike traditional approaches, E does not necessarily need to be expressed in terms of X. It could be some other high-level concept specific to the domain that applies to some domain-specific combination of X, such as “scary image” or “loan information is not trustworthy”. Regardless of the format, we represent each unique value of E with an identifier.

The TED framework takes this augmented training set and produces a classifier that predicts both Y and E. There are several ways that this can be accomplished. The instantiation we explore in this work is a simple Cartesian product approach. This approach encodes Y and E into a new classification, called Y E, which, along with the feature vector, X, is provided as the training input to any machine learning classification algorithm to produce a classifier that predicts Y E's. After the model produced by the classification algorithm makes a prediction, we apply a decoding step to partition a Y E prediction into its components, Y and E, to return to the consumer. Figure 1 illustrates the algorithm. The boxes in dashed lines are new TED components that encode Y and E into Y E and decode a predicted Y E into its individual components, Y and E. The solid boxes represent 1) any machine learning algorithm that takes a normal training dataset (features and labels) and 2) the resulting model produced by this algorithm.

Figure 1: Overview of TED Algorithm
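The encode/decode pipeline of Figure 1 is described only in prose, so the following is a minimal sketch, assuming a scikit-learn-style base classifier, of how the Cartesian product instantiation could be implemented. The class name TEDCartesian and its interface are illustrative choices of ours, not an API from the paper.

```python
from sklearn.base import BaseEstimator, ClassifierMixin


class TEDCartesian(BaseEstimator, ClassifierMixin):
    """Wrap any scikit-learn-style classifier so that it is trained on the
    combined class Y E and returns (Y, E) pairs at prediction time."""

    def __init__(self, base_classifier):
        self.base_classifier = base_classifier

    def fit(self, X, Y, E):
        # Encoder: map every distinct (label, explanation) pair to one class id.
        self.pair_to_ye_ = {}
        self.ye_to_pair_ = []
        YE = []
        for pair in zip(Y, E):
            if pair not in self.pair_to_ye_:
                self.pair_to_ye_[pair] = len(self.ye_to_pair_)
                self.ye_to_pair_.append(pair)
            YE.append(self.pair_to_ye_[pair])
        # Any supervised classification algorithm can be trained on (X, YE)
        # unchanged; the combined classes are just opaque identifiers to it.
        self.base_classifier.fit(X, YE)
        return self

    def predict(self, X):
        # Decoder: split each predicted Y E class back into its Y and E parts.
        ye_pred = self.base_classifier.predict(X)
        pairs = [self.ye_to_pair_[ye] for ye in ye_pred]
        return [y for y, _ in pairs], [e for _, e in pairs]
```

Because the combined classes are opaque identifiers, the underlying learning algorithm needs no modification; this is the sense in which the instantiation only adds pre- and post-processing components around an existing classifier.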
4.2 Example

Let's assume we are training a system to recommend cancer treatments. A typical training set for such a system would be of the following form, where Pi is the feature vector representing patient i and Tj represents various treatment recommendations:

(P1, TA), (P2, TA), (P3, TA), (P4, TA)
(P5, TB), (P6, TB), (P7, TB), (P8, TC)

The TED approach would require adding an additional explanation component to the training dataset as follows:

(P1, TA, E1), (P2, TA, E1), (P3, TA, E2), (P4, TA, E2)
(P5, TB, E3), (P6, TB, E3), (P7, TB, E4), (P8, TC, E5)

Each Ei would be an explanation to justify why a feature vector representing a patient would map to a particular treatment. Some treatments could be recommended for multiple reasons/explanations. For example, treatment TA is recommended for two different reasons, E1 and E2, but treatment TC is only recommended for reason E5.

Given this augmented training data, the Cartesian product instantiation of the TED framework transforms each triple into a form that any supervised machine learning algorithm can use, namely (feature, class), by combining the second and third components into a unique new class as follows:

(P1, TA E1), (P2, TA E1), (P3, TA E2), (P4, TA E2)
(P5, TB E3), (P6, TB E3), (P7, TB E4), (P8, TC E5)

Figure 2 shows how the training dataset would change using the TED approach for the above example. The left picture illustrates how the original 8 training instances in the example are mapped into the 3 classes. The right picture shows how the training data is changed, with explanations added. Namely, Class A was decomposed into Classes A1 and A2, Class B was transformed into Classes B3 and B4, and Class C became C5.

Figure 2: Illustration of Changes to Training Dataset

As Figure 2 illustrates, adding explanations to training data implicitly creates a 2-level hierarchy in that the transformed classes are members of the original classes, e.g., Classes A1 and A2 are a decomposition of the original Class A. This hierarchical property could be exploited by employing hierarchical classification algorithms when training to improve accuracy.
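To make the transformation concrete, the hypothetical TEDCartesian wrapper sketched after Section 4.1 can be applied to the eight training triples above. The patient feature vectors below are random placeholders, since the paper does not specify them, and the choice of a random forest as the base classifier is ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder feature vectors for patients P1..P8 (not given in the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

# Treatments and explanations from the example above.
Y = ["TA", "TA", "TA", "TA", "TB", "TB", "TB", "TC"]
E = ["E1", "E1", "E2", "E2", "E3", "E3", "E4", "E5"]

# Internally the wrapper trains on five combined classes:
# (TA, E1), (TA, E2), (TB, E3), (TB, E4), (TC, E5).
ted = TEDCartesian(RandomForestClassifier(n_estimators=100, random_state=0))
ted.fit(X, Y, E)

Y_pred, E_pred = ted.predict(X)

# Explanation accuracy can be measured exactly like label accuracy (Section 5).
print("Y accuracy:", np.mean(np.array(Y_pred) == np.array(Y)))
print("E accuracy:", np.mean(np.array(E_pred) == np.array(E)))
```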
4.3 Advantages

Although this approach is simple, there are several nonobvious advantages that are particularly important in addressing the requirements of explainable AI for Groups 1, 2, and 3 discussed in Section 2.

Complexity/Domain Match: Explanations provided by the algorithm are guaranteed to match the complexity and mental model of the domain, given that they are created by the domain expert who is training the system.

Dealing with Incomprehensible Features: Since the explanation format can be of any type, explanations are not limited to being a function of the input features, which is useful when the features are not comprehensible.

Accuracy: Explanations will be accurate if the training data explanations are accurate and representative of production data.

Generality: This approach is independent of the machine learning classification algorithm; it can work with any supervised classification algorithm, including neural networks, making this technique widely deployable.

Preserves Intellectual Property: There is no need to expose details of the machine learning algorithm to the consumer. Thus, proprietary technology can remain protected by its owners.

Easy to Incorporate: The Cartesian product approach does not require a change to the current machine learning algorithm, just the addition of pre- and post-processing components: an encoder and a decoder. Thus, an enterprise does not need to adopt a new machine learning algorithm just to get explanations.

Educates Consumer: The process of providing good training explanations will help properly set expectations for what kind of explanations the system can realistically provide. For example, it is probably easier to explain in the training data why a particular loan application is denied than to explain why a particular photo is a cat. Setting customer expectations correctly for what AI systems can (currently) do is important to their satisfaction with the system.

Improved Auditability: After creating a TED dataset, the domain expert will have enumerated all possible explanations for a decision. (The TED system does not create any new explanations.) This enumeration can be useful for the consumer's auditability, i.e., to answer questions such as “What are the reasons why you will deny a loan?” or “What are the situations in which you will prescribe medical treatment X?”

May Reduce Bias: Providing explanations will increase the likelihood of detecting bias in the training data because 1) a biased decision will likely be harder for the explanation producer to justify, and 2) one would expect that training instances with the same explanations cluster close to each other in the feature space. Anomalies from this property could signal a bias or a need for more training data.

5 Evaluation

To evaluate the ideas presented in this work, we focus on two fundamental questions:

1. How useful are the explanations produced by the TED approach?

2. How is the prediction accuracy impacted by incorporating explanations into the training dataset?

Since the TED framework has many instantiations, can be incorporated into many kinds of learning algorithms, can be tested against many datasets, and can be used in many different situations, a definitive answer to these questions is beyond the scope of this paper. Instead, we try to address these two questions using the simple Cartesian product instantiation with two different machine learning algorithms (neural nets and random forests) on two use cases, to show that there is justification for further study of this approach.

Determining if any approach provides useful explanations is a challenge, and no consensus metric has yet emerged (Doshi-Velez et al. 2017). However, since the TED approach requires explanations to be provided for the target dataset (training and testing), one can evaluate the accuracy of a model's explanation (E) in the same way that one evaluates the accuracy of a predicted label (Y).

The TED approach requires a training set that contains explanations. Since such datasets are not yet readily available, we evaluate the approach on two synthetic datasets described below: tic-tac-toe and loan repayment.

5.1 Tic-Tac-Toe

The tic-tac-toe example tries to predict the best move given a particular board configuration. A tic-tac-toe board is represented by two 3 × 3 binary feature planes, indicating the presence of X and O, respectively. An additional binary feature indicates the side to move, resulting in a total of 19 binary input features. Each legal non-terminal board position (4,520 in total) is labeled with a preferred move, along with the reason the move is preferred. The labeling is based on a simple set of rules that are executed sequentially (these rules do not guarantee optimal play):

1. If a winning move is available, completing three in a row for the side to move, choose that move with reason Win.

2. If a blocking move is available, preventing the opponent from completing three in a row on their next turn, choose that move with reason Block.

3. If a threatening move is available, creating two in a row with an empty third square in the row, choose that move with reason Threat.

4. Otherwise, choose an empty square, preferring center over corners over middles, with reason Empty.

Two versions of the dataset were created, one with only the preferred move (represented as a 3 × 3 plane) and the second with the preferred move and explanation (represented as a 3 × 3 × 4 stack of planes). A simple neural network classifier was built on each of these datasets, with one hidden layer of 200 units using ReLU and a softmax over the 9 (or 36) outputs. We use a 90%/10% split of the legal non-terminal board positions for the training/testing datasets. This classifier obtained an accuracy of 96.5% on the baseline move-only prediction task, i.e., when trained with just X (the 19 features) and Y it was highly accurate.

To answer the first question, does the approach provide useful explanations, we calculated the accuracy of the predicted explanation. Although there are only 4 rules, each rule applies to 9 different preferred moves, resulting in 36 possible explanations. Our classifier was able to generate the correct explanation 96.3% of the time, i.e., very rarely did it get the correct move but not the correct rule.

The second question asks how the accuracy of the classifier is impacted by the addition of E's in the training dataset. Given the increase in the number of classes, one might expect the accuracy to decrease. However, for this example, the accuracy of predicting the preferred move actually increases to 97.4%. This illustrates that the approach works well in this domain; it is possible to provide accurate explanations without impacting the Y prediction accuracy. Table 1 summarizes the results for both examples.

Table 1: Accuracy (%) for predicting Y and E in Tic-Tac-Toe and Loan Repayment

Training Input    Tic-Tac-Toe Y    Tic-Tac-Toe E    Loan Repayment Y    Loan Repayment E
X, Y              96.5             NA               99.2 (0.2)          NA
X, Y, and E       97.4             96.3             99.6 (0.1)          99.4 (0.1)
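The four sequential labeling rules in Section 5.1 are simple enough to implement directly. The sketch below is our reconstruction of how the labeled dataset could be generated, not the authors' code; in particular, tie-breaking among equally valid squares (e.g., which of the two empty squares of a threat line to play) is an assumption.

```python
import numpy as np

# All eight winning lines as lists of (row, col) squares.
LINES = (
    [[(r, 0), (r, 1), (r, 2)] for r in range(3)]
    + [[(0, c), (1, c), (2, c)] for c in range(3)]
    + [[(0, 0), (1, 1), (2, 2)], [(0, 2), (1, 1), (2, 0)]]
)


def label_position(own, opp):
    """Return (move, reason) for the side to move on a non-terminal board.

    own and opp are 3x3 0/1 arrays holding the mover's and the opponent's
    pieces; the reasons Win, Block, Threat, and Empty follow the four rules.
    """
    def empties(line):
        return [(r, c) for r, c in line if own[r, c] == 0 and opp[r, c] == 0]

    for line in LINES:                                   # Rule 1: Win
        if sum(own[r, c] for r, c in line) == 2 and len(empties(line)) == 1:
            return empties(line)[0], "Win"
    for line in LINES:                                   # Rule 2: Block
        if sum(opp[r, c] for r, c in line) == 2 and len(empties(line)) == 1:
            return empties(line)[0], "Block"
    for line in LINES:                                   # Rule 3: Threat
        if sum(own[r, c] for r, c in line) == 1 and len(empties(line)) == 2:
            return empties(line)[0], "Threat"
    # Rule 4: Empty -- prefer the center, then corners, then middles (edges).
    order = [(1, 1), (0, 0), (0, 2), (2, 0), (2, 2), (0, 1), (1, 0), (1, 2), (2, 1)]
    for r, c in order:
        if own[r, c] == 0 and opp[r, c] == 0:
            return (r, c), "Empty"
    raise ValueError("terminal position: no empty squares")


def encode_features(x_plane, o_plane, x_to_move):
    # Two 3x3 binary planes plus a side-to-move bit: the 19 binary features.
    return np.concatenate([x_plane.ravel(), o_plane.ravel(), [int(x_to_move)]])


# With all 4,520 legal non-terminal positions enumerated, training could reuse
# the TEDCartesian sketch from Section 4.1 (hypothetical, not the paper's code):
#   from sklearn.neural_network import MLPClassifier
#   ted = TEDCartesian(MLPClassifier(hidden_layer_sizes=(200,), activation="relu"))
#   ted.fit(features, moves, reasons)   # 36 combined (move, reason) classes
```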
5.2 Loan Repayment

The second example is closer to an industry use case and is based on the FICO Explainable Machine Learning Challenge dataset (FICO 2018). The dataset contains around 10,000 applications for a Home Equity Line of Credit (HELOC), with the binary Y label indicating payment performance (any 90-day or longer delinquent payments) over 2 years.

Since the dataset does not come with explanations (E; the challenge asks participants to provide explanations along with predictions, which will be judged by the organizers), we generated them by training a rule set on the training data, resulting in the following two 3-literal rules for the “good” class Y = 1 (see (FICO 2018) for a data dictionary):

1. NumSatisfactoryTrades ≥ 23 AND ExternalRiskEstimate ≥ 70 AND NetFractionRevolvingBurden ≤ 63;

2. NumSatisfactoryTrades ≤ 22 AND ExternalRiskEstimate ≥ 76 AND NetFractionRevolvingBurden ≤ 78.

These two rules, from researchers at IBM Research, predict Y with 72% accuracy and were the winning entry to the challenge. Since the TED approach requires 100% consistency between explanations and labels, we modified the Y labels in instances where they disagree with the rules. We then assigned the explanation E to one of 8 values: 2 for the good class, corresponding to which of the two rules is satisfied (they are mutually exclusive), and 6 for delinquent, corresponding first to which of the rules should apply based on NumSatisfactoryTrades, and then to which of the remaining conditions (ExternalRiskEstimate, NetFractionRevolvingBurden, or both) are violated.
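The 8-valued explanation scheme is described only in prose. The sketch below is one plausible reading of it under the stated rules; the feature names follow the FICO data dictionary cited above, but the string encoding of the six delinquent explanations and the helper names are our assumptions, not the authors' code.

```python
def assign_label_and_explanation(num_satisfactory_trades,
                                 external_risk_estimate,
                                 net_fraction_revolving_burden):
    """Return (Y, E) implied by the two 3-literal rules of Section 5.2.

    Y = 1 ("good") when the applicable rule is fully satisfied, else Y = 0.
    E takes one of 8 values: R1 or R2 for the good class, and, for the
    delinquent class, the applicable rule (chosen by NumSatisfactoryTrades)
    plus which of its two remaining conditions are violated.
    """
    if num_satisfactory_trades >= 23:
        rule = "R1"
        risk_ok = external_risk_estimate >= 70
        burden_ok = net_fraction_revolving_burden <= 63
    else:
        rule = "R2"
        risk_ok = external_risk_estimate >= 76
        burden_ok = net_fraction_revolving_burden <= 78

    if risk_ok and burden_ok:
        return 1, rule
    if not risk_ok and not burden_ok:
        return 0, rule + "-both-violated"
    if not risk_ok:
        return 0, rule + "-ExternalRiskEstimate-violated"
    return 0, rule + "-NetFractionRevolvingBurden-violated"


def y_from_e(e):
    # Every explanation value determines the label, so Y can be decoded from a
    # predicted E alone; this is the mapping referred to in the next paragraph.
    return 1 if e in ("R1", "R2") else 0


# Training then mirrors the setup reported below (hypothetical wrapper reuse):
#   from sklearn.ensemble import RandomForestClassifier
#   ted = TEDCartesian(RandomForestClassifier(n_estimators=100, min_samples_leaf=5))
#   ted.fit(X, Y, E)
```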
We trained a Random Forest classifier (100 trees, minimum 5 samples per leaf) first on the dataset with just X and (modified) Y, and then on the enhanced dataset with E added. The accuracy of the baseline classifier (predicting the binary label Y) was 99.2%. The accuracy of TED in predicting explanations E was 99.4%, despite the larger class cardinality of 8. In this example, Y predictions can be derived from E predictions through the mapping mentioned above, and doing so resulted in an improved Y accuracy of 99.6%.

While these accuracies may be artificially high due to the data generation method, they do show two things, as in Section 5.1: (1) to the extent that user explanations follow simple logic, very high explanation accuracy can be achieved; (2) accuracy in predicting Y not only does not suffer but actually improves. The second result has been observed by other researchers who have suggested adding “rationales” to improve classifier performance, but not for explainability (Sun and DeJong 2005; Zaidan and Eisner 2007; 2008; Zhang, Marshall, and Wallace 2016; McDonnell et al. 2016; Donahue and Grauman 2011; Duan et al. 2012; Peng et al. 2016).

6 Extensions and Open Questions

The TED framework assumes a training dataset with explanations and uses it to train a classifier that can predict Y and E. This work described a simple way to do this, by taking the Cartesian product of Y and E and using any suitable machine learning algorithm to train a classifier. Another instantiation would be to bring together the labels and explanations in a multitask setting. Yet another option is to learn feature embeddings using label and explanation similarities in a joint and aligned way, to permit neighbor-based explanation prediction.
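The multitask instantiation is mentioned only in passing, so the following is a rough sketch, under our own assumptions, of what it could look like: a shared trunk with separate output heads for Y and E, trained with a weighted sum of two cross-entropy losses. This is not a design given by the authors.

```python
import torch
import torch.nn as nn


class MultiTaskTED(nn.Module):
    """Shared trunk with separate classification heads for Y and E."""

    def __init__(self, n_features, n_labels, n_explanations, hidden=200):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.y_head = nn.Linear(hidden, n_labels)
        self.e_head = nn.Linear(hidden, n_explanations)

    def forward(self, x):
        h = self.trunk(x)
        return self.y_head(h), self.e_head(h)


def multitask_loss(y_logits, e_logits, y_true, e_true, alpha=0.5):
    # Weighted sum of the two cross-entropy losses; alpha trades Y against E.
    ce = nn.functional.cross_entropy
    return alpha * ce(y_logits, y_true) + (1 - alpha) * ce(e_logits, e_true)
```

Compared with the Cartesian product encoding, this keeps the output spaces for Y and E separate, at the cost of modifying the learning setup rather than only adding pre- and post-processing steps.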
Under the Cartesian product approach, adding explanations to a dataset increases the number of classes that the classification algorithm will need to handle. This could stress the algorithm's effectiveness or training time performance, although we did not observe this in our two examples. However, techniques from the “extreme classification” community (Extreme 2017) could be applicable.

Although the flexibility of allowing any format for an explanation, provided the set of explanations can be enumerated, is quite general, it could encourage a large number of explanations that differ in only unintended ways, such as “insufficient salary” vs. “salary too low”. Providing more structure via a domain-specific language (DSL) or good tooling could be useful. If free text is used, we could leverage word embeddings to provide some structure and to help reason about similar explanations.

As there are many ways to explain the same phenomenon, it may be useful to explore having more than one version of the same base explanation for different levels of consumer sophistication. Applications already do this for multilingual support, but in this case it would be multiple levels of sophistication in the same language for, say, a first-time borrower vs. a loan officer or regulator. This would be a postprocessing step once the explanation is predicted by the classifier.

Providing explanations for the full training set is ideal, but may not be realistic. Although it may be easy to add explanations while creating the training dataset, it may be more challenging to add explanations after a dataset has been created, because the creator may not be available or may not remember the justification for a label. One possibility is to use an external knowledge source to generate explanations, such as WebMD in a medical domain. Another possibility is to request explanations on a subset of the training data and apply ideas from few-shot learning (Goodfellow, Bengio, and Courville 2016) to learn the rest of the training dataset explanations. Another option is to use active learning to guide the user where to add explanations. One approach may be to first ask the user to enumerate the classes and explanations and then to provide training data (X) for each class/explanation until the algorithm achieves appropriate confidence. At a minimum, one could investigate how the performance of the explanatory system changes as more training explanations are provided. Combinations of the above may be fruitful.

7 Conclusions

This paper introduces a new paradigm for providing explanations for machine learning model decisions. Unlike existing methods, it does not attempt to probe the reasoning process of a model. Instead, it seeks to replicate the reasoning process of a human domain user. The two paradigms share the objective of producing a reasoned explanation, but the model introspection approach is more suited to AI system builders who work with models directly, whereas the teaching explanations paradigm more directly addresses domain users. Indeed, the European Union GDPR guidelines say: “The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm.”

Work in social and behavioral science (Lombrozo 2007; Miller, Howe, and Sonenberg 2017; Miller 2017) has found that people prefer explanations that are simpler, more general, and coherent, even over more likely ones. Miller writes, in the context of Explainable AI: “Giving simpler explanations that increase the likelihood that the observer both understands and accepts the explanation may be more useful to establish trust” (Miller, Howe, and Sonenberg 2017).

Our two examples illustrate the promise of this approach. They both showed highly accurate explanations and no loss in prediction accuracy. We hope this work will inspire other researchers to further enrich this paradigm.

References
Ainur, Y.; Choi, Y.; and Cardie, C. 2010. Automatically generating annotator rationales to improve sentiment classification. In Proceedings of the ACL 2010 Conference Short Papers, 336–341.

Bastani, O.; Kim, C.; and Bastani, H. 2018. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.

Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI).

Campolo, A.; Whittaker, M. S. M.; and Crawford, K. 2017. 2017 annual report. Technical report, AI NOW.

Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; and Elhadad, N. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1721–1730.

Dhurandhar, A.; Iyengar, V.; Luss, R.; and Shanmugam, K. 2017. A formal framework to characterize interpretability of procedures. In Proc. ICML Workshop Human Interp. Mach. Learn., 1–7.

Donahue, J., and Grauman, K. 2011. Annotator rationales for visual recognition. In ICCV.

Doshi-Velez, F., and Kim, B. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608v2.

Doshi-Velez, F.; Kortz, M.; Budish, R.; Bavitz, C.; Gershman, S.; O'Brien, D.; Schieber, S.; Waldo, J.; Weinberger, D.; and Wood, A. 2017. Accountability of AI under the law: The role of explanation. CoRR abs/1711.01134.

Duan, K.; Parikh, D.; Crandall, D.; and Grauman, K. 2012. Discovering localized attributes for fine-grained recognition. In CVPR.

Extreme. 2017. Extreme classification: The NIPS workshop on multi-class and multi-label learning in extremely large label spaces.

FICO. 2018. Explainable machine learning challenge.

Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Goodman, B., and Flaxman, S. 2016. EU regulations on algorithmic decision-making and a 'right to explanation'. In Proc. ICML Workshop Human Interp. Mach. Learn., 26–30.

Hendricks, L. A.; Akata, Z.; Rohrbach, M.; Donahue, J.; Schiele, B.; and Darrell, T. 2016. Generating visual explanations. In European Conference on Computer Vision.

Kim, B.; Malioutov, D. M.; Varshney, K. R.; and Weller, A., eds. 2017. 2017 ICML Workshop on Human Interpretability in Machine Learning.

Kim, B.; Varshney, K. R.; and Weller, A., eds. 2018. 2018 Workshop on Human Interpretability in Machine Learning.

Kim, B. 2017. Tutorial on interpretable machine learning.

Kulesza, T.; Stumpf, S.; Burnett, M.; Yang, S.; Kwan, I.; and Wong, W.-K. 2013. Too much, too little, or just right? Ways explanations impact end users' mental models. In Proc. IEEE Symp. Vis. Lang. Human-Centric Comput., 3–10.

Lei, T.; Barzilay, R.; and Jaakkola, T. 2016. Rationalizing neural predictions. In EMNLP.

Lipton, Z. C. 2016. The mythos of model interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Lombrozo, T. 2007. Simplicity and probability in causal explanation. Cognitive Psychol. 55(3):232–257.

Lundberg, S., and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in Neural Inf. Proc. Systems.

McDonnell, T.; Lease, M.; Kutlu, M.; and Elsayed, T. 2016. Why is that relevant? Collecting annotator rationales for relevance judgments. In Proc. AAAI Conf. Human Comput. Crowdsourc.

Miller, T.; Howe, P.; and Sonenberg, L. 2017. Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. In Proc. IJCAI Workshop Explainable Artif. Intell.

Miller, T. 2017. Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269.

Montavon, G.; Samek, W.; and Müller, K.-R. 2017. Methods for interpreting and understanding deep neural networks. Digital Signal Processing.

Peng, P.; Tian, Y.; Xiang, T.; Wang, Y.; and Huang, T. 2016. Joint learning of semantic and latent attributes. In ECCV 2016, Lecture Notes in Computer Science, volume 9908.

Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1135–1144.

Selbst, A. D., and Powles, J. 2017. Meaningful information and the right to explanation. Int. Data Privacy Law 7(4):233–242.

Sun, Q., and DeJong, G. 2005. Explanation-augmented SVM: An approach to incorporating domain knowledge into SVM learning. In 22nd International Conference on Machine Learning.

Vacca, J. 2018. A local law in relation to automated decision systems used by agencies. Technical report, The New York City Council.

Varshney, K. R. 2016. Engineering safety in machine learning. In Information Theory and Applications Workshop.

Wachter, S.; Mittelstadt, B.; and Floridi, L. 2017a. Transparent, explainable, and accountable AI for robotics. Science Robotics 2.

Wachter, S.; Mittelstadt, B.; and Floridi, L. 2017b. Why a right to explanation of automated decision-making does not exist in the general data protection regulation. Int. Data Privacy Law 7(2):76–99.

Zaidan, O. F., and Eisner, J. 2007. Using 'annotator rationales' to improve machine learning for text categorization. In NAACL-HLT, 260–267.

Zaidan, O. F., and Eisner, J. 2008. Modeling annotators: A generative approach to learning from annotator rationales. In Proceedings of EMNLP 2008, 31–40.

Zhang, Y.; Marshall, I. J.; and Wallace, B. C. 2016. Rationale-augmented convolutional neural networks for text classification. In Conference on Empirical Methods in Natural Language Processing (EMNLP).
