Noel C. F. Codella,∗ Michael Hind,∗ Karthikeyan Natesan Ramamurthy,∗
Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei, Aleksandra Mojsilović
IBM Research AI
∗These authors contributed equally.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Abstract

Artificial intelligence systems are being increasingly deployed due to their potential to increase the efficiency, scale, consistency, fairness, and accuracy of decisions. However, as many of these systems are opaque in their operation, there is a growing demand for such systems to provide explanations for their decisions. Conventional approaches to this problem attempt to expose or discover the inner workings of a machine learning model with the hope that the resulting explanations will be meaningful to the consumer. In contrast, this paper suggests a new approach to this problem. It introduces a simple, practical framework, called Teaching Explanations for Decisions (TED), that provides meaningful explanations that match the mental model of the consumer. We illustrate the generality and effectiveness of this approach with two different examples, resulting in highly accurate explanations with no loss of prediction accuracy for these two examples.

1 Introduction

Machine learning based systems have proven to be quite effective for producing highly accurate results in several domains. This effectiveness is leading to wider adoption in higher stakes domains, which has the potential to lead to more accurate, consistent, and fairer decisions and the resulting societal benefits. However, given the higher stakes of these domains, there is a growing demand that these systems provide explanations for their decisions, so that necessary oversight can occur and a citizen's due process rights are respected (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Vacca 2018; Campolo, Whittaker, and Crawford 2017; Kim 2017; Doshi-Velez et al. 2017; Wachter, Mittelstadt, and Floridi 2017a; Caruana et al. 2015; Varshney 2016).

The demand for explanation has manifested itself in new regulations that call for automated decision making systems to provide "meaningful information" on the logic used to reach conclusions (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Selbst and Powles 2017). Selbst and Powles (2017) interpret the concept of "meaningful information" as information that should be understandable to the audience (potentially individuals who lack specific expertise), is actionable, and is flexible enough to support various technical approaches.

Unfortunately, the advance in effectiveness of machine-learning techniques has coincided with increased complexity in the inner workings of these techniques. For some techniques, like deep neural networks or large random forests, even experts cannot explain how decisions are reached. Thus, we have a stronger need for explainable AI just when we have a greater gap in achieving it.

This has sparked a growing research community focused on this problem (Kim et al. 2017; Kim, Varshney, and Weller 2018). Most of this research attempts to explain the inner workings of a machine learning model either directly, indirectly via a simpler proxy model, or by probing the model with related inputs. This paper proposes a different approach that requires a model to jointly produce both a decision and an explanation, rather than exposing the inner details of how the model produces a decision. The explanation is not constrained to any particular format and can vary to accommodate user needs.

The main contributions of this work are as follows:
• A description of the challenges in providing meaningful explanations for machine learning systems.
• A new framework, called TED, that enables machine learning algorithms to provide meaningful explanations that match the complexity and domain of consumers.
• A simple instantiation of the framework that demonstrates the generality and simplicity of the approach.
• Two illustrative examples and results that demonstrate the effectiveness of the instantiation in providing meaningful explanations.
• A discussion of several possible extensions and open research opportunities that this framework enables.

The rest of this paper is organized as follows. Section 2 explores the challenges presented by the problem statement of providing explanations for AI decisions. Section 3 discusses related work. Section 4 describes our general approach, TED, and a simple instantiation for providing explanations that are understandable by the consumer, and discusses the advantages of the approach. Section 5 presents results from two examples that demonstrate the effectiveness of the simple instantiation. Section 6 discusses future directions and open issues for the TED approach. Section 7 draws conclusions.

2 Challenges to Providing AI Explanations

This section explores the challenges in providing meaningful explanations, which provide the motivation for the TED framework.

The concept of an explanation is probably as old as human communication. Intuitively, an explanation is the communication from one person (A) to another (B) that provides justification for an action or decision made by person A. Mathematicians use proofs to formally provide explanations. These are constructed using agreed-upon logic and formalism, so that any person trained in the field can verify if the proof/explanation is valid. Unfortunately, we do not have such formalism for non-mathematical explanations. Even in the judicial system, we utilize nonexpert jurors to determine if a defendant has violated a law, relying on their intuition and experience in weighing (informal) arguments made by prosecution and defense.

Since we do not have a satisfying formal definition for valid human-to-human explanations, developing one for system-to-human explanations is challenging (Kim 2017; Lipton 2016). Motivated by the concept of meaningful information (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017b; Selbst and Powles 2017), we feel that explanations must have the following three characteristics:

Justification: An explanation needs to provide justification for a decision that increases trust in the decision. This often includes some information that can be verified by the consumer.

Complexity Match: The complexity of the explanation needs to match the complexity capability of the consumer (Kulesza et al. 2013; Dhurandhar et al. 2017). For example, an explanation in equation form may be appropriate for a statistician, but not for a nontechnical person (Miller, Howe, and Sonenberg 2017).

Domain Match: An explanation needs to be tailored to the domain, incorporating the relevant terms of the domain. For example, an explanation for a medical diagnosis needs to use terms relevant to the physician (or patient) who will be consuming the prediction.

There are at least four distinct groups of people who are interested in explanations for an AI system, with varying motivations.

Group 1: End User Decision Makers: These are the people who use the recommendations of an AI system to make a decision, such as physicians, loan officers, managers, judges, social workers, etc. They desire explanations that can build their trust and confidence in the system's recommendations and possibly provide them with additional insight to improve their future decisions and understanding of the phenomenon.

Group 2: Affected Users: These are the people impacted by the recommendations made by an AI system, such as patients, loan applicants, employees, arrested individuals, at-risk children, etc. They desire explanations that can help them understand if they were treated fairly and what factor(s) could be changed to get a different result (Doshi-Velez et al. 2017).

Group 3: Regulatory Bodies: Government agencies, charged to protect the rights of their citizens, want to ensure that decisions are made in a safe and fair manner, and that society is not negatively impacted by the decisions.

Group 4: AI System Builders: Technical individuals (data scientists and developers) who build or deploy an AI system want to know if their system is working as expected, how to diagnose and improve it, and possibly gain insight from its decisions.

Understanding the motivations and expectations behind each group's needs for an explanation will help to ensure a solution that satisfies these expectations. For example, Group 4 is likely to desire a more complex explanation of the system's inner workings to take action. Group 3's needs may be satisfied by showing that the overall process, including the training data, is fair and free of negative societal impact; this group may not be able to consume the same level of complexity as Group 4. Group 1 will have a high need for domain sophistication, but will also have less tolerance for complex explanations. Finally, Group 2 will have the lowest threshold for both complexity and domain information. These are affected users, such as loan applicants, who need to have the reasons for their outcomes, such as loan denials, explained in a simple manner without industry terms or complex formulas.

In summary, outside of a logical proof, there is no clear definition of a valid explanation; it seems to be subjective to the consumer and circumstances. Furthermore, there is a wide diversity of potential consumers of explanations, with different needs, different levels of sophistication, and different levels of domain knowledge. This seems to make it impossible to produce a single meaningful explanation without any information about the consumer.
3 Related Work

Prior work in providing explanations can be partitioned into three areas:

1. Making existing or enhanced models interpretable, i.e., providing a precise description of how the model determined its decision (e.g., Ribeiro, Singh, and Guestrin 2016; Montavon, Samek, and Müller 2017; Lundberg and Lee 2017).

2. Creating a second, simpler-to-understand model, such as a small number of logical expressions, that mostly matches the decisions of the deployed model (e.g., Bastani, Kim, and Bastani 2018; Caruana et al. 2015).

3. Work in the natural language processing and computer vision domains that generates rationales/explanations derived from input text (e.g., Lei, Barzilay, and Jaakkola 2016; Ainur, Choi, and Cardie 2010; Hendricks et al. 2016).

The first two groups attempt to precisely describe how a machine learning decision was made, which is particularly relevant for AI system builders (Group 4). The insight into the inner workings of a model can be used to improve the AI system and may serve as the seeds for an explanation to a non-AI expert. However, work still remains to determine if these seeds are sufficient to satisfy the needs of the diverse collection of non-AI experts (Groups 1–3). Furthermore, when the underlying features are not human comprehensible, these approaches are inadequate for providing human consumable explanations.

The third group seeks to generate textual explanations along with predictions. For text classification, this involves selecting the minimal necessary content from a text body that is sufficient to trigger the classification. For computer vision (Hendricks et al. 2016), this involves utilizing textual captions in training to automatically generate new textual captions of images that are both descriptive and discriminative. Although promising, it is not clear how these techniques generalize to other domains and whether the explanations will be meaningful to the variety of explanation consumers described in Section 2.

Doshi-Velez et al. (2017) discuss the societal, moral, and legal expectations of AI explanations, provide guidelines for the content of an explanation, and recommend that explanations of AI systems be held to a similar standard as humans. Our approach is compatible with their view. Biran and Cotton (2017) provide an excellent overview and taxonomy of explanations and justifications in machine learning.

Miller (2017) and Miller, Howe, and Sonenberg (2017) argue that explainable AI solutions need to meet the needs of the users, an area that has been well studied in philosophy, psychology, and cognitive science. They provide a brief survey of the work in these fields most relevant to explainable AI. They, along with Doshi-Velez and Kim (2017), call for more rigor in this area.

4 Teaching Explanations

Given the challenges to developing meaningful explanations for the diversity of consumers described in Section 2, we advocate a non-traditional approach. We suggest a high-level framework, with one simple instantiation, that we see as a promising complementary approach to the traditional "inside-out" approach to providing explanations.

To understand the motivation for the TED approach, consider the common situation in which a new employee is being trained for their new job, such as a loan approval officer. The supervisor will show the new employee several example situations, such as loan applications, teach them the correct action (approve or reject), and explain the reason for the action, such as "insufficient salary". Over time, the new employee will be able to make independent decisions on new loan applications and will give explanations based on the explanations they learned from their supervisor. This is analogous to how the TED framework works. We ask the training dataset to teach us not only how to get to the correct answer (approve or reject), but also to provide the correct explanation, such as "insufficient salary", "too much existing debt", "insufficient job stability", "incomplete application", etc. From this training information, we will generate a model that, for new input, will predict answers and provide explanations based on the explanations it was trained on. Because these explanations are the ones that were provided by the training, they are relevant to the target domain and meet the complexity level of the explanation consumer.

Previous researchers have demonstrated that providing explanations with the training dataset may not add much of a burden to the training time and may improve the overall accuracy of the training data (Zaidan and Eisner 2007; 2008; Zhang, Marshall, and Wallace 2016; McDonnell et al. 2016).

4.1 TED Framework and a Simple Instantiation

The TED framework leverages existing machine learning technology in a straightforward way to generate a classifier that produces explanations along with classifications. To review, a supervised machine learning algorithm takes a training dataset that consists of a series of instances with the following two components:

X, a set of features (feature vector) for the particular entity, such as an image, a paragraph, or a loan application.

Y, a label/decision/classification for each feature vector, such as an image description, a paragraph summary, or a loan-approval decision.

The TED framework requires a third component:

E, an explanation for each decision Y, which can take any form, such as a number, a text string, an image, or a video file. Unlike traditional approaches, E does not necessarily need to be expressed in terms of X. It could be some other high-level concept specific to the domain that applies to some domain-specific combination of X, such as "scary image" or "loan information is not trustworthy". Regardless of the format, we represent each unique value of E with an identifier.

The TED framework takes this augmented training set and produces a classifier that predicts both Y and E. There are several ways that this can be accomplished. The instantiation we explore in this work is a simple Cartesian product approach. This approach encodes Y and E into a new classification, called YE, which, along with the feature vector X, is provided as the training input to any machine learning classification algorithm to produce a classifier that predicts YE's. After the model produced by the classification algorithm makes a prediction, we apply a decoding step to partition a YE prediction into its components, Y and E, which are returned to the consumer. Figure 1 illustrates the algorithm. The boxes in dashed lines are new TED components that encode Y and E into YE and decode a predicted YE into its individual components, Y and E. The solid boxes represent 1) any machine learning algorithm that takes a normal training dataset (features and labels), and 2) the resulting model produced by this algorithm.

Figure 1: Overview of TED Algorithm
4.2 Example

Let's assume we are training a system to recommend cancer treatments. A typical training set for such a system would be of the following form, where Pi is the feature vector representing patient i and Tj represents various treatment recommendations:

(P1, TA), (P2, TA), (P3, TA), (P4, TA)
(P5, TB), (P6, TB), (P7, TB), (P8, TC)

The TED approach would require adding an explanation component to the training dataset as follows:

(P1, TA, E1), (P2, TA, E1), (P3, TA, E2), (P4, TA, E2)
(P5, TB, E3), (P6, TB, E3), (P7, TB, E4), (P8, TC, E5)

Each Ei would be an explanation to justify why a feature vector representing a patient would map to a particular treatment. Some treatments could be recommended for multiple reasons/explanations. For example, treatment TA is recommended for two different reasons, E1 and E2, but treatment TC is only recommended for reason E5.

Given this augmented training data, the Cartesian product instantiation of the TED framework transforms each triple into a form that any supervised machine learning algorithm can use, namely (feature, class), by combining the second and third components into a unique new class as follows:

(P1, TA E1), (P2, TA E1), (P3, TA E2), (P4, TA E2)
(P5, TB E3), (P6, TB E3), (P7, TB E4), (P8, TC E5)

Figure 2: Illustration of Changes to Training Dataset

Figure 2 shows how the training dataset would change using the TED approach for the above example. The left picture illustrates how the original 8 training instances in the example are mapped into the 3 classes. The right picture shows how the training data is changed, with explanations added. Namely, Class A was decomposed into Classes A1 and A2, Class B was transformed into Classes B3 and B4, and Class C became C5.

As Figure 2 illustrates, adding explanations to training data implicitly creates a 2-level hierarchy in that the transformed classes are members of the original classes, e.g., Classes A1 and A2 are a decomposition of the original Class A. This hierarchical property could be exploited by employing hierarchical classification algorithms when training to improve accuracy.
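As a concrete rendering of this transformation, the snippet below builds the composite classes from the eight training triples; the Pi strings stand in for real patient feature vectors and the composite label strings are illustrative, not notation from the paper.

```python
# The Cartesian-product encoding of the eight training triples above.
# The Pi strings are placeholders for real patient feature vectors.
triples = [("P1", "TA", "E1"), ("P2", "TA", "E1"), ("P3", "TA", "E2"), ("P4", "TA", "E2"),
           ("P5", "TB", "E3"), ("P6", "TB", "E3"), ("P7", "TB", "E4"), ("P8", "TC", "E5")]

# Combine treatment label and explanation into one composite class per instance.
encoded = [(p, f"{y}-{e}") for p, y, e in triples]

# The distinct composite classes form the 2-level hierarchy illustrated in Figure 2.
composite_classes = sorted({ye for _, ye in encoded})
# -> ['TA-E1', 'TA-E2', 'TB-E3', 'TB-E4', 'TC-E5']
```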
4.3 Advantages

Although this approach is simple, there are several nonobvious advantages that are particularly important in addressing the requirements of explainable AI for Groups 1, 2, and 3 discussed in Section 2.

Complexity/Domain Match: Explanations provided by the algorithm are guaranteed to match the complexity and mental model of the domain, given that they are created by the domain expert who is training the system.

Dealing with Incomprehensible Features: Since the explanation format can be of any type, explanations are not limited to being a function of the input features, which is useful when the features are not comprehensible.

Accuracy: Explanations will be accurate if the training data explanations are accurate and representative of production data.

Generality: This approach is independent of the machine learning classification algorithm; it can work with any supervised classification algorithm, including neural networks, making this technique widely deployable.

Preserves Intellectual Property: There is no need to expose details of the machine learning algorithm to the consumer. Thus, proprietary technology can remain protected by its owners.

Easy to Incorporate: The Cartesian product approach does not require a change to the current machine learning algorithm, just the addition of pre- and post-processing components: an encoder and a decoder. Thus, an enterprise does not need to adopt a new machine learning algorithm just to get explanations.

Educates Consumer: The process of providing good training explanations will help properly set expectations for what kind of explanations the system can realistically provide. For example, it is probably easier to explain in the training data why a particular loan application is denied than to explain why a particular photo is a cat. Setting customer expectations correctly for what AI systems can (currently) do is important to their satisfaction with the system.

Improved Auditability: After creating a TED dataset, the domain expert will have enumerated all possible explanations for a decision. (The TED system does not create any new explanations.) This enumeration can be useful for the consumer's auditability, i.e., to answer questions such as "What are the reasons why you will deny a loan?" or "What are the situations in which you will prescribe medical treatment X?"

May Reduce Bias: Providing explanations will increase the likelihood of detecting bias in the training data because 1) a biased decision will likely be harder for the explanation producer to justify, and 2) one would expect training instances with the same explanations to cluster close to each other in the feature space. Anomalies from this property could signal a bias or a need for more training data.

5 Evaluation

To evaluate the ideas presented in this work, we focus on two fundamental questions:

1. How useful are the explanations produced by the TED approach?

2. How is the prediction accuracy impacted by incorporating explanations into the training dataset?

Since the TED framework has many instantiations, can be incorporated into many kinds of learning algorithms, tested against many datasets, and used in many different situations, a definitive answer to these questions is beyond the scope of this paper. Instead, we try to address these two questions using the simple Cartesian product instantiation with two different machine learning algorithms (neural networks and random forests) on two use cases, to show that there is justification for further study of this approach.

Determining if any approach provides useful explanations is a challenge, and no consensus metric has yet emerged (Doshi-Velez et al. 2017). However, since the TED approach requires explanations to be provided for the target dataset (training and testing), one can evaluate the accuracy of a model's explanation (E) in a similar way to how one evaluates the accuracy of a predicted label (Y).

The TED approach requires a training set that contains explanations. Since such datasets are not yet readily available, we evaluate the approach on two synthetic datasets described below: tic-tac-toe and loan repayment.

Table 1: Accuracy (%) for predicting Y and E in Tic-Tac-Toe and Loan Repayment

Training input    Tic-Tac-Toe Y    Tic-Tac-Toe E    Loan Repayment Y    Loan Repayment E
X, Y              96.5             NA               99.2 (0.2)          NA
X, Y, and E       97.4             96.3             99.6 (0.1)          99.4 (0.1)
5.1 Tic-Tac-Toe

The tic-tac-toe example tries to predict the best move given a particular board configuration. A tic-tac-toe board is represented by two 3 × 3 binary feature planes, indicating the presence of X and O, respectively. An additional binary feature indicates the side to move, resulting in a total of 19 binary input features. Each legal non-terminal board position (4,520 in total) is labeled with a preferred move, along with the reason the move is preferred. The labeling is based on a simple set of rules that are executed sequentially (these rules do not guarantee optimal play):

1. If a winning move is available, completing three in a row for the side to move, choose that move with reason Win.

2. If a blocking move is available, preventing the opponent from completing three in a row on their next turn, choose that move with reason Block.

3. If a threatening move is available, creating two in a row with an empty third square in the row, choose that move with reason Threat.

4. Otherwise, choose an empty square, preferring center over corners over middles, with reason Empty.

Two versions of the dataset were created, one with only the preferred move (represented as a 3 × 3 plane), the second with the preferred move and explanation (represented as a 3 × 3 × 4 stack of planes). A simple neural network classifier was built on each of these datasets, with one hidden layer of 200 units using ReLU and a softmax over the 9 (or 36) outputs. We use a 90%/10% split of the legal non-terminal board positions for the training/testing datasets. This classifier obtained an accuracy of 96.5% on the baseline move-only prediction task, i.e., when trained with just X (the 19 features) and Y it was highly accurate.

To answer the first question, does the approach provide useful explanations, we calculated the accuracy of the predicted explanation. Although there are only 4 rules, each rule applies to 9 different preferred moves, resulting in 36 possible explanations. Our classifier was able to generate the correct explanation 96.3% of the time, i.e., very rarely did it get the correct move and not the correct rule.

The second question asks how the accuracy of the classifier is impacted by the addition of E's in the training dataset. Given the increase in the number of classes, one might expect the accuracy to decrease. However, for this example, the accuracy of predicting the preferred move actually increases to 97.4%. This illustrates that the approach works well in this domain; it is possible to provide accurate explanations without impacting the Y prediction accuracy. Table 1 summarizes the results for both examples.
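As a further illustration, the sketch below shows how the 19-feature board encoding and the four sequential labeling rules could be implemented to generate such a dataset. The function names, the tie-breaking order among equally preferred squares, and the example position are guesses made for illustration, not details taken from the paper.

```python
# Sketch of the 19-feature board encoding and the Win/Block/Threat/Empty rules.
# Tie-breaking among equally preferred moves is an assumption, not from the paper.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6), (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def encode(board, to_move):
    """board: 9 cells in {'X', 'O', ' '}; to_move: 'X' or 'O' -> 19 binary features."""
    return ([1 if c == 'X' else 0 for c in board] +
            [1 if c == 'O' else 0 for c in board] +
            [1 if to_move == 'X' else 0])

def completes_line(board, move, player):
    """Does placing `player` at `move` complete three in a row?"""
    b = list(board); b[move] = player
    return any(all(b[i] == player for i in line) for line in LINES)

def makes_threat(board, move, player):
    """Does placing `player` at `move` create two in a row with the third square empty?"""
    b = list(board); b[move] = player
    return any(sorted(b[i] for i in line) == [' ', player, player] for line in LINES)

def label(board, to_move):
    """Return (preferred_move, explanation) using the four sequential rules."""
    opponent = 'O' if to_move == 'X' else 'X'
    empties = [i for i, c in enumerate(board) if c == ' ']
    for move in empties:                         # Rule 1: Win
        if completes_line(board, move, to_move):
            return move, 'Win'
    for move in empties:                         # Rule 2: Block
        if completes_line(board, move, opponent):
            return move, 'Block'
    for move in empties:                         # Rule 3: Threat
        if makes_threat(board, move, to_move):
            return move, 'Threat'
    preference = [4, 0, 2, 6, 8, 1, 3, 5, 7]     # Rule 4: center > corners > middles
    return next(m for m in preference if m in empties), 'Empty'

features = encode(list("X O  O X "), 'X')        # 19 binary features for an example position
move, reason = label(list("X O  O X "), 'X')     # -> (8, 'Block'): O threatens the 2-5-8 line
```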
5.2 Loan Repayment

The second example is closer to an industry use case and is based on the FICO Explainable Machine Learning Challenge dataset (FICO 2018). The dataset contains around 10,000 applications for a Home Equity Line of Credit (HELOC), with the binary Y label indicating payment performance (any 90-day or longer delinquent payments) over 2 years.

Since the dataset does not come with explanations (E) (the challenge asks participants to provide explanations along with predictions, which will be judged by the organizers), we generated them by training a rule set on the training data, resulting in the following two 3-literal rules for the "good" class Y = 1 (see (FICO 2018) for a data dictionary):

1. NumSatisfactoryTrades ≥ 23 AND ExternalRiskEstimate ≥ 70 AND NetFractionRevolvingBurden ≤ 63;

2. NumSatisfactoryTrades ≤ 22 AND ExternalRiskEstimate ≥ 76 AND NetFractionRevolvingBurden ≤ 78.

These two rules, from researchers at IBM Research, predict Y with 72% accuracy and were the winning entry to the challenge. Since the TED approach requires 100% consistency between explanations and labels, we modified the Y labels in instances where they disagree with the rules. We then assigned the explanation E to one of 8 values: 2 for the good class, corresponding to which of the two rules is satisfied (they are mutually exclusive), and 6 for delinquent, corresponding first to which of the rules should apply based on NumSatisfactoryTrades, and then to which of the remaining conditions (ExternalRiskEstimate, NetFractionRevolvingBurden, or both) are violated.

We trained a Random Forest classifier (100 trees, minimum 5 samples per leaf) first on the dataset with just X and (modified) Y and then on the enhanced dataset with E added. The accuracy of the baseline classifier (predicting the binary label Y) was 99.2%. The accuracy of TED in predicting explanations E was 99.4%, despite the larger class cardinality of 8. In this example, Y predictions can be derived from E predictions through the mapping mentioned above, and doing so resulted in an improved Y accuracy of 99.6%.

While these accuracies may be artificially high due to the data generation method, they do show two things, as in Section 5.1: (1) to the extent that user explanations follow simple logic, very high explanation accuracy can be achieved; (2) accuracy in predicting Y not only does not suffer but actually improves. The second result has been observed by other researchers who have suggested adding "rationales" to improve classifier performance, but not for explainability (Sun and DeJong 2005; Zaidan and Eisner 2007; 2008; Zhang, Marshall, and Wallace 2016; McDonnell et al. 2016; Donahue and Grauman 2011; Duan et al. 2012; Peng et al. 2016).
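The sketch below gives one plausible reading of the 8-valued explanation coding described above, assuming Y is taken directly from whichever rule applies (consistent with the label modification mentioned earlier). The function name and the explanation strings are illustrative assumptions, not the encoding actually used for the experiments.

```python
# One plausible reading of the 8-way explanation coding; label strings are illustrative.

def assign_explanation(num_sat_trades, ext_risk, revolving_burden):
    """Map the three HELOC features used by the rules to a (Y, E) pair."""
    if num_sat_trades >= 23:     # rule 1 is the applicable rule
        rule, risk_ok, burden_ok = 1, ext_risk >= 70, revolving_burden <= 63
    else:                        # rule 2 is the applicable rule
        rule, risk_ok, burden_ok = 2, ext_risk >= 76, revolving_burden <= 78

    if risk_ok and burden_ok:    # applicable rule satisfied -> good class (2 explanations)
        return 1, f"good: rule {rule} satisfied"

    # Otherwise delinquent: 2 applicable rules x 3 violation patterns = 6 explanations.
    violated = [name for name, ok in
                [("ExternalRiskEstimate", risk_ok),
                 ("NetFractionRevolvingBurden", burden_ok)] if not ok]
    return 0, f"delinquent: rule {rule} applies, violates " + " and ".join(violated)

y, e = assign_explanation(num_sat_trades=25, ext_risk=65, revolving_burden=40)
# -> (0, 'delinquent: rule 1 applies, violates ExternalRiskEstimate')
```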
ply ideas from few-shot learning (Goodfellow, Bengio, and While these accuracies may be artificially high due to the Courville 2016) to learn the rest of the training dataset expla- data generation method, they do show two things as in Sec- nations. Another option is to use active learning to guide the tion 5.1: (1) To the extent that user explanations follow sim- user where to add explanations. One approach may be to first ple logic, very high explanation accuracy can be achieved; ask the user to enumerate the classes and explanations and (2) Accuracy in predicting Y not only does not suffer but then to provide training data (X) for each class/explanation actually improves. The second result has been observed by until the algorithm achieves appropriate confidence. At a other researchers who have suggested adding “rationales” minimum one could investigate how the performance of the to improve classifier performance, but not for explainabil- explanatory system changes as more training explanations ity (Sun and DeJong 2005; Zaidan and Eisner 2007; 2008; are provided. Combinations of the above may be fruitful. Zhang, Marshall, and Wallace 2016; McDonnell et al. 2016; Donahue and Grauman 2011; Duan et al. 2012; Peng et al. 7 Conclusions 2016). This paper introduces a new paradigm for providing expla- nations for machine learning model decisions. Unlike ex- 6 Extensions and Open Questions isting methods, it does not attempt to probe the reasoning The TED framework assumes a training dataset with expla- process of a model. Instead, it seeks to replicate the reason- nations and uses it to train a classifier that can predict Y and ing process of a human domain user. The two paradigms E. This work described a simple way to do this, by taking share the objective to produce a reasoned explanation, but the Cartesian product of Y and E and using any suitable the model introspection approach is more suited to AI sys- machine learning algorithm to train a classifier. Another in- tem builders who work with models directly, whereas the stantiation would be to bring together the labels and expla- teaching explanations paradigm more directly addresses do- nations in a multitask setting. Yet another option is to learn main users. Indeed, the European Union GDPR guidelines feature embeddings using labels and explanation similarities say: “The controller should find simple ways to tell the data in a joint and aligned way to permit neighbor-based expla- subject about the rationale behind, or the criteria relied on in nation prediction. reaching the decision without necessarily always attempting Under the Cartesian product approach, adding explana- a complex explanation of the algorithms used or disclosure tions to a dataset increases the number of classes that the of the full algorithm.” classification algorithm will need to handle. This could Work in social and behavioral science (Lombrozo 2007; stress the algorithm’s effectiveness or training time perfor- Miller, Howe, and Sonenberg 2017; Miller 2017) has found mance, although we did not observe this in our two exam- that people prefer explanations that are simpler, more gen- ples. However, techniques from the “extreme classification” eral, and coherent, even over more likely ones. Miller writes community (Extreme 2017) could be applicable. 
Under the Cartesian product approach, adding explanations to a dataset increases the number of classes that the classification algorithm will need to handle. This could stress the algorithm's effectiveness or training time performance, although we did not observe this in our two examples. However, techniques from the "extreme classification" community (Extreme 2017) could be applicable.

Although the flexibility of allowing any format for an explanation, provided the set of explanations can be enumerated, is quite general, it could encourage a large number of explanations that differ in only unintended ways, such as "insufficient salary" vs. "salary too low". Providing more structure via a domain-specific language (DSL) or good tooling could be useful. If free text is used, we could leverage word embeddings to provide some structure and to help reason about similar explanations.

As there are many ways to explain the same phenomenon, it may be useful to explore having more than one version of the same base explanation for different levels of consumer sophistication. Applications already do this for multilingual support, but in this case it would be multiple levels of sophistication in the same language for, say, a first-time borrower vs. a loan officer or regulator. This would be a postprocessing step once the explanation is predicted by the classifier.

Providing explanations for the full training set is ideal, but may not be realistic. Although it may be easy to add explanations while creating the training dataset, it may be more challenging to add explanations after a dataset has been created because the creator may not be available or may not remember the justification for a label. One possibility is to use an external knowledge source to generate explanations, such as WebMD in a medical domain. Another possibility is to request explanations on a subset of the training data and apply ideas from few-shot learning (Goodfellow, Bengio, and Courville 2016) to learn the rest of the training dataset explanations. Another option is to use active learning to guide the user where to add explanations. One approach may be to first ask the user to enumerate the classes and explanations and then to provide training data (X) for each class/explanation until the algorithm achieves appropriate confidence. At a minimum, one could investigate how the performance of the explanatory system changes as more training explanations are provided. Combinations of the above may be fruitful.

7 Conclusions

This paper introduces a new paradigm for providing explanations for machine learning model decisions. Unlike existing methods, it does not attempt to probe the reasoning process of a model. Instead, it seeks to replicate the reasoning process of a human domain user. The two paradigms share the objective of producing a reasoned explanation, but the model introspection approach is more suited to AI system builders who work with models directly, whereas the teaching explanations paradigm more directly addresses domain users. Indeed, the European Union GDPR guidelines say: "The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm."

Work in social and behavioral science (Lombrozo 2007; Miller, Howe, and Sonenberg 2017; Miller 2017) has found that people prefer explanations that are simpler, more general, and coherent, even over more likely ones. Miller writes, in the context of explainable AI: "Giving simpler explanations that increase the likelihood that the observer both understands and accepts the explanation may be more useful to establish trust" (Miller, Howe, and Sonenberg 2017).

Our two examples illustrate the promise of this approach. They both showed highly accurate explanations and no loss in prediction accuracy. We hope this work will inspire other researchers to further enrich this paradigm.

References
Ainur, Y.; Choi, Y.; and Cardie, C. 2010. Automatically generating annotator rationales to improve sentiment classification. In Proceedings of the ACL 2010 Conference Short Papers, 336–341.

Bastani, O.; Kim, C.; and Bastani, H. 2018. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.

Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI).

Campolo, A.; Whittaker, M. S. M.; and Crawford, K. 2017. 2017 annual report. Technical report, AI NOW.

Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; and Elhadad, N. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1721–1730.

Dhurandhar, A.; Iyengar, V.; Luss, R.; and Shanmugam, K. 2017. A formal framework to characterize interpretability of procedures. In Proc. ICML Workshop Human Interp. Mach. Learn., 1–7.

Donahue, J., and Grauman, K. 2011. Annotator rationales for visual recognition. In ICCV.

Doshi-Velez, F., and Kim, B. 2017. Towards a rigorous science of interpretable machine learning. https://arxiv.org/abs/1702.08608v2.

Doshi-Velez, F.; Kortz, M.; Budish, R.; Bavitz, C.; Gershman, S.; O'Brien, D.; Schieber, S.; Waldo, J.; Weinberger, D.; and Wood, A. 2017. Accountability of AI under the law: The role of explanation. CoRR abs/1711.01134.

Duan, K.; Parikh, D.; Crandall, D.; and Grauman, K. 2012. Discovering localized attributes for fine-grained recognition. In CVPR.

2017. Extreme classification: The NIPS workshop on multi-class and multi-label learning in extremely large label spaces.

FICO. 2018. Explainable machine learning challenge.

Goodfellow, I.; Bengio, Y.; and Courville, A. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Goodman, B., and Flaxman, S. 2016. EU regulations on algorithmic decision-making and a 'right to explanation'. In Proc. ICML Workshop Human Interp. Mach. Learn., 26–30.

Hendricks, L. A.; Akata, Z.; Rohrbach, M.; Donahue, J.; Schiele, B.; and Darrell, T. 2016. Generating visual explanations. In European Conference on Computer Vision.

Kim, B.; Malioutov, D. M.; Varshney, K. R.; and Weller, A., eds. 2017. 2017 ICML Workshop on Human Interpretability in Machine Learning.

Kim, B.; Varshney, K. R.; and Weller, A., eds. 2018. 2018 Workshop on Human Interpretability in Machine Learning.

Kim, B. 2017. Tutorial on interpretable machine learning.

Kulesza, T.; Stumpf, S.; Burnett, M.; Yang, S.; Kwan, I.; and Wong, W.-K. 2013. Too much, too little, or just right? Ways explanations impact end users' mental models. In Proc. IEEE Symp. Vis. Lang. Human-Centric Comput., 3–10.

Lei, T.; Barzilay, R.; and Jaakkola, T. 2016. Rationalizing neural predictions. In EMNLP.

Lipton, Z. C. 2016. The mythos of model interpretability. In ICML Workshop on Human Interpretability of Machine Learning.

Lombrozo, T. 2007. Simplicity and probability in causal explanation. Cognitive Psychol. 55(3):232–257.

Lundberg, S., and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances in Neural Inf. Proc. Systems.

McDonnell, T.; Lease, M.; Kutlu, M.; and Elsayed, T. 2016. Why is that relevant? Collecting annotator rationales for relevance judgments. In Proc. AAAI Conf. Human Comput. Crowdsourc.

Miller, T.; Howe, P.; and Sonenberg, L. 2017. Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. In Proc. IJCAI Workshop Explainable Artif. Intell.

Miller, T. 2017. Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269.

Montavon, G.; Samek, W.; and Müller, K.-R. 2017. Methods for interpreting and understanding deep neural networks. Digital Signal Processing.

Peng, P.; Tian, Y.; Xiang, T.; Wang, Y.; and Huang, T. 2016. Joint learning of semantic and latent attributes. In ECCV 2016, Lecture Notes in Computer Science, volume 9908.

Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1135–1144.

Selbst, A. D., and Powles, J. 2017. Meaningful information and the right to explanation. Int. Data Privacy Law 7(4):233–242.

Sun, Q., and DeJong, G. 2005. Explanation-augmented SVM: An approach to incorporating domain knowledge into SVM learning. In 22nd International Conference on Machine Learning.

Vacca, J. 2018. A local law in relation to automated decision systems used by agencies. Technical report, The New York City Council.

Varshney, K. R. 2016. Engineering safety in machine learning. In Information Theory and Applications Workshop.

Wachter, S.; Mittelstadt, B.; and Floridi, L. 2017a. Transparent, explainable, and accountable AI for robotics. Science Robotics 2.

Wachter, S.; Mittelstadt, B.; and Floridi, L. 2017b. Why a right to explanation of automated decision-making does not exist in the general data protection regulation. Int. Data Privacy Law 7(2):76–99.

Zaidan, O. F., and Eisner, J. 2007. Using 'annotator rationales' to improve machine learning for text categorization. In NAACL-HLT, 260–267.

Zaidan, O. F., and Eisner, J. 2008. Modeling annotators: A generative approach to learning from annotator rationales. In Proceedings of EMNLP 2008, 31–40.

Zhang, Y.; Marshall, I. J.; and Wallace, B. C. 2016. Rationale-augmented convolutional neural networks for text classification. In Conference on Empirical Methods in Natural Language Processing (EMNLP).