Explainable Artificial Intelligence (Part 3)
Hands-on Machine Learning Model Interpretation
A comprehensive guide to interpreting machine learning models

Dipanjan (DJ) Sarkar

Dec 13 · 26 min read


Introduction
Interpreting machine learning models is no longer a luxury but a necessity given the rapid adoption of AI in the industry. This article is a continuation of my series of articles aimed at 'Explainable Artificial Intelligence (XAI)'. The idea here is to cut through the hype and enable you with the tools and techniques needed to start interpreting any black box machine learning model. Following are the previous articles in the series in case you want to give them a quick skim (they are not mandatory for this article).

• Part 1 — 'The Importance of Human Interpretable Machine Learning': covers the what and why of human interpretable machine learning and the need and importance of model interpretation along with its scope and criteria.

• Part 2 — 'Model Interpretation Strategies': covers the how of human interpretable machine learning, where we look at essential concepts pertaining to major strategies for model interpretation.

In this article, we will give you a hands-on guide showcasing various ways to explain potentially black-box machine learning models in a model-agnostic way. We will be working on a real-world dataset on census income, also known as the Adult dataset, available in the UCI ML Repository, where we will be predicting whether a person's potential income is more than $50K/yr or not.

The purpose of this article is manifold. The first main objective is to familiarize ourselves with the major state-of-the-art model interpretation frameworks out there (a lot of them being extensions of LIME — the original framework and approach proposed for model interpretation, which we have covered in detail in Part 2 of this series).

We cover usage of the following model interpretation frameworks in our tutorial.


• ELI5

• Skater

• SHAP

The major model interpretation techniques we will be covering in this tutorial include the following.

• Feature Importances

• Partial Dependence Plots

• Model Prediction Explanations with Local Interpretation

• Building Interpretable Models with Surrogate Tree-based Models

• Model Prediction Explanation with SHAP values

• Dependence & Interaction Plots with SHAP

Without further ado let’s get started!

Loading Necessary Dependencies


We will be using a lot of frameworks and tools in this article given it is a hands-on guide to model interpretation. We recommend you load up the following dependencies to get the maximum out of this guide!
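The dependency-loading cell itself wasn't captured here; the following is a minimal sketch of the imports used throughout this guide (the exact list in the original notebook may differ slightly, and matplotlib is my own addition for the plots):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb
from xgboost import XGBClassifier
import eli5
import shap
from collections import Counter

# shap's interactive plots are rendered with JavaScript in the notebook
shap.initjs()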


Remember to call the shap.initjs() function since a lot of the plots from shap require JavaScript.

Load and View the Census Income Dataset


You can actually get the census income dataset (popularly known as the Adult dataset) from the UCI ML repository. Fortunately, shap provides an already cleaned-up version of this dataset, which we will be using here since the intent of this article is model interpretation.

Viewing the Data Attributes


Let’s take a look at the major features or attributes of our dataset.


data, labels = shap.datasets.adult(display=True)
labels = np.array([int(label) for label in labels])

print(data.shape, labels.shape)

((32561, 12), (32561,))

Census Dataset Features

We will explain these features shortly.

Viewing the Class Labels


Let's view the distribution of people with income <= $50K (False) and > $50K (True), which are the class labels we want to predict.


In [6]: Counter(labels)

Out[6]: Counter({0: 24720, 1: 7841})

Definitely some class imbalance, which is expected given that we should have fewer people with a higher income.

Understanding the Census Income Dataset


Let's now take a look at our dataset attributes and understand their meaning and significance.


• Age (Continuous): Represents the age of the person.

• Workclass (Categorical): Represents the nature of the working class/category (Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked).

• Education-Num (Categorical): Numeric representation of educational qualification, ranging from 1-16 (Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool).

• Marital Status (Categorical): Represents the marital status of the person.

We have a total of 12 features and our objective is to predict if the income of a person will be more than $50K (True) or at most $50K (False). Hence we will be building and interpreting a classification model.

Basic Feature Engineering


Here we convert the categorical columns with string values to numeric representations. Being a tree-based model, XGBoost can work directly with these label-encoded categorical features, so we don't one-hot encode them here.

cat_cols = data.select_dtypes(['category']).columns
data[cat_cols] = data[cat_cols].apply(lambda x: x.cat.codes)
data.head()


Time to build our train and test datasets before we build our classification model.

Building Train and Test Datasets


For any machine learning model, we always need train and test
datasets. We will be building the model on the train dataset and test the
performance on the test dataset. We maintain two datasets (one with
the encoded categorical values and one with the original values) so we
can train with the encoded dataset but use the original dataset as
needed later on for model interpretation.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)
X_train.head(3)

((22792, 12), (9769, 12))


Our encoded dataset

We also maintain our base dataset with the actual (not encoded) values in a separate dataframe (useful for model interpretation later).

data_disp, labels_disp = shap.datasets.adult(display=True)
X_train_disp, X_test_disp, y_train_disp, y_test_disp = train_test_split(data_disp, labels_disp, test_size=0.3, random_state=42)
print(X_train_disp.shape, X_test_disp.shape)
X_train_disp.head(3)

((22792, 12), (9769, 12))

Our actual dataset

Training the classification model


We will now train and build a basic boosting classification model on our training data using the popular XGBoost framework, an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.
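The training cell isn't reproduced in this capture; the following sketch uses the hyperparameters visible in the XGBClassifier repr printed later in the LIME section (the wall time shown below suggests the original cell was timed with %%time):

xgc = XGBClassifier(n_estimators=500, max_depth=5, learning_rate=0.1, random_state=42)
xgc.fit(X_train, y_train)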

Wall time: 8.16 s

Making predictions on the test data


Here we do the usual: use the trained model to make predictions on the test dataset.
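A minimal sketch of this step (the variable names are assumptions):

predictions = xgc.predict(X_test)
predictions[:10]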


array([0, 0, 1, 0, 0, 1, 1, 0, 0, 1])

Model Performance Evaluation


Time to put the model to the test! Let's evaluate how our model has performed with its predictions on the test data. We use my nifty model_evaluation_utils module for this, which leverages scikit-learn internally to give us standard classification model evaluation metrics.
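The model_evaluation_utils module itself isn't shown here; as a stand-in, an equivalent check can be sketched directly with scikit-learn:

from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions, target_names=['<=50K', '>50K']))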


Default Model Interpretation Methods


By default, it is difficult to point to specific model interpretation methods that machine learning models provide out of the box. Parametric models like logistic regression are easier to interpret given that the total number of parameters of the model is fixed regardless of the volume of data, and one can make some interpretation of the model's prediction decisions leveraging the parameter coefficients.


Non-parametric models are harder to interpret given that the total number of parameters remains unbounded and increases with the volume of data. Some non-parametric models like tree-based models do have some out-of-the-box model interpretation methods like feature importance, which helps us in understanding which features might be influential in the model making its prediction decisions.

Classic feature importances from XGBoost


Here we try out the global feature importance calculations that come with XGBoost. The model enables us to view feature importances based on the following.

• Feature Weights: This is based on the number of times a feature appears in a tree across the ensemble of trees

• Gain: This is based on the average gain of splits which use the
feature

• Coverage: This is based on the average coverage (number of samples affected) of splits which use the feature

Note that these three measures can contradict each other, which motivates the use of model interpretation frameworks like SHAP. SHAP uses something known as SHAP values, which claim to come with consistency guarantees (meaning they will typically order the features correctly).
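The plotting code did not survive this capture; a sketch of the three plots using XGBoost's built-in plotting utility (the figure sizing is my own choice):

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, imp_type in zip(axes, ['weight', 'gain', 'cover']):
    xgb.plot_importance(xgc, importance_type=imp_type, ax=ax,
                        show_values=False, title=imp_type)
plt.tight_layout()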


Feature Importance Plots from XGBoost


Model Interpretation with ELI5


ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions in an easy-to-understand and intuitive way. It is perhaps the easiest of the three machine learning frameworks to get started with since it involves minimal reading of documentation! However, it doesn't support true model-agnostic interpretations, and support for models is mostly limited to tree-based and other parametric/linear models. Let's look at some intuitive ways of model interpretation with ELI5 on our classification model.

Installation Instructions
We recommend installing this framework using pip install eli5 since the conda version appears to be a bit outdated. Also feel free to check out the documentation as needed.

Feature Importances with ELI5


Typically, for tree-based models, ELI5 does nothing special but uses the out-of-the-box feature importance computation methods which we discussed in the previous section. By default, 'gain' is used, that is, the average gain of the feature when it is used in trees.

eli5.show_weights(xgc.get_booster())


Explaining Model Prediction Decisions with ELI5


One of the best ways to explain model prediction decisions to either a technical or a more business-oriented individual is to examine individual data-point predictions. Typically, ELI5 does this by showing weights for each feature depicting how influential it might have been in contributing to the final prediction decision across all trees. The idea for weight calculation is described here; ELI5 provides an independent implementation of this algorithm for XGBoost and most scikit-learn tree ensembles, which is definitely on the path towards model-agnostic interpretation but not purely model-agnostic like LIME.

Typically, the prediction can be defined as the sum of the feature contributions + the "bias" (i.e. the mean given by the topmost region that covers the entire training set).

Predicting when a person’s income <= $50K
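The call that produced the explanation below isn't visible in this capture; a sketch of the typical ELI5 API for a single prediction (the row index and keyword choices are assumptions):

doc_num = 0  # index of a test person predicted to earn <= $50K (illustrative)
eli5.show_prediction(xgc.get_booster(), X_test.iloc[doc_num],
                     feature_names=list(data.columns), show_feature_values=True)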


Here we can see the most influential features being Age, Hours per week, Marital Status, Occupation & Relationship.


Predicting when a person’s income > $50K


Here we can see the most influential features being Education, Relationship, Occupation, Hours per week & Marital Status.


It is definitely interesting to see how similar features play an influential role in explaining model prediction decisions for both classes!

Model Interpretation with Skater


Skater is a unified framework to enable model interpretation for all forms of models, helping one build an interpretable machine learning system often needed for real-world use cases using a model-agnostic approach. It is an open source Python library designed to demystify the learned structures of a black box model both globally (inference on the basis of a complete data set) and locally (inference about an individual prediction).

Skater originally started off as a fork of LIME but then broke out as an independent framework of its own, with a wide variety of features and capabilities for model-agnostic interpretation of any black-box model.


The project was started as a research idea to find ways to enable better interpretability (preferably human interpretability) of predictive "black boxes", both for researchers and practitioners.

Install Skater
You can typically install Skater using a simple pip install skater. For detailed information on the dependencies and installation instructions, check out installing skater.

📖 Documentation
We recommend you check out the detailed documentation of Skater.

Algorithms
Skater has a suite of model interpretation techniques, some of which are mentioned below.


Usage and Examples


Since the project is under active development, the best way to understand usage would be to follow the examples mentioned in the Gallery of Interactive Notebooks. But we will be showcasing its major capabilities using the model trained on our census dataset.


Global Interpretations with Skater


A predictive model is a mapping from an input space to an output space. Interpretation algorithms are divided into those that offer statistics and metrics on regions of the domain, such as the marginal distribution of a feature, or the joint distribution of the entire training set. In an ideal world there would exist some representation that would allow a human to interpret a decision function in any number of dimensions. Given that we generally can only intuit visualizations of a few dimensions at a time, global interpretation algorithms either aggregate or subset the feature space.

Currently, the model-agnostic global interpretation algorithms supported by skater include partial dependence and feature importance, with tree surrogates added in a very recent release. We will be covering feature importance and partial dependence plots here.

Creating an interpretation object


The general workflow within the skater package is to create an interpretation, create a model, and run interpretation algorithms. Typically, an Interpretation consumes a dataset, and optionally some metadata like feature names and row ids. Internally, the Interpretation will generate a DataManager to handle data requests and sampling.

• Local Models (InMemoryModel): To create a skater model based on a local function or method, pass in the predict function to an InMemoryModel. A user can optionally pass data samples to the examples keyword argument; this is only used to infer output types and formats. Out of the box, skater allows models to return numpy arrays and pandas dataframes.

• Operationalized Model (DeployedModel): If your model is accessible through an API, use a DeployedModel, which wraps the requests library. DeployedModels require two functions, an input formatter and an output formatter, which speak to the requests library for posting and parsing. The input formatter takes a pandas DataFrame or a numpy ndarray, and returns an object (such as a dict) that can be converted to JSON to be posted. The output formatter takes a requests.response as an input and returns a numpy ndarray or pandas DataFrame.

We will use the following workflow (see the sketch after the list):

• Build an interpretation object

• Build an in-memory model

• Perform interpretations
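A sketch of that workflow with skater's Interpretation and InMemoryModel classes (the target names and the choice of data passed in are assumptions):

from skater.core.explanations import Interpretation
from skater.model import InMemoryModel

# build an interpretation object around the test data
interpreter = Interpretation(X_test, feature_names=list(data.columns))

# wrap the trained XGBoost model's probability function as an in-memory skater model
im_model = InMemoryModel(xgc.predict_proba, examples=X_train,
                         target_names=['<=50K', '>50K'])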


Feature Importances with Skater


Feature importance is a generic term for the degree to which a predictive model relies on a particular feature. The skater feature importance implementation is based on an information-theoretic criterion, measuring the entropy in the change of predictions, given a perturbation of a given feature. The intuition is that the more a model's decision criteria depend on a feature, the more we'll see predictions change as a function of perturbing that feature. The default method used is prediction-variance, which is the mean absolute value of changes in predictions, given perturbations in the data.
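A sketch of generating the plot below (keyword choices are assumptions):

interpreter.feature_importance.plot_feature_importance(im_model, ascending=False)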


Feature Importances from Skater

Partial Dependence
Partial Dependence describes the marginal impact of a feature on model prediction, holding other features in the model constant. The derivative of partial dependence describes the impact of a feature (analogous to a feature coefficient in a regression model). This has been adapted from T. Hastie, R. Tibshirani and J. Friedman, Elements of Statistical Learning Ed. 2, Springer, 2009.

The partial dependence plot (PDP or PD plot) shows the marginal effect of a feature on the predicted outcome of a previously fit model. PDPs can show if the relationship between the target and a feature is linear, monotonic or more complex. Skater can show 1-D as well as 2-D PDPs.


PDP of 'Age' affecting model prediction

Let's take a look at how the Age feature affects model predictions.
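A sketch of the call behind this plot (grid and sampling settings are assumptions):

interpreter.partial_dependence.plot_partial_dependence(
    ['Age'], im_model, grid_resolution=30, n_samples=1000,
    with_variance=True, figsize=(8, 5))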

PDP for the Age feature


Looks like middle-aged people have a slightly higher chance of making more money as compared to younger or older people.

PDP of 'Education-Num' affecting model prediction

Let's take a look at how the Education-Num feature affects model predictions.


PDP for the Education-Num feature

Looks like the higher the education level, the better the chance of making more money. Not surprising!

PDP of 'Capital Gain' affecting model prediction

Let's take a look at how the Capital Gain feature affects model predictions.


PDP for the Capital Gain feature

Unsurprisingly, the higher the capital gain, the greater the chance of making more money; there is a steep rise at around $5K-$8K.

PDP of 'Relationship' affecting model prediction


Remember that relationship is coded as a categorical variable with numeric representations. Let's first look at how it is represented.

Label Encoding for Relationship

Let's now take a look at how the Relationship feature affects model predictions.


PDP for the Relationship feature

It is definitely interesting that married folks (husband-wife) have a higher chance of making more money than others!


Two-way PDP showing interactions between features 'Age' and 'Education-Num' and their effect on making more than $50K

We run a deeper model interpretation here over all the data samples, trying to see interactions between Age and Education-Num and also their effect on the probability of the model predicting if the person will make more money, with the help of a two-way partial dependence plot.
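A sketch of the two-way call; passing a tuple of two feature names is how skater's plot_partial_dependence handles feature pairs (grid and figure settings are assumptions):

interpreter.partial_dependence.plot_partial_dependence(
    [('Age', 'Education-Num')], im_model, grid_resolution=10,
    n_samples=1000, figsize=(10, 6))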

Two-way PDP showing effects of the Age and Education-Num features


Interesting to see that people with higher education levels in the middle-aged bracket (30-50) have the highest chance of making more money!

Two-way PDP showing interactions between features 'Education-Num' and 'Capital Gain' and their effect on making more than $50K

We run a deeper model interpretation here over all the data samples, trying to see interactions between Education-Num and Capital Gain and also their effect on the probability of the model predicting if the person will make more money, with the help of a two-way partial dependence plot.


Two-way PDP showing effects of the Education-Num and Capital Gain features

Basically having a better education and more capital gain leads to you
making more money!

Local Interpretations with Skater


Local interpretation can possibly be achieved in two ways. Firstly, one could approximate the behavior of a complex predictive model in the vicinity of a single input using a simple interpretable auxiliary or surrogate model (e.g. a linear regressor). Secondly, one could use the base estimator to understand the behavior of a single prediction using intuitive approximate functions based on inputs and outputs.

Local Interpretable Model-Agnostic Explanations (LIME)


LIME is a novel algorithm designed by Ribeiro Marco, Singh Sameer and Guestrin Carlos to assess the behavior of any base estimator (model) using interpretable surrogate models (e.g. a linear classifier/regressor). This form of comprehensive evaluation helps in generating explanations which are locally faithful but may not align with the global behavior. Basically, LIME explanations are based on local surrogate models. These surrogate models are interpretable models (like a linear model or decision tree) that are learned on the predictions of the original black box model. But instead of trying to fit a global surrogate model, LIME focuses on fitting local surrogate models to explain why single predictions were made.

The idea is very intuitive. To start with, just try and unlearn what you have done so far! Forget about the training data, forget about how your model works! Think of your model as a black box with some magic happening inside, where you can input data points and get the model's predicted outcomes. You can probe this magic black box as often as you want with inputs and get output predictions.

Now, your main objective is to understand why the machine learning model, which you are treating as a magic black box, gave the outcome it produced. LIME tries to do this for you! It tests out what happens to your black box model's predictions when you feed variations or perturbations of your dataset into the black box model. Typically, LIME generates a new dataset consisting of perturbed samples and the associated black box model's predictions. On this dataset LIME then trains an interpretable model weighted by the proximity of the sampled instances to the instance of interest. Following is a standard high-level workflow for this.


• Choose your instance of interest for which you want to have an explanation of the predictions of your black box model.

• Perturb your dataset and get the black box predictions for these
new points.

• Weight the new samples by their proximity to the instance of interest.

• Fit a weighted, interpretable (surrogate) model on the dataset with the variations.

• Explain prediction by interpreting the local model.

We recommend you read the LIME chapter in Christoph Molnar's excellent book on model interpretation, which talks about this in detail.

Explaining Model Predictions with Skater using LIME


Skater can leverage LIME to explain model predictions. Typically, its LimeTabularExplainer class helps in explaining predictions on tabular (i.e. matrix) data. For numerical features, it perturbs them by sampling from a Normal(0, 1) and doing the inverse operation of mean-centering and scaling, according to the means and stds in the training data. For categorical features, it perturbs by sampling according to the training distribution, and making a binary feature that is 1 when the value is the same as the instance being explained. The explain_instance() function generates explanations for a prediction. First, we generate neighborhood data by randomly perturbing features from the instance. We then learn locally weighted linear (surrogate) models on this neighborhood data to explain each of the classes in an interpretable way.

Since XGBoost has some issues with feature name ordering when
building models with dataframes, we will build our same model with
numpy arrays to make LIME work without additional hassles of feature
re-ordering. Remember the model being built is the same ensemble
model which we treat as our black box machine learning model.
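A sketch of rebuilding the model on plain numpy arrays and wiring up the LIME explainer exposed through skater; the hyperparameters mirror the model repr printed just below, while the class names, discretization flag and row index are assumptions:

from skater.core.local_interpretation.lime.lime_tabular import LimeTabularExplainer

# same ensemble, re-fit on numpy arrays to avoid feature-name ordering issues
xgc_np = XGBClassifier(n_estimators=500, max_depth=5, learning_rate=0.1, random_state=42)
xgc_np.fit(X_train.values, y_train)

exp = LimeTabularExplainer(X_train.values, feature_names=list(data.columns),
                           class_names=['<=50K', '>50K'], discretize_continuous=True)

# explain one test instance
doc_num = 0
exp.explain_instance(X_test.values[doc_num], xgc_np.predict_proba).show_in_notebook()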

XGBClassifier(base_score=0.5, booster='gbtree',
colsample_bylevel=1,
colsample_bytree=1, gamma=0,
learning_rate=0.1,
max_delta_step=0, max_depth=5,
min_child_weight=1,
missing=None, n_estimators=500, n_jobs=1,
nthread=None, objective='binary:logistic',
random_state=42, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, seed=None, silent=True,
subsample=1)


Predicting when a person’s income <= $50K


Skater gives a nice reasoning below showing which features were the most influential in the model taking the correct decision of predicting the person's income as below $50K.


Local interpretations with LIME in Skater

Predicting when a person’s income > $50K


Skater gives a nice reasoning below showing which features were the most influential in the model taking the correct decision of predicting the person's income as above $50K.


Local interpretations with LIME in Skater


Path to more interpretable models with Tree Surrogates using Skater

We have seen various ways to interpret machine learning models with feature importances, dependence plots and even LIME. But can we build an approximation, or a surrogate model which is more interpretable, from a really complex black box model like our XGBoost model having hundreds of decision trees?

Herein, we introduce the novel idea of using TreeSurrogates as a means for explaining a model's learned decision policies (for inductive learning tasks), which is inspired by the work of Mark W. Craven described as the TREPAN algorithm.

We recommend checking out the following excellent papers on the TREPAN algorithm to build surrogate trees.

• Mark W. Craven (1996). Extracting Comprehensible Models from Trained Neural Networks

• Mark W. Craven and Jude W. Shavlik (NIPS, 1996). Extracting Tree-Structured Representations of Trained Networks

Briefly, Trepan constructs a decision tree in a best-first manner. It maintains a queue of leaves which are expanded into subtrees as they are removed from the queue. With each node in the queue, Trepan stores:

• A subset of the training examples,

• Another set of instances (query instances),


• A set of constraints.

The stored subset of training examples consists simply of those examples that reach the node. The query instances are used, along with the training examples, to select the splitting test if the node is an internal node, or to determine the class label if it is a leaf. The constraint set describes the conditions that instances must satisfy in order to reach the node; this information is used when drawing a set of query instances for a newly created node. The process of expanding a node in Trepan is much like it is in conventional decision tree algorithms: a splitting test is selected for the node, and a child is created for each outcome of the test. Each child is either made a leaf of the tree or put into the queue for future expansion.

For Skater's implementation of explainable surrogate models, the base estimator (Oracle) could be any form of supervised learning predictive model — our black box model. The explanations are approximated using Decision Trees (both for classification and regression) by learning decision boundaries similar to those learned by the Oracle (predictions from the base model are used for learning the Decision Tree representation). The implementation also generates a fidelity score to quantify the tree-based surrogate model's approximation to the Oracle. Ideally, the score should be 0 for a truthful explanation both globally and locally. Let's check this out in action!

NOTE: The implementation is currently experimental and might change in the future.


Using the interpreter instance to invoke a call to the TreeSurrogate

We can use the Interpretation object we instantiated earlier to invoke a call to the TreeSurrogate capability.
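A sketch of that call (the oracle keyword follows skater's tree_surrogate API as I recall it; the seed value is an assumption):

surrogate_explainer = interpreter.tree_surrogate(oracle=im_model, seed=5)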

Using the surrogate model to learn the decision boundaries learned by the base estimator

We can now fit this surrogate model on our dataset to learn the decision boundaries of our base estimator (see the sketch after the list).

• Reports the fidelity value when compared to the base estimator (closer to 0 is better)

• The learner uses F1-score as the default metric of choice for classification
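A sketch of the fit call; the use_oracle, prune and scorer_type keywords are how I recall skater's TreeSurrogate.fit, so treat them as assumptions:

# fit the surrogate on the training data, using the oracle's predictions as targets;
# the returned value is the fidelity score printed below
surrogate_explainer.fit(X_train, y_train, use_oracle=True, prune='post', scorer_type='default')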


0.009

Taking a look at the position of each feature

We do this since the feature names are not displayed in the surrogate tree visualization (but are present in the model).


Visualizing the Surrogate Tree

We can now visualize our surrogate tree model using the following code.
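The original snippet wasn't captured; the following sketch is based on my recollection of skater's TreeSurrogate plotting helper, so both the method name and the display helper should be treated as assumptions and may differ across skater versions:

from skater.util.dataops import show_in_notebook

# render the learned surrogate decision tree to an image and display it inline
surrogate_explainer.plot_global_decisions(colors=['coral', 'lightsteelblue', 'darkkhaki'],
                                          file_name='surrogate_tree.png')
show_in_notebook('surrogate_tree.png', width=1200, height=800)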


Our surrogate tree model

Interesting rules from the surrogate tree


Here are some interesting rules you can observe from the above tree.

• If Relationship < 0.5 (means 0) and Education-num <= 9.5 and Capital Gain <= 4225 → 70% chance of the person making <= $50K

• If Relationship < 0.5 (means 0) and Education-num <= 9.5 and Capital Gain >= 4225 → 94.5% chance of the person making > $50K

• If Relationship < 0.5 (means 0) and Education-num >= 9.5 and Education-num is also >= 12.5 → 94.7% chance of the person making > $50K

Feel free to derive more interesting rules from this tree and also from your own models! Let's look at how our surrogate model performs on the test dataset now.

Surrogate Model Performance Evaluation

Let's check the performance of our surrogate model now on the test data.
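A sketch of this check, assuming the fitted TreeSurrogate exposes a predict method (if your skater version does not, score against its underlying fitted decision tree instead):

from sklearn.metrics import classification_report

surrogate_preds = surrogate_explainer.predict(X_test)
print(classification_report(y_test, surrogate_preds, target_names=['<=50K', '>50K']))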


Just as expected, the model performance drops a fair bit, but we still get an overall F1-score of 83% as compared to our boosted model's score of 87%, which is quite good!

Model Interpretation with SHAP


SHAP (SHapley Additive exPlanations) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method, based on what they claim! (Do check out the SHAP NIPS paper for details.) We have also covered this in detail in Part 2 of this series.

Installation
SHAP can be installed from PyPI

pip install shap

or conda-forge

conda install -c conda-forge shap

The really awesome aspect of this framework is that while SHAP values can explain the output of any machine learning model, it can be slow for really complex ensemble models. But they have developed a high-speed exact algorithm for tree ensemble methods (see the Tree SHAP arXiv paper). Fast C++ implementations are supported for XGBoost, LightGBM, CatBoost, and scikit-learn tree models!

SHAP (SHapley Additive exPlanations) assigns each feature an importance value for a particular prediction. Its novel components include: the identification of a new class of additive feature importance measures, and theoretical results showing there is a unique solution in this class with a set of desirable properties. Typically, SHAP values try to explain the output of a model (function) as a sum of the effects of each feature being introduced into a conditional expectation. Importantly, for non-linear functions the order in which features are introduced matters. The SHAP values result from averaging over all possible orderings. Proofs from game theory show this is the only possible consistent approach.

An intuitive way to understand the Shapley value is the following: the feature values enter a room in random order. All feature values in the room participate in the game (= contribute to the prediction). The Shapley value ϕᵢⱼ is the average marginal contribution of feature value xᵢⱼ by joining whatever features already entered the room before, i.e.
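The formula the sentence refers to did not survive this capture; the standard Shapley value definition it describes (in Molnar's notation, with p features and val(S) the prediction marginalized over the features not in S) is:

\phi_{ij} = \sum_{S \subseteq \{x_{i1}, \ldots, x_{ip}\} \setminus \{x_{ij}\}}
            \frac{|S|! \, (p - |S| - 1)!}{p!}
            \left( \mathrm{val}\!\left(S \cup \{x_{ij}\}\right) - \mathrm{val}(S) \right)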

The following figure from the KDD 18 paper, Consistent Individualized Feature Attribution for Tree Ensembles, summarizes this in a nice way!


Let’s now dive into SHAP and leverage it for interpreting our model!

Explain predictions with SHAP

Here we use the Tree SHAP implementation integrated into XGBoost to explain the test dataset! Remember that there are a variety of explainer methods based on the type of model you are building. We estimate the SHAP values for a set of samples (the test data).
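A minimal sketch of this step with SHAP's TreeExplainer (variable names are assumptions):

explainer = shap.TreeExplainer(xgc)
shap_values = explainer.shap_values(X_test)

print('Expected Value:', explainer.expected_value)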

Expected Value: -1.3625857


This returns a matrix of SHAP values of shape (# samples, # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored as the expected_value attribute of the explainer). Typically this difference helps us in explaining why the model is inclined towards predicting a specific class outcome.

Predicting when a person's income <= $50K

SHAP gives a nice reasoning below showing which features were the most influential in the model taking the correct decision of predicting the person's income as below $50K. The explanation below shows features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to the actual model output. Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue.
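A sketch of the force plot call behind this visual (the row index is illustrative; the display dataframe supplies human-readable feature values):

idx = 0  # a test person predicted to earn <= $50K (illustrative index)
shap.force_plot(explainer.expected_value, shap_values[idx, :], X_test_disp.iloc[idx, :])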


Predicting when a person's income > $50K

Similarly, SHAP gives a nice reasoning below showing which features were the most influential in the model taking the correct decision of predicting the person's income as greater than $50K.


Visualizing and explaining multiple predictions

One of the key advantages of SHAP is it can build beautiful interactive plots which can visualize and explain multiple predictions at once. Here we visualize model prediction decisions for the first 1000 test data samples.
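A sketch of the interactive multi-prediction plot (the slice of 1000 samples follows the text):

shap.force_plot(explainer.expected_value, shap_values[:1000, :], X_test_disp.iloc[:1000, :])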


The above visualization can be interacted with in multiple ways. The default view shows some interesting model prediction decision patterns.

• The first 100 test samples all probably earn more than $50K and they are married and/or have a good capital gain and/or have a higher education level!

• The next 170+ test samples all probably earn less than or equal to $50K and they are mostly unmarried and/or are very young in age or divorced!

• The next 310+ test samples have an inclination towards mostly earning more than $50K and they are of diverse profiles including married folks, and people with different ages, education levels and occupations. The most dominant feature pushing the model towards predicting a higher income is the person being married, i.e. relationship: husband or wife!

• The remaining 400+ test samples have an inclination towards mostly earning less than $50K and they are of diverse profiles; however, dominant patterns include relationship: either unmarried or divorced, and very young in age!

It is definitely interesting how we can find out patterns which lead to the model making specific decisions and be able to provide explanations for them.

Feature Importances with SHAP


This basically takes the average of the SHAP value magnitudes across
the dataset and plots it as a simple bar chart.
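A sketch of the call (plot_type='bar' produces the simple bar chart):

shap.summary_plot(shap_values, X_test, plot_type='bar')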

SHAP feature importance plot


SHAP Summary Plot


Besides a typical feature importance bar chart, SHAP also enables us to
use a density scatter plot of SHAP values for each feature to identify
how much impact each feature has on the model output for individuals
in the validation dataset. Features are sorted by the sum of the SHAP
value magnitudes across all samples. Note that when the scatter points
don’t t on a line they pile up to show density, and the color of each
point represents the feature value of that individual.
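A sketch of the density summary plot call:

shap.summary_plot(shap_values, X_test)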


SHAP summary plot

It is interesting to note that the age and marital status features have more total model impact than the capital gain feature, but for those samples where capital gain matters it has more impact than age or marital status. In other words, capital gain affects a few predictions by a large amount, while age or marital status affect all predictions by a smaller amount.

SHAP Dependence Plots


SHAP dependence plots show the effect of a single (or two) feature(s) across the whole dataset. They plot a feature's value vs. the SHAP value of that feature across many samples. SHAP dependence plots are similar to partial dependence plots, but account for the interaction effects present in the features, and are only defined in regions of the input space supported by data. The vertical dispersion of SHAP values at a single feature value is driven by interaction effects, and another feature can be chosen for coloring to highlight possible interactions.

You will also notice its similarity with Skater’s Partial Dependence
Plots!

PDP of 'Age' affecting model prediction

Let's take a look at how the Age feature affects model predictions.
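A sketch of the SHAP dependence plot call (when interaction_index is omitted, SHAP picks a coloring feature automatically):

shap.dependence_plot('Age', shap_values, X_test)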


PDP for the Age feature

Just like we observed before, middle-aged people have a slightly higher SHAP value, pushing the model's prediction decisions to say that these individuals make more money as compared to younger or older people.

PDP of 'Education-Num' affecting model prediction

Let's take a look at how the Education-Num feature affects model predictions.


PDP for the Education-Num feature

Higher education levels have higher SHAP values, pushing the model's prediction decisions to say that these individuals make more money as compared to people with lower education levels.


PDP of 'Relationship' affecting model prediction

Let's take a look at how the Relationship feature affects model predictions.


PDP for the Relationship feature

Just like we observed during the model prediction explanations, married people (husband or wife) have a slightly higher SHAP value, pushing the model's prediction decisions to say that these individuals make more money as compared to other folks!

PDP of 'Capital Gain' affecting model prediction

Let's take a look at how the Capital Gain feature affects model predictions.


PDP for the Capital Gain feature

Two-way PDP showing interactions between features 'Age' and 'Capital Gain' and their effect on making more than $50K


The vertical dispersion of SHAP values at a single feature value is driven by interaction effects, and another feature is chosen for coloring to highlight possible interactions. Here we are trying to see interactions between Age and Capital Gain and also their effect on the SHAP values which lead to the model predicting if the person will make more money or not, with the help of a two-way partial dependence plot.
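A sketch of this call (interaction_index selects the coloring feature):

shap.dependence_plot('Age', shap_values, X_test, interaction_index='Capital Gain')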


Two-way PDP showing effects of the Age and Capital Gain features

Interesting to see people with higher capital gains in the middle-aged bracket (30-50) having the highest chance of making more money!

Two-way PDP showing interactions between features 'Education-Num' and 'Relationship' and their effect on making more than $50K

Here we are trying to see interactions between Education-Num and Relationship and also their effect on the SHAP values which lead to the model predicting if the person will make more money or not, with the help of a two-way partial dependence plot.


Two-way PDP showing effects of the Education-Num and Relationship features

This is interesting because both features are similar in some context; we can see that typically married people, with a relationship status of either husband or wife, have the highest chance of making more money!


Two-way PDP showing interactions between features 'Marital Status' and 'Relationship' and their effect on making more than $50K

Here we are trying to see interactions between Marital Status and Relationship and also their effect on the SHAP values which lead to the model predicting if the person will make more money or not, with the help of a two-way partial dependence plot.


Two-way PDP showing effects of the Marital Status and Relationship features

Interesting to see people with higher education levels and the husband or wife (married) folks having the highest chance of making more money!

Two-way PDP showing interactions between features 'Age' and 'Hours per week' and their effect on making more than $50K

Here we are trying to see interactions between Age and Hours per week and also their effect on the SHAP values which lead to the model predicting if the person will make more money or not, with the help of a two-way partial dependence plot.


Two-way PDP showing effects of the Age and Hours per week features

Nothing extraordinary here: middle-aged people working the most hours make the most money!


Conclusion
If you are reading this, I would like to really commend your efforts on going through this huge and comprehensive tutorial on machine learning model interpretation. This article should help you leverage the state-of-the-art tools and techniques which should help you in your journey on the road towards Explainable AI (XAI). Based on the concepts and techniques we learnt in Part 2, in this article we actually implemented them all on a complex machine learning ensemble model trained on a real-world dataset. I encourage you to try out some of these frameworks with your own models and datasets and explore the world of model interpretation!

. . .

What’s next?
In Part 4 of this series, we will be looking at a comprehensive guide to interpreting models built on unstructured data like text, and maybe even deep learning models!

• Hands-on Model Interpretation on Unstructured Datasets

• Advanced Model Interpretation on Deep Learning Models

Stay tuned for some interesting content!

. . .


Note: There are a lot of rapid developments in this area, including a lot of new tools and frameworks being released over time. In case you want me to cover any other popular frameworks, feel free to reach out to me. I'm definitely interested and will be starting by taking a look into H2O's model interpretation capabilities some time in the future.

The code used in this article is available on my GitHub and also as an interactive Jupyter Notebook.

Check out 'Part 1 — The Importance of Human Interpretable Machine Learning', which covers the what and why of human interpretable machine learning and the need and importance of model interpretation along with its scope and criteria, in case you haven't!

Also check out 'Part 2 — Model Interpretation Strategies', which covers the how of human interpretable machine learning, where we look at essential concepts pertaining to major strategies for model interpretation.

Have feedback for me? Or interested in working with me on research, data science, artificial intelligence or even publishing an article on TDS? You can reach out to me on LinkedIn.

Dipanjan Sarkar - AI Consultant & Data Science Mentor, Springboard | LinkedIn: www.linkedin.com
