1 Introduction
“What’s 2 + 5?” asked teacher Wilhelm von Osten. The answer, of course, was 7. The crowd that had
gathered to witness this spectacle was amazed. Because it wasn’t a human who answered, but a horse
called “Clever Hans”. Clever Hans could do math – or so it seemed. 2 + 5? That’s seven taps with the
horse’s foot and not one more. Quite impressive for a horse.
And indeed, Clever Hans was very clever, as later investigations showed. But its skill was not math; it was reading social cues. It turned out that an important success factor was that the human asking
Hans knew the answer. Hans relied on the tiniest changes in the human’s body language and facial
expressions to stop tapping at the right time.
Don’t blindly trust model performance
In machine learning, we have our own versions of this clever horse: Clever Hans Predictors, a term
coined by Lapuschkin et al. (2019). Some examples:
- A machine learning model trained to detect whales learned to rely on artifacts in audio files instead of basing the classification on the audio content (DeLMA and Cukierski 2013).
- An image classifier learned to use text on images instead of visual features (Lapuschkin et al. 2019).
- A wolf versus dog classifier relied on snow in the background instead of image regions that showed the animals (Ribeiro, Singh, and Guestrin 2016).
In all these examples, the flaws didn’t lower the predictive performance on the test set. So it’s not surprising that people are wary of machine learning models, even well-performing ones. They want to look inside the models to make sure they are not taking shortcuts. And there are many other reasons to make models
interpretable. For example, scientists are using machine learning in their work. In a survey asking
scientists for their biggest concerns about using machine learning, the top answer was “Leads to more
reliance on pattern recognition without understanding” (Van Noorden and Perkel 2023). This lack of
understanding is not unique to science. If you work in marketing and build a churn model, you want to
predict not only who is likely to churn, but also understand why. Otherwise, how would the marketing
team know what the right response is? The team could send everyone a voucher, but what if the reason
for high churn probability was that they are annoyed by the many emails? Good predictive performance
alone wouldn’t be enough to make full use of the churn model.
Further, many data scientists and statisticians have told me that one reason they are using “simpler
models” is that they couldn’t convince their boss to use a “black box model”. But what if the complex
models make better predictions? Wouldn’t it be great if you could have both good performance and
interpretability?
If you want to solve trust issues, gain insights into your models, and debug them more effectively, you are reading the right book. Interpretable Machine Learning offers the tools to extract these insights from the model.
A young field with old roots
Linear regression models were already used at the beginning of the 19th century (Legendre 1806; Gauss 1877). Statistical modeling grew around the linear regression model, and today we have more
options like generalized additive models and LASSO, to name some popular model classes. In classic statistics, we typically model distributions and rely on further assumptions that allow us to draw conclusions about the world. For that, interpretability is key. For example, if statisticians model the effect of drinking alcohol on the risk of cardiovascular problems, they need to be able to extract that insight from the model. This is typically done by keeping the model interpretable and having a coefficient that can be interpreted as the effect of a feature on the outcome.
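As a minimal sketch of this idea (an illustration, not an example from the book), here is a linear regression in Python with statsmodels; the data, the variable names, and the effect sizes are simulated purely for demonstration:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data, purely for illustration: weekly alcohol consumption,
# age as a second feature, and a cardiovascular risk score
rng = np.random.default_rng(0)
alcohol = rng.uniform(0, 20, size=200)   # drinks per week
age = rng.uniform(30, 70, size=200)      # years
risk = 0.5 * alcohol + 0.1 * age + rng.normal(0, 2, size=200)

# Ordinary least squares with an intercept
X = sm.add_constant(np.column_stack([alcohol, age]))
model = sm.OLS(risk, X).fit()

# Each coefficient is directly interpretable: the expected change in the
# risk score for a one-unit increase in that feature, holding others fixed
print(model.params)  # [intercept, alcohol, age]
```

The fitted coefficient for alcohol roughly recovers the 0.5 used in the simulation, which is exactly the kind of insight a statistician extracts from an interpretable model.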
Machine learning has a different modeling approach. It’s more task-driven and prediction-focused, and
the emphasis is on algorithms rather than distributions. Typically, machine learning produces more
complex models. Foundational work in machine learning began in the mid-20th century, and the field expanded further in the latter half of the century. Neural networks go back to the 1960s (Schmidhuber 2015), and rule-based machine learning, which is part of interpretable machine learning, has been an active research area since the middle of the 20th century. While not the main focus, interpretability has always been a concern in machine learning, and researchers have suggested ways to improve it: an example is the random forest (Breiman 2001), which came with a built-in feature importance measure.
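For instance, here is a minimal sketch with scikit-learn (an illustration, not an example from the book), using its bundled iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Small built-in dataset, used here only for illustration
data = load_iris()
X, y = data.data, data.target

# Fit a random forest; impurity-based feature importances come built in
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ sums to 1; higher values mean the feature did more
# to reduce impurity across the trees of the forest
for name, importance in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")
```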
Interpretable Machine Learning, or Explainable AI, has really exploded as a field around 2015 (Molnar,
Casalicchio, and Bischl 2020). In particular, the subfield of model-agnostic interpretability, which offers methods that work for any model, has gained a lot of attention. New methods for the interpretation of
machine learning models are still being published at breakneck speed. To keep up with everything that
is published would be madness and simply impossible. That’s why you will not find the most novel and
fancy methods in this book, but established methods and basic concepts of machine learning
interpretability. These basics prepare you for making machine learning models interpretable.
Internalizing the basic concepts also empowers you to better understand and evaluate any new paper
on interpretability published on the arXiv pre-print server in the 5 minutes since you began reading this book (I might be exaggerating the publication rate).
How to read the book
You don’t have to read the book cover to cover, since Interpretable Machine Learning is more of a
reference book in which most chapters describe one method. If you are new to interpretability, I recommend first reading the chapters on Interpretability, Goals, and Methods Overview to understand what interpretability is all about and to have a “map” on which you can place each method.
The book is organized into the following parts:
- The introductory chapters, including interpretability definitions and methods overview
- Interpretable models
- Local model-agnostic methods
- Global model-agnostic methods
- Methods for neural networks
- Outlook
- Machine learning terminology
Each method chapter follows a similar structure: The first paragraph summarizes the method, followed
by an intuitive explanation that doesn’t rely on math. Then we look into the theory of the method to get
a deeper understanding of how it works, including math and algorithms. I believe that a new method is
best understood using examples. Therefore, each method is applied to real data. Some people say that statisticians are very critical people; in my case this is true, as each chapter contains a critical discussion of the pros and cons of the respective interpretation method. This book is not an advertisement for the methods; rather, it should help you decide whether a method is a good fit for your project. In the last section of each chapter, I list available software implementations.
I hope you will enjoy the read!