
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 1, JANUARY 2023

Informed Machine Learning – A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems

Laura von Rueden, Sebastian Mayer, Katharina Beckh, Bogdan Georgiev, Sven Giesselbach, Raoul Heese, Birgit Kirsch, Julius Pfrommer, Annika Pick, Rajkumar Ramamurthy, Michal Walczak, Jochen Garcke, Christian Bauckhage, Member, IEEE, and Jannis Schuecker

Abstract—Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is
the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we
present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine
learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves
as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its
integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge
representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous
papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.

Index Terms—Machine learning, prior knowledge, expert knowledge, informed, hybrid, neuro-symbolic, survey, taxonomy

Laura von Rueden, Katharina Beckh, Bogdan Georgiev, Sven Giesselbach, Birgit Kirsch, Annika Pick, Rajkumar Ramamurthy, Christian Bauckhage, and Jannis Schuecker are with the Fraunhofer IAIS, Institute for Intelligent Analysis and Information Systems, 53757 Sankt Augustin, Germany. E-mail: [email protected].
Sebastian Mayer and Jochen Garcke are with the Fraunhofer SCAI, Institute for Algorithms and Scientific Computing, 53757 Sankt Augustin, Germany.
Raoul Heese and Michal Walczak are with the Fraunhofer ITWM, Institute for Industrial Mathematics, 67663 Kaiserslautern, Germany.
Julius Pfrommer is with the Fraunhofer IOSB, Institute for Optronics, System Technologies and Image Exploitation, 76131 Karlsruhe, Germany.
Manuscript received 5 Feb. 2020; revised 15 Mar. 2021; accepted 26 Apr. 2021. Date of publication 12 May 2021; date of current version 7 Dec. 2022. (Corresponding author: Laura von Rueden.) Recommended for acceptance by L. Chen. Digital Object Identifier no. 10.1109/TKDE.2021.3079836.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

1 INTRODUCTION

MACHINE learning has shown great success in building models for pattern recognition in domains ranging from computer vision [1] over speech recognition [2] and text understanding [3] to Game AI [4]. In addition to these classical domains, machine learning and in particular deep learning are increasingly important and successful in engineering and the sciences [5], [6], [7]. These success stories are grounded in the data-based nature of the approach of learning from a tremendous number of examples.

However, there are many circumstances where purely data-driven approaches can reach their limits or lead to unsatisfactory results. The most obvious scenario is that not enough data is available to train well-performing and sufficiently generalized models. Another important aspect is that a purely data-driven model might not meet constraints such as those dictated by natural laws, or given through regulatory or security guidelines, which are important for trustworthy AI [8]. With machine learning models becoming more and more complex, there is also a growing need for models to be interpretable and explainable [9].

These issues have led to increased research on how to improve machine learning models by additionally incorporating prior knowledge into the learning process. Although integrating knowledge into machine learning is common, e.g., through labelling or feature engineering, we observe a growing interest in the integration of more knowledge, and especially of further formal knowledge representations. For example, logic rules [10], [11] or algebraic equations [12], [13] have been added as constraints to loss functions. Knowledge graphs can enhance neural networks with information about relations between instances [14], which is of interest in image classification [15], [16]. Furthermore, physical simulations have been used to enrich training data [17], [18], [19]. This heterogeneity in approaches leads to some redundancy in nomenclature; for instance, we find terms such as physics-informed deep learning [20], physics-guided neural networks [12], or semantic-based regularization [21]. The recent growth of research activities shows that the combination of data- and knowledge-driven approaches becomes relevant in more and more areas. However, the growing number and increasing variety of research papers in this field motivates a systematic survey.

A recent survey synthesizes this into a new paradigm of theory-guided data science and points out the importance of enforcing scientific consistency in machine learning [22]. Even for support vector machines there exists a survey about the incorporation of knowledge into this formalism [23]. The fusion of symbolic and connectionist AI seems more and more approachable. In this regard, we refer to a recent survey on graph neural networks and a research direction framed as relational inductive bias [24].

Fig. 1. Information flow in informed machine learning. The informed machine learning pipeline requires a hybrid information source with two components: data and prior knowledge. In conventional machine learning, knowledge is used for data preprocessing and feature engineering, but this process is deeply intertwined with the learning pipeline (*). In contrast, in informed machine learning prior knowledge comes from an independent source, is given by formal representations (e.g., by knowledge graphs, simulation results, or logic rules), and is explicitly integrated.

Our work complements the aforementioned surveys by providing a systematic categorization of knowledge representations that are integrated into machine learning. We provide a structured overview based on a survey of a large number of research papers on how to integrate additional, prior knowledge into the machine learning pipeline. As an umbrella term for such methods, we henceforth use informed machine learning.

Our contributions are threefold: We propose an abstract concept for informed machine learning that clarifies its building blocks and relation to conventional machine learning. It states that informed learning uses a hybrid information source that consists of data and prior knowledge, which comes from an independent source and is given by formal representations. Our main contribution is the introduction of a taxonomy that classifies informed machine learning approaches, which is novel and the first of its kind. It contains the dimensions of the knowledge source, its representation, and its integration into the machine learning pipeline. We put a special emphasis on categorizing various knowledge representations, since this may enable practitioners to incorporate their domain knowledge into machine learning processes. Moreover, we present a description of available approaches and explain how different knowledge representations, e.g., algebraic equations, logic rules, or simulation results, can be used in informed machine learning.

Our goal is to equip potential new users of informed machine learning with established and successful methods. As we intend to survey a broad spectrum of methods in this field, we cannot describe all methodical details and we do not claim to have covered all available research papers. We rather aim to analyze and describe common grounds as well as the diversity of approaches in order to identify the main research directions in informed machine learning.

In Section 2, we begin with a formulation of our concept for informed machine learning. In Section 3, we describe how we classified the approaches in terms of our applied surveying methodology and our obtained key insights. Section 4 presents the taxonomy and its elements that we distilled from surveying a large number of research papers. In Section 5, we describe the approaches for the integration of knowledge into machine learning, classified according to the taxonomy, in more detail. After a brief historical account in Section 6, we finally discuss future directions in Section 7 and conclude in Section 8.

2 CONCEPT OF INFORMED MACHINE LEARNING

In this section, we present our concept of informed machine learning. We first state our notion of knowledge and then present our descriptive definition of its integration into machine learning.

2.1 Knowledge

The meaning of knowledge is difficult to define in general and is an ongoing debate in philosophy [25], [26], [27]. During the generation of knowledge, it first appears as useful information [28], which is subsequently validated. People validate information about the world using the brain's inner statistical processing capabilities [29], [30] or by consulting trusted authorities. Explicit forms of validation are given by empirical studies or scientific experiments [27], [31].

Here, we assume a computer-scientific perspective and understand knowledge as validated information about relations between entities in certain contexts. Regarding its use in machine learning, an important aspect of knowledge is its formalization. The degree of formalization depends on whether knowledge has been put into writing, how structured the writing is, and how formal and strict the language is that was used (e.g., natural language versus mathematical formula). The more formally knowledge is represented, the more easily it can be integrated into machine learning.

2.2 Integrating Prior Knowledge into Machine Learning

Apart from the usual information source in a machine learning pipeline, the training data, one can additionally integrate knowledge. If this knowledge is pre-existent and independent of learning algorithms, it can be called prior knowledge. Moreover, such prior knowledge can be given by formal representations, which exist in an external, separated way from the learning problem and the usual training data. Machine learning that explicitly integrates such knowledge representations will henceforth be called informed machine learning.

Definition. Informed machine learning describes learning from a hybrid information source that consists of data and prior knowledge. The prior knowledge comes from an independent source, is given by formal representations, and is explicitly integrated into the machine learning pipeline.

This notion of informed machine learning thus describes the flow of information in Fig. 1 and is distinct from conventional machine learning.

2.2.1 Conventional Machine Learning

Conventional machine learning starts with a specific problem for which there is training data. These are fed into the machine learning pipeline, which delivers a solution. Problems can typically be formulated as regression tasks where inputs X have to be mapped to outputs Y. Training data is generated or collected and then processed by algorithms, which try to approximate the unknown mapping. This pipeline comprises four main components, namely the training data, the hypothesis set, the learning algorithm, and the final hypothesis [32].

In traditional approaches, knowledge is generally used in the learning pipeline, however, mainly for training data preprocessing (e.g., labelling) or feature engineering. This kind of integration is involved and deeply intertwined with the whole learning pipeline, such as the choice of the hypothesis set or the learning algorithm, as depicted in Fig. 1. Hence, this knowledge is not really used as an independent source or through separated representations, but is rather used with adaption and as required.

2.2.2 Informed Machine Learning

The information flow of informed machine learning comprises an additional prior-knowledge integration and thus consists of two lines originating from the problem, as shown in Fig. 1. These involve the usual training data and additional prior knowledge. The latter exists independently of the learning task and can be provided in form of logic rules, simulation results, knowledge graphs, etc.

The essence of informed machine learning is that this prior knowledge is explicitly integrated into the machine learning pipeline, ideally via clear interfaces defined by the knowledge representations. Theoretically, this applies to each of the four components of the machine learning pipeline.

3 CLASSIFICATION OF APPROACHES

To comprehend how the concept of informed machine learning is implemented, we performed a systematic classification of existing approaches based on an extensive literature survey. Our goals are to uncover different methods, identify their similarities or differences, and to offer guidelines for users and researchers. In this section, we describe our classification methodology and summarize our key insights.

3.1 Methodology

The methodology of our classification is determined by specific analysis questions which we investigated in a systematic literature survey.

3.1.1 Analysis Questions

Our guiding question is how prior knowledge can be integrated into the machine learning pipeline. Our answers will particularly focus on three aspects: Since prior knowledge in informed machine learning consists of an independent source and requires some form of explicit representations, we consider knowledge sources and representations. Since it is also essential at which component of the machine learning pipeline what kind of knowledge is integrated, we also consider integration methods. In short, our literature survey addresses the following three questions:

1) Source: Which source of knowledge is integrated?
2) Representation: How is the knowledge represented?
3) Integration: Where in the learning pipeline is it integrated?

3.1.2 Literature Surveying Procedure

To systematically answer the above analysis questions, we surveyed a large number of publications describing informed machine learning approaches. We used a comparative and iterative surveying procedure that consisted of different cycles. In the first cycle, we inspected an initial set of papers and took notes as to how each paper answers our questions. Here, we observed that specific answers occur frequently, which then led to the idea of devising a classification framework in the form of a taxonomy. In the second cycle, we inspected an extended set of papers and classified them according to a first draft of the taxonomy. We then further refined the taxonomy to match the observations from the literature. In the third cycle, we re-inspected and re-sorted papers and, furthermore, expanded our set of papers. This resulted in an extensive literature basis in which all papers are classified according to the distilled taxonomy.

3.2 Key Insights

Next, we present an overview of key insights from our systematic classification. As a preview, we refer to Fig. 2, which visually summarizes our findings. A more detailed description of our findings will be given in Sections 4 and 5.

3.2.1 Taxonomy

Based on a comparative and iterative literature survey, we identified a taxonomy that we propose as a classification framework for informed machine learning approaches. Guided by the above analysis questions, the taxonomy consists of the three dimensions knowledge source, knowledge representation and knowledge integration. Each dimension contains a set of elements that represent the spectrum of different approaches found in the literature. This is illustrated in the taxonomy in Fig. 2.

Fig. 2. Taxonomy of informed machine learning. This taxonomy serves as a classification framework for informed machine learning and structures approaches according to the three above analysis questions about the knowledge source, knowledge representation and knowledge integration. Based on a comparative and iterative literature survey, we identified for each dimension a set of elements that represent a spectrum of different approaches. The size of the elements reflects the relative count of papers. We combine the taxonomy with a Sankey diagram in which the paths connect the elements across the three dimensions and illustrate the approaches that we found in the analyzed papers. The broader the path, the more papers we found for that approach. Main paths (at least four or more papers with the same approach across all dimensions) are highlighted in darker grey and represent central approaches of informed machine learning.

With respect to knowledge sources, we found three broad categories: rather specialized and formalized scientific knowledge, everyday life's world knowledge, and more intuitive expert knowledge. For scientific knowledge we found the most informed machine learning papers. With respect to knowledge representations, we found versatile and fine-grained approaches and distilled eight categories (algebraic equations, differential equations, simulation results, spatial invariances, logic rules, knowledge graphs, probabilistic relations and human feedback). Regarding knowledge integration, we found approaches for all stages of the machine learning pipeline, from the training data and the hypothesis set, over the learning algorithm, to the final hypothesis. However, most informed machine learning papers consider the two central stages.

Depending on the perspective, the taxonomy can be regarded from either one of two sides: An application-oriented user might prefer to read the taxonomy from left to right, starting with some given knowledge source and then selecting representation and integration. Vice versa, a method-oriented developer or researcher might prefer to read the taxonomy from right to left, starting with some given integration method. For both perspectives, knowledge representations are important building blocks and constitute an abstract interface that connects the application- and the method-oriented side.

3.2.2 Frequent Approaches

The taxonomy serves as a classification framework and allows us to identify frequent approaches of informed machine learning. In our literature survey, we categorized each research paper with respect to each of the three taxonomy dimensions.

Paths Through the Taxonomy. When visually highlighting and connecting them, a specific combination of entries across the taxonomy dimensions figuratively results in a path through the taxonomy. Such paths represent specific approaches towards informed learning and we illustrate this by combining the taxonomy with a Sankey diagram, as shown in Fig. 2. We observe that, while various paths through the taxonomy are possible, specific ones occur more frequently, and we will call them main paths. For example, we often observed the approach that scientific knowledge is represented in algebraic equations, which are then integrated into the learning algorithm, e.g., the loss function. As another example, we often found that world knowledge such as linguistics is represented by logic rules, which are then integrated into the hypothesis set, e.g., the network architecture. These paths, especially the main paths, can be used as a guideline for users new to the field or provide a set of baseline methods for researchers.

Paths From Source to Representation. We found that the paths from source to representation form groups. That is, for every knowledge source there appear prevalent representation types. Scientific knowledge is mainly represented in terms of algebraic or differential equations or exists in the form of simulation results. While other forms of representation are possible, too, there is a clear preference for equations or simulations, likely because most sciences aim at finding natural laws encoded in formulas. For world knowledge, the representation forms of logic rules, knowledge graphs, or spatial invariances are the primary ones. These can be understood as a group of symbolic representations. Expert knowledge is mainly represented by probabilistic relations or human feedback. This appears reasonable because such representations allow for informality as well as for a degree of uncertainty, both of which might be useful for representing intuition. We also performed an additional analysis on the dependency of the learning task and found a confirmation of the above described representation groups, as shown in Fig. 3.

Fig. 3. Knowledge representations and learning tasks.

From a theoretical point of view, transformations between representations are possible and indeed often apparent within the aforementioned groups. For example, equations can be transformed to simulation results, or logic rules can be represented as knowledge graphs and vice versa. Nevertheless, from a practical point of view, differentiating between forms of representations appears useful as specific representations might already be available in a given setup.

Paths From Representation to Integration. For most of the representation types we found at least one main path to an integration type. The following mappings can be observed. Simulation results are very often integrated into the training data. Knowledge graphs, spatial invariances, and logic rules are frequently incorporated into the hypothesis set. The learning algorithm is mainly enhanced by algebraic or differential equations, logic rules, probabilistic relations, or human feedback. Lastly, the final hypothesis is often checked by knowledge graphs or also by simulation results. However, since we observed various possible types of integration for all representation types, the integration still appears to be problem specific.

Hence, we additionally analyzed the literature for the goal of the prior knowledge integration and found four main goals: data efficiency, accuracy, interpretability, or knowledge conformity. Although these goals are interrelated or even partially equivalent according to statistical learning theory, it is interesting to examine them as different motivations for the chosen approach. The distribution of goals for the distinct integration types is shown in Fig. 4. We observe that the main goal is always to achieve better performance. The integration of prior knowledge into the training data stands out, because its main goal is to train with less data. The integration into the final hypothesis is also special, because it is mainly used to ensure knowledge conformity for secure and trustworthy AI. All in all, this distribution suggests suitable integration approaches depending on the goal.

Fig. 4. Knowledge integration and its goals.

4 TAXONOMY

In this section, we describe the informed machine learning taxonomy that we distilled as a classification framework in our literature survey. For each of the three taxonomy dimensions knowledge source, knowledge representation and knowledge integration we describe the found elements, as shown in Fig. 2. While an extensive approach categorization according to this taxonomy with further concrete examples will be presented in the next section (Section 5), we here describe the taxonomy on a more conceptual level.

4.1 Knowledge Source

The category knowledge source refers to the origin of prior knowledge to be integrated in machine learning. We observe that the source of prior knowledge can be an established knowledge domain but also knowledge from an individual group of people with respective experience.

We find that prior knowledge often stems from the sciences or is a form of world or expert knowledge, as illustrated on the left in Fig. 2. This list is neither complete nor disjoint but is intended to show a spectrum from more formal to less formal, or explicitly to implicitly validated knowledge. Although particular knowledge can be assigned to more than one of these sources, the goal of this categorization is to identify paths in our taxonomy that describe frequent approaches of knowledge integration into machine learning. In the following we briefly describe each of the knowledge sources.

Scientific Knowledge. We subsume the subjects of science, technology, engineering, and mathematics under scientific knowledge. Such knowledge is typically formalized and validated explicitly through scientific experiments. Examples are the universal laws of physics, bio-molecular descriptions of genetic sequences, or material-forming production processes.

TABLE 1
Illustrative Overview of Knowledge Representations in the Informed Machine Learning Taxonomy

Each representation type is illustrated by a simple or prominent example in order to give a first intuitive understanding.

World Knowledge. By world knowledge we refer to facts from everyday life that are known to almost everyone and can thus also be called general knowledge. It can be more or less formal. Generally, it can be intuitive and validated implicitly by humans reasoning in the world surrounding them. Therefore, world knowledge often describes relations of objects or concepts appearing in the world perceived by humans, for instance, the fact that a bird has feathers and can fly. Moreover, we also subsume linguistics under world knowledge. Such knowledge can also be explicitly validated through empirical studies. Examples are the syntax and semantics of language.

Expert Knowledge. We consider expert knowledge to be knowledge that is held by a particular group of experts. Within the expert's community it can also be called common knowledge. Such knowledge is rather informal and needs to be formalized, e.g., with human-machine interfaces. It is also validated implicitly through a group of experienced specialists. In the context of cognitive science, this expert knowledge can also become intuitive [29]. For example, an engineer or a physician acquires knowledge over several years of experience working in a specific field.

4.2 Knowledge Representation

The category knowledge representation describes how knowledge is formally represented. With respect to the flow of information in informed machine learning in Fig. 1, it directly corresponds to our key element of prior knowledge. This category constitutes the central building block of our taxonomy, because it determines the potential interface to the machine learning pipeline.

In our literature survey, we frequently encountered certain representation types, as listed in the taxonomy in Fig. 2 and illustrated more concretely in Table 1. Our goal is to provide a classification framework of informed machine learning approaches including the used knowledge representation types. Although some types can be mathematically transformed into each other, we keep the representations that are closest to those in the reviewed literature. Here we give a first conceptual overview over these types.

Algebraic Equations. Algebraic equations represent knowledge as equality or inequality relations between mathematical expressions consisting of variables or constants. Equations can be used to describe general functions or to constrain variables to a feasible set and are thus sometimes also called algebraic constraints. Prominent examples in Table 1 are the equation for the mass-energy equivalence and the inequality stating that nothing can travel faster than the speed of light in vacuum.

Differential Equations. Differential equations are a subset of algebraic equations, which describe relations between functions and their spatial or temporal derivatives. Two famous examples in Table 1 are the heat equation, which is a partial differential equation (PDE), and Newton's second law, which is an ordinary differential equation (ODE). In both cases, there exists a (possibly empty) set of functions that solve the differential equation for given initial or boundary conditions. Differential equations are often the basis of a numerical computer simulation. We distinguish the taxonomy categories of differential equations and simulation results in the sense that the former represents a compact mathematical model while the latter represents unfolded, data-based computation results.

Simulation Results. Simulation results describe the numerical outcome of a computer simulation, which is an approximate imitation of the behavior of a real-world process. A simulation engine typically solves a mathematical model using numerical methods and produces results for situation-specific parameters. Its numerical outcome is the simulation result that we describe here as the final knowledge representation. Examples are the flow field of a simulated fluid or pictures of simulated traffic scenes.

Spatial Invariances. Spatial invariances describe properties that do not change under mathematical transformations such as translations and rotations. If a geometric object is invariant under such transformations, it has a symmetry (for example, a rotationally symmetric triangle). A function can be called invariant if it has the same result for a symmetric transformation of its argument. Connected to invariance is the property of equivariance.

Logic Rules. Logic provides a way of formalizing knowledge about facts and dependencies and allows for translating ordinary language statements (e.g., IF A THEN B) into formal logic rules (A ⇒ B). Generally, a logic rule consists of a set of Boolean expressions (A, B) combined with logical connectives (∧, ∨, ⇒, ...). Logic rules can also be called logic constraints or logic sentences.

Knowledge Graphs. A graph is a pair (V, E), where V are its vertices and E denotes its edges. In a knowledge graph, vertices (or nodes) usually describe concepts whereas edges represent (abstract) relations between them (as in the example "Man wears shirt" in Table 1). In an ordinary weighted graph, edges quantify the strength and the sign of a relationship between nodes.

Probabilistic Relations. The core concept of probabilistic relations is a random variable X from which samples x can be drawn according to an underlying probability distribution P(X). Two or more random variables X, Y can be interdependent with joint distribution (x, y) ∼ P(X, Y). Prior knowledge could be assumptions on the conditional independence or the correlation structure of random variables or even a full description of the joint probability distributions.

Human Feedback. Human feedback refers to technologies that transform knowledge via direct interfaces between users and machines. The choice of input modalities determines the way information is transmitted. Typical modalities include keyboard, mouse, and touchscreen, followed by speech and computer vision, e.g., tracking devices for motion capturing. In theory, knowledge can also be transferred directly via brain signals using brain-computer interfaces.

4.3 Knowledge Integration

The category knowledge integration describes where the knowledge is integrated into the machine learning pipeline. Our literature survey revealed that integration approaches can be structured according to the four components of training data, hypothesis set, learning algorithm, and final hypothesis. Though we present these approaches more thoroughly in Section 5, the following gives a first conceptual overview.

Training Data. A standard way of incorporating knowledge into machine learning is to embody it in the underlying training data. Whereas a classic approach in traditional machine learning is feature engineering, where appropriate features are created from expertise, an informed approach according to our definition is the use of hybrid information in terms of the original data set and an additional, separate source of prior knowledge. This separate source of prior knowledge allows to accumulate information and therefore can create a second data set, which can then be used together with, or in addition to, the original training data. A prominent approach is simulation-assisted machine learning where the training data is augmented through simulation results.

Hypothesis Set. Integrating knowledge into the hypothesis set is common, say, through the definition of a neural network's architecture and hyper-parameters. For example, a convolutional neural network applies knowledge as to location and translation invariance of objects in images. More generally, knowledge can be integrated by choosing the model structure. A notable example is the design of a network architecture considering a mapping of knowledge elements, such as symbols of a logic rule, to particular neurons.

Learning Algorithm. Learning algorithms typically involve a loss function that can be modified according to additional knowledge, e.g., by designing an appropriate regularizer. A typical approach of informed machine learning is that prior knowledge in form of algebraic equations, for example laws of physics, is integrated by means of additional loss terms.

Final Hypothesis. The output of a learning pipeline, i.e., the final hypothesis, can be benchmarked or validated against existing knowledge. For example, predictions that do not agree with known constraints can be discarded or marked as suspicious so that results are consistent with prior knowledge.

5 DESCRIPTION OF INTEGRATION APPROACHES

In this section, we give a detailed account of the informed machine learning approaches we found in our literature survey. We will focus on methods and therefore structure our presentation according to knowledge representations. This is motivated by the assumption that similar representations are integrated into machine learning in similar ways as they form the mathematical basis for the integration. Moreover, the representations combine both the application- and the method-oriented perspective as described in Section 3.2.1.

For each knowledge representation, we describe the informed machine learning approaches in a separate subsection and present the observed (paths from) knowledge source and the observed (paths to) knowledge integration. We describe each dimension along its entities starting with the main path entity, i.e., the one we found in most papers.

This whole section refers to Tables 2 and 3, which list the paper references sorted according to our taxonomy.

5.1 Algebraic Equations

The main path for algebraic equations that we found in our literature survey comes from scientific knowledge and goes into the learning algorithm, but also other integration types are possible.

5.1.1 (Paths From) Knowledge Source

Algebraic equations are mainly used to represent formalized scientific knowledge, but may also be used to express more intuitive expert knowledge.

Scientific Knowledge. We observed that algebraic equations are used in machine learning in various domains of natural sciences and engineering, particularly in physics [12], [13], [33], [34], [35], but also in biology [36], [37], robotics [38], or manufacturing and production processes [34], [39].

Three representative examples are the following: The trajectory of objects can be described with kinematic laws, e.g., the position y of a falling object can be described as a function of time t, namely y(t) = y0 + v0·t + ½a·t². Such knowledge from Newtonian mechanics can be used to improve object detection and tracking in videos [13]. Or, the proportionality of two variables can be expressed via inequality constraints, for example, that the water density ρ at two different depths d1 < d2 in a lake must obey ρ(d1) ≤ ρ(d2), which can be used in water temperature prediction [12]. Furthermore, for the prediction of key performance indicators in production processes, relations between control parameters (e.g., voltage, pulse duration) and intermediate observables (e.g., current density) are known to influence outcomes and can be expressed as linear equations derived from principles of physical chemistry [34].
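To make the tracking example concrete, the following minimal sketch (Python/NumPy; the function name and interface are our own illustration, not code from [13]) measures how consistently a sequence of predicted object heights follows the kinematic law, by fitting the free-fall parabola to the predictions and reporting the residual:

    import numpy as np

    def kinematics_inconsistency(t, y_pred):
        # Label-free check in the spirit of [13]: fit the predictions to
        # y(t) = y0 + v0*t + 0.5*a*t^2 and measure the deviation from the
        # best-fitting parabola. Zero means perfectly physical trajectories.
        A = np.stack([np.ones_like(t), t, 0.5 * t ** 2], axis=1)
        params, *_ = np.linalg.lstsq(A, y_pred, rcond=None)  # y0, v0, a
        return np.mean((A @ params - y_pred) ** 2)

Such a measure can serve as a training penalty or, as discussed below, as a consistency check on the final hypothesis.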

Expert Knowledge. An example for the representation of expert knowledge is to define valid ranges of variables according to experts' intuition as approximation constraints [33] or monotonicity constraints [39].

Insert 1: Knowledge-Based Loss Term
When learning a function f* from data (x_i, y_i), where the x_i are input features and the y_i are labels, a knowledge-based loss term L_k can be built into the objective function [10], [12]:

    f^* = \arg\min_f \Big( \lambda_l \sum_i L(f(x_i), y_i) + \lambda_r R(f) + \lambda_k L_k(f(x_i), x_i) \Big)    (1)

Whereas L is the usual label-based loss and R is a regularization function, L_k is the knowledge-based term that quantifies the violation of given prior-knowledge equations. The parameters λ_l, λ_r and λ_k determine the weight of the terms. Note that L_k only depends on the input features x_i and the learned function f and thus offers the possibility of label-free supervision [13].

5.1.2 (Paths to) Knowledge Integration

We observe that a frequent way of integrating equation-based knowledge into machine learning is via the learning algorithm. The integration into the other stages is possible, too, and we describe the approaches here ordered by their occurrence.

Learning Algorithm. Algebraic equations and inequations can be integrated into learning algorithms via additional loss terms [12], [13], [33], [35] or, more generally, via constrained problem formulation [36], [37], [39].

The integration of algebraic equations as knowledge-based loss terms into the learning objective function is detailed in Insert 1. These knowledge-based terms measure potential inconsistencies w.r.t., say, physical laws [12], [13]. Such an extended loss is usually called physics-based or hybrid loss and fosters the learning from data as well as from prior knowledge. Beyond measuring inconsistencies with exact formulas, inconsistencies with approximation ranges or general monotonicity constraints, too, can be quantified via rectified linear units [33].
formulas, inconsistencies with approximation ranges or gen-
describe traffic density behavior. Advection-diffusion equa-
eral monotonicity constraints, too, can be quantified via recti-
fied linear units [33]. tions [44] are used in oceanography to model the evolution
As a further approach, support vector machines can incor- of sea surface temperatures. The Schr€ odinger equation stud-
porate knowledge by relaxing the optimization problem into ied in [20] describes quantum mechanical phenomena such
a linear minimization problem to which constraints are added as wave propagation in optical fibres or the behavior of
in form of linear inequalities [36]. Similarly, it is possible to Bose-Einstein condensates.
relax the optimization problem behind certain kernel-based
approximation methods to constrain the behavior of a regres- 5.2.2 (Paths to) Knowledge Integration
sor or classifier in a possibly nonlinear region of the input Regarding the integration of differential equations, our sur-
domain [37]. vey particularly focuses on the integration into neural net-
Hypothesis Set. An alternative approach is the integra- work models.
tion into the hypothesis set. In particular, algebraic equa- Learning Algorithm. A neural network can be trained
tions can be translated into the architecture of neural to approximate the solution of a differential equation. To
networks [34], [38], [40]. One idea is to sequence predefined this end, the governing differential equation is integrated
operations leading to a functional decomposition [40]. More into the loss function similar to Equation (1) [45]. This
622 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 35, NO. 1, JANUARY 2023

This requires evaluating derivatives of the network with respect to its inputs, for example, via automatic differentiation, an approach that was recently adapted to deep learning [20]. This ensures the physical plausibility of the neural network output. An extension to generative models is possible, too [43]. Finally, probabilistic models can also be trained by minimizing the distance between the model conditional density and the Boltzmann distribution dictated by a differential equation and boundary conditions [46].
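The following minimal sketch (PyTorch) illustrates this loss construction for a deliberately simple linear decay ODE, du/dt = -k·u; it is a toy stand-in for, not a reproduction of, the Burgers or Schrödinger setups in [20]:

    import torch

    def ode_residual_loss(net, t, k=1.0):
        # Evaluate the network at collocation points t and differentiate
        # it with respect to its input via automatic differentiation.
        t = t.requires_grad_(True)
        u = net(t)
        du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                                    create_graph=True)[0]
        residual = du_dt + k * u          # zero iff the ODE is satisfied
        return torch.mean(residual ** 2)  # added to the data loss as in Eq. (1)

    # Usage sketch: net = torch.nn.Sequential(torch.nn.Linear(1, 32),
    #     torch.nn.Tanh(), torch.nn.Linear(32, 1)); call with t of shape (N, 1).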
Hypothesis Set. In many applications, differential
equations contain unknown time- and space-dependent Insert 2: Simulation Results as Synthetic Tr. Data
parameters. Neural networks can model the behavior of The results from a simulation can be used as synthetic
such parameters, which then leads to hybrid architec- training data (see Fig. 5) and can thus augment the origi-
tures where the functional form of certain components is nal, real training data. Some papers that follow this
analytically derived from (partially) solving differential approach are [12], [18], [19], [59], [64], [65], [67].
equations [44], [47], [48]. In other applications, one faces
the problem of unknown mappings from input data to
quantities whose dynamics are governed by known dif-
ferential equations, usually called system states. Here,
neural networks can learn a mapping from observed 5.3.2 (Paths to) Knowledge Integration
data to system states [49]. This also leads to hybrid We find that the integration of simulation results into
architectures with knowledge-based modules, e.g., in machine learning is most often happens via the augmenta-
form of a physics engine. tion of training data. Other approaches that occur fre-
quently are the integration into the hypothesis set or the
final hypothesis.
5.3 Simulation Results Training Data. The integration of simulation results into
Simulation results are also a prominent knowledge repre- training data [12], [18], [19], [59], [64], [65], [67] depends on
sentation in informed machine learning. They mainly come how the simulated, i.e., synthetic, data is combined with the
from scientific knowledge and are used to extend the train- real-world measurements:
ing data. First, additional input features are simulated and, together
with real data, form input features. For example, original fea-
tures can be transformed by multiple approximate simulations
and the similarity of the simulation results can be used to build
a kernel [59].
Second, additional target variables are simulated and
added to the real data as another feature. This way the model
does not necessarily learn to predict targets, e.g., an underlying
physical process, but rather the systematic discrepancy
between simulated and the true target data [12].
5.3.1 (Paths From) Knowledge Source Third, additional target variables are simulated and used as
Computer simulations have a long tradition in many areas synthetic labels, which is of particular use when the original
of the sciences. While they are also gaining popularity in experiments are very expensive [19]. This approach can also be
other domains, most works on integrating simulation realized with physics engines, for example, pre-trained neural
results into machine learning deal with natural sciences and networks can be tailored towards an application through addi-
engineering. tional training on simulated data [64]. Synthetic training data
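The third variant can be sketched as follows (Python/NumPy; `simulate` is a stand-in for any domain-specific simulation engine and is an assumption of this illustration):

    import numpy as np

    def build_hybrid_training_set(x_real, y_real, simulate, n_synth=1000):
        # Sample new inputs in the observed range and label them with the
        # simulation instead of expensive real experiments.
        x_synth = np.random.uniform(x_real.min(), x_real.max(),
                                    size=(n_synth,) + x_real.shape[1:])
        y_synth = np.array([simulate(x) for x in x_synth])
        # Hybrid information source: real and simulated points side by side.
        return (np.concatenate([x_real, x_synth]),
                np.concatenate([y_real, y_synth]))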
In informed machine learning, training data thus stems from a hybrid information source and contains both simulated and real data points (see Insert 2). The gap between the synthetic and the real domain can be narrowed via adversarial networks such as SimGAN. These improve the realism of, say, synthetic images and can generate large annotated data sets by simulation [67]. The SPIGAN framework goes one step further and uses additional, privileged information from internal data structures of the simulation in order to foster unsupervised domain adaptation of deep networks [18].

Hypothesis Set. Another approach we observed integrates simulation results into the hypothesis set [60], [68], [69], which is of particular interest when dealing with low-fidelity

TABLE 2
References Classified by Knowledge Representation and (Path From) Knowledge Source

TABLE 3
References Classified by Knowledge Representation and (Path to) Knowledge Integration

simulations. These are simplified simulations that approximate the overall behaviour of a system but ignore intricate details for the sake of computing speed.

When building a machine learning model that reflects the actual, detailed behaviour of a system, low-fidelity simulation results or a response surface (a data-driven model of the simulation results) can be built into the architecture of a knowledge-based neural network (KBANN [53], see Insert 3), e.g., by replacing one or more neurons. This way, parts of the network can be used to learn a mapping from low-fidelity simulation results to a few real-world observations or high-fidelity simulations [60], [69].

Learning Algorithm. Furthermore, a simulation can directly be integrated into iterations of a learning algorithm. For example, a realistic positioning of objects in a 3D scene can be improved by incorporating feedback from a solid-body simulation into learning [66]. By means of reinforcement learning, this is even feasible if there are no gradients available from the simulation.

Final Hypothesis. A last but important approach that we found in our survey integrates simulation results into the final hypothesis of a machine learning model. Specifically, simulations can validate results of a trained model [19], [61], [66].

5.4 Spatial Invariances

Next, we describe informed machine learning approaches involving the representation type of spatial invariances. Their main path comes from world knowledge and goes to the hypothesis set.

5.4.1 (Paths From) Knowledge Source

We mainly found references using spatial invariances in the context of world knowledge or scientific knowledge.

Fig. 6. Steps of rules-to-network translation [53]. Simple example for integrating rules into a KBANN.

Insert 3: Knowledge-Based Artificial Neural Networks (KBANNs)
Rules can be integrated into neural architectures by mapping the rule's components to the neurons and weights with these steps [53] (see Fig. 6):
1) Get rules. If needed, rewrite them to have a hierarchical structure.
2) Map rules to a network architecture. Construct (positively/negatively) weighted links for (existing/negated) dependencies.
3) Add nodes. These are not given through the initial rule set and represent hidden units.
4) Perturb the complete set of weights.
After the KBANN's architecture is built, the network is refined with learning algorithms.

World Knowledge. Knowledge about invariances may fall into the category of world knowledge, for example when modeling facts about local or global pixel correlations in images [72]. Indeed, invariants are often used in image recognition where many characteristics are invariant under metric-preserving transformations. For example, in object recognition, an object should be classified correctly independent of its rotation in an image.

Scientific Knowledge. In physics, Noether's theorem states that certain symmetries (invariants) lead to conserved quantities (first integrals) and thus integrate Hamiltonian systems or equations of motion [52], [50]. For example, in equations modeling planetary motion, the angular momentum serves as such an invariant.

5.4.2 (Paths to) Knowledge Integration

In most references we found spatial invariances informing the hypothesis set.

Hypothesis Set. Invariances from physical laws can be integrated into the architecture of a neural network. For example, invariant tensor bases can be used to embed Galilean invariance for the prediction of fluid anisotropy tensors [50], or the physical Minkowski metric that reflects mass invariance can be integrated via a Lorentz layer into a neural network [51].

A recent trend is to integrate knowledge as spatial invariances into the architecture or layout of convolutional neural networks, which leads to so-called geometric deep learning [111]. A natural generalization of CNNs are group equivariant CNNs (G-CNNs) [70], [71], [74]. G-convolutions provide a higher degree of weight sharing and expressiveness. Simply put, the idea is to define filters based on a more general group-theoretic convolution. Another approach towards rotation invariance in image recognition considers harmonic network architectures, where a certain response entanglement (arising from features that rotate at different frequencies) is resolved [75]. The goal is to design CNNs that exhibit equivariance to patch-wise translation and rotation by replacing conventional CNN filters with circular harmonics.

In support vector machines, invariances under group transformations and prior knowledge about locality can be incorporated by the construction of appropriate kernel functions [72]. In this context, local invariance is defined in terms of a regularizer that penalizes the norm of the derivative of the decision function [23].

Training Data. An early example of integrating knowledge as invariances into machine learning is the creation of virtual examples [76], and it has been shown that data augmentation through virtual examples is mathematically equivalent to incorporating prior knowledge via a regularizer. A similar approach is the creation of meta-features [82]. For instance, in turbulence modelling using the Reynolds stress tensor, a feature can be created that is rotational, reflectional and Galilean invariant [52]. This is achieved by selecting features fulfilling rotational and Galilean symmetries and augmenting the training data to ensure reflectional invariance.

5.5 Logic Rules

Logic rules play an important role for the integration of prior knowledge into machine learning. In our literature survey, we mainly found the source of world knowledge and the two integration paths into the hypothesis set and the learning algorithm.

5.5.1 (Path From) Knowledge Source

Logic rules can formalize knowledge from various sources, but the most frequent is world knowledge. Here we give some illustrative examples.

World Knowledge. Logic rules often describe knowledge about real-world objects [10], [11], [13], [77], [78] such as seen in images. This can focus on object properties, such as for animals x that (FLY(x) ∧ LAYEGGS(x) ⇒ BIRD(x)) [10]. It can also focus on relations between objects such as the co-occurrence of characters in game scenes, e.g., (PEACH ⇒ MARIO) [13].

Another knowledge domain that can be well represented by logic rules is linguistics [83], [84], [85], [90], [91], [112], [113]. Linguistic rules can consider the sentiment of a sentence (e.g., if a sentence consists of two sub-clauses connected with a 'but', then the sentiment of the clause after the 'but' dominates [85]); or the order of tags in a given word sequence (e.g., if a given text element is a citation, then it can only start with an author or editor field [83]). Rules can also describe dependencies in social networks. For example, on a scientific research platform, it can be observed that authors citing each other tend to work in the same field (Cite(x, y) ∧ hasFieldA(x) ⇒ hasFieldA(y)) [21].

5.5.2 (Path to) Knowledge Integration

We observe that logic rules are integrated into learning mainly in the hypothesis set or, alternatively, in the learning algorithm.

Hypothesis Set. Integration into the hypothesis set comprises both deterministic and probabilistic approaches. The former include neural-symbolic systems, which use rules as the basis for the model structure [53], [54], [89]. In Knowledge-Based Artificial Neural Networks (KBANNs), the architecture is constructed from symbolic rules by mapping the components of propositional rules to network components [53], as further explained in Insert 3. Extensions are available that also output a revised rule set [54] or also consider first-order logic [89]. A recent survey about neural-symbolic computing [114] summarizes further methods.
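For illustration, a minimal sketch of steps 2) and 4) of Insert 3 for the propositional rule "IF A AND NOT B THEN C" (Python/NumPy; the general recipe follows [53], but the concrete weight and bias values are our own):

    import numpy as np

    W = 4.0                  # strength of a knowledge-given link
    w_c = np.array([W, -W])  # positive link from A, negative link from B
    bias_c = -W / 2.0        # unit fires only if A is active and B is not

    def rule_unit(a, b):
        # Forward pass of the rule-encoding neuron (before refinement).
        activation = w_c @ np.array([a, b]) + bias_c
        return 1.0 / (1.0 + np.exp(-activation))  # sigmoid unit

    w_c = w_c + np.random.normal(scale=0.1, size=w_c.shape)  # step 4: perturb

After this initialization, the network is trained on data as usual, so the rule acts as a refinable prior rather than a hard constraint.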
Integrating logic rules into the hypothesis set in a proba- networks can be improved by using knowledge
bilistic manner is yet another approach [77], [78], [90], [91]. graphs that reflect relations between detected objects.
These belong to the research direction of statistical relational Technically, such relations form adjacency matrices in
learning [115]. Corresponding frameworks provide a logic gated graph neural networks [15] (see Fig. 7). During
templating language to define a probability distribution the detection, the network graph is propagated, start-
over a set of random variables. Two prominent frameworks ing with detected nodes and then expanding to neigh-
are markov logic networks [77], [90] and probabilistic soft bors [24].
logic [78], [91], which translate a set of first-order logic rules
to a markov random field. Each rule specifies dependencies
between random variables and serves as a template for so
called potential functions, which assign probability mass to World Knowledge. Since humans perceive the world
joint variable configurations. as composed of entities, graphs are often used to repre-
Learning Algorithm. The integration of logic rules into sent relations between visual entities. For example, the
the learning algorithm is often accomplished via additional, Visual Genome knowledge graph is build from human
semantic loss terms [10], [11], [13], [21], [83], [84], [85]. These annotations of object attributes and relations between
augment the objective function similar to the knowledge- objects in natural images [15], [16]. Similarly, the MIT
based loss terms explained above. However, for logic rules, ConceptNet [116] encompasses concepts of everyday life
the additional loss terms evaluate a functional that trans- and their relations automatically built from text data. In
forms rules into continuous and differentiable constraints, natural language processing, knowledge graphs often
for example via the t-norm [10]. Semantic loss functions can represent knowledge about relations among concepts,
also be derived from first principles using a set of axi- which can be referred to by words. For example, Word-
oms [11]. As a specific approach for student-teacher archi- Net [117] represents semantic and lexical relations of
tectures, the rules can be first integrated in a teacher words such as synonymy. Such knowledge graphs are
network and can then be used by a student network that is often used for information extraction in natural language
trained by minimizing a semantic loss term that measures processing, but information extraction can also be used
the imitation of the teacher network [84], [85]. to build new knowledge graphs [118].
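To make this rule-to-loss transformation concrete, the following minimal sketch (our illustration, not code from the surveyed papers) relaxes the rule FLY(x) ∧ LAYEGGS(x) ⇒ BIRD(x) from Section 5.5.1 into a differentiable penalty, using the product t-norm for the conjunction and the Reichenbach operator 1 - a + a*b as one common choice for the implication.

```python
import torch

def rule_consistency_loss(p_fly, p_layeggs, p_bird):
    """Soft truth of FLY(x) AND LAYEGGS(x) => BIRD(x); loss = 1 - truth degree."""
    antecedent = p_fly * p_layeggs                    # conjunction via product t-norm
    truth = 1.0 - antecedent + antecedent * p_bird    # relaxed implication
    return (1.0 - truth).mean()                       # penalize rule violations

# Usage: the p_* values are sigmoid outputs of a classifier for a batch of animals;
# the (weighted) term is added to the usual data loss during training.
p_fly, p_layeggs, p_bird = (torch.rand(8) for _ in range(3))
loss = rule_consistency_loss(p_fly, p_layeggs, p_bird)
```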
5.6 Knowledge Graphs
The taxonomy paths we observed in our literature survey that are related to this knowledge representation are illustrated in the following graphic.

5.6.1 (Paths From) Knowledge Source
Since graphs are very versatile modeling tools, they can represent various kinds of structured knowledge. Typically, they are constructed from databases; however, the most frequent source we found in informed machine learning papers is world knowledge.
World Knowledge. Since humans perceive the world as composed of entities, graphs are often used to represent relations between visual entities. For example, the Visual Genome knowledge graph is built from human annotations of object attributes and relations between objects in natural images [15], [16]. Similarly, the MIT ConceptNet [116] encompasses concepts of everyday life and their relations, automatically built from text data. In natural language processing, knowledge graphs often represent knowledge about relations among concepts, which can be referred to by words. For example, WordNet [117] represents semantic and lexical relations of words such as synonymy. Such knowledge graphs are often used for information extraction in natural language processing, but information extraction can also be used to build new knowledge graphs [118].
Scientific Knowledge. In physics, graphs can immediately describe physical systems such as spring-coupled masses [14]. In medicine, networks of gene-protein interactions describe biological pathway information [55] and the hierarchical nature of medical diagnoses is captured by classification systems such as the International Classification of Diseases (ICD) [56], [63].

Fig. 7. Illustrative application example of using neural networks and knowledge graphs for image classification, similar as in [15]. The image (from the COCO dataset) shows a pedestrian crosswalk.

Insert 4: Integrating Knowledge Graphs in CNNs for Image Classification
Image classification through convolutional neural networks can be improved by using knowledge graphs that reflect relations between detected objects. Technically, such relations form adjacency matrices in gated graph neural networks [15] (see Fig. 7). During the detection, the network graph is propagated, starting with detected nodes and then expanding to neighbors [24].
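The propagation described in Insert 4 can be sketched in a few lines. The following simplified, non-gated update is our illustration (the actual model in [15] uses gated recurrent units); it shows how an adjacency matrix from a knowledge graph mixes information between node states.

```python
import torch

def propagation_step(h, adj, w):
    """One simplified message-passing step over a knowledge graph.

    h: (num_nodes, dim) node states, e.g., initialized from detector scores;
    adj: (num_nodes, num_nodes) adjacency matrix of the knowledge graph;
    w: (dim, dim) learnable transformation."""
    msg = adj @ (h @ w)          # aggregate transformed neighbor states
    return torch.tanh(h + msg)   # non-gated update for brevity

# Usage: repeating the step lets information spread to graph neighbors.
h, w = torch.rand(5, 8), torch.rand(8, 8)
adj = (torch.rand(5, 5) > 0.5).float()
for _ in range(3):
    h = propagation_step(h, adj, w)
```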

5.6.2 (Paths to) Knowledge Integration
In our survey, we observed the integration of knowledge graphs in all four components of the machine learning pipeline, but most prominently in the hypothesis set.
Hypothesis Set. The fact that the world consists of interrelated objects can be integrated by altering the hypothesis set. Graph neural networks operate on graphs and thus feature an object- and relation-centric bias in their architecture [24]. A recent survey [24] gives an overview over this field and explicitly names this knowledge integration relational inductive bias. This bias is of benefit, e.g., for learning physical dynamics [14], [62] or object detection [16].
In addition, graph neural networks allow for the explicit integration of a given knowledge graph as a second source of information. This allows for multi-label classification in natural images, where inference about a particular object is facilitated by using relations to other objects in an image [15] (see Insert 4). More generally, a graph reasoning layer can be inserted into any neural network [81]. The main idea is to enhance representations in a given layer by propagating through a given knowledge graph.
Another approach is to use attention mechanisms on a knowledge graph in order to enhance features. In natural language analysis, this facilitates the understanding as well as the generation of conversational text [79]. Similarly, graph-based attention mechanisms are used to counteract too few data points by using more general categories [63]. Also, attention on related knowledge graph embeddings can support the training of word embeddings like ERNIE [86], which are fed into language models like BERT [94], [119].
Training Data. Another prominent approach is distant supervision, where information in a graph is used to automatically annotate texts to train natural language processing systems. This was originally done naïvely by considering each sentence that matches related entities in a graph as a training sample [80]; however, recently, attention-based networks have been used to reduce the influence of noisy training samples [120].
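The naive labeling heuristic of [80] is easy to sketch; the corpus, triples, and relation names below are hypothetical and only for illustration.

```python
# Toy knowledge graph triples and corpus (hypothetical examples).
triples = {("Berlin", "capital_of", "Germany")}
sentences = ["Berlin is the capital of Germany.", "Berlin hosts many museums."]

def distant_labels(sentences, triples):
    """Naive distant supervision: any sentence mentioning both entities of a
    triple is labeled with that triple's relation (noisy by construction)."""
    labeled = []
    for sentence in sentences:
        for head, relation, tail in triples:
            if head in sentence and tail in sentence:
                labeled.append((sentence, relation))
    return labeled

print(distant_labels(sentences, triples))  # only the first sentence is labeled
```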
Learning Algorithm. Various works discuss the integration of graph knowledge into the learning algorithm. For instance, a regularization term based on the graph Laplacian matrix can enforce strongly connected variables to behave similarly in the model, while unconnected variables are free to contribute differently. This is commonly used in bioinformatics to integrate genetic pathway information [55], [56]. Some natural language models, too, include information from a knowledge graph into the learning algorithm, e.g., when computing word embeddings. Known relations among words can be utilized as augmented contexts [88] in word2vec training [121].
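A minimal sketch of such a Laplacian regularizer follows (our illustration, assuming a symmetric adjacency matrix); the penalty equals half the sum of squared differences between coefficients of connected variables and is simply added to the data loss with a weight.

```python
import numpy as np

def laplacian_penalty(beta, adj):
    """Graph Laplacian regularizer: beta^T (D - A) beta, which equals
    0.5 * sum_ij A_ij * (beta_i - beta_j)^2 for a symmetric adjacency A."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return beta @ laplacian @ beta

# Usage: total objective = data_loss(beta) + lam * laplacian_penalty(beta, adj).
adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
print(laplacian_penalty(np.array([1.0, 2.0, 5.0]), adj))  # penalizes beta_0 != beta_1
```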
Final Hypothesis. Finally, graphs can also be used to improve or validate final hypotheses, i.e., trained models. For instance, a recent development is to post-process word embeddings based on information from knowledge graphs [87], [92]. Furthermore, semantic segmentation in autonomous driving can be validated using knowledge graphs of street maps [110], or, in object detection, predicted probabilities of a learning system can be refined using semantic consistency measures [93] derived from knowledge graphs. In both cases, the knowledge graphs are used to indicate whether the prediction is consistent with available knowledge.
5.7 Probabilistic Relations
The most frequent path for probabilistic relations found in our literature survey comes from expert knowledge and goes to the hypothesis set or the learning algorithm.

5.7.1 (Paths From) Knowledge Source
Knowledge in the form of probabilistic relations originates most prominently from domain experts, but can also come from other sources such as the natural sciences.
Expert Knowledge. A human expert has intuitive knowledge over a domain, for example, which entities are related to each other and which are independent. Such relational knowledge, however, is often not quantified and validated and differs from, say, knowledge in the natural sciences. Rather, it involves degrees of belief or uncertainty.
Human expertise exists in all domains. In car insurance, driver features like age relate to risk aversion [95]. Another example is computer expertise for troubleshooting, i.e., relating a device status to observations [90].
Scientific Knowledge. Correlation structures can also be obtained from natural science knowledge. For example, correlations between genes can be obtained from gene interaction networks [122] or from a gene ontology [57].
5.7.2 (Paths to) Knowledge Integration
We generally observe the integration of probabilistic relations into the hypothesis set as well as into the learning algorithm and the final hypothesis.
Hypothesis Set. Expert knowledge is the basis for probabilistic graphical models. For example, Bayesian network structures are typically designed by human experts and thus fall into the category of informing the hypothesis set. Here, we focus on contributions where knowledge and Bayesian inference are combined in more intricate ways, for instance, by learning network structures from knowledge and from data. A recent overview [123] categorizes the types of prior knowledge about network structures into the presence or absence of edges, edge probabilities, and knowledge about node orders.
Probabilistic knowledge can be used directly in the hypothesis set. For example, extra nodes can be added to a Bayesian network, thus altering the hypothesis set [96], or the structure of a probabilistic model can be chosen in accordance with given spatio-temporal structures [124]. In other hybrid approaches, the parameters of the conditional distribution of the Bayesian network are either learned from data or obtained from knowledge [73], [100].
Learning Algorithm. Human knowledge can also be used to define an informative prior [100], [125], which affects the learning algorithm as it has a regularizing effect. Structural constraints can alter score functions or the selection policies of conditional independence tests, informing the search for the network structure [95]. More qualitative knowledge, e.g., that observing one variable increases the probability of another, was integrated using isotonic regression, i.e., parameter estimation with order constraints [102]. Causal network inference can make use of ontologies to select the tested interventions [57]. Furthermore, prior causal knowledge can be used to constrain the direction of links in a Bayesian network [58].
Final Hypothesis. Finally, predictions obtained from a Bayesian network can be judged by probabilistic relational knowledge in order to refine the model [106].
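For a minimal picture of how an informative prior regularizes parameter estimation, consider the following sketch (our illustration with hypothetical numbers): an expert's belief that an event is rare enters a Bernoulli parameter estimate as Beta pseudo-counts.

```python
# Expert belief "the event is rare" encoded as a Beta(2, 8) prior (hypothetical values).
alpha, beta = 2.0, 8.0
successes, failures = 3, 4  # observed data

# MAP estimate of the Bernoulli parameter under the Beta prior:
p_map = (successes + alpha - 1) / (successes + failures + alpha + beta - 2)
p_mle = successes / (successes + failures)
print(f"MLE {p_mle:.3f} vs. prior-regularized MAP {p_map:.3f}")  # 0.429 vs. 0.267
```

With few observations the expert prior dominates; as data accumulates, its regularizing effect fades, which is the behavior one expects from an informative prior.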
5.8 Human Feedback
Finally, we look at informed machine learning approaches belonging to the representation type of human feedback. The most common path begins with expert knowledge and ends at the learning algorithm.
5.8.1 (Paths From) Knowledge Source
Compared to other categories in our taxonomy, knowledge representation via human feedback is less formalized and mainly stems from expert knowledge.
Expert Knowledge. Examples of knowledge that fall into this category include knowledge about topics in text documents [97], agent behaviors [98], [99], [103], [104], and data patterns and hierarchies [97], [105], [109]. Knowledge is often provided in the form of relevance or preference feedback, and humans in the loop can integrate their intuitive knowledge into the system without providing an explanation for their decision. For example, in object recognition, users can provide corrective feedback about object boundaries via brush strokes [107]. As another example, in Game AI, an expert user can give spoken instructions for an agent in an Atari game [99].
5.8.2 (Paths to) Knowledge Integration
Human feedback for machine learning is usually assumed to be limited to feature engineering and data annotation. However, it can also be integrated into the learning algorithm itself. This often occurs in areas of reinforcement learning, or interactive learning combined with visual analytics.
Learning Algorithm. In reinforcement learning, an agent observes an unknown environment and learns to act based on reward signals. The TAMER framework [98] provides the agent with human feedback rather than (predefined) rewards. This way, the agent learns from observations and human knowledge alike. While these approaches can quickly learn optimal policies, it is cumbersome to obtain human feedback for every action. Human preference with respect to whole action sequences, i.e., agent behaviors, can circumvent this [103]. This enables the learning of reward functions. Expert knowledge can also be incorporated through natural language interfaces [99]. Here, a human provides instructions and agents receive rewards upon completing these instructions.
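The core of a TAMER-style update is small enough to sketch. The following simplified tabular version is our illustration, not the original implementation: the agent regresses the human reinforcement signal and then acts greedily on it.

```python
H = {}  # tabular estimate of the human feedback signal per (state, action)

def update(state, action, human_feedback, lr=0.1):
    """Move the estimate toward the feedback the trainer just gave."""
    key = (state, action)
    H[key] = H.get(key, 0.0) + lr * (human_feedback - H.get(key, 0.0))

def act(state, actions):
    """Act greedily with respect to the learned human-feedback model."""
    return max(actions, key=lambda a: H.get((state, a), 0.0))

update("s0", "left", +1.0)   # trainer approves
update("s0", "right", -1.0)  # trainer disapproves
print(act("s0", ["left", "right"]))  # -> "left"
```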
Active learning offers a way to include the "human in the loop" to efficiently learn with minimal human intervention. This is based on iterative strategies where a learning algorithm queries an annotator for labels [126]. We do not consider this standard active learning as an informed learning method, because the human knowledge is essentially used for label generation only. However, recent efforts integrate further knowledge into the active learning process.
Visual analytics combines analysis techniques and interactive visual interfaces to enable exploration of, and inference from, data [127]. Machine learning is increasingly combined with visual analytics. For example, visual analytics systems allow users to drag similar data points closer in order to learn distance functions [105], to provide corrective feedback in object recognition [107], or even to alter correctly identified instances where the interpretation is not in line with human explanations [108], [109].
Lastly, various tools exist for text analysis, in particular for topic modeling [97], where users can create, merge, and refine topics or change keyword weights. They thus impart knowledge by generating new reference matrices (term-by-topic and topic-by-document matrices) that are integrated in a regularization term that penalizes the difference between the new and the old reference matrices. This is similar to the semantic loss term described above.
Training Data and Hypothesis Set. Another approach towards incorporating expert knowledge in reinforcement learning considers human demonstration of problem solving. Expert demonstrations can be used to pre-train a deep Q-network, which accelerates learning [104]. Here, prior knowledge is integrated into the hypothesis set and the training data, since the demonstrations inform the training of the Q-network and, at the same time, allow for interactive learning via simulations.
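The pre-training idea can be sketched as follows. Note that this is a simplified behavior-cloning variant for illustration; the actual method [104] combines temporal-difference and large-margin losses.

```python
import torch

# Sketch: pre-training a Q-network on expert (state, action) demonstrations
# before interactive reinforcement learning. Sizes are hypothetical.
q_net = torch.nn.Linear(4, 3)  # 4 state features, 3 discrete actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

demo_states = torch.rand(32, 4)
demo_actions = torch.randint(0, 3, (32,))

for _ in range(100):  # supervised pre-training phase on demonstrations
    optimizer.zero_grad()
    loss = loss_fn(q_net(demo_states), demo_actions)  # imitate the expert
    loss.backward()
    optimizer.step()
# Afterwards, q_net is refined with ordinary deep Q-learning in simulation.
```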
6 HISTORICAL BACKGROUND
The idea of integrating knowledge into learning has a long history. Historically, AI research roughly considered the two antipodal paradigms of symbolism and connectionism. The former dominated up until the 1980s and refers to reasoning based on symbolic knowledge; the latter became more popular in the 1990s and considers data-driven decision making using neural networks. Especially Minsky [128] pointed out limitations of symbolic AI and promoted a stronger focus on data-driven methods to allow for causal and fuzzy reasoning. Already in the 1990s, knowledge bases were used together with training data to obtain knowledge-based artificial neural networks [53]. In the 2000s, when support vector machines (SVMs) were the de-facto paradigm in classification, there was interest in incorporating knowledge into this formalism [23]. Moreover, in the geosciences, and most prominently in weather forecasting, knowledge integration dates back to the 1950s. Especially the discipline of data assimilation deals with techniques that combine statistical and mechanistic models to improve prediction accuracy [129], [130].

TABLE 4
Main Approaches of Informed Machine Learning
The approaches are sorted by taxonomy path and knowledge representation. Methodical details can be found in Section 5. Challenges and directions are discussed in Section 7.

7 DISCUSSION OF CHALLENGES AND DIRECTIONS
Our findings about the main approaches of informed machine learning are summarized in Table 4. It gives, for each approach, the taxonomy path, its main motivation, the central approach idea, remarks on potential challenges, and our viewpoint on current or future directions. For further details on the methods themselves and the corresponding papers, we refer to Section 5. In the following, we discuss
the challenges and directions for these main approaches, sorted by the integrated knowledge representations.
Prior knowledge in the form of algebraic equations can be integrated as constraints via knowledge-based loss terms (e.g., [12], [13], [35]). Here, we see a potential challenge in finding the right weights for supervision from knowledge versus data labels. Currently, this is solved by setting the hyperparameters for the individual loss terms [12]. However, we think that strategies from more recently developed learning algorithms, such as self-supervised [131] or few-shot learning [132], could also advance the supervision from prior knowledge. Moreover, we suggest further research on theoretical concepts based on the existing generalization bounds from statistical learning theory [133], [134] and the connection between regularization and the effective hypothesis space [135].
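The weighting problem can be seen directly in the objective; the following sketch (our illustration) shows the single hyperparameter that balances label supervision against an algebraic constraint.

```python
import torch

def total_loss(y_pred, y_true, constraint_residual, lambda_k=0.1):
    """Weighted combination of data loss and knowledge-based loss; lambda_k is
    the hyperparameter whose choice is discussed above."""
    data_loss = torch.mean((y_pred - y_true) ** 2)
    knowledge_loss = torch.mean(constraint_residual ** 2)  # violation of an algebraic equation
    return data_loss + lambda_k * knowledge_loss
```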
Differential equations can be integrated similarly, but with a specific focus on physics-informed neural networks that constrain the model derivatives by the underlying differential equation (e.g., [20], [45], [46]). A potential challenge is the robustness of the solution, which is the subject of current research. One approach is to investigate the model quality by a suitable quantification of its uncertainty [43], [46]. We think a more in-depth comparison with existing numerical solvers [136] would also be helpful. Another challenge of physical systems is the generation and integration of sensor data in real time. This is currently tackled by online learning methods [48]. Furthermore, we think that techniques from data assimilation [130] could also be helpful to combine modelling from knowledge and data.
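The derivative constraint at the heart of such physics-informed networks can be sketched with automatic differentiation; the toy ODE du/dt = -k*u below is our illustration, not an example taken from the cited works.

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
k = 1.0

t = torch.linspace(0.0, 1.0, 50, requires_grad=True).unsqueeze(1)
u = net(t)
du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u), create_graph=True)[0]
physics_loss = torch.mean((du_dt + k * u) ** 2)  # residual of du/dt = -k*u
# physics_loss is added to the data loss and minimized with a standard optimizer.
```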
Simulation results can be used for synthetic data generation or augmentation (e.g., [18], [19], [59]), but this can bring up the challenge of a mismatch between real and simulated data. A promising direction to close the gap is domain adaptation, especially adversarial training [67], [137], or domain randomization [138]. Moreover, for future work we see further potential in the development of new hybrid systems that combine machine learning and simulation in more sophisticated ways [139].
The utilization of spatial invariances through model architectures with invariant characteristics, such as group equivariant or convolutional networks, diminishes the model search space (e.g., [70], [71], [75]). Here, a potential challenge is the proper invariance specification and implementation [75] or expensive evaluations on more complex geometries [111]. Therefore, we think that the efficient adaptation of invariant-based models to further scenarios can further improve geometric representation learning [111].
Logic rules can be encoded in the architecture of knowledge-based neural networks (KBANNs) (e.g., [53], [54], [89]). Since this idea was already developed when neural networks had only a few layers, a question is if it is still feasible for deep neural networks. In order to improve the practicality, we suggest developing automated interfaces for knowledge integration. A future direction could be the development of new neuro-symbolic systems. Although the combination of
connectionist and symbolic systems into hybrid systems is a longtime idea [140], [141], it is currently getting more attention [142], [143]. Another challenge, especially in statistical relational learning (SRL) frameworks such as Markov logic networks or probabilistic soft logic (e.g., [78], [91], [144]), is the acquisition of rules when they are not yet given. An ongoing research topic to this end is the learning of rules from data, which is called structure learning [145].
Knowledge graphs can be integrated into learning systems either explicitly via graph propagation and attention mechanisms, or implicitly via graph neural networks with relational inductive bias (e.g., [14], [15], [16]). A challenge is the comparability between different methods, because authors often use templates like ConceptNet [79] or VisualGenome [15], [16] and customize the graphs to improve running time and performance. Since the choice of graph can have a high influence [81], we suggest a pool of standardized graphs in order to improve comparability, or even to establish benchmarks. Another interesting direction is to combine the use of graphs with the learning of graphs. A requirement here is the need for good entity linking models in approaches such as KnowBERT [94] and ERNIE [86], and the continuous embedding of new facts in the graph.
Probabilistic relations can be integrated as prior knowledge in terms of a-priori probability distributions that are refined with additional observations (e.g., [73], [96], [100]). The main challenges are the large computational effort and the formalization of knowledge in terms of inductive priors. Directions responding to this are variational methods with origins in optimization theory and functional analysis [146] and variational neural networks [147]. Besides scaling issues, an explicit treatment of causality is becoming more important in machine learning and is closely related to graphical probabilistic models [148].
Human feedback can be integrated into the learning algorithm by human-in-the-loop (HITL) reinforcement learning (e.g., [98], [103]), or by explanation alignment through interactive learning combined with visual analytics (e.g., [108], [109]). However, the exploration of human feedback can be very expensive due to its latency in real systems. Exploratory actions could hamper user experience [149], [150], so that online reinforcement learning is generally avoided. A promising approach is learning a reward estimator [151], [152] from collected logs, which then provides unlimited feedback for unseen instances that do not have any human judgments. Another challenge is that human feedback is often intuitive and not formalized, and thus difficult to incorporate into machine learning systems. Also, human-grounded evaluation is very costly, especially compared to functionally-grounded evaluation [153]. Therefore, we suggest to further study representation transformations to formalize intuitive knowledge, e.g., from human feedback to logical rules. Furthermore, we found that improved interpretability still is only a minor goal for knowledge integration (see Fig. 4). This, too, suggests opportunities for future work.
Even if these directions are motivated by specific approaches, we think that they are generally relevant and can advance the whole field of informed machine learning.

8 CONCLUSION
In this paper, we presented a unified classification framework for the explicit integration of additional prior knowledge into machine learning, which we described using the umbrella term of informed machine learning. Our main contribution is the development of a taxonomy that allows a structured categorization of approaches and the uncovering of main paths. Moreover, we presented a conceptual clarification of informed machine learning, as well as a systematic and comprehensive research survey. This helps current and future users of informed machine learning to identify the right methods to use their prior knowledge, for example, to deal with insufficient training data or to make their models more robust.

ACKNOWLEDGMENTS
The authors would like to thank Dorina Weichert, Daniel Paurat, Lars Hillebrand, Theresa Bick, and Nico Piatkowski for helpful discussions. This work was a joint effort of the Fraunhofer Research Center for Machine Learning (RCML) within the Fraunhofer Cluster of Excellence Cognitive Internet Technologies (CCIT) and the Competence Center for Machine Learning Rhine Ruhr (ML2R). This work was supported by the Federal Ministry of Education and Research of Germany under Grant 01|S18038B. All authors are with the Fraunhofer Center for Machine Learning.

REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst., 2012, pp. 84–90.
[2] G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[3] A. Conneau, H. Schwenk, L. Barrault, and Y. Lecun, "Very deep convolutional networks for text classification," 2016, arXiv:1606.01781.
[4] D. Silver et al., "Mastering the game of go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[5] K. T. Butler, D. W. Davies, H. Cartwright, O. Isayev, and A. Walsh, "Machine learning for molecular and materials science," Nature, vol. 559, no. 7715, pp. 547–555, 2018.
[6] T. Ching et al., "Opportunities and obstacles for deep learning in biology and medicine," J. Roy. Soc. Interface, vol. 15, no. 141, 2018, Art. no. 20170387.
[7] J. N. Kutz, "Deep learning in fluid dynamics," J. Fluid Mechanics, vol. 814, pp. 1–4, 2017.
[8] M. Brundage et al., "Toward trustworthy AI development: Mechanisms for supporting verifiable claims," 2020, arXiv:2004.07213.
[9] R. Roscher, B. Bohn, M. F. Duarte, and J. Garcke, "Explainable machine learning for scientific insights and discoveries," 2019, arXiv:1905.08883.
[10] M. Diligenti, S. Roychowdhury, and M. Gori, "Integrating prior knowledge into deep learning," in Proc. Int. Conf. Mach. Learn. Appl., 2017, pp. 920–923.
[11] J. Xu, Z. Zhang, T. Friedman, Y. Liang, and G. V. d. Broeck, "A semantic loss function for deep learning with symbolic knowledge," 2017, arXiv:1711.11157.
[12] A. Karpatne, W. Watkins, J. Read, and V. Kumar, "Physics-guided neural networks (PGNN): An application in lake temperature modeling," 2017, arXiv:1710.11431.
[13] R. Stewart and S. Ermon, "Label-free supervision of neural networks with physics and domain knowledge," in Proc. Conf. Artif. Intell., 2017, pp. 2576–2582.
[14] P. Battaglia et al., "Interaction networks for learning about objects, relations and physics," in Proc. Int. Conf. Neural Inf. Process. Syst., 2016, pp. 4509–4517.
[15] K. Marino, R. Salakhutdinov, and A. Gupta, "The more you know: Using knowledge graphs for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 20–28.
[16] C. Jiang, H. Xu, X. Liang, and L. Lin, "Hybrid knowledge routed modules for large-scale object detection," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1559–1570.
[17] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, "Robots that can adapt like animals," Nature, vol. 521, no. 7553, pp. 503–507, 2015.
[18] K.-H. Lee, J. Li, A. Gaidon, and G. Ros, "Spigan: Privileged adversarial learning from simulation," in Proc. Int. Conf. Learn. Representations, 2019.
[19] J. Pfrommer, C. Zimmerling, J. Liu, L. Kärger, F. Henning, and J. Beyerer, "Optimisation of manufacturing process parameters using deep neural networks as surrogate models," Procedia CIRP, vol. 72, no. 1, pp. 426–431, 2018.
[20] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations," 2017, arXiv:1711.10561.
[21] M. Diligenti, M. Gori, and C. Sacca, "Semantic-based regularization for learning and inference," Artif. Intell., vol. 244, pp. 143–165, 2017.
[22] A. Karpatne et al., "Theory-guided data science: A new paradigm for scientific discovery from data," Trans. Knowl. Data Eng., vol. 29, no. 10, pp. 2318–2331, 2017.
[23] F. Lauer and G. Bloch, "Incorporating prior knowledge in support vector machines for classification: A review," Neurocomputing, vol. 71, no. 7–9, pp. 1578–1594, 2008.
[24] P. W. Battaglia et al., "Relational inductive biases, deep learning, and graph networks," 2018, arXiv:1806.01261.
[25] M. Steup, "Epistemology," in The Stanford Encyclopedia of Philosophy (Winter 2018 Edition), E. N. Zalta, Ed., 2018. [Online]. Available: https://stanford.library.sydney.edu.au/archives/win2018/entries/epistemology/
[26] L. Zagzebski, What is Knowledge? Hoboken, NJ, USA: Wiley, 2017.
[27] P. Machamer and M. Silberstein, The Blackwell Guide to the Philosophy of Science. Hoboken, NJ, USA: Wiley, 2008, vol. 19.
[28] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery in databases," AI Mag., vol. 17, no. 3, 1996.
[29] D. Kahneman, Thinking, Fast and Slow. New York, NY, USA: Macmillan, 2011.
[30] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, "Building machines that learn and think like people," Behav. Brain Sci., vol. 40, 2017.
[31] H. G. Gauch, Scientific Method in Practice. Cambridge, U.K.: Cambridge University Press, 2003.
[32] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data. USA: AMLBook, 2012.
[33] N. Muralidhar, M. R. Islam, M. Marwah, A. Karpatne, and N. Ramakrishnan, "Incorporating prior domain knowledge into deep neural networks," in Proc. Int. Conf. Big Data, 2018, pp. 36–45.
[34] Y. Lu, M. Rajora, P. Zou, and S. Liang, "Physics-embedded machine learning: Case study with electrochemical micro-machining," Machines, vol. 5, no. 1, 2017, Art. no. 4.
[35] R. Heese, M. Walczak, L. Morand, D. Helm, and M. Bortz, "The good, the bad and the ugly: Augmenting a black-box model with expert knowledge," in Proc. Int. Conf. Artif. Neural Netw., 2019, pp. 391–395.
[36] G. M. Fung, O. L. Mangasarian, and J. W. Shavlik, "Knowledge-based support vector machine classifiers," in Proc. 15th Int. Conf. Neural Inf. Process. Syst., 2003, pp. 537–544.
[37] O. L. Mangasarian and E. W. Wild, "Nonlinear knowledge-based classification," IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1826–1832, Oct. 2008.
[38] R. Ramamurthy, C. Bauckhage, R. Sifa, J. Schücker, and S. Wrobel, "Leveraging domain knowledge for reinforcement learning using MMC architectures," in Proc. Int. Conf. Artif. Neural Netw., 2019, pp. 595–607.
[39] M. von Kurnatowski, J. Schmid, P. Link, R. Zache, L. Morand, T. Kraft, I. Schmidt, and A. Stoll, "Compensating data shortages in manufacturing with monotonicity knowledge," 2020, arXiv:2010.15955.
[40] C. Bauckhage, C. Ojeda, J. Schücker, R. Sifa, and S. Wrobel, "Informed machine learning through functional composition," in Proc. LWDA, 2018, pp. 33–37.
[41] R. King, O. Hennigh, A. Mohan, and M. Chertkov, "From deep to physics-informed learning of turbulence: Diagnostics," 2018, arXiv:1810.07785.
[42] S. Jeong, B. Solenthaler, M. Pollefeys, M. Gross et al., "Data-driven fluid simulations using regression forests," ACM Trans. Graph., vol. 34, no. 6, 2015, Art. no. 199.
[43] Y. Yang and P. Perdikaris, "Physics-informed deep generative models," 2018, arXiv:1812.03511.
[44] E. de Bezenac, A. Pajot, and P. Gallinari, "Deep learning for physical processes: Incorporating prior scientific knowledge," 2017, arXiv:1711.07970.
[45] I. E. Lagaris, A. Likas, and D. I. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987–1000, Sep. 1998.
[46] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris, "Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data," J. Comput. Phys., vol. 394, pp. 56–81, 2019.
[47] D. C. Psichogios and L. H. Ungar, "A hybrid neural network-first principles approach to process modeling," AIChE J., vol. 38, no. 10, pp. 1499–1511, 1992.
[48] M. Lutter, C. Ritter, and J. Peters, "Deep lagrangian networks: Using physics as model prior for deep learning," 2019, arXiv:1907.04490.
[49] F. D. A. Belbute-peres, K. R. Allen, K. A. Smith, and J. B. Tenenbaum, "End-to-end differentiable physics for learning and control," in Proc. Neural Inf. Process. Syst., 2018, pp. 7178–7189.
[50] J. Ling, A. Kurzawski, and J. Templeton, "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance," J. Fluid Mechanics, vol. 807, pp. 155–166, 2016.
[51] A. Butter, G. Kasieczka, T. Plehn, and M. Russell, "Deep-learned top tagging with a lorentz layer," SciPost Phys., vol. 5, no. 28, 2018.
[52] J.-L. Wu, H. Xiao, and E. Paterson, "Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework," Phys. Rev. Fluids, vol. 3, no. 7, 2018, Art. no. 074602.
[53] G. G. Towell and J. W. Shavlik, "Knowledge-based artificial neural networks," Artif. Intell., vol. 70, no. 1–2, pp. 119–165, 1994.
[54] A. S. d. Garcez and G. Zaverucha, "The connectionist inductive learning and logic programming system," Appl. Intell., vol. 11, no. 1, pp. 59–77, 1999.
[55] T. Ma and A. Zhang, "Multi-view factorization autoencoder with network constraints for multi-omic integrative analysis," in Proc. Int. Conf. Bioinform. Biomed., 2018, pp. 702–707.
[56] Z. Che, D. Kale, W. Li, M. T. Bahadori, and Y. Liu, "Deep computational phenotyping," in Proc. Int. Conf. Knowl. Discov. Data Mining, 2015, pp. 507–516.
[57] M. B. Messaoud, P. Leray, and N. B. Amor, "Integrating ontological knowledge for iterative causal discovery and visualization," in Proc. Eur. Conf. Symbolic Quantitative Approaches Reasoning Uncertainty, 2009, pp. 168–179.
[58] G. Borboudakis and I. Tsamardinos, "Incorporating causal prior knowledge as path-constraints in Bayesian networks and maximal ancestral graphs," 2012, arXiv:1206.6390.
[59] T. Deist, A. Patti, Z. Wang, D. Krane, T. Sorenson, and D. Craft, "Simulation assisted machine learning," Bioinformatics, vol. 35, no. 20, pp. 4072–4080, 2019.
[60] H. S. Kim, M. Koc, and J. Ni, "A hybrid multi-fidelity approach to the optimal design of warm forming processes using a knowledge-based artificial neural network," Int. J. Mach. Tools Manuf., vol. 47, no. 2, pp. 211–222, 2007.
[61] G. Hautier, C. C. Fischer, A. Jain, T. Mueller, and G. Ceder, "Finding nature's missing ternary oxide compounds using machine learning and density functional theory," Chem. Mater., vol. 22, no. 12, pp. 3762–3767, 2010.
[62] M. B. Chang, T. Ullman, A. Torralba, and J. B. Tenenbaum, "A compositional object-based approach to learning physical dynamics," 2016, arXiv:1612.00341.
[63] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, "Gram: Graph-based attention model for healthcare representation learning," in Proc. Int. Conf. Knowl. Discov. Data Mining, 2017, pp. 787–795.
[64] A. Lerer, S. Gross, and R. Fergus, "Learning physical intuition of block towers by example," 2016, arXiv:1603.01312.
[65] A. Rai, R. Antonova, F. Meier, and C. G. Atkeson, "Using simulation to improve sample-efficiency of Bayesian optimization for bipedal robots," J. Mach. Learn. Res., vol. 20, no. 49, pp. 1–24, 2019.
[66] Y. Du et al., "Learning to exploit stability for 3D scene parsing," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1733–1743.
[67] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2242–2251.
[68] F. Wang and Q.-J. Zhang, "Knowledge-based neural models for microwave design," Trans. Microwave Theory Techn., vol. 45, no. 12, pp. 2333–2343, Dec. 1997.
[69] S. J. Leary, A. Bhaskar, and A. J. Keane, "A knowledge-based approach to response surface modelling in multifidelity optimization," J. Global Optim., vol. 26, no. 3, pp. 297–319, 2003.
[70] T. S. Cohen and M. Welling, "Group equivariant convolutional networks," in Proc. Int. Conf. Mach. Learn., 2016, pp. 2990–2999.
[71] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, "Exploiting cyclic symmetry in convolutional neural networks," 2016, arXiv:1602.02660.
[72] B. Schölkopf, P. Simard, A. J. Smola, and V. Vapnik, "Prior knowledge in support vector kernels," in Proc. Conf. Adv. Neural Inf. Process. Syst., 1998, pp. 640–646.
[73] B. Yet, Z. B. Perkins, T. E. Rasmussen, N. R. Tai, and D. W. R. Marsh, "Combining data and meta-analysis to build Bayesian networks for clinical decision support," J. Biomed. Inform., vol. 52, pp. 373–385, 2014.
[74] J. Li, Z. Yang, H. Liu, and D. Cai, "Deep rotation equivariant network," Neurocomputing, vol. 290, pp. 26–33, 2018.
[75] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, "Harmonic networks: Deep translation and rotation equivariance," in Proc. Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5028–5037.
[76] P. Niyogi, F. Girosi, and T. Poggio, "Incorporating prior information in machine learning by creating virtual examples," Proc. IEEE, vol. 86, no. 11, pp. 2196–2209, Nov. 1998.
[77] M. Schiegg, M. Neumann, and K. Kersting, "Markov logic mixtures of gaussian processes: Towards machines reading regression data," in Proc. Artif. Intell. Statist., 2012, pp. 1002–1011.
[78] M. Sachan, K. A. Dubey, T. M. Mitchell, D. Roth, and E. P. Xing, "Learning pipelines with limited data and domain knowledge: A study in parsing physics problems," in Proc. Neural Inf. Process. Syst., 2018, pp. 140–151.
[79] H. Zhou, T. Young, M. Huang, H. Zhao, J. Xu, and X. Zhu, "Commonsense knowledge aware conversation generation with graph attention," in Proc. Int. Joint Conf. Artif. Intell., 2018, pp. 4623–4629.
[80] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in Proc. Assoc. Comput. Linguistics, Int. Joint Conf. Natural Lang. Process., 2009, pp. 1003–1011.
[81] X. Liang, Z. Hu, H. Zhang, L. Lin, and E. P. Xing, "Symbolic graph reasoning meets convolutions," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1858–1868.
[82] D. L. Bergman, "Symmetry constrained machine learning," in Proc. SAI Intell. Syst. Conf., 2019, pp. 501–512.
[83] M.-W. Chang, L. Ratinov, and D. Roth, "Guiding semi-supervision with constraint-driven learning," in Proc. Assoc. Comput. Linguistics, 2007, pp. 280–287.
[84] Z. Hu, Z. Yang, R. Salakhutdinov, and E. Xing, "Deep neural networks with massive learned knowledge," in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1670–1679.
[85] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," 2016, arXiv:1603.06318.
[86] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, "Ernie: Enhanced language representation with informative entities," 2019, arXiv:1905.07129.
[87] N. Mrkšić et al., "Counter-fitting word vectors to linguistic constraints," 2016, arXiv:1603.00892.
[88] J. Bian, B. Gao, and T.-Y. Liu, "Knowledge-powered deep learning for word embedding," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discov. Databases, 2014, pp. 132–148.
[89] M. V. França, G. Zaverucha, and A. S. d. Garcez, "Fast relational learning using bottom clause propositionalization with artificial neural networks," Mach. Learn., vol. 94, no. 1, pp. 81–104, 2014.
[90] M. Richardson and P. Domingos, "Markov logic networks," Mach. Learn., vol. 62, no. 1–2, pp. 107–136, 2006.
[91] A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor, "A short introduction to probabilistic soft logic," in Proc. NIPS Workshop Probabilistic Program.: Found. Appl., 2012, pp. 1–4.
[92] G. Glavas and I. Vulic, "Explicit retrofitting of distributional word vectors," in Proc. Assoc. Comput. Linguistics, 2018, pp. 34–45.
[93] Y. Fang, K. Kuan, J. Lin, C. Tan, and V. Chandrasekhar, "Object detection meets knowledge graphs," in Proc. Int. Joint Conf. Artif. Intell., 2017, pp. 1661–1667.
[94] M. E. Peters et al., "Knowledge enhanced contextual word representations," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Int. Joint Conf. Nat. Lang. Process., 2019, pp. 43–54.
[95] L. M. de Campos and J. G. Castellano, "Bayesian network learning algorithms using structural restrictions," Int. J. Approx. Reasoning, vol. 45, no. 2, pp. 233–254, 2007.
[96] A. C. Constantinou, N. Fenton, and M. Neil, "Integrating expert knowledge with data in Bayesian networks: Preserving data-driven expectations when the expert variables remain unobserved," Expert Syst. Appl., vol. 56, pp. 197–208, 2016.
[97] J. Choo, C. Lee, C. K. Reddy, and H. Park, "Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization," IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 1992–2001, Dec. 2013.
[98] W. B. Knox and P. Stone, "Interactively shaping agents via human reinforcement: The TAMER framework," in Proc. Int. Conf. Knowl. Capture (K-CAP), 2009, pp. 9–16.
[99] R. Kaplan, C. Sauer, and A. Sosa, "Beating atari with natural language guided reinforcement learning," 2017, arXiv:1704.05539.
[100] D. Heckerman, D. Geiger, and D. M. Chickering, "Learning Bayesian networks: The combination of knowledge and statistical data," Mach. Learn., vol. 20, no. 3, pp. 197–243, 1995.
[101] M. Richardson and P. Domingos, "Learning with knowledge from multiple experts," in Proc. Int. Conf. Mach. Learn., 2003, pp. 624–631.
[102] A. Feelders and L. C. Van der Gaag, "Learning Bayesian network parameters under order constraints," Int. J. Approx. Reasoning, vol. 42, no. 1–2, pp. 37–53, 2006.
[103] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, "Deep reinforcement learning from human preferences," in Proc. Neural Inf. Process. Syst., 2017, pp. 4302–4310.
[104] T. Hester et al., "Deep q-learning from demonstrations," in Proc. Conf. Artif. Intell., 2018, pp. 3223–3230.
[105] E. T. Brown, J. Liu, C. E. Brodley, and R. Chang, "Dis-function: Learning distance functions interactively," in Proc. Conf. Visual Analytics Sci. Technol., 2012, pp. 83–92.
[106] B. Yet, Z. Perkins, N. Fenton, N. Tai, and W. Marsh, "Not just data: A method for improving prediction with knowledge," J. Biomed. Inform., vol. 48, pp. 28–37, 2014.
[107] J. A. Fails and D. R. Olsen Jr., "Interactive machine learning," in Proc. Int. Conf. Intell. User Interfaces, 2003, pp. 39–45.
[108] L. Rieger, C. Singh, W. J. Murdoch, and B. Yu, "Interpretations are useful: Penalizing explanations to align neural networks with prior knowledge," 2019, arXiv:1909.13584.
[109] P. Schramowski et al., "Right for the wrong scientific reasons: Revising deep networks by interacting with their explanations," 2020, arXiv:2001.05371.
[110] L. von Rueden, T. Wirtz, F. Hueger, J. D. Schneider, N. Piatkowski, and C. Bauckhage, "Street-map based validation of semantic segmentation in autonomous driving," in Proc. Int. Conf. Pattern Recognit., 2021, pp. 10203–10210.
[111] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: Going beyond euclidean data," IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, 2017.
[112] M.-W. Chang, L. Ratinov, and D. Roth, "Structured learning with constrained conditional models," Mach. Learn., vol. 88, no. 3, pp. 399–431, 2012.
[113] D. Sridhar, J. Foulds, B. Huang, L. Getoor, and M. Walker, "Joint models of disagreement and stance in online debate," in Proc. Assoc. Comput. Linguistics Int. Joint Conf. Nat. Lang. Process., 2015, pp. 116–125.
[114] A. S. d. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran, "Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning," 2019, arXiv:1905.06088.
[115] L. D. Raedt, K. Kersting, and S. Natarajan, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. San Rafael, CA, USA: Morgan & Claypool, 2016.
[116] R. Speer and C. Havasi, "Conceptnet 5: A large semantic network for relational knowledge," in Proc. People's Web Meets NLP, 2013, pp. 161–176.
[117] G. A. Miller, "WordNet: A lexical database for English," Communications ACM, vol. 38, no. 11, pp. 39–41, 1995.
[118] T. Mitchell et al., "Never-ending learning," Communications ACM, vol. 61, no. 5, pp. 103–115, 2018.
[119] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[120] Z.-X. Ye and Z.-H. Ling, "Distant supervision relation extraction with intra-bag and inter-bag attentions," 2019, arXiv:1904.00143.
[121] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013, arXiv:1301.3781.
[122] M. S. Massa, M. Chiogna, and C. Romualdi, "Gene set analysis exploiting the topology of a pathway," BMC Syst. Biol., vol. 4, no. 1, 2010, Art. no. 121.
[123] N. Angelopoulos and J. Cussens, "Bayesian learning of Bayesian networks with informative priors," Ann. Math. Artif. Intell., vol. 54, no. 1–3, pp. 53–98, 2008.
[124] N. Piatkowski, S. Lee, and K. Morik, "Spatio-temporal random fields: Compressible representation and distributed estimation," Mach. Learn., vol. 93, no. 1, pp. 115–139, 2013.
[125] R. Fischer, N. Piatkowski, C. Pelletier, G. I. Webb, F. Petitjean, and K. Morik, "No cloud on the horizon: Probabilistic gap filling in satellite image series," in Proc. Int. Conf. Data Sci. Adv. Anal., 2020, pp. 546–555.
[126] B. Settles, "Active learning literature survey," Comput. Sci., Univ. Wisconsin–Madison, Madison, WI, USA, Tech. Rep. 1648, 2009.
[127] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melançon, "Visual analytics: Definition, process, and challenges," in Proc. Inf. Visual., 2008, pp. 154–175.
[128] M. L. Minsky, "Logical versus analogical or symbolic versus connectionist or neat versus scruffy," AI Mag., vol. 12, no. 2, pp. 34–51, 1991.
[129] E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability. Cambridge, U.K.: Cambridge University Press, 2003.
[130] S. Reich and C. Cotter, Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge, U.K.: Cambridge Univ. Press, 2015.
[131] M. Janner, J. Wu, T. D. Kulkarni, I. Yildirim, and J. B. Tenenbaum, "Self-supervised intrinsic image decomposition," in Proc. Neural Inf. Process. Syst., 2017, pp. 5938–5948.
[132] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, "Generalizing from a few examples: A survey on few-shot learning," ACM Comput. Surveys, vol. 53, no. 3, 2020, Art. no. 63.
[133] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[134] I. Steinwart and A. Christmann, Support Vector Machines. Germany: Springer, 2008.
[135] F. Cucker and S. Smale, "Best choices for regularization parameters in learning theory: On the bias-variance problem," Found. Comput. Math., vol. 2, no. 4, pp. 413–428, 2002.
[136] L. Lapidus and G. F. Pinder, Numerical Solution of Partial Differential Equations in Science and Engineering. Hoboken, NJ, USA: Wiley, 2011.
[137] M. Wulfmeier, A. Bewley, and I. Posner, "Addressing appearance change in outdoor robotics with adversarial domain adaptation," in Proc. Int. Conf. Intell. Robots Syst., 2017, pp. 1551–1558.
[138] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, "Sim-to-real transfer of robotic control with dynamics randomization," in Proc. IEEE Int. Conf. Robot. Automat., 2018, pp. 3803–3810.
[139] L. von Rueden, S. Mayer, R. Sifa, C. Bauckhage, and J. Garcke, "Combining machine learning and simulation to a hybrid modelling approach: Current and future directions," in Proc. Int. Symp. Intell. Data Anal., 2020, pp. 548–560.
[140] K. McGarry, S. Wermter, and J. MacIntyre, "Hybrid neural systems: From simple coupling to fully integrated neural networks," Neural Comput. Surv., vol. 2, no. 1, pp. 62–93, 1999.
[141] R. Sun, "Connectionist implementationalism and hybrid systems," in Encyclopedia of Cognitive Science. Hoboken, NJ, USA: Wiley, 2006.
[142] A. S. d. Garcez and L. C. Lamb, "Neurosymbolic AI: The 3rd wave," 2020, arXiv:2012.05876.
[143] T. Dong et al., "Imposing category trees onto word-embeddings using a geometric construction," in Proc. Int. Conf. Learn. Representations, 2018.
[144] S. H. Bach, M. Broecheler, B. Huang, and L. Getoor, "Hinge-loss Markov random fields and probabilistic soft logic," 2015, arXiv:1505.04406.
[145] V. Embar, D. Sridhar, G. Farnadi, and L. Getoor, "Scalable structure learning for probabilistic soft logic," 2018, arXiv:1807.00973.
[146] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, "Variational inference: A review for statisticians," J. Amer. Stat. Assoc., vol. 112, no. 518, pp. 859–877, 2017.
[147] D. P. Kingma and M. Welling, "An introduction to variational autoencoders," 2019, arXiv:1906.02691.
[148] J. Pearl, Causality. Cambridge, U.K.: Cambridge Univ. Press, 2009.
[149] J. Kreutzer, S. Riezler, and C. Lawrence, "Learning from human feedback: Challenges for real-world reinforcement learning in NLP," 2020, arXiv:2011.02511.
[150] G. Dulac-Arnold, D. Mankowitz, and T. Hester, "Challenges of real-world reinforcement learning," 2019, arXiv:1904.12901.
[151] J. Kreutzer, J. Uyheng, and S. Riezler, "Reliability and learnability of human bandit feedback for sequence-to-sequence reinforcement learning," in Proc. Assoc. Comput. Linguistics, 2018, pp. 1777–1788.
[152] Y. Gao, C. M. Meyer, and I. Gurevych, "April: Interactively learning to summarise by combining active preference learning and reinforcement learning," in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 4120–4130.
[153] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," 2017, arXiv:1702.08608.

Laura von Rueden received the BSc degree in physics and the MSc degree in simulation sciences in 2015 from RWTH Aachen University. She was a data scientist with Capgemini. Since 2018, she has been a research scientist with Fraunhofer IAIS. She is currently working toward the PhD degree in computer science with the Universität Bonn. Her research interests include machine learning and especially the combination of data and knowledge-based modeling.

Sebastian Mayer received the diploma degree in mathematics from TU Darmstadt, in 2011, and the PhD degree in mathematics from University Bonn, in 2018. Since 2017, he has been a research scientist with Fraunhofer SCAI. His research interests include machine learning and biologically-inspired algorithms in the context of cyberphysical systems.

Katharina Beckh received the MSc degree in human-computer interaction from the Julius Maximilian University of Wuerzburg in 2019. Since 2019, she has been a research scientist with Fraunhofer IAIS. Her research interests include interactive machine learning, human oriented modeling, and text mining with a primary focus in the medical domain.

Bogdan Georgiev received the PhD degree in mathematics from Max-Planck-Institute and Bonn University in 2018. Since 2018, he has been a research scientist with Fraunhofer IAIS. His current research interests include aspects of learning theory such as generalization or compression bounds, geometric learning, and quantum computing.
Sven Giesselbach received the MSc degree in computer science from the University of Bonn in 2012. Since 2015, he has been a data scientist with Fraunhofer IAIS and is also lead of the team natural language understanding with the department knowledge discovery. His research interest includes the use of external knowledge in natural language processing.

Raoul Heese received the diploma and PhD degrees from the Institute of Quantum Physics, Ulm University, Germany, in 2012 and 2016, respectively. He is currently a research scientist with Fraunhofer ITWM, Kaiserslautern, Germany. His research interests include informed learning, supervised learning, and their application to real-world problems.

Birgit Kirsch received the MSc degree in business informatics from Hochschule Trier in 2017. Since 2017, she has been a research scientist with Fraunhofer IAIS. Her research interests include natural language processing and statistical relational learning.

Michal Walczak received the PhD degree in physics from the Georg-August University of Goettingen, Germany, in 2014. Since 2016, he has been a research scientist with Fraunhofer ITWM, Kaiserslautern, Germany. His research interests include machine learning, decision support, multicriteria optimization, and their application to radiotherapy planning and process engineering.

Julius Pfrommer received the PhD degree in computer science from the Karlsruhe Institute of Technology in 2019. Since 2018, he has been the head of a research group with Fraunhofer IOSB. His research interests include distributed systems, planning under uncertainty, and optimization theory with its many applications for machine learning and optimal control.

Annika Pick received the MSc degree in computer science from the University of Bonn in 2018. Since 2019, she has been a data scientist with Fraunhofer IAIS. Her research interests include learning from healthcare data and pattern mining.

Rajkumar Ramamurthy received the MSc degree in media informatics from RWTH Aachen University in 2016. Since 2018, he has been a data scientist with Fraunhofer IAIS. He is currently working toward the PhD degree with the University of Bonn. His research interests include reinforcement learning and natural language processing.

Jochen Garcke received the diploma and PhD degrees in mathematics from the Universität Bonn, in 1999 and 2004, respectively. From 2004 to 2006, he was a postdoctoral fellow with the Australian National University. He was a postdoctoral researcher from 2006 to 2008 and a junior research group leader from 2008 to 2011, with the Technical University Berlin. Since 2011, he has been professor of numerics with the University of Bonn and department head with Fraunhofer SCAI, Sankt Augustin. His research interests include machine learning, scientific computing, reinforcement learning, and high-dimensional approximation. He is currently a member of DMV, GAMM, and SIAM, and a reviewer for the IEEE Transactions on Industrial Informatics, the IEEE Transactions on Neural Networks, and the IEEE Transactions on Pattern Analysis and Machine Intelligence.

Christian Bauckhage (Member, IEEE) received the MSc and PhD degrees in computer science from Bielefeld University, in 1998 and 2002, respectively. Since 2008, he has been a professor of computer science with the University of Bonn and lead scientist for machine learning with Fraunhofer IAIS. He was with the Centre for Vision Research, Toronto, Canada, and a senior research scientist with Deutsche Telekom Laboratories, Berlin. His research interests include theory and practice of learning systems and next generation computing. He is currently a reviewer for the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the IEEE Transactions on Games, and an associate editor for the IEEE Transactions on Games.

Jannis Schuecker received the doctoral degree in physics from the RWTH Aachen University. Until 2019, he was a research scientist with Fraunhofer IAIS. His research interests include machine learning, in particular time series modeling using neural networks, and interpretable machine learning.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.