Informed Machine Learning: A Taxonomy and Survey of Integrating Prior Knowledge Into Learning Systems
Abstract—Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is
the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we
present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine
learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves
as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its
integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge
representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous
papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.
Index Terms—Machine learning, prior knowledge, expert knowledge, informed, hybrid, neuro-symbolic, survey, taxonomy
Fig. 1. Information flow in informed machine learning. The informed machine learning pipeline requires a hybrid information source with two components: data and prior knowledge. In conventional machine learning, knowledge is used for data preprocessing and feature engineering, but this process is deeply intertwined with the learning pipeline (*). In contrast, in informed machine learning, prior knowledge comes from an independent source, is given by formal representations (e.g., by knowledge graphs, simulation results, or logic rules), and is explicitly integrated.
more approachable. In this regard, we refer to a recent survey on graph neural networks and a research direction framed as relational inductive bias [24]. Our work complements the aforementioned surveys by providing a systematic categorization of knowledge representations that are integrated into machine learning. We provide a structured overview based on a survey of a large number of research papers on how to integrate additional, prior knowledge into the machine learning pipeline. As an umbrella term for such methods, we henceforth use informed machine learning.

Our contributions are threefold: We propose an abstract concept for informed machine learning that clarifies its building blocks and its relation to conventional machine learning. It states that informed learning uses a hybrid information source that consists of data and prior knowledge, where the prior knowledge comes from an independent source and is given by formal representations. Our main contribution is the introduction of a taxonomy that classifies informed machine learning approaches and is, to our knowledge, the first of its kind. It contains the dimensions of the knowledge source, its representation, and its integration into the machine learning pipeline. We put a special emphasis on categorizing various knowledge representations, since this may enable practitioners to incorporate their domain knowledge into machine learning processes. Moreover, we present a description of available approaches and explain how different knowledge representations, e.g., algebraic equations, logic rules, or simulation results, can be used in informed machine learning.

Our goal is to equip potential new users of informed machine learning with established and successful methods. As we intend to survey a broad spectrum of methods in this field, we cannot describe all methodical details, and we do not claim to have covered all available research papers. We rather aim to analyze and describe common grounds as well as the diversity of approaches in order to identify the main research directions in informed machine learning.

In Section 2, we begin with a formulation of our concept of informed machine learning. In Section 3, we describe how we classified the approaches in terms of our applied surveying methodology and our obtained key insights. Section 4 presents the taxonomy and its elements that we distilled from surveying a large number of research papers. In Section 5, we describe in more detail the approaches for the integration of knowledge into machine learning, classified according to the taxonomy. After a brief historical account in Section 6, we finally discuss future directions in Section 7 and conclude in Section 8.

2 CONCEPT OF INFORMED MACHINE LEARNING

In this section, we present our concept of informed machine learning. We first state our notion of knowledge and then present our descriptive definition of its integration into machine learning.

2.1 Knowledge

The meaning of knowledge is difficult to define in general and is an ongoing debate in philosophy [25], [26], [27]. During the generation of knowledge, it first appears as useful information [28], which is subsequently validated. People validate information about the world using the brain's inner statistical processing capabilities [29], [30] or by consulting trusted authorities. Explicit forms of validation are given by empirical studies or scientific experiments [27], [31].

Here, we assume a computer-scientific perspective and understand knowledge as validated information about relations between entities in certain contexts. Regarding its use in machine learning, an important aspect of knowledge is its formalization. The degree of formalization depends on whether knowledge has been put into writing, how structured the writing is, and how formal and strict the language is that was used (e.g., natural language versus mathematical formula). The more formally knowledge is represented, the more easily it can be integrated into machine learning.
2.2 Integrating Prior Knowledge Into Machine Learning

Apart from the usual information source in a machine learning pipeline, the training data, one can additionally integrate knowledge. If this knowledge is pre-existent and independent of learning algorithms, it can be called prior knowledge. Moreover, such prior knowledge can be given by formal representations, which exist in an external way, separated from the learning problem and the usual training data. Machine learning that explicitly integrates such knowledge representations will henceforth be called informed machine learning.

Definition. Informed machine learning describes learning from a hybrid information source that consists of data and prior knowledge. The prior knowledge comes from an independent source, is given by formal representations, and is explicitly integrated into the machine learning pipeline.

This notion of informed machine learning thus describes the flow of information in Fig. 1 and is distinct from conventional machine learning.

similarities or differences, and to offer guidelines for users and researchers. In this section, we describe our classification methodology and summarize our key insights.

3.1 Methodology

The methodology of our classification is determined by specific analysis questions which we investigated in a systematic literature survey.

3.1.1 Analysis Questions

Our guiding question is how prior knowledge can be integrated into the machine learning pipeline. Our answers will particularly focus on three aspects: Since prior knowledge in informed machine learning comes from an independent source and requires some form of explicit representation, we consider knowledge sources and representations. Since it is also essential at which component of the machine learning pipeline what kind of knowledge is integrated, we also consider integration methods. In short, our literature survey addresses the following three questions:
Fig. 2. Taxonomy of informed machine learning. This taxonomy serves as a classification framework for informed machine learning and structures approaches according to the three analysis questions above about the knowledge source, knowledge representation, and knowledge integration. Based on a comparative and iterative literature survey, we identified for each dimension a set of elements that represent a spectrum of different approaches. The size of the elements reflects the relative count of papers. We combine the taxonomy with a Sankey diagram in which the paths connect the elements across the three dimensions and illustrate the approaches that we found in the analyzed papers. The broader the path, the more papers we found for that approach. Main paths (four or more papers with the same approach across all dimensions) are highlighted in darker grey and represent central approaches of informed machine learning.
With respect to knowledge sources, we found three broad categories: rather specialized and formalized scientific knowledge, everyday life's world knowledge, and more intuitive expert knowledge. For scientific knowledge we found the most informed machine learning papers. With respect to knowledge representations, we found versatile and fine-grained approaches and distilled eight categories (algebraic equations, differential equations, simulation results, spatial invariances, logic rules, knowledge graphs, probabilistic relations, and human feedback). Regarding knowledge integration, we found approaches for all stages of the machine learning pipeline, from the training data and the hypothesis set, through the learning algorithm, to the final hypothesis. However, most informed machine learning papers consider the two central stages.

Depending on the perspective, the taxonomy can be regarded from either one of two sides: An application-oriented user might prefer to read the taxonomy from left to right, starting with some given knowledge source and then selecting representation and integration. Vice versa, a method-oriented developer or researcher might prefer to read the taxonomy from right to left, starting with some given integration method. For both perspectives, knowledge representations are important building blocks and constitute an abstract interface that connects the application- and the method-oriented side.

3.2.2 Frequent Approaches

The taxonomy serves as a classification framework and allows us to identify frequent approaches of informed machine learning. In our literature survey, we categorized each research paper with respect to each of the three taxonomy dimensions.

Paths Through the Taxonomy. When visually highlighting and connecting them, a specific combination of entries across the taxonomy dimensions figuratively results in a path through the taxonomy. Such paths represent specific approaches of informed machine learning, and we illustrate this by combining the taxonomy with a Sankey diagram, as shown in Fig. 2. We observe that, while various paths through the taxonomy are possible, specific ones occur more frequently, and we will call them main paths. For example, we often observed the approach that scientific knowledge is represented in algebraic equations, which are then integrated into the learning algorithm, e.g., the loss function. As another example, we often found that world knowledge such as linguistics is represented by logic rules, which are then integrated into the hypothesis set, e.g., the network architecture. These paths, especially the main paths, can be used as a guideline for users new to the field or provide a set of baseline methods for researchers.
Fig. 3. Knowledge representations and learning tasks.
Fig. 4. Knowledge integration and its goals.
Paths From Source to Representation. We found that the paths from source to representation form groups. That is, for every knowledge source there appear prevalent representation types. Scientific knowledge is mainly represented in terms of algebraic or differential equations or exists in the form of simulation results. While other forms of representation are possible, too, there is a clear preference for equations or simulations, likely because most sciences aim at finding natural laws encoded in formulas. For world knowledge, the representation forms of logic rules, knowledge graphs, or spatial invariances are the primary ones. These can be understood as a group of symbolic representations. Expert knowledge is mainly represented by probabilistic relations or human feedback. This appears reasonable because such representations allow for informality as well as for a degree of uncertainty, both of which might be useful for representing intuition. We also performed an additional analysis on the dependency of the learning task and found a confirmation of the above described representation groups, as shown in Fig. 3.

From a theoretical point of view, transformations between representations are possible and indeed often apparent within the aforementioned groups. For example, equations can be transformed to simulation results, or logic rules can be represented as knowledge graphs and vice versa. Nevertheless, from a practical point of view, differentiating between forms of representations appears useful, as specific representations might already be available in a given setup.

Paths From Representation to Integration. For most of the representation types we found at least one main path to an integration type. The following mappings can be observed. Simulation results are very often integrated into the training data. Knowledge graphs, spatial invariances, and logic rules are frequently incorporated into the hypothesis set. The learning algorithm is mainly enhanced by algebraic or differential equations, logic rules, probabilistic relations, or human feedback. Lastly, the final hypothesis is often checked by knowledge graphs or also by simulation results. However, since we observed various possible types of integration for all representation types, the integration still appears to be problem specific.

Hence, we additionally analyzed the literature for the goal of the prior knowledge integration and found four main goals: data efficiency, accuracy, interpretability, or knowledge conformity. Although these goals are interrelated or even partially equivalent according to statistical learning theory, it is interesting to examine them as different motivations for the chosen approach. The distribution of goals for the distinct integration types is shown in Fig. 4. We observe that the main goal is always to achieve better performance. The integration of prior knowledge into the training data stands out because its main goal is to train with less data. The integration into the final hypothesis is also special because it is mainly used to ensure knowledge conformity for secure and trustworthy AI. All in all, this distribution suggests suitable integration approaches depending on the goal.

4 TAXONOMY

In this section, we describe the informed machine learning taxonomy that we distilled as a classification framework in our literature survey. For each of the three taxonomy dimensions, knowledge source, knowledge representation, and knowledge integration, we describe the found elements, as shown in Fig. 2. While an extensive approach categorization according to this taxonomy with further concrete examples will be presented in the next section (Section 5), we here describe the taxonomy on a more conceptual level.

4.1 Knowledge Source

The category knowledge source refers to the origin of prior knowledge to be integrated in machine learning. We observe that the source of prior knowledge can be an established knowledge domain but also knowledge from an individual group of people with respective experience.

We find that prior knowledge often stems from the sciences or is a form of world or expert knowledge, as illustrated on the left in Fig. 2. This list is neither complete nor disjoint but intended to show a spectrum from more formal to less formal, or explicitly to implicitly validated knowledge. Although particular knowledge can be assigned to more than one of these sources, the goal of this categorization is to identify paths in our taxonomy that describe frequent approaches of knowledge integration into machine learning. In the following we shortly describe each of the knowledge sources.

Scientific Knowledge. We subsume the subjects of science, technology, engineering, and mathematics under scientific knowledge. Such knowledge is typically formalized and validated explicitly through scientific experiments. Examples are the universal laws of physics, bio-molecular descriptions of genetic sequences, or material-forming production processes.
TABLE 1
Illustrative Overview of Knowledge Representations in the Informed Machine Learning Taxonomy
Each representation type is illustrated by a simple or prominent example in order to give a first intuitive understanding.
World Knowledge. By world knowledge we refer to facts from everyday life that are known to almost everyone and can thus also be called general knowledge. It can be more or less formal. Generally, it can be intuitive and validated implicitly by humans reasoning in the world surrounding them. Therefore, world knowledge often describes relations of objects or concepts appearing in the world perceived by humans, for instance, the fact that a bird has feathers and can fly. Moreover, under world knowledge we also subsume linguistics. Such knowledge can also be explicitly validated through empirical studies. Examples are the syntax and semantics of language.

Expert Knowledge. We consider expert knowledge to be knowledge that is held by a particular group of experts. Within the experts' community it can also be called common knowledge. Such knowledge is rather informal and needs to be formalized, e.g., with human-machine interfaces. It is also validated implicitly through a group of experienced specialists. In the context of cognitive science, this expert knowledge can also become intuitive [29]. For example, an engineer or a physician acquires knowledge over several years of experience working in a specific field.

4.2 Knowledge Representation

The category knowledge representation describes how knowledge is formally represented. With respect to the flow of information in informed machine learning in Fig. 1, it directly corresponds to our key element of prior knowledge. This category constitutes the central building block of our taxonomy, because it determines the potential interface to the machine learning pipeline.

In our literature survey, we frequently encountered certain representation types, as listed in the taxonomy in Fig. 2 and illustrated more concretely in Table 1. Our goal is to provide a classification framework of informed machine learning approaches including the used knowledge representation types. Although some types can be mathematically transformed into each other, we keep the representations that are closest to those in the reviewed literature. Here we give a first conceptual overview over these types.

Algebraic Equations. Algebraic equations represent knowledge as equality or inequality relations between mathematical expressions consisting of variables or constants. Equations can be used to describe general functions or to constrain variables to a feasible set and are thus sometimes also called algebraic constraints. Prominent examples in Table 1 are the equation for the mass-energy equivalence (E = mc²) and the inequality (v ≤ c) stating that nothing can travel faster than the speed of light in vacuum.

Differential Equations. Differential equations are a subset of algebraic equations which describe relations between functions and their spatial or temporal derivatives. Two famous examples in Table 1 are the heat equation, which is a partial differential equation (PDE), and Newton's second law, which is an ordinary differential equation (ODE). In both cases, there exists a (possibly empty) set of functions that solve the differential equation for given initial or boundary conditions. Differential equations are often the basis of a numerical computer simulation. We distinguish the taxonomy categories of differential equations and simulation results in the sense that the former represents a compact mathematical model while the latter represents unfolded, data-based computation results.

Simulation Results. Simulation results describe the numerical outcome of a computer simulation, which is an approximate imitation of the behavior of a real-world process. A simulation engine typically solves a mathematical model using numerical methods and produces results for situation-specific parameters. Its numerical outcome is the simulation result that we describe here as the final knowledge representation. Examples are the flow field of a simulated fluid or pictures of simulated traffic scenes.

Spatial Invariances. Spatial invariances describe properties that do not change under mathematical transformations such as translations and rotations. If a geometric object is invariant under such transformations, it has a symmetry (for example, a rotationally symmetric triangle). A function can be called invariant if it has the same result for a symmetric transformation of its argument. Connected to invariance is the property of equivariance.
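For a concrete feel of these two properties, the following minimal Python sketch (our illustration, not taken from the surveyed papers) checks them numerically: global average pooling is invariant under a 90-degree rotation, while a pixel-wise operation is equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))  # a toy "image"

def T(img):
    # Transformation: rotate the image by 90 degrees.
    return np.rot90(img)

def f(img):
    # Global average pooling: a rotation-invariant function.
    return img.mean()

def g(img):
    # Pixel-wise nonlinearity: a rotation-equivariant function.
    return np.maximum(img - 0.5, 0.0)

# Invariance: the output does not change under the transformation.
assert np.isclose(f(T(x)), f(x))

# Equivariance: transforming the input transforms the output in the same way.
assert np.allclose(g(T(x)), T(g(x)))
```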
Logic Rules. Logic provides a way of formalizing knowledge about facts and dependencies and allows for translating ordinary language statements (e.g., IF A THEN B) into formal logic rules (A ⇒ B). Generally, a logic rule consists of a set of Boolean expressions (A, B) combined with logical connectives (∧, ∨, ⇒, ...). Logic rules can also be called logic constraints or logic sentences.
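To illustrate how such a rule can interface with a learning system, the following sketch (our own toy example, loosely in the spirit of rule-based loss terms such as semantic-based regularization [21]) relaxes A ⇒ B to a soft truth value on predicted probabilities, so that rule violations can be penalized rather than strictly forbidden:

```python
def implies(a: float, b: float) -> float:
    """Fuzzy truth value of the rule A => B for probabilities a, b in [0, 1].

    Equals 1 when the rule is satisfied and shrinks as it is violated.
    """
    return 1.0 - a * (1.0 - b)

def rule_penalty(p_bird: float, p_can_fly: float) -> float:
    # Penalize predictions that violate "IF bird THEN can_fly"
    # (a soft, exception-tolerant constraint rather than a hard filter).
    return 1.0 - implies(p_bird, p_can_fly)

print(rule_penalty(0.9, 0.1))   # strong violation -> large penalty (0.81)
print(rule_penalty(0.9, 0.95))  # rule respected  -> small penalty (0.045)
```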
Knowledge Graphs. A graph is a pair (V, E), where V are its vertices and E denotes its edges. In a knowledge graph, vertices (or nodes) usually describe concepts whereas edges represent (abstract) relations between them (as in the example "Man wears shirt" in Table 1). In an ordinary weighted graph, edges quantify the strength and the sign of a relationship between nodes.
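In code, a knowledge graph is commonly handled as a set of (head, relation, tail) triples; a minimal sketch with hypothetical facts could look as follows:

```python
# A knowledge graph as a set of (head, relation, tail) triples.
triples = {
    ("man", "wears", "shirt"),
    ("bird", "has", "feathers"),
    ("bird", "capable_of", "flying"),
}

def neighbors(entity: str):
    """Return all (relation, tail) pairs for a given head entity."""
    return [(r, t) for (h, r, t) in triples if h == entity]

# e.g., [('has', 'feathers'), ('capable_of', 'flying')] (set order may vary)
print(neighbors("bird"))
```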
TABLE 2
References Classified by Knowledge Representation and (Path From) Knowledge Source
TABLE 3
References Classified by Knowledge Representation and (Path to) Knowledge Integration
simulations. These are simplified simulations that approximate the overall behaviour of a system but ignore intricate details for the sake of computing speed.

When building a machine learning model that reflects the actual, detailed behaviour of a system, low-fidelity simulation results or a response surface (a data-driven model of the simulation results) can be built into the architecture of a knowledge-based neural network (KBANN [53], see Insert 3), e.g., by replacing one or more neurons. This way, parts of the network can be used to learn a mapping from low-fidelity simulation results to a few real-world observations or high-fidelity simulations [60], [69].
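The following toy sketch illustrates this multi-fidelity idea under simplifying assumptions of our own (the function lofi_sim and the data are hypothetical): the cheap simulation is kept fixed inside the model, and only a small residual correction is learned from high-fidelity observations.

```python
import numpy as np

def lofi_sim(x):
    # Hypothetical low-fidelity simulation: captures the trend, not the details.
    return 2.0 * x

# A few expensive high-fidelity observations (synthetic for this sketch).
x_train = np.linspace(0.0, 1.0, 8)
y_hifi = 2.0 * x_train + 0.3 * np.sin(4.0 * x_train)

# The learned part only has to model the residual between the
# embedded simulation and the high-fidelity data.
residual_coeffs = np.polyfit(x_train, y_hifi - lofi_sim(x_train), deg=3)

def hybrid_model(x):
    return lofi_sim(x) + np.polyval(residual_coeffs, x)

print(hybrid_model(0.5))
```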
Learning Algorithm. Furthermore, a simulation can directly be integrated into the iterations of a learning algorithm. For example, a realistic positioning of objects in a 3D scene can be improved by incorporating feedback from a solid-body simulation into learning [66]. By means of reinforcement learning, this is even feasible if there are no gradients available from the simulation.
Final Hypothesis. A last but important approach that we found in our survey integrates simulation results into the final hypothesis set of a machine learning model. Specifically, simulations can validate the results of a trained model [19], [61], [66].
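A minimal sketch of such a consistency check (our illustration; simulate is a hypothetical engine) filters a trained model's predictions through a simulation before accepting them:

```python
def simulation_validated(predictions, simulate, tolerance=1e-2):
    """Keep only predictions that a simulation confirms.

    `simulate` is a hypothetical engine returning the physical error of a
    predicted configuration; `tolerance` encodes the required conformity.
    """
    return [p for p in predictions if simulate(p) <= tolerance]

# Hypothetical usage: scenes = simulation_validated(model_outputs, solid_body_sim)
```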
5.4 Spatial Invariances

Next, we describe informed machine learning approaches involving the representation type of spatial invariances. Their main path comes from world knowledge and goes to the hypothesis set.

5.4.1 (Paths From) Knowledge Source

We mainly found references using spatial invariances in the context of world knowledge or scientific knowledge.
[57]. Furthermore, prior causal knowledge can be used to constrain the direction of links in a Bayesian network [58].

Final Hypothesis. Finally, predictions obtained from a Bayesian network can be judged by probabilistic relational knowledge in order to refine the model [106].

5.8 Human Feedback

Finally, we look at informed machine learning approaches belonging to the representation type of human feedback. The most common path begins with expert knowledge and ends at the learning algorithm.
5.8.1 (Paths From) Knowledge Source

Compared to other categories in our taxonomy, knowledge representation via human feedback is less formalized and mainly stems from expert knowledge.

Expert Knowledge. Examples of knowledge that fall into this category include knowledge about topics in text documents [97], agent behaviors [98], [99], [103], [104], and data patterns and hierarchies [97], [105], [109]. Knowledge is often provided in the form of relevance or preference feedback, and humans in the loop can integrate their intuitive knowledge into the system without providing an explanation for their decision. For example, in object recognition, users can provide corrective feedback about object boundaries via brush strokes [107]. As another example, in game AI, an expert user can give spoken instructions to an agent in an Atari game [99].
5.8.2 (Paths to) Knowledge Integration

Human feedback for machine learning is usually assumed to be limited to feature engineering and data annotation. However, it can also be integrated into the learning algorithm itself. This often occurs in areas of reinforcement learning or interactive learning combined with visual analytics.

Learning Algorithm. In reinforcement learning, an agent observes an unknown environment and learns to act based on reward signals. The TAMER framework [98] provides the agent with human feedback rather than (predefined) rewards. This way, the agent learns from observations and human knowledge alike. While these approaches can quickly learn optimal policies, it is cumbersome to obtain the human feedback for every action. Human preference w.r.t. whole action sequences, i.e., agent behaviors, can circumvent this [103]. This enables the learning of reward functions. Expert knowledge can also be incorporated through natural language interfaces [99]. Here, a human provides instructions and agents receive rewards upon completing these instructions.
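A tabular sketch of this idea (our much-simplified reading of TAMER, not the reference implementation of [98]) learns a model H of the human feedback signal and acts greedily with respect to it:

```python
from collections import defaultdict

H = defaultdict(float)   # learned model of the human reinforcement signal
ALPHA = 0.1              # learning rate
ACTIONS = ["left", "right", "stay"]

def choose_action(state):
    # Act greedily with respect to the learned human-feedback model.
    return max(ACTIONS, key=lambda a: H[(state, a)])

def update(state, action, human_feedback):
    # Move the estimate toward the trainer's scalar feedback for (state, action).
    key = (state, action)
    H[key] += ALPHA * (human_feedback - H[key])

# Hypothetical loop: `get_human_feedback` would query the human trainer.
# a = choose_action(s); update(s, a, get_human_feedback(s, a))
```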
Active learning offers a way to include the "human in the loop" to learn efficiently with minimal human intervention. This is based on iterative strategies where a learning algorithm queries an annotator for labels [126]. We do not consider this standard active learning as an informed learning method, because the human knowledge is essentially used for label generation only. However, recent efforts integrate further knowledge into the active learning process.
Visual analytics combines analysis techniques and interactive visual interfaces to enable exploration of, and inference from, data [127]. Machine learning is increasingly combined with visual analytics. For example, visual analytics systems allow users to drag similar data points closer in order to learn distance functions [105], to provide corrective feedback in object recognition [107], or even to alter correctly identified instances where the interpretation is not in line with human explanations [108], [109].
Lastly, various tools exist for text analysis, in particular for topic modeling [97], where users can create, merge, and refine topics or change keyword weights. They thus impart knowledge by generating new reference matrices (term-by-topic and topic-by-document matrices) that are integrated in a regularization term penalizing the difference between the new and the old reference matrices. This is similar to the semantic loss term described above.
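A toy version of such a reference-regularized factorization (our own projected-gradient sketch, only loosely following [97]; W_ref and H_ref stand for the hypothetical user-edited matrices) could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 30))            # term-by-document matrix
k, lam, lr = 5, 0.5, 1e-3           # topics, regularization weight, step size

W = rng.random((20, k))             # term-by-topic matrix
H = rng.random((k, 30))             # topic-by-document matrix
W_ref, H_ref = W.copy(), H.copy()   # user-edited reference matrices (hypothetical)

for _ in range(500):
    R = W @ H - X                   # reconstruction error
    # Gradients of ||X - WH||^2 + lam * (||W - W_ref||^2 + ||H - H_ref||^2)
    grad_W = 2 * R @ H.T + 2 * lam * (W - W_ref)
    grad_H = 2 * W.T @ R + 2 * lam * (H - H_ref)
    W = np.maximum(W - lr * grad_W, 0.0)  # projected step keeps factors nonnegative
    H = np.maximum(H - lr * grad_H, 0.0)
```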
Training Data and Hypothesis Set. Another approach towards incorporating expert knowledge in reinforcement learning considers human demonstration of problem solving. Expert demonstrations can be used to pre-train a deep Q-network, which accelerates learning [104]. Here, prior knowledge is integrated into the hypothesis set and the training data, since the demonstrations inform the training of the Q-network and, at the same time, allow for interactive learning via simulations.
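Sketched very roughly (a behavior-cloning-style simplification of ours, not the margin-based loss actually used in [104]), such pre-training first fits the network to the demonstrated actions before any environment interaction:

```python
import numpy as np

def pretrain_on_demonstrations(q_net, demos, epochs=10):
    """Fit q_net to prefer the expert's actions (supervised warm start).

    `demos` is a list of (state, expert_action) pairs; `q_net.fit` is a
    hypothetical supervised-training routine of the network.
    """
    states = np.array([s for s, _ in demos])
    actions = np.array([a for _, a in demos])
    q_net.fit(states, actions, epochs=epochs)  # classification-style targets
    return q_net  # afterwards: continue with ordinary Q-learning updates
```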
6 HISTORICAL BACKGROUND

The idea of integrating knowledge into learning has a long history. Historically, AI research roughly considered the two antipodal paradigms of symbolism and connectionism. The former dominated up until the 1980s and refers to reasoning based on symbolic knowledge; the latter became more popular in the 1990s and considers data-driven decision making using neural networks. Especially Minsky [128] pointed out limitations of symbolic AI and promoted a stronger focus on data-driven methods to allow for causal and fuzzy reasoning. Already in the 1990s, knowledge databases were used together with training data to obtain knowledge-based artificial neural networks [53]. In the 2000s, when support vector machines (SVMs) were the de facto paradigm in classification, there was interest in incorporating knowledge into this formalism [23]. Moreover, in the geosciences, and most prominently in weather forecasting, knowledge integration dates back to the 1950s. Especially the discipline of data assimilation deals with techniques that combine statistical and mechanistic models to improve prediction accuracy [129], [130].

7 DISCUSSION OF CHALLENGES AND DIRECTIONS

Our findings about the main approaches of informed machine learning are summarized in Table 4. It gives for each approach the taxonomy path, its main motivation, the central approach idea, remarks on potential challenges, and our viewpoint on current or future directions. For further details on the methods themselves and the corresponding papers, we refer to Section 5. In the following, we discuss the challenges and directions for these main approaches, sorted by the integrated knowledge representations.
TABLE 4
Main Approaches of Informed Machine Learning
The approaches are sorted by taxonomy path and knowledge representation. Methodical details can be found in Section 5. Challenges and directions are discussed
in Section 7.
Prior knowledge in the form of algebraic equations can be integrated as constraints via knowledge-based loss terms (e.g., [12], [13], [35]). Here, we see a potential challenge in finding the right weights for supervision from knowledge versus data labels. Currently, this is solved by setting the hyperparameters for the individual loss terms [12]. However, we think that strategies from more recently developed learning algorithms, such as self-supervised [131] or few-shot learning [132], could also advance the supervision from prior knowledge. Moreover, we suggest further research on theoretical concepts based on the existing generalization bounds from statistical learning theory [133], [134] and the connection between regularization and the effective hypothesis space [135].
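As a minimal sketch of such a weighted knowledge-based loss (our illustration, not the implementation of any cited paper), consider a regression model whose output is known from prior knowledge to be nonnegative:

```python
import torch

def informed_loss(model, x, y, lam=0.1):
    """Data loss plus a weighted penalty for violating prior knowledge.

    Here the (assumed) algebraic constraint is y_pred >= 0; relu measures
    the violation, and `lam` weighs knowledge against data supervision.
    """
    y_pred = model(x)
    data_loss = torch.nn.functional.mse_loss(y_pred, y)
    constraint_violation = torch.relu(-y_pred).mean()
    return data_loss + lam * constraint_violation
```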
Differential equations can be integrated similarly, but with a specific focus on physics-informed neural networks that constrain the model derivatives by the underlying differential equation (e.g., [20], [45], [46]). A potential challenge is the robustness of the solution, which is the subject of current research. One approach is to investigate the model quality by a suitable quantification of its uncertainty [43], [46]. We think a more in-depth comparison with existing numerical solvers [136] would also be helpful. Another challenge of physical systems is the generation and integration of sensor data in real time. This is currently tackled by online learning methods [48]. Furthermore, we think that techniques from data assimilation [130] could also be helpful to combine modelling from knowledge and data.
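A minimal sketch of this derivative constraint (our toy example for the ODE du/dt = -ku with u(0) = 1, not the setup of the cited papers) uses automatic differentiation to penalize the residual of the equation:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
k = 1.0
t = torch.linspace(0.0, 2.0, 50).reshape(-1, 1).requires_grad_(True)

u = model(t)
# du/dt via automatic differentiation of the network output.
du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]
# Residual of the prior-knowledge ODE du/dt + k*u = 0.
physics_loss = ((du_dt + k * u) ** 2).mean()
# Initial condition u(0) = 1 as an additional loss term.
ic_loss = (model(torch.zeros(1, 1)) - 1.0).pow(2).mean()
loss = physics_loss + ic_loss  # minimize with any gradient-based optimizer
```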
Simulation results can be used for synthetic data generation or augmentation (e.g., [18], [19], [59]), but this can bring up the challenge of a mismatch between real and simulated data. A promising direction to close the gap is domain adaptation, especially adversarial training [67], [137], or domain randomization [138]. Moreover, for future work we see further potential in the development of new hybrid systems that combine machine learning and simulation in more sophisticated ways [139].
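Domain randomization itself is simple to sketch (our toy version; simulate_scene is a hypothetical simulator): every synthetic training sample is generated under randomly perturbed simulation parameters, so that the real domain appears as just one more variation.

```python
import random

def simulate_scene(friction, mass, lighting):
    # Hypothetical simulator returning an (observation, label) pair.
    ...

def randomized_training_data(n_samples):
    data = []
    for _ in range(n_samples):
        params = {
            "friction": random.uniform(0.2, 1.0),
            "mass": random.uniform(0.5, 2.0),
            "lighting": random.choice(["day", "dusk", "night"]),
        }
        data.append(simulate_scene(**params))
    return data
```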
The utilization of spatial invariances through model architectures with invariant characteristics, such as group equivariant or convolutional networks, diminishes the model search space (e.g., [70], [71], [75]). Here, a potential challenge is the proper invariance specification and implementation [75] or expensive evaluations on more complex geometries [111]. Therefore, we think that the efficient adaptation of invariant-based models to further scenarios can further improve geometric-based representation learning [111].

Logic rules can be encoded in the architecture of knowledge-based neural networks (KBANNs) (e.g., [53], [54], [89]). Since this idea was already developed when neural networks had only a few layers, a question is whether it is still feasible for deep neural networks. In order to improve the practicality, we suggest developing automated interfaces for knowledge integration. A future direction could be the development of new neuro-symbolic systems. Although the combination of
[15] K. Marino, R. Salakhutdinov, and A. Gupta, "The more you know: Using knowledge graphs for image classification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 20–28.
[16] C. Jiang, H. Xu, X. Liang, and L. Lin, "Hybrid knowledge routed modules for large-scale object detection," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1559–1570.
[17] A. Cully, J. Clune, D. Tarapore, and J.-B. Mouret, "Robots that can adapt like animals," Nature, vol. 521, no. 7553, pp. 503–507, 2015.
[18] K.-H. Lee, J. Li, A. Gaidon, and G. Ros, "SPIGAN: Privileged adversarial learning from simulation," in Proc. Int. Conf. Learn. Representations, 2019.
[19] J. Pfrommer, C. Zimmerling, J. Liu, L. Kärger, F. Henning, and J. Beyerer, "Optimisation of manufacturing process parameters using deep neural networks as surrogate models," Procedia CIRP, vol. 72, no. 1, pp. 426–431, 2018.
[20] M. Raissi, P. Perdikaris, and G. E. Karniadakis, "Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations," 2017, arXiv:1711.10561.
[21] M. Diligenti, M. Gori, and C. Sacca, "Semantic-based regularization for learning and inference," Artif. Intell., vol. 244, pp. 143–165, 2017.
[22] A. Karpatne et al., "Theory-guided data science: A new paradigm for scientific discovery from data," IEEE Trans. Knowl. Data Eng., vol. 29, no. 10, pp. 2318–2331, 2017.
[23] F. Lauer and G. Bloch, "Incorporating prior knowledge in support vector machines for classification: A review," Neurocomputing, vol. 71, no. 7–9, pp. 1578–1594, 2008.
[24] P. W. Battaglia et al., "Relational inductive biases, deep learning, and graph networks," 2018, arXiv:1806.01261.
[25] M. Steup, "Epistemology," in The Stanford Encyclopedia of Philosophy (Winter 2018 Edition), E. N. Zalta, Ed., 2018. [Online]. Available: https://stanford.library.sydney.edu.au/archives/win2018/entries/epistemology/
[26] L. Zagzebski, What is Knowledge? Hoboken, NJ, USA: Wiley, 2017.
[27] P. Machamer and M. Silberstein, The Blackwell Guide to the Philosophy of Science, vol. 19. Hoboken, NJ, USA: Wiley, 2008.
[28] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From data mining to knowledge discovery in databases," AI Mag., vol. 17, no. 3, 1996.
[29] D. Kahneman, Thinking, Fast and Slow. New York, NY, USA: Macmillan, 2011.
[30] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, "Building machines that learn and think like people," Behav. Brain Sci., vol. 40, 2017.
[31] H. G. Gauch, Scientific Method in Practice. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[32] Y. S. Abu-Mostafa, M. Magdon-Ismail, and H.-T. Lin, Learning From Data. USA: AMLBook, 2012.
[33] N. Muralidhar, M. R. Islam, M. Marwah, A. Karpatne, and N. Ramakrishnan, "Incorporating prior domain knowledge into deep neural networks," in Proc. Int. Conf. Big Data, 2018, pp. 36–45.
[34] Y. Lu, M. Rajora, P. Zou, and S. Liang, "Physics-embedded machine learning: Case study with electrochemical micro-machining," Machines, vol. 5, no. 1, 2017, Art. no. 4.
[35] R. Heese, M. Walczak, L. Morand, D. Helm, and M. Bortz, "The good, the bad and the ugly: Augmenting a black-box model with expert knowledge," in Proc. Int. Conf. Artif. Neural Netw., 2019, pp. 391–395.
[36] G. M. Fung, O. L. Mangasarian, and J. W. Shavlik, "Knowledge-based support vector machine classifiers," in Proc. 15th Int. Conf. Neural Inf. Process. Syst., 2003, pp. 537–544.
[37] O. L. Mangasarian and E. W. Wild, "Nonlinear knowledge-based classification," IEEE Trans. Neural Netw., vol. 19, no. 10, pp. 1826–1832, Oct. 2008.
[38] R. Ramamurthy, C. Bauckhage, R. Sifa, J. Schücker, and S. Wrobel, "Leveraging domain knowledge for reinforcement learning using MMC architectures," in Proc. Int. Conf. Artif. Neural Netw., 2019, pp. 595–607.
[39] M. von Kurnatowski, J. Schmid, P. Link, R. Zache, L. Morand, T. Kraft, I. Schmidt, and A. Stoll, "Compensating data shortages in manufacturing with monotonicity knowledge," 2020, arXiv:2010.15955.
[40] C. Bauckhage, C. Ojeda, J. Schücker, R. Sifa, and S. Wrobel, "Informed machine learning through functional composition," in Proc. LWDA, 2018, pp. 33–37.
[41] R. King, O. Hennigh, A. Mohan, and M. Chertkov, "From deep to physics-informed learning of turbulence: Diagnostics," 2018, arXiv:1810.07785.
[42] S. Jeong, B. Solenthaler, M. Pollefeys, M. Gross et al., "Data-driven fluid simulations using regression forests," ACM Trans. Graph., vol. 34, no. 6, 2015, Art. no. 199.
[43] Y. Yang and P. Perdikaris, "Physics-informed deep generative models," 2018, arXiv:1812.03511.
[44] E. de Bezenac, A. Pajot, and P. Gallinari, "Deep learning for physical processes: Incorporating prior scientific knowledge," 2017, arXiv:1711.07970.
[45] I. E. Lagaris, A. Likas, and D. I. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987–1000, Sep. 1998.
[46] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris, "Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data," J. Comput. Phys., vol. 394, pp. 56–81, 2019.
[47] D. C. Psichogios and L. H. Ungar, "A hybrid neural network-first principles approach to process modeling," AIChE J., vol. 38, no. 10, pp. 1499–1511, 1992.
[48] M. Lutter, C. Ritter, and J. Peters, "Deep Lagrangian networks: Using physics as model prior for deep learning," 2019, arXiv:1907.04490.
[49] F. D. A. Belbute-Peres, K. R. Allen, K. A. Smith, and J. B. Tenenbaum, "End-to-end differentiable physics for learning and control," in Proc. Neural Inf. Process. Syst., 2018, pp. 7178–7189.
[50] J. Ling, A. Kurzawski, and J. Templeton, "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance," J. Fluid Mech., vol. 807, pp. 155–166, 2016.
[51] A. Butter, G. Kasieczka, T. Plehn, and M. Russell, "Deep-learned top tagging with a Lorentz layer," SciPost Phys., vol. 5, no. 28, 2018.
[52] J.-L. Wu, H. Xiao, and E. Paterson, "Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework," Phys. Rev. Fluids, vol. 3, no. 7, 2018, Art. no. 074602.
[53] G. G. Towell and J. W. Shavlik, "Knowledge-based artificial neural networks," Artif. Intell., vol. 70, no. 1–2, pp. 119–165, 1994.
[54] A. S. d. Garcez and G. Zaverucha, "The connectionist inductive learning and logic programming system," Appl. Intell., vol. 11, no. 1, pp. 59–77, 1999.
[55] T. Ma and A. Zhang, "Multi-view factorization autoencoder with network constraints for multi-omic integrative analysis," in Proc. Int. Conf. Bioinform. Biomed., 2018, pp. 702–707.
[56] Z. Che, D. Kale, W. Li, M. T. Bahadori, and Y. Liu, "Deep computational phenotyping," in Proc. Int. Conf. Knowl. Discov. Data Mining, 2015, pp. 507–516.
[57] M. B. Messaoud, P. Leray, and N. B. Amor, "Integrating ontological knowledge for iterative causal discovery and visualization," in Proc. Eur. Conf. Symbolic Quantitative Approaches Reasoning Uncertainty, 2009, pp. 168–179.
[58] G. Borboudakis and I. Tsamardinos, "Incorporating causal prior knowledge as path-constraints in Bayesian networks and maximal ancestral graphs," 2012, arXiv:1206.6390.
[59] T. Deist, A. Patti, Z. Wang, D. Krane, T. Sorenson, and D. Craft, "Simulation assisted machine learning," Bioinformatics, vol. 35, no. 20, pp. 4072–4080, 2019.
[60] H. S. Kim, M. Koc, and J. Ni, "A hybrid multi-fidelity approach to the optimal design of warm forming processes using a knowledge-based artificial neural network," Int. J. Mach. Tools Manuf., vol. 47, no. 2, pp. 211–222, 2007.
[61] G. Hautier, C. C. Fischer, A. Jain, T. Mueller, and G. Ceder, "Finding nature's missing ternary oxide compounds using machine learning and density functional theory," Chem. Mater., vol. 22, no. 12, pp. 3762–3767, 2010.
[62] M. B. Chang, T. Ullman, A. Torralba, and J. B. Tenenbaum, "A compositional object-based approach to learning physical dynamics," 2016, arXiv:1612.00341.
[63] E. Choi, M. T. Bahadori, L. Song, W. F. Stewart, and J. Sun, "GRAM: Graph-based attention model for healthcare representation learning," in Proc. Int. Conf. Knowl. Discov. Data Mining, 2017, pp. 787–795.
[64] A. Lerer, S. Gross, and R. Fergus, "Learning physical intuition of block towers by example," 2016, arXiv:1603.01312.
[65] A. Rai, R. Antonova, F. Meier, and C. G. Atkeson, "Using simulation to improve sample-efficiency of Bayesian optimization for bipedal robots," J. Mach. Learn. Res., vol. 20, no. 49, pp. 1–24, 2019.
[66] Y. Du et al., "Learning to exploit stability for 3D scene parsing," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1733–1743.
[67] A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb, "Learning from simulated and unsupervised images through adversarial training," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2242–2251.
[68] F. Wang and Q.-J. Zhang, "Knowledge-based neural models for microwave design," IEEE Trans. Microwave Theory Techn., vol. 45, no. 12, pp. 2333–2343, Dec. 1997.
[69] S. J. Leary, A. Bhaskar, and A. J. Keane, "A knowledge-based approach to response surface modelling in multifidelity optimization," J. Global Optim., vol. 26, no. 3, pp. 297–319, 2003.
[70] T. S. Cohen and M. Welling, "Group equivariant convolutional networks," in Proc. Int. Conf. Mach. Learn., 2016, pp. 2990–2999.
[71] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, "Exploiting cyclic symmetry in convolutional neural networks," 2016, arXiv:1602.02660.
[72] B. Schölkopf, P. Simard, A. J. Smola, and V. Vapnik, "Prior knowledge in support vector kernels," in Proc. Conf. Adv. Neural Inf. Process. Syst., 1998, pp. 640–646.
[73] B. Yet, Z. B. Perkins, T. E. Rasmussen, N. R. Tai, and D. W. R. Marsh, "Combining data and meta-analysis to build Bayesian networks for clinical decision support," J. Biomed. Inform., vol. 52, pp. 373–385, 2014.
[74] J. Li, Z. Yang, H. Liu, and D. Cai, "Deep rotation equivariant network," Neurocomputing, vol. 290, pp. 26–33, 2018.
[75] D. E. Worrall, S. J. Garbin, D. Turmukhambetov, and G. J. Brostow, "Harmonic networks: Deep translation and rotation equivariance," in Proc. Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5028–5037.
[76] P. Niyogi, F. Girosi, and T. Poggio, "Incorporating prior information in machine learning by creating virtual examples," Proc. IEEE, vol. 86, no. 11, pp. 2196–2209, Nov. 1998.
[77] M. Schiegg, M. Neumann, and K. Kersting, "Markov logic mixtures of Gaussian processes: Towards machines reading regression data," in Proc. Artif. Intell. Statist., 2012, pp. 1002–1011.
[78] M. Sachan, K. A. Dubey, T. M. Mitchell, D. Roth, and E. P. Xing, "Learning pipelines with limited data and domain knowledge: A study in parsing physics problems," in Proc. Neural Inf. Process. Syst., 2018, pp. 140–151.
[79] H. Zhou, T. Young, M. Huang, H. Zhao, J. Xu, and X. Zhu, "Commonsense knowledge aware conversation generation with graph attention," in Proc. Int. Joint Conf. Artif. Intell., 2018, pp. 4623–4629.
[80] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in Proc. Assoc. Comput. Linguistics, Int. Joint Conf. Natural Lang. Process., 2009, pp. 1003–1011.
[81] X. Liang, Z. Hu, H. Zhang, L. Lin, and E. P. Xing, "Symbolic graph reasoning meets convolutions," in Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 1858–1868.
[82] D. L. Bergman, "Symmetry constrained machine learning," in Proc. SAI Intell. Syst. Conf., 2019, pp. 501–512.
[83] M.-W. Chang, L. Ratinov, and D. Roth, "Guiding semi-supervision with constraint-driven learning," in Proc. Assoc. Comput. Linguistics, 2007, pp. 280–287.
[84] Z. Hu, Z. Yang, R. Salakhutdinov, and E. Xing, "Deep neural networks with massive learned knowledge," in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 1670–1679.
[85] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," 2016, arXiv:1603.06318.
[86] Z. Zhang, X. Han, Z. Liu, X. Jiang, M. Sun, and Q. Liu, "ERNIE: Enhanced language representation with informative entities," 2019, arXiv:1905.07129.
[87] N. Mrkšić et al., "Counter-fitting word vectors to linguistic constraints," 2016, arXiv:1603.00892.
[88] J. Bian, B. Gao, and T.-Y. Liu, "Knowledge-powered deep learning for word embedding," in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discov. Databases, 2014, pp. 132–148.
[89] M. V. França, G. Zaverucha, and A. S. d. Garcez, "Fast relational learning using bottom clause propositionalization with artificial neural networks," Mach. Learn., vol. 94, no. 1, pp. 81–104, 2014.
[90] M. Richardson and P. Domingos, "Markov logic networks," Mach. Learn., vol. 62, no. 1–2, pp. 107–136, 2006.
[91] A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor, "A short introduction to probabilistic soft logic," in Proc. NIPS Workshop Probabilistic Program.: Found. Appl., 2012, pp. 1–4.
[92] G. Glavaš and I. Vulić, "Explicit retrofitting of distributional word vectors," in Proc. Assoc. Comput. Linguistics, 2018, pp. 34–45.
[93] Y. Fang, K. Kuan, J. Lin, C. Tan, and V. Chandrasekhar, "Object detection meets knowledge graphs," in Proc. Int. Joint Conf. Artif. Intell., 2017, pp. 1661–1667.
[94] M. E. Peters et al., "Knowledge enhanced contextual word representations," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Int. Joint Conf. Natural Lang. Process., 2019, pp. 43–54.
[95] L. M. de Campos and J. G. Castellano, "Bayesian network learning algorithms using structural restrictions," Int. J. Approx. Reasoning, vol. 45, no. 2, pp. 233–254, 2007.
[96] A. C. Constantinou, N. Fenton, and M. Neil, "Integrating expert knowledge with data in Bayesian networks: Preserving data-driven expectations when the expert variables remain unobserved," Expert Syst. Appl., vol. 56, pp. 197–208, 2016.
[97] J. Choo, C. Lee, C. K. Reddy, and H. Park, "UTOPIAN: User-driven topic modeling based on interactive nonnegative matrix factorization," IEEE Trans. Vis. Comput. Graph., vol. 19, no. 12, pp. 1992–2001, Dec. 2013.
[98] W. B. Knox and P. Stone, "Interactively shaping agents via human reinforcement: The TAMER framework," in Proc. Int. Conf. Knowl. Capture (K-CAP), 2009, pp. 9–16.
[99] R. Kaplan, C. Sauer, and A. Sosa, "Beating Atari with natural language guided reinforcement learning," 2017, arXiv:1704.05539.
[100] D. Heckerman, D. Geiger, and D. M. Chickering, "Learning Bayesian networks: The combination of knowledge and statistical data," Mach. Learn., vol. 20, no. 3, pp. 197–243, 1995.
[101] M. Richardson and P. Domingos, "Learning with knowledge from multiple experts," in Proc. Int. Conf. Mach. Learn., 2003, pp. 624–631.
[102] A. Feelders and L. C. van der Gaag, "Learning Bayesian network parameters under order constraints," Int. J. Approx. Reasoning, vol. 42, no. 1–2, pp. 37–53, 2006.
[103] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, "Deep reinforcement learning from human preferences," in Proc. Neural Inf. Process. Syst., 2017, pp. 4302–4310.
[104] T. Hester et al., "Deep Q-learning from demonstrations," in Proc. Conf. Artif. Intell., 2018, pp. 3223–3230.
[105] E. T. Brown, J. Liu, C. E. Brodley, and R. Chang, "Dis-function: Learning distance functions interactively," in Proc. Conf. Visual Analytics Sci. Technol., 2012, pp. 83–92.
[106] B. Yet, Z. Perkins, N. Fenton, N. Tai, and W. Marsh, "Not just data: A method for improving prediction with knowledge," J. Biomed. Inform., vol. 48, pp. 28–37, 2014.
[107] J. A. Fails and D. R. Olsen Jr., "Interactive machine learning," in Proc. Int. Conf. Intell. User Interfaces, 2003, pp. 39–45.
[108] L. Rieger, C. Singh, W. J. Murdoch, and B. Yu, "Interpretations are useful: Penalizing explanations to align neural networks with prior knowledge," 2019, arXiv:1909.13584.
[109] P. Schramowski et al., "Right for the wrong scientific reasons: Revising deep networks by interacting with their explanations," 2020, arXiv:2001.05371.
[110] L. von Rueden, T. Wirtz, F. Hueger, J. D. Schneider, N. Piatkowski, and C. Bauckhage, "Street-map based validation of semantic segmentation in autonomous driving," in Proc. Int. Conf. Pattern Recognit., 2021, pp. 10203–10210.
[111] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: Going beyond Euclidean data," IEEE Signal Process. Mag., vol. 34, no. 4, pp. 18–42, 2017.
[112] M.-W. Chang, L. Ratinov, and D. Roth, "Structured learning with constrained conditional models," Mach. Learn., vol. 88, no. 3, pp. 399–431, 2012.
[113] D. Sridhar, J. Foulds, B. Huang, L. Getoor, and M. Walker, "Joint models of disagreement and stance in online debate," in Proc. Assoc. Comput. Linguistics, Int. Joint Conf. Natural Lang. Process., 2015, pp. 116–125.
[114] A. S. d. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran, "Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning," 2019, arXiv:1905.06088.
[115] L. D. Raedt, K. Kersting, and S. Natarajan, Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. San Rafael, CA, USA: Morgan & Claypool, 2016.
[116] R. Speer and C. Havasi, "ConceptNet 5: A large semantic network for relational knowledge," in Proc. People's Web Meets NLP, 2013, pp. 161–176.
[117] G. A. Miller, "WordNet: A lexical database for English," Commun. ACM, vol. 38, no. 11, pp. 39–41, 1995.
[118] T. Mitchell et al., "Never-ending learning," Commun. ACM, vol. 61, no. 5, pp. 103–115, 2018.
[119] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[120] Z.-X. Ye and Z.-H. Ling, "Distant supervision relation extraction with intra-bag and inter-bag attentions," 2019, arXiv:1904.00143.
[121] T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," 2013, arXiv:1301.3781.
[122] M. S. Massa, M. Chiogna, and C. Romualdi, "Gene set analysis exploiting the topology of a pathway," BMC Syst. Biol., vol. 4, no. 1, 2010, Art. no. 121.
[123] N. Angelopoulos and J. Cussens, "Bayesian learning of Bayesian networks with informative priors," Ann. Math. Artif. Intell., vol. 54, no. 1–3, pp. 53–98, 2008.
[124] N. Piatkowski, S. Lee, and K. Morik, "Spatio-temporal random fields: Compressible representation and distributed estimation," Mach. Learn., vol. 93, no. 1, pp. 115–139, 2013.
[125] R. Fischer, N. Piatkowski, C. Pelletier, G. I. Webb, F. Petitjean, and K. Morik, "No cloud on the horizon: Probabilistic gap filling in satellite image series," in Proc. Int. Conf. Data Sci. Adv. Anal., 2020, pp. 546–555.
[126] B. Settles, "Active learning literature survey," Comput. Sci., Univ. Wisconsin–Madison, Madison, WI, USA, Tech. Rep. 1648, 2009.
[127] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melançon, "Visual analytics: Definition, process, and challenges," in Proc. Inf. Visual., 2008, pp. 154–175.
[128] M. L. Minsky, "Logical versus analogical or symbolic versus connectionist or neat versus scruffy," AI Mag., vol. 12, no. 2, pp. 34–51, 1991.
[129] E. Kalnay, Atmospheric Modeling, Data Assimilation and Predictability. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[130] S. Reich and C. Cotter, Probabilistic Forecasting and Bayesian Data Assimilation. Cambridge, U.K.: Cambridge Univ. Press, 2015.
[131] M. Janner, J. Wu, T. D. Kulkarni, I. Yildirim, and J. B. Tenenbaum, "Self-supervised intrinsic image decomposition," in Proc. Neural Inf. Process. Syst., 2017, pp. 5938–5948.
[132] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, "Generalizing from a few examples: A survey on few-shot learning," ACM Comput. Surv., vol. 53, no. 3, 2020, Art. no. 63.
[133] F. Cucker and D. X. Zhou, Learning Theory: An Approximation Theory Viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[134] I. Steinwart and A. Christmann, Support Vector Machines. Germany: Springer, 2008.
[135] F. Cucker and S. Smale, "Best choices for regularization parameters in learning theory: On the bias-variance problem," Found. Comput. Math., vol. 2, no. 4, pp. 413–428, 2002.
[136] L. Lapidus and G. F. Pinder, Numerical Solution of Partial Differential Equations in Science and Engineering. Hoboken, NJ, USA: Wiley, 2011.
[137] M. Wulfmeier, A. Bewley, and I. Posner, "Addressing appearance change in outdoor robotics with adversarial domain adaptation," in Proc. Int. Conf. Intell. Robots Syst., 2017, pp. 1551–1558.
[138] X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, "Sim-to-real transfer of robotic control with dynamics randomization," in Proc. IEEE Int. Conf. Robot. Automat., 2018, pp. 3803–3810.
[139] L. von Rueden, S. Mayer, R. Sifa, C. Bauckhage, and J. Garcke, "Combining machine learning and simulation to a hybrid modelling approach: Current and future directions," in Proc. Int. Symp. Intell. Data Anal., 2020, pp. 548–560.
[140] K. McGarry, S. Wermter, and J. MacIntyre, "Hybrid neural systems: From simple coupling to fully integrated neural networks," Neural Comput. Surv., vol. 2, no. 1, pp. 62–93, 1999.
[141] R. Sun, "Connectionist implementationalism and hybrid systems," in Encyclopedia of Cognitive Science. Hoboken, NJ, USA: Wiley, 2006.
[142] A. S. d. Garcez and L. C. Lamb, "Neurosymbolic AI: The 3rd wave," 2020, arXiv:2012.05876.
[143] T. Dong et al., "Imposing category trees onto word-embeddings using a geometric construction," in Proc. Int. Conf. Learn. Representations, 2018.
[144] S. H. Bach, M. Broecheler, B. Huang, and L. Getoor, "Hinge-loss Markov random fields and probabilistic soft logic," 2015, arXiv:1505.04406.
[145] V. Embar, D. Sridhar, G. Farnadi, and L. Getoor, "Scalable structure learning for probabilistic soft logic," 2018, arXiv:1807.00973.
[146] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, "Variational inference: A review for statisticians," J. Amer. Statist. Assoc., vol. 112, no. 518, pp. 859–877, 2017.
[147] D. P. Kingma and M. Welling, "An introduction to variational autoencoders," 2019, arXiv:1906.02691.
[148] J. Pearl, Causality. Cambridge, U.K.: Cambridge Univ. Press, 2009.
[149] J. Kreutzer, S. Riezler, and C. Lawrence, "Learning from human feedback: Challenges for real-world reinforcement learning in NLP," 2020, arXiv:2011.02511.
[150] G. Dulac-Arnold, D. Mankowitz, and T. Hester, "Challenges of real-world reinforcement learning," 2019, arXiv:1904.12901.
[151] J. Kreutzer, J. Uyheng, and S. Riezler, "Reliability and learnability of human bandit feedback for sequence-to-sequence reinforcement learning," in Proc. Assoc. Comput. Linguistics, 2018, pp. 1777–1788.
[152] Y. Gao, C. M. Meyer, and I. Gurevych, "APRIL: Interactively learning to summarise by combining active preference learning and reinforcement learning," in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 4120–4130.
[153] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," 2017, arXiv:1702.08608.

Laura von Rueden received the BSc degree in physics and the MSc degree in simulation sciences in 2015 from RWTH Aachen University. She was a data scientist with Capgemini. Since 2018, she has been a research scientist with Fraunhofer IAIS. She is currently working toward the PhD degree in computer science with the Universität Bonn. Her research interests include machine learning and especially the combination of data and knowledge-based modeling.

Sebastian Mayer received the diploma degree in mathematics from TU Darmstadt, in 2011, and the PhD degree in mathematics from University Bonn, in 2018. Since 2017, he has been a research scientist with Fraunhofer SCAI. His research interests include machine learning and biologically-inspired algorithms in the context of cyberphysical systems.

Katharina Beckh received the MSc degree in human-computer interaction from the Julius Maximilian University of Wuerzburg in 2019. Since 2019, she has been a research scientist with Fraunhofer IAIS. Her research interests include interactive machine learning, human oriented modeling, and text mining with a primary focus in the medical domain.

Bogdan Georgiev received the PhD degree in mathematics from Max-Planck-Institute and Bonn University in 2018. Since 2018, he has been a research scientist with Fraunhofer IAIS. His current research interests include aspects of learning theory such as generalization or compression bounds, geometric learning, and quantum computing.
Sven Giesselbach received the MSc degree in computer science from the University of Bonn in 2012. Since 2015, he has been a data scientist with Fraunhofer IAIS and is also lead of the team natural language understanding with the department knowledge discovery. His research interests include the use of external knowledge in natural language processing.

Raoul Heese received the diploma and PhD degrees from the Institute of Quantum Physics, Ulm University, Germany, in 2012 and 2016, respectively. He is currently a research scientist with Fraunhofer ITWM, Kaiserslautern, Germany. His research interests include informed learning, supervised learning, and their application to real-world problems.

Birgit Kirsch received the MSc degree in business informatics from Hochschule Trier in 2017. Since 2017, she has been a research scientist with Fraunhofer IAIS. Her research interests include natural language processing and statistical relational learning.

Michal Walczak received the PhD degree in physics from the Georg-August University of Goettingen, Germany, in 2014. Since 2016, he has been a research scientist with Fraunhofer ITWM, Kaiserslautern, Germany. His research interests include machine learning, decision support, multicriteria optimization, and their application to radiotherapy planning and process engineering.

Julius Pfrommer received the PhD degree in computer science from the Karlsruhe Institute of Technology in 2019. Since 2018, he has been the head of a research group with Fraunhofer IOSB. His research interests include distributed systems, planning under uncertainty, and optimization theory with its many applications for machine learning and optimal control.

Rajkumar Ramamurthy received the MSc degree in media informatics from RWTH Aachen University in 2016. Since 2018, he has been a data scientist with Fraunhofer IAIS. He is currently working toward the PhD degree with the University of Bonn. His research interests include reinforcement learning and natural language processing.

Jochen Garcke received the diploma and PhD degrees in mathematics from the Universität Bonn, in 1999 and 2004, respectively. From 2004 to 2006, he was a postdoctoral fellow with the Australian National University. He was a postdoctoral researcher from 2006 to 2008 and a junior research group leader from 2008 to 2011, with the Technical University Berlin. Since 2011, he has been professor of numerics with the University of Bonn and department head with Fraunhofer SCAI, Sankt Augustin. His research interests include machine learning, scientific computing, reinforcement learning, and high-dimensional approximation. He is currently a member of DMV, GAMM, and SIAM. He is currently a reviewer for the IEEE Transactions on Industrial Informatics, the IEEE Transactions on Neural Networks, and the IEEE Transactions on Pattern Analysis and Machine Intelligence.

Christian Bauckhage (Member, IEEE) received the MSc and PhD degrees in computer science from Bielefeld University, in 1998 and 2002, respectively. Since 2008, he has been a professor of computer science with the University of Bonn and lead scientist for machine learning with Fraunhofer IAIS. He was with the Centre for Vision Research, Toronto, Canada, and a senior research scientist with Deutsche Telekom Laboratories, Berlin. His research interests include theory and practice of learning systems and next generation computing. He is currently a reviewer for the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Transactions on Pattern Analysis and Machine Intelligence, and the IEEE Transactions on Games. He is currently an associate editor for the IEEE Transactions on Games.

Jannis Schuecker received the doctoral degree in physics from the RWTH Aachen University. Until 2019, he was a research scientist with Fraunhofer IAIS. His research interests include machine learning, in particular time series modeling using neural networks, and interpretable machine learning.