arXiv:2111.08164v3 [cs.LG] 25 Jun 2023

Highlights
• Neural-symbolic learning systems combine neural systems and symbolic systems into a unified framework.
• Neural-symbolic learning systems can equip AI with the ability to perform both perception and cognition.
• A good combination of neural systems and symbolic systems allows the model to achieve the desired performance.
A Survey on Neural-symbolic Learning Systems
Dongran Yua,c , Bo Yanga,b,∗ , Dayou Liua,b , Hui Wangd and Shirui Pane
a School of Computer Science and Technology and the Key Laboratory of Symbolic Computation and Knowledge Engineer (Jilin University),
Ministry of Education, Changchun, Jilin 130012, China
b School of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
c School of Artificial Intelligence, Jilin University, Changchun, Jilin, 130012, China
d School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast
e School of Information and Communication Technology, Griffith University
1. Introduction
Perception, represented by connectionism or neural systems, and cognition, represented by symbolism or symbolic
systems, are two fundamental paradigms in the field of artificial intelligence (AI), each having prevailed for several
decades. Figure 1 showcases the rise and fall of these two doctrines, which researchers commonly categorize into
three significant periods. (1) The reasoning era (1956-1968). The inception of reasoning in AI can be traced back to
the 1956 Dartmouth Conference, where Newell and Simon introduced the "Logic Theorist" program, which successfully
proved 38 mathematical theorems. This marked the beginning of the reasoning era in AI. However, researchers soon
realized that relying solely on heuristic search algorithms had limitations, and many complex problems required
specialized domain knowledge to achieve higher levels of intelligence. Consequently, incorporating knowledge into
AI models became a prevalent notion, as it enhanced the ability to find solutions within large solution spaces. (2) The
knowledge era (1968-1985). During this period, significant developments occurred, such as the creation of the first expert
system, DENDRAL, by Feigenbaum and Lederberg in 1968. This milestone represented the organic integration of AI
and domain knowledge, marking the advent of the knowledge-focused era in AI. However, knowledge acquisition posed
a significant challenge, leading to a shift towards the automatic acquisition of valuable knowledge from massive datasets,
which gradually became the mainstream trend in AI. (3) The learning era (from the 1990s to the present). In 1983, neural networks
started gaining prominence, and after 2000, AI entered the era of machine learning. Notable breakthroughs in machine
learning, particularly through neural networks, include the 2012 triumph of ImageNet with Deep Convolutional Neural
Networks (CNN), and the 2016 victory of AlphaGo against the Go world champion. These significant milestones
exemplify the power of machine learning approaches, predominantly driven by neural networks.
To date, neural networks have demonstrated remarkable accomplishments in perception-related tasks, such as image
recognition [108]. However, there exist various scenarios, including question answering [48], medical diagnosis [111],
and autonomous driving [138], where relying solely on perception can present limitations or yield unsatisfactory
outcomes. For instance, when confronted with unseen situations during training, machines may struggle to make
accurate decisions in medical diagnosis. Another crucial consideration is the compatibility of purely perception-
based models with the principles of explainable AI [105]. Neural networks, being black-box systems, are unable to
∗ Corresponding author. School of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China. (Note: Accepted by
Neural Networks Journal)
[email protected] (D. Yu); [email protected] (B. Yang); [email protected] (S. Pan)
Table 1
Summary of the properties of symbolic systems and neural systems.

| Systems | Processing methods | Knowledge representation | Primary algorithms | Advantages | Disadvantages |
| Symbolic systems | Deductive reasoning | Logical representation | Logical deduction | Strong generalization ability; good interpretability; knowledge-driven | Weak at handling unstructured data; weak robustness; slow reasoning |
| Neural systems (sub-symbolic systems) | Inductive learning | Distributed representation | BP algorithms | Strong at handling unstructured data; strong robustness; fast learning; data-driven | Weak generalizability (adaptability); lack of interpretability |
provide explicit calculation processes. In contrast, symbolic systems offer enhanced appeal in terms of reasoning
and interpretability. For example, through deductive reasoning and automatic theorem proving, symbolic systems can
generate additional information and elucidate the reasoning process employed by the model.
Consequently, an increasing number of researchers have directed their attention towards the fusion of neural
systems and symbolic systems, aiming to achieve the third wave of AI: neural-symbolic learning systems [81, 80,
29, 85, 152, 35]. In a special NeurIPS 2019 lecture, Turing Award laureate Yoshua Bengio drew inspiration from Dr.
Daniel Kahneman’s renowned book "Thinking Fast and Slow" [56] to emphasize the need for a system-1-to-system-2
transformation in deep learning. Here, system 1 represents the intuitive, rapid, unconscious, nonlinguistic, and habitual
aspects, while system 2 embodies the deliberative, logical, sequential, conscious, linguistic, algorithmic, planning-
related, and reasoning-related facets. Indeed, since the 1990s, numerous researchers in the fields of artificial intelligence
and cognitive science have explicitly proposed the concept of dual processes that correspond to these contrasting
systems [50]. These observations highlight the necessity of combining neural systems and symbolic systems. By unifying these
two system types within a comprehensive framework, neural-symbolic learning systems can be created, endowing
AI with the capability to perform both perception and reasoning tasks. It is worth noting that the idea of integrating
neural systems and symbolic systems, referred to as hybrid connectionist-symbolic models, was initially introduced in
the 1990s [120].
Neural-symbolic learning systems leverage the combined strengths of both neural systems and symbolic systems
[79, 155, 99, 38, 126, 8, 44, 60, 118, 42, 31, 59, 66, 69]. To provide a comprehensive understanding, the survey
initially outlines key characteristics of symbolic systems and neural systems (refer to Table 1), including processing
methods, knowledge representation, etc. Analysis of Table 1 reveals that symbolic systems and neural systems exhibit
complementary features across various aspects. For instance, symbolic systems may possess limited robustness,
whereas neural systems demonstrate robustness. Consequently, neural-symbolic learning systems emerge as a means
to compensate for the shortcomings inherent in individual systems.
Moreover, we conduct an analysis of neural-symbolic learning systems from three key perspectives: efficiency,
generalization, and interpretability. As depicted in Figure 2, neural-symbolic learning systems excel in these areas.
Firstly, in terms of efficiency, neural-symbolic learning systems can reason quickly compared to pure symbolic systems,
thereby reducing computational complexity [150]. This accelerated computation can be attributed to the integration of
Figure 2: The advantages of neural-symbolic learning systems with respect to model efficiency, generalization, and
interpretability. The neural systems are black-box systems, while symbolic systems are white-box systems.
neural networks, as outlined in Section 2 (Learning for reasoning). Traditional symbolic approaches typically employ
search algorithms to navigate solution spaces, leading to increased computational complexity as the search space
grows in size. Secondly, with regard to generalization, neural-symbolic learning systems outperform standalone neural
systems in terms of their capacity for generalization. The incorporation of symbolic knowledge as valuable training
data enhances the model’s generalization abilities [58] (see Section 2 Reasoning for learning). Thirdly, in terms of
interpretability, neural-symbolic learning systems represent gray-box systems, in contrast to standalone neural systems.
By leveraging symbolic knowledge, these systems can provide explicit computation processes, such as traced reasoning
processes or chains of evidence for results [146]. Consequently, neural-symbolic learning systems have emerged as
vital components of explainable AI, yielding superior performance across diverse domains, including computer vision
[140, 28, 134, 58, 153, 14, 70], and natural language processing [72, 147, 125], etc.
Challenge: Symbolic systems and neural systems diverge in terms of their data representations and problem-
solving approaches. Symbolic systems rely on discrete symbolic representations and traditional search algorithms
to discover solutions, while neural systems employ continuous feature vector representations and neural cells to
learn mapping functions. Consequently, a significant challenge lies in designing a unified framework that seamlessly
integrates both symbolic and neural components. The aim is to strike a balance and select an appropriate combination
of symbolic and neural systems that aligns with the requirements of the specific problem [50].
To provide new readers with a comprehensive understanding of neural-symbolic learning systems, this paper
surveys representative research and applications of these systems. To summarize and systematically review related
works, several surveys have been conducted during the past few years [5, 43, 8, 12, 128, 65, 84, 132, 127, 41, 120, 119],
[61, 82]. For example, [5, 127] center around knowledge extraction techniques, which aligns with the first category
discussed in Section 2. On the other hand, [65, 12, 132, 84] provide detailed reviews from specific perspectives, such
as graph neural networks (GNNs), prior knowledge integration, explainable artificial intelligence (XAI), and statistical
relational learning. While surveys [8, 41] also cover neural-symbolic learning systems comprehensively, their focus
remains primarily theoretical, lacking a thorough introduction to specific techniques and related works. Therefore, an
urgent need arises to provide a comprehensive survey that encompasses popular methods and specific techniques (e.g.,
model frameworks, execution processes) to expedite advancements in the neural-symbolic field. Distinguishing itself
from the aforementioned surveys, this paper emphasizes classifications, techniques, and applications within the domain
of neural-symbolic learning systems.
Motivation: For our part, we do not seek to replace the above literature but to complement it by offering a
comprehensive overview of the broader domain of neural-symbolic learning systems. This encompasses various
technologies, cutting-edge developments, and diverse application areas within the realm of neural-symbolic learning
systems. Additionally, this article caters to individuals engaged in the pursuit of integrating symbolic systems and
neural systems. With an emphasis on integration, we present a novel classification framework for neural-symbolic
learning systems in Section 2.
Our contributions can be summarized as follows:
1) We propose a novel taxonomy of neural-symbolic learning systems. Neural-symbolic learning systems are
categorized into three groups: learning for reasoning, reasoning for learning, and learning-reasoning.
2) We provide a comprehensive overview of neural-symbolic techniques, along with types and representations of
symbols such as logic knowledge and knowledge graphs. For each taxonomy, we provide detailed descriptions of the
representative methods, summarize the corresponding characteristics, and give a new understanding of neural-symbolic
learning systems.
3) We discuss the applications of neural-symbolic learning systems and propose four potential future research
directions, thus paving the way for further advancements and exploration in this field.
The remainder of this survey is organized as follows. In Section 2, we categorize the different methods of neural-
symbolic learning systems. Section 3 introduces the main technologies of neural-symbolic learning systems. We
summarize the main applications of neural-symbolic learning systems in Section 4. Section 5 discusses the future
research directions, after which Section 6 concludes this survey.
In this survey, the neural systems mainly refer to deep learning (or deep neural networks), while symbolic systems
include symbolic knowledge [123, 22, 10] and symbolic reasoning techniques [18, 110], etc. The methodology of our
classification is determined by the integration mode between neural systems and symbolic systems, which has three
main integration methodologies. These three classifications are similar in essence to those in [61] but subsume them.
Table 2
Main approaches of neural-symbolic learning systems. Taxonomies of the approaches and combination modes are presented
in Section 2, and methodical details can be found in Section 3. The symbols and neural networks used herein are introduced
in Appendix A; "Agnostic" means arbitrary neural networks. "Serialization", "parallelization", and "interaction" are the
three combination modes. Applications are discussed in Section 4.

| Approaches | Taxonomy (combination mode) | Method | Symbols | Neural networks | Applications |
| pLogicNet [103], ExpressGNN [150] | Learning for reasoning (Serialization) | - | First-order logic | GNN | Knowledge graph reasoning |
| SBR [25] | Reasoning for learning (Parallelization) | Regularizing | Propositional logic | CNN | Classification |
| SL [142] | Reasoning for learning (Parallelization) | Regularizing | Propositional logic | Agnostic | Classification |
| LENSR [140] | Reasoning for learning (Parallelization) | Regularizing | Propositional logic | - | Visual relationship detection |
| CA-ZSL [77], LSFSL [70] | Reasoning for learning (Parallelization) | Regularizing | Knowledge graphs | CNN, GNN | - |
| DGP [58], KGTN [14] | Reasoning for learning (Parallelization) | Transferring | Knowledge graphs | CNN, GNN | - |
| DeepProbLog [80], ABL [152] | Learning-reasoning (Interaction) | Interacting | First-order logic | Agnostic | Complex reasoning |
| GABL [11] | Learning-reasoning (Interaction) | Interacting | First-order logic | CNN | Complex reasoning |
| WS-NeSyL [125] | Learning-reasoning (Interaction) | Interacting | First-order logic | RNN | Complex reasoning |
search space, making the computation more efficient. The second aspect of learning for reasoning is the abstraction
or extraction of symbols from data using neural networks to facilitate symbolic reasoning [113, 40, 107]. In this
case, neural networks serve as a means of acquiring knowledge for symbolic reasoning tasks. They learn to extract
meaningful symbols from input data and use them for subsequent reasoning processes. The basic framework for
Figure 3: Schematic diagram of the integration of the two systems. Symbolic systems generally use reasoning technologies, such as
logic programs and search algorithms, to obtain a solution based on domain knowledge, such as first-order logic, knowledge
graphs, etc. The goal of neural systems is to learn a function from the training samples to predict a solution.
learning for reasoning is illustrated in Figure 4. As depicted in the figure, this type of model is characterized by
a serialization process, where the neural network component and the symbolic reasoning component are connected
sequentially. The neural network extracts relevant features or symbols from the input data, which are then used by the
symbolic reasoning module to perform higher-level reasoning tasks.
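The serialized pipeline described above can be sketched schematically. Everything here (the extracted facts, the rule set, and the forward-chaining loop) is a hypothetical toy for illustration, not any particular system's API:

```python
# A schematic serialization pipeline: a "neural" extractor produces symbols,
# which a symbolic module then reasons over with hand-written rules.

def neural_extract(image):
    """Stand-in for a neural network: maps raw input to symbols."""
    # Hypothetical: pretend the network recognized this fact in the image.
    return {"wear(person, glasses)"}

RULES = {
    # wear(person, glasses) => in(glasses, person)
    "wear(person, glasses)": "in(glasses, person)",
}

def symbolic_reason(facts):
    """Forward chaining over the extracted symbols until a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in RULES.items():
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

solution = symbolic_reason(neural_extract("img.png"))
print(sorted(solution))
```

The key structural point is that the symbolic module only ever sees the symbols emitted by the extractor, never the raw input.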
Figure 4: Principle of learning for reasoning. Its goal is to introduce neural networks into reasoning, which is achieved
primarily through reasoning technologies. (a) First aspect: the neural network accelerates symbolic reasoning over databases,
logic rules, and knowledge graphs. (b) Second aspect: the neural network transforms raw inputs (images, videos, texts) into
symbols for subsequent symbolic reasoning.
approach is illustrated in Figure 5. This type of model is characterized by parallelization, where the neural system and
symbolic system operate in parallel during the learning process. The neural network component learns from the data,
while the symbolic system provides additional knowledge or constraints to guide the learning process.
Figure 5: Principle of reasoning for learning. It introduces symbolic knowledge (databases, logic rules, knowledge graphs)
as a constraint on neural networks, and the main body relies on neural networks to obtain solutions.
2.3. Learning-reasoning
In the third category, referred to as learning-reasoning, the interaction between neural systems and symbolic
systems is bidirectional, with both paradigms playing equal roles and working together in a mutually beneficial way
[80, 152, 11, 125, 148, 47]. The goal of learning-reasoning is to strike a balance between the involvement of neural
systems and symbolic systems in the problem-solving process. In this approach, the output of the neural network
becomes an input to the symbolic reasoning component, and the output of the symbolic reasoning becomes an input
to the neural network. By allowing the neural systems and symbolic systems to exchange information and influence
each other iteratively, this approach aims to leverage the strengths of both paradigms and enhance the overall problem-
solving capability. For example, incorporating symbolic reasoning techniques like abduction enables the design of
connections between deep neural networks and symbolic reasoning frameworks [79, 152, 148]. In this case, the neural
network component generates hypotheses or predictions, which are then used by the symbolic reasoning component
to perform logical reasoning or inference. The results from symbolic reasoning can subsequently be fed back to the
neural network to refine and improve the predictions. The basic principle of learning-reasoning is illustrated in Figure
6, where the interaction between neural systems and symbolic systems occurs in an alternating fashion. This mode of
combining both technologies allows for iterative learning and reasoning, enabling a deeper integration of neural and
symbolic approaches. By embracing bidirectional interaction and iterative exchange of information between neural
systems and symbolic systems, learning-reasoning approaches aim to maximize the strengths of both paradigms and
achieve enhanced problem-solving capabilities in various domains.
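A minimal sketch of one round of this bidirectional loop, in the spirit of abduction-based methods such as ABL. The constraint, the predictions, and the revision strategy are all invented for illustration:

```python
# Toy sketch of the iterative learning-reasoning loop (schematic):
# the "neural" step guesses labels, the "symbolic" step abduces corrections
# consistent with a known rule, and the corrections feed back as supervision.

RULE_SUM = 5  # hypothetical symbolic constraint: the two labels must sum to 5

def neural_predict(x):
    """Stand-in for a network's current (possibly wrong) predictions."""
    return [2, 2]  # violates the constraint: 2 + 2 != 5

def abduce(labels):
    """Minimally revise the labels so the symbolic constraint holds."""
    a, b = labels
    if a + b != RULE_SUM:
        b = RULE_SUM - a  # smallest change: fix the second label
    return [a, b]

pseudo = neural_predict("input")
revised = abduce(pseudo)  # symbolic feedback consistent with the rule
# In a real system, `revised` would serve as pseudo-labels to retrain the
# network, and the loop repeats until predictions and rules agree.
print(revised)
```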
Figure 6: Principle of learning-reasoning. It combines neural networks and reasoning technologies in an alternating process,
and both work together to output a solution.
In summary, the above three taxonomies are heterogeneous multi-module architectures in [119]. Learning for
reasoning and reasoning for learning are loosely coupled while learning-reasoning is tightly coupled. According to
the taxonomy of the neural-symbolic method presented in this paper, we summarize the existing main approaches from
six dimensions in Table 2: representative works, taxonomies, methods, symbols, neural networks and applications.
In Section 3, we will introduce the details of these approaches, discussing their methodologies, techniques,
and characteristics. This will provide readers with a deeper understanding of how neural and symbolic systems are
combined in various ways to tackle different problems.
Table 3
Main characteristics of the selected methods.

| Approaches | Inputs | Technology | Tools | Mechanism/Objective |
| pLogicNet [103] | x, s | SRL | MLN | learn a joint probability distribution |
| ExpressGNN [150] | x, s | SRL | MLN | learn a joint probability distribution |
| NLIL [146] | x | ILP | Transformer | use a Transformer to learn rules based on ILP |
| NS-CL [81] | x | quasi-symbolic program | concept parser | reason based on parsed symbols for images and questions |
| HDNN [51] | x, s | regularization | t-norm | learn a student network based on knowledge |
| SBR [25] | x, s | regularization | t-norm | learn a model with logic knowledge as a constraint on the hypothesis space |
| SL [142] | x, s | regularization | arithmetic circuits | design a semantic loss to act as a regularization term |
| LENSR [140] | x, s | regularization | d-DNNF | align distributions between deep learning and propositional logic |
| CA-ZSL [77] | x, s | regularization | GCN | learn a conditional random field |
| SEKB-ZSR [134] | x, s | knowledge transfer | GCN | learn a deep learning model with powerful generalization |
| DGP [58] | x, s | knowledge transfer | GCN | learn network embeddings with semantics |
| KGTN [14] | x, s | knowledge transfer | GGNN | transfer semantic knowledge into weights |
| PROLONETS [115] | x, s | knowledge transfer | decision tree | transform knowledge into neural network parameters |
| DeepProbLog [80] | x, s | ProbLog | SDD | construct an interface between the ProbLog program and deep learning models |
| ABL [152] | x, s | abductive reasoning | SLD | minimize inconsistency between pseudo-labels and symbolic knowledge |
| WS-NeSyL [125] | x, s | ProLog | SDD | learn an encoder-decoder constrained by logic rules |
| BPGR [148] | x, s | SRL | MLN | learn a model that fits both the ground truth and FOL |
Figure 7: The framework of the ExpressGNN model. The model contains a continuous space and a symbolic space, and the
variational EM algorithm acts as a bridge that connects them. Note that in the knowledge graph, e_k is an entity and r_k is
a relation.
Indeed, relying solely on manually constructed logic rules may not capture the complete knowledge present in the
data. To address this limitation, researchers have explored approaches to automatically learn logic rules from data. Two
notable methods in this regard are the extensions of Markov logic networks (MLNs) proposed by Marra et al. [86, 83]
and differentiable inductive logic programming (ILP) models. Marra et al. extended MLNs by designing a general
neural network architecture that can automatically learn the potential functions of MLNs from the original data. By
training the neural network on labeled data, the model learns to capture the underlying patterns and relationships in
the data, effectively learning logic rules that approximate the true structure of the domain. This approach enables the
integration of neural networks and symbolic reasoning, allowing the model to learn logic rules directly from the data.
Differentiable ILP is another approach that combines neural networks and logic to learn rules. It extends traditional
ILP methods [67] by introducing differentiable operations that enable the incorporation of neural networks into the
ILP framework. This allows the model to learn logic rules by leveraging the expressive power of neural networks.
Several differentiable ILP models have been proposed [37, 35, 13, 109, 98], including 𝜕ILP [35], which uses predefined
templates to construct logic rules and applies forward reasoning for inference. 𝜕ILP is capable of learning effective
logic rules even in the presence of noisy data, making it robust to imperfect input.
The current methods for learning logic rules often face limitations in expressiveness and computational feasibility.
Approaches such as expressing the chain rule as a Horn clause and controlling the search length, number of
relationships, and entities have been employed to address these challenges [21, 145, 45]. However, the limited
expressive power of these complex logic rules can hinder their effectiveness. To overcome these limitations, Yang
et al. introduced neural logic inductive learning (NLIL) [146]. NLIL is a differentiable ILP model that extends the
multi-hop reasoning framework to address general ILP problems. It allows for the learning of complex logic rules,
including tree and conjunctive rules, which offer greater expressiveness compared to traditional approaches. NLIL
leverages neural networks to learn logic rules from data and provides explanations for patterns observed in the data.
NLIL first converts each logical predicate into a predicate operation, and then transforms all intermediate variables
into predicate-operation representations of the head and tail entities; the head and tail variables can thus be represented
by randomly initialized vectors in the concrete implementation, removing the dependency on data. Such predicate
operations form the atoms of the logical paradigm, which greatly expands the expressive capability of logical predicates,
e.g., from chains to trees. Next, the NLIL model further extends the expressive power of the generated logical paradigm
by combining atoms using logical connectives (and, or, not). Finally, the NLIL model uses a hierarchical Transformer
model to efficiently compute the intermediate parameters to be learned, including the vectors of logical predicates and
the corresponding parameters of the attention mechanism (the weighted summation).
In NLIL, logic rules are grounded through matrix multiplication. For example, consider the logic rule Friends(x, y) ⇒
Smokes(x), where the constants C = {A, B} are represented as one-hot vectors v_A and v_B, and the predicates Friends
and Smokes are mapped to matrices, such that M_Friends(A, B) is a score indicating whether A and B are related by
Friends. The score of the grounding is obtained as v_A^T M_Friends v_B, i.e., the (A, B) entry of M_Friends.
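This grounding step can be illustrated with one-hot vectors and a predicate matrix. The matrix values below are arbitrary toy numbers, not learned parameters:

```python
# Constants A and B as one-hot vectors over the domain C = {A, B}.
v_A = [1.0, 0.0]
v_B = [0.0, 1.0]

# The predicate Friends is mapped to a matrix whose (i, j) entry scores
# whether constants i and j are related by Friends (arbitrary toy values).
M_Friends = [[0.1, 0.9],
             [0.8, 0.2]]

# Grounding score v_A^T M_Friends v_B: matrix multiplication selects
# the (A, B) entry of M_Friends.
tmp = [sum(v_A[i] * M_Friends[i][j] for i in range(2)) for j in range(2)]
score = sum(tmp[j] * v_B[j] for j in range(2))
print(score)  # 0.9
```

Because the constants are one-hot, the multiplication simply reads out the score stored for the pair (A, B); with soft (learned) vectors, the same expression yields a differentiable score.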
Figure 8: The framework of NS-CL. The perception module begins by parsing visual scenes into object-based deep
representations, while the semantic parser parses sentences into executable programs in a domain-specific language (e.g.,
Query(Shape, Filter(Red))). A symbolic reasoning process bridges the two modules.
from the unstructured data, neural networks provide the symbolic systems with meaningful inputs for reasoning and
decision-making.
$$\theta^{(t+1)} = \arg\min_{\theta\in\Theta} \frac{1}{N}\sum_{n=1}^{N}\Big[(1-\pi)\,l\big(y_n, \sigma_\theta(x_n)\big) + \pi\, l\big(s_n^{(t)}, \sigma_\theta(x_n)\big)\Big], \tag{2}$$

$$\begin{cases}\min\limits_{q,\,\xi\ge 0}\ \mathrm{KL}\big(q(Y|X)\,\big\|\,p_\theta(Y|X)\big) + C\sum\limits_{l,g_l}\xi_{l,g_l}\\ \text{s.t.}\ \ \lambda_l\big(1 - \mathbb{E}_q[r_{l,g_l}(X, Y)]\big)\le \xi_{l,g_l},\quad g_l = 1,\dots,G_l,\ \ l = 1,\dots,L\end{cases} \tag{3}$$
In Eq. (2), π is the imitation parameter used to calibrate the relative importance of the two objectives; x_n represents
the training data, while y_n is its label; l denotes the loss function selected according to the specific application (e.g.,
the cross-entropy loss for classification); s_n^{(t)} is the soft prediction vector of q(y|x) on x_n at iteration t; σ_θ(x)
represents the output of p_θ(y|x); the first term corresponds to the student network and the second term to the teacher
network. In Eq. (3), ξ_{l,g_l} ≥ 0 is the slack variable for the respective logic constraint; C is the regularization parameter;
l is the index of the rule; g_l is the index of the ground rule; and λ_l is the weight of the rule.
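A minimal sketch of the combined objective in Eq. (2), with cross-entropy as the loss l. All probability values here are hypothetical; in the actual framework, σ_θ is the student network's softmax output and s^{(t)} comes from the rule-regularized teacher:

```python
import math

def cross_entropy(target, pred, eps=1e-12):
    """l(target, pred): cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))

# Toy batch of N = 2 examples with 3 classes (all values invented).
y = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]        # hard labels y_n
s = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]        # teacher soft predictions s_n^(t)
sigma = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]    # student outputs sigma_theta(x_n)

pi = 0.5  # imitation parameter balancing the two terms
losses = [(1 - pi) * cross_entropy(y[n], sigma[n])
          + pi * cross_entropy(s[n], sigma[n]) for n in range(len(y))]
loss = sum(losses) / len(losses)
print(round(loss, 4))
```

Setting pi = 0 recovers ordinary supervised training; pi = 1 makes the student imitate the teacher exclusively.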
Different from the knowledge distillation framework, certain approaches incorporate logical knowledge as a
constraint within the hypothesis space. These methods involve encoding a logic formula, either propositional or
first-order, into a real-valued function that serves as a regularization term for the neural model. An example of
such an approach is semantic-based regularization (SBR) proposed by Diligenti et al. [25]. SBR combines the
strengths of classic machine learning, with its ability to learn continuous feature representations, and symbolic
reasoning techniques, with their advanced semantic knowledge reasoning capabilities. SBR is applied to address
various problems, including multi-task optimization and classification. Following the classical penalty approach for
constrained optimization, constraint satisfaction can be enforced by adding a term that penalizes the violation of these
constraints into the loss of the model.
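As an illustration of the penalty approach (a generic t-norm-style relaxation, not SBR's exact formulation), a rule such as bird(x) ⇒ canFly(x) can be turned into a differentiable penalty term added to the loss. The predicate names and values are hypothetical:

```python
def implication_penalty(p_bird, p_canfly):
    """Lukasiewicz-style relaxation of bird(x) => canFly(x):
    the implication's truth value is min(1, 1 - p_bird + p_canfly),
    and the penalty is its degree of violation (1 - truth value)."""
    truth = min(1.0, 1.0 - p_bird + p_canfly)
    return 1.0 - truth

print(implication_penalty(0.9, 0.95))  # 0.0: rule satisfied, no penalty
print(implication_penalty(0.9, 0.2))   # ~0.7: strong violation

# Following the penalty method, one would then optimize:
# total_loss = task_loss + lambda_rule * mean(penalties over ground rules)
```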
Building upon the idea of semantic-based regularization (SBR), Xu et al. [142] introduced a novel approach called
semantic loss (SL). SL combines the power of propositional logic reasoning with deep learning architectures by
incorporating the output of the neural network into the loss function as a constraint for the learnable network. This
enables the neural network to leverage the reasoning capabilities of propositional logic to improve its learning ability.
In contrast to SBR, SL takes a different approach to incorporating logic rules into the loss function. It encodes the
logic rules using an arithmetic circuit, specifically a Sentential Decision Diagram (SDD) [18], which allows for the
evaluation of the model. This encoding serves as an additional regularization term that can be directly integrated into
an existing loss function. The formulation of SL is provided in Equation (4).
$$L_s(\alpha, p) \propto -\log \sum_{x \models \alpha}\ \prod_{i:\, x \models X_i} p_i \prod_{i:\, x \models \neg X_i} (1 - p_i), \tag{4}$$
where p is the vector of probabilities predicted by the neural network, α is a propositional logic sentence, x is an
instantiation of the variables X_i, and x ⊧ α means that the state x satisfies the sentence α.
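As a concrete instance of Eq. (4), take α to be the constraint "exactly one of X_1, X_2, X_3 is true" (the one-hot constraint of multi-class classification); the outer sum then runs over just the three one-hot states. This is a hand-rolled enumeration for illustration, not the circuit-based computation SL actually uses:

```python
import math

def semantic_loss_exactly_one(p):
    """Semantic loss L_s(alpha, p) for alpha = 'exactly one X_i is true'.
    Each satisfying state x sets one X_i true and all others false."""
    wmc = 0.0  # weighted model count: sum over states x with x |= alpha
    for i in range(len(p)):
        term = p[i]  # factor p_i for the variable that is true in x
        for j in range(len(p)):
            if j != i:
                term *= (1 - p[j])  # factor (1 - p_j) for the false variables
        wmc += term
    return -math.log(wmc)

p = [0.9, 0.05, 0.05]   # confident, near-one-hot output: low loss
q = [0.5, 0.5, 0.5]     # ambiguous output: higher loss
print(semantic_loss_exactly_one(p) < semantic_loss_exactly_one(q))  # True
```

The loss falls as the network's output distribution concentrates on states that satisfy the constraint, which is exactly the regularization effect SL exploits.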
Notably, the aforementioned methods do not employ explicit knowledge representation techniques, leading to an
unclear computational process for symbolic knowledge. To address this issue, some researchers have chosen to utilize
tools that can model symbols, such as d-DNNF (as mentioned in Section A). Xie et al. [140] integrated propositional
logic into a relationship detection model and proposed a logic embedding network with semantic regularization
(LENSR) to enhance the relationship detection capabilities of deep models. The process of LENSR can be summarized
as follows: 1) The visual relationship detection model predicts the probability distribution of the relation predicate for
each image; 2) The prior propositional logic formula related to the sample image is expressed as a directed acyclic
graph by d-DNNF, after which GNN is used to learn its probability distribution; 3) An objective function is designed
that aligns the above two distributions.
Figure 9 presents a schematic diagram of LENSR, which uses a propositional logic of the form 𝑃 ⇒ 𝑄. In this
example, the predicate 𝑃 represents 𝑤𝑒𝑎𝑟(𝑝𝑒𝑟𝑠𝑜𝑛, 𝑔𝑙𝑎𝑠𝑠𝑒𝑠), and the predicate 𝑄 represents 𝑖𝑛(𝑔𝑙𝑎𝑠𝑠𝑒𝑠, 𝑝𝑒𝑟𝑠𝑜𝑛). The
ground truth of the input image and the corresponding propositional logic (prior knowledge) are on the left. The
directed acyclic graph of propositional logic d-DNNF is then sent to the Embedder 𝑞 (Embedder is a graph neural
network (GNN) [63] that learns the vector representation) to obtain 𝑞(𝐹𝑥 ), which is the embedding of the propositional
logic knowledge. On the right is the relation label predicted by the detection network. The predicted labels are then
combined into a conjunctive normal form ℎ(𝑥) = ∧𝑝𝑖 to construct a directed acyclic graph d-DNNF, which is sent to
the Embedder q to obtain q(h(x)), the embedding of the predicted propositional logic. The optimization goal of LENSR
is shown in Eq. (5):

$$L = L_{task} + \lambda L_{logic}, \tag{5}$$

where L_{task} represents the loss of the specific task, λ is a hyperparameter that acts as a balance factor, and L_{logic}
is the loss of propositional logic (that is, the distance between the vectors q(F_x) and q(h(x))).
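The combined objective of Eq. (5) can be sketched with toy embeddings. The vectors, the squared-distance choice for the logic loss, and all numeric values are illustrative assumptions:

```python
# Hypothetical embeddings produced by the GNN embedder q (toy vectors):
q_Fx = [0.2, 0.8, 0.1]   # q(F_x): embedding of the prior logic formula
q_hx = [0.3, 0.6, 0.2]   # q(h(x)): embedding of the predicted conjunction

# One simple choice of distance for L_logic: squared Euclidean distance.
l_logic = sum((a - b) ** 2 for a, b in zip(q_Fx, q_hx))

l_task = 1.5   # placeholder task loss (e.g., detector cross-entropy)
lam = 0.1      # balance factor lambda
total = l_task + lam * l_logic
print(round(total, 4))  # 1.506
```

Minimizing `total` pulls the prediction's logic embedding toward the prior formula's embedding while still fitting the task labels.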
Figure 9: The framework of LENSR. The GCN-based embedder q projects a logic graph into the vector space, satisfying
the requirement that the distribution of the projected result is as close to the distribution of the real label as possible. We
use this embedding space to form logic losses that regularize deep neural networks for a target task.
However, LENSR models a local dependency graph for each logic formula and captures only local knowledge
information, which may limit the expressive ability of the model. To solve this problem, researchers have started
to model a global dependency graph over all logic formulas, which can improve the expressive ability of the model and
effectively capture uncertainty [148].
The above methods utilize logic rules as prior knowledge. In contrast, Luo et al. [77] focused on knowledge graphs
and proposed a context-aware zero-shot recognition method (CA-ZSL) to address the zero-shot detection problem.
CA-ZSL constructs a model based on deep learning and conditional random fields (CRF) and leverages knowledge
graphs, which represent the semantic relationships between classes, to assist in identifying objects from unseen classes.
The framework of CA-ZSL is depicted in Figure 10. In this framework, individual and pairwise features are extracted
from the image. The instance-level zero-shot inference module utilizes individual features to generate a unary potential
function, while the relationship inference module employs pairwise features and knowledge graphs to generate a binary
potential function. Finally, based on the conditional random field constructed by these two potential functions, the label
of the unseen objects is predicted.
CA-ZSL incorporates a knowledge graph, which includes the GCN-encoded embedding, into the computation
of the binary potential function within the conditional random field, thereby facilitating the learning process of the
model. The objective of optimization is to maximize the joint probability distribution of the conditional random field,
as depicted in Equation (7). In the equation, 𝜃 denotes the unary potential function, 𝜓 represents the binary potential
function, 𝑐𝑖 represents the class, 𝐵𝑖 represents the object region in the image, 𝛾 is the balance factor, and 𝑁 denotes
the number of objects.
P(c_1 … c_N | B_1 … B_N) ∝ exp( ∑_i θ(c_i | B_i) + γ ∑_{i≠j} ψ(c_i, c_j | B_i, B_j) ).    (7)
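The unnormalized joint probability of Eq. (7) is straightforward to compute once the two potential functions have produced their values; a minimal sketch (the list/dict input format is an assumption for illustration):

```python
import math

def crf_joint(unary, pairwise, gamma=1.0):
    """Unnormalized joint probability of Eq. (7):
    exp( sum_i theta(c_i|B_i) + gamma * sum_{i!=j} psi(c_i,c_j|B_i,B_j) ).
    `unary` lists the theta values per object; `pairwise` maps ordered
    index pairs (i, j) to psi values."""
    n = len(unary)
    binary = sum(pairwise.get((i, j), 0.0)
                 for i in range(n) for j in range(n) if i != j)
    return math.exp(sum(unary) + gamma * binary)
```

Normalizing this quantity over all label assignments yields the CRF distribution that CA-ZSL maximizes.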
Figure 10: The framework of the CA-ZSL. The features of individual objects and pairwise features are extracted from
the image and input into an instance-level zero-shot inference module and a relationship inference module respectively. In
combination with the knowledge graph, the unary potential function and binary potential function of CRF are generated
respectively to predict the labels of objects.
L = (1 / 2M) ∑_{i=1}^{M} ∑_{j=1}^{P} (W_{i,j} − W′_{i,j})².    (8)
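Equation (8) is a mean-squared regression between the ground-truth classifier weights W and the graph-predicted weights W′, over M classes (rows) and P weight dimensions (columns); a minimal sketch:

```python
def dgp_loss(W, W_pred):
    """Eq. (8): (1 / 2M) * sum_{i=1..M} sum_{j=1..P} (W[i][j] - W'[i][j])^2,
    where W holds the classifier weights extracted from the pre-trained CNN
    and W' the weights predicted from the knowledge graph."""
    M = len(W)
    return sum((w - wp) ** 2
               for row, row_pred in zip(W, W_pred)
               for w, wp in zip(row, row_pred)) / (2 * M)
```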
Transferring correlation information between classes can be beneficial for learning new concepts. DGP demon-
strates the importance of aligning the semantic classifier with the feature classifier. Building upon this idea, Chen et
al. [14] proposed the Knowledge Graph Transfer Network (KGTN) to address the few-shot classification problem. In
KGTN, knowledge graphs are utilized to capture and model the correlations between seen and unseen classes. These
Figure 11: The framework of the DGP. DGP is trained to predict the classifier weights 𝑊 for each node/class in a graph. The weights for the training classes are extracted from the final layer of a pre-trained ResNet. The graph is constructed from a knowledge graph, and each node is represented by a vector that encodes semantic class information (the word embedding of the class name in this paper). The network consists of two phases: a descendant phase (where each node receives knowledge from its descendants) and an ancestor phase (where it receives knowledge from its ancestors).
knowledge graphs serve as a means to transfer knowledge and facilitate the learning process. The overall architecture
of KGTN is depicted in Figure 12.
Specifically, KGTN comprises three main parts: the feature extraction module, knowledge graph transfer module,
and prediction module. The feature extraction module uses CNN to extract the feature vector of images. The knowledge
graph transfer module uses a gated graph neural network (GGNN) to learn the knowledge graph node embedding. After
𝑇 iterations, the knowledge graph transfer module obtains the final weight 𝑤∗ , which has captured the correlation
between the seen and the unseen classes. The prediction module calculates the similarity between the weight 𝑤∗ and
the image feature to predict the probability distribution of the label.
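The prediction step amounts to a similarity between the image feature and each class's refined weight 𝑤∗, turned into a distribution; the inner-product similarity and softmax below are illustrative assumptions (KGTN also explores other similarity metrics).

```python
import math

def kgtn_predict(feature, class_weights):
    """Score each class by the inner product between the image feature and
    its GGNN-refined weight w*, then softmax into a label distribution."""
    scores = [sum(f * w for f, w in zip(feature, ws)) for ws in class_weights]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```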
Figure 12: The framework of the KGTN model. It incorporates the prior knowledge of category correlation and makes
use of the interaction between category classifier weights, facilitating better learning of the classifier weights of unknown
categories.
In contrast to static domain knowledge, Silva et al. [115] introduced Propositional Logic Nets (PROLONETS),
which directly encode domain knowledge as a collection of propositional rules within a neural network. PROLONETS
not only incorporates domain knowledge into the model but also allows for the refinement of domain knowledge based
on the trained neural network. The framework of PROLONETS is illustrated in Figure 13. This approach enables the
neural network to leverage domain-specific information and improve its learning and reasoning capabilities.
PROLONETS aids in "warm starting" the learning process in deep reinforcement learning. The first step involves
knowledge representation, where policies and actions express domain knowledge in the form of propositional rules.
These rules are then encoded into a decision tree structure. The second step is neural network initialization, wherein the
nodes of the decision tree are directly transformed into neural network weights. This allows the agent to immediately
commence learning effective strategies in reinforcement learning. The final step is training, during which the initialized
network interacts with the environment, collecting data that is subsequently used to update parameters and rectify
domain knowledge.
Figure 13: The framework of the PROLONETS. Domain knowledge is constructed into a decision tree that is then used
to directly initialize a PROLONET’s architecture and parameters; here, leaf nodes are actions and other nodes are policies
in reinforcement learning. The PROLONET can then begin reinforcement learning in the given domain, outgrowing its
original specification.
Let us consider the cart pole as an example. The state space of a cart pole is a four-dimensional vector: cart
position, cart velocity, pole angle, and pole velocity. The action space is a two-dimensional vector (left, right). Domain
knowledge can be expressed as "if the cart's position is right of center, move left; otherwise, move right." The decision
nodes of the tree become linear layers, leaves become action weights, and the final output is a sum of the leaves weighted
by path probabilities. Therefore, if 𝑝𝑜𝑠𝑖𝑡𝑖𝑜𝑛 > −1, the weight of the neural network is 𝑤 = {1, 0, 0, 0}, and the bias is
𝑏 = −1.
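The cart-pole rule above can be sketched as a soft decision node in the PROLONET style, σ(α(𝑤 · 𝑠 − 𝑐)), with two action leaves; the sharpness α and the exact node form here are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cartpole_policy(state, alpha=10.0):
    """Soft decision node for the rule on cart position, plus two leaves.
    state = (cart_position, cart_velocity, pole_angle, pole_velocity);
    returns (p_left, p_right), the leaf weights summed by path probability."""
    w = (1.0, 0.0, 0.0, 0.0)   # weight vector selecting the cart position
    c = -1.0                   # comparison value from the encoded rule
    p = sigmoid(alpha * (sum(wi * si for wi, si in zip(w, state)) - c))
    return p, 1.0 - p          # condition true -> left, false -> right
```

Because the node is a differentiable sigmoid rather than a hard threshold, gradient descent can later refine both the weights and the comparison value during reinforcement learning.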
Conclusion: Based on the above text, we can summarize the following key factors in reasoning for learning. (1) Knowledge representation. Symbolic knowledge is a discrete representation. To combine symbolic knowledge (discrete representation) with neural networks (continuous representation), most methods convert the symbolic knowledge into an intermediate representation, such as a graph or tree. Other approaches use fuzzy logic (such as t-norms) to assign soft truth degrees in the continuous interval [0, 1]. In summary, such environment-grounded representations may well form the foundation of the much larger representational edifice needed for human-like general intelligence [50]. (2) Combining approaches. One type of approach takes symbolic knowledge as a regularization term in the loss functions of the neural networks. The other encodes symbolic knowledge into the structure of the neural networks to improve their performance. It is worth noting that logic rules are usually added as constraints to loss functions, while knowledge graphs often enhance neural networks with information about relations between instances.
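As a concrete instance of the fuzzy-logic route in (1), a t-norm maps logical conjunction onto truth degrees in [0, 1]; the product t-norm and the Reichenbach implication shown here are one common pairing, not the only choice.

```python
def t_norm_product(a, b):
    """Product t-norm: fuzzy conjunction of two truth degrees in [0, 1]."""
    return a * b

def reichenbach_implies(a, b):
    """Reichenbach fuzzy implication: 1 - a + a*b.
    Reduces to classical implication at the endpoints 0 and 1."""
    return 1.0 - a + a * b
```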
3.3. Learning-reasoning
In learning-reasoning approaches, learning and reasoning do not work in isolation but instead closely interact. This
is a development trend of neural-symbolic learning systems [80, 152, 11, 125, 148].
Based on ProbLog [23], Manhaeve et al. [80] introduced neural facts and neural annotated disjunctions (neural ADs) to propose a model that seamlessly integrates probability, logic, and deep learning, known as DeepProbLog.
DeepProbLog is a pioneering framework that combines a generic deep neural network with probabilistic logic in a
unique manner. It offers the advantage of enhanced expressive capability and enables end-to-end training for neural
networks and logical reasoning in a unified framework.
DeepProbLog is a probabilistic programming language that integrates deep learning through the use of "neural
predicates". These neural predicates serve as an interface between neural networks and symbolic reasoning. In
DeepProbLog, an image, for instance, is processed by a neural network, which outputs the distribution of each
class in the dataset as logical facts for symbolic reasoning. Specifically, neural networks are employed to process
simple concepts or unstructured data, generating inputs for symbolic reasoning in DeepProbLog. Symbolic reasoning
in DeepProbLog utilizes SDD (Sentential Decision Diagrams) [18] to construct a directed graph, which is then
transformed into an arithmetic circuit for inference and answering queries. To enable end-to-end training that bridges
continuous embedding and discrete symbols, DeepProbLog leverages the gradient semiring [32] as an optimization
tool. The framework of DeepProbLog is visually depicted in Figure 14.
In an approach different from DeepProbLog, Zhou et al. [152] proposed abductive learning (ABL) as a framework
that combines abductive reasoning [2] with induction. Abductive reasoning, which is a form of logical reasoning,
involves inferring the best explanation for given observations or evidence. ABL leverages both induction, which
is a key component of modern machine learning, and abduction, which is the process of generating hypotheses or
explanations, in a mutually beneficial way. ABL provides a unified framework that bridges machine learning and logical
reasoning, allowing for the integration of both approaches to improve the overall learning and reasoning process. This
framework offers a new perspective and methodology for effectively combining machine learning techniques with
logical reasoning techniques.
In more detail, given raw data (which includes only the data features and a true/false label, with no class label), an initialized classifier, and a knowledge base (KB), the raw data is fed into the initialized classifier to obtain pseudo-labels in machine learning. These pseudo-labels (pseudo-groundings) are then transformed into symbolic representations that can be accepted by logical reasoning. Next, ABL uses Prolog as the KB and adopts
abductive reasoning technology to abduct the pseudo-labels and rules. That is to say, logical reasoning minimizes
the inconsistency between the symbolic representation and the KB to revise pseudo-labels, then outputs the deductive
labels. Finally, a new classifier is trained by the deductive labels and the raw data, which replaces the original classifier.
The above is an iterative process that continues until the classifier is no longer changed or the pseudo-labels are
consistent with the KB. ABL is a special kind of weakly supervised learning, in which the supervision information
comes not only from the ground-truth labels but also from knowledge abduction.
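The iterative ABL procedure can be sketched end to end with toy stand-ins; the `ToyClassifier`, `ParityKB`, and their interfaces below are illustrative assumptions, not the original implementation.

```python
class ToyClassifier:
    """Trivial memorizing classifier standing in for a neural network."""
    def __init__(self):
        self.table = {}
    def predict(self, data):
        return [self.table.get(x, 0) for x in data]
    def fit(self, data, labels):
        self.table = dict(zip(data, labels))

class ParityKB:
    """Toy knowledge base: a label is consistent iff it equals x mod 2."""
    def abduce(self, data, pseudo_labels):
        # abduction: revise each pseudo-label to a KB-consistent label
        return [x % 2 for x in data]

def abductive_learning(data, kb, clf, max_iters=10):
    for _ in range(max_iters):
        pseudo = clf.predict(data)            # induction: current predictions
        revised = kb.abduce(data, pseudo)     # abduction: minimize inconsistency
        if revised == pseudo:                 # pseudo-labels consistent with KB
            break
        clf.fit(data, revised)                # retrain on the revised labels
    return clf
```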
Based on the ABL framework, Tian et al. [125] proposed a weakly supervised neural symbolic learning model
(WS-NeSyL) for cognitive tasks with logical reasoning. The difference between WS-NeSyL and ABL is that ABL uses
a metric of minimal inconsistency in logical reasoning, while WS-NeSyL adopts sampling technology. In WS-NeSyL,
to provide supervised information for the reasoning process in complex reasoning tasks, the neural network is designed
as an encoder-decoder framework that includes an encoder and two decoders (perceptive decoder and cognitive
decoder). The encoder can encode input information as a vector, while the perceptive decoder decodes the vector to
predict labels (pseudo-labels). According to these pseudo-labels and the logic rules sampled from the knowledge base, the cognitive decoder reasons out the results. To supervise the reasoning of the cognitive decoder, WS-NeSyL provides a
back search algorithm to sample logic rules from the knowledge base to act as labels that are used to revise the predicted
labels. To solve the sampling problem, WS-NeSyL introduces a regular term of logic rules. The whole model is trained
iteratively until convergence.
The knowledge base is an important factor in logical reasoning, and different knowledge bases are used by different
reasoning technologies. The above approaches use probabilistic logic programming language (ProbLog) as their
knowledge base; notably, they only consider that neural networks can provide facts for the knowledge base, and do
not quantify how many logic rules should be triggered by the neural networks. To resolve this issue, Yu et al. [148]
proposed a bi-level probabilistic graphical reasoning framework, called BPGR. To quantify the amount of symbolic
knowledge that is triggered, BPGR uses MLN to model all logic rules. For instance, MLN can express the number of times a logic rule is satisfied in the form of a potential function.
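An MLN potential of this kind amounts to weighting the count of satisfied groundings of each rule; a minimal sketch over ground atoms (the toy domain and rule below are illustrative, not from BPGR):

```python
import math

def mln_world_weight(world, weighted_rules):
    """Unnormalized MLN weight of a possible world:
    exp( sum_i w_i * n_i(world) ), where n_i counts the true
    groundings of rule i in that world."""
    return math.exp(sum(w * count(world) for w, count in weighted_rules))

# Toy rule: wear(P, glasses) => in(glasses, P), grounded over two people.
PEOPLE = ["alice", "bob"]
def rule_groundings(world):
    return sum(1 for p in PEOPLE
               if ("wear", p, "glasses") not in world   # antecedent false, or
               or ("in", "glasses", p) in world)         # consequent true

world = {("wear", "alice", "glasses"), ("in", "glasses", "alice")}
weight = mln_world_weight(world, [(1.5, rule_groundings)])
```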
BPGR includes two parts: the visual reasoning module (VRM) and the symbolic reasoning module (SRM). The VRM extracts the features of objects in images and infers the labels of objects and relationships. The SRM uses symbolic knowledge to guide the reasoning of the VRM in a good direction, acting as an error correction. In terms of the model framework, more concretely, the SRM is a two-layer probabilistic graph containing two types of nodes: the reasoning results of the VRM in the high-level structure, and the ground atoms of logic rules in the low-level structure. Once the probabilistic graphical model is constructed, BPGR can be efficiently trained in an end-to-end manner by the variational EM algorithm. An overall framework of BPGR is provided in Figure 15.
Figure 15: The framework of the BPGR. This model is a two-layer probabilistic graphical model consisting of a visual reasoning module and a symbolic reasoning module. Here, the high-level structure is the result of the visual reasoning module, while the low-level structure is the ground atoms of logic rules. The model is trained to output the reasoning results of the visual reasoning module based on symbolic knowledge. Note that G represents the grounding operator, solid lines represent ground-truth edges, and dotted lines represent pseudo-edges.
Conclusion: The field of learning-reasoning approaches in AI research has gained significant attention due to the
advantages it offers by combining neural networks and symbolic reasoning. The integration of neural networks and
symbolic reasoning allows for the utilization of the strengths of both approaches. Neural networks provide the ability
to process complex data and generate predictions, while symbolic reasoning provides a structured and interpretable
framework for representing and reasoning about knowledge. For instance, DeepProbLog and ABL have similar model
principles: the modeling of complex problems is defined in a logic programming language, and the neural network
is used to define simple concepts in a logic programming language. BPGR uses neural networks to accelerate the
search process of symbolic reasoning, along with symbolic knowledge to constrain neural network learning. This
model not only characterizes the matching degree between prediction results and symbolic knowledge but also clearly
states which symbolic knowledge is being fitted, along with the probability of this symbolic knowledge being fitted as
an explanation for the model prediction. However, one limitation of these approaches is their reliance on predefined
logic programming or logic rules, which restricts their generalizability to other tasks. Future research in this field
should explore higher-level interactions between neural networks and symbolic reasoning, such as learning symbolic
knowledge during training. By enhancing the ability of models to acquire and reason with symbolic knowledge in a
data-driven manner, the field can further advance the integration of learning and reasoning in AI systems.
Based on representative works, we summarize a general design idea for neural-symbolic approaches. The
interaction between neural networks and symbolic systems allows encoding the embedding of symbolic knowledge into
neural network models and feeding the abstracted symbols of the neural networks into symbolic systems. Further, we provide
some characteristics that should be considered in designing neural-symbolic approaches, as follows: (1) Uncertainty.
The output of the neural network is a distribution, not "True" or "False". Therefore, we need to consider the uncertainty
of the triggered symbolic knowledge. (2) Globalization. It is necessary to consider the fit of all symbolic knowledge
in the knowledge base, not just the local knowledge. (3) Importance. Different knowledge may have different weights,
and the degree of fitting knowledge with different weights should be considered. (4) Interpretability. Interpretability
should be explicitly considered in learning (e.g., the intermediate process that produces the result of learning).
4. Applications
4.1. Object/visual-relationship detection
The goal of object/visual-relationship detection is to recognize objects or the relationships between objects in
images. However, relying solely on visual features to train a model often leads to relatively weak performance. In recent
years, the emergence of neural-symbolic learning systems has paved the way for incorporating external knowledge to
enhance the detection performance of these models. This integration of external knowledge into the learning process
has shown promising results and has become an active area of research in the field.
Donadello et al. [28] proposed a novel approach that combines neural networks with first-order logic, known as Logic Tensor Networks (LTN). By incorporating logical constraints, LTNs enable effective reasoning from noisy images while also providing a means to describe data characteristics through logic rules. This integration of logic into the neural network framework enhances interpretability in image recognition tasks. In the context of remote sensing, Marszalek et al. [87] and Forestier et al. [36] emphasize the utilization of symbolic knowledge from domain experts to improve detection capabilities. By leveraging expert knowledge, remote sensing systems can gain a deeper understanding of the data and achieve better performance in detecting specific features or patterns. Zhu et al. [153] and Nyga et al. [96]
adopt a different approach by using Markov Logic Networks (MLN) to model symbolic knowledge for integration into
deep learning models. MLNs allow for learning a scoring function and predicting relations between input images and
specific objects or concepts. For example, given an input image of a horse, the model can predict the relation "ridable"
between the horse and people. This approach combines the strengths of deep learning and symbolic reasoning, enabling
more comprehensive and nuanced analysis of visual data.
Yang et al. [144] proposed the integration of symbolic planning and hierarchical reinforcement learning (HRL) [7]
to address decision-making in dynamic environments with uncertainties. They introduced a framework called PEORL
(Planning-Execution-Observation-Reinforcement-Learning) that combines these two approaches. Symbolic planning
is employed to guide the agent’s task execution and learning process, while the learned experiences are fed back to
the symbolic knowledge to enhance the planning phase. Specifically, commonsense knowledge of actions constrains
the answer set solver to generate a symbolic plan. The symbolic plan is subsequently mapped to a deterministic
sequence of stochastic options, which guides the hierarchical reinforcement learning (HRL) process. This approach
represents the first utilization of symbolic planning for option discovery within the HRL framework. To achieve
task-level interpretability, Lyu et al. [78] proposed the Symbolic Deep Reinforcement Learning (SDRL) framework,
which shares similarities with PEORL and consists of a planner, controller, and meta-controller, along with symbolic
knowledge. The planner employs prior symbolic knowledge to perform long-term planning through a sequence of
symbolic actions (subtasks) that aim to achieve its intrinsic goal. The controller utilizes deep reinforcement learning
(DRL) algorithms to learn sub-policies for each subtask based on intrinsic rewards. The meta-controller learns extrinsic
rewards by evaluating the training performance of the controllers and suggesting new intrinsic goals to the planner.
In essence, both PEORL and SDRL leverage symbolic knowledge to guide the reinforcement learning process and
facilitate decision-making.
5. Future directions
The preceding sections have introduced the current research status and methods of neural-symbolic learning systems in detail. On this basis, we discuss some potential future research directions. One important direction is the design of more robust and efficient symbolic representation learning methods. The advancement of graph representation learning offers a promising avenue for addressing this challenge. By mapping nodes to low-dimensional,
representation learning offers a promising avenue for addressing this challenge. By mapping nodes to low-dimensional,
dense, and continuous vectors, graph representation learning can flexibly support various learning and reasoning tasks.
Given that symbolic knowledge often exhibits heterogeneity, multiple relations, and even multimodality, exploring
the development and utilization of heterogeneous graph representation learning methods becomes another important
direction to overcome the challenges faced by neural-symbolic learning systems.
6. Conclusion
In this paper, we have presented an overall framework for neural-symbolic learning systems. Our main contribution
is the proposal of a novel taxonomy for neural-symbolic learning systems and the outline of three structured
categorizations. Additionally, we describe the techniques used in each structured categorization, explore a wide range
of applications, and discuss future directions for neural-symbolic learning systems. We firmly believe that a systematic
and comprehensive research survey in this field holds significant value in terms of both theory and application. It
deserves further in-depth research and discussion.
Acknowledgments
This research was partly supported by the National Natural Science Foundation of China (61876069);
Jilin Province Key Scientific and Technological Research and Development Project under Grant Nos. 20180201067GX
and 20180201044GX; and Jilin Province Natural Science Foundation (20200201036JC).
References
[1] Abboud, R., Ceylan, I., Lukasiewicz, T., 2020. Learning to reason: Leveraging neural networks for approximate dnf counting, in: AAAI, pp.
3097–3104.
[2] Aliseda, A., 2006. Abductive Reasoning: Logical Investigations into Discovery and Explanation.
[3] Altszyler, E., Brusco, P., Basiou, N., Byrnes, J., Vergyri, D., 2021. Zero-shot multi-domain dialog state tracking using descriptive rules, in:
IJCLR.
[4] Andreas, J., Rohrbach, M., Darrell, T., Klein, D., 2016. Neural module networks, in: CVPR, pp. 39–48.
[5] Andrews, R., Diederich, J., Tickle, A.B., 1995. Survey and critique of techniques for extracting rules from trained artificial neural networks,
in: KBS, pp. 373–389.
[6] Bach, S.H., Broecheler, M., Huang, B., Getoor, L., 2015. Hinge-loss markov random fields and probabilistic soft logic, in: arXiv preprint
arXiv:1505.04406.
[7] Barto, A.G., Mahadevan, S., 2003. Recent advances in hierarchical reinforcement learning, in: DEDS, pp. 41–77.
[8] Besold, T.R., Garcez, A.d., Bader, S., Bowman, H., Domingos, P., Hitzler, P., Kühnberger, K.U., Lamb, L.C., Lowd, D., Lima, P.M.V., et al.,
2017. Neural-symbolic learning and reasoning: A survey and interpretation, in: arXiv preprint arXiv:1711.03902.
[9] Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O., 2013. Translating embeddings for modeling multi-relational data, in:
NIPS.
[10] Bosselut, A., Rashkin, H., Sap, M., Malaviya, C., Celikyilmaz, A., Choi, Y., 2019. Comet: Commonsense transformers for automatic
knowledge graph construction, in: ACL.
[11] Cai, L.W., Dai, W.Z., Huang, Y.X., Li, Y.F., Muggleton, S., Jiang, Y., 2021. Abductive learning with ground knowledge base, in: IJCAI.
[12] Calegari, R., Ciatto, G., Omicini, A., 2020. On the integration of symbolic and sub-symbolic techniques for xai: A survey, in: IA, pp. 7–32.
[13] Campero, A., Pareja, A., Klinger, T., Tenenbaum, J., Riedel, S., 2018. Logical rule induction and theory learning using neural theorem
proving, in: NIPS.
[14] Chen, R., Chen, T., Hui, X., Wu, H., Li, G., Lin, L., 2020. Knowledge graph transfer network for few-shot recognition, in: AAAI, pp.
10575–10582.
[15] Chen, T., Chen, R., Nie, L., Luo, X., Liu, X., Lin, L., 2018. Neural task planning with and–or graph representations, in: IEEE Transactions
on Multimedia, pp. 1022–1034.
[16] Conti, C.J., Varde, A.S., Wang, W., 2020. Robot action planning by commonsense knowledge in human-robot collaborative tasks, in:
IEMTRONICS, pp. 1–7.
[17] Darwiche, A., 2001. On the tractable counting of theory models and its application to truth maintenance and belief revision, in: JANCL, pp.
11–34.
[18] Darwiche, A., 2011. Sdd: A new canonical representation of propositional knowledge bases, in: AI.
[19] Darwiche, A., Marquis, P., 2002. A knowledge compilation map, in: JAIR, pp. 229–264.
[20] Das, R., Dhuliawala, S., Zaheer, M., Vilnis, L., Durugkar, I., Krishnamurthy, A., Smola, A., McCallum, A., 2017. Go for a walk and arrive at the answer: Reasoning over knowledge bases with reinforcement learning, in: NIPS.
[21] Das, R., Neelakantan, A., Belanger, D., McCallum, A., 2016. Chains of reasoning over entities, relations, and text using recurrent neural
networks, in: ACL.
[22] Davis, E., 2017. Logical formalizations of commonsense reasoning: a survey, in: JAIR, pp. 651–723.
[23] De Raedt, L., Kimmig, A., Toivonen, H., 2007. Problog: A probabilistic prolog and its application in link discovery., in: IJCAI, pp. 2462–2467.
[24] Dettmers, T., Minervini, P., Stenetorp, P., Riedel, S., 2018. Convolutional 2d knowledge graph embeddings, in: AAAI.
[25] Diligenti, M., Gori, M., Sacca, C., 2017. Semantic-based regularization for learning and inference, in: AI, pp. 143–165.
[26] He, K., Gkioxari, G., Dollár, P., Girshick, R., 2017. Mask r-cnn, in: ICCV, pp. 2961–2969.
[27] Domingos, P., Lowd, D., 2019. Unifying logical and statistical ai with markov logic, in: Commun.ACM, pp. 74–83.
[28] Donadello, I., Serafini, L., Garcez, A.D., 2017. Logic tensor networks for semantic image interpretation, in: IJCAI.
[29] Dong, H., Mao, J., Lin, T., Wang, C., Li, L., Zhou, D., 2019. Neural logic machines, in: ICLR.
[30] Dos Martires, P.Z., Derkinderen, V., Manhaeve, R., Meert, W., Kimmig, A., De Raedt, L., 2019. Transforming probabilistic programs into
algebraic circuits for inference and learning, in: NIPS.
[31] Dragone, P., Teso, S., Passerini, A., 2021. Neuro-symbolic constraint programming for structured prediction, in: arXiv preprint
arXiv:2103.17232.
[32] Eisner, J., 2002. Parameter estimation for probabilistic finite-state transducers, in: ACL, pp. 1–8.
[33] Ellis, K.M., Morales, L.E., Sablé-Meyer, M., Solar Lezama, A., Tenenbaum, J.B., 2018. Library learning for neurally-guided bayesian
program induction, in: NIPS.
[34] Enderton, H.B., 2001. A mathematical introduction to logic.
[35] Evans, R., Grefenstette, E., 2018. Learning explanatory rules from noisy data, in: JAIR, pp. 1–64.
[36] Forestier, G., Wemmert, C., Puissant, A., 2013. Coastal image interpretation using background knowledge and semantics, in: Comput Geosci,
pp. 88–96.
[37] Galárraga, L., Teflioudi, C., Hose, K., Suchanek, F.M., 2015. Fast rule mining in ontological knowledge bases with amie+, in: VLDB, pp. 707–730.
[38] Garcez, A.d., Besold, T.R., De Raedt, L., Földiak, P., Hitzler, P., Icard, T., Kühnberger, K.U., Lamb, L.C., Miikkulainen, R., Silver, D.L.,
2015. Neural-symbolic learning and reasoning: contributions and challenges, in: AAAI.
[39] Garcez, A.d., Dutra, A.R.R., Alonso, E., 2018. Towards symbolic reinforcement learning with common sense, in: arXiv preprint
arXiv:1804.08597.
[40] Garcez, A.d., Gori, M., Lamb, L.C., Serafini, L., Spranger, M., Tran, S.N., 2019. Neural-symbolic computing: An effective methodology for
principled integration of machine learning and reasoning, in: arXiv preprint arXiv:1905.06088.
[41] Garcez, A.d., Lamb, L.C., 2020. Neurosymbolic ai: the 3rd wave, in: arXiv preprint arXiv:2012.05876.
[42] Garcez, A.S.A., Zaverucha, G., 1999. The connectionist inductive learning and logic programming system, in: APIN, pp. 59–77.
[43] Garcez, A.S.d., Broda, K., Gabbay, D.M., et al., 2002. Neural-symbolic learning systems: foundations and applications.
[44] Garcez, A.S.d., Broda, K.B., Gabbay, D.M., 2012. Neural-symbolic learning systems: foundations and applications.
[45] Gardner, M., Mitchell, T., 2015. Efficient and expressive knowledge base completion using subgraph feature extraction, in: EMNLP, pp.
1488–1498.
[46] Garnelo, M., Arulkumaran, K., Shanahan, M., 2016. Towards deep symbolic reinforcement learning, in: NIPS.
[47] Gupta, N., Lin, K., Roth, D., Singh, S., Gardner, M., 2020. Neural module networks for reasoning over text, in: ICLR.
[48] Gupta, V., Patro, B.N., Parihar, H., Namboodiri, V.P., 2022. Vquad: Video question answering diagnostic dataset, in: WACVW, pp. 282–291.
[49] Hoffmann, J., Navarro, O., Kastner, F., Janßen, B., Hubner, M., 2017. A survey on cnn and rnn implementations, in: PESARO.
[50] Honavar, V., 1995. Symbolic artificial intelligence and numeric artificial neural networks: towards a resolution of the dichotomy.
Computational architectures integrating neural and symbolic processes: a perspective on the state of the art , 351–388.
[51] Hu, Z., Ma, X., Liu, Z., Hovy, E., Xing, E., 2016. Harnessing deep neural networks with logic rules, in: ACL.
[52] Hudson, D.A., Manning, C.D., 2018. Compositional attention networks for machine reasoning, in: ICLR.
[53] Hudson, D.A., Manning, C.D., 2019. Learning by abstraction: The neural state machine, in: NIPS.
[54] Ji, J., Zhu, F., Cui, J., Zhao, H., Yang, B., 2022. A dual-system method for intelligent fault localization in communication networks, in: ICC
2022-IEEE International Conference on Communications, pp. 4062–4067.
[55] Jiang, X., Wang, Q., Wang, B., 2019. Adaptive convolution for multi-relational learning, in: NAACL HLT, pp. 978–987.
[96] Nyga, D., Balint-Benczedi, F., Beetz, M., 2014. Pr2 looking at things—ensemble learning for unstructured information processing with
markov logic networks, in: ICRA, pp. 3916–3923.
[97] Oltramari, A., Francis, J., Ilievski, F., Ma, K., Mirzaee, R., 2021. Generalizable neuro-symbolic systems for commonsense question
answering, in: Neuro-Symbolic Artificial Intelligence: The State of the Art, pp. 294–310.
[98] Payani, A., Fekri, F., 2019. Inductive logic programming via differentiable deep neural logic networks, in: arXiv preprint arXiv:1906.03523.
[99] Perotti, A., Boella, G., Colombo Tosatto, S., d’Avila Garcez, A.S., Genovese, V., van der Torre, L., 2012. Learning and reasoning about
norms using neural-symbolic systems, in: AAMAS, pp. 1023–1030.
[100] Poon, H., Domingos, P., 2006. Sound and efficient inference with probabilistic and deterministic dependencies, in: AAAI, pp. 458–463.
[101] Poon, H., Domingos, P., 2009. Unsupervised semantic parsing, in: EMNLP, pp. 1–10.
[102] Prates, M., Avelar, P.H., Lemos, H., Lamb, L.C., Vardi, M.Y., 2019. Learning to solve np-complete problems: A graph neural network for
decision tsp, in: AAAI, pp. 4731–4738.
[103] Qu, M., Tang, J., 2020. Probabilistic logic neural networks for reasoning, in: ICLR.
[104] Raizada, M., 2022. Survey on recommender systems incorporating trust, in: ICAAIC, pp. 1011–1015.
[105] Ratti, E., Graves, M., 2022. Explainable machine learning practices: opening another black box for reliable medical ai, in: AI and Ethics, pp.
1–14.
[106] Richardson, M., Domingos, P., 2006. Markov logic networks, in: ML, pp. 107–136.
[107] Riegel, R., Gray, A., Luus, F., Khan, N., Makondo, N., Akhalwaya, I.Y., Qian, H., Fagin, R., Barahona, F., Sharma, U., et al., 2020. Logical
neural networks, in: arXiv preprint arXiv:2006.13155.
[108] Rissati, J.V., Molina, P.C., Anjos, C.S., 2020. Hyperspectral image classification using random forest and deep learning algorithms, in:
LAGIRS, pp. 132–132.
[109] Rocktäschel, T., Riedel, S., 2017. End-to-end differentiable proving, in: NIPS.
[110] Safavian, S.R., Landgrebe, D., 1991. A survey of decision tree classifier methodology, in: IEEE T SYST MAN CY-S, pp. 660–674.
[111] Salahuddin, Z., Woodruff, H.C., Chatterjee, A., Lambin, P., 2022. Transparency of deep neural networks for medical image analysis: A
review of interpretability methods, in: CIBM, pp. 105–111.
[112] Schlichtkrull, M., Kipf, T.N., Bloem, P., Berg, R.v.d., Titov, I., Welling, M., 2018. Modeling relational data with graph convolutional networks,
in: ESWC, pp. 593–607.
[113] Serafini, L., Garcez, A.d., 2016. Logic tensor networks: Deep learning and logical reasoning from data and knowledge, in: arXiv preprint
arXiv:1606.04422.
[114] Sikka, K., Huang, J., Silberfarb, A., Nayak, P., Rohrer, L., Sahu, P., Byrnes, J., Divakaran, A., Rohwer, R., 2020. Zero-shot learning with
knowledge enhanced visual semantic embeddings, in: arXiv preprint arXiv:2011.10889.
[115] Silva, A., Gombolay, M., 2021. Encoding human domain knowledge to warm start reinforcement learning, in: AAAI, pp. 5042–5050.
[116] Singla, P., Domingos, P., 2005. Discriminative training of markov logic networks, in: AAAI, pp. 868–873.
[117] Singla, P., Domingos, P., 2006. Memory-efficient inference in relational domains, in: AAAI, pp. 488–493.
[118] Sourek, G., Aschenbrenner, V., Zelezny, F., Schockaert, S., Kuzelka, O., 2018. Lifted relational neural networks: Efficient learning of latent
relational structures, in: JAIR, pp. 69–100.
[119] Sun, R., Alexandre, F., 2013. Connectionist-symbolic integration: From unified to hybrid approaches.
[120] Sun, R., Bookman, L.A., 1994. Computational architectures integrating neural and symbolic processes: A perspective on the state of the art.
[121] Sun, Y., Tang, D., Duan, N., Gong, Y., Feng, X., Qin, B., Jiang, D., 2020. Neural semantic parsing in low-resource settings with back-
translation and meta-learning, in: AAAI, pp. 8960–8967.
[122] Sun, Z., Deng, Z.H., Nie, J.Y., Tang, J., 2019. Rotate: Knowledge graph embedding by relational rotation in complex space, in: ICLR.
[123] Tandon, N., Varde, A.S., de Melo, G., 2018. Commonsense knowledge in machine intelligence, in: SIGMOD, pp. 49–52.
[124] Teru, K., Denis, E., Hamilton, W., 2020. Inductive relation prediction by subgraph reasoning, in: ICML, pp. 9448–9457.
[125] Tian, J., Li, Y., Chen, W., Xiao, L., He, H., Jin, Y., 2022. Weakly supervised neural symbolic learning for cognitive tasks, in: AAAI.
[126] Towell, G.G., Shavlik, J.W., 1994. Knowledge-based artificial neural networks, in: AI, pp. 119–165.
[127] Townsend, J., Chaton, T., Monteiro, J.M., 2019. Extracting relational explanations from deep neural networks: A survey from a neural-
symbolic perspective, in: TNNLS, pp. 3456–3470.
[128] Townsend, J., Chaton, T., Monteiro, J.M., 2020. Extracting relational explanations from deep neural networks: A survey from a neural-
symbolic perspective, in: TNNLS, pp. 3456–3470.
[129] Tran, S.D., Davis, L.S., 2008. Event modeling and recognition using markov logic networks, in: ECCV, pp. 610–623.
[130] Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., Bouchard, G., 2016. Complex embeddings for simple link prediction, in: ICML, pp. 2071–
2080.
[131] Vashishth, S., Sanyal, S., Nitin, V., Talukdar, P., 2020. Composition-based multi-relational graph convolutional networks, in: ICLR.
[132] Von Rueden, L., Mayer, S., Beckh, K., Georgiev, B., Giesselbach, S., Heese, R., Kirsch, B., Pfrommer, J., Pick, A., Ramamurthy, R., et al.,
2021. Informed machine learning–a taxonomy and survey of integrating prior knowledge into learning systems, in: TKDE, pp. 614–633.
[133] Wang, P., Dou, D., Wu, F., de Silva, N., Jin, L., 2019a. Logic rules powered knowledge graph embedding, in: arXiv preprint arXiv:1903.03772.
[134] Wang, X., Ye, Y., Gupta, A., 2018. Zero-shot recognition via semantic embeddings and knowledge graphs, in: CVPR, pp. 6857–6866.
[135] Wang, Y., Yao, Q., Kwok, J.T., Ni, L.M., 2020. Generalizing from a few examples: A survey on few-shot learning, in: CSUR, pp. 1–34.
[136] Wang, Z., Ren, Z., He, C., Zhang, P., Hu, Y., 2019b. Robust embedding with multi-level structures for link prediction., in: IJCAI, pp.
5240–5246.
[137] Wang, Z., Zhang, J., Feng, J., Chen, Z., 2014. Knowledge graph embedding by translating on hyperplanes, in: AAAI.
[138] Wen, L.H., Jo, K.H., 2022. Deep learning-based perception systems for autonomous driving: A comprehensive survey, in: Neurocomputing.
[139] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Philip, S.Y., 2020. A comprehensive survey on graph neural networks, in: TNNLS, pp. 4–24.
[140] Xie, Y., Xu, Z., Kankanhalli, M.S., Meel, K.S., Soh, H., 2019. Embedding symbolic knowledge into deep networks.
[141] Xiong, W., Hoang, T., Wang, W.Y., 2017. Deeppath: A reinforcement learning method for knowledge graph reasoning, in: EMNLP.
[142] Xu, J., Zhang, Z., Friedman, T., Liang, Y., Broeck, G., 2018. A semantic loss function for deep learning with symbolic knowledge, in: ICML,
pp. 5502–5511.
[143] Yang, B., Yih, W.t., He, X., Gao, J., Deng, L., 2015. Embedding entities and relations for learning and inference in knowledge bases, in:
ICLR.
[144] Yang, F., Lyu, D., Liu, B., Gustafson, S., 2018. Peorl: Integrating symbolic planning and hierarchical reinforcement learning for robust
decision-making, in: IJCAI.
[145] Yang, F., Yang, Z., Cohen, W.W., 2017. Differentiable learning of logical rules for knowledge base completion, in: NIPS.
[146] Yang, Y., Song, L., 2020. Learn to explain efficiently via neural logic inductive learning, in: ICLR.
[147] Yi, K., Wu, J., Gan, C., Torralba, A., Kohli, P., Tenenbaum, J.B., 2018. Neural-symbolic vqa: Disentangling reasoning from vision and
language understanding, in: NIPS.
[148] Yu, D., Yang, B., Wei, Q., Li, A., Pan, S., 2022. A probabilistic graphical model based on neural-symbolic reasoning for visual relationship
detection, in: CVPR, pp. 10609–10618.
[149] Zhang, J., Chen, B., Zhang, L., Ke, X., Ding, H., 2021. Neural, symbolic and neural-symbolic reasoning on knowledge graphs, in: AI Open,
pp. 14–35.
[150] Zhang, Y., Chen, X., Yang, Y., Ramamurthy, A., Li, B., Qi, Y., Song, L., 2020. Efficient probabilistic logic reasoning with graph neural
networks, in: ICLR.
[151] Zhong, J., Haoran, W., Yunlong, Y., Yanwei, P., 2019. A decadal survey of zero-shot image classification, in: SCIENTIA SINICA
Informationis, pp. 1299–1320.
[152] Zhou, Z.H., 2019. Abductive learning: towards bridging machine learning and logical reasoning, in: Science China Information Sciences,
pp. 1–3.
[153] Zhu, Y., Fathi, A., Fei-Fei, L., 2014. Reasoning about object affordances in a knowledge base representation, in: ECCV, pp. 408–424.
[154] Zhu, Y., Xian, Y., Fu, Z., de Melo, G., Zhang, Y., 2021. Faithfully explainable recommendation via neural logic reasoning, in: ACL.
[155] Zuidberg Dos Martires, P., Kumar, N., Persson, A., Loutfi, A., De Raedt, L., 2020. Symbolic learning and reasoning with noisy data for
probabilistic anchoring, in: Front. Robot. AI, p. 100.
A. Preliminaries
In this section, we introduce background information related to symbolic knowledge and neural networks. For
symbolic knowledge, we focus on two categories: logic knowledge and knowledge graphs. Logic knowledge can be
further subdivided into propositional logic and first-order logic.
A.1. Symbols
A.1.1. Propositional logic
Propositional logic statements are declarative sentences that are either True or False: a sentence is True if it is consistent with the facts and False otherwise. Propositions are combined with the connectives "∧", "∨", "¬", and "⇒". A propositional formula can take the following form:
𝑃 ⇒ 𝑄, (9)
where 𝑃 is the antecedent (condition) and 𝑄 is the consequent (conclusion).
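The semantics of the implication can be checked by enumerating its truth table; a minimal sketch in Python (the `implies` helper is illustrative, not from any library):

```python
from itertools import product

# Material implication: P => Q is False only when P is True and Q is False.
def implies(p: bool, q: bool) -> bool:
    return (not p) or q

# Enumerate the full truth table of P => Q.
for p, q in product([False, True], repeat=2):
    print(f"P={p!s:5} Q={q!s:5}  P=>Q={implies(p, q)}")
```

The only falsifying row is P=True, Q=False, which is exactly the case of a condition holding while its conclusion fails.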
Propositional logic is usually compiled into directed acyclic graphs. Conjunctive Normal Forms (CNFs), deterministic-Decomposable Negation Normal Forms (d-DNNFs) [17, 19], and Sentential Decision Diagrams (SDDs) [18] are representative knowledge representations, where SDDs are a subset of d-DNNFs. For example, given the propositional formula 𝑆𝑚𝑜𝑘𝑒𝑠 ⇒ 𝐶𝑜𝑢𝑔ℎ, its CNF and d-DNNF graphs are presented in Figure 16 (a) and Figure 16 (b), respectively.
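As a small illustration of these compiled forms, the CNF of 𝑆𝑚𝑜𝑘𝑒𝑠 ⇒ 𝐶𝑜𝑢𝑔ℎ is the single clause (¬𝑆𝑚𝑜𝑘𝑒𝑠 ∨ 𝐶𝑜𝑢𝑔ℎ), whose satisfying assignments can be counted by enumeration (the clause encoding below is an illustrative assumption, not a standard format):

```python
from itertools import product

# CNF of (Smokes => Cough): one clause (¬Smokes ∨ Cough).
# Each literal is a (variable, polarity) pair.
cnf = [[("Smokes", False), ("Cough", True)]]

def satisfies(assignment, cnf):
    # A CNF holds if every clause contains at least one literal
    # whose polarity matches the assignment.
    return all(any(assignment[v] == pol for v, pol in clause) for clause in cnf)

variables = ["Smokes", "Cough"]
assignments = [dict(zip(variables, vals)) for vals in product([False, True], repeat=2)]
models = [a for a in assignments if satisfies(a, cnf)]
print(len(models))  # 3 of the 4 assignments satisfy the formula
```

Brute-force enumeration is exponential in the number of variables; compiling into d-DNNF or SDD form is what makes such counting queries tractable on larger formulas.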
Table 4
An instance of a Markov logic network.
Proposition | First-order logic | Weight
Smoking causes cough. | 𝐹1: ∀𝑥, 𝑆𝑚𝑜𝑘𝑒𝑠(𝑥) ⇒ 𝐶𝑜𝑢𝑔ℎ(𝑥) | 1.5
If two people are friends, either both smoke or neither does. | 𝐹2: ∀𝑥∀𝑦, 𝐹𝑟𝑖𝑒𝑛𝑑𝑠(𝑥, 𝑦) ⇒ (𝑆𝑚𝑜𝑘𝑒𝑠(𝑥) ⇔ 𝑆𝑚𝑜𝑘𝑒𝑠(𝑦)) | 1.1
Figure 16: Directed acyclic graphs of the CNF and d-DNNF. Leaf nodes represent atoms of the propositional formula, other nodes represent connectives, and directed edges indicate the relationships between nodes.
A.1.2. First-order logic
First-order logic (FOL) is built from four types of symbols, together with connectives and quantifiers. The four types are constants, variables, functions, and predicates. Constants represent objects in the domain of interest (for example, in the ground atom father(a,b) with a=Bob and b=Mara, a and b are constants). Variables range over the objects in the domain (for example, in the atom father(𝑥,𝑦), where 𝑥 is the father of 𝑦, the variable 𝑥 is limited to the scope of the father class). Functions represent mappings from tuples of objects to objects. Predicates represent relations among objects in a given domain or attributes of these objects. The connectives are the same as in propositional logic. FOL combines atoms through connectives, such that a rule can be written in the following form:
𝐵1(𝑥) ∧ 𝐵2(𝑥) ∧ ⋯ ∧ 𝐵𝑛(𝑥) ⇒ 𝐻(𝑥), (10)
where 𝐵1(𝑥), 𝐵2(𝑥), ⋯, 𝐵𝑛(𝑥) constitute the rule body, which is composed of multiple atoms, and 𝐻(𝑥) is the rule head, the conclusion derived from the rule body.
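Grounding such a rule over a finite domain simply substitutes each constant for the variable; a sketch using the 𝑆𝑚𝑜𝑘𝑒𝑠(𝑥) ⇒ 𝐶𝑜𝑢𝑔ℎ(𝑥) rule from Table 4 (variable and helper names are illustrative):

```python
# Ground the rule Smokes(x) => Cough(x) over a finite set of constants.
constants = ["A", "B"]
rule_body = ["Smokes"]   # B_1(x); in general the body may hold several predicates
rule_head = "Cough"      # H(x)

groundings = []
for c in constants:
    body_atoms = [f"{pred}({c})" for pred in rule_body]  # substitute c for x
    groundings.append((body_atoms, f"{rule_head}({c})"))
print(groundings)
# [(['Smokes(A)'], 'Cough(A)'), (['Smokes(B)'], 'Cough(B)')]
```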
Knowledge representation of FOL can be achieved by a Markov logic network (MLN) [106]. An MLN is an undirected graph in which each node represents a variable, and the joint distribution is represented as follows:
𝑃(𝑋 = 𝑥) = (1∕𝑍) exp{∑𝑖 𝑤𝑖 𝑛𝑖(𝑥)}, (11)
where 𝑍 represents the partition function, 𝑤𝑖 represents the weight of rule 𝑖, 𝑛𝑖(𝑥) represents the number of true groundings of rule 𝑖 in the world 𝑥, and t-norm fuzzy logic [94] is used to calculate the logical connectives.
The following introduces a simple example of an MLN. Table 4 shows the two weighted rules (𝐹1, 1.5) and (𝐹2, 1.1) of this example [27]. Given a constant set 𝐶 = {𝐴, 𝐵}, the generated ground Markov logic network is shown in Figure 17.
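For a domain this small, the distribution in Eq. (11) can be computed exactly by enumerating all possible worlds. The sketch below is a toy computation (not an MLN library): it assumes 𝐹2 is grounded only over the two distinct ordered pairs and uses Boolean rather than fuzzy semantics for the connectives.

```python
import math
from itertools import product

w1, w2 = 1.5, 1.1                     # rule weights from Table 4
people = ["A", "B"]
pairs = [("A", "B"), ("B", "A")]      # distinct ordered pairs for Friends(x,y)

def n1(world):
    # True groundings of F1: Smokes(x) => Cough(x)
    return sum((not world[f"Smokes({p})"]) or world[f"Cough({p})"] for p in people)

def n2(world):
    # True groundings of F2: Friends(x,y) => (Smokes(x) <=> Smokes(y))
    return sum((not world[f"Friends({x},{y})"])
               or (world[f"Smokes({x})"] == world[f"Smokes({y})"])
               for x, y in pairs)

variables = ([f"Smokes({p})" for p in people] + [f"Cough({p})" for p in people]
             + [f"Friends({x},{y})" for x, y in pairs])

def weight(world):
    # Unnormalized weight exp{w1*n1(x) + w2*n2(x)} from Eq. (11).
    return math.exp(w1 * n1(world) + w2 * n2(world))

worlds = [dict(zip(variables, vals))
          for vals in product([False, True], repeat=len(variables))]
Z = sum(weight(w) for w in worlds)    # partition function over all 2^6 worlds

all_true = {v: True for v in variables}
print(weight(all_true) / Z)           # probability of one specific world
```

Exact enumeration is only feasible here because there are 6 ground atoms; real MLN inference relies on approximate methods such as MCMC or lifted inference.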
Figure 17: Ground Markov logic network. Nodes are variables, and edges represent the relationships between variables.
[Figure: an example knowledge graph in which animal entities (Panda, Bird, Cat, Horse, Dog) are linked to the class Mammals and to attributes (Tail, Ears, Paw).]
For their part, knowledge graph representation methods encode discrete symbols (entities, attributes, relationships, etc.) into a low-dimensional vector space to obtain a distributed representation. Typical methods include R-GCN [112], M-GNN [136], CompGCN [131], TransE [9], TransR [73], TransH [137], RotatE [122], DistMult [143], ComplEx [130], ConvE [24], ConvR [55], GGNN [71], and GCN [63], among others.
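As a sketch of one such method, TransE scores a triple (ℎ, 𝑟, 𝑡) by the distance ‖𝐡 + 𝐫 − 𝐭‖, with smaller distances indicating more plausible facts. The embeddings below are random (untrained), so the scores are purely illustrative; the entity and relation names are assumptions for the example.

```python
import numpy as np

# Toy 3-dimensional embeddings for entities and one relation (random,
# untrained, for illustration only).
rng = np.random.default_rng(0)
dim = 3
emb = {name: rng.normal(size=dim)
       for name in ["Panda", "Mammals", "Dog", "is_a"]}

def transe_score(h, r, t):
    # TransE distance: ||h + r - t||_2; lower means more plausible.
    return float(np.linalg.norm(emb[h] + emb[r] - emb[t]))

print(transe_score("Panda", "is_a", "Mammals"))
print(transe_score("Panda", "is_a", "Dog"))
```

Training adjusts the embeddings so that true triples receive lower distances than corrupted ones, typically via a margin-based ranking loss.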
[Figures: a feed-forward neural network with input, hidden, and output layers, and a convolutional neural network with convolutional layers, fully-connected layers, and a softmax output over 𝑛 classes.]
Figure 20: GNN, which feeds graph structure and node feature into graph convolutional layers.
GNNs (Graph Neural Networks, Figure 20) [139] stack multiple graph convolutional layers, each of which obtains a node's hidden representation by aggregating feature information from the node's neighbors. Taking the graph structure and node features as inputs, GNNs can address different graph analysis tasks, such as node classification and graph classification. Accordingly, GNNs have different variants, such as GCN and GAE.
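A single graph convolutional propagation step can be sketched with NumPy. This follows the common normalized-adjacency formulation H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W); the graph and weights below are random toy values, not taken from any cited implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # adjacency of a 3-node path graph
H = rng.normal(size=(3, 4))              # node features: 3 nodes, 4 dims each
W = rng.normal(size=(4, 2))              # layer weights (learned in practice)

A_hat = A + np.eye(3)                    # add self-loops so a node keeps its own feature
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))  # symmetric normalization

# One propagation step: aggregate normalized neighbor features, transform, apply ReLU.
H_next = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
print(H_next.shape)  # (3, 2): each node now has a 2-dimensional hidden representation
```

Stacking such layers lets information propagate over multi-hop neighborhoods, which is the aggregation behavior described above.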
Figure 21: RNN, which feeds its output back into its input. Here, dashed lines represent recurrent connections.
RNNs (Recurrent Neural Networks, Figure 21) [49] are neural networks in which the output layer propagates activations back to the input layer through recurrent connections. Over time, with each time step consisting of a full input-to-output forward pass, the hidden layer transitions through a series of states. The input is usually sequence data, such as sentences in natural language processing. LSTMs (Long Short-Term Memory networks) are a variant of RNNs.
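The state transition over time steps can be sketched as a minimal Elman-style recurrence h_t = tanh(W_xh x_t + W_hh h_(t-1) + b); weights and dimensions below are random illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, hid_dim, seq_len = 3, 5, 4

W_xh = rng.normal(size=(hid_dim, in_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hid_dim, hid_dim))  # hidden-to-hidden (recurrent) weights
b = np.zeros(hid_dim)

xs = rng.normal(size=(seq_len, in_dim))     # a toy input sequence of 4 steps
h = np.zeros(hid_dim)                       # initial hidden state

for x_t in xs:
    # The hidden state carries information forward from one time step to the next.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)

print(h.shape)  # (5,): the hidden state after processing the whole sequence
```

LSTMs replace this plain tanh update with gated cells that control what the state keeps and forgets, which mitigates vanishing gradients over long sequences.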