Limits of AI: Second-Wave Insights

The document discusses the limitations of second-wave AI and deep learning, highlighting issues such as data requirements, lack of explainability, and challenges in transfer learning. It emphasizes the need for hybrid AI models that combine symbolic and subsymbolic approaches to address these limitations. Additionally, it introduces knowledge graphs as essential tools for improving AI by providing contextual relationships and enhancing reasoning capabilities.

2263 - Data Science and Artificial Intelligence

Data Science and Artificial Intelligence – Lecture 6

Axel Polleres, Johann Mitlöhner

Institute for Data, Process and Knowledge Management

Limits of second-Wave AI and Deep Learning:
• Data-Hungry:
• training neural networks needs a lot of correctly labelled training data.
• Models are not "explainable"
• Cannot "apply" abstract knowledge, e.g. to new, unseen examples, cannot “compute”
• Often not robust to changed contexts
• Cannot learn rules and constraints reliably (nor apply them)
• Transfer-Learning hits boundaries:
• fine-tuning often still needs considerable amounts of labeled data
• not always possible/feasible in very specific domains
Limits of first-Wave AI and Logics:
• Lacking the capability to learn

• Focused (mostly) on discrete relationships

• Scalability/Combinatorial explosion (which sometimes can’t be avoided)
What more did you learn on
(1) limits?
(2) approaches to address these limits?
Limits of second-Wave AI and Deep Learning:
• Example – Cultural Heritage: trying to apply pre-trained contemporary image models to historic images:

[Figure: historic image labelled "Laptop 97%" by a pre-trained image model]

Tabea Tietz, Jörg Waitelonis, Mehwish Alam and Harald Sack. Knowledge Graph based Analysis and Exploration of Historical
Theatre Photographs. Proceedings of the Conference on Digital Curation Technologies (Qurator 2020), Berlin, January 20-21,
2020. [Link]
Limits of second-Wave AI and Deep Learning:
Don't trust "end-to-end" Deep-Learning solutions.
Hybrid AI models are the way forward.

[Link]

[Link]
5?utm_source=twt_nnc&utm_medium=social&utm_campaign=naturenews&sf221156507=1
Truthfulness/Understanding:
• Not solvable without contextual+conceptual background knowledge!
Trends: Transfer Learning/Fine-tuning:

• Idea: fine-tune existing, pre-trained models (e.g. models trained on large text corpora from news, image recognition models)
• to specific, related contexts (e.g. classify trucks instead of cars)

• Many pre-trained models available, for both images and text:
• [Link]
Trends: Zero-shot learning (Wikipedia):
• “Unlike standard generalization in machine learning, where classifiers are
expected to correctly classify new samples to classes they have already
observed during training, in ZSL, no samples from the classes have been given
during training the classifier. It can therefore be viewed as an extreme case of
domain adaptation.”
• […]
• builds on the ability to "understand the labels"—represent the labels in the
same semantic space as that of the documents to be classified.
• Challenge: how to “detect” new classes?
• Based on either additional background knowledge about features:
• structured attribute combinations (e.g. “a Zebra is a Horse with stripes”)
• or (less common) in combination with unsupervised learning:
• E.g. [Link]
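The attribute-based variant can be sketched in a few lines. This is a toy illustration only: the attribute names, the class signatures and the "detected" attributes are all made up, and a real system would obtain the attribute predictions from detectors trained on the seen classes.

```python
# Toy attribute-based zero-shot classification (illustrative assumptions only).
ATTRIBUTES = ["four_legs", "mane", "stripes", "trunk"]

# Background knowledge: every class is described by an attribute signature,
# including "zebra", for which we assume no training images exist.
CLASS_SIGNATURES = {
    "horse":    [1, 1, 0, 0],
    "zebra":    [1, 1, 1, 0],   # "a Zebra is a Horse with stripes"
    "elephant": [1, 0, 0, 1],
}

def classify(predicted_attributes):
    """Return the class whose signature has the smallest Hamming distance."""
    def distance(signature):
        return sum(a != b for a, b in zip(signature, predicted_attributes))
    return min(CLASS_SIGNATURES, key=lambda c: distance(CLASS_SIGNATURES[c]))

# Assume the attribute detectors fired for four_legs, mane and stripes:
print(classify([1, 1, 1, 0]))
```

The unseen class is reachable only because its label is described in the same attribute space as the inputs.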
“Third-Wave AI” or “Hybrid AI”
Setting priorities:
AAAI & Catalyst report 2019

[Link]
Typical applications nowadays need/use both:
Gameplaying:
Symbolic (1st wave): Subsymbolic (2nd wave):

Typical applications need/use both:
Autonomous driving:
Symbolic: Subsymbolic:
Road network + Routing algorithms:

Traffic Rules:
Typical applications need/use both:
Conversational agents/Chatbots:
Symbolic: Response Policy Rules
Subsymbolic: Intent extraction, Speech Recognition

[Link]

[Link]
[Link]

Recall: Good morning!
connecting to my previous lecture ;-)

[Link]
Language models also open up a whole lot of new possibilities for hybrid combinations!
Not limited to natural language, but also usable for
• Programming
• Database queries

For instance:

[Link]

[Link]
Four pillars of today’s AI systems
(to be taken with a grain of salt):
(Old but still true: [Link]
machine-learning-is-just-a-fancy-plugin/ )

• Learning: First, a system has to be taught. This can be done by single experts or a community is used where people teach the machine
bits of knowledge. This is what the machine uses to be able to learn on its own. You might think this way it doesn’t learn on its own,
but it does. Consider how a child learns. It learns by being taught by his parents, teacher, other children or anyone else teaching things
and it just copies and pastes everything with its “sensors” like ears and eyes. Thus, the AI learns best practices and reasoning from
experts. Knowledge is taught in atomic pieces of information that represent individual steps of a process.
• Semantic Graph: The taught knowledge has to be stored, which is done within a data store. The store is used to supply information for
the understanding of the world doing semantic reasoning. Like: I know that my mom is connected to dad. And I am connected to my
sister. And my sister is connected to her work colleagues. And she works in this city in that building. This is a semantic map of the
world that we know. That is part of our memory – a semantic graph. By creating a semantic data map, the AI understands the world in
which it operates.
• Process Engine: The engine is the central back-end service that puts everything together and thus delivers a solution to a certain
problem. The engine knows the map of the world where a system is acting in. In doing so, the engine takes everything it knows and
finds the correct solution to a specific problem on its own, step by step based on the knowledge it has.
• Problem Solving: Problem solving also known as machine reasoning (MR) is the ability to dynamically react to change and by doing
this, reusing existing knowledge for new and unknown problems. With machine reasoning, problems are solved in ambiguous and
changing environments. The AI dynamically reacts to the ever-changing context, selecting the best course of action. Thus, machine
reasoning is the basis for a general artificial intelligence (General AI).

Personal Opinion: I strongly consider the last statement exaggerated for now!
Four pillars of today’s AI systems
(to be taken with a grain of salt):
[Link]

• Learning
• Semantic Graph
• Process Engine
• Problem Solving

(Full descriptions as on the previous slide.)
We have not talked about this yet!
Let’s get back to those two guys!
What is a Knowledge Graph?

… good question!

Says more about what a KG does than what it is…
“interesting things and [understanding their] relationships”
Bing: Knowledge Graph

● over a billion entities

● 21 billion associated facts
● 18 billion links to key actions
● 5 billion relationships between entities

Bing Knowledge Graph Facts

More Knowledge Graphs
Other companies with knowledge graphs:
● Facebook Entity Graph
● LinkedIn
● How NASA Finds Critical Data through a Knowledge Graph [Neo4j Blog]
● Amazon Product graph
● Yandex, Baidu
Some free open knowledge graphs:
● DBpedia
● Wikidata
● Linked Open Data project
Knowledge Graphs are Big Data:

[Link]
What is a Knowledge Graph?

By properties:
McCusker, Chastain, Erickson, and McGuinness. What is a Knowledge Graph? (unpublished, 2016).
identify some principles: p.2+3
“principled” aggregation of Linked Data? p.7

Rospocher, van Erp, Vossen, Fokkens, Aldabe, Rigau, Soroa, Ploeger, Bogaard. Building event-centric knowledge graphs from news. JWS (2016)

“knowledge-base of facts about entities typically [Remark: often automatically] obtained from structured repositories [such as Freebase]”
Applications of Knowledge graphs in AI:

Why are Knowledge Graphs important for hybrid (third wave) AI?
Improve AI (search/perceiving) by “knowing how things are connected”.
Enabling AI (reasoning) by “knowing how things are connected”.
Explain AI (learning/abstractions) by “knowing how things are connected”.

Goal: Knowledge Graphs could be the “glue” to connect
AI(?) Applications of Knowledge Graphs:
Google – Search:
- Rich Snippets
- Personalised recommendations across services:

AI(?) Applications of Knowledge Graphs:
IBM Watson - Reasoning:
• Used DBpedia as one of its fact bases! (predates the #LLM hype!)

[Link]

AI(?) Technologies behind Knowledge Graphs:

• (Database-like) query languages…

• Cf. DMA lecture, we already learned SPARQL there as one such language.

• … in general, KGs are one of the “drivers” of the recent “revival” of Graph Databases!
KG enrichment by Rule-based Logical Reasoning:
RDFS (RDF Schema) and OWL: Two More standards
extending RDF to encode
• schema information
• taxonomic relationships
• and implicit triples

Basic logical Reasoning techniques:

• Materialisation (can be done by rules/queries) [2]
• Rewriting [1]
1. Stefan Bischof, Markus Krötzsch, Axel Polleres, and Sebastian Rudolph. Schema-agnostic query rewriting in SPARQL 1.1. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014), Lecture Notes in Computer Science (LNCS). Springer, October 2014.
2. Axel Polleres, Aidan Hogan, Renaud Delbru, and Jürgen Umbrich. RDFS & OWL reasoning for linked data. In Reasoning Web 2013, volume 8067 of LNCS, pages 91--149. Springer, Mannheim, Germany, July 2013.
Reasoning by Querying – Materialisation:
SELECT ?X WHERE
{
  ?X a dbo:Scientist ; dbo:birthPlace dbr:Bologna .
}
No answer :-( … since implicit edges are missing!

instance data:
dbr:Marta_Grandi a dbo:Entomologist ;
    dbo:birthPlace dbr:Bologna .
dbr:Costanzo_Varolio a dbo:Medician ;
    dbo:birthPlace dbr:Bologna .

Ontology (schema data):
dbo:Entomologist rdfs:subClassOf dbo:Scientist.
dbo:Medician rdfs:subClassOf dbo:Scientist.
dbo:Scientist rdfs:subClassOf dbo:Person.
dbo:Person rdfs:subClassOf dbo:Agent.
dbo:Organisation rdfs:subClassOf dbo:Agent.
dbo:birthPlace rdfs:domain dbo:Person .
dbo:Organisation owl:disjointWith dbo:Place.
...

[Figure: class hierarchy with Entomologist ⊑ Scientist, Medician ⊑ Scientist, Scientist ⊑ Person, Person ⊑ Agent, Organisation ⊑ Agent; Marta_Grandi has birthplace Bologna]
RDFS deduction rules:
cf. [Link]

Could be read as Datalog deduction rules, e.g.:

triple(U,rdfs:subClassOf,S) :- triple(U,rdfs:subClassOf,V) , triple(V,rdfs:subClassOf,S) .

triple(V,rdf:type,S) :- triple(U,rdfs:subClassOf,S) , triple(V,rdf:type,U) .
RDFS deduction rules:
cf. [Link]

… and Datalog deduction rules could be written as SPARQL Construct statements:

CONSTRUCT {?U rdfs:subClassOf ?S} WHERE { ?U rdfs:subClassOf ?V. ?V rdfs:subClassOf ?S }

CONSTRUCT {?V rdf:type ?S} WHERE { ?U rdfs:subClassOf ?S. ?V rdf:type ?U }
Reasoning by Querying – Materialisation:
SELECT ?X WHERE
{
  ?X a dbo:Scientist .
  ?X dbo:birthPlace dbr:Bologna .
}

Applying the rules of the previous slides exhaustively (until a fixpoint) will yield additional implicit KG edges (i.e., RDF triples):

instance data:
dbr:Marta_Grandi a dbo:Entomologist ;
    dbo:birthPlace dbr:Bologna .
dbr:Costanzo_Varolio a dbo:Medician ;
    dbo:birthPlace dbr:Bologna .

implicit instance triples:
dbr:Marta_Grandi a dbo:Scientist, dbo:Person, dbo:Agent.
dbr:Costanzo_Varolio a dbo:Scientist, dbo:Person, dbo:Agent.

Ontology (schema data):
dbo:Entomologist rdfs:subClassOf dbo:Scientist.
dbo:Medician rdfs:subClassOf dbo:Scientist.
dbo:Scientist rdfs:subClassOf dbo:Person.
dbo:Person rdfs:subClassOf dbo:Agent.
dbo:Organisation rdfs:subClassOf dbo:Agent.
dbo:birthPlace rdfs:domain dbo:Person .
dbo:Organisation owl:disjointWith dbo:Place.
...

implicit schema triples:
dbo:Entomologist rdfs:subClassOf dbo:Person, dbo:Agent.
dbo:Medician rdfs:subClassOf dbo:Person, dbo:Agent.
dbo:Scientist rdfs:subClassOf dbo:Agent.
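Materialisation to a fixpoint can be sketched as a naive loop over the two RDFS rules shown earlier (subclass transitivity and type inheritance). This is a minimal illustration over plain Python tuples, not how a triple store implements it; real engines use far more efficient (semi-naive) evaluation.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

triples = {
    ("dbr:Marta_Grandi", TYPE, "dbo:Entomologist"),
    ("dbr:Marta_Grandi", "dbo:birthPlace", "dbr:Bologna"),
    ("dbr:Costanzo_Varolio", TYPE, "dbo:Medician"),
    ("dbr:Costanzo_Varolio", "dbo:birthPlace", "dbr:Bologna"),
    ("dbo:Entomologist", SUBCLASS, "dbo:Scientist"),
    ("dbo:Medician", SUBCLASS, "dbo:Scientist"),
    ("dbo:Scientist", SUBCLASS, "dbo:Person"),
    ("dbo:Person", SUBCLASS, "dbo:Agent"),
}

def materialise(graph):
    """Apply both rules repeatedly until no new triple is derived."""
    graph = set(graph)
    while True:
        new = set()
        for (u, p1, v) in graph:
            if p1 != SUBCLASS:
                continue
            for (x, p2, y) in graph:
                # transitivity: u subClassOf v, v subClassOf y => u subClassOf y
                if p2 == SUBCLASS and x == v:
                    new.add((u, SUBCLASS, y))
                # inheritance: x type u, u subClassOf v => x type v
                if p2 == TYPE and y == u:
                    new.add((x, TYPE, v))
        if new <= graph:          # fixpoint reached
            return graph
        graph |= new

def scientists_born_in(graph, place):
    """Answer the example query by direct lookup, without any reasoning."""
    return sorted(s for (s, p, o) in graph
                  if p == TYPE and o == "dbo:Scientist"
                  and (s, "dbo:birthPlace", place) in graph)

closed = materialise(triples)
print(scientists_born_in(triples, "dbr:Bologna"))  # explicit triples only
print(scientists_born_in(closed, "dbr:Bologna"))   # after materialisation
```

On the explicit triples the query finds nothing; after materialisation both persons are returned.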
Reasoning by Querying – Query Rewriting:
SELECT ?X WHERE
{
{ {?X a dbo:Scientist } UNION {?X a dbo:Medician } UNION {?X a dbo:Entomologist } }
?X dbo:birthPlace dbr:Bologna .
}

Ontology (schema data):

dbo:Entomologist rdfs:subClassOf dbo:Scientist.
dbo:Medician rdfs:subClassOf dbo:Scientist.
dbo:Scientist rdfs:subClassOf dbo:Person.
dbo:Person rdfs:subClassOf dbo:Agent.
dbo:Organisation rdfs:subClassOf dbo:Agent.
dbo:birthPlace rdfs:domain dbo:Person .
dbo:Organisation owl:disjointWith dbo:Place.
...
Reasoning by Querying – Query Rewriting:
SELECT ?X WHERE
{
  ?X a/rdfs:subClassOf* dbo:Scientist .
  ?X dbo:birthPlace dbr:Bologna .
}

Alternatively, the rules can be used “backwards” to rewrite the original query to yield a more generic query!
You can also use SPARQL path expressions in this query rewriting! [1]

instance data:
dbr:Marta_Grandi a dbo:Entomologist ;
    dbo:birthPlace dbr:Bologna .
dbr:Costanzo_Varolio a dbo:Medician ;
    dbo:birthPlace dbr:Bologna .

Ontology (schema data):
dbo:Entomologist rdfs:subClassOf dbo:Scientist.
dbo:Medician rdfs:subClassOf dbo:Scientist.
dbo:Scientist rdfs:subClassOf dbo:Person.
dbo:Person rdfs:subClassOf dbo:Agent.
dbo:Organisation rdfs:subClassOf dbo:Agent.
dbo:birthPlace rdfs:domain dbo:Person .
dbo:Organisation owl:disjointWith dbo:Place.
...

1. Stefan Bischof, Markus Krötzsch, Axel Polleres, and Sebastian Rudolph. Schema-agnostic query rewriting in SPARQL 1.1. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014), Lecture Notes in Computer Science (LNCS). Springer, October 2014.
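The UNION-style rewriting can be generated mechanically from the schema. The sketch below is an illustration under simplifying assumptions: it only handles a single `?X a <class>` pattern and only the subclass rules, and the helper names are made up for this example.

```python
SCHEMA = [  # (subclass, superclass) pairs from the example ontology
    ("dbo:Entomologist", "dbo:Scientist"),
    ("dbo:Medician", "dbo:Scientist"),
    ("dbo:Scientist", "dbo:Person"),
    ("dbo:Person", "dbo:Agent"),
    ("dbo:Organisation", "dbo:Agent"),
]

def subclasses_of(cls, schema):
    """All classes C with C rdfs:subClassOf* cls, including cls itself."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in schema:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found

def rewrite_type_pattern(var, cls, schema):
    """Rewrite '?var a cls' into a UNION over cls and all its subclasses."""
    branches = ["{ %s a %s }" % (var, c)
                for c in sorted(subclasses_of(cls, schema))]
    return "{ " + " UNION ".join(branches) + " }"

query = "SELECT ?X WHERE {\n  %s\n  ?X dbo:birthPlace dbr:Bologna .\n}" % (
    rewrite_type_pattern("?X", "dbo:Scientist", SCHEMA))
print(query)
```

The rewritten query can then be run against the unmodified data, which is the trade-off against materialisation: nothing extra is stored, but queries grow with the schema.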
Existing KGs aren’t (logically) consistent :-( [1]
• E.g. DBpedia Ontology:

dbo:Agent owl:disjointWith dbo:Place.

You can think of this as a constraint:

:- triple(X, "rdf:type", "dbo:Agent"),
   triple(X, "rdf:type", "dbo:Place").

dbo:Country rdfs:subClassOf dbo:Place.

dbo:Organisation rdfs:subClassOf dbo:Agent.

1. Stefan Bischof, Markus Krötzsch, Axel Polleres, and Sebastian Rudolph. Schema-agnostic query rewriting in SPARQL 1.1. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014), Lecture Notes in Computer Science (LNCS). Springer, October 2014.
Reasoning by Querying – Consistency checking:
ASK   (or, to list the violations: SELECT ?X ?C1 ?C2)
WHERE { ?X a/rdfs:subClassOf* ?C1 ;
           a/rdfs:subClassOf* ?C2 .
        ?C1 owl:disjointWith ?C2 . }

Similarly, RDFS and OWL inconsistency checking can also be done by querying! [1] (simplified)

instance data:
dbr:European_Union a dbo:Country.
dbr:European_Union a dbo:Organisation.

Ontology (schema data):
dbo:Entomologist rdfs:subClassOf dbo:Scientist.
dbo:Medician rdfs:subClassOf dbo:Scientist.
dbo:Scientist rdfs:subClassOf dbo:Person.
dbo:Person rdfs:subClassOf dbo:Agent.
dbo:Organisation rdfs:subClassOf dbo:Agent.
dbo:birthPlace rdfs:domain dbo:Person .
dbo:Country rdfs:subClassOf dbo:Place.
dbo:Organisation owl:disjointWith dbo:Place.
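The same check can be sketched procedurally: collect all (inferred) classes of each instance and test them against the disjointness axioms. The dictionaries below restate the example data; the function names are hypothetical.

```python
SUBCLASS_OF = {
    "dbo:Country": {"dbo:Place"},
    "dbo:Organisation": {"dbo:Agent"},
    "dbo:Person": {"dbo:Agent"},
}
DISJOINT = {("dbo:Organisation", "dbo:Place")}
TYPES = {
    "dbr:European_Union": {"dbo:Country", "dbo:Organisation"},
    "dbr:Marta_Grandi": {"dbo:Person"},
}

def superclasses(cls):
    """cls together with everything reachable via rdfs:subClassOf*."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(SUBCLASS_OF.get(c, ()))
    return seen

def violations():
    """(instance, C1, C2) for instances that fall into two disjoint classes."""
    result = []
    for instance, asserted in sorted(TYPES.items()):
        inferred = set().union(*(superclasses(c) for c in asserted))
        for c1, c2 in sorted(DISJOINT):
            if c1 in inferred and c2 in inferred:
                result.append((instance, c1, c2))
    return result

print(violations())
```

For the example data this reports dbr:European_Union, which is typed both as an Organisation and (via Country) as a Place.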
Often, you also need to deal with contextualized information
• E.g. from [figure: "Automatic Extractors"]

“Cities in Italy with more than 1M population”:

Structured queries (SPARQL):

[Link]
[Link] [Link]
PREFIX : <[Link]
PREFIX dbo: <[Link]
PREFIX yago: <[Link]

SELECT DISTINCT ?city ?pop WHERE {
  ?city a yago:City108524735 .
  ?city dbo:country :Italy.
  ?city dbo:populationTotal ?pop
  FILTER ( ?pop > 1000000 )
}

Doesn’t work (since the data is contextualized and doesn't match the pattern)
Contextualised Information is better modeled
in another Open Knowledge Graph: Wikidata
• Wikidata can also be queried as RDF with SPARQL!

Wikidata as RDF … can be queried by SPARQL
• “Simple” surface query:

SELECT DISTINCT ?city WHERE {

?city wdt:P31/wdt:P279* wd:Q515.
?city wdt:P1082 ?population .
?city wdt:P17 wd:Q38 .
FILTER (?population > 1000000) }

• What’s this?

Wikidata as RDF … can be queried by SPARQL
• However, Wikidata has more complex info (temporal context, provenance, …)
• Rome:
• [Link]
… Can I query since when which Italian cities have more than 1M population with SPARQL? Yes!

What do we learn?
• Data and meta-data (context/provenance) at the same level → one RDF graph, mixing reification and plain data, cf. [Hernandez et al. 2015]
• Quite some knowledge about the ontology required!
Reification/Property Graphs:
• How to (best) describe contextual statements (aka meta-modeling) about triples in RDF is a bit open…

• One could use different Graph data models/Graph databases:

• Labeled directed graphs (plain RDF), supported by RDF triple stores:

  Entity1 --predicate--> Entity1

• Property graphs, supported by other graph databases (e.g. Neo4J, BlazeGraph, etc.):

  node: Name: Rome, Type: City, added: 2019-06-11, author: @Sebastian
  edge: type: capitalOf, added: 2019-06-12, author: @Axel
  node: Name: Italy, Type: Country, added: 2019-06-11, author: @Sebastian
Reification/Property Graphs:
• How to (best) describe contextual statements (aka meta-modeling) about triples in RDF is a bit open
• no real prevalent standard… the way Wikidata models it (see above) is just one option.
• various other options and proposals, inter-translatable but affect performance of querying:

• Example: "Rome is Italy's capital since 1861."

:Rome :capitalOf :Italy. [1861, [

RDF reification: [ a rdf:Statement;
                   rdf:subject :Rome;
                   rdf:predicate :capitalOf;
                   rdf:object :Italy ] :yearBegins 1861 .

“Named Graphs”: :G1 {:Rome :capitalOf :Italy. }
                :G1 :yearBegins 1861 .

“Singleton” properties: :Rome :p1 :Italy.
                        :p1 :subPropertyOf :capitalOf;
                            :yearBegins 1861 .

Challenge: How to reformulate the inference rules or rewritings from slide 58?
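To see why the choice of encoding matters for querying, here is a small sketch that produces two of the encodings as plain tuples and queries one of them. The function names and the `:yearBegins` lookup are illustrative assumptions; the point is that a query written for one encoding finds nothing in the other.

```python
def reify(statement_id, s, p, o, context):
    """Standard RDF reification: a statement node carries the context."""
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]
    return triples + [(statement_id, k, v) for k, v in context.items()]

def singleton_property(property_id, s, p, o, context):
    """Singleton property: a fresh sub-property used for exactly one edge."""
    triples = [(s, property_id, o), (property_id, ":subPropertyOf", p)]
    return triples + [(property_id, k, v) for k, v in context.items()]

def capitals_since(triples, year):
    """Query written for the singleton-property encoding only."""
    props = {s for (s, p, o) in triples
             if p == ":subPropertyOf" and o == ":capitalOf"}
    started = {s for (s, p, o) in triples
               if p == ":yearBegins" and o <= year}
    return [(s, o) for (s, p, o) in triples if p in props and p in started]

context = {":yearBegins": 1861}
reified = reify("_:s1", ":Rome", ":capitalOf", ":Italy", context)
singleton = singleton_property(":p1", ":Rome", ":capitalOf", ":Italy", context)

print(capitals_since(singleton, 1900))
print(capitals_since(reified, 1900))
```

The second call returns an empty list although the same fact is stored, which is exactly the reformulation challenge raised on the slide.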
Technology behind Knowledge Graphs:
• Storing & Querying Knowledge Graphs:
• Enriching/Verifying Knowledge Graphs via AI:
• Rule-based Reasoning
• Rule-Learning
• Link prediction via Graph Embeddings
• Advanced AI applications that leverage KGs

Rule-Learning/Mining over Knowledge Graphs
• AMIE (Association Rule Mining under Incomplete Evidence)
• extracts supported and confident logical rules from a Knowledge Graph, such as:

?a rdf:type ?b :- ?e <[Link] ?a , ?e rdf:type ?b .

• samples "counterexamples" according to a partial completeness assumption (PCA):

• If we have the triple (s p o) in the KG K and func(p) ≥ func(p-), then AMIE assumes that all (s p o') for o' != o do not hold in the real world
• This can be used to sample counterexamples, e.g.
• positive example:
<[Link] <[Link] "Axel".
• generated negative example:
<[Link] <[Link] "Hans".
• Both positive examples and sampled counter-examples are used for the training and rule mining.

[Link]
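The effect of the partial completeness assumption on a rule's confidence can be illustrated on a toy graph. The facts and predicate names below are invented, and the two measures follow the usual definitions (standard confidence vs. PCA confidence) in simplified form for a rule with a single body atom, head(x, y) ⇐ body(x, y).

```python
KG = {
    ("anna", "worksIn", "vienna"), ("bob", "worksIn", "vienna"),
    ("carl", "worksIn", "graz"),   ("dora", "worksIn", "linz"),
    ("anna", "livesIn", "vienna"), ("bob", "livesIn", "graz"),
}

def pairs(kg, predicate):
    """All (subject, object) pairs connected by the predicate."""
    return {(s, o) for (s, p, o) in kg if p == predicate}

def confidences(kg, body, head):
    """Return (support, standard confidence, PCA confidence)."""
    body_pairs = pairs(kg, body)
    head_pairs = pairs(kg, head)
    support = len(body_pairs & head_pairs)
    # Standard confidence: every body match without the head fact counts
    # as a counterexample (closed-world view).
    standard = support / len(body_pairs)
    # PCA: a body match (x, y) is a counterexample only if some head fact
    # for x is known at all; unknown subjects are simply ignored.
    subjects_with_head = {s for (s, _) in head_pairs}
    pca_pairs = [(s, o) for (s, o) in body_pairs if s in subjects_with_head]
    pca = support / len(pca_pairs)
    return support, standard, pca

print(confidences(KG, "worksIn", "livesIn"))
```

Here bob counts as a counterexample (a different residence is known), while carl and dora do not, so the PCA confidence is higher than the standard one.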
Rule-Learning/Mining over Knowledge Graphs
• AMIE (Association Rule Mining under Incomplete Evidence)
• AMIE sample session:

?e <sameAs> ?a ?e <sameAs> ?b => ?a <sameAs> ?b

?e <knows> ?a ?e rdf:type ?b => ?a rdf:type ?b
?f <sameAs> ?b ?a <sameAs> ?f => ?a <sameAs> ?b
?a <sameAs> ?f ?f <knows> ?b => ?a <knows> ?b

[Link]
Code: [Link]
Technology behind Knowledge Graphs:
• Storing & Querying Knowledge Graphs:
• Enriching/Verifying Knowledge Graphs via AI:
• Rule-based Reasoning
• Rule-Learning
• Link prediction via Graph Embeddings
• Advanced AI applications that leverage KGs

Embeddings, recall word embeddings:
• encoding words to vectors:
• frequency based (e.g. TF/IDF)
• context-based
• co-occurrence based
• prediction-based
• Encode/train NN to predict context-word(s) from word(s) n-grams
(e.g. training with sentences with missing words, or predicting next
words, essentially what LLMs do!)

• Applications:
• Document Classification
• Search (ranking)
• Machine translation
• Conversational AI
• Question answering
Similar idea: Graph embeddings
• General idea:
• embed nodes and relations of a KG in a vector space…
• learn embeddings that should fulfill certain properties using neural networks, e.g. the TransE [*] embedding:
  [Figure: example where married_to is the same translation from Maria Theresia to Franz Stephan von Lothringen and from Elizabeth II to Prince Philip, Duke of Edinburgh]
• Other embeddings proposed since: TransH (2014), TransR (2015), RDF2Vec (2016)

• … where this vector embedding can be used again for other AI (particularly ML) tasks.
• e.g. Link prediction, Node Classification (i.e. enriching a KG by missing links)
• item recommendation cf. [Link]
of-knowledge-graph-embeddings-for-item-recommendation

[*] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, Oksana Yakhnenko:
Translating Embeddings for Modeling Multi-relational Data. NIPS 2013: 2787-2795
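The TransE idea (head + relation ≈ tail) and its use for link prediction can be shown with hand-set two-dimensional vectors. Nothing is learned here: the coordinates are made up so that the example relation is roughly a shared translation, whereas a real model would train these vectors from the graph.

```python
import math

ENTITY = {  # hand-set toy embeddings, not trained
    "Maria_Theresia": (0.0, 0.0),
    "Franz_Stephan":  (1.0, 0.5),
    "Elizabeth_II":   (3.0, 2.0),
    "Prince_Philip":  (4.1, 2.4),
}
RELATION = {"married_to": (1.0, 0.5)}

def score(head, relation, tail):
    """TransE distance ||h + r - t||; smaller means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_tail(head, relation):
    """Link prediction: rank all other entities by their distance."""
    candidates = [e for e in ENTITY if e != head]
    return min(candidates, key=lambda e: score(head, relation, e))

print(predict_tail("Elizabeth_II", "married_to"))
print(predict_tail("Maria_Theresia", "married_to"))
```

A missing married_to edge is thus predicted by picking the entity closest to head + relation.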
Technology behind Knowledge Graphs:
• Storing & Querying Knowledge Graphs:
• Enriching/Verifying Knowledge Graphs via AI
• Advanced AI applications that leverage KGs
• Question Answering
• Semantic Search

Some of our own research in this space:
● HDT - Lightweight/portable compressed graph/triple store.
● Extending SPARQL with “real” path queries (prototype)
● Connecting Knowledge Graphs to text and conversations
● Understanding and Searching Open Data by linking it to a Knowledge Graph

Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov, and Axel Polleres. Measuring semantic coherence of a conversation. In Proceedings of the 17th International Semantic Web Conference (ISWC 2018), volume 11136 of Lecture Notes in Computer Science (LNCS), pages 634--651, Monterey, CA, October 2018. Springer.

Svitlana Vakulenko, Javier Fernández, Axel Polleres, Maarten de Rijke, and Michael Cochez. Message passing for complex question answering over knowledge graphs. In 28th ACM International Conference on Information and Knowledge Management (CIKM2019), Beijing, China, November 2019.

Sebastian Neumaier and Axel Polleres. Enabling spatio-temporal search in open data. Journal of Web Semantics (JWS), pre-print available at [Link] 2019. To appear (accepted for publication).

Javier D. Fernández, Miguel A. Martinez-Prieto, Claudio Gutiérrez, Axel Polleres, and Mario Arias. Binary RDF Representation for Publication and Exchange (HDT). Journal of Web Semantics (JWS), 19(2), 2013.

Vadim Savenkov, Qaiser Mehmood, Jürgen Umbrich, and Axel Polleres. Counting to k, or how SPARQL 1.1 could be efficiently enhanced with top k shortest path queries. In 13th International Conference on Semantic Systems (SEMANTiCS), pages 97--103, Amsterdam, the Netherlands, September 2017. ACM.
Tackling hard NLP problems with Knowledge Graphs:

Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov, and Axel Polleres. Measuring semantic coherence of a conversation. In Proceedings of the 17th International Semantic Web Conference (ISWC 2018), volume 11136 of Lecture Notes in Computer Science (LNCS), pages 634--651, Monterey, CA, October 2018. Springer.

Idea: use graph embeddings into a vector space and/or shortest-path connections in a graph.
Tackling hard NLP problems with Knowledge Graphs:

Idea: use unsupervised message passing to propagate confidence scores obtained by parsing an input question and matching terms in the knowledge graph to a set of possible answers.
S. Vakulenko, J. Fernández, A. Polleres, M. de Rijke, and M. Cochez. Message passing for complex question answering over knowledge graphs. CIKM2019.
Admittedly, that one is easy for LLMs:
[Link]
Ongoing PhDs and Master theses:

PhD Nicolas Ferranti: Knowledge Graph Consolidation in Wikidata

cf. first results: [Link]

validating-wikidatas-property-constraints-using-shacl-and-sparql-0

MSc Gerhard Klager

Cf. first results published in a workshop
[Link]

Further MSc topics, cf. our list in Canvas ;-)

Time for our hands-on/hackathon exercise!

Challenge: Question Answering … how to combine LLMs and KGs?

• direct question answering vs. leveraging a KG and SPARQL

• Possible substeps:
1. Question analysis? → a SPARQL pattern template
2. Named entity recognition → a set of “entities” and “relationships” mentioned in the question
3. Matching Entity labels → finding the identifiers matching the entities from step 2 in the KG
4. Query construction and execution → executable SPARQL Query
5. Answer formulation → Translating the answer back to Natural Language
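As a starting point for the exercise, the substeps can be wired together without any LLM. The sketch below is deliberately naive: a dictionary lookup stands in for entity recognition and label matching, a single hard-coded template stands in for question analysis, and nothing is sent to an endpoint. The Wikidata identifiers are the ones used in the Wikidata query earlier in the lecture; all function names are hypothetical.

```python
import re

# Step 3 stand-in: surface label -> KG identifier.
LABEL_TO_ID = {"italy": "wd:Q38", "city": "wd:Q515", "cities": "wd:Q515"}

# Step 1 stand-in: one fixed SPARQL pattern template.
TEMPLATE = (
    "SELECT DISTINCT ?item WHERE {{\n"
    "  ?item wdt:P31/wdt:P279* {cls} .\n"
    "  ?item wdt:P17 {country} .\n"
    "  ?item wdt:P1082 ?population .\n"
    "  FILTER (?population > {threshold})\n"
    "}}"
)

def extract_mentions(question):
    """Step 2 stand-in: known labels plus the first number in the question."""
    words = re.findall(r"[a-z]+", question.lower())
    labels = [w for w in words if w in LABEL_TO_ID]
    number = re.search(r"\d+", question)
    return labels, int(number.group()) if number else None

def build_query(question):
    """Steps 3 and 4: map labels to identifiers and fill the template."""
    labels, threshold = extract_mentions(question)
    ids = [LABEL_TO_ID[label] for label in labels]
    cls = next(i for i in ids if i == "wd:Q515")
    country = next(i for i in ids if i != "wd:Q515")
    return TEMPLATE.format(cls=cls, country=country, threshold=threshold)

def formulate_answer(results):
    """Step 5 stand-in: turn result labels into a sentence."""
    return "Matching items: " + ", ".join(results) + "."

print(build_query("Which cities in Italy have more than 1000000 inhabitants?"))
```

Each stand-in is a natural place to plug in an LLM call, which makes it easy to compare where the language model helps and where the KG lookup is more reliable.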
Disclaimer:
1) you don’t need an OpenAI API key… Alternatives:
• use an Open Source Language model, e.g. via Colab:
• [Link]
• Simply use the non-paid GPT interface

2) Be sure to NOT feed any sensitive (personal) or copyrighted data into OpenAI!
Recall:

Not limited to natural language,

But also usable for
• Programming
• Database queries

[Link]

• Interestingly, that didn’t work so well when I retried this morning:

[Link]
c978-4b1f-acd0-44a7bbaf3c51
Appendix: More of our own (ongoing and older) research using Knowledge Graphs
City Data Pipeline:
Enriching and completing Open City Data with a combination of Equational Knowledge and Machine Learning

PhD Stefan Bischof:

Thesis:
[Link]
Indicators

Problem: different sources contain different indicators (area, population, population density, economic factors) for different years.

Idea: using both first-wave and second-wave AI (ML & statistics) methods
Step 1: Storing Data from different sources in
RDF:
• Data from some sources like eurostat come as multidimensional data -
Data Cube vocabulary (QB):
• Temporal (December)
• Unit of measurement (degrees Celsius)
• Aggregation (mean, min, max, …)
• Indicator (temperature, population density)

[Figure: one observation with city: Vienna, year: 2016, indicator: population, value: 1 852 997, source: eurostat, error: 0.0]
Use equational knowledge for missing
computed/computable values:
[Figure: two observations for city Vienna, year 2016: indicator population (value 1 852 997, error 0.0) and indicator area (value 414.650, error 0.0), combined via the equation

populationdensity = population / area

into a derived observation: indicator population density, value 4 467, error 0.0, derived from population and area]
equations: deal with approximated values by using error propagation

[Figure: as before, but the area observation (value 414.650) now has source: KNN prediction and error ϵ, while the population observation (value 1 852 997) has error 0.0. Applying the equation

populationdensity = population / area

yields a derived observation: indicator population density, value 4 467, derived from population and area, with error propagate(0.0, ϵ, eq)]
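What such a propagation step might compute for the population density equation can be sketched with textbook first-order error propagation for a quotient. This is a stand-in, not the propagate function of the City Data Pipeline, whose definition is not given on the slide; it assumes independent absolute errors, and the area error of 5.0 is a made-up number.

```python
import math

def divide_with_error(a, err_a, b, err_b):
    """Quotient a / b with first-order error propagation.

    Assumes independent absolute errors err_a and err_b, so the relative
    errors add in quadrature.
    """
    value = a / b
    relative = math.sqrt((err_a / a) ** 2 + (err_b / b) ** 2)
    return value, abs(value) * relative

population, population_error = 1852997, 0.0   # exact value from the source
area, area_error = 414.650, 5.0               # predicted value, invented error

density, density_error = divide_with_error(
    population, population_error, area, area_error)
print(round(density, 1), round(density_error, 1))
```

With an exact population, the relative error of the derived density equals the relative error of the predicted area, which is why correct error estimates for predictions matter downstream.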
More Details:
Stefan Bischof, Andreas Harth, Benedikt Kämpgen, Axel Polleres, and [Link] integrated statistical open city data by
combining equational knowledge and missing value imputation. Journal of Web Semantics (JWS), October [Link] press.

Main idea:
Combine (1) ontological reasoning, (2) ML, (3) equations “iteratively” with QB equations
Evaluation Combination
PCA Regression + QB Equations
• Statistics one iteration (PCA regression + QB Equations)
• 991k observations from crawled data
• 522k new or better observations from PCA regression
• 230k better observations from QB Equations
• 232k new observations from QB Equations
• Same or better values (improved RMSE) for 80 of 82 indicators
• QB Equations are sensitive to correct error estimates

• More details: [Link]

City Data Pipeline
Prototype
[Link]
• Search for indicators & cities
• obtain results incl. sources
• Integrated data served as Linked Open Data
• Predicted values AND estimated error rates for
missing data...

...it‘s not finished, but:

assumption: Predictions get better, the more Open
data we integrate...
City Data Pipeline:
Enriching and completing Open City Data with a combination of using Equational Knowledge and Machine Learning

PhD Sebastian Neumaier:

Thesis:
[Link]
Tackling hard research problems with Knowledge Graphs:

Why is Search in Open Data a problem?

[Link]

Structured Data in Web Search by Alon Halevy

vs.

Open Data Search is hard...

a) No natural language “cues” like in Web tables...
b) Existing knowledge graphs don’t cover the domain of “Open Data” well
c) Open Data is not properly geo-referenced

Our research:
Knowledge Graphs for Natural Data Search &
Integration!

• 2 approaches how knowledge graphs could help to solve the Open Data search problem:
1. Hierarchical labelling of numeric data
2. Hierarchical labelling of spatio-temporal entities

Example Table

federal state   district   year   sex    population
Upper Austria   Linz       2013   male   98157
Upper Austria   Steyr      2013   male   18763
Upper Austria   Wels       2013   male   29730
…               …          …      …      …

Open Data CSVs look more like this

NUTS2   LAU2_NAME   YEAR   SEX   P_TOTAL
AT31    Linz        2013   1     98157
AT31    Steyr       2013   1     18763
AT31    Wels        2013   1     29730
…       …           …      …

Source: [Link]
Why not use the numeric values?
• Identifying the most likely semantic label for a bag of
numerical values
• Deliberately ignore surroundings

NUTS2   LAU2_NAME   YEAR   SEX   P_TOTAL
AT31    Linz        2013   1     98157
AT31    Steyr       2013   1     18763
AT31    Wels        2013   1     29730
…       …           …      …

Why not use numeric values?
• Identifying the most likely semantic label for a bag of
numerical values
• Deliberately ignore surroundings

population (a district) (country Austria)

98157

18763

29730

Background Knowledge Graph
What’s in there?
• Cities:
  • Population
  • Area
  • Country
  • Location (Coordinates)
  • Economic indicators
  • …
• Organisations:
  • Revenues
  • Board members
  • …
• Persons (e.g. celebrities, sports):
  • Name
  • Profession
  • Height
• Landmarks (e.g. famous buildings):
  • Country
  • Location
  • Height
• Events:
  • Dates
  • Location
Background Knowledge Graph
• Find properties with numerical range
• Hierarchical clustering approach

• Two hierarchical layers:
  • Type hierarchy (using OWL classes)
  • Property-object hierarchy (shared property-object pairs)
Label based on Nearest Neighbors

[Figure: nearest-neighbour illustration with neighbours numbered 1 to 6]
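A minimal sketch of nearest-neighbour labelling for a bag of numeric values: compare the bag against reference value distributions taken from a background knowledge graph and return the closest label. The reference values below are invented, and the two-sample Kolmogorov-Smirnov statistic is used here as one plausible distance, chosen for illustration rather than taken from the slides.

```python
def ks_distance(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical cumulative distribution functions of xs and ys."""
    xs, ys = sorted(xs), sorted(ys)
    gap = 0.0
    for v in xs + ys:
        cdf_x = sum(x <= v for x in xs) / len(xs)
        cdf_y = sum(y <= v for y in ys) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap

# Hypothetical reference bags, one per (property, type) label.
REFERENCE = {
    "populationTotal (a Settlement)": [1200, 18000, 25000, 31000, 95000, 190000],
    "elevation (a Settlement)": [150, 260, 310, 420, 580, 1100],
    "height in cm (a Person)": [158, 165, 172, 180, 188, 201],
}

def label(values, k=1):
    """Return the k labels whose reference bags are nearest to the values."""
    ranked = sorted(REFERENCE,
                    key=lambda name: ks_distance(values, REFERENCE[name]))
    return ranked[:k]

print(label([98157, 18763, 29730]))
```

Only the numbers themselves are used; the column header and surrounding cells are deliberately ignored, as on the earlier slides.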
Example OD Labelling
populationTotal (a Settlement)
populationDensity (a City)

Source: [Link]
Lessons learned
• We can assign fine-grained semantic labels
• If there is enough evidence in Background Knowledge Graph
• However: Missing domain knowledge for labelling OD

Future work:
• Complementary to existing approaches (column header labeling, entity linking and relation extraction)
• Combined approaches may improve results
• Focusing on core dimensions of specific domains, e.g. city data, may be more promising than “general” value labeling.

What else can we do/use?
Focus on specific dimensions:
• Particularly temporal and geospatial queries require better support [2]

NUTS2   LAU2_NAME   YEAR   SEX   AGE_TOTAL
AT31    Linz        2013   1     98157
AT31    Steyr       2013   1     18763
AT31    Wels        2013   1     29730
…       …           …      …     …

[2] Emilia Kacprzak, et al.: A Query Log Analysis of Dataset Search. International Conference on Web Engineering (2017)

Available Geospatial Knowledge Bases

Geo-Knowledge Graph Construction
[Diagram labels:]
• Wikidata, GeoNames
• European Classification of Territorial Units
• Wikidata links
• Mapping OSM entities to GeoNames regions
• Extracting OSM streets and places
Available Temporal Knowledge
Temporal Knowledge Graph Construction

• Named events and their labels
• Links to parent periods/events
• Links to the spatial coverage
• Temporal extent, i.e. a single beginning and end date

Dataset Labelling
Metadata descriptions
• Geo-entities in titles, descriptions, organizations
• Restricted to "origin" country of the dataset (from portal)
• Temporal tagging using the HeidelTime framework [3]

CSV cell value disambiguation

• Row context:
  • Filter candidates by potential parents (if available)
• Column context:
  • Least common ancestor of the spatial entities
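The two disambiguation steps can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the `PARENT` map is a tiny hypothetical excerpt of a geo hierarchy, and the function names are made up for this example. Row context keeps only candidates that have another entity of the same row among their ancestors; column context summarises a column by the least common ancestor of its entities.

```python
# Hypothetical child -> parent excerpt of a spatial hierarchy.
PARENT = {
    "Linz (AT)": "Upper Austria",
    "Steyr": "Upper Austria",
    "Wels": "Upper Austria",
    "Linz am Rhein (DE)": "Rhineland-Palatinate",
    "Upper Austria": "Austria",
    "Rhineland-Palatinate": "Germany",
    "Austria": "Europe",
    "Germany": "Europe",
}


def ancestors(entity, parent):
    """Return the chain [entity, parent, grandparent, ...] up to the root."""
    chain = [entity]
    while entity in parent:
        entity = parent[entity]
        chain.append(entity)
    return chain


def filter_by_parents(candidates, row_entities, parent):
    """Row context: keep candidates with a row entity among their ancestors.
    If no candidate matches, return all candidates unchanged."""
    row = set(row_entities)
    kept = [c for c in candidates if row & set(ancestors(c, parent)[1:])]
    return kept or list(candidates)


def least_common_ancestor(entities, parent):
    """Column context: deepest node shared by the ancestor chains of all entities."""
    common = None
    for entity in entities:
        chain = ancestors(entity, parent)
        if common is None:
            common = chain
        else:
            keep = set(chain)
            common = [node for node in common if node in keep]
    return common[0] if common else None


# "Linz" is ambiguous; the row also mentions Upper Austria (AT31).
print(filter_by_parents(["Linz (AT)", "Linz am Rhein (DE)"], ["Upper Austria"], PARENT))
# The spatial scope of a column holding Linz, Steyr and Wels.
print(least_common_ancestor(["Linz (AT)", "Steyr", "Wels"], PARENT))
```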

[3] Strötgen, Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 2013.
Indexed Datasets

RDF Export 1/2: Knowledge Graph

• Spatial and temporal base knowledge graph

• Annotated data points in metadata and CSV cells
• CSV metadata using CSVW vocabulary
• e.g., delimiter, encoding, header, …
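As an illustration of such CSV metadata, the sketch below builds a small CSVW description for the example table and prints it as JSON. The file name `population.csv`, the chosen datatypes and the dialect values are assumptions for this example; only standard CSVW terms (`dialect`, `delimiter`, `encoding`, `header`, `tableSchema`, `columns`) are used.

```python
import json

# Hypothetical CSVW metadata for the example table; values are illustrative.
csvw_metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "population.csv",  # assumed file name
    "dialect": {
        "delimiter": ",",     # assumed delimiter
        "encoding": "utf-8",  # assumed encoding
        "header": True,
    },
    "tableSchema": {
        "columns": [
            {"name": "NUTS2", "titles": "NUTS2", "datatype": "string"},
            {"name": "LAU2_NAME", "titles": "LAU2_NAME", "datatype": "string"},
            {"name": "YEAR", "titles": "YEAR", "datatype": "gYear"},
            {"name": "SEX", "titles": "SEX", "datatype": "string"},
            {"name": "P_TOTAL", "titles": "P_TOTAL", "datatype": "integer"},
        ]
    },
}

serialized = json.dumps(csvw_metadata, indent=2)
print(serialized)
```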
RDF Export 2/2: Annotate Datasets → CSV on the Web Metadata [4]
• Note: no real cell-level annotations; we needed to add those!
• E.g.:
• csvwx:cell
• csvwx:hasTime
• csvw:refersToEntity
• …

Details: cf.:
[Link]

[4] R. Pollock et al., Metadata Vocabulary for Tabular Data, W3C CSV on the Web (2015)
Enable e.g. Search by GeoSPARQL Queries:
• Standard for representation and querying of geospatial linked data
• (Almost) no complete implementations of GeoSPARQL

Search Interface
[Link]
Faceted query interface:
• Timespan
• Time pattern
• Geo-entities
• Full-text queries

Back end:
• MongoDB for efficient key look-ups
• Elasticsearch for indexing and full-text queries
• Virtuoso as a triple store

Lessons learned
• Geospatial and temporal scope is the most useful search feature for Open Data
• Respective hierarchical knowledge graphs can be built from existing Linked Data sources
• Our algorithms annotate CSV tables and their metadata descriptions

→ KGs improve search (with some extra work)
