Synthesis Lectures On Data, Semantics, and Knowledge
Series Editors: Ying Ding, University of Texas at Austin
Paul Groth, University of Amsterdam
Founding Editor Emeritus: James Hendler, Rensselaer Polytechnic Institute
Web Data APIs for Knowledge Graphs
Easing Access to Semantic Data for Application Developers
Albert Meroño-Peñuela, King’s College London
Pasquale Lisena, EURECOM, France
Carlos Martínez-Ortiz, Netherlands eScience Center
Web Data APIs for Knowledge Graphs: Easing Access to Semantic Data for Application
Developers
Albert Meroño-Peñuela, Pasquale Lisena, and Carlos Martínez-Ortiz
2021
Ontology Engineering
Elisa F. Kendall and Deborah L. McGuinness
2019
Library Linked Data in the Cloud: OCLC’s Experiments with New Models of Resource
Description
Carol Jean Godby, Shenghui Wang, and Jeffrey K. Mixter
2015
Publishing and Using Cultural Heritage Linked Data on the Semantic Web
Eero Hyvönen
2012
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
Web Data APIs for Knowledge Graphs: Easing Access to Semantic Data for Application Developers
Albert Meroño-Peñuela, Pasquale Lisena, and Carlos Martínez-Ortiz
www.morganclaypool.com
DOI 10.2200/S01114ED1V01Y202107DSK021
Lecture #21
Series ISSN: Print 2691-2023, Electronic 2691-2031
Web Data APIs
for Knowledge Graphs
Easing Access to Semantic Data
for Application Developers
Albert Meroño-Peñuela
King’s College London, United Kingdom
Pasquale Lisena
EURECOM, France
Carlos Martínez-Ortiz
Netherlands eScience Center, The Netherlands
Morgan & Claypool Publishers
ABSTRACT
This book describes a set of methods, architectures, and tools to extend the data pipeline at the
disposal of developers when they need to publish and consume data from Knowledge Graphs
(graph-structured knowledge bases that describe the entities and relations within a domain in
a semantically meaningful way) using SPARQL, Web APIs, and JSON. To do so, it focuses
on the paradigmatic cases of two middleware software packages, grlc and SPARQL Transformer,
which automatically build and run SPARQL-based REST APIs and allow the specification of
JSON schema results, respectively. The authors highlight the underlying principles behind these
technologies—query management, declarative languages, new levels of indirection, abstraction
layers, and separation of concerns—explain their practical usage, and describe their penetration
in research projects and industry. The book, therefore, serves a double purpose: to provide a sound
and technical description of tools and methods at the disposal of publishers and developers to
quickly deploy and consume Web Data APIs on top of Knowledge Graphs; and to propose
an extensible and heterogeneous Knowledge Graph access infrastructure that accommodates a
growing ecosystem of querying paradigms.
KEYWORDS
knowledge graphs, web APIs, querying infrastructures, query interfaces, SPARQL,
grlc
Contents

Foreword by Tobias Kuhn
Preface
Acknowledgments
6 Applications
  6.1 grlc
    6.1.1 Linked Data Platform for Genetics Research
    6.1.2 Nanopublications
    6.1.3 CLARIAH and Social History Research
    6.1.4 TNO: FoodCube
    6.1.5 NewGen Chennai: Conference Proceedings
    6.1.6 EU RISIS: Science, Technology, and Innovation
  6.2 SPARQL Transformer
    6.2.1 KG Explorer
    6.2.2 FADE
  6.3 grlc and Transformer
  6.4 Demos/Links
Bibliography
Authors’ Biographies
Foreword
Knowledge Graphs have recently emerged as a powerful concept and highly valuable technology.
They can also be seen as the third umbrella term of the line of research and practice that started
from the vision called the Semantic Web, which in some fields later became more commonly
known under the name of Linked Data and is now often labeled Knowledge Graphs. These
different umbrellas have much of the underlying concepts and technologies in common, but also
come with clear differences in focus. Openness and decentralization are aspects that have shifted
out of focus with the rise of the third umbrella term. Knowledge Graphs are often associated
with big companies unilaterally building silo-like databases, with no particular emphasis on
making this knowledge accessible to third parties or on keeping the architecture decentralized.
Recently, however, I see that an increasing number of fields and communities are embracing
the term Knowledge Graph for smaller and more restricted knowledge bases, and they are also
pushing openness and decentralization back into focus. They also often promote the use of the
core Semantic Web language RDF, which the big companies were hesitant to adopt. Web Data
APIs have, in my view, an important role in this development because they lower the bar for
interoperability and reuse. While the problems of interoperability and reuse have received a large
amount of theoretical consideration, ever since the inception of the Semantic Web vision, they
have been lacking in practical solutions for many scenarios. In a constructive and incremental
way, Web Data APIs have started to fill in the last missing gaps to make Knowledge Graphs
work as ecosystems rather than just systems. Web Data APIs enable small and large knowledge
bases to connect and build upon each other, and they thereby form a crucial component in this
development.
The query language SPARQL has been a core technological component for a large portion
of the approaches under the umbrella terms introduced above. SPARQL can be seen as a success
story, with its high adoption rate and the availability of countless implementations and related
tools. SPARQL can, however, also be seen as a problem child. It is known to be hard to use for
developers who have not received specific training, and even experienced users like myself often
depend on copy–paste templates as a starting point. On the server side, moreover, SPARQL
endpoints are notoriously difficult to set up in a way that they can reliably and efficiently answer all
the possible queries they are supposed to be able to handle. In my opinion, Web Data APIs
are on their way to solving these problems, not by replacing SPARQL but by building upon it and
only slightly redrawing its architectural role. That is exactly what the authors of this book did
by developing grlc and SPARQL Transformer, the two core components described in this book.
With just a few lines of declarative code, a SPARQL query template is turned into a developer-friendly Web Data API.
I therefore believe that this book is a must-read for everybody involved in the engineering
parts of Knowledge Graph projects that build upon the RDF technology. Moreover, it might
convince you to choose RDF technology in the first place. Most of the downsides of RDF you
might have heard about are eliminated with the methods and tools described in this book, while
its strengths are in full force. With these strengths, the knowledge in your Knowledge Graph can
get a life of its own and can become part of the higher realm of our shared human knowledge.
Tobias Kuhn
June 2021
Preface
Knowledge Graphs are among the most exciting and disruptive technological developments
of our time. Heavily inspired by ideas from the Semantic Web and Linked Data communities,
Knowledge Graphs are graph-structured knowledge bases that describe the entities and relations
in a domain in a semantically meaningful way. This is typically done through the data model
and syntax provided by the Resource Description Framework (RDF) [Cyganiak et al., 2014], a
Web data publishing paradigm based on structured statements of subject-predicate-object triples
suitable for machine consumption. For example, Wikidata [Vrandečić and Krötzsch, 2014], a
Knowledge Graph built collaboratively by more than 20,000 volunteers, describes more than
90 million items for anyone to reuse under a CC0 public domain license. The concepts and
relationships thus described in Knowledge Graphs are frequently used in intelligent systems
like Google Search (visible in the knowledge panels of its results page), Siri, and Alexa (through
voice-assisted answers), and provide these systems with some degree of understanding of the
world.
But how can we use the wealth of knowledge contained in Knowledge Graphs in our
applications? Is this some intricate mechanism that only large tech corporations like Google,
Apple, and Amazon can afford? Or, on the contrary, can any developer query these Knowledge
Graphs?
The answer to this question is far from trivial, and the main objective of this book
is to provide an accurate one. A first version of this answer, known to experienced Knowledge
Graph developers, is a firm yes, backed by the existence of standard, declarative, and rich query
languages that can be used to interrogate these Knowledge Graphs from applications and get
back results. The SPARQL Protocol and Query Language [W3C SPARQL working group,
2013], for example, is one of these languages. SPARQL operates over HTTP, the same protocol
developers use to transfer Web pages and Web API data responses from servers to clients, and
therefore it is easy to see how it can be used to transfer Knowledge Graph data on the Web
and will remain a fine mechanism to do so for a great variety of users. Moreover, clients of
SPARQL-compliant servers can always request results of their queries to be returned in JSON
[Bourhis et al., 2017], the de facto Web API data transfer format, thus making it in principle
friendly to the technology stack developers are used to.
Why, then, this book? Aren’t RDF, SPARQL,1 and JSON enough to query Knowledge
Graphs? In our view, there are at least four reasons these technologies do not cut it for any
1 We use here SPARQL as a prototypical example, and we intend to illustrate the situation for all Knowledge Graph
query languages, not just SPARQL. We do not mean to criticize SPARQL in particular; we are, as we will see later in the
book, quite fond of SPARQL.
developer who wishes to query RDF Knowledge Graphs—especially if that developer is used to
other querying paradigms.
1. The first reason is that, although this is a somewhat uncomfortable truth, SPARQL is an
uncommon query language among developers: only a tiny fraction of developers know—
or are keen to learn—how to handle RDF data or write SPARQL queries [Booth et al.,
2019]. The immense majority of Web developers are used to querying data published under
the de facto standard paradigm of Web APIs. Are these two, SPARQL and Web APIs,
opposing paradigms for querying Knowledge Graphs? Or can we devise ways of combining
their strengths?
2. The second reason is that writing SPARQL queries from within application code can get
very repetitive. Large applications need huge amounts of data, leading to lots of queries
that developers need to type over and over again. Is there a way to reuse, somehow, these
queries, and avoid redundant labor?
3. This leads to the third reason: query management. In general, dealing with SPARQL
queries—or any Knowledge Graph queries for that matter—inside application code is
more often based on craftsmanship and improvisation than on principled and systematic
engineering. Queries end up hard-coded and mixed with application source code,
poorly documented, and spread across several separate files, at best. What should better query
management look like?
4. A fourth reason limiting SPARQL is its inability to let developers indicate how they would
like their query results to be structured. SPARQL is extraordinarily effective in letting
developers specify what to query in a declarative way. However, it is rather limited in how these
results are returned: typically in a table-like form, even when data is requested in the richly
structured JSON format. In fact, JSON-LD—the RDF serialization proposed for use
in Web-based environments [Sporny et al., 2013]—is not among the possible output formats
of a SELECT query. Developers need—and are therefore forced—to manipulate and
appropriately shape the structure of JSON results themselves in a post hoc manner. How
can we extend the available standards to allow and generalize precisely that?
These are important concerns that many developers often find themselves confronted with
in production environments. In this book, we describe a set of methods, architectures, and tools
to overcome these limitations, and we extend the data pipeline at the disposal of developers when
they need to publish and consume data from Knowledge Graphs with SPARQL, Web APIs,
and JSON. To do so, we focus on the paradigmatic cases of two middleware software packages
that we wrote and currently maintain: grlc [Meroño-Peñuela and Hoekstra, 2016, 2017] and
SPARQL Transformer [Lisena et al., 2019]. Both grlc and SPARQL Transformer have reached
maturity and wide community use, and we believe they are useful tools to tackle these challenges. They
are, respectively, targeted at automatically generating SPARQL-based Web APIs for Knowledge
Graphs and at allowing developers to specify the shape in which they desire to receive their
SPARQL JSON results.
The blooming ecosystem of solutions, models, and ideas around Knowledge Graphs challenges
our classic conceptions about how to represent, store, and share data, but it also comes
with an overwhelming variety of technologies and standards. Therefore, deciding what specific
technologies belong to the “Knowledge Graph technology stack” is a difficult challenge with
many different answers, which will depend on specific developer goals. We are choosing what
these “Knowledge Graph technologies” are by looking at you, the reader, and imagining you as
an application developer who is interested in querying or otherwise building a Web application
or API over a Knowledge Graph. Part of the premise of the book is that you want to do this
by leveraging as much data as possible, in particular existing large repositories of Linked Data: a
Web data publishing paradigm that relies on HTTP, RDF, and SPARQL under which billions
of unique statements over hundreds of thousands of datasets and domains [Fernández et al.,
2017, Heath and Bizer, 2011] have been published in the last two decades. This means that
knowledge of RDF, SPARQL, and Linked Data is required to fully appreciate the book, and
that by being an application developer you might not be an expert on those topics just yet. In
order to address this, we start the book with a short introduction to these subjects, and we point
you to additional materials that we think are valuable teaching resources that will quickly get you
on track. If you are new to the Linked Data, RDF, and SPARQL world, we strongly encourage
you to read on and familiarize yourself with these topics before you proceed to the remainder
of the book. If you are a developer who already knows about RDF and SPARQL, that’s great!
Please feel free to skip Chapter 1 altogether. You can also quickly skim through it and use it as
a recap. If, on the other hand, you are an application developer who has never built or dealt with
Web APIs, we have also got you covered with an introduction to that in Chapter 3.
We have written this book with the idea of transcending a mere technical manual about
these tools, and consequently emphasize the principles behind our design choices rather than the
tools themselves. Some of these principles are:
• query management: distributed query storage and publishing over decentralized sys-
tems;
• declarative languages: mapping API specifications to SPARQL; mapping SPARQL
output to arbitrarily structured JSON;
• new levels of indirection: globally and uniquely identifying queries on the Web; and
• Knowledge Graph access: multi-tier architectures, separation of concerns.
We will refer back to these principles as we resort to them throughout the chapters.
The book is, therefore, organized as follows.
• Chapter 1 is a quick introduction to the background required for the book, and intro-
duces the core ideas behind Linked Data, RDF, and SPARQL as the basic technology
stack for building and querying Knowledge Graphs and representing semantic data on
the Web. We also outline other relevant technologies for Knowledge Graphs, such as
GraphQL.
• Chapter 2 describes how to access Knowledge Graphs programmatically. We explain
the underlying principles of HTTP and SPARQL, the main protocols involved in
querying Knowledge Graphs, and how to use them within application code, with vari-
ous examples. We then show how several libraries can accelerate this process under two
assumptions: (1) that developers just want to execute some SPARQL remotely and use
the results and (2) that developers want to manipulate output from SPARQL.
• Chapter 3 is about how to build Web data APIs on top of SPARQL. Here, we dive
deeper into the main paradigmatic differences between SPARQL and Web APIs, as
the two opposing models of Web data querying we are interested in for this book.
We explain the entire bottom-up process of building such APIs on top of SPARQL
by explaining the OpenAPI specification (the de facto standard), how to merge it with
SPARQL, and the consequences of doing so manually.
• Chapter 4 revolves around automating the processes explained in Chapter 3 through
sharing queries with grlc. We explain its underlying principles, and we describe a
lightweight query documentation and metadata language to map features of the
OpenAPI specification to SPARQL. We include here some exercises to practice the
explained content.
• Chapter 5 explains how to shape the JSON results of SPARQL queries, addressing
another important process of the Knowledge Graph data consumption pipeline of
Chapter 3. We explain what the curse of the bindings is, and how to use a single JSON
object as both query and results template in SPARQL Transformer. We also describe its
architecture, features, and syntax, and how to integrate it with grlc. Again, we include
some exercises.
• Chapter 6 collects a number of successful uses and applications of the tools and prin-
ciples presented in previous chapters. We provide abundant documentation and links
to resources.
• Chapter 7 presents our conclusions and future challenges, answering some of the ques-
tions pointed out in this introduction.
We have created a series of exercises at the end of Chapters 4 and 5 to practice the core
knowledge introduced in them. You will find their solutions in an appendix at the end of the
book, so you can study them comfortably without having the solutions too temptingly close.
Please, only look at them once you have tried to solve the exercises by yourself. For your conve-
nience, we have also uploaded these exercises and their solutions, together with all code snippets
used in the book, to an online repository2 where they are easier to use in practical settings. In
addition, you will find there the materials and slides that we used for the SPARQL Endpoints and
Web API (SWApi) tutorials that we organized during the International Semantic Web Confer-
ence 2020 (ISWC 2020), the Web Conference 2021 (TheWebConf 2021), and the Extended
Semantic Web Conference 2021 (ESWC 2021). The book that you are about to start is based on
those tutorials: it provides a more fluid narrative around Web APIs for RDF Knowledge Graphs
and adds a substantial amount of references, exercises, and background knowledge in order to
make its contents more accessible to a wider range of Web developers.
We hope that you enjoy reading the book as much as we have enjoyed writing it. We will
consider our efforts a big success if we made your life as a developer a little easier, or if we inspired
you to make your own ideas around APIs and Knowledge Graphs grow. Have lots of fun!
2 https://github.com/api4kg
Acknowledgments
The idea of this book came from a series of tutorials that we organized in academic conferences
around the Semantic Web and Knowledge Graphs, concretely at the International Semantic
Web Conference 2020 (ISWC 2020), the Web Conference 2021 (TheWebConf 2021), and the
Extended Semantic Web Conference 2021 (ESWC 2021). Initially, the plan was just to extend
and solidify the contents of those tutorials into a more permanent format, but we soon realized
that, on top of that, we also wanted to expand our audiences and talk not just to the amazing
community of Semantic Web developers, but also to Web developers with a passion for data
and knowledge in general. Executing this plan, which would result in the writing of this book,
required much more than just our own hands.
First, and foremost, we would like to thank our series editors, Paul Groth and Ying Ding,
for offering us the opportunity to publish our book with them, and especially to Paul for the
idea of turning our tutorial into a book. We are very grateful to Michael Morgan at Morgan
& Claypool for his brilliant and effective guidance through the editing process. Thanks also to
Christine Kiilerich for her essential help on editing. Very special thanks to Aidan Hogan for
reviewing the book, and for his essential suggestions for increasing its quality and reach to wider
audiences.
We will always be grateful to the Semantic Web and eScience academic communities for
their vibrant support and constructive criticism. Among them, we want to thank our mentors
who have always inspired and motivated us to pursue and refine our ideas. Thanks to Rinke
Hoekstra for saying “just write your own implementation,” hence proving that good ideas come
not only from good papers, but also from good code. Thank you to Frank van Harmelen for being
a true beacon of science and supporting our research adventures even when the outcomes were
not that clear, and to Richard Zijdeman, for giving us freedom way beyond the budget, being
our best ambassador and tester, and the true realization of interdisciplinary research. Thank you
to Raphaël Troncy, who has been a great supporter and sponsor, encouraging us throughout the
writing of this book and the development of some of the technologies presented here. Thanks
to our dear friend Ilaria Tiddi, who put us in contact for the first time; without her, our collaboration (and this book) may not have been possible. In addition, very special thanks go to
our esteemed colleagues and friends. Finally, a big, big thank you to Tobias Kuhn for being an
enthusiastic user, an exceptional researcher, and for writing the foreword of this book.
We owe a well-deserved acknowledgment to all the institutions that have supported the
research contained in this book. We would like to thank the Netherlands eScience Center for
their continued support to grlc over the years and in the preparation of this book. We are deeply
grateful to CLARIAH, CLARIAH NL, and CLARIAH-PLUS, and to Henk Wals, Kees
Mandemakers, Frank van Harmelen, Jan Luiten van Zanden, Richard Zijdeman, and Auke
Rijpma for supporting and funding the research that led to this book.
Finally, our deepest gratitude goes to you, the developers, for carrying out the work that
keeps the world connected through the Web. Thank you to all the grlc and SPARQL Transformer
contributors for your code, your issues, and your wisdom, with special thanks to Rinke Hoekstra,
Richard Zijdeman, Roderick van der Weerdt, Jaap Blom, Arnold Kuzniar, Mari Wigham,
Jurriaan Spaaks, Jonas Jetschni, and Ruud Steltenpool. Thanks to John Walker for his enthusiastic
support and for reporting key issues. A big thank you to the SALAD (Services and Applications
over Linked APIs and Data) community, the community of Semantic Web developers, and Web
developers from all walks of life. It is for you that we have written this book. We hope you enjoy
it!
CHAPTER 1

Knowledge Graphs of Linked Data

1 See https://www.w3.org/standards/semanticweb/data.
scale” [Hogan et al., 2021]. References to these ideas, and to the term itself, can be traced back
to the 1950s, 1960s, and 1970s [Gutierrez and Sequeda, 2020]; the term was coined around the
general idea of semantic networks and the data structures that could support the rich descriptions
of domains needed for the research being developed at the time on knowledge representation
and reasoning. However, in 2012 Google popularized the term when it released the
Google Knowledge Graph [Blog, 2012] to support the generation of its now famous knowledge
panels in Web searches; many other Knowledge Graphs followed to the extent that no list can
account for how many of them are out there anymore.
But before this happened, the Semantic Web community had been using the term Linked
Data since at least 20062 to refer to a concept with many overlaps with Knowledge Graphs.
Similar to the Web of HTML documents, Linked Data proposes four basic principles to
interconnect data bits on the Web:
1. use URIs as names for things;
2. use HTTP URIs so that people can look up those names;
3. when someone looks up a URI, provide useful information, using the standards (RDF,
SPARQL); and
4. include links to other URIs so that they can discover more things.
In this way, a “graph” of connected URIs can be published (with RDF statements;
see Section 1.2), queried (with SPARQL; see Section 1.3), and traversed (with HTTP;
see Chapter 2) just as we usually do with Web HTML documents; this time, however, machines
can process the knowledge made explicit in the graph—contrary to HTML documents containing
human language, which need a great deal of Natural Language Processing (NLP). These
principles, together with standards like Linked Data Platform (LDP) [Speicher et al., 2015] and
initiatives for vocabulary reuse such as schema.org [Guha et al., 2016] and Linked Open
Vocabularies [Vandenbussche et al., 2017], made an entire ecosystem of hundreds of thousands of
Linked Data datasets available to the public with billions of triples [Fernández et al., 2017]. To
many, this was the realization, through a community of enthusiastic volunteers and institutions,
of a global, decentralized, and publicly available Knowledge Graph.
Further reading To know more about Knowledge Graphs and Linked Data we refer you
to the following sources.
• On Knowledge Graphs, the recent survey by Hogan et al. [2021] is a key contribution
that condenses all the ideas, technologies, and methods published under this flag until 2021.
Its size (136 pages!) reflects the titanic effort it represents to cover all that Knowledge
Graphs really encompass. This reference is highly recommended.
2 https://www.w3.org/DesignIssues/LinkedData.html
• To know more about the history of Knowledge Graphs, see the insightful ISWC 2019
tutorial Knowledge Graphs: how did we get here?3 [Gutierrez and Sequeda, 2020].
• Other notable graph databases include Neo4j and the Cypher query language,4 Gremlin,5 and GraphQL (which we introduce later in this chapter in Section 1.4).
can be processed by any RDF compliant parser, and tell it that Tim is a person and that his
email address is [email protected]. Therefore, RDF statements are made of these short, three-component sentences that we call triples. The first element of the triple is the subject (i.e., the
resource we want to say something about), the second is the predicate (i.e., the verb or property
on that subject), and the third is the object (i.e., the value of that property or the receptor of
the statement). Resources, in the form of URIs, can appear as subjects, predicates, or objects.
Literals (e.g., strings, integers, dates, etc.) can appear only in the object position (for example to
specify a birth date, an age, or a name). We can also use anonymous URIs in RDF, also known
as blank nodes, if we don’t know—or don’t want to create—the URI for a particular resource.
Some of these statements may contain semantically enhanced resources, e.g., terms of
the RDFS [Brickley and Guha, 2014] and OWL [McGuinness et al., 2004] vocabularies allow
reasoners to derive triples that are only implied by those terms. RDF Knowledge Graphs are
practically written down using a number of serializations, like N-Triples, N-Quads, Turtle, or
JSON-LD (which embeds RDF into the popular JSON data format).
3 Juan F. Sequeda and Claudio Gutierrez. Knowledge Graphs: How did we get here? A Half Day Tutorial on the History of
Knowledge Graph’s Main Ideas, ISWC 2019, October 27, 2019. http://knowledgegraph.today.
4 Neo4j official guide: https://neo4j.com/developer/cypher/guide-cypher-basics/.
5 Gremlin official guide: https://tinkerpop.apache.org/docs/current/tutorials/getting-started/.
N-Triples N-Triples [Beckett, 2014] is an RDF serialization format that emphasizes ease of
processing by applications, especially streaming. The basic ideas of N-Triples are: (a) one line,
one triple; (b) subjects, predicates, and objects are separated by blank spaces, and triples end
with a full stop (.); and (c) each line should be fully interpretable on its own (i.e., no prefixes or
aliases declared elsewhere). An example is shown in Listing 1.1.
<http://dbpedia.org/resource/AC/DC> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Band> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/genre> <http://dbpedia.org/resource/Hard_rock> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/genre> <http://dbpedia.org/resource/Blues_rock> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/activeYearsStartYear> "1973"^^<http://www.w3.org/2001/XMLSchema#gYear> .

Listing 1.1: Some RDF statements about the band AC/DC serialized as N-Triples.
Turtle In contrast to N-Triples, the Turtle (Terse RDF Triple Language) serialization [Beckett, 2014] prioritizes human readability over ease of automated processing.
Turtle introduces several aliasing techniques to make RDF triples shorter and more readable to
humans:
• URI namespaces can be aliased through the @prefix keyword.
• If we want to repeat the subject of this triple in the next one, we can omit it altogether
by finishing the current triple with a semi-colon (;).
• If we want to repeat the subject and predicate of this triple in the next one, we can
omit both altogether by finishing the current triple with a comma (,).
• Triples end with a full stop (.).
This gives a much more natural reading of RDF, in particular to English speakers; an
example is shown in Listing 1.2.
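@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dbpedia.org/resource/AC/DC> rdf:type dbo:Band ;
    dbo:genre dbr:Hard_rock , dbr:Blues_rock ;
    dbo:activeYearsStartYear "1973"^^xsd:gYear .

Listing 1.2: The statements of Listing 1.1 serialized as Turtle (the subject is kept as a full URI because its local name contains a slash).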
N-Quads N-Quads [Carothers, 2014] is a similar serialization to N-Triples in the sense that
it prioritizes machine readability and a line-by-line coherent, independent processing. On top
of this, N-Quads adds a fourth “column” to indicate the named graph the triple belongs to,
after the one indicating the object (remember that RDF statements are made of subject, predicate, object triples). This effectively modifies the basic model of RDF, replacing the notion of
subject–predicate–object “triples” with subject–predicate–object–graph “quads.” The fourth URI,
the named graph, allows a higher level of grouping and building sets of triples, letting triples
be repeated in different graphs if so desired. An example is shown in Listing 1.3.
<http://dbpedia.org/resource/AC/DC> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Band> <http://bands.org/awesome-bands> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/genre> <http://dbpedia.org/resource/Hard_rock> <http://bands.org/awesome-bands> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/genre> <http://dbpedia.org/resource/Blues_rock> <http://bands.org/awesome-bands> .
<http://dbpedia.org/resource/AC/DC> <http://dbpedia.org/ontology/activeYearsStartYear> "1973"^^<http://www.w3.org/2001/XMLSchema#gYear> <http://bands.org/awesome-bands> .
Listing 1.3: Some RDF statements with named graphs (quads) serialized as N-Quads.
JSON-LD Among the standard serializations of RDF, JSON-LD is the one developed with
Web application scenarios in mind. Based on the JSON format, it provides some special attributes for
making the semantics of the data explicit. In particular, the id and type of a resource can be specified in
the @id and @type fields. The other property–value pairs reflect the predicate–object part of the
triples, with each predicate disambiguated against the schema specified in the context. The
latter can be a mapping of each property to a particular URI or a reference to an external schema.
Literals can be expressed as plain strings or as objects containing a @value and—optionally—a @type
or a @language.
The following example contains some data about AC/DC in JSON-LD.
{
  "@context": "http://dbpedia.org/ontology/",
  "@graph": {
    "@id": "http://dbpedia.org/resource/AC/DC",
    "@type": "http://dbpedia.org/ontology/Band",
    "genre": [
      "http://dbpedia.org/resource/Hard_rock",
      "http://dbpedia.org/resource/Blues_rock"
    ],
    "activeYearsStartYear": {
      "@value": "1973",
      "@type": "http://www.w3.org/2001/XMLSchema#gYear"
    }
  }
}
Further reading To know more about Knowledge Graphs and Linked Data we refer you
to:
• the Linked Data: Evolving the Web into a Global Data Space book [Heath and Bizer,
2011], available online in full, in HTML, and free of charge. To many, this is the canonical
technical manual for publishing Linked Data in RDF on the Web;
• the RDF 1.1 Primer document6 [Schreiber et al., 2014], a nice introduction to the
basic concepts and syntax of RDF;
• the official W3C standard specification for RDF Turtle 1.17 [Beckett, 2014]; and
• Harald Sack’s essential course on YouTube https://www.youtube.com/channel/UCjkkhNSNuXrJpMYZoeSBw6Q/.
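1.3 SPARQL

SPARQL is the standard query language for RDF Knowledge Graphs. A minimal SELECT query, sketched here over the DBpedia vocabulary used in the previous examples (an illustrative choice), retrieves all resources typed as bands:

PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?band
WHERE {
  ?band a dbo:Band .
}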
Listing 1.5: Minimal example of a SPARQL SELECT query. PREFIX works analogously to
Turtle’s @prefix; notice the slight difference in syntax.
The following triple, from our AC/DC example, matches these conditions and could be included in the result set.
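<http://dbpedia.org/resource/AC/DC> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Band> .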
1.4 GraphQL: Web API Made Easy

GraphQL Query:
{
  band {
    name
    genre
    founded_in
  }
}

Results:
{
  "band": {
    "name": "AC/DC",
    "genre": "Hard Rock",
    "founded_in": 1973
  }
}
8 https://graphql.github.io/
9 According to the State of JavaScript 2020 survey. https://2020.stateofjs.com/en-US/technologies/datalayer/.
Further reading To know more about the GraphQL query language, we recommend the tutorials An Introduction To GraphQL10 and The Fullstack Tutorial for GraphQL.11
10 Olaf Hartig and Ruben Taelman. An Introduction To GraphQL, Half-day Tutorial at ISWC 2019, October 27, 2019.
https://www.ida.liu.se/research/semanticweb/events/GraphQLTutorialAtISWC2019.shtml.
11 GraphQL Community and Prisma. The Fullstack Tutorial for GraphQL, 2017.
https://www.howtographql.com/.
CHAPTER 2

Accessing Knowledge Graphs Programmatically
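2.1 Querying Knowledge Graphs

As a running example, consider a (hypothetical) Web API at http://example.org that serves information about rock bands. A client can retrieve the list of bands with a plain HTTP GET request:

GET /bands HTTP/1.1
Host: example.org
Accept: application/json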
Listing 2.1: HTTP request retrieving rock bands from a Web API.
For the sake of simplicity we will focus on GET requests as a canonical example for querying. It is important to emphasize that the HTTP resource identifier consists of two parts.
1. The HTTP server. This is the URL prefix that all HTTP resources share in a server; that
is, the http://example.org part.
2. The HTTP resource within that server. This is the URL suffix that uniquely identifies the
resource within the server; that is, the /bands part.
HTTP uses this syntax to identify different resources within a server—for example, we
will likely find information about artists within that server just by replacing /bands with /artists—
but also to introduce an abstraction layer across different servers. If they agree on a common
API, it is likely we can use the same trick to get /bands and /artists from a different HTTP
server, e.g., http://rockbands.io.
SPARQL uses this distinction too, albeit with a different nomenclature:
1. a SPARQL HTTP server is called a SPARQL endpoint; and
2. a SPARQL HTTP resource is a combination of a GET parameter (typically called query)
and a URL-encoded SPARQL query.1
For example, let us assume that we want to query the Wikidata [Vrandečić and Krötzsch,
2014] endpoint, https://query.wikidata.org/sparql, to get all instances of “rock group”
it knows about, using the query shown in Listing 2.2. Property P31 is instance of, and item
Q5741069 is rock group.
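SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q5741069 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}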
Listing 2.2: SPARQL query to retrieve all rock groups, and their English labels, from Wikidata.
All we need to do to send this query from an application via HTTP is to compose a
URL that concatenates the endpoint name https://query.wikidata.org/sparql, the query
parameter ?query=, and the URL-encoded version of the query; and to request that URL with
HTTP GET. The result looks as shown in Listing 2.3.
1 For other HTTP-compatible ways of sending queries to SPARQL endpoints, see [W3C SPARQL working group,
2013].
GET https://query.wikidata.org/sparql?query=SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5741069.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D
Listing 2.3: HTTP request retrieving rock groups from the Wikidata SPARQL endpoint
service.
This is a well-formed, SPARQL-compliant request that we can issue with any HTTP
client or library. For example, curl is a CLI program that can transfer this HTTP request and
show its response, as shown in Listing 2.4.
curl -H 'Accept: application/json' -X GET "https://query.wikidata.org/sparql?query=SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ5741069.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D"
Listing 2.4: Using curl to transfer HTTP data about rock groups from the Wikidata SPARQL
endpoint. We set the HTTP header Accept to ask the endpoint to send back the response in
JSON format.
curl is convenient for trying HTTP requests out from a terminal interface, but not so
much for executing SPARQL queries from the application code. Fortunately, there are plenty of
libraries for HTTP requests in a variety of programming languages. There are too many for us to
illustrate in this book, so we will use a prototypical example in Python and the requests2 library.
Listing 2.5 shows how this is done, freeing developers from having to manually concatenate
URLs of HTTP servers and resources, URL-encode parameters, etc. All we need is two strings
to hard-code the SPARQL endpoint address and the body of the SPARQL query, and to let
the library know that this query is the value of a parameter named query that should be sent as
payload. We can also set another dictionary with optional request headers to request the data
in, e.g., JSON format, as we did with curl in Listing 2.4. The body of the response from the
server is then available to the application to continue its flow.
2 https://requests.readthedocs.io/en/master/
import requests

endpoint = "https://query.wikidata.org/sparql"
query = """SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q5741069 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}"""
headers = {"Accept": "application/json"}

r = requests.get(endpoint, params={"query": query}, headers=headers)
print(r.text)
Listing 2.5: Send a SPARQL query to Wikidata via HTTP using the Python library requests.
SPARQL Wrapper
SPARQL Wrapper3 is a popular library for sending SPARQL queries and processing their results
in Python.
We can better understand the approach of SPARQL Wrapper by looking back at the Python
examples using requests in Section 2.1.1. Listing 2.6 shows how to use SPARQL Wrapper
3 https://rdflib.dev/sparqlwrapper/
to achieve a similar result. In this case, the constructor SPARQLWrapper expects the URL of
the SPARQL endpoint to be queried; in the example, we use DBpedia, another large-scale
Knowledge Graph of world knowledge extracted from Wikipedia. Next, the setQuery method
expects a string encoding the SPARQL query to send; in the example, we want to retrieve the
human-readable label for the resource representing the city of Barcelona. An interesting feature
of SPARQL Wrapper is that it lets us set the return format of the data in the response using
the setReturnFormat method. JSON will be the preferred option for most developers, but
other options like XML are definitely possible.4 Finally, we execute the query—under the hood,
the library will take care of all details around preparing the HTTP request and receiving and
converting the response—and we iterate over the result dictionary. An important consideration
is that the dictionary keys "results", "bindings", and "value" are static and will not change
regardless of the query's contents.5 However, the key "label" gets its name from the query's
variable ?label, and we need to carefully adapt it if we change that variable in the query.
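Putting these pieces together, Listing 2.6 looks roughly as follows (a sketch assembled from the description above):

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label
    WHERE { <http://dbpedia.org/resource/Barcelona> rdfs:label ?label . }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# "results", "bindings", and "value" are static keys; "label" comes from ?label
for result in results["results"]["bindings"]:
    print(result["label"]["value"])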
4 Note here that: (a) in the HTTP example we achieved a similar result by directly manipulating the contents of the
Accept HTTP header; and (b) not all SPARQL endpoints will support all content negotiation formats.
5 We will look at this in depth in Chapter 5.
6 https://rdflib.readthedocs.io/en/stable/

RDFLib
RDFLib6 allows developers to create and manipulate RDF, but also lets them query RDF data
using SPARQL, in a slightly different manner. Many times, rather than a SPARQL endpoint,
we need to query a file-like source of RDF data, for example a local N-triples file [Beckett,
2014] or a remote HTML page with embedded RDFa [Herman et al., 2015]. In such cases,
using RDFLib for querying and processing can be quite useful. Listing 2.7 shows how to do
this, assuming there is a public RDF N-triples file at http://example.org/rdf-data.nt with some
information about music bands. We start by creating a Graph() object, which can parse() any
file-like object containing RDF data; this can be a local or a remote file, even an HTML page
with embedded RDFa triples. query() requires the SPARQL query as a string and will take
care of locally resolving it against the retrieved data.
import rdflib

g = rdflib.Graph()
g.parse("http://example.org/rdf-data.nt")

qres = g.query(
    """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?bandname
    WHERE {
        ?a rdf:type dbo:Band ;
           rdfs:label ?bandname .
    }""")
Listing 2.7: Retrieving file-like RDF data (local or remote) and querying it with SPARQL and
RDFLib.
RDFLib offers other interesting ways of querying RDF data using Basic Graph Pattern
(BGP) matching, instead of SPARQL syntax. The example shown in Listing 2.8 is analogous to
that shown in Listing 2.7, but uses the triples() iterator7 to match resources of type dbo:Band
and then iterates over the labels for such resources, producing the same result.
7 Note that: (a) the triples() iterator requires a Python tuple as a parameter of the form (s, p, o), matching the shape
of a triple pattern; and (b) the None Python keyword is used to specify the unknown resource(s) of the query—which typically
are SPARQL variables.
import rdflib
from rdflib import URIRef
from rdflib.namespace import RDF, RDFS

g = rdflib.Graph()
g.parse("http://example.org/rdf-data.nt")

# Match all resources of type dbo:Band, then iterate over their labels
for s, p, o in g.triples((None, RDF.type, URIRef("http://dbpedia.org/ontology/Band"))):
    for s2, p2, label in g.triples((s, RDFS.label, None)):
        print(label)
Listing 2.8: Retrieving file-like RDF data (local or remote) and querying it using RDFLib
iterators.
Jena
Apache Jena8 is a Java framework which provides similar functionality to that provided by the RDFLib and SPARQL Wrapper libraries in Python, via its ARQ query engine.
Jena is a full framework, with different APIs including a SPARQL server and semantic
reasoning engine, making it a very powerful SPARQL library.
8 https://jena.apache.org/index.html
9 https://rdf.js.org/
2.2 Manipulating SPARQL's Output

RDFJS
The model aims to represent data keeping the graph structure, which is then directly manipulated by the developer. The atomic information is the Term, an abstract element representing any node or edge in the graph. A Term is instantiated in one of the interface's extensions:
NamedNode, BlankNode, Literal, Variable (intended to be used in queries), and DefaultGraph. The
RDF triple is encapsulated in a Quad, which requires a subject term, a predicate term, and an
object term, as well as the graph the triple belongs to. The DataFactory object contains methods
to create terms and quads.
Different libraries implement the RDFJS data model, among which rdflib.js is
well known in the community. The data is extracted from a data store, initialized with
$rdf.graph(). The content of the store can be created directly in the code (as in Listing 2.10),
fetched from the Web, or read from a local file. The data is inserted or selected using triple patterns.
Figure 2.1: The RDFJS data model. A DataFactory provides methods to create terms (namedNode(), blankNode(), literal(), variable(), defaultGraph()) and quads (quad()); a Term has a termType and a value; a Quad groups a subject, predicate, object, and graph Term; both Terms and Quads can be compared with equals().
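A minimal sketch of creating and querying such a store with rdflib.js (method names as documented by the library; the triple is illustrative) could look like this:

const $rdf = require('rdflib');

const store = $rdf.graph();  // initialize an empty data store
const RDFS = $rdf.Namespace('http://www.w3.org/2000/01/rdf-schema#');
const band = $rdf.sym('http://dbpedia.org/resource/AC/DC');

// Insert a triple directly in the code
store.add(band, RDFS('label'), 'AC/DC');

// Select: match() returns all statements matching a triple pattern
store.match(band, RDFS('label'), null)
     .forEach(st => console.log(st.object.value));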
LDFlex
LDFlex is a library for RDF data access and manipulation in JavaScript, built following the
RDFJS specification. The goal of LDFlex10 is to provide the user a way of “querying Linked
Data on the Web as if you were browsing a local JavaScript graph” [Verborgh and Taelman,
2020]. This work is fully integrated into the Comunica framework [Taelman et al., 2018a] and
the Solid ecosystem [Verborgh, 2020].
10 https://ldflex.github.io/LDflex/
In the LDFlex local graph paradigm, the user can navigate the graph by accessing the
attributes of a JavaScript object. For example, band.genre.label would retrieve the labels of
the genres played by band. The semantics is disambiguated through a JSON-LD context, which
maps each attribute to an RDF predicate (see Chapter 1).
When an attribute is accessed, LDFlex intercepts the operation through JavaScript Proxies,11 which execute a function (called a handler) that computes the actual attribute value. In
particular, the function constructs a SPARQL query using the information in the context, executes it, and returns the solution. In other words, the Proxies are used to “disguise” the queries
as attributes. JavaScript's native async/await syntax makes sure that the results are obtained
before moving to the next instruction. In this way, the query mechanics end up being totally
transparent to the developer.
11 ECMAScript 2017 Language Specification (ECMA-262, 8th edition, June 2017) https://262.ecma-international.org/
8.0/.
GraphQL Query:
{
  label @single
  album {
    label
  }
  genre(label_en: "Hard rock") @single
}

JSON-LD context:
{
  "@context": {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "label_en": { "@id": "http://www.w3.org/2000/01/rdf-schema#label", "@language": "en" },
    "album": { "@reverse": "http://dbpedia.org/ontology/album" },
    "genre": "http://dbpedia.org/ontology/genre"
  }
}

Figure 2.2: A GraphQL-LD query and its JSON-LD context.
In addition to the described behavior, LDFlex provides ways of sorting the results, updating the data, etc. The query engine is not part of the library; the use of Comunica
is recommended. Listing 2.11 shows an example of usage of the module.
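Along those lines, a minimal usage sketch (assuming the @ldflex/comunica engine and DBpedia as the data source; the exact Listing 2.11 may differ) could be:

const { PathFactory } = require('ldflex');
const { default: ComunicaEngine } = require('@ldflex/comunica');
const { namedNode } = require('@rdfjs/data-model');

// The JSON-LD context maps JavaScript attributes to RDF predicates
const context = {
  '@context': {
    label: 'http://www.w3.org/2000/01/rdf-schema#label',
    genre: 'http://dbpedia.org/ontology/genre',
  },
};
// Comunica executes the SPARQL queries that LDFlex generates under the hood
const queryEngine = new ComunicaEngine('https://dbpedia.org/sparql');
const path = new PathFactory({ context, queryEngine });
const band = path.create({ subject: namedNode('http://dbpedia.org/resource/AC/DC') });

(async () => {
  // Each attribute access is transparently resolved to a SPARQL query
  for await (const label of band.genre.label) console.log(`${label}`);
})();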
GraphQL-Based Strategies
Several SPARQL interfaces for GraphQL have been proposed so far, following different strategies [Taelman et al., 2019]. Some of these solutions rely on automatic mappings of variables
to property names (Stardog12 ), while others rely on a context (GraphQL-LD13 ) or a schema
(HyperGraphQL14 ).
In GraphQL-LD [Taelman et al., 2018b], the disambiguation of properties is controlled
by a JSON-LD context, which contains the mapping of each property to a URI—similarly to
what we have seen for LDFlex. A special syntax can be used for filtering by language (a combination of @id and @language), changing the predicate direction (@reverse), or obtaining a single
result rather than a list (@single), as can be appreciated in Figure 2.2. The library combines the information in the query and in the context to compute the equivalent SPARQL
query, which is sent to the SPARQL endpoint. The results are then applied to the query object structure
and returned.
HyperGraphQL requires a schema definition and a configuration file. The latter contains
a list of SPARQL endpoints—called services—that the API needs to query, together with some
preferences for setting up the API server. The schema definition includes a context for mapping
properties to URIs. In addition, every type and property can be assigned to a specific server,
allowing federated queries. The results are wrapped in a more complex object, including metadata
12 https://www.stardog.com/
13 https://github.com/rubensworks/graphql-to-sparql.js
14 https://www.hypergraphql.org
type __Context {
  Band: _@href(iri: "http://dbpedia.org/ontology/Band")
  Genre: _@href(iri: "http://dbpedia.org/ontology/Genre")
  label: _@href(iri: "http://www.w3.org/2000/01/rdf-schema#label")
  genre: _@href(iri: "http://dbpedia.org/ontology/genre")
}

type Band @service(id: "dbpedia-sparql") {
  label: [String] @service(id: "dbpedia-sparql")
  genre: Genre @service(id: "dbpedia-sparql")
}

type Genre @service(id: "dbpedia-sparql") {
  label: [String] @service(id: "dbpedia-sparql")
}

Figure 2.3: A HyperGraphQL schema definition.
such as the context used. Figure 2.3 shows an example of configuration, schema, and query in
HyperGraphQL.
CHAPTER 3

Web Data APIs over SPARQL
1 Although REST has a precise definition and not every Web API is a REST API, we will be using both terms, REST API
and Web API, in this book, highlighting the differences in every instance.
away almost any expressive power in their queries, telling developers precisely what data can be
queried and how, and therefore downgrading how much choice developers have at the client
side [Haupt et al., 2015]. Perhaps out of pure pragmatism, this seemed not to be an issue for
most developers who embraced the RESTful API paradigm. In production development environments, and outside the context of the Semantic Web and Knowledge Graphs, REST APIs
are the industry standard.
This apparent dichotomy between SPARQL and RESTful APIs lies at the core of this
book, and is specifically addressed in Chapters 4 and 5. Are these two models of Web querying
irreconcilable? Is it possible to offer developers ways to benefit from the two? Previous research
has benefited from the integration space that Knowledge Graphs offer.
An exemplary case is the OpenPHACTS platform [Williams et al., 2012], which builds
an API layer on top of its SPARQL endpoint, between SPARQL and application requests (see
Figure 3.1). This kind of stacked, separation-of-concerns (SoC) based architecture is the foundation of the methods, techniques, and tools we will be diving into in Chapters 4 and 5. But
before we do that, we need to gain a better understanding of the building blocks of such an architecture and their interactions: Web Data API specifications; the build-up of such specifications
for SPARQL-accessible Knowledge Graphs; and the pros and cons of taking this approach.
3.1 REST APIs

Generally speaking, API servers do two things.
1. They define and expose what resources and methods (see Section 2.1.1) are allowed
(also known as the API documentation or “API docs”).
2. They receive the HTTP requests against those allowed resources and methods, process
them, and return a result to the client.
An example of an API doc about rock bands is shown in Figure 3.2. The first route is
GET /bands/, and its description suggests that this will return a list of all available rock bands
behind the API. Conversely, the next one, POST /bands/, will create a new band in the backend
(possibly asking for details about such a band as parameters, as we'll see later; and surely asking
clients to be authenticated). The documentation goes on to describe the various ways in which clients
may ask for details about a specific rock band (GET /bands/{band_id}/), update its details (with
HTTP PUT), delete it, etc.
Originally, API docs were manually written by developers in HTML as technical documentation, so they could communicate to other developers what routes were available at their
servers, what parameters should be used in every route, etc. With time, this became an increasingly tiresome task that API publishers wanted to automate. This is how API docs
quickly became API specifications in need of standardization.
Figure 3.1: Overall concepts behind the OpenPHACTS architecture for enabling Knowledge
Graph access through REST APIs. The main idea is to build a Knowledge Graph to integrate
information from various sources; use SPARQL on top to leverage that integrated data space;
and build a REST API layer on top of SPARQL to enable easy consumption by users and
applications.
Figure 3.2: An example of an API doc, listing the routes (combinations of resources and meth-
ods) that are available for clients to call about rock bands.
2 https://www.openapis.org
3 https://swagger.io
4 See https://swagger.io/tools/open-source/open-source-integrations/ for a list of open source libraries.
openapi: 3.0.0
info:
  title: Music Bands API
  description: Provides information about music bands. This
    description supports [CommonMark](http://commonmark.org/) or
    HTML syntax.
  version: 1.2.1
servers:
  - url: http://api.rockbands.io/v1
    description: Main (production) server  # This description is optional
  - url: http://staging-api.rockbands.io
    description: Internal staging server for testing  # This description is optional
Listing 3.1: Example of the metadata and server routes sections of an OpenAPI specification.
the API adheres to, which may be useful to inform consumers of its features; and the title, description, and version of the API, which will mainly inform humans of what the API does and its development status. Server routes indicate the URLs that must be used as a base namespace when using the API, and to which the path (or method) names must be appended. These URLs are usually unique, as one API is typically based on one server; but having more might be useful for development purposes (for example, for hosting separate production and testing versions of the API).
paths:
  /bands/{band_id}:
    description: Returns a band by its id, and optionally its location details
    parameters:
      - name: band_id
        in: path
        required: true
        description: the band identifier
        schema:
          type: integer
      - name: includeLocation
        in: query
        description: whether to return the band's location
        required: false
        schema:
          type: boolean
    get:
      responses:
        '200':
          description: the band being returned
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:          # the unique band id
                    type: integer
                  name:        # the band's name
                    type: string
                    format: binary
                  location:    # the band's location
                    type: string
                    format: binary
Listing 3.2: Example of the definition of an API path in an OpenAPI specification.
A client can then issue a request such as:
GET http://api.rockbands.io/v1/bands/32?includeLocation=true
According to the specification in Listing 3.2, when such a call is successful it will return a
200 response5 with a well-defined result in its body in JSON format. As defined in the schema
section, this result will be one single object with three properties: the band ID (as an integer),
the band’s name (as a binary string), and the band’s location (if we requested it via the query
parameter includeLocation). For example (ignoring the HTTP headers):
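One plausible response body, with made-up values:

{
  "id": 32,
  "name": "Led Zeppelin",
  "location": "London, United Kingdom"
}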
Listing 3.3: Example of the implementation of an API path in Python, as defined in an excerpt
of an OpenAPI specification.
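A minimal sketch of such an implementation, assuming Flask for routing and stubbing the db object that the text below refers to:

from flask import Flask, jsonify, request

app = Flask(__name__)

class _Db:
    # Stand-in for the assumed db object: a prepared connection
    # able to answer SELECT queries.
    def select(self, query, *params):
        # A real implementation would run the SELECT; here we return a fixed row.
        return {'id': params[0], 'name': 'Led Zeppelin', 'location': 'London'}

db = _Db()

@app.route('/bands/<int:band_id>')
def get_band(band_id):
    include_location = request.args.get('includeLocation', 'false') == 'true'
    # Retrieve the band's details through the db interface
    row = db.select('SELECT id, name, location FROM bands WHERE id = ?', band_id)
    # Post-process the raw database values into the declared schema
    band = {'id': int(row['id']), 'name': str(row['name'])}
    if include_location:
        band['location'] = str(row['location'])
    # Return HTTP 200 with the dictionary converted to JSON, as the spec promises
    return jsonify(band), 200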
Here we assume a db object with a prepared connection able to answer SELECT queries, and we use that interface to retrieve the band's details. Importantly, as a next step we would need to post-process the results from the database: this might include data aggregations, transformations, and cleaning; in general, any processing needed to adapt the raw data from the database into values that fit our declared schema. Then, we fit those post-processed results into such a schema: in this case, by creating a Python dictionary with the fields required by the API schema, namely the band ID, its name, and its location. Finally, we return an HTTP 200 response with a conversion of that dictionary into a JSON object, which is what the API specification promises.
With an implementation such as this one for each of the paths defined in an OpenAPI
specification, we can see now how REST APIs, in particular by providing structured, stan-
dard specifications, and their accompanying implementations, can function as a communication
interface between data consumers (clients, applications) and data providers. The combination of HTTP methods (GET, POST, etc.) and resources (/bands/ with its parameters) is a simple yet powerful paradigm to access data, independent of which database engine is used to store this data. How does this fit, then, into the picture of querying Knowledge Graphs with SPARQL? In the next sections, we will see how REST APIs may work when data, rather than being stored in a local database, is remotely published in a Knowledge Graph.
However, we are only interested in a specific band identified by its ID (Led Zeppelin's ID in Wikidata is wd:Q2331). We could replace all instances of the variable ?item with that ID, but that would also replace the projection variable right after SELECT, which would be a syntax error. A much more elegant solution is to introduce a VALUES clause, which defers the job to SPARQL by requesting only results that bind ?item to the specific values in the clause. The problem is, of course, that we do not know beforehand what these values will be. To solve this, one option is to introduce a placeholder ___id___ that will be replaced with the value of the URL parameter band_id. With this placeholder in the query (hence sometimes also called a query template or parametrized query), our API implementation will systematically rewrite it to retrieve the details of only the band whose ID has been supplied, therefore fulfilling the API specification for this particular path.
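A sketch of such a parametrized query (assuming the Wikidata label service, and P740, "location of formation", for the band's location):

SELECT ?item ?itemLabel ?locationLabel WHERE {
  VALUES ?item { ___id___ }   # rewritten at request time to, e.g., wd:Q2331
  OPTIONAL { ?item wdt:P740 ?location . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}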
As the reader can notice, parametrizing queries in this way can be a general and effective method for mapping between the requirements of Open/REST API specifications and the particularities of SPARQL queries. SPARQL queries can still grow in complexity, requiring the addition of more API parameter placeholders; and some concrete features of the OpenAPI spec may need to be managed properly, like authentication or write permissions.

data = r.json()
response = {'id': data['item'], 'name': data['itemLabel']}
if includeLocation:
    response['location'] = data['locationLabel']

Listing 3.4: Example of the implementation of an API path in Python, querying the Wikidata SPARQL endpoint instead of a local database.
But under reasonable assumptions, we could imagine that the average use case for build-
ing APIs for Knowledge Graphs essentially consists in repeating this pattern over and over: map
the API specification into the function/route headers; map the API parameters into SPARQL
VALUES placeholders; send the rewritten query to the specified endpoint and collect the results;
and map the results into the desired schema to build the HTTP response. If we further assume that the entire API operates over one single SPARQL endpoint, which is often the case, we may
need to ask for the endpoint URL just once. In other words, we have reduced the complex-
ity of building Knowledge Graph APIs to properly managing, and documenting, a handful of
SPARQL queries.
3.3 LIMITATIONS OF KNOWLEDGE GRAPH APIS
As we have seen, combining REST APIs, OpenAPI specifications, and HTTP requests can be
a powerful tool to quickly build and deploy SPARQL-based Knowledge Graph APIs. We could
leave it at this point, and recommend all developers interested in exposing Knowledge Graph
APIs to adapt the example shown in Listing 3.4 to any operation or API path that they want to
expose to their clients.
However, such a recommendation would have three very important limitations: repetitive
work, query management, and controlling results. The rest of this book explains methods and
tools to overcome them.
CHAPTER 4
grlc: API Automation by Query Sharing
4.1 OVERVIEW
grlc is an open-source software package which automatically builds Knowledge Graph APIs
from a collection of SPARQL queries. It incorporates many features which allow for API flexibility (see Section 4.4). This allows grlc to tackle the limitations discussed in Section 3.3: a
systematic approach to API development avoids repetitive work; queries are stored separately
from application source code; queries can be made publicly available on the Web allowing for
API transparency; and queries can be stored in GitHub, allowing version controlled APIs.
So, what is the basic idea of grlc? Its main tenet is that bringing up Web APIs on top
of SPARQL endpoints (as we have seen in Chapter 3) should not be any harder than simply
publishing the SPARQL queries implementing the operations of those Web APIs. Very often,
such operations do not require anything more or anything less than mapping their API param-
eters to a SPARQL query, executing that query against the endpoint, and simply returning the
results to the user of the API. True, there will be cases that will not adjust to this schema: some
complex API operations might require more than one SPARQL query (for example, if the op-
eration needs to grab data from various data sources using different query languages) or may
need some pre-processing (for example, to the input parameters or to the queried databases) or
post-processing (for example, transforming the data with algorithms that are not supported in
SPARQL; or converting the results to another data model). But, in our experience, these cases
are rare and, for most users, a one-query SPARQL implementation of the operation is more
than enough.
grlc can automatically build Web APIs on top of SPARQL endpoints by leveraging this
one-query, one-operation philosophy, and three other important principles.
Avoiding repetitive labor As we have seen in Chapter 2, writing Web APIs involves a lot of code repetition around language-dependent facilities such as function headers, decorators, and parameters. In most cases, developers are not really interested in these mechanical, language-based translations: they just want to turn their SPARQL queries into a functional Web API.
Leveraging query management and documentation It turns out that, in many cases, devel-
opers are already saving their queries into proper management systems, and not just hard-coding
them as we saw in Chapter 2. With just a little bit of additional care in adding the right metadata on top of those queries, they can go from simple text files encoding one query to central resources specifying full-fledged, actionable API operations. Most of this chapter assumes that GitHub1 is such a system; but any other provider with similar features will also do.
Reusing publicly accessible queries The key to grlc's decoupling from queries is that it does not need to physically store them nor confine them to its system. All it needs is that SPARQL queries
are publicly available on the Web and de-referenceable via HTTP.2 This brings up a unique,
interesting feature of grlc: it assumes that SPARQL queries can (and should) be uniquely and
globally identifiable on the Web through URIs, for example http://mydomain.org/myqueries/
query1.sparql; just as Linked Data and RDF propose to identify any Web resource, so this should
apply to queries as well. Because SPARQL queries acquire this “full Web citizenship,” services
can retrieve and operate on them as they need; in grlc’s case, to automatically build Web APIs
using them. Circumstantially, most of this chapter also assumes GitHub as this HTTP server
and URI manager (because any file stored in GitHub acquires these features automatically); but,
again, this is not required and any other alternative providing similar features will work just fine.
By leveraging these principles, grlc can automatically create Web APIs on top of SPARQL
endpoints with zero coding, practically and effectively supporting developers to deploy Web
APIs for Knowledge Graphs in no time, saving resources, and helping them focus on other,
more critical Knowledge Graph architecture parts.
grlc is written in Python. Its source code is available at https://github.com/CLARIAH/
grlc under the permissive MIT license. It can be installed via the Python package manager pip.
1 https://github.com
2 This just means that an HTTP server will respond to HTTP requests to the URIs representing SPARQL queries with their actual content.
4.2 ARCHITECTURE
Figure 4.1: grlc's architecture. The URL of an incoming HTTP request is mapped to a SPARQL query stored in a query repository; grlc runs the query against a SPARQL endpoint and returns the SPARQL response to the client.
grlc is based on a very simple architecture. It consists of an HTTP server, which accepts HTTP requests from a client. The URL of each request is mapped to a SPARQL query stored in a query repository. This query repository can be GitHub, local storage, or a remote location available via HTTP. Once the SPARQL query has been resolved from the HTTP request, it can be executed against a SPARQL endpoint. Results from the SPARQL query are returned to the user as an HTTP response to the original request. Figure 4.1 illustrates this architecture.
Within this architecture, grlc supports and can carry out the following two use cases, illustrated in Figure 4.2.
1. The generation of valid and complete Open API-based (see Chapter 3) API specifications;
and the generation of UIs and documentation of such API specifications based on the
Swagger UI.3
2. The actual execution of the operations indicated in such API specifications.
Open API-based specifications and UIs This is shown at the top flow in Figure 4.2. Through
this service, a client can request either the Open API specification of a SPARQL query collection
3 https://swagger.io/tools/swagger-ui/
Figure 4.2: grlc’s basic features: generation of API specifications (top path) and execution of API
operations (bottom path)—with its interaction with external systems.
stored in an external query management repository (e.g., GitHub) in JSON; or the equivalent
Swagger UI built on top of such specification. In both cases, grlc makes various requests to
the external query management system, retrieving the query collection metadata (such as the
name, license, and creator of the query collection) and also the contents of the specific queries
in such collection. Those queries are annotated with YAML, as we will see later in this chapter,
to fully describe how the query should be interpreted in the context of a Web API (for example,
a summary of what the query does and the address of the SPARQL endpoint against which it
is meant to be executed). A YAML parser deals with those annotations, and together with a
parameter parser they put together a valid and complete Open API specification that is finally
returned to the client.
Operation/callname execution This is shown at the bottom flow in Figure 4.2. Through this
service, a client can request to directly execute one of the operations listed in the Open API
specification. When this is invoked, grlc requests the actual contents of the query from the query management system, retrieving, again, its body (the SPARQL syntax) and the YAML annotations. With these, grlc can map any passed HTTP parameter to its query rewriter, which replaces certain variables, as we will see later, with the values of those HTTP parameters. Finally, grlc sends the SPARQL query to the SPARQL endpoint indicated in the YAML metadata via HTTP, obtains its results, and returns them to the client in the requested format.
The public instance of grlc at https://grlc.io/ is useful here to show how grlc's API
implements these two services/use cases. Assuming that the query management system grlc uses
is GitHub, and that such system exposes SPARQL queries at https://github.com/:owner/:repo,
where :owner is a username and :repo is a repository name, then:
• https://grlc.io/api-git/:owner/:repo/spec returns the query collection Open API spec-
ification in JSON.
• https://grlc.io/api-git/:owner/:repo/ returns the full-blown, Swagger UI based on such
specification.
• https://grlc.io/api-git/:owner/:repo/:operation?p_1=v_1...p_n=v_n calls the opera-
tion/query named :operation with the indicated parameter values.
Nevertheless, GitHub is not the only query management system supported. The following
routes are also available via alternative APIs:
• https://grlc.io/api-local/ for queries stored locally.
• https://grlc.io/api-url/ for queries published elsewhere on the Web via any standard
HTTP server.4
The next section illustrates the use of all these services and use cases with practical examples.
4 Collections supplied in this fashion need to comply with a concrete specification; see https://github.com/CLARIAH/
grlc/blob/master/README.md.
4.3 WORKING WITH grlc
Listing 4.1: Querying the list of bands, albums, and genres from DBpedia.
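A minimal query of this kind, sketched with the DBpedia vocabulary used throughout this chapter (the published query may differ):

SELECT ?band ?album ?genre WHERE {
  ?band a dbo:Band ;
        dbo:genre ?genre .
  ?album dbo:artist ?band .
} LIMIT 100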
However, if she/he is not familiar with SPARQL, or does not want to bother with SPARQL syntax every time there is a need to look at this data, she/he could create this query once (potentially with the help of a SPARQL-savvy colleague), store this query in GitHub (for example https://github.com/CLARIAH/grlc-queries/blob/master/description.rq) and use grlc to execute this query, by simply visiting the URL https://grlc.io/api-git/CLARIAH/grlc-queries/description. This page directly shows the query results.
Executing such a query works for accessing a specific piece of information, but it is not very flexible and it would produce a very long list of results. Our user may want to access only information about rock bands, in which case the query is modified as in Listing 4.3:
Listing 4.3: Querying the list of rock bands and their albums from DBpedia.
The next day she/he might be interested in looking at alternative rock bands, and yet again
a different query would be generated:
Listing 4.4: Querying the list of alternative rock bands and their albums from DBpedia.
Clearly this does not scale. grlc provides a parameter mapping mechanism which allows a
query to define a variable which gets replaced by an API parameter. For example, our user could
use the query in Listing 4.5.
Listing 4.5: Querying the list of bands and albums of a specified genre from DBpedia.
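A sketch of such a query, using the parameter mapping syntax detailed in Section 4.4.2:

SELECT ?band ?album WHERE {
  ?band a dbo:Band ;
        dbo:genre ?_genre_iri .   # replaced with the value of the URL parameter "genre"
  ?album dbo:artist ?band .
}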
This query includes the special variable ?_genre_iri, which when executing the query via the API will be filled with the value of the URL parameter genre.5 Our user could then use two different URLs: http://grlc.io/api-git/CLARIAH/grlc-queries/enumerate?genre=http://dbpedia.org/resource/Rock_music for rock, or http://grlc.io/api-git/CLARIAH/grlc-queries/enumerate?genre=http://dbpedia.org/resource/Alternative_rock for alternative rock, or replace the genre parameter with any other genre.
In web API development, it is usual to use tools for API documentation such as the OpenAPI specification (Section 3.1.1). grlc makes use of OpenAPI to generate API documentation. For grlc APIs, the OpenAPI definition allows the user to understand which queries are available to access a Knowledge Graph, representing them in a web page (Figure 4.3). Going back to our earlier example, our user would be able to inspect the Swagger API specification of her/his queries by visiting: http://grlc.io/api-git/CLARIAH/grlc-queries.
4.4 FEATURES
Web APIs are more powerful when they provide sufficient flexibility to enable consumers to access data in an optimal way. grlc accommodates flexible APIs by means of query decorators and variable mapping.
Decorators are special keywords included at the top of the file containing a SPARQL
query. They provide special functionality, feeding the OpenAPI documentation generated by
grlc (Section 4.4.1), adding variable parameters to API endpoints (Section 4.4.2), modifying
how queries are executed (Section 4.4.3), and manipulating query responses (Section 4.4.4). All
decorators start with #+, for example:
#+ decorator_1 : value_1
#+ decorator_2 : value_2
In the next sections we describe the different decorators that can be used.
5 More details about variables are reported in Section 4.4.2.
• tags – OpenAPI allows for queries to be grouped together. Each query/operation can
be assigned a tag. Queries/operations with the same tag will be part of the same group.
This allows users to organize their queries/operations.
4.4.2 QUERY VARIABLES
The next group of decorators can modify how query variables are handled in the documentation
UI. Before describing these decorators, we should describe the function and syntax of query
variables in grlc.
Queries can be made more flexible by dynamically replacing variables in the query. This
is called parameter mapping, because HTTP parameters are mapped to these query variables.
In grlc, a variable is defined by prefixing a SPARQL variable name with an underscore, like this: ?_var. By default, variables will be replaced by literals. Other data types are supported and can be indicated by appending a further underscore-separated suffix, for example:
• ?_name_en will be interpreted as a literal written in English; this can be done with any
language tag.
• ?_name_integer will be interpreted as an integer.
• ?_name_iri will be interpreted as an IRI.
• ?_name_prefix_datatype will be interpreted as a literal with datatype ^^prefix:datatype.
If we want to allow a variable to have no replacement, two underscores (?__name) can be used to create an optional parameter.
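As an illustration, a made-up query mixing these forms might read:

SELECT ?band WHERE {
  ?band a dbo:Band ;
        rdfs:label ?_label_en ;   # literal parameter "label", tagged @en
        dbo:genre ?_genre_iri ;   # parameter "genre", interpreted as an IRI
        dbp:name ?__name .        # optional parameter "name"
}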
Additionally, there are decorators which affect how query variables appear in the docu-
mentation UI:
• defaults – sets a default value in the documentation. This value can be changed, so it
serves mostly as a hint to the user.
• enumerate – creates a dropdown menu in the documentation, listing possible values
for the variable.
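For example, the two can be combined as follows (the enumerate syntax follows the Appendix A solutions; the defaults form shown here is an assumption to be checked against the grlc documentation):

#+ defaults:
#+   - genre: http://dbpedia.org/resource/Rock_music
#+ enumerate:
#+   - genre:
#+     - http://dbpedia.org/resource/Rock_music
#+     - http://dbpedia.org/resource/Jazz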
4.5 EXERCISES
This section provides a series of exercises aimed at providing practical experience in turning a SPARQL query into a web API using grlc.
You can test your solution using https://grlc.io/, by reading your queries from a GitHub
repository,6 or from a specification file.7 You can also run your own copy of grlc and load your
queries from your local file system.8 All solutions are in Appendix A.
Exercise 4.1 Create an API that retrieves all bands from DBpedia.
Tip: Use the DBpedia ontology type dbo:Band.
Exercise 4.2 Create an API that lists bands that play either Rock or Jazz, and that have either
Liverpool or Los Angeles as hometown.
Tip 1: Use the DBpedia ontology types dbo:genre and dbo:hometown.
Tip 2: Use the grlc enumerate decorator.
Exercise 4.3 Expand the API from the previous exercise by adding documentation and making
sure your query can only be run on DBpedia SPARQL endpoint.
Tip: Use the summary, description, endpoint, and endpoint_in_url deco-
rators.
Exercise 4.4 Create an API that lists the name, genre and hometown of bands whose name
matches a given string.
Tip 1: Use the DBpedia property type dbp:name.
Tip 2: Because DBpedia uses Virtuoso, you can use the built in function
bif:contains.9
6 https://github.com/CLARIAH/grlc#from-a-github-repository
7 https://github.com/CLARIAH/grlc#specification-file-syntax
8 https://github.com/CLARIAH/grlc#from-local-storage
9 http://docs.openlinksw.com/virtuoso/rdfsparqlrulefulltext/
CHAPTER 5
Shaping JSON Results: SPARQL Transformer
Figure 5.1: A graph of movies, actors and directors (top), with 3 different ways of representing
it as a tree (bottom).
to convert this graph into JSON-LD, we have a choice. Where do we start? From which root
node do we start building our tree? In our case, we have at least three possible choices. We can
represent this data as an array of films, each of them containing information about the actors
and directors involved. Or, we can have the actors at the top, grouping together for each of them
the set of films in which they performed—with their respective directors. Or, finally, we can do
the same work by choosing the directors as root, then on a second level the films, finally the
actors. This choice certainly depends on our ultimate goal, i.e., how we want to use this data.
However, you should take into account that this choice is not trivial, and software and DBMSs do not make it for you.
If you have some experience with SPARQL endpoints, you also know that the JSON
output of a query is not JSON-LD.1 When performing a SELECT query, the standard structure
follows a W3C recommendation called SPARQL Query Result JSON Format.2 This structure
is made of two members:
• the head object contains some metadata, such as the list of variable names used in the
results; and
1 Exceptions are some kinds of queries, such as DESCRIBE, that output RDF data as results, and for which endpoints (e.g.,
Virtuoso) offer JSON-LD output.
2 SPARQL Query Result JSON Format recommendation https://www.w3.org/TR/sparql11-results-json/.
• the results object has a required key (bindings) containing the query results. It may also include other kinds of metadata, such as whether the results are ordered.
Let us now take a closer look at the bindings key, which is a list of objects, each representing a single query solution as a binding of variables: its keys are query variable names and its values are JSON objects always containing a type (uri, literal, or bnode) and a value represented as a string, and sometimes a datatype or an xml:lang tag. With this structure, reading any value from the results means accessing two object keys (results and bindings), then an array item, then again two keys (the involved variable and value), for a total of five levels, deeper in the structure than what a developer may expect.
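In Python terms, for instance, reading the first value of a variable from an already-parsed response requires a traversal like this (a minimal sketch):

# data is the parsed SPARQL results JSON
first = data["results"]["bindings"][0]   # levels 1-3: results, bindings, array item
band_uri = first["band"]["value"]        # levels 4-5: the variable name, then value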
As the query results represent all the valid solutions of the query, it is possible that two bindings differ only by a single field. We can take the query in Listing 4.1 as an example and run it against DBpedia. Looking at the results in any format (JSON included), we can see several repetitions in the band values: each band appears in the results N × M times, where N is the number of albums by the band and M the number of genres assigned to it. The more variables there are, the more repetitions are possible. From another perspective, it is as if we were reading tabular data: each line is a solution and each cell contains at most one value. This is confirmed by looking at the query results in Listing 4.2: in the first 7 bindings, Asia (band) appears 4 times (1 album * 4 genres), while Bauhaus (band) appears 2 times (1 album * 2 genres). To obtain complete information about a specific band, we need to collect all related bindings. In the worst scenario, pagination may split these bindings into different pages, hampering the collection task.
Having described the standard JSON format, we can now see the several barriers that obstruct data consumption from a web development perspective. In particular, developers have to accomplish four recurrent tasks.
1. Skip redundant metadata. Often, the metadata in the SPARQL output is simply not used by developers. For example, this is true for the list of used variable names in the head object: not only is it already known from the query, but one might also infer it directly from the results. In practice, developers may completely ignore this part and check for the availability of a certain property directly in the bindings.
2. Reducing and parsing. The value of a property is always wrapped in an object with the type and value attributes. All literals are returned as strings, without regard to the original RDF datatype, which is expressed separately. A simpler structure can be obtained by extracting the final value from its wrapping object and attaching it directly to the variable key, taking care of casting numbers and Booleans.
3. Merging. The repetition of values is one of the biggest limitations of bindings. Merging them based on common URIs is therefore crucial. Nevertheless, we need to make a choice about which node will be the root of the merged tree and the anchor for the merging. The developer needs to be in charge of this choice, and technology must empower her/him in this task (see the sketch after this list).
4. Mapping. There may exist specific needs for giving the results a particular structure or vocabulary, e.g., for feeding the data as input to a third-party library, or for embedding metadata in web pages following schema.org.
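To make tasks 2 and 3 concrete, here is a rough Python sketch of a reducer-and-merger (not how any specific library implements it; the anchor variable is the developer's choice):

def reduce_and_merge(bindings, anchor):
    """Flatten SPARQL bindings to plain values (task 2) and merge
    solutions sharing the same anchor URI (task 3)."""
    merged = {}
    for solution in bindings:
        obj = merged.setdefault(solution[anchor]["value"], {})
        for var, cell in solution.items():
            value = cell["value"]
            # Naive casting based on the declared datatype, if any
            datatype = cell.get("datatype", "")
            if datatype.endswith("integer"):
                value = int(value)
            elif datatype.endswith("boolean"):
                value = value == "true"
            values = obj.setdefault(var, [])
            if value not in values:
                values.append(value)
    return list(merged.values())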
The libraries seen in Section 2.2 solve one or more of these problems, but none of them handles all of them. In particular, the merging of results is rarely taken care of, leaving the developer alone against the curse of the bindings.
[
  {
    "id": "http://dbpedia.org/resource/Asia_(band)",
    "album": "http://dbpedia.org/resource/Axioms_(album)",
    "genre": [
      "http://dbpedia.org/resource/Art_rock",
      "http://dbpedia.org/resource/Progressive_rock",
      "http://dbpedia.org/resource/Album-oriented_rock",
      "http://dbpedia.org/resource/Arena_rock"
    ]
  },
  {
    "id": "http://dbpedia.org/resource/Bauhaus_(band)",
    "album": "http://dbpedia.org/resource/Swing_the_Heart...",
    "genre": [
      "http://dbpedia.org/resource/Gothic_rock",
      "http://dbpedia.org/resource/Post-punk"
    ]
  },
  ...
Listing 5.1: Result from query in Listing 4.1 obtained with SPARQL Transformer.
Avoiding the recurrent tasks introduced above is at the foundation of a different method for writing and executing queries. SPARQL Transformer (ST) [Lisena and Troncy,
2018, Lisena et al., 2019] is an approach for accessing data contained in SPARQL repositories
and obtaining them in a convenient JSON shape. It consists of two main components: (1) a
JSON-based query syntax and (2) a library that processes this syntax and uses it for retrieving
data.
{
  "proto": {
    "id": "?band",
    "album": "?album",
    "genre": "$dbo:genre$required"
  },
  "$where": [
    "?band a dbo:Band",
    "?album a schema:MusicAlbum",
    "?album dbo:artist ?band"
  ],
  "$limit": 100
}
Figure 5.2: An example of JSON query, in which it is possible to distinguish the prototype definition (describing the template, defining the replacements, options, and filters) from the $-modifiers ($where, $values, $limit, $distinct, $orderby, $groupby, $filter, ...).
With ST, it is possible to obtain a result similar to the one in Listing 5.1: each band is
represented using a single object, collecting all its albums and genres in a single array. This can
be obtained thanks to the merging capabilities of ST, driven by user choices. In addition, this
output does not include unnecessary metadata. Among other features, Booleans and numbers
are automatically parsed and you can easily define and change the requested language for literals
in localized queries.
{
  "language": "en",
  "value": "Asia (band)"
}
Listing 5.2: A leaf node in the results with the preserved language tag.
5.2.2 ARCHITECTURE
The SPARQL Transformer library, implementing the syntax described above, is available in two different open-source implementations:
• in JavaScript,4 published on NPM.5 This version offers the library as an ECMAScript Module, designed to work both in Node.js and in the browser; and
• in Python,6 published on PyPI.7 This version returns a dict object, which can be di-
rectly manipulated by the code—in scripts or notebooks—and it is particularly rec-
ommended in research applications. In addition, the output can be serialized in JSON
4 https://github.com/D2KLab/sparql-transformer
5 https://www.npmjs.com/package/sparql-transformer
6 https://github.com/D2KLab/py-sparql-transformer
7 https://pypi.org/project/SPARQLTransformer/
Figure 5.3: The SPARQL Transformer architecture: a Parser extracts from the JSON query a SPARQL query and a prototype (e.g., {"id": "?band", "album": "?v1", "genre": "?v2"}); a Query Performer sends the query to the SPARQL endpoint; a Shaper reshapes the JSON results according to the prototype.
using standard Python methods, for example to serve it in web applications using Flask
or Django.
Both versions present an architecture as in Figure 5.3. The input of the library is a JSON
query, following the syntax seen before. A Parser component reads the input and extracts (1) a
SPARQL SELECT query and (2) a clean version of the prototype, in which each JSON key is
assigned a placeholder SPARQL variable. This variable can be defined by the user or automat-
ically generated by the library. The SPARQL query is then passed to the Query Performer, in
charge of performing the request to the SPARQL endpoint and returning the results in the standard SPARQL JSON format. Finally, the Shaper accesses the results and reshapes each bind-
ing into the prototype template, leveraging the placeholders for inserting the values in the right
place. In addition, data-type parsing, pruning of nodes with no value (from OPTIONAL blocks),
and merging of the objects with common identifiers are applied.
Given that its interactions consist of SPARQL queries and HTTP requests, SPARQL
Transformer is compatible with any SPARQL endpoint by specifying its public address. It can
be used as middleware between the endpoint and the application, as an alternative to or in
combination with other SPARQL clients (Chapter 2). The Query Performer can be replaced by
the user with a custom one, for fulfilling different requirements for accessing the endpoint (e.g.,
authentication) or for integration into more complex environments.
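In the Python version, for instance, running a JSON query boils down to a single call (a sketch; the function and option names follow the package's PyPI documentation, and the endpoint is assumed):

from SPARQLTransformer import sparqlTransformer

json_query = {
    "proto": {"id": "?band", "genre": "$dbo:genre$required"},
    "$where": "?band a dbo:Band",
    "$limit": 5
}
# Returns results shaped like the prototype, with genres merged per band
out = sparqlTransformer(json_query, {"endpoint": "http://dbpedia.org/sparql"})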
Table 5.1: Differences in keywords between the two alternative SPARQL Transformer syntaxes, which may be found in the context of the query or of the final results.
In the generated SPARQL query, object variables such as ?v1 are automatically assigned. In the results (Listing 5.1), the bindings with the same id value are aggregated into a single object, while the other properties appear as arrays.
8 @type is a standard JSON-LD keyword used for expressing the RDF class of the described entity
Figure 5.4: A JSON query for retrieving the list of music bands, with labels, albums, and genres.
On the right, the equivalent SPARQL query. The arrows mark the transformation from a line
in the JSON query to a line in the SPARQL query.
The merging is applied at different levels of the JSON tree. Figure 5.4 extends the query
about bands with the addition of the band name and the genre label, introducing a second level
(the genre object) in the JSON tree. You can notice the automatically generated variables ?v1,
?v2, and ?v3. Each level has its own id, which serves as the subject for the sibling predicate
nodes. In this way, the subject for the first rdfs:label is ?band, while it is ?v2 for the second
one. The node genre.id is a predicate node with the role of anchor, which is handled as a special
case: not having a sibling anchor node—apart from itself—its subject is the parent anchor, in this
case ?band. Given this kind of query, the aggregation of results is performed as follows. First,
the bindings are merged on the root id. Those having the same value for ?band are grouped
and represented as a single object. In this object, sibling predicate nodes (name and album) are
expressed as an array of distinct values. Finally, all nested objects are merged on their own anchor.
Looking again at our example, for each band, the genre objects with the same genre.id are
merged. In this way, the final output contains one object for each band, including all its distinct labels, albums, and genres. A genre is represented only once for each band, with all its different labels.
Is the use of an id keyword mandatory for enabling the merging? Development needs
may require a different naming convention or a structure not including an id property. In those
cases, it is possible to select an alternative anchor by appending the inline modifier $anchor to the JSON value. For instance, we may change Figure 5.4 in order to store the band URI in a band property, replacing line 3 ("id": "?band") with:
"band": "?band$anchor"
While the resulting SPARQL query remains unchanged, the prototype avoids the id keyword and allows the user to select a different anchor.9 The latter inherits the role of subject for SPARQL predicates and that of merging node. If both are present, the $anchor modifier takes priority over the id keyword; in other words, the $anchor modifier allows having an id property in the final JSON structure without any special role in the query process.
10 https://tools.ietf.org/html/rfc7231#section-5.3.5
11 https://virtuoso.openlinksw.com/
Table 5.2: Full list of root $-modifiers.
• $where (string, array): Add a where clause in the triple format. Ex. "$where": "?id a dbo:City"
• $values (object): Set VALUES for the given variables; the presence of a lang tag or of the $lang attribute attached to the related property is taken into account. Ex. "$values": {"?id": ["dbr:Paris", "http://dbpedia.org/resource/Roma"]}
• $limit (number): LIMIT the SPARQL results.
• $limitMode ("query" (default) or "library"): Perform the LIMIT operation in the query or on the obtained results (library).
• $offset (number): OFFSET applied to the SPARQL results.
• $from (string, uri): FROM which the results are selected.
• $distinct (boolean, default true): Set the DISTINCT in the select.
• $orderby (string, array): Build an ORDER BY on the variables in the input. Ex. "$orderby": ["DESC(?name)", "?age"]
• $groupby (string, array): Build a GROUP BY on the variables in the input. Ex. "$groupby": "?id"
• $having (string, array): Allows declaring the content of HAVING. If it is an array, the items are concatenated by &&.
• $filter (string, array): Add the content as a FILTER. Ex. "$filter": "?myNum > 3"
• $prefixes (object): Set the prefix-to-namespace mappings used in the query. Ex. "foaf": "http://xmlns.com/foaf/0.1/"
• $lang (acceptedLangs, string): Default language preference for $bestlang (see Table 5.3), expressed through the Accept-Language standard. Ex. "$lang": "en;q=1, it;q=0.7 *;q=0.1"
• $langTag ("hide" or "show" (default)): When hide, language tags are not included in the output. Ex. hide → "label": "Bologna"; show → "label": {"value": "Bologna", "language": "it"}
Table 5.3: Full list of inline $-modifiers. All options are required unless differently specified.
• $required (n/a): When omitted, the clause is wrapped by OPTIONAL { ... }.
• $sample (n/a): Extract a single value for that property by adding a SAMPLE(?v) in the SELECT.
• $lang (:lang, string, optional): FILTER by language. In the absence of a language, pick the $lang in the root. Ex. $lang:it, $lang:en, $lang.
• $bestlang (:acceptedLangs, string, optional): Choose the best match (using BEST_LANGMATCH) over the languages, according to the list expressed through the Accept-Language standard or, in its absence, the root $lang.
• $var (:name, string): Assign a given SPARQL variable name to the property, so that it can be referenced in root $-modifiers (see Section 5.3).
• $anchor (n/a): Use this property as the merging anchor and as the subject of sibling predicates. Ex. "myprop": "?v$anchor" → results are merged on myprop.
• $reverse (n/a): Set this property to use the current variable as subject of the SPARQL predicate, rather than object.
• $count, $sum, $min, $max, $avg (n/a): Return the respective aggregate function (COUNT, SUM, MIN, MAX, AVG) on the variable.
Figure 5.5: An example of a JSON query (left) with the equivalent SPARQL query (right).
according to the selected language, without the need to individually change all impacted literal fields, marked once and for all with the inline modifiers.
When the language is selected in this way, including the language tag in the results may be too verbose. By setting the root $langTag modifier to hide, all string literals are represented as plain strings, avoiding the wrapping inside an intermediate object for reporting the language.
• All variables that need to appear in root modifiers are made explicit with a $var modifier. All the others are automatically assigned by the library (e.g., ?v2, ?v41), so that they cannot be referenced elsewhere in the query.
• $reverse saves adding a triple expression in $where, as we have done in Figure 5.4.
• When not marked with $required, the nodes are transformed into OPTIONAL triples
in the SPARQL query. In the case of author.uri (second-level anchor), all triples
produced by the nodes in author are wrapped in an optional block.
• We ask for all distinct labels of works, while we want only the English one for the
author. The language can be modified for all involved fields from the root $lang.
• The museum is defined in $values, acting as a parameter. The value can be easily
changed, reusing this query for other museums.
5.5 EXERCISES
What follows is a list of exercises for practicing the SPARQL Transformer syntax, based on real-
world data coming from DBpedia. Take into account that these may be solved in different ways,
all of them providing similar results.
For solving the exercises, it is possible to use the SPARQL Transformer Playground,12 a
web application for writing and testing JSON queries. The application provides live conversion
into SPARQL while writing, the possibility to retrieve results, and a comparison between the
original and reshaped data. All solutions are in Appendix A.
Exercise 5.1 Write a JSON query equivalent to the following SPARQL query.
SELECT DISTINCT *
WHERE {
  ?id a dbo:Band .
  ?id rdfs:label ?band_label .
  ?id dbo:genre ?genre .
  ?genre rdfs:label ?genre_label
}
LIMIT 100
Exercise 5.2 For each NBA player, retrieve his URI identifier, name, a single image (if available)
and his birth date (if available).
Tip: you may want to start by looking at LeBron James in DBpedia.
12 https://d2klab.github.io/sparql-transformer/
Exercise 5.3 For each team in the NBA, retrieve the name of the team and the id and name for all players of the team. For any name, be sure to pick the best label for an English-speaking public. Improve the readability of the results by hiding the language tag.
Exercise 5.4 For each country using the Euro as currency, retrieve its id, name, and the list of its cities, together with city names and populations. Make sure to pick exactly the English labels and to hide the language tag. Limit the results to the first 100.
Tip: you may start by looking at Athens in DBpedia.
Exercise 5.5 For each country using the Euro as currency, retrieve the id, the name, and the
total number of cities in the country. Order by descending number of cities. Make
sure to pick exactly the English labels and to hide the language tag.
Exercise 5.6 Retrieve the list of Italian regions, their names and the list of cities in the region (id
+ label). Limit to the first 100 results and pick labels in Italian, hiding the language
tag. Use the JSON-LD syntax. Make sure that your query is easily extensible to
other countries and languages, for example France and French or United States
and English.
Tip: you may start by looking at Piedmont in DBpedia.
CHAPTER 6
Applications
The tools and principles that we have explained in this book, in particular those behind the tools grlc (described in Chapter 4) and SPARQL Transformer (described in Chapter 5), originated in research programs around Knowledge Graphs and the Semantic Web,
in particular the Dutch national program CLARIAH1 [Meroño-Peñuela et al., 2020] and the
French project DOREMUS (DOing REusable MUSical data)2 [Achichi et al., 2018], from
2016 onward.
Ever since, we have observed adoption of these methods and tools, often beyond the limits of their incubating projects, in a number of application domains and in various projects, companies, and institutions. For example, from the start of its operation in July 2016, the public instance of grlc3 has attracted 4,948 unique visitors with a 39.92% return rate, generating 9,840 sessions. grlc
has also attracted the attention of external developers, who have sent 147 pull requests that have
been integrated into the master branch. Its docker container has been pulled 2.5K times.4 A list
of community maintained queries and matching APIs is available at https://git.io/grlc-usage,
currently counting 444 publicly shared queries. SPARQL Transformer has been downloaded from
npm and PyPI thousands of times since 2017 (an average of 100 downloads per month).
In this chapter, we describe some of the most relevant success cases in the applica-
tion of the methods and tools described in this book, extending on previously published
cases [Lisena et al., 2019, Meroño-Peñuela and Hoekstra, 2017]. For each of these cases, we
explain what the challenges are and what the situation was before and after deploying solutions based on these methods and tools, emphasizing the specific requirements that were considered critical and the extent to which they were addressed. We group them into three categories: applica-
tions of grlc; applications of SPARQL Transformer; and applications that leverage the combined
capabilities of both. We end the chapter with a reference table with links to relevant sources of
code, documentation, tools, and examples.
1 https://clariah.nl
2 https://www.doremus.org
3 https://grlc.io
4 https://hub.docker.com/r/clariah/grlc
6.1 grlc
6.1.1 LINKED DATA PLATFORM FOR GENETICS RESEARCH
In genetics research, it is increasingly common to analyze fully sequenced genomes and identify
traits associated with specific genes. However, information about genetic traits is usually available in disparate sources, such as the scientific literature and public biological databases. Genomics
researchers need to combine these information sources in order to identify genes of particular
interest.
Researchers from the Plant Breeding group at Wageningen University and the Nether-
lands eScience Center built a Linked Data platform called pbg-ld which combines data extracted
from the Europe PubMed Central (PMC) repository and genomic annotations from the Sol Ge-
nomics Network (SGN), UniProt, and Ensembl Plants databases. This analytical platform al-
lowed its users to access relevant information on Solanaceae species.
A collection of SPARQL queries5 powers this platform, allowing users to count genomic
features in a genome graph, extract the genomic location of specific features, and extract annota-
tions from specific genes, among other features. Users do not need to understand or modify the
SPARQL queries, but instead are able to access the data directly through a web API. A Jupyter
notebook queries the API endpoints, and presents results to the users in the form of summary
tables or bar charts.
This platform provides users with a way to access the most used datasets for candidate
gene discovery in tomato and potato species. In turn, this increases the transparency for users
who wish to visualize data on this platform or extend this tool for other crop species.
Further details of this example can be found in Singh et al. [2020].
6.1.2 NANOPUBLICATIONS
Nanopublications [Groth et al., 2010] are a Linked Data format for scholarly data publishing
that has received considerable uptake in the last few years. In contrast to common Linked Data
publishing practice, nanopublications consist of atomic information snippets of assertion data,
providing a container format to link provenance information and metadata to those assertions.
While the nanopublications format is domain-independent, the datasets that have become avail-
able in this format are mostly from Life Science domains, including data about diseases, genes,
proteins, drugs, biological pathways, and biotic interactions. More than 10 million such nanop-
ublications have been published, which now form a valuable resource for studies on the domain
level of the given Life Science domains as well as on the more technical levels of provenance
modeling and heterogeneous Linked Data.
In order to facilitate easier and more powerful access to nanopublications, Kuhn et al.
[2018] provided a Linked Data API to access the full set of nanopublications available on the
network. This API is powered by a grlc server in front of a GraphDB triple store instance with a
5 https://github.com/candYgene/queries
SPARQL endpoint.6
This API offers a standard entry point to the data in the nanopublications
network, which any Linked Data client can consume via HTTP without specific knowledge of
SPARQL or RDF. A Python package7 integrates publishing and retracting nanopublications
from the network using grlc, together with many other features.
6 The API is available at http://purl.org/nanopub/api; the underlying parametrized queries can be found at https://github.
com/peta-pico/nanopub-api/.
7 https://github.com/fair-workflows/nanopub
8 http://clariah.nl/
Figure 6.1: The use of grlc makes Knowledge Graphs accessible from any HTTP-compatible application.
6.2.2 FADE
FADE (Filling Automatically Dialog Events)17 is a component for chatbot applications for
extracting data from a Knowledge Graph. Its core feature consists of the automatic extraction
13 https://github.com/D2KLab/explorer
14 https://git.io/adasilk
15 https://git.io/memad-explorer
16 The code is extracted from the ADASilk configuration and is fully available at https://github.com/silknow/adasilk/blob/
main/config/routes/object.js.
17 https://github.com/ehrhart/fade
{
  view: 'browse', // 'browse' creates a search page
  showInNavbar: true,
  rdfType: 'ecrm:E22_Man-Made_Object',
  uriBase: 'http://data.silknow.org/object',
  filters: [{ // filters to appear in the advanced search
    id: 'material',
    whereFunc: () => [ // added to the base query
      '?production ecrm:P126_employed ?material',
      'OPTIONAL { ?broaderMaterial (skos:member|skos:narrower)* ?material }'
    ],
    filterFunc: (values) => { // added to the base query
      return [values.map((val) =>
        `?material = <${val}> || ?broaderMaterial = <${val}>`
      ).join(' || ')];
    }
  }],
  query: { // base query
    '@graph': [{
      '@type': 'ecrm:E22_Man-Made_Object',
      '@id': '?id',
      '@graph': '?g',
      label: '$rdfs:label',
      identifier: '$dc:identifier',
      description: '$ecrm:P3_has_note',
    }],
    $where: 'GRAPH ?g { ?id a ecrm:E22_Man-Made_Object }'
  }
}
Listing 6.1: Extract of the ADASilk configuration showing the usage of SPARQL Transformer
in KG Explorer.
of dictionaries of entries, which are then processed by the natural language understanding (NLU) unit. FADE is integrated in the tourist assistant MinoTour.18
18 https://minotour.eurecom.fr/
Figure 6.2: Screenshot of the Tapas interface (from Lisena et al. [2019]).
A configuration file in JSON is used to define the different intents.19 The terms that refer
to entities and that need to be recognized by the NLU are extracted with SPARQL Transformer.
The JSON query is directly included in the JSON configuration and the output values are shaped
in order to fit the data structure expected by the other components. The merging capabilities are
crucial for correctly handling synonyms for the same entity.
19 In chatbot development, an intent is the goal that the user wants to achieve when sending a message to the system. An
important part of chatbot development is indeed the intent detection, which aims to assign to each message the right goal.
20 https://github.com/peta-pico/tapas
Table 6.1: Links to resources and tools presented in this book
Name URL
grlc
grlc website https://grlc.io/
Demo api https://grlc.io/api-git/CLARIAH/grlc-queries/
Repository https://github.com/CLARIAH/grlc
grlc queries in GitHub https://git.io/grlc-usage
Docker Image https://hub.docker.com/r/clariah/grlc
SPARQL Transformer
Repository ( JS) https://github.com/D2KLab/sparql-transformer
Repository (Python) https://github.com/D2KLab/py-sparql-transformer
Playground https://d2klab.github.io/sparql-transformer
SWApi Tutorial https://api4kg.github.io/swapi-tutorial/ (with videos)
Example and Exercises https://github.com/api4kg/exercises
and JSON background prefer these kinds of interfaces that return and show query results with
some degree of object nesting, according to their needs.
6.4 DEMOS/LINKS
Table 6.1 includes all links referring to resources and tools presented in this book. Of particular
relevance to the reader who wants to solidify the concepts already seen in this book, we point
out the Tutorial SPARQL Endpoints and Web APIs (SWApi), which took place at the 19th In-
ternational Semantic Web Conference (ISWC 2020). The website offers teaching material and also
video tutorials about SPARQL Transformer and grlc.
APPENDIX A
Solutions
A.1 CHAPTER 4
Exercise 4.1 Create an API that retrieves all bands from DBpedia.
Tip: Use the DBpedia ontology type dbo:Band.
Solution 4.1
#+ summary: Lists all DBpedia dbo:Band
#+ endpoint: http://dbpedia.org/sparql
#+ method: GET

SELECT DISTINCT ?s
WHERE {
  ?s a dbo:Band
}
Exercise 4.2 Create an API that lists bands that play either Rock or Jazz, and that have either
Liverpool or Los Angeles as hometown.
Tip 1: Use the DBpedia ontology types dbo:genre and dbo:hometown.
Tip 2: Use the grlc enumerate decorator.
Solution 4.2

#+ summary: Bands by city and genre
#+ endpoint: http://dbpedia.org/sparql
#+ tags:
#+   - dbpedia
#+ method: GET
#+ enumerate:
#+   - genre:
#+     - http://dbpedia.org/resource/Rock_music
#+     - http://dbpedia.org/resource/Jazz
#+   - hometown:
#+     - http://dbpedia.org/resource/Liverpool
#+     - http://dbpedia.org/resource/Los_Angeles
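A query body matching these decorators (our sketch, using the variable syntax of Section 4.4.2):

SELECT DISTINCT ?band
WHERE {
  ?band a dbo:Band ;
        dbo:genre ?_genre_iri ;
        dbo:hometown ?_hometown_iri .
}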
Exercise 4.3 Expand the API from the previous exercise by adding documentation and making
sure your query can only be run on DBpedia SPARQL endpoint.
Tip: Use the summary, description, endpoint, and endpoint_in_url deco-
rators.
Solution 4.3

#+ summary: Bands by city and genre
#+ description:
#+   This API endpoint lists bands from DBpedia
#+   that play either Rock or Jazz, and that have
#+   either Liverpool or Los Angeles as hometown.
#+ endpoint: http://dbpedia.org/sparql
#+ endpoint_in_url: false
#+ tags:
#+   - dbpedia
#+ method: GET
#+ enumerate:
#+   - genre:
#+     - http://dbpedia.org/resource/Rock_music
#+     - http://dbpedia.org/resource/Jazz
#+   - hometown:
#+     - http://dbpedia.org/resource/Liverpool
#+     - http://dbpedia.org/resource/Los_Angeles
Exercise 4.4 Create an API that lists the name, genre, and hometown of bands whose name matches a given string.
Tip 1: Use the DBpedia property dbp:name.
Tip 2: Because DBpedia uses Virtuoso, you can use the built-in function bif:contains.¹
Solution 4.4
#+ summary: Bands whose name matches a given string
#+ description:
#+   This API endpoint lists the name, genre, and
#+   hometown of DBpedia bands whose name matches
#+   a given string.
#+ endpoint: http://dbpedia.org/sparql
#+ endpoint_in_url: false
#+ tags:
#+   - dbpedia
#+ method: GET
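As before, the decorators are followed by the query body. A minimal sketch, assuming a grlc free-text parameter ?_name passed to Virtuoso's bif:contains:

# Sketch only: ?_name as a grlc text parameter is an assumption.
SELECT DISTINCT ?name ?genre ?hometown
WHERE {
  ?band a dbo:Band ;
        dbp:name ?name ;
        dbo:genre ?genre ;
        dbo:hometown ?hometown .
  ?name bif:contains ?_name
}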
¹ http://docs.openlinksw.com/virtuoso/rdfsparqlrulefulltext/
A.2 CHAPTER 5
Exercise 5.1 Write a JSON query equivalent to the following SPARQL query.
SELECT DISTINCT *
WHERE {
  ?id a dbo:Band .
  ?id rdfs:label ?band_label .
  ?id dbo:genre ?genre .
  ?genre rdfs:label ?genre_label
}
LIMIT 100
Solution 5.1
{
  "proto": {
    "id": "?id",
    "label": "$rdfs:label$required",
    "genre": {
      "id": "$dbo:genre$required",
      "label": "$rdfs:label$required"
    }
  },
  "$where": "?id a dbo:Band",
  "$limit": 100
}
Exercise 5.2 For each NBA player, retrieve their URI identifier, name, a single image (if available), and birth date (if available).
Tip: you may want to start by looking at LeBron James in DBpedia.
Solution 5.2
{
  "proto": {
    "id": "?id",
    "name": "$rdfs:label$required",
    "league": "$dbo:league$var:league",
    "image": "$foaf:depiction$sample",
    "birthDate": "$dbo:birthDate"
  },
  "$values": {
    "league": "dbr:National_Basketball_Association"
  }
}
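To see what the modifiers do, it helps to look at the SPARQL this JSON query roughly corresponds to: $sample wraps the image in a SAMPLE aggregate, $var:league names the variable that $values constrains, and properties without $required become OPTIONAL. A simplified sketch (variable names are illustrative; the query SPARQL Transformer actually generates may differ):

# Simplified sketch; not the exact query generated by SPARQL Transformer.
SELECT ?id ?name (SAMPLE(?image) AS ?img) ?birthDate
WHERE {
  VALUES ?league { dbr:National_Basketball_Association }
  ?id dbo:league ?league ;
      rdfs:label ?name .
  OPTIONAL { ?id foaf:depiction ?image }
  OPTIONAL { ?id dbo:birthDate ?birthDate }
}
GROUP BY ?id ?name ?birthDate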
Exercise 5.3 For each NBA team, retrieve the name of the team and the id and name of all players in the team. For each name, be sure to pick the best label for an English-speaking public. Improve the readability of the results by hiding the language tag.
Solution 5.3
{
  "proto": {
    "team": "?team$anchor",
    "name": "$rdfs:label$required$bestlang",
    "players": {
      "id": "$dbo:team$reverse",
      "name": "$rdfs:label$required$bestlang"
    }
  },
  "$where": "?team dct:subject dbc:National_Basketball_Association_teams",
  "$lang": "en",
  "$langTag": "hide"
}
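Two modifiers do the heavy lifting here: $anchor makes ?team the root around which results are nested, and $reverse inverts the direction of dbo:team, so players are matched as subjects of dbo:team rather than objects. $bestlang, combined with "$lang": "en", picks the label that best matches the requested language.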
Exercise 5.4 For each country using the Euro as currency, retrieve its id, name, and the list of its cities, together with city names and populations. Make sure to pick exactly the English labels and to hide the language tag. Limit the results to the first 100.
Tip: you may start by looking at Athens in DBpedia.
Solution 5.4
{
  "proto": {
    "state": "?state$anchor",
    "name": "$rdfs:label$required$lang:en",
    "cities": {
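The remainder of the query follows the pattern of Solution 5.5 below; a completion consistent with the exercise (dbo:populationTotal as the population property is an assumption) would be:

{
  "proto": {
    "state": "?state$anchor",
    "name": "$rdfs:label$required$lang:en",
    "cities": {
      "id": "$dbo:country$reverse$var:city",
      "name": "$rdfs:label$required$lang:en",
      "population": "$dbo:populationTotal"
    }
  },
  "$where": [
    "?state dbo:currency dbr:Euro",
    "?city a dbo:City"
  ],
  "$langTag": "hide",
  "$limit": 100
}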
Exercise 5.5 For each country using the Euro as currency, retrieve the id, the name, and the
total number of cities in the country. Order by descending number of cities. Make
sure to pick exactly the English labels and to hide the language tag.
Solution 5.5
{
  "proto": {
    "state": "?state$anchor",
    "name": "$rdfs:label$required$lang:en",
    "cities": "$dbo:country$reverse$var:city$count"
  },
  "$where": [
    "?state dbo:currency dbr:Euro",
    "?city a dbo:City"
  ],
  "$orderby": "desc(?city)",
  "$langTag": "hide"
}
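The $count modifier turns the city variable into a COUNT aggregate, with the remaining projected variables grouped. Roughly, in SPARQL (simplified; the generated query may differ):

# Simplified sketch; not the exact query generated by SPARQL Transformer.
SELECT ?state ?name (COUNT(DISTINCT ?city) AS ?cities)
WHERE {
  ?state dbo:currency dbr:Euro ;
         rdfs:label ?name .
  FILTER(lang(?name) = "en")
  ?city dbo:country ?state ;
        a dbo:City .
}
GROUP BY ?state ?name
ORDER BY DESC(?cities)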
Exercise 5.6 Retrieve the list of Italian regions, with names and the list of cities in the region (id
+ label). Limit to the first 100 results and pick labels in Italian, hiding the language
tag. Use the JSON-LD syntax. Make sure that your query is easily extensible to
other countries and languages, for example France and French or United States
and English.
Tip: you may start by looking at Piedmont in DBpedia.
Solution 5.6
{
  "@context": "http://example.org/",
  "@graph": [{
    "@type": "AdministrativeArea",
    "@id": "?id",
    "name": "$rdfs:label$required$lang",
    "country": "$dbo:country$required$var:country",
    "city": {
      "@id": "$dbo:region$required$reverse$var:city",
      "name": "$rdfs:label$required$lang"
    }
  }],
  "$where": [
    "?id a dbo:AdministrativeRegion",
    "?city a dbo:City"
  ],
  "$values": {
    "country": "dbr:Italy"
  },
  "$lang": "it",
  "$langTag": "hide",
  "$limit": 100
}
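The query is extensible as requested because the country and the language are isolated in the $values and $lang entries; adapting it to France, for instance, only requires changing those two entries:

"$values": { "country": "dbr:France" },
"$lang": "fr"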
Authors’ Biographies
ALBERT MEROÑO-PEÑUELA
Albert Meroño-Peñuela is a Lecturer (Assistant Professor) in Computer Science and Knowledge Engineering in the Department of Informatics of King’s College London (United Kingdom). He obtained his Ph.D. at the Vrije Universiteit Amsterdam in 2016, under the supervision of Frank van Harmelen, Stefan Schlobach, and Andrea Scharnhorst. His research focuses on Knowledge Graphs, Web Querying, and Cultural AI. Albert has participated in large Knowledge Graph infrastructure projects in Europe, such as CLARIAH, DARIAH, and Polifonia H2020, and has published research at ISWC and ESWC and in the Semantic Web Journal and the Journal of Web Semantics. He is, together with Rinke Hoekstra, the original author of grlc, and together with Carlos Martínez-Ortiz, its current main maintainer.
PASQUALE LISENA
Pasquale Lisena is a researcher in the Data Science department at EURECOM, Sophia Antipolis (France). He obtained his Ph.D. in Computer Science from Sorbonne University, Paris, in 2019, with a thesis on music representation and recommendation, under the supervision of Raphaël Troncy. His research focuses on the Semantic Web, Knowledge Graphs, and Information Extraction, with particular application to the domain of Digital Humanities, contributing to AI projects such as DOREMUS, SILKNOW, and Odeuropa. Pasquale’s work has been published in leading conferences in the field, such as ISWC, EKAW, and ISMIR. Given his background as a web developer, he is also interested in data usability in web applications and in human-computer interaction. He is the main author of SPARQL Transformer.
CARLOS MARTÍNEZ-ORTIZ
Carlos Martínez-Ortiz is a community manager at the Netherlands eScience Center. He obtained his Ph.D. in Computer Science at the University of Exeter (United Kingdom). Afterward, he worked on various research projects at the University of Exeter, Plymouth University, and the eScience Center, in collaboration with industrial and academic partners in fields as diverse as veterinary science, digital humanities, and the life sciences. He has been involved in large projects such as CLARIAH and ODISSEI and works in close collaboration with partners such as SURF, DANS, and the Software Sustainability Institute. His current research interests include linked open data, natural language processing, and software sustainability.