
Validating RDF Data

Jose Emilio Labra Gayo
University of Oviedo

Eric Prud’hommeaux
W3C/MIT and Micelio

Iovka Boneva
University of Lille

Dimitris Kontokostas
University of Leipzig

SYNTHESIS LECTURES ON SEMANTIC WEB: THEORY AND TECHNOLOGY #16

Morgan & Claypool Publishers
Copyright © 2018 by Morgan & Claypool

Validating RDF Data


Jose Emilio Labra Gayo, Eric Prud’hommeaux, Iovka Boneva, and Dimitris Kontokostas
www.morganclaypool.com

ISBN: 9781681731643 paperback
ISBN: 9781681731650 ebook
ISBN: 9781681731667 e-pub
ABSTRACT
RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing
to zoology. Requirements for detecting bad data differ across communities, fields, and tasks,
but nearly all involve some form of data validation. This book introduces data validation and
describes its practical use in day-to-day data exchange.
The Semantic Web offers a bold, new take on how to organize, distribute, index, and share
data. Using Web addresses (URIs) as identifiers for data elements enables the construction of
distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an
information revolution, and also like the Web, it is encumbered by data quality issues. The quality of
Semantic Web data is compromised by the lack of resources for data curation, for maintenance,
and for developing globally applicable data models.
At the enterprise scale, these problems have conventional solutions. Master data management
provides an enterprise-wide vocabulary, while constraint languages capture and enforce
data structures. Filling a need long recognized by Semantic Web users, shapes languages provide
models and vocabularies for expressing such structural constraints.
This book describes two technologies for RDF validation: Shape Expressions (ShEx) and
Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the
two, and some example applications.

KEYWORDS
RDF, ShEx, SHACL, shape expressions, shapes constraint language, data quality,
web of data, Semantic Web, linked data
Contents
Preface

Foreword by Phil Archer

Foreword by Tom Baker

Foreword by Dan Brickley and Libby Miller

1 Introduction
1.1 RDF and the Web of Data
1.2 RDF: The Good Parts
1.3 Challenges for RDF Adoption
1.4 Structure of the Book
1.5 Conventions and Notation

2 The RDF Ecosystem
2.1 RDF History
2.2 RDF Data Model
2.3 Shared Entities and Vocabularies
2.4 Technologies Related with RDF
2.4.1 SPARQL
2.4.2 Inference Systems: RDF Schema and OWL
2.4.3 Linked Data, JSON-LD, Microdata, and RDFa
2.5 Summary
2.6 Suggested Reading

3 Data Quality
3.1 Non-RDF Schema Languages
3.1.1 UML
3.1.2 SQL and Relational Databases
3.1.3 XML
3.1.4 JSON
3.1.5 CSV
3.2 Understanding the RDF Validation Problem
3.3 Previous RDF Validation Approaches
3.3.1 Query-based Validation
3.3.2 Inference-based Approaches
3.3.3 Structural Languages
3.4 Validation Requirements
3.4.1 General Requirements
3.4.2 Graph-based Requirements
3.4.3 RDF Data Model Requirements
3.4.4 Data-modeling-based Requirements
3.4.5 Expressiveness of Schema Language
3.4.6 Validation Invocation Requirements
3.4.7 Usability Requirements
3.5 Summary
3.6 Suggested Reading

4 Shape Expressions
4.1 Use of ShEx
4.2 First Example
4.3 ShEx implementations
4.4 The Shape Expressions Language
4.4.1 Shape Expressions Compact Syntax
4.4.2 Invoking Validation
4.4.3 Structure of Shape Expressions
4.4.4 Start Shape Expression
4.5 Node Constraints
4.5.1 Node kinds
4.5.2 Datatypes
4.5.3 Facets on Literals
4.5.4 Value Sets
4.6 Shapes
4.6.1 Triple Constraints
4.6.2 Groupings
4.6.3 Cardinalities
4.6.4 Choices
4.6.5 Nested Shapes
4.6.6 Inverse Triple Constraints
4.6.7 Repeated Properties
4.6.8 Permitting other Triples
4.7 References
4.7.1 Shape References
4.7.2 Recursion and Cyclic References
4.7.3 External Shapes
4.7.4 Labeled Triple Expression
4.7.5 Annotations
4.8 Logical Operators
4.8.1 Conjunction
4.8.2 Disjunction
4.8.3 Negation
4.9 Shape Maps
4.9.1 Fixed Shape Maps
4.9.2 Query Shape Maps
4.9.3 Result Shape Maps
4.9.4 JSON Representation
4.9.5 Chaining Validation Workflows
4.10 Semantic Actions
4.11 ShEx and Inference
4.12 Importing schemas
4.13 RDF and JSON-LD Syntax
4.14 Summary
4.15 Suggested Reading

5 SHACL
5.1 Simple Example
5.2 SHACL Implementations
5.3 Basic Definitions: Shapes Graphs, Node, and Property Shapes
5.4 Importing other Shapes Graphs
5.5 Validation Report
5.6 Shapes
5.6.1 Node shapes
5.6.2 Property Shapes
5.6.3 Constraint Components
5.6.4 Human Friendly Messages
5.6.5 Declaring Shape Severities
5.6.6 Deactivating Shapes
5.7 Target Declarations
5.7.1 Target Node
5.7.2 Target Class
5.7.3 Implicit Class Target
5.7.4 Target Subjects Of
5.7.5 Target Objects Of
5.8 Cardinality
5.9 Constraints on Values
5.9.1 Datatypes
5.9.2 Class of Values
5.9.3 Node Kinds
5.9.4 Sets of Values
5.9.5 Specific Value
5.10 Datatype Facets
5.10.1 Value Ranges
5.10.2 String-based Constraints
5.10.3 Language-based Constraints
5.11 Logical Constraints: and, or, not, xone
5.11.1 AND
5.11.2 OR
5.11.3 Exactly One
5.11.4 Not
5.11.5 Combining Logical Operators
5.12 Shape-based Constraints
5.12.1 Shape References and Recursion
5.12.2 Qualified Value Shapes
5.13 Closed Shapes
5.14 Property Pair Constraints
5.15 Non-validating SHACL Properties
5.16 SHACL-SPARQL
5.16.1 SPARQL Constraints
5.16.2 SPARQL-based Constraint Components
5.17 SHACL and Inference Systems
5.18 SHACL Compact Syntax
5.19 SHACL Rules and Advanced Features
5.20 SHACL Javascript
5.21 Summary
5.22 Suggested Reading

6 Applications
6.1 Describing a Linked Data Portal
6.1.1 WebIndex in ShEx
6.1.2 WebIndex in SHACL
6.2 Describing Clinical Records—FHIR
6.2.1 FHIR as Linked Data
6.2.2 Consistency constraints
6.2.3 FHIR/RDF Development
6.2.4 Generic Properties
6.3 Springer Nature SciGraph
6.4 DBpedia Validation Use Cases
6.4.1 Ontology-based Validation
6.4.2 RDF Mappings Validation
6.4.3 Validating Link Contributions with SHACL
6.4.4 Ontology Validation with SHACL
6.5 ShEx for ShEx
6.6 SHACL in SHACL
6.7 Summary
6.8 Suggested Reading

7 Comparing ShEx and SHACL
7.1 Common Features
7.2 Syntactic Differences
7.3 Foundation: Schema vs. Constraints
7.4 Invoking Validation
7.5 Modularization and Reusability
7.6 Shapes, Classes, and Inference
7.7 Violation Reporting and Severities
7.8 Default Cardinalities
7.9 Property Paths
7.10 Recursion
7.11 Property Pair Constraints and Uniqueness
7.12 Repeated Properties
7.13 Exactly One and Alternatives
7.14 Treatment of Closed Shapes
7.15 Stems and Stem Ranges
7.16 Annotations
7.17 Semantics and Complexity
7.18 Extension Mechanisms
7.19 Conclusions and Outlook
7.20 Summary
7.21 Suggested Reading

A WebIndex in ShEx

B WebIndex in SHACL

C ShEx in ShEx

D SHACL in SHACL

Bibliography
Preface
This book describes two languages for implementing constraints on RDF data: Shape Expressions
(ShEx) and Shapes Constraint Language (SHACL). It covers the main features of both
from a user perspective and compares the two technologies. Throughout this
book, we develop a small number of examples that typify validation requirements and demonstrate
how they can be met with ShEx and SHACL. The book is not intended to be a formal
specification of the languages, for which the interested reader can consult the corresponding
reference documents; rather, it is meant to serve as an introduction to the technologies, with
some background on the rationale of their design and some points of comparison.
Chapter 1 provides a brief introduction to the topic. Chapter 2 presents a short
overview of the RDF data model and RDF-related technologies; this chapter could be skipped
by any reader who already knows RDF or Turtle. Chapter 3 helps the reader to understand what
to expect from data validation. It describes the problem of RDF validation and some approaches
that have been proposed. This book specifically reviews two of these approaches in further detail:
ShEx (Chapter 4) and SHACL (Chapter 5). These chapters describe each language and provide
a practical introduction using examples. Following the discussion of both languages, Chapter 6
presents some applications using either ShEx, SHACL, or both. Finally, Chapter 7 compares
ShEx and SHACL and offers some conclusions.
The goal of this book is to serve as a practical introduction to ShEx and SHACL using
examples. While we omitted formal definitions or specifications, references for further reading
can be found at the end of each chapter. We give a quick overview of some background and
related technologies so that readers without RDF knowledge can follow the book’s contents.
Also, it is not necessary to have any prior knowledge of programming or ontologies to understand
RDF validation technologies. The intended audience is anyone interested in data representation
and quality.

Jose Emilio Labra Gayo, Eric Prud’hommeaux, Iovka Boneva, and Dimitris Kontokostas
July 2017
Foreword by Phil Archer
“Anyone can say anything about anything,” says the mantra for the Semantic Web. More formally,
the Semantic Web adopts the Open World Assumption: just because your data encodes
a set of facts, that doesn’t mean there aren’t other facts stated elsewhere about the same thing.
All of which is fine and part of the design of RDF which supports the creation of a graph at
Web scale, but in a lot of practical applications you just need to know whether the triples you’ve
ingested match what you were expecting; you need validation. You might think of it as a defined
subset of the whole graph, or maybe a profile, providing a huge boost to interoperability
between disparate systems. If you can validate the data you’ve received then you can process it
with confidence, using more terse code, perhaps with more performant queries. I don’t accept
that RDF is hard, certainly no harder than any other Web technology; what is hard is thinking
in graphs. Keeping in your head that this node supports these properties and has relationships
with those other nodes becomes complex for anything other than trivial datasets. The validation
techniques set out in this book provide a means to tame that complexity, to set out for humans
and machines exactly what the structure of the data is or should be. That’s got to be helpful and,
incidentally, ties in with new work now under way at W3C on dataset exchange. In my role at
W3C I watched as the SHACL and ShEx camps tried hard to converge on a single method:
they couldn’t, hence the two different approaches. Both are described in detail here with copious
examples, which is just what you need to get started. How can you choose between the
two methods? Chapter 7 gives a detailed comparison and allows you to make your own choice.
Whichever you choose, this is the book you need to make sense of RDF validation.

Phil Archer, Former W3C Data Strategist
July 2017
Foreword by Tom Baker
The technologies described here meet a need first recognized, albeit dimly, two decades ago.
Rewind to circa 2000, when the parallel development of two W3C specifications, XML Schema
and RDF Schema, both called “schema languages” but with radically different uses, caused some
confusion.
This confusion permeated our early discussions about the Dublin Core. Was it an XML
format, an RDF vocabulary, or somehow both? Could metadata just follow arbitrary formats or
did it need a data model? In 2000, the Dublin Core community turned to “application profiles”
as a way to mix and match multiple vocabularies to meet specific needs, and the idea was an
instant hit even if people disagreed about their use. Were they more for validating data, or more
about finding rough consensus on a metadata model within a community of practice? Attempts
to bridge the XML and RDF mindsets in the DCMI community, notably with a Description
Set Profile constraint language for validating RDF-based metadata (2008), never quite caught
on. Perhaps the idea needed a bigger push?
Fast-forward to 2013, when W3C convened a workshop on RDF validation which revealed
that many communities had been circling around the same issues, and which ultimately
led to the results described here [82]. This book focuses on data validation, an addition to the
Semantic Web stack that is long overdue. But from a DCMI perspective, the ideas for future
work outlined in its Conclusion are just as exciting: the prospect of using ShEx- or
SHACL-based application profiles to map and convert between data models, size up aggregated datasets,
or provide nuanced feedback to data providers on quality issues. ShEx and SHACL, finally
production-ready, are full of potential for further evolution.

Tom Baker, Dublin Core Metadata Initiative
July 2017
Foreword by Dan Brickley and Libby Miller
People think RDF is a pain because it is complicated. The truth is even worse. RDF is painfully
simplistic, but it allows you to work with real-world data and problems that are horribly
complicated. While you can avoid RDF, it is harder to avoid complicated data and complicated
computer problems. RDF brings together data across application boundaries and imposes no
discipline on mandatory or expected structures. This can make working with RDF data
frustrating. Its schema and ontology languages can help define the meaning of RDF content but,
again, can’t express rules about how actual data records should look. The contents of this book
are nearly 20 years too late, but better now than never. Recent developments around RDF
validation have finally made it easier to record, exchange, and understand rules about validating and
otherwise checking RDF data. Who knows what wonders await us in another 20 years.

Dan Brickley, Schema.org and Google
Libby Miller, BBC
July 2017
CHAPTER 1

Introduction
1.1 RDF AND THE WEB OF DATA
These days more and more devices generate data automatically and it is relatively easy to develop
applications in different domains backed by databases and exposing data to the Web. The amount
and diversity of data produced clearly exceeds our capacity to consume it.
The term big data has emerged to name data that is so large and complex that traditional
data processing applications can’t handle it. Big data has been described by at least three words
starting with V: volume, velocity, and variety. Although volume and velocity are the most visible
features, variety is a key concern which prevents data integration and generates many
interoperability problems.
RDF was proposed as a graph-based data model which became part of the Semantic Web
vision. Its reliance on the global nature of URIs offered a solution to the data integration problem
as RDF datasets produced by different means can seamlessly be integrated with other data. Data
integration using RDF is faster and more robust than traditional solutions in the face of schema
changes.
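The integration claim can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which an RDF graph is a plain set of (subject, predicate, object) triples; the example.org identifiers and the helper names merge and describe are hypothetical and do not come from the book. The sketch also ignores blank nodes, which a real RDF merge has to keep apart.

```python
# Simplified model: an RDF graph is a set of (subject, predicate, object) triples.
# All identifiers below are hypothetical example.org URIs used only for illustration.
EX = "http://example.org/"

# Two graphs produced independently that describe the same resource.
graph_a = {
    (EX + "alice", EX + "name", "Alice"),
    (EX + "alice", EX + "knows", EX + "bob"),
}
graph_b = {
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "alice", EX + "name", "Alice"),  # same statement as in graph_a
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs asserted about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge(graph_a, graph_b)
for predicate, obj in describe(merged, EX + "alice"):
    print(predicate, obj)
```

Because both sources use the same global identifier for the resource, the union needs no schema mapping or key reconciliation: statements about the same URI simply end up in the same graph.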
RDF is also a key enabler of linked data. Linked data [46] was proposed as a set of best
practices to publish data on the Web. It was introduced by Tim Berners-Lee [8] and was based
on four main principles. RDF is mentioned in the third principle as one of the standards that
provides useful information. The goal is that information must be useful not only for humans
navigating through browsers (for which HTML would be enough) but also for other agents that
may automatically process that data.
The linked data principles became popular and several initiatives were created to publish
data portals. The amount of data on the Web has increased significantly in recent years. For example,
the LODStats project [36] aggregates around 150 billion triples from 2,973 datasets.

1.2 RDF: THE GOOD PARTS


RDF has been acknowledged as the language for the Web of Data and it has several advantages
like the following.

• Disambiguation. The use of IRIs to identify predicates and to make assertions about re-
sources enables the user to globally identify the property that is being asserted as well as
the resources involved in the statement. Those global properties can be identified by
automated agents which can recognize the data that they must understand in a non-ambiguous
way.
• RDF as an integration language. RDF is compositional in the sense that two RDF graphs
obtained from independent sources can automatically be merged to obtain a larger graph.
This property facilitates the integration of data from heterogeneous sources.
One of the biggest challenges in computer science today is how to
solve the interoperability problem between different applications that manipulate data that
comes from heterogeneous sources. RDF is a step toward solving this problem,
as RDF data can automatically be integrated even if it has been produced by different
parties.
• RDF as a lingua franca for semantic web and linked data. The simplicity and generality of
the RDF data model enables its use to model any kind of data that can be easily integrated
with other data.
RDF is at the core of the semantic web stack or layer cake and is mentioned in the linked
data principles and in the five-star model.
• RDF data stores and SPARQL. SPARQL was proposed as a query language for RDF in
2008. The language met with overwhelming acceptance and adoption by the RDF
community. The ability to query led to the development of many new applications as well as
databases and libraries. RDF data stores became popular and some companies started
using RDF internally to represent their data. Some of those applications chose RDF just
for practical reasons, even without reference to the semantic web. Storing RDF and
querying it using SPARQL offers a very flexible model which can adapt very quickly to data
model changes. RDF data stores can be seen as part of the NoSQL movement and there
are solutions for RDF data stores with high capabilities that can work with very large
databases [67].
• Extensibility. When one starts to develop an application to solve some problem, it is nec-
essary to record information in a format with room to grow, which enables the data model
to evolve and increasingly adapt to new needs. The extensible graph model of RDF makes
it very easy to add more statements to any graph.
• Flexibility. While a change in a relational database may be difficult to accomplish, RDF
embraces flexibility and these changes are usually a matter of updating the triples.
• Open by default. The semantic web approach to knowledge representation promoted what is
called Open World Assumption (OWA) instead of the Closed World Assumption (CWA)
which was popular in previous knowledge representation systems. The CWA considers
that what is not known to be true must be false, while the OWA considers that what is
not known is just unknown.
The CWA is usually applied in systems that have complete information while the OWA
is more natural for incomplete information systems like the Web.
Given that RDF was applied for the semantic web, most of the applications based on RDF
also adopt the Open World Assumption adapting to the appearance of new data.
Although RDF and related technologies employ the Open World Assumption by default,
this does not mean that every application must adopt that assumption. In some contexts,
it may be necessary to take the opposite view and consider that a system contains all the
information on some topic in order to operate.
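The difference between the two assumptions can be sketched in a few lines of Python. This is only an illustration: the tiny fact set and the helper names ask_cwa and ask_owa are assumptions made for this example and do not belong to any RDF library.

```python
# A tiny knowledge base: a set of (subject, predicate, object) facts.
# The facts and the helper names are hypothetical, chosen for illustration.
facts = {
    ("ex:alice", "schema:knows", "ex:bob"),
}

def ask_cwa(fact):
    """Closed World Assumption: what is not known to be true is false."""
    return fact in facts

def ask_owa(fact):
    """Open World Assumption: what is not known is just unknown."""
    return True if fact in facts else None  # None stands for "unknown"

known = ("ex:alice", "schema:knows", "ex:bob")
missing = ("ex:alice", "schema:knows", "ex:carol")

print(ask_cwa(known), ask_cwa(missing))  # True False
print(ask_owa(known), ask_owa(missing))  # True None
```

Under the closed world reading the missing statement is treated as false, while under the open world reading the system simply declines to answer.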

1.3 CHALLENGES FOR RDF ADOPTION


In spite of all the advantages of RDF, its widespread adoption is not yet a reality. Some reasons
for this can be guessed.
• RDF is mistakenly identified as a complex language. Some people consider RDF as a theoreti-
cal, knowledge representation language which does not appeal to practical web developers.
However, the RDF data model is very simple and can be understood by almost any person
in less than an hour. In its simplicity lies its power and the advantages that we enumerated
in previous sections. It is true that some of the technologies built on top of RDF, like
OWL, have a more theoretical foundation based on description logics which diverge from
this simplicity.
We consider that it is necessary to separate the RDF data model from its more powerful
and complex relatives. This is not to say that these technologies are not useful or practical,
but that the people who will manage them are different from the people who develop
applications. Web developers are not so much interested in ontological discussions; they
have more mundane concerns, such as which arcs a node is expected to have, which
datatypes are allowed, which data structures can be used to represent some nodes, etc.
• Ugly syntax. The RDF data model was defined along with an XML syntax in 1999. At
that time XML was a popular syntax and that decision made sense. RDF/XML syntax
was not human-friendly (it was difficult to write RDF/XML by hand) and it was also
difficult to process (it needed specialized libraries and parsers). The difference between
the hierarchical, tree-based XML model and the graph-based RDF data model makes
it necessary to serialize the RDF graph in order to represent it in XML. The same RDF graph
could be serialized in many ways, making it very difficult to use standard XML tools like
XSLT or XPath to process RDF.
There were several attempts to define a more human-friendly syntax. Notation3 was pro-
posed as a human-friendly language that was able to extend RDF and express other logical
operations and rules. Turtle was later proposed as a subset of Notation3 for expressing
only RDF. Turtle became popular in the semantic web community, although not so much
among web developers. Given that it is a special format, it requires a separate parser and
tools.
In 2014, RDF 1.1 also promoted JSON-LD for developers who are familiar with JSON,
and RDFa, which makes it possible to embed RDF annotations in HTML content.
Although these efforts can help popularize RDF among the developer
community, some extra work is still needed to better understand the role of RDF in the Web
development and publishing pipeline.
• RDF production/consumption dilemma. It is necessary to find ways for data producers to
generate their data so that it can be handled by potential consumers. The return on investment
for data producers comes when there are agents consuming their data.
There is some structure of the data that publishers have and want to transmit. For example,
they may want to declare that some nodes have some properties with some specific values.
Data consumers need to know that structure to develop applications to consume the data.
Although RDF is a very flexible schema-less language, enterprise and industrial appli-
cations may require an extra level of validation before processing for several reasons like
security, performance, etc.
Veteran users of RDF and SPARQL have confronted the problem of composing or con-
suming data with some expectations about the structure of that data. They may have described
that structure in a schema or ontology, or in some human-readable documentation, or maybe
expected users to learn the structure by example. Ultimately, users of that application need to
understand the graph structure that the application expects.
While it can be trivial to synchronize data production and consumption within a single
application, consuming foreign data frequently involves a lot of defensive programming, usu-
ally in the form of SPARQL queries that search out data in different structures. Given lots of
potential representations of that data, it is difficult to be confident that we have addressed all of
the intended ways our application may encounter its information.
Grammars are a common tool for defining data structures and the languages that con-
vey them. Every data structure with sufficient complexity and precision relies on some formal
convention for enumerating groups of properties and expressing data types, cardinalities, and
relationships between structures. The need for such a representation grows with the complexity
of the language.
To illustrate this, consider the specifications for RDF and SPARQL. RDF is a simple
data model consisting of graphs made of triples composed from three types of nodes. Because
of this simplicity, it does not need a defining grammar (though most academic papers about
RDF include one). By contrast, the SPARQL language would be enormously complicated or
impossible to define without a systematic grammar.
This book describes two languages for implementing constraints on RDF data. They can
enumerate RDF properties and identify permissible data types, cardinalities, and groups of
properties. These languages can be used for documentation, user interface generation, or validation
during data production or consumption.
Shape Expressions (ShEx) were proposed as a user-friendly and high-level language for
RDF validation. Initially proposed as a human-readable syntax for OSLC Resource Shapes [86],
ShEx grew to embrace more complex user requirements coming from clinical and library use
cases. ShEx now has a rigorous semantics and interchangeable representations: JSON-LD,
RDF, and the one meant for human eyes.
Another technology, SPIN, was used for RDF validation, principally in TopQuadrant’s
TopBraid Composer. This technology, also influenced by OSLC Resource Shapes, evolved
into both an implementation and definition of the Shapes Constraint Language (SHACL),
which was adopted by the W3C Data Shapes Working Group.
Although ShEx and SHACL have similar goals and share some features, they
approach the problem from different perspectives and with different formalisms. At the time of this
writing, the W3C Data Shapes Working Group has been unable to reach a compromise solution that brings
together both proposals, so it seems that they will evolve as different technologies in the future.
This book describes the main features of both ShEx and SHACL from a user perspective
and also offers a comparison of the technologies. Throughout this book, we develop a small
number of examples that typify validation requirements and demonstrate how they can be met
with ShEx and SHACL. The book is not intended as a formal specification of the languages, for
which the interested reader can consult the corresponding documents, but as an introduction to
the technologies with some background about the rationale of their design and some comparison
between them.

1.4 STRUCTURE OF THE BOOK


Chapter 2 presents a short overview of the RDF data model and RDF-related technologies.
This chapter could be skipped by any reader who already knows RDF or Turtle.
Chapter 3 helps us understand what to expect from data validation. It describes the prob-
lem of RDF validation and some approaches that have been proposed. In this book, we will
further review two of them: Shape Expressions (ShEx) and SHACL.
The next two chapters focus on two proposals: Shape Expressions (Chapter 4) and Shapes
Constraint Language (Chapter 5). The description of both languages is more intended to be a
practical introduction to them using examples than a formal specification. Once we present both
languages, Chapter 6 presents some applications using either ShEx, SHACL or both. Finally,
Chapter 7 compares ShEx and SHACL and presents some conclusions.
The goal of this book is to serve as a practical introduction to ShEx and SHACL using
examples. We omitted formal definitions or specifications and just added a section at the end of
each chapter with references to further reading.
The intended audience is anyone interested in data representation and quality. We give a
quick overview of some background and related technologies so readers without RDF
knowledge can follow the book contents. Also, it is not necessary to have any prior knowledge on
programming or ontologies to understand RDF validation technologies.

1.5 CONVENTIONS AND NOTATION


We provide a short introduction to RDF and Turtle in Chapter 2 and from that point on, we
use Turtle for the rest of the book.
Once a prefix declaration is presented in Turtle and ShEx, it is omitted thereafter to sim-
plify the examples unless needed for clarity. The prefix declarations and namespaces used are
shown in Table 1.1. Most examples in the book will need to be prepended with prefix declara-
tions in order to run correctly.

Table 1.1: Common prefix declarations

Alias Namespace
prefix : <http://example.org/>
prefix cex: <http://purl.org/weso/computex/ontology#>
prefix cdt: <http://example.org/customDataTypes#>
prefix dbr: <http://dbpedia.org/resource/>
prefix ex: <http://example.org/>
prefix qb: <http://purl.org/linked-data/cube#>
prefix org: <http://www.w3.org/ns/org#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix schema: <http://schema.org/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix sx: <http://shex.io/ns/shex#>
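
How a prefix declaration turns a prefixed name into a full IRI can be sketched as follows. The dictionary holds a small subset of the declarations above, and the helper name expand is hypothetical; it only handles the simple alias:local form and is not a complete Turtle parser.

```python
# Prefix table: a small subset of Table 1.1 (alias -> namespace IRI).
PREFIXES = {
    "": "http://example.org/",
    "schema": "http://schema.org/",
    "sh": "http://www.w3.org/ns/shacl#",
}

def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name such as schema:name into a full IRI.

    Hypothetical helper: it only handles the simple alias:local form.
    """
    alias, _, local = prefixed_name.partition(":")
    if alias not in prefixes:
        raise KeyError(f"undeclared prefix: {alias!r}")
    # The IRI is the namespace concatenated with the local part.
    return prefixes[alias] + local

print(expand("schema:name"))  # http://schema.org/name
print(expand(":good"))        # http://example.org/good
```

An undeclared alias raises an error, which mirrors why most examples need the prefix declarations prepended before they can be processed.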

RDF is being applied to lots of domains, some of them highly specialized. We opted to
present examples using concepts from familiar domains like people, courses, companies, etc.
that we think will be familiar to any reader. Most of the examples use properties borrowed from
schema.org,1 which provides lots of concepts from familiar domains. The examples are just for
illustration purposes and are not intended to check schema.org rules. Nevertheless, validating
schema.org using ShEx or SHACL can be an interesting exercise for readers.
For examples that involve validation of a node against a shape, we use the following no-
tation:
1 http://schema.org

:good schema:name "Valid node" . # ✓ Passes as :Shape
:bad schema:name "Bad node" .    # ✗ Fails as :Shape

which means that node :good validates against shape :Shape, while node :bad does not.
The examples have been tested using the different tools available. We maintain a public
repository where we keep the examples used in this book. The URL is:
https://github.com/labra/validatingRDFBookExamples.
CHAPTER 2

The RDF Ecosystem


This chapter includes a short overview of the RDF data model and the Turtle notation, as well as
some technologies like SPARQL, RDF Schema, and OWL that form part of the RDF ecosys-
tem.
Readers that are already familiar with these technologies may skip this chapter and jump
into the next chapter that describes the RDF validation problem.

2.1 RDF HISTORY


The first draft of RDF was published in 1997 [68] and became a W3C recommendation in 1999
along with an XML syntax [69].
A year later came a class hierarchy, which supports reasoning such as if Socrates is human and all
humans are mortal, then Socrates is mortal, together with property domains and ranges.
It is perhaps unfortunate that this came under the name RDF Schema (RDFS) as it didn’t
offer any of the data constraints available in other schema languages like SQL’s DDL or W3C
XML Schema.
In hindsight, this development path was clearly in tension with the priorities of every-
day programmers and systems architects who care primarily about creating and accessing well-
structured data, and perhaps secondarily about inference. Four years after RDFS, OWL ex-
tended the facilities provided by RDFS into an expressive ontology language that could describe
the information required for instances of classes.
However, once again, the language was oriented toward a healthy, distributed information
infrastructure and not that last mile which permits developers to confidently produce and con-
sume data. While OWL could detect errors when one used a literal with the wrong datatype, or
infer a subclass relationship between two classes explicitly declared disjoint, it simply would not
complain if one says that every vehicle registration has an owner’s first name and last name, and
then fails to supply those values. OWL is designed for an open world semantics, which means
that it won’t conclude anything (e.g., signaling missing data) based on the absence of provided
data. The absence of evidence is not evidence of absence.
Another four years later in 2008, the RDF community assembled to deliver a query lan-
guage (SPARQL) to meet the most elementary of application needs, accessing data. The lan-
guage met immediately with overwhelming acceptance and adoption. This ability to query led
to the development of many new applications, as well as databases and libraries designed to
facilitate application development. This energy led to the expansion of SPARQL 1.1 into update
(analogous to SQL DDL) and HTTP protocol. It did not, however, elegantly solve the problem
of RDF data description and verification.

2.2 RDF DATA MODEL


When RDF was created in 1997, XML was quickly becoming a popular format. It had a strong
influence on the RDF syntax, which was called RDF/XML. That format is quite verbose, and
several proposals appeared for a more human-readable syntax for RDF.
In 2014, RDF 1.1 [25] was published as a revised version which maintained most of the
data model and added support for other serialization formats like Turtle [78], TriG [18], or
JSON-LD [5].
In this section we give a short overview of the RDF data model following the RDF 1.1
definitions and using Turtle as the serialization format.
The RDF data model is based on the concept of triples. Each triple consists of a subject, a
predicate and an object. RDF triples are usually depicted as a directed arc connecting two nodes
(subject and object) by an edge (predicate) (see Figure 2.1).

predicate
subject object

Figure 2.1: Example of RDF triple.

An asserted RDF triple means that some relationship, indicated by the predicate, holds
between the resources denoted by the subject and object. This is known as an RDF statement.
The predicate is an IRI that denotes a property. An RDF statement can be thought of as a binary
relation identified by the property between the subject and object.
There can be three kinds of nodes: IRIs, literals, and blank nodes.
• An IRI (Internationalized Resource Identifier) [34] refers to a resource (the referent). A
resource can be anything. IRIs can appear as subjects, predicates, and objects. In Turtle,
IRIs are enclosed by < and >. For example, an IRI can be <http://example.org/john>.
Most RDF formats include a mechanism called prefix declaration, which simplifies
writing long IRIs by declaring prefix labels. A prefix label associates an alias with an
IRI and enables the definition of prefixed names. A prefixed name contains a prefix label
and a local part separated by : and represents the IRI formed by concatenating the IRI
associated with the prefix label and the local part. For example, if ex is declared as a prefix
label to represent <http://example.org/>, then ex:alice is a prefixed name that represents
<http://example.org/alice> (see Figure 2.2).

There are some popular namespace aliases like rdf, xsd, rdfs, owl, etc. The http://prefix.cc
service can be used to look up the IRI associated with those popular aliases. The
snippets of code used in this book assume the prefix declarations listed in Table 1.1.

Prefix declaration: prefix ex: <http://example.org/>
Prefix label: ex:
Prefixed name: ex:alice denotes <http://example.org/alice>

Figure 2.2: Example of prefix declaration.

• A literal denotes a resource which has an associated value, for example, an integer or string
value. Literals can only appear as objects in triples. They contain a lexical form and a
datatype IRI, which are represented as "lexicalForm"^^datatype in Turtle. For example,
"23"^^xsd:integer represents an integer with value 23 and "1980-03-01"^^xsd:date represents
March 1, 1980.
All literals in RDF have an associated datatype. String literals with no declared
datatype are assumed to have the xsd:string datatype by default. So "hi" is the same as
"hi"^^xsd:string.

A special type of literals are language-tagged strings, which are literals with datatype
rdf:langString that also contain a language tag [75] to identify a specific language.
Language-tagged strings are represented in Turtle as "string"@tag. For example: "hola"@es
represents the literal value "hola" written in Spanish (es).

• Blank nodes are local identifiers which do not identify specific resources. Blank nodes
can be used as subjects or objects of triples. They specify that something with the given
relationship exists, without explicitly naming it.
In Turtle, blank nodes can be denoted by an underscore followed by a colon and a local
identifier. For example: _:id represents a blank node.
An RDF graph is a set of RDF triples. Notice that the edges of RDF graphs can only
be IRIs. This is an important feature of RDF that makes it possible to globally identify the
predicates asserted by triples. The subjects can only be IRIs or blank nodes, while the objects can be IRIs,
blank nodes, or literals.
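
The positional rules above can be written down as a small check. The following Python sketch is only illustrative: modelling each term as a (kind, value) pair and the names iri, bnode, literal, and is_valid_triple are assumptions of this example, not a standard API.

```python
# Each RDF term is modelled as a (kind, value) pair. This encoding is an
# assumption made for the sketch.
def iri(value):
    return ("iri", value)

def bnode(label):
    return ("bnode", label)

def literal(lexical, datatype="xsd:string"):
    # Literals with no declared datatype default to xsd:string.
    return ("literal", (lexical, datatype))

def is_valid_triple(subject, predicate, obj):
    """Check the positional rules of the RDF data model."""
    return (
        subject[0] in {"iri", "bnode"}             # subjects are never literals
        and predicate[0] == "iri"                  # predicates are always IRIs
        and obj[0] in {"iri", "bnode", "literal"}  # objects can be any kind
    )

ok = is_valid_triple(iri("ex:bob"), iri("schema:name"), literal("Robert"))
bad_subject = is_valid_triple(literal("Robert"), iri("schema:name"), iri("ex:bob"))
bad_predicate = is_valid_triple(iri("ex:bob"), bnode("p"), iri("ex:carol"))
print(ok, bad_subject, bad_predicate)  # True False False
```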

Example 2.1 Simple RDF file in Turtle


The following code represents an RDF graph in Turtle. The first four lines are prefix
declarations and the rest represent a sequence of RDF triples separated by dots.
prefix ex: <http://example.org/>
prefix schema: <http://schema.org/>
prefix dbr: <http://dbpedia.org/resource/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:alice schema:knows ex:bob .

ex:bob schema:knows ex:carol .
ex:bob schema:name "Robert" .
ex:bob schema:birthDate "1980-03-10"^^xsd:date .
ex:bob schema:birthPlace dbr:Oviedo .

ex:carol schema:knows ex:alice .
ex:carol schema:knows ex:bob .
ex:carol schema:birthPlace dbr:Oviedo .

The corresponding RDF graph has been depicted in Figure 2.3. Rounded boxes represent
IRIs while orange rectangles represent literals.

Figure 2.3: Example of an RDF graph.

Blank nodes can be used to make assertions about some elements whose IRIs are not
known.

Example 2.2 Blank nodes in RDF


The following RDF Turtle code declares that ex:alice knows someone who knows ex:dave,
and that ex:carol knows someone who is 23 years old and was born in the same place as ex:dave.
prefix ex: <http://example.org/>
prefix schema: <http://schema.org/>
prefix dbr: <http://dbpedia.org/resource/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:alice schema:knows _:x .
_:x schema:knows ex:dave .

ex:carol schema:knows _:y .

_:y schema:birthPlace _:z ;
    schema:age "23"^^xsd:integer .

ex:dave schema:birthPlace _:z .

An important feature of RDF graphs is that two independent RDF graphs can automatically
be merged to obtain a larger RDF graph formed by the union of their sets of triples. Given
the global nature of IRIs, nodes with the same IRI are automatically unified. Using shared IRIs
makes the powerful statement that the entities and relationships in one graph carry the same intent
as they do in the other graphs using the same identifiers. In a sense, the use of RDF gets rid of
the data merging problem and lets us focus on the hard problems of establishing shared entities
and vocabularies.
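
Because an RDF graph is a set of triples, merging can be sketched as a plain set union. The Python below is a minimal illustration under stated assumptions: terms are encoded as strings, the two small graphs are invented for the example, and only IRIs and literals are used, since blank nodes from different graphs would have to be kept apart before merging.

```python
# Two independently produced graphs, each a set of
# (subject, predicate, object) triples. Terms are plain strings here;
# this encoding is an assumption of the sketch.
graph_a = {
    ("ex:alice", "schema:knows", "ex:bob"),
    ("ex:bob", "schema:name", "Robert"),
}
graph_b = {
    ("ex:carol", "schema:knows", "ex:bob"),
    ("ex:bob", "schema:name", "Robert"),  # same statement as in graph_a
}

# Merging is the union of the two sets; duplicate triples collapse.
merged = graph_a | graph_b

def statements_about(node, graph):
    """Return the triples in which node appears as subject or object."""
    return sorted(t for t in graph if node in (t[0], t[2]))

print(len(merged))  # 3
for triple in statements_about("ex:bob", merged):
    print(triple)
```

The node ex:bob is unified simply because both graphs use the same identifier for it; no extra mapping step is needed.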
For example, the union of the RDF graphs from Figures 2.3 and 2.4 is depicted in Fig-
ure 2.5. Turtle contains several simplifications to facilitate readability.

Figure 2.4: Example of an RDF graph with blank nodes.

Figure 2.5: Merged RDF graph.

• When the subject is repeated, it is possible to use predicate lists, which collapse the triples
with the same subject, omitting the repeated subject and separating the different predicates
and objects by semicolons (;). So, instead of writing
ex:bob schema:name "Robert" .
ex:bob schema:birthDate "1980-03-10"^^xsd:date .
ex:bob schema:birthPlace dbr:Oviedo .
ex:bob schema:knows ex:carol .

it is possible to write:
ex:bob schema:name "Robert" ;
       schema:birthDate "1980-03-10"^^xsd:date ;
       schema:birthPlace dbr:Oviedo ;
       schema:knows ex:carol .

• When the subject and predicate are the same, it is possible to use object lists, which collapse
the repeated subjects and predicates and separate the different objects by commas (,).
Instead of writing
ex:carol schema:knows ex:alice .
ex:carol schema:knows ex:bob .

it is possible to write:
ex:carol schema:knows ex:alice, ex:bob .

Example 2.3 Turtle simplifications


The RDF graph represented in Example 2.1 can be simplified as:
prefix schema: <http://schema.org/>
prefix ex: <http://example.org/>
prefix dbr: <http://dbpedia.org/resource/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

ex:alice schema:knows ex:bob .

ex:bob schema:name "Robert" ;
       schema:birthDate "1980-03-10"^^xsd:date ;
       schema:birthPlace dbr:Oviedo ;
       schema:knows ex:carol .

ex:carol schema:birthPlace dbr:Oviedo ;
         schema:knows ex:alice, ex:bob .

• Although number and Boolean literals can be defined like other literals with their lexical
form and datatype, there is also a shorthand syntax in Turtle to automatically parse some
values as literals. Table 2.1 shows how some values in shorthand notation are parsed as
literals.
Table 2.1: Shorthand syntax for numbers and Booleans in Turtle

Datatype       Shorthand example    Lexical example
xsd:integer    -3                   "-3"^^xsd:integer
xsd:decimal    -3.14                "-3.14"^^xsd:decimal
xsd:double     3.14e2               "3.14e2"^^xsd:double
xsd:boolean    true                 "true"^^xsd:boolean

• A triple of the form X rdf:type Y asserts that X has the type represented by Y. In Turtle,
rdf:type can also be represented by the token a, so the previous triple could also be repre-
sented as X a Y.
• RDF collections are list structures whose nodes are chained by the rdf:rest property, ending
with rdf:nil, and whose values are declared by the rdf:first property of each node.
Example 2.4 RDF collections not simplified
The following snippet declares the results of a marathon as an RDF Collection:
:m23 schema:name "New York City Marathon" ;
     :results _:1 .

_:1 rdf:first :dave .
_:1 rdf:rest _:2 .
_:2 rdf:first :alice .
_:2 rdf:rest _:3 .
_:3 rdf:first :bob .
_:3 rdf:rest rdf:nil .

Turtle has a special notation for RDF collections, enumerating the values enclosed by round
brackets. The previous example can also be represented in Turtle as:
:m23 schema:name "New York City Marathon" ;
     :results (:dave :alice :bob) .

Figure 2.6: RDF collection example.


• Fresh blank nodes in Turtle can also be represented by using square brackets ([ and ]). In
this way, Example 2.2 can be rewritten as follows.

Example 2.5 Blank nodes with square brackets


ex:carol schema:knows [ schema:age 23 ;
                        schema:birthPlace _:x
                      ] .

ex:dave schema:birthPlace _:x .

ex:alice schema:knows [ schema:knows ex:dave ] .

The RDF data model is very simple. This simplicity is part of its power as it enables RDF
to be used as a data representation language in a lot of scenarios.

2.3 SHARED ENTITIES AND VOCABULARIES


One of RDF’s strengths is that it promotes the use of IRIs instead of plain strings to facilitate merging
data from heterogeneous sources and to avoid ambiguity. This poses the challenge of agreeing on
common entities and relationships. Usually, those sets of entities and relationships are grouped
in vocabularies which can be general-purpose or domain specific.
There are several well-known vocabularies like schema.org which is a collaborative, com-
munity activity founded by Google, Microsoft, Yahoo, and Yandex that promotes the use of
common structured data on the internet.
An interesting project is the LOV (Linked Open Vocabularies)1 project that collects open
vocabularies and provides a vocabulary search engine.
Shared identifiers are frequently minted by some authority releasing data using those
identifiers, followed by community uptake of those identifiers. Services like http://identifiers.org/
publish these identifiers and, in the frequent case where multiple identifiers exist for the
same entity, map between them. The property owl:sameAs can be used to assert that
mapping.
Consensus on vocabularies is typically reached by communities producing human-readable
specifications, which are accompanied by some descriptions of the terms in the vocabulary using RDF
Schema (see Section 2.4.2). Ontologies take this a step further by providing much more
powerful inference and can be used to detect some errors in the conceptual model (for instance, if a
vehicle registration conflated a car with its owner).
As we share more models, we implicitly raise our expectations for the accuracy of these
models. George Box stated in 1976 that all models are wrong but some are useful [15]. Raising
the bar for these models means we expect them to be useful in more situations than they were
originally designed for.
1 http://lov.okfn.org/
Something as apparently simple as schema.org’s schema:gender offers a simple model for a
complex issue. For at least 90% of the population, the model’s terms schema:Male and schema:Female
suffice. Extending that to 99% or 99.9% of the population we see these terms are insufficient
for the many variations in both identity and biology. Schema.org extends the model for these
cases by permitting a string value. FHIR HL7 (see Section 6.2) standards use a concept of
administrative gender, which adds two other possibilities.
For simplicity in this chapter and the next, we will use a notion of gender which is con-
strained to male and female. In later chapters we will use this to show how RDF validation lan-
guages can use the extended value set to provide coverage for more use cases.

2.4 TECHNOLOGIES RELATED WITH RDF


RDF was created as a language on which other technologies could be based. The semantic web
stack (also called layer cake) illustrates a hierarchy of technologies where RDF plays a central
role. Although that stack is still evolving, there are two concepts that are worth mentioning:
SPARQL and inference systems.

2.4.1 SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is an RDF query language which is
able to retrieve and manipulate data stored in RDF. SPARQL 1.0 became a recommendation
in 2008 [79] and SPARQL 1.1 was published in 2013 [44].
SPARQL is based on the notion of Basic Graph Patterns which are sets of triple patterns.
A triple pattern is an extension of an RDF triple where some of the elements can be variables
which are denoted by a question mark.
A Basic Graph Pattern matches a subgraph of the RDF data when RDF terms from that
subgraph may be substituted for the variables and the result is an RDF graph equivalent to the
subgraph.

Example 2.6 Simple SPARQL query


The following SPARQL query retrieves the nodes ?x whose birth place is dbr:Oviedo and
the nodes ?y that are known by them.
prefix :       <http://example.org/>
prefix schema: <http://schema.org/>
prefix dbr:    <http://dbpedia.org/resource/>

SELECT ?x ?y WHERE {
  ?x schema:birthPlace dbr:Oviedo .
  ?x schema:knows ?y
}
Applying the previous SPARQL query to the RDF data defined in Example 2.3, a
SPARQL processor would return the results shown in Table 2.2.

Table 2.2: Results of SPARQL query

?x ?y
:carol :alice
:carol :bob
:bob :carol
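
The matching process described above can be sketched in a few lines of Python. This is only an illustration of the idea of substituting terms for variables, not how a SPARQL processor is actually implemented. The function name match_bgp and the string encoding of terms are assumptions made for this sketch, and the triples are reconstructed from the description of the running example.

```python
# Illustrative only: a naive Basic Graph Pattern matcher over a set of triples.
# Terms are plain strings; strings starting with "?" are variables.

def match_bgp(patterns, graph, binding=None):
    """Yield every variable binding that makes all triple patterns match."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for pattern_term, term in zip(first, triple):
            if pattern_term.startswith("?"):
                # A variable must keep the value it was already bound to.
                if candidate.setdefault(pattern_term, term) != term:
                    break
            elif pattern_term != term:
                break
        else:
            # All three positions matched: continue with the remaining patterns.
            yield from match_bgp(rest, graph, candidate)

# Data assumed from the running example (alice, bob, carol).
graph = {
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:knows", ":carol"),
    (":bob", "schema:birthPlace", "dbr:Oviedo"),
    (":carol", "schema:knows", ":alice"),
    (":carol", "schema:knows", ":bob"),
    (":carol", "schema:birthPlace", "dbr:Oviedo"),
}
query = [("?x", "schema:birthPlace", "dbr:Oviedo"), ("?x", "schema:knows", "?y")]
results = sorted((b["?x"], b["?y"]) for b in match_bgp(query, graph))
print(results)
```

Under these assumptions, the three bindings obtained correspond to the three rows of Table 2.2.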

SPARQL queries consist of three parts [73].


• A pattern matching part, which includes operators on graphs such as optional parts,
union of patterns, nesting, and filtering of values.

• Solution modifiers, which, once the output of the pattern has been computed as a table
of variables and values, modify that table by applying operators like projection, distinct, order,
limit, offset, and grouping.

• The output, which can be of different types: ASK queries return
yes/no depending on the existence of matching values, SELECT queries return a
selection of values for the variables that match a pattern, and CONSTRUCT queries
return the triples generated from the values that match the pattern.

Example 2.7 SPARQL query with Filter and Counts


The following SPARQL query returns people who know only one value.
1 SELECT ?person ?known {
2   ?person schema:knows ?known .
3   { SELECT ?person (count(*) as ?countKnown) {
4       ?person schema:knows ?known .
5     } GROUP BY ?person
6   }
7   FILTER (?countKnown = 1)
8 }

It contains a nested query (lines 3–6) which pairs each person with the number of
known entries, and a filter (line 7) which removes those persons whose count is different from
one.
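
The combination of grouping and filtering can be mimicked in Python as follows. This is a sketch of the logic only; the list of pairs is an assumed stand-in for the schema:knows triples of the running example.

```python
from collections import Counter

# Assumed schema:knows pairs (person, known) from the running example.
knows = [
    (":alice", ":bob"),
    (":bob", ":carol"),
    (":carol", ":alice"),
    (":carol", ":bob"),
]

# Nested query: GROUP BY ?person with count(*) as ?countKnown.
count_known = Counter(person for person, _ in knows)

# Outer query: join each row with its count and keep rows whose count is 1.
only_one = sorted((p, k) for p, k in knows if count_known[p] == 1)
print(only_one)
```

With this data, :carol is filtered out because she knows two people.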

A full introduction to SPARQL is out of the scope of this book. For the interested reader,
we recommend [33].
SPARQL is a very expressive language which can be used to describe very complex queries.
It can also be employed to validate the structure of complex RDF graphs [55]. In Section 3.13,
we describe how SPARQL can be used to validate RDF.

2.4.2 INFERENCE SYSTEMS: RDF SCHEMA AND OWL


RDF was designed so it could be used as a central piece for knowledge representation in the
Web. The goal is that agents can automatically infer new knowledge in the form of new RDF
statements from existing RDF graphs. To that end, several technologies were proposed to in-
crease RDF expressiveness. In this section we will briefly review two of the most popular: RDF
Schema and OWL.

RDF Schema
RDF Schema was proposed as a data-modeling vocabulary for RDF data. The first public work-
ing draft of RDF Schema appeared in 1998 [16] and was accepted as a recommendation in
2004 [26].
It is a semantic extension of RDF which provides mechanisms to describe groups of re-
sources and relationships between them. It defines a set of common classes and properties.
The main classes defined in RDFS are:

• rdfs:Resource: the class of everything

• rdfs:Class: the class of all classes

• rdfs:Literal: the class of all literal values

• rdfs:Datatype: the class of all datatypes

• rdf:Property: the class of all properties

RDFS contains several properties like rdfs:label, rdfs:comment, rdfs:domain, rdfs:range,
rdf:type, rdfs:subClassOf, and rdfs:subPropertyOf.

Example 2.8 RDF Schema


The following snippet contains some description about teachers and people using RDF
Schema terms. It declares that schema:Person is an rdfs:Class, as well as :Teacher. It also declares
that the :Teacher class is a subclass of schema:Person which could be read as saying that every
instance of :Teacher is also an instance of schema:Person.
Finally, it declares that :teaches is a property that relates instances of :Teacher with instances
of :Course, i.e., for any two elements related by the property :teaches, the first is a
:Teacher and the second a :Course.

schema:Person a rdfs:Class .

:Teacher a rdfs:Class ;
  rdfs:subClassOf schema:Person .

:teaches a rdf:Property ;
  rdfs:domain :Teacher ;
  rdfs:range :Course .

RDF Schema processors contain several rules that enable them to infer new RDF data.
For example, for any C rdfs:subClassOf D and x a C they can infer x a D, and for any p
rdfs:domain C and x p y they can infer x a C.
If we apply those rules to the following data:
:alice a schema:Person .
:bob a :Teacher .
:carol :teaches :algebra .

an RDFS processor could infer that :carol has rdf:type :Teacher, that :bob and :carol have
rdf:type schema:Person, and that :algebra has rdf:type :Course.
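
The way such rules produce new triples can be sketched with a naive fixed-point loop in Python. The function rdfs_closure and the string encoding of terms are assumptions of this sketch. It covers only the subclass, domain, and range rules, and it uses schema:Person for the person class, as in the schema of Example 2.8. A real RDFS processor implements more rules and is far more efficient.

```python
def rdfs_closure(triples):
    """Apply the subClassOf, domain and range rules until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # C rdfs:subClassOf D and x a C  =>  x a D
                if p2 == "rdfs:subClassOf" and p == "rdf:type" and o == s2:
                    new.add((s, "rdf:type", o2))
                # p rdfs:domain C and x p y  =>  x a C
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # p rdfs:range C and x p y  =>  y a C
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

schema = {
    (":Teacher", "rdfs:subClassOf", "schema:Person"),
    (":teaches", "rdfs:domain", ":Teacher"),
    (":teaches", "rdfs:range", ":Course"),
}
data = {
    (":alice", "rdf:type", "schema:Person"),
    (":bob", "rdf:type", ":Teacher"),
    (":carol", ":teaches", ":algebra"),
}
closure = rdfs_closure(schema | data)
for triple in sorted(closure - schema - data):
    print(triple)
```

Note that the type schema:Person for :carol is only reached in a second pass, after the domain rule has first classified her as a :Teacher.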

OWL
OWL (Web Ontology Language) defines a vocabulary for expressing ontologies based on
description logics. It was published as a W3C recommendation in 2004 [29] and a new
version, OWL 2, was accepted in 2009 [70]. OWL has several syntaxes (an RDF-based syntax,
a functional-style syntax, the Manchester syntax, etc.) and a formally defined meaning. We will use
the RDF syntax in the following examples with Turtle notation.
An ontology can be defined as a vocabulary of terms, usually about a specific domain and
shared by a community of users. Ontologies specify the definitions of terms by describing their
relationships with other terms in the ontology.
The main concepts in OWL are as follows.

• Classes, which represent sets of individuals. Classes can be subclasses of other classes, with
two special classes: owl:Thing that represents the set of all individuals and owl:Nothing that
represents the empty set.

• Individuals, which are elements in the domain. Individuals can be members of an OWL
class.

• Properties, which represent relationships. Properties are classified as datatype properties,
object properties, and annotation properties. Datatype properties relate an individual with
a data value such as a string or integer. Object properties relate an individual with another
individual. Annotation properties encode information about the ontology itself (such
as the author or the creation date of an ontology).

• Constructors, which allow complex concepts to be defined from other concepts using
expressions.

Example 2.9 OWL example


In the following example we declare two classes :Man and :Woman that have a property :gender
with the value :Male or :Female, respectively.
:Man a owl:Class ;
  owl:equivalentClass [
    owl:intersectionOf (
      :Person
      [ a owl:Restriction ;
        owl:onProperty :gender ;
        owl:hasValue :Male
      ] ) ] .

:Woman a owl:Class ;
  owl:equivalentClass [
    owl:intersectionOf (
      :Person
      [ a owl:Restriction ;
        owl:onProperty :gender ;
        owl:hasValue :Female
      ] ) ] .

Now, we can define :Person as the union of the :Man and :Woman classes, and declare that
those classes are disjoint.

:Person owl:equivalentClass [
  rdf:type owl:Class ;
  owl:unionOf ( :Woman :Man )
] .

[ a owl:AllDisjointClasses ;
  owl:members ( :Woman :Man )
] .

Given the previous declarations, if we add the following instance data:

:alice a :Woman ;
  :gender :Female .

:bob a :Man .

an OWL reasoner can infer the following triples:

:alice a :Person .
:bob a :Person .
:bob :gender :Male .
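
To give an intuition of where these triples come from, the following Python sketch hard-codes the two class definitions as a table and applies them in one direction only (from class membership to its consequences). The DEFINITIONS table and the function infer_from_membership are hypothetical names for this illustration. A real OWL reasoner works from the axioms themselves and also reasons in the opposite direction, about the union and about disjointness.

```python
# Hand-written stand-in for the two owl:equivalentClass axioms of the example:
# each class is the intersection of :Person with a hasValue restriction.
DEFINITIONS = {
    ":Man": (":Person", ":gender", ":Male"),
    ":Woman": (":Person", ":gender", ":Female"),
}

def infer_from_membership(triples):
    """Add the triples entailed by membership in one of the defined classes."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj in DEFINITIONS:
            parent, prop, value = DEFINITIONS[obj]
            inferred.add((subject, "rdf:type", parent))  # member of the intersection
            inferred.add((subject, prop, value))          # hasValue restriction
    return inferred

data = {
    (":alice", "rdf:type", ":Woman"),
    (":alice", ":gender", ":Female"),
    (":bob", "rdf:type", ":Man"),
}
new_triples = infer_from_membership(data) - data
for triple in sorted(new_triples):
    print(triple)
```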

OWL can be used to define ontologies in several domains and there are several tools
like the Protégé editor [66] which provide facilities for the creation and visualization of large
ontologies.

2.4.3 LINKED DATA, JSON-LD, MICRODATA, AND RDFA


As we mentioned in Section 1.1, one of the principles of linked data is to provide useful infor-
mation when dereferencing a URI, using standards such as RDF. The goal is to return not only
human-readable content like HTML that a machine can only represent in a browser, but also
some machine understandable content in RDF which can be automatically processed.
There are two main possibilities: return different representations of the same resource using
content negotiation, or return the same representation with RDF embedded.
The first approach can be easier to implement because developers have several mechanisms
to transform a resource to different representations on the fly. A popular format nowadays is
JSON-LD which is a JSON-based representation of RDF.

Example 2.10 JSON-LD example


The Turtle Example 2.1 can be represented in JSON-LD as:
{ "@context": {
    "ex": "http://example.org/",
    "schema": "http://schema.org/",
    "dbr": "http://dbpedia.org/resource/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "name": { "@id": "schema:name" },
    "birthDate": { "@id": "schema:birthDate", "@type": "xsd:date" },
    "birthPlace": { "@id": "schema:birthPlace" },
    "knows": { "@id": "schema:knows" }
  },
  "@graph": [
    { "@id": "ex:alice",
      "knows": { "@id": "ex:bob" }
    },
    { "@id": "ex:bob",
      "name": "Robert",
      "knows": { "@id": "ex:carol" },
      "birthDate": "1980-03-10",
      "birthPlace": { "@id": "dbr:Oviedo" }
    },
    { "@id": "ex:carol",
      "knows": [ { "@id": "ex:alice" }, { "@id": "ex:bob" } ],
      "birthPlace": { "@id": "dbr:Oviedo" }
    }
  ]
}

An alternative approach is to embed RDF content in HTML.


RDFa can be used to embed RDF in HTML attributes.
<div xmlns:schema="http://schema.org/"
     xmlns:ex="http://example.org/"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     typeof="schema:Person"
     about="[ex:alice]">
  My name is <span property="schema:name">Alice</span>.
  <p>I was born on
    <span property="schema:birthDate"
          content="1974-12-01"
          datatype="xsd:date">a Sunday, some time ago</span>,
    and I am a professor at the
    <span about="[ex:uniovi]"
          typeof="schema:Organization"
          property="schema:name"
          rel="schema:member"
          resource="[ex:alice]">University of Oviedo</span>
  </p>
</div>

An HTML browser visualizes the information:


My name is Alice. I was born on a Sunday, some time ago, and I am a professor at the
University of Oviedo
while an RDFa processor obtains the following RDF data:
ex:alice a schema:Person ;
  schema:birthDate "1974-12-01"^^xsd:date ;
  schema:name "Alice" .

ex:uniovi a schema:Organization ;
  schema:member ex:alice ;
  schema:name "University of Oviedo" .

Another alternative is to use microdata:


<div itemscope
     itemtype="http://schema.org/Person"
     itemid="http://example.org/alice">
  Home page of <span itemprop="name">Alice</span>.
  <p>I was born on
    <time itemprop="birthDate"
          datetime="1974-12-01">a Sunday,
      some time ago</time>,
    and I am a <span itemprop="jobTitle">Professor</span>
    at the <span itemscope
                 itemprop="affiliation"
                 itemtype="http://schema.org/Organization"
                 itemid="http://example.org/uniovi">
      <span itemprop="name">University of Oviedo</span>
    </span>
  </p>
</div>

Which represents the same information as the RDFa example.

2.5 SUMMARY
• RDF defines a simple and powerful data model based on directed graphs.

• There are several syntaxes for RDF: Turtle, N-Triples, JSON-LD, RDF/XML, etc.

• The edges of the graph are predicate IRIs.

• RDF is the basis for the semantic web stack.

• RDF enables the integration of heterogeneous data.

• SPARQL is a query language for RDF.

• RDFS and OWL offer inference capabilities over RDF data.

• JSON-LD is a popular syntax for RDF based on JSON.

• Two alternatives to embed metadata in HTML content are RDFa and microdata.

2.6 SUGGESTED READING


Official online documents:

• R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C
Recommendation, February 2014. http://www.w3.org/TR/rdf11-concepts/

• S. Harris and A. Seaborne. SPARQL 1.1 Query Language. W3C Recommendation, 2013.
http://www.w3.org/TR/sparql11-query/
• D. Brickley and R. V. Guha. RDF Schema 1.1. W3C Recommendation, 2014.
http://www.w3.org/TR/rdf-schema/

• W3C OWL Working Group. OWL 2 Web Ontology Language: Document Overview. W3C
Recommendation, October 2009. http://www.w3.org/TR/owl2-overview/
There are several books introducing the concepts of RDF and Semantic Web in general like:
• J. Hjelm. Creating the Semantic Web with RDF: Professional Developer’s Guide. Professional
Developer’s Guide Series. Wiley, 2001
• S. Powers. Practical RDF. O’Reilly & Associates, Inc., Sebastopol, CA, 2003
• T. B. Passin. Explorer’s Guide to the Semantic Web. Manning Publications Co., Greenwich,
CT, 2004
• T. Segaran, C. Evans, and J. Taylor. Programming the Semantic
Web, 1st ed. O’Reilly Media, Inc., 2009
• J. Hebeler, M. Fisher, R. Blace, and A. Perez-Lopez. Semantic Web Programming. Wiley
Publishing, 2009
• P. Hitzler, M. Krötzsch, and S. Rudolph. Foundations of Semantic Web Technologies. Chap-
man & Hall/CRC, 2009
• G. Antoniou, P. Groth, F. van Harmelen, and R. Hoekstra. A Semantic Web Primer. The
MIT Press, 2012
And about particular topics:
• Linked data: T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global
Data Space, volume 1. Morgan & Claypool Publishers LLC, February 2011. DOI:
10.2200/s00334ed1v01y201102wbe001
• SPARQL: B. DuCharme. Learning SPARQL. O’Reilly Media, Inc., 2011
• OWL and semantic modeling: D. Allemang and J. Hendler. Semantic Web for the Working
Ontologist: Effective Modeling in RDFS and OWL, 2nd ed. Morgan Kaufmann Publishers
Inc., San Francisco, CA, 2011
CHAPTER 3

Data Quality
People have been using computers to record and reason about data for many decades. Typically,
this reasoning is less esoteric than artificial intelligence tasks like classification.
A data modeler usually has in mind some structure for the data that she is trying to model. That
structure must be explicitly defined and communicated using some technology that can
be understood by other people and also processed by automatic systems that can
check and enforce it. Natural language is not enough for that, as it can be ambiguous
and is difficult for machines to process. On the other hand, a structure enforced by
procedural code is difficult for other people to maintain. The right balance is
usually a declarative language that is readable by humans and at the same time
can be parsed and checked by machines.
Rigorous data validation is like a contract that offers advantages to several different parties.
• Consumers have an easier time understanding the semantics of data. For instance, a data
structure that requires either a full name or a given and family name has a simple intuition
while one that has optional full, given and family names leaves the consumer unsure about
the many combinations she may encounter in the data.
• Programmers have to do much less “defensive coding” when working with predictable
data. A programmer need not write special cases for permutations like no name, a full
name and a given name, etc. Introducing quality control into data workflows can reduce
security exploits and catch systematic errors when they first occur rather than years later
when someone stumbles across inconsistent data. For instance, a process may erroneously
insert multiple primary addresses if no system enforces that a person should have no more
than one primary address.
• Producers can precisely define and validate their output. This allows them to test consis-
tency with business processes, perform quality control, and unambiguously communicate
their assets to other parties.
• Queriers can tailor the sophistication of their queries to address a constrained set of
possibilities. Queriers are a specific kind of consumer who is especially vulnerable to
systematic data errors. Unexpected variations in data structures can result in missing query
results. Possibly worse, a single accidental duplication of a record can result in it being
counted many times, once for each combination of attributes in the original and duplicate
record.
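
As a rough illustration of the kind of contract described above, the following Python sketch checks the two constraints used as examples: a person has either a full name or a given and a family name, and no more than one primary address. The field names (fullName, givenName, familyName, addresses, primary) and the dictionary layout are assumptions made for this sketch only.

```python
NAME_KEYS = {"fullName", "givenName", "familyName"}

def check_person(record):
    """Return a list of human-readable violations for one person record."""
    errors = []
    # Which of the name fields are present in this record?
    present = NAME_KEYS.intersection(record)
    if present not in ({"fullName"}, {"givenName", "familyName"}):
        errors.append("expected either fullName or givenName and familyName")
    # At most one address may be flagged as primary.
    primary = [a for a in record.get("addresses", []) if a.get("primary")]
    if len(primary) > 1:
        errors.append("more than one primary address")
    return errors

print(check_person({"fullName": "Alice Smith"}))
print(check_person({"fullName": "Bob Jones", "givenName": "Bob"}))
```

Once data has passed such a check, a consumer only has to handle two name layouts instead of every combination of optional fields.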
3.1 NON-RDF SCHEMA LANGUAGES
While RDF is a relative newcomer to the data scene, most widely-used structured data languages
have a way to describe and enforce some form of data consistency. Examining UML, SQL,
XML, JSON, and CSV allows us to set expectations for RDF validation.

3.1.1 UML
The Unified Modeling Language (UML) is a general-purpose visual modeling language that
can be used to provide a standard way to visualize the design of a system [85]. In 2005, the
Object Management Group (OMG) published UML 2, a revision largely based on the same
diagram notations, but using a modeling infrastructure specified using Meta-Object Facility
(MOF). UML contains 14 types of diagrams, which are classified in three categories: structure,
behavior and interaction. The most popular diagram is the UML class diagram, which defines
the logical structure of a system in terms of classes and relationships between them. Given the
Object Oriented tradition of UML, classes are usually defined in terms of sets of attributes and
operations.
UML class diagrams are employed to visually represent data models.

Example 3.1 UML Class diagram


Figure 3.1 represents an example of a UML class diagram. In this case, there are two
classes, User and Course, with several attributes and two relationships between them. The relation enrolledIn
establishes that a user can be enrolled in a course. The cardinality 0..* means that a user may be
enrolled in several courses (or none), while the cardinality 1..* means that a course must have at least one
user enrolled. The other relationship is instructor, which means that a course must have one
instructor (cardinality 1) while a user can be the instructor of zero or several courses. There is another
relationship (knows) between users.

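[Figure: UML class diagram with two classes. User has attributes name: String, birthdate: Date?, birthplace: IRI?, and gender: [Male Female]?. Course has attributes name: String, startDate: Date, and endDate: Date. The classes are connected by enrolledIn (cardinalities 0..* and 1..*) and instructor (cardinalities 0..* and 1); knows relates User to User.]

Figure 3.1: Example of UML class diagram.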

UML diagrams are typically not refined enough to provide all the relevant aspects of a
specification. There is, among other things, a need to describe additional constraints about the
objects in the model. OCL (Object Constraint Language)¹ has been proposed as a declarative
language to define this kind of constraint. It can also be used to define well-formedness rules,
pre- and post-conditions, model transformations, etc.
OCL contains a repertoire of primitive types (Integer, Real, Boolean, String) and several
constructs to define compound datatypes like tuples, ordered sets, sequences, bags, and sets.

Example 3.2 OCL constraints


The following code represents some constraints in OCL: that the gender must be 'Male'
or 'Female', that a user does not know itself, and that the start date of a course must be earlier
than the end date. Notice that we are using a hypothetical operator < to compare dates, since in
OCL dates are not primitive types.

context User
  inv: self.gender->forAll(g | Set{'Male','Female'}->includes(g))
  inv: self.knows->forAll(k | k <> self)

context Course
  inv: self.startDate < self.endDate
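
For readers more familiar with programming languages than with OCL, the same three invariants could be approximated in Python as shown below. The classes and the invariant method are hypothetical and only mirror the UML model of Figure 3.1. Unlike OCL, this is procedural code that must be called explicitly.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class User:
    name: str
    gender: Optional[str] = None
    knows: List["User"] = field(default_factory=list)

    def invariant(self) -> bool:
        # The gender, if present, is 'Male' or 'Female'.
        gender_ok = self.gender is None or self.gender in {"Male", "Female"}
        # A user does not know itself.
        return gender_ok and all(k is not self for k in self.knows)

@dataclass
class Course:
    name: str
    start_date: date
    end_date: date

    def invariant(self) -> bool:
        # The start date is earlier than the end date.
        return self.start_date < self.end_date

alice = User("Alice", "Female")
bob = User("Robert", "Male", knows=[alice])
algebra = Course("Algebra", date(2017, 9, 3), date(2017, 12, 20))
print(alice.invariant(), bob.invariant(), algebra.invariant())
```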

3.1.2 SQL AND RELATIONAL DATABASES


Probably the largest deployment of machine-actionable data is in relational databases, and cer-
tainly the most popular access to relational data is by Structured Query Language (SQL). One
challenge in describing SQL is the difference between the ISO standard and deployed imple-
mentations.
SQL is designed to capture tabular data, with some implementations enforcing referential
integrity constraints for consistent linking between tables. SQL’s Data Definition Language
(DDL) is used to lay out a table structure; SQL is used to populate and query those tables.
The SQL implementations that do enforce integrity constraints do so when data is inserted into
tables.
The concept of DDL was introduced in the CODASYL database model to write the schema
of a database describing the records, fields and sets of the user data model. It was later used to
refer to a subset of SQL for creating tables and constraints. DDL statements list the properties
in a particular table and their associated primitive datatypes, together with uniqueness and referential
constraints.

1 http://www.omg.org/spec/OCL/
Example 3.3 DDL
CREATE TABLE User (
  id         INTEGER PRIMARY KEY NOT NULL,
  name       VARCHAR(40) NOT NULL,
  birthDate  DATE,
  birthPlace VARCHAR(50),
  gender     ENUM('male','female')  -- ENUM is not standard SQL (MySQL syntax)
);

CREATE TABLE Course (
  id         INTEGER PRIMARY KEY,
  StartDate  DATE NOT NULL,
  EndDate    DATE NOT NULL,
  Instructor INTEGER,
  FOREIGN KEY (Instructor) REFERENCES User(id)
);

CREATE TABLE EnrolledIn (
  studentId INTEGER,
  courseId  INTEGER,
  FOREIGN KEY (studentId) REFERENCES User(id),
  FOREIGN KEY (courseId) REFERENCES Course(id)
);

While implementation support for constraints and datatypes varies, popular datatypes
include numerics like various precisions of integer or float, characters, dates and strings.
Two popular constraints in DDL are for primary and foreign keys. In SQL and DDL,
attribute values are primitive types, which is to say that a user’s course is not a course record, but
instead typically an integer that is unique in some table of courses.

Users table
id  Name    Birthdate   BirthPlace  Gender
67  Alice   1969-09-10  Oviedo      Female
82  Robert  1981-07-10  Lille       Male
34  Carol   1982-03-01  London      Female

Courses table
id  Name     startDate   endDate     Instructor
23  Algebra  2017-09-03  2017-12-20  82
45  Logic    2018-01-10  2018-06-15  82
…   …        …           …           …

EnrolledIn table
studentId  courseId
82         23
34         45
…          …

Figure 3.2: Example of two tables.


Because RDF is a graph, one would typically bypass this reference convention and create
a graph where a user’s course is a course instead of a reference.
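
The point that referential integrity is checked when data is inserted can be tried out with SQLite through Python's standard sqlite3 module. The sketch below uses a reduced version of the User and Course tables (the column types and the reduced column set are assumptions of this example). Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on, which illustrates how enforcement varies between implementations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Course (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  instructor INTEGER NOT NULL REFERENCES User(id)
);
""")
conn.execute("INSERT INTO User VALUES (82, 'Robert')")
conn.execute("INSERT INTO Course VALUES (23, 'Algebra', 82)")  # instructor exists

try:
    # There is no user with id 99, so the insertion is rejected.
    conn.execute("INSERT INTO Course VALUES (45, 'Logic', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```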

3.1.3 XML
XML was proposed by the W3C as an extensible markup language for the Web around
1996 [98]. XML derives from SGML [42], a meta-language that provides a common syn-
tax for textual markup systems and from which the first versions of HTML were also derived.
Given its origins in typesetting, the XML model is adapted to represent textual information
that contains mixed text and markup elements.
The XML model is known as the XML Information Set (XML InfoSet) and consists of
a tree structure, where each node of the tree is defined to be an information item of a particular
type. Each item has a set of type-specific properties associated with it. At the root there is a
document item, which has exactly one element as its child. An element has a set of attribute
items and a list of child elements or text nodes. Attribute items may contain character items or
they may contain typed data such as name tokens, identifiers and references. Element identifiers
and references may be used to connect nodes transforming the underlying tree into a graph.

Example 3.4 XML example


An example of a course representation in XML can be:
<course name="Algebra">
  <student id="alice">
    <name>Alice</name>
    <gender>Female</gender>
    <comments>Friend of <person ref="bob">Robert</person></comments>
  </student>
  <student id="bob">
    <name>Robert</name>
    <gender>Male</gender>
    <birthDate>1981-09-24</birthDate>
  </student>
</course>

XML became very popular in industry and many technologies were developed to query
and transform XML. Among them, XPath is a simple language to select parts of XML documents
that is embedded in other technologies like XSLT or XQuery.
The next XPath snippet finds the names of all students whose gender is "Female":
//student[gender = "Female"]/name
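
As a small illustration, a similar selection can be run with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The document below is a shortened copy of the course example, and the path is written relative to the root element, as that module expects.

```python
import xml.etree.ElementTree as ET

doc = """<course name="Algebra">
  <student id="alice"><name>Alice</name><gender>Female</gender></student>
  <student id="bob"><name>Robert</name><gender>Male</gender></student>
</course>"""

root = ET.fromstring(doc)
# Select the name child of every student whose gender child has the text 'Female'.
names = [n.text for n in root.findall(".//student[gender='Female']/name")]
print(names)
```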

XML defines the notions of well-formed documents and valid documents. Well-formed
documents are XML documents with a correct syntax, while valid documents are documents
that, in addition to being well-formed, conform to some schema definition.
[Figure: tree representation of the XML document in Example 3.4, with a root node, a course element (name="Algebra"), and two student elements with their attributes, child elements, and text nodes.]

Figure 3.3: Tree structure of an XML document.

If one decides to define a schema, there are several possibilities.


• Document Type Definition (DTD). The XML specification [98] declares a basic mechanism
to define the schema of XML documents, which was inherited from SGML and is
called DTD. It allows the structure of a family of XML documents to be defined.

Example 3.5 DTD example


A DTD to validate the XML file in Example 3.4 could be:
<!ELEMENT course (student*)>
<!ELEMENT student (name,gender,birthDate?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
<!ELEMENT birthDate (#PCDATA)>
<!ATTLIST student id ID #REQUIRED>
<!ATTLIST course name CDATA #IMPLIED>

DTD defines the structure of XML using a basic form of regular expressions. However,
DTDs have a limited support for datatypes. For example, it is not possible to validate that
the birth date of a student has the shape of a date.
• XML Schema. This specification was divided in two parts. The first part specifies the
structure of XML documents [89] and the second part a repertoire of XML Schema
datatypes [9].

Example 3.6 XML Schema example


<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name="course">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="student" minOccurs='1' maxOccurs='100'
                    type="Student"/>
      </xs:sequence>
      <xs:attribute name="name" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="Student">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="gender" type="Gender"/>
      <xs:element name="birthDate" type="xs:date" minOccurs='0'/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use='required'/>
  </xs:complexType>
  <xs:simpleType name="Gender">
    <xs:restriction base="xs:token">
      <xs:enumeration value="Male"/>
      <xs:enumeration value="Female"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

An XML Schema validator decorates each structure of the XML document with addi-
tional information called the Post-Schema Validation Infoset, or PSVI. This structure
contains information about the validation process that can be later employed by other
XML tools.
• RelaxNG [20] was developed within the Organization for the Advancement of Structured
Information Standards (OASIS) as an alternative for XML Schema. RelaxNG has two
syntaxes: an XML-based one and a compact one. RelaxNG is grammar based and its
semantics is formally defined by means of axioms and inference rules.

Example 3.7 RelaxNG example


The following code contains a RelaxNG schema to validate Example 3.4 using the
RelaxNG compact syntax.

element course {
  element student {
    element name { xsd:string },
    element gender { "Male" | "Female" },
    element birthDate { xsd:date }?,
    attribute id { xsd:ID }
  }*,
  attribute name { xsd:string }
}

The same example can be expressed in XML as:


<element name="course"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <zeroOrMore>
    <element name="student">
      <element name="name">
        <data type="string"/>
      </element>
      <element name="gender">
        <choice>
          <value>Female</value>
          <value>Male</value>
        </choice>
      </element>
      <optional>
        <element name="birthDate">
          <data type="date"/>
        </element>
      </optional>
      <attribute name="id">
        <data type="ID"/>
      </attribute>
    </element>
  </zeroOrMore>
  <attribute name="name">
    <data type="string"/>
  </attribute>
</element>

• Schematron [50] is a rule-based language based on patterns, rules, and assertions. An
assertion contains an XPath expression and an error message. The error message is displayed
when the XPath expression fails. A rule groups various assertions together and defines
a context in which assertions are evaluated using an XPath expression. Finally, patterns
group various rules together.
Schematron has more expressive power than other schema languages like DTDs, RelaxNG,
or XML Schema, as it can express complex constraints that are impossible with them. In
fact, it is often used to define business rules.
Although Schematron can be used as a stand-alone language, it is commonly used in cooperation
with other schema languages which define the document structure.

Example 3.8 Schematron example


If we have XML documents containing course grades like the following:
<course name="Algebra">
  <student id="S234">
    <name>Alice</name>
    <grade>8</grade>
  </student>
  <student id="B476">
    <name>Robert</name>
    <grade>5</grade>
  </student>
  <average>9</average>
</course>

We can define the following Schematron file to validate:

– That student IDs start with S (lines 3–8).
– That the value of <average> is the mean of the grades (lines 9–17).
1 <sch:schema
2   xmlns:sch="http://purl.oclc.org/dsdl/schematron">
3   <sch:pattern name="Check Ids">
4     <sch:rule context="student">
5       <sch:assert test="starts-with(@id,'S')"
6         >IDs must start by S</sch:assert>
7     </sch:rule>
8   </sch:pattern>
9   <sch:pattern name="Check mean">
10    <sch:rule context="average">
11      <sch:assert
12        test="sum(//student/grade) div
13              count(//student/grade) = ."
14        >Value of <sch:name/> does not match mean
15      </sch:assert>
16    </sch:rule>
17  </sch:pattern>
18 </sch:schema>
Schematron is more expressive than other schema languages like DTDs, XML Schema,
or RelaxNG, as it can define business rules and co-occurrence constraints in addition to
the structural constraints that the other languages define. Nevertheless, Schematron
rules can become complex to define and debug. A popular approach is to combine both
approaches, defining the XML document structure with a traditional schema language
and complementing it with Schematron rules.
• Other schema languages for XML have also been proposed. SchemaPath is a simple
extension of XML Schema with conditional constraints [22]. Bonxai [62] is a more recent
proposal with a readable syntax inspired by RelaxNG.
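
To make the idea of rule-based validation more concrete, the two assertions of the Schematron example (student IDs start with S, and the average equals the mean of the grades) can be imitated in Python with the standard xml.etree.ElementTree module. This is only a sketch of what the assertions check: the function check is hypothetical, and a real Schematron processor evaluates the XPath expressions of the schema instead of hand-written code.

```python
import xml.etree.ElementTree as ET

doc = """<course name="Algebra">
  <student id="S234"><name>Alice</name><grade>8</grade></student>
  <student id="B476"><name>Robert</name><grade>5</grade></student>
  <average>9</average>
</course>"""

def check(root):
    """Return the messages of the assertions that fail for a course element."""
    errors = []
    # Assertion 1: every student id starts with 'S'.
    for student in root.iter("student"):
        if not student.get("id", "").startswith("S"):
            errors.append("IDs must start by S: " + student.get("id", ""))
    # Assertion 2: the declared average equals the mean of the grades.
    grades = [float(g.text) for g in root.findall(".//student/grade")]
    average = root.find("average")
    if average is not None and grades:
        if sum(grades) / len(grades) != float(average.text):
            errors.append("Value of average does not match mean")
    return errors

errors = check(ET.fromstring(doc))
for message in errors:
    print(message)
```

With the sample document, both assertions fail: one student ID starts with B, and the mean of the grades 8 and 5 is 6.5 rather than 9.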
Invoking validation in XML. Different approaches have been proposed to indicate how an
XML document has to be validated against a schema. Some of those approaches are the follow-
ing.
• Embedded schema. DTDs can directly be embedded in XML documents:
<!DOCTYPE course [
  <!ELEMENT course (student*)>
  <!ELEMENT student (name,grade)>
  <!ATTLIST student id CDATA #REQUIRED>
]>
<course name="Algebra">
  ...
</course>

• Directly associate instance data with XML Schema. It can be done, for example, using the
xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes.

For example, the following XML document directly declares that it follows the schema
identified by http://example.org/ns/Course which is located at http://example.org/course.xsd:
    <course xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://example.org/ns/Course
                                http://example.org/course.xsd">
      ...
    </course>

• The XML processing instruction <?xml-model ?> has been proposed to associate an XML
document with a schema [43].
    <?xml-model href="http://example.org/course.rng" ?>
    <?xml-model href="http://example.org/course.xsd" ?>
    <course name="Algebra">
      ...
    </course>
3.1. NON-RDF SCHEMA LANGUAGES 37
Note that the xml-model processing instruction allows multiple schemas to be associated
with the same document.

• In WSDL [19] it is possible to associate documents or predetermined nodes in a document
with arbitrary XML Schema types.
As can be seen, XML provides several ways to associate XML data with schemas for their
validation.
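As a small illustration of the xml-model association described above, the following Python sketch lists the schemas that a document declares through xml-model processing instructions. It only extracts the href pseudo-attribute and does not fetch or apply the schemas; the function name `schema_links` and the sample document are assumptions made for this example.

```python
import re
from xml.dom import minidom

# Sample document modeled on the xml-model listing above.
DOC = (
    '<?xml-model href="http://example.org/course.rng" ?>\n'
    '<?xml-model href="http://example.org/course.xsd" ?>\n'
    '<course name="Algebra"/>\n'
)


def schema_links(xml_text):
    """Return the href of every xml-model processing instruction."""
    document = minidom.parseString(xml_text)
    links = []
    for node in document.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-model"):
            # The PI content is plain text, so the pseudo-attribute
            # has to be extracted by hand.
            match = re.search(r'href\s*=\s*"([^"]*)"', node.data)
            if match:
                links.append(match.group(1))
    return links


print(schema_links(DOC))
```

A validating tool could then choose a validator for each link, for instance by file extension, although that dispatch is outside the scope of this sketch.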

3.1.4 JSON
JSON was proposed by Douglas Crockford around 2001 as a subset of JavaScript (the
acronym originally stood for JavaScript Object Notation). It has evolved into an independent
data-interchange format with its own ECMA specification [35].
A JSON value, or JSON document, can be defined recursively as follows.
• true, false and null are JSON values.

• Any decimal number is also a JSON value.

• Any string of Unicode characters enclosed by " is also a JSON value, called a string value.

• If $k_1, k_2, \ldots, k_n$ are distinct string values and $v_1, v_2, \ldots, v_n$ are JSON values, then
$\{k_1 : v_1, k_2 : v_2, \ldots, k_n : v_n\}$ is a JSON value, called an object. In this case, each
$k_i : v_i$ is a key-value pair. The order of the key-value pairs is not significant.

• If $v_1, v_2, \ldots, v_n$ are JSON values, then $[v_1, v_2, \ldots, v_n]$ is a JSON value, called an array.
The order of the array elements is significant.
Note that in the case of arrays and objects the values $v_i$ can again be objects or arrays, thus
allowing the documents an arbitrary level of nesting. In this way, the JSON data model can be
represented as a tree [14].
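The recursive definition above translates almost directly into code. The following Python sketch mirrors each case of the definition over Python's native representation of JSON (dict, list, str, numbers, bool, None) and also computes the nesting depth of the resulting tree. The function names `is_json_value` and `depth` are illustrative.

```python
import json
import math


def is_json_value(value):
    """Mirror the recursive definition of a JSON value."""
    if value is None or isinstance(value, bool):      # null, true, false
        return True
    if isinstance(value, (int, float)):               # numbers
        return math.isfinite(value)                   # NaN/Infinity are not JSON
    if isinstance(value, str):                        # strings
        return True
    if isinstance(value, dict):                       # objects: string keys
        return all(isinstance(k, str) and is_json_value(v)
                   for k, v in value.items())
    if isinstance(value, list):                       # arrays
        return all(is_json_value(v) for v in value)
    return False


def depth(value):
    """Nesting depth of the tree; scalars have depth 0."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


doc = json.loads('{"name": "Algebra", "students": [{"name": "Alice", "age": 18}]}')
print(is_json_value(doc), depth(doc))
```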

Example 3.9 JSON example


The following example contains a JSON object with two keys: name and students. The value
of name is a string while the value of students is an array of two objects.
    { "name": "Algebra",
      "students": [
        { "name": "Alice",
          "gender": "Female",
          "age": 18
        },
        { "name": "Robert",
          "gender": "Male",
          "birthDate": "1981-09-24"
        }
      ]
    }

Figure 3.4 shows a tree representation of the previous JSON value.

Figure 3.4: Tree structure of JSON.

JSON Schema [101] was proposed as a schema language for JSON with a role similar
to that of XML Schema for XML. It is itself written using JSON syntax and is
programming-language agnostic. It contains the following predefined datatypes: null, boolean,
object, array, number, integer, and string, and allows constraints to be defined on each of them.
In JSON Schema, it is possible to have reusable definitions which can later be referenced.
Recursion is not allowed between references [74].

Example 3.10 JSON Schema example


The following example contains a JSON schema that can be used to validate Example 3.9.
It declares student as an object type with four properties: name, gender, birthDate and age. The first
two are required and some constraints can be added on their values.
The JSON value has type object and contains two properties: name, which must be a string
value, and students which must be an array, whose items conform to the student definition.
    { "$schema": "http://json-schema.org/draft-04/schema#",
      "definitions": {
        "student": { "type": "object",
          "properties": {
            "name": { "type": "string" },
            "gender": { "type": "string", "enum": ["Male", "Female"] },
            "birthDate": { "type": "string", "format": "date" },
            "age": { "type": "integer", "minimum": 1 }
          },
          "required": ["name", "gender"]
        }
      },
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "students": { "type": "array",
          "items": { "$ref": "#/definitions/student" }
        }
      },
      "required": ["name", "students"]
    }
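To show what checking a document against such a schema involves, the following Python sketch implements a deliberately tiny validator. It is hand-written for illustration and supports only the keywords used in Example 3.10 (type, properties, required, enum, minimum, items, and local $ref); it is not a conforming JSON Schema implementation, and keywords such as format are ignored. All function names are assumptions of this sketch.

```python
def has_type(value, name):
    """Check a Python value against a JSON Schema primitive type name."""
    if name == "object":
        return isinstance(value, dict)
    if name == "array":
        return isinstance(value, list)
    if name == "string":
        return isinstance(value, str)
    if name == "boolean":
        return isinstance(value, bool)
    if name == "null":
        return value is None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"unsupported type: {name}")


def resolve(schema, root):
    """Follow local references such as "#/definitions/student"."""
    while "$ref" in schema:
        target = root
        for part in schema["$ref"].lstrip("#/").split("/"):
            target = target[part]
        schema = target
    return schema


def validate(value, schema, root=None, path="$"):
    """Return a list of error messages (empty when the value conforms)."""
    root = schema if root is None else root
    schema = resolve(schema, root)
    expected = schema.get("type")
    if expected is not None and not has_type(value, expected):
        return [f"{path}: expected {expected}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} is not an allowed value")
    if ("minimum" in schema and has_type(value, "number")
            and value < schema["minimum"]):
        errors.append(f"{path}: {value} is below the minimum {schema['minimum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in value:
                errors += validate(value[key], subschema, root, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors += validate(item, schema["items"], root, f"{path}[{index}]")
    return errors


SCHEMA = {
    "definitions": {
        "student": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "gender": {"type": "string", "enum": ["Male", "Female"]},
                "birthDate": {"type": "string", "format": "date"},
                "age": {"type": "integer", "minimum": 1},
            },
            "required": ["name", "gender"],
        }
    },
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "students": {"type": "array",
                     "items": {"$ref": "#/definitions/student"}},
    },
    "required": ["name", "students"],
}

COURSE = {
    "name": "Algebra",
    "students": [
        {"name": "Alice", "gender": "Female", "age": 18},
        {"name": "Robert", "gender": "Male", "birthDate": "1981-09-24"},
    ],
}

print(validate(COURSE, SCHEMA))
```

In practice one would rely on an existing JSON Schema library; the sketch only makes the recursive, keyword-by-keyword nature of the check visible.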

3.1.5 CSV
Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files have historically had
no format-specific schema language. A common use case for CSV (and TSV) is to import
it into a relational database, where it is subject to the same integrity constraints as any other
SQL data. However, wide-ranging practices for documenting table structure and semantics have
historically made it hard for consumers of CSV to consume published CSV data with confidence.
Column headings and meanings may appear as rows in the CSV file, columns in an auxiliary
CSV or flat file, or be omitted entirely.
Spreadsheets are another common generator and consumer of CSV data. Some spread-
sheets may have hand-tooled integrity constraints but they offer no standard schema language.
While CSV has traditionally been schema-less, a recent standard, CSV on the Web (CSVW),
attempts to describe the majority of deployed CSV data. This includes semantics (e.g., mapping
to an ontology), provenance, XML Schema length and numeric value facets (e.g., minimum length,
max exclusive value), and format and structural constraints like foreign keys and datatypes.
CSVW describes a wide corpus of existing practice for publishing CSV documents. Because
of its World Wide Web orientation, it includes internationalization and localization features
not found in other schema languages. Where most data languages standardize the lexical
representation of datatypes like dateTime or integer, CSVW describes a wide range of region-
or domain-specific datatypes. For instance, the following can all be representations of the same
numeric value: 12345.67, 12,345.67, 12.345,67, 1,23,45.67.
CSVW is also unusual in that it can be used to describe denormalized data. Because of
this, it includes separator specifiers to aid in micro-parsing individual data cells into sequences
of atomic datatypes.
CSVW is a very new specification and applies to a domain with historically no standard
schema language. Tools like CSVLint² are adopting CSVW as a way to offer interoperable
schema declarations to enable data quality tests.
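The region-specific number formats mentioned above (12,345.67 vs. 12.345,67 vs. 1,23,45.67) can be normalized once the grouping and decimal characters are known. The following Python sketch shows the idea; the function `parse_number` and its parameter names are illustrative assumptions and do not reproduce the CSVW processing rules.

```python
from decimal import Decimal


def parse_number(text, group_char=",", decimal_char="."):
    """Parse a localized number given its grouping and decimal characters."""
    integer_part, separator, fraction = text.partition(decimal_char)
    # Grouping characters carry no numeric meaning, so they are dropped,
    # regardless of where they occur in the integer part.
    digits = integer_part.replace(group_char, "")
    return Decimal(digits + ("." + fraction if separator else ""))


samples = [
    ("12345.67", ",", "."),
    ("12,345.67", ",", "."),
    ("12.345,67", ".", ","),
    ("1,23,45.67", ",", "."),
]
for text, group, decimal in samples:
    print(text, "->", parse_number(text, group, decimal))
```

All four inputs yield the same Decimal value, which is the point of describing the lexical format in the schema rather than in each consuming application.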

3.2 UNDERSTANDING THE RDF VALIDATION PROBLEM


As we can see in Table 3.1, most data technologies have some description and validation tech-
nology which enables users to describe the desired schema of the data and to check if some
existing data conforms with that schema.

Table 3.1: Data validation approaches

Data format            Validation technology
Relational databases   DDL
XML                    DTD, XML Schema, RelaxNG, Schematron
CSV                    CSV on the Web
JSON                   JSON Schema
RDF                    ShEx/SHACL

Although there have been several previous attempts to define RDF validation technologies
(see Section 3.3), this book focuses on ShEx and SHACL.
In this section we describe the particular characteristics of RDF that have to be taken
into account for its validation:

Graph data model RDF is composed of triples, which have arcs (predicates) between nodes.
We can describe:

• the form of a node (the mechanisms for doing this will be called “node constraints”);

• the number of possible arcs incoming/outgoing from a node; and

• the possible values associated with those arcs.

Figure 3.5 presents an RDF node and its corresponding Shape.


2 https://csvlint.io/
[Figure: the RDF node
    :alice schema:name "Alice"; schema:knows :bob .
shown next to the ShEx shape of the RDF nodes that represent users:
    <User> IRI { schema:name xsd:string ; schema:knows IRI * }]

Figure 3.5: RDF node and its shape.

Unordered arcs A difference between RDF and XML with regard to their data models is that,
while the arcs in RDF are unordered, the sub-elements in XML form an ordered sequence.
RDF validation languages must not assume any order in which the arcs of a node will be treated,
while in XML the order of the elements affects the validation process.
From a theoretical point of view, the arcs related with a node in RDF can be represented
as a bag or multiset, i.e., a set which allows duplicate elements.
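The following Python sketch illustrates both points: the outgoing arcs of a node are collected into a multiset (a `Counter`), and a check modeled on the `<User>` shape of Figure 3.5 gives the same answer whatever the order of the triples. The encoding is an assumption of this sketch: IRIs are plain strings, literals are (lexical form, datatype) pairs, and the shape is read as open, so other properties are tolerated.

```python
from collections import Counter


def arcs_of(node, triples):
    """Outgoing arcs of a node as a multiset of (predicate, object) pairs."""
    return Counter((p, o) for s, p, o in triples if s == node)


def is_iri(term):
    return isinstance(term, str)


def is_string_literal(term):
    return isinstance(term, tuple) and term[1] == "xsd:string"


def conforms_to_user(node, triples):
    """<User> IRI { schema:name xsd:string ; schema:knows IRI * }"""
    arcs = arcs_of(node, triples)
    names = [o for p, o in arcs.elements() if p == "schema:name"]
    knows = [o for p, o in arcs.elements() if p == "schema:knows"]
    return (is_iri(node)
            and len(names) == 1 and is_string_literal(names[0])
            and all(is_iri(o) for o in knows))


GRAPH = [
    (":alice", "schema:name", ("Alice", "xsd:string")),
    (":alice", "schema:knows", ":bob"),
]

print(conforms_to_user(":alice", GRAPH))                   # arcs in one order
print(conforms_to_user(":alice", list(reversed(GRAPH))))   # and in the other
```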
RDF validation ≠ Ontology ≠ Instance data Notice that RDF validation is different from
ontology definition and also different from instance data.
• Ontologies are usually focused on real-world things or at least objects from some domain.
The semantic web community has put a lot of emphasis on defining ontologies for different
domains, and there are several vocabularies like OWL, RDFS, etc. that can be used to that
end. People concerned with this level are ontology engineers, who must have the skills to
understand how to represent the knowledge of some domain.
• Instance data refers to the data of some situation or problem at any given point. That data
can be obtained from different sources and is materialized in some data representation
language. In our case, instance data refers to RDF graphs that are created by developers
and programmers, or generated automatically from other sources like sensors.
• RDF validation is an intermediate process that can check if that instance data conforms to
some desired schema. In the case of RDF, it is focused on RDF graph features which are
at a lower level than ontology features. The people interested in RDF data description and
validation are data engineers and have concerns that are different from those of ontology
engineers. Data engineers are more worried about how to model data so the developers
can effectively and efficiently produce or consume it.
Figure 3.6 represents the difference between instance data, ontology definitions, and RDF
validation.
Shapes ≠ Types Given the open and flexible nature of RDF, nodes in RDF graphs can have
zero, one, or many rdf:type arcs.
[Figure: three different levels.
 Ontology:
    schema:knows a owl:ObjectProperty ;
        rdfs:domain schema:Person ;
        rdfs:range schema:Person .
 RDF validation (constraints): a user must have only two properties, schema:name with
 an xsd:string value and schema:knows with an IRI value:
    <User> IRI { schema:name xsd:string ; schema:knows IRI }
 Instance data:
    :alice schema:name "Alice"; schema:knows :bob .]

Figure 3.6: RDF validation vs. ontology definition.

One application can use nodes of type schema:Person with some properties while another
application can use nodes with the same type but different properties. For example, schema:Person
can represent a friend, an invitee, a patient, etc., in different applications or even in different
contexts of the same application. The same types can have different meanings and different
structures depending on the context.
While from an ontology point of view a concept has a single meaning, applications that are
using that same concept may select different properties and values and thus, the corresponding
representations may differ.
Nodes in RDF graphs are not necessarily annotated with fully discriminating types. This
implies that it is not possible to validate the shape of a node by just looking at its rdf:type arc.
We should be able to define specific validation constraints in different contexts.
Inference Validation can be performed before or after inference. Validation after inference (or
validation on a backward-chaining store that does inference on the fly) checks the correctness of
the implications. An inference testing service could use an input schema describing the contents
of the input RDF graph and an output schema describing the contents of the expected inferred
RDF graph. The service can check that instance data conforms to the input schema before
inference and that, after applying a reasoner, the resulting RDF graph with inferred triples
conforms to the output schema.

Example 3.11 Suppose we have a schema with two shapes, each with one requirement:
• PersonShape requires an rdf:type of :Person
• TeacherShape requires an rdf:type of :Teacher
If we validate the following RDF graph without inference, only :alice would match
PersonShape. However, if we validate the RDF graph that results from applying RDF Schema
inference, then both :bob and :carol would also match PersonShape.

    :teaches rdfs:domain :Teacher .
    :Teacher rdfs:subClassOf :Person .

    :alice a :Person .

    :bob a :Teacher .

    :carol :teaches :algebra .

Validation workflows will likely perform validation both before and after inference.
Systems which perform possibly incomplete inference can use this to verify that their
light-weight, partial inference is producing the required triples.
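Example 3.11 can be reproduced with a few lines of Python. The sketch below applies only two RDF Schema entailment patterns (typing through rdfs:domain and propagation through rdfs:subClassOf), which is enough for this graph but is not a complete RDFS reasoner; the tuple encoding and function names are assumptions of this sketch.

```python
RDF_TYPE = "rdf:type"

GRAPH = {
    (":teaches", "rdfs:domain", ":Teacher"),
    (":Teacher", "rdfs:subClassOf", ":Person"),
    (":alice", RDF_TYPE, ":Person"),
    (":bob", RDF_TYPE, ":Teacher"),
    (":carol", ":teaches", ":algebra"),
}


def rdfs_closure(graph):
    """Apply the domain and subclass rules until no new triple appears."""
    graph = set(graph)
    while True:
        domains = {(s, o) for s, p, o in graph if p == "rdfs:domain"}
        subclasses = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in graph:
            # x p y  and  p rdfs:domain C   =>   x rdf:type C
            for prop, cls in domains:
                if p == prop:
                    new.add((s, RDF_TYPE, cls))
            # x rdf:type C  and  C rdfs:subClassOf D   =>   x rdf:type D
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:
            return graph
        graph |= new


def matching(graph, cls):
    """Nodes that satisfy a shape requiring an rdf:type of cls."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


print("before:", matching(GRAPH, ":Person"))
print("after: ", matching(rdfs_closure(GRAPH), ":Person"))
```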
RDF flexibility RDF was born as a schema-less language, a feature which provided a series
of advantages in terms of flexibility and adaptation of RDF data to different scenarios.
The same property can have different types of values. For example, a property like
schema:creator can have as value a string literal or a more complex resource.

    :angie schema:creator "Keith Richards" ,
        [ a schema:Person ;
          schema:givenName "Mick" ;
          schema:familyName "Jagger"
        ] .

Repeated properties Sometimes, the same property is used for different purposes in the same
data. For example, a book can have two codes with different structure.
    :book schema:name "Moby Dick";
        schema:productID "ISBN-10:1503280780";
        schema:productID "ISBN-13:978-1503280786" .

This is a natural consequence of the re-use of general properties,³ which is especially
common in domains where many kinds of data are represented in the same structure.

Example 3.12 Repeated properties example in clinical records


Repeated properties which require a different model for each value appear frequently in
real-life scenarios. For example, FHIR (see Section 6.2 for a more detailed description)
represents clinical records using a generic observation object. This means that a blood pressure
measurement is recorded using the same data structure as a temperature. The challenge is that
while a temperature observation has one value:⁴
3 Those familiar with the Protégé Pizza Tutorial will recall that it uses a has topping property rather than a has pizza
topping property.
4 Simplified from http://build.fhir.org/observation-example-body-temperature.ttl.html.

    :Obs1 a fhir:Observation ;
        fhir:Observation.code fhir:LOINC8310-5 ;
        fhir:Observation.valueQuantity 36.5 ;
        fhir:Observation.valueUnit "Cel" .

a blood pressure observation has two:⁵

    :Obs2 a fhir:Observation ;
        fhir:Observation.code fhir:LOINC55284-4 ;
        fhir:Observation.component [
            fhir:Observation.component.code fhir:LOINC8480-6 ;
            fhir:Observation.component.valueQuantity 107 ;
            fhir:Observation.component.valueUnit "mm[Hg]"
        ];
        fhir:Observation.component [
            fhir:Observation.component.code fhir:LOINC8462-4 ;
            fhir:Observation.component.valueQuantity 60 ;
            fhir:Observation.component.valueUnit "mm[Hg]"
        ] .

We can see that a blood pressure observation must have two instances of the
fhir:Observation.component property, one with a code for a systolic measurement and the other
with a code for a diastolic measurement.
Treating these two constraints on the property fhir:Observation.component individually
would cause the systolic constraint to reject the diastolic measurement and the diastolic con-
straint to reject the systolic measurement—both constraints must be considered as being satisfied
if one of the components satisfies one and the other component satisfies the other.
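The difference between the two readings can be sketched in Python. In the following illustration, each component is reduced to a dictionary with a code, the two constraints are plain predicates, and "considered as a whole" is implemented naively by trying every pairing of values with constraints. The dictionary encoding and the function names are assumptions of this sketch; real validators use more efficient matching than brute-force permutations.

```python
from itertools import permutations

SYSTOLIC = "fhir:LOINC8480-6"
DIASTOLIC = "fhir:LOINC8462-4"


def is_systolic(component):
    return component.get("code") == SYSTOLIC


def is_diastolic(component):
    return component.get("code") == DIASTOLIC


def each_value_satisfies_all(components, constraints):
    """Individual reading: every value must satisfy every constraint."""
    return all(check(c) for c in components for check in constraints)


def satisfied_as_a_whole(components, constraints):
    """Joint reading: values can be paired one-to-one with constraints."""
    if len(components) != len(constraints):
        return False
    return any(
        all(check(c) for check, c in zip(constraints, ordering))
        for ordering in permutations(components)
    )


blood_pressure = [
    {"code": SYSTOLIC, "value": 107},
    {"code": DIASTOLIC, "value": 60},
]
constraints = [is_systolic, is_diastolic]

print(each_value_satisfies_all(blood_pressure, constraints))  # rejects valid data
print(satisfied_as_a_whole(blood_pressure, constraints))      # accepts it
```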

Closed Shapes The RDF dictum of anyone can say anything about anything is in tension with
conventional data practices, which reject data with any assertions that are not recognized by
the schema. For SQL schemas, this is enforced by the data storage itself; there is simply no
place to record assertions that do not correspond to some attribute in a table specified by the
DDL. XML Schema offers some flexibility with constructs like <xs:any processContents="skip">
but these are rare in formats for the exchange of machine-processable data. Typically the edict
is if you pass me something I do not understand fully, I will reject it.
For shapes-based schema languages, a shape is a collection of constraints to be applied to
some node in an RDF graph and if it is closed, every property attached to that node must be
included in the shape.
Even if the receiver of the data permits extra triples, it may not be able to store or return
them. For instance, a Linked Data container may accept arbitrary data, search for the subgraphs
that it recognizes, and ignore the rest. A user expecting to put data in such a container and
retrieve it will have a rude surprise when they get back only a subset of the submitted data. Even
if the receiver does not validate with closed shapes, the user may wish to pre-emptively validate
their data against the receiver’s schema, flagging any triples not recognized by the schema.
5 Simplified from http://build.fhir.org/observation-example-bloodpressure.ttl.html.
Another value of closed shapes is that they can be used to detect spelling mistakes. If a shape
in a schema includes an optional rdfs:label and a user has accidentally included an rdf:label, the
schema has no way to detect that mistake unless all unknown properties are reported.
As with repeated properties, the validation of closed shapes must consider property
constraints as a whole, rather than examining each individually.
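The rdf:label typo described above can be caught with a check along the following lines. The Python sketch reports every property of a node that the shape does not mention; the tuple encoding and the function name `unexpected_properties` are assumptions of this sketch.

```python
def unexpected_properties(node, triples, allowed):
    """Properties used on node that a closed shape does not list."""
    return sorted({p for s, p, o in triples if s == node and p not in allowed})


# Properties mentioned by a hypothetical shape with an optional rdfs:label.
ALLOWED = {"schema:name", "rdfs:label"}

DATA = [
    (":alice", "schema:name", "Alice"),
    (":alice", "rdf:label", "Alice"),   # typo: rdf: instead of rdfs:
]

print(unexpected_properties(":alice", DATA, ALLOWED))
```

An open shape would accept this node silently, since the misspelled property is simply an extra triple and the correctly spelled one is optional.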

3.3 PREVIOUS RDF VALIDATION APPROACHES


In this section we review some previous approaches that have already been proposed to validate
RDF.

3.3.1 QUERY-BASED VALIDATION


Query-based approaches use a query language to express validation constraints. One of the
earliest attempts in this category was Schemarama [63], by Libby Miller and Dan Brickley, which
applied Schematron to RDF using the Squish query language. That approach was later adapted
to use TreeHugger, which reinterpreted XPath syntax to describe paths in the RDF model [95].
Once SPARQL appeared on the scene, it was also adopted for RDF validation. SPARQL is
highly expressive and can even be used to validate numerical and statistical computations [55].

Example 3.13 Using SPARQL to validate RDF


If we want to validate that an RDF node has a schema:name property with an xsd:string value
and a schema:gender property whose value must be one of schema:Male or schema:Female in SPARQL,
we can use the following query:
    ASK {
      { SELECT ?Person {
          ?Person schema:name ?o .
        } GROUP BY ?Person HAVING (COUNT(*) = 1)
      }
      { SELECT ?Person {
          ?Person schema:name ?o .
          FILTER ( isLiteral(?o) &&
                   datatype(?o) = xsd:string )
        } GROUP BY ?Person HAVING (COUNT(*) = 1)
      }
      { SELECT ?Person (COUNT(*) AS ?c1) {
          ?Person schema:gender ?o .
        } GROUP BY ?Person HAVING (COUNT(*) = 1)
      }
      { SELECT ?Person (COUNT(*) AS ?c2) {
          ?Person schema:gender ?o .
          FILTER ((?o = schema:Female || ?o = schema:Male))
        } GROUP BY ?Person HAVING (COUNT(*) = 1)
      }
      FILTER (?c1 = ?c2)
    }

Using plain-SPARQL queries for RDF validation has the following benefits.
• It is very expressive and can handle most RDF validation needs.

• SPARQL is ubiquitous: most RDF products already have support for SPARQL.
But it also has the following problems.
• Being very expressive, it is also very verbose. SPARQL queries can be difficult to write and
debug by non-experts.

• It can be idiomatic in the sense that there can be more than one way to encode the same
constraint.

• For all but the simplest data structures, it is complex to exhaustively write SPARQL queries
which accept all valid permutations and reject all incorrect structures. This exhaustive enu-
meration is essentially the job of the approaches described below.
SPARQL Inferencing Notation (SPIN) [51] was introduced by TopQuadrant as a mechanism
to attach SPARQL-based constraints and rules to classes. SPIN also contained templates,
user-defined functions, and template libraries. SPIN rules are expressed as SPARQL
ASK queries, where true indicates an error, or CONSTRUCT queries that produce violations.
SPIN uses the expressiveness of SPARQL plus the semantics of the variable ?this standing for
the current focus node (the subject being validated).
SPIN has heavily influenced the design of SHACL. The Working Group decided
to offer a SPARQL-based semantics, and the second part of the working draft also contains
a SPIN-like mechanism for defining SPARQL-native constraints, templates, and user-defined
functions. There are some differences, like the renaming of some terms and the addition of more
core constraints like disjunction, negation, or closed shapes. The following document describes
how SHACL and SPIN relate: http://spinrdf.org/spin-shacl.html.
There have been other proposals using SPARQL combined with other technologies. Fürber
and Hepp [39] proposed a combination of SPARQL and SPIN as a semantic data
quality framework; Simister and Brickley [90] proposed a combination of SPARQL queries
and property paths, which is used by Google; and Kontokostas et al. [53] proposed RDFUnit, a
test-driven framework that employs SPARQL query templates which are instantiated into
concrete quality test queries.
3.3.2 INFERENCE-BASED APPROACHES
Inference-based approaches adapt RDF Schema or OWL to express validation semantics. The
use of the Open World Assumption and the Non-Unique Name Assumption limits the validation
possibilities. In fact, what triggers constraint violations in closed-world systems leads to new
inferences in standard OWL systems. Motik, Horrocks, and Sattler [64] proposed the notion of
extended description logics knowledge bases, in which a certain subset of axioms was designated
as constraints.
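The contrast between the two readings can be illustrated with a functional property that has two values. In the Python sketch below, the open-world reading (without the unique name assumption) concludes that the two values denote the same individual, whereas the closed-world reading reports a violation. The property :biologicalMother, the data, and the function names are hypothetical, and the sketch covers only this single pattern rather than OWL semantics in general.

```python
def values_of(node, prop, triples):
    """Distinct values of prop on node, in a stable order."""
    return sorted({o for s, p, o in triples if s == node and p == prop})


def open_world_inferences(node, prop, triples):
    """Functional property, no unique names: distinct values are equated."""
    values = values_of(node, prop, triples)
    return [(a, "owl:sameAs", b) for a, b in zip(values, values[1:])]


def closed_world_violations(node, prop, triples):
    """Functional property read as a constraint: at most one value."""
    values = values_of(node, prop, triples)
    if len(values) > 1:
        return [f"{node} has {len(values)} values for {prop}, expected at most 1"]
    return []


DATA = [
    (":alice", ":biologicalMother", ":ann"),
    (":alice", ":biologicalMother", ":anna"),
]

print(open_world_inferences(":alice", ":biologicalMother", DATA))
print(closed_world_violations(":alice", ":biologicalMother", DATA))
```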
In [72], Peter F. Patel-Schneider separates the validation problem into two parts: integrity
constraints and closed-world recognition. He shows that description logics can be implemented
for both by translation to SPARQL queries.
In 2010, Tao et al. [96] had already proposed the use of OWL expressions with Closed
World Assumption and a weak variant of Unique Name Assumption to express integrity con-
straints.
Their work forms the basis of Stardog ICV [21] (Integrity Constraint Validation), which
is part of the Stardog database. It allows constraints to be written using OWL syntax but with
different semantics based on the closed world and unique name assumptions. The constraints are
translated to SPARQL queries. As an example, a User could be specified as follows.

Example 3.14 Validation constraints using Stardog ICV


The following code declares several integrity constraints in Stardog ICV. It declares that
nodes that are instances of schema:Person must have exactly one value of schema:name (it is a
functional property), which must be an xsd:string, an optional value of schema:gender, which must
be either schema:Male or schema:Female, and zero or more values of schema:knows, which must be
instances of schema:Person.
    schema:Person a owl:Class ;
        rdfs:subClassOf [ owl:onProperty schema:name ;
                          owl:minCardinality 1 ] ,
                        [ owl:onProperty schema:gender ;
                          owl:minCardinality 0 ] ,
                        [ owl:onProperty schema:knows ;
                          owl:minCardinality 0
                        ] .

    schema:name a owl:DatatypeProperty ,
                  owl:FunctionalProperty ;
        rdfs:domain schema:Person ;
        rdfs:range xsd:string .

    schema:gender a owl:ObjectProperty ,
                    owl:FunctionalProperty ;
        rdfs:domain schema:Person ;
        rdfs:range :Gender .

    schema:knows a owl:ObjectProperty ;
        rdfs:domain schema:Person ;
        rdfs:range schema:Person .

    schema:Female a :Gender .
    schema:Male a :Gender .

Instance nodes are required to have an rdf:type declaration whose value is schema:Person.

3.3.3 STRUCTURAL LANGUAGES


While SPARQL and OWL Closed World were existing languages which were applied to RDF
validation, some novel languages have been designed specifically for that task.
OSLC Resource Shapes [86] have been proposed as a high level and declarative descrip-
tion of the expected contents of an RDF graph expressing constraints on RDF terms.

Example 3.15 OSLC example


Example 3.13 can be represented in OSLC as:
    :user a rs:ResourceShape ;
        rs:property [
            rs:name "name" ;
            rs:propertyDefinition schema:name ;
            rs:valueType xsd:string ;
            rs:occurs rs:Exactly-one ;
        ] ;
        rs:property [
            rs:name "gender" ;
            rs:propertyDefinition schema:gender ;
            rs:allowedValue schema:Male , schema:Female ;
            rs:occurs rs:Zero-or-one ;
        ].

Dublin Core Application Profiles [23] also define a set of validation constraints using
Description Templates.
Fischer et al. [38] proposed RDF Data Descriptions as another domain specific language
that is compiled to SPARQL. The validation is class based in the sense that RDF nodes are val-
idated against a class C whenever they contain an rdf:type C declaration. This restriction enables
the authors to handle the validation of large datasets and to define some optimization techniques
which could be applied to shape implementations.

3.4 VALIDATION REQUIREMENTS


In this section we collect the different validation requirements that we have identified for an
RDF validation language.
Some of these requirements have been borrowed from the SHACL Use Cases and
Requirements document [91]. Other collections of validation requirements have also been
proposed [13].

3.4.1 GENERAL REQUIREMENTS


• VR 1. High-level language: The schema must be defined using a high-level language that
uses concepts familiar to the users that intend to validate RDF.

• VR 2. Concise: Schemas must be easy to understand, read, and write by humans. Verbose
languages tend to be neglected by their users.

• VR 3. Formal: It must be based on a formal language that can be automatically processed
by machines without ambiguity. The schemas must be parsed and processed by automatic
means and the semantics of the different terms must be defined in a non-ambiguous way.

• VR 4. Implementation independence: The schema definition must be implementation
independent so processors can be implemented using different programming languages and
technologies.

• VR 5. Feasibility: The validation algorithm that a schema processor has to implement
must be feasibly computed. It is necessary to check that suitable algorithms are available
to check if RDF datasets comply with some schema. Otherwise, if the validation requires
too many computational resources, there will not be interest in its application in practical
scenarios.

• VR 6. Least power: The schema language must be able to do its job well but no more than
that. Although one could use whole procedural languages like Java or Python to validate
RDF, doing it in this way will be cumbersome as the validation rules will be interspersed
with the code [97]. This principle states that a declarative language should be preferred
over a procedural one.

3.4.2 GRAPH-BASED REQUIREMENTS


Given that the RDF data model is a graph model, an RDF validation language must be able to
describe graph structures. The following set of requirements could be applied to any validation
language related with graphs.

• VR 7. Focus identification: A validation process must identify the graph nodes that are
expected to match constraints. Unlike tree structures like XML or JSON, graphs like RDF
have no “root” node. For RDF, the foci would be IRIs, literals, and blank nodes which are
subject to validation.
• VR 8. Properties: A schema language must be able to describe which arcs relate with which
nodes. In the case of RDF, arcs between nodes are called properties or predicates and are
IRIs. The schema language must be able to describe the properties that depart from some
nodes.

• VR 9. Repeated properties: Some of the arcs that depart from a node may be repeated and
the nodes that they point to could have different structure. The schema language must be
able to declare that some properties can appear repeated but with different contents.

• VR 10. Inverse properties: It must be possible to describe the incoming arcs of a node,
which are also called inverse properties.

• VR 11. Paths: The schema language must be able to describe the paths that relate two
given nodes in a graph. SPARQL 1.1 contains a language to describe paths in an RDF
graph. For example, the transitive traversal of the rdfs:subClassOf property can be expressed
as rdfs:subClassOf*.
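As an illustration of the path requirement (VR 11), the following Python sketch evaluates a zero-or-more path such as rdfs:subClassOf* by a simple graph traversal. It is not a SPARQL engine; the class :Agent, the tuple encoding, and the function name `reachable` are hypothetical.

```python
def reachable(start, prop, triples):
    """Nodes reachable from start through zero or more prop arcs (prop*)."""
    seen = {start}          # zero steps: the start node itself
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == prop and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


GRAPH = [
    (":Teacher", "rdfs:subClassOf", ":Person"),
    (":Person", "rdfs:subClassOf", ":Agent"),
    (":alice", "rdf:type", ":Teacher"),
]

print(sorted(reachable(":Teacher", "rdfs:subClassOf", GRAPH)))
```

The set of visited nodes also guards against cycles, which may occur in arbitrary RDF graphs.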

3.4.3 RDF DATA MODEL REQUIREMENTS


The schema language must be able to check the different types of contents that appear in the
RDF data model.

• VR 12. Node kinds: The RDF data model contains three kinds of nodes: IRIs, literals, and
blank nodes. The schema language must be able to describe the kind of some specific nodes.

• VR 13. Datatypes: The schema language must be able to describe which are the datatypes
that some nodes have.

• VR 14. Datatype facets: The XML Schema datatypes are the most popular datatypes em-
ployed in RDF datasets. Those datatypes can be qualified with facets which constrain the
possible values. For example, one can say that a value is an xsd:integer between 10 and 20.

• VR 15. Language tags: The schema language can describe the language tag associated with
literals of type rdf:langString.

3.4.4 DATA-MODELING-BASED REQUIREMENTS


This set of requirements is common to technologies that model data.

• VR 16. Conjunction: It must be possible to declare that some content must satisfy all the
constraints in a set.

• VR 17. Disjunction: It must be possible to declare that some content must satisfy some of
the constraints in a set.
• VR 18. Addition: It must be possible to declare that some content must be the addition of
some content. In the case of RDF graphs, one may want to declare that a node must have
some content and some other content.
• VR 19. Regular cardinalities: The schema must support regular cardinalities like optional,
zero or more, one or more.
• VR 20. Numerical cardinalities: The schema must support numerical cardinalities like
repetitions between m and n, or at least m repetitions.
• VR 21. Negation: It must be possible to declare that some content must not satisfy some
constraint.
• VR 22. Recursion: It must be possible to declare that some group of constraints depends
on another group in a recursive way.
• VR 23. OneOf : It must be possible to declare that some content can have one of several
structures. For example, a person can have either a full name or a combination of first name
and last name, but not both.
• VR 24. Open/Closed models: The schema language must be able to define that some content
is open and admits other features apart from the declared structure or closed and does not
admit other features.
• VR 25. Co-occurrence constraints: The schema language must be able to declare that the
appearance of some content affects other content.
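The regular and numerical cardinalities of VR 19 and VR 20 can be treated uniformly as (min, max) bounds. The following Python sketch shows one way to do this; the symbols "?", "*", and "+" are borrowed from regular-expression notation, and the function names are assumptions of this sketch.

```python
import math

# Regular cardinalities expressed as (min, max) bounds.
REGULAR = {
    "?": (0, 1),          # optional
    "*": (0, math.inf),   # zero or more
    "+": (1, math.inf),   # one or more
}


def bounds(cardinality):
    """Translate "?", "*", "+" or an explicit (min, max) pair into bounds."""
    if isinstance(cardinality, str):
        return REGULAR[cardinality]
    return cardinality


def within(count, cardinality):
    """True if the number of occurrences respects the cardinality."""
    low, high = bounds(cardinality)
    return low <= count <= high


print(within(0, "?"), within(2, "?"), within(3, (2, 4)), within(1, (2, math.inf)))
```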

3.4.5 EXPRESSIVENESS OF SCHEMA LANGUAGE


• VR 26. Comparisons: The schema language must describe comparisons between values like
declaring that a value is less than or equal to another one.
• VR 27. Arithmetic: The schema language can perform arithmetic expressions for constraint
checking. For example, to describe the area of a rectangle as the product of its declared
base by its declared height it must perform that multiplication.
• VR 28. Expressions: The schema language can define complex expressions to enable fur-
ther constraint checking. This requirement can contradict VR1 so it is necessary to find a
balance between both requirements.
• VR 29. Composition: The schema language provides mechanisms to define constraints that
are composed of other constraints.
• VR 30. Abstraction: The schema language provides mechanisms to define abstractions
with parameters that can later be reused. This feature is usually implemented by functions,
macros, or templates.
• VR 31. Modularity: The schema definitions can be done in a modular way so they can be
reused and imported from external sources.
• VR 32. Specialization: The schema language can define a group of constraints that extends
another group of constraints with some further refinements.

3.4.6 VALIDATION INVOCATION REQUIREMENTS


The following requirements refer to the relationship between schema and instance data, and to
the mechanism by which the validation process is triggered.
• VR 33. Whole dataset: The schema language can define constraints that must be satisfied
by a whole RDF dataset.
• VR 34. Single node: It must be possible to validate a single node in an RDF graph against
a set of constraints.
• VR 35. Selection: There are mechanisms to select which nodes in an RDF graph are
validated against which sets of constraints.
• VR 36. Reuse: It should be possible to reuse a set of constraints in different contexts.

3.4.7 USABILITY REQUIREMENTS


The following set of requirements refers to the usability of the schema language.
• VR 37. Error reporting: Validation processors complying with the schema language can
generate a report of the different violation errors that appeared during validation.
• VR 38. Validation report: The schema language can generate a report of the nodes that
have been validated and the set of constraints they satisfy.
• VR 39. Annotations: It is possible to provide annotations with some extra information
that does not affect validation but can be used for different purposes such as searching,
browsing, UI generation, etc.
• VR 40. Familiar syntax: The schema language supports a syntax that is familiar to its
intended audience. In the case of RDF validation, a familiar syntax could be RDF.
• VR 41. Profiles: The schema language can include the notion of profiles with different ex-
pressiveness so that certain processors implement a subset of the validation functionalities.

3.5 SUMMARY
In this chapter we presented the main motivations for validating RDF. We started by
describing how other technologies, such as UML, SQL, XML, and JSON, handle validation.
The aim of that overview was to present those technologies and to gather a list
of validation requirements that are common to all of them.
We also described some of the previous RDF validation approaches and collected a list
of validation requirements that a good schema language for RDF validation must fulfil. Notice
that some of them contradict each other, so it is necessary to reach a compromise.

3.6 SUGGESTED READING


Non-RDF schema languages
• The following book contains a good overview of non-RDF validation approaches: S. Abite-
boul, I. Manolescu, P. Rigaux, M.-C. Rousset, and P. Senellart. Web Data Management.
Cambridge University Press, 2012. DOI: 10.1017/cbo9780511998225
• R. J. Glushko, Ed. The Discipline of Organizing. The MIT Press, 2013. DOI:
10.1002/bult.2013.1720400108
• M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages
using formal language theory. ACM Transactions on Internet Technology, 5(4):660–704,
November 2005. DOI: 10.1145/1111627.1111631
• Overview of JSON Schema: P. Bourhis, J. L. Reutter, F. Suárez, and D. Vrgoč.
JSON: Data model, query languages and schema specification. In Proc. of the 36th
ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS’17,
pages 123–135, New York, ACM, 2017. DOI: 10.1145/3034786.3056120
RDF validation approaches
• J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity constraints in OWL. In Proc.
of the 24th Conference on Artificial Intelligence (AAAI’10), 2010
• T. Bosch, E. Acar, A. Nolle, and K. Eckert. The role of reasoning for RDF validation. In
Proc. of the 11th International Conference on Semantic Systems, SEMANTICS’15, pages 33–
40, New York, ACM, 2015. DOI: 10.1145/2814864.2814867
• SHACL use cases and requirements: S. Steyskal and K. Coyle. SHACL Use Cases and
Requirements. W3C Working Draft, 2016. https://www.w3.org/TR/shacl-ucr/
CHAPTER 4

Shape Expressions
Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx
was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource
Shapes. It added disjunctions, so it was more expressive than Resource Shapes. Tokens in the
language were adopted from Turtle [80] and SPARQL [44], with tokens for grouping, repetition
and wildcards taken from regular expressions and the RelaxNG Compact Syntax [100]. The
language was described in a paper [80] and codified in a June 2014 W3C member submission [92]
which included a primer and a semantics specification. This was later deemed “ShEx 1.0”.
The W3C Data Shapes Working Group started in September 2014 and quickly coalesced
into two groups: the ShEx camp and the SHACL camp. In 2016, the ShEx camp split from
the Data Shapes Working Group to form a ShEx Community Group (CG). In April of 2017,
the ShEx CG released ShEx 2 with a primer, a semantics specification and a test suite with
implementation reports.
As of publication, the ShEx Community Group was starting work on ShEx 2.1 to add
features like value comparison and unique keys. See the ShEx Homepage http://shex.io/
for the state of the art in ShEx. A collection of ShEx schemas has also been started at
https://github.com/shexSpec/schemas.

4.1 USE OF SHEX


Strictly speaking, a ShEx schema defines a set of graphs. This can be used for many purposes,
including communicating data structures associated with some process or interface, generating
or validating data, or driving user interface generation and navigation. At the core of all of these
use cases is the notion of conformance with a schema. Even if one is using ShEx to create forms,
the goal is to accept and present data which is valid with respect to a schema.
ShEx has several serialization formats:
• a concise, human-readable compact syntax (ShExC);
• a JSON-LD syntax (ShExJ) which serves as an abstract syntax; and
• an RDF representation (ShExR) derived from the JSON-LD syntax.
These are all isomorphic and most implementations can map from one to another.
Tools that derive schemas by inspection or translate them from other schema languages
typically generate ShExJ. Interactions with users, e.g., in specifications, are almost always in the
compact syntax ShExC. As a practical example, in HL7 FHIR, ShExJ schemas are automatically
generated from other formats and presented to the end user using the compact syntax. See
Section 6.2.3 for more details.
ShExR allows RDF tools to be used to manage schemas, e.g., running a SPARQL query to find
out whether an organization is using dc:creator with a string or a foaf:Person, or even whether
the organization is consistent about it.

4.2 FIRST EXAMPLE


Example 4.1 below contains a very simple ShEx schema.
• The first three lines declare prefixes using the same syntax as SPARQL and Turtle.
• The next line defines a shape called :User. Nodes with that shape must satisfy the following
constraints on their properties.
• They must have exactly one value for property schema:name which must be a xsd:string.
• They can have an optional property schema:birthDate with type xsd:date.
• They must have exactly one property schema:gender whose value is schema:Male or
schema:Female or some string.

• They can have zero or more properties schema:knows whose value must be an IRI and con-
form to the :User shape.

Example 4.1
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
  schema:gender [ schema:Male schema:Female ] OR xsd:string ;
  schema:knows IRI @:User *
}

The following RDF graph validates:


:alice schema:name "Alice" ;                 # ✓ Passes as :User
       schema:gender schema:Female ;
       schema:knows :bob .

:bob schema:gender schema:Male ;             # ✓ Passes as :User
     schema:name "Robert" ;
     schema:birthDate "1980-03-10"^^xsd:date .

:carol schema:name "Carol" ;                 # ✓ Passes as :User
       schema:gender "unspecified" ;
       foaf:name "Carol" .

The nodes :alice, :bob and :carol have shape :User.

• :alice conforms because it contains schema:name and schema:gender with their corresponding
values. It does not contain the property schema:birthDate but that property is optional, as
indicated by ‘?’. It also has the property schema:knows with the value :bob which has :User
shape.

• :bob conforms because it contains the properties and values of the :User shape. Note that
the order in which triples are expressed in the example does not matter. These are parsed
into an RDF graph and RDF graphs are unordered collections of triples.

• :carol conforms because it has property schema:name with a xsd:string value, schema:gender
with another xsd:string value and an extra property foaf:name.
Notice that :carol conforms even though it has other properties apart from those mentioned by the
:User shape definition (in this case foaf:name).

ShEx shapes are open by default, which means that they constrain neither the existence
nor the value of the properties not mentioned in the shape. This behavior can be modified
using the CLOSED qualifier as we will explain in Section 4.6.8.

Given the following RDF graph:

:dave schema:name "Dave" ;                   # ✗ Fails as :User
      schema:gender "XYY" ;
      schema:birthDate 1980 .                # 1980 is not an xsd:date

:emily schema:name "Emily", "Emilee" ;       # ✗ Fails as :User
       schema:gender schema:Female .         # too many schema:names

:frank foaf:name "Frank" ;                   # ✗ Fails as :User
       schema:gender schema:Male .           # missing schema:name

:grace schema:name "Grace" ;                 # ✗ Fails as :User
       schema:gender schema:Male ;
       schema:knows _:x .                    # _:x is not an IRI

:harold schema:name "Harold" ;               # ✗ Fails as :User
        schema:gender schema:Male ;
        schema:knows :grace .                # :grace does not conform to :User

If we try to validate the nodes in this graph against the shape :User, the validator
would fail for all of the nodes:

• :dave fails because the value of schema:birthDate is 1980 (an integer) which is not an xsd:date.

• :emily fails because it has two values for property schema:name. Unless otherwise specified,
the default cardinality is “exactly one” (which can also be written as “{1}” or “{1,1}”).

• :frank fails because it does not have the property schema:name.

• :grace fails because the value of schema:knows is a blank node and there is a node constraint
saying that it must be an IRI.

• :harold fails because the value of schema:knows is :grace and :grace does not conform to the
:User shape.
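
The cardinality part of these checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are plain tuples of prefixed names, the table USER_CARDINALITIES and the function cardinality_errors are hypothetical names introduced here, and node constraints (datatypes, IRI, @:User) are deliberately ignored. It is not how any particular ShEx processor is implemented.

```python
from collections import Counter

# (min, max) per predicate of the :User shape; None stands for "unbounded".
# Node constraints such as xsd:string or @:User are not checked in this sketch.
USER_CARDINALITIES = {
    "schema:name": (1, 1),       # no marker: exactly one
    "schema:birthDate": (0, 1),  # ?
    "schema:gender": (1, 1),     # no marker: exactly one
    "schema:knows": (0, None),   # *
}

def cardinality_errors(node, triples, cardinalities=USER_CARDINALITIES):
    """Return the predicates whose number of outgoing arcs is out of range."""
    counts = Counter(p for s, p, o in triples if s == node)
    errors = []
    for predicate, (low, high) in cardinalities.items():
        n = counts[predicate]
        if n < low or (high is not None and n > high):
            errors.append(predicate)
    return errors

TRIPLES = [
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:gender", "schema:Female"),
    (":alice", "schema:knows", ":bob"),
    (":emily", "schema:name", '"Emily"'),
    (":emily", "schema:name", '"Emilee"'),
    (":emily", "schema:gender", "schema:Female"),
    (":frank", "foaf:name", '"Frank"'),
    (":frank", "schema:gender", "schema:Male"),
]

for node in (":alice", ":emily", ":frank"):
    print(node, cardinality_errors(node, TRIPLES))
```

Predicates that the shape does not mention, such as foaf:name, are simply never counted, which mirrors the open-by-default behavior of shapes.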

4.3 SHEX IMPLEMENTATIONS


At the time of this writing, we are aware of the following implementations of ShEx.

• shex.js for JavaScript/N3.js (Eric Prud’hommeaux) https://github.com/shexSpec/shex.js/;

• Shaclex for Scala/Jena (Jose Emilio Labra Gayo) https://github.com/labra/shaclex/;

• shex.rb for Ruby/RDF.rb (Gregg Kellogg) https://github.com/ruby-rdf/shex;

• Java ShEx for Java/Jena (Iovka Boneva/University of Lille) https://gforge.inria.fr/projects/shex-impl/; and

• ShExkell for Haskell (Sergio Iván Franco and Weso Research Group) https://github.com/weso/shexkell.

There are also several online demos and tools that can be used to experiment with ShEx.

• shex.js (http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html);

• Shaclex (http://shaclex.herokuapp.com); and

• ShExValidata (for ShEx 1.0) (https://www.w3.org/2015/03/ShExValidata/).


4.4 THE SHAPE EXPRESSIONS LANGUAGE
4.4.1 SHAPE EXPRESSIONS COMPACT SYNTAX
The ShEx compact syntax (ShExC) was designed to be read and edited by humans. It follows
some conventions which are similar to Turtle or SPARQL.
• PREFIX and BASE declarations follow the same convention as in Turtle. In the rest of this
chapter we will omit prefix declarations for brevity.

• Comments start with a # and continue until the end of line.

• The keyword a identifies the rdf:type property.

• Relative and absolute IRIs are enclosed by <> and prefixed names (a shorter way to write
out IRIs) are written with prefix followed by a colon “:”.

• Blank nodes are identified using _:label notation.

• Literals can be enclosed by the same quotation conventions (', ", ''', """) as in Turtle.

• Keywords (apart from a) are not case sensitive, which means that MinInclusive is the same
as MININCLUSIVE.
A ShExC document declares a ShEx schema. A ShEx schema is a set of labeled shape
expressions which are composed of node constraints and shapes. These constrain the permissible
values or graph structure around a node in an RDF graph. When we are considering a specific
node, we call that node the focus node.
The triples which have the focus node as a subject are called outgoing arcs; those with
the focus node as an object are called incoming arcs. (Typical RDF idioms call for constraints
on outgoing arcs much more frequently than on incoming arcs.) Together, the incoming and
outgoing arcs are called the neighborhood of that node.
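
As a small illustration of these definitions, the following Python sketch splits the triples around a focus node into outgoing and incoming arcs. The tuple representation of triples and the function name neighborhood are assumptions made for this example only.

```python
def neighborhood(focus, triples):
    """Return (outgoing, incoming) arcs of the focus node."""
    outgoing = [t for t in triples if t[0] == focus]  # focus node is the subject
    incoming = [t for t in triples if t[2] == focus]  # focus node is the object
    return outgoing, incoming

TRIPLES = [
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:name", '"Robert"'),
]

outgoing, incoming = neighborhood(":bob", TRIPLES)
print(outgoing)  # arcs leaving :bob
print(incoming)  # arcs arriving at :bob
```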
Shape expression labels can be IRIs or blank nodes but only IRI labels can be referenced
from outside the schema. In the previous Example 4.1, :User is an IRI label.
Node constraints declare the shape of a focus node without looking at the arcs. They can
declare the kind of node (IRI, blank node or literal), the datatype in case of literals, describe it
with XML Schema facets (e.g., min and max numeric values, string lengths, number of digits), or
enumerate a value set. Figure 4.1 highlights the node constraints that appear in Example 4.1, which
are: xsd:string and xsd:date (datatype constraints), [schema:Male schema:Female] (a value set), IRI (a
node kind declaration) and @:User (a value shape). Node constraints will be described in more
detail in Section 4.5.
Triple constraints define the triples that appear in the neighborhood of a focus node. They
usually contain a property (or inverse property), a node constraint, and a cardinality declaration
which is one by default.

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
  schema:gender [schema:Male schema:Female] OR xsd:string ;
  schema:knows IRI @:User *
}

Figure 4.1: Node constraints in a shape.

For example, schema:name xsd:string is a triple constraint. The :User shape from Example 4.1
was formed by four triple constraints. Triple constraints will be described later in Section 4.6.1.

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
  schema:gender [schema:Male schema:Female] OR xsd:string ;
  schema:knows IRI @:User *
}

Figure 4.2: Triple constraints in a shape.

Triple constraints can be grouped using the semicolon operator ; to form triple expressions
(see footnote 1). Shapes are enclosed by curly braces {} and contain triple expressions.
Shapes are the basic form of shape expressions, although more complex shape expressions
can be formed by combining them with the logical operators AND, OR and NOT, which will be
described later in Section 4.8. Shape expressions are identified by shape expression labels.
Figure 4.4 shows a compound shape expression formed by combining the shape reference
@:User with a shape that contains a single triple constraint :teaches @:Course using the AND operator.
The full ShEx BNF grammar is specified at http://shex.io/shex-semantics/#shexc.

4.4.2 INVOKING VALIDATION


In Example 4.1, we tested several RDF nodes (:alice, :bob, ... :harold) against the shape :User.
1 We will see that the pipe operator can also be used to form triple expressions in Section 4.6.4.

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
  schema:gender [schema:Male schema:Female] OR xsd:string ;
  schema:knows IRI @:User *
}

[In the figure, :User is marked as the shape label, the braces and their content as the shape, and the constraints inside the braces as the triple expression.]

Figure 4.3: Shapes, shape expression labels and triple expressions.

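:Teacher @:User AND {
  :teaches @:Course ;
}

[In the figure, :Teacher is marked as the shape expression label, @:User AND { ... } as the shape expression, and the braces with their content as the shape.]

Figure 4.4: Shape expression and shape.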

ShEx validation takes as input a schema, an RDF graph, and a shape map, and returns
another shape map.
The input shape map (called fixed shape map) contains a list of nodeSelector@shapeLabel as-
sociations separated by commas, where nodeSelector is an RDF node and shapeLabel is a shape
label. Both use N-Triples notation.
A fixed map would look like:
<http://data.example/#alice>@<http://schema.example/#User>,
<http://data.example/#bob>@<http://schema.example/#User>

Although shape maps use absolute IRIs for RDF nodes and shape labels, we will use
prefixes to abbreviate them in our listings:
:alice@:User,
:bob@:User

Note that during evaluation, the processor may need to check the conformance of other
nodes against other shapes.

Example 4.2 Invoking validation example


If we define the following schema:

:User {
  schema:name xsd:string ;
  schema:knows @:User *
}

and the RDF graph:

:alice schema:name "Alice" ;
       schema:knows :carol .

:bob schema:name "Robert" .

:carol schema:name "Carol" .

when we invoke a ShEx processor with the fixed shape map:

:alice@:User,
:bob@:User

the result shape map is:

:alice@:User,
:bob@:User,
:carol@:User

The reason is that in order to check that :alice conforms to :User, the processor must
check that :carol also conforms to :User and hence, it adds the association :carol@:User to the
result shape map.
Figure 4.5 depicts the validation process.
There are many use case-dependent ways to compose a fixed shape map. ShEx defines a
common one called query shape map which uses triple patterns to select nodes. Triple patterns
use curly braces and three values that represent the subject, predicate and object of a triple. They
can contain the value FOCUS to identify the node we want to select and _ to indicate that we do
not constrain some value.

Example 4.3 Query map example


The following query map selects all subjects of schema:name, all objects of schema:knows and
nodes that have rdf:type with value schema:Person.
{FOCUS schema:name _}@:User,
{_ schema:knows FOCUS}@:User,
{FOCUS rdf:type schema:Person}@:User

Section 4.9 describes fixed shape maps and query shape maps in greater detail.
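
To make the idea of a query shape map more concrete, the following Python sketch turns triple patterns into a fixed shape map. It is a simplified illustration, not the ShapeMap algorithm itself: triples and patterns are plain tuples, and the names select_nodes and to_fixed_map are hypothetical.

```python
FOCUS, ANY = "FOCUS", "_"

def select_nodes(pattern, triples):
    """Return the nodes found in the FOCUS position of every matching triple."""
    selected = set()
    for triple in triples:
        focus, matched = None, True
        for wanted, actual in zip(pattern, triple):
            if wanted == FOCUS:
                focus = actual          # remember the candidate node
            elif wanted != ANY and wanted != actual:
                matched = False         # a fixed term of the pattern differs
        if matched and focus is not None:
            selected.add(focus)
    return selected

def to_fixed_map(query_map, triples):
    """Expand (pattern, shape) pairs into a sorted list of (node, shape) pairs."""
    return sorted({(node, shape)
                   for pattern, shape in query_map
                   for node in select_nodes(pattern, triples)})

GRAPH = [
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:knows", ":carol"),
    (":bob", "schema:name", '"Robert"'),
    (":carol", "schema:name", '"Carol"'),
]

QUERY_MAP = [
    ((FOCUS, "schema:name", ANY), ":User"),
    ((ANY, "schema:knows", FOCUS), ":User"),
]

print(to_fixed_map(QUERY_MAP, GRAPH))
```

A node selected by several patterns for the same shape appears only once in the resulting fixed shape map, because the pairs are collected in a set.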

[Figure: a ShEx validator receives three inputs, the ShEx schema and the RDF graph of Example 4.2 together with the fixed shape map :alice@:User, :bob@:User, and emits the result shape map :alice@:User, :bob@:User, :carol@:User.]

Figure 4.5: Validation process which accepts a fixed shape map and emits a result shape map.

In the previous example, validating :alice as a :User entailed validating :carol as a :User.
Unless the validation engine has some sort of state persistence, it would be more efficient to
validate once with a shape map like:
:alice@:User, :carol@:User

than to validate :alice and :carol separately.


Validating a shape map with multiple nodeSelector/shapeLabel pairs allows the engine to
leverage any pairs that it has already tested.
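
The following Python sketch illustrates how an engine might reuse node/shape pairs that it has already tested, using the schema and data of Example 4.2. It is a toy under explicit assumptions: only the :User shape is supported, string literals are written with surrounding quotes, the schema:knows arcs are assumed to contain no cycles, and check_user and validate are hypothetical names.

```python
GRAPH = [
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:knows", ":carol"),
    (":bob", "schema:name", '"Robert"'),
    (":carol", "schema:name", '"Carol"'),
]

def is_string_literal(term):
    # Toy convention: plain string literals are written with surrounding quotes.
    return len(term) >= 2 and term[0] == '"' and term[-1] == '"'

def check_user(node, graph, results):
    """Check node against :User, recording every (node, shape) pair tested."""
    key = (node, ":User")
    if key not in results:  # pairs that were already tested are reused
        names = [o for s, p, o in graph if s == node and p == "schema:name"]
        known = [o for s, p, o in graph if s == node and p == "schema:knows"]
        ok = len(names) == 1 and is_string_literal(names[0])
        for other in known:
            # Assumes acyclic schema:knows arcs; cycles need extra bookkeeping.
            ok = check_user(other, graph, results) and ok
        results[key] = ok
    return results[key]

def validate(fixed_map, graph):
    """Return the result shape map as a {(node, shape): conforms} dictionary."""
    results = {}
    for node, _shape in fixed_map:  # only :User is supported in this sketch
        check_user(node, graph, results)
    return results

print(validate([(":alice", ":User"), (":bob", ":User")], GRAPH))
```

Validating :alice records the pair for :carol as a side effect, so a later request for :carol@:User in the same run is answered from the dictionary instead of being recomputed.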

4.4.3 STRUCTURE OF SHAPE EXPRESSIONS


In Section 4.4.1, we described shape expressions as being composed of node constraints and
shapes. These can also be combined with the logical operators And, Or and Not. And and Or expres-
sions in turn contain two or more shape expressions. When we refer to a shape expression, we
mean one of the following.
• A node constraint, which constrains the set of allowed values of a node.
• A shape, which constrains the neighborhood of a node.
• An And of two or more shape expressions (called ShapeAnd).
• An Or of two or more shape expressions (called ShapeOr).

• A Not of one shape expression (called ShapeNot).

• An external shape expression.

This recursive structure forms a tree which has node constraints and shapes as leaves.
Figure 4.6 represents the ShEx data model.

ShapeExpr is one of:
  NodeConstraint    nodeKind: [IRI | BNode | Literal | NonLiteral]?
                    datatype: IRI?
                    xsFacets: XsFacet*
                    values: ValueSetValue*
  Shape             closed: Boolean?
                    extra: List[IRI]?
  ShapeAnd          expressions: ShapeExpr{2,}
  ShapeOr           expressions: ShapeExpr{2,}
  ShapeNot          expressions: ShapeExpr
  ShapeExternal

TripleExpr is one of:
  TripleConstraint  inverse: Boolean
                    pred: IRI
                    min: Integer
                    max: Integer | Unbounded
                    valueExpr: ShapeExpr
  EachOf            expressions: TripleExpr{2,}
                    min: Integer
                    max: Integer | Unbounded
  OneOf             expressions: TripleExpr{2,}
                    min: Integer
                    max: Integer | Unbounded

Figure 4.6: ShEx data model.

Node constraints and shapes are described in the following sections while the logical op-
erators are discussed in Section 4.8 and external shapes in Section 4.7.3.
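
The tree structure of shape expressions can be mirrored with a few Python dataclasses. The sketch below is a reduced, hypothetical rendering of the data model: it keeps only some of the fields, lets a Shape hold a flat list of triple constraints instead of a full triple expression, and omits ShapeOr, ShapeNot and ShapeExternal. The helper leaves is introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class NodeConstraint:
    node_kind: Optional[str] = None  # "IRI", "BNode", "Literal" or "NonLiteral"
    datatype: Optional[str] = None
    values: List[str] = field(default_factory=list)

@dataclass
class TripleConstraint:
    pred: str
    value_expr: Optional["ShapeExpr"] = None
    min: int = 1
    max: Optional[int] = 1           # None stands for "unbounded"
    inverse: bool = False

@dataclass
class Shape:
    # Simplification: a flat list instead of an EachOf/OneOf triple expression.
    triple_constraints: List[TripleConstraint] = field(default_factory=list)
    closed: bool = False

@dataclass
class ShapeAnd:
    expressions: List["ShapeExpr"]

ShapeExpr = Union[NodeConstraint, Shape, ShapeAnd]

def leaves(expr):
    """Collect the node constraints and shapes at the leaves of the tree."""
    if isinstance(expr, ShapeAnd):
        return [leaf for e in expr.expressions for leaf in leaves(e)]
    return [expr]

# A shape expression that requires an IRI with one schema:name string value.
user = ShapeAnd([
    NodeConstraint(node_kind="IRI"),
    Shape([TripleConstraint("schema:name", NodeConstraint(datatype="xsd:string"))]),
])
print([type(leaf).__name__ for leaf in leaves(user)])
```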

4.4.4 START SHAPE EXPRESSION


The shape expression against which a node is validated might be selected by label, or it might
default to a special shape expression called the start shape.
A schema can have one additional shape expression called the start expression. This serves as
“start here” advice from the schema author and is useful when describing a graph with a single
purpose. For instance, the medical data protocol FHIR (see Section 6.2) has specific schemas
for resources like Patient.

Example 4.4 ShEx schema with start directive


Consider the following code:
start = @<Patient>

<Patient> {
  ...
}
...

In the compact syntax, the directive start = @<Patient> declares that the shape expression
<Patient> will be used by default if a shape is not explicitly provided in the shape map.

In shape maps, it is possible to declare that a node must be validated against the start
shape expression by using the keyword START. For example, the following shape map:
:alice@START,
:bob@<Doctor>

would validate :alice against the start shape expression (in the previous example, it would be
<Patient>) and :bob against <Doctor>.

4.5 NODE CONSTRAINTS


Node constraints describe the allowed values of a node. These include specification of RDF
node kind, literal datatype, string and numeric facets, and value sets.
Node constraints can appear as a labeled shape expression or as part of triple constraints.

Example 4.5
Any place where no node constraint is wanted can be marked with a period ("."). This
is analogous to the period which matches any character in regular expressions. The following
example lists the properties that a :User must have but it does not specify any constraint on their
values:
:User {
  schema:name . ;
  schema:alternateName . * ;
  schema:birthDate . ?
}

Given the following RDF graph:



:alice schema:name 23 .                      # ✓ Passes as :User

:bob schema:name "Robert" ;                  # ✓ Passes as :User
     schema:alternateName "Bob" ,
                          "Bobby" ,
                          <Bob> ;
     schema:birthDate "Unknown" .

If we provide the shape map :alice@:User,:bob@:User the ShEx processor would return that
they both conform.

Node constraints usually appear as part of value expressions in triple constraints.


Example 4.6 Node constraint in a value expression
The following example declares that nodes with shape :User must have a property schema:url
whose value must be an IRI.
:User {
  schema:url IRI
}

Node constraints can also appear as top level shapes.


Example 4.7 Node constraint as top-level shape
The following code defines two shapes, :HomePage and :CanVoteAge, which are defined as
node constraints. The first one declares that nodes must be IRIs and the second one that they
must be xsd:integer values greater than or equal to 18.
:HomePage IRI

:CanVoteAge xsd:integer MinInclusive 18

If we provide a ShEx processor with the shape map:

<http://example.org/alice>@:HomePage,
23@:CanVoteAge,
45@:HomePage,
14@:CanVoteAge

the result would be that the first two nodes are conformant while the last two nodes are
non-conformant.

It is also possible to combine top-level node constraints with more complex shapes.
Example 4.8 Node constraint as top-level shape
The following declaration of shape :User says that nodes conforming to shape :User must
be IRIs and have a property schema:name with an xsd:string value.

:User IRI AND {
  schema:name xsd:string
}

In this case, the external AND can be omitted, so the previous shape is equivalent to:
:User IRI {
  schema:name xsd:string
}

Table 4.1 gives an overview of the main types of node constraints with some examples
and a short description.

Table 4.1: Node constraints

Name             Description                                     Examples
Anything         The value can be anything                       .
Datatype         The value must be an element of the datatype    xsd:string, xsd:date, cdt:distance, ...
Node kind        The value must have that kind                   IRI, BNode, Literal, NonLiteral
Value set        The value must be an element of that set        [:Male :Female]
Shape reference  The value must conform to the shape :User       @:User

4.5.1 NODE KINDS


Node kinds describe the kind that a value must have. There are four node kinds in ShEx: Literal,
IRI, BNode, and NonLiteral which follow the rules defined in RDF 1.1 for such terms.

Example 4.9
The following example declares that the value of property schema:name must be a literal and
the value of schema:follows must be an IRI.
:User {
  schema:name Literal ;
  schema:follows IRI
}

Table 4.2: Node kinds

Value       Description             Examples
Literal     Any RDF literal         "Alice", "Spain"@en, 42, true
IRI         Any RDF IRI             <http://example.org/Alice>, ex:alice, :bob
BNode       Any blank node          _:x, []
NonLiteral  Any IRI or blank node   <http://example.org/alice>, _:x

:alice schema:name "Alice" ;                 # ✓ Passes as :User
       schema:follows :bob .

:bob schema:name :Bob ;                      # ✗ Fails as :User
     schema:follows _:x .                    # :Bob is not a literal and _:x is not an IRI
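
The four node kinds can be illustrated with a short Python sketch. It relies on a deliberately naive convention that is an assumption of this example, not part of ShEx: terms are strings, blank nodes start with _:, literals are written with a leading double quote, and everything else is treated as an IRI (so numeric shorthand such as 42 is not handled). The names node_kind and has_kind are hypothetical.

```python
def node_kind(term):
    """Classify a term written in the toy notation described above."""
    if term.startswith("_:"):
        return "BNode"
    if term.startswith('"'):
        return "Literal"
    return "IRI"

def has_kind(term, kind):
    """Check a term against one of Literal, IRI, BNode or NonLiteral."""
    actual = node_kind(term)
    if kind == "NonLiteral":
        return actual in ("IRI", "BNode")
    return actual == kind

print(has_kind('"Alice"', "Literal"), has_kind(":bob", "IRI"))  # values of :alice
print(has_kind(":Bob", "Literal"), has_kind("_:x", "IRI"))      # values of :bob
```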

4.5.2 DATATYPES
Like most schema languages, ShEx includes datatype constraints which declare that a focus
node must be a literal with some specific datatype. ShEx has special support for XML Schema
datatypes [9] for which it checks that the lexical form also conforms to the expected datatype.

Example 4.10 Simple datatypes


The following example declares the datatypes that the values of the schema:name, foaf:age and
schema:birthDate properties must have.

:User {
  schema:name xsd:string ;
  foaf:age xsd:integer ;
  schema:birthDate xsd:date ;
}

:alice schema:name "Alice" ;                       # ✓ Passes as :User
       foaf:age 36 ;
       schema:birthDate "1981-07-10"^^xsd:date .

:bob schema:name "Robert"^^xsd:string ;            # ✓ Passes as :User
     foaf:age "26"^^xsd:integer ;
     schema:birthDate "1981-07-10"^^xsd:date .

:carol schema:name :Carol ;                        # ✗ Fails as :User
       foaf:age "14" ;                             # :Carol is an IRI
       schema:birthDate "2003-06-10"^^xsd:date .   # and "14" a string

:dave schema:name "Dave" ;                         # ✗ Fails as :User
      foaf:age "Unknown"^^xsd:integer ;            # invalid lexical forms
      schema:birthDate "Unknown"^^xsd:date .

As we said, for XML Schema datatypes, ShEx also checks that the lexical form matches
the expected datatype. For example, the foaf:age of :dave is "Unknown"^^xsd:integer and although
it declares that "Unknown" is an integer and some RDF parsers allow those declarations, "Unknown"
does not have the integer’s lexical form and the ShEx processor will complain. The same happens
for the value of schema:birthDate.
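
A lexical form check of this kind can be approximated in Python as follows. The sketch is an assumption-laden simplification rather than the XML Schema definition: valid_integer and valid_date are hypothetical helpers, and the date check ignores the optional timezone and any year outside 0001 to 9999.

```python
import re
from datetime import date

def valid_integer(lexical):
    """True if the string looks like an xsd:integer lexical form."""
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None

def valid_date(lexical):
    """True if the string is a calendar date written as YYYY-MM-DD."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical) is None:
        return False
    try:
        date.fromisoformat(lexical)  # rejects impossible dates such as 02-30
    except ValueError:
        return False
    return True

print(valid_integer("26"), valid_date("1981-07-10"))     # values of :bob
print(valid_integer("Unknown"), valid_date("Unknown"))   # values of :dave
```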

Example 4.11 Custom datatypes


Although the most common use case is to use XML Schema datatypes, RDF data can
use other datatypes. In the following example, a picture contains the properties schema:width and
schema:height using a hypothetical custom datatype for distances (cdt:distance).

:Picture {
  schema:name xsd:string ;
  schema:width cdt:distance ;
  schema:height cdt:distance
}

:gioconda schema:name "Mona Lisa" ;            # ✓ Passes as :Picture
          schema:width "21 in"^^cdt:distance ;
          schema:height "30 in"^^cdt:distance .

:other schema:name "Other picture" ;           # ✗ Fails as :Picture
       schema:width "21 in"^^xsd:string ;      # expected cdt:distance
       schema:height 30 .

Example 4.12 Language-tagged literals


The datatype rdf:langString identifies language-tagged literals (see [25, Section 3.3]), i.e.,
RDF literals that have a language tag.

:Country {
  schema:name rdf:langString ;
}

:spain schema:name "España"@es .             # ✓ Passes as :Country

:france schema:name "France" .               # ✗ Fails as :Country

4.5.3 FACETS ON LITERALS


XML Schema provides a useful library of string and numeric tests called facets [9]. These facets
are listed in Table 4.3 with a sample argument and some passing and failing values.

Table 4.3: Facets on literals

Facet and Argument        Passing Values                           Failing Values
MinInclusive 1            "1"^^xsd:decimal, 1, 2, 98, 99, 100      "1"^^xsd:string, -1, 0
MinExclusive 1            2, 98, 99, 100                           -1, 0, 1
MaxInclusive 99           1, 2, 98, 99                             100
MaxExclusive 99           1, 2, 98                                 99, 100
TotalDigits 3             "1"^^xsd:integer, 9, 999, 0999,          "1"^^xsd:string, 1000, 01000,
                          9.99, 99.9, 0.1020                       1.1020, .1021, 0.1021
FractionDigits 3          "1"^^xsd:decimal, 0.1, 0.1020, 1.1020    "1"^^xsd:integer, 0.1021, 0.10212
Length 3                  "123"^^xsd:string, "123"^^xsd:integer,   "12"^^xsd:string, "12"^^xsd:integer,
                          "abc"                                    "ab", "abcd"
MinLength 3               "abc", "abcd"                            "", "ab"
MaxLength 3               "", "ab", "abc"                          "abcd", "abcde"
/^ab+/                    "ab", "abb", "abbcd"                     "", "a", "acd", "cab",
(regex pattern)                                                    "AB", "ABB", "ABBCD"
/^ab+/i                   "ab", "abb", "abbcd",                    "", "a", "acd"
(regex pattern, i flag)   "AB", "ABB", "ABBCD"
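
The TotalDigits and FractionDigits rows can be reproduced with Python's decimal module. This is a sketch under assumptions: total_digits and fraction_digits are hypothetical helpers, leading and trailing zeros are discarded through Decimal.normalize(), and the datatype test (for example, that "1"^^xsd:string fails TotalDigits) is left out.

```python
from decimal import Decimal

def total_digits(lexical):
    """Number of significant digits after dropping leading and trailing zeros."""
    _sign, digits, exponent = Decimal(lexical).normalize().as_tuple()
    if exponent >= 0:
        return len(digits) + exponent   # e.g. 1000 is stored as 1E+3
    return max(len(digits), -exponent)  # e.g. 0.102 has three digits

def fraction_digits(lexical):
    """Number of digits after the decimal point, ignoring trailing zeros."""
    exponent = Decimal(lexical).normalize().as_tuple().exponent
    return max(0, -exponent)

for value in ("0999", "0.1020", "1000", "1.1020"):
    print(value, total_digits(value), fraction_digits(value))
```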
Example 4.13
:Product {
  schema:name xsd:string MaxLength 10 ;
  schema:weight xsd:decimal MinInclusive 1 MaxInclusive 200 ;
  schema:sku /^[A-Z0-9]{10,20}$/ ;
}

:product1 schema:name "Product 1" ;              # ✓ Passes as :Product
          schema:weight "23.0"^^xsd:decimal ;
          schema:sku "A23456B234CBDF" .

:product2 schema:name "Product 2" ;              # ✗ Fails as :Product
          schema:weight "245.5"^^xsd:decimal ;   # schema:weight > 200
          schema:sku "ABC" .                     # schema:sku fails regex
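
The three facets used by the :Product shape can be mimicked in plain Python. The function product_errors below is a hypothetical helper that receives the lexical forms of the three values; it does not check datatypes and uses Python's re module, whose regular expression dialect is close to, but not identical with, the XPath one.

```python
import re
from decimal import Decimal

SKU = re.compile(r"^[A-Z0-9]{10,20}$")

def product_errors(name, weight, sku):
    """Return a list of facet violations for one product."""
    errors = []
    if len(name) > 10:                                  # MaxLength 10
        errors.append("schema:name longer than 10 characters")
    if not Decimal(1) <= Decimal(weight) <= Decimal(200):
        errors.append("schema:weight outside [1, 200]")  # Min/MaxInclusive
    if SKU.search(sku) is None:                          # pattern facet
        errors.append("schema:sku does not match the pattern")
    return errors

print(product_errors("Product 1", "23.0", "A23456B234CBDF"))
print(product_errors("Product 2", "245.5", "ABC"))
```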

The pattern constraint (‘/regex/’) is based on the XPath regular expression function
fn:matches(str,re,flags), which takes as parameters the string to match, the regular expression,
and an optional flags parameter to modify the matching behavior.
XPath regular expressions are based on common conventions from other languages like
Perl or Unix tools like grep. A regular expression is a string composed of the
characters to match and some characters which have special meaning, called meta-characters.
• x matches the 'x' character.
• \u0078 matches the Unicode code point U+0078 (which is again 'x').
• . matches any character.
• [vxz] declares a character class, and matches any of 'v', 'x', or 'z'.
• \d is a pre-defined character class which matches any digit. It is equivalent to “[0-9]”.
• \s is a pre-defined character class which matches any whitespace character (space, tab,
carriage return, or newline). It is equivalent to “[\u0009\u000d\u000a\u0020]”.
Inside character classes, the symbol “^” means negation and “-” can be used to declare
character ranges. For instance, the character class [^a-zA-Z] matches any non-letter.
Cardinality (repetition) operators can be used to specify how many characters are matched.
The possibilities are as follows.
• ? represents zero or one values.
• + one or more values.
• * zero or more values.
• {m,n} between m and n values.

The characters of a pattern must be matched in the order in which they appear, with the
following additions.

• | declares alternatives, e.g., “abc|def|ghi” matches any of “abc”, “def”, “ghi”.

• ^ matches the beginning of a string.

• $ matches the end of a string.

• “()” declares a group which is useful for cardinality and alternatives. For example:
“^ab(cd|ef){2,}gh” matches “abcdcdcdghij”.

All of the meta-characters above will be treated as literals (i.e., they match themselves) if
they are prefixed with a \ (backslash).
Table 4.4 contains several examples of regular expression matches.

Table 4.4: Examples of regular expressions

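Regular Expression    Some Values that Match         Some Values that Don’t Match
P\d{2,3}              P12, P234                      A1, P2n, P1, P2233
(pa)*b                b, pab, papab, papapab, ...    pa, po
[a-z]{2,3}            ab, abc                        a, abcd, x45, 23

The values in this table are compared against the whole pattern, as if it were anchored with
^ and $. Because fn:matches succeeds when any substring of the value matches, a ShEx pattern
needs explicit anchors to reject values such as P2233 or abcd.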

The flags string has the following possibilities.

• i: Case-insensitive mode.

• m: Multi-line mode. If present, the ^ character matches the start of any line (not only the
start of the string) and the $ matches the end of any line (not only the end of the string).

• s: If present, the dot also matches newlines; otherwise it matches any character except
newlines. This mode is called single-line mode in Perl.

• x: Removes white space characters in the regular expression before matching.

• q: All meta characters are interpreted as literals, i.e., they match themselves in the input
string. q is compatible with the i flag. If it’s used with the m, s or x flag, that flag is ignored.
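
The effect of the anchors and of the i and m flags can be tried out with any engine that follows these conventions. The following sketch uses Python's re module as a stand-in; it is not the XPath engine on which ShEx relies, its syntax differs in some details, and the helper name matches is only illustrative. re.search is used because, like fn:matches, it succeeds when any substring of the value matches.

```python
import re

# Hypothetical helper that mimics fn:matches(str, re, flags) for the i, m and s flags.
# Python's re syntax is close to, but not identical to, XPath regular expressions.
FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def matches(value: str, pattern: str, flags: str = "") -> bool:
    compiled_flags = 0
    for flag in flags:
        compiled_flags |= FLAG_MAP[flag]
    # re.search succeeds if any substring matches, as fn:matches does.
    return re.search(pattern, value, compiled_flags) is not None


print(matches("abbcd", r"^ab+"))       # True
print(matches("cab", r"^ab+"))         # False: the match must start the string
print(matches("ABB", r"^ab+"))         # False: matching is case-sensitive
print(matches("ABB", r"^ab+", "i"))    # True
print(matches("x\nab", r"^ab", "m"))   # True: ^ also matches at a line start
```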
4.5.4 VALUE SETS
A value set is a node constraint which enumerates the list of possible values that a focus node
may have. In ShExC, value sets are enclosed by square brackets ([ and ]) where each possible
value is separated by a space.

Example 4.14 Example with value sets


The following example declares a shape :Product with two properties: schema:color and
schema:manufacturer, whose possible values are enumerated.

:Product {
  schema:color [ "Red" "Green" "Blue" ] ;
  schema:manufacturer [ :OurCompany :AnotherCompany ]
}

:x1 schema:color "Red" ;                 # Passes as :Product
    schema:manufacturer :OurCompany .

:x2 schema:color "Cyan" ;                # Fails as :Product
    schema:manufacturer :OurCompany .

:x3 schema:color "Green" ;               # Fails as :Product
    schema:manufacturer :Unknown .

Unit value sets A common pattern is to declare that a node must have a specific value. This
can be done by a unit value set, i.e., a value set with a single value.

Example 4.15
:Spanish {
  schema:country [ :Spain ]
}

:User {
  a [ schema:Person ]
}

:alice schema:country :Spain .       # Passes as :Spanish

:bob schema:country :France .        # Fails as :Spanish

:carol a schema:Person ;             # Passes as :Spanish and :User
       schema:country :Spain .

:p1 a schema:Product ;               # Fails as :User
    schema:country :Spain .          # Passes as :Spanish

:dave rdf:type schema:Person ;       # Passes as :User
      schema:country :Japan .        # Fails as :Spanish

Note that the :User shape employs the a keyword which stands for rdf:type. There is no
inference in ShEx, even for rdf:type, which is treated as any other arc. See Section 3.2 for a
discussion of the difference between shapes and classes.
Language-tagged values As seen above, value sets contain one or more values. The examples
so far have included IRIs and strings (literals with a datatype of xsd:string). These match precisely
the same value in the data. Value sets can also contain language tags, which match any literal
with the given language tag.

Example 4.16
:FrenchProduct {
  schema:label [ @fr ]
}

:SpanishProduct {
  schema:label [ @es @es-AR @es-ES ]
}

:car1 schema:label "voiture"@fr .      # Passes as :FrenchProduct

:car2 schema:label "Automóvil"@es .    # Passes as :SpanishProduct

:car3 schema:label "Carro"@es-AR .     # Passes as :SpanishProduct

:car4 schema:label "Coche"@es-ES .     # Passes as :SpanishProduct

Ranges We can see in the example above that it would be convenient to accept literals with any
language tag starting with "es". This can be indicated with the postfix operator ‘~’. For example,
Argentinian, Chilean, and other regional variants of Spanish could be accepted with
‘schema:label [ @es~ ]’.

Example 4.17 Language-tagged ranges


The following code declares that Spanish products contain schema:label with a value that
must be a language-tagged literal in Spanish or any variant.

:SpanishProduct {
  schema:label [ @es~ ]
}

:car1 schema:label "Automóvil"@es .    # Passes as :SpanishProduct

:car2 schema:label "Carro"@es-AR .     # Passes as :SpanishProduct

:car3 schema:label "Coche"@es-ES .     # Passes as :SpanishProduct

This also works for strings, e.g., ‘"+34"~’ (Spanish telephone numbers) and IRIs, e.g.,
‘<http://www.w3.org/ns/>~’ (W3C namespaces).

Example 4.18 String and IRI ranges example


:SpanishW3CPeople {
  schema:telephone [ "+34"~ ] ;
  schema:url [ <http://www.W3C.es/Personal>~ ]
}

:alice schema:telephone "+34 123 456 789" ;           # Passes as :SpanishW3CPeople
       schema:url <http://www.W3C.es/Personal/Alice> .

:bob schema:telephone "123 456 789" ;                 # Fails as :SpanishW3CPeople
     schema:url <http://other.org/bob> .              # Bad telephone and url

IRIs represented as prefixed names can also have a postfix ‘~’, e.g., foaf:~ represents the set
of all URIs that start with the namespace bound to the prefix foaf:.

Example 4.19
In the following example, we declare that the status of a product must start with
http://example.codes/good. or http://example.codes/bad..

prefix codes: <http://example.codes/>

:Product {
  :status [ codes:good.~ codes:bad.~ ]
}

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>

:x1 :status codes:good.Shipped .                  # Passes as :Product

:x2 :status other:done .                          # Fails as :Product

:x3 :status <http://example.codes/bad.Lost> .     # Passes as :Product

Exclusions It can also be useful to exclude some values from a range. Exclusions are marked
by the minus sign (-). For example: codes:~ - codes:unknown represents all values starting with
codes: except codes:unknown.
Exclusions can themselves be ranges. For example: codes:~ - codes:bad.~ represents all
values starting with codes: except those that start with codes:bad..
Example 4.20 Range exclusions
The following code prescribes that the status of products can be anything that starts with
codes: except codes:unknown or codes starting with codes:bad..

prefix codes: <http://example.codes/>

:Product {
  :status [ codes:~ - codes:unknown - codes:bad.~ ]
}

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>

:p1 :status codes:good.Shipped .                  # Passes as :Product

:p2 :status other:done .                          # Fails as :Product

:p3 :status <http://example.codes/bad.Lost> .     # Fails as :Product

:p4 :status <http://example.codes/unknown> .      # Fails as :Product

Exclusions must be the same kind (IRI, string or language tag) as the stem type. For
instance, ‘[ codes:good.~ - "bad." - @fr~ ]’ would be malformed as it’s an IRI range excluding a
string and a language stem.
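
The way a stem range with exclusions selects values can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: IRIs are treated as plain strings, the check that exclusions are of the same kind as the stem is omitted, and the names in_stem_range and CODES are hypothetical.

```python
def in_stem_range(value: str, stem: str, exclusions=()) -> bool:
    """Return True if value starts with stem and is not excluded.

    Each exclusion is a (text, is_stem) pair: an exact value when is_stem
    is False, or a stem (prefix) when is_stem is True.
    """
    if not value.startswith(stem):
        return False
    for text, is_stem in exclusions:
        excluded = value.startswith(text) if is_stem else value == text
        if excluded:
            return False
    return True


CODES = "http://example.codes/"
# Mirrors the value set [ codes:~ - codes:unknown - codes:bad.~ ]
EXCLUSIONS = [(CODES + "unknown", False), (CODES + "bad.", True)]

for status in (CODES + "good.Shipped", "http://other.codes/done",
               CODES + "bad.Lost", CODES + "unknown"):
    print(status, in_stem_range(status, CODES, EXCLUSIONS))
```

Only the first status is accepted; the second does not start with the stem, and the last two are removed by the exclusions.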
Heterogeneous value sets There is no requirement that value sets be composed of a consistent
kind of value (IRI, string or language tag). For instance, the status of a product can be the IRIs
(:Accepted or :Rejected) or a string, e.g., “unknown”.
Example 4.21
:Product {
  schema:status [ :Accepted :Rejected "unknown" ]
}
Wildcard stem ranges Sometimes we want to accept user data with any value except some
specific values. For this, a wildcard character (‘.’) followed by one or more exclusions can be used
(so long as those exclusions are all of the same kind). The kind of the exclusions (IRI, string, or
language tag) establishes the type of RDF term that will be matched.

Example 4.22 Example of a wildcard range with exclusion


The following code declares that the status of products can be anything except the IRI
codes:bad. Given that the exclusion is an IRI, the status must be an IRI.

prefix codes: <http://example.codes/>

:Product {
  :status [ . - codes:bad ]
}

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>

:p1 :status codes:good .     # Passes as :Product

:p2 :status other:bad .      # Passes as :Product

:p3 :status codes:bad .      # Fails as :Product

:p4 :status "good" .         # Fails as :Product
                             # "good" must be an IRI

Value set expressivity Value sets are mostly a shorthand syntax for complex Boolean com-
binations of node constraints. ShEx includes them because they are much more concise and,
given their ubiquity in other schema languages, they are fundamental to how people model and
understand data.

Example 4.23 Representing value sets


The following shape:
:User {
  schema:gender [ schema:Male schema:Female ]
}

can be defined without value sets using the OR operator that will be presented in Section 4.6.
:User {
  schema:gender [ schema:Male ]
} OR {
  schema:gender [ schema:Female ]
}
4.6 SHAPES
In the previous section we explored node constraints and how they declare a set of permissi-
ble RDF terms. Most of the examples used node constraints in triple constraints, limiting the
permissible values for triples in the input graph.

Example 4.24 Simple example


In the following example, we describe a shape :User
:User {
  schema:name xsd:string
}

and we will try to validate the nodes :alice and :bob represented in the following data:
:alice schema:name "Alice" ;     # Passes as :User
       schema:knows :bob .

:bob schema:name 34 ;            # Fails as :User
     schema:knows :alice .       # wrong schema:name

To solidify our intuition of validating shapes, we need to think of this as a series of steps
to validate a focus node against a shape expression.

1. Check if focus node :alice conforms to the shape expression :User.

2. :User is a shape so check if the neighborhood of :alice matches the triple expression in the
shape :User. This step means that one needs to find a way to distribute the triples in the
neighborhood to satisfy the triple expression.

3. The shape’s triple expression is a single triple constraint so all one needs to do is find
the triple with a matching predicate in the neighborhood. In this case, the triple :alice
schema:name "Alice".

4. The triple expression has a value expression so consider the object, "Alice", as the focus
node and test it against the node constraint (in this case xsd:string).

5. "Alice" matches ‘xsd:string’ so this test succeeds.

6. The cardinality of the triple constraint is {1,1} (the default one) and, as there is only one
matching triple, the node conforms to the shape expression.

When the same steps are performed to check :bob, the last step will have 34 as the focus
node. This test fails so :bob fails to conform to :User.
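
The steps above can be sketched in Python for the single shape of Example 4.24. This is a minimal illustration, not a ShEx implementation: the IRI class, the use of Python str and int in place of xsd:string and xsd:integer literals, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


# Data of Example 4.24; str stands in for xsd:string and int for xsd:integer.
GRAPH = [
    (IRI(":alice"), IRI("schema:name"), "Alice"),
    (IRI(":alice"), IRI("schema:knows"), IRI(":bob")),
    (IRI(":bob"), IRI("schema:name"), 34),
    (IRI(":bob"), IRI("schema:knows"), IRI(":alice")),
]


def is_xsd_string(term) -> bool:
    return isinstance(term, str)


def conforms_to_user(node, graph) -> bool:
    """Check node against :User { schema:name xsd:string }."""
    # Steps 2-3: find the triples of the neighborhood with a matching predicate.
    values = [o for (s, p, o) in graph
              if s == node and p == IRI("schema:name")]
    # Steps 4-5: every object must satisfy the node constraint.
    if not all(is_xsd_string(value) for value in values):
        return False
    # Step 6: the default cardinality is exactly one.
    return len(values) == 1


print(conforms_to_user(IRI(":alice"), GRAPH))  # True
print(conforms_to_user(IRI(":bob"), GRAPH))    # False
```

The schema:knows triples play no role here because the shape is not closed and does not mention that predicate.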
Shape A shape is a container for a triple expression along with some properties stating how
to treat triples not matching the triple expression. We will describe these properties after in-
troducing triple expressions (Section 4.6.8). Since triple expressions are combinations of triple
constraints, we start with them.

4.6.1 TRIPLE CONSTRAINTS


The basic building block of a triple expression is a triple constraint. It is composed of a property,
a node constraint, and a cardinality.
A triple constraint expresses a constraint on the values of triples with the given property
and the number of values expressed by the cardinality. Cardinalities will be described in more
detail in Section 4.6.3.

Example 4.25 The following shape is defined by a single triple constraint whose components
are depicted in Figure 4.7.
:Product {
  schema:productId xsd:string {1,2}
}

The meaning is that nodes conforming to :Product must satisfy:


• They must have property schema:productId.
• All the values of schema:productId must satisfy the node constraint xsd:string.
• As the cardinality is {1,2}, there can be between 1 and 2 values of schema:productId.

[Figure: the triple constraint schema:productId xsd:string {1,2} inside the :Product shape,
labeled with its parts: property (schema:productId), node constraint (xsd:string), and
cardinality ({1,2}).]

Figure 4.7: Parts of a triple constraint.

:p1 schema:productId "P1" .                # Passes as :Product

:p2 schema:productId "P2", "C2" .          # Passes as :Product

:p3 schema:productId "P3", "C3", "X3" .    # Fails as :Product
                                           # Cardinality exceeded

:p4 schema:name "No Id" .                  # Fails as :Product
                                           # No schema:productId

:p5 schema:productId 5 .                   # Fails as :Product
                                           # xsd:string not satisfied

:p6 schema:productId "P6", 5 .             # Fails as :Product
                                           # xsd:string not satisfied

Closing a property Triple constraints have an implicit meaning of closing the possible values
of a property. In the previous example, the declaration schema:productId xsd:string requires all
values of schema:productId to satisfy xsd:string. That’s why :p6 failed to conform: although it had
one string value, the other value was not a string.
This behavior can be modified with the directives EXTRA and CLOSED that will be shown in
Section 4.6.8.

4.6.2 GROUPINGS
The EachOf operator combines two or more triple expressions. All the sub-expressions must be
satisfied by triples in the neighborhood of the focus node. EachOf is indicated by a semicolon
(;) in the compact syntax.

Example 4.26 A :User is defined by an EachOf expression that combines three triple con-
straints. A node satisfies the :User type if all three triple constraints are satisfied.
:User {
  schema:name xsd:string ;
  foaf:age xsd:integer ;
  schema:email xsd:string
}

4.6.3 CARDINALITIES
Cardinalities indicate the required number of triples satisfying the given constraint. They are
most often used on triple constraints although they can also be applied to more complex expres-
sions. Table 4.5 gives an overview of the different representations of cardinalities in ShExC.
If the cardinality is not specified, the default value is {1} (exactly one).
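
As an illustration, the notations of Table 4.5 can be translated into a pair of minimum and maximum bounds. The following Python sketch is not part of any ShEx implementation; the function names and the use of None for an unbounded maximum are assumptions, and only the forms listed in Table 4.5 are handled.

```python
import re
from typing import Optional, Tuple


def parse_cardinality(text: str = "") -> Tuple[int, Optional[int]]:
    """Translate a ShExC cardinality into (min, max); max is None if unbounded."""
    text = text.replace(" ", "")
    if text == "":
        return (1, 1)                  # default: exactly one
    if text == "*":
        return (0, None)
    if text == "+":
        return (1, None)
    if text == "?":
        return (0, 1)
    found = re.fullmatch(r"\{(\d+)(,(\d*))?\}", text)
    if found is None:
        raise ValueError(f"not a cardinality: {text!r}")
    low = int(found.group(1))
    if found.group(2) is None:         # {m}
        return (low, low)
    if found.group(3) == "":           # {m,}
        return (low, None)
    return (low, int(found.group(3)))  # {m,n}


def within(count: int, cardinality: str = "") -> bool:
    """Return True if count triples are allowed by the cardinality."""
    low, high = parse_cardinality(cardinality)
    return count >= low and (high is None or count <= high)


print(parse_cardinality("{1,100}"))  # (1, 100)
print(within(2, "?"))                # False
print(within(0, "*"))                # True
```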

Example 4.27 Cardinalities example


The following :User shape declares that nodes must have exactly one value for schema:name
(default cardinality), an optional value for schema:worksFor, and zero or more values for
schema:follows.
The :Company shape uses the explicit {m,n} syntax to assert that a matching node must have
between 1 and 100 employees and an optional schema:founder value.
Table 4.5: ShEx cardinalities

Value     Description
*         0 or more
+         1 or more
?         0 or 1
{m}       Exactly m repetitions
{m,n}     Between m and n repetitions
{m,}      m or more repetitions

:User {
  schema:name xsd:string ;
  schema:worksFor IRI ? ;
  schema:follows IRI *
}

:Company {
  schema:founder IRI ? ;
  schema:employee IRI {1,100}
}

:alice schema:name "Alice" ;                # Passes as :User
       schema:follows :bob ;
       schema:worksFor :OurCompany .

:bob schema:name "Robert" ;                 # Passes as :User
     schema:worksFor :OurCompany .

:carol schema:name "Carol" ;                # Passes as :User
       schema:follows :alice .

:dave schema:name "Dave" .                  # Passes as :User

:emily schema:name "Emily" ;                # Fails as :User
       schema:worksFor :OurCompany,         # more than one schema:worksFor
                       :OtherCompany .

:OurCompany schema:founder :dave ;
            schema:employee :alice, :bob .  # Passes as :Company

:OtherCompany schema:founder :alice .       # Fails as :Company
                                            # 0 employees
A cardinality can also be used on more general expressions indicating that the neighbor-
hood of a node must contain several groups of triples, each of them satisfying the expression.

Example 4.28 Cardinalities on expressions


The following shape declares that nodes must have exactly one value for schema:name and
that they can contain the combination of schema:givenName and schema:familyName with optional
cardinality (either they contain the group of both properties or none of them).
:User {
  schema:name xsd:string ;
  ( schema:givenName xsd:string ;
    schema:familyName xsd:string ) ?
}

:alice schema:name "Alice" .          # Passes as :User

:bob schema:name "Robert" ;           # Passes as :User
     schema:givenName "Robert" ;
     schema:familyName "Smith" .

:carol schema:name "Carol" ;          # Fails as :User
       schema:givenName "Carol" .

4.6.4 CHOICES
The pipe or choice operator | can be used to compose complex triple expressions with
the meaning that one of the branches must be satisfied.

Example 4.29 OneOf operator


The following shape declares that nodes must have either schema:name or foaf:name, but not
both.
:User {
  schema:name xsd:string |
  foaf:name xsd:string
}

:alice schema:name "Alice" .          # Passes as :User

:bob foaf:name "Bob" ;                # Passes as :User
     schema:identifier "P234" .

:carol schema:name "Carol" ;          # Fails as :User
       foaf:name "Carol" .            # More than one

:dave schema:identifier "P123" .      # Fails as :User
                                      # None provided

A typical pattern consists of combining OneOf (| operator) with EachOf (;) to form more
complex expressions.

Example 4.30
The following shape declares that nodes must have either one schema:name or a combination
of one or more schema:givenName and one schema:familyName.
:User {
  schema:name xsd:string |
  ( schema:givenName xsd:string + ;
    schema:familyName xsd:string
  )
}

:alice schema:name "Alice" .          # Passes as :User

:bob schema:givenName "Bob" ;         # Passes as :User
     schema:givenName "Bobby" ;
     schema:familyName "Smith" .

:carol schema:name "Carol" ;          # Fails as :User
       schema:familyName "King" .     # Can't have both

:dave schema:name 23 .                # Fails as :User
                                      # schema:name must be xsd:string

A typical pattern is to add some cardinality to an expression formed by the OneOf (|)
operator.

Example 4.31 Cardinality on OneOf expression


The following shape declares that nodes must have exactly one value for schema:productId
and that they can have between zero and two values of schema:isRelatedTo or
schema:isSimilarTo, in any combination.

:Product {
  schema:productId xsd:string ;
  ( schema:isRelatedTo @:Product |
    schema:isSimilarTo @:Product ){0,2}
}

:p1 schema:productId "P1" ;               # Passes as :Product
    schema:isRelatedTo :p2, :p3 .

:p2 schema:productId "P2" .               # Passes as :Product

:p3 schema:productId "P3" ;               # Passes as :Product
    schema:isRelatedTo :p1 ;
    schema:isSimilarTo :p2 .

:p4 schema:productId "P4" ;               # Fails as :Product
    schema:isRelatedTo :p1, :p2, :p3 .    # More than two related products

4.6.5 NESTED SHAPES


It is possible to avoid defining two shapes when one of them is just an auxiliary shape that is not
needed elsewhere.

Example 4.32
The following schema declares that nodes conforming with :User must have a property
schema:name with xsd:string and another property schema:worksFor whose value must conform with
an anonymous shape _:1 which must have rdf:type with the value :Company.
:User {
  schema:name xsd:string ;
  schema:worksFor @_:1
}

_:1 { a [ :Company ] }

It can be rewritten as:

:User {
  schema:name xsd:string ;
  schema:worksFor {
    a [ :Company ]
  }
}

:alice schema:name "Alice" ;               # Passes as :User
       schema:worksFor :OurCompany .

:bob schema:name "Robert" ;                # Passes as :User
     schema:worksFor [ a :Company ] .

:carol schema:name "Carol" ;               # Fails as :User
       schema:worksFor [                   # The value of schema:worksFor
         schema:name "AnotherCompany"      # does not have rdf:type :Company
       ] .

:OurCompany a :Company .                   # Passes as anonymous shape

Nested shapes can be used to emulate simple SPARQL property paths.

Example 4.33
:Grandson {
  :parent { :parent . + } + ;
}

:alice :parent :bob, :carol .      # Passes as :Grandson

:bob :parent :dave .               # Passes as :Grandson

:carol :parent :emily .            # Fails as :Grandson

:dave :parent :grace .             # Fails as :Grandson

:emily schema:name "Emily" .       # Fails as :Grandson

4.6.6 INVERSE TRIPLE CONSTRAINTS


The ^ operator reverses the order of the triple constraint. Instead of constraining the focus node’s
outgoing arcs, it constrains incoming arcs.

Example 4.34 Inverse triple constraints


The following code declares that nodes conforming to shape :Company must have rdf:type
schema:Company and must be the objects of one or more triples with predicate schema:worksFor
and a subject conforming to shape :User.
:User {
  schema:name xsd:string
}

:Company {
  a [ schema:Company ] ;
  ^schema:worksFor @:User +
}
With the following data, node :Company1 conforms to :Company because there are two nodes,
:alice and :bob, that work for it. However, node :Company2 does not conform because no
node points to it through the property schema:worksFor, and node :Company3 also fails because
the node that works for it does not conform to shape :User.
:alice schema:name "Alice" ;            # Passes as :User
       schema:worksFor :Company1 .

:bob schema:name "Bob" ;                # Passes as :User
     schema:worksFor :Company1 .

:carol schema:worksFor :Company3 .      # Fails as :User
                                        # No schema:name

:Company1 a schema:Company .            # Passes as :Company

:Company2 a schema:Company .            # Fails as :Company
                                        # No one works for it

:Company3 a schema:Company .            # Fails as :Company
                                        # Carol works for it
                                        # but does not conform to User
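
The difference between outgoing and incoming arcs can be made concrete with a small Python sketch over the data of Example 4.34. It is only an illustration: terms are plain strings, the datatype check on schema:name is omitted, and the helper names objects, subjects, is_user, and is_company are hypothetical.

```python
GRAPH = [
    (":alice", "schema:name", "Alice"),
    (":alice", "schema:worksFor", ":Company1"),
    (":bob", "schema:name", "Bob"),
    (":bob", "schema:worksFor", ":Company1"),
    (":carol", "schema:worksFor", ":Company3"),
    (":Company1", "rdf:type", "schema:Company"),
    (":Company2", "rdf:type", "schema:Company"),
    (":Company3", "rdf:type", "schema:Company"),
]


def objects(node, predicate):
    """Outgoing arcs: objects of triples whose subject is the focus node."""
    return [o for (s, p, o) in GRAPH if s == node and p == predicate]


def subjects(predicate, node):
    """Incoming arcs: subjects of triples whose object is the focus node."""
    return [s for (s, p, o) in GRAPH if p == predicate and o == node]


def is_user(node):
    # :User { schema:name xsd:string }, with the datatype check omitted.
    return len(objects(node, "schema:name")) == 1


def is_company(node):
    # :Company { a [ schema:Company ] ; ^schema:worksFor @:User + }
    workers = subjects("schema:worksFor", node)
    return (objects(node, "rdf:type") == ["schema:Company"]
            and len(workers) >= 1
            and all(is_user(worker) for worker in workers))


for company in (":Company1", ":Company2", ":Company3"):
    print(company, is_company(company))
```

Only :Company1 is reported as conforming, in agreement with the comments in the data above.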

4.6.7 REPEATED PROPERTIES


The EachOf operator is different from a conjunction operator. This is best illustrated when a
shape uses the same property several times; we call this a repeated property. In Example 4.35,
the :User shape is an EachOf with three triple constraints, two of which have the same property
schema:parent. A node conforms to this shape if it has two arcs for the schema:parent property,
each of which contributes to satisfying one of the two triple constraints.

Example 4.35 Repeated properties


:User {
  schema:name xsd:string ;
  schema:parent {
    schema:gender [ schema:Male ]
  } ;
  schema:parent {
    schema:gender [ schema:Female ]
  } ;
}

:alice schema:name "Alice" ;              # Passes as :User
       schema:parent :bob, :carol .

:bob schema:gender schema:Male .
:carol schema:gender schema:Female .

:dave schema:name "Dave" ;                # Fails as :User
      schema:parent :carol, :emily .      # both parents are Female

:emily schema:gender schema:Female .

:frank schema:name "Frank" ;              # Fails as :User
       schema:parent :x .                 # only one parent

:x schema:gender schema:Female,
                 schema:Male .

Remember that ShEx distributes the triples to triple constraints in a triple expression (see
Section 4.6). This means the same triple cannot contribute to satisfying two different triple
constraints, even if its object satisfies the node constraints for both. That is why the node :frank
does not conform to the :User shape even if its parent satisfies both conditions.
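
The distribution of triples described above can be sketched as a search for an assignment in which each triple constraint receives its own value. The following Python code is a brute-force illustration under simplifying assumptions: every constraint has cardinality exactly one, the genders are looked up in a hypothetical GENDERS table, and real ShEx processors use more efficient matching algorithms.

```python
from itertools import permutations


def can_distribute(values, constraints) -> bool:
    """Return True if every constraint can be given its own distinct value.

    values: objects of the repeated property for one focus node.
    constraints: one checking function per triple constraint, each with
    cardinality exactly one. Every value must be consumed by one constraint.
    """
    if len(values) != len(constraints):
        return False
    return any(
        all(check(value) for check, value in zip(constraints, ordering))
        for ordering in permutations(values)
    )


# Genders of the candidate parents, simplified from Example 4.35.
GENDERS = {":bob": {"Male"}, ":carol": {"Female"},
           ":emily": {"Female"}, ":x": {"Female", "Male"}}


def is_male(parent):
    return "Male" in GENDERS[parent]


def is_female(parent):
    return "Female" in GENDERS[parent]


print(can_distribute([":bob", ":carol"], [is_male, is_female]))    # True
print(can_distribute([":carol", ":emily"], [is_male, is_female]))  # False
print(can_distribute([":x"], [is_male, is_female]))                # False
```

The last call corresponds to :frank: the single schema:parent triple can be assigned to only one of the two constraints, so the other constraint is left without a value.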

4.6.8 PERMITTING OTHER TRIPLES


When defining RDF-based services using ShEx schemas, there are several possibilities that have
to be taken into account. Some services backed by an RDF triple store may simply accept and
store any triples not described in the schema; in such a case, the role of the schema is mainly
to identify and constrain the triples that the service understands and manipulates, allowing any
extra triples for unforeseen applications. This open model is more popular in the semantic web
community.
At the other extreme, some services or databases may accept or emit some fixed structure,
disallowing any triples that are not mentioned in the schema. In this case, the role of ShEx
schemas is to validate and verify the content before it is processed or published. This closed model
has been traditionally employed in contexts where data quality and security play a significant
part.
ShEx manages these use cases with two granularities:
• extra properties manage triples whose predicates appear in the shape expression but whose
values do not match the corresponding constraints; and
• closed shapes manage triples with predicates that do not appear in the shape expression.

Extra Properties
As we described in Section 4.6.1, triple constraints close properties by default. Sometimes, it is
useful to open a property to permit values of it which are not described in the schema. The
EXTRA qualifier can be used to allow such additional values for the properties it lists.
A shape of the form
<Shape> EXTRA <property> {
  <property> <NodeConstraint>
}

is equivalent to:
<Shape> {
  <property> <NodeConstraint> ;
  <property> (NOT <NodeConstraint>)*
}

which means that it allows zero or more values of <property> that do not satisfy <NodeConstraint>.
Note that there is a hidden negation in any shape that includes an EXTRA qualifier.
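
The effect of EXTRA on a single property can be sketched in Python as follows. The function check_property and the NATIONALITY table are hypothetical names introduced for illustration; the sketch considers one triple constraint only and is not how a ShEx processor is structured.

```python
def check_property(values, constraint, low, high, extra=False) -> bool:
    """Check the values of one property against a single triple constraint.

    values: objects of the property for the focus node.
    constraint: function returning True when a value satisfies the node constraint.
    low, high: cardinality bounds (high=None means unbounded).
    extra: True if the property is listed in the EXTRA qualifier.
    """
    matching = [value for value in values if constraint(value)]
    others = [value for value in values if not constraint(value)]
    if others and not extra:
        return False  # the property is closed: every value must match
    return len(matching) >= low and (high is None or len(matching) <= high)


# Hypothetical data: who is followed and their nationality.
NATIONALITY = {":david": ":Spain", ":emily": ":France"}


def is_spaniard(node):
    return NATIONALITY.get(node) == ":Spain"


# schema:follows { schema:nationality [ :Spain ] }+
print(check_property([":david", ":emily"], is_spaniard, 1, None, extra=True))   # True
print(check_property([":david", ":emily"], is_spaniard, 1, None, extra=False))  # False
print(check_property([":emily"], is_spaniard, 1, None, extra=True))             # False
```

With EXTRA, the non-matching value :emily is tolerated, but at least one matching value is still required by the cardinality.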

Example 4.36 EXTRA example


The following example declares that nodes that conform to :FollowSpaniards must follow
one or more nodes whose nationality is :Spain, but can also follow other nodes.
:FollowSpaniards EXTRA schema:follows {
  schema:follows { schema:nationality [ :Spain ] }+
}

:alice schema:follows :david .             # Passes as :FollowSpaniards

:bob schema:follows :david, :emily .       # Passes as :FollowSpaniards

:carol schema:follows :emily .             # Fails as :FollowSpaniards

:david schema:nationality :Spain .
:emily schema:nationality :France .

Notice that :bob passes although it follows :emily, who is not a Spaniard.
If we removed the EXTRA declaration, it would fail.

A typical pattern using EXTRA declarations is to constrain the set of required values of a
node but to allow other values.

Example 4.37 EXTRA properties with several types


The following example declares the shapes for companies which must have two values for
the rdf:type predicate: schema:Organization and org:Organization. Shape :Company1 does not allow
any extra rdf:type arc, while shape :Company2 allows extra values.

:Company1 {
  a [ schema:Organization ] ;
  a [ org:Organization ]
}

:Company2 EXTRA a {                # Allows extra values of rdf:type
  a [ schema:Organization ] ;
  a [ org:Organization ]
}

:OurCompany a org:Organization,                  # Passes as :Company1 and :Company2
              schema:Organization .

:OurUniversity a org:Organization,               # Fails as :Company1
                 schema:CollegeOrUniversity,     # unexpected rdf:type
                 schema:Organization .           # Passes as :Company2

Closed Shapes
A shape can be declared to have only the triples matching a given set of triple constraints and
no others using the keyword CLOSED.

Example 4.38 CLOSED shape example


:User1 {
  schema:name xsd:string ;
  schema:knows IRI *
}

:User2 CLOSED {
  schema:name xsd:string ;
  schema:knows IRI *
}

:alice schema:name "Alice" ;       # Passes as :User1 and :User2
       schema:knows :bob .

:bob schema:name "Bob" ;           # Passes as :User1
     schema:knows :alice ;         # Fails as :User2
     schema:age 23 .               # unexpected schema:age
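
The additional check performed by a CLOSED shape can be sketched as a comparison between the predicates used by a node and those mentioned in the shape. The function name unexpected_predicates and the representation of arcs as (predicate, object) pairs are assumptions made for this illustration.

```python
def unexpected_predicates(arcs, shape_predicates):
    """Return the predicates of a node that a CLOSED shape does not mention.

    arcs: list of (predicate, object) pairs leaving the focus node.
    shape_predicates: predicates used in the triple constraints of the shape.
    """
    return sorted({predicate for predicate, _ in arcs} - set(shape_predicates))


USER_PREDICATES = ["schema:name", "schema:knows"]

alice = [("schema:name", "Alice"), ("schema:knows", ":bob")]
bob = [("schema:name", "Bob"), ("schema:knows", ":alice"), ("schema:age", 23)]

print(unexpected_predicates(alice, USER_PREDICATES))  # []
print(unexpected_predicates(bob, USER_PREDICATES))    # ['schema:age']
```

A node can conform to a closed shape only if this list is empty; an open shape such as :User1 simply skips the check.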

A common pattern is to combine CLOSED and EXTRA.

Example 4.39 CLOSED shapes


The shape :KnowsW3CPeople is closed, but its EXTRA declaration allows additional values of
schema:knows that do not satisfy the nested shape.

:KnowsW3CPeople CLOSED EXTRA schema:knows {
  schema:name xsd:string ;
  schema:affiliation IRI ? ;
  schema:knows { schema:affiliation [ :W3C ] }+
}

:alice schema:name "Alice" ;             # Passes as :KnowsW3CPeople
       schema:affiliation :ACompany ;
       schema:knows :bob .

:bob schema:name "Bob" ;                 # Fails as :KnowsW3CPeople
     schema:affiliation :W3C ;
     schema:knows :carol .               # :carol's affiliation is not :W3C

:carol schema:name "Carol" ;             # Passes as :KnowsW3CPeople
       schema:affiliation :ACompany ;
       schema:knows :alice, :bob .

:dave schema:name "Dave" ;               # Fails as :KnowsW3CPeople
      schema:knows :alice, :bob ;
      schema:age 23 .                    # schema:age not allowed

4.7 REFERENCES
4.7.1 SHAPE REFERENCES
A node constraint can be a shape reference, which has the form @label where label is the identifier
of another shape expression in the schema. Shape expression reference would be a more precise
name, but it is long enough to be awkward.

Example 4.40 Shape references


:User {
  schema:worksFor @:Company ;
}

:Company {
  schema:name xsd:string
}

:alice a :User ;                    # Passes as :User
       schema:worksFor :a .

:bob a :User ;                      # Fails as :User because :x fails as :Company
     schema:worksFor :x .

:a schema:name "CompanyA" .         # Passes as :Company

:x schema:name 23 .                 # Fails as :Company

4.7.2 RECURSION AND CYCLIC REFERENCES


It is possible to define data models with cyclic references, i.e., shapes that recursively refer to
themselves either directly or indirectly. ShEx supports these kinds of data models which appear
frequently.

Example 4.41 Cyclic data model


The model depicted in Figure 4.8 can be specified in ShEx as:
:User {
  schema:worksFor @:Company ;
}
:Company {
  schema:name xsd:string ;
  schema:employee @:User *
}

[Figure: :User points to :Company through schema:worksFor; :Company, which has a schema:name
of type xsd:string, points back to :User through schema:employee.]

Figure 4.8: Example of cyclic data model.

:alice schema:worksFor :OurCompany .                    # Passes as :User

:bob schema:name "Robert" ;                             # Passes as :User
     schema:worksFor :OurCompany .

:carol schema:worksFor :AnotherCompany .                # Passes as :User

:OurCompany schema:name "OurCompany" ;                  # Passes as :Company
            schema:employee :alice, :bob .

:AnotherCompany schema:name "AnotherCompany" .          # Passes as :Company

Example 4.42 More complex cyclic model
As an exercise, we present a more complex cyclic data model in Figure 4.9. Although the
model has several cycles, it can be easily represented in ShEx as:
:University {
  schema:name xsd:string ;
  schema:employee @:Teacher + ;
  schema:course @:Course +
}

:Teacher {
  a [ schema:Person ] ;
  schema:name xsd:string ;
  :teaches @:Course *
}

:Course {
  schema:name xsd:string ;
  :university @:University ;
  :hasStudent @:Student +
}

:Student {
  a [ schema:Person ] ;
  schema:name xsd:string ;
  schema:mbox IRI ;
  :hasFriend @:Student * ;
  :isEnroledIn @:Course *
}

Notice the separation between the types and shapes of nodes. Both :Teacher and :Student
must have rdf:type with value schema:Person, but their properties are different.

As can be seen, ShEx can represent any kind of cyclic or recursive data model in a natural way.
The only restriction arises when combining recursion with negation, as we will explain in
Section 4.8.3 where the negation operator NOT is introduced.

4.7.3 EXTERNAL SHAPES


External shapes are an extension mechanism to externally define shapes. This is useful when we
want to describe functional shapes or very large value sets. As a practical example, in medical
schemas, value sets can be dynamically derived and include hundreds of thousands of terms. In
the FHIR use case (see Section 6.2), these are resolved using an emerging REST API for ShEx.
[Figure: diagram of the :University, :Course, :Teacher, and :Student shapes, their properties,
and the references between them.]

Figure 4.9: Exercise to represent cyclic data model.

Example 4.43 External shape example


The following code declares a shape for products in which the value of schema:category is
defined by an external shape. In this case, an annotation declares the property :service, which
points to the URL where the shape can be retrieved.
:Product {
  schema:productId xsd:string ;
  schema:category EXTERNAL // :service <http://categories.org/>
}

At the time of this writing, the ShEx specification does not define a mechanism
like the :service annotation above, but it is expected that such mechanisms will be developed.

4.7.4 LABELED TRIPLE EXPRESSION


Much as shape references (Section 4.7.1) are allowed wherever a shape expression may appear,
any triple expression can be labeled so it can later be referenced.
The target triple expression must be labeled with $label and references are made with
&label.
For instance, if we want to share a name expression between :User and :Employee shapes,
we could include the expression in one and reference it from the other.
Example 4.44 Labeled triple expression
:User {
  $:name ( schema:name .
         | schema:givenName . ;
           schema:familyName .
         ) ;
  schema:email IRI
}
:Employee {
  &:name ;
  :employeeId .
}

:alice schema:name  "Alice" ;                    # ✓ Passes as :User
       schema:email <mailto:alice@example.org> .

:bob   schema:givenName  "Robert" ;              # ✓ Passes as :Employee
       schema:familyName "Smith" ;
       :employeeId       1234567 .

The “&:name” reference can be thought of as inserting the triple expression labeled :name in its
place. Logically, :Employee is equivalent to this:

Example 4.45 Equivalent triple expression


:Employee {
  ( schema:name .
  | schema:givenName . ;
    schema:familyName . ) ;
  :employeeId .
}

4.7.5 ANNOTATIONS
ShEx allows annotations, which are lists of (predicate, object) pairs where the predicate
is an IRI and the object is any RDF node. Annotations provide additional information about the
elements to which they are applied, which can be triple constraints, EachOf, OneOf, or shapes.
The compact syntax for annotations uses two slashes // followed by a predicate and an
object.

Example 4.46 Shape with annotations


The following code declares a shape :Person whose nodes must have a schema:name with an xsd:string
value and a schema:birthDate with an xsd:date value. Each triple constraint has its corresponding
rdfs:label and rdfs:comment annotations.

:Person {
  schema:name xsd:string
    // rdfs:label "Name"
    // rdfs:comment "Name of person" ;

  schema:birthDate xsd:date
    // rdfs:label "birthDate"
    // rdfs:comment "Date of birth" ;
}

In this case, each triple constraint has its own annotations, which are internally
represented as triples.

At the time of this writing, ShEx does not have any built-in annotation vocabulary. It is
expected that specific annotations will be used in the future for purposes such as user interface
generation.

4.8 LOGICAL OPERATORS


The logical operators AND, OR, and NOT can be used to form complex shape expressions. Their
meaning follows the conventional logical meaning of conjunction, disjunction, and negation.
The precedence of the operators is the usual one: NOT binds most tightly, followed by AND, and then OR.

Table 4.6: Logical operators on shape expressions

Operation   Description
AND         S1 AND S2 is satisfied if and only if both S1 and S2 are satisfied
OR          S1 OR S2 is satisfied if and only if S1 or S2 (or both) are satisfied
NOT         NOT S is satisfied if and only if S is not satisfied

4.8.1 CONJUNCTION
The AND operator forms a new shape expression from two shape expressions with the meaning
that a node conforms to S1 AND S2 if it conforms to both S1 and S2.

Example 4.47 Conjunction example


The following example expresses that :User nodes must satisfy two shape expressions at
the same time. Notice that the repeated appearance of the property schema:owns means that both
expressions must be satisfied, i.e., the value of schema:owns must be an IRI and must have
shape :Product, which in turn must have a property schema:productId whose value is an xsd:string
between 5 and 10 characters long.

:User { schema:name xsd:string ; schema:owns IRI }
  AND { schema:owns @:Product }

:Product {
  schema:productId xsd:string AND MINLENGTH 5 AND MAXLENGTH 10
}

:alice schema:name "Alice" ;                # ✓ Passes as :User
       schema:owns :product1 .

:bob   schema:name "Robert" ;               # ✗ Fails as :User
       schema:owns :product2, :product3 .

:carol schema:name "Carol" ;                # ✗ Fails as :User
       schema:owns _:x .

:product1 schema:productId "Product1" .     # ✓ Passes as :Product
:product2 schema:productId "Product2" .     # ✓ Passes as :Product
:product3 schema:productId "Product3" .     # ✓ Passes as :Product
:product4 schema:productId "P4" .           # ✗ Fails as :Product
_:x       schema:productId "ProductX" .     # ✓ Passes as :Product

If the left-hand side of the conjunction is a node constraint, the AND keyword can be omit-
ted.

Example 4.48 Omitting ANDs


In the following schema, :User1 and :User2, and :Product1 and :Product2 are equivalent:
:User1 IRI AND { schema:name xsd:string }
:User2 IRI { schema:name xsd:string }

:Product1 { schema:productId xsd:string AND MINLENGTH 5 AND MAXLENGTH 10 }
:Product2 { schema:productId xsd:string MINLENGTH 5 MAXLENGTH 10 }

Reusing shape expressions A common situation is to declare a set of constraints that we want
to repeat.

Example 4.49 Reusing constraints


In the following example, we reuse :CompanyConstraints in two places (for schema:worksFor
and for schema:affiliation).

:CompanyConstraints IRI /^http:\/\/example.org\/id[0-9]+/ @:CompanyShape

:User {
  schema:name        xsd:string ;
  schema:worksFor    @:CompanyConstraints ;
  schema:affiliation @:CompanyConstraints
}

:CompanyShape {
  schema:founder xsd:string ;
}

:alice schema:name        "Alice" ;         # ✓ Passes as :User
       schema:worksFor    :id1 ;
       schema:affiliation :id2 .

:id1 schema:founder "Robert" .

:id2 schema:founder "Carol" .

Another example of shape reuse is to extend a shape with more constraints emulating a
kind of inheritance as in Object-Oriented languages.

Example 4.50 Extending shapes


The following example declares a top-level shape :Person whose nodes must have rdf:type
with value schema:Person and schema:name. The shape :User extends :Person adding a new constraint
on the existing property schema:name and declaring the need of another property schema:email.
Finally, the shape :Student extends :User adding a new property :course.
:Person {
  a [ schema:Person ] ;
  schema:name xsd:string ;
}

:User @:Person AND {
  schema:name MAXLENGTH 20 ;
  schema:email IRI
}

:Student @:User AND {
  :course IRI * ;
}

:alice a schema:Person ;                    # ✓ Passes as :Person
       schema:name "Alice" .

:bob   schema:name  "Robert" ;              # ✗ Fails as :User
       schema:email <bob@example.org> .     # lacks rdf:type schema:Person

:carol a schema:Person ;                    # ✓ Passes as :Person and :User
       schema:name  "Carol" ;
       schema:email <carol@example.org> .

:dave  a schema:Person ;                    # ✓ Passes as :Person, :User and :Student
       schema:name  "Carol" ;
       schema:email <carol@example.org> ;
       :course      :algebra .

Notice that this kind of reuse requires the extended shapes to be compatible with the new
ones. Otherwise, no nodes will satisfy them.
For example, we may want to declare a :Teacher shape that extends :User but adds the
constraint that teachers have no email.
:Teacher @:User AND {
  schema:email . {0,0} ;
}

However, no nodes will satisfy it, because shape :User prescribes that they must
have exactly one schema:email, while the extending shape :Teacher prescribes that they must have
no schema:email.
In order to obtain the desired model, the shapes to be extended must be
general enough to be compatible with the new shapes. In this case, for example, it would be
better to declare schema:email as optional in :User.

4.8.2 DISJUNCTION
The OR operator combines two shape expressions with an inclusive disjunction, i.e., a node
conforms if it satisfies either side or both.

Example 4.51 Disjunction


The following example declares that nodes of shape :User must have either a schema:name
with xsd:string value or a combination of schema:givenName and schema:familyName with xsd:string
values, or both.
:User { schema:name xsd:string }
  OR  { schema:givenName xsd:string ;
        schema:familyName xsd:string
      }

:alice schema:name "Alice" .                # ✓ Passes as :User

:bob   schema:givenName  "Robert" ;         # ✓ Passes as :User
       schema:familyName "Smith" .

:carol schema:name       "Carol King" ;     # ✓ Passes as :User
       schema:givenName  "Carol" ;
       schema:familyName "King" .

Example 4.52 Difference between Or and |


There is a difference between the OR operator and the choice (|) operator. The former defines an
inclusive-or between shape expressions, while the latter behaves as an exclusive-or in this case:
exactly one of the alternative triple expressions must be matched, not both.
:User1 { schema:name xsd:string }
   OR  { schema:givenName xsd:string ;
         schema:familyName xsd:string
       }

:User2 { schema:name xsd:string
       | schema:givenName xsd:string ;
         schema:familyName xsd:string
       }

:alice schema:name "Alice" .                # ✓ Passes as :User1 and :User2

:bob   schema:givenName  "Robert" ;         # ✓ Passes as :User1 and :User2
       schema:familyName "Smith" .

:carol schema:name       "Carol King" ;     # ✓ Passes as :User1
       schema:givenName  "Carol" ;          # ✗ Fails as :User2
       schema:familyName "King" .

:dave  schema:name      "Dave" ;            # ✓ Passes as :User1
       schema:givenName "Dave" .            # ✗ Fails as :User2

Example 4.53 Disjunction of datatypes


A common use case is to declare that the value of some property is the disjunction
of several datatypes or value sets. The following example declares that products must have an
rdfs:label whose value is a string or a language-tagged literal (remember that those literals have
type rdf:langString), and a schema:releaseDate whose value must be either an xsd:date, an xsd:gYear,
or one of the values "unknown-past" or "unknown-future".

:Product {
  rdfs:label xsd:string OR rdf:langString ;
  schema:releaseDate xsd:date OR xsd:gYear OR
                     [ "unknown-past" "unknown-future" ]
}

:p1 a :Product ;                            # ✓ Passes as :Product
    rdfs:label "Laptop" ;
    schema:releaseDate "1990"^^xsd:gYear .

:p2 a :Product ;                            # ✓ Passes as :Product
    rdfs:label "Car"@en ;
    schema:releaseDate "unknown-future" .

:p3 a :Product ;                            # ✗ Fails as :Product
    rdfs:label :House ;
    schema:releaseDate "2020"^^xsd:integer .

Emulating recursive property paths SPARQL property paths are a very expressive feature
that can define complex expressions. ShEx does not support property paths in order to have a
more controlled way to define shapes. However, using nested shapes (see Example 4.33), recur-
sion and logical operators, it is possible to emulate their behavior.

Example 4.54 SHACL instance of Person


In SHACL, instances are defined by the property path rdf:type/rdfs:subClassOf*, i.e.,
rdf:type followed by the reflexive and transitive closure of the rdfs:subClassOf property (see
Section 5.7.2). The following example declares that nodes conforming to shape :Person must be
SHACL instances of schema:Person.
:Person { a @:PersonShape }

:PersonShape [ schema:Person ] OR { rdfs:subClassOf @:PersonShape }

:alice a schema:Person .                    # ✓ Passes as :Person

:bob   a :Teacher .                         # ✓ Passes as :Person

:carol a :Assistant .                       # ✓ Passes as :Person

:Teacher   rdfs:subClassOf schema:Person .
:Assistant rdfs:subClassOf :Teacher .
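
The recursion in :PersonShape can be mimicked in ordinary code. The following Python sketch is only an illustration, not how a ShEx processor is implemented: it assumes that the example graph is stored as a set of tuples and that the subclass hierarchy is acyclic, and the helper names (objects, conforms_person_shape, conforms_person) are hypothetical.

```python
# Example graph as (subject, predicate, object) tuples; an assumed in-memory encoding.
TRIPLES = {
    (":alice", "rdf:type", "schema:Person"),
    (":bob", "rdf:type", ":Teacher"),
    (":carol", "rdf:type", ":Assistant"),
    (":Teacher", "rdfs:subClassOf", "schema:Person"),
    (":Assistant", "rdfs:subClassOf", ":Teacher"),
}


def objects(subject, predicate):
    """Return all objects of the triples with the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def conforms_person_shape(node):
    """:PersonShape [ schema:Person ] OR { rdfs:subClassOf @:PersonShape }"""
    if node == "schema:Person":  # first branch: the value set
        return True
    supers = objects(node, "rdfs:subClassOf")
    # Second branch: exactly one rdfs:subClassOf arc (default cardinality)
    # whose value conforms recursively. Assumes no cycles in the hierarchy.
    return len(supers) == 1 and all(conforms_person_shape(c) for c in supers)


def conforms_person(node):
    """:Person { a @:PersonShape }"""
    types = objects(node, "rdf:type")
    return len(types) == 1 and all(conforms_person_shape(t) for t in types)
```

Each recursive call follows one rdfs:subClassOf arc, which is how the nested shape reference emulates the closure operator of a property path.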
4.8.3 NEGATION
NOT s creates a new shape expression from a shape s. Nodes conform to NOT s when they do not
conform to s.

Example 4.55 Not


:NoName NOT {
  schema:name .
}

:alice schema:givenName  "Alice" ;          # ✓ Passes as :NoName
       schema:familyName "Cooper" .

:bob   schema:name "Robert" .               # ✗ Fails as :NoName

:carol schema:givenName "Carol" ;           # ✗ Fails as :NoName
       schema:name      "Carol" .

A common use case for NOT is to define the complement of another shape. If a shape :NotS is
defined as NOT @:S, every node in an RDF graph conforms to exactly one of the two: some nodes
conform to :S while the others conform to :NotS. In this way, a continuous integration system can
define a shape map stating which shape each node must satisfy (either positively or negatively)
and check whether the nodes satisfy it.

Example 4.56 Not


The following code declares a shape :User and its complement :NoUser.
:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
}

:NoUser NOT @:User

Both nodes :alice and :bob conform to one of the shapes, :alice to :User and :bob to :NoUser.
:alice schema:name "Alice" ;                      # ✓ Passes as :User
       schema:birthDate "1980-03-10"^^xsd:date .

:bob   schema:name 23 ;                           # ✓ Passes as :NoUser
       schema:birthDate "Unknown" .
Difference between Not and Max-cardinality 0 The operator NOT checks that a node fails
to conform to a whole shape expression. Sometimes, the intended meaning is not to negate a
whole shape expression but to declare that some properties cannot appear. This behavior is better
described by setting the maximum cardinality to 0.

Example 4.57 Difference between Not and Max-0


Shape :NoName1 prohibits the appearance of property schema:name by setting its maximum
cardinality to 0. Shape :NoName2 looks like it does the same thing using negation. However,
notice that :NoName2 will be satisfied by any node that does not conform to schema:name xsd:string.
:NoName1 {
  schema:name xsd:string {0}
}

:NoName2 NOT {
  schema:name xsd:string
}

The behavior differs for node :bob which conforms to :NoName2. The reason is that it fails to
have a string value for schema:name so it fails to conform to the shape {schema:name xsd:string} and
thus, conforms to :NoName2.
:alice schema:name "Alice" .    # ✗ Fails as :NoName1 and :NoName2

:bob   schema:name 23 .         # ✗ Fails as :NoName1, ✓ Passes as :NoName2

:carol foaf:age 34 .            # ✓ Passes as :NoName1 and :NoName2
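
The difference can be reproduced with two small predicates. This Python sketch is a simplification made for illustration: each node is reduced to the list of its schema:name values, Python str stands in for xsd:string, and the function names are hypothetical.

```python
def conforms_no_name1(names):
    """:NoName1 { schema:name xsd:string {0} }: no schema:name arc may be present."""
    return len(names) == 0


def conforms_no_name2(names):
    """:NoName2 NOT { schema:name xsd:string }: negates the whole inner shape."""
    # The inner shape requires exactly one schema:name arc with a string value.
    inner = len(names) == 1 and isinstance(names[0], str)
    return not inner


# schema:name values of the three example nodes.
NODES = {":alice": ["Alice"], ":bob": [23], ":carol": []}

# For each node: (conforms to :NoName1, conforms to :NoName2).
RESULTS = {
    node: (conforms_no_name1(names), conforms_no_name2(names))
    for node, names in NODES.items()
}
```

Only :bob is treated differently by the two shapes: his schema:name arc is present, so :NoName1 rejects him, but its value is not a string, so the negated shape accepts him.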

IF-THEN pattern A common pattern is the IF-THEN construct: if some condition holds,
then a given shape expression must be satisfied.
This pattern can be modeled using the logical operators OR and NOT. Remember that IF x
THEN y is equivalent to (NOT x) OR y.

Example 4.58 IF-THEN pattern example


The following example specifies that all products must have a schema:productID and if
a product has type schema:Vehicle, then it must have the properties schema:vehicleEngine and
schema:fuelType.

:Product { schema:productID . } AND (
  NOT { a [ schema:Vehicle ] }
  OR  { schema:vehicleEngine . ;
        schema:fuelType .
      }
)

:kitt schema:productID "C21" ;              # ✓ Passes as :Product
      a schema:Vehicle ;
      schema:vehicleEngine :x42 ;
      schema:fuelType :electric .

:bad  schema:productID "C22" ;              # ✗ Fails as :Product
      a schema:Vehicle ;
      schema:fuelType :electric .

:c23  schema:productID "C23" ;              # ✓ Passes as :Product
      a schema:Computer .

IF-THEN-ELSE pattern The IF-THEN-ELSE pattern construct can be defined in a similar way.
In this case:
IF X THEN Y ELSE Z ≡ ((NOT X) OR Y) AND (X OR Z)
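
Both equivalences can be checked exhaustively over Boolean values. The following Python sketch does so with a truth table; the function names are hypothetical and the check concerns only the propositional identities, not ShEx validation itself.

```python
from itertools import product


def if_then(x, y):
    """Encoding of IF x THEN y as (NOT x) OR y."""
    return (not x) or y


def if_then_else(x, y, z):
    """Encoding of IF x THEN y ELSE z as ((NOT x) OR y) AND (x OR z)."""
    return ((not x) or y) and (x or z)


def check_equivalences():
    """Compare both encodings with Python's conditional on every truth assignment."""
    for x, y, z in product([False, True], repeat=3):
        assert if_then(x, y) == (y if x else True)
        assert if_then_else(x, y, z) == (y if x else z)
    return True
```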

Example 4.59 IF-THEN-ELSE pattern example


The following shape declares that if a product has type schema:Vehicle, then it must have
the properties schema:vehicleEngine and schema:fuelType; otherwise, it must have the property
schema:category with an xsd:string value.

:Product (
  NOT { a [ schema:Vehicle ] } OR
  { schema:vehicleEngine . ;
    schema:fuelType .
  }
) AND ({ a [ schema:Vehicle ] } OR
       { schema:category xsd:string } )

With the following data, nodes :kitt and :c23 conform to :Product each one passing one
of the branches, while :bad1 and :bad2 do not conform.
:kitt a schema:Vehicle ;                    # ✓ Passes as :Product
      schema:vehicleEngine :x42 ;
      schema:fuelType :electric .

:c23  a schema:Computer ;                   # ✓ Passes as :Product
      schema:category "Laptop" .

:bad1 a schema:Vehicle ;                    # ✗ Fails as :Product
      schema:fuelType :electric .

:bad2 a schema:Computer .                   # ✗ Fails as :Product


Restriction on cyclic dependencies with negation One problem of combining recursion with
negation freely is the possibility of defining paradoxical shapes.

Example 4.60 Barber’s paradox


The following shape declares a :Barber as someone who shaves a person but does not shave
a barber.
:Barber {                 # Violates the negation requirement
  :shaves @:Person
} AND NOT {
  :shaves @:Barber
}

:Person {
  schema:name xsd:string
}

Given the following data:


:albert :shaves :dave .           # ✓ Passes as :Barber

:bob schema:name "Robert" ;       # ✓ Passes as :Person
     :shaves :bob .               # Passes as :Barber or not?

:dave schema:name "Dave" .        # ✓ Passes as :Person

It is easy to check that :bob conforms to :Person (he has schema:name with an xsd:string value),
so he shaves a person. But does :bob conform to :Barber?
If we assume he does, then he should not shave a barber; but as he shaves himself,
and we assumed he conforms to :Barber, he fails the constraint of not shaving barbers, which
means that he should not conform. On the other hand, if we assume he does not conform to
:Barber, then he satisfies both constraints, and so he should conform to :Barber.

Problems of this kind, which arise when combining negation and recursion, have been studied
by the logic programming and database communities. Several approaches have been proposed, such
as negation-as-failure, stratified negation, and well-founded semantics [1].
ShEx imposes a constraint to avoid ill-formed data models: whenever a shape refers to
itself, either directly or indirectly, the chain of references cannot traverse an occurrence of the
negation operator NOT.
The previous shape :Barber violates the negation requirement, as it has a self-reference
that appears under a negation. More formally, we say that there is a dependency
from :ShapeA to :ShapeB if the definition of :ShapeA contains a reference @:ShapeB.
We say that a dependency from :ShapeA to :ShapeB is a negative dependency if at least one
of the following holds:
• the occurrence of @:ShapeB in the definition of :ShapeA appears under an occurrence of the
negation operator NOT; or

• there is a triple constraint :prop @:ShapeB in the definition of :ShapeA and the property :prop
is declared as EXTRA in the corresponding triple expression.

In the latter case, the negation operator NOT does not appear explicitly, but we still need to
verify that :ShapeB is not satisfied by some neighbor nodes. This was called hidden negation in
Section 4.6.8. With these definitions, the requirement states that no cycle of dependencies may
contain a negative dependency.
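
The requirement can be checked on the dependency graph of a schema. The following Python sketch is one possible way to do it, under the assumption that the dependencies have already been extracted as (source, target, is_negative) tuples; the function names are hypothetical and no ShEx library is involved.

```python
def reachable(graph, start):
    """Return every shape reachable from `start` in one or more dependency steps."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def violates_negation_requirement(dependencies):
    """True if some negative dependency lies on a cycle of dependencies."""
    graph = {}
    for source, target, _ in dependencies:
        graph.setdefault(source, set()).add(target)
    # A negative dependency source -> target is on a cycle exactly when
    # source can be reached back from target.
    return any(
        is_negative and (source == target or source in reachable(graph, target))
        for source, target, is_negative in dependencies
    )


# :Barber refers to :Person positively and to itself under NOT.
BARBER_SCHEMA = [(":Barber", ":Person", False), (":Barber", ":Barber", True)]
# :User { schema:knows @:User * } is recursive but has no negation.
USER_SCHEMA = [(":User", ":User", False)]
```

With these inputs, the barber schema is rejected while the recursive :User schema is accepted, since its cycle contains no negative dependency.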

4.9 SHAPE MAPS


The ShEx 2 specification focuses on the semantics of the validation language and delegates
the invocation mechanisms to a separate specification called Shape Maps [77]. Shape maps, already
introduced in Section 4.4.2, are node/shape associations that are used as input to the
validation process and are also its result.
In ShEx, the construction of shape maps is orthogonal to their use in validation. Decou-
pling these processes enables ShEx to address a wide range of use cases. Just as XML Schema
could not have predicted its use in WSDL (a protocol that was developed years later), it is im-
possible to predict the many and varied ways in which shape maps may be constructed in the
future.
The current ShapeMap specification defines three kinds of shape map.

• Fixed shape map: input to the validation process.

• Query shape map: query mechanism to construct a fixed shape map.

• Result shape map: result of validation.

Each of these consists of a comma-separated list of node/shape associations with at least
two components.

• nodeSelector - identifies a set of RDF nodes.

• shapeLabel - selects a shape expression from the schema.

The simplest kind of shape map is a fixed shape map.

4.9.1 FIXED SHAPE MAPS


ShEx validation takes as input a set of nodeSelector/shapeLabel pairs called a fixed shape map.
The shapeLabel is either the label for a shape expression in the schema or the case-insensitive
keyword START to identify the start shape (see Section 4.4.4).
For the fixed shape map, the nodeSelector is one of:
• an RDF IRI,
• an RDF literal, or
• for systems which support it, the label of a bnode in an RDF dataset.
Note that because the shapeLabel can identify a shape expression with only node constraints,
one can use ShEx to validate RDF terms that do not appear in the graph. This can be
useful for testing membership in a value set or verifying the form of a URL.
Fixed shape maps have a compact syntax in which shape associations are separated by commas
and node selectors are separated from shape labels by @:
:alice@:User,
:alice@:Employee,
:bob@:User

4.9.2 QUERY SHAPE MAPS


The query shape map extends the fixed shape map to enable simple pattern matching to select
focus nodes from the data graph. This is done by permitting the node selectors to be either an
RDF node, as in a fixed map, or a triple pattern. A triple pattern uses the keyword FOCUS in the
subject or object position to represent the nodes that will be validated; the other position holds
either an RDF node or a wildcard (represented by the underscore character _).

Example 4.61 Query shape map example


The shape map:
{ FOCUS schema:worksFor _ }@:User,
{ FOCUS rdf:type schema:Person }@:User,
{ _ schema:worksFor FOCUS }@:Company

associates all subjects of property schema:worksFor and all nodes of type schema:Person with :User,
and all objects of property schema:worksFor with shape :Company.
Any node in the data graph which is both of type schema:Person and the subject of a
schema:worksFor triple would be selected by both triple patterns and associated with :User in the
fixed map. Such duplicates are eliminated in accordance with the rule that a shape map can have
no duplicate pairs of nodeSelector and shapeLabel.
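
The resolution of a query shape map can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the algorithm of any particular implementation: triple patterns are encoded as 3-tuples, the constants FOCUS and WILDCARD and the function resolve are hypothetical, and the small graph is made up for the example.

```python
FOCUS, WILDCARD = "FOCUS", "_"


def resolve(query_map, triples):
    """Turn a query shape map into a fixed shape map (a set of node/shape pairs)."""
    fixed = set()  # a set removes duplicate node/shape pairs automatically
    for selector, shape in query_map:
        if isinstance(selector, str):  # a plain RDF node, as in a fixed shape map
            fixed.add((selector, shape))
            continue
        subj, pred, obj = selector  # a triple pattern
        for s, p, o in triples:
            if p != pred:
                continue
            if subj == FOCUS and obj in (WILDCARD, o):
                fixed.add((s, shape))
            elif obj == FOCUS and subj in (WILDCARD, s):
                fixed.add((o, shape))
    return fixed


TRIPLES = [
    (":alice", "rdf:type", "schema:Person"),
    (":alice", "schema:worksFor", ":c1"),
    (":bob", "schema:worksFor", ":c2"),
]
QUERY_MAP = [
    ((FOCUS, "schema:worksFor", WILDCARD), ":User"),
    ((FOCUS, "rdf:type", "schema:Person"), ":User"),
    ((WILDCARD, "schema:worksFor", FOCUS), ":Company"),
]
FIXED_MAP = resolve(QUERY_MAP, TRIPLES)
```

Here :alice is selected by the first two patterns but appears only once in the resulting fixed shape map.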

While the nodeSelector may be a triple pattern, it may also be an RDF node, as in
a fixed shape map. Common idioms in query shape maps do the following.
• Explicitly bind nodes to shapes. This effectively adds one nodeSelector/shapeLabel pair to
the shape map. This mechanism is employed in SHACL with the declaration sh:targetNode
(see Section 5.7).
Figure 4.10: Shape map resolution which accepts a query shape map and emits a fixed shape map.

• Declare that all nodes with some property must match a given shape. This mechanism is
also defined in SHACL with the declarations sh:targetSubjectsOf and sh:targetObjectsOf.
• Select nodes with a given property and value. This refinement of the previous approach is
especially useful for general-purpose predicates like rdf:type. In fact, the SHACL directive
sh:targetClass offers a similar selection mechanism for the rdf:type predicate (the difference
is that SHACL uses the notion of SHACL instance; see Section 5.7.2). As with the above selectors,
this one is very use-case specific: one may not want to say that everything with an rdf:type
property should be validated against :Person, but it may be reasonable to select everything
with type :Employee.
While it is not currently part of the shape map specification, the Wikidata use of shape
maps extends the nodeSelector to contain a SPARQL query, enabling another common use case.
• Select nodes or node/shape pairs by SPARQL query or inference. Where earlier mechanisms
are all limited to either a direct identification of an RDF node or its selection by
triple pattern, this one enables more nuanced heuristics in the selection of focus nodes.
Query shape maps are not the only way to select focus nodes. For instance, it would make
sense to associate a shape with a service endpoint. The Linked Data Platform [93] defines a
notion of container which handles requests to get, create, modify and delete objects with a given
structure. While it does not specify a mechanism to publish that structure or validate incoming
data against it, earlier work at OSLC used Resource Shapes for that purpose. It is reasonable to
assume that protocols like the Linked Data Platform will exploit shapes technology, perhaps with
the added precision of using HTTP Link headers to specify a node of interest, which would be
associated with the shape related to that interface.
4.9.3 RESULT SHAPE MAPS
The product of validation is a result shape map which is annotated with errors encountered while
testing the conformance of each node/shape pair. The result shape map is again an extension of
the fixed map. Each nodeSelector/shapeLabel association in the result shape map may include
any of these three additional components:
• status: either conformant or nonconformant;
• reason: a human-readable report, usually to explain a nonconformant result; or
• appInfo: a machine-readable structure.
Engines vary in how they report errors, and they may add extra information to the resulting
shape map. Some implementations extend this to include machine-readable failure messages in
case of errors or recursive proof of conformance in case of success.
Example 4.62 Full validation process
Given the following ShEx schema:
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}

and the RDF data:
:alice schema:name "Alice" ;
       schema:knows :carol .

:bob   schema:name "Robert" ;
       schema:knows :carol .

:carol schema:name "Carol" .

If we have the query shape map:
{ FOCUS schema:knows _ }@:User

A shape map resolver would generate the fixed shape map:
:alice@:User,
:bob@:User

After applying the validation process, the result shape map obtained would be:
:alice@:User,
:bob@:User,
:carol@:User

Figure 4.11 depicts a whole validation process with the different shape maps involved.
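
The whole process of this example can be imitated with a small Python sketch. It is not a ShEx implementation: it hard-codes the :User shape, encodes literals as (value, datatype) tuples, and uses hypothetical helper names. Recording a provisional "conformant" status is an assumption made here so that recursive references terminate; it suffices for this example but is not a complete treatment of recursion.

```python
TRIPLES = [
    (":alice", "schema:name", ("Alice", "xsd:string")),
    (":alice", "schema:knows", ":carol"),
    (":bob", "schema:name", ("Robert", "xsd:string")),
    (":bob", "schema:knows", ":carol"),
    (":carol", "schema:name", ("Carol", "xsd:string")),
]


def values(node, predicate):
    """Objects of the arcs leaving `node` with the given predicate."""
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def resolve(predicate, shape):
    """Resolve the query shape map {FOCUS predicate _}@shape into a fixed shape map."""
    subjects = sorted({s for s, p, _ in TRIPLES if p == predicate})
    return [(s, shape) for s in subjects]


def is_xsd_string(value):
    return isinstance(value, tuple) and value[1] == "xsd:string"


def validate_user(node, result):
    """:User { schema:name xsd:string ; schema:knows @:User * }"""
    key = (node, ":User")
    if key in result:  # pair already checked (or currently being checked)
        return result[key] == "conformant"
    result[key] = "conformant"  # provisional status while the node is visited
    names = values(node, "schema:name")
    ok = (
        len(names) == 1
        and is_xsd_string(names[0])
        and all(validate_user(friend, result) for friend in values(node, "schema:knows"))
    )
    result[key] = "conformant" if ok else "nonconformant"
    return ok


FIXED_MAP = resolve("schema:knows", ":User")
RESULT_MAP = {}
for focus_node, _shape in FIXED_MAP:
    validate_user(focus_node, RESULT_MAP)
```

The result map ends up with one more pair than the fixed map, because :carol has to be validated in order to check the schema:knows arcs of :alice and :bob.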
Figure 4.11: Full validation process with query, fixed, and result shape map.

4.9.4 JSON REPRESENTATION


The fixed shape map from Figure 4.11 can be represented as:2
[ { "nodeSelector": ":alice",
    "shapeLabel": ":User"
  },
  { "nodeSelector": ":bob",
    "shapeLabel": ":User"
  }
]

The output shape map would be:


[ { "nodeSelector": ":alice",
    "shapeLabel": ":User",
    "status": "conformant"
  },
  { "nodeSelector": ":bob",
    "shapeLabel": ":User",
    "status": "conformant"
  },
  { "nodeSelector": ":carol",
    "shapeLabel": ":User",
    "status": "conformant"
  }
]

2 At the time of this writing, the shape maps specification requires full IRIs, but we use prefixed IRIs for simplicity.
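
As an illustration of the correspondence between the compact syntax and the JSON representation, the following Python sketch converts one into the other. It is a deliberately naive converter written for this example: it assumes that node selectors are prefixed names without commas or @ characters (so no literals or triple patterns), and the function name is hypothetical.

```python
import json


def fixed_shape_map_to_json(compact):
    """Convert a compact fixed shape map such as ':alice@:User, :bob@:User' to JSON."""
    associations = []
    for item in compact.split(","):
        item = item.strip()
        if not item:
            continue
        # The shape label follows the last '@' of the association.
        node, shape = item.rsplit("@", 1)
        associations.append(
            {"nodeSelector": node.strip(), "shapeLabel": shape.strip()}
        )
    return json.dumps(associations, indent=2)
```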

4.9.5 CHAINING VALIDATION WORKFLOWS


Because both the input and the output of the validation process are shape maps, long-running
workflows can use the result shape map as a starting state for further validation. This is useful
when shapes have inter-dependencies, i.e., when validating one node/shape pair requires validating
others. Consider the following simple schema and data.

Example 4.63 ShEx validator and shape maps


Given the following schema:
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}

and RDF graph:
:alice schema:name "Alice" ;
       schema:knows :bob .

:bob   schema:name "Robert" .

If we were to individually validate :alice and :bob, we would validate :bob twice, once while
validating :alice’s schema:knows arc and once for the explicit call to validate :bob.
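
The saving obtained by chaining can be made visible with a small Python sketch. It hard-codes the :User shape of this example, keeps the known results in a dictionary that plays the role of the result shape map, and logs every node for which real validation work is done; all names are hypothetical and the datatype check is omitted for brevity.

```python
TRIPLES = [
    (":alice", "schema:name", "Alice"),
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:name", "Robert"),
]


def validate_user(node, known, log):
    """Validate `node` against :User, reusing the node/shape pairs in `known`."""
    key = (node, ":User")
    if key in known:
        return known[key]
    log.append(node)   # real validation work happens for this node
    known[key] = True  # provisional result so that recursion terminates
    names = [o for s, p, o in TRIPLES if s == node and p == "schema:name"]
    friends = [o for s, p, o in TRIPLES if s == node and p == "schema:knows"]
    ok = len(names) == 1 and all(validate_user(f, known, log) for f in friends)
    known[key] = ok
    return ok


# Independent calls: each validation starts from an empty state.
independent_log = []
validate_user(":alice", {}, independent_log)
validate_user(":bob", {}, independent_log)

# Chained calls: the results of the first call are the starting state of the second.
chained_log = []
shared_results = {}
validate_user(":alice", shared_results, chained_log)
validate_user(":bob", shared_results, chained_log)
```

In the independent run :bob is validated twice, whereas in the chained run the second call finds his result already present.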

4.10 SEMANTIC ACTIONS


Semantic actions3 serve as an extension point for Shape Expressions. They can be used to signal
a failure or perform some operations during the validation process.
A semantic action contains a label that indicates the language in which the action is written
and a string with its contents. When the ShEx validator finds a semantic action, it checks if it
has a processor for that language and calls it with the action contents. The result of the processor
is cast to a Boolean value; if the result is false, the corresponding shape fails.
3 The name semantic actions is inspired by parser generators. It is not related to the Semantic Web.

Example 4.64 Semantic actions


The following example uses a hypothetical JavaScript semantic actions processor to capture
the start and end dates of an event in a conference and to check that the start date is before
the end date.
prefix js: <http://shex.io/extensions/javascript>

:Event {
  schema:startDate xsd:dateTime %js:{ let start = o %} ;
  schema:endDate   xsd:dateTime %js:{ let end = o; return start < end %} ;
}

The following example checks that the declared area of a rectangle is indeed its width
times its height.
prefix js: <http://shex.io/extensions/javascript>

:Rectangle {
  :height xsd:float %js:{ let height = o %} ;
  :width  xsd:float %js:{ let width = o %} ;
  :area   xsd:float %js:{ return o == height * width %}
}

Semantic actions have been employed to transform RDF files to other formats like XML
or JSON [80], or even other ShEx schemas as performed by the Map extension.4
The test suite defines a single extension language called Test5 that can fail a validation
and/or return a message.
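
The dispatching behavior described in this section can be sketched as follows. This Python code is only a toy model: the "py" language label, the processor that evaluates Python expressions, and the choice to skip actions whose language has no registered processor are assumptions made for the illustration, not part of the ShEx specification.

```python
def python_expression_processor(code, context):
    """Toy processor: evaluates `code` as a Python expression over the bound values."""
    return eval(code, {"__builtins__": {}}, dict(context))


# Registry from language label to processor (hypothetical).
PROCESSORS = {"py": python_expression_processor}


def run_semantic_actions(actions, context, processors=PROCESSORS):
    """Run (language, code) actions; fail as soon as one result is cast to False."""
    for language, code in actions:
        processor = processors.get(language)
        if processor is None:
            continue  # assumption: actions in unknown languages are skipped
        if not bool(processor(code, context)):
            return False
    return True


AREA_CHECK = [("py", "o == height * width")]
```

For instance, with height 2.0 and width 3.0 bound in the context, the area check succeeds when o is 6.0 and fails when o is 7.0.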

4.11 SHEX AND INFERENCE


ShEx was designed as an RDF validation language which is independent of reasoners or infer-
ence systems. A ShEx processor takes as input an RDF graph and checks if its nodes conform
to the shapes defined in a ShEx schema. The shapes describe the topology of the RDF graph
taking into account the possible values of nodes as well as the incoming and outgoing arcs. In
ShEx, a triple whose predicate is rdf:type is treated as any other triple, and in fact there is no
special treatment for nodes that are also RDF classes. ShEx separates RDF classes and types
following the guidelines described in Section 3.2.
This independence between ShEx and reasoners makes it possible to apply a ShEx processor
to a plain RDF graph before inference, to validate the resulting graph after applying a
reasoner, or even to validate the intermediate graphs during the reasoning phase, checking the
reasoner's behavior.
4 http://shex.io/extensions/Map/
5 http://shexspec.github.io/extensions/Test/

Example 4.65 Validating data before and after inference


The following shapes can be used to check an RDF graph before and after RDF Schema
inference. Shape :TeacherBefore describes nodes that may have rdf:type :Teacher and must have a
property schema:name with an xsd:string value and zero or more properties :teaches whose values
must conform to :Course.
Shape :TeacherAfter describes the shape that teachers must have after inference. For example,
they must have rdf:type :Teacher and :Person, and the values of property :teaches must have
rdf:type :Course.

:TeacherBefore EXTRA a {
  a [ :Teacher ] ? ;
  schema:name xsd:string ;
  :teaches @:Course *
}

:TeacherAfter EXTRA a {
  a [ :Teacher ] ;
  a [ :Person ] ;
  schema:name xsd:string ;
  :teaches { a [ :Course ] } AND @:Course *
}

:Course {
  a [ :Course ] ?
}

If we validate the following RDF data before applying inference, nodes :bob and :carol do
not conform to shape :TeacherAfter:
:alice a :Teacher , :Person ;       # ✓ Passes as :TeacherBefore
       schema:name "Alice" ;        # ✓ Passes as :TeacherAfter
       :teaches :algebra .

:bob   schema:name "Robert" ;       # ✓ Passes as :TeacherBefore
       :teaches :logic .            # ✗ Fails as :TeacherAfter

:carol a :Teacher ;                 # ✓ Passes as :TeacherBefore
       schema:name "Carol" .        # ✗ Fails as :TeacherAfter

:algebra a :Course .
:teaches rdfs:domain :Teacher .
:teaches rdfs:range :Course .
:Teacher rdfs:subClassOf :Person .
On the other hand, if we validate the previous RDF graph after applying RDF Schema inference, both :bob and :carol should conform to :TeacherAfter.
This combination of shapes before and after inference can be used to check the behavior of a reasoner. For example, if a faulty RDFS reasoner does not infer that :logic has rdf:type :Course, :bob would not conform to :TeacherAfter and the bug could be detected.

4.12 IMPORTING SCHEMAS


ShEx has an import keyword that specifies the IRI of another schema that can be imported. The
ShEx processor puts the labeled shapes and triple expressions of the imported schema in scope
for resolution of references in the importing document. If the imported schema imports other
schemas, they are also imported.
Example 4.66 Import example
For example, suppose there is a schema located at http://example.org/Person.shex with the following content:

:Person {
  $:name ( schema:name .
         | schema:givenName . ; schema:familyName .
         ) ;
  schema:email .
}

We can then define a new schema as:

import <http://example.org/Person.shex>

:Employee {
  &:name ;
  schema:worksFor @:Company
}

:Company {
  schema:employee @:Employee ;
  schema:founder @:Person ;
}

:alice schema:name "Alice" ;             # V Passes as :Employee
       schema:worksFor :OurCompany .

:OurCompany schema:employee :alice ;
            schema:founder :bob .

:bob schema:name "Robert" ;
     schema:email <mailto:bob@example.com> .
The ShEx processor imports each schema exactly once, so cyclic imports are allowed. For instance, a schema may import itself, or it may import some schema which directly or indirectly imports it.
However, it is an error to import a schema which attempts to redefine a shape expression or triple expression. For instance, if http://example.org/Person.shex defined either :Employee or :Company, or if the importing schema defined :name, the import would fail and processing would stop.
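The import-once rule and the redefinition error described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular ShEx processor is implemented: the function load_with_imports, the RedefinitionError class, and the dictionary that stands in for fetching schemas from the Web are all hypothetical names introduced for the example.

```python
class RedefinitionError(Exception):
    """Raised when two schemas define the same shape or triple expression label."""


def load_with_imports(start, schemas):
    """Collect the labels in scope for `start`, loading each schema exactly once.

    `schemas` is a hypothetical stand-in for dereferencing IRIs: it maps a
    schema IRI to a pair (imported IRIs, labels defined in that schema).
    Returns a dict from each label to the IRI of the schema that defines it.
    """
    seen = set()     # schema IRIs already loaded
    labels = {}      # label -> IRI of the defining schema
    stack = [start]
    while stack:
        iri = stack.pop()
        if iri in seen:
            continue  # already imported once, so cyclic imports terminate
        seen.add(iri)
        imports, defined = schemas[iri]
        for label in defined:
            if label in labels:
                raise RedefinitionError(
                    f"{label} is defined in both {labels[label]} and {iri}")
            labels[label] = iri
        stack.extend(imports)
    return labels


# Hypothetical schemas: the two documents import each other.
SCHEMAS = {
    "http://example.org/Company.shex": (
        ["http://example.org/Person.shex"], [":Employee", ":Company"]),
    "http://example.org/Person.shex": (
        ["http://example.org/Company.shex"], [":Person", ":name"]),
}

in_scope = load_with_imports("http://example.org/Company.shex", SCHEMAS)
print(sorted(in_scope))
```

The set of already loaded IRIs is what makes cyclic imports harmless in this sketch, while the label dictionary detects the redefinition case in which processing must stop.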

4.13 RDF AND JSON-LD SYNTAX


The ShEx language is defined in terms of a JSON-LD syntax, called “ShExJ”, which separates
the compact syntax details from the language specification. This serves as an abstract syntax in
that it has constructs to capture all of the logic of ShEx. Having an abstract syntax provides a
clear definition of the language, makes it easier to write language processors and encourages the
definition of other concrete syntax formats. The fact that it is JSON-LD means that the RDF
representation of ShEx, called “ShExR”, is simply the JSON-LD interpretation of ShExJ.

Example 4.67
The following ShEx schema
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

:User IRI {
  schema:name xsd:string ;
  schema:knows @:User *
}

can be represented in ShExR as⁶:

PREFIX sx: <http://shex.io/ns/shex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>

<> a sx:Schema ;
   sx:shapes :User .

:User a sx:ShapeAnd ;
  sx:shapeExprs (
    [ a sx:NodeConstraint ;
      sx:nodeKind sx:iri ]
    [ a sx:Shape ;
      sx:expression [
        a sx:EachOf ;
        sx:expressions (
          [ a sx:TripleConstraint ;
            sx:predicate schema:name ;
            sx:valueExpr [
              a sx:NodeConstraint ;
              sx:datatype xsd:string
            ]
          ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:knows ;
            sx:valueExpr :User ;
            sx:min 0 ;
            sx:max -1
          ] )
      ] ] ) .

6 Note that a value of -1 in max means unbounded.

It can also be represented in JSON-LD as:

{ "@context": "https://shexspec.github.io/context.jsonld",
  "type": "Schema",
  "shapes": [
    { "type": "ShapeAnd",
      "shapeExprs": [
        { "type": "NodeConstraint",
          "nodeKind": "iri"
        },
        { "type": "Shape",
          "expression": {
            "type": "EachOf",
            "expressions": [
              { "type": "TripleConstraint",
                "predicate": "http://schema.org/name",
                "valueExpr": { "type": "NodeConstraint",
                               "datatype": "xsd:string"
                }
              },
              { "type": "TripleConstraint",
                "predicate": "http://schema.org/knows",
                "valueExpr": "http://example.org/User",
                "min": 0,
                "max": -1
              }
            ]
          }
        }
      ],
      "id": "http://example.org/User"
    }
  ]
}

4.14 SUMMARY
In this chapter we learned about the ShEx language.

• ShEx was designed as a human-readable language for RDF description and validation.

• ShEx can be considered as a grammar for RDF.

• There are two syntaxes for ShEx: a compact syntax and an RDF-based one.

• ShEx defines the notion of shape expressions and node constraints.

• Shape Expressions can be combined using the logical operators: AND, OR, and NOT on
top of triple expressions.

• Triple expressions declare the topology of the neighborhood of a node (incoming and
outgoing edges).

• Node constraints declare constraints on the form of a single node.

• Semantic actions offer an extension mechanism over ShEx.

4.15 SUGGESTED READING


We collected the following selection of references about Shape Expressions.

• Short introduction to ShEx: T. Baker and E. Prud’hommeaux. Shape Expressions (ShEx) Primer. https://shexspec.github.io/primer/, April 2017

• ShEx 2.0 language specification: E. Prud’hommeaux, I. Boneva, J. E. Labra Gayo, and G. Kellogg. Shape expressions language 2.0. https://shexspec.github.io/spec/, April 2017
• Description of the first version of ShEx: E. Prud’hommeaux, J. E. Labra Gayo, and H. R. Solbrig. Shape expressions: An RDF validation and transformation language. In Proc. of the 10th International Conference on Semantic Systems, SEMANTICS, pages 32–40, ACM, 2014. DOI: 10.1145/2660517.2660523

• An algorithm to implement Shape Expressions based on derivatives: J. E. Labra Gayo, E. Prud’hommeaux, I. Boneva, S. Staworko, H. Solbrig, and S. Hym. Towards an RDF validation language based on regular expression derivatives. http://ceur-ws.org/Vol-1330/paper-32.pdf

• Theoretical foundations of ShEx: S. Staworko, I. Boneva, J. E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, and H. R. Solbrig. Complexity and expressiveness of ShEx for RDF. In 18th International Conference on Database Theory, ICDT, volume 31 of LIPIcs, pages 195–211, Schloss Dagstuhl – Leibniz-Zentrum fuer Informatik, 2015. http://labra.github.io/pdf/2015_ComplexityExpressivenessShEx.pdf

• Well-founded semantics of shape schemas (which are the basis of ShEx): I. Boneva, J. E. Labra Gayo, and E. Prud’hommeaux. Semantics and validation of shapes schemas for RDF. In International Semantic Web Conference, 2017. https://labra.github.io/pdf/2017_SemanticsValidationShapesSchemas.pdf
CHAPTER 5

SHACL
Shapes Constraint Language (SHACL) has been developed by the W3C RDF Data Shapes Working Group, which was chartered in 2014 with the goal to “produce a language for defining structural constraints on RDF graphs [6].”
The first public working draft was published in October 2015 and it was proposed as a W3C Recommendation in June 2017.¹
SHACL was influenced by SPIN and, in some parts, by OSLC Resource Shapes and ShEx. At the beginning of the Working Group activity, SHACL was expected to integrate all the validation approaches into a unified language. However, due to core differences, SHACL and ShEx did not converge. Chapter 7 contains a comparison of both languages and describes the main differences.
SHACL is divided into two parts. The first part, called SHACL Core, describes a core RDF vocabulary to define common shapes and constraints, while the second part describes an extension mechanism in terms of SPARQL and is called SHACL-SPARQL.
Two working group notes have been published to extend SHACL with (a) advanced features such as rules and complex expressions² and (b) the ability to define constraint components in JavaScript (called SHACL-JavaScript).³
A W3C SHACL community group⁴ has been created to continue working on SHACL, preparing educational content and supporting SHACL adoption. A working group note was also suggested for a SHACL Compact Syntax,⁵ but it was deferred to the W3C community group.

5.1 SIMPLE EXAMPLE

SHACL groups the information and constraints that apply to data nodes into some constructs
called shapes. SHACL shapes differ from ShEx shapes in the sense that they also contain in-
formation about the target nodes or set of nodes to which they can be applied.

1 https://www.w3.org/TR/shacl
2 https://www.w3.org/TR/shacl-af/
3 https://www.w3.org/TR/shacl-js/
4 https://www.w3.org/community/shacl/
5 https://w3c.github.io/data-shapes/shacl-compact-syntax/
The syntax of SHACL is defined in terms of RDF so we will use Turtle in this book
although it is possible to employ other RDF serialization formats such as JSON-LD or
RDF/XML.

Example 5.1 UserShape example in SHACL


The following example is similar to the ShEx definition in Example 4.1.⁶ It defines a shape :UserShape of type sh:NodeShape. It has a target class declaration pointing to :User, which means that it applies to all nodes that are instances of :User (see Section 5.7.2). The next lines declare that nodes conforming to :UserShape must satisfy the following constraints.
• They must have exactly one property schema:name with values of type xsd:string (lines 3–8).

• They must have exactly one property schema:gender whose value must be either schema:Male or schema:Female or any xsd:string literal (lines 9–17).

• They have zero or one schema:birthDate property whose datatype must be xsd:date (lines 18–22).

• They have zero or more schema:knows properties whose values must be IRIs and have type :User (lines 23–27).

 1 :UserShape a sh:NodeShape ;
 2   sh:targetClass :User ;
 3   sh:property [                 # Blank node 1
 4     sh:path schema:name ;
 5     sh:minCount 1 ;
 6     sh:maxCount 1 ;
 7     sh:datatype xsd:string ;
 8   ] ;
 9   sh:property [                 # Blank node 2
10     sh:path schema:gender ;
11     sh:minCount 1 ;
12     sh:maxCount 1 ;
13     sh:or (
14       [ sh:in ( schema:Male schema:Female ) ]
15       [ sh:datatype xsd:string ]
16     )
17   ] ;
18   sh:property [                 # Blank node 3
19     sh:path schema:birthDate ;
20     sh:maxCount 1 ;
21     sh:datatype xsd:date ;
22   ] ;
23   sh:property [                 # Blank node 4
24     sh:path schema:knows ;
25     sh:nodeKind sh:IRI ;
26     sh:class :User ;
27   ] .

6 The example differs in the avoidance of recursion for SHACL. See Section 5.12.1.

SHACL defines shapes as a conjunction of constraints that nodes must satisfy. A SHACL
processor checks each of the constraints and returns validation errors for every constraint that is
not satisfied.
When no error is reported, it is assumed that the RDF graph has been validated.

Example 5.2 RDF graph conforming to Example 5.1


The following RDF data graph conforms to the previous example:
:alice a :User ;                           # V Passes as :UserShape
       schema:name "Alice" ;
       schema:gender schema:Female ;
       schema:knows :bob .

:bob a :User ;                             # V Passes as :UserShape
     schema:gender schema:Male ;
     schema:name "Robert" ;
     schema:birthDate "1980-03-10"^^xsd:date .

:carol a :User ;                           # V Passes as :UserShape
       schema:name "Carol" ;
       schema:gender schema:Female ;
       foaf:name "Carol" .

When an RDF graph conforms to a shapes graph, SHACL processors return a validation
report with no errors. The validation report contains the declaration:
[ a sh:ValidationReport ;
  sh:conforms true
] .

Example 5.3 Example of non conforming RDF graph


The following RDF graph does not conform to the shapes graph declared in Example 5.1.
:dave a :User ;                       # X Fails as :UserShape
      schema:name "Dave" ;
      schema:gender :Unknown ;
      schema:birthDate 1980 ;
      schema:knows :grace .

:emily a :User ;                      # X Fails as :UserShape
       schema:name "Emily", "Emilee" ;
       schema:gender schema:Female .

:frank a :User ;                      # X Fails as :UserShape
       foaf:name "Frank" ;
       schema:gender schema:Male .

_:x a :User ;                         # X Fails as :UserShape
    schema:name "Unknown" ;
    schema:gender schema:Male ;
    schema:knows _:x .

A SHACL processor reports the following errors.

• :dave has a value for property schema:gender that is neither schema:Male, schema:Female, nor a string (the allowed values).

• :dave has value 1980 for property schema:birthDate, which is of datatype xsd:integer when it should be of datatype xsd:date.

• :dave has value :grace for property schema:knows, which is not an instance of :User.

• :emily has 2 values for property schema:name when the maximum count is 1.

• :frank does not have a value for property schema:name.

• _:x fails because the value of schema:knows is a blank node and must be an IRI.

When an RDF graph does not conform to a shapes graph, SHACL processors return a
validation report that contains several errors. Section 5.5 describes the validation report struc-
ture.
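The two cardinality errors reported for :emily and :frank can be reproduced with a small sketch. It is not a SHACL processor: the graph is a plain list of tuples, and the function check_cardinality and its parameters are hypothetical names that only mimic the effect of sh:minCount and sh:maxCount on a predicate path.

```python
from collections import defaultdict


def check_cardinality(triples, focus_nodes, path, min_count=0, max_count=None):
    """Return (focus node, message) pairs for nodes that violate the bounds."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == path:
            values[subject].add(obj)  # an RDF graph is a set: duplicates collapse
    errors = []
    for node in focus_nodes:
        count = len(values.get(node, ()))
        if count < min_count:
            errors.append((node, f"Less than {min_count} values"))
        if max_count is not None and count > max_count:
            errors.append((node, f"More than {max_count} values"))
    return errors


# Part of the data graph of Example 5.3, written as plain tuples.
TRIPLES = [
    (":dave", "schema:name", "Dave"),
    (":emily", "schema:name", "Emily"),
    (":emily", "schema:name", "Emilee"),
    (":frank", "foaf:name", "Frank"),
]

errors = check_cardinality(TRIPLES, [":dave", ":emily", ":frank"],
                           "schema:name", min_count=1, max_count=1)
for node, message in errors:
    print(node, message)
```

Every focus node is checked against every bound, which mirrors the conjunctive reading of a shape: a node conforms only when no constraint reports an error.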

5.2 SHACL IMPLEMENTATIONS


At the time of this writing, there are several implementations of SHACL.

• TopQuadrant has an open source implementation in Java (using the Apache Jena library) called TopBraid SHACL API.⁷ It implements SHACL Core, SHACL-SPARQL, and SHACL rules (see Section 5.19) and also offers a command line tool. TopQuadrant is the company behind TopBraid Composer, which is a commercial interactive development environment for semantic web and linked data applications. TopBraid Composer (including the free edition) includes a version of the API for RDF validation.
7 https://github.com/TopQuadrant/shacl
• SHACL Playground,⁸ an online SHACL demo implemented in JavaScript by TopQuadrant.

• SHACLex⁹ implements SHACL Core (it also implements ShEx). It is written in Scala and based on a simple and generic RDF library (currently it works on top of the Apache Jena library, but there are plans to use other libraries). SHACLex can be used to deploy an online validator service, and an online demo is deployed on Heroku.¹⁰

• Corese STTL SHACL validator, implemented by Olivier Corby. It is an implementation of SHACL Core using STTL (SPARQL Template Transformation Language), which is a generic transformation language for RDF.¹¹ STTL is itself implemented in Java. An online demo of the validator is also available.¹²

• Netage SHACL Engine,¹³ implemented in Java (using the Jena library) by Nicky van Oorschot. It has support for SHACL-SPARQL.

• SHACL-Check, a prototype implemented by Tim Berners-Lee to check the specification.¹⁴

• RDFUnit,¹⁵ a test-driven data-debugging framework that runs test cases against RDF data and records any violations in structured form. Besides its SPARQL-based constraint definition language, RDFUnit supports rule translation from multiple formats, i.e., OWL under closed-world semantics, OSLC, and DSP. At the time of this writing, RDFUnit supports a large part of SHACL Core and SHACL-SPARQL.¹⁶ One of the future plans for RDFUnit is to support ShEx through the SHACLex implementation.

• Alternative SHACL implementation, by Peter F. Patel-Schneider, in Python.¹⁷

• ELI Validator, by the ELI (European Legislation Identifier) Initiative,¹⁸ which is based on the TopBraid SHACL API.

• SHACL for rdf4j¹⁹ (formerly Sesame), developed as a Google Summer of Code 2017 project.
8 http://shacl.org/playground/
9 http://labra.github.io/shaclex/
10 http://shaclex.herokuapp.com/
11 http://ns.inria.fr/sparql-template
12 http://corese.inria.fr/
13 http://www.netage.nl
14 https://github.com/linkeddata/shacl-check
15 http://aksw.org/Projects/RDFUnit.html
16 https://github.com/AKSW/RDFUnit/issues/62
17 https://github.com/pfps/shacl
18 http://labs.sparna.fr/eli-validator/
19 https://github.com/eclipse/rdf4j
5.3 BASIC DEFINITIONS: SHAPES GRAPHS, NODE, AND PROPERTY SHAPES
A SHACL processor has two inputs: a data graph that contains the RDF data to validate and a
shapes graph that contains the shapes. Example 5.1 contains a shapes graph and Examples 5.2
and 5.3 contain two possible RDF data graphs. It is possible to use a single graph that contains
both the data and shapes graph merged.
There are two main types of shapes: node shapes and property shapes. Node shapes declare
constraints directly on a node. Property shapes declare constraints on the values associated with
a node through a path.
Property shapes have a property sh:path that declares the path that goes from the focus
node to the value that they describe. The most frequent paths are predicate paths which are
formed by a single IRI.
A node shape usually contains several property shapes which are declared through the
sh:property predicate.
Example 5.1 contained four such property shape declarations. The first one was defined
as:
1 :UserShape ...
2 sh:property [ # Blank node 1
3 sh:path schema:name ;
4 sh:minCount 1;
5 sh:maxCount 1;
6 sh:datatype xsd:string ;
7 ] ;
8 ...

This means that nodes that conform to :UserShape must also conform to the property shape identified by blank node 1. The path of that property shape (line 3) is the predicate schema:name which is, in this case, a single IRI. The property shape contains several components that declare that exactly one value can be accessed through that path (lines 4 and 5) and that it must belong to the xsd:string datatype (line 6).
Notice that in Example 5.1 we used blank nodes for property shapes and numbered them from 1 to 4 because we will refer to them when we describe the validation report in Section 5.5. Although using blank nodes may be more readable, it is sometimes better to declare an IRI for the property shapes so they can be referenced from other shapes graphs when they are imported (see the next section).

Example 5.4 Declaring IRIs for property shapes


Example 5.1 could be rewritten as:
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:nodeKind sh:IRI ;
  sh:property :HasName ;
  sh:property :HasGender ;
  sh:property :MaybeBirthDate ;
  sh:property :KnowsUsers .

:HasName sh:path schema:name ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:datatype xsd:string .

:HasGender sh:path schema:gender ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:or (
    [ sh:in ( schema:Male schema:Female ) ]
    [ sh:datatype xsd:string ]
  ) .

:MaybeBirthDate sh:path schema:birthDate ;
  sh:maxCount 1 ;
  sh:datatype xsd:date .

:KnowsUsers sh:path schema:knows ;
  sh:class :User .

5.4 IMPORTING OTHER SHAPES GRAPHS


A shapes graph contains shapes definitions that will be passed to the SHACL validation pro-
cess. Shapes graphs can be reusable modules that can be referenced by other shapes graphs with
the predicate owl:imports. As a pre-validation step, SHACL processors should extend the orig-
inal shapes graph by following and importing all referenced shapes graphs through owl:imports
declarations. The resulting graph will be the input shapes graph that will be used for validation.

Example 5.5 Importing shapes graphs


If we assume that Example 5.1 is available at IRI http://example.org/UserShapes, then the following shapes graph imports its shapes and uses them to declare that nodes that conform to :TeacherShape must also conform to :UserShape (line 5) and have at least one value for the predicate :teaches, of datatype xsd:string.
 1 <> owl:imports <http://example.org/UserShapes> .
 2
 3 :TeacherShape a sh:NodeShape ;
 4   sh:targetClass :Teacher ;
 5   sh:node :UserShape ;
 6   sh:property [
 7     sh:path :teaches ;
 8     sh:minCount 1 ;
 9     sh:datatype xsd:string ;
10   ]
11 .

Given the following data:


:alice a :Teacher ;                   # V Passes as :TeacherShape
       schema:name "Alice" ;
       schema:gender schema:Female ;
       schema:knows :bob ;
       :teaches "Algebra" .

:bob a :User ;                        # V Passes as :UserShape
     schema:gender schema:Male ;
     schema:name "Robert" .

:carol a :Teacher ;                   # X Fails as :TeacherShape
       schema:gender 23 ;
       :teaches "Logic" .

A SHACL processor validates that :alice conforms to :TeacherShape, and :bob to :UserShape
but reports that :carol does not conform to :TeacherShape.

5.5 VALIDATION REPORT


As we said, SHACL processors take as input a data graph and a shapes graph and return a
validation report.
The validation report is defined as an RDF graph with the following structure. If the data
graph conforms to the shapes graph, the report contains a sh:conforms declaration with the value
true:

:report a sh:ValidationReport ;
  sh:conforms true .

If the data graph does not conform to the shapes graph, the validation report will have a
value false for the property sh:conforms and a set of validation errors of type sh:ValidationResult
linked by the property sh:result.
Each validation result contains metadata about the cause of the error such as sh:focusNode,
sh:value, sh:resultPath, etc. Table 5.1 describes the properties of validation results.
Table 5.1: SHACL validation result properties

Property                        Description
sh:focusNode                    The focus node that was being validated when the error happened.
sh:resultPath                   The path from the focus node. This property is optional and usually corresponds to the sh:path declaration of property shapes.
sh:value                        The value that violated the constraint, when available.
sh:sourceShape                  The shape that the focus node was validated against when the constraint was violated.
sh:sourceConstraintComponent    The IRI that identifies the component that caused the violation.
sh:detail                       May point to further details about the cause of the error. This property can be used for reporting errors in nested shapes.
sh:resultMessage                Textual details about the error. This message can be affected by the sh:message property (see Section 5.6.4).
sh:resultSeverity               A value which is equal to the sh:severity value of the shape that caused the violation error. If the shape does not have a sh:severity declaration, the default value is sh:Violation.
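The structure in Table 5.1 can be mirrored by a small in-memory model. The sketch below is only an illustration: the class and field names are hypothetical, and it assumes that a report conforms exactly when it contains no validation results, which matches the description given in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ValidationResult:
    focus_node: str                      # sh:focusNode
    source_constraint_component: str     # sh:sourceConstraintComponent
    source_shape: str                    # sh:sourceShape
    result_path: Optional[str] = None    # sh:resultPath (optional)
    value: Optional[str] = None          # sh:value (when available)
    message: Optional[str] = None        # sh:resultMessage
    severity: str = "sh:Violation"       # sh:resultSeverity default


@dataclass
class ValidationReport:
    results: List[ValidationResult] = field(default_factory=list)

    @property
    def conforms(self) -> bool:
        # Assumption of this sketch: no results means the data graph conforms.
        return not self.results

    def by_focus_node(self) -> Dict[str, List[ValidationResult]]:
        """Group the results by the node that was being validated."""
        grouped: Dict[str, List[ValidationResult]] = {}
        for result in self.results:
            grouped.setdefault(result.focus_node, []).append(result)
        return grouped


report = ValidationReport()
report.results.append(ValidationResult(
    focus_node=":emily",
    source_constraint_component="sh:MaxCountConstraintComponent",
    source_shape="_:propertyShape1",     # hypothetical blank node label
    result_path="schema:name",
    message="More than 1 values"))
print(report.conforms, list(report.by_focus_node()))
```

Grouping by focus node is a convenience of the sketch rather than part of the report vocabulary; it is useful when one node produces several results.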

Example 5.6
The validation report generated by a SHACL processor when trying to validate the shapes
graph in Example 5.1 with the data graph from Example 5.3 could be:
:report a sh:ValidationReport ;
  sh:conforms false ;
  sh:result
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:OrConstraintComponent ;
      sh:sourceShape ... ; # blank node 2
      sh:focusNode :dave ;
      sh:value :Unknown ;
      sh:resultPath schema:gender ;
      sh:resultMessage "Value has none of the shapes from the or list" ],
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
      sh:sourceShape ... ; # blank node 3
      sh:focusNode :dave ;
      sh:value 1980 ;
      sh:resultPath schema:birthDate ;
      sh:resultMessage "Value does not have datatype xsd:date" ],
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:ClassConstraintComponent ;
      sh:sourceShape ... ; # blank node 4
      sh:focusNode :dave ;
      sh:value :grace ;
      sh:resultPath schema:knows ;
      sh:resultMessage "Value is not an instance of User" ],
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:MaxCountConstraintComponent ;
      sh:sourceShape ... ; # blank node 1
      sh:focusNode :emily ;
      sh:resultPath schema:name ;
      sh:resultMessage "More than 1 values" ],
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
      sh:sourceShape ... ; # blank node 1
      sh:focusNode :frank ;
      sh:resultPath schema:name ;
      sh:resultMessage "Less than 1 values" ],
    [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:NodeKindConstraintComponent ;
      sh:sourceShape ... ; # blank node 4
      sh:focusNode _:x ;
      sh:value _:x ;
      sh:resultPath schema:knows ;
      sh:resultMessage "Value does not have node kind sh:IRI" ]
  .

Although in the rest of this chapter we will describe the different errors in natural language
for simplicity, the validation results returned by SHACL processors will have the structure above.
5.6 SHAPES
There are two types of shapes in SHACL: node shapes and property shapes. Node shapes specify
constraints about a node while property shapes specify constraints about the values that can be
reached from a node by a path.

[Figure 5.1: Shapes in SHACL. Class diagram: Shape (target declarations) has two subclasses, NodeShape (constraint components) and PropertyShape (sh:path: rdfs:Resource; constraint components).]

5.6.1 NODE SHAPES


Node shapes directly specify constraints about a focus node.
Example 5.7 Node shape example
The following shapes graph declares a node shape :UserShape which applies to all nodes
that are instances of :User and the constraint that nodes conforming to :UserShape must be IRIs.
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:nodeKind sh:IRI .

Given the following RDF graph:

:alice a :User .                        # V Passes as :UserShape

<http://other.uri.com/bob> a :User .    # V Passes as :UserShape

_:1 a :User .                           # X Fails as :UserShape

A SHACL processor checks that :alice and <http://other.uri.com/bob> conform to shape :UserShape and returns the error:
• _:1 is not an IRI
5.6.2 PROPERTY SHAPES
Property shapes specify constraints about the values that can be reached from a focus node by some path. sh:property associates a shape with a property shape.
The nodes that are affected by a property shape are specified using the sh:path property, whose value can be an IRI or a more complex SHACL path.
SHACL paths are semantically equivalent to a subset of the SPARQL 1.1 property paths but they use an RDF encoding based on the following rules.
• Direct predicates use a single IRI.
• Inverse paths are declared by a blank node with the property sh:inversePath.
• Sequence paths are encoded by RDF lists whose values are SHACL paths themselves.
• Alternative paths are declared by a blank node with the property sh:alternativePath whose value is an RDF list with the different alternatives.
• The path modifiers ?, *, and + are encoded by a blank node with the corresponding property sh:zeroOrOnePath, sh:zeroOrMorePath, or sh:oneOrMorePath.
Table 5.2 presents some examples of SHACL paths and their corresponding SPARQL
paths.

Table 5.2: SHACL and SPARQL paths

SHACL Path                                                SPARQL Path
schema:name                                               schema:name
[sh:inversePath schema:knows]                             ^schema:knows
(schema:knows schema:name)                                schema:knows/schema:name
[sh:alternativePath (schema:knows schema:follows)]        schema:knows|schema:follows
[sh:zeroOrOnePath schema:knows]                           schema:knows?
[sh:oneOrMorePath schema:knows]                           schema:knows+
([sh:zeroOrMorePath schema:knows] schema:name)            schema:knows*/schema:name
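The correspondence in Table 5.2 can be sketched as a small recursive translation. The Python encoding used here is an assumption made for the example and is not part of SHACL: a string stands for an IRI, a list for a sequence path, and a pair such as ("inverse", p) or ("alternative", [p1, p2]) for the blank-node forms. The function to_sparql is a hypothetical name.

```python
MODIFIERS = {"zeroOrOne": "?", "zeroOrMore": "*", "oneOrMore": "+"}


def _level(path):
    """Binding strength of the outermost operator (higher binds tighter)."""
    if isinstance(path, str):
        return 5                       # a plain IRI
    if isinstance(path, list):
        return 2                       # sequence path
    return {"alternative": 1, "inverse": 3}.get(path[0], 4)   # 4: ?, *, +


def _wrap(path, minimum):
    """Render `path`, adding parentheses if it binds looser than `minimum`."""
    text = to_sparql(path)
    return text if _level(path) >= minimum else f"({text})"


def to_sparql(path):
    """Translate the illustrative path encoding into SPARQL 1.1 path syntax."""
    if isinstance(path, str):
        return path
    if isinstance(path, list):
        return "/".join(_wrap(step, 3) for step in path)
    kind, argument = path
    if kind == "alternative":
        return "|".join(_wrap(option, 2) for option in argument)
    if kind == "inverse":
        return "^" + _wrap(argument, 4)
    return _wrap(argument, 5) + MODIFIERS[kind]


# The last row of Table 5.2.
print(to_sparql([("zeroOrMore", "schema:knows"), "schema:name"]))
```

Parentheses are added only where the SPARQL grammar needs them, for instance when a modifier is applied to a whole sequence.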

Example 5.8 SHACL paths example


The following shape declares that instances of :User must have at least one value for property schema:knows or schema:follows, that those values must be IRIs, and that every schema:email value of any node reachable from a user by following schema:knows one or more times must also be an IRI.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:knows schema:follows ) ] ;
    sh:nodeKind sh:IRI ;
    sh:minCount 1
  ] ;
  sh:property [
    sh:path ( [ sh:oneOrMorePath schema:knows ] schema:email ) ;
    sh:nodeKind sh:IRI
  ] .

Given the following RDF data:


:alice a :User ;                              # V Passes as :UserShape
       schema:follows <mailto:alice@mail.org> ;
       schema:knows :bob, :carol .

:bob schema:email <mailto:bob@mail.org> ;
     schema:knows :carol .

:carol schema:email <mailto:carol@mail.org> .

:dave a :User ;                               # X Fails as :UserShape
      schema:knows <mailto:dave@mail.org> ;
      schema:knows :carol, :emily .

:emily schema:email "Unknown" .

A SHACL processor verifies that :alice conforms to shape :UserShape because its values for schema:follows and schema:knows are IRIs and all the nodes that can be reached by the property schema:knows one or more times followed by the property schema:email (which is equivalent to schema:knows+/schema:email using SPARQL notation) are also IRIs.
The SHACL processor would return an error for :dave because one of the values of schema:knows (:emily) has a schema:email that is not an IRI.

5.6.3 CONSTRAINT COMPONENTS


SHACL defines the concept of constraint components which are associated with shapes to de-
clare constraints. Each node or property shape can be associated with several constraint compo-
nents.
Constraint components are identified by an IRI and have two types of parameters: mandatory and optional. The association between a shape and a constraint component is made by declaring values for the parameters. The parameters are also identified by IRIs and have values. Most of the constraint components in SHACL Core have a single parameter and follow the convention that if the parameter is named sh:p, the corresponding constraint component is named sh:PConstraintComponent, with the first letter of the parameter name capitalized (for example, sh:minCount and sh:MinCountConstraintComponent).
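The naming convention can be written down as a one-line rule. The function component_for below is a hypothetical helper used only to illustrate the convention; it works on prefixed names as plain strings and, as noted above, only holds for the components that follow the convention. Note the capitalized first letter, as in sh:MinCountConstraintComponent for sh:minCount.

```python
def component_for(parameter: str) -> str:
    """Derive the conventional component name from a parameter name.

    Both names are treated as prefixed names, e.g. "sh:minCount".
    """
    prefix, _, local = parameter.partition(":")
    return f"{prefix}:{local[0].upper()}{local[1:]}ConstraintComponent"


for name in ("sh:minCount", "sh:nodeKind", "sh:class"):
    print(name, "->", component_for(name))
```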

Example 5.9 Shape with two constraints


The following code:
:UserShape a sh:NodeShape ;
  sh:nodeKind sh:IRI ;
  sh:class schema:Person .

declares a node shape :UserShape with two constraints which are associated with the following constraint components:

• sh:NodeKindConstraintComponent with the value sh:IRI for the parameter sh:nodeKind. The constraint means that nodes that conform to :UserShape must be IRIs; and

• sh:ClassConstraintComponent with the value schema:Person for the parameter sh:class. The constraint means that nodes conforming to :UserShape must be instances of schema:Person.

Given the following data:


:alice a schema:Person .     # V Passes as :UserShape

:bob a schema:Product .      # X Fails as :UserShape

_:x a schema:Person .        # X Fails as :UserShape

When a constraint component declares a single parameter, the parameter may be used several times in the same shape. Each value of the parameter declares a different constraint. The interpretation of such declarations is conjunctive, i.e., all constraints apply.

Example 5.10 Shape with two constraints with the same parameter
The following code:
:UserShape a sh:NodeShape ;
  sh:class foaf:Person ;
  sh:class schema:Person .

declares two constraints with the parameter sh:class, which means that nodes conforming to :UserShape must be instances of both foaf:Person and schema:Person.

Constraint components are associated with validators which define the behavior of the
constraint.
Table 5.3: SHACL core constraint components

Operation                    Parameters                                                   Section
Cardinality constraints      sh:minCount, sh:maxCount                                     5.8
Value types                  sh:class, sh:datatype, sh:nodeKind, sh:in, sh:hasValue       5.9
Value range constraints      sh:minInclusive, sh:maxInclusive,                            5.10.1
                             sh:minExclusive, sh:maxExclusive
String-based constraints     sh:minLength, sh:maxLength, sh:pattern                       5.10.2
Language-based               sh:uniqueLang, sh:languageIn                                 5.10.3
Logical constraints          sh:and, sh:or, sh:xone, sh:not                               5.11
Shape-based constraints      sh:node, sh:property, sh:qualifiedValueShape,                5.12
                             sh:qualifiedValueShapesDisjoint,
                             sh:qualifiedMinCount, sh:qualifiedMaxCount
Closed shapes                sh:closed, sh:ignoredProperties                              5.13
Property pair constraints    sh:equals, sh:disjoint, sh:lessThan, sh:lessThanOrEquals     5.14
Non-validating properties    sh:name, sh:description, sh:order, sh:group                  5.15

SHACL Core contains a list of built-in constraint components that are classified in Ta-
ble 5.3. In the table, we included the parameter names because they are shorter than the com-
ponent IRIs. Those components will be described in more detail in their corresponding sections
later in this chapter.
As we will show in Section 5.16, SHACL-SPARQL can be used to declare other con-
straint components.

5.6.4 HUMAN FRIENDLY MESSAGES


The property sh:message can be used to associate a human-friendly message with a shape. If there
is a violation that affects that shape, a SHACL processor can include the value of sh:message as
the value of sh:resultMessage in the validation report.
Example 5.11 sh:message example
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [                      # Blank node 1
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:message "Where is the name?"
   ] .

Given the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:name "Alice" .

:bob a :User ;                        # Fails as :UserShape
   foaf:name "Bob" .

A SHACL processor would return the following validation report:


:report a sh:ValidationReport ;
   sh:conforms false ;
   sh:result [ a sh:ValidationResult ;
      sh:resultSeverity sh:Violation ;
      sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
      sh:sourceShape ... ;            # Blank node 1
      sh:focusNode :bob ;
      sh:resultPath schema:name ;
      sh:resultMessage "Where is the name?" ;
   ] .

5.6.5 DECLARING SHAPE SEVERITIES


The property sh:severity can be used to declare a severity value for a shape. If there is a violation
that affects that shape, a SHACL processor can include the value of sh:severity as the value
of sh:resultSeverity in the validation report. SHACL describes three kinds of severity levels:
sh:Info, sh:Warning, and sh:Violation. If the shape does not declare a severity value, the default
one is sh:Violation.

Example 5.12 sh:severity example


Given the following shapes graph:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [                      # Blank node 1
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:severity sh:Warning
   ] .

and the RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:name "Alice" .

:bob a :User ;                        # Fails as :UserShape
   schema:name 23 .

A SHACL processor returns the following validation report:


:report a sh:ValidationReport ;
   sh:conforms false ;
   sh:result [ a sh:ValidationResult ;
      sh:resultSeverity sh:Warning ;
      sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
      sh:sourceShape ... ;            # Blank node 1
      sh:focusNode :bob ;
      sh:resultPath schema:name ;
      sh:resultMessage "Datatype should be xsd:string" ;
      sh:value 23
   ] .
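The way sh:message and sh:severity flow into a validation result can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: shapes and results are modeled as plain dictionaries, and the helper name make_result is hypothetical and not part of any SHACL library.

```python
# Illustrative sketch only: shapes and results are plain dicts,
# and make_result is a hypothetical helper, not a SHACL library API.
DEFAULT_SEVERITY = "sh:Violation"  # used when the shape declares no sh:severity


def make_result(shape, focus_node, component, value=None):
    """Build one validation result for a constraint violated on focus_node."""
    result = {
        "sh:focusNode": focus_node,
        "sh:sourceConstraintComponent": component,
        # The declared severity is copied; otherwise the default applies.
        "sh:resultSeverity": shape.get("sh:severity", DEFAULT_SEVERITY),
    }
    if "sh:path" in shape:
        result["sh:resultPath"] = shape["sh:path"]
    if "sh:message" in shape:
        # A processor can reuse sh:message as sh:resultMessage.
        result["sh:resultMessage"] = shape["sh:message"]
    if value is not None:
        result["sh:value"] = value
    return result


name_shape = {
    "sh:path": "schema:name",
    "sh:datatype": "xsd:string",
    "sh:severity": "sh:Warning",
}
warning = make_result(name_shape, ":bob", "sh:DatatypeConstraintComponent", value=23)
violation = make_result({"sh:path": "schema:name"}, ":bob",
                        "sh:MinCountConstraintComponent")
```

Here warning carries sh:Warning because the shape declares it, while violation falls back to sh:Violation.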

5.6.6 DEACTIVATING SHAPES


If a shape has the property sh:deactivated with the value true then it is deactivated and all RDF
terms will conform to it.
A typical use case for deactivated shapes is when one imports shapes from another graph
by a third party and wants to deactivate some of the shapes in the local shapes graph that do not
apply in the current context.
Notice that if the author of a shapes library anticipates that a shape may need to be dis-
abled or modified by others, it may be better to use IRIs instead of blank nodes, so they can be
referenced later.

Example 5.13 Deactivating shapes


Let’s assume that there is a shapes library available at IRI http://example.org/UserShapes
with the following shapes graph:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property :HasName ;
   sh:property :HasEmail .

:HasName sh:path schema:name ;
   sh:minCount 1 ;
   sh:maxCount 1 ;
   sh:datatype xsd:string .

:HasEmail sh:path schema:email ;
   sh:minCount 1 ;
   sh:nodeKind sh:IRI .

And we define a shapes graph importing the previous shapes and adding a declaration for
:TeacherShape that deactivates the property shape :HasEmail:
<> owl:imports <http://example.org/UserShapes> .

:TeacherShape a sh:NodeShape ;
   sh:targetClass :Teacher ;
   sh:node :UserShape ;
   sh:property [
      sh:path :teaches ;
      sh:minCount 1 ;
      sh:datatype xsd:string ;
   ] .

:HasEmail sh:deactivated true .

The merged shapes graph deactivates the property shape :HasEmail, so nodes that conform
to :TeacherShape need to conform to :UserShape but do not need to satisfy the constraints on
schema:email.
Given the following RDF data:
:alice a :Teacher ;                   # Passes as :TeacherShape
   schema:name "Alice" ;
   schema:email <mailto:alice@example.org> ;
   :teaches "Logic" .

:bob a :Teacher ;                     # Passes as :TeacherShape
   schema:name "Robert" ;
   schema:email "This email is not an IRI" ;
   :teaches "Algebra" .

:carol a :Teacher ;                   # Fails as :TeacherShape
   schema:name 23 ;
   :teaches "Logic" .

A SHACL processor checks that :alice and :bob conform to :TeacherShape even if :bob does
not conform to the :HasEmail shape. It returns the following error:
• :carol does not conform to :TeacherShape because it does not conform to :UserShape as the
value of property schema:name does not have datatype xsd:string.
5.7 TARGET DECLARATIONS
SHACL shapes may define several target declarations. Target declarations specify the set of
nodes that will be validated against a shape. Table 5.4 contains the different target declarations
defined in SHACL Core.
SHACL targets provide the same functionality as ShEx shape maps (see Section 4.9). We
discuss the core differences in Section 7.4.
Table 5.4: SHACL target declarations

Value Description
sh:targetNode Directly point to a node
sh:targetClass All nodes that are instances of some class
sh:targetSubjectsOf All nodes that are subjects of some predicate
sh:targetObjectsOf All nodes that are objects of some predicate

5.7.1 TARGET NODE


The predicate sh:targetNode declares a node that must conform to some shape.

Example 5.14 sh:targetNode example


In the following example, :alice, :bob, and :carol are declared as the target nodes of
:UserShape so a SHACL processor will validate those nodes.

:UserShape a sh:NodeShape ;
   sh:targetNode :alice, :bob, :carol ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

Given the RDF graph:

:alice schema:name "Alice Cooper" .   # Passes as :UserShape

:bob foaf:name "Bob" .                # Fails as :UserShape

:carol schema:name 23 .               # Fails as :UserShape

:dave schema:name 45 .                # Ignored

A SHACL processor checks that :alice conforms to :UserShape and returns the errors:
• :bob does not have a value for property schema:name
• :carol has a value which is not an xsd:string for property schema:name.
Notice that it ignores :dave as it was not affected by the sh:targetNode declaration.

sh:targetNode provides functionality similar to the ShEx fixed shape map (see 4.9.1).
The difference is that SHACL silently ignores target nodes that are missing from
the data graph, while ShEx reports a failure. Depending on the data and constraint
modeling approach, silently ignoring nodes may lead to false positives, so target nodes
should be used with caution.

5.7.2 TARGET CLASS


Target class declarations specify that all instances of some class must be validated with some
shape.
SHACL employs a specific notion of instance, called SHACL instance, which
can be defined using SPARQL property paths as: a node X is a SHACL instance of a class C if
X rdf:type/rdfs:subClassOf* C.
This means that nodes with an explicit rdf:type arc pointing to C are considered, but also nodes
that have an rdf:type declaration pointing to some class that is transitively linked to C
by the rdfs:subClassOf predicate. Note that the definition uses only the predicate rdfs:subClassOf
and does not take into account other predicates from RDFS such as rdfs:domain, rdfs:range, etc. The
definition does not require RDFS inference.
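The definition of SHACL instance can be sketched in Python as a walk over rdfs:subClassOf arcs. This is a minimal illustration that assumes the graph is a set of (subject, predicate, object) tuples of prefixed names; the function names superclasses and is_shacl_instance are hypothetical.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """Return cls plus every class reachable through rdfs:subClassOf* (zero or more steps)."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_shacl_instance(graph, node, cls):
    """True if node rdf:type/rdfs:subClassOf* cls holds in graph."""
    return any(
        s == node and p == RDF_TYPE and cls in superclasses(graph, o)
        for s, p, o in graph
    )


graph = {
    (":alice", RDF_TYPE, ":User"),
    (":emily", RDF_TYPE, ":Student"),
    (":Student", SUBCLASS_OF, ":User"),
    # rdfs:domain is deliberately not followed: no RDFS inference is applied.
    (":teaches", "rdfs:domain", ":User"),
    (":dave", ":teaches", ":Algebra"),
}
```

In this graph :alice and :emily are SHACL instances of :User, while :dave is not, even though RDFS inference over rdfs:domain would classify him as a :User.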

Example 5.15 sh:targetClass example


:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

Given the following RDF graph:

:alice a :User ;                      # Passes as :UserShape
   schema:name "Alice Cooper" .

:bob a :User ;                        # Fails as :UserShape
   foaf:name "Bob" .

:carol a :User ;                      # Fails as :UserShape
   schema:name 23 .

:dave a :Student ;                    # Fails as :UserShape
   schema:name 45 .

:emily a :Student ;                   # Passes as :UserShape
   schema:name "Emily" .

:Student rdfs:subClassOf :User .

A SHACL validator checks that both :alice and :emily conform to :UserShape and returns
the following errors:

• :bob does not have property schema:name.

• :carol has a value for schema:name that is not an xsd:string.

• :dave has a value for schema:name that is not an xsd:string.

5.7.3 IMPLICIT CLASS TARGET


A shape with type sh:NodeShape and rdfs:Class is a target class of itself. This means that the
sh:targetClass declaration is implicit.

Example 5.16 Example using implicit targetClass


:User a sh:NodeShape, rdfs:Class ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

has the same validation behavior as:

:User a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

So given the following RDF graph:

:alice a :User ;                      # Passes as :User
   schema:name "Alice Cooper" .

:bob a :User ;                        # Fails as :User
   foaf:name "Robert" .

The system would return the following error.


• :bob does not have property schema:name.

Implicit target class declarations conflate the concept of shape and class as a single entity.
This can be a dangerous practice in the open semantic web as they are different concepts (see 3.2).
It can also be a very convenient feature to associate shape constraints with classes, and the Data
Shapes Working Group decided to support it.
In this book, we opt to separate shapes and classes, using the following pattern:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   ...

5.7.4 TARGET SUBJECTS OF


The property sh:targetSubjectsOf selects as focus nodes the subjects of some property.

Example 5.17 sh:targetSubjectsOf example


:UserShape a sh:NodeShape ;
   sh:targetSubjectsOf :teaches ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

Given the following RDF graph:

:alice :teaches :Algebra ;            # Passes as :UserShape
   schema:name "Alice" .

:bob :teaches :Logic ;                # Fails as :UserShape
   foaf:name "Robert" .

:carol foaf:name 23 .                 # Ignored

The system checks that :alice has shape :UserShape and signals the error:

• :bob does not have property schema:name.

In this case, the system ignores :carol.

5.7.5 TARGET OBJECTS OF


The property sh:targetObjectsOf selects as focus nodes the objects of some property.

Example 5.18 sh:targetObjectsOf example


:UserShape a sh:NodeShape ;
   sh:targetObjectsOf :isTaughtBy ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .

Given the following RDF graph:

:alice schema:name "Alice" .          # Passes as :UserShape

:bob foaf:name "Robert" .             # Fails as :UserShape

:carol foaf:name 23 .                 # Ignored

:algebra :isTaughtBy :alice, :bob .

The system checks that :alice has shape :UserShape and signals the error:

• :bob does not have property schema:name.

The system ignores :carol as it is not the object of the :isTaughtBy property.
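The selection of focus nodes by sh:targetSubjectsOf and sh:targetObjectsOf can be sketched in Python as two set comprehensions. The sketch assumes a graph modeled as a set of (subject, predicate, object) tuples; the function names are hypothetical and only illustrate the selection step, not the validation that follows it.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) tuples.
def target_subjects_of(graph, predicate):
    """Focus nodes selected by sh:targetSubjectsOf: subjects of the predicate."""
    return {s for s, p, o in graph if p == predicate}


def target_objects_of(graph, predicate):
    """Focus nodes selected by sh:targetObjectsOf: objects of the predicate."""
    return {o for s, p, o in graph if p == predicate}


graph = {
    (":alice", "schema:name", "Alice"),
    (":bob", "foaf:name", "Robert"),
    (":carol", "foaf:name", 23),
    (":algebra", ":isTaughtBy", ":alice"),
    (":algebra", ":isTaughtBy", ":bob"),
}

focus_nodes = target_objects_of(graph, ":isTaughtBy")
```

For the data of Example 5.18, focus_nodes contains :alice and :bob only, which is why :carol is never validated.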

5.8 CARDINALITY
Cardinality constraint components specify restrictions on the minimum and maximum number
of distinct value nodes. Table 5.5 defines the cardinality constraint component parameters in
SHACL. The default cardinality in SHACL for property shapes is {0,unbounded}.

Example 5.19 Cardinality


Table 5.5: SHACL cardinality constraint components

Operation     Description
sh:minCount   Restricts minimum number of value nodes.
              If not defined, there is no restriction (no minimum).
sh:maxCount   Restricts maximum number of value nodes.
              If not defined, there is no restriction (unbounded).

Given the following shapes graph:
:User a sh:NodeShape, rdfs:Class ;
   sh:property [
      sh:path schema:follows ;
      sh:minCount 2 ;
      sh:maxCount 3 ;
   ] .

and the following RDF graph:


:alice a :User ;                      # Passes as :User
   schema:follows :bob, :carol .

:bob a :User ;                        # Fails as :User
   schema:follows :alice .

:carol a :User ;                      # Fails as :User
   schema:follows :alice, :bob,
                  :carol, :dave .

A SHACL validator returns the errors:

• :bob has fewer than two values for the property schema:follows; and

• :carol has more than three values for the property schema:follows.
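The cardinality check amounts to counting the distinct value nodes reached from the focus node and comparing the count with the declared bounds. The following Python sketch illustrates this for a simple predicate path; the graph representation (a set of tuples) and the function name check_cardinality are assumptions made for the illustration.

```python
# Minimal sketch: a graph is a set of (subject, predicate, object) tuples.
def check_cardinality(graph, focus, path, min_count=0, max_count=None):
    """Return a list of error strings; an empty list means the node conforms."""
    # A set keeps only distinct value nodes, as the constraint requires.
    values = {o for s, p, o in graph if s == focus and p == path}
    errors = []
    if len(values) < min_count:
        errors.append("fewer than %d values" % min_count)
    if max_count is not None and len(values) > max_count:
        errors.append("more than %d values" % max_count)
    return errors


graph = {
    (":alice", "schema:follows", ":bob"),
    (":alice", "schema:follows", ":carol"),
    (":bob", "schema:follows", ":alice"),
    (":carol", "schema:follows", ":alice"),
    (":carol", "schema:follows", ":bob"),
    (":carol", "schema:follows", ":carol"),
    (":carol", "schema:follows", ":dave"),
}

report = {
    node: check_cardinality(graph, node, "schema:follows", min_count=2, max_count=3)
    for node in (":alice", ":bob", ":carol")
}
```

With the defaults (min_count=0 and max_count=None) the check never fails, which corresponds to the default cardinality {0,unbounded}.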

5.9 CONSTRAINTS ON VALUES


These constraint components specify the set of values that a node can have: for example, that
values must be literals with some datatype, IRIs, instances of some class, etc. Table 5.6
describes the different possibilities, which we will detail in the following sections.

5.9.1 DATATYPES
sh:datatype specifies the datatype that each value node must have.
Table 5.6: Constraints on values

Operation     Description
sh:datatype   Specifies that values must be literals with some datatype.
sh:class      Specifies that values must be SHACL instances of some class.
sh:nodeKind   Possible values: sh:BlankNode, sh:IRI, sh:Literal,
              sh:BlankNodeOrIRI, sh:BlankNodeOrLiteral, sh:IRIOrLiteral.
sh:in         Enumerates the value nodes that a property is allowed to have.
sh:hasValue   A node must have a given value.

Remember that all literals in the RDF data model have an associated datatype (see
Section 2.2). Plain string literals have xsd:string datatype by default.
SHACL contains a list of built-in datatypes that are based on XML Schema datatypes
(which are the same as in SPARQL 1.1). For those datatypes SHACL processors also check
that the lexical form conforms to the datatype rules. This means that something like
"Unknown"^^xsd:date is not a well-typed literal because "Unknown" does not conform to the
xsd:date rules.
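The two-step check described above (the datatype must match, and the lexical form must be valid for that datatype) can be sketched in Python. This is a simplified illustration: literals are modeled as (lexical form, datatype) pairs, only xsd:date is checked, and the check covers four-digit years with an optional timezone rather than the full XML Schema lexical space. All function names are hypothetical.

```python
import re
from datetime import date

# Simplified xsd:date lexical form: YYYY-MM-DD with an optional timezone.
_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def is_valid_xsd_date(lexical):
    """Check the lexical form, including month and day ranges."""
    m = _DATE.fullmatch(lexical)
    if m is None:
        return False
    try:
        date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. month 13 or February 30
        return False
    return True


def conforms_to_datatype(literal, datatype):
    """literal is a (lexical_form, datatype) pair."""
    lexical, literal_datatype = literal
    if literal_datatype != datatype:
        return False
    if datatype == "xsd:date":
        return is_valid_xsd_date(lexical)
    return True  # other datatypes are not checked in this sketch


ok = conforms_to_datatype(("1981-07-10", "xsd:date"), "xsd:date")
ill_typed = conforms_to_datatype(("Unknown", "xsd:date"), "xsd:date")
wrong_type = conforms_to_datatype(("1981", "xsd:integer"), "xsd:date")
```

The literal "Unknown"^^xsd:date carries the right datatype IRI but is rejected by the lexical check, while the integer 1981 is rejected by the datatype comparison alone.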

Example 5.20 Simple datatypes example


Given the following shapes graph20 :
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string
   ] ;
   sh:property [
      sh:path schema:birthDate ;
      sh:datatype xsd:date ;
   ] .

and the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:name "Alice" ;
   schema:birthDate "1981-07-10"^^xsd:date .

:bob a :User ;                        # Fails as :UserShape
   schema:name "Robert" ;
   schema:birthDate 1981 .

:carol a :User ;                      # Fails as :UserShape
   schema:name :Carol ;
   schema:birthDate "2003-06-10"^^xsd:date .

:dave a :User ;                       # Fails as :UserShape
   schema:name "Dave" ;
   schema:birthDate "Unknown"^^xsd:date .

20 This example is similar to ShEx Example 4.10.

A SHACL processor validates that :alice has shape :UserShape and returns the following errors:

• :bob has a value for path schema:birthDate that is not an xsd:date (it is an integer);

• :carol has a value for path schema:name that is not an xsd:string (it is an IRI); and

• :dave has a value for path schema:birthDate that is not an xsd:date (its lexical form does not
match xsd:date).

Example 5.21 Custom datatypes example


The RDF data model enables the use of other datatypes apart from the popular XML
Schema datatypes.
In the following example, a picture contains the properties schema:width and schema:height
using a hypothetical custom datatype (cdt:distance).
:PictureShape a sh:NodeShape ;
   sh:targetClass :Picture ;
   sh:property [
      sh:path schema:width ;
      sh:datatype cdt:distance
   ] ;
   sh:property [
      sh:path schema:height ;
      sh:datatype cdt:distance
   ] .

:gioconda a :Picture ;                # Passes as :PictureShape
   schema:width "21 in"^^cdt:distance ;
   schema:height "30 in"^^cdt:distance .

:other a :Picture ;                   # Fails as :PictureShape
   schema:width "21 in"^^xsd:string ;
   schema:height 30 .
Example 5.22 Language-tagged literals
A common use case is to declare that some literals must be language-tagged strings.
:CountryShape a sh:NodeShape ;
   sh:targetClass :Country ;
   sh:property [
      sh:path schema:name ;
      sh:datatype rdf:langString
   ] .

:spain a :Country ;                   # Passes as :CountryShape
   schema:name "España"@es .

:france a :Country ;                  # Fails as :CountryShape
   schema:name "France" .

5.9.2 CLASS OF VALUES


sh:class specifies that each value is an instance of a given class. As in Section 5.7.2, the notion
of instance that SHACL uses is a variation of RDF Schema where a node X is an instance of C
if X rdf:type/rdfs:subClassOf* C.
Example 5.23 Class of values
The following shape :UserShape declares that the values of property schema:worksFor must be
SHACL instances of the :Organization class.
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:worksFor ;
      sh:class :Organization
   ] .

Given the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:worksFor :aCompany .

:bob a :User ;                        # Passes as :UserShape
   schema:worksFor :aUniversity .

:carol a :User ;                      # Fails as :UserShape
   schema:worksFor :Unknown .

:aCompany a :Organization .
:aUniversity a :University .
:University rdfs:subClassOf :Organization .

A SHACL processor verifies that :alice and :bob conform to shape :UserShape and returns the
following error:
• :carol has the value :Unknown for property schema:worksFor which is not a SHACL instance
of :Organization.

5.9.3 NODE KINDS


sh:nodeKind specifies the kind of values according to the RDF Data model. Table 5.7 contains
the possible values for that property.

Table 5.7: Node kinds


Nodekind Description
sh:IRI Nodes must be IRIs.
sh:BlankNode Nodes must be Blank nodes.
sh:Literal Nodes must be Literals.
sh:BlankNodeOrLiteral Nodes must be Blank nodes or literals.
sh:BlankNodeOrIRI Nodes must be Blank nodes or IRIs.
sh:IRIOrLiteral Nodes must be IRIs or literals.

Example 5.24 Nodekind example


:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:nodeKind sh:Literal ;
   ] ;
   sh:property [
      sh:path schema:follows ;
      sh:nodeKind sh:BlankNodeOrIRI
   ] ;
   sh:nodeKind sh:IRI .

Given the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:name "Alice" ;
   schema:follows [ schema:name "Dave" ] .

:bob a :User ;                        # Fails as :UserShape
   schema:name _:1 ;
   schema:follows :alice .

:carol a :User ;                      # Fails as :UserShape
   schema:name "Carol" ;
   schema:follows "Dave" .

:dave a :User .                       # Passes as :UserShape

_:1 a :User .                         # Fails as :UserShape

A SHACL processor verifies that :alice and :dave conform to shape :UserShape and returns
the following errors:
• :bob has a value that is not a literal for property schema:name.

• :carol has a value that is not a blank node or IRI for property schema:follows.

• _:1 is not an IRI.

Note that :dave passes as :UserShape because there are no cardinality restrictions on schema:name and
schema:follows.

5.9.4 SETS OF VALUES


sh:in specifies that each value must be a member of the provided list.

Example 5.25 sh:in example


:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:gender ;
      sh:in ( schema:Male schema:Female )
   ] .

Given the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:affiliation :OurCompany ;
   schema:gender schema:Female .

:bob a :User ;                        # Fails as :UserShape
   schema:gender schema:male .

A SHACL processor verifies that :alice conforms to :UserShape and returns the following
error:
• :bob has a value for schema:gender that is not in the list ( schema:Male schema:Female ) because
schema:Male is not equal to schema:male.

5.9.5 SPECIFIC VALUE


sh:hasValue declares a value that a node must have. Notice that even if there is no sh:minCount
declared, this constraint checks that the property has that value (and possibly others).

Example 5.26 sh:hasValue

:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:affiliation ;
      sh:hasValue :OurCompany
   ] .

Given the following RDF graph:


:alice a :User ;                      # Passes as :UserShape
   schema:affiliation :OurCompany .

:bob a :User ;                        # Fails as :UserShape
   schema:affiliation :OurUniversity .

:carol a :User .                      # Fails as :UserShape

:dave a :User ;                       # Passes as :UserShape
   schema:affiliation :OurCompany ;
   schema:affiliation :OurUniversity .

A SHACL processor verifies that :alice and :dave conform to :UserShape and returns the
following errors:

• :bob does not have value :OurCompany for property schema:affiliation; and

• :carol does not have any value for property schema:affiliation.
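The difference between sh:in and sh:hasValue can be sketched in Python: sh:in constrains every value, so it is trivially satisfied when there are no values, whereas sh:hasValue requires one specific value to be present. The function names below are hypothetical and values are modeled as plain strings.

```python
def check_in(values, allowed):
    """sh:in: every value must be a member of the list.

    Returns the offending values; an empty list means the constraint holds.
    """
    return [v for v in values if v not in allowed]


def check_has_value(values, required):
    """sh:hasValue: the required value must be among the values."""
    return required in values


genders = ["schema:Male", "schema:Female"]

bob_gender_errors = check_in(["schema:male"], genders)    # comparison is case-sensitive
no_gender_errors = check_in([], genders)                  # no values, nothing to reject

carol_ok = check_has_value([], ":OurCompany")             # no value at all
dave_ok = check_has_value([":OurCompany", ":OurUniversity"], ":OurCompany")
```

This mirrors Examples 5.25 and 5.26: a node without schema:gender still satisfies sh:in, but a node without schema:affiliation fails sh:hasValue, and extra values such as :OurUniversity do not matter for sh:hasValue.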

5.10 DATATYPE FACETS


SHACL contains a list of built-in constraint components that resemble XML Schema facets
and have the same semantics.
5.10.1 VALUE RANGES
The parameters sh:minInclusive, sh:minExclusive, sh:maxInclusive, sh:maxExclusive declare the min-
imum or maximum value of a literal with the variants to include or exclude the given value.

Example 5.27 Example with value ranges


:Rating a sh:NodeShape ;
   sh:targetSubjectsOf schema:ratingValue ;
   sh:property [
      sh:path schema:ratingValue ;
      sh:minInclusive 1 ;
      sh:maxInclusive 5 ;
      sh:datatype xsd:integer
   ] .

Given the following RDF graph:


:low        schema:ratingValue 1 .    # Passes as :Rating
:average    schema:ratingValue 3 .    # Passes as :Rating
:veryGood   schema:ratingValue 5 .    # Passes as :Rating
:zero       schema:ratingValue 0 .    # Fails as :Rating
:incredible schema:ratingValue 100 .  # Fails as :Rating

A SHACL processor verifies that :low, :average, and :veryGood conform to shape :Rating
and returns the errors:
• :zero has a value below the minimum 1; and
• :incredible has a value greater than the maximum 5.

5.10.2 STRING-BASED CONSTRAINTS


The parameters sh:minLength, sh:maxLength, and sh:pattern (with sh:flags) specify string facets on
value nodes. These constraints check the string representation of the value.21
String facets are always violated when the value node is a blank node.
sh:minLength and sh:maxLength specify constraints on the size of the string representation of
a value node. When sh:minLength is 0, it means that there is no restriction on the length of the
string.

21 Technically, it is the lexical form of literals or the codepoint representation of IRIs.


Example 5.28 sh:minLength, sh:maxLength example
:User a sh:NodeShape, rdfs:Class ;
   sh:property [
      sh:path schema:name ;
      sh:minLength 4 ;
      sh:maxLength 20 ;
   ] ;
   sh:property [
      sh:path schema:description ;
      sh:minLength 0 ;
   ] .

Given the following RDF graph:

:alice a :User ;                      # Passes as :User
   schema:name "Alice" ;
   schema:description "... long description ..." .

:bob a :User ;                        # Fails as :User
   schema:name "Bob" .

:carol a :User ;                      # Fails as :User
   schema:name :Carol .

:strange a :User ;                    # Fails as :User
   schema:name _:strange .

a SHACL processor verifies that :alice conforms to shape :User and reports the errors:

• :bob has a schema:name whose length is less than 4;

• :carol has a schema:name whose length is greater than 20; and

• :strange has a blank node as the value for property schema:name whose length can’t be
calculated.

In the case of :carol, notice that the example depends on the length of the prefixed name
:Carol, which is calculated after concatenating the IRI associated with the empty prefix :
with Carol. In this case, if : is associated with http://example.org/, the processor evaluates the
length of http://example.org/Carol (which is 24) and fails because it is greater than 20.
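The length check can be sketched in Python. The sketch assumes that value nodes are modeled as (kind, text) pairs, where text is the lexical form of a literal or the full IRI string, and that blank nodes always violate the constraint; the function name string_length_ok is hypothetical.

```python
def string_length_ok(node, min_length=0, max_length=None):
    """node is a (kind, text) pair with kind in {"literal", "iri", "bnode"}."""
    kind, text = node
    if kind == "bnode":
        return False  # string facets are always violated by blank nodes
    n = len(text)
    return n >= min_length and (max_length is None or n <= max_length)


# The prefixed name :Carol expands to the full IRI before its length is measured.
carol_iri = "http://example.org/" + "Carol"

alice_ok = string_length_ok(("literal", "Alice"), 4, 20)
bob_ok = string_length_ok(("literal", "Bob"), 4, 20)
carol_ok = string_length_ok(("iri", carol_iri), 4, 20)
strange_ok = string_length_ok(("bnode", "strange"), 4, 20)
```

With the prefix assumed above, len(carol_iri) is 24, so :carol fails the sh:maxLength 20 constraint even though the local name Carol has only five characters.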

Example 5.29 Example with pattern


sh:pattern specifies that a value must match a regular expression. It has the same definition
as the SPARQL regex function.22
22 https://www.w3.org/TR/sparql11-query/#func-regex
The parameter sh:flags is optional and can modify the way the regular expression matches.
The definition of sh:flags is the same as SPARQL and XPath regular expressions. One of the
most popular flags is i which indicates that the match is case-insensitive.
We already gave a short introduction to regular expressions in Section 4.5.3. Although
that section was for ShEx, the concept is the same.
:ProductShape a sh:NodeShape ;
   sh:targetClass :Product ;
   sh:property [
      sh:path schema:productID ;
      sh:pattern "^P\\d{3,4}" ;
      sh:flags "i" ;
   ] .

Given the following RDF graph:


:car a :Product ;
   schema:productID "P2345" .         # Passes as :ProductShape

:bus a :Product ;
   schema:productID "p567" .          # Passes as :ProductShape

:truck a :Product ;
   schema:productID "P12" .           # Fails as :ProductShape

:bike a :Product ;
   schema:productID "B123" .          # Fails as :ProductShape

A SHACL processor verifies that :car and :bus conform to :ProductShape and returns the
following errors:

• :truck has a value for schema:productID that has fewer than three digits after the P; and

• :bike has a value for schema:productID that does not start with P or p.
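The behavior of the pattern and the i flag in Example 5.29 can be reproduced with Python's re module. This is only an approximation: sh:pattern follows the SPARQL and XPath regular expression rules, which differ from Python's in some details, although not for this simple pattern. The function name matches_pattern is hypothetical.

```python
import re


def matches_pattern(text, pattern, flags=""):
    """Approximate sh:pattern with sh:flags; only the 'i' flag is handled here."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    # Like the SPARQL regex function, the pattern may match anywhere in the
    # text unless it is anchored with ^ or $.
    return re.search(pattern, text, re_flags) is not None


PATTERN = r"^P\d{3,4}"

results = {
    product_id: matches_pattern(product_id, PATTERN, "i")
    for product_id in ("P2345", "p567", "P12", "B123")
}
```

Because the pattern has no $ anchor, "P2345" is accepted: the first four digits satisfy \d{3,4} and the remaining character is ignored.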

5.10.3 LANGUAGE-BASED CONSTRAINTS


sh:languageIn declares the allowed languages of a literal and sh:uniqueLang specifies that no pair of
nodes can have the same language tag.

Example 5.30 Example with sh:languageIn


The following example declares that the rdfs:label property of a product must be a tagged
literal in Spanish, English, or French.

:ProductShape a sh:NodeShape ;
   sh:targetClass :Product ;
   sh:property [
      sh:path rdfs:label ;
      sh:languageIn ( "es" "en" "fr" )
   ] .

:p234 a :Product ;                    # Passes as :ProductShape
   rdfs:label "jamón"@es,
              "ham"@en .

:p235 a :Product ;                    # Passes as :ProductShape
   rdfs:label "milk"@en .

:p236 a :Product ;                    # Fails as :ProductShape
   rdfs:label "Käse"@de .

:p237 a :Product ;                    # Fails as :ProductShape
   rdfs:label "patatas"@es,
              "kartofeln"@de .

Example 5.31 Example with sh:uniqueLang


The following example declares that if the nodes of shape :CountryShape have property
skos:prefLabel then the values must have different language tags.

:CountryShape a sh:NodeShape ;
   sh:targetClass :Country ;
   sh:property [
      sh:path skos:prefLabel ;
      sh:uniqueLang true
   ] .

:spain a :Country ;                   # Passes as :CountryShape
   skos:prefLabel "Spain"@en,
                  "España"@es .

:france a :Country ;                  # Passes as :CountryShape
   skos:prefLabel "France",
                  "France"@en,
                  "Francia"@es .

:italy a :Country .                   # Passes as :CountryShape

:usa a :Country ;                     # Fails as :CountryShape
   skos:prefLabel "USA"@en,
                  "United States"@en .

The previous example returns the error:

• Node :usa has more than one value with the language tag en for property skos:prefLabel.

In the previous example, a node without skos:prefLabel (e.g., :italy) also conforms to
:CountryShape.

Example 5.32 Example with one language tag in a list of languages


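A typical situation is to require at least one literal, and at most one literal per language,
from a list of allowed languages. For example, the following shape declares that nodes of shape
:CountryShape must have at least one skos:prefLabel, that every skos:prefLabel must be in English
or Spanish, and that no two values share the same language tag.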
:CountryShape a sh:NodeShape ;
   sh:targetClass :Country ;
   sh:property [
      sh:path skos:prefLabel ;
      sh:minCount 1 ;
      sh:uniqueLang true ;
      sh:languageIn ( "en" "es" ) ;
   ] .

Given the following data:


:spain a :Country ;                   # Passes as :CountryShape
   skos:prefLabel "Spain"@en,
                  "España"@es .

:france a :Country ;                  # Fails as :CountryShape
   skos:prefLabel "France",
                  "France"@en,
                  "Francia"@es .

:italy a :Country .                   # Fails as :CountryShape

:usa a :Country ;                     # Fails as :CountryShape
   skos:prefLabel "USA"@en,
                  "United States"@en .

In this case, :italy fails because it has no skos:prefLabel, :france fails because it has one
value that is not in English or Spanish, and :usa fails because it has more than one value in
English.
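The combination of sh:minCount, sh:languageIn, and sh:uniqueLang used in Example 5.32 can be sketched in Python. The sketch models labels as (lexical form, language tag) pairs with None for untagged literals, and approximates the language matching with a simple prefix comparison in the spirit of SPARQL's langMatches. All function names are hypothetical.

```python
def lang_matches(tag, language_range):
    """Basic language-range match: 'en' matches 'en' and 'en-US'."""
    tag, language_range = tag.lower(), language_range.lower()
    return tag == language_range or tag.startswith(language_range + "-")


def check_labels(labels, allowed, min_count=1):
    """labels: list of distinct (lexical_form, language_tag_or_None) pairs.

    Returns the names of the violated constraints (empty list if conforming).
    """
    errors = []
    if len(labels) < min_count:
        errors.append("minCount")
    # sh:languageIn: every value needs a tag that matches an allowed language.
    if any(tag is None or not any(lang_matches(tag, r) for r in allowed)
           for _, tag in labels):
        errors.append("languageIn")
    # sh:uniqueLang: no two values may share a language tag.
    tags = [tag.lower() for _, tag in labels if tag]
    if len(tags) != len(set(tags)):
        errors.append("uniqueLang")
    return errors


allowed = ["en", "es"]
spain = check_labels([("Spain", "en"), ("España", "es")], allowed)
france = check_labels([("France", None), ("France", "en"), ("Francia", "es")], allowed)
italy = check_labels([], allowed)
usa = check_labels([("USA", "en"), ("United States", "en")], allowed)
```

Each failing country violates a different constraint, which matches the explanation given for :italy, :france, and :usa.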
5.11 LOGICAL CONSTRAINTS: AND, OR, NOT, XONE
The operators sh:and, sh:or, sh:xone, and sh:not can be used to form complex constraints.
Their semantics is described in Table 5.8. sh:and, sh:or, and sh:not have the traditional
meaning of the corresponding Boolean operators, while sh:xone (exactly one) is similar to
exclusive-or when applied to two arguments. When applied to more than two arguments,
sh:xone requires exactly one argument to be satisfied, while exclusive-or requires an odd
number of arguments to be satisfied.
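The difference between sh:xone and a chained exclusive-or can be checked with a short Python sketch, where each Boolean stands for whether a value node conforms to one of the listed shapes. The function names xone and chained_xor are hypothetical.

```python
from functools import reduce
from operator import xor


def xone(results):
    """Exactly one of the results is true (the sh:xone reading)."""
    return sum(1 for r in results if r) == 1


def chained_xor(results):
    """Exclusive-or folded over all results: true for an odd number of true values."""
    return reduce(xor, results, False)


two_args = [True, False]
three_args = [True, True, True]

same_for_two = xone(two_args) == chained_xor(two_args)
xone_three = xone(three_args)
xor_three = chained_xor(three_args)
```

For two arguments both readings agree, but when three shapes are all satisfied, exclusive-or is true (an odd number of true values) while the exactly-one reading is false.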

Table 5.8: SHACL logical operators

Operation   Description
sh:and      sh:and (S1 ... SN) specifies that each value node must conform
            to all the shapes S1 ... SN.
sh:or       sh:or (S1 ... SN) specifies that each value node must conform to at
            least one of the shapes S1 ... SN.
sh:not      sh:not S specifies that each value node must not conform to S.
sh:xone     sh:xone (S1 ... SN) specifies that each value node must conform to
            exactly one of the shapes S1 ... SN.

5.11.1 AND
A node conforms to a shape containing the sh:and operator if it conforms to all the shapes linked
by it.
The following example declares the shape :UserShape as the conjunction of two shapes, each of
which constrains one property.

Example 5.33 SHACL AND example


:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:and (
      [ a sh:NodeShape ;
        sh:property [
           sh:path schema:name ;
           sh:datatype xsd:string ;
           sh:minCount 1 ]
      ]
      [ a sh:NodeShape ;
        sh:property [
           sh:path schema:affiliation ;
           sh:minCount 1 ]
      ]
   ) .
The declaration of type sh:NodeShape and the use of sh:property is not required when we
want to reference a property shape. The following code is equivalent to the previous example.
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:and (
      [ sh:path schema:name ;
        sh:datatype xsd:string ;
        sh:minCount 1
      ]
      [ sh:path schema:affiliation ;
        sh:minCount 1
      ]
   ) .

sh:and is a little redundant because, by default, when we associate constraint components
with a shape, all those constraints must be satisfied, so there is an implicit conjunction.
For example, the previous shape and the following one have the same meaning.
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1
   ] ;
   sh:property [
      sh:path schema:affiliation ;
      sh:minCount 1
   ] .

In case of complex expressions, using sh:and may improve readability. One example is using
sh:and to extend one shape with other constraints.

Example 5.34 Extending a shape with other constraints


The following example declares a top-level shape :Person whose nodes must have
schema:name. The shape :User extends :Person adding a new constraint on the existing property
schema:name and declaring the need of another property schema:email. Finally, the shape :Student
extends :User adding a new property :course.23
23 This example is the same as Example 4.50 for ShEx.

:Person a sh:NodeShape, rdfs:Class ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
   ] .

:User a sh:NodeShape, rdfs:Class ;
   sh:and (
      :Person
      [ sh:path schema:name ;
        sh:maxLength 5
      ]
      [ sh:path schema:email ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
        sh:maxCount 1
      ]
   ) .

:Student a sh:NodeShape, rdfs:Class ;
   sh:and (
      :User
      [ sh:path :course ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
      ]
   ) .

Consider the following RDF data, where each comment states the expected validation result:

:alice a :Person ;               # V Passes as :Person
  schema:name "Alice" .

:bob a :User ;                   # X Fails as :User
  schema:name "Robert Smith" ;   # long name
  schema:email <bob@example.org> .

:carol a :Person, :User ;        # V Passes as :Person and :User
  schema:name "Carol" ;
  schema:email <carol@example.org> .

:dave a :Student ;               # V Passes as :Person, :User and :Student
  schema:name "Dave" ;
  schema:email <carol@example.org> ;
  :course :algebra .
5.11.2 OR
The parameter sh:or declares a disjunction between several shapes.

Example 5.35 SHACL disjunction example


The following shape declares that nodes must either have property foaf:name or schema:name
(or both).
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:or (
    [ sh:path foaf:name ;
      sh:minCount 1
    ]
    [ sh:path schema:name ;
      sh:minCount 1
    ]
  ) .

Given the following data:

:alice a :User ;
  schema:name "Alice" .          # V Passes as :User

:bob a :User ;
  foaf:name "Robert" .           # V Passes as :User

:carol a :User ;
  foaf:name "Carol" ;            # V Passes as :User
  schema:name "Carol" .

:dave a :User ;
  rdfs:label "Dave" .            # X Fails as :User

A SHACL processor checks that :alice, :bob, and :carol conform to :UserShape but returns an error on :dave.

For this particular example, the use of sh:or could be replaced by a single property shape whose path uses sh:alternativePath:

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1
  ] .
Example 5.36 Union of datatypes
A common use case of sh:or is to declare the union of several datatypes. The following example declares that products must have an rdfs:label, which must be either an xsd:string or a language-tagged literal, and must have a release date, which must be either an xsd:date, an xsd:gYear, or one of the strings "unknown-past" or "unknown-future".

:ProductShape a sh:NodeShape ;
  sh:targetClass :Product ;
  sh:property [
    sh:path rdfs:label ;
    sh:or (
      [ sh:datatype xsd:string ]
      [ sh:datatype rdf:langString ]
    ) ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:releaseDate ;
    sh:or (
      [ sh:datatype xsd:date ]
      [ sh:datatype xsd:gYear ]
      [ sh:in ( "unknown-past" "unknown-future" ) ]
    ) ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

Given the following data:

:p1 a :Product ;                 # V Passes as :Product
  rdfs:label "Laptop" ;
  schema:releaseDate "1990"^^xsd:gYear .

:p2 a :Product ;                 # V Passes as :Product
  rdfs:label "Car"@en ;
  schema:releaseDate "unknown-future" .

:p3 a :Product ;                 # X Fails as :Product
  rdfs:label :House ;
  schema:releaseDate "2020"^^xsd:integer .

A SHACL processor checks that :p1 and :p2 conform to :ProductShape but returns an error on :p3.
5.11.3 EXACTLY ONE
A node conforms to a shape containing the sh:xone operator if it conforms to exactly one of the
shapes linked by it.
The semantics of sh:xone is different from Exclusive OR (XOR) when there are more than 2
arguments. XOR is usually defined as requiring conformance of an odd number of arguments,
while sh:xone requires conformance of exactly one.
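
The difference from an odd-parity XOR only shows up with three or more alternatives. The following is a minimal sketch under stated assumptions: the shape :LabelledShape, the class :Thing, the nodes :t1 and :t3, and the prefix IRIs are hypothetical names introduced here for illustration and do not appear elsewhere in the chapter.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix schema: <http://schema.org/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical shape: exactly one of three naming properties must be present.
:LabelledShape a sh:NodeShape ;
  sh:targetClass :Thing ;
  sh:xone (
    [ sh:path foaf:name ;   sh:minCount 1 ]
    [ sh:path schema:name ; sh:minCount 1 ]
    [ sh:path rdfs:label ;  sh:minCount 1 ]
  ) .

# Exactly one branch is satisfied, so :t1 is expected to conform.
:t1 a :Thing ;
  rdfs:label "One" .

# All three branches are satisfied. An odd-parity XOR would accept
# this node (three is odd), but sh:xone is expected to reject it.
:t3 a :Thing ;
  foaf:name "Three" ;
  schema:name "Three" ;
  rdfs:label "Three" .
```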

Example 5.37 SHACL Xone example


The following shape declares that nodes must have either foaf:name or schema:name but not
both.
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:xone (
    [ sh:property [
        sh:path foaf:name ;
        sh:minCount 1
      ]
    ]
    [ sh:property [
        sh:path schema:name ;
        sh:minCount 1
      ]
    ]
  ) .

Given the previous shape declaration and the following RDF graph:

:alice a :User ;                 # V Passes as :User
  schema:name "Alice" .

:bob a :User ;                   # V Passes as :User
  foaf:name "Robert" .

:carol a :User ;                 # X Fails as :User
  foaf:name "Carol" ;
  schema:name "Carol" .

:dave a :User ;                  # X Fails as :User
  rdfs:label "Dave" .

A SHACL processor checks that :alice and :bob conform to :UserShape but gives errors for :carol and :dave.

The sh:xone constraint component only checks that exactly one of its arguments is satisfied. When defining complex models, it must be used with caution, as its behavior may not be the intended one.

Example 5.38 Exactly one on complex expressions


We want to declare that a user has either one name or a combination of one or more given
names plus a family name, but not both. This example is the same as Example 4.30 in ShEx. A
first attempt to model it in SHACL would be:
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:xone (
    [ sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
    ]
    [ a sh:NodeShape ;
      sh:property [
        sh:path schema:givenName ;
        sh:datatype xsd:string ;
        sh:minCount 1
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
      ]
    ]
  ) .

Note, however, that sh:xone does not reject everything we might expect it to:

:alice a :User ;                 # V Passes as :UserShape
  schema:name "Alice" .

:bob a :User ;                   # V Passes as :UserShape
  schema:givenName "Bob",
                   "Robert" ;
  schema:familyName "Smith" .

:carol a :User ;                 # X Fails as :UserShape
  schema:name "Carol" ;
  schema:givenName "Carol" ;
  schema:familyName "King" .

:dave a :User ;                  # V Passes as :UserShape
  schema:name "Dave" ;           # But it should fail
  schema:familyName "King" .

In the case of :dave, validation passes although the intended meaning is that it should fail: the node conforms to one of the branches but partially matches the other one.
The solution is to rewrite each alternative so that, at the top level, it explicitly excludes the properties of the other alternatives. In that case, sh:xone is no longer required and sh:or is enough. Note that sh:maxCount 0 plays the role of negation.
The SHACL code equivalent to Example 4.30 is:

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:or (
    [ a sh:NodeShape ;
      sh:property [
        sh:path schema:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:maxCount 0
      ]
    ]
    [ a sh:NodeShape ;
      sh:property [
        sh:path schema:name ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:datatype xsd:string ;
        sh:minCount 1
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
      ]
    ]
  ) .

With this definition, the node :dave fails as expected. Note that this kind of definition can become quite verbose for more complex expressions (see Section 7.13 for a longer example).

5.11.4 NOT
The parameter sh:not specifies the condition that each node must not conform to a given shape.

Example 5.39 SHACL Not


:NotFoaf a sh:NodeShape ;
  sh:not [
    sh:property [
      sh:path foaf:name ;
      sh:minCount 1
    ]
  ] .

:alice schema:name "Alice" .     # V Passes as :NotFoaf

:bob foaf:name "Robert" .        # X Fails as :NotFoaf

:carol rdfs:label "Carol" .      # V Passes as :NotFoaf
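
For this particular shape the negation can arguably be avoided: rejecting nodes that have at least one foaf:name amounts to allowing at most zero values of foaf:name. A minimal sketch of that alternative follows; the shape name :NoFoafName and the prefix IRIs are assumptions made only for illustration.

```turtle
@prefix :     <http://example.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical shape: sh:maxCount 0 forbids any value of foaf:name,
# which plays the same role as negating "sh:minCount 1".
:NoFoafName a sh:NodeShape ;
  sh:property [
    sh:path foaf:name ;
    sh:maxCount 0
  ] .
```

The sh:not form remains the more general tool, since it can negate any shape and not just a cardinality constraint.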

5.11.5 COMBINING LOGICAL OPERATORS


It is possible to combine the previous logical operators to form more complex expressions.
IF-THEN pattern A typical pattern is to emulate an IF-THEN. Remember that IF x THEN y is equivalent to (NOT x) OR y.

Example 5.40 IF-THEN pattern in SHACL


The following shape declares that all products must have a schema:productID and if a
product has rdf:type schema:Vehicle then it must have the properties schema:vehicleEngine and
schema:fuelType. This example is the same as ShEx Example 4.58.

:ProductShape a sh:NodeShape ;
  sh:property [
    sh:path schema:productID ;
    sh:minCount 1 ; sh:maxCount 1
  ] ;
  sh:or (
    [ sh:not [
        sh:property [
          sh:path rdf:type ;
          sh:hasValue schema:Vehicle
        ] ]
    ]
    [ sh:property [
        sh:path schema:vehicleEngine ;
        sh:minCount 1 ; sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:fuelType ;
        sh:minCount 1 ; sh:maxCount 1
      ]
    ]
  ) .

Given the following data:

:p1 a :Book ;                    # V Passes as :ProductShape
  schema:productID "P1" .

:p2 a schema:Vehicle ;           # V Passes as :ProductShape
  schema:productID "P2" ;
  schema:fuelType "Gasoline" ;
  schema:vehicleEngine "X2" .

:p3 a schema:Vehicle ;           # X Fails as :ProductShape
  schema:productID "P3" .

A SHACL processor checks that :p1 and :p2 conform to :ProductShape but signals an error for :p3.

IF-THEN-ELSE pattern In the same way as before, an IF-THEN-ELSE can also be declared. Remember that IF A THEN B ELSE C is equivalent to (IF A THEN B) AND (IF NOT A THEN C).
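
Spelled out in propositional form, the equivalence reduces to a conjunction of two disjunctions, which is why a SHACL encoding of this pattern can be written with two sh:or constraints on the same shape. The derivation below only uses the IF-THEN rewriting recalled earlier and double negation:

```latex
\begin{align*}
\text{IF } A \text{ THEN } B &\equiv \lnot A \lor B \\
\text{IF } A \text{ THEN } B \text{ ELSE } C
  &\equiv (\text{IF } A \text{ THEN } B) \land (\text{IF } \lnot A \text{ THEN } C) \\
  &\equiv (\lnot A \lor B) \land (\lnot\lnot A \lor C) \\
  &\equiv (\lnot A \lor B) \land (A \lor C)
\end{align*}
```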

Example 5.41 IF-THEN-ELSE pattern in SHACL


The following example declares that if a product has rdf:type with value schema:Vehicle
then it must have schema:vehicleEngine and schema:fuelType, else it must have schema:category with
a xsd:string value. This example is equivalent to the ShEx example presented in Section 4.8.3.
:Product a sh:NodeShape ;
  sh:or (
    [ sh:not
        [ sh:path rdf:type ;
          sh:hasValue schema:Vehicle
        ]
    ]
    [ sh:and (
        [ sh:path schema:vehicleEngine ;
          sh:minCount 1 ;
          sh:maxCount 1
        ]
        [ sh:path schema:fuelType ;
          sh:minCount 1 ;
          sh:maxCount 1
        ]
      )
    ]
  ) ;
  sh:or (
    [ sh:path rdf:type ;
      sh:hasValue schema:Vehicle
    ]
    [ sh:path schema:category ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
    ]
  ) .

With the following data, nodes :kitt and :c23 conform to :Product, each one passing one of the branches, while :bad1 and :bad2 do not conform.

:kitt a schema:Vehicle ;         # V Passes as :Product
  schema:vehicleEngine :x42 ;
  schema:fuelType :electric .

:c23 a schema:Computer ;         # V Passes as :Product
  schema:category "Laptop" .

:bad1 a schema:Vehicle ;         # X Fails as :Product
  schema:fuelType :electric .

:bad2 a schema:Computer .        # X Fails as :Product

5.12 SHAPE-BASED CONSTRAINTS


sh:node specifies that the value nodes conform to a given shape.

Example 5.42 sh:node example


The following shapes graph declares that nodes of shape :UserShape have a property schema:worksFor whose values must conform to the shape :CompanyShape, and that nodes of shape :CompanyShape have a property schema:name whose values are strings.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:worksFor ;
    sh:node :CompanyShape
  ] .

:CompanyShape a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string
  ] .

Consider the following data:

:alice a :User ;                 # V Passes as :UserShape
  schema:worksFor :OurCompany .

:bob a :User ;                   # X Fails as :UserShape
  schema:worksFor :Another .

:OurCompany
  schema:name "OurCompany" .

:Another
  schema:name 23 .

This data would raise the following error:

• :bob does not conform to shape :UserShape because the value of property schema:worksFor does not conform to shape :CompanyShape. The reason is that the value of property schema:name does not have datatype xsd:string.

sh:property specifies that the values conform to a given property shape.
Although in most of the previous examples sh:property was pointing to blank nodes, it is possible (and even recommended) to use IRIs for property shapes.

Example 5.43 sh:property example


The following shapes graph declares that :UserShape nodes have a name and work for something that also has a name.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property :HasName ;
  sh:property [
    sh:path schema:worksFor ;
    sh:node :HasName
  ] .

:HasName sh:path schema:name ;
  sh:datatype xsd:string ;
  sh:minCount 1 ;
  sh:maxCount 1 .

Consider the following data:

:alice a :User ;                 # V Passes as :UserShape
  schema:name "Alice" ;
  schema:worksFor :OurCompany .

:bob a :User ;                   # X Fails as :UserShape
  schema:name "Robert" ;
  schema:worksFor :Another .

:carol a :User ;                 # X Fails as :UserShape
  schema:worksFor :OurCompany .

:OurCompany
  schema:name "OurCompany" .

:Another
  schema:name 23 .

A SHACL processor raises the following errors:

• :bob does not conform to shape :UserShape because the value of schema:worksFor (:Another) has 23 as schema:name, which does not have datatype xsd:string.

• :carol does not conform to shape :UserShape because it does not have a name.

5.12.1 SHAPE REFERENCES AND RECURSION


Declarations with shape references like sh:node and sh:property trigger validation of nodes against other shapes. In Example 5.42, validating :alice against :UserShape triggered validation of :OurCompany against :CompanyShape. This process can be problematic if there are cyclic dependencies between shapes.
Other predicates also introduce shape references implicitly, such as the logical predicates sh:and, sh:or, sh:not, and sh:xone (see Section 5.11) and the qualified value shapes (see Section 5.12.2).

Example 5.44 Simple cyclic data


Shape :User represents nodes that have one schema:name with an xsd:string value, an optional schema:birthDate with an xsd:date value, and zero or more values of schema:knows that conform to :User.

[Figure 5.2: Example of cyclic model. The shape :User has schema:name (xsd:string), an optional schema:birthDate (xsd:date ?), and a schema:knows link back to :User with cardinality 0..*.]

Given the following data, :alice and :bob conform to :User while :carol and :dave do not conform. :dave fails because the value of schema:name is not an xsd:string, and :carol fails because the value of schema:knows does not conform to :User.

:alice schema:name "Alice" ;
  schema:birthDate "1995-06-03"^^xsd:date ;
  schema:knows :bob .

:bob schema:name "Robert" .

:carol schema:name "Carol" ;
  schema:knows :dave .

:dave schema:name 23 .

A direct representation of the cyclic model in SHACL could be the following:

:User a sh:NodeShape ;           # Undefined shapes graph
  sh:property [                  # because :User refers to itself
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:knows ;
    sh:node :User
  ] .

The behavior of a SHACL processor with the :User shape is undefined because the shape is recursive: :User is defined in terms of itself. SHACL leaves validation with recursive shapes undefined, so it is up to processor implementations. Some processors may support it while others may produce an error.

Sometimes recursion appears indirectly when one shape refers to other shapes that refer to others, and eventually, one of the shapes refers to the first one.

Example 5.45 Cyclic data model with two shapes


Figure 5.3 contains a simple cyclic data model.

[Figure 5.3: Example of cyclic model. :User (schema:name xsd:string) is linked to :Company (schema:legalName xsd:string) by schema:worksFor with cardinality 0..*, and :Company is linked back to :User by schema:employee with cardinality 1..*.]

A direct representation of the data model could be the following SHACL shapes graph:

:User a sh:NodeShape ;           # Undefined shapes graph because :User and :Company
  sh:property [                  # refer to each other recursively
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:worksFor ;
    sh:node :Company
  ] .

:Company a sh:NodeShape ;
  sh:property [
    sh:path schema:legalName ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path schema:employee ;
    sh:minCount 1 ;
    sh:node :User
  ] .

The previous shapes are mutually recursive and, again, the behavior of SHACL processors is undefined.

Avoiding recursion using target declarations Target declarations can be used to avoid recursion by directly selecting which nodes we want to validate.

Example 5.46 Simulating recursion with targetClass


We can require that every node has a discriminating rdf:type declaration and replace
sh:node by sh:class, i.e., we declare that the values of schema:knows are instances of :User.

:User a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:knows ;
    sh:class :User
  ] .

Given the following data:

:alice a :User ;                 # V Passes as :User
  schema:name "Alice" ;
  schema:birthDate "1995-06-03"^^xsd:date ;
  schema:knows :bob .

:bob a :User ;                   # V Passes as :User
  schema:name "Robert" .

:carol a :User ;                 # V Passes as :User
  schema:name "Carol" ;          # Is it ok to pass?
  schema:knows :dave .

:dave a :User ;                  # X Fails as :User
  schema:name 23 .               # wrong value of schema:name

:emily a :User ;                 # X Fails as :User
  schema:name "Emily" ;          # wrong value of schema:knows
  schema:knows :frank .

A SHACL processor returns the following.


• It validates that :alice and :bob conform to :User.
• It returns a violation for :dave because the value of schema:name is not a xsd:string.
• It also returns a violation for :emily because the value of schema:knows is not an instance of
:User.

• It does not return a violation for :carol because our only requirement is that the value of
schema:knows is an instance of :User and :dave is declared to be an instance of :User (although
it does not validate).

This approach has the advantage that it not only finds instances of class :User, but also
instances of subclasses of :User. For example, if we declare:
:grace a :Teacher ;              # V Passes as :User
  schema:name "Grace" ;
  schema:knows :heidi .

:heidi a :Student ;              # V Passes as :User
  schema:name "Heidi" .

:Student rdfs:subClassOf :User .
:Teacher rdfs:subClassOf :User .

The system would check that both :grace and :heidi conform to the :User shape.
Being able to validate future subclasses of a given class may be helpful if there are unexpected changes in the hierarchy. Nevertheless, this approach requires a discriminating rdf:type declaration for every instance, which may not always be possible.
Another possibility is to use other target declarations such as sh:targetSubjectsOf or sh:targetObjectsOf.

Example 5.47 Simulating indirect recursion with target declarations


In order to simulate the model from Figure 5.3 we can declare that the subjects of schema:worksFor conform to :User and the objects to :Company, and the opposite for schema:employee.

:User a sh:NodeShape ;
  sh:targetSubjectsOf schema:worksFor ;
  sh:targetObjectsOf schema:employee ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

:Company a sh:NodeShape ;
  sh:targetSubjectsOf schema:employee ;
  sh:targetObjectsOf schema:worksFor ;
  sh:property [
    sh:path schema:legalName ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:employee
  ] .

Given the following data:

:alice schema:name "Alice" ;     # V Passes as :User
  schema:worksFor :OneCompany .

:bob schema:name "Robert" ;      # V Passes as :User
  schema:worksFor :OneCompany .

:carol schema:name 34 ;          # X Fails as :User
  schema:worksFor :Something .   # Wrong datatype for schema:name

:OneCompany schema:legalName "One" ;   # V Passes as :Company
  schema:employee :alice,
                  :bob,
                  :carol .

:Something a :Company ;          # X Fails as :Company
  schema:legalName 0 .           # Wrong datatype for schema:legalName

Simulating recursion with property paths SHACL property paths can be used to simulate recursion in some cases. The idea is to combine sh:zeroOrMorePath with an auxiliary shape that defines the structure of the expected shape without recursion. The recursion is implicitly defined by the property path.

Example 5.48 Simulating recursion with property paths


The model from Figure 5.2 (Example 5.44) can be defined without recursion as:

:User a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath schema:knows ] ;
    sh:node :UserStructure
  ] .

:UserStructure a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:maxCount 1
  ] .

Here, :UserStructure is a non-recursive auxiliary shape that defines the structure of nodes conforming to :User. Figure 5.4 depicts the new model.

[Figure 5.4: Simulating cyclic model with property paths. :User is linked by the path schema:knows* to :UserStructure, which has schema:name (xsd:string) and an optional schema:birthDate (xsd:date ?).]

Given the following data:

:alice schema:name "Alice" ;     # V Passes as :User
  schema:birthDate "1995-06-03"^^xsd:date ;
  schema:knows :bob .

:bob schema:name "Robert" .      # V Passes as :User

:carol schema:name "Carol" ;     # X Fails as :User
  schema:knows :dave .           # wrong value of schema:knows

:dave schema:name 23 .           # X Fails as :User
                                 # wrong value of schema:name

:emily schema:name "Emily" ;     # X Fails as :User
  schema:knows :frank .          # wrong value of schema:knows

A SHACL processor returns the following.

• It checks that :alice and :bob conform to :User.

• It returns violation errors for :carol, :dave and :emily. In this case, :carol fails to validate, as expected.

Indirect recursion is trickier to simulate, as it is difficult to determine the property path that can be used.

Example 5.49 Indirect recursion with property paths


The model from Figure 5.3 (Example 5.45) can also be simulated using a similar pattern. In this case, we use two non-recursive auxiliary shapes, :UserStructure and :CompanyStructure, that contain the plain properties. Shapes :User and :Company refer to them and capture recursion with property paths. The dependency from :User to :Company and back to :User is captured by the property path (schema:worksFor/schema:employee)*, and similarly the other way around.

[Figure 5.5: Simulating indirect recursion with property paths. :User is linked by (schema:worksFor/schema:employee)* to :UserStructure and by schema:worksFor to :CompanyStructure; :Company is linked by (schema:employee/schema:worksFor)* to :CompanyStructure and by schema:employee to :UserStructure.]

:User a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath ( schema:worksFor schema:employee ) ] ;
    sh:node :UserStructure
  ] ;
  sh:property [
    sh:path schema:worksFor ;
    sh:node :CompanyStructure
  ] .

:UserStructure a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ; sh:maxCount 1
  ] .

:Company a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath ( schema:employee schema:worksFor ) ] ;
    sh:node :CompanyStructure
  ] ;
  sh:property [
    sh:path schema:employee ;
    sh:node :UserStructure
  ] .

:CompanyStructure a sh:NodeShape ;
  sh:property [
    sh:path schema:legalName ;
    sh:datatype xsd:string ;
    sh:minCount 1 ; sh:maxCount 1
  ] .

The previous solution does not scale well to more involved data models where cycles can appear by different means. As an exercise, the reader can try to simulate the cyclic data model depicted in Figure 4.9 or the WebIndex data model from Figure 6.1.

5.12.2 QUALIFIED VALUE SHAPES


Qualified value shapes declare that a specified number of value nodes conform to some shape. The shape is declared by the sh:qualifiedValueShape parameter, and the parameters sh:qualifiedMinCount and sh:qualifiedMaxCount declare the minimum and maximum number of values that conform to that shape.
A typical use case for qualified value shapes is to model repeated properties whose values must conform to different shapes. For example, a data model may contain the property schema:parent to represent the biological parent of a person and may want to define that one of the values is male and the other is female.

Example 5.50 Qualified value shapes example


:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [
      sh:path :isMale ;
      sh:hasValue true
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1
  ] ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [
      sh:path :isFemale ;
      sh:hasValue true
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1
  ] .

Given the following data:

:alice a :User ;                 # V Passes as :UserShape
  schema:parent :bob, :carol .

:bob a :User ;                   # V Passes as :UserShape
  :isMale true ;
  schema:parent [ :isMale true ] ;
  schema:parent [ :isFemale true ] .

:carol :isFemale true .

:dave a :User ;                  # X Fails as :UserShape
  schema:parent :emily, :frank . # :emily does not have :isMale true

:emily a :User .                 # X Fails as :UserShape

:frank :isFemale true .

:gordon a :User ;                # V Passes as :UserShape
  schema:parent [ :isMale true ] ;
  schema:parent [ :isFemale true ] ;
  schema:parent :heidi .

A SHACL processor checks that :alice, :bob, and :gordon conform to :UserShape but returns the following errors.
• :dave does not conform to :UserShape because the number of values that satisfy the qualified value shape that checks that a parent is male is 0.
• :emily does not conform to :UserShape because it has no parents, so the number of values that satisfy each of the two qualified value shapes (male parent and female parent) is 0.
:gordon conforms to :UserShape but has three parents. If we want to further constrain the number of biological parents to be exactly two, we can add:

:UserShape sh:property [
  sh:path schema:parent ;
  sh:minCount 2 ;
  sh:maxCount 2
] .

In the shapes graph of Example 5.50, there is no constraint that declares that a biological parent must not be male and female at the same time. Consider the following data:

:oscar a :User ;                 # V Passes as :UserShape
  schema:parent :x .

:x :isMale true ;
  :isFemale true .

Node :oscar conforms to :UserShape, which seems counterintuitive, as it has a single parent that is both female and male at the same time. There are two solutions. The first one is to add the previous declaration of sh:minCount 2 and sh:maxCount 2 for property schema:parent. In this way, :oscar would not conform because it has only one parent. Another solution is to declare that the qualified value shapes are disjoint.
Qualified value shapes accept an optional Boolean parameter, sh:qualifiedValueShapesDisjoint. If it is true, then the value nodes must not conform to any of the sibling shapes. The default value is false.
Using this parameter, we could add the constraint that nodes that satisfy the female constraint are disjoint from nodes that satisfy the male constraint in the case of biological parents. (Notice that forcing this condition in general may not always be desirable in some contexts.)

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [
      sh:path :isMale ;
      sh:hasValue true
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
    sh:qualifiedValueShapesDisjoint true
  ] ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [
      sh:path :isFemale ;
      sh:hasValue true
    ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
    sh:qualifiedValueShapesDisjoint true
  ] .
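
As a rough illustration of the effect of the disjointness flag, the :oscar data from above can be revisited. This is a sketch under the assumption that the processor follows the sibling-shape semantics described in the text; the prefix IRIs are placeholders chosen for illustration.

```turtle
@prefix :       <http://example.org/> .
@prefix schema: <http://schema.org/> .

# With sh:qualifiedValueShapesDisjoint true on both property shapes,
# :oscar is expected to fail instead of pass.
:oscar a :User ;
  schema:parent :x .

# :x conforms to both qualified value shapes (male and female).
# Because each shape is a sibling of the other, :x is counted by
# neither of them, so both qualified minimum counts of 1 are missed.
:x :isMale true ;
  :isFemale true .
```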

5.13 CLOSED SHAPES


sh:closed can be used to specify the condition that nodes do not have triples with properties other than the ones that have been explicitly enumerated as a value of sh:path in any of the property shapes.
The value of sh:closed is a Boolean that only has effect if it is true (it is assumed to be false if not specified).
The parameter sh:ignoredProperties specifies a list of properties that are also permitted in addition to those enumerated by the value of sh:path in property shapes.

Table 5.9: Closed shapes

Parameter              Description
sh:closed              Valid resources must only have values for properties that appear as values of sh:path in property shapes.
sh:ignoredProperties   List of predicates that are also permitted in addition to those that are explicitly enumerated.

Example 5.51 Closed shapes


The following example declares that nodes conforming to :UserShape have exactly one property schema:name, are allowed to have extra values for property rdf:type, and must not have any other property.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] .

Given the following data:

:alice a :User ;                 # V Passes as :UserShape
  schema:name "Alice" .

:bob a :User,                    # V Passes as :UserShape
       :Person ;
  schema:name "Robert" .

:carol a :User ;                 # X Fails as :UserShape
  schema:name "Carol" ;
  schema:cookTime 23 .

A SHACL processor will check that both :alice and :bob conform to :UserShape but will return the error:
• :carol does not conform to :UserShape because it has an extra property schema:cookTime which is not allowed.

Note that sh:closed does not take into account SHACL property paths or constraints declared through sh:node, sh:and, sh:or, etc.

Example 5.52 Closed only accounts for top-level predicates


The following shape does not allow any property other than rdf:type and schema:name.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path [ sh:zeroOrOnePath schema:knows ] ;
    sh:nodeKind sh:IRI
  ] ;
  sh:node [
    sh:property [
      sh:path schema:worksFor ;
      sh:nodeKind sh:IRI
    ] ] .

Given the following data:

:alice a :User ;                 # V Passes as :UserShape
  schema:name "Alice" .

:bob a :User ;                   # X Fails as :UserShape
  schema:name "Robert" ;
  schema:knows :carol .

:carol a :User ;                 # X Fails as :UserShape
  schema:name "Carol" ;
  schema:worksFor :myCompany .

A SHACL processor:
• Checks that :alice conforms to :UserShape.
• Fails for nodes :bob and :carol because they use the properties schema:knows and schema:worksFor, which are not direct values of sh:path in the closed shape.
A solution is to add those predicates to the list of ignored properties:

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type
                         schema:knows
                         schema:worksFor ) ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path [ sh:zeroOrOnePath schema:knows ] ;
    sh:nodeKind sh:IRI
  ] ;
  sh:node [
    sh:property [
      sh:path schema:worksFor ;
      sh:nodeKind sh:IRI
    ] ] .

A piece of advice when using sh:closed is to enumerate all relevant properties as direct values of sh:path, or to add them to the sh:ignoredProperties list.
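
The first option in the advice above could be sketched as follows: instead of ignoring schema:knows and schema:worksFor, each one is declared as the direct value of sh:path in its own property shape. This is only an illustrative variant, and it is not strictly equivalent to the original shape, because a direct path no longer includes the focus node itself as sh:zeroOrOnePath does. The prefix IRIs are assumptions.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path schema:knows ;       # direct predicate path, visible to sh:closed
    sh:nodeKind sh:IRI
  ] ;
  sh:property [
    sh:path schema:worksFor ;    # moved out of the nested sh:node shape
    sh:nodeKind sh:IRI
  ] .
```

With this variant the closed shape still constrains the values of both properties, whereas listing them in sh:ignoredProperties only stops sh:closed from rejecting them.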
5.14 PROPERTY PAIR CONSTRAINTS
Property pair constraints specify conditions in relation to other properties. These constraint components can only be used in property shapes. Table 5.10 lists the parameters that can be used to declare property pair constraints. All the predicates behave similarly: they compare pairs of values of the current and the referenced property on the current focus node and check the condition.

Table 5.10: Property pair constraints

Operation              Description
sh:equals              The sets of values from both properties at a given focus node must be equal.
sh:disjoint            The sets of values from both properties at a given focus node must be disjoint, i.e., they must not share any value.
sh:lessThan            Each current value must be smaller than every value of another property.
sh:lessThanOrEquals    Each current value must be smaller than or equal to every value of another property.

Example 5.53 Equality constraints example


:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:givenName ;
    sh:equals foaf:firstName
  ] ;
  sh:property [
    sh:path schema:givenName ;
    sh:disjoint schema:lastName
  ] .

:alice a :User ;                 # Passes as :UserShape
  schema:givenName "Alice" ;
  schema:lastName "Cooper" ;
  foaf:firstName "Alice" .

:bob a :User ;                   # Fails as :UserShape
  schema:givenName "Bob" ;
  schema:lastName "Smith" ;
  foaf:firstName "Robert" .

:carol a :User ;                 # Fails as :UserShape
  schema:givenName "Carol" ;
  schema:lastName "Carol" ;
  foaf:firstName "Carol" .

A SHACL processor checks that :alice conforms to :UserShape and returns the following
errors.
• :bob has different values for foaf:firstName and schema:givenName.
• :carol has the same value for schema:givenName and schema:lastName when they should be
different.

Example 5.54 Value comparison example


The following example declares a :ConcertShape with three properties: schema:doorTime,
schema:startDate, and schema:endDate, whose values must have datatype xsd:dateTime. It also
establishes the conditions that door time must be less than or equal to start date, and start date
must be before end date.
:ConcertShape a sh:NodeShape ;
  sh:targetClass :Concert ;
  sh:property [
    sh:path schema:doorTime ;
    sh:datatype xsd:dateTime ;
    sh:lessThanOrEquals schema:startDate ;
  ] ;
  sh:property [
    sh:path schema:startDate ;
    sh:datatype xsd:dateTime ;
    sh:lessThan schema:endDate
  ] ;
  sh:property [
    sh:path schema:endDate ;
    sh:datatype xsd:dateTime ;
  ] .

Given the following data:


:concert1 a :Concert ;           # Passes as :ConcertShape
  schema:doorTime "2017-04-20T20:00:00"^^xsd:dateTime ;
  schema:startDate "2017-04-20T21:30:00"^^xsd:dateTime ;
  schema:endDate "2017-04-20T23:00:00"^^xsd:dateTime ;
  .

:concert2 a :Concert ;           # Fails as :ConcertShape
  schema:doorTime "2018-04-20T20:00:00"^^xsd:dateTime ;
  schema:startDate "2017-04-20T21:00:00"^^xsd:dateTime ;
  schema:endDate "2017-04-20T21:00:00"^^xsd:dateTime ;
  .

A SHACL processor checks that :concert1 conforms to :ConcertShape and reports the fol-
lowing.
• The value of schema:doorTime must be less than or equal to the value of schema:startDate in
:concert2.

• The value of schema:startDate must be less than the value of schema:endDate in :concert2.
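
The four property pair constraints can be read as comparisons between two sets of values. The following Python sketch is a simplified illustration under that reading. The function names and the dictionary layout are hypothetical, and the handling of values that cannot be compared (which a SHACL processor must also report) is left out.

```python
from datetime import datetime

def equals(values, others):
    """sh:equals: both value sets are the same."""
    return set(values) == set(others)

def disjoint(values, others):
    """sh:disjoint: the value sets share no element."""
    return set(values).isdisjoint(others)

def less_than(values, others):
    """sh:lessThan: every value is smaller than every other value."""
    return all(v < o for v in values for o in others)

def less_than_or_equals(values, others):
    """sh:lessThanOrEquals: every value is at most every other value."""
    return all(v <= o for v in values for o in others)

def t(text):
    return datetime.fromisoformat(text)

# Values transcribed from Example 5.54.
concert1 = {"door": [t("2017-04-20T20:00:00")],
            "start": [t("2017-04-20T21:30:00")],
            "end": [t("2017-04-20T23:00:00")]}
concert2 = {"door": [t("2018-04-20T20:00:00")],
            "start": [t("2017-04-20T21:00:00")],
            "end": [t("2017-04-20T21:00:00")]}

def check_concert(concert):
    errors = []
    if not less_than_or_equals(concert["door"], concert["start"]):
        errors.append("doorTime must be <= startDate")
    if not less_than(concert["start"], concert["end"]):
        errors.append("startDate must be < endDate")
    return errors

print(check_concert(concert1))
print(check_concert(concert2))
```

Under these assumptions :concert1 yields no errors, while :concert2 yields both, matching the two results listed above.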

5.15 NON-VALIDATING SHACL PROPERTIES


SHACL introduces several properties that are not intended for validation and are ignored dur-
ing the validation process. These properties are intended for documentation or declarative User
Interface (form) building and are listed in Table 5.11.

Table 5.11: Non-validating SHACL properties

Property          Description
sh:name           Specifies human-readable labels for a property shape.
sh:description    Specifies a description of a property shape.
sh:order          Indicates the relative order of a property shape in a form. A
                  typical use case is to display the property shapes sorted
                  according to the values of sh:order. The values must be decimals.
sh:group          Groups several property shapes together. Each group may have
                  additional triples for different purposes, like rdfs:label for
                  form building. Groups can also have a sh:order value.
sh:defaultValue   Describes the default value for a property. This value may be
                  used by form builders to pre-populate input fields.

Example 5.55 Non-validating SHACL properties


The following example declares a :UserShape with several property shapes.
:UserShape a sh:NodeShape ;
  sh:property [
    sh:path schema:familyName ;
    sh:name "Family name" ;
    sh:description "Family name. In the U.S., the last name of a Person" ;
    sh:order 2 ;
    sh:group :nameGroup
  ] ;
  sh:property [
    sh:path schema:givenName ;
    sh:name "Given name" ;
    sh:description "Given name. In the U.S., the first name of a Person" ;
    sh:order 1 ;
    sh:group :nameGroup ;
  ] ;
  sh:property [
    sh:path schema:streetAddress ;
    sh:name "Street address" ;
    sh:order 5 ;
    sh:group :addressGroup
  ] ;
  sh:property [
    sh:path schema:addressCountry ;
    sh:name "Country" ;
    sh:defaultValue "Spain" ;
    sh:order 6 ;
    sh:group :addressGroup
  ] .

:nameGroup a sh:PropertyGroup ;
  rdfs:label "Name" .

:addressGroup a sh:PropertyGroup ;
  rdfs:label "Address" .

An application could generate a web form like the one in Figure 5.6.

Name
Given name:
Family name:

Address
Street address:
Country: Spain

Figure 5.6: Possible form generated from a shapes graph.
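
As a rough illustration of how a form builder might use these properties, the following Python sketch sorts the property shapes by sh:order and prints them under their group labels. The tuples are transcribed by hand from Example 5.55 (with the labels as they appear in Figure 5.6); render_form is a hypothetical helper, and the sketch assumes that the members of each group are contiguous once sorted.

```python
from itertools import groupby

# (group label, sh:order, sh:name, sh:defaultValue)
fields = [
    ("Name", 2, "Family name", None),
    ("Name", 1, "Given name", None),
    ("Address", 5, "Street address", None),
    ("Address", 6, "Country", "Spain"),
]

def render_form(fields):
    """Return the lines of a text form, ordered by sh:order."""
    ordered = sorted(fields, key=lambda f: f[1])
    lines = []
    for group, members in groupby(ordered, key=lambda f: f[0]):
        lines.append(group)
        for _, _, name, default in members:
            # Pre-populate the field when a default value is declared.
            lines.append(f"  {name}: {default or ''}".rstrip())
    return lines

print("\n".join(render_form(fields)))
```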


5.16 SHACL-SPARQL
The SHACL recommendation is divided into two parts: SHACL Core and SHACL-SPARQL.
SHACL Core (discussed so far in this chapter) was designed so that it can be implemented
without an underlying SPARQL processor. It contains the constraint components that were
considered most frequently needed. However, some use cases require extra features to express
more complex constraints. To that end, SHACL-SPARQL provides an extension mechanism
that enables the definition of other constraints using SPARQL.
SHACL Core processors are not required to support SHACL-SPARQL. However,
SHACL-SPARQL processors must support SHACL Core.
A working group note has also proposed to define a similar extension mechanism using
Javascript.25

5.16.1 SPARQL CONSTRAINTS


sh:sparql associates a shape with a SPARQL-based constraint that declares the SPARQL query
to evaluate.
SPARQL-based constraints are nodes of type sh:SPARQLConstraint that can have the
following properties.
• sh:message: A human-readable message explaining the cause of the violation.
• sh:select: Contains a string with the SPARQL query. The query can refer to the special
variable $this, which is bound to the focus node before the query is executed. SHACL
processors may also bind the variables $shapesGraph and $currentShape to the current
shapes graph and shape. In the case of property shapes, the variable $PATH acts as a
placeholder for the path used by the property shape. A validation result (see Section 5.5)
is generated for each solution of the SPARQL query.
• sh:prefixes: Points to namespace prefix declarations. Each prefix declaration is a value of
sh:declare and has two properties: sh:prefix, the prefix alias, and sh:namespace, the
namespace, which must be a literal of type xsd:anyURI.

Example 5.56 SPARQL constraint example


The following shape declares that nodes conforming to :UserShape have the constraint that
schema:name must be equal to the concatenation of schema:givenName and schema:familyName.

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:sparql [
    a sh:SPARQLConstraint ;
    sh:message "schema:name must equal schema:givenName + schema:familyName" ;
    sh:prefixes [
      sh:declare [
        sh:prefix "schema" ;
        sh:namespace "http://schema.org/"^^xsd:anyURI ;
      ]
    ] ;
    sh:select
      """SELECT $this (schema:name AS ?path) (?name AS ?value)
         WHERE {
           $this schema:name ?name .
           $this schema:givenName ?givenName .
           $this schema:familyName ?familyName .
           # ?name is tested here: ?value is only bound by the SELECT clause
           FILTER (!isLiteral(?name) ||
                   !isLiteral(?givenName) ||
                   !isLiteral(?familyName) ||
                   concat(str(?givenName), ' ', str(?familyName)) != ?name
           )
         }""" ;
  ] .

25 http://W3C.github.io/data-shapes/shacl-js/

Given the following data:


:alice a :User ;                 # Passes as :UserShape
  schema:givenName "Alice" ;
  schema:familyName "Cooper" ;
  schema:name "Alice Cooper" .

:bob a :User ;                   # Fails as :UserShape
  schema:givenName "Bob" ;
  schema:familyName "Smith" ;
  schema:name "Robert Smith" .

A SHACL processor checks that :alice conforms to :UserShape and returns the error:

• :bob does not conform to :UserShape because values of schema:name must be equal to the
concatenation of schema:givenName and schema:familyName.

5.16.2 SPARQL-BASED CONSTRAINT COMPONENTS


SHACL-SPARQL also contains the possibility to declare reusable constraint components.
Once defined, they can be used just like the other built-in SHACL Core components, without
the need to write SPARQL.
SHACL constraint components are defined by declaring the list of parameters and asso-
ciating them with validators. Those validators are usually declared in SPARQL, although there
is a WG note for allowing Javascript-based validations (see section 5.20).
The properties that can be used to define constraint components are the following.

• sh:parameter associates a parameter declaration with the constraint component. The dec-
laration has a value for sh:path that must be an IRI and may have a Boolean value for
sh:optional (if not present, it is assumed false by default).

The local name of the IRI associated by sh:path will be taken as the local name for the
parameter. For example, in the following parameter declaration:
sh:parameter [
  sh:path :listOfLength ;
  sh:optional true ;
]

The local name of the parameter is listOfLength and is used as a SPARQL (or Javascript)
variable that is prebound to the component parameter value.

• sh:labelTemplate can be used to specify how the constraint will be rendered. The value is a
string that can contain references to parameter names inside curly brackets. For example:
"Checks the list has {?listOfLength} values".

• sh:validator associates a validator with the constraint component. In SHACL-SPARQL,
there are two types of validators: SELECT-based and ASK-based.
SELECT-based validators are introduced by sh:nodeValidator or sh:propertyValidator (de-
pending on whether they are declared for a node or a property shape). They have one value
for the property sh:select that is a string containing a SPARQL select query. SHACL
processors prebind the variable $this in the SELECT clause and the variable $PATH in
case of property shapes. Each solution of the SPARQL select query will be reported as a
violation.
ASK-based validators are introduced by the property sh:validator and are executed for
each value node. If the result of the ASK query is true, then the value node conforms to
the shape. Notice that ASK-based validators work in the opposite direction to SELECT-
based ones. While SELECT-based validators return no results to indicate conformance,
ASK-based validators return true to indicate conformance.
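
The opposite polarity of the two kinds of validators can be illustrated with a small Python sketch. It does not run SPARQL: the two lambda functions are hypothetical stand-ins for a SELECT query (which returns one row per problem) and an ASK query (which returns true for a conforming value), both encoding the made-up condition "the value must be even".

```python
def validate_select(value_nodes, select_validator):
    """SELECT style: every row returned by the validator is a violation."""
    return [row for value in value_nodes for row in select_validator(value)]

def validate_ask(value_nodes, ask_validator):
    """ASK style: a value is a violation when the validator answers False."""
    return [value for value in value_nodes if not ask_validator(value)]

# The same condition expressed in both styles.
select_odd_rows = lambda value: [value] if value % 2 else []   # rows = problems
ask_is_even = lambda value: value % 2 == 0                     # True = conforms

values = [2, 3, 4, 5]
print(validate_select(values, select_odd_rows))
print(validate_ask(values, ask_is_even))
```

Both calls report the same violations (3 and 5), even though one validator answers "what is wrong" and the other answers "is this right".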

Example 5.57 SPARQL constraint component example


The following code declares a SPARQL constraint component that checks that an RDF
list given as a value has a fixed length.

:FixedListConstraintComponent
  a sh:ConstraintComponent ;
  rdfs:label "Fixed list constraint component" ;
  sh:parameter [
    sh:path :size ;
    sh:name "Size of list" ;
    sh:description "The size of the list" ;
  ] ;
  sh:labelTemplate "Size of values: \"{$size}\"" ;
  sh:propertyValidator [
    a sh:SPARQLSelectValidator ;
    sh:message "{$PATH} must have length {?size}, not {?count}" ;
    sh:prefixes [
      sh:declare [
        sh:prefix "rdf" ;
        sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#"^^xsd:anyURI
      ]
    ] ;
    sh:select """
      SELECT $this ?value $count WHERE {
        $this $PATH ?value .
        { { SELECT $this ?value (COUNT(?member) AS ?count) $size WHERE {
              ?value rdf:rest*/rdf:first ?member
            } GROUP BY $this ?value $size
          }
          FILTER (!isBlank(?value) || ?count != $size)
        }
      }"""
  ] .

A property shape can be declared as:


:ProductShape a sh:NodeShape ;
  sh:targetClass :Product ;
  sh:property [
    sh:path :color ;
    :size 3 ;
    sh:minCount 1
  ] .

Given the following data:


:p1 a :Product ;                 # Passes as :Product
  :color (255 0 255) .

:p2 a :Product ;                 # Fails as :Product
  :knows ( :x :y ) ;
  :color (255 0 210 345) .

:p3 a :Product ;                 # Fails as :Product
  :color 3 .

:p4 a :Product .                 # Fails as :Product

A SHACL processor would validate that :p1 conforms to :Product but would report the
following errors.

• For :p2 the error message ":color must have length 3, not 4".

• For :p3 the error message ":color must have length 3, not 0".

• For :p4 the message that sh:minCount failed because there are no values for property :color.
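
To make the counting step more concrete, the following Python sketch walks an RDF collection through its rdf:rest and rdf:first links, in the spirit of the rdf:rest*/rdf:first path used in the validator. It is a simplified illustration: triples are plain tuples, blank node labels are invented, the isBlank test of the query is not reproduced, and rdf_list, list_length, and check_size are hypothetical helpers.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def rdf_list(prefix, items):
    """Encode items as an RDF collection; return (head node, triples)."""
    if not items:
        return RDF_NIL, []
    nodes = [f"_:{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF_FIRST, item))
        nxt = nodes[i + 1] if i + 1 < len(items) else RDF_NIL
        triples.append((nodes[i], RDF_REST, nxt))
    return nodes[0], triples

def list_length(node, triples):
    """Count the list cells reachable from node through rdf:rest."""
    has_first = {s for s, p, _ in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    count, seen = 0, set()
    while node in has_first and node not in seen:
        seen.add(node)          # guards against cyclic lists
        count += 1
        node = rest.get(node, RDF_NIL)
    return count

def check_size(value, triples, size, path=":color"):
    """Return a violation message, or None when the length matches."""
    count = list_length(value, triples)
    if count != size:
        return f"{path} must have length {size}, not {count}"
    return None

p1_head, p1_triples = rdf_list("p1", [255, 0, 255])
p2_head, p2_triples = rdf_list("p2", [255, 0, 210, 345])

print(check_size(p1_head, p1_triples, 3))
print(check_size(p2_head, p2_triples, 3))
print(check_size(3, [], 3))   # :p3 uses the literal 3, which is not a list
```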

Notice that the following example, although similar in effect, is not a valid shape defini-
tion:
:ProductShape a sh:NodeShape ;
  sh:targetObjectsOf :color ;
  :size 3 .

The reason is that a sh:nodeValidator is not defined for the :FixedListConstraintComponent
and a SHACL-SPARQL processor does not know how to execute the :size parameter in this
context. A solution would be to define both a property and a node validator, or define one ASK-
based validator.

5.17 SHACL AND INFERENCE SYSTEMS


SHACL uses some parts of the RDF Schema and OWL vocabularies, but full RDF Schema or
OWL inference is not required.
SHACL processors may support different entailment regimes which are defined in the
same way as for SPARQL. An entailment regime is identified by an IRI and defines the kind
of inference a processor will do to the data graph. A shapes graph that contains a triple with
predicate sh:entailment and value E indicates that it requires entailment E. If a SHACL processor
does not support entailment E, it will return an error.
Some values for the property sh:entailment are described in Table 5.12.

Table 5.12: Some entailment regimes

IRI                                            Name
http://www.w3.org/ns/entailment/RDF            RDF
http://www.w3.org/ns/entailment/RDFS           RDF Schema
http://www.w3.org/ns/entailment/OWL-Direct     OWL 2 direct semantics
Example 5.58 Example with entailment
The following shapes graph declares a :Teacher shape as someone that has property :teaches
with a value that is an instance of :Course and has rdf:type with value :Person. It also requires RDF
Schema entailment.

<> sh:entailment <http://www.w3.org/ns/entailment/RDFS> .

:Teacher a sh:NodeShape, rdfs:Class ;
  sh:property [
    sh:path :teaches ;
    sh:class :Course ;
    sh:minCount 1
  ] ;
  sh:property [
    sh:path rdf:type ;
    sh:qualifiedValueShape [
      sh:hasValue :Person ;
    ] ;
    sh:minCount 1
  ] .

Given the following data:

:alice a :Teacher, :Person ;   # Passes as :Teacher with RDFS entailment
  :teaches :algebra .          # Passes as :Teacher without RDFS entailment

:bob a :Teacher ;              # Passes as :Teacher with RDFS entailment
  :teaches :logic .            # Fails as :Teacher without RDFS entailment

:carol a :Teacher ;            # Passes as :Teacher with RDFS entailment
  :teaches :algebra .          # Passes as :Teacher without RDFS entailment
                               # It uses SHACL instances

:algebra a :Course .

:teaches rdfs:range :Course .
:teaches rdfs:domain :Teacher .
:Teacher rdfs:subClassOf :Person .

• :alice conforms to :Teacher with or without RDFS entailment, because it has rdf:type
:Person and it :teaches :algebra, and :algebra has rdf:type :Course.

• :bob only conforms if RDF Schema entailment is performed, because it infers that it has
rdf:type :Person and that :logic rdf:type :Course. Without RDF Schema entailment it fails.
• :carol conforms to :Teacher even without RDF Schema entailment activated and even if it
does not have rdf:type :Person. The reason is that it is a SHACL instance of :Person (see
Section 5.7.2).

Although SHACL does not require inference, it has a special treatment for the properties
rdfs:subClassOf, rdf:type and owl:imports.
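
The effect of RDF Schema entailment on the data of Example 5.58 can be sketched with a small fixpoint computation in Python. This is an illustration only: it applies just three RDFS rules (rdfs:domain, rdfs:range, and rdfs:subClassOf on rdf:type), triples are plain tuples, and rdfs_closure is a hypothetical helper rather than part of any SHACL processor.

```python
def rdfs_closure(triples):
    """Apply rdfs:domain, rdfs:range and rdfs:subClassOf until nothing changes."""
    graph = set(triples)
    while True:
        domain = {(s, o) for s, p, o in graph if p == "rdfs:domain"}
        range_ = {(s, o) for s, p, o in graph if p == "rdfs:range"}
        sub = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in graph:
            new |= {(s, "rdf:type", c) for prop, c in domain if prop == p}
            new |= {(o, "rdf:type", c) for prop, c in range_ if prop == p}
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for cls, sup in sub if cls == o}
        if new <= graph:
            return graph
        graph |= new

data = {
    (":bob", "rdf:type", ":Teacher"),
    (":bob", ":teaches", ":logic"),
    (":teaches", "rdfs:range", ":Course"),
    (":teaches", "rdfs:domain", ":Teacher"),
    (":Teacher", "rdfs:subClassOf", ":Person"),
}

inferred = rdfs_closure(data) - data
print(sorted(inferred))
```

Under these assumptions the inferred triples state that :bob has rdf:type :Person and that :logic has rdf:type :Course, which are exactly the facts :bob needs in order to conform.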

5.18 SHACL COMPACT SYNTAX


A SHACL compact syntax has been suggested for a subset of SHACL inspired by ShEx com-
pact syntax.26 Although it was not published as a working group note, it is expected that further
development will be done in the context of the W3C SHACL community group.
Given the temporary status, the following description may differ from the final compact
syntax that is published.

Example 5.59 SHACL example using compact syntax


Example 5.1 could be written in SHACL compact syntax as:
:UserShape -> :User {
  IRI .
  schema:name xsd:string [1..1] .
  schema:gender in=[ schema:Male schema:Female ] [1..1] .
  schema:birthDate xsd:date [0..1] .
  schema:knows :User
}

The operator -> declares a sh:targetClass and the dot operator . separates the different
constraint components.

5.19 SHACL RULES AND ADVANCED FEATURES


The Data Shapes Working Group published a note called SHACL Advanced Features.27 It
defines the following language constructs.
• SPARQL-based targets provide a vocabulary that extends the ways targets can be declared
in SHACL Core. Two types of targets are defined: (a) sh:SPARQLTarget, which provides a
SPARQL query directly in the target definition; and (b) sh:SPARQLTargetType, which provides
a mechanism similar to constraint components for parametrizable targets.
• Annotation properties provide an injection mechanism where users can pass static or
dynamic annotations from the shape definitions to the validation results.
26 http://W3C.github.io/data-shapes/shacl-compact-syntax/
27 https://www.w3.org/TR/shacl-af/
• Functions provide a vocabulary to define SPARQL functions that can be reused in the
SELECT or ASK based validators (see Section 5.16.2).

• Node expressions are a set of predefined functions that can be used to compute values from
focus nodes, e.g., compute a display label for an IRI.

• Constraint expressions extend the node expressions for validation purposes.

• Rules provide a light-weight RDF inferencing mechanism based on SHACL shapes.


At the time of this writing, there are no implementation reports for the advanced SHACL
features. We showcase the SPARQL-based targets and the SHACL rules with examples and
point the reader to the working group note for further reference.

Example 5.60 SPARQL-based target declarations


The following shape declares a target that selects only instances of :Teacher that teach
algebra.
:AlgebraTeacher a sh:NodeShape ;
  sh:target [
    a sh:SPARQLTarget ;
    sh:prefixes [
      sh:declare [
        sh:prefix "" ;    # empty alias, so that ":" can be used in the query
        sh:namespace "http://example.org/"^^xsd:anyURI ;
      ]
    ] ;
    sh:select """SELECT ?this WHERE {
        ?this a :Teacher .
        ?this :teaches :algebra .
      }""" ;
  ] ;
  sh:property [
    sh:path :field ;
    sh:hasValue :Mathematics ;
  ] .

Given the following data:


:alice a :Teacher ;              # Passes as :AlgebraTeacher
  :teaches :algebra ;
  :field :Mathematics .

:bob a :Teacher ;                # Fails as :AlgebraTeacher
  :teaches :algebra .            # No value for :field

:carol a :Teacher ;              # Ignored
  :teaches :logic .

A SHACL processor with SPARQL-based target support checks that :alice conforms to
shape :AlgebraTeacher and reports the following.
• :bob does not have value :Mathematics for the :field property.

• :carol is ignored although it does not have a :field property, because it is not selected by
the SPARQL target.
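
The two-step behavior (first select the target nodes, then validate only those) can be sketched in Python as follows. The dictionary layout and the functions target and violations are hypothetical. They mimic the SPARQL target and the sh:hasValue constraint of Example 5.60 without running any SPARQL.

```python
data = {
    ":alice": {"rdf:type": {":Teacher"}, ":teaches": {":algebra"},
               ":field": {":Mathematics"}},
    ":bob": {"rdf:type": {":Teacher"}, ":teaches": {":algebra"}},
    ":carol": {"rdf:type": {":Teacher"}, ":teaches": {":logic"}},
}

def target(graph):
    """Stand-in for the SPARQL target: teachers that teach :algebra."""
    return sorted(
        node for node, props in graph.items()
        if ":Teacher" in props.get("rdf:type", set())
        and ":algebra" in props.get(":teaches", set())
    )

def violations(graph):
    """Stand-in for sh:hasValue :Mathematics on :field, on target nodes only."""
    return [node for node in target(graph)
            if ":Mathematics" not in graph[node].get(":field", set())]

print(target(data))
print(violations(data))
```

Only :alice and :bob are selected, so :carol is never checked, and :bob is the only node reported.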

Example 5.61 SHACL rules example


The following shape defines a rule that states that users with a value for the property
:teaches are instances of :Teacher.

:User a sh:NodeShape ;
  sh:targetClass :User ;
  sh:rule [
    a sh:TripleRule ;
    sh:subject sh:this ;
    sh:predicate rdf:type ;
    sh:object :Teacher ;
    sh:condition [
      sh:property [
        sh:path :teaches ;
        sh:minCount 1 ;
      ] ;
    ] ;
  ] .

Given the following data:


:alice a :User ;
  :teaches :algebra .

:bob a :User ;
  :teaches :logic .

:carol a :User ;
  :attends :algebra .

A SHACL rules engine will infer the following RDF triples:


:alice a :Teacher .
:bob a :Teacher .

• :carol does not get an inferred triple because it does not have a value for :teaches.
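
The inference performed by this rule can be mimicked with a few lines of Python. This is a sketch under simplifying assumptions: triples are plain tuples and apply_teacher_rule is a hypothetical function hard-wired to the rule of Example 5.61, not a general SHACL rules engine.

```python
def apply_teacher_rule(triples):
    """Infer rdf:type :Teacher for every :User with at least one :teaches value."""
    users = {s for s, p, o in triples if p == "rdf:type" and o == ":User"}
    teaching = {s for s, p, _ in triples if p == ":teaches"}   # sh:minCount 1
    return {(s, "rdf:type", ":Teacher") for s in users & teaching}

data = {
    (":alice", "rdf:type", ":User"), (":alice", ":teaches", ":algebra"),
    (":bob", "rdf:type", ":User"), (":bob", ":teaches", ":logic"),
    (":carol", "rdf:type", ":User"), (":carol", ":attends", ":algebra"),
}

print(sorted(apply_teacher_rule(data)))
```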
5.20 SHACL JAVASCRIPT
SHACL Javascript (SHACL-JS)28 was published as a Working Group Note to enable the
definition of constraint components in Javascript. It is also intended to express advanced
features like custom targets, functions, and rules in Javascript.
SHACL-JS is similar to SHACL-SPARQL but for Javascript instead of SPARQL. The
basic idea is that shapes can point to JavaScript functions available at some URL that can be
resolved from the Web. When shapes get evaluated, a SHACL-JS engine calls those functions
and constructs validation results from the results obtained by these calls. The Javascript code can
access the RDF triples available in the data and shapes graphs through a Javascript API.
Note that at the time of this writing, there are no implementation reports for SHACL JS
(the following code is speculative).

Example 5.62 Javascript-based constraint example


Assuming the following Javascript function is defined in http://example.org/numberFunctions:

function isOddNumber($value) {
  if ($value.isLiteral()) {
    // .lex holds the lexical form of the literal
    return $value.lex % 2 == 1;
  } else {
    return false;
  }
}

and given the following shape:


:VotingCommittee a sh:NodeShape ;
  sh:targetClass :VotingCommittee ;
  sh:property [
    sh:path :numberOfVoters ;
    sh:js [
      a sh:JSConstraint ;
      sh:message "Number of voters must be odd to avoid ties" ;
      sh:jsLibrary [
        sh:jsLibraryURL "http://example.org/numberFunctions"
      ] ;
      sh:jsFunctionName "isOddNumber" ;
    ]
  ] .

With the following data:


:JuryCommittee a :VotingCommittee ;   # Passes as :VotingCommittee
  :numberOfVoters 7 .

:CityCommittee a :VotingCommittee ;   # Fails as :VotingCommittee
  :numberOfVoters 8 .

28 https://www.w3.org/TR/shacl-js/
A SHACL-JS processor:
• checks that :JuryCommittee conforms to :VotingCommittee; and
• returns a violation error for :CityCommittee with the message "Number of voters must be odd
to avoid ties."

5.21 SUMMARY
• SHACL is divided into two parts: SHACL Core and SHACL-SPARQL.
• Shapes in SHACL contain the notion of target declarations, which declare the sets of nodes
to which they apply.
• There are two types of shapes: node and property shapes.
• Shapes contain a list of parameters of constraint components.
• SHACL-SPARQL allows users to define their own constraint components.
• Some SHACL extensions have already been proposed, like SHACL rules and SHACL
Javascript.

5.22 SUGGESTED READING


• H. Knublauch and D. Kontokostas. Shapes Constraint Language (SHACL). W3C Proposed
Recommendation, June 2017. https://www.w3.org/TR/shacl/
• S. Steyskal and K. Coyle. SHACL Use Cases and Requirements. W3C Working Draft,
2016. https://www.w3.org/TR/shacl-ucr/
• K. Cagle. SHACL: It’s about time. https://dzone.com/articles/its-about-time,
March 2017.
CHAPTER 6

Applications
In this chapter we describe several applications of RDF validation. We start with the WebIndex,
a medium-size linked data portal that was one of the earliest applications of ShEx. We describe
it using ShEx and SHACL so the reader can see how both formalisms can be applied to describe
RDF data.
In Section 6.2, we present the use of ShEx in HL7 FHIR, which was one of the main
motivations for the development of ShEx.
Section 6.3 describes Springer Nature SciGraph, a real-world application of SHACL.
Section 6.4 talks about validation use cases that have emerged in the DBpedia project.
We end the chapter with two exercises: the validation of ShEx files, encoded as RDF using
ShEx itself (Section 6.5), and the validation of SHACL shapes graphs in RDF using SHACL
(Section 6.6). These exercises help us understand the expressiveness of both formalisms.

6.1 DESCRIBING A LINKED DATA PORTAL


Linked data portals have emerged as a way to publish data on the Web in accordance with
principles that improve data reuse and integration. As discussed in Section 1.1, linked data uses
RDF to make statements that establish relationships between arbitrary things. In this section,
we consider one of the earliest practical applications of ShEx, the description of a real linked
data portal, the WebIndex, and its data model. Some contents of this section have been taken
from [58], where we also compare the performance of two early implementations of
ShEx and SHACL.
The WebIndex is a multi-dimensional measure of the World Wide Web’s contribution to
development and human rights globally. In its latest edition (from 2014), it covers 81 countries
and incorporates indicators that assess several areas such as universal access; freedom and open-
ness; relevant content; and empowerment. Its first version provided a data portal where the data
was obtained by transforming raw observations and precomputed values from Excel sheets into
RDF. The second version added an approach to validation and computation that resulted in a
verifiable version of the index data.
The WebIndex data model is based on the RDF Data Cube vocabulary [24] and reuses
several vocabularies such as Organization ontology [83] and Dublin Core [10].
Figure 6.1 shows the main concepts of the data model. The boxes represent the different
shapes of nodes that are published in the data portal.
[Figure 6.1 (diagram not reproduced): boxes for the shapes :DataSet, :Slice, :Organization,
:Observation, :Country, :Indicator, and :Computation, each listing its properties, connected by
arrows labeled qb:slice, qb:observation, qb:dataSet, dct:publisher, wf:provider, cex:indicator,
cex:ref-area, and cex:computation.]

Figure 6.1: Simplified WebIndex data model.

The main concept is an observation of type wf:Observation which has a float value cex:value
for a given indicator, as well as the country, year, and dataset. Observations can be raw observa-
tions, which are obtained from an external source, or computed observations, which are obtained
from other observations by computational processes.
A dataset contains a number of slices, each of which also contains a number of observa-
tions. Indicators are provided by an organization of type org:Organization, which is based on the
Organization ontology.
Datasets are also published by organizations.
A sample from the DITU dataset provided by ITU (International Telecommunication
Union) states that, in 2011, Spain had a value of 23.78 for the ITU_B (Broadband subscribers per
100 population) indicator. This information is represented in Turtle as:
:obs8165
  a qb:Observation, wf:Observation ;
  rdfs:label "ITU B in ESP" ;
  dct:issued "2013-05-30T09:15:00"^^xsd:dateTime ;
  cex:indicator :ITU_B ;
  qb:dataSet :DITU ;
  cex:value "23.78"^^xsd:float ;
  cex:ref-area :Spain ;
  cex:ref-year "2011"^^xsd:gYear ;
  cex:computation :comp234 .

Data following the WebIndex data model is richly interrelated. Observations are linked
to indicators and to datasets. Datasets contain links to slices. Slices have links both to indicators
and back to observations. Both datasets and indicators are linked to the organizations by which
they are published or made available. Such links are illustrated in the following example:
:DITU a qb:DataSet ;
  qb:structure wf:DSD ;
  rdfs:label "ITU Dataset" ;
  dct:publisher :ITU ;
  qb:slice :ITU09B,
           :ITU10B,
           ...
:ITU09B a qb:Slice ;
  qb:sliceStructure wf:sliceByArea ;
  qb:observation :obs8165,
                 :obs8166,
                 ...
:ITU a org:Organization ;
  rdfs:label "ITU" ;
  foaf:homepage <http://www.itu.int/> .

:Spain
  wf:iso2 "ES" ;
  rdfs:label "Spain" .

:ITU_B a wf:SecondaryIndicator ;
  rdfs:label "Broadband subscribers %" ;
  wf:provider :ITU .

For verification, the WebIndex data model includes a representation of computations that
declare how each observation has been obtained, either from a raw dataset or computed from
the observations of other datasets. The structure of computation descriptions, presented in [56],
is omitted here for simplicity.
In the next section we formally define the structure of this simplified WebIndex data
model using ShEx and review the main differences with the original.

6.1.1 WEBINDEX IN SHEX


The following declaration indicates that a valid :Country shape must have exactly one rdfs:label
and exactly one wf:iso2 both of which must be literals of type xsd:string. In the case of wf:iso2 it
must also have length 2.
:Country {
  rdfs:label xsd:string ;
  wf:iso2 xsd:string LENGTH 2
}

In this example, we deliberately omitted the requirement for a rdf:type declaration. This
means that, in order to satisfy the :Country shape, a node need only have the properties that have
been specified and may or may not include rdf:type declarations.
By default, shape definitions are open meaning that additional triples with different pred-
icates may be present, so nodes of shape :Country could have other properties beyond those pre-
scribed by the shape.
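
The reading of the :Country shape given above (exactly one value for each listed property, other properties allowed) can be sketched in Python. This is a simplified illustration and not a ShEx validator: a node is a dictionary from predicates to lists of values, Python strings stand in for xsd:string literals, and conforms_to_country is a hypothetical helper.

```python
def conforms_to_country(props):
    """Check the two triple constraints of :Country on an open shape."""
    labels = props.get("rdfs:label", [])
    codes = props.get("wf:iso2", [])
    return (
        len(labels) == 1 and isinstance(labels[0], str)   # exactly one label
        and len(codes) == 1 and isinstance(codes[0], str)  # exactly one code
        and len(codes[0]) == 2                             # LENGTH 2
    )

spain = {"rdfs:label": ["Spain"], "wf:iso2": ["ES"]}
# Extra predicates are accepted because the shape is open.
spain_extra = {**spain, "rdf:type": [":Country"], ":capital": [":Madrid"]}
bad_code = {"rdfs:label": ["Spain"], "wf:iso2": ["ESP"]}
two_labels = {"rdfs:label": ["Spain", "España"], "wf:iso2": ["ES"]}

for node in (spain, spain_extra, bad_code, two_labels):
    print(conforms_to_country(node))
```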
The shape of datasets is described as follows:
:DataSet {
  a [ qb:DataSet ] ;
  qb:structure [ wf:DSD ] ;
  rdfs:label xsd:string ? ;
  qb:slice @:Slice + ;
  dct:publisher @:Organization
}

This says that nodes conforming to :DataSet shape must have rdf:type with value qb:DataSet, a
qb:structure of wf:DSD, an optional rdfs:label of type xsd:string, one or more qb:slice predicates
whose object is the subject of a set of triples matching the :Slice shape definition and exactly
one dct:publisher, whose object is the subject of a set of triples matching the :Organization shape.
The :Slice shape is defined in a similar fashion:
:Slice {
  a [ qb:Slice ] ;
  qb:sliceStructure [ wf:sliceByArea ] ;
  qb:observation @:Observation + ;
  cex:indicator @:Indicator
}

The :Observation shape in the WebIndex data model has two rdf:type declarations, which
indicate that they must be instances of both the RDF Data Cube class of Observation
(qb:Observation) and the wf:Observation class from the Web Foundation ontology. The property
dct:publisher is optional, but if it appears, it must have value wf:WebFoundation.
Values conforming to :Observation shape can either have a wf:source property of type IRI
(which, in this context, is used to indicate that it is a raw observation that has been taken from
the source represented by the IRI), or a cex:computation property whose value conforms to the
:Computation shape.
It should be noted that shapes do not define the semantics of an RDF graph. While the
designers of the WebIndex dataset model have determined that a raw observation would be
indicated using the wf:source predicate and with the object IRI referencing the original source,
ShEx simply states that, in order for a subject to satisfy the :Observation, it must include either a
wf:source or a cex:computation predicate, period. Meaning must be found elsewhere.

:Observation {
  a [ qb:Observation ] ;
  a [ wf:Observation ] ;
  cex:value xsd:float ;
  dct:issued xsd:dateTime ;
  dct:publisher [ wf:WebFoundation ] ? ;
  qb:dataSet @:DataSet ;
  cex:ref-area @:Country ;
  cex:indicator @:Indicator ;
  cex:ref-year xsd:gYear ;
  ( wf:source IRI
  | cex:computation @:Computation
  )
}

A computation is represented as a node with type cex:Computation.


:Computation {
  a [ cex:Computation ]
}

The type of indicators must be either wf:PrimaryIndicator or wf:SecondaryIndicator. They
must also contain the property wf:provider with a value conforming to shape :Organization.
:Indicator {
  a [ wf:PrimaryIndicator
      wf:SecondaryIndicator
    ],
  wf:provider @:Organization
}

In the case of organizations, we declare these as closed shapes using the CLOSED modifier
and only allow the properties rdfs:label, foaf:homepage and rdf:type, which must have the value
org:Organization. The EXTRA modifier is used to declare that we allow other values for the rdf:type
property (using the Turtle keyword a).
:Organization CLOSED EXTRA a {
  a [ org:Organization ],
  rdfs:label xsd:string ,
  foaf:homepage IRI
}

Shape Expressions offer an intuitive way to describe the contents of linked data portals.
They have been used to document both the WebIndex¹ and another data portal with a similar
model, the Landbook² data portal. Their documentation defines templates for the different
shapes of resources and for the triples that can be retrieved when dereferencing those resources.
These templates define the dataset structure in a declarative way and can serve as a contract
between developers of the data portal contents and designers of the data model. Having a good
data model with a corresponding Shape Expressions specification facilitated the communication
between the various stakeholders involved.
¹ http://weso.github.io/wiDoc
² http://weso.github.io/landportalDoc/data
The data model described in this chapter differs from the original one, for readability and
didactic purposes, in the following ways:
• We omitted the representation of computations, which are represented as single nodes with
type cex:Computation. A more detailed description of computations is given in [56].
We have also simplified the representation of the WebIndex structure, which was composed
of sub-indexes, components and other properties such as labels and provenance information.
• We defined the shapes of countries to include just two simple properties. We deliber-
ately omit the mandatory use of rdf:type declaration to show that it is possible to have
nodes without that declaration. In the original WebIndex data model all countries had a
mandatory rdf:type arc but there were several generated nodes which did not have rdf:type
declarations. As we omitted the representation of computations we decided to offer that
possibility for countries as an example.
Appendix A includes the full version of the WebIndex ShEx description used in this book.

6.1.2 WEBINDEX IN SHACL


Although the original data portal was modeled in ShEx, we undertook the exercise of defining a
SHACL description for the same contents so that we could compare the expressiveness of ShEx
and SHACL. In this section we present a possible encoding in SHACL.
An equivalent description in SHACL of the :Country shape defined on page 197 would
be:
:Country a sh:NodeShape ;
  sh:property [ sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1; sh:maxCount 1 ;
  ] ;
  sh:property [ sh:path wf:iso2 ;
    sh:datatype xsd:string ;
    sh:minLength 2 ; sh:maxLength 2 ; # exactly two characters
    sh:minCount 1; sh:maxCount 1 ;
  ] .

As can be seen, the :Country shape is defined by two constraints which specify that the
datatype of rdfs:label and wf:iso2 properties must be xsd:string and that wf:iso2 has length 2.
The default SHACL cardinality is [0..*], whereas a ShEx triple constraint without an explicit
cardinality means exactly one. Cardinalities that are left implicit in ShEx must therefore be
stated explicitly in SHACL as:
  sh:minCount 1; sh:maxCount 1 ;
Optionality (? or * in ShEx) can be represented either by omitting sh:minCount or by
setting sh:minCount to 0. An unbounded maximum cardinality (* or + in ShEx) must be represented in
SHACL by omitting sh:maxCount. As an example, the definition of the :DataSet shape declares
that rdfs:label is optional (by omitting the sh:minCount property) and declares that there must be
one or more qb:slice predicates conforming to the :Slice shape (by omitting
sh:maxCount).
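The correspondence between ShEx cardinality markers and SHACL count constraints can be summarized in a small lookup table. The following Python sketch is only an illustration: the function name shex_cardinality_to_shacl is hypothetical, and it covers just the four markers discussed in the text (no marker, ?, * and +), not explicit ranges such as {2,5}.

```python
from typing import Dict


def shex_cardinality_to_shacl(marker: str = "") -> Dict[str, int]:
    """Return the SHACL count constraints equivalent to a ShEx cardinality marker.

    A key that is missing from the result means the SHACL property is omitted.
    """
    table = {
        "": {"sh:minCount": 1, "sh:maxCount": 1},  # ShEx default: exactly one
        "?": {"sh:maxCount": 1},                    # optional: omit sh:minCount
        "*": {},                                    # zero or more: the SHACL default
        "+": {"sh:minCount": 1},                    # one or more: omit sh:maxCount
    }
    if marker not in table:
        raise ValueError(f"unsupported cardinality marker: {marker!r}")
    return dict(table[marker])


if __name__ == "__main__":
    for marker in ("", "?", "*", "+"):
        print(repr(marker), shex_cardinality_to_shacl(marker))
```

Read this way, the rdfs:label constraint of :DataSet corresponds to the "?" row and the qb:slice constraint to the "+" row.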
The predicate sh:node is used to indicate that the value of a property must have a given
shape. In this way, a shape can refer to another shape. Note that the WebIndex data model
contains cycles—shapes refer to other shapes and those shapes can refer back to the first ones—
which can generate recursive shapes. However, the handling of recursion in SHACL is
implementation-dependent, so recursion must be circumvented using some of the
techniques shown in section 5.12.1.
:DataSet a sh:NodeShape ;
  sh:property [ sh:path rdf:type ;
    sh:hasValue qb:DataSet ;
    sh:minCount 1; sh:maxCount 1 ;
  ] ;
  sh:property [ sh:path qb:structure ;
    sh:hasValue wf:DSD ;
    sh:minCount 1; sh:maxCount 1 ;
  ] ;
  sh:property [ sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [ sh:path qb:slice ;
    sh:node :Slice ;
    sh:minCount 1 ;
  ] ;
  sh:property [ sh:path dct:publisher ;
    sh:node :Organization ;
    sh:minCount 1; sh:maxCount 1 ;
  ] .

The definition of :Slice is similar to :DataSet, so we can omit it for clarity. The full version
of the SHACL shapes that we used in this section is shown in appendix B.
There are three items that need more explanation in the SHACL definition of the
:Observation shape. The first of these is the repeated appearance of the rdf:type property with
two values. Although we initially represented it using qualified value shapes, we noticed that it
could also be represented as:
:Observation a sh:NodeShape ;
  sh:property [ sh:path rdf:type ;
    sh:in ( qb:Observation wf:Observation )
  ] ;
  sh:property [ sh:path rdf:type ;
    sh:minCount 2; sh:maxCount 2
  ] ;
  ...

The definition of observations also contains an optional property with a fixed value. This
was defined in ShEx as:
:Observation { ...
  dct:publisher [ wf:WebFoundation ]?
  ...
}

which means that observations either have a property dct:publisher with the fixed value
wf:WebFoundation or do not have that property at all.
A possible representation in SHACL is to use an sh:or of two shapes: one in which there
is no dct:publisher (sh:maxCount=0) and one with exactly one value for dct:publisher.
:Observation ...
  sh:or ( [ sh:path dct:publisher ;
            sh:maxCount 0
          ]
          [ sh:path dct:publisher ;
            sh:hasValue wf:WebFoundation ;
            sh:minCount 1 ;
            sh:maxCount 1
          ]
        )
  ...
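
The two branches of this sh:or can be read as a simple check over the dct:publisher values of a single observation. The Python sketch below is an illustration only: the function name publisher_ok is hypothetical, and values are represented as prefixed-name strings rather than as RDF terms.

```python
WEB_FOUNDATION = "wf:WebFoundation"


def publisher_ok(publishers: list) -> bool:
    """Check the dct:publisher values attached to one observation."""
    # Branch 1: no dct:publisher at all (sh:maxCount 0).
    absent = len(publishers) == 0
    # Branch 2: exactly one value, and that value is wf:WebFoundation.
    fixed = publishers == [WEB_FOUNDATION]
    return absent or fixed


if __name__ == "__main__":
    print(publisher_ok([]))
    print(publisher_ok(["wf:WebFoundation"]))
    print(publisher_ok(["ex:SomeoneElse"]))
```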

The last item requiring additional explanation is the disjunction definition which says that
observations must have either the property cex:computation with a value of shape :Computation
or the property wf:source with an IRI value, but not both. In ShEx, it was defined as:
:Observation { ...
  , ( cex:computation @:Computation
    | wf:source IRI
    )
  ...
}

In SHACL, this declaration can be defined using the sh:xone (exactly one) property con-
straint:
:Observation
  ...
  sh:xone ( [ sh:path wf:source ;
              sh:nodeKind sh:IRI ;
              sh:minCount 1; sh:maxCount 1 ;
            ]
            [ sh:path cex:computation ;
              sh:node :Computation ;
              sh:minCount 1; sh:maxCount 1 ;
            ]
          )
  ...
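
The "exactly one" reading of sh:xone can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a SHACL processor: a node is a plain dictionary from property names to lists of values, is_iri is a crude stand-in for sh:nodeKind sh:IRI, and conformance of the computation node to :Computation is not checked. All function names are hypothetical.

```python
def is_iri(value: str) -> bool:
    # Crude stand-in for sh:nodeKind sh:IRI (assumption: absolute http(s) IRIs only).
    return value.startswith(("http://", "https://"))


def exactly_one(*results: bool) -> bool:
    """sh:xone semantics: exactly one of the alternatives must hold."""
    return sum(1 for result in results if result) == 1


def observation_origin_ok(node: dict) -> bool:
    sources = node.get("wf:source", [])
    computations = node.get("cex:computation", [])
    # Alternative 1: exactly one wf:source, and it is an IRI.
    has_source = len(sources) == 1 and all(is_iri(s) for s in sources)
    # Alternative 2: exactly one cex:computation (its shape is not checked here).
    has_computation = len(computations) == 1
    return exactly_one(has_source, has_computation)


if __name__ == "__main__":
    print(observation_origin_ok({"wf:source": ["http://example.org/source"]}))
    print(observation_origin_ok({}))
```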

In the case of indicators we can see again the separation between the :Indicator shape and
the wf:PrimaryIndicator and wf:SecondaryIndicator classes.
:Indicator a sh:NodeShape ;
  sh:property [ sh:path rdf:type ;
    sh:in ( wf:PrimaryIndicator wf:SecondaryIndicator ) ;
    sh:minCount 1; sh:maxCount 1 ;
  ] ;
  ...

We defined organizations as closed shapes with the possibility that the rdf:type property
has some extra values apart from org:Organization. This constraint can be expressed in
SHACL as:
:Organization a sh:NodeShape ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [ sh:path rdf:type ;
    sh:hasValue org:Organization ;
  ] ;
  ...
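
The effect of sh:closed together with sh:ignoredProperties can be sketched as a set comparison: every property used by the node must either be declared in the shape or appear in the ignored list. The Python below is a simplified illustration; the function name closed_violations is hypothetical, and properties are given as prefixed-name strings.

```python
def closed_violations(node_properties, declared, ignored=()):
    """Return the properties of a node that a closed shape does not allow."""
    allowed = set(declared) | set(ignored)
    return sorted(p for p in set(node_properties) if p not in allowed)


if __name__ == "__main__":
    declared = ["rdf:type", "rdfs:label", "foaf:homepage"]
    ignored = ["rdf:type"]
    used = ["rdf:type", "rdfs:label", "foaf:homepage", "foaf:name"]
    print(closed_violations(used, declared, ignored))
```

Extra values of rdf:type are a separate matter: they are tolerated because sh:hasValue only requires that org:Organization is among the values and no sh:maxCount is given.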

An important aspect that deserves some explanation is the use of recursion to represent
cyclic data models. While ShEx can define cyclic data models in a natural way, SHACL
leaves the handling of recursive shapes undefined, so recursion has to be worked around.
One possibility is to add a discriminating rdf:type arc to every node so that its shape can
be associated to its class. We opted to add a sh:targetClass declaration to some shapes, such as
:Observation, conflating that shape with the class qb:Observation. Any node that contains an rdf:type
arc pointing to qb:Observation must conform to the :Observation shape declared by the WebIndex.
While this approach may be reasonable in closed contexts, it can cause problems in the
open semantic web if one combines data from other datasets. For example, we defined another
data model based on RDF data cube for the LandPortal project³ which also contained values
of type qb:Observation but with different structures. We consider that forcing every node of type
qb:Observation to have the same structure is not a good practice and that it may be better to
separate the target declarations from the shape definitions.
³ http://landportal.info
6.2 DESCRIBING CLINICAL RECORDS—FHIR
Fast Healthcare Interoperability Resources (FHIR)⁴ is a framework created by HL7, a clinical
standards organization, to define data formats and APIs for exchanging electronic health
records. FHIR Release 3.0 was published in March 2017 and adds support for RDF.
FHIR has a resource-oriented architecture that describes the different entities in-
volved in a clinical record. In a typical example, a patient (Patient resource) visits a clinician
(Practitioner resource), who records some observations (Observation resource), reviews some lab
results (Diagnostic results probably referencing other observations) and diagnoses a clinical is-
sue (Condition resource). These resources can be expressed interchangeably in multiple formats:
JSON, XML, and RDF.
FHIR resources are described by structure definitions in a FHIR-specific schema lan-
guage. This machine-readable language is translated into format-specific schema languages such
as XML Schema plus Schematron, JSON Schema, and ShEx.
The structure of FHIR resources is documented as machine-generated HTML tables.
Figure 6.2 shows part of the FHIR Observation resource.⁵
FHIR structure definitions have two forms of limited disjunction. The first, choices of the
types of referenced resources, can be seen in subject and performer in Figure 6.2. The second is a
choice between a set of datatypes where the name of the datatype is appended to the property
name, indicated by the [x] notation (see effective[x] and value[x] in Figure 6.2). These are cap-
tured in ShEx using the shape expression ShapeOr (’OR’) and the triple expression OneOf (’|’)
respectively:

Example 6.1 FHIR Observation representation in ShEx

<Observation> CLOSED { a [ fhir:Observation ];
  obs:status @<code> AND { fhir:value @fhirvs:observation-status };
  obs:code @<CodeableConcept>;
  obs:subject
    ( @<PatientReference> OR
      @<GroupReference> OR
      @<DeviceReference> OR
      @<LocationReference>
    )?;
  ( obs:effectiveDateTime @<dateTime>
  | obs:effectiveTiming @<Timing>
  )?;
  obs:issued @<instant>?;
  obs:performer
    ( @<PractitionerReference> OR
      @<OrganizationReference> OR
      @<PatientReference> OR
      @<RelatedPersonReference>
    )*;
  ( obs:valueQuantity @<Quantity>
  | obs:valueCodeableConcept @<CodeableConcept>
  | obs:valueDateTime @<dateTime>
  | obs:valuePeriod @<Period>
  )?;
  obs:bodySite @<CodeableConcept>?;
}

fhirvs:observation-status ["registered" "preliminary"
                           "final" "amended"]

⁴ https://www.hl7.org/fhir/
⁵ The original Observation resource is at http://hl7.org/fhir/observation.html

Figure 6.2: Part of Observation resource in FHIR.


6.2.1 FHIR AS LINKED DATA
The definition of the RDF representation of FHIR was greatly simplified because FHIR was
designed to be resource-oriented. While clinical records are not expected to end up on the web,
the REST architecture was an easy way to implement addressability and separation of concerns.
This means that FHIR resources are interlinked in a fashion that is already familiar to
users of Linked Data. For example, the Observation excerpt includes references for the subject
and performer. The subject may be a resource of type Patient, Group, Device or Location, and the
performer may be a Practitioner, Organization, Patient, or a RelatedPerson (see Figure 6.2). While
Linked Data is most commonly associated with RDF, these constraints apply equally to the
XML and JSON representations of FHIR. However, of the four schema languages used to
validate FHIR, only ShEx validation spans resources.
There are several reasons why one might want to limit validation to a single document:
other resources might not be available or relevant and it may be impractical either computation-
ally or procedurally to test conformance of many resources at once. However, a common use
case for Linked Data is that all related data is addressable and available. Extending our schema
to include verification of external referents allows us to ensure that a resource is coherent not
only on its own but also when used in the context of the resources to which it is linked.

6.2.2 CONSISTENCY CONSTRAINTS


The FHIR-specific schemas are expressed as combinations of structure definitions describing
types and containership, and constraints. Most constraints are co-existence constraints, e.g., if
there is a duration there must be a durationUnit. For XML, structure definitions are expressed as
XML Schema, and co-existence constraints are expressed, where possible, in Schematron. For
RDF, structure definitions and co-existence constraints are both expressed in ShEx.
An example with co-existence constraints is the representation of the Timing datatype,
which represents an event that may occur multiple times. A Timing schedule can be a list of
events and/or criteria for when the event happens, which can be expressed in a structured form
and/or as a code. Figure 6.3 shows the HTML representation of Timing.
While these human-friendly HTML representations are generated from the FHIR
schema, they could easily be generated from representations in other schema languages such as
XML Schema, ShEx or SHACL. Schemas using more expressivity may be difficult to convey
graphically to users. For instance, these property trees do not have a way to assert co-existence
constraints, e.g. that certain properties are mutually exclusive. In a UML stack, these sorts of
constraints would be expressed using OCL (see section 3.1.1).

Figure 6.3: Complete Timing datatype in FHIR.


Example 6.2 Timing representation in ShEx⁶
The ShEx representation of Timing is defined as:
PREFIX : <http://hl7.org/fhir/Timing.>
PREFIX fhirvs: <http://hl7.org/fhir/ValueSet/>
BASE <http://hl7.org/fhir/shape/>

<Timing> CLOSED {
  :event @<dateTime>*;
  :repeat @<Timing.repeat>?;
  :code @<CodeableConcept>?;
}

where the Timing.repeat shape contains two parts: a structure definition (lines 1–24) and several
co-existence constraints (lines 25–35) which can be expressed in natural language as:
• If there is a duration, there needs to be a durationUnit.
• If there is a period, there needs to be a periodUnit.
• duration shall be a non-negative value.
• period shall be a non-negative value.
• If there is a periodMax, there must be a period.
• If there is a durationMax, there must be a duration.
• If there is a countMax, there must be a count.
• If there is an offset, there must be a when (and not C, CM, CD, CV).
• If there is a timeOfDay, there cannot be a when, or vice versa.

 1 <Timing.repeat> CLOSED {
 2   ( :repeat.boundsDuration @<Duration> |
 3     :repeat.boundsRange @<Range> |
 4     :repeat.boundsPeriod @<Period>
 5   )?;
 6   :repeat.count @<integer>?;
 7   :repeat.countMax @<integer>?;
 8   :repeat.duration @<decimal>?;
 9   :repeat.durationMax @<decimal>?;
10   :repeat.durationUnit @<code> AND
11     { fhir:value @fhirvs:units-of-time }?;
12   :repeat.frequency @<integer>?;
13   :repeat.frequencyMax @<integer>?;
14   :repeat.period @<decimal>?;
15   :repeat.periodMax @<decimal>?;
16   :repeat.periodUnit @<code> AND
17     { fhir:value @fhirvs:units-of-time }?;
18   :repeat.dayOfWeek @<code> AND
19     { fhir:value @fhirvs:days-of-week }*;
20   :repeat.timeOfDay @<time>*;
21   :repeat.when @<code> AND
22     { fhir:value @fhirvs:event-timing }*;
23   :repeat.offset @<unsignedInt>?;
24 }
25 AND {( :repeat.duration . ; :repeat.durationUnit . )? }
26 AND {( :repeat.period . ; :repeat.periodUnit . )? }
27 AND { :repeat.duration MinInclusive 0 ? }
28 AND { :repeat.period MinInclusive 0 ? }
29 AND {( :repeat.periodMax . ; :repeat.period . )? }
30 AND {( :repeat.durationMax . ; :repeat.duration . )? }
31 AND {( :repeat.countMax . ; :repeat.count . )? }
32 AND { :repeat.offset . ; :repeat.when [. - "C" - "CM" - "CD" - "CV"]
33     | :repeat.when . ? # if there is no offset there can still be a when
34 }
35 AND { :repeat.timeOfDay . | :repeat.when . }

⁶ http://hl7.org/fhir/datatypes.html#timing

The value set idiom of specifying a value type and a value set (e.g., <code> and
fhirvs:units-of-time) allows one to specify the structure and also to specify values within that structure.
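
To make the co-existence constraints on Timing.repeat more concrete, the following Python sketch checks them against a plain dictionary. It is an illustration only, not part of FHIR or ShEx tooling: the function name timing_repeat_violations is hypothetical, property names are shortened (without the repeat. prefix), the unit properties follow the names used in the structure definition (durationUnit, periodUnit), and when is assumed to be a list of code strings.

```python
EXCLUDED_WHEN_CODES = {"C", "CM", "CD", "CV"}


def timing_repeat_violations(repeat: dict) -> list:
    """Return one message per violated co-existence constraint."""

    def present(name: str) -> bool:
        value = repeat.get(name)
        return value is not None and value != []

    errors = []
    # Each pair reads: if the first property is present, the second must be too.
    requires = [
        ("duration", "durationUnit"),
        ("period", "periodUnit"),
        ("periodMax", "period"),
        ("durationMax", "duration"),
        ("countMax", "count"),
        ("offset", "when"),
    ]
    for prop, needed in requires:
        if present(prop) and not present(needed):
            errors.append(f"{prop} requires {needed}")
    for prop in ("duration", "period"):
        if present(prop) and repeat[prop] < 0:
            errors.append(f"{prop} must be non-negative")
    if present("offset") and EXCLUDED_WHEN_CODES & set(repeat.get("when") or []):
        errors.append("offset cannot be combined with when codes C, CM, CD, CV")
    if present("timeOfDay") and present("when"):
        errors.append("timeOfDay and when are mutually exclusive")
    return errors


if __name__ == "__main__":
    print(timing_repeat_violations({"duration": 2}))
    print(timing_repeat_violations({"offset": 30, "when": ["AC"]}))
```

Unlike the ShEx schema, this sketch reports every violated constraint rather than a single pass or fail result, which can be convenient when reporting problems to data producers.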

6.2.3 FHIR/RDF DEVELOPMENT


The FHIR/RDF group, a joint undertaking of W3C and HL7, used ShEx not only to define the
final product but also to describe intermediate ideas and test them against example data. To this
end, members of the group learned ShEx to streamline the process with concrete, testable pro-
posals. During the development and deployment of version 3 of FHIR, Harold Solbrig (Mayo
Clinic) implemented a pipeline to test shapes against FHIR example data, catching errors in
both the examples and the ShEx schema.
Because the agile FHIR standardization process is centered around the maintenance of
FHIR resource structure definitions, the ShEx for FHIR is generated from these definitions.
The easy way to do this is to generate ShExJ (the JSON representation) but because the FHIR
group wanted these to be appealing to readers, they were transformed into ShExC, making
specific white space decisions in the process. These ShExC representations could then be parsed
to the abstract syntax to be tested against the reference ShExJ schemas. The latter transformation
was simpler and less error-prone, as it involves only the direct semantics.
6.2.4 GENERIC PROPERTIES
Because electronic medical records use a consistent template to represent most clinical data, they
rely heavily on generic properties. These properties may be used multiple times with different
constraints. A simple example of this is a blood pressure, which actually consists of two measure-
ments: systolic (pressure during heart beat) and diastolic (pressure between heart beats). Both
of these measurements are connected to the blood pressure measurement by a
fhir:Observation.component property.

Example 6.3 FHIR blood pressure⁷

A <blood-pressure> shape can be defined in ShEx as:
<blood-pressure> {
  a [ fhir:Observation ];
  fhir:Observation.component {
    fhir:Observation.component.code {
      fhir:CodeableConcept.coding {
        a [loinc:8480-6] ; # systolic
      }
    } ;
    fhir:Observation.component.valueQuantity {
      fhir:Quantity.value { fhir:value xsd:decimal };
      fhir:Quantity.unit { fhir:value ["mmHg"] };
    }
  } ;
  fhir:Observation.component {
    fhir:Observation.component.code {
      fhir:CodeableConcept.coding {
        a [loinc:8462-4] ; # diastolic
      }
    };
    fhir:Observation.component.valueQuantity {
      fhir:Quantity.value { fhir:value xsd:decimal };
      fhir:Quantity.unit { fhir:value ["mmHg"] };
    }
  }
}

and example data conforming to that shape could be:

<http://hl7.org/fhir/Observation/blood-pressure>
  a fhir:Observation ; # passes as <blood-pressure>
  fhir:Observation.component [
    fhir:Observation.component.code [
      fhir:CodeableConcept.coding [
        a loinc:8480-6; # systolic
      ]
    ];
    fhir:Observation.component.valueQuantity [
      fhir:Quantity.value [ fhir:value "107"^^xsd:decimal ];
      fhir:Quantity.unit [ fhir:value "mmHg" ];
    ]
  ], [
    fhir:Observation.component.code [
      fhir:CodeableConcept.coding [
        a loinc:8462-4; # diastolic
      ]
    ];
    fhir:Observation.component.valueQuantity [
      fhir:Quantity.value [ fhir:value "60"^^xsd:decimal ];
      fhir:Quantity.unit [ fhir:value "mmHg" ];
    ]
  ] .

⁷ http://build.fhir.org/observation-example-bloodpressure.ttl

This example is long, but it is taken directly from a use case. In fact, its length encourages
us to do a bit of factoring. While we want to keep constraints on the codes for systolic and
diastolic, we can create a separate <valueObs> shape to capture the quantity measurement.

Example 6.4 Factored FHIR blood pressure


PREFIX fhir: <http://hl7.org/fhir/>
PREFIX loinc: <http://loinc.org/owl#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sct: <http://snomed.info/id/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
BASE <http://hl7.org/fhir/shape/>

<blood-pressure> {
  a [ fhir:Observation ];
  fhir:Observation.component @<valueObs> AND {
    fhir:Observation.component.code {
      fhir:CodeableConcept.coding {
        a [loinc:8480-6] ; # systolic
      }
    }
  } ;
  fhir:Observation.component @<valueObs> AND {
    fhir:Observation.component.code {
      fhir:CodeableConcept.coding {
        a [loinc:8462-4] ; # diastolic
      }
    }
  }
}

<valueObs> {
  fhir:Observation.component.valueQuantity {
    fhir:Quantity.value { fhir:value xsd:decimal };
    fhir:Quantity.unit { fhir:value ["mmHg"] };
  }
}

This schema uses the property fhir:Observation.component twice with different constraints
(one for systolic and the other for diastolic). It takes advantage of ShEx’s intuitive additive
semantics where requirements for repeated properties are simply expressed as additional
triple patterns (see section 4.6.7).
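
One way to picture the matching of repeated properties is as an assignment problem: every fhir:Observation.component value must be matched by exactly one of the two constraints, and each constraint must be used exactly once. The Python sketch below illustrates that idea for constraints of cardinality one only; it is not how a ShEx validator is implemented. Components are flattened to small dictionaries, and the names matches_each_once, has_code, is_systolic and is_diastolic are hypothetical.

```python
from itertools import permutations


def matches_each_once(values, constraints):
    """True if the values can be assigned one-to-one to the constraints."""
    if len(values) != len(constraints):
        return False
    return any(
        all(check(value) for check, value in zip(constraints, ordering))
        for ordering in permutations(values)
    )


def has_code(code):
    """Build a constraint: the component carries this code and is measured in mmHg."""
    return lambda component: component["code"] == code and component["unit"] == "mmHg"


is_systolic = has_code("loinc:8480-6")
is_diastolic = has_code("loinc:8462-4")

if __name__ == "__main__":
    components = [
        {"code": "loinc:8462-4", "unit": "mmHg", "value": 60},
        {"code": "loinc:8480-6", "unit": "mmHg", "value": 107},
    ]
    print(matches_each_once(components, [is_systolic, is_diastolic]))
```

The order in which the components appear in the data does not matter, and two systolic components (or a single component) would not satisfy the pair of constraints.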

6.3 SPRINGER NATURE SCIGRAPH


Springer Nature SciGraph⁸ is a new Linked Open Data platform aggregating data sources from
Springer Nature and key partners from the scholarly domain. The platform currently collates in-
formation from across the research landscape, such as funders, research projects, conferences, af-
filiations, and publications (books and journals). This high-quality data from trusted and reliable
sources provides a rich semantic description of how information is related, as well as enabling
innovative visualizations of the scholarly domain.
Data quality is a key component in SciGraph. In earlier work, SPIN was used in vari-
ous validation scenarios. However, SPIN was hard to maintain and to read by non-experts and
SHACL was chosen instead. SHACL is now used to validate data before the data enters the
main triplestore. SHACL is also used to specify which classes and properties can be published
from the triplestore.
All of the SHACL shapes used in building public datasets of Springer Nature SciGraph
are published in a GitHub repository.⁹ There are shapes that define the RDF structure of all
SciGraph entity types such as articles, grants, and journals.
The following snippet of the Article shape says that all SHACL instances of sg:Article
must have exactly one sg:scigraphId that is a string; at most one value for sg:doi, a string following
a specific pattern; and at most one value for sg:role, which can be one of: author, editor, or principal
investigator.
:Article a sh:NodeShape ;
  sh:targetClass sg:Article ;
  rdfs:label "RDF shape for the sg:Article model" ;

  # Identity
  sh:property [
    sh:path sg:scigraphId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sg:doi ;
    sh:datatype xsd:string ;
    sh:pattern "^10\\.\\d{4,5}\\/\\S+$" ;
    sh:maxCount 1 ;
  ] ;
  # ...

  sh:property [
    sh:path sg:role ;
    sh:in ( "author" "editor" "principal investigator" ) ;
    sh:maxCount 1 ;
  ] ;

⁸ http://www.springernature.com/scigraph
⁹ https://github.com/springernature/scigraph
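
The sg:doi constraint combines a pattern with a maximum cardinality. The Python sketch below applies the same regular expression with the standard re module to show which strings would be accepted. It is an approximation for illustration: SHACL patterns follow SPARQL regular expression semantics, which may differ from Python's in corner cases, and the function name doi_ok and the sample values are made up.

```python
import re

# Same pattern as in the shape, written as a Python raw string.
DOI_PATTERN = re.compile(r"^10\.\d{4,5}\/\S+$")


def doi_ok(values: list) -> bool:
    """At most one sg:doi value, and every value matches the pattern."""
    return len(values) <= 1 and all(DOI_PATTERN.search(v) is not None for v in values)


if __name__ == "__main__":
    print(doi_ok(["10.1234/example.5678"]))  # made-up value with a 4-digit prefix
    print(doi_ok(["doi:10.1234/example"]))   # rejected: does not start with "10."
```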

6.4 DBPEDIA VALIDATION USE CASES


DBpedia¹⁰ ([60]) is a crowdsourced community effort to extract structured information from
Wikipedia and make this information available on the Web. DBpedia data is available as RDF
dumps, through a linked data interface and a SPARQL endpoint. The current DBpedia release
(version 2016-04¹¹) provides circa 9.5 billion RDF triples.
Validating such large amounts of RDF data is a challenging task, and various methods
have been applied. At the time of writing, the core validation of DBpedia is performed with
neither ShEx nor SHACL. However, it is worth mentioning some approaches that work on
large and noisy datasets.

6.4.1 ONTOLOGY-BASED VALIDATION


One of the core sources of validation for DBpedia is the DBpedia ontology. The DBpedia
ontology is crowdsourced and maintained by the community on the http://mappings.dbpedia.org
wiki. At the time of writing, the ontology consists of circa 750 classes, organized in
a hierarchy, and 2,600 properties. The community can define class disjointness statements and, for
properties, axioms such as domain, range, literal datatypes, and functional properties. The DBpedia
ontology both drives the correct extraction of RDF triples from Wikipedia pages and is
used in post-processing steps to remove data violations.
¹⁰ http://wiki.dbpedia.org
¹¹ http://wiki.dbpedia.org/dbpedia-version-2016-04
The DBpedia extraction framework has many extractors that parse different parts of a
Wikipedia page and generate RDF triples. The Mapping-based extractor is a special extrac-
tor that focuses on high-quality extraction from Wikipedia infoboxes. To achieve this it uses
the DBpedia ontology and the community-maintained infobox-to-ontology mappings. Each
infobox mapping maps a Wikipedia infobox template to a DBpedia class and each infobox template
parameter to a property mapping (see [60, sec. 2.4]). At extraction time, each property
mapping is associated with a different parser, according to the rdfs:range of the DBpedia prop-
erty of each property mapping. For example, if the range of a property is defined as an xsd:date
(e.g. dbo:birthDate), property mappings with this property generate a value only if the value can
be parsed as a date.
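
The idea of choosing a parser from the range of the target property can be sketched as a lookup followed by a guarded parse. The Python below is a simplified illustration and not the DBpedia extraction framework: the tables PROPERTY_RANGES and PARSERS, the function names, and the restriction to ISO-formatted dates are all assumptions made for the example.

```python
from datetime import date
from typing import Optional


def parse_date(raw: str) -> Optional[date]:
    try:
        return date.fromisoformat(raw.strip())
    except ValueError:
        return None


def parse_integer(raw: str) -> Optional[int]:
    try:
        return int(raw.strip())
    except ValueError:
        return None


# Hypothetical excerpts: the range declared for a property selects its parser.
PROPERTY_RANGES = {"dbo:birthDate": "xsd:date"}
PARSERS = {"xsd:date": parse_date, "xsd:integer": parse_integer}


def extract(prop: str, raw: str):
    """Return (property, value) if the raw infobox text parses, otherwise None."""
    parser = PARSERS[PROPERTY_RANGES[prop]]
    value = parser(raw)
    return None if value is None else (prop, value)


if __name__ == "__main__":
    print(extract("dbo:birthDate", "1879-03-14"))
    print(extract("dbo:birthDate", "unknown"))
```

Because a value that fails to parse produces no triple at all, this kind of extraction never emits a literal that contradicts the declared range.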
As a post-processing step, the RDFS and OWL axioms defined in the DBpedia ontology
are used to further clean up the extracted data. A common approach is to run RDFUnit on
the data and get back detailed violation reports. These reports are used to identify common
sources of error that can be scheduled for fixing. Another approach is a set of scripts that parse
facts and, depending on the conformance of each fact to a set of axioms (e.g., rdfs:domain, rdfs:range,
owl:disjointWith, etc.), dispatch the facts to different dataset buckets before publishing.

6.4.2 RDF MAPPINGS VALIDATION


A very common way to generate RDF data is through a mapping document. In a general case, a
mapping document contains rules that can be used to transform input data to RDF. The mapping
rules can be encoded in a script (e.g., using XSLT), in code, or formulated in mapping languages
such as R2RML [28] and RML [30].
A single error in the mapping document can, in many cases, propagate to many errors
in the generated instance data, and the number of errors is usually proportional to the input
size. Consider for example a mapping document that generates person data and represents the
age of a person with the property foaf:age and the value as xsd:double instead of xsd:integer. Every
person instance in the generated RDF will have a violation for the datatype of foaf:age. Fixing
such errors in the mapping document is an easy task, but once the data is generated the task
becomes harder, especially on big datasets.
Dimou et al. [31] propose a workflow for including quality assessment of the mappings in
the general dataset quality assessment workflow. The authors use the dataset schema information
(i.e., ontologies) to identify schema errors of the dataset directly from an RML mapping docu-
ment. The results illustrate that violations such as domain and range, mistyped datatypes, class
and property disjointness, and the like can be identified directly from the mapping document.
Evaluation of this work indicates that fixing errors directly in the mapping document is more
efficient. For example, in the case of DBpedia, an automatic quality assessment of the mappings
took less than a minute while the complete dataset validation took more than 16 hours.
However, quality assessment of the mappings cannot identify all possible
schema errors in the target dataset. Some constraints, such as cardinality, can only be checked
on the target dataset.
Even though this approach currently works with OWL and RDFS, it would be an easy
exercise to extend it to SHACL or ShEx. Given a set of mappings and a set of Shapes, one could
identify incompatibilities directly from the mapping document.
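
As a rough illustration of checking mappings against shapes, the Python sketch below compares the datatype that each mapping rule would generate with the datatype that a shape expects for the same property. It is not the approach of [31] and does not read RML, SHACL, or ShEx: mapping rules and expected datatypes are plain dictionaries, and the function name mapping_violations and the sample tables are hypothetical, reusing the foaf:age example from the text.

```python
def mapping_violations(mapping_rules, expected_datatypes):
    """Return (predicate, generated datatype, expected datatype) for each mismatch."""
    violations = []
    for rule in mapping_rules:
        predicate = rule["predicate"]
        expected = expected_datatypes.get(predicate)
        if expected is not None and rule["datatype"] != expected:
            violations.append((predicate, rule["datatype"], expected))
    return violations


if __name__ == "__main__":
    rules = [
        {"predicate": "foaf:age", "datatype": "xsd:double"},
        {"predicate": "foaf:name", "datatype": "xsd:string"},
    ]
    # Datatypes that the (hypothetical) shapes expect for each property.
    expected = {"foaf:age": "xsd:integer", "foaf:name": "xsd:string"}
    print(mapping_violations(rules, expected))
```

A single reported mismatch here stands for one violation per generated person in the output data, which is why checking at the mapping level is so much cheaper than validating the full dataset.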

6.4.3 VALIDATING LINK CONTRIBUTIONS WITH SHACL


DBpedia promotes GitHub for accepting link contributions from the DBpedia community¹²
and, recently, there has been an effort to automate the link verification process (see [32, Section
3.3]). This has put into place a set of quality checks that validate various aspects of the link
submission and is integrated with common continuous integration services, such as Travis CI.
This approach enables instant checks on pull requests and reports problems to the submitter.
In addition to scripts that check, for instance, that the RDF files are valid, there is a script that checks
whether the link manifest file conforms to the following SHACL schema.¹³
dbp:LinkManifest a sh:NodeShape ;
  sh:targetClass void:Linkset ;
  sh:property [
    sh:path dc:author ;
    sh:minCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dct:description ;
    sh:minCount 1;
    sh:nodeKind sh:Literal ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path dct:license ;
    sh:minCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:script ;
    sh:maxCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:linkConf ;
    sh:maxCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:ntriplefilelocation ;
    sh:maxCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:endpoint ;
    sh:maxCount 1;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:constructQuery ;
    sh:maxCount 1;
    sh:nodeKind sh:Literal ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path dbp:approvedPatch ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:optionalPatch ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path dbp:updateFrequencyInDays ;
    sh:maxCount 1;
    sh:nodeKind sh:Literal ;
    sh:datatype xsd:integer ;
  ] .

¹² https://github.com/dbpedia/links
¹³ The SHACL schema was based on an earlier version of SHACL and was adapted to the latest one for this book.

The defined quality checks cannot capture all possible errors in a link submission process.
However, they can (a) provide very useful feedback to the link submitter, and (b) enable
DBpedia to automatically pre-process some steps in the link generation pipeline.
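The cardinality part of the manifest shape can be approximated without a SHACL processor. The following Python sketch is illustrative only: it is not the script used by DBpedia, the `check_cardinalities` helper and the dictionary representation of a manifest are assumptions made for this example, and it covers only sh:minCount and sh:maxCount, not sh:nodeKind or sh:datatype.

```python
# Illustrative sketch only: mimics sh:minCount / sh:maxCount from the
# dbp:LinkManifest shape on a plain dictionary. It is not a SHACL processor.

# (min, max) per property; None means "no upper bound".
# The bounds are copied from the shape shown in this section.
CARDINALITIES = {
    "dc:author": (1, None),
    "dct:description": (1, None),
    "dct:license": (1, None),
    "dbp:script": (0, 1),
    "dbp:linkConf": (0, 1),
    "dbp:ntriplefilelocation": (0, 1),
    "dbp:endpoint": (0, 1),
    "dbp:constructQuery": (0, 1),
    "dbp:updateFrequencyInDays": (0, 1),
}


def check_cardinalities(manifest):
    """Return a list of violation messages for a manifest.

    `manifest` maps a property name to the list of its values
    (a hypothetical, simplified stand-in for the RDF description).
    """
    violations = []
    for prop, (low, high) in CARDINALITIES.items():
        count = len(manifest.get(prop, []))
        if count < low:
            violations.append(f"{prop}: expected at least {low}, found {count}")
        if high is not None and count > high:
            violations.append(f"{prop}: expected at most {high}, found {count}")
    return violations


# Hypothetical manifests used only to exercise the function.
good_manifest = {
    "dc:author": ["http://example.org/alice"],
    "dct:description": ["Links to an example dataset"],
    "dct:license": ["http://example.org/license"],
    "dbp:endpoint": ["http://example.org/sparql"],
}
bad_manifest = {
    "dc:author": ["http://example.org/alice"],
    "dct:description": ["Links to an example dataset"],
    "dbp:endpoint": ["http://example.org/sparql", "http://example.org/sparql2"],
}

print(check_cardinalities(good_manifest))
print(check_cardinalities(bad_manifest))
```

A real continuous integration check would load the manifest as RDF and run a SHACL processor, which also reports the focus node and the violated constraint component.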

6.4.4 ONTOLOGY VALIDATION WITH SHACL


The DBpedia ontology has been maintained by the DBpedia community in a crowdsourced
manner at the http://mappings.dbpedia.org wiki. There is an ongoing effort to move ontology
development onto GitHub for easier collaboration and for the sake of more control over
the ontology structure.14 At the time of writing, the following constraints are defined to ensure
that each DBpedia class and each DBpedia property conform to DBpedia community requirements:
• Each DBpedia class and property must have at least one rdfs:label and at least one
rdfs:comment that are of rdf:langString datatype with unique language.
• Each DBpedia class can have at most one direct super class.
• Each DBpedia property can have at most one direct super property.
• Each DBpedia property can have at most one rdfs:domain.
• Each DBpedia property can have at most one rdfs:range.
• The domain and range of each property must be defined as an owl:Class.
• Top-level DBpedia classes must be discussed before defined.

14 https://github.com/dbpedia/ontology-tracker
These constraints are implemented with the following SHACL definitions. RDFUnit is
used to perform the validation, to integrate with Travis CI, and to automate the checks on
each commit and pull request.
dbo-shape:ClassShape
  a sh:Shape ;
  sh:targetClass owl:Class ;
  sh:targetSubjectsOf rdfs:subClassOf ;
  sh:severity sh:Error ;
  sh:property [
    sh:message "Each owl:Class should have at least one rdfs:label" ;
    sh:path rdfs:label ;
    sh:minCount 1 ;
    sh:datatype rdf:langString ;
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:message "Each owl:Class should have at least one rdfs:comment" ;
    sh:path rdfs:comment ;
    sh:minCount 1 ;
    sh:datatype rdf:langString ;
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:message "Each owl:Class should have at most one superclass" ;
    sh:path rdfs:subClassOf ;
    sh:maxCount 1 ;
  ] ;
  sh:sparql [
    sh:message "DBpedia Ontology only allows 9 top level classes, any new top level classes need to be discussed" ;
    sh:severity sh:Warning ;
    sh:select """
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT DISTINCT $this ?otherClass
      WHERE {
        $this rdfs:subClassOf owl:Thing .
        FILTER ($this NOT IN (
          <http://dbpedia.org/ontology/Activity>,
          <http://dbpedia.org/ontology/Agent>,
          <http://dbpedia.org/ontology/Concept>,
          <http://dbpedia.org/ontology/CommunicationSystem>,
          <http://dbpedia.org/ontology/Condition>,
          <http://dbpedia.org/ontology/Event>,
          <http://dbpedia.org/ontology/PhysicalThing>,
          <http://dbpedia.org/ontology/Place>,
          <http://dbpedia.org/ontology/TimePeriod>)
        ) .
      } """ ;
  ] .

dbo-shape:PropertyShape
  a sh:Shape ;
  sh:targetClass rdf:Property ;
  sh:targetClass owl:DatatypeProperty ;
  sh:targetClass owl:ObjectProperty ;
  sh:targetSubjectsOf rdfs:subPropertyOf ;
  sh:property [
    sh:message "Each property should have at least one rdfs:label" ;
    sh:path rdfs:label ;
    sh:minCount 1 ;
    sh:datatype rdf:langString ;
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:message "Each property should have at least one rdfs:comment" ;
    sh:path rdfs:comment ;
    sh:minCount 1 ;
    sh:datatype rdf:langString ;
    sh:uniqueLang true ;
  ] ;
  sh:property [
    sh:message "Each property should have at most one rdfs:domain" ;
    sh:path rdfs:domain ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:message "Each property should have an rdfs:domain that is defined as an owl:Class" ;
    sh:path rdfs:domain ;
    sh:class owl:Class ;
  ] ;
  sh:property [
    sh:message "Each property should have at most one rdfs:range" ;
    sh:path rdfs:range ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:message "Each property should have an rdfs:range that is defined as an owl:Class" ;
    sh:path rdfs:range ;
    sh:class owl:Class ;
  ] ;
  sh:property [
    sh:message "Each property should have at most one super property" ;
    sh:path rdfs:subPropertyOf ;
    sh:maxCount 1 ;
  ] .

An interesting part of this use case is the use of SHACL-SPARQL to define the complex
constraint Top-level DBpedia classes must be discussed before defined. Here, only nine specific classes
are allowed as top-level classes (i.e., classes with no superclass except owl:Thing), and they are
hard-coded in the SPARQL query. Even though this creates a tight coupling of the shape to the data,
top-level DBpedia classes do not change frequently, and adjusting the constraint can indeed
stimulate discussion.
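The logic of the SPARQL constraint can be sketched in a few lines of Python. This is only an illustration of what the query selects, under the assumption that the ontology is available as (class, superclass) pairs written with prefixed names; the function name `unexpected_top_level_classes` and the class `dbo:Gadget` are hypothetical.

```python
# Illustrative sketch of the top-level class check. The nine allowed classes
# are the ones hard-coded in the SPARQL query of dbo-shape:ClassShape.
ALLOWED_TOP_LEVEL = frozenset({
    "dbo:Activity", "dbo:Agent", "dbo:Concept", "dbo:CommunicationSystem",
    "dbo:Condition", "dbo:Event", "dbo:PhysicalThing", "dbo:Place",
    "dbo:TimePeriod",
})


def unexpected_top_level_classes(sub_class_of):
    """Return the classes declared directly under owl:Thing that are not allowed.

    `sub_class_of` is an iterable of (class, superclass) pairs, a simplified
    stand-in for the rdfs:subClassOf triples of the ontology.
    """
    return sorted({
        cls
        for cls, superclass in sub_class_of
        if superclass == "owl:Thing" and cls not in ALLOWED_TOP_LEVEL
    })


# dbo:Gadget is a made-up class used only to trigger the warning.
pairs = [
    ("dbo:Place", "owl:Thing"),
    ("dbo:City", "dbo:Place"),
    ("dbo:Gadget", "owl:Thing"),
]
print(unexpected_top_level_classes(pairs))
```

As in the shape, a non-empty result would be reported as a warning rather than an error, leaving the decision to the community.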

6.5 SHEX FOR SHEX


Given that one serialization format for ShEx is RDF, it is possible to use ShEx to validate itself,
i.e., to validate RDF graphs representing ShEx schemas. The RDF representation of ShEx is
called ShExR.
The following example contains a simple ShEx schema using ShExR in Turtle:
<> a sx:Schema ;
  sx:shapes :User .

:User a sx:Shape ;
  sx:expression [
    a sx:EachOf ;
    sx:expressions (
      [ a sx:TripleConstraint ;
        sx:predicate schema:name ;
        sx:valueExpr [ a sx:NodeConstraint ;
                       sx:datatype xsd:string ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:gender ;
        sx:valueExpr [ a sx:NodeConstraint ;
                       sx:values ( schema:Male schema:Female )
        ]
      ]
    )
  ] .

In the following, we describe the ShEx schema that can validate RDF files in ShExR
(as above). The full code is included in Appendix C and has been adapted from Appendix C
(ShEx shape) of the ShEx specification.15
ShExR graphs contain an RDF node with rdf:type sx:Schema, an optional list of starting
semantic actions, an optional start declaration, and zero or more sx:shapes declarations whose
values must be shape expressions <ShapeExpr>.
Most of the shapes in this schema are defined as CLOSED to limit the appearance of
unexpected triples.
<Schema> CLOSED {
  a [ sx:Schema ] ;
  sx:startActs @<SemActList1Plus>? ;
  sx:start @<ShapeExpr>? ;
  sx:shapes @<ShapeExpr>*
}

As discussed in Section 4.4.3, there are six kinds of shape expressions, which can be
enumerated as:
<ShapeExpr> @<ShapeOr> OR
            @<ShapeAnd> OR
            @<ShapeNot> OR
            @<NodeConstraint> OR
            @<Shape> OR
            @<ShapeExternal>

<ShapeOr> and <ShapeAnd> have a similar representation which contains a list of at least two
shape expressions represented by the <shapeExprList2Plus> shape, which will be described later.
<ShapeOr> CLOSED {
  a [ sx:ShapeOr ] ;
  sx:shapeExprs @<shapeExprList2Plus>
}

<ShapeAnd> CLOSED {
  a [ sx:ShapeAnd ] ;
  sx:shapeExprs @<shapeExprList2Plus>
}

<ShapeNot> contains a shape expression:
<ShapeNot> CLOSED {
  a [ sx:ShapeNot ] ;
  sx:shapeExpr @<ShapeExpr>
}

15 http://shex.io/shex-semantics/#shexr

The following code represents lists of shape expressions. <shapeExprList2Plus> is a list of at
least two shape expressions, and <shapeExprList1Plus> is a list of at least one.
<shapeExprList2Plus> CLOSED {
  rdf:first @<ShapeExpr> ;
  rdf:rest @<shapeExprList1Plus>
}
<shapeExprList1Plus> CLOSED {
  rdf:first @<ShapeExpr> ;
  rdf:rest [ rdf:nil ] OR @<shapeExprList1Plus>
}

Node constraints are formed by one or more declarations of node kind, datatype, string
facet, numeric facet, or a list of possible values.
<NodeConstraint> CLOSED {
  a [ sx:NodeConstraint ] ;
  ( sx:nodeKind [ sx:iri sx:bnode sx:literal sx:nonliteral ]
  | sx:datatype IRI
  | &<stringFacet>
  | &<numericFacet>
  | sx:values @<valueSetValueList1Plus>
  )+
}

A shape can contain the Boolean directive sx:closed, a list of sx:extra IRIs, an optional
triple expression (sx:expression), and an optional list of semantic actions.
<Shape> CLOSED {
  a [ sx:Shape ] ;
  sx:closed [ true false ]? ;
  sx:extra IRI* ;
  sx:expression @<tripleExpression>? ;
  sx:semActs @<SemActList1Plus>? ;
}

External shapes only contain a type declaration.


<ShapeExternal> CLOSED {
  a [ sx:ShapeExternal ] ;
}

Semantic actions contain a sx:name that points to an IRI describing the processor and a
sx:code value with the string code that will be passed to that processor.
<SemAct> CLOSED {
  a [ sx:SemAct ] ;
  sx:name IRI ;
  sx:code xsd:string ?
}

Annotations contain a predicate (which must be an IRI) and an object.


<Annotation> CLOSED {
  a [ sx:Annotation ] ;
  sx:predicate IRI ;
  sx:object @<objectValue>
}

String and numeric facets just enumerate the different possibilities:


<stringFacet> {
  sx:length xsd:integer
  | sx:minlength xsd:integer
  | sx:maxlength xsd:integer
  | sx:pattern xsd:string
}
<numericFacet> {
  sx:mininclusive @<numericLiteral>
  | sx:minexclusive @<numericLiteral>
  | sx:maxinclusive @<numericLiteral>
  | sx:maxexclusive @<numericLiteral>
  | sx:totaldigits xsd:integer
  | sx:fractiondigits xsd:integer
}
<numericLiteral> xsd:integer OR
                 xsd:decimal OR
                 xsd:double

The values that can appear in a value set are object values, stems, or ranges:
<valueSetValue> @<objectValue>
  OR @<IriStem> OR @<IriStemRange>
  OR @<LiteralStem> OR @<LiteralStemRange>
  OR @<LanguageStem> OR @<LanguageStemRange>

Object values can be IRIs or literals:


<objectValue> IRI OR LITERAL

Stems and ranges are defined for the different possibilities: IRIs, literals, or language-
tagged literals.
<IriStem> CLOSED { a [ sx:IriStem ] ; sx:stem xsd:anyURI }
<IriStemRange> CLOSED {
  a [ sx:IriStemRange ] ;
  sx:stem xsd:anyURI OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<IriStem>*
}
<LiteralStem> CLOSED { a [ sx:LiteralStem ] ; sx:stem xsd:string }
<LiteralStemRange> CLOSED {
  a [ sx:LiteralStemRange ] ;
  sx:stem xsd:string OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<LiteralStem>*
}
<LanguageStem> CLOSED { a [ sx:LanguageStem ] ; sx:stem xsd:string }
<LanguageStemRange> CLOSED {
  a [ sx:LanguageStemRange ] ;
  sx:stem xsd:string OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<LanguageStem>*
}
<Wildcard> BNODE CLOSED {
  a [ sx:Wildcard ]
}

A triple expression is either a triple constraint, an inclusion of another shape expression,
or a composed triple expression made from <OneOf> or <EachOf>.
<tripleExpression> @<TripleConstraint> OR
                   @<OneOf> OR
                   @<EachOf> OR
                   @<Inclusion>

The definitions of <OneOf> and <EachOf> are very similar: they contain sx:min and sx:max
cardinalities, a list of at least two triple expressions, an optional list of semantic actions, and a
list of annotations.
<OneOf> CLOSED {
  a [ sx:OneOf ] ;
  sx:min xsd:integer ? ;
  sx:max xsd:integer ? ;
  sx:expressions @<tripleExpressionList2Plus> ;
  sx:semActs @<SemActList1Plus>? ;
  sx:annotation @<Annotation>*
}

<EachOf> CLOSED {
  a [ sx:EachOf ] ;
  sx:min xsd:integer ? ;
  sx:max xsd:integer ? ;
  sx:expressions @<tripleExpressionList2Plus> ;
  sx:semActs @<SemActList1Plus>? ;
  sx:annotation @<Annotation>*
}

<tripleExpressionList2Plus> declares a list of at least two triple expressions.


<tripleExpressionList2Plus> CLOSED {
  rdf:first @<tripleExpression> ;
  rdf:rest @<tripleExpressionList1Plus>
}
<tripleExpressionList1Plus> CLOSED {
  rdf:first @<tripleExpression> ;
  rdf:rest [ rdf:nil ] OR
           @<tripleExpressionList1Plus>
}

A <TripleConstraint> contains a mandatory sx:predicate property, an optional value expression,
the cardinality declarations sx:min and sx:max, the sx:inverse and sx:negated qualifiers, and
the semantic actions and annotations.
<TripleConstraint> CLOSED {
  a [ sx:TripleConstraint ] ;
  sx:inverse [ true false ]? ;
  sx:negated [ true false ]? ;
  sx:min xsd:integer ? ;
  sx:max xsd:integer ? ;
  sx:predicate IRI ;
  sx:valueExpr @<ShapeExpr>? ;
  sx:semActs @<SemActList1Plus>? ;
  sx:annotation @<Annotation>*
}

An inclusion has a predicate sx:include that points to an IRI or a blank node (non-literals).
<Inclusion> CLOSED {
  a [ sx:Inclusion ]? ;
  sx:include NONLITERAL
}

The following definitions declare lists of at least one element: semantic actions or value
set values.
<SemActList1Plus> CLOSED {
  rdf:first @<SemAct> ;
  rdf:rest [ rdf:nil ] OR @<SemActList1Plus>
}
<valueSetValueList1Plus> CLOSED {
  rdf:first @<valueSetValue> ;
  rdf:rest [ rdf:nil ] OR @<valueSetValueList1Plus>
}

6.6 SHACL IN SHACL
In this section we describe how to use SHACL to validate shapes graphs that contain SHACL
code. This is similar to what we described in the previous section, although in this case we are
using SHACL to validate SHACL. The full code described in this section appears in Appendix D
and has been adapted from Appendix C of the SHACL specification. We have made some
modifications to the original code for readability.
The document declares the shape of shapes :ShapeShape as a sh:NodeShape that contains a
long list of target declarations to define the nodes that must be validated as shapes.
:ShapeShape a sh:NodeShape ;
  sh:targetClass sh:NodeShape , sh:PropertyShape ;
  sh:targetSubjectsOf sh:targetClass , sh:targetNode ,
                      sh:targetObjectsOf , sh:targetSubjectsOf ,
                      sh:and , sh:class , sh:closed , sh:datatype ,
                      sh:disjoint , sh:equals , sh:flags , sh:hasValue ,
                      ... # All the other constraint component parameters
  sh:targetObjectsOf sh:node , sh:not , sh:property , sh:qualifiedValueShape .

It declares that every node that is an instance of sh:NodeShape or sh:PropertyShape
must conform to :ShapeShape and that the subjects of properties sh:targetClass, sh:targetNode,
… must also conform to :ShapeShape, as well as the objects of sh:node, sh:not, sh:property, and
sh:qualifiedValueShape.
The next statement declares that nodes conforming to :ShapeShape must conform to exactly
one of :NodeShapeShape or :PropertyShapeShape.

:ShapeShape
  sh:xone ( :NodeShapeShape :PropertyShapeShape ) ;

The following statements declare the types of values that can be associated with target
declarations.
:ShapeShape sh:property [
    sh:path sh:targetNode ;
    sh:nodeKind sh:IRIOrLiteral ;
  ] ;
  sh:property [
    sh:path sh:targetClass ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetSubjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetObjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  ...

In the same way, it declares the values that the parameters of the different constraint
components can take.
:ShapeShape sh:property [
    sh:path sh:severity ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:deactivated ;
    sh:maxCount 1 ;
    sh:in ( true false ) ;
  ] ;
  sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:class ;
    sh:nodeKind sh:IRI ;
  ] ;
  ...

We omit the full list of declarations as all of them follow the same style. They declare the
expected value of each predicate. For example, the last declaration states that the values of
the predicate sh:class must be IRIs.
A remarkable aspect is the following declaration:
:ShapeShape sh:or (
    [ sh:not [
        sh:class rdfs:Class ;
        sh:or ( [ sh:class sh:NodeShape ]
                [ sh:class sh:PropertyShape ]
              )
      ]
    ]
    [ sh:nodeKind sh:IRI ]
  ) .

It represents a syntax rule of implicit class targets (see Section 5.7.3) by which a node
shape or property shape that is also an instance of rdfs:Class must be an IRI. This is an example
of an IF-THEN pattern (see Section 5.11.5) and could be defined in pseudo-code as:
IF ( sh:class rdfs:Class AND
     ( sh:class sh:NodeShape OR sh:class sh:PropertyShape )
   ) THEN sh:nodeKind sh:IRI
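The encoding relies on the logical equivalence between "IF A THEN B" and "(NOT A) OR B", which is why the shape is written as sh:or over a sh:not and a sh:nodeKind constraint. The following Python sketch checks that equivalence by enumerating all truth assignments; the function names and the four Boolean parameters are illustrative stand-ins for the SHACL conditions, not part of any SHACL API.

```python
from itertools import product


def if_then(condition, consequence):
    """Material implication written the way the shape encodes it: (NOT A) OR B."""
    return (not condition) or consequence


def shape_rule(is_class, is_node_shape, is_property_shape, is_iri):
    """Illustrative stand-in for the sh:or declaration of :ShapeShape."""
    return if_then(is_class and (is_node_shape or is_property_shape), is_iri)


# Enumerate every combination of the four conditions and keep the failures.
violations = [
    combo for combo in product([False, True], repeat=4)
    if not shape_rule(*combo)
]
for combo in violations:
    print(combo)
```

The only failing combinations are those where the node is an rdfs:Class, is a node shape or a property shape, and is not an IRI, which is exactly the situation the syntax rule forbids.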
Another interesting declaration is:
:ShapeShape sh:property [
    sh:path sh:message ;
    sh:or ( [ sh:datatype xsd:string ]
            [ sh:datatype rdf:langString ] ) ;
  ] .

which declares that messages can be any string literal or language-tagged string literal, which is
a common pattern for messages that admit not only plain string literals but multilingual ones.
Another aspect worth noting is the use of :ListShape as the value of several predicates
like sh:and, sh:or, sh:in, sh:ignoredProperties, and sh:xone.
The declarations are:
:ShapeShape sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:or ;
    sh:node :ListShape ;
  ] ;
  # ... similar for the other predicates
  .

The meaning is that the values of those predicates must be well-formed RDF lists (see
Section 2.2).
An RDF list is a collection of nodes linked by the rdf:rest predicate whose last node is
rdf:nil. Each node in the list other than rdf:nil must have exactly one value of rdf:first.
:ListShape is defined as:

:ListShape a sh:NodeShape ;
  sh:property [ sh:path [ sh:zeroOrMorePath rdf:rest ] ;
                sh:hasValue rdf:nil ;
                sh:node :ListNodeShape ;
  ] .

which means that rdf:nil must be reachable from the head of the list by following the predicate
rdf:rest zero or more times, and that all the nodes reached in this way must conform to
:ListNodeShape, which is defined as:
:ListNodeShape a sh:NodeShape ;
  sh:or (
    [ sh:hasValue rdf:nil ;
      sh:property [ sh:path rdf:first ; sh:maxCount 0 ] ;
      sh:property [ sh:path rdf:rest ; sh:maxCount 0 ] ;
    ]
    [ sh:not [ sh:hasValue rdf:nil ] ;
      sh:property [ sh:path rdf:first ; sh:maxCount 1 ; sh:minCount 1 ] ;
      sh:property [ sh:path rdf:rest ; sh:maxCount 1 ; sh:minCount 1 ] ;
    ]) .

This means that a list node is either rdf:nil, in which case it must not have any arc with
predicates rdf:first or rdf:rest, or a node with exactly one value for those predicates. In this
case, the pattern followed is an IF-THEN-ELSE pattern.
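The combined effect of :ListShape and :ListNodeShape can be paraphrased procedurally. The Python sketch below is an approximation written for illustration, not an implementation of SHACL: triples are plain tuples of strings, `is_well_formed_list` is a hypothetical helper, and the blank node labels in the sample data are made up.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def is_well_formed_list(head, triples):
    """Check that `head` starts a well-formed RDF list.

    Mirrors the two branches of :ListNodeShape: rdf:nil has neither
    rdf:first nor rdf:rest, and every other node has exactly one of each.
    Following rdf:rest must eventually reach rdf:nil (no cycles).
    """
    node = head
    seen = set()
    while True:
        firsts = [o for s, p, o in triples if s == node and p == RDF_FIRST]
        rests = [o for s, p, o in triples if s == node and p == RDF_REST]
        if node == RDF_NIL:
            return not firsts and not rests
        if node in seen or len(firsts) != 1 or len(rests) != 1:
            return False
        seen.add(node)
        node = rests[0]


# A two-element list, a node with two rdf:first values, and a cyclic list.
good = {
    ("_:b1", RDF_FIRST, ":A"), ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, ":B"), ("_:b2", RDF_REST, RDF_NIL),
}
two_firsts = {
    ("_:b1", RDF_FIRST, ":A"), ("_:b1", RDF_FIRST, ":B"),
    ("_:b1", RDF_REST, RDF_NIL),
}
cyclic = {("_:c1", RDF_FIRST, ":A"), ("_:c1", RDF_REST, "_:c1")}

print(is_well_formed_list("_:b1", good))
print(is_well_formed_list(RDF_NIL, good))
print(is_well_formed_list("_:b1", two_firsts))
print(is_well_formed_list("_:c1", cyclic))
```

The cycle check plays the role of sh:hasValue rdf:nil in :ListShape: a list that never reaches rdf:nil is rejected.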
In the case of sh:ignoredProperties and sh:languageIn, the list nodes must also conform to
some specific shape (to be an IRI or a string). This can be expressed as:
:ShapeShape sh:property [
    sh:path ( sh:ignoredProperties [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path ( sh:languageIn [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:datatype xsd:string ;
  ] .

Similarly, a constraint is established on the values of sh:and, sh:or and sh:xone which must
be lists of nodes conforming to :ShapeShape. This is declared as:
:ShapesListShape a sh:NodeShape ;
  sh:property [
    sh:path ( [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:node :ShapeShape ;
  ] .

Some properties, like sh:path, sh:lessThan, sh:minCount, etc., cannot be applied to node
shapes. This constraint is declared as:
:NodeShapeShape a sh:NodeShape ;
  sh:property [ sh:path sh:path ; sh:maxCount 0 ] ;
  sh:property [ sh:path sh:lessThan ; sh:maxCount 0 ] ;
  sh:property [ sh:path sh:maxCount ; sh:maxCount 0 ] ;
  ... # Similar for sh:lessThanOrEquals , sh:minCount ,
      # sh:qualifiedValueShape and sh:uniqueLang

Property shapes must have exactly one value for property sh:path.
:PropertyShapeShape a sh:NodeShape ;
  sh:property [ sh:path sh:path ;
                sh:maxCount 1 ; sh:minCount 1 ;
                sh:node :PathShape
  ] .

The value of sh:path must conform to :PathShape. The first version of :PathShape employed
recursion with the following pattern:
:PathShape a sh:NodeShape ;
  sh:xone (
    [ sh:nodeKind sh:IRI ]
    [ sh:nodeKind sh:BlankNode ;
      sh:node :PathListWithAtLeast2Members ;
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:alternativePath ;
                    sh:node :PathListWithAtLeast2Members ;
                    sh:minCount 1 ; sh:maxCount 1 ;
      ]
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:inversePath ;
                    sh:node :PathShape ; # Recursive reference
                    sh:minCount 1 ; sh:maxCount 1 ;
      ]
    ]
    ... # similar for sh:zeroOrMorePath , sh:oneOrMorePath
        # and sh:zeroOrOnePath
  ) ;
  .

However, as recursion is undefined in SHACL, that definition was changed to simulate
recursion using the property path sh:zeroOrMorePath with an auxiliary shape (see Section 5.12.1).
The new definition is:
:PathShape a sh:NodeShape ;
  sh:property [ sh:path [ sh:zeroOrMorePath _:PathPath ] ;
                sh:node :PathNodeShape ;
  ] .

_:PathPath sh:alternativePath (
    ( [ sh:zeroOrMorePath rdf:rest ] rdf:first )
    ( sh:alternativePath [ sh:zeroOrMorePath rdf:rest ] rdf:first )
    sh:inversePath
    sh:zeroOrMorePath
    sh:oneOrMorePath
    sh:zeroOrOnePath
  ) .

:PathNodeShape sh:xone (
    [ sh:nodeKind sh:IRI ]
    [ sh:nodeKind sh:BlankNode ;
      sh:node :PathListWithAtLeast2Members ;
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:alternativePath ;
                    sh:node :PathListWithAtLeast2Members ;
                    sh:minCount 1 ; sh:maxCount 1 ;
      ]
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:inversePath ;
                    sh:minCount 1 ; sh:maxCount 1 ;
      ]
    ]
    ... # similar for sh:zeroOrMorePath , sh:oneOrMorePath
        # and sh:zeroOrOnePath
  ) .

The previous definitions use the following auxiliary shape :PathListWithAtLeast2Members:


:PathListWithAtLeast2Members a sh:NodeShape ;
  sh:node :ListShape ;
  sh:property [ sh:path [ sh:oneOrMorePath rdf:rest ] ;
                sh:minCount 2 ; # 1 other list node plus rdf:nil
  ] .

The last two definitions declare that the values of sh:shapesGraph and the values of
sh:entailment must be IRIs.

:ShapesGraphShape a sh:NodeShape ;
  sh:targetObjectsOf sh:shapesGraph ;
  sh:nodeKind sh:IRI .

:EntailmentShape a sh:NodeShape ;
  sh:targetObjectsOf sh:entailment ;
  sh:nodeKind sh:IRI .

6.7 SUMMARY
• ShEx and SHACL can be used to describe and validate linked data portals. We show how
they can be used to describe the WebIndex data model.
• FHIR describes an abstract information model which can be represented in JSON, XML,
and RDF. FHIR/RDF data model is described using ShEx.
• Springer Nature SciGraph is an early adopter of SHACL to validate data.
• DBpedia is an example of a big linked data portal whose needs for validation offer new
challenges.
• The RDF representation of ShEx can be described and validated in ShEx.
• SHACL Core shapes graphs can be described and validated in SHACL.

6.8 SUGGESTED READING


• Paper describing the WebIndex: J. E. Labra Gayo, E. Prud’hommeaux, H. Solbrig, and
I. Boneva. Validating and describing linked data portals using shapes.
http://arxiv.org/abs/1701.08924

• FHIR linked data model. Describes the RDF data model used in FHIR and its use of
ShEx: D. Booth. FHIR linked data module.
https://www.hl7.org/fhir/linked-data-module.html, April 2017.

• Paper describing the use of RDFUnit on DBpedia as well as other large-scale RDF
datasets: D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen,
and A. Zaveri. Test-driven evaluation of linked data quality. In Proc. of the 23rd
International Conference on World Wide Web, WWW’14, pages 747–758, Republic and Canton
of Geneva, Switzerland, International World Wide Web Conferences Steering Committee,
2014. DOI: 10.1145/2566486.2568002

• Paper describing the mappings-based validation applied in DBpedia: A. Dimou,
D. Kontokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens, S. Hellmann, and
R. Van de Walle. Assessing and refining mappings to RDF to improve dataset quality. In
Proc. of the 14th International Semantic Web Conference, October 2015.
DOI: 10.1007/978-3-319-25010-6_8

• Paper describing the integration of SHACL with Travis CI for validating DBpedia link
contributions: M. Dojchinovski, D. Kontokostas, R. Rößling, M. Knuth, and S. Hellmann.
DBpedia links: The hub of links for the web of data. In Proc. of the SEMANTiCS
Conference (SEMANTiCS 2016), September 2016.
https://svn.aksw.org/papers/2016/SEMANTiCS_DBpedia_Links/public.pdf
CHAPTER 7

Comparing ShEx and SHACL


In this chapter we present a comparison between ShEx and SHACL. The technologies have
similar goals and similar features. In fact, at the start of the Data Shapes Working Group in
2014, convergence on a unified approach was considered possible. However, this did not happen,
and as of July 2017 both technologies are maintained as separate solutions.
We start by describing some of the common features that they share, followed by a review
of the main differences.

7.1 COMMON FEATURES


ShEx and SHACL share the same goal: to provide a mechanism for describing and validating RDF
data using a high-level language. As a result, they have many features in common.
• Shapes. Both define the notion of a shape, as something that contains constraints on the
topology of RDF nodes. SHACL shapes are similar to ShEx shape expressions, with the
difference that links to data nodes are expressed in SHACL by target declarations and in
ShEx by shape maps. In most of the common cases, it is possible to translate between
ShEx and SHACL.

Example 7.1 Similarities between ShEx and SHACL


Consider the following SHACL shapes graph:
:User a sh:NodeShape ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path schema:gender ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:or (
      [ sh:in ( schema:Male schema:Female ) ]
      [ sh:datatype xsd:string ]
    )
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:maxCount 1 ;
    sh:datatype xsd:date ;
  ] .

This can be expressed in a ShEx schema:


:User IRI {
  schema:name xsd:string ;
  schema:gender [ schema:Male schema:Female ] OR xsd:string ;
  schema:birthDate xsd:date ?
}

• Node constraints. Both languages have the notion of node constraints and share similar
expressiveness: node kinds, datatypes, datatype facets, value sets, etc. Example 7.1 shows
two declarations which are equivalent in ShEx and SHACL: a node must be an IRI, have
exactly one value for the property schema:name that has datatype xsd:string, have exactly one
value for the property schema:gender which must be one of (schema:Male schema:Female) or a
xsd:string, and optionally have a value for the property schema:birthDate that has datatype
xsd:date.

• Property Constraints. Both languages enable the declaration of constraints on the out-
going and incoming properties of a node.

Example 7.2 Constraints on incoming/outgoing properties in ShEx/SHACL


The following SHACL shapes graph states that nodes that conform to :User have exactly one
outgoing property schema:name and exactly one incoming property schema:member from an
organization.
:User a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path [ sh:inversePath schema:member ] ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:node :Organization ;
  ] .

:Organization a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:hasValue :Organization ;
  ] .

can be expressed in ShEx as:


:User {
  schema:name xsd:string ;
  ^schema:member @:Organization
}
:Organization { a [ :Organization ] }

Given the following data:


:alice a :User ;            # Passes as :User
  schema:name "Alice" .

:bob a :User ;              # Fails as :User
  schema:name "Robert" .

:myCompany a :Organization ;
  schema:member :alice .

Both ShEx and SHACL check that :alice conforms to the :User shape and raise an error
for :bob because there is no arc schema:member from a node with shape :Organization pointing
to :bob.
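The check that both languages perform on this data can be paraphrased in Python. The sketch below is an illustration under simplifying assumptions: triples are plain tuples of strings, the xsd:string datatype check is omitted, and `conforms_to_user` and `is_organization` are hypothetical helpers rather than functions of any ShEx or SHACL library.

```python
NAME, MEMBER, TYPE = "schema:name", "schema:member", "rdf:type"

# The data of Example 7.2 written as (subject, predicate, object) tuples.
DATA = [
    (":alice", TYPE, ":User"), (":alice", NAME, "Alice"),
    (":bob", TYPE, ":User"), (":bob", NAME, "Robert"),
    (":myCompany", TYPE, ":Organization"), (":myCompany", MEMBER, ":alice"),
]


def is_organization(node, triples):
    """Exactly one rdf:type, and it must be :Organization."""
    types = [o for s, p, o in triples if s == node and p == TYPE]
    return types == [":Organization"]


def conforms_to_user(node, triples):
    """Exactly one outgoing schema:name and one incoming schema:member
    whose subject conforms to :Organization."""
    names = [o for s, p, o in triples if s == node and p == NAME]
    members_of = [s for s, p, o in triples if o == node and p == MEMBER]
    return (
        len(names) == 1
        and len(members_of) == 1
        and all(is_organization(org, triples) for org in members_of)
    )


for candidate in (":alice", ":bob"):
    print(candidate, conforms_to_user(candidate, DATA))
```

The incoming arc is found by matching the node in object position, which is what sh:inversePath in SHACL and the ^ prefix in ShEx express declaratively.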

• Cardinalities. Both languages can constrain the number of values for a property to a
specific range, or leave the maximum number of values unbounded.
• RDF syntax. Both ShEx and SHACL can use RDF concrete syntaxes though with different
vocabularies.
• Logical operators. Both ShEx and SHACL have the logical operators And, Or and Not.
ShEx has the operator | to represent “oneOf ” while SHACL has xone to represent exactly
one.

Example 7.3 Example with logical operators


Imagine that in some domain, a :Product must have a schema:productID with a value that
either starts with P (matches regular expression "^P") or ends with a digit (regular expression
"[0-9]$") and is not "P23".
It can be expressed in ShEx as:
:Product ({
  schema:productID /^P/i ;
} OR {
  schema:productID /[0-9]$/ ;
}) AND NOT {
  schema:productID [ "P23" ]
}

and in SHACL as:


:ProductShape a sh:NodeShape ;
  sh:targetClass :Product ;
  sh:or (
    [ sh:path schema:productID ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:pattern "^P" ;
      sh:flags "i"
    ]
    [ sh:path schema:productID ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:pattern "[0-9]$" ;
    ]
  ) ;
  sh:not [
    sh:path schema:productID ;
    sh:hasValue "P23"
  ] .

Given the following data:


:p45 a :Product ;           # Passes as :Product
  schema:productID "P45" .

:x23 a :Product ;           # Passes as :Product
  schema:productID "X23" .

:p23 a :Product ;           # Fails as :Product
  schema:productID "P23" .

:xx a :Product ;            # Fails as :Product
  schema:productID "xx" .
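The Boolean structure of Example 7.3 can be reproduced in Python to confirm which of the four product identifiers pass. This is a sketch for a single schema:productID value, with the function name `conforms_to_product` chosen for illustration; the case-insensitive flag mirrors the /i modifier in ShEx and sh:flags "i" in SHACL.

```python
import re


def conforms_to_product(product_id):
    """(starts with P, case-insensitively, OR ends with a digit) AND NOT "P23"."""
    starts_with_p = re.search(r"^P", product_id, re.IGNORECASE) is not None
    ends_with_digit = re.search(r"[0-9]$", product_id) is not None
    return (starts_with_p or ends_with_digit) and product_id != "P23"


# The four schema:productID values from the example data.
for value in ("P45", "X23", "P23", "xx"):
    print(value, conforms_to_product(value))
```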

• Extension mechanism. Both ShEx and SHACL have extension mechanisms that support
the declaration of more advanced constraints. ShEx has semantic actions (see Section 4.10)
and SHACL has SHACL-SPARQL (see Section 5.16). In Section 7.18, we compare the
ShEx and SHACL extension mechanisms in more detail.
7.2 SYNTACTIC DIFFERENCES
The design of ShEx emphasized human readability, with a compact grammar that follows tra-
ditional language design principles and a compact syntax evolved from Turtle. The specification
defines an abstract syntax. The compact syntax (ShExC), a concrete JSON syntax (ShExJ), or
any of the concrete syntaxes for RDF may be used to express a ShEx schema.
SHACL uses the RDF abstract syntax and concrete syntaxes directly. The SHACL spec-
ification enumerates circa 120 rules that define what constitutes a well-formed SHACL shapes
graph.1 SHACL processors can simply omit ill-formed shapes graphs.
A compact syntax inspired by ShEx has been proposed for a subset of SHACL as a WG
Note (see Section 5.18) but it is not mandatory, and compliant SHACL processors are only
required to handle the RDF syntax.
As the SHACL compact syntax was inspired by ShExC, they look similar, but there are
several semantic differences.

Example 7.4 Comparing ShEx and SHACL compact syntaxes


Given the following ShEx schema:
:Product {
  schema:productId /^[A-R]/ ;
  schema:productId /^[M-Z]/ ;
  schema:brand IRI @:Organization * ;
  schema:purchaseDate xsd:date ?
}
:Organization {
  schema:name xsd:string
}

A similar (but not equivalent) representation using SHACL compact syntax is:
:Product {
  schema:productId xsd:string [1..1] pattern="^[A-R]" .
  schema:productId xsd:string [1..1] pattern="^[M-Z]" .
  schema:brand IRI @:Organization [0..*] .
  schema:purchaseDate xsd:date [0..1]
}
:Organization {
  schema:name xsd:string
}

Though the examples look similar on the surface, there are several subtle differences. The
ShEx schema says that there must be two values for the property schema:productId, one matching
"^[A-R]" and the other matching "^[M-Z]". In contrast, the SHACL shapes graph says that there
is only one property schema:productId, which must satisfy both regular expressions.
1 The complete list of rules is defined in https://www.w3.org/TR/shacl/#syntax-rules.
Given the following RDF data:
    :p1 a :Product ;              # ✓ Passes as :Product using ShEx
      schema:productId "AB" ;     # ✗ Fails as :Product using SHACL
      schema:productId "XY" ;
      schema:brand :myBrand .

    :p2 a :Product ;              # ✗ Fails as :Product using ShEx
      schema:productId "MON" ;    # ✓ Passes as :Product using SHACL
      schema:brand :myBrand .

    :myBrand schema:name "MyBrand" .

Node :p1 conforms to the ShEx definition of :Product but does not conform to the SHACL
definition, because each SHACL constraint on schema:productId applies to every value of the
property and both must be satisfied. Node :p2 does not conform to the ShEx definition because it
has only one schema:productId, but it conforms to the SHACL definition because its single value
satisfies all constraints.
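
The two readings of Example 7.4 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a node is reduced to the list of its schema:productId values, Python's `re.search` stands in for the regular expression matching of both languages, and the function names `shex_like` and `shacl_like` are hypothetical, not part of any ShEx or SHACL library.

```python
import re
from itertools import permutations

# The two patterns declared for schema:productId in Example 7.4.
PATTERNS = [r"^[A-R]", r"^[M-Z]"]


def shex_like(values):
    """ShEx reading: each triple constraint needs its own, distinct value."""
    if len(values) != len(PATTERNS):  # one value per constraint, none left over
        return False
    return any(
        all(re.search(pattern, value) for pattern, value in zip(PATTERNS, order))
        for order in permutations(values)
    )


def shacl_like(values):
    """SHACL reading: both property shapes apply to every value of the path."""
    exactly_one = len(values) == 1  # sh:minCount 1 and sh:maxCount 1
    return exactly_one and all(
        re.search(pattern, value) for pattern in PATTERNS for value in values
    )


p1 = ["AB", "XY"]  # schema:productId values of :p1
p2 = ["MON"]       # schema:productId values of :p2

for name, values in (("p1", p1), ("p2", p2)):
    print(name, "ShEx:", shex_like(values), "SHACL:", shacl_like(values))
```

The ShEx-like check distributes the values among the constraints, while the SHACL-like check applies every constraint to every value, which is why the two nodes receive opposite verdicts.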

The RDF vocabulary of ShEx is also different from SHACL.

Example 7.5
The RDF representation of Example 7.4 in ShEx is:
    :Product a sx:Shape ;
      sx:expression [ a sx:EachOf ;
        sx:expressions (
          [ a sx:TripleConstraint ;
            sx:predicate schema:productId ;
            sx:valueExpr [ a sx:NodeConstraint ;
                           sx:pattern "^[A-R]" ]
          ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:productId ;
            sx:valueExpr [ a sx:NodeConstraint ;
                           sx:pattern "^[M-Z]" ]
          ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:brand ;
            sx:min 0 ; sx:max -1 ;
            sx:valueExpr [ a sx:ShapeAnd ;
              sx:shapeExprs (
                [ a sx:NodeConstraint ; sx:nodeKind sx:iri ]
                :Organization
              )
            ]
          ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:purchaseDate ;
            sx:min 0 ; sx:max 1 ;
            sx:valueExpr [ a sx:NodeConstraint ;
                           sx:datatype xsd:date ]
          ]
        )
      ] .

Here is the RDF encoding of the SHACL shapes graph in Example 7.4:
    :Product a sh:NodeShape ;
      sh:property [
        sh:path schema:productId ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:pattern "^[A-R]" ;
      ] ;
      sh:property [
        sh:path schema:productId ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:pattern "^[M-Z]" ;
      ] ;
      sh:property [
        sh:path schema:brand ;
        sh:nodeKind sh:IRI ;
        sh:node :Organization
      ] ;
      sh:property [
        sh:path schema:purchaseDate ;
        sh:maxCount 1 ;
        sh:datatype xsd:date
      ] .

7.3 FOUNDATION: SCHEMA VS. CONSTRAINTS


Although both languages share a common goal, their designs are based on different approaches.
The designers of ShEx intended the language to be like a grammar or schema for RDF
graphs. This design was inspired by languages such as Yacc, RelaxNG, and XML Schema. The
main goal was to describe RDF graph structures so they could be validated against those de-
scriptions.
In contrast, the designers of SHACL aimed at providing a constraint language for RDF.
The main goal of SHACL is to verify that a given RDF graph satisfies a collection of con-
straints. In this sense, SHACL follows the Schematron approach, applied to RDF: it declares
constraints that RDF graphs must fulfill. Just as Schematron relies strongly on XPath, SHACL
relies strongly on SPARQL.
This difference is reflected in how validation results fit in. ShEx implementations usually
construct a data structure representing the parts of the RDF graph that were validated, containing
the nodes and shapes that were matched. After ShEx validation, the result shape map contains a structure
which can be considered as an annotated graph that can be traversed or used for further actions,
such as transforming RDF graphs into other data structures. This structure is analogous to the
Post Schema Validation Infoset from XML Schema (see Section 3.1.3).
In contrast, SHACL describes in detail the errors returned when constraints are not satisfied.
A SHACL validation report (see Section 5.5) can be very useful for detecting and repairing
errors in RDF graphs. When there are no errors, SHACL processors usually report a single
value, sh:conforms true. With SHACL, it can be difficult for users to distinguish the case in
which a node is valid because it was checked against some shape from the case in which a
node is not valid but was ignored by the SHACL processor because it was not reached during
the validation process.
The SHACL recommendation prescribes a basic structure for each violation result but
does not prescribe what information is to be returned when a node is validated. Nevertheless,
SHACL processors can enrich their results. Shaclex, for example, returns information about the
nodes validated.

7.4 INVOKING VALIDATION


SHACL shapes can include target declarations that associate each shape with a set of RDF
nodes and tell SHACL processors how to trigger the validation process (see Section 5.7).

Example 7.6 Target declarations and SHACL invocation


Consider the following SHACL shapes graph:
    :UserShape a sh:NodeShape ;
      sh:targetClass :User ;
      sh:targetObjectsOf schema:member ;
      sh:targetSubjectsOf schema:familyName ;
      sh:targetNode :alice ;
      sh:property [
        sh:path schema:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] .

and the following RDF graph:


    :alice schema:name "Alice" .

    :bob a :User ;
      schema:name "Robert" .

    :myCompany schema:member :carol .

    :carol schema:name "Carol" .

    :dave schema:familyName "Smith" ;
      schema:name "Dave Smith" .

A SHACL processor checks that :alice, :bob, :carol, and :dave conform to :UserShape.
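
How the four target declarations of Example 7.6 select focus nodes can be sketched as follows. The sketch assumes that the data graph is a plain list of (subject, predicate, object) tuples written with prefixed names, and it deliberately ignores rdfs:subClassOf when resolving the class target; the helper `focus_nodes` is a hypothetical name, not a function of any SHACL library.

```python
RDF_TYPE = "rdf:type"

# Data graph of Example 7.6 as (subject, predicate, object) tuples.
triples = [
    (":alice", "schema:name", "Alice"),
    (":bob", RDF_TYPE, ":User"),
    (":bob", "schema:name", "Robert"),
    (":myCompany", "schema:member", ":carol"),
    (":carol", "schema:name", "Carol"),
    (":dave", "schema:familyName", "Smith"),
    (":dave", "schema:name", "Dave Smith"),
]


def focus_nodes(triples, target_class, objects_of, subjects_of, target_node):
    """Union of the nodes selected by each kind of target declaration."""
    nodes = {target_node}                                                       # sh:targetNode
    nodes |= {s for s, p, o in triples if p == RDF_TYPE and o == target_class}  # sh:targetClass
    nodes |= {o for s, p, o in triples if p == objects_of}                      # sh:targetObjectsOf
    nodes |= {s for s, p, o in triples if p == subjects_of}                     # sh:targetSubjectsOf
    return nodes


targets = focus_nodes(triples, ":User", "schema:member", "schema:familyName", ":alice")
print(sorted(targets))
```

Each declaration contributes one node here: :alice by direct reference, :bob through its class, :carol as an object of schema:member, and :dave as a subject of schema:familyName.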
Directly associating target declarations to shapes can become quite verbose (see Sec-
tion 6.6). At the same time, it can limit the reusability of a shape in other contexts. In the
example above, if we import :UserShape in another context where the node :alice represents a
product instead of a user, the SHACL processor will still try to validate the node with that
shape. To avoid such cases, SHACL provides the sh:deactivated directive (see Section 5.13).

While including the target declarations in the schema is a convenient way to trigger val-
idation, it can be considered an anti-pattern because the shape can’t be reused for other data.
Even though this could work in some closed systems, it is impractical for data in open environ-
ments. In the interest of keeping schemas reusable, it is a good practice for SHACL to place
target declarations in a separate file and link this file to the schema with owl:imports.
A ShEx schema declares a constellation of shape expressions that function as a grammar
against which RDF nodes can be tested. The schema itself provides no mechanism for associ-
ating a shape expression with the nodes to which the schema applies. In the interest of making
schemas reusable, ShEx requires that definitions of shapes be decoupled from their application
to particular RDF graphs. ShEx separates the language of schemas, on the one hand, from the
association of shapes with nodes to be validated, on the other, by introducing the notion of shape
maps (see Section 4.9 for more details). This separation of concerns encourages the community
to innovate on node-shape association mechanisms independently from the validation seman-
tics. For example, though the shape map specification currently only supports RDF nodes by
direct reference or by triple pattern, Wikidata versions of ShEx include support for SPARQL
queries over remote endpoints. As such conventions evolve they can be rolled into future versions
of the shape map specification.

Example 7.7 Invoking validation through Shape maps in ShEx


The SHACL shapes graph from Example 7.6 can be expressed in ShEx with the following
query shape map:
    { FOCUS rdf:type :User } @:UserShape ,
    { _ schema:member FOCUS } @:UserShape ,
    { FOCUS schema:familyName _ } @:UserShape ,
    :alice @:UserShape
and removing the target declarations from the shape definition:
    :UserShape {
      schema:name xsd:string
    }

The declarations above behave similarly to the SHACL target declarations. One subtle
difference is that the shape map above selects only direct instances of :User, whereas SHACL
applies the concept of SHACL instance, which also encompasses instances of subclasses of :User.
This possibility can be expressed using property paths in shape maps as:
    { FOCUS rdf:type / rdfs:subClassOf * :User } @:UserShape

Another notable difference between SHACL target node declarations and ShEx shape
maps is the following: when a declared target node in SHACL does not exist in the data graph
and there are no required values for this node in the shape, the node passes validation. In
ShEx, if the node does not exist, validation always fails, regardless of the shape definition.

7.5 MODULARIZATION AND REUSABILITY


SHACL leverages the property owl:imports to enable a shapes graph to import other shapes
graphs. This mechanism, which can be used to provide the basis of a modular design, is described
in Section 5.4.
ShEx has the concept of shapeExternal to declare that the contents of a shape can be ob-
tained from an external source (see Section 4.7.3).
ShEx has a basic import mechanism that allows a schema to dereference another schema
(see Section 4.12), while SHACL can import other shapes graphs using owl:imports (see
Section 5.4). One difference between the two import mechanisms is that ShEx dereferences
the imported schema, while SHACL performs a graph merge, so in SHACL the system
expects to have already fetched all of the relevant shapes graphs.
Both languages support the reuse of shapes through extending a shape with an AND oper-
ator, as described in Section 4.8.1 (ShEx) and Section 5.34 (SHACL).

Example 7.8 Extending shapes in ShEx and SHACL


As a simple example, the following ShEx schema declares a :Product shape and a
:SoldProduct shape:

    :Product {
      schema:productId xsd:string ;
      schema:price xsd:decimal
    }

    :SoldProduct @:Product AND {
      schema:purchaseDate xsd:date ;
      schema:productId /^[A-Z]/
    }

A :SoldProduct has the same constraints as a :Product plus two more: one that further
restricts the property schema:productId and another that requires a new property,
schema:purchaseDate.
Here is an analogous SHACL shapes graph:
    :Product a sh:NodeShape ;
      sh:property [
        sh:path schema:productId ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
      ] ;
      sh:property [
        sh:path schema:price ;
        sh:datatype xsd:decimal ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
      ] .

    :SoldProduct a sh:NodeShape ;
      sh:and (
        :Product
        [ sh:path schema:purchaseDate ;
          sh:datatype xsd:date ;
          sh:minCount 1 ;
          sh:maxCount 1 ;
        ]
        [ sh:path schema:productId ;
          sh:pattern "^[A-Z]" ;
          sh:minCount 1 ;
          sh:maxCount 1 ;
        ]
      ) .

Another way to reuse shapes in SHACL is by leveraging the subclass relationship and the
corresponding target declarations. The example above could be expressed as:
    :Product a sh:NodeShape , rdfs:Class ;
      sh:property [
        sh:path schema:productId ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:price ;
        sh:datatype xsd:decimal ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] .

    :SoldProduct a sh:NodeShape , rdfs:Class ;
      rdfs:subClassOf :Product ;
      sh:property [
        sh:path schema:purchaseDate ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:productId ;
        sh:pattern "^[A-Z]" ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] .

In this approach, :SoldProduct is declared as a subclass of :Product. The rdfs:Class declaration
establishes that all nodes of rdf:type :SoldProduct must conform to shape :SoldProduct and also to
:Product.
One limitation of this approach is that it requires nodes to have the appropriate rdf:type
declaration and requires the rdfs:subClassOf statements to be kept in the data graph.

The reusability of both languages could be improved. For example, there is no notion of
a module, where one might declare internal or hidden shapes, or of public shapes that could be
imported by other modules. Also, there is no notion of a shape extending another shape, inheriting
some properties and redefining others. Such features could potentially be developed for both
languages.

7.6 SHAPES, CLASSES, AND INFERENCE


ShEx is only concerned with RDF graphs as they are presented to the validator. There is no
interaction between the ShEx processor and any inference mechanism. In this way, ShEx can
be used before or after inference. It can even be used to validate the behavior of an inference
engine if one defines the shapes that an RDF graph must have before and after inference (see
an example in Section 4.11).
In contrast, SHACL has some mechanisms that may interact with inference. For exam-
ple, the implicit class target (see Section 5.7.3), which associates a shape with a class, triggers
validation on all nodes that are SHACL instances. The notion of SHACL instance is differ-
ent to the RDF Schema notion of instance because it encompasses instances of a class plus its
sub-classes (as determined by following rdfs:subClassOf links in the data), but does not take into
account all RDFS elements.
The results of applying a SHACL validator may be different if applied to RDF graphs
before or after RDFS inference. As SHACL processors are not required to support full RDFS
inference, they may ignore other RDFS predicates, such as rdfs:domain, rdfs:range, and sub-
properties of rdfs:subClassOf.
For example, consider the following SHACL shape:
    :UserShape a sh:NodeShape ;
      sh:targetClass :User ;
      sh:property [
        sh:path schema:name ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
      ] .

and the following RDF data:


    :Teacher rdfs:subClassOf :User .
    :teaches rdfs:domain :Teacher .

    :frank :teaches :Algebra ;   # Ignored without RDFS inference
      schema:name "Frank" .      # ✓ Passes as :UserShape with RDFS inference

    :grace :teaches :Logic ;     # Ignored without RDFS inference
      schema:name 34 .           # ✗ Fails as :UserShape with RDFS inference

    :oscar a :Teacher ;          # ✗ Fails as :UserShape
      schema:name 45 .

If SHACL is applied after RDFS inference, the system checks whether :frank and :grace
conform to :UserShape. This is because the domain declaration of :teaches allows RDFS to infer
that they are instances of :Teacher and, hence, instances of :User, with the following results:
• :grace has a value for schema:name that is not an xsd:string.
• :oscar has a value for schema:name that is not an xsd:string.
In contrast, if SHACL is applied without RDFS inference, the system returns only one
error:
• :oscar has a value for schema:name that is not an xsd:string.
The system does not check :frank or :grace against shape :UserShape because it only follows
rdf:type and rdfs:subClassOf declarations. In the absence of RDFS inference, the system only
checks that :oscar has shape :UserShape. If SHACL is applied after RDFS inference, the system
checks the additional nodes.
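
The effect of inference on target selection can be reproduced with a small sketch. It assumes triples are Python tuples with prefixed names, implements only the rdfs:domain rule rather than full RDFS entailment, and uses the hypothetical helper names `rdfs_domain_closure` and `shacl_instances`.

```python
def rdfs_domain_closure(triples):
    """Add the rdf:type triples implied by rdfs:domain (a single RDFS rule)."""
    domains = {s: o for s, p, o in triples if p == "rdfs:domain"}
    inferred = {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}
    return set(triples) | inferred


def shacl_instances(triples, cls):
    """Nodes typed with cls or with a class reachable through rdfs:subClassOf."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}


data = {
    (":Teacher", "rdfs:subClassOf", ":User"),
    (":teaches", "rdfs:domain", ":Teacher"),
    (":frank", ":teaches", ":Algebra"),
    (":frank", "schema:name", "Frank"),
    (":grace", ":teaches", ":Logic"),
    (":grace", "schema:name", 34),
    (":oscar", "rdf:type", ":Teacher"),
    (":oscar", "schema:name", 45),
}

before = shacl_instances(data, ":User")                      # no inference
after = shacl_instances(rdfs_domain_closure(data), ":User")  # after the domain rule
print(sorted(before), sorted(after))
```

Without inference only :oscar is selected as a focus node; once the domain rule has been applied, :frank and :grace are selected as well.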
This interference between SHACL and RDFS semantics hampers the use of SHACL to
validate an inference system, as in the use case described for ShEx in Example 3.11.
The property sh:entailment can be used to declare that the SHACL processors should add
inferred triples during validation to the data graph following the inference rules declared by a
given entailment regime (see Section 5.17). Nevertheless, SHACL processors are not required
to support entailment regimes. If a shapes graph declares an entailment and the processor does
not support it, a failure must be signalled.

7.7 VIOLATION REPORTING AND SEVERITIES


As pointed out above, SHACL puts more emphasis on validation and provides a dedicated RDF
vocabulary for describing conformance and reporting detailed violation results.
For every focus node that does not conform to a shape, an instance of sh:ValidationResult
is created in the SHACL results graph. Each validation result links back to the focus node
along with metadata, which includes the shape IRI, human-readable messages, the failed
constraint, the path, and (when available) the value node. The severity level of a SHACL shape,
if declared (sh:Info, sh:Warning, or sh:Violation), can be included in the validation result (see
Section 5.6.5).
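
The information carried by one result can be pictured as a record. The following sketch mirrors the SHACL result properties in a Python dataclass; the class itself, the helper `conforms`, and the sample values are illustrative assumptions and do not correspond to the API of any particular SHACL processor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationResult:
    focus_node: str                    # sh:focusNode
    source_shape: str                  # sh:sourceShape
    source_constraint_component: str   # sh:sourceConstraintComponent
    severity: str = "sh:Violation"     # sh:resultSeverity (sh:Violation when not declared)
    result_path: Optional[str] = None  # sh:resultPath
    value: Optional[object] = None     # sh:value, when available
    message: Optional[str] = None      # sh:resultMessage


def conforms(results):
    """A report conforms only when validation produced no results at all."""
    return len(results) == 0


# Hypothetical result for a node whose schema:name is not a string.
result = ValidationResult(
    focus_node=":oscar",
    source_shape=":UserShape",
    source_constraint_component="sh:DatatypeConstraintComponent",
    result_path="schema:name",
    value=45,
    message="Value does not have datatype xsd:string",
)
print(conforms([result]), conforms([]))
```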
ShEx does not have rich violation reporting, but it can provide related functionality. The
result of the validation process is a shape map which contains information about the nodes that
conform to a shape or not. Every violation can be viewed as an entry showing the focus node
and the shape that failed. ShEx processors usually enrich these entries with further information.
As shapes in ShEx can contain arbitrary annotations (see Section 4.7.5), these annotations can
be included in the results.
In simple and top-level shape definitions, SHACL provides richer and more granular violation
reporting for each individual constraint that failed. However, violations of nested constraints,
such as those formed using sh:node, sh:and, sh:or, sh:xone, or sh:qualifiedValueShape, report only
which nested constraint failed (“sh:node failed”) without detailing why. Implementations could
report that information by means of the sh:detail property, but that would be an
implementation-dependent feature. Also, as a result of validation, ShEx produces a result shape
map associating nodes with shapes (whether they conform or not) while SHACL has no comparable feature.

7.8 DEFAULT CARDINALITIES


If no cardinality is declared, ShEx assumes the cardinality to be {1,1} while SHACL assumes
{0,*}.
Example 7.9 Comparing cardinalities in ShEx and SHACL
The following ShEx schema declares that nodes conforming to :UserShape must have one
schema:name and one schema:givenName.

    :UserShape {
      schema:name xsd:string ;
      schema:givenName xsd:string ;
    }

The following SHACL shapes graph declares that if there is a schema:name then it must
have datatype xsd:string, and the same for schema:givenName:
    :UserShape a sh:NodeShape ;
      sh:property [
        sh:path schema:name ;
        sh:datatype xsd:string ;
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:datatype xsd:string ;
      ] .

Given the following data:


    :alice schema:name "Alice Cooper" ;   # ✓ Passes as :UserShape - ShEx
      schema:givenName "Alice" .          # ✓ Passes as :UserShape - SHACL

    :bob schema:givenName "Robert" ;      # ✗ Fails as :UserShape - ShEx
      foaf:age 23 .                       # ✓ Passes as :UserShape - SHACL

    :carol schema:name 345 ;              # ✗ Fails as :UserShape - ShEx
      schema:givenName 346 .              # ✗ Fails as :UserShape - SHACL

The difference in results is based on the difference between the ShEx and SHACL points
of view. In ShEx, a triple expression makes explicit which triples involving the focus node should
be found in the graph, and specifying a cardinality may require several such triples. The absence
of cardinality means one triple. In SHACL, a shape is a conjunction of constraints. A cardinality
constraint is used to constrain the number of allowed triples of a given kind, and the absence of
cardinality means no constraint on the number of triples allowed.
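
The results of Example 7.9 follow directly from the two defaults, as the sketch below tries to show. It assumes that each node is a dictionary from property names to lists of values and that a Python `str` stands for an xsd:string literal; `conforms` is a hypothetical helper that applies the same datatype check under either default cardinality.

```python
import math

SHEX_DEFAULT = (1, 1)          # {1,1}: exactly one triple per triple constraint
SHACL_DEFAULT = (0, math.inf)  # {0,*}: no restriction on the number of triples
PROPERTIES = ["schema:name", "schema:givenName"]


def conforms(node, default_cardinality):
    """Check cardinality and datatype for both properties of :UserShape."""
    low, high = default_cardinality
    for prop in PROPERTIES:
        values = node.get(prop, [])
        if not low <= len(values) <= high:
            return False
        if not all(isinstance(value, str) for value in values):
            return False
    return True


nodes = {
    ":alice": {"schema:name": ["Alice Cooper"], "schema:givenName": ["Alice"]},
    ":bob": {"schema:givenName": ["Robert"], "foaf:age": [23]},
    ":carol": {"schema:name": [345], "schema:givenName": [346]},
}

# For each node: (result under the ShEx default, result under the SHACL default).
results = {
    name: (conforms(node, SHEX_DEFAULT), conforms(node, SHACL_DEFAULT))
    for name, node in nodes.items()
}
print(results)
```

Only :bob is judged differently: the missing schema:name violates the ShEx default of one triple but is allowed by the SHACL default.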

7.9 PROPERTY PATHS


SHACL property shapes can use a subset of SPARQL 1.1 property paths as values for
sh:path. In this way, SHACL leverages the expressiveness of SPARQL property paths to
define constraints.
ShEx does not support arbitrary property paths, only direct and inverse predicates. However,
this SHACL behavior can be emulated using nested shapes or recursion.

Example 7.10 Comparing paths in SHACL and ShEx


The following SHACL declaration:
    :GrandParent a sh:NodeShape ;
      sh:property [
        sh:path [ sh:zeroOrMorePath schema:knows ] ;
        sh:class :Person ;
      ] ;
      sh:property [
        sh:path ( schema:child schema:child ) ;
        sh:minCount 1 ;
        sh:class :GrandChild ;
      ] .

can be defined in ShEx as:


    :GrandParent {
      schema:knows @:PersonKnown * ;
      schema:child {
        schema:child { a [ :GrandChild ] }
      }
    }

    :PersonKnown {
      a [ :Person ] ;
      schema:knows @:PersonKnown *
    }

7.10 RECURSION
ShEx supports the definition of cyclic data models with recursive shapes (see Section 4.7.2)
while the processing of recursive shapes is undefined in SHACL (see Section 5.12.1). However,
some recursion cases can be handled in SHACL through SHACL property paths.

Example 7.11 Recursion


The following shape declares a recursive :UserShape as:
    :UserShape IRI {
      schema:knows @:UserShape *
    }
Nodes that conform to :UserShape must be IRIs and can have zero or more schema:knows
arcs whose values must all conform to :UserShape.
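
A recursive shape such as :UserShape can be checked on cyclic data as long as the validator remembers which nodes it has already visited. The sketch below is a simplification and not the ShEx semantics as specified: it assumes that blank nodes are written with a `_:` prefix, that every other node is an IRI, and that the graph is reduced to a dictionary `knows` of schema:knows arcs; `conforms_user` is a hypothetical name.

```python
def is_iri(node):
    # Assumption of this sketch: blank nodes are written "_:label",
    # every other node is treated as an IRI.
    return not node.startswith("_:")


def conforms_user(node, knows, visited=None):
    """Check :UserShape for node, following schema:knows recursively."""
    visited = set() if visited is None else visited
    if node in visited:
        return True  # already checked, or being checked further up the call chain
    if not is_iri(node):
        return False
    visited.add(node)
    return all(conforms_user(other, knows, visited) for other in knows.get(node, []))


knows = {
    ":alice": [":bob"],
    ":bob": [":alice", ":carol"],  # cycle between :alice and :bob
    ":carol": [],
    ":dave": [":alice", "_:b1"],   # :dave knows a blank node
}

print(conforms_user(":alice", knows), conforms_user(":dave", knows))
```

Assuming that a node already under examination conforms is what allows the check to terminate on the cycle between :alice and :bob.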
A direct translation to SHACL would be:
    :UserShapeRecursion a sh:NodeShape ;   # This definition is recursive
      sh:nodeKind sh:IRI ;
      sh:property [
        sh:path schema:knows ;
        sh:node :UserShapeRecursion
      ] .

However, recursion in SHACL is undefined and not all SHACL processors may han-
dle that definition in the same way. The specification leaves recursion as an implementation-
dependent feature.
One possible solution is to add target declarations to the shape to trigger the validation
against them. A typical solution is to use rdf:type declarations as we saw in Section 5.12.1. In
this case, we could also use sh:targetSubjectsOf like:
    :UserShapeRecursion a sh:NodeShape ;
      sh:targetSubjectsOf schema:knows ;
      sh:nodeKind sh:IRI ;
      sh:property [
        sh:path schema:knows ;
        sh:class :User
      ] .

Now, every node that is a subject of schema:knows must conform to that shape.
This solution may not be realistic in general. In this case, for example, we are forcing every
node that is a subject of schema:knows to conform to :UserShapeRecursion, and in other contexts
this could be too restrictive. The same situation arises if we use sh:targetClass declarations.
Another approach to emulate recursive behavior is to use property paths. For example:
    :UserShape a sh:NodeShape ;
      sh:property [
        sh:path [ sh:zeroOrMorePath schema:knows ] ;
        sh:nodeKind sh:IRI ;
      ] .

In this case, every node that is related by property schema:knows zero or more times with
the focus node, must be an IRI. With this solution, there may be other nodes that are subjects
of schema:knows but do not need to conform to :UserShape.
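
The value nodes of a zero-or-more path are the nodes reachable from the focus node, including the focus node itself. A minimal sketch of that computation follows, assuming a dictionary `knows` of schema:knows arcs and a `_:` prefix for blank nodes; the helper names `reachable` and `path_shape_ok` are hypothetical.

```python
def reachable(start, knows):
    """Nodes reachable from start through zero or more schema:knows arcs."""
    seen, stack = {start}, [start]
    while stack:
        for other in knows.get(stack.pop(), []):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


def path_shape_ok(focus, knows):
    """sh:nodeKind sh:IRI applied to every value node of the path."""
    # Assumption of this sketch: blank nodes carry the "_:" prefix.
    return all(not node.startswith("_:") for node in reachable(focus, knows))


knows = {
    ":alice": [":bob"],
    ":bob": [":alice", ":carol"],
    ":dave": [":alice", "_:b1"],
}

print(sorted(reachable(":alice", knows)))
print(path_shape_ok(":alice", knows), path_shape_ok(":dave", knows))
```

Because only the nodes reachable from the chosen focus node are examined, other subjects of schema:knows elsewhere in the graph are left unconstrained.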
In Section 5.12.1, we described more advanced alternatives for using SHACL property
paths as an alternative to recursion.
7.11 PROPERTY PAIR CONSTRAINTS AND UNIQUENESS
Property pair constraints in SHACL can be used to compare current values with values from
another path, checking if they are equal, different or less than them (see Section 5.14).
ShEx 2.0 does not have the concept of property pair constraints, though this possibility
is being studied to be included in future versions.

Example 7.12 Example with property pair constraints


The following shapes graph declares that nodes conforming to :UserShape must fulfil the
constraint that schema:givenName is equal to foaf:firstName and different from schema:lastName, and
that schema:birthDate must be less than :loginDate.
    :UserShape a sh:NodeShape ;
      sh:property [
        sh:path schema:givenName ;
        sh:datatype xsd:string ;
        sh:disjoint schema:lastName ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
      ] ;
      sh:property [
        sh:path foaf:firstName ;
        sh:equals schema:givenName ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
      ] ;
      sh:property [
        sh:path schema:birthDate ;
        sh:datatype xsd:date ;
        sh:lessThan :loginDate ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
      ] .
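
Property pair constraints compare the set of values of one property with the set of values of another property on the same focus node. The sketch below illustrates the three comparisons used in Example 7.12; the node dictionaries and the sample values for :alice and :bob are hypothetical, and datatype and cardinality checks are left out.

```python
from datetime import date


def values(node, prop):
    return set(node.get(prop, []))


def equals(node, path, other):
    """sh:equals: both properties have exactly the same set of values."""
    return values(node, path) == values(node, other)


def disjoint(node, path, other):
    """sh:disjoint: the two properties share no value."""
    return values(node, path).isdisjoint(values(node, other))


def less_than(node, path, other):
    """sh:lessThan: every value of path is smaller than every value of other."""
    return all(a < b for a in values(node, path) for b in values(node, other))


def pair_constraints_ok(node):
    return (
        disjoint(node, "schema:givenName", "schema:lastName")
        and equals(node, "foaf:firstName", "schema:givenName")
        and less_than(node, "schema:birthDate", ":loginDate")
    )


alice = {  # hypothetical data
    "schema:givenName": ["Alice"], "foaf:firstName": ["Alice"],
    "schema:lastName": ["Cooper"],
    "schema:birthDate": [date(1990, 5, 1)], ":loginDate": [date(2017, 9, 1)],
}
bob = {    # hypothetical data: first name differs and login precedes birth
    "schema:givenName": ["Robert"], "foaf:firstName": ["Bob"],
    "schema:lastName": ["Smith"],
    "schema:birthDate": [date(2018, 1, 1)], ":loginDate": [date(2017, 9, 1)],
}

print(pair_constraints_ok(alice), pair_constraints_ok(bob))
```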

The previous example could be written in a future version of ShEx as:


    :UserShape {   # Not supported in ShEx 2.0
      $<givenName> schema:givenName xsd:string ;
      $<firstName> schema:firstName xsd:string ;
      $<birthDate> schema:birthDate xsd:date ;
      $<loginDate> :loginDate xsd:date ;
      $<givenName> = $<firstName> ;
      $<givenName> != $<lastName> ;
      $<birthDate> < $<loginDate>
    }
One constraint often required is the ability to declare unique keys. Unique keys are com-
binations of values that must be unique in a given scope. The scope can be the entire graph or
a focus node. One example of a unique constraint for an entire graph is to require that there be
no pair of identical values for the properties schema:givenName and schema:lastName. One example
of a unique constraint with a focus node scope would be to require that each node not have two
values of rdfs:label with the same language tag.
Neither SHACL nor ShEx 2.0 supports unique keys in general, although they are supported
by OWL 2. SHACL Core offers the sh:uniqueLang constraint to say that there can be no
more than one literal for each language tag (see Section 5.31). Other constraints can be defined
using SHACL-SPARQL. In the case of ShEx, there is a proposal to add a UNIQUE keyword to
the language, with the scope and the list of predicates that must be unique as parameters.
    :UserShape {   # Not supported in ShEx 2.0
      schema:givenName xsd:string ;
      schema:lastName xsd:string ;
      UNIQUE ( schema:givenName , schema:lastName )
    }

7.12 REPEATED PROPERTIES


ShEx allows several constraints to be defined on triples that involve the focus node and share
the same property. This feature is called repeated properties, as explained in Section 4.6.7. In
SHACL, repeated properties behave conjunctively, which means that all constraints applied to
properties with the same sh:path must be satisfied. The typical SHACL pattern of:
    :Shape a sh:NodeShape ;
      sh:property [
        sh:path :p1 ;
        # ... constraints on :p1 ...
      ] ;
      sh:property [
        sh:path :p2 ;
        # ... constraints on :p2 ...
      ] ;
      ...

must be changed if we want :p1 and :p2 to be the same property, only with different values. A
direct translation of that pattern to:
    :Shape a sh:NodeShape ;
      sh:property [
        sh:path :p ;
        # ... constraints on :p ...
      ] ;
      sh:property [
        sh:path :p ;
        # ... other constraints on :p ...
      ] ;
      ...

means that all constraints apply to the path :p conjunctively.

Example 7.13 Repeated properties in ShEx and SHACL


The following ShEx schema declares that a :Person has two parents, one with the value of
:isMale true and the other with the value :isFemale true.

    :Person {
      schema:parent { :isMale [ true ] } ;
      schema:parent { :isFemale [ true ] }
    }

A direct translation of the ShEx schema into SHACL would be:


    :Person a sh:NodeShape ;
      sh:property [
        sh:path schema:parent ;
        sh:node [
          sh:property [
            sh:path :isMale ;
            sh:hasValue true ;
            sh:maxCount 1
          ]
        ]
      ] ;
      sh:property [
        sh:path schema:parent ;
        sh:node [
          sh:property [
            sh:path :isFemale ;
            sh:hasValue true ;
            sh:maxCount 1
          ]
        ]
      ] .

However, this SHACL Shapes graph would only be satisfied by a node whose schema:parent
value is both male and female.
    :alice a :Person ;
      schema:parent :bob ;      # ✓ Passes as :Person in ShEx
      schema:parent :carol .    # ✗ Fails as :Person in SHACL

    :bob :isMale true .
    :carol :isFemale true .

    :dave a :Person ;
      schema:parent :x .        # ✗ Fails as :Person in ShEx
                                # ✓ Passes as :Person in SHACL
    :x :isMale true ;
      :isFemale true .

As described in Section 5.12.2, repeated properties can be handled in SHACL using
sh:qualifiedValueShape but the definitions are more verbose.

Example 7.14 Repeated properties with qualified value shapes


The following declaration handles the previous example using qualified value shapes.
    :Person a sh:NodeShape ;
      sh:property [
        sh:path schema:parent ;
        sh:qualifiedValueShape [
          sh:path :isMale ;
          sh:hasValue true
        ] ;
        sh:qualifiedMinCount 1 ;
        sh:qualifiedMaxCount 1 ;
      ] ;
      sh:property [
        sh:path schema:parent ;
        sh:qualifiedValueShape [
          sh:path :isFemale ;
          sh:hasValue true
        ] ;
        sh:qualifiedMinCount 1 ;
        sh:qualifiedMaxCount 1 ;
      ] ;
      sh:property [
        sh:path schema:parent ;
        sh:minCount 2 ;
        sh:maxCount 2
      ] .

Note that this approach requires declaring the total number of values allowed for the
repeated property (in this case 2).
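
The role of the explicit count can be seen in a small sketch of the three property shapes of Example 7.14. It assumes that a person is described by the list of its schema:parent values and that `facts` records the boolean properties of each parent; both structures and the function names are hypothetical.

```python
def qualified_count(parents, facts, prop):
    """Number of parents that conform to the qualified value shape for prop."""
    return sum(1 for parent in parents if facts.get(parent, {}).get(prop) is True)


def person_ok(parents, facts, require_total=True):
    one_male = qualified_count(parents, facts, ":isMale") == 1      # qualified min/max 1
    one_female = qualified_count(parents, facts, ":isFemale") == 1  # qualified min/max 1
    total_ok = len(parents) == 2 if require_total else True         # sh:minCount 2, sh:maxCount 2
    return one_male and one_female and total_ok


facts = {
    ":bob": {":isMale": True},
    ":carol": {":isFemale": True},
    ":x": {":isMale": True, ":isFemale": True},
}

alice_parents = [":bob", ":carol"]
dave_parents = [":x"]

print(person_ok(alice_parents, facts), person_ok(dave_parents, facts))
print(person_ok(dave_parents, facts, require_total=False))
```

Without the third property shape, the single parent :x would satisfy both qualified counts at once, so :dave would still be accepted.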
7.13 EXACTLY ONE AND ALTERNATIVES
Data coherence minimizes defensive programming by providing predictable, logical data struc-
tures that must be used. To take a trivial example, a data structure may offer a choice between
different representations of a name as in Example 4.30 (for ShEx) and the corresponding Ex-
ample 5.38 (for SHACL).
Let’s change the constraint to require a combination of foaf:firstName and foaf:lastName or
foaf:givenName and foaf:familyName or schema:givenName and schema:familyName where none of these
properties can be mixed with the others. In ShEx, this can be declared as:
    :Person {
      foaf:firstName . ; foaf:lastName . |
      foaf:givenName . ; foaf:familyName . |
      schema:givenName . ; schema:familyName .
    }

Given the following data, :alice and :bob conform to :Person while :carol and :dave do not.
In the case of :dave, it fails because the data meets one side of the disjunction and has some
properties from the other side.
    :alice foaf:firstName "Alice" ;      # ✓ Passes as :Person
      foaf:lastName "Cooper" .

    :bob schema:givenName "Robert" ;     # ✓ Passes as :Person
      schema:familyName "Smith" .

    :carol foaf:firstName "Carol" ;      # ✗ Fails as :Person
      foaf:lastName "King" ;
      schema:givenName "Carol" ;
      schema:familyName "King" .

    :dave foaf:firstName "Dave" ;        # ✗ Fails as :Person
      foaf:lastName "Clark" ;
      schema:givenName "Dave" .

A first attempt to model the example in SHACL could be:


    :PersonShape a sh:NodeShape ;
      sh:targetClass :Person ;
      sh:xone (
        [ sh:property [
            sh:path foaf:firstName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
          sh:property [
            sh:path foaf:lastName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
        ]
        [ sh:property [
            sh:path foaf:givenName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
          sh:property [
            sh:path foaf:familyName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
        ]
        [ sh:property [
            sh:path schema:givenName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
          sh:property [
            sh:path schema:familyName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
        ]
      ) .

However, this SHACL shapes graph has a meaning different from the ShEx schema.
In this case, :dave conforms to :PersonShape because it matches exactly one of the shapes (it
has foaf:firstName and foaf:lastName) and does not match the other shapes. The intended meaning
was that it should not have any of the other properties, but it has schema:givenName.
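
The different verdicts for :dave can be reproduced with a short sketch. It assumes that each node is a dictionary from property names to lists of values, and it approximates the ShEx rule by requiring that the triples whose predicates are mentioned in the schema match exactly one alternative; `shex_one_of` and `shacl_xone` are hypothetical names.

```python
ALTERNATIVES = [
    ("foaf:firstName", "foaf:lastName"),
    ("foaf:givenName", "foaf:familyName"),
    ("schema:givenName", "schema:familyName"),
]
MENTIONED = {prop for alternative in ALTERNATIVES for prop in alternative}


def shex_one_of(node):
    """All triples with mentioned predicates must be consumed by one alternative."""
    present = {prop: len(vals) for prop, vals in node.items() if prop in MENTIONED}
    return any(present == {prop: 1 for prop in alt} for alt in ALTERNATIVES)


def shacl_xone(node):
    """Exactly one member shape conforms; stray properties are not examined."""
    def matches(alt):
        return all(len(node.get(prop, [])) == 1 for prop in alt)
    return sum(matches(alt) for alt in ALTERNATIVES) == 1


people = {
    ":alice": {"foaf:firstName": ["Alice"], "foaf:lastName": ["Cooper"]},
    ":bob": {"schema:givenName": ["Robert"], "schema:familyName": ["Smith"]},
    ":carol": {"foaf:firstName": ["Carol"], "foaf:lastName": ["King"],
               "schema:givenName": ["Carol"], "schema:familyName": ["King"]},
    ":dave": {"foaf:firstName": ["Dave"], "foaf:lastName": ["Clark"],
              "schema:givenName": ["Dave"]},
}

for name, node in people.items():
    print(name, "ShEx:", shex_one_of(node), "SHACL xone:", shacl_xone(node))
```

Only :dave is treated differently: the leftover schema:givenName triple is rejected by the ShEx-like check but is invisible to the sh:xone check.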
As we described in Section 5.38, SHACL’s sh:xone does not check if there are partial
matches in other shapes. A workaround to simulate ShEx behavior is to normalize the expression
using a top-level disjunction whose shapes exclude the properties that are not desired.
    :Person a sh:NodeShape ;
      sh:or (
        [ sh:property [
            sh:path foaf:firstName ;
            sh:minCount 1 ;
            sh:maxCount 1
          ] ;
          sh:property [
            sh:path foaf:lastName ;
            sh:minCount 1 ;
            sh:maxCount 1
          ] ;
          sh:property [
            sh:path foaf:givenName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:familyName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path schema:givenName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path schema:familyName ;
            sh:maxCount 0
          ] ;
        ]
        [ sh:property [
            sh:path foaf:firstName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:lastName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:givenName ;
            sh:minCount 1 ;
            sh:maxCount 1
          ] ;
          sh:property [
            sh:path foaf:familyName ;
            sh:minCount 1 ; sh:maxCount 1
          ] ;
          sh:property [
            sh:path schema:givenName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path schema:familyName ;
            sh:maxCount 0
          ] ;
        ]
        [ sh:property [
            sh:path foaf:firstName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:lastName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:givenName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path foaf:familyName ;
            sh:maxCount 0
          ] ;
          sh:property [
            sh:path schema:givenName ;
            sh:minCount 1 ;
            sh:maxCount 1
          ] ;
          sh:property [
            sh:path schema:familyName ;
            sh:minCount 1 ;
            sh:maxCount 1
          ] ;
        ]
      ) .

Although this approach solves the problem, with more complex and nested shapes the
normalized SHACL shapes quickly become larger and harder to read.
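The difference between the two encodings can be illustrated with a small simulation. The following Python sketch is not a SHACL processor: it models a node as a dictionary from predicates to value lists and a branch as a dictionary of (minCount, maxCount) bounds. The helper names and the concrete values for :dave are assumptions made for illustration, based only on the description above.

```python
# Illustrative model only, not a SHACL implementation.
GROUPS = (
    ("foaf:firstName", "foaf:lastName"),
    ("foaf:givenName", "foaf:familyName"),
    ("schema:givenName", "schema:familyName"),
)


def conforms(node, branch):
    """True if every predicate count of `node` lies within the branch bounds."""
    return all(lo <= len(node.get(pred, [])) <= hi
               for pred, (lo, hi) in branch.items())


def xone(node, branches):
    """sh:xone: exactly one branch conforms."""
    return sum(conforms(node, b) for b in branches) == 1


def any_of(node, branches):
    """sh:or: at least one branch conforms."""
    return any(conforms(node, b) for b in branches)


# Original branches: each one only requires its own pair of properties.
open_branches = [{p: (1, 1) for p in group} for group in GROUPS]

# Normalized branches: require one pair and forbid the other two (maxCount 0).
normalized_branches = [
    {p: ((1, 1) if p in group else (0, 0)) for g in GROUPS for p in g}
    for group in GROUPS
]

# Hypothetical data: :dave mixes two vocabularies, :carol uses only one.
dave = {
    "foaf:firstName": ["Dave"],
    "foaf:lastName": ["Smith"],
    "schema:givenName": ["David"],
}
carol = {"foaf:firstName": ["Carol"], "foaf:lastName": ["Jones"]}

print(xone(dave, open_branches))           # True: the partial match is ignored
print(any_of(dave, normalized_branches))   # False: schema:givenName is forbidden
print(any_of(carol, normalized_branches))  # True
```

Because the normalized branches are mutually exclusive by construction, sh:or and sh:xone accept the same nodes for them.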

7.14 TREATMENT OF CLOSED SHAPES


ShEx has the CLOSED keyword to declare that a node must not have other properties beyond those
declared in the shape. SHACL also has a sh:closed parameter to declare that a node conforming
to a shape must not have other properties different from the properties declared in the shape.
Although they look similar, there are some differences due to the interaction of CLOSED with
other language features.
When a SHACL shape is closed, SHACL processors only take into account the top-level
properties that appear as IRI values of sh:path in the property shapes attached with
sh:property. As a consequence, declaring a shape as a conjunction of property shapes is
not the same as declaring it using sh:and. The following shape declares that nodes
conforming to :UserShape must have properties schema:name and schema:birthDate. The
declaration sh:closed true specifies that nodes conforming to :UserShape cannot have
other properties.
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [ sh:path schema:name ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [ sh:path schema:birthDate ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:date
  ] .

If we rewrite that example using a sh:and as:


:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:and (
    [ sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string
    ]
    [ sh:path schema:birthDate ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:date
    ]
  ) .

then no node will satisfy the shape, because the two property shapes nested under sh:and
are hidden from the sh:closed directive and are not taken into consideration.
A solution in this case is to enumerate the properties that we allow using
sh:ignoredProperties. In this case, one should add:

:UserShape
  sh:ignoredProperties ( schema:name
                         schema:birthDate
                       ) .

A similar situation could happen if we use more complex property paths.
For example, we may want to declare that users can have either schema:name or foaf:name
using an alternative property path as:

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] .

As in the previous example, no node would conform to that shape: sh:closed only
recognizes paths that are plain IRIs, so the predicates listed inside the alternative path
do not count as declared properties.
There are two solutions: either add a sh:ignoredProperties declaration enumerating all
the properties, as in the previous example, or add a property declaration for each predicate
that specifies no constraints and thus has no other effect.

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [ sh:path schema:name ] ;
  sh:property [ sh:path foaf:name ] .
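The way sh:closed selects the allowed predicates in the cases above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a SHACL implementation: shapes are plain dictionaries, a path is either a string (a predicate IRI) or a tuple standing for sh:alternativePath, and all helper names are hypothetical.

```python
def allowed_predicates(shape):
    """Predicates a closed shape tolerates: IRI paths of direct sh:property
    entries plus anything listed in sh:ignoredProperties."""
    direct = {p["path"] for p in shape.get("property", [])
              if isinstance(p["path"], str)}
    return direct | set(shape.get("ignored", []))


def closed_violations(node, shape):
    """Predicates of `node` that the closed shape would report."""
    return sorted(set(node) - allowed_predicates(shape))


user = {"schema:name": ["Alice"], "schema:birthDate": ["1990-01-01"]}

# Property shapes attached directly with sh:property.
direct_shape = {"property": [{"path": "schema:name"},
                             {"path": "schema:birthDate"}]}
# The same constraints nested under sh:and: nothing is declared at top level.
and_shape = {"and": [{"path": "schema:name"}, {"path": "schema:birthDate"}]}
and_fixed = dict(and_shape, ignored=["schema:name", "schema:birthDate"])

# An alternative path is not a plain IRI, so it contributes nothing.
alt_shape = {"property": [{"path": ("alt", "schema:name", "foaf:name")}]}
alt_fixed = {"property": alt_shape["property"]
             + [{"path": "schema:name"}, {"path": "foaf:name"}]}
foaf_user = {"foaf:name": ["Alice"]}

print(closed_violations(user, direct_shape))    # []
print(closed_violations(user, and_shape))       # ['schema:birthDate', 'schema:name']
print(closed_violations(user, and_fixed))       # []
print(closed_violations(foaf_user, alt_shape))  # ['foaf:name']
print(closed_violations(foaf_user, alt_fixed))  # []
```

The sketch only covers the closedness check; the cardinality and datatype constraints of each property shape would be evaluated separately.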

7.15 STEMS AND STEM RANGES


ShEx supports the definition of stems and stem ranges when defining value sets (see Sec-
tion 4.5.4). SHACL does not have built-in support for stems or stem ranges. Stems and stem
ranges could be emulated with sh:pattern, sh:nodeKind, and sh:or.

Example 7.15 IRI ranges example


The following ShEx schema, described in Section 4.19, declares that the values of :status
must start with the stems codes:good or codes:bad:

prefix codes: <http://example.codes/>

:Product {
  :status [ codes:good~ codes:bad~ ]
}

A possible SHACL definition using regular expressions could be:


:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
    sh:or (
      [ sh:pattern "^http://example.codes/good" ]
      [ sh:pattern "^http://example.codes/bad" ]
    )
  ] .

Another possibility is to define a reusable constraint component in SHACL-SPARQL as:

:StemConstraintComponent
  a sh:ConstraintComponent ;
  sh:parameter [ sh:path :stem ] ;
  sh:validator [ a sh:SPARQLAskValidator ;
    sh:message "Value does not have stem {$stem}" ;
    sh:ask """
      ASK { FILTER (!isBlank($value) &&
                    strstarts(str($value), str($stem)))
      }"""
  ] .

which can be used as:


:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:or (
      [ :stem <http://example.codes/good> ]
      [ :stem <http://example.codes/bad> ]
    )
  ] .

ShEx also has range exclusions that can declare values to exclude, either literal or specified
with a stem (see 4.20). That feature is not part of SHACL Core and should be defined using
SHACL-SPARQL.
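The logic that such a SHACL-SPARQL component would need to express can be sketched as follows. The Python function below is only an illustration of stem matching with exclusions over IRI strings; the function name, the representation of exclusions, and the excluded values are assumptions, not part of ShEx or SHACL.

```python
def in_stem_range(value, stem, exclusions=()):
    """True if `value` starts with `stem` and is not excluded.

    An exclusion is either an exact value (a string) or a pair
    ("stem", prefix) that excludes every value starting with `prefix`.
    """
    if not value.startswith(stem):
        return False
    for exclusion in exclusions:
        if isinstance(exclusion, tuple):
            if value.startswith(exclusion[1]):
                return False
        elif value == exclusion:
            return False
    return True


CODES = "http://example.codes/"
# Hypothetical exclusions: one exact IRI and one nested stem.
exclusions = [CODES + "good.unknown", ("stem", CODES + "good.tmp")]

print(in_stem_range(CODES + "good.shipped", CODES + "good", exclusions))  # True
print(in_stem_range(CODES + "good.unknown", CODES + "good", exclusions))  # False
print(in_stem_range(CODES + "good.tmp.1", CODES + "good", exclusions))    # False
print(in_stem_range(CODES + "other", CODES + "good", exclusions))         # False
```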

7.16 ANNOTATIONS
ShEx has the concept of annotations which can be attached to several constructs (see Sec-
tion 4.7.5). For example, the following ShEx schema attaches two annotations to each triple
constraint.

Example 7.16 Annotations example in ShEx


:Person {
  schema:name xsd:string
    // rdfs:label "Name"
    // rdfs:comment "Name of person" ;
  schema:birthDate xsd:date
    // rdfs:label "BirthDate"
    // rdfs:comment "Date of birth"
}

ShEx does not endorse or require the use of any specific annotation vocabulary.
SHACL has non-validating properties (see Section 5.15), such as sh:name
and sh:description, which are ignored by the SHACL processor during validation but can have
special meaning for user interface generation. It is also possible to add further informative triples
to any constraint or component, such as rdfs:label.

Example 7.17 Annotations example in SHACL


The following SHACL shapes graph declares a shape :Person using the non-validating
properties sh:name and sh:description and the annotation rdfs:label.

:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:name "Name" ;
    sh:description "Name of person" ;
    rdfs:label "Name"
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:name "BirthDate" ;
    sh:description "Birth date" ;
    rdfs:label "BirthDate"
  ] .

As we saw in Section 5.15, SHACL non-validating properties can be helpful for gener-
ating forms from SHACL definitions.
Although ShEx does not provide built-in non-validating properties, it would be possible
to use annotations from other vocabularies, even from SHACL.
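As a rough illustration of how such annotations could drive form generation, the following Python sketch turns simplified property shape descriptions into form field descriptions. The dictionary layout, the function name, and the mapping from datatypes to input kinds are assumptions made for this example; a real tool would read these values from the shapes graph or the ShEx schema.

```python
# Simplified stand-ins for the two property shapes of :Person.
property_shapes = [
    {"path": "schema:name", "datatype": "xsd:string",
     "name": "Name", "description": "Name of person"},
    {"path": "schema:birthDate", "datatype": "xsd:date",
     "name": "BirthDate", "description": "Birth date"},
]

# Hypothetical mapping from XSD datatypes to form input kinds.
INPUT_KINDS = {"xsd:string": "text", "xsd:date": "date"}


def form_fields(shapes):
    """Build one form field per property shape, using the non-validating
    name and description as label and help text."""
    return [
        {
            "label": shape.get("name", shape["path"]),
            "help": shape.get("description", ""),
            "input": INPUT_KINDS.get(shape.get("datatype"), "text"),
        }
        for shape in shapes
    ]


fields = form_fields(property_shapes)
for field in fields:
    print(field)
```

When a shape carries no name, the sketch falls back to the property path as the label, so validation-only shapes still produce a usable field.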

7.17 SEMANTICS AND COMPLEXITY


The ShEx semantic specification [81] is based on mathematical concepts and has been proven
to have a well-founded semantics [11]. As we saw in Section 4.8.3, a restriction was imposed on
the combination of recursion and negation to avoid ill-formed data models.
With regard to the complexity of the validation algorithm, ShEx semantics is based on a
partitioning strategy where triples in the data are assigned to triple constraints in the schema and
the matching algorithm must take into account that arcs in a graph are unordered. It is possible
to construct schemas for which it is very expensive to find a mapping from RDF data triples to
triple constraints that satisfies the schema. In practical schemas, this is rarely a concern as the
search space is quite small, but certain mistakes in a schema can create a large search space. The
ShEx primer2 contains some advice to improve performance.
2 http://shex.io/shex-primer/
“Accidentally duplicating many triple constraints in a shape causes the search space
to explode. If a validation process takes a long time or a lot of memory, look for
duplicated chunks of the schema.

For shapes with multiple triple constraints for the same predicate, try to minimize
the overlap between the value expressions. For instance, if three types of inspection
are necessary on a manufacturing checklist, use three different constraints for each
of the inspection properties rather than requiring three different inspection proper-
ties with a value expression which is a union of all three types. This will make the
validation process more efficient and will more effectively capture the business logic
in the schema.”
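The effect of overlapping value expressions on the search space can be estimated with a back-of-the-envelope count. The Python sketch below is not a ShEx matcher: it only counts how many triple-to-constraint assignments a naive partitioning search could have to consider, and the predicate name and data are hypothetical.

```python
from math import prod


def candidate_assignments(triples, constraints):
    """Number of assignments a naive search may explore: each triple can be
    assigned to any triple constraint that accepts it."""
    return prod(sum(1 for c in constraints if c(t)) for t in triples)


def accepts(predicate, types):
    """Toy triple constraint: a fixed predicate and a set of value types."""
    return lambda triple: triple[0] == predicate and triple[1] in types


# Six inspection triples, two per inspection type.
triples = [(":inspection", t) for t in ("A", "B", "C") for _ in range(2)]

# Three constraints whose value expression is the union of all three types.
overlapping = [accepts(":inspection", {"A", "B", "C"}) for _ in range(3)]
# Three constraints with disjoint value expressions.
disjoint = [accepts(":inspection", {t}) for t in ("A", "B", "C")]

print(candidate_assignments(triples, overlapping))  # 729 (3 ** 6)
print(candidate_assignments(triples, disjoint))     # 1
```

With the union-based constraints the count grows exponentially with the number of triples, while disjoint value expressions leave a single candidate assignment.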

The SHACL Core semantics is defined in natural language with some non-normative
SPARQL templates, while SHACL-SPARQL depends on a SPARQL processor. Its complexity
depends on the complexity of SPARQL, which can also be quite expensive, especially in the use
of property paths. As in the case of ShEx, it is also possible to declare shapes graphs that may
consume a lot of time or memory.
Both ShEx and SHACL open the door for further research on optimizations and
specialized implementations usable for big datasets. Validators could define language subsets with
constructs that behave better when confronted with such datasets. To our knowledge, current
implementations have mainly been tested on in-memory data: separate RDF files, or relatively
small units of work (transactions). An exception is RDFUnit, which supports the execution of
SHACL directly on SPARQL endpoints and thus can theoretically scale along with the
capabilities of the SPARQL engine. A lot of research remains to see how very large (and not
in-memory) data sets can be efficiently validated with RDF shapes.
Benchmarks and testing tools are an essential step towards measuring the performance of
both languages as well as implementations. One early attempt was to use the WebIndex dataset
as a benchmark [57].

7.18 EXTENSION MECHANISMS


SHACL-SPARQL can be used to define both custom SPARQL-based constraints and
reusable SPARQL-based constraint components (see Section 5.16.2). As the constraint
components are defined in SPARQL, any SPARQL-compliant engine could potentially run them
without requiring software updates for execution. A SPARQL engine will be required in any case.
SHACL also provides SHACL-JavaScript, which can be used to write extensions (Section 5.20).
SHACL-SPARQL allows the definition of new constraint components which can have
parameters and can be reused in new contexts. It is expected that SHACL libraries of useful
constraint components will be developed in the future. For example, the http://datashapes.org/
site contains a collection of constraint components that extend SHACL Core.
ShEx has provisions for callout to arbitrary functions, called semantic actions, that are
language-agnostic (see Section 4.10). However, semantic actions cannot be used to create new
reusable parametrizable shape expressions. This is considered an item for future work on ShEx.

7.19 CONCLUSIONS AND OUTLOOK


As of July 2017, it appears that ShEx and SHACL will evolve as two different specifications.
The design of SHACL prioritized the use of SPARQL as an execution engine and an exten-
sion mechanism for defining new constraint components, while ShEx was designed de novo to
meet its use cases. SHACL leverages a query language for validating sets of constraints, while
validation schemas in the ShEx language are defined in terms of a grammar.
There is, however, a significant intersection between the two languages. Many common
use cases may be met with either language, although users should consider how the limitations
of these languages apply to their current and future requirements. In this book, we described and
compared each formalism so that readers can assess which technology better fits their problems.
If we look for parallels in the XML ecosystem, ShEx is closer to RelaxNG or XML
Schema, which provide structural definitions for XML documents. SHACL is closer to
Schematron, which defines rules or constraints on top of XPath analogously to how SHACL
defines constraints on top of SPARQL. SHACL Core can capture simple structures, but more
complex structures, with exclusive choices or repeated properties, may require multiple inter-
related constraints.
The two specifications currently have different implementation ecosystems. ShEx has been
implemented in a variety of programming languages and RDF libraries: Apache Jena, Ruby,
JavaScript, Haskell, and Python (see Section 4.3). In the case of SHACL, most implementations
are based on Apache Jena, and there is an implementation based on JavaScript (see Section 5.2),
although implementations are also appearing in other systems such as RDF4J. Most ShEx
implementations are non-commercial and have been developed mainly by individual projects.
SHACL has a mature commercial implementation, bundled with the TopBraid suite of
products, which offers a rich user interface for editing SHACL-based data models. Although
TopBraid is a commercial product, SHACL’s implementation is based on a separate open source
library maintained by TopQuadrant. SHACL is also integrated in the free edition of TopBraid
Composer.
Both ShEx and SHACL open several lines for future work and research.

• Application to RDF vocabulary design. When designing RDF vocabularies, it is a common
practice to include an informal UML class diagram which represents the classes and
their relationships. Some examples are the DCAT vocabulary [61], the organization
ontology [83] and the RDF Data Cube vocabulary [24]. Other vocabularies, such as the
Provenance Ontology [59] or the Annotation vocabulary [87], provide diagrams in similar
styles.
In the future, these diagrams and vocabulary specifications can be backed by ShEx or
SHACL specifications. A first step in that direction is seen where SHACL is used to
capture the RDF Data Cube integrity constraints.3 There is much room for innovations
connecting these graphical representations to ShEx schemas or SHACL shapes graphs,
such as shape visualization, or generating shapes from customized UML diagrams.

• Efficient implementation of ShEx/SHACL processors. It may be necessary to identify
subsets of those languages that can be implemented efficiently, especially for handling big
datasets. One problem with current implementations is that they work mainly in memory,
limiting the size of datasets that can be processed. One possible solution could be to have
federated validators exchange intermediate validation results.

• Shapes induction. Given the recent emergence of schema languages, almost all existing
RDF data has no associated schemas. We can expect that schemas will be created for much
of the existing data. Deriving them automatically will greatly accelerate the availability of
schemas. Some initial attempts are described in [99] and [37]. Such tools could become
part of the validation process, producing schemas that are conservative enough to reject
data patterns which are dubious because they occur very rarely in the examined data.
Given that there is already a large amount of RDF data that comes from structured sources
such as SQL databases or Wikipedia info boxes, derived schemas will likely reflect
constraints native to the source format from which the data was converted or extracted.

• Subgraph extraction. An interesting application of RDF shapes is to use them as a driver
for extracting subsets of a dataset that conform to specific shapes. For example, one could
want to extract all the persons in DBpedia that have an image and a birthdate. Although
this can be easily achieved for simple and independent shapes, doing so for complex
schemas can be quite challenging.

• Approximate validation. An interesting topic for future research is to accommodate
probabilistic approaches for RDF validation, which can check or predict typical graph
structures around some nodes.

• Optimization of RDF stores based on shapes. RDF stores that know the shape of their
RDF graphs can optimize their internal representations and increase the performance of
SPARQL queries.

• User interface generation from shapes. Editing RDF by hand is usually an error-prone
and non-user-friendly task. If the structure of the data is known, the editorial process can
be improved. Given that ShEx and SHACL Core define the properties that RDF nodes
can have, specialized user interfaces and forms could be generated from those shapes to
increase user friendliness. As we described in Section 5.15, SHACL contains some built-in
annotation properties which can help user interface generation from shapes graphs.
ShEx also has support for any annotation properties, which in the future could also be
used to generate rich user interfaces.

3 https://www.w3.org/2011/gld/validator/datacube.shapes.ttl

• Generating Software Artifacts from Shapes. It may be possible to generate various
software artifacts from appropriately extended shapes, such as: Object-RDF Mapping
(ORM) layers, R2RML conversion scripts, JSON-LD contexts and frames, etc.

• Schema transformation and mappings between data models. One of the most frequent
needs in computer science is to transform data based on some schema to data conforming
to another schema. These transformations are usually made by ad-hoc and error-prone
procedural programs. Because shapes languages can capture the structures of the sources
and targets of these transformations, they can be leveraged to define mappings. ShEx
Map,4 an extension of ShEx, can be used to convert RDF data between schemas.

• Integration between ShEx and SHACL. Although ShEx and SHACL are two different
approaches, both were designed to handle the general problem of RDF validation. ShEx
shines in its support of recursion based on well-founded semantics, while SHACL shines
in its support for SPARQL property paths and other SPARQL features. As in the case
of XML, where Schematron and RelaxNG can be used together [84], ShEx and SHACL
could be combined in a project to leverage the advantages of each.
On the other hand, the underpinnings of ShEx and SHACL are not radically different.
One implementation, Shaclex,5 uses compatible parts of libraries to implement a proces-
sor for both SHACL and ShEx and is being extended to convert between subsets of the
languages.

• ShEx and SHACL best practices. This book describes how ShEx and SHACL can be
used to express both simple and complex constraints on RDF data. It does not attempt
to teach modeling, or product design, or the engineering skill of knowing when to define
constraints and when to leave data less constrained. While modeling and enterprise data
management are covered by an extensive literature, the scale and breadth of the Semantic
Web requires new formulations of well-known problems.

ShEx and SHACL will play an important role in the future development of RDF and
will be a core part of the Semantic Web tool set. As more semantic data is generated, and more
applications are needed to integrate and consume it, RDF validation will be a fundamental
enabler for data quality and systems interoperability.
4 http://shex.io/extensions/Map/
5 http://labra.github.io/shaclex/
7.20 SUMMARY
• ShEx and SHACL can both be used to validate RDF.
• The expressiveness of ShEx and SHACL for common use cases is similar.
• ShEx is a W3C Community Group specification, while SHACL Core and SHACL-SPARQL
are defined in a W3C Recommendation.
• ShEx is schema-oriented, while SHACL is focused on defining constraints over RDF
graphs.
• ShEx can be used with a compact syntax, a JSON-LD syntax, or any RDF syntax. SHACL
can be used with any RDF syntax, and a draft compact syntax has been proposed.
• ShEx has support for recursion and cyclic data models while recursion in SHACL is un-
defined.
• SHACL has support for arbitrary SPARQL property paths while ShEx has support only
for incoming and outgoing arcs.
• Both ShEx and SHACL support violation reporting at the shape level. For simple shapes,
SHACL can further distinguish the violations per constraint, as well as provide more
violation metadata. SHACL returns the violations in RDF using the Validation Report
vocabulary and lists only the nodes that failed, while ShEx returns a shape map with all
the nodes that were validated, including the ones that passed validation.
• ShEx has a language-agnostic extension mechanism called semantic actions, while SHACL
offers extensibility through SPARQL and JavaScript.

7.21 SUGGESTED READING


• A seminal paper comparing ShEx and SHACL in its early versions: J. E. Labra Gayo,
E. Prud’hommeaux, H. Solbrig, and I. Boneva. Validating and describing linked data
portals using shapes. http://arxiv.org/abs/1701.08924
• Another paper comparing different RDF validation requirements: T. Hartmann, B. Za-
pilko, J. Wackerow, and K. Eckert. Validating RDF data quality using constraints to
direct the development of constraint languages. In IEEE 10th International Conference on
Semantic Computing (ICSC), pages 116–123, February 2016. DOI: 10.1109/icsc.2016.43
APPENDIX A

WebIndex in ShEx
The following code contains the schema of the WebIndex in ShEx that was described in Sec-
tion 6.1.1.
prefix : <http://example.org/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix wf: <http://data.webfoundation.org#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix qb: <http://purl.org/linked-data/cube#>
prefix cex: <http://purl.org/weso/ontology/computex#>
prefix dct: <http://purl.org/dc/terms/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix org: <http://www.w3.org/ns/org#>

:Country {
  rdfs:label xsd:string ;
  wf:iso2 LENGTH 2
}
:DataSet { a [ qb:DataSet ] ;
  qb:structure [ wf:DSD ] ;
  rdfs:label xsd:string ;
  qb:slice @:Slice * ;
  dct:publisher @:Organization
}
:Slice { a [ qb:Slice ] ;
  qb:sliceStructure [ wf:sliceByYear ] ;
  qb:observation @:Observation * ;
  cex:indicator @:Indicator
}
:Observation { a [ qb:Observation ] ;
  a [ wf:Observation ] ;
  cex:value xsd:float ;
  rdfs:label xsd:string ? ;
  dct:issued xsd:dateTime ;
  dct:publisher [ wf:WebFoundation ] ? ;
  qb:dataSet @:DataSet ;
  cex:ref-area @:Country ;
  cex:indicator @:Indicator ;
  ( cex:computation @:Computation
  | wf:source IRI
  )
}
:Computation { a [ cex:Computation ] }
:Indicator { a [ wf:PrimaryIndicator wf:SecondaryIndicator ] ;
  rdfs:label xsd:string ;
  wf:provider @:Organization ;
}
:Organization CLOSED EXTRA a { a [ org:Organization ] ;
  rdfs:label xsd:string ;
  foaf:homepage IRI
}
APPENDIX B

WebIndex in SHACL
The following code contains the full SHACL shapes graph for the WebIndex data model that
was described in Section 6.1.2.
@prefix : <http://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wf: <http://data.webfoundation.org#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix cex: <http://purl.org/weso/ontology/computex#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix org: <http://www.w3.org/ns/org#> .

:Country a sh:NodeShape ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path wf:iso2 ;
    sh:datatype xsd:string ;
    sh:length 2 ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:DataSet a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue qb:DataSet ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path qb:structure ;
    sh:hasValue wf:DSD
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path qb:slice ;
    sh:node :Slice ;
  ] ;
  sh:property [
    sh:path dct:publisher ;
    sh:node :Organization ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:Slice a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue qb:Slice
  ] ;
  sh:property [
    sh:path qb:sliceStructure ;
    sh:hasValue wf:sliceByYear ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path qb:observation ;
    sh:node :Observation ;
  ] ;
  sh:property [
    sh:path cex:indicator ;
    sh:node :Indicator ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:Observation a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:in ( qb:Observation wf:Observation ) ;
    sh:minCount 2
  ] ;
  sh:property [ sh:path rdf:type ;
    sh:minCount 2 ;
    sh:maxCount 2
  ] ;
  sh:property [
    sh:path cex:value ;
    sh:datatype xsd:float ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path dct:issued ;
    sh:datatype xsd:dateTime ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:or (
    [ sh:property [
        sh:path dct:publisher ;
        sh:hasValue wf:WebFoundation ;
      ]
    ]
    [ sh:property [
        sh:path dct:publisher ;
        sh:maxCount 0
      ]
    ]
  ) ;
  sh:property [
    sh:path qb:dataSet ;
    sh:node :DataSet ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path cex:ref-area ;
    sh:node :Country ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path cex:indicator ;
    sh:node :Indicator ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:or (
    [ sh:property [
        sh:path wf:source ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path cex:computation ;
        sh:maxCount 0
      ]
    ]
    [ sh:property [
        sh:path cex:computation ;
        sh:node :Computation ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path wf:source ;
        sh:maxCount 0
      ]
    ]
  )
.

:Computation a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue cex:Computation
  ] .

:Indicator a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:in (
      wf:PrimaryIndicator
      wf:SecondaryIndicator
    ) ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path wf:provider ;
    sh:node :Organization ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
.

:Organization a sh:NodeShape ;
  sh:closed true ;
  sh:ignoredProperties ( rdf:type ) ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue org:Organization ;
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path foaf:homepage ;
    sh:nodeKind sh:IRI ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
.
APPENDIX C

ShEx in ShEx
In this appendix we include the full code of a ShEx schema that validates ShEx schemas
represented in RDF syntax (ShExR). This code has been adapted from the ShEx specification [81].1
1 http://shex.io/shex-semantics/#shexr

PREFIX sx: <http://www.w3.org/ns/shex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
BASE <http://www.w3.org/ns/shex#>

start=@<Schema>

<Schema> CLOSED { a [ sx:Schema ] ;
  sx:startActs @<SemActList1Plus>? ;
  sx:start @<shapeExpr>? ;
  sx:shapes @<shapeExpr>*
}

<shapeExpr> @<ShapeOr> OR @<ShapeAnd> OR @<ShapeNot> OR @<NodeConstraint>
  OR @<Shape> OR @<ShapeExternal>

<ShapeOr> CLOSED { a [ sx:ShapeOr ] ;
  sx:shapeExprs @<shapeExprList2Plus>
}

<ShapeAnd> CLOSED { a [ sx:ShapeAnd ] ;
  sx:shapeExprs @<shapeExprList2Plus>
}

<ShapeNot> CLOSED { a [ sx:ShapeNot ] ;
  sx:shapeExpr @<shapeExpr>
}

<NodeConstraint> CLOSED { a [ sx:NodeConstraint ] ;
  sx:nodeKind [ sx:iri sx:bnode sx:literal sx:nonliteral ]? ;
  sx:datatype IRI ? ;
  &<xsFacets> ;
  sx:values @<valueSetValueList1Plus>?
}

<Shape> CLOSED { a [ sx:Shape ] ;
  sx:closed [ true false ]? ;
  sx:extra IRI* ;
  sx:expression @<tripleExpression>? ;
  sx:semActs @<SemActList1Plus>? ;
}

<ShapeExternal> CLOSED { a [ sx:ShapeExternal ] }

<SemAct> CLOSED { a [ sx:SemAct ] ;
  sx:name IRI ;
  sx:code xsd:string ?
}

<Annotation> CLOSED { a [ sx:Annotation ] ;
  sx:predicate IRI ;
  sx:object @<objectValue>
}

<facet_holder> { # holds labeled productions
  $<xsFacets> ( &<stringFacet> | &<numericFacet> )* ;
  $<stringFacet> (
      sx:length xsd:integer
    | sx:minlength xsd:integer
    | sx:maxlength xsd:integer
    | sx:pattern xsd:string ; sx:flags xsd:string ?
  ) ;
  $<numericFacet> (
      sx:mininclusive @<numericLiteral>
    | sx:minexclusive @<numericLiteral>
    | sx:maxinclusive @<numericLiteral>
    | sx:maxexclusive @<numericLiteral>
    | sx:totaldigits xsd:integer
    | sx:fractiondigits xsd:integer
  )
}
<numericLiteral> xsd:integer OR xsd:decimal OR xsd:double

<valueSetValue> @<objectValue>
  OR @<IriStem> OR @<IriStemRange>
  OR @<LiteralStem> OR @<LiteralStemRange>
  OR @<LanguageStem> OR @<LanguageStemRange>

<objectValue> IRI OR LITERAL
<IriStem> CLOSED { a [ sx:IriStem ] ; sx:stem xsd:anyURI }
<IriStemRange> CLOSED {
  a [ sx:IriStemRange ] ;
  sx:stem xsd:anyURI OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<IriStem>*
}
<LiteralStem> CLOSED { a [ sx:LiteralStem ] ; sx:stem xsd:string }
<LiteralStemRange> CLOSED {
  a [ sx:LiteralStemRange ] ;
  sx:stem xsd:string OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<LiteralStem>*
}
<LanguageStem> CLOSED { a [ sx:LanguageStem ] ; sx:stem xsd:string }
<LanguageStemRange> CLOSED {
  a [ sx:LanguageStemRange ] ;
  sx:stem xsd:string OR @<Wildcard> ;
  sx:exclusion @<objectValue> OR @<LanguageStem>*
}
<Wildcard> BNODE CLOSED {
  a [ sx:Wildcard ]
}

101 <tripleExpression > @< TripleConstraint > OR @<OneOf > OR @<EachOf >

103 <OneOf > CLOSED { a [ sx:OneOf ] ;


104 sx:min xsd:integer ? ;
105 sx:max xsd:integer ? ;
106 sx:expressions @< tripleExpressionList2Plus > ;
107 sx:semActs @< SemActList1Plus >? ;
108 sx:annotation @<Annotation >*
109 }
110 <EachOf > CLOSED { a [ sx:EachOf ] ;
111 sx:min xsd:integer ? ;
112 sx:max xsd:integer ? ;
113 sx:expressions @< tripleExpressionList2Plus > ;
114 sx:semActs @< SemActList1Plus >? ;
115 sx:annotation @<Annotation >*
116 }
117 <tripleExpressionList2Plus > CLOSED {
118 rdf:first @< tripleExpression > ;
119 rdf:rest @< tripleExpressionList1Plus >
120 }
121 <tripleExpressionList1Plus > CLOSED {
122 rdf:first @< tripleExpression > ;
123 rdf:rest [ rdf:nil ] OR @< tripleExpressionList1Plus >
124 }
<TripleConstraint> CLOSED { a [ sx:TripleConstraint ] ;
  sx:inverse [ true false ]? ;
  sx:negated [ true false ]? ;
  sx:min xsd:integer? ;
  sx:max xsd:integer? ;
  sx:predicate IRI ;
  sx:valueExpr @<shapeExpr>? ;
  sx:semActs @<SemActList1Plus>? ;
  sx:annotation @<Annotation>*
}
<SemActList1Plus> CLOSED {
  rdf:first @<SemAct> ;
  rdf:rest [ rdf:nil ] OR @<SemActList1Plus>
}
<shapeExprList2Plus> CLOSED {
  rdf:first @<shapeExpr> ;
  rdf:rest @<shapeExprList1Plus>
}
<shapeExprList1Plus> CLOSED {
  rdf:first @<shapeExpr> ;
  rdf:rest [ rdf:nil ] OR @<shapeExprList1Plus>
}
<valueSetValueList1Plus> CLOSED {
  rdf:first @<valueSetValue> ;
  rdf:rest [ rdf:nil ] OR @<valueSetValueList1Plus>
}
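
The paired list shapes in the listing above (for example tripleExpressionList2Plus and tripleExpressionList1Plus) encode a minimum length on an RDF collection: a 1Plus list is a cell whose rdf:rest is either rdf:nil or another 1Plus cell, and a 2Plus list is a cell whose rdf:rest must itself be a 1Plus cell. The following Python sketch mirrors only that length logic. It is an illustration under stated assumptions, not part of ShEx or of any ShEx implementation: the dictionary encoding of list cells, the NIL constant, the sample node names, and the function names are all hypothetical; the check that each rdf:first value conforms to the member shape is omitted; and cyclic lists are simply rejected, which may differ from what a ShEx processor reports.

```python
# Hypothetical in-memory encoding of an RDF collection (an assumption made for
# this sketch): each list cell maps to its (rdf:first, rdf:rest) pair.
NIL = "rdf:nil"


def conforms_list1plus(cells, node):
    """Mirror a ...List1Plus shape: rdf:rest is rdf:nil or another 1Plus cell."""
    seen = set()
    while node in cells and node not in seen:
        seen.add(node)
        _first, rest = cells[node]
        if rest == NIL:
            return True
        node = rest
    # Reached a node that is not a list cell, or looped back (cyclic list).
    return False


def conforms_list2plus(cells, node):
    """Mirror a ...List2Plus shape: rdf:rest must itself be a 1Plus cell."""
    if node not in cells:
        return False
    _first, rest = cells[node]
    return conforms_list1plus(cells, rest)


# Two hypothetical collections: one with two members, one with a single member.
two = {"_:b1": ("ex:tc1", "_:b2"), "_:b2": ("ex:tc2", NIL)}
one = {"_:b1": ("ex:tc1", NIL)}

print(conforms_list2plus(two, "_:b1"))
print(conforms_list2plus(one, "_:b1"))
```

With these sample collections, the two-member list satisfies the 2Plus check while the single-member list satisfies only the 1Plus check, which is why sx:expressions in OneOf and EachOf requires at least two triple expressions.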
APPENDIX D

SHACL in SHACL
In this appendix we include the SHACL shapes graph that validates SHACL Core shapes graphs.
The version included here has been edited from the original one (see Appendix C in
https://www.w3.org/TR/shacl/) in an attempt to improve readability: we replaced the shsh prefix
with the empty one and omitted the rdfs:seeAlso declarations and some comments. It is described
in Section 6.6.

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://www.w3.org/ns/shacl-shacl#> .

: rdfs:label "SHACL for SHACL"@en ;
  sh:declare [ sh:prefix "" ;
               sh:namespace "http://www.w3.org/ns/shacl-shacl#" ;
             ] .

:ShapeShape a sh:NodeShape ;
  sh:targetClass sh:NodeShape ;
  sh:targetClass sh:PropertyShape ;
  sh:targetSubjectsOf sh:targetClass , sh:targetNode ,
    sh:targetObjectsOf , sh:targetSubjectsOf ,
    sh:and , sh:class , sh:closed , sh:datatype ,
    sh:disjoint , sh:equals , sh:flags , sh:hasValue ,
    sh:ignoredProperties , sh:in ,
    sh:languageIn , sh:lessThan , sh:lessThanOrEquals ,
    sh:maxCount , sh:maxExclusive , sh:maxInclusive , sh:maxLength ,
    sh:minCount , sh:minExclusive , sh:minInclusive , sh:minLength ,
    sh:node , sh:nodeKind , sh:not ,
    sh:or , sh:pattern , sh:property ,
    sh:qualifiedMaxCount , sh:qualifiedMinCount ,
    sh:qualifiedValueShape , sh:qualifiedValueShapesDisjoint ,
    sh:sparql , sh:uniqueLang , sh:xone ;
  sh:targetObjectsOf sh:node , sh:not , sh:property , sh:qualifiedValueShape ;
  sh:xone ( :NodeShapeShape :PropertyShapeShape ) ;
  sh:property [
    sh:path sh:targetNode ;
    sh:nodeKind sh:IRIOrLiteral ;
  ] ;
  sh:property [
    sh:path sh:targetClass ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetSubjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetObjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:or (
    [ sh:not [
        sh:class rdfs:Class ;
        sh:or ( [ sh:class sh:NodeShape ]
                [ sh:class sh:PropertyShape ]
              )
      ]
    ]
    [ sh:nodeKind sh:IRI ]
  ) ;
  sh:property [
    sh:path sh:severity ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:message ;
    sh:or ( [ sh:datatype xsd:string ]
            [ sh:datatype rdf:langString ]
          ) ] ;
  sh:property [
    sh:path sh:deactivated ;
    sh:maxCount 1 ;
    sh:in ( true false ) ;
  ] ;
  sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:class ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:closed ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:ignoredProperties ;
    sh:node :ListShape ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path ( sh:ignoredProperties [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:datatype ;
    sh:nodeKind sh:IRI ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:disjoint ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:equals ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:in ;
    sh:maxCount 1 ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:languageIn ;
    sh:maxCount 1 ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path ( sh:languageIn [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path sh:lessThan ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:lessThanOrEquals ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:maxCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:maxExclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:maxInclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:maxLength ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:minCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:minExclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:minInclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:minLength ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:nodeKind ;
    sh:in ( sh:BlankNode sh:IRI sh:Literal
            sh:BlankNodeOrIRI sh:BlankNodeOrLiteral sh:IRIOrLiteral ) ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:or ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:pattern ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:flags ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedMaxCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedMinCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedValueShape ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedValueShapesDisjoint ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:uniqueLang ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:xone ;
    sh:node :ListShape ;
  ]
.
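
Two property shapes in the listing above use the sequence path ( sh:languageIn [ sh:zeroOrMorePath rdf:rest ] rdf:first ), and the analogous one for sh:ignoredProperties, to reach every member of an RDF list and constrain it. The Python sketch below shows, over a tiny hand-written triple set, which value nodes such a path selects. It is a simplified illustration and not how a SHACL processor is implemented: the tuple encoding of triples, the helper names, and the ex:S shape are assumptions introduced here, and terms are plain strings rather than real RDF terms.

```python
# Triples are modelled as (subject, predicate, object) string tuples; this
# encoding and the node names below are assumptions made for the sketch.
triples = [
    ("ex:S", "sh:languageIn", "_:l1"),
    ("_:l1", "rdf:first", "en"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "es"),
    ("_:l2", "rdf:rest", "rdf:nil"),
]


def objects(graph, subject, predicate):
    """Objects of all triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def zero_or_more(graph, start, predicate):
    """Nodes reachable from start by following predicate zero or more times."""
    reached, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for nxt in objects(graph, node, predicate):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


def list_members(graph, focus, list_property):
    """Value nodes of ( list_property [ sh:zeroOrMorePath rdf:rest ] rdf:first )."""
    members = set()
    for head in objects(graph, focus, list_property):
        for cell in zero_or_more(graph, head, "rdf:rest"):
            members.update(objects(graph, cell, "rdf:first"))
    return members


print(sorted(list_members(triples, "ex:S", "sh:languageIn")))
```

The zero-or-more step visits every list cell (and rdf:nil, which has no rdf:first), so the final rdf:first step yields exactly the list members. In the listing, each of those members must then satisfy sh:datatype xsd:string for sh:languageIn, or sh:nodeKind sh:IRI for sh:ignoredProperties.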
Bibliography
[1] S. Abiteboul, R. Hull, and V. Vianu, Eds. Foundations of Databases: The Logical Level, 1st
ed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995.
[2] S. Abiteboul, I. Manolescu, P. Rigaux, M.-C. Rousset, and P. Senellart. Web Data
Management. Cambridge University Press, 2012. DOI: 10.1017/cbo9780511998225.
[3] D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Effective Modeling in
RDFS and OWL, 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2011.
[4] G. Antoniou, P. Groth, F. van Harmelen, and R. Hoekstra. A Semantic Web Primer. The
MIT Press, 2012.
[5] C. Arnaud Le Hors. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C
Recommendation, 2014. http://www.w3.org/TR/json-ld/
[6] C. Arnaud Le Hors. RDF Data Shapes Working Group Charter.
http://www.w3.org/2014/data-shapes/charter, 2014.
[7] T. Baker and E. Prud’hommeaux. Shape Expressions (ShEx) Primer.
https://shexspec.github.io/primer/, April 2017.
[8] T. Berners-Lee. Linked-data design issues. W3C design issue document, June 2006.
http://www.w3.org/DesignIssues/LinkedData.html
[9] P. V. Biron and A. Malhotra. XML Schema Part 2: Datatypes, 2nd ed. W3C
Recommendation, 2004. http://www.w3.org/TR/xmlschema-2/
[10] DCMI Usage Board. DCMI Metadata Terms.
http://dublincore.org/documents/dcmi-terms/, 2012.
[11] I. Boneva, J. E. Labra Gayo, and E. Prud’hommeaux. Semantics and validation of shapes
schemas for RDF. In International Semantic Web Conference, 2017.
[12] D. Booth. FHIR linked data module.
https://www.hl7.org/fhir/linked-data-module.html, April 2017.
[13] T. Bosch, E. Acar, A. Nolle, and K. Eckert. The role of reasoning for RDF validation. In
Proc. of the 11th International Conference on Semantic Systems, SEMANTICS’15, pages 33–40,
New York, ACM, 2015. DOI: 10.1145/2814864.2814867.
[14] P. Bourhis, J. L. Reutter, F. Suárez, and D. Vrgoč. JSON: Data model, query languages
and schema specification. In Proc. of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium
on Principles of Database Systems, PODS’17, pages 123–135, New York, ACM, 2017. DOI:
10.1145/3034786.3056120.
[15] G. E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):
791–799, 1976. DOI: 10.2307/2286841.
[16] D. Brickley, R. V. Guha, and A. Layman. Resource description framework (RDF)
schemas. https://www.w3.org/TR/1998/WD-rdf-schema-19980409/, 1998.
[17] K. Cagle. SHACL: It’s about time. https://dzone.com/articles/its-about-time,
March 2017.
[18] G. Carothers and A. Seaborne. TRIG: RDF dataset language.
http://www.w3.org/TR/trig/, 2014.
[19] R. Chinnici, J.-J. Moreau, A. Ryman, and S. Weerawarana. Web Services Description
Language (WSDL) Version 2.0 Part 1: Core Language. https://www.w3.org/TR/wsdl20/, 2007.
[20] J. Clark and M. Makoto, Eds. RELAX NG Specification. OASIS Committee Specification,
2001. http://relaxng.org/spec-20011203.html
[21] K. Clark and E. Sirin. On RDF validation, Stardog ICV, and assorted remarks. In RDF
Validation Workshop. Practical Assurances for Quality RDF Data, W3C, Cambridge, MA,
Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[22] C. S. Coen, P. Marinelli, and F. Vitali. Schemapath, a minimal extension to XML Schema
for conditional constraints. In Proc. of the 13th International Conference on World Wide Web,
WWW’04, pages 164–174, New York, ACM, 2004. DOI: 10.1145/988672.988695.
[23] K. Coyle and T. Baker. Dublin core application profiles. Separating validation from
semantics. In RDF Validation Workshop. Practical Assurances for Quality RDF Data, W3C,
Cambridge, MA, Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[24] R. Cyganiak and D. Reynolds. The RDF Data Cube Vocabulary. W3C Recommendation,
2014. https://www.w3.org/TR/vocab-data-cube/
[25] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C
Recommendation, February 2014. http://www.w3.org/TR/rdf11-concepts/
[26] D. Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema.
W3C Recommendation, 2004. https://www.w3.org/TR/2004/REC-rdf-schema-20040210/
[27] D. Brickley and R. V. Guha. RDF Schema 1.1. W3C Recommendation, 2014.
http://www.w3.org/TR/rdf-schema/
[28] S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF Mapping Language. W3C
Recommendation, September 2012. http://www.w3.org/TR/r2rml/
[29] D. L. McGuinness and F. van Harmelen. OWL Web Ontology Language Overview. W3C
Recommendation, 2004. https://www.w3.org/TR/owl-features/
[30] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle.
RML: A generic language for integrated RDF mappings of heterogeneous data. In Proc.
of the 7th Workshop on Linked Data on the Web, April 2014.
http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_01.pdf
[31] A. Dimou, D. Kontokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens,
S. Hellmann, and R. Van de Walle. Assessing and refining mappings to RDF to improve
dataset quality. In Proc. of the 14th International Semantic Web Conference, October 2015.
DOI: 10.1007/978-3-319-25010-6_8.
[32] M. Dojchinovski, D. Kontokostas, R. Rößling, M. Knuth, and S. Hellmann. DBpedia
links: The hub of links for the web of data. In Proc. of the SEMANTiCS Conference
(SEMANTiCS 2016), September 2016.
https://svn.aksw.org/papers/2016/SEMANTiCS_DBpedia_Links/public.pdf
[33] B. DuCharme. Learning SPARQL. O’Reilly Media, Inc., 2011.
[34] M. Duerst and M. Suignard. Internationalized resource identifiers (IRIs). DOI:
10.17487/rfc3987.
[35] ECMA International. The JSON data interchange format.
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf, 2013.
[36] I. Ermilov, J. Lehmann, M. Martin, and S. Auer. LODStats: The data web census
dataset. In Proc. of 15th International Semantic Web Conference, Resources Track (ISWC),
2016. DOI: 10.1007/978-3-319-46547-0_5.
[37] D. F. Alvarez, J. E. Labra Gayo, and H. Garcia-Gonzalez, Eds. Inference and Serialization
of Latent Graph Schemata Using ShEx, number 10 in IARIA Series, 2016.
http://thinkmind.org/index.php?view=article&articleid=semapro_2016_4_40_30038
[38] P. M. Fischer, G. Lausen, A. Schätzle, and M. Schmidt. RDF Constraint Checking.
In Fernandez Alvarez et al. [37], pages 205–212.
http://ceur-ws.org/Vol-1330/paper-33.pdf
[39] C. Fürber and M. Hepp. Using SPARQL and SPIN for data quality management on the
semantic web. In W. Abramowicz and R. Tolksdorf, Eds., Business Information Systems,
volume 47 of Lecture Notes in Business Information Processing, pages 35–46, Springer, 2010.
DOI: 10.1007/978-3-319-59336-4.
[40] Jose E. Labra Gayo, E. Prud’hommeaux, I. Boneva, S. Staworko, H. Solbrig, and S. Hym.
Towards an RDF validation language based on regular expression derivatives.
http://ceur-ws.org/Vol-1330/paper-32.pdf
[41] R. J. Glushko, Ed. The Discipline of Organizing. The MIT Press, 2013. DOI:
10.1002/bult.2013.1720400108.
[42] C. F. Goldfarb. The SGML Handbook. Oxford University Press, Inc., New York, 1990.
[43] P. Grosso and J. Kosek. Associating Schemas with XML Documents 1.0, 3rd ed. W3C
Working Group Note, October 2012. https://www.w3.org/TR/xml-model/
[44] S. Harris and A. Seaborne. SPARQL 1.1 Query Language. W3C Recommendation, 2013.
http://www.w3.org/TR/sparql11-query/
[45] T. Hartmann, B. Zapilko, J. Wackerow, and K. Eckert. Validating RDF data quality
using constraints to direct the development of constraint languages. In IEEE 10th
International Conference on Semantic Computing (ICSC), pages 116–123, February 2016. DOI:
10.1109/icsc.2016.43.
[46] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data
Space, volume 1. Morgan & Claypool Publishers LLC, February 2011. DOI:
10.2200/s00334ed1v01y201102wbe001.
[47] J. Hebeler, M. Fisher, R. Blace, and A. Perez-Lopez. Semantic Web Programming. Wiley
Publishing, 2009.
[48] P. Hitzler, M. Krötzsch, and S. Rudolph. Foundations of Semantic Web Technologies.
Chapman & Hall/CRC, 2009.
[49] J. Hjelm. Creating the Semantic Web with RDF: Professional Developer’s Guide. Professional
Developer’s Guide Series. Wiley, 2001.
[50] ISO/IEC. Information Technology - Document Schema Definition Languages
(DSDL) - Part 3: Rule-based Validation - Schematron. http://schematron.com/,
2016.
[51] H. Knublauch. SPIN - Modeling Vocabulary.
http://www.w3.org/Submission/spin-modeling/, 2011.
[52] H. Knublauch and D. Kontokostas. Shapes Constraint Language (SHACL). W3C Proposed
Recommendation, June 2017. https://www.w3.org/TR/shacl/
[53] D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and
A. Zaveri. Test-driven evaluation of linked data quality. In Proc. of the 23rd
International Conference on World Wide Web, WWW’14, pages 747–758, Republic and Canton of
Geneva, Switzerland, International World Wide Web Conferences Steering Committee,
2014. DOI: 10.1145/2566486.2568002.
[54] J. E. Labra Gayo. Web semántica: comprendiendo el cambio hacia la Web 3.0. Nebiblo, 2012.
[55] J. E. Labra Gayo and J. M. A. Rodríguez. Validating statistical index data represented in
RDF using SPARQL queries. In RDF Validation Workshop. Practical Assurances for
Quality RDF Data, W3C, Cambridge, MA, Boston, September 2013.
http://www.w3.org/2012/12/rdf-val
[56] J. E. Labra Gayo, H. Farham, J. C. Fernández, and J. M. Álvarez Rodríguez.
Representing statistical indexes as linked data including metadata about their computation
process. In S. Closs, R. Studer, E. Garoufallou, and M. Sicilia, Eds., Proc. of the Metadata and
Semantics Research - 8th Research Conference, MTSR, Karlsruhe, Germany, November 27–29,
2014, volume 478 of Communications in Computer and Information Science, pages 42–53,
Springer, 2014. DOI: 10.1007/978-3-319-13674-5.
[57] J. E. Labra Gayo, E. Prud’hommeaux, H. R. Solbrig, and J. M. Á. Rodríguez. Validating
and describing linked data portals using RDF shape expressions. In Proc. of the 1st Workshop
on Linked Data Quality co-located with 10th International Conference on Semantic Systems,
LDQ@SEMANTiCS, volume 1215 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[58] J. E. Labra Gayo, E. Prud’hommeaux, H. Solbrig, and I. Boneva. Validating and describing
linked data portals using shapes. http://arxiv.org/abs/1701.08924
[59] T. Lebo, S. Sahoo, and D. McGuinness. PROV-O: The PROV Ontology. W3C
Recommendation, April 2013. http://www.w3.org/TR/prov-o/
[60] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann,
M. Morsey, P. van Kleef, S. Auer, and C. Bizer. DBpedia - a large-scale, multilingual
knowledge base extracted from Wikipedia. Semantic Web Journal, 6(2):167–195, 2015.
http://jens-lehmann.org/files/2014/swj_dbpedia.pdf
[61] F. Maali and J. Erickson, Eds. Data Catalog Vocabulary (DCAT). W3C Recommendation,
2014. https://www.w3.org/TR/vocab-dcat/
[62] W. Martens, F. Neven, M. Niewerth, and T. Schwentick. Bonxai: Combining the
simplicity of DTD with the expressiveness of XML schema. In Proc. of the 34th ACM
SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS’15, pages 145–156,
New York, ACM, 2015. DOI: 10.1145/2745754.2745774.
[63] L. Miller and D. Brickley. Schemarama.
http://swordfish.rdfweb.org/discovery/2001/01/schemarama/, February 2001.
[64] B. Motik, I. Horrocks, and U. Sattler. Adding integrity constraints to OWL. In
C. Golbreich, A. Kalyanpur, and B. Parsia, Eds., OWL: Experiences and Directions (OWLED),
Innsbruck, Austria, June 6–7, 2007.
[65] M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages
using formal language theory. ACM Transactions on Internet Technology, 5(4):660–704,
November 2005. DOI: 10.1145/1111627.1111631.
[66] M. A. Musen. The Protégé project: A look back and a look forward. AI Matters, 1(4):
4–12, June 2015. DOI: 10.1145/2757001.2757003.
[67] T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In
Proc. of the ACM SIGMOD International Conference on Management of Data, SIGMOD’09,
pages 627–640, New York, ACM, 2009. DOI: 10.1145/1559845.1559911.
[68] O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax.
https://www.w3.org/TR/WD-rdf-syntax-971002/, 1997.
[69] O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax
Specification. https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/, 1999.
[70] W3C OWL Working Group. OWL 2 Web Ontology Language: Document Overview. W3C
Recommendation, October 2009. http://www.w3.org/TR/owl2-overview/
[71] T. B. Passin. Explorer’s Guide to the Semantic Web. Manning Publications Co., Greenwich,
CT, 2004.
[72] P. F. Patel-Schneider. Using description logics for RDF constraint checking and
closed-world recognition. In Proc. of the 29th Conference on Artificial Intelligence, AAAI’15,
pages 247–253. AAAI Press, 2015. http://dl.acm.org/citation.cfm?id=2887007.2887042
[73] J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL.
ACM Transactions on Database Systems, 34(3):16:1–16:45, September 2009. DOI:
10.1145/1567274.1567278.
[74] F. Pezoa, J. L. Reutter, F. Suarez, M. Ugarte, and D. Vrgoč. Foundations of JSON schema.
In Proc. of the 25th International Conference on World Wide Web, WWW’16, pages 263–273,
Republic and Canton of Geneva, Switzerland, International World Wide Web
Conferences Steering Committee, 2016. DOI: 10.1145/2872427.2883029.
[75] A. Phillips and M. Davis. Tags for identifying languages. Technical Report 47, Internet
Engineering Task Force, September 2009. DOI: 10.17487/rfc5646.
[76] S. Powers. Practical RDF. O’Reilly & Associates, Inc., Sebastopol, CA, 2003.
[77] E. Prud’hommeaux and T. Baker. ShapeMap structure and language.
https://shexspec.github.io/ShapeMap/, July 2017.
[78] E. Prud’hommeaux and G. Carothers. RDF 1.1 Turtle: Terse RDF triple language.
http://www.w3.org/TR/turtle/, 2014.
[79] E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF. W3C
Recommendation, 2008. http://www.w3.org/TR/rdf-sparql-query/
[80] E. Prud’hommeaux, Jose E. Labra Gayo, and H. R. Solbrig. Shape expressions:
An RDF validation and transformation language. In Proc. of the 10th International
Conference on Semantic Systems, SEMANTICS, pages 32–40, ACM, 2014. DOI:
10.1145/2660517.2660523.
[81] E. Prud’hommeaux, I. Boneva, J. E. Labra Gayo, and G. Kellogg. Shape expressions
language 2.0. https://shexspec.github.io/spec/, April 2017.
[82] W3C RDF Working Group. W3C validation workshop. Practical assurances for quality RDF
data, September 2013. http://www.w3.org/2012/12/rdf-val/
[83] D. Reynolds. The Organization Ontology. W3C Recommendation, 2014.
http://www.w3.org/TR/vocab-org/
[84] E. Robertsson. Combining RELAX NG and Schematron. XML.com, February 2004.
https://www.xml.com/pub/a/2004/02/11/relaxtron.html
[85] J. Rumbaugh, I. Jacobson, and G. Booch. Unified Modeling Language Reference Manual,
2nd ed. Pearson Higher Education, 2004.
[86] A. G. Ryman, A. L. Hors, and S. Speicher. OSLC resource shape: A language for defining
constraints on linked data. In C. Bizer, T. Heath, T. Berners-Lee, M. Hausenblas, and
S. Auer, editors, Linked Data on the Web, volume 996 of CEUR Workshop Proceedings.
CEUR-WS.org, 2013. DOI: 10.1145/1367497.1367760.
[87] R. Sanderson, P. Ciccarese, and B. Young. Web Annotation Vocabulary. W3C
Recommendation, February 2017. https://www.w3.org/TR/annotation-vocab/
[88] T. Segaran, C. Evans, and J. Taylor. Programming the Semantic
Web, 1st ed. O’Reilly Media, Inc., 2009.
[89] S. Gao, C. M. Sperberg-McQueen, and H. S. Thompson. W3C XML Schema Definition
Language (XSD) 1.1 Part 1: Structures. W3C Recommendation, 2012.
https://www.w3.org/TR/xmlschema11-1/
[90] S. Simister and D. Brickley. Simple application-specific constraints for RDF models. In
RDF Validation Workshop. Practical Assurances for Quality RDF Data, W3C, Cambridge,
MA, Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[91] S. Steyskal and K. Coyle. SHACL Use Cases and Requirements. W3C Working Draft,
2016. https://www.w3.org/TR/shacl-ucr/
[92] H. Solbrig and E. Prud’hommeaux. Shape Expressions 1.0 Definition.
http://www.w3.org/Submission/shex-defn/, 2014.
[93] S. Speicher, J. Arwe, and A. Malhotra, Eds. Linked Data Platform 1.0. W3C
Recommendation, 2015. https://www.w3.org/TR/ldp/
[94] S. Staworko, I. Boneva, Jose E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, and H. R.
Solbrig. Complexity and expressiveness of ShEx for RDF. In 18th International
Conference on Database Theory, ICDT, volume 31 of LIPIcs, pages 195–211, Schloss Dagstuhl -
Leibniz-Zentrum fuer Informatik, 2015.
[95] D. Steer and L. Miller. Validating RDF with treehugger and schematron. In
FOAF-Galway. Position paper, 2004.
https://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/pp/validating_rdf/
[96] J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity constraints in OWL. In Proc. of
the 24th Conference on Artificial Intelligence (AAAI’10), 2010.
[97] T. Berners-Lee and N. Mendelsohn. The rule of least power.
http://www.w3.org/2001/tag/doc/leastPower, February 2006.
[98] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. Extensible Markup
Language (XML) 1.0, 5th ed. W3C Recommendation, 2008. https://www.w3.org/TR/xml/
[99] J. C. van Dam, J. J. Koehorst, P. J. Schaap, V. A. Martins dos Santos, and M. Suarez-Diez.
RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource.
Journal of Biomedical Semantics, 6(1):39, October 2015. DOI: 10.1186/s13326-015-0038-9.
[100] E. van der Vlist. Relax NG: A Simpler Schema Language for XML. O’Reilly, Beijing, 2004.
[101] A. Wright, Ed. JSON Schema: A Media Type for Describing JSON Documents. IETF, 2016.
http://json-schema.org/

Index

Absolute IRI, 59 Data quality, 27


And, 63 Datatype IRI, 10
Apache Jena, 122, 263 DBpedia, 215
DDL, 9, 29, 40, 44
bag, 41 Description logics, 21
Big data, 1 Disambiguation, 1
Blank node, 11, 12, 59 Douglas Crockford, 37
Boneva, Iovka, 58 DTD, 40
Dublin Core Application Profiles, 48
CI, 215
Clinical records, 203 ECMA, 37
Closed shapes, 44 ELI, 123
Closed World Assumption, 2, 47 Embedded schema, 36
co-existence constraints, 206 Entailment, 188
Codasyl, 29 Entailment regime, 188
Codepoint, 149 Exclusive or, 154, 159
Comma-Separated Values, 39
Constraint component, 186 FHIR, 18, 65, 204, 206
Continuous integration, 215 Focus node, 59
Corese, 123 Foreign key, 30
CSV, 28, 39, 40
CSV Lint, 40 George Box, 17
Custom datatypes, 144 Github, 215
CSVW, 39 GitHub, 215
CWA, 2 Grammar, 4
Cyclic data model, 91, 166
Haskell, 58, 263
Dan Brickley, 45 HL7, 18, 204
Data Definition Language, 29 HL7 FHIR, 56
Data engineer, 41 HTML, 4, 23, 24, 31, 206
Data graph, 124 HTTP link headers, 107
IDE, 263 Microdata, 23, 24
identifiers.org, 17 Microsoft, 17
Inclusive or, 98 MOF, 28
Incoming arcs, 59 Multiset, 41
Indirect recursion, 173
Inference, 42, 188 N3.js, 263
Instance data, 41 Namespace, 184
Integrated Development Environment, 122 Namespace prefix declarations, 184
IRI, 1, 10, 59 Neighborhood, 59
IRI referent, 10 Netage, 123
Nicky van Oorschot, 123
Java, 58 Node constraint, 40
Javascript, 37, 58, 193 Node shape, 124
function, 193 NoSQL, 2
JSON, 28, 37, 40, 52, 204, 237 Not, 63
arrays, 37 Notation3, 3
Booleans, 37
null, 37 OASIS, 33
numbers, 37 Object Constraint Language, 29
objects, 37 Object Management Group, 28
string, 37 OCL, 29, 206
JSON Schema, 40 OCL Constraints, 29
JSON-LD, 4, 10, 23, 55, 114, 119 Olivier Corby, 123
Online demo, 122
Kellogg, Gregg, 58 Ontology, 9, 21, 41
Ontology engineer, 41
Labra Gayo, Jose Emilio, 58
Open World, 9
Language tag, 11
Language tagged strings, 11 Open World Assumption, 2
Lexical form, 10 Or, 63
Libby Miller, 45 OSLC, 119
lingua franca, 2 OSLC Resource Shapes, 48
Linked data, 1, 2 OSLC Resource shapes, 55, 107
Linked data platform, 107 Outgoing arcs, 59
container, 107 OWA, 2
Linked Open Vocabularies, 17 OWL, 9, 21, 47, 188, 215
Literal, 10, 59 owl:AllDisjointClasses, 22
Local part, 10 owl:Class, 22
owl:Nothing, 21
Meta-Object Facility, 28 owl:Restriction, 22
owl:Thing, 21 Compositional, 2
owl:equivalentClass, 22 Custom datatypes, 69
owl:hasValue, 22 Language-tagged literal, 69, 145
owl:imports, 125, 190 RDF 1.0, 10
owl:intersectionOf, 22 RDF 1.1, 10
owl:members, 22 RDF collections, 15
owl:onProperty, 22 RDF data model, 1, 2, 10
owl:sameIndividualAs, 17 RDF Graph, 11
owl:unionOf, 22 RDF lists, 15, 130, 227
Annotation properties, 21 RDF node, 10
Closed World, 48 RDF object, 10
Constructors, 22 RDF predicate, 10
Datatype property, 21 RDF property, 10
Functional property, 47 RDF serialization format, 10
Functional syntax, 21 RDF statement, 10
Manchester syntax, 21 RDF subject, 10
Object property, 21 RDF triple, 10
OWL class, 21 RDF/XML, 10
OWL individual, 21 rdf:type declaration, 15
OWL reasoner, 22 Resource Description Framework, 9
owl namespace, 10 String literals, 11
rdf namespace, 10
Prefix declaration, 10, 56, 184 RDF 1.0, 9
Prefix label, 10 RDF Schema, 9, 20, 47, 112, 145, 188
Prefix name, 10 RDF validation, 41
prefix.cc, 10 RDF/XML, 3, 9, 10, 119
Primary key, 30 rdf4h, 263
Property shape, 124 RDFa, 4, 23, 24
Prud’hommeaux, Eric, 58 rdflib, 263
Python, 123, 263 RDFS, 9, 188, 215
rdfs:Class, 20, 139
RDF, 1, 41, 204 rdfs:Datatype, 20
rdf:first, 15, 227 rdfs:Literal, 20
rdf:langString, 69, 145 rdfs:Property, 20
rdf:nil, 15, 227 rdfs:Resource, 20
rdf:rest, 15, 227 rdfs:comment, 20
rdf:type, 20, 41, 59, 88, 107, 111, 112, rdfs:domain, 20
138, 145, 190 rdfs:label, 20
Blank node, 124, 149 rdfs:range, 20
rdfs:subClassOf, 20, 138, 145, 190 sh:IRI, 142, 146, 178
rdfs:subPropertyOf, 20 sh:IRIOrLiteral, 142, 146
class, 139 sh:Info, 134, 246
rdfs namespace, 10 sh:Literal, 142, 146
RDFS inference, 245 sh:NodeKindConstraintComponent,
RDFUnit, 46, 123 132
Recursion, 91, 166, 171 sh:NodeShape, 120
Regular expressions, 71, 150 sh:SPARQLTarget, 190
flags, 150 sh:SPARQLTargetType, 190
i flag, 72 sh:ValidationReport, 126
m flag, 72 sh:ValidationResult, 126
Meta-characters, 71 sh:Violation, 134, 246
q flag, 72 sh:ViolationResult, 246
s flag, 72 sh:Warning, 134
x flag, 72 sh:alternativePath, 130
Relational databases, 29 sh:and, 133, 154, 166, 178, 257
Relative IRI, 59 sh:class, 132, 133, 142, 145
RelaxNG, 33, 55, 263 sh:closed, 133, 177, 178, 257
Resource, 10 sh:conforms, 126
Ruby, 58 sh:datatype, 133, 142
Ruby RDF, 263 sh:deactivated, 135, 241
sh:defaultValue, 182
Scala, 58 sh:description, 133, 182, 260
Schema.org, 17 sh:detail, 126
Schemarama, 45 sh:disjoint, 133, 180
Schematron, 34, 204, 206, 239, 263 sh:entailment, 188
Semantic web, 2 sh:equals, 133, 180
Semantic web stack, 18 sh:flags, 149, 150
Service endpoint, 107 sh:focusNode, 126
SGML, 31 sh:group, 133, 182
SHACL, 46, 119, 267 sh:hasValue, 133, 142, 148
$currentShape, 184 sh:ignoredProperties, 133, 177, 258
$shapesGraph, 184 sh:in, 133, 142, 147
$this, 184 sh:inversePath, 130
-> Operator, 190 sh:labelTemplate, 186
. Operator, 190 sh:languageIn, 133, 151, 153
sh:BlankNode, 142, 146 sh:length, 133
sh:BlankNodeOrIRI, 142, 146 sh:lessThan, 133, 180
sh:BlankNodeOrLiteral, 142, 146 sh:lessThanOrEquals, 133, 180
sh:maxCount, 133, 141, 200 sh:ClassConstraintComponent,
sh:maxExclusive, 133, 149 132
sh:maxInclusive, 149 sh:sourceConstraintComponent, 126
sh:maxLength, 133, 149 sh:sourceShape, 126
sh:message, 133, 184 sh:sparql, 184
sh:minCount, 133, 141, 148, 200 sh:targetClass, 107, 120, 137–139
sh:minExclusive, 149 sh:targetNode, 106, 137
sh:minInclusive, 133, 149 sh:targetObjectsOf, 107, 137, 141
sh:minLength, 133, 149 sh:targetSubjectsOf, 107, 137, 140
sh:name, 133, 182, 260 sh:uniqueLang, 133, 151, 153, 251
sh:namespace, 184 sh:validator, 186
sh:node, 164, 178, 201 sh:value, 126
sh:nodeKind, 133, 142, 146, 178 sh:xone, 133, 154, 159, 166
sh:nodeValidator, 186, 188 sh:zeroOrMorePath, 130
sh:not, 133, 154, 166 sh:zeroOrOneMorePath, 171
sh:oneOrMorePath, 130 sh:zeroOrOnePath, 130, 178
sh:optional, 186 Advanced features, 190
sh:or, 133, 154, 157, 166, 178 Annotations properties, 190
sh:order, 133, 182 ASK validators, 190
sh:parameter, 186 Cardinality, 141, 200
sh:path, 124, 186 Closed shapes, 177
sh:pattern, 133, 149, 150 Compact syntax, 190, 237
sh:prefix, 184 Constraint component, 131
sh:prefixes, 184 Constraint components, 193
sh:property, 165 Constraint expressions, 191
sh:propertyValidator, 186 Datatype facets, 148
sh:qualifiedMaxCount, 133, 174 Datatypes, 142
sh:qualifiedMinCount, 133, 174 Disjoint qualified value shapes, 176
sh:qualifiedValueShape, 133, 174 Entailment, 246
sh:qualifiedValueShapeDisjoint, Exactly one, 159
176 Functions, 190
sh:qualifiedValueShapesDisjoint, IF-THEN, 162
133 Implicit class target, 139
sh:result, 126 Importing shapes graphs, 125
sh:resultMessage, 126, 133 Node expressions, 191
sh:resultPath, 126 Node shapes, 129
sh:resultSeverity, 126 Non-validating SHACL Properties, 182
sh:select, 184, 186 Property pair constraints, 179
sh:severity, 134 Property path, 171, 173, 178
Property shapes, 129 Closed, 87
Qualified value shapes, 166, 174 CLOSED qualifier, 89
Rules, 190, 191 Closed shapes, 257
SELECT based validator, 186 Closing a property, 80
SELECT validators, 190 Comments, 59
sh:property, 130 Compact syntax, 59
SHACL instance, 138, 145 Curly braces, 60
SHACL Javascript, 192 Datatype constraints, 68
SHACL paths, 130 Datatype facets, 70
SHACL-JS, 193 EachOf, 80, 86
SHACL-SPARQL, 133, 183 Exclusions, 76
Shapes, 119 External shapes, 92
Shapes graph, 121 EXTRA qualifier, 89
SPARQL based targets, 190 Extra qualifier, 87
SPARQL Constraint components, 185 Fixed shape map, 61, 105
Target declarations, 119, 137 Focus keyword, 106
Validation report, 126 Focus node, 59
Validation result, 126 FractionDigits, 70
SHACL community group, 119 Hidden negation, 88, 105
SHACL Core, 119, 122, 190 IF-THEN pattern, 102
SHACL Playground, 122 IF-THEN-ELSE pattern, 103
SHACL-Core, 184 import, 113
SHACL-SPARQL, 119, 122, 184 Inverse property, 59
Shaclex, 58, 123, 240 Inverse triple constraint, 85
Shape Expression, 63 Invocation, 60
Shapes Constraint Language, see Labeled triple expression, 93
SHACL, 119 Language-tagged values, 74
Shapes graph, 124 Length, 70
Shared Entity, 17 Literal datatype, 65
ShEx, 55, 119, 204 Literals, 59
a keyword, 59 Logical operators, 64
dot operator, 65 MaxExclusive, 70
AND, 67, 95 MinInclusive, 70
And, 60 Negative dependency, 104
Annotations, 94 Nested shapes, 84
BASE declarations, 59 Node constraint, 59, 63, 64
Blank nodes, 59 Node kind, 65, 67
BNF Grammar, 60 Node neighborhood, 59
Cardinalities, 80 NOT, 95, 101, 102
Not, 60 ShExR, 114
Numeric facets, 65 SPARQL, 2, 4, 9, 18, 45–48, 55, 56, 59,
OneOf, 82 107, 184, 188, 239
OR, 95, 98 $PATH, 186
Or, 60 $this, 186
PREFIX declarations, 59 sh:validator, 186
Property, 59 ASK queries, 19
Query shape map, 62, 106 Basic graph patterns, 18
Ranges, 74 CONSTRUCT queries, 19
Regex pattern, 70 Limit, 19
Regular expressions, 70 Offset, 19
Reusing shapes, 96 Option, 19
Schema, 55, 59, 220 Order, 19
Semantic actions, 110 regex function, 150
Shape, 60, 63, 78 SELECT, 186
Shape label, 105 SELECT queries, 19
Shape maps, 105 Triple pattern, 18
Shape reference, 90 Union of patterns, 19
ShapeAnd, 63 Variables, 18
ShapeExternal, 64 SPARQL 1.1 paths, 130
ShapeNot, 64 SPARQL constraint, 184
ShapeOr, 63 SPARQL functions, 190
ShExC, 55, 73 SPARQL Property Path, 247
ShExJ, 55, 114 SPARQL property paths, 85
ShExR, 55, 219 SPIN, 46, 119
Start, 64, 105, 220 SQL, 9, 28, 29, 39, 44, 52
String facets, 65 Squish, 45
TotalDigits, 70 Stardog ICV, 47
Triple constraint, 59, 66 Structured Query Language, 29
Triple expression, 60 STTL, 123
Triple pattern, 62, 106
Unique, 251 Tab-Separated Values, 39
Unit value sets, 73 Tim Berners-Lee, 123
Value sets, 65, 72 TopBraid, 263
Wildcard, 106 TopBraid Composer, 122
ShEx 1.0, 55 TopBraid SHACL API, 122
ShEx community group, 55 TopQuadrant, 122, 263
ShExC, 59, 237 Travis, 215
ShExJ, 114, 237 TreeHugger, 45
Trig, 10 XML, 28, 31, 40, 41, 52, 204
TSV, 39 Attribute, 31
Turtle, 3, 10, 55, 56, 59, 119, 237 Element, 31
Object list, 14 Post-schema validation infoset, 33
Predicate list, 13 PSVI, 33, 240
XML Information Set, 31
UML, 28, 52, 206
XML Processing instruction, 36
Class diagram, 28
XML Schema, 9, 33, 40, 44, 70, 105, 204,
UML 2, 28
206, 240
Unicode, 37
xsd:string, 11
Unified Modeling Language, 28
XML Schema datatypes, 69, 144
Unique keys, 250
XML Schema facets, 59, 70, 148
Unique Name Assumption, 47
XPath, 3, 34, 71, 239
Vocabulary, 17 xsd namespace, 10
xsd:noNamespaceSchemaLocation
W3C recommendation, 119 attribute, 36
Web of data, 1 xsd:schemaLocation attribute, 36
WebIndex, 267 XSLT, 3
Wikidata, 107
Working group note, 193 Yahoo, 17
WSDL, 37, 105 Yandex, 17
