Validating RDF Data 2017
Eric Prud’hommeaux
W3C/MIT and Micelio
Iovka Boneva
University of Lille
Dimitris Kontokostas
University of Leipzig
Morgan & Claypool Publishers
Copyright © 2018 by Morgan & Claypool
KEYWORDS
RDF, ShEx, SHACL, shape expressions, shapes constraint language, data quality,
web of data, Semantic Web, linked data
Contents
Preface
1 Introduction
  1.1 RDF and the Web of Data
  1.2 RDF: The Good Parts
  1.3 Challenges for RDF Adoption
  1.4 Structure of the Book
  1.5 Conventions and Notation
3 Data Quality
  3.1 Non-RDF Schema Languages
    3.1.1 UML
    3.1.2 SQL and Relational Databases
    3.1.3 XML
    3.1.4 JSON
    3.1.5 CSV
  3.2 Understanding the RDF Validation Problem
  3.3 Previous RDF Validation Approaches
    3.3.1 Query-based Validation
    3.3.2 Inference-based Approaches
    3.3.3 Structural Languages
  3.4 Validation Requirements
    3.4.1 General Requirements
    3.4.2 Graph-based Requirements
    3.4.3 RDF Data Model Requirements
    3.4.4 Data-modeling-based Requirements
    3.4.5 Expressiveness of Schema Language
    3.4.6 Validation Invocation Requirements
    3.4.7 Usability Requirements
  3.5 Summary
  3.6 Suggested Reading
4 Shape Expressions
  4.1 Use of ShEx
  4.2 First Example
  4.3 ShEx implementations
  4.4 The Shape Expressions Language
    4.4.1 Shape Expressions Compact Syntax
    4.4.2 Invoking Validation
    4.4.3 Structure of Shape Expressions
    4.4.4 Start Shape Expression
  4.5 Node Constraints
    4.5.1 Node kinds
    4.5.2 Datatypes
    4.5.3 Facets on Literals
    4.5.4 Value Sets
  4.6 Shapes
    4.6.1 Triple Constraints
    4.6.2 Groupings
    4.6.3 Cardinalities
    4.6.4 Choices
    4.6.5 Nested Shapes
    4.6.6 Inverse Triple Constraints
    4.6.7 Repeated Properties
    4.6.8 Permitting other Triples
  4.7 References
    4.7.1 Shape References
    4.7.2 Recursion and Cyclic References
    4.7.3 External Shapes
    4.7.4 Labeled Triple Expression
    4.7.5 Annotations
  4.8 Logical Operators
    4.8.1 Conjunction
    4.8.2 Disjunction
    4.8.3 Negation
  4.9 Shape Maps
    4.9.1 Fixed Shape Maps
    4.9.2 Query Shape Maps
    4.9.3 Result Shape Maps
    4.9.4 JSON Representation
    4.9.5 Chaining Validation Workflows
  4.10 Semantic Actions
  4.11 ShEx and Inference
  4.12 Importing schemas
  4.13 RDF and JSON-LD Syntax
  4.14 Summary
  4.15 Suggested Reading
5 SHACL
  5.1 Simple Example
  5.2 SHACL Implementations
  5.3 Basic Definitions: Shapes Graphs, Node, and Property Shapes
  5.4 Importing other Shapes Graphs
  5.5 Validation Report
  5.6 Shapes
    5.6.1 Node shapes
    5.6.2 Property Shapes
    5.6.3 Constraint Components
    5.6.4 Human Friendly Messages
    5.6.5 Declaring Shape Severities
    5.6.6 Deactivating Shapes
  5.7 Target Declarations
    5.7.1 Target Node
    5.7.2 Target Class
    5.7.3 Implicit Class Target
    5.7.4 Target Subjects Of
    5.7.5 Target Objects Of
  5.8 Cardinality
  5.9 Constraints on Values
    5.9.1 Datatypes
    5.9.2 Class of Values
    5.9.3 Node Kinds
    5.9.4 Sets of Values
    5.9.5 Specific Value
  5.10 Datatype Facets
    5.10.1 Value Ranges
    5.10.2 String-based Constraints
    5.10.3 Language-based Constraints
  5.11 Logical Constraints: and, or, not, xone
    5.11.1 AND
    5.11.2 OR
    5.11.3 Exactly One
    5.11.4 Not
    5.11.5 Combining Logical Operators
  5.12 Shape-based Constraints
    5.12.1 Shape References and Recursion
    5.12.2 Qualified Value Shapes
  5.13 Closed Shapes
  5.14 Property Pair Constraints
  5.15 Non-validating SHACL Properties
  5.16 SHACL-SPARQL
    5.16.1 SPARQL Constraints
    5.16.2 SPARQL-based Constraint Components
  5.17 SHACL and Inference Systems
  5.18 SHACL Compact Syntax
  5.19 SHACL Rules and Advanced Features
  5.20 SHACL Javascript
  5.21 Summary
  5.22 Suggested Reading
6 Applications
  6.1 Describing a Linked Data Portal
    6.1.1 WebIndex in ShEx
    6.1.2 WebIndex in SHACL
  6.2 Describing Clinical Records - FHIR
    6.2.1 FHIR as Linked Data
    6.2.2 Consistency constraints
    6.2.3 FHIR/RDF Development
    6.2.4 Generic Properties
  6.3 Springer Nature SciGraph
  6.4 DBpedia Validation Use Cases
    6.4.1 Ontology-based Validation
    6.4.2 RDF Mappings Validation
    6.4.3 Validating Link Contributions with SHACL
    6.4.4 Ontology Validation with SHACL
  6.5 ShEx for ShEx
  6.6 SHACL in SHACL
  6.7 Summary
  6.8 Suggested Reading
Bibliography
Preface
This book describes two languages for implementing constraints on RDF data: Shape Expressions
(ShEx) and the Shapes Constraint Language (SHACL). It presents the main features of both from
a user perspective and offers a comparison of the two technologies. Throughout this book, we
develop a small number of examples that typify validation requirements and demonstrate how
they can be met with ShEx and SHACL. The book is not intended to be a formal specification of
the languages, for which the interested reader can consult the corresponding reference
documents; rather, it is meant to serve as an introduction to the technologies, with some
background on the rationale of their design and some points of comparison.
Chapter 1 provides a brief introduction to the topic. Chapter 2 presents a short
overview of the RDF data model and RDF-related technologies; this chapter could be skipped
by any reader who already knows RDF or Turtle. Chapter 3 helps the reader to understand what
to expect from data validation. It describes the problem of RDF validation and some approaches
that have been proposed. This book specifically reviews two of these approaches in further detail:
ShEx (Chapter 4) and SHACL (Chapter 5). These chapters describe each language and provide
a practical introduction using examples. Following the discussion of both languages, Chapter 6
presents some applications using either ShEx, SHACL, or both. Finally, Chapter 7 compares
ShEx and SHACL and offers some conclusions.
The goal of this book is to serve as a practical introduction to ShEx and SHACL using
examples. While we omitted formal definitions or specifications, references for further reading
can be found at the end of each chapter. We give a quick overview of some background and
related technologies so that readers without RDF knowledge can follow the book’s contents.
Also, it is not necessary to have any prior knowledge of programming or ontologies to understand
RDF validation technologies. The intended audience is anyone interested in data representation
and quality.
Jose Emilio Labra Gayo, Eric Prud’hommeaux, Iovka Boneva, and Dimitris Kontokostas
July 2017
Foreword by Phil Archer
“Anyone can say anything about anything,” says the mantra for the Semantic Web. More formally,
the Semantic Web adopts the Open World Assumption: just because your data encodes a set of
facts, that doesn’t mean there aren’t other facts stated elsewhere about the same thing. All of
which is fine and part of the design of RDF which supports the creation of a graph at Web scale,
but in a lot of practical applications you just need to know whether the triples you’ve ingested
match what you were expecting; you need validation. You might think of it as a defined subset of
the whole graph, or maybe a profile, providing a huge boost to interoperability between disparate
systems. If you can validate the data you’ve received then you can process it with confidence,
using more terse code, perhaps with more performant queries.

I don’t accept that RDF is hard, certainly no harder than any other Web technology; what is hard
is thinking in graphs. Keeping in your head that this node supports these properties and has
relationships with those other nodes becomes complex for anything other than trivial datasets.
The validation techniques set out in this book provide a means to tame that complexity, to set
out for humans and machines exactly what the structure of the data is or should be. That’s got to
be helpful and, incidentally, ties in with new work now under way at W3C on dataset exchange.

In my role at W3C I watched as the SHACL and ShEx camps tried hard to converge on a single
method: they couldn’t, hence the two different approaches. Both are described in detail here with
copious examples, which is just what you need to get started. How can you choose between the
two methods? Chapter 7 gives a detailed comparison and allows you to make your own choice.
Whichever you choose, this is the book you need to make sense of RDF validation.
Introduction
1.1 RDF AND THE WEB OF DATA
These days more and more devices generate data automatically and it is relatively easy to develop
applications in different domains backed by databases and exposing data to the Web. The amount
and diversity of data produced clearly exceeds our capacity to consume it.
The term big data has emerged to name data that is so large and complex that traditional
data processing applications can’t handle it. Big data has been described by at least three words
starting with V: volume, velocity, and variety. Although volume and velocity are the most visible
features, variety is a key concern that hinders data integration and generates many
interoperability problems.
RDF was proposed as a graph-based data model which became part of the Semantic Web
vision. Its reliance on the global nature of URIs offered a solution to the data integration problem
as RDF datasets produced by different means can seamlessly be integrated with other data. Data
integration using RDF is faster and more robust than traditional solutions in the face of schema
changes.
RDF is also a key enabler of linked data. Linked data [46] was proposed as a set of best
practices to publish data on the Web. It was introduced by Tim Berners-Lee [8] and was based
on four main principles. RDF is mentioned in the third principle as one of the standards that
provides useful information. The goal is that information must be useful not only for humans
navigating through browsers (for which HTML would be enough) but also for other agents that
may automatically process that data.
The linked data principles became popular and several initiatives were created to publish
data portals. The amount of data on the Web has increased significantly in recent years. For
example, the LODStats project [36] aggregates around 150 billion triples from 2,973 datasets.
• Disambiguation. The use of IRIs to identify predicates and to make assertions about
resources enables the user to globally identify the property that is being asserted as well as
the resources involved in the statement. Those global properties can be identified by
automated agents, which can recognize the data that they must understand in a non-ambiguous
way.
• RDF as an integration language. RDF is compositional in the sense that two RDF graphs
obtained from independent sources can automatically be merged to obtain a larger graph.
This property facilitates the integration of data from heterogeneous sources.
One of the biggest challenges in computer science today is solving the interoperability
problem between applications that manipulate data from heterogeneous sources. RDF is a
step toward partially solving this problem, as RDF data can be integrated automatically
even if it has been produced by different parties.
• RDF as a lingua franca for semantic web and linked data. The simplicity and generality of
the RDF data model enables its use to model any kind of data that can be easily integrated
with other data.
RDF is at the core of the semantic web stack or layer cake and is mentioned in the linked
data principles and in the five-star model.
• RDF data stores and SPARQL. SPARQL was proposed as a query language for RDF in
2008. The language met with overwhelming acceptance and adoption by the RDF community.
The ability to query led to the development of many new applications as well as
databases and libraries. RDF data stores became popular and some companies started
using RDF internally to represent their data. Some of those applications chose RDF for
purely practical reasons, even without reference to the semantic web. Storing RDF and
querying it using SPARQL offers a very flexible model which can adapt very quickly to data
model changes. RDF data stores can be seen as part of the NoSQL movement, and there
are high-capacity RDF data stores that can work with very large databases [67].
• Extensibility. When one starts to develop an application to solve some problem, it is
necessary to record information in a format with room to grow, which enables the data model
to evolve and increasingly adapt to new needs. The extensible graph model of RDF makes
it very easy to add more statements to any graph.
• Flexibility. While a change in a relational database may be difficult to accomplish, RDF
embraces flexibility, and such changes are usually a matter of updating the triples.
• Open by default. The semantic web approach to knowledge representation promoted what is
called Open World Assumption (OWA) instead of the Closed World Assumption (CWA)
which was popular in previous knowledge representation systems. The CWA considers
that what is not known to be true must be false, while the OWA considers that what is
not known is just unknown.
The CWA is usually applied in systems that have complete information while the OWA
is more natural for incomplete information systems like the Web.
Given that RDF was applied to the semantic web, most applications based on RDF also
adopt the Open World Assumption and adapt to the appearance of new data.
Although RDF and related technologies employ the Open World Assumption by default,
this does not mean that every application must adopt that assumption. In some contexts,
it may be necessary to take the opposite view and consider that a system contains all the
information on some topic in order to operate.
Alias Namespace
prefix : <http://example.org/>
prefix cex: <http://purl.org/weso/computex/ontology#>
prefix cdt: <http://example.org/customDataTypes#>
prefix dbr: <http://dbpedia.org/resource/>
prefix ex: <http://example.org/>
prefix qb: <http://purl.org/linked-data/cube#>
prefix org: <http://www.w3.org/ns/org#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix schema: <http://schema.org/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix sx: <http://shex.io/ns/shex#>
RDF is being applied to many domains, some of them highly specialized. We opted to
present examples using concepts from domains like people, courses, and companies that we
think will be familiar to any reader. Most of the examples use properties borrowed from
schema.org (http://schema.org), which provides many concepts from such domains. The examples
are for illustration purposes only and are not intended to check schema.org rules. Nevertheless,
validating schema.org using ShEx or SHACL can be an interesting exercise for readers.
For examples that involve validation of a node against a shape, we use the following notation:
which means that node :good validates against shape :Shape, while node :bad does not.
The examples have been tested using the different tools available. We maintain a public
repository where we keep the examples used in this book. The URL is:
https://github.com/labra/validatingRDFBookExamples.
CHAPTER 2
[Figure: an RDF triple with its subject, predicate, and object.]
An asserted RDF triple means that some relationship, indicated by the predicate, holds
between the resources denoted by the subject and object. This is known as an RDF statement.
The predicate is an IRI that denotes a property. An RDF statement can be thought of as a binary
relation, identified by the property, between the subject and object.
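As a minimal sketch of such a statement, the following Turtle line asserts a single triple with all three terms written as full IRIs. The subject and object IRIs are illustrative assumptions, not data taken from a figure in this book; the predicate is the schema.org property knows.

```turtle
# One RDF statement: subject, predicate, object, terminated by a dot.
# The subject and object IRIs are illustrative; the predicate is schema:knows.
<http://example.org/alice> <http://schema.org/knows> <http://example.org/bob> .
```

Read as a binary relation, this states that the relation knows holds between the resource identified as alice and the resource identified as bob.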
There can be three kinds of nodes: IRIs, literals, and blank nodes.
• An IRI (Internationalized Resource Identifier) [34] refers to a resource (the referent). A
resource can be anything. IRIs can appear as subjects, predicates, and objects. In Turtle,
IRIs are enclosed by < and >. For example, an IRI can be <http://example.org/john>.
Most RDF formats include a mechanism called prefix declaration, which simplifies writing
long IRIs by declaring prefix labels. A prefix label associates an alias with an
IRI and enables the definition of prefixed names. A prefixed name contains a prefix label
and a local part separated by : and represents the IRI formed by concatenating the IRI
associated with the prefix label and the local part. For example, if ex is declared as a prefix
label to represent <http://example.org/>, then ex:alice is a prefixed name that represents
<http://example.org/alice> (see Figure 2.2).
There are some popular namespace aliases like rdf, xsd, rdfs, owl, etc. The http://prefix.cc
service can be used to look up the IRI associated with those popular aliases. The
snippets of code used in this book assume the prefix declarations listed in Table 1.1.
[Figure 2.2: the prefixed name ex:alice denotes the IRI <http://example.org/alice>.]
• A literal denotes a resource that has an associated value, for example, an integer or string
value. Literals can only appear as objects in triples. They contain a lexical form and a
datatype IRI, which are represented as "lexicalForm"^^datatype in Turtle. For example,
"23"^^xsd:integer represents an integer with value 23 and "1980-03-01"^^xsd:date represents
the date March 1, 1980.
All literals in RDF have an associated datatype. String literals with no declared datatype
are assumed to have the xsd:string datatype by default. So "hi" is the same as
"hi"^^xsd:string.
A special kind of literal is the language-tagged string, a literal with datatype
rdf:langString that also contains a language tag [75] to identify a specific language.
Language-tagged strings are represented in Turtle as "string"@tag. For example, "hola"@es
represents the literal value "hola" written in Spanish (es).
• Blank nodes are local identifiers which do not identify specific resources. Blank nodes
can be used as subjects or objects of triples. They specify that something with the given
relationship exists, without explicitly naming it.
In Turtle, blank nodes can be denoted by an underscore followed by a colon and a local
identifier. For example: _:id represents a blank node.
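The three kinds of nodes described above can be seen together in a short Turtle sketch. The data is illustrative: the properties ex:age and ex:greeting and the blank node label _:someone are hypothetical names introduced only for this example, and the xsd prefix is assumed to map to the usual XML Schema namespace.

```turtle
prefix ex:     <http://example.org/>
prefix dbr:    <http://dbpedia.org/resource/>
prefix schema: <http://schema.org/>
prefix xsd:    <http://www.w3.org/2001/XMLSchema#>

# IRIs: a full IRI and a prefixed name; ex:bob stands for <http://example.org/bob>.
<http://example.org/alice> schema:knows ex:bob .

# Literals: a plain string (implicitly xsd:string), two typed literals,
# and a language-tagged string. ex:age and ex:greeting are hypothetical properties.
ex:bob schema:name      "Robert" .
ex:bob schema:birthDate "1980-03-10"^^xsd:date .
ex:bob ex:age           "23"^^xsd:integer .
ex:bob ex:greeting      "hola"@es .

# Blank node: ex:bob knows someone, not named by an IRI, who was born in Oviedo.
ex:bob    schema:knows      _:someone .
_:someone schema:birthPlace dbr:Oviedo .
```

Note that every literal appears in the object position, while the blank node appears once as an object and once as a subject.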
An RDF graph is a set of RDF triples. Notice that the edges of RDF graphs can only
be labeled with IRIs. This is an important feature of RDF that makes it possible to globally
identify the predicates asserted by triples. The subjects can only be IRIs or blank nodes, while
the objects can be IRIs, blank nodes, or literals.
The corresponding RDF graph has been depicted in Figure 2.3. Rounded boxes represent
IRIs while orange rectangles represent literals.
[Figure 2.3: RDF graph with nodes such as ex:alice and ex:carol, the literals "Robert"
(xsd:string) and "1980-03-10" (xsd:date), and arcs labeled schema:name, schema:birthDate,
schema:knows, and schema:birthPlace.]
Blank nodes can be used to make assertions about some elements whose IRIs are not
known.
An important feature of RDF graphs is that two independent RDF graphs can automatically
be merged to obtain a larger RDF graph formed by the union of their sets of triples. Given
the global nature of IRIs, nodes with the same IRI are automatically unified. Using shared IRIs
makes the powerful statement that the entities and relationships in one graph carry the same intent
as they do in the other graphs using the same identifiers. In a sense, the use of RDF gets rid of
the data merging problem and lets us focus on the hard problems of establishing shared entities
and vocabularies.
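Because a graph is a set of triples, merging can be sketched as a plain set union. The snippet below is a minimal illustration that represents terms as strings and assumes that neither graph contains blank nodes (blank node labels are local to a graph and would need renaming before a union). The sample triples are hypothetical.

```python
Triple = tuple[str, str, str]

# Two independently produced graphs that happen to share IRIs.
graph_a: set[Triple] = {
    ("ex:bob", "schema:knows", "ex:carol"),
    ("ex:carol", "schema:knows", "ex:alice"),
}
graph_b: set[Triple] = {
    ("ex:carol", "schema:knows", "ex:alice"),          # also asserted in graph_a
    ("ex:carol", "schema:age", '"23"^^xsd:integer'),
}

# The merge is the union of the triple sets; shared IRIs unify automatically
# and a triple asserted in both graphs appears only once.
merged = graph_a | graph_b

# Nodes are the terms that appear as subject or object of some triple.
nodes = {s for s, _, _ in merged} | {o for _, _, o in merged}
```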
For example, the union of the RDF graphs from Figures 2.3 and 2.4 is depicted in Figure 2.5.
[Figure 2.4: RDF graph over the nodes ex:alice, ex:carol, and ex:dave and the literal 23 (xsd:integer), with arcs labeled schema:knows, schema:age, and schema:birthPlace.]
[Figure 2.5: merged RDF graph over the nodes ex:alice, ex:carol, ex:dave, and dbr:Oviedo and the literals 1980-03-10 (xsd:date) and 23 (xsd:integer), with arcs labeled schema:birthDate, schema:knows, schema:age, and schema:birthPlace.]
Turtle contains several simplifications to facilitate readability.
• When the subject is repeated, a predicate list collapses the triples that share that
subject: the subject is written once and the predicate-object pairs are separated by
semicolons (;). So, instead of writing
ex:bob schema:name "Robert" .
ex:bob schema:birthDate "1980-03-10"^^xsd:date .
ex:bob schema:birthPlace dbr:Oviedo .
ex:bob schema:knows ex:carol .
it is possible to write:
ex:bob schema:name "Robert" ;
       schema:birthDate "1980-03-10"^^xsd:date ;
       schema:birthPlace dbr:Oviedo ;
       schema:knows ex:carol .
• When the subject and predicate are the same, an object list collapses the triples that
share them: the subject and predicate are written once and the objects are separated by commas (,).
Instead of writing
ex:carol schema:knows ex:alice .
ex:carol schema:knows ex:bob .
it is possible to write:
ex:carol schema:knows ex:alice, ex:bob .
• Although number and Boolean literals can be defined like other literals with their lexical
form and datatype, there is also a shorthand syntax in Turtle to automatically parse some
values as literals. Table 2.1 shows how some values in shorthand notation are parsed as
literals.
Table 2.1: Shorthand syntax for numbers and Booleans in Turtle
• A triple of the form X rdf:type Y asserts that X has the type represented by Y. In Turtle,
rdf:type can also be represented by the token a, so the previous triple could also be repre-
sented as X a Y.
• RDF collections are list structures whose nodes are chained by the rdf:rest property and end
with rdf:nil; the value at each position is declared with the rdf:first property.
Example 2.4 RDF collections not simplified
The following snippet declares the results of a marathon as an RDF collection:
:m23 schema:name "New York City Marathon" ;
     :results _:1 .
_:1 rdf:first :dave ;
    rdf:rest _:2 .
_:2 rdf:first :alice ;
    rdf:rest _:3 .
_:3 rdf:first :bob ;
    rdf:rest rdf:nil .
Turtle has a special notation for RDF collections that enumerates the values between round
brackets. The previous example can also be represented in Turtle as:
:m23 schema:name "New York City Marathon" ;
     :results (:dave :alice :bob) .
[Figure: the RDF collection as a graph: :m23 points through :results to a chain of three list nodes linked by rdf:rest, whose rdf:first values are :dave, :alice, and :bob, ending in rdf:nil.]
The RDF data model is very simple. This simplicity is part of its power, as it enables RDF
to be used as a data representation language in many scenarios.
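Two of the Turtle simplifications described above can be sketched in Python: the datatype that the numeric and Boolean shorthand is parsed into, and the expansion of the round-bracket collection notation into rdf:first/rdf:rest triples. The regular expressions follow the INTEGER, DECIMAL, and DOUBLE productions of the Turtle grammar; the function names and the blank node labels (_:b1, _:b2, ...) are assumptions made for this illustration.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Shorthand forms recognized by Turtle and the datatype each one maps to.
_SHORTHAND = [
    (re.compile(r"^(true|false)$"), "boolean"),
    (re.compile(r"^[+-]?[0-9]+$"), "integer"),
    (re.compile(r"^[+-]?[0-9]*\.[0-9]+$"), "decimal"),
    (re.compile(r"^[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+$"), "double"),
]


def shorthand_datatype(token):
    """Return the datatype IRI for a shorthand token, or None if it is not one."""
    for pattern, name in _SHORTHAND:
        if pattern.match(token):
            return XSD + name
    return None


def expand_collection(items):
    """Expand a collection into (head node, list of rdf:first/rdf:rest triples)."""
    if not items:
        return "rdf:nil", []          # the empty collection is rdf:nil itself
    labels = [f"_:b{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (label, item) in enumerate(zip(labels, items)):
        rest = labels[i + 1] if i + 1 < len(labels) else "rdf:nil"
        triples.append((label, "rdf:first", item))
        triples.append((label, "rdf:rest", rest))
    return labels[0], triples


head, triples = expand_collection([":dave", ":alice", ":bob"])
```

Here `head` is the blank node that would be used as the object of :results, and the last triple of the chain points to rdf:nil.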
2.4.1 SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is an RDF query language which is
able to retrieve and manipulate data stored in RDF. SPARQL 1.0 became a recommendation
in 2008 [79] and SPARQL 1.1 was published in 2013 [44].
SPARQL is based on the notion of Basic Graph Patterns which are sets of triple patterns.
A triple pattern is an extension of an RDF triple where some of the elements can be variables
which are denoted by a question mark.
A Basic Graph Pattern matches a subgraph of the RDF data when RDF terms from that
subgraph may be substituted for the variables and the result is an RDF graph equivalent to the
subgraph.
SELECT ?x ?y WHERE {
  ?x schema:birthPlace dbr:Oviedo .
  ?x schema:knows ?y
}
2.4. TECHNOLOGIES RELATED WITH RDF 19
Applying the previous SPARQL query to the RDF data defined in Example 2.3, a
SPARQL processor would return the results shown in Table 2.2.
?x      ?y
:carol  :alice
:carol  :bob
:bob    :carol
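The way a Basic Graph Pattern is matched can be illustrated with a small, naive matcher in Python. This is a sketch of the idea only, not how SPARQL engines are implemented. Terms are plain strings, variables start with a question mark, and the sample graph is a hypothetical stand-in for the data of Example 2.3, chosen to be consistent with the results listed above.

```python
def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or fail if it is already bound to another term.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def match_bgp(patterns, graph, binding=None):
    """Return every binding under which all triple patterns occur in graph."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    solutions = []
    for triple in sorted(graph):
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            solutions.extend(match_bgp(rest, graph, extended))
    return solutions


graph = {
    (":bob", "schema:birthPlace", "dbr:Oviedo"),
    (":carol", "schema:birthPlace", "dbr:Oviedo"),
    (":carol", "schema:knows", ":alice"),
    (":carol", "schema:knows", ":bob"),
    (":bob", "schema:knows", ":carol"),
    (":alice", "schema:knows", ":carol"),
}
query = [
    ("?x", "schema:birthPlace", "dbr:Oviedo"),
    ("?x", "schema:knows", "?y"),
]
rows = {(s["?x"], s["?y"]) for s in match_bgp(query, graph)}
```

In this sample graph :alice knows :carol but has no schema:birthPlace arc, so she never appears as a value of ?x.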
• Solution modifiers, which operate once the output of the pattern has been computed as a table
of variables and values, and modify those values by applying operators like projection, distinct,
order, limit, offset, grouping, etc.
• The output of SPARQL queries can be of different types: ASK queries, which return
yes/no depending on the existence of matching values; SELECT queries, which return a
selection of values for the variables that match a pattern; and CONSTRUCT queries,
which return the triples generated from the values that match the pattern.
It contains a nested query (lines 3–5) which groups each element with the number of
known entries and a filter (line 8) which removes those elements whose counter is different from
one.
A full introduction to SPARQL is out of the scope of this book. For the interested reader,
we recommend [33].
20 2. THE RDF ECOSYSTEM
SPARQL is a very expressive language which can be used to describe very complex queries.
It can also be employed to validate the structure of complex RDF graphs [55]. In Section 3.13,
we describe how SPARQL can be used to validate RDF.
RDF Schema
RDF Schema was proposed as a data-modeling vocabulary for RDF data. The first public working
draft of RDF Schema appeared in 1998 [16], and RDF Schema was accepted as a recommendation in
2004 [26].
It is a semantic extension of RDF which provides mechanisms to describe groups of re-
sources and relationships between them. It defines a set of common classes and properties.
The main classes defined in RDFS are:
schema:Person a rdfs:Class .
:Teacher a rdfs:Class ;
  rdfs:subClassOf schema:Person .
:teaches a rdf:Property ;
  rdfs:domain :Teacher ;
  rdfs:range :Course .
RDF Schema processors contain several rules that enable them to infer new RDF data.
For example, for any C rdfs:subClassOf D and x a C they can infer x a D; for any p
rdfs:domain C and x p y they can infer x a C; and for any p rdfs:range C and x p y they
can infer y a C.
If we apply those rules to the following data:
:alice a :Person .
:bob a :Teacher .
:carol :teaches :algebra .
an RDFS processor could infer that :bob and :carol have rdf:type :Person and that :algebra
has rdf:type :Course.
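The effect of these rules can be sketched as a small forward-chaining loop in Python that applies the subclass, domain, and range rules until no new triples appear. It covers only those three rules, not the full set of RDFS entailment rules. Terms are plain strings, the function name rdfs_closure is hypothetical, and the sketch assumes the class is written as :Person in both the schema and the data.

```python
SUBCLASS, DOMAIN, RANGE, TYPE = (
    "rdfs:subClassOf", "rdfs:domain", "rdfs:range", "rdf:type",
)


def rdfs_closure(triples):
    """Apply the subclass, domain, and range rules until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # x a C  and  C rdfs:subClassOf D   =>  x a D
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # x p y  and  p rdfs:domain C       =>  x a C
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # x p y  and  p rdfs:range C        =>  y a C
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


schema = {
    (":Teacher", SUBCLASS, ":Person"),
    (":teaches", DOMAIN, ":Teacher"),
    (":teaches", RANGE, ":Course"),
}
data = {
    (":alice", TYPE, ":Person"),
    (":bob", TYPE, ":Teacher"),
    (":carol", ":teaches", ":algebra"),
}
closure = rdfs_closure(schema | data)
```

Note that :carol reaches :Person in two steps: the domain rule first types her as :Teacher, and the subclass rule then applies on the next pass.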
OWL
OWL (Web Ontology Language) defines a vocabulary for expressing ontologies based on
description logics. It was published as a W3C recommendation in 2004 [29] and a new version,
OWL 2, was accepted in 2009 [70]. OWL has several syntaxes (an RDF-based syntax, the
Functional-Style Syntax, the Manchester Syntax, etc.) and a formally defined meaning. We will use
the RDF syntax in the following examples with Turtle notation.
An ontology can be defined as a vocabulary of terms, usually about a specific domain and
shared by a community of users. Ontologies specify the definitions of terms by describing their
relationships with other terms in the ontology.
The main concepts in OWL are as follows.
• Classes, which represent sets of individuals. Classes can be subclasses of other classes, with
two special classes: owl:Thing that represents the set of all individuals and owl:Nothing that
represents the empty set.
• Individuals, which are elements in the domain. Individuals can be members of an OWL
class.
:Woman a owl:Class ;
  owl:equivalentClass [
    owl:intersectionOf (
      :Person
      [ a owl:Restriction ;
        owl:onProperty :gender ;
        owl:hasValue :Female
      ] ) ] .
Now, we can define :Person as the union of the :Man and :Woman classes, and declare that
those classes are disjoint.
:Person owl:equivalentClass [
  rdf:type owl:Class ;
  owl:unionOf ( :Woman :Man )
] .

[ a owl:AllDisjointClasses ;
  owl:members ( :Woman :Man )
] .

:bob a :Man .
:alice a :Person .
:bob a :Person .
:bob :gender :Male .
OWL can be used to define ontologies in several domains and there are several tools
like the Protégé editor [66] which provide facilities for the creation and visualization of large
ontologies.
ex:uniovi a schema:Organization ;
  schema:member ex:alice ;
  schema:name "University of Oviedo" .
2.5 SUMMARY
• RDF defines a simple and powerful data model based on directed graphs.
• There are several syntaxes for RDF: Turtle, N-Triples, JSON-LD, RDF/XML, etc.
• Two alternatives to embed metadata in HTML content are RDFa and microdata.
• R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax. W3C
Recommendation, February 2014. http://www.w3.org/TR/rdf11-concepts/
• S. Harris and A. Seaborne. SPARQL 1.1 Query Language. W3C Recommendation, 2013.
http://www.w3.org/TR/sparql11-query/
• D. Brickley and R. V. Guha. RDF Schema 1.1. W3C Recommendation, 2014.
http://www.w3.org/TR/rdf-schema/
• W3C OWL Working Group. OWL 2 Web Ontology Language: Document Overview. W3C
Recommendation, October 2009. http://www.w3.org/TR/owl2-overview/
There are several books introducing the concepts of RDF and Semantic Web in general like:
• J. Hjelm. Creating the Semantic Web with RDF: Professional Developer’s Guide. Professional
Developer’s Guide Series. Wiley, 2001
• S. Powers. Practical RDF. O’Reilly & Associates, Inc., Sebastopol, CA, 2003
• T. B. Passin. Explorer’s Guide to the Semantic Web. Manning Publications Co., Greenwich,
CT, 2004
• T. Segaran, C. Evans, and J. Taylor. Programming the Semantic
Web, 1st ed. O’Reilly Media, Inc., 2009
• J. Hebeler, M. Fisher, R. Blace, and A. Perez-Lopez. Semantic Web Programming. Wiley
Publishing, 2009
• P. Hitzler, M. Krötzsch, and S. Rudolph. Foundations of Semantic Web Technologies. Chap-
man & Hall/CRC, 2009
• G. Antoniou, P. Groth, F. van Harmelen, and R. Hoekstra. A Semantic Web Primer. The
MIT Press, 2012
And about particular topics:
• Linked data: T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global
Data Space, volume 1. Morgan & Claypool Publishers LLC, February 2011. DOI:
10.2200/s00334ed1v01y201102wbe001
• SPARQL: B. DuCharme. Learning SPARQL. O’Reilly Media, Inc., 2011
• OWL and semantic modeling: D. Allemang and J. Hendler. Semantic Web for the Working
Ontologist: Effective Modeling in RDFS and OWL, 2nd ed. Morgan Kaufmann Publishers
Inc., San Francisco, CA, 2011
CHAPTER 3
Data Quality
People have been using computers to record and reason about data for many decades. Typically,
this reasoning is less esoteric than artificial intelligence tasks like classification.
A data modeler usually has in mind some structure for the data that she is trying to model. That
structure must be explicitly defined and communicated using some technology that can be
understood by other people and also processed by automatic systems that
check and enforce it. Natural language is not enough for that, as it can be ambiguous
and is difficult for machines to process. On the other hand, a structure enforced through a
procedural programming language is difficult for other people to maintain. The right balance is
usually a declarative language that is readable by humans and can also be
parsed and checked by machines.
Rigorous data validation is like a contract that offers advantages to several different parties.
• Consumers have an easier time understanding the semantics of data. For instance, a data
structure that requires either a full name or a given and family name has a simple intuition
while one that has optional full, given and family names leaves the consumer unsure about
the many combinations she may encounter in the data.
• Programmers have to do much less “defensive coding” when working with predictable
data. A programmer need not write special cases for permutations like no name, a full
name and a given name, etc. Introducing quality control into data workflows can reduce
security exploits and catch systematic errors when they first occur rather than years later
when someone stumbles across inconsistent data. For instance, a process may erroneously
insert multiple primary addresses if no system enforces that a person should have no more
than one primary address.
• Producers can precisely define and validate their output. This allows them to test consis-
tency with business processes, perform quality control, and unambiguously communicate
their assets to other parties.
• Queriers can tailor the sophistication of their queries to address a constrained set of
possibilities. Queriers are a specific kind of consumer who is especially vulnerable to
systematic data errors. Unexpected variations in data structures can result in missing query
results. Possibly worse, a single accidental duplication of a record can result in it being
counted many times, once for each combination of attributes in the original and duplicate
record.
3.1 NON-RDF SCHEMA LANGUAGES
While RDF is a relative newcomer to the data scene, most widely-used structured data languages
have a way to describe and enforce some form of data consistency. Examining UML, SQL,
XML, JSON, and CSV allows us to set expectations for RDF validation.
3.1.1 UML
The Unified Modeling Language (UML) is a general-purpose visual modeling language that
can be used to provide a standard way to visualize the design of a system [85]. In 2005, the
Object Management Group (OMG) published UML 2, a revision largely based on the same
diagram notations, but using a modeling infrastructure specified using Meta-Object Facility
(MOF). UML contains 14 types of diagrams, which are classified in three categories: structure,
behavior and interaction. The most popular diagram is the UML class diagram, which defines
the logical structure of a system in terms of classes and relationships between them. Given the
Object Oriented tradition of UML, classes are usually defined in terms of sets of attributes and
operations.
UML class diagrams are employed to visually represent data models.
UML diagrams are typically not refined enough to provide all the relevant aspects of a
specification. There is, among other things, a need to describe additional constraints about the
objects in the model. OCL (Object Constraint Language)¹ has been proposed as a declarative
language to define this kind of constraint. It can also be used to define well-formedness rules,
pre- and post-conditions, model transformations, etc.
OCL contains a repertoire of primitive types (Integer, Real, Boolean, String) and several
constructs to define compound datatypes like tuples, ordered sets, sequences, bags, and sets.
¹ http://www.omg.org/spec/OCL/
Example 3.3 DDL
CREATE TABLE User (
  id INTEGER PRIMARY KEY NOT NULL,
  name VARCHAR(40) NOT NULL,
  birthDate DATE,
  birthPlace VARCHAR(50),
  gender ENUM('male','female')
);
While implementation support for constraints and datatypes varies, popular datatypes
include numerics like various precisions of integer or float, characters, dates and strings.
Two popular constraints in DDL are for primary and foreign keys. In SQL and DDL,
attribute values are primitive types, which is to say that a user’s course is not a course record, but
instead typically an integer that is unique in some table of courses.
Enrolledin
studentId  CourseId
82         23
34         45
…          …
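How primary and foreign keys act as integrity constraints can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under several assumptions: the Student and Course tables and their contents are hypothetical, SQLite is used only because it ships with Python, and foreign key enforcement in SQLite has to be switched on explicitly with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce them by default
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY NOT NULL, name VARCHAR(40) NOT NULL);
CREATE TABLE Course  (id INTEGER PRIMARY KEY NOT NULL, name VARCHAR(40) NOT NULL);
CREATE TABLE EnrolledIn (
  studentId INTEGER NOT NULL REFERENCES Student(id),
  courseId  INTEGER NOT NULL REFERENCES Course(id),
  PRIMARY KEY (studentId, courseId)
);
INSERT INTO Student VALUES (82, 'Alice');
INSERT INTO Course VALUES (23, 'Algebra');
INSERT INTO EnrolledIn VALUES (82, 23);
""")


def try_enroll(student_id, course_id):
    """Insert an enrollment; return False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO EnrolledIn VALUES (?, ?)", (student_id, course_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

An enrollment that refers to a course that does not exist violates the foreign key, and repeating an existing enrollment violates the primary key; in both cases the database itself rejects the row, so no application code has to check for it.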
3.1.3 XML
XML was proposed by the W3C as an extensible markup language for the Web around
1996 [98]. XML derives from SGML [42], a meta-language that provides a common syn-
tax for textual markup systems and from which the first versions of HTML were also derived.
Given its origins in typesetting, the XML model is adapted to represent textual information
that contains mixed text and markup elements.
The XML model is known as the XML Information Set (XML InfoSet) and consists of
a tree structure, where each node of the tree is defined to be an information item of a particular
type. Each item has a set of type-specific properties associated with it. At the root there is a
document item, which has exactly one element as its child. An element has a set of attribute
items and a list of child elements or text nodes. Attribute items may contain character items or
they may contain typed data such as name tokens, identifiers and references. Element identifiers
and references may be used to connect nodes transforming the underlying tree into a graph.
XML became very popular in industry and a lot of technologies were developed to query
and transform XML. Among them, XPath was a simple language to select parts of XML doc-
uments that was embedded in other technologies like XSLT or XQuery.
The next XPath snippet finds the names of all students whose gender is "Female":
//student[gender = "Female"]/name
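A comparable selection can be run with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The course document below is a hypothetical example invented for this sketch, and the path is written relative to the root element (.//student) because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical course document used only for this illustration.
doc = """
<course name="Algebra">
  <student id="alice">
    <name>Alice</name>
    <gender>Female</gender>
  </student>
  <student id="bob">
    <name>Robert</name>
    <gender>Male</gender>
    <birthDate>1980-09-24</birthDate>
  </student>
</course>
"""

root = ET.fromstring(doc)

# [gender='Female'] keeps the students that have a gender child with that text.
names = [e.text for e in root.findall(".//student[gender='Female']/name")]
```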
XML defines the notion of well-formed documents and valid documents. Well-formed
documents are XML documents with a correct syntax, while valid documents are documents
that, in addition to being well-formed, conform to some schema definition.
[Figure: tree representation of an XML document with a root course element; recoverable labels include the attribute ref="bob" and the text Robert.]
DTD defines the structure of XML using a basic form of regular expressions. However,
DTDs have limited support for datatypes. For example, it is not possible to validate that
the birth date of a student has the shape of a date.
• XML Schema. This specification was divided into two parts. The first part specifies the
structure of XML documents [89] and the second part specifies a repertoire of XML Schema
datatypes [9].
An XML Schema validator decorates each structure of the XML document with addi-
tional information called the Post-Schema Validation Infoset, or PSVI. This structure
contains information about the validation process that can be later employed by other
XML tools.
• RelaxNG [20] was developed within the Organization for the Advancement of Structured
Information Standards (OASIS) as an alternative for XML Schema. RelaxNG has two
syntaxes: an XML-based one and a compact one. RelaxNG is grammar based and its
semantics is formally defined by means of axioms and inference rules.
element course {
  element student {
    element name { xsd:string },
    element gender { "Male" | "Female" },
    element birthDate { xsd:date }?,
    attribute id { xsd:ID }
  }*,
  attribute name { xsd:string }
}
• Schematron [50] is a rule-based language based on patterns, rules, and assertions. An
assertion contains an XPath expression and an error message. The error message is displayed
when the XPath expression fails. A rule groups various assertions together and defines
a context in which assertions are evaluated using an XPath expression. Finally, patterns
group various rules together.
Schematron has more expressive power than other schema languages like DTDs, RelaxNG,
or XML Schema, as it can express complex constraints that are impossible with them. In
fact, it is often used to define business rules.
Although Schematron can be used on its own, it is commonly used in cooperation
with other schema languages that define the document structure.
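The rule-based style of Schematron can be imitated in a few lines of Python to show how contexts, assertions, and messages fit together. This is not Schematron itself: real Schematron assertions are XPath expressions inside an XML schema document, whereas this sketch uses Python callables as tests. The rule contents, the sample document, and the function name validate are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Each rule has a context (which nodes it applies to) and a list of
# assertions, each pairing a test with the message reported when it fails.
rules = [
    {
        "context": ".//student",
        "asserts": [
            (lambda e: e.find("name") is not None,
             "student must have a name"),
            (lambda e: e.findtext("gender") in ("Male", "Female"),
             "gender must be Male or Female"),
        ],
    },
]


def validate(root, rules):
    """Evaluate every assertion on every context node; collect the failures."""
    errors = []
    for rule in rules:
        for node in root.findall(rule["context"]):
            for test, message in rule["asserts"]:
                if not test(node):
                    errors.append((node.get("id"), message))
    return errors


doc = ET.fromstring(
    '<course name="Algebra">'
    '<student id="alice"><name>Alice</name><gender>Female</gender></student>'
    '<student id="bob"><gender>Unknown</gender></student>'
    "</course>"
)
errors = validate(doc, rules)
```

Unlike a grammar-based schema, which accepts or rejects the document as a whole, each failed assertion here produces its own targeted message, which is why this style suits business rules.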
• Directly associate instance data with XML Schema. It can be done, for example, using the
xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes.
For example, the following XML document directly declares that it follows the schema
identified by http://example.org/ns/Course which is located at http://example.org/course.xsd:
<course xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://example.org/ns/Course
                            http://example.org/course.xsd">
  ...
</course>
• The XML processing instruction <?xml-model ?> has been proposed to associate an XML
document with a schema [43].
<?xml-model href="http://example.org/course.rng" ?>
<?xml-model href="http://example.org/course.xsd" ?>
<course name="Algebra">
  ...
</course>
Note that the xml-model processing instruction makes it possible to use multiple schemas for the
same document.
3.1.4 JSON
JSON was proposed by Douglas Crockford around 2001 as a subset of JavaScript (the original
acronym was JavaScript Object Notation). It has evolved as an independent data-interchange
format with its own ECMA specification [35].
A JSON value, or JSON document, can be defined recursively as follows.
• true, false and null are JSON values.
• Any decimal number is also a JSON value, called a number.
• Any string of Unicode characters enclosed by " is also a JSON value, called a string value.
• If $k_1, k_2, \ldots, k_n$ are distinct string values and $v_1, v_2, \ldots, v_n$ are JSON values, then
$\{k_1 : v_1, k_2 : v_2, \ldots, k_n : v_n\}$ is a JSON value, called an object. In this case, each $k_i : v_i$ is a
key-value pair. The order of the key-value pairs is not significant.
• If $v_1, v_2, \ldots, v_n$ are JSON values, then $[v_1, v_2, \ldots, v_n]$ is a JSON value, called an array. The
order of the array elements is significant.
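The recursive definition above translates almost directly into a recursive check. The Python sketch below works on values as produced by the standard json module (dict for objects, list for arrays) and also computes how deeply a value is nested. The function names and the sample course document, including the keys students, age, and birthDate, are assumptions made for this illustration.

```python
def is_json_value(value):
    """Check a Python value against the recursive definition of JSON values."""
    if value is None or isinstance(value, (bool, str, int, float)):
        return True                       # null, true/false, strings, numbers
    if isinstance(value, list):           # arrays: every element is a JSON value
        return all(is_json_value(v) for v in value)
    if isinstance(value, dict):           # objects: string keys, JSON values
        return all(
            isinstance(k, str) and is_json_value(v) for k, v in value.items()
        )
    return False


def depth(value):
    """Nesting level of a value: 0 for atoms, 1 + deepest child otherwise."""
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    return 0


course = {
    "name": "Algebra",
    "students": [
        {"name": "Alice", "gender": "Female", "age": 18},
        {"name": "Bob", "gender": "Male", "birthDate": "1980-09-24"},
    ],
}
```

The sample document nests an object inside an array inside an object, so its tree has three levels below the root value.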
Note that in the case of arrays and objects the values $v_i$ can again be objects or arrays, thus
allowing documents an arbitrary level of nesting. In this way, the JSON data model can be
represented as a tree [14].
[Figure: tree representation of a JSON document; recoverable labels include the keys "name" and "gender" and the values "Algebra", 1, "Alice", "Female", 18, "Bob", "Male", and "1980-09-24".]
JSON Schema [101] was proposed as a schema language for JSON with a role similar
to that of XML Schema for XML. It is itself written in JSON syntax and is programming-language
agnostic. It contains the following predefined datatypes: null, Boolean, object, array, number,
and string, and allows constraints to be defined on each of them.
In JSON Schema, it is possible to have reusable definitions which can later be referenced.
Recursion is not allowed between references [74].
3.1.5 CSV
Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files have historically had
no format-specific schema language. A common use case for CSV (and TSV) is to import
it into a relational database, where it is subject to the same integrity constraints as any other
SQL data. However, wide-ranging practices for documenting table structure and semantics have
historically made it hard for consumers of CSV to consume published CSV data with confidence.
Column headings and meanings may appear as rows in the CSV file, columns in an auxiliary
CSV or flat file, or be omitted entirely.
Spreadsheets are another common generator and consumer of CSV data. Some spread-
sheets may have hand-tooled integrity constraints but they offer no standard schema language.
Although CSV has traditionally been schema-less, a recent standard, CSV on the Web (CSVW),
attempts to describe the majority of deployed CSV data. This includes semantics (e.g., mapping to an
ontology), provenance, XML Schema length and numeric value facets (e.g., minimum length,
max exclusive value), and format and structural constraints like foreign keys and datatypes.
CSVW describes a wide corpus of existing practice for publishing CSV documents. Because
of its World Wide Web orientation, it includes internationalization and localization features
not found in other schema languages. Where most data languages standardize the lexical
representation of datatypes like dateTime or integer, CSVW describes a wide range of region-
or domain-specific datatypes. For instance, the following can all be representations of the same
numeric value: 12345.67, 12,345.67, 12.345,67, 1,23,45.67.
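The four lexical forms above can be normalized once the grouping and decimal characters of each column are known. The Python sketch below is a simplified illustration, not an implementation of CSVW: the function parse_number and its parameters group_char and decimal_char are hypothetical names, loosely modeled on the idea of declaring those two characters per column.

```python
from decimal import Decimal


def parse_number(text, group_char=",", decimal_char="."):
    """Normalize a localized number by dropping group separators."""
    cleaned = text.replace(group_char, "").replace(decimal_char, ".")
    return Decimal(cleaned)


samples = [
    ("12345.67", ",", "."),      # no grouping
    ("12,345.67", ",", "."),     # groups of three, point as decimal mark
    ("12.345,67", ".", ","),     # point for grouping, comma as decimal mark
    ("1,23,45.67", ",", "."),    # irregular group sizes
]
values = [parse_number(text, g, d) for text, g, d in samples]
```

All four samples normalize to the same decimal value. Without the declared characters, a cell like 12.345 would be ambiguous, which is why this information belongs in the schema rather than in the consuming program.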
CSVW is also unusual in that it can be used to describe denormalized data. Because of
this, it includes separator specifiers to aid in micro-parsing individual data cells into sequences
of atomic datatypes.
CSVW is a very new specification and applies to a domain with historically no standard
schema language. Tools like CSVLint are adopting CSVW as a way to offer interoperable
schema declarations to enable data quality tests.
Although there have been several previous attempts to define RDF validation technologies
(see Section 3.3) this book focuses on ShEx and SHACL.
In this section we describe the particular concepts of RDF that have to be taken
into account for its validation:
Graph data model RDF is composed of triples, which have arcs (predicates) between nodes.
We can describe:
• the form of a node (the mechanisms for doing this will be called “node constraints”);
[Figure: shape of RDF nodes that represent Users: the node is an IRI with schema:name string (1, 1) and schema:knows IRI (0, *).]
Unordered arcs A difference between RDF and XML with regard to their data models is that
in RDF the arcs are unordered, while in XML the sub-elements form an ordered sequence.
RDF validation languages must not assume any order in which the arcs of a node will be treated,
while in XML the order of the elements affects the validation process.
From a theoretical point of view, the arcs related with a node in RDF can be represented
as a bag or multiset, i.e., a set which allows duplicate elements.
RDF Validation ≠ Ontology ≠ Instance data Notice that RDF validation is different from
ontology definition and also different from instance data.
• Ontologies are usually focused on real-world things or at least objects from some domain.
The semantic web community has put a lot of emphasis on defining ontologies for different
domains and there are several vocabularies like OWL, RDFS, etc. that can be used to that
end. People concerned with this level are ontology engineers, who must have the skills to
understand how to represent the knowledge of some domain.
• Instance data refers to the data of some situation or problem at any given point. That data
can be obtained from different sources and is materialized in some data representation
language. In our case, instance data refers to RDF graphs that are created by developers
and programmers, or generated automatically from other sources like sensors.
• RDF validation is an intermediate process that can check if that instance data conforms to
some desired schema. In the case of RDF, it is focused on RDF graph features which are
at a lower level than ontology features. The people interested in RDF data description and
validation are data engineers and have concerns that are different from those of ontology
engineers. Data engineers are more worried about how to model data so the developers
can effectively and efficiently produce or consume it.
Figure 3.6 represents the difference between instance data, ontology definitions, and RDF
validation.
Shapes ≠ Types Given the open and flexible nature of RDF, nodes in RDF graphs can have
zero, one or many rdf:type arcs.
[Figure 3.6: instance data, ontology definitions, and RDF validation; the ontology panel reads schema:knows a owl:ObjectProperty ; rdfs:domain schema:Person ; rdfs:range schema:Person .]
One application can use nodes of type schema:Person with some properties while another
application can use nodes with the same type but different properties. For example, schema:Person
can represent a friend, invitee, patient, ... in different applications or even in different contexts
of the same application. The same types can have different meanings and different structure
depending on the context.
While from an ontology point of view a concept has a single meaning, applications that are
using that same concept may select different properties and values and thus, the corresponding
representations may differ.
Nodes in RDF graphs are not necessarily annotated with fully discriminating types. This
implies that it is not possible to validate the shape of a node by just looking at its rdf:type arc.
We should be able to define specific validation constraints in different contexts.
Inference Validation can be performed before or after inference. Validation after inference (or
validation on a backward-chaining store that does inference on the fly) checks the correctness of
the implications. An inference testing service could use an input schema describing the contents
of the input RDF graph and an output schema describing the contents of the expected inferred
RDF graph. The service can check that instance data conforms to the input schema before
inference and that, after applying a reasoner, the resulting RDF graph with inferred triples conforms
to the output schema.
Example 3.11 Suppose we have a schema with two shapes, each with one requirement:
• PersonShape requires an rdf:type of :Person
• TeacherShape requires an rdf:type of :Teacher
If we validate the following RDF graph without inference, only :alice would match
PersonShape. However, if we validate the RDF graph that results from applying RDF Schema
inference, then both :bob and :carol would also match PersonShape.
:alice a :Person .
:bob a :Teacher .
Validation workflows will likely perform validation both before and after inference. Systems
which perform possibly incomplete inference can use this to verify that their light-weight,
partial inference is producing the required triples.
RDF flexibility RDF was born as a schema-less language, a feature which provided a series
of advantages in terms of flexibility and adaptation of RDF data to different scenarios.
The same property can have different types of values. For example, a property like
schema:creator can have as its value a string literal or a more complex resource.
Repeated properties Sometimes, the same property is used for different purposes in the same
data. For example, a book can have two codes with different structure.
:book schema:name "Moby Dick" ;
  schema:productID "ISBN-10:1503280780" ;
  schema:productID "ISBN-13:978-1503280786" .
This is a natural consequence of the re-use of general properties, which is especially
common in domains where many kinds of data are represented in the same structure.
:Obs1 a fhir:Observation ;
  fhir:Observation.code fhir:LOINC8310-5 ;
  fhir:Observation.valueQuantity 36.5 ;
  fhir:Observation.valueUnit "Cel" .
We can see that a blood pressure observation must have two instances of the
fhir:Observation.component property, one with a code for a systolic measurement and the other
with a code for a diastolic measurement.
Treating these two constraints on the property fhir:Observation.component individually
would cause the systolic constraint to reject the diastolic measurement and the diastolic con-
straint to reject the systolic measurement—both constraints must be considered as being satisfied
if one of the components satisfies one and the other component satisfies the other.
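The difference between checking each constraint against every value and checking the constraints jointly can be sketched in Python. The sketch is a simplified illustration rather than the algorithm of any particular validator: components are plain dictionaries, the codes "systolic" and "diastolic" are placeholders for the real measurement codes, the values are hypothetical, and the brute-force search over permutations is only practical for a handful of values.

```python
from itertools import permutations


def matches_each_individually(values, constraints):
    """Naive reading: every value must satisfy every constraint."""
    return all(check(v) for check in constraints for v in values)


def matches_jointly(values, constraints):
    """Joint reading: the values can be paired one-to-one with the
    constraints so that each value satisfies the constraint it is paired with."""
    if len(values) != len(constraints):
        return False
    return any(
        all(check(v) for check, v in zip(constraints, ordering))
        for ordering in permutations(values)
    )


def is_systolic(component):
    return component["code"] == "systolic"


def is_diastolic(component):
    return component["code"] == "diastolic"


components = [
    {"code": "systolic", "value": 107},
    {"code": "diastolic", "value": 60},
]
constraints = [is_systolic, is_diastolic]
```

With the naive reading the observation is rejected, because the systolic constraint fails on the diastolic component and vice versa. With the joint reading it is accepted, while an observation carrying two systolic components would still be rejected.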
Closed Shapes The RDF dictum of anyone can say anything about anything is in tension with
conventional data practices which reject data with any assertions that are not recognized by
the schema. For SQL schemas, this is enforced by the data storage itself; there’s simply no
place to record assertions that do not correspond to some attribute in a table specified by the
DDL. XML Schema offers some flexibility with constructs like <xs:any processContents="skip">
but these are rare in formats for the exchange of machine-processable data. Typically the edict
is if you pass me something I do not understand fully, I will reject it.
For shapes-based schema languages, a shape is a collection of constraints to be applied to
some node in an RDF graph and, if it is closed, every property attached to that node must be
included in the shape.
Even if the receiver of the data permits extra triples, it may not be able to store or return
them. For instance, a Linked Data container may accept arbitrary data, search for sub-graph
which it recognizes, and ignore the rest. A user expecting to put data in such a container and
5 Simplified from http://build.fhir.org/observation-example-bloodpressure.ttl.html.
3.3. PREVIOUS RDF VALIDATION APPROACHES 45
retrieve it will have a rude surprise when they get back only a subset of the submitted data. Even
if the receiver does not validate with closed shapes, the user may wish to pre-emptively validate
their data against the receiver’s schema, flagging any triples not recognized by the schema.
Another value of closed shapes is that they can be used to detect spelling mistakes. If a shape
in a schema includes an optional rdfs:label and a user has accidentally included an rdf:label, the
schema has no way to detect that mistake unless all unknown properties are reported.
As with repeated properties, the validation of closed shapes must consider property constraints
as a whole, rather than examining each individually.
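The spelling-mistake case can be illustrated with a minimal Python sketch. The set of recognized predicates and the sample data are hypothetical, and real closed-shape validation also checks values and cardinalities, which this sketch ignores.

```python
# Hypothetical shape: the set of predicates the schema recognizes.
shape_predicates = {"rdf:type", "rdfs:label", "schema:name"}

# Arcs leaving one focus node, written as (predicate, object) pairs.
# "rdf:label" is the misspelling of rdfs:label discussed above.
neighborhood = [
    ("rdf:type", "schema:Person"),
    ("schema:name", "Alice"),
    ("rdf:label", "Alice"),
]

def unknown_predicates(neighborhood, shape_predicates):
    """Return, in sorted order, the predicates a closed shape would reject."""
    return sorted({p for p, _ in neighborhood} - shape_predicates)

print(unknown_predicates(neighborhood, shape_predicates))  # ['rdf:label']
```

With an open shape the same data would pass silently, because the optional rdfs:label is simply absent.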
Using plain-SPARQL queries for RDF validation has the following benefits.
• It is very expressive and can handle most RDF validation needs.
• SPARQL is ubiquitous: most RDF products already support SPARQL.
But it also has the following problems.
• Being very expressive, it is also very verbose. SPARQL queries can be difficult to write and
debug by non-experts.
• It can be idiomatic in the sense that there can be more than one way to encode the same
constraint.
• For all but the simplest data structures, it is complex to exhaustively write SPARQL queries
which accept all valid permutations and reject all incorrect structures. This exhaustive enu-
meration is essentially the job of the approaches described below.
SPARQL Inferencing Notation (SPIN)[51] was introduced by TopQuadrant as a mech-
anism to attach SPARQL-based constraints and rules to classes. SPIN also contained tem-
plates, user-defined functions and template libraries. SPIN rules are expressed as SPARQL
ASK queries where true indicates an error or CONSTRUCT queries that produce violations.
SPIN uses the expressiveness of SPARQL plus the semantics of the variable ?this standing for
the current focus node (the subject being validated).
SPIN has heavily influenced the design of SHACL. The Working Group decided
to offer a SPARQL-based semantics, and the second part of the working draft also contains
a SPIN-like mechanism for defining SPARQL-native constraints, templates and user-defined
functions. There are some differences, such as the renaming of some terms and the addition of more
core constraints like disjunction, negation or closed shapes. The following document describes
how SHACL and SPIN relate: http://spinrdf.org/spin-shacl.html.
There have been other proposals that combine SPARQL with other technologies. Fürber
and Hepp [39] proposed a combination of SPARQL and SPIN as a semantic data
quality framework. Simister and Brickley [90] proposed a combination of SPARQL queries
and property paths, which is used by Google. Kontokostas et al. [53] proposed RDFUnit, a
test-driven framework which employs SPARQL query templates that are instantiated into concrete
quality test queries.
3.3.2 INFERENCE-BASED APPROACHES
Inference-based approaches adapt RDF Schema or OWL to express validation semantics. The
use of the Open World and Non-Unique Name assumptions limits the validation possibilities. In fact,
what triggers constraint violations in closed world systems leads to new inferences in standard
OWL systems. Motik, Horrocks, and Sattler [64] proposed the notion of extended description
logics knowledge bases, in which a certain subset of axioms were designated as constraints.
In [72], Peter F. Patel-Schneider separates the validation problem into two parts: integrity
constraints and closed-world recognition. He shows that description logics can be implemented
for both by translation to SPARQL queries.
In 2010, Tao et al. [96] had already proposed the use of OWL expressions with Closed
World Assumption and a weak variant of Unique Name Assumption to express integrity con-
straints.
Their work forms the basis of Stardog ICV [21] (Integrity Constraint Validation), which
is part of the Stardog database. It allows constraints to be written using OWL syntax but with a
different semantics based on closed world and unique name assumptions. The constraints are
translated to SPARQL queries. As an example, a User could be specified as follows.
schema:name a owl:DatatypeProperty ,
    owl:FunctionalProperty ;
  rdfs:domain schema:Person ;
  rdfs:range xsd:string .

schema:gender a owl:ObjectProperty ,
    owl:FunctionalProperty ;
  rdfs:domain schema:Person ;
  rdfs:range :Gender .

schema:knows a owl:ObjectProperty ;
  rdfs:domain schema:Person ;
  rdfs:range schema:Person .

schema:Female a :Gender .
schema:Male a :Gender .
Instance nodes are required to have an rdf:type declaration whose value is schema:Person.
Dublin Core Application Profiles [23] also define a set of validation constraints using
Description Templates.
Fischer et al. [38] proposed RDF Data Descriptions as another domain specific language
that is compiled to SPARQL. The validation is class based in the sense that RDF nodes are val-
idated against a class C whenever they contain an rdf:type C declaration. This restriction enables
the authors to handle the validation of large datasets and to define some optimization techniques
which could be applied to shape implementations.
• VR 2. Concise: Schemas must be easy to understand, read, and write by humans. Verbose
languages tend to be neglected by their users.
• VR 6. Least power: The schema language must be able to do its job well but no more than
that. Although one could use whole procedural languages like Java or Python to validate
RDF, doing it in this way will be cumbersome as the validation rules will be interspersed
with the code [97]. This principle states that a declarative language should be preferred
over a procedural one.
• VR 7. Focus identification: A validation process must identify the graph nodes that are
expected to match constraints. Unlike tree structures like XML or JSON, graphs like RDF
have no “root” node. For RDF, the foci would be IRIs, literals and blank nodes which are
subject to validation.
• VR 8. Properties: A schema language must be able to describe which arcs relate with which
nodes. In the case of RDF, arcs between nodes are called properties or predicates and are
IRIs. The schema language must be able to describe the properties that depart from some
nodes.
• VR 9. Repeated properties: Some of the arcs that depart from a node may be repeated and
the nodes that they point to could have different structure. The schema language must be
able to declare that some properties can appear repeated but with different contents.
• VR 10. Inverse properties: It must be possible to describe the incoming arcs of a node,
which are also called inverse properties.
• VR 11. Paths: The schema language must be able to describe the paths that relate two
given nodes in a graph. SPARQL 1.1 contains a language to describe paths in an RDF
graph. For example, the transitive traversal of the rdfs:subClassOf property can be expressed
as rdfs:subClassOf*.
• VR 12. Node kinds: The RDF data model contains three kinds of nodes: IRIs, Literals, and
BNodes. The schema language must be able to describe the kind of some specific nodes.
• VR 13. Datatypes: The schema language must be able to describe which are the datatypes
that some nodes have.
• VR 14. Datatype facets: The XML Schema datatypes are the most popular datatypes em-
ployed in RDF datasets. Those datatypes can be qualified with facets which constrain the
possible values. For example, one can say that a value is an xsd:integer between 10 and 20.
• VR 15. Language tags: The schema language can describe the language tag associated with
literals of type rdf:langString.
• VR 16. Conjunction: It must be possible to declare that some content must satisfy all the
constraints in a set.
• VR 17. Disjunction: It must be possible to declare that some content must satisfy some of
the constraints in a set.
3.4. VALIDATION REQUIREMENTS 51
• VR 18. Addition: It must be possible to declare that some content must be the addition of
some content. In the case of RDF graphs, one may want to declare that a node must have
some content and some other content.
• VR 19. Regular cardinalities: The schema must support regular cardinalities like optional,
zero or more, one or more.
• VR 20. Numerical cardinalities: The schema must support numerical cardinalities like
repetitions between m and n, or at least m repetitions.
• VR 21. Negation: It must be possible to declare that some content must not satisfy some
constraint.
• VR 22. Recursion: It must be possible to declare that some group of constraints depends
on another group in a recursive way.
• VR 23. OneOf : It must be possible to declare that some content can have one of several
structures. For example, a person can have either a full name or a combination of first name
and last name, but not both.
• VR 24. Open/Closed models: The schema language must be able to define that some content
is open and admits other features apart from the declared structure or closed and does not
admit other features.
• VR 25. Co-occurrence constraints: The schema language must be able to declare that the
appearance of some content affects other content.
3.5 SUMMARY
In this chapter we learned the main motivations for validating RDF. We started by describing
what other technologies do for validation, with an overview of UML, SQL, XML,
3.6. SUGGESTED READING 53
JSON, and so on. That overview presented those technologies and gathered a list
of validation requirements that are common to all of them.
We also described some of the previous RDF validation approaches and collected a list
of validation requirements that a good schema language for RDF validation must fulfil. Notice
that some of them contradict each other, so it is necessary to reach some compromise solution.
Shape Expressions
Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx
was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource
Shapes. It added disjunctions, so it was more expressive than Resource Shapes. Tokens in the
language were adopted from Turtle [80] and SPARQL [44], with tokens for grouping, repetition
and wildcards from regular expressions and RelaxNG Compact Syntax [100]. The language was
described in a paper [80] and codified in a June 2014 W3C member submission [92] which
included a primer and a semantics specification. This was later deemed “ShEx 1.0”.
The W3C Data Shapes Working Group started in September 2014 and quickly coalesced
into two groups: the ShEx camp and the SHACL camp. In 2016, the ShEx camp split from
the Data Shapes Working Group to form a ShEx Community Group (CG). In April of 2017,
the ShEx CG released ShEx 2 with a primer, a semantic specification and a test-suite with
implementation reports.
As of publication, the ShEx Community Group was starting work on ShEx 2.1 to add
features like value comparison and unique keys. See the ShEx homepage (http://shex.io/)
for the state of the art in ShEx. A collection of ShEx schemas has also been started at
https://github.com/shexSpec/schemas.
• They can have zero or more properties schema:knows whose value must be an IRI and con-
form to the :User shape.
Example 4.1
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
  schema:gender [ schema:Male schema:Female ] OR xsd:string ;
  schema:knows IRI @:User *
}
• :alice conforms because it contains schema:name and schema:gender with their corresponding
values. It does not contain the property schema:birthDate but that property is optional, as
indicated by ‘?’. It also has the property schema:knows with the value :bob, which conforms to the
:User shape.
• :bob conforms because it contains the properties and values of the :User shape. Note that
the order in which triples are expressed in the example does not matter. These are parsed
into an RDF graph and RDF graphs are unordered collections of triples.
ShEx shapes are open by default, which means that they constrain neither the existence
nor the value of the properties not mentioned in the shape. This behavior can be modified
using the CLOSED qualifier as we will explain in Section 4.6.8.
• :dave fails because the value of schema:birthDate is 1980 (an integer) which is not an xsd:date.
• :emily fails because it has two values for property schema:name. Unless otherwise specified,
the default cardinality is “exactly one” (which can also be written as “{1}” or “{1,1}”).
• :grace fails because the value of schema:knows is a blank node and there is a node constraint
saying that it must be an IRI.
• :harold fails because the value of schema:knows is :grace and :grace does not conform to the
:User shape.
• ShExkell for Haskell (Sergio Iván Franco and Weso Research Group) https://github.com/weso/shexkell.
There are also several online demos and tools that can be used to experiment with ShEx.
• shex.js (http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html);
• Relative and absolute IRIs are enclosed by <> and prefixed names (a shorter way to write
out IRIs) are written with prefix followed by a colon “:”.
• Literals can be enclosed by the same quotation conventions (', ", ''', """) as in Turtle.
• Keywords (apart from a) are not case sensitive, which means that MinInclusive is the same
as MININCLUSIVE.
A ShExC document declares a ShEx schema. A ShEx schema is a set of labeled shape
expressions which are composed of node constraints and shapes. These constrain the permissible
values or graph structure around a node in an RDF graph. When we are considering a specific
node, we call that node the focus node.
The triples which have the focus node as a subject are called outgoing arcs; those with
the focus node as an object are called incoming arcs. (Typical RDF idioms call for constraints
on outgoing arcs much more frequently than on incoming arcs.) Together, the incoming and
outgoing arcs are called the neighborhood of that node.
Shape expression labels can be IRIs or blank nodes but only IRI labels can be referenced
from outside the schema. In the previous Example 4.1, :User is an IRI label.
Node constraints declare the shape of a focus node without looking at the arcs. They can
declare the kind of node (IRI, blank node or literal), the datatype in case of literals, describe it
with XML Schema facets (e.g., min and max numeric values, string lengths, number of digits), or
enumerate a value set. Figure 4.1 signals the node constraints that appear in Example 4.1 which
are: xsd:string and xsd:date (datatype constraints), [schema:Male schema:Female] (a value set), IRI (a
node kind declaration) and @:User (a value shape). Node constraints will be described in more
detail in Section 4.5.
Triple constraints define the triples that appear in the neighborhood of a focus node. They
usually contain a property (or inverse property), a node constraint, and a cardinality declaration
which is one by default.
60 4. SHAPE EXPRESSIONS
[Figure 4.1: the :User shape of Example 4.1 with its node constraints highlighted.]
For example, schema:name xsd:string is a triple constraint. The :User shape from Example 4.1
was formed by four triple constraints. Triple constraints will be described later in Section 4.6.1.
[Figure: the :User shape of Example 4.1 with its four triple constraints highlighted.]
Triple constraints can be grouped using the semicolon operator ; to form triple expressions.
Shapes are enclosed by curly braces {} and contain triple expressions.
Shapes are the basic form of shape expressions, although more complex shape expressions
can be formed by combining the logical operators AND, OR and NOT which will be later described
in Section 4.6. Shape expressions are identified by shape expression labels.
Figure 4.4 shows a compound shape expression formed by combining the shape reference
@:User with a shape that contains a single triple constraint :teaches @:Course using the AND operator.
The full ShEx BNF grammar is specified at http://shex.io/shex-semantics/#shexc.
[Figure: a shape expression annotated with its shape expression label, shape expression, and shape.]
ShEx validation takes as input a schema, an RDF graph, and a shape map, and returns
another shape map.
The input shape map (called fixed shape map) contains a list of nodeSelector@shapeLabel as-
sociations separated by commas, where nodeSelector is an RDF node and shapeLabel is a shape
label. Both use N-Triples notation.
A fixed map would look like:
<http://data.example/#alice>@<http://schema.example/#User>,
<http://data.example/#bob>@<http://schema.example/#User>
Although shape maps use absolute IRIs for RDF nodes and shape labels, we will use
prefixes to abbreviate them in our listings:
:alice@:User,
:bob@:User
Note that during evaluation, the processor may need to check the conformance of other
nodes against other shapes.
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}
The reason is that in order to check that :alice conforms to :User, the processor must
check that :carol also conforms to :User and hence, it adds the association :carol@:User to the
result shape map.
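How a result shape map can grow beyond the input associations is sketched below in Python. This is an illustration under stated assumptions, not how any particular ShEx processor works: the graph encoding, the function name, and the data for :carol are invented for the sketch, the xsd:string check is reduced to an isinstance test, and cycles are handled by a crude optimistic assumption.

```python
# Graph as {subject: {predicate: [objects]}}. The data for :carol is an
# assumption made for this sketch.
graph = {
    ":alice": {"schema:name": ["Alice"], "schema:knows": [":carol"]},
    ":carol": {"schema:name": ["Carol"]},
}

def conforms_to_user(node, graph, result):
    """Check node against :User { schema:name xsd:string ;
    schema:knows @:User * } and record every association in result."""
    key = (node, ":User")
    if key in result:          # already decided, or assumed while in progress
        return result[key]
    result[key] = True         # optimistic assumption so that cycles terminate
    arcs = graph.get(node, {})
    names = arcs.get("schema:name", [])
    ok = len(names) == 1 and isinstance(names[0], str)
    for friend in arcs.get("schema:knows", []):
        # Every value of schema:knows must itself conform to :User.
        ok = conforms_to_user(friend, graph, result) and ok
    result[key] = ok
    return ok

result_shape_map = {}
conforms_to_user(":alice", graph, result_shape_map)
print(result_shape_map)
# {(':alice', ':User'): True, (':carol', ':User'): True}
```

Only :alice was requested, but the association for :carol appears in the result because it had to be checked along the way.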
Figure 4.5 depicts the validation process.
There are many use case-dependent ways to compose a fixed shape map. ShEx defines a
common one called query shape map which uses triple patterns to select nodes. Triple patterns
use curly braces and three values that represent the subject, predicate and object of a triple. They
can contain the value FOCUS to identify the node we want to select and _ to indicate that we do
not constrain some value.
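The node selection performed by such a triple pattern can be sketched in Python as follows. The tuple encoding of triples, the constant names, and the sample data are assumptions of this sketch; it is not the shape map implementation of any ShEx tool.

```python
FOCUS, ANY = "FOCUS", "_"

# Sample data, invented for this sketch.
triples = [
    (":alice", "schema:name", "Alice"),
    (":alice", "schema:knows", ":carol"),
    (":carol", "schema:name", "Carol"),
]

def select_nodes(pattern, triples):
    """Return the nodes found in the FOCUS position of matching triples."""
    selected = []
    for triple in triples:
        focus = None
        matched = True
        for want, have in zip(pattern, triple):
            if want == FOCUS:
                focus = have
            elif want != ANY and want != have:
                matched = False
                break
        if matched and focus is not None and focus not in selected:
            selected.append(focus)
    return selected

# Subjects of schema:knows arcs, then objects of schema:knows arcs.
print(select_nodes((FOCUS, "schema:knows", ANY), triples))  # [':alice']
print(select_nodes((ANY, "schema:knows", FOCUS), triples))  # [':carol']
```

Pairing each selected node with a shape label then yields a fixed shape map.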
Section 4.9 describes fixed shape maps and query shape maps in greater detail.
4.4. THE SHAPE EXPRESSIONS LANGUAGE 63
ShEx schema:
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}
RDF graph:
:alice schema:name "Alice" ;
  schema:knows :carol .
Figure 4.5: Validation process which accepts a fixed shape map and emits a result shape map.
In the previous example, validating :alice as a :User entailed validating :carol as a :User.
Unless the validation engine has some sort of state persistence, it would be more efficient to
validate once with a shape map like:
:alice@:User, :carol@:User
This recursive structure forms a tree which has node constraints and shapes as leaves.
Figure 4.6 represents the ShEx data model.
[Figure 4.6: the ShEx data model, relating ShapeExpr and TripleExpr.]
Node constraints and shapes are described in the following sections while the logical op-
erators are discussed in Section 4.8 and external shapes in Section 4.7.3.
<Patient> {
  ...
}
...
In the compact syntax, the directive start = @<Patient> declares that the shape expression
<Patient> will be used by default if a shape is not explicitly provided in the shape map.
In shape maps, it is possible to declare that a node must be validated against the start
shape expression by using the keyword START. For example, the following shape map:
:alice@START,
:bob@<Doctor>
would validate :alice against the start shape expression (in the previous example, it would be
<Patient>) and :bob against <Doctor>.
Example 4.5
Any place where no node constraint is wanted can be marked with a period ("."). This
is analogous to the period which matches any character in regular expressions. The following
example lists the properties that a :User must have but it does not specify any constraint on their
values:
:User {
  schema:name . ;
  schema:alternateName . * ;
  schema:birthDate . ?
}
If we provide the shape map :alice@:User,:bob@:User the ShEx processor would return that
they both conform.
The result would be that the first two nodes are conformant while the last two nodes are
non-conformant.
It is also possible to combine top-level node constraints with more complex shapes.
Example 4.8 Node constraint as top-level shape
The following declaration of shape :User says that nodes conforming to shape :User must
be IRIs and have a property schema:name with an xsd:string value.
4.5. NODE CONSTRAINTS 67
In this case, the external AND can be omitted, so the previous shape is equivalent to:
:User IRI {
  schema:name xsd:string
}
Table 4.1 gives an overview of the main types of node constraints with some examples
and a short description.
Example 4.9
The following example declares that the value of property schema:name must be a literal and
the value of schema:follows must be an IRI.
:User {
  schema:name Literal ;
  schema:follows IRI
}
Table 4.2: Node kinds
Value        Description              Examples
Literal      Any RDF literal          "Alice", "Spain"@en, 42, true
IRI          Any RDF IRI              <http://example.org/Alice>, ex:alice, :bob
BNode        Any blank node           _:x, []
NonLiteral   Any IRI or blank node    <http://example.org/alice>, _:x
4.5.2 DATATYPES
Like most schema languages, ShEx includes datatype constraints which declare that a focus
node must be a literal with some specific datatype. ShEx has special support for XML Schema
datatypes [9] for which it checks that the lexical form also conforms to the expected datatype.
:User {
  schema:name xsd:string ;
  foaf:age xsd:integer ;
  schema:birthDate xsd:date ;
}
As we said, for XML Schema datatypes, ShEx also checks that the lexical form matches
the expected datatype. For example, the foaf:age of :dave is "Unknown"^^xsd:integer and although
it declares that "Unknown" is an integer and some RDF parsers allow those declarations, "Unknown"
does not have the integer’s lexical form and the ShEx processor will complain. The same happens
for the value of schema:birthDate.
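The kind of lexical check described here can be approximated with a short Python sketch. It is a simplification made for illustration: the function names are hypothetical, the xsd:date check ignores time zones and years outside 0001 to 9999, and a real processor follows the full XML Schema definitions.

```python
import re
from datetime import date

def valid_integer(lexical):
    """xsd:integer lexical form: an optional sign followed by digits."""
    return re.fullmatch(r"[+-]?[0-9]+", lexical) is not None

def valid_date(lexical):
    """Simplified xsd:date check: YYYY-MM-DD naming a real calendar day."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", lexical) is None:
        return False
    try:
        date.fromisoformat(lexical)
    except ValueError:
        return False
    return True

print(valid_integer("23"))        # True
print(valid_integer("Unknown"))   # False
print(valid_date("1980-03-10"))   # True
print(valid_date("1980"))         # False
```

The literal "Unknown"^^xsd:integer is rejected by the first function even though its declared datatype is the expected one.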
:Picture {
  schema:name xsd:string ;
  schema:width cdt:distance ;
  schema:height cdt:distance
}
:Country {
  schema:name rdf:langString ;
}
The pattern constraint (‘/regex/’) is based on the XPath regular expression function
fn:matches(str,re,flags), which takes as parameters the string to match, the regular expression,
and an optional flags parameter to modify the matching behavior.
XPath regular expressions are based on common conventions from other languages like
Perl or other Unix tools like grep. The regular expression language is a string composed of the
characters to match and some characters which have special meaning called meta-characters.
• x matches the 'x' character.
• \u0078 matches the unicode codepoint U+78 (which is again 'x').
• . matches any character.
• [vxz] declares a character class, and matches any of 'v', 'x', or 'z'.
• \d is a pre-defined character class which matches any digit. It is equivalent to “[0-9]”.
• \s is a pre-defined character class which matches any space character (which also includes
tabs and newlines). It is equivalent to “[\u0009\u000d\u000a\u0020]”.
Inside character classes, the symbol “^” means negation and “-” can be used to declare
character ranges. For instance, the character class [^a-zA-Z] matches any non-letter.
Cardinality (repetition) operators can be used to specify how many characters are matched.
The possibilities are as follows.
• ? represents zero or one values.
• + one or more values.
• * zero or more values.
• {m,n} between m and n values.
Any string of characters must be matched in the order of its characters with the following
alterations.
• “()” declares a group which is useful for cardinality and alternatives. For example: “^ab(cd|ef){2,}gh”
matches “abcdcdcdghij”.
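The constructs above can be tried out with a short script. The following sketch uses Python's re module as a stand-in for the XPath function fn:matches; the two dialects agree on the constructs shown here but are not identical in general, and the helper name matches is an assumption of this sketch.

```python
import re

def matches(pattern, value, flags=0):
    """Like fn:matches, succeed if the pattern is found anywhere in value."""
    return re.search(pattern, value, flags) is not None

# A group with a cardinality: at least two repetitions of "cd" or "ef".
print(matches(r"^ab(cd|ef){2,}gh", "abcdcdcdghij"))  # True
print(matches(r"^ab(cd|ef){2,}gh", "abcdghij"))      # False

# Character classes, negation inside a class, and pre-defined classes.
print(matches(r"[vxz]", "box"))       # True
print(matches(r"[^a-zA-Z]", "abc"))   # False
print(matches(r"\d", "P12"))          # True
print(matches(r"\s", "two words"))    # True

# A backslash turns a meta-character into a literal.
print(matches(r"a\.b", "a.b"))        # True
print(matches(r"a\.b", "axb"))        # False
```

Because the match may start anywhere in the string, a pattern has to use ^ and $ when the whole value must match.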
All of the meta-characters above will be treated as literals (i.e., they match themselves) if
they are prefixed with a \ (backslash).
Table 4.4 contains several examples of regular expression matches.
Regular Expression   Some Values that Match        Some Values that Don’t Match
P\d{2,3}             P12 P234                      A1 P2n P1 P2233
(pa)*b               b pab papab papapab . . .     pa po
[a-z]{2,3}           ab abc                        a abcd 23
[a-z]{2,3}           ab abc                        a abcd x45 23
• i: Case-insensitive mode.
• m: Multi-line mode. If present, the ^ character matches the start of any line (not only the
start of the string) and the $ matches the end of any line (not only the end of the string).
• s: If present, the dot also matches newlines; otherwise it matches any character except
newlines. This mode is called single-line mode in Perl.
• q: All meta characters are interpreted as literals, i.e., they match themselves in the input
string. q is compatible with the i flag. If it’s used with the m, s or x flag, that flag is ignored.
4.5.4 VALUE SETS
A value set is a node constraint which enumerates the list of possible values that a focus node
may have. In ShExC, value sets are enclosed by square brackets ([ and ]) where each possible
value is separated by a space.
:Product {
  schema:color [ "Red" "Green" "Blue" ] ;
  schema:manufacturer [ :OurCompany :AnotherCompany ]
}
Unit value sets A common pattern is to declare that a node must have a specific value. This
can be done by a unit value set, i.e., a value set with a single value.
Example 4.15
:Spanish {
  schema:country [ :Spain ]
}

:User {
  a [ schema:Person ]
}
Note that the :User shape employs the a keyword which stands for rdf:type. There is no
inference in ShEx, even for rdf:type, which is treated as any other arc. See Section 3.2 for a
discussion of the difference between shapes and classes.
Language-tagged values As seen above, value sets contain one or more values. The examples
so far have included IRIs and strings (literals with a datatype of xsd:string). These match precisely
the same value in the data. They can also be language tags, which match any literal with the given
language tag.
Example 4.16
:FrenchProduct {
  schema:label [ @fr ]
}

:SpanishProduct {
  schema:label [ @es @es-AR @es-ES ]
}
Ranges We can see in the example above that it would be convenient to accept literals with any
language tag starting with "es". This can be indicated with the postfix operator ‘~’. For example,
Argentinian, Chilean, and other region codes for Spanish could be accepted with
‘schema:label [ @es~ ]’.
:SpanishProduct {
  schema:label [ @es~ ]
}
This also works for strings, e.g., ‘"+34"~’ (Spanish telephone numbers) and IRIs, e.g.,
‘<http://www.w3.org/ns/>~’ (W3C namespaces).
IRIs represented as prefixed names can also have a postfix ‘~’, e.g., foaf:~ represents the set
of all URIs that start with the namespace bound to the prefix foaf:.
Example 4.19
In the following example, we declare that the status of a product must start with
http://example.codes/good. or http://example.codes/bad..
:Product {
  :status [ codes:good.~ codes:bad.~ ]
}
Exclusions It can also be useful to exclude some values from a range. Exclusions are marked
by the minus sign (-). For example: codes:~ - codes:unknown represents all values starting with codes:
except codes:unknown.
Exclusions can themselves be ranges. For example: codes:~ - codes:bad.~ represents all
values starting with codes: except those that start with codes:bad..
Example 4.20 Range exclusions
The following code prescribes that the status of products can be anything that starts with
codes: except codes:unknown or codes starting with codes:bad..
:Product {
  :status [ codes:~ - codes:unknown - codes:bad.~ ]
}
Exclusions must be the same kind (IRI, string or language tag) as the stem type. For
instance, ‘[ codes:good.~ - "bad." - @fr~ ]’ would be malformed as it’s an IRI range excluding a
string and a language stem.
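The matching rule for ranges with exclusions can be written down as a small Python function. This is a sketch over plain strings: the function name and the (text, is_stem) encoding of exclusions are assumptions made here, and the codes: prefix is expanded to http://example.codes/ as in Example 4.19.

```python
def in_stem_range(value, stem, exclusions=()):
    """True if value starts with stem and is not excluded.

    Each exclusion is a (text, is_stem) pair: a stem exclusion removes
    every value starting with text, a plain one removes only text itself.
    """
    if not value.startswith(stem):
        return False
    for text, is_stem in exclusions:
        if (value.startswith(text) if is_stem else value == text):
            return False
    return True

CODES = "http://example.codes/"

# Mirrors the value set [ codes:~ - codes:unknown - codes:bad.~ ]
exclusions = [(CODES + "unknown", False), (CODES + "bad.", True)]

print(in_stem_range(CODES + "good.shape", CODES, exclusions))  # True
print(in_stem_range(CODES + "unknown", CODES, exclusions))     # False
print(in_stem_range(CODES + "bad.smell", CODES, exclusions))   # False
```

The same function covers string and language-tag stems if the values are compared as strings, although real language-tag matching is case-insensitive, which this sketch does not model.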
Heterogeneous value sets There is no requirement that value sets be composed of a consistent
kind of value (IRI, string or language tag). For instance, the status of a product can be the IRIs
(:Accepted or :Rejected) or a string, e.g., “unknown”.
Example 4.21
:Product {
  schema:status [ :Accepted :Rejected "unknown" ]
}
Wildcard stem ranges Sometimes we want to accept user data with any value except some
specific values. For this, a wildcard character (‘.’) followed by one or more exclusions can be used
(so long as those exclusions are all of the same kind). The kind of the exclusions (IRI, string, or
language tag) establishes the type of RDF term that will be matched.
:Product {
  :status [ . - codes:bad ]
}
Value set expressivity Value sets are mostly a shorthand syntax for complex Boolean com-
binations of node constraints. ShEx includes them because they are much more concise and,
given their ubiquity in other schema languages, they are fundamental to how people model and
understand data.
can be defined without value sets using the OR operator that will be presented in Section 4.6.
:User {
  schema:gender [ schema:Male ]
} OR {
  schema:gender [ schema:Female ]
}
4.6 SHAPES
In the previous section we explored node constraints and how they declare a set of permissi-
ble RDF terms. Most of the examples used node constraints in triple constraints, limiting the
permissible values for triples in the input graph.
and we will try to validate the nodes :alice and :bob represented in the following data:
:alice schema:name "Alice" ;   # Passes as :User
  schema:knows :bob .
To solidify our intuition of validating shapes, we need to think of this as a series of steps
to validate a focus node against a shape expression.
2. :User is a shape so check if the neighborhood of :alice matches the triple expression in the
shape :User. This step means that one needs to find a way to distribute the triples in the
neighborhood to satisfy the triple expression.
3. The shape’s triple expression is a single triple constraint so all one needs to do is find
the triple with a matching predicate in the neighborhood. In this case, the triple :alice
schema:name "Alice".
4. The triple expression has a value expression so consider the object, "Alice", as the focus
node and test it against the node constraint (in this case xsd:string).
6. The cardinality of the triple constraint is {1,1} (the default one) and as there is only one
triple matching, the node conforms to the shape expression.
When the same steps are performed to check :bob, the last step will have 34 as the focus
node. This test fails so :bob fails to conform to :User.
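The steps can be condensed into a short Python sketch for the special case of a shape with a single triple constraint. The encoding of the graph as a list of tuples and the function names are assumptions of this sketch, literals are modelled as plain Python values, and xsd:string is approximated by Python's str.

```python
def validate_single_constraint(node, graph, predicate, value_check,
                               min_count=1, max_count=1):
    """Validate node against a shape made of one triple constraint."""
    # Find the triples in the neighborhood with a matching predicate.
    objects = [o for s, p, o in graph if s == node and p == predicate]
    # Test each object, as a focus node, against the node constraint.
    if not all(value_check(o) for o in objects):
        return False
    # Check the cardinality (the default is exactly one).
    return min_count <= len(objects) <= max_count

def is_string(value):
    """Stand-in for the xsd:string node constraint."""
    return isinstance(value, str)

graph = [
    (":alice", "schema:name", "Alice"),
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:name", 34),
]

print(validate_single_constraint(":alice", graph, "schema:name", is_string))  # True
print(validate_single_constraint(":bob", graph, "schema:name", is_string))    # False
```

The schema:knows arc of :alice is ignored because the shape is open and does not mention that property.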
Shape A shape is a container for a triple expression along with some properties stating how
to treat triples not matching the triple expression. We will describe these properties after in-
troducing triple expressions (Section 4.6.8). Since triple expressions are combinations of triple
constraints, we start with them.
Example 4.25 The following shape is defined by a single triple constraint whose components
are depicted in Figure 4.7.
:Product {
  schema:productId xsd:string {1,2}
}
[Figure 4.7: the triple constraint schema:productId xsd:string {1,2} of the :Product shape, made up of a property (schema:productId), a node constraint (xsd:string), and a cardinality ({1,2}).]
Closing a property Triple constraints have an implicit meaning of closing the possible values
of a property. In the previous example, the declaration schema:productId xsd:string requires all
values of schema:productId to satisfy xsd:string. That’s why :p6 failed to conform: although it had
one string value, the other value wasn’t.
This behavior can be modified with the directives EXTRA and CLOSED that will be shown in
Section 4.6.8.
4.6.2 GROUPINGS
The EachOf operator combines two or more triple expressions. All the sub-expressions must be
satisfied by triples in the neighborhood of the focus node. EachOf is indicated by a semicolon
(;) in the compact syntax.
Example 4.26 A :User is defined by an EachOf expression that combines three triple constraints.
A node satisfies the :User shape if all three triple constraints are satisfied.
:User {
  schema:name xsd:string ;
  foaf:age xsd:integer ;
  schema:email xsd:string
}
4.6.3 CARDINALITIES
Cardinalities indicate the required number of triples satisfying the given constraint. They are
most often used on triple constraints although they can also be applied to more complex expres-
sions. Table 4.5 gives an overview of the different representations of cardinalities in ShExC.
If the cardinality is not specified, the default value is {1} (exactly one).
:User {
  schema:name xsd:string ;
  schema:worksFor IRI ? ;
  schema:follows IRI *
}

:Company {
  schema:founder IRI ? ;
  schema:employee IRI {1,100}
}
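The cardinality markers can be made concrete with a small sketch. The Python below is not taken from any ShEx implementation; the names `parse_cardinality` and `satisfies` are hypothetical. It assumes the usual ShExC markers (`?`, `*`, `+`, `{m}`, `{m,n}`, `{m,}`) and maps each one to a (min, max) pair, using `None` for an unbounded maximum and (1, 1) when no marker is given.

```python
import re
from typing import Optional, Tuple

Cardinality = Tuple[int, Optional[int]]  # (min, max); max=None means unbounded


def parse_cardinality(marker: str = "") -> Cardinality:
    """Translate a ShExC cardinality marker into a (min, max) pair."""
    marker = marker.replace(" ", "")
    shorthand = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}
    if marker in shorthand:
        return shorthand[marker]
    m = re.fullmatch(r"\{(\d+)(,(\d+|\*)?)?\}", marker)
    if m is None:
        raise ValueError(f"not a cardinality marker: {marker!r}")
    low = int(m.group(1))
    if m.group(2) is None:           # {m}: exactly m
        return (low, low)
    if m.group(3) in (None, "*"):    # {m,} or {m,*}: m or more
        return (low, None)
    return (low, int(m.group(3)))    # {m,n}: between m and n


def satisfies(count: int, cardinality: Cardinality) -> bool:
    """Check whether a number of matching triples fits the cardinality."""
    low, high = cardinality
    return count >= low and (high is None or count <= high)
```

Under these assumptions, a :Company with no schema:employee triples fails the {1,100} constraint above, while a :User with no schema:worksFor triple still satisfies the ? constraint.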
4.6.4 CHOICES
The pipe or choice operator | can be used to compose complex triple expressions, with
the meaning that one of the branches must be satisfied.
A typical pattern consists of combining OneOf (| operator) with EachOf (;) to form more
complex expressions.
Example 4.30
The following shape declares that nodes must have either one schema:name or a combination
of one or more schema:givenName and one schema:familyName.
:User {
  schema:name xsd:string |
  ( schema:givenName xsd:string + ;
    schema:familyName xsd:string
  )
}
A typical pattern is to add some cardinality to an expression formed by the OneOf (|)
operator.
:Product {
  schema:productId xsd:string ;
  ( schema:isRelatedTo @:Product |
    schema:isSimilarTo @:Product ){0,2}
}
84 4. SHAPE EXPRESSIONS
Example 4.32
The following schema declares that nodes conforming with :User must have a property
schema:name with xsd:string and another property schema:worksFor whose value must conform with
an anonymous shape _:1 which must have rdf:type with the value :Company.
:User {
  schema:name xsd:string ;
  schema:worksFor @_:1
}

_:1 { a [ :Company ] }
Example 4.33
:Grandson {
  :parent { :parent . + }+ ;
}
:Company {
  a [ schema:Company ] ;
  ^ schema:worksFor @:User +
}
With the following data, node :Company1 conforms to :Company because there are two nodes,
:alice and :bob, that work for it. However, node :Company2 does not conform because there is no
node pointing to it by the property schema:worksFor, and node :Company3 also fails because the node
that works for it does not conform to shape :User.
:alice schema:name "Alice" ;   # ✓ Passes as :User
  schema:worksFor :Company1 .

:x schema:gender schema:Female ,
    schema:Male .
Remember that ShEx distributes the triples to triple constraints in a triple expression (see
Section 4.6). This means the same triple cannot contribute to satisfying two different triple
constraints, even if its object satisfies the node constraints for both. That is why the node :frank
does not conform to the :User shape even if its parent satisfies both conditions.
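The distribution of triples can be illustrated with a small backtracking sketch. This is a simplification for illustration, not the ShEx validation algorithm: it handles only a flat EachOf of triple constraints, and every name and data item in it (`TripleConstraint`, `conforms`, the `GENDER` lookup, the nodes :both, :dad, :mum, and :grace) is hypothetical. Each triple whose predicate is mentioned in the expression must be assigned to exactly one constraint that accepts its object, and the resulting counts must fit each cardinality.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class TripleConstraint:
    predicate: str
    value_ok: Callable[[str], bool]   # stands in for the node constraint
    min_count: int = 1
    max_count: Optional[int] = 1      # None means unbounded


def conforms(triples: Sequence[Triple], constraints: List[TripleConstraint]) -> bool:
    """Try to assign every relevant triple to exactly one constraint."""
    mentioned = {c.predicate for c in constraints}
    relevant = [t for t in triples if t[1] in mentioned]

    def assign(i: int, counts: List[int]) -> bool:
        if i == len(relevant):
            # All triples are distributed: check every cardinality.
            return all(
                c.min_count <= n and (c.max_count is None or n <= c.max_count)
                for c, n in zip(constraints, counts)
            )
        _, predicate, obj = relevant[i]
        for k, c in enumerate(constraints):
            if c.predicate == predicate and c.value_ok(obj):
                counts[k] += 1
                if assign(i + 1, counts):
                    return True
                counts[k] -= 1        # backtrack and try another constraint
        return False                  # the triple fits no constraint

    return assign(0, [0] * len(constraints))


# Hypothetical data: the genders recorded for each parent node.
GENDER = {":dad": {"Male"}, ":mum": {"Female"}, ":both": {"Male", "Female"}}


def has_gender(gender: str) -> Callable[[str], bool]:
    return lambda node: gender in GENDER.get(node, set())


user = [
    TripleConstraint(":parent", has_gender("Male")),
    TripleConstraint(":parent", has_gender("Female")),
]
frank = [(":frank", ":parent", ":both")]
grace = [(":grace", ":parent", ":dad"), (":grace", ":parent", ":mum")]
```

In this sketch `conforms(frank, user)` is false: the single :parent triple can be given to either constraint but not to both, so one of them is always left with zero triples. `conforms(grace, user)` is true because each constraint receives its own triple.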
Extra Properties
As we described in Section 4.6.1 triple constraints close properties by default. Sometimes, it is
useful to open a property to permit instances of it which are not included in the schema. The
EXTRA qualifier can be used to allow the appearance of other properties.
A shape of the form
<Shape> EXTRA <property> {
  <property> <NodeConstraint>
}
is equivalent to:
<Shape> {
  <property> <NodeConstraint> ;
  <property> (NOT <NodeConstraint>)*
}
which means that it allows zero or more values of <property> that do not satisfy <NodeConstraint>.
Note that there is a hidden negation in any shape that includes an EXTRA qualifier.
Notice that :bob passes although he follows :emily, who is not a Spaniard.
If we removed the EXTRA declaration, he would fail.
A typical pattern using EXTRA declarations is to constrain the set of required values of a
node but to allow other values.
:Company1 {
  a [ schema:Organization ] ;
  a [ org:Organization ]
}
Closed Shapes
A shape can be declared to have only the triples matching a given set of triple constraints and
no others using the keyword CLOSED.
:User2 CLOSED {
  schema:name xsd:string ;
  schema:knows IRI *
}
4.7 REFERENCES
4.7.1 SHAPE REFERENCES
A node constraint can be a shape reference, which has the form @label where label is the identifier
of another shape expression in the schema. Shape expression reference would be a more precise
name but is long enough to be awkward.
:Company {
  schema:name xsd:string
}
[Figure: The :User and :Company shapes, connected by schema:worksFor and schema:employee; :Company has schema:name: xsd:string.]
:Teacher {
  a [ schema:Person ] ;
  schema:name xsd:string ;
  :teaches @:Course *
}

:Course {
  schema:name xsd:string ;
  :university @:University ;
  :hasStudent @:Student +
}

:Student {
  a [ schema:Person ] ;
  schema:name xsd:string ;
  schema:mbox IRI ;
  :hasFriend @:Student * ;
  :isEnroledIn @:Course *
}
Notice the separation between the types and shapes of nodes. Both :Teacher and :Student
must have rdf:type with value schema:Person, but their properties are different.
As can be seen, ShEx can express any kind of cyclic or recursive model in a natural way. The
only restriction arises when combining recursion with negation, as we will explain in Section 4.8.3
where the negation operator NOT is introduced.
[Figure: The :University, :Course, :Teacher, and :Student shapes and the references between them, including :hasCourse and :hasFriend.]
Although at the time of this writing, the ShEx specification does not define a mechanism
like the :service above, it is expected that future mechanisms like that will be developed.
The “&:name” directive can be considered to insert the value of :name into its place. Logically,
:Employee is equivalent to this:
4.7.5 ANNOTATIONS
ShEx supports annotations, which are lists of pairs (predicate, object) where predicate
is an IRI and object is any RDF node. Annotations provide additional information about the
elements to which they are applied, which can be triple constraints, EachOf, OneOf, or shapes.
The compact syntax for annotations uses two slashes // followed by a predicate and an
object.
:Person {
  schema:name xsd:string
    // rdfs:label "Name"
    // rdfs:comment "Name of person" ;

  schema:birthDate xsd:date
    // rdfs:label "birthDate"
    // rdfs:comment "Birth of date" ;
}
In this case, each triple constraint has its specific annotations which are internally repre-
sented as triples.
At the time of this writing ShEx does not have any built-in annotation vocabulary. It is ex-
pected that some specific annotations could be used for future uses like user interface generation
or any other use case.
Operation   Description
AND         S1 AND S2 is satisfied if and only if both are satisfied
OR          S1 OR S2 is satisfied if and only if S1 or S2 (or both) are satisfied
NOT         NOT S is satisfied if and only if S is not satisfied
4.8.1 CONJUNCTION
The AND operator forms a new shape expression from two shape expressions with the meaning
that a node conforms to S1 AND S2 if it conforms to both S1 and S2.
:Product {
  schema:productId xsd:string AND MINLENGTH 5 AND MAXLENGTH 10
}
If the left-hand side of the conjunction is a node constraint, the AND keyword can be omit-
ted.
Reusing shape expressions A common situation is to declare a set of constraints that we want
to repeat.
:User {
  schema:name xsd:string ;
  schema:worksFor @:CompanyConstraints ;
  schema:affiliation @:CompanyConstraints
}

:CompanyShape {
  schema:founder xsd:string ;
}
Another example of shape reuse is to extend a shape with more constraints emulating a
kind of inheritance as in Object-Oriented languages.
:dave a schema:Person ;   # ✓ Passes as :Person, :User and Student
  schema:name "Carol" ;
  schema:email <carol@example.org> ;
  :course :algebra .
Notice that this kind of reuse requires the shapes extended to be compatible with the new
ones. Otherwise, there will be no nodes satisfying them.
For example, we may want to declare a :Teacher shape extending :User but adding the
constraint that teachers have no email.
:Teacher @:User AND {
  schema:email . {0,0} ;
}
However, there will be no nodes satisfying it, because shape :User prescribes that they must
have exactly one schema:email, while the extended shape :Teacher prescribes that they must have
no schema:email.
In order to obtain the desired model, it is necessary that the shapes to be extended are
general enough to be compatible with the new shapes. In this case, for example, it would be
better to declare that the cardinality of schema:email in :User was optional.
4.8.2 DISJUNCTION
The OR operator combines two shape expressions with an inclusive disjunction, i.e., either one
side or the other, or both, must be satisfied.
:Product {
  rdfs:label xsd:string OR rdf:langString ;
  schema:releaseDate xsd:date OR xsd:gYear OR
    [ "unknown-past" "unknown-future" ]
}
Emulating recursive property paths SPARQL property paths are a very expressive feature
that can define complex expressions. ShEx does not support property paths in order to have a
more controlled way to define shapes. However, using nested shapes (see Example 4.33), recur-
sion and logical operators, it is possible to emulate their behavior.
A common use case for NOT is to check other shapes. Defining a shape :NotS as NOT :S, all
nodes in an RDF graph can be valid: some of them will conform to :S while the others will
conform to :NotS. In this way, a continuous integration system can define the shape map that all
nodes must satisfy (either positively or negatively) and check whether they satisfy it or not.
Both nodes :alice and :bob conform to one of the shapes, :alice to :User and :bob to :NoUser.
:alice schema:name "Alice" ;   # ✓ Passes as :User
  schema:birthDate "1980-03-10"^^xsd:date .
:NoName2 NOT {
  schema:name xsd:string
}
The behavior differs for node :bob which conforms to :NoName2. The reason is that it fails to
have a string value for schema:name so it fails to conform to the shape {schema:name xsd:string} and
thus, conforms to :NoName2.
:alice schema:name "Alice" .   # ✗ Fails as :NoName1 and :NoName2
IF-THEN pattern A common pattern is the IF-THEN construct: if some condition holds,
then a given shape expression must be satisfied.
This pattern can be modeled using the logical operators OR and NOT. Remember that IF x
THEN y is equivalent to (NOT x)OR y.
IF-THEN-ELSE pattern The IF-THEN-ELSE pattern construct can be defined in a similar way.
In this case:
IF X THEN Y ELSE Z ≡ ((NOT X) OR Y) AND (X OR Z)
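Both equivalences can be checked exhaustively over truth values. The short Python sketch below only verifies the propositional identities; the function names `if_then` and `if_then_else` are hypothetical and the code says nothing about how a ShEx processor evaluates shapes.

```python
from itertools import product


def if_then(x: bool, y: bool) -> bool:
    """IF x THEN y, encoded as (NOT x) OR y."""
    return (not x) or y


def if_then_else(x: bool, y: bool, z: bool) -> bool:
    """IF x THEN y ELSE z, encoded as ((NOT x) OR y) AND (x OR z)."""
    return ((not x) or y) and (x or z)


# Compare each encoding with the intended meaning for every combination.
for x, y, z in product([False, True], repeat=3):
    assert if_then(x, y) == (y if x else True)
    assert if_then_else(x, y, z) == (y if x else z)
```

When X holds, the second conjunct (X OR Z) is trivially true and the result depends only on Y; when X does not hold, the first conjunct is trivially true and the result depends only on Z.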
:Product (
  NOT { a [ schema:Vehicle ] } OR
  { schema:vehicleEngine . ;
    schema:fuelType .
  }
) AND ({ a [ schema:Vehicle ] } OR
  { schema:category xsd:string } )
With the following data, nodes :kitt and :c23 conform to :Product each one passing one
of the branches, while :bad1 and :bad2 do not conform.
:kitt a schema:Vehicle ;   # ✓ Passes as :Product
  schema:vehicleEngine :x42 ;
  schema:fuelType :electric .
:Person {
  schema:name xsd:string
}
It is easy to check that :bob conforms to :Person (he has schema:name with a xsd:string value),
so he shaves a person, but:
Does :bob conform to :Barber?
If we assume he does, then it should not shave another barber, but as he shaves himself,
and we assumed he conformed to :Barber then he fails the constraint of not shaving barbers which
means that he should not conform. On the other hand, if we assumed he does not conform to
:Barber then he satisfies both constraints, and he should conform to :Barber.
Problems of this kind, which arise when combining negation and recursion, have been studied
by the logic programming and databases communities. Several approaches have been proposed, such
as negation-as-failure, stratified negation, and well-founded semantics [1].
ShEx imposes a constraint to avoid ill-formed data models: whenever a shape refers to
itself, either directly or indirectly, the chain of references cannot traverse an occurrence of the
negation operator NOT.
The previous shape :Barber violates the negation requirement, as it has a self-reference
that includes a negation. More formally, we say that there is a dependency
from :ShapeA to :ShapeB if the definition of :ShapeA contains a reference @:ShapeB.
We say that a dependency from :ShapeA to :ShapeB is a negative dependency if at least one
of the following holds:
4.9. SHAPE MAPS 105
• the occurrence of @:ShapeB in the definition of :ShapeA appears under an occurrence of the
negation operator NOT; or
• there is a triple constraint :prop @:ShapeB in the definition of :ShapeA and the property :prop
is declared as EXTRA in the corresponding triple expression.
In the latter case, the negation operator NOT does not appear explicitly, but we still need to
verify that a :ShapeB is not satisfied in some neighbor nodes. This was called hidden negation in
Section 4.6.8.
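The negation requirement can be read as a graph condition: no cycle of shape dependencies may contain a negative dependency. The Python sketch below illustrates one way to check this, assuming the dependencies have already been extracted from a schema as (from, to, is_negative) tuples. The function name `negative_cycle_edges` and the example dependency lists are hypothetical; this is not how any particular ShEx engine implements the check.

```python
from typing import Dict, Iterable, List, Set, Tuple

Dependency = Tuple[str, str, bool]  # (from_shape, to_shape, is_negative)


def negative_cycle_edges(dependencies: Iterable[Dependency]) -> List[Tuple[str, str]]:
    """Return the negative dependencies that lie on a cycle of references."""
    deps = list(dependencies)
    successors: Dict[str, Set[str]] = {}
    for source, target, _ in deps:
        successors.setdefault(source, set()).add(target)
        successors.setdefault(target, set())

    def reachable(start: str) -> Set[str]:
        """All shapes reachable from start in zero or more steps."""
        seen = {start}
        stack = [start]
        while stack:
            for nxt in successors[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A dependency source -> target lies on a cycle exactly when
    # source can be reached again from target.
    return [
        (source, target)
        for source, target, negative in deps
        if negative and source in reachable(target)
    ]


# Hypothetical dependency lists for illustration.
barber = [(":Barber", ":Person", False), (":Barber", ":Barber", True)]
harmless = [(":NotS", ":S", True), (":S", ":T", False)]
indirect = [(":A", ":B", True), (":B", ":C", False), (":C", ":A", False)]
```

With these inputs, the :Barber self-reference under NOT is reported, the negative dependency from :NotS to :S is accepted because :S never refers back to :NotS, and the indirect cycle through :B and :C is reported as well.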
associates all subjects of property schema:worksFor and all nodes of type schema:Person with :User,
and all objects of property schema:worksFor with shape :Company.
Any node in the data graph which is both of type schema:Person and the subject of a
schema:worksFor triple would be selected by both triple patterns and associated with :User in the
fixed map. Such duplicates are eliminated in accordance with the rule that a shape map can have
no duplicate pairs of nodeSelector and shapeLabel.
While the nodeSelector may be a triple pattern, it may also be an RDF node, as we would
see in a fixed shape map. Common idioms in query shape maps do the following.
• Explicitly bind nodes to shapes. This effectively adds one nodeSelector/shapeLabel pair to
the shape map. This mechanism is employed in SHACL with the declaration sh:targetNode
(see Section 5.7).
[Figure 4.10: Shape map resolution which accepts a query shape map and emits a fixed shape map. The ShapeMap resolver takes the query shape map {FOCUS schema:worksFor _}@:User, {FOCUS rdf:type schema:Person}@:User, {_ schema:worksFor FOCUS}@:Company together with an RDF graph and emits a fixed shape map of node/shape pairs such as :alice@:User and :c1@:Company.]
• Declare that all nodes with some property must match a given shape. This mechanism is
also defined in SHACL with the declarations sh:targetSubjectsOf and sh:targetObjectsOf.
• Select nodes with a given property and value. This refinement of the previous approach is
especially useful for general-purpose predicates like rdf:type. In fact, the SHACL directive
sh:targetClass offers a similar selection mechanism for the rdf:type predicate (the difference
is that SHACL uses the notion of SHACL instance; see Section 5.7.2). As with the above selectors,
this one is very use-case specific—one may not want to say that everything with an rdf:type
property should be validated against a :Person, but it may be reasonable to select everything
with type :Employee.
While it is not currently part of the shape map specification, the Wikidata use of shape
maps extends the nodeSelector to contain a SPARQL query, enabling another common use case.
• Select nodes or node/shape pairs by SPARQL query or inference. Where earlier mech-
anisms are all limited to either a direct identification of an RDF node or its selection by
triple pattern, this one enables more nuanced heuristics in the selection of focus nodes.
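The resolution of a query shape map into a fixed shape map can be sketched in a few lines. The Python below is an illustration under simplifying assumptions, not the shape map specification: triples are plain tuples of strings, a selector is either a node or a triple pattern with a fixed predicate, and `FOCUS`, `WILDCARD`, `resolve`, and the sample graph are hypothetical names and data.

```python
from typing import List, Sequence, Tuple, Union

Triple = Tuple[str, str, str]
FOCUS = object()      # marks the position of the node being selected
WILDCARD = object()   # matches any term, written _ in the text

Selector = Union[str, tuple]   # an RDF node, or a (subject, predicate, object) pattern


def resolve(query_map: Sequence[Tuple[Selector, str]],
            triples: Sequence[Triple]) -> List[Tuple[str, str]]:
    """Turn a query shape map into a fixed shape map without duplicate pairs."""
    fixed: List[Tuple[str, str]] = []

    def add(node: str, shape: str) -> None:
        if (node, shape) not in fixed:
            fixed.append((node, shape))

    for selector, shape in query_map:
        if isinstance(selector, str):          # explicit node: keep the pair as is
            add(selector, shape)
            continue
        s_pat, p_pat, o_pat = selector
        for s, p, o in triples:
            if p != p_pat:
                continue
            if s_pat is FOCUS and (o_pat is WILDCARD or o_pat == o):
                add(s, shape)
            elif o_pat is FOCUS and (s_pat is WILDCARD or s_pat == s):
                add(o, shape)
    return fixed


# Hypothetical data graph.
graph = [
    (":alice", "rdf:type", "schema:Person"),
    (":bob", "schema:worksFor", ":c1"),
    (":carol", "rdf:type", "schema:Person"),
    (":carol", "schema:worksFor", ":c1"),
]
query = [
    ((FOCUS, "schema:worksFor", WILDCARD), ":User"),
    ((FOCUS, "rdf:type", "schema:Person"), ":User"),
    ((WILDCARD, "schema:worksFor", FOCUS), ":Company"),
]
fixed_map = resolve(query, graph)
```

In this sketch :carol is selected by both of the first two patterns but appears only once in `fixed_map`, which mirrors the rule that a shape map has no duplicate pairs.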
Query shape maps are not the only way to select focus nodes. For instance, it would make
sense to associate a shape with a service endpoint. The Linked Data Platform [93] defines a
notion of container which handles requests to get, create, modify and delete objects with a given
structure. While it does not specify a mechanism to publish that structure or validate incoming
data against it, earlier work at OSLC used Resource Shapes for that purpose. It is reasonable to
assume that protocols like the Linked Data Platform will exploit shapes technology, perhaps with
the added precision of using HTTP Link headers to specify a node of interest, which would be
associated with the shape related to that interface.
4.9.3 RESULT SHAPE MAPS
The product of validation is a result shape map which is annotated with errors encountered while
testing the conformance of each node/shape pair. The result shape map is again an extension of
the fixed map. Each nodeSelector/shapeLabel association in the result shape map may include
any of these three additional components:
• result: either conformant or nonconformant;
• reason: a human-readable report, usually to explain a non-conformant result; or
• appInfo: a machine readable structure.
Engines vary in how they report errors, and they may add extra information to the resulting
shape map. Some implementations extend this to include machine-readable failure messages in
case of errors or recursive proof of conformance in case of success.
Example 4.62 Full validation process
Given the following ShEx schema:
:User {
  schema:name xsd:string ;
  schema:knows @:User *
}
After applying the validation process, the result shape map obtained would be:
:alice@:User,
:bob@:User,
:carol@:User
Figure 4.11 depicts a whole validation process with the different shape maps involved.
[Figure 4.11: Full validation process with query, fixed, and result shape map. The inputs are the ShEx schema for :User, the query shape map {FOCUS schema:knows _}@:User, and an RDF graph.]
If we were to individually validate :alice and :bob, we would validate :bob twice, once while
validating :alice’s schema:knows arc and once for the explicit call to validate :bob.
:Event {
  schema:startDate xsd:dateTime %js:{ let start = o %} ;
  schema:endDate xsd:dateTime %js:{ let end = o %} ;
}
The following example checks that the declared area of a rectangle is effectively its width
times height.
prefix js: <http://shex.io/extensions/javascript>

:Rectangle {
  :height xsd:float %js:{ let height = o %} ;
  :width xsd:float %js:{ let width = o %} ;
  :area xsd:float %js:{ o == height * width %}
}
Semantic actions have been employed to transform RDF files to other formats like XML
or JSON [80], or even other ShEx schemas as performed by the Map extension.4
The test suite defines a single extension language called Test5 that can fail a validation
and/or return a message.
:TeacherBefore EXTRA a {
  a [ :Teacher ]? ;
  schema:name xsd:string ;
  :teaches @:Course *
}

:TeacherAfter EXTRA a {
  a [ :Teacher ] ;
  a [ :Person ] ;
  schema:name xsd:string ;
  :teaches { a [ :Course ] } @:Course
}

:Course {
  a [ :Course ]?
}
If we validate the following RDF data before applying inference, nodes :bob and :carol do
not conform to shape :TeacherAfter.
:alice a :Teacher , :Person ;   # ✓ Passes as :TeacherBefore
  schema:name "Alice" ;         # ✓ Passes as :TeacherAfter
  :teaches :algebra .

:algebra a :Course .
:teaches rdfs:domain :Teacher .
:teaches rdfs:range :Course .
:Teacher rdfs:subClassOf :Person .
4.12. IMPORTING SCHEMAS 113
On the other hand, if we validate the previous RDF graph after applying RDF Schema
inference, both :bob and :carol should conform to :TeacherAfter.
This combination of shapes before and after inference can be used to check the behavior
of a reasoner. For example, if in the previous case, a faulty RDFS reasoner does not infer that
:logic must have rdf:type :Course, :bob would not conform to :TeacherAfter and the bug could be
detected.
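The idea of using shapes to test a reasoner can be sketched without a real RDFS engine. The Python below is a toy illustration: `rdfs_closure` applies only the rdfs:domain, rdfs:range, and rdfs:subClassOf rules to a set of string triples, and the `apply_range` flag simulates the faulty reasoner mentioned above. The function names, the flag, and the triple `:bob :teaches :logic` are assumptions made for the example; the check on :teaches is a stand-in for the corresponding part of :TeacherAfter, not a ShEx validator.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def rdfs_closure(triples: Set[Triple], apply_range: bool = True) -> Set[Triple]:
    """Apply three RDFS rules until no new triple is inferred."""
    graph = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, RDF_TYPE, o2))       # subject gets the domain class
                if apply_range and p2 == "rdfs:range" and s2 == p:
                    new.add((o, RDF_TYPE, o2))       # object gets the range class
                if p == RDF_TYPE and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, RDF_TYPE, o2))       # instance of the superclass
        if new <= graph:
            return graph
        graph |= new


def teaches_only_courses(graph: Set[Triple], teacher: str) -> bool:
    """Stand-in for the :teaches constraint of :TeacherAfter."""
    taught = [o for s, p, o in graph if s == teacher and p == ":teaches"]
    return all((course, RDF_TYPE, ":Course") in graph for course in taught)


data = {
    (":bob", ":teaches", ":logic"),          # hypothetical data for :bob
    (":teaches", "rdfs:domain", ":Teacher"),
    (":teaches", "rdfs:range", ":Course"),
    (":Teacher", "rdfs:subClassOf", ":Person"),
}
good = rdfs_closure(data)
faulty = rdfs_closure(data, apply_range=False)   # simulates the buggy reasoner
```

Here `teaches_only_courses(good, ":bob")` holds, while the same check on `faulty` fails because :logic never receives the type :Course, which is the symptom the shape is meant to expose.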
:Employee {
  &:name ;
  schema:worksFor <CompanyShape>
}

:Company {
  schema:employee @:Employee ;
  schema:founder @:Person ;
}
Example 4.67
The following ShEx schema
PREFIX : <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

:User IRI {
  schema:name xsd:string ;
  schema:knows @:User *
}
<> a sx:Schema ;
  sx:shapes :User .

:User a sx:ShapeAnd ;
  sx:shapeExprs (
    [ a sx:NodeConstraint ;
      sx:nodeKind sx:iri ]
    [ a sx:Shape ;
      sx:expression [
        a sx:EachOf ;
        sx:expressions (
          [ a sx:TripleConstraint ;
            sx:predicate schema:name ;
            sx:valueExpr [
              a sx:NodeConstraint ;
              sx:datatype xsd:string
            ]
          ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:knows ;
            sx:valueExpr :User ;
            sx:min 0 ;
            sx:max -1
          ] )
      ] ] ) .
6 Note that a value of -1 in max means unbounded.
4.14 SUMMARY
In this chapter we learned about the ShEx language.
• ShEx was designed as a human-readable language for RDF description and validation.
• There are two syntaxes for ShEx: a compact syntax and an RDF-based one.
• Shape Expressions can be combined using the logical operators: AND, OR, and NOT on
top of triple expressions.
• Triple expressions declare the topology of the neighborhood of a node (incoming and
outgoing edges).
• Well-founded semantics of shape schemas (which are the basis of ShEx): I. Boneva, J. E.
Labra Gayo, and E. Prud’hommeaux. Semantics and validation of shapes schemas for
RDF. In International Semantic Web Conference, 2017.
https://labra.github.io/pdf/2017_SemanticsValidationShapesSchemas.pdf
CHAPTER 5
SHACL
Shapes Constraint Language (SHACL) has been developed by the W3C RDF Data Shapes
Working Group, which was chartered in 2014 with the goal to “produce a language for defining
structural constraints on RDF graphs [6].”
The first public working draft was published in October 2015 and it was proposed as a
W3C Recommendation in June 2017.1
SHACL was influenced by SPIN, and some parts from OSLC resource shapes and ShEx.
At the beginning of the Working Group activity it was considered that SHACL was going to
be an integration of all the validation approaches into a unified language. However, due to core
differences, SHACL and ShEx did not converge. Chapter 7 contains a comparison of both
languages and describes the main differences.
SHACL is divided into two parts. The first part, called SHACL Core, describes a core
RDF vocabulary to define common shapes and constraints, while the second part describes an
extension mechanism in terms of SPARQL and is called SHACL-SPARQL.
Two working group notes have been published to extend SHACL with (a) advanced fea-
tures such as rules and complex expressions2 and (b) to enable the definition of constraint com-
ponents in Javascript (called SHACL-Javascript).3
A W3C SHACL community group4 has been created to continue working on SHACL
preparing educational contents and supporting SHACL adoption. A working group note was
also suggested for a SHACL Compact Syntax5 but it was decided to postpone it for the W3C
community group.
SHACL groups the information and constraints that apply to data nodes into some constructs
called shapes. SHACL shapes differ from ShEx shapes in the sense that they also contain in-
formation about the target nodes or set of nodes to which they can be applied.
1 https://www.w3.org/TR/shacl
2 https://www.w3.org/TR/shacl-af/
3 https://www.w3.org/TR/shacl-js/
4 https://www.w3.org/community/shacl/
5 https://w3c.github.io/data-shapes/shacl-compact-syntax/
120 5. SHACL
The syntax of SHACL is defined in terms of RDF so we will use Turtle in this book
although it is possible to employ other RDF serialization formats such as JSON-LD or
RDF/XML.
• They must have exactly one property schema:gender whose value must be either schema:Male
or schema:Female or any xsd:string literal (lines 9–17).
• They have zero or one schema:birthDate property whose datatype must be xsd:date (lines
18–22).
• They have zero or more schema:knows properties whose nodes must be IRIs and have type
:User (lines 23–27).
1 :UserShape a sh:NodeShape ;
2 sh:targetClass :User ;
3 sh:property [ # Blank node 1
4 sh:path schema:name ;
5 sh:minCount 1;
6 sh:maxCount 1;
7 sh:datatype xsd:string ;
8 ] ;
9 sh:property [ # Blank node 2
10 sh:path schema:gender ;
11 sh:minCount 1;
12 sh:maxCount 1;
13 sh:or (
14 [ sh:in ( schema:Male schema:Female ) ]
15 [ sh:datatype xsd:string ]
16 )
17 ] ;
18 sh:property [ # Blank node 3
19 sh:path schema:birthDate ;
20 sh:maxCount 1;
21 sh:datatype xsd:date ;
22 ] ;
23 sh:property [ # Blank node 4
24 sh:path schema:knows ;
25 sh:nodeKind sh:IRI ;
26 sh:class :User ;
27 ] .
6 The example differs in the avoidance of recursion for SHACL. See Section 5.12.1.
SHACL defines shapes as a conjunction of constraints that nodes must satisfy. A SHACL
processor checks each of the constraints and returns validation errors for every constraint that is
not satisfied.
When no error is reported, it is assumed that the RDF graph has been validated.
:bob a :User ;   # ✓ Passes as :UserShape
  schema:gender schema:Male ;
  schema:name "Robert" ;
  schema:birthDate "1980-03-10"^^xsd:date .
When an RDF graph conforms to a shapes graph, SHACL processors return a validation
report with no errors. The validation report contains the declaration:
[ a sh:ValidationReport ;
  sh:conforms true
] .
• :dave has a value for property schema:gender other than schema:Male, schema:Female, or a string
(the allowed values).
• :dave has value 1980 for property schema:birthDate which is of datatype integer when it should
be of datatype xsd:date.
• :dave has value :grace for property schema:knows which is not an instance of :User.
• :emily has 2 values for property schema:name when the maximum count is 1.
• _:x fails because the value of schema:knows is a blank node and must be an IRI.
When an RDF graph does not conform to a shapes graph, SHACL processors return a
validation report that contains several errors. Section 5.5 describes the validation report struc-
ture.
• TopQuadrant has an open source implementation in Java (using the Apache Jena Library)
called TopBraid SHACL API7 . It implements SHACL Core, SHACL-SPARQL, and
SHACL rules (see 5.19) and also offers a command line tool. TopQuadrant is the company
behind TopBraid Composer, which is a commercial interactive development environment
for semantic web and linked data applications. TopBraid Composer (including the free
edition) includes a version of the API for RDF validation.
7 https://github.com/TopQuadrant/shacl
5.2. SHACL IMPLEMENTATIONS 123
• SHACL Playground,8 an online SHACL demo implemented in Javascript by TopQuad-
rant.
• SHACLex9 implements SHACL Core (it also implements ShEx). It has been written in
Scala based on a simple and generic RDF Library (currently it works on top of Apache
Jena library but there are plans to use other libraries). SHACLex can be used to deploy an
online validator service and an online demo is deployed in Heroku.10
• Netage SHACL Engine13 implemented in Java (using the Jena Library) by Nicky van
Oorschot. It has support for SHACL-SPARQL.
• RDFUnit.15 A test driven data-debugging framework that runs test cases against RDF
data and records any violations in structured form. Besides its SPARQL-based constraint
definition language, RDFUnit supports rule translation from multiple formats, i.e., OWL
under closed world semantics, OSLC, and DSP. At the time of this writing, RDFUnit
supports a large part of SHACL-Core and SHACL-SPARQL.16 One of the future
plans for RDFUnit is to support ShEx through the SHACLex implementation.
• ELI Validator, by the ELI (European Legislation Identifier) Initiative18 which is based on
the TopBraid SHACL API.
• SHACL for rdf4j19 (formerly Sesame) developed as a Google Summer of Code 2017
project.
8 http://shacl.org/playground/
9 http://labra.github.io/shaclex/
10 http://shaclex.herokuapp.com/
11 http://ns.inria.fr/sparql-template
12 http://corese.inria.fr/
13 http://www.netage.nl
14 https://github.com/linkeddata/shacl-check
15 http://aksw.org/Projects/RDFUnit.html
16 https://github.com/AKSW/RDFUnit/issues/62
17 https://github.com/pfps/shacl
18 http://labs.sparna.fr/eli-validator/
19 https://github.com/eclipse/rdf4j
5.3 BASIC DEFINITIONS: SHAPES GRAPHS, NODE, AND
PROPERTY SHAPES
A SHACL processor has two inputs: a data graph that contains the RDF data to validate and a
shapes graph that contains the shapes. Example 5.1 contains a shapes graph and Examples 5.2
and 5.3 contain two possible RDF data graphs. It is possible to use a single graph that contains
both the data and shapes graph merged.
There are two main types of shapes: node shapes and property shapes. Node shapes declare
constraints directly on a node. Property shapes declare constraints on the values associated with
a node through a path.
Property shapes have a property sh:path that declares the path that goes from the focus
node to the value that they describe. The most frequent paths are predicate paths which are
formed by a single IRI.
A node shape usually contains several property shapes which are declared through the
sh:property predicate.
Example 5.1 contained four such property shape declarations. The first one was defined
as:
1 :UserShape ...
2 sh:property [ # Blank node 1
3 sh:path schema:name ;
4 sh:minCount 1;
5 sh:maxCount 1;
6 sh:datatype xsd:string ;
7 ] ;
8 ...
Which means that nodes that conform to :UserShape must also conform to the property
shape identified by blank node 1. The path of that property shape (line 3) is the predicate
schema:name which is, in this case, a single IRI. The property shape contains several components
that declare that exactly one value can be accessed through that path (a minimum and a
maximum of one, lines 4 and 5) and that it must belong to the xsd:string datatype (line 6).
Notice that in Example 5.1 we used blank nodes for property shapes and enumerated them
from 1 to 4 because we will refer to them when we describe the validation report in the next section.
Although using blank nodes may be more readable, sometimes, it may be better to declare an
IRI for the property shapes so they can be referenced from other shapes graphs when they are
imported (see the next section).
:TeacherShape a sh:NodeShape ;
  sh:targetClass :Teacher ;
  sh:node :UserShape ;
  sh:property [
    sh:path :teaches ;
    sh:minCount 1 ;
    sh:datatype xsd:string ;
  ]
.
A SHACL processor validates that :alice conforms to :TeacherShape, and :bob to :UserShape
but reports that :carol does not conform to :TeacherShape.
1 :report a sh:ValidationReport ;
2 sh:conforms true .
If the data graph does not conform to the shapes graph, the validation report will have a
value false for the property sh:conforms and a set of validation errors of type sh:ValidationResult
linked by the property sh:result.
Each validation result contains metadata about the cause of the error such as sh:focusNode,
sh:value, sh:resultPath, etc. Table 5.1 describes the properties of validation results.
5.5. VALIDATION REPORT 127
Table 5.1: SHACL validation result properties
Property                       Description
sh:focusNode                   The focus node that was being validated when the error happened.
sh:resultPath                  The path from the focus node. This property is optional and usually corresponds to the sh:path declaration of property shapes.
sh:value                       The value that violated the constraint, when available.
sh:sourceShape                 The shape that the focus node was validated against when the constraint was violated.
sh:sourceConstraintComponent   The IRI that identifies the component that caused the violation.
sh:detail                      May point to further details about the cause of the error. This property can be used for reporting errors in nested shapes.
sh:resultMessage               Textual details about the error. This message can be affected by the sh:message property (see Section 5.6.4).
sh:resultSeverity              A value which is equal to the sh:severity value of the shape that caused the violation error. If the shape doesn’t have a sh:severity declaration then the default value will be sh:Violation.
Example 5.6
The validation report generated by a SHACL processor when validating the data graph from
Example 5.3 against the shapes graph in Example 5.1 could be:
:report a sh:ValidationReport ;
   sh:conforms false ;
   sh:result
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:InConstraintComponent ;
     sh:sourceShape ... ; # blank node 2
     sh:focusNode :dave ;
     sh:value :Unknown ;
     sh:resultPath schema:gender ;
     sh:resultMessage "Value has none of the shapes from the or list" ],
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:DatatypeConstraintComponent ;
     sh:sourceShape ... ; # blank node 3
     sh:focusNode :dave ;
     sh:value 1980 ;
     sh:resultPath schema:birthDate ;
     sh:resultMessage "Value does not have datatype xsd:date" ],
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:ClassConstraintComponent ;
     sh:sourceShape ... ; # blank node 4
     sh:focusNode :dave ;
     sh:value :grace ;
     sh:resultPath schema:knows ;
     sh:resultMessage "Value is not an instance of User" ],
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:MaxCountConstraintComponent ;
     sh:sourceShape ... ; # blank node 1
     sh:focusNode :emily ;
     sh:resultPath schema:name ;
     sh:resultMessage "More than 1 values" ],
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
     sh:sourceShape ... ; # blank node 1
     sh:focusNode :frank ;
     sh:resultPath schema:name ;
     sh:resultMessage "Less than 1 values" ],
   [ a sh:ValidationResult ;
     sh:resultSeverity sh:Violation ;
     sh:sourceConstraintComponent sh:NodeKindConstraintComponent ;
     sh:sourceShape :UserShape ;
     sh:focusNode _:x ;
     sh:value _:x ;
     sh:resultMessage "Value does not have node kind sh:IRI" ]
.
Although in the rest of this chapter we will describe the different errors in natural language
for simplicity, the validation results returned by SHACL processors will have the structure above.
5.6 SHAPES
There are two types of shapes in SHACL: node shapes and property shapes. Node shapes specify
constraints about a node while property shapes specify constraints about the values that can be
reached from a node by a path.
[Figure: Shape, which carries the target declarations, and its two kinds, NodeShape and PropertyShape]
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path [ sh:alternativePath ( schema:knows schema:follows ) ] ;
      sh:nodeKind sh:IRI ;
      sh:minCount 1
   ] ;
   sh:property [
      sh:path ( [ sh:oneOrMorePath schema:knows ] schema:email ) ;
      sh:nodeKind sh:IRI
   ] .
A SHACL processor verifies that :alice conforms to shape :UserShape because it has
schema:email with an IRI value and all the nodes that can be reached by the property schema:knows
one or more times followed by the property schema:email (which is equivalent to schema:knows+/
schema:email using SPARQL notation) are also IRIs.
The SHACL processor would return an error for :dave because one of the values of
schema:knows has a schema:email that is not an IRI (:emily).
declares a node shape :UserShape with two constraints which are associated with the following
constraint components:
• sh:NodeKindConstraintComponent with the value sh:IRI for the parameter sh:nodeKind. The
constraint means that nodes that conform to :UserShape must be IRIs; and
• sh:ClassConstraintComponent with the value schema:Person for the parameter sh:class. The
constraint means that nodes conforming to :UserShape must be instances of schema:Person.
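A node shape with the two constraints just listed could be sketched as follows. The listing is reconstructed from the description above, and the prefix bindings are assumptions.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .

:UserShape a sh:NodeShape ;
   sh:nodeKind sh:IRI ;          # sh:NodeKindConstraintComponent
   sh:class schema:Person .      # sh:ClassConstraintComponent
```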
When a constraint component declares a single parameter, the parameter may be used
several times in the same shape. Each value of the parameter declares a different constraint. The
interpretation of such declarations is conjunctive, i.e., all constraints apply.
Example 5.10 Shape with two constraints with the same parameter
The following code:
:UserShape a sh:NodeShape ;
   sh:class foaf:Person ;
   sh:class schema:Person .
declares two constraints with the parameter sh:class, which means that nodes conforming
to :UserShape must be instances of both foaf:Person and schema:Person.
Constraint components are associated with validators which define the behavior of the
constraint.
Table 5.3: SHACL core constraint components
SHACL Core contains a list of built-in constraint components that are classified in
Table 5.3. In the table, we included the parameter names because they are shorter than the
component IRIs. Those components will be described in more detail in their corresponding sections
later in this chapter.
As we will show in Section 5.16, SHACL-SPARQL can be used to declare other
constraint components.
And we define a shapes graph that imports the previous shapes, adds a declaration for
:TeacherShape, and deactivates the property shape :HasEmail:
<> owl:imports <http://example.org/UserShapes> .

:TeacherShape a sh:NodeShape ;
   sh:targetClass :Teacher ;
   sh:node :UserShape ;
   sh:property [
      sh:path :teaches ;
      sh:minCount 1 ;
      sh:datatype xsd:string ;
   ] .

:HasEmail sh:deactivated true .   # deactivates the imported property shape
The merged shapes graph deactivates the property shape :HasEmail so nodes that conform
to :TeacherShape need to conform to :UserShape but do not need to have a schema:email property.
Given the following RDF data:
:alice a :Teacher ;                          # Passes as :TeacherShape
   schema:name "Alice" ;
   schema:email <mailto:alice@example.org> ;
   :teaches "Logic" .

:bob a :Teacher ;                            # Passes as :TeacherShape
   schema:name "Robert" ;
   schema:email "This email is not an IRI" ;
   :teaches "Algebra" .
A SHACL processor checks that :alice and :bob conform to :TeacherShape even if :bob does
not conform to the :HasEmail shape. It returns the following error:
• :carol does not conform to :TeacherShape because it does not conform to :UserShape as the
value of property schema:name does not have datatype xsd:string.
5.7 TARGET DECLARATIONS
SHACL shapes may define several target declarations. Target declarations specify the set of
nodes that will be validated against a shape. Table 5.4 contains the different target declarations
defined in SHACL core.
SHACL targets provide the same functionality as ShEx shape maps (see Section 4.9). We
discuss the core differences in Section 7.4.
Table 5.4: SHACL target declarations
Value                  Description
sh:targetNode          Directly point to a node
sh:targetClass         All nodes that are instances of some class
sh:targetSubjectsOf    All nodes that are subjects of some predicate
sh:targetObjectsOf     All nodes that are objects of some predicate
:UserShape a sh:NodeShape ;
   sh:targetNode :alice, :bob, :carol ;
   sh:property [
      sh:path schema:name ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] .
A SHACL processor checks that :alice conforms to :UserShape and returns the errors:
• :bob does not have a value for property schema:name.
• :carol has a value which is not an xsd:string for property schema:name.
Notice that it ignores :dave as it was not affected by the sh:targetNode declaration.
sh:targetNode provides functionality similar to the ShEx fixed shape map (see Section 4.9.1).
The difference is that SHACL silently ignores target nodes that are missing from
the data graph, while ShEx reports a failure. Depending on the data and constraint
modeling approach, silently ignoring missing nodes may lead to false positives, so target nodes
should be used with caution.
A SHACL validator checks that both :alice and :emily conform to :UserShape and returns
the following errors:
Implicit target class declarations conflate the concepts of shape and class into a single entity.
This can be a dangerous practice in the open Semantic Web as they are different concepts (see
Section 3.2). However, associating shape constraints with classes can also be very convenient, and
the Data Shapes Working Group decided to support it.
In this book, we opt to separate shapes and classes, using the following pattern:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   ...
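To make the contrast concrete, the following sketch places an implicit class target next to the explicit pattern used in this book. The schema:name constraint and the prefix bindings are assumptions chosen only for illustration.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

# Implicit class target: :Person is at the same time a class and a shape,
# so its instances are validated without any sh:targetClass declaration.
:Person a rdfs:Class, sh:NodeShape ;
   sh:property [ sh:path schema:name ; sh:minCount 1 ] .

# Explicit pattern: the shape :UserShape and the class :User stay separate.
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [ sh:path schema:name ; sh:minCount 1 ] .
```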
The system checks that :alice has shape :UserShape and signals the error:
• :bob does not have property schema:name.
The system checks that :alice has shape :UserShape and signals the error:
The system ignores :carol as it is not the object of the :isTaughtBy property.
5.8 CARDINALITY
Cardinality constraint components specify restrictions on the minimum and maximum number
of distinct value nodes. Table 5.5 defines the cardinality constraint component parameters in
SHACL. The default cardinality in SHACL for property shapes is {0,unbounded}.
Operation      Description
sh:minCount    Restricts the minimum number of value nodes.
               If not defined, there is no restriction (no minimum).
sh:maxCount    Restricts the maximum number of value nodes.
               If not defined, there is no restriction (unbounded).
• :bob has fewer than two values for the property schema:follows; and
• :carol has more than three values for the property schema:follows.
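Errors of this kind would be produced by a shape along the following lines. This is a sketch only: the shape, the target declaration, the data, and the prefix bindings are assumptions chosen to match the two errors above.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .

:UserShape a sh:NodeShape ;
   sh:targetNode :alice, :bob, :carol ;
   sh:property [
      sh:path schema:follows ;
      sh:minCount 2 ;      # at least two distinct value nodes
      sh:maxCount 3        # at most three distinct value nodes
   ] .

:alice schema:follows :bob, :carol .                  # two values: conforms
:bob   schema:follows :alice .                        # one value: too few
:carol schema:follows :alice, :bob, :dave, :emily .   # four values: too many
```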
5.9.1 DATATYPES
sh:datatype specifies the datatype that each value node must have.
Table 5.6: Constraints on values
Operation      Description
sh:datatype    Specifies that the values must be literals with some datatype.
sh:class       Specifies that values must be SHACL instances of some class.
sh:nodeKind    Possible values: sh:BlankNode, sh:IRI, sh:Literal, sh:BlankNodeOrIRI,
               sh:BlankNodeOrLiteral, sh:IRIOrLiteral.
sh:in          Enumerates the value nodes that a property is allowed to have.
sh:hasValue    A node must have a given value.
Remember that all literals in the RDF data model have an associated datatype (see
Section 2.2). Plain string literals have xsd:string datatype by default.
SHACL contains a list of built-in datatypes that are based on XML Schema datatypes
(which are the same as in SPARQL 1.1). For those datatypes SHACL processors also check
that the lexical form conforms to the datatype rules. This means that something like
"Unknown"^^xsd:date is not a well-typed literal because "Unknown" does not conform to the xsd:date rules.
A SHACL processor validates that :alice has shape :User and returns the following errors:
• :bob has a value for path schema:birthDate that is not an xsd:date (it is an integer);
• :carol has a value for path schema:name that is not an xsd:string (it is an IRI); and
:aCompany a :Organization .
:aUniversity a :University .
:University rdfs:subClassOf :Organization .
A SHACL processor verifies that :alice and :bob conform to shape :User and returns the
following error:
• :carol has the value :Unknown for property schema:worksFor which is not a SHACL instance
of :Organization.
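The role of rdfs:subClassOf in this result can be seen in the following sketch. The shape, its target declaration, the schema:worksFor triples, and the prefix bindings are assumptions reconstructed from the error message; only the three class declarations at the end appear in the listing above.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .

:User a sh:NodeShape ;
   sh:targetNode :alice, :bob, :carol ;
   sh:property [
      sh:path schema:worksFor ;
      sh:class :Organization
   ] .

:alice schema:worksFor :aCompany .      # direct instance of :Organization
:bob   schema:worksFor :aUniversity .   # instance through rdfs:subClassOf
:carol schema:worksFor :Unknown .       # no rdf:type at all: violation

:aCompany    a :Organization .
:aUniversity a :University .
:University  rdfs:subClassOf :Organization .
```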
A SHACL processor verifies that :alice and :dave conform to shape :UserShape and returns
the following errors:
• :bob has a value that is not a literal for property schema:name.
• :carol has a value that is not a blank node or IRI for property schema:follows.
A SHACL processor verifies that :alice conforms to :UserShape and returns the following
errors:
• :bob has a value for schema:gender that is not in the list (schema:Male schema:Female) because
schema:Male is not equal to schema:male.
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:affiliation ;
      sh:hasValue :OurCompany
   ] .
A SHACL processor verifies that :alice conforms to :UserShape and returns the following
errors:
• :bob does not have value :OurCompany for property schema:affiliation; and
verifies that :alice and :carol conform to shape :User and reports the errors:
• :strange has a blank node as the value for property schema:name whose length can't be
calculated.
In the case of :carol, notice that the example depends on the length of the prefixed name
:Carol, which is calculated after concatenating the IRI associated with the empty prefix :
to Carol. In this case, if : is associated with http://example.org/, the processor will evaluate the
length of http://example.org/Carol (which is 24) and fail because it is longer than 20.
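A minimal sketch of this situation is shown below. The limit of 20 comes from the text above; the shape name, the target class, and the two data nodes are assumptions for illustration.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .

:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:maxLength 20
   ] .

:carol a :User ;
   schema:name :Carol .    # the IRI http://example.org/Carol has 24 characters: fails

:strange a :User ;
   schema:name _:unnamed . # a blank node has no string form to measure: fails
```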
:bus a :Product ;
   schema:productID "p567" .   # Passes as :Product

:truck a :Product ;
   schema:productID "P12" .    # Fails as :Product

:bike a :Product ;
   schema:productID "B123" .   # Fails as :Product
A SHACL processor verifies that :car and :bus conform to :Product and returns the following
errors:
• :bike has a value for schema:productID that does not start with P or p.
:ProductShape a sh:NodeShape ;
   sh:targetClass :Product ;
   sh:property [
      sh:path rdfs:label ;
      sh:languageIn ("es" "en" "fr")
   ] .

:CountryShape a sh:NodeShape ;
   sh:targetClass :Country ;
   sh:property [
      sh:path skos:prefLabel ;
      sh:uniqueLang true
   ] .
• Node :usa has more than one value in English for property skos:prefLabel.
In the previous example, a node without skos:prefLabel (e.g., :italy) also conforms to
:CountryShape.
:usa a :Country ;                      # Fails as :CountryShape
   skos:prefLabel "USA"@en,
                  "United States"@en .
In this case, :italy fails because it has no skos:prefLabel, :france fails because it has one
value that is not in English or Spanish, and :usa fails because it has more than one value in
English.
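A shape that yields exactly these three failures could combine the language constraints with a minimum cardinality. The following is a sketch reconstructed from the description above; the :spain node and the prefix bindings are assumptions.

```turtle
@prefix :     <http://example.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:CountryShape a sh:NodeShape ;
   sh:targetClass :Country ;
   sh:property [
      sh:path skos:prefLabel ;
      sh:minCount 1 ;                 # a label is now mandatory
      sh:languageIn ("en" "es") ;     # only English or Spanish labels
      sh:uniqueLang true              # at most one label per language
   ] .

:spain a :Country ;                   # conforms: one label per language
   skos:prefLabel "Spain"@en, "España"@es .
```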
5.11 LOGICAL CONSTRAINTS: AND, OR, NOT, XONE
The operators sh:and, sh:or, sh:xone, and sh:not can be used to form complex constraints.
Their semantics is described in Table 5.8. sh:and, sh:or, and sh:not have the traditional
meaning of the corresponding Boolean operators while sh:xone (exactly one) is similar to the
exclusive-or when applied to two arguments. When applied to more than 2 arguments, the
former requires exactly one, while the latter requires an odd number of arguments to be satisfied.
Operation    Description
sh:and       sh:and (S1 ... SN) specifies that each value node must conform to all the
             shapes S1 ... SN.
sh:or        sh:or (S1 ... SN) specifies that each value node must conform to at least one
             of the shapes S1 ... SN.
sh:not       sh:not S specifies that each value node must not conform to S.
sh:xone      sh:xone (S1 ... SN) specifies that each value node must conform to exactly one
             of the shapes S1 ... SN.
5.11.1 AND
A node conforms to a shape containing the sh:and operator if it conforms to all the shapes linked
by it.
The following example declares a :User shape as the conjunction of two property shapes.
In case of complex expressions, using sh:and may improve readability. One example is using
sh:and to extend one shape with other constraints.
:bob a :User ;                          # Fails as :User
   schema:name "Robert Smith" ;         # long name
   schema:email <bob@example.org> .

:dave a :Student ;                      # Passes as :Person, :User and :Student
   schema:name "Dave" ;
   schema:email <carol@example.org> ;
   :course :algebra .
5.11.2 OR
The parameter sh:or declares a disjunction between several shapes.
:bob a :User ;
   foaf:name "Robert" .       # Passes as :User

:carol a :User ;
   foaf:name "Carol" ;        # Passes as :User
   schema:name "Carol" .

:dave a :User ;
   rdfs:label "Dave" .        # Fails as :User
A SHACL processor checks that :alice, :bob, and :carol conform to :UserShape but returns
an error on :dave.
For this particular example, the use of sh:or could be replaced by a property shape with an
sh:alternativePath:
:UserShape a sh:NodeShape ;
   sh:property [
      sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
      sh:minCount 1 ;
   ] .
Example 5.36 Union of datatypes
A common use case of sh:or is to declare the union of several datatypes. The following
example declares that products must have an rdfs:label which must be either an xsd:string or a
language-tagged literal, and must have a release date that must be either an xsd:date, an xsd:gYear,
or one of the strings "unknown-past" or "unknown-future".
:ProductShape a sh:NodeShape ;
   sh:targetClass :Product ;
   sh:property [
      sh:path rdfs:label ;
      sh:or (
         [ sh:datatype xsd:string ]
         [ sh:datatype rdf:langString ]
      ) ;
      sh:minCount 1 ;
      sh:maxCount 1
   ] ;
   sh:property [
      sh:path schema:releaseDate ;
      sh:or (
         [ sh:datatype xsd:date ]
         [ sh:datatype xsd:gYear ]
         [ sh:in ("unknown-past" "unknown-future") ]
      ) ;
      sh:minCount 1 ;
      sh:maxCount 1
   ] ;
.
A SHACL processor checks that :p1 and :p2 conform to :ProductShape but returns an error
on :p3.
5.11.3 EXACTLY ONE
A node conforms to a shape containing the sh:xone operator if it conforms to exactly one of the
shapes linked by it.
The semantics of sh:xone is different from Exclusive OR (XOR) when there are more than 2
arguments. XOR is usually defined as requiring conformance of an odd number of arguments,
while sh:xone requires conformance of exactly one.
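The difference only shows up with three or more alternatives, as in the following sketch. The :ContactShape shape, the :Contact class, and the data are hypothetical and serve only to illustrate the point.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .

:ContactShape a sh:NodeShape ;
   sh:targetClass :Contact ;
   sh:xone (
      [ sh:property [ sh:path schema:email     ; sh:minCount 1 ] ]
      [ sh:property [ sh:path schema:telephone ; sh:minCount 1 ] ]
      [ sh:property [ sh:path schema:url       ; sh:minCount 1 ] ]
   ) .

# Conforms to exactly one alternative: passes.
:c1 a :Contact ;
   schema:email "c1@example.org" .

# Conforms to all three alternatives: fails with sh:xone, although a
# three-argument XOR would accept it because three is an odd number.
:c3 a :Contact ;
   schema:email "c3@example.org" ;
   schema:telephone "555-0100" ;
   schema:url "http://example.org/c3" .
```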
Given the previous shape declaration and the following RDF graph:
:alice a :User ;              # Passes as :User
   schema:name "Alice" .
A SHACL processor checks that :alice and :bob conform to :User but gives errors for
:carol and :dave.
The sh:xone constraint component only checks that exactly one of its arguments is satisfied.
When defining complex models, it must be used with caution as its behavior may not be
the intended one.
Note, however, that sh:xone does not reject everything we might expect it to:
:alice a :User ;                 # Passes as :UserShape
   schema:name "Alice" .

:dave a :User ;                  # Passes as :UserShape
   schema:name "Dave" ;          # But it should fail
   schema:familyName "King" .
In the case of :dave, it passes although the intended meaning is that it should fail (it
conforms to one of the branches but partially matches the other one).
The solution is to rewrite each alternative at the top level so that it explicitly excludes
the other ones, in which case sh:xone is not required and sh:or is enough. Note that
sh:maxCount 0 plays the role of negation.
The SHACL code equivalent to Example 4.30 is:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:or (
      [ a sh:NodeShape ;
        sh:property [
           sh:path schema:name ;
           sh:datatype xsd:string ;
           sh:minCount 1 ;
           sh:maxCount 1
        ] ;
        sh:property [
           sh:path schema:givenName ;
           sh:maxCount 0
        ] ;
        sh:property [
           sh:path schema:familyName ;
           sh:maxCount 0
        ] ;
      ]
      [ a sh:NodeShape ;
        sh:property [
           sh:path schema:name ;
           sh:maxCount 0 ;
        ] ;
        sh:property [
           sh:path schema:givenName ;
           sh:datatype xsd:string ;
           sh:minCount 1 ;
        ] ;
        sh:property [
           sh:path schema:familyName ;
           sh:datatype xsd:string ;
           sh:minCount 1 ;
           sh:maxCount 1
        ] ;
      ]
   ) .
With this definition the node :dave would now fail as expected. Note that this definition
can become quite verbose for more complex expressions (see Section 7.13 for a longer example).
5.11.4 NOT
The parameter sh:not specifies the condition that each node must not conform to a given shape.
:ProductShape a sh:NodeShape ;
   sh:property [
      sh:path schema:productID ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ] ;
   sh:or (
      [ sh:not [
           sh:property [
              sh:path rdf:type ;
              sh:hasValue schema:Vehicle
           ] ]
      ]
      [ sh:property [
           sh:path schema:vehicleEngine ;
           sh:minCount 1 ; sh:maxCount 1
        ] ;
        sh:property [
           sh:path schema:fuelType ;
           sh:minCount 1 ; sh:maxCount 1
        ] ;
      ]
   ) .
A SHACL processor checks that :p1 and :p2 conform to :ProductShape but signals an error
for :p3.
IF-THEN-ELSE pattern In the same way as before, an IF-THEN-ELSE can also be
declared. Remember that IF A THEN B ELSE C is equivalent to (IF A THEN B) AND (IF NOT A THEN C).
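Since IF A THEN B can be written as (NOT A) OR B, and IF NOT A THEN C as A OR C, the whole pattern becomes a conjunction of two disjunctions. The following sketch shows that encoding; the helper shapes :IsVehicle, :HasEngine, and :HasCategory, as well as the choice of conditions, are hypothetical, and no target declaration is given.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .

:ProductShape a sh:NodeShape ;
   sh:and (
      # IF A THEN B, written as (NOT A) OR B
      [ sh:or ( [ sh:not :IsVehicle ] :HasEngine ) ]
      # IF NOT A THEN C, written as A OR C
      [ sh:or ( :IsVehicle :HasCategory ) ]
   ) .

# A: the node is declared to be a vehicle
:IsVehicle a sh:NodeShape ;
   sh:property [ sh:path rdf:type ; sh:hasValue schema:Vehicle ] .

# B: the node has an engine
:HasEngine a sh:NodeShape ;
   sh:property [ sh:path schema:vehicleEngine ; sh:minCount 1 ] .

# C: the node has a category
:HasCategory a sh:NodeShape ;
   sh:property [ sh:path schema:category ; sh:minCount 1 ] .
```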
With the following data, nodes :kitt and :c23 conform to :Product each one passing one
of the branches, while :bad1 and :bad2 do not conform.
:kitt a schema:Vehicle ;           # Passes as :Product
   schema:vehicleEngine :x42 ;
   schema:fuelType :electric .
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:worksFor ;
      sh:node :CompanyShape ;    # values must conform to :CompanyShape, declared below
   ] .

:CompanyShape a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
   ] .
:CompanyShape a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
   ]
.
:OurCompany
   schema:name "OurCompany" .

:Another
   schema:name 23 .
• :bob does not conform to shape :UserShape because the value of schema:worksFor (:Another)
has 23 as schema:name which does not have datatype xsd:string.
• :carol does not conform to shape :UserShape because it does not have a name.
[Figure: the recursive :User shape, whose schema:knows values refer back to :User]
Given the following data, :alice and :bob conform to :User while :carol and :dave do not
conform. :dave fails because the value of schema:name is not an xsd:string and :carol fails because
the value of schema:knows does not conform to :User.
:alice schema:name "Alice" ;
   schema:birthDate "1995-06-03"^^xsd:date ;
   schema:knows :bob .

:dave schema:name 23 .
The behavior of a SHACL processor with the :User shape is undefined because it is recursive: :User is
defined in terms of itself. Validation with recursive shapes is left undefined by SHACL, so it depends
on the processor implementation. Some processors may support it while others may produce an error.
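A directly recursive shape of this kind could look like the following sketch, reconstructed from the description of :User (a name, an optional birth date, and schema:knows values that are again users). The exact constraints and the prefix bindings are assumptions.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

:User a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ; sh:maxCount 1
   ] ;
   sh:property [
      sh:path schema:birthDate ;
      sh:datatype xsd:date ;
      sh:maxCount 1
   ] ;
   sh:property [
      sh:path schema:knows ;
      sh:node :User        # recursive reference: behavior is processor-dependent
   ] .
```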
Sometimes recursion appears indirectly when one shape refers to other shapes that refer
to others, and eventually, one of the shapes refers to the first one.
A direct representation of the data model could be the following SHACL shapes graph:
# Undefined shapes graph because :User and :Company
# refer to each other recursively
:User a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:path schema:worksFor ;
      sh:node :Company ;
   ] .

:Company a sh:NodeShape ;
   sh:property [
      sh:path schema:legalName ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
      sh:datatype xsd:string ;
   ] ;
   sh:property [
      sh:path schema:employee ;
      sh:minCount 1 ;
      sh:node :User ;
   ] .
The previous shapes are mutually recursive and again, the behavior of SHACL processors
is undefined.
Avoiding recursion using target declarations Target declarations can be used to avoid
recursion by directly selecting which nodes we want to validate.
:User a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:path schema:birthDate ;
      sh:datatype xsd:date ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:path schema:knows ;
      sh:class :User ;
   ] .
• It does not return a violation for :carol because our only requirement is that the value of
schema:knows is an instance of :User and :dave is declared to be an instance of :User (although
it does not validate).
This approach has the advantage that it not only finds instances of class :User, but also
instances of subclasses of :User. For example, if we declare:
:grace a :Teacher ;          # Passes as :User
   schema:name "Grace" ;
   schema:knows :heidi .
The system would check that both :grace and :heidi conform to the :User shape.
Being able to validate future subclasses of a given class may be helpful if there are some
unexpected changes in the hierarchy. Nevertheless, it also has the problem of requiring a
discriminating rdf:type declaration for every instance, which may not always be possible.
Another possibility is to use other target declarations such as sh:targetSubjectsOf or
sh:targetObjectsOf.
:User a sh:NodeShape ;
   sh:targetSubjectsOf schema:worksFor ;
   sh:targetObjectsOf schema:employee ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
   ]
.

:Company a sh:NodeShape ;
   sh:targetSubjectsOf schema:employee ;
   sh:targetObjectsOf schema:worksFor ;
   sh:property [
      sh:path schema:legalName ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1
   ] ;
   sh:property [
      sh:path schema:employee ;
   ] .
Simulating recursion with property paths SHACL property paths can be used to simulate
recursion in some cases. The idea is to combine sh:zeroOrMorePath with an auxiliary shape that
defines the structure of the expected shape without recursion. The recursion is implicitly defined
by the property path.
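For the :User model, where the recursion goes through schema:knows, the main shape could be sketched as follows. This is an assumption based on the description above rather than the original listing; :UserStructure is the non-recursive auxiliary shape, and no target declaration is given.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .

:User a sh:NodeShape ;
   sh:property [
      # schema:knows* reaches the focus node itself (zero steps) and every
      # node connected to it by a chain of schema:knows triples
      sh:path [ sh:zeroOrMorePath schema:knows ] ;
      sh:node :UserStructure
   ] .
```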
:UserStructure a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:path schema:birthDate ;
      sh:datatype xsd:date ;
      sh:maxCount 1 ;
   ]
.
Where :UserStructure is a non-recursive auxiliary shape that defines the structure of nodes
conforming to :User. Figure 5.4 depicts the new model.
[Figure 5.4: the auxiliary shape :UserStructure, with schema:name of type xsd:string and an optional schema:birthDate of type xsd:date]
• It returns violation errors for :carol, :dave and :emily. In this case, :carol fails to validate as
expected.
Indirect recursion is trickier to simulate because it is difficult to determine the property path
that can be used.
[Figure: shapes :User and :Company connected by schema:worksFor and schema:employee, with the auxiliary shapes :UserStructure and :CompanyStructure reached through paths such as (schema:employee/schema:worksFor)*]
:User a sh:NodeShape ;
   sh:property [
      sh:path [ sh:zeroOrMorePath ( schema:worksFor schema:employee ) ] ;
      sh:node :UserStructure
   ] ;
   sh:property [
      sh:path schema:worksFor ;
      sh:node :CompanyStructure
   ] .

:UserStructure a sh:NodeShape ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ]
.

:Company a sh:NodeShape ;
   sh:property [
      sh:path [ sh:zeroOrMorePath ( schema:employee schema:worksFor ) ] ;
      sh:node :CompanyStructure
   ] ;
   sh:property [
      sh:path schema:employee ;
      sh:node :UserStructure
   ] .

:CompanyStructure a sh:NodeShape ;
   sh:property [
      sh:path schema:legalName ;
      sh:datatype xsd:string ;
      sh:minCount 1 ; sh:maxCount 1
   ] .
The previous solution does not scale well for more involved data models where cycles can
appear by different means. As an exercise, the reader can try to simulate the cyclic data model
depicted in Figure 4.9 or the WebIndex data model from Figure 6.1.
In the shapes graph of Example 5.50, there is no constraint that declares that the biological
parents must not be male or female at the same time. Using the following data:
:oscar a :User ;          # Passes as :UserShape
   schema:parent :x .

:x :isMale true ;
   :isFemale true .
Node :oscar conforms to :UserShape, which seems counterintuitive as it has a single parent
that satisfies both being female and male at the same time. There are two solutions. The first
is to add the earlier declaration of sh:minCount 2 and sh:maxCount 2 for property schema:parent.
In this way, :oscar would not conform because it has only one parent. The second is to
declare that the qualified value shapes are disjoint as follows.
Qualified value shapes have an optional Boolean parameter
sh:qualifiedValueShapesDisjoint. If it is true, then the value nodes must not conform to
any of the sibling shapes. The default value is false.
Using this parameter, we could add the constraint that nodes that satisfy the female constraint
are disjoint from nodes that satisfy the male constraint in the case of biological parents (see footnote 24).
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:property [
      sh:path schema:parent ;
      sh:qualifiedValueShape [
         sh:path :isMale ;
         sh:hasValue true
      ] ;
      sh:qualifiedMinCount 1 ;
      sh:qualifiedMaxCount 1 ;
      sh:qualifiedValueShapesDisjoint true
   ] ;
   sh:property [
      sh:path schema:parent ;
      sh:qualifiedValueShape [
         sh:path :isFemale ;
         sh:hasValue true
      ] ;
      sh:qualifiedMinCount 1 ;
      sh:qualifiedMaxCount 1 ;
      sh:qualifiedValueShapesDisjoint true
   ] .
24 Notice that forcing this condition may not be desirable in some contexts.
Parameter               Description
sh:closed               Valid resources must only have values for properties that appear as
                        values of sh:path in property shapes.
sh:ignoredProperties    List of predicates that are also permitted in addition to those that
                        are explicitly enumerated.
A SHACL processor will check that both :alice and :bob conform to :UserShape but will
return the error:
• :carol does not conform to :UserShape because it has an extra property schema:cookTime which
is not allowed.
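A closed shape that produces this kind of error could be sketched as follows. The set of allowed properties and the data are assumptions; only the offending property schema:cookTime is taken from the text above.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix schema: <http://schema.org/> .

:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:closed true ;
   sh:ignoredProperties ( rdf:type ) ;       # rdf:type is needed for the target
   sh:property [ sh:path schema:name ] ;     # allowed
   sh:property [ sh:path schema:email ] .    # allowed

:alice a :User ;
   schema:name "Alice" .                     # conforms

:carol a :User ;
   schema:name "Carol" ;
   schema:cookTime "PT30M" .                 # extra property: violation
```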
Note that sh:closed does not take into account SHACL property paths or constraints with
sh:node, sh:and, sh:or , etc.
A SHACL processor:
• Checks that :alice conforms to :User.
• Fails for nodes :bob and :carol because they use properties schema:knows and schema:worksFor
in a closed shape.
A solution is to add those predicates to the list of ignored properties:
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:closed true ;
   sh:ignoredProperties ( rdf:type
                          schema:knows
                          schema:worksFor ) ;
   sh:property [
      sh:path schema:name ;
      sh:datatype xsd:string ;
   ] ;
   sh:property [
      sh:path [ sh:zeroOrOnePath schema:knows ] ;
      sh:nodeKind sh:IRI ;
   ] ;
   sh:node [
      sh:property [
         sh:path schema:worksFor ;
         sh:nodeKind sh:IRI ;
      ] ] .
A good practice when using sh:closed is to enumerate all relevant properties as direct values
of sh:path, or to add them to the sh:ignoredProperties list.
5.14 PROPERTY PAIR CONSTRAINTS
Property pair constraints specify conditions in relation to other properties. These constraint
components can only be used in property shapes. Table 5.10 lists the parameters that can be used
to declare property pair constraints. All the parameters behave similarly: they compare
pairs of values of the current and referenced property on the current focus node and check the
condition.
Table 5.10: Property pair constraints
Operation              Description
sh:equals              The sets of values from both properties at a given focus node must be equal.
sh:disjoint            The sets of values from both properties at a given focus node must be
                       disjoint, i.e., they must not share any value.
sh:lessThan            Current values must be smaller than the values of another property.
sh:lessThanOrEquals    Current values must be smaller than or equal to the values of another
                       property.
A SHACL processor checks that :alice conforms to :UserShape and returns the following
errors.
• :bob has a different value for foaf:firstName and schema:givenName.
• :carol has the same value for schema:givenName and schema:lastName when they should be
different.
A SHACL processor checks that :concert1 conforms to :ConcertShape and reports the fol-
lowing.
• The value of schema:doorTime must be less than or equal to the value of schema:startDate in
:concert2.
• The value of schema:startDate must be less than the value of schema:endDate in :concert2.
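A shape producing these two results could be sketched as follows. The target class :Concert, the date values, and the prefix bindings are assumptions; only the property names and the two comparisons come from the errors above.

```turtle
@prefix :       <http://example.org/> .
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

:ConcertShape a sh:NodeShape ;
   sh:targetClass :Concert ;
   sh:property [
      sh:path schema:doorTime ;
      sh:lessThanOrEquals schema:startDate    # doors open no later than the start
   ] ;
   sh:property [
      sh:path schema:startDate ;
      sh:lessThan schema:endDate              # the start precedes the end
   ] .

:concert2 a :Concert ;                        # violates both constraints
   schema:doorTime  "2018-04-20T21:30:00"^^xsd:dateTime ;
   schema:startDate "2018-04-20T21:00:00"^^xsd:dateTime ;
   schema:endDate   "2018-04-20T20:00:00"^^xsd:dateTime .
```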
Operation          Description
sh:name            Specifies human-readable labels for a property shape.
sh:description     Specifies a description of a property shape.
sh:order           Indicates the relative order of a property shape in a form. A typical use
                   case is to display the property shapes sorted according to the values of
                   sh:order. The values must be decimals.
sh:group           Groups several property shapes together. Each group may have additional
                   triples for different purposes like rdfs:label for form building. Groups
                   can also have a sh:order value.
sh:defaultValue    Describes the default value for a property. This value may be used by
                   form builders to pre-populate input fields.
:nameGroup a sh:PropertyGroup ;
   rdfs:label "Name" .

:addressGroup a sh:PropertyGroup ;
   rdfs:label "Address" .
An application could generate a web form like the one in Figure 5.6.
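[Figure 5.6: Generated web form with two groups. "Name" contains the fields "Given name" and
"Family name"; "Address" contains "Street address" and "Country" (pre-filled with "Spain").]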
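As a rough illustration of how a form builder might use sh:group, sh:order, sh:name, and sh:defaultValue, the following Python sketch sorts property shapes and prints a text form. The dictionary layout and the numeric order values are assumptions made for this example; they are not taken from the shapes graph above.

```python
# Hypothetical property shapes, already extracted from a shapes graph.
property_shapes = [
    {"group": "Address", "group_order": 1, "order": 1, "name": "Country", "default": "Spain"},
    {"group": "Name", "group_order": 0, "order": 1, "name": "Family name", "default": ""},
    {"group": "Address", "group_order": 1, "order": 0, "name": "Street address", "default": ""},
    {"group": "Name", "group_order": 0, "order": 0, "name": "Given name", "default": ""},
]

def render_form(shapes):
    """Return form lines: groups sorted by their order, fields by sh:order."""
    lines = []
    current_group = None
    for shape in sorted(shapes, key=lambda s: (s["group_order"], s["order"])):
        if shape["group"] != current_group:
            current_group = shape["group"]
            lines.append(current_group)          # group heading, e.g. rdfs:label
        # sh:defaultValue pre-populates the field when present.
        lines.append(f"  {shape['name']}: {shape['default']}".rstrip())
    return lines

print("\n".join(render_form(property_shapes)))
```

None of these properties affects validation; they only guide how the shape is presented.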
:UserShape a sh:NodeShape ;
   sh:targetClass :User ;
   sh:sparql [
      a sh:SPARQLConstraint ;
      sh:message "schema:name must equal schema:givenName + schema:familyName" ;
      sh:prefixes [
         sh:declare [
            sh:prefix "schema" ;
            sh:namespace "http://schema.org/"^^xsd:anyURI ;
         ]
      ] ;
      sh:select
         """SELECT $this (schema:name AS ?path) (?name AS ?value)
            WHERE {
               $this schema:name ?name .
               $this schema:givenName ?givenName .
               $this schema:familyName ?familyName .
               # ?value is only bound by the projection, so the filter tests ?name
               FILTER (!isLiteral(?name) ||
                       !isLiteral(?givenName) ||
                       !isLiteral(?familyName) ||
                       concat(str(?givenName), ' ', str(?familyName)) != ?name
               )
            }""" ;
   ] .
25 http://W3C.github.io/data-shapes/shacl-js/
A SHACL processor checks that :alice conforms to :UserShape and returns the error:
• :bob does not conform to :UserShape because values of schema:name must be equal to the
concatenation of schema:givenName and schema:familyName.
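The condition tested by the SPARQL constraint can be paraphrased in Python. This is only a sketch: nodes are plain dictionaries, the sample data is hypothetical, and every node is assumed to have exactly one value for each of the three properties.

```python
def name_violations(users):
    """Return the nodes whose name is not 'givenName familyName'."""
    bad = []
    for node, props in users.items():
        expected = f"{props['givenName']} {props['familyName']}"
        if props["name"] != expected:
            bad.append(node)
    return bad

# Hypothetical data; :bob's name does not match the concatenation.
users = {
    ":alice": {"name": "Alice Cooper", "givenName": "Alice", "familyName": "Cooper"},
    ":bob": {"name": "Bob Smith", "givenName": "Robert", "familyName": "Smith"},
}

print(name_violations(users))  # [':bob']
```

As in the SPARQL query, a node is reported when the check fails, so the function returns violations rather than conforming nodes.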
• sh:parameter associates a parameter declaration with the constraint component. The dec-
laration has a value for sh:path that must be an IRI and may have a Boolean value for
sh:optional (if not present, it is assumed false by default).
The local name of the IRI associated by sh:path will be taken as the local name for the
parameter. For example, in the following parameter declaration:
sh:parameter [
   sh:path :listOfLength ;
   sh:optional true ;
]
The local name of the parameter is listOfLength and is used as a SPARQL (or JavaScript)
variable that is prebound to the component parameter value.
• sh:labelTemplate can be used to specify how the constraint will be rendered. The value is a
string that can contain references to parameter names inside curly brackets. For example:
"Checks the list has {?listOfLength} values".
:FixedListConstraintComponent
   a sh:ConstraintComponent ;
   rdfs:label "Fixed list constraint component" ;
   sh:parameter [
      sh:path :size ;
      sh:name "Size of list" ;
      sh:description "The size of the list" ;
   ] ;
   sh:labelTemplate "Size of values: \"{$size}\"" ;
   sh:propertyValidator [
      a sh:SPARQLSelectValidator ;
      sh:message "{$PATH} must have length {?size}, not {?count}" ;
      sh:prefixes [ sh:declare [
            sh:prefix "rdf" ;
            sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#"^^xsd:anyURI
         ]
      ] ;
      sh:select """
         SELECT $this ?value ?count WHERE {
            $this $PATH ?value .
            { { SELECT $this ?value (COUNT(?member) AS ?count) $size WHERE {
                   ?value rdf:rest*/rdf:first ?member
                } GROUP BY $this ?value $size
              }
              FILTER (!isBlank(?value) || ?count != $size)
            }
         }"""
   ] .
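Both sh:labelTemplate and sh:message in this component contain placeholders in curly brackets. A minimal Python sketch of the substitution step follows; the function name and the choice to leave unknown placeholders untouched are assumptions of this example, not behavior prescribed by SHACL.

```python
import re

def render_label(template, params):
    """Replace {?name} or {$name} references with the parameter values."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown references untouched rather than guessing a value.
        return str(params.get(name, match.group(0)))
    return re.sub(r"\{[?$](\w+)\}", substitute, template)

print(render_label("Checks the list has {?listOfLength} values", {"listOfLength": 3}))
# Checks the list has 3 values
```

The same helper handles both the `?` and `$` variable markers.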
A SHACL processor would validate that :p1 conforms to :Product but would report the
following errors.
• For :p2 the error message ":color must have length 3, not 4".
• For :p3 the error message ":color must have length 3, not 0".
• For :p4 the message that sh:minCount failed because there are no values for property :color.
Notice that the following example, although similar in effect, is not a valid shape definition:
:ProductShape a sh:NodeShape ;
   sh:targetObjectsOf :color ;
   :size 3 .
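The core of the fixed list component is counting the members of an RDF list. The following Python sketch walks the rdf:first/rdf:rest cells of a list held in a dictionary; the blank node labels and the color values are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_length(graph, head):
    """Count the members of an RDF list by following rdf:rest until rdf:nil."""
    count, node, seen = 0, head, set()
    while node != RDF_NIL and node not in seen:   # 'seen' guards against cyclic lists
        seen.add(node)
        count += 1
        node = graph[node][RDF_REST]
    return count

# Hypothetical list with three members, as a product color might be encoded.
graph = {
    "_:c1": {RDF_FIRST: 255, RDF_REST: "_:c2"},
    "_:c2": {RDF_FIRST: 0, RDF_REST: "_:c3"},
    "_:c3": {RDF_FIRST: 0, RDF_REST: RDF_NIL},
}

size = 3
print(list_length(graph, "_:c1"))          # 3
print(list_length(graph, "_:c1") == size)  # True
```

A value conforms when the count equals the :size parameter, which is the comparison the SPARQL validator performs with `?count != $size`.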
IRI                                          Name
http://www.w3.org/ns/entailment/RDF          RDF
http://www.w3.org/ns/entailment/RDFS         RDF Schema
http://www.w3.org/ns/entailment/OWL-Direct   OWL 2 direct semantics
5.17. SHACL AND INFERENCE SYSTEMS 189
Example 5.58 Example with entailment
The following shapes graph declares a :Teacher shape as someone that has property :teaches
with a value that is an instance of :Course and has rdf:type with value :Person. It also requires RDF
Schema entailment.
:algebra a :Course .
• :alice conforms to :Teacher with or without RDFS entailment, because it has rdf:type
:Person and it :teaches :algebra, and :algebra has rdf:type :Course.
• :bob only conforms if RDF Schema entailment is performed, because the entailment infers
that :bob has rdf:type :Person and that :logic has rdf:type :Course. Without RDF Schema
entailment it fails.
190 5. SHACL
• :carol conforms to :Teacher even without RDF Schema entailment activated and even if it
does not have rdf:type :Person. The reason is that it is a SHACL instance of :Person (see
Section 5.7.2).
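The notion of SHACL instance used for :carol can be sketched in Python: a node is a SHACL instance of a class if one of its rdf:type values is that class or a transitive rdfs:subClassOf descendant of it. The :Student class and the dictionaries below are hypothetical, chosen only to make the example run.

```python
def is_shacl_instance(types, subclass_of, node, cls):
    """True if node has rdf:type cls, or a type that is a transitive subclass of cls."""
    pending = list(types.get(node, ()))
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            # Follow rdfs:subClassOf links upward.
            pending.extend(subclass_of.get(current, ()))
    return False

# Hypothetical data: :carol is typed with a subclass of :Person.
types = {":carol": {":Student"}}
subclass_of = {":Student": {":Person"}}

print(is_shacl_instance(types, subclass_of, ":carol", ":Person"))  # True
print(is_shacl_instance(types, subclass_of, ":carol", ":Course"))  # False
```

No general RDFS reasoning is involved: only rdf:type and rdfs:subClassOf triples present in the data graph are followed.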
Although SHACL does not require inference, it has a special treatment for the properties
rdfs:subClassOf, rdf:type and owl:imports.
The operator -> declares a sh:targetClass and the dot operator . separates the different
constraint components.
• Node expressions: a set of predefined functions that can be used to compute values from
focus nodes, e.g., to compute a display label for an IRI.
A SHACL processor with SPARQL-based target support checks that :alice conforms to
shape :AlgebraTeacher and signals the error.
• :bob does not have value :Mathematics for :field property.
• :bob is ignored although it does not have :field property, because it is not selected by the
SPARQL target.
:User a sh:NodeShape ;
   sh:targetClass :User ;
   sh:rule [
      a sh:TripleRule ;
      sh:subject sh:this ;
      sh:predicate rdf:type ;
      sh:object :Teacher ;
      sh:condition [
         sh:property [
            sh:path :teaches ;
            sh:minCount 1 ;
         ] ;
      ] ;
   ] .
:bob a :User ;
   :teaches :logic .

:carol a :User ;
   :attends :algebra .
• :carol does not get an inferred triple because it does not have a value for :teaches.
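The effect of the triple rule can be approximated in a few lines of Python. The sketch hard-codes the rule (target class :User, condition "at least one :teaches value", inferred type :Teacher) instead of reading it from a shapes graph, and it represents the data graph as a set of tuples.

```python
def apply_teacher_rule(graph):
    """Infer (node, rdf:type, :Teacher) for every :User with a :teaches value."""
    inferred = set()
    for s, p, o in graph:
        if p == "rdf:type" and o == ":User":
            # Condition of the rule: sh:minCount 1 on the path :teaches.
            if any(s2 == s and p2 == ":teaches" for s2, p2, _ in graph):
                inferred.add((s, "rdf:type", ":Teacher"))
    return inferred

graph = {
    (":bob", "rdf:type", ":User"), (":bob", ":teaches", ":logic"),
    (":carol", "rdf:type", ":User"), (":carol", ":attends", ":algebra"),
}

print(apply_teacher_rule(graph))  # {(':bob', 'rdf:type', ':Teacher')}
```

The function returns only the new triples, leaving the decision of whether to add them to the data graph to the caller.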
5.20 SHACL JAVASCRIPT
SHACL JavaScript (SHACL-JS)28 was published as a Working Group Note to enable the
definition of constraint components in JavaScript. It is also intended to express advanced
features like custom targets, functions, and rules in JavaScript.
SHACL-JS is similar to SHACL-SPARQL but uses JavaScript instead of SPARQL. The
basic idea is that shapes can point to JavaScript functions available at some URL that can be
resolved from the Web. When shapes get evaluated, a SHACL-JS engine calls those functions
and constructs validation results from the results obtained by these calls. The JavaScript code can
access the RDF triples available in the data and shapes graphs through a JavaScript API.
Note that at the time of this writing, there are no implementation reports for SHACL-JS
(the following code is speculative).
28 https://www.w3.org/TR/shacl-js/
A SHACL-JS processor:
• checks that :JuryCommittee conforms to :VotingCommittee; and
• returns a violation error for :CityCommittee with the message "Number of voters must be odd
to avoid ties."
5.21 SUMMARY
• SHACL is divided into two parts: SHACL Core and SHACL-SPARQL.
• Shapes in SHACL contain target declarations, which declare the sets of nodes to which
they apply.
• There are two types of shapes: node shapes and property shapes.
• Shapes contain a list of parameters of constraint components.
• SHACL-SPARQL allows users to define their own constraint components.
• Some SHACL extensions have already been proposed, like SHACL rules and SHACL
JavaScript.
Applications
In this chapter we describe several applications of RDF validation. We start with the WebIndex,
a medium-size linked data portal that was one of the earliest applications of ShEx. We describe
it using ShEx and SHACL so the reader can see how both formalisms can be applied to describe
RDF data.
In Section 6.2, we present the use of ShEx in HL7 FHIR, which was one of the main
motivations for the development of ShEx.
Section 6.3 describes Springer Nature SciGraph, a real-world application of SHACL.
Section 6.4 talks about validation use cases that have emerged in the DBpedia project.
We end the chapter with two exercises: the validation of ShEx files, encoded as RDF using
ShEx itself (Section 6.5), and the validation of SHACL shapes graphs in RDF using SHACL
(Section 6.6). These exercises help us understand the expressiveness of both formalisms.
[Figure: WebIndex data model (class diagram). Recoverable content:
 :Observation: rdf:type = qb:Observation, wf:Observation; cex:value : xsd:float;
   dct:issued : xsd:dateTime; rdfs:label : xsd:string; cex:ref-year : xsd:gYear;
   dct:publisher = wf:WebFoundation ?; wf:source : IRI
 :Country: wf:iso2 : xsd:string; rdfs:label : xsd:string
 :Indicator: rdf:type : cex:Primary|cex:Secondary; rdfs:label : xsd:string
 :Computation: rdf:type : cex:Computation
 Arc labels: qb:observation, qb:dataSet, cex:indicator, cex:ref-area, wf:provider,
   cex:computation; one arc carries the cardinality 1..n]
The main concept is an observation of type wf:Observation which has a float value cex:value
for a given indicator, as well as the country, year, and dataset. Observations can be raw observa-
tions, which are obtained from an external source, or computed observations, which are obtained
from other observations by computational processes.
A dataset contains a number of slices, each of which also contains a number of observa-
tions. Indicators are provided by an organization of type org:Organization, which is based on the
Organization ontology.
Datasets are also published by organizations.
A sample from the DITU dataset provided by ITU (International Telecommunication
Union) states that, in 2011, Spain had a value of 23.78 for the TU-B (Broadband subscribers per
100 population) indicator. This information is represented in Turtle as:
:obs8165
   a qb:Observation, wf:Observation ;
   rdfs:label "ITU B in ESP" ;
   dct:issued "2013-05-30T09:15:00"^^xsd:dateTime ;
   cex:indicator :ITU_B ;
   qb:dataSet :DITU ;
   cex:value "23.78"^^xsd:float ;
   cex:ref-area :Spain ;
   cex:ref-year "2011"^^xsd:gYear ;
   cex:computation :comp234 .
Data following the WebIndex data model is richly interrelated. Observations are linked
to indicators and to datasets. Datasets contain links to slices. Slices have links both to indicators
and back to observations. Both datasets and indicators are linked to the organizations by which
they are published or made available. Such links are illustrated in the following example:
:DITU a qb:DataSet ;
   qb:structure wf:DSD ;
   rdfs:label "ITU Dataset" ;
   dct:publisher :ITU ;
   qb:slice :ITU09B ,
            :ITU10B,
            ...
:ITU09B a qb:Slice ;
   qb:sliceStructure wf:sliceByArea ;
   qb:observation :obs8165,
                  :obs8166,
                  ...
:ITU a org:Organization ;
   rdfs:label "ITU" ;
   foaf:homepage <http://www.itu.int/> .

:Spain
   wf:iso2 "ES" ;
   rdfs:label "Spain" .

:ITU_B a wf:SecondaryIndicator ;
   rdfs:label "Broadband subscribers %" ;
   wf:provider :ITU .
For verification, the WebIndex data model includes a representation of computations that
declare how each observation has been obtained, either from a raw dataset or computed from
the observations of other datasets. The structure of computation descriptions, presented in [56],
is omitted here for simplicity.
In the next section we formally define the structure of this simplified WebIndex data
model using ShEx and review the main differences with the original.
In this example, we deliberately omitted the requirement for a rdf:type declaration. This
means that, in order to satisfy the :Country shape, a node need only have the properties that have
been specified and may or may not include rdf:type declarations.
By default, shape definitions are open, meaning that additional triples with different
predicates may be present, so nodes of shape :Country could have other properties beyond those
prescribed by the shape.
The shape of datasets is described as follows:
:DataSet { a [ qb:DataSet ] ;
           qb:structure [ wf:DSD ] ;
           rdfs:label xsd:string ? ;
           qb:slice @:Slice + ;
           dct:publisher @:Organization
}
This says that nodes conforming to :DataSet shape must have rdf:type with value qb:DataSet, a
qb:structure of wf:DSD, an optional rdfs:label of type xsd:string, one or more qb:slice predicates
whose object is the subject of a set of triples matching the :Slice shape definition and exactly
one dct:publisher, whose object is the subject of a set of triples matching the :Organization shape.
The :Slice shape is defined in a similar fashion:
:Slice { a [ qb:Slice ] ;
         qb:sliceStructure [ wf:sliceByYear ] ;
         qb:observation @:Observation + ;
         cex:indicator @:Indicator
}
The :Observation shape in the WebIndex data model has two rdf:type declarations, which
indicate that they must be instances of both the RDF Data Cube class of Observation
(qb:Observation) and the wf:Observation class from the Web Foundation ontology. The property
dct:publisher is optional, but if it appears, it must have value wf:WebFoundation.
Values conforming to :Observation shape can either have a wf:source property of type IRI
(which, in this context, is used to indicate that it is a raw observation that has been taken from
the source represented by the IRI), or a cex:computation property whose value conforms to the
:Computation shape.
It should be noted that shapes do not define the semantics of an RDF graph. While the
designers of the WebIndex dataset model have determined that a raw observation would be
indicated using the wf:source predicate and with the object IRI referencing the original source,
ShEx simply states that, in order for a subject to satisfy the :Observation shape, it must include
either a wf:source or a cex:computation predicate, period. Meaning must be found elsewhere.
:Observation {
   a [ qb:Observation ] ;
   a [ wf:Observation ] ;
   cex:value xsd:float ;
   dct:issued xsd:dateTime ;
   dct:publisher [ wf:WebFoundation ] ? ;
   qb:dataSet @:DataSet ;
   cex:ref-area @:Country ;
   cex:indicator @:Indicator ;
   cex:ref-year xsd:gYear ;
   ( wf:source IRI
   | cex:computation @:Computation
   )
}
In the case of organizations, we declare these as closed shapes using the CLOSED modifier
and only allow the properties rdfs:label, foaf:homepage and rdf:type, which must have the value
org:Organization. The EXTRA modifier is used to declare that we allow other values for the rdf:type
property (using the Turtle keyword a).
:Organization CLOSED EXTRA a {
   a [ org:Organization ] ;
   rdfs:label xsd:string ;
   foaf:homepage IRI
}
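The interplay of CLOSED and EXTRA can be sketched in Python for this particular shape. The sketch is an approximation under stated assumptions: a node is given as a list of (predicate, object) pairs, the datatype and node kind checks (xsd:string, IRI) are omitted, and the sample values are hypothetical.

```python
def conforms_to_organization(node_triples):
    """Check (predicate, object) pairs against the closed :Organization shape."""
    allowed = {"rdf:type", "rdfs:label", "foaf:homepage"}
    predicates = [p for p, _ in node_triples]
    # CLOSED: no predicate outside the shape may appear.
    if any(p not in allowed for p in predicates):
        return False
    # Default cardinality: exactly one rdfs:label and one foaf:homepage.
    if predicates.count("rdfs:label") != 1 or predicates.count("foaf:homepage") != 1:
        return False
    # EXTRA a: org:Organization is required, other rdf:type values are tolerated.
    types = [o for p, o in node_triples if p == "rdf:type"]
    return types.count("org:Organization") == 1

itu = [
    ("rdf:type", "org:Organization"),
    ("rdf:type", "foaf:Organization"),   # extra type, allowed by EXTRA a
    ("rdfs:label", "ITU"),
    ("foaf:homepage", "http://www.itu.int/"),
]

print(conforms_to_organization(itu))                               # True
print(conforms_to_organization(itu + [("dct:title", "ITU")]))      # False
```

Without EXTRA a, the second rdf:type value would make the node fail, because it does not match the only triple constraint declared for that predicate.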
Shape Expressions offer an intuitive way to describe the contents of linked data portals.
They have been used to document both the WebIndex1 and another data portal with a similar
model, the Landbook2 data portal. Their documentation defines templates for the different
shapes of resources and for the triples that can be retrieved when dereferencing those resources.
These templates define the dataset structure in a declarative way and can serve as a contract
between developers of the data portal contents and designers of the data model. Having a good
data model with a corresponding Shape Expressions specification facilitated the communication
between the various stakeholders involved.
1 http://weso.github.io/wiDoc
2 http://weso.github.io/landportalDoc/data
The data model described in this chapter differs from the original one, for readability and
didactic purposes, in the following ways:
• We omitted the representation of computations, which are represented as single nodes with
type cex:Computation. A more detailed description of computations is given in [56].
We have also simplified the representation of the webindex structure, which was composed
of sub-indexes, components and other properties such as labels and provenance informa-
tion.
• We defined the shapes of countries to include just two simple properties. We deliber-
ately omit the mandatory use of rdf:type declaration to show that it is possible to have
nodes without that declaration. In the original WebIndex data model all countries had a
mandatory rdf:type arc but there were several generated nodes which did not have rdf:type
declarations. As we omitted the representation of computations we decided to offer that
possibility for countries as an example.
Appendix A includes the full version of the WebIndex ShEx description used in this book.
As can be seen, the :Country shape is defined by two constraints which specify that the
datatype of rdfs:label and wf:iso2 properties must be xsd:string and that wf:iso2 has length 2.
The default cardinality in SHACL is [0..*], whereas in ShEx it is exactly one. Triple
constraints that carry no cardinality in ShEx must therefore be stated explicitly in SHACL as:
sh:minCount 1 ; sh:maxCount 1 ;
6.1. DESCRIBING A LINKED DATA PORTAL 201
Optionality (? or * in ShEx) can be represented either by omitting sh:minCount or by
setting sh:minCount to 0. An unbounded maximum cardinality (* or + in ShEx) must be
represented in SHACL by omitting sh:maxCount. As an example, the definition of the :DataSet
shape declares that rdfs:label is optional (by omitting the sh:minCount property) and declares
that there must be one or more qb:slice values conforming to the :Slice shape (by omitting
sh:maxCount).
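The mapping between ShEx cardinality markers and SHACL count constraints described above can be summarized as a small lookup table. The function name is hypothetical, and only the four markers discussed in the text are covered (explicit ranges such as {2,5} are left out).

```python
def shex_to_shacl_cardinality(marker=""):
    """Map a ShEx cardinality marker to the SHACL count constraints to emit."""
    table = {
        "":  {"sh:minCount": 1, "sh:maxCount": 1},  # ShEx default: exactly one
        "?": {"sh:maxCount": 1},                     # optional: sh:minCount omitted
        "+": {"sh:minCount": 1},                     # one or more: sh:maxCount omitted
        "*": {},                                     # same as the SHACL default [0..*]
    }
    return table[marker]

print(shex_to_shacl_cardinality())     # {'sh:minCount': 1, 'sh:maxCount': 1}
print(shex_to_shacl_cardinality("?"))  # {'sh:maxCount': 1}
print(shex_to_shacl_cardinality("+"))  # {'sh:minCount': 1}
```

An empty dictionary means that no count constraint needs to be written in the property shape.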
The predicate sh:node is used to indicate that the value of a property must have a given
shape. In this way, a shape can refer to another shape. Note that the WebIndex data model
contains cycles (shapes refer to other shapes, and those shapes can refer back to the first ones),
which can generate recursive shapes. However, the handling of recursion in SHACL is
implementation-dependent, so it is necessary to avoid recursive shape references by following
some of the techniques shown in Section 5.12.1.
:DataSet a sh:NodeShape ;
   sh:property [ sh:path rdf:type ;
      sh:hasValue qb:DataSet ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ] ;
   sh:property [ sh:path qb:structure ;
      sh:hasValue wf:DSD ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ] ;
   sh:property [ sh:path rdfs:label ;
      sh:datatype xsd:string ;
      sh:maxCount 1 ;
   ] ;
   sh:property [ sh:path qb:slice ;
      sh:node :Slice ;
      sh:minCount 1 ;
   ] ;
   sh:property [ sh:path dct:publisher ;
      sh:node :Organization ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ] .
The definition of :Slice is similar to :DataSet, so we can omit it for clarity. The full version
of the SHACL shapes that we used in this section is shown in appendix B.
There are three items that need more explanation in the SHACL definition of the
:Observation shape. The first of these is the repeated appearance of the rdf:type property with
two values. Although we initially represented it using qualified value shapes, we noticed that it
could also be represented as:
:Observation a sh:NodeShape ;
   sh:property [ sh:path rdf:type ;
      sh:in ( qb:Observation wf:Observation )
   ] ;
   sh:property [ sh:path rdf:type ;
      sh:minCount 2 ; sh:maxCount 2
   ] ;
   ...
The definition of observations also contains an optional property with a fixed value. This
was defined in ShEx as:
:Observation { ...
   dct:publisher [ wf:WebFoundation ] ?
   ...
}
which means that observations either have a dct:publisher property with the fixed value
wf:WebFoundation or do not have that property at all.
A possible representation in SHACL is to use an sh:or of two shapes: one in which there
is no dct:publisher (sh:maxCount 0) and one with exactly one value for dct:publisher, which
must be wf:WebFoundation.
:Observation ...
   sh:or ( [ sh:path dct:publisher ;
             sh:maxCount 0
           ]
           [ sh:path dct:publisher ;
             sh:hasValue wf:WebFoundation ;
             sh:minCount 1 ;
             sh:maxCount 1
           ]
         )
   ...
The last item requiring additional explanation is the disjunction definition which says that
observations must have either the property cex:computation with a value of shape :Computation
or the property wf:source with an IRI value, but not both. In ShEx, it was defined as:
:Observation { ...
   ; ( cex:computation @:Computation
     | wf:source IRI
     )
   ...
}
In SHACL, this declaration can be defined using the sh:xone (exactly one) property con-
straint:
:Observation
   ...
   sh:xone ( [ sh:path wf:source ;
               sh:nodeKind sh:IRI ;
               sh:minCount 1 ; sh:maxCount 1 ;
             ]
             [ sh:path cex:computation ;
               sh:node :Computation ;
               sh:minCount 1 ; sh:maxCount 1 ;
             ]
           )
   ...
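The "exactly one" semantics of sh:xone can be expressed compactly in Python. In this sketch shapes are modeled as predicate functions over a dictionary, and the two checks only test cardinality; the sh:nodeKind and sh:node conditions of the real shapes are deliberately left out, and the sample nodes are hypothetical.

```python
def xone(node, shapes):
    """sh:xone: the node must conform to exactly one of the given shapes."""
    return sum(1 for shape in shapes if shape(node)) == 1

def has_one_source(node):
    return len(node.get("wf:source", [])) == 1

def has_one_computation(node):
    return len(node.get("cex:computation", [])) == 1

shapes = [has_one_source, has_one_computation]

raw = {"wf:source": ["http://example.org/source"]}
both = {"wf:source": ["http://example.org/source"], "cex:computation": [":comp234"]}
neither = {}

print(xone(raw, shapes))      # True
print(xone(both, shapes))     # False
print(xone(neither, shapes))  # False
```

Replacing the `== 1` test with `>= 1` would give the semantics of sh:or, which would also accept observations that have both properties.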
In the case of indicators we can see again the separation between the :Indicator shape and
the wf:PrimaryIndicator and wf:SecondaryIndicator classes.
:Indicator a sh:NodeShape ;
   sh:property [ sh:path rdf:type ;
      sh:in ( wf:PrimaryIndicator wf:SecondaryIndicator ) ;
      sh:minCount 1 ; sh:maxCount 1 ;
   ] ;
   ...
We defined organizations as closed shapes with the possibility that the rdf:type prop-
erty has some extra values apart from the org:Organization. This constraint can be expressed in
SHACL as:
:Organization a sh:NodeShape ;
   sh:closed true ;
   sh:ignoredProperties ( rdf:type ) ;
   sh:property [ sh:path rdf:type ;
      sh:hasValue org:Organization ;
   ] ;
   ...
An important aspect that deserves some explanation is the use of recursion to represent
cyclic data models. While ShEx can define cyclic data models in a natural way, the lack of
recursion in SHACL needs to be circumvented.
One possibility is to add a discriminating rdf:type arc to every node so that its shape can
be associated to its class. We opted to add a sh:targetClass declaration to some shapes, such as
:Observation, conflating that shape with the class qb:Observation. Any node that contains a rdf:type
arc pointing to qb:Observation must conform to the :Observation shape declared by the WebIndex.
While this approach may be reasonable in closed contexts, it can cause problems in the
open semantic web if one combines data from other datasets. For example, we defined another
data model based on RDF data cube for the LandPortal project3 which also contained values
of type qb:Observation but with different structures. We consider that forcing every node of type
qb:Observation to have the same structure is not a good practice and that it may be better to
separate the target declarations from the shapes definitions.
3 http://landportal.info
6.2 DESCRIBING CLINICAL RECORDS—FHIR
Fast Healthcare Interoperability Resources (FHIR)4 is a framework created by HL7, a clinical
standards organization, to define data formats and APIs for exchanging electronic health
records. FHIR Release 3.0 was published in March 2017 and adds support for RDF.
FHIR has a resource-oriented architecture that describes the different entities in-
volved in a clinical record. In a typical example, a patient (Patient resource) visits a clinician
(Practitioner resource), who records some observations (Observation resource), reviews some lab
results (Diagnostic results probably referencing other observations) and diagnoses a clinical is-
sue (Condition resource). These resources can be expressed interchangeably in multiple formats:
JSON, XML, and RDF.
FHIR resources are described by structure definitions in a FHIR-specific schema lan-
guage. This machine-readable language is translated into format-specific schema languages such
as XML Schema plus Schematron, JSON Schema, and ShEx.
The structure of FHIR resources is documented as machine-generated HTML tables.
Figure 6.2 shows part of the FHIR Observation resource5.
FHIR structure definitions have two forms of limited disjunction. The first, choices of the
types of referenced resources, can be seen in subject and performer in Figure 6.2. The second is a
choice between a set of datatypes where the name of the datatype is appended to the property
name, indicated by the [x] notation (see effective[x] and value[x] in Figure 6.2). These are cap-
tured in ShEx using the shape expression ShapeOr (’OR’) and the triple expression OneOf (’|’)
respectively:
where Timing.repeat shape contains two parts: a structure definition (lines 1–24) and several
co-existence constraints (lines 25–35) which can be expressed in natural language as:
• If there is a duration, there needs to be durationUnits.
• If there’s a period, there needs to be periodUnits.
• duration shall be a non-negative value.
• period shall be a non-negative value.
• If there is a periodMax, there must be a period.
• If there is a durationMax, there must be a duration.
• If there is a countMax, there must be a count.
• If there is an offset, there must be a when (and not C, CM, CD, CV).
• If there is a timeOfDay, there cannot be a when, or vice versa.
The value set idiom of specifying a value type and a value set (e.g., <code> and
fhirvs:units-of-time) allows one to specify the structure and also to specify values within that
structure.
This example is long, but it is taken directly from a use case. In fact, its length encourages
us to do a bit of factoring. While we want to keep constraints on the codes for systolic and
diastolic, we can create a separate <valueObs> shape to capture the quantity measurement.
<valueObs> {
   fhir:Observation.component.valueQuantity {
      fhir:Quantity.value { fhir:value xsd:decimal };
      fhir:Quantity.unit { fhir:value ["mmHg"] };
   }
}
This schema has two repeated properties: fhir:Observation.component with different con-
straints (one for systolic and the other for diastolic). It takes advantage of ShEx’s intuitive ad-
ditive semantics where requirements for repeated properties are simply expressed as additional
triple patterns (see section 4.6.7).
8 http://www.springernature.com/scigraph
9 https://github.com/springernature/scigraph
   # Identity
   sh:property [
      sh:path sg:scigraphId ;
      sh:datatype xsd:string ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:path sg:doi ;
      sh:datatype xsd:string ;
      sh:pattern "^10\\.\\d{4,5}\\/\\S+$" ;
      sh:maxCount 1 ;
   ] ;
   # ...

   sh:property [
      sh:path sg:role ;
      sh:in ( "author" "editor" "principal investigator" ) ;
      sh:maxCount 1 ;
   ] ;
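The sh:pattern value for sg:doi can be tried out with Python's re module. This is a sketch under the assumption that Python's regular expression dialect behaves like the SPARQL REGEX function for this particular pattern; the doubled backslashes of the Turtle string become single ones, and the sample DOIs are made up.

```python
import re

# Same pattern as in the shape, without the Turtle string escaping.
DOI_PATTERN = re.compile(r"^10\.\d{4,5}\/\S+$")

def is_valid_doi(value):
    """True if the value matches the DOI pattern used by the shape."""
    return DOI_PATTERN.search(value) is not None

print(is_valid_doi("10.1007/978-3-319-12345-6"))  # True
print(is_valid_doi("doi:10.1007/abc"))            # False (prefix before "10.")
print(is_valid_doi("10.12/abc"))                  # False (registrant code too short)
```

The pattern requires the "10." prefix, a registrant code of four or five digits, a slash, and a suffix without whitespace.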
The defined quality checks cannot capture all possible errors in a link submission process.
However, they can (a) provide very useful feedback to the link submitter, and (b) enable
DBpedia to automatically pre-process some steps in the link generation pipeline.
• Each DBpedia class can have at most one direct super class.
• Each DBpedia property can have at most one direct super property.
• Each DBpedia property can have at most one rdfs:domain.
• Each DBpedia property can have at most one rdfs:range.
• The domain and range of each property must be defined as an owl:Class.
• Top-level DBpedia classes must be discussed before being defined.
These constraints are implemented with the following SHACL definitions. RDFUnit is
used to perform the validation as well as integrate with Travis CI and automate the checks on
each commit and pull request.
dbo-shape:ClassShape
   a sh:Shape ;
   sh:targetClass owl:Class ;
   sh:targetSubjectsOf rdfs:subClassOf ;
   sh:severity sh:Error ;
   sh:property [
      sh:message "Each owl:Class should have at least one rdfs:label" ;
      sh:path rdfs:label ;
      sh:minCount 1 ;
      sh:datatype rdf:langString ;
      sh:uniqueLang true ;
   ] ;
   sh:property [
      sh:message "Each owl:Class should have at least one rdfs:comment" ;
      sh:path rdfs:comment ;
      sh:minCount 1 ;
      sh:datatype rdf:langString ;
      sh:uniqueLang true ;
   ] ;
   sh:property [
      sh:message "Each owl:Class should have at most one superclass" ;
      sh:path rdfs:subClassOf ;
      sh:maxCount 1 ;
   ] ;
   sh:sparql [
      sh:message "DBpedia Ontology only allows 9 top level classes, any new top level classes need to be discussed" ;
      sh:severity sh:Warning ;
      sh:select """
         PREFIX owl: <http://www.w3.org/2002/07/owl#>
         PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
         SELECT DISTINCT $this ?otherClass
         WHERE {
            $this rdfs:subClassOf owl:Thing .
            FILTER ($this NOT IN (
               <http://dbpedia.org/ontology/Activity>,
               <http://dbpedia.org/ontology/Agent>,
               <http://dbpedia.org/ontology/Concept>,
               <http://dbpedia.org/ontology/CommunicationSystem>,
               <http://dbpedia.org/ontology/Condition>,
               <http://dbpedia.org/ontology/Event>,
               <http://dbpedia.org/ontology/PhysicalThing>,
               <http://dbpedia.org/ontology/Place>,
               <http://dbpedia.org/ontology/TimePeriod>)
            ) .
         } """ ;
   ] .

dbo-shape:PropertyShape
   a sh:Shape ;
   sh:targetClass rdf:Property ;
   sh:targetClass owl:DatatypeProperty ;
   sh:targetClass owl:ObjectProperty ;
   sh:targetSubjectsOf rdfs:subPropertyOf ;
   sh:property [
      sh:message "Each property should have at least one rdfs:label" ;
      sh:path rdfs:label ;
      sh:minCount 1 ;
      sh:datatype rdf:langString ;
      sh:uniqueLang true ;
   ] ;
   sh:property [
      sh:message "Each property should have at least one rdfs:comment" ;
      sh:path rdfs:comment ;
      sh:minCount 1 ;
      sh:datatype rdf:langString ;
      sh:uniqueLang true ;
   ] ;
   sh:property [
      sh:message "Each property should have at most one rdfs:domain" ;
      sh:path rdfs:domain ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:message "Each property should have an rdfs:domain that is defined as an owl:Class" ;
      sh:path rdfs:domain ;
      sh:class owl:Class ;
   ] ;
   sh:property [
      sh:message "Each property should have at most one rdfs:range" ;
      sh:path rdfs:range ;
      sh:maxCount 1 ;
   ] ;
   sh:property [
      sh:message "Each property should have an rdfs:range that is defined as an owl:Class" ;
      sh:path rdfs:range ;
      sh:class owl:Class ;
   ] ;
   sh:property [
      sh:message "Each property should have at most one super property" ;
      sh:path rdfs:subPropertyOf ;
      sh:maxCount 1 ;
   ] .
An interesting part of this use case is the use of SHACL-SPARQL to define the complex
constraint Top-level DBpedia classes must be discussed before being defined. Here, only nine
specific classes are allowed as top-level classes (i.e., classes with no superclass except
owl:Thing), and they are hard-coded in the SPARQL query. Even though this creates a tight
coupling of the shape to the data, top-level DBpedia classes do not change frequently, and
adjusting the constraint can indeed stimulate discussion.
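The logic of the top-level class constraint can be mirrored in Python. In this sketch the class hierarchy is a dictionary from class to its direct superclasses, `dbo:` is assumed to abbreviate http://dbpedia.org/ontology/, and dbo:Gadget is a made-up class used only to trigger the warning.

```python
ALLOWED_TOP_LEVEL = {
    "dbo:Activity", "dbo:Agent", "dbo:Concept", "dbo:CommunicationSystem",
    "dbo:Condition", "dbo:Event", "dbo:PhysicalThing", "dbo:Place", "dbo:TimePeriod",
}

def undiscussed_top_level(subclass_of):
    """Return direct subclasses of owl:Thing that are not in the allowed list."""
    return sorted(
        cls for cls, parents in subclass_of.items()
        if "owl:Thing" in parents and cls not in ALLOWED_TOP_LEVEL
    )

# Hypothetical fragment of an ontology.
subclass_of = {
    "dbo:Agent": {"owl:Thing"},
    "dbo:Person": {"dbo:Agent"},
    "dbo:Gadget": {"owl:Thing"},
}

print(undiscussed_top_level(subclass_of))  # ['dbo:Gadget']
```

As in the shape, the allowed classes are hard-coded, so accepting a new top-level class means editing the list.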
:User a sx:Shape ;
   sx:expression [
      a sx:EachOf ;
      sx:expressions (
         [ a sx:TripleConstraint ;
           sx:predicate schema:name ;
           sx:valueExpr [ a sx:NodeConstraint ;
                          sx:datatype xsd:string ]
         ]
         [ a sx:TripleConstraint ;
           sx:predicate schema:gender ;
           sx:valueExpr [ a sx:NodeConstraint ;
                          sx:values ( schema:Male schema:Female )
                        ]
         ]
      )
   ] .
In the following, we will describe the ShEx schemas that can validate RDF files in ShExR
(as above). The full code is included in Appendix C and has been adapted from Appendix C
(ShEx shape) of the ShEx specification.15
ShExR graphs contain an RDF node with rdf:type sx:Schema, an optional list of starting
semantic actions, a start declaration and zero or more sx:shapes declarations whose values must
be shape expressions <ShapeExpr>.
Most of the shapes in this schema are defined as CLOSED to limit the appearance of unex-
pected triples.
<Schema> CLOSED {
   a [ sx:Schema ] ;
   sx:startActs @<SemActList1Plus>? ;
   sx:start @<ShapeExpr>? ;
   sx:shapes @<ShapeExpr>*
}
As discussed in Section 4.4.3, there are six possibilities for defining shape expressions,
which can be enumerated as:
<ShapeExpr> @<ShapeOr> OR
            @<ShapeAnd> OR
            @<ShapeNot> OR
            @<NodeConstraint> OR
            @<Shape> OR
            @<ShapeExternal>
<ShapeOr> and <ShapeAnd> have a similar representation which contains a list of at least two
shape expressions represented by the <shapeExprList2Plus> shape, which will be described later.
<ShapeOr> CLOSED {
   a [ sx:ShapeOr ] ;
   sx:shapeExprs @<shapeExprList2Plus>
}
Node constraints are formed by one or more declarations of node kind, datatype, string
facet, numeric facet, or a list of possible values.
<NodeConstraint> CLOSED {
   a [ sx:NodeConstraint ] ;
   ( sx:nodeKind [ sx:iri sx:bnode sx:literal sx:nonliteral ]
   | sx:datatype IRI
   | &<stringFacet>
   | &<numericFacet>
   | sx:values @<valueSetValueList1Plus>
   )+
}
A shape can contain the Boolean directives sx:closed and sx:extra as well as a
sx:tripleExpression and an optional list of semantic actions.
Semantic actions contain a sx:name that points to an IRI describing the processor and a
sx:code value with the string code that will be passed to that processor.
The values that can appear in a value set are object values, stems, or ranges:
<valueSetValue> @<objectValue>
  OR @<IriStem> OR @<IriStemRange>
  OR @<LiteralStem> OR @<LiteralStemRange>
  OR @<LanguageStem> OR @<LanguageStemRange>
Stems and ranges are defined for the different possibilities: IRIs, literals, or language-
tagged literals.
<IriStem> CLOSED { a [ sx:IriStem ]; sx:stem xsd:anyURI }
<IriStemRange> CLOSED {
  a [ sx:IriStemRange ];
  sx:stem xsd:anyURI OR @<Wildcard>;
  sx:exclusion @<objectValue> OR @<IriStem>*
}
<LiteralStem> CLOSED { a [ sx:LiteralStem ]; sx:stem xsd:string }
<LiteralStemRange> CLOSED {
  a [ sx:LiteralStemRange ];
  sx:stem xsd:string OR @<Wildcard>;
  sx:exclusion @<objectValue> OR @<LiteralStem>*
}
<LanguageStem> CLOSED { a [ sx:LanguageStem ]; sx:stem xsd:string }
<LanguageStemRange> CLOSED {
  a [ sx:LanguageStemRange ];
  sx:stem xsd:string OR @<Wildcard>;
  sx:exclusion @<objectValue> OR @<LanguageStem>*
}
<Wildcard> BNODE CLOSED {
  a [ sx:Wildcard ]
}
The definitions of <OneOf> and <EachOf> are very similar: they contain sx:min and sx:max
cardinalities, a list of at least two triple expressions, an optional list of semantic actions, and a
list of annotations.
<OneOf> CLOSED {
  a [ sx:OneOf ] ;
  sx:min xsd:integer? ;
  sx:max xsd:integer? ;
  sx:expressions @<tripleExpressionList2Plus> ;
  sx:semActs @<SemActList1Plus>? ;
  sx:annotation @<Annotation>*
}
An inclusion has a predicate sx:include that points to an IRI or a blank node (non-literals).
<Inclusion> CLOSED {
  a [ sx:Inclusion ]? ;
  sx:include NONLITERAL
}
The following definitions declare lists of at least one element: semantic actions or value
set values.
<SemActList1Plus> CLOSED {
  rdf:first @<SemAct> ;
  rdf:rest [ rdf:nil ] OR @<SemActList1Plus>
}
<valueSetValueList1Plus> CLOSED {
  rdf:first @<valueSetValue> ;
  rdf:rest [ rdf:nil ] OR @<valueSetValueList1Plus>
}
6.6 SHACL IN SHACL
In this section we describe how to use SHACL to validate Shapes graphs that contain SHACL
code. This is similar to what we described in the previous section although in this case we are us-
ing SHACL to validate SHACL. The full code described in this section appears in Appendix D
and has been adapted from Appendix C of the SHACL specification. We have made some
modifications to the original code for readability.
The document declares the shape of shapes :ShapeShape as a sh:NodeShape that contains a
long list of target declarations to define the nodes that must be validated as shapes.
:ShapeShape a sh:NodeShape ;
  sh:targetClass sh:NodeShape , sh:PropertyShape ;
  sh:targetSubjectsOf sh:targetClass , sh:targetNode ,
                      sh:targetObjectsOf , sh:targetSubjectsOf ,
                      sh:and , sh:class , sh:closed , sh:datatype ,
                      sh:disjoint , sh:equals , sh:flags , sh:hasValue ,
                      ... # All the other constraint component parameters
  sh:targetObjectsOf sh:node , sh:not , sh:property , sh:qualifiedValueShape .
:ShapeShape
  sh:xone ( :NodeShapeShape :PropertyShapeShape ) ;
The following statements declare the types of values that can be associated with target
declarations.
:ShapeShape sh:property [
    sh:path sh:targetNode ;
    sh:nodeKind sh:IRIOrLiteral ;
  ] ;
  sh:property [
    sh:path sh:targetClass ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetSubjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetObjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  ...
In the same way, it declares the values that the different constraint components can have.
:ShapeShape sh:property [
    sh:path sh:severity ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:deactivated ;
    sh:maxCount 1 ;
    sh:in ( true false ) ;
  ] ;
  sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:class ;
    sh:nodeKind sh:IRI ;
  ] ;
  ...
We omit the full list of declarations as all of them follow the same style: each one declares
the expected value of a predicate. For example, the last declaration states that the value of
the predicate sh:class must be an IRI.
A remarkable aspect is the following declaration:
:ShapeShape sh:or (
  [ sh:not [
      sh:class rdfs:Class ;
      sh:or ( [ sh:class sh:NodeShape ]
              [ sh:class sh:PropertyShape ]
            )
    ]
  ]
  [ sh:nodeKind sh:IRI ]
) .
It represents a syntax rule of implicit class targets (see Section 5.7.3) by which a node
shape or property shape that is also an instance of rdfs:Class must be an IRI. This is an example
of an IF-THEN pattern (see Section 5.11.5) and could be defined in pseudo-code as:
IF ( sh:class rdfs:Class AND
     ( sh:class sh:NodeShape OR sh:class sh:PropertyShape )
   ) THEN sh:nodeKind sh:IRI
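The encoding of IF-THEN as a disjunction can be checked with a small truth table. The following Python sketch is only an illustration: the function names are hypothetical, and the three Boolean arguments stand in for the results of evaluating sh:class rdfs:Class, the inner sh:or, and sh:nodeKind sh:IRI on a focus node.

```python
from itertools import product


def if_then(condition: bool, consequence: bool) -> bool:
    """Material implication: IF condition THEN consequence."""
    return consequence if condition else True


def shape_shape_rule(is_class: bool, is_shape: bool, is_iri: bool) -> bool:
    """Mirrors sh:or ( [ sh:not [ A and B ] ] [ C ] ) from the declaration."""
    return (not (is_class and is_shape)) or is_iri


# The two formulations agree on every combination of the three inputs.
agree = all(
    shape_shape_rule(a, b, c) == if_then(a and b, c)
    for a, b, c in product([False, True], repeat=3)
)
```

The only failing combination is a node that is both a class and a shape but is not an IRI, which is the case the syntax rule is meant to exclude.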
Another interesting declaration is:
:ShapeShape sh:property [
    sh:path sh:message ;
    sh:or ( [ sh:datatype xsd:string ]
            [ sh:datatype rdf:langString ] ) ;
  ] .
which declares that messages can be any string literal or language-tagged string literal, which is
a common pattern for messages that admit not only plain string literals but multilingual ones.
Another notable aspect is the use of :ListShape as the value of several predicates
like sh:and, sh:or, sh:in, sh:ignoredProperties, and sh:xone.
The declarations are written as:
:ShapeShape sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:or ;
    sh:node :ListShape ;
  ] ;
  # ... similar for the other predicates
  .
The meaning is that the values of those predicates must be well-formed RDF lists (see
Section 2.2).
An RDF list is a collection of values linked by the rdf:rest predicate whose last value is
rdf:nil. Each node in the list must contain exactly one value of rdf:first. The declaration of
:ListShape is defined as:
:ListShape a sh:NodeShape ;
  sh:property [ sh:path [ sh:zeroOrMorePath rdf:rest ] ;
                sh:hasValue rdf:nil ;
                sh:node :ListNodeShape ;
              ] .
which means that all the nodes are linked by the predicate rdf:rest zero or more times, and that
those nodes must conform to :ListNodeShape which is defined as:
:ListNodeShape a sh:NodeShape ;
  sh:or (
    [ sh:hasValue rdf:nil ;
      sh:property [ sh:path rdf:first ; sh:maxCount 0 ] ;
      sh:property [ sh:path rdf:rest ; sh:maxCount 0 ] ;
    ]
    [ sh:not [ sh:hasValue rdf:nil ] ;
      sh:property [ sh:path rdf:first ; sh:maxCount 1 ; sh:minCount 1 ] ;
      sh:property [ sh:path rdf:rest ; sh:maxCount 1 ; sh:minCount 1 ] ;
    ]) .
This means that a list node is either rdf:nil, in which case it must not have any arc with
predicates rdf:first or rdf:rest, or a node with exactly one value for those predicates. In this
case, the pattern followed is an IF-THEN-ELSE pattern.
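To make the combined effect of :ListShape and :ListNodeShape concrete, the following Python sketch performs the same two checks on a graph held as a plain set of (subject, predicate, object) tuples. It is an illustration under stated assumptions, not a SHACL processor: the function names and the tuple representation are hypothetical, and no RDF library is used.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def is_list_node(graph, node):
    """Counterpart of :ListNodeShape."""
    firsts = objects(graph, node, FIRST)
    rests = objects(graph, node, REST)
    if node == NIL:
        # rdf:nil must have no rdf:first and no rdf:rest arcs.
        return not firsts and not rests
    # Any other list node needs exactly one rdf:first and one rdf:rest.
    return len(firsts) == 1 and len(rests) == 1


def is_well_formed_list(graph, head):
    """Counterpart of :ListShape: follow rdf:rest zero or more times."""
    reachable, stack = set(), [head]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        stack.extend(objects(graph, node, REST))
    # sh:hasValue rdf:nil, and every reached node is a valid list node.
    return NIL in reachable and all(is_list_node(graph, n) for n in reachable)
```

Because the traversal starts at the head itself (zero steps of rdf:rest), the empty list rdf:nil is accepted, while a chain that never reaches rdf:nil or a node with two rdf:first values is rejected.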
In the case of sh:ignoredProperties and sh:languageIn, the list nodes must also conform to
some specific shape (to be an IRI or a string). This can be expressed as:
:ShapeShape sh:property [
    sh:path ( sh:ignoredProperties [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path ( sh:languageIn [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:datatype xsd:string ;
  ] .
Similarly, a constraint is established on the values of sh:and, sh:or and sh:xone which must
be lists of nodes conforming to :ShapeShape. This is declared as:
:ShapesListShape a sh:NodeShape ;
  sh:property [
    sh:path ( [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:node :ShapeShape ;
  ] .
Some properties, such as sh:path, sh:lessThan, and sh:minCount, cannot be applied to node
shapes. This constraint is declared as:
:NodeShapeShape a sh:NodeShape ;
  sh:property [ sh:path sh:path ; sh:maxCount 0 ] ;
  sh:property [ sh:path sh:lessThan ; sh:maxCount 0 ] ;
  sh:property [ sh:path sh:maxCount ; sh:maxCount 0 ] ;
  ... # Similar for sh:lessThanOrEquals , sh:minCount ,
      # sh:qualifiedValueShape and sh:uniqueLang
Property shapes must have exactly one value for property sh:path.
:PropertyShapeShape a sh:NodeShape ;
  sh:property [ sh:path sh:path ;
                sh:maxCount 1 ; sh:minCount 1 ;
                sh:node :PathShape
              ] .
The value of sh:path must conform to :PathShape. The first version of :PathShape employed
recursion with the following pattern:
:PathShape a sh:NodeShape ;
  sh:xone (
    [ sh:nodeKind sh:IRI ]
    [ sh:nodeKind sh:BlankNode ;
      sh:node :PathListWithAtLeast2Members ;
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:alternativePath ;
                    sh:node :PathListWithAtLeast2Members ;
                    sh:minCount 1 ; sh:maxCount 1 ;
                  ]
    ]
    [ sh:nodeKind sh:BlankNode ;
      sh:closed true ;
      sh:property [ sh:path sh:inversePath ;
                    sh:node :PathShape ; # Recursive reference
                    sh:minCount 1 ; sh:maxCount 1 ;
                  ]
    ]
    ... # similar for sh:zeroOrMorePath , sh:oneOrMorePath
        # and sh:zeroOrOnePath
  ) ;
  .
_:PathPath sh:alternativePath (
  ( [ sh:zeroOrMorePath rdf:rest ] rdf:first )
  ( sh:alternativePath [ sh:zeroOrMorePath rdf:rest ] rdf:first )
  sh:inversePath
  sh:zeroOrMorePath
  sh:oneOrMorePath
  sh:zeroOrOnePath
) .

:PathNodeShape sh:xone (
  [ sh:nodeKind sh:IRI ]
  [ sh:nodeKind sh:BlankNode ;
    sh:node :PathListWithAtLeast2Members ;
  ]
  [ sh:nodeKind sh:BlankNode ;
    sh:closed true ;
    sh:property [ sh:path sh:alternativePath ;
                  sh:node :PathListWithAtLeast2Members ;
                  sh:minCount 1 ; sh:maxCount 1 ;
                ]
  ]
  [ sh:nodeKind sh:BlankNode ;
    sh:closed true ;
    sh:property [ sh:path sh:inversePath ;
                  sh:minCount 1 ; sh:maxCount 1 ;
                ]
  ]
  ... # similar for sh:zeroOrMorePath , sh:oneOrMorePath
      # and sh:zeroOrOnePath
) .
The last two definitions declare that the values of sh:shapesGraph and the values of
sh:entailment must be IRIs.
:ShapesGraphShape a sh:NodeShape ;
  sh:targetObjectsOf sh:shapesGraph ;
  sh:nodeKind sh:IRI .

:EntailmentShape a sh:NodeShape ;
  sh:targetObjectsOf sh:entailment ;
  sh:nodeKind sh:IRI .
6.7 SUMMARY
• ShEx and SHACL can be used to describe and validate linked data portals. We show how
they can be used to describe the WebIndex data model.
• FHIR describes an abstract information model which can be represented in JSON, XML,
and RDF. FHIR/RDF data model is described using ShEx.
• Springer Nature SciGraph is an early adopter of SHACL to validate data.
• DBpedia is an example of a big linked data portal whose needs for validation offer new
challenges.
• The RDF representation of ShEx can be described and validated in ShEx.
• SHACL Core shapes graphs can be described and validated in SHACL.
6.8 SUGGESTED READING
• FHIR linked data model. Describes the RDF data model used in FHIR and its use of
ShEx: D. Booth. FHIR linked data module. https://www.hl7.org/fhir/linked-data-module.html,
April 2017.
• Paper describing the use of RDFUnit on DBpedia as well as other large-scale RDF
datasets: D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelis-
sen, and A. Zaveri. Test-driven evaluation of linked data quality. In Proc. of the 23rd Inter-
national Conference on World Wide Web, WWW’14, pages 747–758, Republic and Canton
of Geneva, Switzerland, International World Wide Web Conferences Steering Commit-
tee, 2014. DOI: 10.1145/2566486.2568002
• Paper describing the mappings-based validation applied in DBpedia: A. Dimou, D. Kon-
tokostas, M. Freudenberg, R. Verborgh, J. Lehmann, E. Mannens, S. Hellmann, and
R. Van de Walle. Assessing and refining mappings to RDF to improve dataset quality. In
Proc. of the 14th International Semantic Web Conference, October 2015. DOI: 10.1007/978-
3-319-25010-6_8
• Paper describing the integration of SHACL with Travis CI for validating DBpedia link
contributions: M. Dojchinovski, D. Kontokostas, R. Rößling, M. Knuth, and S. Hellmann.
DBpedia links: The hub of links for the web of data. In Proc. of the SEMANTiCS
Conference (SEMANTiCS 2016), September 2016.
https://svn.aksw.org/papers/2016/SEMANTiCS_DBpedia_Links/public.pdf
CHAPTER 7
• Node constraints. Both languages have the notion of node constraints and share similar
expressiveness: node kinds, datatypes, datatype facets, value sets, etc. Example 7.1 shows
two declarations which are equivalent in ShEx and SHACL: a node must be an IRI, have
exactly one value for the property schema:name that has datatype xsd:string, have exactly one
value for the property schema:gender which must be one of (schema:Male schema:Female) or a
xsd:string, and optionally have a value for the property schema:birthDate that has datatype
xsd:date.
• Property Constraints. Both languages enable the declaration of constraints on the out-
going and incoming properties of a node.
:Organization a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:hasValue :Organization ;
  ] .
:bob a :User ;
  schema:name "Robert" . # X Fails as :User

:myCompany a :Organization ;
  schema:member :alice .
Both ShEx and SHACL check that :alice conforms to the :User shape and raise an error
for :bob because there is no arc schema:member from a node with shape :Organization pointing
to :bob.
• Cardinalities. Both languages can constrain the number of values for a property to a
specific range, or leave the maximum number of values unbounded.
• RDF syntax. Both ShEx and SHACL can use RDF concrete syntaxes though with dif-
ferent vocabularies.
• Logical operators. Both ShEx and SHACL have the logical operators And, Or and Not.
ShEx has the operator | to represent "oneOf" while SHACL has xone to represent exactly
one.
• Extension mechanism. Both ShEx and SHACL have extension mechanisms that support
the declaration of more advanced constraints. ShEx has semantic actions (see Section 4.10)
and SHACL has SHACL-SPARQL (see Section 5.16). In Section 7.18, we compare the
ShEx and SHACL extension mechanisms in more detail.
7.2 SYNTACTIC DIFFERENCES
The design of ShEx emphasized human readability, with a compact grammar that follows tra-
ditional language design principles and a compact syntax evolved from Turtle. The specification
defines an abstract syntax. The compact syntax (ShExC), a concrete JSON syntax (ShExJ), or
any of the concrete syntaxes for RDF may be used to express a ShEx schema.
SHACL uses the RDF abstract syntax and concrete syntaxes directly. The SHACL spec-
ification enumerates circa 120 rules that define what constitutes a well-formed SHACL shapes
graph.1 SHACL processors can simply omit ill-formed shapes graphs.
A compact syntax inspired by ShEx has been proposed for a subset of SHACL as a WG
Note (see Section 5.18) but it is not mandatory, and compliant SHACL processors are only
required to handle the RDF syntax.
As the SHACL compact syntax was inspired by ShExC, they look similar, but there are
several semantic differences.
A similar (but not equivalent) representation using SHACL compact syntax is:
:Product {
  schema:productId xsd:string [1..1] pattern="^[A-R]" .
  schema:productId xsd:string [1..1] pattern="^[M-Z]" .
  schema:brand IRI @:Organization [0..*] .
  schema:purchaseDate xsd:date [0..1]
}
:Organization {
  schema:name xsd:string
}
Though the examples look similar on the surface, there are several subtle differences. The
ShEx schema says that there must be two values for the property schema:productId, one matching
"^[A-R]" and the other matching "^[M-Z]". In contrast, the SHACL shapes graph says that there
is only one property schema:productId, which must satisfy both regular expressions.
1 The complete list of rules is defined in https://www.w3.org/TR/shacl/#syntax-rules.
238 7. COMPARING SHEX AND SHACL
Given the following RDF data:
:p1 a :Product ;           # V Passes as :Product using ShEx
  schema:productId "AB" ;  # X Fails as :Product using SHACL
  schema:productId "XY" ;
  schema:brand :myBrand .
Node :p1 conforms to the ShEx definition of :Product but does not conform to the SHACL
one because the constraints on schema:productId are not satisfied (both must be satisfied). Node
:p2 does not conform to ShEx because it only has one schema:productId but conforms to SHACL
because it satisfies all constraints.
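The two readings of the repeated schema:productId constraint can be contrasted with a small Python sketch. It is a simplified model, not an implementation of either language: the function names are hypothetical, only the values of schema:productId are considered, and the single value "MN" used for a node with one identifier is an assumed example.

```python
import re
from itertools import permutations

PATTERNS = ["^[A-R]", "^[M-Z]"]


def shex_like(values):
    """Each value is consumed by exactly one constraint (cardinality 1 each)."""
    if len(values) != len(PATTERNS):
        return False
    return any(
        all(re.search(pattern, value) for pattern, value in zip(PATTERNS, perm))
        for perm in permutations(values)
    )


def shacl_like(values):
    """Each property shape is checked independently against all the values."""
    return all(
        len(values) == 1 and all(re.search(pattern, v) for v in values)
        for pattern in PATTERNS
    )


p1_ids = ["AB", "XY"]  # the values of :p1 in the data above
one_id = ["MN"]        # hypothetical node with a single identifier
```

With these inputs, shex_like accepts p1_ids and rejects one_id, while shacl_like does the opposite, since "MN" matches both regular expressions and respects both maximum counts.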
Example 7.5
The RDF representation of Example 7.4 in ShEx is:
:Product a sx:Shape ;
  sx:expression [ a sx:EachOf ;
    sx:expressions (
      [ a sx:TripleConstraint ;
        sx:predicate schema:productId ;
        sx:valueExpr [ a sx:NodeConstraint ;
                       sx:pattern "^[A-R]" ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:productId ;
        sx:valueExpr [ a sx:NodeConstraint ;
                       sx:pattern "^[M-Z]" ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:brand ;
        sx:min 0 ; sx:max -1 ;
        sx:valueExpr [ a sx:ShapeAnd ;
                       sx:shapeExprs (
                         [ a sx:NodeConstraint ; sx:nodeKind sx:iri ]
                         :Organization
                       )
                     ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:purchaseDate ;
        sx:min 0 ; sx:max 1 ;
        sx:valueExpr [ a sx:NodeConstraint ;
                       sx:datatype xsd:date ]
      ]
    )
  ] .
Here is the RDF encoding of the SHACL shapes graph in Example 7.4:
:Product a sh:NodeShape ;
  sh:property [
    sh:path schema:productId ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:pattern "^[A-R]" ;
  ] ;
  sh:property [
    sh:path schema:productId ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:pattern "^[M-Z]" ;
  ] ;
  sh:property [
    sh:path schema:brand ;
    sh:nodeKind sh:IRI ;
    sh:node :Organization
  ] ;
  sh:property [
    sh:path schema:purchaseDate ;
    sh:maxCount 1 ;
    sh:datatype xsd:date
  ]
  .
A SHACL processor checks that :alice, :bob, :carol, and :dave conform to :UserShape.
Directly associating target declarations to shapes can become quite verbose (see Sec-
tion 6.6). At the same time, it can limit the reusability of a shape in other contexts. In the
example above, if we import :UserShape in another context where the node :alice represents a
product instead of a user, the SHACL processor will still try to validate the node with that
shape. To avoid such cases, SHACL provides the sh:deactivated directive (see Section 5.13).
While including the target declarations in the schema is a convenient way to trigger val-
idation, it can be considered an anti-pattern because the shape can’t be reused for other data.
Even though this could work in some closed systems, it is impractical for data in open environ-
ments. In the interest of keeping schemas reusable, it is a good practice for SHACL to place
target declarations in a separate file and link this file to the schema with owl:imports.
A ShEx schema declares a constellation of shape expressions that function as a grammar
against which RDF nodes can be tested. The schema itself provides no mechanism for associ-
ating a shape expression with the nodes to which the schema applies. In the interest of making
schemas reusable, ShEx requires that definitions of shapes be decoupled from their application
to particular RDF graphs. ShEx separates the language of schemas, on the one hand, from the
association of shapes with nodes to be validated, on the other, by introducing the notion of shape
maps (see Section 4.9 for more details). This separation of concerns encourages the community
to innovate on node-shape association mechanisms independently from the validation seman-
tics. For example, though the shape map specification currently only supports RDF nodes by
direct reference or by triple pattern, Wikidata versions of ShEx include support for SPARQL
queries over remote endpoints. As such conventions evolve they can be rolled into future versions
of the shape map specification.
The declarations above behave similarly to the SHACL target declarations. One subtle
difference is that while in the previous case, ShEx only checks direct instances of :User, SHACL
applies the concept of SHACL instance, which also encompasses instances of subclasses of :User.
This possibility can be expressed using property paths in shape maps as:
{ FOCUS rdf:type/rdfs:subClassOf* :User }@:UserShape
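The difference between a direct instance and a SHACL instance can be sketched in Python as a walk over rdfs:subClassOf links. This is only an illustration: the graph is a plain set of tuples, the function names are hypothetical, and the node :erin and the class :Teacher are assumed example data.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """Return cls plus every class reachable through rdfs:subClassOf*."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for (s, p, o) in graph if s == current and p == SUBCLASS_OF)
    return seen


def is_direct_instance(graph, node, cls):
    """Selection by a plain rdf:type triple pattern."""
    return (node, RDF_TYPE, cls) in graph


def is_shacl_instance(graph, node, cls):
    """Selection by the path rdf:type/rdfs:subClassOf*."""
    types = [o for (s, p, o) in graph if s == node and p == RDF_TYPE]
    return any(cls in superclasses(graph, t) for t in types)


graph = {
    (":Teacher", SUBCLASS_OF, ":User"),
    (":alice", RDF_TYPE, ":User"),
    (":erin", RDF_TYPE, ":Teacher"),  # hypothetical node typed with a subclass
}
```

Here :alice is selected by both functions, whereas :erin is selected only by is_shacl_instance, which corresponds to the property path used in the shape map.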
Another notable difference between SHACL target node declarations and ShEx shape
maps is the following: when a declared target node in SHACL does not exist in the data graph
and there are no required values for this node in the shape, the node passes the validation. In
ShEx, if the node does not exist, validation always fails, regardless of the shape definition.
:Product {
  schema:productId xsd:string ;
  schema:price xsd:decimal
}
A :SoldProduct has the same constraints as the :Product plus two more constraints: one
that further restricts the property schema:productId and another that requires a new property
schema:purchaseDate.
Here is an analogous SHACL shapes graph:
:Product a sh:NodeShape ;
  sh:property [
    sh:path schema:productId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:price ;
    sh:datatype xsd:decimal ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .

:SoldProduct a sh:NodeShape ;
  sh:and (
    :Product
    [ sh:path schema:purchaseDate ;
      sh:datatype xsd:date ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
    ]
    [ sh:path schema:productId ;
      sh:pattern "^[A-Z]" ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
    ]
  ) .
Another way to reuse shapes in SHACL is by leveraging the subclass relationship and the
corresponding target declarations. The example above could be expressed as:
:Product a sh:NodeShape , rdfs:Class ;
  sh:property [
    sh:path schema:productId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:price ;
    sh:datatype xsd:decimal ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .
The reusability of both languages could be improved. For example, there is no notion of
a module, where one might declare internal or hidden shapes, or of public shapes that could be
imported by other modules. Also, there is no notion of a shape extending another shape, inheriting
some properties and redefining others. Such features could potentially be developed for both
languages.
If SHACL is applied after RDFS inference, the system checks whether :frank and :grace
conform to :UserShape. This is because the domain declaration of :teaches allows RDFS to infer
that they are instances of :Teacher and, hence, instances of :User, with the following results:
• :grace has a value for schema:name that is not an xsd:string.
• :oscar has a value for schema:name that is not an xsd:string.
In contrast, if SHACL is applied without RDFS inference, the system returns only one
error:
• :oscar has a value for schema:name that is not an xsd:string.
The system does not check :frank or :grace against shape :User because it only follows
rdf:type and rdfs:subClassOf declarations. In the absence of RDFS inference, the system only
checks that :oscar has shape :User. If SHACL is applied after RDFS inference, the system checks
the additional nodes.
This interference between SHACL and RDFS semantics hampers the use of SHACL to
validate an inference system, as in the use case described for ShEx in Example 3.11.
The property sh:entailment can be used to declare that the SHACL processors should add
inferred triples during validation to the data graph following the inference rules declared by a
given entailment regime (see Section 5.17). Nevertheless, SHACL processors are not required
to support entailment regimes. If a shapes graph declares an entailment and the processor does
not support it, a failure must be signalled.
:UserShape {
  schema:name xsd:string ;
  schema:givenName xsd:string ;
}
The following SHACL shapes graph declares that if there is a schema:name then it must
have datatype xsd:string, and the same for schema:givenName:
:UserShape a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path schema:givenName ;
    sh:datatype xsd:string ;
  ] .
The difference in results is based on the difference between the ShEx and SHACL points
of view. In ShEx, a triple expression makes explicit which triples involving the focus node should
be found in the graph, and specifying a cardinality may require several such triples. The absence
of cardinality means one triple. In SHACL, a shape is a conjunction of constraints. A cardinality
constraint is used to constrain the number of allowed triples of a given kind, and the absence of
cardinality means no constraint on the number of triples allowed.
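The different defaults can be summarized in a few lines of Python. This is a minimal sketch with hypothetical function names: it only models how many values of one property are accepted when no cardinality is written, and it uses None to stand for an unbounded maximum.

```python
def shex_cardinality_ok(count, minimum=1, maximum=1):
    """ShEx triple constraint: no explicit cardinality means exactly one."""
    return minimum <= count and (maximum is None or count <= maximum)


def shacl_cardinality_ok(count, min_count=0, max_count=None):
    """SHACL property shape: no sh:minCount/sh:maxCount means no limit."""
    return min_count <= count and (max_count is None or count <= max_count)
```

Under these defaults, a node with no schema:name value, or with two of them, is rejected by the ShEx-style check and accepted by the SHACL-style check; adding minimum 1 and maximum 1 to the SHACL side makes the two agree.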
:PersonKnown {
  a [ :Person ] ;
  schema:knows @:PersonKnown *
}
7.10 RECURSION
ShEx supports the definition of cyclic data models with recursive shapes (see Section 4.7.2)
while the processing of recursive shapes is undefined in SHACL (see Section 5.12.1). However,
some recursion cases can be handled in SHACL through SHACL property paths.
However, recursion in SHACL is undefined and not all SHACL processors may han-
dle that definition in the same way. The specification leaves recursion as an implementation-
dependent feature.
One possible solution is to add target declarations to the shape to trigger the validation
against them. A typical solution is to use rdf:type declarations as we saw in Section 5.12.1. In
this case, we could also use sh:targetSubjectsOf like:
:UserShapeRecursion a sh:NodeShape ;
  sh:targetSubjectsOf schema:knows ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:knows ;
    sh:class :User
  ] .
Now, every node that is a subject of schema:knows must conform to that shape.
This solution may not be realistic in general. In this case, for example, we are forcing every
node that is a subject of schema:knows to conform to :UserShape and in other contexts, this could
be too restrictive. The same situation happens if we use sh:targetClass declarations.
Another approach to emulate recursive behavior is to use property paths. For example:
:UserShape a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath schema:knows ] ;
    sh:nodeKind sh:IRI ;
  ] .
In this case, every node that is related by property schema:knows zero or more times with
the focus node, must be an IRI. With this solution, there may be other nodes that are subjects
of schema:knows but do not need to conform to :UserShape.
In Section 5.12.1, we described more advanced ways of using SHACL property
paths as an alternative to recursion.
7.11 PROPERTY PAIR CONSTRAINTS AND UNIQUENESS
Property pair constraints in SHACL can be used to compare current values with values from
another path, checking whether they are equal to, different from, or less than those values (see Section 5.14).
ShEx 2.0 does not have the concept of property pair constraints, though this possibility
is being studied to be included in future versions.
must be changed if we want :p1 and :p2 to be the same property, only with different values. A
direct translation of that pattern to:
:Shape a sh:NodeShape ;
  sh:property [
    sh:path :p ;
    # ... constraints on :p ...
  ] ;
  sh:property [
    sh:path :p ;
    # ... other constraints on :p ...
  ] ;
  ...
:Person {
  schema:parent { :isMale [ true ] } ;
  schema:parent { :isFemale [ true ] }
}
However, this SHACL Shapes graph would only be satisfied by a node whose schema:parent
value is both male and female.
:alice a :Person ;
  schema:parent :bob ;    # V Passes as :Person in ShEx
  schema:parent :carol .  # X Fails as :Person in SHACL

:bob :isMale true .
:carol :isFemale true .

:dave a :Person ;
  schema:parent :x .      # X Fails as :Person in ShEx
                          # V Passes as :Person in SHACL

:x :isMale true ;
   :isFemale true .
Note that this requires fixing the number of repeated properties allowed (in
this case 2).
7.13 EXACTLY ONE AND ALTERNATIVES
Data coherence minimizes defensive programming by providing predictable, logical data struc-
tures that must be used. To take a trivial example, a data structure may offer a choice between
different representations of a name as in Example 4.30 (for ShEx) and the corresponding Ex-
ample 5.38 (for SHACL).
Let’s change the constraint to require a combination of foaf:firstName and foaf:lastName or
foaf:givenName and foaf:familyName or schema:givenName and schema:familyName where none of these
properties can be mixed with the others. In ShEx, this can be declared as:
:Person {
  foaf:firstName . ; foaf:lastName . |
  foaf:givenName . ; foaf:familyName . |
  schema:givenName . ; schema:familyName .
}
Given the following data, :alice and :bob conform to :Person while :carol and :dave do not.
In the case of :dave, it fails because the data meets one side of the disjunction and has some
properties from the other side.
:alice foaf:firstName "Alice" ;  # V Passes as :Person
       foaf:lastName "Cooper" .
However, this SHACL shapes graph has a meaning different from the ShEx schema.
In this case, :dave conforms to :Person because it matches exactly one of the shapes (it has
foaf:firstName and foaf:lastName) and does not match the other shapes. The intended meaning
was that it should not have any of the other properties but it has schema:givenName.
As we described in Section 5.38, SHACL’s sh:xone does not check if there are partial
matches in other shapes. A workaround to simulate ShEx behavior is to normalize the expression
using a top-level disjunction whose shapes exclude the properties that are not desired.
:Person a sh:NodeShape ;
  sh:or (
    [ sh:property [
        sh:path foaf:firstName ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path foaf:lastName ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path foaf:givenName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:familyName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:maxCount 0
      ] ;
    ]
    [ sh:property [
        sh:path foaf:firstName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:lastName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:givenName ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path foaf:familyName ;
        sh:minCount 1 ; sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:maxCount 0
      ] ;
    ]
    [ sh:property [
        sh:path foaf:firstName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:lastName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:givenName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path foaf:familyName ;
        sh:maxCount 0
      ] ;
      sh:property [
        sh:path schema:givenName ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path schema:familyName ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
    ]
  )
  .
Although this approach solves the problem, more complex and nested shapes can increase
the complexity and reduce the readability of SHACL shapes.
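The gap between sh:xone and the ShEx | operator can be reproduced with sets of property names. The Python sketch below is a simplification with hypothetical function names: it assumes every property present on a node has exactly one value, so that only the presence of properties matters.

```python
GROUPS = [
    {"foaf:firstName", "foaf:lastName"},
    {"foaf:givenName", "foaf:familyName"},
    {"schema:givenName", "schema:familyName"},
]
MENTIONED = set().union(*GROUPS)


def xone_like(props):
    """Exactly one group is fully present; partial matches are not noticed."""
    return sum(group <= props for group in GROUPS) == 1


def one_of_like(props):
    """The mentioned properties of the node are exactly those of one group."""
    return any(props & MENTIONED == group for group in GROUPS)


alice = {"foaf:firstName", "foaf:lastName"}
dave = {"foaf:firstName", "foaf:lastName", "schema:givenName"}
```

Both functions accept alice, but only xone_like accepts dave: the stray schema:givenName does not complete another group, so it goes unnoticed unless the other properties are explicitly excluded, which is what the sh:maxCount 0 declarations in the normalized shape do.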
then there will be no nodes satisfying the shape, as the two properties nested under sh:and
are thus hidden and not taken into consideration by the sh:closed directive.
A solution in this case is to enumerate the allowed properties using
sh:ignoredProperties. In this case, one should add:
:UserShape
  sh:ignoredProperties ( schema:name
                         schema:birthDate
                       ) .
As in the previous example, no node would conform to that shape because the closed
declaration does not look for predicates inside property paths.
There are two solutions: either add a sh:ignoredProperties declaration enumerating all
the properties as in the previous example, or add a property declaration for each predicate that
specifies no cardinality and thus has no other effect.
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [ sh:path schema:name ] ;
  sh:property [ sh:path foaf:name ] ;
.
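The behavior of sh:closed discussed above can be sketched in a few lines of Python. This is a simplified model rather than a SHACL implementation: a complex path such as sh:alternativePath is represented by a tuple, a plain predicate by a string, and the function names are hypothetical. The sketch assumes, as the text describes, that only IRI-valued paths of direct property shapes and the members of sh:ignoredProperties are allowed.

```python
from typing import Iterable, Set, Tuple, Union

# A path is either a predicate IRI (str) or a tuple standing in for a
# complex path such as sh:alternativePath.
Path = Union[str, Tuple[str, ...]]


def allowed_by_closed(paths: Iterable[Path], ignored: Iterable[str] = ()) -> Set[str]:
    """Predicates permitted by a closed shape: IRI-valued paths of the
    shape's direct property shapes, plus the ignored properties."""
    return {p for p in paths if isinstance(p, str)} | set(ignored)


def closed_violations(
    node_predicates: Iterable[str],
    paths: Iterable[Path],
    ignored: Iterable[str] = (),
) -> Set[str]:
    """Predicates used by the node that the closed shape does not allow."""
    return set(node_predicates) - allowed_by_closed(paths, ignored)


alternative = ("sh:alternativePath", "schema:name", "foaf:name")
node = {"schema:name"}

# Only the alternative path: schema:name is not seen by the closed check.
only_path = closed_violations(node, [alternative])

# Second solution: extra property shapes with plain predicate paths.
with_plain_paths = closed_violations(node, [alternative, "schema:name", "foaf:name"])

# First solution: enumerate the predicates as ignored properties.
with_ignored = closed_violations(node, [alternative], ignored=["schema:name", "foaf:name"])
```

In this model, `only_path` reports schema:name as a violation, while both workarounds produce an empty set.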
7.15 STEMS AND STEM RANGES
:Product {
  :status [ codes:good~ codes:bad~ ]
}
ShEx also has range exclusions that can declare values to exclude, either literal or specified
with a stem (see 4.20). That feature is not part of SHACL Core and has to be expressed using
SHACL-SPARQL.
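As a rough illustration of what a stem range with exclusions accepts, the following Python sketch treats values as plain IRI strings and a stem as a string prefix. The function name, the `(text, is_stem)` encoding of exclusions, and the example.org IRIs are assumptions made for this sketch; they are not part of either specification.

```python
from typing import Iterable, Tuple

# An exclusion is (text, is_stem): is_stem=True excludes every value that
# starts with text, is_stem=False excludes exactly that one value.
Exclusion = Tuple[str, bool]


def matches_stem_range(value: str, stem: str, exclusions: Iterable[Exclusion] = ()) -> bool:
    """True if value starts with stem and is not removed by any exclusion."""
    if not value.startswith(stem):
        return False
    for text, is_stem in exclusions:
        excluded = value.startswith(text) if is_stem else value == text
        if excluded:
            return False
    return True


CODES = "http://example.org/codes/"  # hypothetical namespace

plain_ok = matches_stem_range(CODES + "good.Shipped", CODES + "good")
plain_no = matches_stem_range(CODES + "bad.Lost", CODES + "good")

exclusions = [
    (CODES + "good.Pending", False),  # exclude one exact value
    (CODES + "good.Old", True),       # exclude a whole sub-stem
]
kept = matches_stem_range(CODES + "good.Shipped", CODES + "good", exclusions)
dropped_value = matches_stem_range(CODES + "good.Pending", CODES + "good", exclusions)
dropped_stem = matches_stem_range(CODES + "good.OldStock", CODES + "good", exclusions)
```

A SHACL-SPARQL constraint would have to encode the same prefix tests explicitly, for example with string functions in a FILTER.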
7.16 ANNOTATIONS
ShEx has the concept of annotations, which can be attached to several constructs (see
Section 4.7.5). For example, the following ShEx schema attaches two annotations to each triple
constraint.
ShEx does not endorse or require the use of any specific annotation vocabulary.
SHACL has non-validating constraint components (see Section 5.15), such as sh:name
and sh:description, which are ignored by the SHACL processor during validation but can have
special meaning for user interface generation. It is also possible to add further informative triples,
such as rdfs:label, to any constraint or component.
:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:name "Name" ;
    sh:description "Name of person" ;
    rdfs:label "Name" ;
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:name "BirthDate" ;
    sh:description "Birth date" ;
    rdfs:label "BirthDate" ;
  ] .
As we saw in Section 5.15, SHACL non-validating properties can be helpful for generating
forms from SHACL definitions.
Although ShEx does not provide built-in non-validating properties, it is possible
to use annotations from other vocabularies, even from SHACL.
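To make the form-generation idea concrete, here is a minimal Python sketch. It assumes the property shapes have already been read into plain dictionaries (no RDF library is used), and the mapping from datatypes to input widgets as well as the function name `form_fields` are hypothetical choices for illustration.

```python
from typing import Dict, List

# Property shapes of the :Person example, as plain dictionaries.
property_shapes: List[Dict[str, str]] = [
    {
        "path": "schema:name",
        "datatype": "xsd:string",
        "name": "Name",
        "description": "Name of person",
    },
    {
        "path": "schema:birthDate",
        "datatype": "xsd:date",
        "name": "BirthDate",
        "description": "Birth date",
    },
]

# Hypothetical mapping from datatypes to form input types.
INPUT_TYPES = {"xsd:string": "text", "xsd:date": "date"}


def form_fields(shapes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn non-validating metadata into form field descriptions.
    The label falls back to the path when no name is given."""
    return [
        {
            "property": s["path"],
            "label": s.get("name", s["path"]),
            "help": s.get("description", ""),
            "input": INPUT_TYPES.get(s.get("datatype", ""), "text"),
        }
        for s in shapes
    ]


fields = form_fields(property_shapes)
```

The validating parts of the shape (here the datatype) select the widget, while the non-validating parts supply the label and help text.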
7.17 SEMANTICS AND COMPLEXITY
For shapes with multiple triple constraints for the same predicate, try to minimize
the overlap between the value expressions. For instance, if three types of inspection
are necessary on a manufacturing checklist, use three different constraints for each
of the inspection properties rather than requiring three different inspection properties
with a value expression which is a union of all three types. This will make the
validation process more efficient and will more effectively capture the business logic
in the schema.”
The SHACL Core semantics is defined in natural language with some non-normative
SPARQL templates, while SHACL-SPARQL depends on a SPARQL processor. Its complexity
therefore depends on the complexity of SPARQL, which can also be quite expensive, especially
when property paths are used. As in the case of ShEx, it is also possible to declare shapes graphs
whose validation may consume a lot of time or memory.
Both ShEx and SHACL open the door for further research on optimizations and specialized
implementations usable for big datasets. Validators could define language subsets with
constructs that behave better when confronted with such datasets. To our knowledge, current
implementations have mainly been tested on in-memory data: separate RDF files, or relatively
small units of work (transactions). An exception is RDFUnit, which supports the execution of
SHACL directly on SPARQL endpoints and thus can theoretically scale along with the
capabilities of the SPARQL engine. Much research remains to be done on how very large (and not
in-memory) datasets can be efficiently validated with RDF shapes.
Benchmarks and testing tools are an essential step towards measuring the performance of
both languages as well as implementations. One early attempt was to use the WebIndex dataset
as a benchmark [57].
• Shapes induction. Given the recent emergence of schema languages, almost all existing
RDF data has no associated schemas. We can expect that schemas will be created for much
of the existing data. Deriving them automatically will greatly accelerate the availability of
schemas. Some initial attempts are described in [99] and [37]. Such tools could become
part of the validation process, producing schemas that are conservative enough to reject
data patterns which are dubious because they occur very rarely in the examined data.
Given that there is already a large amount of RDF data that comes from structured sources
such as SQL databases or Wikipedia info boxes, derived schemas will likely reflect
constraints native to the source format from which the data was converted or extracted.
• Optimization of RDF stores based on shapes. RDF stores that know the shape of their
RDF graphs can optimize their internal representations and increase the performance of
SPARQL queries.
• User interface generation from shapes. Editing RDF by hand is usually an error-prone
and non-user-friendly task. If the structure of the data is known, the editorial process can
be improved. Given that ShEx and SHACL Core define the properties that RDF nodes
can have, specialized user interfaces and forms could be generated from those shapes to
increase user friendliness. As we described in Section 5.15, SHACL contains some built-in
annotation properties which can help user interface generation from shapes graphs.
ShEx also supports arbitrary annotation properties, which in the future could also be
used to generate rich user interfaces.
3 https://www.w3.org/2011/gld/validator/datacube.shapes.ttl
• Schema transformation and mappings between data models. One of the most frequent
needs in computer science is to transform data based on some schema to data conforming
to another schema. These transformations are usually made by ad-hoc and error-prone
procedural programs. Because shapes languages can capture the structures of the sources
and targets of these transformations, they can be leveraged to define mappings. ShEx
Map,4 an extension of ShEx, can be used to convert RDF data between schemas.
• Integration between ShEx and SHACL. Although ShEx and SHACL are two different
approaches, both were designed to handle the general problem of RDF validation. ShEx
shines in its support of recursion based on well-founded semantics, while SHACL shines
in its support for SPARQL property paths and other SPARQL features. As in the case
of XML, where Schematron and RelaxNG can be used together [84], ShEx and SHACL
could be combined in a project to leverage the advantages of each.
On the other hand, the underpinnings of ShEx and SHACL are not radically different.
One implementation, Shaclex,5 uses compatible parts of libraries to implement a processor
for both SHACL and ShEx and is being extended to convert between subsets of the
languages.
• ShEx and SHACL best practices. This book describes how ShEx and SHACL can be
used to express both simple and complex constraints on RDF data. It does not attempt
to teach modeling, or product design, or the engineering skill of knowing when to define
constraints and when to leave data less constrained. While modeling and enterprise data
management are covered by an extensive literature, the scale and breadth of the Semantic
Web requires new formulations of well-known problems.
ShEx and SHACL will play an important role in the future development of RDF and
will be a core part of the Semantic Web tool set. As more semantic data is generated, and more
applications are needed to integrate and consume it, RDF validation will be a fundamental
enabler for data quality and systems interoperability.
4 http://shex.io/extensions/Map/
5 http://labra.github.io/shaclex/
266 7. COMPARING SHEX AND SHACL
7.20 SUMMARY
• ShEx and SHACL can both be used to validate RDF.
• The expressiveness of ShEx and SHACL for common use cases is similar.
• ShEx is a W3C Community Group specification, while SHACL Core and SHACL-SPARQL
are defined in a W3C Recommendation.
• ShEx is schema-oriented, while SHACL is focused on defining constraints over RDF
graphs.
• ShEx can be used with a compact syntax, a JSON-LD syntax, or any RDF syntax. SHACL
can be used with any RDF syntax, and a draft compact syntax has been proposed.
• ShEx has support for recursion and cyclic data models, while recursion in SHACL is
undefined.
• SHACL has support for arbitrary SPARQL property paths while ShEx has support only
for incoming and outgoing arcs.
• Both ShEx and SHACL support violation reporting at the shape level. For simple shapes,
SHACL can further distinguish the violations per constraint, as well as provide more
violation metadata. SHACL returns the violations in RDF using the Validation Report
vocabulary, while ShEx returns a shape map. The shape map lists all nodes that were
validated, including the ones that pass validation, whereas the SHACL report lists only
the ones that failed.
• ShEx has a language agnostic extension mechanism called semantic actions while SHACL
offers extensibility through SPARQL and JavaScript.
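The difference in reporting style summarized above can be sketched in Python. The data structures are simplified stand-ins: the ShEx result is modeled as a list of node/shape/status entries and the SHACL report as a dictionary with a conforms flag and a list of results. The function names and the sample outcomes are hypothetical.

```python
from typing import Dict, List, Tuple

# Each outcome is (node, shape, passed); the values are made up for illustration.
outcomes: List[Tuple[str, str, bool]] = [
    (":alice", ":User", True),
    (":bob", ":User", False),
    (":carol", ":User", True),
]


def shex_style_result(items: List[Tuple[str, str, bool]]) -> List[Dict[str, str]]:
    """Result shape map: one entry per validated node, passing or failing."""
    return [
        {
            "node": node,
            "shape": shape,
            "status": "conformant" if passed else "nonconformant",
        }
        for node, shape, passed in items
    ]


def shacl_style_report(items: List[Tuple[str, str, bool]]) -> Dict[str, object]:
    """Validation report: a conforms flag plus one result per failure only."""
    results = [
        {"focusNode": node, "sourceShape": shape}
        for node, shape, passed in items
        if not passed
    ]
    return {"conforms": not results, "results": results}


shape_map = shex_style_result(outcomes)
report = shacl_style_report(outcomes)
```

For the three sample nodes, the shape map has three entries while the report contains a single result for the failing node.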
APPENDIX A
WebIndex in ShEx
The following code contains the schema of the WebIndex in ShEx that was described in
Section 6.1.1.
prefix :     <http://example.org/>
prefix sh:   <http://www.w3.org/ns/shacl#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix wf:   <http://data.webfoundation.org#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix qb:   <http://purl.org/linked-data/cube#>
prefix cex:  <http://purl.org/weso/ontology/computex#>
prefix dct:  <http://purl.org/dc/terms/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix org:  <http://www.w3.org/ns/org#>

:Country {
  rdfs:label xsd:string ;
  wf:iso2 LENGTH 2
}
:DataSet { a [ qb:DataSet ] ;
  qb:structure [ wf:DSD ] ;
  rdfs:label xsd:string ;
  qb:slice @:Slice * ;
  dct:publisher @:Organization
}
:Slice { a [ qb:Slice ] ;
  qb:sliceStructure [ wf:sliceByYear ] ;
  qb:observation @:Observation * ;
  cex:indicator @:Indicator
}
:Observation { a [ qb:Observation ] ;
  a [ wf:Observation ] ;
  cex:value xsd:float ;
  rdfs:label xsd:string ? ;
  dct:issued xsd:dateTime ;
  dct:publisher [ wf:WebFoundation ] ? ;
  qb:dataSet @:DataSet ;
  cex:ref-area @:Country ;
  cex:indicator @:Indicator ;
  ( cex:computation @:Computation
  | wf:source IRI
  )
}
:Computation { a [ cex:Computation ] }
:Indicator { a [ wf:PrimaryIndicator wf:SecondaryIndicator ] ;
  rdfs:label xsd:string ;
  wf:provider @:Organization ;
}
:Organization CLOSED EXTRA a { a [ org:Organization ] ;
  rdfs:label xsd:string ;
  foaf:homepage IRI
}
APPENDIX B
WebIndex in SHACL
The following code contains the full version of the WebIndex data in SHACL that was described
in Section 6.1.2.
@prefix :     <http://example.org/> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix wf:   <http://data.webfoundation.org#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix cex:  <http://purl.org/weso/ontology/computex#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix org:  <http://www.w3.org/ns/org#> .

:Country a sh:NodeShape ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path wf:iso2 ;
    sh:datatype xsd:string ;
    sh:length 2 ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:DataSet a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue qb:DataSet ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path qb:structure ;
    sh:hasValue wf:DSD
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path qb:slice ;
    sh:node :Slice ;
  ] ;
  sh:property [
    sh:path dct:publisher ;
    sh:node :Organization ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:Slice a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:hasValue qb:Slice
  ] ;
  sh:property [
    sh:path qb:sliceStructure ;
    sh:hasValue wf:sliceByYear ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path qb:observation ;
    sh:node :Observation ;
  ] ;
  sh:property [
    sh:path cex:indicator ;
    sh:node :Indicator ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
.

:Observation a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:in ( qb:Observation wf:Observation ) ;
    sh:minCount 2
  ] ;
  sh:property [ sh:path rdf:type ;
    sh:minCount 2 ;
    sh:maxCount 2
  ] ;
  sh:property [
    sh:path cex:value ;
    sh:datatype xsd:float ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path rdfs:label ;
    sh:datatype xsd:string ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path dct:issued ;
    sh:datatype xsd:dateTime ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:or (
    [ sh:property [
        sh:path dct:publisher ;
        sh:hasValue wf:WebFoundation ;
      ]
    ]
    [ sh:property [
        sh:path dct:publisher ;
        sh:maxCount 0
      ]
    ]
  ) ;
  sh:property [
    sh:path qb:dataSet ;
    sh:node :DataSet ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path cex:ref-area ;
    sh:node :Country ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path cex:indicator ;
    sh:node :Indicator ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:or (
    [ sh:property [
        sh:path wf:source ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path cex:computation ;
        sh:maxCount 0
      ]
    ]
    [ sh:property [
        sh:path cex:computation ;
        sh:node :Computation ;
        sh:minCount 1 ;
        sh:maxCount 1
      ] ;
      sh:property [
        sh:path wf:source ;
        sh:maxCount 0
      ]
    ]
  )
.
APPENDIX C
ShEx in ShEx
In this annex we include the full code of a ShEx schema that validates ShEx schemas represented
in RDF syntax (ShExR). This code has been adapted from the ShEx specification [81].1
PREFIX sx: <http://www.w3.org/ns/shex#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
BASE <http://www.w3.org/ns/shex#>

<shapeExpr> @<ShapeOr> OR @<ShapeAnd> OR @<ShapeNot> OR @<NodeConstraint>
  OR @<Shape> OR @<ShapeExternal>

<tripleExpression> @<TripleConstraint> OR @<OneOf> OR @<EachOf>
APPENDIX D
SHACL in SHACL
In this appendix we include the SHACL shapes graph that validates SHACL Core shapes graphs.
The version we include here has been edited from the original one1 in an attempt to improve
readability (we replaced the shsh prefix with the empty one and omitted rdfs:seeAlso declarations
and some comments). It is described in Section 6.6.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://www.w3.org/ns/shacl-shacl#> .

:ShapeShape a sh:NodeShape ;
  sh:targetClass sh:NodeShape ;
  sh:targetClass sh:PropertyShape ;
  sh:targetSubjectsOf sh:targetClass , sh:targetNode ,
    sh:targetObjectsOf , sh:targetSubjectsOf ,
    sh:and , sh:class , sh:closed , sh:datatype ,
    sh:disjoint , sh:equals , sh:flags , sh:hasValue ,
    sh:ignoredProperties , sh:in ,
    sh:languageIn , sh:lessThan , sh:lessThanOrEquals ,
    sh:maxCount , sh:maxExclusive , sh:maxInclusive , sh:maxLength , sh:minCount ,
    sh:minExclusive , sh:minInclusive , sh:minLength ,
    sh:node , sh:nodeKind , sh:not ,
    sh:or , sh:pattern , sh:property ,
    sh:qualifiedMaxCount , sh:qualifiedMinCount ,
    sh:qualifiedValueShape , sh:qualifiedValueShapesDisjoint ,
    sh:sparql , sh:uniqueLang , sh:xone ;
  sh:targetObjectsOf sh:node , sh:not , sh:property , sh:qualifiedValueShape ;
  sh:xone ( :NodeShapeShape :PropertyShapeShape ) ;
  sh:property [
    sh:path sh:targetNode ;
    sh:nodeKind sh:IRIOrLiteral ;
  ] ;
  sh:property [
    sh:path sh:targetClass ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetSubjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:targetObjectsOf ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:or (
    [ sh:not [
        sh:class rdfs:Class ;
        sh:or ( [ sh:class sh:NodeShape ]
                [ sh:class sh:PropertyShape ]
              )
      ]
    ]
    [ sh:nodeKind sh:IRI ]
  ) ;
  sh:property [
    sh:path sh:severity ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:message ;
    sh:or ( [ sh:datatype xsd:string ]
            [ sh:datatype rdf:langString ]
          ) ] ;
  sh:property [
    sh:path sh:deactivated ;
    sh:maxCount 1 ;
    sh:in ( true false ) ;
  ] ;
  sh:property [
    sh:path sh:and ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:class ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:closed ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:ignoredProperties ;
    sh:node :ListShape ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path ( sh:ignoredProperties [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:datatype ;
    sh:nodeKind sh:IRI ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:disjoint ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:equals ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:in ;
    sh:maxCount 1 ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:languageIn ;
    sh:maxCount 1 ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path ( sh:languageIn [ sh:zeroOrMorePath rdf:rest ] rdf:first ) ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path sh:lessThan ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:lessThanOrEquals ;
    sh:nodeKind sh:IRI ;
  ] ;
  sh:property [
    sh:path sh:maxCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:maxExclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:maxInclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:maxLength ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:minCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:minExclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:minInclusive ;
    sh:maxCount 1 ;
    sh:nodeKind sh:Literal ;
  ] ;
  sh:property [
    sh:path sh:minLength ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:nodeKind ;
    sh:in ( sh:BlankNode sh:IRI sh:Literal
            sh:BlankNodeOrIRI sh:BlankNodeOrLiteral sh:IRIOrLiteral ) ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:or ;
    sh:node :ListShape ;
  ] ;
  sh:property [
    sh:path sh:pattern ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:flags ;
    sh:datatype xsd:string ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedMaxCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedMinCount ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedValueShape ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:qualifiedValueShapesDisjoint ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:uniqueLang ;
    sh:datatype xsd:boolean ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path sh:xone ;
    sh:node :ListShape ;
  ]
.
1 See Appendix C in https://www.w3.org/TR/shacl/
Bibliography
[1] S. Abiteboul, R. Hull, and V. Vianu, Eds. Foundations of Databases: The Logical Level, 1st
ed. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1995.
[2] S. Abiteboul, I. Manolescu, P. Rigaux, M.-C. Rousset, and P. Senellart. Web Data
Management. Cambridge University Press, 2012. DOI: 10.1017/cbo9780511998225.
[3] D. Allemang and J. Hendler. Semantic Web for the Working Ontologist: Effective Modeling in
RDFS and OWL, 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 2011.
[4] G. Antoniou, P. Groth, F. v. v. Harmelen, and R. Hoekstra. A Semantic Web Primer. The
MIT Press, 2012.
[5] C. Arnaud Le Hors. JSON-LD 1.0: A JSON-based Serialization for Linked Data. W3C
Recommendation, 2014. http://www.w3.org/TR/json-ld/
[6] C. Arnaud Le Hors. RDF Data Shapes Working Group Charter.
http://www.w3.org/2014/data-shapes/charter, 2014.
[8] T. Berners-Lee. Linked-data design issues. W3C design issue document, June 2006.
http://www.w3.org/DesignIssues/LinkedData.html
[9] P. V. Biron and A. Malhotra. XML Schema Part 2: Datatypes 2nd ed. W3C
Recommendation, 2004. http://www.w3.org/TR/xmlschema-2/
[10] DCMI Usage Board. DCMI Metadata Terms.
http://dublincore.org/documents/dcmi-terms/, 2012.
[11] I. Boneva, J. E. Labra Gayo, and E. Prud’hommeaux. Semantics and validation of shapes
schemas for RDF. In International Semantic Web Conference, 2017.
[12] D. Booth. FHIR linked data module.
https://www.hl7.org/fhir/linked-data-module.html, April 2017.
[13] T. Bosch, E. Acar, A. Nolle, and K. Eckert. The role of reasoning for RDF validation. In
Proc. of the 11th International Conference on Semantic Systems, SEMANTICS’15, pages 33–40,
New York, ACM, 2015. DOI: 10.1145/2814864.2814867.
[14] P. Bourhis, J. L. Reutter, F. Suárez, and D. Vrgoč. JSON: Data model, query languages
and schema specification. In Proc. of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium
on Principles of Database Systems, PODS’17, pages 123–135, New York, ACM, 2017. DOI:
10.1145/3034786.3056120.
[15] G. E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):
791–799, 1976. DOI: 10.2307/2286841.
[16] D. Brickley, R. V. Guha, and A. Layman. Resource description framework (RDF)
schemas. https://www.w3.org/TR/1998/WD-rdf-schema-19980409/, 1998.
[17] K. Cagle. SHACL: It’s about time. https://dzone.com/articles/its-about-time,
March 2017.
[18] G. Carothers and A. Seaborne. TRIG: RDF dataset language.
http://www.w3.org/TR/trig/, 2014.
[19] R. Chinnici, J.-J. Moreau, A. Ryman, and S. Weerawarana. Web Services Description
Language (WSDL) Version 2.0 Part 1: Core Language.
https://www.w3.org/TR/wsdl20/, 2007.
[20] J. Clark and M. Makoto, Eds. RELAX NG Specification. OASIS Committee Specification,
2001. http://relaxng.org/spec-20011203.html
[21] K. Clark and E. Sirin. On RDF validation, stardog ICV, and assorted remarks. In RDF
Validation Workshop. Practical Assurances for Quality RDF Data, W3C, Cambridge, MA,
Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[22] C. S. Coen, P. Marinelli, and F. Vitali. Schemapath, a minimal extension to XML Schema
for conditional constraints. In Proc. of the 13th International Conference on World Wide Web,
WWW’04, pages 164–174, New York, ACM, 2004. DOI: 10.1145/988672.988695.
[23] K. Coyle and T. Baker. Dublin core application profiles. Separating validation from
semantics. In RDF Validation Workshop. Practical Assurances for Quality RDF Data, W3C,
Cambridge, MA, Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[24] R. Cyganiak and D. Reynolds. The RDF Data Cube Vocabulary. W3C Recommendation,
2014. https://www.w3.org/TR/vocab-data-cube/
[25] R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1—Concepts and Abstract Syntax. W3C
Recommendation, February 2014. http://www.w3.org/TR/rdf11-concepts/
[26] D. Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema.
W3C Recommendation, 2004.
https://www.w3.org/TR/2004/REC-rdf-schema-20040210/
[27] D. Brickley and R. V. Guha. RDF Schema 1.1. W3C Recommendation, 2014.
http://www.w3.org/TR/rdf-schema/
[28] S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF Mapping Language. W3C
Recommendation, September 2012. http://www.w3.org/TR/r2rml/
[29] D. L. McGuinness and F. V. Harmelen. OWL Web Ontology Language Overview. W3C
Recommendation, 2004. https://www.w3.org/TR/owl-features/
[30] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle.
RML: A generic language for integrated RDF mappings of heterogeneous data. In Proc.
of the 7th Workshop on Linked Data on the Web, April 2014.
http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_01.pdf
[36] I. Ermilov, J. Lehmann, M. Martin, and S. Auer. LODStats: The data web census
dataset. In Proc. of 15th International Semantic Web Conference—Resources Track (ISWC),
2016. DOI: 10.1007/978-3-319-46547-0_5.
[37] D. F. Alvarez, J. E. Labra Gayo, and H. Garcia-Gonzalez, Eds. Inference and Serialization
of Latent Graph Schemata Using ShEx, number 10 in IARIA Series, 2016.
http://thinkmind.org/index.php?view=article&articleid=semapro_2016_4_40_30038
[40] Jose E. Labra Gayo, E. Prud’hommeaux, I. Boneva, S. Staworko, H. Solbrig, and S. Hym.
Towards an RDF validation language based on regular expression derivatives.
http://ceur-ws.org/Vol-1330/paper-32.pdf
[41] R. J. Glushko, Ed. The Discipline of Organizing. The MIT Press, 2013. DOI:
10.1002/bult.2013.1720400108.
[42] C. F. Goldfarb. The SGML Handbook. Oxford University Press, Inc., New York, 1990.
[43] P. Grosso and J. Kosek. Associating Schemas with XML Documents 1.0, 3rd ed. W3C
Working Group Note, October 2012. https://www.w3.org/TR/xml-model/
[44] S. Harris and A. Seaborne. SPARQL 1.1 Query Language. W3C Recommendation, 2013.
http://www.w3.org/TR/sparql11-query/
[45] T. Hartmann, B. Zapilko, J. Wackerow, and K. Eckert. Validating RDF data quality
using constraints to direct the development of constraint languages. In IEEE 10th
International Conference on Semantic Computing (ICSC), pages 116–123, February 2016. DOI:
10.1109/icsc.2016.43.
[46] T. Heath and C. Bizer. Linked Data: Evolving the Web into a Global Data
Space, volume 1. Morgan & Claypool Publishers LLC, February 2011. DOI:
10.2200/s00334ed1v01y201102wbe001.
[47] J. Hebeler, M. Fisher, R. Blace, and A. Perez-Lopez. Semantic Web Programming. Wiley
Publishing, 2009.
[48] P. Hitzler, M. Krötzsch, and S. Rudolph. Foundations of Semantic Web Technologies.
Chapman & Hall/CRC, 2009.
[49] J. Hjelm. Creating the Semantic Web with RDF: Professional Developer’s Guide. Professional
Developer’s Guide Series. Wiley, 2001.
[54] J. E. Labra Gayo. Web semántica: comprendiendo el cambio hacia la Web 3.0. Nebiblo, 2012.
[55] J. E. Labra Gayo and J. M. A. Rodríguez. Validating statistical index data represented in
RDF using SPARQL queries. In RDF Validation Workshop. Practical Assurances for Quality
RDF Data, W3C, Cambridge, MA, Boston, September 2013.
http://www.w3.org/2012/12/rdf-val
[58] J. E. Labra Gayo, E. Prud’hommeaux, H. Solbrig, and I. Boneva. Validating and describing
linked data portals using shapes. http://arxiv.org/abs/1701.08924
[59] T. Lebo, S. Sahoo, and D. McGuinness. PROV-O: The PROV Ontology. W3C
Recommendation, April 2013. http://www.w3.org/TR/prov-o/
[61] F. Maali and J. Erickson, Eds. Data Catalog Vocabulary (DCAT). W3C Recommendation,
2014. https://www.w3.org/TR/vocab-dcat/
[62] W. Martens, F. Neven, M. Niewerth, and T. Schwentick. Bonxai: Combining the simplicity
of DTD with the expressiveness of XML schema. In Proc. of the 34th ACM
SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS’15, pages 145–156,
New York, ACM, 2015. DOI: 10.1145/2745754.2745774.
[64] B. Motik, I. Horrocks, and U. Sattler. Adding integrity constraints to OWL. In
C. Golbreich, A. Kalyanpur, and B. Parsia, Eds., OWL: Experiences and Directions (OWLED),
Innsbruck, Austria, June 6–7, 2007.
[65] M. Murata, D. Lee, M. Mani, and K. Kawaguchi. Taxonomy of XML schema languages
using formal language theory. ACM Transactions on Internet Technology, 5(4):660–704,
November 2005. DOI: 10.1145/1111627.1111631.
[66] M. A. Musen. The protégé project: A look back and a look forward. AI Matters, 1(4):
4–12, June 2015. DOI: 10.1145/2757001.2757003.
[67] T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In
Proc. of the ACM SIGMOD International Conference on Management of Data, SIGMOD’09,
pages 627–640, New York, ACM, 2009. DOI: 10.1145/1559845.1559911.
[68] O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax.
https://www.w3.org/TR/WD-rdf-syntax-971002/, 1997.
[69] O. Lassila and R. R. Swick. Resource Description Framework (RDF) Model and Syntax
Specification. https://www.w3.org/TR/1999/REC-rdf-syntax-19990222/, 1999.
[70] W. OWL Working Group. OWL 2 Web Ontology Language: Document Overview. W3C
Recommendation, October 2009. http://www.w3.org/TR/owl2-overview/
[71] T. B. Passin. Explorer’s Guide to the Semantic Web. Manning Publications Co., Greenwich,
CT, 2004.
[72] P. F. Patel-Schneider. Using description logics for RDF constraint checking and
closed-world recognition. In Proc. of the 29th Conference on Artificial Intelligence, AAAI’15,
pages 247–253. AAAI Press, 2015.
http://dl.acm.org/citation.cfm?id=2887007.2887042
[90] S. Simister and D. Brickley. Simple application-specific constraints for RDF models. In
RDF Validation Workshop. Practical Assurances for Quality RDF Data, W3C, Cambridge,
MA, Boston, September 2013. http://www.w3.org/2012/12/rdf-val
[91] S. Steyskal and K. Coyle. SHACL Use Cases and Requirements. W3C Working Draft,
2016. https://www.w3.org/TR/shacl-ucr/
[92] H. Solbrig and E. Prud’hommeaux. Shape Expressions 1.0 Definition.
http://www.w3.org/Submission/shex-defn/, 2014.
[93] S. Speicher, J. Arwe, and A. Malhotra, Eds. Linked Data Platform 1.0. W3C
Recommendation, 2015. https://www.w3.org/TR/ldp/
[94] S. Staworko, I. Boneva, Jose E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, and H. R.
Solbrig. Complexity and expressiveness of ShEx for RDF. In 18th International
Conference on Database Theory, ICDT, volume 31 of LIPIcs, pages 195–211, Schloss
Dagstuhl—Leibniz-Zentrum fuer Informatik, 2015.
[95] D. Steer and L. Miller. Validating RDF with treehugger and schematron. In
FOAF-Galway. Position paper, 2004.
https://www.w3.org/2001/sw/Europe/events/foaf-galway/papers/pp/validating_rdf/
[96] J. Tao, E. Sirin, J. Bao, and D. L. McGuinness. Integrity constraints in OWL. In Proc. of
the 24th Conference on Artificial Intelligence (AAAI’10), 2010.
[97] N. M. Tim Berners-Lee. The rule of least power.
http://www.w3.org/2001/tag/doc/leastPower, February 2006.
[99] J. C. van Dam, J. J. Koehorst, P. J. Schaap, V. A. Martins dos Santos, and M. Suarez-Diez.
RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource.
Journal of Biomedical Semantics, 6(1):39, October 2015. DOI: 10.1186/s13326-015-0038-9.
[100] E. van der Vlist. Relax NG: A Simpler Schema Language for XML. O’Reilly, Beijing, 2004.
[101] A. Wright, Ed. JSON Schema: A Media Type for Describing JSON Documents. IETF, 2016.
http://json-schema.org/
Index