In this chapter we present a comparison between ShEx and SHACL. The technologies have similar goals and similar features. In fact at the start of the Data Shapes Working Group in 2014, convergence on a unified approach was considered possible. However, this did not happen and as of July 2017 both technologies are maintained as separate solutions.
We start by describing some of the common features that they share, followed by a review of the main differences.
ShEx and SHACL share the same goal: to provide a mechanism for describing and validating RDF data using a high-level language. As a result, they have many features in common.
Consider the following SHACL shapes graph:
:User a sh:NodeShape ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path schema:gender ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:or (
      [ sh:in ( schema:Male schema:Female ) ]
      [ sh:datatype xsd:string ]
    )
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:maxCount 1 ;
    sh:datatype xsd:date
  ] .
This can be expressed in a ShEx schema:
:User IRI {
  schema:name xsd:string ;
  schema:gender [ schema:Male schema:Female ] OR xsd:string ;
  schema:birthDate xsd:date ?
}
Both definitions state that nodes conforming to :User must be IRIs, have exactly one value for the property schema:name that has datatype xsd:string, have exactly one value for the property schema:gender which must be one of (schema:Male schema:Female) or an xsd:string, and optionally have a value for the property schema:birthDate that has datatype xsd:date.
The following SHACL shapes graph describes that nodes conforming to :User have one outgoing property schema:name and one incoming property schema:member from an organization.
:User a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path [ sh:inversePath schema:member ] ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:node :Organization
  ] .

:Organization a sh:NodeShape ;
  sh:property [
    sh:path rdf:type ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:hasValue :Organization
  ] .
The same constraints can be expressed in ShEx as:
:User {
  schema:name xsd:string ;
  ^schema:member @:Organization
}

:Organization {
  a [ :Organization ]
}
Given the following data:
:alice a :User ;              # Passes as a :User
  schema:name "Alice" .

:bob a :User ;                # Fails as a :User
  schema:name "Robert" .

:myCompany a :Organization ;
  schema:member :alice .
Both ShEx and SHACL check that :alice conforms to the :User shape and raise an error for :bob because there is no arc schema:member from a node with shape :Organization pointing to :bob.
ShEx has the | operator to represent “oneOf” while SHACL has sh:xone to represent exactly one.
Imagine that in some domain, a :Product must have a schema:productID with a value that either starts with P (matches the regular expression "^P") or ends with a digit (regular expression "[0-9]$") and is not "P23".
It can be expressed in ShEx as:
:Product ({ schema:productID /^P/i ; } OR
          { schema:productID /[0-9]$/ ; })
  AND NOT { schema:productID [ "P23" ] }
and in SHACL as:
:ProductShape a sh:NodeShape ;
  sh:targetClass :Product ;
  sh:or (
    [ sh:path schema:productID ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:pattern "^P" ; sh:flags "i" ]
    [ sh:path schema:productID ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:pattern "[0-9]$" ]
  ) ;
  sh:not [
    sh:path schema:productID ;
    sh:hasValue "P23"
  ] .
Given the following data:
:p45 a :Product ;             # Passes as a :Product
  schema:productID "P45" .

:x23 a :Product ;             # Passes as a :Product
  schema:productID "X23" .

:p23 a :Product ;             # Fails as a :Product
  schema:productID "P23" .

:xx a :Product ;              # Fails as a :Product
  schema:productID "xx" .
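The combination of OR, AND, and NOT used above can be mirrored in a few lines of Python. This is only a sketch of the intended logic for a single schema:productID value, not how a ShEx or SHACL processor works; the names conforms_to_product, STARTS_WITH_P, and ENDS_WITH_DIGIT are illustrative assumptions.

```python
import re

# Hypothetical helper names; the patterns are taken from the shapes above.
STARTS_WITH_P = re.compile(r"^P", re.IGNORECASE)  # /^P/i, or sh:pattern "^P" with sh:flags "i"
ENDS_WITH_DIGIT = re.compile(r"[0-9]$")           # /[0-9]$/

def conforms_to_product(product_id: str) -> bool:
    """(starts with P OR ends with a digit) AND NOT equal to "P23"."""
    either = bool(STARTS_WITH_P.search(product_id)) or bool(ENDS_WITH_DIGIT.search(product_id))
    return either and product_id != "P23"

# The four productID values from the example data.
results = {pid: conforms_to_product(pid) for pid in ["P45", "X23", "P23", "xx"]}
print(results)
```

The dictionary agrees with the comments in the data: "P45" and "X23" are accepted, while "P23" and "xx" are rejected.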
The design of ShEx emphasized human readability, with a compact grammar that follows traditional language design principles and a compact syntax evolved from Turtle. The specification defines an abstract syntax. The compact syntax (ShExC), a concrete JSON syntax (ShExJ), or any of the concrete syntaxes for RDF may be used to express a ShEx schema.
SHACL uses the RDF abstract syntax and concrete syntaxes directly. The SHACL specification enumerates circa 120 rules that define what constitutes a well-formed SHACL shapes graph. SHACL processors can simply omit ill-formed shapes graphs.
A compact syntax inspired by ShEx has been proposed for a subset of SHACL as a WG Note (see Section 5.18) but it is not mandatory, and compliant SHACL processors are only required to handle the RDF syntax.
As the SHACL compact syntax was inspired by ShExC, they look similar, but there are several semantic differences.
Given the following ShEx schema:
:Product { schema:productId /^[A-R]/ ; schema:productId /^[M-Z]/ ; schema:brand IRI @:Organization* ; schema:purchaseDate xsd:date ? } :Organization { schema:name xsd:string } |
A similar (but not equivalent) representation using SHACL compact syntax is:
:Product { schema:productId xsd:string [1..1] pattern="^[A-R]" . schema:productId xsd:string [1..1] pattern="^[M-Z]" . schema:brand IRI @:Organization [0..*] . schema:purchaseDate xsd:date [0..1] } :Organization { schema:name xsd:string } |
Though the examples look similar on the surface, there are several subtle differences.
The ShEx schema says that there must be two values for the property schema:productId, one matching "^[A-R]" and the other matching "^[M-Z]". In contrast, the SHACL shapes graph says that there is only one property schema:productId, which must satisfy both regular expressions.
Given the following RDF data:
:p1 a :Product ;              # Passes as a :Product using ShEx
  schema:productId "AB" ;     # Fails as a :Product using SHACL
  schema:productId "XY" ;
  schema:brand :myBrand .

:p2 a :Product ;              # Fails as a :Product using ShEx
  schema:productId "MON" ;    # Passes as a :Product using SHACL
  schema:brand :myBrand .

:myBrand schema:name "MyBrand" .
Node :p1 conforms to the ShEx definition of :Product but does not conform to the SHACL one because the constraints on schema:productId are not satisfied (both must be satisfied). Node :p2 does not conform to ShEx because it has only one schema:productId, but it conforms to SHACL because it satisfies all constraints.
The RDF vocabulary of ShEx is also different from SHACL.
The RDF representation of Example 163 in ShEx is:
:Product a sx:Shape ; sx:expression [ a sx:EachOf ; sx:expressions ( [ a sx:TripleConstraint ; sx:predicate schema:productId ; sx:valueExpr [ a sx:NodeConstraint ; sx:pattern "^[A-R]" ] ] [ a sx:TripleConstraint ; sx:predicate schema:productId ; sx:valueExpr [ a sx:NodeConstraint ; sx:pattern "^[M-Z]" ] ] [ a sx:TripleConstraint ; sx:predicate schema:brand ; sx:min 0; sx:max -1; sx:valueExpr [ a sx:ShapeAnd ; sx:expressions ( [ a sx:NodeConstraint; sx:nodeKind sx:iri ] :Organization ) ] ] [ a sx:TripleConstraint ; sx:predicate schema:purchaseDate ; sx:min 0 ; sx:max 1 ; sx:valueExpr [ a sx:NodeConstraint ; sx:datatype xsd:date ] ] ) ] . |
Here is the RDF encoding of the SHACL shapes graph in Example 163:
:Product a sh:NodeShape ; sh:property [ sh:path schema:productId ; sh:minCount 1 ; sh:maxCount 1 ; sh:pattern "^[A-R]" ; ]; sh:property [ sh:path schema:productId ; sh:minCount 1 ; sh:maxCount 1 ; sh:pattern "^[M-Z]" ; ]; sh:property [ sh:path schema:brand ; sh:nodeKind sh:IRI ; sh:node :Organization ]; sh:property [ sh:path schema:purchaseDate ; sh:maxCount 1 ; sh:datatype xsd:date ] . |
Although both languages share a common goal, their designs are based on different approaches.
The designers of ShEx intended the language to be like a grammar or schema for RDF graphs. This design was inspired by languages such as Yacc, RelaxNG, and XML Schema. The main goal was to describe RDF graph structures so they could be validated against those descriptions.
In contrast, the designers of SHACL aimed at providing a constraint language for RDF. The main goal of SHACL is to verify that a given RDF graph satisfies a collection of constraints. In this sense, SHACL follows the Schematron approach, applied to RDF: it declares constraints that RDF graphs must fulfill. Just as Schematron relies strongly on XPath, SHACL relies strongly on SPARQL.
This difference is reflected in how validation results fit in. ShEx implementations usually construct a data structure representing the RDF graph that was validated, containing the nodes and shapes that were matched. After ShEx validation, the result shape map contains a structure which can be considered as an annotated graph that can be traversed or used for further actions, such as transforming RDF graphs into other data structures. This structure is analogous to the Post Schema Validation Infoset from XML Schema (see Section 3.1.3).
In contrast, SHACL describes in detail the errors returned when constraints are not satisfied.
A SHACL validation report (see Section 5.5) can be very useful for detecting and repairing errors in RDF graphs.
When there are no errors, SHACL processors usually report a single value, sh:conforms true.
With SHACL, it can be difficult for users to distinguish the case in which a node
is valid because it was checked against some shape, versus the case in which a node
is not valid but was ignored by the SHACL processor because it was not reached during the validation process.
The SHACL recommendation prescribes a basic structure for each violation result but does not prescribe what information is to be returned when a node is validated. Nevertheless, SHACL processors can enrich their results. Shaclex, for example, returns information about the nodes validated.
SHACL shapes can include target declarations that associate each shape with a set of RDF nodes and tell SHACL processors how to trigger the validation process (see Section 5.7).
Consider the following SHACL shapes graph:
:UserShape a sh:NodeShape ; sh:targetClass :User ; sh:targetObjectsOf schema:member ; sh:targetSubjectsOf schema:familyName ; sh:targetNode :alice ; sh:property [ sh:path schema:name ; sh:datatype xsd:string ; sh:minCount 1 ; sh:maxCount 1 ] . |
and the following RDF graph:
:alice schema:name "Alice" . :bob a :User ; schema:name "Robert" . :myCompany schema:member :carol . :carol schema:name "Carol" . :dave schema:familyName "Smith" ; schema:name "Dave Smith" . |
A SHACL processor checks that :alice, :bob, :carol, and :dave conform to :UserShape.
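How the four target declarations select focus nodes can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the graph is a set of (subject, predicate, object) tuples of plain strings, and the function names subclasses_of and focus_nodes are hypothetical, not part of any SHACL library.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# The example data graph as (subject, predicate, object) tuples.
data = {
    (":alice", "schema:name", "Alice"),
    (":bob", RDF_TYPE, ":User"),
    (":bob", "schema:name", "Robert"),
    (":myCompany", "schema:member", ":carol"),
    (":carol", "schema:name", "Carol"),
    (":dave", "schema:familyName", "Smith"),
    (":dave", "schema:name", "Dave Smith"),
}

def subclasses_of(graph, cls):
    """Return cls plus every class reachable through rdfs:subClassOf links."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found

def focus_nodes(graph, target_class=None, objects_of=None, subjects_of=None, target_nodes=()):
    """Union of the nodes selected by each kind of target declaration."""
    nodes = set(target_nodes)                      # sh:targetNode
    if target_class is not None:                   # sh:targetClass
        classes = subclasses_of(graph, target_class)
        nodes |= {s for s, p, o in graph if p == RDF_TYPE and o in classes}
    if objects_of is not None:                     # sh:targetObjectsOf
        nodes |= {o for s, p, o in graph if p == objects_of}
    if subjects_of is not None:                    # sh:targetSubjectsOf
        nodes |= {s for s, p, o in graph if p == subjects_of}
    return nodes

targets = focus_nodes(data, ":User", "schema:member", "schema:familyName", [":alice"])
print(sorted(targets))
```

Each node is selected by a different declaration: :alice by the explicit node target, :bob by its class, :carol as an object of schema:member, and :dave as a subject of schema:familyName.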
Directly associating target declarations to shapes can become quite verbose
(see Section 6.6).
At the same time, it can limit the reusability of a shape in other contexts.
In the example above, if we import :UserShape in another context where the node :alice represents a product instead of a user, the SHACL processor will still try to validate the node with that shape. To avoid such cases, SHACL provides the sh:deactivated directive (see Section 106).
While including the target declarations in the schema is a convenient way to trigger validation, it can be considered an anti-pattern because the shape can’t be reused for other data.
Even though this could work in some closed systems, it is impractical for data in open environments.
In the interest of keeping schemas reusable, it is a good practice in SHACL to place target declarations in a separate file and link this file to the schema with owl:imports.
A ShEx schema declares a constellation of shape expressions that function as a grammar against which RDF nodes can be tested. The schema itself provides no mechanism for associating a shape expression with the nodes to which the schema applies. In the interest of making schemas reusable, ShEx requires that definitions of shapes be decoupled from their application to particular RDF graphs. ShEx separates the language of schemas, on the one hand, from the association of shapes with nodes to be validated, on the other, by introducing the notion of shape maps (see Section 4.9 for more details). This separation of concerns encourages the community to innovate on node-shape association mechanisms independently from the validation semantics. For example, though the shape map specification currently only supports RDF nodes by direct reference or by triple pattern, Wikidata versions of ShEx include support for SPARQL queries over remote endpoints. As such conventions evolve they can be rolled into future versions of the shape map specification.
The SHACL shapes graph from Example 165 can be expressed in ShEx with the following query shape map:
{ FOCUS rdf:type :User }@:UserShape, { _ schema:member FOCUS }@:UserShape, { FOCUS schema:familyName _ }@:UserShape, :alice @:UserShape |
and removing the target declarations from the shape definition:
:UserShape { schema:name xsd:string } |
The declarations above behave similarly to the SHACL target declarations.
One subtle difference is that, while in the previous case ShEx only checks direct instances of :User, SHACL applies the concept of SHACL instance, which also encompasses instances of subclasses of :User.
This possibility can be expressed using property paths in shape maps as:
{ FOCUS rdf:type/rdfs:subClassOf* :User }@:UserShape |
Another notable difference between SHACL target node declarations and ShEx shape maps is the following: when a declared target node in SHACL does not exist in the data graph and there are no required values for this node in the shape, the node passes the validation. In ShEx, if the node does not exist, validation always results in a failure, regardless of the shape definition.
SHACL leverages the property owl:imports to enable a shapes graph to import other shapes graphs. This mechanism, which can be used to provide the basis of a modular design, is described in Section 5.4. ShEx has the concept of shapeExternal to declare that the contents of a shape can be obtained from an external source (see Section 4.7.3). ShEx also has a basic import mechanism which allows a schema to dereference another schema (see Section 4.12). One difference between the two import mechanisms is that ShEx dereferences the imported schema, while SHACL performs a graph merge, so in SHACL the system expects to have already fetched all of the relevant shapes graphs.
Both languages support the reuse of shapes through extending a shape with an AND operator, as described in Section 4.8.1 (ShEx) and Section 127 (SHACL). As a simple example, the following ShEx schema declares a :Product shape and a :SoldProduct shape:
:Product { schema:productId xsd:string ; schema:price xsd:decimal } :SoldProduct @:Product AND { schema:purchaseDate xsd:date ; schema:productId /^[A-Z]/ } |
A :SoldProduct has the same constraints as the :Product plus two more constraints: one that further restricts the property schema:productId and another one that requires a new property schema:purchaseDate.
Here is an analogous SHACL shapes graph:
:Product a sh:NodeShape; sh:property [ sh:path schema:productId ; sh:datatype xsd:string ; sh:minCount 1 ; sh:maxCount 1 ; ]; sh:property [ sh:path schema:price ; sh:datatype xsd:decimal ; sh:minCount 1 ; sh:maxCount 1 ; ]. :SoldProduct a sh:NodeShape; sh:and ( :Product [ sh:path schema:purchaseDate ; sh:datatype xsd:date ; sh:minCount 1 ; sh:maxCount 1 ; ] [ sh:path schema:productId ; sh:pattern "^[A-Z]" ; sh:minCount 1 ; sh:maxCount 1 ; ] ) . |
Another way to reuse shapes in SHACL is by leveraging the subclass relationship and the corresponding target declarations. The example above could be expressed as:
:Product a sh:NodeShape, rdfs:Class ;
  sh:property [
    sh:path schema:productId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ; sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:price ;
    sh:datatype xsd:decimal ;
    sh:minCount 1 ; sh:maxCount 1
  ] .

:SoldProduct a sh:NodeShape, rdfs:Class ;
  rdfs:subClassOf :Product ;
  sh:property [
    sh:path schema:purchaseDate ;
    sh:datatype xsd:date ;
    sh:minCount 1 ; sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:productId ;
    sh:pattern "^[A-Z]" ;
    sh:minCount 1 ; sh:maxCount 1
  ] .
In this approach, :SoldProduct is declared as a subclass of :Product. The rdfs:Class declaration establishes that all nodes of rdf:type :SoldProduct must conform to shape :SoldProduct and also to :Product. One limitation of this approach is that it requires nodes to have the appropriate rdf:type declaration and requires rdfs:subClassOf statements to be kept in the data graph.
The reusability of both languages could be improved. For example, there is no notion of a module, where one might declare internal or hidden shapes, or of public shapes that could be imported by other modules. Also, there is no notion of a shape extending another shape, inheriting some properties and redefining others. Such features could potentially be developed for both languages.
ShEx is only concerned with RDF graphs as they are presented to the validator. There is no interaction between the ShEx processor and any inference mechanism. In this way, ShEx can be used before or after inference. It can even be used to validate the behavior of an inference engine if one defines the shapes that an RDF graph must have before and after inference (see an example in Section 4.11).
In contrast, SHACL has some mechanisms that may interact with inference.
For example, the implicit class target (see Section 5.7.3), which associates a shape with a class, triggers validation on all nodes that are SHACL instances. The notion of SHACL instance is different from the RDF Schema notion of instance because it encompasses instances of a class plus its subclasses (as determined by following rdfs:subClassOf links in the data), but does not take into account all RDFS elements. The results of applying a SHACL validator may be different if applied to RDF graphs before or after RDFS inference. As SHACL processors are not required to support full RDFS inference, they may ignore other RDFS predicates, such as rdfs:domain, rdfs:range, and sub-properties of rdfs:subClassOf.
For example, consider the following SHACL shape:
:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;
    sh:datatype xsd:string
  ] .
and the following RDF data:
:Teacher rdfs:subClassOf :User .
:teaches rdfs:domain :Teacher .

:frank :teaches :Algebra ;    # Ignored without RDFS inference
  schema:name "Frank" .       # Passes as a :UserShape with RDFS inference

:grace :teaches :Logic ;      # Ignored without RDFS inference
  schema:name 34 .            # Fails as a :UserShape with RDFS inference

:oscar a :Teacher ;           # Fails as a :UserShape
  schema:name 45 .
If SHACL is applied after RDFS inference, the system checks whether :frank and :grace conform to :UserShape. This is because the domain declaration of :teaches allows RDFS to infer that they are instances of :Teacher and, hence, instances of :User, with the following results:
:grace has a value for schema:name that is not an xsd:string.
:oscar has a value for schema:name that is not an xsd:string.
In contrast, if SHACL is applied without RDFS inference, the system returns only one error:
:oscar has a value for schema:name that is not an xsd:string.
The system does not check :frank or :grace against shape :UserShape because it only follows rdf:type and rdfs:subClassOf declarations. In the absence of RDFS inference, the system only checks that :oscar has shape :UserShape. If SHACL is applied after RDFS inference, the system checks the additional nodes.
This interference between SHACL and RDFS semantics hampers the use of SHACL to validate an inference system, as in the use case described for ShEx in Example 21.
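The effect of inference on class targets can be reproduced with a short Python sketch. It makes several simplifying assumptions: only the rdfs:domain rule is applied (not full RDFS), xsd:string is approximated by Python's str, and the function names apply_domain_rule, class_targets, and violations are hypothetical.

```python
RDF_TYPE, SUBCLASS, DOMAIN = "rdf:type", "rdfs:subClassOf", "rdfs:domain"

# The example data; non-string names are modelled as Python integers.
data = {
    (":Teacher", SUBCLASS, ":User"),
    (":teaches", DOMAIN, ":Teacher"),
    (":frank", ":teaches", ":Algebra"),
    (":frank", "schema:name", "Frank"),
    (":grace", ":teaches", ":Logic"),
    (":grace", "schema:name", 34),
    (":oscar", RDF_TYPE, ":Teacher"),
    (":oscar", "schema:name", 45),
}

def apply_domain_rule(graph):
    """Add the rdf:type triples entailed by rdfs:domain (a single RDFS rule)."""
    domains = {(s, o) for s, p, o in graph if p == DOMAIN}
    inferred = {(s, RDF_TYPE, cls)
                for s, p, o in graph
                for prop, cls in domains if p == prop}
    return graph | inferred

def class_targets(graph, cls):
    """Nodes typed with cls or one of its subclasses (SHACL instances)."""
    classes, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == SUBCLASS and o == current and s not in classes:
                classes.add(s)
                todo.append(s)
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}

def violations(graph, nodes):
    """Nodes with no schema:name or with a schema:name that is not a string."""
    bad = set()
    for node in nodes:
        names = [o for s, p, o in graph if s == node and p == "schema:name"]
        if not names or any(not isinstance(n, str) for n in names):
            bad.add(node)
    return bad

without_inference = violations(data, class_targets(data, ":User"))
inferred_graph = apply_domain_rule(data)
with_inference = violations(inferred_graph, class_targets(inferred_graph, ":User"))
print(sorted(without_inference), sorted(with_inference))
```

Without the domain rule only :oscar is selected and reported; after applying it, :frank and :grace are also selected, and :grace is reported as well.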
The property sh:entailment can be used to declare that SHACL processors should add inferred triples to the data graph during validation, following the inference rules declared by a given entailment regime (see Section 5.17).
Nevertheless, SHACL processors are not required to support entailment regimes.
If a shapes graph declares an entailment and the processor does not support it, a failure must be signalled.
As pointed out above, SHACL puts more emphasis on validation and provides a dedicated RDF vocabulary for describing conformance and reporting detailed violation results.
For every focus node that does not conform to a shape, an instance of sh:ValidationResult is created in the SHACL results graph.
Each violation result links back to the focus node along with metadata, which includes the shape IRI, human readable messages, the failed constraint, the path, and (when available) the value node.
The severity level declared for a SHACL shape (sh:Info, sh:Warning, or sh:Violation) can be included in the violation result (see Section 5.6.5).
ShEx does not have rich violation reporting, but it can provide related functionality. The result of the validation process is a shape map which contains information about the nodes that conform to a shape or not. Every violation can be viewed as an entry showing the focus node and the shape that failed. ShEx processors usually enrich these entries with further information. As shapes in ShEx can contain arbitrary annotations (see Section 4.7.5), these annotations can be included in the results.
In simple and top-level shape definitions, SHACL provides richer and more granular violation reporting for each individual constraint that failed. However, violations of nested constraints formed using sh:node, sh:and, sh:or, sh:xone, or sh:qualifiedValueShape report only which nested constraint failed (“sh:node failed”) without detailing why. Implementations could report that information by means of the sh:detail property, but that would be an implementation-dependent feature.
Also, as a result of validation ShEx produces a Result Map associating nodes
with shapes (either validated or non-validated) while SHACL has no comparable feature.
If no cardinality is declared, ShEx assumes the cardinality to be {1,1} while SHACL assumes {0,*}. The following ShEx schema declares that nodes conforming to :UserShape must have one schema:name and one schema:givenName.
:UserShape { schema:name xsd:string ; schema:givenName xsd:string ; } |
The following SHACL shapes graph declares that if there is a schema:name then it must have datatype xsd:string, and the same for schema:givenName:
:UserShape a sh:NodeShape ; sh:property [ sh:path schema:name ; sh:datatype xsd:string ; ] ; sh:property [ sh:path schema:givenName ; sh:datatype xsd:string ; ] . |
Given the following data:
:alice schema:name "Alice Cooper" ;  # Passes as a :UserShape - ShEx
  schema:givenName "Alice" .         # Passes as a :UserShape - SHACL

:bob schema:givenName "Robert" ;     # Fails as a :UserShape - ShEx
  foaf:age 23 .                      # Passes as a :UserShape - SHACL

:carol schema:name 345 ;             # Fails as a :UserShape - ShEx
  schema:givenName 346 .             # Fails as a :UserShape - SHACL
The difference in results is based on the difference between the ShEx and SHACL points of view. In ShEx, a triple expression makes explicit which triples involving the focus node should be found in the graph, and specifying a cardinality may require several such triples. The absence of cardinality means one triple. In SHACL, a shape is a conjunction of constraints. A cardinality constraint is used to constrain the number of allowed triples of a given kind, and the absence of cardinality means no constraint on the number of triples allowed.
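The two default cardinalities can be compared with a small Python sketch. It is a simplification: each node is a dictionary from property to a list of values, xsd:string is approximated by Python's str, and the names count_ok, conforms, SHEX_DEFAULT, and SHACL_DEFAULT are assumptions made for illustration.

```python
SHEX_DEFAULT = (1, 1)      # ShEx: no cardinality means exactly one triple
SHACL_DEFAULT = (0, None)  # SHACL: no cardinality means no limit (None = unbounded)

# The example data, one dictionary of property values per node.
data = {
    ":alice": {"schema:name": ["Alice Cooper"], "schema:givenName": ["Alice"]},
    ":bob": {"schema:givenName": ["Robert"], "foaf:age": [23]},
    ":carol": {"schema:name": [345], "schema:givenName": [346]},
}

def count_ok(count, minimum, maximum):
    """True if count lies within [minimum, maximum]; maximum None is unbounded."""
    return count >= minimum and (maximum is None or count <= maximum)

def conforms(node, default):
    """Check schema:name and schema:givenName under the given default cardinality."""
    for prop in ("schema:name", "schema:givenName"):
        values = data[node].get(prop, [])
        if not count_ok(len(values), *default):
            return False
        if not all(isinstance(v, str) for v in values):
            return False
    return True

shex = {node: conforms(node, SHEX_DEFAULT) for node in data}
shacl = {node: conforms(node, SHACL_DEFAULT) for node in data}
print(shex)
print(shacl)
```

Only :bob differs between the two runs: the missing schema:name violates the {1,1} default but satisfies {0,*}.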
SHACL property shapes can use a subset of SPARQL 1.1 property paths as values for sh:path. In this way, SHACL leverages the expressiveness of SPARQL property paths to define constraints.
ShEx does not support arbitrary property paths—only direct and inverse predicates. However, it is easy to emulate this SHACL behavior using nested shapes or recursion.
The following SHACL declaration:
:GrandParent a sh:NodeShape ; sh:property [ sh:path [ sh:zeroOrMorePath schema:knows] ; sh:class :Person ; ] ; sh:property [ sh:path (schema:child schema:child ) ; sh:minCount 1 ; sh:class :GrandChild ; ] . |
can be defined in ShEx as:
:GrandParent { schema:knows @:PersonKnown*; schema:child { schema:child { a [ :GrandChild ] } } } :PersonKnown { a [ :Person ] ; schema:knows @:PersonKnown* } |
ShEx supports the definition of cyclic data models with recursive shapes (see Section 4.7.2) while the processing of recursive shapes is undefined in SHACL (see Section 5.12.1). However, some recursion cases can be handled in SHACL through SHACL property paths.
The following shape declares a recursive :UserShape as:
:UserShape IRI {
  schema:knows @:UserShape*
}
Nodes that conform to :UserShape must be IRIs and can have zero or more schema:knows arcs whose values must all conform to :UserShape.
A direct translation to SHACL would be:
:UserShapeRecursion a sh:NodeShape ;   # This definition is recursive
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:knows ;
    sh:node :UserShapeRecursion
  ] .
However, recursion in SHACL is undefined and not all SHACL processors may handle that definition in the same way. The specification leaves recursion as an implementation-dependent feature.
One possible solution is to add target declarations to the shape to trigger the validation against them.
A typical solution is to use rdf:type declarations as we saw in Section 5.12.1. In this case, we could also use sh:targetSubjectsOf like:
:UserShapeRecursion a sh:NodeShape ; sh:targetSubjectsOf schema:knows ; sh:nodeKind sh:IRI ; sh:property [ sh:path schema:knows ; sh:class :User ] . |
Now, every node that is a subject of schema:knows must conform to that shape. This solution may not be realistic in general. In this case, for example, we are forcing every node that is a subject of schema:knows to conform to :UserShape and in other contexts, this could be too restrictive. The same situation happens if we use sh:targetClass declarations.
Another approach to emulate recursive behavior is to use property paths. For example:
:UserShape a sh:NodeShape ; sh:property [ sh:path [ sh:zeroOrMorePath schema:knows] ; sh:nodeKind sh:IRI ; ] . |
In this case, every node that is related by property schema:knows zero or more times with the focus node must be an IRI. With this solution, there may be other nodes that are subjects of schema:knows but do not need to conform to :UserShape.
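What a zero-or-more path selects can be sketched in Python as a reachability computation. This illustrates the idea only and is not SHACL processor code: the graph is a set of string triples, IRIs are assumed to be strings starting with ":" and blank nodes strings starting with "_:", and the names zero_or_more, is_iri, and conforms are hypothetical.

```python
# A small cyclic graph, plus one node that knows a blank node.
graph = {
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:knows", ":carol"),
    (":carol", "schema:knows", ":alice"),   # a cycle: the traversal must still terminate
    (":dave", "schema:knows", "_:b1"),      # a blank node is not an IRI
}

def zero_or_more(graph, start, predicate):
    """Nodes reachable from start through zero or more predicate arcs."""
    reached, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == predicate and o not in reached:
                reached.add(o)
                todo.append(o)
    return reached

def is_iri(node):
    """Assumption of this sketch: IRIs are written as strings starting with ':'."""
    return isinstance(node, str) and node.startswith(":")

def conforms(node):
    """Every node reachable via schema:knows* (including the focus node) is an IRI."""
    return all(is_iri(n) for n in zero_or_more(graph, node, "schema:knows"))

print(conforms(":alice"), conforms(":dave"))
```

Because visited nodes are remembered, the cycle between :alice, :bob, and :carol does not cause an endless loop, which is the practical reason a path-based formulation can stand in for some recursive definitions.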
In Section 5.12.1, we described more advanced alternatives for using SHACL property paths as an alternative to recursion.
Property pair constraints in SHACL can be used to compare current values with values from another path, checking if they are equal, different or less than them (see Section 5.14).
ShEx 2.0 does not have the concept of property pair constraints, though this possibility is being studied to be included in future versions.
The following shapes graph declares that nodes conforming to :UserShape must fulfil the constraint that schema:givenName is equal to foaf:firstName and different from schema:lastName, and that schema:birthDate must be less than :loginDate.
:UserShape a sh:NodeShape ; sh:property [ sh:path schema:givenName ; sh:datatype xsd:string ; sh:disjoint schema:lastName ; sh:minCount 1; sh:maxCount 1; ] ; sh:property [ sh:path foaf:firstName ; sh:equals schema:givenName ; sh:minCount 1; sh:maxCount 1; ] ; sh:property [ sh:path schema:birthDate ; sh:datatype xsd:date ; sh:lessThan :loginDate ; sh:minCount 1; sh:maxCount 1; ] . |
The previous example could be written in a future version of ShEx as:
:UserShape {                  # Not supported in ShEx 2.0
  $<givenName> schema:givenName xsd:string ;
  $<firstName> schema:firstName xsd:string ;
  $<birthDate> schema:birthDate xsd:date ;
  $<loginDate> :loginDate xsd:date ;
  $<givenName> = $<firstName> ;
  $<givenName> != $<lastName> ;
  $<birthDate> < $<loginDate>
}
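The three SHACL property pair constraints compare sets of values, which a Python sketch can make concrete. It covers only sh:equals, sh:disjoint, and sh:lessThan (not the datatype or cardinality constraints of the shape), represents a node as a dictionary from property to a list of values, and uses hypothetical names such as values, less_than, and pair_constraints_ok.

```python
from datetime import date

def values(node, path):
    """The set of values of a property for one node."""
    return set(node.get(path, []))

def equals(node, path, other):
    """sh:equals: both properties have exactly the same set of values."""
    return values(node, path) == values(node, other)

def disjoint(node, path, other):
    """sh:disjoint: the two properties share no value."""
    return values(node, path).isdisjoint(values(node, other))

def less_than(node, path, other):
    """sh:lessThan: every value of path is smaller than every value of other."""
    return all(a < b for a in values(node, path) for b in values(node, other))

def pair_constraints_ok(node):
    """The three pair constraints declared in the shapes graph above."""
    return (equals(node, "foaf:firstName", "schema:givenName")
            and disjoint(node, "schema:givenName", "schema:lastName")
            and less_than(node, "schema:birthDate", ":loginDate"))

# Two illustrative nodes (not taken from the text).
alice = {
    "schema:givenName": ["Alice"], "foaf:firstName": ["Alice"],
    "schema:lastName": ["Cooper"],
    "schema:birthDate": [date(1990, 5, 1)], ":loginDate": [date(2017, 7, 1)],
}
bob = dict(alice, **{"foaf:firstName": ["Robert"], "schema:givenName": ["Bob"]})

print(pair_constraints_ok(alice), pair_constraints_ok(bob))
```

The node alice satisfies all three comparisons, while bob fails the equality check because its foaf:firstName and schema:givenName differ.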
One constraint often required is the ability to declare unique keys.
Unique keys are combinations of values that must be unique in a given scope.
The scope can be the entire graph or a focus node.
One example of a unique constraint for an entire graph is to require that there be no pair of identical values for the properties schema:givenName and schema:lastName. One example of a unique constraint with a focus node scope would be to require that each node not have two values of rdfs:label with the same language tag.
Neither SHACL nor ShEx 2.0 supports unique keys in general, although they are supported by OWL 2. SHACL Core offers the sh:uniqueLang constraint to say that there can be no more than one literal for each language tag (see Section 124). Other constraints can be defined using SHACL-SPARQL. In the case of ShEx, there is a proposal to add a UNIQUE keyword to the language, with the scope and the list of predicates that must be unique as parameters.
:UserShape {                  # Not supported in ShEx 2.0
  schema:givenName xsd:string ;
  schema:lastName xsd:string ;
  UNIQUE(schema:givenName, schema:lastName)
}
ShEx allows a shape to define multiple constraints on triples that involve the focus node and share the same property. This feature is called repeated properties, as explained in Section 4.6.7. In SHACL, repeated properties behave conjunctively, which means that all constraints applied to properties with the same sh:path must be satisfied.
The typical SHACL pattern of:
:Shape a sh:NodeShape ;
  sh:property [
    sh:path :p1 ;
    # ...constraints on :p1...
  ] ;
  sh:property [
    sh:path :p2 ;
    # ...constraints on :p2...
  ] ;
  ...
must be changed if we want :p1 and :p2 to be the same property, only with different values.
A direct translation of that pattern to:
:Shape a sh:NodeShape ;
  sh:property [
    sh:path :p ;
    # ...constraints on :p...
  ] ;
  sh:property [
    sh:path :p ;
    # ...other constraints on :p...
  ] ;
  ...
means that all constraints apply to the path :p conjunctively.
The following ShEx schema declares that a :Person has two parents, one with the value of :isMale true and the other with the value :isFemale true.
:Person {
  schema:parent { :isMale [ true ] } ;
  schema:parent { :isFemale [ true ] }
}
A direct translation of the ShEx schema into SHACL would be:
:Person a sh:NodeShape; sh:property [ sh:path schema:parent; sh:node [ sh:property [ sh:path :isMale ; sh:hasValue true ; sh:maxCount 1 ] ] ]; sh:property [ sh:path schema:parent; sh:node [ sh:property [ sh:path :isFemale ; sh:hasValue true ; sh:maxCount 1 ] ] ] . |
However, this SHACL shapes graph would only be satisfied by a node whose schema:parent value is both male and female.
:alice a :Person ;
  schema:parent :bob ;        # Passes as a :Person in ShEx
  schema:parent :carol .      # Fails as a :Person in SHACL

:bob :isMale true .
:carol :isFemale true .

:dave a :Person ;
  schema:parent :x .          # Fails as a :Person in ShEx
                              # Passes as a :Person in SHACL
:x :isMale true ;
  :isFemale true .
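The two readings of repeated properties can be contrasted in Python. This sketch is a deliberate simplification and not an implementation of either language: the ShEx reading is modelled as assigning each schema:parent value to exactly one of the two constraints, the direct SHACL translation as requiring every value to satisfy every constraint, and all names (shex_style, shacl_style, CONSTRAINTS) are hypothetical.

```python
from itertools import permutations

# Which flags are true for each parent node, and who the parents of each person are.
flags = {":bob": {":isMale"}, ":carol": {":isFemale"}, ":x": {":isMale", ":isFemale"}}
parents = {":alice": [":bob", ":carol"], ":dave": [":x"]}

def is_male(node):
    return ":isMale" in flags.get(node, set())

def is_female(node):
    return ":isFemale" in flags.get(node, set())

CONSTRAINTS = [is_male, is_female]  # one entry per repeated schema:parent constraint

def shex_style(person):
    """Each constraint must be matched by its own, distinct schema:parent value."""
    values = parents[person]
    if len(values) != len(CONSTRAINTS):
        return False
    return any(all(check(value) for check, value in zip(CONSTRAINTS, ordering))
               for ordering in permutations(values))

def shacl_style(person):
    """Conjunctive reading: every schema:parent value must satisfy every constraint."""
    return all(check(value) for value in parents[person] for check in CONSTRAINTS)

for person in parents:
    print(person, shex_style(person), shacl_style(person))
```

The outcomes agree with the comments in the data: :alice passes only under the ShEx reading, and :dave passes only under the conjunctive SHACL reading.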
As described in Section 5.12.2, repeated properties can be handled in SHACL using sh:qualifiedValueShape but the definitions are more verbose.
The following declaration handles the previous example using qualified value shapes.
:Person a sh:NodeShape; sh:property [ sh:path schema:parent ; sh:qualifiedValueShape [ sh:path :isMale ; sh:hasValue true ] ; sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1 ; ]; sh:property [ sh:path schema:parent ; sh:qualifiedValueShape [ sh:path :isFemale ; sh:hasValue true ] ; sh:qualifiedMinCount 1 ; sh:qualifiedMaxCount 1 ; ] ; sh:property [ sh:path schema:parent; sh:minCount 2; sh:maxCount 2 ] . |
Note that this approach requires stating the number of repeated property values allowed (in this case, 2).
Data coherence minimizes defensive programming by providing predictable, logical data structures that must be used. To take a trivial example, a data structure may offer a choice between different representations of a name as in Example 55 (for ShEx) and the corresponding Example 131 (for SHACL).
Let’s change the constraint to require a combination of foaf:firstName and foaf:lastName, or foaf:givenName and foaf:familyName, or schema:givenName and schema:familyName, where none of these properties can be mixed with the others. In ShEx, this can be declared as:
:Person { foaf:firstName . ; foaf:lastName . | foaf:givenName . ; foaf:familyName . | schema:givenName . ; schema:familyName . } |
Given the following data, :alice and :bob conform to :Person while :carol and :dave do not. In the case of :dave, validation fails because the data meets one side of the disjunction and has some properties from the other side.
:alice foaf:firstName "Alice" ;       # Passes as a :Person
       foaf:lastName "Cooper" .

:bob schema:givenName "Robert" ;      # Passes as a :Person
     schema:familyName "Smith" .

:carol foaf:firstName "Carol" ;       # Fails as a :Person
       foaf:lastName "King" ;
       schema:givenName "Carol" ;
       schema:familyName "King" .

:dave foaf:firstName "Dave" ;         # Fails as a :Person
      foaf:lastName "Clark" ;
      schema:givenName "Dave" .
A first attempt to model the example in SHACL could be:
:PersonShape a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:xone (
    [ sh:property [ sh:path foaf:firstName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:lastName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
    [ sh:property [ sh:path foaf:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
    [ sh:property [ sh:path schema:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
  ) .
However, this SHACL shapes graph has a meaning different from the ShEx schema.
In this case, :dave conforms to :Person because it matches exactly one of the shapes (it has foaf:firstName and foaf:lastName) and does not match the other shapes.
The intended meaning was that it should not have any of the other properties, but it has schema:givenName.
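The difference between the two readings can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not an implementation of either specification: property values are ignored, each node is reduced to the list of its predicates, and the names xone_like and shex_like are hypothetical.

```python
BRANCHES = [
    {"foaf:firstName", "foaf:lastName"},
    {"foaf:givenName", "foaf:familyName"},
    {"schema:givenName", "schema:familyName"},
]
# Every predicate mentioned anywhere in the shape.
MENTIONED = set().union(*BRANCHES)

DATA = {
    "alice": ["foaf:firstName", "foaf:lastName"],
    "bob": ["schema:givenName", "schema:familyName"],
    "carol": ["foaf:firstName", "foaf:lastName",
              "schema:givenName", "schema:familyName"],
    "dave": ["foaf:firstName", "foaf:lastName", "schema:givenName"],
}

def branch_satisfied(props, branch):
    """A branch holds when each of its predicates occurs exactly once."""
    return all(props.count(p) == 1 for p in branch)

def xone_like(props):
    """sh:xone reading: exactly one branch holds; partial matches are ignored."""
    return sum(branch_satisfied(props, b) for b in BRANCHES) == 1

def shex_like(props):
    """ShEx reading: the mentioned predicates present must equal one branch."""
    relevant = sorted(p for p in props if p in MENTIONED)
    return any(relevant == sorted(b) for b in BRANCHES)

for name, props in DATA.items():
    print(name, "xone:", xone_like(props), "ShEx:", shex_like(props))
```

In this sketch only :dave is treated differently by the two functions, because its extra schema:givenName is a partial match of another branch.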
As we described in Example 131, SHACL’s sh:xone does not check whether there are partial matches in other shapes. A workaround to simulate the ShEx behavior is to normalize the expression using a top-level disjunction whose shapes exclude the properties that are not desired.
:Person a sh:NodeShape ;
  sh:or (
    [ sh:property [ sh:path foaf:firstName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:lastName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:familyName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:familyName ; sh:maxCount 0 ] ;
    ]
    [ sh:property [ sh:path foaf:firstName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:lastName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:familyName ; sh:maxCount 0 ] ;
    ]
    [ sh:property [ sh:path foaf:firstName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:lastName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:familyName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
  ) .
Although this approach solves the problem, more complex and nested shapes can increase the complexity and reduce the readability of SHACL shapes.
ShEx has the CLOSED keyword to declare that a node must not have other properties beyond those declared in the shape.
SHACL also has a sh:closed parameter to declare that a node conforming to a shape must not have properties other than those declared in the shape.
Although they look similar, there are some differences due to the interaction of CLOSED with other language features.
When a SHACL shape is closed, SHACL processors only take into account the top-level properties that appear as the values of sh:path in the property shapes of that shape.
As a result, declaring a shape as a conjunction of property shapes is not the same as declaring it using sh:and.
The following shape declares that nodes conforming to :UserShape must have properties schema:name and schema:birthDate.
The declaration sh:closed true specifies that nodes conforming to :UserShape cannot have other properties.
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:date
  ] .
If we rewrite that example using sh:and as:
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:and (
    [ sh:path schema:name ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:datatype xsd:string
    ]
    [ sh:path schema:birthDate ;
      sh:minCount 1 ; sh:maxCount 1 ;
      sh:datatype xsd:date
    ]
  ) .
then there will be no nodes satisfying the shape, as the two properties nested under sh:and are hidden and not taken into consideration by the sh:closed directive.
A solution in this case is to enumerate the allowed properties using sh:ignoredProperties.
Here, one should add:
:UserShape sh:ignoredProperties ( schema:name schema:birthDate ) .
A similar situation could happen if we use more complex property paths.
For example, we may want to declare that users can have either schema:name or foaf:name using an alternative property path as:
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  .
As in the previous example, no node would conform to that shape, because sh:closed only considers properties given directly as IRI values of sh:path, not those nested inside a property path expression.
There are two solutions: either add a sh:ignoredProperties declaration enumerating all the properties, as in the previous example, or add, for each predicate, a property shape that specifies no constraints and therefore has no other effect.
:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:datatype xsd:string
  ] ;
  sh:property [ sh:path schema:name ] ;
  sh:property [ sh:path foaf:name ] ;
  .
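The rule that sh:closed only sees direct IRI paths can be sketched in Python as follows. The dictionary encoding of shapes and the function names are assumptions made for illustration; a real SHACL processor works on the RDF shapes graph itself.

```python
def allowed_properties(shape):
    """Predicates a closed shape permits: direct IRI paths plus ignored ones."""
    allowed = set(shape.get("ignoredProperties", []))
    for prop in shape.get("property", []):
        path = prop["path"]
        if isinstance(path, str):  # only simple IRI paths are counted
            allowed.add(path)
    return allowed

def closed_violations(shape, node_predicates):
    """Predicates of the node that the closed shape does not allow."""
    return sorted(set(node_predicates) - allowed_properties(shape))

# Property shapes nested under sh:and are not values of sh:property.
and_shape = {"and": [{"path": "schema:name"}, {"path": "schema:birthDate"}]}

# An alternative path is a complex path, not a direct IRI.
alt_path = {"alternativePath": ["schema:name", "foaf:name"]}
alt_shape = {"property": [{"path": alt_path}]}

# Second solution from the text: add an unconstrained property shape per predicate.
fixed_shape = {"property": [{"path": alt_path},
                            {"path": "schema:name"},
                            {"path": "foaf:name"}]}

print(closed_violations(and_shape, ["schema:name", "schema:birthDate"]))
print(closed_violations(alt_shape, ["schema:name"]))
print(closed_violations(fixed_shape, ["schema:name"]))
```

Adding an "ignoredProperties" entry to and_shape has the same effect in this sketch as the sh:ignoredProperties declaration shown earlier.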
ShEx supports the definition of stems and stem ranges when defining value sets (see Section 4.5.4).
SHACL does not have built-in support for stems or stem ranges.
Stems and stem ranges could be emulated with sh:pattern, sh:nodeKind, and sh:or.
The following ShEx schema, described in Section 44, declares that the :status of a :Product must be an IRI that starts with codes:good or codes:bad:
prefix codes: <http://example.codes/>

:Product {
  :status [ codes:good~ codes:bad~ ]
}
A possible SHACL definition using regular expressions could be:
:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
    sh:or (
      [ sh:pattern "^http://example.codes/good" ]
      [ sh:pattern "^http://example.codes/bad" ]
    )
  ] .
Another possibility is to define a reusable constraint component in SHACL-SPARQL as:
:StemConstraintComponent a sh:ConstraintComponent ;
  sh:parameter [ sh:path :stem ] ;
  sh:validator [
    a sh:SPARQLAskValidator ;
    sh:message "Value does not have stem {$stem}" ;
    sh:ask """
      ASK { FILTER (!isBlank($value) && strstarts(str($value), str($stem))) }
    """
  ] .
which can be used as:
:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:or (
      [ :stem <http://example.codes/good> ]
      [ :stem <http://example.codes/bad> ]
    )
  ] .
ShEx also has range exclusions that can declare values to exclude, either literal or specified with a stem (see 45). That feature is not part of SHACL Core and should be defined using SHACL-SPARQL.
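The matching behavior that these SHACL emulations have to reproduce can be sketched in Python as plain string-prefix tests. The function names and the excluded IRIs below are hypothetical and serve only to illustrate stems and stem ranges with exclusions.

```python
CODES = "http://example.codes/"

def matches_stem(iri, stem):
    """A stem matches every IRI that starts with the stem."""
    return iri.startswith(stem)

def matches_stem_range(iri, stem, excluded_values=(), excluded_stems=()):
    """A stem range matches the stem minus excluded values and excluded stems."""
    if not iri.startswith(stem):
        return False
    if iri in excluded_values:
        return False
    return not any(iri.startswith(s) for s in excluded_stems)

def status_ok(iri):
    """Value set [ codes:good~ codes:bad~ ] from the ShEx schema above."""
    return matches_stem(iri, CODES + "good") or matches_stem(iri, CODES + "bad")

print(status_ok(CODES + "good.shipped"))
print(status_ok(CODES + "unknown"))
# Hypothetical exclusion: everything under codes:good except codes:good.obsolete*
print(matches_stem_range(CODES + "good.obsolete.1", CODES + "good",
                         excluded_stems=[CODES + "good.obsolete"]))
```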
ShEx has the concept of annotations which can be attached to several constructs (see Section 4.7.5). For example, the following ShEx schema attaches two annotations to each triple constraint.
:Person {
  schema:name xsd:string
    // rdfs:label "Name"
    // rdfs:comment "Name of person" ;
  schema:birthDate xsd:date
    // rdfs:label "BirthDate"
    // rdfs:comment "Date of birth"
}
ShEx does not endorse or require the use of any specific annotation vocabulary.
SHACL has non-validating constraint components (see Section 5.15), such as sh:name and sh:description, which are ignored by the SHACL processor during validation but can have special meaning for user interface generation.
It is also possible to add further informative triples to any constraint or component, such as rdfs:label.
The following SHACL shapes graph declares a shape :Person using the non-validating properties sh:name and sh:description and the annotation rdfs:label.
:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:name "Name" ;
    sh:description "Name of person" ;
    rdfs:label "Name" ;
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:name "BirthDate" ;
    sh:description "Birth date" ;
    rdfs:label "BirthDate" ;
  ] .
As we saw in Section 5.15, SHACL non-validating properties can be helpful for generating forms from SHACL definitions.
Although ShEx does not provide built-in non-validating properties, it would be possible to use annotations from other vocabularies, even from SHACL.
The ShEx semantic specification [81] is based on mathematical concepts and has been proven to have a well-founded semantics [11]. As we saw in Section 4.8.3, a restriction was imposed on the combination of recursion and negation to avoid ill-formed data models.
With regard to the complexity of the validation algorithm, ShEx semantics is based on a partitioning strategy where triples in the data are assigned to triple constraints in the schema and the matching algorithm must take into account that arcs in a graph are unordered. It is possible to construct schemas for which it is very expensive to find a mapping from RDF data triples to triple constraints that satisfies the schema. In practical schemas, this is rarely a concern as the search space is quite small, but certain mistakes in a schema can create a large search space. The ShEx primer contains some advice for improving performance:
"Accidentally duplicating many triple constraints in a shape causes the search space to explode. If a validation process takes a long time or a lot of memory, look for duplicated chunks of the schema. For shapes with multiple triple constraints for the same predicate, try to minimize the overlap between the value expressions. For instance, if three types of inspection are necessary on a manufacturing checklist, use three different constraints for each of the inspection properties rather than requiring three different inspection properties with a value expression which is a union of all three types. This will make the validation process more efficient and will more effectively capture the business logic in the schema."
The SHACL Core semantics is defined in natural language with some non-normative SPARQL templates, while SHACL-SPARQL depends on a SPARQL processor. Its complexity depends on the complexity of SPARQL, which can also be quite expensive, especially in the use of property paths. As in the case of ShEx, it is also possible to declare shapes graphs that may consume a lot of time or memory.
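For the ShEx side of this comparison, the growth of the search space can be made concrete with a rough Python sketch. It assumes a naive matcher that tries every way of assigning same-predicate triples to overlapping triple constraints; actual ShEx implementations may prune far more aggressively, and the function names are hypothetical.

```python
from itertools import product

def count_assignments(n_triples, n_constraints):
    """Upper bound on the assignments a naive matcher may have to try."""
    return n_constraints ** n_triples

def enumerate_assignments(triples, constraints):
    """Yield every way of giving each triple to one candidate constraint."""
    for choice in product(constraints, repeat=len(triples)):
        yield dict(zip(triples, choice))

# Three triples that could each match any of three overlapping constraints.
print(count_assignments(3, 3))
# Accidentally duplicating the constraints (and the data) makes this much larger.
print(count_assignments(6, 6))
```

When the value expressions of the constraints do not overlap, each triple has a single candidate constraint and the bound collapses to one assignment, which is the situation the primer recommends aiming for.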
Both ShEx and SHACL open the door for further research on optimizations and specialized implementations usable for big datasets. Validators could define language subsets with constructs that behave better when confronted with such datasets. To our knowledge, current implementations have mainly been tested on in-memory data: separate RDF files, or relatively small units of work (transactions). An exception is RDFUnit, which supports the execution of SHACL directly on SPARQL endpoints and thus can theoretically scale along with the capabilities of the SPARQL engine. A lot of research remains to see how very large (and not in-memory) data sets can be efficiently validated with RDF shapes.
Benchmarks and testing tools are an essential step towards measuring the performance of both languages as well as implementations. One early attempt was to use the WebIndex dataset as a benchmark [57].
SHACL-SPARQL can be used to define both custom SPARQL-based constraints as well as reusable SPARQL-based constraint components (see Section 5.16.2). As the constraint components are defined in SPARQL, any SPARQL compliant engine could potentially run them without requiring software updates for execution. A SPARQL engine will be required in any case. SHACL also provides SHACL-Javascript that can be used to write extensions (Section 5.20).
SHACL-SPARQL allows the definition of new constraint components which can have parameters and can be reused in new contexts. It is expected that SHACL libraries of useful constraint components will be developed in the future. For example, the http://datashapes.org/ site contains a collection of some constraint components that extend SHACL Core.
ShEx has provisions for callout to arbitrary functions, called semantic actions, that are language-agnostic (see Section 4.10). However, semantic actions cannot be used to create new reusable parametrizable shape expressions. This is considered an item for future work on ShEx.
As of July 2017, it appears that ShEx and SHACL will evolve as two different specifications. The design of SHACL prioritized the use of SPARQL as an execution engine and an extension mechanism for defining new constraint components, while ShEx was designed de novo to meet its use cases. SHACL leverages a query language for validating sets of constraints, while validation schemas in the ShEx language are defined in terms of a grammar.
There is, however, a significant intersection between the two languages. Many common use cases may be met with either language, although users should consider how the limitations of these languages apply to their current and future requirements. In this book, we described and compared each formalism so that readers can assess which technology better fits their problems.
If we look for parallels in the XML ecosystem, ShEx is closer to RelaxNG or XML Schema, which provide structural definitions for XML documents. SHACL is closer to Schematron, which defines rules or constraints on top of XPath analogously to how SHACL defines constraints on top of SPARQL. SHACL Core can capture simple structures, but more complex structures, with exclusive choices or repeated properties, may require multiple inter-related constraints.
The two specifications currently have different implementation ecosystems. ShEx has been implemented in a variety of programming languages and RDF libraries: Apache Jena, Ruby, Javascript, Haskell, and Python (see section 4.3). In the case of SHACL, most implementations are based on Apache Jena and there is an implementation based on Javascript (see section 5.2) although there are some implementations appearing in other systems like rdf4j. Most ShEx implementations are non-commercial and have been developed mainly by individual projects. SHACL has a mature commercial implementation, bundled with the TopBraid suite of products, which offers a rich user interface for editing SHACL-based data models. Although TopBraid is a commercial product, SHACL’s implementation is based on a separate open source library maintained by TopQuadrant. SHACL is also integrated in the free edition of TopBraid Composer.
Both ShEx and SHACL open several lines for future work and research.
In the future, these diagrams and vocabulary specifications can be backed by ShEx or SHACL specifications. A first step in that direction is seen where SHACL is used to capture the RDF Data Cube integrity constraints. There is much room for innovations connecting these graphical representations to ShEx schemas or SHACL shapes graphs, such as shape visualization, or generating shapes from customized UML diagrams.
Given that there is already a large amount of RDF data that comes from structured sources such as SQL databases or Wikipedia info boxes, derived schemas will likely reflect constraints native to the source format from which the data was converted or extracted.
On the other hand, the underpinnings of ShEx and SHACL are not radically different. One implementation, Shaclex, uses compatible parts of libraries to implement a processor for both SHACL and ShEx and is being extended to convert between subsets of the languages.
ShEx and SHACL will play an important role in the future development of RDF and will be a core part of the Semantic Web tool set. As more semantic data is generated, and more applications are needed to integrate and consume it, RDF validation will be a fundamental enabler for data quality and systems interoperability.