Validating RDF data

Chapter 7  Comparing ShEx and SHACL

In this chapter we compare ShEx and SHACL. The two technologies have similar goals and similar features. In fact, at the start of the Data Shapes Working Group in 2014, convergence on a unified approach was considered possible. However, this did not happen, and as of July 2017 both technologies are maintained as separate solutions.

We start by describing some of the common features that they share, followed by a review of the main differences.

7.1  Common Features

ShEx and SHACL share the same goal, to provide a mechanism for describing and validating RDF data using a high-level language, so they have many features in common.

7.2  Syntactic Differences

The design of ShEx emphasized human readability, with a compact grammar that follows traditional language design principles and a compact syntax evolved from Turtle. The specification defines an abstract syntax. The compact syntax (ShExC), a concrete JSON syntax (ShExJ), or any of the concrete syntaxes for RDF may be used to express a ShEx schema.

SHACL uses the RDF abstract syntax and concrete syntaxes directly. The SHACL specification enumerates circa 120 rules that define what constitutes a well-formed SHACL shapes graph. SHACL processors can simply ignore ill-formed shapes graphs.

A compact syntax inspired by ShEx has been proposed for a subset of SHACL as a WG Note (see Section 5.18), but it is not mandatory: compliant SHACL processors are only required to handle the RDF syntax.

As the SHACL compact syntax was inspired by ShExC, they look similar, but there are several semantic differences.

Example 163  Comparing ShEx and SHACL compact syntaxes

Given the following ShEx schema:

:Product {
  schema:productId /^[A-R]/ ;
  schema:productId /^[M-Z]/ ;
  schema:brand IRI @:Organization * ;
  schema:purchaseDate xsd:date ?
}
:Organization {
  schema:name xsd:string
}

A similar (but not equivalent) representation using SHACL compact syntax is:

:Product {
  schema:productId xsd:string [1..1] pattern="^[A-R]" .
  schema:productId xsd:string [1..1] pattern="^[M-Z]" .
  schema:brand IRI @:Organization [0..*] .
  schema:purchaseDate xsd:date [0..1]
}
:Organization {
  schema:name xsd:string
}

Though the examples look similar on the surface, there are several subtle differences. The ShEx schema says that there must be two values for the property schema:productId, one matching "^[A-R]" and the other matching "^[M-Z]". In contrast, the SHACL shapes graph says that there is exactly one value for schema:productId, and that value must satisfy both regular expressions.

Given the following RDF data:

:p1 a :Product ;              # Passes as a :Product using ShEx
    schema:productId "AB" ;   # Fails as a :Product using SHACL
    schema:productId "XY" ;
    schema:brand :myBrand .

:p2 a :Product ;              # Fails as a :Product using ShEx
    schema:productId "MON" ;  # Passes as a :Product using SHACL
    schema:brand :myBrand .

:myBrand schema:name "MyBrand" .

Node :p1 conforms to the ShEx definition of :Product but not to the SHACL one: it has two values for schema:productId, while each SHACL property shape requires exactly one value that matches its pattern. Node :p2 does not conform to ShEx because it has only one schema:productId, but it conforms to SHACL because its single value, "MON", matches both patterns.
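The following minimal Python sketch, which is only a toy model and not a ShEx or SHACL processor, replays the schema:productId part of Example 163. It assumes that a node's productId values are given as a list of strings, and the function names are illustrative. The ShEx reading tries every way of assigning each triple to exactly one triple constraint, while the SHACL reading applies both property shapes to the same set of values.

```python
import re
from itertools import permutations

# The two regular expressions used for schema:productId in Example 163.
PATTERNS = [r"^[A-R]", r"^[M-Z]"]


def shex_product_ids_ok(values):
    """ShEx reading (toy): each triple constraint has default cardinality
    {1,1} and consumes its own triple, so there must be exactly two values
    and some assignment of values to patterns must match each pattern once."""
    if len(values) != len(PATTERNS):
        return False
    return any(
        all(re.search(p, v) for p, v in zip(PATTERNS, perm))
        for perm in permutations(values)
    )


def shacl_product_ids_ok(values):
    """SHACL reading (toy): every property shape constrains the same value
    set, requiring exactly one value ([1..1]) that matches its pattern."""
    return all(
        len(values) == 1 and all(re.search(p, v) for v in values)
        for p in PATTERNS
    )


# :p1 has two productId values, :p2 has one.
print(shex_product_ids_ok(["AB", "XY"]), shacl_product_ids_ok(["AB", "XY"]))  # True False
print(shex_product_ids_ok(["MON"]), shacl_product_ids_ok(["MON"]))  # False True
```

Trying all permutations is exponential in general; it is used here only because it mirrors the "each triple is matched once" reading in the fewest lines.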

The RDF vocabulary of ShEx is also different from SHACL.

Example 164  

The RDF representation of Example 163 in ShEx is:

:Product a sx:Shape ;
  sx:expression [
    a sx:EachOf ;
    sx:expressions (
      [ a sx:TripleConstraint ;
        sx:predicate schema:productId ;
        sx:valueExpr [ a sx:NodeConstraint ; sx:pattern "^[A-R]" ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:productId ;
        sx:valueExpr [ a sx:NodeConstraint ; sx:pattern "^[M-Z]" ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:brand ;
        sx:min 0 ; sx:max -1 ;
        sx:valueExpr [
          a sx:ShapeAnd ;
          sx:shapeExprs ( [ a sx:NodeConstraint ; sx:nodeKind sx:iri ] :Organization )
        ]
      ]
      [ a sx:TripleConstraint ;
        sx:predicate schema:purchaseDate ;
        sx:min 0 ; sx:max 1 ;
        sx:valueExpr [ a sx:NodeConstraint ; sx:datatype xsd:date ]
      ]
    )
  ] .

Here is the RDF encoding of the SHACL shapes graph in Example 163:

:Product a sh:NodeShape ;
  sh:property [
    sh:path schema:productId ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:pattern "^[A-R]" ;
  ] ;
  sh:property [
    sh:path schema:productId ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:pattern "^[M-Z]" ;
  ] ;
  sh:property [
    sh:path schema:brand ;
    sh:nodeKind sh:IRI ;
    sh:node :Organization
  ] ;
  sh:property [
    sh:path schema:purchaseDate ;
    sh:maxCount 1 ;
    sh:datatype xsd:date
  ] .

7.3  Foundation: Schema vs. Constraints

Although both languages share a common goal, their designs are based on different approaches.

The designers of ShEx intended the language to be like a grammar or schema for RDF graphs. This design was inspired by languages such as Yacc, RelaxNG, and XML Schema. The main goal was to describe RDF graph structures so they could be validated against those descriptions.

In contrast, the designers of SHACL aimed at providing a constraint language for RDF. The main goal of SHACL is to verify that a given RDF graph satisfies a collection of constraints. In this sense, SHACL follows the Schematron approach, applied to RDF: it declares constraints that RDF graphs must fulfill. Just as Schematron relies strongly on XPath, SHACL relies strongly on SPARQL.

This difference is reflected in the validation results. ShEx implementations usually construct a data structure representing the RDF graph that was validated, containing the nodes and shapes that were matched. After ShEx validation, the result shape map contains a structure that can be considered an annotated graph, which can be traversed or used for further actions, such as transforming RDF graphs into other data structures. This structure is analogous to the Post Schema Validation Infoset from XML Schema (see Section 3.1.3).

In contrast, SHACL describes in detail the errors returned when constraints are not satisfied. A SHACL validation report (see Section 5.5) can be very useful for detecting and repairing errors in RDF graphs. When there are no errors, SHACL processors usually report a single value, sh:conforms true. With SHACL, it can be difficult for users to distinguish a node that conforms because it was checked against some shape from a node that would not conform but was ignored by the SHACL processor because it was never reached during the validation process.

The SHACL recommendation prescribes a basic structure for each violation result but does not prescribe what information is to be returned when a node is validated. Nevertheless, SHACL processors can enrich their results. Shaclex, for example, returns information about the nodes validated.

7.4  Invoking Validation

SHACL shapes can include target declarations that associate each shape with a set of RDF nodes and tell SHACL processors how to trigger the validation process (see Section 5.7).

Example 165  Target declarations and SHACL invocation

Consider the following SHACL shapes graph:

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:targetObjectsOf schema:member ;
  sh:targetSubjectsOf schema:familyName ;
  sh:targetNode :alice ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

and the following RDF graph:

:alice schema:name "Alice" .

:bob a :User ;
  schema:name "Robert" .

:myCompany schema:member :carol .
:carol schema:name "Carol" .

:dave schema:familyName "Smith" ;
  schema:name "Dave Smith" .

A SHACL processor checks that :alice, :bob, :carol, and :dave conform to :UserShape.
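As a rough illustration of how the four target declarations combine, this Python sketch (a toy model, not a SHACL engine) encodes the data graph as a set of (subject, predicate, object) tuples and takes the union of the nodes each declaration selects. The helper names are hypothetical, literals are recognized by a leading quote character, and sh:targetClass is simplified to direct instances because the example data has no subclasses.

```python
# Data graph of Example 165; literals keep their quotes so they stay
# distinguishable from IRIs in this toy encoding.
DATA = {
    (":alice", "schema:name", '"Alice"'),
    (":bob", "rdf:type", ":User"),
    (":bob", "schema:name", '"Robert"'),
    (":myCompany", "schema:member", ":carol"),
    (":carol", "schema:name", '"Carol"'),
    (":dave", "schema:familyName", '"Smith"'),
    (":dave", "schema:name", '"Dave Smith"'),
}


def user_shape_focus_nodes(data):
    """Union of the nodes selected by the four target declarations."""
    focus = set()
    # sh:targetClass :User (direct instances only in this sketch)
    focus |= {s for s, p, o in data if p == "rdf:type" and o == ":User"}
    # sh:targetObjectsOf schema:member
    focus |= {o for s, p, o in data if p == "schema:member"}
    # sh:targetSubjectsOf schema:familyName
    focus |= {s for s, p, o in data if p == "schema:familyName"}
    # sh:targetNode :alice
    focus.add(":alice")
    return focus


def has_one_string_name(data, node):
    """The property shape: exactly one schema:name, and it is a literal."""
    names = [o for s, p, o in data if s == node and p == "schema:name"]
    return len(names) == 1 and names[0].startswith('"')


print(sorted(user_shape_focus_nodes(DATA)))  # [':alice', ':bob', ':carol', ':dave']
```

Note that :myCompany is not a focus node: it is the subject, not the object, of schema:member.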

Directly associating target declarations with shapes can become quite verbose (see Section 6.6). At the same time, it can limit the reusability of a shape in other contexts. In the example above, if we import :UserShape into another context where the node :alice represents a product instead of a user, the SHACL processor will still try to validate the node against that shape. To avoid such cases, SHACL provides the sh:deactivated directive (see Section 106).

While including target declarations in the schema is a convenient way to trigger validation, it can be considered an anti-pattern because the shape can’t be reused for other data. Even though this could work in some closed systems, it is impractical for data in open environments. In the interest of keeping schemas reusable, it is good practice in SHACL to place target declarations in a separate file and link that file to the schema with owl:imports.

A ShEx schema declares a constellation of shape expressions that function as a grammar against which RDF nodes can be tested. The schema itself provides no mechanism for associating a shape expression with the nodes to which the schema applies. In the interest of making schemas reusable, ShEx requires that definitions of shapes be decoupled from their application to particular RDF graphs. ShEx separates the language of schemas, on the one hand, from the association of shapes with nodes to be validated, on the other, by introducing the notion of shape maps (see Section 4.9 for more details). This separation of concerns encourages the community to innovate on node-shape association mechanisms independently from the validation semantics. For example, though the shape map specification currently only supports RDF nodes by direct reference or by triple pattern, Wikidata versions of ShEx include support for SPARQL queries over remote endpoints. As such conventions evolve they can be rolled into future versions of the shape map specification.

Example 166  Invoking validation through Shape maps in ShEx

The SHACL shapes graph from Example 165 can be expressed in ShEx with the following query shape map:

{ FOCUS rdf:type :User }@:UserShape,
{ _ schema:member FOCUS }@:UserShape,
{ FOCUS schema:familyName _ }@:UserShape,
:alice @:UserShape

and removing the target declarations from the shape definition:

:UserShape { schema:name xsd:string }

The declarations above behave similarly to the SHACL target declarations. One subtle difference is that the shape map above only selects direct instances of :User, whereas SHACL applies the notion of SHACL instance, which also encompasses instances of subclasses of :User. This behavior can be expressed using property paths in shape maps as:

{ FOCUS rdf:type/rdfs:subClassOf* :User }@:UserShape
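The difference between the two shape map entries can be sketched in Python as two node-selection functions over a set of triples. This is a toy model under stated assumptions: the data (:erin, :frank, :gina, :Admin, :SuperAdmin) is hypothetical, and the function names are illustrative, not part of any shape map implementation.

```python
# Hypothetical data: :Admin is a subclass of :User, :SuperAdmin of :Admin.
DATA = {
    (":bob", "rdf:type", ":User"),
    (":erin", "rdf:type", ":Admin"),
    (":Admin", "rdfs:subClassOf", ":User"),
    (":frank", "rdf:type", ":SuperAdmin"),
    (":SuperAdmin", "rdfs:subClassOf", ":Admin"),
    (":gina", "rdf:type", ":Product"),
}


def direct_instances(data, cls):
    """Nodes selected by { FOCUS rdf:type cls }."""
    return {s for s, p, o in data if p == "rdf:type" and o == cls}


def shacl_instances(data, cls):
    """Nodes selected by { FOCUS rdf:type/rdfs:subClassOf* cls }."""
    classes = {cls}  # grows to cls plus all of its transitive subclasses
    changed = True
    while changed:
        changed = False
        for s, p, o in data:
            if p == "rdfs:subClassOf" and o in classes and s not in classes:
                classes.add(s)
                changed = True
    return {s for s, p, o in data if p == "rdf:type" and o in classes}


print(sorted(direct_instances(DATA, ":User")))  # [':bob']
print(sorted(shacl_instances(DATA, ":User")))   # [':bob', ':erin', ':frank']
```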

Another notable difference between SHACL target node declarations and ShEx shape maps is the following: when a declared target node in SHACL does not exist in the data graph and the shape does not require any values for it, the node passes validation. In ShEx, if the node does not exist, validation always results in a failure, regardless of the shape definition.

7.5  Modularization and Reusability

SHACL leverages the property owl:imports to enable a shapes graph to import other shapes graphs. This mechanism, which can be used to provide the basis of a modular design, is described in Section 5.4.

ShEx has the concept of shapeExternal to declare that the contents of a shape can be obtained from an external source (see Section 4.7.3).

ShEx has a basic import mechanism that allows a schema to import (dereference) another schema (see Section 4.12), while SHACL shapes graphs can import other shapes graphs using owl:imports (see Section 5.4). One difference between the two mechanisms is that ShEx dereferences the imported schema, while a SHACL import is a graph merge, so in SHACL the system expects to have already fetched all of the relevant shapes graphs.

Both languages support the reuse of shapes through extending a shape with an AND operator, as described in Section 4.8.1 (ShEx) and Section 127 (SHACL).

Example 167  Extending shapes in ShEx and SHACL

As a simple example, the following ShEx schema declares a :Product shape and a :SoldProduct shape:

:Product {
  schema:productId xsd:string ;
  schema:price xsd:decimal
}
:SoldProduct @:Product AND {
  schema:purchaseDate xsd:date ;
  schema:productId /^[A-Z]/
}

A :SoldProduct has the same constraints as a :Product plus two more: one that further restricts the property schema:productId and another that requires a new property, schema:purchaseDate.

Here is an analogous SHACL shapes graph:

:Product a sh:NodeShape ;
  sh:property [
    sh:path schema:productId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:price ;
    sh:datatype xsd:decimal ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ] .

:SoldProduct a sh:NodeShape ;
  sh:and (
    :Product
    [ sh:path schema:purchaseDate ;
      sh:datatype xsd:date ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
    ]
    [ sh:path schema:productId ;
      sh:pattern "^[A-Z]" ;
      sh:minCount 1 ;
      sh:maxCount 1 ;
    ]
  ) .

Another way to reuse shapes in SHACL is by leveraging the subclass relationship and the corresponding target declarations. The example above could be expressed as:

:Product a sh:NodeShape, rdfs:Class ;
  sh:property [
    sh:path schema:productId ;
    sh:datatype xsd:string ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:price ;
    sh:datatype xsd:decimal ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

:SoldProduct a sh:NodeShape, rdfs:Class ;
  rdfs:subClassOf :Product ;
  sh:property [
    sh:path schema:purchaseDate ;
    sh:datatype xsd:date ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] ;
  sh:property [
    sh:path schema:productId ;
    sh:pattern "^[A-Z]" ;
    sh:minCount 1 ;
    sh:maxCount 1
  ] .

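In this approach, :SoldProduct is declared as a subclass of :Product. Because both shapes are also declared as instances of rdfs:Class, they have implicit class targets: every node of rdf:type :SoldProduct must conform to shape :SoldProduct and, being also a SHACL instance of :Product, to :Product as well.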

One limitation of this approach is that it requires nodes to have the appropriate rdf:type declaration, and it requires the rdfs:subClassOf statements to be kept in the data graph.

The reusability of both languages could be improved. For example, there is no notion of a module in which one might declare internal or hidden shapes, or public shapes that could be imported by other modules. Also, there is no notion of a shape extending another shape, inheriting some properties and redefining others. Such features could potentially be developed for both languages.

7.6  Shapes, Classes, and Inference

ShEx is only concerned with RDF graphs as they are presented to the validator. There is no interaction between the ShEx processor and any inference mechanism. In this way, ShEx can be used before or after inference. It can even be used to validate the behavior of an inference engine if one defines the shapes that an RDF graph must have before and after inference (see an example in Section 4.11).

In contrast, SHACL has some mechanisms that may interact with inference. For example, the implicit class target (see Section 5.7.3), which associates a shape with a class, triggers validation on all nodes that are SHACL instances. The notion of SHACL instance is different from the RDF Schema notion of instance: it encompasses instances of a class plus instances of its subclasses (as determined by following rdfs:subClassOf links in the data), but it does not take the rest of the RDFS semantics into account.

The results of applying a SHACL validator may be different if applied to RDF graphs before or after RDFS inference. As SHACL processors are not required to support full RDFS inference, they may ignore other RDFS predicates, such as rdfs:domain, rdfs:range, and sub-properties of rdfs:subClassOf.

For example, consider the following SHACL shape:

:UserShape a sh:NodeShape ;
  sh:targetClass :User ;
  sh:property [
    sh:path schema:name ;
    sh:minCount 1 ;
    sh:datatype xsd:string ;
  ] .

and the following RDF data:

:Teacher rdfs:subClassOf :User .
:teaches rdfs:domain :Teacher .

:frank :teaches :Algebra ;     # Ignored without RDFS inference
       schema:name "Frank" .   # Passes as a :UserShape with RDFS inference

:grace :teaches :Logic ;       # Ignored without RDFS inference
       schema:name 34 .        # Fails as a :UserShape with RDFS inference

:oscar a :Teacher ;            # Fails as a :UserShape
       schema:name 45 .

If SHACL is applied after RDFS inference, the system checks whether :frank, :grace, and :oscar conform to :UserShape. This is because the domain declaration of :teaches allows RDFS to infer that :frank and :grace are instances of :Teacher and, hence, instances of :User. The result is that :frank conforms, while :grace and :oscar do not, because their values of schema:name are not strings.

In contrast, if SHACL is applied without RDFS inference, the system returns only one error, for :oscar, whose schema:name value 45 is not an xsd:string.

The system does not check :frank or :grace against :UserShape because, without inference, it only follows rdf:type and rdfs:subClassOf declarations, so :oscar is the only node that is checked.
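The effect of inference on target selection can be reproduced with a small Python sketch. It is a toy model that implements only the RDFS domain rule (not full RDFS entailment), computes SHACL instances through rdf:type and rdfs:subClassOf, and checks only the schema:name constraints of :UserShape. Python str stands for both IRIs and string literals and int for integer literals; the function names are hypothetical.

```python
# Data from this section as (subject, predicate, object) tuples.
DATA = {
    (":Teacher", "rdfs:subClassOf", ":User"),
    (":teaches", "rdfs:domain", ":Teacher"),
    (":frank", ":teaches", ":Algebra"),
    (":frank", "schema:name", "Frank"),
    (":grace", ":teaches", ":Logic"),
    (":grace", "schema:name", 34),
    (":oscar", "rdf:type", ":Teacher"),
    (":oscar", "schema:name", 45),
}


def with_domain_inference(data):
    """Apply only the domain rule: (x p y), (p rdfs:domain C) => (x rdf:type C)."""
    domains = {(s, o) for s, p, o in data if p == "rdfs:domain"}
    inferred = {(s, "rdf:type", c) for s, p, o in data for prop, c in domains if p == prop}
    return data | inferred


def shacl_instances(data, cls):
    """Nodes whose rdf:type is cls or a transitive rdfs:subClassOf of cls."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in data:
            if p == "rdfs:subClassOf" and o in classes and s not in classes:
                classes.add(s)
                changed = True
    return {s for s, p, o in data if p == "rdf:type" and o in classes}


def user_shape_violations(data):
    """Focus nodes of sh:targetClass :User whose schema:name is missing or not a string."""
    failing = set()
    for node in shacl_instances(data, ":User"):
        names = [o for s, p, o in data if s == node and p == "schema:name"]
        if not names or not all(isinstance(n, str) for n in names):
            failing.add(node)
    return failing


print(sorted(user_shape_violations(DATA)))                         # [':oscar']
print(sorted(user_shape_violations(with_domain_inference(DATA))))  # [':grace', ':oscar']
```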

This interference between SHACL and RDFS semantics hampers the use of SHACL to validate an inference system, as in the use case described for ShEx in Example 21.

The property sh:entailment can be used to declare that the SHACL processors should add inferred triples during validation to the data graph following the inference rules declared by a given entailment regime (see Section 5.17). Nevertheless, SHACL processors are not required to support entailment regimes. If a shapes graph declares an entailment and the processor does not support it, a failure must be signalled.

7.7  Violation Reporting and Severities

As pointed out above, SHACL puts more emphasis on validation and provides a dedicated RDF vocabulary for describing conformance and reporting detailed violation results.

For every violation found, an instance of sh:ValidationResult is created in the SHACL validation report. Each validation result links back to the focus node along with metadata, which includes the source shape, human-readable messages, the failed constraint component, the path, and (when available) the value node. The severity of a shape, declared with sh:severity as sh:Info, sh:Warning, or sh:Violation (the default), is included in each validation result (see Section 5.6.5).

ShEx does not have rich violation reporting, but it can provide related functionality. The result of the validation process is a shape map which contains information about the nodes that conform to a shape or not. Every violation can be viewed as an entry showing the focus node and the shape that failed. ShEx processors usually enrich these entries with further information. As shapes in ShEx can contain arbitrary annotations (see Section 4.7.5), these annotations can be included in the results.

For simple, top-level shape definitions, SHACL provides richer and more granular violation reporting, with a result for each individual constraint that failed. However, violations of nested constraints formed with sh:node, sh:and, sh:or, sh:xone, or sh:qualifiedValueShape report only which nested constraint failed (“sh:node failed”) without detailing why. Implementations could report that information by means of the sh:detail property, but that would be an implementation-dependent feature. Also, as a result of validation ShEx produces a Result Map associating nodes with shapes (both conforming and non-conforming), while SHACL has no comparable feature.

7.8  Default Cardinalities

If no cardinality is declared, ShEx assumes the cardinality to be {1,1} while SHACL assumes {0,*}.

Example 168  Comparing cardinalities in ShEx and SHACL

The following ShEx schema declares that nodes conforming to :UserShape must have one schema:name and one schema:givenName.

:UserShape { schema:name xsd:string ; schema:givenName xsd:string ; }

The following SHACL shapes graph declares that if there is a schema:name then it must have datatype xsd:string, and the same for schema:givenName:

:UserShape a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
  ] ;
  sh:property [
    sh:path schema:givenName ;
    sh:datatype xsd:string ;
  ] .

Given the following data:

:alice schema:name "Alice Cooper" ;   # Passes as a :UserShape - ShEx
       schema:givenName "Alice" .     # Passes as a :UserShape - SHACL

:bob   schema:givenName "Robert" ;    # Fails as a :UserShape - ShEx
       foaf:age 23 .                  # Passes as a :UserShape - SHACL

:carol schema:name 345 ;              # Fails as a :UserShape - ShEx
       schema:givenName 346 .         # Fails as a :UserShape - SHACL

The difference in results is based on the difference between the ShEx and SHACL points of view. In ShEx, a triple expression makes explicit which triples involving the focus node should be found in the graph, and specifying a cardinality may require several such triples. The absence of a cardinality means exactly one triple. In SHACL, a shape is a conjunction of constraints. A cardinality constraint is used to constrain the number of allowed triples of a given kind, and the absence of a cardinality means no constraint on the number of triples allowed.
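A minimal Python sketch of the two default cardinalities applied to the data of Example 168 follows. It is a toy model: each node is a dictionary from predicate to values, Python str stands for xsd:string literals and int for integer literals, and only the two predicates mentioned in the shapes are checked, so foaf:age is ignored in both readings, as it is by the open shapes above.

```python
from math import inf

# Example 168 data: node -> {predicate: [values]}.
DATA = {
    ":alice": {"schema:name": ["Alice Cooper"], "schema:givenName": ["Alice"]},
    ":bob": {"schema:givenName": ["Robert"], "foaf:age": [23]},
    ":carol": {"schema:name": [345], "schema:givenName": [346]},
}
PREDICATES = ("schema:name", "schema:givenName")
SHEX_DEFAULT = (1, 1)     # no cardinality in ShEx means {1,1}
SHACL_DEFAULT = (0, inf)  # no cardinality in SHACL means {0,*}


def conforms(node, cardinality):
    """Both predicates must respect the cardinality and have string values."""
    low, high = cardinality
    for pred in PREDICATES:
        values = DATA[node].get(pred, [])
        if not low <= len(values) <= high:
            return False
        if not all(isinstance(v, str) for v in values):
            return False
    return True


for node in DATA:
    print(node, conforms(node, SHEX_DEFAULT), conforms(node, SHACL_DEFAULT))
# :alice True True
# :bob False True
# :carol False False
```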

7.9  Property Paths

SHACL property shapes can use a subset of SPARQL 1.1 property paths as values for sh:path. In this way, SHACL leverages the expressiveness of SPARQL property paths to define constraints.

ShEx does not support arbitrary property paths, only direct and inverse predicates. However, it is easy to emulate this SHACL behavior using nested shapes or recursion.

Example 169  Comparing paths in SHACL and ShEx

The following SHACL declaration:

:GrandParent a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath schema:knows ] ;
    sh:class :Person ;
  ] ;
  sh:property [
    sh:path ( schema:child schema:child ) ;
    sh:minCount 1 ;
    sh:class :GrandChild ;
  ] .

can be defined in ShEx as:

:GrandParent {
  schema:knows @:PersonKnown * ;
  schema:child {
    schema:child { a [ :GrandChild ] }
  }
}
:PersonKnown {
  a [ :Person ] ;
  schema:knows @:PersonKnown *
}
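To see what the two SHACL paths of Example 169 select, here is a Python sketch (a toy model, not a SPARQL engine) that evaluates a sequence path and a zero-or-more path over a set of triples. The data with :ann, :ben, :cid, :dan, and :eve is hypothetical and includes a schema:knows cycle; the function names are illustrative.

```python
# Hypothetical data: a two-step schema:child chain and a schema:knows cycle.
DATA = {
    (":ann", "schema:child", ":ben"),
    (":ben", "schema:child", ":cid"),
    (":ann", "schema:knows", ":dan"),
    (":dan", "schema:knows", ":eve"),
    (":eve", "schema:knows", ":ann"),
}


def step(data, nodes, pred):
    """Objects reached from any node in `nodes` through one `pred` arc."""
    return {o for s, p, o in data if p == pred and s in nodes}


def sequence_path(data, start, preds):
    """Evaluate a sequence path such as ( schema:child schema:child )."""
    nodes = {start}
    for pred in preds:
        nodes = step(data, nodes, pred)
    return nodes


def zero_or_more_path(data, start, pred):
    """Evaluate [ sh:zeroOrMorePath pred ]: the start node itself plus every
    node reachable through pred arcs; `reached` makes cycles terminate."""
    reached = {start}
    frontier = {start}
    while frontier:
        frontier = step(data, frontier, pred) - reached
        reached |= frontier
    return reached


print(sequence_path(DATA, ":ann", ["schema:child", "schema:child"]))  # {':cid'}
print(sorted(zero_or_more_path(DATA, ":ann", "schema:knows")))      # [':ann', ':dan', ':eve']
```

Note that the zero-or-more path also returns the focus node itself, so in the SHACL shape the sh:class :Person constraint applies to the :GrandParent node as well.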

7.10  Recursion

ShEx supports the definition of cyclic data models with recursive shapes (see Section 4.7.2) while the processing of recursive shapes is undefined in SHACL (see Section 5.12.1). However, some recursion cases can be handled in SHACL through SHACL property paths.

Example 170  Recursion

The following shape declares a recursive :UserShape as:

:UserShape IRI { schema:knows @:UserShape* }

Nodes that conform to :UserShape must be IRIs and can have zero or more schema:knows arcs whose values must all conform to :UserShape.
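One way to give the recursive :UserShape a terminating reading is sketched below in Python. It is an assumption-laden toy model, not the algorithm of any particular ShEx implementation: a node that is already being checked further up the call stack is assumed to conform, which suffices for this simple shape without negation. The data and function names are hypothetical, and IRIs are approximated as strings starting with ':'.

```python
# Hypothetical data: :alice and :bob know each other (a cycle);
# :carol knows a literal, and :dave knows :carol.
DATA = {
    (":alice", "schema:knows", ":bob"),
    (":bob", "schema:knows", ":alice"),
    (":carol", "schema:knows", '"Carol"'),
    (":dave", "schema:knows", ":carol"),
}


def is_iri(term):
    # Assumption of this sketch: IRIs are prefixed names starting with ':'.
    return isinstance(term, str) and term.startswith(":")


def conforms_user_shape(data, node, in_progress=frozenset()):
    """Check node against :UserShape IRI { schema:knows @:UserShape* }.

    A node already being checked higher up the call stack is assumed to
    conform, so cycles such as :alice -> :bob -> :alice terminate."""
    if node in in_progress:
        return True
    if not is_iri(node):
        return False
    seen = in_progress | {node}
    return all(
        conforms_user_shape(data, o, seen)
        for s, p, o in data
        if s == node and p == "schema:knows"
    )


for node in (":alice", ":carol", ":dave"):
    print(node, conforms_user_shape(DATA, node))
# :alice True
# :carol False
# :dave False
```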

A direct translation to SHACL would be:

:UserShapeRecursion a sh:NodeShape ;   # This definition is recursive
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:knows ;
    sh:node :UserShapeRecursion
  ] .

However, recursion in SHACL is undefined and not all SHACL processors may handle that definition in the same way. The specification leaves recursion as an implementation-dependent feature.

One possible solution is to add target declarations to the shape to trigger validation against it. A typical solution is to use rdf:type declarations, as we saw in Section 5.12.1. In this case, we could also use sh:targetSubjectsOf as follows:

:UserShapeRecursion a sh:NodeShape ;
  sh:targetSubjectsOf schema:knows ;
  sh:nodeKind sh:IRI ;
  sh:property [
    sh:path schema:knows ;
    sh:class :User
  ] .

Now, every node that is a subject of schema:knows must conform to that shape.

This solution may not be realistic in general. In this case, for example, we are forcing every node that is a subject of schema:knows to conform to :UserShape and in other contexts, this could be too restrictive. The same situation happens if we use sh:targetClass declarations.

Another approach to emulate recursive behavior is to use property paths. For example:

:UserShape a sh:NodeShape ;
  sh:property [
    sh:path [ sh:zeroOrMorePath schema:knows ] ;
    sh:nodeKind sh:IRI ;
  ] .

In this case, every node reachable from the focus node by following schema:knows zero or more times (including the focus node itself) must be an IRI. With this solution, there may be other nodes that are subjects of schema:knows but do not need to conform to :UserShape.

In Section 5.12.1, we described more advanced alternatives for using SHACL property paths as an alternative to recursion.

7.11  Property Pair Constraints and Uniqueness

Property pair constraints in SHACL compare the values of a path with the values of another property, checking whether they are equal, disjoint, less than, or less than or equal to them (see Section 5.14).

ShEx 2.0 does not have the concept of property pair constraints, though this possibility is being studied to be included in future versions.

Example 171  Example with property pair constraints

The following shapes graph declares that nodes conforming to :UserShape must fulfil the constraint that schema:givenName is equal to foaf:firstName and different from schema:lastName, and that schema:birthDate must be less than :loginDate.

:UserShape a sh:NodeShape ;
  sh:property [
    sh:path schema:givenName ;
    sh:datatype xsd:string ;
    sh:disjoint schema:lastName ;
    sh:minCount 1 ; sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path foaf:firstName ;
    sh:equals schema:givenName ;
    sh:minCount 1 ; sh:maxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:lessThan :loginDate ;
    sh:minCount 1 ; sh:maxCount 1 ;
  ] .

The previous example could be written in a future version of ShEx as:

:UserShape {                          # Not supported in ShEx 2.0
  $<givenName> schema:givenName xsd:string ;
  $<firstName> foaf:firstName xsd:string ;
  $<birthDate> schema:birthDate xsd:date ;
  $<loginDate> :loginDate xsd:date ;
  $<givenName> = $<firstName> ;
  $<givenName> != $<lastName> ;
  $<birthDate> < $<loginDate>
}
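Whatever syntax a future ShEx version adopts, the property pair constraints of Example 171 amount to set comparisons between the values of two properties. This Python sketch is a toy model of sh:equals, sh:disjoint, and sh:lessThan for one focus node; it ignores the datatype and cardinality constraints, assumes the compared values are mutually comparable Python objects, and uses hypothetical example nodes.

```python
from datetime import date


def pair_violations(node):
    """Return the property pair constraints of Example 171 that `node`
    violates; `node` maps each predicate to the list of its values."""

    def values(pred):
        return node.get(pred, [])

    errors = []
    # sh:equals: foaf:firstName and schema:givenName have the same value set
    if set(values("foaf:firstName")) != set(values("schema:givenName")):
        errors.append("sh:equals")
    # sh:disjoint: schema:givenName shares no value with schema:lastName
    if set(values("schema:givenName")) & set(values("schema:lastName")):
        errors.append("sh:disjoint")
    # sh:lessThan: every birth date is before every login date
    if not all(born < login
               for born in values("schema:birthDate")
               for login in values(":loginDate")):
        errors.append("sh:lessThan")
    return errors


# Hypothetical focus nodes, not taken from the text.
GOOD = {"schema:givenName": ["Alice"], "foaf:firstName": ["Alice"],
        "schema:lastName": ["Cooper"], "schema:birthDate": [date(1990, 5, 1)],
        ":loginDate": [date(2017, 7, 1)]}
BAD = {"schema:givenName": ["Lee"], "foaf:firstName": ["Ann"],
       "schema:lastName": ["Lee"], "schema:birthDate": [date(2018, 1, 1)],
       ":loginDate": [date(2017, 7, 1)]}

print(pair_violations(GOOD))  # []
print(pair_violations(BAD))   # ['sh:equals', 'sh:disjoint', 'sh:lessThan']
```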

One constraint often required is the ability to declare unique keys. Unique keys are combinations of values that must be unique in a given scope. The scope can be the entire graph or a focus node. One example of a unique constraint for an entire graph is to require that no two nodes have the same combination of values for the properties schema:givenName and schema:lastName. One example of a unique constraint with a focus node scope would be to require that each node not have two values of rdfs:label with the same language tag.

Neither SHACL nor ShEx 2.0 supports unique keys in general, although OWL 2 does. SHACL Core offers the sh:uniqueLang constraint to say that there can be no more than one literal for each language tag (see Section 124). Other such constraints can be defined using SHACL-SPARQL. In the case of ShEx, there is a proposal to add a UNIQUE keyword to the language, with the scope and the list of predicates that must be unique as parameters.

:UserShape {                          # Not supported in ShEx 2.0
  schema:givenName xsd:string ;
  schema:lastName xsd:string ;
  UNIQUE(schema:givenName, schema:lastName)
}
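The two scopes of uniqueness described above can be sketched in Python as follows. This is a toy model with hypothetical data and function names: duplicate_keys checks a graph-wide key over single-valued properties, and unique_lang_ok mimics the focus-node scope of sh:uniqueLang on (text, language tag) pairs, skipping literals without a tag.

```python
from collections import Counter, defaultdict

# Hypothetical data: :a and :b share the same (givenName, lastName) pair.
DATA = [
    (":a", "schema:givenName", "Ann"), (":a", "schema:lastName", "Lee"),
    (":b", "schema:givenName", "Ann"), (":b", "schema:lastName", "Lee"),
    (":c", "schema:givenName", "Ann"), (":c", "schema:lastName", "Smith"),
]


def duplicate_keys(data, key_preds):
    """Graph scope: value combinations of key_preds shared by 2+ nodes.
    Assumes each key property has a single value per node."""
    by_node = defaultdict(dict)
    for s, p, o in data:
        if p in key_preds:
            by_node[s][p] = o
    counts = Counter(
        tuple(values[p] for p in key_preds)
        for values in by_node.values()
        if len(values) == len(key_preds)  # skip nodes missing part of the key
    )
    return {key for key, n in counts.items() if n > 1}


def unique_lang_ok(labels):
    """Focus-node scope: at most one literal per language tag; labels are
    (text, tag) pairs and untagged literals (tag None) are skipped."""
    tags = [tag for _, tag in labels if tag is not None]
    return len(tags) == len(set(tags))


print(duplicate_keys(DATA, ("schema:givenName", "schema:lastName")))  # {('Ann', 'Lee')}
print(unique_lang_ok([("Alice", "en"), ("Alicia", "es"), ("Alice", None)]))  # True
print(unique_lang_ok([("Alice", "en"), ("Ally", "en")]))  # False
```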

7.12  Repeated Properties

ShEx allows a shape to declare several triple constraints with the same property on triples involving the focus node. This feature is called repeated properties, as explained in Section 4.6.7. In SHACL, repeated properties behave conjunctively, which means that all constraints applied to properties with the same sh:path must be satisfied. The typical SHACL pattern of:

:Shape a sh:NodeShape ;
  sh:property [
    sh:path :p1 ;
    # ...constraints on :p1...
  ] ;
  sh:property [
    sh:path :p2 ;
    # ...constraints on :p2...
  ] ;
  ...

must be changed if we want :p1 and :p2 to be the same property, only with different values. A direct translation of that pattern to:

:Shape a sh:NodeShape ;
  sh:property [
    sh:path :p ;
    # ...constraints on :p...
  ] ;
  sh:property [
    sh:path :p ;
    # ...other constraints on :p...
  ] ;
  ...

means that all constraints apply to the path :p conjunctively.

Example 172  Repeated properties in ShEx and SHACL

The following ShEx schema declares that a :Person has two parents, one whose :isMale value is true and the other whose :isFemale value is true.

:Person {
  schema:parent { :isMale [ true ] } ;
  schema:parent { :isFemale [ true ] }
}

A direct translation of the ShEx schema into SHACL would be:

:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:parent ;
    sh:node [
      sh:property [ sh:path :isMale ; sh:hasValue true ; sh:maxCount 1 ]
    ]
  ] ;
  sh:property [
    sh:path schema:parent ;
    sh:node [
      sh:property [ sh:path :isFemale ; sh:hasValue true ; sh:maxCount 1 ]
    ]
  ] .

However, this SHACL shapes graph would only be satisfied by a node all of whose schema:parent values are both male and female.

:alice a :Person ;
  schema:parent :bob ;     # Passes as a :Person in ShEx
  schema:parent :carol .   # Fails as a :Person in SHACL
:bob :isMale true .
:carol :isFemale true .

:dave a :Person ;
  schema:parent :x .       # Fails as a :Person in ShEx
                           # Passes as a :Person in SHACL
:x :isMale true ;
   :isFemale true .
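The contrast in Example 172 can be replayed with a small Python sketch. It is a toy model, not a validator: the ShEx reading tries every way of assigning each schema:parent triple to exactly one triple constraint, while the direct SHACL translation requires every parent to satisfy both nested shapes. The helper names are hypothetical.

```python
from itertools import permutations

# Example 172 data as (subject, predicate, object) tuples.
DATA = {
    (":alice", "schema:parent", ":bob"),
    (":alice", "schema:parent", ":carol"),
    (":bob", ":isMale", True),
    (":carol", ":isFemale", True),
    (":dave", "schema:parent", ":x"),
    (":x", ":isMale", True),
    (":x", ":isFemale", True),
}


def parents(data, node):
    return [o for s, p, o in data if s == node and p == "schema:parent"]


def is_male(data, node):
    return (node, ":isMale", True) in data


def is_female(data, node):
    return (node, ":isFemale", True) in data


def shex_person(data, node):
    """ShEx reading: each schema:parent triple is consumed by exactly one
    of the two triple constraints, each with cardinality {1,1}."""
    ps = parents(data, node)
    checks = [is_male, is_female]
    if len(ps) != len(checks):
        return False
    return any(all(check(data, x) for check, x in zip(checks, perm))
               for perm in permutations(ps))


def shacl_person(data, node):
    """Direct SHACL translation: both property shapes constrain every
    schema:parent value, so each parent must be male and female."""
    return all(is_male(data, x) and is_female(data, x) for x in parents(data, node))


for node in (":alice", ":dave"):
    print(node, shex_person(DATA, node), shacl_person(DATA, node))
# :alice True False
# :dave False True
```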

As described in Section 5.12.2, repeated properties can be handled in SHACL using sh:qualifiedValueShape but the definitions are more verbose.

Example 173  Repeated properties with qualified value shapes

The following declaration handles the previous example using qualified value shapes.

:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [ sh:path :isMale ; sh:hasValue true ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:parent ;
    sh:qualifiedValueShape [ sh:path :isFemale ; sh:hasValue true ] ;
    sh:qualifiedMinCount 1 ;
    sh:qualifiedMaxCount 1 ;
  ] ;
  sh:property [
    sh:path schema:parent ;
    sh:minCount 2 ;
    sh:maxCount 2
  ] .

Note that this approach requires fixing the total number of values allowed for the repeated property (in this case, 2).
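A Python sketch of the counting performed by the qualified value shapes of Example 173 follows. It is a toy model with a hypothetical function name; :erin and :y are hypothetical nodes added to show a case that the counts alone do not exclude in this model: a parent that is both male and female is counted by both qualified shapes, so a second parent that is neither still yields a conforming node, whereas the ShEx schema of Example 172 would reject it.

```python
# Example 172 data plus a hypothetical node :erin (not from the text).
DATA = {
    (":alice", "schema:parent", ":bob"),
    (":alice", "schema:parent", ":carol"),
    (":bob", ":isMale", True),
    (":carol", ":isFemale", True),
    (":dave", "schema:parent", ":x"),
    (":x", ":isMale", True),
    (":x", ":isFemale", True),
    (":erin", "schema:parent", ":x"),
    (":erin", "schema:parent", ":y"),
}


def qualified_person(data, node):
    """Count parents conforming to each qualified value shape."""
    ps = [o for s, p, o in data if s == node and p == "schema:parent"]
    males = sum((x, ":isMale", True) in data for x in ps)
    females = sum((x, ":isFemale", True) in data for x in ps)
    # sh:qualifiedMinCount 1 / sh:qualifiedMaxCount 1 for each qualified
    # value shape, plus sh:minCount 2 / sh:maxCount 2 on schema:parent
    return males == 1 and females == 1 and len(ps) == 2


for node in (":alice", ":dave", ":erin"):
    print(node, qualified_person(DATA, node))
# :alice True
# :dave False
# :erin True
```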

7.13  Exactly One and Alternatives

Data coherence reduces the need for defensive programming by providing predictable, logical data structures that must be used. To take a trivial example, a data structure may offer a choice between different representations of a name as in Example 55 (for ShEx) and the corresponding Example 131 (for SHACL).

Let’s change the constraint to require a combination of foaf:firstName and foaf:lastName or foaf:givenName and foaf:familyName or schema:givenName and schema:familyName where none of these properties can be mixed with the others. In ShEx, this can be declared as:

:Person {
    foaf:firstName . ; foaf:lastName .
  | foaf:givenName . ; foaf:familyName .
  | schema:givenName . ; schema:familyName .
}

Given the following data, :alice and :bob conform to :Person while :carol and :dave do not. In the case of :dave, it fails because the data meets one side of the disjunction and has some properties from the other side.

:alice foaf:firstName "Alice" ;        # Passes as a :Person
       foaf:lastName "Cooper" .

:bob   schema:givenName "Robert" ;     # Passes as a :Person
       schema:familyName "Smith" .

:carol foaf:firstName "Carol" ;        # Fails as a :Person
       foaf:lastName "King" ;
       schema:givenName "Carol" ;
       schema:familyName "King" .

:dave  foaf:firstName "Dave" ;         # Fails as a :Person
       foaf:lastName "Clark" ;
       schema:givenName "Dave" .

A first attempt to model the example in SHACL could be:

:PersonShape a sh:NodeShape ;
  sh:targetClass :Person ;
  sh:xone (
    [ sh:property [ sh:path foaf:firstName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:lastName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
    [ sh:property [ sh:path foaf:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
    [ sh:property [ sh:path schema:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
  ) .

However, this SHACL shapes graph has a meaning different from the ShEx schema. In this case, :dave conforms to :Person because it matches exactly one of the shapes (it has foaf:firstName and foaf:lastName) and does not match the other shapes. The intended meaning was that it should not have any of the other properties but it has schema:givenName.

As we described in Section 131, SHACL’s sh:xone does not check if there are partial matches in other shapes. A workaround to simulate ShEx behavior is to normalize the expression using a top-level disjunction whose shapes exclude the properties that are not desired.

:Person a sh:NodeShape ;
  sh:or (
    [ sh:property [ sh:path foaf:firstName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:lastName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:familyName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:familyName ; sh:maxCount 0 ] ;
    ]
    [ sh:property [ sh:path foaf:firstName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:lastName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path foaf:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:familyName ; sh:maxCount 0 ] ;
    ]
    [ sh:property [ sh:path foaf:firstName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:lastName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:givenName ; sh:maxCount 0 ] ;
      sh:property [ sh:path foaf:familyName ; sh:maxCount 0 ] ;
      sh:property [ sh:path schema:givenName ; sh:minCount 1 ; sh:maxCount 1 ] ;
      sh:property [ sh:path schema:familyName ; sh:minCount 1 ; sh:maxCount 1 ] ;
    ]
  ) .

Although this approach solves the problem, it makes more complex and nested SHACL shapes considerably longer and harder to read.
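Every branch of the normalized sh:or must list the properties of all the other branches, so the shapes graph grows quickly with the number of alternatives. The hypothetical Python helper below is not part of any SHACL tool. It generates such a normalized shape from a list of property groups: each branch requires its own properties exactly once and forbids the rest with sh:maxCount 0.

```python
def normalized_or(shape, groups):
    """Return Turtle for a node shape whose sh:or branches are mutually exclusive.

    Each branch requires the properties of one group exactly once and forbids,
    with sh:maxCount 0, every property of the other groups.
    """
    all_props = [p for group in groups for p in group]
    branches = []
    for group in groups:
        lines = []
        for p in all_props:
            if p in group:
                lines.append(f"      sh:property [ sh:path {p} ; sh:minCount 1 ; sh:maxCount 1 ] ;")
            else:
                lines.append(f"      sh:property [ sh:path {p} ; sh:maxCount 0 ] ;")
        branches.append("    [\n" + "\n".join(lines) + "\n    ]")
    return f"{shape} a sh:NodeShape ;\n  sh:or (\n" + "\n".join(branches) + "\n  ) ."


NAME_GROUPS = [
    ("foaf:firstName", "foaf:lastName"),
    ("foaf:givenName", "foaf:familyName"),
    ("schema:givenName", "schema:familyName"),
]

print(normalized_or(":Person", NAME_GROUPS))
```

With the three name groups of the example, the generated shape contains 6 required and 12 forbidden property shapes. In general, g groups of k properties need g(g-1)k sh:maxCount 0 entries.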

7.14  Treatment of Closed Shapes

ShEx has the CLOSED keyword to declare that a node must not have properties other than those declared in the shape. SHACL similarly has the sh:closed parameter, which declares that a node conforming to a shape must not have properties other than those declared in the shape. Although they look similar, there are some differences due to the interaction of closedness with other language features.

When a SHACL shape is closed, SHACL processors only take into account the properties that appear as values of sh:path in the shape's top-level property shapes (those declared with sh:property). As a consequence, declaring a shape as a conjunction of property shapes is not the same as declaring it with sh:and. The following shape declares that nodes conforming to :UserShape must have the properties schema:name and schema:birthDate. The declaration sh:closed true specifies that nodes conforming to :UserShape cannot have other properties.

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [ sh:path schema:name ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:string ] ;
  sh:property [ sh:path schema:birthDate ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:date ] .

If we rewrite that example using a sh:and as:

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:and (
    [ sh:path schema:name ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:string ]
    [ sh:path schema:birthDate ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:date ]
  ) .

then no node will conform to the shape. The sh:and requires schema:name and schema:birthDate. Because these property shapes are nested under sh:and, however, they are hidden from the sh:closed directive, which therefore rejects any node that has those properties.

A solution in this case is to enumerate the properties that we allow using sh:ignoredProperties. In this case, one should add:

:UserShape sh:ignoredProperties (schema:name schema:birthDate ) .
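The behavior of sh:closed described above can be sketched in Python. The dictionaries below are a hypothetical, simplified model of the three :UserShape variants: every property shape implicitly has sh:minCount 1 and sh:maxCount 1, as in the examples. Only the IRI paths of the shape's own sh:property values and the sh:ignoredProperties list count as allowed predicates.

```python
# Hypothetical, simplified model of the three :UserShape variants. Every property
# shape implicitly carries sh:minCount 1 and sh:maxCount 1, as in the examples.
FLAT = {"closed": True, "property": [{"path": "schema:name"}, {"path": "schema:birthDate"}]}
NESTED = {"closed": True, "and": [{"path": "schema:name"}, {"path": "schema:birthDate"}]}
FIXED = dict(NESTED, ignoredProperties=["schema:name", "schema:birthDate"])


def allowed_predicates(shape):
    """Predicates a closed shape permits: IRI paths of its own sh:property values
    plus sh:ignoredProperties. Property shapes nested under sh:and are not seen."""
    allowed = set(shape.get("ignoredProperties", []))
    for prop in shape.get("property", []):
        if isinstance(prop["path"], str):  # only IRI paths count
            allowed.add(prop["path"])
    return allowed


def conforms(shape, node):
    """Check the cardinalities of direct and sh:and-nested property shapes, then sh:closed."""
    props = shape.get("property", []) + shape.get("and", [])
    if not all(len(node.get(p["path"], [])) == 1 for p in props):
        return False
    if shape.get("closed"):
        return set(node) <= allowed_predicates(shape)
    return True


user = {"schema:name": ["Alice"], "schema:birthDate": ["1990-01-01"]}
print(conforms(FLAT, user), conforms(NESTED, user), conforms(FIXED, user))  # True False True
```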

A similar situation could happen if we use more complex property paths.

For example, we may want to declare that users can have either schema:name or foaf:name using an alternative property path as:

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:string
  ] .

As in the previous example, no node would conform to that shape. The closed declaration only considers property shapes whose sh:path is an IRI, so the predicates inside the alternative path are not treated as allowed properties.

There are two solutions. The first is to add a sh:ignoredProperties declaration enumerating all the properties, as in the previous example. The second is to add a property shape for each predicate that specifies no constraints, so that it has no effect other than making the predicate allowed:

:UserShape a sh:NodeShape ;
  sh:closed true ;
  sh:property [
    sh:path [ sh:alternativePath ( schema:name foaf:name ) ] ;
    sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:string
  ] ;
  sh:property [ sh:path schema:name ] ;
  sh:property [ sh:path foaf:name ] .
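A similar sketch, again using a hypothetical dictionary model, shows why the alternative path alone is not enough. The values of an alternative path are the union of the values of its member predicates, but only plain IRI paths contribute to the allowed predicates of a closed shape. Both shapes below are assumed to be closed.

```python
# Hypothetical model: a path is either an IRI string or ("alt", [iri, ...])
# standing for sh:alternativePath. Both shapes below are treated as closed.
ALT = ("alt", ["schema:name", "foaf:name"])
ONLY_ALT = [{"path": ALT, "min": 1, "max": 1}]
WITH_PLAIN = ONLY_ALT + [{"path": "schema:name"}, {"path": "foaf:name"}]


def path_values(node, path):
    """Values reached from the node: one predicate, or the union for an alternative path."""
    if isinstance(path, str):
        return list(node.get(path, []))
    kind, members = path
    if kind != "alt":
        raise ValueError("only sh:alternativePath is modelled in this sketch")
    return [v for p in members for v in node.get(p, [])]


def conforms_closed(property_shapes, node):
    """Check each property shape's cardinality, then closedness (IRI paths only)."""
    for ps in property_shapes:
        n = len(path_values(node, ps["path"]))
        if not ps.get("min", 0) <= n <= ps.get("max", float("inf")):
            return False
    allowed = {ps["path"] for ps in property_shapes if isinstance(ps["path"], str)}
    return set(node) <= allowed


user = {"foaf:name": ["Alice"]}
print(conforms_closed(ONLY_ALT, user))    # False: foaf:name is not an allowed predicate
print(conforms_closed(WITH_PLAIN, user))  # True
```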

7.15  Stems and Stem Ranges

ShEx supports the definition of stems and stem ranges when defining value sets (see Section 4.5.4). SHACL does not have built-in support for stems or stem ranges. Stems and stem ranges could be emulated with sh:pattern, sh:nodeKind, and sh:or.

Example 174  IRI ranges example

The following schema, introduced in Example 44, declares that the value of :status must be an IRI with the stem codes:good or codes:bad:

prefix codes: <http://example.codes/>
:Product { :status [ codes:good~ codes:bad~ ] }

A possible SHACL definition using regular expressions could be:

:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:nodeKind sh:IRI ;
    sh:or ( [ sh:pattern "^http://example.codes/good" ]
            [ sh:pattern "^http://example.codes/bad" ] )
  ] .
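The regular-expression emulation can be checked with Python's `re` module. This assumes that the values have already passed sh:nodeKind sh:IRI, and that Python's regex semantics agree with SHACL's XPath-based sh:pattern for these simple anchored patterns. The unescaped "." in the patterns above matches any character, so the sketch uses `re.escape` to treat each stem as a literal prefix. The helper name is hypothetical.

```python
import re

STEMS = ["http://example.codes/good", "http://example.codes/bad"]


def matches_stem_pattern(iri, stems=STEMS):
    """Emulate sh:or over sh:pattern constraints: the IRI must match one anchored pattern.

    re.escape makes each stem a literal prefix, so '.' no longer matches any character.
    """
    return any(re.search("^" + re.escape(stem), iri) for stem in stems)


print(matches_stem_pattern("http://example.codes/good/42"))  # True
print(matches_stem_pattern("http://example.codes/other"))    # False
# The unescaped pattern from the shape also accepts a different host name:
print(bool(re.search("^http://example.codes/good", "http://exampleXcodes/good")))  # True
```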

Another possibility is to define a reusable constraint component in SHACL-SPARQL as:

:StemConstraintComponent a sh:ConstraintComponent ;
  sh:parameter [ sh:path :stem ] ;
  sh:validator [
    a sh:SPARQLAskValidator ;
    sh:message "Value does not have stem {$stem}" ;
    sh:ask """
      ASK {
        FILTER (!isBlank($value) && strstarts(str($value), str($stem)))
      }"""
  ] .

which can be used as:

:Product a sh:NodeShape ;
  sh:property [
    sh:path :status ;
    sh:minCount 1 ; sh:maxCount 1 ;
    sh:or ( [ :stem <http://example.codes/good> ]
            [ :stem <http://example.codes/bad> ] )
  ] .
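The FILTER in the :StemConstraintComponent validator can be mirrored in a few lines of Python. The `Term` class is a hypothetical stand-in for RDF terms. As written, the ASK query rejects blank nodes but accepts any IRI or literal whose string form starts with the stem. The usage above therefore also admits such literals unless it is combined with sh:nodeKind sh:IRI, as in the regular-expression version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for an RDF term: kind is 'iri', 'literal' or 'bnode'."""
    kind: str
    value: str


def stem_ask(value: Term, stem: Term) -> bool:
    """Mirror of FILTER (!isBlank($value) && strstarts(str($value), str($stem)))."""
    return value.kind != "bnode" and value.value.startswith(stem.value)


GOOD = Term("iri", "http://example.codes/good")
BAD = Term("iri", "http://example.codes/bad")


def status_ok(value: Term) -> bool:
    """sh:or over the two shapes that use :stem with the good and bad IRIs."""
    return stem_ask(value, GOOD) or stem_ask(value, BAD)


print(status_ok(Term("iri", "http://example.codes/good/1")))       # True
print(status_ok(Term("bnode", "b0")))                                # False
print(status_ok(Term("literal", "http://example.codes/bad-value")))  # True
```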

ShEx also has range exclusions, which declare values to exclude, given either as a single value or as a stem (see Example 45). That feature is not part of SHACL Core and should be defined using SHACL-SPARQL.
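Range exclusions can be sketched as a prefix test followed by exclusion checks. In this hypothetical Python function, an exclusion ending in "~" is read as a stem, mirroring the ShExC notation, and any other exclusion is read as a single IRI. The sketch only illustrates the matching logic that a SHACL-SPARQL component would have to implement; the excluded IRIs are made up for the example.

```python
def in_stem_range(iri, stem, exclusions=()):
    """IRI stem range with exclusions (simplified sketch of the ShEx feature).

    An exclusion ending in '~' is read as a stem; any other exclusion is one IRI.
    """
    if not iri.startswith(stem):
        return False
    for ex in exclusions:
        if ex.endswith("~"):
            if iri.startswith(ex[:-1]):
                return False
        elif iri == ex:
            return False
    return True


CODES = "http://example.codes/"
EXCLUDED = [CODES + "bad~", CODES + "unknown"]  # hypothetical exclusions

print(in_stem_range(CODES + "good1", CODES, EXCLUDED))    # True
print(in_stem_range(CODES + "bad/2", CODES, EXCLUDED))    # False (excluded stem)
print(in_stem_range(CODES + "unknown", CODES, EXCLUDED))  # False (excluded IRI)
```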

7.16  Annotations

ShEx has the concept of annotations which can be attached to several constructs (see Section 4.7.5). For example, the following ShEx schema attaches two annotations to each triple constraint.

Example 175  Annotations example in ShEx
:Person {
  schema:name      xsd:string // rdfs:label "Name"      // rdfs:comment "Name of person" ;
  schema:birthDate xsd:date   // rdfs:label "BirthDate" // rdfs:comment "Date of birth"
}

ShEx does not endorse or require the use of any specific annotation vocabulary.

SHACL has non-validating constraint components (see Section 5.15), such as sh:name and sh:description. The SHACL processor ignores them during validation, but they can have special meaning for user interface generation. It is also possible to add further informative triples, such as rdfs:label annotations, to any shape.

Example 176  Annotations example in SHACL

The following SHACL shapes graph declares a shape :Person using the non-validating properties sh:name and sh:description and the annotation rdfs:label.

:Person a sh:NodeShape ;
  sh:property [
    sh:path schema:name ;
    sh:datatype xsd:string ;
    sh:name "Name" ;
    sh:description "Name of person" ;
    rdfs:label "Name"
  ] ;
  sh:property [
    sh:path schema:birthDate ;
    sh:datatype xsd:date ;
    sh:name "BirthDate" ;
    sh:description "Birth date" ;
    rdfs:label "BirthDate"
  ] .

As we saw in Section 5.15, SHACL non-validating properties can be helpful for generating forms from SHACL definitions.

Although ShEx does not provide built-in non-validating properties, it would be possible to use annotations from other vocabularies, even from SHACL.
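The following Python sketch illustrates how sh:name, sh:description and sh:datatype can drive form generation. It renders one HTML input per property shape of the SHACL :Person example. The dictionary representation and the datatype-to-input mapping are assumptions made for this example and are not part of SHACL.

```python
from html import escape

# Hypothetical mapping from XSD datatypes to HTML input types.
INPUT_TYPES = {"xsd:string": "text", "xsd:date": "date"}

# The property shapes of the SHACL :Person example as plain dictionaries.
PERSON = [
    {"path": "schema:name", "datatype": "xsd:string",
     "name": "Name", "description": "Name of person"},
    {"path": "schema:birthDate", "datatype": "xsd:date",
     "name": "BirthDate", "description": "Birth date"},
]


def form_fields(property_shapes):
    """Render one labelled <input> per property shape, using sh:name as label text,
    sh:description as tooltip and sh:datatype to pick the input type."""
    fields = []
    for ps in property_shapes:
        label = escape(ps.get("name", ps["path"]))
        tooltip = escape(ps.get("description", ""))
        field_name = escape(ps["path"])
        kind = INPUT_TYPES.get(ps.get("datatype"), "text")
        fields.append(
            f'<label title="{tooltip}">{label} '
            f'<input name="{field_name}" type="{kind}"></label>'
        )
    return "\n".join(fields)


print(form_fields(PERSON))
```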

7.17  Semantics and Complexity

The ShEx semantic specification [81] is based on mathematical concepts and has been proven to have a well-founded semantics [11]. As we saw in Section 4.8.3, a restriction was imposed on the combination of recursion and negation to avoid ill-formed data models.

With regard to the complexity of the validation algorithm, ShEx semantics is based on a partitioning strategy. Triples in the data are assigned to triple constraints in the schema, and the matching algorithm must take into account that arcs in a graph are unordered. It is possible to construct schemas for which it is very expensive to find a mapping from RDF data triples to triple constraints that satisfies the schema. In practical schemas this is rarely a concern, as the search space is quite small, but certain mistakes in a schema can create a large search space. The ShEx primer2 contains some advice on improving performance:

"Accidentally duplicating many triple constraints in a shape causes the search space to explode. If a validation process takes a long time or a lot of memory, look for duplicated chunks of the schema.

For shapes with multiple triple constraints for the same predicate, try to minimize the overlap between the value expressions. For instance, if three types of inspection are necessary on a manufacturing checklist, use three different constraints for each of the inspection properties rather than requiring three different inspection properties with a value expression which is a union of all three types. This will make the validation process more efficient and will more effectively capture the business logic in the schema."
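The search-space explosion described in the primer can be illustrated with a deliberately naive Python sketch. This is not the algorithm used by any ShEx implementation: each triple is assigned to one of the triple constraints with the same predicate, and every combination is tried. The inspection data and constraint dictionaries are hypothetical, and all triples are assumed to use predicates mentioned in the shape.

```python
from itertools import product


def matching_assignments(triples, constraints):
    """Naive search: assign each (predicate, value) triple to a triple constraint with
    the same predicate and try every combination.

    Returns (assignments_tried, assignments_that_satisfy_all_constraints).
    """
    candidates = [
        [i for i, c in enumerate(constraints) if c["predicate"] == pred]
        for pred, _ in triples
    ]
    tried = succeeded = 0
    for choice in product(*candidates):
        tried += 1
        buckets = [[] for _ in constraints]
        for (_, value), i in zip(triples, choice):
            buckets[i].append(value)
        if all(
            c["min"] <= len(b) <= c["max"] and all(c["accepts"](v) for v in b)
            for c, b in zip(constraints, buckets)
        ):
            succeeded += 1
    return tried, succeeded


KINDS = ("visual", "pressure", "electrical")

# Three constraints on the same predicate, each accepting the union of all kinds.
overlapping = [
    {"predicate": ":inspection", "min": 1, "max": 1, "accepts": lambda v: v in KINDS}
    for _ in KINDS
]
same_pred = [(":inspection", k) for k in KINDS]

# One constraint per inspection property, as the primer recommends.
distinct = [
    {"predicate": f":{k}Inspection", "min": 1, "max": 1, "accepts": lambda v: True}
    for k in KINDS
]
distinct_triples = [(f":{k}Inspection", "passed") for k in KINDS]

print(matching_assignments(same_pred, overlapping))      # (27, 6)
print(matching_assignments(distinct_triples, distinct))  # (1, 1)
```

With three overlapping :inspection constraints, 3^3 = 27 assignments are explored (6 of them succeed, one per permutation). With three distinct predicates, only a single candidate assignment remains. With n overlapping constraints and n triples, this naive search explores n^n assignments.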

The semantics of SHACL Core is defined in natural language with some non-normative SPARQL templates, while SHACL-SPARQL depends on a SPARQL processor. Its complexity is therefore tied to that of SPARQL, which can also be quite expensive, especially when property paths are used. As in the case of ShEx, it is also possible to declare shapes graphs that consume a lot of time or memory.

Both ShEx and SHACL open the door to further research on optimizations and specialized implementations usable for big datasets. Validators could define language subsets with constructs that behave better on such datasets. To our knowledge, current implementations have mainly been tested on in-memory data: separate RDF files or relatively small units of work (transactions). An exception is RDFUnit, which supports the execution of SHACL directly on SPARQL endpoints and thus can, in theory, scale with the capabilities of the SPARQL engine. Much research remains to be done on how very large (not in-memory) datasets can be validated efficiently with RDF shapes.

Benchmarks and testing tools are an essential step towards measuring the performance of both languages as well as implementations. One early attempt was to use the WebIndex dataset as a benchmark [57].

7.18  Extension Mechanisms

SHACL-SPARQL can be used to define both custom SPARQL-based constraints and reusable SPARQL-based constraint components (see Section 5.16.2). As the constraint components are defined in SPARQL, any SPARQL-compliant engine could potentially run them without requiring software updates. A SPARQL engine will be required in any case. SHACL also provides SHACL-JavaScript, which can be used to write extensions (Section 5.20).

SHACL-SPARQL allows the definition of new constraint components which can have parameters and can be reused in new contexts. It is expected that SHACL libraries of useful constraint components will be developed in the future. For example, the http://datashapes.org/ site contains a collection of some constraint components that extend SHACL Core.

ShEx has provisions for callout to arbitrary functions, called semantic actions, that are language-agnostic (see Section 4.10). However, semantic actions cannot be used to create new reusable parametrizable shape expressions. This is considered an item for future work on ShEx.

7.19  Conclusions and Outlook

As of July 2017, it appears that ShEx and SHACL will evolve as two different specifications. The design of SHACL prioritized the use of SPARQL as an execution engine and an extension mechanism for defining new constraint components, while ShEx was designed de novo to meet its use cases. SHACL leverages a query language for validating sets of constraints, while validation schemas in the ShEx language are defined in terms of a grammar.

There is, however, a significant intersection between the two languages. Many common use cases may be met with either language, although users should consider how the limitations of these languages apply to their current and future requirements. In this book, we described and compared each formalism so that readers can assess which technology better fits their problems.

If we look for parallels in the XML ecosystem, ShEx is closer to RELAX NG or XML Schema, which provide structural definitions for XML documents. SHACL is closer to Schematron, which defines rules or constraints on top of XPath, analogously to how SHACL defines constraints on top of SPARQL. SHACL Core can capture simple structures, but more complex structures, with exclusive choices or repeated properties, may require multiple inter-related constraints.

The two specifications currently have different implementation ecosystems. ShEx has been implemented in a variety of programming languages and RDF libraries: Apache Jena, Ruby, JavaScript, Haskell, and Python (see Section 4.3). In the case of SHACL, most implementations are based on Apache Jena, and there is an implementation in JavaScript (see Section 5.2). Some implementations are also appearing in other systems, such as RDF4J. Most ShEx implementations are non-commercial and have been developed mainly by individual projects. SHACL has a mature commercial implementation, bundled with the TopBraid suite of products, which offers a rich user interface for editing SHACL-based data models. Although TopBraid is a commercial product, its SHACL implementation is based on a separate open source library maintained by TopQuadrant. SHACL is also integrated in the free edition of TopBraid Composer.

Both ShEx and SHACL open several lines for future work and research.

ShEx and SHACL will play an important role in the future development of RDF and will be a core part of the Semantic Web tool set. As more semantic data is generated, and more applications are needed to integrate and consume it, RDF validation will be a fundamental enabler for data quality and systems interoperability.

7.20  Summary

7.21  Suggested Reading


1
The complete list of rules is defined in https://www.w3.org/TR/shacl/#syntax-rules.
2
See: http://shex.io/shex-primer/
3
See: https://www.w3.org/2011/gld/validator/datacube.shapes.ttl
4
See: http://shex.io/extensions/Map/
5
See: http://labra.github.io/shaclex/
