Chapter 4 Shape Expressions

Shape Expressions (ShEx) is a schema language for describing RDF graph structures. ShEx was originally developed in late 2013 to provide a human-readable syntax for OSLC Resource Shapes. Because it added disjunctions, it was more expressive than Resource Shapes. Tokens in the language were adopted from Turtle [80] and SPARQL [44], with tokens for grouping, repetition and wildcards taken from regular expressions and the RelaxNG Compact Syntax [100]. The language was described in a paper [80] and codified in a June 2014 W3C member submission [92], which included a primer and a semantics specification. This version was later called “ShEx 1.0”.

The W3C Data Shapes Working Group started in September 2014 and quickly coalesced into two groups: the ShEx camp and the SHACL camp. In 2016, the ShEx camp split from the Data Shapes Working Group to form a ShEx Community Group (CG). In April 2017, the ShEx CG released ShEx 2 with a primer, a semantics specification, and a test suite with implementation reports.

As of publication, the ShEx Community Group was starting work on ShEx 2.1 to add features like value comparison and unique keys. See the ShEx Homepage http://shex.io/ for the state of the art in ShEx. A collection of ShEx schemas has also been started at https://github.com/shexSpec/schemas.

4.1 Use of ShEx

Strictly speaking, a ShEx schema defines a set of graphs. This can be used for many purposes, including communicating the data structures associated with some process or interface, generating or validating data, or driving user interface generation and navigation. At the core of all of these use cases is the notion of conformance with a schema. Even when ShEx is used to create forms, the goal is to accept and present data which is valid with respect to a schema.

ShEx has several serialization formats:

a concise, human-readable compact syntax (ShExC);
a JSON-LD syntax (ShExJ) which serves as an abstract syntax; and
an RDF representation (ShExR) derived from the JSON-LD syntax.

These are all isomorphic and most implementations can map from one to another.

Tools that derive schemas by inspecting data or translate them from other schema languages typically generate ShExJ. Interactions with users, e.g., in specifications, are almost always in the compact syntax ShExC. As a practical example, in HL7 FHIR, ShExJ schemas are automatically generated from other formats and presented to the end user in the compact syntax. See Section 6.2.3 for more details.

ShExR makes it possible to manage schemas with RDF tools, e.g., running a SPARQL query to find out whether an organization uses dc:creator with a string or with a foaf:Person, or even whether the organization is consistent about it.

4.2 First Example

Example 26 below contains a very simple ShEx schema.

The first three lines declare prefixes using the same syntax as SPARQL and Turtle.
The rest of the schema defines a shape called :User. Nodes with that shape must satisfy the following constraints on their properties.
They must have exactly one value for property schema:name, which must be an xsd:string.
They can have an optional property schema:birthDate with type xsd:date.
They must have exactly one property schema:gender whose value is schema:Male, schema:Female, or some string.
They can have zero or more properties schema:knows whose values must be IRIs that conform to the :User shape.

Example 26 Simple ShEx Schema

PREFIX :       <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

:User {
  schema:name      xsd:string ;
  schema:birthDate xsd:date? ;
  schema:gender    [ schema:Male schema:Female ] OR xsd:string ;
  schema:knows     IRI @:User*
}

All the nodes in the following RDF graph conform to the :User shape.

:alice schema:name      "Alice" ;          # Passes as a :User
       schema:gender    schema:Female ;
       schema:knows     :bob .

:bob   schema:gender    schema:Male ;      # Passes as a :User
       schema:name      "Robert" ;
       schema:birthDate "1980-03-10"^^xsd:date .

:carol schema:name      "Carol" ;          # Passes as a :User
       schema:gender    "unspecified" ;
       foaf:name        "Carol" .

The nodes :alice, :bob and :carol have shape :User.


:alice conforms because it contains schema:name and schema:gender with their corresponding values. It does not contain the property schema:birthDate, but that property is optional, as indicated by ‘?’. It also has the property schema:knows with the value :bob, which has the :User shape.
:bob conforms because it contains the properties and values of the :User shape. Note that the order in which triples are expressed in the example does not matter. These are parsed into an RDF graph and RDF graphs are unordered collections of triples.
:carol conforms because it has property schema:name with an xsd:string value, schema:gender with another xsd:string value, and an extra property foaf:name.
Notice that :carol conforms even though it has properties other than those mentioned in the :User shape definition (in this case foaf:name).
ShEx shapes are open by default, which means that they constrain neither the existence nor the value of the properties not mentioned in the shape. This behavior can be modified using the CLOSED qualifier as we will explain in Section 4.6.8.

Given the following RDF graph:

:dave   schema:name      "Dave" ;            # Fails as a :User
        schema:gender    "XYY" ;
        schema:birthDate 1980 .              # 1980 is not an xsd:date

:emily  schema:name      "Emily", "Emilee" ; # Fails as a :User
        schema:gender    schema:Female .     # too many schema:names

:frank  foaf:name        "Frank" ;           # Fails as a :User
        schema:gender    schema:Male .       # missing schema:name

:grace  schema:name      "Grace" ;           # Fails as a :User
        schema:gender    schema:Male ;
        schema:knows     _:x .               # _:x is not an IRI

:harold schema:name      "Harold" ;          # Fails as a :User
        schema:gender    schema:Male ;
        schema:knows     :grace .            # :grace does not conform to :User

If we try to validate the nodes in this graph against the shape :User, validation fails for all of them:

:dave fails because the value of schema:birthDate is 1980 (an integer) which is not an xsd:date.
:emily fails because it has two values for property schema:name. Unless otherwise specified, the default cardinality is “exactly one” (which can also be written as “ {1}” or “ {1,1}”).
:frank fails because it does not have the property schema:name.
:grace fails because the value of schema:knows is a blank node and there is a node constraint saying that it must be an IRI.
:harold fails because the value of schema:knows is :grace and :grace does not conform to the :User shape.


4.3 ShEx implementations

At the time of this writing, we are aware of the following implementations of ShEx.

shex.js for Javascript/N3.js (Eric Prud’hommeaux) https://github.com/shexSpec/shex.js/;
Shaclex for Scala/Jena (Jose Emilio Labra Gayo) https://github.com/labra/shaclex/;
shex.rb for Ruby/RDF.rb (Gregg Kellogg) https://github.com/ruby-rdf/shex;
Java ShEx for Java/Jena (Iovka Boneva/University of Lille) https://gforge.inria.fr/projects/shex-impl/; and
ShExkell for Haskell (Sergio Iván Franco and Weso Research Group) https://github.com/weso/shexkell.

There are also several online demos and tools that can be used to experiment with ShEx.

shex.js (http://rawgit.com/shexSpec/shex.js/master/doc/shex-simple.html);
Shaclex (http://shaclex.herokuapp.com); and
ShExValidata (for ShEx 1.0) (https://www.w3.org/2015/03/ShExValidata/).

4.4 The Shape Expressions Language

4.4.1 Shape Expressions Compact Syntax

The ShEx compact syntax (ShExC) was designed to be read and edited by humans. It follows some conventions which are similar to Turtle or SPARQL.

PREFIX and BASE declarations follow the same convention as in Turtle. In the rest of this chapter we will omit prefix declarations for brevity.
Comments start with a # and continue until the end of line.
The keyword a identifies the rdf:type property.
Relative and absolute IRIs are enclosed by < > and prefixed names (a shorter way to write out IRIs) are written with prefix followed by a colon “:”.
Blank nodes are identified using _:label notation.
Literals can be enclosed by the same quotation conventions ( ', ", ''', """) as in Turtle.
Keywords (apart from a) are not case sensitive, which means that MinInclusive is the same as MININCLUSIVE.

A ShExC document declares a ShEx schema. A ShEx schema is a set of labeled shape expressions which are composed of node constraints and shapes. These constrain the permissible values or graph structure around a node in an RDF graph. When we are considering a specific node, we call that node the focus node.

The triples which have the focus node as a subject are called outgoing arcs; those with the focus node as an object are called incoming arcs. (Typical RDF idioms call for constraints on outgoing arcs much more frequently than on incoming arcs.) Together, the incoming and outgoing arcs are called the neighborhood of that node.

Shape expression labels can be IRIs or blank nodes but only IRI labels can be referenced from outside the schema. In the previous Example 26, :User is an IRI label.

Node constraints declare the shape of a focus node without looking at the arcs. They can declare the kind of node (IRI, blank node or literal), the datatype in the case of literals, describe it with XML Schema facets (e.g., min and max numeric values, string lengths, number of digits), or enumerate a value set. Figure 4.1 highlights the node constraints that appear in Example 26: xsd:string and xsd:date (datatype constraints), [schema:Male schema:Female] (a value set), IRI (a node kind declaration) and @:User (a shape reference). Node constraints will be described in more detail in Section 4.5.

Figure 4.1: Node constraints in a shape.

Triple constraints define the triples that appear in the neighborhood of a focus node. They usually contain a property (or inverse property), a node constraint, and a cardinality declaration which is one by default.

For example, schema:name xsd:string is a triple constraint. The :User shape from Example 26 was formed by four triple constraints. Triple constraints will be described later in Section 4.6.1.

Figure 4.2: Triple constraints in a shape.

Triple constraints can be grouped using the semicolon operator ; to form triple expressions.¹ Shapes are enclosed by curly braces { } and contain triple expressions.

Shapes are the basic form of shape expressions, although more complex shape expressions can be formed by combining the logical operators AND, OR and NOT which will be later described in Section 4.6. Shape expressions are identified by shape expression labels.

Figure 4.3: Shapes, shape expression labels and triple expressions.

Figure 4.4 shows a compound shape expression formed by combining the shape reference @:User with a shape that contains a single triple constraint :teaches @:Course using the AND operator.

The full ShEx BNF grammar is specified at http://shex.io/shex-semantics/#shexc.

Figure 4.4: Shape expression and shape.

4.4.2 Invoking Validation

In Example 26, we tested several RDF nodes ( :alice, :bob, ... :harold) against the shape :User.

ShEx validation takes as input a schema, an RDF graph, and a shape map, and returns another shape map.

The input shape map (called fixed shape map) contains a list of nodeSelector@shapeLabel associations separated by commas, where nodeSelector is an RDF node and shapeLabel is a shape label. Both use N-Triples notation.

A fixed map would look like:

<http://data.example/#alice>@<http://schema.example/#User>, <http://data.example/#bob>@<http://schema.example/#User>

Although shape maps use absolute IRIs for RDF nodes and shape labels, we will use prefixes to abbreviate them in our listings:

:alice@:User, :bob@:User

Note that during evaluation, the processor may need to check the conformance of other nodes against other shapes.
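To make the format concrete, here is a minimal Python sketch that splits a fixed shape map, written with the prefixed names used in our listings, into (node, shape label) pairs. The function name parse_fixed_shape_map is hypothetical and not part of any ShEx implementation, and the sketch assumes that no commas appear inside nodes or labels, so literals such as "a,b" are not handled.

```python
from typing import List, Tuple


def parse_fixed_shape_map(text: str) -> List[Tuple[str, str]]:
    """Split 'node@label, node@label, ...' into (node, label) pairs.

    Simplifying assumption: commas never occur inside nodes or labels.
    """
    associations = []
    for entry in text.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the LAST '@' so that a language-tagged literal such as
        # "Alice"@en@:User keeps its tag attached to the node.
        node, sep, label = entry.rpartition("@")
        if not sep or not node or not label:
            raise ValueError(f"malformed association: {entry!r}")
        associations.append((node.strip(), label.strip()))
    return associations


print(parse_fixed_shape_map(":alice@:User, :bob@:User"))
```

Splitting on the last @ is a design choice that keeps language tags with their literals; absolute IRIs in angle brackets, as in the unabbreviated map above, are handled the same way.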

Example 27 Invoking validation example

If we define the following schema:

:User {
  schema:name  xsd:string ;
  schema:knows @:User*
}

and the RDF graph:

:alice schema:name "Alice" ;
       schema:knows :carol .
:bob   schema:name "Robert" .
:carol schema:name "Carol" .

when we invoke a ShEx processor with the fixed shape map:

:alice@:User, :bob@:User

the result shape map is:

:alice@:User, :bob@:User, :carol@:User

The reason is that in order to check that :alice conforms to :User, the processor must check that :carol also conforms to :User and hence, it adds the association :carol@:User to the result shape map.

Figure 4.5 depicts the validation process.

Figure 4.5: Validation process which accepts a fixed shape map and emits a result shape map.
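The growth of the result shape map can be imitated with a small Python sketch specialized to the :User shape of Example 27. It is not a general ShEx engine: the helper names are hypothetical, every quoted term is treated as an xsd:string literal, every label in the fixed map is assumed to be :User, and only conformant associations are returned. Recursion through schema:knows is handled by optimistically assuming that every reachable node is a :User and then removing nodes that violate the shape until nothing changes, an approach in the spirit of the maximal typing used for recursive shapes.

```python
from collections import deque
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Example 27 data; literals are written with surrounding double quotes.
GRAPH: Set[Triple] = {
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:knows", ":carol"),
    (":bob", "schema:name", '"Robert"'),
    (":carol", "schema:name", '"Carol"'),
}


def objects(graph: Set[Triple], subject: str, predicate: str) -> List[str]:
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def reachable(graph: Set[Triple], starts: List[str]) -> Set[str]:
    """The start nodes plus every node reachable through schema:knows."""
    seen, queue = set(starts), deque(starts)
    while queue:
        for friend in objects(graph, queue.popleft(), "schema:knows"):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen


def validate_users(graph: Set[Triple],
                   fixed_map: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Check :User { schema:name xsd:string ; schema:knows @:User* }."""
    typing = reachable(graph, [node for node, _ in fixed_map])
    changed = True
    while changed:  # drop violating nodes until the typing is stable
        changed = False
        for node in list(typing):
            names = objects(graph, node, "schema:name")
            name_ok = len(names) == 1 and names[0].startswith('"')
            knows_ok = all(f in typing for f in objects(graph, node, "schema:knows"))
            if not (name_ok and knows_ok):
                typing.discard(node)
                changed = True
    return sorted((node, ":User") for node in typing)


print(validate_users(GRAPH, [(":alice", ":User"), (":bob", ":User")]))
```

As in the text, asking only about :alice and :bob also yields :carol@:User, because :carol had to be checked along the way.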


There are many use-case-dependent ways to compose a fixed shape map. ShEx defines a common one called a query shape map, which uses triple patterns to select nodes. Triple patterns use curly braces and three values that represent the subject, predicate and object of a triple. They can contain the keyword FOCUS to identify the node we want to select and _ to indicate that a value is not constrained.

Example 28 Query map example

The following query map selects all subjects of schema:name, all objects of schema:knows and nodes that have rdf:type with value schema:Person.

{FOCUS schema:name _}@:User, {_ schema:knows FOCUS}@:User, {FOCUS rdf:type schema:Person}@:User
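Turning a query shape map into a fixed one amounts to matching each triple pattern against the graph and collecting the nodes found in the FOCUS position. The Python sketch below shows that step for triple patterns only (plain node selectors are omitted); triples are assumed to be tuples of strings, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
FOCUS, ANY = "FOCUS", "_"


def select_focus_nodes(graph: Set[Triple], pattern: Triple) -> Set[str]:
    """Nodes that occupy the FOCUS position in triples matching the pattern."""
    focus_at = pattern.index(FOCUS)
    return {
        triple[focus_at]
        for triple in graph
        # FOCUS and _ match anything; any other term must match exactly
        if all(term in (FOCUS, ANY) or term == value
               for term, value in zip(pattern, triple))
    }


def resolve_query_map(graph: Set[Triple],
                      query_map: List[Tuple[Triple, str]]) -> List[Tuple[str, str]]:
    """Expand a query shape map into a fixed shape map without duplicates."""
    fixed: List[Tuple[str, str]] = []
    for pattern, label in query_map:
        for node in sorted(select_focus_nodes(graph, pattern)):
            if (node, label) not in fixed:
                fixed.append((node, label))
    return fixed


GRAPH = {
    (":alice", "schema:name", '"Alice"'),
    (":alice", "schema:knows", ":carol"),
    (":dave", "rdf:type", "schema:Person"),
}
QUERY_MAP = [
    ((FOCUS, "schema:name", ANY), ":User"),
    ((ANY, "schema:knows", FOCUS), ":User"),
    ((FOCUS, "rdf:type", "schema:Person"), ":User"),
]
print(resolve_query_map(GRAPH, QUERY_MAP))
```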


Section 4.9 describes fixed shape maps and query shape maps in greater detail.

In the previous example, validating :alice as a :User entailed validating :carol as a :User. Unless the validation engine has some sort of state persistence, it would be more efficient to validate once with a shape map like:

:alice@:User,:carol@:User

than to validate :alice and :carol separately.

Validating a shape map with multiple node/shape pairs allows the engine to leverage any pairs that it has already tested.

4.4.3 Structure of Shape Expressions

In Section 4.4.1, we described shape expressions as being composed of node constraints and shapes. These can also be combined with the logical operators And, Or and Not. And and Or expressions in turn contain two or more shape expressions. When we refer to a shape expression, we mean one of the following.

A node constraint, which constrains the set of allowed values of a node.
A shape, which constrains the neighborhood of a node.
An And of two or more shape expressions (called ShapeAnd).
An Or of two or more shape expressions (called ShapeOr).
A Not of one shape expression (called ShapeNot).
An external shape expression.

This recursive structure forms a tree which has node constraints and shapes as leaves. Figure 4.6 represents the ShEx data model.

Figure 4.6: ShEx data model.

Node constraints and shapes are described in the following sections while the logical operators are discussed in Section 4.8 and external shapes in Section 4.7.3.

4.4.4 Start Shape Expression

The shape expression against which a node is validated might be selected by label, or it might default to a special shape expression called the start shape.

A schema can have one additional shape expression called the start shape expression. This serves as “start here” advice from the schema author and is useful when describing a graph with a single purpose. For instance, the medical data protocol FHIR (see Section 6.2) has specific schemas for resources like Patient.

Example 29 ShEx schema with start directive

Consider the following code:

start = @<Patient>
<Patient> { ... }
...

In the compact syntax, the directive start = @<Patient> declares that the shape expression <Patient> will be used by default if a shape is not explicitly provided in the shape map.

In shape maps, it is possible to declare that a node must be validated against the start shape expression by using the keyword START. For example, the following shape map:

:alice@START, :bob@<Doctor>

would validate :alice against the start shape expression (in the previous example, it would be <Patient>) and :bob against <Doctor>.

4.5 Node Constraints

Node constraints describe the allowed values of a node. These include specification of RDF node kind, literal datatype, string and numeric facets, and value sets.

Node constraints can appear as a labeled shape expression or as part of triple constraints.

Example 30

Any place where no node constraint is wanted can be marked with a period (.). This is analogous to the period which matches any character in regular expressions. The following example lists the properties of a :User with their cardinalities, but it does not specify any constraint on their values:

:User {
  schema:name          . ;
  schema:alternateName . * ;
  schema:birthDate     . ?
}

Given the following RDF graph:

:alice schema:name 23 .                              # Passes as a :User
:bob   schema:name "Robert" ;                        # Passes as a :User
       schema:alternateName "Bob", "Bobby", <Bob> ;
       schema:birthDate "Unknown" .

If we provide the shape map :alice@:User,:bob@:User the ShEx processor would return that they both conform.

Node constraints usually appear as part of value expressions in triple constraints.

Example 31 Node constraint in a value expression

The following example declares that nodes with shape :User must have a property schema:url whose value must be an IRI.

:User { schema:url IRI }

Node constraints can also appear as top level shapes.

Example 32 Node constraint as top-level shape

The following code defines two shapes, :HomePage and :CanVoteAge, as node constraints. The first one declares that nodes must be IRIs and the second one that they must be xsd:integer values greater than or equal to 18.

:HomePage   IRI
:CanVoteAge xsd:integer MinInclusive 18

If we provide a ShEx processor the shape map

<http://example.org/alice>@:HomePage, 23@:CanVoteAge, 45@:HomePage, 14@:CanVoteAge

The result would be that the first two nodes are conformant while the last two nodes are non-conformant.
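Because :HomePage and :CanVoteAge only look at the node itself, each can be read as a predicate on a single RDF term. The following Python sketch evaluates the shape map above under that reading. The term classes, the SHAPES table and the function names are illustrative assumptions, and the integer check covers only the basic xsd:integer lexical form.

```python
import re
from dataclasses import dataclass
from typing import Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


Term = Union[IRI, Literal]


def home_page(term: Term) -> bool:
    """:HomePage IRI"""
    return isinstance(term, IRI)


def can_vote_age(term: Term) -> bool:
    """:CanVoteAge xsd:integer MinInclusive 18"""
    if not isinstance(term, Literal) or term.datatype != XSD_INTEGER:
        return False
    if re.fullmatch(r"[+-]?[0-9]+", term.lexical) is None:
        return False  # ill-formed integer lexical form
    return int(term.lexical) >= 18  # MinInclusive: the bound itself is allowed


SHAPES = {":HomePage": home_page, ":CanVoteAge": can_vote_age}

shape_map = [
    (IRI("http://example.org/alice"), ":HomePage"),
    (Literal("23", XSD_INTEGER), ":CanVoteAge"),
    (Literal("45", XSD_INTEGER), ":HomePage"),
    (Literal("14", XSD_INTEGER), ":CanVoteAge"),
]
results = [SHAPES[label](node) for node, label in shape_map]
print(results)  # [True, True, False, False]
```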

It is also possible to combine top-level node constraints with more complex shapes.

Example 33 Node constraint as top-level shape combined with complex shape

The following declaration of shape :User says that nodes conforming to shape :User must be IRIs and have a property schema:name with an xsd:string value.

:User IRI AND { schema:name xsd:string }

In this case, the external AND can be omitted, so the previous shape is equivalent to:

:User IRI { schema:name xsd:string }

Table 4.1 gives an overview of the main types of node constraints with some examples and a short description.

Table 4.1: Node constraints

Name | Description | Examples
Anything | The value can be anything | .
Datatype | The value must be an element of that datatype | xsd:string, xsd:date, cdt:distance, …
Node kind | The value must have that kind | IRI, BNode, Literal, NonLiteral
Value set | The value must be an element of that set | [:Male :Female]
Shape reference | The value must conform to the referenced shape | @:User

4.5.1 Node kinds

Node kinds describe the kind that a value must have. There are four node kinds in ShEx: Literal, IRI, BNode, and NonLiteral which follow the rules defined in RDF 1.1 for such terms.

Table 4.2: Node kinds

Value | Description | Examples
Literal | Any RDF literal | "Alice", "Spain"@en, 42, true
IRI | Any RDF IRI | <http://example.org/Alice>, ex:alice, :bob
BNode | Any blank node | _:x, []
NonLiteral | Any IRI or blank node | <http://example.org/alice>, _:x
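Node kinds partition RDF terms, with NonLiteral covering both IRIs and blank nodes. A minimal Python sketch, assuming a hypothetical three-class term model, returns every node kind a term satisfies:

```python
from dataclasses import dataclass
from typing import Set, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING


Term = Union[IRI, BNode, Literal]


def node_kinds(term: Term) -> Set[str]:
    """Return every ShEx node kind that the term satisfies."""
    if isinstance(term, Literal):
        return {"Literal"}
    if isinstance(term, IRI):
        return {"IRI", "NonLiteral"}
    if isinstance(term, BNode):
        return {"BNode", "NonLiteral"}
    raise TypeError(f"not an RDF term: {term!r}")


print(sorted(node_kinds(IRI("http://example.org/bob"))))  # ['IRI', 'NonLiteral']
print(node_kinds(Literal("Alice")))                       # {'Literal'}
```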

Example 34

The following example declares that the value of property schema:name must be a literal and the value of schema:follows must be an IRI.

:User {
  schema:name    Literal ;
  schema:follows IRI
}

:alice schema:name "Alice" ;   # Passes as a :User
       schema:follows :bob .
:bob   schema:name :Bob ;      # Fails as a :User
       schema:follows _:x .    # :Bob is not a literal and _:x is not an IRI

4.5.2 Datatypes

Like most schema languages, ShEx includes datatype constraints which declare that a focus node must be a literal with some specific datatype. ShEx has special support for XML Schema datatypes [9] for which it checks that the lexical form also conforms to the expected datatype.

Example 35 Simple datatypes

The following example declares the datatypes that the values of the schema:name, foaf:age and schema:birthDate properties must have.

:User {
  schema:name      xsd:string ;
  foaf:age         xsd:integer ;
  schema:birthDate xsd:date ;
}

:alice schema:name "Alice" ;                    # Passes as a :User
       foaf:age 36 ;
       schema:birthDate "1981-07-10"^^xsd:date .
:bob   schema:name "Robert"^^xsd:string ;       # Passes as a :User
       foaf:age "26"^^xsd:integer ;
       schema:birthDate "1981-07-10"^^xsd:date .
:carol schema:name :Carol ;                     # Fails as a :User
       foaf:age "14" ;                          # :Carol is an IRI
       schema:birthDate "2003-06-10"^^xsd:date . # and "14" a string
:dave  schema:name "Dave" ;                     # Fails as a :User
       foaf:age "Unknown"^^xsd:integer ;        # invalid lexical forms
       schema:birthDate "Unknown"^^xsd:date .

As we said, for XML Schema datatypes ShEx also checks that the lexical form matches the expected datatype. For example, the foaf:age of :dave is "Unknown"^^xsd:integer. Although the literal claims that "Unknown" is an integer, and some RDF parsers accept such literals, "Unknown" does not have the lexical form of an integer, so the ShEx processor will report an error. The same happens for the value of schema:birthDate.
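The lexical-form check can be sketched in Python for the two datatypes used in Example 35. This is an approximation under stated assumptions: the regular expressions are simplified versions of the XML Schema lexical spaces, dates are limited to four-digit years from 0001 to 9999 (the range of Python's datetime.date), timezone offsets are not range-checked, and the helper names are hypothetical.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

INTEGER_RE = re.compile(r"[+-]?[0-9]+")
# yyyy-mm-dd with an optional timezone (Z, +hh:mm or -hh:mm); simplified
DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")


def valid_integer(lexical: str) -> bool:
    return INTEGER_RE.fullmatch(lexical) is not None


def valid_date(lexical: str) -> bool:
    match = DATE_RE.fullmatch(lexical)
    if match is None:
        return False
    year, month, day = (int(g) for g in match.groups()[:3])
    try:
        date(year, month, day)  # rejects impossible dates such as 2003-02-30
    except ValueError:
        return False
    return True


LEXICAL_CHECKS = {XSD + "integer": valid_integer, XSD + "date": valid_date}


def well_formed(lexical: str, datatype: str) -> bool:
    """Datatypes without a registered check (e.g. custom ones) are accepted."""
    check = LEXICAL_CHECKS.get(datatype)
    return check is None or check(lexical)


print(well_formed("36", XSD + "integer"))       # True
print(well_formed("Unknown", XSD + "integer"))  # False
```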

Example 36 Custom datatypes

Although the most common use case is to use XML Schema datatypes, RDF data can use other datatypes. In the following example, a picture contains the properties schema:width and schema:height using a hypothetical custom datatype for distances (cdt:distance).

:Picture {
  schema:name   xsd:string ;
  schema:width  cdt:distance ;
  schema:height cdt:distance
}

:gioconda schema:name "Mona Lisa" ;            # Passes as a :Picture
          schema:width "21 in"^^cdt:distance ;
          schema:height "30 in"^^cdt:distance .
:other    schema:name "Other picture" ;        # Fails as a :Picture
          schema:width "21 in"^^xsd:string ;   # expected cdt:distance
          schema:height 30 .

Example 37 Language-tagged literals

The datatype rdf:langString identifies language-tagged literals (see [25, Section 3.3]), i.e., RDF literals that have a language tag.

:Country {
  schema:name rdf:langString ;
}

:italy  schema:name "Italia"@es .  # Passes as a :Country
:france schema:name "France" .     # Fails as a :Country

4.5.3 Facets on Literals

XML Schema provides a useful library of string and numeric tests called facets [9]. These facets are listed in Table 4.3 with a sample argument and some passing and failing values.

Table 4.3: Facets on literals

Facet and argument | Passing values | Failing values
MinInclusive 1 | "1"^^xsd:decimal, 1, 2, 98, 99, 100 | "1"^^xsd:string, -1, 0
MinExclusive 1 | 2, 98, 99, 100 | -1, 0, 1
MaxInclusive 99 | 1, 2, 98, 99 | 100
MaxExclusive 99 | 1, 2, 98 | 99, 100
TotalDigits 3 | "1"^^xsd:integer, 9, 999, 0999, 9.99, 99.9, 0.1020 | "1"^^xsd:string, 1000, 01000, 1.1020, .1021, 0.1021
FractionDigits 3 | "1"^^xsd:decimal, 0.1, 0.1020, 1.1020 | "1"^^xsd:integer, 0.1021, 0.10212
Length 3 | "123"^^xsd:string, "123"^^xsd:integer, "abc" | "12"^^xsd:string, "12"^^xsd:integer, "ab", "abcd"
MinLength 3 | "abc", "abcd" | "", "ab"
MaxLength 3 | "", "ab", "abc" | "abcd", "abcde"
Regex pattern /^ab+/ | "ab", "abb", "abbcd" | "", "a", "acd", "cab", "AB", "ABB", "ABBCD"
Regex pattern with i flag /^ab+/i | "ab", "abb", "abbcd", "AB", "ABB", "ABBCD" | "", "a", "acd"

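TotalDigits and FractionDigits are the least intuitive facets in Table 4.3, because leading zeros of the integer part and trailing zeros of the fraction part do not count; this is why 0999 and 0.1020 pass TotalDigits 3. The Python sketch below computes both counts from a decimal lexical form. It looks only at the digits and ignores the datatype column of the table (e.g., "1"^^xsd:string), and its function names are hypothetical.

```python
import re
from typing import Tuple

# Simplified xsd:decimal lexical form: optional sign, digits, optional fraction
DECIMAL_RE = re.compile(r"[+-]?([0-9]*)(?:\.([0-9]*))?")


def digit_counts(lexical: str) -> Tuple[int, int]:
    """Return (total digits, fraction digits) of a decimal lexical form,
    ignoring leading zeros of the integer part and trailing zeros of the
    fraction part."""
    match = DECIMAL_RE.fullmatch(lexical)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not a decimal lexical form: {lexical!r}")
    integer = match.group(1).lstrip("0")
    fraction = (match.group(2) or "").rstrip("0")
    # a value such as 0 still counts as having one digit
    return max(len(integer) + len(fraction), 1), len(fraction)


def total_digits_ok(lexical: str, limit: int) -> bool:
    return digit_counts(lexical)[0] <= limit


def fraction_digits_ok(lexical: str, limit: int) -> bool:
    return digit_counts(lexical)[1] <= limit


print(digit_counts("0.1020"))  # (3, 3)
print(digit_counts("01000"))   # (4, 0)
```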
Example 38

:Product {
  schema:name   xsd:string MaxLength 10 ;
  schema:weight xsd:decimal MinInclusive 1 MaxInclusive 200 ;
  schema:sku    /^[A-Z0-9]{10,20}$/ ;
}

:product1 schema:name "Product 1" ;              # Passes as a :Product
          schema:weight "23.0"^^xsd:decimal ;
          schema:sku "A23456B234CBDF" .
:product2 schema:name "Product 2" ;              # Fails as a :Product
          schema:weight "245.5"^^xsd:decimal ;   # schema:weight > 200
          schema:sku "ABC" .                     # schema:sku fails regex


The pattern constraint (/regex/) is based on the XPath regular expression function fn:matches(str, re, flags), which takes as parameters the string to match, the regular expression, and an optional flags parameter that modifies the matching behavior.

XPath regular expressions are based on common conventions from other languages like Perl and Unix tools like grep. A regular expression is a string composed of the characters to match and some characters with special meaning, called meta-characters.

x matches the 'x' character.
\u0078 matches the Unicode code point U+78 (which is again 'x').
. matches any character.
[vxz] declares a character class, and matches any of 'v', 'x', or 'z'.
\d is a pre-defined character class which matches any digit. For ASCII text it is equivalent to “[0-9]” (it also matches digits from other scripts).
\s is a pre-defined character class which matches any whitespace character (space, tab, carriage return or newline). It is equivalent to “[\u0009\u000d\u000a\u0020]”. Its complement, \S, matches any non-whitespace character.

Inside character classes, the symbol “ ^” means negation and “ -” can be used to declare character ranges. For instance, the character class [^a-zA-Z] matches any non-letter.

Cardinality (repetition) operators can be used to specify how many times the preceding item is matched. The possibilities are as follows.

? zero or one occurrences.
+ one or more occurrences.
* zero or more occurrences.
{m,n} between m and n occurrences.

Characters in a regular expression are matched in sequence, with the following exceptions.

| declares alternatives, e.g., “ abc|def|ghi” matches any of “ abc”, “ def”, “ ghi”.
^ matches the beginning of a string.
$ matches the end of a string.
“()” declares a group, which is useful for cardinality and alternatives. For example, “^ab(cd|ef){2,}gh” matches “abcdcdcdghij”.

All of the meta-characters above are treated as literals (i.e., they match themselves) if they are prefixed with a \ (backslash).

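Table 4.4 contains several examples of regular expression matches. The examples assume that the whole value must match, as if the expression were written between ^ and $; because fn:matches succeeds when any substring matches, a value such as P2233 would otherwise match P\d{2,3}.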

Table 4.4: Examples of regular expressions

Regular expression | Some values that match | Some values that don’t match
P\d{2,3} | P12, P234 | A1, P2n, P1, P2233
(pa)*b | b, pab, papab, papapab, … | pa, po
[a-z]{2,3} | ab, abc | a, abcd, x45, 23

The flags string has the following possibilities.

i: Case-insensitive mode.
m: Multi-line mode. If present, the ^ character matches the start of any line (not only the start of the string) and the $ matches the end of any line (not only the end of the string).
s: If present, the dot matches also newlines, otherwise it matches any character except newlines. This mode is called single-line mode in Perl.
x: Removes white space characters in the regular expression before matching.
q: All meta characters are interpreted as literals, i.e., they match themselves in the input string. q is compatible with the i flag. If it’s used with the m, s or x flag, that flag is ignored.
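Python's re module can approximate fn:matches closely enough to experiment with these flags. The sketch below maps i, m and s onto the corresponding re options and implements q with re.escape. The x flag is deliberately left unimplemented, because Python's verbose mode also treats # as the start of a comment, which XPath does not. The two dialects differ in other details as well (for example, Python's $ also matches just before a trailing newline), so treat this hypothetical xpath_like_matches helper as an illustration rather than a conformant implementation.

```python
import re


def xpath_like_matches(value: str, pattern: str, flags: str = "") -> bool:
    """Rough approximation of fn:matches(value, pattern, flags)."""
    if "q" in flags:
        pattern = re.escape(pattern)          # q: every character is literal
        flags = "i" if "i" in flags else ""   # m, s and x have no effect with q
    options = 0
    if "i" in flags:
        options |= re.IGNORECASE
    if "m" in flags:
        options |= re.MULTILINE
    if "s" in flags:
        options |= re.DOTALL
    if "x" in flags:
        raise NotImplementedError("the x flag is not handled in this sketch")
    # fn:matches succeeds if any substring matches, like re.search
    return re.search(pattern, value, options) is not None


print(xpath_like_matches("ABB", "^ab+", "i"))  # True
```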

4.5.4 Value Sets

A value set is a node constraint which enumerates the list of possible values that a focus node may have. In ShExC, value sets are enclosed in square brackets ([ and ]) and the possible values are separated by spaces.

Example 39 Example with value sets

The following example declares a shape :Product with two properties: schema:color and schema:manufacturer, whose possible values are enumerated.

:Product {
  schema:color        [ "Red" "Green" "Blue" ] ;
  schema:manufacturer [ :OurCompany :AnotherCompany ]
}

:x1 schema:color "Red" ;                # Passes as a :Product
    schema:manufacturer :OurCompany .
:x2 schema:color "Cyan" ;               # Fails as a :Product
    schema:manufacturer :OurCompany .
:x3 schema:color "Green" ;              # Fails as a :Product
    schema:manufacturer :Unknown .

Unit value sets

A common pattern is to declare that a node must have a specific value. This can be done by a unit value set, i.e., a value set with a single value.

Example 40

:Spanish { schema:country [ :Spain ] }
:User    { a [ schema:Person ] }

:alice schema:country :Spain .    # Passes as a :Spanish
:bob   schema:country :France .   # Fails as a :Spanish
:carol a schema:Person ;          # Passes as a :Spanish and :User
       schema:country :Spain .
:p1    a schema:Product ;         # Fails as a :User
       schema:country :Spain .    # Passes as a :Spanish
:dave  rdf:type schema:Person ;   # Passes as a :User
       schema:country :Japan .    # Fails as a :Spanish

Note that the :User shape employs the a keyword which stands for rdf:type. There is no inference in ShEx, even for rdf:type, which is treated as any other arc. See Section 3.2 for a discussion of the difference between shapes and classes.

Language-tagged values

As seen above, value sets contain one or more values. The examples so far have included IRIs and strings (literals with a datatype of xsd:string). These match precisely the same value in the data. Values can also be language tags, which match any literal with the given language tag.

Example 41

:FrenchProduct  { schema:label [ @fr ] }
:SpanishProduct { schema:label [ @es @es-AR @es-ES ] }

:car1 schema:label "Voiture"@fr .   # Passes as a :FrenchProduct
:car2 schema:label "Auto"@es .      # Passes as a :SpanishProduct
:car3 schema:label "Carro"@es-AR .  # Passes as a :SpanishProduct
:car4 schema:label "Coche"@es-ES .  # Passes as a :SpanishProduct

Ranges

We can see in the example above that it would be convenient to accept literals with any language tag starting with "es". This can be indicated with the postfix operator ‘~’. For example, Spanish language tags with Argentinian, Chilean, and other region codes could all be accepted with ‘schema:label [ @es~ ]’.

Example 42 Language-tagged ranges

The following code declares that Spanish products contain schema:label with a value that must be a language-tagged literal in Spanish or any of its regional variants.

:SpanishProduct { schema:label [ @es~ ] }

:car1 schema:label "Auto"@es .      # Passes as a :SpanishProduct
:car2 schema:label "Carro"@es-AR .  # Passes as a :SpanishProduct
:car3 schema:label "Coche"@es-ES .  # Passes as a :SpanishProduct
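A language stem such as @es~ can be checked with a case-insensitive comparison in which the stem must be followed either by a hyphen or by the end of the tag. We assume this reading, in the style of RFC 4647 basic filtering, rather than a plain string prefix test; the function name language_range_matches is hypothetical.

```python
def language_range_matches(tag: str, stem: str) -> bool:
    """True if `tag` falls under `stem`, as in the value set [ @es~ ]."""
    tag, stem = tag.lower(), stem.lower()  # language tags are case-insensitive
    # "es" accepts "es" and "es-AR", but not an unrelated tag such as "est"
    return tag == stem or tag.startswith(stem + "-")


for tag in ["es", "es-AR", "es-ES", "en", "est"]:
    print(tag, language_range_matches(tag, "es"))
```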

This also works for strings, e.g., ‘"+34"~’ (Spanish telephone numbers) and IRIs, e.g., ‘<http://www.w3.org/ns/>~’ (W3C namespaces).

Example 43 String and IRI ranges example

:SpanishW3CPeople {
  schema:telephone [ "+34"~ ] ;
  schema:url       [ <http://www.W3C.es/Personal>~ ]
}

:alice schema:telephone "+34 123 456 789" ;          # Passes as a :SpanishW3CPeople
       schema:url <http://www.W3C.es/Personal/Alice> .
:bob   schema:telephone "123 456 789" ;              # Fails as a :SpanishW3CPeople
       schema:url <http://other.org/bob> .           # Bad telephone and url

IRIs represented as prefixed names can also have a postfix ‘~’, e.g., foaf:~ represents the set of all IRIs that start with the namespace bound to the prefix foaf:.

Example 44

In the following example, we declare that the status of a product must start with either http://example.codes/good. or http://example.codes/bad. (in both cases the final dot is part of the stem).

prefix codes: <http://example.codes/>

:Product { :status [ codes:good.~ codes:bad.~ ] }

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>

:x1 :status codes:good.Shipped .               # Passes as a :Product
:x2 :status other:done .                       # Fails as a :Product
:x3 :status <http://example.codes/bad.Lost> .  # Passes as a :Product

Exclusions

It can also be useful to exclude some values from a range. Exclusions are marked by the minus sign (-). For example, codes:~ - codes:unknown represents all values starting with codes: except codes:unknown.

Exclusions can themselves be ranges. For example, codes:~ - codes:bad.~ represents all values starting with codes: except those in the codes:bad.~ range.

Example 45 Range exclusions

The following code prescribes that the status of products can be anything that starts with codes:, except codes:unknown and any code in the codes:bad.~ range.

prefix codes: <http://example.codes/>

:Product { :status [ codes:~ - codes:unknown - codes:bad.~ ] }

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>

:p1 :status codes:good.Shipped .               # Passes as a :Product
:p2 :status other:done .                       # Fails as a :Product
:p3 :status <http://example.codes/bad.Lost> .  # Fails as a :Product
:p4 :status <http://example.codes/unknown> .   # Fails as a :Product

Exclusions must be of the same kind (IRI, string or language tag) as the stem. For instance, ‘[ codes:good.~ - "bad." - @fr~ ]’ would be malformed, as it is an IRI range excluding a string and a language stem.
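A stem range with exclusions accepts an IRI when it starts with the stem and is neither an excluded IRI nor inside an excluded stem. The Python sketch below encodes Example 45 with the prefixed names expanded; the IriStemRange class is a hypothetical illustration, not an API of any ShEx library.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

CODES = "http://example.codes/"


@dataclass(frozen=True)
class IriStemRange:
    """An IRI stem with exclusions, e.g. [ codes:~ - codes:unknown - codes:bad.~ ]."""
    stem: str
    excluded_iris: FrozenSet[str] = frozenset()
    excluded_stems: Tuple[str, ...] = ()

    def matches(self, iri: str) -> bool:
        return (
            iri.startswith(self.stem)
            and iri not in self.excluded_iris
            and not any(iri.startswith(s) for s in self.excluded_stems)
        )


# Example 45, with prefixed names expanded to full IRIs.
status_range = IriStemRange(
    stem=CODES,
    excluded_iris=frozenset({CODES + "unknown"}),
    excluded_stems=(CODES + "bad.",),
)

for iri in [CODES + "good.Shipped", "http://other.codes/done",
            CODES + "bad.Lost", CODES + "unknown"]:
    print(iri, status_range.matches(iri))
```

Keeping excluded IRIs and excluded stems in separate fields avoids encoding the ‘~’ inside the IRI string, since ‘~’ is itself a legal IRI character.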

Heterogeneous value sets

There is no requirement that value sets be composed of a consistent kind of value (IRI, string or language tag). For instance, the status of a product can be one of the IRIs :Accepted or :Rejected, or the string "unknown".

Example 46

:Product { schema:status [ :Accepted :Rejected "unknown" ] }

Wildcard stem ranges

Sometimes we want to accept user data with any value except some specific values. For this, a wildcard character (‘.’) followed by one or more exclusions can be used, provided that those exclusions are all of the same kind. The kind of the exclusions (IRI, string, or language tag) determines the type of RDF term that will be matched.

Example 47 Example of a wildcard range with exclusion

The following code declares that the status of products can be anything except the IRI codes:bad. Given that the exclusion is an IRI, the status must be an IRI.

prefix codes: <http://example.codes/>
:Product { :status [ . - codes:bad ] }

prefix codes: <http://example.codes/>
prefix other: <http://other.codes/>
:p1 :status codes:good .   # Passes as a :Product
:p2 :status other:bad .    # Passes as a :Product
:p3 :status codes:bad .    # Fails as a :Product
:p4 :status "good" .       # Fails as a :Product: "good" must be an IRI
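The wildcard form can be sketched in the same spirit. In the following hypothetical Python model, IRI and Literal are minimal stand-ins for RDF terms. Language-tagged literals and stem exclusions are left out. As described above, the kind of the exclusions decides which kind of term is accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def wildcard_matches(exclusions, term) -> bool:
    """Sketch of [ . - e1 - e2 ... ] restricted to exact (non-stem) exclusions."""
    kinds = {type(e) for e in exclusions}
    if len(kinds) != 1:
        raise ValueError("need one or more exclusions, all of the same kind")
    (kind,) = kinds
    # The term must be of the exclusions' kind and must differ from every exclusion.
    return isinstance(term, kind) and term not in exclusions


CODES = "http://example.codes/"
bad = [IRI(CODES + "bad")]  # [ . - codes:bad ] from Example 47
```

With these definitions, IRI(CODES + "good") is accepted, IRI(CODES + "bad") is rejected, and Literal("good") is rejected because its kind differs from that of the exclusion.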

Value set expressivity

Value sets are mostly a shorthand syntax for complex Boolean combinations of node constraints. ShEx includes them because they are much more concise and, given their ubiquity in other schema languages, they are fundamental to how people model and understand data.

Example 48 Representing value sets

The following shape:

:User { schema:gender [ schema:Male schema:Female ] }

can be defined without value sets using the OR operator that will be presented in Section 4.8.2.

:User { schema:gender [ schema:Male ] } OR { schema:gender [ schema:Female ] }

4.6 Shapes

In the previous section we explored node constraints and how they declare a set of permissible RDF terms. Most of the examples used node constraints in triple constraints, limiting the permissible values for triples in the input graph.

Example 49 Simple example

In the following example, we describe a shape :User:

:User { schema:name xsd:string }

and we will try to validate the nodes :alice and :bob represented in the following data:

:alice schema:name "Alice" ;    # Passes as a :User
       schema:knows :bob .
:bob   schema:name 34 ;         # Fails as a :User: wrong schema:name
       schema:knows :alice .

To solidify our intuition of validating shapes, we need to think of this as a series of steps to validate a focus node against a shape expression.

Check if focus node :alice conforms to the shape expression :User.
:User is a shape so check if the neighborhood of :alice matches the triple expression in the shape :User. This step means that one needs to find a way to distribute the triples in the neighborhood to satisfy the triple expression.
The shape’s triple expression is a single triple constraint so all one needs to do is find the triple with a matching predicate in the neighborhood. In this case, the triple :alice schema:name "Alice".
The triple expression has a value expression so consider the object, "Alice", as the focus node and test it against the node constraint (in this case xsd:string).
"Alice" matches ‘ xsd:string’ so this test succeeds.
The cardinality of the triple constraint is {1,1} (the default), and as there is exactly one matching triple, the node conforms to the shape expression.

When the same steps are performed to check :bob, the last step will have 34 as the focus node. This test fails so :bob fails to conform to :User.
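The steps above can be sketched in Python for a shape whose triple expression is a single triple constraint. The graph encoding and the function names are hypothetical. Triples are (subject, predicate, object) tuples, IRIs are abbreviated strings such as ":bob", and an xsd:string literal is modelled as a Python str. Only the objects of schema:name are passed to the node constraint, so this simplification does no harm here.

```python
SCHEMA = "http://schema.org/"

GRAPH = {
    (":alice", SCHEMA + "name", "Alice"),
    (":alice", SCHEMA + "knows", ":bob"),
    (":bob", SCHEMA + "name", 34),
    (":bob", SCHEMA + "knows", ":alice"),
}


def neighbourhood(graph, node):
    """Outgoing triples of node (the part of the neighbourhood used here)."""
    return [t for t in graph if t[0] == node]


def is_xsd_string(value) -> bool:
    """Hypothetical stand-in for the node constraint xsd:string."""
    return isinstance(value, str)


def conforms_to_single_tc(graph, focus, predicate, node_test, min_card=1, max_card=1):
    # Find the triples in the neighbourhood whose predicate matches the constraint.
    matching = [t for t in neighbourhood(graph, focus) if t[1] == predicate]
    # Treat every object as a new focus node and test it against the node constraint.
    if not all(node_test(obj) for (_, _, obj) in matching):
        return False
    # Check the cardinality ({1,1} by default; pass float("inf") for no upper bound).
    return min_card <= len(matching) <= max_card
```

Calling conforms_to_single_tc(GRAPH, ":alice", SCHEMA + "name", is_xsd_string) succeeds. The same call for ":bob" fails at the node-constraint step because 34 is not a string.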

Shape

A shape is a container for a triple expression along with some properties stating how to treat triples not matching the triple expression. We will describe these properties after introducing triple expressions (Section 4.6.8). Since triple expressions are combinations of triple constraints, we start with them.

4.6.1 Triple Constraints

The basic building block of a triple expression is a triple constraint. It is composed of a property, a node constraint, and a cardinality.

A triple constraint expresses a constraint on the values of triples with the given property and the number of values expressed by the cardinality. Cardinalities will be described in more detail in Section 4.6.3.

Example 50 The following shape is defined by a single triple constraint whose components are depicted in Figure 4.7.

:Product { schema:productId xsd:string {1,2} }

The meaning is that nodes conforming to :Product must satisfy:

They must have property schema:productId.
All the values of schema:productId must satisfy the node constraint xsd:string.
As the cardinality is {1,2}, there can be between 1 and 2 values of schema:productId.

Figure 4.7: Parts of a triple constraint.

:p1 schema:productId "P1" .              # Passes as a :Product
:p2 schema:productId "P2", "C2" .        # Passes as a :Product
:p3 schema:productId "P3", "C3", "X3" .  # Fails as a :Product: cardinality exceeded
:p4 schema:name "No Id" .                # Fails as a :Product: no schema:productId
:p5 schema:productId 5 .                 # Fails as a :Product: xsd:string not satisfied
:p6 schema:productId "P6", 5 .           # Fails as a :Product: xsd:string not satisfied

Closing a property

Triple constraints have an implicit meaning of closing the possible values of a property. In the previous example, the declaration schema:productId xsd:string requires all values of schema:productId to satisfy xsd:string. That’s why :p6 failed to conform: although it had one string value, the other value wasn’t.

This behavior can be modified with the directives EXTRA and CLOSED that will be shown in Section 4.6.8.

4.6.2 Groupings

The EachOf operator combines two or more triple expressions. All the sub-expressions must be satisfied by triples in the neighborhood of the focus node. EachOf is indicated by a semicolon (;) in the compact syntax.

Example 51 A :User is defined by an EachOf expression that combines three triple constraints. A node conforms to :User if all three triple constraints are satisfied.

:User { schema:name xsd:string ; foaf:age xsd:integer ; schema:email xsd:string }

4.6.3 Cardinalities

Cardinalities indicate the required number of triples satisfying the given constraint. They are most often used on triple constraints although they can also be applied to more complex expressions. Table 4.5 gives an overview of the different representations of cardinalities in ShExC.

Table 4.5: ShEx cardinalities

Notation   Meaning
*          0 or more
+          1 or more
?          0 or 1
{m}        Exactly m repetitions
{m,n}      Between m and n repetitions
{m,}       m or more repetitions

If the cardinality is not specified, the default value is {1} (exactly one).
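To illustrate Table 4.5, the following Python sketch maps each notation to a pair (min, max), with math.inf standing for an unbounded maximum. The function parse_cardinality is hypothetical. It accepts only the forms listed in the table, plus the empty string for an omitted cardinality.

```python
import math
import re


def parse_cardinality(text: str = "") -> tuple:
    """Map ShExC cardinality notation to (min, max); an empty string means the default {1}."""
    text = text.strip()
    fixed = {"": (1, 1), "*": (0, math.inf), "+": (1, math.inf), "?": (0, 1)}
    if text in fixed:
        return fixed[text]
    m = re.fullmatch(r"\{\s*(\d+)\s*(?:(,)\s*(\d*)\s*)?\}", text)
    if m is None:
        raise ValueError(f"not a cardinality: {text!r}")
    low = int(m.group(1))
    if m.group(2) is None:       # {m}
        return (low, low)
    if m.group(3) == "":         # {m,}
        return (low, math.inf)
    high = int(m.group(3))       # {m,n}
    if high < low:
        raise ValueError("upper bound below lower bound")
    return (low, high)
```

For example, parse_cardinality("{1,100}") gives (1, 100), which matches the schema:employee constraint in the next example.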

Example 52 Cardinalities example

The following :User shape declares that nodes must have exactly one value for schema:name (default cardinality), an optional value for schema:worksFor, and zero or more values for schema:follows.

The :Company shape uses the explicit {m,n} syntax to assert that a matching node must have between 1 and 100 employees and an optional schema:founder value.

:User { schema:name xsd:string ; schema:worksFor IRI ? ; schema:follows IRI * }
:Company { schema:founder IRI ? ; schema:employee IRI {1,100} }

:alice schema:name "Alice" ;          # Passes as a :User
       schema:follows :bob ;
       schema:worksFor :OurCompany .
:bob   schema:name "Robert" ;         # Passes as a :User
       schema:worksFor :OurCompany .
:carol schema:name "Carol" ;          # Passes as a :User
       schema:follows :alice .
:dave  schema:name "Dave" .           # Passes as a :User
:emily schema:name "Emily" ;          # Fails as a :User: more than one schema:worksFor
       schema:worksFor :OurCompany, :OtherCompany .
:OurCompany schema:founder :dave ;    # Passes as a :Company
            schema:employee :alice, :bob .
:OtherCompany schema:founder :alice . # Fails as a :Company: 0 employees

A cardinality can also be used on more general expressions indicating that the neighborhood of a node must contain several groups of triples, each of them satisfying the expression.

Example 53 Cardinalities on expressions

The following shape declares that nodes must have exactly one value for schema:name and that they can contain the combination of schema:givenName and schema:familyName with optional cardinality (either they contain the group of both properties or none of them).

:User { schema:name xsd:string ; ( schema:givenName xsd:string ; schema:familyName xsd:string ) ? }

:alice schema:name "Alice" .         # Passes as a :User
:bob   schema:name "Robert" ;        # Passes as a :User
       schema:givenName "Robert" ;
       schema:familyName "Smith" .
:carol schema:name "Carol" ;         # Fails as a :User: schema:givenName without schema:familyName
       schema:givenName "Carol" .

4.6.4 Choices

The pipe or choice operator (|) can be used to compose complex triple expressions, with the meaning that one of the branches must be satisfied.

Example 54 OneOf operator

The following shape declares that nodes must have either schema:name or foaf:name, but not both.

:User { schema:name xsd:string | foaf:name xsd:string }

:alice schema:name "Alice" .        # Passes as a :User
:bob   foaf:name "Bob" ;            # Passes as a :User
       schema:identifier "P234" .
:carol schema:name "Carol" ;        # Fails as a :User: more than one
       foaf:name "Carol" .
:dave  schema:identifier "P123" .   # Fails as a :User: none provided

A typical pattern consists of combining OneOf (|) with EachOf (;) to form more complex expressions.

Example 55

The following shape declares that nodes must have either one schema:name or a combination of one or more schema:givenName values and one schema:familyName.

:User { schema:name xsd:string | ( schema:givenName xsd:string + ; schema:familyName xsd:string ) }

:alice schema:name "Alice" .       # Passes as a :User
:bob   schema:givenName "Bob" ;    # Passes as a :User
       schema:givenName "Bobby" ;
       schema:familyName "Smith" .
:carol schema:name "Carol" ;       # Fails as a :User: can't have both
       schema:familyName "King" .
:dave  schema:name 23 .            # Fails as a :User: schema:name must be xsd:string

Another common pattern is to add a cardinality to an expression formed with the OneOf (|) operator.

Example 56 Cardinality on OneOf expression

The following shape declares that nodes must have exactly one value for schema:productId and between zero and two triples whose predicate is either schema:isRelatedTo or schema:isSimilarTo and whose object conforms to :Product.

:Product { schema:productId xsd:string ; ( schema:isRelatedTo @:Product | schema:isSimilarTo @:Product ){0,2} }

:p1 schema:productId "P1" ;         # Passes as a :Product
    schema:isRelatedTo :p2, :p3 .
:p2 schema:productId "P2" .         # Passes as a :Product
:p3 schema:productId "P3" ;         # Passes as a :Product
    schema:isRelatedTo :p1 ;
    schema:isSimilarTo :p2 .
:p4 schema:productId "P4" ;         # Fails as a :Product: three related products
    schema:isRelatedTo :p1, :p2, :p3 .

4.6.5 Nested Shapes

When one shape is only an auxiliary shape that is not needed elsewhere, it can be nested inside the shape that uses it instead of being defined separately.

Example 57

The following schema declares that nodes conforming with :User must have a property schema:name with xsd:string and another property schema:worksFor whose value must conform with an anonymous shape _:1 which must have rdf:type with the value :Company.

:User { schema:name xsd:string ; schema:worksFor @_:1 }
_:1 { a [ :Company ] }

It can be rewritten as:

:User { schema:name xsd:string ; schema:worksFor { a [ :Company] } }

:alice schema:name "Alice" ;            # Passes as a :User
       schema:worksFor :OurCompany .
:bob   schema:name "Robert" ;           # Passes as a :User
       schema:worksFor [ a :Company ] .
:carol schema:name "Carol" ;            # Fails as a :User: the value of schema:worksFor
       schema:worksFor [                # does not have rdf:type :Company
         schema:name "AnotherCompany"
       ] .
:OurCompany a :Company .                # Passes as the nested shape

Nested shapes can be used to emulate simple SPARQL property paths.

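Example 58

The following shape requires a node to have one or more :parent values, each of which must in turn have one or more :parent values, similar to the SPARQL property path :parent/:parent.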

:Grandson { :parent { :parent . + }+ ; }

:alice :parent :bob, :carol .    # Passes as a :Grandson
:bob   :parent :dave .           # Passes as a :Grandson
:carol :parent :emily .          # Fails as a :Grandson
:dave  :parent :grace .          # Fails as a :Grandson
:emily schema:name "Emily" .     # Fails as a :Grandson
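The following Python sketch reproduces the behavior of Example 58 on this data. The dictionary parents is a hypothetical encoding of the :parent triples above, and the two functions mirror the nested shape and the outer shape.

```python
# Hypothetical encoding of the :parent triples of Example 58: node -> list of parents.
parents = {
    ":alice": [":bob", ":carol"],
    ":bob": [":dave"],
    ":carol": [":emily"],
    ":dave": [":grace"],
}


def has_parent(node) -> bool:
    # Nested shape { :parent . + }: one or more :parent arcs with any value.
    return len(parents.get(node, [])) >= 1


def is_grandson(node) -> bool:
    # Outer shape { :parent {...}+ }: one or more :parent arcs and, because the
    # triple constraint closes :parent, every parent must satisfy the nested shape.
    ps = parents.get(node, [])
    return len(ps) >= 1 and all(has_parent(p) for p in ps)
```

The all(...) call is where ShEx differs from SPARQL. A SPARQL path :parent/:parent only needs one such path, while this shape requires every parent of the focus node to have a parent.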

4.6.6 Inverse Triple Constraints

The ^ operator reverses the order of the triple constraint. Instead of constraining the focus node’s outgoing arcs, it constrains incoming arcs.

Example 59 Inverse triple constraints

The following code declares that nodes conforming to shape :Company must have rdf:type :Company and must be the objects of one or more triples with predicate schema:worksFor and a subject conforming to shape :User.

:User { schema:name xsd:string }
:Company { a [ schema:Company ] ; ^schema:worksFor @:User + }

With the following data, node :Company1 conforms to :Company because two nodes, :alice and :bob, work for it. However, node :Company2 does not conform because no node points to it through the property schema:worksFor, and node :Company3 also fails because the node that works for it does not conform to shape :User.

:alice schema:name "Alice" ;          # Passes as a :User
       schema:worksFor :Company1 .
:bob   schema:name "Bob" ;            # Passes as a :User
       schema:worksFor :Company1 .
:carol schema:worksFor :Company3 .    # Fails as a :User: no schema:name
:Company1 a schema:Company .          # Passes as a :Company
:Company2 a schema:Company .          # Fails as a :Company: no one works for it
:Company3 a schema:Company .          # Fails as a :Company: :carol works for it
                                      # but does not conform to :User
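The following Python sketch mimics Example 59 on this data. Triples are (subject, predicate, object) tuples with IRIs abbreviated as strings, and the functions is_user and is_company are hypothetical illustrations. The inverse constraint ^schema:worksFor is evaluated by scanning incoming arcs, that is, triples whose object is the focus node.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

GRAPH = {
    (":alice", SCHEMA + "name", "Alice"),
    (":alice", SCHEMA + "worksFor", ":Company1"),
    (":bob", SCHEMA + "name", "Bob"),
    (":bob", SCHEMA + "worksFor", ":Company1"),
    (":carol", SCHEMA + "worksFor", ":Company3"),
    (":Company1", RDF_TYPE, SCHEMA + "Company"),
    (":Company2", RDF_TYPE, SCHEMA + "Company"),
    (":Company3", RDF_TYPE, SCHEMA + "Company"),
}


def is_user(node) -> bool:
    # :User { schema:name xsd:string }: exactly one string-valued name.
    names = [o for (s, p, o) in GRAPH if s == node and p == SCHEMA + "name"]
    return len(names) == 1 and isinstance(names[0], str)


def is_company(node) -> bool:
    # a [ schema:Company ]: exactly one rdf:type, equal to schema:Company.
    types = [o for (s, p, o) in GRAPH if s == node and p == RDF_TYPE]
    # ^schema:worksFor @:User +: look at incoming arcs, i.e. the subjects of
    # schema:worksFor triples whose object is the focus node.
    workers = [s for (s, p, o) in GRAPH if o == node and p == SCHEMA + "worksFor"]
    return (types == [SCHEMA + "Company"]
            and len(workers) >= 1
            and all(is_user(w) for w in workers))
```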

4.6.7 Repeated Properties

The EachOf operator is different from a conjunction operator. This is best illustrated when a shape uses the same property several times; we call this a repeated property. In Example 60, the :User shape is an EachOf with three triple constraints, two of which have the same property schema:parent. A node conforms to this shape if it has two schema:parent arcs, each of which satisfies a different one of the two triple constraints.

Example 60 Repeated properties

:User { schema:name xsd:string; schema:parent { schema:gender [schema:Male ] } ; schema:parent { schema:gender [schema:Female ] } ; }

:alice schema:name "Alice" ;          # Passes as a :User
       schema:parent :bob, :carol .
:bob   schema:gender schema:Male .
:carol schema:gender schema:Female .
:dave  schema:name "Dave" ;           # Fails as a :User
       schema:parent :carol, :emily . # both parents are Female
:emily schema:gender schema:Female .
:frank schema:name "Frank" ;          # Fails as a :User
       schema:parent :x .             # only one parent
:x     schema:gender schema:Female, schema:Male .

Remember that ShEx distributes the triples of the neighborhood among the triple constraints of a triple expression (see Section 4.6). This means the same triple cannot contribute to satisfying two different triple constraints, even if its object satisfies the node constraints of both. That is why the node :frank does not conform to the :User shape: its single schema:parent triple can be assigned to at most one of the two schema:parent triple constraints, so even if its parent satisfied both conditions, one constraint would remain unsatisfied.
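The following brute-force Python sketch distributes triples in this way for Example 60. It tries every assignment of the relevant triples to the three triple constraints, each with the default cardinality {1}, so no triple is used twice. Real ShEx engines use far more efficient algorithms, and the names here are hypothetical.

```python
from itertools import permutations

SCHEMA = "http://schema.org/"

GRAPH = {
    (":alice", SCHEMA + "name", "Alice"),
    (":alice", SCHEMA + "parent", ":bob"),
    (":alice", SCHEMA + "parent", ":carol"),
    (":bob", SCHEMA + "gender", SCHEMA + "Male"),
    (":carol", SCHEMA + "gender", SCHEMA + "Female"),
    (":dave", SCHEMA + "name", "Dave"),
    (":dave", SCHEMA + "parent", ":carol"),
    (":dave", SCHEMA + "parent", ":emily"),
    (":emily", SCHEMA + "gender", SCHEMA + "Female"),
    (":frank", SCHEMA + "name", "Frank"),
    (":frank", SCHEMA + "parent", ":x"),
    (":x", SCHEMA + "gender", SCHEMA + "Female"),
    (":x", SCHEMA + "gender", SCHEMA + "Male"),
}


def values(node, pred):
    return [o for (s, p, o) in GRAPH if s == node and p == SCHEMA + pred]


def has_only_gender(node, gender) -> bool:
    # Nested shape { schema:gender [ schema:<gender> ] }: exactly one schema:gender
    # value (default cardinality, closed property) and it must be the given one.
    return values(node, "gender") == [SCHEMA + gender]


# The EachOf of Example 60 as (predicate, test on the object) pairs, each with cardinality {1}.
USER_CONSTRAINTS = [
    (SCHEMA + "name", lambda o: isinstance(o, str)),
    (SCHEMA + "parent", lambda o: has_only_gender(o, "Male")),
    (SCHEMA + "parent", lambda o: has_only_gender(o, "Female")),
]


def conforms_to_user(node) -> bool:
    mentioned = {p for (p, _) in USER_CONSTRAINTS}
    # An open shape only distributes the triples whose predicate it mentions.
    triples = [(p, o) for (s, p, o) in GRAPH if s == node and p in mentioned]
    if len(triples) != len(USER_CONSTRAINTS):
        return False
    # Try every way of assigning the triples to the constraints, one triple each.
    return any(
        all(p == cp and test(o) for (p, o), (cp, test) in zip(order, USER_CONSTRAINTS))
        for order in permutations(triples)
    )
```

:frank is rejected before any assignment is tried, because two relevant triples cannot fill three constraints.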

4.6.8 Permitting other Triples

When defining RDF-based services using ShEx schemas, there are several possibilities that have to be taken into account. Some services backed by an RDF triple store may simply accept and store any triples not described in the schema; in such a case, the role of the schema is mainly to identify and constrain the triples that the service understands and manipulates, allowing any extra triples for unforeseen applications. This open model is common in the Semantic Web community.

At the other extreme, some services or databases may accept or emit some fixed structure, disallowing any triples that are not mentioned in the schema. In this case, the role of ShEx schemas is to validate and verify the content before it is processed or published. This closed model has been traditionally employed in contexts where data quality and security play a significant part.

ShEx manages these use cases with two granularities:

extra properties manage triples whose predicates appear in the shape expression but whose objects do not match the corresponding triple constraints; and
closed shapes manage triples with predicates that do not appear in the shape expression.

Extra Properties

As described in Section 4.6.1, triple constraints close properties by default. Sometimes it is useful to open a property to permit values of it which are not described in the schema. The EXTRA qualifier allows additional values of the listed properties that do not match their triple constraints.

A shape of the form

<Shape> EXTRA <property> { <property> <NodeConstraint> }

is equivalent to:

<Shape> { <property> <NodeConstraint> ; <property> (Not <NodeConstraint>)* }

which means that it allows zero or more values of <property> that do not satisfy <NodeConstraint>. Note that there is a hidden negation in any shape that includes an EXTRA qualifier.

Example 61 EXTRA example

The following example declares that nodes that conform to :FollowSpaniards must follow one or more nodes whose nationality is :Spain, but can also follow other nodes.

:FollowSpaniards EXTRA schema:follows { schema:follows { schema:nationality [:Spain] }+ }

:alice schema:follows :david .          # Passes as a :FollowSpaniards
:bob   schema:follows :david, :emily .  # Passes as a :FollowSpaniards
:carol schema:follows :emily .          # Fails as a :FollowSpaniards
:david schema:nationality :Spain .
:emily schema:nationality :France .

Notice that :bob passes even though it also follows :emily, who is not Spanish. Without the EXTRA declaration, :bob would fail.
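The following Python sketch checks Example 61 with and without EXTRA. It follows the equivalence given above, in which the extra values are covered by a triple constraint plus (NOT <NodeConstraint>)*. The extra flag and the helper functions are hypothetical.

```python
SCHEMA = "http://schema.org/"

GRAPH = {
    (":alice", SCHEMA + "follows", ":david"),
    (":bob", SCHEMA + "follows", ":david"),
    (":bob", SCHEMA + "follows", ":emily"),
    (":carol", SCHEMA + "follows", ":emily"),
    (":david", SCHEMA + "nationality", ":Spain"),
    (":emily", SCHEMA + "nationality", ":France"),
}


def values(node, pred):
    return [o for (s, p, o) in GRAPH if s == node and p == SCHEMA + pred]


def is_spaniard(node) -> bool:
    # Nested shape { schema:nationality [:Spain] }: exactly one nationality, :Spain.
    return values(node, "nationality") == [":Spain"]


def follow_spaniards(node, extra: bool = True) -> bool:
    followed = values(node, "follows")
    matching = [v for v in followed if is_spaniard(v)]
    others = [v for v in followed if not is_spaniard(v)]
    # schema:follows {...}+ needs at least one matching value.
    if not matching:
        return False
    # With EXTRA, non-matching values are absorbed by the hidden
    # schema:follows (NOT {...})* part; without it, they make the node fail.
    return extra or not others
```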

A typical pattern using EXTRA declarations is to constrain the set of required values of a node but to allow other values.

Example 62 EXTRA properties with several types

The following example declares the shapes for companies which must have two values for the rdf:type predicate: schema:Organization and org:Organization. Shape :Company1 does not allow any extra rdf:type arc, while shape :Company2 allows extra values.

:Company1 { a [ schema:Organization ] ; a [ org:Organization ] }
:Company2 EXTRA a {   # Allows extra values of rdf:type
  a [ schema:Organization ] ;
  a [ org:Organization ]
}

:OurCompany a org:Organization,          # Passes as a :Company1 and :Company2
              schema:Organization .
:OurUniversity a org:Organization,       # Fails as a :Company1
                 schema:CollegeOrUniversity, # unexpected rdf:type
                 schema:Organization .   # Passes as a :Company2

Closed Shapes

A shape can be declared to have only the triples matching a given set of triple constraints and no others using the keyword CLOSED.

Example 63 CLOSED shape example

:User1 { schema:name xsd:string ; schema:knows IRI* }
:User2 CLOSED { schema:name xsd:string ; schema:knows IRI* }

:alice schema:name "Alice" ;     # Passes as a :User1 and :User2
       schema:knows :bob .
:bob   schema:name "Bob" ;       # Passes as a :User1
       schema:knows :alice ;     # Fails as a :User2
       schema:age 23 .           # unexpected schema:age
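The effect of CLOSED on Example 63 can be sketched in Python. After the triple constraints have been checked, a closed shape also rejects any triple whose predicate the shape does not mention. The encoding, in which IRIs are strings starting with ":", and the function names are assumptions of this sketch.

```python
SCHEMA = "http://schema.org/"

GRAPH = {
    (":alice", SCHEMA + "name", "Alice"),
    (":alice", SCHEMA + "knows", ":bob"),
    (":bob", SCHEMA + "name", "Bob"),
    (":bob", SCHEMA + "knows", ":alice"),
    (":bob", SCHEMA + "age", 23),
}

# Predicates mentioned by the triple constraints of :User1 and :User2.
MENTIONED = {SCHEMA + "name", SCHEMA + "knows"}


def is_iri(term) -> bool:
    # Assumption of this sketch: IRIs are strings starting with ":".
    return isinstance(term, str) and term.startswith(":")


def check_user(node, closed: bool) -> bool:
    neigh = [(p, o) for (s, p, o) in GRAPH if s == node]
    names = [o for (p, o) in neigh if p == SCHEMA + "name"]
    # schema:name xsd:string: exactly one value, a string literal.
    if len(names) != 1 or not isinstance(names[0], str) or is_iri(names[0]):
        return False
    # schema:knows IRI*: any number of values, all IRIs.
    if not all(is_iri(o) for (p, o) in neigh if p == SCHEMA + "knows"):
        return False
    # CLOSED: additionally reject triples whose predicate the shape does not mention.
    return not closed or all(p in MENTIONED for (p, _) in neigh)
```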

A common pattern is to combine CLOSED and EXTRA.

Example 64 CLOSED shapes

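The shape :KnowsW3CPeople is closed, so it only allows the properties schema:name, schema:affiliation and schema:knows. It requires at least one of the people a node knows to be affiliated with :W3C, while EXTRA schema:knows also permits knowing people with other affiliations.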

:KnowsW3CPeople CLOSED EXTRA schema:knows {
  schema:name        xsd:string ;
  schema:affiliation IRI ? ;
  schema:knows       { schema:affiliation [ :W3C ] } +
}

:alice schema:name "Alice" ;          # Passes as a :KnowsW3CPeople
       schema:affiliation :ACompany ;
       schema:knows :bob .
:bob   schema:name "Bob" ;            # Fails as a :KnowsW3CPeople
       schema:affiliation :W3C ;
       schema:knows :carol .          # :carol's affiliation is not :W3C
:carol schema:name "Carol" ;          # Passes as a :KnowsW3CPeople
       schema:affiliation :ACompany ;
       schema:knows :alice, :bob .
:dave  schema:name "Dave" ;           # Fails as a :KnowsW3CPeople
       schema:knows :alice, :bob ;
       schema:age 23 .                # schema:age not allowed


4.7 References

4.7.1 Shape References

A node constraint can be a shape reference, which has the form @label, where label is the identifier of another shape expression in the schema. Shape expression reference would be a more precise name, but it is awkwardly long.

Example 65 Shape references

:User { schema:worksFor @:Company ; } :Company { schema:name xsd:string }

:alice a :User ;              # Passes as a :User
       schema:worksFor :a .
:bob   a :User ;              # Fails as a :User because :x fails as a :Company
       schema:worksFor :x .
:a schema:name "CompanyA" .   # Passes as a :Company
:x schema:name 23 .           # Fails as a :Company

4.7.2 Recursion and Cyclic References

It is possible to define data models with cyclic references, i.e., shapes that recursively refer to themselves either directly or indirectly. ShEx supports these kinds of data models which appear frequently.

Example 66 Cyclic data model

The model depicted in Figure 4.8 can be specified in ShEx as:

:User { schema:worksFor @:Company ; }
:Company { schema:name xsd:string ; schema:employee @:User* }

Figure 4.8: Example of cyclic data model.

:alice schema:worksFor :OurCompany .             # Passes as a :User
:bob   schema:name "Robert" ;                    # Passes as a :User
       schema:worksFor :OurCompany .
:carol schema:worksFor :AnotherCompany .         # Passes as a :User
:OurCompany schema:name "OurCompany" ;           # Passes as a :Company
            schema:employee :alice, :bob .
:AnotherCompany schema:name "AnotherCompany" .   # Passes as a :Company
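One way to evaluate such cyclic references without looping forever is to assume that a (node, shape) pair still under evaluation holds when it is reached again. The following Python sketch applies that idea to Example 66. The function conforms and the encoding are hypothetical, and the assumption is only appropriate for schemas without negation (see Section 4.8.3).

```python
SCHEMA = "http://schema.org/"

GRAPH = {
    (":alice", SCHEMA + "worksFor", ":OurCompany"),
    (":bob", SCHEMA + "name", "Robert"),
    (":bob", SCHEMA + "worksFor", ":OurCompany"),
    (":carol", SCHEMA + "worksFor", ":AnotherCompany"),
    (":OurCompany", SCHEMA + "name", "OurCompany"),
    (":OurCompany", SCHEMA + "employee", ":alice"),
    (":OurCompany", SCHEMA + "employee", ":bob"),
    (":AnotherCompany", SCHEMA + "name", "AnotherCompany"),
}


def values(node, pred):
    return [o for (s, p, o) in GRAPH if s == node and p == SCHEMA + pred]


def conforms(node, shape, assumed=frozenset()) -> bool:
    """Check node against "User" or "Company" from Example 66."""
    # A pair already being checked further up the call chain is assumed to hold,
    # so cycles such as :alice -> :OurCompany -> :alice terminate.
    if (node, shape) in assumed:
        return True
    assumed = assumed | {(node, shape)}
    if shape == "User":
        # :User { schema:worksFor @:Company }
        works = values(node, "worksFor")
        return len(works) == 1 and conforms(works[0], "Company", assumed)
    if shape == "Company":
        # :Company { schema:name xsd:string ; schema:employee @:User* }
        names = values(node, "name")
        employees = values(node, "employee")
        return (len(names) == 1 and isinstance(names[0], str)
                and all(conforms(e, "User", assumed) for e in employees))
    raise ValueError(f"unknown shape {shape!r}")
```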

Example 67 More complex cyclic model

As an exercise, we present a more complex cyclic data model in Figure 4.9. Although the model has several cycles, it can easily be represented in ShEx as:

:University { schema:name xsd:string ; schema:employee @:Teacher + ; schema:course @:Course + }
:Teacher { a [ schema:Person ] ; schema:name xsd:string ; :teaches @:Course* }
:Course { schema:name xsd:string ; :university @:University ; :hasStudent @:Student+ }
:Student { a [ schema:Person ] ; schema:name xsd:string ; schema:mbox IRI ; :hasFriend @:Student* ; :isEnroledIn @:Course* }

Figure 4.9: Exercise to represent cyclic data model.

Notice the separation between the types and shapes of nodes. Both :Teacher and :Student must have rdf:type with value schema:Person, but their properties are different.

As can be seen, ShEx can model any kind of cyclic or recursive model in a natural way. The only restriction arises when combining recursion with negation, as we will explain in Section 4.8.3, where the negation operator NOT is introduced.

4.7.3 External Shapes

External shapes are an extension mechanism to externally define shapes. This is useful when we want to describe functional shapes or very large value sets. As a practical example, in medical schemas, value sets can be dynamically derived and include hundreds of thousands of terms. In the FHIR use case (see Section 6.2), these are resolved using an emerging REST API for ShEx.

Example 68 External shape example

The following code declares an external shape for products where the value of schema:category is defined as an external shape. In this case, an annotation declares the property :service that points to the URL where the shape can be retrieved.

:Product { schema:productId xsd:string ; schema:category EXTERNAL // :service <http://categories.org/> }

At the time of this writing, the ShEx specification does not define a mechanism like the :service annotation above, but such mechanisms are expected to be developed in the future.

4.7.4 Labeled Triple Expression

Much as shape references (Section 4.7.1) are allowed wherever a shape expression may appear, any triple expression can be labeled so it can later be referenced.

The target triple expression must be labeled with $label and references are made with &label.

For instance, if we want to share a name expression between :User and :Employee shapes, we could include the expression in one and reference it from the other.

Example 69 Labeled triple expression

:User {
  $:name ( schema:name . | schema:givenName . ; schema:familyName . ) ;
  schema:email IRI
}
:Employee { &:name ; :employeeId . }

:alice schema:name "Alice" ;                      # Passes as a :User
       schema:email <mailto:alice@example.org> .
:bob   schema:givenName "Robert" ;                # Passes as an :Employee
       schema:familyName "Smith" ;
       :employeeId 1234567 .

The &:name directive can be considered to insert the triple expression labeled :name in its place. Logically, :Employee is equivalent to this:

Example 70 Equivalent triple expression

:Employee { ( schema:name . | schema:givenName . ; schema:familyName .) ; :employeeId . }

4.7.5 Annotations

ShEx allows annotations, which are lists of (predicate, object) pairs where the predicate is an IRI and the object is any RDF node. Annotations provide additional information about the elements to which they are applied, which can be triple constraints, EachOf, OneOf, or shapes.

The compact syntax for annotations uses two slashes // followed by a predicate and an object.

Example 71 Shape with annotations

The following code declares a shape :Person which must have a schema:name with an xsd:string value and a schema:birthDate with an xsd:date value. Each triple constraint has its corresponding rdfs:label and rdfs:comment annotations.

:Person {
  schema:name xsd:string
    // rdfs:label "Name"
    // rdfs:comment "Name of person" ;
  schema:birthDate xsd:date
    // rdfs:label "birthDate"
    // rdfs:comment "Date of birth" ;
}

In this case, each triple constraint has its specific annotations which are internally represented as triples.

At the time of this writing, ShEx does not have any built-in annotation vocabulary. Specific annotation vocabularies are expected to be defined for future uses such as user interface generation.

4.8 Logical Operators

The logical operators AND, OR, and NOT can be used to form complex shape expressions. Their meaning follows the conventional logical meaning of conjunction, disjunction, and negation. The precedence of the operators is the usual one: NOT binds more tightly than AND, which binds more tightly than OR.

Table 4.6: Logical operators on shape expressions

Operator   Description
AND        S1 AND S2 is satisfied if and only if both S1 and S2 are satisfied
OR         S1 OR S2 is satisfied if and only if S1 or S2 (or both) are satisfied
NOT        NOT S is satisfied if and only if S is not satisfied
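Because shape expressions combine like Boolean predicates, the operators in Table 4.6 can be sketched in Python as higher-order functions. Modelling a shape expression as a predicate on a single RDF term is a simplifying assumption of this sketch. The example composes the constraint on schema:productId that appears in Example 72.

```python
from typing import Callable

# A shape expression is modelled as a predicate on an RDF term (simplifying assumption).
ShapeExpr = Callable[[object], bool]


def AND(*exprs: ShapeExpr) -> ShapeExpr:
    return lambda term: all(e(term) for e in exprs)


def OR(*exprs: ShapeExpr) -> ShapeExpr:
    return lambda term: any(e(term) for e in exprs)


def NOT(expr: ShapeExpr) -> ShapeExpr:
    return lambda term: not expr(term)


def is_string(term) -> bool:
    return isinstance(term, str)


def min_length(n: int) -> ShapeExpr:
    return lambda term: len(term) >= n


def max_length(n: int) -> ShapeExpr:
    return lambda term: len(term) <= n


# xsd:string AND MINLENGTH 5 AND MAXLENGTH 10
product_id = AND(is_string, min_length(5), max_length(10))
not_product_id = NOT(product_id)
```

Because all(...) stops at the first failing operand, a non-string such as 23 is rejected by is_string before len is ever called.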

4.8.1 Conjunction

The AND operator forms a new shape expression from two shape expressions with the meaning that a node conforms to S1 AND S2 if it conforms to both S1 and S2.

Example 72 Conjunction example

The following example expresses that :User nodes must satisfy two shape expressions at the same time. Notice that the appearance of the repeated property schema:owns means that both expressions must be satisfied, i.e., that the value of schema:owns must be an IRI and must have shape :Product, which must have a property schema:productId whose value is a xsd:string between 5 and 10 characters.

:User { schema:name xsd:string ; schema:owns IRI } AND { schema:owns @:Product }
:Product { schema:productId xsd:string AND MINLENGTH 5 AND MAXLENGTH 10 }

:alice schema:name "Alice" ;             # Passes as a :User
       schema:owns :product1 .
:bob   schema:name "Robert" ;            # Fails as a :User: two values of schema:owns
       schema:owns :product2, :product3 .
:carol schema:name "Carol" ;             # Fails as a :User: _:x is not an IRI
       schema:owns _:x .
:product1 schema:productId "Product1" .  # Passes as a :Product
:product2 schema:productId "Product2" .  # Passes as a :Product
:product3 schema:productId "Product3" .  # Passes as a :Product
:product4 schema:productId "P4" .        # Fails as a :Product
_:x schema:productId "ProductX" .        # Passes as a :Product

If the left-hand side of the conjunction is a node constraint, the AND keyword can be omitted.

Example 73 Omitting ANDs

In the following schema, :User1 and :User2, and :Product1 and :Product2 are equivalent:

:User1 IRI AND { schema:name xsd:string }
:User2 IRI { schema:name xsd:string }
:Product1 { schema:productId xsd:string AND MINLENGTH 5 AND MAXLENGTH 10 }
:Product2 { schema:productId xsd:string MINLENGTH 5 MAXLENGTH 10 }

Reusing shape expressions

A common situation is to declare a set of constraints that we want to repeat.

Example 74 Reusing constraints

In the following example, we reuse :CompanyConstraints in two places (for schema:worksFor and for schema:affiliation).

:CompanyConstraints IRI /^http:\/\/example.org\/id[0-9]+/ @:CompanyShape
:User { schema:name xsd:string ; schema:worksFor @:CompanyConstraints ; schema:affiliation @:CompanyConstraints }
:CompanyShape { schema:founder xsd:string ; }

:alice schema:name "Alice" ;       # Passes as a :User
       schema:worksFor :id1 ;
       schema:affiliation :id2 .
:id1 schema:founder "Robert" .
:id2 schema:founder "Carol" .

Another example of shape reuse is to extend a shape with more constraints emulating a kind of inheritance as in Object-Oriented languages.

Example 75 Extending shapes

The following example declares a top-level shape :Person whose nodes must have rdf:type with value schema:Person and schema:name. The shape :User extends :Person adding a new constraint on the existing property schema:name and declaring the need of another property schema:email. Finally, the shape :Student extends :User adding a new property :course.

:Person { a [ schema:Person ] ; schema:name xsd:string ; }
:User @:Person AND { schema:name MaxLength 20 ; schema:email IRI }
:Student @:User AND { :course IRI * ; }

:alice a schema:Person ;                 # Passes as a :Person
       schema:name "Alice" .
:bob   schema:name "Robert" ;            # Fails as a :User: lacks rdf:type schema:Person
       schema:email <bob@example.org> .
:carol a schema:Person ;                 # Passes as a :Person and :User
       schema:name "Carol" ;
       schema:email <carol@example.org> .
:dave  a schema:Person ;                 # Passes as a :Person, :User and :Student
       schema:name "Carol" ;
       schema:email <carol@example.org> ;
       :course :algebra .

Notice that this kind of reuse requires the extended shapes to be compatible with the new constraints. Otherwise, no nodes will satisfy them.

For example, we may want to declare a :Teacher shape extending :User but adding the constraint that teachers have no email.

:Teacher @:User AND { schema:email . {0,0} ; }

However, there will be no nodes satisfying it, because shape :User prescribes that they must have exactly one schema:email, while the extended shape :Teacher prescribes that they must have no schema:email.

In order to obtain the desired model, the shapes to be extended must be general enough to be compatible with the new shapes. In this case, for example, it would be better to declare the cardinality of schema:email in :User as optional.

4.8.2 Disjunction

The OR operator combines two shape expressions with an inclusive disjunction, i.e., one side, the other, or both must be satisfied.

Example 76 Disjunction

The following example declares that nodes of shape :User must have either a schema:name with xsd:string value or a combination of schema:givenName and schema:familyName with xsd:string values, or both.

:User { schema:name xsd:string } OR { schema:givenName xsd:string ; schema:familyName xsd:string }

:alice schema:name "Alice" .           # Passes as a :User
:bob   schema:givenName "Robert" ;     # Passes as a :User
       schema:familyName "Smith" .
:carol schema:name "Carol King" ;      # Passes as a :User
       schema:givenName "Carol" ;
       schema:familyName "King" .

Example 77 Difference between OR and |

There is a difference between the OR and the choice (|) operators. The former defines an inclusive-or, while the latter behaves here as an exclusive-or: the triples must match exactly one of the branches, not both.

:User1 { schema:name xsd:string } OR { schema:givenName xsd:string ; schema:familyName xsd:string }
:User2 { schema:name xsd:string | schema:givenName xsd:string ; schema:familyName xsd:string }

:alice schema:name "Alice" .          # Passes as a :User1 and :User2
:bob   schema:givenName "Robert" ;    # Passes as a :User1 and :User2
       schema:familyName "Smith" .
:carol schema:name "Carol King" ;     # Passes as a :User1
       schema:givenName "Carol" ;     # Fails as a :User2
       schema:familyName "King" .
:dave  schema:name "Dave" ;           # Passes as a :User1
       schema:givenName "Dave" .      # Fails as a :User2
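The following Python sketch reproduces the difference on the data of Example 77. Each node's neighborhood is a hypothetical list of (predicate, value) pairs. In user1, each disjunct is an independent open shape. In user2, every triple whose predicate is mentioned anywhere in the single shape must be consumed by the chosen branch.

```python
N, G, F = "name", "givenName", "familyName"

# Hypothetical encoding of each node's neighbourhood as (predicate, value) pairs.
DATA = {
    ":alice": [(N, "Alice")],
    ":bob": [(G, "Robert"), (F, "Smith")],
    ":carol": [(N, "Carol King"), (G, "Carol"), (F, "King")],
    ":dave": [(N, "Dave"), (G, "Dave")],
}

BRANCHES = [[N], [G, F]]


def group_matches(triples, preds) -> bool:
    """Exact match of an EachOf of string-valued constraints, each with cardinality {1}."""
    return (sorted(p for p, _ in triples) == sorted(preds)
            and all(isinstance(v, str) for _, v in triples))


def shape_matches(triples, preds) -> bool:
    # An open shape only looks at triples whose predicates it mentions.
    return group_matches([t for t in triples if t[0] in preds], preds)


def user1(node) -> bool:
    # { name } OR { givenName ; familyName }: each disjunct is a separate shape.
    return any(shape_matches(DATA[node], b) for b in BRANCHES)


def user2(node) -> bool:
    # { name | givenName ; familyName }: one shape whose mentioned triples must
    # all be consumed by the single chosen branch.
    mentioned = {p for b in BRANCHES for p in b}
    relevant = [t for t in DATA[node] if t[0] in mentioned]
    return any(group_matches(relevant, b) for b in BRANCHES)
```

For :carol and :dave, every branch of user2 leaves a mentioned triple unconsumed, so the node fails. Each disjunct of user1 simply ignores the other disjunct's predicates.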

Example 78 Disjunction of datatypes

A common use case is to declare that the value of some property is a disjunction of several datatypes or value sets. The following example declares that products must have an rdfs:label with a string value or a language-tagged literal (remember that such literals have datatype rdf:langString), and a schema:releaseDate whose value must be either an xsd:date, an xsd:gYear, or one of the values "unknown-past" or "unknown-future".

:Product {
  rdfs:label xsd:string OR rdf:langString ;
  schema:releaseDate xsd:date OR xsd:gYear OR [ "unknown-past" "unknown-future" ]
}

:p1 a :Product ;                             # Passes as a :Product
    rdfs:label "Laptop" ;
    schema:releaseDate "1990"^^xsd:gYear .
:p2 a :Product ;                             # Passes as a :Product
    rdfs:label "Car"@en ;
    schema:releaseDate "unknown-future" .
:p3 a :Product ;                             # Fails as a :Product
    rdfs:label :House ;
    schema:releaseDate "2020"^^xsd:integer .

Emulating recursive property paths

SPARQL property paths are a very expressive feature that can define complex expressions. ShEx does not support property paths in order to have a more controlled way to define shapes. However, using nested shapes (see Example 58), recursion and logical operators, it is possible to emulate their behavior.

Example 79 SHACL instance of Person

In SHACL, a node is an instance of a class if it reaches that class through the property path rdf:type/rdfs:subClassOf*, that is, an rdf:type arc followed by zero or more rdfs:subClassOf arcs (see Section 5.7.2). The following example declares that nodes conforming to shape :Person must be SHACL instances of schema:Person.

:Person { a @:PersonShape }
:PersonShape [ schema:Person ] OR { rdfs:subClassOf @:PersonShape }

:alice a schema:Person .                  #Passes as a :Person
:bob   a :Teacher .                       #Passes as a :Person
:carol a :Assistant .                     #Passes as a :Person
:Teacher   rdfs:subClassOf schema:Person .
:Assistant rdfs:subClassOf :Teacher .
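The recursion in :PersonShape can be read as a recursive function that climbs the class hierarchy. This Python sketch is only an illustration of that reading, under our own simplifications: it accepts a class as soon as any superclass qualifies (ignoring exact ShEx cardinalities), and it treats a cycle in the class hierarchy as failure. That matches the reachability intent of rdfs:subClassOf*, but it is not how ShEx itself resolves recursive references.

```python
# Triples from the example, as (subject, predicate, object) tuples
TRIPLES = {
    (":alice", "rdf:type", "schema:Person"),
    (":bob", "rdf:type", ":Teacher"),
    (":carol", "rdf:type", ":Assistant"),
    (":Teacher", "rdfs:subClassOf", "schema:Person"),
    (":Assistant", "rdfs:subClassOf", ":Teacher"),
}

def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def person_shape(cls, visiting=frozenset()):
    """:PersonShape [ schema:Person ] OR { rdfs:subClassOf @:PersonShape }"""
    if cls == "schema:Person":       # value-set branch
        return True
    if cls in visiting:              # guard against cycles in the class hierarchy
        return False
    return any(person_shape(sup, visiting | {cls})   # recursive branch
               for sup in objects(cls, "rdfs:subClassOf"))

def person(node):
    """:Person { a @:PersonShape }"""
    return any(person_shape(cls) for cls in objects(node, "rdf:type"))

print([person(n) for n in (":alice", ":bob", ":carol")])
```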

4.8.3 Negation

NOT s creates a new shape expression from a shape expression s. Nodes conform to NOT s when they do not conform to s.

Example 80 Not

:NoName NOT { schema:name . }

:alice schema:givenName "Alice" ;     #Passes as a :NoName
       schema:familyName "Cooper" .
:bob   schema:name "Robert" .         #Fails as a :NoName
:carol schema:givenName "Carol" ;     #Fails as a :NoName
       schema:name "Carol" .

A common use case for NOT is checking other shapes. If a shape :NotS is defined as NOT @:S, every node in an RDF graph conforms to exactly one of the two shapes: either to :S or to :NotS. In this way, a continuous integration system can define a shape map stating which nodes must satisfy :S and which must satisfy :NotS (that is, whether each node must pass positively or negatively), and check whether those expectations hold.

Example 81 Not

The following code declares a shape :User and its complement :NoUser.

:User {
  schema:name xsd:string ;
  schema:birthDate xsd:date ? ;
}
:NoUser NOT @:User

Both nodes :alice and :bob conform to one of the shapes, :alice to :User and :bob to :NoUser.

:alice schema:name "Alice" ;                      #Passes as a :User
       schema:birthDate "1980-03-10"^^xsd:date .
:bob   schema:name 23 ;                           #Passes as a :NoUser
       schema:birthDate "Unknown" .

Difference between Not and Max-cardinality 0

The operator NOT checks that a node fails to conform to a whole shape expression. Sometimes, the intended meaning is not to negate a whole shape expression but to declare that some properties cannot appear. This is better expressed by setting the maximum cardinality of those properties to 0.

Example 82 Difference between Not and Max-0

Shape :NoName1 prohibits the property schema:name by setting its maximum cardinality to 0. Shape :NoName2 looks as if it does the same thing using negation. However, notice that :NoName2 is satisfied by any node that does not conform to the shape { schema:name xsd:string }.

:NoName1 { schema:name xsd:string {0} }
:NoName2 NOT { schema:name xsd:string }

The behavior differs for node :bob, which conforms to :NoName2 but not to :NoName1. The reason is that :bob lacks a string value for schema:name, so it fails to conform to the shape { schema:name xsd:string } and thus conforms to :NoName2.

:alice schema:name "Alice" .    #Fails as a :NoName1 and :NoName2
:bob   schema:name 23 .         #Fails as a :NoName1, passes as a :NoName2
:carol foaf:age 34 .            #Passes as a :NoName1 and :NoName2
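The difference between the two shapes can be reproduced with a small Python sketch. It assumes a hypothetical representation of a node's neighbourhood as a list of (predicate, value) pairs, where a Python str stands in for an xsd:string literal and an int for an xsd:integer literal.

```python
def no_name1(arcs):
    # { schema:name xsd:string {0} }: any schema:name arc makes the node fail,
    # whether it matches xsd:string (cardinality exceeded) or not (unmatched arc)
    return not any(p == "schema:name" for p, _ in arcs)

def has_string_name(arcs):
    # { schema:name xsd:string }: exactly one schema:name arc with a string value
    names = [v for p, v in arcs if p == "schema:name"]
    return len(names) == 1 and isinstance(names[0], str)

def no_name2(arcs):
    # NOT { schema:name xsd:string }
    return not has_string_name(arcs)

data = {
    ":alice": [("schema:name", "Alice")],
    ":bob": [("schema:name", 23)],
    ":carol": [("foaf:age", 34)],
}
for label, arcs in data.items():
    print(label, no_name1(arcs), no_name2(arcs))
```

Only :bob separates the two: it has a schema:name arc (so :NoName1 fails) whose value is not a string (so the negated shape holds).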

IF-THEN pattern

A common pattern is the IF-THEN construct: if some condition holds, then a given shape expression must be satisfied.

This pattern can be modeled using the logical operators OR and NOT. Remember that IF x THEN y is equivalent to (NOT x) OR y.

Example 83 IF-THEN pattern example

The following example specifies that all products must have a schema:productID and if a product has type schema:Vehicle, then it must have the properties schema:vehicleEngine and schema:fuelType.

:Product { schema:productID . }
    AND ( NOT { a [ schema:Vehicle ] }
          OR { schema:vehicleEngine . ; schema:fuelType . } )

:kitt schema:productID "C21" ;    #Passes as a :Product
      a schema:Vehicle ;
      schema:vehicleEngine :x42 ;
      schema:fuelType :electric .
:bad  schema:productID "C22" ;    #Fails as a :Product
      a schema:Vehicle ;
      schema:fuelType :electric .
:c23  schema:productID "C23" ;    #Passes as a :Product
      a schema:Computer .
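The encoding of IF-THEN as (NOT x) OR y can be checked directly in Python. In this sketch each node is a hypothetical dictionary from predicate to values, and each inner shape is reduced to a simple presence test, which is a simplification of the full ShEx matching rules.

```python
def implies(condition, consequence):
    # IF x THEN y  is encoded as  (NOT x) OR y
    return (not condition) or consequence

def product(node):
    is_vehicle = "schema:Vehicle" in node.get("rdf:type", set())
    has_vehicle_data = "schema:vehicleEngine" in node and "schema:fuelType" in node
    return "schema:productID" in node and implies(is_vehicle, has_vehicle_data)

kitt = {"schema:productID": {"C21"}, "rdf:type": {"schema:Vehicle"},
        "schema:vehicleEngine": {":x42"}, "schema:fuelType": {":electric"}}
bad = {"schema:productID": {"C22"}, "rdf:type": {"schema:Vehicle"},
       "schema:fuelType": {":electric"}}
c23 = {"schema:productID": {"C23"}, "rdf:type": {"schema:Computer"}}

print(product(kitt), product(bad), product(c23))
```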

IF-THEN-ELSE pattern

The IF-THEN-ELSE construct can be defined in a similar way. In this case:

IF X THEN Y ELSE Z ≡ ((NOT X) OR Y) AND (X OR Z)

Example 84 IF-THEN-ELSE pattern example

The following shape declares that if a product has type schema:Vehicle, then it must have the properties schema:vehicleEngine and schema:fuelType, otherwise, it must have the property schema:category with a xsd:string value.

:Product ( NOT { a [ schema:Vehicle ] }
           OR { schema:vehicleEngine . ; schema:fuelType . } )
     AND ( { a [ schema:Vehicle ] }
           OR { schema:category xsd:string } )

With the following data, nodes :kitt and :c23 conform to :Product each one passing one of the branches, while :bad1 and :bad2 do not conform.

:kitt a schema:Vehicle ;           #Passes as a :Product
      schema:vehicleEngine :x42 ;
      schema:fuelType :electric .
:c23  a schema:Computer ;          #Passes as a :Product
      schema:category "Laptop" .
:bad1 a schema:Vehicle ;           #Fails as a :Product
      schema:fuelType :electric .
:bad2 a schema:Computer .          #Fails as a :Product
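The equivalence above is easy to verify exhaustively. The following Python sketch checks the encoding against Python's own conditional expression for all eight truth assignments and then applies it to the example data, again using a hypothetical dictionary representation of nodes and presence tests in place of full ShEx matching.

```python
from itertools import product

def if_then_else(x, y, z):
    # IF x THEN y ELSE z  is encoded as  ((NOT x) OR y) AND (x OR z)
    return ((not x) or y) and (x or z)

# The encoding agrees with Python's conditional expression on all 8 inputs
for x, y, z in product([False, True], repeat=3):
    assert if_then_else(x, y, z) == (y if x else z)

def vehicle_product(node):
    is_vehicle = "schema:Vehicle" in node.get("rdf:type", set())
    vehicle_data = "schema:vehicleEngine" in node and "schema:fuelType" in node
    has_category = isinstance(node.get("schema:category"), str)
    return if_then_else(is_vehicle, vehicle_data, has_category)

kitt = {"rdf:type": {"schema:Vehicle"}, "schema:vehicleEngine": ":x42",
        "schema:fuelType": ":electric"}
c23 = {"rdf:type": {"schema:Computer"}, "schema:category": "Laptop"}
bad1 = {"rdf:type": {"schema:Vehicle"}, "schema:fuelType": ":electric"}
bad2 = {"rdf:type": {"schema:Computer"}}

print([vehicle_product(n) for n in (kitt, c23, bad1, bad2)])
```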

Restriction on cyclic dependencies with negation

One problem with freely combining recursion and negation is that it makes it possible to define paradoxical shapes.

Example 85 Barber’s paradox

The following shape declares a :Barber as someone who shaves a person but does not shave a barber.

:Barber { :shaves @:Person }         # Violates the negation requirement
    AND NOT { :shaves @:Barber }
:Person { schema:name xsd:string }

Given the following data:

:albert :shaves :dave .            #Passes as a :Barber
:bob    schema:name "Robert" ;     #Passes as a :Person
        :shaves :bob .             #Passes as a :Barber or not?
:dave   schema:name "Dave" .       #Passes as a :Person

It is easy to check that :bob conforms to :Person (he has schema:name with a xsd:string value), so he shaves a person, but:

Does :bob conform to :Barber?

If we assume he does, then he must not shave any barber; but he shaves himself, and we assumed that he conforms to :Barber, so he violates the constraint of not shaving barbers, which means that he should not conform. On the other hand, if we assume he does not conform to :Barber, then he satisfies both constraints, so he should conform to :Barber.

The problems that arise when combining negation and recursion have been studied by the logic programming and database communities. Several approaches have been proposed, such as negation-as-failure, stratified negation and well-founded semantics [1].

ShEx imposes a constraint to avoid ill-formed schemas: whenever a shape refers to itself, either directly or indirectly, the chain of references cannot traverse an occurrence of the negation operator NOT.

The previous shape :Barber violates this negation requirement because it has a reference to itself that passes through a negation. More formally, we say that there is a dependency from :ShapeA to :ShapeB if the definition of :ShapeA contains a reference @:ShapeB.

We say that a dependency from :ShapeA to :ShapeB is a negative dependency if at least one of the following holds:

the occurrence of @:ShapeB in the definition of :ShapeA appears under an occurrence of the negation operator NOT; or
there is a triple constraint :prop @:ShapeB in the definition of :ShapeA and the property :prop is declared as EXTRA in the corresponding triple expression.

In the latter case, the negation operator NOT does not appear explicitly, but we still need to verify that a :ShapeB is not satisfied in some neighbor nodes. This was called hidden negation in Section 4.6.8.
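The negation requirement amounts to checking that no negative dependency lies on a cycle of references. The Python sketch below assumes a hypothetical dependency graph mapping each shape label to a list of (referenced label, is_negative) pairs; classifying each dependency as negative (under NOT, or through an EXTRA property) is left to the caller. A negative dependency from A to B is reported when A is reachable again from B.

```python
def reachable(graph, start, target):
    """Iterative depth-first search over shape references."""
    stack, seen = [start], set()
    while stack:
        label = stack.pop()
        if label == target:
            return True
        if label in seen:
            continue
        seen.add(label)
        stack.extend(dst for dst, _negative in graph.get(label, []))
    return False

def negation_violations(graph):
    """Return the negative dependencies (src, dst) that lie on a reference cycle."""
    return [(src, dst)
            for src, edges in graph.items()
            for dst, negative in edges
            if negative and reachable(graph, dst, src)]

# label -> [(referenced label, is_negative)]
barber = {":Barber": [(":Person", False), (":Barber", True)], ":Person": []}
users = {":User": [], ":NoUser": [(":User", True)]}
print(negation_violations(barber))
print(negation_violations(users))
```

The :Barber schema is rejected because of its negative self-reference, while the :User/:NoUser pair from Example 81 is accepted: the negation there does not lie on any cycle.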

4.9 Shape Maps

The ShEx 2 specification focuses on the semantics of the validation language and moves the invocation mechanisms into a separate specification called Shape Maps [77]. Shape maps, already introduced in Section 4.4.2, are sets of node/shape associations that are used as input to the validation process and are also its result.

In ShEx, the construction of shape maps is orthogonal to their use in validation. Decoupling these processes enables ShEx to address a wide range of use cases. Just as XML Schema could not have predicted its use in WSDL (a protocol that was developed years later), it is impossible to predict the many and varied ways in which shape maps may be constructed in the future.

The current ShapeMap specification defines three kinds of shape map.

Fixed shape map: input to the validation process.
Query shape map: query mechanism to construct a fixed shape map.
Result shape map: result of validation.

Each of these consists of a comma-separated list of node/shape associations, each with at least two components:

nodeSelector - identifies a set of RDF nodes.
shapeLabel - selects a shape expression from the schema.

The simplest kind of shape map is a fixed shape map.

4.9.1 Fixed Shape Maps

ShEx validation takes as input a set of nodeSelector/shapeLabel pairs called a fixed shape map.

The shapeLabel is either the label for a shape expression in the schema or the case-insensitive keyword START to identify the start shape (see Section 4.4.4).

For the fixed shape map, the nodeSelector is one of:

an RDF IRI,
an RDF literal, or
for systems which support it, the label of a bnode in an RDF dataset.

Note that because the shapeLabel can identify a shape expression consisting only of node constraints, one can use ShEx to validate RDF terms that do not appear in the graph. This can be useful for testing membership in a value set or verifying the form of a URL.

Fixed shape maps have a compact syntax in which associations are separated by commas and each node selector is separated from its shape label by @:

:alice@:User, :alice@:Employee, :bob@:User
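A minimal Python sketch of reading this compact syntax, under simplifying assumptions of our own: commas only appear between associations (so literals containing commas are not supported), and each association is split on its last @, so that any earlier @ stays with the node selector. Duplicate pairs are dropped, since a shape map cannot contain the same node/shape pair twice.

```python
def parse_fixed_shape_map(text):
    """Split a compact fixed shape map into (nodeSelector, shapeLabel) pairs."""
    pairs = []
    for association in text.split(","):
        node, shape = association.strip().rsplit("@", 1)
        pair = (node.strip(), shape.strip())
        if pair not in pairs:          # no duplicate node/shape pairs
            pairs.append(pair)
    return pairs

print(parse_fixed_shape_map(":alice@:User, :alice@:Employee, :bob@:User"))
```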

4.9.2 Query Shape Maps

The query shape map extends the fixed shape map to enable simple pattern matching to select focus nodes from the data graph. This is done by permitting a node selector to be either an RDF node, as in a fixed shape map, or a triple pattern. A triple pattern uses the keyword FOCUS in subject or object position to represent the nodes that will be validated, and an RDF node or the wildcard _ (underscore) in the other position.

Example 86 Query shape map example

The shape map:

{ FOCUS schema:worksFor _ }@:User, { FOCUS rdf:type schema:Person}@:User, { _ schema:worksFor FOCUS }@:Company

associates all subjects of property schema:worksFor and all nodes of type schema:Person with :User, and all objects of property schema:worksFor with shape :Company.

Any node in the data graph which is both of type schema:Person and the subject of a schema:worksFor triple would be selected by both triple patterns and associated with :User in the fixed map. Such duplicates are eliminated in accordance with the rule that a shape map can have no duplicate pairs of node selector and shape label.
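The resolution step (Figure 4.10) can be sketched in a few lines of Python. The representation is our own assumption: triples are (subject, predicate, object) string tuples, a triple pattern is a tuple containing "FOCUS" and possibly "_", its predicate is always a fixed IRI, and any other node selector is a plain RDF node. Duplicate pairs are dropped following the rule just stated.

```python
FOCUS, ANY = "FOCUS", "_"

def focus_nodes(pattern, triples):
    """Yield the nodes selected by one triple pattern."""
    s, p, o = pattern
    for ts, tp, to in triples:
        if tp != p:
            continue
        if s == FOCUS and o in (ANY, to):
            yield ts
        elif o == FOCUS and s in (ANY, ts):
            yield to

def resolve(query_map, triples):
    """Turn a query shape map into a fixed shape map (a list of node/shape pairs)."""
    fixed = []
    for selector, shape in query_map:
        nodes = focus_nodes(selector, triples) if isinstance(selector, tuple) else [selector]
        for node in nodes:
            if (node, shape) not in fixed:   # a shape map has no duplicate pairs
                fixed.append((node, shape))
    return fixed

triples = [(":dan", "rdf:type", "schema:Person"),
           (":dan", "schema:worksFor", ":acme"),
           (":eve", "rdf:type", "schema:Person")]
query_map = [((FOCUS, "schema:worksFor", ANY), ":User"),
             ((FOCUS, "rdf:type", "schema:Person"), ":User"),
             ((ANY, "schema:worksFor", FOCUS), ":Company")]
print(resolve(query_map, triples))
```

In this hypothetical data, :dan is selected by both :User patterns but appears only once in the resulting fixed map.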

Figure 4.10: Shape map resolution which accepts a query shape map and emits a fixed shape map.

While the node selector may be a triple pattern, it may also be an RDF node, as in a fixed shape map. Common idioms of query shape maps include the following.

Explicitly bind nodes to shapes. This effectively adds one nodeSelector/shapeLabel pair to the shape map. This mechanism is employed in SHACL with the declaration sh:targetNode (see Section 5.7).
Declare that all nodes with some property must match a given shape. This mechanism is also defined in SHACL with the declarations sh:targetSubjectsOf and sh:targetObjectsOf.
Select nodes with a given property and value. This refinement of the previous approach is especially useful for general-purpose predicates like rdf:type. In fact, the SHACL directive sh:targetClass offers a similar selection mechanism for the rdf:type predicate (the difference is that SHACL uses the notion of SHACL instance; see Section 5.7.2). As with the above selectors, this one is very use-case specific: one may not want to say that everything with an rdf:type property should be validated against a :Person, but it may be reasonable to select everything with type :Employee.

While it is not currently part of the shape map specification, the Wikidata use of shape maps extends the nodeSelector to contain a SPARQL query, enabling another common use case.

Select nodes or node/shape pairs by SPARQL query or inference. Whereas the earlier mechanisms are all limited to either a direct identification of an RDF node or its selection by triple pattern, this one enables more nuanced heuristics in the selection of focus nodes.

Query shape maps are not the only way to select focus nodes. For instance, it would make sense to associate a shape with a service endpoint. The Linked Data Platform [93] defines a notion of container which handles requests to get, create, modify and delete objects with a given structure. While it does not specify a mechanism to publish that structure or validate incoming data against it, earlier work at OSLC used Resource Shapes for that purpose. It is reasonable to assume that protocols like the Linked Data Platform will exploit shapes technology, perhaps with the added precision of using HTTP Link headers to identify a node of interest, which would be associated with the shape related to that interface.

4.9.3 Result Shape Maps

The product of validation is a result shape map which is annotated with errors encountered while testing the conformance of each node/shape pair. The result shape map is again an extension of the fixed map. Each nodeSelector/shapeLabel association in the result shape map may include any of these three additional components:

result: either conformant or nonconformant;
reason: a human-readable report, usually to explain a non-conformant result; or
appInfo: a machine readable structure.

Engines vary in how they report errors, and they may add extra information to the resulting shape map. Some implementations extend this to include machine-readable failure messages in case of errors or recursive proof of conformance in case of success.

Example 87 Full validation process

Given the following ShEx schema:

:User { schema:name xsd:string ; schema:knows @:User* }

and the RDF data:

:alice schema:name "Alice" ;  schema:knows :carol .
:bob   schema:name "Robert" ; schema:knows :carol .
:carol schema:name "Carol" .

If we have the query shape map:

{FOCUS schema:knows _ }@:User

A shape map resolver would generate the fixed shape map:

:alice@:User, :bob@:User

After applying the validation process, the result shape map obtained would be:

:alice@:User, :bob@:User, :carol@:User

Figure 4.11 depicts the whole validation process with the different shape maps involved.

Figure 4.11: Full validation process with query, fixed, and result shape map.

4.9.4 JSON Representation

The fixed shape map from Example 87 can be represented as:²

Example 88 JSON representation of shape maps

[
  { "node": ":alice", "shape": ":User" },
  { "node": ":bob", "shape": ":User" }
]

The output shape map would be:

[
  { "node": ":alice", "shape": ":User", "status": "conformant" },
  { "node": ":bob", "shape": ":User", "status": "conformant" },
  { "node": ":carol", "shape": ":User", "status": "conformant" }
]

4.9.5 Chaining Validation Workflows

Because the input and output of the validation process are both shape maps, long-running workflows can use the result shape map as a starting state for further validation. This is useful when shapes have inter-dependencies, i.e., when validating one node/shape pair requires validating others. Let’s look at a simplified example.

Example 89 ShEx validator and shape maps

Given the following schema:

:User { schema:name xsd:string ; schema:knows @:User* }

and RDF graph

:alice schema:name "Alice" ; schema:knows :bob .
:bob   schema:name "Robert" .

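If we were to validate :alice and :bob individually, we would validate :bob twice: once while validating :alice’s schema:knows arc and once for the explicit call to validate :bob. Using the result shape map of the first validation as the starting state of the second one avoids this duplicated work.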
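The reuse of earlier results can be sketched in Python by passing a result map between calls. Everything here is a hypothetical illustration: the graph is a dictionary, the result shape map is a dictionary from (node, shape) to a Boolean, and the provisional "assume conformant while checking" step is only an approximation of ShEx's recursion semantics that happens to suffice for this acyclic example.

```python
# Hypothetical data for Example 89: node -> predicate -> list of values
GRAPH = {
    ":alice": {"schema:name": ["Alice"], "schema:knows": [":bob"]},
    ":bob": {"schema:name": ["Robert"]},
}

def validate_user(node, results, work_log):
    """Check node against :User { schema:name xsd:string ; schema:knows @:User* }.

    `results` plays the role of a result shape map: (node, shape) -> bool.
    Pairs already present in it are reused instead of being validated again."""
    key = (node, ":User")
    if key in results:
        return results[key]
    work_log.append(node)        # record that this node was actually checked
    results[key] = True          # provisional answer for recursive references
    arcs = GRAPH.get(node, {})
    names = arcs.get("schema:name", [])
    ok = len(names) == 1 and isinstance(names[0], str)
    ok = ok and all(validate_user(friend, results, work_log)
                    for friend in arcs.get("schema:knows", []))
    results[key] = ok
    return ok

results, work_log = {}, []
validate_user(":alice", results, work_log)   # validates :bob along the way
validate_user(":bob", results, work_log)     # answered from the result map
print(work_log)
```

Because the second call finds (:bob, :User) in the result map, :bob is checked only once and the printed log lists each node a single time.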

4.10 Semantic Actions

Semantic actions³ serve as an extension point for Shape Expressions. They can be used to signal a failure or perform some operations during the validation process.

A semantic action contains a label that indicates the language in which the action is written and a string with its contents. When the ShEx validator finds a semantic action, it checks whether it has a processor for that language and calls it with the action contents. The result of the processor is cast to a Boolean value; if the result is false, the corresponding shape fails.

Example 90 Semantic actions

The following example uses a hypothetical JavaScript semantic actions processor to capture the start and end dates of an event and to check that the start date is before the end date.

prefix js: <http://shex.io/extensions/javascript>

:Event {
 schema:startDate xsd:dateTime  %js:{ let start = o %} ;
 schema:endDate   xsd:dateTime  %js:{ let end = o ; start < end %} ;
}

The following example checks that the declared area of a rectangle is effectively its width times height.

prefix js: <http://shex.io/extensions/javascript>

:Rectangle {
 :height xsd:float  %js:{ let height = o %} ;
 :width  xsd:float  %js:{ let width = o %} ;
 :area xsd:float    %js:{ o == height * width %}
}
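The control flow of such a processor can be emulated in Python. This is a sketch under our own assumptions, not the behavior of any real ShEx extension: actions are Python callables rather than JavaScript strings, a shared dictionary stands in for the variables the snippets share, and the :height and :width arcs are visited before :area, an ordering the rectangle example implicitly relies on as well. Note that the area action compares rather than assigns; an assignment would return the product itself, which is truthy for any non-zero area.

```python
import math

def run_actions(arcs, actions):
    """Run the semantic action attached to each predicate, in arc order.

    arcs: list of (predicate, value) pairs, in the order the validator matches them.
    actions: predicate -> callable(o, env); a falsy return value fails the shape."""
    env = {}                          # variables shared between actions
    for predicate, o in arcs:
        action = actions.get(predicate)
        if action is not None and not action(o, env):
            return False
    return True

def remember(name):
    """Build an action that stores the current value under `name` and succeeds."""
    def action(o, env):
        env[name] = o
        return True
    return action

RECTANGLE_ACTIONS = {
    ":height": remember("height"),
    ":width": remember("width"),
    # compare, don't assign; isclose tolerates floating-point rounding
    ":area": lambda o, env: math.isclose(o, env["height"] * env["width"]),
}

print(run_actions([(":height", 3.0), (":width", 4.0), (":area", 12.0)], RECTANGLE_ACTIONS))
```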

Semantic actions have been employed to transform RDF data into other formats like XML or JSON [80], or even into data conforming to other ShEx schemas, as done by the Map extension.⁴

The test suite defines a single extension language called Test⁵ that can fail a validation and/or return a message.

4.11 ShEx and Inference

ShEx was designed as an RDF validation language which is independent of reasoners or inference systems. A ShEx processor takes as input an RDF graph and checks if its nodes conform to the shapes defined in a ShEx schema. The shapes describe the topology of the RDF graph taking into account the possible values of nodes as well as the incoming and outgoing arcs. In ShEx, a triple whose predicate is rdf:type is treated as any other triple, and in fact there is no special treatment for nodes that are also RDF classes. ShEx separates RDF classes and types following the guidelines described in Section 3.2.

This independence between ShEx and reasoners makes it possible to apply a ShEx processor to a plain RDF graph before inference, to validate the resulting graph after applying a reasoner, or even to validate the intermediate graphs during the reasoning phase in order to check the reasoner’s behavior.

Example 91 Validating data before and after inference

The following shapes can be used to check an RDF graph before and after RDF Schema inference. Shape :TeacherBefore describes that nodes may have rdf:type :Teacher, must have a property schema:name with an xsd:string value, and may have zero or more properties :teaches whose values must conform to :Course.

Shape :TeacherAfter describes the shape that teachers must have after inference. For example, they must have rdf:type :Teacher and :Person, and the values of property :teaches must have rdf:type :Course.

:TeacherBefore EXTRA a {
  a [ :Teacher ] ? ;
  schema:name xsd:string ;
  :teaches @:Course *
}
:TeacherAfter EXTRA a {
  a [ :Teacher ] ;
  a [ :Person ] ;
  schema:name xsd:string ;
  :teaches { a [ :Course ] } *
}
:Course { a [ :Course ] ? }

If we validate the following RDF data before applying inference, nodes :bob and :carol do not conform to shape :TeacherAfter:

:alice a :Teacher, :Person ;      #Passes as a :TeacherBefore
       schema:name "Alice" ;      #Passes as a :TeacherAfter
       :teaches :algebra .
:bob   schema:name "Robert" ;     #Passes as a :TeacherBefore
       :teaches :logic .          #Fails as a :TeacherAfter
:carol a :Teacher ;               #Passes as a :TeacherBefore
       schema:name "Carol" .      #Fails as a :TeacherAfter
:algebra a :Course .
:teaches rdfs:domain :Teacher .
:teaches rdfs:range :Course .
:Teacher rdfs:subClassOf :Person .

On the other hand, if we validate the previous RDF graph after applying RDF Schema inference, both :bob and :carol should conform to :TeacherAfter.

This combination of shapes before and after inference can be used to check the behavior of a reasoner. For example, if in the previous case, a faulty RDFS reasoner does not infer that :logic must have rdf:type :Course, :bob would not conform to :TeacherAfter and the bug could be detected.
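The before/after check can be sketched in Python with a toy reasoner. This is an illustration only: it implements just the three RDFS typing rules the example needs (rdfs:domain, rdfs:range and rdfs:subClassOf applied to rdf:type), checks only the typing requirements of :TeacherAfter, and the use_range flag is a hypothetical switch that simulates the faulty reasoner described above.

```python
def rdfs_closure(triples, use_range=True):
    """Apply the domain, range and subClassOf typing rules until nothing changes.
    use_range=False simulates a faulty reasoner that skips the range rule."""
    graph = set(triples)
    while True:
        domains = {(s, o) for s, p, o in graph if p == "rdfs:domain"}
        ranges = {(s, o) for s, p, o in graph if p == "rdfs:range"}
        supers = {(s, o) for s, p, o in graph if p == "rdfs:subClassOf"}
        new = set()
        for s, p, o in graph:
            new |= {(s, "rdf:type", c) for prop, c in domains if prop == p}
            if use_range:
                new |= {(o, "rdf:type", c) for prop, c in ranges if prop == p}
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in supers if sub == o}
        if new <= graph:
            return graph
        graph |= new

def types(graph, node):
    return {o for s, p, o in graph if s == node and p == "rdf:type"}

def teacher_after(graph, node):
    # Only the rdf:type requirements of :TeacherAfter are checked here
    taught = [o for s, p, o in graph if s == node and p == ":teaches"]
    return ({":Teacher", ":Person"} <= types(graph, node)
            and all(":Course" in types(graph, course) for course in taught))

DATA = {
    (":alice", "rdf:type", ":Teacher"), (":alice", "rdf:type", ":Person"),
    (":alice", "schema:name", "Alice"), (":alice", ":teaches", ":algebra"),
    (":bob", "schema:name", "Robert"), (":bob", ":teaches", ":logic"),
    (":carol", "rdf:type", ":Teacher"), (":carol", "schema:name", "Carol"),
    (":algebra", "rdf:type", ":Course"),
    (":teaches", "rdfs:domain", ":Teacher"), (":teaches", "rdfs:range", ":Course"),
    (":Teacher", "rdfs:subClassOf", ":Person"),
}
for graph in (DATA, rdfs_closure(DATA), rdfs_closure(DATA, use_range=False)):
    print([teacher_after(graph, n) for n in (":alice", ":bob", ":carol")])
```

The three printed rows correspond to the original data, a correct closure, and the faulty closure: only in the last one does :bob fail again, because :logic never receives the type :Course.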

4.12 Importing schemas

ShEx has an import keyword that specifies the IRI of another schema to be imported. The ShEx processor puts the labeled shapes and triple expressions of the imported schema in scope for resolving references in the importing document. If the imported schema imports other schemas, they are also imported.

Example 92 Import example

For example, suppose there is a schema located at http://example.org/Person.shex with the following content:

:Person {
  $:name ( schema:name . | schema:givenName . ; schema:familyName . ) ;
  schema:email .
}

and we define a new schema as:

import <http://example.org/Person.shex>

:Employee {
  &:name ;
  schema:worksFor @:Company
}
:Company {
  schema:employee @:Employee ;
  schema:founder @:Person ;
}

:alice      schema:name "Alice" ;          #Passes as a :Employee
            schema:worksFor :OurCompany .
:OurCompany schema:employee :alice ;
            schema:founder :bob .
:bob        schema:name "Robert" ;
            schema:email <mailto:bob@example.com> .

The ShEx processor imports each schema exactly once, so cyclic imports are allowed. For instance, a schema may import itself or it may import some schema which directly or indirectly imports it.

However, it is an error to import a schema which attempts to re-define a shape expression or triple expression. For instance, if http://example.org/Person.shex defined either :Employee or :Company, or if the importing schema defined :name, the import would fail and processing would stop.
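Both rules, importing each schema once and rejecting redefinitions, can be sketched as a small recursive loader in Python. The document map, the definition strings and the IRI http://example.org/Employee.shex for the importing schema are hypothetical; reference resolution and parsing are not modeled.

```python
def load_schema(iri, documents, loaded=None, labels=None):
    """Collect the labeled declarations of `iri` and everything it imports.

    documents: IRI -> (list of imported IRIs, {label: definition}).
    Each document is processed once, so cyclic imports terminate; defining the
    same label twice raises an error."""
    if loaded is None:
        loaded, labels = set(), {}
    if iri in loaded:
        return labels
    loaded.add(iri)
    imports, declarations = documents[iri]
    for label, definition in declarations.items():
        if label in labels:
            raise ValueError(f"{label} is redefined in {iri}")
        labels[label] = definition
    for imported in imports:
        load_schema(imported, documents, loaded, labels)
    return labels

PERSON = "http://example.org/Person.shex"
MAIN = "http://example.org/Employee.shex"   # hypothetical IRI of the importing schema
documents = {
    MAIN: ([PERSON], {":Employee": "{ &:name ; schema:worksFor @:Company }",
                      ":Company": "{ schema:employee @:Employee ; schema:founder @:Person }"}),
    # a cyclic import back to MAIN, added only to show that it terminates
    PERSON: ([MAIN], {":Person": "{ $:name (...) ; schema:email . }", ":name": "(...)"}),
}
print(sorted(load_schema(MAIN, documents)))
```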

4.13 RDF and JSON-LD Syntax

The ShEx language is defined in terms of a JSON-LD syntax, called “ShExJ”, which separates the compact syntax details from the language specification. This serves as an abstract syntax in that it has constructs to capture all of the logic of ShEx. Having an abstract syntax provides a clear definition of the language, makes it easier to write language processors and encourages the definition of other concrete syntax formats. The fact that it is JSON-LD means that the RDF representation of ShEx, called “ShExR”, is simply the JSON-LD interpretation of ShExJ.

Example 93

The following ShEx schema

PREFIX :       <http://example.org/>
PREFIX schema: <http://schema.org/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>

:User IRI {
  schema:name xsd:string ;
  schema:knows @:User *
}

can be represented in ShExR as⁶:

PREFIX sx:     <http://shex.io/ns/shex#>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
PREFIX :       <http://example.org/>
PREFIX schema: <http://schema.org/>

<> a sx:Schema ;
   sx:shapes :User .

:User a sx:ShapeAnd ;
  sx:shapeExprs (
    [ a sx:NodeConstraint ; sx:nodeKind sx:iri ]
    [ a sx:Shape ;
      sx:expression [
        a sx:EachOf ;
        sx:expressions (
          [ a sx:TripleConstraint ;
            sx:predicate schema:name ;
            sx:valueExpr [ a sx:NodeConstraint ; sx:datatype xsd:string ] ]
          [ a sx:TripleConstraint ;
            sx:predicate schema:knows ;
            sx:valueExpr :User ;
            sx:min 0 ;
            sx:max -1 ]
        )
      ]
    ]
  ) .

It can also be represented in JSON-LD as:

{
  "@context": "https://shexspec.github.io/context.jsonld",
  "type": "Schema",
  "shapes": [
    {
      "type": "ShapeAnd",
      "shapeExprs": [
        { "type": "NodeConstraint", "nodeKind": "iri" },
        {
          "type": "Shape",
          "expression": {
            "type": "EachOf",
            "expressions": [
              {
                "type": "TripleConstraint",
                "predicate": "http://schema.org/name",
                "valueExpr": {
                  "type": "NodeConstraint",
                  "datatype": "http://www.w3.org/2001/XMLSchema#string"
                }
              },
              {
                "type": "TripleConstraint",
                "predicate": "http://schema.org/knows",
                "valueExpr": "http://example.org/User",
                "min": 0,
                "max": -1
              }
            ]
          }
        }
      ],
      "id": "http://example.org/User"
    }
  ]
}
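Because ShExJ is plain JSON, a language processor can start with nothing more than a JSON parser. The Python sketch below walks the ShExJ tree of this schema and lists its triple constraints; the @context is omitted because plain JSON processing ignores it, and absent min/max values are treated as 1, which we understand to be the ShExJ default cardinality.

```python
import json

SHEXJ = """
{ "type": "Schema",
  "shapes": [ { "type": "ShapeAnd", "id": "http://example.org/User",
    "shapeExprs": [
      { "type": "NodeConstraint", "nodeKind": "iri" },
      { "type": "Shape", "expression": { "type": "EachOf", "expressions": [
          { "type": "TripleConstraint", "predicate": "http://schema.org/name",
            "valueExpr": { "type": "NodeConstraint",
                           "datatype": "http://www.w3.org/2001/XMLSchema#string" } },
          { "type": "TripleConstraint", "predicate": "http://schema.org/knows",
            "valueExpr": "http://example.org/User", "min": 0, "max": -1 } ] } } ] } ] }
"""

def triple_constraints(node):
    """Depth-first walk over the ShExJ tree, yielding every TripleConstraint object."""
    if isinstance(node, dict):
        if node.get("type") == "TripleConstraint":
            yield node
        for value in node.values():
            yield from triple_constraints(value)
    elif isinstance(node, list):
        for item in node:
            yield from triple_constraints(item)

schema = json.loads(SHEXJ)
for tc in triple_constraints(schema):
    upper = "unbounded" if tc.get("max") == -1 else tc.get("max", 1)
    print(tc["predicate"], tc.get("min", 1), upper)
```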

4.14 Summary

In this chapter we learned about the ShEx language.

ShEx was designed as a human-readable language for RDF description and validation.
ShEx can be considered as a grammar for RDF.
ShEx has three serialization formats: a compact syntax (ShExC), a JSON-LD syntax (ShExJ), and an RDF representation (ShExR).
ShEx defines the notion of shape expressions and node constraints.
Shape expressions can be combined using the logical operators AND, OR, and NOT on top of triple expressions.
Triple expressions declare the topology of the neighborhood of a node (incoming and outgoing edges).
Node constraints declare constraints on the form of a single node.
Semantic actions offer an extension mechanism over ShEx.

4.15 Suggested Reading

We collected the following selection of references about Shape Expressions.

Short introduction to ShEx: Baker and Prud’hommeaux, 2017 [7]
ShEx 2.0 language specification: Prud’hommeaux, Boneva, Labra Gayo, and Kellogg, 2017 [81]
Description of the first version of ShEx: Prud’hommeaux, Labra Gayo, and Solbrig, 2014 [80]
An algorithm to implement Shape Expressions based on derivatives: Labra Gayo, Prud’hommeaux, Boneva, Staworko, Solbrig, and Hym, 2016 [40]
Theoretical foundations of ShEx: Staworko, Boneva, Labra Gayo, Hym, Prud’hommeaux, and Solbrig, 2015 [94] http://labra.github.io/pdf/2015_ComplexityExpressivenessShEx.pdf
Well-founded semantics of shape schemas (which are the basis of ShEx): Boneva, Labra Gayo, and Prud’hommeaux, 2017 [11] https://labra.github.io/pdf/2017_SemanticsValidationShapesSchemas.pdf

1: We will see that the pipe operator can also be used to form triple expressions in Section 4.6.4.
2: At the time of this writing, the shape map specification requires full IRIs, but we use prefixed IRIs for simplicity.
3: The name semantic actions is inspired by parser generators. It is not related to the semantic web.
4: See: http://shex.io/extensions/Map/
5: See: http://shexspec.github.io/extensions/Test/
6: Note that a value of -1 in max means unbounded.

Name	Description	Examples
Anything	The value can be anything	`.`
Datatype	The value must be an element of that datatype	`xsd:string` `xsd:date` `cdt:distance` …
Node kind	The value must have that kind	`IRI` `BNode` `Literal` `NonLiteral`
Value set	The value must be an element of that set	`[:Male :Female]`
Shape reference	The value must conform to `:User`	`@:User`

Value	Description	Examples
`Literal`	Any RDF literal	`"Alice"` `"Spain"@en` `42` `true`
`IRI`	Any RDF IRI	`<http://example.org/Alice>` `ex:alice` `:bob`
`BNode`	Any blank node	`_:x` `[]`
`NonLiteral`	Any IRI or blank node	`<http://example.org/alice>` `_:x`

Facet and argument	Passing values	Failing values
`MinInclusive` `1`	`"1"^^xsd:decimal`, `1`, `2`, `98`, `99`, `100`	`"1"^^xsd:string`, `-1`, `0`
`MinExclusive` `1`	`2`, `98`, `99`, `100`	`-1`, `0`, `1`
`MaxInclusive` `99`	`1`, `2`, `98`, `99`	`100`
`MaxExclusive` `99`	`1`, `2`, `98`	`99`, `100`
`TotalDigits` `3`	`"1"^^xsd:integer`, `9`, `999`, `0999`, `9.99`, `99.9`, `0.1020`	`"1"^^xsd:string`, `1000`, `01000`, `1.1020`, `.1021`, `0.1021`
`FractionDigits` `3`	`"1"^^xsd:decimal`, `0.1`, `0.1020`, `1.1020`	`"1"^^xsd:integer`, `0.1021`, `0.10212`
`Length` `3`	`"123"^^xsd:string`, `"123"^^xsd:integer`, `"abc"`	`"12"^^xsd:string`, `"12"^^xsd:integer`, `"ab"`, `"abcd"`
`MinLength` `3`	`"abc"`, `"abcd"`	`""`, `"ab"`
`MaxLength` `3`	`""`, `"ab"`, `"abc"`	`"abcd"`, `"abcde"`
`/^ab+/` Regex pattern	`"ab"`, `"abb"`, `"abbcd"`	`""`, `"a"`, `"acd"`, `"cab"`, `"AB"`, `"ABB"`, `"ABBCD"`
`/^ab+/i` Regex pattern with `i` flag	`"ab"`, `"abb"`, `"abbcd"`, `"AB"`, `"ABB"`, `"ABBCD"`	`""`, `"a"`, `"acd"`
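The two regex rows of the table above can be reproduced with a short Python check. Python's re module is used here only as a stand-in for the regular expression dialect that ShEx actually specifies; for a pattern as simple as ^ab+ the two agree, and re.IGNORECASE plays the role of the i flag.

```python
import re

PATTERN = r"^ab+"
cases = {
    "no flag": (0, ["ab", "abb", "abbcd"],
                ["", "a", "acd", "cab", "AB", "ABB", "ABBCD"]),
    "i flag": (re.IGNORECASE, ["ab", "abb", "abbcd", "AB", "ABB", "ABBCD"],
               ["", "a", "acd"]),
}
for name, (flags, passing, failing) in cases.items():
    assert all(re.search(PATTERN, s, flags) for s in passing), name
    assert not any(re.search(PATTERN, s, flags) for s in failing), name
print("both regex rows reproduced")
```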

Regular Expression	Some values that match	Some values that don’t match
`P\d{2,3}`	`P12` `P234`	`A1` `P2n` `P1` `P2233`
`(pa)*b`	`b` `pab` `papab` `papapab` …	`pa` `po`
`[a-z]{2,3}`	`ab` `abc`	`a` `abcd` `x45` `23`

Value	Description
`*`	0 or more
`+`	1 or more
`?`	0 or 1
`{m}`	Exactly m repetitions
`{m,n}`	Between m and n repetitions
`{m,}`	m or more repetitions

Operation	Description
`AND`	`S1` `AND` `S2` is satisfied if and only if both are satisfied
`OR`	`S1` `OR` `S2` is satisfied if and only if `S1` or `S2` (or both) are satisfied
`NOT`	`NOT` `S` is satisfied if and only if `S` is not satisfied