RDF for "Little Languages"

Query, Transformation and Report Generation

© Graham Klyne, 25-Apr-2002, revised 5-Jun-2002, 21-Oct-2002, 1-Dec-2002.

This note describes an experimental software development in which RDF/N3 is used to code query and report generation functions performed on RDF data.

The Python source code for the software described here, and some sample data, are published at [11].

1. Introduction

In his Programming Pearls column in the August 1986 issue of Communications of the ACM [1], Jon Bentley introduced the idea of "little languages". Briefly, these little languages are structured formats used to represent inputs to a computer program. Typically, a little language is expressed using text that a user can type in via a keyboard. This note describes an experiment to use RDF [2] and Notation3 [3] as a medium for expressing little languages.

The experiment is based on a simple application: to use RDF-based descriptions of message header fields to create registry entries and documentation. This application has been developed in conjunction with a proposal to define an IETF registry for message header fields [4]. The RDF style of metadata seems to be a natural form for the raw header field information from which the registry entries are generated.

The application itself is very simple: it involves reading header field information from one or more sources, and generating output in a number of formats:

The header field information is coded initially in Notation3 [3], which can, if desired, be converted to RDF/XML using Tim Berners-Lee's cwm program [6]. The "report generator" application to generate the formats noted above contains the following parts:

The last three of these use little languages coded in N3 to define their application-specific logic.

2. Format and schema for header field information

Here is an example header field, coded in Notation 3:

@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf:     <http://xmlns.com/foaf/0.1/> .
@prefix hdr:      <http://id.ninebynine.org/wip/2002/IETF/MsgHdr/> .
<http://id.ninebynine.org/wip/2002/IETF/RFC2822#Content-features>
   a hdr:HeaderField ;
   hdr:fieldName "Content-features" ;
   rdfs:label "Indicates content features of a MIME body part" ;
   hdr:protocol
    [ hdr:protocolName "mail" ;
      hdr:specification
       [ = <urn:ietf:rfc:2822> ;
         hdr:document <http://www.rfc-editor.org/rfc/rfc2822.txt> ] ] ;
    hdr:status "standards-track" ;
    hdr:author
     [ foaf:name "Graham Klyne" ;
       foaf:mbox "GK-headers@ninebynine.org" ;
       foaf:organization "Clearswift Corporation" ;
       foaf:workplacePostal
        [ foaf:building "1310 Waterside" ;
          foaf:street1  "Arlington Business Park" ;
          foaf:street2  "Theale" ;
          foaf:city     "Reading" ;
          foaf:area     "Berks" ;
          foaf:postcode "RG7 4SA" ;
          foaf:country  "UK" ] ;
       foaf:workplaceTel "011 8903 8903" ;
       foaf:workplaceFax "011 8903 9000" ;
       foaf:workplaceHomepage <http://www.mimesweeper.com/> ] ;
    hdr:specification
     [ = <urn:ietf:rfc:2912> ;
       hdr:document <http://www.rfc-editor.org/rfc/rfc2912.txt> ;
       hdr:section  "3" ] ;
    rdfs:comment
"""The 'Content-features:' header can be used to annotate
a MIME body part with a media feature expression,
to indicate features of the body part content.
See also: RFC 2533, RFC 2506, RFC 2045.""" .

This data makes use of the following vocabularies, in addition to the standard RDF [2] and RDFS [12] identifiers:

2.1 Prefix: foaf

http://xmlns.com/foaf/0.1/

This vocabulary is defined for the RDFWeb project [13] by Dan Brickley and others. I have used some additional terms.

Class foaf:Person, additional properties:

foaf:organization
Name of organization with which person is affiliated (literal)
foaf:workplaceTel
Person's workplace telephone number (literal)
foaf:workplaceFax
Person's workplace fax number (literal)
foaf:workplacePostal
Person's workplace postal address (foaf:PostalAddress)

New class foaf:PostalAddress:

This class can be used for any postal address. (Here, it is used with property foaf:workplacePostal to indicate a person's workplace postal address.)

foaf:building
Building name (literal)
foaf:street1
1st street address (literal)
foaf:street2
2nd street address (literal)
foaf:city
City name (literal)
foaf:area
Area, region or state name (literal)
foaf:postcode
Postal code or zip code (literal)
foaf:country
Country (literal)

2.2 Prefix: hdr

http://id.ninebynine.org/wip/2002/IETF/MsgHdr/

This vocabulary is used to describe properties of a protocol message header field (as in RFC2822 or HTTP).

Class hdr:HeaderField:

This class describes a header field registry entry.

hdr:fieldName
Header field name (literal)
hdr:label
A one-line description of the header field's purpose (literal)
hdr:comment
Longer description of the header field, including citations of related specification documents (literal)
hdr:status
Header field status (literal)
hdr:author
Header field registry entry author/change controller (foaf:Person)
hdr:protocol
Header field applicable protocol (hdr:ApplicableProtocol)
hdr:specification
Header specification details (hdr:SpecificationDoc)

Class hdr:ApplicableProtocol:

Describes a protocol with which the header field is used.

hdr:protocolName
Short protocol name ("mail", "http", "news", etc.). This name may also be used as a subdirectory name for the detailed header field description.
hdr:specification
Protocol specification details (hdr:SpecificationDoc)

Class hdr:SpecificationDoc:

Locates a specification, by document and optional section.

hdr:document
Specification document (URI).
hdr:section
Section number or description for relevant specification (literal)

2.3 Prefix: rep

http://id.ninebynine.org/wip/2002/IETF/MsgHdr/

This prefix is used by the report generation "little language" code. The various properties and values used are grouped by the functions they are used with.

Class rep:QueryPattern:

This class describes a query pattern. An instance of this class is the head of a list whose members are blank nodes each with one or more of the following properties.

rep:uri
this property attaches to a URI match node, and indicates the URI label to be matched (URI).
rep:member
this special property matches any RDF container membership property of the form rdf:_n. It is used when scanning the content of RDF containers.
rep:element
this special property matches any subgraph sequence containing zero or more rdf:rest properties followed by rdf:first. Thus, it corresponds to a generic list element relation. It is used when scanning the content of RDF lists.
rep:lit
this property attaches to a literal match node, and indicates the literal to be matched (literal).
rep:var
this property attaches to a query variable node, and indicates the query variable name (literal).
rep:and
this property attaches to a branching node, and indicates a branch to be matched (list). Multiple instances of this property are used to indicate multiple branches from the current graph node that must all be matched.
rep:alt
this property attaches to a rep:and branching node, and indicates a branch to be matched if all of the rep:and branches cannot be matched (list). Only one rep:alt property can be applied to any given node, and it is examined only if the available rep:and branches are not all matched.
rep:opt
this property is similar to rep:and, except that it cannot be combined with rep:alt (list). Instead, it indicates that the node is an optional match; all or none of the indicated branches are matched against the graph. This is equivalent to a node with rep:and properties, and an empty rep:alt pattern.
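
As an illustration of the two special properties rep:member and rep:element described above, here is a minimal Python sketch. It is not taken from N3Report.py: the representation of the graph as a list of (subject, predicate, object) string tuples, and the function names members and elements, are assumptions made for this example only.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"
# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBER_RE = re.compile(re.escape(RDF) + r"_[1-9][0-9]*$")


def members(triples, container):
    """rep:member: objects linked to `container` by any rdf:_n property."""
    return [o for (s, p, o) in triples
            if s == container and MEMBER_RE.match(p)]


def elements(triples, head):
    """rep:element: objects reached by zero or more rdf:rest links
    followed by one rdf:first link, starting from `head`."""
    found, node, seen = [], head, set()
    while node != RDF_NIL and node not in seen:
        seen.add(node)  # guard against malformed, cyclic lists
        found += [o for (s, p, o) in triples if s == node and p == RDF_FIRST]
        rest = [o for (s, p, o) in triples if s == node and p == RDF_REST]
        if not rest:
            break
        node = rest[0]
    return found


# Hypothetical sample data: one two-element list and one two-member bag.
SAMPLE = [
    ("_:l1", RDF_FIRST, "a"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "b"), ("_:l2", RDF_REST, RDF_NIL),
    ("_:bag", RDF + "type", RDF + "Bag"),
    ("_:bag", RDF + "_1", "x"), ("_:bag", RDF + "_2", "y"),
]
```

With this sample data, members(SAMPLE, "_:bag") picks out only the two rdf:_n objects, and elements(SAMPLE, "_:l1") visits every cell of the list in order.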

Class rep:FormatTemplate:

This class describes a formatter template. An instance is a sequence of nodes, each of which describes some value to be appended to a result string. Each node is either a literal, whose label is copied directly to the result string, or some other formatting code. The complete set of values is:

(literal)
A string that is copied directly into the output string or file.
[rep:var "name"]
a blank node with this property indicates a variable to be copied. If the named variable is bound to a literal or URI node, then the node label is copied. If the variable is bound to a blank node, then a unique blank node identifier value is copied (mainly for diagnostic purposes: in practice, this is probably an error).
rep:nl
this node in a formatter template causes a newline to be appended to the result string.
rep:trimws
this node in a formatter template causes trailing whitespace to be trimmed from the last value formatted.
[rep:tab "pos"]
adds spaces to the result string to bring the next character to the indicated column position.
[rep:tabsp "pos"]
like rep:tab, except that at least one space is always added.
[rep:tabnl "pos"]
if the current character position is at or beyond the indicated position, a newline is generated; spaces are then appended to tab to the indicated position.
[rep:indent "offset"]
adjusts the left margin to the right (+ve) or to the left (-ve).
[rep:left "pos"]
sets the left margin to the indicated character position.
[rep:wrap "pos"]
indicates a right margin position at which text is wrapped. While a wrap margin is in effect, consecutive whitespace in the data for output may be replaced by a single space. The margin stays in effect for the current 'write' command, or until reset to zero.
[rep:if [defined "name", ... ] ;
 rep:do (template)]
selects (template) for output if all of the named query variables are defined.
[rep:if [defined "name", ... ] ;
 rep:do (template-1) ;
 rep:else (template-2)]
selects (template-1) for output if all of the named query variables are defined, otherwise selects (template-2).
[rep:ifany [defined "name", ... ] ;
 rep:do (template)]
selects (template) for output if any of the named query variables are defined.
[rep:ifany [defined "name", ... ] ;
 rep:do (template-1) ;
 rep:else (template-2)]
selects (template-1) for output if any of the named query variables are defined, otherwise selects (template-2).
(URI)
If the URI does not correspond to any of the special formatter codes noted above, the URI string is copied to the output string.

The rep:if and rep:ifany codes, used in conjunction with alternative and optional query pattern sections, can be used to define output formats that are sensitive to the data present in the model from which output is being generated. A similar effect can be obtained using the rep:Report structure described below, but that is more cumbersome to code and less efficient to generate.
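
The difference between the three tab codes listed above (rep:tab, rep:tabsp and rep:tabnl) can be sketched in Python as follows. This is only one reading of the descriptions, not the N3Report.py code: it assumes zero-based column positions, ignores the left margin, and takes the text of the current output line as its first argument.

```python
def tab(line, pos):
    """rep:tab: pad with spaces so the next character lands at column pos."""
    return line + " " * max(0, pos - len(line))


def tabsp(line, pos):
    """rep:tabsp: like tab, but always add at least one space."""
    return line + " " * max(1, pos - len(line))


def tabnl(line, pos):
    """rep:tabnl: if already at or beyond pos, start a new line first."""
    if len(line) >= pos:
        return line + "\n" + " " * pos
    return tab(line, pos)
```

For a line that has already passed the tab position, tab leaves it unchanged, tabsp appends a single space, and tabnl moves to a fresh line before tabbing.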

Class rep:Report:

This class describes a report generation control program, which consists of a sequence of blank nodes, each of which describes a command to be executed (see later for more detailed descriptions).

rep:cmd
this property identifies the command code
rep:open
command code to open a new output channel
rep:close
command code to close an output channel
rep:write
command code to write data to an output channel
rep:if
command code for conditional execution of a command sequence, based on a named query variable being defined and/or a query pattern match against the model.
rep:ifany
command code for conditional execution of a command sequence, based on any one of a list of query variables being defined, or a query pattern match against the model.
rep:for
command code for repeated execution of a command sequence, for each match of a query against the model.
rep:debug
command code for displaying diagnostic information as a report is being generated. The displayed message is specified in the same way as rep:write output, using a rep:data property.
rep:chan
this property indicates an output channel for open, close and write commands (literal)
rep:file
this property indicates a filename for the open command (rep:FormatTemplate)
rep:data
this property indicates data for the write or debug command (rep:FormatTemplate)
rep:defined
this property indicates a query variable name to be tested to see if it is defined (literal). This test may be used with if and ifany commands.
rep:pattern
this property indicates a query pattern for the if, ifany and for commands (rep:QueryPattern)
rep:member
this is a special property used in a query to match a container membership property relation between its subject and object. Used to scan the contents of RDF containers in the style of rdf:Bag, etc.
rep:element
this is a special property used in a query to match a direct or indirect list element relation between its subject and object. Used to scan the contents of the new RDF lists constructed using rdf:first, rdf:rest, etc.
rep:do
this property indicates a sequence of commands for the if and for commands (rep:Report)
rep:first
this property indicates a sequence of commands used by the for command before the first invocation of the rep:do sequence, only if the query pattern is matched (rep:Report)
rep:sep
this property indicates a sequence of commands used by the for command between each invocation of the rep:do sequence (rep:Report)
rep:last
this property indicates a sequence of commands used by the for command following the last invocation of the rep:do sequence, only if the query pattern is matched (rep:Report)
rep:else
this property indicates an alternative sequence of commands for the if and for commands, executed if there is no match of the query pattern (rep:Report)

3. Overview of registry generator application

The registry generation application is called N3GenMsgRegistry, and is written in Python.

To use the program to create registry data from N3-coded header field descriptions, first ensure that Python version 2.1 or later is installed on your computer. The program can then be run using a command line like this:

python N3GenMsgRegistry -i file1,file2,... -o dir

where:

-i
indicates a list of 1 or more files containing data in Notation 3 format, including details of the headers to be included in the registry files. Headers are recognized as resources that have type hdr:HeaderField.
-o
indicates a base directory for the output files. The header field summary and RFC 2629 XML document source are placed in this directory, and the detailed header field descriptions are placed in subdirectories mail, http, news, etc. according to the applicable protocol.
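
For readers who want to experiment with a similar command line, the two options could be handled along the following lines. This is a hedged sketch using the standard getopt module, not the option handling actually used in N3GenMsgRegistry.py; the function name parse_args and the default output directory "." are assumptions.

```python
import getopt


def parse_args(argv):
    """Return (input_files, output_dir) for arguments such as
    ['-i', 'a.n3,b.n3', '-o', 'out']."""
    opts, _rest = getopt.getopt(argv, "i:o:")
    inputs, outdir = [], "."   # "." is an assumed default, not from the note
    for flag, value in opts:
        if flag == "-i":
            # -i takes a comma-separated list of Notation 3 files
            inputs.extend(name for name in value.split(",") if name)
        elif flag == "-o":
            outdir = value
    return inputs, outdir
```

A real program would pass sys.argv[1:] to parse_args and report an error if no input files were given.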

The application consists of the following components:

N3GenMsgRegistry.py

This is the main program module, and contains the N3-coded "little language" source for the registry file generation application. This source could equally be read from a data file containing the N3 code.

N3Model.py (including N3Statement.py, N3Node.py and N3Exception.py)

This provides a basic API for creating and accessing an RDF/N3 data model. The N3Model interface bears some structural similarity to the Model interface provided by Jena [7], though they differ in many details and N3Model is very much less comprehensive. N3Model provides an abstracted interface to the RDF/N3 data which is independent of the particular input syntax used.

N3Parser.py

This is a Notation 3 parser, storing resulting data into an N3Model.

N3Report.py

This is a module that generates reports from N3Model data, with both the report definition and report data being obtained from the N3Model.

4. Model

As noted above, the model interface was influenced by, and bears some superficial resemblance to, the Jena Model interface.

The model is implemented in three main modules: N3Model.py, N3Statement.py and N3Node.py. These modules (and N3Exception) are the basis around which all of the other components are constructed.

In general, interface methods have been added as and when a requirement was encountered, rather than according to some master plan. Consequently, I like to think that the interface reflects what is practically useful for implementing certain classes of RDF application.

Some key features are:

Todo:

5. Notation 3 parser

The Notation 3 parser is implemented in module N3Parser.py.

Some key features are:

Todo:

6. Report generator

The "report generator" is implemented in module N3Report.py. This module allows data in an N3Model to be extracted and formatted in a variety of ways. This module makes extensive use of N3 to encode "little languages" that drive the report generation process:

The query language and formatter components communicate by means of variable bindings.

6.1 Query processor

The query language is derived from the following syntax:

Pattern   = "(" Path ")"

Path      = Subject Subpath

Subpath   = Predicate Object *( Subpath )
          | "(" Subpath *( "&" Subpath ) ")"

Subject   = Variable | URI

Predicate = Variable | URI

Object    = Variable | URI | Literal

Variable  = "?" Name

When encoded in Notation 3, a query looks something like this:

hrep:HdrProtoPattern :-
  ( [ rep:var "header" ]
    [ rep:and
      ( [ rep:uri rdf:type ] [ rep:uri hdr:HeaderField ] ),
      ( [ rep:uri hdr:fieldName ] [ rep:var "name" ] ),
      ( [ rep:uri rdfs:label ] [ rep:var "purpose" ] ),
      ( [ rep:uri hdr:protocol ] [ rep:var "p" ]
        [ rep:and
          ( [ rep:uri hdr:protocolName] [ rep:var "pname" ] ),
          ( [ rep:uri hdr:specification] [ rep:var "ps" ]
            [ rep:uri hdr:document ] [ rep:var "psdocument" ] )
      ] )
  ] ) .
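
To make the path-matching idea concrete, the following Python sketch evaluates a simplified form of the pattern above against a list of triples and yields variable bindings. It is an illustration under stated assumptions, not the N3Report.py implementation: the graph is a list of (subject, predicate, object) string tuples, pattern nodes are ("var", name), ("uri", value) or ("lit", value) tuples, only rep:and branching is handled (no rep:alt, rep:opt, rep:member or rep:element), and literals are not distinguished from URIs.

```python
def match_node(pat, value, binding):
    """Match one pattern node against a graph value.
    Returns an extended binding dict, or None if there is no match."""
    kind, label = pat
    if kind == "var":
        if label in binding:                 # existing bindings constrain
            return binding if binding[label] == value else None
        extended = dict(binding)
        extended[label] = value
        return extended
    return binding if label == value else None   # "uri" or "lit"


def match_subpath(triples, subject, subpath, binding):
    """Yield bindings for `subpath`, starting at graph node `subject`."""
    if not subpath:
        yield binding
        return
    head = subpath[0]
    if head[0] == "and":                     # every branch must match
        bindings = [binding]
        for branch in head[1]:
            bindings = [b2 for b in bindings
                        for b2 in match_subpath(triples, subject, branch, b)]
        yield from bindings
        return
    pred_pat, obj_pat = subpath[0], subpath[1]
    for (s, p, o) in triples:
        if s != subject:
            continue
        b1 = match_node(pred_pat, p, binding)
        b2 = match_node(obj_pat, o, b1) if b1 is not None else None
        if b2 is not None:                   # follow the path from the object
            yield from match_subpath(triples, o, subpath[2:], b2)


def query(triples, pattern, binding=None):
    """Yield one binding dict per match of `pattern` against `triples`."""
    subj_pat, subpath = pattern[0], pattern[1:]
    for s in sorted({s for (s, _p, _o) in triples}):
        b = match_node(subj_pat, s, binding or {})
        if b is not None:
            yield from match_subpath(triples, s, subpath, b)


# Hypothetical data and a cut-down version of hrep:HdrProtoPattern.
TRIPLES = [
    ("h1", "rdf:type", "hdr:HeaderField"),
    ("h1", "hdr:fieldName", "Content-features"),
    ("h1", "hdr:protocol", "_:p1"),
    ("_:p1", "hdr:protocolName", "mail"),
    ("x", "rdf:type", "foaf:Person"),
]
PATTERN = [
    ("var", "header"),
    ("and", [
        [("uri", "rdf:type"), ("uri", "hdr:HeaderField")],
        [("uri", "hdr:fieldName"), ("var", "name")],
        [("uri", "hdr:protocol"), ("var", "p"),
         ("uri", "hdr:protocolName"), ("var", "pname")],
    ]),
]
```

Calling list(query(TRIPLES, PATTERN)) gives a single binding for the one header field in the sample data. Passing an initial binding such as {"name": "Other"} constrains the query so that nothing matches.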

This differs from RDF query languages like RDQL [8] or SquishQL [9] in a number of respects:

Todo:

6.2 Formatter

The formatter language is a sequence of values for output, where each value may be one of:

template     = ( template *template )
               | simple-value

simple-value = "literal"                 # Copy literal to result
               [ rep:var "name" ]        # Copy bound node value to result
               rep:nl                    # Copy newline to result
               rep:trimws                # Trim whitespace from last item
               [ rep:tab "pos" ]         # Tab to position
               [ rep:tabsp "pos" ]       # Tab, with space if needed
               [ rep:tabnl "pos" ]       # Tab, with newline if needed
               [ rep:left "pos" ]        # Set left margin
               [ rep:wrap "pos" ]        # Set right margin word-wrap
               [ rep:indent "offset" ]   # Adjust left margin
               [ rep:defer template ]    # Deferred value for output
               [ rep:flush template ]    # Override deferred value
               [ rep:if    [rep:defined "name"], ... ;
                 rep:do    template ;    # Use this if all names are defined
                 rep:else  template ]    # .. otherwise this
               [ rep:ifany [rep:defined "name"], ... ;
                 rep:do    template ;    # Use this if any name is defined
                 rep:else  template ]    # .. otherwise this
               node                      # Copy node URI or label to result

(The node option is provided mainly for diagnostic purposes.)

When encoded in Notation 3, a formatter template corresponding to the query example in the previous section looks something like this:

hrep:HdrEntry1 :-
  ( "<h3>Header field: " [rep:var "name"] "</h3>" rep:nl
    "<p>" [rep:var "purpose"] "</p>" rep:nl
    "<dl>" rep:nl
    "<dt>Applicable protocol: </dt><dd>" [rep:var "pname"]
        " ("
        "<a href='" [rep:var "psdocument"] "'>"
        [rep:var "psdocument"] "</a>"
        ")</dd>" rep:nl
    "<dt>Status:</dt><dd>" [rep:var "status"] "</dd>" rep:nl ) .

The formatter template is basically very simple: it defines an output string that can be generated using substitutions from a supplied set of variable bindings. It also recognizes special options for output formatting (tabs, margins) and provides for optional and alternative templates to be used depending on what query variables are defined.
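
The substitution behaviour described here can be sketched in a few lines of Python. This is an illustrative reduction, not the N3Report.py formatter: the template is assumed to be a Python list in which strings are literals, NL stands in for rep:nl, ("var", name) stands in for [rep:var "name"], and ("if", names, do, else) and ("ifany", names, do, else) stand in for the conditional forms. Tabs, margins and wrapping are omitted, and rendering an unbound variable as an empty string is an assumption.

```python
NL = object()   # stands in for rep:nl


def render(template, bindings):
    """Build the result string for `template` using `bindings`."""
    out = []
    for item in template:
        if item is NL:
            out.append("\n")
        elif isinstance(item, str):          # literal: copied directly
            out.append(item)
        elif item[0] == "var":               # bound value (assumed "" if unbound)
            out.append(str(bindings.get(item[1], "")))
        elif item[0] in ("if", "ifany"):
            _kind, names, do_tpl, else_tpl = item
            test = all if item[0] == "if" else any
            chosen = do_tpl if test(n in bindings for n in names) else else_tpl
            out.append(render(chosen, bindings))
        else:
            raise ValueError("unrecognized template item: %r" % (item,))
    return "".join(out)


# Hypothetical template, loosely modelled on hrep:HdrEntry1.
TEMPLATE = [
    "<h3>Header field: ", ("var", "name"), "</h3>", NL,
    ("if", ["status"],
     ["<p>Status: ", ("var", "status"), "</p>", NL],
     ["<p>No status recorded</p>", NL]),
]
```

Rendering TEMPLATE with only "name" bound selects the else branch; adding a "status" binding selects the do branch instead.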

6.3 Control language

The control language drives the report generation process. It consists of sequences of the following basic commands:

open( channel, filename )

open an output channel to a named file. The supplied filename is a formatter template, as described in the previous section.

close( channel )

close an output channel.

write( channel, data )

write data to an output channel. The supplied data is a formatter template, as described in the previous section.

if ( pattern, do-sequence, else-sequence )

match pattern against the model and if matched execute do-sequence of commands with the resulting variable bindings, otherwise execute else-sequence.

for ( pattern, for-sequence, first-sequence, separator-sequence, last-sequence, else-sequence )

match pattern against the model and, for each match, execute for-sequence with the resulting variable bindings, executing separator-sequence between successive matches; if there are no matches, execute else-sequence instead. If present, separator-sequence is executed with the variable bindings that were present before pattern was matched. The first-sequence and last-sequence, if present, are executed only if pattern is matched at least once: before the first invocation of for-sequence and following the last such invocation, respectively.

do ( sequence )

execute the commands in sequence.

debug ( data )

display data as diagnostic output while a report is being generated.

When encoded in Notation 3, a partial report generator might look like this:

hrep:GenHeaders a rep:Report ; :-
  ( [ rep:cmd rep:open  ; rep:chan "t" ;
              rep:file (  [ rep:var "path" ] "/MessageHeaders.html" ) ]
    [ rep:cmd rep:write ; rep:chan "t" ; rep:data hrep:TableHead ]
    [ rep:cmd rep:for   ; rep:pattern hrep:HdrProtoPattern ;
      rep:do
        ( [ rep:cmd rep:write ; rep:chan "t" ; rep:data hrep:TableItem ]
          [ rep:cmd rep:if    ; rep:pattern hrep:HdrDetailPattern ;
            rep:do
              ( [ rep:cmd rep:open ; rep:chan "e" ;
                  rep:file ( [rep:var "path" ] "/"
                             [rep:var "pname"] "/"
                             [rep:var "name"] ".html" ) ]
                [ rep:do  hrep:GenEntry ]
                [ rep:cmd rep:close ; rep:chan "e" ] ) ;
            rep:else
              ( [ rep:cmd rep:write ; rep:chan "t" ;
                  rep:data hrep:NoDetail ] ) ] ) ;
      rep:else
        ( [ rep:cmd rep:write ; rep:chan "t" ; rep:data hrep:TableEmpty ] ) ]
    [ rep:cmd rep:write ; rep:chan "t" ; rep:data hrep:TableFoot ]
    [ rep:cmd rep:close ; rep:chan "t" ] ) .
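
The ordering rules for the for command (first, do, separator, last, else) can be summarized by the Python sketch below. It is an illustration rather than the N3Report.py code: command sequences are assumed to be plain Python callables taking a bindings dict, run_for is a hypothetical name, and passing the outer bindings to the first and last sequences is an assumption (the description above only states this for the separator sequence).

```python
def run_for(matches, do, first=None, sep=None, last=None,
            else_=None, outer=None):
    """Execute `do` once per match, with the optional sequences around it."""
    outer = outer or {}
    matches = list(matches)
    if not matches:                 # no match: only the else sequence runs
        if else_:
            else_(outer)
        return
    if first:
        first(outer)                # before the first `do`
    for i, binding in enumerate(matches):
        if i > 0 and sep:
            sep(outer)              # between invocations, with outer bindings
        merged = dict(outer)
        merged.update(binding)      # bindings from this match
        do(merged)
    if last:
        last(outer)                 # after the last `do`
```

For example, with two matches binding "name" to "To" and "From", a do that appends the name, a separator that appends ", ", and first and last sequences that append "[" and "]", the collected output is "[To, From]".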

7. Conclusions

7.1 "Little languages" in Notation 3

This note describes an application that uses three "little languages" coded in the Notation 3 syntax for RDF. Maybe the first question that springs to mind is "so what?"; this was, after all, a simple application that could have been written without any involvement of RDF. So what?

The advantages here may not be overwhelming, but I found the overall development effort to be commensurate with the application. The main disadvantage I found was that coding the "little language" in Notation 3 was not always the easiest way, though it was still considerably less effort than writing corresponding code from scratch. (The ease of coding the logic in N3 has been improved by some judicious changes to the query and template languages made since the initial implementation.) If I were planning to create a range of different applications based on the N3Report module, I would probably want to write a simple preprocessor to generate the N3 code from a more friendly text format.

What I did find attractive was the flexibility of having the different facets of application logic in a common format: the common format and use of variable bindings made it very easy to integrate the control language, query and formatter components. Using the same format for application logic that is used for the data brings certain advantages of familiarity and re-use of the data access code.

7.2 Query language form

This implementation uses a variation on the form of query language used with RDF in some other work in this area [8], [9]. The form used here is based on matching paths through the graph (sequences of nodes and properties), where other approaches have tended to focus on matching sets of triples. Implementations of the path-matching and triple-matching forms essentially reduce to a similar form of query (i.e. a sequence of triple patterns to be matched by the model). Even so, I am encouraged that my path-based query format, together with the use of existing variable bindings to further constrain a query, results in a pattern of queries against the model that seems quite efficient, in that it is very similar to what I would perform if coding the queries by hand.
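
The reduction of a path pattern to a sequence of triple patterns can be shown with a short Python sketch. It is illustrative only: the path is assumed to be a Python list of the form [subject, predicate, object, predicate, object, ...], in which an ("and", [branch, ...]) tuple marks branches from the current node, and variables are written as strings beginning with "?".

```python
def path_to_triples(path):
    """Flatten a path pattern into a list of (s, p, o) triple patterns."""
    return list(_flatten(path[0], path[1:]))


def _flatten(subject, subpath):
    i = 0
    while i < len(subpath):
        item = subpath[i]
        if isinstance(item, tuple) and item[0] == "and":
            # every branch starts again from the current subject node
            for branch in item[1]:
                yield from _flatten(subject, branch)
            i += 1
        else:
            pred, obj = subpath[i], subpath[i + 1]
            yield (subject, pred, obj)
            subject = obj            # the object becomes the next subject
            i += 2


# Hypothetical path, a cut-down form of the query example in section 6.1.
PATH = ["?header", ("and", [
    ["rdf:type", "hdr:HeaderField"],
    ["hdr:fieldName", "?name"],
    ["hdr:protocol", "?p", "hdr:protocolName", "?pname"],
])]
```

Flattening PATH yields four triple patterns, the last of which has "?p" as its subject. That is the chaining a triple-based query would have to state explicitly by repeating the variable.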

7.3 Lessons about the report generator

I approached the report generator design with the view of putting all complex logic in the "control language", and keeping the query and formatter templates simple and direct. This turned out to be cumbersome for coding reports, and I ended up building some limited ability to deal with optional and/or missing information into the query pattern and formatter template designs. This resulted in much easier coding and adaptation of reports.

7.4 Implications for "dark triples"

As I write this, the RDF core working group has been giving some consideration to "dark triples" (i.e. unasserted triples) in RDF models.

Using RDF graphs to encode my "little languages", I have no idea if the corresponding subgraphs are satisfiable in the RDF model theory [10] in a way that is consistent with other uses of the data. Practically, this doesn't matter for my applications: they work fine. But is there a problem lurking if the same RDF data is used by advanced reasoners that make full use of the RDF entailments sanctioned by the model theory?

This uncertainty would be banished if the RDF core language explicitly recognizes dark triples, and provides a syntactic or other mechanism to designate certain statements as "dark triples".

7.5 Performance

Report generator performance when interpreting the query and formatter languages directly from the RDF model store was very poor. Way too much time was spent simply fetching data from the model. More sophisticated model access methods may have helped, but there's still clearly a big overhead here for an interpreted language. A dramatic performance improvement (20-50 times) was obtained by prescanning the query pattern and formatter template from the model store into local data structures, and maintaining a cache of these. The resulting code was also very much cleaner, since it separated the RDF model access logic from the query and formatter execution logic.
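
The prescan-and-cache approach can be sketched as follows. This is not the code from N3Report.py: CountingModel is a hypothetical stand-in for a model store (it is not the N3Model API), PatternCache is an invented name, and the example only shows that a second request for the same template touches the model no further. It makes no claim about the size of the speed-up.

```python
class CountingModel:
    """Hypothetical model store that counts how often it is consulted."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.lookups = 0

    def objects(self, subject, predicate):
        self.lookups += 1
        return [o for (s, p, o) in self.triples
                if s == subject and p == predicate]


class PatternCache:
    """Prescan list-structured templates into tuples, once per head node."""

    def __init__(self, model):
        self.model = model
        self._cache = {}

    def get(self, head):
        if head not in self._cache:
            self._cache[head] = tuple(self._scan(head))
        return self._cache[head]

    def _scan(self, node):
        # Walk the rdf:first / rdf:rest chain, copying items to a local list.
        items = []
        while node != "rdf:nil":
            items.extend(self.model.objects(node, "rdf:first"))
            rest = self.model.objects(node, "rdf:rest")
            if not rest:
                break
            node = rest[0]
        return items


# Hypothetical two-item template stored as an RDF list.
MODEL = CountingModel([
    ("_:t1", "rdf:first", "<h3>"), ("_:t1", "rdf:rest", "_:t2"),
    ("_:t2", "rdf:first", "</h3>"), ("_:t2", "rdf:rest", "rdf:nil"),
])
CACHE = PatternCache(MODEL)
```

After the first CACHE.get("_:t1"), later calls return the stored tuple directly, so the query and formatter logic can work on plain Python data without knowing how the model is accessed.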

7.6 Further work

There are, of course, myriad improvements that I would make to my report generator program and message header field registry generating application. But, for the most part, they would add little to the lessons so far noted.

Some more promising areas for future work seem to be:

7.7 Enhancements

Since this document was first written, two further applications have been developed based on the same codeset: one to generate reports from a document issue-tracking database, and one to generate reports from a protocol dependency matrix, both maintained in Notation3 format.

In the process of developing these applications, the following small enhancements have been added:

8. References

[1] Jon Bentley, Little languages, Communications of the ACM, 29(8):711--21, August 1986.

[2] Ora Lassila and Ralph R. Swick, Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation, February 1999, http://www.w3.org/TR/REC-rdf-syntax.

[3] Tim Berners-Lee, Notation 3: Ideas about Web Architecture - yet another notation, http://www.w3.org/DesignIssues/Notation3.html.

[4] G. Klyne, M. Nottingham and J. Mogul, Registration procedures for message header fields, IETF Internet-draft (work-in-progress), Mar 2002, http://www.ninebynine.org/IETF/Messaging/draft-klyne-msghdr-registry-04.html.

[5] M. Rose, RFC 2629: Writing I-Ds and RFCs using XML, June 1999, http://www.ietf.org/rfc/rfc2629.txt.

[6] Tim Berners-Lee, cwm (Closed World Machine) software, http://www.w3.org/2000/10/swap/doc/cwm.html

[7] Brian McBride, et al, Jena API and supporting software, http://www.hpl.hp.com/semweb/jena-top.html, (formerly http://www.hpl.hp.co.uk/people/bwm/rdf/jena/)

[8] RDQL, http://www.hpl.hp.com/semweb/rdql.html

[9] Libby Miller, SquishQL, http://swordfish.rdfweb.org/rdfquery/

[10] Patrick Hayes, RDF Model Theory, (work in progress) January 2002, http://www.w3.org/TR/rdf-mt/.

[11] Graham Klyne, RDF/N3 report generating software, http://www.ninebynine.org/Software/N3ReportGenerator.zip.

[12] Dan Brickley, R. V. Guha, Resource Description Framework (RDF) Schema Specification 1.0, W3C Candidate Recommendation, 27 March 2000, http://www.w3.org/TR/rdf-schema

[13] Dan Brickley, Libby Miller, et al, RDFWeb: an introduction to RDFWeb and FOAF, http://rdfweb.org/2000/08/why/


For feedback please see: http://www.ninebynine.org/index.html#Contact