Swish

Semantic Web Inference Scripting in Haskell


Introduction to Swish

Summary

Swish is a framework, written in the purely functional programming language Haskell, for performing deductions in RDF data using a variety of techniques. Swish is conceived as a toolkit for experimenting with RDF inference, and for implementing stand-alone RDF file processors (usable in a similar style to CWM, but with a view to being extensible in declarative style through added Haskell function and data value declarations). It explores Haskell as "a scripting language for the Semantic Web".

Swish is a work-in-progress, and currently incorporates:

Notation 3 input and output

Swish contains a Notation3 parser and formatter. The parser does not recognize all of the very latest additions to Notation3, but seems to be able to deal with a good number of Notation3 files that appear in the wild. The formatter is fairly primitive, focusing as it does on round-tripping: generating Notation3 data that can be accepted by the parser.

RDF graph isomorphism testing and merging

Swish has logic for performing RDF graph isomorphism testing (graphs are considered equivalent if they can be made exactly equal by some reallocation of bnode identifiers). This is used mainly for software testing purposes.
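To give a sense of what such testing involves, here is a minimal sketch of bnode-respecting equivalence using simplified, hypothetical types. Swish's own graph representation and matching logic are considerably more elaborate; the exponential search over bnode permutations shown here is adequate only for small test graphs:

  import Data.List (permutations, sort)

  -- simplified node and graph types, for illustration only
  data Node  = URI String | Lit String | BNode Int deriving (Eq, Ord, Show)
  type Arc   = (Node, Node, Node)
  type Graph = [Arc]

  -- the distinct bnode labels occurring in a graph
  bnodes :: Graph -> [Int]
  bnodes g = foldr (\x xs -> if x `elem` xs then xs else x : xs) []
                   [ i | (s, p, o) <- g, BNode i <- [s, p, o] ]

  -- apply a relabelling to the bnodes of a graph
  relabel :: [(Int, Int)] -> Graph -> Graph
  relabel m = map (\(s, p, o) -> (node s, node p, node o))
    where node (BNode i) = BNode (maybe i id (lookup i m))
          node n         = n

  -- graphs are equivalent if some bijection of bnode labels
  -- makes their arc sets exactly equal
  isomorphic :: Graph -> Graph -> Bool
  isomorphic g1 g2 =
      length b1 == length b2 &&
      any (\p -> sort (relabel (zip b1 p) g1) == sort g2) (permutations b2)
    where b1 = bnodes g1
          b2 = bnodes g2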

Display differences between RDF graphs

For diagnostic purposes, it is sometimes useful to be able to see the precise areas in which two graphs may differ. The Swish program contains a facility to do just this, illustrated by the following sample output:

D:\Dev\HaskellRDF>Swish -i=Data/Diff1.n3 -d=Data/Diff2.n3
Graph differences: 8
---- Difference 1 ----
Graph 1:"lx12"
Graph 2:"lx22"
---- Difference 2 ----
 :
---- Difference 5 ----
Graph 1:
(_:7 base2:p22
  (_:8 rdf:rest
    (_:10 rdf:rest rdf:nil
      ; rdf:first "lx12")
    ; rdf:first
    (_:9 base2:p23 "lx11")))
Graph 2:
(No arcs)
---- Difference 6 ----
Graph 1:
(No arcs)
Graph 2:
(_:7 base2:p22a
  (_:8 rdf:rest
    (_:10 rdf:rest rdf:nil
      ; rdf:first "lx22")
    ; rdf:first
    (_:9 base2:p23 "lx21")))
---- Difference 7 ----
Graph 1:"p3-diff1"
Graph 2:"p3-diff2"
---- Difference 8 ----
Graph 1:base4:o1
Graph 2:base4:o2

D:\Dev\HaskellRDF>

The input data for this example is Diff1.n3 and Diff2.n3.

The program attempts to find the smallest set of differences between the two graphs supplied, and displays the result. This capability has proven useful for locating faults in software that generates RDF graphs.

Dan Connolly (W3C, http://www.w3.org/People/Connolly/) reports getting this sample to run using Hugs on Debian Linux. The command line he used was:

runhugs -98 -P:.:../HaskellUtils/:Parsec:HUnit:Sort:Dfa Swish.hs -i=Data/Diff1.n3 -d=Data/Diff2.n3

The Swish documentation (see Swish links and download) contains more information about Hugs options needed to run Swish.

Inference framework based on logical inference rule concept

The Swish inference framework uses as its foundation a generic concept of an inference rule, which may be applied in three ways:

  - forward chaining: deriving new consequences from given antecedent expressions;
  - backward chaining: finding antecedent expressions that would be sufficient to support a given consequence; and
  - proof checking: confirming that a claimed inference step is sanctioned by the rule.

Not every rule is required to fully support both forward and backward chaining, but all are required to support the proof-checking mode.

The underlying rule definition is not specific to RDF graphs, and is polymorphic in the type of expression to which the rule applies. Other modules in the Swish framework specialize the rule to apply to RDF graphs.

Rules and axioms are grouped into bundles called rulesets, which are used to represent defined patterns of inference (e.g. RDF, RDFS, datatype-aware, application domain aware).
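In outline, the rule and ruleset abstractions might be pictured as in the following hedged sketch; the field names here are illustrative and differ in detail from Swish's actual definitions (which use scoped names, among other things):

  -- a rule is polymorphic in the expression type 'ex' (e.g. an RDF graph)
  data Rule ex = Rule
      { ruleName  :: String
      , fwdApply  :: [ex] -> [ex]         -- forward chaining: consequences
      , bwdApply  :: ex -> [[ex]]         -- backward chaining: alternative
                                          --   sufficient antecedent sets
      , checkStep :: [ex] -> ex -> Bool   -- proof checking a single step
      }

  -- a ruleset bundles axioms and rules under a common namespace
  data Ruleset ex = Ruleset
      { rsNamespace :: String
      , rsAxioms    :: [ex]
      , rsRules     :: [Rule ex]
      }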

RDF Horn-style rule implementation

Swish provides a generic implementation of the rule framework that deduces a single conclusion from a conjunction of antecedent RDF graphs. The antecedent and consequent graphs are specified as graph patterns that may contain variable nodes in place of any of the RDF nodes or properties, and the deduction is considered to be true for any substitution of the variable nodes throughout the whole rule (i.e. the variable nodes are universally quantified within the scope of the rule).

These rules can be used in both forward- and backward-chaining modes. Backward chaining may involve the introduction of bnodes (existential variables). [[[Note: the code does not currently properly handle the scoping of such introduced existential variables -- to do this will require an extension to the RDF graph structure used.]]]

This kind of rule is implemented by a combination of simple graph query and back-substitution. (Both forward and backward chaining operate similarly in this respect, but the details do vary.)
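As an indication of how the forward-chaining case fits together, here is a minimal sketch using simplified triple and pattern types in place of Swish's graph types (all names here are illustrative):

  -- simplified terms: variables and constants, for illustration only
  data Term = Var String | Con String deriving (Eq, Show)
  type Triple  = (Term, Term, Term)
  type Binding = [(String, String)]

  -- match one pattern term against a ground term, extending a binding
  matchT :: Term -> Term -> Binding -> Maybe Binding
  matchT (Con a) (Con b) bs | a == b = Just bs
  matchT (Var v) (Con b) bs = case lookup v bs of
      Nothing -> Just ((v, b) : bs)
      Just a  -> if a == b then Just bs else Nothing
  matchT _ _ _ = Nothing

  -- all ways one pattern triple matches the arcs of a graph
  query1 :: [Triple] -> Triple -> Binding -> [Binding]
  query1 g (s, p, o) bs =
      [ bs3 | (s', p', o') <- g
            , Just bs1 <- [matchT s s' bs]
            , Just bs2 <- [matchT p p' bs1]
            , Just bs3 <- [matchT o o' bs2] ]

  -- substitute a binding back into a pattern
  subst :: Binding -> Triple -> Triple
  subst bs (s, p, o) = (go s, go p, go o)
    where go (Var v) = maybe (Var v) Con (lookup v bs)
          go t       = t

  -- forward chaining: query all antecedent patterns against the graph,
  -- then instantiate the consequent pattern with each complete binding
  forwardApply :: [Triple] -> [Triple] -> Triple -> [Triple]
  forwardApply graph ants con =
      [ subst bs con
      | bs <- foldl (\bss pat -> concatMap (query1 graph pat) bss) [[]] ants ]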

The basic query/substitution operation of simple rules can be modified through the use of variable binding modifiers and variable binding filters. These are used to manipulate the variable bindings obtained from the query phase, before performing the substitution phase. In a simple case, a variable binding filter (which is just a special case of a variable binding modifier) may cause some of the variable bindings from the query phase to be discarded if they do not satisfy certain criteria; e.g. some rules are applicable only to certain forms of RDF graph node. More complex variable binding modifiers may actually add new variable bindings to those obtained from the query phase; e.g. the RDF "allocated to" rule that appears in the RDF formal semantics specification is handled by such means. The early design of Swish also anticipated that datatype-specific operations (e.g. integer addition) might be introduced through this mechanism.
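By way of illustration (again with simplified types rather than Swish's actual API), a binding filter and a binding-adding modifier might look like this:

  -- variable bindings as simple association lists, for illustration
  type Binding  = [(String, String)]
  type Modifier = [Binding] -> [Binding]

  -- a filter keeps only bindings that satisfy some criterion,
  -- e.g. that variable v is bound to a string of digits
  numericFilter :: String -> Modifier
  numericFilter v = filter ok
    where ok bs = maybe False (all (`elem` ['0'..'9'])) (lookup v bs)

  -- a more general modifier may add new bindings: here, each binding
  -- gains a freshly allocated identifier for variable v
  -- (cf. the RDF "allocated to" rule mentioned above)
  allocateTo :: String -> Modifier
  allocateTo v bss = [ (v, "_:b" ++ show i) : bs
                     | (i, bs) <- zip [1 :: Int ..] bss ]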

Class restriction rule implementation

A second style of general-purpose inference rule implementation provided by Swish is based on the notion of a class restriction, the basic idea of which is described in a paper by Pan and Horrocks as a generalization of the OWL notion of a class restriction.

This form of inference rule has been implemented in Swish because it appears to be a more useful way of accessing datatype-related inferences. My analysis is that it allows a cleaner separation of datatype-specific inference functionality from application-specific use of the datatype properties. Datatype inference capabilities built into Swish can, in principle, be accessed from RDF data without the need for any new parsing capability beyond standard RDF.

RDF formal semantics entailment rule implementation

Built into Swish is an implementation of all of the entailment rules described in the RDF Semantics specification. With the exception of subgraph entailment and simple entailment, these are all implemented as instances of the Horn rule with variable binding modifiers, described above. Subgraph entailment and simple entailment are implemented by special-purpose inference rules written in Haskell.

Framework for datatypes

The Swish datatype model is based on that defined for RDF (and XML Schema datatypes), and embodies:

  - a URI by which the datatype is identified;
  - a lexical space (the set of valid lexical forms);
  - a value space; and
  - a mapping from lexical forms to data values.

To these, Swish adds:

  - a mapping from data values back to canonical lexical forms;
  - datatype-specific relations, usable in class restriction rules; and
  - datatype-specific variable binding modifiers, usable in Horn-style rules.

The last two provide overlapping functionality, in that they provide alternative ways to define deductions that use computations on datatype values (e.g. arithmetic operations). One of the goals of Swish is to allow experimentation with different styles of inference. The advantage of the class restriction approach is that it provides access to datatype computations within the syntactic framework of RDF; the variable binding modifier approach requires the introduction of a syntactic construct for formulae to define the antecedents and consequents of Horn rules.
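As a concrete illustration, the xsd_integer:sum constraint used in the script example below can be viewed as a relation over three integers, computable in whichever direction the known values permit. A minimal sketch of such a relation:

  -- sum as a relation over (total, a, b), where total = a + b.
  -- Given any two of the values, the third is computed; given all
  -- three, the relation is checked.
  sumRel :: [Maybe Integer] -> Maybe [Integer]
  sumRel [Just t, Just a, Just b] | t == a + b = Just [t, a, b]
  sumRel [Nothing, Just a, Just b] = Just [a + b, a, b]
  sumRel [Just t, Nothing, Just b] = Just [t, t - b, b]
  sumRel [Just t, Just a, Nothing] = Just [t, a, t - a]
  sumRel _                         = Nothing

Applied to the PassengerVehicle data in the script example, sumRel [Nothing, Just 30, Just 20] fills in the total capacity of 50.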

Swish main program

The Swish main program can be run as a stand-alone executable (when compiled using GHC, or by using the RunHugs utility), or by invoking the runSwish function from within a Hugs or GHCi interactive session.

A summary of command line options can be obtained by running Swish with the command line option '-?'.

Facilities available through command line options include reading and writing RDF graphs in Notation3 format, testing a pair of graphs for isomorphism, displaying the differences between two graphs, and executing a Swish inference script.

A script file provides access to a range of Swish inference capabilities, shown in the example below. The script file commands currently provided include: named graph definition (':-'), '@prefix', '@rule', '@ruleset', '@constraints', '@fwdchain', '@bwdchain', '@proof' (with '@input', '@step' and '@result'), '@read', '@write', '@merge', '@compare', '@asserteq' and '@assertin'.

Swish can be used as a Unix-style filter, reading from a standard input stream and writing to standard output, or to read and write named files. [[[Not yet implemented: reading and writing via HTTP or from a specified URI.]]]


Script file example

This script file example is provided to give a quick sense of the capabilities of Swish:

# -- Example Swish script --
#
# Comment lines start with a '#'
#
# The script syntax is loosely based on Notation3, but it is a quite 
# different language, except that embedded graphs (enclosed in {...})
# are encoded using Notation3 syntax.
#
# -- Prefix declarations --
#
# As well as being used for all labels defined and used by the script
# itself, these are applied to all graph expressions within the script 
# file, and to graphs created by scripted inferences, 
# but are not applied to any graphs read in from an external source.

@prefix ex:  <http://id.ninebynine.org/wip/2003/swishtest/> .
@prefix pv:  <http://id.ninebynine.org/wip/2003/swishtest/pv/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix xsd_integer: <http://id.ninebynine.org/2003/XMLSchema/integer#> .
@prefix rs_rdf:  <http://id.ninebynine.org/2003/Ruleset/rdf#> .
@prefix rs_rdfs: <http://id.ninebynine.org/2003/Ruleset/rdfs#> .
@prefix :   <http://id.ninebynine.org/default/> .

# Additionally, prefix declarations are provided automatically for:
#    @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#    @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
#    @prefix rdfd:  <http://id.ninebynine.org/2003/rdfext/rdfd#> .
#    @prefix rdfo:  <http://id.ninebynine.org/2003/rdfext/rdfo#> .
#    @prefix owl:   <http://www.w3.org/2002/07/owl#> .


# -- Simple named graph declarations --

ex:Rule01Ant :- { ?p ex:son ?o . }

ex:Rule01Con :- { ?o a ex:Male ; ex:parent ?p . }

ex:TomSonDick :- { :Tom ex:son :Dick . }

ex:TomSonHarry :- { :Tom ex:son :Harry . }


# -- Named rule definition --

@rule ex:Rule01 :- ( ex:Rule01Ant ) => ex:Rule01Con


# -- Named ruleset definition --
#
# A 'ruleset' is a collection of axioms and rules.
#
# Currently, the ruleset is identified using the namespace alone;
# i.e. the 'rules' in 'ex:rules' below is not used.  
# This is under review.

@ruleset ex:rules :- (ex:TomSonDick ex:TomSonHarry) ; (ex:Rule01)

# -- Forward application of rule --
#
# The rule is identified here by ruleset and a name within the ruleset.

@fwdchain ex:rules ex:Rule01 { :Tom ex:son :Charles . } => ex:Rule01fwd

# -- Compare graphs --
#
# Compare result of inference with expected result.
# This is a graph isomorphism test rather than strict equality, 
# to allow for bnode renaming.
# If the graphs are not equal, a message is generated.
# The comment (';' to end of line) is included in any message generated.

ex:ExpectedRule01fwd :- { :Charles a ex:Male ; ex:parent :Tom . }  

@asserteq ex:Rule01fwd ex:ExpectedRule01fwd
   ; Infer that Charles is male and has parent Tom

# -- Display graph --
#
# Write graph as Notation3 to standard output.
# The comment is included in the output.

@write ex:Rule01fwd ; Charles is male and has parent Tom

# -- Write graph to file --
#
# The comment is included at the head of the file.
# (TODO: support for output to Web using HTTP.)

@write ex:Rule01fwd <Example1.n3> ; Charles is male and has parent Tom

# -- Read graph from file --
#
# Creates a new named graph in the Swish environment.
# (TODO: support for input from Web using HTTP.)

@read ex:Rule01inp <Example1.n3>

# -- Proof check --
#
# This proof uses the built-in RDF and RDFS rulesets, 
# which are the RDF- and RDFS- entailment rules described in the RDF
# formal semantics document.
#
# To prove:
#     ex:foo ex:prop "a" .
# RDFS-entails
#     ex:foo ex:prop _:x .
#     _:x rdf:type rdfs:Resource .
#
# If the proof is not valid according to the axioms and rules of the 
# ruleset(s) used and antecedents given, then an error is reported 
# indicating the failed proof step.

ex:Input01 :- { ex:foo ex:prop "a" . }

ex:Result :- { ex:foo ex:prop _:a . _:a rdf:type rdfs:Resource . }

@proof ex:Proof01 ( rs_rdf:rules rs_rdfs:rules )
  @input  ex:Input01
  @step   rs_rdfs:r3 ( rs_rdfs:a10 rs_rdfs:a39 )
          => ex:Step01a :- { rdfs:Literal rdf:type rdfs:Class . }
  @step   rs_rdfs:r8 ( ex:Step01a )
          => ex:Step01b :- { rdfs:Literal rdfs:subClassOf rdfs:Resource . }
  @step   rs_rdfs:r1 ( ex:Input01 )
          => ex:Step01c :- { ex:foo ex:prop _:a . _:a rdf:type rdfs:Literal . }
  @step   rs_rdfs:r9 ( ex:Step01b ex:Step01c )
          => ex:Step01d :- { _:a rdf:type rdfs:Resource . }
  @step   rs_rdf:se  ( ex:Step01c ex:Step01d )   => ex:Result
  @result ex:Result

# -- Restriction based datatype inferencing --
#
# Datatype inferencing based on a general class restriction and
# a predefined relation (per idea noted by Pan and Horrocks).

ex:VehicleRule :-
  { :PassengerVehicle a rdfd:GeneralRestriction ;
      rdfd:onProperties (:totalCapacity :seatedCapacity :standingCapacity) ;
      rdfd:constraint xsd_integer:sum ;
      rdfd:maxCardinality "1"^^xsd:nonNegativeInteger . }

# Define a new ruleset based on a declaration of a constraint class
# and reference to built-in datatype.
# The datatype constraint xsd_integer:sum is part of the definition 
# of datatype xsd:integer that is cited in the constraint ruleset
# declaration.  It relates named properties of a class instance.

@constraints pv:rules :- ( ex:VehicleRule ) | xsd:integer

# Input data for test cases:

ex:Test01Inp :-
  { _:a1 a :PassengerVehicle ;
      :seatedCapacity "30"^^xsd:integer ;
      :standingCapacity "20"^^xsd:integer . }

# Forward chaining test case:

ex:Test01Fwd :- { _:a1 :totalCapacity "50"^^xsd:integer . }

@fwdchain pv:rules :PassengerVehicle ex:Test01Inp => :t1f
@asserteq :t1f ex:Test01Fwd  ; Forward chain test

# Backward chaining test case:
#
# Note that the result of backward chaining is a list of alternatives,
# any one of which is sufficient to derive the given conclusion.

ex:Test01Bwd0 :-
  { _:a1 a :PassengerVehicle .
    _:a1 :totalCapacity "50"^^xsd:integer .
    _:a1 :seatedCapacity "30"^^xsd:integer . }

ex:Test01Bwd1 :-
  { _:a1 a :PassengerVehicle .
    _:a1 :totalCapacity "50"^^xsd:integer .
    _:a1 :standingCapacity "20"^^xsd:integer . }

# Declare list of graphs:

ex:Test01Bwd :- ( ex:Test01Bwd0 ex:Test01Bwd1 )

@bwdchain pv:rules :PassengerVehicle ex:Test01Inp <= :t1b

@asserteq :t1b ex:Test01Bwd  ; Backward chain test

# Can test for graph membership in a list

@assertin ex:Test01Bwd0 :t1b ; Backward chain component test (0)
@assertin ex:Test01Bwd1 :t1b ; Backward chain component test (1)

# -- Merge graphs --
#
# Merging renames bnodes to avoid collisions.

@merge ( ex:Test01Bwd0 ex:Test01Bwd1 ) => ex:Merged

# This form of comparison sets the Swish exit status based on the result.

ex:ExpectedMerged :-
  { _:a1 a :PassengerVehicle .
    _:a1 :totalCapacity "50"^^xsd:integer .
    _:a1 :seatedCapacity "30"^^xsd:integer .
    _:a2 a :PassengerVehicle .
    _:a2 :totalCapacity "50"^^xsd:integer .
    _:a2 :standingCapacity "20"^^xsd:integer . }

@compare ex:Merged ex:ExpectedMerged

# End of example script

With the Swish software installed, this script can be executed by:

  1. Place the above text into a file called, say, SwishExample.ss.
  2. Start the Hugs or GHCi Haskell system.
  3. Load module SwishMain.hs from the Swish software directory.
  4. Issue the command: runSwish "-s=SwishExample.ss".

Substitute the actual file name used for SwishExample.ss. If the path contains backslash characters ('\'), they must be doubled within the string (e.g. runSwish "-s=F:\\Temp\\SwishExample.ss"). Using Hugs, a console session running the above script looks like this:

SwishMain> runSwish "-s=F:\\Temp\\SwishScriptExample.ss"
# Charles is male and has parent Tom
@prefix rdf : <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdfd : <http://id.ninebynine.org/2003/rdfext/rdfd#> .
@prefix rdfo : <http://id.ninebynine.org/2003/rdfext/rdfo#> .
@prefix owl : <http://www.w3.org/2002/07/owl#> .
@prefix ex : <http://id.ninebynine.org/wip/2003/swishtest/> .
@prefix pv : <http://id.ninebynine.org/wip/2003/swishtest/pv/> .
@prefix xsd : <http://www.w3.org/2001/XMLSchema#> .
@prefix xsd_integer : <http://id.ninebynine.org/2003/XMLSchema/integer#> .
@prefix rs_rdf : <http://id.ninebynine.org/2003/Ruleset/rdf#> .
@prefix rs_rdfs : <http://id.ninebynine.org/2003/Ruleset/rdfs#> .
@prefix  : <http://id.ninebynine.org/default/> .
:Charles ex:parent :Tom ;
         rdf:type ex:Male .
Proof satisfied: ex:Proof01

For more detailed discussion, see the Swish documentation.


Extending Swish

Swish is designed to be readily extensible in the following ways. For the most part, these extensions involve the addition of new Haskell code, and very small changes to existing code.

Adding datatypes

The primary means for extending Swish capabilities is anticipated to be the addition of new datatypes.

A new datatype can be added as a single module, written in Haskell, following the pattern of an existing datatype module.
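As a rough indication of the shape of such a module, here is a hedged sketch; the names are illustrative, and the real datatype interface carries considerably more (datatype-specific rules, relations and variable binding modifiers among them):

  -- illustrative skeleton only, not Swish's actual datatype interface
  data DatatypeSketch a = DatatypeSketch
      { dtUri   :: String             -- URI naming the datatype
      , dtParse :: String -> Maybe a  -- lexical-to-value mapping
      , dtCanon :: a -> String        -- value to canonical lexical form
      }

  xsdBoolean :: DatatypeSketch Bool
  xsdBoolean = DatatypeSketch
      { dtUri   = "http://www.w3.org/2001/XMLSchema#boolean"
      , dtParse = \s -> case s of
            "true"  -> Just True
            "1"     -> Just True
            "false" -> Just False
            "0"     -> Just False
            _       -> Nothing
      , dtCanon = \b -> if b then "true" else "false"
      }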

If only one of the available datatype inference styles is required, the other can be left undefined; if an application then attempts to use a datatype inference style that is not defined, a program error will occur.

Adding rulesets, axioms and inference rules

A ruleset is a named collection of axioms and inference rules. A ruleset might be built into Swish to express some body of domain knowledge. All built-in rulesets are accessible in the Swish script language. This is how the standard RDF and RDFS entailment rules are accessed.

Adding new forms of inference rule

Swish contains two built-in styles of generic inference rule, both described above: the Horn-style rule, which deduces a single conclusion from a conjunction of antecedent graph patterns, and the class restriction rule, based on the Pan and Horrocks generalization of OWL class restrictions.

It also contains a small number of hand-coded inference rules to describe RDF core semantic behaviours, such as the simple entailment and subgraph entailment rules noted earlier.

It is possible to write a new inference rule in Haskell and expose it to the script processor via a new ruleset. There are, however, some complications in handling generic inference patterns this way, as the script processor would need to be enhanced to provide a way to instantiate rules based on such a pattern.

Adding new script language commands

The script processor has a fairly simple modular structure, with a parser that compiles the script to a higher order monadic function (incorporating the IO monad), and then evaluates that function. Adding new commands would be reasonably easy for an experienced Haskell programmer, but it does require an understanding of the way that Haskell uses monads for managing state and I/O.

A "compiled" script, returned by the script parser, is an instance of the 'SwishStateIO' monad. This is a function that is executed by applying it to an initial state value.

Create a new program using the existing functions

For specific applications, it may be most effective to simply write a new Haskell main program to perform the required processing. This could be done based on the existing SwishStateIO monad, or by using that as an example to guide the definition of an application-specific monad, and using the existing code as a starting point for the new program. Using the existing code in this way means that the details of command line access and setting up the Haskell I/O environment are mostly taken care of.

The RDF and Notation 3 parsing, formatting and inference functions are all defined as pure functions, so there should be no difficulty in taking these and incorporating them into some other Haskell program.

The script parser itself is also a pure function, so it could be used to compile an application-specific script whose code is embedded in a program, and then to execute that script with an appropriate initial state.


Why Haskell?

Haskell is a pure functional programming language. This means that it has no language constructs that cause side-effects, and Haskell expressions exhibit referential transparency (their value does not depend on some unknown "state"). This gives rise to greater semantic clarity, reducing the possibility of surprises when dealing with complex information. It also means that expression values are not dependent on order of evaluation, a significant feature when dealing with RDF information that may be incoming from a variety of sources.

Haskell's pure functional style makes it amenable to formal semantic analysis; it thus provides an analyzable and testable (executable) way to extend RDF semantics, and to bridge the gap between logical theory and running software.

My own past work with RDF inference tools has exposed a need for access to full programming language capabilities when developing applications based on RDF inference, and for incorporating new datatype inference primitives into an application, but without the attendant complexities and uncertainties of a conventional programming language. The non-imperative Haskell programming style is a good match for the declarative nature of semantic web deduction.

Haskell embodies a very powerful type system for all values that can be expressed (based on the "Hindley-Milner" type system). It is fully type-secure, statically-checked, and polymorphic (meaning that functions, and indeed simple data values, can have many different types, and hence be used safely in conjunction with a range of types of other values). Many conventional languages have strong typing, but require some of the type guarantees to be sacrificed in order to write generic code, typically exposing a possibility of run-time datatype violation errors. A Haskell function can be written to support a wide range of different uses while retaining full type security, checked at compile-time. In practical terms, this leads to more reliable software.

Many Semantic Web systems and proposals make use of Prolog-like logic programming techniques. Haskell's lazy evaluation model makes it very easy to write programs that mimic Prolog's search-and-backtrack style of evaluation, and also to embed these computations in a framework that smoothly handles features that logic programming does not handle so well. (The RDF graph query logic in Swish is an example of this.)
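As a small illustration of the point (a sketch, not Swish's actual query code), a goal can be represented as a function from a variable binding environment to a lazy list of solution environments; disjunction is then list concatenation and conjunction is the list monad's bind, giving search-and-backtrack behaviour essentially for free:

  type Env  = [(String, Int)]
  type Goal = Env -> [Env]    -- all solutions, produced lazily

  -- unify a variable with a value, or check an existing binding
  bindVar :: String -> Int -> Goal
  bindVar v x env = case lookup v env of
      Nothing -> [(v, x) : env]
      Just y  -> if y == x then [env] else []

  -- disjunction tries both branches; conjunction feeds each solution
  -- of the first goal into the second (the list monad's bind)
  orG, andG :: Goal -> Goal -> Goal
  orG  g h env = g env ++ h env
  andG g h env = g env >>= h

  -- (?x = 1 or ?x = 2) and (?x = 2 or ?x = 3)  ==>  ?x = 2
  solutions :: [Env]
  solutions = ((bindVar "x" 1 `orG` bindVar "x" 2)
               `andG`
               (bindVar "x" 2 `orG` bindVar "x" 3)) []

Laziness means solutions are produced on demand, so large or unbounded solution spaces can be explored only as far as needed.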

Higher order functions as first-class values provide an easy way to represent and manipulate relationships between values, and inference processes that operate on such values. Inference processes described in RDF can be "compiled" to higher order functions in Haskell. Mixing functions (computable values) and conventional data to represent information provides for smoother integration between the RDF language and software that manipulates it.

Of course, there are many computer programming languages that can boast an impressive array of features, but many fall down when applied to real-world situations. In my past experience, quality of support is at least as important as the quality of a programming language.

Haskell is supported by an active developer community. There are (at least) three high quality, freely available implementations that exhibit a good degree of interoperability, and are available on most common computing platforms; one of these can compile Haskell programs to directly executable binary code. In several months of working with Haskell, using two different systems, I have not encountered any difficulties caused by the quality of these implementations.

Perhaps Haskell is least strong in the area of its support libraries. There are a growing number of add-on libraries available, providing a good range of useful capabilities, though development of these seems to be somewhat uncoordinated. Despite this, I find that they tend to work together very well (maybe because Haskell permits and encourages a cleaner separation of concerns than many other languages). There is currently an effort under way to provide a common library framework to support decentralized coordinated development of additional library modules.

For programmers used to conventional computer programming languages, Haskell presents a fairly steep learning curve. It's easy enough to get up-and-running writing simple programs, but some of the techniques that provide better results for more advanced projects take some time to acquire. (A collection of rough notes about my experiences of learning Haskell can be found here.)


Future work

Swish is very much a work-in-progress, without a clear single goal other than to serve as a framework for experimenting with Semantic Web inference for real applications. I anticipate there will always be things I'd like to see added to Swish.

See also: Swish TODO notes.


Swish links and download

Current release:

Previous releases:


Licence information

Swish is free, open-source software, distributed under the GNU General Public License. To discuss possible alternative licensing arrangements, please contact the author - see Contact information below.


Swish revision history

Version 0.1 (May 2003)

This was an initial release of the core software with very limited functionality: reading and writing RDF graphs as Notation 3, comparing RDF graphs for isomorphism, and merging RDF graphs.

Version 0.2.0 (December 2003)

This version is the first to contain inference capabilities, including a capability to handle datatype-aware inferences. This is the version of Swish that I plan to use to experiment with real applications. The intent is that future developments of Swish will be guided largely by application requirements.

Version 0.2.1 (February 2004)

The source code has been re-organized to separate generic Haskell functions from Swish-specific code, and a consolidated test program is included. A graph partitioning module has been added, along with a facility to display the differences between RDF graphs.


Links to related information


Contact information

Swish has been developed and published by Graham Klyne.

Please direct any enquiries or feedback via the contact details at:
<http://www.ninebynine.org/index.html#Contact>