G. Klyne
Nine by Nine
May 29, 2003

Semantic Web Inference using Haskell

Abstract

This memo describes the software package Swish, which is being developed to conduct experiments in using the programming language Haskell as a basis for performing inference on Semantic Web data.

© 2003 G. Klyne; Some rights reserved.

$Id: swish-0.1.html,v 1.3 2003/06/03 16:22:57 graham Exp $




1. Introduction

This memo describes Swish, a software package that is intended to provide a framework for building programs in Haskell that perform inference on RDF [1] data.

Swish is primarily intended to be used as a starting point for developing new RDF applications in Haskell, but it also includes a stand-alone program that can be used to perform some simple manipulation of RDF data. Currently, only the Notation 3 [4] serialization form is supported, but I'd like to add support for full RDF/XML in due course.

The software can be downloaded from the web by following links from http://www.ninebynine.org/Software/Intro.html. This software is made generally available under the GNU General Public Licence (GPL), version 2 (or later), a full copy of which is available at http://www.gnu.org/licenses/gpl.html, and also as file GPL.TXT in the Swish software distribution. Other licensing arrangements may be negotiated with the author: see LICENSING.TXT in the Swish software distribution.




2. Background

The Swish software package was developed as a result of experience using CWM [5], an off-the-shelf general-purpose RDF processing program, for evaluating simple inference rules on network access control information expressed in RDF [19]. Specifically, while the general inference capabilities of CWM were almost sufficient for the network access application, some capabilities were required that are unlikely to be provided by any completely general-purpose tool; e.g. analysis of IP network addresses by subnet and host address.
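As an illustration of the kind of application-specific processing meant here, the following is a minimal sketch of splitting an IPv4 address into its subnet and host parts for a given prefix length. The type and function names (IPv4, fromOctets, netMask, splitAddress, inSubnet) are assumptions made for this example and are not part of Swish or CWM.

```haskell
import Data.Bits (complement, shiftL, (.&.), (.|.))
import Data.Word (Word32)

-- | An IPv4 address held as a single 32-bit value (illustrative type,
--   not taken from Swish).
type IPv4 = Word32

-- | Build an address from its four dotted-decimal octets.
fromOctets :: Word32 -> Word32 -> Word32 -> Word32 -> IPv4
fromOctets a b c d =
    (a `shiftL` 24) .|. (b `shiftL` 16) .|. (c `shiftL` 8) .|. d

-- | Network mask for a prefix length; values outside 0..32 are clamped.
netMask :: Int -> IPv4
netMask prefixLen
    | prefixLen <= 0  = 0
    | prefixLen >= 32 = complement 0
    | otherwise       = complement 0 `shiftL` (32 - prefixLen)

-- | Split an address into (subnet part, host part) for a prefix length.
splitAddress :: Int -> IPv4 -> (IPv4, IPv4)
splitAddress prefixLen addr = (addr .&. mask, addr .&. complement mask)
  where
    mask = netMask prefixLen

-- | Test whether an address lies within the subnet of another address.
inSubnet :: Int -> IPv4 -> IPv4 -> Bool
inSubnet prefixLen subnet addr =
    fst (splitAddress prefixLen addr) == fst (splitAddress prefixLen subnet)
```

For example, splitAddress 24 (fromOctets 192 168 1 42) separates the address into the subnet 192.168.1.0 and the host number 42. A test of this kind is awkward to express in a purely declarative rule language, but is a few lines in a general-purpose one.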

Additionally, the framework for datatyped literals currently proposed by the RDFcore working group [8] is quite open-ended, and it is not specified how generic applications may provide support for new or non-standard datatypes.

In light of these considerations, I sought ways of combining the full expressive capability of a general purpose programming language with the declarative style of inference rules and formal specifications. Using Haskell [11], a pure functional programming language, is the approach adopted.

To use Haskell as a basis for performing inference on RDF data, certain capabilities are needed:

Swish aims to provide these capabilities. Further, it provides capabilities to compare RDF graphs (insensitive to possible renaming of blank nodes), and to merge RDF graphs, renaming blank nodes as necessary to prevent unintended semantic consequences [9].
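The idea behind merging with blank node renaming can be sketched as follows. This is a simplified illustration under stated assumptions, not the Swish implementation: the Node, Triple and Graph types, and the functions blankIds, freshId, buildRenaming and merge, are all hypothetical names introduced for this example, and a graph is represented naively as a list of triples.

```haskell
import Data.List (nub)

-- | Illustrative node and graph types; not the Swish data model.
data Node = URI String | Blank String
    deriving (Eq, Show)

type Triple = (Node, Node, Node)
type Graph  = [Triple]

-- | Blank node identifiers used anywhere in a graph.
blankIds :: Graph -> [String]
blankIds g = nub [ b | (s, p, o) <- g, Blank b <- [s, p, o] ]

-- | Derive an identifier from the given one that is not in the used list,
--   by trying numeric suffixes _1, _2, ... in turn.
freshId :: [String] -> String -> String
freshId used b = go (1 :: Int)
  where
    go n
        | candidate `notElem` used = candidate
        | otherwise                = go (n + 1)
      where
        candidate = b ++ "_" ++ show n

-- | Build (old, new) pairs for the second graph's blank nodes. An
--   identifier is changed only if it clashes with the first graph; a new
--   identifier avoids both graphs' identifiers and all earlier choices.
buildRenaming :: [String] -> [String] -> [(String, String)]
buildRenaming used1 ids2 = go [] ids2
  where
    go _ [] = []
    go chosen (b : bs)
        | b `notElem` used1 = (b, b) : go chosen bs
        | otherwise         = (b, b') : go (b' : chosen) bs
      where
        b' = freshId (used1 ++ ids2 ++ chosen) b

-- | Merge two graphs, renaming blank nodes of the second graph so that
--   they are never confused with blank nodes of the first.
merge :: Graph -> Graph -> Graph
merge g1 g2 = g1 ++ map renameTriple g2
  where
    renaming = buildRenaming (blankIds g1) (blankIds g2)
    renameTriple (s, p, o) = (renameNode s, renameNode p, renameNode o)
    renameNode (Blank b) = Blank (maybe b id (lookup b renaming))
    renameNode n         = n
```

Without the renaming step, two unrelated blank nodes that happen to share a local identifier would be treated as the same node in the merged graph, which is the unintended semantic consequence that the renaming avoids.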

I anticipate that the main use for Swish will be as a support library for new utilities that apply predefined RDF inference rules. Where CWM is a general-purpose tool for manipulating RDF, I expect to use Swish as a toolkit for creating tools to deal with specific RDF processing requirements. In time, this may lead to identification of some useful capabilities that can guide the design of future general-purpose RDF processing tools.

The programming language Haskell was chosen for a number of reasons:

More information about Haskell can be found at [11]. A useful paper discussing some particular characteristics of functional programming languages is [17].




3. Description of Swish software

Swish comprises a number of modules that can be invoked by Haskell programs, and a stand-alone command-line utility that can be used to perform some basic processing of RDF data.

The Haskell source code for the stand-alone utility may also be used as a starting point for similar utilities that perform specific application processing of RDF data.

3.1 Swish software components

Swish has the following components:

Currently, the only supported RDF graph serialization format is Notation3, but future developments may add support for other formats. RDF/XML would clearly be most desirable. Meanwhile, utilities such as CWM [5] can be used to convert RDF/XML to and from Notation3 format.

3.2 Swish command format

The Swish utility is a command-line utility that performs some simple RDF processing functions. The capabilities provided are intended for testing the underlying RDF library software rather than for serving any particular application purpose.

A Swish command contains one or more command line options that are processed from left to right. The Swish program maintains an internal graph workspace, which is updated or referenced as the command options are processed.

Swish command options:

-?
Displays a summary of the command line options.
-n3
Indicates that Notation3 be used for subsequent input and output. (Currently, this is the only format option, and is selected by default.)
-i[=file]
Read file into the graph workspace, replacing any existing graph. If the filename is omitted, the graph is read from standard input.
-m[=file]
Read and merge file with the graph workspace. Blank nodes in the input file are renamed as necessary to avoid node identifiers already used by the existing graph. If the filename is omitted, the graph is read from standard input.
-c[=file]
Read file and compare the resulting graph with the workspace. Graph comparison treats isomorphic graphs as equivalent, and is insensitive to renaming of blank nodes. This is intended to match the definition of graph equivalence in the RDF abstract syntax specification [10]. If the filename is omitted, the graph is read from standard input. If the graphs are unequal, the exit status code is 1.
-o[=file]
Write the graph workspace to a file. If the filename is omitted, the graph is written to standard output.
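The option syntax above can be recognized with a few lines of Haskell. The following is a minimal sketch, not the code in SwishCommands.hs: the Option type and the functions parseOption, fileArg and parseOptions are hypothetical names, and treating an empty filename after '=' as an error is an assumption of this example.

```haskell
-- | One parsed command option (hypothetical type for illustration).
data Option
    = Help
    | FormatN3
    | Input   (Maybe FilePath)
    | Merge   (Maybe FilePath)
    | Compare (Maybe FilePath)
    | Output  (Maybe FilePath)
    deriving (Eq, Show)

-- | Parse a single argument; Nothing signals an unrecognized option.
parseOption :: String -> Maybe Option
parseOption "-?"  = Just Help
parseOption "-n3" = Just FormatN3
parseOption ('-' : c : rest) = do
    mk   <- lookup c [('i', Input), ('m', Merge), ('c', Compare), ('o', Output)]
    file <- fileArg rest
    return (mk file)
parseOption _ = Nothing

-- | Interpret the text after the option letter: empty text means standard
--   input or output, "=name" names a file, and anything else is rejected
--   (an assumption of this sketch).
fileArg :: String -> Maybe (Maybe FilePath)
fileArg ""                   = Just Nothing
fileArg ('=' : name@(_ : _)) = Just (Just name)
fileArg _                    = Nothing

-- | Parse all arguments, preserving their left-to-right order.
parseOptions :: [String] -> Maybe [Option]
parseOptions = mapM parseOption
```

Because the options are kept in a list in their original order, a driver can fold over them from left to right, updating the graph workspace at each step as described above.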

The Swish program terminates with a status code that indicates the final status of the operation(s) performed. Haskell distinguishes between a success status code whose value is not specified, assumed to be system dependent, and a failure code which is associated with an integer value. The status code values returned by Swish are:

Success
Operation completed successfully; graphs compare equal.
1
Graphs compare different.
2
Input data file incorrect format.
3
File access problem.
4
Incorrect option in command line.
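The mapping from these outcomes to a process exit status might be expressed as below. This is a sketch under assumptions: the Status type and its constructor names are invented for this example, and the standard ExitCode type is imported from System.Exit as in current GHC library layouts. ExitSuccess carries no number, while ExitFailure carries the integer value, which mirrors the distinction described above.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- | Possible outcomes of a run (hypothetical names for illustration).
data Status
    = Success            -- operation completed; graphs compare equal
    | GraphsDiffer       -- graphs compare different
    | BadInputFormat     -- input data file has incorrect format
    | FileAccessProblem  -- file could not be read or written
    | BadOption          -- incorrect option in command line
    deriving (Eq, Show)

-- | Map an outcome to the exit code values listed in the table above.
toExitCode :: Status -> ExitCode
toExitCode Success           = ExitSuccess
toExitCode GraphsDiffer      = ExitFailure 1
toExitCode BadInputFormat    = ExitFailure 2
toExitCode FileAccessProblem = ExitFailure 3
toExitCode BadOption         = ExitFailure 4

-- | Terminate the program with the exit code for the given outcome.
exitWithStatus :: Status -> IO a
exitWithStatus = exitWith . toExitCode
```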

Here are some example Swish command lines:

3.3 Function interfaces

[[[To be provided; until then see the source files, notably SwishCommands.hs, GraphClass.hs, RDFGraph.hs, and the various test modules.]]]




4. Software installation

The Swish software is distributed as a single ZIP archive. Start installation by creating an empty directory for the software, and extracting the content of the ZIP archive into that directory. Select the ZIP option that uses directory information from the archive so that the sub-directory structure is preserved.

The following sections deal with how to get the software running in different Haskell environments. The instructions relate to MS Windows operating systems, but it should be fairly obvious how to adapt the procedures for Unix/Linux systems.

4.1 System requirements

Swish is written entirely in Haskell, and should work with any Haskell system that supports Haskell 98 together with the extensions noted below. The software has been tested using Hugs [12] (version November 2002), Glasgow Haskell Compiler (GHC) [13] (version 5.04.3) and the interactive version of GHC (GHCi).

The required extensions to standard Haskell-98 are:

Some freely available additional Haskell libraries are used, as described later. For convenience, these are included with the Swish software distribution, but are not themselves part of the Swish software for licensing purposes. More details are given later.

My development has been performed mostly using Hugs on a 1.3GHz PC with 256Mb of memory. For most purposes, this has been more than adequate. Some of the larger test cases, and the more perverse graph comparisons, may take several minutes to run on this platform (SwishTest takes about 20 minutes). In practice, applications built on Swish are likely to be more demanding than the basic requirements of Swish itself.

4.2 Installation files

The Swish software distribution includes the following files:

Install directory
Swish.html, Swish.xml: this documentation file, and XML source code.
*.hs: Haskell source files (see software overview above).
*Test.hs: unit test Haskell source files.
*.bat: MS-Windows command files for building and testing the software using GHC.
*.txt: additional information, including licensing details.
Data subdirectory
Contains Notation3 data files used by the SwishTest program.
Parsec subdirectory
Contains the Parsec library used by Swish.
HUnit subdirectory
Contains the HUnit library used by Swish test modules.
Sort subdirectory
Contains the Quicksort library used by Swish. (References to this module can be removed, and the standard Haskell function List.sort used in place of QuickSort.)

4.3 System dependent details

4.3.1 Installation using Hugs

Running the Swish software under Hugs is straightforward. The Hugs options -98 and +N must be specified.

Special steps that might help include:

The full settings reported by my Hugs installation are:

Current settings: +fewuiRWXN -stgGl.qQkoOIHT -h5000000 -p"%s> " -r$$ -c40
Search path     : -P{Hugs}\libraries;{Hugs}\libraries\HUnit;
                    {Hugs}\libraries\Parsec;{Hugs}\libraries\Sort
Project Path    :
Source suffixes : -S.hs;.lhs
Editor setting  : -E"C:\\Program Files\\TextPad 4\\TextPad.exe"
Preprocessor    : -F
Compatibility   : Hugs Extensions (-98)
               

4.3.2 Installation using GHCi

Running the Swish software under GHCi is almost as easy as using Hugs. GHCi command line options used include '-fglasgow-exts' and '-iF:\Haskell\Lib\HUnit;F:\Haskell\Lib\Parsec;F:\Haskell\Lib\Sort' (adjusted according to the directories containing the library files). Working under MS-Windows, I find it convenient to create a desktop shortcut to run GHCi, specifying the Swish source directory and other options as properties of the shortcut.

To run a program in the GHCi command interpreter, follow the same procedure that is described for running a program under Hugs. The GHCi and Hugs command shells are very similar.

There is a GHCi initialization file '.ghci' that, if placed in the appropriate startup directory, is read automatically by GHCi and defines some convenient commands for running the non-interactive GHC compiler from within the GHCi shell.

4.3.3 Installation using GHC

MS-Windows command scripts have been prepared to compile and run the Swish software in an MS-Windows command window. It should be straightforward to create Unix equivalents using information from these. The relevant files are:

The file ghcc.bat assumes a standard GHC installation, with the GHC compiler on the current search path, and will probably need to be edited to reflect the actual locations of the support libraries used.

Once the programs have been compiled and linked, they can be run in the usual way by entering the program name at a command prompt. The test programs do not expect any command line options and run to completion. The program Swish.exe takes command line options as described above.

4.4 Installation testing

A Swish installation under GHC can be tested by running the command script TestSwish.bat, and ensuring that all tests complete with zero errors. On a 1.7GHz PC running Windows 2000, the tests take a few minutes to complete.

To test the installation from an interactive shell, the test programs need to be loaded and executed individually. To confirm a successful installation, it is probably sufficient to load and run RDFGraphTest, which should complete quite quickly (about 30 seconds under Hugs on a 1.3GHz PC), then run SwishTest which takes about 20 minutes on the same system.

4.5 Additional libraries used

Swish uses some additional libraries that are not part of the Swish software, but which are included with the Swish software distribution for the convenience of users.

Please note that these support libraries are distributed under their own licensing terms and conditions, which I have reproduced below where available. Please contact the respective authors for further information.

4.5.1 Parsec

Parsec [14] is a monadic parser combinator library for Haskell. I found it to be excellently documented and generally easy to use. It also serves as a useful introduction to using monads in Haskell.

4.5.1.1 Parsec licence

Copyright 1999-2000, Daan Leijen. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

This software is provided by the copyright holders "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

4.5.2 Quicksort

Quicksort is part of a collection of sorting functions in Haskell, published by Ralf Hinze [16].

At the time of writing, I can find no claim for copyright or distribution licensing terms.

4.5.3 HUnit

HUnit [15] is a unit testing framework for Haskell, loosely modelled on the JUnit framework that is popular with Java programmers.

Swish application code does not use HUnit, but the test programs do make extensive use of it.

4.5.3.1 HUnit licence

HUnit is Copyright (c) Dean Herington, 2002, all rights reserved, and is distributed as free software under the following license.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.




5. Conclusions

Swish is very much a work-in-progress, and the present release is a first step along a path with many possible options for future developments.

The immediate next step is to use the Swish framework in the construction of some special-purpose RDF inference tools. My intent is to revisit my earlier work [19], and learn how that work may be served by using Haskell in place of a packaged RDF inference tool.

The Swish code itself is far from perfect, and there is much additional functionality and improvement that could be added. But it does pass an extensive array of tests, and I believe it is sufficiently stable and functional for this initial release.

The software distribution contains a file named TODO.TXT, which lists a number of specific possible enhancements that have been identified to date.




6. Acknowledgements

I would like to thank the following, whose previous work has been most helpful to me (though, of course, they bear no responsibility for the failings of my work):

This document has been authored in XML using the format described in RFC 2629 [3], and converted to HTML using the XML2RFC utility developed by Marshall Rose (http://xml.resource.org/).




References

[1] Lassila, O. and R. Swick, "Resource Description Framework (RDF) Model and Syntax Specification", W3C Recommendation rdf-syntax, February 1999.
[2] Brickley, D. and R. Guha, "Resource Description Framework (RDF) Schema Specification 1.0", W3C Candidate Recommendation CR-rdf-schema, March 2000.
[3] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June 1999.
[4] Berners-Lee, T., "Notation3: Logic and Rules on RDF", Design Issues Ideas about Web Architecture - yet another notation, 1998.
[5] Berners-Lee, T., "Cwm (closed world machine)", September 2002.
[6] Carroll, J., "Matching RDF Graphs", July 2001.
[7] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", May 2003.
[8] "World Wide Web Consortium: RDFcore working group".
[9] Hayes, P., "RDF Semantics", W3C WD WD-rdf-mt-20021112, November 2002 (HTML).
[10] Klyne, G. and J. Carroll, "Resource Description Framework (RDF): Concepts and Abstract Syntax", W3C WD WD-rdf-concepts-20021108, November 2002 (HTML).
[11] "Haskell community web site".
[12] "Hugs online: Hugs98 web site".
[13] "The Glasgow Haskell Compiler (GHC) web site".
[14] Leijen, D., "Daan online: Parsec", Oct 2001.
[15] Herington, D., "HUnit - Haskell Unit Testing", Feb 2002.
[16] Hinze, R., "A library of sorting routines", Apr 2002.
[17] Hughes, J., "Why Functional Programming Matters", 1984.
[18] Thompson, S., "Haskell: The Craft of Functional Programming, Second Edition", Addison-Wesley ISBN 0-201-34275-8, 1999.
[19] Klyne, G., "Using RDF for Home Network Configuration", December 2002.
[20] Brickley, D. and K. Sharp, "European Semantic Web Advanced Development", 2002.
[21] Miller, L., "SWAD-Europe: Developer Workshop Report 2 - Semantic Web calendaring", 2002.



Author's Address

  Graham Klyne
  Nine by Nine
  14 Chambrai Close
  Appleford
  Abingdon, Oxon OX14 4NT
  UK
Phone:  +44 1235 848491
Fax:  +44 1235 848562
EMail:  GK-swish@ninebynine.org
URI:  http://www.ninebynine.net/



Appendix A. Revision history

2003-05-30:
  • Document initially created.




Appendix B. Unresolved issues




Appendix C. CVS revision log

$Log: swish-0.1.html,v $
Revision 1.3  2003/06/03 16:22:57  graham
Typo fixes to web site CVS

Revision 1.6  2003/06/03 16:17:44  graham
Fix another typo

Revision 1.5  2003/06/03 11:31:13  graham
Fix typos in documentation

Revision 1.4  2003/05/31 00:11:21  graham
Fix various typos and omissions

Revision 1.3  2003/05/30 19:12:57  graham
Fixed some document typos and added RDF semantics reference

Revision 1.2  2003/05/30 18:37:25  graham
First formatted version of Swish documentation

Revision 1.1  2003/05/30 16:41:22  graham
Swish documentation, initial version.