Putting Semantics in the Semantic Web

Building a meaningful basis for trust

[[[Work in progress]]]

This note discusses work undertaken by W3C's RDF Core Working Group, and others, to define a formal semantics to underpin the Semantic Web, and indicates why this is significant for security-related applications.

The Semantic Web

"With a name like yours, you might be any shape, almost"
- Humpty Dumpty in Through the Looking Glass, Lewis Carroll

Designing and creating a "Semantic Web" is an activity of the World Wide Web Consortium (W3C) [1] [2], driven, in part, by elements of Tim Berners-Lee's original vision for the World Wide Web that have not yet been fully realized [3] [4].

The present-day World Wide Web provides primarily for communication between people, in the form of human-readable web pages; most of the information thus conveyed is opaque to applications. Automation of business processes, the rise of Web Services and business-to-business (B2B) automation across the Internet require that information be accessible to computer applications as well as people.

Networked computer applications exchange data, which is just a collection of bits and bytes. Some common conventions mean that some kinds of data can be assigned a common interpretation: character encodings to interpret bit sequences as characters, syntactical conventions to interpret certain character sequences as numbers, etc. But by themselves, these conventions allow us to convey only a limited set of values, with no well-founded rules for saying anything about what they actually mean. We need to go beyond this to form a common basis for exchanging information, one that allows new information to be deduced whose validity can be confirmed by all involved parties (humans and computers).

For example, if a transaction requires payment of €100, and the exchange rate is 0.86 dollars per euro, then the dollar amount due is $86. As human readers, we can follow this if we share common notions of different monetary currencies and exchange rates and how these are related in the context of a monetary transaction. The Semantic Web aims to capture such relationships in a way that is amenable to automatic processing and validation by computer.

What's in a Name?

"That which we call a rose, by any other name..."
- Romeo and Juliet, William Shakespeare

A language is a tool for communicating information. Such communication depends on some common interpretation of the language and words used. In the realm of trust and security this can lead to problems if one party can take advantage of assumptions made by another party.

With classical security techniques (e.g. authentication and encryption using cryptography), an identifying token is bound to possession of a secret. What assurance can someone have that the user of such a token is the person or entity that they think it is, or that it has the ability to deliver on promises contained in an electronically negotiated contract? Typically, some separate mechanism (which may not be computer-based, or may involve the use of a certificate assigned by some third party) is used to establish a relationship between possession of a secret key and some other desired capability or authority. Possession of secret information can be a fickle matter, and such systems may be fragile, and have proved difficult to deploy at Internet scale.

Also, note that the exact nature of an agreement is not always easy to pin down. Legal contracts often have provision for interpretation by a court of law, and specify a jurisdiction with respect to which they should be interpreted. Thus, we see that even use of a commonly understood language is not enough: there are also some "rules of engagement" that must be agreed to give some reasonable assurance of a specific desired interpretation.

Humans are well adapted to dealing with such uncertainties. Computers are not. To capture the rich information that is the basis for informed trust requires a very flexible representational form and access to a large, maybe unbounded, vocabulary of terms for representing the essential concepts in an agreement. With such rich expressive capability come very many opportunities for misunderstanding, which in computer controlled systems will manifest as system faults.

Formal semantics: some history

"... by standing on the shoulders of giants"
- Sir Isaac Newton, in a letter to Robert Hooke

Much effort to formalize meaning builds on Tarski's work [10] in semantics and formal logic, in particular the ideas of Model Theory, which have been adopted by some in the artificial intelligence (AI) community for representing knowledge about the physical world [11]. Some of these ideas are also found in work to formalize semantics of programming languages [12]. Much of formal semantics is built using first order predicate logic [20], or some close variant, though some other approaches have been tried [13].

More recently, a strand of the knowledge representation work has evolved to create description logics [14] [15], using logical expressions to codify ontologies: to classify (describe) kinds of things and form a basis for reasoning about the relationships between them.

The World Wide Web and knowledge representation communities came together in the DARPA Agent Markup Language (DAML) project [16], whose goal was to design an XML-based language that could be used for exchanging descriptions of entities between software agents. The resulting DAML+OIL specification, which also incorporates elements of OIL (part of the European Union On-To-Knowledge project [17]), was published recently [18]. The DAML+OIL work is being used as the starting point for a W3C Web Ontology Language design effort [25].

Underlying this DAML+OIL work is W3C's Resource Description Framework (RDF). This was initially a project to standardize a format for machine-readable metadata attached to Web documents, and the original RDF specification was published in 1999 [5]. Since then, a number of issues have come to light with that specification, one of which was the lack of a formal semantics. W3C's RDF Core working group [6] has been working to resolve these issues within the spirit of the original specification.

NOTE: although it draws upon work from the AI community, artificial intelligence is not a goal of the Semantic Web and RDF. (If only because AI is such a difficult term to pin down: Arthur C. Clarke once proposed that any sufficiently advanced technology is indistinguishable from magic; I have come to consider that any sufficiently advanced algorithm for computation may be indistinguishable from intelligence.)

Model theory

"When I use a word, it means what I choose it to mean - neither more nor less."
- Humpty Dumpty in Through the Looking Glass, Lewis Carroll

Whether a computer, or even a person, can ever truly know what is meant by a statement is a philosophical point that can absorb many hours of debate. Fortunately, for our purposes, that does not matter: our judgements about meaning will be based on entailment, which is a logically tractable concept.

When expressing a statement, we may not know what the words in that statement stand for, but we presume they stand for something. If we make another statement in the same context using some of the same words, we presume they stand for the same things. When we say that some collection of statements P entails another statement S, we mean that whenever the statements P are true for a given interpretation of the words they contain, statement S is also true for that same interpretation.

Let us expand on the simple trading example introduced earlier: a trader's catalog may offer some item for sale at a price of 100 euros. Separately, the trader's terms and conditions of sale may specify that an exchange rate of 0.86 dollars per euro may be applied. Putting these statements together, we may conclude that the sale price is 86 dollars. We expect "dollars" here to mean the unit of US currency, and "euro" to mean the unit of European currency:

From: (sale price €100) AND (€/$ exchange rate 0.86), conclude: sale price $86.

But that is just one possible interpretation: if "euro" denotes "apple" and "dollar" denotes "orange", then:

From: (sale price 100 apples) AND (apple/orange exchange rate 0.86), conclude: sale price 86 oranges.
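The point that the conclusion does not depend on what the words denote can be made concrete in a small sketch. The following Python fragment is illustrative only: the Amount and ExchangeRate types, the convert rule and the two interpretation tables are hypothetical names introduced here, not part of RDF or of any trading vocabulary. The rule manipulates "euro" and "dollar" purely as symbols, so the single conclusion it derives can afterwards be read under either interpretation.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Amount:
    value: Decimal
    unit: str  # an uninterpreted symbol such as "euro"; the rule never asks what it denotes


@dataclass(frozen=True)
class ExchangeRate:
    source: str    # unit symbol converted from
    target: str    # unit symbol converted to
    rate: Decimal  # number of target units per source unit


def convert(price: Amount, rate: ExchangeRate) -> Amount:
    """From (sale price in source units) AND (exchange rate), conclude (sale price in target units)."""
    # The only check made on the unit symbols is that they match.
    if price.unit != rate.source:
        raise ValueError(f"rate converts {rate.source!r}, not {price.unit!r}")
    return Amount(price.value * rate.rate, rate.target)


conclusion = convert(
    Amount(Decimal("100"), "euro"),
    ExchangeRate("euro", "dollar", Decimal("0.86")),
)

# Two hypothetical interpretations of the same symbols.
currencies = {"euro": "euros", "dollar": "US dollars"}
fruit = {"euro": "apples", "dollar": "oranges"}
for interpretation in (currencies, fruit):
    print(conclusion.value, interpretation[conclusion.unit])
# 86.00 US dollars
# 86.00 oranges
```

Decimal is used rather than binary floating point so that the arithmetic is exact; the design point is that convert only ever compares unit symbols for equality, which is why its conclusion is equally valid under both readings.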

The condition of entailment between a set of initial statements and some conclusion means that, according to the basic rules of the language used, the truth of the conclusion follows from the truth of the initial statements under any possible interpretation of the words.

The above example has focused on the interpretation of just two words, "euro" and "dollar", and has assumed that the meaning of all the other words is understood. In a really general purpose framework for conveying information, this idea has to be extended to cover other terms used in the expression; e.g. "sale price", "exchange rate" in the above example. This in turn calls for a very careful distinction to be drawn between the inherent structure of a language, the terms of that language, and the (real-world) things denoted by those terms. For natural languages (such as English or French) this is notoriously difficult to achieve. But for the languages (or notations, or data structures) used by computers, it is a realistic goal, and Model Theory is a branch of mathematical logic that shows us how to achieve it.

Roughly, model theory allows an expression to circumscribe a set of possible worlds, in the sense that it constrains the interpretations (the assignment of values to terms in the expression) that are considered to be valid world descriptions, or models of that expression. For a given language, an interpretation assigns values to the terms of an expression in that language; the language semantics define rules for evaluating any language expression to be true or false under some interpretation; interpretations which evaluate to true are considered to be models of the expression, i.e. they correspond to a possible world described by the expression.

The expression P in a language is said to entail expression S if every model (possible world) of P is also a model (possible world) for S. That is, there is no interpretation for which P is true and S is false. (Note: this idea of entailment does not, of itself, do anything to help us to deduce new facts in some language; rather, it provides a basis for deciding if a given rule of deduction is valid.)
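These definitions can be tried out mechanically on a toy language. The sketch below is an illustration under stated assumptions, not RDF's model theory: an expression is a set of ground atoms such as price(item, itemPrice), read as their conjunction; an interpretation assigns each term an element of a small finite domain and each predicate a set of pairs of elements; the models of an expression are the interpretations that make it true; and entailment is checked by confirming that every model of P is also a model of S. Real entailment quantifies over every interpretation on every domain, so checking a single two-element domain illustrates the definition rather than providing a decision procedure. All term, predicate and function names are hypothetical.

```python
from itertools import product


def holds(expr, denote, extension):
    """True if every atom (predicate, subject, object) in expr is true under the interpretation.

    denote maps each term to a domain element; extension maps each predicate
    to the set of (subject, object) pairs of domain elements for which it holds.
    """
    return all((denote[s], denote[o]) in extension[p] for p, s, o in expr)


def interpretations(terms, predicates, domain):
    """Yield every interpretation over one fixed finite domain (illustration only)."""
    pairs = list(product(domain, repeat=2))
    # Every possible extension of a binary predicate: all subsets of domain x domain.
    extensions = [
        frozenset(pair for pair, keep in zip(pairs, mask) if keep)
        for mask in product((False, True), repeat=len(pairs))
    ]
    for values in product(domain, repeat=len(terms)):
        denote = dict(zip(terms, values))
        for chosen in product(extensions, repeat=len(predicates)):
            yield denote, dict(zip(predicates, chosen))


def entails(P, S, terms, predicates, domain):
    """P entails S if every model of P (an interpretation making P true) is also a model of S."""
    return all(
        holds(S, denote, ext)
        for denote, ext in interpretations(terms, predicates, domain)
        if holds(P, denote, ext)
    )


terms = ["item", "itemPrice", "amount"]
predicates = ["price", "inDollars"]
domain = [0, 1]

P = {("price", "item", "itemPrice"), ("inDollars", "itemPrice", "amount")}
S = {("price", "item", "itemPrice")}

print(entails(P, S, terms, predicates, domain))  # True: S is part of the conjunction P
print(entails(S, P, terms, predicates, domain))  # False: some model of S makes inDollars false
```

Note that entails never deduces anything; it only compares sets of models, which matches the remark above that entailment is a yardstick for rules of deduction rather than a rule itself.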

Model theory is not an obvious topic to get the hang of, but fortunately most users of a computer language don't need to understand its corresponding model theory, as long as the results they use are justified by that theory. This is similar to the way that most people know the rules for adding and subtracting numbers, without having to understand the underlying mathematical number theory that validates those rules.

Here are a few ideas of Model Theory it may be useful to be aware of:

language (notation):: a system of symbols and rules for communicating information or expressing instructions. The rules can be divided into syntax, which defines the arrangements of symbols considered to be valid expressions in the language, and semantics which define the intended meaning of such expressions.
expression (well-formed-formula, wff):: an expression is any arrangement of language symbols that conforms to the syntax rules for the language.
domain of discourse (universe):: a universe, or world, of entities and concepts to which a language may be applied. Commonly, this will be the "real world" in which we live, or some simplified idealization of it.

term (word, name):: a symbol of a language which may be interpreted as referring to something or some concept in a domain of discourse.
interpretation:: an assignment of some entities or concepts in a domain of discourse to the terms in an expression.
denotation (of a term):: a thing or value in the domain of discourse assigned to the term by an interpretation.

model (of an expression):: an interpretation for which the expression is true, according to the semantic rules of the language.
entailment:: a relationship between two truth-valued expressions: A entails B if under any interpretation for which A is true, B is also true. This can be understood as indicating that all of the possible worlds described by A are also possible worlds according to B, or all the models of A are also models of B.

This is necessarily a very cursory treatment of model theory. For more information see [19] [20] [21] [22]. Pat Hayes' model theory for RDF [7] is presented with some consideration for a reader not already familiar with the ideas.

Semantics of RDF

"It looks terrible, but in fact it isn’t"
- Mao Tse-tung, speaking of the atom bomb.

One work in progress of the RDF Core Working Group is a formal semantics for RDF [7], which is principally the work of Pat Hayes, a leader in the AI community and long-time proponent of logic as a basis for representing knowledge [11].

RDF is, by design, a very simple language, so a formal semantics for RDF is somewhat less onerous than for, say, a typical computer programming language. The basic expressive capability of RDF is comparable with that of the tables in a relational database; i.e. the assertion of simple "ground" facts. This essential simplicity is tempered by the desire that RDF can be a foundation for a family of languages with increasing expressive power, up to and including the full expressive power of first order logic.
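To illustrate the comparison with relational tables, here is a minimal sketch, assuming a hypothetical catalog table and a hypothetical namespace http://example.org/catalog#: each non-key cell of a row becomes one ground fact, written as a (subject, predicate, object) triple, with the row key supplying the subject and the column name supplying the predicate. This is one simple mapping among many possible ones, not a standard.

```python
# Hypothetical namespace and table, for illustration only.
EX = "http://example.org/catalog#"

rows = [
    {"id": "item1", "name": "Widget", "price": "100"},
    {"id": "item2", "name": "Gadget", "price": "250"},
]


def row_to_triples(row, key="id"):
    """Turn each non-key cell into one ground fact: (row URI, column URI, literal value)."""
    subject = EX + row[key]
    return [(subject, EX + column, value) for column, value in row.items() if column != key]


triples = [triple for row in rows for triple in row_to_triples(row)]
for triple in triples:
    print(triple)
```

Every fact produced this way is a simple assertion about named things; nothing in the result says, for instance, that every item must have a price, which is the kind of statement the richer languages built on RDF are intended to add.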

RDF is based on a directed graph data-model (not to be confused with a model in the model-theoretic sense), in which the graph nodes denote objects or things, and the directed arcs denote relationships between those things. Nodes of the graph may be labeled with URI references [9] or literal strings, or may be unlabeled; arcs are always labeled with URI references. The official syntax of RDF uses XML to describe this graph model; the syntax and the underlying graph model are introduced in W3C's RDF Model and Syntax Specification [5]. It is quite easy to design an XML-based markup language to be RDF-compatible by following the "striping" pattern [8], in which node names alternate with property names in the nesting of XML elements.

Example of RDF graph data-model:

[Item] --price--> [ItemPrice] --inDollars--> "100"
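The striping pattern can be sketched by building RDF/XML for this example graph with Python's standard xml.etree.ElementTree module. The rdf namespace URI is the one defined for RDF/XML; the ex vocabulary namespace and the item URI are hypothetical, and [ItemPrice] is treated here as an unlabeled node (it could equally be given a URI with rdf:about). Reading from the outside in, node elements (rdf:Description) and property elements (ex:price, ex:inDollars) alternate.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms#"  # hypothetical vocabulary namespace
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)


def node_element(parent, uri=None):
    """A node stripe: rdf:Description, with rdf:about if the node has a URI (unlabeled otherwise)."""
    attrib = {f"{{{RDF}}}about": uri} if uri else {}
    return ET.SubElement(parent, f"{{{RDF}}}Description", attrib)


def property_element(parent, name):
    """A property stripe: an arc leaving the enclosing node element."""
    return ET.SubElement(parent, f"{{{EX}}}{name}")


root = ET.Element(f"{{{RDF}}}RDF")
item = node_element(root, "http://example.org/item/1")  # [Item], hypothetical URI
price = property_element(item, "price")                # --price-->
item_price = node_element(price)                        # [ItemPrice], unlabeled here
property_element(item_price, "inDollars").text = "100"  # --inDollars--> "100"

print(ET.tostring(root, encoding="unicode"))
```

The printed XML is a single line in which rdf:Description and the ex: property elements alternate as the nesting deepens; a literal object simply appears as the text content of its property element.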

Informally, the meaning of an RDF graph is the conjunction of the statements that it contains; that is, the possible worlds that it describes are those in which all of the statements are true. Individual statements are represented by arcs in the graph, interpreted as dyadic predicates. So the arc labelled price in the above example asserts that price(Item,ItemPrice) is true, where Item denotes some item for sale and ItemPrice denotes a value that will be accepted in exchange for it. Similarly, the arc labelled inDollars asserts that inDollars(ItemPrice,"100") is true. Thus, the models (possible worlds) described by the above RDF graph are all of those for which both assertions are true.
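The conjunctive reading can be written out directly. In this sketch the helper names are hypothetical and the node labels are those of the example graph: each arc becomes a dyadic predicate, the graph becomes the conjunction of those predicates, and a possible world, simplified here to the set of facts that hold in it, is described by the graph only when every arc is among those facts.

```python
# The example graph as a list of (subject, predicate, object) triples.
graph = [
    ("Item", "price", "ItemPrice"),
    ("ItemPrice", "inDollars", '"100"'),
]


def as_formula(graph):
    """Read each arc as a dyadic predicate and the whole graph as their conjunction."""
    return " AND ".join(f"{p}({s}, {o})" for s, p, o in graph)


def true_in(graph, world):
    """A world (simplified to a set of facts) is described by the graph iff every arc holds in it."""
    return all(arc in world for arc in graph)


print(as_formula(graph))  # price(Item, ItemPrice) AND inDollars(ItemPrice, "100")

richer_world = set(graph) | {("Item", "colour", '"red"')}  # extra facts do no harm
partial_world = {("Item", "price", "ItemPrice")}            # one arc is missing
print(true_in(graph, richer_world))   # True
print(true_in(graph, partial_world))  # False
```

Because the reading is a plain conjunction, the order in which arcs are listed makes no difference, and adding facts to a world never stops the graph from describing it.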

Application to trust modelling

"There is no trusting appearances"
- School for Scandal, Richard Brinsley Sheridan

Taking a broad (and diligent) view, establishing trust between parties involves the consideration of a wide range of information. Not only simple matters like "if I commit to payment, will this supplier provide the goods?" (which might be underwritten by a third party), but more complex issues like "if I place my order with this supplier, for delivery in six months' time, do I run a risk that they will fail to deliver and delay my project?". That is, establishing trust is not simply a matter of avoiding fraudulent deals, but also of ensuring quality and timeliness of supply. It is not always a simple binary decision, but a risk assessment based on evaluation of diverse information.

The details of trust modelling can be quite involved, as exemplified by a range of current work [26] [27], but irrespective of those details a mechanism is needed for exchanging arbitrary information that can safely be used to make decisions about trust.

In a simple bilateral trading arrangement, the trading partners will usually rely on some process of "due diligence", each evaluating the suitability of the other for the purposes of entering into a contract. But with business on the Internet, there may be hundreds, or thousands, or tens of thousands of potential trading partners. Over the years, we have learned that one of the greatest challenges of making things work on the Internet is dealing with the massive scales involved; to really benefit from this kind of scaling, as much processing as possible needs to be automated, and as much as possible of this processing should be handled at the edge of the network, by the parties involved.

To automate the process of trust establishment, a firm foundation is needed for exchanging not only data, but information that is meaningful in the sense that the rules for combining information and drawing new conclusions are well defined. Given these rules and agreement on some basic assertions, certain conclusions may be drawn that are beyond dispute. The relationship of entailment corresponds to this notion of an indisputable conclusion, and formal semantics (model theory) provides the basis for validating the rules whereby such conclusions can be discovered or proved.

Conclusions

"The end of our foundation is the knowledge of causes"
- Francis Bacon in New Atlantis

RDF is a language for expressing basic facts, and its model theory provides a sound basis for evaluating entailment using some assumed rules of inference. DAML+OIL has taken this further and provides mechanisms for describing and reasoning about the relationship between described entities; in particular, DAML+OIL rules can sometimes be used to prove that two differently-named values are in fact the same, a simple conclusion that turns out to be surprisingly powerful in some applications.
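The kind of same-identity conclusion mentioned above can be sketched as follows. DAML+OIL provides daml:UnambiguousProperty for properties whose value identifies at most one subject; the sketch applies that idea to plain triples, but it is a hand-written illustration, not a DAML+OIL reasoner, and the ex: names and the mailbox address are hypothetical. In a trust setting, such a conclusion allows information recorded under one name, such as a rating, to be applied to the other.

```python
def same_individuals(triples, unambiguous):
    """Infer pairs of differently-named subjects that must denote the same thing.

    For an unambiguous (inverse-functional) property p, at most one subject can
    have a given value, so (s1 p o) and (s2 p o) together imply s1 = s2.
    """
    first_subject = {}  # (property, value) -> first subject seen with that value
    inferred = set()
    for s, p, o in triples:
        if p not in unambiguous:
            continue
        earlier = first_subject.setdefault((p, o), s)
        if earlier != s:
            inferred.add(frozenset((earlier, s)))
    return inferred


facts = [
    ("ex:supplierA", "ex:mailbox", "mailto:sales@example.com"),
    ("ex:acmeTrading", "ex:mailbox", "mailto:sales@example.com"),
    ("ex:acmeTrading", "ex:rating", "good"),
]
print(same_individuals(facts, unambiguous={"ex:mailbox"}))
```

If three or more subjects share a value, each is paired with the first one seen; the remaining equalities follow by transitivity, so no pairs are lost.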

Thus, RDF is an important step toward automated evaluation and establishment of trust, by virtue of providing a means to exchange arbitrary information to be used as a basis for automated reasoning. Because this capability is not specific to trust, results from work across the range of Semantic Web activities can also be used to support trust assessment.

But there is still work to do. As yet, there is no universally accepted way to express in RDF that statements may be unreliable, or contingent on some other facts or assumptions, both of which seem to be quite important in the evaluation of trust. This is an area of some ongoing research [23]. Also, there is ongoing work [24] [27] to describe and formalize trust relationships, which would need to be transferred to RDF [28].

References

"They are the ground, the books, the academes"
- Love's Labour's Lost, William Shakespeare

[1] World Wide Web Consortium (W3C), http://www.w3.org/.

[2] W3C Semantic Web Activity, http://www.w3.org/2001/sw/.

[3] Tim Berners-Lee, James Hendler and Ora Lassila, The Semantic Web, Scientific American, May 2001, http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html.

[4] Dan Brickley, Semantic Web History: Nodes and Arcs, 1989-1999; The WWW Proposal and RDF, November 1999, revised May 2001, http://www.w3.org/1999/11/11-WWWProposal/.

[5] Ora Lassila and Ralph R. Swick, Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation, February 1999, http://www.w3.org/TR/REC-rdf-syntax.

[6] W3C RDF Core working group, http://www.w3.org/2001/sw/RDFCore/.

[7] Patrick Hayes, RDF Model Theory, (work in progress) January 2002, http://www.w3.org/TR/rdf-mt/.

[8] Dan Brickley, RDF: Understanding the Striped RDF/XML Syntax, October 2001, http://www.w3.org/2001/10/stripes/.

[9] Tim Berners-Lee, Roy Fielding and Larry Masinter, Uniform Resource Identifiers (URI): Generic Syntax, IETF RFC 2396, August 1998, http://www.rfc-editor.org/rfc/rfc2396.txt.

[10] Alfred Tarski, The Semantic Conception of Truth, Philosophy and Phenomenological Research 4, 1944, http://www.ditext.com/tarski/tarski.html.

[11] Patrick Hayes, In Defense of Logic, Proceedings of the International Joint Conference on Artificial Intelligence 1975, Morgan Kaufmann 1977.

[12] Dana Scott and Christopher Strachey, Toward a Mathematical Semantics for Computer Languages, Technical Monograph PRG-6, Oxford University Computing Laboratory, Programming Research Group, August 1971.

[13] John Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Thomson Learning, 2000.

[14] W. Woods and J. Schmolze, The KL-ONE Family, Computers and Mathematics with Applications, special issue: Semantic Networks in Artificial Intelligence, Vol. 23, No. 2-5, pp. 133-177, 1992.

[15] Alexander Borgida, Ronald J. Brachman, Deborah L. McGuinness and Lori Alperin Resnick, CLASSIC: A Structural Data Model for Objects, Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, 1989. Abstract: http://portal.acm.org/citation.cfm?id=66932.

[16] DAML, The DARPA Agent Markup Language Homepage, http://www.daml.org/.

[17] On-To-Knowledge home page, http://www.ontoknowledge.org/. (See also link to OIL page http://www.ontoknowledge.org/oil/index.shtml.)

[18] Dan Connolly, Frank van Harmelen, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider and Lynn Andrea Stein, DAML+OIL (March 2001) Reference Description, W3C NOTE, December 2001, http://www.w3.org/TR/daml+oil-reference. (See also: http://www.daml.org/2001/03/daml+oil-index, http://www.w3.org/TR/daml+oil-model, http://www.w3.org/TR/daml+oil-axioms, http://www.w3.org/TR/daml+oil-walkthru/.)

[19] Formal Systems - Definitions (a one-page summary from Ruth E. Davis, Truth, Deduction, and Computation, New York: Computer Science Press, 1989), http://www-rci.rutgers.edu/~cfs/305_html/Deduction/FormalSystemDefs.html.

[20] Geoffrey Hunter, Metalogic: An Introduction to the Metatheory of Standard First Order Logic, University of California Press, 1971, ISBN 0-520-02356-0.

[21] [[[standard, readily available text on logic, formal systems, model theory?]]]

[22] William Weiss and Cherie D'Mello, Fundamentals of Model Theory, 1997, Full text online at: http://www.math.toronto.edu/~weiss/.

[23] Graham Klyne, Contexts for RDF Information Modelling, (informal note) October 2000, http://www.ninebynine.org/RDFNotes/RDFContexts.html.

[24] The Trusted e-Services Partnership, http://www.bitd.clrc.ac.uk/Activity/e-Trust.

[25] W3C, Web Ontology (WebOnt) Working Group, http://www.w3.org/2001/sw/WebOnt/.

[26] T. Grandison and M. Sloman, A Survey of Trust in Internet Applications, in IEEE Communications Surveys and Tutorials, Fourth Quarter 2000.

[27] Theo Dimitrakos and Juan Bicarregui, Towards a Framework for Managing Trust in e-Services, in Proceedings of the 4th International Conference on Electronic Commerce Research, Dallas, Texas, USA, November 2001. ISBN 0-9716253-0-1, http://www.bitd.clrc.ac.uk/PublicationAbstract/1377.

[28] Theo Dimitrakos, Brian Matthews and Juan Bicarregui, Towards supporting security and trust management policies on the Web, CLRC Rutherford Appleton Laboratory, Oxfordshire, OX11 0QX, UK, http://www.bitd.clrc.ac.uk/PublicationAbstract/1369.