A brief introduction to RDF

This note is intended as a one-page introduction to RDF, aimed at software developers and technically oriented people who have some familiarity with XML and with data structures in computer programs.

Background

Broadly, RDF maps a Directed Labelled Graph (DLG) data model onto XML. The graph contains nodes, and arcs that connect the nodes. The graph nodes and arcs are labelled using URIs [3]. A relational data model is also easily mapped onto this form, with a node corresponding to a table row or primitive value, and an arc corresponding to a column identifier. (For comparison, the data model that underlies XML is an annotated tree, with XML elements corresponding to nodes of the tree, and XML attributes being annotations applied to some node.)
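To make the relational mapping concrete, here is a minimal Python sketch that turns one table row into (node, arc, value) triples. The function name row_to_triples, the example.org URIs and the scheme used to build URIs from table and column names are all assumptions made for illustration; RDF itself does not prescribe them.

```python
def row_to_triples(table_uri, key_column, row):
    """Map one table row onto (node, arc, value) triples.

    The row becomes a node, each column name becomes an arc label,
    and each cell becomes a literal value at the end of that arc.
    The URI construction below is a hypothetical naming scheme.
    """
    subject = f"{table_uri}/{row[key_column]}"       # node for the row
    return [
        (subject, f"{table_uri}#{column}", value)    # arc for the column
        for column, value in row.items()
        if column != key_column
    ]


row = {"id": 42, "name": "Alice", "city": "Oxford"}
triples = row_to_triples("http://example.org/people", "id", row)
for triple in triples:
    print(triple)
```

Each triple is one arc of the graph, so a whole table becomes a set of arcs fanning out from one node per row.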

This graph-based model for RDF data is very flexible, deriving indirectly from a long history of research into knowledge representation (Semantic Nets, Conceptual Graphs, etc.). It allows arbitrarily complex data to be represented in a form with a very simple, uniform underlying structure, making it potentially accessible to a range of general-purpose RDF processing tools. It can also be easily applied to much more prosaic purposes.

RDF/XML syntax

The RDF syntax defines a number of forms for XML that are easily mapped onto the DLG structure. The basic RDF syntax can be rather cumbersome, but if one assumes schema-aware RDF processing (or a special-purpose processor), it is possible to design a reasonably neat RDF-compatible form of XML for a typical application.

RDF does allow XML attributes to be used for arc names, but the syntax can get very confusing. I'd suggest designing the format with just XML elements initially, then mapping that to an attributes-as-arcs form later. Hopefully, the result will be very close to an XML format one might design for a specific application, with the possible benefit of a more regular structure and usability by generic RDF processing software.
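As a rough illustration of moving from an elements-only design to an attributes-as-arcs form, the following Python sketch rewrites literal-valued arc elements as XML attributes. The function name literals_to_attributes and the Person/name/knows/nick vocabulary are hypothetical. The sketch assumes an un-namespaced document and only collapses arcs that occur once on a node, since an XML attribute cannot be repeated.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def literals_to_attributes(node):
    """Rewrite literal-valued arc elements of `node` as attributes, in place."""
    counts = Counter(arc.tag for arc in node)
    for arc in list(node):                    # copy: we remove while looping
        if len(arc) > 0:
            # The arc leads to nested node(s): keep it and recurse.
            for child in arc:
                literals_to_attributes(child)
        elif counts[arc.tag] == 1 and not arc.attrib and arc.tag not in node.attrib:
            # A single literal-valued arc can become an attribute.
            node.set(arc.tag, (arc.text or "").strip())
            node.remove(arc)
    return node


# Hypothetical element names, used only to exercise the function.
doc = ET.fromstring(
    "<Person>"
    "<name>Alice</name>"
    "<knows><Person><name>Bob</name><nick>Bobby</nick><nick>Rob</nick></Person></knows>"
    "</Person>"
)
literals_to_attributes(doc)
print(ET.tostring(doc, encoding="unicode"))
```

The repeated nick arcs stay as elements, which is one reason the element form is the easier starting point for a design.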

RDF/XML syntax can use XML elements to name both graph nodes and arcs, so one tends to end up with an alternating nesting structure:

<Node-type-1 about="Node-name-1">
   <arc-label-1>
     <Node-type-2 about="Node-name-2">
       <arc-label-2>
       value
       </arc-label-2>
        :
     </Node-type-2>
   </arc-label-1>
   <arc-label-3>
     <Node-type-3 about="Node-name-3">
      :
     </Node-type-3>
   </arc-label-3>
    :
</Node-type-1>

Representing:

 [Node-name-1]
   |
   +--rdf:type-----> [Node-type-1]
   +--arc-label-1--> [Node-name-2]
   |                  |
   |                  +--rdf:type-----> [Node-type-2]
   |                  +--arc-label-2--> "value"
   |                  :
   |
   +--arc-label-3--> [Node-name-3]
   |                  |
   |                  +--rdf:type-----> [Node-type-3]
   |                  :
   :

Where:

[x]: denotes a resource named 'x' (e.g. a table row),
"foo": denotes a literal value (e.g. a column value in a row), and
--y-->: denotes a property named 'y' (e.g. a column name).
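The alternation between node elements and arc elements can be read back into graph arcs with a short recursive walk. The Python sketch below is not a conforming RDF parser: it assumes the un-namespaced 'about' attribute used in the outline above, ignores XML namespaces, and invents a name of the form "_:anonN" for any node without an 'about' attribute. The function name read_node and the ("literal", text) wrapper are illustrative choices.

```python
import itertools
import xml.etree.ElementTree as ET

SAMPLE = """
<Node-type-1 about="Node-name-1">
  <arc-label-1>
    <Node-type-2 about="Node-name-2">
      <arc-label-2>value</arc-label-2>
    </Node-type-2>
  </arc-label-1>
  <arc-label-3>
    <Node-type-3 about="Node-name-3"/>
  </arc-label-3>
</Node-type-1>
"""


def read_node(node_elem, triples, ids):
    """Treat node_elem as a graph node and return the node's name.

    Appends (subject, arc, object) tuples to `triples`. Literal objects
    are wrapped as ("literal", text) to tell them apart from node names.
    """
    name = node_elem.get("about") or f"_:anon{next(ids)}"
    triples.append((name, "rdf:type", node_elem.tag))
    for arc_elem in node_elem:                # children of a node are arcs
        nested = list(arc_elem)
        if nested:                            # children of an arc are nodes
            for child in nested:
                target = read_node(child, triples, ids)
                triples.append((name, arc_elem.tag, target))
        else:                                 # no nested node: literal value
            text = (arc_elem.text or "").strip()
            triples.append((name, arc_elem.tag, ("literal", text)))
    return name


triples = []
read_node(ET.fromstring(SAMPLE), triples, itertools.count(1))
for triple in triples:
    print(triple)
```

Each element name is used either as a node type or as an arc label depending only on its nesting depth, which is what makes the structure regular enough for generic tools.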

Example: email message headers

(This relates to a proposal for representing RFC822 email messages in XML [4].)

Applying this approach to an email message, I view the headers of the message as providing the arcs of an RDF graph:

[<Message>]
   |
   +--subject---> "(subject)"
   +--date------> "(date)"
   +--comments--> "(comments)"
   +--to--------> [<Address>]
   |                |
   |                +--adrs-->"(email adrs)"
   |                +--name-->"(formal name)"
   |
   +--to--------> [<Address>]
   :

The nodes are either resources (typed and identified objects) or literal strings. The above graph is coded in RDF/XML as follows:

<rdf:RDF xmlns:rdf="...">
  <Message>
    <subject>(subject)</subject>
    <date>(date)</date>
    <comments>(comments)</comments>
    <to>
      <Address>
        <adrs>(email adrs)</adrs>
        <name>(formal name)</name>
      </Address>
    </to>
    <to>
      <Address>
        :
      </Address>
    </to>
     :
  </Message>
</rdf:RDF>
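A document of this shape can be generated from header values with a few lines of Python. This is only a sketch: the function name message_element and the sample header values are made up for illustration, and the enclosing rdf:RDF element with its namespace declaration is deliberately left out.

```python
import xml.etree.ElementTree as ET


def message_element(subject, date, comments, recipients):
    """Build a <Message> node element.

    `recipients` is a list of (adrs, name) pairs; each pair becomes one
    'to' arc leading to an <Address> node.
    """
    message = ET.Element("Message")
    for arc, text in (("subject", subject), ("date", date), ("comments", comments)):
        ET.SubElement(message, arc).text = text      # literal-valued arcs
    for adrs, name in recipients:
        to = ET.SubElement(message, "to")            # arc
        address = ET.SubElement(to, "Address")       # nested node
        ET.SubElement(address, "adrs").text = adrs
        ET.SubElement(address, "name").text = name
    return message


# Sample values are invented for the example.
msg = message_element(
    "Meeting notes",
    "25 Jun 2001",
    "Draft for review",
    [("alice@example.org", "Alice Example"), ("bob@example.org", "Bob Example")],
)
print(ET.tostring(msg, encoding="unicode"))
```

Repeating the 'to' arc once per recipient keeps each address a separate node, matching the graph above.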

Note the informal convention of using a lowercase initial letter for property names, and an uppercase initial letter for node type names.

RDF allows some alternative XML coding forms, which can be intermixed; I've used the example form above as I think it's a reasonably obvious way of using XML for messages when not using RDF.

References

[1]: Resource Description Framework (RDF) Model and Syntax Specification; http://www.w3.org/TR/REC-rdf-syntax
[2]: Resource Description Framework (RDF) Schema Specification 1.0; http://www.w3.org/TR/rdf-schema
[3]: RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax; ftp://ftp.isi.edu/in-notes/rfc2396.txt
[4]: An XML format for mail and other messages; http://search.ietf.org/internet-drafts/draft-klyne-message-rfc822-xml-01.txt
[5]: Why RDF model is different from the XML model; http://www.w3.org/DesignIssues/RDF-XML.html

Other resources

http://www710.univ-lyon1.fr/~champin/rdf-tutorial/: An extended tutorial on RDF.
http://www.w3.org/DesignIssues/: Background to web architecture -- I found these notes put RDF into a wider web perspective.
http://www.w3.org/RDF/: W3C page of links to RDF resources.

For feedback please see: <http://www.ninebynine.org/index.html#Contact>
Last updated: 25-Jun-2001, GK.