Too much data, or not enough?
How are we to deal with the massive volumes of data generated by, and available through, Internet and Web technologies? There is far too much for a single person to use meaningfully, and much of it is biased, of dubious content, or just plain wrong.
yield information not otherwise available
increase the reliability/accuracy of what is discovered
uncover new relationships
The traditional view of computer security is one of locks and keys. But for e-services, the real issues are closer to “Would you buy a used car from this person?”
According to Claude Shannon, information is that which reduces uncertainty, and a goal is to find or deduce this from the raw data available.
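Shannon's idea can be made concrete by measuring uncertainty as entropy, in bits, and treating information as the drop in entropy after an observation. The sketch below uses a made-up scenario (eight equally likely possibilities, of which an observation rules out half); the function name `entropy` is illustrative, not taken from any particular library.

```python
import math

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical scenario: a value could be any of 8 equally likely possibilities.
before = entropy([1 / 8] * 8)         # 3 bits of uncertainty
# An observation rules out half of them; 4 equally likely possibilities remain.
after = entropy([1 / 4] * 4)          # 2 bits of uncertainty
information_gained = before - after   # the observation conveyed 1 bit
print(before, after, information_gained)
```

Raw data that leaves the distribution unchanged carries no information in this sense, however large its volume.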
... without trust there can be no security
What are the various sources of data and information that we can bring together as part of a wider knowledge handling and development strategy?
Both the content and the protocol elements of email transfers convey information.
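To illustrate the split between protocol elements and content, here is a minimal sketch using Python's standard `email` package. The message itself, including the addresses and relay host, is invented for the example.

```python
from email import message_from_string

# A hypothetical message as it might arrive after transfer.
raw = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
From: Alice <alice@example.org>
To: Bob <bob@example.com>
Subject: Meeting notes
Date: Mon, 1 Jan 2001 10:00:00 +0000

The minutes from yesterday's meeting are attached.
"""

msg = message_from_string(raw)

# Protocol elements: addressing and routing metadata, some added in transit.
protocol_info = {
    "sender": msg["From"],
    "recipient": msg["To"],
    "relayed_via": msg.get_all("Received", []),
    "sent": msg["Date"],
}

# Content: what the author actually wrote.
content = msg.get_payload()
```

Even without reading the body, the headers reveal who corresponds with whom, when, and via which relays.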
Again, both the content and the protocol headers convey information. In some cases, the identity of the requester may also be significant.
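For Web access, a similar split applies to an HTTP request. The sketch below parses a hypothetical raw request with `http.client.parse_headers` from the Python standard library; the host names and paths are invented. The optional HTTP `From` header is one place where a requester's identity may appear, though in practice it is rarely sent.

```python
import io
from http.client import parse_headers

# A hypothetical raw request as it might arrive at a Web server.
raw = (
    b"GET /papers/index.html HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"User-Agent: ExampleBrowser/1.0\r\n"
    b"Referer: http://search.example.com/?q=rdf\r\n"
    b"From: alice@example.org\r\n"
    b"\r\n"
)

fp = io.BytesIO(raw)
method, path, version = fp.readline().decode("ascii").split()
headers = parse_headers(fp)  # reads header lines up to the blank line

request_info = {
    "resource": path,                      # what was asked for
    "host": headers["Host"],               # which site
    "client": headers["User-Agent"],       # what software made the request
    "came_from": headers["Referer"],       # the page that linked here
    "requester": headers.get("From"),      # identity, if the client supplied it
}
```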
Local file stores contain valuable data. Protocol-related metadata is not available, but the fact that the data has been stored at all suggests that it has some value.
Applications create or modify data using user-supplied information. If accessible, locally created application data is likely to be very relevant.
Words: word processing, text editors, ...
Numbers: spreadsheet, finance, ...
Pictures: drawings, images, ...
Combinations: database, PIMs, ...
These are components that we might employ to combine information from the various sources identified.
Some pieces are available, but we are still evolving the technologies
The underlying model of RDF is a directed labelled graph. Arcs are labelled with URIs; nodes are identified by URIs or hold literal values such as strings.
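One way to picture this model is as a set of (subject, predicate, object) triples, each triple being one labelled arc between two nodes. The sketch below is a toy in-memory representation, not an RDF library; apart from the Dublin Core namespace, the URIs and the `worksFor` property are hypothetical.

```python
# Each triple is one arc: subject node --predicate--> object node.
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
EX = "http://example.org/"                # hypothetical namespace

graph = {
    (EX + "report42", DC + "creator", EX + "people/alice"),
    (EX + "report42", DC + "title", "Too much data?"),   # object is a literal
    (EX + "people/alice", EX + "terms/worksFor", EX + "org/acme"),
}

def objects(graph, subject, predicate):
    """Follow every arc labelled `predicate` out of the node `subject`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# Follow a two-step path: who created report42, and where do they work?
creators = objects(graph, EX + "report42", DC + "creator")
employers = {org for c in creators
             for org in objects(graph, c, EX + "terms/worksFor")}
```

Because every statement has the same shape, following a path across statements from different sources needs no format-specific code.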
RDF syntax is based on XML. At first sight, the syntax of RDF is complex and confusing, but this is largely due to the way it is presented in the RDF specification. In practice, many intuitive XML formats can be mapped to an RDF-compliant form very easily. At the same time, RDF is both more regular and more flexible than raw XML, which makes it easier to write general-purpose applications that deal with the information content of data as well as its syntax.
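As a rough illustration of how an intuitive XML format might map onto the triple model, the sketch below treats the root element's `about` attribute as the subject and each child element as one property arc. Both the XML format and the property namespace are hypothetical, and this is a simplified mapping rather than the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

# A hypothetical "intuitive" XML description of a document.
xml_text = """<document about="http://example.org/report42">
  <title>Too much data?</title>
  <creator>Alice</creator>
  <date>2001-01-01</date>
</document>"""

VOCAB = "http://example.org/terms/"  # hypothetical property namespace

def xml_to_triples(text, vocab):
    """Map each child element to one (subject, property, literal) triple."""
    root = ET.fromstring(text)
    subject = root.get("about")
    # Element name becomes a property URI; element text becomes a literal value.
    return [(subject, vocab + child.tag, child.text) for child in root]

triples = xml_to_triples(xml_text, VOCAB)
```

Once in this form, the data can be handled by the same generic triple-processing code as any other RDF source.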
"There's a freedom about the Internet: As long as we accept the rules of sending packets around, we can send packets containing anything to anywhere." [Berners-Lee]
The Internet has been the basis of an explosion in communication services -- why?
In the Internet, communication services are defined by the users, the parties who connect to the network, not the network provider
Service provision is driven by user needs and desires
The IP datagram is the basis of all Internet communication services.
An IP datagram can be sent between any pair of endpoints; hence the end-to-end architecture. In theory, and in the absence of firewalls, it is communicated transparently between any pair of Internet-connected systems. (Network address translation is evil because it breaks this important architectural model; hosts subjected to it are second-class citizens on the Internet. Firewalls have a similar effect, but at least they are under some form of administrative control by the end user.)
This approach separates the infrastructure from service: service additions don’t need infrastructure changes.
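To see why the network can stay out of service definition, consider what a router actually examines: a fixed IPv4 header (RFC 791) carrying addresses and a little control information, while the payload is opaque. The sketch below builds and checks such a header with Python's `struct` and `socket` modules; the function names and addresses are illustrative, and the checksum follows the one's-complement method of RFC 1071.

```python
import socket
import struct

def checksum(header: bytes) -> int:
    """Internet checksum: one's-complement of the one's-complement sum of 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src: str, dst: str, payload_len: int, protocol: int = 17) -> bytes:
    """Build a minimal 20-byte IPv4 header with no options (protocol 17 = UDP)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, header length of 5 32-bit words
        0,                     # type of service
        20 + payload_len,      # total length of the datagram
        0, 0,                  # identification; flags and fragment offset
        64, protocol,          # time to live; upper-layer protocol
        0,                     # checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    # Insert the computed checksum at byte offset 10.
    return header[:10] + struct.pack("!H", checksum(header)) + header[12:]

hdr = ipv4_header("192.0.2.1", "198.51.100.7", payload_len=12)
src = socket.inet_ntoa(hdr[12:16])   # all a router needs: where from...
dst = socket.inet_ntoa(hdr[16:20])   # ...and where to
```

Nothing in the header says what service the payload belongs to beyond a protocol number, so new services need no changes in the routers. It also shows why NAT is disruptive: rewriting the source address means altering the header, and recomputing its checksum, in transit.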
Look for separation of information infrastructure (information formats) from service definition (applications).
RDF provides basic building blocks for information formats that can, in theory, be applied to any form of information, while maintaining a simple, consistent underlying structure for which common support tools can be developed. It might be regarded as the IP datagram of information representation.
Users can decide what to do with that information, unfettered by the design of the application that created it. Alternative tools can be brought to bear, and an independent, multivendor market in such tools can be created.
RSS web site summary
Dublin Core bibliographic information
... and many more works in progress ...
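Because such vocabularies share the same triple structure, statements from different sources about the same resource can be combined without format-specific glue. The sketch below assumes RSS 1.0 (the RDF-based variant) and Dublin Core, using their published namespace URIs; the news item, its values, and the `describe` helper are hypothetical.

```python
RSS = "http://purl.org/rss/1.0/"          # RSS 1.0 namespace
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
ITEM = "http://example.org/news/1"        # hypothetical resource

# Triples as they might be extracted from a site's RSS summary.
rss_triples = {
    (ITEM, RSS + "title", "New RDF tools released"),
    (ITEM, RSS + "link", ITEM),
}
# Triples from a separate Dublin Core bibliographic record.
dc_triples = {
    (ITEM, DC + "creator", "Alice"),
    (ITEM, DC + "date", "2001-01-01"),
}

# With a common model, combining the sources is simply set union.
merged = rss_triples | dc_triples

def describe(graph, subject):
    """A generic tool: relies only on the triple structure, not on any vocabulary."""
    return sorted((p, o) for (s, p, o) in graph if s == subject)
```

The same `describe` function would work unchanged on triples from any other RDF vocabulary, which is the kind of common support tool a shared underlying structure makes possible.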
Initially, the focus will be on descriptions applied to data, or other information about the way data is being used. But in due course, more and more data formats may be converted to use RDF as their primary representation. Then RDF applications can access the content directly, without having to rely on an intermediary processor to extract the information into an RDF-based form.
http://www.tmdenton.com/netheads.htm -- looks at the issues of centre-defined vs edge-defined communication networks.
ftp://ftp.isi.edu/in-notes/rfc1958.txt -- a description of Internet architecture
ftp://ftp.isi.edu/in-notes/rfc2775.txt -- discussion of the Internet architecture in recent times