2 XML Semantics
<molecule id="m1" title="nitrous oxide">
  <atomArray>
    <atom id="n1" elementType="N"/>
    <atom id="o1" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs="n1 o1"/>
  </bondArray>
</molecule>
Figure 2.3 The representation of nitrous oxide using CML.
2.2 Infosets
Although XML is not usually regarded as being an ontology language, it is
formally defined, so it certainly can be used to define ontologies. In fact, it
is currently the most commonly used and supported approach to ontologies
among all of the approaches considered in this book.
The syntax for XML is defined in (W3C 2001b). The structure of a document
is specified using a DTD as discussed in section 1.2. A DTD can be regarded
as being an ontology. A DTD defines concepts (using element types)
and relationships (using the parent-child relationship and attributes). The
concept of a DTD was originally introduced in 1971 at IBM as a means of
specifying the structure of technical documents, and for two decades it was
seldom used for any other purpose. However, when XML was introduced,
there was considerable interest in using it for other kinds of data, and XML
has now become the preferred interchange format for any kind of data.
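To make the reading of a DTD as an ontology more concrete, the following is a minimal Python sketch rather than a real DTD or a real validator. The `CONCEPTS` table and the `violations` function are hypothetical names introduced here. The table is read off the document in figure 2.3 and is not the actual CML DTD. Each element type is treated as a concept, and its permitted child elements and attributes are treated as its relationships. A real DTD also constrains order and cardinality, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-ontology read off figure 2.3; this is NOT the real CML DTD.
# Each key is a concept (element type); the values are its relationships.
CONCEPTS = {
    "molecule":  {"children": {"atomArray", "bondArray"}, "attributes": {"id", "title"}},
    "atomArray": {"children": {"atom"}, "attributes": set()},
    "atom":      {"children": set(), "attributes": {"id", "elementType"}},
    "bondArray": {"children": {"bond"}, "attributes": set()},
    "bond":      {"children": set(), "attributes": {"atomRefs"}},
}


def violations(xml_text, concepts=CONCEPTS):
    """List the ways a document departs from the concepts and relationships."""
    problems = []
    # iter() visits the root element and all of its descendants in document order.
    for element in ET.fromstring(xml_text).iter():
        concept = concepts.get(element.tag)
        if concept is None:
            problems.append(f"unknown concept: {element.tag}")
            continue
        for name in element.attrib:
            if name not in concept["attributes"]:
                problems.append(f"{element.tag}: unexpected attribute {name}")
        for child in element:
            if child.tag not in concept["children"]:
                problems.append(f"{element.tag}: unexpected child {child.tag}")
    return problems
```

Under these assumptions, a document that places a `bond` element directly inside `molecule` is reported as using a relationship that the ontology does not define, while the document in figure 2.3 produces no problems.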
The formal semantics for XML documents is defined in (W3C 2004b). The
mathematical model is called an infoset. The mathematical model for the
XML document in figure 2.1 is shown in figure 2.4. The infoset model consists
of nodes (shown as rectangles or ovals) and relationship links (shown as
arrows). There are various types of nodes, but the two most common types
are element nodes and text nodes. There are two kinds of relationship link:
parent-child links and attribute links. Every infoset model has a root node. For
an XML document, the root node has exactly one child node, but infosets
in general can have more than one child node of the root, as, for example,
when the infoset represents a fragment of an XML document or the result of
a query.
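The node-and-link structure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the W3C infoset model itself. It uses the standard library's `xml.etree.ElementTree`, which has no separate root node, so the sketch adds a synthetic one labelled `"#root"`. The function name `infoset_links` is hypothetical, and text nodes (here only whitespace) are ignored. The input is the CML document from figure 2.3.

```python
import xml.etree.ElementTree as ET

# The CML document from figure 2.3.
CML = """<molecule id="m1" title="nitrous oxide">
<atomArray>
<atom id="n1" elementType="N"/>
<atom id="o1" elementType="O"/>
</atomArray>
<bondArray>
<bond atomRefs="n1 o1"/>
</bondArray>
</molecule>"""


def infoset_links(xml_text):
    """Return (parent_child_links, attribute_links) for an XML document."""
    document_element = ET.fromstring(xml_text)
    # Assumption: "#root" stands in for the infoset root node, whose single
    # child in a complete XML document is the document element.
    parent_child = [("#root", document_element.tag)]
    attribute = []
    stack = [document_element]
    while stack:
        node = stack.pop()
        # One attribute link per attribute of this element node.
        for name, value in node.attrib.items():
            attribute.append((node.tag, name, value))
        # One parent-child link per child element node.
        for child in node:
            parent_child.append((node.tag, child.tag))
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(list(node)))
    return parent_child, attribute


parent_child, attribute = infoset_links(CML)
for parent, child in parent_child:
    print(f"{parent} -> {child}")
for owner, name, value in attribute:
    print(f"{owner} @{name} = {value}")
```

For this document the sketch yields six parent-child links, including the single link from the root node to `molecule`, and seven attribute links. An infoset for a document fragment or a query result would need a list of links from `"#root"` rather than exactly one.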