2.4 XML Data 47
One can make similar restrictions for the Year and Month elements. However,
this still does not entirely capture all possible restrictions. For example,
it would allow February to have 31 days. As it happens, there is an XML
datatype for a date which includes all restrictions required for an arbitrary
calendar date. To use this datatype, replace the Year, Month, and Day
elements with the following:
<element name='DateCreated' type='xsd:date'/>
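As a rough illustration of what a calendar-aware date type rules out, the following Python sketch checks strings of the form YYYY-MM-DD against the real calendar. The function name is_valid_calendar_date is hypothetical, and the check is deliberately narrower than xsd:date itself, whose lexical space also permits, for example, an optional time zone suffix and a leading minus sign. It is an approximation for intuition, not a schema validator.

```python
import re
from datetime import date

# Simplified pattern (an assumption): four-digit year, no sign, no time zone.
_DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_valid_calendar_date(text: str) -> bool:
    """Return True if text looks like YYYY-MM-DD and names a real calendar day."""
    match = _DATE_PATTERN.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for impossible days such as February 31.
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_valid_calendar_date("2001-02-28"))  # True
print(is_valid_calendar_date("2001-02-31"))  # False
```

Separate range restrictions on Year, Month, and Day would accept the second string, because 31 is a legal day number when considered on its own.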
Using this approach, the Medline citation in figure 2.1 would look like this:
The semantics of an XML datatype is given in three parts:
- The lexical space is the set of strings that are allowed by the datatype. In
other words, the kind of text that can appear in an attribute or element
that has this type.
- The value space is the set of abstract values being represented by the strings.
Each string represents exactly one value, but one value may be represented
by more than one string. For example, 6.3200 and 6.32 are different
strings but they represent the same value. In other words, two strings have
the same meaning when they represent the same value.
- A set of facets that determine what operations can be performed on the
datatype. For example, a set of values can be sorted only if the datatype
has the ordered facet.
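The distinction between lexical space and value space can be sketched in Python, using the standard decimal module as a stand-in for a decimal datatype. The helper names same_value and sort_by_value are hypothetical, and Python's Decimal accepts some strings (such as exponent notation) that a schema decimal type would not, so this is only an analogy.

```python
from decimal import Decimal


def same_value(a: str, b: str) -> bool:
    """True when two lexical forms denote the same decimal value."""
    return Decimal(a) == Decimal(b)


def sort_by_value(strings):
    """Sort lexical forms by the values they denote, not by their characters."""
    return sorted(strings, key=Decimal)


lexical_forms = ["10", "9.5", "6.3200"]

print("6.3200" == "6.32")            # False: different strings
print(same_value("6.3200", "6.32"))  # True: same value
print(sorted(lexical_forms))         # ['10', '6.3200', '9.5']
print(sort_by_value(lexical_forms))  # ['6.3200', '9.5', '10']
```

The last two lines show why ordering is a property of the value space: sorting the raw strings compares characters and puts 10 first, while sorting by value gives the numeric order.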
For some datatypes, the lexical space and value space coincide, so what one
sees is what it means. However, for most datatypes there will be multiple
representations of the same value. When this is the case, each value will
have a canonical representation. Since values and canonical representations
correspond exactly to each other, in a one-to-one fashion, it is reasonable to
think of the canonical representation as being the meaning.
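A minimal sketch of mapping several lexical forms to one canonical string is given below. It assumes a canonical form for decimals with no sign for positive numbers, a mandatory decimal point, at least one digit on each side of the point, and no other leading or trailing zeros; this follows the rule as described in XML Schema 1.0 to the best of our understanding, and later versions of the specification may differ for integer values. The function name canonical_decimal is hypothetical.

```python
import re
from decimal import Decimal

# Plain decimal notation only: optional sign, digits, optional fraction.
_DECIMAL_PATTERN = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def canonical_decimal(lexical: str) -> str:
    """Map a decimal lexical form to one canonical string (assumed rules)."""
    if _DECIMAL_PATTERN.fullmatch(lexical) is None:
        raise ValueError(f"not a plain decimal: {lexical!r}")
    value = Decimal(lexical)
    if value == 0:
        return "0.0"  # every spelling of zero collapses to one form
    text = format(value, "f")  # fixed-point text, no exponent
    if "." not in text:
        return text + ".0"  # the decimal point is mandatory
    text = text.rstrip("0")  # drop trailing zeros of the fraction
    if text.endswith("."):
        text += "0"  # keep one digit after the point
    return text


print(canonical_decimal("6.3200"))  # 6.32
print(canonical_decimal("+006"))    # 6.0
```

Two lexical forms then have the same meaning exactly when their canonical strings are equal, which is the one-to-one correspondence described above.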
XSD includes over 40 built-in datatypes. In addition, one can construct
datatypes based on the built-in ones. The built-in datatypes that are the most
useful to bioinformatics applications are: