8 Query Languages
8.1 XML Navigation Using XPath
XPath is a language for selecting parts of an XML document (W3C 1999). If
one has used computer file systems, then XPath navigation should be familiar.
For example, in the health study database, one can obtain all interviews
with this query:
HealthStudy/Interview
One specifies locations in an XML document by using the same notation that
is used for locating directories (folders) and files in most operating systems
(except that in Windows, a backward slash is used where a forward slash
would be used in XPath). Queries in XPath are called paths because they
describe the path to be followed to obtain the desired information.
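As a minimal sketch of how such a path might be evaluated in practice, the following Python fragment uses the standard library module xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document is hypothetical: only the HealthStudy and Interview element names come from the text, and the Summary element is invented to show that non-matching children are skipped. ElementTree evaluates paths relative to the root element, so the HealthStudy step is implied by starting the search at the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; Summary is an invented element name.
SAMPLE = """
<HealthStudy>
  <Interview/>
  <Interview/>
  <Summary/>
</HealthStudy>
"""

root = ET.fromstring(SAMPLE)  # root is the HealthStudy element

# Counterpart of the path HealthStudy/Interview:
# select the Interview children of the HealthStudy root.
interviews = root.findall("Interview")

print(len(interviews), "item(s)")
```

Other XPath tools accept the path exactly as written in the text; the adjustment here is specific to how ElementTree anchors its searches.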
In a document that has a much deeper hierarchical structure, one can use
a double slash to mean “skip any number of intermediate levels.” For example,
in the Medline database, one can obtain all substances by using this
path:
//NameOfSubstance
To obtain this set of elements without the double slash, one would have to
specify this path:
MedlineCitation/ChemicalList/Chemical/NameOfSubstance
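The equivalence of the two paths can be checked with a short sketch. It again assumes Python's xml.etree.ElementTree, whose XPath subset writes the double slash as .// when searching from an element. The document below is a hypothetical fragment that uses the element names from the text, and the substance names are placeholders, not real Medline data.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; "Substance A" and "Substance B" are placeholders.
SAMPLE = """
<MedlineCitation>
  <ChemicalList>
    <Chemical>
      <NameOfSubstance>Substance A</NameOfSubstance>
    </Chemical>
    <Chemical>
      <NameOfSubstance>Substance B</NameOfSubstance>
    </Chemical>
  </ChemicalList>
</MedlineCitation>
"""

root = ET.fromstring(SAMPLE)  # root is the MedlineCitation element

# Double slash: skip any number of intermediate levels.
names_skipping = [e.text for e in root.findall(".//NameOfSubstance")]

# Fully specified path, written relative to the MedlineCitation root.
names_explicit = [
    e.text for e in root.findall("ChemicalList/Chemical/NameOfSubstance")
]

print(names_skipping == names_explicit)
```

The two paths agree here only because every NameOfSubstance element sits at the same place in the hierarchy. In a document where the element also occurred elsewhere, the double-slash path would return those occurrences as well.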
Attributes are specified by using the at-sign character (@). To get a list of
all of the body mass index (BMI) values in the health study, use this path:
//Interview/@BMI
The format of the result of a path will vary with the specific tool being used.
A typical result of the path above would look like this:
BMI = 18.66
BMI = 26.93
BMI = 33.95
BMI = 17.38
-> 4 item(s)
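A rough way to reproduce this result is sketched below, assuming Python's xml.etree.ElementTree. That module cannot return attribute nodes directly, so the path //Interview/@BMI is approximated in two stages: select every Interview element that carries a BMI attribute, then read the attribute value from each. The sample document is hypothetical and holds only the four BMI values shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample holding the four BMI values from the result above.
SAMPLE = """
<HealthStudy>
  <Interview BMI="18.66"/>
  <Interview BMI="26.93"/>
  <Interview BMI="33.95"/>
  <Interview BMI="17.38"/>
</HealthStudy>
"""

root = ET.fromstring(SAMPLE)

# Stage 1: every Interview element, at any depth, that has a BMI attribute.
# Stage 2: read the attribute value from each selected element.
bmi_values = [
    interview.get("BMI")
    for interview in root.findall(".//Interview[@BMI]")
]

for value in bmi_values:
    print("BMI =", value)
print("->", len(bmi_values), "item(s)")
```

The values come back as strings, so they would need converting with float() before any arithmetic is done on them.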
A path consists of a sequence of steps. Each step selects one or more desired
nodes. There are many kinds of node. The following are the most important:
- element. To select an element, simply give its name. The name can include
a namespace prefix as in section 1.7. To select every element at one
level, use an asterisk (*). The asterisk is also known as the “star” or the
“wild card.” For example, in a MedlineCitation, one can obtain every
child element of every Chemical node by using this path: