Java 7 for Absolute Beginners

CHAPTER 9 ■ WRITING AND READING XML


Listing 9-1. The Smallest Possible XML File

<?xml version="1.0" encoding="UTF-8"?>
<elementName/>

I have worked with systems that had many such files, because each directory in a set of directories meant
to hold the output of a complex process had to contain at least one file. Consequently, we had a bunch
of XML files whose entire content was <?xml version="1.0" encoding="UTF-8"?><placeholder/>. You'll
see the exact syntax shortly. Until then, a more meaningful example will help to clarify things. Here's one
of my favorite poems, encoded as an XML document.

Listing 9-2. An Example of XML

<?xml version="1.0" encoding="UTF-8"?>
<poem title="The Great Figure" author="William Carlos Williams">
<line>Among the rain</line>
<line>and lights</line>
<line>I saw the figure 5</line>
<line>in gold</line>
<line>on a red</line>
<line>fire truck</line>
<line>moving</line>
<line>tense</line>
<line>unheeded</line>
<line>to gong clangs</line>
<line>siren howls</line>
<line>and wheels rumbling</line>
<line>through the dark city</line>
</poem>

The first line, the document specifier (formally called the XML declaration), identifies the document as
XML and specifies the version (1.0, which is by far the most widely used and suffices for most purposes)
and the encoding. A document specifier always begins with <?xml and ends with ?>, so it can't be confused
with an XML element. Strictly speaking, XML 1.0 makes the document specifier optional (XML 1.1 requires
it), so most systems that can process XML will work with documents that don't have one. A parser that
finds no document specifier has to work out the encoding on its own, though, and some systems reject
such documents outright, so it's good to get in the habit of always including a document specifier.
The encoding names the scheme used to store the document's characters as bytes. UTF-8 is an encoding of
Unicode, a character set that covers nearly every writing system in use: English, Greek, Spanish, and
Russian, and also Chinese, Japanese, Vietnamese, and many others. Chinese and Japanese rely on thousands
of logographic characters (roughly, one character per word or word part), and each of those characters
takes three or four bytes in UTF-8 rather than the single byte an English letter takes, but the same
encoding handles them all. For the sake of simplicity, we'll stick to UTF-8 and documents in English.
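To see how a parser treats the first line and the encoding, here is a minimal sketch that uses the DOM parser bundled with the JDK. The class name XmlDeclarationSketch, its helper methods, and the greeting element are made up for illustration. The sketch encodes a string as UTF-8 bytes (as it would be stored in a file), parses it, and reports the encoding named on the first line, if any.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the book's listings.
public class XmlDeclarationSketch {

    // Encodes the text as UTF-8 bytes and hands those bytes to the JDK's DOM parser.
    static Document parseUtf8(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the encoding named on the first line, or null when that line is absent.
    static String declaredEncoding(String xml) {
        return parseUtf8(xml).getXmlEncoding();
    }

    // Returns the text inside the root element.
    static String rootText(String xml) {
        return parseUtf8(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        // The escapes spell one Greek word and one Chinese word; both fit in UTF-8.
        String body = "<greeting>\u03ba\u03cc\u03c3\u03bc\u03bf\u03c2 \u4e16\u754c</greeting>";
        String withFirstLine = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body;

        System.out.println(declaredEncoding(withFirstLine)); // encoding read from the first line
        System.out.println(declaredEncoding(body));          // no first line, so nothing to report
        System.out.println(rootText(withFirstLine).equals(rootText(body))); // same text either way
    }
}
```

The helpers rethrow parsing problems as IllegalArgumentException only to keep the sketch short; a real program would probably handle each exception separately.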
The next line contains the root element. The first element in any XML file is that document's root
element. All other elements, no matter how deeply nested, are descendants of the root. The root
element, poem, has two attributes, title and author. The root element also contains all the line
elements, which make up the body of the poem.
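The relationship between the root element, its attributes, and its children can be sketched in a few lines of Java. This is a rough illustration that assumes the JDK's built-in DOM parser; the class name PoemRootSketch and the parseRoot helper are hypothetical, and the poem is shortened to its first three lines to keep the example small.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the book's listings.
public class PoemRootSketch {

    // A shortened copy of Listing 9-2 (only the first three lines of the poem).
    static final String POEM =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<poem title=\"The Great Figure\" author=\"William Carlos Williams\">\n"
        + "<line>Among the rain</line>\n"
        + "<line>and lights</line>\n"
        + "<line>I saw the figure 5</line>\n"
        + "</poem>";

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(POEM);

        System.out.println(root.getTagName());             // poem
        System.out.println(root.getAttribute("title"));    // The Great Figure
        System.out.println(root.getAttribute("author"));   // William Carlos Williams

        // Every line element is a descendant of the root.
        System.out.println(root.getElementsByTagName("line").getLength()); // 3
    }
}
```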
Note the syntax for each element. Each one begins with an opening tag (<poem> or <line>) and ends
with a closing tag (</poem> or </line>). The basic rule is that the names within the tags have to match
(and there are various restrictions about which characters can be used, but just about any English word
works). Other than that, opening tags always start with a left angle character (<) and end with a right
angle character (>). Closing tags always begin with a left angle character and a forward slash (</) and end