APIs in Action
If we navigated to this element by using the ElementTree functions, which we have
used before, then we'd end up with something like the following:
root.find('body').findall('div')[1].find('p').text
Debian 8.0 was.
But this isn't the best approach, as it depends quite heavily on the HTML structure.
A change, such as a
break it. Also, in more complex documents, this can lead to horrendous chains of
method calls, which are hard to maintain. Our use of the
section to get the codename is an example of a good technique, because there is
always only one and one
finding our
It's a common web page design pattern to break a page into a few top-level
for the major page sections like the header, the footer and the content, and to give the
Hence, if we could search for
we'd have a clean way of selecting the right
document that is a match, and it's unlikely that another
to the document. This approach doesn't depend on the document structure, and so
it won't be affected by any changes that are made to the structure. We'll still need
to rely on the fact that the
tag in the
tag that appears, but
given that there is no other way to identify it, this is the best we can do.
So, how do we run such a search for our content
Searching with XPath
In order to avoid exhaustive iteration and the checking of every element, we need
to use XPath, which is more powerful than what we've used so far. It is a query
language that was developed specifically for XML, and it's supported by lxml. Plus,
the standard library implementation provides limited support for it.
We're going to take a quick look at XPath, and in the process we will find the answer
to the question posed earlier.
To get started, use the Python shell from the last section, and do the following:
root.xpath('body')
[<Element body at 0x39e0908>]