Learning Python Network Programming

(Sean Pound) #1
Chapter 3

This is the simplest form of XPath expression: it searches for children of the current
element that have tag names that match the specified tag name. The current element
is the one we call xpath() on, in this case root. The root element is the top-level


element in the HTML document, and so the returned element is the
element.

XPath expressions can contain multiple levels of elements. The searches start
from the node the xpath() call is made on and work down the tree as they match
successive elements in the expression. We can use this to find just the

child
elements of :





root.xpath('body/div')





[<Element div at 0x39e06c8>, <Element div at 0x39e05c8>, <Element div
at 0x39e0608>]


The body/div expression means match

children of children of the
current element. Elements with the same tag can appear more than once at the same
level in an XML document, so an XPath expression can match multiple elements,
hence the xpath() function always returns a list.


The preceding queries are relative to the element that we call xpath() on, but we
can force a search from the root of the tree by adding a slash to the start of the
expression. We can also perform a search over all the descendants of an element,
with the help of a double-slash. To do this, try the following:





root.xpath('//h1')





[<Element h1 at 0x2ac3b08>]


Here, we've directly found our

element by only specifying a single tag, even
though it's several levels below root. This double-slash at the beginning of the
expression will always search from the root, but we can prefix this with a dot if we
want it to start searching from the context element.





root.find('head').xpath('.//h1')





[]


This will not find anything because there are no

descendents of .


XPath conditions


So, we can be quite specific by supplying paths, but the real power of XPath lies
in applying additional conditions to the elements in the path. In particular, our
aforementioned problem, which is, testing element attributes.





root.xpath('//div[@id="content"]')





[<Element div at 0x39e05c8>]