Learning Python Network Programming

APIs in Action

The square brackets after div, [@id="content"], form a condition that we place on
the

elements that we're matching. The @ sign before id means that id refers
to an attribute, so the condition means: only elements with an id attribute equal to
"content". This is how we can find our content

.

Before we employ this to extract our information, let's just touch on a couple of
useful things that we can do with conditions. We can specify just a tag name,
as shown here:

root.xpath('//div[h1]')

[<Element div at 0x39e05c8>]

This returns all

elements which have an

child element. Also try:

root.xpath('body/div[2]'):

[<Element div at 0x39e05c8>]

Putting a number as a condition will return the element at that position in the
matched list. In this case this is the second

child element of .
Note that these indexes start at 1 , unlike Python indexing which starts at 0.

There's a lot more that XPath can do, the full specification is a World Wide
Web Consortium (W3C) standard. The latest version can be found at
http://www.w3.org/TR/xpath-3/.

Pulling it together

Now that we've added XPath to our superpowers, let's finish up by writing a script
to get our Debian version information. Create a new file, get_debian_version.py,
and save the following to it:

import re

import requests

from lxml.etree import HTML

response = requests.get('http://www.debian.org/releases/stable/')

root = HTML(response.content)

title_text = root.find('head').find('title').text

release = re.search('\u201c(.*)\u201d', title_text).group(1)

p_text = root.xpath('//div[@id="content"]/p[1]')[0].text

version = p_text.split()[1]

print('Codename: {}\nVersion: {}'.format(release, version))

Learning Python Network Programming

Pulling it together

Get our desktop app

Company

Features

Documentation

Resources