- Open the script2002.py file with an editor and enter the code shown here:
1: #!/usr/bin/python3
2:
3: import urllib.request
4: import lxml.etree
5: from lxml.cssselect import CSSSelector
6:
7: url = 'http://weather.yahoo.com/united-states/illinois/chicago-2379574/'
8: response = urllib.request.urlopen(url)
9: html = response.read()
10:
11: parser = lxml.etree.HTMLParser(encoding='utf-8')
12: doctree = lxml.etree.fromstring(html, parser)
13:
14: div = CSSSelector("div.day-temp-current.temp-f")
15: temp = div(doctree)[0].text[0:-1]
16: print('The current temperature in Chicago is', temp)
- Save the file.
- Run the script from a command line, as shown here:
pi@raspberrypi ~ $ python3 script2002.py
The current temperature in Chicago is 79
pi@raspberrypi ~ $
This example needs one extra step to process the data (refer to line 15). The data returned from the
element ends with the degree symbol, which is written as the &deg; entity in the HTML code of the
webpage. Depending on the terminal you use to run the program, that symbol may show up as an odd
character in the text output. To avoid that, you use the text attribute to retrieve the element's
content as a text string, and then you use string slicing to remove the last character of the string
value, like this:
temp = div(doctree)[0].text[0:-1]
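To see what the slice does in isolation, here is a minimal sketch that works on a hard-coded sample string instead of a live webpage. The sample value is an assumption made for illustration, and the rstrip() variant is an alternative that is not part of the script above.

```python
# Hypothetical sample value: element text that ends with a degree symbol.
raw = '79\u00b0'

# The slice [0:-1] keeps everything except the last character.
sliced = raw[0:-1]

# Alternative: rstrip() removes the degree symbol only when it is present,
# so a value that lacks the symbol is left unchanged.
stripped = raw.rstrip('\u00b0')

print(sliced)
print(stripped)
```

The slice always drops the final character, whatever it is. The rstrip() form may be the safer choice if the page sometimes omits the symbol.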
The beauty of this script is that after you extract the temperature data from a webpage, you can do
whatever you want with it, such as create a table of temperatures to track historical temperature data.
You can then schedule the script to run at regular intervals to track the temperature throughout the day
(or even combine it with the email script to automatically email it to yourself)!
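One way to build such a table of temperatures is to append each reading to a CSV file with a timestamp. The following is a sketch only: the function names log_temperature and read_log and the file name temps.csv are hypothetical and do not appear in the script above.

```python
import csv
import datetime


def log_temperature(temp, path='temps.csv'):
    """Append one timestamped reading to a CSV file (hypothetical file name)."""
    timestamp = datetime.datetime.now().isoformat(timespec='seconds')
    # Mode 'a' adds a row to the end of the file and creates it if needed.
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([timestamp, temp])


def read_log(path='temps.csv'):
    """Return every logged row as a [timestamp, temperature] list."""
    with open(path, newline='') as f:
        return list(csv.reader(f))
```

Calling log_temperature(temp) at the end of the script would add one row per run. A scheduler such as cron could then run the script at whatever interval you choose.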
Watch Out!: The Volatility of the Internet
The Internet is a dynamic place. Don’t be surprised if you spend hours working out the
precise location of data on a webpage, only to find that it’s moved a couple of weeks
later, breaking your script. In fact, it’s quite possible that this example won’t work by
the time you read this book. If you know the process for extracting data from
webpages, as shown in this Try It Yourself, you can then apply that principle to any
situation.
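A script can at least fail gracefully when a page layout changes. The sketch below checks whether the element was found before reading its text. It uses the standard library's xml.etree.ElementTree on two small made-up snippets so it runs without a network connection. The helper name current_temp and both snippets are assumptions for illustration. In the lxml script, the same check would apply to the list returned by div(doctree).

```python
import xml.etree.ElementTree as ET


def current_temp(doctree, path):
    """Return the element text without a trailing degree symbol,
    or None when no element matches (hypothetical helper)."""
    matches = doctree.findall(path)
    if not matches or matches[0].text is None:
        return None
    return matches[0].text.strip().rstrip('\u00b0')


# Made-up snippets: one with the expected layout, one after a redesign.
old_page = ET.fromstring(
    '<html><body><div class="day-temp-current temp-f">79\u00b0</div>'
    '</body></html>')
new_page = ET.fromstring(
    '<html><body><span class="temp">79\u00b0</span></body></html>')

path = ".//div[@class='day-temp-current temp-f']"
for page in (old_page, new_page):
    temp = current_temp(page, path)
    if temp is None:
        print('Temperature element not found; the page may have changed.')
    else:
        print('The current temperature in Chicago is', temp)
```

Without a check like this, indexing an empty result list with [0] raises an IndexError, which gives little hint that the webpage itself has changed.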