In [25]: content = resp.read()
         content[:100]
           # first 100 characters of the file
Out[25]: '<!doctype html>\n<html lang="en">\n\n\t<head>\n\t\t<meta charset="utf-8">\n\n\t\t<title>Dr. Yves J. Hilpisch \xe2\x80'
Once you have the content of a particular web page, there are many potential use cases.
You might want to look up certain information, for example. You might know that you can
find the email address on the page by looking for E (in this very particular case). Since
content is a string object, you can apply the find method to look for E:
In [26]: index = content.find(' E ')
         index
Out[26]: 2071
Equipped with the index value for the information you are looking for, you can inspect the
subsequent characters of the object:
In [27]: content[index:index + 29]
Out[27]: ' E contact [at] dyjh [dot] de'
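The lookup above relies on the marker being present and on knowing the length of the text in advance. The following is a minimal sketch of a slightly more defensive variant; the page string is a hypothetical stand-in for the downloaded content, and reading up to the next tag is an assumption about how the page is laid out, not something the original session does.

```python
# Hypothetical stand-in for the downloaded page content.
content = '<html><body><p>Name</p><p> E contact [at] example [dot] org</p></body></html>'

marker = ' E '
index = content.find(marker)  # find returns -1 if the marker is absent

if index == -1:
    snippet = None
else:
    # Read up to the next tag instead of hardcoding a length.
    end = content.find('<', index)
    snippet = content[index:end] if end != -1 else content[index:]

print(snippet)
```

Checking for -1 matters because a negative index is still a valid slice start in Python, so a failed lookup would otherwise silently return text from the end of the string.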
Once you are finished, you should again close the connection to the server:
In [28]: http.close()
urllib
There is another Python library that supports the use of different web protocols. It is called
urllib. There is also a related library called urllib2. Both libraries are designed to work
with arbitrary web resources, in the spirit of the “uniform” in URL (uniform resource
locator).
A standard use case, for example, is to retrieve files, like CSV data files, via the
Web. Begin by importing urllib:
In [29]: import urllib
The application of the library’s functions resembles that of both ftplib and httplib. Of
course, we need a URL representing the web resource of interest (HTTP or FTP server, in
general). For this example, we use the URL of Yahoo! Finance to retrieve stock price
information in CSV format:
In [30]: url = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
         url += '&s=YHOO&a=01&b=1&c=2014&d=02&e=6&f=2014'
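The query string above is assembled by hand. As a rough sketch, the same URL can be built from key-value pairs with urlencode, which keeps the parameters readable and takes care of escaping. The import fallback is only there so the snippet runs under both Python 2 and Python 3. Reading a, b, c as the start date and d, e, f as the end date, with zero-based months, is an assumption inferred from the output of this example.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

base = 'http://ichart.finance.yahoo.com/table.csv'

# A list of pairs keeps the parameter order identical to the original URL.
params = [
    ('g', 'd'),
    ('ignore', '.csv'),
    ('s', 'YHOO'),                           # ticker symbol
    ('a', '01'), ('b', '1'), ('c', '2014'),  # assumed: start month, day, year
    ('d', '02'), ('e', '6'), ('f', '2014'),  # assumed: end month, day, year
]

url = base + '?' + urlencode(params)
print(url)
```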
Next, one has to establish a connection to the resource:
In [31]: connect = urllib.urlopen(url)
With the connection established, read out the content by calling the read method on the
connection object:
In [32]: data = connect.read()
The result in this case is historical stock price information for Yahoo! itself:
In [33]: print data
Out[33]: Date,Open,High,Low,Close,Volume,Adj Close
2014-03-06,39.60,39.98,39.50,39.66,10626700,39.66
2014-03-05,39.83,40.15,39.19,39.50,12536800,39.50
2014-03-04,38.76,39.79,38.68,39.63,16139400,39.63
2014-03-03,37.65,38.66,37.43,38.25,14714700,38.25
2014-02-28,38.55,39.38,38.22,38.67,16957100,38.67
2014-02-27,37.80,38.48,37.74,38.47,15489400,38.47
2014-02-26,37.35,38.10,37.34,37.62,15778900,37.62
2014-02-25,37.48,37.58,37.02,37.26,9756900,37.26
2014-02-24,37.23,37.71,36.82,37.42,15738900,37.42
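The retrieved data is a single string in CSV format. One possible way to work with it, sketched here with the standard library's csv module, is to split it into a header and records. The string below copies only the first rows of the output above, and the names rows, records, and closes are illustrative.

```python
import csv

# First rows of the output above, as a text string.
data = """Date,Open,High,Low,Close,Volume,Adj Close
2014-03-06,39.60,39.98,39.50,39.66,10626700,39.66
2014-03-05,39.83,40.15,39.19,39.50,12536800,39.50
2014-03-04,38.76,39.79,38.68,39.63,16139400,39.63
"""

# csv.reader accepts any iterable of lines.
rows = list(csv.reader(data.splitlines()))
header, records = rows[0], rows[1:]

# Map each date to its closing price as a float.
close_col = header.index('Close')
closes = {rec[0]: float(rec[close_col]) for rec in records}

print(header)
print(closes['2014-03-06'])
```

Under Python 3, read returns a bytes object, so the data would need to be decoded to text before being parsed this way.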