Python for Finance: Analyze Big Financial Data

In [25]: content = resp.read()
         content[:100]  # first 100 characters of the file
Out[25]: '<!doctype html>\n<html lang="en">\n\n\t<head>\n\t\t<meta charset="utf-8">\n\n\t\t<title>Dr. Yves J. Hilpisch \xe2\x80'

Once you have the content of a particular web page, there are many potential use cases. For example, you might want to look up certain information. You might know that you can find the email address on the page by looking for E (in this very particular case). Since content is a string object, you can apply the find method to look for E:

In [26]: index = content.find(' E ')
         index
Out[26]: 2071

Equipped with the index value for the information you are looking for, you can inspect the subsequent characters of the object:

In [27]: content[index:index + 29]
Out[27]: ' E contact [at] dyjh [dot] de'
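The same look-up-then-slice pattern works on any Python string. The following is a minimal Python 3 sketch run on a short hypothetical HTML fragment instead of the live page; the helper name snippet_after and the sample text are illustrative assumptions. It also guards against the case where find returns -1 (substring not found), which would otherwise silently slice from the last character.

```python
def snippet_after(text, marker, length):
    """Return `length` characters of `text` starting at the first `marker`.

    str.find signals "not found" with -1; slicing with -1 would quietly
    start at the last character, so that case returns None instead.
    """
    index = text.find(marker)
    if index == -1:
        return None
    return text[index:index + length]


# Hypothetical page fragment standing in for the downloaded HTML
page = '<p>Contact:</p><p> E contact [at] dyjh [dot] de</p>'

result = snippet_after(page, ' E ', 29)
print(repr(result))                     # ' E contact [at] dyjh [dot] de'
print(snippet_after(page, ' X ', 29))   # None
```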

Once you are finished, you should again close the connection to the server:

In [28]: http.close()
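In Python 3, httplib was renamed http.client. A minimal sketch of the whole request cycle, assuming a plain HTTP server and using a hypothetical helper read_page, wraps the calls in try/finally so that close runs even if the request or the read fails:

```python
import http.client


def read_page(host, path='/', port=80, timeout=10):
    """Fetch `path` from `host` over plain HTTP and return the raw bytes.

    The connection is closed in the finally clause, so it is released
    even when the request or the read raises an exception.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request('GET', path)
        resp = conn.getresponse()
        return resp.read()   # bytes, not str, in Python 3
    finally:
        conn.close()


# Usage (requires network access; the host is only an example):
# content = read_page('example.com')
# print(content[:100])
```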

urllib

Another Python library that supports different web protocols is urllib. There is also a related library called urllib2. Both libraries are designed to work with arbitrary web resources, in the spirit of the “uniform” in URL (uniform resource locator).
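The “uniform” part means that every locator, whatever its protocol, decomposes into the same components, which is what lets one library handle HTTP and FTP resources alike. A small Python 3 sketch using urllib.parse.urlparse (in Python 2 the same function lives in the separate urlparse module), applied to the Yahoo! Finance URL from this section and to a hypothetical FTP address:

```python
from urllib.parse import urlparse, parse_qs

# Any URL splits into the same parts, whatever the protocol
url = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
parts = urlparse(url)

print(parts.scheme)            # http  (the protocol to use)
print(parts.netloc)            # ichart.finance.yahoo.com  (the server)
print(parts.path)              # /table.csv  (the resource on that server)
print(parse_qs(parts.query))   # {'g': ['d'], 'ignore': ['.csv']}

# An FTP URL (hypothetical host) decomposes the same way
ftp_parts = urlparse('ftp://ftp.example.com/data/prices.csv')
print(ftp_parts.scheme, ftp_parts.netloc, ftp_parts.path)
```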

A standard use case is retrieving files, such as CSV data files, via the Web. Begin by importing urllib:

In [29]: import urllib

Using the library's functions is similar to using ftplib and httplib. Of course, we need a URL representing the web resource of interest (in general, a resource on an HTTP or FTP server). For this example, we use the URL of Yahoo! Finance to retrieve stock price information in CSV format:

In [30]: url = 'http://ichart.finance.yahoo.com/table.csv?g=d&ignore=.csv'
         url += '&s=YHOO&a=01&b=1&c=2014&d=02&e=6&f=2014'
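Concatenating query strings by hand works, but the parameters can also be spelled out by name and encoded with urllib.parse.urlencode (Python 3; in Python 2 the function is urllib.urlencode). The comments on the parameter meanings follow the commonly cited convention for this Yahoo! endpoint, which has since been retired, in which month fields are zero-based; treat them as an assumption, although the end date they imply matches the data this request returns. Passing a list of pairs keeps the order identical to the hand-built string:

```python
from urllib.parse import urlencode

base = 'http://ichart.finance.yahoo.com/table.csv'
params = [
    ('g', 'd'),         # assumed: daily frequency
    ('ignore', '.csv'),
    ('s', 'YHOO'),      # ticker symbol
    ('a', '01'),        # assumed: start month, zero-based (01 = February)
    ('b', '1'),         # assumed: start day
    ('c', '2014'),      # assumed: start year
    ('d', '02'),        # assumed: end month, zero-based (02 = March)
    ('e', '6'),         # assumed: end day
    ('f', '2014'),      # assumed: end year
]

# urlencode joins the pairs with '&' and percent-encodes where needed
url = base + '?' + urlencode(params)
print(url)
```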

Next, one has to establish a connection to the resource:

In [31]: connect = urllib.urlopen(url)
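In Python 3, urllib and urllib2 were reorganized into a single package: urlopen now lives in urllib.request, the response object works as a context manager that closes the connection on exit, and read returns bytes that need decoding. The sketch below wraps this in a hypothetical helper fetch_text and assumes UTF-8 content. Since the Yahoo! endpoint is no longer available, the usage reads a local CSV file through a file:// URL, which urlopen accepts just like an HTTP or FTP URL:

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_text(url, encoding='utf-8'):
    """Return the decoded content of any URL that urlopen supports."""
    with urlopen(url) as resp:          # connection is closed on exit
        return resp.read().decode(encoding)


# Usage: write a tiny CSV file and read it back through a file:// URL
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample.csv'
    path.write_bytes(b'Date,Close\n2014-03-06,39.66\n')
    text = fetch_text(path.as_uri())    # e.g. file:///tmp/.../sample.csv

print(text.splitlines())   # ['Date,Close', '2014-03-06,39.66']
```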

With the connection established, read out the content by calling the read method on the connection object:

In [32]: data = connect.read()

The result in this case is historical stock price information for Yahoo! itself:

In [33]: print data
Out[33]: Date,Open,High,Low,Close,Volume,Adj Close
         2014-03-06,39.60,39.98,39.50,39.66,10626700,39.66
         2014-03-05,39.83,40.15,39.19,39.50,12536800,39.50
         2014-03-04,38.76,39.79,38.68,39.63,16139400,39.63
         2014-03-03,37.65,38.66,37.43,38.25,14714700,38.25
         2014-02-28,38.55,39.38,38.22,38.67,16957100,38.67
         2014-02-27,37.80,38.48,37.74,38.47,15489400,38.47
         2014-02-26,37.35,38.10,37.34,37.62,15778900,37.62
         2014-02-25,37.48,37.58,37.02,37.26,9756900,37.26
         2014-02-24,37.23,37.71,36.82,37.42,15738900,37.42
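Because the response is plain CSV text, the standard csv module can turn it into typed records (with Python 3, decode the bytes returned by read first). The sketch below copies the first three rows of the output above into a string so it runs offline, then computes simple daily returns from the adjusted close; the variable names are illustrative:

```python
import csv
import io

# First rows of the server response shown above
data = """Date,Open,High,Low,Close,Volume,Adj Close
2014-03-06,39.60,39.98,39.50,39.66,10626700,39.66
2014-03-05,39.83,40.15,39.19,39.50,12536800,39.50
2014-03-04,38.76,39.79,38.68,39.63,16139400,39.63
"""

# DictReader uses the header line as the keys of each row
rows = list(csv.DictReader(io.StringIO(data)))
closes = [(r['Date'], float(r['Adj Close'])) for r in rows]

# Rows arrive newest first, so reverse them into chronological order
closes.reverse()
for (d0, c0), (d1, c1) in zip(closes, closes[1:]):
    print(d1, round(c1 / c0 - 1, 4))   # simple daily return
```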