Python for Finance: Analyze Big Financial Data

split([sep[, maxsplit]])    List of words in string with sep as separator
splitlines([keepends])      Separated lines with line ends/breaks if keepends is True
strip([chars])              Copy of string with leading/trailing characters in chars removed
upper()                     Copy with all letters capitalized
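
The methods listed above can be tried on a small example. The following is a minimal sketch; the sample strings and variable names are illustrative assumptions, not taken from the text:

```python
# Sample strings chosen purely for illustration
text = "  alpha,beta,gamma  "

trimmed = text.strip()               # drop leading/trailing whitespace
words = trimmed.split(",")           # split on every comma
first_two = trimmed.split(",", 1)    # split at most once (maxsplit=1)
shouted = trimmed.upper()            # copy with all letters capitalized

multi = "line one\nline two\n"
lines = multi.splitlines()           # line breaks removed
lines_keep = multi.splitlines(True)  # line breaks kept (keepends=True)

unpadded = "xxhelloxx".strip("x")    # strip a specific set of characters

print(words)
print(lines_keep)
```

Since string objects are immutable, each of these calls returns a new object and leaves the original string unchanged.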

A powerful tool when working with string objects is regular expressions. Python provides such functionality in the module re:


In [30]: import re

Suppose you are faced with a large text file, such as a comma-separated value (CSV) file, which contains certain time series and respective date-time information. More often than not, the date-time information is delivered in a format that Python cannot interpret directly. However, the date-time information can generally be described by a regular expression. Consider the following string object, containing three date-time elements, three integers, and three strings. Note that triple quotation marks allow the definition of strings over multiple lines:


In [31]: series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

The following regular expression describes the format of the date-time information provided in the string object:[21]

In [32]: dt = re.compile(r"'[0-9/:\s]+'")  # datetime

Equipped with this regular expression, we can go on and find all the date-time elements. In general, applying regular expressions to string objects also leads to performance improvements for typical parsing tasks:


In [33]: result = dt.findall(series)
         result
Out[33]: ["'01/18/2014 13:00:00'", "'01/18/2014 13:30:00'", "'01/18/2014 14:00:00'"]

REGULAR EXPRESSIONS

When parsing string objects, consider using regular expressions, which can bring both convenience and performance to such operations.
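
As an illustration of the convenience aspect, a single pattern with capture groups can pull all three fields out of each record at once. This is a minimal sketch and not part of the original example; the pattern `record` and the names `rows`, `values`, and `labels` are assumptions introduced here:

```python
import re

series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

# Hypothetical pattern: one capture group per field
# (date-time string, integer, quoted label)
record = re.compile(r"'([0-9/:\s]+)',\s*(\d+),\s*'(\w+)'")

# With several groups, findall returns one tuple of strings per match
rows = record.findall(series)

values = [int(value) for _, value, _ in rows]  # convert the integer field
labels = [label for _, _, label in rows]

print(rows[0])
print(values)
```

Note that every group is returned as a string, so numeric fields still have to be converted explicitly.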

The resulting string objects can then be parsed to generate Python datetime objects (cf. Appendix C for an overview of handling date and time data with Python). To parse the string objects containing the date-time information, we need to specify how they are to be parsed, again as a string object:


In [34]: from datetime import datetime
         pydt = datetime.strptime(result[0].replace("'", ""),
                                  '%m/%d/%Y %H:%M:%S')
         pydt
Out[34]: datetime.datetime(2014, 1, 18, 13, 0)
In [35]: print(pydt)
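
The same conversion can be applied to every matched element rather than only the first. The following is a minimal sketch under the assumption that all elements share the format shown above; the names `fmt`, `parsed`, and `step` are introduced here for illustration:

```python
import re
from datetime import datetime

series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""

dt = re.compile(r"'[0-9/:\s]+'")  # same pattern as in the text
result = dt.findall(series)

fmt = '%m/%d/%Y %H:%M:%S'  # month/day/year hour:minute:second

# Strip the surrounding single quotes, then parse each element
parsed = [datetime.strptime(s.strip("'"), fmt) for s in result]

# Differences between datetime objects are timedelta objects
step = parsed[1] - parsed[0]

print(parsed[0])
print(step)
```

Using `strip("'")` instead of `replace("'", "")` only removes the quotes at the ends of each string, which is sufficient here because the matched elements contain no inner quotes.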