split ([sep[, maxsplit]]) List of words in string with sep as separator
splitlines ([keepends]) List of lines in string; line breaks are kept if keepends is True
strip ([chars]) Copy of string with leading/trailing characters in chars removed
upper () Copy with all letters capitalized
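The methods listed above can be tried on a few short strings. The following is a minimal sketch; the sample strings and variable names are illustrative assumptions and do not come from the text:

```python
# Illustrative sample strings (assumptions, not taken from the text)
text = "  date, open, close  "

# strip() without arguments removes leading/trailing whitespace
words = text.strip().split(", ")

# maxsplit=1 limits the number of splits to one
first_split = "a,b,c".split(",", 1)

# splitlines() drops the line breaks unless keepends is True
lines = "row 1\nrow 2\n".splitlines()
lines_keep = "row 1\nrow 2\n".splitlines(True)

# strip(chars) removes the given characters from both ends only
trimmed = "xxhixx".strip("x")

# upper() returns a capitalized copy; the original string is unchanged
shout = "python".upper()

print(words)
print(first_split)
print(lines)
print(lines_keep)
print(trimmed)
print(shout)
```

Note that all of these methods return new objects; string objects themselves are immutable.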
A powerful tool when working with string objects is regular expressions. Python
provides such functionality in the module re:
In [30]: import re
Suppose you are faced with a large text file, such as a comma-separated value (CSV) file,
which contains certain time series and respective date-time information. More often than
not, the date-time information is delivered in a format that Python cannot interpret
directly. However, the date-time information can generally be described by a regular
expression. Consider the following string object, containing three date-time elements,
three integers, and three strings. Note that triple quotation marks allow the definition of
strings that span multiple lines:
In [31]: series = """
'01/18/2014 13:00:00', 100, '1st';
'01/18/2014 13:30:00', 110, '2nd';
'01/18/2014 14:00:00', 120, '3rd'
"""
The following regular expression describes the format of the date-time information
provided in the string object: [21]
In [32]: dt = re.compile(r"'[0-9/:\s]+'")  # datetime
Equipped with this compiled regular expression, we can find all the date-time elements
in the string object. For typical parsing tasks, applying regular expressions to string
objects often also improves performance:
In [33]: result = dt.findall(series)
         result
Out[33]: ["'01/18/2014 13:00:00'", "'01/18/2014 13:30:00'", "'01/18/2014 14:00:00'"]
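To see why only the date-time elements are found, it helps to apply the pattern to a single record. The sketch below reuses the pattern from the text; the one-line `sample` string is a hypothetical stand-in for one row of `series`:

```python
import re

# Same pattern as in the text: a single quote, then one or more digits,
# slashes, colons, or whitespace characters, then a closing single quote
dt = re.compile(r"'[0-9/:\s]+'")

# Hypothetical one-row sample in the same layout as `series`
sample = "'01/18/2014 13:00:00', 100, '1st';"

matches = dt.findall(sample)

# '1st' is not matched because it contains letters, and 100 is not
# matched because it is not enclosed in single quotes
print(matches)
```

The character class is deliberately loose: it would also accept quoted strings such as '12:::' that are not valid dates, so validation is left to the subsequent parsing step.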
REGULAR EXPRESSIONS
When parsing string objects, consider using regular expressions, which can bring both convenience and
performance to such operations.
The resulting string objects can then be parsed to generate Python datetime objects (cf.
Appendix C for an overview of handling date and time data with Python). To parse the
string objects containing the date-time information, we need to provide information on
how to parse them, again as a string object:
In [34]: from datetime import datetime
         pydt = datetime.strptime(result[0].replace("'", ""),
                                  '%m/%d/%Y %H:%M:%S')
         pydt
Out[34]: datetime.datetime(2014, 1, 18, 13, 0)
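The same conversion can be applied to every match at once. The following is a minimal sketch, assuming `result` holds the three strings returned by `dt.findall(series)`; the names `fmt` and `pydts` are illustrative, and `strip("'")` is used here in place of the `replace` call to remove the surrounding quotes:

```python
from datetime import datetime

# Matches as returned by dt.findall(series) in the text
result = ["'01/18/2014 13:00:00'",
          "'01/18/2014 13:30:00'",
          "'01/18/2014 14:00:00'"]

# Same format string as in the text
fmt = '%m/%d/%Y %H:%M:%S'

# Remove the enclosing single quotes, then parse each string
pydts = [datetime.strptime(s.strip("'"), fmt) for s in result]

for d in pydts:
    print(d)
```

Should one of the strings not conform to the format, `strptime` raises a `ValueError`, which makes malformed records easy to detect.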
In [35]: print pydt