>>> chars = list('Lorreta') # convert to characters list
>>> chars
['L', 'o', 'r', 'r', 'e', 't', 'a']
>>> chars.append('!')
>>> ''.join(chars) # to string: empty delimiter
'Lorreta!'
These calls turn out to be surprisingly powerful. For example, a line of data columns
separated by tabs can be parsed into its columns with a single split call; the more.py
script uses the splitlines variant shown earlier to split a string into a list of line strings.
In fact, we can emulate the replace call we saw earlier in this section with a split/join
combination:
>>> mystr = 'xxaaxxaa'
>>> 'SPAM'.join(mystr.split('aa')) # str.replace, the hard way!
'xxSPAMxxSPAM'
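As a minimal sketch of the column-parsing idea mentioned above, the following splits a tab-delimited line into its columns and a multiline string into its lines. The sample data and variable names are made up for illustration and are not taken from more.py:

```python
# Hypothetical sample data, invented for this illustration
line = 'bob\t42\tdev'
columns = line.split('\t')          # split on tabs: one item per column
print(columns)                      # ['bob', '42', 'dev']

text = 'spam\neggs\nham\n'
lines = text.splitlines()           # split on line boundaries, dropping them
print(lines)                        # ['spam', 'eggs', 'ham']

rejoined = '\t'.join(columns)       # join undoes split for the same delimiter
print(rejoined == line)             # True
```

Note that every column comes back as a string, including '42'; using it as a number requires an explicit conversion.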
For future reference, also keep in mind that Python doesn’t automatically convert
strings to numbers, or vice versa; if you want to use one as you would use the other,
you must say so with manual conversions:
>>> int("42"), eval("42") # string to int conversions
(42, 42)
>>> str(42), repr(42) # int to string conversions
('42', '42')
>>> ("%d" % 42), '{:d}'.format(42) # via formatting expression, method
('42', '42')
>>> "42" + str(1), int("42") + 1 # concatenation, addition
('421', 43)
In the last command here, the first expression triggers string concatenation (since both
sides are strings), and the second invokes integer addition (because both objects are
numbers). Python doesn’t assume you meant one or the other and convert automatically;
as a rule of thumb, Python tries to avoid magic, and the temptation to guess,
whenever possible. String tools will be covered in more detail later in this book (in fact,
they get a full chapter in Part V), but be sure to also see the library manual for additional
string method tools.
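To make the "no guessing" rule concrete, here is a small sketch of what happens when the types are mixed without a conversion, followed by the explicit conversions that make the intent clear. The field values are hypothetical stand-ins for text read from a file or command line:

```python
# Mixing a string and a number raises an error instead of guessing
try:
    result = "42" + 1
except TypeError as exc:
    result = None
    print('TypeError:', exc)

# Hypothetical input values, invented for this illustration
fields = ['42', '1', '7']
total = sum(int(field) for field in fields)   # convert each string, then add
label = 'total=' + str(total)                 # convert back to concatenate
print(label)                                  # total=50
```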
Other String Concepts in Python 3.X: Unicode and bytes
Technically speaking, the Python 3.X string story is a bit richer than I’ve implied here.
What I’ve shown so far is the str object type—a sequence of characters (technically,
Unicode “code points” represented as Unicode “code units”) which represents both
ASCII and wider Unicode text, and handles encoding and decoding both manually on
request and automatically on file transfers. Strings are coded in quotes (e.g., 'abc'),
along with various syntax for coding non-ASCII text (e.g., '\xc4\xe8', '\u00c4\u00e8').
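As a brief sketch of the manual encoding and decoding just mentioned, the following converts a two-character non-ASCII str to bytes and back. The choice of the UTF-8 and Latin-1 encodings here is only for illustration:

```python
text = '\u00c4\u00e8'                 # same string as '\xc4\xe8'
print(text == '\xc4\xe8')             # True
print(len(text))                      # 2 characters

data = text.encode('utf-8')           # manual encoding: str to bytes
print(data)                           # b'\xc3\x84\xc3\xa8'
print(len(data))                      # 4 bytes in this encoding

print(data.decode('utf-8') == text)   # manual decoding: bytes to str; True
print(text.encode('latin-1'))         # b'\xc4\xe8': 1 byte per character
```

The same text yields different byte sequences under different encodings, which is why the encoding name must be known when decoding.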