Programming Python, 4th Edition, Mark Lutz (text edition)


Technically, because filenames may contain arbitrary text, os.listdir works in two
modes in 3.X: given a bytes argument, it returns filenames as encoded
byte strings; given a normal str argument, it instead returns filenames as Unicode
strings, decoded per the filesystem’s encoding scheme:


C:\...\PP4E\System\Filetools> python
>>> import os
>>> os.listdir('.')[:4]
['bigext-tree.py', 'bigpy-dir.py', 'bigpy-path.py', 'bigpy-tree.py']

>>> os.listdir(b'.')[:4]
[b'bigext-tree.py', b'bigpy-dir.py', b'bigpy-path.py', b'bigpy-tree.py']

The byte string version can be used if undecodable filenames may be present. Because
os.walk and glob.glob both work by calling os.listdir internally, they inherit this
behavior by proxy. The os.walk tree walker, for example, calls os.listdir at each
directory level; passing byte string arguments suppresses decoding and returns byte string
results:


>>> for (dir, subs, files) in os.walk('..'): print(dir)
...
..
..\Environment
..\Filetools
..\Processes

>>> for (dir, subs, files) in os.walk(b'..'): print(dir)
...
b'..'
b'..\\Environment'
b'..\\Filetools'
b'..\\Processes'
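
The same effect can be checked on any platform without relying on the PP4E directory layout. The following is a minimal sketch, not part of the book's examples: walk_names is a hypothetical helper, the tree is a throwaway built with tempfile, and os.fsencode and os.fsdecode assume Python 3.2 or later. It suggests that every name os.walk reports (directory paths, subdirectory names, and filenames) follows the type of the root argument:

```python
import os
import tempfile

def walk_names(root):
    """Collect every directory path, subdirectory name, and filename os.walk reports."""
    found = []
    for dirpath, subdirs, files in os.walk(root):
        found.append(dirpath)
        found.extend(subdirs)
        found.extend(files)
    return found

# Build a tiny throwaway tree: <tmp>/sub/spam.txt
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    open(os.path.join(tmp, 'sub', 'spam.txt'), 'w').close()

    str_names = walk_names(tmp)                  # str root: str results
    bytes_names = walk_names(os.fsencode(tmp))   # bytes root: bytes results

print(all(isinstance(n, str) for n in str_names))
print(all(isinstance(n, bytes) for n in bytes_names))
```

Decoding the bytes results with os.fsdecode should give back the str results, so the two walks describe the same tree in two different forms.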

The glob.glob tool similarly calls os.listdir internally before applying name patterns,
and so also returns undecoded byte string names for byte string arguments:


>>> import glob
>>> glob.glob(r'.\*')[:3]
['.\\bigext-out.txt', '.\\bigext-tree.py', '.\\bigpy-dir.py']
>>>
>>> glob.glob(br'.\*')[:3]
[b'.\\bigext-out.txt', b'.\\bigext-tree.py', b'.\\bigpy-dir.py']
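
For a version that does not depend on the current directory's contents, here is a small sketch under stated assumptions: the filenames are invented for illustration, the directory is a temporary one, and glob.escape (Python 3.4 and later) is used only to protect against pattern characters in the temporary path. Both the pattern and the results keep the same string type:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical files, created only for this demonstration
    for fname in ('spam.py', 'eggs.py', 'notes.txt'):
        open(os.path.join(tmp, fname), 'w').close()

    # Escape the directory part so only '*.py' acts as a pattern
    str_pattern = os.path.join(glob.escape(tmp), '*.py')
    bytes_pattern = os.fsencode(str_pattern)

    # glob makes no ordering promise, so sort before comparing
    str_matches = sorted(os.path.basename(p) for p in glob.glob(str_pattern))
    bytes_matches = sorted(os.path.basename(p) for p in glob.glob(bytes_pattern))

print(str_matches)      # ['eggs.py', 'spam.py']
print(bytes_matches)    # [b'eggs.py', b'spam.py']
```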

Given a normal string name (a command-line argument, for example), you can force
bytes mode by encoding the name manually, which suppresses decoding of the results:


>>> name = '.'
>>> os.listdir(name.encode())[:4]
[b'bigext-out.txt', b'bigext-tree.py', b'bigpy-dir.py', b'bigpy-path.py']
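
Note that name.encode() with no arguments uses UTF-8, which may or may not match the encoding the platform uses for filenames. As a hedged alternative, assuming Python 3.2 or later, os.fsencode and os.fsdecode convert in both directions using the filesystem encoding. The variable names below are illustrative only:

```python
import os

name = '.'

raw = os.fsencode(name)       # str -> bytes, using the filesystem encoding
entries = os.listdir(raw)     # bytes argument, so bytes results

# Convert back to str only when a readable form is needed
readable = [os.fsdecode(entry) for entry in entries]

print(raw)                    # b'.'
```

Because os.fsdecode is designed as the inverse of os.fsencode, names converted this way can be passed back to file tools in either form.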

The upshot is that if your directories may contain names that cannot be decoded
according to the underlying platform’s Unicode encoding scheme, you may need to
pass byte strings to these tools to avoid Unicode encoding errors. You’ll get byte strings
back, which may be less readable if printed, but you’ll avoid errors while traversing
directories and files.
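
One way to soften the readability cost is to traverse in bytes mode and decode names only for display. The following is a rough sketch rather than a recommended recipe: printable and list_tree are hypothetical helpers, and the errors='replace' policy is an assumption chosen here so that undecodable bytes appear as the U+FFFD replacement character instead of raising an exception:

```python
import os
import sys
import tempfile

def printable(raw_name):
    """Decode a bytes filename for display only; undecodable bytes become U+FFFD."""
    return raw_name.decode(sys.getfilesystemencoding(), errors='replace')

def list_tree(root):
    """Walk root in bytes mode and return display-safe paths of the files found."""
    shown = []
    for dirpath, subdirs, files in os.walk(os.fsencode(root)):
        for fname in files:
            # dirpath and fname are both bytes here, so the join is bytes too
            shown.append(printable(os.path.join(dirpath, fname)))
    return shown

# Throwaway tree for demonstration: <tmp>/sub/spam.txt
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, 'sub'))
    open(os.path.join(tmp, 'sub', 'spam.txt'), 'w').close()
    shown = list_tree(tmp)

for line in shown:
    print(line)
```

The decoded text is meant for display only. To open or rename a file, keep using the original bytes path, since a name that went through replacement no longer identifies the file.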

