Disk files are a special case. As described in Chapter 13, the kernel employs the
buffer cache to speed disk I/O requests. Thus, a write() to a disk returns as
soon as the requested data has been transferred to the kernel buffer cache,
rather than waiting until the data is written to disk (unless the O_SYNC flag was
specified when opening the file). Correspondingly, a read() transfers data from
the buffer cache to a user buffer, and if the required data is not in the buffer
cache, then the kernel puts the process to sleep while a disk read is performed.
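
The effect of O_SYNC on write() can be sketched as follows. This is a minimal illustration, not code from the text: the helper name write_message() and its parameters are hypothetical, and the file mode 0600 is an arbitrary choice.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write 'msg' to the file 'path'. If 'synchronous' is
   nonzero, the file is opened with O_SYNC, so write() returns only after the
   data has been transferred to the disk rather than just to the buffer
   cache. Returns 0 on success, -1 on error. */
int write_message(const char *path, const char *msg, int synchronous)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (synchronous)
        flags |= O_SYNC;

    int fd = open(path, flags, 0600);
    if (fd == -1)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = write(fd, msg, len);
    int ok = (n == (ssize_t) len);

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```

Both variants produce the same file contents; the difference is only in when write() returns, which is why O_SYNC writes are usually noticeably slower.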
The traditional blocking I/O model is sufficient for many applications, but not all.
In particular, some applications need to be able to do one or both of the following:
- Check whether I/O is possible on a file descriptor without blocking if it is not
  possible.
- Monitor multiple file descriptors to see if I/O is possible on any of them.
We have already encountered two techniques that can be used to partially address
these needs: nonblocking I/O and the use of multiple processes or threads.
We described nonblocking I/O in some detail in Sections 5.9 and 44.9. If we
place a file descriptor in nonblocking mode by enabling the O_NONBLOCK open file status
flag, then an I/O system call that can’t be immediately completed returns an
error instead of blocking. Nonblocking I/O can be employed with pipes, FIFOs,
sockets, terminals, pseudoterminals, and some other types of devices.
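
As a rough sketch of what this looks like in code, the following enables O_NONBLOCK with fcntl() and then attempts a single read(). The helper names set_nonblocking() and try_read(), and the -2 return convention, are assumptions made for this illustration; only fcntl(), read(), and the EAGAIN/EWOULDBLOCK errors are standard.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Enable O_NONBLOCK on 'fd', preserving the other open file status flags.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: attempt one read without blocking.
   Returns the number of bytes read (0 means end-of-file), -1 on a genuine
   error, or -2 if no data is available at the moment. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

On an empty pipe whose read end has been made nonblocking, try_read() returns -2 immediately instead of putting the process to sleep.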
Nonblocking I/O allows us to periodically check (“poll”) whether I/O is possible
on a file descriptor. For example, we can make an input file descriptor nonblocking,
and then periodically perform nonblocking reads. If we need to monitor multiple file
descriptors, then we mark them all nonblocking, and poll each of them in turn.
However, polling in this manner is usually undesirable. If polling is done only infrequently,
then the latency before an application responds to an I/O event may be
unacceptably long; on the other hand, polling in a tight loop wastes CPU time.
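
The trade-off can be seen in a sketch of such a polling loop. The function poll_in_turn() and its parameters are hypothetical, and the descriptors are assumed to be in nonblocking mode already. A large interval_ms increases latency, while a value near zero approaches a tight loop.

```c
#define _POSIX_C_SOURCE 200809L     /* Expose nanosleep() */
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try a nonblocking read() on each descriptor in 'fds'
   in turn, sleeping 'interval_ms' milliseconds between rounds, for at most
   'max_rounds' rounds. On success, stores the data in 'buf' and the byte
   count in '*nread' (0 means end-of-file), and returns the index of the
   descriptor that was read. Returns -1 on error, or -2 if no descriptor
   became ready. */
int poll_in_turn(const int fds[], int nfds, int interval_ms, int max_rounds,
                 char *buf, size_t len, ssize_t *nread)
{
    struct timespec pause = { interval_ms / 1000,
                              (interval_ms % 1000) * 1000000L };

    for (int round = 0; round < max_rounds; round++) {
        for (int i = 0; i < nfds; i++) {
            ssize_t n = read(fds[i], buf, len);
            if (n >= 0) {
                *nread = n;
                return i;
            }
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;          /* Genuine error */
        }
        nanosleep(&pause, NULL);    /* Latency grows with this interval */
    }
    return -2;
}
```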
In this chapter, we use the word poll in two distinct ways. One of these is as the
name of the I/O multiplexing system call, poll(). In the other use, we mean
“performing a nonblocking check on the status of a file descriptor.”
If we don’t want a process to block when performing I/O on a file descriptor, we
can instead create a new process to perform the I/O. The parent process can then
carry on to perform other tasks, while the child process blocks until the I/O is
complete. If we need to handle I/O on multiple file descriptors, we can create one
child for each descriptor. The problems with this approach are expense and complexity.
Creating and maintaining processes places a load on the system, and, typically,
the child processes will need to use some form of IPC to inform the parent
about the status of I/O operations.
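
A minimal sketch of this approach for a single descriptor is shown below. It assumes a pipe as the IPC mechanism, and the helper name read_in_child() is hypothetical. The child reports only the byte count, since a real program would also need some way to hand the data itself back to the parent, which is part of the complexity described above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that performs a blocking read() on
   'input_fd' and reports the resulting byte count to the parent through a
   pipe. On success, returns the child's PID and stores the read end of the
   status pipe in '*status_fd'. Returns -1 on error. */
pid_t read_in_child(int input_fd, int *status_fd)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (child == 0) {               /* Child: this read() may block */
        char buf[256];
        close(pfd[0]);
        ssize_t n = read(input_fd, buf, sizeof(buf));
        if (write(pfd[1], &n, sizeof(n)) != (ssize_t) sizeof(n))
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);                  /* Parent: keeps only the read end */
    *status_fd = pfd[0];
    return child;
}
```

The parent can carry on with other work and later read an ssize_t from the status descriptor, then reap the child with waitpid().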
Using multiple threads instead of processes is less demanding of resources, but
the threads will probably still need to communicate information to one another
about the status of I/O operations, and the programming can be complex, especially
if we are using thread pools to minimize the number of threads used to handle
large numbers of simultaneous clients. (One place where threads can be particularly
useful is if the application needs to call a third-party library that performs blocking
I/O. An application can avoid blocking in this case by making the library call in a
separate thread.)
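
The parenthetical case can be sketched with POSIX threads. Here library_blocking_read() is a stand-in for the third-party call, and struct blocking_job and start_blocking_job() are hypothetical names introduced only for this illustration.

```c
#include <pthread.h>
#include <unistd.h>

/* Arguments and result for the hypothetical blocking call */
struct blocking_job {
    int fd;                 /* Descriptor that the "library" reads from */
    char buf[64];
    ssize_t nread;          /* Result; valid after pthread_join() */
};

/* Stand-in for a third-party library function that performs blocking I/O */
static ssize_t library_blocking_read(int fd, char *buf, size_t len)
{
    return read(fd, buf, len);
}

static void *job_thread(void *arg)
{
    struct blocking_job *job = arg;
    job->nread = library_blocking_read(job->fd, job->buf, sizeof(job->buf));
    return NULL;
}

/* Start the blocking call in a separate thread so that the caller can
   continue. Returns 0 on success, or a positive error number from
   pthread_create(). */
int start_blocking_job(pthread_t *thr, struct blocking_job *job)
{
    return pthread_create(thr, NULL, job_thread, job);
}
```

The calling thread remains free to do other work and collects the result with pthread_join(). On most systems the program must be compiled with the -pthread option.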