1346 Chapter 63
63.2.5 Problems with select() and poll()
The select() and poll() system calls are the portable, long-standing, and widely used
methods of monitoring multiple file descriptors for readiness. However, these
APIs suffer some problems when monitoring a large number of file descriptors:
• On each call to select() or poll(), the kernel must check all of the specified file
descriptors to see if they are ready. When monitoring a large number of
file descriptors that are in a densely packed range, the time required for this
operation greatly outweighs the time required for the next two operations.
• In each call to select() or poll(), the program must pass a data structure to the
kernel describing all of the file descriptors to be monitored, and, after checking
the descriptors, the kernel returns a modified version of this data structure to
the program. (Furthermore, for select(), we must initialize the data structure
before each call.) For poll(), the size of the data structure increases with the
number of file descriptors being monitored, and the task of copying it from
user to kernel space and back again consumes a noticeable amount of CPU
time when monitoring many file descriptors. For select(), the size of the data
structure is fixed by FD_SETSIZE, regardless of the number of file descriptors
being monitored.
• After the call to select() or poll(), the program must inspect every element of the
returned data structure to see which file descriptors are ready.
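The three costs listed above can be seen in a minimal sketch of a single select()-based wait. The helper name waitReadable() is hypothetical (it is not part of any API), and the sketch assumes that every descriptor is nonnegative and less than FD_SETSIZE and that only readability is of interest.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one of fds[0..nfds-1] is
   readable, then store the ready descriptors in ready[] (which must
   have room for nfds entries). Returns the number of ready
   descriptors, or -1 on error. */
int
waitReadable(const int *fds, size_t nfds, int *ready)
{
    fd_set readfds;
    int maxfd = -1;
    int numReady = 0;

    /* Second point: the set must be rebuilt before every call,
       because select() overwrites it with its result */
    FD_ZERO(&readfds);
    for (size_t j = 0; j < nfds; j++) {
        assert(fds[j] >= 0 && fds[j] < FD_SETSIZE);
        FD_SET(fds[j], &readfds);
        if (fds[j] > maxfd)
            maxfd = fds[j];
    }

    /* First point: the kernel checks every descriptor in the set */
    if (select(maxfd + 1, &readfds, NULL, NULL, NULL) == -1)
        return -1;

    /* Third point: the caller scans every monitored descriptor,
       even if only one of them is ready */
    for (size_t j = 0; j < nfds; j++)
        if (FD_ISSET(fds[j], &readfds))
            ready[numReady++] = fds[j];

    return numReady;
}
```

A program that monitors the same descriptors repeatedly would call waitReadable() in a loop, and so would repeat all three steps on every iteration, regardless of how few descriptors actually become ready.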
The consequence of the above points is that the CPU time required by select() and poll()
increases with the number of file descriptors being monitored (see Section 63.4.5 for
more details). This creates problems for programs that monitor large numbers of
file descriptors.
The poor scaling performance of select() and poll() stems from a simple limitation
of these APIs: typically, a program makes repeated calls to monitor the same set of file
descriptors; however, the kernel doesn’t remember the list of file descriptors to be
monitored between successive calls.
Signal-driven I/O and epoll, which we examine in the following sections, are
both mechanisms that allow the kernel to record a persistent list of file descriptors
in which a process is interested. Doing this eliminates the performance scaling
problems of select() and poll(), yielding solutions that scale according to the number
of I/O events that occur, rather than according to the number of file descriptors
being monitored. Consequently, signal-driven I/O and epoll provide superior
performance when monitoring large numbers of file descriptors.
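As a brief preview of the persistent-list idea, the following Linux-specific sketch registers each descriptor with an epoll instance once and then waits for events. The helper names addInterest() and waitReady() and the constant MAX_EVENTS are hypothetical, chosen only for illustration; the sketch assumes that only input events are of interest and leaves out the handling of interrupted calls.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16       /* Hypothetical cap on events per wait */

/* Hypothetical helper: add fd to the interest list of the epoll
   instance epfd. This is done once per descriptor; the kernel
   remembers the list across later waits. Returns 0 or -1. */
int
addInterest(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;        /* Interested in input only */
    ev.data.fd = fd;            /* Returned to us when fd is ready */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: block until at least one registered descriptor
   is ready, then store up to maxReady ready descriptors in ready[].
   Returns the number stored, or -1 on error. */
int
waitReady(int epfd, int *ready, int maxReady)
{
    struct epoll_event evlist[MAX_EVENTS];
    int numReady;

    assert(maxReady > 0);
    if (maxReady > MAX_EVENTS)
        maxReady = MAX_EVENTS;

    /* No descriptor list is passed in; only ready items come back */
    numReady = epoll_wait(epfd, evlist, maxReady, -1);
    if (numReady == -1)
        return -1;

    for (int j = 0; j < numReady; j++)
        ready[j] = evlist[j].data.fd;

    return numReady;
}
```

Here the caller would create the instance once with epoll_create1(0), call addInterest() once for each descriptor, and then call waitReady() repeatedly. The loop after epoll_wait() visits only the descriptors that are ready, so its cost depends on the number of events rather than on the size of the interest list.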
63.3 Signal-Driven I/O
With I/O multiplexing, a process makes a system call (select() or poll()) in order to
check whether I/O is possible on a file descriptor. With signal-driven I/O, a process
requests that the kernel send it a signal when I/O is possible on a file descriptor.
The process can then perform any other activity until I/O is possible, at which time