Correspondingly, for input, the kernel reads data from the disk and stores it in
a kernel buffer. Calls to read() fetch data from this buffer until it is exhausted, at
which point the kernel reads the next segment of the file into the buffer cache.
(This is a simplification; for sequential file access, the kernel typically performs
read-ahead to try to ensure that the next blocks of a file are read into the buffer
cache before the reading process requires them. We say a bit more about read-
ahead in Section 13.5.)
The aim of this design is to allow read() and write() to be fast, since they don’t
need to wait on a (slow) disk operation. This design is also efficient, since it reduces
the number of disk transfers that the kernel must perform.
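As a rough illustration of this behavior, the following sketch (our own example, not a listing from this book; the pathname is arbitrary) writes a few bytes to a file and immediately reads them back. The write() call returns as soon as the data reaches the kernel buffer cache, and the subsequent read() is satisfied from that cache, whether or not the data has yet been transferred to the disk.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        char buf[64];
        ssize_t numRead;
        int fd;

        /* "/tmp/bcache-demo" is an arbitrary pathname for this demonstration */
        fd = open("/tmp/bcache-demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }

        /* write() returns once the data is in the kernel buffer cache;
           the transfer to the disk happens later */
        if (write(fd, "hello", 5) != 5) { perror("write"); exit(EXIT_FAILURE); }

        /* Seek back and read; the data is supplied from the buffer cache,
           whether or not it has reached the disk yet */
        if (lseek(fd, 0, SEEK_SET) == -1) { perror("lseek"); exit(EXIT_FAILURE); }

        numRead = read(fd, buf, sizeof(buf));
        if (numRead == -1) { perror("read"); exit(EXIT_FAILURE); }

        printf("Read back %ld bytes: %.*s\n", (long) numRead, (int) numRead, buf);

        if (close(fd) == -1) { perror("close"); exit(EXIT_FAILURE); }
        exit(EXIT_SUCCESS);
    }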
The Linux kernel imposes no fixed upper limit on the size of the buffer cache.
The kernel will allocate as many buffer cache pages as are required, limited only by
the amount of available physical memory and the demands for physical memory
for other purposes (e.g., holding the text and data pages required by running pro-
cesses). If available memory is scarce, then the kernel flushes some modified buffer
cache pages to disk, in order to free those pages for reuse.
More precisely, from kernel 2.4 onward, Linux no longer maintains
a separate buffer cache. Instead, file I/O buffers are included in the page
cache, which, for example, also contains pages from memory-mapped files.
Nevertheless, in the discussion in the main text, we use the term buffer cache,
since that term is historically common on UNIX implementations.
Effect of buffer size on I/O system call performance
The kernel performs the same number of disk accesses, regardless of whether we
perform 1000 writes of a single byte or a single write of 1000 bytes. However, the
latter is preferable, since it requires a single system call, while the former requires
1000. Although much faster than disk operations, system calls nevertheless take an
appreciable amount of time, since the kernel must trap the call, check the validity
of the system call arguments, and transfer data between user space and kernel
space (refer to Section 3.1 for further details).
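The cost difference is easy to observe with a small test. The following sketch (our own illustration, not a listing from this book; the pathname is arbitrary) times 1000 single-byte write() calls against one 1000-byte write(), using gettimeofday() to measure elapsed time.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NBYTES 1000

    /* Return the current time in microseconds since the Epoch */
    static long long
    nowUsec(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long) tv.tv_sec * 1000000 + tv.tv_usec;
    }

    int
    main(void)
    {
        char buf[NBYTES];
        long long start;
        int fd, j;

        memset(buf, 'x', NBYTES);

        /* 1000 system calls, each transferring a single byte */
        fd = open("/tmp/write-demo", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
        start = nowUsec();
        for (j = 0; j < NBYTES; j++)
            if (write(fd, &buf[j], 1) != 1) { perror("write"); exit(EXIT_FAILURE); }
        printf("1000 x 1-byte writes: %lld usec\n", nowUsec() - start);
        close(fd);

        /* One system call transferring all 1000 bytes */
        fd = open("/tmp/write-demo", O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd == -1) { perror("open"); exit(EXIT_FAILURE); }
        start = nowUsec();
        if (write(fd, buf, NBYTES) != NBYTES) { perror("write"); exit(EXIT_FAILURE); }
        printf("1 x 1000-byte write:  %lld usec\n", nowUsec() - start);
        close(fd);

        exit(EXIT_SUCCESS);
    }

On a typical system, the loop of 1000 calls takes markedly longer, even though both variants transfer the same data into the buffer cache.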
The impact of performing file I/O using different buffer sizes can be seen by
running the program in Listing 4-1 (on page 71) with different BUF_SIZE values.
(The BUF_SIZE constant specifies how many bytes are transferred by each call to
read() and write().) Table 13-1 shows the time that this program requires to copy a
file of 100 million bytes on a Linux ext2 file system using different BUF_SIZE values.
Note the following points concerning the information in this table:
•  The Elapsed and Total CPU time columns have the obvious meanings. The User
   CPU and System CPU columns show a breakdown of the Total CPU time into,
   respectively, the time spent executing code in user mode and the time spent
   executing kernel code (i.e., system calls).
•  The tests shown in the table were performed using a vanilla 2.6.30 kernel on an
   ext2 file system with a block size of 4096 bytes.
When we talk about a vanilla kernel, we mean an unpatched mainline kernel.
This is in contrast to kernels that are supplied by most distributors, which
often include various patches to fix bugs or add features.
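For readers who wish to reproduce measurements of this kind, here is a minimal copy loop in the spirit of Listing 4-1 (this is a sketch, not the book's listing; the default BUF_SIZE value is a placeholder):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #ifndef BUF_SIZE        /* Allow "cc -DBUF_SIZE=..." to override this value */
    #define BUF_SIZE 4096
    #endif

    int
    main(int argc, char *argv[])
    {
        char buf[BUF_SIZE];
        ssize_t numRead;
        int inputFd, outputFd;

        if (argc != 3) {
            fprintf(stderr, "Usage: %s old-file new-file\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        inputFd = open(argv[1], O_RDONLY);
        if (inputFd == -1) { perror("open input"); exit(EXIT_FAILURE); }

        outputFd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (outputFd == -1) { perror("open output"); exit(EXIT_FAILURE); }

        /* Transfer BUF_SIZE bytes per read()/write() pair; smaller values
           mean more system calls, and thus more CPU time, for the same file */
        while ((numRead = read(inputFd, buf, BUF_SIZE)) > 0)
            if (write(outputFd, buf, numRead) != numRead) {
                perror("write"); exit(EXIT_FAILURE);
            }
        if (numRead == -1) { perror("read"); exit(EXIT_FAILURE); }

        if (close(inputFd) == -1 || close(outputFd) == -1) {
            perror("close"); exit(EXIT_FAILURE);
        }
        exit(EXIT_SUCCESS);
    }

Compiling with, say, cc -DBUF_SIZE=64 -o copy copy.c and running the result under time(1) should produce elapsed, user CPU, and system CPU figures comparable to those in Table 13-1; small BUF_SIZE values show a sharp rise in system CPU time, since many more system calls are needed to copy the same file.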