Linux Kernel Architecture

Chapter 16: Page and Buffer Cache

Reading pages into the page cache before they are actually required is simple from a technical point of
view and can easily be achieved with the framework introduced so far in this chapter. The challenge lies
in predicting the optimal size of the readahead window. For this purpose, the kernel keeps track of the
last setting for each file. The following data structure is associated with every file instance:

<fs.h>
struct file_ra_state {
        pgoff_t start;            /* where readahead started */
        unsigned int size;        /* # of readahead pages */
        unsigned int async_size;  /* do asynchronous readahead when
                                     there are only # of pages ahead */
        unsigned int ra_pages;    /* Maximum readahead window */
...
        loff_t prev_pos;          /* Cache last read() position */
};

start denotes the position in the page cache where readahead was started, and size gives the size of
the readahead window. async_size represents the least number of remaining readahead pages. If only
this many pages are still available in the readahead window, then asynchronous readahead is initiated to
bring more pages into the page cache. The meaning of these values is also illustrated in Figure 16-4.
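
The relationship between start, size, and async_size can be made concrete with a small userspace sketch. It is not kernel code: struct ra_window and the two helper functions are hypothetical names introduced only for illustration, and the sketch assumes that asynchronous readahead becomes due once the page being accessed lies among the last async_size pages of the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three window fields of struct file_ra_state. */
struct ra_window {
    unsigned long start;      /* page index where readahead started */
    unsigned int size;        /* number of pages in the window */
    unsigned int async_size;  /* pages left when async readahead is due */
};

/* Index of the first page that belongs to the trailing async_size pages. */
unsigned long ra_async_trigger_index(const struct ra_window *ra)
{
    assert(ra->async_size <= ra->size);
    return ra->start + ra->size - ra->async_size;
}

/*
 * Assumption for this sketch: asynchronous readahead should be started
 * when idx lies inside the window and at most async_size pages remain.
 */
bool ra_should_start_async(const struct ra_window *ra, unsigned long idx)
{
    unsigned long end = ra->start + ra->size;

    return idx >= ra_async_trigger_index(ra) && idx < end;
}
```

For a window with start = 100, size = 16, and async_size = 4, the trigger index is 112: accessing pages 100 to 111 does nothing special, whereas accessing any of the pages 112 to 115 signals that more pages should be brought in.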

ra_pages denotes the maximum size of the readahead window. The kernel can decide to read in fewer
pages than specified by this value, but it will never read in more. Finally, prev_pos denotes the position
that was last visited in previous reads.

The offset is given as a byte offset into the file, not as a page offset into the page cache! This allows filesystem code that does not know anything about page cache offsets to aid the readahead mechanism.

The most important providers of this value are, however, do_generic_mapping_read and
filemap_fault.
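
Because prev_pos is a byte offset, code that works with page cache indices has to convert it first. The following userspace sketch shows the conversion. SKETCH_PAGE_SHIFT, sketch_loff_t, and both function names are hypothetical; a page size of 4 KiB is assumed, and the sequentiality test is only one plausible use of the converted value, not a statement about what the kernel does.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12            /* assumption: 4 KiB pages */

typedef long long sketch_loff_t;        /* stand-in for the kernel's loff_t */

/* Convert a byte offset into the file to a page index. */
unsigned long pos_to_page_index(sketch_loff_t pos)
{
    assert(pos >= 0);
    return (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
}

/*
 * Illustrative use: treat a read as sequential if it starts in the page
 * where the previous read ended, or in the page directly after it.
 */
int read_is_sequential(sketch_loff_t prev_pos, sketch_loff_t pos)
{
    unsigned long prev = pos_to_page_index(prev_pos);
    unsigned long cur = pos_to_page_index(pos);

    return cur == prev || cur == prev + 1;
}
```

With 4 KiB pages, the byte offsets 0 to 4095 all map to page index 0, and offset 4096 is the first byte of page 1.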

The routine ondemand_readahead is responsible for implementing the readahead policy, that is, for deciding
how many pages will be read in before they are actually required. As Figure 16-5 shows, both
page_cache_sync_readahead and page_cache_async_readahead rely on this function. After deciding
on the size of the readahead window, ra_submit is called to delegate the technical aspects to
__do_page_cache_readahead. Here pages are allocated in the page cache and subsequently filled from
the block layer.

Before discussing ondemand_readahead, two helper functions must be introduced: get_init_ra_size
determines the initial readahead window size for a file, and get_next_ra_size computes the window for
subsequent reads, that is, when a previous readahead window exists. get_init_ra_size determines the
window size based on the number of pages requested from the process, and get_next_ra_size bases
the computation on the size of the previous readahead window. Both functions ensure that the size of
the readahead window does not exceed a file-specific upper limit. While the limit can be modified with
the fadvise system call, it is usually set to VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE, which equates
to 32 pages on systems with a page size of 4 KiB. The results of both functions are shown in Figure 16-6.
The graph shows how the size of the initial readahead scales with request size, and also demonstrates
how the size of subsequent readahead operations scales depending on the size of the previous readahead
window.
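
The scaling behavior of the two helpers can be approximated by the following userspace sketch. It is not the kernel implementation: all names carry a sketch_ or SKETCH_ prefix to mark them as hypothetical, the value 128 for the readahead constant is inferred from the 32-page figure quoted above, and the growth thresholds are assumptions modeled loosely on 2.6-era heuristics that should be checked against mm/readahead.c. The only property taken directly from the text is that neither function returns more than the upper limit.

```c
#include <assert.h>

#define SKETCH_VM_MAX_READAHEAD 128UL   /* KiB; assumed, yields 32 pages */
#define SKETCH_PAGE_SIZE        4096UL  /* assumption: 4 KiB pages */

/* Default upper limit of the readahead window, in pages. */
unsigned long sketch_max_ra_pages(void)
{
    return SKETCH_VM_MAX_READAHEAD * 1024UL / SKETCH_PAGE_SIZE;
}

/* Smallest power of two that is >= n (returns 1 for n == 0). */
unsigned long sketch_round_up_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Initial window: grows with the request size, capped at max. */
unsigned long sketch_init_ra_size(unsigned long req, unsigned long max)
{
    unsigned long newsize;

    assert(max > 0);
    if (req >= max)
        return max;

    newsize = sketch_round_up_pow2(req);
    if (newsize <= max / 32)
        newsize *= 4;           /* small request: ramp up quickly */
    else if (newsize <= max / 4)
        newsize *= 2;           /* medium request: double */
    else
        newsize = max;          /* large request: use the limit */
    return newsize;
}

/* Subsequent window: grows from the previous size, capped at max. */
unsigned long sketch_next_ra_size(unsigned long cur, unsigned long max)
{
    unsigned long newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

    return newsize < max ? newsize : max;
}
```

Under these assumptions and a limit of 32 pages, a request for a single page yields an initial window of 4 pages, and successive windows starting from 4 pages grow to 8, 16, and 32 pages, after which they stay at the limit.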
