Linux Kernel Architecture

Chapter 16: Page and Buffer Cache

The structure does not correspond to that of the binary or ternary search trees in general use. Neither are radix trees balanced; in other words, there maybe any number of height differences between the branches of the tree. The tree itself consists of two different data structures and a further data structure to represent the leaves and hold the useful data. Because memory pages must be organized, the leaves are instances of thepagestructure in this case, a fact that is of no further importance in the implementation of the tree. (The kernel sources do not define a particular data type but use avoidpointer; this means that radix trees could also be used for other purposes, although this is not done at present.)

The root of the tree is represented by a simple data structure that holds the height of the tree (the maxi- mum number of levels to accommodate the nodes) anda pointer to the first node data structure of which thetreeiscomprised.

The nodes are basically arrays. For the sake of simplicity, the nodes are shown with four elements in the figure, but in the kernel sources, they actually have 2RADIX_TREE_MAP_SHIFTentries. Since RADIX_TREE_MAP_SHIFTis typically defined as 6, each array has 64 elements — considerably more than are shown in the figure. Small systems use aRADIX_TREE_MAP_SHIFTsetting of 4 to save precious memory.

The elements of the tree are addressed by means ofa unique key consisting of a simple integer. The details of the algorithm used to find elements by reference to their key are not discussed here. A descrip- tion of the relevant code is given in Appendix C.

Enlarging the tree and deleting tree elements are kernel operations that require little effort, so mini- mum time is lost in performing cache management operations. Their implementation is also described in greater detail in Appendix C.

Observe from Figure 16-1 that the tree is equipped with twosearch tags. They allow for specifying if a given page is dirty (i.e., the page contents are not identical with the data in the backing store) or if it is currently being written back to the underlying block device. It is important that the tags are not only set in the leaf elements, but also all the way up to the root element. If at least one pointer in leveln+1hasa tag set, then the pointer on levelnwill also acquire the tag.

This allows the kernel to decide that one or more pages in a range have a tag bit set. The figure provides an example: Since the dirty tag bit on the leftmost pointer in the first level is set, the kernel knows that one or more of the pages associated with the corresponding second-level node have the dirty tag bit set. If, on the other hand, a tag isnotset for a pointer in the higher levels, then the kernel can be sure that none of the pages in the lower levels has the tag.

Recall from Chapter 3 that each page as represented by an instance ofstruct pageis equipped with a set of flags. These also include dirty and writeback flags. The information in the radix tree tags therefore only augments kernel knowledge. Page cache tags are useful to quickly determine if at least one page in a region is dirty or under writeback without scanning all pages in the whole region. They are, however, no replacement for the direct page flags.

16.1.2 Writing Back Modified Data

Thanks to the page cache, write operations are not performed directly on the underlying block device but are carried out in memory where the modified data are first collected for subsequent transfer to the lower kernel layer, where the write operations can be further optimized — as discussed in Chapter 6 — to fully exploit the specific capabilities of the individual devices. Here we are interested

Linux Kernel Architecture

Chapter 16: Page and Buffer Cache

Get our desktop app

Company

Features

Documentation

Resources