Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 16: Page and Buffer Cache


The structure does not correspond to that of the binary or ternary search trees in general use. Neither
are radix trees balanced; in other words, there maybe any number of height differences between the
branches of the tree. The tree itself consists of two different data structures and a further data structure to
represent the leaves and hold the useful data. Because memory pages must be organized, the leaves are
instances of thepagestructure in this case, a fact that is of no further importance in the implementation
of the tree. (The kernel sources do not define a particular data type but use avoidpointer; this means
that radix trees could also be used for other purposes, although this is not done at present.)

The root of the tree is represented by a simple data structure that holds the height of the tree (the maxi-
mum number of levels to accommodate the nodes) anda pointer to the first node data structure of which
thetreeiscomprised.

The nodes are basically arrays. For the sake of simplicity, the nodes are shown with four elements
in the figure, but in the kernel sources, they actually have 2RADIX_TREE_MAP_SHIFTentries. Since
RADIX_TREE_MAP_SHIFTis typically defined as 6, each array has 64 elements — considerably more
than are shown in the figure. Small systems use aRADIX_TREE_MAP_SHIFTsetting of 4 to save precious
memory.

The elements of the tree are addressed by means ofa unique key consisting of a simple integer. The
details of the algorithm used to find elements by reference to their key are not discussed here. A descrip-
tion of the relevant code is given in Appendix C.

Enlarging the tree and deleting tree elements are kernel operations that require little effort, so mini-
mum time is lost in performing cache management operations. Their implementation is also described in
greater detail in Appendix C.

Observe from Figure 16-1 that the tree is equipped with twosearch tags. They allow for specifying if a
given page is dirty (i.e., the page contents are not identical with the data in the backing store) or if it is
currently being written back to the underlying block device. It is important that the tags are not only set
in the leaf elements, but also all the way up to the root element. If at least one pointer in leveln+1hasa
tag set, then the pointer on levelnwill also acquire the tag.

This allows the kernel to decide that one or more pages in a range have a tag bit set. The figure provides
an example: Since the dirty tag bit on the leftmost pointer in the first level is set, the kernel knows that
one or more of the pages associated with the corresponding second-level node have the dirty tag bit set.
If, on the other hand, a tag isnotset for a pointer in the higher levels, then the kernel can be sure that
none of the pages in the lower levels has the tag.

Recall from Chapter 3 that each page as represented by an instance ofstruct pageis equipped with a
set of flags. These also include dirty and writeback flags. The information in the radix tree tags therefore
only augments kernel knowledge. Page cache tags are useful to quickly determine if at least one page in
a region is dirty or under writeback without scanning all pages in the whole region. They are, however,
no replacement for the direct page flags.

16.1.2 Writing Back Modified Data


Thanks to the page cache, write operations are not performed directly on the underlying block device
but are carried out in memory where the modified data are first collected for subsequent transfer
to the lower kernel layer, where the write operations can be further optimized — as discussed in
Chapter 6 — to fully exploit the specific capabilities of the individual devices. Here we are interested
Free download pdf