Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 16: Page and Buffer Cache


Implementation


The nodes of a radix tree are essentially represented by the following data structure:

<lib/radix_tree.c>
#define RADIX_TREE_TAGS 2
#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL? 4 : 6)
#define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT)
#define RADIX_TREE_TAG_LONGS \
((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct radix_tree_node {
unsigned int height; /* Height from the bottom */
unsigned int count;
struct rcu_head rcu_head;
void *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_TAGS][RADIX_TREE_TAG_LONGS];
};

The layout of this data structure is also very simple.slotsis an array ofvoidpointers that — depending
on the level in which the node is located — point to either data elements or further nodes.countholds the
number of used array entries in the node. The array is filled with entries starting at the top, and unused
entries have null pointers.

Each tree node can point to 64 further nodes (or leaves) as indicated in the definition of theslotarray
inradix_tree_node. The direct consequence of this definition is that each node may have only an array
size that is a power of two. Also, the size ofallradix elements may only be defined at compilation time
(of course, the maximum number of elements in a tree can change at run time). This behavior is rewarded
by speed gains.

Tagging


The information discussed so far — the address space and the page tree — does not, however, allow the
kernel to make a direct distinction between the clean and dirty pages of a mapping. This distinction is
essential when, for example, pages are to be written back to store changes permanently on the underlying
block device. Earlier kernel versions provided additional lists of dirty and clean pages inaddress_space.
In principle, the kernel could, of course, scan the entire tree and filter out the pages with the appropriate
state, but this is obviously very time-consuming. For this reason, each node of the radix tree includes
additional tagging information that specifies whether each page in the node has the property specified
in the tag. For example, the kernel uses a tag to label nodes with dirty pages. Nodes without this tag
can therefore be skipped during a scan for dirty pages. This approach is a compromise between simple,
unified data structures (no explicit lists are needed to hold pages with different states) and the option of
performing a quick search for pages with specific properties. Currently, two tags are supported:


  1. PAGECACHE_TAG_DIRTYspecifies whether a page is dirty.

  2. PAGECACHE_TAG_WRITEBACKindicates that the page is being written back at the moment.

Free download pdf