Linux Kernel Architecture


Chapter 18: Page Reclaim and Swapping


The meanings of the elements are closely reflected in their variable names:


❑ nr_scanned reports to the caller how many inactive pages have been scanned and is used to
communicate between the various kernel functions involved in page reclaim.
❑ gfp_mask specifies allocation flags that are valid for the context in which the reclaim function is
invoked. This is important because it is sometimes necessary to allocate fresh memory during
page reclaim. If the context from which reclaim is initiated is not allowed to sleep, this
constraint must, of course, be forwarded to all functions called; this is precisely what gfp_mask is
designed to do.
❑ may_writepage selects whether the kernel is allowed to write out pages to the backing store.
Disabling this option is sometimes required when the kernel runs in laptop mode, as discussed
in Chapter 17.13.
❑ may_swap decides if swapping is allowed as part of the page reclaim endeavors. Swapping is
forbidden in only two cases: if page reclaim runs on behalf of the software suspend mechanism,
and if a NUMA zone explicitly disables swapping. These possibilities are not considered any
further in this book.
❑ swap_cluster_max is not actually related to swapping, but gives the minimum number of pages
per LRU list that are scanned in one page reclaim step. Usually, this is set to
SWAP_CLUSTER_MAX, which is defined as 32 by default.
❑ swappiness controls how aggressively the kernel tries to swap out pages; the value can range
between 0 and 100. By default, vm_swappiness is used. The standard setting is 60, but this can
be tuned via /proc/sys/vm/swappiness. See the discussion in Section 18.6.6 for more details on
how this parameter is used.
❑ all_unreclaimable is used to report the unfortunate situation in which memory in all zones is
currently completely unreclaimable. This can happen, for example, if all pages are pinned by the
mlock system call.
❑ The kernel can actively try to reclaim page clusters of a given page order. order denotes that
2^order contiguous pages are supposed to be reclaimed.
Higher-order allocations consisting of more than a single page are complicated to reclaim,
especially when the system has been up and running for some time. The kernel uses the lumpy reclaim
trick (which could also well be called a dirty trick) to satisfy such requests nevertheless, as
discussed below.
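
The fields described above can be collected into a small user-space sketch. This is not the kernel's own definition: the names scan_control_sketch, default_scan_control, pages_for_order, and clamp_swappiness are hypothetical, gfp_mask is modelled as a plain unsigned integer (the kernel uses gfp_t), and the boolean fields are a simplification chosen for readability. Only the values 32, 60, and the range 0 to 100 are taken from the text.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Values quoted in the text; the SKETCH_ names are hypothetical. */
#define SKETCH_SWAP_CLUSTER_MAX   32
#define SKETCH_DEFAULT_SWAPPINESS 60

/* Hypothetical user-space mirror of the fields described above. */
struct scan_control_sketch {
	unsigned long nr_scanned;   /* inactive pages scanned so far */
	unsigned int gfp_mask;      /* allocation flags (gfp_t in the kernel) */
	bool may_writepage;         /* may pages be written to the backing store? */
	bool may_swap;              /* may pages be swapped out? */
	int swap_cluster_max;       /* minimum pages scanned per LRU list and step */
	int swappiness;             /* 0..100, swap aggressiveness */
	bool all_unreclaimable;     /* set if no zone has reclaimable memory */
	int order;                  /* reclaim 2^order contiguous pages */
};

/* Number of contiguous pages requested for a given order: 2^order. */
static inline unsigned long pages_for_order(int order)
{
	assert(order >= 0 && order < (int)(sizeof(unsigned long) * CHAR_BIT));
	return 1UL << order;
}

/* Clamp a swappiness value to the documented range 0..100. */
static inline int clamp_swappiness(int value)
{
	if (value < 0)
		return 0;
	if (value > 100)
		return 100;
	return value;
}

/* Illustrative defaults for an ordinary reclaim pass (an assumption,
 * not the initialization used by any particular kernel function). */
static inline struct scan_control_sketch
default_scan_control(unsigned int gfp_mask, int order)
{
	struct scan_control_sketch sc = {
		.nr_scanned = 0,
		.gfp_mask = gfp_mask,
		.may_writepage = true,
		.may_swap = true,
		.swap_cluster_max = SKETCH_SWAP_CLUSTER_MAX,
		.swappiness = clamp_swappiness(SKETCH_DEFAULT_SWAPPINESS),
		.all_unreclaimable = false,
		.order = order,
	};
	return sc;
}
```

With these helpers, default_scan_control(0, 3) describes a request in which pages_for_order(3), that is, 8 contiguous pages, are supposed to be reclaimed.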

Before discussing the page-reclaiming code, recall that struct zone, as introduced a long time ago in
Chapter 3.2.2, contains numerous fields that will be required in the following:


<mmzone.h>
struct zone {
...
	unsigned long nr_scan_active;
	unsigned long nr_scan_inactive;
	unsigned long pages_scanned;
...
	/* Zone statistics */
	atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
...
};