Linux Kernel Architecture

Chapter 18: Page Reclaim and Swapping

Recall that isolate_lru_pages also picks pages adjacent to the page frame of a page on the free list if lumpy reclaim is used. If the allocation order of the request that led to the current reclaim pass is larger than the threshold order specified in PAGE_ALLOC_COSTLY_ORDER, lumpy reclaim is allowed to use both active and inactive pages when picking pages surrounding the tag page. For small allocation orders, only inactive pages may be used. The reason behind this is that larger allocations usually cannot be satisfied if the kernel is restricted to inactive pages: on a busy kernel, the chance that a large interval contains an active page is simply too big. PAGE_ALLOC_COSTLY_ORDER is set to 3 by default, which means that the kernel considers allocations of 8 or more contiguous pages as complicated.
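
The choice between the two cases can be sketched in a few lines of C. This is only an illustration of the rule described above, not the kernel's code: the enum, its constants, and both function names are hypothetical, and the threshold is hard-coded to the default value of 3 given in the text. The comparison follows the "larger than the threshold" wording.

```c
#include <assert.h>

/* Default value stated in the text; the real constant is defined by the kernel. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/* Hypothetical names for the two cases described in the text. */
enum sketch_isolate_mode {
    SKETCH_INACTIVE_ONLY,  /* neighbours of the tag page must be inactive */
    SKETCH_BOTH            /* neighbours may be active or inactive */
};

/* Decide which pages lumpy reclaim may pick around the tag page. */
enum sketch_isolate_mode sketch_mode_for_order(int order)
{
    assert(order >= 0);
    return (order > PAGE_ALLOC_COSTLY_ORDER) ? SKETCH_BOTH
                                             : SKETCH_INACTIVE_ONLY;
}

/* Number of contiguous pages requested by an allocation of the given order. */
unsigned long sketch_pages_for_order(int order)
{
    assert(order >= 0);
    return 1UL << order;
}
```

With these definitions, an order-3 request covers 8 contiguous pages and is still restricted to inactive neighbours, while an order-4 request (16 pages) may also take active ones.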

Although all pages on the inactive list are guaranteed to be inactive, lumpy reclaim can lead to active pages on the result list of isolate_lru_pages. To account for these pages properly, the auxiliary function clear_active_flags iterates over all pages, counts the active ones, and clears the page flag PG_active from each of them. Finally, the page list can be pushed onward to shrink_page_list for writeout. Notice that the asynchronous mode is employed.
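
The count-and-clear step can be pictured with a minimal sketch. It assumes a simplified, hypothetical page structure (a boolean standing in for PG_active and a plain singly linked list) instead of the kernel's struct page and list helpers; the function name is likewise invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct sketch_page {
    bool active;               /* models the PG_active flag */
    struct sketch_page *next;  /* next page on the local list */
};

/*
 * Walk the local list, count the pages that are still marked active,
 * and clear the flag on each of them. Returns the number of pages
 * whose flag was cleared.
 */
size_t sketch_clear_active_flags(struct sketch_page *head)
{
    size_t nr_active = 0;

    for (struct sketch_page *p = head; p != NULL; p = p->next) {
        if (p->active) {
            p->active = false;
            nr_active++;
        }
    }
    return nr_active;
}
```

A second call on the same list returns 0, since every flag has already been cleared.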

Notice that it is not certain that all pages selected for reclaim can actually be reclaimed. shrink_page_list leaves such pages on the passed list and returns the number of pages for which it succeeded in initiating writeout. This figure must be added to the total number of swapped-out pages to determine when work may be terminated.

Direct reclaim requires one more step:

mm/vmscan.c
        if (nr_freed < nr_taken && !current_is_kswapd() &&
                        sc->order > PAGE_ALLOC_COSTLY_ORDER) {
                congestion_wait(WRITE, HZ/10);
...
                nr_freed += shrink_page_list(&page_list, sc,
                                                PAGEOUT_IO_SYNC);
        }

If not all pages that were supposed to be reclaimed could be reclaimed, that is, if nr_freed < nr_taken, some pages on the list were locked and could not be written out in asynchronous mode.^13 If the kernel is performing the current reclaim pass in direct reclaim mode, that is, it was not called from the swapping daemon kswapd, and reclaims to fulfill a high-order allocation, then it first waits for any congestion on the block devices to settle. Afterward, another writeout pass is performed in synchronous mode. This has the drawback that higher-order allocations are somewhat delayed, but since they do not happen so often, this is not an issue. Allocations whose order does not exceed PAGE_ALLOC_COSTLY_ORDER, which arise much more frequently, are not disturbed.
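
The three conditions that must hold together before the synchronous pass is attempted can be isolated in a small predicate. The sketch below is a user-space illustration only: the function name and its parameters are hypothetical, the result of current_is_kswapd() is passed in as a plain boolean, and the threshold is again hard-coded to the default of 3.

```c
#include <stdbool.h>

/* Default value stated in the text; the real constant is defined by the kernel. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Returns true if a second, synchronous writeout pass is warranted:
 *  - some isolated pages could not be reclaimed (nr_freed < nr_taken),
 *  - the caller is performing direct reclaim (not running as kswapd),
 *  - the request is a high-order allocation.
 */
bool sketch_needs_sync_pass(unsigned long nr_freed, unsigned long nr_taken,
                            bool is_kswapd, int order)
{
    return nr_freed < nr_taken &&
           !is_kswapd &&
           order > PAGE_ALLOC_COSTLY_ORDER;
}
```

If any one of the three conditions fails, the predicate is false and the reclaim pass proceeds without the additional delay.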

Finally, the non-reclaimable pages must be returned to the LRU lists. Lumpy reclaim and failed writeout attempts might have led to active pages on the local list, so both the active and the inactive LRU lists are possible destinations. To preserve the LRU order, the kernel iterates over the local list from tail to head. Depending on whether the page is active or not, it is returned to the start of the appropriate LRU list using either add_page_to_active_list or add_page_to_inactive_list. Once again, the usage counter of each page must be decremented by 1 because it was incremented accordingly at the start of the procedure. The now familiar page vectors are used to ensure that this is done as quickly as possible because they perform processing block-by-block.
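
Why walking from tail to head preserves the order can be seen in a small model. The sketch assumes hypothetical, array-based lists in which slot 0 is the start of the list; all type and function names are invented for illustration, and the usage counter and page vectors are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_LRU_CAPACITY 16

/* Hypothetical page: an identifier plus its active/inactive state. */
struct sketch_page {
    int id;
    bool active;
};

/* Hypothetical LRU list; pages[0] is the start (head) of the list. */
struct sketch_lru {
    struct sketch_page pages[SKETCH_LRU_CAPACITY];
    size_t len;
};

/* Insert a page at the start of the list, shifting existing entries back. */
void sketch_add_to_head(struct sketch_lru *lru, struct sketch_page page)
{
    assert(lru->len < SKETCH_LRU_CAPACITY);
    for (size_t i = lru->len; i > 0; i--)
        lru->pages[i] = lru->pages[i - 1];
    lru->pages[0] = page;
    lru->len++;
}

/*
 * Return the pages left on the local list, walking from tail to head.
 * Because each page is inserted at the start of its destination list,
 * the reverse walk leaves the pages in their original relative order.
 */
void sketch_putback(const struct sketch_page *local, size_t n,
                    struct sketch_lru *active, struct sketch_lru *inactive)
{
    for (size_t i = n; i > 0; i--) {
        struct sketch_page page = local[i - 1];
        sketch_add_to_head(page.active ? active : inactive, page);
    }
}
```

For a local list holding pages 1 (inactive), 2 (active), 3 (inactive), and 4 (active), the active list ends up as 2, 4 and the inactive list as 1, 3, matching the order on the local list. A head-to-tail walk would reverse both.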

(^13) There can also be other reasons for this, for instance, a failed writeout, but the reason mentioned is the essential cause.
