Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 18: Page Reclaim and Swapping


        if (order < new_order) {
            /*
             * Don't sleep if someone wants a larger 'order'
             * allocation
             */
            order = new_order;
        } else {
            schedule();
            order = pgdat->kswapd_max_order;
        }
        finish_wait(&pgdat->kswapd_wait, &wait);
        ...
        balance_pgdat(pgdat, 0, order);
    }
    return 0;
}

❑ prepare_to_wait places the task on the wait queue of the NUMA node (pgdat->kswapd_wait) that is
passed to the daemon as a parameter.
❑ The function keeps a record of the last allocation order for which node balancing was
performed. If the allocation order specified in kswapd_max_order is greater than the last value,
balance_pgdat is invoked to rebalance the node (I discuss this shortly). Otherwise, the daemon
gives up the CPU by means of schedule, and control passes to another kernel thread or to userspace.
If the kernel thinks it necessary to invoke the daemon out of sequence, it does so by means of
wake_up_interruptible.
As described in Chapter 14, finish_wait performs the necessary clean-up work after the task
has been woken.
❑ Following wakeup and after schedule, the kernel first rebalances the node, and then the loop
starts afresh. If the current allocation order is greater than that for which balancing was last
performed, balance_pgdat is invoked again with the larger parameter; otherwise the daemon goes
to sleep.
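
The order bookkeeping described in the list above can be tried out in userspace. The following is only a sketch under stated assumptions, not the kernel code: struct node_sim, balance_sim, and kswapd_step_sim are hypothetical names, sleeping is reduced to a flag, and the read-and-reset of kswapd_max_order at the top of each iteration is assumed, because that part of the loop lies outside the excerpt shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node fields the daemon consults. */
struct node_sim {
    unsigned int kswapd_max_order; /* largest order requested by wakers */
    unsigned int balance_calls;    /* number of balancing passes so far */
    unsigned int last_balanced;    /* order used for the latest pass */
};

/* Stand-in for balance_pgdat: it only records the requested order. */
static void balance_sim(struct node_sim *n, unsigned int order)
{
    n->balance_calls++;
    n->last_balanced = order;
}

/*
 * One iteration of the daemon loop.
 * *order is the order of the previous balancing pass.
 * wake_order models a request that a waker posts while the daemon sleeps.
 * Returns true if the daemon would have called schedule().
 */
static bool kswapd_step_sim(struct node_sim *n, unsigned int *order,
                            unsigned int wake_order)
{
    unsigned int new_order;
    bool slept = false;

    assert(n && order);

    /* Assumed: read the pending request and reset it. */
    new_order = n->kswapd_max_order;
    n->kswapd_max_order = 0;

    if (*order < new_order) {
        /* A larger order is pending: skip the sleep. */
        *order = new_order;
    } else {
        /* schedule(): sleep until a waker posts a request. */
        slept = true;
        if (wake_order > n->kswapd_max_order)
            n->kswapd_max_order = wake_order;
        *order = n->kswapd_max_order;
    }

    /* Rebalance the node, then the caller starts the loop afresh. */
    balance_sim(n, *order);
    return slept;
}
```

Calling kswapd_step_sim repeatedly with different pending orders shows the two paths: a pending order larger than the last one triggers balancing without a sleep, while anything else puts the simulated daemon to sleep until the next request arrives.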

Figure 18-21 shows the code flow diagram for the balance_pgdat function defined in mm/vmscan.c. In
this function, the kernel decides how many memory pages are to be freed and forwards this information
to the shrink_zone function discussed above.

[Figure 18-21: Code flow diagram for balance_pgdat. Steps shown: determine parameters; iterate over
decreasing priorities (highest priority? disable_swap_token); iterate over all zones of the node
(shrink_zone, shrink_slab); possible congestion? congestion_wait; all zones ok? stop scanning.]
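
The nested iteration in Figure 18-21 can be approximated with a small userspace model. This is a sketch under assumptions rather than the kernel implementation: struct zone_sim, shrink_zone_sim, balance_node_sim, and the starting priority DEF_PRIORITY_SIM are hypothetical, every scanned page is assumed to be freed, and the disable_swap_token, shrink_slab, and congestion_wait steps are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed starting priority; 0 is the most aggressive scan. */
#define DEF_PRIORITY_SIM 12

/* Hypothetical, heavily reduced view of a memory zone. */
struct zone_sim {
    unsigned long free_pages;  /* currently free pages */
    unsigned long pages_high;  /* watermark the zone should reach */
    unsigned long reclaimable; /* pages that could still be freed */
};

static bool zone_ok(const struct zone_sim *z)
{
    return z->free_pages >= z->pages_high;
}

/*
 * Stand-in for shrink_zone: the scan size grows as the priority value
 * drops. Simplification: every scanned page is freed.
 */
static unsigned long shrink_zone_sim(struct zone_sim *z, int priority)
{
    unsigned long scan = z->reclaimable >> priority;

    z->reclaimable -= scan;
    z->free_pages += scan;
    return scan;
}

/*
 * Walks priorities from DEF_PRIORITY_SIM down to 0 and, for each one,
 * all zones of the node. Returns the priority at which every zone met
 * its watermark, or -1 if that never happened. *freed receives the
 * total number of pages freed.
 */
static int balance_node_sim(struct zone_sim *zones, size_t nr_zones,
                            unsigned long *freed)
{
    assert(zones && freed);
    *freed = 0;

    for (int priority = DEF_PRIORITY_SIM; priority >= 0; priority--) {
        bool all_ok = true;

        for (size_t i = 0; i < nr_zones; i++) {
            if (zone_ok(&zones[i]))
                continue; /* nothing to do for this zone */
            *freed += shrink_zone_sim(&zones[i], priority);
            if (!zone_ok(&zones[i]))
                all_ok = false;
        }
        if (all_ok)
            return priority; /* all zones ok: stop scanning */
    }
    return -1;
}
```

The model makes one property of the loop visible: a zone that is only slightly below its watermark is usually satisfied at a low-pressure priority, and the scan becomes more aggressive only when earlier passes did not free enough pages.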