Chapter 2: Process Management and Scheduling
Suppose that the virtual time of the run queue is given by fair_clock, while the waiting time of a process
is stored in wait_runtime. To sort tasks on the red-black tree, the kernel uses the difference fair_clock -
wait_runtime. While fair_clock is a measure of the CPU time a task would have received if scheduling
were completely fair, wait_runtime is a direct measure of the unfairness caused by the imperfections of
real systems.
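To make the ordering criterion tangible, consider the following minimal sketch in plain C (the function name and types are chosen for this illustration only and do not claim to match the kernel's symbols): the larger a task's accumulated unfairness, the smaller its key, and the further to the left it ends up in the tree.

#include <stdint.h>

/*
 * Illustrative sort key for the time-ordered red-black tree: the task
 * with the smallest key is the leftmost node and is selected to run next.
 */
static inline int64_t sort_key(int64_t fair_clock, int64_t wait_runtime)
{
        return fair_clock - wait_runtime;
}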
When a task is allowed to run, the interval during which it has been running is subtracted from
wait_runtime. This way, it will move rightward in the time-ordered tree at some point, and another
process becomes the leftmost one, which is consequently selected to run. Notice, however, that the virtual
clock in fair_clock will increase while the task is running. This effectively means that the share of
CPU time that the task would have received in a perfectly fair system is deducted from the time spent
executing on the real CPU. This slows down the reduction of unfairness: decrementing wait_runtime is
equivalent to lowering the amount of unfairness accumulated by the task, but the kernel must not forget
that some portion of the time used to lower the unfairness would have belonged to the process in a
completely fair world anyway. Suppose again that four processes sit on the run queue, and that a process
has been waiting for 20 real seconds. Now it is allowed to run for 10 seconds: wait_runtime is afterward
10, but since the process would have received 10/4 = 2.5 seconds of this time span anyway, effectively only
7.5 time units account for the potentially new position on the run queue.
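The arithmetic of this example can be reproduced with a few lines of C. This is only a back-of-the-envelope model under the simplifying assumption that the task's fair share of the interval is the real run time divided by the number of runnable tasks (which is also how far the virtual clock advances in this simplified picture); the actual kernel bookkeeping is more involved.

#include <stdio.h>

int main(void)
{
        const double delta      = 10.0; /* real seconds the task just ran      */
        const int    nr_running = 4;    /* number of tasks sharing the CPU     */
        double wait_runtime     = 20.0; /* the task has waited 20 real seconds */

        /* The full run time is charged against the accumulated waiting time. */
        wait_runtime -= delta;                        /* 20 - 10  = 10  */

        /* In a perfectly fair system the task would have received its share
         * of the interval anyway, so only the remainder effectively counts
         * toward its new position in the tree. */
        double fair_share = delta / nr_running;       /* 10 / 4   = 2.5 */
        double effective  = delta - fair_share;       /* 10 - 2.5 = 7.5 */

        printf("wait_runtime = %.1f, fair share = %.1f, effective = %.1f\n",
               wait_runtime, fair_share, effective);
        return 0;
}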
Unfortunately, this strategy is complicated by a number of real-world issues:
❑ Different priority levels for tasks (i.e., nice values) must be taken into account, and more important
processes must get a higher share of CPU time than less important ones.
❑ Tasks must not be switched too often because a context switch, that is, changing from one task to
another, has a certain overhead. When switching happens too often, too much time is spent switching
between tasks, and this time is no longer available for useful work.
On the other hand, the interval between task switches must not be too long either, because large
unfairness values could otherwise accumulate. Letting tasks run for too long can also lead
to larger latencies than desired for multimedia systems.
We will see how the scheduler tackles these problems in the following discussion.
A good way to understand scheduling decisions is to activate scheduler statistics at compile time. This
generates the file /proc/sched_debug, which contains information on all aspects of the current state
of the scheduler.
Finally, note that the Documentation/ directory contains some files that relate to various aspects of the
scheduler. Keep in mind, however, that some of them still relate to the old O(1) scheduler and are therefore
outdated!
2.5.2 Data Structures
The scheduler uses a series of data structures to sort and manage the processes in the system. How the
scheduler works is closely linked with the design of these structures. Several components interact with
each other in many ways, and Figure 2-13 provides a first overview of the connections.
Scheduling can be activated in two ways: either directly if a task goes to sleep or wants to yield the
CPU for other reasons, or by a periodic mechanism that is run with constant frequency and that checks
from time to time if switching tasks is necessary. I denote these two components generic scheduler or core