Linux Kernel Architecture


Chapter 2: Process Management and Scheduling


The structure can contain many more statistical elements if support for scheduler statistics has been
compiled into the kernel, and also has some more elements if group scheduling is enabled. The part
that is interesting for us right now, however, boils down to what you see above. The meaning of the
individual elements is as follows:

❑ load specifies a weight for each entity that contributes to the total load of the queue. Computing
the load weight is an important task of the scheduler because the speed of the virtual clock
required for CFS ultimately depends on it, so I discuss the method in detail in Section 2.5.3.
❑ run_node is a standard tree element that allows the entity to be sorted on a red-black tree.
❑ on_rq denotes whether the entity is currently scheduled on a run queue or not.
❑ When a process is running, the consumed CPU time needs to be recorded for the completely
fair scheduler. sum_exec_runtime is used for this purpose. Tracking the run time is done
cumulatively, in update_curr. The function is called from numerous places in the scheduler, for
instance, when a new task is enqueued, or from the periodic tick. At each invocation, the
difference between the current time and exec_start is computed, and exec_start is updated to
the current time. The difference interval is added to sum_exec_runtime.
The amount of time that has elapsed on the virtual clock during process execution is accounted
in vruntime.
❑ When a process is taken off the CPU, its current sum_exec_runtime value is preserved in
prev_sum_exec_runtime. The data will later be required in the context of process preemption.
Notice, however, that preserving the value of sum_exec_runtime in prev_sum_exec_runtime
does not mean that sum_exec_runtime is reset! The old value is kept, and sum_exec_runtime
continues to grow monotonically.

Since each task_struct has an instance of sched_entity embedded, a task is a schedulable entity.
Notice, however, that the converse is not true in general: A schedulable entity need not necessarily
be a task (with group scheduling, for instance, an entity can stand for a whole group of tasks). In the
following, we are concerned only with task scheduling, so for now we can equate scheduling entities and
tasks. Keep in mind that this is not true in general, though!

2.5.3 Dealing with Priorities


Priorities are deceptively simple from the userspace point of view: After all, they seem to be just a range
of numbers. The in-kernel reality is unfortunately somewhat different, and considerable effort is
required to work with priorities.

Kernel Representation of Priorities


The static priority of a process can be set in userspace by means of the nice command, which internally
invokes the nice system call.^25 The nice value of a process is between −20 and +19 (inclusive). Lower
values mean higher priorities. Why this strange range was chosen is shrouded in history.

The kernel uses a simpler scale ranging from 0 to 139 inclusive to represent priorities internally. Again,
lower values mean higher priorities. The range from 0 to 99 is reserved for real-time processes. The nice
values [−20,+19] are mapped to the range from 100 to 139, as shown in Figure 2-14. Real-time processes
thus always have a higher priority than normal processes can ever have.

(^25) setpriority is an alternative system call for setting process priority. It is able to modify not only the priority of an individual
thread but also the priorities of all processes in a process group or of all processes of a specific user, selected by means of the UID.
