Linux Kernel Architecture


Chapter 5: Locking and Interprocess Communication


The program sequence could be modified in an attempt to prevent this error by changing the individual steps
after the data have been received. For example, processes could increment the value of counter immediately
after reading its value and before opening a file. However, closer examination of suggestions of this kind
quickly leads to the conclusion that it is always possible to devise situations that result in a fatal error.
If we look at our suggestion, it soon becomes clear that an inconsistency arises if the scheduler is
invoked between reading counter and incrementing its value.

Situations in which several processes interfere with each other when accessing resources are generally
referred to as race conditions. Such conditions are a central problem in the programming of distributed
applications because they usually cannot be detected by systematic trial and error. Instead, a thorough
study of source code (coupled with intimate knowledge of the various paths that code can take) and a
generous supply of intuition are needed to find and eliminate them.

Situations leading to race conditions are few and far between, which raises the question of whether
it is worth making the (sometimes very considerable) effort to protect code against their occurrence.

In some environments (electronic aircraft control, monitoring of vital machinery, or dangerous equipment),
race conditions may prove to be fatal in the literal sense of the word. But even in routine software
projects, protection against potential race conditions is an important contribution to program quality
and user satisfaction. As part of improved multiprocessor support in the Linux kernel, much effort has
been invested in pinpointing areas where dangers lurk and in providing suitable protection. Unexpected
system crashes and mysterious errors caused by a lack of protection are simply unacceptable.

5.1.2 Critical Sections


The essence of the problem is as follows: processes are interrupted at points where they must not be if
they are to do their work correctly. An obvious solution is to mark the relevant code section so that it
can no longer be interrupted by the scheduler. Although this approach would work in principle, it has
several inherent problems. For example, a faulty program might never find its way out of the marked code
section and would then never relinquish the CPU, making the computer unusable. We must therefore reject
this solution out of hand.^1

A solution to the problem does not necessarily require that the critical section be uninterruptible.
Processes may well be interrupted in a critical section as long as no other process enters the section. This
strict prohibition ensures that values cannot be changed by several processes at the same time and is
referred to as mutual exclusion: only one process may be inside a critical section at any given time.

There are many ways of designing a mutual exclusion method of this kind (regardless of the technical
implementation). However, all must ensure that the exclusion principle holds under all circumstances.
It must depend neither on the number nor on the speed of the processors involved. If it did (and the
solution therefore worked only on a particular computer system with a specific hardware configuration),
the solution would be impracticable because it would not provide general protection, which is exactly
what is needed. Processes should also not be able to block each other and come to a permanent stop
(a deadlock). Although this is a desirable goal, it cannot always be achieved by technical means, as
you will see below. It is often up to the programmer to think ahead so that the problem does not occur.

On which principles can a mutual exclusion method be based? A multitude of different solutions
have been proposed in the history of multitasking and multiuser systems, all with their specific

^1 The kernel itself can (and must) reserve the right to disable interrupts at certain points to seal itself off completely from external or
periodic events. This is not, however, possible for user processes.
