Linux Kernel Architecture

Chapter 5: Locking and Interprocess Communication

The program sequence could be modified to prevent this error by changing the individual steps after data have been received. For example, processes could increment the value of counter immediately after reading its value and before opening a file. However, closer examination of suggestions of this kind quickly leads to the conclusion that it is always possible to devise situations that result in a fatal error. If we look at our suggestion, it soon becomes clear that an inconsistency is generated if the scheduler is invoked between reading counter and incrementing its value.
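The problematic interleaving can be replayed deterministically. The following is only a minimal sketch, not kernel code: the two "processes" are simulated by calling the read step and the write step by hand, and the names struct proc, read_counter, write_incremented, run_sequential, and run_interleaved are hypothetical helpers introduced for illustration. Only counter comes from the text.

```c
/* Shared state; the name `counter` follows the text. */
static int counter = 0;

/* Hypothetical per-process state: the private copy each process read. */
struct proc {
    int local;
};

/* Step 1: read the shared counter into a private copy. */
static void read_counter(struct proc *p)
{
    p->local = counter;
}

/* Step 2: write the incremented private copy back. */
static void write_incremented(struct proc *p)
{
    counter = p->local + 1;
}

/* Each process completes both steps before the other starts. */
int run_sequential(void)
{
    struct proc a, b;

    counter = 0;
    read_counter(&a);
    write_incremented(&a);
    read_counter(&b);
    write_incremented(&b);
    return counter;             /* both increments are visible */
}

/* The "scheduler" switches from A to B right after A has read. */
int run_interleaved(void)
{
    struct proc a, b;

    counter = 0;
    read_counter(&a);           /* A reads 0 */
    read_counter(&b);           /* B reads 0, A has not written yet */
    write_incremented(&a);      /* counter becomes 1 */
    write_incremented(&b);      /* counter becomes 1 again: A's update is lost */
    return counter;
}
```

With this sketch, run_sequential() yields 2, whereas run_interleaved() yields 1 even though two increments were performed. Both processes also worked with the same value of counter, which is exactly the inconsistency described above.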

Situations in which several processes interfere with each other when accessing resources are generally referred to as race conditions. Such conditions are a central problem in the programming of distributed applications because they cannot usually be detected by systematic trial and error. Instead, a thorough study of source code (coupled with intimate knowledge of the various paths that code can take) and a generous supply of intuition are needed to find and eliminate them.

Situations leading to race conditions are few and far between, which raises the question of whether it is worth making the sometimes very considerable effort to protect code against their occurrence.

In some environments (electronic aircraft control, monitoring of vital machinery, or dangerous equipment), race conditions may prove to be fatal in the literal sense of the word. But even in routine software projects, protection against potential race conditions is an important contribution to program quality and user satisfaction. As part of improved multiprocessor support in the Linux kernel, much effort has been invested in pinpointing areas where dangers lurk and in providing suitable protection. Unexpected system crashes and mysterious errors owing to lack of protection are simply unacceptable.

5.1.2 Critical Sections

The essence of the problem is as follows: Processes are interrupted at points where they shouldn’t be if they are to do their work correctly. Obviously, a potential solution is to mark the relevant code section so that it can no longer be interrupted by the scheduler. Although this approach would work in principle, there are several inherent problems. Under certain circumstances, a faulty program would be unable to find its way out of the marked code section and would fail to relinquish the CPU, thus making the computer unusable. We must therefore reject this solution out of hand.^1

The solution to the problem does not necessarily require that the critical section not be interrupted. Processes may well be interrupted in a critical section as long as no other process enters the section. This strict prohibition ensures that values cannot be changed by several processes at the same time and is referred to as mutual exclusion. Only one process may enter a critical area of code at a given time.
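The principle can be sketched in user-space C with an atomic test-and-set flag. This is an illustration under stated assumptions, not the mechanism the kernel uses: the names section_busy, try_enter, leave, increment_counter, and get_counter are hypothetical, and a caller that is refused entry simply gets false back instead of waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while some process is inside the critical section. */
static atomic_flag section_busy = ATOMIC_FLAG_INIT;

/* Shared data protected by the critical section. */
static int counter = 0;

/*
 * Try to enter the critical section. Test-and-set is a single atomic
 * step, so at most one caller can see the flag change from clear to set.
 */
bool try_enter(void)
{
    return !atomic_flag_test_and_set(&section_busy);
}

/* Leave the critical section so that another process may enter. */
void leave(void)
{
    atomic_flag_clear(&section_busy);
}

/* Increment the counter only if the critical section could be entered. */
bool increment_counter(void)
{
    if (!try_enter())
        return false;           /* another process is inside: try again later */

    counter = counter + 1;      /* read and write happen inside the section */
    leave();
    return true;
}

int get_counter(void)
{
    return counter;
}
```

Note that the holder of the section may be interrupted at any point between try_enter() and leave() without harm. What matters is that every other call to try_enter() fails until leave() has run.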

There are many ways of designing a mutual exclusion method of this kind (regardless of the technical implementation). However, all must ensure that the exclusion principle functions under all circumstances. It must depend neither on the number nor on the speed of the processors involved. If this were the case (and the solution were therefore only available on a given computer system with a specific hardware configuration), the solution would be impracticable because it would not provide general protection, and this is exactly what is needed. Processes should not be allowed to block each other and come to a permanent stop. Although this is a desirable goal, it cannot always be achieved by technical means as you will see below. It is often up to the programmer to think ahead so that the problem does not occur.
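How two processes can block each other permanently, and how thinking ahead avoids it, can be sketched with two test-and-set flags. This is a hypothetical illustration only: lock_a, lock_b, try_lock, unlock, opposite_order_blocks, and same_order_progresses are invented names, the two processes P1 and P2 are simulated by hand-ordered calls, and a fixed acquisition order is just one common convention, not a technique the text prescribes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Two hypothetical resources, each guarded by its own flag. */
static atomic_flag lock_a = ATOMIC_FLAG_INIT;
static atomic_flag lock_b = ATOMIC_FLAG_INIT;

static bool try_lock(atomic_flag *l)
{
    return !atomic_flag_test_and_set(l);
}

static void unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/*
 * P1 takes A and then wants B; P2 takes B and then wants A.
 * Returns true if each holds one lock and neither can get the other.
 */
bool opposite_order_blocks(void)
{
    bool p1_has_a  = try_lock(&lock_a);   /* P1 takes A */
    bool p2_has_b  = try_lock(&lock_b);   /* P2 takes B */
    bool p1_gets_b = try_lock(&lock_b);   /* refused: P2 holds B */
    bool p2_gets_a = try_lock(&lock_a);   /* refused: P1 holds A */
    bool stuck = p1_has_a && p2_has_b && !p1_gets_b && !p2_gets_a;

    /* Reset both flags so the demonstration can be repeated. */
    unlock(&lock_a);
    unlock(&lock_b);
    return stuck;
}

/*
 * Both processes agree to take A first. P2 is refused at A and
 * therefore holds nothing that P1 still needs.
 */
bool same_order_progresses(void)
{
    bool p1_has_a = try_lock(&lock_a);    /* P1 takes A */
    bool p2_has_a = try_lock(&lock_a);    /* refused: P2 holds nothing */
    bool p1_has_b = try_lock(&lock_b);    /* B is free: P1 holds both */
    bool progressed = p1_has_a && !p2_has_a && p1_has_b;

    unlock(&lock_b);
    unlock(&lock_a);
    return progressed;
}
```

In the first function, neither process can ever continue if both simply wait for the missing lock. In the second, P1 finishes its work and releases both locks, after which P2 can proceed. The exclusion mechanism is identical in both cases; only the order chosen by the programmer differs.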

Which principles are applied to support a mutual exclusion method? A multitude of different solutions have been proposed in the history of multitasking and multiuser systems, all with their specific

(^1) The kernel itself can (and must) reserve the right to disable interrupts at certain points to seal itself off completely from external or periodic events. This is not, however, possible for user processes.
