Linux Kernel Architecture

Chapter 13: System Calls


a negative value by adding a sign. For example, if an operation were not permitted, a handler routine
would execute return -EPERM to yield the error code −1.
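
The convention can be sketched in a few lines of C. The function name and its parameter are hypothetical and serve only as an illustration; the only fact relied on is that the constant for "operation not permitted" in <errno.h> is EPERM, with the value 1.

```c
#include <errno.h>

/* Hypothetical handler, not taken from the kernel sources: it refuses the
 * operation unless the caller is privileged. */
long demo_handler(int caller_is_privileged)
{
    if (!caller_is_privileged)
        return -EPERM;  /* EPERM is 1, so the caller receives -1 */
    return 0;           /* success */
}
```

Negating the positive constant at the return site keeps the symbolic names positive while every error result stays below zero.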

Let us examine the open system call with a particular focus on its return values (the sys_open implementation
is discussed in Chapter 8). What can go wrong when a file is opened? Not much, you would think. But
the kernel finds nine ways of causing problems. For the individual sources of error, see the standard
library documentation (and, of course, the kernel sources). The most frequent system call error codes are
as follows:

❑ EACCES indicates that a file cannot be processed in the desired access mode; for example, a file
cannot be opened for write access if the write bit is not set in its mode bits.
❑ EEXIST is returned if an attempt is made to exclusively create a file (O_CREAT together with O_EXCL)
that already exists.
❑ ENOENT means that the desired file does not exist, and the flag to allow files that do not exist to be
created (O_CREAT) is not specified.
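
Two of these codes are easy to provoke from userspace. The following is a minimal sketch, assuming a POSIX system; the helper name open_errno is made up for this illustration. It reports the errno value that the C library sets when open fails.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open `path` with `flags`. Returns 0 on success, otherwise the
 * errno value reported by the C library. The helper name is hypothetical. */
int open_errno(const char *path, int flags)
{
    int fd = open(path, flags, 0600);   /* mode is only used with O_CREAT */
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Calling open_errno on a missing path with O_RDONLY should yield ENOENT. Calling it twice on the same path with O_WRONLY | O_CREAT | O_EXCL should succeed the first time and yield EEXIST the second time.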

A non-negative number is returned if the system call terminates successfully. As discussed
in Chapter 8, this is a file handle (file descriptor) that is used to represent the file in all subsequent
operations as well as in the internal data structures of the kernel.

Linux uses the long data type to transfer results from kernel space to userspace; this is either 32 or 64
bits wide depending on processor type. One bit is used as the sign bit.^8 This causes no problems for most
system calls, such as open. The positive values returned are usually so small that they fit into the range
provided by long.

Unfortunately, the situation is more complicated when returning large numbers that occupy the full
unsigned long space. This is the case with system calls that return memory addresses, such as mmap
(on which malloc relies), if addresses are allocated at the top of virtual memory space. The kernel then
interprets the returned pointer as a negative number because it overruns the positive range of signed long;
this would be reported as an error even though the system call terminated successfully. How can the
kernel prevent such mishaps?

As noted above, the symbolic constants for error codes that reach userspace extend only up to 511. In
other words, error codes are returned in the range from −1 to −511. Consequently, all lower values are
excluded from error handling and are interpreted correctly, namely as very high return values of
successful system calls.
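
The range test can be written as a single unsigned comparison. This is a minimal sketch; the macro and function names are hypothetical, and the bound of 511 is taken from the text rather than from any particular kernel version, which may use a different limit.

```c
/* Highest error number assumed to reach userspace, per the text.
 * The name is hypothetical. */
#define DEMO_MAX_ERRNO 511

/* Nonzero if `raw`, read as a signed long, lies in the range -511..-1.
 * Negating an unsigned value wraps around, so the comparison selects
 * exactly the top 511 values of the unsigned long range. */
int demo_is_error(unsigned long raw)
{
    return raw >= -(unsigned long)DEMO_MAX_ERRNO;
}
```

A pointer near the top of the address space, for instance one whose signed interpretation is −4096, fails this test and is therefore passed through as a successful result.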

All that now needs to be done to successfully terminate the system call is to switch back from kernel mode
to user mode. The result value is returned using a mechanism that functions similarly in the opposite
direction. The C function in which the system call handler is implemented uses return to place the
return code on the kernel stack. This value is copied into a specific processor register (eax on IA-32
systems, v0 on Alpha systems, where a3 additionally signals an error, etc.), where it is processed by the
standard library and transferred to user applications.
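
What the library does with the register value can be sketched as follows. This is an illustration only, not the code of any real C library; the function name is hypothetical and the bound of 511 is again taken from the text. It follows the familiar userspace convention in which a failed call returns −1 and the positive error number is stored in errno.

```c
#include <errno.h>

/* Hypothetical post-processing of the raw value found in the result
 * register after the switch back to user mode. */
long demo_finish_syscall(long raw)
{
    if (raw < 0 && raw >= -511) {
        errno = (int)-raw;  /* store the positive error number */
        return -1;          /* uniform failure indicator for the caller */
    }
    return raw;             /* success: descriptor, byte count, address, ... */
}
```

With this translation, a raw result of -ENOENT reaches the application as −1 with errno set to ENOENT, while a raw result of 5 is handed on unchanged.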

13.3.2 Access to Userspace


Even though the kernel does its best to keep kernel space and userspace separate, there are situations
in which kernel code has to access the virtual memory of user applications. Of course, this only makes
sense when the kernel is performing a synchronous action initiated by a user application — write and
read access by arbitrary processes not only serves no purpose, but may also produce risky results in the
code currently executing.

(^8) Of course, 2’s complement notation is used to prevent errors where there are two zeros with different signs. See
http://en.wikipedia.org/wiki/Two%27s_complement for more information about this format.
