Linux Kernel Architecture


Chapter 13: System Calls


The processing of system calls is, of course, a classic situation in which the kernel is busy with the
synchronous execution of a task assigned to it by an application. There are two reasons why the kernel has
to access the address space of user applications:

❑ If a system call requires more than six different arguments, they can be passed only with the help
of C structures that reside in process memory space. A pointer to the structures is passed to the
system call by means of registers.
❑ Larger amounts of data generated as a side effect of a system call cannot be passed to the user
process using the normal return mechanism. Instead, the data must be exchanged in defined
memory areas. These must, of course, be located in userspace so that the user application is able
to access them.
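
Seen from userspace, both points amount to handing the kernel the address of a structure that lives in process memory. The following is a minimal sketch using uname(2), which fills a struct utsname supplied by the caller; the helper name query_kernel_name is hypothetical and chosen only for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Ask the kernel to fill a structure that resides in this process's
 * memory and copy the system name into `out`.
 * Returns 0 on success, -1 if the system call failed. */
int query_kernel_name(char *out, size_t outlen)
{
    struct utsname info;            /* result buffer in userspace */

    assert(out != NULL && outlen > 0);

    /* Only the address of `info` is passed to the kernel; the kernel
     * writes the result data into the structure itself. */
    if (uname(&info) != 0)
        return -1;

    strncpy(out, info.sysname, outlen - 1);
    out[outlen - 1] = '\0';
    return 0;
}
```

A caller would pass a small character array, for example `char name[65]; query_kernel_name(name, sizeof name);`, and read the system name from it afterwards.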

When the kernel accesses its own memory area, it can always be sure that there is a mapping between
the virtual address and a physical memory page. The situation in userspace is different, as described in
Chapter 3. Here, pages might be swapped out or not even be allocated.

The kernel therefore must not simply dereference userspace pointers; instead, it must employ specific
functions to ensure that the desired area resides in RAM. To make sure that the kernel complies with this
convention, userspace pointers are labeled with the __user attribute to support automated checking by
C check tools.^9

Chapter 3 discusses the functions used to copy data between userspace and kernel space. In most cases,
these will be copy_to_user and copy_from_user, but more variants are available.
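
The effect of these checked copies can be observed from userspace: when a system call is given a pointer that has no valid mapping, the copy fails and the call reports EFAULT instead of crashing the kernel. The sketch below illustrates this under some assumptions: it writes to a pipe so that the kernel really has to read the buffer, and the helper name probe_user_buffer is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hand `buf` to write(2) on a pipe and report how the kernel reacted.
 * Returns 0 if the kernel could read `len` bytes from `buf`, otherwise
 * the errno value it reported (EFAULT for an unmapped pointer). */
int probe_user_buffer(const void *buf, size_t len)
{
    int fds[2];
    int result = 0;

    /* Keep the request small so that the pipe never blocks. */
    assert(len > 0 && len <= 4096);

    if (pipe(fds) != 0)
        return errno;

    /* Raw system call, so no library wrapper touches `buf` first. */
    if (syscall(SYS_write, fds[1], buf, len) < 0)
        result = errno;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Calling the helper with an ordinary string should yield 0, whereas an address in the unmapped first page, such as `(const void *)1`, is expected to yield EFAULT.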

13.3.3 System Call Tracing


The strace tool, developed to trace the system calls of processes using the ptrace system call, is described
in Section 13.1.1.

Implementation of the sys_ptrace handler routine is architecture-specific and is defined in
arch/<arch>/kernel/ptrace.c. Fortunately, there are only minor differences between the code of the
individual versions. I therefore provide a generalized description of how the routine works without
going into architecture-specific details.

Before examining the flow of the system call in detail, it should be noted why this is necessary:
ptrace is essentially a tool for reading and modifying values in process address space and cannot be used
directly to trace system calls. Only by extracting the desired information at the right places can tracing
processes draw conclusions about which system calls have been made. Even debuggers such as gdb are
totally reliant on ptrace for their implementation. ptrace offers more options than are really needed to
simply trace system calls.
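
How a tracing process collects information "at the right places" can be sketched from userspace. In the following minimal example, which assumes a Linux system with glibc, the parent resumes a child with the PTRACE_SYSCALL request so that the child stops at every system call entry and exit, and simply counts these stops. The function name count_syscall_stops is hypothetical; a real tracer such as strace would additionally read the registers at each stop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that issues `calls` getpid system calls and count the
 * syscall stops the parent observes (entry and exit are reported
 * separately). Returns the number of stops, or -1 on error. */
long count_syscall_stops(int calls)
{
    assert(calls >= 0);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: ask to be traced, then stop until the parent is ready. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(1);
        raise(SIGSTOP);
        for (int i = 0; i < calls; i++)
            syscall(SYS_getpid);
        _exit(0);
    }

    int status;
    long stops = 0;

    /* Wait for the initial SIGSTOP of the child. */
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Mark syscall stops with bit 7 so they differ from signal stops. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) < 0)
        goto fail;

    for (;;) {
        /* Resume the child until the next system call entry or exit. */
        if (ptrace(PTRACE_SYSCALL, child, NULL, NULL) < 0)
            goto fail;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return stops;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
    }

fail:
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return -1;
}
```

Each traced system call normally contributes two stops, so the count grows with `calls`; the exact total also depends on the calls the C library itself makes around raise and process exit.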

ptrace requires four arguments, as the definition in the kernel sources shows^10:

<syscalls.h>
asmlinkage long sys_ptrace(long request, long pid, long addr, long data);
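
The four arguments can be seen in action by invoking the system call directly. The sketch below is an illustration under stated assumptions: it targets Linux on a common architecture such as x86-64, the names raw_ptrace, peek_child_word and shared_slot are hypothetical, and it relies on the behavior that, at the raw system call level, PTRACE_PEEKDATA stores the word it reads at the address passed in the fourth argument.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable has the same virtual address in both
 * processes, so the parent knows where to look in the child. */
static volatile long shared_slot;

/* Thin wrapper that mirrors the four-argument prototype. */
static long raw_ptrace(long request, long pid, long addr, long data)
{
    return syscall(SYS_ptrace, request, pid, addr, data);
}

/* The child stores `value` and stops; the parent reads the word from
 * the child's address space. Returns 0 and fills *out on success,
 * -1 on failure. */
int peek_child_word(long value, long *out)
{
    assert(out != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        shared_slot = value;
        if (raw_ptrace(PTRACE_TRACEME, 0, 0, 0) < 0)
            _exit(1);
        raise(SIGSTOP);             /* wait here for the parent */
        _exit(0);
    }

    int status;
    int rc = -1;

    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* request = PEEKDATA, pid = child, addr = location in the child,
     * data = location in the parent that receives the word. */
    if (raw_ptrace(PTRACE_PEEKDATA, child,
                   (long)&shared_slot, (long)out) == 0)
        rc = 0;

    /* The child is no longer needed. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return rc;
}
```

A call such as `long w; peek_child_word(0x1234, &w);` is expected to leave the child's value in `w`. The C library's ptrace wrapper hides this detail and returns the word directly for the PEEK requests.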

(^9) Linus Torvalds himself designed the sparse tool to find direct dereferences of userspace pointers in the kernel.
(^10) <syscalls.h> contains the prototypes for all architecture-independent system calls whose arguments are identical on all
architectures.
