Linux Kernel Architecture


Chapter 2: Process Management and Scheduling


process. As you see in Chapter 4, the actual implementation of the page fault handler is more complicated
because other aspects, such as swapped-out pages, must also be taken into account.

When a page table entry marks a page as "Read Only" although the page would normally be writable,
the kernel recognizes that the page is, in fact, a COW page. It therefore creates a copy of the page
that is assigned exclusively to the process and may therefore also be used for write operations.
How the copy operation is implemented is not discussed until Chapter 4 because extensive
background knowledge of memory management is required.

The COW mechanism enables the kernel to delay copying of memory pages for as long as possible
and — more importantly — to make copying unnecessary in many cases. This saves a great deal of time.
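
The COW mechanism itself is invisible to userspace; only its effect can be observed. The following is a minimal userspace sketch, not kernel code, that shows this effect: after fork, a write in the child must not change the parent's data, regardless of when the kernel actually copies the page. The function name parent_copy_survives_child_write and the 4096-byte buffer size are assumptions made for illustration only.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the parent's buffer is unchanged after
 * the child has written to its own view of it, 0 if it changed, -1 on error.
 */
int parent_copy_survives_child_write(void)
{
    static char page[4096];   /* assumed page size, for illustration only */
    int status;
    pid_t pid;

    memset(page, 'P', sizeof(page));   /* parent fills the buffer before fork */

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this first write is the point at which a COW kernel
         * has to give the child a private copy of the page. */
        memset(page, 'C', sizeof(page));
        _exit(page[0] == 'C' ? 0 : 1);
    }

    assert(pid > 0);                   /* only the parent reaches this point */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: the child's write must not be visible here. */
    return page[0] == 'P' && page[sizeof(page) - 1] == 'P';
}
```

Calling parent_copy_survives_child_write() from a test program is expected to return 1 on a Linux system. The sketch says nothing about how long the copy was delayed; it only confirms that the write in the child stayed private.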

Executing System Calls


The entry points for the fork, vfork, and clone system calls are the sys_fork, sys_vfork, and sys_clone
functions. Their definitions are architecture-dependent because the way in which parameters are passed
between userspace and kernel space differs on the various architectures (see Chapter 13 for further
information). The task of the above functions is to extract the information supplied by userspace from the
registers of the processors and then to invoke the architecture-independent do_fork function responsible
for process duplication. The prototype of the function is as follows.

kernel/fork.c
long do_fork(unsigned long clone_flags,
             unsigned long stack_start,
             struct pt_regs *regs,
             unsigned long stack_size,
             int __user *parent_tidptr,
             int __user *child_tidptr)

The function requires the following arguments:

❑ A flag set (clone_flags) to specify duplication properties. The low byte specifies the signal
number to be sent to the parent process when the child process terminates. The higher bytes hold
various constants discussed below.
❑ The start address of the user mode stack (stack_start) to be used.
❑ A pointer to the register set holding the call parameters in raw form (regs). The data type used
is the architecture-specific struct pt_regs structure, which holds all registers in the order in
which they are saved on the kernel stack when a system call is executed (more information is
provided in Appendix A).
❑ The size of the user mode stack (stack_size). This parameter is usually unnecessary and
set to 0.
❑ Two pointers to addresses in userspace (parent_tidptr and child_tidptr) that hold the TIDs
of the parent and child processes. They are needed for the thread implementation of the NPTL
(Native POSIX Thread Library). I discuss their meaning below.
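
The split of clone_flags into a signal number in the low byte and option bits above it can be sketched in a few lines of C. This is an illustration under assumptions, not the kernel's own code: the SKETCH_ constants are local redefinitions that, to my knowledge, mirror the values of CSIGNAL, CLONE_VM, and CLONE_VFORK in the kernel headers, and all three function names are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Assumed to mirror the kernel's CSIGNAL, CLONE_VM and CLONE_VFORK values;
 * redefined locally so the sketch compiles without kernel headers. */
#define SKETCH_CSIGNAL     0x000000ffUL   /* mask for the low byte */
#define SKETCH_CLONE_VM    0x00000100UL
#define SKETCH_CLONE_VFORK 0x00004000UL

/* Low byte: signal sent to the parent when the child terminates. */
int sketch_exit_signal(unsigned long clone_flags)
{
    return (int)(clone_flags & SKETCH_CSIGNAL);
}

/* Everything above the low byte: the duplication options. */
unsigned long sketch_clone_options(unsigned long clone_flags)
{
    return clone_flags & ~SKETCH_CSIGNAL;
}

/* Hypothetical helper: pack options and exit signal into one flag word. */
unsigned long sketch_make_clone_flags(unsigned long options, int exit_signal)
{
    /* Option bits must not spill into the signal byte, and vice versa. */
    assert((options & SKETCH_CSIGNAL) == 0);
    assert(exit_signal >= 0 && exit_signal <= 0xff);
    return options | (unsigned long)exit_signal;
}
```

For example, sketch_make_clone_flags(SKETCH_CLONE_VM | SKETCH_CLONE_VFORK, SIGCHLD) yields a word from which sketch_exit_signal recovers SIGCHLD and sketch_clone_options recovers the two option bits.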

The different fork variants are distinguished primarily by means of the flag set. On most architectures,^10
the classical fork call is implemented in the same way as on IA-32 processors.
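
To illustrate how entry points can differ only in the flag set they hand to do_fork, here is a userspace mock. Everything in it is an assumption made for illustration: sketch_do_fork merely records its arguments and duplicates nothing, struct sketch_pt_regs is a one-field stand-in for the architecture-specific register set, and the flag combinations reflect my recollection of the IA-32 code of that kernel generation (SIGCHLD alone for fork; CLONE_VFORK, CLONE_VM, and SIGCHLD for vfork) and are not quoted from the source.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Assumed to mirror the kernel's CLONE_VM and CLONE_VFORK values. */
#define SKETCH_CLONE_VM    0x00000100UL
#define SKETCH_CLONE_VFORK 0x00004000UL

/* Stand-in for the register set; the real struct pt_regs differs per architecture. */
struct sketch_pt_regs {
    unsigned long sp;   /* user mode stack pointer at system call entry */
};

/* Records what the most recent mock call received. */
struct sketch_fork_call {
    unsigned long clone_flags;
    unsigned long stack_start;
    unsigned long stack_size;
};
struct sketch_fork_call sketch_last_call;

/* Mock with the same parameter order as do_fork; it creates no process. */
long sketch_do_fork(unsigned long clone_flags, unsigned long stack_start,
                    struct sketch_pt_regs *regs, unsigned long stack_size,
                    int *parent_tidptr, int *child_tidptr)
{
    assert(regs != NULL);
    (void)parent_tidptr;   /* unused in this mock */
    (void)child_tidptr;    /* unused in this mock */
    sketch_last_call.clone_flags = clone_flags;
    sketch_last_call.stack_start = stack_start;
    sketch_last_call.stack_size = stack_size;
    return 1;              /* placeholder, not a real PID */
}

/* fork: only the exit signal is set, no option bits. */
long sketch_sys_fork(struct sketch_pt_regs *regs)
{
    return sketch_do_fork(SIGCHLD, regs->sp, regs, 0, NULL, NULL);
}

/* vfork: same call, but with two additional option bits. */
long sketch_sys_vfork(struct sketch_pt_regs *regs)
{
    return sketch_do_fork(SKETCH_CLONE_VFORK | SKETCH_CLONE_VM | SIGCHLD,
                          regs->sp, regs, 0, NULL, NULL);
}
```

Both wrappers pass the saved stack pointer, a stack size of 0, and no TID pointers; after each call, sketch_last_call shows that only clone_flags differs between the two.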

(^10) Exception: Sparc(64) systems that access do_fork via sparc_do_fork. IA-64 kernels only provide a single system call,
sys_clone2, which is used to implement fork, vfork, and clone in userspace. Both sys_clone2 and sparc_do_fork
eventually rely on do_fork.
