Linux Kernel Architecture

Chapter 2: Process Management and Scheduling

Let us return to copy_process. After dup_task_struct has succeeded, the kernel checks whether creating the new task would exceed the maximum number of processes allowed for a particular user:

kernel/fork.c
        if (atomic_read(&p->user->processes) >=
                        p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
                if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
                    p->user != current->nsproxy->user_ns->root_user)
                        goto bad_fork_free;
        }
        ...

The per-user resource counters for the user owning the current process are kept in an instance of user_struct that is accessible via task_struct->user, and the number of processes currently held by a particular user is stored in user_struct->processes. If this value has reached the limit set by rlimit (RLIMIT_NPROC), task creation is aborted, unless the current user is assigned special capabilities (CAP_SYS_ADMIN or CAP_SYS_RESOURCE) or is the root user. Checking for the root user is interesting: Recall from above that each user namespace has its own root user, which is why the check compares against current->nsproxy->user_ns->root_user rather than against a single global root.
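The decision made by this check can be restated as a small user-space predicate. This is only a sketch of the logic, not kernel code: the types user_model and caps_model and the function fork_blocked_by_nproc are hypothetical stand-ins for user_struct, the capable() calls, and the goto bad_fork_free path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-user data. */
struct user_model {
    long processes;     /* models user_struct->processes */
    bool is_ns_root;    /* models p->user == ...->user_ns->root_user */
};

/* Hypothetical stand-in for the two capable() checks. */
struct caps_model {
    bool sys_admin;     /* models capable(CAP_SYS_ADMIN) */
    bool sys_resource;  /* models capable(CAP_SYS_RESOURCE) */
};

/*
 * Returns true if task creation would be aborted, i.e. if the
 * kernel would take the "goto bad_fork_free" path.
 */
bool fork_blocked_by_nproc(const struct user_model *u,
                           const struct caps_model *c,
                           long rlim_cur)
{
    /* Below the limit: the inner test is never reached. */
    if (u->processes < rlim_cur)
        return false;

    /* At or above the limit: any one exemption lets the fork proceed. */
    if (c->sys_admin || c->sys_resource || u->is_ns_root)
        return false;

    return true;
}
```

Note that the comparison is >=, so a user who already holds exactly rlim_cur processes cannot create another one without an exemption.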

If resource limits do not prevent process creation, the interface function sched_fork is called to give the scheduler a chance to set things up for the new task. Before the introduction of the CFS scheduler in kernel 2.6.23, this was more complicated because the remaining time slice of the parent had to be distributed between parent and child. Since the new scheduler does not require time slices anymore, things are a lot simpler now. Essentially, the routines initialize statistical fields and, on multiprocessor systems, may rebalance the available processes between the CPUs if this is necessary. In addition, the task state is set to TASK_RUNNING, which is not really true since the new process is, in fact, not yet running. However, this prevents any other part of the kernel from trying to change the process state from non-running to running and scheduling the new process before its setup has been completely finished.

A large number of copy_xyz routines are then invoked to copy or share the resources of specific kernel subsystems. The task structure contains pointers to instances of data structures that describe a sharable or cloneable resource. Because the task structure of the child starts out as an exact copy of the parent’s task structure, both point to the same resource-specific instances initially. This is illustrated in Figure 2-10.

Suppose we have two resources: res_abc and res_def. Initially the corresponding pointers in the task structure of the parent and child process point to the same instance of the resource-specific data structure in memory.

If CLONE_ABC is set, then both processes will share res_abc. This is already the case, but it is additionally necessary to increment the reference counter of the instance to prevent the associated memory space from being freed too soon: memory may be relinquished to memory management only when it is no longer being used by any process. If either parent or child modifies the shared resource, the change will be visible in both processes.

If CLONE_ABC is not set, then a copy of res_abc is created for the child process, and the reference counter of the new copy is initialized to 1. Consequently, if parent or child modifies the resource, then changes will not propagate to the other process in this case.
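The two cases can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not an actual kernel routine: CLONE_ABC and res_abc are the placeholder names from the text, while task_model, the refcount and value fields, and copy_abc are hypothetical. A plain int counter is used for brevity; real kernel code would need an atomic counter.

```c
#include <assert.h>
#include <stdlib.h>

#define CLONE_ABC 0x1UL         /* placeholder flag from the text */

/* Hypothetical resource with a reference counter and some payload. */
struct res_abc {
    int refcount;
    int value;
};

/* Hypothetical, heavily reduced stand-in for the task structure. */
struct task_model {
    struct res_abc *abc;
};

/*
 * Share or copy res_abc for the child. On entry, child->abc still
 * points to the parent's instance, because the child's task structure
 * started out as an exact copy of the parent's.
 * Returns 0 on success, -1 if the copy cannot be allocated.
 */
int copy_abc(unsigned long clone_flags, struct task_model *child)
{
    struct res_abc *old = child->abc;
    struct res_abc *copy;

    if (clone_flags & CLONE_ABC) {
        /* Share: keep the pointer, just account for the new user. */
        old->refcount++;
        return 0;
    }

    /* Do not share: give the child a private copy with one user. */
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    *copy = *old;
    copy->refcount = 1;
    child->abc = copy;
    return 0;
}
```

With CLONE_ABC set, parent and child end up with the same pointer and a counter of 2; without it, the child owns a separate instance, so a later write to child->abc->value leaves the parent's instance untouched.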

As a general rule, the more CLONE flags are set, the less work there is to do, because shared resources need not be copied. However, sharing gives parent and child processes more opportunities to mutually manipulate their data structures, and this must be taken into consideration when programming applications.
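From the application side, the effect can be observed with the glibc clone() wrapper. The following is a sketch that assumes Linux with glibc; the helper value_seen_by_parent and the variable shared_value are hypothetical names chosen for this illustration. The child writes 42 to a global variable, and the helper reports what the parent sees afterwards.

```c
#define _GNU_SOURCE             /* needed for clone() and CLONE_VM */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Written by the child, read by the parent after the child has exited. */
static volatile int shared_value;

static int child_fn(void *arg)
{
    (void)arg;
    shared_value = 42;
    return 0;                   /* becomes the child's exit status */
}

/*
 * Creates a child with the given extra clone flags, waits for it, and
 * returns the value of shared_value as seen by the parent.
 * Returns -1 on error.
 */
int value_seen_by_parent(int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;
    int result = -1;

    if (!stack)
        return -1;

    shared_value = 0;

    /* The stack grows downward on common platforms: pass its top. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                extra_flags | SIGCHLD, NULL);
    if (pid != -1 && waitpid(pid, NULL, 0) == pid)
        result = shared_value;

    free(stack);
    return result;
}
```

Calling value_seen_by_parent(0) is expected to return 0, because the child writes to its own copy of the address space, whereas value_seen_by_parent(CLONE_VM) is expected to return 42, because both tasks use the same memory. The second case is exactly where an application has to coordinate access to shared data.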
