Chapter 2: Process Management and Scheduling
Let us return to copy_process. After dup_task_struct has succeeded, the kernel checks whether creating
the new task would exceed the maximum number of processes allowed for a particular user:
kernel/fork.c
if (atomic_read(&p->user->processes) >=
                p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
        if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
            p->user != current->nsproxy->user_ns->root_user)
                goto bad_fork_free;
}
...
The per-user resource counters for the user owning the current process are kept in an instance of
user_struct that is accessible via task_struct->user, and the number of processes currently held
by a particular user is stored in user_struct->processes. If this value has reached the limit set by rlimit,
task creation is aborted, unless the current user is assigned special capabilities (CAP_SYS_ADMIN or
CAP_SYS_RESOURCE) or is the root user. Checking for the root user is interesting: Recall from above that
each user namespace has its own root user. This must now be taken into account in the above check.
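The decision logic of this check can be modeled in user space. The following is a minimal sketch, not kernel code: struct toy_user, struct toy_fork_ctx, and fork_allowed are hypothetical names introduced only for illustration. They stand in for user_struct, the results of capable(), and the comparison against the root user of the namespace.

```c
#include <stdbool.h>

/* Hypothetical stand-in for user_struct: only the process counter is kept. */
struct toy_user {
    long processes;                   /* tasks currently owned by this user */
};

/* Hypothetical bundle of the inputs that the kernel check consults. */
struct toy_fork_ctx {
    const struct toy_user *user;      /* owner of the new task */
    const struct toy_user *root_user; /* root user of the namespace */
    long nproc_limit;                 /* models rlim[RLIMIT_NPROC].rlim_cur */
    bool cap_sys_admin;               /* models capable(CAP_SYS_ADMIN) */
    bool cap_sys_resource;            /* models capable(CAP_SYS_RESOURCE) */
};

/*
 * Returns true if task creation may proceed, and false in the case
 * where the kernel code would jump to bad_fork_free.
 */
static bool fork_allowed(const struct toy_fork_ctx *ctx)
{
    if (ctx->user->processes >= ctx->nproc_limit) {
        /* Limit reached: only privileged callers or root may continue. */
        if (!ctx->cap_sys_admin && !ctx->cap_sys_resource &&
            ctx->user != ctx->root_user)
            return false;
    }
    return true;
}
```

Note that the comparison uses >=, so in this model an unprivileged user whose counter equals the limit is already refused.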
If resource limits do not prevent process creation, the interface function sched_fork is called to give
the scheduler a chance to set up things for the new task. Before the introduction of the CFS scheduler
in kernel 2.6.23, this was more complicated because the remaining time slice of the parent had to be
distributed between parent and child. Since the new scheduler does not require time slices anymore,
things are a lot simpler now. Essentially, the routines initialize statistical fields and, on multiprocessor
systems, rebalance the available processes between the CPUs if this is necessary. Besides, the
task state is set to TASK_RUNNING, which is not really true since the new process is, in fact, not yet
running. However, this prevents any other part of the kernel from trying to change the process state from
non-running to running and scheduling the new process before its setup has been completely finished.
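Why marking the half-constructed task as TASK_RUNNING protects it can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's wake-up code: struct toy_task and toy_wake_up are hypothetical, and the numeric state values are assumed to mirror the kernel's convention that TASK_RUNNING is 0 while the sleep states are individual bits.

```c
#include <stdbool.h>

/* Assumed state encoding: running is 0, sleep states are bit flags. */
#define TASK_RUNNING          0
#define TASK_INTERRUPTIBLE    1
#define TASK_UNINTERRUPTIBLE  2

/* Hypothetical, heavily reduced task descriptor. */
struct toy_task {
    long state;
    bool on_runqueue;
};

/*
 * Toy wake-up routine: it acts only on tasks whose state matches
 * wake_mask. Returns true if the task was woken and queued.
 */
static bool toy_wake_up(struct toy_task *t, long wake_mask)
{
    if (!(t->state & wake_mask))
        return false;       /* not sleeping in a matching state: untouched */
    t->state = TASK_RUNNING;
    t->on_runqueue = true;
    return true;
}
```

Because TASK_RUNNING is 0 in this model, no wake mask can ever match a freshly forked task, so a wake-up attempt leaves it off the run queue until its setup is finished.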
A large number of copy_xyz routines are then invoked to copy or share the resources of specific kernel
subsystems. The task structure contains pointers to instances of data structures that describe a sharable or
cloneable resource. Because the task structure of the child starts out as an exact copy of the parent’s task
structure, both point to the same resource-specific instances initially. This is illustrated in Figure 2-10.
Suppose we have two resources: res_abc and res_def. Initially the corresponding pointers in the task
structure of the parent and child process point to the same instance of the resource-specific data structure
in memory.
If CLONE_ABC is set, then both processes will share res_abc. This is already the case, but it is additionally
necessary to increment the reference counter of the instance to prevent the associated memory space from
being freed too soon — memory may be relinquished to memory management only when it is no longer
being used by a process. If either parent or child modifies the shared resource, the change will be visible
in both processes.
If CLONE_ABC is not set, then a copy of res_abc is created for the child process, and the reference counter
of the new copy is initialized to 1. Consequently, if parent or child modifies the resource, then changes
will not propagate to the other process in this case.
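The two cases can be summarized in a small user-space model of a copy_xyz routine. This is a minimal sketch, assuming a hypothetical resource type and flag named after the example in the text; struct mini_task, the count and data fields, and copy_abc are illustrative names, not kernel definitions.

```c
#include <stdlib.h>

#define CLONE_ABC 0x1UL     /* hypothetical flag from the example above */

/* Hypothetical resource with a reference counter. */
struct res_abc {
    int count;              /* number of tasks using this instance */
    int data;               /* placeholder payload */
};

/* Hypothetical, reduced task structure. */
struct mini_task {
    struct res_abc *abc;
};

/*
 * Models a copy_xyz routine. On entry, child->abc still equals the
 * parent's pointer because the child started as a copy of the parent.
 * Returns 0 on success and -1 if memory allocation fails.
 */
static int copy_abc(unsigned long clone_flags, struct mini_task *child)
{
    struct res_abc *old = child->abc;
    struct res_abc *fresh;

    if (clone_flags & CLONE_ABC) {
        old->count++;       /* share: one more user of the same instance */
        return 0;
    }

    fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -1;
    *fresh = *old;          /* copy the contents of the parent's instance */
    fresh->count = 1;       /* the copy has exactly one user: the child */
    child->abc = fresh;
    return 0;
}
```

With CLONE_ABC set, a write through child->abc is visible through the parent's pointer as well. Without it, the child owns a private instance that the caller must eventually release with free.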
As a general rule, the more CLONE flags are set, the less work there is to do, because shared resources
need not be copied. However, sharing gives parent and child processes more opportunities to mutually
manipulate their data structures, and this must be taken into consideration when programming applications.