Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 13: System Calls


/*300*/ .word sys_set_robust_list, sys_get_robust_list, sys_migrate_pages, sys_mbind, sys_get_mempolicy
        .word sys_set_mempolicy, sys_kexec_load, sys_move_pages, sys_getcpu, sys_epoll_pwait
/*310*/ .word sys_utimensat, sys_signalfd, sys_timerfd, sys_eventfd, sys_fallocate


The table definition is similar on IA-32 processors.

arch/x86/kernel/syscall_table_32.S
ENTRY(sys_call_table)
.long sys_restart_syscall /* 0 - old "setup()" system call, used for restarting */
.long sys_exit
.long sys_fork
.long sys_read
.long sys_write
.long sys_open /* 5 */
.long sys_close
...
.long sys_utimensat /* 320 */
.long sys_signalfd
.long sys_timerfd
.long sys_eventfd
.long sys_fallocate


The purpose of the .long statements is to align the table entries in memory.

The tables defined in this way have the properties of a C array and can therefore be processed using
pointer arithmetic. sys_call_table is the base pointer and points to the start of the array, that is, to
the zero entry in C terms. If a userspace program invokes the open system call, the number passed
is 5. The dispatcher routine adds this number, scaled by the size of one table entry, to the
sys_call_table base and arrives at entry 5 (the sixth slot, because counting starts at zero), which
holds the address of sys_open, the processor-independent handler function. Once the
parameter values still held in registers have been copied onto the stack, the kernel calls the handler
routine and switches to the processor-independent part of system call handling.
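
The lookup described above can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel's dispatcher: the names demo_call_table, demo_dispatch, and the demo_* handlers are hypothetical, the handlers take no arguments, and the bounds check returning -ENOSYS is an assumption added to keep the example safe.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type; real handlers take system-call-specific arguments. */
typedef long (*demo_syscall_fn)(void);

/* Stand-in handlers that simply return their own table index. */
static long demo_restart_syscall(void) { return 0; }
static long demo_exit(void)            { return 1; }
static long demo_fork(void)            { return 2; }
static long demo_read(void)            { return 3; }
static long demo_write(void)           { return 4; }
static long demo_open(void)            { return 5; }
static long demo_close(void)           { return 6; }

/* Mirrors the first entries of the table shown above: one function pointer per slot. */
static const demo_syscall_fn demo_call_table[] = {
    demo_restart_syscall, /* 0 */
    demo_exit,
    demo_fork,
    demo_read,
    demo_write,
    demo_open,            /* 5 */
    demo_close,
};

#define DEMO_NR_SYSCALLS (sizeof(demo_call_table) / sizeof(demo_call_table[0]))

/* Adds the call number to the table base and calls the handler found there. */
static long demo_dispatch(unsigned long nr)
{
    if (nr >= DEMO_NR_SYSCALLS)
        return -ENOSYS; /* assumed behavior for numbers outside the table */

    /* Pointer arithmetic on the base; equivalent to demo_call_table[nr](). */
    return (*(demo_call_table + nr))();
}
```

Calling demo_dispatch(5) selects the slot that holds demo_open. The compiler scales the index by the size of a function pointer, which is the same scaling the dispatcher must apply to the call number in assembly code.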

Because kernel mode and user mode use two different stacks, as described in
Chapter 3, system call parameters cannot be passed on the stack as would normally
be the case. Switching between the stacks is either performed in
architecture-specific assembly language code that is called when kernel mode is
entered, or carried out automatically by the processor when the protection level is
switched from user to kernel mode.

Return to User Mode


Each system call must inform the user application whether its routine was executed and with what
result. It does this by means of its return code. From the perspective of the application, the result
is read like a normal variable using ordinary C programming means. However, the kernel, in conjunction
with libc, must expend a little more effort to make things just as simple for the user process.

Meaning of Return Values


Generally, the following applies for system call return values. Negative values indicate an error, and
positive values (and 0) denote successful termination.
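
As a rough illustration of this convention, the following C sketch shows how a library wrapper might turn a raw kernel result into the form an application usually sees. The helper name demo_syscall_return is hypothetical, and the logic is simplified to the rule stated above: every negative value is treated as an error. Storing the error number in errno and returning -1 is the usual C library convention and is assumed here.

```c
#include <errno.h>

/*
 * Hypothetical helper: converts a raw kernel return value into the
 * conventional C result. Negative values are taken as a negated error
 * number; zero and positive values are passed through unchanged.
 */
static long demo_syscall_return(long raw)
{
    if (raw < 0) {
        errno = (int)-raw; /* store the positive error number */
        return -1;         /* signal failure to the caller */
    }
    return raw;            /* success: pass the result through */
}
```

An application then only needs to compare the result with -1 and, on failure, inspect errno.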