Chapter 4: Virtual Process Memory
4.7.3 Nonlinear Mappings
As just demonstrated, normal mappings map a continuous section from a file into a likewise continuous
section of virtual memory. If various parts of a file are mapped in a different sequence into an otherwise
contiguous area of virtual memory, it is generally necessary to use several mappings, which is more
costly in terms of resources (particularly in vm_area_structs). A simpler way of achieving the same
result^11 is to use nonlinear mappings as introduced during the development of 2.5. The kernel features a
separate system call specifically for this purpose.
mm/fremap.c
long sys_remap_file_pages(unsigned long start, unsigned long size,
unsigned long __prot, unsigned long pgoff, unsigned long flags)
The system call allows for rearranging the pages of a mapping so that their order in memory differs
from their order in the file. This is achieved without moving the memory contents around; only the
page tables of the process are manipulated.
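The idea of reordering by table manipulation alone can be illustrated with a small user-space model. This is a minimal sketch and not kernel code: the struct toy_window and the functions toy_map_linear, toy_remap, and toy_read are hypothetical names invented for this illustration, and the "page table" is simply an array that records which file page each slot of the window refers to.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_WINDOW_PAGES 4

/* Toy "page table": slot i of the window refers to file page file_page[i]. */
struct toy_window {
    size_t file_page[TOY_WINDOW_PAGES];
};

/* Linear mapping: slot i shows file page first_page + i. */
void toy_map_linear(struct toy_window *w, size_t first_page)
{
    for (size_t slot = 0; slot < TOY_WINDOW_PAGES; slot++)
        w->file_page[slot] = first_page + slot;
}

/*
 * Nonlinear remap: make `count` slots starting at `slot` refer to file
 * pages pgoff, pgoff + 1, ... Only table entries are rewritten; the page
 * contents themselves are never copied. Returns 0 on success, or -1 if
 * the range does not fit into the window.
 */
int toy_remap(struct toy_window *w, size_t slot, size_t count, size_t pgoff)
{
    if (slot > TOY_WINDOW_PAGES || count > TOY_WINDOW_PAGES - slot)
        return -1;
    for (size_t i = 0; i < count; i++)
        w->file_page[slot + i] = pgoff + i;
    return 0;
}

/* Read one byte through the window by translating the slot via the table. */
char toy_read(const struct toy_window *w, const char *const file_pages[],
              size_t slot, size_t offset)
{
    assert(slot < TOY_WINDOW_PAGES);
    return file_pages[w->file_page[slot]][offset];
}
```

After toy_map_linear(&w, 0), calling toy_remap(&w, 0, 1, 3) makes the first slot of the window show the fourth file page, while the array holding the file data is left untouched.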
sys_remap_file_pages enables an existing mapping at position pgoff and with a size of size to be
moved to a new position in virtual memory. start identifies the mapping whose pages are to be moved,
and thus must fall within the address range of an already existing mapping. It also specifies the new
position to which the pages identified by pgoff and size are to be moved.
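From userspace, the system call is reachable through the C library wrapper remap_file_pages declared in <sys/mman.h>. The following is a hedged sketch of how the parameters fit together, not a recommended pattern: the function name reverse_window and the constant NPAGES are assumptions made for this example, the wrapper expects the protection argument to be 0, and on kernels much newer than the ones discussed here the call is deprecated or may be unavailable, in which case the function simply reports failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define NPAGES 4

/*
 * Maps an NPAGES-page temporary file linearly, then reverses the order of
 * the pages inside the mapping with remap_file_pages(). On success, stores
 * the first byte of each page of the window in first_bytes[] and returns 0.
 * Returns -1 if any step fails (for example, if the call is unsupported).
 */
int reverse_window(char first_bytes[NPAGES])
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);
    size_t page = (size_t)psz;
    size_t len = page * NPAGES;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int fd = fileno(fp);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(fp);
        return -1;
    }

    /* Nonlinear mappings require a shared file mapping. */
    char *win = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) {
        fclose(fp);
        return -1;
    }

    /* Tag file page i with the letter 'A' + i through the linear mapping. */
    for (int i = 0; i < NPAGES; i++)
        memset(win + (size_t)i * page, 'A' + i, page);

    /* start = address of slot i, size = one page, pgoff = file page index. */
    int rc = 0;
    for (int i = 0; i < NPAGES && rc == 0; i++)
        rc = remap_file_pages(win + (size_t)i * page, page, 0,
                              (size_t)(NPAGES - 1 - i), 0);

    if (rc == 0)
        for (int i = 0; i < NPAGES; i++)
            first_bytes[i] = win[(size_t)i * page];

    munmap(win, len);
    fclose(fp);
    return rc == 0 ? 0 : -1;
}
```

Where the call succeeds, the window that originally read A, B, C, D page by page reads D, C, B, A afterwards, although no data was copied within the file.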
If a nonlinear mapping is swapped out, the kernel must ensure that the offsets are still present when the
mapping is swapped back in again. The information needed to do this is stored in the page table entries
of the pages swapped out and must be referenced when they are swapped back in, as we shall see below.
But how is the information encoded? Two components are used:
- The vm_area_struct instances of all installed nonlinear mappings are stored in a list headed
by the i_mmap_nonlinear element of struct address_space. The individual vm_area_structs
on the list can employ shared.vm_set.list as list element because a nonlinear
VMA will not be present on the standard prio tree.
- The page table entries for the region in question are populated with special entries. These
are constructed such that they look like PTEs of pages that are not present, but contain
additional information identifying them as PTEs for nonlinear mappings. When the page
described by the PTE is accessed, a page fault is generated, and the correct page can be
read in.
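The special entries from the second item can be pictured with a toy encoding. The sketch below assumes a hypothetical 32-bit entry in which bit 0 means "present", bit 1 marks a nonlinear file entry, and the remaining 30 bits hold the file offset as a page number. The type toy_pte_t, the TOY_ constants, and the three function names are invented for this illustration; real layouts are dictated by each architecture and differ from this one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit page table entry; not the layout of any real CPU. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PTE_PRESENT  0x1u                 /* bit 0: page is in memory   */
#define TOY_PTE_FILE     0x2u                 /* bit 1: nonlinear file PTE  */
#define TOY_PGOFF_SHIFT  2                    /* offset lives in bits 2..31 */
#define TOY_PGOFF_MAX    ((1u << 30) - 1)     /* largest encodable offset   */

/* Build a not-present entry that carries the file offset (in pages). */
toy_pte_t toy_make_file_pte(uint32_t pgoff)
{
    assert(pgoff <= TOY_PGOFF_MAX);
    toy_pte_t pte = { (pgoff << TOY_PGOFF_SHIFT) | TOY_PTE_FILE };
    return pte;
}

/* Recover the file offset stored in such an entry. */
uint32_t toy_file_pte_pgoff(toy_pte_t pte)
{
    return pte.val >> TOY_PGOFF_SHIFT;
}

/* True only for entries that are not present and carry the file marker. */
int toy_is_file_pte(toy_pte_t pte)
{
    return !(pte.val & TOY_PTE_PRESENT) && (pte.val & TOY_PTE_FILE) != 0;
}
```

Because the present bit is clear, hardware following this toy layout would fault on access, and a fault handler could then call toy_file_pte_pgoff to learn which file page belongs at the faulting address.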
Naturally, page table entries cannot be modified at will, but must adhere to conventions imposed by the
underlying architecture. To create nonlinear PTEs, help from the architecture-specific code is required, and
three functions must be defined:
- pgoff_to_pte takes a file offset encoded as a page number and encodes it into a format that
can be stored in a page table.
- pte_to_pgoff can decode a file offset encoded in a page table.
(^11) Even though there appears to be very little need for this, there are various large databases that use operations of this kind to
represent data transactions.