Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 16: Page and Buffer Cache


executed by standard functions of the kernel (mpage_readpage and mpage_readpages) that can
be used for most purposes.
Notice that the file argument of readpage is not required if the standard functions are used to
implement the desired functionality because the inode associated with the desired page can be
determined via page->mapping->host.

❑ sync_page performs synchronization of data that have not yet been written back to the
backing store. Unlike writepage, the function operates at the block layer level and attempts to
perform pending write requests still held in buffers in this layer. In contrast, writepage operates
on the address space layer and simply forwards the data to the block layer without bothering
about active buffering there.
The kernel provides the standard function block_sync_page, which obtains the address space
mapping that belongs to the page in question and unplugs the block device queue to start I/O.


❑ set_page_dirty allows an address space to provide a specific method of marking a page as
dirty. However, this option is rarely used. If no method is provided, the kernel automatically uses
__set_page_dirty_buffers to simultaneously mark the page as dirty on the buffer level and to add
it to the dirty_pages list of the current mapping.
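
The fallback described above can be pictured with a small user-space sketch. It is not kernel code: the structures are cut down to the fields needed here, nr_dirty is an assumed stand-in for the dirty_pages list, and the helpers default_set_page_dirty_buffers and mark_page_dirty are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page;

/* Cut-down operations table: only the hook discussed here. */
struct address_space_operations {
    int (*set_page_dirty)(struct page *page);   /* may be NULL */
};

struct address_space {
    const struct address_space_operations *a_ops;
    unsigned long nr_dirty;   /* assumption: stands in for the dirty_pages list */
};

struct page {
    struct address_space *mapping;
    bool dirty;           /* page-level dirty flag */
    bool buffers_dirty;   /* assumption: stands in for the buffers' dirty bits */
};

/* Toy default: mark the buffers and the page dirty and account the page
 * in the mapping. Returns 1 if the page was newly dirtied, 0 otherwise. */
static int default_set_page_dirty_buffers(struct page *page)
{
    page->buffers_dirty = true;
    if (page->dirty)
        return 0;
    page->dirty = true;
    page->mapping->nr_dirty++;
    return 1;
}

/* Use the address space's own method if one is provided,
 * otherwise fall back to the buffer-level default. */
static int mark_page_dirty(struct page *page)
{
    assert(page != NULL && page->mapping != NULL);

    const struct address_space_operations *ops = page->mapping->a_ops;
    if (ops != NULL && ops->set_page_dirty != NULL)
        return ops->set_page_dirty(page);
    return default_set_page_dirty_buffers(page);
}
```

Calling mark_page_dirty twice on the same page in this sketch dirties and accounts the page only once; the second call finds the page-level flag already set and leaves the counter untouched.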


❑ prepare_write and commit_write perform write operations triggered by the write system call.
To cater to the special features of journaling filesystems, this operation must be split into two
parts: prepare_write stores the transaction data in the journal, and commit_write performs the
actual write operation by sending the appropriate commands to the block layer.
When data are written, the kernel must ensure that the two functions are always invoked in pairs
and in the correct sequence as otherwise the journal mechanism serves no purpose.
It has by now become common practice that even non-journaling filesystems (like Ext2) split
writing into two parts.


Unlike writepage, prepare_write and commit_write do not directly initiate I/O
operations (in other words, they do not forward corresponding commands to the
block layer) but, in the standard implementation, make do with marking whole
pages or parts thereof as dirty; the write operation itself is triggered by a kernel
daemon that is provided for this purpose and that periodically checks the existing
pages.
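
The pairing of the two calls, and the fact that neither of them starts I/O, can be illustrated with a toy model in user space. Everything in it is an assumption made for the sketch: struct toy_page, the toy_ prefixed functions, and the io_submitted counter do not exist in the kernel, and the real functions take different parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Toy page: data plus the state needed to show the two-phase protocol. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    bool prepared;          /* set by prepare, cleared by commit */
    bool dirty;             /* set by commit; write-back happens elsewhere */
    unsigned io_submitted;  /* never incremented: no function here does I/O */
};

/* Phase 1: validate the byte range [from, to) and open the write. */
static int toy_prepare_write(struct toy_page *page, unsigned from, unsigned to)
{
    if (from > to || to > TOY_PAGE_SIZE)
        return -1;
    page->prepared = true;
    return 0;
}

/* Phase 2: only legal after a matching prepare. Marks the page dirty
 * instead of sending anything to a block layer. */
static int toy_commit_write(struct toy_page *page, unsigned from, unsigned to)
{
    if (!page->prepared || from > to || to > TOY_PAGE_SIZE)
        return -1;
    page->prepared = false;
    page->dirty = true;
    return 0;
}

/* Caller side: prepare, copy the data, commit, always in this order. */
static int toy_write(struct toy_page *page, unsigned offset,
                     const void *buf, unsigned len)
{
    assert(page != NULL && buf != NULL);

    if (offset > TOY_PAGE_SIZE || len > TOY_PAGE_SIZE - offset)
        return -1;

    unsigned to = offset + len;
    if (toy_prepare_write(page, offset, to) != 0)
        return -1;
    memcpy(page->data + offset, buf, len);
    return toy_commit_write(page, offset, to);
}
```

In this model a commit without a preceding prepare is rejected, and a successful toy_write leaves the page dirty with io_submitted still at zero, which mirrors the division of labor described in the note: the write path only dirties pages, and a separate mechanism writes them back later.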

❑ write_begin and write_end are replacements for prepare_write and commit_write.
While the intention of the functions is identical, the required parameters and especially
the way in which locking of involved objects is handled have changed. Since
Documentation/filesystems/vfs.txt provides a detailed description of how the functions
operate, nothing more needs to be added here.


❑ bmap maps a logical block offset within an address space to a physical block number. This is
usually straightforward for block devices, but since files are in general not stored as a linear
sequence of blocks on a device, the required information cannot be determined otherwise.
bmap is required by the swap code (see Section 18.3.3), the FIBMAP file ioctl, and internally by
some filesystems.
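
To make the logical-to-physical mapping performed by bmap concrete, the following user-space sketch resolves a logical block number against a small table of extents. The extent table is only one possible way for a filesystem to record block locations and is an assumption of this example, as are the names toy_sector_t, struct toy_extent, and toy_bmap. The sketch returns 0 for a block that has no mapping.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long toy_sector_t;

/* One contiguous run: logical blocks [logical, logical + len) of the file
 * are stored at physical blocks [physical, physical + len) of the device. */
struct toy_extent {
    toy_sector_t logical;
    toy_sector_t physical;
    toy_sector_t len;
};

/* Translate a logical block number into a physical one.
 * Returns 0 if the block is not covered by any extent. */
static toy_sector_t toy_bmap(const struct toy_extent *map, size_t n,
                             toy_sector_t block)
{
    assert(map != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (block >= map[i].logical &&
            block - map[i].logical < map[i].len)
            return map[i].physical + (block - map[i].logical);
    }
    return 0;
}
```

For a file whose first four blocks start at physical block 1000 and whose next two blocks start at physical block 5000, logical block 3 resolves to 1003 and logical block 4 to 5000. The jump between the two results is exactly the non-linearity that makes a lookup of this kind necessary.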
