Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 17: Data Synchronization


The actual write operation for the data is initiated in do_writepages. This function invokes
the writepages method of the corresponding address_space_operations structure if the
method exists and is not assigned a null pointer; for example, the ext3_writepages method
is invoked for the Ext3 filesystem.
If no method exists, the kernel invokes the generic_writepages function, which finds all
dirty pages of the mapping and sequentially writes them back using writepage from the
address space operations (note that in contrast to writepages, there is no s at the end of the
name) or mpage_writepage if the former does not exist.
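
The dispatch described above can be sketched in user-space C. This is a minimal model under stated assumptions, not kernel code: the structures are reduced to the fields needed here, the nr_dirty and requests counters are hypothetical bookkeeping, and every function carrying a _sketch or demo_ name is an illustrative stand-in rather than a real kernel symbol.

```c
#include <assert.h>
#include <stddef.h>

struct address_space;

/* Reduced stand-in for address_space_operations: only the two write hooks. */
struct address_space_operations {
    int (*writepages)(struct address_space *mapping);
    int (*writepage)(struct address_space *mapping, int page_index);
};

struct address_space {
    const struct address_space_operations *a_ops;
    int nr_dirty;   /* hypothetical: dirty pages still to be written */
    int requests;   /* hypothetical: write requests issued so far */
};

/* Hypothetical default per-page writer, standing in for mpage_writepage. */
int mpage_writepage_sketch(struct address_space *mapping, int page_index)
{
    (void)page_index;
    mapping->requests++;
    return 0;
}

/* Fallback path: write the dirty pages back one after another. */
int generic_writepages_sketch(struct address_space *mapping)
{
    int (*wp)(struct address_space *, int) = mapping->a_ops->writepage;

    if (!wp)
        wp = mpage_writepage_sketch;   /* no writepage method either */
    while (mapping->nr_dirty > 0) {
        int err = wp(mapping, mapping->nr_dirty - 1);
        if (err)
            return err;
        mapping->nr_dirty--;
    }
    return 0;
}

/* Prefer the filesystem's writepages method; otherwise use the fallback. */
int do_writepages_sketch(struct address_space *mapping)
{
    assert(mapping != NULL && mapping->a_ops != NULL);
    if (mapping->a_ops->writepages)
        return mapping->a_ops->writepages(mapping);
    return generic_writepages_sketch(mapping);
}

/* Hypothetical filesystem method that batches all dirty pages into one request. */
int demo_fs_writepages(struct address_space *mapping)
{
    mapping->nr_dirty = 0;
    mapping->requests++;
    return 0;
}
```

With an operations structure whose writepages pointer is NULL, three dirty pages produce three per-page requests in this model; with demo_fs_writepages installed, the same pages go out as a single request.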


  1. write_inode writes back the metadata needed to manage the inode itself. The function is not
    complicated; it simply checks whether the superblock operations associated with the inode
    instance include the write_inode method (the block device filesystem, for example, does not
    provide one). If it exists, it is invoked to find the relevant data and write it back via the block layer.


Filesystems often choose to perform no actual writes to a block device, but just
submit a dirty buffer to the generic code. This needs to be dealt with in the sync
system call discussed below.

Note that calling write_inode is skipped if neither I_DIRTY_SYNC nor I_DIRTY_DATASYNC is set
because this signals that only data, but not the metadata, need to be written back.
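
A small user-space model may make the metadata step easier to follow. It assumes that metadata are written only when I_DIRTY_SYNC or I_DIRTY_DATASYNC is among the dirty flags, so an inode with only dirty pages skips the call. The flag values, the reduced structures, the metadata_writes counter, and the names sync_metadata_sketch and demo_write_inode are illustrative assumptions, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values for this sketch. */
#define I_DIRTY_SYNC     1UL
#define I_DIRTY_DATASYNC 2UL
#define I_DIRTY_PAGES    4UL

struct inode;

/* Reduced stand-ins for the superblock structures. */
struct super_operations {
    int (*write_inode)(struct inode *inode, int sync);
};

struct super_block {
    const struct super_operations *s_op;
};

struct inode {
    struct super_block *i_sb;
    int metadata_writes;   /* hypothetical counter for the demonstration */
};

/* Invoke the filesystem's write_inode method only if it provides one. */
int write_inode(struct inode *inode, int sync)
{
    assert(inode != NULL && inode->i_sb != NULL);
    const struct super_operations *op = inode->i_sb->s_op;

    if (op && op->write_inode)
        return op->write_inode(inode, sync);
    return 0;   /* nothing to do, e.g. for the block device filesystem */
}

/* Skip the metadata write when only data pages are dirty. */
int sync_metadata_sketch(struct inode *inode, unsigned long dirty, int sync)
{
    if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
        return write_inode(inode, sync);
    return 0;
}

/* Hypothetical filesystem method: just counts how often it was called. */
int demo_write_inode(struct inode *inode, int sync)
{
    (void)sync;
    inode->metadata_writes++;
    return 0;
}
```

Calling sync_metadata_sketch with only I_DIRTY_PAGES leaves the counter untouched, while adding I_DIRTY_SYNC triggers exactly one call to the filesystem method.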


  1. If the current synchronization aims at data integrity, that is, if WB_SYNC_ALL is set, then
    filemap_fdatawait is used to wait until all pending write operations (which are usually
    processed asynchronously) are performed. The function waits for write operations to
    complete on a page-by-page basis. Pages currently written to their backing store have the
    PG_writeback status bit set, which is automatically removed by the responsible block layer
    code when the operation is complete. Therefore, the synchronization code just needs to wait
    until the bit goes away.
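
The page-by-page wait can be imitated in a single-threaded user-space sketch. The kernel would put the caller to sleep until the block layer clears the bit; here a hypothetical block_layer_tick function stands in for I/O progress, and the page structure, its ticks_left field, and fdatawait_sketch are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced page: one flag modelling PG_writeback plus simulated I/O time. */
struct page {
    bool writeback;   /* set while the page is being written out */
    int ticks_left;   /* hypothetical: remaining simulated I/O time */
};

/* Stand-in for the block layer: advance all pending writes by one step
 * and clear the writeback flag of every page whose write has finished. */
void block_layer_tick(struct page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pages[i].writeback && --pages[i].ticks_left <= 0)
            pages[i].writeback = false;
    }
}

/* Wait for each page in turn until its writeback flag has gone away.
 * Returns the number of simulated ticks spent waiting. */
size_t fdatawait_sketch(struct page *pages, size_t n)
{
    size_t waited = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        while (pages[i].writeback) {
            block_layer_tick(pages, n);   /* the kernel would sleep here */
            waited++;
        }
    }
    return waited;
}
```

Because all pending writes progress concurrently in this model, the total waiting time equals the longest outstanding write rather than the sum of all of them.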


The above steps complete inode synchronization, at least in the view of the filesystem. (Naturally, the block
layer still has a few things to do if filemap_fdatawait has not been called to await the results, but
the layer structure of the kernel means that this is of no further relevance to us.) The inode now needs to
be put back into the correct list, and the kernel must update the inode status if it has changed as a result
of synchronization. There are four different lists in which the inode can be inserted:


  1. If the inode data have become dirty again in the meantime (i.e., if the I_DIRTY bit is set in the
    status element), the inode is added to the s_dirty list of the superblock.
    It is also placed in this list if not all dirty data of the mapping were written back, for
    example, because the number of pages specified by writeback control was too small to allow all
    dirty pages to be processed in one go. In this case, the inode status is set to I_DIRTY_PAGES
    so that synchronization of the metadata is skipped the next time __sync_single_inode is
    invoked; these data have just been written back and are still intact.

  2. If not all data of the mapping were written back, but pdflush was called from wb_kupdate,
    the inode is placed on s_more_io and will be dealt with in later synchronization runs.
    If not all data were written back and pdflush was not called from wb_kupdate, then the
    inode is placed back on the dirty list. This prevents one large dirty file that cannot be written
    properly from holding up other pending files for a long time or indefinitely. redirty_tail is
    responsible for keeping the reverse time ordering on s_dirty intact.
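
The two cases above can be condensed into a small decision function. This is a hedged sketch that covers only the lists discussed so far; any other outcome is reported as LIST_OTHER. The enum, the function name pick_list_sketch, the flag values, and the order in which the conditions are tested are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values for this sketch. */
#define I_DIRTY_SYNC     1UL
#define I_DIRTY_DATASYNC 2UL
#define I_DIRTY_PAGES    4UL
#define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)

enum target_list {
    LIST_S_DIRTY,     /* back onto the superblock's dirty list */
    LIST_S_MORE_IO,   /* deferred to a later synchronization run */
    LIST_OTHER        /* cases not covered by this sketch */
};

/* Decide where a just-synchronized inode goes.
 *   i_state     - inode status; may gain I_DIRTY_PAGES
 *   pages_left  - true if not all dirty pages of the mapping were written
 *   for_kupdate - true if the flusher was started from wb_kupdate */
enum target_list pick_list_sketch(unsigned long *i_state,
                                  bool pages_left, bool for_kupdate)
{
    assert(i_state != NULL);

    if (!(*i_state & I_DIRTY) && pages_left) {
        /* Only data remain dirty, so the next run can skip the metadata. */
        *i_state |= I_DIRTY_PAGES;
        return for_kupdate ? LIST_S_MORE_IO : LIST_S_DIRTY;
    }
    if (*i_state & I_DIRTY)
        return LIST_S_DIRTY;   /* the inode was dirtied again meanwhile */
    return LIST_OTHER;
}
```

A partially written inode is thus deferred to s_more_io only on the wb_kupdate path; on every other path it returns to the dirty list so that other pending files are not held up.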
