Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 17: Data Synchronization


sync_filesystems synchronizes the mounted filesystems by iterating once more over the super_blocks
list and invoking the sync_fs superblock operations routine for each filesystem that is mounted in
Read/Write mode and provides a sync_fs method. The method is only called when explicit synchronization
via a system call is requested and gives individual filesystems the ability to hook into the process. The
Ext3 filesystem, for instance, uses the opportunity to start a commit of all currently running transactions.
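The selection rule described above can be sketched in user space. This is a minimal model, not kernel code: struct model_sb, model_journal_sync_fs, and model_sync_filesystems are hypothetical names introduced for illustration, and a simple singly linked list stands in for the super_blocks list. It only shows that the method is skipped for read-only mounts and for filesystems that do not provide it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a superblock (not the kernel's struct). */
struct model_sb {
    const char *name;
    int read_only;                                  /* 1 if mounted read-only */
    int (*sync_fs)(struct model_sb *sb, int wait);  /* optional, may be NULL */
    int sync_calls;                                 /* how often sync_fs ran */
    struct model_sb *next;                          /* next entry in the list */
};

/* Example method: a journaling filesystem could start a commit at this point. */
int model_journal_sync_fs(struct model_sb *sb, int wait)
{
    (void)wait;          /* the model does not distinguish the two modes */
    sb->sync_calls++;
    return 0;
}

/*
 * Walk the list and call sync_fs only for filesystems that are mounted
 * read/write and that provide the method. Returns the number of calls made.
 */
int model_sync_filesystems(struct model_sb *head, int wait)
{
    int called = 0;

    assert(wait == 0 || wait == 1);
    for (struct model_sb *sb = head; sb != NULL; sb = sb->next) {
        if (sb->sync_fs == NULL)
            continue;    /* filesystem does not hook into the process */
        if (sb->read_only)
            continue;    /* nothing can be dirty on a read-only mount */
        sb->sync_fs(sb, wait);
        called++;
    }
    return called;
}
```

With three list entries (read/write with a method, read/write without one, and read-only with one), only the first entry has its method invoked.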


sys_sync
  do_sync
    wakeup_pdflush(0)
    sync_inodes(0)
    sync_supers
    sync_filesystems(0)
    sync_filesystems(1)
    sync_inodes(1)

Figure 17-10: Code flow diagram for sys_sync.

As the code flow diagram shows, sync_inodes and sync_filesystems are invoked twice, first with
the parameter 0 and then 1. The parameter specifies whether the functions are to wait until the write
operations are finished (1) or whether they are to execute asynchronously (0). Splitting the operation into
two passes allows the write operations to be initiated in the first pass. This triggers the synchronization
of dirty pages associated with inodes, and also uses write_inode to synchronize the metadata. However,
a filesystem implementation may choose just to dirty the buffers or pages that contain the metadata, but
not send an actual write request to the block device. Since sync_inodes iterates over all dirty inodes, the
small contributions from the individual metadata changes will pile up to a comparatively large amount
of dirty data.
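The call sequence can be modeled as a short trace. The following is a user-space sketch under stated assumptions: every flow_ name is hypothetical, and each function merely records that it was called instead of doing any I/O. The order of the calls follows Figure 17-10.

```c
#include <assert.h>
#include <string.h>

#define FLOW_MAX_STEPS 8

/* Records the order in which the modeled functions are called. */
struct flow_trace {
    const char *step[FLOW_MAX_STEPS];
    int count;
};

void flow_record(struct flow_trace *t, const char *what)
{
    assert(t->count < FLOW_MAX_STEPS);
    t->step[t->count++] = what;
}

void flow_wakeup_pdflush(struct flow_trace *t)
{
    flow_record(t, "wakeup_pdflush(0)");
}

/* wait == 0: only initiate writes; wait == 1: also wait for completion. */
void flow_sync_inodes(struct flow_trace *t, int wait)
{
    flow_record(t, wait ? "sync_inodes(1)" : "sync_inodes(0)");
}

void flow_sync_supers(struct flow_trace *t)
{
    flow_record(t, "sync_supers");
}

void flow_sync_filesystems(struct flow_trace *t, int wait)
{
    flow_record(t, wait ? "sync_filesystems(1)" : "sync_filesystems(0)");
}

/* Model of the two-pass sequence: start everything first, then wait. */
void flow_do_sync(struct flow_trace *t)
{
    flow_wakeup_pdflush(t);
    flow_sync_inodes(t, 0);       /* first pass: initiate writes */
    flow_sync_supers(t);
    flow_sync_filesystems(t, 0);  /* first pass: initiate */
    flow_sync_filesystems(t, 1);  /* second pass: wait */
    flow_sync_inodes(t, 1);       /* second pass: wait */
}

/* Returns 1 if step i of the trace has the given name, 0 otherwise. */
int flow_step_is(const struct flow_trace *t, int i, const char *name)
{
    return i >= 0 && i < t->count && strcmp(t->step[i], name) == 0;
}
```

Calling flow_do_sync on an empty trace records six steps, with both non-waiting calls preceding their waiting counterparts.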


The second pass is therefore required for two reasons:



  1. The dirtied pages resulting from the calls to write_inode are written to disk (synchronization
    of raw block devices ensures this). Since metadata changes need not be processed on a
    piece-by-piece basis, the approach improves write performance.

  2. The kernel now explicitly waits for all triggered write operations to complete. This is
    ensured because WB_SYNC_ALL is set in the second pass.
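The performance argument in the first point can be illustrated with a small counting model. It is only a sketch: the meta_ names are hypothetical, and the block count and the number of inodes per metadata block are assumptions chosen for the example, not values taken from any particular filesystem. Deferred handling marks the containing block dirty and writes it once later, while immediate handling issues one write per inode.

```c
#include <assert.h>
#include <stdbool.h>

#define META_BLOCKS 4
#define META_INODES_PER_BLOCK 8   /* assumption, for illustration only */

/* Dirty state of the metadata blocks plus a counter of issued writes. */
struct meta_model {
    bool dirty[META_BLOCKS];
    int writes_issued;
};

/* Maps an inode number to the metadata block that holds it. */
int meta_block_of(int ino)
{
    assert(ino >= 0 && ino < META_BLOCKS * META_INODES_PER_BLOCK);
    return ino / META_INODES_PER_BLOCK;
}

/* Deferred variant: only mark the containing block dirty, issue no write. */
void meta_write_inode_deferred(struct meta_model *m, int ino)
{
    m->dirty[meta_block_of(ino)] = true;
}

/* Piece-by-piece variant: every inode update issues its own block write. */
void meta_write_inode_immediate(struct meta_model *m, int ino)
{
    (void)meta_block_of(ino);     /* validates the inode number */
    m->writes_issued++;
}

/* Later flush: write each dirty block exactly once, return how many. */
int meta_flush_blocks(struct meta_model *m)
{
    int flushed = 0;

    for (int b = 0; b < META_BLOCKS; b++) {
        if (m->dirty[b]) {
            m->dirty[b] = false;
            m->writes_issued++;
            flushed++;
        }
    }
    return flushed;
}
```

For 16 dirty inodes spread over two metadata blocks, the deferred variant followed by one flush issues two writes, whereas the immediate variant issues 16.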


The two-pass behavior requires one change to sync_sb_inodes that I have not discussed yet. The second
pass wants to wait for all pages that have been submitted. This includes the pages submitted during the
first pass. Recall from our previous considerations (the overview in Figure 17-1 might be helpful here)
that the corresponding wait operations are issued in __sync_single_inode. However, the function only
sees inodes that have been present on one of the lists s_dirty, s_io, or s_more_io of the superblock
