Linux Kernel Architecture


Chapter 17: Data Synchronization


This is what __sync_inodes does:

❑ If the superblock is currently being synchronized from another part of the kernel (i.e., if
s_syncing of struct super_block is set), it is skipped. Otherwise, s_syncing is set to 1 to signal
to other parts of the kernel that this superblock is being synchronized.
❑ sync_inodes_sb synchronizes all dirty inodes associated with the superblock. The current
page state is queried with global_page_state, and then the function creates an instance of
writeback_control. There, the value of nr_to_write (maximum number of pages to be
written) is set as follows:
fs/fs-writeback.c
unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);

wbc.nr_to_write = nr_dirty + nr_unstable +
                (inodes_stat.nr_inodes - inodes_stat.nr_unused) +
                nr_dirty + nr_unstable;
wbc.nr_to_write += wbc.nr_to_write / 2; /* Bit more for luck */

The computed value should usually suffice to cover all dirty pages of the system, but 50 percent
more is added. This ensures that absolutely all dirty pages of the inode are written back, but
avoids some concurrency problems that can appear if no limit is set on the number of pages that
may be written.
Thereafter, the familiar sync_sb_inodes function is invoked to call the low-level
synchronization routines of the various filesystems.
❑ The low-level synchronization routines of most filesystems simply mark buffers or pages as dirty
but do not perform actual writeback. For this reason, the kernel then invokes sync_blockdev
to synchronize all mappings of the block device on which the filesystem resides (in this step,
the kernel is not restricted to a specific filesystem). This ensures that the data are actually
written back.
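
The arithmetic behind nr_to_write can be checked with a small userspace model. The following is only a sketch: the function estimate_nr_to_write and its parameter names are made up for illustration, and in the kernel the inputs come from global_page_state and inodes_stat rather than from function arguments.

```c
/*
 * Userspace model of the nr_to_write computation in the listing above.
 * The function and parameter names are illustrative assumptions; the
 * kernel reads these values from global_page_state() and inodes_stat.
 */
static unsigned long estimate_nr_to_write(unsigned long nr_dirty,
                                          unsigned long nr_unstable,
                                          unsigned long nr_inodes,
                                          unsigned long nr_unused)
{
    /* Dirty and unstable pages plus the number of inodes in use. */
    unsigned long nr_to_write = nr_dirty + nr_unstable +
                                (nr_inodes - nr_unused);

    /* The listing adds the dirty and unstable counts a second time. */
    nr_to_write += nr_dirty + nr_unstable;

    /* 50 percent margin, using integer division as in the listing. */
    nr_to_write += nr_to_write / 2;

    return nr_to_write;
}
```

With 1000 dirty pages, 20 unstable pages, and 800 inodes in use (5000 in total, 4200 unused), the base value is 1020 + 800 + 1020 = 2840, and the margin raises the limit to 4260 pages.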

17.15.2 Synchronization of Individual Files


The contents of individual files can be synchronized without the need to synchronize all the data in the
system. This option is used by applications that must ensure that the data they modify in memory are
always written back to the appropriate block device. Because normal write access operations first land in
cache, this option provides added safety for really important data (of course, another alternative would
be to use direct I/O operations that bypass the cache).
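
From the point of view of an application, the pattern looks roughly as follows. This is a minimal userspace sketch, not kernel code: write_durably is a hypothetical helper name, and the file mode and open flags are arbitrary choices made for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and return only after
 * fsync() has reported success. Returns 0 on success, -1 on error.
 */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);  /* data first land in the cache */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Ask the kernel to write the cached data and metadata back. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A call such as write_durably("/tmp/journal.dat", record, record_len) therefore does not return successfully until the synchronization request has completed; the path and variable names here are placeholders.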

As already noted, several different system calls are available for this purpose:


  1. fsync synchronizes the contents of a file and also writes the metadata associated with the
    file’s inode back to the block device.

  2. fdatasync writes back only the data contents and skips metadata that is not needed to
    read the data back later (timestamps, for instance).

  3. sync_file_range is a comparatively new system call that was introduced in kernel 2.6.17. It
    allows for controlling synchronization for precisely defined parts of open files. Essentially,
    the implementation selects the desired pages for writeback and possibly awaits the result.
    Since this is not too different from the methods employed for the above system calls, I will
    not bother to discuss sync_file_range in detail.
