Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 17: Data Synchronization


Some checks ascertain that the inode is suitable for synchronization before actual writeback is performed:


❑ Purely memory-based filesystems such as RAM disks, as well as pseudo-filesystems and other
purely virtual filesystems, do not require synchronization with an underlying block device.
This is signaled by setting BDI_CAP_NO_WRITEBACK in the backing_dev_info instance that belongs
to the filesystem’s mapping. If an inode of this type is encountered, processing can be aborted
immediately.
However, there is one filesystem whose metadata are purely memory-based and without physical
backing store, but that cannot be skipped: the block device pseudo-filesystem bdev. Recall
from Chapter 10 that bdev is used to handle access to raw block devices or partitions thereof.
An inode is provided for each partition, and access to the raw device is handled via this inode.
While the inode metadata are important in memory, it does not make sense to store them anywhere
permanently since they are just used to implement a uniform abstraction mechanism.
This, however, does not imply that the contents of the block device do not require
synchronization: Quite the opposite is true. Access to the raw device is, as usual, buffered by
the page cache, and any changes are reflected in the radix tree data structures. When
modifications are made to the contents of a block device, they go through the page cache. The
pages must therefore be synchronized with the underlying hardware from time to time, like all
other pages in the page cache.
The block device pseudo-filesystem bdev thus does not set BDI_CAP_NO_WRITEBACK. However,
no write_inode method is contained in the associated super_operations, so metadata
synchronization is not performed. Data synchronization, on the other hand, runs as for any other
filesystem.
❑ If the synchronization queue is congested (the BDI_write_congested bit is set in the status
field of the backing_dev_info instance) and non-blocking writeback was selected in
writeback_control, the congestion needs to be reported to the higher layers. This is done by
setting the encountered_congestion field in the writeback_control instance to 1.
If the current inode belongs to a block device, then the auxiliary function requeue_io is used to
move the inode from s_io to more_io. It is possible that different inodes of a block device are
backed by different queues, for instance, if multiple physical devices are combined into a single
logical device. The kernel therefore continues to process the other inodes on the s_io list in the
hope that they belong to different queues that are not congested.
If the current inode, however, stems from a regular filesystem, it can be assumed that
all other inodes are backed by the same queue. Since this queue is already congested, it
does not make sense to synchronize the other inodes, so the loop iteration is aborted. The
unprocessed inodes remain in the s_io list and are dealt with the next time sync_sb_inodes
is called.
❑ pdflush can be instructed via writeback_control to focus on a single queue. If a regular
filesystem inode that uses a different queue is encountered, processing can be aborted. If the
inode represents a block device, processing skips forward to the next inode on the s_io list for
the same reason as in the write congestion case.
❑ The current system time in jiffies is held in a local variable at the start of sync_sb_inodes.
The kernel now checks whether the inode just processed was marked as dirty after the start time
of sync_sb_inodes. If so, synchronization is aborted in its entirety. The unprocessed inodes are
again left on s_io.
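
The four checks above can be condensed into a small userspace sketch. This is not the kernel's
code: the structures are simplified stand-ins whose layouts are assumptions, the capability and
congestion bits are modeled as plain booleans, and the names check_inode, sync_action,
is_blockdev, no_writeback, and write_congested are hypothetical and chosen only for illustration.
The sketch returns a decision rather than manipulating the s_io and more_io lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (layouts are assumptions). */
struct backing_dev_info {
    bool no_writeback;     /* models BDI_CAP_NO_WRITEBACK */
    bool write_congested;  /* models the BDI_write_congested bit */
};

struct writeback_control {
    struct backing_dev_info *bdi;  /* non-NULL: focus on this single queue */
    bool nonblocking;              /* non-blocking writeback selected */
    int encountered_congestion;    /* set to 1 to report congestion upward */
};

struct inode {
    struct backing_dev_info *bdi;  /* queue backing this inode's mapping */
    bool is_blockdev;              /* inode belongs to the bdev pseudo-filesystem */
    unsigned long dirtied_when;    /* jiffies value when the inode became dirty */
};

/* Hypothetical result type: what the caller's loop should do with the inode. */
enum sync_action {
    SYNC_WRITE,    /* all checks passed, write the inode back */
    SYNC_SKIP,     /* skip this inode, continue with the next one on s_io */
    SYNC_REQUEUE,  /* move the inode from s_io to more_io, then continue */
    SYNC_ABORT     /* stop the loop; remaining inodes stay on s_io */
};

static enum sync_action check_inode(const struct inode *inode,
                                    struct writeback_control *wbc,
                                    unsigned long start)
{
    assert(inode != NULL && inode->bdi != NULL && wbc != NULL);

    /* Check 1: memory-backed filesystems need no writeback at all. */
    if (inode->bdi->no_writeback)
        return SYNC_ABORT;

    /* Check 2: report congestion for non-blocking writeback. */
    if (wbc->nonblocking && inode->bdi->write_congested) {
        wbc->encountered_congestion = 1;
        /* Block device inodes may sit on different queues: keep going. */
        return inode->is_blockdev ? SYNC_REQUEUE : SYNC_ABORT;
    }

    /* Check 3: writeback restricted to one queue, and this is another one. */
    if (wbc->bdi != NULL && inode->bdi != wbc->bdi)
        return inode->is_blockdev ? SYNC_SKIP : SYNC_ABORT;

    /* Check 4: inode dirtied after the pass started (wraparound-safe compare). */
    if ((long)(start - inode->dirtied_when) < 0)
        return SYNC_ABORT;

    return SYNC_WRITE;
}
```

A caller would record the start time once, walk the s_io list, and act on the returned value for
each inode. Returning a decision keeps the policy separate from the list handling, which makes
the different treatment of block device inodes and regular filesystem inodes easy to see.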