Linux Kernel Architecture

Chapter 17: Data Synchronization

Some checks ascertain that the inode is suitable for synchronization before actual writeback is performed:

❑ Purely memory-based filesystems such as RAM disks, as well as pseudo-filesystems and purely virtual filesystems, do not require synchronization with an underlying block device. This is signaled by setting BDI_CAP_NO_WRITEBACK in the backing_dev_info instance that belongs to the filesystem's mapping. If an inode of this type is encountered, processing can be aborted immediately.

  However, there is one filesystem whose metadata are purely memory-based and without physical backing store, but that cannot be skipped: the block device pseudo-filesystem bdev. Recall from Chapter 10 that bdev is used to handle access to raw block devices or partitions thereof. An inode is provided for each partition, and access to the raw device is handled via this inode. While the inode metadata are important in memory, it does not make sense to store them anywhere permanently since they are just used to implement a uniform abstraction mechanism. This, however, does not imply that the contents of the block device do not require synchronization. Quite the opposite is true: access to the raw device is buffered by the page cache as usual, and any changes are reflected in the radix tree data structures. Modified pages must therefore be synchronized with the underlying hardware from time to time, like all other pages in the page cache. The block device pseudo-filesystem bdev thus does not set BDI_CAP_NO_WRITEBACK. However, no write_inode method is contained in the associated super_operations, so metadata synchronization is not performed. Data synchronization, on the other hand, runs as for any other filesystem.

❑ If the synchronization queue is congested (the BDI_write_congested bit is set in the status field of the backing_dev_info instance) and non-blocking writeback was selected in writeback_control, the congestion needs to be reported to the higher layers. This is done by setting the encountered_congestion field in the writeback_control instance to 1.

  If the current inode belongs to a block device, the auxiliary function requeue_io is used to move the inode from s_io to more_io. Different inodes of a block device may be backed by different queues, for instance, if multiple physical devices are combined into a single logical device. The kernel therefore continues to process the other inodes on the s_io list in the hope that they belong to different queues that are not congested.

  If the current inode, however, stems from a regular filesystem, it can be assumed that all other inodes are backed by the same queue. Since this queue is already congested, it does not make sense to synchronize the other inodes, so the loop is aborted. The unprocessed inodes remain on the s_io list and are dealt with the next time sync_sb_inodes is called.

❑ pdflush can be instructed via writeback_control to focus on a single queue. If a regular filesystem inode that uses a different queue is encountered, processing can be aborted. If the inode represents a block device, processing skips forward to the next inode on the s_io list for the same reason as in the write congestion case.

❑ The current system time in jiffies is held in a local variable at the start of sync_sb_inodes. The kernel now checks whether the inode under consideration was marked as dirty after the start time of sync_sb_inodes. If so, synchronization is aborted in its entirety. The unprocessed inodes are again left on s_io.
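The four checks above can be condensed into a single decision function. The following is a minimal user-space sketch, not the kernel's code: the structures are reduced stand-ins that model BDI_CAP_NO_WRITEBACK and BDI_write_congested as plain booleans, and the names check_inode, enum sync_verdict, and the is_blkdev_sb parameter are hypothetical and introduced only for illustration. The helper time_after mimics the wraparound-safe jiffies comparison of the same name in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures named in the text. */
struct backing_dev_info {
    bool no_writeback;     /* models BDI_CAP_NO_WRITEBACK */
    bool write_congested;  /* models the BDI_write_congested bit */
};

struct writeback_control {
    struct backing_dev_info *bdi; /* non-NULL: focus on this queue only */
    bool nonblocking;
    bool encountered_congestion;
};

struct inode {
    struct backing_dev_info *bdi; /* assumption: reached via the mapping */
    unsigned long dirtied_when;   /* time in jiffies */
};

/* Hypothetical result type: what the loop should do with this inode. */
enum sync_verdict {
    SYNC_WRITE,      /* all checks passed, perform writeback */
    SYNC_SKIP_INODE, /* move on to the next inode on s_io */
    SYNC_STOP        /* abort the loop, leave the rest on s_io */
};

/* True if time a lies after time b, even across counter wraparound. */
static bool time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Apply the four checks in order. is_blkdev_sb states whether the
 * superblock belongs to the bdev pseudo-filesystem; start is the
 * jiffies value recorded when the synchronization run began.
 */
static enum sync_verdict check_inode(const struct inode *inode,
                                     struct writeback_control *wbc,
                                     bool is_blkdev_sb,
                                     unsigned long start)
{
    assert(inode != NULL && inode->bdi != NULL && wbc != NULL);
    const struct backing_dev_info *bdi = inode->bdi;

    /* 1. Memory-backed filesystem: nothing to synchronize. */
    if (bdi->no_writeback)
        return SYNC_STOP;

    /* 2. Congested queue and non-blocking writeback requested. */
    if (wbc->nonblocking && bdi->write_congested) {
        wbc->encountered_congestion = true;
        /* Block device inodes may use other, uncongested queues. */
        return is_blkdev_sb ? SYNC_SKIP_INODE : SYNC_STOP;
    }

    /* 3. Writeback is restricted to one queue and this is another. */
    if (wbc->bdi != NULL && wbc->bdi != bdi)
        return is_blkdev_sb ? SYNC_SKIP_INODE : SYNC_STOP;

    /* 4. Inode was dirtied after the run started. */
    if (time_after(inode->dirtied_when, start))
        return SYNC_STOP;

    return SYNC_WRITE;
}
```

A caller would iterate over the s_io list and act on the verdict: write the inode back on SYNC_WRITE, requeue it and continue on SYNC_SKIP_INODE, and leave the loop on SYNC_STOP. The sketch deliberately leaves out the list manipulation (requeue_io, more_io), which the kernel performs as part of the same checks.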
