Linux Kernel Architecture


Chapter 17: Data Synchronization


performed in __sync_single_inode discussed below; recall from Figure 17-1 that it sits at the
bottom of the mechanism, where it is responsible for delegating synchronization of a single inode to
the filesystem-specific methods. All functions that wait on inodes because WB_SYNC_ALL is set are
marked in Figure 17-1.
Notice that writeback with WB_SYNC_ALL set is referred to as data integrity writeback. If a
system crash happens immediately after writeback in this mode has finished, no data are lost
because everything is synchronized with the underlying block devices.
If WB_SYNC_NONE is used, the kernel will send the request, but continue with the remaining
synchronization work immediately afterward. This mode is also referred to as flushing writeback.
WB_SYNC_HOLD is a special form used for the sync system call that works similarly to
WB_SYNC_NONE. The exact differences are subtle and are discussed in Section 17.15.
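
The distinction between the three modes described above can be sketched in plain user-space C. This is a minimal model, not kernel code: the enumerator names come from the text, but their numeric values and the helper must_wait_for_writeback are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* The three modes named in the text; the numeric values are illustrative. */
enum writeback_sync_modes {
    WB_SYNC_NONE,   /* flushing writeback: send the request, do not wait */
    WB_SYNC_ALL,    /* data integrity writeback: wait until data are on disk */
    WB_SYNC_HOLD    /* special form for the sync system call, similar to NONE */
};

/* Hypothetical helper: does the caller have to wait for completion? */
bool must_wait_for_writeback(enum writeback_sync_modes mode)
{
    assert(mode == WB_SYNC_NONE || mode == WB_SYNC_ALL || mode == WB_SYNC_HOLD);
    return mode == WB_SYNC_ALL;
}
```

Only WB_SYNC_ALL makes the helper return true. The sketch deliberately treats WB_SYNC_HOLD like WB_SYNC_NONE and does not model the subtle differences between the two.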

❑ When the kernel performs writeback, it must decide which dirty cache data need to be
synchronized with the backing store. It uses the older_than_this and nr_to_write elements
for this purpose. Data are written back if they have been dirty for longer than specified by
older_than_this.


older_than_this is defined as a pointer type, which is unusual for a single long
value. Its numeric value, which can be obtained by appropriate dereferencing, is of
interest. If the pointer is NULL, then age checking is not performed, and all objects
are synchronized irrespective of when they became dirty. Setting nr_to_write to 0
likewise disables any upper limit on the number of pages that are supposed to be
written back.

❑ nr_to_write can restrict the maximal number of pages that should be written back. The upper
bound for this is given by MAX_WRITEBACK_PAGES, which is usually set to 1,024.


❑ If pages were selected to be written back, functions from lower layers perform the required
operations. However, they can fail for various reasons, for instance, because the page is locked by
some other part of the kernel. The number of skipped pages can be reported to higher layers via
the counter pages_skipped.


❑ The nonblocking flag specifies whether writeback queues block or not in the event of congestion
(more pending write operations than can be effectively satisfied). If they are blocked, the kernel
waits until the queue is free. If not, it relinquishes control. The write operation is then resumed
later.


❑ encountered_congestion is also a flag to signal to higher layers that congestion has occurred
during data writeback. It is a Boolean variable and accepts the values 1 or 0.


❑ for_kupdated is set to 1 if the write request was issued by the periodic mechanism. Otherwise,
its value is 0. for_reclaim and for_writepages are used in a similar manner: They are set if
the writeback operation was initiated from memory reclaim or from the do_writepages function,
respectively.


❑ If range_cyclic is set to 0, the writeback mechanism is restricted to operate on the range given
by range_start and range_end. The limits refer to the mapping for which the writeback was
initiated.
If range_cyclic is set to 1, the kernel may iterate many times over the pages associated with a
mapping, thus the name of the element.
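
The selection rules from the list above (the age check through older_than_this and the range restriction through range_cyclic, range_start, and range_end) can be illustrated with a small user-space model. This is only a sketch under several assumptions: the structure wbc_model and both helpers are hypothetical, the field types are guesses, older_than_this is taken to point at a cutoff timestamp so that an object passes the check if it became dirty at or before that time, range_end is treated as inclusive, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified user-space model of some fields discussed above.
 * Field names follow the text; the types are assumptions. */
struct wbc_model {
    const unsigned long *older_than_this; /* NULL: no age check */
    int range_cyclic;                     /* 1: not restricted to a fixed range */
    long long range_start;                /* used only if range_cyclic == 0 */
    long long range_end;                  /* treated as inclusive (assumption) */
};

/* Hypothetical helper: does an object that became dirty at time
 * dirtied_when pass the age check? Wraparound is ignored. */
bool wbc_old_enough(const struct wbc_model *wbc, unsigned long dirtied_when)
{
    assert(wbc != NULL);
    if (wbc->older_than_this == NULL)
        return true;                      /* age checking disabled */
    return dirtied_when <= *wbc->older_than_this;
}

/* Hypothetical helper: is offset inside the region selected for writeback? */
bool wbc_in_range(const struct wbc_model *wbc, long long offset)
{
    assert(wbc != NULL);
    if (wbc->range_cyclic)
        return true;                      /* no fixed range restriction */
    return offset >= wbc->range_start && offset <= wbc->range_end;
}
```

With a cutoff of 100 and a range of 10 to 20, an object dirtied at time 50 at offset 15 would be selected, while one dirtied at time 150 or located at offset 21 would not. Setting older_than_this to NULL and range_cyclic to 1 makes both helpers accept every object.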
