Linux Kernel Architecture

Chapter 9: The Extended Filesystem Family

number of management blocks from the number of blocks available for the filesystem. Albeit simple, the operation is costly (the kernel needs to iterate over all block groups), so s_overhead_lastands_blocks_lastare used to cache the last computed value for the number of management blocks and the totally available blocks. Note that the values usually remain constant.Once they have been computed, they can never- theless change if the filesystem is resized while being mounted. But resizing is seldomly used and requires an external kernel patch,so it will not be discussed any further. ❑ s_dir_countindicates the total number of directories; this is needed for implementation of the Orlov allocator discussed in Section 9.2.4. Because this value is not saved in the on-disk structure, it must be determined each time a filesystem is mounted. The kernel provides the ext2_count_dirsfunction for this purpose. ❑ s_debtsis a pointer to an array of 8-bit numbers (generally,shorts) with a separate entry for each block group. The Orlov allocator uses this array to keep a balance between file and directory inodes in a block group (this is discussed in greater depth in Section 9.2.4). ❑ Thepercpu_counterinstances at the end of the structure provide approximate, but fast and scal- able counters for the free blocks and inodes and the number of directories. The implementation of such counters is discussed in Section 5.2.9.

In kernel versions up to 2.4,ext2_sb_infoincluded additional elements to cache the block and inode bitmaps. Because of their size, it was not possible (or at least not reasonable) to keep them all in memory at the same time. As a result, an LRU method was used to keep only the most frequently used elements in RAM. In the meantime, this specific cache implementation has been rendered superfluous because accesses to block devices are now cached automatically by the kernel, even if only a single block (and not an entire page) is read. Chapter 16 discusses the implementation of the new caching scheme in detail when describing__bread.

Pre-allocation

To increase the performance of block allocation, the second extended filesystem employs a mechanism known aspre-allocation. Whenever a number of new blocks is requested for a file, not just the absolutely necessary blocks are allocated. Blocks for consecutiveallocations are additionally spied out and marked for later usewithoutbeing finally allocated. The kernel ensures that reserved areas don’t overlap. This saves time when new allocations are made and prevents fragmentation, especially when multiple files grow concurrently. It should be emphasized that pre-allocation does not lead to poorer use of the available disk space. A region pre-allocated by one inode can at any time be overwritten by another inode if the need arises. However, the kernel tries to be polite and avoid this. One can think of pre-allocation as an additional layer before the final block allocation that determines how the available spacecouldbe put to good use. Pre-allocation is a suggestion, while allocation is final.

Several data structures are required to implement this mechanism. The reservation window itself is not very complicated: It uses a start and end block to specify a reserved region. The following data structure reflects this:

<ext2_fs_sb.h> struct ext2_reserve_window { ext2_fsblk_t _rsv_start; /* First byte reserved */ ext2_fsblk_t _rsv_end; /* Last byte reserved or 0 */ };

Linux Kernel Architecture

Chapter 9: The Extended Filesystem Family

Get our desktop app

Company

Features

Documentation

Resources