Linux Kernel Architecture

(Jacob Rumans) #1

Chapter 9: The Extended Filesystem Family


number of management blocks from the number of blocks available for the filesystem.
Albeit simple, the operation is costly (the kernel needs to iterate over all block groups), so
s_overhead_lastands_blocks_lastare used to cache the last computed value for the number
of management blocks and the totally available blocks.
Note that the values usually remain constant.Once they have been computed, they can never-
theless change if the filesystem is resized while being mounted. But resizing is seldomly used
and requires an external kernel patch,so it will not be discussed any further.
❑ s_dir_countindicates the total number of directories; this is needed for implementation of
the Orlov allocator discussed in Section 9.2.4. Because this value is not saved in the on-disk
structure, it must be determined each time a filesystem is mounted. The kernel provides the
ext2_count_dirsfunction for this purpose.
❑ s_debtsis a pointer to an array of 8-bit numbers (generally,shorts) with a separate entry for
each block group. The Orlov allocator uses this array to keep a balance between file and directory
inodes in a block group (this is discussed in greater depth in Section 9.2.4).
❑ Thepercpu_counterinstances at the end of the structure provide approximate, but fast and scal-
able counters for the free blocks and inodes and the number of directories. The implementation
of such counters is discussed in Section 5.2.9.

In kernel versions up to 2.4,ext2_sb_infoincluded additional elements to cache the block and inode
bitmaps. Because of their size, it was not possible (or at least not reasonable) to keep them all in memory
at the same time. As a result, an LRU method was used to keep only the most frequently used elements
in RAM. In the meantime, this specific cache implementation has been rendered superfluous because
accesses to block devices are now cached automatically by the kernel, even if only a single block (and
not an entire page) is read. Chapter 16 discusses the implementation of the new caching scheme in detail
when describing__bread.

Pre-allocation


To increase the performance of block allocation, the second extended filesystem employs a mechanism
known aspre-allocation. Whenever a number of new blocks is requested for a file, not just the absolutely
necessary blocks are allocated. Blocks for consecutiveallocations are additionally spied out and marked
for later usewithoutbeing finally allocated. The kernel ensures that reserved areas don’t overlap. This
saves time when new allocations are made and prevents fragmentation, especially when multiple files
grow concurrently. It should be emphasized that pre-allocation does not lead to poorer use of the avail-
able disk space. A region pre-allocated by one inode can at any time be overwritten by another inode if
the need arises. However, the kernel tries to be polite and avoid this. One can think of pre-allocation as
an additional layer before the final block allocation that determines how the available spacecouldbe put
to good use. Pre-allocation is a suggestion, while allocation is final.

Several data structures are required to implement this mechanism. The reservation window itself is not
very complicated: It uses a start and end block to specify a reserved region. The following data structure
reflects this:

<ext2_fs_sb.h>
struct ext2_reserve_window {
ext2_fsblk_t _rsv_start; /* First byte reserved */
ext2_fsblk_t _rsv_end; /* Last byte reserved or 0 */
};
Free download pdf