Linux Kernel Architecture


Chapter 9: The Extended Filesystem Family


32 bits to address positions within a file, this restricts the maximum file size to 2^31 bytes, which corresponds
to 2 GiB and is less than can be managed with triple indirection in the Ext2 filesystem. To cope with
this drawback, a special scheme was introduced to access large files; this not only has an impact on the
routines of the standard library, but must also be taken into account in the kernel sources.

Fragmentation


The similarity between memory management and disk storage in terms of their block structure means
that they share the familiar problem of fragmentation discussed in Chapter 3. Over time, many files of a
filesystem are deleted at random positions on the disk, and new ones are added. This inevitably leads to
fragmentation of free disk space into chunks of different sizes, as illustrated in Figure 9-5.

[Figure 9-5 shows a row of disk blocks, each marked as either used or free.]

Figure 9-5: Fragmentation in filesystems.

Although the situation illustrated may well be exaggerated, it clearly indicates the nature of the problem.
There are still 12 blocks free on the hard disk, but the longest contiguous unit is 5 blocks. What happens
when a program wants to save data occupying a total of 7 blocks to disk? Or what about when it is nec-
essary to add data to an existing file and the data blocks beyond the end of the file are already occupied
by other data?

The answer is obvious. The data are spread over different areas of the disk and become fragmented. It is
important that this be done transparently to the user process. Processes accessing a file always see the file
as a continuous linear structure, regardless of the degree of data fragmentation on the hard disk. This is
reminiscent of the way in which a processor presents working memory to processes, the difference being
that there is no automatic hardware instance to ensure linearization on behalf of the filesystem. The code
of the filesystem itself is responsible for this task.

Of course, this does not present any basic difficulty when direct pointers or simple, double, and triple
indirection are used to point to the file data blocks. The data block numbers are always uniquely identi-
fied by the information in the pointers. From this point of view, it is irrelevant whether the data blocks
are sequential or are spread randomly across the entire hard disk.

However, there is a noticeable difference in access speed. If all file blocks are contiguous on the hard disk
(this is desirable), movement of the read/write head when reading data is reduced to a minimum, thus
boosting the speed of data transfer. If the opposite is true — and the file blocks are distributed across the
disk — the read/write head is forced to constantly traverse the disk in order to read the data, and this
slows access.

Consequently, the Second Extended Filesystem does its best to prevent fragmentation. When fragmentation
cannot be avoided, it attempts to keep the individual file blocks in the same block group.^6 It is
very helpful if the filesystem is not filled to capacity and is operated with appropriate reserves; more file
storage options are then available, and this automatically reduces susceptibility to fragmentation.

(^6) The defrag.ext2 system tool analyzes Ext2 partitions and reorganizes fragmented data in a contiguous structure.
