Linux Kernel Architecture


Chapter 9: The Extended Filesystem Family


filesystems. How can the correctness and consistency of metadata be ensured if filesystem operations
are interrupted unintentionally — for example, in the event of a power outage or if a user switches a
computer off without shutting it down first?

9.3.1 Concepts


The basic idea of Ext3 is to regard each operation on the filesystem metadata as a transaction that is
saved in a journal before it is performed. Once the transaction has terminated (i.e., when the desired
modifications to the metadata have been made), the associated information is removed from the journal.
If a system error occurs after transaction data have been written to the journal, but before (or
during) performance of the actual operations, the pending operations are carried out in their entirety
the next time the filesystem is mounted. The filesystem is then automatically in a consistent state. If the
interruption occurs before the transaction is written to the journal, the operation itself is not performed
because the information on it is lost when the system is restarted, but at least filesystem consistency is
retained.
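The recovery rule described above can be illustrated with a toy model. The following C sketch is not Ext3 or JBD code: the structures `tx_record` and `toy_fs` and the functions `journal_write`, `checkpoint`, and `mount_recover` are hypothetical, and the model assumes a single-slot journal holding one transaction whose commit mark is written last. A committed record is replayed in full at mount time, while a record without a commit mark is discarded, so the metadata is never left half updated.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8         /* size of the toy "disk" (hypothetical) */
#define MAX_TX_BLOCKS 4   /* max metadata blocks per transaction (hypothetical) */

/* One journaled transaction: the new contents of some metadata blocks. */
struct tx_record {
    int nblocks;
    int block[MAX_TX_BLOCKS];   /* target block numbers on the toy disk */
    int value[MAX_TX_BLOCKS];   /* new contents of those blocks */
    bool committed;             /* commit mark, written after the rest of the record */
};

/* Toy filesystem: metadata blocks plus a single-slot journal. */
struct toy_fs {
    int disk[NBLOCKS];
    struct tx_record journal;
    bool journal_used;          /* journal holds a record not yet checkpointed */
};

/* Log the transaction. If crash_before_commit is true, simulate a power
   failure after the record was written but before its commit mark. */
void journal_write(struct toy_fs *fs, const struct tx_record *tx, bool crash_before_commit)
{
    assert(tx->nblocks >= 0 && tx->nblocks <= MAX_TX_BLOCKS);
    fs->journal = *tx;
    fs->journal.committed = false;
    fs->journal_used = true;
    if (!crash_before_commit)
        fs->journal.committed = true;   /* the transaction is now durable */
}

/* Perform the logged operations on the real metadata, then drop the record. */
void checkpoint(struct toy_fs *fs)
{
    for (int i = 0; i < fs->journal.nblocks; i++) {
        assert(fs->journal.block[i] >= 0 && fs->journal.block[i] < NBLOCKS);
        fs->disk[fs->journal.block[i]] = fs->journal.value[i];
    }
    fs->journal_used = false;
}

/* At mount time: replay a committed record in its entirety,
   discard an incomplete one. */
void mount_recover(struct toy_fs *fs)
{
    if (fs->journal_used && fs->journal.committed)
        checkpoint(fs);
    else
        fs->journal_used = false;
}
```

For example, after `journal_write(&fs, &tx, false)` and a simulated crash (no call to `checkpoint`), `mount_recover(&fs)` applies every block in `tx`; with `crash_before_commit` set to `true`, the disk keeps its old contents. The real journaling layer groups many operations into one transaction and logs whole blocks, which this sketch leaves out.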

However, Ext3 cannot perform miracles. It is still possible to lose data because of a system crash.
Nevertheless, the filesystem can always be restored to a consistent state very quickly afterward.

The additional overhead needed to log transactions is, of course, reflected in the performance of Ext3,
which does not quite match that of Ext2. To strike a suitable balance between performance and data
integrity for a given workload, the kernel can operate an Ext3 filesystem in three different modes:


  1. In writeback mode, only changes to the metadata are logged to the journal. Operations on
    useful data (the actual file contents) bypass the journal. This mode offers the highest
    performance but the lowest level of data protection.

  2. In ordered mode, too, only changes to the metadata are logged to the journal. However, changes
    to useful data are grouped and are always written to disk before the associated metadata
    changes are committed to the journal. This mode is therefore slightly slower than writeback mode.

  3. In journal mode, changes not only to metadata but also to useful data are written to the
    journal. This guarantees the highest level of data protection, but because all data are written
    twice (once to the journal and once to their final location), it is by far the slowest mode
    (except in a few pathological situations). The chance of losing data is minimized.
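The difference between the three modes lies mainly in the order in which blocks reach the disk. The sketch below is a simplified, hypothetical model (the names `plan_write`, `data_mode`, `io_step`, and `MAX_STEPS` are invented for illustration) that lists one permissible write order for a single operation that changes both file contents and metadata.

```c
#include <stddef.h>

#define MAX_STEPS 4   /* every mode below produces exactly four steps */

enum data_mode { DATA_WRITEBACK, DATA_ORDERED, DATA_JOURNAL };

enum io_step {
    STEP_DATA_IN_PLACE,    /* file contents written to their final location */
    STEP_DATA_TO_JOURNAL,  /* file contents copied into the journal */
    STEP_META_TO_JOURNAL,  /* changed metadata copied into the journal */
    STEP_COMMIT,           /* commit record makes the transaction valid */
    STEP_CHECKPOINT        /* journaled blocks written to their final locations */
};

/* Fill steps[] with one permissible write order for the given mode
   and return the number of steps. */
size_t plan_write(enum data_mode mode, enum io_step steps[MAX_STEPS])
{
    size_t n = 0;
    switch (mode) {
    case DATA_WRITEBACK:
        /* Data bypasses the journal and is NOT ordered against the commit;
           writing it after the commit is just one possible order. */
        steps[n++] = STEP_META_TO_JOURNAL;
        steps[n++] = STEP_COMMIT;
        steps[n++] = STEP_DATA_IN_PLACE;
        break;
    case DATA_ORDERED:
        /* Data must reach the disk before the metadata is committed. */
        steps[n++] = STEP_DATA_IN_PLACE;
        steps[n++] = STEP_META_TO_JOURNAL;
        steps[n++] = STEP_COMMIT;
        break;
    case DATA_JOURNAL:
        /* Data and metadata both go through the journal; the checkpoint
           then writes the data a second time, to its final location. */
        steps[n++] = STEP_DATA_TO_JOURNAL;
        steps[n++] = STEP_META_TO_JOURNAL;
        steps[n++] = STEP_COMMIT;
        break;
    }
    /* Later, the journaled blocks are written back to their home locations. */
    steps[n++] = STEP_CHECKPOINT;
    return n;
}
```

In writeback mode, a crash between the commit and the data write leaves metadata that already points to blocks whose new contents never arrived, so a file may show stale data. Ordered mode rules this out by writing the data first, and journal mode is the only case in which file contents pass through the journal.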


The desired mode is specified with the data option (data=writeback, data=ordered, or data=journal)
when the filesystem is mounted. The default is ordered.
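On the command line, the mode is chosen with an option such as `mount -t ext3 -o data=journal /dev/sdb1 /mnt` (device and mount point are placeholders). The following C sketch shows how such a comma-separated option string might be scanned for `data=`. `parse_data_mode` and `DATA_INVALID` are hypothetical and this is not the kernel's option parser; it falls back to ordered when no `data=` option is present, and if several `data=` options appear it simply lets the last one win.

```c
#include <assert.h>
#include <string.h>

enum data_mode { DATA_WRITEBACK, DATA_ORDERED, DATA_JOURNAL, DATA_INVALID };

/* Scan a comma-separated option string such as "noatime,data=journal".
   Returns DATA_ORDERED (the default) when no data= option is given. */
enum data_mode parse_data_mode(const char *opts)
{
    assert(opts != NULL);
    enum data_mode mode = DATA_ORDERED;
    const char *p = opts;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len >= 5 && strncmp(p, "data=", 5) == 0) {
            const char *val = p + 5;
            size_t vlen = len - 5;
            if (vlen == 9 && strncmp(val, "writeback", 9) == 0)
                mode = DATA_WRITEBACK;
            else if (vlen == 7 && strncmp(val, "ordered", 7) == 0)
                mode = DATA_ORDERED;
            else if (vlen == 7 && strncmp(val, "journal", 7) == 0)
                mode = DATA_JOURNAL;
            else
                return DATA_INVALID;   /* unknown or empty value */
        }
        if (!end)
            break;
        p = end + 1;                   /* continue after the comma */
    }
    return mode;
}
```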

As already stated, the Ext3 filesystem is designed to be fully compatible with Ext2 — not only downward
but also (as far as possible) upward. The journal therefore resides in a special file with (as usual) its own
inode. This enables Ext3 filesystems to be mounted on systems that support only Ext2. Even existing Ext2
partitions can be converted to Ext3 quickly and, above all, without the need for complicated data copying
operations — a major consideration on server systems.
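The compatibility rule rests on feature flags in the superblock, which are grouped into compatible, read-only compatible, and incompatible sets. The sketch below models the decision a driver makes when it finds feature bits it does not know. `can_mount` and the shortened constant names are hypothetical, while the bit values mirror the on-disk definitions of `EXT3_FEATURE_COMPAT_HAS_JOURNAL`, `EXT2_FEATURE_INCOMPAT_FILETYPE`, and `EXT3_FEATURE_INCOMPAT_RECOVER`. Because the journal is announced by a compatible flag, a driver that knows only Ext2 may ignore it; after a crash, however, the incompatible recovery flag is set, and such a driver must refuse the filesystem until the journal has been replayed.

```c
#include <stdint.h>

/* Feature bits (values as in the ext2/ext3 on-disk format). */
#define COMPAT_HAS_JOURNAL  0x0004u  /* compat: filesystem has a journal inode */
#define INCOMPAT_FILETYPE   0x0002u  /* incompat: directory entries store file type */
#define INCOMPAT_RECOVER    0x0004u  /* incompat: journal must be replayed first */

enum mount_result { MOUNT_RW, MOUNT_RO, MOUNT_REFUSE };

/* Simplified decision rule: unknown incompat bits -> refuse,
   unknown ro_compat bits -> read-only, unknown compat bits -> ignored. */
enum mount_result can_mount(uint32_t compat, uint32_t ro_compat, uint32_t incompat,
                            uint32_t known_ro_compat, uint32_t known_incompat)
{
    (void)compat;  /* compatible features never prevent mounting */
    if (incompat & ~known_incompat)
        return MOUNT_REFUSE;
    if (ro_compat & ~known_ro_compat)
        return MOUNT_RO;
    return MOUNT_RW;
}
```

Here a driver that only understands Ext2 is modelled by passing `INCOMPAT_FILETYPE` as its only known incompatible feature.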

The journal can be held not only in a special file but also on a separate partition, but the details are not
discussed here.