232 System Administration


One can almost detect an emergent intelligence, as in “Colossus: The
Forbin Project.” Unix managed to purge from itself the documents that
prove it’s buggy.

Unix’s method for updating the data and pointers that it stores on the disk
allows inconsistencies and incorrect pointers on the disk as a file is being
created or modified. When the system crashes before updating the disk
with all the appropriate changes, which is always, the file system image on
disk becomes corrupt and inconsistent. The corruption is visible during the
reboot after a system crash: the Unix boot script automatically runs fsck to
put the file system back together again.
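
To make the failure mode concrete, here is a toy model in Python. It assumes a made-up on-disk layout: a set of allocated block numbers stands in for the free-block bitmap, and a small table maps inodes to the blocks they point at. The names DiskImage, Crash, create_file, and check_consistency are hypothetical, and a real Unix file system is far more involved. Creating a file takes two separate writes, and a simulated crash between them leaves a block that the bitmap calls allocated but that no inode references. That is roughly the sort of discrepancy fsck has to reconcile at boot.

```python
# Toy model only: a hypothetical on-disk layout, not any real Unix file system.
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DiskImage:
    # Block numbers that the free-block bitmap records as in use.
    allocated: Set[int] = field(default_factory=set)
    # Inode number -> data block numbers that the inode points to.
    inodes: Dict[int, List[int]] = field(default_factory=dict)


class Crash(Exception):
    """Simulates the machine going down partway through an update."""


def create_file(disk: DiskImage, inode: int, block: int,
                crash_after_first_write: bool = False) -> None:
    """Create a one-block file with two separate, non-atomic disk writes."""
    # Write 1: mark the data block as allocated in the bitmap.
    disk.allocated.add(block)
    if crash_after_first_write:
        raise Crash("power failed between the two writes")
    # Write 2: point the new inode at the block.
    disk.inodes[inode] = [block]


def check_consistency(disk: DiskImage) -> List[str]:
    """Cross-check the bitmap against the inodes, roughly as fsck does."""
    referenced = {b for blocks in disk.inodes.values() for b in blocks}
    problems = [f"block {b} allocated but unreferenced"
                for b in sorted(disk.allocated - referenced)]
    problems += [f"block {b} referenced but marked free"
                 for b in sorted(referenced - disk.allocated)]
    return problems


if __name__ == "__main__":
    disk = DiskImage()
    create_file(disk, inode=2, block=10)
    print(check_consistency(disk))  # []
    try:
        create_file(disk, inode=3, block=11, crash_after_first_write=True)
    except Crash:
        pass
    print(check_consistency(disk))  # ['block 11 allocated but unreferenced']
```

Doing the writes in the other order only swaps the symptom: a crash would then leave an inode pointing at a block the bitmap still calls free, which is worse, because that block could later be handed to a second file.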

Many Unix sysadmins don’t realize that inconsistencies occur during a system
dump to tape. The backup program takes a snapshot of the current file
system. If there are any users or processes modifying files during the
backup, the file system on disk will be inconsistent for short periods of
time. Since the dump isn’t instantaneous (and usually takes hours), the
snapshot becomes a blurry image. It’s similar to photographing the Indy
500 using a 1-second shutter speed, with similar results: the most important
files, the ones that people were actively modifying, are the ones you
can’t restore.
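
The blurry-snapshot effect can be reproduced with an equally small toy model. It assumes a sequential, file-at-a-time dump and two hypothetical files, chapter.txt and index.txt, that are only valid when they carry the same version; the function names dump and consistent are made up for illustration. A user saves between the dump's two copies, so the tape ends up holding a pair of versions that never coexisted on disk.

```python
# Toy model only: a hypothetical file-at-a-time dump of a live file system.
from typing import Callable, Dict


def dump(live: Dict[str, str], between_files: Callable[[], None]) -> Dict[str, str]:
    """Copy files one at a time, in name order, like a sequential tape dump.

    `between_files` stands in for whatever other processes do to the live
    file system while the dump is busy with the previous file.
    """
    tape: Dict[str, str] = {}
    for name in sorted(live):
        tape[name] = live[name]  # copy the file as it looks at this instant
        between_files()          # meanwhile, the rest of the system keeps running
    return tape


def consistent(files: Dict[str, str]) -> bool:
    """The document and its index are only valid if they carry the same version."""
    return files["chapter.txt"] == files["index.txt"]


if __name__ == "__main__":
    live = {"chapter.txt": "v1", "index.txt": "v1"}
    versions = iter(["v2", "v3"])

    def user_saves() -> None:
        # Each save rewrites both files to the next version together.
        v = next(versions)
        live["chapter.txt"] = v
        live["index.txt"] = v

    tape = dump(live, user_saves)
    print(consistent(live))  # True: the disk is fine between saves
    print(tape)              # {'chapter.txt': 'v1', 'index.txt': 'v2'}
    print(consistent(tape))  # False: this pair never existed on disk
```

Quiescing the system first corresponds to passing a between_files that does nothing; the copy is then consistent, but only because nothing else was allowed to run while it was made.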

Because Unix lacks facilities to back up a “live” file system, a proper
backup requires taking the system down to its stand-alone or single-user
mode, where there will not be any processes on the system changing files
on disk during the backup. For systems with gigabytes of disk space, this
translates into hours of downtime every day. (With a sysadmin getting paid
to watch the tapes whirr.) Clearly, Unix is not a serious option for
applications with continuous uptime requirements. The administrators of one
set of Unix systems with continuous uptime requirements were forced to tell
their users in /etc/motd to “expect anomalies” during backup periods:

SunOS Release 4.1.1 (DIKUSUN4CS) #2:Sun Sep 22 20:48:55 MET DST 1991
--- BACKUP PLAN ----------------------------------------------------
Skinfaxe:      24. Aug, 9.00-12.00     Please note that anomalies can
Freja & Ask:   31. Aug, 9.00-13.00     be expected when using the Unix
Odin:           7. Sep, 9.00-12.00     systems during the backups.
Rimfaxe:       14. Sep, 9.00-12.00
Div. Sun4c:    21. Sep, 9.00-13.00
--------------------------------------------------------------------

¹ This message is reprinted without the permission of Keith Bostic, who said,
“As far as I can tell, [reprinting the message] is not going to do either the
CSRG or me any good.” He’s right: the backups, made with the Berkeley tape
backup program, were also bad.
