Expert C Programming

(Jeff_L) #1

Executable files on UNIX are also labelled in a special way so that systems can recognize their special
properties. It's a common programming technique to label or tag important data with a unique number
identifying what it is. The labelling number is often termed a "magic" number; it confers the
mysterious power of being able to identify a collection of random bits. For example, the superblock
(the fundamental data structure in a UNIX filesystem) is tagged with the following magic number:


#define FS_MAGIC 0x011954


That strange-looking number isn't wholly random. It's Kirk McKusick's birthday. Kirk, the
implementor of the Berkeley fast file system, wrote this code in the late 1970's, but magic numbers are
so useful that this one is still in the source base today (in file sys/fs/ufs_fs.h). Not only does
it promote file system reliability, but also, every file systems hacker now knows to send Kirk a
birthday card for January 19.
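
To make the tagging technique concrete, here is a minimal sketch of how code might check such a tag before trusting a structure. Only FS_MAGIC comes from the text above; the struct toy_superblock, its fields, and the function toy_superblock_is_valid are hypothetical names invented for illustration, and do not reflect the layout of the real superblock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FS_MAGIC 0x011954   /* the magic number quoted in the text */

/* Hypothetical, heavily simplified stand-in for a superblock.
 * The real structure has many more fields and a different layout. */
struct toy_superblock {
    int32_t magic;       /* tag: must equal FS_MAGIC */
    int32_t block_size;  /* illustrative payload field */
};

/* Return true only if sb is non-NULL and carries the expected tag. */
static bool toy_superblock_is_valid(const struct toy_superblock *sb)
{
    return sb != NULL && sb->magic == FS_MAGIC;
}
```

A caller would run this check right after reading the structure from disk and refuse to go further if it fails. A block of random bits is very unlikely to carry the right tag by accident.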


There's a similar magic number for a.out files. Prior to AT&T's System V release of UNIX, an a.out
was identified by the magic number 0407 at offset zero. And how was 0407 selected as the "magic
number" identifying a UNIX object file? It's the opcode for an unconditional branch instruction
(relative to the program counter) on a PDP-11! If you're running on a PDP-11 or VAX in
compatibility mode, you can just start executing at the first word of the file, and the magic number
(located there) will branch you past the a.out header and into the first real executable instruction of the
program. The PDP-11 was the canonical UNIX machine at the time when a.out needed a magic
number. Under SVr4, executables are marked by the first byte of a file containing hex 7F followed by
the letters "ELF" at bytes 2, 3, and 4 of the file.
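
The two checks just described can be sketched as a small classifier over the leading bytes of a file. This is an illustration under stated assumptions, not how any particular system does it: the names exe_kind and classify_magic are made up for this example, and the a.out test assumes the 16-bit word 0407 is stored low byte first, as it would be on a PDP-11.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum exe_kind { EXE_UNKNOWN, EXE_ELF, EXE_AOUT_0407 };

/* Classify a file from its first len bytes, held in buf. */
static enum exe_kind classify_magic(const unsigned char *buf, size_t len)
{
    /* ELF: hex 7F, then the letters 'E', 'L', 'F'. */
    static const unsigned char elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (len >= 4 && memcmp(buf, elf_magic, sizeof elf_magic) == 0)
        return EXE_ELF;

    /* Old a.out: the 16-bit word 0407 (octal) at offset zero.
     * Assumption: little-endian storage, so the bytes are 0x07, 0x01. */
    if (len >= 2 && ((unsigned)buf[0] | ((unsigned)buf[1] << 8)) == 0407u)
        return EXE_AOUT_0407;

    return EXE_UNKNOWN;
}
```

In practice the buffer would be filled by reading the first few bytes of the file with fread before calling classify_magic.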


Segments


Object files and executables come in one of several different formats. On most SVr4 implementations
the format is called ELF (originally "Extensible Linker Format", now "Executable and Linking
Format"). On other systems, the executable format is COFF (Common Object-File Format). And on
BSD UNIX (rather like the Buddha having Buddha-nature), a.out files have a.out format. You can find
out more about the format used on a UNIX system by typing man a.out and reading the manpage.


All these different formats have the concept of segments in common. There will be lots more about
segments later, but as far as object files are concerned, they are simply areas within a binary file where
all the information of a particular type (e.g., symbol table entries) is kept. The term section is also
widely used; sections are the smallest unit of organization in an ELF file. A segment typically contains
several sections.


Don't confuse the concept of segment on UNIX with the concept of segment on the Intel x86
architecture.



  • A segment on UNIX is a section of related stuff in a binary.

  • A segment in the Intel x86 memory model is the result of a design in which (for compatibility
    reasons) the address space is not uniform, but is divided into 64-Kbyte ranges known as
    segments.


The topic of segments on the Intel x86 architecture really deserves a chapter of its own. [1] For the
remainder of this book, the term segment has the UNIX meaning unless otherwise stated.


[1] And it pretty near has one, too! See next chapter.
