Web Design in a Nutshell, eMatter Edition
CHAPTER 27
Internationalization
If the Web is to reach a truly worldwide audience, it needs to be able to support
the display of all the languages of the world, with all their unique alphabets and
symbols, directionality, and specialized punctuation. This poses a big challenge to
HTML constructs as we know them. However, according to the W3C, “energetic
efforts” are being made toward this complicated goal.
The W3C’s efforts for internationalization (referred to as “i18n”—an i, then 18
letters, then an n) address two primary issues. First is the handling of alternative
character sets that take into account all the writing systems of the world. Second
is how to specify languages and their unique presentation requirements within an
HTML document. Many of the solutions that internationalization experts proposed
in a document called RFC 2070 were incorporated into the current HTML 4.0
Specification.
This chapter addresses both key issues for internationalization, as well as the new
character set and language features in HTML 4.0.
Character Sets
The first challenge in internationalization is dealing with the staggering number of
unique character shapes (called “glyphs”) that occur in all the writing systems of
the world. This includes not only alphabets, but all ideographs (characters that
indicate a whole word or concept) for languages such as Chinese, Japanese, and
Korean.
8-Bit Encoded Character Sets
Character encodings (or character sets) are organizations of characters—units of a
written language system—in which each character is assigned a specific number.
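As a rough illustration of "each character is assigned a specific number," the short Python sketch below looks up the number that an 8-bit encoding gives to a character. The helper name code_number is hypothetical, and the choice of Latin-1 (ISO 8859-1) and the older PC code page 437 as sample encodings is an assumption made only for this example.

```python
# Illustrative sketch only: look up the number an 8-bit encoding assigns
# to a character. The helper name `code_number` is hypothetical, and the
# encodings used (Latin-1 and code page 437) are arbitrary examples.

def code_number(char: str, encoding: str = "latin-1") -> int:
    """Return the single-byte value that `encoding` assigns to `char`."""
    encoded = char.encode(encoding)  # raises UnicodeEncodeError if unsupported
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single byte in {encoding}")
    return encoded[0]


print(code_number("A"))           # 65 in Latin-1
print(code_number("é"))           # 233 in Latin-1
print(code_number("é", "cp437"))  # 130 in code page 437
```

The same character can receive a different number in a different encoding, which is why a browser needs to know which encoding a document uses before it can display the text correctly.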
Each character may be associated with a number of different glyphs; for instance,
the “close quote” character may be displayed using a ” or » glyph, depending on