Chapter 4: Data Formats
For applications written solely in English, this encoding will likely be adequate; however,
Ajax applications that use the XMLHttpRequest object do not send data as ISO-8859-1.
At the time of this book’s writing, XHR requests are always sent as UTF-8, no matter how you set
things or what your browser is configured to do. This may not matter to developers working
on English-only content: the ASCII range is encoded identically in ISO-8859-1 and UTF-8, so
plain English text will not be garbled. ISO-8859-1 is not a byte-level subset of UTF-8,
however; an accented character such as é occupies one byte in ISO-8859-1 but two in UTF-8.
For developers of content in other languages, particularly non-Western ones, this is a very
important issue that should be explored further.
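The byte-level difference is easy to check. The following is a minimal sketch, assuming a JavaScript environment that provides the standard TextEncoder; the helper name utf8Bytes is ours and not part of any API.

```javascript
// Hypothetical helper: return the UTF-8 bytes of a string as hex pairs.
function utf8Bytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always emits UTF-8
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

console.log(utf8Bytes("Ajax"));   // 41 6a 61 78 (ASCII: one byte per character)
console.log(utf8Bytes("\u00e9")); // c3 a9 (é is the single byte e9 in ISO-8859-1)
console.log(utf8Bytes("\u0633")); // d8 b3 (Arabic letter seen: two bytes)
```

Only the first string would look the same to a receiver that assumes ISO-8859-1.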
To demonstrate the previous point, we have set up a testing tool at
http://ajaxref.com/ch4/charsetexplorer.php. This example lets you choose the character set
declared in the Content-Type header, which is set using the setRequestHeader() method, and
provides a number of payloads in popular languages to explore. In case you are curious, each
foreign language phrase is an expression asking why other people can’t just speak that
particular language. Figure 4-1 illustrates what happens when Japanese- or Arabic-speaking
Ajax developers explicitly set a character set for their own language.
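As a rough illustration of the kind of request such a tool makes, the sketch below declares a character set in the Content-Type header before sending. The function name sendWithCharset, its parameters, and the "payload" field name are our own assumptions, not the code behind the testing tool.

```javascript
// Hypothetical helper: POST a payload while declaring a character set.
// `xhr` is any XMLHttpRequest-like object; url and charset come from the caller.
function sendWithCharset(xhr, url, payload, charset) {
  xhr.open("POST", url, true);
  xhr.setRequestHeader(
    "Content-Type",
    "application/x-www-form-urlencoded; charset=" + charset
  );
  // encodeURIComponent percent-encodes the UTF-8 bytes of the payload,
  // regardless of the charset named in the header above.
  xhr.send("payload=" + encodeURIComponent(payload));
}
```

In a browser this might be called as sendWithCharset(new XMLHttpRequest(), url, text, "ISO-8859-6"). The header then claims ISO-8859-6 while the body still carries UTF-8 bytes, which is the mismatch explored next.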
What happened in Figure 4-1 that caused the characters to be distorted? First, we won’t
focus on the correctness of what is shown in the pull-down, as that is not the issue; it is the
underlying encoding we care about. The problem arises because the XHR object always sends
UTF-8 characters, whether or not it is instructed to do so. In the example, we also specified
that the response should be returned using a specific Arabic or Japanese character set. When
run, the example shows that the received payload is garbled: the UTF-8 data was never
translated to the specified character set, so its bytes are interpreted under the wrong
encoding and the data is ruined.
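The same kind of garbling can be reproduced in a few lines. This sketch swaps in ISO-8859-1 for the Arabic and Japanese character sets used in the figure, because ISO-8859-1 can be decoded by hand (each byte value is the code point of one character); the helper name decodeAsLatin1 is our own.

```javascript
// Hypothetical helper: decode bytes as ISO-8859-1, where each byte value
// is the code point of exactly one character.
function decodeAsLatin1(bytes) {
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const sent = "caf\u00e9";                               // "café"
const wire = new TextEncoder().encode(sent);            // UTF-8 bytes: 63 61 66 c3 a9
const misread = decodeAsLatin1(wire);                   // wrong character set applied
const correct = new TextDecoder("utf-8").decode(wire);  // right character set applied

console.log(misread); // "cafÃ©": the two UTF-8 bytes of é appear as two characters
console.log(correct); // "café"
```

Nothing is lost on the wire; the damage comes entirely from reading UTF-8 bytes as if they belonged to another character set.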
The character garbling problem can be avoided by translating characters on the server.
As an example, in PHP there is a useful function, iconv, that converts characters from one
character set to another like so:
$outgoingvalue = iconv("UTF-8", "ISO-8859-6", $incomingvalue);
You can see the result of this correct exchange in Figure 4-2.
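To make the conversion step concrete, here is a rough JavaScript sketch of the sort of work a function like iconv performs for this pair of character sets. It is not how iconv is implemented. The helper toIso88596 is hypothetical, it assumes the ISO-8859-6 layout in which each Arabic character sits 0x0560 below its Unicode code point, and it handles only ASCII and Arabic characters, substituting "?" for everything else.

```javascript
// Hypothetical helper: convert a JavaScript string to ISO-8859-6 bytes.
// Simplification: only ASCII and Arabic characters are mapped.
function toIso88596(text) {
  const out = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const isArabic =
      cp === 0x060c || cp === 0x061b || cp === 0x061f ||
      (cp >= 0x0621 && cp <= 0x063a) ||
      (cp >= 0x0640 && cp <= 0x0652);
    if (cp < 0x80) out.push(cp);               // ASCII passes through unchanged
    else if (isArabic) out.push(cp - 0x0560);  // Arabic maps by a fixed offset
    else out.push(0x3f);                       // "?" for anything not handled here
  }
  return Uint8Array.from(out);
}

// Decode the incoming UTF-8 bytes first, then re-encode for the target set.
const incoming = new TextEncoder().encode("\u0633\u0644\u0627\u0645"); // UTF-8 from XHR
const outgoing = toIso88596(new TextDecoder("utf-8").decode(incoming));
console.log(Array.from(outgoing, (b) => b.toString(16)).join(" ")); // d3 e4 c7 e5
```

The eight UTF-8 bytes that arrived become four single-byte characters, which is what a client expecting ISO-8859-6 needs to display the text correctly.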
FIGURE 4-1 Example of character set confusion in Ajax