Pro PHP: Patterns, Frameworks, Testing and More


CHAPTER 5 ■ WHAT'S NEW IN PHP 6
Unicode in PHP 6
The single biggest change in PHP 6 is the introduction of Unicode text-encoding support. Unicode
changes a lot in PHP because the length of a string, counted in characters, is no longer tied to
the number of bytes it occupies: many Unicode characters take two or more bytes. Most string
functions, such as strlen(), therefore had to be reworked to support multibyte character strings.
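To see the gap between bytes and characters on a PHP build without Unicode semantics, the following minimal sketch compares strlen(), which counts bytes there, with mb_strlen() from the mbstring extension, which counts characters. It assumes mbstring is enabled; the sample word and variable names are illustrative only.

```php
<?php
// Minimal sketch: bytes vs. characters in a UTF-8 string.
// Assumes Unicode semantics are off (as in PHP 5), so strlen() counts bytes,
// and that the mbstring extension is enabled for mb_strlen().

// "café", with the é written as its two UTF-8 bytes so the
// example does not depend on the encoding of this source file.
$word = "caf\xC3\xA9";

$bytes = strlen($word);             // number of bytes
$chars = mb_strlen($word, 'UTF-8'); // number of characters

echo "bytes: $bytes, characters: $chars\n";
```

Run under those assumptions, it prints bytes: 5, characters: 4, because only the é needs more than one byte.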


Unicode Semantics


A string in PHP 5 normally allocates one byte (8 bits) per character, but in PHP 6, strings can
use a 16-bit Unicode encoding. Strings in 8-bit encodings, or in any other encoding you may have
used, are now treated as binary strings, while ordinary strings are treated as Unicode. Which of
the two a plain string literal produces by default depends on a php.ini setting called unicode.semantics.
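The storage cost of each kind of encoding can be made concrete with mbstring. This sketch assumes the mbstring extension is available and does not use PHP 6's internal Unicode string type; it simply encodes the same four-character word as ISO-8859-1 (an 8-bit encoding), UTF-8, and UTF-16BE (a 16-bit encoding) and reports the byte length of each.

```php
<?php
// Sketch: the same text costs a different number of bytes per encoding.
// Assumes the mbstring extension; strlen() is used to count bytes,
// which holds when Unicode semantics are off.

$utf8 = "caf\xC3\xA9"; // "café" in UTF-8

$encodings = array('ISO-8859-1', 'UTF-8', 'UTF-16BE');
$sizes = array();

foreach ($encodings as $encoding) {
    // Convert from UTF-8 into the target encoding, then count its bytes.
    $encoded = mb_convert_encoding($utf8, $encoding, 'UTF-8');
    $sizes[$encoding] = strlen($encoded);
    echo $encoding, ': ', $sizes[$encoding], " bytes\n";
}
```

Each character fits in one byte in ISO-8859-1 (4 bytes total) and in two bytes in UTF-16BE (8 bytes total); UTF-8 falls in between at 5 bytes because only the é needs two.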
When unicode.semantics is set to on, normal string literals are Unicode. When it is set to
off, string literals are 8-bit binary strings. I recommend enabling Unicode semantics at this
time. To see whether Unicode semantics are on or off, execute this command:

> php -r "echo ini_get('unicode.semantics');"

1

If you see a 0 here instead, you will need to enable unicode.semantics in your php.ini file.
You can use the command php -i to locate your php.ini file. If no php.ini file exists at that
location, you can start from the sample file that comes with the PHP distribution: locate
php.ini-recommended in the source package and copy or rename it to the location reported by
php -i, such as /etc/php.ini or /usr/local/php6/lib/php.ini.
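The same check can be made from inside a script with ini_get(). In this sketch, the helper name unicode_semantics_status() is hypothetical; it treats a false return from ini_get() as an unknown directive (any PHP build that lacks the unicode.semantics setting) and otherwise interprets the usual on/off spellings.

```php
<?php
// Sketch: report whether unicode.semantics is on, off, or unknown.
// unicode_semantics_status() is a hypothetical helper, not a PHP function.
function unicode_semantics_status($value)
{
    // ini_get() returns false when the directive does not exist in this build.
    if ($value === false) {
        return 'unsupported';
    }
    // An enabled boolean directive is normally reported as "1";
    // "on" is accepted too in case a raw setting value is passed in.
    $value = strtolower(trim($value));
    return ($value === '1' || $value === 'on') ? 'on' : 'off';
}

echo unicode_semantics_status(ini_get('unicode.semantics')), "\n";
```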
Once Unicode semantics are enabled, you can use Unicode characters directly in your
PHP scripts, and functions like strlen() will count the characters in your strings rather than
the bytes they occupy. Listing 5-4 demonstrates a Unicode example using Canadian Aboriginal
Syllabics.

■ Tip Linux and Mac users will already have a compatible font, but Windows users can find one at
http://www.tiro.com. Without a proper font, the text in Listing 5-4 may appear as boxes or spaces.
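Splitting text into characters shows the same byte/character distinction for the syllabics mentioned above. The following is a hypothetical sketch, not the book's Listing 5-4: it assumes a PHP build without Unicode semantics (so str_split() works on bytes) and uses PCRE's /u modifier to split the string as UTF-8.

```php
<?php
// Sketch: split a UTF-8 string into characters instead of bytes.
// Assumes Unicode semantics are off (as in PHP 5), so str_split() works on
// bytes; the /u modifier tells PCRE to treat the subject as UTF-8.

// Canadian Aboriginal Syllabics A, I, O (U+140A, U+1403, U+1405),
// written as UTF-8 byte escapes so the source file encoding does not matter.
$text = "\xE1\x90\x8A\xE1\x90\x83\xE1\x90\x85";

$bytes = str_split($text);                                  // one-byte pieces
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY); // whole characters

printf("%d bytes, %d characters\n", count($bytes), count($chars));
```

Each of these syllabics occupies three bytes in UTF-8, so the script prints 9 bytes, 3 characters.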

McArthur_819-9C05.fm Page 44 Wednesday, February 27, 2008 8:38 AM
