Unicode under Windows
Under Microsoft Windows, the data type wchar_t is used to represent a single
“wide” UTF-16 character (WCS), while the char type is used both for ANSI
strings and for multibyte strings (MBCS). What’s more, Windows permits
you to write code that is character set independent. To accomplish this, a
data type known as TCHAR is provided. TCHAR is a typedef to char when
building your application in ANSI mode and a typedef to wchar_t when
building your application in UTF-16 (WCS) mode. (For consistency, the
type WCHAR is also provided as a synonym for wchar_t.)
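To make the TCHAR idea concrete, here is a minimal, portable sketch of the mechanism. It is not the actual Windows definition: the names MyTChar, MY_TEXT, MyStrLen, and MY_WIDE_BUILD are invented for illustration so that the code compiles without the Windows headers. On Windows itself, the corresponding pieces are TCHAR, the TEXT() or _T() macros, and generic-text functions such as _tcslen(), all switched by the UNICODE and _UNICODE preprocessor symbols.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, TEXT() and _tcslen(); the names are
// invented for this sketch so it compiles without the Windows headers.
#ifdef MY_WIDE_BUILD
typedef wchar_t MyTChar;                 // wide (WCS) build
#define MY_TEXT(s) L##s                  // turns "abc" into L"abc"
inline std::size_t MyStrLen(const MyTChar* s) { return std::wcslen(s); }
#else
typedef char MyTChar;                    // ANSI build
#define MY_TEXT(s) s                     // leaves "abc" unchanged
inline std::size_t MyStrLen(const MyTChar* s) { return std::strlen(s); }
#endif

// Character-set-independent code: this compiles unchanged in either mode.
inline std::size_t PlayerNameLength()
{
    const MyTChar* name = MY_TEXT("Nathan");
    return MyStrLen(name);
}
```

Defining MY_WIDE_BUILD on the compiler command line flips every string literal and string function in one step, which is the point of the technique: the calling code never names char or wchar_t directly.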
Throughout the Windows API, a prefix or suffix of “w,” “wcs,” or “W”
indicates wide (UTF-16) characters; a prefix or suffix of “t,” “tcs,” or “T”
indicates the current character type (which might be ANSI or might be
UTF-16, depending on how your application was built); and no prefix or
suffix indicates plain old ANSI. STL uses a similar convention: for example,
std::string is STL’s ANSI string class, while std::wstring is its wide
character equivalent.
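As a small illustration of the naming convention, std::string and std::wstring are both instances of the same template, std::basic_string, differing only in their character type. The helper names below (CodeUnitCount, SameLengthInBothForms) are hypothetical and exist only for this sketch. Note that size() counts code units rather than user-visible characters, and that wchar_t is 16 bits wide under Windows but is commonly 32 bits on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code units (not necessarily user-visible characters) in either
// std::string or std::wstring, since both are std::basic_string<CharT>.
template <typename CharT>
std::size_t CodeUnitCount(const std::basic_string<CharT>& s)
{
    return s.size();
}

inline bool SameLengthInBothForms()
{
    const std::string  narrow = "engine";    // elements are char
    const std::wstring wide   = L"engine";   // elements are wchar_t
    return CodeUnitCount(narrow) == CodeUnitCount(wide);
}
```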
Pretty much every standard C library function that deals with strings has
equivalent WCS and MBCS versions under Windows. Unfortunately, the API
calls don’t use the terms UTF-8 and UTF-16, and the names of the functions
aren’t always 100% consistent. This all leads to some confusion among
programmers who aren’t in the know. (But you aren’t one of those programmers!)
Table 5.1 lists some examples.
Windows also provides functions for translating between ANSI character
strings, multibyte UTF-8 strings, and wide UTF-16 strings. For example,
wcstombs() converts a wide UTF-16 string into a multibyte string, using the
encoding of the current locale.
Complete documentation for these functions can be found on Microsoft’s
MSDN web site. Here’s a link to the documentation for strcmp() and its ilk,
from which you can quite easily navigate to the other related
string-manipulation functions using the tree view on the left-hand side of
the page, or via the search bar:
http://msdn2.microsoft.com/en-us/library/kk6xf663(VS.80).aspx.
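A conversion with wcstombs() might look roughly like the following sketch. The wrapper name WideToMultibyte is hypothetical, and the buffer-sizing strategy (one wide character can expand to at most MB_CUR_MAX bytes) is one reasonable choice rather than the only one. The output encoding depends on the current C locale, so in the default "C" locale only plain ASCII text is guaranteed to convert. Microsoft's compiler may also warn that wcstombs() is deprecated in favor of wcstombs_s().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string using
// the current C locale. Returns an empty string if any character cannot be
// represented (a real implementation would report the error separately).
inline std::string WideToMultibyte(const wchar_t* wide)
{
    // Each wide character expands to at most MB_CUR_MAX bytes; add one
    // byte for the terminating null.
    const std::size_t maxBytes = std::wcslen(wide) * MB_CUR_MAX + 1;
    std::vector<char> buffer(maxBytes);

    const std::size_t written = std::wcstombs(buffer.data(), wide, maxBytes);
    if (written == static_cast<std::size_t>(-1))
    {
        return std::string();   // conversion failed
    }
    return std::string(buffer.data(), written);
}
```

For explicit control over the target encoding on Windows, the Win32 function WideCharToMultiByte() accepts a code page argument instead of relying on the locale.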
ANSI        WCS         MBCS
strcmp()    wcscmp()    _mbscmp()
strcpy()    wcscpy()    _mbscpy()
strlen()    wcslen()    _mbstrlen()
Table 5.1. Variants of some common standard C library string functions for use with ANSI,
wide and multibyte character sets.
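The first row of Table 5.1 can be exercised with a short sketch like the one below. The wrapper names (NarrowEqual, WideEqual, MultibyteEqual) are hypothetical. strcmp() and wcscmp() are part of the standard C library, whereas _mbscmp() is specific to Microsoft's runtime and operates on unsigned char pointers, so it is guarded here and only compiled under Microsoft's compiler.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#ifdef _MSC_VER
#include <mbstring.h>   // _mbscmp() is Microsoft-specific
#endif

// ANSI variant: compares char strings.
inline bool NarrowEqual(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// WCS variant: compares wchar_t strings.
inline bool WideEqual(const wchar_t* a, const wchar_t* b)
{
    return std::wcscmp(a, b) == 0;
}

#ifdef _MSC_VER
// MBCS variant: compares multibyte strings in the current code page.
inline bool MultibyteEqual(const unsigned char* a, const unsigned char* b)
{
    return _mbscmp(a, b) == 0;
}
#endif
```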
5.4. Strings