Recent Additions to Character for Unicode Code Point Support
Recently, major additions have been made to Character. Beginning with JDK 5, the Character
class has included support for Unicode characters whose values do not fit in 16 bits; such
values are handled as 32-bit int values. In the past, all Unicode characters could be held in
16 bits, which is the size of a char (and the size of the value encapsulated within a
Character), because those values ranged from 0 to FFFF. However, the Unicode character set
has been expanded, and more than 16 bits are now required: characters can range from 0 to
10FFFF.
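The two limits just described are exposed as constants on Character. The following minimal sketch prints them and counts the bits each one needs. The class name CodePointLimits and its helper bitsNeeded( ) are hypothetical, chosen only for illustration.

```java
// Hypothetical demo class (name chosen for illustration): compares the
// largest char value with the largest Unicode code point.
class CodePointLimits {
    // Hypothetical helper: number of bits needed to represent the non-negative value v.
    static int bitsNeeded(int v) {
        return 32 - Integer.numberOfLeadingZeros(v);
    }

    public static void main(String[] args) {
        int maxChar = Character.MAX_VALUE;           // FFFF, the top of the 16-bit range
        int maxCodePoint = Character.MAX_CODE_POINT; // 10FFFF, the top of the code point range

        System.out.println("Largest char value: " + Integer.toHexString(maxChar));
        System.out.println("Largest code point: " + Integer.toHexString(maxCodePoint));
        System.out.println("Bits for FFFF: " + bitsNeeded(maxChar));
        System.out.println("Bits for 10FFFF: " + bitsNeeded(maxCodePoint));
    }
}
```

Running it prints ffff and 10ffff, and shows that FFFF fits in 16 bits while 10FFFF needs 21. That is why a code point is carried in an int rather than a char.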
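Here are two important terms: code point and supplemental character. A code point is a
value in the range 0 to 10FFFF that identifies a character. Characters that have values
greater than FFFF are called supplemental characters. (The Java API documentation calls
them supplementary characters, as in the method name isSupplementaryCodePoint( ).)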
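To make the two terms concrete, here is a small sketch that classifies a few sample values using Character.isValidCodePoint( ) and Character.isSupplementaryCodePoint( ). The class name CodePointKinds, its describe( ) helper, and the sample values are hypothetical, picked only to cover each case.

```java
// Hypothetical demo class: classifies int values as non-code-points,
// code points that fit in a char, or supplemental characters.
class CodePointKinds {
    // Hypothetical helper: returns a short description of cp.
    static String describe(int cp) {
        if (!Character.isValidCodePoint(cp)) {
            return "not a code point";       // outside 0 to 10FFFF
        }
        if (Character.isSupplementaryCodePoint(cp)) {
            return "supplemental character"; // greater than FFFF
        }
        return "fits in a char";             // 0 to FFFF
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0xFFFF, 0x10000, 0x1F600, 0x10FFFF, 0x110000 };
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + ": " + describe(cp));
        }
    }
}
```

Every supplemental character is a code point, but not every code point is a supplemental character: values from 0 through FFFF are code points that still fit in a single char.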
The expansion of the Unicode character set caused a fundamental problem for Java. Because
a supplemental character has a value greater than a char can hold, some means of handling
supplemental characters was needed. Java addressed this problem in two ways. First, Java
uses two chars, called a surrogate pair, to represent a supplemental character. The first char
is called the high surrogate, and the second is called the low surrogate. New methods, such
as codePointAt( ), were provided to translate between surrogate pairs and code points.
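The following sketch shows a surrogate pair in action. It uses Character.toChars( ) to split a supplemental character into its high and low surrogates, then rebuilds the code point with Character.toCodePoint( ) and with codePointAt( ). The class name SurrogateDemo, its roundTrip( ) helper, and the sample value 1F600 are hypothetical choices for illustration.

```java
// Hypothetical demo class: splits one supplemental character into a
// surrogate pair and rebuilds the code point from the pair.
class SurrogateDemo {
    // Hypothetical helper: converts cp to chars and back; returns the rebuilt code point.
    static int roundTrip(int cp) {
        char[] units = Character.toChars(cp); // 1 char, or 2 for a supplemental character
        return Character.codePointAt(units, 0);
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // a supplemental character chosen for illustration

        char[] pair = Character.toChars(cp);
        char high = pair[0]; // high surrogate
        char low = pair[1];  // low surrogate

        System.out.println("chars used: " + pair.length);         // 2
        System.out.println("high: " + Integer.toHexString(high)); // d83d
        System.out.println("low: " + Integer.toHexString(low));   // de00
        System.out.println("rebuilt: "
            + Integer.toHexString(Character.toCodePoint(high, low)));      // 1f600
        System.out.println("roundTrip: " + Integer.toHexString(roundTrip(cp))); // 1f600
    }
}
```

Neither surrogate means anything by itself. Only the pair, read together, identifies the character.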
Second, Java overloaded several preexisting methods in the Character class. The
overloaded forms use int rather than char data. Because an int is large enough to hold any
code point as a single value, it can be used to store any character. For example, all of the
methods in Table 16-7 have overloaded forms that operate on int. Here is a sampling:
static boolean isDigit(int cp)
static boolean isLetter(int cp)
static int toLowerCase(int cp)
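Here is a short sketch that applies these int overloads to supplemental characters, which the char forms cannot accept at all. The sample code points are illustrative choices: U+10400 (DESERET CAPITAL LETTER LONG I, whose lowercase form is U+10428) and U+1D7CE (MATHEMATICAL BOLD DIGIT ZERO). The class name CodePointOverloads is hypothetical.

```java
// Hypothetical demo class: applies the int (code point) overloads of
// isLetter(), isDigit(), and toLowerCase() to supplemental characters.
class CodePointOverloads {
    public static void main(String[] args) {
        int deseretCapital = 0x10400; // DESERET CAPITAL LETTER LONG I
        int boldDigitZero = 0x1D7CE;  // MATHEMATICAL BOLD DIGIT ZERO

        // Both values exceed FFFF, so only the int forms can accept them.
        System.out.println(Character.isLetter(deseretCapital)); // true
        System.out.println(Character.isDigit(boldDigitZero));   // true
        int lower = Character.toLowerCase(deseretCapital);
        System.out.println(Integer.toHexString(lower));         // 10428

        // Passing only the high surrogate (a single char) loses the character.
        char high = Character.toChars(deseretCapital)[0];
        System.out.println(Character.isLetter(high));           // false
    }
}
```

The last line shows why the overloads matter. A lone surrogate char is not a letter, so code that tests one char at a time gives the wrong answer for supplemental characters.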
In addition to the methods overloaded to accept code points, Character adds methods
that provide additional support for code points. A sampling is shown in Table 16-8.
Chapter 16: Exploring java.lang 401
Method                                           Description
static int charCount(int cp)                     Returns 1 if cp can be represented by a single
                                                 char. It returns 2 if two chars are needed.
static int codePointAt(CharSequence chars,       Returns the code point at the location specified
                       int loc)                  by loc.
static int codePointAt(char chars[ ], int loc)   Returns the code point at the location specified
                                                 by loc.
static int codePointBefore(CharSequence chars,   Returns the code point at the location that
                           int loc)              precedes that specified by loc.
static int codePointBefore(char chars[ ],        Returns the code point at the location that
                           int loc)              precedes that specified by loc.
static boolean isHighSurrogate(char ch)          Returns true if ch contains a valid high surrogate
                                                 character.

TABLE 16-8 A Sampling of Methods That Provide Support for 32-Bit Unicode Code Points
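The methods in Table 16-8 combine naturally when you walk through text one character at a time. This sketch uses codePointAt( ) and charCount( ) to step through a String by code points. It also uses isHighSurrogate( ) and codePointBefore( ) to inspect a surrogate pair. The class name CodePointWalk, its countCodePoints( ) helper, and the sample string are hypothetical.

```java
// Hypothetical demo class: walks a String one code point at a time
// using methods from Table 16-8.
class CodePointWalk {
    // Hypothetical helper: counts code points by stepping with codePointAt() and charCount().
    static int countCodePoints(CharSequence s) {
        int count = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = Character.codePointAt(s, i);
            i += Character.charCount(cp); // advance 1 char, or 2 for a surrogate pair
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "A", then U+1F600 (stored as a surrogate pair), then "B": 4 chars, 3 code points.
        String s = "A" + new String(Character.toChars(0x1F600)) + "B";

        System.out.println("length(): " + s.length());              // 4
        System.out.println("code points: " + countCodePoints(s));   // 3

        // Index 1 holds the high surrogate; codePointAt() returns the whole pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));            // true
        System.out.println(Integer.toHexString(Character.codePointAt(s, 1))); // 1f600

        // codePointBefore() reads the pair that precedes index 3 (the "B").
        char[] chars = s.toCharArray();
        System.out.println(Integer.toHexString(Character.codePointBefore(chars, 3))); // 1f600
    }
}
```

Stepping by charCount( ) instead of by 1 keeps the loop from landing in the middle of a surrogate pair.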