Java The Complete Reference, Seventh Edition

Recent Additions to Character for Unicode Code Point Support

Recently, major additions have been made to Character. Beginning with JDK 5, the Character
class has included support for 32-bit Unicode characters. In the past, all Unicode characters
could be held by 16 bits, which is the size of a char (and the size of the value encapsulated
within a Character), because those values ranged from 0 to FFFF. However, the Unicode
character set has been expanded, and more than 16 bits are required. Characters can now range
from 0 to 10FFFF.
Here are two important terms: code point and supplemental character. A code point is a
character in the range 0 to 10FFFF. Characters that have values greater than FFFF are
called supplemental characters.
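The two ranges just described can be checked directly against constants in Character. The following is a minimal sketch, not part of the original text; the class name CodePointRange and the helper isSupplemental( ) are hypothetical names chosen for illustration, while MAX_VALUE, MAX_CODE_POINT, and isSupplementaryCodePoint( ) are standard members of Character.

```java
class CodePointRange {
    // Hypothetical helper: true when cp is a supplemental character,
    // that is, greater than 0xFFFF and no greater than 0x10FFFF.
    static boolean isSupplemental(int cp) {
        return Character.isSupplementaryCodePoint(cp);
    }

    public static void main(String[] args) {
        // A char is 16 bits, so its largest value is FFFF.
        System.out.printf("Largest char value: %X%n", (int) Character.MAX_VALUE);
        // Code points extend to 10FFFF.
        System.out.printf("Largest code point: %X%n", Character.MAX_CODE_POINT);

        System.out.println(isSupplemental('A'));      // false
        System.out.println(isSupplemental(0x1D11E));  // true
    }
}
```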
The expansion of the Unicode character set caused a fundamental problem for Java. Because
a supplemental character has a value greater than a char can hold, some means of handling
the supplemental characters was needed. Java addressed this problem in two ways. First,
Java uses two chars to represent a supplemental character. The first char is called the high
surrogate, and the second is called the low surrogate. New methods, such as codePointAt( ),
were provided to translate between code points and supplemental characters.
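The surrogate scheme can be seen by encoding one supplemental character and reading it back. This is a minimal sketch under stated assumptions: the class name SurrogateDemo and the helpers encode( ) and decodeFirst( ) are hypothetical, and the code point 1D11E (a musical symbol) is simply a convenient value greater than FFFF. The calls toChars( ), isHighSurrogate( ), isLowSurrogate( ), and codePointAt( ) are standard Character methods.

```java
class SurrogateDemo {
    // Hypothetical helper: encodes a code point as one or two chars.
    static char[] encode(int cp) {
        return Character.toChars(cp);
    }

    // Hypothetical helper: reads the code point that starts at index 0.
    static int decodeFirst(CharSequence s) {
        return Character.codePointAt(s, 0);
    }

    public static void main(String[] args) {
        int cp = 0x1D11E;                 // a supplemental character
        char[] units = encode(cp);

        System.out.println(units.length);                          // 2
        System.out.printf("%X %X%n", (int) units[0], (int) units[1]); // D834 DD1E
        System.out.println(Character.isHighSurrogate(units[0]));   // true
        System.out.println(Character.isLowSurrogate(units[1]));    // true

        // The String holds two chars, but they decode to a single code point.
        String s = new String(units);
        System.out.println(s.length());                            // 2
        System.out.printf("%X%n", decodeFirst(s));                 // 1D11E
    }
}
```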
Secondly, Java overloaded several preexisting methods in the Character class. The
overloaded forms use int rather than char data. Because an int is large enough to hold any
character as a single value, it can be used to store any character. For example, all of the methods
in Table 16-7 have overloaded forms that operate on int. Here is a sampling:

static boolean isDigit(int cp)
static boolean isLetter(int cp)
static int toLowerCase(int cp)
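To illustrate why the int forms matter, the sketch below passes supplemental characters to the three methods just listed. It is an illustration only, not part of the original text: the class name IntOverloadDemo and the helper lower( ) are hypothetical, and the sample code points (10400, a capital letter in the Deseret alphabet, and 1D7CE, a mathematical bold digit zero) are assumptions chosen because they lie above FFFF.

```java
class IntOverloadDemo {
    // Hypothetical helper: lowercases a code point with the int overload.
    static int lower(int cp) {
        return Character.toLowerCase(cp);
    }

    public static void main(String[] args) {
        int capital = 0x10400;   // supplemental capital letter
        int boldZero = 0x1D7CE;  // supplemental decimal digit

        // The int overloads see the whole code point.
        System.out.println(Character.isLetter(capital));   // true
        System.out.println(Character.isDigit(boldZero));   // true
        System.out.printf("%X%n", lower(capital));         // 10428

        // The char overload sees only half of the surrogate pair,
        // and a lone surrogate is not a letter.
        char high = Character.toChars(capital)[0];
        System.out.println(Character.isLetter(high));      // false
    }
}
```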

In addition to the methods overloaded to accept code points, Character adds methods
that provide additional support for code points. A sampling is shown in Table 16-8.


Method
    Description

static int charCount(int cp)
    Returns 1 if cp can be represented by a single char. It returns 2 if two chars are needed.

static int codePointAt(CharSequence chars, int loc)
    Returns the code point at the location specified by loc.

static int codePointAt(char chars[ ], int loc)
    Returns the code point at the location specified by loc.

static int codePointBefore(CharSequence chars, int loc)
    Returns the code point at the location that precedes that specified by loc.

static int codePointBefore(char chars[ ], int loc)
    Returns the code point at the location that precedes that specified by loc.

static boolean isHighSurrogate(char ch)
    Returns true if ch contains a valid high surrogate character.

TABLE 16-8 A Sampling of Methods That Provide Support for 32-Bit Unicode Code Points
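
Several of the methods in Table 16-8 are commonly used together to walk through text one code point at a time. The following is a minimal sketch of that idea, not a definitive implementation; the class name CodePointWalk and the helper countCodePoints( ) are hypothetical, and the sample string is an assumption made up for the demonstration.

```java
class CodePointWalk {
    // Hypothetical helper: counts code points by reading one at the
    // current index and advancing by the number of chars it occupies.
    static int countCodePoints(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = Character.codePointAt(text, i);
            i += Character.charCount(cp);   // 1 or 2
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "A", then one supplemental character (two chars), then "B".
        String text = "A" + new String(Character.toChars(0x1D11E)) + "B";

        System.out.println(text.length());            // 4 chars
        System.out.println(countCodePoints(text));    // 3 code points

        // Index 3 is "B"; the code point before it is the surrogate pair.
        System.out.printf("%X%n", Character.codePointBefore(text, 3)); // 1D11E
    }
}
```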
