THE Java™ Programming Language, Fourth Edition


CURRENCY_SYMBOL              OTHER_LETTER
DASH_PUNCTUATION             OTHER_NUMBER
DECIMAL_DIGIT_NUMBER         OTHER_PUNCTUATION
ENCLOSING_MARK               OTHER_SYMBOL
END_PUNCTUATION              PARAGRAPH_SEPARATOR
FINAL_QUOTE_PUNCTUATION      PRIVATE_USE
FORMAT                       SPACE_SEPARATOR
INITIAL_QUOTE_PUNCTUATION    START_PUNCTUATION
LETTER_NUMBER                SURROGATE
LINE_SEPARATOR               TITLECASE_LETTER
LOWERCASE_LETTER             UNASSIGNED
MATH_SYMBOL                  UPPERCASE_LETTER
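
The constants listed above are general-category values of the kind returned by Character.getType. As a minimal sketch of how they might be used, the following maps a few categories to readable labels. The class name CategoryDemo and the method describe are hypothetical names chosen for illustration, and only a handful of the categories are covered.

```java
class CategoryDemo {
    // Hypothetical helper: labels a few of the general categories.
    // Character.getType returns an int; the category constants are
    // bytes, which are valid case labels for an int switch.
    static String describe(char ch) {
        switch (Character.getType(ch)) {
            case Character.UPPERCASE_LETTER:     return "uppercase letter";
            case Character.LOWERCASE_LETTER:     return "lowercase letter";
            case Character.DECIMAL_DIGIT_NUMBER: return "decimal digit";
            case Character.CURRENCY_SYMBOL:      return "currency symbol";
            case Character.SPACE_SEPARATOR:      return "space separator";
            default:                             return "other";
        }
    }

    public static void main(String[] args) {
        char[] samples = { 'A', 'a', '7', '$', ' ', '+' };
        for (char ch : samples) {
            System.out.println("'" + ch + "' is: " + describe(ch));
        }
    }
}
```

Any category the switch does not list, such as MATH_SYMBOL for '+', falls through to the default label.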


Unicode is divided into blocks of related characters. The static nested class Character.Subset is used to
define subsets of the Unicode character set. The static nested class Character.UnicodeBlock extends
Subset to define the standard Unicode character blocks, which are available as static fields of
UnicodeBlock, such as GREEK, KATAKANA, TELUGU, and COMBINING_MARKS_FOR_SYMBOLS. The static
method UnicodeBlock.of returns the UnicodeBlock object representing the block that contains a
particular character, or null if the character is not in any block. For example, the code


    boolean isShape =
        (Character.UnicodeBlock.of(ch) ==
            Character.UnicodeBlock.GEOMETRIC_SHAPES);


tests to see if a character is in the GEOMETRIC_SHAPES block.
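
The same test can be wrapped in small helper methods. The following is a sketch only: the class name BlockDemo and the methods isShape and isGreek are hypothetical names introduced for illustration. Because each block is represented by a single UnicodeBlock object, comparing with == is sufficient, and a null result from of simply compares unequal.

```java
class BlockDemo {
    // Hypothetical helper: true if ch lies in the Geometric Shapes block.
    static boolean isShape(char ch) {
        return Character.UnicodeBlock.of(ch)
            == Character.UnicodeBlock.GEOMETRIC_SHAPES;
    }

    // Hypothetical helper: true if ch lies in the block that the
    // GREEK constant represents.
    static boolean isGreek(char ch) {
        return Character.UnicodeBlock.of(ch)
            == Character.UnicodeBlock.GREEK;
    }

    public static void main(String[] args) {
        char square = '\u25A0';  // BLACK SQUARE
        char alpha  = '\u03B1';  // GREEK SMALL LETTER ALPHA
        System.out.println(isShape(square));
        System.out.println(isGreek(alpha));
        System.out.println(isShape('A'));
    }
}
```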


Two Subset objects define the same Unicode subset only if they are the same object. Subset enforces
this by declaring equals and hashCode to be final and defining them to have the default Object
behavior, which compares object identity. If you define your own subsets, you should provide a method
analogous to of that returns a single Subset object for each distinct subset you define.
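
As a minimal sketch of that advice, the following defines a custom subset with one shared object per subset and a static of method to look it up. The class VowelSubset, its two constants, and its notion of a "vowel" are all hypothetical and exist only for illustration. The sketch relies on Subset having a protected constructor that takes the subset's name.

```java
class VowelSubset extends Character.Subset {
    // One shared object per subset, so identity comparison works.
    static final VowelSubset LOWER = new VowelSubset("LOWER_VOWELS");
    static final VowelSubset UPPER = new VowelSubset("UPPER_VOWELS");

    // Private constructor: no other instances can be created.
    private VowelSubset(String name) {
        super(name);
    }

    // Analogous to UnicodeBlock.of: returns the single object for
    // the subset containing ch, or null if ch is in neither subset.
    static VowelSubset of(char ch) {
        if ("aeiou".indexOf(ch) >= 0) return LOWER;
        if ("AEIOU".indexOf(ch) >= 0) return UPPER;
        return null;
    }

    public static void main(String[] args) {
        System.out.println(VowelSubset.of('e') == VowelSubset.LOWER);
        System.out.println(VowelSubset.of('x'));
    }
}
```

Keeping the constructor private means callers can only obtain the shared objects, so comparing results with == is reliable.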


8.5.1. Working with UTF-16


Working with sequences of characters, whether arrays of char, strings, or other types that implement
CharSequence (see Chapter 13), is complicated by the fact that supplementary characters need to be
encoded as a pair of char values. To assist with this, the Character class defines a range of methods that
help with encoding and decoding surrogate pairs and with accounting for their existence in a sequence of
character values:


public static int charCount(int codePoint)