CURRENCY_SYMBOL OTHER_LETTER
DASH_PUNCTUATION OTHER_NUMBER
DECIMAL_DIGIT_NUMBER OTHER_PUNCTUATION
ENCLOSING_MARK OTHER_SYMBOL
END_PUNCTUATION PARAGRAPH_SEPARATOR
FINAL_QUOTE_PUNCTUATION PRIVATE_USE
FORMAT SPACE_SEPARATOR
INITIAL_QUOTE_PUNCTUATION START_PUNCTUATION
LETTER_NUMBER SURROGATE
LINE_SEPARATOR TITLECASE_LETTER
LOWERCASE_LETTER UNASSIGNED
MATH_SYMBOL UPPERCASE_LETTER
Unicode is divided into blocks of related characters. The static nested class Character.Subset is used to
define subsets of the Unicode character set. The static nested class Character.UnicodeBlock extends
Subset to define the standard Unicode character blocks, which are available as static fields of
UnicodeBlock, such as GREEK, KATAKANA, TELUGU, and COMBINING_MARKS_FOR_SYMBOLS. The static
method UnicodeBlock.of returns the UnicodeBlock object representing the block that contains a
particular character, or null if the character is not in any block. For example, the code
    boolean isShape =
        (Character.UnicodeBlock.of(ch) ==
            Character.UnicodeBlock.GEOMETRIC_SHAPES);
tests to see if a character is in the GEOMETRIC_SHAPES block.
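As a slightly fuller sketch of the same test, the following class wraps the comparison in a method and prints the block of a few sample characters. The class name BlockDemo, the method name isShape, and the sample characters are invented for this illustration and are not part of the Character API.

```java
public class BlockDemo {
    // Hypothetical helper: true if the code point lies in the
    // GEOMETRIC_SHAPES block. Comparing with == is sufficient because
    // each block is represented by a single UnicodeBlock object.
    static boolean isShape(int ch) {
        return Character.UnicodeBlock.of(ch) ==
            Character.UnicodeBlock.GEOMETRIC_SHAPES;
    }

    public static void main(String[] args) {
        // U+25A0 BLACK SQUARE lies in the Geometric Shapes block
        System.out.println(isShape('\u25A0'));
        // An ordinary Latin letter does not
        System.out.println(isShape('A'));
        // Printing a UnicodeBlock shows the name of its constant
        System.out.println(Character.UnicodeBlock.of('\u03B1'));
    }
}
```

Because of can return null, code that goes on to invoke methods on the result should check for null first.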
Two Subset objects define the same Unicode subset only if they are the same object. Subset enforces
this by declaring equals and hashCode to be final and defining them to have the default Object
behavior. If you define your own subsets, you should provide a method analogous to of that returns
a single Subset object for each distinct subset you define.
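One way to follow this advice is sketched below, under the assumption that a subset can be described by a simple membership test. The class VowelSubset, its constant VOWELS, and its of method are hypothetical names made up for this example; only Character.Subset and its protected constructor come from the library.

```java
// A hypothetical user-defined subset. The private constructor ensures
// that VOWELS is the only VowelSubset object that can ever exist.
final class VowelSubset extends Character.Subset {
    public static final VowelSubset VOWELS = new VowelSubset("VOWELS");

    private VowelSubset(String name) {
        super(name);    // Subset's constructor is protected
    }

    // Analogous to UnicodeBlock.of: returns the single object that
    // represents the subset containing ch, or null if there is none.
    public static VowelSubset of(char ch) {
        return "aeiouAEIOU".indexOf(ch) >= 0 ? VOWELS : null;
    }
}
```

Since equals is final in Subset and has the Object behavior, VowelSubset.of('a') == VowelSubset.VOWELS and VowelSubset.of('a').equals(VowelSubset.VOWELS) give the same answer, which is why handing out one shared object per subset matters.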
8.5.1. Working with UTF-16
Working with sequences of characters, whether arrays of char, strings, or other types that implement
CharSequence (see Chapter 13), is complicated by the fact that supplementary characters need to be
encoded as a pair of char values. To assist with this, the Character class defines a range of methods that
help with encoding and decoding surrogate pairs and with accounting for their existence in a sequence of
character values:
public static int charCount(int codePoint)