THE Java™ Programming Language, Fourth Edition


Consult the release documentation for your implementation to see if any other character set encodings are
supported.


Character sets and their encoding mechanisms are represented by specific classes within the
java.nio.charset package:


Charset

A named mapping (such as US-ASCII or UTF-8) between sequences of
16-bit Unicode code units and sequences of bytes. This contains general
information on the sequence encoding, simple mechanisms for encoding and
decoding, and methods to create CharsetEncoder and
CharsetDecoder objects for richer abilities.

CharsetEncoder

An object that can transform a sequence of 16-bit Unicode code units into a
sequence of bytes in a specific character set. The encoder object also has
methods to describe the encoding.

CharsetDecoder

An object that can transform a sequence of bytes in a specific character set
into a sequence of 16-bit Unicode code units. The decoder object also has
methods to describe the decoding.
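
As a minimal sketch of how these three classes fit together, the following example encodes and decodes a string with the simple methods on Charset, then asks a CharsetEncoder whether a string can be represented at all. The class name CharsetRoundTrip and its helper methods are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CharsetRoundTrip {
    // Encodes text to bytes and decodes it back, using the simple
    // convenience methods on Charset. These methods replace unmappable
    // characters instead of reporting an error.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bytes = cs.encode(text);
        return cs.decode(bytes).toString();
    }

    // Uses a CharsetEncoder to ask whether every character of the text
    // can be represented in the named character set.
    static boolean canEncode(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an accented final letter
        System.out.println(roundTrip(text, "UTF-8"));
        System.out.println(canEncode(text, "US-ASCII"));
        System.out.println(canEncode(text, "UTF-8"));
    }
}
```

The round trip through UTF-8 preserves the accented character, whereas US-ASCII cannot represent it, so the encoder's canEncode method is one way to detect that before any data is lost.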

You can obtain a Charset via its own static forName method, though usually you will just specify the
character set name to some other method (such as the String constructor or an I/O operation) rather than
working with the Charset object directly. To test whether a given character set is supported, call the
forName method; if it throws an UnsupportedCharsetException, the character set is not supported.
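
The forName test can be wrapped in a small helper, sketched below. The class name CharsetProbe and the method isSupported are hypothetical. The sketch also catches IllegalCharsetNameException, which forName throws when the name itself is not syntactically legal, and treats that case as "not supported" too; that choice is an assumption made here for convenience.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetProbe {
    // Returns true if forName can produce a Charset for the given name.
    static boolean isSupported(String name) {
        try {
            Charset.forName(name);
            return true;
        } catch (UnsupportedCharsetException | IllegalCharsetNameException e) {
            // Either the name is legal but unknown to this implementation,
            // or the name is not a legal character set name at all.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSupported("UTF-8"));
        System.out.println(isSupported("no-such-charset-xyz"));
    }
}
```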


You can find the available character sets by using the static availableCharsets method, which returns
a SortedMap that maps the name of each known character set to its Charset instance. For example, to
print out the names of all the known character sets you can use:


    for (String name : Charset.availableCharsets().keySet())
        System.out.println(name);


Every instance of the Java virtual machine has a default character set that is determined during
virtual-machine startup and typically depends on the locale and encoding being used by the underlying
operating system. You can obtain the default Charset using the static defaultCharset method.
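
The sketch below shows one place where the default character set matters: methods such as String.getBytes with no argument use it implicitly. The class name DefaultCharsetDemo and its helper methods are hypothetical, and the name printed depends on the platform the code runs on.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Returns the canonical name of this virtual machine's default charset.
    static String defaultName() {
        return Charset.defaultCharset().name();
    }

    // Checks that getBytes() with no argument produces the same bytes as
    // getBytes called with the default charset passed explicitly.
    static boolean getBytesUsesDefault(String text) {
        byte[] implicit = text.getBytes();
        byte[] explicit = text.getBytes(Charset.defaultCharset());
        return Arrays.equals(implicit, explicit);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultName());
        System.out.println(getBytesUsesDefault("hello"));
    }
}
```

Because the default can differ from one machine to another, code that exchanges data with other systems is usually safer naming the character set explicitly.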


13.3. Regular Expression Matching


The package java.util.regex gives you a way to test whether a string matches a regular expression, which is
a general description of a category of strings. A regular expression describes a class of strings by using
wildcards that match or exclude groups of characters, markers that require matches in particular places, and
so on. The package uses a common kind of regular expression, quite similar to those used in the popular Perl
programming language, which itself evolved from those used in several Unix utilities.
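
As a small illustration of those ideas, the sketch below uses the Pattern and Matcher classes from java.util.regex. The class name RegexIntro, its helper methods, and the two patterns are hypothetical examples: the first combines the markers ^ and $ with the wildcard . to require a string that starts with J and ends with a, and the second uses a character class that excludes digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexIntro {
    // ^ and $ are markers for the start and end of the input;
    // .* is a wildcard matching any run of characters.
    private static final Pattern J_TO_A = Pattern.compile("^J.*a$");

    // [^0-9] matches any single character that is not a digit.
    private static final Pattern NO_DIGITS = Pattern.compile("[^0-9]*");

    static boolean startsWithJEndsWithA(String s) {
        Matcher m = J_TO_A.matcher(s);
        return m.find();
    }

    static boolean hasNoDigits(String s) {
        // matches() requires the whole string to fit the pattern.
        return NO_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(startsWithJEndsWithA("Java"));
        System.out.println(startsWithJEndsWithA("I like Java"));
        System.out.println(hasNoDigits("hello"));
        System.out.println(hasNoDigits("h3llo"));
    }
}
```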
