5,372.97
5 372,97
5.372,97
24.6.3. Text Boundaries
Parsing requires finding boundaries in text. The class BreakIterator provides a locale-sensitive tool for
locating such break points. It has four kinds of "get instance" methods that return specific types of
BreakIterator objects:
- getCharacterInstance returns an iterator that shows valid breaks in a string for individual characters (not necessarily a single char).
- getWordInstance returns an iterator that shows word breaks in a string.
- getLineInstance returns an iterator that shows where it is proper to break a line in a string, for purposes such as wrapping text.
- getSentenceInstance returns an iterator that shows where sentence breaks occur in a string.
The following code prints each break shown by a given BreakIterator:
// Uses java.text.BreakIterator
static void showBreaks(BreakIterator breaks, String str) {
    breaks.setText(str);
    int start = breaks.first();
    int end = breaks.next();
    while (end != BreakIterator.DONE) {
        System.out.println(str.substring(start, end));
        start = end;
        end = breaks.next();
    }
    // No trailing print is needed: the last boundary is always the
    // end of the text, so the loop has already printed every segment.
}
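Collecting the segments into a list, rather than printing them, makes it easy to compare the four kinds of iterator on the same text. The following is a minimal sketch: the class name BreakKinds and the helper segments are hypothetical names chosen for illustration, and the sample sentence and Locale.US are arbitrary choices.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakKinds {
    // Hypothetical helper: returns the text between each pair of
    // consecutive boundaries reported by the given iterator.
    public static List<String> segments(BreakIterator breaks, String text) {
        breaks.setText(text);
        List<String> result = new ArrayList<>();
        int start = breaks.first();
        int end = breaks.next();
        while (end != BreakIterator.DONE) {
            result.add(text.substring(start, end));
            start = end;
            end = breaks.next();
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello, world. How are you?";
        // Run the same text through three of the iterator kinds.
        System.out.println("words:     "
            + segments(BreakIterator.getWordInstance(Locale.US), text));
        System.out.println("sentences: "
            + segments(BreakIterator.getSentenceInstance(Locale.US), text));
        System.out.println("lines:     "
            + segments(BreakIterator.getLineInstance(Locale.US), text));
    }
}
```

The exact segments depend on the break rules for the chosen locale, but for every kind of iterator, joining the segments back together reproduces the original string, because consecutive boundaries cover the text with no gaps.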
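A BreakIterator is a different style of iterator from the java.util.Iterator objects you have seen. Rather than returning elements, it returns the int offsets of boundaries, and it provides methods such as first, last, next, previous, following, and preceding for moving forward and backward within a string to find different break positions. The stepping methods return BreakIterator.DONE when there are no more boundaries in the requested direction.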
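As a sketch of backward and random-access navigation, the following hypothetical helpers (in a class named BackwardBreaks, chosen here for illustration) walk the boundaries from last() toward the start with previous(), and combine following() with previous() to find the segment that contains a given offset.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BackwardBreaks {
    // Walks boundaries from the end of the text toward the start,
    // collecting segments in reverse order.
    public static List<String> segmentsReversed(BreakIterator breaks, String text) {
        breaks.setText(text);
        List<String> result = new ArrayList<>();
        int end = breaks.last();
        int start = breaks.previous();
        while (start != BreakIterator.DONE) {
            result.add(text.substring(start, end));
            end = start;
            start = breaks.previous();
        }
        return result;
    }

    // Returns the segment containing the char at offset.
    // Assumes 0 <= offset < text.length(); following() returns DONE
    // when given the last offset of the text.
    public static String segmentAt(BreakIterator breaks, String text, int offset) {
        breaks.setText(text);
        int end = breaks.following(offset); // first boundary after offset
        int start = breaks.previous();      // boundary at or before offset
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Hello, world. How are you?";
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        System.out.println(segmentsReversed(words, text));
        System.out.println(segmentAt(words, text, 8)); // offset 8 is inside "world"
    }
}
```

Because setText resets the iterator, one BreakIterator instance can be reused for several passes over the same or different strings.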
You should always use these boundary classes when breaking up text, because the issues involved are subtle and vary widely from language to language. For example, the logical characters used in these classes are not necessarily equivalent to a single char. Unicode characters can be combined, so it can take more than one 16-bit Unicode value to constitute a logical character. And word breaks do not necessarily occur at spaces; some languages do not even use spaces.
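To see that a logical character can span more than one char, the hypothetical countCharacters method below counts character boundaries in a string that contains an e followed by a combining acute accent (U+0301), plus a supplementary character (U+1D11E, MUSICAL SYMBOL G CLEF) stored as a surrogate pair. It is a sketch assuming the JDK's character break rules, which keep combining sequences and surrogate pairs together; the class name LogicalChars is also illustrative.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class LogicalChars {
    // Counts logical characters by stepping through character boundaries.
    public static int countCharacters(String text) {
        BreakIterator chars = BreakIterator.getCharacterInstance(Locale.US);
        chars.setText(text);
        int count = 0;
        chars.first();
        while (chars.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "cafe" + COMBINING ACUTE ACCENT, then U+1D11E as a surrogate pair
        String text = "cafe\u0301\uD834\uDD1E";
        System.out.println("char values: " + text.length());
        System.out.println("code points: " + text.codePointCount(0, text.length()));
        System.out.println("characters:  " + countCharacters(text));
    }
}
```

For this string, length() counts seven char values and codePointCount counts six code points, while the character iterator should report five logical characters: c, a, f, the accented e, and the clef.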
Never speak more clearly than you think
Jeremy Bernstein
Chapter 25. Standard Packages
No unmet needs exist, and current unmet needs that are being met will continue to be met.
Transportation Commission on Unmet Needs, California