THE Java™ Programming Language, Fourth Edition


The above does not work reliably because it can act as if there were two comments when there was only one. Consider a
non-comment line followed by a comment line. The input stream might look something like this:


token\n# This is a comment line\ntoken2


After the last token on the non-comment line is processed, the current input position is just before the line
separator that delimited the end of that token. When hasNext is invoked it looks past the line separator and
sees that there is something there, so it returns true, leaving the current input position where it was. When
hasNext(COMMENT) is invoked, it too ignores the line separator delimiter and sees a pattern on the next
line that matches the comment, so it also returns true, leaving the current input position where it was. When
nextLine is invoked its job is to advance the current position to the beginning of the next line and return the
input that was skipped. The current position is immediately before the line separator, so moving it after the
line separator is a very short move and, other than the line separator, no input is skipped. Consequently,
nextLine returns an empty string. In our example this is not a problem because we loop around again and
match the comment a second time, and this time we remove it properly. However, if your code assumed that
the comment itself was gone after nextLine was invoked, then that assumption would be incorrect. We can
fix this problem like so:


Scanner in = new Scanner(source);
Pattern COMMENT = Pattern.compile("#.*");
String comment;
// ...
while (in.hasNext()) {
    if (in.hasNext(COMMENT)) {
        comment = in.findWithinHorizon(COMMENT, 0);
        in.nextLine();
    }
    else {
        // process other tokens
    }
}


Now when hasNext(COMMENT) tells us that there is a comment ahead, we use
findWithinHorizon(COMMENT,0) to skip the line separator, find the actual comment, and return it.
We don't need to set any horizon because we know from hasNext that the comment is there. After
findWithinHorizon returns the comment, the current position is just before the line separator at the end
of the comment line, so we use nextLine to skip over that to the next line.
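
To see the difference in practice, the following is a minimal sketch that runs both loops over the example input and records what each one treats as a comment. The class name CommentSkipDemo, both method names, and the lists are illustrative assumptions. The first method is a reconstruction of the earlier loop as the text describes it (it calls nextLine directly), and the hasNextLine guard in the second method is an addition for this sketch, not part of the listing above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class CommentSkipDemo {
    static final Pattern COMMENT = Pattern.compile("#.*");

    /** Earlier approach (reconstructed): rely on nextLine alone to consume the comment. */
    static List<String> withNextLineOnly(String source) {
        List<String> comments = new ArrayList<>();
        Scanner in = new Scanner(source);
        while (in.hasNext()) {
            if (in.hasNext(COMMENT)) {
                // May return only the (empty) remainder of the previous line.
                comments.add(in.nextLine());
            } else {
                in.next(); // stand-in for "process other tokens"
            }
        }
        return comments;
    }

    /** Fixed approach: findWithinHorizon reads the comment, nextLine skips the separator. */
    static List<String> withFindWithinHorizon(String source) {
        List<String> comments = new ArrayList<>();
        Scanner in = new Scanner(source);
        while (in.hasNext()) {
            if (in.hasNext(COMMENT)) {
                comments.add(in.findWithinHorizon(COMMENT, 0));
                // Guard added for this sketch: a comment that ends the input
                // has no line separator left for nextLine to consume.
                if (in.hasNextLine()) {
                    in.nextLine();
                }
            } else {
                in.next(); // stand-in for "process other tokens"
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "token\n# This is a comment line\ntoken2";
        System.out.println("nextLine only:     " + withNextLineOnly(source));
        System.out.println("findWithinHorizon: " + withFindWithinHorizon(source));
    }
}
```

For the example input, the first loop records two entries, an empty string followed by the comment, while the second records the comment once.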


Another way to get the scanner to skip comments would be to make the comment pattern part of the delimiter
pattern. But that is not quite as straightforward as it might sound. We leave that approach as an exercise for
the reader.
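
As a starting point only, here is one possible sketch of the delimiter approach. It is not a complete answer to the exercise. The class name CommentDelimiterSketch and the particular delimiter pattern are assumptions made for illustration. The pattern treats any run of whitespace and # comments as a single delimiter, which exposes two of the complications: a # in the middle of a token also starts a comment, and the comment text is discarded, so it cannot be retrieved afterward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CommentDelimiterSketch {
    // Assumed delimiter: one or more whitespace characters or '#' comments.
    // "#.*" stops at the line separator because '.' does not match it by
    // default; the separator itself is then matched by "\\s".
    static final String DELIMITER = "(?:\\s|#.*)+";

    /** Returns the non-comment tokens of the source, in order. */
    static List<String> tokens(String source) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(source).useDelimiter(DELIMITER);
        while (in.hasNext()) {
            result.add(in.next());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("token\n# This is a comment line\ntoken2"));
    }
}
```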


So skipping comments was trivial with StreamTokenizer and quite involved with Scanner. In contrast,
it is quite simple to change the scanner's delimiter to parse a comma-separated-values file, but to do the
same with StreamTokenizer requires careful manipulation of the character classes to make the comma a
whitespace character and to stop space from being considered a whitespace character. Although the change is
relatively simple to state, the API makes it awkward to do, and it is conceptually bizarre.
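
The contrast can be sketched as follows, assuming a single line of comma-separated fields that contain only letters and spaces. The class name CsvFieldsSketch and both method names are illustrative. With StreamTokenizer, a space must first be made ordinary to clear its whitespace attribute, because wordChars only adds the word attribute and does not remove existing ones. Numeric fields would arrive as TT_NUMBER and are not handled in this sketch.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CsvFieldsSketch {
    /** Scanner: changing the delimiter is a single call. */
    static List<String> withScanner(String line) {
        List<String> fields = new ArrayList<>();
        Scanner in = new Scanner(line).useDelimiter(",");
        while (in.hasNext()) {
            fields.add(in.next());
        }
        return fields;
    }

    /** StreamTokenizer: the character classes have to be rearranged. */
    static List<String> withStreamTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.ordinaryChar(' ');         // clear the whitespace attribute on space
        st.wordChars(' ', ' ');       // then let space be part of a word
        st.whitespaceChars(',', ','); // comma now separates tokens
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    fields.add(st.sval);
                }
                // Other token types (numbers, quotes) are ignored here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "alpha,beta gamma,delta";
        System.out.println(withScanner(line));
        System.out.println(withStreamTokenizer(line));
    }
}
```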


StreamTokenizer is very good for working with free-format files such as that used in the attribute
reading example on page 533. It read input that consisted of names and values, separated by whitespace, with
an optional = character in between, and stored them into an Attr object. The names were simply words, and
values could be words or numbers, while the = character was an ordinary character. Pairing of names and
values was trivially done, as was detecting a misplaced = character. In contrast, Scanner has a lot of trouble
