dealing with such a flexible format. If you add the = character to the delimiter pattern, it isn't too hard to simply treat
every two words as a name and value pair. However, because delimiters are ignored, you can't detect
misplaced = characters. If you don't make the = character a delimiter, then you have problems when there is
no whitespace between the = and the name or value.
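As a rough illustration of the first approach, the following sketch (the class and method names are hypothetical) treats whitespace and = alike as delimiters and reads every two tokens as a pair. It shows why misplaced or missing = characters go unnoticed: the malformed fragments "b = = 2" and "c 3" parse exactly like "a=1".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterPairs {
    // Whitespace and '=' are both delimiters, so every two tokens
    // are taken as a name/value pair.
    static List<String[]> readPairs(String input) {
        Scanner in = new Scanner(input);
        in.useDelimiter("[\\s=]+");
        List<String[]> pairs = new ArrayList<>();
        while (in.hasNext()) {
            String name = in.next();
            // A trailing name with no value makes next() throw
            // NoSuchElementException here.
            String value = in.next();
            pairs.add(new String[] { name, value });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "b = = 2" and "c 3" are malformed but parse without complaint:
        // prints a -> 1, b -> 2, c -> 3 on separate lines
        for (String[] p : readPairs("a=1 b = = 2 c 3"))
            System.out.println(p[0] + " -> " + p[1]);
    }
}
```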
If you wanted to store Attr objects and read them back with a Scanner, you could have more flexibility
with the values than if you used StreamTokenizer, but the file would be more restricted in format. For
example, here is a pair of methods to print and scan attributes:
import java.io.*;
import java.util.*;
import java.util.regex.*;

public static void printAttrs(Writer dest, Attr[] attrs) {
    PrintWriter out = new PrintWriter(dest);
    out.printf("%d attrs%n", attrs.length); // count line first
    for (int i = 0; i < attrs.length; i++) {
        Attr attr = attrs[i];
        // one attribute per line, in the form name=value
        out.printf("%s=%s%n",
            attr.getName(), attr.getValue());
    }
    out.flush();
}
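public static Attr[] scanAttrs(Reader source) {
    Scanner in = new Scanner(source);
    int count = in.nextInt();
    in.nextLine(); // skip rest of line
    Attr[] attrs = new Attr[count];
    // group 1: the name, reluctant, so it stops at the first =
    // group 2: the value, everything after that = to end of line
    Pattern attrPat =
        Pattern.compile("(.*?)=(.*)$", Pattern.MULTILINE);
    for (int i = 0; i < count; i++) {
        if (in.findInLine(attrPat) == null)
            throw new InputMismatchException("expected name=value");
        MatchResult m = in.match();
        attrs[i] = new Attr(m.group(1), m.group(2));
        // findInLine stops just before the line separator; consume
        // it so the next search starts on the following line
        if (in.hasNextLine())
            in.nextLine();
    }
    return attrs;
}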
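To try the pair end to end, here is a self-contained sketch that restates both methods and writes attributes to a StringWriter, then scans them back. The nested Attr class is a hypothetical minimal stand-in providing only the constructor and the two getters the methods use; it is not the full class. The line-separator handling after each findInLine is made explicit, since findInLine does not look past the end of the current line.

```java
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.InputMismatchException;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class AttrRoundTrip {
    // Hypothetical minimal stand-in for Attr: name, value, two getters.
    static class Attr {
        private final String name;
        private final Object value;
        Attr(String name, Object value) { this.name = name; this.value = value; }
        String getName() { return name; }
        Object getValue() { return value; }
    }

    static void printAttrs(Writer dest, Attr[] attrs) {
        PrintWriter out = new PrintWriter(dest);
        out.printf("%d attrs%n", attrs.length);
        for (Attr attr : attrs)
            out.printf("%s=%s%n", attr.getName(), attr.getValue());
        out.flush();
    }

    static Attr[] scanAttrs(Reader source) {
        Scanner in = new Scanner(source);
        int count = in.nextInt();
        in.nextLine(); // skip " attrs" and its line separator
        Attr[] attrs = new Attr[count];
        Pattern attrPat = Pattern.compile("(.*?)=(.*)$", Pattern.MULTILINE);
        for (int i = 0; i < count; i++) {
            if (in.findInLine(attrPat) == null)
                throw new InputMismatchException("expected name=value");
            MatchResult m = in.match();
            attrs[i] = new Attr(m.group(1), m.group(2));
            if (in.hasNextLine())
                in.nextLine(); // step past this line's separator
        }
        return attrs;
    }

    public static void main(String[] args) {
        Attr[] original = {
            new Attr("color", "dark blue"),  // value containing a space
            new Attr("equation", "e=mc^2")   // value containing '='
        };
        StringWriter buf = new StringWriter();
        printAttrs(buf, original);
        Attr[] copy = scanAttrs(new StringReader(buf.toString()));
        // prints "color | dark blue" then "equation | e=mc^2"
        for (Attr a : copy)
            System.out.println(a.getName() + " | " + a.getValue());
    }
}
```

Values with spaces or = characters survive the round trip; a name containing = or a value containing a line break would not, which is the format restriction mentioned above.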
The printAttrs method uses printf to print each attribute on one line, using a = character to separate the
name from the value. The scanAttrs method reads such a file back in. It uses a combination of
stream-based and line-based Scanner calls. It first reads back the count as an int and then consumes the
rest of the line (the "attrs" from the printf is just for human readability). It then loops, reading one name/value
line per attribute. The pattern used has two capture groups: the first gets the attribute name (using a reluctant,
or non-greedy, quantifier to match the fewest possible characters before the first =), and the second gets the
value, which is everything after that = up to the end of the line. These groups are pulled out to create the
actual Attr object.
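The effect of the reluctant quantifier is easiest to see on a value that itself contains an = character. This small sketch (class and field names are illustrative) applies the same pattern, and a greedy variant for comparison, to one line using a plain Matcher rather than a Scanner.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttrPatternDemo {
    // Same shape as attrPat: reluctant name group, greedy value group
    static final Pattern RELUCTANT =
        Pattern.compile("(.*?)=(.*)$", Pattern.MULTILINE);
    // Greedy name group, for comparison only
    static final Pattern GREEDY =
        Pattern.compile("(.*)=(.*)$", Pattern.MULTILINE);

    // Returns {name, value}, or null if the line has no '='
    static String[] split(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.find())
            return null;
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "formula=a=b+c";
        String[] r = split(RELUCTANT, line);
        String[] g = split(GREEDY, line);
        System.out.println(r[0] + " | " + r[1]); // formula | a=b+c
        System.out.println(g[0] + " | " + g[1]); // formula=a | b+c
    }
}
```

With the greedy version the name would swallow every = but the last, so the reluctant form is what lets values contain = characters.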
It is no accident that these two methods are paired. Although a Scanner can read many formats of data, it is
best used for data formatted in relatively straightforward ways. Stream mode is useful for reading user input
as well, but when you use Scanner for data you should be able to visualize (if not actually write) the
printf usage that would generate that data. Scanner is extremely powerful, but that power can be difficult
to harness.
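One way to apply that advice is to write the reader so it mirrors the printf format one conversion at a time. The sketch below assumes a hypothetical record layout (a one-word name, an int count, and a price) and pins both sides to Locale.US, because printf and Scanner each apply a locale to numbers and the two must agree. A name containing whitespace would break this simple pairing.

```java
import java.util.Locale;
import java.util.Scanner;

public class PairedFormat {
    // Hypothetical layout: one-word name, int count, price with 2 decimals
    static String write(String name, int count, double price) {
        return String.format(Locale.US, "%s %d %.2f%n", name, count, price);
    }

    // Mirrors the format above: %s -> next(), %d -> nextInt(),
    // %.2f -> nextDouble()
    static Object[] read(String data) {
        Scanner in = new Scanner(data);
        in.useLocale(Locale.US); // must match the locale used by format
        String name = in.next();
        int count = in.nextInt();
        double price = in.nextDouble();
        return new Object[] { name, count, price };
    }

    public static void main(String[] args) {
        Object[] fields = read(write("widget", 12, 3.5));
        System.out.println(fields[0] + " " + fields[1] + " " + fields[2]);
    }
}
```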
Exercise 22.10: Write a method to tokenize input that ignores comments, using the comment pattern as part of
the scanner's delimiter.
Exercise 22.11: Write a version of readCSV that uses a StreamTokenizer rather than a Scanner.
Exercise 22.12: Write a version of the attribute reading method from page 533 that uses a Scanner. For this