Hacking Google Maps and Google Earth (ExtremeTech)




Reading Delimited Files


Getting information out of a delimited file is generally straightforward, although there are a
few oddities that can complicate matters. In a tab-delimited file, fields are separated by tabs
and records by a carriage return and/or a linefeed character; other files use a different
delimiting character (colons, semicolons, and tildes [~] are common). To read the data, you
read each line and split it on the delimiter to extract the individual fields.

Listing 5-1 shows an example of this technique in action with a Perl script. Most languages
have some form of tokenizer (a function that breaks a string into pieces according to a
separator character or expression); here, I use Perl’s split function to extract each field
from the record.

Listing 5-1: Reading a Delimited File

open(DATA,$ARGV[0]) or die "Cannot open file: $!";

while(<DATA>)
{
    chomp;
    # Split the current line ($_) on tabs into its five fields
    my ($id,$ref,$fname,$lname,$country) = split /\t/,$_;
    print "ID: $id\nRef: $ref\nFirst: $fname\nLast: $lname\nCountry: $country\n";
}

close(DATA);
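
The oddities mentioned earlier include delimiters other than a tab and lines that end in a carriage return plus a linefeed. A minimal sketch of a more general splitter, assuming a hypothetical parse_record helper and a made-up sample record (neither is part of the original listing), might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on a caller-supplied delimiter.
sub parse_record {
    my ($line, $delim) = @_;
    $line =~ s/\r?\n\z//;    # strip an LF or CRLF line ending
    # \Q...\E treats the delimiter literally (safe for | or .);
    # the -1 limit keeps empty fields at the end of the record.
    return split /\Q$delim\E/, $line, -1;
}

# Made-up colon-delimited record with an empty final field
my @fields = parse_record("42:A7:Jane:Doe:\r\n", ':');
print join('|', @fields), "\n";    # prints 42|A7|Jane|Doe|
```

Without the -1 limit, split silently drops trailing empty fields, so a record with a blank last column would appear to have one field fewer than its neighbors.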

For comma-separated value (CSV) files, the process is slightly more complicated because
values in a CSV file are often also enclosed in double quotes so that any data containing a
comma is not misread during parsing. Although you could develop such a parser yourself,
it’s easier to use an existing module. In this case, I use the Text::CSV_XS module to do the
parsing. Listing 5-2 shows an example of this in action.

Listing 5-2: Reading a Comma-Separated Value File

use Text::CSV_XS;

open(DATA,$ARGV[0]) or die "Couldn't open file: $!";

my $csv = Text::CSV_XS->new();

while(<DATA>)
{
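
As a rough illustration of why quoted fields call for a real parser, the sketch below compares a plain split on commas with the parse and fields methods of Text::CSV_XS. The parse_csv_line helper and the sample line are hypothetical, and the code assumes the module is installed from CPAN; it is not a reconstruction of the book's listing.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 allows fields with non-ASCII data or embedded newlines
my $csv = Text::CSV_XS->new({ binary => 1 });

# Hypothetical helper: parse one CSV line into a list of fields.
sub parse_csv_line {
    my ($line) = @_;
    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_diag() . "\n";
    return $csv->fields();
}

my $line   = '7,"Smith, John",UK';     # made-up sample record
my @naive  = split /,/, $line;         # cuts the quoted name in two
my @fields = parse_csv_line($line);    # honors the double quotes

print scalar(@naive),  " fields from split\n";          # 4
print scalar(@fields), " fields from Text::CSV_XS\n";   # 3
```

The plain split yields four pieces because it treats the comma inside the quoted name as a separator, whereas the module returns three fields with the quotes removed.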