similar in structure and even in context. Nonrepetitive records are records that show
little or no repetition from one record to the next.
But repetitive text is something entirely different. Repetitive text refers to text appearing
the same way or in a very similar way across more than one document. A simple example
of repetitive text is boilerplate contracts. In boilerplate contracts, a lawyer has taken a
basic contract and added a few words to it. The same contract appears over and over
again in a repetitive manner. Another example of repetitive text is the blood pressure
reading. Blood pressure is written as “bp 124/68.” The first number is the
systolic reading, and the second number is the diastolic reading. When one encounters
“bp 176/98,” one knows exactly what is meant by the text. The text is repetitive.
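Because the blood pressure notation always takes the same shape, it can be picked out of free text mechanically. The following is a minimal sketch in Python, not the method of any particular textual ETL product. The pattern, the function name find_blood_pressure, and the sample note are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: the literal "bp", optional whitespace, then two
# 2-3 digit numbers separated by a slash (for example "bp 124/68").
BP_PATTERN = re.compile(r"\bbp\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)


def find_blood_pressure(text):
    """Return a list of (first_number, second_number) tuples found in text."""
    return [(int(first), int(second)) for first, second in BP_PATTERN.findall(text)]


# Made-up sample note containing the two readings used in the text.
note = "Patient seen today, bp 124/68. Follow-up visit: bp 176/98, advised rest."
readings = find_blood_pressure(note)
print(readings)
```

The same pattern works on every document that uses this notation, which is what makes the text repetitive in the sense described here.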
Of course, you can use as many techniques and specifications as are applicable. You can
use taxonomies, inline contextualization, and custom formatting, all at once. Or you can
use only taxonomy processing or only inline contextualization. The data and what you
want to do with the data dictate how you will choose to do what is needed.
One of the issues is choosing names for variables. For example, when you create a custom
format, you choose a name for the variable. Suppose you wanted to pick up a telephone
number. You could use a specification of “999-999-9999.” You need to give the
variable that is created a meaningful name. The variable name becomes the context.
For example, for a telephone number, the name “variable001” would be a terrible name.
No one would know what you meant when they encountered “variable001.” Instead, a
name like “telephone_number001” is much more appropriate. When a person reads
“telephone_number001,” it is immediately obvious what is meant.
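The idea of a custom format paired with a meaningful variable name can be sketched in a few lines of Python. This is an illustration only. It assumes that each “9” in the specification stands for one digit and that every other character is a literal, and the helper names format_to_regex and extract are hypothetical, not part of any textual ETL tool.

```python
import re


def format_to_regex(spec):
    """Translate a custom format such as "999-999-9999" into a compiled regex.

    Assumption: each "9" stands for one digit; every other character is literal.
    """
    parts = [r"\d" if ch == "9" else re.escape(ch) for ch in spec]
    # The lookarounds stop the format from matching inside a longer digit run.
    return re.compile(r"(?<!\d)" + "".join(parts) + r"(?!\d)")


def extract(text, spec, variable_name):
    """Return (variable_name, value) pairs for every match of spec in text."""
    pattern = format_to_regex(spec)
    return [(variable_name, match.group(0)) for match in pattern.finditer(text)]


# Made-up sample sentence; the variable name supplies the context for the value.
text = "Call the office at 555-867-5309 before noon."
pairs = extract(text, "999-999-9999", "telephone_number001")
print(pairs)
```

Each extracted value travels with its variable name, so a reader of the output sees “telephone_number001” beside the number rather than an opaque label.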
The definition of a mapping is meant to be done in an iterative manner. It is HIGHLY
unlikely that you will create a mapping and that the first mapping you create becomes the
final mapping. It is MUCH MORE likely that you will create a mapping, run the mapping
against the document, then go back, and make adjustments to the mapping. Documents
are complex, and language is complex. There are plenty of nuances in language that
people take for granted. Therefore, it is unrealistic to think that you will create the
perfect mapping the first time you create one. It just doesn’t happen, even with the most
experienced people.
Textual ETL often has multiple ways to handle the same interpretation. In many cases,
the mapper will be able to accomplish the same results in more than one way. There is no
right way or wrong way to do something in textual ETL. You can choose whatever way
Chapter 10.2: Mapping