While a human being can read these forms of data and understand what is meant, a
computer cannot.
Data standardization by textual ETL reads the raw text, recognizes that a piece of text
is a date, determines which date value the text represents, and converts that value into
a standard value. The standard value is then stored in the analytic database.
Fig. 10.1.13 shows how textual ETL reads raw text and converts date values into
standardized values.
Fig. 10.1.13 Converting dates into a standardized format.
As an example of the processing done by textual ETL against raw text, consider the
following raw text:
...she married on July 15, 2015 at a small church in Southern Colorado....
The database reference generated for the analytic database would look like the following:
Document name, byte, context—date value, value—20150715
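The conversion above can be sketched in a few lines of Python. This is only an illustration, not how textual ETL is actually implemented: the function name standardize_dates, the record field names, and the restriction to dates written as "Month D, YYYY" are all assumptions made for the example.

```python
import re
from datetime import date

# Assumption: only English month names in the form "July 15, 2015" are handled.
MONTH_NUMBERS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}
DATE_PATTERN = re.compile(
    r"\b(" + "|".join(MONTH_NUMBERS) + r") (\d{1,2}), (\d{4})\b"
)


def standardize_dates(document_name, text):
    """Return one record per recognized date, with the value as YYYYMMDD.

    Hypothetical helper for illustration; the record layout mirrors the
    database reference shown above (document name, byte, context, value).
    """
    records = []
    for match in DATE_PATTERN.finditer(text):
        month_name, day, year = match.groups()
        try:
            # date() rejects impossible dates such as "February 31, 2015".
            parsed = date(int(year), MONTH_NUMBERS[month_name], int(day))
        except ValueError:
            continue
        records.append({
            "document": document_name,
            # Byte offset of the date within the UTF-8 encoded text.
            "byte": len(text[:match.start()].encode("utf-8")),
            "context": "date value",
            "value": parsed.strftime("%Y%m%d"),
        })
    return records


raw_text = ("...she married on July 15, 2015 at a small church "
            "in Southern Colorado....")
records = standardize_dates("doc1", raw_text)
```

For the raw text in the example, the sketch yields a single record whose value is 20150715. A production system would need to recognize many more date formats than this one pattern covers.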
List Processing
Occasionally, text contains a list, and sometimes that list needs to be processed as a
list rather than as a sequential string of text.
Textual ETL can recognize and process a list if asked to do so.
Chapter 10.1: Nonrepetitive Data