That is why context makes up 90% of the work that textual ETL does to read and prepare
text for inclusion in a database.
Although context is so difficult to identify and manage, it is MANDATORY
that context be included with EVERY word put into a database. If a word were put
into a database by itself, the word would be naked. A word without context would be lost
and almost useless for analysis.
Textual ETL, then, is the technology that allows text to be read and meaningfully placed
into a database. Textual ETL ALWAYS, in every case, considers both the word and its
context.
Fig. 17.1.5 shows textual ETL.
Fig. 17.1.5 Textual ETL.
Textual ETL takes raw text, taxonomies, and other sources as input and determines which
text is important and how that text is to be processed. The output is a standard database.
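The flow described above (raw text and a taxonomy in, database rows out, each word stored with its context) can be illustrated with a minimal sketch. This is not how any textual ETL product is implemented. The taxonomy entries, the table name word_context, its columns, and the function textual_etl are all hypothetical, chosen only to show a word never being stored without its context.

```python
import re
import sqlite3

# Hypothetical taxonomy: maps a word to the context (category) it belongs to.
TAXONOMY = {
    "honda": "car",
    "porsche": "car",
    "oak": "tree",
    "elm": "tree",
}


def textual_etl(doc_name, raw_text, taxonomy, conn):
    """Store each taxonomy word found in raw_text together with its context.

    Assumed schema (illustrative only): document, character offset, word, context.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_context ("
        "document TEXT, char_offset INTEGER, word TEXT, context TEXT)"
    )
    rows = []
    for match in re.finditer(r"[A-Za-z]+", raw_text):
        word = match.group().lower()
        context = taxonomy.get(word)
        if context is None:
            # Not in the taxonomy: this sketch treats the word as unimportant.
            continue
        # The word is never written alone; its context travels with it.
        rows.append((doc_name, match.start(), word, context))
    conn.executemany("INSERT INTO word_context VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return rows
```

Calling textual_etl("note1", "She parked the Honda under an oak.", TAXONOMY, sqlite3.connect(":memory:")) would load two rows, one for "honda" with the context "car" and one for "oak" with the context "tree". A real taxonomy and real context resolution are far richer than a dictionary lookup, which is where most of the work lies.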