



Fig. 4.6.8 The two main processing components of textual ETL.

Preprocessing a Document


Occasionally, the text of a document cannot be processed in the standard fashion by
textual disambiguation. In these circumstances, the text must first be passed through a
preprocessor, where it is edited until textual disambiguation can process it in the
normal manner.


As a rule, you don’t want to preprocess text unless you absolutely have to. Preprocessing
automatically doubles (or more than doubles!) the machine cycles required to process the
text, because the text is read once by the preprocessor and then again by textual
disambiguation.
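A minimal Python sketch, under stated assumptions, of where a preprocessor sits and why it adds machine cycles. Everything here is hypothetical: `preprocess`, `disambiguate`, `run`, and `PassCounter` are illustrative names; the two edits the preprocessor makes (dropping control characters and rejoining words hyphenated across line breaks) are assumed examples of non-standard text; and `disambiguate` is a one-pass tokenizer standing in for textual disambiguation, not the real process. Cost is approximated as the number of characters visited by each full pass, and the decision whether a document needs preprocessing is left to the caller.

```python
import re
from dataclasses import dataclass

# Hypothetical definition of "non-standard" text: ASCII control characters
# (excluding tab, newline, carriage return) and words hyphenated across lines.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
LINE_HYPHEN = re.compile(r"(\w)-\n(\w)")


@dataclass
class PassCounter:
    """Approximates machine cycles as characters visited per full pass."""
    chars_scanned: int = 0

    def scan(self, text: str) -> None:
        # Each full pass over the text costs len(text) character visits.
        self.chars_scanned += len(text)


def preprocess(text: str, counter: PassCounter) -> str:
    """Hypothetical preprocessor: one extra full pass that edits the text."""
    counter.scan(text)
    text = CONTROL_CHARS.sub("", text)       # drop stray control characters
    return LINE_HYPHEN.sub(r"\1\2", text)    # rejoin "disambig-\nuation"


def disambiguate(text: str, counter: PassCounter) -> list[str]:
    """Stand-in for textual disambiguation: one pass yielding lowercase words."""
    counter.scan(text)
    return re.findall(r"[a-z]+", text.lower())


def run(text: str, preprocess_first: bool) -> tuple[list[str], int]:
    """Route text through the preprocessor only when the caller says it must."""
    counter = PassCounter()
    if preprocess_first:
        text = preprocess(text, counter)
    tokens = disambiguate(text, counter)
    return tokens, counter.chars_scanned
```

For the 32-character input `"Textual disambig-\nuation\x07 output"`, `run(..., preprocess_first=True)` visits 32 + 29 = 61 characters, versus 29 had the same text arrived clean: roughly double, which is the cost that makes preprocessing a last resort.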


Fig. 4.6.9 shows that electronic text can be preprocessed when necessary.


Chapter 4.6: Textual Disambiguation