Fig. 10.1.20 Proximity analysis.

As an example of proximity analysis against raw text, suppose the raw text looked like
this:


...away in a manger no crib for a child....


Suppose the analyst had specified that the words manger, child, and crib together make
up the proximity variable "baby Jesus."


The results of the processing would look like the following:


Document name, byte, context: manger, crib, child; value: baby Jesus.

Care must be taken with proximity analysis as a great amount of system resources can be
expended if there are many proximity variables to be sought.
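
The manger example above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any textual ETL product implements proximity analysis: the function name proximity_matches, the window parameter, the record layout, and the document name doc1.txt are all assumptions made for the example. The "byte" reported is a character offset, which equals the byte offset only for plain ASCII text.

```python
import re

def proximity_matches(doc_name, text, words, value, window=60):
    """Return one record for each place where every word in `words`
    occurs within `window` characters of the first word found.
    (Hypothetical helper; names and record layout are assumptions.)"""
    targets = {w.lower() for w in words}
    # Collect (offset, word) for every occurrence of a target word.
    hits = [(m.start(), m.group().lower())
            for m in re.finditer(r"[A-Za-z]+", text)
            if m.group().lower() in targets]
    records = []
    i = 0
    while i < len(hits):
        start = hits[i][0]
        seen = []
        j = i
        # Gather the distinct target words that fall inside the window.
        while j < len(hits) and hits[j][0] - start <= window:
            if hits[j][1] not in seen:
                seen.append(hits[j][1])
            j += 1
        if len(seen) == len(targets):
            records.append({"document": doc_name, "byte": start,
                            "context": seen, "value": value})
            i = j       # continue after the words just matched
        else:
            i += 1      # try the next occurrence as a starting point
    return records

raw_text = "...away in a manger no crib for a child...."
records = proximity_matches("doc1.txt", raw_text,
                            ["manger", "child", "crib"], "baby Jesus")
too_narrow = proximity_matches("doc1.txt", raw_text,
                               ["manger", "child", "crib"], "baby Jesus",
                               window=5)
```

Each proximity variable requires its own scan of the word occurrences, which is one way to see why many proximity variables can consume a large amount of system resources.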


Functional Sequencing Within Textual ETL


There are many different functions that occur within textual ETL. Depending on the
document and the processing that needs to occur, the sequence in which the functions are
executed has a great impact on the validity of the results. In fact, the sequence of the
functions may determine whether the results that are achieved are accurate or not.


Therefore, one of the more important features of textual ETL is the ability to sequence
the order in which functions are executed.
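
To illustrate why sequence matters, here is a minimal sketch in Python. The two functions, remove_stop_words and mark_negation, and the stop-word list are hypothetical stand-ins chosen for the example, not the actual functions of any textual ETL product. The same two functions are run in both orders over the same text, and the results differ.

```python
# Hypothetical stop-word list; "no" is included deliberately.
STOP_WORDS = {"a", "in", "no", "for"}

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def mark_negation(tokens):
    """Prefix the word that follows "no" with NOT_ and drop the "no"."""
    out = []
    negate = False
    for t in tokens:
        if t.lower() == "no":
            negate = True
            continue
        out.append("NOT_" + t if negate else t)
        negate = False
    return out

def run_pipeline(tokens, functions):
    """Apply the functions in the order given by the analyst."""
    for fn in functions:
        tokens = fn(tokens)
    return tokens

tokens = "away in a manger no crib for a child".split()

# Negation first: the "no" is still present when mark_negation runs.
negation_first = run_pipeline(tokens, [mark_negation, remove_stop_words])

# Stop words first: "no" is removed before mark_negation can see it.
stop_words_first = run_pipeline(tokens, [remove_stop_words, mark_negation])
```

In this sketch, negation_first keeps the negated token NOT_crib, while stop_words_first loses the negation entirely because the word "no" was discarded before it could be interpreted.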


Fig. 10.1.21 shows that the different functions can be sequenced at the discretion of the


Chapter 10.1: Nonrepetitive Data