Fig. 4.6.7 A load utility.

Document Fracturing/Named Value Processing


Textual disambiguation performs many kinds of processing, but there are two primary paths
for processing a document. These paths are called document fracturing and named value
processing.


Document fracturing is the process by which a document is processed word by word,
applying steps such as stop word processing, alternate spelling and acronym resolution,
and homographic resolution. After fracturing, the document still has a recognizable
shape, albeit in a modified form. For all practical purposes, it appears as if the
document has been fractured.
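
The word-by-word pass described above can be illustrated with a minimal sketch. It is not the actual textual disambiguation implementation. The function name fracture, the three lookup tables, and the (offset, raw word, resolved word) output layout are all assumptions made for illustration. Homographic resolution is omitted because it depends on surrounding context that a simple lookup cannot supply.

```python
import re

# Hypothetical lookup tables; a real system would load far larger ones.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}
ALTERNATE_SPELLINGS = {"colour": "color", "organisation": "organization"}
ACRONYMS = {"asap": "as soon as possible", "eta": "estimated time of arrival"}


def fracture(text):
    """Walk the text word by word and return (offset, raw_word, resolved_word) rows."""
    rows = []
    for match in re.finditer(r"[A-Za-z']+", text):
        raw = match.group().lower()
        if raw in STOP_WORDS:
            continue  # stop word processing: drop words that carry little meaning
        resolved = ALTERNATE_SPELLINGS.get(raw, raw)   # alternate spelling resolution
        resolved = ACRONYMS.get(resolved, resolved)    # acronym resolution
        rows.append((match.start(), raw, resolved))
    return rows


for row in fracture("The colour of the ASAP order"):
    print(row)
```

Each surviving word keeps its offset into the original text, which is one way to preserve the "recognizable shape" of the document after the stop words are gone.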


The second major type of processing is named value processing. Named value processing
occurs when inline contextualization needs to be done. Inline contextualization applies
where the text is repetitive, as sometimes occurs. Repetitive text can be processed by
looking for a unique beginning delimiter and a unique ending delimiter and treating the
text between them as the value.
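
As a rough sketch of delimiter-based processing, the example below scans repetitive text for a beginning delimiter and an ending delimiter and attaches a name to whatever lies between them. The function name extract_named_values, the sample text, and the name contract_number are hypothetical; they only illustrate the idea and are not taken from any particular product.

```python
import re


def extract_named_values(text, name, begin, end):
    """Return (name, value) pairs for every span found between begin and end."""
    # Escape the delimiters so they are matched literally, and capture
    # the shortest possible run of text between them.
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)
    return [
        (name, match.group(1).strip())
        for match in re.finditer(pattern, text, flags=re.DOTALL)
    ]


# Hypothetical repetitive text, such as a batch of contracts in one file.
sample = "Contract No: 4471; Party: Acme Corp; Contract No: 4472; Party: Widget Ltd;"

print(extract_named_values(sample, "contract_number", "Contract No:", ";"))
# [('contract_number', '4471'), ('contract_number', '4472')]
```

The approach only works when the beginning delimiter is unique enough that it does not appear elsewhere in the text with a different meaning.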


There are other types of processing that can be done by textual disambiguation, but
document fracturing and named value processing are the two primary analytic processing
paths.


Fig. 4.6.8 depicts the two primary forms of processing that occur in textual
disambiguation.


Chapter 4.6: Textual Disambiguation