

The remainder of this chapter explains the logic used in textual disambiguation.


Inline Contextualization


One form of contextualization is called “inline contextualization” (sometimes called
“named value” processing). Inline contextualization applies only when the text is
repetitive and predictable. In many cases the text is not predictable, and in those cases
inline contextualization cannot be used.


Inline contextualization is the process of inferring the context of a word or phrase by
looking at the text immediately preceding and immediately following it. As a simple
example, consider the raw text “2. This is a PAID-UP LEASE.”


Here the context name is “contract type.” The beginning delimiter is “2. This is a” and
the ending delimiter is the period (“.”). The system produces an entry in the analytic
database that looks like the following:


Document name, byte, context: contract type, value: PAID-UP LEASE
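The delimiter-matching step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the system's actual implementation: the names `ContextEntry` and `inline_contextualize` and the document name `lease_001.txt` are hypothetical, and the recorded "byte" is assumed to be the UTF-8 offset at which the value begins.

```python
from dataclasses import dataclass


@dataclass
class ContextEntry:
    """One row destined for the analytic database (hypothetical layout)."""
    document_name: str
    byte_offset: int   # assumption: UTF-8 offset where the value starts
    context: str
    value: str


def inline_contextualize(document_name, text, context, begin, end):
    """Return an entry for every value found between `begin` and `end`."""
    data = text.encode("utf-8")
    begin_b, end_b = begin.encode("utf-8"), end.encode("utf-8")
    entries = []
    pos = 0
    while True:
        start = data.find(begin_b, pos)          # locate the beginning delimiter
        if start == -1:
            break
        value_start = start + len(begin_b)
        stop = data.find(end_b, value_start)     # locate the ending delimiter after it
        if stop == -1:
            break
        raw = data[value_start:stop]
        stripped = raw.lstrip()                  # skip whitespace before the value
        offset = value_start + (len(raw) - len(stripped))
        entries.append(ContextEntry(document_name, offset, context,
                                    stripped.rstrip().decode("utf-8")))
        pos = stop + len(end_b)                  # resume scanning after this match
    return entries


if __name__ == "__main__":
    for entry in inline_contextualize("lease_001.txt", "2. This is a PAID-UP LEASE.",
                                      "contract type", "2. This is a", "."):
        print(entry)
    # ContextEntry(document_name='lease_001.txt', byte_offset=13, context='contract type', value='PAID-UP LEASE')
```

Working on bytes rather than characters keeps the recorded offset meaningful even when the document contains multi-byte characters.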

Fig. 10.1.6 shows the steps the system takes when processing raw text for inline
contextualization.


Fig. 10.1.6 Finding beginning and ending delimiters.

Note that the beginning delimiter must be unique. If you were to specify “is a” as the
beginning delimiter, every occurrence of “is a” would qualify, and there may be many
places where “is a” appears that have nothing to do with inline contextualization.
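The effect of a non-unique beginning delimiter can be illustrated with a short Python sketch. The helper `find_values` and the three-clause sample contract are hypothetical, made up only for illustration; the sketch simply collects whatever text lies between each beginning delimiter and the next period.

```python
def find_values(text, begin, end="."):
    """Return every value found between a beginning and an ending delimiter."""
    values = []
    pos = 0
    while (start := text.find(begin, pos)) != -1:
        value_start = start + len(begin)
        stop = text.find(end, value_start)
        if stop == -1:
            break
        values.append(text[value_start:stop].strip())
        pos = stop + len(end)  # continue searching after the ending delimiter
    return values


# Hypothetical sample text: only clause 2 states the contract type.
contract = (
    "1. The lessee is a resident of Texas. "
    "2. This is a PAID-UP LEASE. "
    "3. The term is a period of five years."
)

print(find_values(contract, "is a"))
# ['resident of Texas', 'PAID-UP LEASE', 'period of five years']
print(find_values(contract, "2. This is a"))
# ['PAID-UP LEASE']
```

With “is a” as the beginning delimiter, two of the three captured values are not contract types at all. The longer, unique delimiter captures only the intended value.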


Chapter 10.1: Nonrepetitive Data