Fig. 4.6.5 Iterative development.

Input/Output


The input to the process of textual disambiguation is electronic text. There are many
forms of electronic text, and it can come from almost anywhere. Electronic text can be
written in proper language, slang, or shorthand, and it can appear as comments, database
entries, and many other forms. Textual disambiguation needs to be able to handle all of
these forms. In addition, electronic text can be written in different languages.


Textual disambiguation can also handle nonelectronic text, once that text has passed
through an automated capture mechanism such as optical character recognition (OCR)
processing.


The output of textual disambiguation can take many forms. Whatever its form, the output is
created in a “flat file format,” so it can be sent to any standard DBMS or to Hadoop.
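To make the idea of flat-file output concrete, here is a minimal Python sketch. It assumes a hypothetical record layout (doc_id, byte_offset, word, context) and comma-delimited output; the names DisambiguatedRecord and to_flat_file are illustrative only and are not part of any particular textual disambiguation product. It writes one row per disambiguated term, preceded by a header row, using Python's standard csv module.

```python
import csv
import io
from dataclasses import dataclass, astuple, fields


@dataclass
class DisambiguatedRecord:
    # Hypothetical output layout; a real tool may emit different fields.
    doc_id: str       # identifier of the source document
    byte_offset: int  # position of the term within the source text
    word: str         # the term as it appeared in the raw text
    context: str      # the meaning assigned by disambiguation


def to_flat_file(records, delimiter=","):
    """Serialize records as a delimited flat file with a header row."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that contain the
    # delimiter, a quote character, or a line break.
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow([f.name for f in fields(DisambiguatedRecord)])
    for rec in records:
        writer.writerow(astuple(rec))
    return buf.getvalue()


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    rows = [
        DisambiguatedRecord("memo-001", 112, "hbp", "high blood pressure"),
        DisambiguatedRecord("memo-001", 240, "Dallas", "city"),
    ]
    print(to_flat_file(rows), end="")
```

Because the result is plain delimited text, it can be handed to a DBMS bulk loader (for example, PostgreSQL's COPY command with the csv format and header options) or placed in HDFS for Hadoop tools that read delimited files.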


Fig. 4.6.6 shows the types of output that can be created from textual disambiguation.


Chapter 4.6: Textual Disambiguation