
Fig. 4.6.11 Reformatting spreadsheet data.

Report Decompilation


Most textual information is found in documents. When text appears in a document,
textual disambiguation processes it linearly, as Fig. 4.6.12 shows.


Fig. 4.6.12 Linear processing of text.

Text in a document is not the only form of nonrepetitive unstructured data. Another
common form is the table. Tables are found everywhere: in bank statements, in research
papers, in corporate invoices, and so forth.


Sometimes a table must be read in as input, just as the text of a document is read in.
This requires a specialized form of textual disambiguation called report decompilation.


In report decompilation, the contents of a report are handled very differently from the
contents of ordinary text, because the information in a report cannot be processed
linearly. A value in a table has meaning only in combination with its row and column
headings, which may sit some distance away from it on the page.
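The difference between reading a table linearly and decompiling it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's method or any product's API: it assumes a simple grid whose first row holds column headings and whose first column holds row headings, and the table contents and the names `read_linearly`, `decompile`, and `Fact` are hypothetical.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """One normalized record: a cell value plus the headings that give it meaning."""
    row_label: str
    column_label: str
    value: str


# Hypothetical report: a small bank-statement-style table.
HEADER = ["Account", "Jan", "Feb", "Mar"]
ROWS = [
    ["Checking", "1200", "1350", "990"],
    ["Savings", "5000", "5000", "5100"],
]


def read_linearly(header, rows):
    """Read the table the way text is read: left to right, top to bottom."""
    tokens = list(header)
    for row in rows:
        tokens.extend(row)
    return tokens


def decompile(header, rows):
    """Pair every body cell with its row heading and its column heading."""
    facts = []
    for row in rows:
        row_label, cells = row[0], row[1:]
        # Skip header[0] ("Account"), which labels the row-heading column itself.
        for column_label, value in zip(header[1:], cells):
            facts.append(Fact(row_label, column_label, value))
    return facts


if __name__ == "__main__":
    print(read_linearly(HEADER, ROWS))
    for fact in decompile(HEADER, ROWS):
        print(fact)
```

Read linearly, the value 1350 lands several tokens away from its heading "Feb", and nothing in the token stream ties the two together. Decompilation instead emits one self-describing record per cell, which is the kind of normalized output the section describes. Real reports add complications such as multi-line headings, subtotals, and footnotes, which this sketch ignores.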


Fig. 4.6.13 shows that the different elements of a report must be brought together in a
normalized format. The problem is that those elements appear in a decidedly


Chapter 4.6: Textual Disambiguation