
Fig. 4.6.10 Filtering emails.

Spreadsheets


Spreadsheets are another special case. Spreadsheets are ubiquitous. Sometimes the
information on a spreadsheet is purely numerical, but on other occasions the spreadsheet
contains character-based information. As a rule, textual disambiguation does not
process numerical information from a spreadsheet, because there are no metadata that
accurately describe the numeric values. (Note: spreadsheets do carry formulas for their
numbers, but those formulas are almost worthless as metadata descriptions of what the
numbers mean.) For this reason, the only spreadsheet data that make their way into
textual ETL are the character-based descriptive data.


To this end, an interface formats the useful data from the spreadsheet into a working
database. From the working database, the data are then sent into textual disambiguation,
as seen in Fig. 4.6.11.
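
As a rough illustration of the kind of interface described above, the following Python sketch keeps only the character-based cells of a spreadsheet and loads them into a working database. It is a minimal sketch under stated assumptions, not the interface the text refers to: the spreadsheet is assumed to have been exported as CSV, the working database is assumed to be SQLite, and the names is_numeric, load_text_cells, and working_cells are hypothetical.

```python
import csv
import io
import sqlite3


def is_numeric(cell: str) -> bool:
    """Return True if the cell parses as a number.

    Assumption: commas, a leading '$', and a trailing '%' are treated as
    numeric formatting. Words that float() accepts, such as "nan" or
    "inf", are also classified as numeric by this simple check.
    """
    cleaned = cell.strip().replace(",", "").lstrip("$").rstrip("%")
    try:
        float(cleaned)
        return True
    except ValueError:
        return False


def load_text_cells(csv_text: str, conn: sqlite3.Connection) -> int:
    """Copy non-blank, non-numeric cells into the working database.

    Returns the number of cells stored. The table layout (row, column,
    text) is a hypothetical choice made for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS working_cells "
        "(row_num INTEGER, col_num INTEGER, cell_text TEXT)"
    )
    stored = 0
    for row_num, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_num, cell in enumerate(row, start=1):
            text = cell.strip()
            # Skip blanks and numeric values; keep descriptive text only.
            if text and not is_numeric(text):
                conn.execute(
                    "INSERT INTO working_cells VALUES (?, ?, ?)",
                    (row_num, col_num, text),
                )
                stored += 1
    conn.commit()
    return stored


if __name__ == "__main__":
    sample = (
        "Item,Qty,Notes\n"
        "Widget,12,Customer reported a cracked housing\n"
        "Gasket,3.5,\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load_text_cells(sample, conn), "text cells stored")
    for record in conn.execute("SELECT * FROM working_cells"):
        print(record)
```

In this sketch the working database records the row and column of each retained cell, so that a later textual disambiguation step could still trace a piece of text back to its position on the spreadsheet. Whether a real interface keeps that positional information is not stated in the text.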


Chapter 4.6: Textual Disambiguation