Fig. 10.1.18 shows that document metadata can be created by textual ETL.


Fig. 10.1.18 Processing document metadata.

Suppose an organization has a contract document. Running textual ETL against that
document can produce an entry such as the following in the analytic database:


Document name, byte, document title—Jones Contract, July 30, 1995, 32651 bytes, by Ted Van Duyn,...
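As a rough illustration of what gathering document metadata involves, the following minimal Python sketch derives a few fields from a raw document. It is not how textual ETL is implemented. The names `DocumentMetadata` and `extract_metadata`, the sample file name, and the rule that the title is the first nonblank line are all assumptions made for the example. Fields such as author and date would need parsing rules that are not described here, so they are left out.

```python
from dataclasses import dataclass


@dataclass
class DocumentMetadata:
    """Hypothetical record shape for one entry in the analytic database."""
    name: str
    title: str
    size_bytes: int


def extract_metadata(name: str, raw: bytes) -> DocumentMetadata:
    """Derive simple metadata from the raw bytes of a document.

    Assumption: the title is the first nonblank line of the text.
    """
    text = raw.decode("utf-8", errors="replace")
    title = next(
        (line.strip() for line in text.splitlines() if line.strip()),
        "",
    )
    # The size is measured on the raw bytes, not on the decoded text.
    return DocumentMetadata(name=name, title=title, size_bytes=len(raw))


# Illustrative input only; the contract text is made up for the example.
sample = b"Jones Contract\n\nThis agreement is made between the parties.\n"
record = extract_metadata("jones_contract.txt", sample)
print(record)
```

In practice the resulting record would be written as a row in the analytic database, keyed by document name.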

Document Classification


In addition to gathering document metadata, it is also possible to classify
documents into an index. As an example, suppose the company is an oil company.
One way of classifying documents in an oil company is according to the part of
the organization to which they belong. Some documents are about exploration.
Some documents are about oil production. Some documents are about refining,
oil distribution, and oil sales.


Textual ETL can read the document and determine which classification the document
belongs in.
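One simple way to picture this kind of classification is keyword matching against a list of terms for each category. The Python sketch below assumes that approach; the text does not say how textual ETL actually decides, and the term lists, the `CATEGORY_TERMS` table, and the `classify` function are hypothetical names invented for the example.

```python
import re

# Hypothetical term lists for the oil company categories named in the text.
CATEGORY_TERMS = {
    "exploration": {"seismic", "survey", "drilling", "prospect"},
    "production": {"well", "barrel", "output", "rig"},
    "refining": {"refinery", "distillation", "crude", "cracking"},
    "distribution": {"pipeline", "tanker", "terminal", "shipment"},
    "sales": {"invoice", "customer", "price", "retail"},
}


def classify(text: str) -> str:
    """Return the category whose terms occur most often in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sum(word in terms for word in words)
        for category, terms in CATEGORY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    # A document that matches no term at all is left unclassified.
    return best if scores[best] > 0 else "unclassified"


print(classify("The seismic survey identified a new drilling prospect."))
print(classify("Crude arrives by pipeline at the refinery for distillation."))
```

A document can mention terms from several categories, as the second call shows; this sketch simply picks the category with the most matches, whereas a fuller index might record every category that applies.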


Fig. 10.1.19 shows the reading of raw text and the classification of documents.


Chapter 10.1: Nonrepetitive Data