Fig. 10.1.12 Tagging numerical values.

As an example of how textual ETL might read a document and tag a numeric value,
consider the following raw text:


Raw text—“...Invoice amount”—“$813.97,...”

The data placed onto the analytic database would look like the following:


Document name, byte, context—invoice amount, value—813.97
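
The tagging step above can be sketched in Python. This is only an illustration under stated assumptions, not how any particular textual ETL product is implemented: the names `TaggedValue`, `AMOUNT`, and `tag_numeric` are hypothetical, the context phrase is hard-coded, and "byte" is taken to mean the UTF-8 byte offset of the value within the document.

```python
import re
from decimal import Decimal
from typing import List, NamedTuple


class TaggedValue(NamedTuple):
    """One row destined for the analytic database (hypothetical layout)."""
    document: str
    byte: int
    context: str
    value: Decimal


# Assumption: the context is the literal phrase "invoice amount", followed
# by optional punctuation or whitespace and then a dollar amount.
AMOUNT = re.compile(
    r"(?P<context>invoice amount)\W*\$(?P<value>\d[\d,]*(?:\.\d+)?)",
    re.IGNORECASE,
)


def tag_numeric(document: str, text: str) -> List[TaggedValue]:
    """Find each amount that follows the context phrase and tag it."""
    rows = []
    for match in AMOUNT.finditer(text):
        # Byte offset of the numeric value, measured in UTF-8 bytes.
        byte = len(text[: match.start("value")].encode("utf-8"))
        # Drop thousands separators before converting to a number.
        value = Decimal(match.group("value").replace(",", ""))
        rows.append(TaggedValue(document, byte, match.group("context").lower(), value))
    return rows


rows = tag_numeric("invoice.txt", "...Invoice amount $813.97,...")
```

For the sample text, `rows` holds a single record whose context is `invoice amount` and whose value is `Decimal("813.97")`. The document name `invoice.txt` is made up for the example.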

Date Tagging


Date tagging works on the same basis as numeric tagging. The only difference is that
the value being tagged is a date rather than a number.


Date Standardization


Date standardization is useful when multiple documents have to be managed together
or when a single document requires analysis based on date. The problem with dates is
that they can be formatted in so many ways. Some common date formats include the
following:


May 13, 2104
23rd of June, 2015
2001/05/28
14/14/09
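
One way to picture date standardization is a function that tries a list of known formats and rewrites whatever it can parse into a single form. The sketch below is an assumption-laden illustration: the name `standardize_date`, the list of candidate formats, and the choice of ISO 8601 (`YYYY-MM-DD`) as the target are not taken from the text. It also assumes English month names and reads a slashed date with a two-digit year as month/day/year, which is only a guess.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed candidate formats, tried in order. The last one treats a slashed
# date with a two-digit year as month/day/year, which is only a guess.
FORMATS = ("%B %d, %Y", "%d of %B, %Y", "%Y/%m/%d", "%m/%d/%y")

# Ordinal suffix after a day number, e.g. "23rd" or "23 rd".
ORDINAL = re.compile(r"(?<=\d)\s*(?:st|nd|rd|th)\b", re.IGNORECASE)


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format fits."""
    cleaned = ORDINAL.sub("", raw.strip())
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # This format did not fit; try the next one.
    return None


results = {
    raw: standardize_date(raw)
    for raw in ("May 13, 2104", "23rd of June, 2015", "2001/05/28", "14/14/09")
}
```

Under these assumptions the first three samples become `2104-05-13`, `2015-06-23`, and `2001-05-28`. The last sample, `14/14/09`, yields `None` because 14 is not a valid month under any of the listed formats; a real system would have to decide how to report or resolve such values.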

Chapter 10.1: Nonrepetitive Data