
Chapter 10.2


Mapping


Abstract


Nonrepetitive analytics begins with the contextualization of the nonrepetitive data.
Unlike repetitive data, the context of nonrepetitive data is difficult to determine. The
context of nonrepetitive big data is determined by textual disambiguation. In textual
disambiguation, there are algorithms that relate to stop word resolution, stemming,
homographic resolution, inline contextualization, taxonomy/ontology resolution, custom
variable resolution, acronym resolution, and so forth. Nonrepetitive analytics is very
relevant to business value. Some typical forms of nonrepetitive analytics include the
analysis of medical records, warranty analysis, insurance claim analysis, and call center
analysis.


Keywords


Nonrepetitive data; Textual disambiguation; Stemming; Stop word processing;
Homographic resolution; Taxonomic resolution; Custom variable resolution; Acronym
resolution; Inline contextualization


Mapping is the process of defining the specifications for how a document is to be
processed by textual ETL. There is a separate mapping for each type of document to be
processed. One of the nice features of textual ETL is that the analyst can build on the
specifications of previous mappings when it comes time to build a new mapping. On many
occasions, one mapping will be very similar to another. When a similar mapping already
exists, the analyst does not need to create the new mapping from scratch.
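
The idea of building a new mapping on top of a previous one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Mapping` class, its fields, and the `derive_mapping` helper are hypothetical names chosen for this sketch, not part of any textual ETL product. The fields loosely follow the kinds of resolution named in the abstract (stop words, acronyms, taxonomies).

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Mapping:
    """Hypothetical specification for one document type (illustrative only)."""
    document_type: str
    stop_words: frozenset = frozenset()
    acronyms: dict = field(default_factory=dict)
    taxonomies: tuple = ()


def derive_mapping(base, document_type, **overrides):
    """Return a new mapping that copies `base` and changes only what differs."""
    return replace(base, document_type=document_type, **overrides)


# An existing mapping for one type of document.
warranty = Mapping(
    document_type="warranty_claim",
    stop_words=frozenset({"a", "an", "the"}),
    acronyms={"VIN": "vehicle identification number"},
    taxonomies=("car_parts",),
)

# A similar document type reuses the earlier specification and adds one taxonomy.
insurance = derive_mapping(
    warranty,
    "insurance_claim",
    taxonomies=warranty.taxonomies + ("injury_types",),
)
```

In this sketch the original mapping is left unchanged, so each document type keeps its own specification while the analyst states only the differences for the new one.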


At first glance, creating mappings is a bewildering process. It is like an airline pilot
sitting at the controls of an airplane: there are many control panels, switches, and
buttons. To the uninitiated, flying an airplane seems an almost monumental task.


However, once an organized approach is taken, learning to do mapping is a
straightforward process.


Fig. 10.2.1 shows the questions the analyst needs to ask while doing the mapping.

