Chapter 10.1
Nonrepetitive Data
Abstract
Nonrepetitive analytics begins with the contextualization of the nonrepetitive data.
Unlike repetitive data, the context of nonrepetitive data is difficult to determine. The
context of nonrepetitive big data is determined by textual disambiguation. Textual
disambiguation applies algorithms for stop word resolution, stemming,
homographic resolution, inline contextualization, taxonomy/ontology resolution, custom
variable resolution, acronym resolution, and so forth. Nonrepetitive analytics is very
relevant to business value. Some typical forms of nonrepetitive analytics include the
analysis of medical records, warranty analysis, insurance claim analysis, and call center
analysis.
Keywords
Nonrepetitive data; Textual disambiguation; Stemming; Stop word processing;
Homographic resolution; Taxonomic resolution; Custom variable resolution; Acronym
resolution; Inline contextualization
There are two types of data that reside in the big data environment: repetitive data and
nonrepetitive data. Repetitive data are relatively easy to handle because their structure
repeats from one record to the next. Nonrepetitive data are far harder to handle
because every unit of data in the nonrepetitive environment must be individually
interpreted before it can be used for analytic processing.
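
The contrast can be illustrated with a minimal Python sketch. It is not the chapter's method: the record layout, the function names, and the small stop word and acronym tables are all hypothetical, chosen only to show why a repetitive record can be parsed by one fixed rule while a unit of nonrepetitive text needs its own interpretation step (here, a simplified form of stop word and acronym resolution).

```python
import re
from typing import Dict, List

# Hypothetical lookup tables for illustration only; real textual
# disambiguation would rely on much larger, domain-specific resources.
STOP_WORDS = {"the", "a", "an", "of", "to", "was", "is", "and", "for", "with"}
ACRONYMS = {"mi": "myocardial infarction", "er": "emergency room"}


def parse_repetitive(record: str) -> Dict[str, object]:
    """Parse a repetitive record: every record shares one assumed layout
    (timestamp, account, amount), so a single rule handles all of them."""
    timestamp, account, amount = record.split(",")
    return {"timestamp": timestamp, "account": account, "amount": float(amount)}


def interpret_nonrepetitive(text: str) -> List[str]:
    """Interpret one unit of nonrepetitive text: lowercase and tokenize it,
    drop stop words, and expand any acronym found in the lookup table."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue  # stop word resolution: discard words that carry no context
        terms.append(ACRONYMS.get(token, token))  # acronym resolution
    return terms


if __name__ == "__main__":
    print(parse_repetitive("2024-01-05,ACC-17,42.50"))
    print(interpret_nonrepetitive("Patient was admitted to the ER with an MI"))
```

The repetitive parser never inspects the meaning of a field, whereas the nonrepetitive function must examine every token, which is the extra work the paragraph above describes.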
Fig. 10.1.1 shows a representation of nonrepetitive data as they reside in a raw state in
the big data environment.