Also, note that the ending delimiter must be specified exactly. In this case, if the term
does not end in a “.” the system will not consider the entry to be a hit.


Because the ending delimiter must be specified accurately, the analyst also specifies a
maximum character count. The maximum character count tells the system how far to
search to determine whether the ending delimiter has been found.


On occasion, the analyst wants the inline contextualization search to end on a special
character. In this case, the analyst specifies the special character that is needed.
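The interplay of ending delimiter and maximum character count can be sketched in a few lines of Python. This is only an illustration, not the system's actual implementation: the function name `inline_context`, its parameters, and the sample text are all assumptions made for the example. It also assumes the search is anchored by a beginning delimiter and that the whole ending delimiter must fall inside the search window.

```python
def inline_context(text, begin, end, max_chars):
    """Collect values that follow `begin` and are closed by `end`.

    Hypothetical sketch: the search for `end` covers at most
    `max_chars` characters after each occurrence of `begin`.
    """
    hits = []
    start = 0
    while True:
        i = text.find(begin, start)
        if i == -1:
            break
        value_start = i + len(begin)
        # Only look this far for the ending delimiter.
        window = text[value_start:value_start + max_chars]
        j = window.find(end)
        if j != -1:
            # Ending delimiter found within the window: record a hit.
            hits.append(window[:j].strip())
        # No ending delimiter within the window means no hit.
        start = value_start
    return hits


sample = "Contract No. 12345. Signed. Contract No. 99999 pending"
print(inline_context(sample, "Contract No. ", ".", 10))  # ['12345']
```

The second occurrence is not a hit because no “.” appears within ten characters of the beginning delimiter. Ending on a special character amounts to passing that character as `end`.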


Taxonomy/Ontology Processing


Another powerful way to specify context is through the usage of taxonomies and
ontologies.


Taxonomies contribute to contextualization in several important ways. The first is
applicability. Whereas inline contextualization requires repetitive and predictable
occurrences of text to be applicable, taxonomies have no such requirement.
Taxonomies are applicable just about everywhere. A second valuable feature of
taxonomies is that they can be applied externally. This means that in choosing the taxonomy
to be applied, the analyst can greatly influence the interpretation of the raw text.


For example, suppose the analyst were going to apply a taxonomy to the phrase “President
Ford drove a Ford.” If the interpretation that the analyst wished to infer were about cars,
then the analyst would choose one or more taxonomies that would allow “Ford” to be
interpreted as an automobile. But if the analyst were to choose a taxonomy relating to the
history of the presidents of the United States, then the term “Ford” would be interpreted
as a former president of the United States.


The analyst thus has great power over interpretation in choosing which taxonomy to apply
to the raw text that is to be processed.
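The effect of choosing one taxonomy over another can be illustrated with a minimal Python sketch. Everything here is assumed for the example: the two small dictionaries stand in for real taxonomies, and `apply_taxonomy` is a hypothetical helper that does a plain word lookup. A production system would do considerably more.

```python
import re

# Hypothetical, deliberately tiny taxonomies: word -> category.
CAR_TAXONOMY = {"ford": "automobile", "honda": "automobile", "porsche": "automobile"}
PRESIDENT_TAXONOMY = {"ford": "US president", "nixon": "US president", "carter": "US president"}


def apply_taxonomy(text, taxonomy):
    """Return (word, category, offset) for every word found in the taxonomy."""
    results = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        category = taxonomy.get(word.lower())
        if category is not None:
            results.append((word, category, match.start()))
    return results


raw = "President Ford drove a Ford."
print(apply_taxonomy(raw, CAR_TAXONOMY))
# [('Ford', 'automobile', 10), ('Ford', 'automobile', 23)]
print(apply_taxonomy(raw, PRESIDENT_TAXONOMY))
# [('Ford', 'US president', 10), ('Ford', 'US president', 23)]
```

The same raw text yields a different interpretation depending solely on which taxonomy is supplied. Note that this naive lookup classifies both occurrences of “Ford” the same way; it makes no attempt to tell them apart within a single taxonomy.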


The mechanics of how a taxonomy is processed against raw text are shown in Fig. 10.1.7.


Chapter 10.1: Nonrepetitive Data