Punctuation
Grammar
Proper sentence construction

There are certainly rules that govern the creation of proper text. But those rules are
so complex that they are not obvious or apparent to the computer.
From the computer's perspective, text is unstructured simply because the computer
cannot understand all the rules of proper textual construction.


Contextualization


There are many parts of text that must be managed if text is to be turned into a form that
is useful to the computer. But by far the most important and most complex aspect of text
that must be mastered is finding and determining its context. Stated differently, if you
do not understand the context of text, you cannot use that text for any form of useful
decision-making.


Contextualization of text, then, is the single largest challenge facing the analyst who
wishes to use nonrepetitive unstructured text in the decision-making process.


Fig. 4.4.7 shows an example of the importance of understanding context.


Fig. 4.4.7 Text makes no sense without understanding context.

Two gentlemen are standing on a corner, and one gentleman says to the next as a young


Chapter 4.4: Unstructured Data