
lady passes by—“She's hot.”


Now, what is being said here?


One interpretation is that the gentleman finds the young lady to be attractive and he
would like to have a date with her.


Another interpretation is that it is Houston, Texas, on a July day and it is 98 degrees and
100% humidity. The lady is wet from pouring sweat. She's hot.


Another interpretation is that the two gentlemen are in a hospital and they are doctors.
One doctor has just taken the lady's temperature, and she has a temperature of 104
degrees. She is burning up with fever, and she's hot.


These, then, are three very different meanings of the words “She's hot.” Using or
interpreting these words without understanding the context could lead to disaster and
embarrassment.
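
The three readings above can be illustrated with a minimal sketch, which assumes that the context is already known and supplied as a label. The labels ("social", "weather", "medical"), the lookup table, and the function name `interpret` are hypothetical and chosen only for illustration; they do not come from the text, and real contextualization is far harder than a table lookup.

```python
from typing import Optional

# Hypothetical lookup table: (phrase, context label) -> interpretation.
# The context labels are illustrative assumptions, not terms from the text.
INTERPRETATIONS = {
    ("she's hot", "social"): "the speaker finds her attractive",
    ("she's hot", "weather"): "she is overheated from the heat and humidity",
    ("she's hot", "medical"): "she has a high fever",
}


def interpret(phrase: str, context: Optional[str]) -> Optional[str]:
    """Return the meaning of a phrase in a given context, or None if unknown."""
    if context is None:
        # Without context the phrase stays ambiguous, so no meaning is returned.
        return None
    # Normalize case, whitespace, and curly apostrophes before the lookup.
    key = (phrase.strip().lower().replace("\u2019", "'"), context)
    return INTERPRETATIONS.get(key)


if __name__ == "__main__":
    for ctx in ("social", "weather", "medical", None):
        print(ctx, "->", interpret("She's hot", ctx))
```

The same words map to a different meaning for each context label, and to no meaning at all when the context is missing, which is the point of the example.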


The need to find and understand context is hardly limited to the words “She's hot.” It
applies to all words.


The largest challenge facing the analyst who wishes to make sense of nonrepetitive
unstructured data, then, is understanding how to contextualize text.


There are other challenges as well. As important as contextualization is, it is hardly
the only challenge in analyzing nonrepetitive unstructured data.


Fig. 4.4.8 shows that finding context in nonrepetitive unstructured data is a major
challenge.


Fig. 4.4.8 Finding context.

Some Approaches to Contextualization


The notion that finding context in nonrepetitive unstructured data is a challenge is not a
new idea. Indeed, people have been attempting to contextualize text for a long time. The


Chapter 4.4: Unstructured Data