earliest attempt to contextualize text is a technology called “NLP.” NLP stands
for natural language processing (or sometimes “natural language programming”).


NLP has been around a long time and has met with only modest success. It has several
inherent limitations. The first is that NLP assumes that the context of text can be derived
from the text itself. The problem is that only a small amount of context comes from the
text. In the case of the two gentlemen standing around and saying “She's hot,” the vast
majority of the context comes from external sources, not textual sources. Is the lady
young and attractive? Is it Houston, Texas, in the summertime? Is the conversation taking
place in a hospital? All of these circumstances that provide context are external to the
words being spoken.
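The point above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function `interpret`, the context labels, and the wording of each reading are hypothetical and do not come from any NLP library. It only shows that the same text cannot be resolved without a piece of information supplied from outside the text.

```python
from typing import Optional

# Hypothetical readings of one phrase, keyed by EXTERNAL context.
# None of these keys can be recovered from the words "She's hot" alone.
INTERPRETATIONS = {
    "attractive_person": "She is attractive.",
    "houston_summer": "She is overheated by the weather.",
    "hospital": "She has a fever.",
}


def interpret(text: str, external_context: Optional[str] = None) -> str:
    """Return one reading of `text`, or report ambiguity when no context is given."""
    if text != "She's hot":
        return "unknown phrase"
    if external_context is None:
        # The text alone supports every reading equally well.
        return "ambiguous: " + " | ".join(sorted(INTERPRETATIONS.values()))
    return INTERPRETATIONS.get(external_context, "ambiguous: unrecognized context")


print(interpret("She's hot"))              # no external context supplied
print(interpret("She's hot", "hospital"))  # context supplied from outside the text
```

Without the second argument, the sketch can only list the candidate readings; with it, a single reading is selected.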


The second limitation of NLP is that it does not account for emphasis. Suppose the
words “I love you” are spoken. How are these words to be interpreted?


If you say “I love you” where the emphasis is on “I,” the meaning is that it is me and not
someone else who loves you. If the emphasis is on the word “love,” the meaning is that
the emotion I feel is strong, one of love. I don’t like you—I actually love you. If the
emphasis is on the word “you,” the meaning is that it is you and not someone else that I
love.
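The loss of emphasis can be sketched in a few lines. The `Utterance` class, its `stressed_index` field, and the paraphrased readings are hypothetical names chosen for this illustration. The sketch shows only that three spoken utterances with different emphasis collapse to one identical string once they are written down.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Utterance:
    """A spoken phrase: the words plus which word carries the emphasis."""
    words: Tuple[str, ...]
    stressed_index: int  # assumption: exactly one word is emphasized

    def as_text(self) -> str:
        # Writing the utterance down discards the emphasis.
        return " ".join(self.words)


# Paraphrases of the three readings described in the passage above.
READINGS = {
    0: "It is I, not someone else, who loves you.",
    1: "The feeling is love, not merely liking.",
    2: "It is you, not someone else, whom I love.",
}

utterances = [Utterance(("I", "love", "you"), i) for i in range(3)]

distinct_texts = {u.as_text() for u in utterances}
distinct_readings = {READINGS[u.stressed_index] for u in utterances}

print(len(distinct_texts), len(distinct_readings))  # prints: 1 3
```

Any process that sees only the output of `as_text()` receives one string for three different meanings.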


So, the same words can have very different meanings depending on how they are said.


But there is another, quite different reason why NLP has had a hard time showing concrete
results. To be implemented effectively, NLP must understand the logic behind words. The
problem is that the English language has evolved over many years and under many
circumstances, and the resulting logic is very complex. Mapping out the logic of the
English language is difficult and tortuous.


For these reasons (and probably more), NLP has met with only modest success.


A much more practical approach is that of textual disambiguation.


Fig. 4.4.9 shows the two approaches toward contextualization of text.


Chapter 4.4: Unstructured Data