
unstructured records are the following:


Very nonuniform in shape.
Sometimes small, sometimes large, and sometimes very large.
Quite difficult to parse, because the records are made up of text, and text requires an
entirely different approach from simple parsing.

There are probably more differences between these two types of data. But these
differences alone warrant the recognition of the “great divide” between the types of
unstructured data.
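The parsing difference can be illustrated with a minimal sketch. The sample records, the field layout, and the `parse_fixed` helper below are hypothetical, invented only for illustration; they are not taken from the chapter. The sketch shows that positional, delimiter-based parsing handles uniform records but breaks down on a record made up of free text.

```python
# Hypothetical sample data, invented for illustration only.
repetitive_records = [
    "2024-01-05,ACCT-001,49.95",
    "2024-01-06,ACCT-002,120.00",
]
text_record = (
    "Spoke with the customer on Friday. She said the charge, "
    "about fifty dollars, was posted twice."
)


def parse_fixed(record):
    """Simple parsing: split on a delimiter and map fields by position."""
    date, account, amount = record.split(",")
    return {"date": date, "account": account, "amount": float(amount)}


# Uniform records parse cleanly because every record has the same shape.
parsed = [parse_fixed(r) for r in repetitive_records]

# The same approach fails on text: either the field count is wrong or
# the "amount" field is not a number, and both raise ValueError.
try:
    parse_fixed(text_record)
    text_parsed = True
except ValueError:
    text_parsed = False

print(parsed)
print("text record parsed:", text_parsed)
```

The failure is not a matter of picking a better delimiter. The meaning of the text record is carried by its language rather than by field positions, which is why text calls for a different approach.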


So, what is so difficult about going in and working with text? Fig. 4.4.6 shows some
typical text.


Fig. 4.4.6 Some typical text.

There are many reasons why text is so difficult to work with.


First off, there is the discussion of whether text is actually unstructured at all. An English
teacher might argue that text is anything but unstructured. There are rules that govern the
structure of all text. Some of the rules include the following:


Spelling

Chapter 4.4: Unstructured Data