
Fig. 4.3.8 shows the parsing of nonrepetitive data.


Fig. 4.3.8 Parsing nonrepetitive data.

The parsing of nonrepetitive data is an entirely different matter from the parsing of
repetitive data. In fact, the “parsing of nonrepetitive data” is often referred to as textual
disambiguation. There is much more to the reading of nonrepetitive data than merely
parsing it.


However it is done, nonrepetitive data are read and turned into a form that can be
managed by a database management system.


There is a very good reason why nonrepetitive data require much more than a parsing
algorithm. The reason is that context in nonrepetitive data hides in many complex
forms. For that reason, textual disambiguation is usually done external to the
nonrepetitive data in big data. (In other words, because of the inherent complexity of
nonrepetitive data, textual disambiguation is done outside the database system that
manages big data.)
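
As a rough illustration only, the idea of deriving context outside the big data store and
then handing the result to a database management system might be sketched as below. The
taxonomy, the table layout, and the function names are all hypothetical and are not taken
from the text. Real textual disambiguation involves far more than a term lookup.

```python
import re
import sqlite3

# Hypothetical taxonomy: maps a raw term to the context it is assumed to carry.
# A real disambiguation process uses many more techniques than a simple lookup.
TAXONOMY = {
    "honda": "car",
    "toyota": "car",
    "aspirin": "medication",
}


def disambiguate(doc_id, text):
    """Yield (doc_id, position, term, context) for each term found in the taxonomy."""
    for match in re.finditer(r"[A-Za-z]+", text):
        term = match.group().lower()
        context = TAXONOMY.get(term)
        if context is not None:
            yield (doc_id, match.start(), term, context)


def load_rows(rows):
    """Load the rows into an in-memory SQLite table (stand-in for any DBMS)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE term_context "
        "(doc_id TEXT, position INTEGER, term TEXT, context TEXT)"
    )
    conn.executemany("INSERT INTO term_context VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    rows = list(disambiguate("d1", "She drove a Honda and took aspirin."))
    conn = load_rows(rows)
    for row in conn.execute("SELECT term, context FROM term_context"):
        print(row)
```

The point of the sketch is only the division of labor: the text is interpreted by a
separate process, and what reaches the database is ordinary rows.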


An issue related to parallel processing in the big data environment is the efficiency
of queries. As seen in Fig. 4.3.6, when a simple query is run against big data, the entire
set of data contained in big data must be parsed. Even though the data are
managed in parallel, such a full scan of the data consumes many machine resources.
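
The cost of such a query can be sketched in a few lines. The example below is a minimal
illustration, assuming that the data are split into partitions of simple "name=value"
records. The record format and the function names are assumptions made for the sketch,
not part of Fig. 4.3.6. Each partition is scanned by its own worker, yet every record in
every partition is still parsed.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, wanted):
    """Parse every record in one partition; return (records_parsed, matches)."""
    parsed = 0
    matches = []
    for line in partition:
        # Assumed record format: "name=value"
        name, _, value = line.partition("=")
        parsed += 1
        if name == wanted:
            matches.append(value)
    return parsed, matches


def full_scan(partitions, wanted):
    """Scan all partitions in parallel and combine the per-partition results."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: scan_partition(p, wanted), partitions))
    total_parsed = sum(parsed for parsed, _ in results)
    all_matches = [value for _, matches in results for value in matches]
    return total_parsed, all_matches


if __name__ == "__main__":
    partitions = [["a=1", "b=2"], ["a=3", "c=4"], ["d=5"]]
    total_parsed, matches = full_scan(partitions, "a")
    print(total_parsed, matches)
```

Parallelism shortens the elapsed time, but the total number of records parsed is the same
as in a serial scan, and it does not depend on how many records actually match.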


An alternate approach is to scan the data once and create a separate index. This approach


Chapter 4.3: Parallel Processing