data-architecture-a

The infrastructure for big data is quite different from the infrastructure found in a
standard DBMS. The basic unit of the big data infrastructure is the block, and each block
holds many repetitive records. The records are simply concatenated, one after the other.
Fig. 1.2.8 represents the records that might be found in a block of big data.

Fig. 1.2.8 Records inside the block.

As Fig. 1.2.8 shows, the block contains merely a long string of data, with records stacked
one against the other. The system sees only the block and the long string of data. In order
to find a record, the system needs to “parse” the string, as seen in Fig. 1.2.9.

Fig. 1.2.9 Parsing records inside the block.

Suppose the system wants to find a given record, say record “B.” The system needs to
sequentially read the string of data until it recognizes that there is a record. Then, the
system needs to go into the record and determine whether it is record “B.” If it is not,
the system reads on to the next record. This is how a search is conducted in the most
primitive state in big data.
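
The sequential search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular big data product stores or reads its blocks: the text does not say how records are delimited, so the record separator `|`, the field separator `,`, the key/payload layout, and the function names `parse_records` and `find_record` are all hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical delimiters; the source does not specify a record format.
RECORD_SEP = "|"
FIELD_SEP = ","


def parse_records(block: str) -> Iterator[Dict[str, str]]:
    """Walk the block left to right, yielding one record at a time."""
    start = 0
    while start < len(block):
        end = block.find(RECORD_SEP, start)
        if end == -1:  # the last record has no trailing separator
            end = len(block)
        raw = block[start:end]
        if raw:  # skip empty slices produced by adjacent separators
            key, _, payload = raw.partition(FIELD_SEP)
            yield {"key": key, "payload": payload}
        start = end + 1


def find_record(block: str, wanted_key: str) -> Tuple[Optional[Dict[str, str]], int]:
    """Scan sequentially; return the matching record and how many were examined."""
    examined = 0
    for record in parse_records(block):
        examined += 1
        if record["key"] == wanted_key:
            return record, examined
    return None, examined


if __name__ == "__main__":
    block = "A,alpha|B,bravo|C,charlie"  # records stacked one against the other
    match, examined = find_record(block, "B")
    print(match, examined)
```

Note that the number of records examined grows with the position of the target in the block, and a search for a record that is not present has to read the entire string before it can report a miss.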

It doesn’t take much imagination to see that a lot of machine cycles are chewed up
looking for data in big data. To cope with this cost, the big data environment employs a
means of processing referred to as the “Roman census” approach. The Roman census
approach is described further in the chapter on big data.

The Two Infrastructures

The two different infrastructures are contrasted in Fig. 1.2.10.

Chapter 1.2: The Data Infrastructure
