Fig. 3.1.8 Data generated automatically.

In Fig. 3.1.8, data are generated automatically, quickly, and in great volumes. Several
things happen to these data. Not all of them are selected for movement into the data lake.
Some data are selected randomly. Other data are selected because they fall outside preset
threshold boundaries. Still other data are selected because of the time of day at which
they were generated. Many criteria can be applied to the selection of data that have been
automatically generated.
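
The selection criteria described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name select_for_lake, the
record fields "value" and "generated_at", and the default thresholds, time window, and
sampling rate are all hypothetical and are not taken from the text.

```python
import random
from datetime import datetime, time


def select_for_lake(reading, *, low=10.0, high=90.0, sample_rate=0.01,
                    window=(time(0, 0), time(6, 0)), rng=random):
    """Return True if a machine-generated reading should move to the data lake.

    All field names and default values here are illustrative assumptions.
    """
    value = reading["value"]
    generated_at = reading["generated_at"]

    # Criterion 1: the value falls outside the preset threshold boundaries.
    if value < low or value > high:
        return True

    # Criterion 2: the reading was generated inside a time-of-day window of interest.
    if window[0] <= generated_at.time() < window[1]:
        return True

    # Criterion 3: everything else is selected randomly at a low sampling rate.
    return rng.random() < sample_rate
```

A reading that passes any one criterion is selected. In practice the criteria could just as
well be combined differently, for example by requiring two of them at once.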


After the data for movement are selected, other data are typically added. Typical
additions are the date and time at which the data were generated, the location where they
were generated, the identification of the machine that generated them, and so forth.
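
As a minimal sketch of this enrichment step, the function below adds the generation
timestamp, location, and machine identification to a selected record. The function name
enrich and the field names "generated_at", "location", and "machine_id" are hypothetical
choices made for illustration.

```python
from datetime import datetime, timezone


def enrich(reading, *, machine_id, location, generated_at=None):
    """Return a copy of the reading with descriptive context added.

    Field names are illustrative assumptions, not taken from the text.
    """
    # Fall back to the current UTC time when no generation time is supplied.
    stamped = generated_at or datetime.now(timezone.utc)
    return {
        **reading,
        "generated_at": stamped.isoformat(),
        "location": location,
        "machine_id": machine_id,
    }
```

The original record is left unchanged, so the same reading can be enriched differently
for different destinations.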


After the data have been selected and modified, they are placed in the data lake.


Transforming Bulk Data


One of the more interesting transformations occurs when data go from the data lake back
to the corporate data warehouse. In this case, massive amounts of data are read and
filtered. The results of the filtering are sent to the data warehouse, where the data can
be actively analyzed. In addition, the filtered data can be combined with existing active
data.
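
The read-filter-combine flow described above might look like the following sketch. It
streams records rather than loading them all at once, since the volume in the data lake is
large. The function names, the "machine_id" join key, and the shape of the active data (a
dictionary keyed by machine) are assumptions made for illustration only.

```python
from typing import Callable, Iterable, Iterator


def filter_lake(records: Iterable[dict],
                keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Stream records read from the data lake, yielding only those that pass the filter."""
    for record in records:
        if keep(record):
            yield record


def combine_with_active(filtered: Iterable[dict],
                        active_by_machine: dict) -> Iterator[dict]:
    """Combine each filtered record with existing active warehouse data.

    The join key "machine_id" is a hypothetical choice.
    """
    for record in filtered:
        # Look up existing active data; records without a match pass through as-is.
        active = active_by_machine.get(record["machine_id"], {})
        yield {**active, **record}
```

For example, filtering with keep=lambda r: r["value"] > 90 passes only the high readings
on to the warehouse, where each is joined with whatever is already known about its machine.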


Chapter 3.1: Transformations in the End-State Architecture