
years. That data will not be in the online application database. Instead, you need to look
for specific data over time. The place to find that data is the data warehouse.


Suppose you want to examine the spending habits of all customers who deposit more than
$1,000 a month in their account. You need to look at all of those data to complete a
special study. You might look for these data in a data mart. The processing you do here is
analytic in nature.
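The special study described above can be sketched as a small analytic pass over transaction records. This is only an illustration: the record layout, the sample rows, and the function names `high_depositors` and `spending_by_customer` are assumptions, and "more than $1000 a month" is read here as "total deposits exceed the threshold in every month in which the customer appears".

```python
from collections import defaultdict

# Hypothetical sample rows: (customer_id, month, kind, amount).
TRANSACTIONS = [
    ("c1", "2024-01", "deposit", 1500.0),
    ("c1", "2024-01", "spend", 400.0),
    ("c1", "2024-02", "deposit", 1200.0),
    ("c1", "2024-02", "spend", 650.0),
    ("c2", "2024-01", "deposit", 800.0),
    ("c2", "2024-01", "spend", 300.0),
    ("c3", "2024-01", "deposit", 2000.0),
    ("c3", "2024-02", "deposit", 500.0),
    ("c3", "2024-02", "spend", 100.0),
]


def high_depositors(transactions, threshold=1000.0):
    """Return customers whose deposits exceed `threshold` in every month seen."""
    deposits = defaultdict(float)   # (customer, month) -> total deposited
    months_seen = defaultdict(set)  # customer -> months with any activity
    for cust, month, kind, amount in transactions:
        months_seen[cust].add(month)
        if kind == "deposit":
            deposits[(cust, month)] += amount
    return {
        cust
        for cust, months in months_seen.items()
        if all(deposits[(cust, m)] > threshold for m in months)
    }


def spending_by_customer(transactions, customers):
    """Total spending for each of the selected customers."""
    totals = defaultdict(float)
    for cust, _month, kind, amount in transactions:
        if kind == "spend" and cust in customers:
            totals[cust] += amount
    return dict(totals)


selected = high_depositors(TRANSACTIONS)
report = spending_by_customer(TRANSACTIONS, selected)
```

The point of the sketch is that the study scans every record for the selected customers, which is why this kind of work suits a data mart rather than the online application database.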


Now, suppose you are being audited by the IRS. You need to go back 10 years to show
that a particular check was written. You would go to your bulk data warehouse in the
data lake.


The factors that determine where data are placed include the following:


How much data are there?
How old are the data?
How quickly do the data have to be retrieved?
Can the data be updated?

Data in different places have different properties. And those properties affect their usage.
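The four questions above can be read as inputs to a placement decision. The following is a minimal sketch of that idea, not a rule taken from the text: the `DataProfile` fields, the `suggest_store` function, and every threshold in it are hypothetical, chosen only to make the trade-offs concrete. The data mart is left out because it is chosen by the analytic use of the data rather than by these four factors.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    volume_gb: float              # How much data are there?
    age_years: float              # How old are the data?
    max_retrieval_seconds: float  # How quickly must the data be retrieved?
    updatable: bool               # Can the data be updated?


def suggest_store(p: DataProfile) -> str:
    """Suggest a home for the data; all thresholds are illustrative assumptions."""
    # Data that can be updated and must come back at once stay online.
    if p.updatable and p.max_retrieval_seconds <= 1:
        return "online application database"
    # Old or very large data drift toward bulk storage.
    if p.age_years >= 5 or p.volume_gb >= 10_000:
        return "data lake"
    # Everything else is treated as historical data kept for lookup over time.
    return "data warehouse"


current_balance = DataProfile(5, 0.1, 0.5, True)
last_two_years = DataProfile(200, 2, 30, False)
decade_old_checks = DataProfile(50, 10, 3600, False)
```

With these assumed thresholds, `suggest_store` sends `current_balance` to the online application database, `last_two_years` to the data warehouse, and `decade_old_checks` to the data lake.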


Data in the Data Lake


There can exist different kinds of data in the data lake. There are several reasons why
data are placed in a data lake:


The probability of access of the data has dropped significantly.
The volume of data is so large that there is no better place to put the data.
The data have aged.
The usage of the data does not warrant being placed elsewhere.

Accordingly, data are placed in the data lake.


However, the fact that data have been placed in a data lake says nothing about whether
the data belong to a data warehouse. It is entirely possible that the data warehouse has
been extended into the data lake.
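One way to picture this is to record where the data sit and whether they belong to the data warehouse as two independent attributes. The sketch below assumes a hypothetical catalog; the `Dataset` class, its fields, and the dataset names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    location: str       # where the data physically sit
    in_warehouse: bool  # whether the data are logically part of the data warehouse


# Hypothetical catalog entries.
CATALOG = [
    Dataset("checks_archive", location="data lake", in_warehouse=True),
    Dataset("raw_clickstream", location="data lake", in_warehouse=False),
    Dataset("accounts_history", location="warehouse storage", in_warehouse=True),
]


def warehouse_datasets_in_lake(catalog):
    """Names of datasets that belong to the warehouse but live in the data lake."""
    return [d.name for d in catalog if d.location == "data lake" and d.in_warehouse]


extended = warehouse_datasets_in_lake(CATALOG)
```

Here `extended` holds only `checks_archive`: it is part of the data warehouse even though it lives in the data lake, while `raw_clickstream` is in the lake without being part of the warehouse.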


Fig. 2.1.6 shows the data in the data lake.


Chapter 2.1: The End-State Architecture—The “World Map”