

voice-to-text transcription. Written text, if it is not already in the form of electronic text,
can be captured and transformed by optical character recognition (OCR). However the
text originates, it is converted into electronic text.


Transaction data are data captured as a by-product of executing a transaction. There are
many kinds of transactions: bank teller transactions, ATM transactions, airline
reservations, retail purchases, credit card activity, inventory management transactions,
payment ledger transactions, and many more. These transactions are usually run by
applications. As a rule, applications are developed and built in a "siloed" fashion: each
application is built without taking into consideration the other applications with which it
must interact. Corporations end up with a whole collection of applications, each of which
operates independently. The result is unintegrated application data.


Corporate data are data that have entered the system and then been transformed into an
integrated corporate state. The transformation moves the data out of their
application-oriented form and into a data warehouse, where they are integrated into a
corporate state. As a simple example of corporate integration, suppose application A
designates gender as male/female, application B as x/y, and application C as 1/0, while
the corporate standard for gender is m/f. The application data are converted to m/f as
they are moved from each application into the data warehouse.
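The gender example above amounts to a lookup from each application's encoding to the corporate standard, applied during the move into the warehouse. The following is a minimal Python sketch of that conversion step, not the author's method. The names `GENDER_MAPPINGS` and `to_corporate_gender` are hypothetical, and because the text does not say which value of x/y or 1/0 means male, the sketch assumes the first value listed in each pair means male.

```python
# Hypothetical per-application encodings. The text gives each value set
# (male/female, x/y, 1/0) but not which value means which; this sketch
# ASSUMES the first value listed in each pair means male.
GENDER_MAPPINGS: dict[str, dict[str, str]] = {
    "A": {"male": "m", "female": "f"},
    "B": {"x": "m", "y": "f"},
    "C": {"1": "m", "0": "f"},
}


def to_corporate_gender(app: str, value: object) -> str:
    """Convert one application's gender code to the corporate m/f standard."""
    key = str(value).strip().lower()
    try:
        return GENDER_MAPPINGS[app][key]
    except KeyError:
        # Unknown application or code: reject it rather than load bad data.
        raise ValueError(
            f"no corporate mapping for app {app!r}, value {value!r}"
        ) from None


# Rows as they might arrive from the three source applications.
incoming = [("A", "female"), ("B", "x"), ("C", 0)]
print([to_corporate_gender(app, v) for app, v in incoming])  # ['f', 'm', 'f']
```

Rejecting unknown codes, rather than passing them through, is one design choice among several; a real load process might instead route such rows to an exception queue for review.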


The data marts contain data customized for the different groups that will use the data for
analysis. Typically, there are data marts for marketing, sales, finance, and other
departments. The source of data for the data marts is the data warehouse.
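To make the warehouse-to-mart relationship concrete, here is a minimal Python sketch in which every mart is derived from the same integrated warehouse rows, each keeping only the columns one department analyzes. The row layout (`WarehouseRow`), the column choices in `MART_COLUMNS`, and `build_marts` are all hypothetical; real marts would typically also filter, aggregate, or restructure the data for their audience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarehouseRow:
    """One integrated row in a hypothetical corporate data warehouse."""
    customer_id: int
    region: str
    campaign: str
    revenue: float
    cost: float


# Hypothetical mart definitions: each department sees only the columns it analyzes.
MART_COLUMNS: dict[str, tuple[str, ...]] = {
    "marketing": ("customer_id", "campaign"),
    "sales": ("customer_id", "region", "revenue"),
    "finance": ("region", "revenue", "cost"),
}


def build_marts(warehouse: list[WarehouseRow]) -> dict[str, list[dict[str, object]]]:
    """Derive every data mart from the warehouse, the marts' single source."""
    return {
        mart: [{col: getattr(row, col) for col in cols} for row in warehouse]
        for mart, cols in MART_COLUMNS.items()
    }


warehouse = [
    WarehouseRow(1, "east", "spring", 120.0, 80.0),
    WarehouseRow(2, "west", "summer", 200.0, 150.0),
]
marts = build_marts(warehouse)
print(marts["finance"])
# [{'region': 'east', 'revenue': 120.0, 'cost': 80.0}, {'region': 'west', 'revenue': 200.0, 'cost': 150.0}]
```

Because the marts are rebuilt from the warehouse rather than fed directly by the applications, every department works from the same integrated values.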


The data lake contains a variety of data. Some of the data found in the data lake are
archival data; other data are simply bulk data. It is also possible to build a bulk data
warehouse in the data lake, and the bulk data warehouse may in turn contain a bulk data
vault. The bulk data warehouse is the single version of the truth for bulk amounts of
data.


The data ponds are the subsets of the data lake that are set aside for different purposes.
There may be an archival data pond, a litigation support data pond, a general purpose
data pond, a manufacturing data pond, an analog data pond, and so forth.
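One way to picture data ponds is as purpose-specific subsets carved out of the lake. The minimal Python sketch below assumes, hypothetically, that each lake record carries a `purpose` tag and that each record belongs to exactly one pond, with untagged records falling into the general purpose pond. The text says only that ponds are subsets set aside for different purposes, so the tagging scheme, the disjoint split, and the name `split_into_ponds` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical lake records; the "purpose" tag is an assumption of this sketch.
lake = [
    {"id": 1, "purpose": "archival", "payload": "2009 ledger"},
    {"id": 2, "purpose": "litigation", "payload": "contract emails"},
    {"id": 3, "purpose": "manufacturing", "payload": "line sensor readings"},
    {"id": 4, "purpose": "archival", "payload": "2010 ledger"},
    {"id": 5, "payload": "miscellaneous export"},  # untagged record
]


def split_into_ponds(records, default="general"):
    """Group lake records into ponds by purpose; untagged ones go to `default`."""
    ponds = defaultdict(list)
    for rec in records:
        ponds[rec.get("purpose", default)].append(rec)
    return dict(ponds)


ponds = split_into_ponds(lake)
for name, recs in sorted(ponds.items()):
    print(name, [r["id"] for r in recs])
```

If a record must be available to more than one pond (for example, both archival and litigation support), the grouping would append it under several tags instead of one.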


Shaping the Data Through Models


Chapter 2.1: The End-State Architecture—The “World Map”