Managing Information Technology

(Frankie) #1
Chapter 5 • Enterprise Systems 203

produce customized reports from the data warehouse, per-
haps on a regular basis, as well as query the data ware-
house to answer specific questions. Data warehouses can
be built around a traditional database management system,
such as Oracle or IBM DB2, by using add-on data ware-
housing software tools, but increasingly organizations are
turning to comprehensive data warehousing software pack-
ages or to data warehousing “appliances” to handle their
data warehouses.
The comprehensive data warehousing software
packages include IBM InfoSphere Warehouse, Informatica
Platform, Microsoft SQL Server, Oracle Data Integrator
Enterprise Edition, and SAS/Warehouse Administrator. A
data warehousing applianceis a packaged solution
consisting of hardware (i.e., server, storage) and software
(i.e., operating system, database management system,
other data warehousing software), where the software has
been specifically pre-installed and pre-optimized for data
warehousing. Data warehousing appliances on the market
include Hewlett-Packard NeoView, IBM InfoSphere
Balanced Warehouse, Netezza TwinFin, Oracle’s Sun
Oracle Database Machine, Sybase IQ, and Teradata Data
Warehouse Appliance. Note that some data warehousing
appliances, such as Sybase IQ, employ third-party
hardware.
In addition to appliances, another new development
in the data warehousing scene is column-store databases.
Consider a customer database, where each customer’s
record occupies one row, and where each column contains
the same attribute, such as customer name or customer zip
code, for each customer. When this database is stored in
memory or on the hard drive as a traditional, row-store
database, all the attributes for the first customer (or row)
are serialized together, followed by all the attributes for the
second customer, and so on. When the same database is
stored in column-store format, then all the values for the
first attribute (or column), say customer names, are
serialized together, followed by all the values for the
second attribute, say customer zip code, and so on. The
traditional row-store approach tends to be more efficient in
transaction processing, because entire new rows are
created and existing rows are modified. However, in
queries that are only concerned with a limited number of
columns (attributes), such as producing a report of sales by
zip code, a column-store approach is more efficient. In this
case, only the zip code and sales columns need to be
accessed, and all the values for zip codes are stored together,
as are all the values for sales. Thus, operational databases,
which are used primarily for transaction processing, are
almost always row-store, while data warehouses, which
are used for a variety of purposes—including querying—
might be either row-store or column-store, depending on


the mix of uses of the warehouse (Henschen, 2008).
Column-store data warehouse solutions include Infobright
(based in Canada), ParAccel Analytic Database, Sybase
IQ, and Vertica Analytic Database.
For data warehouses built around traditional
database management systems, there is a wide variety of
software tools available in the warehouse access and
analysis area. While comprehensive data warehousing
software packages include some access and analysis
tools, some users need different types of analysis tools
and choose to add another package. Among the many
analysis packages available are Computer Associates’
CA ERwin Data Modeler, Information Builders’
WebFOCUS, Microstrategy 9, Oracle Data Mining, and
SAS Institute’s Enterprise Miner and SAS Visual BI. We
will defer further consideration of these analysis tools
until the next chapter, when we consider decision
support systems, data mining, executive information
systems, and especially business intelligence systems in
more detail. In our judgment, creation and maintenance
of the data warehouse is an enterprise system, while
these end-user reporting and analysis tools are designed
for management support—the topic of Chapter 6.
Data warehousing is being used successfully by
organizations of all shapes and sizes. Let us consider some
examples. The U.S. Postal Service has assembled a
gigantic 32-terabyte data warehouse, and is using the
system to analyze many areas of its business, including
sales at individual post offices, the efficiency of mail-
processing facilities, and the use of manpower and
transportation resources. The data warehouse is based on
hardware and software from Teradata. The warehouse
collects retail data from 37,000 post offices, data from
mail-processing facilities, package-tracking data, air-
transportation data, and data from the Postal Service’s ERP
and CRM applications. At present, the data warehouse
generates about 20,000 reports for 1,800 users every day,
using software from Microstrategy; the number of reports
is expected to grow to 60,000 reports for more than 5,000
users, according to Wayne Grimes, Customer-Care
Operations Manager for the Postal Service. The data
warehouse provides the Postal Service with a much clearer
picture of its finances and operations. In the past, it took
three to four months to close the books at the end of the
fiscal year, but last year—using the data warehouse—it
took less than five weeks (Whiting, 2005).
Walmart operates a massive data warehouse
containing—as of January 2006—583 terabytes of sales
and inventory data. The data warehouse is built on a
massively parallel 1,000-processor system from Teradata.
“Our database grows because we capture data on every
item, for every customer, for every store, every day,” says
Free download pdf