Managing Information Technology

Chapter 5 • Enterprise Systems 203

produce customized reports from the data warehouse, per-
haps on a regular basis, as well as query the data ware-
house to answer specific questions. Data warehouses can
be built around a traditional database management system,
such as Oracle or IBM DB2, by using add-on data ware-
housing software tools, but increasingly organizations are
turning to comprehensive data warehousing software pack-
ages or to data warehousing “appliances” to handle their
data warehouses.
The comprehensive data warehousing software
packages include IBM InfoSphere Warehouse, Informatica
Platform, Microsoft SQL Server, Oracle Data Integrator
Enterprise Edition, and SAS/Warehouse Administrator. A
data warehousing applianceis a packaged solution
consisting of hardware (i.e., server, storage) and software
(i.e., operating system, database management system,
other data warehousing software), where the software has
been specifically pre-installed and pre-optimized for data
warehousing. Data warehousing appliances on the market
include Hewlett-Packard NeoView, IBM InfoSphere
Balanced Warehouse, Netezza TwinFin, Oracle’s Sun
Oracle Database Machine, Sybase IQ, and Teradata Data
Warehouse Appliance. Note that some data warehousing
appliances, such as Sybase IQ, employ third-party
hardware.
In addition to appliances, another new development
in the data warehousing scene is column-store databases.
Consider a customer database, where each customer’s
record occupies one row, and where each column contains
the same attribute, such as customer name or customer zip
code, for each customer. When this database is stored in
memory or on the hard drive as a traditional, row-store
database, all the attributes for the first customer (or row)
are serialized together, followed by all the attributes for the
second customer, and so on. When the same database is
stored in column-store format, then all the values for the
first attribute (or column), say customer names, are
serialized together, followed by all the values for the
second attribute, say customer zip code, and so on. The
traditional row-store approach tends to be more efficient in
transaction processing, because entire new rows are
created and existing rows are modified. However, in
queries that are only concerned with a limited number of
columns (attributes), such as producing a report of sales by
zip code, a column-store approach is more efficient. In this
case, only the zip code and sales columns need to be
accessed, and all the values for zip codes are stored together,
as are all the values for sales. Thus, operational databases,
which are used primarily for transaction processing, are
almost always row-store, while data warehouses, which
are used for a variety of purposes—including querying—
might be either row-store or column-store, depending on

the mix of uses of the warehouse (Henschen, 2008). Column-store data warehouse solutions include Infobright (based in Canada), ParAccel Analytic Database, Sybase IQ, and Vertica Analytic Database. For data warehouses built around traditional database management systems, there is a wide variety of software tools available in the warehouse access and analysis area. While comprehensive data warehousing software packages include some access and analysis tools, some users need different types of analysis tools and choose to add another package. Among the many analysis packages available are Computer Associates’ CA ERwin Data Modeler, Information Builders’ WebFOCUS, Microstrategy 9, Oracle Data Mining, and SAS Institute’s Enterprise Miner and SAS Visual BI. We will defer further consideration of these analysis tools until the next chapter, when we consider decision support systems, data mining, executive information systems, and especially business intelligence systems in more detail. In our judgment, creation and maintenance of the data warehouse is an enterprise system, while these end-user reporting and analysis tools are designed for management support—the topic of Chapter 6. Data warehousing is being used successfully by organizations of all shapes and sizes. Let us consider some examples. The U.S. Postal Service has assembled a gigantic 32-terabyte data warehouse, and is using the system to analyze many areas of its business, including sales at individual post offices, the efficiency of mail- processing facilities, and the use of manpower and transportation resources. The data warehouse is based on hardware and software from Teradata. The warehouse collects retail data from 37,000 post offices, data from mail-processing facilities, package-tracking data, air- transportation data, and data from the Postal Service’s ERP and CRM applications. At present, the data warehouse generates about 20,000 reports for 1,800 users every day, using software from Microstrategy; the number of reports is expected to grow to 60,000 reports for more than 5,000 users, according to Wayne Grimes, Customer-Care Operations Manager for the Postal Service. The data warehouse provides the Postal Service with a much clearer picture of its finances and operations. In the past, it took three to four months to close the books at the end of the fiscal year, but last year—using the data warehouse—it took less than five weeks (Whiting, 2005). Walmart operates a massive data warehouse containing—as of January 2006—583 terabytes of sales and inventory data. The data warehouse is built on a massively parallel 1,000-processor system from Teradata. “Our database grows because we capture data on every item, for every customer, for every store, every day,” says

Managing Information Technology

Get our desktop app

Company

Features

Documentation

Resources