Once upon a time– which in the fast-moving technology industry is only a few years ago– the data warehouse was marketed as the ultimate MIS solution. It was an appealingly simple concept: If you wanted to be able to analyze your entire business, you should have access to your entire business’s data. That meant not simply local or regional summaries but extremely, even excruciatingly, detailed transaction data. The data warehouse was also closely associated with many firms’ enthusiasm for multifunctional enterprise systems, such as SAP’s R/3. As new transactions were captured by an enterprise system, they were immediately stored in a central data warehouse.
However, as many firms took the data warehousing concept to its logical extreme, some problems became apparent. First, these warehouses quickly became behemoths, sometimes taking up several terabytes of disk space, often containing hundreds of millions of low- level data records. These large databases pushed the limits of existing client/server hardware.
Second, because a data warehouse is by nature everything to everybody, these all-inclusive warehouses were themselves pushed to the limit by numerous queries and requests for information. The bottom line? If you can’t easily get information out of the data warehouse, in a format you can use, it’s not particularly useful as a decision-making tool.
For these reasons and more, data marts are gaining in popularity. In general terms, a data mart is a special-purpose database that contains a “slice” of the data warehouse that is of interest to a particular user or group of users–a finance department, a manufacturing plant, a marketing group. These data marts may simply be smaller relational databases that contain selected transaction- level information. Or, they may be multidimensional databases that store data summarized according to a variety of categories, or dimensions.
Indeed, there is a certain amount of confusion surrounding what, exactly, makes a data mart. “Some purists will say that a data mart must be a relational database,” says Steve Hurrell, a product marketing manager for Oracle Corp., in Redwood Shores, California. New types of data marts are emerging, such as the “virtual” data mart, in which information from a variety of sources may be collected online and stored in RAM for analysis; and the “disposable” data mart, in which data marts can be rapidly built–and just as rapidly discarded–for one-time projects and analyses. Still another connotation is beginning to catch on: the data mart as an enormous divisional data source used for large- scale data mining.
For finance and information technology managers, the challenge will be to wade through all the jargon and hype and identify which technologies can support their business goals.
Size isn’t necessarily a distinguishing characteristic of data marts. Although they typically contain less than 50GB of data, it’s common to find data marts ranging upward of 250GB, and experts see terabyte marts on the horizon. Specificity of function is a distinguishing characteristic, however. So are rapid implementation and relatively low cost. It may take more than $3 million and three years to build a multifunctional enterprise data warehouse, compared with, say, $200,000 and two months for a large data mart.
The function, implementation time, and cost differentials mean that a data mart is a far less risky project than an enterprise data warehouse. Earl Haddad & Associates, a consulting firm, estimates that 85 percent of data warehouses built before 1996 failed to live up to expectations. Why don’t data warehouses deliver? One reason is that many corporations took a “field of dreams” approach to warehousing: if you built it, users would come. Frequently, users didn’t show up, though, and companies ended up with what Forrester Research, in Cambridge, Massachusetts, has derisively termed “data basements.”
Data warehouses have other problems. Such top- down, cross-functional, multiyear technology projects put immense strain on a corporate IT function. They have greater exposure to project snafus, technological obsolescence, business changes, and even corporate politics. In contrast, data marts represent a less autocratic, more democratic approach to data warehousing, argues Paul Albright, vice president of marketing for Informatica Corp., a Menlo Park, California-based software vendor.
“Budgets of large companies have migrated toward the line of business,” where data marts are used, says Albright. “Knowledge workers are in the lines of business.” He asserts that for the price of one galactic data warehouse, a half-dozen data marts can be built concurrently for the same price– and in much less time.
A Dimensional Debate
The functional purpose of a data mart is to enhance decision-support reporting. By storing information at a less global level than the typical data warehouse does, the data mart allows users to gain more-rapid access to task- specific information. Data marts may also allow for more-flexible data formatting. While a number of vendors and consultants do not consider multidimensional databases to be true data marts, multidimensional products such as Arbor Software’s Essbase are extremely prevalent in corporate finance departments and they do, in fact, meet the functional definition of data mart.
These multidimensional data marts use a data model that describes how users want to see information summarized, in order to assemble a local database from information stored in a data warehouse, or within multiple data sources. This sort of data mart is particularly useful for financial analysis, including newfangled measures such as economic value added and activity-based costing.
But summarized data is a limitation, as well as a strength. Power users who want to drill down to the transaction level will have to go elsewhere. If a user community subsequently decides it needs to view data in a new way, or if a business undergoes significant change, the database must be rebuilt to accommodate that view or change–no easy task. Also, the “cube” structure of a multidimensional database puts an effective ceiling on the size of the database (a ceiling that can vary considerably, depending on whether you’re talking to an analyst or a marketing representative).
Relational databases, in contrast, hold transaction-level data and scale up to data warehouse size. So-called ROLAP (relational online analytic processing) tools can perform multidimensional analysis on relational data, although some consultants argue that their performance falls short of MOLAP (multidimensional OLAP) tools.
Scalability enables companies to use data marts as an intermediate step toward building a full-fledged data warehouse. In this case, a collection of relational data marts may be used to feed a larger data warehouse. These data marts, in turn, may be fed by a collection of legacy systems and local, special-purpose applications. Multiple, interdependent relational data marts and multidimensional data marts can– and do– coexist.
To keep all these data sour-ces in sync, most firms use some sort of data model or metadata layer. Metadata simply means data about data. In the case of multidimensional data sources, it’s a set of instructions that explain where data should be collected and how it should be combined.
A number of vendors offer tool kits that facilitate the rapid development of multisource data marts without the hassle associated with building and maintaining custom interface programs. These data-marts-in- a-box include a firmwide metadata layer designed to track how information is being stored in one or more data marts. And they promise easy access to a variety of data sources, such as spreadsheets, mainframe computers, and relational databases.
For instance, Informatica sells a suite of client/server software tools, called PowerMart, that enables users to roll out a data mart in 30 to 45 days, says Albright. A typical installation runs $200,000 to $300,000, including hardware and services. Similar suites are sold by IBM (Visual Warehouse), Information Builders (SmartMart), SAS Institute (SAS Warehouse Administrator, CFO Vision), Platinum Technology (Repository, InfoRefiner, InfoPump), and Sagent (Sagent Data Mart Solution).
CrossZ, a software development company based in Uniondale, New York, has taken the just-add- water approach to data marts a step further with its Voyager software. The product allows users to define a business goal–say, fraud detection or acquiring new customers–through a user-friendly interface. The system then searches preselected data sources for relevant information, evaluates the quality of this information, and helps the user to rapidly build a data model and a small desktop-level data mart designed specifically for the task at hand.
“Because Voyager data marts can be built literally within a single morning, users can dynamically create multidimensional data marts on an as-needed basis,” says CrossZ vice president Matthew Doering. These data marts can also be quickly discarded. For finance users, the disposable data mart can save a great deal of time; users can create the data source they need rather than put in a special request to IT.
Data marts that draw to a large extent on a data warehouse must remain in sync with the warehouse as it evolves. Because enterprise- application and -database vendors are constantly improving (or fixing) their products, the format of data within the warehouse may also change frequently. For example, a payment table may be updated to add a cost center code. However, the data mart’s interface program that grabs information from the warehouse will not “recognize” the updated table with the extra field until it, too, is updated.
Therefore, every time tables in the data warehouse are changed, it is necessary for data mart users and IT support staff to identify these changes, update their data model or interface program, and recreate the data mart. This can be a serious headache for those who depend on data marts for key analysis.
For companies whose data marts will be populated directly from an enterprise-level system, the solution may be at hand from the system vendor, in the form of a “prefab” data mart. Such marts are automatically updated as the source database is updated. They offer the advantages of rapid implementation, seamless interfaces, and automatic data transfer.
Steve Cranford, partner in charge of the data warehousing consulting practice at KPMG Peat Marwick LLP, in Annapolis, Maryland, thinks this prefab approach will push data warehousing in a commodity-driven direction. “A lot of the integration issues have been solved,” he says. “There are a lot of business-rich applications now.”
Among the vendors currently offering such integration are Oracle and Lawson Software. Oracle’s Hurrell says that Financial Analyzer, a user-defined multi-dimensional data model and suite of applications, can easily be integrated with the Oracle enterprise financial system. “The multidimensional data mart can be updated quickly and automatically every day, week, or month, as the user requires,” he says.
Lawson’s enterprisewide transaction proces- sing system provides users with generic multidimensional data models that are compatible with the system’s structure. As the Lawson data warehouse is updated, users receive updated data models. Adam Thier, vice president of product marketing, explains that these data models are used to create a multidimensional database in Arbor Software’s Essbase product.
“We like the seamless interface between the Lawson system and our Essbase data mart,” says Tom Cook, vice president and CIO of Tejas Gas Corp., a $2.2 billion natural gas pipeline company based in Houston. Tejas Gas will use Lawson’s data models to import data into Essbase, which in turn will be used to consolidate intercompany transactions for financial reporting purposes.
But users of enterprise systems don’t have to settle for the system’s prefab mart. At San Franciscobased Pacific Gas & Electric Co., a multidimensional data mart supplements SAP’s financial modules. “Our budget analysts in the field wanted more information than we could get from the preformatted reports that come with the company’s SAP R/3 financial system,” says Jim Burke, a senior accounting analyst for the giant utility. “We also wanted to be able to perform flexible queries that would let us track down the source of any anomalous information.”
The next release of R/3 Pacific Gas is scheduled to receive will include an executive information module. But in the interim, the analysts will use a multidimensional database in conjunction with an OLAP query tool that will allow them to pull in data from the R/3 data warehouse and perform more-flexible analyses. The tool selected was Fiscal, offered by Toronto-based Lingo, which includes both a data model that can be used to build and populate a multidimensional database and a suite of financial reporting and flexible query tools.
Multidimensional data marts like Fiscal can be implemented by an organization’s finance staff without a great deal of involvement from the IT group. Also, they allow users to easily create their own reports. Says Burke: “About 80 percent of the reports we need exist within R/3’s extensive library of canned reports. But it’s the other 20 percent of reports that absorbed the lion’s share of time and energy. Finance staff had to ask the IT group to build custom programs for what were often one-time- only reports.”
The first step in implementing Fiscal, says Burke, was to build a data model to represent the relationships among the various data items to be stored within the multidimensional database. During this phase, Burke also identified the dimensions by which his group wanted to slice and dice data, including time period, cost center, planned quantity, actual quantity, organizational hierarchy, and order hierarchy. This phase was begun in late March; the initial rollout followed in June.
Pacific Gas & Electric hopes to reap numerous benefits from the data mart, says Burke. For example, reports plotting monthly employee overtime according to cost center will allow for more-accurate planning. Likewise, finance will use the mart to develop reports to help executives consider how the recent deregulation of California’s power industry will affect the utility over the long term. These reports will include comparisons between PG&E’s costs and its competitors’ costs, based on publicly available data.
Distributed marts, virtual marts, and the intranet
A weakness of the independent data mart approach is architectural. Many experts insist that unless data marts are built with the goal of creating a multisubject, enterprisewide data warehouse–sharing the same data structures and database management systems– they will simply end up replicating the “islands of information” problem that data warehousing was supposed to solve. Many companies have heeded the experts’ advice and are building enterprisewide data warehouses one data mart at a time, says KPMG’s Cranford.
Last month, Informatica released a new tool kit that uses a slightly different approach to connect data marts into an enterprisewide structure, which it calls distributed (or networked) data marts. Like traditional data warehouses, the marts rely on a central repository of data. However, the metadata and application logic are maintained in a separate repository, which simplifies construction of data marts while keeping metadata centralized, says marketing vice president Albright.
Meanwhile, Hyperion Software will soon offer applications that will allow users to combine information from multiple sources online. Says John Colbert, director of product marketing for the Stamford, Connecticut-based company: “These applications will copy information and allow users to view it and manipulate it in RAM. They will be able to perform advanced, custom-tailored financial analysis without having to create another physical data source.” This sort of “virtual” data mart could provide all the advantages of multiple data marts while minimizing systems administration.
In the future, many finance executives will want to provide Web access to information stored within data marts. This may include giving customers access to account and pricing information through a secure intranet or extranet and allowing multiple internal users and multiple locations to access a data mart online. To prepare for that future, nearly every OLAP vendor offers or plans to offer Web- access modules.
But there remain serious bandwidth challenges to moving large amounts of data and serving hundreds of concurrent users across an intranet. One day, perhaps soon, intranets will be the killer app for the distribution of data–but that day hasn’t arrived yet. *
Data Marts at a Glance
A sampling of software vendors and products
Arbor Software * Essbase * www.arborsoft.com
CrossZ * Voyager * www.crossz.com
Hyperion Software * www.hyperion.com
IBM * IBM Visual Warehouse * www.software.ibm.com/data/warehouse/vw/
Informatica * PowerMart * www.informatica.com
Information Builders * SmartMart * www.informationbuilders.com
Lawson Software * Lawson Data Warehouse * www.lawson.com
Lingo Computer Design * Fiscal * www.ilap.com/lingo
Oracle * Oracle Data Marts * www.oracle.com
Platinum Technology * Repository, InfoRefiner, InfoPump * www.platinum.com
Sagent Technology * Sagent Data Mart Solution * www.sagent.com
SAS Institute * SAS Warehouse Administrator, CFO Vision * www.sas.com
SAP * www.sap.com