CFO Magazine

Panning for Internet Gold

E-businesses are analyzing the data streams flowing through Web sites.

December 1, 1999

Data warehousing, a cluster of technologies that can deliver dazzling insights to decision makers and chronic headaches to IT staffers, has found a fertile new field of application: the World Wide Web. A small but growing number of companies are using "webhousing" to analyze the enormous volumes of traffic streaming daily through their Web sites.

Some of the most familiar names in data warehousing and analytic software — Hyperion, IBM, Oracle, Sagent, and SAS Institute, to name five — are applying their expertise to this burgeoning market. They've been joined by fledgling software providers that offer Web server log analysis, such as WebTrends and NetGenesis; and by CRM (customer relationship management) and E-commerce software vendors, such as BroadVision. As for the customers, dot-coms and media companies, such as CDNow, AutoTrader.com, and The New York Times Co., are leading the way.

What are these companies looking for? In the brick-and-mortar world, data warehousing analyzes trends in sales data and develops customer profiles. Transactions and profiles are important in cyberspace, too, but users of webhousing (sometimes referred to as clickstream data warehousing) are especially interested in patterns of online behavior.

They start with path analysis — how surfers navigate sites. Which pages do visitors choose, and which do they linger on longest? Web designers can use this information to redesign sites for maximum utility. Many companies are now advertising online; how can they measure the effectiveness of that investment? Through path and click-through analysis, E-marketers can determine which banner ads deliver the most bang for the buck. They can identify the best sites, and the best pages on those sites, for placing those ads.
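A minimal sketch of path analysis in Python follows. It assumes the clickstream has already been sessionized into time-ordered (seconds, page) pairs. It counts page-to-page transitions and estimates how long visitors linger on each page. The page names and timings are invented for illustration. Measuring dwell time as the gap until the next page view is an assumed convention, not one the article specifies.

```python
from collections import Counter, defaultdict

# A session is a time-ordered list of (seconds_since_visit_start, page) pairs.
# The sample data below is hypothetical.
Session = list[tuple[int, str]]


def path_analysis(sessions: list[Session]) -> tuple[Counter, dict[str, float]]:
    """Count page-to-page transitions and the average dwell time per page."""
    transitions: Counter = Counter()
    dwell_total: defaultdict = defaultdict(float)
    dwell_count: Counter = Counter()
    for session in sessions:
        for (t1, page1), (t2, page2) in zip(session, session[1:]):
            transitions[(page1, page2)] += 1
            # Dwell time is the gap until the next page view, so the last
            # page of each session contributes no dwell-time estimate.
            dwell_total[page1] += t2 - t1
            dwell_count[page1] += 1
    avg_dwell = {page: dwell_total[page] / dwell_count[page] for page in dwell_count}
    return transitions, avg_dwell


sessions = [
    [(0, "/home"), (20, "/cds"), (140, "/cds/jazz"), (200, "/cart")],
    [(0, "/home"), (10, "/cds"), (70, "/cart")],
    [(0, "/home"), (45, "/news")],
]
transitions, avg_dwell = path_analysis(sessions)
print(transitions.most_common(1))  # the most frequently followed link
print(avg_dwell)
```

The same transition counts, restricted to clicks that start on a banner ad, would give a rough click-through comparison between ad placements.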

Webhousing can calculate the value of affiliations — which portals and search engines deliver the best customers. And it can measure newfangled Internet metrics such as page views, click-throughs, and "stickiness" (the length of time surfers stay on a site). The new breed of webhousing tools, in short, makes online ventures "feel less like a shot in the dark," says Rick Ratliff, director of the new media division at Detroit Newspapers.
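Once visits are reconstructed, these metrics can be tallied for each referring site. The sketch below assumes each visit records the referrer it arrived from, its page-view timestamps, and whether it ended in a purchase. The referrer names are hypothetical. Stickiness is approximated as the span from the first to the last page view, because a server log never shows when a visitor actually leaves.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    referrer: str          # portal or search engine the visitor came from
    timestamps: list[int]  # page-view times in seconds, in order (at least one)
    purchased: bool        # whether the visit ended in a sale


def affiliate_metrics(visits: list[Visit]) -> dict[str, dict[str, float]]:
    """Summarize page views, stickiness, and conversions for each referrer."""
    by_referrer: defaultdict = defaultdict(list)
    for visit in visits:
        by_referrer[visit.referrer].append(visit)
    report = {}
    for referrer, group in by_referrer.items():
        n = len(group)
        report[referrer] = {
            "visits": n,
            "page_views_per_visit": sum(len(v.timestamps) for v in group) / n,
            # Approximate stickiness: time from first to last page view.
            "avg_stickiness_s": sum(v.timestamps[-1] - v.timestamps[0] for v in group) / n,
            "conversion_rate": sum(v.purchased for v in group) / n,
        }
    return report


visits = [
    Visit("portal-a", [0, 30, 90, 300], True),
    Visit("portal-a", [0, 15], False),
    Visit("search-b", [0], False),
    Visit("search-b", [0, 60, 120], False),
]
for referrer, stats in affiliate_metrics(visits).items():
    print(referrer, stats)
```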

Up Close and Personal
Companies like Detroit Newspapers focus on the online behavior of groups, not individuals, to determine content affinities. But when a Web visitor registers or buys something, a profile can be developed, and personalization becomes possible. "You can redecorate your Web site according to customer preferences," says Michael Howard, vice president of the data warehouse program office at Oracle Corp., in Redwood Shores, California. Knowing the customer will make cross-selling and upselling feasible, and enable an E-business to tailor content and prices to its most valuable customers.
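A simple way to measure content affinities at the group level is to count how often two site sections are read in the same session, without identifying who read them. This sketch assumes sessions are lists of URL paths whose first path segment names a section. That URL layout and the newspaper-style sample paths are hypothetical.

```python
from collections import Counter
from itertools import combinations


def section_affinities(sessions: list[list[str]]) -> Counter:
    """Count sessions in which each pair of site sections was read together."""
    pairs: Counter = Counter()
    for paths in sessions:
        # "/sports/tigers" -> "sports"; a set, so repeat visits count once.
        sections = sorted({path.strip("/").split("/")[0] for path in paths})
        for pair in combinations(sections, 2):
            pairs[pair] += 1
    return pairs


sessions = [
    ["/sports/tigers", "/business/autos"],
    ["/sports/lions", "/business/stocks", "/sports/pistons"],
    ["/news/local"],
]
print(section_affinities(sessions).most_common())
```

Raw counts favor the busiest sections. A fuller analysis would normalize them, for example by each section's overall traffic.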

Online shopping-cart abandonment occurs more than 50 percent of the time a cart is used, according to Daniel Druker, general manager of Hyperion Solutions Corp.'s new E-Business Division. Why not send an E-mail to the would-be shopper and offer him a discount? Webhousing makes this doable. Admittedly, says Druker, some shoppers may be less thrilled by the savings than chilled by the surveillance. But "on the flip side, people are willing to have their shopping experience improved," he notes. Druker believes that concerns over privacy will abate as people become more used to the Internet "as a pervasive medium."
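Sessionized data can produce both the abandonment rate and the follow-up list Druker describes. This minimal sketch assumes each session is flagged for whether anything was added to the cart and whether checkout was completed. It also assumes an e-mail address is known only for registered visitors. The field names and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CartSession:
    visitor_id: str
    email: Optional[str]  # known only if the visitor registered
    added_to_cart: bool
    checked_out: bool


def cart_abandonment(sessions: list[CartSession]) -> tuple[float, list[str]]:
    """Return the share of used carts that were abandoned, and whom to e-mail."""
    used = [s for s in sessions if s.added_to_cart]
    abandoned = [s for s in used if not s.checked_out]
    rate = len(abandoned) / len(used) if used else 0.0
    # Only registered shoppers can be sent a discount offer.
    follow_up = [s.email for s in abandoned if s.email]
    return rate, follow_up


sessions = [
    CartSession("v1", "pat@example.com", True, True),
    CartSession("v2", "lee@example.com", True, False),
    CartSession("v3", None, True, False),
    CartSession("v4", None, False, False),
]
rate, follow_up = cart_abandonment(sessions)
print(f"abandonment rate: {rate:.0%}; follow up with: {follow_up}")
```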

The benefits of personalization can extend to the business-to-business sphere: which products does a corporate customer typically buy via the Web? Meanwhile, the same tools that optimize external Web sites can be applied to internal sites, points out Howard; intranets and portals can be made more user-friendly for knowledge workers. "Your whole company becomes that much more intelligent," he says.

That includes competitive intelligence. Analyzing Web-site traffic, companies can spot visitors from competing companies and identify the pages they view. According to an industry source, one technology company with a sizable Internet presence can reasonably guess when a rival is preparing to launch or upgrade a particular product. How? By observing the uptick in traffic from the rival's domain to the Web pages for the company's own, corresponding product.
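Spotting a rival's interest amounts to filtering hits by the visitor's domain and watching for a jump. This sketch assumes the log records a resolved hostname for each visitor through reverse DNS (many visitors would appear only as ISP addresses). It also assumes hits are already grouped by week. The domain, URL prefix, and threefold threshold are hypothetical choices, not the unnamed company's method.

```python
from collections import Counter

Hit = tuple[int, str, str]  # (week number, visitor hostname, requested path)


def rival_views(hits: list[Hit], rival_domain: str, product_prefix: str) -> Counter:
    """Weekly count of requests from a rival's domain to one product's pages."""
    weekly: Counter = Counter()
    for week, host, path in hits:
        from_rival = host == rival_domain or host.endswith("." + rival_domain)
        if from_rival and path.startswith(product_prefix):
            weekly[week] += 1
    return weekly


def flag_upticks(weekly: Counter, weeks: list[int], factor: float = 3.0) -> list[int]:
    """Flag weeks whose count exceeds `factor` times the average of prior weeks."""
    flagged = []
    for i in range(1, len(weeks)):
        baseline = sum(weekly[w] for w in weeks[:i]) / i
        # Floor the baseline at one hit so a handful of hits after a
        # silent stretch is not flagged.
        if weekly[weeks[i]] > factor * max(baseline, 1.0):
            flagged.append(weeks[i])
    return flagged


hits = [
    (1, "proxy.rival.example", "/products/router"),
    (1, "dialup.isp.example", "/products/router"),
    (2, "proxy.rival.example", "/products/router/specs"),
    (2, "proxy.rival.example", "/jobs"),
] + [(3, "proxy.rival.example", "/products/router")] * 5
weekly = rival_views(hits, "rival.example", "/products/router")
print(flag_upticks(weekly, weeks=[1, 2, 3]))
```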

Combinatorial Explosion
The essential trick of webhousing is to retrace the paths taken by individual Web visitors — to "sessionize" the raw data downloaded from Web servers. Like walking on one's hands across a football field, this is a straightforward task in theory, but difficult in practice.
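A minimal sessionizing sketch: sort hits by time, group them by a visitor key, and start a new session whenever a visitor is idle too long. Both choices are assumptions made for illustration. The key here is an IP address plus browser string. A cookie or login would be more reliable, and proxies can merge many people behind one address. The 30-minute timeout is a common rule of thumb, not a figure from the article.

```python
from collections import defaultdict

TIMEOUT_S = 30 * 60  # assumed idle gap, in seconds, that ends a session


def sessionize(hits, timeout=TIMEOUT_S):
    """Split (timestamp, visitor_key, url) hits into per-visitor sessions."""
    by_visitor = defaultdict(list)
    for ts, visitor, url in sorted(hits):  # untangle the interleaved log by time
        by_visitor[visitor].append((ts, url))
    sessions = []
    for visitor, events in by_visitor.items():
        current = [events[0]]
        for prev, event in zip(events, events[1:]):
            if event[0] - prev[0] > timeout:  # idle too long: close the session
                sessions.append((visitor, current))
                current = []
            current.append(event)
        sessions.append((visitor, current))
    return sessions


hits = [
    (0, "192.0.2.1 Mozilla/4.0", "/home"),
    (5, "192.0.2.2 Mozilla/4.0", "/home"),
    (30, "192.0.2.1 Mozilla/4.0", "/cds"),
    (40, "192.0.2.2 Mozilla/4.0", "/news"),
    (4000, "192.0.2.1 Mozilla/4.0", "/home"),
]
for visitor, session in sessionize(hits):
    print(visitor, session)
```

The first visitor's return after more than an hour of inactivity becomes a separate session. That boundary is exactly where the timeout assumption matters.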

Here's why. A Web server records hits, or requests for data, in a log. Clicking on a home page may result in 5 hits (four images plus HTML text) and five rows of log data. Clicking on another page on the site might result in another 10 hits. According to IBM, the average number of hits per page view is 5, and the average number of page views per visit is also 5. That's bad enough.
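Turning hits into page views is the first cleanup step. This sketch assumes the server writes the NCSA Common Log Format. It also treats requests for images, style sheets, and scripts as embedded objects rather than page views. That extension list is an assumption and would vary by site. For a home page that generates five hits (the HTML plus four images), the sketch keeps one page view.

```python
import re

# NCSA Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)')
EMBEDDED = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")  # assumed, site-specific


def page_views(lines: list[str]) -> list[tuple[str, str, str]]:
    """Keep successful GETs for pages; drop embedded objects and bad lines."""
    views = []
    for line in lines:
        match = CLF.match(line)
        if not match:
            continue  # skip malformed lines rather than fail the whole load
        host, when, method, path, status, _size = match.groups()
        resource = path.split("?")[0].lower()  # ignore any query string
        if method == "GET" and status.startswith("2") and not resource.endswith(EMBEDDED):
            views.append((host, when, path))
    return views


log = ['192.0.2.7 - - [01/Dec/1999:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 5120'] + [
    f'192.0.2.7 - - [01/Dec/1999:10:00:01 -0500] "GET /img/pic{i}.gif HTTP/1.0" 200 2048'
    for i in range(4)
]
print(len(log), "hits ->", page_views(log))
```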

It gets much, much worse. If, say, nine other surfers are using the Web site at the same time, the record of those original five hits is interspersed with data for the other visitors; one might have to comb through 200 or 300 pages of server log data to reconstruct the single page view of one visitor. Multiply these kinds of numbers by the amount of traffic a busy Web site gets, and the result is "a combinatorial explosion," says Lou Agosta, senior industry analyst at Giga Information Group and author of The Essential Guide to Data Warehousing (Prentice Hall).
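A rough count using IBM's averages falls in that range. It assumes all ten visitors make a typical five-page visit during the same stretch of the log:

```latex
\underbrace{10}_{\text{visitors}} \times \underbrace{5}_{\text{page views per visit}} \times \underbrace{5}_{\text{hits per page view}} = 250 \ \text{interleaved log rows}
```

Only 25 of those rows belong to any one visitor, and that share shrinks as concurrent traffic grows.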
