New uses are still emerging. EDS Corp. uses the technology to analyze comments from annual employee surveys and to examine thousands of supplier contracts, helping the purchasing department track contractual terms and discounts.
But No One Understands It
For all of its promise, however, the text-mining market is still minuscule. Actual market-size figures are hard to come by, but Chicago-based SPSS Inc., one of the major vendors in the category, claims to have around 1,000 text-mining customers, a mere fraction of the number of its more traditional data-mining clients.
And despite the successes, some analysts call the market's growth "disappointing." Part of the problem is that, vendor claims notwithstanding, using text mining effectively requires considerable skill. You must first know what the technology can do, and then how to act on the results. "It's very difficult to see whether you have a great text-mining opportunity at hand," says Alexander Linden, a research vice president at Stamford, Connecticut-based Gartner Inc. "Businesspeople don't understand the technology well. IT people don't even understand it well. And that's pretty much a recipe for a bad outlook."
Another factor behind the tepid response may be a lack of urgency. In an era of show-me-the-money budgeting decisions, other investments such as regulatory-compliance projects have come first. Says Linden: "Text mining has some good potential payback but it is not a make-or-break technology."
Even companies convinced that they need text mining right away may be scared off by the time and resources required to make it happen. The price of the software ranges from $50,000 to several million dollars, and it can take months to collect the necessary data and customize the software. While vendors can supply prefab dictionaries, adjustments are usually necessary. "You have to understand which words, phrases, and concepts are meaningful to your company and which aren't so you can tune the system to what's relevant," says Laura Ramos, a vice president at Cambridge, Massachusetts-based Forrester Research Inc.
Dr. Eric Bremer, director of pediatric brain-tumor research at Children's Memorial, says he spent more than a year getting his text-mining project up and running. First he had to download more than 150,000 journal articles into a database. Next he had to create a dictionary of gene names and convert all of the Greek symbols, which the literature represented graphically, into text. Then things got ugly. The lab's computers could process only about 5,000 articles in 24 hours. To get the necessary computing power, Bremer had to create a grid that siphons off unused processing capacity from other hospital computers. He now has a system that can process 100,000 articles in 24 hours.
The system is finally beginning to earn its keep. Earlier this year it identified a gene believed to be a marker for a particular type of tumor. If testing confirms that, treatments can be prescribed more accurately. The discovery has eased Bremer's mind about whether the text-mining setup headaches were worthwhile. "If I had realized up front what the costs would be, I might not have been willing to do the project," he says. "But we had such a backlog of data that I bit the bullet and spent the money. The rewards are worth it."
It doesn't take a life-and-death situation to win over other converts. Johnson Controls chief information officer Sam Valanju says of his text-mining system: "It has definitely improved product quality. The returns are intangible, but they are definitely there."
Yasmin Ghahremani writes about business and technology from New York.
Parsing the Text Market
The federal government has been a major growth driver for the text-mining market. The Central Intelligence Agency and other federal agencies have long had electronic tools for finding information on terrorist activities, but those largely relied on structured data. Since 9/11, the intelligence community has sought to increase its ability to mine E-mail, chat rooms, field reports, newspaper articles, and other text sources. In-Q-Tel, the CIA's investment arm, has provided financial backing for Attensity Corp., Inxight Software Inc., and Intelliseek Inc., among others.
Cincinnati-based Intelliseek has since sold off its CIA-backed business, but acknowledges the agency's contribution to the market's development. "They definitely catalyzed the growth of text mining between 2001 and today," says Sundar Kadayam, chief technology officer at Intelliseek. "They have very specific, discernible, and immediate needs to assimilate large volumes of data."
Today, the text-mining vendor market breaks down into roughly three categories:
Specialized text-mining firms. Intelliseek, Inxight Software, Intelligenxia, ClearForest, and Attensity. These innovators are prime candidates for acquisition.
Data-mining vendors. SAS Institute and SPSS have added text mining to their portfolios. They lead the market.
Database vendors. IBM, Oracle, and Microsoft have incorporated some text-mining capabilities into their database and software infrastructure products. Customers seeking complex features must look elsewhere, however. — Y.G.