The Joy of Text

Valuable information lurks in “unstructured” data, and new tools can help companies extract it.
Yasmin Ghahremani
January 24, 2006

Randy Collica is a modern-day treasure hunter. As a senior business analyst in Palo Alto–based Hewlett-Packard Co.’s customer data and knowledge services department, his job is to mine data in search of insight that can help marketers better understand various customer segments. He stumbled upon a veritable gold mine a few years ago as he riffled through notes taken by HP’s call-center representatives. “I just knew there had to be nuggets of valuable information in there, given the volume of data we had,” says Collica. “But I also knew that finding them would be impossible if we didn’t have a tool to automate the analysis.”

Although standard data-mining systems can detect patterns hidden within structured tables of information, such as the transactional data of an ERP system, they are essentially useless with unstructured data — and notes taken during a phone call are about as unstructured as data gets. So Collica turned to text mining, a type of data-mining technology that combs through text and gives it structure so it can be analyzed.

Collica’s hunch turned out to be right: text mining revealed, as one example, that customers in lower-value segments ask a lot more questions about business processes, such as HP’s contract-negotiation procedures, than do the company’s best customers. “That insight has been invaluable in helping marketers come up with solutions and campaigns targeted at different customer groups,” says Collica.

The latest generation of technology, developed by vendors flush with post-9/11 government investment (see “Parsing the Text Market” at the end of this article), is still far from perfect. But it is allowing corporations with large data sets to perform important feats they couldn’t before. “It really is the next frontier of understanding in business intelligence,” says Martin Schneider, an analyst at The 451 Group in New York.

Key to the improvements have been advances in natural language processing, a method of extracting meaning from printed words that now allows the software to “understand” complex phrases about 80 percent of the time. Text-mining systems can also be programmed to assign value to expressions. Suppose a telesales representative has entered the following note: “Nov. 15 – Cstmr not happy w/cell phne. Wants to switch to Yellow Inc.” The software can recognize that November 15 is a date; that “cstmr” is a customer; that he has a cell phone and is unhappy, which is bad; and that he wants to switch to a competitor, which is worse.

Once that kind of information is extracted, it can be structured in a format similar to a database and further analyzed, often more quickly than a human analyst can locate his reading glasses.
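To make the idea concrete, here is a minimal Python sketch of how the telesales note quoted above might be turned into a database-style record. It is an illustration only, not how any vendor's product works: the abbreviation table, the list of negative phrases, the competitor list, and the function name structure_note are all assumptions invented for this example.

```python
import re

# Hypothetical lookup tables for illustration; a real deployment would tune
# these to the company's own vocabulary.
ABBREVIATIONS = {"cstmr": "customer", "phne": "phone", "w/": "with "}
NEGATIVE_PHRASES = ("not happy", "unhappy")
COMPETITORS = ("Yellow Inc.",)
DATE_PATTERN = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? (\d{1,2})\b"
)


def structure_note(note):
    """Turn one free-text call note into a flat, database-style record."""
    # Expand shorthand so the later checks see ordinary words.
    text = note
    for short, full in ABBREVIATIONS.items():
        text = re.sub(re.escape(short), full, text, flags=re.IGNORECASE)

    date_match = DATE_PATTERN.search(text)
    lowered = text.lower()
    competitor = next((c for c in COMPETITORS if c.lower() in lowered), None)

    return {
        "month": date_match.group(1) if date_match else None,
        "day": int(date_match.group(2)) if date_match else None,
        "unhappy": any(p in lowered for p in NEGATIVE_PHRASES),
        "competitor": competitor,
        # A competitor mention plus the word "switch" is flagged as the worse case.
        "wants_to_switch": competitor is not None and "switch" in lowered,
        "expanded_text": text,
    }


record = structure_note(
    "Nov. 15 – Cstmr not happy w/cell phne. Wants to switch to Yellow Inc."
)
print(record)
```

Each key in the resulting record corresponds to a column that a conventional data-mining tool could then count, filter, or correlate.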

And the possibilities aren’t limited to customer service. San Francisco–based LoanPerformance, a provider of credit-risk-decision support tools for residential mortgage operators, uses text mining to offer its clients improved predictive analytics. Traditional risk-scoring solutions for loss mitigation and delinquency management incorporate only structured data such as a borrower’s interest rate, outstanding balance, and monthly payments. That ignores rich information that could help a mortgage servicer better determine how likely a delinquent borrower is to miss more payments or, ultimately, default. “If someone says they missed a payment because they lost their job, that’s different from ‘I forgot to send my check,’” explains Damien Weldon, director of mixed-data analytics at LoanPerformance. When the company included data mined from call-center conversations in its scoring calculations, accuracy rose by 15 to 20 percent.
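The distinction Weldon draws can be sketched as a text-derived field added to an otherwise structured loan record. The Python below is a rough illustration under stated assumptions, not LoanPerformance's method: the keyword lists, the category names, the field name missed_payment_reason, and the sample loan values are all hypothetical.

```python
# Hypothetical keyword lists; the categories are illustrative only.
REASON_KEYWORDS = {
    "hardship": ("lost their job", "lost my job", "laid off"),
    "oversight": ("forgot", "in the mail"),
}


def classify_reason(comment):
    """Map a borrower's stated reason to a coarse category."""
    lowered = comment.lower()
    for category, keywords in REASON_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"


def add_text_feature(loan, comment):
    """Return a copy of the structured loan record with a text-derived field."""
    enriched = dict(loan)
    enriched["missed_payment_reason"] = classify_reason(comment)
    return enriched


# Placeholder values, invented for the example.
loan = {"interest_rate": 6.5, "outstanding_balance": 180000, "monthly_payment": 1200}
enriched = add_text_feature(loan, "I forgot to send my check")
print(enriched)
```

A scoring model could then treat the new field like any other input alongside interest rate, balance, and payment; how much weight it deserves is a question for the model, not for this sketch.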

Text mining is finding fertile ground in the life-sciences and pharmaceutical industries, too. The brain-tumor research department at Children’s Memorial Hospital in Chicago uses text mining to comb through reams of medical journals and unearth gene-pairing information that can accelerate critical scientific breakthroughs.

Pharmaceutical companies like Pfizer Inc. mine patent documentation for insight on new directions in research. “This serves as an early-warning system to identify trends,” explains Mark Burfoot, head of information management at Pfizer in St. Louis. “We can see what competitors are doing and, by linking that information with our own R&D data, make a decision about whether it’s an area we should be looking into.”

Text mining often starts as a way to automate manual processes and then spreads as companies see its potential. At Bank of America N.A., the E-commerce team used to manually read, sort, and categorize the comments it received on surveys and feedback forums. Now text mining does the job instantly, producing graphs and charts about prevailing attitudes that help the team prioritize proposed service enhancements. Johnson Controls Inc., the Milwaukee-based auto-parts supplier, first started using text mining in its call center several years ago, then began mining notes from the company’s 7,000 field-maintenance and installation engineers, searching for ways to improve products and reduce maintenance costs. More recently it has set up a program to scour Web logs and chat rooms to assess consumer opinions on car batteries. Next, the company plans to mine warranty claims for early warnings on product defects.

New uses are still emerging. EDS Corp. uses the technology to analyze comments from annual employee surveys, and also to examine thousands of supplier contracts to help the purchasing department track contractual terms and discounts.

But No One Understands It

For all of its promise, however, the text-mining market is still minuscule. Actual market-size figures are hard to come by, but Chicago-based SPSS Inc., one of the major vendors in the category, claims to have around 1,000 text-mining customers, a mere fraction compared with the number of its more-traditional data-mining clients.

And despite the successes, some analysts call the market’s growth “disappointing.” Part of the problem is that, despite vendor claims to the contrary, considerable skills are required to use text mining effectively. You must first know what the technology can do, and then how to act on the results. “It’s very difficult to see whether you have a great text-mining opportunity at hand,” says Alexander Linden, a research vice president at Stamford, Connecticut-based Gartner Inc. “Businesspeople don’t understand the technology well. IT people don’t even understand it well. And that’s pretty much a recipe for a bad outlook.”

Another factor behind the tepid response may be a lack of urgency. In an era of show-me-the-money budgeting decisions, other investments such as regulatory-compliance projects have come first. Says Linden: “Text mining has some good potential payback but it is not a make-or-break technology.”

Even companies convinced that they need text mining right away may be scared off by the time and resources required to make it happen. The price of the software ranges from $50,000 to several million dollars, and it can take months to collect the necessary data and customize the software. While vendors can supply prefab dictionaries, adjustments are usually necessary. “You have to understand which words, phrases, and concepts are meaningful to your company and which aren’t so you can tune the system to what’s relevant,” says Laura Ramos, a vice president at Cambridge, Massachusetts-based Forrester Research Inc.

Dr. Eric Bremer, director of pediatric brain-tumor research at Children’s Memorial, says he spent more than a year getting his text-mining project up and running. First he had to download more than 150,000 journal articles into a database. Next he had to create a dictionary of gene names and convert all of the Greek symbols, which the literature represented graphically, into text. Then things got ugly. The lab’s computers could process only about 5,000 articles in 24 hours. To get the necessary computing power, Bremer had to create a grid that siphons off unused processing capacity from other hospital computers. He now has a system that can process 100,000 articles in 24 hours.
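The scale of the bottleneck is easy to work out from the figures Bremer cites. The short Python calculation below uses 150,000 articles as a lower bound, since the collection was "more than" that size, and assumes the quoted daily rates hold steady.

```python
ARTICLES = 150_000           # lower bound: "more than 150,000" articles
LAB_RATE_PER_DAY = 5_000     # lab computers alone
GRID_RATE_PER_DAY = 100_000  # grid using spare hospital capacity


def days_to_process(articles, rate_per_day):
    """Days needed for one full pass at a constant daily rate."""
    return articles / rate_per_day


lab_days = days_to_process(ARTICLES, LAB_RATE_PER_DAY)
grid_days = days_to_process(ARTICLES, GRID_RATE_PER_DAY)
speedup = GRID_RATE_PER_DAY / LAB_RATE_PER_DAY

print(f"Lab only: {lab_days:.1f} days")    # 30.0 days
print(f"With grid: {grid_days:.1f} days")  # 1.5 days
print(f"Speedup: {speedup:.0f}x")          # 20x
```

On those assumptions, a single pass over the collection drops from about a month to a day and a half.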

The system is finally beginning to earn its keep. Earlier this year it identified a gene believed to be a marker for a particular type of tumor. If testing proves that to be true, treatments will be more accurately prescribed. The discovery has eased Bremer’s mind about whether the text-mining setup headaches were worthwhile. “If I had realized up front what the costs would be, I might not have been willing to do the project,” he says. “But we had such a backlog of data that I bit the bullet and spent the money. The rewards are worth it.”

It doesn’t take a life-and-death situation to win over other converts. Johnson Controls chief information officer Sam Valanju says of his text-mining system: “It has definitely improved product quality. The returns are intangible, but they are definitely there.”

Yasmin Ghahremani writes about business and technology from New York.

Parsing the Text Market

The federal government has been a major growth driver for the text-mining market. The Central Intelligence Agency and other federal agencies have long had electronic tools for finding information on terrorist activities, but those largely relied on structured data. Since 9/11, the intelligence community has sought to increase its ability to mine E-mail, chat rooms, field reports, newspaper articles, and other text sources. In-Q-Tel, the CIA’s investment arm, has provided financial backing for Attensity Corp., Inxight Software Inc., and Intelliseek Inc., among others.

Cincinnati-based Intelliseek has since sold off its CIA-backed business, but acknowledges the agency’s contribution to the market’s development. “They definitely catalyzed the growth of text mining between 2001 and today,” says Sundar Kadayam, chief technology officer at Intelliseek. “They have very specific, discernible, and immediate needs to assimilate large volumes of data.”

Today, the text-mining vendor market breaks down into roughly three categories:

Specialized text-mining firms. Intelliseek, Inxight Software, Intelligenxia, ClearForest, and Attensity. These innovators are prime candidates for acquisition.

Data-mining vendors. SAS Institute and SPSS have added text mining to their portfolios. They lead the market.

Database vendors. IBM, Oracle, and Microsoft have incorporated some text-mining capabilities into their database and software infrastructure products. Customers seeking complex features must look elsewhere, however. — Y.G.