Big Data, Smaller Risk

“Think back to eighth grade, when we had to diagram sentences on the chalkboard,” Brian Murrow says, reaching into his past for an analogy to explain how a field of computer science can detect whether call-center employees are bullying customers. Technology used in that field, called “natural language processing,” enables a computer to read millions of documents. As it reads transcripts of phone calls, for example, a computer can “diagram” sentences and phrases in mathematical terms, thus revealing through word order the tone of telemarketers as they attempt to sell a particular product.

“Once you diagram [phrases and sentences], you know what your subject is and what your predicate is, what your objects are, your prepositional phrases. You can start to figure out who’s saying what, with what intention, to whom,” notes Murrow, a KPMG Big Data expert focusing on banks.

That, in turn, enables a bank’s risk managers to gauge the probability that its call-center workers are engaging in predatory lending practices, like selling a loan to a customer who can’t pay it back. Knowing that, the bank can avoid a visit from regulators by either firing the predatory workers or training them to mend their ways.

Murrow provides the example to indicate a trend: After years of treating Big Data almost exclusively as a way to aid marketers and drive revenue, companies are starting to explore its risk management capabilities. Increasingly, they’re looking for patterns in their internal emails and audio files and on social media to spot and avert a plethora of potential risks.

“We’re at a pivot point where we are seeing the capabilities move from marketing into risk management,” the consultant said. “The analytics aren’t necessarily new, and how we deal with the data isn’t necessarily new. But what is new is the evolution of computing power to be able to handle the level of computations it takes to apply these analytics to Big Data.”

A Bevy of Breaches

It’s likely that the recent barrage of high-profile data thievery has also helped shift some of the corporate focus from benefits to risks. Massive thefts of Big Data (names, credit card numbers, email addresses, passwords, etc.) have been getting widespread attention, beginning with the hacks of Target and Adobe Systems in 2013 and extending right through the attack reported in July on Ashley Madison, a purportedly anonymous website encouraging extramarital affairs.

Regardless of the reason, corporations have begun to focus on the risky side of Big Data in two ways: as a source of risk itself and as a means to manage it. One example of assessing the former, called “data-flow analysis,” involves tracing the location of data at different times during a business process, according to Jim Adler, chief privacy officer at Metanautix, a Big Data analytics firm specializing in supply chains.

The method can prove especially useful in detecting attacks on retail point-of-sale devices that copy debit or credit card data to an internal server. Working at night, hackers might steal the credit card numbers that the devices had collected throughout the day on the server, Adler said, noting that criminals have accumulated “millions of [personal information] numbers” under the noses of data-security personnel.

If company risk managers deploy data-flow analysis, however, they can detect an abnormally large number of queries being made on a specific aspect of a store’s database during the last week, for instance, and compare that number with trends over the last year or longer. Security people don’t even need to know what kind of data are being requested, Adler said. Just observing an unusual number of queries might be enough to trigger a response from the company.
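The comparison Adler describes, a recent query count set against a longer-run trend, can be sketched in a few lines of Python. This is only an illustration, not Metanautix’s method: the function name, the weekly counts, and the three-standard-deviation threshold are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_query_volume_abnormal(weekly_history, latest_week, threshold=3.0):
    """Flag the latest week if its query count sits far above the historical norm.

    weekly_history: past weekly query counts for one part of a database.
    threshold: how many standard deviations above the mean counts as abnormal
               (an assumed cutoff, chosen only for illustration).
    """
    if len(weekly_history) < 2:
        raise ValueError("need at least two weeks of history")
    baseline = mean(weekly_history)
    spread = stdev(weekly_history)
    if spread == 0:
        # Perfectly flat history: any increase at all is unusual.
        return latest_week > baseline
    return (latest_week - baseline) / spread > threshold


# Hypothetical weekly query counts; no knowledge of the data's content is needed.
history = [1000, 1040, 980, 1010, 995, 1025, 990, 1005]
print(is_query_volume_abnormal(history, 1030))  # an ordinary week
print(is_query_volume_abnormal(history, 5200))  # a sudden spike
```

Note that the check looks only at how many queries were made, not at what was requested, which matches Adler’s point that the volume alone can be enough to trigger a response.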

Another common Big Data modeling technique, used by credit card companies to sniff out fraudsters, is called “outlier analysis,” according to Rob Hellewell, a vice president in data analytics at Xerox Litigation Services. For instance, “if I’m a credit card holder, you can look at my transactions over the last two or three years and see that 95% of them take place within the Washington, D.C., metro area,” he says.

“If I buy a hamburger at Five Guys on Friday there and if on Saturday I try to buy a $10,000 plasma TV in St. Petersburg [Florida], outlier analysis says, ‘Hey, this doesn’t fit,’” and data security personnel can take a look at the person and possibly nip the fraud in the bud, Hellewell adds.
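Hellewell’s hamburger-and-television example can be expressed as a rough sketch. The version below is a simplified stand-in for what card issuers actually run: the `Transaction` class, the 5% location cutoff, and the rule that a purchase must be 20 times the typical amount are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Transaction:
    city: str
    amount: float


def is_outlier(history, new_txn, min_city_share=0.05, amount_multiple=20.0):
    """Return True when a purchase is both in a rarely seen city and unusually large.

    Both cutoffs are assumed values for this sketch, not industry standards.
    """
    if not history:
        raise ValueError("need some transaction history")
    city_counts = Counter(t.city for t in history)
    city_share = city_counts[new_txn.city] / len(history)
    typical_amount = median(t.amount for t in history)
    unusual_place = city_share < min_city_share
    unusual_amount = new_txn.amount > amount_multiple * typical_amount
    return unusual_place and unusual_amount


# Hypothetical cardholder: 95% of past purchases are in Washington, D.C.
history = [Transaction("Washington", 12.0)] * 95 + [Transaction("Baltimore", 30.0)] * 5

print(is_outlier(history, Transaction("Washington", 9.50)))          # Friday's hamburger
print(is_outlier(history, Transaction("St. Petersburg", 10000.0)))   # Saturday's plasma TV
```

Requiring both conditions keeps the sketch from flagging every trip out of town or every large purchase close to home; a real model would weigh many more signals.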

But credit card transactions and other forms of “structured” Big Data (the pre-defined data residing in spreadsheets or formal database records) are not the only source of data for assessing risks. Much more of a company’s data are “unstructured,” like the human speech analyzed by natural language processing or the text of chatrooms and email. Those data are where risk managers are finding new ways to uncover perils.

A particular advantage of unstructured Big Data touted by its collectors and analyzers is that it provides finance and risk executives with the ability to act almost immediately to avert hazards. One technology that offers such speed is image-recognition software, which, for example, enables consumer goods sales reps to use their smartphones to snap photos of supermarket shelves. The software provides an instant visual analysis of the photo, which the rep can then use to see that errors are corrected.

A digital image shown on the website of Trax Technology Solutions, a Singapore-based provider of image-recognition software, reveals how the tool works. The photo shows a misplaced product on a retail shelf circled in yellow, presumably enabling the rep who sees the image to walk over to a supermarket employee and request that the error be corrected. Manufacturers can use the tool to manage the risk of in-store violations of their brands, according to the firm.

Ironically, unstructured data can apparently be used to limit the risks posed by the collection of more structured information, such as the items in the handwritten inventory lists traditionally employed by retailers. After analyzing the unstructured data stemming from a shelf photo, for instance, Trax has been able “to intervene in certain cases before questionable data is collected, and while processes can still be adjusted or reversed, thus improving the overall quality of data collection,” Nina Tan, the firm’s CFO, contends.

Where Risk Resides

Amid the vast array of sources of unstructured data that can be used to manage corporate risk, email is getting special attention. “If you’re taking a look at where risks sit in your enterprise, email is probably the most vulnerable part,” says Xerox Litigation’s Hellewell. “And within a corporation, it certainly qualifies as Big Data because there’s a lot of it.”

In the context of the risk of lawsuits and regulatory investigations, he adds, “it’s the primary target of what people are asking for and sifting through.”

While Hellewell focuses a great deal on searching through the unstructured text of thousands of emails in helping his clients avert lawsuits, he also expends a lot of effort in investigating the patterns in the metadata of employees’ email communications. “We discovered that the great thing about unstructured data was that it has this very rich layer of metadata, and metadata can also be very, very revealing about risk,” he says. “In some cases as much as the text.”

By metadata, he means the information in an email’s header: the subject, the addresser, and the addressee; the date and time; whether and to whom the email is forwarded; whether it’s high or low priority; and who’s copied and blind-copied. Bcc’s, in fact, are a subject of special focus. “If somebody is going to the trouble of bcc’ing somebody on an email, it usually indicates an intention to hide something,” Hellewell says. “Why did you choose a bcc over cc? Because you didn’t want someone to know that you were forwarding that email.”

Another potential risk indicator is the time of day an email is sent. “Emails sent after hours have a higher likelihood of containing concerning information than ones that are sent during business hours,” says Hellewell, noting that metadata helps his firm build models to guide the searches of unstructured text.
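The two metadata signals Hellewell singles out, blind copies and after-hours timestamps, lend themselves to a simple rule-based sketch. The code below is not Xerox Litigation Services’ model: the header dictionary layout, the flag names, and the 8 a.m. to 6 p.m. weekday definition of business hours are assumptions for illustration only.

```python
from datetime import datetime, time


def metadata_risk_flags(headers, workday_start=time(8, 0), workday_end=time(18, 0)):
    """Return a list of risk flags based solely on an email's header metadata.

    headers: a dict with a "sent" datetime and an optional "bcc" list
             (a hypothetical layout chosen for this sketch).
    """
    flags = []
    if headers.get("bcc"):
        flags.append("bcc_used")
    sent = headers["sent"]
    is_weekend = sent.weekday() >= 5  # Saturday is 5, Sunday is 6
    outside_hours = not (workday_start <= sent.time() <= workday_end)
    if is_weekend or outside_hours:
        flags.append("after_hours")
    return flags


# Hypothetical messages; the addresses are placeholders.
late_email = {
    "sent": datetime(2015, 7, 14, 23, 15),
    "to": ["a@example.com"],
    "bcc": ["b@example.com"],
}
routine_email = {"sent": datetime(2015, 7, 14, 10, 30), "to": ["a@example.com"]}

print(metadata_risk_flags(late_email))
print(metadata_risk_flags(routine_email))
```

Flags of this kind would not prove anything by themselves; in line with Hellewell’s description, they would serve to narrow down which messages deserve a closer search of the text.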

Insider Threats

At Deloitte, email monitoring forms an important part of the Big Four firm’s efforts to prevent the release of restricted information to the public either accidentally or on purpose, says Chuck Saia, its chief reputation, risk, and regulatory affairs officer.

As a big federal contractor, the firm is subject to the rules of the government’s Insider Threat Program, established in response to huge data leaks, especially the diplomatic cables leaked by Chelsea Manning, the U.S. Army soldier convicted of espionage in 2013.

Set up in 2011 by an executive order issued by President Barack Obama, the program aims “to promote the development of effective insider threat programs within departments and agencies to deter, detect, and mitigate actions by employees who may represent a threat to national security.”

The threats include political spying and threats against the nation like the release of some of “the vast amounts of classified data available on interconnected United States Government computer networks and systems.”

Besides poring over data contained in Deloitte’s human resource systems for ethical and compliance violations by its employees, a group within the firm monitors the incoming and outgoing email traffic of particular groups or individuals. The purpose is “to understand if we have a high-risk area that we need to look into,” Saia says, noting that the firm is building out its insider threat program for use in firm-wide monitoring.

Similarly, Dun & Bradstreet CFO Richard Veldran has expanded the use of federal government compliance data culled by the credit-risk analysis firm to broader risk management purposes. Since 1962, D&B has maintained D-U-N-S Numbers—Veldran calls the identifier “a Social Security number for a business”—and assigned them to more than 100 million businesses worldwide.

In 1994, the federal government adopted the nine-digit number, which identifies businesses by location, as the standard business identifier for electronic commerce. In 1998, the number was approved as the federal government’s contractor identification code for all procurement-related activities.

In short, the number, which can be obtained for free, is a must-have for small companies wanting to do business with the U.S. government and many foreign ones. To qualify for it, businesses hand over to D&B a rich lode of data, including the business’s name, physical and mailing addresses, financial information, and links to members of corporate family trees worldwide.

Besides seeing the data as a source of his company’s revenue, Veldran uses it to fuel his assessment of D&B’s own risks. As the company’s clients do, he analyzes it to determine the basics of whether and when customers can be expected to pay their bills. In “a more advanced way,” however, he analyzes it to determine “where a [customer] is headed.”

Further, the finance chief uses the D-U-N-S data to ferret out weak links in D&B’s customers’ supply chains. For example, he looks at whether suppliers are shipping more or fewer goods than their competitors. “Is the company likely to go out of business?” he asks. “Is it a subsidiary of a troubled parent?”

Panoply of Patter

Like many other organizations, D&B and Deloitte make use of information gleaned from the panoply of websites and applications, chatrooms, blogs, and video-sharing systems collectively known as social media.

For his part, Veldran uses social media to round out the assessments of supply-chain risks he gains from analyzing the structured data derived from D-U-N-S Numbers and other sources. “Seeing what’s being tweeted about a particular company provides a more holistic view,” he says.

Deloitte’s Saia notes that his firm has “invested heavily” in what it dubs its “reputational re-sensing capability.” The effort to detect external threats to the firm’s reputation deploys an in-house team that provides round-the-clock monitoring of social media via software provided by Sprinklr, a social media management firm.

The software tool enables the team to pick up “anything being said about us via publicly available, open-sourced information,” the risk officer says, as well as chatter about Deloitte’s industry sector, its competitors, and its stakeholders.

Acting on what it hears and sees by means of analyzing Big Data, the firm may strike back in defense of its reputation. “If we see a trend coming from a particular stakeholder group, we simply might reach out to that stakeholder group to tell our side of the story,” Saia says.

“In addition, it may trigger a response by us in the media outlets where we try to get our story out ahead of any negative story,” he adds.

Indeed, as the messages on social media proliferate, more and more organizations will likely be engaged in such skirmishes. The story of Big Data in Corporate America may have become as much about averting the negative as it is about accentuating the positive.