CFOs and accountants have heard the call to expand their big data analytical skills. With this newfangled know-how comes the promise to identify workplace efficiencies, evaluate prospective business opportunities and better understand customers. But this rosy perspective focuses only on the upside of big data.
A downside lurks in the disorganized and nonessential email, memos, drafts, presentations, reports and spreadsheets that lie strewn across most company networks. This surplusage can be analyzed by hostile litigators, aggressive regulators or surreptitious hackers using the same big-data techniques employed by marketing wizards. Therefore, if not properly managed, this mundane form of internal big data can significantly increase risks and raise costs. When it comes to internal company data, “less” is usually “more.” To run critical business functions and to mitigate risk, a company has to determine what data to keep and what data to delete.
The disorganized morass of nonessential files is now a critical issue for leadership, in light of rapid advancements in computer science. Complex analysis historically was limited to large databases, with their tidy tables and regular columns of numbers in uniform formats. However, big data analysts now can run complex searches across messy workplace byproduct. They can search “unstructured” information, outside formatted databases, across different file types and even in different languages.
Not surprisingly, then, big data analysis has become the lifeblood of many kinds of litigation. Even the most routine employment cases often involve forensic collections of data from multiple locations (email servers, file servers, laptop hard drives, online email accounts and smartphones). Larger litigation matters take big data analysis a step further. For example, graphic files are searchable through optical character recognition; communication patterns can be visually depicted through early case assessment tools; concept searches are being used to enhance basic keyword searches; and computer prediction models are replacing weeks of human document review.
Federal regulators and cybercriminals alike have also climbed on the big data bandwagon. The SEC mines financial reports and trading accounts to find fraud and “aberrational performance.” This past winter, the head of FINRA boasted that his organization detects market abuses by monitoring almost six billion trades per day, as he said, “more ‘big data’ on a daily basis than the Library of Congress or Visa.” Meanwhile, hackers crawl large corporate networks, use complex queries to search for exposed passwords, hide among the refuse of everyday computer files and target the very same customer databases compiled by big data analysts.
In spite of these developments, many companies do not organize their data well enough to respond to subpoenas and legal discovery. In addition, because of this weakness, companies often cannot tell when, how often or whether they’ve been hit by a cyber attack. By hoarding mountains of unorganized, nonessential internal data, companies risk facing obstruction and spoliation of evidence charges (intentionally or negligently hiding or destroying evidence), and making themselves vulnerable to multistage cyber attacks.
Drawing a Data Map
To escape these fates, companies must create comprehensive and well-thought-out data retention and deletion policies. The process begins with the creation of a company data map. A data map tracks what information is created, where it is stored, how long it is retained and who manages it within the organization, allowing leadership to take real control of it.
To start, a CFO should assign a team to sit down with internal business groups to assess their data inventories and to construct a taxonomy of them. For example, accounting files may fall under the categories accounts receivable, accounts payable, taxes and billing disputes. Then the team should identify who oversees the data and where it resides — on employee hard drives, in a particular database, on a network drive or in the cloud. Creating this inventory can be unnerving. Critical records may be scattered all over the network, sensitive personal information may not be encrypted and confidential financial documents may be available to an unnecessary number of employees through a shared network folder.
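For teams that want to capture the inventory in a form they can query, the data map can be sketched as a small set of records. The Python below is only an illustration under stated assumptions: the field names, the `review_flags` helper and the `max_readers` threshold are hypothetical, not part of any standard or product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataMapEntry:
    """One row of a hypothetical data map: what, where, how long, who."""
    category: str                    # e.g. "accounts receivable"
    owner: str                       # who manages the data
    location: str                    # e.g. "network drive", "cloud"
    retention_days: Optional[int]    # None = no period assigned yet
    contains_personal_info: bool = False
    encrypted: bool = False
    readers: int = 0                 # employees who can open the files


def review_flags(entry: DataMapEntry, max_readers: int = 25) -> List[str]:
    """Return the inventory problems described above for one entry.

    max_readers is an assumed, illustrative threshold.
    """
    flags = []
    if entry.contains_personal_info and not entry.encrypted:
        flags.append("sensitive personal information is not encrypted")
    if entry.readers > max_readers:
        flags.append("available to an unnecessary number of employees")
    if entry.retention_days is None:
        flags.append("no retention period assigned")
    return flags
```

Running `review_flags` over every entry gives the team a worklist of the unnerving findings: unencrypted personal information, overly broad access and categories that still lack a retention period.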
Next, the CFO’s team should consult with legal counsel to assign proper retention periods to each document category. The attorney may advise keeping general email for three months or certain tax documents for five years, but employee benefit records indefinitely. He or she can also advise on the practical extent of litigation holds. Many companies fail to lift holds when a case ends. As a result, the same documents could be subjected to another hold when the next lawsuit arises, and so on.
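The interplay between retention periods and litigation holds can be expressed as a simple rule: nothing under an active hold is deleted, and everything else is deleted once its period has run. The sketch below assumes the example periods mentioned above (with three months approximated as 90 days and five years as 5 × 365 days); the `RETENTION_DAYS` table and the `eligible_for_deletion` function are hypothetical names, and real periods would come from counsel.

```python
from datetime import date, timedelta
from typing import Optional, Set

# Illustrative periods only; None means "keep indefinitely".
RETENTION_DAYS = {
    "general email": 90,
    "tax documents": 5 * 365,
    "employee benefit records": None,
}


def eligible_for_deletion(category: str, created: date, today: date,
                          active_holds: Set[str]) -> bool:
    """True only if no hold applies and the retention period has expired."""
    if active_holds:
        # Any hold that has not been lifted blocks deletion.
        return False
    days: Optional[int] = RETENTION_DAYS[category]
    if days is None:
        return False
    return today > created + timedelta(days=days)
```

The first check shows why stale holds are costly: as long as `active_holds` is never emptied when a case ends, no document in scope ever becomes eligible for deletion, no matter how short its retention period.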
A few years ago, my firm worked through a large inventory of case histories for a major insurance company to verify which legal holds could be lifted. This company was retaining so much data, under so many separate litigation holds, that an entire cottage industry had developed in its hometown to handle the company’s backup tapes. When we verified that numerous litigation holds could be lifted, including one from a federal regulator, the company enjoyed a multimillion-dollar reduction in its data storage costs.
Whatever data retention periods a company sets have to be adjustable to address company needs and changing circumstances. For example, in the event of a data breach, forensic investigators often will ask to look at historic log files and server backups in an effort to determine when and how the attackers first struck. If a company normally keeps four sets of weekly backup tapes and three sets of monthly backup tapes of its servers, the investigators might lose an entire month’s worth of valuable data if action is not taken quickly to suspend the company’s normal backup tape rotation. Setting aside the oldest backup set and inserting a new batch of tapes is an easy way to preserve that evidence.
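To see why timing matters, the tape rotation can be modeled as a queue in which each new backup overwrites the oldest set. The sketch below is a simplified illustration, assuming one pool of tape sets identified by backup date; the `TapeRotation` class and its method names are hypothetical and do not describe any particular backup product.

```python
from collections import deque
from datetime import date
from typing import Iterable, List, Optional


class TapeRotation:
    """A pool of backup sets where the oldest set is reused first."""

    def __init__(self, backup_dates: Iterable[date]) -> None:
        self.in_rotation = deque(sorted(backup_dates))  # oldest on the left
        self.preserved: List[date] = []                 # set aside as evidence
        self.blank_sets = 0                             # new tapes not yet written

    def run_backup(self, today: date) -> Optional[date]:
        """Write today's backup; return the date of any set that was overwritten."""
        if self.blank_sets > 0:
            self.blank_sets -= 1
            self.in_rotation.append(today)
            return None
        overwritten = self.in_rotation.popleft()
        self.in_rotation.append(today)
        return overwritten

    def set_aside_oldest(self) -> date:
        """Pull the oldest set for investigators and insert a new batch of tapes."""
        oldest = self.in_rotation.popleft()
        self.preserved.append(oldest)
        self.blank_sets += 1
        return oldest
```

With three monthly sets in the pool, calling `run_backup` for the fourth month returns the first month's date, meaning that set is gone. Calling `set_aside_oldest` first moves that set into `preserved`, and the next backup lands on the new tapes instead.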
Even with an effective data map and a clear data retention and deletion schedule, the work of real information governance is never done. A company must regularly train employees about where to store its data and must update policies frequently to address evolving technologies. Some of the thorniest issues today arise within companies that have liberal bring-your-own-device (BYOD) policies. In these companies, personal information may commingle with corporate data on smartphones that are backed up to home and cloud locations. The next challenge may be wearable technology that introduces health-care data into the workplace.
We know only one thing for sure: data will continue to grow. Managing it, not only by harnessing large stores of data about customers and markets, but also by minimizing nonessential information clogging the internal network, can help a company maximize its big data returns.
Paul H. Luehr is managing director of Stroz Friedberg, having joined the firm after more than 11 years as a federal attorney with the U.S. Department of Justice and the Federal Trade Commission. Luehr has supervised numerous computer forensic, computer crime and electronic discovery matters and overseen several investigations into trade secret thefts. He has also pioneered cases involving “database forensics,” in which the data from enterprise Oracle, SQL and Access databases must be extracted, preserved, authenticated and interpreted in electronic discovery and regulatory matters.