This article is the first in a series of articles called “Opening the Black Box: How to Assess Machine Learning Models.” The second piece, Selecting and Preparing Data for Machine Learning Projects, and the third piece, Understanding and Assessing Machine Learning Algorithms, were both published in May 2020.
The use of machine learning technology is spreading across all areas of modern organizations, and its predictive capabilities suit the finance function’s forward-looking needs. Understanding how to work with machine learning models is crucial for making informed investment decisions.
Yet for many finance professionals, successfully employing these models is the equivalent of navigating the Bermuda Triangle.
Properly deploying machine learning within an organization involves considering and answering three core questions:
- Does this project match the characteristics of a typical machine learning problem?
- Is there a solid foundation of data and experienced analysts?
- Is there a tangible payoff?
Does This Project Match the Characteristics of a Typical ML Problem?
Machine learning is a subset of artificial intelligence that’s focused on training computers to use algorithms for making predictions or classifications based on observed data.
Finance functions typically use “supervised” machine learning, where an analyst provides historical data that includes the known outcomes, and the machine learns to predict or classify the outcome for new, similar data.
With “unsupervised” machine learning, data is provided without outcomes and the machine attempts to find patterns or groupings in it on its own. Because supervised models are the most common within finance functions, our articles will focus on them.
To take a very simple example, suppose you wanted to train a model that predicts A + B = C using supervised machine learning. You would give it a set of observations of A, B, and the outcome C.
You would then tell an algorithm to predict or classify C, given A and B. With enough observations, the algorithm will eventually become very good at predicting C. Of course, humans can already solve this particular problem easily.
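The A + B = C example can be sketched in a few lines of Python. This is a minimal illustration rather than a production approach: the observations are made up, the function names `fit_weights` and `predict` are hypothetical, and the learning method (gradient descent on a two-weight linear model) is just one simple choice among many. The model is never told that the rule is addition; it only sees examples of A, B, and C.

```python
# Hypothetical training data: observations of A and B, plus the known outcome C.
observations = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (0.5, 1.5), (5.0, 2.0)]
outcomes = [a + b for a, b in observations]


def fit_weights(inputs, targets, learning_rate=0.01, epochs=5000):
    """Learn w_a and w_b so that w_a * A + w_b * B approximates C."""
    w_a, w_b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for (a, b), c in zip(inputs, targets):
            error = (w_a * a + w_b * b) - c  # how far off the current guess is
            grad_a += 2 * error * a / n
            grad_b += 2 * error * b / n
        # Nudge each weight in the direction that reduces the average squared error.
        w_a -= learning_rate * grad_a
        w_b -= learning_rate * grad_b
    return w_a, w_b


w_a, w_b = fit_weights(observations, outcomes)


def predict(a, b):
    """Predict C for a new pair (A, B) using the learned weights."""
    return w_a * a + w_b * b
```

After training, both weights should settle very close to 1, so `predict(10, 20)` returns approximately 30 even though that pair never appeared in the training data.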
But what if the question was A+B+…+F(X) = Z?
Traditionally, humans would tackle that problem by simplifying the equation: removing factors and introducing their own subjectivity. As a result, potentially important factors and data are not considered. A machine can consider all the factors, train various algorithms to predict Z, and test the results.
In short, machine learning problems typically involve predicting previously observed outcomes using past data. The technology is best suited to solve problems that require unbiased analysis of numerous quantified factors in order to generate an outcome.
Is There a Solid Foundation of Data?
Machine learning models require data. As noted earlier, the data must also include observable outcomes, or “the right answer,” for machine learning to predict or classify.
For instance, if you are trying to predict what credit rating a private company might attain based on its financial statements, you need data that contains other companies’ financial statements and credit ratings. The ML model will look at all the financial statement data and the observable outcomes (in this case the other companies’ credit ratings), and then predict what the private company credit rating might be.
If the data didn’t include credit-rating outcomes, the machine learning model would have no way to use the data to predict an outcome.
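To make the role of labeled outcomes concrete, here is a small sketch of a classifier for the credit-rating example. Everything in it is an assumption for illustration: the two financial ratios, the nine companies, and their ratings are invented, and the nearest-neighbor vote is only one of many algorithms that could be used. The point is that the prediction comes entirely from the ratings attached to the historical examples; without that second element in each pair, the function would have nothing to return.

```python
import math
from collections import Counter

# Hypothetical labeled data: ((debt_to_equity, interest_coverage), credit_rating).
labeled_companies = [
    ((0.3, 12.0), "A"),
    ((0.4, 10.0), "A"),
    ((0.5, 9.0), "A"),
    ((1.0, 5.0), "BBB"),
    ((1.2, 4.5), "BBB"),
    ((1.1, 4.0), "BBB"),
    ((2.5, 1.5), "B"),
    ((3.0, 1.2), "B"),
    ((2.8, 1.0), "B"),
]


def predict_rating(features, labeled_examples, k=3):
    """Return the most common rating among the k most similar labeled companies."""
    nearest = sorted(
        labeled_examples,
        key=lambda example: math.dist(features, example[0]),
    )[:k]
    votes = Counter(rating for _, rating in nearest)
    return votes.most_common(1)[0][0]


# A private company with no rating of its own (illustrative ratios).
private_company = (1.15, 4.2)
predicted = predict_rating(private_company, labeled_companies)
```

In practice the features would be scaled so that no single ratio dominates the distance calculation; that step is omitted here to keep the sketch short.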
Another data consideration, when determining whether machine learning can solve a problem, is format: text needs to be transformed into numerical data, and the data set must still contain observable outcomes.
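Two common ways of turning text into numbers are sketched below, under stated assumptions: the sector names and the rating scale are illustrative, and `one_hot_encode` is a hypothetical helper rather than a library function. Categories with no natural order (such as industry sector) get one 0/1 column each, while ordered labels (such as credit ratings) can be mapped to ranks.

```python
def one_hot_encode(values):
    """Turn unordered text categories into one 0/1 column per category."""
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, encoded


# Hypothetical text field: industry sector for four companies.
sectors = ["Retail", "Energy", "Retail", "Technology"]
categories, encoded_sectors = one_hot_encode(sectors)

# Ordered text labels can instead be mapped to ranks.
# This scale is a simplified assumption, listed from lowest to highest.
RATING_ORDER = ["B", "BB", "BBB", "A", "AA", "AAA"]
rating_to_number = {rating: rank for rank, rating in enumerate(RATING_ORDER)}

ratings = ["BBB", "A", "B", "AA"]
encoded_ratings = [rating_to_number[rating] for rating in ratings]
```

Which encoding is appropriate depends on the field: ranking the sectors would imply an ordering that does not exist, while one-hot encoding the ratings would discard the fact that A sits above BBB.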
When making machine learning assessments, evaluating outputs of a model, or determining if a model is useful, be sure to consider your organization’s historical data. That’s what enables machine learning models to make predictions or classifications.
As with any statistical analysis based on historical data, a machine learning model’s predictions and classifications are only as relevant as the historical data is representative of the current environment.
Determining how effective machine learning will be at solving an organization’s problems also requires understanding individual problems well enough to know if the model answer is meaningful. The analyst must be able to interpret the results and determine if they are correct and causal.
In the prior example of predicting a credit rating, the analyst might gather all public filing data and credit ratings available. This would provide a vast amount of data — and the more data, the better, right?
You might get great results on training and test scores, but an analyst who understands the problem would recognize that the results might improve if, for example, you used only data from after the financial crisis of 2008. Machine learning works best in organizations with experienced analysts who can interpret the results and who understand the problem well enough to solve it using ML.
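The analyst's judgment described here often shows up as a data-selection step before any model is trained. The sketch below assumes a hypothetical list of yearly records with placeholder values; the helper names, the 2008 cutoff, and the 25% test share are illustrative choices, not recommendations. It restricts the data to the post-crisis period and then holds back part of it so the model can be scored on observations it has not seen.

```python
import random


def filter_after_year(records, cutoff_year):
    """Keep only observations dated after the cutoff year."""
    return [record for record in records if record["year"] > cutoff_year]


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then hold back a share of the records for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records, one per year from 2003 to 2018, with placeholder values.
records = [
    {"year": year, "debt_to_equity": 1.0, "rating": "BBB"}
    for year in range(2003, 2019)
]

recent_records = filter_after_year(records, cutoff_year=2008)
train_records, test_records = train_test_split(recent_records)
```

Whether the narrower data set actually improves the model is an empirical question; the analyst would compare test scores with and without the filter before deciding.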
Understanding the Payoff
Given the hype around machine learning, it’s understandable that businesses are eager to implement it. As with any technology application, leaders should ask themselves if their teams will be able to use the model to work more efficiently and effectively, and/or make better decisions.
In assessing the payoff, leaders should ensure that their teams are properly trained on how ML works, understand the underlying data, and are able to use their valuable experience to interpret the results.
When properly assessed and evaluated, machine learning can help organizations reach objective results better and faster.