This article is the third in a series of articles called, “Opening the Black Box: How to Assess Machine Learning Models.” The first piece, “What Kind of Problems Can Machine Learning Solve?” was published last October. The second piece, “Selecting and Preparing Data for Machine Learning Projects” was published on May 5.
Chief financial officers today face more opportunities to engage with machine learning within the corporate finance function of their organizations. As they encounter these projects, they’ll work with employees and vendors and will need to communicate effectively to get the results they want.
The good news is that finance executives can have a working understanding of machine learning algorithms, even if they don’t have a computer science background. As more organizations turn to machine learning to predict key business metrics and solve problems, learning how algorithms are applied and how to assess them will help financial professionals glean information to lead their organization’s financial activity more effectively.
Machine learning is not a single methodology but rather an overarching term that covers a number of methodologies known as algorithms.
Enterprises use machine learning to classify data, predict future outcomes, and gain other insights. Predicting sales at new retail locations and determining which consumers will most likely buy certain products during an online shopping experience are just two examples of machine learning applications.
A useful aspect of machine learning is that it is relatively easy to test a number of different algorithms simultaneously. However, this mass testing can lead teams to select an algorithm based on a limited number of quantitative criteria, namely accuracy and speed, without considering the methodology and implications of the algorithm. The following questions can help finance professionals select the algorithm that best fits their task.
Four questions you should ask when assessing an algorithm:
1. Is this a classification or prediction problem? There are two main types of algorithms: classification and prediction. Classification algorithms construct models that assign data to labeled classes. In the case of a financial institution, a model can be used to classify which loans are riskier and which are safer. Prediction models (often called regression models), on the other hand, produce numerical outcomes based on data inputs. In the case of a retail store, such a model may attempt to predict how much a customer will spend during a typical sales event at the company.
Financial professionals can comprehend the value of classification by seeing how it handles a desired task. For example, classification of accounts receivable is one way machine learning algorithms can help CFOs make decisions. Suppose a company’s usual accounts receivable cycle is 35 days, but that figure is simply an average of all payment terms. Machine learning algorithms provide more insight by finding relationships in the data without introducing human bias. That way, financial professionals can classify which invoices are likely to be paid in 30, 45, or 60 days. Applying the correct algorithms in the model can have a real business impact.
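To make the distinction concrete, here is a minimal Python sketch, not a production model. The invoice and customer figures, the function names, and the choice of a nearest-neighbor classifier and a straight-line fit are all illustrative assumptions. The only point is that a classification model returns a label, while a prediction model returns a number.

```python
from statistics import mean

# Hypothetical invoice history (assumed for illustration):
# (invoice amount in $ thousands, customer's past average days to pay, observed bucket)
INVOICE_HISTORY = [
    (5, 28, "30 days"),
    (8, 31, "30 days"),
    (20, 44, "45 days"),
    (25, 47, "45 days"),
    (60, 58, "60 days"),
    (75, 63, "60 days"),
]


def classify_invoice(amount, past_days):
    """Classification: return the label of the most similar past invoice."""
    def squared_distance(row):
        return (row[0] - amount) ** 2 + (row[1] - past_days) ** 2

    return min(INVOICE_HISTORY, key=squared_distance)[2]


# Hypothetical customer history (assumed): (years as a customer, spend at a sales event in $)
CUSTOMER_HISTORY = [(1, 120.0), (2, 150.0), (3, 180.0), (4, 210.0)]


def fit_line(points):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def predict_spend(years_as_customer):
    """Prediction: return a number rather than a label."""
    slope, intercept = fit_line(CUSTOMER_HISTORY)
    return intercept + slope * years_as_customer


label = classify_invoice(22, 45)   # a class label
spend = predict_spend(5)           # a numeric estimate
print(label, spend)
```

Asking which of the two return types the business question needs, a label or a number, is usually enough to answer the first question.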
2. What is the selected algorithm’s methodology? While finance leaders are not expected to develop their own algorithms, gaining an understanding of the algorithms used in their organizations is possible since most commonly deployed algorithms follow relatively intuitive methodologies.
Two common methodologies are decision trees and Random Forest Regressors. A decision tree, as its name suggests, uses a branch-like model of binary decisions that lead to possible outcomes. Decision tree models are often deployed within corporate finance because of the types of data generated by typical finance functions and the problems financial professionals often seek to solve.
A Random Forest Regressor is a model that uses subsets of data to build numerous smaller decision trees. It then aggregates the results of the individual trees to arrive at a prediction or classification. This methodology helps account for and reduce the variance of a single decision tree, which can lead to better predictions.
CFOs typically don’t need to understand the math beneath the surface of these two models to see the value of these concepts for solving real-world questions.
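For readers who do want a peek beneath the surface, the two methodologies can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: each "tree" makes only one binary decision, the store data is invented, and the function names are hypothetical. Real libraries grow much deeper trees and add further randomization.

```python
import random
from statistics import mean

# Hypothetical data (assumed): (store size in thousand sq ft, annual sales in $ millions)
STORES = [(1, 10), (2, 12), (3, 11), (6, 30), (7, 32), (8, 31)]


def fit_stump(rows):
    """Fit a one-decision tree: choose the split on x with the lowest squared error.

    Returns (threshold, prediction if x <= threshold, prediction otherwise).
    """
    best = None
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample is identical, so no split is possible.
        overall = mean([y for _, y in rows])
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(rows, n_trees=25, seed=0):
    """Build many small trees, each on a random resample (with replacement) of the data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Aggregate the individual trees by averaging their predictions."""
    return mean([predict_stump(stump, x) for stump in forest])


single_tree = fit_stump(STORES)
forest = fit_forest(STORES)
print(predict_stump(single_tree, 2), predict_forest(forest, 2))
```

The single tree answers with one branch's average; the forest answers with the average of many trees that each saw a slightly different sample, which is what smooths out the quirks of any one tree.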
3. What are the limitations of algorithms and how are we mitigating them? No algorithm is perfect. That’s why it’s important to approach each one with a kind of healthy skepticism, just as you would your accountant or a trusted advisor. Each has excellent qualities, but each may have a particular weakness you have to account for. As with a trusted advisor, algorithms improve your decision-making skills in certain areas, but you don’t rely on them completely in every circumstance.
Decision trees tend to overfit, tuning themselves so closely to the training data that they may struggle with data outside the sample. So, it’s important to put a good deal of rigor into ensuring that the decision tree tests well beyond the dataset you provide it. As mentioned in our previous article, “cross contamination” of data is a potential issue when building machine learning models, so teams need to make sure the training and testing data sets are different, or they will end up with fundamentally flawed outcomes.
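A small Python sketch shows why testing on separate data matters. Everything here is an assumption made for illustration: the data is synthetic, and the "model" simply memorizes its training records, which is a stand-in for how an unrestricted decision tree can behave rather than an actual tree implementation.

```python
import random

rng = random.Random(42)

# Synthetic data (assumed): a simple trend plus random noise.
records = [(x, 2.0 * x + rng.gauss(0, 5)) for x in range(40)]

# Hold out records the model never sees during training.
rng.shuffle(records)
train, test = records[:30], records[30:]

# Guard against "cross contamination": no record may appear in both sets.
overlap = set(train) & set(test)


def predict_memorizer(x):
    """Return the outcome of the closest training record (pure memorization)."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def mean_abs_error(rows):
    return sum(abs(predict_memorizer(x) - y) for x, y in rows) / len(rows)


train_error = mean_abs_error(train)  # looks perfect on data it has memorized
test_error = mean_abs_error(test)    # the honest estimate, on unseen data

print(f"overlap between sets: {len(overlap)}")
print(f"training error: {train_error:.2f}")
print(f"testing error:  {test_error:.2f}")
```

A model scored only on its training records would appear flawless here; the held-out records reveal the error a user would actually experience.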
One limitation of Random Forest Regressors, the prediction version of the Random Forest algorithm, is that they tend to produce averages instead of helpful insights at the far ends of the data. These models make predictions by building many decision trees on subsets of the data. Each tree produces its own prediction for an observation, and the predictions from all the trees are averaged. When faced with observations at the extreme ends of a data set, the model will often have a few trees that still predict a central result. In other words, those trees, even if they aren’t in the majority, will still tend to pull predictions back toward the middle of the observed range, creating a bias.
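The arithmetic behind this pull toward the middle can be worked through with a handful of numbers. The figures below are invented for illustration: seven hypothetical trees each predict the profit, in $ thousands, of one deeply unprofitable contract.

```python
from statistics import mean

# Hypothetical per-tree predictions (assumed) for a single extreme contract.
majority_trees = [-1000, -950, -1050, -1000]  # four trees see the extreme loss
central_trees = [-200, -100, -250]            # three trees predict a central result

forest_prediction = mean(majority_trees + central_trees)
majority_view = mean(majority_trees)

print(f"average of the four majority trees: {majority_view}")
print(f"forest prediction (all seven trees): {forest_prediction}")
```

Four of the seven trees point to a loss of about $1 million, yet the averaged forest prediction is a loss of $650,000, because the three central trees carry equal weight in the average.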
4. How are we communicating the results of our models and training our people to most effectively work with the algorithms? CFOs should provide context to their organizations and employees when working with machine learning. Ask yourself questions such as these: How can I help analysts make decisions? Do I understand which model is best for accomplishing a particular task, and which is not? Do I approach models with appropriate skepticism to find the accurate outcomes needed?
Nothing is flawless, and machine learning algorithms aren’t exceptions to this. Users need to be able to understand the model’s outputs and interrogate them effectively in order to gain the best possible organizational results when deploying machine learning.
Healthy skepticism toward a Random Forest Regressor means testing its outcomes to see whether they match your general understanding of reality. For example, if a CFO wanted to use such a model to predict the profitability of a group of enterprise-level services contracts she is weighing, the best practice would be to run another set of tests to help her team understand the risk that the model may group highly unprofitable contracts with mildly unprofitable ones. A wise user would look deeper at the underlying circumstances of the company behind a contract to see whether that contract carries a much higher risk than the model suggests. A skeptical approach would prompt the user to override the model’s output in that situation to get a clearer picture and a better outcome.
Understanding the types of algorithms in machine learning and what they accomplish can help CFOs ask the right questions when working with data. Applying skepticism is a healthy way to evaluate models and their outcomes. Both approaches will benefit financial professionals as they provide context to employees who are engaging with machine learning in their organizations.
Chandu Chilakapati is a managing director and Devin Rochford a director with Alvarez & Marsal Valuation Services.