John Collins likes data. As a special investigator with the New York Stock Exchange, he built an automated surveillance system to detect suspicious trading activity. He pioneered approaches for transforming third-party “data exhaust” into investment signals as co-founder and chief product officer of Thasos. He also served as a portfolio manager for a fund’s systematic equities trading strategy.
So, when trying to land Collins as LivePerson’s senior vice president of quantitative strategy, the software company sent him a sample of the data generated on its automated, artificial intelligence-enabled conversation platform. He was intrigued. After a few months as an SVP, in February 2020, Collins was named CFO.
What can a person with Collins’ kind of experience do when sitting at the intersection of all the data flowing into an operating company? In a phone interview, Collins discussed the initial steps he’s taken to transform LivePerson’s vast sea of data into useful information, why data science projects often fail, and his vision for an AI operating model.
An edited transcript of the conversation follows.
The company was running a very fragmented network of siloed spreadsheets and enterprise software. Humans performed essentially the equivalent of ETL [extract, transform, load] jobs — manually extracting data from one system, transforming it in a spreadsheet, and then loading it into another system. The result of this kind of workflow, of course, is delayed time-to-action and a severely constrained flow of reliable data for deploying even the simplest automation.
The focus was to solve those data constraints, those connectivity constraints, by connecting some systems, writing some simple routines — primarily for reconciliation purposes — and simultaneously building a new modern data-lake architecture. The data lake would serve as a single source of truth for all data in the back office and as a foundation for rapidly automating manual workflows.
One of the first areas where there was a big impact, and I prioritized it because of how easy it seemed to me, was the reconciliation of the cash flowing into our bank account against the invoices we sent customers. That was a manual process that required a team of about six people continuously reconciling invoice information against bank account transaction detail.
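The reconciliation routine Collins describes can be sketched in a few lines. This is a minimal illustration, not LivePerson's actual system: the record fields (`ref`, `amount`) and the matching rule (exact reference plus amount) are assumptions for the example.

```python
from collections import defaultdict

def reconcile(invoices, transactions):
    """Match bank transactions to open invoices by reference and amount.

    invoices / transactions: lists of dicts with hypothetical fields
    'ref' and 'amount'. Returns (matched_pairs, unmatched_invoices,
    unmatched_transactions).
    """
    # Index open invoices by (ref, amount) so each lookup is O(1).
    open_invoices = defaultdict(list)
    for inv in invoices:
        open_invoices[(inv["ref"], inv["amount"])].append(inv)

    matched, unmatched_tx = [], []
    for tx in transactions:
        key = (tx["ref"], tx["amount"])
        if open_invoices[key]:
            matched.append((open_invoices[key].pop(), tx))
        else:
            unmatched_tx.append(tx)  # flag for human review

    unmatched_inv = [inv for lst in open_invoices.values() for inv in lst]
    return matched, unmatched_inv, unmatched_tx
```

In practice a routine like this handles the bulk of clean matches automatically, leaving the team of six to look only at the exceptions in the two unmatched lists.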
More impactful was [analyzing] the sales pipeline. Traditional pipeline analytics for an enterprise sales business consists of taking late-stage pipeline and assuming some fraction will close. We built what I consider to be some fairly standard, classic machine learning algorithms that would understand all the [contributors] to an increase or decrease in the probability of closing a big enterprise deal. If the customer spoke with a vice president. If the customer got its solutions team involved. How many meetings or calls [the salesperson] had with the customer. … We were then able to deploy [the algorithms] in a way that gave us insight into the bookings for [an entire] quarter on the first day of the quarter.
If you know in the first week of the quarter what your bookings will be, and there’s a problem, management has plenty of time to course-correct before the quarter ends. Whereas in a typical enterprise sales situation, the reps may hold onto those deals they know aren’t going to close. They hold onto those late-stage deals to the very end of the quarter, the last couple of weeks, and then all of those deals push into the next quarter.
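The deal signals Collins lists map naturally onto a binary classifier. A minimal pure-Python sketch of the idea, using logistic regression trained by gradient descent on invented toy data (the features, values, and weights are illustrative, not LivePerson's model):

```python
import math

# Hypothetical deal features from the interview: did a VP join a call,
# was the customer's solutions team involved, how many meetings so far.
DEALS = [
    # (vp_spoke, solutions_team, num_meetings) -> closed?
    ((1, 1, 8), 1), ((1, 0, 6), 1), ((0, 1, 7), 1), ((1, 1, 5), 1),
    ((0, 0, 2), 0), ((0, 1, 1), 0), ((1, 0, 2), 0), ((0, 0, 3), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.1, epochs=2000):
    """Fit a logistic regression by plain stochastic gradient descent."""
    w, b = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def close_probability(deal, w, b):
    """Score a single open deal: probability it closes this quarter."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, deal)) + b)
```

Summing `close_probability` over every open deal on day one of the quarter gives the expected-bookings number Collins refers to, rather than waiting to see which late-stage deals actually land.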
LivePerson delivers conversational AI. The central idea is that with very short text messages coming into the system from a consumer, the machine can recognize what that consumer is interested in, what their desire or “intent” is, so that the company can either solve it immediately through automation or route the issue to an appropriate [customer service] agent. That understanding of the intent of the consumer is, I think, at the cutting edge of what’s possible through deep learning, which is the basis for the kind of algorithms that we’re deploying.
The idea is to apply the same kind of conversational AI layer across our systems layer and over the top of the data-lake architecture.
You wouldn’t need to be a data scientist, you wouldn’t need to be an engineer to simply ask about some [financial or other] information. It could be populated dynamically in a [user interface] that would allow the person to explore the data or the insights or find the report, for example, that covers their domain of interest. And they would do it by simply messaging with or speaking to the system. … That would transform how we interact with our data so that everyone, regardless of background or skillset, had access to it and could leverage it.
The goal is to create what I like to think of as an AI operating model. And this operating model is based on automated data capture — we’re connecting data across the company in this way. It will allow AI to run nearly every routine business process. Every process can be broken down into smaller and smaller parts.
And it replaces the traditional enterprise workflows with conversational interfaces that are intuitive and dynamically constructed for the specific domain or problem. … People can finally stop chasing data; they can eliminate the spreadsheet, the maintenance, all the errors, and focus instead on the creative and the strategic work that makes [their] job interesting.
I’ll give you an example of where we’ve already delivered. So we have a brand-new planning system. We ripped out Hyperion and we built a financial planning and analysis system from scratch. It automates most of the dependencies on the expense side and the revenue side, which is where most of the dependencies in financial planning lie. You don’t speak to it with your voice yet, but you start to type something and it recognizes and predicts how you’ll complete that search [query] or idea. And then it auto-populates the individual line items that you might be interested in, given what you’ve typed into the system.
Right now, it’s more a hybrid of live search and messaging. The system eliminates all of the filtering and drag-and-drop [the user] had to do, the endless menus that are typical of most enterprise systems. It really optimizes the workflow when a person needs to drill into something that’s not automated.
Unfortunately, there’s a misconception that you can hire a team of data scientists and they’ll start delivering insights at scale systematically. In reality, what happens is that data science becomes a small group that works on ad-hoc projects. It produces interesting insights but in an unscalable way, and it can’t be applied on a regular basis, embedded in any kind of real decision-making process. It becomes window-dressing if you don’t have the right skill set or experience to manage data science at scale and ensure that you have the proper processing [capabilities].
In addition, real data scientists need to work on problems that are stakeholder-driven, spending 50% to 80% of their time not writing code in a dark room by themselves. … [They’re] speaking with stakeholders, understanding business problems, and ensuring [those conversations] shape and prioritize everything that they do.
There are data constraints. Data constraints are pernicious; they will stop you cold. If you can’t find the data, or the data is not connected, or it’s not readily available, or it’s not clean, that will suddenly take what might have been hours or days of code-writing and turn it into a months-long, if not year-long, project.
You need the proper engineering, specifically data engineering, to ensure that data pipelines are built and that the data is clean and scalable. You also need an efficient architecture from which the data can be queried by the scientists so projects can be run rapidly, so they can test and fail and learn rapidly. That’s an important part of the overall workflow.
And then, of course, you need back-end and front-end engineers to deploy the insights that are gleaned from these projects, to ensure that those can be production-level quality, and can be of recurring value to the processes that drive decision making, not just on a one-off basis.
So that whole chain is not something that most people, especially at the highest level, the CFO level, have had an opportunity to see, let alone [manage]. And if you just hire somebody to run it without [them] having had any first-hand experience, I think you run the risk of just kind of throwing stuff in a black box and hoping for the best.
There are some pretty serious pitfalls when dealing with data. And a common one is drawing likely faulty conclusions from so-called small data, where you have just a couple of data points. You latch on to that, and you make decisions accordingly. It’s really easy to do that, and easy to overlook the underlying statistics that are necessary to draw valid conclusions.
Without that grounding in data science, without that experience, you’re missing something pretty essential for crafting the vision, for steering the team, for setting the roadmap, and ultimately, even for executing.