The following is a guest post from Niyi Oladepo, vice president of finance at Pike Industries, a CRH Company. Opinions are the author’s own.
It started with a premise, not a breakthrough.
State Department of Transportation bid results are public record. Every contractor bidding on a highway project in New Hampshire or Maine has its line-item prices published after the award. Mobilization, drainage pipe, hot mix asphalt, line striping — all of it, project after project, sitting in PDFs on state DOT websites.
My assumption: If historical bid results shape how experienced contractors think about price, I should be able to model that behavior with some confidence. Testing it took months, and the early versions were not good.
What went wrong first
The artificial intelligence models I worked with initially struggled with the volume and inconsistency of bid tabulation data. Early outputs included prices for items that do not apply to certain project types, competitor patterns attributed to the wrong geography and analyses that sounded confident yet were useless in practice. On one back-test, the tool came in 52% below the actual awarded value, not because unit prices were wrong but because it had worked from an incomplete item list with no way to flag what it was missing.
I spent months rejecting outputs and arguing with the model, going back in, challenging the reasoning, insisting it stay grounded in what the data showed rather than what it wanted to infer. When early outputs reached the estimating team, Bethany, one of our estimators, said, "This is neat." Greg, a senior construction manager, asked how I had gotten it done. Their feedback shaped the tool as much as anything I built deliberately. The point that never went away: Low bid price is not the objective. The system needed to find where our pricing position gave us a genuine advantage, not just where we could undercut.
What the tool actually has to know
The tool operates in three layers, and the division of labor between them was deliberate.
The first is historical baseline pricing. Drawing on 56 New Hampshire Department of Transportation projects valued at $394 million and 90 Maine DOT projects valued at $445 million (146 projects, $839 million in contracts, nearly 4,300 bid item data points), the AI establishes what competitors have historically priced for specific items, project types and regions across four dimensions: item type, project size, project category and geography.
Item type matters because competitors are not uniform across categories. A contractor whose advantage is HMA plant capacity will be aggressive on asphalt tonnage and conservative on labor-intensive drainage work. That pattern is invisible in any single project and becomes clear across 56 or 90 projects. Project size matters because the same contractor prices differently at $2 million, where filling a crew matters, than at $20 million, where bonding capacity and cash flow dominate the calculation. Geography matters because NHDOT's six districts and Maine DOT's five regions have distinct competitive dynamics. A contractor who dominates resurfacing near Portland may be largely absent from Aroostook County because haul economics make them uncompetitive on that work.
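Mechanically, the baseline layer comes down to grouping historical unit prices along those four dimensions plus the bidder, so each competitor's tendency is visible per segment rather than averaged away. A minimal sketch of that idea in Python; the field names and the three records are illustrative only, not drawn from the actual NHDOT or Maine DOT dataset:

```python
from collections import defaultdict
from statistics import median

# Hypothetical bid tabulation records; field names and values are made up.
bids = [
    {"item": "hot mix asphalt", "size_band": "$1M-$5M", "category": "resurfacing",
     "district": "NHDOT District 5", "bidder": "Contractor A", "unit_price": 82.50},
    {"item": "hot mix asphalt", "size_band": "$1M-$5M", "category": "resurfacing",
     "district": "NHDOT District 5", "bidder": "Contractor B", "unit_price": 88.00},
    {"item": "hot mix asphalt", "size_band": "$1M-$5M", "category": "resurfacing",
     "district": "NHDOT District 5", "bidder": "Contractor A", "unit_price": 79.75},
]

# Group unit prices by (item type, project size, project category, geography, bidder).
groups = defaultdict(list)
for b in bids:
    key = (b["item"], b["size_band"], b["category"], b["district"], b["bidder"])
    groups[key].append(b["unit_price"])

# One baseline price per competitor per segment; median resists outlier bids.
baselines = {key: median(prices) for key, prices in groups.items()}
```

The choice of median over mean is a design assumption here: a single aggressive or defensive bid should not drag a competitor's baseline for a whole segment.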
The second layer is where the estimator takes over entirely. No dataset knows whether a project has night-only work windows, whether guardrail subcontractors are available in this market or whether a competitor just won three large jobs and is pricing conservatively to protect capacity. The estimator scores five categories of dynamic factors (work schedule restrictions, site and production conditions, subcontractor dependency, technical complexity and current market conditions) and each carries a pricing adjustment. That scoring is judgment, not data retrieval.
The third layer is where the AI re-enters. When multiple dynamic factors are scored, some are correlated in ways that make simple addition misleading. Night work and project complexity are the clearest case: night work inherently introduces project complexity, so a 5% adjustment for each does not represent 10% in additional cost. The AI is programmed with the known relationships between factors and applies a blended adjustment: In that example, 7% rather than 10%, capturing the real overlap rather than mechanically stacking independent adjustments.
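The blending step can be sketched as: sum the individual adjustments, then subtract the known overlap for each pair of correlated factors that are both active. The factor names and the 3% overlap figure below are illustrative, reverse-engineered from the 5% + 5% yielding 7% example above rather than taken from the actual tool:

```python
# Pricing adjustments scored by the estimator, as fractions of price (hypothetical).
adjustments = {
    "night_work": 0.05,
    "complexity": 0.05,
}

# Known overlap between correlated factors (hypothetical value): night work
# already carries much of the complexity premium, so 3% is double-counted.
overlaps = {
    frozenset(("night_work", "complexity")): 0.03,
}

def blended_adjustment(adjustments, overlaps):
    """Sum independent adjustments, then remove double-counted overlap."""
    total = sum(adjustments.values())
    for pair, shared in overlaps.items():
        if pair <= adjustments.keys():  # both correlated factors were scored
            total -= shared
    return total

print(round(blended_adjustment(adjustments, overlaps), 4))  # 0.07, not 0.10
```

If only one factor in a correlated pair is scored, no overlap is subtracted and its adjustment applies in full, which matches the intent: the correction exists only to stop independent stacking.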
Any CFO who has built a capital allocation scorecard or acquisition pricing model has hit this problem. Correlated risk factors scored independently and summed will overstate total risk. Building the correlation logic into the model produces consistent outputs without removing the estimator's judgment on the inputs.
What we learned
The most consequential decision we made was to start with data that already existed. DOT bid tabulations were public, structured and tied directly to a decision we make hundreds of times a year. We organized what was already there. Most organizations have at least one dataset fitting that description and have not yet built the process around it.
What I did not anticipate was how much of the value would come from the failure period rather than despite it. The 52% scope miss identified a specific flaw in how the model handled incomplete inputs. The phantom recommendations identified where data density was too thin to support reliable inference. An iterative period where a model produces wrong answers that practitioners challenge and correct is not a sign that the approach is not working. It is the approach working.
The boundary between what the AI does and what the estimator does turned out to be the most important design decision in the whole system, and also the one we got right more by necessity than by foresight. The estimator's read on competitor backlog, subcontractor availability and seasonal market conditions cannot be extracted from historical data because it does not exist there. Once we accepted that as a permanent constraint rather than a temporary one to be solved, the architecture followed naturally: AI handles the history, the estimator handles the present and the deduplication logic handles the interaction between them. That division is not a compromise. It is what makes the output worth using.