Big Retailers Put Testing to the Test

When Wawa, the $6 billion convenience-store chain, came up with a new flatbread breakfast product in 2008, the company’s marketing department was flat-out excited. The product had performed exceptionally well in spot testing and seemed more than ready for a systemwide rollout.

At that very time, though, Wawa was trying out a software-based approach to designing and evaluating customer-behavior tests. After it redid the flatbread test using the technology, the breakfast item was killed. “We found it was cannibalizing other, more-profitable products,” says Wawa CFO Chris Gheysens.

What made the difference was a more-scientific approach to selecting stores for test and control groups, as well as regression analyses to weed out irrelevant “noise” from test results. The software that provided the heightened sophistication, called Test & Learn, is from Applied Predictive Technologies (APT), which has established a strong following among large retailers.

Before signing on with APT, Wawa routinely performed what Gheysens calls a “good old manual” testing process, with financial analysts using spreadsheets to select stores and evaluate flux and trending data.

That process was problematic, says Gheysens. “We were never really confident as a finance group giving approval for new products or other initiatives, and we didn’t have a strong-enough voice to kill them either, because with that kind of manual analysis and all the noise in the data, you really rely on influence and emotion more than facts,” he says.

The software isn’t cheap. The average annual cost for the typical three-year license is between $700,000 and $1 million, notes Scott Setrakian, APT’s managing director. That price point defines its market: Fortune 1000 companies.

It takes anywhere from two weeks to a few months to establish a daily data feed from a customer to APT. Companies provide information by store, market, or merchandise class, and some also provide transaction-level data.

Test & Learn is designed to provide visibility into the impact of any kind of program, investment, or activity that may influence customer behavior. It provides three levels of understanding, Setrakian says: how an action will affect overall sales or profitability; how to tailor the action for maximum effectiveness, such as whether a 5%, 10%, or 15% discount is best; and the impact by market, store, or even by customer.

For designing tests, the software calculates the optimal number of locations to test and the test duration. For example, if daily sales normally fluctuate widely, more stores must be included in the test, and for a longer period, than if sales are typically stable.
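To make the link between sales volatility and test size concrete, here is a minimal sketch using the textbook two-sample power calculation. It is not APT's method, which the article does not describe. The function name, parameters, and default thresholds are illustrative assumptions, and the sketch assumes that daily sales are independent, so that averaging over more days shrinks the noise.

```python
import math
from statistics import NormalDist


def stores_needed(daily_sales_std, min_lift, days=1, alpha=0.05, power=0.8):
    """Rough number of stores per group (test and control) needed to detect
    a lift of `min_lift` in average daily sales per store.

    Assumptions (illustrative, not from the article): sales are roughly
    normal, days are independent, and both groups are the same size.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    # Averaging each store over `days` days reduces the per-store noise.
    std_of_store_average = daily_sales_std / math.sqrt(days)
    n = 2 * ((z_alpha + z_beta) * std_of_store_average / min_lift) ** 2
    return math.ceil(n)


# Hypothetical figures: the same $500 lift is harder to detect when
# daily sales swing by $1,000 than when they swing by $200.
volatile = stores_needed(daily_sales_std=1000, min_lift=500)
stable = stores_needed(daily_sales_std=200, min_lift=500)
longer_test = stores_needed(daily_sales_std=1000, min_lift=500, days=14)
```

Under these assumptions, the volatile chain needs many more stores than the stable one, and running the test longer reduces the number of stores required, which is the trade-off between locations and duration described above.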

The software then picks test stores that are representative of the full-rollout population in any number of selected attributes, such as store size, store age, or sales volume. And for each test store, the software finds others very much like it to compose a custom control group where the program will not be tested.
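One simple way to picture how a custom control group might be assembled is nearest-neighbor matching on standardized store attributes. The sketch below is only an illustration of that general technique, not a description of how Test & Learn works internally. The store IDs, attribute names, and figures are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(stores, attrs):
    """Rescale each attribute to zero mean and unit spread so that no single
    attribute (for example, square footage) dominates the distance."""
    scaled = {store_id: [] for store_id in stores}
    for attr in attrs:
        values = [store[attr] for store in stores.values()]
        mu = mean(values)
        sd = pstdev(values) or 1.0  # avoid dividing by zero if all values match
        for store_id, store in stores.items():
            scaled[store_id].append((store[attr] - mu) / sd)
    return scaled


def match_controls(stores, test_ids, attrs, k=2):
    """For each test store, pick the k most similar non-test stores."""
    scaled = standardize(stores, attrs)
    candidates = [store_id for store_id in stores if store_id not in test_ids]
    return {
        test_id: sorted(
            candidates,
            key=lambda c: math.dist(scaled[test_id], scaled[c]),
        )[:k]
        for test_id in test_ids
    }


# Hypothetical store data: size in square feet, age in years,
# weekly sales in thousands of dollars.
example_stores = {
    "A": {"size": 4000, "age": 5, "sales": 90},
    "B": {"size": 4100, "age": 6, "sales": 92},
    "C": {"size": 9000, "age": 20, "sales": 200},
    "D": {"size": 8800, "age": 19, "sales": 195},
    "E": {"size": 4200, "age": 4, "sales": 88},
}
controls = match_controls(example_stores, ["A"], ["size", "age", "sales"], k=2)
```

In this toy data, the small, young store A is matched with the other small, young stores rather than with the large, older ones. A production tool would presumably weigh many more attributes and also check that the matched stores tracked the test store's sales before the test began.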

Notably, the software also has built-in tools that let a company’s business analysts interpret the results. That is a key reason APT has established a unique niche in the business intelligence analytics market, says Kevin Sterneckert, a retail merchandising and manufacturing analyst for Gartner.

Retailers wanting to employ a scientific approach to testing often hire management consultants who license analytics software from such vendors as SAS, IBM, or MicroStrategy and interpret test results. APT’s software, on the other hand, is designed to let a company design, run, and analyze tests itself without having to employ a large group of statisticians and highly specialized analysts, Sterneckert points out.

“The software allows the common analyst or merchant to develop a testing process and understand the results,” he says. “What APT tries to do is package into the software the intellectual capital that would come from consultants.”

Gheysens says Wawa was very cautious about purchasing APT’s software, running six months of pilot tests. Now, though, Test & Learn is becoming institutionalized at the company as the way decisions are made.

In addition to testing marketing activities and new products, Wawa has examined the impact of adding or subtracting hours of labor at its stores. There was a longstanding belief internally that at stores with smaller parking lots, having extra cashiers at certain hours would move customers out more quickly and thus stimulate sales. When the idea was tested with APT, it turned out to be false, says Gheysens.

Wawa also discovered that adding 50 hours of store labor per week produced dramatically higher sales at stores in the largest urban areas. The company initially was troubled, though, that it couldn’t explain why. While test results were clear early on, the initiative remained in testing for more than six months, after which the extra labor was added to about 50 stores.

Another retailer, Big Lots, says it’s saving several million dollars a year on printing and distribution costs after using Test & Learn to find out whether it could produce fewer circulars without harming sales. The $5 billion chain is also installing new energy-management systems at 700 stores this year following tests performed in 2009, says CFO Joe Cooper.

Michaels, the big arts-and-crafts chain, is currently using Test & Learn to experiment with distributing circulars electronically. It’s finding that not all markets are created equal: some do well with electronic distribution and others don’t, says Rick Jablonski, vice president of financial planning and analysis. The company also is investigating the impact of Halloween “pop-up” stores on nearby Michaels locations.

In Michaels’s decentralized structure, the software is used predominantly by finance staff embedded in the various functions. The company is putting together a user group to make sure that what’s learned is shared across the organization.

Chip Molloy, finance chief of PetSmart, says he first used APT’s software several years ago when he headed financial planning and analysis for a different retailer. “The challenge there was that folks weren’t always wanting to buy into it,” he says. “It doesn’t always give you the answer you want, especially if you’ve put a lot of effort into something that isn’t working.”

At PetSmart the culture is more oriented toward being disciplined about testing, notes Molloy. But, he adds, “you can’t be doing 1,000 tests at a time. You really have to have some governance around what you’re going to test.”