While recent innovations in the software sector have significantly boosted capabilities for most software products, free open-source software solutions have made an even bigger splash. More than half of all data mining tasks are now conducted using open-source software, displacing purchases of proprietary software. Market research firm IBISWorld estimates that the adoption of open-source software has contributed to a 1 percent annualized decline in the price of data mining software in the three years to 2014, to about $8,000 per user per year. Although most buyers cannot rely solely on open-source software and will still have to purchase some proprietary data mining software, they can reduce total costs by buying fewer proprietary licenses.
Pricey and Proprietary
For the uninitiated, data mining is the manipulation and examination of very large databases to find patterns. Data mining is used for databases that are too large to be handled by data management software like Microsoft Access. Data mining software is used by any organization that utilizes large databases, such as financial institutions, academic institutions, retailers, government agencies and consulting firms.
According to IBISWorld estimates, about 50 companies develop and publish data mining software, from conglomerates such as IBM, Microsoft and Oracle to independent vendors such as StataCorp and StatSoft that specialize solely in data mining.
The proprietary market is highly concentrated, with IBM and SAS Institute together controlling about half of the market’s revenue. The high level of concentration means there is less price competition in the market, reducing a buyer’s ability to negotiate for the best deal.
The price of proprietary software moves mainly with the cost of producing the software and with demand from the public and private sectors. Production costs depend primarily on software engineer wages and the level of innovation and change to underlying code, and demand is based primarily on the level of government investment and corporate profit. Even though both wages and demand have risen in recent years, prices have fallen because vendors have been forced to cut prices to maintain market share in the face of the rising popularity of open-source options.
Unsettling the Market
High market-share concentration in the paid data mining software market would normally spell low negotiating power for buyers, but the existence and rising popularity of open-source data mining software, specifically the R programming language, have given buyers greater power and more options. Few open-source tools have affected data mining software more than R. According to the Rexer Analytics “2013 Annual Data Miner Survey,” R has been the most-used data mining tool since 2010; about 70 percent of data miners reported using the tool in 2013. Currently, about one-quarter of all data miners use R as their primary analytics tool, more than any other software.
The rapid adoption of the R programming language has played a significant role in disrupting the market. Vendors have been forced to lower their prices in an effort to retain customers. The average price of proprietary software has fallen $860 since 2008, despite continual improvements in power and capabilities. In addition, vendors have grown increasingly willing to offer bundle discounts to companies that also purchase related software packages, such as data management and business analytics software. Use of the R programming language is expected to increase over the next three years, which will force vendors of proprietary data mining software to cut prices even further. Prices are forecast to fall an average of 1 percent per year during the period.
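The forecast is simple compound arithmetic. As a rough sketch only, the Python snippet below assumes the roughly $8,000 per-user price cited at the top of this article as the 2014 starting point and a flat 1 percent decline each year. The function name and the flat-rate assumption are illustrative and are not IBISWorld's model.

```python
def project_price(start_price, annual_change, years):
    """Return projected prices for year 0 through `years` at a constant annual rate of change."""
    prices = [start_price]
    for _ in range(years):
        # Each year's price is the prior year's price scaled by (1 + rate).
        prices.append(prices[-1] * (1 + annual_change))
    return prices


# Assumed inputs: about $8,000 per user per year in 2014, falling 1 percent per year.
projection = project_price(8000.0, -0.01, 3)
for offset, price in enumerate(projection):
    print(f"{2014 + offset}: ${price:,.2f}")
```

Under those assumptions, the per-user price would drift from $8,000 to a little over $7,760 after three years, a modest change for a single license that adds up across a large installed base.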
‘R’ You Suited to R?
The popularity and broad functionality of R mean that companies using only proprietary data mining software can potentially save money by making greater use of R. Although most companies engaged in data mining will not be able to switch over to R fully due to the nature of the work, buyers can try to build an R software package that emulates the functions of costly proprietary software, which would allow them to release those licenses and save on software costs.
Although open-source software is generally free, it does come with other costs that are important to evaluate. Its high level of customizability is great for users who have a firm footing in programming, and in the R language specifically, but the software is practically useless for companies without such programmers on staff. Those companies will have to stick to proprietary software, hire a programmer with R expertise (who will typically command a salary of more than $100,000) or train their existing staff members.
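To put the hiring option in perspective, the back-of-the-envelope Python sketch below compares the $100,000 salary floor with the roughly $8,000 per-user annual license price cited at the top of this article. It is an illustration under loud assumptions: the function name is hypothetical, the salary is treated as exactly $100,000 even though the figure is a minimum, and benefits, training and build time are ignored.

```python
import math


def licenses_to_break_even(annual_salary, license_price):
    """Smallest whole number of licenses whose combined annual cost covers the salary."""
    return math.ceil(annual_salary / license_price)


# Assumed inputs: a $100,000 salary (a floor, per the article) and
# about $8,000 per proprietary license per user per year.
print(licenses_to_break_even(100_000, 8_000))
```

On those figures, a buyer would need to retire at least 13 proprietary licenses a year for a dedicated R hire to pay for itself on license savings alone, and more once the omitted costs are counted.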
Proprietary software is built with robust and easy-to-use user interfaces to guide users through both standard and more advanced functions. Companies that use R, on the other hand, must build and customize their own platform with little help from the program itself, a process that can take a month or longer. Given its low cost, customizability and steep learning curve, R is best suited for buyers that place high value on saving money and on the ability to write code for the program, and low value on the quality of the user interface and ease of use.
R’s flexibility means that it can be made to emulate the functions of proprietary options, such as SPSS, SAS or Stata, and its low cost makes it cheap to have on hand even if it is not the primary tool used. R can be difficult to use, but companies that have an IT department with the time and resources to build and implement an R-based platform will be able to realize substantial cost savings by reducing their reliance on proprietary software. As a result, buyers should look to build an R-based data mining package that can replicate the functionality of one or more of the proprietary software solutions they use. Even if buyers cannot cost-effectively build a complete R-based platform, they should be able to use its existence as leverage in price negotiations with vendors.
Dale Schmidt is a lead technology analyst at IBISWorld. He received his bachelor’s degree in economics from Northwestern University and can be reached at firstname.lastname@example.org.