I derive great enjoyment from reading the blogs, comments and tweets about whether or not Big Data will replace the Data Warehouse. But as with most predictions, I believe there is a half-truth hidden therein. The momentum around Big Data is forcing organizations to re-evaluate their investments in data management, and there can be no doubt that many Data Warehouses haven’t come close to delivering their anticipated ROI. On the other hand, Big Data technologies are still at an early stage of the adoption curve and most organizations are not “betting the bank” on them yet.
That said, as investment in Big Data technologies grows, the lines between these technologies and traditional databases (or more specifically the data warehouse tools) will continue to blur, and the migration to the cloud will only accelerate this shift. However, the two technologies are built on different underlying principles, so the technologies derived from them will remain different as well (at least in the near term).
So, how does this impact the fate of the Data Warehouse? To address the question we have to consider the different underlying information requirements as well as the prevailing culture in the average enterprise. Recently, we’ve started to think more about the notion of the “half-life” of data and the different classes of analysis that enable decision making. Consider these two classes of analysis: “insight discovery” versus “operational intelligence”. They are fundamentally different from the traditional “classes of end-user access” approach to BI.
I’ve had the pleasure of being involved in many Data Warehouse projects that have made breakthrough findings (discovered insights). Retailers benefit from discovered insights on inverse price elasticity, while insights on negative-margin, high-volume products help CPG companies set better pricing and promotion strategies. Telecom companies rely on data warehouse-aggregated information and analytics to discover redundant high-cost circuits, and insurers rely on them to help curb inefficient reserve management. In all these cases the business returns were quantifiable and in the millions. Once the “insight” was discovered, corrective action was taken and the need shifted from discovery to the operational execution and monitoring (operational intelligence) of the “new normal”.
In all these cases, the critical ingredients were the data, the business knowledge and the IT skills to crunch the data (brought together in a collaborative and agile way). The Big Data technologies enable a whole new class of insight discovery, against data we’ve either not had access to before (new sensors, social media, etc.) or not had the tools to process efficiently, for example high-volume time-series analysis or cluster analysis, which are hard to run on a traditional RDBMS. But the technology is only one of the three ingredients; we still need the data and the business knowledge to develop the insights.
What’s especially interesting to me about the Big Data movement is who’s investing in it. By and large it’s not IT; it’s the business, in search of insights or discoveries on which it can capitalize. That’s creating a tension (similar and related to the “Self-service BI” trend) with the IT-managed DW and BI competency center. But in truth, those in IT haven’t had the right tools and processes to establish a collaborative, agile working model with their business sponsors.
In a 2012 survey we ran of 542 practitioners and managers, 430 (83%) of all respondents confirmed that they “can’t keep up with the business requests for new or updated information requirements…” In the same survey, a third of the respondents reported employing more than 20 people to support their data warehouse, and 47% reported spending more than $1.5 million a year to operate it.
It’s little wonder the business is looking to other technologies to develop insights! So yes, I do believe that the emergence of Big Data into the business mainstream changes both the conversation around the traditional data warehouse and, ultimately, its position. But the change won’t be a matter of one or the other; rather, it will be a shift in focus toward making the data warehouse more agile, more responsive to business needs and more efficient to build and operate!