The analytic sandbox, also called a data sandbox, is an idea that resurfaced in a big way in 2012. In case you haven’t come across one yet, Techopedia defines the data sandbox as “a scalable and developmental platform used to explore an organization’s rich information sets through interaction and collaboration.” It turns out that sandboxes are an old idea made new again: Paul Fong points out in “The Analytic Sandbox” that sandboxes were part of the 2001 Corporate Information Factory published by Bill Inmon and Claudia Imhoff.
So why the renewed interest? A likely cause is that organizations recognize they need faster, more efficient analytic processes to capitalize on new opportunities in today’s fast-moving big data world. Unfortunately, a non-agile Enterprise Data Warehouse (EDW) offers neither speed nor efficiency when it comes to supporting new initiatives. According to Fong, adding a new initiative to the EDW costs about $500,000 and has a lead time of 6 to 9 months.
From master data governance to self-service BI, many organizations are developing structured processes for accessing and consuming data without significant IT intervention. While many of the promises of effective governance have been met for master and reference data, transaction data has yet to receive the same level of control or visibility. Agile data management therefore needs to be extended to the EDW so that it can efficiently feed the sandboxes the next generation of power users will need.
I see the analytic sandbox as the next step in the evolution of agile data management. New ideas happen fast with power users, and the rise of the data scientist will only increase the intensity and pace of discovery. Like any scientists, data scientists need a lab to test complex new ideas, and in most organizations they can do this faster than the EDW team. Unfortunately, it’s not as simple as setting up a virtual (or physical) area for data scientists and throwing data over the wall. Several questions need answers:
1. Will the extracted data be reliable and fit for purpose?
2. How will data scientists integrate third-party data?
3. Do processes exist to get good ideas back into the EDW quickly?
The answer to the first two questions typically involves a combination of rules engines, data matching tools, and data munging that won’t be easily repurposed for the warehouse. The answer to the third is probably that new ideas stay in the sandbox, because implementing new initiatives in a non-agile EDW is too slow, too labor-intensive, and too costly, as noted above.
Many view analytic sandboxes as a way to compensate for the lack of agility in most traditional EDWs; however, the result is often yet another data silo. Analytic sandboxes can become even more valuable when agile data warehousing techniques are used in concert with the EDW. An agile Enterprise Data Warehouse (aEDW) can help make sandboxes more effective by:
- Making it easier to understand and access the data in the warehouse;
- Providing governed and trusted data to the analytic sandboxes;
- Using agile tools to streamline the integration of external data used in the sandboxes; and
- Ensuring that good ideas are quickly made repeatable in the warehouse.