Someone asked me my opinion of Anchor Modeling, and I had to admit that I hadn’t really looked at it in detail. I’m not sure why — at first glance, “Almost 6th Normal Form” modified using a practical approach and software automation should be right in wheelhouse of people who work at Kalido!
I took a few minutes today to look it over, and while I’ll be the first to admit I’ve got a lot more to learn on the topic, here are my initial reactions. First, I really wanted to like it. Smart guys like Lars Rönnbäck see problems with current data modeling techniques and think: “I can improve on that”, and with a good team, set out to do it. We need more people around like that. Second, I’m on board with the Anchor Modeling philosophy Lars put forth at GSE Nordics in June 2011:
- Make modeling free from assumptions
- Make modeling agile and iterative
- Make evolution non-destructive
- Do not duplicate information
- Do not alter existing information
- Decouple metadata from the model
- Provide a simple interface for queries
Stuff I liked:
- Anchor Modeling takes a new modeling approach because they assume all requirements will change all the time. I totally agree with that, and I just don’t think the current practices we see out there using traditional ER modeling tools and ETL tools can really meet the agile requirements of a modern business. (Incidentally this is the same reason we created Kalido. I dug out this old website splash screen from 1999 using the Wayback Machine.)
- Model changes are primarily additive – systems that use non-destructive schema evolution are significantly more stable than those that don’t. Adding new attributes to the model leaves the existing structures in place and removing items comes about by ‘end dating’ them. Certainly this is the approach Kalido has taken.
- Absence of null values – Lars provides a few reasons for this, including: 1) an entity need only have an identity to be allowed into the data warehouse and 2) Constraints are not used in the data warehouse due to performance reasons. Kalido takes a similar stand against NULLs.
- Table elimination – Because Anchor Modeling has described the resulting table structures, their query optimizer can remove unneeded tables from a query. This is similar (though not exact) to how Kalido Query Definitions generate SQL and attempt to generate the optimal SQL to resolve what the user has asked for.
- Time variance – AM has temporal concepts baked right in. Because describing such capabilities can be challenging AM defines concepts like “changing time”, “happening time”, and “recording time.” Kalido calls this ability ‘corporate memory’ and I could certainly be persuaded that some of our temporal terms within that could be improved.
Stuff I didn’t like:
- I’m not a fan of the model representation. For example, here is a sample AM restaurant example:
Just like I don’t think most business users get anything out of a typical ER model, I don’t think this type of model is going to resonate with a business user either. I think a business model that can be the common language between the business and IT is going to have the most value. Business users can look at it, point out missing items, and return requirements to you in the context of the existing model. Behind the scenes, the system can then translate the model into whatever flexible format is required. For comparison sake, here is roughly the same model in Kalido BIM, with the same features, but in a much more business-friendly format:
- I think the naming conventions based on mnemonics (e.g. DI_Dish, CUI_Cuisine) are not liable to be popular at some companies nor with the BI teams designing reports. The automated view design helps, but I think more could be done here.
- If the existing modeling structure is going to be persisted, I would consider adding intelligent defaults to improve model development time. For instance, after typing the Descriptor, the Mnemonic field could be prepopulated based on a best practice. In this example, Lars manually typed EXP when it could have defaulted, and if EXP was already used, the modeler could highlight that in red and perhaps suggested an alternative. The data types could also have defaulted or at the very least had a drop down for easy selection.
- I may have missed this in my limited reading, but it wasn’t clear to me what happened when the initial assumptions proved to be incorrect or the business changed their mind. For example, Cuisine was deemed very small, so tinyint was used as the key. But what if later our system evolves and we ended up with more than 255 types of cuisine? Is a failing load our first indication of that? Does the table definition have to change? How does that fit with the principal of non-destructive schema evolution? Similarly, time variance appears to be an ‘add on’ technique. What happens if we run for 6 months with no history and the business later comes back and says they do want history? Being able to optimize is good, but I think a good system should make smart choices about things like surrogate key type. The amount of space saved using tinyint is not likely to add more value than the time lost debating if it should be a tinyint, especially if we factor in the time needed down the line to change it if the initial choice we made was wrong.
- AM appears to have a heavy reliance on ETL at the moment. When implementing a system like this (without database constraints), referential integrity is of critical importance, which means the ETL team is going to have to be extremely well versed in the principals of AM to avoid making a mistake that corrupts the database. If they don’t already exist, templates for the leading ETL tools should be generated to mitigate this risk. However, I think a much safer approach is the one Kalido took which is to manage the population of the data warehouse natively and pushing the heavy lifting to the database itself (e.g. ELT). This helps on all platforms, but especially helps on the MPP databases like Oracle Exadata and Teradata.
Bottom line, I think it is still early days for Anchor Modeling. I certainly see some potential here, but at the current time, I’d give a lot of thought before deciding to build your data warehouse using this approach. Success is likely to require a ‘guru’ level anchor modeler, and the AM site mentions that at present there are less than 100 implementations in the world (mostly in Sweden), so a guru may be hard to find. Even with a guru, I think the business is still going to need another representation of the model to understand what they are signing off on, and I think you are likely to have an ongoing struggle with acceptance from the larger DBA and BI community in your company. That said, I wish Lars and team well, because these types of ideas (“Almost 6NF” matched with a practical approach and software automation) are something Kalido has lived with daily for the past 15 years
and what we have built our company on.