Modeling to Support Agile Data Warehouses: Hyper Normalization and Hyper Generalization

I was in Las Vegas recently for The Data Warehousing Institute conference helping out with our Magnitude Software booth, and I noticed one of the classes being taught there was “Agile Data Engineering:  New Data Modeling Techniques that Readily Adapt to Constant Change.”

We’ve been modeling for a long time now—is there really something new out there?  I decided to attend.

The instructor for this class was Ralph Hughes, who has spent 30 years delivering data warehouses and has written two books on Agile Data Warehousing, with a third in progress.  The gist of Ralph’s class was this:  agile methodologies help deliver data warehouses faster, but those techniques only get you so far.  To get truly breakthrough performance, you need to model in a way that can adapt to the inevitable business change that will be thrown your way as the project progresses.

Ralph focuses on two modeling methods.  The first he calls “Hyper Normalized Form” (HNF) and the second “Hyper Generalized Form” (HGF).  HNF generally aligns with Data Vault and Anchor Modeling ideas, while HGF is a generalized version of HNF that, when coupled with software automation like Kalido, overcomes some of the underlying issues with HNF.  Briefly, those issues are that HNF is primarily focused on the integration layer rather than the presentation layer, and that getting data out of an EDW based on pure HNF is more difficult.  Certainly the HNF EDW model is more complex than traditional 3NF, with significantly more tables.  In their defense, practitioners of HNF would say they have methods of addressing these issues, albeit with custom development that takes the hubs, links, and satellites and builds out more traditional consumption layers.
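To make the hub/link/satellite point concrete, here is a minimal Python sketch of what that custom development amounts to.  This is my own illustration, not something from Ralph’s class and not how Kalido or any Data Vault tool works:  the table names, keys, and sample rows are all made up, and plain lists of dicts stand in for database tables.  It only shows why an extra step is needed to turn hubs, links, and satellites into a flat row a report can use.

```python
# Hypothetical tables: hubs hold only business keys.
customer_hub = [{"customer_hk": "c1", "customer_id": "CUST-001"}]
order_hub = [{"order_hk": "o1", "order_id": "ORD-9001"}]

# Links hold only relationships between hubs.
customer_order_link = [{"customer_hk": "c1", "order_hk": "o1"}]

# Satellites hold descriptive attributes, versioned by load date
# (ISO date strings, so string comparison is chronological).
customer_sat = [
    {"customer_hk": "c1", "load_date": "2015-01-01", "name": "Acme", "city": "Austin"},
    {"customer_hk": "c1", "load_date": "2015-03-01", "name": "Acme", "city": "Boston"},
]
order_sat = [{"order_hk": "o1", "load_date": "2015-02-10", "amount": 250.0}]


def latest(sat_rows, key_col, key):
    """Return the attributes of the most recent satellite row for one hub key."""
    rows = [r for r in sat_rows if r[key_col] == key]
    if not rows:
        return {}
    newest = max(rows, key=lambda r: r["load_date"])
    return {k: v for k, v in newest.items() if k not in (key_col, "load_date")}


def build_order_view():
    """Flatten hubs, links, and satellites into one row per customer/order pair."""
    customer_ids = {h["customer_hk"]: h["customer_id"] for h in customer_hub}
    order_ids = {h["order_hk"]: h["order_id"] for h in order_hub}
    view = []
    for link in customer_order_link:
        row = {
            "customer_id": customer_ids[link["customer_hk"]],
            "order_id": order_ids[link["order_hk"]],
        }
        row.update(latest(customer_sat, "customer_hk", link["customer_hk"]))
        row.update(latest(order_sat, "order_hk", link["order_hk"]))
        view.append(row)
    return view


if __name__ == "__main__":
    for row in build_order_view():
        print(row)
```

Even in this toy case, one flat row touches five tables, which is the “more tables, harder to get data out” trade-off in miniature.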

It is worth mentioning that not everyone agrees with this terminology.  For instance, Dan Linstedt said of hyper normalization:  “No such thing exists.”  He thinks that industry conferences are “perpetuating incorrect instruction” by allowing new terms to be defined.  I disagree.  Industry terms are created and defined all the time.  We can agree or disagree with them, but they define our way of thinking about issues.  Take “Big Data,” for instance.  I don’t like the term, but I know what the industry means when they say it.  Also, the first edition of Corporate Information Factory in 1998 by Bill Inmon, Claudia Imhoff, and Ryan Sousa never once used the term “third-normal form,” but if you ask someone what is meant by an Inmon-style normalized warehouse, I bet 95% of people asked would say that normalized meant 3NF.  If you ask someone about a denormalized warehouse, they would know you were probably talking about Kimball-style star schemas.  Even within the Data Vault community, things seem to be changing, with an evolution towards both American and Dutch/European Schools of Data Vault (“Unified Anchor Modeling”).

Bottom line:  language evolves, and adding a qualifier like “hyper normalized” makes it clear that you are not talking about just a 3NF warehouse, which is what most people would assume if you simply said “normalized.”  From that standpoint alone, having a different term makes sense.

Overall, I really enjoyed Ralph’s class and hearing about his success in applying agile delivery techniques to the data warehousing field.  While some of these modeling techniques are not exactly new, it is nice that they are finally getting some coverage at an industry event of the caliber of The Data Warehousing Institute!

