Data Model Denormalization: how do we make trade-offs when denormalizing?
- Sub-question 1: the standard for applying denormalization
- Do we always have to denormalize a model? Which kinds of projects require denormalization techniques, and which can do without them?
- Since denormalization has its gains and losses, how far should we denormalize a data model? Perhaps the more completely we denormalize, the more complex, uncertain and fragile the situation becomes.
- Sub-question 2: the characteristics of denormalization
- Does denormalization have several levels/forms analogous to those of normalization? For instance: 1DNF, 2DNF...
- Once we denormalize a data model, it may never be restorable to the original, because normalization can be carried out in many ways, and building a data model involves multiple choices in determining entities, attributes, etc.
You do not always have to denormalize. I have had a few data models that did not need to be denormalized. However, I admit that these models were relatively small and designed to help solve simple problems.
Nonetheless, most trade-offs are performance-based, depending on the database engine that will be used. I would say that one wants to maximize performance while maintaining the integrity of the data. When push comes to shove, I feel integrity should win out over performance. Otherwise, the data could end up in poor condition and may be unusable.
While normalization works up the normal forms, I feel that denormalization works down the normal forms. Hopefully the model still ends up in some normal form; it just may not be the optimal one. It could be the case that the Conceptual Model is in 2NF, the Logical Model is in 4NF and the Physical Model is in 3NF.
I agree with Raymond. I would focus on the trade-off between data integrity and usability. By 'usability' I mean that many users find multi-table joins difficult, so they request a 'flattening' with denormalized data. But as mentioned, denormalization adds the risk that the data falls out of sync.
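To make that risk concrete, here is a minimal sqlite3 sketch (the table and column names are illustrative, not from any real system): a normalized customer/orders pair versus a denormalized "flat" copy of the kind users ask for, showing how the copy silently goes stale after an update.

```python
import sqlite3

# Hypothetical schema: normalized tables plus a denormalized flat copy.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    -- Denormalized copy: customer name repeated on every order row.
    CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, amount REAL);
""")
con.execute("INSERT INTO customer VALUES (1, 'Acme')")
con.execute("INSERT INTO orders VALUES (10, 1, 99.5)")
con.execute("INSERT INTO orders_flat VALUES (10, 'Acme', 99.5)")

# Normalized access needs a join ...
joined = con.execute("""
    SELECT o.id, c.name, o.amount
    FROM orders o JOIN customer c ON c.id = o.customer_id
""").fetchall()

# ... the flat table does not, which is why users ask for it.
flat = con.execute(
    "SELECT order_id, customer_name, amount FROM orders_flat").fetchall()
assert joined == flat  # in sync, for now

# The risk: update the master row and the copy silently drifts.
con.execute("UPDATE customer SET name = 'Acme Corp' WHERE id = 1")
stale = con.execute("SELECT customer_name FROM orders_flat").fetchone()[0]
print(stale)  # still 'Acme' -- out of sync with customer.name
```

Nothing in the database flags the inconsistency; whoever adds the flat table also inherits the job of keeping it synchronized.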
In my opinion the process should be thought of as “creating a denormalised version of the model” rather than “denormalising the model” since a completely normalised logical data model should be retained after it is finalised (by “completely normalised” I mean BCNF unless there are any all-key tables with 3 or more columns, or 6NF is required to manage time-variance).
This is because denormalisation is generally only required for a particular version of a particular platform and may become unnecessary as that platform evolves.
The completely normalised logical data model should be the reference model for each physical data model, and the physical data model is where any denormalisation occurs. Denormalisation incurs costs in terms of maintaining consistency between the multiple instances of each item of redundant data, so should in general only be considered as a form of temporary fix to be reviewed whenever the platform is updated.
The reason for retaining the completely normalised logical data model is that if information requirements change the conceptual data model should first be changed and changes to that should then flow through to the logical data model. Each new or changed data item should then be reviewed to establish whether it requires denormalisation.
If the conceptual data model is used (as it should be) as the business view of the information being maintained, there is a case for that model to be unnormalised. For example, where certain computed data items such as totals are business-critical, they should be included in the conceptual data model (and marked as computed). While these redundant data items should not be included in the logical data model, they can be included in the physical data model, but only if there is a sound case for such computed items being stored rather than calculated on the fly.
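A small sqlite3 sketch of that last choice (the schema is illustrative): the conceptual model carries an order total as a computed item, and the cheapest physical realisation is often a view that calculates it on the fly, with no redundancy to maintain.

```python
import sqlite3

# Hypothetical order-lines table; the total exists in the conceptual
# model as a computed item, not as stored data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE order_line (order_id INTEGER, qty INTEGER, unit_price REAL);
    -- Physical choice A: compute the total on the fly (no redundancy).
    CREATE VIEW order_total_v AS
        SELECT order_id, SUM(qty * unit_price) AS total
        FROM order_line GROUP BY order_id;
""")
con.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                [(1, 2, 10.0), (1, 1, 5.0)])

total = con.execute(
    "SELECT total FROM order_total_v WHERE order_id = 1").fetchone()[0]
print(total)  # 25.0 -- always consistent with the underlying lines

# Physical choice B, a stored total column, would need a trigger or
# batch job to stay in sync; that maintenance cost is the "sound case"
# test described above.
```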
@y-chen-33 these questions aren't from a school assignment are they? 😶
Nevertheless I'm happy to throw in my two cents (my opinion which isn't worth much). Raymond's and Graham's replies were really good.
Personally, when I'm training new modelers and data architects, the several hours of slides and lip-flapping boils down to this:
- ALWAYS design the physical data model to its ideal normalized state FIRST (typically BCNF).
- THEN let it ripen for a while. Let the developers use it. If the model is complex and unwieldy, give them some abstractions and helpers, like views, instead-of triggers, and canned/reusable DML, that take away some of the join drudgery.
- ONLY denormalize when forced to due to performance requirements of the system. There are a few other edge cases, but performance is the main reason to denormalize. If you have to denormalize, ensure that there are robust mechanisms built or utilized to keep the source data and its redundant manifestations in sync.
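The last two points above can be sketched in sqlite3 (all names are illustrative): first a view that hides the join drudgery without storing anything redundantly, then, if denormalization really is forced, a trigger as one robust mechanism to keep the redundant copy in sync.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL);

    -- Abstraction first: a view gives developers the "flat" shape
    -- without storing anything redundantly.
    CREATE VIEW orders_wide AS
        SELECT o.id AS order_id, c.name AS customer_name, o.amount
        FROM orders o JOIN customer c ON c.id = o.customer_id;

    -- If denormalization is forced, pair the redundant column with a
    -- trigger so the copies cannot silently drift apart.
    CREATE TABLE orders_denorm (order_id INTEGER, customer_name TEXT,
                                amount REAL);
    CREATE TRIGGER sync_customer_name AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE orders_denorm SET customer_name = NEW.name
        WHERE order_id IN
            (SELECT id FROM orders WHERE customer_id = NEW.id);
    END;
""")
con.execute("INSERT INTO customer VALUES (1, 'Acme')")
con.execute("INSERT INTO orders VALUES (10, 1, 99.5)")
con.execute("INSERT INTO orders_denorm VALUES (10, 'Acme', 99.5)")

con.execute("UPDATE customer SET name = 'Acme Corp' WHERE id = 1")
name = con.execute(
    "SELECT customer_name FROM orders_denorm").fetchone()[0]
print(name)  # 'Acme Corp' -- the trigger kept the copy in sync
```

The view costs nothing to keep consistent; the trigger is extra machinery you now own, which is exactly the price of denormalizing.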
So as far as your questions are concerned:
SubQ1.1: No. Denormalize to support performance requirements that cannot be attained by well-tuned SQL or stored procs operating on the normalized tables. Most of the time when others are clamoring for denormalized, flat or friendly data structures, what they really need is better SQL, bulk SQL, re-thought application requirements, and other solutions to "make it fast" that do not involve denormalization.
SubQ1.2: I don't believe there is such a thing as well-denormalized. Every denormalization is a new instability, a point of possible data integrity failure in the system. The more denormalized, the less "well" the model is. So the tentative observation after the question is spot-on: the more completely we denormalize, the more complex, uncertain and fragile the situation becomes.
SubQ2.1: Not officially. I know of no classification scheme for the levels of badness that can be obtained by recursively denormalizing a system until it's one giant, flat table. But I'm no expert.
SubQ2.2: I got lost. This appears to be a statement rather than a question.