Resolving ragged hierarchies in Data Modeling
To illustrate the concept of ragged hierarchies consider a family consisting of a child, parent and grand parent. If we model an event table that records the event of child getting dropped off at school by parent, the hierarchy looks like this.
This model works under the assumption that in the perfect world children get dropped of by their parents. What if either parents or grand parents can drop off a child and that needs to be tracked? We might end up doing something like this (this is not the only way to model this).
Now the event table has a nullable foreign key from GrandParent which is populated when there is an occasional drop off by grand parent— at all other times it remains null.
The drawback of this design is that the data model is loose in enforcing business rules. The nullable foreign key leaves it to the application developer to fill it up diligently whenever the grand parent is dropping off.
Let’s consider a real world example.
This is straightforward business rule. A drug, let’s say Amoxicilin can be part of multiple drug plans. Each plan can have one or more sub-plans. When a customer enters a membership for Amoxicilin, she is added to a drug sub-plan belonging to that drug. Her membership is defined by PolicyID.
The twist is when some customers are part of a plan that doesn’t necessarily have a subplan, or when special type of customers are part of the plan directly (let’s say an institutional customer). It will look something like this.
Drawbacks
This design takes us back to data model loophole explained in the school drop-off example. It’s upon the application developer to enforce that a null foreign key from DrugPlan is allowed only when that membership (policy) record has a valid sub plan. Similarly a null foreign key from DrugSubplan should be allowed only when that policy has a valid plan but no sub plan.
And to add more unnecessary ETL ambiguity, consider the situation when membership has a sub plan. The plan for that membership can be traced back through the foreign key on DrugSubPlan table. In this case does ETL also update the direct foreign key to DrugPlan table on Policy table with the same plan?
Here is the problem. Now you have two ways to find out the plan belonging to a policy. By backtracking through DrugSubplan or by backtracking directly to DrugPlan. Which one is more trustworthy after 5 years of operation? What are the odds this direct foreign key on Policy is not consistent after 5 years of operation?
The odds are pretty high — if the design allows an error, it will eventually be made. Whenever possible, it is best to not leave it to applications to enforce foreign keys.
The reason this happens is because plan, sub plan and policy form a ragged hierarchy. Not all policies have both a plan and a sub plan.
Ragged hierarchies can be resolved through self joins. Here is a simple implementation of the above ragged hierarchy. Depending on how different the attributes are for plan and sub plan this design could go a bit more detailed.
Note that foreign keys are shown in the bottom table only just to make the self-referencing relationship clearer. The data model loophole is overcome through the cardinality of Policy table.
Cardinality of Policy
When a customer is associated to a sub plan there will be two policy records with the same PolicyID. One record has the sub plan in PlanSubplanID. This record’s foreign key points to the second record for that customer. This second record (or the parent record) has the plan in PlanSubplanID. The foreign keys will be null for the parent record, but this doesn’t leave loophole in the data model.
When a customer is associated to only a plan, there is only one record for that membership.