Can an indicator be null?
In a recent Data Modeling Master Class, someone asked whether an indicator can be null. An indicator is an attribute containing only two values, such as True or False, Yes or No, On or Off, etc. If null is allowed as a value, that would be a third value.
My response was that on a logical data model, we need to show the rules, and therefore an indicator must be not null to show the rule that only two values should be allowed and not the third null value. On a physical data model however, where we frequently require tradeoffs such as loss of rules in exchange for performance or ease of development, an indicator might be null if the tradeoffs make sense.
What are your thoughts?
I agree in theory nulls should not be allowed. However, in reality, there is quite often a third option. That third option is we have no information on the element. What should be a value when you have no knowledge on what the value should be in a specific case?
If one of the two values have zero effect then set that as a default and then don't allow nulls. However, if both of the values have consequences and causes effects then one can't really set one value as a default which leaves the null option.
I would think this could also apply to the logical model as well.
Other views on this would be appreciated.
I feel that nulls should be allowed as an interim workaround, so to speak, but not as a permanent solution.
I use nulls largely to indicate a choice that has not yet been made or an unknown value. I feel that a record that includes nulls is not a complete record and that you cannot make an informed decision based on an incomplete record. It's like saying, in the case of a Yes/No choice, a null indicates Maybe. How can anyone make a practical choice based on a Maybe? In others case, a null may indicate "I don't know". Again, quite difficult to make a good decision based on "I don't know".
However, at any given point in time, you do want to capture what data points are valid at that time. And, there may be some unknowns at that time. The hope is that, eventually, the user goes back and "corrects" the null values. My current database has checkpoints. It will not proceed forward until all nulls are cleared. (I realize, that's more of an implementation solution than a modeling one.) It's the realization that the user may have some of the answers, which should be captured, and then has to wait for other answers from other sources.
From my point of view (1) the logical data model must represent the business data requirements and (2) null is not a value - it indicates the lack of a value, which is something different.
The model can show the rule (valid values are true/false, or 0/1, or Yes/No) while at the same time it can show that it is optional. A good data modelling tool like erwin illustrates this by having a validation rule that only includes these 2 values.
The fact that the attribute is optional should not be read as 'a third value of null is allowed'. It does make sense however to describe in a comment what it means when no value is provided/available.
So my logical models may contain several indicator domains: 2 mandatory domains, almost identical, except for the default value (since there are 2 possible default values, the model may support both) . And one optional domain, without a default.
May be this does not follow one or other theory, but it does facilitate the 3 different types of indicators related to real business data requirements in a logical way.
We are thinking alike here. I tend to refer to these attributes as flags or toggles. I use "indicator" and "code" interchangeably when talking about attributes that store short codes where more than two choices are allowed. But I'm mostly self-taught, so am happy to be corrected.
In the past I allowed nulls in flags. Oracle didn't have a boolean, so it was typically emulated with a single-character nullable column. With time and experience, I learned that allowing three values causes problems. First, allowing three choices, where the NULL option doesn't utilize normal equality operators, caused more complex SQL than was otherwise necessary. Second, the knowledge that a given flag column allowed three values -- although easily discoverable in the DB IDE or data dictionary -- was easily forgotten, and rarely passed down when someone who "knew" that model left the team. The successor has an even higher chance of missing that small, but important detail.
So now, I dig "pits of success" for the engineers to fall into. In the standards I publish and enforce, flags are to start life as a non-nullable column, with a default, and only two values are allowed (enforced by check constraint in Oracle or by boolean data type in Postgres). Most of the time, this is what the business needs anyway, and the default value meets requirements nicely. If a third value is really needed to mean "No answer received yet", or "We really don't know", then I turn this attribute into a code column, whose valid values are typically enforced by a small lookup table, where "U" is a third value that means Unknown (could also be done with an enum in Postgres or a check constraint in both Postgres and Oracle).