Does RM Pro 4 really preclude NULLs?
Quote from Darren Duncan on June 8, 2020, 7:31 pmQuote from dandl on June 8, 2020, 10:29 amSentinel values allow you to encode missing values on input and for interchange, which union types do not. It's not a gimme.
I strongly disagree. Union types are fully useable for input and interchange, and interchange is one of the key use cases of my proposal.
Take for example JSON which natively supports union types, by which I mean null/boolean/numbers/strings/etc are all disjoint types but you can have a collection whose elements are each any of those.
So [52, 29, null, 25] for example is exploiting a union type of integer and null and that is an interchange format.
Quote from dandl on June 8, 2020, 10:29 amSentinel values allow you to encode missing values on input and for interchange, which union types do not. It's not a gimme.
I strongly disagree. Union types are fully useable for input and interchange, and interchange is one of the key use cases of my proposal.
Take for example JSON which natively supports union types, by which I mean null/boolean/numbers/strings/etc are all disjoint types but you can have a collection whose elements are each any of those.
So [52, 29, null, 25] for example is exploiting a union type of integer and null and that is an interchange format.
Quote from dandl on June 9, 2020, 12:27 amQuote from Dave Voorhis on June 8, 2020, 11:15 amQuote from dandl on June 8, 2020, 10:29 amQuote from Dave Voorhis on June 8, 2020, 8:10 amQuote from dandl on June 8, 2020, 2:14 amQuote from Darren Duncan on June 7, 2020, 7:27 pmWhat I believe is best, and what I implement, is a strictly 2VL system, where every value equals itself, that supports type unions.
I have a system-defined singleton type named "Ignorance" whose meaning is, we don't have a normal value here AND we have no explanation for why.
So Ignorance is the direct analogy to a generic null of many languages that carries no other semantics.
Users can also choose to define other singleton types that correspond to more specific reasons a normal value might be missing.
All of my normal types like numbers or strings or relations expressly do NOT include any special values. Users can union these with Ignorance or other special values when they want that.
This may well be the best approach (but I prefer 'Missing'). For it to play nicely I think you need:
- easy to define and use T? as 'type T plus Missing', so all the libraries on T work easily with T? [Nullable<> in C# works quite well in this regard.]
- works consistently with ordered types and aggregation
- Outer Join that emits Missing if attribute type is T?.
These are option types, as I mentioned above. Option<T> in Java; Nullable<T> in C#.
I might have misunderstood, but I don't think so. In C# at least all nulls are the same.
Nullable<T> is not C#'s null, it's a parametric type. But C#'s implementation is peculiar -- tied up as it is with C#'s nullable types -- and seems only halfway to being a true union type. Far better is Java's Optional<T> (which is what I meant, not Option<T>) and comes closer to being a real union type. It would be trivial to create a C# equivalent.
That's so. It's incredibly useful, because it's directly supported but it's not a union. I made one called Maybe<T> for a particular purpose, but it's not really the same thing.
Now that I think more about it, C#'s Nullable<T> is probably a violation of RM Pro 4, but an equivalent to Java's Optional<T> would be fine.
It is. Nulls are nulls and untyped.
Sentinel values allow you to encode missing values on input and for interchange, which union types do not. It's not a gimme.
Sentinel values are almost inevitable in external CSV files and the like, and often unavoidable if you're outputting CSV, but you really don't want to be keeping them around inside a data processing system, particularly when file X uses one sentinel value and file Y uses a different sentinel value, and you need to UNION or JOIN X and Y.
Convert them to an option type as early as possible, or you will wind up with the unpleasant situation of file X having one sentinel value, file Y having a different sentinel value, and your internal systems using a third sentinel value -- perhaps one that file X and Y both define as valid data. Much ugliness ensues.
Broadly I agree, but it still relies on agreed conventions. If I use Missing<T> and you have Ignorance<T> and someone else has Unknown<T> and the system recognises none of them as special, the problem is just as bad. My core argument was really over the idea of a single widely used conceptual basis for a 'missing' value, that could be used in conjunction with system types and implemented in user types, and that would be respected by inbuilt operators like join and aggregation. All the benefits of SQL NULL but less fattening.
Quote from Dave Voorhis on June 8, 2020, 11:15 amQuote from dandl on June 8, 2020, 10:29 amQuote from Dave Voorhis on June 8, 2020, 8:10 amQuote from dandl on June 8, 2020, 2:14 amQuote from Darren Duncan on June 7, 2020, 7:27 pmWhat I believe is best, and what I implement, is a strictly 2VL system, where every value equals itself, that supports type unions.
I have a system-defined singleton type named "Ignorance" whose meaning is, we don't have a normal value here AND we have no explanation for why.
So Ignorance is the direct analogy to a generic null of many languages that carries no other semantics.
Users can also choose to define other singleton types that correspond to more specific reasons a normal value might be missing.
All of my normal types like numbers or strings or relations expressly do NOT include any special values. Users can union these with Ignorance or other special values when they want that.
This may well be the best approach (but I prefer 'Missing'). For it to play nicely I think you need:
- easy to define and use T? as 'type T plus Missing', so all the libraries on T work easily with T? [Nullable<> in C# works quite well in this regard.]
- works consistently with ordered types and aggregation
- Outer Join that emits Missing if attribute type is T?.
These are option types, as I mentioned above. Option<T> in Java; Nullable<T> in C#.
I might have misunderstood, but I don't think so. In C# at least all nulls are the same.
Nullable<T> is not C#'s null, it's a parametric type. But C#'s implementation is peculiar -- tied up as it is with C#'s nullable types -- and seems only halfway to being a true union type. Far better is Java's Optional<T> (which is what I meant, not Option<T>) and comes closer to being a real union type. It would be trivial to create a C# equivalent.
That's so. It's incredibly useful, because it's directly supported but it's not a union. I made one called Maybe<T> for a particular purpose, but it's not really the same thing.
Now that I think more about it, C#'s Nullable<T> is probably a violation of RM Pro 4, but an equivalent to Java's Optional<T> would be fine.
It is. Nulls are nulls and untyped.
Sentinel values allow you to encode missing values on input and for interchange, which union types do not. It's not a gimme.
Sentinel values are almost inevitable in external CSV files and the like, and often unavoidable if you're outputting CSV, but you really don't want to be keeping them around inside a data processing system, particularly when file X uses one sentinel value and file Y uses a different sentinel value, and you need to UNION or JOIN X and Y.
Convert them to an option type as early as possible, or you will wind up with the unpleasant situation of file X having one sentinel value, file Y having a different sentinel value, and your internal systems using a third sentinel value -- perhaps one that file X and Y both define as valid data. Much ugliness ensues.
Broadly I agree, but it still relies on agreed conventions. If I use Missing<T> and you have Ignorance<T> and someone else has Unknown<T> and the system recognises none of them as special, the problem is just as bad. My core argument was really over the idea of a single widely used conceptual basis for a 'missing' value, that could be used in conjunction with system types and implemented in user types, and that would be respected by inbuilt operators like join and aggregation. All the benefits of SQL NULL but less fattening.
Quote from johnwcowan on September 21, 2020, 8:00 pmThe most important thing with using a Maybe type (I like the Haskell names myself) is that Just Integer must not be an Integer. If it is, then we go right back to the problems of NULL, except with an explicit rather than an overloaded type. But if you have to use an overloaded DISCLOSE operator to convert it to type Integer before using it, then you can specify what to substitute for Nothing yourself on a case by case basis: the string "None" or the empty string, or 0, or Julian day 0, or whatever you please, but you cannot fail to specify a substitute. This is much superior to using an explicit sentinel.
Of course, in order to extend a relation containing an attribute of Maybe type, one needs an overloaded ENCLOSE operator.
The most important thing with using a Maybe type (I like the Haskell names myself) is that Just Integer must not be an Integer. If it is, then we go right back to the problems of NULL, except with an explicit rather than an overloaded type. But if you have to use an overloaded DISCLOSE operator to convert it to type Integer before using it, then you can specify what to substitute for Nothing yourself on a case by case basis: the string "None" or the empty string, or 0, or Julian day 0, or whatever you please, but you cannot fail to specify a substitute. This is much superior to using an explicit sentinel.
Of course, in order to extend a relation containing an attribute of Maybe type, one needs an overloaded ENCLOSE operator.
Quote from dandl on September 22, 2020, 3:17 amQuote from johnwcowan on September 21, 2020, 8:00 pmThe most important thing with using a Maybe type (I like the Haskell names myself) is that Just Integer must not be an Integer. If it is, then we go right back to the problems of NULL, except with an explicit rather than an overloaded type. But if you have to use an overloaded DISCLOSE operator to convert it to type Integer before using it, then you can specify what to substitute for Nothing yourself on a case by case basis: the string "None" or the empty string, or 0, or Julian day 0, or whatever you please, but you cannot fail to specify a substitute. This is much superior to using an explicit sentinel.
Of course, in order to extend a relation containing an attribute of Maybe type, one needs an overloaded ENCLOSE operator.
The essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
The MAYBE construct is quite simple to implement in any language that has adequate generics: I use it from time to time in C#.
Maybe<int>
simply creates a new type, which may or may not contain a value of typeint
, and a way to find out, for example:IsEmpty(Maybe<int>)
.No, you don't need special operators. Assuming no implicit type conversions (the usual convention in TTM), any
int
can be explicitly converted toMaybe<int>
by the usual means: a selector with one argument:Maybe<int>(int)
. The operator to convertMaybe<int>
toint
is a selector of two arguments:int(<Maybe<int>,int)
. The second argument is the 'default' value.Those three functions are all you really need. The whole thing is a bit nicer if the compiler helps, but it's barely necessary.
Quote from johnwcowan on September 21, 2020, 8:00 pmThe most important thing with using a Maybe type (I like the Haskell names myself) is that Just Integer must not be an Integer. If it is, then we go right back to the problems of NULL, except with an explicit rather than an overloaded type. But if you have to use an overloaded DISCLOSE operator to convert it to type Integer before using it, then you can specify what to substitute for Nothing yourself on a case by case basis: the string "None" or the empty string, or 0, or Julian day 0, or whatever you please, but you cannot fail to specify a substitute. This is much superior to using an explicit sentinel.
Of course, in order to extend a relation containing an attribute of Maybe type, one needs an overloaded ENCLOSE operator.
The essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
The MAYBE construct is quite simple to implement in any language that has adequate generics: I use it from time to time in C#. Maybe<int>
simply creates a new type, which may or may not contain a value of type int
, and a way to find out, for example: IsEmpty(Maybe<int>)
.
No, you don't need special operators. Assuming no implicit type conversions (the usual convention in TTM), any int
can be explicitly converted to Maybe<int>
by the usual means: a selector with one argument: Maybe<int>(int)
. The operator to convert Maybe<int>
to int
is a selector of two arguments: int(<Maybe<int>,int)
. The second argument is the 'default' value.
Those three functions are all you really need. The whole thing is a bit nicer if the compiler helps, but it's barely necessary.
Quote from Brian S on September 27, 2020, 4:34 amNULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
If I haven't supplied the birthdates of several employees and issue a query to determine whether any pair of employees have the same birthdate, then when NULL == NULL, the result is false. George's birthday is actually on 3/15, while Bill's birthday is actually on June 12. George and Bill obviously don't have the same birthday, but the query result would indicate that they do.
Remember: Logical problems are HUGE problems.
Brian
NULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
If I haven't supplied the birthdates of several employees and issue a query to determine whether any pair of employees have the same birthdate, then when NULL == NULL, the result is false. George's birthday is actually on 3/15, while Bill's birthday is actually on June 12. George and Bill obviously don't have the same birthday, but the query result would indicate that they do.
Remember: Logical problems are HUGE problems.
Brian
Quote from dandl on September 27, 2020, 2:01 pmQuote from Brian S on September 27, 2020, 4:34 amNULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
I'm not sure I follow that.
If NULL is a value of that type, then of course NULL == NULL. Why is it 'logically flawed', singly or doubly? It just is.
If I haven't supplied the birthdates of several employees and issue a query to determine whether any pair of employees have the same birthdate, then when NULL == NULL, the result is false. George's birthday is actually on 3/15, while Bill's birthday is actually on June 12. George and Bill obviously don't have the same birthday, but the query result would indicate that they do.
If NULL is a value of the type then two distinct queries are possible. One 'with nulls' logically means employees who either both have the same birthdate or both are unknown; the other 'without nulls' logically means employees who are known to have the same birthdate, unknowns having been excluded. Both seem perfectly 'logical' to me, but you do have to choose.
Remember: Logical problems are HUGE problems.
I don't know where that quote originally comes from, I've never really understood it. Google thinks a logical problem is a logic puzzle, a game for kids, child's play.
Brian
Quote from Brian S on September 27, 2020, 4:34 amNULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
I'm not sure I follow that.
If NULL is a value of that type, then of course NULL == NULL. Why is it 'logically flawed', singly or doubly? It just is.
If I haven't supplied the birthdates of several employees and issue a query to determine whether any pair of employees have the same birthdate, then when NULL == NULL, the result is false. George's birthday is actually on 3/15, while Bill's birthday is actually on June 12. George and Bill obviously don't have the same birthday, but the query result would indicate that they do.
If NULL is a value of the type then two distinct queries are possible. One 'with nulls' logically means employees who either both have the same birthdate or both are unknown; the other 'without nulls' logically means employees who are known to have the same birthdate, unknowns having been excluded. Both seem perfectly 'logical' to me, but you do have to choose.
Remember: Logical problems are HUGE problems.
I don't know where that quote originally comes from, I've never really understood it. Google thinks a logical problem is a logic puzzle, a game for kids, child's play.
Brian
Quote from Dave Voorhis on September 27, 2020, 4:37 pmQuote from dandl on September 27, 2020, 2:01 pmQuote from Brian S on September 27, 2020, 4:34 amNULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
I'm not sure I follow that.
If NULL is a value of that type, then of course NULL == NULL. Why is it 'logically flawed', singly or doubly? It just is.
That's one valid interpretation.
Another valid interpretation treats NULL as undefined. Thus, would it be appropriate for [latex]n \div 0 = \sqrt{-1}[/latex] in real numbers?
That NULL can -- depending on how it's used -- be considered either is one of (if not the essence of) the problems with NULL.
Quote from dandl on September 27, 2020, 2:01 pmQuote from Brian S on September 27, 2020, 4:34 amNULL == NULL is logically flawed for "applicable NULLs"--regardless of whether they are typed. It is doubly flawed for "inapplicable NULLs" or "'I don't have a clue' NULLs."
I'm not sure I follow that.
If NULL is a value of that type, then of course NULL == NULL. Why is it 'logically flawed', singly or doubly? It just is.
That's one valid interpretation.
Another valid interpretation treats NULL as undefined. Thus, would it be appropriate for n \div 0 = \sqrt{-1} in real numbers?
That NULL can -- depending on how it's used -- be considered either is one of (if not the essence of) the problems with NULL.
Quote from johnwcowan on September 27, 2020, 6:36 pmQuote from dandl on September 22, 2020, 3:17 amThe essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
I would argue that in SQL, NULL is implicitly cast to whatever type is expected. Still, that's something of a quibble.
The MAYBE construct is quite simple to implement in any language that has adequate generics [...] you don't need special operators
Indeed: if you have generics, Maybe is trivial. I was talking about the simpler situation where you don't have generics but want Maybe types ad hoc. For ENCLOSE and DISCLOSE feel free to substitute casts.
Quote from dandl on September 22, 2020, 3:17 amThe essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
I would argue that in SQL, NULL is implicitly cast to whatever type is expected. Still, that's something of a quibble.
The MAYBE construct is quite simple to implement in any language that has adequate generics [...] you don't need special operators
Indeed: if you have generics, Maybe is trivial. I was talking about the simpler situation where you don't have generics but want Maybe types ad hoc. For ENCLOSE and DISCLOSE feel free to substitute casts.
Quote from dandl on October 1, 2020, 12:59 amQuote from johnwcowan on September 27, 2020, 6:36 pmQuote from dandl on September 22, 2020, 3:17 amThe essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
I would argue that in SQL, NULL is implicitly cast to whatever type is expected. Still, that's something of a quibble.
And I would disagree: there is no cast involved, since SQL does not treat as as other values. There is a whole range of syntactic constructs just for dealing with NULL.
The MAYBE construct is quite simple to implement in any language that has adequate generics [...] you don't need special operators
Indeed: if you have generics, Maybe is trivial. I was talking about the simpler situation where you don't have generics but want Maybe types ad hoc. For ENCLOSE and DISCLOSE feel free to substitute casts.
I think you snipped the bit that matters: you don't special operators, all you need are type casts already handled by the language (with no need for MAYBE).
Quote from johnwcowan on September 27, 2020, 6:36 pmQuote from dandl on September 22, 2020, 3:17 amThe essence of my proposal here was that TTM does not really preclude nulls, so much as it precludes the SQL or any other typeless NULL.
I would argue that in SQL, NULL is implicitly cast to whatever type is expected. Still, that's something of a quibble.
And I would disagree: there is no cast involved, since SQL does not treat as as other values. There is a whole range of syntactic constructs just for dealing with NULL.
The MAYBE construct is quite simple to implement in any language that has adequate generics [...] you don't need special operators
Indeed: if you have generics, Maybe is trivial. I was talking about the simpler situation where you don't have generics but want Maybe types ad hoc. For ENCLOSE and DISCLOSE feel free to substitute casts.
I think you snipped the bit that matters: you don't special operators, all you need are type casts already handled by the language (with no need for MAYBE).