The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Does RM Pro 4 really preclude NULLs?

Page 1 of 3Next

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Andl - A New Database Language - andl.org
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

I bet Date & Darwen would object -- along with most of the rest of us -- to any attempt to lawyer your way into defining a general SQL-like NULL. But I'm sure we've discussed various TTM-compliant ways of handling missing values before, including TTM-complaint ways of handling (for example) import of SQL query results into a D using various kinds of union types to define option types, both with or without the IM.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Yes it complies.  I've said so before.  Yes you can define a type that has this member NULL and yes you can define equality such that this NULL member is equal to itself.

And that misses the mark of WHY NULL (SQL version) was ever invented in the first place.  So no, it manifestly ***does not*** "alleviate a number of problems in dealing with SQL-like queries".  At best, it replaces them with an almost identical number of new problems, such as, e.g., if the user really wants the SQL NULL behaviour (NULL not equal to anything) then he's now forced to bloat his code all over the place with "WHERE x <> NULL" restrictions.

In the chapter by my hand on the subject in DBE, I called it M (or M(), I don't remember precisely).

Quote from Dave Voorhis on June 4, 2020, 7:45 am
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

I bet Date & Darwen would object -- along with most of the rest of us -- to any attempt to lawyer your way into defining a general SQL-like NULL.

Do you read it that way? I think most people find a general need to deal with missing values in their data, and most people find the SQL outer join a convenience. I'm exploring whether the benefits of NULL can be achieved without 3-level logic or SQL syntactic oddities; but also without having to build and implement a custom type just to do such a simple and generic thing.

I'm looking for feedback on issues that I might have missed. If it's just "D&D wouldn't like it", that doesn't help a whole lot.

But I'm sure we've discussed various TTM-compliant ways of handling missing values before, including TTM-complaint ways of handling (for example) import of SQL query results into a D using various kinds of union types to define option types, both with or without the IM.

I think it was considerations like this that persuaded Dataphor it couldn't do without some kind of NULL, but I don't know what they actually finished up doing. But it certainly should make life easier, and that can't be all bad.

Andl - A New Database Language - andl.org
Quote from dandl on June 4, 2020, 11:28 am
Quote from Dave Voorhis on June 4, 2020, 7:45 am
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

I bet Date & Darwen would object -- along with most of the rest of us -- to any attempt to lawyer your way into defining a general SQL-like NULL.

Do you read it that way? I think most people find a general need to deal with missing values in their data, and most people find the SQL outer join a convenience. I'm exploring whether the benefits of NULL can be achieved without 3-level logic or SQL syntactic oddities; but also without having to build and implement a custom type just to do such a simple and generic thing.

I'm looking for feedback on issues that I might have missed. If it's just "D&D wouldn't like it", that doesn't help a whole lot.

Not only D & D wouldn't like it...

We have gone over this in some detail before, to the point that it's one of the various annual discussions, along with update-through-views, etc.

My preference is for an option type defined via the IM's UNION facility, others prefer Zero-Or-One ("ZOO") RVAs, others -- no doubt -- prefer other things. I'd expect a useful D to ideally support at least allowing option types and ZOOs.

But I'm sure we've discussed various TTM-compliant ways of handling missing values before, including TTM-complaint ways of handling (for example) import of SQL query results into a D using various kinds of union types to define option types, both with or without the IM.

I think it was considerations like this that persuaded Dataphor it couldn't do without some kind of NULL, but I don't know what they actually finished up doing. But it certainly should make life easier, and that can't be all bad.

NULL in its typical SQL and classic Java/C/C++ programming language guises is all bad. The literature -- and various programming language implementations -- are flush with suggestions for mitigating these.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Although I don't much like the idea, it certainly appears to comply with RM Pro 4.  Above all, it doesn't depart from 2VL.  I once asked a similar question in this forum (though in even less detail than dandl) but the idea was pretty much debunked, as I recall.

I suppose that the semantics of arithmetic operations on INTEGER(NULL) and RATIONAL(NULL) would be inspired by existing treatments of NaN.  (Does INTEGER(NULL)/0 raise an exception?)  What about string operations involving CHARACTER(NULL)?

Question to dandl: Do you really mean every type?  Even UDTs, enumerated types, tuple types, and relation types?

Hugh

 

Coauthor of The Third Manifesto and related books.
Quote from Hugh on June 4, 2020, 1:34 pm
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Although I don't much like the idea, it certainly appears to comply with RM Pro 4.  Above all, it doesn't depart from 2VL.  I once asked a similar question in this forum (though in even less detail than dandl) but the idea was pretty much debunked, as I recall.

I suppose that the semantics of arithmetic operations on INTEGER(NULL) and RATIONAL(NULL) would be inspired by existing treatments of NaN.

Just so, but with a well-behaved NULL, so NULL === NULL

(Does INTEGER(NULL)/0 raise an exception?)

Returns NULL I think.

What about string operations involving CHARACTER(NULL)?

I've not found it useful to distinguish NULL from the empty string in stored data, other than the special treatment provided by SQL. My preference is that it's just an empty string, nothing special.

Question to dandl: Do you really mean every type?  Even UDTs, enumerated types, tuple types, and relation types?

Yes. Most values already have a value suitable for use as a NULL. Here is my list of eleven:

  • bool: actually no, this type should not be NULL, that would be 3VL
  • int (INTEGER): the value -2^63 could be reserved for this purpose (assuming 64 bit integers)
  • double (RATIONAL): NaN, but library functions need to treat it right
  • decimal: dunno, pick one
  • date/time: year 0
  • text (CHARACTER); empty string (of characters)
  • binary: empty string (of bytes)
  • enum: the word 'null' (as per JS)
  • UDT: every component is NULL
  • tuple: every attribute is NULL
  • relation: empty relation.

Hugh

 

Andl - A New Database Language - andl.org
Quote from Erwin on June 4, 2020, 11:24 am
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Yes it complies.  I've said so before.  Yes you can define a type that has this member NULL and yes you can define equality such that this NULL member is equal to itself.

And that misses the mark of WHY NULL (SQL version) was ever invented in the first place.  So no, it manifestly ***does not*** "alleviate a number of problems in dealing with SQL-like queries".  At best, it replaces them with an almost identical number of new problems, such as, e.g., if the user really wants the SQL NULL behaviour (NULL not equal to anything) then he's now forced to bloat his code all over the place with "WHERE x <> NULL" restrictions.

In the chapter by my hand on the subject in DBE, I called it M (or M(), I don't remember precisely).

It's not of interest to me why SQL made the choices it did around 3VL, or to be of any assistance to those who want to import this behaviour into a D type system. Missing values are a feature of many systems that handle real world data, and many data sources (including JSON) reflect this. KNIME for example has deep support for them. SQL outer joins and aggregation are familiar ways to handle common problems. The question is whether it's possible or reasonable to get the benefits without the pain.

DBE ch 25 has your name on it, so I assume that's the one you mean, although it's written as by some other author. It proposes a solution based on adding a new supertype called M to the IM, so seems only tangentially relevant. My proposal is more along the lines of the 'special value' model. I think D&D may have written about this, but I don't know where.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on June 4, 2020, 12:45 pm
Quote from dandl on June 4, 2020, 11:28 am
Quote from Dave Voorhis on June 4, 2020, 7:45 am
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

I bet Date & Darwen would object -- along with most of the rest of us -- to any attempt to lawyer your way into defining a general SQL-like NULL.

Do you read it that way? I think most people find a general need to deal with missing values in their data, and most people find the SQL outer join a convenience. I'm exploring whether the benefits of NULL can be achieved without 3-level logic or SQL syntactic oddities; but also without having to build and implement a custom type just to do such a simple and generic thing.

I'm looking for feedback on issues that I might have missed. If it's just "D&D wouldn't like it", that doesn't help a whole lot.

Not only D & D wouldn't like it...

We have gone over this in some detail before, to the point that it's one of the various annual discussions, along with update-through-views, etc.

My preference is for an option type defined via the IM's UNION facility, others prefer Zero-Or-One ("ZOO") RVAs, others -- no doubt -- prefer other things. I'd expect a useful D to ideally support at least allowing option types and ZOOs.

But I'm sure we've discussed various TTM-compliant ways of handling missing values before, including TTM-complaint ways of handling (for example) import of SQL query results into a D using various kinds of union types to define option types, both with or without the IM.

I think it was considerations like this that persuaded Dataphor it couldn't do without some kind of NULL, but I don't know what they actually finished up doing. But it certainly should make life easier, and that can't be all bad.

NULL in its typical SQL and classic Java/C/C++ programming language guises is all bad. The literature -- and various programming language implementations -- are flush with suggestions for mitigating these.

Agreed, so we don't do that. But languages with typed NULL treated mostly as an ordinary value? Do you know one?

Andl - A New Database Language - andl.org
Quote from dandl on June 5, 2020, 2:24 am
Quote from Hugh on June 4, 2020, 1:34 pm
Quote from dandl on June 4, 2020, 7:10 am

Consider the following.

  • Every type shall be assumed to include a value called NULL, unless declared or constrained otherwise.
  • Every instance of NULL is a typed value, which can be created by a selector with some literal argument. For example INTEGER(NULL).
  • An instance of NULL compares equal to NULL of the same  type and not equal to every other value of that type. NULLs of different types may not be compared.
  • For ordered types, NULL compares less than any other value of that type.
  • An outer join returns a value in which some tuples may have attributes containing a NULL value, providing this is not prohibited for the type.
  • Aggregation by default ignores NULL values in the aggregated attribute (other options are possible).

As far as I can tell this complies with RM Pro 4, and alleviates a number of problems in dealing with SQL-like queries.

Although I don't much like the idea, it certainly appears to comply with RM Pro 4.  Above all, it doesn't depart from 2VL.  I once asked a similar question in this forum (though in even less detail than dandl) but the idea was pretty much debunked, as I recall.

I suppose that the semantics of arithmetic operations on INTEGER(NULL) and RATIONAL(NULL) would be inspired by existing treatments of NaN.

Just so, but with a well-behaved NULL, so NULL === NULL

(Does INTEGER(NULL)/0 raise an exception?)

Returns NULL I think.

What about string operations involving CHARACTER(NULL)?

I've not found it useful to distinguish NULL from the empty string in stored data, other than the special treatment provided by SQL. My preference is that it's just an empty string, nothing special.

Question to dandl: Do you really mean every type?  Even UDTs, enumerated types, tuple types, and relation types?

Yes. Most values already have a value suitable for use as a NULL. Here is my list of eleven:

  • bool: actually no, this type should not be NULL, that would be 3VL
  • int (INTEGER): the value -2^63 could be reserved for this purpose (assuming 64 bit integers)
  • double (RATIONAL): NaN, but library functions need to treat it right
  • decimal: dunno, pick one
  • date/time: year 0
  • text (CHARACTER); empty string (of characters)
  • binary: empty string (of bytes)
  • enum: the word 'null' (as per JS)
  • UDT: every component is NULL
  • tuple: every attribute is NULL
  • relation: empty relation.

Every sentinel value you choose will show up in real data with some other meaning, either someone else's semantically-different-from-yours sentinel value or some real data.

For example, the empty string seems like an obvious choice for strings until you need to import/link a SQL table that has semantically distinguished null strings from empty strings. This happens a lot.

The proper way to handle it is with true option types. Ideally, you'd have a language with structural obligation to force handling the 'just a' and 'nothing' options (as in Haskell, Kotlin, etc.), but the usual approaches to handling option types in the popular languages (usually a parametric Option<T> type) will suffice.

It's what they're for.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Page 1 of 3Next