The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Relational Completeness: not just about choice of operators

12

I continue to mull whether Codd's 1972 definition of 'Relational Completeness' is complete enough to define any operation over relations.

In particular, how to define a test for equality between relations?

Contrast that TTM (RM Pre 8) prescribes "D shall support the equality comparison operator “=” for every type T." That is, whether or not equality is included in "the usual operators of the relational algebra (or some logical equivalent thereof)." (RM Pre 18)

 

There's an immediate problem: RM Pre 8's operator returns a boolean, and type boolean is required to be included in a D, per RM Pre 1. Codd's algebra included only relations, not any scalar types specifically. That's not too bad; we could adopt some convention that (say) DEE represents True/relations are equal, DUM represents False.

And since I'm talking about a "logical equivalent" to an operator, I don't need to include testing equality between relations as a primitive. I could define it in terms of subset tests between relations (with other operators); or in terms of IS_EMPTY( ) with other operators. But try as I might, I couldn't define those in terms of Codd's primitives. The best I could do was: give me two relation values (extra to the ones I'm comparing), I'll return the first if the comparison is True, the second otherwise.

Codd's primitives don't give any way to express a relation literal. I can define DEE providing I know the name of a relation in the database that is guaranteed to be non-empty. I can define DUM providing I know the name of any relation at all in the database. But that seems terribly ad-hoc: the algebra should be complete even if the database is completely empty.

Then I've just come across 1 a discussion of 'functional completeness' of a logic, specifically the implicational fragment of the Propositional Calculus.

[Material] Implication alone is not functionally complete as a logical operator because one cannot form all other two-valued truth functions from it. However, if one has a propositional formula which is known to be false and uses that as if it were a nullary connective for falsity, then one can define all other truth functions. So implication is virtually complete as an operator. If P,Q, and F are propositions and F is known to be false, ...

Ok. Then I think it's legitimate to add DEE as a primitive, even though it's not an operator. Of course DUM =df DEE MINUS DEE. Now I can define an equality test to return DEE or DUM and I'm in business.

 

1 - https://en.wikipedia.org/wiki/Implicational_propositional_calculus#Virtual_completeness_as_an_operator

 

Quote from AntC on March 30, 2019, 9:49 am

I continue to mull whether Codd's 1972 definition of 'Relational Completeness' is complete enough to define any operation over relations.

...

Codd's primitives don't give any way to express a relation literal.

I think you answered the part of your own question dealing with booleans. It seems reasonable to assume some mechanism for creating the relational values on which the RA operates (and which are presented in his papers), and that given any plausible mechanism it could be used to create DEE and DUM if they didn't already exist. Problem solved.

But there are many other operations that the RA cannot perform. The first that come to mind are:

  • Anything to do with RVAs and TVAs (NEST, UNNEST, etc)
  • Anything that depends on computation or counting, since the RA is only first order (COUNT, ISEVEN, FOLD/SUM, WHILE, etc).

You don't need a type system, but you do need something more than the RA.

Andl - A New Database Language - andl.org

I am more interested not what a minimal set of RA operators is, but on what the usual set is.  Everyone seems to have their own notions of "usual".  I'm okay with using Tutorial D's set minus the division operators, but even that list is not so easy to find; the TD grammar alone is not enough.

 

Quote from johnwcowan on June 27, 2019, 5:15 pm

I am more interested not what a minimal set of RA operators is, but on what the usual set is.  Everyone seems to have their own notions of "usual".  I'm okay with using Tutorial D's set minus the division operators, but even that list is not so easy to find; the TD grammar alone is not enough.

 

No, Tutorial D's set is well more than minimal. See Appendix A, where Tutorial D's operators are inter-defined and reduced to a tight core.

In terms of "usual": since most of the industry thinks SQL is relational; and the whole idea of inter-defining SQL 'operators' is hazardous, I don't suppose you're going to find agreement. (Lots of SQL's options in the SELECT statement you'd think might be equivalent. But it turns out with NULL and unnamed or duplicate-named columns, there are annoyingly different corner cases.)

Even Codd [1972]'s minimal set is less than minimal: he omitted RENAME, which error was detected almost immediately. Occasionally Hugh or someone posts here their 'starter' for a usual set. It's really a parlour game, since there's many equally valid (equally expressive) possible sets.

Quote from AntC on June 27, 2019, 10:50 pm

In terms of "usual": since most of the industry thinks SQL is relational; and the whole idea of inter-defining SQL 'operators' is hazardous, I don't suppose you're going to find agreement.

I mean of course "usual" among the people who are already interested in and knowledgeable about the RM.  Both RM Pro 7 and RM Pro 18 refer to "the usual relational operators", but without saying what is meant.  It's also unclear if that means a D has to support these "usual" operators with syntax, or if it's enough to just provide some basis set and leave the rest up to the user.

Quote from johnwcowan on June 28, 2019, 12:40 am
Quote from AntC on June 27, 2019, 10:50 pm

In terms of "usual": since most of the industry thinks SQL is relational; and the whole idea of inter-defining SQL 'operators' is hazardous, I don't suppose you're going to find agreement.

I mean of course "usual" among the people who are already interested in and knowledgeable about the RM.  Both RM Pro 7 and RM Pro 18 refer to "the usual relational operators", but without saying what is meant.

RM Pre 18 continues "(or some logical equivalent thereof)." Furthermore a note at the end of the 2016 TTM says " a sentence listing certain
relational algebra operators that D was required to “support, directly or indirectly” has been deleted, since it added nothing."

So you could look back at earlier versions of the TTM document to find that list. Or you could look at DTATRM on RM Pre 18, which has a list and some long explanations/history of comings and goings on the list. (I don't know if the list varied in later versions of TTM before the list was dropped altogether.)

I think dropping the list is sensible: it'll only lead to endless nit-picking to for example include both JOIN and INTERSECTION when the latter is just a special case of the former; but to not mention TIMES, which is also just a special case of JOIN. And why list MINUS when many would prefer NOT MATCHING? That list owes too much to Codd 1972, which is of dubious provenance.

It's also unclear if that means a D has to support these "usual" operators with syntax, or if it's enough to just provide some basis set and leave the rest up to the user.

"(or some logical equivalent thereof)." says it all, doesn't it? If you can define the "usual" set using the set some D provides, then you're done. Tutorial D has the limitation that you can't user-define relational operators in it (RM VSS 6 notwithstanding), because that would need generics/polymorphism regarded as too advanced for tutorial level. Therefore Tutorial D has a superfluity of operators. Your D might support RM VSS 6.

Quote from AntC on June 27, 2019, 10:50 pm
Quote from johnwcowan on June 27, 2019, 5:15 pm

I am more interested not what a minimal set of RA operators is, but on what the usual set is.  Everyone seems to have their own notions of "usual".  I'm okay with using Tutorial D's set minus the division operators, but even that list is not so easy to find; the TD grammar alone is not enough.

 

No, Tutorial D's set is well more than minimal. See Appendix A, where Tutorial D's operators are inter-defined and reduced to a tight core.

In terms of "usual": since most of the industry thinks SQL is relational; and the whole idea of inter-defining SQL 'operators' is hazardous, I don't suppose you're going to find agreement. (Lots of SQL's options in the SELECT statement you'd think might be equivalent. But it turns out with NULL and unnamed or duplicate-named columns, there are annoyingly different corner cases.)

Even Codd [1972]'s minimal set is less than minimal: he omitted RENAME, which error was detected almost immediately. Occasionally Hugh or someone posts here their 'starter' for a usual set. It's really a parlour game, since there's many equally valid (equally expressive) possible sets.

I complained about the omission of RENAME over many years, until somebody unkindly drew to my attention that renaming can be expressed in terms of extension and projection.  But Codd omitted extension too, so we can say that he did indirectly omit RENAME.

Of course extension too, is theoretically redundant, as we showed in DTATRM, Appendix A.  The ISBL team considered using JOIN for that purpose, with heavy restrictions on relations representing operator, but abandoned the idea as impractical.

In a later post I saw John expressing a preference for NOT MATCHING over MINUS, surmising Codd 1972 as the likely provenance (and Chris Date did indeed teach MINUS on that course I attended that year).  I think he is right about that provenance, perpetrated by Chris and other writers over subsequent years, but ISBL came up with their "not matching" operator also in the early 1970s, and that's the one we opted for in BS12.  I was opposed to including MINUS in Tutorial D but Chris wouldn't budge.  I was opposed to including INTERSECT and TIMES, too and I resolutely refuse ever to use those in place of JOIN in my Rel applications.

I think Chris and I would both be happy to drop DIVIDEBY now.  He wanted to drop SUMMARIZE, too, in later-than-DTATRM revisions of TD, but I find that operator  far too useful, in both its PER and BY versions, and was able to resist its loss in spite of his grumbling (he loves "image relations").  I actually use Tutorial D, extensively (via Rel).  He doesn't use it at all.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on June 28, 2019, 12:13 pm

In a later post I saw John expressing a preference for NOT MATCHING over MINUS, surmising Codd 1972 as the likely provenance.

Not me, so it must have been someone else.  ("The Iliad and the Odyssey were not written by Homer, but by another man of the same name.")  I do prefer the name ANTIJOIN to either NOT MATCHING or SEMIMINUS, though for a while I thought that ANTIJOIN was actually what TD calls COMPOSE (I need to put that and TCLOSE into SQLish somehow).  But I'm happy to have both MINUS/EXCEPT and ANTIJOIN/SEMIMINUS/NOT MATCHING operators, though I agree with you that given the inescapable name JOIN, neither TIMES nor INTERSECT is worth keeping.

I still don't understand !! at all.

Quote from johnwcowan on June 28, 2019, 2:13 pm

I still don't understand !! at all.

For a long while, I didn't either. Then I implemented it in Rel -- which was oddly easy; it fell naturally out of the relevant mechanisms -- and it became a little clearer.

I still find it painfully counterintuitive.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on June 28, 2019, 2:17 pm
Quote from johnwcowan on June 28, 2019, 2:13 pm

I still don't understand !! at all.

For a long while, I didn't either. Then I implemented it in Rel -- which was oddly easy; it fell naturally out of the relevant mechanisms -- and it became a little clearer.

I still find it painfully counterintuitive.

The basic idea is intuitive enough as it goes, but :

It's slightly unclear to me that using it extensively makes for clearer code,

And it's much less slightly unclear to me whether the optimizer that is able to tackle all the intrinsically possible access/computation strategy pitfalls, is a feasible one.  (Date has always had this tendency to willingly ignore aspects of feasibility of compiler-level code.)

12