The Forum for Discussion about The Third Manifesto and Related Matters


More sad quotes

More sad quotes:

"I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least).   I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join".   He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only."

Apparently, ISBL was based on the same misunderstanding that persists today: inferring theory from a faulty implementation. It looks like even fewer people than I'd have guessed have read even the two summary paragraphs of the 1970 paper.

Addressing the last sentence first, the section on operations includes this: "Extension of the notions of linear and cyclic 3-join and their natural counterparts to the joining of n binary relations (where n ≥ 3) is obvious…."

Regarding the definition of natural join, it appears that the 1969 and 1970 papers contained the first mentions of the very idea, one which has been widely ignored in academia and industry ever since, to grave effect, except of course for usurping the term as well as the adjective "relational".

It also appears to be more than coincidence that the 1970 paper introduced normalization, the storage optimization that its natural join copes with so neatly.

The logically disastrous effect of the widespread hijacking of the term natural join is that so-called relational database systems not only allow logically invalid results, they actually require them!

The previous favourite quote of mine, about the billing system fragment that is claimed to be simplified, could be simplified even further to make the point. Assume there is only one relvar in the database, Invoices, and then repeat the earnest question "what should happen?" when an Invoices tuple is deleted. When there is more than one tuple, an application language obviously needs to be able to address the subset that contains the desired tuple, which many languages easily express by joining Invoices with an explicit subset relation literal. A workable algebra wouldn't need to make the distinction, so to illustrate the problem it's enough to assume there is only one invoice. I've shown the logical problem multiple times, but I'll give it again in terms that high-school kids should understand.

Predicate calculus is not needed to illustrate the situation; Boolean algebra is sufficient. The subset relationship is given by the material implication (S->I), where S represents the subset and I represents the Invoices relation. The deletion, aka negation, is represented by (Not S).

Now combine those two premises with some desired conclusion such as (S Not Member of I) and express this conclusion as (Not I).

The logical argument that results can be written as:

( (S->I) & (Not  S) ) -> (Not I). 

However, this is not a valid logical argument: with S false and I true, both premises hold but the conclusion (Not I) is false.

A valid argument would be:

( (S<->I) & (Not  S) ) -> (Not I).

The 1970 definition assumes the biconditional premise of the valid argument (equal projections). Appendix A does not (unequal projections allowed). Among other things, what this means is that not only does Appendix A supposedly prevent join deletion, it also prevents logically provable base deletion. In fact, not just updates but queries become logically invalid as well.
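The two arguments above can be checked mechanically with a truth table. The following is only a minimal sketch: the function names are made up for illustration, and S and I are treated as plain Boolean variables, exactly as in the formulas above.

```python
from itertools import product


def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q


def counterexamples(argument):
    """Return every (S, I) assignment for which the argument evaluates to False."""
    return [(s, i) for s, i in product([False, True], repeat=2)
            if not argument(s, i)]


def subset_argument(s, i):
    # ( (S->I) & (Not S) ) -> (Not I)
    return implies(implies(s, i) and not s, not i)


def biconditional_argument(s, i):
    # ( (S<->I) & (Not S) ) -> (Not I)
    return implies((s == i) and not s, not i)


subset_failures = counterexamples(subset_argument)
biconditional_failures = counterexamples(biconditional_argument)

print("implication premise, failing rows:", subset_failures)
print("biconditional premise, failing rows:", biconditional_failures)
```

An argument is valid only if its list of failing rows is empty. The implication version fails on the row where S is false and I is true; the biconditional version has no failing row.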

Codd wrote about a number of different data theories but his 1990 book is consistent with the 1970 basic operations. Some coders rather stupidly presume just one relational theory. 

Another quote: "...Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and those projections aren't equal. "

Elsewhere, this complaint is sometimes called logical-physical confusion. Examples include treating the usual Suppliers relation as a join when in fact it is a disjoint union, assuming relation types based on headings that ignore logical structure, or relying on homespun, casual predicate explanations. Codd dealt with this by expecting a data language to distinguish stored versus named versus expressible relations. Repair Appendix A's join definition so that it defines just which relation values determine the joined value, excluding lost information!
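To make the equal-projections requirement concrete, here is a rough sketch for binary relations only, modelled as Python sets of pairs. It is one reading of the definition as described in this thread, not a transcription of the 1970 paper or of Appendix A; the function names and sample data are invented for illustration.

```python
def project(rel, *positions):
    """Project a relation (a set of tuples) onto the given attribute positions."""
    return {tuple(t[p] for p in positions) for t in rel}


def natural_join_unchecked(r, s):
    """Join r(a, b) with s(b, c) on b, with no condition on the operands."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


def natural_join_equal_projections(r, s):
    """Join r(a, b) with s(b, c) only when both project to the same set of b values."""
    if project(r, 1) != project(s, 0):
        raise ValueError("projections on the common attribute differ")
    return natural_join_unchecked(r, s)


def is_lossless(r, s, joined):
    """True if both operands can be recovered from the join by projection."""
    return project(joined, 0, 1) == r and project(joined, 1, 2) == s


# Hypothetical sample data.
r = {("a1", "b1"), ("a2", "b2")}
s_equal = {("b1", "c1"), ("b2", "c2")}   # same b values as r
s_unequal = {("b1", "c1")}               # b2 is missing

joined_equal = natural_join_equal_projections(r, s_equal)
joined_unequal = natural_join_unchecked(r, s_unequal)

print(is_lossless(r, s_equal, joined_equal))
print(is_lossless(r, s_unequal, joined_unequal))
```

With equal projections both operands are recoverable from the joined value. In the unchecked case the tuple ("a2", "b2") cannot be recovered from the result, which is the lost information referred to above.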

Another quote:   "...Indeed. Unlike the union-compatible requirement that can be checked at compile time; a run-time error is going to render applications unusable: you have a Customer who hasn't yet placed any orders? Sorry, you can't see any Customer Orders atall. Or an Employee who's just started and hasn't received a payslip? Sorry, you can't see any payslips. Is that the "massive unnecessary use-ability differences" that Paul is talking about?..."

Indeed? It is a strange relational interface which can't express such a basic logical structure as set difference. If this comment is what TTM would mean with correct join treatment, it obviously needs more repair work.

There are other massive follow-on effects of what is now a fifty-year-old mistake. Ironic too, when Appendix A actually claims "logically respectable", without further explanation, as justification for one of its features!

On top of this are all the junk DBMSs that expect users to do all the easy relational work and do it without error every time. Wanting yet another advanced file system doesn't mean needing another one.
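One possible reading of the set-difference remark, as a sketch only: split the customers into those with and without orders before joining, so that the operands of the join have equal projections on the customer id. The relation layouts, names and data below are assumptions made up for illustration.

```python
def project(rel, *positions):
    """Project a relation (a set of tuples) onto the given attribute positions."""
    return {tuple(t[p] for p in positions) for t in rel}


# Hypothetical data: customers as (name, customer_id), orders as (customer_id, order_id).
customers = {("Ann", "c1"), ("Bob", "c2")}
orders = {("c1", "o1"), ("c1", "o2")}

all_ids = project(customers, 1)
ordering_ids = project(orders, 0)

# Set difference: ids of customers who have not placed any order.
ids_without_orders = all_ids - ordering_ids

# The remaining customers have exactly the ids that appear in orders.
customers_with_orders = {t for t in customers if (t[1],) in ordering_ids}

print(ids_without_orders)
print(project(customers_with_orders, 1) == ordering_ids)
```

Bob, who has no orders, is reported through the difference rather than blocking the join, and the restricted customers relation has the same projection on customer id as orders.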

 

Quote from p c on March 15, 2020, 1:03 pm

More sad quotes:

[...]

More gibberish.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org