## The Forum for Discussion about The Third Manifesto and Related Matters

# Codd's remainder

I've been looking more closely at Codd's divide.  The chief complaint against it, which has brought on a flood of other divides, is given in the history-of-divide chapter: "Codd’s divide [...] doesn’t really solve the problem it was originally meant to solve".  Nevertheless it is mathematically interesting, and some of the seeming arbitrariness of its restrictions goes away if you conceptualize it as integer rather than rational division.  I'll limit myself to the abridged version in which all the attributes of the divisor must appear in the dividend.

Chapter 12 rightly says: "Thus we see that Expression 1 doesn’t accurately represent the query 'Get supplier numbers for suppliers who supply all purple parts'; rather, it represents the query 'Get supplier numbers for suppliers who supply at least one part and supply all purple parts.'”  A reasonable riposte, however, is that suppliers that don't supply anything are not actually suppliers just because they appear in the suppliers table.

(I would also add that universal quantification over the empty set isn't as cut and dried in logic as it is usually made out to be.  Translating "all F is G" from syllogistic logic into the predicate-logic version "∀x: F(x) ⊃ G(x)", as has been standard since Frege's day, is not quite what Aristotle and all logicians up to the 19th century had in mind, precisely because it is counterintuitive on empty sets.  The translation counts "All purple rhinos are dead" and "All purple rhinos are alive" as equally true, even though no rhino is both alive and dead.  Instead, the syllogistic logicians made "all F is G" false by convention if there were no members of F.  Neither convention is superior to the other in all cases, and my riposte above asserts the traditional position.)
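The two conventions are easy to state mechanically.  A throwaway Python sketch (the function names and data are mine, invented for illustration):

```python
# Two readings of "all F is G" over finite sets; names are invented.

def all_modern(F, G):
    """Frege/modern reading: vacuously true when F is empty."""
    return all(x in G for x in F)

def all_traditional(F, G):
    """Syllogistic reading: a positive universal carries existential import."""
    return bool(F) and all(x in G for x in F)

purple_rhinos = set()          # no purple rhinos exist
dead_things = {"dodo"}

print(all_modern(purple_rhinos, dead_things))       # True (vacuously)
print(all_traditional(purple_rhinos, dead_things))  # False (no F's exist)
```

Both readings agree whenever F is non-empty; they part company only on the empty case.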

So what's the title of this post about?  Well, chapter 12's explanation of why the operator is called divide is true as far as it goes: "if r0 and r2 are relations with no attribute names in common and we form the cartesian product r1 = r0 TIMES r2 (or, equivalently, r0 JOIN r2), and then divide r1 by r2, we get back to r0; in other words, product and divide are inverses of each other, in a sense."  However, it is not the case that if r1 divided by r2 is r0, then r0 times r2 is r1; there may be other rows in r1 that are not restored by the multiplication.  This is a familiar story in integer division:  2 * 5 is 10 and 10 divided by 5 is 2, but 13 divided by 5 is 2 with a remainder of 3.  So we can call the operator that, given r1 and r2, produces r1 - (r1 divide r2) * r2 Codd's remainder, since it is strictly analogous to the remainder in integer division.
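For concreteness, here is a small Python sketch of the restricted divide and the proposed remainder.  The representation (relations as sets of frozensets of attribute/value pairs) and the function names are my own, and the sketch assumes non-empty relations:

```python
# Relations modelled as sets of tuples, each tuple a frozenset of
# (attribute, value) pairs.  The sketch assumes the divisor's heading
# is a subset of the dividend's, as in the restricted divide above.

def heading(r):
    """Attribute names of a non-empty relation."""
    return {a for a, _ in next(iter(r))}

def project(r, attrs):
    """Project r onto the given attributes (duplicates collapse)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def times(r0, r2):
    """Cartesian product of relations with disjoint headings."""
    return {t0 | t2 for t0 in r0 for t2 in r2}

def divide(r1, r2):
    """Restricted Codd's divide: quotient tuples t such that t combined
    with every divisor tuple appears in the dividend."""
    quotient_attrs = heading(r1) - heading(r2)
    return {t for t in project(r1, quotient_attrs)
            if all(t | d in r1 for d in r2)}

def remainder(r1, r2):
    """Codd's remainder: r1 - (r1 divide r2) * r2."""
    return r1 - times(divide(r1, r2), r2)
```

With s1 supplying p1 and p2 and s2 supplying only p1, dividing by {p1, p2} yields {s1}, and the remainder is the leftover tuple (s2, p1), just as 13 = 2 * 5 + 3.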

So what is Codd's remainder actually good for?  I don't know yet.

The modern understanding of universal quantification over empty sets rests on the observation that to prove a universal statement false you must either find a logically correct argument from "universally" accepted axioms (but in the case of purple rhinos such axioms are quite rare) or else produce a real-life counterexample (but in the case of purple rhinos such counterexamples are equally rare).

Mathematicians like to be able to apply the tautology FORALL X : P(X) === NOT EXISTS X : NOT P(X) and do so ***without*** having to make reservations about the domain of X.
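That duality can even be checked mechanically.  A quick sketch (the enumeration scheme is mine) that verifies it over every predicate on every small finite domain, the empty domain included:

```python
from itertools import product

def forall(domain, P):
    return all(P(x) for x in domain)

def not_exists_not(domain, P):
    return not any(not P(x) for x in domain)

# Exhaustively check every predicate on domains of size 0 through 3.
for n in range(4):
    domain = list(range(n))
    for truth_values in product([False, True], repeat=n):
        P = dict(zip(domain, truth_values)).get
        assert forall(domain, P) == not_exists_not(domain, P)
```

No reservation about the domain is needed anywhere; on the empty domain both sides are simply True.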

Any other approach smacks of another kind of 3VL in which the third truth value is "IRRELEVANT", or some such.  I doubt such a logic would have lesser downsides than the one we have with "UNKNOWN".

There are at least two directions in which to expand/explore the topic.

First, there is more than one "division-like" operation in the Relational Model; these can actually be viewed as set joins.  From the predicate-calculus perspective, set joins correspond to generalized quantifiers.
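To illustrate the generalized-quantifier reading, here are several division-like set joins over the same data, sketched in Python (the supplier/part data is invented):

```python
# Division-like set joins, each read as a different quantifier over the
# set of parts a supplier supplies versus a fixed set of parts.

supplies = {"s1": {"p1", "p2"}, "s2": {"p1"}, "s3": set()}
purple = {"p1", "p2"}

all_of  = {s for s, ps in supplies.items() if purple <= ps}       # ∀: classic divide
some_of = {s for s, ps in supplies.items() if ps & purple}        # ∃: at least one
none_of = {s for s, ps in supplies.items() if not (ps & purple)}  # no purple part
exactly = {s for s, ps in supplies.items() if ps == purple}       # exactly that set
```

Each comprehension is the same join skeleton with a different set comparison plugged in, which is the generalized-quantifier point in miniature.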

Second, the inspiration for relational division might have come from Peirce and Tarski's algebra of binary relations. In relation algebra the operation of composition can informally be thought of as multiplication, or as a database join. The "inverse" operations (there are actually two: the lower and upper adjoints) are defined as follows:

$x \circ y \le z \Leftrightarrow y \le x \setminus z \Leftrightarrow x \le z / y$
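For concrete binary relations on a small finite universe, the two residuals can be computed directly.  A sketch (the universe and relation values are mine):

```python
# Composition and its two residuals for binary relations on a finite
# universe U, following x∘y ≤ z ⇔ y ≤ x\z ⇔ x ≤ z/y.

U = {0, 1, 2}

def compose(x, y):
    """Relational composition: x ∘ y."""
    return {(a, c) for a, b in x for b2, c in y if b == b2}

def left_residual(x, z):
    """x \\ z: the largest y with x ∘ y ≤ z."""
    return {(b, c) for b in U for c in U
            if all((a, c) in z for a in U if (a, b) in x)}

def right_residual(z, y):
    """z / y: the largest x with x ∘ y ≤ z."""
    return {(a, b) for a in U for b in U
            if all((a, c) in z for c in U if (b, c) in y)}
```

Note the universal quantifier inside each residual: division-like operations and ∀ go together here just as they do in Codd's divide.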

Quote from johnwcowan on June 27, 2019, 8:34 pm

So we can call the operator that, given r1 and r2, produces r1 - (r1 divide r2) * r2 Codd's remainder, since it is strictly analogous to the remainder in integer division.

So what is Codd's remainder actually good for?  I don't know yet.

This answer is to observe in particular that, algebraically, your expression "r1 - (r1 divide r2) * r2" is going to be valid ***only*** if r2's heading is a subset of r1's.  (Unless you meant '-' to correspond to SEMIMINUS rather than MINUS.)  (And that relates to which divide is denoted by your 'divide' :-) )

Quote from Erwin on June 27, 2019, 9:21 pm

Mathematicians like to be able to apply the tautology FORALL X : P(X) === NOT EXISTS X : NOT P(X) and do so ***without*** having to make reservations about the domain of X.

So they do, but logicians are more flexible.  (My father was one, so I know of what I speak.)  See the SEP article "The Traditional Square of Opposition".

Any other approach smacks of another kind of 3VL in which the third truth value is "IRRELEVANT", or some such.  I doubt such a logic would have lesser downsides than the one we have with "UNKNOWN".

Not at all.  Syllogistic logic is altogether 2VL: all statements are either true or false.  However, positive statements have existential import and negative statements do not, as opposed to the modern version where particular statements have existential import (which can be tautologically expressed as "existential statements have existential import") and universal statements do not.

From the above article:  "The puzzle about this argument is why the doctrine of the traditional square was maintained for well over 20 centuries in the face of this consideration [about empty sets]. Were 20 centuries of logicians so obtuse as not to have noticed this apparently fatal flaw? Or is there some other explanation?"  There is.

Quote from Erwin on June 27, 2019, 9:46 pm

This answer is to observe in particular that, algebraically, your expression "r1 - (r1 divide r2) * r2" is going to be valid ***only*** if r2's heading is a subset of r1's.  (Unless you meant '-' to correspond to SEMIMINUS rather than MINUS.)  (And that relates to which divide is denoted by your 'divide' :-) )

Exactly what I said in my original post's first paragraph:  "I'll limit myself to the abridged version [of Codd's divide] in which all the attributes of the divisor must appear in the dividend."

It is good for the same thing that his original natural join operator is good for: enforcing data independence.

The beauty of his operators is that they preserve that fundamental goal even when data designers don't understand it, which is often. All bets are off when dbms designers don't understand it, which is often.

Absolute truth is not involved in his algebra; only deductive validity is, and this is determined only by finite extensions. This feature is crippled when the information principle is ignored, such as when no information is given that can indicate whether a purple unicorn is alive or dead, and even when such information is given but users choose not to specify it in their algebraic expressions. When the information principle is ignored, every operator is misapplied.
Instead of asking what different operator would give some result that somebody prefers, the question should be which of his operators should be used to augment results that somebody sometimes thinks are partial. When it is understood that only four operators are really needed for his algebra, namely union, join, difference, and projection, the candidate is obvious.
An oft-quoted misapplication is the old "shop is open, alarm is set" example. It was used to equate join with union for certain purposes, when the accurate comparison should have been between exclusive union and union. No justification has ever been given for this departure from Codd's fundamental goal.
Quote from p c on June 28, 2019, 12:18 pm
An oft-quoted misapplication is the old "shop is open, alarm is set" example. It was used to equate join with union for certain purposes, when the accurate comparison should have been between exclusive union and union. No justification has ever been given for this departure from Codd's fundamental goal.

Good excuse to try this latex thing.

$\overline{A \bowtie B} \equiv (\overline {A} \bowtie U_b ) \cup ( \overline {B} \bowtie U_a )$

A tuple is not a member of a join iff its 'a part' is not a member of A or its 'b part' is not a member of B.  That is just plain De Morgan applied to the definition of JOIN.

And that shows beyond reasonable doubt, and totally independent of shops and alarms, why deleting from a join is the very same problem as inserting to a union.

Oh yes, for those in doubt about the definition of "complement" :

$\overline {A} \equiv U_a \setminus A$

I might start to like this latex thing.
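And the identity can be checked on concrete sets in Python; a throwaway sketch for the simplest case, where A and B have disjoint single-attribute headings (the data is invented):

```python
# Complement of a join equals the union of each complement joined with
# the opposite universe: De Morgan applied to the definition of JOIN.

Ua = {"a1", "a2"}          # universe of a-parts
Ub = {"b1", "b2"}          # universe of b-parts
A = {"a1"}                 # relation over the a attribute
B = {"b2"}                 # relation over the b attribute

U = {(a, b) for a in Ua for b in Ub}           # full cartesian universe
join = {(a, b) for a in A for b in B}          # A ⋈ B (disjoint headings)

comp_join = U - join                           # complement of the join
rhs = ({(a, b) for a in Ua - A for b in Ub} |  # complement(A) ⋈ Ub
       {(a, b) for a in Ua for b in Ub - B})   # Ua ⋈ complement(B)

assert comp_join == rhs
```

Which is exactly why deleting from a join and inserting to a union are the same problem, shops and alarms notwithstanding.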

That reply doesn't address the two points I made: data independence and the inapplicability of join to the shop-alarm application. If it is to serve any useful purpose, that application clearly depends on a relation with a predicate expressed as "shop open and not alarm set OR shop not open and alarm set", which is exclusive union.

Instead it raises the old question of join deletes and misapplies De Morgan's law. Where Codd's algebra is concerned, the misapplication depends on the same old pop logic, which presumes that it is necessary to decide which tuples are not in the result extensions rather than which ones are.

Nobody has ever questioned the generalized deductive validity of De Morgan when applied with suitable premises. Regarding premises, just exactly what are the universes you refer to, and what are the negations? Note that Codd didn't mention universes but domains instead, as in function domains. Note also that his algebra provides no way to negate a relation; it only provides a way to tell whether a given extension can be inferred or deduced from some other extension: for example, when A join B equals A, A can be inferred from B; otherwise it can't.

Let's go along with the apparent presumption in the pop argument that the sets A and B can be equated with relation extensions, so suppose the A and B in your argument stand for equal singleton relations. Without any other assumptions, the dbms must allow for the possibility that those relations have equal predicates, say PA = PB. If you delete (PA AND PB) by deleting only PA, your database now contradicts itself. Codd defined natural join so that this could not happen. Data independence depends on that definition.

You can arrive at the same conclusion with more involved examples, such as when B is a subset of A or when B and A have different headings. In no way do the results deny De Morgan, provided it is applied in accordance with Codd's algebra, for example (delete (A join B)) equals (A minus B) union (B minus A).

By the way, this website is not likely to attract much traffic from people with severe eye damage, and probably not from people with less serious problems. By modern standards it is not only deficient, requiring a physical keyboard to cut and paste, but the pale text and the inability to reflow, which some of us must depend on for visibility, make me think it was designed or implemented by an HTML neophyte who is probably an obsessive technocrat to boot. Amateurish describes it. Maybe I shouldn't be surprised by such retrograde text handling when the relational model hasn't progressed much in the last 40 years either.

Quote from p c on June 30, 2019, 2:17 am

... Data independence depends on that definition. ...

You keep using the phrase "data independence"... I don't think it means what you think it means.

Quote from p c on June 30, 2019, 2:17 am

By the way, this website is not likely to attract much traffic from people with severe eye damage, and probably not from people with less serious problems. By modern standards it is not only deficient, requiring a physical keyboard to cut and paste, but the pale text and the inability to reflow, which some of us must depend on for visibility, make me think it was designed or implemented by an HTML neophyte who is probably an obsessive technocrat to boot. Amateurish describes it. Maybe I shouldn't be surprised by such retrograde text handling when the relational model hasn't progressed much in the last 40 years either.

Could you explain what the problem is with cut-and-paste?

I've tried it on a conventional physical keyboard, an onscreen keyboard on a touchscreen computer, a mobile phone, and various tablets. All appear to work correctly, particularly as cut-and-paste is provided by the operating system, not the Web browser or this forum.

Furthermore, the text not only reflows, it is responsive to screen resolution and works equally well across different devices and screen resolutions. I suggest you try zooming in with your browser. You'll see that the text enlarges and reflows seamlessly.

The "pale" text is defined by the stylesheet to be close to black, but with slightly reduced contrast -- particularly on icons, non-content notices, etc. -- to reduce eyestrain when reading content text.