The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

The intellectual curiosity of `NatJoin`, `InnerUnion`

Quote from Vadim on May 14, 2021, 6:38 pm

... Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor.

Thanks Vadim ...

Looking at Appendix A's 'FORMAL DEFINITIONS' for <AND>, it's a strange beast. Judging by Stackoverflow, most SQL'ers just don't get it -- neither questioners nor answerers, even those with a considerable reputation.

  • <AND>/ Natural Join is not the mathematicians' Cartesian Product, because of (what wikipedia calls) the "flattened" tuples in the result.
  • That is, by comparison it erases information: you can't tell from the result what were the headings of the operands.
  • It's perhaps easier to see as (set) INTERSECTION, but that only works if the operands have same heading.
  • How to explain if the operands have partially-overlapping heading?
  • We might say: project each operand onto the attributes in common; get the INTERSECTION of those (because that's easy to explain); extend each intersecting tuple back to that/those in the source operands.
  • That explanation becomes hard work if the operands have no attributes in common: project each operand on to no attributes? What?
  • Perhaps that's why SQL explains FROM table1, table2, ... as more like a flattened Cartesian Product? Even if it risks duplicate columns.

What's remarkable about the semantics for <AND> is that it can take any two relation values as operands, and is guaranteed to produce just another relation value as result (subject to the 'join-compatibility' of the headings).

Curiosity upon curiosity, this <AND> is commutative, associative, idempotent! Relational Algebra can be like set operations.

<AND> also (because it's defined across all relation values) provides a way to compare relations:

  • if they have the same heading and one's body is a subset of the other's, we find bigger <AND> smaller yields smaller
    -- and that works even if the smaller is empty.
  • If they have same heading but only a partial overlap of tuples in their bodies, we find r1 <AND> r2 yields a relation that is smaller than either -- it's body is exactly the INTERSECTion.
  • The real curiosity, is when we get to relation values with differing headings, 'though now we need a bit of sideways thinking:
  • If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
    bigger-body-smaller-heading <AND> smaller-body-bigger-heading yields smaller-body-....
  • If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find r1 <AND> r2 yields a relation that is smaller than either -- going by the smaller-body-bigger-heading ordering.

So if <AND> / Natural Join is alleged to be defined across all relation values, a few of questions naturally arise:

  • There must be a bigger-than-anything-else value(?)
    Yes, going by our bigger-body-smaller-heading ordering, that'll be the value with minimal heading -- that is, no attributes -- and maximum body -- that is, DEE.
  • There must be a smaller-than-anything-else value(?)
    Yes, going by our smaller-body-bigger-heading ordering, that'll be the value with minimal body -- that is, empty -- and maximum heading -- that is, a rather ineffable 'all possible attributes' -- which might be a way to represent what's sometimes called the (or a) 'Universal Relation'. (Or perhaps the 'Dictionary' of a recent thread.)
  • Is there an operation that'll return the bigger of two relations, going by this bigger-body-smaller-heading ordering? Yes.
  • If so, what does it produce a) when the bodies partially overlap (with same heading)?; b) when the headings partially overlap (with same tuples projected on the attributes in common)?; c) when there's partial overlap of headings and bodies?; d) in the degenerate cases of disjoint headings and/or disjoint bodies -- a kind of Cartesian what?

That there is a well-defined answer to each of a), b), c), d); and that it's provided by a single operator, is an intellectual surprise from which I still haven't recovered. Let's go through the cases, because although this seems weird, we'll bump into some familiar operations:

  • Same heading, and one's body is a subset of the other's, we find bigger <?> smaller yields bigger
    -- and that works even if the smaller is empty.
  • a) If they have same heading but only a partial overlap of tuples in their bodies, we find r1 <?> r2 yields a relation that is bigger than either -- it's body is exactly the union. I'll say again: it is exactly Codd's familiar UNION.
  • b) If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
    bigger-body-smaller-heading <?> smaller-body-bigger-heading yields bigger-body-....
  • c) If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find r1 <?> r2 yields a relation that is bigger than either -- going by the bigger-body-smaller-heading ordering.
  • c) supplemental: If one has a strictly smaller heading, and empty body (so also strictly smaller), we still conform to the bigger-body-smaller-heading ordering: take the smaller heading, project the bigger's body on to it. This is exactly Codd's/Tutorial D's projection operation -- taking the attribute-comma-list from the smaller heading.
    Did I say "strictly smaller heading"? Of course I meant strictly bigger in the ordering, except that a strictly smaller (empty) body means not directly comparable.
  • d) If the headings are disjoint, we still conform to the bigger-body-smaller-heading ordering: take the attributes in common (empty heading), project both bodies on it, form the UNION. That'll yield DUM just in case both bodies are empty, otherwise DEE. That the bodies are disjoint is not a problem. This is Boolean algebra's familiar OR operation, treating DEE/DUM as truth values.

What to call this operation <?>? Appendix A takes Natural Join as a generalisation of <AND>. Since we've seen <?> has behaviour that's a generalisation of UNION, we could call it <UNION> -- which is what ISBL did, or SQL has UNION CORRESPONDING. We've seen its behaviour also is a generalisation of Boolean disjunction -- but <OR> is already taken -- and there's grounds to allow that's just as valid as a generalisation.

So I use InnerUnion; with the Inner denoting to project each operand on attributes in common; Unioning the result. Just as A's <OR> can implement Tutorial D's UNION of relations with same heading, so can InnerUnion. Furthermore, it can implement Tutorial D's projection, taking a empty relation to denote the heading of the desired result.

Perhaps not so curious by now, this InnerUnion is commutative, associative, idempotent. Its result reverses the ordering as compared with NaturalJoin / <AND>, so the technical characterisation is as a lattice 'dual' operation. (But you didn't need to know anything about lattices.)

 

AntC

Quite beautiful!