## The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

# The intellectual curiosity of `NatJoin`, `InnerUnion`

Quote from Vadim on May 14, 2021, 6:38 pm

... Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor.

Looking at Appendix A's 'FORMAL DEFINITIONS' for `<AND>`, it's a strange beast. Judging by Stackoverflow, most SQL'ers just don't get it -- neither questioners nor answerers, even those with a considerable reputation.

• `<AND>/ Natural Join` is not the mathematicians' Cartesian Product, because of (what wikipedia calls) the "flattened" tuples in the result.
• That is, by comparison it erases information: you can't tell from the result what were the headings of the operands.
• It's perhaps easier to see as (set) `INTERSECTION`, but that only works if the operands have same heading.
• How to explain if the operands have partially-overlapping heading?
• We might say: project each operand onto the attributes in common; get the `INTERSECTION` of those (because that's easy to explain); extend each intersecting tuple back to that/those in the source operands.
• That explanation becomes hard work if the operands have no attributes in common: project each operand on to no attributes? What?
• Perhaps that's why SQL explains `FROM table1, table2, ...` as more like a flattened Cartesian Product? Even if it risks duplicate columns.

What's remarkable about the semantics for `<AND>` is that it can take any two relation values as operands, and is guaranteed to produce just another relation value as result (subject to the 'join-compatibility' of the headings).

Curiosity upon curiosity, this `<AND>` is commutative, associative, idempotent! Relational Algebra can be like set operations.

`<AND>` also (because it's defined across all relation values) provides a way to compare relations:

• if they have the same heading and one's body is a subset of the other's, we find `bigger <AND> smaller` yields `smaller`
-- and that works even if the `smaller` is empty.
• If they have same heading but only a partial overlap of tuples in their bodies, we find `r1 <AND> r2` yields a relation that is smaller than either -- it's body is exactly the `INTERSECT`ion.
• The real curiosity, is when we get to relation values with differing headings, 'though now we need a bit of sideways thinking:
• If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
`bigger-body-smaller-heading <AND> smaller-body-bigger-heading` yields `smaller-body-...`.
• If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find `r1 <AND> r2` yields a relation that is smaller than either -- going by the smaller-body-bigger-heading ordering.

So if `<AND> / Natural Join` is alleged to be defined across all relation values, a few of questions naturally arise:

• There must be a bigger-than-anything-else value(?)
Yes, going by our bigger-body-smaller-heading ordering, that'll be the value with minimal heading -- that is, no attributes -- and maximum body -- that is, `DEE`.
• There must be a smaller-than-anything-else value(?)
Yes, going by our smaller-body-bigger-heading ordering, that'll be the value with minimal body -- that is, empty -- and maximum heading -- that is, a rather ineffable 'all possible attributes' -- which might be a way to represent what's sometimes called the (or a) 'Universal Relation'. (Or perhaps the 'Dictionary' of a recent thread.)
• Is there an operation that'll return the bigger of two relations, going by this bigger-body-smaller-heading ordering? Yes.
• If so, what does it produce a) when the bodies partially overlap (with same heading)?; b) when the headings partially overlap (with same tuples projected on the attributes in common)?; c) when there's partial overlap of headings and bodies?; d) in the degenerate cases of disjoint headings and/or disjoint bodies -- a kind of Cartesian what?

That there is a well-defined answer to each of a), b), c), d); and that it's provided by a single operator, is an intellectual surprise from which I still haven't recovered. Let's go through the cases, because although this seems weird, we'll bump into some familiar operations:

• Same heading, and one's body is a subset of the other's, we find `bigger <?> smaller` yields `bigger`
-- and that works even if the `smaller` is empty.
• a) If they have same heading but only a partial overlap of tuples in their bodies, we find `r1 <?> r2` yields a relation that is bigger than either -- it's body is exactly the union. I'll say again: it is exactly Codd's familiar `UNION`.
• b) If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
`bigger-body-smaller-heading <?> smaller-body-bigger-heading` yields `bigger-body-...`.
• c) If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find `r1 <?> r2` yields a relation that is bigger than either -- going by the bigger-body-smaller-heading ordering.
• c) supplemental: If one has a strictly smaller heading, and empty body (so also strictly smaller), we still conform to the bigger-body-smaller-heading ordering: take the smaller heading, project the bigger's body on to it. This is exactly Codd's/Tutorial D's projection operation -- taking the attribute-comma-list from the smaller heading.
Did I say "strictly smaller heading"? Of course I meant strictly bigger in the ordering, except that a strictly smaller (empty) body means not directly comparable.
• d) If the headings are disjoint, we still conform to the bigger-body-smaller-heading ordering: take the attributes in common (empty heading), project both bodies on it, form the `UNION`. That'll yield `DUM` just in case both bodies are empty, otherwise `DEE`. That the bodies are disjoint is not a problem. This is Boolean algebra's familiar `OR` operation, treating `DEE/DUM` as truth values.

What to call this operation `<?>`? Appendix A takes `Natural Join` as a generalisation of `<AND>`. Since we've seen `<?>` has behaviour that's a generalisation of `UNION`, we could call it `<UNION>` -- which is what ISBL did, or SQL has `UNION CORRESPONDING`. We've seen its behaviour also is a generalisation of Boolean disjunction -- but `<OR>` is already taken -- and there's grounds to allow that's just as valid as a generalisation.

So I use `InnerUnion`; with the `Inner` denoting to project each operand on attributes in common; `Union`ing the result. Just as A's `<OR>` can implement Tutorial D's `UNION` of relations with same heading, so can `InnerUnion`. Furthermore, it can implement Tutorial D's projection, taking a empty relation to denote the heading of the desired result.

Perhaps not so curious by now, this `InnerUnion` is commutative, associative, idempotent. Its result reverses the ordering as compared with `NaturalJoin / <AND>`, so the technical characterisation is as a lattice 'dual' operation. (But you didn't need to know anything about lattices.)

AntC

Quite beautiful!