The intellectual curiosity of `NatJoin`, `InnerUnion`
Quote from AntC on May 16, 2021, 7:46 amQuote from Vadim on May 14, 2021, 6:38 pm... Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor.
Thanks Vadim ...
Looking at Appendix A's 'FORMAL DEFINITIONS' for
<AND>
, it's a strange beast. Judging by Stackoverflow, most SQL'ers just don't get it -- neither questioners nor answerers, even those with a considerable reputation.
<AND>/ Natural Join
is not the mathematicians' Cartesian Product, because of (what wikipedia calls) the "flattened" tuples in the result.- That is, by comparison it erases information: you can't tell from the result what were the headings of the operands.
- It's perhaps easier to see as (set)
INTERSECTION
, but that only works if the operands have same heading.- How to explain if the operands have partially-overlapping heading?
- We might say: project each operand onto the attributes in common; get the
INTERSECTION
of those (because that's easy to explain); extend each intersecting tuple back to that/those in the source operands.- That explanation becomes hard work if the operands have no attributes in common: project each operand on to no attributes? What?
- Perhaps that's why SQL explains
FROM table1, table2, ...
as more like a flattened Cartesian Product? Even if it risks duplicate columns.What's remarkable about the semantics for
<AND>
is that it can take any two relation values as operands, and is guaranteed to produce just another relation value as result (subject to the 'join-compatibility' of the headings).Curiosity upon curiosity, this
<AND>
is commutative, associative, idempotent! Relational Algebra can be like set operations.
<AND>
also (because it's defined across all relation values) provides a way to compare relations:
- if they have the same heading and one's body is a subset of the other's, we find
bigger <AND> smaller
yieldssmaller
-- and that works even if thesmaller
is empty.- If they have same heading but only a partial overlap of tuples in their bodies, we find
r1 <AND> r2
yields a relation that is smaller than either -- it's body is exactly theINTERSECT
ion.- The real curiosity, is when we get to relation values with differing headings, 'though now we need a bit of sideways thinking:
- If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
bigger-body-smaller-heading <AND> smaller-body-bigger-heading
yieldssmaller-body-...
.- If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find
r1 <AND> r2
yields a relation that is smaller than either -- going by the smaller-body-bigger-heading ordering.So if
<AND> / Natural Join
is alleged to be defined across all relation values, a few of questions naturally arise:
- There must be a bigger-than-anything-else value(?)
Yes, going by our bigger-body-smaller-heading ordering, that'll be the value with minimal heading -- that is, no attributes -- and maximum body -- that is,DEE
.- There must be a smaller-than-anything-else value(?)
Yes, going by our smaller-body-bigger-heading ordering, that'll be the value with minimal body -- that is, empty -- and maximum heading -- that is, a rather ineffable 'all possible attributes' -- which might be a way to represent what's sometimes called the (or a) 'Universal Relation'. (Or perhaps the 'Dictionary' of a recent thread.)- Is there an operation that'll return the bigger of two relations, going by this bigger-body-smaller-heading ordering? Yes.
- If so, what does it produce a) when the bodies partially overlap (with same heading)?; b) when the headings partially overlap (with same tuples projected on the attributes in common)?; c) when there's partial overlap of headings and bodies?; d) in the degenerate cases of disjoint headings and/or disjoint bodies -- a kind of Cartesian what?
That there is a well-defined answer to each of a), b), c), d); and that it's provided by a single operator, is an intellectual surprise from which I still haven't recovered. Let's go through the cases, because although this seems weird, we'll bump into some familiar operations:
- Same heading, and one's body is a subset of the other's, we find
bigger <?> smaller
yieldsbigger
-- and that works even if thesmaller
is empty.- a) If they have same heading but only a partial overlap of tuples in their bodies, we find
r1 <?> r2
yields a relation that is bigger than either -- it's body is exactly the union. I'll say again: it is exactly Codd's familiarUNION
.- b) If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
bigger-body-smaller-heading <?> smaller-body-bigger-heading
yieldsbigger-body-...
.- c) If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find
r1 <?> r2
yields a relation that is bigger than either -- going by the bigger-body-smaller-heading ordering.- c) supplemental: If one has a strictly smaller heading, and empty body (so also strictly smaller), we still conform to the bigger-body-smaller-heading ordering: take the smaller heading, project the bigger's body on to it. This is exactly Codd's/Tutorial D's projection operation -- taking the attribute-comma-list from the smaller heading.
Did I say "strictly smaller heading"? Of course I meant strictly bigger in the ordering, except that a strictly smaller (empty) body means not directly comparable.- d) If the headings are disjoint, we still conform to the bigger-body-smaller-heading ordering: take the attributes in common (empty heading), project both bodies on it, form the
UNION
. That'll yieldDUM
just in case both bodies are empty, otherwiseDEE
. That the bodies are disjoint is not a problem. This is Boolean algebra's familiarOR
operation, treatingDEE/DUM
as truth values.What to call this operation
<?>
? Appendix A takesNatural Join
as a generalisation of<AND>
. Since we've seen<?>
has behaviour that's a generalisation ofUNION
, we could call it<UNION>
-- which is what ISBL did, or SQL hasUNION CORRESPONDING
. We've seen its behaviour also is a generalisation of Boolean disjunction -- but<OR>
is already taken -- and there's grounds to allow that's just as valid as a generalisation.So I use
InnerUnion
; with theInner
denoting to project each operand on attributes in common;Union
ing the result. Just as A's<OR>
can implement Tutorial D'sUNION
of relations with same heading, so canInnerUnion
. Furthermore, it can implement Tutorial D's projection, taking a empty relation to denote the heading of the desired result.Perhaps not so curious by now, this
InnerUnion
is commutative, associative, idempotent. Its result reverses the ordering as compared withNaturalJoin / <AND>
, so the technical characterisation is as a lattice 'dual' operation. (But you didn't need to know anything about lattices.)
AntC
Quote from Vadim on May 14, 2021, 6:38 pm... Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor.
Thanks Vadim ...
Looking at Appendix A's 'FORMAL DEFINITIONS' for <AND>
, it's a strange beast. Judging by Stackoverflow, most SQL'ers just don't get it -- neither questioners nor answerers, even those with a considerable reputation.
<AND>/ Natural Join
is not the mathematicians' Cartesian Product, because of (what wikipedia calls) the "flattened" tuples in the result.- That is, by comparison it erases information: you can't tell from the result what were the headings of the operands.
- It's perhaps easier to see as (set)
INTERSECTION
, but that only works if the operands have same heading. - How to explain if the operands have partially-overlapping heading?
- We might say: project each operand onto the attributes in common; get the
INTERSECTION
of those (because that's easy to explain); extend each intersecting tuple back to that/those in the source operands. - That explanation becomes hard work if the operands have no attributes in common: project each operand on to no attributes? What?
- Perhaps that's why SQL explains
FROM table1, table2, ...
as more like a flattened Cartesian Product? Even if it risks duplicate columns.
What's remarkable about the semantics for <AND>
is that it can take any two relation values as operands, and is guaranteed to produce just another relation value as result (subject to the 'join-compatibility' of the headings).
Curiosity upon curiosity, this <AND>
is commutative, associative, idempotent! Relational Algebra can be like set operations.
<AND>
also (because it's defined across all relation values) provides a way to compare relations:
- if they have the same heading and one's body is a subset of the other's, we find
bigger <AND> smaller
yieldssmaller
-- and that works even if thesmaller
is empty. - If they have same heading but only a partial overlap of tuples in their bodies, we find
r1 <AND> r2
yields a relation that is smaller than either -- it's body is exactly theINTERSECT
ion. - The real curiosity, is when we get to relation values with differing headings, 'though now we need a bit of sideways thinking:
- If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
bigger-body-smaller-heading <AND> smaller-body-bigger-heading
yieldssmaller-body-...
. - If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find
r1 <AND> r2
yields a relation that is smaller than either -- going by the smaller-body-bigger-heading ordering.
So if <AND> / Natural Join
is alleged to be defined across all relation values, a few of questions naturally arise:
- There must be a bigger-than-anything-else value(?)
Yes, going by our bigger-body-smaller-heading ordering, that'll be the value with minimal heading -- that is, no attributes -- and maximum body -- that is,DEE
. - There must be a smaller-than-anything-else value(?)
Yes, going by our smaller-body-bigger-heading ordering, that'll be the value with minimal body -- that is, empty -- and maximum heading -- that is, a rather ineffable 'all possible attributes' -- which might be a way to represent what's sometimes called the (or a) 'Universal Relation'. (Or perhaps the 'Dictionary' of a recent thread.) - Is there an operation that'll return the bigger of two relations, going by this bigger-body-smaller-heading ordering? Yes.
- If so, what does it produce a) when the bodies partially overlap (with same heading)?; b) when the headings partially overlap (with same tuples projected on the attributes in common)?; c) when there's partial overlap of headings and bodies?; d) in the degenerate cases of disjoint headings and/or disjoint bodies -- a kind of Cartesian what?
That there is a well-defined answer to each of a), b), c), d); and that it's provided by a single operator, is an intellectual surprise from which I still haven't recovered. Let's go through the cases, because although this seems weird, we'll bump into some familiar operations:
- Same heading, and one's body is a subset of the other's, we find
bigger <?> smaller
yieldsbigger
-- and that works even if thesmaller
is empty. - a) If they have same heading but only a partial overlap of tuples in their bodies, we find
r1 <?> r2
yields a relation that is bigger than either -- it's body is exactly the union. I'll say again: it is exactly Codd's familiarUNION
. - b) If one relation has a smaller heading, but at least all the tuples of the other considering only the attributes in common, we still get
bigger-body-smaller-heading <?> smaller-body-bigger-heading
yieldsbigger-body-...
. - c) If they have a partial overlap of headings and/or a partial overlap of tuples (going by attributes in common) we still find
r1 <?> r2
yields a relation that is bigger than either -- going by the bigger-body-smaller-heading ordering. - c) supplemental: If one has a strictly smaller heading, and empty body (so also strictly smaller), we still conform to the bigger-body-smaller-heading ordering: take the smaller heading, project the bigger's body on to it. This is exactly Codd's/Tutorial D's projection operation -- taking the attribute-comma-list from the smaller heading.
Did I say "strictly smaller heading"? Of course I meant strictly bigger in the ordering, except that a strictly smaller (empty) body means not directly comparable. - d) If the headings are disjoint, we still conform to the bigger-body-smaller-heading ordering: take the attributes in common (empty heading), project both bodies on it, form the
UNION
. That'll yieldDUM
just in case both bodies are empty, otherwiseDEE
. That the bodies are disjoint is not a problem. This is Boolean algebra's familiarOR
operation, treatingDEE/DUM
as truth values.
What to call this operation <?>
? Appendix A takes Natural Join
as a generalisation of <AND>
. Since we've seen <?>
has behaviour that's a generalisation of UNION
, we could call it <UNION>
-- which is what ISBL did, or SQL has UNION CORRESPONDING
. We've seen its behaviour also is a generalisation of Boolean disjunction -- but <OR>
is already taken -- and there's grounds to allow that's just as valid as a generalisation.
So I use InnerUnion
; with the Inner
denoting to project each operand on attributes in common; Union
ing the result. Just as A's <OR>
can implement Tutorial D's UNION
of relations with same heading, so can InnerUnion
. Furthermore, it can implement Tutorial D's projection, taking a empty relation to denote the heading of the desired result.
Perhaps not so curious by now, this InnerUnion
is commutative, associative, idempotent. Its result reverses the ordering as compared with NaturalJoin / <AND>
, so the technical characterisation is as a lattice 'dual' operation. (But you didn't need to know anything about lattices.)
AntC