The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Generalising Codd's operators; and a quiz

Quiz first. Here's an operator InnerIntersect that is equivalent to INTERSECT if its operands have the same heading. If they're different, project each operand on the attributes in common; then INTERSECT their projections. I.e.

r1 InnerIntersect r2 =df r1{attributes-in-common} INTERSECT r2{attributes-in-common};

In setbuilder, given r1[X, Y], r2[Y, Z] in which X, Y, Z are groups of attributes (each possibly empty)

r1 InnerIntersect r2 =df {(Y) | ∃X (XY)∈r1 ∧ ∃Z (YZ)∈r2 }
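To make the definition concrete, here's a minimal executable sketch. The encoding is my own assumption for illustration (nothing from TTM): a relation is a pair (heading, body), with each tuple represented as a frozenset of (attribute, value) pairs.

```python
# Toy encoding (an assumption, for illustration only): relation = (heading, body);
# heading is a frozenset of attribute names, body is a frozenset of tuples,
# each tuple a frozenset of (attribute, value) pairs.

def rel(heading, *rows):
    """Build a relation from a heading and rows given as dicts."""
    return (frozenset(heading),
            frozenset(frozenset(row.items()) for row in rows))

def project(r, attrs):
    """Project relation r onto the given attributes."""
    heading, body = r
    keep = frozenset(attrs) & heading
    return (keep,
            frozenset(frozenset(p for p in t if p[0] in keep) for t in body))

def inner_intersect(r1, r2):
    """Project both operands onto the common attributes, then INTERSECT."""
    common = r1[0] & r2[0]
    return (common, project(r1, common)[1] & project(r2, common)[1])
```

On this encoding DUM is (frozenset(), frozenset()), and commutativity, idempotence, and DUM-as-absorbing-element are easy to check on small examples.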

What algebraic properties does this have? It's clearly commutative and idempotent. It has an 'absorbing element' DUM. (Whereas JOIN has an identity element DEE.)

The money question: is it associative? That is, (r1 InnerIntersect (r2 InnerIntersect r3)) = ((r1 InnerIntersect r2) InnerIntersect r3). I can see that if all operands are empty relations, that holds; or if they all have the same heading. I can see that the heading of the result must be the intersection of all headings. But in general? Intuition tells me it ought to be associative, because it's intersections all the way down; intuition is unreliable here. If not, please provide a counter-example.

(There could be other questions like: is this distributive over Appendix A's <OR>; but I need to prove associativity first; and I'm not making much headway.)

Generalising Codd

Codd's original set of operators for the Relational Algebra is rather annoying: there are tiresome restrictions on the headings of operands, which make the operators partial and mean they lack nice algebraic properties.

In particular, UNION, INTERSECT, MINUS require their operands to have the same headings ('union-compatible') whereas TIMES requires its operands to have disjoint headings. As Appendix A points out, we can observe that TIMES is a special case (disjoint headings) of JOIN, and INTERSECT is also a special case (same headings); so we can dispense with both and generalise as JOIN, or <AND> as Appendix A uses.

MINUS we can generalise to allow the headings of the operands to be different (not necessarily even having any attributes in common) -- as NOT MATCHING, which Appendix A generalises as <AND> <NOT>.

For UNION, if the operands have different headings, Appendix A chooses to generalise by returning a result with the union of the headings: <OR>, which might also be called OuterUnion, on a similar basis to OUTER JOIN. I've always felt queasy about that, because it's domain-dependent -- even if Tutorial D makes sure it could never be called in an 'unsafe query'. I prefer InnerUnion, aka SQL's UNION CORRESPONDING, which is what ISBL implemented as Union: project each operand on the attributes in common; then UNION the projections. This operation is the dual of JOIN, in the sense that if r1 JOIN r2 = r2 then r1 InnerUnion r2 = r1. JOIN takes the union of headings and the intersection of tuples; InnerUnion takes the intersection of headings and the union of tuples.
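The duality can be sanity-checked (though of course not proved) on toy data. The encoding here is my own assumption for illustration: relation = (heading, body), tuples as frozensets of (attribute, value) pairs.

```python
# Toy encoding (an assumption, for illustration only): relation = (heading, body);
# tuples are frozensets of (attribute, value) pairs.

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(row.items()) for row in rows))

def project(r, attrs):
    heading, body = r
    keep = frozenset(attrs) & heading
    return (keep,
            frozenset(frozenset(p for p in t if p[0] in keep) for t in body))

def join(r1, r2):
    """Natural join: union of headings; keep tuple pairs that agree on
    the common attributes, merged."""
    common = r1[0] & r2[0]
    body = frozenset(
        t1 | t2
        for t1 in r1[1] for t2 in r2[1]
        if frozenset(p for p in t1 if p[0] in common)
           == frozenset(p for p in t2 if p[0] in common))
    return (r1[0] | r2[0], body)

def inner_union(r1, r2):
    """Project each operand onto the common attributes, then UNION."""
    common = r1[0] & r2[0]
    return (common, project(r1, common)[1] | project(r2, common)[1])
```

For example, with r1 a projection of r2, r1 JOIN r2 = r2 and, dually, r1 InnerUnion r2 = r1.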

Then if InnerUnion is the dual of JOIN aka <AND>, what is the dual of <OR> aka OuterUnion? It ought to be InnerIntersect, as described in the quiz. But then it ought to have complementary properties to <OR> (<OR> takes the union of the headings and the union of the tuples; its dual should take the intersection of the headings and the intersection of the tuples). But I'm not getting those properties to 'behave'.

Yes I've copied InnerIntersection from Tropashko (he calls it InnerJoin). His papers claim it's not associative, not distributive over <OR>, and in general has disappointing properties. I suspect he has his definitions wrong.

The TTM <OR> operation can be defined in numerous equivalent ways:

x <OR> y = <NOT>(<NOT>x ^ <NOT>y) 
x <OR> y = (x ^ R11) v (y ^ (x v R11))
x <OR> y = (R11 ^ y) v (x ^ y) v (x ^ R11)
x <OR> y = (x ^ (y v R11)) v (y ^ (x v R11))
x <OR> y = (R00 ^ x ^ y) v (R11 ^ y) v (x ^ R11)

with the following notation:

  • <NOT> as in TTM Algebra A
  • ^ - natural join, \bowtie, AKA TTM <AND>
  • v - inner union
  • R00 AKA TABLE_DUM
  • R11 - the dual of R00 (not to confuse with R01 AKA TTM TABLE_DEE)

For inner intersection/join, which we'll call in TTM style <and> (not to be confused with natural join), there are at least two equivalent definitions:

x <and> y = (x ^ R00) v (x ^ y) v (R00 ^ y)
x <and> y = (x ^ y) v ((x v y) ^ R00)

However, keep in mind that there is at least one more, different operation that produces a relation with the same intersection heading:

x <"and"> y = <NOT>((<NOT>x) v (<NOT>y))

The pair of operations -- v together with <"and"> -- forms the second lattice structure, while <OR> together with <and> does not.

In summary, it looks like there is a zoo of relational operations out there, of which we have only had a glimpse so far.

Quote from Tegiri Nenashi on July 5, 2019, 5:36 pm

The TTM <OR> operation can be defined in numerous equivalent ways:

Tegiri, I don't think it's a happy idea to mix up Appendix A notation with Tropashko's lattice stuff. Using <and> or <"and"> for things that are not Appendix A's <AND> is going to cause tremendous confusion. For 'Inner Intersection', Tropashko uses *.

For inner intersection/join, which we'll call in TTM style as <and> (not to confuse with natural join),  there are at least two equivalent definitions:

x <and> y = (x ^ R00) v (x ^ y) v (R00 ^ y)
x <and> y = (x ^ y) v ((x v y) ^ R00)

There are several other possible equations. Are they equivalent? I can't prove it. Indeed I (or rather Mace4) have counter-examples where they give differing results. I suspect this is the root of why Tropashko can't show InnerIntersection to be associative. Specifically, Tropashko fails to nail down the definition of R00 (aka Appendix A DUM), so any equations mentioning R00 are wobbly.

However, keep in mind that there is  at least one more, a different operation which produces a relation with the same intersection header:

x <"and"> y = <NOT>((<NOT>x) v (<NOT>y))

The pair of operations -- v together with <"and"> -- forms the second lattice structure,

No. The (first and only) lattice structure pairs v with ^. That's the definition of a lattice, by convention. Look at the reference you quote: it's talking about the pair of Tropashko's + (i.e. <OR>) with what you call <"and">.

while <OR> together with <and> does not.

I challenge that claim. I think Tropashko has got his definitions wrong, and the symptom of that is that InnerIntersection can't be shown to be associative.

In summary, it looks like there is a zoo of relational operations out there, of which we have only had a glimpse so far.

Of course there are infinitely many operations closed over relations, and infinitely many lattices you could define over the same set of elements (assuming those are infinite). Codd's 'Relational Completeness' (or at least a better-motivated account of expressive completeness) is what puts us in a position to define any such operation using a small subset. We want the subset to have tractable algebraic properties, so that we can verify the definitions. So far Tropashko hasn't done that rigorously.

Quote from AntC on July 5, 2019, 10:46 pm

Tegiri, I don't think it's a happy idea to mix up Appendix A notation with Tropashko's lattice stuff. Using <and> or <"and"> for things that are not Appendix A's <AND> is going to cause tremendous confusion. For 'Inner Intersection', Tropashko uses *.

The choice of * operation symbol can be criticized, because there is more than one flavor of the "inner intersection". That is assuming we don't disagree on what follows.

For inner intersection/join, which we'll call in TTM style <and> (not to be confused with natural join), there are at least two equivalent definitions:

x <and> y = (x ^ R00) v (x ^ y) v (R00 ^ y)
x <and> y = (x ^ y) v ((x v y) ^ R00)

There are several other possible equations. Are they equivalent? I can't prove it. Indeed I (or rather Mace4) have counter-examples where they give differing results. I suspect this is the root of why Tropashko can't show InnerIntersection to be associative. Specifically, Tropashko fails to nail down the definition of R00 (aka Appendix A DUM), so any equations mentioning R00 are wobbly.

They are equivalent. Here is the proof:

1 ((x ^ R00) v (x ^ y)) v (R00 ^ y) = (x ^ y) v ((x v y) ^ R00).  [goal].
2 x ^ y = y ^ x.  [assumption].
3 (x ^ y) ^ z = x ^ (y ^ z).  [assumption].
4 x ^ (x v y) = x.  [assumption].
5 x v y = y v x.  [assumption].
6 (x v y) v z = x v (y v z).  [assumption].
7 x v (x ^ y) = x.  [assumption].
8 (R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).  [assumption].
9 (x ^ y) v (R00 ^ (z ^ (x v y))) = ((R00 ^ (z ^ x)) v y) ^ ((R00 ^ (z ^ y)) v x).  [para(5(a,1),8(a,1))].
10 ((c1 ^ R00) v (c1 ^ c2)) v (R00 ^ c2) != (c1 ^ c2) v ((c1 v c2) ^ R00).  [deny(1)].
11 ((R00 ^ c1) v (c1 ^ c2)) v (R00 ^ c2) != (c1 ^ c2) v ((c1 v c2) ^ R00).  [para(2(a,1),10(a,1,1,1))].
12 (R00 ^ c2) v ((R00 ^ c1) v (c1 ^ c2)) != (c1 ^ c2) v ((c1 v c2) ^ R00).  [para(5(a,1),11(a,1))].
13 (R00 ^ c2) v ((R00 ^ c1) v (c1 ^ c2)) != (c1 ^ c2) v (R00 ^ (c1 v c2)).  [para(2(a,1),12(a,2,2))].
14 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (R00 ^ c2) v ((R00 ^ c1) v (c1 ^ c2)).  [copy(13),flip(a)].
15 (x ^ y) ^ z = y ^ (x ^ z).  [para(2(a,1),3(a,1,1))].
16 x ^ (y ^ z) = y ^ (x ^ z).  [para(3(a,1),15(a,1))].
17 x ^ y = x ^ ((x v z) ^ y).  [para(4(a,1),3(a,1,1))].
18 x ^ ((x v y) ^ z) = x ^ z.  [copy(17),flip(a)].
19 x ^ y = x ^ (y ^ ((x ^ y) v z)).  [para(4(a,1),3(a,1))].
20 x ^ (y ^ ((x ^ y) v z)) = x ^ y.  [copy(19),flip(a)].
21 x ^ (y v x) = x.  [para(5(a,1),4(a,1,2))].
22 (x v y) v z = y v (x v z).  [para(5(a,1),6(a,1,1))].
23 x v (y v z) = y v (x v z).  [para(6(a,1),22(a,1))].
24 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (R00 ^ c1) v ((R00 ^ c2) v (c1 ^ c2)).  [para(23(a,1),14(a,2))].
25 x ^ x = x.  [para(7(a,1),4(a,1,2))].
26 (x ^ y) v (R00 ^ ((x v y) ^ z)) = ((R00 ^ (z ^ x)) v y) ^ ((R00 ^ (z ^ y)) v x).  [para(2(a,1),9(a,1,2,2))].
27 x ^ y = y ^ (x ^ (y v z)).  [para(4(a,1),16(a,1,2))].
28 x ^ (y ^ (x v z)) = y ^ x.  [copy(27),flip(a)].
29 x v (y ^ (x ^ z)) = x.  [para(16(a,1),7(a,1,2))].
30 (x ^ y) v (R00 ^ (y ^ (x v y))) = ((R00 ^ (y ^ x)) v y) ^ ((R00 ^ y) v x).  [para(25(a,1),9(a,2,2,1,2))].
31 (x ^ y) v (R00 ^ y) = ((R00 ^ (y ^ x)) v y) ^ ((R00 ^ y) v x).  [para(21(a,1),30(a,1,2,2))].
32 (x ^ y) v (R00 ^ y) = (y v (R00 ^ (y ^ x))) ^ ((R00 ^ y) v x).  [para(5(a,1),31(a,2,1))].
33 (x ^ y) v (R00 ^ y) = y ^ ((R00 ^ y) v x).  [para(29(a,1),32(a,2,1))].
34 x ^ y = x ^ ((z v x) ^ y).  [para(21(a,1),3(a,1,1))].
35 x ^ ((y v x) ^ z) = x ^ z.  [copy(34),flip(a)].
36 x ^ y = y ^ (x ^ (z v y)).  [para(21(a,1),16(a,1,2))].
37 x ^ (y ^ (z v x)) = y ^ x.  [copy(36),flip(a)].
38 x ^ (y ^ z) = y ^ (x ^ ((y v u) ^ z)).  [para(18(a,1),16(a,1,2))].
39 x ^ (y ^ ((x v z) ^ u)) = y ^ (x ^ u).  [copy(38),flip(a)].
40 x ^ (x v y) = x ^ (z v (x v y)).  [para(21(a,1),18(a,1,2))].
41 x = x ^ (z v (x v y)).  [para(4(a,1),40(a,1))].
42 x ^ (y v (x v z)) = x.  [copy(41),flip(a)].
43 x v y = x v ((z ^ (x ^ u)) v y).  [para(29(a,1),6(a,1,1))].
44 x v ((y ^ (x ^ z)) v u) = x v u.  [copy(43),flip(a)].
45 x ^ ((x v y) ^ z) = x ^ (z ^ (((x v y) ^ z) v u)).  [para(20(a,1),18(a,1,2))].
46 x ^ z = x ^ (z ^ (((x v y) ^ z) v u)).  [para(18(a,1),45(a,1))].
47 x ^ (y ^ (((x v z) ^ y) v u)) = x ^ y.  [copy(46),flip(a)].
48 (x ^ (y v z)) v (R00 ^ y) = ((R00 ^ (y ^ x)) v (y v z)) ^ ((R00 ^ (y ^ (y v z))) v x).  [para(42(a,1),9(a,1,2,2))].
49 (R00 ^ y) v (x ^ (y v z)) = ((R00 ^ (y ^ x)) v (y v z)) ^ ((R00 ^ (y ^ (y v z))) v x).  [para(5(a,1),48(a,1))].
50 (R00 ^ y) v (x ^ (y v z)) = (y v ((R00 ^ (y ^ x)) v z)) ^ ((R00 ^ (y ^ (y v z))) v x).  [para(23(a,1),49(a,2,1))].
51 (R00 ^ y) v (x ^ (y v z)) = (y v z) ^ ((R00 ^ (y ^ (y v z))) v x).  [para(44(a,1),50(a,2,1))].
52 (R00 ^ x) v (y ^ (x v z)) = (x v z) ^ ((R00 ^ x) v y).  [para(4(a,1),51(a,2,2,1,2))].
53 x ^ (y ^ z) = y ^ (x ^ ((u v y) ^ z)).  [para(35(a,1),16(a,1,2))].
54 x ^ (y ^ ((z v x) ^ u)) = y ^ (x ^ u).  [copy(53),flip(a)].
55 x ^ ((R00 ^ x) v y) = (R00 ^ x) v (y ^ x).  [para(33(a,1),5(a,1))].
56 (R00 ^ x) v (y ^ x) = x ^ ((R00 ^ x) v y).  [copy(55),flip(a)].
57 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (R00 ^ c1) v (c2 ^ ((R00 ^ c2) v c1)).  [para(56(a,1),24(a,2,2))].
58 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (R00 ^ c1) v (c2 ^ (c1 v (R00 ^ c2))).  [para(5(a,1),57(a,2,2,2))].
59 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (c1 v (R00 ^ c2)) ^ ((R00 ^ c1) v c2).  [para(52(a,1),58(a,2))].
60 (c1 ^ c2) v (R00 ^ (c1 v c2)) != (c1 v (R00 ^ c2)) ^ (c2 v (R00 ^ c1)).  [para(5(a,1),59(a,2,2))].
61 (x ^ y) v (R00 ^ (x v y)) = ((R00 ^ ((((R00 v z) ^ (x v y)) v u) ^ x)) v y) ^ ((R00 ^ ((((R00 v z) ^ (x v y)) v u) ^ y)) v x).  [para(47(a,1),26(a,1,2))].
62 (x ^ y) v (R00 ^ (x v y)) = ((R00 ^ (x ^ (((R00 v z) ^ (x v y)) v u))) v y) ^ ((R00 ^ ((((R00 v z) ^ (x v y)) v u) ^ y)) v x).  [para(2(a,1),61(a,2,1,1,2))].
63 (x ^ y) v (R00 ^ (x v y)) = (y v (R00 ^ (x ^ (((R00 v z) ^ (x v y)) v u)))) ^ ((R00 ^ ((((R00 v z) ^ (x v y)) v u) ^ y)) v x).  [para(5(a,1),62(a,2,1))].
64 (x ^ y) v (R00 ^ (x v y)) = (y v (R00 ^ (x ^ (((R00 v z) ^ (x v y)) v u)))) ^ ((R00 ^ (y ^ (((R00 v z) ^ (x v y)) v u))) v x).  [para(2(a,1),63(a,2,2,1,2))].
65 (x ^ y) v (R00 ^ (x v y)) = (y v (R00 ^ (x ^ (((R00 v z) ^ (x v y)) v u)))) ^ (x v (R00 ^ (y ^ (((R00 v z) ^ (x v y)) v u)))).  [para(5(a,1),64(a,2,2))].
66 (x v (R00 ^ (y ^ (((R00 v z) ^ (y v x)) v u)))) ^ (y v (R00 ^ (x ^ (((R00 v z) ^ (y v x)) v u)))) = (y ^ x) v (R00 ^ (y v x)).  [copy(65),flip(a)].
67 x ^ (y ^ (x v z)) = y ^ (x ^ (((y v u) ^ (x v z)) v w)).  [para(47(a,1),39(a,1,2))].
68 y ^ x = y ^ (x ^ (((y v u) ^ (x v z)) v w)).  [para(28(a,1),67(a,1))].
69 x ^ (y ^ (((x v z) ^ (y v u)) v w)) = x ^ y.  [copy(68),flip(a)].
70 (x v (R00 ^ y)) ^ (y v (R00 ^ (x ^ (((R00 v z) ^ (y v x)) v u)))) = (y ^ x) v (R00 ^ (y v x)).  [para(69(a,1),66(a,1,1,2))].
71 x ^ (y ^ (z v x)) = y ^ (x ^ (((y v u) ^ (z v x)) v w)).  [para(47(a,1),54(a,1,2))].
72 y ^ x = y ^ (x ^ (((y v u) ^ (z v x)) v w)).  [para(37(a,1),71(a,1))].
73 x ^ (y ^ (((x v z) ^ (u v y)) v w)) = x ^ y.  [copy(72),flip(a)].
74 (x v (R00 ^ y)) ^ (y v (R00 ^ x)) = (y ^ x) v (R00 ^ (y v x)).  [para(73(a,1),70(a,1,2,2))].
75 (x ^ y) v (R00 ^ (x v y)) = (y v (R00 ^ x)) ^ (x v (R00 ^ y)).  [copy(74),flip(a)].
76 (c2 v (R00 ^ c1)) ^ (c1 v (R00 ^ c2)) != (c1 v (R00 ^ c2)) ^ (c2 v (R00 ^ c1)).  [para(75(a,1),60(a,1))].
77 (c1 v (R00 ^ c2)) ^ (c2 v (R00 ^ c1)) != (c1 v (R00 ^ c2)) ^ (c2 v (R00 ^ c1)).  [para(2(a,1),76(a,1))].
78 $F.  [copy(77),xx(a)].

As you can see from step #8, for example, I used the Litak et al. axiom system.

However, keep in mind that there is at least one more, different operation that produces a relation with the same intersection heading:

x <"and"> y = <NOT>((<NOT>x) v (<NOT>y))

The pair of operations -- v together with <"and"> -- forms the second lattice structure,

No. The (first and only) lattice structure pairs v with ^. That's the definition of a lattice, by convention. Look at the reference you quote: it's talking about the pair of Tropashko's + (i.e. <OR>) with what you call <"and">.

I'm confused. The first lattice structure is formed by v together with ^. The second lattice, formed by + (AKA <OR>) and # (AKA <"and">), is different from the first one. They coexist in the same database (set of relations).

while <OR> together with <and> does not.

I challenge that claim. I think Tropashko has got his definitions wrong, and the symptom of that is that InnerIntersection can't be shown to be associative.

Take

x = [q r]
     b 0
;
y = [q r]
     a 0
;
z = [p r]
     0 0
     0 1
     0 2
     1 1
     1 2
     2 2
;

Then (x <and> y) <and> z evaluates to the empty relation

[r]
;

while x <and> (y <and> z) is not empty

[r]
 0
;
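Reading <and> as InnerIntersect (project onto the common attributes, then INTERSECT), the counter-example can be checked mechanically. The encoding below is a toy assumption for illustration only.

```python
# Toy encoding (an assumption): relation = (heading, body); tuples are
# frozensets of (attribute, value) pairs.

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(row.items()) for row in rows))

def project(r, attrs):
    heading, body = r
    keep = frozenset(attrs) & heading
    return (keep,
            frozenset(frozenset(p for p in t if p[0] in keep) for t in body))

def inner_intersect(r1, r2):
    common = r1[0] & r2[0]
    return (common, project(r1, common)[1] & project(r2, common)[1])

x = rel({'q', 'r'}, {'q': 'b', 'r': 0})
y = rel({'q', 'r'}, {'q': 'a', 'r': 0})
z = rel({'p', 'r'},
        {'p': 0, 'r': 0}, {'p': 0, 'r': 1}, {'p': 0, 'r': 2},
        {'p': 1, 'r': 1}, {'p': 1, 'r': 2}, {'p': 2, 'r': 2})

lhs = inner_intersect(inner_intersect(x, y), z)   # empty, heading {r}
rhs = inner_intersect(x, inner_intersect(y, z))   # heading {r}, tuple r = 0
```

lhs comes out empty (x and y share heading {q, r} but no tuple), while rhs retains r = 0; the two groupings disagree, so associativity fails.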

In summary, it looks like there is a zoo of relational operations out there, of which we have only had a glimpse so far.

Of course there are infinitely many operations closed over relations, and infinitely many lattices you could define over the same set of elements (assuming those are infinite). Codd's 'Relational Completeness' (or at least a better-motivated account of expressive completeness) is what puts us in a position to define any such operation using a small subset. We want the subset to have tractable algebraic properties, so that we can verify the definitions. So far Tropashko hasn't done that rigorously.

I consider the fact that there is more than one lattice to be quite nontrivial. I would be convinced otherwise if you exhibited a third lattice.

Quote from AntC on July 5, 2019, 11:30 am

Quiz first. Here's an operator InnerIntersect that is equivalent to INTERSECT if its operands have the same heading. If they're different, project each operand on the attributes in common; then INTERSECT their projections. I.e.

r1 InnerIntersect r2 =df r1{attributes-in-common} INTERSECT r2{attributes-in-common};

In setbuilder, given r1[X, Y], r2[Y, Z] in which X, Y, Z are groups of attributes (each possibly empty)

r1 InnerIntersect r2 =df {(Y) | ∃X (XY)∈r1 ∧ ∃Z (YZ)∈r2 }

What algebraic properties does this have? It's clearly commutative and idempotent. It has an 'absorbing element' DUM. (Whereas JOIN has an identity element DEE.)

The money question: is it associative? That is, (r1 InnerIntersect (r2 InnerIntersect r3)) = ((r1 InnerIntersect r2) InnerIntersect r3). I can see that if all operands are empty relations, that holds; or if they all have the same heading. I can see that the heading of the result must be the intersection of all headings. But in general? Intuition tells me it ought to be associative, because it's intersections all the way down; intuition is unreliable here. If not, please provide a counter-example.

(There could be other questions like: is this distributive over Appendix A's <OR>; but I need to prove associativity first; and I'm not making much headway.)

Generalising Codd

Codd's original set of operators for the Relational Algebra is rather annoying: there are tiresome restrictions on the headings of operands, which make the operators partial and mean they lack nice algebraic properties.

In particular, UNION, INTERSECT, MINUS require their operands to have the same headings ('union-compatible') whereas TIMES requires its operands to have disjoint headings. As Appendix A points out, we can observe that TIMES is a special case (disjoint headings) of JOIN, and INTERSECT is also a special case (same headings); so we can dispense with both and generalise as JOIN, or <AND> as Appendix A uses.

MINUS we can generalise to allow the headings of the operands to be different (not necessarily even having any attributes in common) -- as NOT MATCHING, which Appendix A generalises as <AND> <NOT>.

For UNION, if the operands have different headings, Appendix A chooses to generalise by returning a result with the union of the headings: <OR>, which might also be called OuterUnion, on a similar basis to OUTER JOIN. I've always felt queasy about that, because it's domain-dependent -- even if Tutorial D makes sure it could never be called in an 'unsafe query'. I prefer InnerUnion, aka SQL's UNION CORRESPONDING, which is what ISBL implemented as Union: project each operand on the attributes in common; then UNION the projections. This operation is the dual of JOIN, in the sense that if r1 JOIN r2 = r2 then r1 InnerUnion r2 = r1. JOIN takes the union of headings and the intersection of tuples; InnerUnion takes the intersection of headings and the union of tuples.

Then if InnerUnion is the dual of JOIN aka <AND>, what is the dual of <OR> aka OuterUnion? It ought to be InnerIntersect, as described in the quiz. But then it ought to have complementary properties to <OR> (<OR> takes the union of the headings and the union of the tuples; its dual should take the intersection of the headings and the intersection of the tuples). But I'm not getting those properties to 'behave'.

Yes I've copied InnerIntersection from Tropashko (he calls it InnerJoin). His papers claim it's not associative, not distributive over <OR>, and in general has disappointing properties. I suspect he has his definitions wrong.

It's an interesting question, but I'm surprised it causes such difficulty. I've always been of the view that, despite Codd's restrictive definitions, all the set-like operations (UNION, INTERSECT, MINUS, etc) generalise simply:

  • project each argument onto the set of common attributes
  • apply Codd-like operation.

That certainly looks associative, and I can see no way to construct a counter-example. Does that not suggest a method for a proof?

Andl - A New Database Language - andl.org

Thank you Tegiri. Heck, lots to take in so I'll reply in chunks.

Firstly thank you for the counter-example: an InnerIntersection that is not associative. So you've answered the quiz question. Then that operator can't form a lattice, even if R00 aka DUM is its absorbing element. Neither can it be the lattice dual of Appendix A <OR>, even though DUM is the identity for <OR>.

Thank you for the proof of equivalence -- and especially the axiom (your #8) from Litak et al. 2014. Perhaps that (ugly thing) is what it needs to fix R00/DUM. And that paper has an additional axiom for DUM, almost as ugly, that specifies distributivity. I'm very pleased to see that neither axiom uses any domain-dependent operations. So then I can stop worrying about <OR>, <NOT>.

<OR> is commutative, idempotent, associative and has identity element DUM. That is sufficient to define a semi-lattice. (I'll swerve around the nomenclature of whether it's lattice-meet or lattice-join and whether DUM is top or bottom. I think of that lattice as going 'sideways' relative to the <AND>, DEE lattice, so I call DUM 'lattice left'.) There must be a dual operation to <OR>, and I can define it without using negation; it's the solution to:

(x = (x <OR> z) & y = (y <OR> z)) <-> ((x <HUH> y) = ((x <HUH> y) <OR> z)).

You can prove <HUH> is uniquely defined, is commutative, idempotent, associative and has absorbing element DUM. I have no intuition how to describe it in setbuilder notation.

I consider the fact that there is more than one lattice to be quite nontrivial. I would be convinced otherwise if you exhibited a third lattice.

Um? In general, given a set of elements, all you need do to define a lattice is find a partial ordering over them. The 'partial' means not all elements are comparable pairwise. It doesn't necessarily have a top or a bottom; the meet/join don't have to do anything useful or sensible or even understandable (as with the dual to <OR>).

Given that relations have a rich structure with both the headings being a set and the tuples being a set, I can imagine a bunch of (probably useless) orderings. Let's try this as a starter:

  • x ≤ y if x's heading is a strict subset of y's (irrespective of content);
  • x ≤ y if x, y's headings are the same, and x's content ⊆ y's content;
  • otherwise x, y not comparable.
  • then DUM is the least element.
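That proposed ordering is easy to sketch directly (toy encoding assumed for illustration: relation = (heading, body), both frozensets, tuples as frozensets of (attribute, value) pairs):

```python
# Toy encoding (an assumption): relation = (heading, body), both frozensets.

def leq(x, y):
    """The proposed ordering: strict heading-subset first; for equal
    headings, body inclusion; otherwise incomparable."""
    (hx, bx), (hy, by) = x, y
    if hx < hy:       # x's heading a strict subset of y's, content ignored
        return True
    if hx == hy:      # same heading: compare contents
        return bx <= by
    return False      # headings overlap or diverge: not comparable
```

On this encoding DUM = (frozenset(), frozenset()) is indeed the least element: its empty heading is a strict subset of any non-empty heading, and for the empty heading its empty body is included in any body.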
Quote from dandl on July 6, 2019, 5:00 am
Quote from AntC on July 5, 2019, 11:30 am

...

Generalising Codd

Codd's original set of operators for the Relational Algebra is rather annoying: there are tiresome restrictions on the headings of operands, which make the operators partial and mean they lack nice algebraic properties.

In particular, UNION, INTERSECT, MINUS require their operands to have the same headings ('union-compatible') whereas TIMES requires its operands to have disjoint headings. As Appendix A points out, we can observe that TIMES is a special case (disjoint headings) of JOIN, and INTERSECT is also a special case (same headings); so we can dispense with both and generalise as JOIN, or <AND> as Appendix A uses.

...

Yes I've copied InnerIntersection from Tropashko (he calls it InnerJoin). His papers claim it's not associative, not distributive over <OR>, and in general has disappointing properties. I suspect he has his definitions wrong.

It's an interesting question, but I'm surprised it causes such difficulty. I've always been of the view that, despite Codd's restrictive definitions, all the set-like operations (UNION, INTERSECT, MINUS, etc) generalise simply:

  • project each argument onto the set of common attributes
  • apply Codd-like operation.

MINUS (with union-compatible operands) is neither commutative, idempotent, nor associative. It has neither left nor right identity. No amount of generalising will give it nice algebraic properties; and projecting each argument onto the set of common attributes won't improve the situation. So I'm happy to generalise to NOT MATCHING (whose result has the same heading as its left operand, and which has right identity DUM) and leave it at that.

The common-attributes approach works for UNION as InnerUnion. The surprise (to me) is that it gives a dual to JOIN aka <AND>. And we can combine it with the properties for DUM (as Hugh investigated), as the join-to-make-it-empty element, to get

x PROJECT_ON y =df x InnerUnion HEADING_OF(y); // i.e. project x on the attributes in common with y

x SAME_HEADING_AS y =df HEADING_OF(x) = HEADING_OF(y);

HEADING_OF(x) =df x JOIN DUM;

Then we can represent headings as relation values and start doing set-algebra on headings.
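On a toy encoding (assumed for illustration: relation = (heading, body), tuples as frozensets of (attribute, value) pairs), those definitions behave as described: x JOIN DUM yields the empty relation with x's heading, which then serves as a heading value.

```python
# Toy encoding (an assumption): relation = (heading, body).

DUM = (frozenset(), frozenset())

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(row.items()) for row in rows))

def project(r, attrs):
    heading, body = r
    keep = frozenset(attrs) & heading
    return (keep,
            frozenset(frozenset(p for p in t if p[0] in keep) for t in body))

def join(r1, r2):
    common = r1[0] & r2[0]
    body = frozenset(
        t1 | t2
        for t1 in r1[1] for t2 in r2[1]
        if frozenset(p for p in t1 if p[0] in common)
           == frozenset(p for p in t2 if p[0] in common))
    return (r1[0] | r2[0], body)

def inner_union(r1, r2):
    common = r1[0] & r2[0]
    return (common, project(r1, common)[1] | project(r2, common)[1])

def heading_of(x):
    """HEADING_OF(x) =df x JOIN DUM: the empty relation with x's heading."""
    return join(x, DUM)

def project_on(x, y):
    """x PROJECT_ON y =df x InnerUnion HEADING_OF(y)."""
    return inner_union(x, heading_of(y))

def same_heading_as(x, y):
    return heading_of(x) == heading_of(y)
```

Joining with DUM pairs every tuple of x against no tuples at all, so the body empties while the heading survives; projecting x on the attributes in common with y then falls out of InnerUnion with that heading value.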

That certainly looks associative, and I can see no way to construct a counter-example. Does that not suggest a method for a proof?

For INTERSECT projected onto common attributes, looks can be deceptive, it seems. And Tegiri has given a counter-example, so no wonder there's no proof. More of a concern is that Mace4 couldn't find a counter-example (perhaps I didn't run it for long enough, but it should only need a 6-element model; small beer by its lights).

Quote from Tegiri Nenashi on July 6, 2019, 12:04 am
Quote from AntC on July 5, 2019, 10:46 pm

Of course there are infinitely many operations closed over relations, and infinitely many lattices you could define over the same set of elements (assuming those are infinite). Codd's 'Relational Completeness' (or at least a better-motivated account of expressive completeness) is what puts us in a position to define any such operation using a small subset. We want the subset to have tractable algebraic properties, so that we can verify the definitions. So far Tropashko hasn't done that rigorously.

I consider the fact that there is more than one lattice to be quite nontrivial. I would be convinced otherwise if you exhibited a third lattice.

Ok here's a fourth lattice. (This one thanks to 3 glasses of 'Bouncing Czech' Pilsner from the Boundary Road Brewery, and my local team having just won the Super XV Southern Hemisphere Championship.)

  • x ≤ y if x's heading is strictly alphabetically prior to y's (irrespective of content, taking the attribute names sorted alphabetically ascending, taking attribute type names sorted alphabetically descending in case of needing a 'tie breaker' for attribute names);
  • x ≤ y if x, y's headings are the same, and x's content ⊆ y's content;
  • otherwise x, y not comparable.
  • then DUM is the least element.
Quote from Tegiri Nenashi on July 5, 2019, 5:36 pm

However, keep in mind that there is at least one more, different operation that produces a relation with the same intersection heading:

x <"and"> y = <NOT>((<NOT>x) v (<NOT>y))

The pair of operations -- v together with <"and"> -- forms the second lattice structure, while <OR> together with <and> does not.

Ok this <"and"> stands a better chance of being the dual to <OR>. I had hoped that with that double negation it would turn out to be domain-independent; but no. I think its setbuilder spec for relations r1[X, Y], r2[Y, Z] (in which X, Y, Z are possibly empty groups of attributes) is:

r1 <"and"> r2 =df {(Y) | (∀X (X, Y)∈r1) ∧ (∀Z (Y, Z)∈r2)}

Note that unlike most setbuilder specs, those are forall quantifiers inside the scope. IOW every possible X value in the domain of r1 must appear with each Y value in the relation; furthermore likewise for all Z values in r2.

I'd call this a useless operator: almost always it will return an empty relation, with heading the intersection of its operands' headings.
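The domain-dependence shows up directly if we brute-force the spec over an explicitly declared finite domain. This sketch is an assumption for illustration: single-attribute X, Y, Z groups, relations as sets of positional pairs, and the quantifier domains passed in as parameters (which is exactly the problem: the result changes when the declared domain changes).

```python
# Toy sketch (an assumption): r1 is a set of (x, y) pairs, r2 a set of
# (y, z) pairs; dom_x and dom_z are the declared domains for the
# universally quantified attributes.

def dd_and(r1_pairs, r2_pairs, dom_x, dom_z):
    """r1 <"and"> r2: keep y iff every x in dom_x pairs with y in r1
    and every z in dom_z pairs with y in r2."""
    candidates = ({y for (_, y) in r1_pairs}
                  | {y for (y, _) in r2_pairs})
    return {y for y in candidates
            if all((x, y) in r1_pairs for x in dom_x)
            and all((y, z) in r2_pairs for z in dom_z)}
```

With dom_x = {0, 1} the value 'a' survives below, but enlarging the declared domain to {0, 1, 2} (without touching the relations) empties the result: a domain-dependent, and in practice almost always empty, operation.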

I'm puzzled by all this.  Why should it be supposed that for the relational algebra to make sense, it must be single-sorted? It is certainly not your usual multi-sorted algebra with impermeable boundaries between types (analogous to parametric polymorphism), nor is it generalized multi-sorted algebra with a hierarchy of types (analogous to inclusion polymorphism).  Instead, there are complex relationships between sorts ("hyper-generalized multi-sorted algebra"?) with some operators constrained to classical multi-sortedness, and others with more complicated realizations.

I know that this just restates Codd's own definitions, but it does so in terms of a framework that is reasonably well understood.