The Third Manifesto ForumDatabases: Database TheoryRelational Lattice: 2019 Report C …

Please or Register to create posts and topics.

Relational Lattice: 2019 Report Card

#1 · January 4, 2020, 2:48 am

Quote from AntC on January 4, 2020, 2:48 am
This thread is going to be several posts (in no particular order) as a status report/brain dump of my exploring Relational Lattices. I'll be putting the study aside while I go travelling for several weeks; but (spoiler alert) I don't feel very inclined to pick it up again. I think we have a quaint curiosity of little practical benefit. Which is to say the benefits derive from features that were already separately identified and understood before Vadim's Relational Lattices put them together:

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.

The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...

Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...

The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).

NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.

Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.

We take FOPL with equality = between relations, as I've already assumed above.

We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

Nothing I've said so far requires us to treat relation values as inhabiting a Lattice. If we take the domain of discourse to be all possible relation values obtainable from some heading H , defined as per TTM, full spec in a following post, they naturally form a Lattice algebraic structure using NatJOIN, InnerUNION as Lattice meet, join respectively -- that is providing we can get over the confusing terminological confusion in 'join'. To be able to draw from the substantial theory of Lattices we need just one more thing: I've drawn attention to the critical importance of DUM for the equivalences of operators; we need a way to distinguish the properties of DUM as a unique element within the set of Lattice elements.

Otherwise, all reasoning involving DUM (which is all reasoning) is off: we can neither prove nor disprove equivalences of expressions; IOW we can't support some query engine's optimisation of an expression into an equivalent but more efficient query plan -- except by a bunch of hand-wavy ad-hoc rules, which puts us in no better a position than for SQL. That's because the definitions/equivalences for all of the operators, see above, rely on DUM.

My line of thought started from trying to capture the properties of Functional Dependencies, as expressed as equivalences between expressions over whole-relations in the Lattice algebra, avoiding mention of attribute names. Note other Dependencies we know how to express in the algebra: JDs, INDs, EXDs (see above mention of subset comparisons); ah, except those also need to talk about projections on headings, using DUM. For FDs I was using an approach of Appendix A-style function-as-relations, specifying the relations' properties using the algebra ... using DUM. Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

This thread is going to be several posts (in no particular order) as a status report/brain dump of my exploring Relational Lattices. I'll be putting the study aside while I go travelling for several weeks; but (spoiler alert) I don't feel very inclined to pick it up again. I think we have a quaint curiosity of little practical benefit. Which is to say the benefits derive from features that were already separately identified and understood before Vadim's Relational Lattices put them together:

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.
The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...
Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...
The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).
NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.
Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.
We take FOPL with equality = between relations, as I've already assumed above.
We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

Nothing I've said so far requires us to treat relation values as inhabiting a Lattice. If we take the domain of discourse to be all possible relation values obtainable from some heading H , defined as per TTM, full spec in a following post, they naturally form a Lattice algebraic structure using NatJOIN, InnerUNION as Lattice meet, join respectively -- that is providing we can get over the confusing terminological confusion in 'join'. To be able to draw from the substantial theory of Lattices we need just one more thing: I've drawn attention to the critical importance of DUM for the equivalences of operators; we need a way to distinguish the properties of DUM as a unique element within the set of Lattice elements.

Otherwise, all reasoning involving DUM (which is all reasoning) is off: we can neither prove nor disprove equivalences of expressions; IOW we can't support some query engine's optimisation of an expression into an equivalent but more efficient query plan -- except by a bunch of hand-wavy ad-hoc rules, which puts us in no better a position than for SQL. That's because the definitions/equivalences for all of the operators, see above, rely on DUM.

My line of thought started from trying to capture the properties of Functional Dependencies, as expressed as equivalences between expressions over whole-relations in the Lattice algebra, avoiding mention of attribute names. Note other Dependencies we know how to express in the algebra: JDs, INDs, EXDs (see above mention of subset comparisons); ah, except those also need to talk about projections on headings, using DUM. For FDs I was using an approach of Appendix A-style function-as-relations, specifying the relations' properties using the algebra ... using DUM. Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

#2 · January 4, 2020, 6:31 am

Quote from AntC on January 4, 2020, 6:31 am
Nothing I've said so far requires us to treat relation values as inhabiting a Lattice. If we take the domain of discourse to be all possible relation values obtainable from some heading H , defined as per TTM, full spec in a following post, they naturally form a Lattice algebraic structure using NatJOIN, InnerUNION as Lattice meet, join respectively ...

I'll motivate this post with a question to Vadim. The easiest to understand and most 'complete' in a sense lattice structure is the powerset as per this wikipedia Hasse diagram of a powerset lattice structure . Relational Lattices contain sub-lattices that exhibit powerset structure in two orthogonal 'dimensions' so to speak: for a given heading, all possible relation values with that heading form a powerset of all tuples with that heading; for a given set of attributes, all possible headings form a powerset from the empty heading through all combinations of attributes up to the heading with all attributes.

Q1: is there a way to axiomatise that a certain sub-lattice must be a powerset? The Litak axioms go some of the way but not far enough. Specifically, one reason for the intractability I'm experiencing is I'm getting counter-examples that are structures that couldn't possibly be a Relational Lattice because they don't exhibit that powerset structure.

There's two equivalent ways to specify a Lattice structure: one is as a partially ordered set; but the more useful way for current purposes is as a Bounded lattice:

A bounded lattice is an algebraic structure of the form (L, ∨, ∧, 0, 1) such that (L, ∨, ∧) is a lattice, 0 (the lattice's bottom) is the identity element for the join operation ∨, and 1 (the lattice's top) is the identity element for the meet operation ∧.

Operation ∧ is Lattice meet, which we'll take as NatJOIN.

1 is the identity element for NatJOIN i.e. DUM aka Vadim's R01: the relation with empty heading (0) and maximum possible content for that heading (1).

Operation ∨ is Lattice join, the dual of NatJOIN i.e. InnerUNION.

0 is the identity element for InnerUNION, also the 'absorbing element' for NatJOIN, unnamed in TTM, I call it DUMPTY aka Vadim's R10: the relation with maximum possible heading (1) and empty content (0).

The rest of this post is specifying the relation values as elements within set L, but I'll just pause to consider that Lattice bottom value. Litak et al are wary of it [private correspondence] because they fear specifying "maximum possible heading" might be domain-dependent and therefore introduce indeterminacy or 'unsafe queries'. I'm going to specify L using TTM terminology, in such a way that I think avoids any suspicion. Other than that, this treatment pretty much follows Litak et al's [2015 section 2.1], with one important difference I'll come to.

Specifying Lattice Elements using TTM Terminology

Start with a heading H, defined as per RM Pre 9 and supporting definitions. So H is a set. This might be the heading of a relation, but it's more useful to think of it as the union of headings of all relvars in the database. Oh, unioned with the headings of any ad hoc/temp relations created (via EXTEND/RENAME or relation literals) by queries in the active domain. We require all these relvars/relations to be join-compatible, that is same attribute name is at same attribute type.

Let d be the degree of H.

Form the powerset of H, that gives 2^dheadings, including the empty heading. Amongst those headings are the headings for every relvar within the database; the heading for every NatJOIN of those relvars, and NatJOIN of those joins, up to their transitive closure; the heading for every PROJECT, REMOVE over those relvars and their joins; IOW the headings for every possible query over the database/active domain.

For each of those headings, form all possible tuples with that heading. I concede the "all possible tuples" is domain dependent -- in the same way that Appendix A's <OR>, <NOT> are domain dependent. But this domain dependence is not something I'm going to rely on. I'm not going to expect there be a (finite) relation with all possible tuples, or with all possible tuples for all possible attributes.

For each heading, form the powerset of all possible subsets of possible tuples, including the empty subset and the full set. So if there are t possible tuples, that's 2^t relation values.

All the relation values so formed give the elements of L.

Note that because I started with a TTM heading, I have a type for each attribute, and the set of values within that type. This is a point of difference compared with Litak at al, who assume the same set of values for all attributes.

[end of specification]

I'm mentioning these metrics along the way, because I have a more specific way of asking the question to Vadim:

Q2. Is there a way to axiomatise that the number of distinct headings (i.e. of empty relation values in L) be a power of 2? Is there a way to axiomatise that the number of relation values (elements of L) for a given heading must be a power of 2? The Litak et al axioms do impose a restriction that the subset of empty relations and of relations with the same heading be a distributive lattice, and powersets are distributive in that way, but that's not a sufficient restriction.

There's a further metric over the elements of L that exhibits a sort of 'glitch' or discontinuity, and that's very much bound up with empty relations, which brings us down to earth with a DUM bump. Consider within heading H, attribute A at type TA with cardinality cA, and attribute B at type TB with cardinality cB. There are 2^cA relation values with heading {A TA} and 2^cB relation values with heading {B TB}. How many relation values are there with heading {A TA, B TB}? We can generate them by forming the Cartesian product of each of the relation values with a single attribute, so that's 2^cA × 2^cB right?

No: for each single-attribute heading there's a value with empty content. The Cartesian product of that with all the non-empty relations of the other attribute still gives an empty result. So the correct count is ( (2^cA -1) × (2^cB -1) ) +1, in which the ( -1) takes the empty relation out of the product and the +1 adds back the empty relation with the full heading. Again, I'm drawing attention to this because the Litak et al axioms don't capture it; and yet it's a symptom of empty relations recognisable algebrically as r NatJOIN DUM = r.

We can extend that adjusted Cartesian product business to get a metric for the total number of elements in L; but then I'm of course heavily domain dependent.

I've tried capturing algebraically this 'glitch'/discontinuity wrt to empty relations. It's needed axiomatising the idea of a lattice cover.

A lattice element y is said to cover another element x, if y > x, but there does not exist a z such that y > z > x. Here, y > x means x ≤ y and x ≠ y.

Bringing in a disequality ≠ and negation and existential ¬(∃z ...) pretty much brings all theorem proving and counter-example searching to a grinding halt, as we depart the world of constructivist logic and therefore constructive proofs.

Nothing I've said so far requires us to treat relation values as inhabiting a Lattice. If we take the domain of discourse to be all possible relation values obtainable from some heading H , defined as per TTM, full spec in a following post, they naturally form a Lattice algebraic structure using NatJOIN, InnerUNION as Lattice meet, join respectively ...

I'll motivate this post with a question to Vadim. The easiest to understand and most 'complete' in a sense lattice structure is the powerset as per this wikipedia Hasse diagram of a powerset lattice structure . Relational Lattices contain sub-lattices that exhibit powerset structure in two orthogonal 'dimensions' so to speak: for a given heading, all possible relation values with that heading form a powerset of all tuples with that heading; for a given set of attributes, all possible headings form a powerset from the empty heading through all combinations of attributes up to the heading with all attributes.

Q1: is there a way to axiomatise that a certain sub-lattice must be a powerset? The Litak axioms go some of the way but not far enough. Specifically, one reason for the intractability I'm experiencing is I'm getting counter-examples that are structures that couldn't possibly be a Relational Lattice because they don't exhibit that powerset structure.

There's two equivalent ways to specify a Lattice structure: one is as a partially ordered set; but the more useful way for current purposes is as a Bounded lattice:

A bounded lattice is an algebraic structure of the form (L, ∨, ∧, 0, 1) such that (L, ∨, ∧) is a lattice, 0 (the lattice's bottom) is the identity element for the join operation ∨, and 1 (the lattice's top) is the identity element for the meet operation ∧.

Operation ∧ is Lattice meet, which we'll take as NatJOIN.
1 is the identity element for NatJOIN i.e. DUM aka Vadim's R01: the relation with empty heading (0) and maximum possible content for that heading (1).
Operation ∨ is Lattice join, the dual of NatJOIN i.e. InnerUNION.
0 is the identity element for InnerUNION, also the 'absorbing element' for NatJOIN, unnamed in TTM, I call it DUMPTY aka Vadim's R10: the relation with maximum possible heading (1) and empty content (0).

The rest of this post is specifying the relation values as elements within set L, but I'll just pause to consider that Lattice bottom value. Litak et al are wary of it [private correspondence] because they fear specifying "maximum possible heading" might be domain-dependent and therefore introduce indeterminacy or 'unsafe queries'. I'm going to specify L using TTM terminology, in such a way that I think avoids any suspicion. Other than that, this treatment pretty much follows Litak et al's [2015 section 2.1], with one important difference I'll come to.

Specifying Lattice Elements using TTM Terminology

Start with a heading H, defined as per RM Pre 9 and supporting definitions. So H is a set. This might be the heading of a relation, but it's more useful to think of it as the union of headings of all relvars in the database. Oh, unioned with the headings of any ad hoc/temp relations created (via EXTEND/RENAME or relation literals) by queries in the active domain. We require all these relvars/relations to be join-compatible, that is same attribute name is at same attribute type.
Let d be the degree of H.
Form the powerset of H, that gives 2^dheadings, including the empty heading. Amongst those headings are the headings for every relvar within the database; the heading for every NatJOIN of those relvars, and NatJOIN of those joins, up to their transitive closure; the heading for every PROJECT, REMOVE over those relvars and their joins; IOW the headings for every possible query over the database/active domain.
For each of those headings, form all possible tuples with that heading. I concede the "all possible tuples" is domain dependent -- in the same way that Appendix A's <OR>, <NOT> are domain dependent. But this domain dependence is not something I'm going to rely on. I'm not going to expect there be a (finite) relation with all possible tuples, or with all possible tuples for all possible attributes.
For each heading, form the powerset of all possible subsets of possible tuples, including the empty subset and the full set. So if there are t possible tuples, that's 2^t relation values.
All the relation values so formed give the elements of L.
Note that because I started with a TTM heading, I have a type for each attribute, and the set of values within that type. This is a point of difference compared with Litak at al, who assume the same set of values for all attributes.

[end of specification]

I'm mentioning these metrics along the way, because I have a more specific way of asking the question to Vadim:

Q2. Is there a way to axiomatise that the number of distinct headings (i.e. of empty relation values in L) be a power of 2? Is there a way to axiomatise that the number of relation values (elements of L) for a given heading must be a power of 2? The Litak et al axioms do impose a restriction that the subset of empty relations and of relations with the same heading be a distributive lattice, and powersets are distributive in that way, but that's not a sufficient restriction.

There's a further metric over the elements of L that exhibits a sort of 'glitch' or discontinuity, and that's very much bound up with empty relations, which brings us down to earth with a DUM bump. Consider within heading H, attribute A at type TA with cardinality cA, and attribute B at type TB with cardinality cB. There are 2^cA relation values with heading {A TA} and 2^cB relation values with heading {B TB}. How many relation values are there with heading {A TA, B TB}? We can generate them by forming the Cartesian product of each of the relation values with a single attribute, so that's 2^cA × 2^cB right?

No: for each single-attribute heading there's a value with empty content. The Cartesian product of that with all the non-empty relations of the other attribute still gives an empty result. So the correct count is ( (2^cA -1) × (2^cB -1) ) +1, in which the ( -1) takes the empty relation out of the product and the +1 adds back the empty relation with the full heading. Again, I'm drawing attention to this because the Litak et al axioms don't capture it; and yet it's a symptom of empty relations recognisable algebrically as r NatJOIN DUM = r.

We can extend that adjusted Cartesian product business to get a metric for the total number of elements in L; but then I'm of course heavily domain dependent.

I've tried capturing algebraically this 'glitch'/discontinuity wrt to empty relations. It's needed axiomatising the idea of a lattice cover.

A lattice element y is said to cover another element x, if y > x, but there does not exist a z such that y > z > x. Here, y > x means x ≤ y and x ≠ y.

Bringing in a disequality ≠ and negation and existential ¬(∃z ...) pretty much brings all theorem proving and counter-example searching to a grinding halt, as we depart the world of constructivist logic and therefore constructive proofs.

#3 · January 4, 2020, 10:28 am

Quote from AntC on January 4, 2020, 10:28 am
... Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

Warning: this post will be mostly algebra. Worse still, in Prover9 syntax.

How am I "defining sufficient properties to identify" some distinguished element of the relation values in L? Take the definition for DEE the identity for NatJOIN aka R10, the identity for lattice meet ^.

x ^ R10 = x. There's an implied ∀x around that, which Prover9 spells (all x (x ^ R10 = x)).

To prove that definition uniquely fixes R10, give the same definition for a differently named value, get the engine to prove those values are equal:
(all y (y ^ R10b = y)) -> R10 = R10b. proven within a tenth of a second.

If only the properties for DUM aka R00 were so easy to specify. We can take one of the equivalences that mention DUM from my first post

r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty;

So x ^ R00 = x <-> x v R00 = R00.? Not so fast. This holds: x ^ y = x <-> x v y = y. as a consequence of the definitions for ^, v being duals. So substituting R00 for y doesn't fix R00 whatsoever.

x ^ R00 ≠ x <-> x v R00 = R01. is that test for non-empty. At least we've already fixed R01; but we have a disequality on LHS of the equivalence, so this restricts R00 but doesn't fix it. (In effect, it's merely requiring that R01 be a cover for R00, see previous post for cover.)

Next I should I suppose apologise to Litak et al. Their two main axioms AxRH1, AxRH2 involve DUM/R00, which they call H just to add to the confusion, along with their using 'header' when they mean 'heading' grr. But only a mealy-mouthed apology because they fail to spell out the useful consequences of those axioms:

The sub-lattice of relations with the same heading is distributive. Actually the distributiveness is more powerful than that but still weaker than saying the sub-lattice forms a powerset.

The sub-lattice of empty relations is distributive. ditto.

The M₃ 'diamond' cannot appear as an embedded sub-lattice anywhere, even across elements with different headings or between empty and non-empty elements.

Given how much my earlier post relied on DUM/R00 when talking about headings and empty relations, you'd think that would easily put R00 in its place. But no, it doesn't even prove my equivalence above wrt non-empty relations. Their axioms fail to have the consequence that R00 is distinct from R01! Putting their axioms together with my non-empty doo-hickey, giving the same properties to a differently named value, and then asking if that fixes them equal gives this counter-example (here's the Hasse diagram, apologies for the ASCII art):
   R01
  /   \
R00   R00b
  \   /
    ?
    |
   R10    -- lattice bottom
However we interpret alleged-element ?, that can't be a structure generated for a Relational Lattice:

If ? denotes an empty relation as you'd think from it lying between R00, R10, it must have a heading strictly wider than R00 (empty heading) and strictly narrower than R01 (the empty relation with all possible attributes).

Say ? has heading {A TA} and R01 has heading {A TA, B TB} then there should be an empty relation with heading {B TB}. IOW the sub-lattice R00 - ? - R10 is distributive (trivially), but doesn't exhibit a powerset structure.

Whatever is the heading of R10, there should be a non-empty relation with the same heading. Call this missing relation R11, then it must be such that R11 ^ R00 = R10. There's no candidate for that. IOW there's no powerset sub-lattice structure of non-empty relations with R10's heading and orthogonal to the sub-lattice of empty relations. A (degenerate) explanation there is that type TB has an empty set of values, IOW is Omega.

OK here's my speculative axiom to at least prohibit 'dangling nodes' like ? - R10 above and give a sniff of powerset structure:

(x > z1 & z1 > y) -> exists z2 (x > z2 & z2 > y & z2 != z1)., in which
% x strictly greater y.
x > y <-> (x ^ y = y & x != y). -- taken from wikipedia on cover

Gloss: for each chain from x to y via z1, there must be at least one alternative chain from x to y via z2. All the ugly stuff with ≠ is needed to avoid trivially satisfying it with x = z1 and/or z2 = y, etc. All the ugly stuff with ≠ does of course drive the proof engine into paroxysms of undecidability. I'm amazed it doesn't run forever.

Adding this restriction does prevent the 5-element model above, but just generates an 8-element counter-example that is (plausibly) a set of Relational Lattice elements, but not sufficient to fix R00. Even if it did, I fear trying to do any inference from it about equivalences between operators will need at least as long as the heat-death of the universe.

I suppose I should add that all the counter-examples represent relation values where there is at most one value within each attribute type. I'd call that degenerate/trivial but not implausible. I could add more axioms to require more values; but that would be more ≠ equations, thus needing several heat-deaths to prove anything. I'm sounding like Deep Thought.

Oh, I did some experimentation with Vadim's FDA and FDA^-1. These give R00 in terms of R11, which is the (domain-dependent) relation with all possible attributes and all possible tuples for those attributes. Except that those axioms define R11 in terms of R00. So it doesn't fix nothing; indeed there's a consistent model with R00 = R01, R11 = R10. (That's even with the Litak et all axioms in play.)

... Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

Warning: this post will be mostly algebra. Worse still, in Prover9 syntax.

How am I "defining sufficient properties to identify" some distinguished element of the relation values in L? Take the definition for DEE the identity for NatJOIN aka R10, the identity for lattice meet ^.

x ^ R10 = x. There's an implied ∀x around that, which Prover9 spells (all x (x ^ R10 = x)).
To prove that definition uniquely fixes R10, give the same definition for a differently named value, get the engine to prove those values are equal:
(all y (y ^ R10b = y)) -> R10 = R10b. proven within a tenth of a second.

If only the properties for DUM aka R00 were so easy to specify. We can take one of the equivalences that mention DUM from my first post

r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty;
So x ^ R00 = x <-> x v R00 = R00.? Not so fast. This holds: x ^ y = x <-> x v y = y. as a consequence of the definitions for ^, v being duals. So substituting R00 for y doesn't fix R00 whatsoever.
x ^ R00 ≠ x <-> x v R00 = R01. is that test for non-empty. At least we've already fixed R01; but we have a disequality on LHS of the equivalence, so this restricts R00 but doesn't fix it. (In effect, it's merely requiring that R01 be a cover for R00, see previous post for cover.)

Next I should I suppose apologise to Litak et al. Their two main axioms AxRH1, AxRH2 involve DUM/R00, which they call H just to add to the confusion, along with their using 'header' when they mean 'heading' grr. But only a mealy-mouthed apology because they fail to spell out the useful consequences of those axioms:

The sub-lattice of relations with the same heading is distributive. Actually the distributiveness is more powerful than that but still weaker than saying the sub-lattice forms a powerset.
The sub-lattice of empty relations is distributive. ditto.
The M₃ 'diamond' cannot appear as an embedded sub-lattice anywhere, even across elements with different headings or between empty and non-empty elements.

Given how much my earlier post relied on DUM/R00 when talking about headings and empty relations, you'd think that would easily put R00 in its place. But no, it doesn't even prove my equivalence above wrt non-empty relations. Their axioms fail to have the consequence that R00 is distinct from R01! Putting their axioms together with my non-empty doo-hickey, giving the same properties to a differently named value, and then asking if that fixes them equal gives this counter-example (here's the Hasse diagram, apologies for the ASCII art):

   R01
  /   \
R00   R00b
  \   /
    ?
    |
   R10    -- lattice bottom

However we interpret alleged-element ?, that can't be a structure generated for a Relational Lattice:

If ? denotes an empty relation as you'd think from it lying between R00, R10, it must have a heading strictly wider than R00 (empty heading) and strictly narrower than R01 (the empty relation with all possible attributes).
Say ? has heading {A TA} and R01 has heading {A TA, B TB} then there should be an empty relation with heading {B TB}. IOW the sub-lattice R00 - ? - R10 is distributive (trivially), but doesn't exhibit a powerset structure.
Whatever is the heading of R10, there should be a non-empty relation with the same heading. Call this missing relation R11, then it must be such that R11 ^ R00 = R10. There's no candidate for that. IOW there's no powerset sub-lattice structure of non-empty relations with R10's heading and orthogonal to the sub-lattice of empty relations. A (degenerate) explanation there is that type TB has an empty set of values, IOW is Omega.

OK here's my speculative axiom to at least prohibit 'dangling nodes' like ? - R10 above and give a sniff of powerset structure:

(x > z1 & z1 > y) -> exists z2 (x > z2 & z2 > y & z2 != z1)., in which
% x strictly greater y.
x > y <-> (x ^ y = y & x != y). -- taken from wikipedia on cover

Gloss: for each chain from x to y via z1, there must be at least one alternative chain from x to y via z2. All the ugly stuff with ≠ is needed to avoid trivially satisfying it with x = z1 and/or z2 = y, etc. All the ugly stuff with ≠ does of course drive the proof engine into paroxysms of undecidability. I'm amazed it doesn't run forever.

Adding this restriction does prevent the 5-element model above, but just generates an 8-element counter-example that is (plausibly) a set of Relational Lattice elements, but not sufficient to fix R00. Even if it did, I fear trying to do any inference from it about equivalences between operators will need at least as long as the heat-death of the universe.

I suppose I should add that all the counter-examples represent relation values where there is at most one value within each attribute type. I'd call that degenerate/trivial but not implausible. I could add more axioms to require more values; but that would be more ≠ equations, thus needing several heat-deaths to prove anything. I'm sounding like Deep Thought.

Oh, I did some experimentation with Vadim's FDA and FDA^-1. These give R00 in terms of R11, which is the (domain-dependent) relation with all possible attributes and all possible tuples for those attributes. Except that those axioms define R11 in terms of R00. So it doesn't fix nothing; indeed there's a consistent model with R00 = R01, R11 = R10. (That's even with the Litak et all axioms in play.)

#4 · January 5, 2020, 12:04 am

Quote from AntC on January 4, 2020, 6:31 am

Q2. Is there a way to axiomatise that the number of distinct headings (i.e. of empty relation values in L) be a power of 2? Is there a way to axiomatise that the number of relation values (elements of L) for a given heading must be a power of 2? The Litak et al axioms do impose a restriction that the subset of empty relations and of relations with the same heading be a distributive lattice, and powersets are distributive in that way, but that's not a sufficient restriction.

Leveraging Stone's representation theorem for Boolean algebras, all we need to do is to assert that the subset of all empty relations is Boolean Algebra. That is, take standard axiomatization of Boolean Algebra while substituting axiom's free variables with R00^x, R00^y, and R00^z . This can't be accomplished with the two lattice operations and constants alone, but can be done with the help of the unary attribute complement.

My interest however, lies outside of your requirement that the set of headers must be a full powerset. For that matter, I don't agree that relation tuples should necessarily form a powerset as well. To reiterate, I consider the lattice at Figure 1 of Relational Lattice Axioms as being fully legitimate.

Why deemphasizing Boolean algebra/Power set structure on attributes & tuples? In section 2.3 of the Litak&Mikulas&Hidders paper the authors proposed a generic recipe how to manufacture relational lattices out of other mathematical structures. Followed their idea I pondered what might be the relational lattice where attributes are organized into the lattice of all linear subspaces. Tadeusz, however, was not very impressed by "...the fact that a class of mathematical objects [relational lattices] can be found in hindsight in various mathematical disguises...".

#5 · January 5, 2020, 7:38 am

Quote from dandl on January 5, 2020, 7:38 am

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.

The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...

Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...

The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

That's hand-wavy. I don't think you can do that in the same style as the above.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).

NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.

Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.

We take FOPL with equality = between relations, as I've already assumed above.

We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.

The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...

Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...

The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

That's hand-wavy. I don't think you can do that in the same style as the above.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).

NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.

Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.

We take FOPL with equality = between relations, as I've already assumed above.

We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

Andl - A New Database Language - andl.org

#6 · January 5, 2020, 12:06 pm

Quote from AntC on January 5, 2020, 12:06 pm

Quote from Vadim on January 5, 2020, 12:04 am

Quote from AntC on January 4, 2020, 6:31 am

Q2. Is there a way to axiomatise that the number of distinct headings (i.e. of empty relation values in L) be a power of 2? Is there a way to axiomatise that the number of relation values (elements of L) for a given heading must be a power of 2? The Litak et al axioms do impose a restriction that the subset of empty relations and of relations with the same heading be a distributive lattice, and powersets are distributive in that way, but that's not a sufficient restriction.

Leveraging Stone's representation theorem for Boolean algebras, all we need to do is to assert that the subset of all empty relations is Boolean Algebra. That is, take standard axiomatization of Boolean Algebra while substituting axiom's free variables with R00^x, R00^y, and R00^z . This can't be accomplished with the two lattice operations and constants alone, but can be done with the help of the unary attribute complement.

Thanks yes, I've discovered that the official term for what I called 'powerset' is Boolean Algebra. A Boolean Algebra is a 'relatively complemented' lattice aot, as it shows here. So no we don't need a unary complement; a relative complement will do. And in fact the Litak axioms provide enough (nearly). Here's a couple of theorems proven from Litak et al axioms
% interval [x ^ R00, x] (unique relative complement) proven from Litak.
x ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = x
& y ^ z2 = (x ^ R00) & y v z2 = x
-> z1 = z2.

% heading subinterval [x ^ R00, R00] unique relative complement.
% proven from Litak.
R00 ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = R00
& y ^ z2 = (x ^ R00) & y v z2 = R00
-> z1 = z2.
But there's something else needed to be a Boolean Algebra, and the Litak axioms don't provide it. It's one of the explanations for why I was getting counter-examples that were 'impossible' Relational Lattices: As well as the relative complement being unique, it must exist. (There's no possible complement for the ? node in the R00- ? - R10 sub-lattice in my earlier diagram.) So I fixed that with a couple of ugly axioms.
% interval [x ^ R00, x] must have complements.
x ^ y = y & y ^ (x ^ R00) = (x ^ R00)
-> exists z (y ^ z = (x ^ R00) & y v z = x).

% interval [x ^ R00, R00] must have complements.
R00 ^ y = y & y ^ (x ^ R00) = (x ^ R00)
-> exists z (y ^ z = (x ^ R00) & y v z = R00).
In fact a lattice-global complement (unary or otherwise) is just wrong: it's only specific sub-intervals that exhibit well-behaved complements -- i.e. existing and unique.

My interest however, lies outside of your requirement that the set of headers must be a full powerset. For that matter, I don't agree that relation tuples should necessarily form a powerset as well. To reiterate, I consider the lattice at Figure 1 of Relational Lattice Axioms as being fully legitimate.

Then you're disagreeing with Litak et al. And I certainly have no interest in pursuing that line of thought, as we've discussed before.

Why deemphasizing Boolean algebra/Power set structure on attributes & tuples? In section 2.3 of the Litak&Mikulas&Hidders paper the authors proposed a generic recipe how to manufacture relational lattices out of other mathematical structures. Followed their idea I pondered what might be the relational lattice where attributes are organized into the lattice of all linear subspaces. Tadeusz, however, was not very impressed by "...the fact that a class of mathematical objects [relational lattices] can be found in hindsight in various mathematical disguises...".

I'm not interested in mathematical disguises. Most of the Litak et al papers seem to be written in a foreign language. They don't need any more disguising to be inscrutable. I'm interested in being able to reason about queries. If you're going to throw out the Boolean Algebra, that'll make it even less tractable than I'm finding already. So no, thanks.

Quote from Vadim on January 5, 2020, 12:04 am

Quote from AntC on January 4, 2020, 6:31 am

Q2. Is there a way to axiomatise that the number of distinct headings (i.e. of empty relation values in L) be a power of 2? Is there a way to axiomatise that the number of relation values (elements of L) for a given heading must be a power of 2? The Litak et al axioms do impose a restriction that the subset of empty relations and of relations with the same heading be a distributive lattice, and powersets are distributive in that way, but that's not a sufficient restriction.

Leveraging Stone's representation theorem for Boolean algebras, all we need to do is to assert that the subset of all empty relations is Boolean Algebra. That is, take standard axiomatization of Boolean Algebra while substituting axiom's free variables with R00^x, R00^y, and R00^z . This can't be accomplished with the two lattice operations and constants alone, but can be done with the help of the unary attribute complement.

Thanks yes, I've discovered that the official term for what I called 'powerset' is Boolean Algebra. A Boolean Algebra is a 'relatively complemented' lattice aot, as it shows here. So no we don't need a unary complement; a relative complement will do. And in fact the Litak axioms provide enough (nearly). Here's a couple of theorems proven from Litak et al axioms

% interval [x ^ R00, x] (unique relative complement) proven from Litak.
x ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = x
& y ^ z2 = (x ^ R00) & y v z2 = x
-> z1 = z2.

% heading subinterval [x ^ R00, R00] unique relative complement.
% proven from Litak.
R00 ^ y = y & y ^ (x ^ R00) = (x ^ R00)
& y ^ z1 = (x ^ R00) & y v z1 = R00
& y ^ z2 = (x ^ R00) & y v z2 = R00
-> z1 = z2.

But there's something else needed to be a Boolean Algebra, and the Litak axioms don't provide it. It's one of the explanations for why I was getting counter-examples that were 'impossible' Relational Lattices: As well as the relative complement being unique, it must exist. (There's no possible complement for the ? node in the R00- ? - R10 sub-lattice in my earlier diagram.) So I fixed that with a couple of ugly axioms.

% interval [x ^ R00, x] must have complements.
x ^ y = y & y ^ (x ^ R00) = (x ^ R00)
-> exists z (y ^ z = (x ^ R00) & y v z = x).

% interval [x ^ R00, R00] must have complements.
R00 ^ y = y & y ^ (x ^ R00) = (x ^ R00)
-> exists z (y ^ z = (x ^ R00) & y v z = R00).

In fact a lattice-global complement (unary or otherwise) is just wrong: it's only specific sub-intervals that exhibit well-behaved complements -- i.e. existing and unique.

My interest however, lies outside of your requirement that the set of headers must be a full powerset. For that matter, I don't agree that relation tuples should necessarily form a powerset as well. To reiterate, I consider the lattice at Figure 1 of Relational Lattice Axioms as being fully legitimate.

Then you're disagreeing with Litak et al. And I certainly have no interest in pursuing that line of thought, as we've discussed before.

Why deemphasizing Boolean algebra/Power set structure on attributes & tuples? In section 2.3 of the Litak&Mikulas&Hidders paper the authors proposed a generic recipe how to manufacture relational lattices out of other mathematical structures. Followed their idea I pondered what might be the relational lattice where attributes are organized into the lattice of all linear subspaces. Tadeusz, however, was not very impressed by "...the fact that a class of mathematical objects [relational lattices] can be found in hindsight in various mathematical disguises...".

I'm not interested in mathematical disguises. Most of the Litak et al papers seem to be written in a foreign language. They don't need any more disguising to be inscrutable. I'm interested in being able to reason about queries. If you're going to throw out the Boolean Algebra, that'll make it even less tractable than I'm finding already. So no, thanks.

#7 · January 5, 2020, 12:24 pm

Quote from AntC on January 5, 2020, 12:24 pm

Quote from dandl on January 5, 2020, 7:38 am

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.

The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...

Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...

The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).

NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.

Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.

We take FOPL with equality = between relations, as I've already assumed above.

We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

Quote from dandl on January 5, 2020, 7:38 am

The importance of Natural JOIN/Appendix A <AND> as a powerful operation amongst relations. NatJOIN is closed over relations, whereas Codd's Cartesian product and his 'union-compatible' operators are partial functions.

The natural dual of that, Inner UNION, aka ISBL UNION aka SQL UNION CORRESPONDING as a closed-over-relations counterpart of Codd's UNION, avoiding being restricted to only 'union-compatible' operands. I think this is just as valid a generalisation as Appendix A's <OR>; has the merit of being domain-independent; and captures the effect of projecting a relation on the heading of another, which leads on to ...

Treating relation values as 'standing for' their headings, thereby avoiding introducing into the algebra a need for attribute names or sets thereof as an additional syntactic or semantic construct. Critical to getting this to work is ...

The importance of a distinguished relation value denoting the relation with an empty heading and no tuples [RM Pro 5], that is DUM aka Tropashko's R00. Then relation r NatJOIN DUM gives a relation same heading as r but empty; r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty; (r InnerUNION DUM) NatJOIN DUM = r NatJOIN DUM is a test for r having an empty heading ; r NatJOIN DUM = s NatJOIN DUM is a test for r, s having the same heading; r InnerUNION (s NatJOIN DUM) gives r projected on the attributes in common with s, but ignoring s's content.

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

I've put that last bullet at some length to demonstrate the sort of algebraic and equational reasoning about programs/expressions that I was hoping for. Contrast that the reasoning and equivalence of operators in Codd's Algebra (or sets of operators derived from it as used within SQL engines) is much more ad hoc and informal; and hung around with many hand-wavy side conditions like: these two expressions are equivalent providing r, s have no attributes in common. We can express that condition in the Lattice algebra: (r InnerUNION s) NatJOIN DUM = DUM; we can't express it using 'traditional' RA operators.

These operators also have nice properties for heading-inference over expressions (i.e. type inference); whereas Appendix A or Tutorial D operators using (sets of) attribute names really have to be limited to name literals: reasoning over name variables (if it were allowed) would be intractable.

Then if the only entities in the domain of discourse of the algebra is relations (not sets of attributes, not attribute names nor values); and the semantics for NatJOIN/InnerUNION have built-in tests for same attribute names and same attribute values; we can treat other scalar operators as relations in the style of Appendix A PLUS, and expressions in the algebra need only talk about whole-relations.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

Then we can use FOPL to express the logic of our relational expressions, and reason about their properties in FOPL, and use theorem provers to check our reasoning:

The FOPL quantifies over relations (and only those).

NatJOIN, InnerUNION, NOTMATCHING, PROJECT, REMOVE, etc are operations/functions over relations returning relations.

Equivalences between operators we can express as FOPL equivalences over terms invoking those operators: r NOTMATCHING (r NOTMATCHING s) = r MATCHING s, r COMPOSE s = (r NatJOIN s) REMOVE (r InnerUNION s), r REMOVE (r REMOVE s) = r PROJECT s are equivalences give in Appendix A but needing a great deal of handwaving and scare-quotes.

We take FOPL with equality = between relations, as I've already assumed above.

We express subset comparisons between relations using the relational operators: r ⊆ s ≡ (r NatJOIN DUM = s NatJOIN DUM) & r NatJOIN s = r.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

#8 · January 6, 2020, 7:24 am

Quote from dandl on January 6, 2020, 7:24 am

Quote from AntC on January 5, 2020, 12:24 pm

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

I was trying not to go too far down the lattice rabbit hole, and I missed that you planned to import the whole TTM type system via RM Pre 9 and 10. You made a claim about FOPL, but selectors in TTM have unlimited computational power.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

Selectors in TTM take the form of functions, not sets. Selectors can certainly produce duplicate values.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

AppA shows how to put a function into the form of a relation value so that it can be queried. It's still a function, not an RV. For example, there is no guarantee it can be inverted.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

Creating new RVs within the TTM type system is trivial, as is EXTEND. My interest is in separating out the closed-over-relations query language from any kind of type system describing attributes.

Quote from AntC on January 5, 2020, 12:24 pm

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

I was trying not to go too far down the lattice rabbit hole, and I missed that you planned to import the whole TTM type system via RM Pre 9 and 10. You made a claim about FOPL, but selectors in TTM have unlimited computational power.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

Selectors in TTM take the form of functions, not sets. Selectors can certainly produce duplicate values.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

AppA shows how to put a function into the form of a relation value so that it can be queried. It's still a function, not an RV. For example, there is no guarantee it can be inverted.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

Creating new RVs within the TTM type system is trivial, as is EXTEND. My interest is in separating out the closed-over-relations query language from any kind of type system describing attributes.

Andl - A New Database Language - andl.org

#9 · January 6, 2020, 8:36 am

Quote from AntC on January 6, 2020, 8:36 am
A Proof of Uniqueness For R00 (modelling DUM) in the Relational Lattice Algebra

Yeah nah yeah. But really this is a demonstration of how it makes the whole algebra intractable ...

Quote from AntC on January 4, 2020, 10:28 am

... Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

Warning: this post will be mostly algebra. Worse still, in Prover9 syntax.

How am I "defining sufficient properties to identify" some distinguished element of the relation values in L? Take the definition for DEE the identity for NatJOIN aka R10, the identity for lattice meet ^.

...

... We can take one of the equivalences that mention DUM from my first post

r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty;

...

x ^ R00 ≠ x <-> x v R00 = R01. is that test for non-empty. At least we've already fixed R01; but we have a disequality on LHS of the equivalence, so this restricts R00 but doesn't fix it. (In effect, it's merely requiring that R01 be a cover for R00, see previous post for cover.)

Also:

x ^ R00 != R00 % that is, exclude R00, R01.
-> exists y1 exists y2 (R01 > y1 & y1 > y2 & y2 > (x ^ R00)).
In which x > y <-> (x ^ y = y & x != y).

These two amount to:

R00 (modelling DUM) is adjacent to R01 (modelling DEE) in the lattice ordering. IOW there is no z distinct from R00, R01 such that z NatJOIN R00 = R00.

For any other relation x, its heading (x NatJOIN R00) must be at least two distinct steps in the ordering y1, y2 distant from R01.
IOW for each non-empty heading, there must be at least three possible relation values; of which one is its empty relation; that is, at least two values in its attribute type; and the 'Boolean Algebra' powerset axioms mean in practice at least four relations, being 2².

The axioms include the Litak et al axioms; and my extras from a previous post to ensure every boolean H-fibre as Litak "may call" it [2015 paper Section 5] is not only relative-complemented but a relative complement exists for every element.

That's a total of 5 'clunky' axioms including the definition for >, which gets used multiple times. By clunky I mean not equational, not quasi-equational, but with negations and existential quant.

Prover9 proof output attached. Edit: hmm WordPress doesn't seem to like attachments; apply to the author. If you see a .pdf it's really .txt.

With those axioms the proof system is just lost. I can't for example prove the expected properties for NOTMATCHING defined in terms of NatJOIN, InnerUNION, R00

(x \ y) ^ (x ^ y) = (x ^ y) ^ R00. % \ is NOT MATCHING
(x \ y) v (x ^ y) = x.
(x \ (x \ y)) = (x ^ y) v (x ^ R00). [goal] % not proven after several hours trying

There's no counter-example for that goal (up to domain size 10, I've also sampled size 18, 22). Maybe a proof or a counter-example would appear after several days or weeks. Those axioms for \ are equational, so don't make the undecidability any worse than the proof for R00 uniqueness.

The (in effect) lower limit on minimum two distinct values per attribute type is reasonable for practical database applications; but it's galling to limit the theory this way. Furthermore having obtained the uniqueness proof you can't just throw away that axiom for other work: without it, R00 is not fixed (there are many valid solutions for R00); therefore any definition using it would have to hold for all of its possible solutions -- which of course defeats the purpose of being a "definition".

A Proof of Uniqueness For R00 (modelling DUM) in the Relational Lattice Algebra

Yeah nah yeah. But really this is a demonstration of how it makes the whole algebra intractable ...

Quote from AntC on January 4, 2020, 10:28 am

... Reasoning was intractable because, as it turned out, defining sufficient properties to identify DUM is intractable. An explanation for why is for another post.

Warning: this post will be mostly algebra. Worse still, in Prover9 syntax.

How am I "defining sufficient properties to identify" some distinguished element of the relation values in L? Take the definition for DEE the identity for NatJOIN aka R10, the identity for lattice meet ^.

...

... We can take one of the equivalences that mention DUM from my first post

r NatJOIN DUM = r is a test for r being empty; r InnerUNION DUM = DUM is another test; r InnerUNION DUM = DEE is a test for r being non-empty;

...

x ^ R00 ≠ x <-> x v R00 = R01. is that test for non-empty. At least we've already fixed R01; but we have a disequality on LHS of the equivalence, so this restricts R00 but doesn't fix it. (In effect, it's merely requiring that R01 be a cover for R00, see previous post for cover.)

Also:

x ^ R00 != R00 % that is, exclude R00, R01.
-> exists y1 exists y2 (R01 > y1 & y1 > y2 & y2 > (x ^ R00)).
In which x > y <-> (x ^ y = y & x != y).

These two amount to:

R00 (modelling DUM) is adjacent to R01 (modelling DEE) in the lattice ordering. IOW there is no z distinct from R00, R01 such that z NatJOIN R00 = R00.
For any other relation x, its heading (x NatJOIN R00) must be at least two distinct steps in the ordering y1, y2 distant from R01.
IOW for each non-empty heading, there must be at least three possible relation values; of which one is its empty relation; that is, at least two values in its attribute type; and the 'Boolean Algebra' powerset axioms mean in practice at least four relations, being 2².
The axioms include the Litak et al axioms; and my extras from a previous post to ensure every boolean H-fibre as Litak "may call" it [2015 paper Section 5] is not only relative-complemented but a relative complement exists for every element.
That's a total of 5 'clunky' axioms including the definition for >, which gets used multiple times. By clunky I mean not equational, not quasi-equational, but with negations and existential quant.

Prover9 proof output attached. Edit: hmm WordPress doesn't seem to like attachments; apply to the author. If you see a .pdf it's really .txt.

With those axioms the proof system is just lost. I can't for example prove the expected properties for NOTMATCHING defined in terms of NatJOIN, InnerUNION, R00

(x \ y) ^ (x ^ y) = (x ^ y) ^ R00. % \ is NOT MATCHING
(x \ y) v (x ^ y) = x.
(x \ (x \ y)) = (x ^ y) v (x ^ R00). [goal] % not proven after several hours trying

There's no counter-example for that goal (up to domain size 10, I've also sampled size 18, 22). Maybe a proof or a counter-example would appear after several days or weeks. Those axioms for \ are equational, so don't make the undecidability any worse than the proof for R00 uniqueness.

The (in effect) lower limit on minimum two distinct values per attribute type is reasonable for practical database applications; but it's galling to limit the theory this way. Furthermore having obtained the uniqueness proof you can't just throw away that axiom for other work: without it, R00 is not fixed (there are many valid solutions for R00); therefore any definition using it would have to hold for all of its possible solutions -- which of course defeats the purpose of being a "definition".

Uploaded files:

RelQComp-R00unique_proof.pdf

#10 · January 6, 2020, 9:02 am

Quote from AntC on January 6, 2020, 9:02 am
David I have no idea what you're trying to say here. "selectors in TTM have unlimited computational power" what? They're deliberately called "selectors" because they invoke no computational power, as opposed to OOP 'constructors'. I have a strong feeling that if I made any effort to understand your post, it would all be wrong or all be irrelevant to Lattices or both. So I won't bother.

"Selectors can certainly produce duplicate values." what? TTM's 'sets all the way down' are specified on the values produced by Selectors, not the internals of Selectors' operands or how multi-PossReps map to scalar values. If two textually distinct Selector invocations produce the same scalar value, that's one attribute value to appear in a tuple in a relation value; no other tuple can have that same attribute value, unless it's distinct by having a distinct attribute value in some other-named attribute. What's so hard?

Quote from dandl on January 6, 2020, 7:24 am

Quote from AntC on January 5, 2020, 12:24 pm

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

I was trying not to go too far down the lattice rabbit hole, and I missed that you planned to import the whole TTM type system via RM Pre 9 and 10. You made a claim about FOPL, but selectors in TTM have unlimited computational power.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

Selectors in TTM take the form of functions, not sets. Selectors can certainly produce duplicate values.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

AppA shows how to put a function into the form of a relation value so that it can be queried. It's still a function, not an RV. For example, there is no guarantee it can be inverted.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

Creating new RVs within the TTM type system is trivial, as is EXTEND. My interest is in separating out the closed-over-relations query language from any kind of type system describing attributes.

David I have no idea what you're trying to say here. "selectors in TTM have unlimited computational power" what? They're deliberately called "selectors" because they invoke no computational power, as opposed to OOP 'constructors'. I have a strong feeling that if I made any effort to understand your post, it would all be wrong or all be irrelevant to Lattices or both. So I won't bother.

"Selectors can certainly produce duplicate values." what? TTM's 'sets all the way down' are specified on the values produced by Selectors, not the internals of Selectors' operands or how multi-PossReps map to scalar values. If two textually distinct Selector invocations produce the same scalar value, that's one attribute value to appear in a tuple in a relation value; no other tuple can have that same attribute value, unless it's distinct by having a distinct attribute value in some other-named attribute. What's so hard?

Quote from dandl on January 6, 2020, 7:24 am

Quote from AntC on January 5, 2020, 12:24 pm

I like these a lot, except for one problem. Where do relation values come from? How do you express the idea that attribute names form a set? How do you express relation value equality, so as to avoid duplicates?

Not sure if you wrote this after seeing my second post. 'Specifying Lattice Elements using TTM Terminology'. RM Pre 9 and 10, and supporting definitions. Also Appendix A setbuilder for the relational operators avoids generating duplicates in query results.

I was trying not to go too far down the lattice rabbit hole, and I missed that you planned to import the whole TTM type system via RM Pre 9 and 10. You made a claim about FOPL, but selectors in TTM have unlimited computational power.

The natural numbers can be generated from 0 and +1. Do you have something like that in mind? Please show how.

Again that section in the second post. It's set operations all the way down. (Scalar types specify sets of values, even if they're potentially infinite.)

Selectors in TTM take the form of functions, not sets. Selectors can certainly produce duplicate values.

That's hand-wavy. I don't think you can do that in the same style as the above.

As Hugh has said, PLUS is just a relation value. Constructed as per RM Pre 9, 10.

AppA shows how to put a function into the form of a relation value so that it can be queried. It's still a function, not an RV. For example, there is no guarantee it can be inverted.

As I've said before, I don't think you can create new relation values within FOPL. You would have to show how, in some detail.

A EXTEND boils down to a NatJOIN, as per Appendix A; probably joining to a PLUS-style scalar operator as relation constant. The FOPL axiomatisation I'm talking about here just quantifies over whole-relations. Ideally we want to express in FOPL the Functional Dependencies that come with PLUS; but that's beyond intractable.

As I've said before, you seem to be the only person who doesn't see how "you can create new relation values".

Creating new RVs within the TTM type system is trivial, as is EXTEND. My interest is in separating out the closed-over-relations query language from any kind of type system describing attributes.

TTM Forum

The Forum for Discussion about The Third Manifesto and Related Matters

Relational Lattice: 2019 Report Card