The Forum for Discussion about The Third Manifesto and Related Matters


Generalising Codd's operators; and a quiz

Quote from dandl on July 8, 2019, 5:29 am
Quote from AntC on July 8, 2019, 3:53 am
Quote from dandl on July 6, 2019, 5:10 pm

Indeed. Informally, intersection before projection can remove tuples that would survive if projection goes first. Seems obvious given the counter-example.

I wonder if this has a real world application.

More formally we could say that associativity of InnerUnion is not valid because it would need push-project-through-join (or lift-project-outside-join), which is not valid where the projected-away attributes include those in common between the operands of the join. To express that we need a way to formulate 'projected-away attributes' as smoothly as 'projected-on attributes'.
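To make that concrete, here is a toy sketch in Python (my own illustration, not Tegiri's QBQL example), modelling relations as sets of frozen attribute-value pairs. Projecting away the shared attribute b before the join manufactures a tuple that join-then-project would never produce:

```python
def project(rel, attrs):
    """Keep only the named attributes of each tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r, s):
    """Natural join: merge tuples that agree on all common attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

R = {frozenset({('a', 1), ('b', 1)})}    # one tuple: a=1, b=1
S = {frozenset({('b', 2), ('c', 1)})}    # one tuple: b=2, c=1

# join first: the b values disagree, so the join (and its projection) is empty
join_then_project = project(natural_join(R, S), {'a', 'c'})
# project b away first: the join degenerates to a cross product
project_then_join = natural_join(project(R, {'a'}), project(S, {'c'}))

assert join_then_project == set()
assert project_then_join == {frozenset({('a', 1), ('c', 1)})}
```

The projected-away attribute b is exactly the attribute the operands have in common, which is the case where the rewrite fails.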

You said that very nicely. I wonder why you didn't see it that way before being provided with that counter example. No matter.

Actually, it does matter, because it demonstrates that it's really hard to reason informally about operations over relations. There's a great deal of ad-hoc/rule-of-thumb in (for example) query optimisation -- like the rules for push-project-through-join. Codd's algebra (or rather the pale imitation of it inside query engines that have to algebraitise SQL's nutty semantics) is not tractable to 'equational reasoning' like you'd use in FP.

I'm not saying that reasoning over lattice structures is easy. Although relational JOIN ≡ lattice meet gets you off to a flying start, it's a great deal less than intuitive that the lattice dual of meet (which most SQLs don't implement) could combine with DUM (which no SQLs implement) to achieve projection. So I was floundering around before seeing Tegiri's counter-example and proof of equivalence; failing to either formally prove what seemed the intuitive answer (which was wrong) or to disprove it.

Now Appendix A does that with its <REMOVE>, but smooth it isn't. There's a great deal of weasel wording and hand-waving (like the talk of the scare-quoted "macro" operator).

Then one real world application is: to get relational expressions to behave more like algebra, so that the programmer can reason about their code and chuck it at theorem provers to verify their intuitions, and so that optimisers can rearrange the code, confident they are not corrupting its semantics. In fact, that's how functional programmers tend to operate anyway.

A worthy goal. I don't find RA expressions hard to reason about, as long as the syntax is natural and flows nicely. But all I'm trying to do is find some reasonable path to get from point A to point B. It's way harder to refactor code, or choose the best query plan. For that you really need rock solid underlying theory. You need a set of provable rewrite rules. Is that your aim? Do you expect to achieve it?

My aim is not exactly that. Rewrite rules/refactoring for optimisation are going to depend chiefly on available indexes and relative cardinalities; and the indexing characteristics of some query engine. I don't see that's tractable to formal algebra. But what is tractable is to take the optimised and original unoptimised query expressions, and algebraically prove they're equivalent. In the sense: produce the same output for a given input.

I don't see we get "rock solid underlying theory" with Codd's operators: they're partial ('union compatible'); attribute-comma-lists are not first-class; theta-expressions in restrictions or theta-joins are not first-class; the components of rename/extend are not first-class; etc, etc. Appendix A has tackled many of those limitations -- especially with 'Treating Operators as Relations', but still attribute-comma-lists are not first-class and not tractable to reasoning, despite all the reasoning Appendix A is doing with them under the covers.

Quote from Tegiri Nenashi on July 6, 2019, 12:04 am
Quote from AntC on July 5, 2019, 10:46 pm
There are several other possible equations.

Are they equivalent? ...
Specifically, Tropashko fails to nail down the definition of R00 (aka Appendix A DUM). So any equations mentioning R00 are wobbly.

They are equivalent. Here is the proof:

1 ((x ^ R00) v (x ^ y)) v (R00 ^ y) = (x ^ y) v ((x v y) ^ R00).  [goal].
...

8 (R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y).  [assumption].

As you can see from step #8, for example, I used the Litak et al. axiom system.

I'm not sure I'd previously seen that 2014 paper. (I'd given up the 2009 paper as impenetrable.)

Other than that axiom #8 and its friend, and a few bits of the introduction, most of that 2014 paper seems to be written in a language I don't recognise.

For what reason other than sheer showing-off would you define the lattice structure using a single axiom?

As it turns out, those two axioms for R00 aka DUM are just as wobbly as Tropashko's. The system allows for a degenerate lattice with only one element (i.e. DEE = DUM). And Mace4 produces a counter-model showing DUM is not fixed.
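That degeneracy is easy to check by hand. A minimal sketch (mine, not Mace4 output) evaluating the axiom quoted at step #8 above over a one-element lattice, where DEE and DUM coincide:

```python
# one-element "lattice": the only value is e, so meet and join both return e
meet = join = lambda x, y: 'e'
R00 = DEE = 'e'   # in the degenerate model, DUM (R00) and DEE are the same element

def axiom8(x, y, z):
    """Litak-style axiom from step #8:
    (R00 ^ (x ^ (y v z))) v (y ^ z) = ((R00 ^ (x ^ y)) v z) ^ ((R00 ^ (x ^ z)) v y)."""
    lhs = join(meet(R00, meet(x, join(y, z))), meet(y, z))
    rhs = meet(join(meet(R00, meet(x, y)), z), join(meet(R00, meet(x, z)), y))
    return lhs == rhs

# every instantiation holds trivially, so the axioms don't force DUM != DEE
assert axiom8('e', 'e', 'e')
```

Since every term evaluates to the single element, the axiom cannot distinguish DUM from DEE, which is exactly the wobble described.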

So the Litak system helps prove some useful results, but it's not useful enough to fix an algebra over relations AFAICT.

This is getting quite turgid.

Quote from AntC on July 6, 2019, 10:35 am
  • x ≤ y if x's heading is strictly alphabetically prior to y's (irrespective of content, taking the attribute names sorted alphabetically ascending, taking attribute type names sorted alphabetically descending in case of needing a 'tie breaker' for attribute names);
  • x ≤ y if x, y's headings are the same, and x's content ⊆ y's content;
  • otherwise x, y not comparable.
  • then DUM is the least element.

An even sillier definition would work. Encode the relation heading together with the content as a number. That maps the set of relations into a subset of the numbers, with max and min as the lattice operations.
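Either contrived ordering is easy to realise. A sketch of the alphabetical one quoted above (my encoding, with the attribute-type tie-breaker omitted for brevity): relations are (sorted heading, content) pairs, and DUM comes out least:

```python
DUM = ((), frozenset())          # empty heading, no tuples
DEE = ((), frozenset({()}))      # empty heading, one empty tuple

def leq(x, y):
    """x <= y under the ordering sketched above."""
    hx, cx = x
    hy, cy = y
    if hx == hy:
        return cx <= cy          # same heading: compare content by inclusion
    return hx < hy               # otherwise: alphabetical order of headings

r = (('a',), frozenset({(('a', 1),)}))   # a sample relation with heading {a}

assert leq(DUM, DEE) and leq(DUM, r)     # DUM is the least element
assert not leq(DEE, DUM)
```

The point, as the post says, is that almost any artificial ordering can be bolted on; nothing about it captures what DUM means relationally.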

One reason why you didn't find the inner join associativity counterexample with Mace4 was probably an incomplete axiom system. In my experience, there is one more indispensable computer system for validating relational algebra statements -- a database engine. You populate a sample database with a representative set of relations, and run the counterexample search through the database engine by substituting relations into the statement variables.
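Outside a full database engine, the same brute-force idea can be sketched in a few lines of Python (my own illustration): enumerate every relation with heading {a, b} over a boolean domain, and test the push-project-through-join rewrite on the common attribute b:

```python
from itertools import combinations

def project(rel, attrs):
    """Keep only the named attributes of each tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def njoin(r, s):
    """Natural join over relations as sets of frozen attribute-value pairs."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# all 16 relations with heading {a, b} over the domain {0, 1}
tuples = [frozenset({('a', x), ('b', y)}) for x in (0, 1) for y in (0, 1)]
relations = [set(c) for n in range(5) for c in combinations(tuples, n)]

# search for pairs where projecting b away before the join changes the result
counterexamples = [
    (r, s) for r in relations for s in relations
    if project(njoin(r, s), {'a'}) != njoin(project(r, {'a'}), project(s, {'a'}))
]
assert counterexamples   # the rewrite fails on plenty of pairs
```

This is the substitute-relations-for-variables search described above, just run over an exhaustively enumerated toy universe instead of a populated database.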

Quote from Dave Voorhis on July 8, 2019, 6:41 am
Quote from dandl on July 8, 2019, 12:38 am
Quote from Dave Voorhis on July 7, 2019, 4:50 pm

As for how you would know about it... Take a look in the Rel manual.

Ah, yes, there isn't one. Indeed. I really need to do something about that.

At the time I announced the Rel release that included ATTRIBUTES_OF(...), I did have in mind writing a short article about it -- along with some other features that are otherwise undocumented except in the CHANGES.TXT file that accompanies each Rel release -- but the discussion subsequent to the announcement here quickly diverted into one about the seagull that appears on https://reldb.org. Thus distracted, I forgot.

Yes, I remember that, and yes, we've had this discussion before. Of course I would like a full manual triggered as context-sensitive help from within Rel, but I'd settle for a single-page cheat sheet. How long can that take?

The last time we had this discussion, I made one in -- literally -- a few minutes. See https://reldb.org/c/wp-content/uploads/2017/12/Rel-and-Tutorial-D-Quickstart.pdf

That's an incredibly useful document, especially for a newcomer, but it's a quick start, not a cheat sheet. It doesn't list all the built-in types, operators, etc., and it doesn't list the Rel extensions (which is what this thread is all about). That should take about one more page, and another few minutes.

Andl - A New Database Language - andl.org
Quote from vadim tropashko on July 8, 2019, 4:26 pm
Quote from AntC on July 6, 2019, 10:35 am

One reason why you didn't find the inner join associativity counterexample with Mace4 was probably an incomplete axiom system.

Thanks Vadim, and welcome to the forum.

Yes, it's only slowly dawning on me just how incomplete are the axioms. No wonder I've been struggling to get any tractable definitions of operators.

In my experience, there is one more indispensable computer system for validating relational algebra statements -- a database engine. You populate a sample database with a representative set of relations, and run the counterexample search through the database engine by substituting relations into the statement variables.

Yes, I recognised that's how Tegiri got his counter-example, using QBQL.

So all I need now is to populate the database with DEE and DUM (aka R01, R00); then get QBQL to tell me all the axiomatic properties sufficient to distinguish them and fix DUM ;-)

Quote from dandl on July 9, 2019, 4:25 am

That's an incredibly useful document, especially for a newcomer, but it's a quick start, not a cheat sheet. It doesn't list all the built in types, operators, etc and it doesn't list the Rel extensions (which is what this thread is all about). That should take about one more page, and another few minutes.

It took a few minutes because it was based on already-existing teaching materials. Listing Rel extensions, etc., will take considerably longer. I'll look into adding it at some point.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on July 8, 2019, 10:34 am

My aim is not exactly that. Rewrite rules/refactoring for optimisation are going to depend chiefly on available indexes and relative cardinalities; and the indexing characteristics of some query engine. I don't see that's tractable to formal algebra. But what is tractable is to take the optimised and original unoptimised query expressions, and algebraically prove they're equivalent. In the sense: produce the same output for a given input.

I don't see we get "rock solid underlying theory" with Codd's operators: they're partial ('union compatible'); attribute-comma-lists are not first-class; theta-expressions in restrictions or theta-joins are not first-class; the components of rename/extend are not first-class; etc, etc. Appendix A has tackled many of those limitations -- especially with 'Treating Operators as Relations', but still attribute-comma-lists are not first-class and not tractable to reasoning, despite all the reasoning Appendix A is doing with them under the covers.

Just so. Most of the payoff in query planning AFAIK comes from indexes and statistics, post compilation (and not just cardinalities). Modern SQL engines can also perform quite deep analysis of expressions to find opportunities to use indexes, e.g.

WHERE <date-col> >= <date-1> AND <date-col> < <date-2>

Andl can't do that, and I expect Rel can't either. But do you have an example of a transformation/rewrite/refactor that would become possible but only because of the proofs you seek?
