The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

ANNOUNCE: a relational algebra with only one primitive operator (tentative) (please check/critique)

PreviousPage 3 of 12Next
Quote from Vadim on May 14, 2021, 4:49 pm
Quote from AntC on May 14, 2021, 12:36 am
Quote from Vadim on May 13, 2021, 4:27 pm

The other [rather artificial] example is the database with all the empty relations.

No that's not a valid lattice structure to model relations: we must always have DEE, no matter how many empty relations there are. We can always say the sub-lattice from DUM down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can define DUM.

In the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.

Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)

In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.

Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce RENAME to other basic operations -- the other operations still need RENAME or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.

Page 11 here: https://books.google.com/books?id=siNlAn8Bm30C&pg=PA1&source=gbs_toc_r&cad=3#v=onepage&q&f=false

That's a book on Category Theory. It appears not to be written in English. I'm not claiming to be producing a Category-theoretical approach.

wrt the point at hand, I'll cite your own definition for NotMatching (generalised difference, you used \ operator). That has two equations, neither of which have bare (x \ y) on one side of the equation:

  • (x ^ y) v (x \ y) = x.
  • (x ^ y) ^ (x \ y) = (x ^ y) ^ DUM.

In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.

 

 

Then, it is terminology confusion, and you need to clarify what do you mean by "primitive operator". Analogies will get you only so far.

I'm demonstrating what I'm doing as I go along. So far as I can tell, the only person actually looking at the equations is me. I take NatJoin as primitive in the sense I stipulate its properties (Commutative, Idempotent, Associative), but I don't specify anything further about its behaviour or result.

Quote from Vadim on May 14, 2021, 6:38 pm
Quote from Erwin on May 14, 2021, 5:36 pm

DUM === {} , DEE === {{}}

DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.

Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.

A relation is a pair, not a set. Therefore, you must have meant

DUM \leftrightarrow \langle \{\}, \{\}\rangle

DEE \leftrightarrow \langle \{\}, \{ \langle\rangle \}\rangle

Then, even if you use Kuratovski pair for binary tuple, you still have to define what is a zilliary tuple. (BTW, Kuratovski pair has virtually no utility in mathematics). Also, in contemporary mathematics you never compare the two sets verbatim. Most often they are considered to be the "same" up to some equivalence. This idea is manifested in Anthony's presentation, where DEE and DUM are considered as couple of objects "fixed" inside an algebraic structure. Anthony and myself, however, disagree how successful those axiomatic definitions are in "fixing" the DUM, and even whether "fixing" an object is really a big deal.

It's a big deal because the algebra needs to know the result/manipulable properties of an expression mentioning DUM. For DEE, it's properties are sufficiently captured by saying it's identity for NaturalJoin.

We don't need to fix DUM if we have an adequately manipulable definition for NotMatching (I don't). Or we could introduce a primitive operation emptify( ); but again what are (all) the properties of expressions mentioning it? Things like r1 NatJoin emptify(r2) = emptify(r1 NatJoin r2); emptify(r1) InnerUnion emptify(r2) = emptify(r1 InnerUnion r2); r1 NotMatching Dee = emptify(r1); ....

Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor. I'd suggest that the database theory is still far from being cast in stone. Witness novel query languages appearing:

My o.p. started with 'Motivation'. The A algebra is not adequate to reason about the equivalence of queries. If any compiler/database engine is using A, I guess they've added all sorts of rules of thumb on top of the FORMAL DEFINITIONS. How can an industrial user of a query language be sure their queries produce the right results; or indeed how can a vendor of a DBMS? (For SQL it's almost impossible: there's no formal definition for all the corners of the semantics of SQL, and all its baroque bits of syntax, and how they interact inside a complex query.)

Quote from AntC on May 14, 2021, 10:50 pm

wrt the point at hand, I'll cite your own definition for NotMatching (generalised difference, you used \ operator). That has two equations, neither of which have bare (x \ y) on one side of the equation:

  • (x ^ y) v (x \ y) = x.
  • (x ^ y) ^ (x \ y) = (x ^ y) ^ DUM.

Yes, because the generalized difference is not a derived operation in the Relational Lattice with "standard" signature \{\wedge,\vee,R_{00}\}. It is the derived operation in the algebra with signature \{\wedge,\vee,\lhd NOT\rhd\}:

x <NotMatching> y = <NOT>(<NOT>(x) v (y ^ x)).

(Tested in QBQL, not in Prover9/Mace4)

 

 

 

Quote from AntC on May 14, 2021, 10:50 pm

I'm demonstrating what I'm doing as I go along. So far as I can tell, the only person actually looking at the equations is me. I take NatJoin as primitive in the sense I stipulate its properties (Commutative, Idempotent, Associative), but I don't specify anything further about its behaviour or result.

Well, it doesn't really matter which operations are considered primitive, and which aren't. Your entire thought process can be abstracted away, and eventually, you arrive to a system of algebraic identities. If you can succinctly list them, then I voluneer to check it. However, if you encountered some obstacle, then it is harder to help.

Quote from Erwin on May 14, 2021, 4:42 pm
Quote from AntC on May 14, 2021, 3:42 am

wrt REMOVE, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT to REMOVE. It is also hand-wavey about how to get from COMPOSE to the needed REMOVE: how does the algebra calculate the attributes in common between two operands being COMPOSEd? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE.)

The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.

(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do.  There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set.  Creates >1 distinct equivalence classes, that is.)

That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.

As an example of what A doesn't do, consider the properties of COMPOSE:

  • (r1 NatJoin r2) COMPOSE (s1 NatJoin s2) ≡ (((r1 NatJoin r2) ^ (s1 NatJoin s2)) REMOVE ?)
  • I can reshuffle the lhs of REMOVE (and in general the lhs expression might be bigger/hairier. Can I then push-REMOVE-through stuff?
  • COMPOSE is claimed to be commutative (unlike the mathematicians' version). I can see the NatJoin lhs of REMOVE is commutative, but is the rhs ?? ? is alleged to be a set-intersection of something, but there's no visibility of what.
  • COMPOSE is not in general associative (unlike the mathematician's version), but some COMPOSEs are associative -- for example (r1 COMPOSE s1) COMPOSE s2 providing s1, s2 have no attributes in common.

If SIRA_PRISE has COMPOSE, does it attempt to optimise expressions using it? How do your users know you didn't mess up some of the side-conditions for big/hairy expressions?

Quote from Vadim on May 14, 2021, 6:38 pm
Quote from Erwin on May 14, 2021, 5:36 pm

DUM === {} , DEE === {{}}

DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.

Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.

A relation is a pair, not a set. Therefore, you must have meant

DUM \leftrightarrow \langle \{\}, \{\}\rangle

DEE \leftrightarrow \langle \{\}, \{ \langle\rangle \}\rangle

Then, even if you use Kuratovski pair for binary tuple, you still have to define what is a zilliary tuple. (BTW, Kuratovski pair has virtually no utility in mathematics).

Just false. Practical mathematicians/algebraists rely on Kuratovski equivalence all the time -- even if they don't explicitly mention it.  Relationalists particularly -- you'll see correspondence with D.L. Childs about how it applies.

Kuratovski is a proof of capability. We can happily carry on using our familiar ordered pairs, and know they can be represented purely as sets, so we aren't needing any further mechanisms. In TTM it's particularly easy: the only tuples we have are:

  • relation values as pairs of (heading, body) -- the two of which are easily distinguishable because a heading is a set of pairs whereas a body is a set of sets of pairs. (The only ambiguous case is DUM, being a pair of empty sets, but then the components of the pair are the same, so no confusion). Headings are of different 'sort' vs bodies, and are (usually) simpler, so the obvious candidate to make a Kuratovski {H, {H, B}}.
  • In a heading's <A, T> pairs, the elements are of different 'sort's, and the A is atomic, so the obvious candidate to make a Kuratovski {A, {A, T}}.
  • In a body's tuple's <A, v> pairs, the elements are again of different 'sort's, and the A is atomic, so the obvious candidate to make a Kuratovski {A, {A, v}} .

 

Quote from Vadim on May 14, 2021, 11:59 pm
Quote from AntC on May 14, 2021, 10:50 pm

wrt the point at hand, I'll cite your own definition for NotMatching (generalised difference, you used \ operator). That has two equations, neither of which have bare (x \ y) on one side of the equation:

  • (x ^ y) v (x \ y) = x.
  • (x ^ y) ^ (x \ y) = (x ^ y) ^ DUM.

Yes, because the generalized difference is not a derived operation in the Relational Lattice with "standard" signature \{\wedge,\vee,R_{00}\}.

If that's the only place DUM appears (or rather (... NatJoin DUM) acting as equivalent to my emptify( ), then we do at least get some nice properties, because of the associativity/etc of NatJoin.

We also get some nice equalities around emptifying relations (proven by Prover9): x \ x = x NatJoin DUM; x \ DEE = x NatJoin DUM; DUM \ x = DUM; x \ DUM = x; ... But (and this is why I abandoned taking DUM as primitive,

  • I can't prove DEE ≠ DUM, and Mace4 is happy to produce a model in which they're equal.
  • Even if I give an axiom to say they're unequal, I can't prove DUM is adjacent to DEE (that is, interval one below lattice top).
  • I can't get Mace4 to even produce a model such that any other element x adjacent to DEE has x ^ DUM adjacent to DUM.
  • So every pair of (reasonably complicated) query expressions that I thought should be equal, Mace4 kept producing counter-models that couldn't possibly be models of a structure of relations -- even allowing for degenerate structures like all relations empty other than DEE, or all attribute types having a single possible value.
  • Trying to tell it that there should be a Boolean Algebra from x to x ^ DUM, or from DUM to lattice bottom (your R10) didn't stop it producing nonsensical models. (I say "didn't stop" meaning often Mace4 just ran for half an hour, producing nothing.)

It is the derived operation in the algebra with signature \{\wedge,\vee,\lhd NOT\rhd\}:

x <NotMatching> y = <NOT>(<NOT>(x) v (y ^ x)).
x <NotMatching> y = <NOT>(<NOT>(x) v (y ^ x)).
x <NotMatching> y = <NOT>(<NOT>(x) v (y ^ x)).

(Tested in QBQL, not in Prover9/Mace4)

 

I think that's going to suffer the same difficulties, due to the insufficient definedness of <NOT>. "Tested in QBQL" means exercised in some tool with impenetrable semantics, coded by someone who already knows what answer they want, and full of ad-hockery and rule-of-thumb black arts looking specifically for <NOT>. If <NOT> is a valid approach, prove it in a theorem prover where we can get a verifiable proof.

Quote from AntC on May 15, 2021, 2:49 am

If SIRA_PRISE has COMPOSE, does it attempt to optimise expressions using it? How do your users know you didn't mess up some of the side-conditions for big/hairy expressions?

How does any software user know his software is absolutely flawless ?  How does any software user know the 'vendor' (quotes because I want to include the non-commercial software providers as well) didn't mess up ?  How does any user of any non-software product (a car, a dishwasher, a heating installation, ...) has been proven to be fault-free ?

I've personally seen the IBM z/OS COBOL compiler (and in COBOL land, that's THE ultimate reference) generate infinite loops on a simple MOVE.  Where are the 'proofs' that DB2 didn't mess up ?  Where are the 'proofs' that Access didn't mess up ?  Where are the 'proofs' that Excel didn't mess up ?  Where are the 'proofs' that SAS didn't mess up ?  Did you ever get to see any such proof ?  I didn't.  And don't expect to either.  If any such proofs exist, why are those vendors still spending money to test their software ?  If any such proofs exist, why are those vendors still spending money on bug reporting facilities for their users ?

"Proving" that software "didn't mess up" is a hopeless endeavour.  And the users will have to learn to live with that.  "Know you didn't mess up" comes from using it and not running into errors all the time.  And users can get to 60% confidence, or 90%, or 99%, or 99.99%, but never 100%.  Even Tony Hoare once reflected that "the world doesn't seem to suffer that much from the problem we were trying to solve", that "problem" being "proving program correctness".

"Proving" that software "didn't mess up" is often also a futile endeavour.  You could "prove" that an arithmetic operation yields the "correct" result, assuming integer arithmetic modulo 65536 or some such.  Meaning it is "correct" to have 32767 + 1 = -32768.  At which point the end user just says it's wrong/irrelevant because that's not his arithmetic and possibly declares you nuts.

As for SIRA_PRISE : no rewrites at the source level, so no errors introduced by flawed rewrite mechanisms.  Also, it's just a data server (any SQL DBMS sans stored procedures and the like), so there's only DML to process and the closest thing to 'executable code' produced by the "compiler" is the particular chaining of computation (RA computation) and data access strategies associated with each operator invocation that is part of a particular given expression.  'Optimisation' is targeted at minimising physical file page accesses and so neither "memory consumption" nor "CPU time" are an optimisation target (other than that the strategies used by the engine are of course themselves as minimalistic as they can in those respects).

If you frown, you might appreciate the fact that my primary goal was and has always been to sort out the CREATE ASSERTION thing.  Which I managed to achieve and for which I *did* do the maths.  D&D have seen those maths, and so did Adrian Hudnott, but I don't know what became of him.  I hope he managed to achieve his life goals too and is now happy.

Quote from Erwin on May 15, 2021, 6:43 am
Quote from AntC on May 15, 2021, 2:49 am

If SIRA_PRISE has COMPOSE, does it attempt to optimise expressions using it? How do your users know you didn't mess up some of the side-conditions for big/hairy expressions?

"Proving" that software "didn't mess up" is a hopeless endeavour.

I think you might be missing a memo . We use as many tools as possible to get assurance. An automated proof system is relatively cheap to adopt for something that's mission-critical. But yes cost must be commensurate with risk. I'm not expecting "absolutely flawless". I am expecting a vendor/software supplier understands the risks I'm taking on in using their software. I guess there's no risk I'm going to use your software; or even breath a word of it to my clients. I'll be warning them off using it.

How does any software user know his software is absolutely flawless ?  How does any software user know the 'vendor' (quotes because I want to include the non-commercial software providers as well) didn't mess up ?  How does any user of any non-software product (a car, a dishwasher, a heating installation, ...) has been proven to be fault-free ?

You drive conservatively; you don't put your grandma's heirloom willow pattern dinner service into the dishwasher, but only the cheap china; you don't leave the logburner unattended for long periods. With a DBMS it's less easy to be that sort of conservative.

If you frown, you might appreciate the fact that my primary goal was and has always been to sort out the CREATE ASSERTION thing.

ASSERTIONs are statements in logic. I would have thought a logic tool abundantly helpful.

  Which I managed to achieve and for which I *did* do the maths.  D&D have seen those maths, ...

D&D are not theorem-provers. You are not a theorem-prover. Your doing the maths -- even with asterisks around -- is more fallible than a theorem prover.

Quote from AntC on May 15, 2021, 6:19 am
Quote from Vadim on May 14, 2021, 11:59 pm
Quote from AntC on May 14, 2021, 10:50 pm

wrt the point at hand, I'll cite your own definition for NotMatching (generalised difference, you used \ operator). That has two equations, neither of which have bare (x \ y) on one side of the equation:

  • (x ^ y) v (x \ y) = x.
  • (x ^ y) ^ (x \ y) = (x ^ y) ^ DUM.

Yes, because the generalized difference is not a derived operation in the Relational Lattice with "standard" signature \{\wedge,\vee,R_{00}\}.

If that's the only place DUM appears (or rather (... NatJoin DUM) acting as equivalent to my emptify( ), then we do at least get some nice properties, because of the associativity/etc of NatJoin.

BTW, replacing ... ^ DUM in that definition with ... \ DEE yields just the same equalities (below, mutatis mutandis, oh plus you have to say that \ is associative). So I can use ... \ DEE to act as emptify( ). That's a bit of a surprise because \ appears on both sides of the definition.

We also get some nice equalities around emptifying relations (proven by Prover9): x \ x = x NatJoin DUM; x \ DEE = x NatJoin DUM; DUM \ x = DUM; x \ DUM = x; ... But (and this is why I abandoned taking DUM as primitive,

  • I can't prove DEE ≠ DUM, and Mace4 is happy to produce a model in which they're equal.
  • Even if I give an axiom to say they're unequal, I can't prove DUM is adjacent to DEE (that is, interval one below lattice top).
  • I can't get Mace4 to even produce a model such that any other element x adjacent to DEE has x ^ DUM adjacent to DUM.
  • So every pair of (reasonably complicated) query expressions that I thought should be equal, Mace4 kept producing counter-models that couldn't possibly be models of a structure of relations -- even allowing for degenerate structures like all relations empty other than DEE, or all attribute types having a single possible value.
  • Trying to tell it that there should be a Boolean Algebra from x to x ^ DUM, or from DUM to lattice bottom (your R10) didn't stop it producing nonsensical models. (I say "didn't stop" meaning often Mace4 just ran for half an hour, producing nothing.)

And ... \ DEE fails to prove all those expected inequalities, so it's no worse. Calling your thing R00 seems to me false advertising: it's behaviour doesn't correspond to a relation value with empty (0) heading or empty (0) body.

PreviousPage 3 of 12Next