The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

ANNOUNCE: a relational algebra with only one primitive operator (tentative) (please check/critique)

PreviousPage 4 of 12Next
Quote from AntC on May 15, 2021, 10:17 am

If you frown, you might appreciate the fact that my primary goal was and has always been to sort out the CREATE ASSERTION thing.

ASSERTIONs are statements in logic. I would have thought a logic tool abundantly helpful.

  Which I managed to achieve and for which I *did* do the maths.  D&D have seen those maths, ...

D&D are not theorem-provers. You are not a theorem-prover. Your doing the maths -- even with asterisks around -- is more fallible than a theorem prover.

Per the very stances you are taking in this thread, a "logic tool" can only be helpful if it has been proved to be correct.  Quis custodiet ipsos custodes.  As for that final claim, I'm eagerly awaiting your proof.

Would it be helpful to start a new thread and set out what you are trying to achieve? Rather than this quibbling over mathematical nuances that may not even matter, and a heading that no longer seems relevant?

My impression is that the underlying aim is to devise an algebra on relations, similar to the RA, but with rules about constructing WFFs and for transforming or showing equivalence between WFFs. Am I close?

This is worth pursuing, because if you attach cost functions to the operators there are situations where a transformation of a WFF can have a lower overall cost. [I avoid the term optimisation, which does not apply here.]

Side-note: please don't fuss over dee and dum. They are literal values (like true or 42) that can be created as needed. But you do need a way to create new values, and every possible relation can then be created by starting with dum and adding stuff.

 

Andl - A New Database Language - andl.org
Quote from Erwin on May 15, 2021, 11:26 pm
Quote from AntC on May 15, 2021, 10:17 am

If you frown, you might appreciate the fact that my primary goal was and has always been to sort out the CREATE ASSERTION thing.

ASSERTIONs are statements in logic. I would have thought a logic tool abundantly helpful.

  Which I managed to achieve and for which I *did* do the maths.  D&D have seen those maths, ...

D&D are not theorem-provers. You are not a theorem-prover. Your doing the maths -- even with asterisks around -- is more fallible than a theorem prover.

Per the very stances you are taking in this thread, a "logic tool" can only be helpful if it has been proved to be correct.  Quis custodiet ipsos custodes.  As for that final claim, I'm eagerly awaiting your proof.

No, you're not getting the memo with the Swiss cheese model. This is not a command-and-control structure up to a single point of responsibility = single point of failure. (As has been demonstrated by now across many industries, including airline pilots and those running nuclear power plants.) I am not advocating for a theorem prover on its own, as if everybody else abdicates responsibility.

Who takes custody for the custodians? Every other custodian. Who takes custody for the cranky, opinionated outspoken programmer? Every other ... Wait. Those kind of people are so caustic, throw them off the project, even if -- indeed especially if -- they are the 'hero' who wrote the original software. We don't want heroes, we want collaborators.

Who checks the programmer's logic? Firstly, did the programmer get their logic from a business analyst who talked to the users/organisation's needs? Then check against the specification. Secondly, that's the specific job of QA. Thirdly that's the responsibility of the users/acceptance testing. Or did the programmer just make it up with no consultation/without writing a spec? Then sack the programmer and throw away the software, and count yourself lucky it didn't get as far as production.

Something like a DBMS is trickier because it's infrastructure, it has to be protean. It won't get well tested until other programmers/organisations build/test/etc applications on top of it. So look around for other sources of custodianship, like ... a theorem-prover.

Who checks the theorem-prover? Well, amongst others, the original programmer. But that's with a different hat on: theorem-reading/-verification, not theorem-writing. (What I'm doing to find holes in my model.) Who else? Theorem-provers can format their output into human-/logician-readable proofs. Even Russell & Whitehead weren't so sure of the proofs in Principia Mathematica, and got others to check -- and others found holes. A neat thing about automated provers is they can produce counter-examples. So you could pass that to a domain expert and ask: is this a plausible scenario?

Who checks the theorem-prover? It's being used by zillions of other custodians in diverse branches of verification techniques. It's not specifically their job to prove the prover, but if it produces proofs or counter-examples that don't withstand scrutiny, the principles of FOPL/deduction are so well-agreed, that it's clear-cut there's an error.

Quote from dandl on May 16, 2021, 12:30 am

Would it be helpful to start a new thread and set out what you are trying to achieve? Rather than this quibbling over mathematical nuances that may not even matter, and a heading that no longer seems relevant?

A reasonable question @dandl. I've just re-read the o.p. I think 'Motivation' still stands. I think '(Empty) Relations denoting headings' still stands.

The rest of your comment betrays that you still think this is something 'executable'. Then the third section of the o.p. still stands. No, DUM can't be "created as needed". It doesn't need creating/executing at all. What it needs is its properties fixed uniquely and correctly wrt other values and operations within the logic.

I've not much idea what you mean by 'attach cost functions', but I'm not trying to guide optimisations -- it's just that optimisation is the easiest example to point to as a usage for an equivalence-checker.

  1. I am not fussing over DEE -- that's easy/solved.
  2. I am fussing over some way to emptify( )/get the heading of a relation. If I have emptify( ), then I don't need to fuss over DUM -- it's merely emptify(DEE). If I have DUM, then I don't need to fuss over emptify( ) -- it's merely x NatJoin DUM. If I have NotMatching, then I don't need to fuss -- DUM = DEE NotMatching DEE; emptify(r) = r NotMatching r = r NotMatching DEE. But I am going to continue to fuss about the overall objective.

My impression is that the underlying aim is to devise an algebra on relations, similar to the RA, but with rules about constructing WFFs and for transforming or showing equivalence between WFFs. Am I close?

This is worth pursuing, because if you attach cost functions to the operators there are situations where a transformation of a WFF can have a lower overall cost. [I avoid the term optimisation, which does not apply here.]

Side-note: please don't fuss over dee and dum. They are literal values (like true or 42) that can be created as needed. But you do need a way to create new values, and every possible relation can then be created by starting with dum and adding stuff.

 

"similar to the RA" is where I think you're mis-apprehending. Codd 1972 is very much aimed at proving that his algebra is transformable to an executable language (the calculus/ALPHA); and that the algebra is expressively complete. (In that I think he failed -- which is why I shun the term 'Relationally complete'.) I'm not rejecting out of hand that my operators/operations/primitives might be executable -- indeed that's why I started from NatJoin, and why I'm avoiding domain-dependent operators like <NOT>. That is, until/unless it's forced on me that <NOT> (say) works whereas what I'm trying does not. So far, I think <NOT> suffers just as much difficulties of lack of being able to fix enough of its characteristics.

I could take DUM as a primitive. Then I need to specify the logical effect of "adding stuff": how/what characteristics do I add to get from DUM to DEE? (That question still applies if I also take DEE as primitive.) How is that different to adding a single attribute with an empty body? What are the properties of adding tuples to that empty body? The advantage of the lattice approach is we don't try to answer those questions, we take the set of relation values as given (and possibly infinite). But then we do need to pick out of that set: DEE (easy) and DUM (not so far).

For example: if I can talk about headings, I can characterise what it is for a relation (value) to have exactly one more attribute than another; I can characterise what it is for a relation to have exactly one more tuple than another with the same heading. I can say that characterising one more attribute on top of one more tuple achieves the same properties as characterising one more tuple on top of one more attribute. I can characterise starting from DUM and building up.

Quote from Vadim on May 14, 2021, 6:38 pm

 

... Witness novel query languages appearing:

LARA (AKA LaraDB) which has incorporated some ideas from Relational Lattices.

 

Thanks Vadim, but it wasn't far into those papers before I was (virtually) throwing them into the bin in disbelief.

After 70 years of the RM oh dear, oh dear, oh dear and why, why, why? are people still building semantics like this:

  • "A relational schema is a finite collection σ of two-sorted relation symbols. The first sort consists of key-attributes and the second one of value-attributes."
    [from the first paper]. Key-value stores (aka Indexed files) are what I was very glad to see the back of in the 1980's.
    In case anybody needs reminding: keys are values. (Although what I suspect they mean is 'identifiers'/GUIDs, which are just a bloody nuisance.) And values can be keys -- or at least targets for search and joining to/from other relvars; oh and not forgetting self-joins, where they're both.
  • "we show that the LaraDB implementation outperforms Accumulo’s native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply"
    [from the second paper] So it's an optimised form of MapReduce whoop-de-doo: row-by-agonising-row with shortcuts. It's a bicycle with derailleur gears, rather than a fixed-ratio with a rusty chain. Have you considered getting one of those automobile things?
Quote from AntC on May 16, 2021, 2:53 am
Quote from dandl on May 16, 2021, 12:30 am

Would it be helpful to start a new thread and set out what you are trying to achieve? Rather than this quibbling over mathematical nuances that may not even matter, and a heading that no longer seems relevant?

A reasonable question @dandl. I've just re-read the o.p. I think 'Motivation' still stands. I think '(Empty) Relations denoting headings' still stands.

I don't find Motivation helpful in understanding where you want to get to. It's just a list of issues.

Headings: who needs them? This is an algebra closed over relations, but the behaviour of operators is dictated by the headings of their arguments, and using relations to stand in for their headings doesn't really change anything.

Only one operator? I think you've abandoned that. So a new thread might be overdue.

The rest of your comment betrays that you still think this is something 'executable'. Then the third section of the o.p. still stands.

The algebra allows you to write WFFs. When variables in the WFF are bound to values it can be evaluated to yield a result (like Pythagoras). What other purpose  did you have in mind?

No, DUM can't be "created as needed". It doesn't need creating/executing at all. What it needs is its properties fixed uniquely and correctly wrt other values and operations within the logic.

Sorry: DUM exists, DEE has to be created by some operator. DUM is like zero in the number system. But (so far) your algebra has no way to create new values so you cannot create DEE except as a literal.

I've not much idea what you mean by 'attach cost functions', but I'm not trying to guide optimisations -- it's just that optimisation is the easiest example to point to as a usage for an equivalence-checker.

Ditto: a cost function is the formal way to describe what you call optimisation.

  1. I am not fussing over DEE -- that's easy/solved.
  2. I am fussing over some way to emptify( )/get the heading of a relation. If I have emptify( ), then I don't need to fuss over DUM -- it's merely emptify(DEE). If I have DUM, then I don't need to fuss over emptify( ) -- it's merely x NatJoin DUM. If I have NotMatching, then I don't need to fuss -- DUM = DEE NotMatching DEE; emptify(r) = r NotMatching r = r NotMatching DEE. But I am going to continue to fuss about the overall objective.

I object; you don't have X or r until you create them as literals or as the result of some operator.

My impression is that the underlying aim is to devise an algebra on relations, similar to the RA, but with rules about constructing WFFs and for transforming or showing equivalence between WFFs. Am I close?

This is worth pursuing, because if you attach cost functions to the operators there are situations where a transformation of a WFF can have a lower overall cost. [I avoid the term optimisation, which does not apply here.]

Side-note: please don't fuss over dee and dum. They are literal values (like true or 42) that can be created as needed. But you do need a way to create new values, and every possible relation can then be created by starting with dum and adding stuff.

 

"similar to the RA" is where I think you're mis-apprehending. Codd 1972 is very much aimed at proving that his algebra is transformable to an executable language (the calculus/ALPHA); and that the algebra is expressively complete. (In that I think he failed -- which is why I shun the term 'Relationally complete'.) I'm not rejecting out of hand that my operators/operations/primitives might be executable -- indeed that's why I started from NatJoin, and why I'm avoiding domain-dependent operators like <NOT>. That is, until/unless it's forced on me that <NOT> (say) works whereas what I'm trying does not. So far, I think <NOT> suffers just as much difficulties of lack of being able to fix enough of its characteristics.

Coss 1972 is problematic. He spends time on division and various kinds of join which no longer figure in the 'usual' RA. He makes no attempt to address the issue of creating new values, relying entirely on literals. And he includes Restriction as an operator that parallels those joins and has an attribute value as an argument, so not closed over relation. It's a masterwork, but flawed.

I could take DUM as a primitive. Then I need to specify the logical effect of "adding stuff": how/what characteristics do I add to get from DUM to DEE? (That question still applies if I also take DEE as primitive.) How is that different to adding a single attribute with an empty body? What are the properties of adding tuples to that empty body? The advantage of the lattice approach is we don't try to answer those questions, we take the set of relation values as given (and possibly infinite). But then we do need to pick out of that set: DEE (easy) and DUM (not so far).

For example: if I can talk about headings, I can characterise what it is for a relation (value) to have exactly one more attribute than another; I can characterise what it is for a relation to have exactly one more tuple than another with the same heading. I can say that characterising one more attribute on top of one more tuple achieves the same properties as characterising one more tuple on top of one more attribute. I can characterise starting from DUM and building up.

Which I think is the correct approach. But you are going to have to address the problem of creating new values. IMO once you have relcons restriction is just a semijoin and new value is just a join: it's relations all the way down. Lovely!

Andl - A New Database Language - andl.org

Even though I loved abstract algebra at university, I haven't kept up, lattices are certainly new to me, so I would need to put in possibly more effort than I can at this time to fully interact here, but hopefully my uninformed and disjoint comments might help.

How do you arrive at "Dumpty"? Do you just take it as a given? What if you want to extend it with a new attribute? I'm thinking along the lines of Galois theory where you start with the reals but have to do field extensions with e.g. the square root of 2 to be able to represent numbers that contain multiples of it, so you get numbers represented as "a + b*sqrt(2)". Then you can further extend with square root of 3, and so on.

What about the other dimension, the complete set of possible tuples of a certain heading? Or even the cross product of all possible values of all possible attributes? How does that get expanded (what are the generators of the group, if it indeed corresponds to a group at all) or is that not interesting?

IIUC, Dee is the identity for natural join, but Dum would be the zero. Also, it seems Dum might be the identity for inner union, and Dee would be the zero. Or is it? Does the identity for inner union need to be the empty relation with maximum heading? Could/should Dum have the maximum heading (oh, that's Dumpty)? Yes, I think so, Dumpty is also the true zero for natural join, not Dum.

Are natural join and inner union then orthogonal (in some sense)?

If we start with Dee and Dumpty (it perhaps remains to be proven that the algebra from that is sane), we can never create Dum by natural join and inner union. So I think we're back to the question of how we do field extensions with new attributes (and I suppose also how we remove them)

Quote from tobega on May 16, 2021, 8:41 am

Even though I loved abstract algebra at university, I haven't kept up, lattices are certainly new to me, so I would need to put in possibly more effort than I can at this time to fully interact here, but hopefully my uninformed and disjoint comments might help.

See the new thread I started on the curiosity. No mention of lattice theory until I slip in at the end, that's what I've been doing all along -- like the guy who discovered he'd been speaking prose all his life.

How do you arrive at "Dumpty"? Do you just take it as a given?

As I said to @dandl "we take the set of relation values as given (and possibly infinite). But then we do need to pick out of that set: ..." ... Dumpty (easy -- it's identity for InnerUnion aka the absorbing element for NatJoin -- this is, r NatJoin Dumpty = Dumpty, for all r.)

What if you want to extend it with a new attribute?

You don't: all possible attributes (and their type) are already in the heading of Dumpty. To respond to one of @dandl's points while I'm here

But you are going to have to address the problem of creating new values.

No, all headings for PLUS (and all other relcons) are already included in Dumpty, and all values for PLUS are already included in that even less effable relation value: the one with same heading as Dumpty, all possible tuples. Since the algebra is not 'executable' and all values live in the timeless/infinite realm of mathematics, the abstractness/ineffability/unrealisability is only an academic concern.

I'm thinking along the lines of Galois theory where you start with the reals but have to do field extensions with e.g. the square root of 2 to be able to represent numbers that contain multiples of it, so you get numbers represented as "a + b*sqrt(2)". Then you can further extend with square root of 3, and so on.

What about the other dimension, the complete set of possible tuples of a certain heading? Or even the cross product of all possible values of all possible attributes? How does that get expanded (what are the generators of the group, if it indeed corresponds to a group at all) or is that not interesting?

IIUC, Dee is the identity for natural join, but Dum would be the zero. Also, it seems Dum might be the identity for inner union,

No. Dum is not the identity for anything, neither is it 'the zero'. r NatJoin Dum is not r in general, nor Dum -- in fact the only r that would produce Dum is Dum itself or Dee. The identity for InnerUnion is Dumpty -- that's just the upside-down of NatJoin.

and Dee would be the zero. Or is it? Does the identity for inner union need to be the empty relation with maximum heading? Could/should Dum have the maximum heading (oh, that's Dumpty)? Yes, I think so, Dumpty is also the true zero for natural join, not Dum.

Are natural join and inner union then orthogonal (in some sense)?

Yes-ish (not "orthogonal", reverses) they're lattice 'duals' is the term. r1 NatJoin r2 = r2 ≡ r1 InnerUnion r2 = r1.

If we start with Dee and Dumpty (it perhaps remains to be proven that the algebra from that is sane), we can never create Dum by natural join and inner union. So I think we're back to the question of how we do field extensions with new attributes (and I suppose also how we remove them)

Maybe I can't fix Dum enough in terms of other operations. But if I can't there's no merit in ordaining Dum ex nihili -- I still need to fix its properties.

Quote from dandl on May 16, 2021, 7:28 am
Quote from AntC on May 16, 2021, 2:53 am
Quote from dandl on May 16, 2021, 12:30 am

Would it be helpful to start a new thread and set out what you are trying to achieve? Rather than this quibbling over mathematical nuances that may not even matter, and a heading that no longer seems relevant?

A reasonable question @dandl. I've just re-read the o.p. I think 'Motivation' still stands. I think '(Empty) Relations denoting headings' still stands.

I don't find Motivation helpful in understanding where you want to get to. It's just a list of issues.

Headings: who needs them? This is an algebra closed over relations, but the behaviour of operators is dictated by the headings of their arguments, and using relations to stand in for their headings doesn't really change anything.

Headings? I don't have them. The algebra is single-sorted. It's just that in TTM-land it's easier to talk in terms of headings and bodies. Instead of 'heading' you can put 'type of relation value'.

Only one operator? I think you've abandoned that. So a new thread might be overdue.

It's "only one primitive operator". I've defined NotMatching in terms of it; I've defined DUM = DEE NotMatching DEE .

The rest of your comment betrays that you still think this is something 'executable'. Then the third section of the o.p. still stands.

The algebra allows you to write WFFs. When variables in the WFF are bound to values it can be evaluated to yield a result (like Pythagoras). What other purpose  did you have in mind?

Showing that two queries are equivalent. Also expressing side-conditions on that equivalence -- such as the participant relations must have disjoint headings. Note I haven't in the algebra specified what is a heading, what is a tuple, what is a <A, v> pair. I've given no specification of a value. So good luck with trying to evaluate non-values -- or perhaps I should say "ghost" values. r1, r2, s ... are variables for/range over values.

Addit: You're very welcome to assume relations are as specified in Appendix A 'FORMAL DEFINITIONS'; you won't mis-understand. But I haven't assumed or specified anything about relations being sets or pairs of sets or pairs of a set-of-pairs with a set-of-sets-of-pairs. There's no set operations in the algebra. There's only operators and variables embedded in FOPL-with-equality.

No, DUM can't be "created as needed". It doesn't need creating/executing at all. What it needs is its properties fixed uniquely and correctly wrt other values and operations within the logic.

Sorry: DUM exists,

DUM exists in just the same way as all the values -- which is to say only as a "ghost".

DEE has to be created by some operator.

DEE is the identity for NatJoin. Which is to say if NatJoin is required to scare-quotes 'create' r NatJoin DEE, it'll just pass along r. There's nothing up my sleeves here. This is an algebra not a database engine nor query language running on a database engine.

DUM is like zero in the number system.

No. 0 + x = x -- that's sufficient to fix zero. DUM NatJoin r1 = ?.

But (so far) your algebra has no way to create new values so you cannot create DEE except as a literal.

All the algebra can do is inspect its argument. If it's DUM then ... If it's the identity for NatJoin then ... How do I tell if some value is the identity for NatJoin? I apply DEE NatJoin r to all r in my infinity of values; if each application returns r (and no other candidate returns r for all r), then I have DEE.

 

For example: if I can talk about headings, I can characterise what it is for a relation (value) to have exactly one more attribute than another; I can characterise what it is for a relation to have exactly one more tuple than another with the same heading. I can say that characterising one more attribute on top of one more tuple achieves the same properties as characterising one more tuple on top of one more attribute. I can characterise starting from DUM and building up.

Which I think is the correct approach. But you are going to have to address the problem of creating new values. IMO once you have relcons restriction is just a semijoin and new value is just a join: it's relations all the way down. Lovely!

 

Quote from AntC on May 16, 2021, 2:10 am

No, you're not getting the memo with the Swiss cheese model.

I love it when people start telling me that I'm not getting something.  Always an occasion to gamble whether it's Dunning&Kruger doing their thing or not.  And no, I don't like Swiss Cheese.  Too many holes.  Give me an Italian Lasagna.  Predicting the actual substance you're getting based on just the volume offered is way more accurate.  Don't depend on the volume offered by Swiss Cheese Vendors.

PreviousPage 4 of 12Next