The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

ANNOUNCE: a relational algebra with only one primitive operator (tentative) (please check/critique)

PreviousPage 2 of 12Next

This definition for NotMatching is defective:

  • r1 NatJoin s = s & s NatJoin (r1 NotMatching r2) = (r1 NotMatching r2) ≡ (r1 NatJoin r2) InnerUnion s = r1.
  • In the case where r1, r2 are the same value, the rhs of the  becomes (r1 NatJoin r1) InnerUnion s = r1, reduces to r1 InnerUnion s = r1 (Idempotence of NatJoin).
  • The lhs of  wants NotMatching to return the minimal value that satisfies rhs. That'll be lattice bottom. For every possible r1.
  • So my chief purpose for using NotMatching was to obtain emptify(r) = r NotMatching r as a way to talk about headings: all r2s with emptify(r2) = emptify(r) have the same heading. But with that definition for emptify( ), all relations of whatever heading return the same (bottom) value.
  • It ain't going to work. And that's why I wasn't making progress with defining DUM: DEE NotMatching DEE still returns bottom.

 

Quote from Vadim on May 13, 2021, 4:27 pm
Quote from AntC on May 13, 2021, 11:53 am
Quote from Vadim on May 12, 2021, 5:28 pm

Consider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.

Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element r1 within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given by r1 NotMatching r1.

Sub-Sub-Lattice complement of r2 coincides with r1 NotMatching r2. IOW if r2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1 then s = r1 NotMatching r2. I've added that as an axiom.  Also added the distributivity within the sub-sub-lattice from r1. That still doesn't fix enough, without the 'ugly' axiom.

The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.

The lattice structure containing only DEE, DUM is a perfectly fine structure of relation values. And it is a (degenerate) Boolean Algebra. And my proposed definitions work very smoothly. (DUM coincides with lattice bottom.)

The other [rather artificial] example is the database with all the empty relations.

No that's not a valid lattice structure to model relations: we must always have DEE, no matter how many empty relations there are. We can always say the sub-lattice from DUM down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can define DUM.

(As an aside: I could see that a structure containing only DEE and empty relations would be problematic to model. Furthermore a structure containing only singleton and empty relations would be problematic -- that is, all attribute types containing only a single possible value. So if I could model a structure in which for each non-empty heading there's at least two possible relation values, I'd be happy enough. I'd even (reluctantly) give away the possibility of a model with only DEE, DUM if I could get the rest to work.)

Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.

Aside from the trivial DEE, DUM structure, no required structures are Boolean Algebras alone. So thank you for the proof that NAND, NOR alone are not adequate.

In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.

Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce RENAME to other basic operations -- the other operations still need RENAME or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.

In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.

 

No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).

I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of meet, join is defined via the absorption laws. I merely equivalently defined InnerUnion (lattice join) in terms of NatJoin (lattice meet), to demonstrate the technique before using it to 'fix' NotMatching -- or so I hoped. From my definition of InnerUnion you can mechanically derive the absorption laws, and all the properties of InnerUnion.

For example I can give a quadratic equation as ax2 + bx + c = 0, no "derived operation alone on the left hand side" there (or on either side),  and from that derive the solution for x via equational reasoning.

Quote from dandl on May 12, 2021, 1:41 pm
...

No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin.

No AntiJoin (by which I take it you mean AntiSemiJoin) returns a relation with heading same as its left operand. It doesn't remove (nor add) any attributes.

Easy if you start with negation.

I'm not aiming for 'easy'; I'm aiming for expressively complete. Neither am I aiming for a 'party game' of reducing to a single operator (see my recent post). But if there's more than one operator, we need to define how they interact -- and in an equationally complete way. And if you get to a sufficiently robust set of definitions, you find you could anyway do it with only one operator -- that is, providing your domain has a single 'sort'. In contrast, A has at least two 'sort's: relations and attribute-comma-lists. Now you need operators to get from a relation to the a-c-l for its heading and back.

 

Quote from dandl on May 13, 2021, 8:41 am

Only one primitive operator

  • InnerUnion has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as is NatJoin. Unfortunately, InnerUnion is not distributive over NatJoin, nor vice versa.
  • But (ANNOUNCE) if we could define InnerUnion in terms of NatJoin (embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.
  • Furthermore (further ANNOUNCE) if we could define PROJECT/REMOVE, NotMatching in terms of NatJoin (embedding etc) then we can reason all the way round equations.

At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about emptifying relations just to get their heading -- or even to evaluate the relational expression argument to emptify( )! It's an algebra purely for reasoning equationally about the equivalence of queries.

No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND).

We don't need our initial set of operators to include some form of negation. Consider a definition for dyadic operator arithmetic minus (-), or more relevantly set difference (\):

  • x - y =df z s.t. y + z = x.
  • x \ y =df z s.t. (y ∩ x) ∪ z = x & y ∩ z = ∅.
  • Note the definition for set difference needs some tricky conditions because of the possibility y contains elements not in x; and we must forbid z from containing elements already in y.
  • Perhaps you're puzzled by the format I've presented definitions in my o.p.? rather than that z s.t. ... form, I could replace z by its definiendum (substitution of equals) so go
    y + (x - y) = x
    (y ∩ x) ∪ (x \ y) = x & y ∩ (x \ y) = ∅
    And then I could split that last conjunction into two separate axioms.

This fixes - and \ uniquely for all x, y. There's no need to go via monadic minus; indeed we could define negate x =df 0 - x ≡ z s.t. x + z = 0. Because 0 is identity for + -- just as DEE is identity for NatJoin.

Appendix A claims NAND could replace only <AND>, <OR>, <NOT>; it doesn't go near project/REMOVE, COMPOSE. So that's a non-starter. (Also it would introduce domain-dependence, which gives me a queasy feeling.)

It says:

"In fact, we will show in the next section that we do not really
need RENAME either; thus, we could in fact reduce our algebra
still further to just the two operators REMOVE and either
NOR or NAND (plus TCLOSE)."

I don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.

wrt REMOVE, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT to REMOVE. It is also hand-wavey about how to get from COMPOSE to the needed REMOVE: how does the algebra calculate the attributes in common between two operands being COMPOSEd? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE.)

Project/Remove is simply InnerUnion with an empty relation representing the new heading.

Project is that. REMOVE is not. I did provide an equivalent definition of project as a REMOVE of a REMOVE, but that's not sufficient to 'fix' either without some extra machinery to talk about headings-as-sets. Also, as it turns out, I haven't yet fixed how to get an empty relation. ( (r NAND r) would deliver that, but then how do I integrate NAND into the lattice operations? Or perhaps there's another approach which isn't at all lattice-based? You're welcome to try. Making the overall set of operations 'solvable' enough to express side-conditions and infer equivalence of queries raises a very high bar. Appendix A doesn't reach it.)

...

My attempts at such definitions have foundered in the past because they relied on DUM being defined; and I couldn't obtain a definition for DUM that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)

The definition for DEE hadn't suffered that difficulty, because we can fix DEE as the identity element for NatJoin. And it turns out (claim not yet demonstrated) we can define the other operators without mentioning DUM; then define DUM from them:

  • DEE =df s s.t. ∀ r. s NatJoin r = r.
  • DUM =df DEE NotMatching DEE.
  • DUM =df ∀ r. (r NotMatching r) REMOVE r equivalently  ∀ r. (r REMOVE r) NotMatching r .
  • (Yes, that's three equivalent definitions for DUM -- proven equivalent by equational reasoning.)

Seems to me it shouldn't be too hard to Dee and Dum, but only if you can negate.

DEE is easy/doesn't need negate. DUM = DEE NotMatching DEE turns out to have problems/is not unique enough, as I discovered. Lattice algebras tend to presume the structure is distributive, complemented unless you give a strong instruction otherwise -- which is my 'ugly' axiom.

I have no idea what this means, or why I should care. My aim was to point out that I don't see any solution with a single primitive unless it's some kind of negation, like NAND. Whether or not you get a 'nice' DUM or not this way, I think you won't get there any other way.

...

No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.

AntiJoin could be defined as the negation of MATCHING; but r1 MATCHING r2 = r1 NatJoin r2 {attributes-from-r1}, so you still need projection

AntiJoin is cognate with Not Matching: same heading, result is the complement of SemiJoin (aka Matching). I prefer to avoid the distinctive/idiosyncratic TD names for operators, they come with too much baggage.

I prefer to avoid the SQL names, because they come with too much baggage -- especially around Null and duplicate rows. If forced, I prefer 'AntiSemiJoin' to 'AntiJoin' (because the 'Anti' in 'AntiJoin' could be anti- some of the other behaviour of Join). The TTM/Tutorial D names I find just perfect. I'm using NatJoin rather than bare JOIN, only because we're in latticeland, and NatJoin corresponds to lattice meet.

But I have to say I'm not optimistic about your quest. In highly informal terms, you can have an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite, but a single operator to do all that seems far-fetched. More power if you can get there.

 

Off-topic re "an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite", one of the early (1994 IIRC, although I discovered it much later) attempts at update-through-view by Date and McGoveran noted that equivalence, and tried to take advantage of it by 'reverse engineering' a view definition back to its based-on relvars. The attempt failed abjectly, and ended up making stipulations in which that equivalence didn't hold. I.e. update-through-INTERSECTION behaved differently vs update-through-MINUS-of-MINUS.

I concede it's difficult to get the stipulations to work out. But it isn't difficult to check whether what you've stipulated actually does works out, when you've started by insisting it must work out.

I nearly abandoned TTM at that point, on grounds it seemed to be run by idiots. End of off-topic.

Quote from Vadim on May 13, 2021, 4:27 pm
Quote from AntC on May 13, 2021, 11:53 am
Quote from Vadim on May 12, 2021, 5:28 pm

Consider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.

Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element r1 within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given by r1 NotMatching r1.

Sub-Sub-Lattice complement of r2 coincides with r1 NotMatching r2. IOW if r2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1 then s = r1 NotMatching r2. I've added that as an axiom.  Also added the distributivity within the sub-sub-lattice from r1. That still doesn't fix enough, without the 'ugly' axiom.

The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.

The other [rather artificial] example is the database with all the empty relations.

Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.

I think I would be right in saying that a single operator is enough, but it still needs an argument domain, the set of boolean values.

In the same way there might be one, two or more operators in a minimal RA, but they are not responsible for the creation of relational values such as Dee and Dum, which are already part of the domain.

In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons. In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.

 

 

Andl - A New Database Language - andl.org

I don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.

OK. The way I read App-A is:

  • chatty discussion with various asides around the idea of a 'core' RA
  • formal definitions for 5 operators: NOT, REMOVE, RENAME, AND, OR (but not TCLOSE)
  • a foray into the idea of relcons as s source of new values, but no formal definitions.

I find the formal definitions to be correct, the rest to be thoughts experiments/work in progress. I have completed the work to my satisfaction with formal definitions based on relcons, plus while to replace TCLOSE. I have not found any other formal treatment of the entire extended RA, so this is what I work from.

wrt REMOVE, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT to REMOVE. It is also hand-wavey about how to get from COMPOSE to the needed REMOVE: how does the algebra calculate the attributes in common between two operands being COMPOSEd? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE.)

It's mostly hand-wavey, but the formal definitions stand on their own.

As to the remainder of your work, you are following a path I don't understand to a destination I cannot see. Best if I just lurk a while.

Andl - A New Database Language - andl.org
Quote from AntC on May 14, 2021, 3:42 am

wrt REMOVE, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT to REMOVE. It is also hand-wavey about how to get from COMPOSE to the needed REMOVE: how does the algebra calculate the attributes in common between two operands being COMPOSEd? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE.)

The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.

(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do.  There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set.  Creates >1 distinct equivalence classes, that is.)

That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.

Author of SIRA_PRISE
Quote from AntC on May 14, 2021, 12:36 am
Quote from Vadim on May 13, 2021, 4:27 pm

The other [rather artificial] example is the database with all the empty relations.

No that's not a valid lattice structure to model relations: we must always have DEE, no matter how many empty relations there are. We can always say the sub-lattice from DUM down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can define DUM.

In the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.

Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)

In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.

Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce RENAME to other basic operations -- the other operations still need RENAME or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.

Page 11 here: https://books.google.com/books?id=siNlAn8Bm30C&pg=PA1&source=gbs_toc_r&cad=3#v=onepage&q&f=false

In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.

 

No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).

I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of meet, join is defined via the absorption laws. I merely equivalently defined InnerUnion (lattice join) in terms of NatJoin (lattice meet), to demonstrate the technique before using it to 'fix' NotMatching -- or so I hoped. From my definition of InnerUnion you can mechanically derive the absorption laws, and all the properties of InnerUnion.

For example I can give a quadratic equation as ax2 + bx + c = 0, no "derived operation alone on the left hand side" there (or on either side),  and from that derive the solution for x via equational reasoning.

Then, it is terminology confusion, and you need to clarify what do you mean by "primitive operator". Analogies will get you only so far.

Quote from Vadim on May 14, 2021, 4:49 pm

In the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.

Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)

I do remember having had some correspondence, but I no longer remember any of the detail.  That said,

DUM === {} , DEE === {{}}

DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.

Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.

Author of SIRA_PRISE
Quote from Erwin on May 14, 2021, 5:36 pm

DUM === {} , DEE === {{}}

DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.

Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.

A relation is a pair, not a set. Therefore, you must have meant

DUM \leftrightarrow \langle \{\}, \{\}\rangle

DEE \leftrightarrow \langle \{\}, \{ \langle\rangle \}\rangle

Then, even if you use Kuratovski pair for binary tuple, you still have to define what is a zilliary tuple. (BTW, Kuratovski pair has virtually no utility in mathematics). Also, in contemporary mathematics you never compare the two sets verbatim. Most often they are considered to be the "same" up to some equivalence. This idea is manifested in Anthony's presentation, where DEE and DUM are considered as couple of objects "fixed" inside an algebraic structure. Anthony and myself, however, disagree how successful those axiomatic definitions are in "fixing" the DUM, and even whether "fixing" an object is really a big deal.

Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor. I'd suggest that the database theory is still far from being cast in stone. Witness novel query languages appearing:

LARA (AKA LaraDB) which has incorporated some ideas from Relational Lattices.

Graphical Conjunctive Queries. This paper is remarkable because the diagrammatic expression language is well established in quantum computation arena. (See the Coecke & Kissinger textbook for elementary, category-free exposition)

 

 

Quote from Erwin on May 14, 2021, 4:42 pm
Quote from AntC on May 14, 2021, 3:42 am

wrt REMOVE, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT to REMOVE. It is also hand-wavey about how to get from COMPOSE to the needed REMOVE: how does the algebra calculate the attributes in common between two operands being COMPOSEd? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE.)

The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.

(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do.  There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set.  Creates >1 distinct equivalence classes, that is.)

That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.

The problem is you're speaking as an implementor, not as an algebraist. The compiler's process you describe (which is indeed obvious) uses mechanisms for which there's no support in the algebra. Note the operands to COMPOSE might be arbitrary relational expressions, so an algebra can't 'read off' the attribute names from a relvar declaration.

The algebra does not include an operation like attribute-names-from( ); neither does it specify the result of a REMOVE where the attribute-comma-list includes names not in the heading of REMOVE's relation operand. Neither is there an algebraic operation to get the intersection of two attribute-names-from( ), as needed for COMPOSE's REMOVE .

PreviousPage 2 of 12Next