ANNOUNCE: a relational algebra with only one primitive operator (tentative) (please check/critique)
Quote from AntC on May 13, 2021, 11:46 pmThis definition for
NotMatching
is defective:
r1 NatJoin s = s & s NatJoin (r1 NotMatching r2) = (r1 NotMatching r2) ≡ (r1 NatJoin r2) InnerUnion s = r1
.- In the case where
r1, r2
are the same value, the rhs of the≡
becomes(r1 NatJoin r1) InnerUnion s = r1
, reduces tor1 InnerUnion s = r1
(Idempotence ofNatJoin
).- The lhs of
≡
wantsNotMatching
to return the minimal value that satisfies rhs. That'll be lattice bottom. For every possibler1
.- So my chief purpose for using
NotMatching
was to obtainemptify(r) = r NotMatching r
as a way to talk about headings: allr2
s withemptify(r2) = emptify(r)
have the same heading. But with that definition foremptify( )
, all relations of whatever heading return the same (bottom) value.- It ain't going to work. And that's why I wasn't making progress with defining
DUM
:DEE NotMatching DEE
still returns bottom.
This definition for NotMatching
is defective:
r1 NatJoin s = s & s NatJoin (r1 NotMatching r2) = (r1 NotMatching r2) ≡ (r1 NatJoin r2) InnerUnion s = r1
.- In the case where
r1, r2
are the same value, the rhs of the≡
becomes(r1 NatJoin r1) InnerUnion s = r1
, reduces tor1 InnerUnion s = r1
(Idempotence ofNatJoin
). - The lhs of
≡
wantsNotMatching
to return the minimal value that satisfies rhs. That'll be lattice bottom. For every possibler1
. - So my chief purpose for using
NotMatching
was to obtainemptify(r) = r NotMatching r
as a way to talk about headings: allr2
s withemptify(r2) = emptify(r)
have the same heading. But with that definition foremptify( )
, all relations of whatever heading return the same (bottom) value. - It ain't going to work. And that's why I wasn't making progress with defining
DUM
:DEE NotMatching DEE
still returns bottom.
Quote from AntC on May 14, 2021, 12:36 amQuote from Vadim on May 13, 2021, 4:27 pmQuote from AntC on May 13, 2021, 11:53 amQuote from Vadim on May 12, 2021, 5:28 pmConsider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element
r1
within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given byr1 NotMatching r1
.Sub-Sub-Lattice complement of
r2
coincides withr1 NotMatching r2
. IOW ifr2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1
thens = r1 NotMatching r2
. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice fromr1
. That still doesn't fix enough, without the 'ugly' axiom.The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.
The lattice structure containing only
DEE, DUM
is a perfectly fine structure of relation values. And it is a (degenerate) Boolean Algebra. And my proposed definitions work very smoothly. (DUM
coincides with lattice bottom.)The other [rather artificial] example is the database with all the empty relations.
No that's not a valid lattice structure to model relations: we must always have
DEE
, no matter how many empty relations there are. We can always say the sub-lattice fromDUM
down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can defineDUM
.(As an aside: I could see that a structure containing only
DEE
and empty relations would be problematic to model. Furthermore a structure containing only singleton and empty relations would be problematic -- that is, all attribute types containing only a single possible value. So if I could model a structure in which for each non-empty heading there's at least two possible relation values, I'd be happy enough. I'd even (reluctantly) give away the possibility of a model with onlyDEE, DUM
if I could get the rest to work.)Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.
Aside from the trivial
DEE, DUM
structure, no required structures are Boolean Algebras alone. So thank you for the proof thatNAND, NOR
alone are not adequate.In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.
Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce
RENAME
to other basic operations -- the other operations still needRENAME
or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).
I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of
meet, join
is defined via the absorption laws. I merely equivalently definedInnerUnion
(latticejoin
) in terms ofNatJoin
(latticemeet
), to demonstrate the technique before using it to 'fix'NotMatching
-- or so I hoped. From my definition ofInnerUnion
you can mechanically derive the absorption laws, and all the properties ofInnerUnion
.For example I can give a quadratic equation as
ax
2
+ bx + c = 0
, no "derived operation alone on the left hand side" there (or on either side), and from that derive the solution forx
via equational reasoning.
Quote from Vadim on May 13, 2021, 4:27 pmQuote from AntC on May 13, 2021, 11:53 amQuote from Vadim on May 12, 2021, 5:28 pmConsider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element
r1
within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given byr1 NotMatching r1
.Sub-Sub-Lattice complement of
r2
coincides withr1 NotMatching r2
. IOW ifr2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1
thens = r1 NotMatching r2
. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice fromr1
. That still doesn't fix enough, without the 'ugly' axiom.The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.
The lattice structure containing only DEE, DUM
is a perfectly fine structure of relation values. And it is a (degenerate) Boolean Algebra. And my proposed definitions work very smoothly. (DUM
coincides with lattice bottom.)
The other [rather artificial] example is the database with all the empty relations.
No that's not a valid lattice structure to model relations: we must always have DEE
, no matter how many empty relations there are. We can always say the sub-lattice from DUM
down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can define DUM
.
(As an aside: I could see that a structure containing only DEE
and empty relations would be problematic to model. Furthermore a structure containing only singleton and empty relations would be problematic -- that is, all attribute types containing only a single possible value. So if I could model a structure in which for each non-empty heading there's at least two possible relation values, I'd be happy enough. I'd even (reluctantly) give away the possibility of a model with only DEE, DUM
if I could get the rest to work.)
Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.
Aside from the trivial DEE, DUM
structure, no required structures are Boolean Algebras alone. So thank you for the proof that NAND, NOR
alone are not adequate.
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.
Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce RENAME
to other basic operations -- the other operations still need RENAME
or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.
In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).
I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of meet, join
is defined via the absorption laws. I merely equivalently defined InnerUnion
(lattice join
) in terms of NatJoin
(lattice meet
), to demonstrate the technique before using it to 'fix' NotMatching
-- or so I hoped. From my definition of InnerUnion
you can mechanically derive the absorption laws, and all the properties of InnerUnion
.
For example I can give a quadratic equation as ax
2
+ bx + c = 0
, no "derived operation alone on the left hand side" there (or on either side), and from that derive the solution for x
via equational reasoning.
Quote from AntC on May 14, 2021, 3:42 amQuote from dandl on May 12, 2021, 1:41 pm...No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin.
No AntiJoin (by which I take it you mean AntiSemiJoin) returns a relation with heading same as its left operand. It doesn't remove (nor add) any attributes.
Easy if you start with negation.
I'm not aiming for 'easy'; I'm aiming for expressively complete. Neither am I aiming for a 'party game' of reducing to a single operator (see my recent post). But if there's more than one operator, we need to define how they interact -- and in an equationally complete way. And if you get to a sufficiently robust set of definitions, you find you could anyway do it with only one operator -- that is, providing your domain has a single 'sort'. In contrast, A has at least two 'sort's: relations and attribute-comma-lists. Now you need operators to get from a relation to the a-c-l for its heading and back.
Quote from dandl on May 13, 2021, 8:41 amOnly one primitive operator
InnerUnion
has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as isNatJoin
. Unfortunately,InnerUnion
is not distributive overNatJoin
, nor vice versa.- But (ANNOUNCE) if we could define
InnerUnion
in terms ofNatJoin
(embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.- Furthermore (further ANNOUNCE) if we could define
PROJECT/REMOVE
,NotMatching
in terms ofNatJoin
(embedding etc) then we can reason all the way round equations.At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about
emptify
ing relations just to get their heading -- or even to evaluate the relational expression argument toemptify( )
! It's an algebra purely for reasoning equationally about the equivalence of queries.No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND).
We don't need our initial set of operators to include some form of negation. Consider a definition for dyadic operator arithmetic minus (
-
), or more relevantly set difference (\
):
x - y =df z s.t. y + z = x
.x \ y =df z s.t. (y ∩ x) ∪ z = x & y ∩ z = ∅
.- Note the definition for set difference needs some tricky conditions because of the possibility
y
contains elements not inx
; and we must forbidz
from containing elements already iny
.- Perhaps you're puzzled by the format I've presented definitions in my o.p.? rather than that
z s.t. ...
form, I could replacez
by its definiendum (substitution of equals) so go
y + (x - y) = x
(y ∩ x) ∪ (x \ y) = x & y ∩ (x \ y) = ∅
And then I could split that last conjunction into two separate axioms.This fixes
-
and\
uniquely for allx, y
. There's no need to go via monadic minus; indeed we could definenegate x =df 0 - x ≡ z s.t. x + z = 0
. Because0
is identity for+
-- just asDEE
is identity forNatJoin
.Appendix A claims
NAND
could replace only<AND>, <OR>, <NOT>
; it doesn't go near project/REMOVE, COMPOSE
. So that's a non-starter. (Also it would introduce domain-dependence, which gives me a queasy feeling.)It says:
"In fact, we will show in the next section that we do not really
need RENAME either; thus, we could in fact reduce our algebra
still further to just the two operators REMOVE and either
NOR or NAND (plus TCLOSE)."I don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.
wrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)Project/Remove is simply InnerUnion with an empty relation representing the new heading.
Project is that.
REMOVE
is not. I did provide an equivalent definition of project as aREMOVE
of aREMOVE
, but that's not sufficient to 'fix' either without some extra machinery to talk about headings-as-sets. Also, as it turns out, I haven't yet fixed how to get an empty relation. ((r NAND r)
would deliver that, but then how do I integrateNAND
into the lattice operations? Or perhaps there's another approach which isn't at all lattice-based? You're welcome to try. Making the overall set of operations 'solvable' enough to express side-conditions and infer equivalence of queries raises a very high bar. Appendix A doesn't reach it.)...
My attempts at such definitions have foundered in the past because they relied on
DUM
being defined; and I couldn't obtain a definition forDUM
that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)The definition for
DEE
hadn't suffered that difficulty, because we can fixDEE
as the identity element forNatJoin
. And it turns out (claim not yet demonstrated) we can define the other operators without mentioningDUM
; then defineDUM
from them:
DEE =df s s.t. ∀ r. s NatJoin r = r
.DUM =df DEE NotMatching DEE
.DUM =df ∀ r. (r NotMatching r) REMOVE r
equivalently∀ r. (r REMOVE r) NotMatching r
.- (Yes, that's three equivalent definitions for
DUM
-- proven equivalent by equational reasoning.)Seems to me it shouldn't be too hard to Dee and Dum, but only if you can negate.
DEE
is easy/doesn't need negate.DUM = DEE NotMatching DEE
turns out to have problems/is not unique enough, as I discovered. Lattice algebras tend to presume the structure is distributive, complemented unless you give a strong instruction otherwise -- which is my 'ugly' axiom.I have no idea what this means, or why I should care. My aim was to point out that I don't see any solution with a single primitive unless it's some kind of negation, like NAND. Whether or not you get a 'nice' DUM or not this way, I think you won't get there any other way.
...
No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.
AntiJoin could be defined as the negation of
MATCHING
; butr1 MATCHING r2 = r1 NatJoin r2 {attributes-from-r1}
, so you still need projectionAntiJoin is cognate with Not Matching: same heading, result is the complement of SemiJoin (aka Matching). I prefer to avoid the distinctive/idiosyncratic TD names for operators, they come with too much baggage.
I prefer to avoid the SQL names, because they come with too much baggage -- especially around
Null
and duplicate rows. If forced, I prefer 'AntiSemiJoin' to 'AntiJoin' (because the 'Anti' in 'AntiJoin' could be anti- some of the other behaviour of Join). The TTM/Tutorial D names I find just perfect. I'm usingNatJoin
rather than bareJOIN
, only because we're in latticeland, andNatJoin
corresponds to latticemeet
.But I have to say I'm not optimistic about your quest. In highly informal terms, you can have an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite, but a single operator to do all that seems far-fetched. More power if you can get there.
Off-topic re "an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite", one of the early (1994 IIRC, although I discovered it much later) attempts at update-through-view by Date and McGoveran noted that equivalence, and tried to take advantage of it by 'reverse engineering' a view definition back to its based-on relvars. The attempt failed abjectly, and ended up making stipulations in which that equivalence didn't hold. I.e. update-through-
INTERSECTION
behaved differently vs update-through-MINUS
-of-MINUS
.I concede it's difficult to get the stipulations to work out. But it isn't difficult to check whether what you've stipulated actually does works out, when you've started by insisting it must work out.
I nearly abandoned TTM at that point, on grounds it seemed to be run by idiots. End of off-topic.
Quote from dandl on May 12, 2021, 1:41 pm...
No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin.
No AntiJoin (by which I take it you mean AntiSemiJoin) returns a relation with heading same as its left operand. It doesn't remove (nor add) any attributes.
Easy if you start with negation.
I'm not aiming for 'easy'; I'm aiming for expressively complete. Neither am I aiming for a 'party game' of reducing to a single operator (see my recent post). But if there's more than one operator, we need to define how they interact -- and in an equationally complete way. And if you get to a sufficiently robust set of definitions, you find you could anyway do it with only one operator -- that is, providing your domain has a single 'sort'. In contrast, A has at least two 'sort's: relations and attribute-comma-lists. Now you need operators to get from a relation to the a-c-l for its heading and back.
Quote from dandl on May 13, 2021, 8:41 amOnly one primitive operator
InnerUnion
has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as isNatJoin
. Unfortunately,InnerUnion
is not distributive overNatJoin
, nor vice versa.- But (ANNOUNCE) if we could define
InnerUnion
in terms ofNatJoin
(embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.- Furthermore (further ANNOUNCE) if we could define
PROJECT/REMOVE
,NotMatching
in terms ofNatJoin
(embedding etc) then we can reason all the way round equations.At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about
emptify
ing relations just to get their heading -- or even to evaluate the relational expression argument toemptify( )
! It's an algebra purely for reasoning equationally about the equivalence of queries.No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND).
We don't need our initial set of operators to include some form of negation. Consider a definition for dyadic operator arithmetic minus (-
), or more relevantly set difference (\
):
x - y =df z s.t. y + z = x
.x \ y =df z s.t. (y ∩ x) ∪ z = x & y ∩ z = ∅
.- Note the definition for set difference needs some tricky conditions because of the possibility
y
contains elements not inx
; and we must forbidz
from containing elements already iny
. - Perhaps you're puzzled by the format I've presented definitions in my o.p.? rather than that
z s.t. ...
form, I could replacez
by its definiendum (substitution of equals) so go
y + (x - y) = x
(y ∩ x) ∪ (x \ y) = x & y ∩ (x \ y) = ∅
And then I could split that last conjunction into two separate axioms.
This fixes -
and \
uniquely for all x, y
. There's no need to go via monadic minus; indeed we could define negate x =df 0 - x ≡ z s.t. x + z = 0
. Because 0
is identity for +
-- just as DEE
is identity for NatJoin
.
Appendix A claims
NAND
could replace only<AND>, <OR>, <NOT>
; it doesn't go near project/REMOVE, COMPOSE
. So that's a non-starter. (Also it would introduce domain-dependence, which gives me a queasy feeling.)It says:
"In fact, we will show in the next section that we do not really
need RENAME either; thus, we could in fact reduce our algebra
still further to just the two operators REMOVE and either
NOR or NAND (plus TCLOSE)."
I don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.
wrt REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalent ALL BUT
to REMOVE
. It is also hand-wavey about how to get from COMPOSE
to the needed REMOVE
: how does the algebra calculate the attributes in common between two operands being COMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands to COMPOSE
.)
Project/Remove is simply InnerUnion with an empty relation representing the new heading.
Project is that. REMOVE
is not. I did provide an equivalent definition of project as a REMOVE
of a REMOVE
, but that's not sufficient to 'fix' either without some extra machinery to talk about headings-as-sets. Also, as it turns out, I haven't yet fixed how to get an empty relation. ( (r NAND r)
would deliver that, but then how do I integrate NAND
into the lattice operations? Or perhaps there's another approach which isn't at all lattice-based? You're welcome to try. Making the overall set of operations 'solvable' enough to express side-conditions and infer equivalence of queries raises a very high bar. Appendix A doesn't reach it.)
...
My attempts at such definitions have foundered in the past because they relied on
DUM
being defined; and I couldn't obtain a definition forDUM
that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)The definition for
DEE
hadn't suffered that difficulty, because we can fixDEE
as the identity element forNatJoin
. And it turns out (claim not yet demonstrated) we can define the other operators without mentioningDUM
; then defineDUM
from them:
DEE =df s s.t. ∀ r. s NatJoin r = r
.DUM =df DEE NotMatching DEE
.DUM =df ∀ r. (r NotMatching r) REMOVE r
equivalently∀ r. (r REMOVE r) NotMatching r
.- (Yes, that's three equivalent definitions for
DUM
-- proven equivalent by equational reasoning.)Seems to me it shouldn't be too hard to Dee and Dum, but only if you can negate.
DEE
is easy/doesn't need negate.DUM = DEE NotMatching DEE
turns out to have problems/is not unique enough, as I discovered. Lattice algebras tend to presume the structure is distributive, complemented unless you give a strong instruction otherwise -- which is my 'ugly' axiom.I have no idea what this means, or why I should care. My aim was to point out that I don't see any solution with a single primitive unless it's some kind of negation, like NAND. Whether or not you get a 'nice' DUM or not this way, I think you won't get there any other way.
...
No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.
AntiJoin could be defined as the negation of
MATCHING
; butr1 MATCHING r2 = r1 NatJoin r2 {attributes-from-r1}
, so you still need projectionAntiJoin is cognate with Not Matching: same heading, result is the complement of SemiJoin (aka Matching). I prefer to avoid the distinctive/idiosyncratic TD names for operators, they come with too much baggage.
I prefer to avoid the SQL names, because they come with too much baggage -- especially around Null
and duplicate rows. If forced, I prefer 'AntiSemiJoin' to 'AntiJoin' (because the 'Anti' in 'AntiJoin' could be anti- some of the other behaviour of Join). The TTM/Tutorial D names I find just perfect. I'm using NatJoin
rather than bare JOIN
, only because we're in latticeland, and NatJoin
corresponds to lattice meet
.
But I have to say I'm not optimistic about your quest. In highly informal terms, you can have an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite, but a single operator to do all that seems far-fetched. More power if you can get there.
Off-topic re "an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite", one of the early (1994 IIRC, although I discovered it much later) attempts at update-through-view by Date and McGoveran noted that equivalence, and tried to take advantage of it by 'reverse engineering' a view definition back to its based-on relvars. The attempt failed abjectly, and ended up making stipulations in which that equivalence didn't hold. I.e. update-through-INTERSECTION
behaved differently vs update-through-MINUS
-of-MINUS
.
I concede it's difficult to get the stipulations to work out. But it isn't difficult to check whether what you've stipulated actually does works out, when you've started by insisting it must work out.
I nearly abandoned TTM at that point, on grounds it seemed to be run by idiots. End of off-topic.
Quote from dandl on May 14, 2021, 4:14 amQuote from Vadim on May 13, 2021, 4:27 pmQuote from AntC on May 13, 2021, 11:53 amQuote from Vadim on May 12, 2021, 5:28 pmConsider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element
r1
within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given byr1 NotMatching r1
.Sub-Sub-Lattice complement of
r2
coincides withr1 NotMatching r2
. IOW ifr2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1
thens = r1 NotMatching r2
. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice fromr1
. That still doesn't fix enough, without the 'ugly' axiom.The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.
The other [rather artificial] example is the database with all the empty relations.
Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.
I think I would be right in saying that a single operator is enough, but it still needs an argument domain, the set of boolean values.
In the same way there might be one, two or more operators in a minimal RA, but they are not responsible for the creation of relational values such as Dee and Dum, which are already part of the domain.
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons. In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
Quote from Vadim on May 13, 2021, 4:27 pmQuote from AntC on May 13, 2021, 11:53 amQuote from Vadim on May 12, 2021, 5:28 pmConsider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element
r1
within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given byr1 NotMatching r1
.Sub-Sub-Lattice complement of
r2
coincides withr1 NotMatching r2
. IOW ifr2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1
thens = r1 NotMatching r2
. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice fromr1
. That still doesn't fix enough, without the 'ugly' axiom.The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.
The other [rather artificial] example is the database with all the empty relations.
Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.
I think I would be right in saying that a single operator is enough, but it still needs an argument domain, the set of boolean values.
In the same way there might be one, two or more operators in a minimal RA, but they are not responsible for the creation of relational values such as Dee and Dum, which are already part of the domain.
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons. In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
Quote from dandl on May 14, 2021, 7:13 amI don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.
OK. The way I read App-A is:
- chatty discussion with various asides around the idea of a 'core' RA
- formal definitions for 5 operators: NOT, REMOVE, RENAME, AND, OR (but not TCLOSE)
- a foray into the idea of relcons as s source of new values, but no formal definitions.
I find the formal definitions to be correct, the rest to be thoughts experiments/work in progress. I have completed the work to my satisfaction with formal definitions based on relcons, plus while to replace TCLOSE. I have not found any other formal treatment of the entire extended RA, so this is what I work from.
wrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)It's mostly hand-wavey, but the formal definitions stand on their own.
As to the remainder of your work, you are following a path I don't understand to a destination I cannot see. Best if I just lurk a while.
I don't know how to break this to you, but parts of Appendix A are wrong -- or at least their claims are greatly overstated. This doesn't take away from Appendix A being a great teaching resource to lead learners away from SQL's baroque set of operators and operations to the chill mountain stream of a minimalist algebra.
OK. The way I read App-A is:
- chatty discussion with various asides around the idea of a 'core' RA
- formal definitions for 5 operators: NOT, REMOVE, RENAME, AND, OR (but not TCLOSE)
- a foray into the idea of relcons as s source of new values, but no formal definitions.
I find the formal definitions to be correct, the rest to be thoughts experiments/work in progress. I have completed the work to my satisfaction with formal definitions based on relcons, plus while to replace TCLOSE. I have not found any other formal treatment of the entire extended RA, so this is what I work from.
wrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)
It's mostly hand-wavey, but the formal definitions stand on their own.
As to the remainder of your work, you are following a path I don't understand to a destination I cannot see. Best if I just lurk a while.
Quote from Erwin on May 14, 2021, 4:42 pmQuote from AntC on May 14, 2021, 3:42 amwrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.
(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do. There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set. Creates >1 distinct equivalence classes, that is.)
That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.
Quote from AntC on May 14, 2021, 3:42 am
wrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)
The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.
(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do. There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set. Creates >1 distinct equivalence classes, that is.)
That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.
Quote from Vadim on May 14, 2021, 4:49 pmQuote from AntC on May 14, 2021, 12:36 amQuote from Vadim on May 13, 2021, 4:27 pmThe other [rather artificial] example is the database with all the empty relations.
No that's not a valid lattice structure to model relations: we must always have
DEE
, no matter how many empty relations there are. We can always say the sub-lattice fromDUM
down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can defineDUM
.In the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.
Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.
Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce
RENAME
to other basic operations -- the other operations still needRENAME
or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.Page 11 here: https://books.google.com/books?id=siNlAn8Bm30C&pg=PA1&source=gbs_toc_r&cad=3#v=onepage&q&f=false
In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).
I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of
meet, join
is defined via the absorption laws. I merely equivalently definedInnerUnion
(latticejoin
) in terms ofNatJoin
(latticemeet
), to demonstrate the technique before using it to 'fix'NotMatching
-- or so I hoped. From my definition ofInnerUnion
you can mechanically derive the absorption laws, and all the properties ofInnerUnion
.For example I can give a quadratic equation as
ax
2
+ bx + c = 0
, no "derived operation alone on the left hand side" there (or on either side), and from that derive the solution forx
via equational reasoning.Then, it is terminology confusion, and you need to clarify what do you mean by "primitive operator". Analogies will get you only so far.
Quote from AntC on May 14, 2021, 12:36 amQuote from Vadim on May 13, 2021, 4:27 pmThe other [rather artificial] example is the database with all the empty relations.
No that's not a valid lattice structure to model relations: we must always have
DEE
, no matter how many empty relations there are. We can always say the sub-lattice fromDUM
down to lattice bottom must be a Boolean algebra (as the powerset of all possible attribute-type pairs, formed into headings). That is, if we can defineDUM
.
In the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.
Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operatons.
Please give a citation for that claim. I think you just made it up. Appendix A (tries to) proceed in that way. But actually fails: for example it does not reduce
RENAME
to other basic operations -- the other operations still needRENAME
or some further mechanism (not specified), as Philip Kelly pointed out many moons ago -- he proposed a mechanism 'equi-relations'.
Page 11 here: https://books.google.com/books?id=siNlAn8Bm30C&pg=PA1&source=gbs_toc_r&cad=3#v=onepage&q&f=false
In other words, you are not allowed to use the First Order Logic sentenses and specify a constraint on the derived operation this way; you have to exhibit an equality with the derived operation alone on the left hand side.
No. You are just making it up. I am not trying to produce an 'executable' algebra. I'm aiming for a 'solvable' algebra -- in the sense a theorem prover (or human) can infer if two queries are equivalent (will produce the same result over all possible input values, providing initial assumptions hold -- the algebra must be powerful enough to express those assumptions).
I'm not obsessing about a 'party game' of minimising the count of primitives. If we have many primitives, we still have to provide equations to define how they interact, so we can manipulate query equations to prove equivalences. This is just schoolroom solving of 'simultaneous equations'. In conventional lattice treatments, the interaction of
meet, join
is defined via the absorption laws. I merely equivalently definedInnerUnion
(latticejoin
) in terms ofNatJoin
(latticemeet
), to demonstrate the technique before using it to 'fix'NotMatching
-- or so I hoped. From my definition ofInnerUnion
you can mechanically derive the absorption laws, and all the properties ofInnerUnion
.For example I can give a quadratic equation as
ax
2
+ bx + c = 0
, no "derived operation alone on the left hand side" there (or on either side), and from that derive the solution forx
via equational reasoning.
Then, it is terminology confusion, and you need to clarify what do you mean by "primitive operator". Analogies will get you only so far.
Quote from Erwin on May 14, 2021, 5:36 pmQuote from Vadim on May 14, 2021, 4:49 pmIn the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.
Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)
I do remember having had some correspondence, but I no longer remember any of the detail. That said,
DUM === {} , DEE === {{}}
DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.
Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.
Quote from Vadim on May 14, 2021, 4:49 pmIn the database of all the empty relations DUM=DEE, otherwise you'll break some RL axioms.
Yes, it is counterintuitive, because the TTM stipulates DEE containing one tuple, and there is no way to satify this condition with empty relations. (I vaguely remember correspondence with Ervin who pointed out holes in this set based definition.)
I do remember having had some correspondence, but I no longer remember any of the detail. That said,
DUM === {} , DEE === {{}}
DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.
Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.
Quote from Vadim on May 14, 2021, 6:38 pmQuote from Erwin on May 14, 2021, 5:36 pmDUM === {} , DEE === {{}}
DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.
Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.
A relation is a pair, not a set. Therefore, you must have meant
[latex] DUM \leftrightarrow \langle \{\}, \{\}\rangle [/latex]
[latex] DEE \leftrightarrow \langle \{\}, \{ \langle\rangle \}\rangle [/latex]
Then, even if you use Kuratovski pair for binary tuple, you still have to define what is a zilliary tuple. (BTW, Kuratovski pair has virtually no utility in mathematics). Also, in contemporary mathematics you never compare the two sets verbatim. Most often they are considered to be the "same" up to some equivalence. This idea is manifested in Anthony's presentation, where DEE and DUM are considered as couple of objects "fixed" inside an algebraic structure. Anthony and myself, however, disagree how successful those axiomatic definitions are in "fixing" the DUM, and even whether "fixing" an object is really a big deal.
Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor. I'd suggest that the database theory is still far from being cast in stone. Witness novel query languages appearing:
LARA (AKA LaraDB) which has incorporated some ideas from Relational Lattices.
Graphical Conjunctive Queries. This paper is remarkable because the diagrammatic expression language is well established in quantum computation arena. (See the Coecke & Kissinger textbook for elementary, category-free exposition)
Quote from Erwin on May 14, 2021, 5:36 pmDUM === {} , DEE === {{}}
DUM is the empty set of empty sets, DEE is the set that has the empty set [empty set of tuples, that is] for its member.
Requiring the empty set to be equal to a nonempty set is the most perfect example of mathematical Harakiri I can imagine.
A relation is a pair, not a set. Therefore, you must have meant
DUM \leftrightarrow \langle \{\}, \{\}\rangle
DEE \leftrightarrow \langle \{\}, \{ \langle\rangle \}\rangle
Then, even if you use Kuratovski pair for binary tuple, you still have to define what is a zilliary tuple. (BTW, Kuratovski pair has virtually no utility in mathematics). Also, in contemporary mathematics you never compare the two sets verbatim. Most often they are considered to be the "same" up to some equivalence. This idea is manifested in Anthony's presentation, where DEE and DUM are considered as couple of objects "fixed" inside an algebraic structure. Anthony and myself, however, disagree how successful those axiomatic definitions are in "fixing" the DUM, and even whether "fixing" an object is really a big deal.
Regarding your critique of Anthony's motivation, you shall not dismiss curiosity factor. I'd suggest that the database theory is still far from being cast in stone. Witness novel query languages appearing:
LARA (AKA LaraDB) which has incorporated some ideas from Relational Lattices.
Graphical Conjunctive Queries. This paper is remarkable because the diagrammatic expression language is well established in quantum computation arena. (See the Coecke & Kissinger textbook for elementary, category-free exposition)
Quote from AntC on May 14, 2021, 10:30 pmQuote from Erwin on May 14, 2021, 4:42 pmQuote from AntC on May 14, 2021, 3:42 amwrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.
(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do. There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set. Creates >1 distinct equivalence classes, that is.)
That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.
The problem is you're speaking as an implementor, not as an algebraist. The compiler's process you describe (which is indeed obvious) uses mechanisms for which there's no support in the algebra. Note the operands to
COMPOSE
might be arbitrary relational expressions, so an algebra can't 'read off' the attribute names from a relvar declaration.The algebra does not include an operation like
attribute-names-from( )
; neither does it specify the result of aREMOVE
where the attribute-comma-list includes names not in the heading ofREMOVE
's relation operand. Neither is there an algebraic operation to get the intersection of twoattribute-names-from( )
, as needed forCOMPOSE
'sREMOVE
.
Quote from Erwin on May 14, 2021, 4:42 pmQuote from AntC on May 14, 2021, 3:42 amwrt
REMOVE
, Appendix A is hand-wavey about how to get from a projection in Tutorial D to an equivalentALL BUT
toREMOVE
. It is also hand-wavey about how to get fromCOMPOSE
to the neededREMOVE
: how does the algebra calculate the attributes in common between two operands beingCOMPOSE
d? (IOW what is a "macro" operator in this sense? That goes back to your extended discussion about macros on another thread, and at least needs some sort of program reflection to run after the compiler has inferred the heading of the expressions that are operands toCOMPOSE
.)The compiler knows about the headings of the operands, and perhaps holds that information in the form of a relation with heading {attributename:attributenamedomain, typename:typenamedomain} and applies, for example, semijoin or semiminus, on the two relation values at hand, to reliably and deterministically compute the results you say they are "hand-wavey" about.
(IOW a "macro" operator in this sense is some operator on some data types that happens to be used by the compiler for doing the job it is supposed to do. There is no implication that within the set of all operators used by either the compiler or any of the application programs, "being a macro operator" (thus giving rise to assessable predicates like "X is a 'macro' operator and Y is too") is a characteristic that creates equivalence classes within that set. Creates >1 distinct equivalence classes, that is.)
That all seems so blatantly obvious and trivial that I'm wondering what the hell the problem is.
The problem is you're speaking as an implementor, not as an algebraist. The compiler's process you describe (which is indeed obvious) uses mechanisms for which there's no support in the algebra. Note the operands to COMPOSE
might be arbitrary relational expressions, so an algebra can't 'read off' the attribute names from a relvar declaration.
The algebra does not include an operation like attribute-names-from( )
; neither does it specify the result of a REMOVE
where the attribute-comma-list includes names not in the heading of REMOVE
's relation operand. Neither is there an algebraic operation to get the intersection of two attribute-names-from( )
, as needed for COMPOSE
's REMOVE
.