ANNOUNCE: a relational algebra with only one primitive operator (tentative) (please check/critique)
Quote from AntC on May 12, 2021, 11:46 am
[It'll take a few edits to write this up. Draft for now; I'll remove this message when done.]
Motivation
We'd like an algebra in which we can reason algebraically about the equivalence of queries. For example:
- When is it equivalent to 'push restrict through (natural) Join'? The A algebra goes some way to answer that. It reduces surface-language WHERE STATUS = 10 to <AND> REL{TUP{ STATUS 10 }}; then we can reason from the associativity of <AND>.
- When is it equivalent to 'push project through Join'? Here the A algebra has no answer: project/REMOVE are not susceptible to equational reasoning, because their attribute-comma-list is not first-class: there are no operators to manipulate lists of attributes, to obtain them from relation(al expression)s, etc. Only literals are allowed.
- And yet A at several points wants to manipulate attribute-lists: project in the surface language is to be implemented via REMOVE (of the complement of those attributes, relative to the relation being projected over); of COMPOSE, "we have allowed ourselves the luxury (some might think) of including a "macro" operator" -- what do the scare-quotes indicate? How could any "macro" look at the headings of COMPOSE's operands and infer the attributes in common?
- (I'm not 'picking on' A specifically; it's just that D&D are admirably clear about requirements. Codd's algebra suffers the same difficulties. Whatever optimisation/equivalences/manipulation goes on inside SQL engines and their internal 'algebra' is similarly not easy to reason with.)
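As a concrete illustration of the restrict-as-join reduction, here's a toy Python sketch of my own (not from A): a relation body is modelled as a set of tuples, each tuple a frozenset of (attribute, value) pairs, and WHERE STATUS = 10 becomes a natural join with the singleton relation REL{TUP{ STATUS 10 }}:

```python
# Toy model: a relation body is a frozenset of tuples; each tuple is a
# frozenset of (attribute, value) pairs. Headings are left implicit here.
def nat_join(r, s):
    """Natural join: combine tuples that agree on their common attributes."""
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset((d1 | d2).items()))
    return frozenset(out)

SUPPLIERS = frozenset({
    frozenset({('SNO', 'S1'), ('STATUS', 10)}),
    frozenset({('SNO', 'S2'), ('STATUS', 20)}),
})
# WHERE STATUS = 10  ==>  NatJoin with the singleton REL{TUP{ STATUS 10 }}
RESTRICT = frozenset({frozenset({('STATUS', 10)})})

restricted = nat_join(SUPPLIERS, RESTRICT)
# Pushing the restrict through a join is then just re-association:
# (r NatJoin s) NatJoin RESTRICT = r NatJoin (s NatJoin RESTRICT)
```

Only the tuple for S1 survives the join with the singleton, matching the effect of the WHERE clause.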
We could introduce machinery into the algebra to extract the heading of a relation(al expression) to an attribute list, and provide set operators to manipulate the list. But
- That gives two 'sorts' (in the mathematical sense) of entities to reason about, with a tricky inter-relationship as they shuttle from heading to set operator back to operand of relational operator.
- We already have relational operators that carry out some set operations on headings: specifically, <AND> and <OR> take the union of headings.
- So several on this forum have already advocated using relation values to denote their headings when appearing as operands in positions that want a heading -- that is, the operator will ignore the relation value's content, and/or the algebra will 'emptify' the relation.
(Empty) Relations denoting headings
Consider Tropashko's 'Inner Union', aka SQL's UNION CORRESPONDING, aka ISBL UNION:
- It projects each operand's body on the attributes in common; then forms the set-union of those projections/bodies.
- Then we can define:
r PROJECT s =df r InnerUnion emptify(s)
r COMPOSE s =df (r NatJoin s) REMOVE emptify(r InnerUnion s)
r PROJECT s =df r REMOVE emptify(r REMOVE s)   -- alternative definition
emptify(r) =df r NotMatching r
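To make these definitions concrete, here's a toy Python model of my own (headings carried explicitly, so empty relations keep their heading), with a check that r PROJECT s =df r InnerUnion emptify(s) agrees with ordinary projection:

```python
# A relation is (heading, body): heading a frozenset of attribute names,
# body a frozenset of tuples, each a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return (frozenset(heading), frozenset(frozenset(r.items()) for r in rows))

def proj_tuple(t, h):
    return frozenset((a, v) for a, v in t if a in h)

def nat_join(r, s):
    common = r[0] & s[0]
    body = {t1 | t2 for t1 in r[1] for t2 in s[1]
            if proj_tuple(t1, common) == proj_tuple(t2, common)}
    return (r[0] | s[0], frozenset(body))

def inner_union(r, s):
    h = r[0] & s[0]  # the attributes in common
    return (h, frozenset(proj_tuple(t, h) for t in r[1] | s[1]))

def not_matching(r, s):
    common = r[0] & s[0]
    body = {t for t in r[1]
            if not any(proj_tuple(t, common) == proj_tuple(u, common)
                       for u in s[1])}
    return (r[0], frozenset(body))

def emptify(r):           # r NotMatching r: same heading, empty body
    return not_matching(r, r)

def project(r, attrs):    # ordinary projection, for comparison only
    h = r[0] & frozenset(attrs)
    return (h, frozenset(proj_tuple(t, h) for t in r[1]))

S = rel({'SNO', 'STATUS'}, {'SNO': 'S1', 'STATUS': 10}, {'SNO': 'S2', 'STATUS': 20})
mask = rel({'STATUS'})    # any relation with the wanted heading will do
assert inner_union(S, emptify(mask)) == project(S, {'STATUS'})
```

Note that only mask's heading matters: emptify discards its body, which is exactly the 'relation value denoting its heading' idea above.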
Only one primitive operator
InnerUnion has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- as indeed is NatJoin. Unfortunately, InnerUnion is not distributive over NatJoin, nor vice versa.
- But (ANNOUNCE) if we could define InnerUnion in terms of NatJoin (embedding the definition in the machinery of FOPL with equality), then we can reason around equations using both operators.
- Furthermore (further ANNOUNCE) if we could define PROJECT/REMOVE and NotMatching in terms of NatJoin (embedding etc.), then we can reason all the way round equations.
At this juncture, I'd better point out that this is not intended as an 'executable' algebra, any more than A is. And it would be horribly inefficient to go about emptify-ing relations just to get their heading -- or even to evaluate the relational-expression argument to emptify( )! It's an algebra purely for reasoning equationally about the equivalence of queries.
My attempts at such definitions have foundered in the past because they relied on DUM being defined; and I couldn't obtain a definition for DUM that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values, unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)
The definition for DEE hadn't suffered that difficulty, because we can fix DEE as the identity element for NatJoin. And it turns out (claim not yet demonstrated) we can define the other operators without mentioning DUM; then define DUM from them:
DEE =df s s.t. ∀ r. s NatJoin r = r
DUM =df DEE NotMatching DEE
DUM =df ∀ r. (r NotMatching r) REMOVE r, equivalently ∀ r. (r REMOVE r) NotMatching r
- (Yes, that's three equivalent definitions for DUM -- proven equivalent by equational reasoning.)
Defining InnerUnion in terms of NatJoin
In the formalism of lattices, these two operators are complementary, and that can be expressed equationally in the absorption laws. Or we can equivalently define InnerUnion in terms of NatJoin, then obtain the absorption laws and the other properties of InnerUnion by equational reasoning:
r InnerUnion (r NatJoin s) = r   -- InnerUnion absorbs NatJoin
r NatJoin (r InnerUnion s) = r   -- NatJoin absorbs InnerUnion
∀ r1 r2 s. (r1 = (r1 NatJoin s) & r2 = (r2 NatJoin s)) ≡ ((r1 InnerUnion r2) = (r1 InnerUnion r2) NatJoin s)
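These laws are easy to spot-check in a toy model of my own (a relation as a (heading, body) pair, tuples as frozensets of (attribute, value) pairs); this is only a sanity check on small examples, not a proof:

```python
# Toy model: a relation is (heading, body); a tuple is a frozenset
# of (attribute, value) pairs.
def rel(heading, *rows):
    return (frozenset(heading), frozenset(frozenset(r.items()) for r in rows))

def proj_tuple(t, h):
    return frozenset((a, v) for a, v in t if a in h)

def nat_join(r, s):
    common = r[0] & s[0]
    body = {t1 | t2 for t1 in r[1] for t2 in s[1]
            if proj_tuple(t1, common) == proj_tuple(t2, common)}
    return (r[0] | s[0], frozenset(body))

def inner_union(r, s):
    h = r[0] & s[0]
    return (h, frozenset(proj_tuple(t, h) for t in r[1] | s[1]))

DEE = rel(set(), {})   # empty heading, one (empty) tuple

r = rel({'A', 'B'}, {'A': 1, 'B': 2}, {'A': 1, 'B': 3})
s = rel({'B', 'C'}, {'B': 2, 'C': 5})

assert nat_join(DEE, r) == r                    # DEE is the identity for NatJoin
assert inner_union(r, s) == inner_union(s, r)   # commutative
assert inner_union(r, r) == r                   # idempotent
assert inner_union(r, nat_join(r, s)) == r      # InnerUnion absorbs NatJoin
assert nat_join(r, inner_union(r, s)) == r      # NatJoin absorbs InnerUnion
```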
(A uniqueness proof is available for InnerUnion defined this way -- verified with good old Prover9.)
- Having defined DEE as the identity for NatJoin, we now have a half-bounded lattice with DEE as top. In fact we have a fully-bounded lattice, but there's no cute name for the lattice bottom (I call it Dumpty), and it's a rather vertiginous empty relation with the maximum possible heading; I haven't needed to mention it in the definitions, so it'll play no further part.
OK, that's all the lattice theory I'm going to introduce -- indeed I needn't have let on it was anything to do with lattices; you can just go by the equations.
The equational technique to note in that equation for InnerUnion is that there are many candidate relations s s.t. r1 = (r1 NatJoin s) & r2 = (r2 NatJoin s); s = DEE would be a solution; (r1 InnerUnion r2) is fixed to be the bottommost such s / the closest to r1, r2 / the furthest from DEE.
Defining NotMatching in terms of NatJoin, InnerUnion
Noting (from Tropashko) that (r1 NatJoin r2) InnerUnion (r1 NotMatching r2) = r1: in general there are many candidates s s.t. (r1 NatJoin r2) InnerUnion s = r1, including s = r1. This uses the same equational technique, requiring the NotMatching to be the bottommost such s / the furthest from r1:
r1 NatJoin s = s & s NatJoin (r1 NotMatching r2) = (r1 NotMatching r2) ≡ (r1 NatJoin r2) InnerUnion s = r1
TentativeDUM =df DEE NotMatching DEE
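Tropashko's identity, and the non-uniqueness of s that motivates the 'bottommost such s' requirement, can be spot-checked in the same toy model (my own sketch, small examples only):

```python
# Toy model: a relation is (heading, body); a tuple is a frozenset
# of (attribute, value) pairs.
def rel(heading, *rows):
    return (frozenset(heading), frozenset(frozenset(r.items()) for r in rows))

def proj_tuple(t, h):
    return frozenset((a, v) for a, v in t if a in h)

def nat_join(r, s):
    common = r[0] & s[0]
    body = {t1 | t2 for t1 in r[1] for t2 in s[1]
            if proj_tuple(t1, common) == proj_tuple(t2, common)}
    return (r[0] | s[0], frozenset(body))

def inner_union(r, s):
    h = r[0] & s[0]
    return (h, frozenset(proj_tuple(t, h) for t in r[1] | s[1]))

def not_matching(r, s):
    common = r[0] & s[0]
    body = {t for t in r[1]
            if not any(proj_tuple(t, common) == proj_tuple(u, common)
                       for u in s[1])}
    return (r[0], frozenset(body))

r1 = rel({'A', 'B'}, {'A': 1, 'B': 2}, {'A': 3, 'B': 4})
r2 = rel({'B', 'C'}, {'B': 2, 'C': 9})

# Tropashko's identity: join plus anti-join recovers r1
assert inner_union(nat_join(r1, r2), not_matching(r1, r2)) == r1
# ...but s = r1 also satisfies (r1 NatJoin r2) InnerUnion s = r1,
# hence the need to pin NotMatching down as the bottommost such s:
assert inner_union(nat_join(r1, r2), r1) == r1
```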
And now I have to tell of an ugliness. Help requested to find a more elegant equation than the following. Prover9 alleges the above definition is sufficient to make NotMatching unique, and to make TentativeDUM unique. But (using Test-Driven Development) lots of the equational reasoning I expected to be fixed for NotMatching turned out to have non-unique answers. Adding this (ugly) axiom resolved it. (Ugly in the sense of needing a negation, which makes the whole model potentially unstable -- Litak et al., private correspondence.)
(r NotMatching r) ≠ r ≡ r InnerUnion (DEE NotMatching DEE) = DEE
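The axiom can likewise be spot-checked in the toy model (my own sketch): for a non-empty r, projecting on no attributes yields DEE; for an empty r it yields the tentative DUM instead:

```python
# Toy model: a relation is (heading, body); a tuple is a frozenset
# of (attribute, value) pairs.
def rel(heading, *rows):
    return (frozenset(heading), frozenset(frozenset(r.items()) for r in rows))

def proj_tuple(t, h):
    return frozenset((a, v) for a, v in t if a in h)

def inner_union(r, s):
    h = r[0] & s[0]
    return (h, frozenset(proj_tuple(t, h) for t in r[1] | s[1]))

def not_matching(r, s):
    common = r[0] & s[0]
    body = {t for t in r[1]
            if not any(proj_tuple(t, common) == proj_tuple(u, common)
                       for u in s[1])}
    return (r[0], frozenset(body))

DEE = rel(set(), {})                 # empty heading, one (empty) tuple
DUM = not_matching(DEE, DEE)         # the tentative DUM: empty heading, no tuples
assert DUM == (frozenset(), frozenset())

nonempty = rel({'A'}, {'A': 1})
empty = rel({'A'})

# The axiom: (r NotMatching r) ≠ r  ≡  r InnerUnion (DEE NotMatching DEE) = DEE
for r in (nonempty, empty):
    assert (not_matching(r, r) != r) == (inner_union(r, DUM) == DEE)
```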
- Informally: (r NotMatching r) ≠ r means r is non-empty; equivalently, projecting r on no attributes = DEE means r is non-empty.
- The crucial effect of that axiom is to 'skew' the lattice: we can derive that there are no relation values falling between DEE and (DEE NotMatching DEE). Whereas for most r, r and (r NotMatching r) are a long way apart: what's in between is all the relations whose bodies form the powerset of the tuples in r. (BTW that powerset structure forms a (latticeland-conventional) distributive sub-lattice -- as is provable from the axioms; contrast Litak et al., who need a specific axiom to fix that. Perhaps there's some equivalence?)
- So I've postponed defining DUM until REMOVE is also defined; then DUM is defined, as above, as the result from 'generalised NotMatching/REMOVE', which I hope locks the equations together.
Defining REMOVE in terms of NatJoin, InnerUnion, NotMatching, DEE
tba [it's late].
Quote from dandl on May 12, 2021, 1:41 pm
Quote from AntC on May 12, 2021, 11:46 am: "[...] It's an algebra purely for reasoning equationally about the equivalence of queries."
No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND). This choice looks doomed to failure.
Quote: "[...] (Yes, that's three equivalent definitions for DUM -- proven equivalent by equational reasoning.)"
Seems to me it shouldn't be too hard to define Dee and Dum, but only if you can negate.
Quote: "Defining REMOVE in terms of NatJoin, InnerUnion, NotMatching, DEE -- tba [it's late]."
No, I really don't follow. I don't know whether it's the lattices or the toolset, but you lost me here. As I see it, REMOVE is just AntiJoin. Easy if you start with negation.
Care to mention why you didn't follow the NAND lead in App-A?
Quote from AntC on May 12, 2021, 11:46 am[It'll take a few edits to write this up. Draft for now, I'll remove this message when done.]
Motivation
We'd like an algebra in which we can reason algebraically about the equivalence of queries. For example
- When is it equivalent to 'push restrict through (natural) Join'? The A algebra goes some way to answer that. It reduces surface-language
WHERE STATUS = 10
to<AND> REL{TUP{ STATUS 10}}
; then we can reason from the Associativity of<AND>
.- When is it equivalent to 'push project through Join'? But the A algebra has no answer here: project/
REMOVE
are not susceptible to equational reasoning, because theirattribute-comma-list
is not first-class: there's no operators to manipulate lists of attributes, obtain them from relation(al expressions), etc. Only literals are allowed.- And yet A at several points wants to manipulate attribute-lists: project in the surface language is to be implemented via
REMOVE
(of the complement of those attributes, relative to the relation being projected over);COMPOSE
"─we have allowed ourselves the luxury (some might think) of including a "macro" operator", what do the scare-quotes indicate? how could any "macro" look at the headings ofCOMPOSE
's operands and infer the attributes in common?- (I'm not 'picking on' A specifically; it's just that D&D are being admirably clear about requirements. Codd's algebra suffers the same difficulties. Whatever optimisation/equivalences/manipulation goes on inside SQL engines and their internal 'algebra' is similarly not easy to reason with.)
We could introduce machinery into the algebra to extract the heading of a relation(al expression) to an attribute list, and provide set operators to manipulate the list. But
- That gives two 'sorts' (in the mathematical sense) of entities to reason about, with a tricky inter-relationship as they shuttle from heading to set operator back to operand of relational operator.
- We already have relational operators that carry out some set operations on headings: specifically,
<AND>, <OR>
take the union of headings.- So several on this forum have already advocated for using relation values to denote their headings, when appearing as operands in positions that want a heading -- that is, the operator will ignore the relation value's content, and/or the algebra will 'emptify' the relation.
(Empty) Relations denoting headings
Consider Tropashko's 'Inner Union' aka SQL's
UNION CORRESPONDING
aka ISBLUNION
:
- It projects each operand's body on the attributes in common; then forms the set-union of those projections/bodies.
- Then we can define
r PROJECT s =df r InnerUnion emptify(s)
.r COMPOSE s =df (r NatJoin s) REMOVE emptify(r InnerUnion s)
.r PROJECT s =df r REMOVE emptify(r REMOVE s)
-- alternative definition.emptify(r) =df r NotMatching r
.Only one primitive operator
InnerUnion
has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as isNatJoin
. Unfortunately,InnerUnion
is not distributive overNatJoin
, nor vice versa.- But (ANNOUNCE) if we could define
InnerUnion
in terms ofNatJoin
(embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.- Furthermore (further ANNOUNCE) if we could define
PROJECT/REMOVE
,NotMatching
in terms ofNatJoin
(embedding etc) then we can reason all the way round equations.At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about
emptify
ing relations just to get their heading -- or even to evaluate the relational expression argument toemptify( )
! It's an algebra purely for reasoning equationally about the equivalence of queries.
No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND). This choice looks doomed to failure.
My attempts at such definitions have foundered in the past because they relied on
DUM
being defined; and I couldn't obtain a definition forDUM
that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)The definition for
DEE
hadn't suffered that difficulty, because we can fixDEE
as the identity element forNatJoin
. And it turns out (claim not yet demonstrated) we can define the other operators without mentioningDUM
; then defineDUM
from them:
DEE =df s s.t. ∀ r. s NatJoin r = r
.DUM =df DEE NotMatching DEE
.DUM =df ∀ r. (r NotMatching r) REMOVE r
equivalently∀ r. (r REMOVE r) NotMatching r
.- (Yes, that's three equivalent definitions for
DUM
-- proven equivalent by equational reasoning.)
Seems to me it shouldn't be too hard to Dee and Dum, but only if you can negate.
Defining
InnerUnion
in terms ofNatJoin
In the formalism of lattices, these two operators are complementary, and that can be expressed equationally in the absorption laws. Or we can equivalently define
InnerUnion
in terms ofNatJoin
, then obtain the absorption laws, and other properties ofInnerUnion
by equational reasoning:
r InnerUnion (r NatJoin s) = r
-- InnerUnion absorbs NatJoin.r NatJoin (r InnerUnion s) = r
-- NatJoin absorbs InnerUnion.∀ r1 r2 s. (r1 = (r1 NatJoin s) & r2 = (r2 NatJoin s)) ≡ ((r1 InnerUnion r2) = (r1 InnerUnion r2) NatJoin s)
.
(A uniqueness proof is available forInnerUnion
defined this way -- verified with good old Prover9.)- Having defined
DEE
as the identity forNatJoin
, we now have a half-bounded lattice withDEE
as top. In fact we have a fully-bounded lattice, but there's no cute name for lattice bottom (I call itDumpty
), and it's a rather vertiginous empty relation with the maximum possible heading; I haven't needed to mention it in the definitions, so it'll play no further part.Ok that's all the lattice theory I'm going to introduce -- indeed I didn't need to let on it was anything to do with lattices, you can just go by the equations.
The equational technique to note in that equation for
InnerUnion
is there are many candidate relationss
s.t.r1 = (r1 NatJoin s) & r2 = (r2 NatJoin s)
;s = DEE
would be a solution;(r1 InnerJoin r2)
is fixed to be the bottommost suchs
/ the closest tor1, r2
/ the furthest fromDEE
.Defining
NotMatching
in terms ofNatJoin, InnerUnion
Noting (from Tropashko) that
(r1 NatJoin r2) InnerUnion (r1 NotMatching r2) = r1;
in general there's many candidatess
s.t.(r1 NatJoin r2) InnerUnion s = r1
, includings = r1
. This uses the same equational technique to require theNotMatching
be the bottommost suchs
/ furthest fromr1
:
r1 NatJoin s = s & s NatJoin (r1 NotMatching r2) = (r1 NotMatching r2) ≡ (r1 NatJoin r2) InnerUnion s = r1
.TentativeDUM =df DEE NotMatching DEE
And now I have to tell of an ugliness. Help requested to find a more elegant equation than the following. Prover9 alleges the above definition is sufficient to make NotMatching unique, and to make TentativeDUM unique. But (using Test-Driven Development) lots of the equational reasoning I expected to be fixed for NotMatching turned out to have non-unique answers. Adding this (ugly) axiom resolved it. (Ugly in the sense of needing a negation, which makes the whole model potentially unstable -- Litak et al, private correspondence.)

- (r NotMatching r) ≠ r ≡ r InnerUnion (DEE NotMatching DEE) = DEE.
- Informally: (r NotMatching r) ≠ r means r is non-empty; equivalently, projecting r on no attributes = DEE means r is non-empty.
- The crucial effect of that axiom is to 'skew' the lattice: we can derive that there are no relation values falling between DEE and (DEE NotMatching DEE). Whereas for most r, r and (r NotMatching r) are a long way apart: what's in between is all the relations whose bodies form the powerset of the tuples in r. (BTW that powerset structure forms a (latticeland-conventional) distributive sub-lattice -- as is provable from the axioms; contrast Litak et al, who need a specific axiom to fix that. Perhaps there's some equivalence?)
- So I've postponed defining
DUM until REMOVE is also defined; then DUM is defined as above as the result from 'generalised NotMatching/REMOVE', which I hope locks the equations together.

Defining REMOVE in terms of NatJoin, InnerUnion, NotMatching, DEE
tba [it's late].

No, I really don't follow. I don't know whether it's the lattices or the toolset, but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.

Care to mention why you didn't follow the NAND lead in App-A?
Quote from Vadim on May 12, 2021, 5:28 pmConsider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra. It can be axiomatized with a single operation, but this operation is not a lattice one.
P.S. A lot of questions still hang from our past discussions. For example, what do you mean by the "fixed" algebraic object (zilliary operation DUM, unary negation, etc.)? It looks like the Category Theory might provide formalization (the same object/morphism up to equivalency, up to isomorphism etc). However, I will not pretend I mastered this subject yet. (And how do they write books like "Category theory for working [scientist]" (replace the "scientist" with anything but "mathematician") when all the category examples are mathematical objects, of which the reader might be familiar with only one or two?)
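Vadim's observation can be sketched concretely: fix one heading and an explicit finite domain, and the relations over that heading form a Boolean algebra in which a single Sheffer-stroke-style operation (NAND) suffices. A toy sketch under those assumptions -- the fixed two-attribute heading, the two-valued domain, and all names are mine:

```python
from itertools import product

# All relations over ONE fixed heading (A, B), with an explicit finite
# domain -- the complement below is domain-dependent, which is why this
# single operation is not a lattice operation in the relational sense.
DOMAIN = frozenset(product((0, 1), repeat=2))   # every possible (A, B) row

def nand(r, s):
    """Sheffer stroke: complement of the intersection."""
    return DOMAIN - (r & s)

# recover the Boolean operations from NAND alone
NOT = lambda r: nand(r, r)
AND = lambda r, s: nand(nand(r, s), nand(r, s))
OR  = lambda r, s: nand(nand(r, r), nand(s, s))

r = frozenset({(0, 0), (0, 1)})
s = frozenset({(0, 1), (1, 1)})

assert NOT(r) == DOMAIN - r
assert AND(r, s) == r & s
assert OR(r, s) == r | s
```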
Quote from Erwin on May 12, 2021, 7:34 pmQuote from AntC on May 12, 2021, 11:46 am[It'll take a few edits to write this up. Draft for now, I'll remove this message when done.]
Motivation
We'd like an algebra in which we can reason algebraically about the equivalence of queries. For example
- When is it equivalent to 'push restrict through (natural) Join'? The A algebra goes some way to answer that. It reduces surface-language
WHERE STATUS = 10
to<AND> REL{TUP{ STATUS 10}}
; then we can reason from the Associativity of<AND>
.- When is it equivalent to 'push project through Join'? But the A algebra has no answer here: project/
REMOVE
are not susceptible to equational reasoning, because theirattribute-comma-list
is not first-class: there's no operators to manipulate lists of attributes, obtain them from relation(al expressions), etc. Only literals are allowed.- And yet A at several points wants to manipulate attribute-lists: project in the surface language is to be implemented via
REMOVE
(of the complement of those attributes, relative to the relation being projected over);COMPOSE
"─we have allowed ourselves the luxury (some might think) of including a "macro" operator", what do the scare-quotes indicate? how could any "macro" look at the headings ofCOMPOSE
's operands and infer the attributes in common?- (I'm not 'picking on' A specifically; it's just that D&D are being admirably clear about requirements. Codd's algebra suffers the same difficulties. Whatever optimisation/equivalences/manipulation goes on inside SQL engines and their internal 'algebra' is similarly not easy to reason with.)
Are you asking this out of a concern about "query optimisation" ? If not, don't read any further. If so, you may but it's at your own risk.
From having written SIRA_PRISE from the ground up, I have come to believe that "query rewrite" is the wrong strategy, or the wrong problem, or the wrong solution, or, well, the wrong something. I have come to believe that people asking themselves (and worse, others too) these questions need to get their head cleared (***totally***) of their own preconceptions and ask themselves ***WHAT*** problem it is they are trying to solve and while doing so, wonder about ***every single aspect of the definition they find*** whether it is an aspect that pertains to the 'model' space or one that pertains to the 'implementation' space.
The first reason for my saying so is that when you have a query that you want to find an optimal execution strategy for, and you manage to rewrite that query to a provably equivalent one, what else do you have after that than yet another query that you want to find an optimal execution strategy for ? (The answer to that question is : "a query that has certain properties/satisfies certain conditions that the original query didn't". I suggest you now start to wonder about ***what exactly those properties/conditions are***. I think you are bound to find, and cannot escape, that the answer is "properties/conditions that are determined by what the optimizer does and doesn't do". But if your goal is to ***find*** what the optimizer should and shouldn't do, then you don't know what those properties/conditions are, and as a consequence you don't know what qualities the result of the rewrite is supposed to meet/adhere to/abide by/... .)
The second reason for my saying so is, why do you only mention 'restrict' and 'project' as candidates for "pushing through join" ? "Pushing extend through join" is a worthwhile strategy to consider if the EXTEND is computationally expensive [and the pushing is possible].
The third reason for my saying so, is that from having to find reliable answers to these kinds of question or otherwise I'd never have gotten the working product I have, I know that "push restrict through join" (and "pushing extend through join" is -as far as the semantics of the query at hand are concerned- in the same category because RESTRICT and EXTEND are in the very same 'A' category of operators, or perhaps I should say "are cases of the very same 'A' operator") still has distinct cases to consider. Say, relational expression A with attrs Ax,Cy and relational expression B with attrs Cy,Bz. "WHERE f(Ai,Bj)" : not pushable on the [natural] join of A and B. Ditto for EXTEND. "WHERE f(Ai,Cj)" : pushable, because the result can by definition include only tuples for which f(Ai,Cj) is true and that's evaluatable on A on its own. Ditto for f(Ai,Cj) used in EXTEND, but ***only*** useful in the case where f is known to be computationally expensive, ***and*** the risk is non-negligible that the very same computation would have to be carried out several times for each tuple matched in the JOIN. The point of this very long paragraph being : "push restrict through join" is just simply not enough as a detailed definition of the thing you are thinking of.
Further elaboration : now, f(Ax,By) is a logical conjunct where any Ax might be any of Ai,Cj and the same holds for By. If there is any (Ax,By) where Ax is actually an Ai and By is actually a Bj, then the conjunction as a whole is not "pushable". However, there might be constituents of the CNF form of the logical conjunct that ***are*** "pushable". If you can find that CNF form and observe certain sub-constituents that are indeed "pushable" and that effectively reduce (per the best estimates you can possibly make) the cost estimate that your optimizer will associate with "carrying out this particular strategy", then push them.
Somewhat similar observations for the case where F(Ax,By) is a logical disjunct, but I'm afraid I'm getting too tired to continue writing and not start uttering stupidities.
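Erwin's pushability cases can be illustrated on toy data (rows as dicts; the sample relations, attribute names, and predicates are mine): a restriction over one operand's attributes commutes with the natural join, while one spanning both operands can't even be evaluated on either operand alone.

```python
# A has attrs Ai, Cj; B has attrs Cj, Bj -- toy stand-ins for Erwin's example.
A = [{'Ai': 1, 'Cj': 1}, {'Ai': 5, 'Cj': 2}]
B = [{'Cj': 1, 'Bj': 4}, {'Cj': 2, 'Bj': 1}]

def natjoin(r, s):
    # combine rows that agree on their shared attributes
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[k] == t2[k] for k in t1.keys() & t2.keys())]

def where(r, pred):
    return [t for t in r if pred(t)]

# f(Ai, Cj) mentions only A's attributes: pushable through the join
f = lambda t: t['Ai'] > t['Cj']
assert where(natjoin(A, B), f) == natjoin(where(A, f), B)

# f(Ai, Bj) spans both operands: not evaluable on A (or B) alone
g = lambda t: t['Ai'] > t['Bj']
try:
    where(A, g)
    assert False, "expected a missing-attribute error"
except KeyError:
    pass
```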
Quote from AntC on May 13, 2021, 1:02 amQuote from dandl on May 12, 2021, 1:41 pmQuote from AntC on May 12, 2021, 11:46 amMotivation
We'd like an algebra in which we can reason algebraically about the equivalence of queries. For example
- When is it equivalent to 'push restrict through (natural) Join'? The A algebra goes some way to answer that. It reduces surface-language
WHERE STATUS = 10
to<AND> REL{TUP{ STATUS 10}}
; then we can reason from the Associativity of<AND>
.- When is it equivalent to 'push project through Join'? But the A algebra has no answer here: project/
REMOVE
are not susceptible to equational reasoning, because theirattribute-comma-list
is not first-class: there's no operators to manipulate lists of attributes, obtain them from relation(al expressions), etc. Only literals are allowed.- And yet A at several points wants to manipulate attribute-lists: project in the surface language is to be implemented via
REMOVE
(of the complement of those attributes, relative to the relation being projected over);COMPOSE
"─we have allowed ourselves the luxury (some might think) of including a "macro" operator", what do the scare-quotes indicate? how could any "macro" look at the headings ofCOMPOSE
's operands and infer the attributes in common?- (I'm not 'picking on' A specifically; it's just that D&D are being admirably clear about requirements. Codd's algebra suffers the same difficulties. Whatever optimisation/equivalences/manipulation goes on inside SQL engines and their internal 'algebra' is similarly not easy to reason with.)
We could introduce machinery into the algebra to extract the heading of a relation(al expression) to an attribute list, and provide set operators to manipulate the list. But
- That gives two 'sorts' (in the mathematical sense) of entities to reason about, with a tricky inter-relationship as they shuttle from heading to set operator back to operand of relational operator.
- We already have relational operators that carry out some set operations on headings: specifically, <AND>, <OR> take the union of headings.
- So several on this forum have already advocated for using relation values to denote their headings, when appearing as operands in positions that want a heading -- that is, the operator will ignore the relation value's content, and/or the algebra will 'emptify' the relation.
(Empty) Relations denoting headings

Consider Tropashko's 'Inner Union' aka SQL's UNION CORRESPONDING aka ISBL UNION:

- It projects each operand's body on the attributes in common; then forms the set-union of those projections/bodies.
- Then we can define:
- r PROJECT s =df r InnerUnion emptify(s).
- r COMPOSE s =df (r NatJoin s) REMOVE emptify(r InnerUnion s).
- r PROJECT s =df r REMOVE emptify(r REMOVE s) -- alternative definition.
- emptify(r) =df r NotMatching r.
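These definitions can be spot-checked in the toy (heading, body) model from the earlier sketches. REMOVE is implemented directly here so the two PROJECT definitions can be compared; all encoding choices and names are mine, not the post's:

```python
# Sketch of emptify, REMOVE, and both PROJECT definitions on toy data.

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(zip(heading, row)) for row in rows))

def innerunion(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 & h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h)
                         for t in b1 | b2))

def notmatching(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 | h2
    return (h1, frozenset(t1 for t1 in b1
                          if not any(len(t1 | t2) == len(h) for t2 in b2)))

def emptify(r):          # emptify(r) =df r NotMatching r
    return notmatching(r, r)

def remove(r, s):        # r REMOVE s: project r away from s's attributes
    (h1, b1), (h2, _) = r, s
    h = h1 - h2
    return (h, frozenset(frozenset(p for p in t if p[0] in h) for t in b1))

def project(r, s):       # r PROJECT s =df r InnerUnion emptify(s)
    return innerunion(r, emptify(s))

r = rel(['A', 'B'], (1, 1), (2, 2))
s = rel(['B', 'C'], (1, 1), (2, 9))

assert emptify(r) == (frozenset({'A', 'B'}), frozenset())
# the two PROJECT definitions agree, and ignore s's body entirely
assert project(r, s) == remove(r, emptify(remove(r, s))) == rel(['B'], (1,), (2,))
```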
Only one primitive operator

InnerUnion has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as is NatJoin. Unfortunately, InnerUnion is not distributive over NatJoin, nor vice versa.

- But (ANNOUNCE) if we could define InnerUnion in terms of NatJoin (embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.
- Furthermore (further ANNOUNCE) if we could define PROJECT/REMOVE, NotMatching in terms of NatJoin (embedding etc.) then we can reason all the way round equations.

At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about emptify-ing relations just to get their heading -- or even to evaluate the relational expression argument to emptify( )! It's an algebra purely for reasoning equationally about the equivalence of queries.

No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND).
Appendix A claims NAND could replace only <AND>, <OR>, <NOT>; it doesn't go near project/REMOVE, COMPOSE. So that's a non-starter. (Also it would introduce domain-dependence, which gives me a queasy feeling.)

...
My attempts at such definitions have foundered in the past because they relied on DUM being defined; and I couldn't obtain a definition for DUM that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)

The definition for DEE hadn't suffered that difficulty, because we can fix DEE as the identity element for NatJoin. And it turns out (claim not yet demonstrated) we can define the other operators without mentioning DUM; then define DUM from them:

- DEE =df s s.t. ∀ r. s NatJoin r = r.
- DUM =df DEE NotMatching DEE.
- DUM =df ∀ r. (r NotMatching r) REMOVE r, equivalently ∀ r. (r REMOVE r) NotMatching r.
- (Yes, that's three equivalent definitions for DUM -- proven equivalent by equational reasoning.)

Seems to me it shouldn't be too hard to Dee and Dum, but only if you can negate.
DEE is easy/doesn't need negate. DUM = DEE NotMatching DEE turns out to have problems/is not unique enough, as I discovered. Lattice algebras tend to presume the structure is distributive and complemented unless you give a strong instruction otherwise -- which is my 'ugly' axiom.

...

No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.

AntiJoin could be defined as the negation of MATCHING; but r1 MATCHING r2 = (r1 NatJoin r2) {attributes-from-r1}, so you still need projection.
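The point that MATCHING still needs a projection -- and the three DUM definitions above -- can be spot-checked in the toy model. The encoding and names are mine, and the ∀ in the DUM definitions is only sampled over a few relations, not proven:

```python
# MATCHING as semijoin: (r1 NatJoin r2) projected onto r1's attributes.

def rel(heading, *rows):
    return (frozenset(heading),
            frozenset(frozenset(zip(heading, row)) for row in rows))

def natjoin(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 | h2
    return (h, frozenset(t1 | t2 for t1 in b1 for t2 in b2
                         if len(t1 | t2) == len(h)))

def notmatching(r, s):
    (h1, b1), (h2, b2) = r, s
    h = h1 | h2
    return (h1, frozenset(t1 for t1 in b1
                          if not any(len(t1 | t2) == len(h) for t2 in b2)))

def project_onto(r, attrs):
    (h1, b1) = r
    h = h1 & frozenset(attrs)
    return (h, frozenset(frozenset(p for p in t if p[0] in h) for t in b1))

def remove(r, s):                    # r REMOVE s: drop s's attributes
    return project_onto(r, r[0] - s[0])

def matching(r1, r2):                # semijoin: join, then project onto r1
    return project_onto(natjoin(r1, r2), r1[0])

DEE = rel([], ())                    # nullary heading, one empty tuple
DUM = (frozenset(), frozenset())     # nullary heading, empty body

r1 = rel(['A', 'B'], (1, 1), (2, 2))
r2 = rel(['B', 'C'], (1, 1))

assert matching(r1, r2) == rel(['A', 'B'], (1, 1))
assert matching(r1, r2)[1] | notmatching(r1, r2)[1] == r1[1]  # partition r1

assert notmatching(DEE, DEE) == DUM              # DUM =df DEE NotMatching DEE
for r in (r1, r2, DEE):
    assert remove(notmatching(r, r), r) == DUM   # (r NotMatching r) REMOVE r
    assert notmatching(remove(r, r), r) == DUM   # (r REMOVE r) NotMatching r
```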
Quote from AntC on May 13, 2021, 2:30 amQuote from Erwin on May 12, 2021, 7:34 pmQuote from AntC on May 12, 2021, 11:46 am[It'll take a few edits to write this up. Draft for now, I'll remove this message when done.]
Motivation
We'd like an algebra in which we can reason algebraically about the equivalence of queries. For example
- When is it equivalent to 'push restrict through (natural) Join'? The A algebra goes some way to answer that. It reduces surface-language
WHERE STATUS = 10
to<AND> REL{TUP{ STATUS 10}}
; then we can reason from the Associativity of<AND>
.- When is it equivalent to 'push project through Join'? But the A algebra has no answer here: project/
REMOVE
are not susceptible to equational reasoning, because theirattribute-comma-list
is not first-class: there's no operators to manipulate lists of attributes, obtain them from relation(al expressions), etc. Only literals are allowed.- And yet A at several points wants to manipulate attribute-lists: project in the surface language is to be implemented via
REMOVE
(of the complement of those attributes, relative to the relation being projected over);COMPOSE
"─we have allowed ourselves the luxury (some might think) of including a "macro" operator", what do the scare-quotes indicate? how could any "macro" look at the headings ofCOMPOSE
's operands and infer the attributes in common?- (I'm not 'picking on' A specifically; it's just that D&D are being admirably clear about requirements. Codd's algebra suffers the same difficulties. Whatever optimisation/equivalences/manipulation goes on inside SQL engines and their internal 'algebra' is similarly not easy to reason with.)
Are you asking this out of a concern about "query optimisation" ?
To be clear, this isn't a mechanism to tell an implementor how to optimise a query. It is aimed at mechanically checking whether a query is equivalent to some (alleged) optimisation. The optimisation strategy is likely to rely on observing that mentioned relations do/don't have attributes in common. From what I've shown about (empty) relations-denoting-headings, you should be able to see the algebra is expressive enough. Optimisations also rely on considerations like keys and foreign keys. Then (I claim, not yet substantiated) the algebra is powerful enough to express those properties.
If not, don't read any further. If so, you may but it's at your own risk.
From having written SIRA_PRISE from the ground up, I have come to believe that "query rewrite" is the wrong strategy, or the wrong problem, or the wrong solution, or, well, the wrong something.
Then you're swimming against the tide of modern compiler practice. "AST rewrite" is the strategy for implementors: rewrite the surface-language AST to some reduced-instruction-set algebra/AST; optimise that by manipulating the AST; type-check as you go, as a sanity check over the manipulations; rewrite to some assembly-level or LLVM/abstract JVM AST; translate to object code.
I have come to believe that people asking themselves (and worse, others too) these questions need to get their head cleared (***totally***) of their own preconceptions and ask themselves ***WHAT*** problem it is they are trying to solve and while doing so, wonder about ***every single aspect of the definition they find*** whether it is an aspect that pertains to the 'model' space or one that pertains to the 'implementation' space.
The first reason for my saying so is that when you have a query that you want to find an optimal execution strategy for, and you manage to rewrite that query to a provably equivalent one, ...
And how are you proving equivalence? From what I see of SQL engines, it's black arts and rule-of-thumb. Of course SQL is severely complicated by dealing with Null, duplicate rows, duplicate/anonymous columns.

what else do you have after that than yet another query that you want to find an optimal execution strategy for ? (The answer to that question is : "a query that has certain properties/satisfies certain conditions that the original query didn't". I suggest you now start to wonder about ***what exactly those properties/conditions are***. I think you are bound to find, and cannot escape, that the answer is "properties/conditions that are determined by what the optimizer does and doesn't do". But if your goal is to ***find*** what the optimizer should and shouldn't do, then you don't know what those properties/conditions are, and as a consequence you don't know what qualities the result of the rewrite is supposed to meet/adhere to/abide by/... .)
The second reason for my saying so is, why do you only mention 'restrict' and 'project' as candidates for "pushing through join" ? "Pushing extend through join" is a worthwhile strategy to consider if the EXTEND is computationally expensive [and the pushing is possible].
I mentioned 'restrict' and 'project' push-through-join as the most common examples. And the same technique works for push-EXTEND-through-join, which is already catered for by the A algebra. Frankly, if you can't already see that, I'm wondering how competent an implementor you are. To give an example using only A:
- Assuming we have relvar DoubleQty := ((PLUS WHERE X = Y) {ALL BUT Y}) RENAME { QTY := X, QTYx2 := Z }
- Take query ((S JOIN SP) WHERE STATUS = 10) EXTEND { QTYx2 := QTY + QTY };
- Translate to A: ((S <AND> SP) <AND> REL{TUP{ STATUS 10 }}) <AND> DoubleQty;
- <AND> is associative, commutative, so shuffle around the parens: ((S <AND> REL{TUP{ STATUS 10 }}) <AND> (SP <AND> DoubleQty));
- Translate back to Tutorial D: ((S WHERE STATUS = 10) JOIN (SP EXTEND { QTYx2 := QTY + QTY }));
- QED: push EXTEND through join, and push restrict through join.
- I didn't show the bit about checking for attributes in common. Going only by associativity of <AND> could end up with this: ((S EXTEND { QTYx2 := QTY + QTY }) JOIN (SP WHERE STATUS = 10)); which might still work if actually executed by an (imaginary) A engine, but would then rely on domain-dependent semantics/infinite relations, so clearly isn't any sort of optimisation.

I think the following 2 paras might be saying the same as my demonstration. But your prolixity has buried any point. I am not saying that push-x-through-join is necessarily an optimisation. Only the implementor can know that, wrt the specifics of their engine.
The third reason for my saying so is that, from having had to find reliable answers to these kinds of question (I'd never have gotten the working product I have otherwise), I know that "push restrict through join" still has distinct cases to consider. ("Pushing extend through join" is, as far as the semantics of the query at hand are concerned, in the same category, because RESTRICT and EXTEND are in the very same 'A' category of operators -- or perhaps I should say "are cases of the very same 'A' operator".) Say relational expression A has attrs Ai,Cj and relational expression B has attrs Cj,Bk. "WHERE f(Ai,Bk)": not pushable on the [natural] join of A and B. Ditto for EXTEND. "WHERE f(Ai,Cj)": pushable, because the result can by definition include only tuples for which f(Ai,Cj) is true, and that's evaluatable on A on its own. Ditto for f(Ai,Cj) used in EXTEND, but ***only*** useful in the case where f is known to be computationally expensive, ***and*** the risk is non-negligible that the very same computation would have to be carried out several times for each tuple matched in the JOIN. The point of this very long paragraph being: "push restrict through join" is simply not enough as a detailed definition of the thing you are thinking of.
Further elaboration: now suppose the WHERE condition is a logical conjunction whose conjuncts may mention any of the Ai,Cj,Bk. If any conjunct mentions both an Ai and a Bk, then the conjunction as a whole is not "pushable". However, there might be constituents of the CNF form of the logical conjunction that ***are*** "pushable". If you can find that CNF form and observe certain sub-constituents that are indeed "pushable" and that effectively reduce (per the best estimates you can possibly make) the cost estimate that your optimizer will associate with "carrying out this particular strategy", then push them.
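That case analysis can be phrased as a small decision procedure: given the headings of the join operands and the attributes each CNF conjunct mentions, a conjunct is pushable to an operand exactly when its attributes fall within that operand's heading. A sketch (attribute names `Ai`, `Cj`, `Bk` follow the paragraph above; the cost-estimation side is deliberately left out):

```python
def split_conjuncts(conjuncts, heading_a, heading_b):
    """Partition CNF conjuncts by where they can be evaluated.
    Each conjunct is represented just by the set of attributes it mentions."""
    push_a, push_b, keep = [], [], []
    for attrs in conjuncts:
        if attrs <= heading_a:
            push_a.append(attrs)   # evaluable on A alone
        elif attrs <= heading_b:
            push_b.append(attrs)   # evaluable on B alone
        else:
            keep.append(attrs)     # straddles the join: not pushable
    return push_a, push_b, keep

heading_a = {'Ai', 'Cj'}           # A's attributes
heading_b = {'Cj', 'Bk'}           # B's attributes
conjuncts = [{'Ai', 'Cj'}, {'Cj'}, {'Ai', 'Bk'}]
push_a, push_b, keep = split_conjuncts(conjuncts, heading_a, heading_b)
assert push_a == [{'Ai', 'Cj'}, {'Cj'}]   # f(Ai,Cj) is pushable to A
assert keep == [{'Ai', 'Bk'}]             # f(Ai,Bk) must wait for the join
```

Note that a conjunct mentioning only common attributes (here `{'Cj'}`) could in fact be pushed to either or both operands; the sketch arbitrarily favours A.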
Somewhat similar observations apply for the case where the condition is a logical disjunction, but I'm afraid I'm getting too tired to continue writing without starting to utter stupidities.
Quote from AntC on May 13, 2021, 3:13 am

Quote from Vadim on May 12, 2021, 5:28 pm: Consider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra. It can be axiomatized with a single operation, but this operation is not a lattice one.
Thanks Vadim. True. I don't see the relevance: we know a database of all relations (relvars) having the same heading is not adequate.
P.S. A lot of questions still hang from our past discussions. For example, what do you mean by the "fixed" algebraic object (zilliary operation DUM, unary negation, etc.)?
Compare: zilliary `DEE` is the identity value for `NatJoin`, therefore lattice top, known as `R01` in your papers. Is there any difficulty with that? `DUM` is just some other specified value -- that is, providing I can arrive at a definition that fixes it. Is there something illegitimate with zilliary operations?

Unary negation (A's `<NOT>`) I have not defined/have not used. I don't "mean" anything by it. It appears in some of your papers; what do you mean by it?
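To make the `DEE`/`NatJoin` claim concrete, here is a minimal sketch (my own illustration, assuming a toy encoding of relations as frozensets of attribute/value-pair tuples; a known limitation is that every empty relation collapses to the same value, so empty relations lose their headings):

```python
def join(r1, r2):
    """NatJoin on relations encoded as frozensets of frozenset-of-(attr, value) tuples."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return frozenset(out)

DEE = frozenset({frozenset()})   # heading {}, one (empty) tuple
DUM = frozenset()                # heading {}, no tuples

r = frozenset({frozenset({('SNO', 'S1'), ('STATUS', 10)})})
assert join(DEE, r) == r == join(r, DEE)   # DEE is the NatJoin identity (lattice top)
assert join(DUM, r) == DUM                 # DUM annihilates under NatJoin
```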
It looks like Category Theory might provide formalization (the same object/morphism up to equivalence, up to isomorphism, etc.). However, I will not pretend I have mastered this subject yet. (And how do they write books like "Category theory for working [scientist]" (replace the "scientist" with anything but "mathematician") when all the category examples are mathematical objects, of which the reader might be familiar with only one or two?)
Some people (by which I mean many propeller-headed Haskellers) seem to think Category Theory is some kind of magic mathematical dust that you can sprinkle on any abstraction to make it easier to reason about. I don't see any Category-theoretic treatment of lattices; I don't understand Wikipedia on Category Theory; I do understand Wikipedia on Lattices.
Quote from dandl on May 13, 2021, 8:41 am

Only one primitive operator
`InnerUnion` has some very nice properties for equational reasoning: it is commutative, idempotent, associative -- indeed as is `NatJoin`. Unfortunately, `InnerUnion` is not distributive over `NatJoin`, nor vice versa.

- But (ANNOUNCE) if we could define `InnerUnion` in terms of `NatJoin` (embedding the definition in the machinery of FOPL with equality) then we can reason around equations using both operators.
- Furthermore (further ANNOUNCE) if we could define `PROJECT/REMOVE`, `NotMatching` in terms of `NatJoin` (embedding etc) then we can reason all the way round equations.

At this juncture, I'd better point out that this is not intended as an 'executable' algebra any more than A is. And it would be horribly inefficient to go about `emptify`ing relations just to get their heading -- or even to evaluate the relational expression argument to `emptify( )`! It's an algebra purely for reasoning equationally about the equivalence of queries.

No problem down to here, but I see trouble ahead. I was expecting the one operator to be a form of negation (like NAND).
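For concreteness, here is a sketch of the two operators under a toy encoding (relations as frozensets of attribute/value-pair tuples; headings are derived from tuple bodies, so empty relations are out of scope here), spot-checking the commutative/idempotent/associative laws claimed above on sample values. The distributivity failure is asserted in the text and not re-derived here.

```python
def join(r1, r2):
    """NatJoin: merge tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return frozenset(out)

def inner_union(r1, r2):
    """InnerUnion: project both operands onto their common attributes, then union."""
    h1 = {a for t in r1 for a, _ in t}
    h2 = {a for t in r2 for a, _ in t}
    common = h1 & h2
    proj = lambda r: frozenset(frozenset((a, v) for a, v in t if a in common) for t in r)
    return proj(r1) | proj(r2)

r = frozenset({frozenset({('A', 1), ('B', 1)})})
s = frozenset({frozenset({('B', 1), ('C', 1)})})
t = frozenset({frozenset({('A', 2), ('B', 2)})})

for op in (join, inner_union):
    assert op(r, s) == op(s, r)                  # commutative
    assert op(r, r) == r                         # idempotent
    assert op(op(r, s), t) == op(r, op(s, t))    # associative (on these values)
```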
Appendix A claims `NAND` could replace only `<AND>, <OR>, <NOT>`; it doesn't go near project/`REMOVE`, `COMPOSE`. So that's a non-starter. (Also it would introduce domain-dependence, which gives me a queasy feeling.)

It says:
"In fact, we will show in the next section that we do not really
need RENAME either; thus, we could in fact reduce our algebra
still further to just the two operators REMOVE and either
NOR or NAND (plus TCLOSE)."Project/Remove is simply InnerUnion with an empty relation representing the new heading.
...
My attempts at such definitions have foundered in the past because they relied on `DUM` being defined; and I couldn't obtain a definition for `DUM` that fixed it uniquely in the structure of relations. (Reminder: we can use in the definitions only relation values unanalysed, and operators on them. There's no ability to manipulate 'headings' or 'bodies' (as sets), because that would introduce additional 'sorts' into the algebra.)

The definition for `DEE` hadn't suffered that difficulty, because we can fix `DEE` as the identity element for `NatJoin`. And it turns out (claim not yet demonstrated) we can define the other operators without mentioning `DUM`; then define `DUM` from them:

- `DEE =df s s.t. ∀ r. s NatJoin r = r`.
- `DUM =df DEE NotMatching DEE`.
- `DUM =df ∀ r. (r NotMatching r) REMOVE r`, equivalently `∀ r. (r REMOVE r) NotMatching r`.
- (Yes, that's three equivalent definitions for `DUM` -- proven equivalent by equational reasoning.)

Seems to me it shouldn't be too hard to get Dee and Dum, but only if you can negate.
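The three definitions above can be spot-checked under a toy encoding (relations as frozensets of attribute/value-pair tuples; NB this encoding collapses all empty relations into one value, so it cannot exhibit the uniqueness difficulty discussed in this thread). Here `not_matching` keeps the tuples of its left operand that join with no tuple of the right, and `remove` projects away the attributes in its second operand's heading:

```python
DEE = frozenset({frozenset()})   # one empty tuple
DUM = frozenset()                # no tuples

def heading(r):
    return {a for t in r for a, _ in t}

def matches(t1, t2):
    d1, d2 = dict(t1), dict(t2)
    return all(d1[a] == d2[a] for a in d1.keys() & d2.keys())

def not_matching(r1, r2):
    """Tuples of r1 that join with no tuple of r2."""
    return frozenset(t1 for t1 in r1 if not any(matches(t1, t2) for t2 in r2))

def remove(r, s):
    """Project r away from the attributes in s's heading."""
    drop = heading(s)
    return frozenset(frozenset((a, v) for a, v in t if a not in drop) for t in r)

r = frozenset({frozenset({('A', 1)}), frozenset({('A', 2)})})
assert not_matching(DEE, DEE) == DUM           # DUM =df DEE NotMatching DEE
assert remove(not_matching(r, r), r) == DUM    # (r NotMatching r) REMOVE r
assert not_matching(remove(r, r), r) == DUM    # (r REMOVE r) NotMatching r
```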
`DEE` is easy/doesn't need negate. `DUM = DEE NotMatching DEE` turns out to have problems/is not unique enough, as I discovered. Lattice algebras tend to presume the structure is distributive, complemented unless you give a strong instruction otherwise -- which is my 'ugly' axiom.

I have no idea what this means, or why I should care. My aim was to point out that I don't see any solution with a single primitive unless it's some kind of negation, like NAND. Whether or not you get a 'nice' DUM this way, I think you won't get there any other way.
...
No, I really don't follow. I don't know whether it's the lattices or the toolset but you lost me here. As I see it REMOVE is just AntiJoin. Easy if you start with negation.
AntiJoin could be defined as the negation of `MATCHING`; but `r1 MATCHING r2 = r1 NatJoin r2 {attributes-from-r1}`, so you still need projection.

AntiJoin is cognate with NotMatching: same heading, result is the complement of SemiJoin (aka Matching). I prefer to avoid the distinctive/idiosyncratic TD names for operators; they come with too much baggage.
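The dependency on projection is easy to see in code: under the toy encoding, `matching` below is exactly NatJoin followed by projection onto the left operand's heading, and the AntiJoin/NotMatching result is its complement within `r1` (`project_onto` is the helper projection the equation needs):

```python
def join(r1, r2):
    """NatJoin on relations encoded as frozensets of (attr, value)-pair tuples."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return frozenset(out)

def project_onto(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def matching(r1, r2):
    """r1 MATCHING r2 = (r1 NatJoin r2) {attributes-from-r1}: projection is unavoidable."""
    h1 = {a for t in r1 for a, _ in t}
    return project_onto(join(r1, r2), h1)

r1 = frozenset({frozenset({('SNO', 'S1')}), frozenset({('SNO', 'S2')})})
r2 = frozenset({frozenset({('SNO', 'S1'), ('PNO', 'P1')})})
m = matching(r1, r2)
assert m == frozenset({frozenset({('SNO', 'S1')})})
assert (r1 - m) == frozenset({frozenset({('SNO', 'S2')})})  # the AntiJoin complement
```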
But I have to say I'm not optimistic about your quest. In highly informal terms, you can have an operator that removes attributes and an operator that removes rows and apply them twice to get the opposite, but a single operator to do all that seems far-fetched. More power to you if you can get there.
Quote from AntC on May 13, 2021, 11:53 am

Quote from Vadim on May 12, 2021, 5:28 pm: Consider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element `r1` within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given by `r1 NotMatching r1`.

The sub-sub-lattice complement of `r2` coincides with `r1 NotMatching r2`. IOW if `r2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1` then `s = r1 NotMatching r2`. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice from `r1`. That still doesn't fix enough, without the 'ugly' axiom.
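Same-heading relations make the complement axiom easy to exercise: within one heading, NatJoin degenerates to set intersection, InnerUnion to set union, and NotMatching to set difference. The sketch below (my own illustration) brute-forces every sub-relation `s` of a sample `r1` and confirms that exactly one satisfies the two conditions, namely `r1 NotMatching r2`:

```python
from itertools import chain, combinations

def tup(**kw):
    return frozenset(kw.items())

# four same-heading tuples; within one heading the operators degenerate to set ops
r1 = frozenset({tup(A=i) for i in range(4)})
r2 = frozenset({tup(A=0), tup(A=1)})
empty = frozenset()

def subsets(xs):
    xs = list(xs)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(xs, n) for n in range(len(xs) + 1)))

solutions = [s for s in subsets(r1)
             if (r2 & s) == empty    # r2 NatJoin s = r1 NotMatching r1 (empty)
             and (r2 | s) == r1]     # r2 InnerUnion s = r1
assert solutions == [r1 - r2]        # unique, and it is r1 NotMatching r2
```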
Quote from Vadim on May 13, 2021, 4:27 pm

Quote from AntC on May 13, 2021, 11:53 am

Quote from Vadim on May 12, 2021, 5:28 pm: Consider a database with all relations having the same set of attributes. Then, it is a Boolean Algebra.
Since all the relation values with the same heading form a Boolean Algebra (i.e. a distributive, complemented sub-lattice within the bigger structure), we can take any element `r1` within it: the sub-sub-lattice from that element (as sub-sub-lattice top) 'down' to the empty relation with that heading is also a Boolean Algebra. That empty relation value is given by `r1 NotMatching r1`.

The sub-sub-lattice complement of `r2` coincides with `r1 NotMatching r2`. IOW if `r2 NatJoin s = (r1 NotMatching r1) & r2 InnerUnion s = r1` then `s = r1 NotMatching r2`. I've added that as an axiom. Also added the distributivity within the sub-sub-lattice from `r1`. That still doesn't fix enough, without the 'ugly' axiom.
. That still doesn't fix enough, without the 'ugly' axiom.The simplest possible counter example is the database consisting of only two relations: DEE and DUM. Yes, this example is far fetched from database practice, but counterexamples can be trivial/silly.
The other [rather artificial] example is the database with all the empty relations.
Therefore, we are in the realm of Boolean Algebra. There, reduction to a single operation is well known (NAND, NOR), which is not consistent with your presentation.
In general, a reduction of the operation set in Universal Algebra to some basic set is understood as representing each derived operation via composition of basic operations. In other words, you are not allowed to use First Order Logic sentences and specify a constraint on the derived operation that way; you have to exhibit an equality with the derived operation alone on the left hand side.
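The distinction can be illustrated with the Boolean case cited above: a reduction to NAND counts because each derived operation is a closed-form composition of NAND, with the derived operation alone on the left-hand side; no quantified sentence is needed. A minimal sketch:

```python
def nand(x, y):
    return not (x and y)

# each derived operation is a plain composition of nand -- no FOL constraint needed
def not_(x):    return nand(x, x)
def and_(x, y): return not_(nand(x, y))
def or_(x, y):  return nand(not_(x), not_(y))

bools = (False, True)
assert all(not_(a) == (not a) for a in bools)
assert all(and_(a, b) == (a and b) for a in bools for b in bools)
assert all(or_(a, b) == (a or b) for a in bools for b in bools)
```

By contrast, a definition of the shape `DEE =df s s.t. ∀ r. s NatJoin r = r` pins the value down with a quantified constraint rather than exhibiting it as a composition of the basic operation.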