## The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

# A plague on 'Relationally Complete'

In my (never-ending/fruitless) quest for a programming language which can express 'D-ness', I keep coming across this sort of claim:

Relational algebra  Section 4 here

Rather than model any particular flavour of SQL, we will show how to embed relational algebra operators in Agda. For the sake of simplicity, we have chosen only to model five operations: selection, projection, cartesian product, set union, and set difference.

... ... ...

As we mentioned previously, we have taken a very minimal set of relational algebra operators. It should be fairly straightforward to add operators for the many other operators in relational algebra, such as the natural join, θ-join, equijoin, renaming, or division, using the same techniques. Alternatively, you can define many of
these operations in terms of the operations we have implemented in the RA data type.

...

Our choice of Schema data type suffers from the usual disadvantages of using a list to represent a set: our Schema data type may contain duplicates and the order of the elements matters.

(Their 'selection' we'd recognise as `WHERE`. Their 'cartesian product' requires the headings be disjoint. 'set union' and 'difference' require operands have the same heading (aka "Schema"), with same attributes in the same l-to-r sequence. Presumably we can get intersection "in terms of" difference: `R INTERSECT S = R MINUS (R MINUS S)`. But "many" of the RA operators can be defined in terms of their set?)

They haven't demonstrated an ability to 'rename', because they haven't demonstrated how to avoid a Schema having duplicate attribute names. But of course 'rename' wasn't included in 'Relationally Complete' [Codd 1972]. They haven't demonstrated an ability to 'natural join' because they can't cope with operands having some-not-all attribute names in common -- let alone in different l-to-r sequence. But "you can define" NatJoin "in terms of the operations we have implemented" because Codd 1972 does. (No he doesn't, he sneaks in a need for 'rename' in two places, without calling it that. Except of course Codd is using 'positional perspective', not 'named'.)

The practical consequence here is that no tables in the database can have same-named attributes. (Or at least not if you're going to mention them in the same query.) We've had discussions in which Hugh has chafed against requiring same-naming of attributes in different tables that 'mean the same thing'. What is the collective wisdom on banning same-naming of any attributes, whether or not they 'mean the same thing'?

I frequently deal with installations whose naming standards are that column names must be prefixed with their table name. `P_NAME` vs `S_NAME` I suppose would avoid the hephalump trap in the Suppliers & Parts schema. `P_S#, S_S#, SP_P#, SP_S#` seems harder to swallow. I ask these installations what their standards are for naming Views: do they inherit the column names from the based-on tables, or do they `RENAME` every column?; what about Join views across Foreign Keys where in effect there's two based-on columns? Oh, we don't use Views; and we certainly don't use Join Views -- terrible for performance.

How do they cope with the need for self-joins, of the sort needed to mimic 'division'? (I suspect the answer is to read the tables into the program RBAR and eschew the database engine altogether. Then you can apply program-internal names to the columns.)

These academic approaches to what they call Relational Algebra go round in their own game of whispers. If O&S think their set is adequate, the next researcher won't read the caveats; will implement only the simplified subset; will then claim they have an RA; it's a solved problem.

Am I being too precious/purist? Can practical querying do without NatJoin or theta-Join? Is 'Relationally Complete' or a more precisely defined 'Expressively Complete' a useless criterion?

Quote from AntC on February 7, 2021, 3:40 am

In my (never-ending/fruitless) quest for a programming language which can express 'D-ness', I keep coming across this sort of claim:

Relational algebra  Section 4 here

Rather than model any particular flavour of SQL, we will show how to embed relational algebra operators in Agda. For the sake of simplicity, we have chosen only to model five operations: selection, projection, cartesian product, set union, and set difference.

... ... ...

As we mentioned previously, we have taken a very minimal set of relational algebra operators. It should be fairly straightforward to add operators for the many other operators in relational algebra, such as the natural join, θ-join, equijoin, renaming, or division, using the same techniques. Alternatively, you can define many of
these operations in terms of the operations we have implemented in the RA data type.

...

Our choice of Schema data type suffers from the usual disadvantages of using a list to represent a set: our Schema data type may contain duplicates and the order of the elements matters.

(Their 'selection' we'd recognise as `WHERE`. Their 'cartesian product' requires the headings be disjoint. 'set union' and 'difference' require operands have the same heading (aka "Schema"), with same attributes in the same l-to-r sequence. Presumably we can get intersection "in terms of" difference: `R INTERSECT S = R MINUS (R MINUS S)`. But "many" of the RA operators can be defined in terms of their set?)

They haven't demonstrated an ability to 'rename', because they haven't demonstrated how to avoid a Schema having duplicate attribute names. But of course 'rename' wasn't included in 'Relationally Complete' [Codd 1972]. They haven't demonstrated an ability to 'natural join' because they can't cope with operands having some-not-all attribute names in common -- let alone in different l-to-r sequence. But "you can define" NatJoin "in terms of the operations we have implemented" because Codd 1972 does. (No he doesn't, he sneaks in a need for 'rename' in two places, without calling it that. Except of course Codd is using 'positional perspective', not 'named'.)

I thought Codd relied on domains? If `S#` is the same type, join on it and if `CITY` is two different types, don't.

But really, I see no earthly reason why one cannot simply give up on the named and unnamed and go with equi-join, as per the SQL `SELECT`. Name all the columns coming and going and we're done. LINQ does that, and it all works quite nicely.

The practical consequence here is that no tables in the database can have same-named attributes. (Or at least not if you're going to mention them in the same query.) We've had discussions in which Hugh has chafed against requiring same-naming of attributes in different tables that 'mean the same thing'. What is the collective wisdom on banning same-naming of any attributes, whether or not they 'mean the same thing'?

I frequently deal with installations whose naming standards are that column names must be prefixed with their table name. `P_NAME` vs `S_NAME` I suppose would avoid the hephalump trap in the Suppliers & Parts schema. `P_S#, S_S#, SP_P#, SP_S#` seems harder to swallow. I ask these installations what their standards are for naming Views: do they inherit the column names from the based-on tables, or do they `RENAME` every column?; what about Join views across Foreign Keys where in effect there's two based-on columns? Oh, we don't use Views; and we certainly don't use Join Views -- terrible for performance.

How do they cope with the need for self-joins, of the sort needed to mimic 'division'? (I suspect the answer is to read the tables into the program RBAR and eschew the database engine altogether. Then you can apply program-internal names to the columns.)

These academic approaches to what they call Relational Algebra go round in their own game of whispers. If O&S think their set is adequate, the next researcher won't read the caveats; will implement only the simplified subset; will then claim they have an RA; it's a solved problem.

Am I being too precious/purist? Can practical querying do without NatJoin or theta-Join? Is 'Relationally Complete' or a more precisely defined 'Expressively Complete' a useless criterion?

Isn't equi-join enough? But AFAICT both these guys and Linq are missing antijoin, and that does have its uses.

Andl - A New Database Language - andl.org