Aggregates in relational calculus

#21 · August 19, 2019, 6:24 am

Quote from AntC on August 19, 2019, 6:24 am

Quote from dandl on August 19, 2019, 1:03 am

Quote from AntC on August 18, 2019, 12:35 pm

I don't think I have ever seen a clear, authoritative and widely accepted declaration of what the RA is and is not.

"a clearly defined category ... clearly defined membership function" [from a later message]

We seem to be in the umptieth round of the semi-annual 'explain Relationally Complete to David B'. Since the copious previous answers here don't seem to be working for you, perhaps try another forum?

No, you're wrong about that. You really should read more carefully.

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. This is the part everybody has tried to explain to you and that is not sinking in. So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

My issue is with the status of the definition of the RA. Codd defined his with great care, but you (and TTM) have not. I would think that RM Pre 18 envisages a set of operators rather larger than those proposed by Codd, but this is never made clear.

RM Pre 18 talks about "the usual", which is punting the issue down the road I suppose, but I'm finding no difficulty.

Since TTM provides in OO Pre 6 an explicit definition for aggregation, one would assume that aggregation is not part of the RA envisaged in RM Pre 18. Correct?

Since Codd didn't include aggregation (nor other calculations), it's not usual to include the capability for aggregations in "the usual" set.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

What about computing new values? Codd relied on constant values, and mentioned the need for computation. TTM specifies the creation of new values from literals, but makes no mention of creating new computed values. Is that to be found in some (extended) RA?

You can get another take on the capabilities of some RA in Appendix A. Relation constant PLUS etc has all the values anybody needs. My take goes like this:

You're familiar with Alice's discussion of the 'active domain'. Crucially it includes not only the current database instance, but also all constants/literals appearing in the query. This includes attribute names appearing in RENAME/EXTEND/GROUP/etc. Any domain-specific operators like arithmetic or aggregating have behind them definite procedures to compute a value from the active domain and add it to the active domain. (I'm going to ignore things like random number generators or fetch the current system time.) Then I claim there are no "new" values, only the active domain.

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

I don't have a problem with implementing a relationally complete language (done). I have a problem with a thread entitled "Aggregates in relational calculus" with no valid basis.

The title is from someone asking a question, not making any claim. And the answer was clear: Codd's RC didn't include aggregates. Nobody's claimed otherwise.

Quote from dandl on August 19, 2019, 1:03 am

Quote from AntC on August 18, 2019, 12:35 pm

I don't think I have ever seen a clear, authoritative and widely accepted declaration of what the RA is and is not.

"a clearly defined category ... clearly defined membership function" [from a later message]

We seem to be in the umptieth round of the semi-annual 'explain Relationally Complete to David B'. Since the copious previous answers here don't seem to be working for you, perhaps try another forum?

No, you're wrong about that. You really should read more carefully.

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. This is the part everybody has tried to explain to you and that is not sinking in. So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

My issue is with the status of the definition of the RA. Codd defined his with great care, but you (and TTM) have not. I would think that RM Pre 18 envisages a set of operators rather larger than those proposed by Codd, but this is never made clear.

RM Pre 18 talks about "the usual", which is punting the issue down the road I suppose, but I'm finding no difficulty.

Since TTM provides in OO Pre 6 an explicit definition for aggregation, one would assume that aggregation is not part of the RA envisaged in RM Pre 18. Correct?

Since Codd didn't include aggregation (nor other calculations), it's not usual to include the capability for aggregations in "the usual" set.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

What about computing new values? Codd relied on constant values, and mentioned the need for computation. TTM specifies the creation of new values from literals, but makes no mention of creating new computed values. Is that to be found in some (extended) RA?

You can get another take on the capabilities of some RA in Appendix A. Relation constant PLUS etc has all the values anybody needs. My take goes like this:

You're familiar with Alice's discussion of the 'active domain'. Crucially it includes not only the current database instance, but also all constants/literals appearing in the query. This includes attribute names appearing in RENAME/EXTEND/GROUP/etc. Any domain-specific operators like arithmetic or aggregating have behind them definite procedures to compute a value from the active domain and add it to the active domain. (I'm going to ignore things like random number generators or fetch the current system time.) Then I claim there are no "new" values, only the active domain.

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

I don't have a problem with implementing a relationally complete language (done). I have a problem with a thread entitled "Aggregates in relational calculus" with no valid basis.

The title is from someone asking a question, not making any claim. And the answer was clear: Codd's RC didn't include aggregates. Nobody's claimed otherwise.

#22 · August 19, 2019, 4:57 pm

Quote from AntC on August 19, 2019, 6:24 am

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

There's another take : Codd 1970 did not even consider the explicit naming of attributes as anywhere near crucial to the working of the RM. (He admitted the possibility as a matter of probably high practicality, and then happily went on assuming mathematical tuples where the attribute identifier is ordinal position, not name.) Considering that, it is rather obvious why we don't find RENAME in any of Codd.

Author of SIRA_PRISE

#23 · August 20, 2019, 12:47 am

Quote from dandl on August 20, 2019, 12:47 am

Quote from AntC on August 19, 2019, 6:24 am

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. This is the part everybody has tried to explain to you and that is not sinking in. So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

I'm increasingly forming the view that you resort to ad hominem attacks when you either don't understand the question or don't actually know what you're talking about. That's spurious, wrong and irrelevant.

My issue is with the status of the definition of the RA. Codd defined his with great care, but you (and TTM) have not. I would think that RM Pre 18 envisages a set of operators rather larger than those proposed by Codd, but this is never made clear.

RM Pre 18 talks about "the usual", which is punting the issue down the road I suppose, but I'm finding no difficulty.

But you're not answering the question either.

Since TTM provides in OO Pre 6 an explicit definition for aggregation, one would assume that aggregation is not part of the RA envisaged in RM Pre 18. Correct?

Since Codd didn't include aggregation (nor other calculations), it's not usual to include the capability for aggregations in "the usual" set.

Agreed. You don't know what 'the usual' is but you at least know something it is not.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

I disagree. Where is your reference for that? Alice is perfectly happy to deal with a an unnamed RM, and in many ways it makes things much simpler. Users might disagree.

What about computing new values? Codd relied on constant values, and mentioned the need for computation. TTM specifies the creation of new values from literals, but makes no mention of creating new computed values. Is that to be found in some (extended) RA?

You can get another take on the capabilities of some RA in Appendix A. Relation constant PLUS etc has all the values anybody needs. My take goes like this:

You're familiar with Alice's discussion of the 'active domain'. Crucially it includes not only the current database instance, but also all constants/literals appearing in the query. This includes attribute names appearing in RENAME/EXTEND/GROUP/etc. Any domain-specific operators like arithmetic or aggregating have behind them definite procedures to compute a value from the active domain and add it to the active domain. (I'm going to ignore things like random number generators or fetch the current system time.) Then I claim there are no "new" values, only the active domain.

I don't buy it, neither does Alice, and despite authors in common, neither does TTM. The active domain comprises values that are in the database and is therefore a known finite quantity (Alice points this out). Your active domain of PLUS would be infinite, which is a nonsense. PLUS is really a gloss over computation, placing it outside the RM. PLUS shows how to connect the RM to computation, how it might be provided with fewer operators in the language.

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

I don't have a problem with implementing a relationally complete language (done). I have a problem with a thread entitled "Aggregates in relational calculus" with no valid basis.

The title is from someone asking a question, not making any claim. And the answer was clear: Codd's RC didn't include aggregates. Nobody's claimed otherwise.

Cautiously said. The title implies aggregates as part of some RA/RC if not Codd's, and that is what troubles me. I see the RA/RC as being devoid of computation on attribute values, but might be extended to include recursion. HHT extends the RA to include attribute names (now commonplace) and offers some other suggestions (not including aggregates), but I don't see that their work has been widely accepted. AppA's PLUS is similar, but falls short.

I prefer the treatment here: https://en.wikipedia.org/wiki/Relational_algebra. Computation of new values, aggregation and transitive closure are not provided by the RA, but are requirements of a useful query language, so must be provided by other means. That's exactly what TTM and SQL do; SQL plausibly extends the RA in various areas; TTM extends the query language but not the RA.

Quote from AntC on August 19, 2019, 6:24 am

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. This is the part everybody has tried to explain to you and that is not sinking in. So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

I'm increasingly forming the view that you resort to ad hominem attacks when you either don't understand the question or don't actually know what you're talking about. That's spurious, wrong and irrelevant.

My issue is with the status of the definition of the RA. Codd defined his with great care, but you (and TTM) have not. I would think that RM Pre 18 envisages a set of operators rather larger than those proposed by Codd, but this is never made clear.

RM Pre 18 talks about "the usual", which is punting the issue down the road I suppose, but I'm finding no difficulty.

But you're not answering the question either.

Since TTM provides in OO Pre 6 an explicit definition for aggregation, one would assume that aggregation is not part of the RA envisaged in RM Pre 18. Correct?

Since Codd didn't include aggregation (nor other calculations), it's not usual to include the capability for aggregations in "the usual" set.

Agreed. You don't know what 'the usual' is but you at least know something it is not.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

I disagree. Where is your reference for that? Alice is perfectly happy to deal with a an unnamed RM, and in many ways it makes things much simpler. Users might disagree.

What about computing new values? Codd relied on constant values, and mentioned the need for computation. TTM specifies the creation of new values from literals, but makes no mention of creating new computed values. Is that to be found in some (extended) RA?

You can get another take on the capabilities of some RA in Appendix A. Relation constant PLUS etc has all the values anybody needs. My take goes like this:

You're familiar with Alice's discussion of the 'active domain'. Crucially it includes not only the current database instance, but also all constants/literals appearing in the query. This includes attribute names appearing in RENAME/EXTEND/GROUP/etc. Any domain-specific operators like arithmetic or aggregating have behind them definite procedures to compute a value from the active domain and add it to the active domain. (I'm going to ignore things like random number generators or fetch the current system time.) Then I claim there are no "new" values, only the active domain.

I don't buy it, neither does Alice, and despite authors in common, neither does TTM. The active domain comprises values that are in the database and is therefore a known finite quantity (Alice points this out). Your active domain of PLUS would be infinite, which is a nonsense. PLUS is really a gloss over computation, placing it outside the RM. PLUS shows how to connect the RM to computation, how it might be provided with fewer operators in the language.

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

I don't have a problem with implementing a relationally complete language (done). I have a problem with a thread entitled "Aggregates in relational calculus" with no valid basis.

The title is from someone asking a question, not making any claim. And the answer was clear: Codd's RC didn't include aggregates. Nobody's claimed otherwise.

Cautiously said. The title implies aggregates as part of some RA/RC if not Codd's, and that is what troubles me. I see the RA/RC as being devoid of computation on attribute values, but might be extended to include recursion. HHT extends the RA to include attribute names (now commonplace) and offers some other suggestions (not including aggregates), but I don't see that their work has been widely accepted. AppA's PLUS is similar, but falls short.

I prefer the treatment here: https://en.wikipedia.org/wiki/Relational_algebra. Computation of new values, aggregation and transitive closure are not provided by the RA, but are requirements of a useful query language, so must be provided by other means. That's exactly what TTM and SQL do; SQL plausibly extends the RA in various areas; TTM extends the query language but not the RA.

Andl - A New Database Language - andl.org

#24 · August 20, 2019, 9:41 am

Quote from dandl on August 20, 2019, 12:47 am

PLUS is really a gloss over computation, placing it outside the RM. PLUS shows how to connect the RM to computation, how it might be provided with fewer operators in the language.

No it's not "a gloss". It's the view that mathematical theory of relations has on the matter. It might be more fruitful if you just stopped twirling around the real issue :

Operators are functions and functions are relations. Therefore operator invocation is function application is JOIN application.

It's just a way of viewing things conceptually and mostly unsuitable as a basis for implementation, but that does not mean it's also worthless at the conceptual level.

Then you could stop making distinctions without a difference as in e.g.

TTM extends the query language but not the RA.

We need to be able to do EXTEND and we need to be able to do AGGREGATE/SUMMARIZEBY. Including them in a language means first having their algebraic properties fixed (otherwise users obviously couldn't usefully invoke them, not knowing what they'd be getting back as a result for lack of a proper definition). That done, you have all you need to "extend whatever RA you were looking at before that".

Author of SIRA_PRISE

#25 · August 20, 2019, 2:35 pm

Quote from dandl on August 20, 2019, 2:35 pm

Quote from Erwin on August 20, 2019, 9:41 am

Quote from dandl on August 20, 2019, 12:47 am

PLUS is really a gloss over computation, placing it outside the RM. PLUS shows how to connect the RM to computation, how it might be provided with fewer operators in the language.

No it's not "a gloss". It's the view that mathematical theory of relations has on the matter. It might be more fruitful if you just stopped twirling around the real issue :

Operators are functions and functions are relations. Therefore operator invocation is function application is JOIN application.

That's a rather loose use of the language. In maths, a relation is usually taken to be a binary relation, a set of tuples where each tuple is an ordered pair relating elements in one set to elements in another. If the relation satisfies some constraints it is a function (relations are a superset of functions). Infinite functions are commonplace.

In database terminology the term applies to a finitary relation; the tuples are not ordered, and each tuple relates those values drawn from each set (domain) that satisfy a predicate.The relation is assumed to be finite and to conform to the CWA. Such relations are obviously not functions.

Yes, operators are functions and functions are relations but the differences are significant: ordered vs not, anonymous vs named, infinite vs finite. AppA does enough of the legwork to give a flavour and HHT takes it further, but it's not a done deal.

For example, the result from: PLUS WHERE Z = 1? Obviously the result is infinite. No CWA here. And what does it mean to do a SEMIJOIN or ANTIJOIN or UNION on PLUS? Surely JOIN is not so special?

Perhaps there is a way to make it work, but I'm far from convinced. Given that AppA and HHT are the only known references to the idea, it seems others are not either.

It's just a way of viewing things conceptually and mostly unsuitable as a basis for implementation, but that does not mean it's also worthless at the conceptual level.

No, this time the implementation is easy: functions, JIT calculations, all in a day's work. It's the theory that troubles me.

Then you could stop making distinctions without a difference as in e.g.

TTM extends the query language but not the RA.

We need to be able to do EXTEND and we need to be able to do AGGREGATE/SUMMARIZEBY. Including them in a language means first having their algebraic properties fixed (otherwise users obviously couldn't usefully invoke them, not knowing what they'd be getting back as a result for lack of a proper definition). That done, you have all you need to "extend whatever RA you were looking at before that".

EXTEND is just another way to say we need functions to calculate values, which in the case of an aggregate will be second order, and functions don't fit nicely into the RA (regardless of AppA and HHT). TTM doesn't try: instead it proposes a quite separate language and type system with the operations of the RA embedded.

But if you have a theoretical treatment that extends the RA by just enough to include functions without running into the problems mentioned above, that would be interesting. Or is the answer to the question just: Datalog?

Quote from Erwin on August 20, 2019, 9:41 am

Quote from dandl on August 20, 2019, 12:47 am

PLUS is really a gloss over computation, placing it outside the RM. PLUS shows how to connect the RM to computation, how it might be provided with fewer operators in the language.

No it's not "a gloss". It's the view that mathematical theory of relations has on the matter. It might be more fruitful if you just stopped twirling around the real issue :

Operators are functions and functions are relations. Therefore operator invocation is function application is JOIN application.

That's a rather loose use of the language. In maths, a relation is usually taken to be a binary relation, a set of tuples where each tuple is an ordered pair relating elements in one set to elements in another. If the relation satisfies some constraints it is a function (relations are a superset of functions). Infinite functions are commonplace.

In database terminology the term applies to a finitary relation; the tuples are not ordered, and each tuple relates those values drawn from each set (domain) that satisfy a predicate.The relation is assumed to be finite and to conform to the CWA. Such relations are obviously not functions.

Yes, operators are functions and functions are relations but the differences are significant: ordered vs not, anonymous vs named, infinite vs finite. AppA does enough of the legwork to give a flavour and HHT takes it further, but it's not a done deal.

For example, the result from: PLUS WHERE Z = 1? Obviously the result is infinite. No CWA here. And what does it mean to do a SEMIJOIN or ANTIJOIN or UNION on PLUS? Surely JOIN is not so special?

Perhaps there is a way to make it work, but I'm far from convinced. Given that AppA and HHT are the only known references to the idea, it seems others are not either.

It's just a way of viewing things conceptually and mostly unsuitable as a basis for implementation, but that does not mean it's also worthless at the conceptual level.

No, this time the implementation is easy: functions, JIT calculations, all in a day's work. It's the theory that troubles me.

Then you could stop making distinctions without a difference as in e.g.

TTM extends the query language but not the RA.

We need to be able to do EXTEND and we need to be able to do AGGREGATE/SUMMARIZEBY. Including them in a language means first having their algebraic properties fixed (otherwise users obviously couldn't usefully invoke them, not knowing what they'd be getting back as a result for lack of a proper definition). That done, you have all you need to "extend whatever RA you were looking at before that".

EXTEND is just another way to say we need functions to calculate values, which in the case of an aggregate will be second order, and functions don't fit nicely into the RA (regardless of AppA and HHT). TTM doesn't try: instead it proposes a quite separate language and type system with the operations of the RA embedded.

But if you have a theoretical treatment that extends the RA by just enough to include functions without running into the problems mentioned above, that would be interesting. Or is the answer to the question just: Datalog?

Andl - A New Database Language - andl.org

#26 · August 22, 2019, 11:09 am

Quote from AntC on August 22, 2019, 11:09 am

Quote from dandl on August 20, 2019, 12:47 am

Quote from AntC on August 19, 2019, 6:24 am

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

The Alice book doesn't use Codd's term in the main text, but relegates it to a Bibliographic note. Then goes on (in Part E) to consider various notions of Expressiveness and 'complete'. "We have not emphasized that phrase [relational completeness] here because subsequent research has suggested that a more natural notion of completeness can be described in terms of Turing computability (see Chapter 16)."

TTM does not use the term, as I already noted. You would have thought D&D would use it if they were happy it's carefully defined.

Date's RDB Dictionary is lukewarm. The entry mentions that the formal definition requires DEE and DUM to be valid relation values. And yet Codd (according to a recent note from Hugh) couldn't see the point of them. If Codd was so careful, why didn't he understand the consequence of his own definition?

Codd 1972 doesn't include transitive closure; and yet in a 1979 paper he presents it; and later in 1985 was affronted anybody should think it wasn't supported. (I mean the equivalent of Tutorial D's TCLOSE, not the generalised form RM VSS 5.) If Codd's 1972 definition was so careful, why not include it? Its equivalent does arise in First-Order Logic, whereas the generalised form requires Second-order.

DTATRM with the original version of RM Pro 18 does list what it considers "the usual", and that includes several operators not in Codd 1972, including SUMMARIZE and GROUP.

Appendix A includes primitive operators <AND>, <NOT> that are not allowed by Codd's definitions, because they're domain-dependent. And yet the A Algebra is perfectly workable as an algebra providing the surface language allows only safe query expressions. So Codd 1972 was not careful to separate the notions of a domain-dependent operator vs a safe query. Deciding whether an algebra expression is a safe query is NP-hard in general, BTW [Alice]; but if you start with a safe query in a surface language and translate that to an algebra expression with domain-dependent operations; and then rearrange the expression using the well-understood transformations; you'll still have a safe query.

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. ... So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

.... That's spurious, wrong and irrelevant.

Suggest you read Codd 1972. The derivation from Predicate Logic is the whole point. The list of operators is a consequence. Which is why Date has a basis to say DEE and DUM must be included; and why we can say TCLOSE (or equivalent) must be included.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

I disagree. Where is your reference for that? Alice is perfectly happy to deal with a an unnamed RM, and in many ways it makes things much simpler. Users might disagree.

Reference 1: Codd 1972. As Erwin's message points out, Codd 1970 mostly considers the 'unnamed perspective' [Alice's term, although I'd prefer the 'positional perspective']. 1972 was too early in the evolution of the RM for Codd to properly consider the 'named perspective'.

Reference 2: the Alice book is developing two different algebras. To repeat Dave "there are relational algebras" [his lower case]. Of course in an algebra without names there's no RENAME -- there could not be. In Alice's 'named perspective' algebra of course there is RENAME. Alice is perfectly clear about the reasons, nothing sneaking.

BTW I strongly disagree an unnamed algebra is "much simpler". I think it's moderately harder for an implementor, let alone the users. I fully support TTM using attribute naming and only that (RM Pro 1).

I prefer the treatment here: https://en.wikipedia.org/wiki/Relational_algebra. Computation of new values, aggregation and transitive closure are not provided by the RA, but are requirements of a useful query language, so must be provided by other means. That's exactly what TTM and SQL do; SQL plausibly extends the RA in various areas; TTM extends the query language but not the RA.

You want to include everything on that wikipedia entry, including the outer joins? So you want to include NULLs in the RM? The wording is typically sloppy, so I can't determine whether (for example) transitive closure or aggregations are included out.

TTM does not specify an algebra, it specifies a D -- a family of languages. Appendix A is clear that A is not a D. Whatever SQL is, it's certainly not an extended relational algebra. To repeat: there is not a "the RA".

BTW, SQL has a shambles of mongrel 'named perspective' and 'positional perspective', with voluminous rules to make them co-exist, which are never the less far from complete. The Standard is now trying to deprecate the positional notation. So much for your "much simpler".

Quote from dandl on August 20, 2019, 12:47 am

Quote from AntC on August 19, 2019, 6:24 am

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully,

I think most are agreed he wasn't careful enough, and that's part of the subsequent/continuing definitional problems. (That's why TTM doesn't use the term.)

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

The Alice book doesn't use Codd's term in the main text, but relegates it to a Bibliographic note. Then goes on (in Part E) to consider various notions of Expressiveness and 'complete'. "We have not emphasized that phrase [relational completeness] here because subsequent research has suggested that a more natural notion of completeness can be described in terms of Turing computability (see Chapter 16)."
TTM does not use the term, as I already noted. You would have thought D&D would use it if they were happy it's carefully defined.
Date's RDB Dictionary is lukewarm. The entry mentions that the formal definition requires DEE and DUM to be valid relation values. And yet Codd (according to a recent note from Hugh) couldn't see the point of them. If Codd was so careful, why didn't he understand the consequence of his own definition?
Codd 1972 doesn't include transitive closure; and yet in a 1979 paper he presents it; and later in 1985 was affronted anybody should think it wasn't supported. (I mean the equivalent of Tutorial D's TCLOSE, not the generalised form RM VSS 5.) If Codd's 1972 definition was so careful, why not include it? Its equivalent does arise in First-Order Logic, whereas the generalised form requires Second-order.
DTATRM with the original version of RM Pro 18 does list what it considers "the usual", and that includes several operators not in Codd 1972, including SUMMARIZE and GROUP.
Appendix A includes primitive operators <AND>, <NOT> that are not allowed by Codd's definitions, because they're domain-dependent. And yet the A Algebra is perfectly workable as an algebra providing the surface language allows only safe query expressions. So Codd 1972 was not careful to separate the notions of a domain-dependent operator vs a safe query. Deciding whether an algebra expression is a safe query is NP-hard in general, BTW [Alice]; but if you start with a safe query in a surface language and translate that to an algebra expression with domain-dependent operations; and then rearrange the expression using the well-understood transformations; you'll still have a safe query.

and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

As Dave already said, there isn't a the Relational Algebra. And if you're wanting to be careful, any RA is not the baseline for relational completeness: the baseline (and this is where Codd 1972 is reasonably careful) is Predicate Logic. The RM is a means for expressing the extension of predicates using set theory; and the query language is for obtaining the extensions of predicate expressions from the current database instance. ... So the yardstick is: can some calculus/algebra express the equivalent of any predicate expression using the predicate symbols given for the base relations plus logical connectives and quantifiers?

.... That's spurious, wrong and irrelevant.

Suggest you read Codd 1972. The derivation from Predicate Logic is the whole point. The list of operators is a consequence. Which is why Date has a basis to say DEE and DUM must be included; and why we can say TCLOSE (or equivalent) must be included.

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

That's where Codd was not careful. RENAME is not needed in his RC/Alpha, because it's a calculus with the usual name-binding mechanisms. (It introduces variable names in the same way as do quantifiers or lambda-expressions.) He plain made an omission in interpreting RC to RA. And that was pointed out very quickly. So RENAME didn't sneak anywhere and yes it's included in "the usual" -- as does Alice.

I disagree. Where is your reference for that? Alice is perfectly happy to deal with a an unnamed RM, and in many ways it makes things much simpler. Users might disagree.

Reference 1: Codd 1972. As Erwin's message points out, Codd 1970 mostly considers the 'unnamed perspective' [Alice's term, although I'd prefer the 'positional perspective']. 1972 was too early in the evolution of the RM for Codd to properly consider the 'named perspective'.
Reference 2: the Alice book is developing two different algebras. To repeat Dave "there are relational algebras" [his lower case]. Of course in an algebra without names there's no RENAME -- there could not be. In Alice's 'named perspective' algebra of course there is RENAME. Alice is perfectly clear about the reasons, nothing sneaking.

BTW I strongly disagree an unnamed algebra is "much simpler". I think it's moderately harder for an implementor, let alone the users. I fully support TTM using attribute naming and only that (RM Pro 1).

I prefer the treatment here: https://en.wikipedia.org/wiki/Relational_algebra. Computation of new values, aggregation and transitive closure are not provided by the RA, but are requirements of a useful query language, so must be provided by other means. That's exactly what TTM and SQL do; SQL plausibly extends the RA in various areas; TTM extends the query language but not the RA.

You want to include everything on that wikipedia entry, including the outer joins? So you want to include NULLs in the RM? The wording is typically sloppy, so I can't determine whether (for example) transitive closure or aggregations are included out.

TTM does not specify an algebra, it specifies a D -- a family of languages. Appendix A is clear that A is not a D. Whatever SQL is, it's certainly not an extended relational algebra. To repeat: there is not a "the RA".

BTW, SQL has a shambles of mongrel 'named perspective' and 'positional perspective', with voluminous rules to make them co-exist, which are never the less far from complete. The Standard is now trying to deprecate the positional notation. So much for your "much simpler".

#27 · August 23, 2019, 12:24 am

Quote from dandl on August 23, 2019, 12:24 am

Quote from AntC on August 22, 2019, 11:09 am

Quote from dandl on August 20, 2019, 12:47 am

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

A considered response. I should point out that what I meant by 'careful' was with respect to his usage of 'relationally complete' in the context of a single paper. And yes, he bases that on FOPL.

The Alice book doesn't use Codd's term in the main text, but relegates it to a Bibliographic note. Then goes on (in Part E) to consider various notions of Expressiveness and 'complete'. "We have not emphasized that phrase [relational completeness] here because subsequent research has suggested that a more natural notion of completeness can be described in terms of Turing computability (see Chapter 16)."

And yet they use the term several times. But yes, they prefer a spectrum of expressiveness over a single bright line of completeness. And they are equally careful to define what they mean by that, using Codd as a starting point (see p271 for example).

TTM does not use the term, as I already noted. You would have thought D&D would use it if they were happy it's carefully defined.

TTM bails on the whole question by substituting 'usual operators'. A total miss and not careful at all, IMO.

Date's RDB Dictionary is lukewarm. The entry mentions that the formal definition requires DEE and DUM to be valid relation values. And yet Codd (according to a recent note from Hugh) couldn't see the point of them. If Codd was so careful, why didn't he understand the consequence of his own definition?

Codd 1972 doesn't include transitive closure; and yet in a 1979 paper he presents it; and later in 1985 was affronted anybody should think it wasn't supported. (I mean the equivalent of Tutorial D's TCLOSE, not the generalised form RM VSS 5.) If Codd's 1972 definition was so careful, why not include it? Its equivalent does arise in First-Order Logic, whereas the generalised form requires Second-order.

The TCLOSE is an odd beast - I need to think about that one. Rather than GTC (another odd beast) I prefer the WHILE in Alice, which too is second order.

DTATRM with the original version of RM Pro 18 does list what it considers "the usual", and that includes several operators not in Codd 1972, including SUMMARIZE and GROUP.

Which is bad. SUMMARIZE requires aggregation (second order) and GROUP is intrinsically no normalised (another of Codd's preconditions).

Suggest you read Codd 1972. The derivation from Predicate Logic is the whole point. The list of operators is a consequence. Which is why Date has a basis to say DEE and DUM must be included; and why we can say TCLOSE (or equivalent) must be included.

OK. Rather than argue further about who said what, when and where, is it safe to say:

there is a 'core' RA defined by Codd, bounded by the limits of FOPL

add the named perspective

add simple recursion (TC in TTM, see Alice p271)

add RVA/TVA (no longer normalised)

add a way to generate new values by computation (TBD)

add aggregation (second order)

add fixed point recursion (GTC in TTM, WHILE) (second order)

add D (now Turing Complete)

A spectrum/progression, with no agreement how far to go and still call it 'an' RA.

And no agreement whether (5) new values should be included or made part of some different 'domain algebra'.

Quote from AntC on August 22, 2019, 11:09 am

Quote from dandl on August 20, 2019, 12:47 am

Who are the 'most'? Do you have a reference for that? I would disagree, I think he was very careful but subsequent users of the term have been less so.

A considered response. I should point out that what I meant by 'careful' was with respect to his usage of 'relationally complete' in the context of a single paper. And yes, he bases that on FOPL.

The Alice book doesn't use Codd's term in the main text, but relegates it to a Bibliographic note. Then goes on (in Part E) to consider various notions of Expressiveness and 'complete'. "We have not emphasized that phrase [relational completeness] here because subsequent research has suggested that a more natural notion of completeness can be described in terms of Turing computability (see Chapter 16)."

And yet they use the term several times. But yes, they prefer a spectrum of expressiveness over a single bright line of completeness. And they are equally careful to define what they mean by that, using Codd as a starting point (see p271 for example).

TTM does not use the term, as I already noted. You would have thought D&D would use it if they were happy it's carefully defined.

TTM bails on the whole question by substituting 'usual operators'. A total miss and not careful at all, IMO.

Date's RDB Dictionary is lukewarm. The entry mentions that the formal definition requires DEE and DUM to be valid relation values. And yet Codd (according to a recent note from Hugh) couldn't see the point of them. If Codd was so careful, why didn't he understand the consequence of his own definition?

Codd 1972 doesn't include transitive closure; and yet in a 1979 paper he presents it; and later in 1985 was affronted anybody should think it wasn't supported. (I mean the equivalent of Tutorial D's TCLOSE, not the generalised form RM VSS 5.) If Codd's 1972 definition was so careful, why not include it? Its equivalent does arise in First-Order Logic, whereas the generalised form requires Second-order.

The TCLOSE is an odd beast - I need to think about that one. Rather than GTC (another odd beast) I prefer the WHILE in Alice, which too is second order.

DTATRM with the original version of RM Pro 18 does list what it considers "the usual", and that includes several operators not in Codd 1972, including SUMMARIZE and GROUP.

Which is bad. SUMMARIZE requires aggregation (second order) and GROUP is intrinsically no normalised (another of Codd's preconditions).

Suggest you read Codd 1972. The derivation from Predicate Logic is the whole point. The list of operators is a consequence. Which is why Date has a basis to say DEE and DUM must be included; and why we can say TCLOSE (or equivalent) must be included.

OK. Rather than argue further about who said what, when and where, is it safe to say:

there is a 'core' RA defined by Codd, bounded by the limits of FOPL
add the named perspective
add simple recursion (TC in TTM, see Alice p271)
add RVA/TVA (no longer normalised)
add a way to generate new values by computation (TBD)
add aggregation (second order)
add fixed point recursion (GTC in TTM, WHILE) (second order)
add D (now Turing Complete)

A spectrum/progression, with no agreement how far to go and still call it 'an' RA.

And no agreement whether (5) new values should be included or made part of some different 'domain algebra'.

Andl - A New Database Language - andl.org

#28 · August 23, 2019, 10:16 am

Quote from dandl on August 23, 2019, 12:24 am

OK. Rather than argue further about who said what, when and where, is it safe to say:

there is a 'core' RA defined by Codd, bounded by the limits of FOPL

add the named perspective

add simple recursion (TC in TTM, see Alice p271)

add RVA/TVA (no longer normalised)

add a way to generate new values by computation (TBD)

add aggregation (second order)

add fixed point recursion (GTC in TTM, WHILE) (second order)

add D (now Turing Complete)

A spectrum/progression, with no agreement how far to go and still call it 'an' RA.

And no agreement whether (5) new values should be included or made part of some different 'domain algebra'.

Sure, but I'm not sure agreement is necessary. Do what seems right.

A given relational algebra can be formally defined, but there isn't any technical or academic authority that mandates what "relational algebra" must be in general, though by convention it's a "relation" construct and a collection of operators on that construct. It's all a bit arbitrary and rather casually defined, but I'm not sure there's much compulsion for it to be otherwise.

In the software world, there isn't much interest in whether or not a language adheres to some theoretical formalism. Far more important (to the extent that even this matters) is the degree to which it's expressive and powerful.

Utility trumps purity.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#29 · August 23, 2019, 11:07 am

Quote from Dave Voorhis on August 23, 2019, 10:16 am

Quote from dandl on August 23, 2019, 12:24 am

OK. Rather than argue further about who said what, when and where, is it safe to say:

there is a 'core' RA defined by Codd, bounded by the limits of FOPL

add the named perspective

add simple recursion (TC in TTM, see Alice p271)

add RVA/TVA (no longer normalised)

add a way to generate new values by computation (TBD)

add aggregation (second order)

add fixed point recursion (GTC in TTM, WHILE) (second order)

add D (now Turing Complete)

A spectrum/progression, with no agreement how far to go and still call it 'an' RA.

And no agreement whether (5) new values should be included or made part of some different 'domain algebra'.

Sure, but I'm not sure agreement is necessary. Do what seems right.

A given relational algebra can be formally defined, but there isn't any technical or academic authority that mandates what "relational algebra" must be in general, though by convention it's a "relation" construct and a collection of operators on that construct. It's all a bit arbitrary and rather casually defined, but I'm not sure there's much compulsion for it to be otherwise.

In the software world, there isn't much interest in whether or not a language adheres to some theoretical formalism. Far more important (to the extent that even this matters) is the degree to which it's expressive and powerful.

Utility trumps purity.

But for the fact that purity in itself often provides kinds of utility that remain alas unseen. Such as longlastingness and robustness e.g.

Author of SIRA_PRISE

#30 · August 23, 2019, 7:43 pm

Quote from Erwin on August 23, 2019, 11:07 am

Quote from Dave Voorhis on August 23, 2019, 10:16 am

Quote from dandl on August 23, 2019, 12:24 am

OK. Rather than argue further about who said what, when and where, is it safe to say:

there is a 'core' RA defined by Codd, bounded by the limits of FOPL

add the named perspective

add simple recursion (TC in TTM, see Alice p271)

add RVA/TVA (no longer normalised)

add a way to generate new values by computation (TBD)

add aggregation (second order)

add fixed point recursion (GTC in TTM, WHILE) (second order)

add D (now Turing Complete)

A spectrum/progression, with no agreement how far to go and still call it 'an' RA.

And no agreement whether (5) new values should be included or made part of some different 'domain algebra'.

Sure, but I'm not sure agreement is necessary. Do what seems right.

A given relational algebra can be formally defined, but there isn't any technical or academic authority that mandates what "relational algebra" must be in general, though by convention it's a "relation" construct and a collection of operators on that construct. It's all a bit arbitrary and rather casually defined, but I'm not sure there's much compulsion for it to be otherwise.

In the software world, there isn't much interest in whether or not a language adheres to some theoretical formalism. Far more important (to the extent that even this matters) is the degree to which it's expressive and powerful.

Utility trumps purity.

But for the fact that purity in itself often provides kinds of utility that remain alas unseen. Such as longlastingness and robustness e.g.

When a formal construct can be found that contributes robustness, longevity, expressivity, prove-ability, compose-ability, etc., that's excellent.

When one can't be found, I'm not sure there's much to be gained by continuing to search for it as if it's just around the corner. It almost certainly isn't.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters