The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Aggregates in relational calculus

Quote from dandl on August 15, 2019, 11:47 pm
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

But what TTM describes is more or less exactly that host language. TTM provides a scalar type system and defines a number of other 'host language' features. TD suggests particular language features for doing computations of all kinds including variables, branches, loops and recursion, making it Turing complete. The extensions in Andl and Rel add aggregation via second order functions. They all stand outside the RA/RC, which is first order and not TC. I think you'll have your work cut out.

What TTM describes/defines is a ***unification*** of Codd's H and R that makes Codd's distinction entirely moot.

Whether aggregations (and transitive closures and their generalized incarnations for that matter) "stand outside RA/RC" or not, is entirely a matter of how you wish to define the latter.  Certain corners of the world are religiously sticking to Codd's view "because Codd said so", but no one is obliged to follow them in their religious zeal, and some of those who haven't done so have found very useful and workable (and implementable !) "extensions" to Codd's needlessly limited and restricted view of what the RM should be/do (and not).

We ***need*** to be able to do things like

SELECT COUNT*PRICE AS TOTAL_AMOUNT FROM ... and

SELECT * FROM CITIES C
WHERE 100000 <= (SELECT COUNT (*) FROM INHABITANTS I WHERE I.CNAME = C.CNAME)

(With sincere apologies to Hugh if this last example ruins his day and triggers bad memories :-) )

And I'm having a blank stare at "you'll have your work cut out".

Quote from johnwcowan on August 16, 2019, 1:31 am
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

Well, yes and no.  In particular:

  • Oracle still supports preprocessors for Fortran, Cobol, PL/I, and C/C++ that allow you to write SQL directly in your procedural code. and return the results in host variables a row at a time.
  • There are also stored procedures running server-side: in Oracle they can be in Oracle's bastard Ada (PL/SQL), C, C++, or Java, whereas PostgreSQL supports its PL/pgSQL language, C, Perl, Python, and Tcl (but not Java).
  • Finally, SQlite, where the application and the database run in the same process, has hooks for writing functions, aggregates, and virtual tables either in C/C++ or in any language that can be embedded in C, such as Python, Lua, or (in principle) Chibi Scheme or Guile Scheme.

What I was talking about was the ability to

SELECT UNIT_PRICE*UNIT_COUNT AS TOTAL_AMOUNT

which falls outside of anything Codd ever admitted in his model (well that might be an exaggeration but it is certainly true of Codd early yrs).  Preprocessors don't address that, nor do stored procedures (UDFs might indeed though) and solutions predicated on "application and DBMS in same process" are not likely to impress anyone understanding "data server" ...  (thinking of the "shared" kind of course.)

Quote from Erwin on August 17, 2019, 12:45 pm
Quote from dandl on August 15, 2019, 11:47 pm
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

But what TTM describes is more or less exactly that host language. TTM provides a scalar type system and defines a number of other 'host language' features. TD suggests particular language features for doing computations of all kinds including variables, branches, loops and recursion, making it Turing complete. The extensions in Andl and Rel add aggregation via second order functions. They all stand outside the RA/RC, which is first order and not TC. I think you'll have your work cut out.

[I had a feeling I was going to trigger some kind of response...]

What TTM describes/defines is a ***unification*** of Codd's H and R that makes Codd's distinction entirely moot.

H and R?

I don't think I have ever seen a clear, authoritative and widely accepted declaration of what the RA is and is not. But it has always seemed to me that:

  • the RA describes operations on relations (hence the 'R'),
  • the RA describes first order queries
  • the RA does not describe computations on attributes, aggregation, or transitive closure
  • the RA is not Turing complete
  • a practical query language must do all of the above, so it must combine RA with something else (aka host language/TTM).

But if you prefer to say that you have 'extended' the RA, I think we are arguing over definitions rather than substance.

Whether aggregations (and transitive closures and their generalized incarnations for that matter) "stand outside RA/RC" or not, is entirely a matter of how you wish to define the latter.  Certain corners of the world are religiously sticking to Codd's view "because Codd said so", but no one is obliged to follow them in their religious zeal, and some of those who haven't done so have found very useful and workable (and implementable !) "extensions" to Codd's needlessly limited and restricted view of what the RM should be/do (and not).

We ***need*** to be able to do things like

SELECT COUNT*PRICE AS TOTAL_AMOUNT FROM ... and

SELECT * FROM CITIES C
WHERE 100000 <= (SELECT COUNT (*) FROM INHABITANTS I WHERE I.CNAME = C.CNAME)

(With sincere apologies to Hugh if this last example ruins his day and triggers bad memories :-) )

Just so. And that's only the start of it. Alice deals with much of this.

And I'm having a blank stare at "you'll have your work cut out".

My impression was that the OP was trying to prove something and finding it difficult. Extending the RA is hardly likely to make it easier.

Andl - A New Database Language - andl.org
Quote from dandl on August 17, 2019, 2:10 pm

My impression was that the OP was trying to prove something and finding it difficult. Extending the RA is hardly likely to make it easier.

I can confirm that finding a solution (proofs included) for the problem of database constraints of arbitrary complexity (stated point of interest in a later post) is indeed difficult.  Especially if one is supposed to also include operations of aggregation that are needed for constraints such as "number of inscriptions for a session of a course cannot exceed the maximum stated in the database for that session".  More especially if the solution is also expected to behave correctly under features such as TTM's multiple assignment.  But if (part of) the problem (as perceived) is that the "version" of RA he's looking at does not have aggregation, then extending the RA to include aggregation (whatever the consequences for the correspondence to RC) is certainly not going to make it more difficult to deal with the problem as perceived.

An algebra is just a well-defined set of operators.  That's why at various points in time and in various places, I have defended the position that there is no such thing as "THE" RA.  Anyone can define any set of operators operating on relation values and returning relation values.  Any such set is an algebra.  Which may be the underlying reason for your "never seen a clear, authoritative and widely accepted declaration of what the RA is and is not.".

Anything else I can say on the matter would only be me repeating myself more than I already have.

Quote from Erwin on August 17, 2019, 3:31 pm

An algebra is just a well-defined set of operators.  That's why at various points in time and in various places, I have defended the position that there is no such thing as "THE" RA.  Anyone can define any set of operators operating on relation values and returning relation values.  Any such set is an algebra.  Which may be the underlying reason for your "never seen a clear, authoritative and widely accepted declaration of what the RA is and is not.".

Indeed, there is no such thing as "the relational algebra."

There are relational algebras.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Erwin on August 17, 2019, 3:31 pm
Quote from dandl on August 17, 2019, 2:10 pm

My impression was that the OP was trying to prove something and finding it difficult. Extending the RA is hardly likely to make it easier.

An algebra is just a well-defined set of operators.  That's why at various points in time and in various places, I have defended the position that there is no such thing as "THE" RA.  Anyone can define any set of operators operating on relation values and returning relation values.  Any such set is an algebra.  Which may be the underlying reason for your "never seen a clear, authoritative and widely accepted declaration of what the RA is and is not.".

I agree there is no such thing as the RA, and I would not expect it. But I would expect RA to constitute a clearly defined category: a set of algebras if you like, with a clearly defined membership function.

If we take your proposition, then an RA is closed over relations. There is no room for any operations on attributes. But if we relax that slightly and remember that (a) JOIN is dependent on testing attribute values for equality and (b) allow constant values, then we have Codd. So that does seem to be the irreducible minimum.

Alice says: "the family of unnamed algebra operators {σ, π,×,, −} is nonredundant, and the same is true for the named algebra operators {σ, π, , δ,, −}." These are referred to as SPJUN (select, project, join, union, negation) and SPJRUN (with rename). I think this is what most people expect of "the RA", but there is a lot it can't do (queries it cannot express).

Alice proposes various extensions: the domain calculus, new, while and so on. Is it still an RA if you add all these things? Is it still a bicycle if you add a V8 motor?

 

 

 

Andl - A New Database Language - andl.org
Quote from dandl on August 17, 2019, 2:10 pm
Quote from Erwin on August 17, 2019, 12:45 pm
Quote from dandl on August 15, 2019, 11:47 pm
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

 

What TTM describes/defines is a ***unification*** of Codd's H and R that makes Codd's distinction entirely moot.

H and R?

Yes, I'm waiting on the answer to that. H = "Host"; R = "Relational Calculus/Algebra"?

Codd's distinction between what and what?

I don't think I have ever seen a clear, authoritative and widely accepted declaration of what the RA is and is not.

"a clearly defined category ... clearly defined membership function" [from a later message]

We seem to be in the umptieth round of the semi-annual 'explain Relationally Complete to David B'. Since the copious previous answers here don't seem to be working for you, perhaps try another forum?

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

But it has always seemed to me that:

  • the RA describes operations on relations (hence the 'R'),
  • the RA describes first order queries
  • the RA does not describe computations on attributes, aggregation, or transitive closure
  • the RA is not Turing complete
  • a practical query language must do all of the above, so it must combine RA with something else (aka host language/TTM).

 

Quote from AntC on August 18, 2019, 12:35 pm
Quote from dandl on August 17, 2019, 2:10 pm

H and R?

Yes, I'm waiting on the answer to that. H = "Host"; R = "Relational Calculus/Algebra"?

Codd's distinction between what and what?

Host language and data language, as per "let us denote these two languages by H and R respectively", or something of that ilk.

Distinction between host language and data language.

Quote from AntC on August 18, 2019, 12:35 pm

I don't think I have ever seen a clear, authoritative and widely accepted declaration of what the RA is and is not.

"a clearly defined category ... clearly defined membership function" [from a later message]

We seem to be in the umptieth round of the semi-annual 'explain Relationally Complete to David B'. Since the copious previous answers here don't seem to be working for you, perhaps try another forum?

No, you're wrong about that. You really should read more carefully.

Relationally complete is a term coined by Codd and used in his 1972 paper to mean something he first defined very carefully, and then used to set a minimum bar for a future query language. Others since have used it in a similar way, to mean a query language at least as capable as the RA. But that isn't relevant to my question, which is about the RA itself, not its use as the baseline for relational completeness.

My issue is with the status of the definition of the RA. Codd defined his with great care, but you (and TTM) have not. I would think that RM Pre 18 envisages a set of operators rather larger than those proposed by Codd, but this is never made clear.

Since TTM provides in OO Pre 6 an explicit definition for aggregation, one would assume that aggregation is not part of the RA envisaged in RM Pre 18. Correct?

What about RENAME? Where did that sneak in? Certainly not in Codd. Is that part of some RA?

What about computing new values? Codd relied on constant values, and mentioned the need for computation. TTM specifies the creation of new values from literals, but makes no mention of creating new computed values. Is that to be found in some (extended) RA?

Contrast: where do you find the clear, authoritative and widely accepted declaration of what Turing complete is and is not. (It's in a 1936 paper by Turing, just as 'Relationally complete' is in Codd 1972.) Note there are plenty of formalisms provably Turing complete: the Universal Turing Machine; the Lambda Calculus; the SK-combinator calculus; the pi-calculus [Milne].

None of those formalisms are described/defined at the implementor's level your imagination seems to be limited to here:

I don't have a problem with implementing a relationally complete language (done). I have a problem with a thread entitled "Aggregates in relational calculus" with no valid basis.

Andl - A New Database Language - andl.org
Quote from Erwin on August 18, 2019, 8:19 pm
Quote from AntC on August 18, 2019, 12:35 pm
Quote from dandl on August 17, 2019, 2:10 pm

H and R?

Yes, I'm waiting on the answer to that. H = "Host"; R = "Relational Calculus/Algebra"?

Codd's distinction between what and what?

Host language and data language, as per "let us denote these two languages by H and R respectively", or something of that ilk.

Didn't happen. Unless you can provide a link?

Distinction between host language and data language.

They remain distinct, both in TTM and in every implementation I know of.

Andl - A New Database Language - andl.org