The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Aggregates in relational calculus

I have recently found myself wanting to convert from relational algebra to calculus as part of a database implementation. All seemed to make sense until I came to deal with aggregates.

I did a quick perusal of the literature and found the paper "Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions" by Anthony Klug. I've only read over it a few times so far, so I'm still at the stage of trying to understand it. But, I thought I would ask here if there are pieces of literatures beside Klug that could be recommened for this topic?

As far as I'm aware neither the RA nor RC deal with aggregates (well, or at all).

TTM OO Pre 6 sets out the behaviour of an aggregate operator as an iterated function. This deals with most but not all kinds of aggregations.

The Alice book deals briefly with aggregates (see p91).

My TTM implementation Andl allows arbitrary aggregates to be defined as second order functions, akin to map. Tutorial D does not.

What specifically are you trying to do?

 

Andl - A New Database Language - andl.org
Quote from Joe on August 14, 2019, 9:55 pm

I have recently found myself wanting to convert from relational algebra to calculus as part of a database implementation. All seemed to make sense until I came to deal with aggregates.

Hi Joe, "implementation" sounds like you're wanting a practical solution(?) rather than theoretical capability.

I did a quick perusal of the literature and found the paper "Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions" by Anthony Klug. I've only read over it a few times so far, so I'm still at the stage of trying to understand it. But, I thought I would ask here if there are pieces of literatures beside Klug that could be recommened for this topic?

Klug is one of the classics for theory. David B also points to other theoretical treatments:

As far as I'm aware neither the RA nor RC deal with aggregates

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

Another theoretical approach (more in the style of DTATRM Appendix A) is the Hall, Hitchcock & Todd 1975 approach to calculations -- essentially lambda-calculus expressions within the query language, combined with Tutorial D's GROUP operator or similar.

Quote from dandl on August 14, 2019, 11:56 pm

My TTM implementation Andl allows arbitrary aggregates to be defined as second order functions, akin to map. Tutorial D does not.

Though Rel extends Tutorial D so it does. See https://reldb.org/c/index.php/read/user-defined-aggregate-operators/

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on August 15, 2019, 10:55 am

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

Quote from dandl on August 14, 2019, 11:56 pm

As far as I'm aware neither the RA nor RC deal with aggregates (well, or at all).

TTM OO Pre 6 sets out the behaviour of an aggregate operator as an iterated function. This deals with most but not all kinds of aggregations.

The Alice book deals briefly with aggregates (see p91).

Thanks. I shall checkout the Alice book.

What specifically are you trying to do?

It's for an implementation of constraint checking, for when constraints are described using arbitrary expressions. There are a few papers (plus a thesis) by Blaustein, J.M.Nicolas, and Decker, all of which describe similar approaches to efficient evaluation. The techniques hinge on having an expression in the form of a calculus sentence rather than in algebra. Once in calculus, you can apply various simplifications, and also (potentially) specialise the expression on the basis of the prefix quantifiers.

I have something that works for expressions that are written in calculus. I then embarked on the algebra-to-calculus mapping, thinking it would be a doddle; which it was, until I remembered about aggregates.

Quote from AntC on August 15, 2019, 10:55 am
Hi Joe, "implementation" sounds like you're wanting a practical solution(?) rather than theoretical capability.
Yes. But also, wanting to make sure any implementation I do is backed with sound theory : )

Another theoretical approach (more in the style of DTATRM Appendix A) is the Hall, Hitchcock & Todd 1975 approach to calculations -- essentially lambda-calculus expressions within the query language, combined with Tutorial D's GROUP operator or similar.

Thanks. I was unaware of this work. I see that there is a PDF version marked up with notes from Hugh Darwen; I'll take a read of it.

Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Indeed. Codd 1972 is quite explicit that the query language is embedded in some 'host' language, and that's where you carry out calculations, including aggregating. If you're wanting a practical implementation, I'd be inclined to take that approach.

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

But what TTM describes is more or less exactly that host language. TTM provides a scalar type system and defines a number of other 'host language' features. TD suggests particular language features for doing computations of all kinds including variables, branches, loops and recursion, making it Turing complete. The extensions in Andl and Rel add aggregation via second order functions. They all stand outside the RA/RC, which is first order and not TC. I think you'll have your work cut out.

Andl - A New Database Language - andl.org
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

Well, yes and no.  In particular:

  • Oracle still supports preprocessors for Fortran, Cobol, PL/I, and C/C++ that allow you to write SQL directly in your procedural code. and return the results in host variables a row at a time.
  • There are also stored procedures running server-side: in Oracle they can be in Oracle's bastard Ada (PL/SQL), C, C++, or Java, whereas PostgreSQL supports its PL/pgSQL language, C, Perl, Python, and Tcl (but not Java).
  • Finally, SQlite, where the application and the database run in the same process, has hooks for writing functions, aggregates, and virtual tables either in C/C++ or in any language that can be embedded in C, such as Python, Lua, or (in principle) Chibi Scheme or Guile Scheme.
Quote from johnwcowan on August 16, 2019, 1:31 am
Quote from Erwin on August 15, 2019, 9:55 pm
Quote from AntC on August 15, 2019, 10:55 am

Hmmmmmmmmmm.  The impracticalites of that are why even SQL didn't follow suit.  (And were summarized by Hugh better than anyone can as : it seemed to us if we wanted to do a computation we'd be forced to halt the query, do the computation in the host language and then somehow get that computation back into the query processing mechanism so it could continue with the rest of the query.)

  • Finally, SQlite, where the application and the database run in the same process, has hooks for writing functions, aggregates, and virtual tables either in C/C++ or in any language that can be embedded in C, such as Python, Lua, or (in principle) Chibi Scheme or Guile Scheme.

Andl does this too. Both the Postgres and SQLite implementations allow arbitrary functions including aggregation to be executed during the evaluation of a query.

Andl - A New Database Language - andl.org