The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Formal definitions for Relation Constant, Transitive Closure, Extended Transitive Closure

PreviousPage 4 of 6Next
Quote from dandl on March 4, 2021, 11:20 pm

WHILE is a loop or recursion over UNION.

That's misleading. In very loose terms:

  • project and new value are about removing and adding attributes
  • join, union and not are about slicing and dicing
  • select (where) and while are about removing and adding tuples.

Using while and new value in concert a query can construct a relation of any dimensions partly or entirely of computed content. It fills what is otherwise a gap in the RA.

It's not misleading, and it's not a gap in "the" (?) RA.

I'm all for adding these if Hugh & Chris see fit to update Tutorial D to include while, but I'm not yet convinced there's a compelling reason to add it otherwise.

As I said, I don't think they were ever able to formally define the entire extended RA, and their treatment of relcons, aggregation and GTC are all incomplete. I think it's too late to fix that now, but at least adding while would satisfy the VSS for GTC. They do acknowledge that SQL has the edge there.

Not sure what "the entire extended RA" means and whilst the formal definitions may be incomplete, Tutorial D is complete for its purposes.

This is the nub of the problem: you don't know and D&D don't know and you really should.We all should.

  • Codd defined the RA at the level of Alice's SPJRUN, in algebraic form, with no type system and with all values created as literals.Very precisely, formally.
  • TTM requires the 'usual operators of the relational algebra' without ever defining precisely what those are.

That's perhaps because it doesn't really matter. There's no doubt an amorphous "too few operators" that results in annoying ergonomic weakness. Beyond that, if the language is otherwise Turing Complete -- or at least permits user-defined relational operators -- it will support any necessary computation.

It is an issue if there only permissible computation is some composition of a fixed set of algebraic operators. I presume -- and perhaps Hugh can correct me if I'm wrong -- one of the reasons for defining Tutorial D as a Turing Complete language is to permit arbitrary computations, including user-defined relational operators.

Indeed, I imagine a useful D being a language with no built-in relational operators (is it still a D if the relational model is only there if you load it?) but all the requisite language facilities to user-define them in-language -- or load a preferred set from a collection of pre-written definitions. Such a language would perhaps have an interesting type system and some simple standard library with I/O support and the like, but the relational model, databases, persistence, etc., would all be language-defined constructs.

  • DTATRM has a lengthy section on the subject, but mostly it is a definition of TD, not the RA. And it is incomplete (OO Pre 6 and RM VSS 5).

My definition of the ERA is roughly:

  • Codd's RA extended to the point where queries are at least as powerful as SQL. That turns out to be new value, aggregation and while/recursion (SPJRUN+VAW).
  • Not bound to any particular type system. That excludes GROUP and UNGROUP, which rely on RVA/TVA in the type system.
  • In algebraic form and not bound to any particular language syntax. That precludes using TD to define the ERA; TD should be defined by reference to the ERA.

TD is incomplete because of its many omissions, as you are aware. Rel is complete, because it is its own definition, but as a query language it is less powerful than SQL.

I suppose that depends how you define "powerful."

I can't think of any query you can express in SQL that you can't express in Tutorial D, but obviously not necessarily the same way SQL does it.

Again, if D&D see fit to add WHILE or other mechanisms to Tutorial D, then I'm happy to add them to Rel.

Otherwise, I add extensions to suit my own purposes and meet user requests, and so far I haven't seen a compelling need for WHILE and I haven't had a user request it. Note that Rel is open source and on GitHub, so anyone can send me merge requests. I'll approve them if the code is good, and in return I will add the contributor(s) to my distributed AUTHORS.txt file.

That sounds like the railway company who wouldn't build a station in the town because there weren't any people getting on or off trains who might need one. Build one and they might come.

Perhaps, but I suspect it's unlikely. I'm hard pressed to see what significant value WHILE adds.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

That's perhaps because it doesn't really matter. There's no doubt an amorphous "too few operators" that results in annoying ergonomic weakness. Beyond that, if the language is otherwise Turing Complete -- or at least permits user-defined relational operators -- it will support any necessary computation.

It matters. SQL/DML is successful precisely because it is predominantly first order, which allows query planning, optimisation and parallelisation that would not be possible if it were TC.

I suppose that depends how you define "powerful."

The usual way: the ability to write a program in that language.

  • SQL/PSM and TD are TC, so equivalent.
  • SQL/DML is more powerful than the corresponding RA part of TD, in both aggregation and fixed point recursion.

I can't think of any query you can express in SQL that you can't express in Tutorial D, but obviously not necessarily the same way SQL does it.

  1. How about STDDEV or RANK?
  2. The GTC as per DTATRM.

Again, if D&D see fit to add WHILE or other mechanisms to Tutorial D, then I'm happy to add them to Rel.

Otherwise, I add extensions to suit my own purposes and meet user requests, and so far I haven't seen a compelling need for WHILE and I haven't had a user request it. Note that Rel is open source and on GitHub, so anyone can send me merge requests. I'll approve them if the code is good, and in return I will add the contributor(s) to my distributed AUTHORS.txt file.

That sounds like the railway company who wouldn't build a station in the town because there weren't any people getting on or off trains who might need one. Build one and they might come.

Perhaps, but I suspect it's unlikely. I'm hard pressed to see what significant value WHILE adds.

 

Andl - A New Database Language - andl.org

I just thought I'd add two cents here:

The recursive cte expression in SQL is very helpful in that it guides your thinking. You produce a seed, then you produce new tuples out of the seed tuples and then continue producing new tuples from the produced tuples in the previous stage. Obviously all must have the same heading. Despite various flaws of SQL, this is one example where the syntax helps the user do the right thing (as I've mentioned before, types are not the only way to help the user, the way you construct syntax matters too, actually more IMHO)

I'm not sure, though, that a WHILE in TD would be as helpful because the recursive cte expression interplays with the declarative form of the select.

Quote from dandl on March 8, 2021, 2:44 am

That's perhaps because it doesn't really matter. There's no doubt an amorphous "too few operators" that results in annoying ergonomic weakness. Beyond that, if the language is otherwise Turing Complete -- or at least permits user-defined relational operators -- it will support any necessary computation.

It matters. SQL/DML is successful precisely because it is predominantly first order, which allows query planning, optimisation and parallelisation that would not be possible if it were TC.

Optimisation strategies are an implementation concern. I see no problem with defining a set of primitives which permit relational algebras (any one you like!) to be user-constructed, and define built-in optimisation strategies at that level.

Then whether or not equivalent relational expressions -- defined in terms of user-defined relational operators -- can be inferred and used by the optimiser, rather than having to be hard coded in the optimiser as is done in SQL and the like, is an interesting problem.

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I suppose that depends how you define "powerful."

The usual way: the ability to write a program in that language.

  • SQL/PSM and TD are TC, so equivalent.
  • SQL/DML is more powerful than the corresponding RA part of TD, in both aggregation and fixed point recursion.

I can't think of any query you can express in SQL that you can't express in Tutorial D, but obviously not necessarily the same way SQL does it.

  1. How about STDDEV or RANK?

RANK is part of Tutorial DRel implements it, along with a facility to define new aggregate operators that I've linked here before. Though obviously my point was about the power of the language in general, not which aggregate operators are or are not provided by default. (Indeed, I don't think every SQL implementation provides STDDEV or RANK.)

  1. The GTC as per DTATRM.

Code as needed in a user-defined operator.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from tobega on March 8, 2021, 8:55 am

I just thought I'd add two cents here:

The recursive cte expression in SQL is very helpful in that it guides your thinking. You produce a seed, then you produce new tuples out of the seed tuples and then continue producing new tuples from the produced tuples in the previous stage. Obviously all must have the same heading. Despite various flaws of SQL, this is one example where the syntax helps the user do the right thing (as I've mentioned before, types are not the only way to help the user, the way you construct syntax matters too, actually more IMHO)

I'm not sure, though, that a WHILE in TD would be as helpful because the recursive cte expression interplays with the declarative form of the select.

It's the same. I've implemented it in Andl, and what you describe is exactly what it does. Here is a simple TCLOSE:

MM .while( {{ z:=MAJOR_P#, MINOR_P# }} compose MM .select{ MAJOR_P#, z:=MINOR_P# } )

The first MM is the seed (heading as per DTATRM). The argument to while is an expression evaluated once per new tuple, as per SQL. It stops when the expression generates no new tuples.

Here is GTC:

def gtc(mmq:MMQ) => do {
let mmqseed := MMQ .{ MAJOR_P#, MINOR_P#, QTY, aggqty:=QTY, level:=ord() } 
let mmqtc := mmqseed .while( 
( {{ MAJOR_P#, zj:=MINOR_P#, aggqty, level:=ord() }} compose MMQ .{ zj:=MAJOR_P#, MINOR_P#, QTY } ) 
.{ MAJOR_P#, MINOR_P#, QTY, aggqty := aggqty*QTY, level } )
print(mmqtc .pp)
mmqtc join {{ MAJOR_P# := 'P1', MINOR_P# := 'P5' }}
(mmqtc join {{ MAJOR_P# := 'P1', MINOR_P# := 'P5' }}) .{ fold(+,aggqty) }
}

Same logic, but a new value for aggqty computed for each new tuple. The last step is an aggregation, which produces the same result as per DTATRM.

Those queries are safe, but this one would produce an infinite series of integers without the where.

{{ z:=0 }} .while( {{z:=z+1}} .where(z<10) )

 

Andl - A New Database Language - andl.org

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I have no idea what that would look like. Feel free to outline your vision...

  1. The GTC as per DTATRM.

Code as needed in a user-defined operator.

Have you ever written the code? It looks hard. I can't see how to do it in Andl, with no facility to iterate over tuples.

Andl - A New Database Language - andl.org
Quote from dandl on March 8, 2021, 12:46 pm

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I have no idea what that would look like. Feel free to outline your vision...

I don't know what it would look like either, and that's exactly what makes it interesting.

We've established that as far as we know, no existing popular (at least) language effectively supports it -- at least, not without undesirable caveats -- and yet every one of us who has created a relational model implementation or a D has presumably already built all the mechanisms that such a language would need.

I'll think on it.

  1. The GTC as per DTATRM.

Code as needed in a user-defined operator.

Have you ever written the code? It looks hard. I can't see how to do it in Andl, with no facility to iterate over tuples.

I haven't written the code. Tutorial D does have a facility to iterate over tuples (ARRAYs), so there is that.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 8, 2021, 2:18 pm
Quote from dandl on March 8, 2021, 12:46 pm

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I have no idea what that would look like. Feel free to outline your vision...

I don't know what it would look like either, and that's exactly what makes it interesting.

We've established that as far as we know, no existing popular (at least) language effectively supports it -- at least, not without undesirable caveats -- and yet every one of us who has created a relational model implementation or a D has presumably already built all the mechanisms that such a language would need.

I'll think on it.

There is no problem whatever in getting an existing GP language to support the relational model per Codd. None, as long as you stick to the numbered perspective, literal values and no type system.

The problems we have all run into are:

  1. The TTM type system, and the culture clash between the named perspective and the the way named records are handled in all those languages. For SQL it was natural to go with named columns, and TTM took that idea and turned it into a fully-fledged type system, but it's just too different. So the starting point regardless has to be a type system that harmonises with chosen target languages. In my latest Andl.NET I chose to use C# generics, parameterised on an explicit header type, which in turn is a set of strings.
  2. The embedded expression language used to create new values within the type system. SQL is quite limited in that regard, which makes backending onto existing RDBMS nearly impossible. In Andl.NET I used a function/relcon approach, which works better and is SQL backend-able.

So I was able to create an implementation of the 9 operator ERA (SPJRUN+VAW) natively within C#: fully functional, but unfortunately not immune to runtime type errors without a compiler mod or preprocessor. I think that's a good start, but if you want to drop the RA entirely I have no idea where you would go.

  1. The GTC as per DTATRM.

Code as needed in a user-defined operator.

Have you ever written the code? It looks hard. I can't see how to do it in Andl, with no facility to iterate over tuples.

I haven't written the code. Tutorial D does have a facility to iterate over tuples (ARRAYs), so there is that.

Pretty clunky isn't it? Not something you're going to knock up in 5 minutes, and not generalisable either. Sounds like a good argument for WHILE.

And incidentally, WHILE turns out to be really useful for other times you want to add new tuples, not just GTC.

 

Andl - A New Database Language - andl.org
Quote from dandl on March 9, 2021, 12:29 am
Quote from Dave Voorhis on March 8, 2021, 2:18 pm
Quote from dandl on March 8, 2021, 12:46 pm

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I have no idea what that would look like. Feel free to outline your vision...

I don't know what it would look like either, and that's exactly what makes it interesting.

We've established that as far as we know, no existing popular (at least) language effectively supports it -- at least, not without undesirable caveats -- and yet every one of us who has created a relational model implementation or a D has presumably already built all the mechanisms that such a language would need.

I'll think on it.

There is no problem whatever in getting an existing GP language to support the relational model per Codd. None, as long as you stick to the numbered perspective, literal values and no type system.

The problems we have all run into are:

  1. The TTM type system, and the culture clash between the named perspective and the the way named records are handled in all those languages. For SQL it was natural to go with named columns, and TTM took that idea and turned it into a fully-fledged type system, but it's just too different. So the starting point regardless has to be a type system that harmonises with chosen target languages. In my latest Andl.NET I chose to use C# generics, parameterised on an explicit header type, which in turn is a set of strings.
  2. The embedded expression language used to create new values within the type system. SQL is quite limited in that regard, which makes backending onto existing RDBMS nearly impossible. In Andl.NET I used a function/relcon approach, which works better and is SQL backend-able.

So I was able to create an implementation of the 9 operator ERA (SPJRUN+VAW) natively within C#: fully functional, but unfortunately not immune to runtime type errors without a compiler mod or preprocessor. I think that's a good start, but if you want to drop the RA entirely I have no idea where you would go.

Sorry, I must have missed something. I don't know what "9 operator ERA (SPJRUN+VAW)" means.

The possibility of runtime type errors is the sort of "undesirable caveats" I had in mind. Indeed, that's why we implement whole new languages -- to get around precisely those limitations in the popular programming languages.

What's interesting to me is what a programming language -- a general-purpose programming language -- would need to look like to make it (relatively) easy for a programmer to create a D using it.

  1. The GTC as per DTATRM.

Code as needed in a user-defined operator.

Have you ever written the code? It looks hard. I can't see how to do it in Andl, with no facility to iterate over tuples.

I haven't written the code. Tutorial D does have a facility to iterate over tuples (ARRAYs), so there is that.

Pretty clunky isn't it? Not something you're going to knock up in 5 minutes, and not generalisable either. Sounds like a good argument for WHILE.

And incidentally, WHILE turns out to be really useful for other times you want to add new tuples, not just GTC.

If I'm using Rel, it's not something I generalise. I create an operator as needed.

Given that it's straightforward to create user-defined operators, I don't see the need for a built-in WHILE.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 9, 2021, 10:23 am
Quote from dandl on March 9, 2021, 12:29 am
Quote from Dave Voorhis on March 8, 2021, 2:18 pm
Quote from dandl on March 8, 2021, 12:46 pm

Indeed, the more I think about it, the more it seems to me that defining a language to support arbitrary user-defined relational models is a more interesting -- and perhaps worthwhile -- general language design problem to solve than the usual rearranging of some set of built-in relational operators to make a SQL replacement.

I have no idea what that would look like. Feel free to outline your vision...

I don't know what it would look like either, and that's exactly what makes it interesting.

We've established that as far as we know, no existing popular (at least) language effectively supports it -- at least, not without undesirable caveats -- and yet every one of us who has created a relational model implementation or a D has presumably already built all the mechanisms that such a language would need.

I'll think on it.

There is no problem whatever in getting an existing GP language to support the relational model per Codd. None, as long as you stick to the numbered perspective, literal values and no type system.

The problems we have all run into are:

  1. The TTM type system, and the culture clash between the named perspective and the the way named records are handled in all those languages. For SQL it was natural to go with named columns, and TTM took that idea and turned it into a fully-fledged type system, but it's just too different. So the starting point regardless has to be a type system that harmonises with chosen target languages. In my latest Andl.NET I chose to use C# generics, parameterised on an explicit header type, which in turn is a set of strings.
  2. The embedded expression language used to create new values within the type system. SQL is quite limited in that regard, which makes backending onto existing RDBMS nearly impossible. In Andl.NET I used a function/relcon approach, which works better and is SQL backend-able.

So I was able to create an implementation of the 9 operator ERA (SPJRUN+VAW) natively within C#: fully functional, but unfortunately not immune to runtime type errors without a compiler mod or preprocessor. I think that's a good start, but if you want to drop the RA entirely I have no idea where you would go.

Sorry, I must have missed something. I don't know what "9 operator ERA (SPJRUN+VAW)" means.

The possibility of runtime type errors is the sort of "undesirable caveats" I had in mind. Indeed, that's why we implement whole new languages -- to get around precisely those limitations in the popular programming languages.

What's interesting to me is what a programming language -- a general-purpose programming language -- would need to look like to make it (relatively) easy for a programmer to create a D using it.

Maybe we could start by looking at how you do the static type checking in Rel. I assume you have a separate compilation step that type-checks all the things? Then we can try to translate that into type-system features.

Off the top of my head I assume you need some element of structural typing of tuples. But it's not just by type, you need name-and-type. As far as I can tell, OCAML objects work a bit like that, but their records don't. Scala has some elements of it for interfaces. And of course a whole lot of dynamic languages, if we want to count them. Julia has structural typing for implicit types, but nominal typing for explicit types, and unfortunately has interleaved type-checking and execution. Typescript actually seems to be able to do it, and it does type-checking in a compile-step.

Next, you need to be able to create types dynamically, I suppose, and have them inferred? Or would you require an explicit type declaration for every new heading produced?

What else?

PreviousPage 4 of 6Next