The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Correlated subqueries

PreviousPage 5 of 8Next
Quote from Jake Wheat on October 28, 2020, 1:52 am
Quote from dandl on October 27, 2020, 10:18 am

I generalised aggregation in Tutorial D and implemented it in Rel, which is documented at https://reldb.org/c/index.php/read/user-defined-aggregate-operators/

You did indeed. I was surprised at the complexity: I had implemented it in Andl with a single operator : fold().

Can you sensibly provide all SQL window functionality via fold? Do you ask users to write window queries using fold directly instead? Or is this only about user defined window functions?

Not at all. The fold() function enables generalised aggregation. The arguments are an attribute name and binary operator, and the result is the aggregated value. Eg fold(income,+) is SUM(INCOME). Then you can write a function definition to make that more convenient.

Windowing functions are enabled by a different function: order(attribute). A running sum looks like this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

The functions are hard-coded: LEAD/LAG/NTH, ORD, ORDG (order within group). Aggregation is the running value, and resets on change of ordering attribute. It does most of what SQL does, and it's dead easy to use and implement. Dave could knock it off in a weekend if he didn't have a philosophical objection.

As noted in the conclusion, implementing windowing functions is straightforward. The ad hoc generalisation of windowing is to simultaneously be able to access the current tuple (or attribute value) and arbitrarily any other tuple (or value of the same attribute.) Regardless what mechanism is used to implement it, semantically it is an array.

I look forward to seeing it. No, Andl has no arrays and the semantics of its windowing are not those of an array. There is no means provided to store a sorted dataset and the relative access provided by LEAD/LAG etc is transitory. The mental model is more like inspecting a flow of traffic: you can look at a car, or the one in front or the one behind, but you can never do the things you can with arrays, like absolute indexing or iterating over the whole array.

I think there may be a way to iterate over a sorted array representing the entire window and do an operation that depends on that order in SQL windows, but I don't think I've ever seen it in use.

No, that really isn't what it's for. ARRAY is the wrong concept.

 

Andl - A New Database Language - andl.org

I like random access. I have user interface mechanisms that make use of it, much like the LIMIT and OFFSET in many SQL dialects but more flexible.

TTM despises it in the relational model, but this is outside the relational model and in the realm of general programming.

In language design there is occasional need for indexed retrieval of a single item from a collection, although I find I use it less and less. I like map and fold because they bring the function to the collection, rather than picking out items and feeding them to a function.

But my intention here was only to talk about the power of the query language, not any 3GL associated with it.

I generalised aggregation in Tutorial D and implemented it in Rel, which is documented at https://reldb.org/c/index.php/read/user-defined-aggregate-operators/

You did indeed. I was surprised at the complexity: I had implemented it in Andl with a single operator : fold().

All of the complexity is to seamlessly integrate it into the existing Tutorial D framework without changes to the current Tutorial D specification. Conceptually, it's trivial.

Your generalised aggregation is explained in a 9 page document, plus footnotes, and introduces several new reserved words. I think that sets a new bar for 'trivial'.

Fold is fine, but is of course limited to a notion of the next tuple or value. It is not the same as an array construct, which is more general.

They are completely unrelated. Fold does roughly what your 9 page PDF does. Windowing is separate.

But I think you've already made up your mind.

One of the singular and notable strengths of TTM: it defined D to be a language family rather than a specific language, which leaves us free to choose much of the syntax and semantics as we see fit. I have made up my mind about my implementation. You are free to make up your mind about yours. We are all free to do the same.

Which is as it should be.

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Andl - A New Database Language - andl.org
Quote from dandl on October 28, 2020, 5:52 am

I like random access. I have user interface mechanisms that make use of it, much like the LIMIT and OFFSET in many SQL dialects but more flexible.

TTM despises it in the relational model, but this is outside the relational model and in the realm of general programming.

In language design there is occasional need for indexed retrieval of a single item from a collection, although I find I use it less and less. I like map and fold because they bring the function to the collection, rather than picking out items and feeding them to a function.

But my intention here was only to talk about the power of the query language, not any 3GL associated with it.

I generalised aggregation in Tutorial D and implemented it in Rel, which is documented at https://reldb.org/c/index.php/read/user-defined-aggregate-operators/

You did indeed. I was surprised at the complexity: I had implemented it in Andl with a single operator : fold().

All of the complexity is to seamlessly integrate it into the existing Tutorial D framework without changes to the current Tutorial D specification. Conceptually, it's trivial.

Your generalised aggregation is explained in a 9 page document, plus footnotes, and introduces several new reserved words. I think that sets a new bar for 'trivial'.

It's conceptually trivial, and were it built into the language from the start would be implementationlly trivial. The setup and description are appropriately lengthy, but it's essentially an academic paper without the usual style affectations or a literature review which would make it even longer.

Fold is fine, but is of course limited to a notion of the next tuple or value. It is not the same as an array construct, which is more general.

They are completely unrelated. Fold does roughly what your 9 page PDF does. Windowing is separate.

Yes, that's my point. Fold is limited to a notion of the next tuple or value. Windowing is an arbitrary operation on a defined partition, a set of rows/tuples.

But I think you've already made up your mind.

One of the singular and notable strengths of TTM: it defined D to be a language family rather than a specific language, which leaves us free to choose much of the syntax and semantics as we see fit. I have made up my mind about my implementation. You are free to make up your mind about yours. We are all free to do the same.

Which is as it should be.

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Not sure what that does, but it looks like S .order(SNAME) generates a sorted arra- something... :-)

It's unlikely I'll add anything like SQL windowing as an extension to Rel unless there's user demand. I don't need it, Hugh hasn't mentioned needing it, and I haven't had any of the various users mention needing it, and my aggregation document was sufficient to show how it could be done.

By the way, why do some of your operators have dots in front of the name (.order and .select) and some don't (fold)?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

They are completely unrelated. Fold does roughly what your 9 page PDF does. Windowing is separate.

Yes, that's my point. Fold is limited to a notion of the next tuple or value. Windowing is an arbitrary operation on a defined partition, a set of rows/tuples.

Fold has no concept of 'next'. It takes each of the values in turn and adds them to an accumulator, in no particular order. In effect it does aggregation as per OO Pre 6, and appears similar to what you describe on p2 in yours. [My formal treatment of aggregation is quite different, as a single operator taking a list of values as an argument, and appears more similar to your p4, although the underlying 'trivial' concept still escapes me).

I think windowing always requires an ordered list, not a set.

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Not sure what that does, but it looks like S .order(SNAME) generates a sorted arra- something... :-)

The return value of .order() is just a relation like any other. But internally, there is an ordering made available to later operators. Then fold appearing after order returns a running sum, in some random order, which resets on each change of the ordering attribute SNAME. The only effect of order is to enable/change the behaviour of operators that follow it.

It's unlikely I'll add anything like SQL windowing as an extension to Rel unless there's user demand. I don't need it, Hugh hasn't mentioned needing it, and I haven't had any of the various users mention needing it, and my aggregation document was sufficient to show how it could be done.

By the way, why do some of your operators have dots in front of the name (.order and .select) and some don't (fold)?

The dot means there is an extra first argument, being the preceding value. Here fold(A,op) is a function of 2 arguments; .order(A) is a also function of two arguments. In principle you can write order(R,A) instead.

The idea in Andl, is that any operator F(...) with N arguments can be written as .F(...) with N-1 arguments, allowing a left-to-right style. A unary operator can be written asf(arg) or  arg .f.  A binary operator can be written f(a,b), a .f(b) or a f b. You can write r1 join r2  or r1 .join(r2) or join(r1,r2).

It's kind of experimental, but it does reduce the need for parentheses, which I prefer.

The dot means the first argument is the relation on the left. You can in principle write a join b, a .join(b) or join(a,b).

Andl - A New Database Language - andl.org
Quote from dandl on October 28, 2020, 2:12 pm

They are completely unrelated. Fold does roughly what your 9 page PDF does. Windowing is separate.

Yes, that's my point. Fold is limited to a notion of the next tuple or value. Windowing is an arbitrary operation on a defined partition, a set of rows/tuples.

Fold has no concept of 'next'. It takes each of the values in turn and adds them to an accumulator, in no particular order. In effect it does aggregation as per OO Pre 6, and appears similar to what you describe on p2 in yours. [My formal treatment of aggregation is quite different, as a single operator taking a list of values as an argument, and appears more similar to your p4, although the underlying 'trivial' concept still escapes me).

The trivial concept is that it's based on a sorted array of tuples.

The complexity is to fit it into the existing Tutorial D specification via extensions rather than alterations.

I think windowing always requires an ordered list, not a set.

True. I off-handedly and rather carelessly meant "set" informally -- a bunch of things -- rather than a set-theoretical set.

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Not sure what that does, but it looks like S .order(SNAME) generates a sorted arra- something... :-)

The return value of .order() is just a relation like any other. But internally, there is an ordering made available to later operators.

Claiming implementation magic doesn't escape the fact that what you've done is implement a non-relational system. Your "relations" are not relations. Defined formally and appropriately proven, there's perhaps nothing wrong with that. You could at least make an argument that the relational model is insufficient and would benefit from sortsets (or whatever you call your like-relations-but-with-tuple-order) instead of relations.

Whether it would hold up to scrutiny or not is a separate issue. (With the "not" side perhaps being an counterargument that what you really want are relations and arrays, which is of course what Tutorial D does.)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Not sure what that does, but it looks like S .order(SNAME) generates a sorted arra- something... :-)

The return value of .order() is just a relation like any other. But internally, there is an ordering made available to later operators.

Claiming implementation magic doesn't escape the fact that what you've done is implement a non-relational system. Your "relations" are not relations. Defined formally and appropriately proven, there's perhaps nothing wrong with that. You could at least make an argument that the relational model is insufficient and would benefit from sortsets (or whatever you call your like-relations-but-with-tuple-order) instead of relations.

Whether it would hold up to scrutiny or not is a separate issue. (With the "not" side perhaps being an counterargument that what you really want are relations and arrays, which is of course what Tutorial D does.)

I don't see it that way. There is absolutely no point, internally or externally, where a sorted version of the relation comes into existence. Conceptually, the only thing added to the data model is an ordering function. The expression 'tree' is decorated by the addition of an ordering function, which various other parts of the tree can call upon for their own purposes. You can't save an ordered relation, pass it as an argument or distinguish it in any way. In TTM terms R .order(a) = R. They are the very same value.

So in the case of

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

You might find it easier to understand if it was syntactically rewritten as

S .select{ SNAME, Running := fold(+,STATUS,{SNAME}) }

The ordering has its effect on the function, not on the data. The fold function, under the influence of an ordering argument, will sort its input privately in computing a result.

Here's another. This function adds a sequence number based on an ordering by name:

S .order(SNAME) .select{ Sequence := ORD() }

You might rewrite it as:

S .select{ Sequence := ORD({SNAME}) }

You could add this to Rel easily. This is simply EXTEND, with an the ORD() function that takes one or more attribute names as its argument.

Andl - A New Database Language - andl.org
Quote from dandl on October 29, 2020, 1:15 am

I don't see it that way. There is absolutely no point, internally or externally, where a sorted version of the relation comes into existence. Conceptually, the only thing added to the data model is an ordering function. The expression 'tree' is decorated by the addition of an ordering function, which various other parts of the tree can call upon for their own purposes. You can't save an ordered relation, pass it as an argument or distinguish it in any way. In TTM terms R .order(a) = R. They are the very same value.

What's the difference between 'a sorted version of the relation coming into existence' and having an ordering function in the syntax? It's about the user explaining to the system what they want, not about how the system executes it.

The approach Dave suggests, which is also what I would have thought of, is to model window functions as operations on sorted lists of tuples, both for the person writing the queries, and for the the person implementing custom window functions (so the user would write a query which explicitly turns a relation into a sorted list of tuples, then feeds it into the window function, then turns it back into a relation after if needed - and this is only about the meaning of the query, not how it is actually executed).

What you are suggesting, if I follow it correctly, is to keep with the 'only relations' abstraction, and to hide the fact that the data may be sorted at some point, and from the query writer's perspective, it's still a purely relational operator which takes relations and produces them. But you didn't quite achieve this 100% in your syntax - it explicitly has order in there. Maybe this is a good user friendly syntax for the operation, but I think it has the potential to be confusing because you tell the user that they only work with relations, but they are kind of being half exposed to something underneath that abstraction too.

Is this is a good general approach - to always hide new operations behind a relational facade, or is it sometimes better to expose when they seem to more properly be on some other kind of data structure (in this case, a sorted list of tuples)?

What's the precise abstraction you think is good for the implementor of custom window functions? You suggest that it isn't a sorted list with potential random access, but something less powerful.

Quote from Jake Wheat on October 29, 2020, 2:20 am
Quote from dandl on October 29, 2020, 1:15 am

I don't see it that way. There is absolutely no point, internally or externally, where a sorted version of the relation comes into existence. Conceptually, the only thing added to the data model is an ordering function. The expression 'tree' is decorated by the addition of an ordering function, which various other parts of the tree can call upon for their own purposes. You can't save an ordered relation, pass it as an argument or distinguish it in any way. In TTM terms R .order(a) = R. They are the very same value.

What's the difference between 'a sorted version of the relation coming into existence' and having an ordering function in the syntax? It's about the user explaining to the system what they want, not about how the system executes it.

Not from my perspective, as a tool builder. For me, it's about listening carefully to a user problem an coming up with a solution in the form of a tool that can be rigorously specified and tested, and that the user will choose as a satisfactory solution to that problem. Then if the user has the wrong mental model of what the tool really does, they will get unexpected results.

BTW SQL has no sorted tables either.

The approach Dave suggests, which is also what I would have thought of, is to model window functions as operations on sorted lists of tuples, both for the person writing the queries, and for the the person implementing custom window functions (so the user would write a query which explicitly turns a relation into a sorted list of tuples, then feeds it into the window function, then turns it back into a relation after if needed - and this is only about the meaning of the query, not how it is actually executed).

Why? I don't see the point. Sounds worse than SQL.

What you are suggesting, if I follow it correctly, is to keep with the 'only relations' abstraction, and to hide the fact that the data may be sorted at some point, and from the query writer's perspective, it's still a purely relational operator which takes relations and produces them. But you didn't quite achieve this 100% in your syntax - it explicitly has order in there. Maybe this is a good user friendly syntax for the operation, but I think it has the potential to be confusing because you tell the user that they only work with relations, but they are kind of being half exposed to something underneath that abstraction too.

Is this is a good general approach - to always hide new operations behind a relational facade, or is it sometimes better to expose when they seem to more properly be on some other kind of data structure (in this case, a sorted list of tuples)?

What's the precise abstraction you think is good for the implementor of custom window functions? You suggest that it isn't a sorted list with potential random access, but something less powerful.

I have no idea. Very few people have used Andl (or anything much other than SQL) so there are no user problems to work from. The specific problems I set out to solve with ordered queries were:

  • skip/take
  • running sums and moving averages
  • deltas
  • sequence numbering
  • order-preserving aggregation
  • all the above as an inherent part of a more complex RA query.

Example: S .order(SNAME) .skip(100) .take(25) join SP

Looks clear enough to me.

 

 

Andl - A New Database Language - andl.org
Quote from dandl on October 29, 2020, 1:15 am

I look forward to seeing an equivalent to SQL windowing as an extension of the Rel query language, so you can write a running sum similar to this:

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

Not sure what that does, but it looks like S .order(SNAME) generates a sorted arra- something... :-)

The return value of .order() is just a relation like any other. But internally, there is an ordering made available to later operators.

Claiming implementation magic doesn't escape the fact that what you've done is implement a non-relational system. Your "relations" are not relations. Defined formally and appropriately proven, there's perhaps nothing wrong with that. You could at least make an argument that the relational model is insufficient and would benefit from sortsets (or whatever you call your like-relations-but-with-tuple-order) instead of relations.

Whether it would hold up to scrutiny or not is a separate issue. (With the "not" side perhaps being an counterargument that what you really want are relations and arrays, which is of course what Tutorial D does.)

I don't see it that way. There is absolutely no point, internally or externally, where a sorted version of the relation comes into existence. Conceptually, the only thing added to the data model is an ordering function. The expression 'tree' is decorated by the addition of an ordering function, which various other parts of the tree can call upon for their own purposes. You can't save an ordered relation, pass it as an argument or distinguish it in any way. In TTM terms R .order(a) = R. They are the very same value.

So in the case of

S .order(SNAME) .select{ SNAME, Running := fold(+,STATUS) }

You might find it easier to understand if it was syntactically rewritten as

S .select{ SNAME, Running := fold(+,STATUS,{SNAME}) }

The ordering has its effect on the function, not on the data. The fold function, under the influence of an ordering argument, will sort its input privately in computing a result.

Here's another. This function adds a sequence number based on an ordering by name:

S .order(SNAME) .select{ Sequence := ORD() }

Your syntax implies a chain of operator invocations -- value of S passed to .order(), .order() evaluated and its result value passed to .select(), .select() invoking a lambda expression fold(...) -- such that the result value of the .order() evaluation must be a relation + order value -- or an "ordered relation" or non-relation -- and can't be just a relation value.

Your description implies that the whole expression isn't a chain of operator invocations; it's actually a calculus that looks like a chain of operator invocations.

I'm afraid I find it all rather baffling.

You might rewrite it as:

S .select{ Sequence := ORD({SNAME}) }

You could add this to Rel easily. This is simply EXTEND, with an the ORD() function that takes one or more attribute names as its argument.

Sorry, I don't understand that at all.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Jake Wheat on October 29, 2020, 2:20 am
Quote from dandl on October 29, 2020, 1:15 am

I don't see it that way. There is absolutely no point, internally or externally, where a sorted version of the relation comes into existence. Conceptually, the only thing added to the data model is an ordering function. The expression 'tree' is decorated by the addition of an ordering function, which various other parts of the tree can call upon for their own purposes. You can't save an ordered relation, pass it as an argument or distinguish it in any way. In TTM terms R .order(a) = R. They are the very same value.

...What you are suggesting, if I follow it correctly, is to keep with the 'only relations' abstraction, and to hide the fact that the data may be sorted at some point, and from the query writer's perspective, it's still a purely relational operator which takes relations and produces them. But you didn't quite achieve this 100% in your syntax - it explicitly has order in there. Maybe this is a good user friendly syntax for the operation, but I think it has the potential to be confusing because you tell the user that they only work with relations, but they are kind of being half exposed to something underneath that abstraction too.

Is this is a good general approach - to always hide new operations behind a relational facade, or is it sometimes better to expose when they seem to more properly be on some other kind of data structure (in this case, a sorted list of tuples)?

What's the precise abstraction you think is good for the implementor of custom window functions? You suggest that it isn't a sorted list with potential random access, but something less powerful.

Did he really write that ???  Then, per TTM's own all-encompassing principle ( v1 == v2 ===> FORALL f, f(v1) == f(v2) ) (*), it follows that FORALL f, f(R.order(...)) == f(R).  Which basically says that there simply should not be any need for the .order() construct to even exist in the language.  So why has he introduced it ???  Asking the question is answering it : because it's needed for certain kinds of operation.  But claiming that the constructs employed are ***exactly*** the same as "just relations" is the talk of salesmen and/or quacks, not of skilled technicians.  A relation is not the same thing as a (relation, ordering spec) pair.  The set of real numbers R is not the same thing as the group R,+,*.  As simple as that.

(*) Well, to call it a TTM principle might be wrong choice of words, but Hugh certainly endorses that principle most strongly, and I cannot imagine any TTM guy disagreeing.

PreviousPage 5 of 8Next