The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Correlated subqueries

PreviousPage 7 of 8Next
Quote from dandl on October 30, 2020, 1:53 am
Quote from Darren Duncan on October 29, 2020, 9:11 pm
Quote from dandl on October 29, 2020, 5:09 am

BTW SQL has no sorted tables either.

I believe that it does.  This is the result type of a query that uses ORDER BY.

Although I would call it Tuple_Array.

The result of an SQL query is not a table, it's a result set. Tables in SQL are not sorted.

 

I think there are a bunch of other situations where you can order tables in most products.

Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Quote from Jake Wheat on October 30, 2020, 5:00 am
Quote from dandl on October 29, 2020, 1:49 pm

Trivial, to coin a phrase. ORD(attribute-list) is a function invoked for each tuple. It returns the ordinal of that tuple with respect to the ordering on attribute-list. In the case of the standard supplier data it will extend the relation by the values 1,2,3,4,5 for Adams,Blake,Clarke,Jones,Smith respectively. This syntax is not implemented: in Andl the syntax is:

S .order(SNAME) .select{ Sequence := ORD() }

I'm short of time right now, but if you like I can run some samples to show the output.

I think it's just a syntactic issue you have, for instance, if

S .order(SNAME) == S

then most people would conclude that

S .order(SNAME) .select{ Sequence := ORD() } == S .select{ Sequence := ORD() }

If you tweak the syntax to not imply this is the case, then I think some of the objections go away.

I think that's a fair call. Another way of putting it is that functions should be pure. I chose the syntax for its echoes of SQL, but it seems that's not a good result. So:

S .select{ Sequence := ORD({SNAME}) } != S .select{ Sequence := ORD({}) }

The latter case is still useful: it just provides a sequence number with no promises beyond being unique. In many situations that's enough.

Please note: we need to be clear whether we are assigning a rank to a value within a set of values, or a unique ordinal for each tuple. An alternative syntax might be:

S .select{ Sequence := ORD({SNAME,*}) } != S .select{ Sequence := ORD({*}) }

In Andl the * means all (remaining) attributes.

Andl - A New Database Language - andl.org
Quote from Jake Wheat on October 30, 2020, 7:08 am
Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Quote from Erwin on October 30, 2020, 7:53 am
Quote from Jake Wheat on October 30, 2020, 7:08 am
Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Erwin on October 30, 2020, 7:53 am

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Are there people on this forum who don't condone RVAs? I'm happy to say that these other people are using something different that they also call relational algebra, but it isn't the one I use.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Seems very close. Is there a 'TTM compatible' way to choose one row from a set of ties?

Quote from Dave Voorhis on October 30, 2020, 10:20 am
Quote from Erwin on October 30, 2020, 7:53 am
Quote from Jake Wheat on October 30, 2020, 7:08 am
Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

So you are saying that rank is not a good abstraction for custom aggregate operator implementors? Is this a position based only on optimisation, or something else.

Do you think your idea of the generalized operator which uses a sorted list of tuples with random access is equivalent to using rank in terms of understanding the meaning of queries? Is it reasonable to let query writers work on this level of abstraction, and aggregate implementors on the other level?

Quote from Jake Wheat on October 30, 2020, 12:26 pm
Quote from Dave Voorhis on October 30, 2020, 10:20 am
Quote from Erwin on October 30, 2020, 7:53 am
Quote from Jake Wheat on October 30, 2020, 7:08 am
Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

So you are saying that rank is not a good abstraction for custom aggregate operator implementors? Is this a position based only on optimisation, or something else.

I'm not convinced it's a good abstraction, partly because of practical optimisation concerns and partly due to what Erwin pointed out, that (for example) a common operation like 'FIRST' using a rank of '1' could obtain n tuples where 0 ≥ n < ∞.  Obtaining the element in the first index of an array doesn't have that problem; there are either 0 elements or 1 can be obtained.

Of course, RANK allows a 'FIRST' to be defined that returns n tuples where 0 ≥ n < ∞, if that's desired. But if only RANK is available, it's more difficult to define a 'FIRST' that can only return 0 or 1 tuples.

Do you think your idea of the generalized operator which uses a sorted list of tuples with random access is equivalent to using rank in terms of understanding the meaning of queries? Is it reasonable to let query writers work on this level of abstraction, and aggregate implementors on the other level?

The query writer and the aggregate implementor will typically be the same person. I don't see using arrays to be a problem.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Erwin on October 30, 2020, 7:53 am
Quote from Jake Wheat on October 30, 2020, 7:08 am
Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either.  This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

RVAs are not part of the RA. What we use is an amalgam of 3 things:

  1. The original 'Codd' RA (SPJRUN) [This set has redundancy, App-A does it in 5]
  2. Specific extensions: new value (EXTEND/RELCON), aggregation, limited recursion (TC/GTC/while)
  3. A type system, for attribute values.

RVAs are part of the type system, not the (E)RA itself.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

RANK is not enough, unless it can assign a monotonically ascending value to each tuple, based on a deterministic ordering of attribute values. Conceptually RANK is at the heart of all the Andl ordered queries, and is the basis of my claim that they are purely relational. Note:

  • The discussion of RANK under RM VSS 5 falls short.
  • I don't know SQL windowing well enough to know what else might be hidden in there.

I can provide formal definitions of the 9 operators comprising the ERA. I haven't tried to produce one for RANK, but it looks plausible.

 

Andl - A New Database Language - andl.org
Quote from Jake Wheat on October 30, 2020, 12:24 pm
Quote from Erwin on October 30, 2020, 7:53 am

First, it depends on whose "standard" RA you're referring to.  With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Are there people on this forum who don't condone RVAs? I'm happy to say that these other people are using something different that they also call relational algebra, but it isn't the one I use.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL.  SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Seems very close. Is there a 'TTM compatible' way to choose one row from a set of ties?

No.  But some of the early relational folk (I'm aware of two and they don't discuss here, perhaps they used to lurk here just to see what was going on, though I doubt it) still prefer to stick to Codd's ultimate choice of forbidding relations as domains, so you can't have relation-valued attributes, since there's then no domain such an attribute could draw its values from.

As for "TTM compatible way to choose (randomly, I suppose) some tuple from a given set", I suppose the answer is "implement the operator" (and document it loud and clear that it breaks determinacy of expressions).  Tutorial D has "TUPLE FROM ..." which requires the argument relation to be singleton, I suppose we could also have "ANYTUPLE FROM ..." (I would ban it from my language if I had the choice, but that's just a personal preference.).

PreviousPage 7 of 8Next