Correlated subqueries

#71 · October 30, 2020, 2:13 pm

Quote from Erwin on October 30, 2020, 2:13 pm

Quote from Dave Voorhis on October 30, 2020, 10:20 am

Quote from Erwin on October 30, 2020, 7:53 am

Quote from Jake Wheat on October 30, 2020, 7:08 am

Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either. This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to. With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL. SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

Hmmmmmm. But sorting into an array with a sort spec that potentially leaves ties and no further input to tell the system how to handle them, leaves exactly the same problem.

RANK produces a relation that guarantees that the values for the RANK attribute in all tuples is in the range 1-n (bounds included, n is cardinality of the input) or alternatively 0-n (upper bound excluded). It does ***not*** guarantee that each possible RANK value appears exactly once (that is, the resulting relation is not guaranteed to satisfy a constraint KEY {RANK}). That is only the case if the [set of all] attributes used in the sort spec are a superset of some key the input is known to satisfy. In that case, KEY {RANK} will be satisfied, and in that case, I'd say RANK ***is*** the proper relational abstraction imo. Using arrays does not alter that condition/circumstance.

(I should add here that I am now talking of that particular version of RANK that is the equivalent of the EXTEND-with-RVA, count RVA, project away RVA approach I mentioned. DTATRM explicitly mentions this very same expansion (immediately after fig. 10-2), mentions that in previous writings the authors had claimed the opposite but that they now disavow of that previous claim (footnote), explicitly shows the equal RANK values in one of the examples, and explicitly mentions (immediately before fig. 10-2) that "get the three heaviest parts" can produce a relation with four tuples. That is, "get the three heaviest parts" is in fact "get all the parts that have a weight ranked in the top-3 of weights".

Quote from Dave Voorhis on October 30, 2020, 10:20 am

Quote from Erwin on October 30, 2020, 7:53 am

Quote from Jake Wheat on October 30, 2020, 7:08 am

Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either. This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to. With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL. SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

Hmmmmmm. But sorting into an array with a sort spec that potentially leaves ties and no further input to tell the system how to handle them, leaves exactly the same problem.

RANK produces a relation that guarantees that the values for the RANK attribute in all tuples is in the range 1-n (bounds included, n is cardinality of the input) or alternatively 0-n (upper bound excluded). It does ***not*** guarantee that each possible RANK value appears exactly once (that is, the resulting relation is not guaranteed to satisfy a constraint KEY {RANK}). That is only the case if the [set of all] attributes used in the sort spec are a superset of some key the input is known to satisfy. In that case, KEY {RANK} will be satisfied, and in that case, I'd say RANK ***is*** the proper relational abstraction imo. Using arrays does not alter that condition/circumstance.

(I should add here that I am now talking of that particular version of RANK that is the equivalent of the EXTEND-with-RVA, count RVA, project away RVA approach I mentioned. DTATRM explicitly mentions this very same expansion (immediately after fig. 10-2), mentions that in previous writings the authors had claimed the opposite but that they now disavow of that previous claim (footnote), explicitly shows the equal RANK values in one of the examples, and explicitly mentions (immediately before fig. 10-2) that "get the three heaviest parts" can produce a relation with four tuples. That is, "get the three heaviest parts" is in fact "get all the parts that have a weight ranked in the top-3 of weights".

Author of SIRA_PRISE

#72 · October 30, 2020, 3:35 pm

Quote from Dave Voorhis on October 30, 2020, 3:35 pm

Quote from Erwin on October 30, 2020, 2:13 pm

Quote from Dave Voorhis on October 30, 2020, 10:20 am

Quote from Erwin on October 30, 2020, 7:53 am

Quote from Jake Wheat on October 30, 2020, 7:08 am

Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either. This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to. With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL. SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

Hmmmmmm. But sorting into an array with a sort spec that potentially leaves ties and no further input to tell the system how to handle them, leaves exactly the same problem.

Per my response at https://forum.thethirdmanifesto.com/forum/topic/correlated-subqueries/?part=7#postid-987729, arrays present a slightly more generally ergonomic, and perhaps pragmatic solution than RANK at perhaps the expense of some theoretical purity.

'FIRST' intended to obtain 1 tuple (or perhaps pathologically, 0 tuples) using a rank of '1' could obtain n tuples where 0 ≥ n < ∞, with no easy way to obtain just one tuple, at least not without adding another mechanism like converting the relation emitted by RANK to an array.

Obtaining the element in the first index of an array doesn't have that problem; there are either 0 elements or 1 can be obtained. Indeed, obtaining any n tuples or any i^th tuple is trivial.

Quote from Erwin on October 30, 2020, 2:13 pm

Quote from Dave Voorhis on October 30, 2020, 10:20 am

Quote from Erwin on October 30, 2020, 7:53 am

Quote from Jake Wheat on October 30, 2020, 7:08 am

Quote from Erwin on October 29, 2020, 9:28 pm

And FWIW, here's how RANK is a shorthand for a combo of basic constructs already present in any language that can validly claim to be TTM-conformant :

And to head off at the pass silly remarks like "that's not how I would implement it" : that's not how I would implement it either. This explanation merely serves to explain/define ***SEMANTICS***.

So if rank is enough to model all window functions, from what you say here, rank can be derived from standard relational algebra operations, so window functions can be derived from standard relational algebra operations (no new 'axiom operator' needs to be introduced, if that's the right phrase). I think that's pretty unexpected (in a good way).

Yes and no.

First, it depends on whose "standard" RA you're referring to. With TTM's it can, but there are those people who do not condone RVA's, so this definition for RANK is not available to them in their algebra.

Second, if ties are possible and present, then doing something like "WHERE OTHER_RANK == THIS_RANK - 1" (say, in a self-join of a "ranked" relation to itself) to replace, say, LAG(1) or LEAD(1), is still not going to produce ***exactly*** the same results as SQL. SQL will always get you one row, but it's one randomly picked from the set of rows that tied, whereas the "OTHER_RANK == THIS_RANK - 1" trick might get you an empty relation, or one with the entire set of ties.

(And third, I don't know ***all*** window functions so there might still be some crazy beast among them that cannot be tackled this way.)

Indeed. That is why I based user-defined aggregation in Rel on sortable arrays: I can't presume to know what intent, purpose, or need users might have in creating a custom aggregation operator, so my job is only to hand them the most general reasonable (and feasible) facility upon which it can be based. RANK, as you've pointed out, isn't it.

Hmmmmmm. But sorting into an array with a sort spec that potentially leaves ties and no further input to tell the system how to handle them, leaves exactly the same problem.

Per my response at https://forum.thethirdmanifesto.com/forum/topic/correlated-subqueries/?part=7#postid-987729, arrays present a slightly more generally ergonomic, and perhaps pragmatic solution than RANK at perhaps the expense of some theoretical purity.

'FIRST' intended to obtain 1 tuple (or perhaps pathologically, 0 tuples) using a rank of '1' could obtain n tuples where 0 ≥ n < ∞, with no easy way to obtain just one tuple, at least not without adding another mechanism like converting the relation emitted by RANK to an array.

Obtaining the element in the first index of an array doesn't have that problem; there are either 0 elements or 1 can be obtained. Indeed, obtaining any n tuples or any i^th tuple is trivial.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#73 · October 30, 2020, 4:01 pm

Quote from Dave Voorhis on October 30, 2020, 3:35 pm

Per my response at https://forum.thethirdmanifesto.com/forum/topic/correlated-subqueries/?part=7#postid-987729, arrays present a slightly more generally ergonomic, and perhaps pragmatic solution than RANK at perhaps the expense of some theoretical purity.

'FIRST' intended to obtain 1 tuple (or perhaps pathologically, 0 tuples) using a rank of '1' could obtain n tuples where 0 ≥ n < ∞, with no easy way to obtain just one tuple, at least not without adding another mechanism like converting the relation emitted by RANK to an array.

Obtaining the element in the first index of an array doesn't have that problem; there are either 0 elements or 1 can be obtained. Indeed, obtaining any n tuples or any i^th tuple is trivial.

:-)

The issue is that by delivering an array you might create an illusion of ordering being total, when it isn't. RANK exposes this more readily and thus leads to systems that are a bit more "fail-fast". I know and agree that if the user consciously doesn't care about the ordering of things for which no ordering was requested, then it shouldn't be a problem. My issue is how wise it can possibly be to consciously not care (and how wise it can be to deliver systems that make this easy to do) and how low a percentage of the users (/programmers) it will be who really use such a feature wisely and consciously.

Is it a bad thing if the fail-fast system fails to answer (in ways that perfectly match the questionner's expactations) a question like "get the three heaviest parts" when it is sort of logically provable that the question is itself flawed because (a) in some sense underspecced, this being due to (b) upfront unwarranted presuming the cardinality of the resulting answer ?

Author of SIRA_PRISE

#74 · November 1, 2020, 11:13 pm

It seems to me that a rank is always possible based on:

One or more attributes
Each attribute is an ordered type

It be deterministic if

The list of attributes is singular, or is itself placed in some order.

It will be unique if

The list of attributes are a key.

If the attributes do not form a key, the rank is equivalent to

forming a unique rank on any projection for which the attributes do form a key
joining to the original relation.

Andl - A New Database Language - andl.org

The Forum for Discussion about The Third Manifesto and Related Matters