The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Correlated subqueries

SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);

I think we must assume that there is no feasible mechanical solution to the problem posed by OP.  (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.)  I would not call that a "non-issue".  (In fact it with all the hindsight I have now it might be ***THE*** issue with TTM.  There is no gentle transition path.  Just like there is no gentle transition path from COBOL to Haskell.)

As I said earlier, I don't agree, I think it is feasible. Here is the direct rewrite in Andl-like pseudo-code.

employees .where(salary > func1(department)) .select{ employee_number, name }
def func1(a1) =>employees .where(a1 = department) .select { average(salary) }   // anonymous single scalar lifted out

Then each of the WHERE clauses can be rewritten as a JOIN on a RELCON. First:

employees join rel1 .select{ employee_number, name }
var rel1 := employees .where(salary > func1(department)) .select{ salary, department }

Then perhaps:

var rel2 := employees .select { department, att1:=average(salary) }  // relation of department by ave salary
def func1(a1) => rel2 .where(a1 = department) .select{ (att1) } // lift

From here there are various ways to go, perhaps finishing up with a join on a pre-constructed relation of {department,salary} for all salaries above the average. Still not quite the way we would write it by hand. And I'm not entirely sure the last step is right. And while perhaps feasible, that still doesn't make it a good idea.

Andl - A New Database Language - andl.org
Quote from Darren Duncan on October 24, 2020, 4:07 am

Any D language intended for industry will provide an analogy to correlated subqueries

Did anyone here who built a D include a feature like this?

Quote from dandl on October 24, 2020, 6:26 am

I don't follow the references to RA and RC, or the reasoning behind why they would be part of the solution.

RA because I think everyone in this community uses RA as the base for any D language design discussions and implementations, RC because I think correlated subqueries were inspired by RC, and I think that it would be easier to convert correlated subqueries into an RC language than into RA language, but I'm not sure about this.

Quote from Dave Voorhis on October 24, 2020, 9:41 am

Furthermore, it's a non-issue. Switching from SQL to a D is a significant technical undertaking that would be performed by competent engineers who would find mere conversions of correlated subqueries to be notionally trivial.

I'm not sure I'd agree it that it would always be trivial, but I think I can accept a competent engineer should be able to do it.

Quote from Erwin on October 24, 2020, 5:26 pm

I think we must assume that there is no feasible mechanical solution to the problem posed by OP.  (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.)  I would not call that a "non-issue".

A while ago I wrote some code to run correlated subqueries on a SQL engine, by converting them to joins. I started with TPC-H, I think I managed to convert all but one query with csqs to use joins instead mechanically, but these conversions were all special cases for particular shapes of csqs. I didn't see any way to generalize the rewrites. And there was one query that I couldn't figure out how to rewrite to a join even manually (I wasn't thinking much about the semantics of the query).

Do you have anything more to say about there not being a feasible mechanical solution? I think something along the lines of the method of converting relational calculus to relational algebra would work, but I have no idea whether this would actually be usable for product.

(In fact it with all the hindsight I have now it might be ***THE*** issue with TTM.  There is no gentle transition path.  Just like there is no gentle transition path from COBOL to Haskell.)

Do you have other SQL features in mind that are as tricky as csqs?

I think 'SQL vs TTM' is like 'COBOL vs Haskell' is a little too harsh on SQL. But I think it works in the sense that you aren't going to get many COBOL programmers to learn Haskell, and you aren't going to get anyone to convert existing COBOL codebases to Haskell.

(Also, no-one tries to teach COBOL programmers Haskell, and Haskell isn't trying to become more widely used by moving in on COBOL's home territory, as far as I know.)

(FWIW I'll also add another version of the problem in the exact same wording as used by OP : "Some of them use SQL window functions extensively. What could be the options to support these users ?".)

I think that window functions might be a little easier - I suspect there's a straightforward and reasonable mechanical conversion possible (i.e. you can produce an efficient query plan from the result).

I have a slightly unrelated question about window functions - does anyone here know where they came from? They seem really useful and are widely used, but also seemed to have  beamed down into the ANSI standard from outer space at one point. I did a bit of looking around to see what the precedents and inspiration for them was - I didn't find anything similar that existed before window functions were added to the standard.

Quote from Jake Wheat on October 25, 2020, 3:14 am
Quote from Darren Duncan on October 24, 2020, 4:07 am

Any D language intended for industry will provide an analogy to correlated subqueries

Did anyone here who built a D include a feature like this?

I am in the process of doing so with my in-development industrial D language.  Also I see correlated subquery analogies as no big deal worth singling out for a big discussion, as implementing them is trivially easy, as much so as any other features, and anyone who uses this in SQL should be able to express the same logic in a D language with no greater effort than in SQL.

 

Quote from Dave Voorhis on October 24, 2020, 6:02 pm

It's the OP's question, "Correlated subqueries in a D?" that I consider a non-issue.

The answer is that a D doesn't have them; it's a quirk of SQL. If you're using a D, be glad they're gone.

Somehow retrofitting correlated subqueries in a D is a bad idea. Just say "no."

What about relational calculus?

Quote from dandl on October 25, 2020, 12:58 amSELECT employee_number, name

From here there are various ways to go, perhaps finishing up with a join on a pre-constructed relation of {department,salary} for all salaries above the average. Still not quite the way we would write it by hand. And I'm not entirely sure the last step is right. And while perhaps feasible, that still doesn't make it a good idea.

It is useful to separate between 'what are the options to mechanically support legacy SQL using csqs on top of a D language', and 'is it sensible to introduce a csq-like feature into a D for new development'?

Quote from Darren Duncan on October 25, 2020, 3:21 am

I am in the process of doing so with my in-development industrial D language.  Also I see correlated subquery analogies as no big deal worth singling out for a big discussion, as implementing them is trivially easy, as much so as any other features, and anyone who uses this in SQL should be able to express the same logic in a D language with no greater effort than in SQL.

Can you explain the trivially easy implementation?

A while ago I wrote some code to run correlated subqueries on a SQL engine, by converting them to joins. I started with TPC-H, I think I managed to convert all but one query with csqs to use joins instead mechanically, but these conversions were all special cases for particular shapes of csqs. I didn't see any way to generalize the rewrites. And there was one query that I couldn't figure out how to rewrite to a join even manually (I wasn't thinking much about the semantics of the query).

Do you have anything more to say about there not being a feasible mechanical solution? I think something along the lines of the method of converting relational calculus to relational algebra would work, but I have no idea whether this would actually be usable for product.

As I've been saying, my view is that you can mechanically rewrite a CSQ as a nested function, in which the body is the nested query and the 'correlated' bits are parameters. Do you see any reason why this would not work?

I think that window functions might be a little easier - I suspect there's a straightforward and reasonable mechanical conversion possible (i.e. you can produce an efficient query plan from the result).

No, there really isn't. The inherent features of window functions is the window, which is built on a sorted query. TTM and TD don't admit to the concept of the sorted query (core philosophy I think) so no sorts and therefore no windows.

Andl has sorted queries and window functions, which replicate most of what SQL can do.

I have a slightly unrelated question about window functions - does anyone here know where they came from? They seem really useful and are widely used, but also seemed to have  beamed down into the ANSI standard from outer space at one point. I did a bit of looking around to see what the precedents and inspiration for them was - I didn't find anything similar that existed before window functions were added to the standard.

COBOL has a REPORT section and line reports with page and section breaks were common in the day. It's easy enough to see why the features was created, but less easy to find out where it came from, or why it didn't appear until so late, or why it has the particular set of features it has. Hugh might know more about the history, although it's past his time there.

Andl - A New Database Language - andl.org
Quote from Jake Wheat on October 25, 2020, 3:47 am
Quote from Darren Duncan on October 25, 2020, 3:21 am

I am in the process of doing so with my in-development industrial D language.  Also I see correlated subquery analogies as no big deal worth singling out for a big discussion, as implementing them is trivially easy, as much so as any other features, and anyone who uses this in SQL should be able to express the same logic in a D language with no greater effort than in SQL.

Can you explain the trivially easy implementation?

Muldis Data Language is architecturally like a generic application programming language, and everything users write in it has the form of generic pure functions (composed only of pure expressions per a functional language) and procedures (composed of generic statements and expressions), so users have the ability fo formulate arbitrarily complex user-defined relational types and operators.

With such a foundation, how can it not be trivially easy to express everything SQL can, including correlated subqueries?

If you feel otherwise then please explain how this language should not be capable of trivially expressing the logic of a correlated subquery in a similar looking arrangement?

Quote from dandl on October 25, 2020, 4:13 am

A while ago I wrote some code to run correlated subqueries on a SQL engine, by converting them to joins. I started with TPC-H, I think I managed to convert all but one query with csqs to use joins instead mechanically, but these conversions were all special cases for particular shapes of csqs. I didn't see any way to generalize the rewrites. And there was one query that I couldn't figure out how to rewrite to a join even manually (I wasn't thinking much about the semantics of the query).

Do you have anything more to say about there not being a feasible mechanical solution? I think something along the lines of the method of converting relational calculus to relational algebra would work, but I have no idea whether this would actually be usable for product.

As I've been saying, my view is that you can mechanically rewrite a CSQ as a nested function, in which the body is the nested query and the 'correlated' bits are parameters. Do you see any reason why this would not work?

I think that window functions might be a little easier - I suspect there's a straightforward and reasonable mechanical conversion possible (i.e. you can produce an efficient query plan from the result).

No, there really isn't. The inherent features of window functions is the window, which is built on a sorted query. TTM and TD don't admit to the concept of the sorted query (core philosophy I think) so no sorts and therefore no windows.

More than core philosophy, it's per the definition of a relation. A "sorted relation" isn't part of any relational model.

Of course, that doesn't preclude defining a RANK operator that returns a relation indicating tuples ranked on some order, or defining some non-relation construct that defines a tuple order. Tutorial D defines an ARRAY construct for this, and you could certainly define a rich set of windowing (and other) operators for ARRAY. But that is outside the scope of the relational model.

We've discussed this before.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Jake Wheat on October 25, 2020, 3:36 am
Quote from Dave Voorhis on October 24, 2020, 6:02 pm

It's the OP's question, "Correlated subqueries in a D?" that I consider a non-issue.

The answer is that a D doesn't have them; it's a quirk of SQL. If you're using a D, be glad they're gone.

Somehow retrofitting correlated subqueries in a D is a bad idea. Just say "no."

What about relational calculus?

No.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org