Correlated subqueries

#1 · October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

#2 · October 24, 2020, 4:07 am

Any D language intended for industry will provide an analogy to correlated subqueries so your number 3 is off the table. So the answer is either 1 or 2 or some unstated 4th option that the language designer decides works for their language.

#3 · October 24, 2020, 6:26 am

I don't follow the references to RA and RC, or the reasoning behind why they would be part of the solution.

I see no particular benefit in shoehorning this feature into a D, except to ease the path for converts. It should be possible to rewrite an SQL correlated subquery more or less directly into TD, by expression the inner query as a parameterised function. This would be inefficient, and not easily optimised, but I gather from Wikipedia this is a problem for SQL too.

My experience with Andl has been that SQL queries that use this (SQL) feature are easily replaced by composed RA queries, with a single 'inner' query followed by an 'outer' query that joins to the 'inner'. The resulting query is simple, efficient, but usually quite unlike the original in structure.

It's not something I've studied in detail. Hopefully you'll get an answer from Hugh or Dave, who must have run into this issue in writing queries in TD.

Andl - A New Database Language - andl.org

#4 · October 24, 2020, 7:49 am

Quote from Jake Wheat on October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

Seems a bit of a non-issue, particularly as those who switch early are likely to be the most capable of intelligently making a switch. Given a SQL example from the Wikipedia page on the subject:

SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);

In Tutorial D becomes:

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}

I think it got simpler. It's certainly no harder, and doesn't require any special features or syntax.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#5 · October 24, 2020, 9:32 am

I agree. That's much what I would expect: the inner query does the SUMMARIZE; a JOIN picks that out for the outer query.

But the point remains: the query got simpler, but it also changed a lot. I don't think there is any easy rewrite rule to handle that, and I don't think there is any syntax that can do it for you.

Andl - A New Database Language - andl.org

#6 · October 24, 2020, 9:41 am

Quote from dandl on October 24, 2020, 9:32 am

I agree. That's much what I would expect: the inner query does the SUMMARIZE; a JOIN picks that out for the outer query.

But the point remains: the query got simpler, but it also changed a lot. I don't think there is any easy rewrite rule to handle that, and I don't think there is any syntax that can do it for you.

Exactly.

Furthermore, it's a non-issue. Switching from SQL to a D is a significant technical undertaking that would be performed by competent engineers who would find mere conversions of correlated subqueries to be notionally trivial.

If there are barely-competent SQL authors that need help to convert correlated subqueries to D equivalents, then they're going to need help in all sorts of other ways, too. Realistically, project managers will make sure those guys stick to SQL.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#7 · October 24, 2020, 1:41 pm

"Hopefully you'll get an answer from Hugh or Dave" (dandl)

Sorry that I can't improve on Dave's answer, which is pretty much what mine would have been if he hadn't got there first. My problem has been the other way around. More than once my managers at IBM asked me if I could provide an SQL front end to a non-SQL database, even if only for queries. It was a challenge I never felt competent to accept, straightforward RA being so easy to (a) make work, and (b) optimize, by comparison (where "so easy ... by comparison" does not imply easy! right, Dave?).

Of course my reluctance was partly based on my unwillingness to promote a language I loathed, but my lack of competence was genuine. My coding contribution to Business System 12 was confined to the run-time: evaluation of a precompiled binary tree encoding of the source, and optimization based largely on cost estimates and properties of the RA operators (commutativity and associativity) but also, and most importantly, on exploiting the properties of keys. I never learned parsing and compiler techniques. ("Uh? And you call yourself a language designer?", I hear you exclaim. No comment!)

Hugh

Coauthor of The Third Manifesto and related books.

#8 · October 24, 2020, 5:26 pm

Quote from Erwin on October 24, 2020, 5:26 pm
Quote from Dave Voorhis on October 24, 2020, 7:49 am

Quote from Jake Wheat on October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

Seems a bit of a non-issue, particularly as those who switch early are likely to be the most capable of intelligently making a switch. Given a SQL example from the Wikipedia page on the subject:

SELECT employee_number, name

FROM employees emp

WHERE salary > (

SELECT AVG(salary)

FROM employees

WHERE department = emp.department);

SELECT employee_number, name FROM employees emp WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = emp.department);
SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);
In Tutorial D becomes:

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
I think it got simpler. It's certainly no harder, and doesn't require any special features or syntax.
I think we must assume that there is no feasible mechanical solution to the problem posed by OP. (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.) I would not call that a "non-issue". (In fact it with all the hindsight I have now it might be ***THE*** issue with TTM. There is no gentle transition path. Just like there is no gentle transition path from COBOL to Haskell.)

(Besides the problem is seriously aggravated by the fact that "SQL" these days can no longer be said to be a (single) language, rather just a huge collection of dialects, and often seriously differing ones at that.)

(FWIW I'll also add another version of the problem in the exact same wording as used by OP : "Some of them use SQL window functions extensively. What could be the options to support these users ?".)

As for "what it doesn't require" : what it ***does*** require is a seriously different way of thinking on the part of the query writer.

Quote from Dave Voorhis on October 24, 2020, 7:49 am

Quote from Jake Wheat on October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

Seems a bit of a non-issue, particularly as those who switch early are likely to be the most capable of intelligently making a switch. Given a SQL example from the Wikipedia page on the subject:

SELECT employee_number, name

FROM employees emp

WHERE salary > (

SELECT AVG(salary)

FROM employees

WHERE department = emp.department);

SELECT employee_number, name FROM employees emp WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = emp.department);
SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);
In Tutorial D becomes:

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
I think it got simpler. It's certainly no harder, and doesn't require any special features or syntax.

I think we must assume that there is no feasible mechanical solution to the problem posed by OP. (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.) I would not call that a "non-issue". (In fact it with all the hindsight I have now it might be ***THE*** issue with TTM. There is no gentle transition path. Just like there is no gentle transition path from COBOL to Haskell.)

(Besides the problem is seriously aggravated by the fact that "SQL" these days can no longer be said to be a (single) language, rather just a huge collection of dialects, and often seriously differing ones at that.)

(FWIW I'll also add another version of the problem in the exact same wording as used by OP : "Some of them use SQL window functions extensively. What could be the options to support these users ?".)

As for "what it doesn't require" : what it ***does*** require is a seriously different way of thinking on the part of the query writer.

Author of SIRA_PRISE

#9 · October 24, 2020, 6:02 pm

Quote from Dave Voorhis on October 24, 2020, 6:02 pm
Quote from Erwin on October 24, 2020, 5:26 pm
Quote from Dave Voorhis on October 24, 2020, 7:49 am

Quote from Jake Wheat on October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

Seems a bit of a non-issue, particularly as those who switch early are likely to be the most capable of intelligently making a switch. Given a SQL example from the Wikipedia page on the subject:

SELECT employee_number, name

FROM employees emp

WHERE salary > (

SELECT AVG(salary)

FROM employees

WHERE department = emp.department);

SELECT employee_number, name FROM employees emp WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = emp.department);
SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);
In Tutorial D becomes:

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
I think it got simpler. It's certainly no harder, and doesn't require any special features or syntax.
I think we must assume that there is no feasible mechanical solution to the problem posed by OP. (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.) I would not call that a "non-issue".
It's the OP's question, "Correlated subqueries in a D?" that I consider a non-issue.

The answer is that a D doesn't have them; it's a quirk of SQL. If you're using a D, be glad they're gone.

Somehow retrofitting correlated subqueries in a D is a bad idea. Just say "no."

Quote from Erwin on October 24, 2020, 5:26 pm
Quote from Dave Voorhis on October 24, 2020, 7:49 am

Quote from Jake Wheat on October 24, 2020, 2:23 am

Hi All,

Correlated subqueries in a D?

Let's say I have a new D based DBMS, and somehow I know I can persuade a load of existing SQL users to switch by rewriting their existing SQL to a D.

Some of them use correlated subqueries extensively. What could be the options to support these users - do you think any of the following make sense, are there other options?

1. implement a D which allows you to mix relational algebra and a complete relational calculus

2. do a SQL-inspired small fraction of relational calculus embedded in relational algebra to provide a way to easily rewrite any SQL csq query into this language (is this possible?)

3. ask customers to convert most of their csqs to joins themselves, and the ones they can't, then ... (not sure - ask them to use divide?)

Thanks,
Jake.

Seems a bit of a non-issue, particularly as those who switch early are likely to be the most capable of intelligently making a switch. Given a SQL example from the Wikipedia page on the subject:

SELECT employee_number, name

FROM employees emp

WHERE salary > (

SELECT AVG(salary)

FROM employees

WHERE department = emp.department);

SELECT employee_number, name FROM employees emp WHERE salary > ( SELECT AVG(salary) FROM employees WHERE department = emp.department);
SELECT employee_number, name
  FROM employees emp
  WHERE salary > (
    SELECT AVG(salary)
      FROM employees
      WHERE department = emp.department);
In Tutorial D becomes:

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}

(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
(JOIN {SUMMARIZE Employees BY {Department}: {AvgSalary := AVG(Salary)}, Employees} WHERE Salary > AvgSalary) {Employee_number, Name}
I think it got simpler. It's certainly no harder, and doesn't require any special features or syntax.
I think we must assume that there is no feasible mechanical solution to the problem posed by OP. (For such a solution to exist it would require a level of "capturing semantics" -capturing the precise nature of the correlation in this case- that no human-written program in this world is able to achieve, imo.) I would not call that a "non-issue".

It's the OP's question, "Correlated subqueries in a D?" that I consider a non-issue.

The answer is that a D doesn't have them; it's a quirk of SQL. If you're using a D, be glad they're gone.

Somehow retrofitting correlated subqueries in a D is a bad idea. Just say "no."

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#10 · October 24, 2020, 6:04 pm

Quote from Hugh on October 24, 2020, 1:41 pm

"Hopefully you'll get an answer from Hugh or Dave" (dandl)

Sorry that I can't improve on Dave's answer, which is pretty much what mine would have been if he hadn't got there first. My problem has been the other way around. More than once my managers at IBM asked me if I could provide an SQL front end to a non-SQL database, even if only for queries. It was a challenge I never felt competent to accept, straightforward RA being so easy to (a) make work, and (b) optimize, by comparison (where "so easy ... by comparison" does not imply easy! right, Dave?).

Indeed. Some easy things are still very hard.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters

Correlated subqueries