The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Inappropriate responses to points about language design in SQL

Quote from AntC on November 8, 2019, 9:51 pm

I don't know whether all the IM features were included when "first defined". The way Dave uses the IM to support tagged unions does look like an ad hoc fudge.

I agree. It would be much nicer to support tagged unions via the very union-like construct spelled out inRM Pre 4 and 5. I very small change would do it. The IM is a heavyweight addition to the language, and even then the tagged unions are a bit clunky.

Of course, with the IM it does support tagged unions, ignoring that some might not like the way it supports tagged unions.

There's a crucial difference between the IM way vs the way in other languages with proper support: in other languages, the elements of the type are merely different values so are not distinguishable at type level. You can simply test for equality; if two values have different tags that's type-safe and returns FALSE. The IM way, they're different (sub-)types: now sometimes that doesn't matter; sometimes it's even useful. In testing for equality, having different tags (the equivalent thereof) means different types, and sometimes that means ill-typed -- according to some of the furious debate I've seen on the forum.

I think that's saying you want to be able to test the tag, not have to resort to testing the type. And I don't think you really want the parameters to some operator only work for some variants because of sub-typing.

I use SAME_TYPE_AS in my relvar declarations so that I can easily repeat all the assignments in case I make a mistake in setting them up.  Originally I was using simultaneous declaration and assignment: VAR such-and-such BASE INIT(rel-exp).

Andl allows any value of any type to be used to define another object of the same type. I find that incredibly useful, and saves a lot of duplication.

Yes I used the same tactic in my Sudoku solver: define the 9 possible values in a base table; use those values to label the columns, rows, boxes. (It would have been much better if I could define those values as an enumeration, not merely INT.) I actually started with a 4 x 4 grid to make sure my logic was right before hitting the combinatorial explosion of 9 x 9. That's in Ms Access, not even SQL. I'll readily agree it was a fudge. I don't see that SAME_TYPE_AS is a distinguishing feature of Tutorial D.

I wrote a Sudoku solver in Andl using .while(), using the same strategy as SQL and CTE RECURSIVE. It's quite short, and I don't think I could improve the code quality by using enums, if Andl had them.

 

Andl - A New Database Language - andl.org
Quote from AntC on November 7, 2019, 10:08 pm

You assign to base relvars? Is that any different to the way RPG approaches it? Why aren't you just writing a large nested expression in full? Most modern languages allow binding a name to an expression (especially useful for documenting your algorithm, and for debugging) without having to write some temp result out to storage. Furthermore if they're merely calculations, shouldn't they be VIRTUAL?  Given the language's limitations, writing out to temp tables might be the only workable approach. I used the same tactic in my Sudoku solver (in Ms Access). (I tried making them VIEWs, but the db engine just ground to a halt.)

Well, precisely.  Laziness can help efficiency, but it also creates additional overhead when everything will have to be examined anyway.  If your SQL or D engine is capable of strictness analysis, then you win, but usually they aren't — in which case creating a private real relvar is a kind of strictness indicator.  In fact, you can see it as a Clean-style strictness type.

 If Tutorial D had type inference of the power of Damas-Hindley-Milner, you'd not need type declarations like that at all: the compiler would figure two expressions/variables are the same type.

It's certainly a matter of taste, but I prefer languages that don't do full unification but only infer types from the outside in, because errors in your code can fail to be noticed because the compiler unifies your types in a way you don't expect.  (I once saw in a list of mysterious error messages this beauty from SML/NJ: Expected type int * int * int, got type int * int * int.)

With outside-in inference, there are three places where explicit types are still required: function parameters, literals representing empty generic collections, and mutable variables (in languages with subtypes).  In addition, all variables must be initialized, which some people think is bad practice.

The empty-collection-literal problem arises even in dynamically typed Python:  {1, 2, 3} is a set, whereas {1 : "a", 2 : "b", 3, "c"} is a dictionary.  But it's not obvious what {} is.  In fact it's a dictionary, and an empty set is irregularly notated set().

Quote from Dave Voorhis on November 8, 2019, 10:02 pm
Quote from AntC on November 8, 2019, 9:51 pm

That's in Ms Access, not even SQL.

Do you mean, not even standard SQL?

The JET database engine in MS Access supports a dialect of SQL.

Yes but it's not full-power SQL: very limited ability to define VIEWs; poor support for constraints; generally just freezes when given large-scale combinatorial queries.

As an aside, and purely out of curiosity, why did you build a Sudoku solver using MS Access, when a closer-to-standard SQL -- say, PostgreSQL, maybe even MySQL -- would undoubtedly have been less annoying?

For no good technical reason: MS Access standalone was installed on my desktop. No other SQL, nor did I have (licit) rights to any server.

I presume the goal was to build one using a database engine, and not a general-purpose language or you would have used one.

Oh, I'd built solvers in Haskell already. This was an exercise to see whether a set-based solver worked: define the strategy in terms of whole-board-at-a-time rather than searching for individual cells that could be solved. For example: starting with a board of givens or of already-solved, query in a single SELECT statement all the possibles in the not-yet-solved cells; analyse that result to find any cell with exactly one possible solution; add that to the already-solved; rinse repeat. Ideally that would be a cascade of VIEWs rather than iterating SELECT statements, Ms Access just gave up.

Given the need to iterate, I did use that to mark each solved cell with which 'round' it was solved in. And that was sometimes useful for an 'undo' when I was experimenting with strategies.

Sudoku on a 9 x 9 board is not too demanding for a brute-force attack in an iterative/recursive language. Brute-force combinatorial explosion is also the technique on those solutions given as a single SQL statement -- i.e. returning a result with 81 columns. But I was trying to solve more like humans do, by various strategies that analyse the position/look for patterns to find cell(s) with unique solutions, then feed those into the next round.

Quote from dandl on November 8, 2019, 10:36 pm
Quote from AntC on November 8, 2019, 9:51 pm

I don't know whether all the IM features were included when "first defined". The way Dave uses the IM to support tagged unions does look like an ad hoc fudge.

I agree. It would be much nicer to support tagged unions via the very union-like construct spelled out inRM Pre 4 and 5. I very small change would do it. The IM is a heavyweight addition to the language, and even then the tagged unions are a bit clunky.

Of course, with the IM it does support tagged unions, ignoring that some might not like the way it supports tagged unions.

There's a crucial difference between the IM way vs the way in other languages with proper support: in other languages, the elements of the type are merely different values so are not distinguishable at type level. You can simply test for equality; if two values have different tags that's type-safe and returns FALSE. The IM way, they're different (sub-)types: now sometimes that doesn't matter; sometimes it's even useful. In testing for equality, having different tags (the equivalent thereof) means different types, and sometimes that means ill-typed -- according to some of the furious debate I've seen on the forum.

I think that's saying you want to be able to test the tag, not have to resort to testing the type.

Yes. The IM way you have to use despatch on the sub-type; whereas with tagged unions the execution need only inspect the tags; if the tags are the same, then you need to 'despatch' on the tag to compare the components; but that's only value level case or if-then-else, not operator overloading. (Hence 'despatch' in #notreallydespatch quotes.)

And I don't think you really want the parameters to some operator only work for some variants because of sub-typing.

I use SAME_TYPE_AS in my relvar declarations so that I can easily repeat all the assignments in case I make a mistake in setting them up.  Originally I was using simultaneous declaration and assignment: VAR such-and-such BASE INIT(rel-exp).

Andl allows any value of any type to be used to define another object of the same type. I find that incredibly useful, and saves a lot of duplication.

Yes I used the same tactic in my Sudoku solver: define the 9 possible values in a base table; use those values to label the columns, rows, boxes. (It would have been much better if I could define those values as an enumeration, not merely INT.) I actually started with a 4 x 4 grid to make sure my logic was right before hitting the combinatorial explosion of 9 x 9. That's in Ms Access, not even SQL. I'll readily agree it was a fudge. I don't see that SAME_TYPE_AS is a distinguishing feature of Tutorial D.

I wrote a Sudoku solver in Andl using .while(), using the same strategy as SQL and CTE RECURSIVE. It's quite short, and I don't think I could improve the code quality by using enums, if Andl had them.

 

How much of that code has the 9 x 9 size hard-coded? I started at smaller scale with 4 x 4 but also scaled up to 12 x 12 and 16 x 16. The aim was to give only the possible cell values and have the solver scale to it. (Not that Ms Access could really adapt to that.)

Quote from AntC on November 9, 2019, 3:19 am

Yes I used the same tactic in my Sudoku solver: define the 9 possible values in a base table; use those values to label the columns, rows, boxes. (It would have been much better if I could define those values as an enumeration, not merely INT.) I actually started with a 4 x 4 grid to make sure my logic was right before hitting the combinatorial explosion of 9 x 9. That's in Ms Access, not even SQL. I'll readily agree it was a fudge. I don't see that SAME_TYPE_AS is a distinguishing feature of Tutorial D.

I wrote a Sudoku solver in Andl using .while(), using the same strategy as SQL and CTE RECURSIVE. It's quite short, and I don't think I could improve the code quality by using enums, if Andl had them.

How much of that code has the 9 x 9 size hard-coded? I started at smaller scale with 4 x 4 but also scaled up to 12 x 12 and 16 x 16. The aim was to give only the possible cell values and have the solver scale to it. (Not that Ms Access could really adapt to that.)

None, really. There are literals for 3 and 9 and 27 and 81 but that's just laziness. They could easily be defined constants.

Basically it sets up a relation that represents the 'units' (rows, columns and boxes) in terms of cell Ids, using modulo arithmetic. Then another with the remaining possible values for each cell, (9x9x9) rows initially, (9x9x1) at the end. It then applies the standard two algorithms to 'fix' cells and exclude other possibilities, using while() to recurse.That solves most of them; for a few it has to guess and backtrack.

The relational model was surprisingly effective at expressing the algorithm, but also surprisingly hard to read and understand. The code is short, but going back to it now it's a real challenge to remember what it does.

The only test data I had was for 9x9, so that was all I ever did.

Andl - A New Database Language - andl.org
Quote from AntC on November 8, 2019, 9:51 pm
Quote from Dave Voorhis on November 8, 2019, 1:44 pm
To answer Hugh's q first:
Quote from Hugh on November 8, 2019, 12:04 pm

Is Tutorial D's failure to support "tagged unions" a consequence of bad language design?

Yes, I would say so. In particular "tagged unions" in other languages fall within a more general feature that also supports user-defining enumerated types. And enumerated types are enormously useful in type-safe programming. Without them you co-opt INT or CHAR values to stand for the elements of the enumeration (and presumably in TTM would declare a constraint to limit the set of values). But these are not INTs that you want to do arithmetic on nor CHARs that you want to concatenate, etc.

  The language has grown a little since we first defined it but I hope none of the additions can be regarded as ad hoc fudges.

I don't know whether all the IM features were included when "first defined". The way Dave uses the IM to support tagged unions does look like an ad hoc fudge.

Of course, with the IM it does support tagged unions, ignoring that some might not like the way it supports tagged unions.

There's a crucial difference between the IM way vs the way in other languages with proper support: in other languages, the elements of the type are merely different values so are not distinguishable at type level. You can simply test for equality; if two values have different tags that's type-safe and returns FALSE. The IM way, they're different (sub-)types: now sometimes that doesn't matter; sometimes it's even useful. In testing for equality, having different tags (the equivalent thereof) means different types, and sometimes that means ill-typed -- according to some of the furious debate I've seen on the forum.

 

I use SAME_TYPE_AS in my relvar declarations so that I can easily repeat all the assignments in case I make a mistake in setting them up.  Originally I was using simultaneous declaration and assignment: VAR such-and-such BASE INIT(rel-exp).

Yes I used the same tactic in my Sudoku solver: define the 9 possible values in a base table; use those values to label the columns, rows, boxes. (It would have been much better if I could define those values as an enumeration, not merely INT.) I actually started with a 4 x 4 grid to make sure my logic was right before hitting the combinatorial explosion of 9 x 9. That's in Ms Access, not even SQL. I'll readily agree it was a fudge. I don't see that SAME_TYPE_AS is a distinguishing feature of Tutorial D.

 

 

You opine that failure to support tagged unions is a consequence of bad language design.  I'm not in a position to dispute that, but can you point to an aspect of TD's existing design that militates against addition of clean support for tagged unions to the language?

Some of the additions that have been made to the original language are mere "shorthands", and I would claim these to arise from appropriate language design.  A major exception is the optional TD extension to support our own IM, but again the existing language appeared, to us at least, to accommodate it quite neatly.

Of course SAME_TYPE_AS isn't a new invention by us.  My point in mentioning it was that the operand can be absolutely any expression in TD that denotes a value.  SQL's CREATE TABLE ... LIKE ... feature doesn't support all table expressions.  Orthogonality is an oft cited essential property for good language design.

Hugh

Coauthor of The Third Manifesto and related books.

"It's certainly a matter of taste, but I prefer languages that don't do full unification but only infer types from the outside in"

I once had a difference of opinion with a so-called "analyst" (who had a university degree and I didn't so by definition he was way more competent and knowledgeable than I) about which error message to produce when "1/5" and "2/5" and "3/5" and "4/5" had come in (messages cut in parts that is) but "5/5" hadn't.  He insisted that the error message HAD TO BE "part 5 is missing", and I never managed to make him understand that "the parts count is wrong" could also be the reason, i.e. I never managed to make him realise that the error could also be in the "/5" part, i.e. that university-degree analyst could not see the possibility that someone would make an error while counting, so he axiomatically assumed the "/5" just HAD to be correct because counting is so simple that surely no one is going to make a mistake doing it.

Outside in gives useful information if the outside is not the part where the actual error is.  The usefulness of outside in vs inside out simply depends on where errors are made, and no compiler or language designer can reasonably foresee that in general.

Quote from Hugh on November 9, 2019, 12:19 pm
Quote from AntC on November 8, 2019, 9:51 pm
Quote from Dave Voorhis on November 8, 2019, 1:44 pm
To answer Hugh's q first:
Quote from Hugh on November 8, 2019, 12:04 pm

Is Tutorial D's failure to support "tagged unions" a consequence of bad language design?

Yes, I would say so. In particular "tagged unions" in other languages fall within a more general feature that also supports user-defining enumerated types. And enumerated types are enormously useful in type-safe programming. Without them you co-opt INT or CHAR values to stand for the elements of the enumeration (and presumably in TTM would declare a constraint to limit the set of values). But these are not INTs that you want to do arithmetic on nor CHARs that you want to concatenate, etc.

  The language has grown a little since we first defined it but I hope none of the additions can be regarded as ad hoc fudges.

I don't know whether all the IM features were included when "first defined". The way Dave uses the IM to support tagged unions does look like an ad hoc fudge.

Of course, with the IM it does support tagged unions, ignoring that some might not like the way it supports tagged unions.

There's a crucial difference between the IM way vs the way in other languages with proper support: in other languages, the elements of the type are merely different values so are not distinguishable at type level. You can simply test for equality; if two values have different tags that's type-safe and returns FALSE. The IM way, they're different (sub-)types: now sometimes that doesn't matter; sometimes it's even useful. In testing for equality, having different tags (the equivalent thereof) means different types, and sometimes that means ill-typed -- according to some of the furious debate I've seen on the forum.

 

I use SAME_TYPE_AS in my relvar declarations so that I can easily repeat all the assignments in case I make a mistake in setting them up.  Originally I was using simultaneous declaration and assignment: VAR such-and-such BASE INIT(rel-exp).

Yes I used the same tactic in my Sudoku solver: define the 9 possible values in a base table; use those values to label the columns, rows, boxes. (It would have been much better if I could define those values as an enumeration, not merely INT.) I actually started with a 4 x 4 grid to make sure my logic was right before hitting the combinatorial explosion of 9 x 9. That's in Ms Access, not even SQL. I'll readily agree it was a fudge. I don't see that SAME_TYPE_AS is a distinguishing feature of Tutorial D.

 

 

You opine that failure to support tagged unions is a consequence of bad language design.  I'm not in a position to dispute that, but can you point to an aspect of TD's existing design that militates against addition of clean support for tagged unions to the language?

Hi Hugh, the "bad language design" wording was from you. I would rather phrase it: omitting a feature common to many modern languages that exhibit Rm Pre 26's good language design, and which doesn't contradict any of TTM's Pre's and Pro's AFAICT. Which leads to the knotty question: might an implementor of a D include some feature neither prescribed nor proscribed "within the meaning of that thrice-accursed act" [Lord Justice Cocklecarrot in the vexatious case of the cow with a horn but not a bell/Beachcomber/"thrice" as in Third Manifesto]

I think support for tagged unions could easily be slotted into RM Pre 4 and 5, using Selectors with zero or more components. (The zero components would provide enumerated types like TYPE DayofWeek = Monday() | Tuesday() | ... | Sunday().) But these Selectors are not multiple PossReps for the same set of values. (Hence in my invented syntax I didn't use comma separators.) Rather, the different Selector name indicates a distinct value not representable using another Selector. I suppose you might combine tagged unions/enumerations with multiple PossReps in the TTM sense:

TYPE DayofWeek = Monday(), Lundi() | Tuesday(), Mardi() | ... | Sunday(), Dimanche();

There I've used comma to separate multiple PossReps of the same value, but | to distinguish distinct values.

There is though a semantic difference vs TTM's approach to typing (especially as described in the IM, Pre 20 b. alpha): We do not envisage a pre-supplied set of scalar values. Rather, value Monday() is brought into existence by being declared as a tag/enumeration. By "value" there I mean its PhysRep and any PossReps. I suspect this "semantic difference" is more philosophical/academic than having any practical impact. In a nominative type system it explains where types (their names) and values (their Selectors) 'come from' -- i.e. from the same declarations.

I strongly suspect that with tagged unions, a D would have no need for UNION types, and therefore the IM would have no need to talk of type alpha.

Some of the additions that have been made to the original language are mere "shorthands", and I would claim these to arise from appropriate language design.  A major exception is the optional TD extension to support our own IM, but again the existing language appeared, to us at least, to accommodate it quite neatly.

Of course SAME_TYPE_AS isn't a new invention by us.  My point in mentioning it was that the operand can be absolutely any expression in TD that denotes a value.  SQL's CREATE TABLE ... LIKE ... feature doesn't support all table expressions.  Orthogonality is an oft cited essential property for good language design.

I don't think anyone round here would claim SQL exhibits good language design, especially because of its dire lack of orthogonality of features. So 'better than SQL' doesn't amount to much of a claim.

Quote from Erwin on November 10, 2019, 2:10 am

"It's certainly a matter of taste, but I prefer languages that don't do full unification but only infer types from the outside in"

... the possibility that someone would make an error while counting, so he axiomatically assumed the "/5" just HAD to be correct because counting is so simple that surely no one is going to make a mistake doing it.

Outside in gives useful information if the outside is not the part where the actual error is.  The usefulness of outside in vs inside out simply depends on where errors are made, and no compiler or language designer can reasonably foresee that in general.

Indeed. Suppose (in a language without implicit casts)

x INT;
y INT;
...
z := x + (y / 7);   // no declaration for z
v := denominator(z);

'Inside out' would take the variable's declared types, and might infer that because 7 could represent an INT that rhs means INT division with an INT result for z. 'Outside in' might see the denominator(z) and infer z to be RAT. The compiler can only point out an inconsistency. (In a language with implicit casts, it would have a hard time figuring where to insert them in general.)

The boon of Hindley-Milner type inference is that we don't need to declare the type of every temp variable or intermediate result. Nobody would want to be forced to annotate the type of subexpression (y / 7). We might even omit the declarations for x, y and leave the compiler to infer they're RAT. Nevertheless it's a sanity check to declare types for 'important' variables and functions and of course for data elements residing in the database.

With 'full unification' it doesn't matter whether inference goes outside in or inside out or (pragmatically) both. If there's an inconsistency, it'll be found somewhere. Of course the compiler might 'blame' the wrong declaration or subexpression, just as Erwin's so-called "analyst".

Quote from AntC on November 10, 2019, 5:20 am

Indeed. Suppose (in a language without implicit casts)

x INT;
y INT;
...
z := x + (y / 7);   // no declaration for z
v := denominator(z);

'Inside out' would take the variable's declared types, and might infer that because 7 could represent an INT that rhs means INT division with an INT result for z.

I believe that (modulo contra/covariance, if available) overloading functions on their return type alone is a mistake.  Java does allow it, but only if the argument types are also different.

What is more, as I've said before, there is no unique integer division operator.  There are at least six with reasonable uses, all of them obeying the basic law n = dq + r.  I would never overload / to mean any specific one of them.

'Outside in' might see the denominator(z) and infer z to be RAT.

That by no means follows either.  In the Lisps, denominator is overloaded and returns 1 when applied to an integer, as is mathematically correct.  That's most useful with either dynamic types or Haskell-style ad hoc polymorphism.

The boon of Hindley-Milner type inference is that we don't need to declare the type of every temp variable or intermediate result. Nobody would want to be forced to annotate the type of subexpression (y / 7).

Absolutely.  Fortunately, without return type overloading it's not a problem.  If anyone knows of a strong use case for return type overloading, I'd like to hear it.

We might even omit the declarations for x, y and leave the compiler to infer they're RAT. Nevertheless it's a sanity check to declare types for 'important' variables and functions and of course for data elements residing in the database.

Exactly.

With 'full unification' it doesn't matter whether inference goes outside in or inside out or (pragmatically) both. If there's an inconsistency, it'll be found somewhere. Of course the compiler might 'blame' the wrong declaration or subexpression, just as Erwin's so-called "analyst".

My concern is that if you don't write any type declarations but you do make a type error of exactly the right kind, the compiler may fail to report it because it made a completely different set of assumptions than you did.  Your program will not crash in that case, but it may have extremely mysterious bugs.  (This happened not to me but to a friend of mine.)