The Forum for Discussion about The Third Manifesto and Related Matters


Help sought to fix RM Pre 21 (multiple assignment)

Quote from Erwin on November 25, 2019, 8:24 pm
Quote from AntC on November 25, 2019, 7:14 pm

I would expect (from RM Pre 21) to get a chaining of WITHs, such that each individual assignment is visible to those to its right.

...

In the first chaining, the MYELLIPSES value 'seen' by the first UPDATE is the pre-update one, the MYELLIPSES value 'seen' by the second UPDATE is the one left by the first.  But whatever is happening to E is never seen by this chain.  Any reference to E in this chain evaluates to the pre-update value, and this is contrary to your 'it is vital that ...'.

Nope can't see that interpretation in RM Pre 21. Is that D&D's intent?

 

"Each individual assignment is visible to those to its right" is, as I recall Hugh once saying, one of the two historical versions that were tried. It's the one where they observed they could no longer swap using "A := B, B := A", because the assignment to A was "visible" to the second one, effectively resulting in "A := B, B := B".

OK now I'm beginning to get it. It would be a really good idea to explain that motivation in the Pre. And, frankly, do away with all that pseudocode -- especially since the pseudocode is suspect.

It's not interpretation.  It's the fact that step c. says "evaluate all the RHS" and all of that is done before any actual assignment gets done (those only happen in step d.).

That's not accurate. Step b. has rewritten some of the assignments to use WITHs. To "evaluate all the RHS" therefore involves evaluating the RHS of the WITH and an "actual assignment" of the intermediate result to the target var. Admittedly that 'target var' is not the externally visible var: it has local scope within the WITH, with name shadowing.

Since all evaluation of RHS is done before any assignment is done, it must be the case that all references appearing anywhere in an RHS evaluate to the pre-update value. Well, all references whose "target" has not been "quietly" altered by the name shadowing technique from step b.

This is (apologies for the language) batshit crazy. When I see THE_A(E) on a RHS within a multiple assignment (in source Tutorial D), I have to figure out:

  • What is the ultimate target of this individual assignment -- E or MYELLIPSES in my nasty example -- i.e. the Vi after expansion at step a.; and note it might be a component of a component of a component ...
  • Is there any assignment to this same Vi (after expansion) anywhere to the left of me?
  • If not, ignore any individual assignments to my left and go back to the state before the whole MA to get the value for my RHS THE_A(E).
  • If so, go to the assignment immediately to my left.
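To make the lookup rule in those four bullets concrete, here is a minimal Python sketch of steps b to d as this thread reads them (an interpretation, not the normative text). Assumptions: variables live in a dict, each RHS is a function of a state, and the names `multiple_assign`, `State` and `Expr` are hypothetical. Repeated assignments to the same target are chained, with the earlier result shadowing the name inside the later RHS (the WITH of step b); every other reference sees the pre-assignment state.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Expr = Callable[[State], object]      # a hypothetical RHS: evaluated against a state
Assignment = Tuple[str, Expr]         # (target variable, RHS)


def multiple_assign(state: State, assignments: List[Assignment]) -> State:
    # Step b (as read here): fold repeated assignments to the same target into
    # one RHS; the earlier result shadows the name, like WITH (V := e1) : e2.
    folded: Dict[str, Expr] = {}
    for target, rhs in assignments:
        if target in folded:
            earlier = folded[target]
            folded[target] = (
                lambda s, earlier=earlier, rhs=rhs, t=target:
                    rhs({**s, t: earlier(s)})
            )
        else:
            folded[target] = rhs
    # Step c: evaluate every folded RHS against the pre-assignment state.
    values = {target: rhs(state) for target, rhs in folded.items()}
    # Step d: only now perform the assignments, all at once.
    return {**state, **values}


# A := B, B := A swaps the two values.
swapped = multiple_assign({"A": 1, "B": 2},
                          [("A", lambda s: s["B"]), ("B", lambda s: s["A"])])
# V := V + 1, W := V, V := V * 2: the second V sees the first (8),
# but W, although to the right of the first V, still sees the old V (3).
chained = multiple_assign({"V": 3, "W": 0},
                          [("V", lambda s: s["V"] + 1),
                           ("W", lambda s: s["V"]),
                           ("V", lambda s: s["V"] * 2)])
```

The `W` case is exactly the asymmetry the bullets complain about: visibility to the right depends on whether the reference names the same target.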

I can see no sense in which this behaviour is "constructed according to well established principles of good language design." It's an ugly kludge. It destroys the intuition in my attempted simplification -- which interpretation I thought was the most secure:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence. That is, as if the commas were semicolons.
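For contrast, a minimal sketch of the "commas as semicolons" reading, assuming the same hypothetical dict-of-variables model (the name `sequential_assign` is illustrative). Every RHS sees the results of all assignments to its left, which is simple to state but, as Erwin recalls, turns a swap into "A := B, B := B".

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Assignment = Tuple[str, Callable[[State], object]]


def sequential_assign(state: State, assignments: List[Assignment]) -> State:
    # Each individual assignment happens in turn; constraint checking
    # would be deferred to the semicolon (not modelled here).
    current = dict(state)
    for target, rhs in assignments:
        current[target] = rhs(current)   # later RHSs see earlier results
    return current


# A := B, B := A under this reading: both end up with B's old value.
result = sequential_assign({"A": 1, "B": 2},
                           [("A", lambda s: s["B"]), ("B", lambda s: s["A"])])
```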

I suggest the best way to 'plug' RM Pre 21 is to say that for Multiple Assignments (as opposed to single-shot assignments) you can't put pseudovariables on the LHS; you must put top-level variables. That should force the programmer to write cleaner, more understandable code.

Trying to mimic OOP's (pseudo-)assignment to components strikes me as a 'feature' better done without.

FWIW, my understanding of Chris' career is that for a non-negligible period of time he was an important PL/1 guy within IBM. PL/1 is the language that supported (and still supports) assignment to SUBSTR(charvar, start, length), and called that technique "pseudovariables". I suspect he borrowed the idea from PL/1. I suppose the temptation was along the lines of "if it can be defined to behave predictably, there's no point in depriving people of it".

But, but ...: does PL/1 also support Multiple Assignment? Specifically does it support repeated assignments to the same top-level var in effect, within the same Multiple Assignment? And if it does, what 'predictable behaviour' does that give, and can we translate that into TTM terms? I guess we have to cope with Multiple PossReps which IIRC PL/1 does not have.

I don't want to bikeshed about syntax, but to borrow from CPL/BCPL's multiple assignment, Hugh's OP could be expressed as:

THE_B(E), THE_A(E) := THE_A(E) + 1.0, THE_A(E) + 2.0;  // a single :=, with balanced commalists each side

The semantics here is: evaluate all the RHS expressions; also evaluate all LHS expressions to obtain a target reference (they might contain subscripts or sub-components); then assign the values to the targets atomically 'all at once'. Note the targets in BCPL might be top-level vars or might be a byte expression to a part of a var (similar to assigning to a sub-string) or an array/vector-subscripted expression.
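A minimal Python sketch of those three phases, under stated assumptions: a target is modelled as a zero-argument function returning a (container, key) pair, so subscripts are computed when the reference is evaluated; the name `parallel_assign` is hypothetical; and where BCPL leaves evaluation order undefined, this sketch simply goes left to right.

```python
from typing import Any, Callable, List, Tuple

# (container supporting item assignment, key or subscript)
Target = Tuple[Any, Any]


def parallel_assign(targets: List[Callable[[], Target]],
                    values: List[Callable[[], Any]]) -> None:
    if len(targets) != len(values):
        raise ValueError("the commalists on each side of := must balance")
    # Phase 1: evaluate every RHS expression (left to right here).
    rhs = [value() for value in values]
    # Phase 2: evaluate every LHS target reference, subscripts included.
    refs = [target() for target in targets]
    # Phase 3: only now store the values.
    for (container, key), value in zip(refs, rhs):
        container[key] = value


# A, B := B, A
swap_env = {"A": 1, "B": 2}
parallel_assign([lambda: (swap_env, "A"), lambda: (swap_env, "B")],
                [lambda: swap_env["B"], lambda: swap_env["A"]])

# x, Array(x), Array(y) := x + 1, Array(y), Array(x)
# Even with x listed first, the subscripts use the old x.
env = {"x": 0, "y": 3}
arr = [10, 20, 30, 40]
parallel_assign([lambda: (env, "x"), lambda: (arr, env["x"]), lambda: (arr, env["y"])],
                [lambda: env["x"] + 1, lambda: arr[env["y"]], lambda: arr[env["x"]]])
```

After this runs, `swap_env` holds the swapped values, `arr` is `[40, 20, 30, 10]` and `x` is 1.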

A couple of caveats (from my very dog-eared BCPL manual, vintage 1976):

  • "order of evaluation undefined" both of the RHS expressions, which might include procedure/function calls with side-effects, and of the LHS expressions for subscripts etc, with the same risk of side-effects.
  • Order of assigning RHS values to LHS targets also undefined -- beware if two targets evaluate to overlapping byte/bit segments of the same var.

Those caveats are sufficiently scary that even the most gung-ho BCPL programmer will be very careful. I guess in a tutorial/more user-friendly context, we might require the targets are disjoint. (This will probably in effect ban using different PossReps for the same target on one LHS.)
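One way a friendlier language might enforce "the targets are disjoint" is sketched below. Assumptions: each target is described by a hypothetical path tuple, `("E",)` for the whole variable and `("E", "A")` for THE_A(E); two targets overlap when one path is a prefix of the other. A plain path comparison cannot see that components of different PossReps share the same underlying value, so a real check would need possrep information or would have to compare at the variable level.

```python
from itertools import combinations
from typing import Sequence, Tuple

Path = Tuple[str, ...]   # hypothetical: ("E",) is E itself, ("E", "A") is THE_A(E)


def overlaps(p: Path, q: Path) -> bool:
    # One target contains the other (or they are the same target).
    n = min(len(p), len(q))
    return p[:n] == q[:n]


def check_disjoint(targets: Sequence[Path]) -> None:
    for p, q in combinations(targets, 2):
        if overlaps(p, q):
            raise ValueError(f"overlapping targets: {p} and {q}")


check_disjoint([("E", "B"), ("E", "A")])   # THE_B(E), THE_A(E): accepted
check_disjoint([("A",), ("B",)])           # A, B: accepted
```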

Now we can easily express the swapping of two values, which technique is used in CPL demonstration programs for sorting an array in situ, for example; and which appears to be behind all the nonsense in RM Pre 21:

A, B := B, A;

Array(x), Array(y), x := Array(y), Array(x), x + 1;  // incrementing x has no effect on the subscripts: they use the unincremented x.

So we can see that a Multiple Assignment cannot be translated to a series of single assignments, even if we delay constraint checking to the semicolon.
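Python's own tuple assignment is a handy illustration of why the point at which target subscripts are evaluated matters. Python evaluates the whole RHS first, but then assigns the targets left to right and evaluates each subscript only when that target is stored. The Array example therefore behaves as described only when x is listed last; listing x first makes the subscripts see the incremented x. (The two function names are illustrative.)

```python
def swap_then_increment(arr, x, y):
    # x is stored last, so both subscripts use the old x.
    arr[x], arr[y], x = arr[y], arr[x], x + 1
    return arr, x


def increment_then_swap(arr, x, y):
    # x is stored first, so arr[x] below uses the new x.
    x, arr[x], arr[y] = x + 1, arr[y], arr[x]
    return arr, x


swapped = swap_then_increment([10, 20, 30, 40], 0, 3)   # ([40, 20, 30, 10], 1)
clobbered = increment_then_swap([10, 20, 30, 40], 0, 3)  # ([10, 40, 30, 10], 1)
```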

Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem in this particular instance is not with the principle of syntactic substitution but with the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.
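The difference between the literal expansion and the rearranged form shows up as soon as the selector enforces a type constraint. This sketch assumes a two-component ELLIPSE with A as the major and B as the minor semi-axis and the constraint A >= B (the centre is left out); `Ellipse`, `stepwise` and `rearranged` are hypothetical names. Under that constraint the intermediate ELLIPSE(THE_A(E), THE_A(E) + 1.0) can never be valid, so the stepwise expansion always fails, while the single rearranged selector invocation succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    a: float   # major semi-axis (assumed)
    b: float   # minor semi-axis (assumed)

    def __post_init__(self) -> None:
        # Selector invocation checks the assumed type constraint A >= B.
        if self.a < self.b:
            raise ValueError(f"ELLIPSE constraint violated: a={self.a} < b={self.b}")


def stepwise(e: Ellipse) -> Ellipse:
    # E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0) is actually selected...
    e1 = Ellipse(e.a, e.a + 1.0)
    # ...before E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E')).
    return Ellipse(e1.a + 2.0, e1.b)


def rearranged(e: Ellipse) -> Ellipse:
    # One selector invocation; no intermediate value is ever built.
    return Ellipse(e.a + 2.0, e.a + 1.0)


result = rearranged(Ellipse(2.0, 1.0))   # Ellipse(a=4.0, b=3.0)
```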

Andl - A New Database Language - andl.org
Quote from Erwin on November 25, 2019, 8:44 pm
Quote from dandl on November 25, 2019, 12:57 pm

The point is that for the purposes of Hugh's example, THE() forms are not variables, so they must be considered as shorthands and eliminated in step RM Pre 21a. In step b when every LHS is a variable (of declared type ELLIPSE), the problem he refers to no longer exists. It evaporates.

Hugh's problem is caused by treating THE() forms as variables, which they are not.

I really don't understand how you can say that.  The logical conclusion of your "which they are not" would/should be "so stop people from assigning to THE_() forms".  That removes step a. from the prescription, and thus it also removes the point where the OP problem ***gets introduced***.  As opposed to your "no longer exists, evaporates".  If it really evaporated there would not have been a thread here.

The logical conclusion is to recognise them as shorthands, not assignments, and to expand them as such in step a.

FWIW, my understanding of Chris' career is that for a non-negligible period of time he was an important PL/1 guy within IBM. PL/1 is the language that supported (and still supports) assignment to SUBSTR(charvar, start, length), and called that technique "pseudovariables". I suspect he borrowed the idea from PL/1. I suppose the temptation was along the lines of "if it can be defined to behave predictably, there's no point in depriving people of it".

I wrote a lot of PL/I and I am a big fan of pseudovariables, but you have to think of them as syntactic sugar over some underlying operator invocation. The relvar assignment in TTM is another case, where the primitive operators of the DBMS are quite different from the language view. Assigning to relvars in the catalog is another. In some sense it's pseudovariables all the way down.
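As an illustration of "syntactic sugar over some underlying operator invocation", here is a hypothetical desugaring of a SUBSTR-style pseudovariable: `SUBSTR(S, start, length) = value` becomes `S = substr_assign(S, start, length, value)`, an ordinary whole-variable assignment. Positions are 1-based as in PL/I; padding with blanks or truncating `value` to `length` is an assumption meant to mirror fixed-length string assignment, not a transcription of the PL/I standard.

```python
def substr_assign(s: str, start: int, length: int, value: str) -> str:
    # Hypothetical operator behind SUBSTR(s, start, length) = value.
    if start < 1 or length < 0 or start - 1 + length > len(s):
        raise IndexError("substring out of range")
    piece = value[:length].ljust(length)          # truncate or blank-pad
    return s[:start - 1] + piece + s[start - 1 + length:]


s = "HELLO WORLD"
s = substr_assign(s, 7, 5, "THERE")   # the "pseudovariable" assignment
```

Nothing is mutated in place: the variable simply receives a new value built from its old one, just as a possrep component assignment expands to a selector invocation.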

 

Quote from Hugh on November 25, 2019, 2:33 pm
Quote from Erwin on November 24, 2019, 7:30 pm
Quote from Hugh on November 24, 2019, 3:23 pm

Having read the responses as far as AntC's, I now realise that Chris's description (to me) of the "broad idea" is not quite what we really had in mind.

...

But perhaps we should just go with Dave Voorhis and Erwin Smout and leave RM Pre 21 as-is. I agree that it's not really flawed, but it does mean that an implementation that gets around the problem becomes non-conforming (strictly speaking).

 

... we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

Hugh, to be clear: I am not even attempting a "discussion about the merits and demerits of RM Pre 21's intention". My difficulty is more basic: I do not know what the intention is. I am making (up to the time of your message) no claim either way about whether the intention has merit. Erwin, in a later message, has made part of the intention explicit (being able to 'swap' the values of two variables). That simply doesn't appear in the text of RM Pre 21, which makes it harder to understand.

I am not stupid, and I'm used to reading programming language references. Frankly, I think RM Pre 21 as stated is a failure to communicate.

BTW pace D&D's claim "we do not prescribe syntax"; RM Pre 21 actually prescribes two bits of syntax:

  • := for the form of assignment -- which is a constant sore for those who prefer INSERT/UPDATE/DELETE forms of updating relvars.
  • A1, A2, ..., An; for Multiple Assignment. If D&D had looked at Wikipedia on parallel assignment (aka simultaneous assignment), with its long list of languages that use that form and their pedigree of good language design, perhaps all these difficulties could have been avoided at the outset.

Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem in this particular instance is not with the principle of syntactic substitution but with the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

That assignment to a component is shorthand for an equivalent selector invocation is a given. The question is how to formalise "rearranging terms".
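One possible way to formalise "rearranging terms" is sketched below, purely as an illustration and not as the intended reading of RM Pre 21. Assumptions: a variable's value is modelled as a dict of possrep components, `merge_component_assignments` and `ellipse` are hypothetical names, and assigning the same component twice in one statement is rejected as a design choice. All RHSs are evaluated against the pre-assignment state, component updates to the same variable are merged, and the selector is invoked once per variable, so no intermediate value is ever selected.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Components = Dict[str, Any]
State = Dict[str, Components]                                  # variable -> components
ComponentAssignment = Tuple[str, str, Callable[[State], Any]]  # (var, component, RHS)


def merge_component_assignments(state: State,
                                assignments: List[ComponentAssignment],
                                select: Callable[[Components], Components]) -> State:
    # Evaluate every RHS against the pre-assignment state.
    values = [(var, comp, rhs(state)) for var, comp, rhs in assignments]
    pending: Dict[str, Components] = {}
    seen: Set[Tuple[str, str]] = set()
    for var, comp, value in values:
        if (var, comp) in seen:
            raise ValueError(f"component {comp} of {var} assigned twice")
        seen.add((var, comp))
        pending.setdefault(var, dict(state[var]))[comp] = value
    new_state = dict(state)
    for var, comps in pending.items():
        new_state[var] = select(comps)   # exactly one selector invocation per variable
    return new_state


def ellipse(c: Components) -> Components:
    # Hypothetical ELLIPSE selector with the assumed constraint A >= B.
    if c["A"] < c["B"]:
        raise ValueError("ELLIPSE constraint A >= B violated")
    return dict(c)


# THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0
before = {"E": {"A": 2.0, "B": 1.0}}
after = merge_component_assignments(
    before,
    [("E", "B", lambda s: s["E"]["A"] + 1.0),
     ("E", "A", lambda s: s["E"]["A"] + 2.0)],
    ellipse)   # {"E": {"A": 4.0, "B": 3.0}}
```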

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on November 26, 2019, 12:32 am
Quote from Erwin on November 25, 2019, 8:24 pm
Quote from AntC on November 25, 2019, 7:14 pm

I would expect (from RM Pre 21) to get a chaining of WITHs, such that each individual assignment is visible to those to its right.

...

In the first chaining, the MYELLIPSES value 'seen' by the first UPDATE is the pre-update one, the MYELLIPSES value 'seen' by the second UPDATE is the one left by the first.  But whatever is happening to E is never seen by this chain.  Any reference to E in this chain evaluates to the pre-update value, and this is contrary to your 'it is vital that ...'.

Nope can't see that interpretation in RM Pre 21. Is that D&D's intent?

 

"Each individual assignment is visible to those to its right" is, as I recall Hugh once saying, one of the two historical versions that were tried. It's the one where they observed they could no longer swap using "A := B, B := A", because the assignment to A was "visible" to the second one, effectively resulting in "A := B, B := B".

OK now I'm beginning to get it. It would be a really good idea to explain that motivation in the Pre. And, frankly, do away with all that pseudocode -- especially since the pseudocode is suspect.

It's not interpretation.  It's the fact that step c. says "evaluate all the RHS" and all of that is done before any actual assignment gets done (those only happen in step d.).

That's not accurate. Step b. has rewritten some of the assignments to use WITHs. To "evaluate all the RHS" therefore involves evaluating the RHS of the WITH and an "actual assignment" of the intermediate result to the target var. Admittedly that 'target var' is not the externally visible var: it has local scope within the WITH, with name shadowing.

Since all evaluation of RHS is done before any assignment is done, it must be the case that all references appearing anywhere in an RHS evaluate to the pre-update value. Well, all references whose "target" has not been "quietly" altered by the name shadowing technique from step b.

This is (apologies for the language) batshit crazy. When I see THE_A(E) on a RHS within a multiple assignment (in source Tutorial D), I have to figure out:

  • What is the ultimate target of this individual assignment -- E or MYELLIPSES in my nasty example -- i.e. the Vi after expansion at step a.; and note it might be a component of a component of a component ...
  • Is there any assignment to this same Vi (after expansion) anywhere to the left of me?
  • If not, ignore any individual assignments to my left and go back to the state before the whole MA to get the value for my RHS THE_A(E).
  • If so, go to the assignment immediately to my left.

I can see no sense in which this behaviour is "constructed according to well established principles of good language design." It's an ugly kludge. It destroys the intuition in my attempted simplification -- which interpretation I thought was the most secure:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence. That is, as if the commas were semicolons.

I suggest the best way to 'plug' RM Pre 21 is to say that for Multiple Assignments (as opposed to single-shot assignments) you can't put pseudovariables on the LHS; you must put top-level variables. That should force the programmer to write cleaner, more understandable code.

Trying to mimic OOP's (pseudo-)assignment to components strikes me as a 'feature' better done without.

I note in passing, and for the sake of clarity (lest there be any confusion) that there is no pseudo-assignment in typical OO languages, there is only assignment. So it's accurate to say that what TTM is doing is mimicking OO assignment via pseudo-assignment (which expands to a selector invocation).

It's certainly possible to dispense with pseudo-assignment. I've not implemented it in Rel: it's not necessary, and it's a (relatively) minor syntactic convenience that makes it look as if there's mutability where there isn't. I'd rather it always be explicitly clear exactly where mutability is and isn't.

Quote from Dave Voorhis on November 26, 2019, 6:50 am
Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem in this particular instance is not with the principle of syntactic substitution but with the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

That assignment to a component is shorthand for an equivalent selector invocation is a given. The question is how to formalise "rearranging terms".

No, that was not the problem Hugh raised. The problem he raised evaporates once the shorthands are properly expanded and put into the form shown. The issue with 21b and the WITH clause and triggering type constraints has gone. Disappeared.

But obviously there has always been a lack of formality in RM Pre 21. If you think this is a new problem to address then by all means say so, but it isn't the same. There is no algorithm presented, just the goal of (somehow) transforming the MA into the form shown. Informal, but sufficient.

From my perspective it's a straightforward application of standard compiler technology, and needs no explicit formalisation.

Quote from AntC on November 26, 2019, 12:32 am

...

THE_B(E), THE_A(E) := THE_A(E) + 1.0, THE_A(E) + 2.0;  // a single :=, with balanced commalists each side

...

Now we can easily express the swapping of two values, which technique is used in CPL demonstration programs for sorting an array in situ, for example; and which appears to be behind all the nonsense in RM Pre 21:

A, B := B, A;

Array(x), Array(y), x := Array(y), Array(x), x + 1;  // incrementing x has no effect on the subscripts: they use the unincremented x.

So we can see that a Multiple Assignment cannot be translated to a series of single assignments, even if we delay constraint checking to the semicolon.

I believe PL/1 indeed has that form.  I'm almost totally certain it has the form A,B,C := 1 (multiple targets, one same value).

But now I should point out that assignment is also supposed to work for db relvars.  And in particular, combinations of INSERT/DELETE to them.  (Not all such combinations are expressible by UPDATE, e.g. INSERT this one, DELETE these three.)  None of the established wisdom from any of the other languages is going to be helpful in that arena.  And I definitely want to keep using INSERT and DELETE for managing my database instead of being forced into the direct assignment form.
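A small sketch of what "INSERT this one, DELETE these three" looks like once both are read as relvar assignments. Assumptions: INSERT R t is read as R := R UNION t and DELETE R t as R := R MINUS t (ignoring Tutorial D's stricter variants), relations are modelled as frozensets of tuples, same-target updates are chained left to right, and `insert`, `delete` and `apply_in_order` are hypothetical names. When the inserted and deleted tuples overlap, the order of the individual assignments decides the outcome.

```python
from typing import Callable, FrozenSet, List, Tuple

Relation = FrozenSet[Tuple[int, ...]]
Op = Callable[[Relation], Relation]


def insert(tuples: Relation) -> Op:
    return lambda r: r | tuples      # R := R UNION t (assumed reading)


def delete(tuples: Relation) -> Op:
    return lambda r: r - tuples      # R := R MINUS t (assumed reading)


def apply_in_order(r: Relation, ops: List[Op]) -> Relation:
    # Chained reading: each update to the same relvar sees the one to its left.
    for op in ops:
        r = op(r)
    return r


R: Relation = frozenset({(1,), (2,), (3,), (4,)})
after = apply_in_order(R, [insert(frozenset({(5,)})),
                           delete(frozenset({(1,), (2,), (3,)}))])
kept = apply_in_order(R, [delete(frozenset({(9,)})), insert(frozenset({(9,)}))])
dropped = apply_in_order(R, [insert(frozenset({(9,)})), delete(frozenset({(9,)}))])
```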

Maybe, if we collect enough hindsight, an agreeable conclusion could be reached that TTM's attempt at a unified assignment for everything breaks the sloganesque, but not entirely untrue, dictum that one size fits all always ends up fitting none. But that's a discussion about merits and demerits.

Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem in this particular instance is not with the principle of syntactic substitution but with the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

Well, as AntC already said, stop importing prejudices from what current compilers/languages do. It does not matter what other compilers/languages do about "rearranging terms". The TTM prescription has none of that, so according to the spec there is no "rearranging terms", and therefore the problem is there. Period.

Hugh's answer to my question made it very clear that "the intermediate selector is there, therefore the runtime invocation must be there, therefore the potential for exceptions is there".