The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Help sought to fix RM Pre 21 (multiple assignment)

Quote from Erwin on November 23, 2019, 4:07 pm
Quote from Dave Voorhis on November 23, 2019, 2:58 pm

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

Alternatively -- and I'm not sure what can of worms this opens; perhaps a big one -- define multiple assignment and type constraints such that all type constraints are only checked at the end of a multiple assignment.

Yes, yes (*) and NOOOOOOOOOOOOOOO.

If type constraints are only "checked at end of assignment" then you get not-a-value thingys somewhere in the middle, then you get containers somewhere in the middle (of a chaining of assignments to the same target) deriving from "expressions" that suddenly are able to produce these not-a-value thingys as their result, and I think lots of stuff is gone out the window.

But this isn't only a q about PossRep components and specialisation-by-constraint. Consider the standard S&P schema with an Inclusion Dependency (Foreign Key Constraint) from SP to S on S#:

DELETE S WHERE S# = 'S1', UPDATE SP WHERE S# = 'S1' : ..., INSERT S : REL{TUP{S# 'S1', ...}};

(Of course I mean the assignments equivalent to those DELETE/UPDATE/INSERT statements.)

Should that UPDATE SP be allowed when it'll break referential integrity? Should that DELETE S be allowed, because presumably SP already has a tuple referencing Supplier S1? Is the MA acceptable overall, because the deleted S1 gets re-inserted 'before the semicolon'. If this MA is not valid, then I think Hugh's original MA is not valid either.

And I've always believed ELLIPSE(A 1.0, B 100.0) is, and should be allowed by the language to be, an ellipse too.  Besides, the idea to "replace" scalar values with their tuple "equivalents" (in the internal operation of the assignment for the small amount of time it takes to get the assignment done) is just the result of the more than striking resemblance between possrep representations of a scalar value and a tuple value, and the authors have never been very keen on letting that seep through too much in the language.  One problem is definitely "which possrep are you going to pick".  Can't guarantee that picking POLAR for POINT values will not make you run into problems when you encounter a subsequent THE_X().  And vice-versa.

Have you 'always believed' something like my MA to relvars should be allowed? What's different between relvar assignment vs PossRep component assignment (leaving aside the Multiple PossRep issue)?

(*) Well, it might make certain updates impossible if the constraint gets you "deadlocked" (can't change component 1 value because current component 2 value prohibits it and can't change component 2 value because current component 1 value prohibits it).

That's just like an Equi-Dependency between relvars; and it's been the classic justification for multiple assignment. So we can't expect constraints to hold 'at the comma'(?)

Quote from Dave Voorhis on November 23, 2019, 10:56 pm

I'm not sure what compilers can or can't do is an issue here; this is a model, not an implementation. My objection is that it makes the definition of assignment to THE_x pseudo-variables potentially different under single and multiple assignment and/or not simply a shorthand for selector invocation.

It isn't. But if a compiler can always do it (using the full range of existing compiler tech) then there is an algorithm. If there is an algorithm it can be specified. But this specification fails to do so. So the answer is to fix the specification, not ban the algorithm.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on November 23, 2019, 10:56 pm
Quote from dandl on November 23, 2019, 10:25 pm
Quote from Dave Voorhis on November 23, 2019, 2:58 pm
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

Agreed. A bad idea.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

A poor idea. This example can be mechanically rewritten by extracting individual component values, something a compiler could easily do. If you prohibit examples like this you are banning a compiler from performing a provably valid rewrite, just because you couldn't find a way to write the spec.

I'm not sure what compilers can or can't do is an issue here; this is a model, not an implementation. My objection is that it makes the definition of assignment to THE_x pseudo-variables potentially different under single and multiple assignment and/or not simply a shorthand for selector invocation.

No there's nothing specific to THE_x pseudo-variables. Consider assignment to relvars with an Equi-dependency between them. (All Order Lines must have an Order Header; all Order Headers must have at least on Order Line.) RM Pre 23 Note.

Without Multiple Assignment, there's no way to insert anything into either of the Order Header, nor Order Lines relvars. So the justification for Multiple Assignment is that (some) constraints are not checked 'at the comma'. Why are type constraints treated differently to database constraints for these purposes?

Quote from AntC on November 24, 2019, 2:38 am

Should that UPDATE SP be allowed when it'll break referential integrity? Should that DELETE S be allowed, because presumably SP already has a tuple referencing Supplier S1? Is the MA acceptable overall, because the deleted S1 gets re-inserted 'before the semicolon'. If this MA is not valid, then I think Hugh's original MA is not valid either.

Type constraints have to hold 'at the comma' to guard against invalid values and undefined behaviour. Perhaps it's a 32-bit number and the constraint on an addition is simply: expressible in 32 bits.

Database constraints have to hold 'at the semicolon'. Nobody cares about intermediate relational values, the constraint is what goes in the database, and notionally they all go in at once.

And I've always believed ELLIPSE(A 1.0, B 100.0) is, and should be allowed by the language to be, an ellipse too.  Besides, the idea to "replace" scalar values with their tuple "equivalents" (in the internal operation of the assignment for the small amount of time it takes to get the assignment done) is just the result of the more than striking resemblance between possrep representations of a scalar value and a tuple value, and the authors have never been very keen on letting that seep through too much in the language.  One problem is definitely "which possrep are you going to pick".  Can't guarantee that picking POLAR for POINT values will not make you run into problems when you encounter a subsequent THE_X().  And vice-versa.

Have you 'always believed' something like my MA to relvars should be allowed? What's different between relvar assignment vs PossRep component assignment (leaving aside the Multiple PossRep issue)?

None. Assignment is the notional replacement of the contents of a variable by new contents. Any relationship to any earlier value is in the eye of the beholder and/or an opportunity for optimisation.

(*) Well, it might make certain updates impossible if the constraint gets you "deadlocked" (can't change component 1 value because current component 2 value prohibits it and can't change component 2 value because current component 1 value prohibits it).

That's just like an Equi-Dependency between relvars; and it's been the classic justification for multiple assignment. So we can't expect constraints to hold 'at the comma'(?)

Like the Irishman giving directions: you can't get there from here but if you go somewhere else first then you can get there. Just need a smarter compiler.

Andl - A New Database Language - andl.org
Quote from dandl on November 24, 2019, 5:07 am
Quote from AntC on November 24, 2019, 2:38 am

Should that UPDATE SP be allowed when it'll break referential integrity? Should that DELETE S be allowed, because presumably SP already has a tuple referencing Supplier S1? Is the MA acceptable overall, because the deleted S1 gets re-inserted 'before the semicolon'. If this MA is not valid, then I think Hugh's original MA is not valid either.

Type constraints have to hold 'at the comma' to guard against invalid values and undefined behaviour.

Where in the TTM doco (or TTM's IM) does it say that? There's no motivation/explanation for the algorithm in RM Pre 21. I repeat: RM Pre 21 should be couched as a Prescription for behaviour, not as an algorithm that's a solution to whatever Prescription is in the authors' heads.

A value for the dbvar that breaks database constraints is just as much an invalid value, and just as likely to yield inconsistent behaviour if that invalid intermediate state is visible to other assignments within some MA. Not sure what you mean there by "undefined behaviour": what is the defined behaviour for assignments/updates to relvars that have duplicate values in attributes that are declared to be keys? RM Pre 23 Note appears to be talking only about inter-relvar constraints. Then by omission from RM Pre 23 Note do we guess that 'at the comma' applies to type constraints and to single-relvar constraints?

Perhaps it's a 32-bit number and the constraint on an addition is simply: expressible in 32 bits.

Database constraints have to hold 'at the semicolon'. Nobody cares about intermediate relational values, the constraint is what goes in the database, and notionally they all go in at once.

RM Pre 23 Note says (in effect) that inter-relvar database constraints need not hold 'at the comma'? I don't see it saying "nobody cares" about intermediate values of the dbvar; nor that intra-relvar constraints can be violated at the comma.

There's nothing in the wording for RM pre 21 specific to scalar type vars vs nonscalars. "all go in at once" applies to MAs of scalar vars just as much as of nonscalars. Indeed your MA might mix updates to scalar vars with those to nonscalars.

 

Quote from dandl on November 24, 2019, 2:41 am
Quote from Dave Voorhis on November 23, 2019, 10:56 pm

I'm not sure what compilers can or can't do is an issue here; this is a model, not an implementation. My objection is that it makes the definition of assignment to THE_x pseudo-variables potentially different under single and multiple assignment and/or not simply a shorthand for selector invocation.

It isn't. But if a compiler can always do it (using the full range of existing compiler tech) then there is an algorithm. If there is an algorithm it can be specified. But this specification fails to do so. So the answer is to fix the specification, not ban the algorithm.

"The full range of existing compiler tech" (which sounds like a roundabout way of saying "compiler magic") isn't an answer. The question here is how to define what the compiler should do.

There may be good reasons to preclude deferring type constraint checks under any circumstances.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on November 24, 2019, 2:51 am
Quote from Dave Voorhis on November 23, 2019, 10:56 pm
Quote from dandl on November 23, 2019, 10:25 pm
Quote from Dave Voorhis on November 23, 2019, 2:58 pm
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

Agreed. A bad idea.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

A poor idea. This example can be mechanically rewritten by extracting individual component values, something a compiler could easily do. If you prohibit examples like this you are banning a compiler from performing a provably valid rewrite, just because you couldn't find a way to write the spec.

I'm not sure what compilers can or can't do is an issue here; this is a model, not an implementation. My objection is that it makes the definition of assignment to THE_x pseudo-variables potentially different under single and multiple assignment and/or not simply a shorthand for selector invocation.

No there's nothing specific to THE_x pseudo-variables. Consider assignment to relvars with an Equi-dependency between them. (All Order Lines must have an Order Header; all Order Headers must have at least on Order Line.) RM Pre 23 Note.

Without Multiple Assignment, there's no way to insert anything into either of the Order Header, nor Order Lines relvars. So the justification for Multiple Assignment is that (some) constraints are not checked 'at the comma'. Why are type constraints treated differently to database constraints for these purposes?

Intuitively, deferring checking database constraints to the end of a multiple assignment seems straightforward; complicating it by temporarily allowing invalid scalar values seems (at least) undesirable. But it may be fine to do so; I await seeing the proof.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh, here's a comparable form of update from a higher-typed/parametric polymorphic language with records (Haskell).

To try to build on your example (although this is a bit artificial), suppose I have a type ColouredEllipse with fields labelled A, B inter alia. Suppose furthermore this is a parametric type: I can use any numeric type (INT, RAT, FLOAT) for the length of the axes, but I must use the same type for both. "use the same type" is in effect a type constraint on the components. I'll write that type ColouredEllipse(n), where n is some numeric type.

VAR myColouredEllipse ColouredEllipse(FLOAT);

myColouredEllipse := ...;

VAR myIntColouredEllipse ColouredEllipse(INT);

myIntColouredEllipse := myColouredEllipse{ A = floor( THE_A(myColouredEllipse) + 2.0), B = floor(THE_A(myColouredEllipse) + 1.0)};

The syntax form myVar{ A = ..., B = ...} is an 'Update using Field Labels'. The semantics are that for all fields of myVar not mentioned inside the { } , those values are 'copied forward' as is (that would be the Colour in this example). The commalist of label updates is not a sequence, as shown by the { } indicating the whole set of updates applies in one go. Note that the LHS of := must be a bare var; the RHS expression must return a type-correct 'whole value' to assign to the var.

So there's no opportunity to destructively assign to components of some var; all you can do is assign to the whole var. Furthermore there's no intermediate state of a partially-updated value, precisely to avoid violating the parametricity of the inferred types. All values are either of type ColouredEllipse(FLOAT) with both axes Float or ColouredEllipse(INT) with both axes INT.

How does this approach apply to TTM's MA? We say

  • assignment is to top-level vars only, and (apparent) assignment to components is a shorthand for that;
    (RM Pre 21 is trying to achieve that, but it exposes too much of the machinery)
  • we say that no intermediate values are constructed or visible;
  • the left-to-right ordering of individual updates is to be ignored, except that we observe dependencies of component updates left-to-right.
    (The final value of THE_B depends on the initial value of THE_A.)

I think this means we can carry out the intended effect using only syntactic expansions of the shorthand. For each target var, we go to the rightmost assignment then work backwards, that is the second assignment of the MA in your example:

, THE_A(E) := THE_A(E) + 2.0;

So we know the expansion of that will look like E := ELLIPSE(THE_A(E) + 2.0, ??);. There's no assignment to component THE_A(E) leftwards in the MA. But there is an assignment to the THE_B(E), so 'paste in' the RHS of that individual assignment

E := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

This syntactic-based rule won't work for Multiple PossReps where one of the assignments targets via one PossRep, another via a different PossRep. Because notionally it would need constructing a value via the first-used PossRep then select from it using the second-used; and that notional constructed value might violate the type constraints. Another good reason to do away with Multiple PossReps, I'd say.

Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

Hugh

I was thinking of an old McGoveran pronouncement "multiple assignment ... is limited (it cannot substitute for a group of interdependent updates in which an order is required to obtain a desired result – as for example occur in interactive transactions)".  In there, he is saying "order is required" as a way to express "intermediate results are needed and must be accessible because the intended result inevitably depends on them", I think.  And then I realised that TTM MA actively uses the technique of name hiding to obtain the effects it wants, thus effectively making many "intermediate results" invisible.

Let's assume we can avoid this name hiding, say by introducing names such as E', E'', ...  Then we get

E := WITH (E' := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E') + 2.0, THE_B(E'));

Now we have this thing THE_B(E') and we know this is a reference to an intermediate construct that is completely local to the "internals of the assignment".  That opens up the possibility of inlining (replace the invocation of THE_B with the expression used to construct the value in the first place :

E := WITH (E' := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E') + 2.0, THE_A(E) + 1.0);

Doing same with THE_A(E') yields

E := WITH (E' := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_A(E) + 1.0);

And now the WITH E' can be discarded :

E := ELLIPSE (THE_A(E) + 2.0, THE_A(E) + 1.0);

I think Chris Date was visually misled by what the expression looked like into believing that that intermediate value selector necessarily and inevitably had to also lead to an actual runtime invocation (which would inevitably raise the exception).

I think this is also the answer to @Antc 's question re. "what is different" with MA to relvars : the relvar is never used as a container to "store the intermediate result" (which would require checking the constraints "at the comma"), and on top of that relation types cannot be constrained by type constraints in the same sense that scalar types can (if MAJOR_AXIS must be larger than MINOR_AXIS then that would be a database constraint and not a type constraint on the relation type).  otoh, with the scalar assignment "to possrep components", if you effectively use "intermediate containers" to store intermediate results, then you're obliged to check the type constraint because that comes with the territory (of creating that container even if it's only an intermediate one).  (@Dave and @Dandl have articulated very well what you get if you willingly skip checking the type constraint.) If you use a technique like described above, then there no longer is any difference with MA to relvars.

(PS And now I was going to address the rest @Antc wrote and I see he wrote the same thing before me.)

Quote from AntC on November 24, 2019, 2:15 am

It's always struck me that RM Pre 21 is something of a misfit. "the semantics of MA" gives an algorithm. It doesn't give a prescription as to what's the intended outcome of that algorithm. If the intended outcome was explicit, we'd have independent grounds to say some algorithm complies or not. We'd also have independent grounds to say such-and-such a MA cannot be implemented within the prescriptions. (Just as saying some attempted update-through-virtual cannot be implemented.)

...

I agree with the sentiment, but is there a solution to the problem you say should actually be addressed ?

The problem is a programmer at some point in a program where he has some variables (local or external, such as the dbvar) containing some values and he wants them to contain other values.  So now you need to say something that defines the operation of assignment and it needs to cover every case of every programmer at every possible point in every program he writes.

We know VBAV (variable by agonizing variable) won't do so it needs to be multiple.  So the theoretically possible case of same variable used >1 as a target must be addressed.  And all of the targeted variables may also be mentioned in the RHS expressions used to assign to any of the targets within the MA so you must say something that defines precisely what value should "replace" that variable reference upon evaluation of such an RHS.

For TTM as it stands, that might go like

All references to variables in RHS expressions in "sub-assignments" evaluate to the pre-assignment value of the variable, EXCEPT

If the reference is to a variable that is also the target to which is being assigned by the expression [containing that particular variable reference as a sub-expression], then that reference evaluates to the value that the immediately preceding sub-assignment to that particular target variable attempted to assign, and in the absence of such a preceding sub-assignment, to the pre-assignment value of the variable.

I think that's accurate as far as it goes, but I don't know whether this matches your idea of "proper definition of semantics" and there's obviously still the following problems :

It still fails to specify what is a target variable and what is not (is a virtual relvar a target variable in this sense (and more importantly, are its underlying base relvars also targets in this sense, and if so, how to deal with the potentially overlapping grouping that ensues ???), is a pseudo-variable a target in this sense and is its underlying "root" variable too ?)  (The latter is answered by the examples and explanations provided, but it's not in the spec.)

And it still puts us back at square one of this thread because it speaks of "value that the immediately preceding sub-assignment ...", clearly implying that what "comes out" of that preceding sub-assignment must indeed be a value (satisfying its type constraint) and techniques like described above are in fact ruled out.

IMMSMW, historically, the former two editions of TTM tried the following :

All references to variables in an RHS of an assignment evaluate to the pre-assignment value.  (Bad, because subsequent assignment to possrep components always "keeps only the last and looses all the others".)

All references to variables in an RHS of an assignment evaluate to what is left for that variable by the latest preceding sub-assignment to that target.  (Bad, because now we can't swap using B:=A,A:=B; .)

Here's another proposal :

All sub-assignments to a target X lead to the creation of a new variable (lifetime is for the duration of the assignment) X', X'', X''' etc.
All new variables so created are visible/referenceable to any RHS expression within the assignment, independent of which target the sub-assignment it appears in assigns to.
References to such a variable X' evaluate to the value that the sub-assignment it is defined by, attempts to assign to it

But now our pseudovariables assignment just no longer works :

THE_A(E) := 7.0 , THE_B(E) := 6.0 ;  /* how to know the generated reference THE_A(E) must actually be THE_A(E') ??? */

THE_A(E) := 7.0 , THE_B(E') := 6.0 ;  /* note that almost invisible quote, but E' now appears on the LHS, and how to know how to correctly replace E' references with E ones ??? */

If a new sub-assignment must be added "somewhere in the middle" of an already big assignment (sourcewise), then that alters the target for already existing references like X''', ...  so we might add a fourth line

And if you want to shoot yourself in the foot by using this feature too extensively, that's your fault not ours.