The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Help sought to fix RM Pre 21 (multiple assignment)

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

Coauthor of The Third Manifesto and related books.

I read RM Pre 21 as describing an algorithm using Tutorial D as a convenient syntactic form, not as code that is required to work.The step intends to merge a sequence of assignments to a variable into a single assignment via a sequence of notional intermediate values. It's the final results we care about, not the intermediate values.

So you can deal with that by observing simply that:

  1. step (b) may appear to generate invalid intermediate values (that violate constraints) but
  2. they are phantoms that are always resolved within that step, they do not propagate, and
  3. they can always be rewritten as code with the same effect that does not violate constraints.

In practical terms for the case you describe that rewrite might be to decompose each component into local variables of the appropriate type, operate on those, and finally reassemble the result value. You could formalise that if you wished, but I wouldn't bother.

Incidentally, the mention of 'declared type' troubles me. Given the IM, you cannot choose the declared type of an expression, and there is no obvious way to force the declared type of variable and expression to be the same.

Andl - A New Database Language - andl.org
Quote from dandl on November 22, 2019, 11:02 pm

I read RM Pre 21 as describing an algorithm using Tutorial D as a convenient syntactic form, not as code that is required to work.The step intends to merge a sequence of assignments to a variable into a single assignment via a sequence of notional intermediate values. It's the final results we care about, not the intermediate values.

So you can deal with that by observing simply that:

  1. step (b) may appear to generate invalid intermediate values (that violate constraints) but
  2. they are phantoms that are always resolved within that step, they do not propagate, and
  3. they can always be rewritten as code with the same effect that does not violate constraints.

In practical terms for the case you describe that rewrite might be to decompose each component into local variables of the appropriate type, operate on those, and finally reassemble the result value. You could formalise that if you wished, but I wouldn't bother.

Incidentally, the mention of 'declared type' troubles me. Given the IM, you cannot choose the declared type of an expression, and there is no obvious way to force the declared type of variable and expression to be the same.

Thanks, and yes, of course, but we don't feel it's right to provide a flawed definition and leave it to implementers to fix the problem.

In response to your last paragraph, please look at IM Pre 9 and IM Pre 17.

Hugh

Coauthor of The Third Manifesto and related books.

I do not find the definition flawed so much as requiring a little further explanation. Others might see it differently.

Re the other, perhaps I should be clearer. The wording of RM Pre 21 says:

where Vi is the name of some variable not defined in terms of any others and Xi is
an expression of declared type the same as that of Vi.

Each Vi is a variable obtained by the preceding process of expanding shortcuts, so is of some declared type. And likewise each Xi is an expression and it too has a declared type. For example, Vi might be of explicitly declared type INTEGER and Xi of declared type ODD (it is the value of a selector by that name). They are assignment compatible, and this is consistent with IM Pre 9.

I do not see how RM Pre 21 can require or assume a declared type for an expression (INTEGER in this case, where it is plainly ODD). Perhaps an example would help me to understand.

Andl - A New Database Language - andl.org
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

Alternatively -- and I'm not sure what can of worms this opens; perhaps a big one -- define multiple assignment and type constraints such that all type constraints are only checked at the end of a multiple assignment.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 23, 2019, 2:58 pm

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

Alternatively -- and I'm not sure what can of worms this opens; perhaps a big one -- define multiple assignment and type constraints such that all type constraints are only checked at the end of a multiple assignment.

Yes, yes (*) and NOOOOOOOOOOOOOOO.

If type constraints are only "checked at end of assignment" then you get not-a-value thingys somewhere in the middle, then you get containers somewhere in the middle (of a chaining of assignments to the same target) deriving from "expressions" that suddenly are able to produce these not-a-value thingys as their result, and I think lots of stuff is gone out the window.

And I've always believed ELLIPSE(A 1.0, B 100.0) is, and should be allowed by the language to be, an ellipse too.  Besides, the idea to "replace" scalar values with their tuple "equivalents" (in the internal operation of the assignment for the small amount of time it takes to get the assignment done) is just the result of the more than striking resemblance between possrep representations of a scalar value and a tuple value, and the authors have never been very keen on letting that seep through too much in the language.  One problem is definitely "which possrep are you going to pick".  Can't guarantee that picking POLAR for POINT values will not make you run into problems when you encounter a subsequent THE_X().  And vice-versa.

(*) Well, it might make certain updates impossible if the constraint gets you "deadlocked" (can't change component 1 value because current component 2 value prohibits it and can't change component 2 value because current component 1 value prohibits it).

Quote from Erwin on November 23, 2019, 4:07 pm
Quote from Dave Voorhis on November 23, 2019, 2:58 pm

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

Alternatively -- and I'm not sure what can of worms this opens; perhaps a big one -- define multiple assignment and type constraints such that all type constraints are only checked at the end of a multiple assignment.

Yes, yes (*) and NOOOOOOOOOOOOOOO.

If type constraints are only "checked at end of assignment" then you get not-a-value thingys somewhere in the middle, then you get containers somewhere in the middle (of a chaining of assignments to the same target) deriving from "expressions" that suddenly are able to produce these not-a-value thingys as their result, and I think lots of stuff is gone out the window.

That sounds reasonably unreasonable, and well wormy per my opened can. Intermediate violations of constraints may lead to indeterminate or invalid final results. Best avoided, then.

And I've always believed ELLIPSE(A 1.0, B 100.0) is, and should be allowed by the language to be, an ellipse too.  Besides, the idea to "replace" scalar values with their tuple "equivalents" (in the internal operation of the assignment for the small amount of time it takes to get the assignment done) is just the result of the more than striking resemblance between possrep representations of a scalar value and a tuple value, and the authors have never been very keen on letting that seep through too much in the language.  One problem is definitely "which possrep are you going to pick".  Can't guarantee that picking POLAR for POINT values will not make you run into problems when you encounter a subsequent THE_X().  And vice-versa.

(*) Well, it might make certain updates impossible if the constraint gets you "deadlocked" (can't change component 1 value because current component 2 value prohibits it and can't change component 2 value because current component 1 value prohibits it).

Such deadlocks are always a potential risk with constraints, but "get out of jail" cards (which the proposed "solution" appears to suggest) are (almost) never a real solution. If you have to violate a constraint to achieve a goal, then it's probably the wrong constraint. Or goal...

I've sometimes seen this sort of problem described as "I want a pony." I.e., you want a desired outcome but can only achieve it via unacceptable circumstances, akin to the unreasonable expense and hassle of actually owning a pony. The only viable solution is that you don't get to have a pony.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 23, 2019, 2:58 pm
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

Agreed. A bad idea.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

A poor idea. This example can be mechanically rewritten by extracting individual component values, something a compiler could easily do. If you prohibit examples like this you are banning a compiler from performing a provably valid rewrite, just because you couldn't find a way to write the spec.

Alternatively -- and I'm not sure what can of worms this opens; perhaps a big one -- define multiple assignment and type constraints such that all type constraints are only checked at the end of a multiple assignment.

A horrible idea. Allowing an intermediate result to violate a constraint is well into "undefined behaviour", including but not limited to random corruption of values or even unicorns and rainbows.

 

Andl - A New Database Language - andl.org
Quote from dandl on November 23, 2019, 10:25 pm
Quote from Dave Voorhis on November 23, 2019, 2:58 pm
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

The proposed solution seems worse than the problem, and an inelegant kludge rather than an appropriate definition.

Agreed. A bad idea.

I would simply declare the attempted update to be invalid, because it violates the type constraint for ELLIPSE under the expansion per RM Pre 21.

A poor idea. This example can be mechanically rewritten by extracting individual component values, something a compiler could easily do. If you prohibit examples like this you are banning a compiler from performing a provably valid rewrite, just because you couldn't find a way to write the spec.

I'm not sure what compilers can or can't do is an issue here; this is a model, not an implementation. My objection is that it makes the definition of assignment to THE_x pseudo-variables potentially different under single and multiple assignment and/or not simply a shorthand for selector invocation.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E,

Thanks Hugh, to be clear: this topic is not just about updating via PossReps, but any multiple assignment where different assignments target components within the same variable. In particular, the assignments might target 'component's within the dbvar, that is, different relvars; the transient state might specifically violate inter-relvar constraints.

It's always struck me that RM Pre 21 is something of a misfit. "the semantics of MA" gives an algorithm. It doesn't give a prescription as to what's the intended outcome of that algorithm. If the intended outcome was explicit, we'd have independent grounds to say some algorithm complies or not. We'd also have independent grounds to say such-and-such a MA cannot be implemented within the prescriptions. (Just as saying some attempted update-through-virtual cannot be implemented.)

via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Without following the algorithm in RM Pre 21, I'd implement that as

E := ELLIPSE(THE_A(E)+2.0, THE_A(E)+1.0);

That assumes there's no assignments intervening at the comma. Alternatively (and on the same assumption)

E := WITH (Temp_B := THE_A(E)+1.0) : ELLIPSE(THE_A(E)+2.0, Temp_B);

If there are assignments intervening at the comma:

  • If they're referencing E, they'll be accessing a state where constraints don't hold, so invalidating any expression at all.
  • If they're not referencing E, the assignments can be freely rearranged to group together assignments to E, then my above assumption does hold.

This 'dependency grouping' is a recognised technique in compiler technology. (Not 'dependency' in the relational sense, but declarations depending on other declarations depending on further declarations ..., particularly for recursive functions.) The bad news is that grouping for dependencies is NP-hard in general and not guaranteed to produce sub-groups any smaller than the original problem.

Isn't there a similar technique for determining if database updates are serialisable/can be applied in parallel?

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

No that's not right if there's an assignment intervening. Because the tuple representing the half-updated E is not visible at the comma, so to speak.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

I'll add a few comments in response to others'. I'm a bit mystified why Chris wants to plug this lacuna, when there are gaping holes elsewhere in TTM.