The Forum for Discussion about The Third Manifesto and Related Matters


Help sought to fix RM Pre 21 (multiple assignment)

Quote from dandl on November 26, 2019, 7:29 am
Quote from Dave Voorhis on November 26, 2019, 6:50 am
Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem  in this particular instance is not with the principle of syntactic substitution but the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

That assignment to a component is shorthand for an equivalent selector invocation is a given. The question is how to formalise "rearranging terms".

No, that was not the problem Hugh raised. The problem he raised evaporates once the shorthands are properly expanded and put into the form shown. The issue with 21b and the WITH clause and triggering type constraints has gone. Disappeared.

No, expanding the shorthands properly is what exposes the problem -- the pseudo-variable assignments expand to selector invocations, with the result that a selector invocation will throw an exception.

But obviously there has always been a lack of formality in RM Pre 21. If you think this is a new problem to address then by all means say so, but it isn't the same. There is no algorithm presented, just the goal of (somehow) transforming the MA into the form shown. Informal, but sufficient.

From my perspective it's a straightforward application of standard compiler technology, and needs no explicit formalisation.

But that's precisely the question: what are the specific semantics of the "standard compiler technology" (by which I presume you mean a rewrite) to be used here?
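To make the disagreement concrete, here is a minimal sketch in Python (chosen only so that it can be run; it is not Tutorial D). It assumes the usual ELLIPSE type constraint a >= b and a starting value of ELLIPSE(5.0, 4.0). The `Ellipse` class and the starting value are illustrative assumptions, not taken from the thread. It contrasts the step-by-step expansion, where the first selector invocation is rejected, with the single "rearranged" invocation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    """Immutable value; the constructor plays the role of the ELLIPSE selector."""
    a: float
    b: float

    def __post_init__(self):
        # Assumed type constraint: major semiaxis a >= minor semiaxis b.
        if self.a < self.b:
            raise ValueError(f"type constraint violated: a={self.a} < b={self.b}")


e = Ellipse(5.0, 4.0)  # assumed starting value of E

# Step-by-step expansion: E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), then E''.
try:
    e1 = Ellipse(e.a, e.a + 1.0)    # ELLIPSE(5.0, 6.0): the selector rejects it
    e2 = Ellipse(e1.a + 2.0, e1.b)  # never reached
    naive_result = e2
except ValueError as err:
    naive_result = err

# "Rearranged" form: a single selector invocation on the original components.
folded_result = Ellipse(e.a + 2.0, e.a + 1.0)  # ELLIPSE(7.0, 6.0) is valid

print(naive_result)
print(folded_result)
```

The sketch only shows that the two forms behave differently under these assumptions; it does not say which rewrite rule, if any, RM Pre 21 ought to prescribe.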

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 26, 2019, 8:13 am
Quote from dandl on November 26, 2019, 7:29 am
Quote from Dave Voorhis on November 26, 2019, 6:50 am
Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem  in this particular instance is not with the principle of syntactic substitution but the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

That assignment to a component is shorthand for an equivalent selector invocation is a given. The question is how to formalise "rearranging terms".

No, that was not the problem Hugh raised. The problem he raised evaporates once the shorthands are properly expanded and put into the form shown. The issue with 21b and the WITH clause and triggering type constraints has gone. Disappeared.

No, expanding the shorthands properly is what exposes the problem -- the pseudo-variable assignments expand to selector invocations, with the result that a selector invocation will throw an exception.

Not exactly, but close. You can certainly take the position that expanding the shorthands correctly exposes a problem, but it's not the same one. Instead of running into a problem inside a WITH construct in step (b) you run into a problem in dealing with selectors exposed in step (a). This provides an opportunity to 'pick apart' nested selector invocations and transform them into a single invocation (as shown), thus avoiding the problem with invalid intermediate values. It would be perfectly possible to formalise that specific case, but is it really worth the bother? I'm sure there are others that are much harder.

While we're about it, what do we do about other kinds of invalid intermediate value, like this one:

I := 0, J := 0, I := I/J, I := 17;

The annoying thing is that the compiler cannot in general predict the failure, because these values can only be tested at runtime.
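A small Python sketch of that example, assuming the assignments are simply evaluated left to right on a working copy of the variables. The helper `run_sequentially` and the starting values are hypothetical. The result of I := I/J would be overwritten by I := 17, yet a plain sequential evaluation still fails at run time.

```python
def run_sequentially(assignments, env):
    """Apply (target, expression) pairs in order to a copy of env.

    Each expression is a function of the current working environment.
    The caller's env is left untouched if any step raises.
    """
    work = dict(env)
    for target, expr in assignments:
        work[target] = expr(work)
    return work


# I := 0, J := 0, I := I/J, I := 17;
ma = [
    ("I", lambda v: 0),
    ("J", lambda v: 0),
    ("I", lambda v: v["I"] / v["J"]),  # 0 / 0, only detectable at run time
    ("I", lambda v: 17),
]

env = {"I": 1, "J": 1}  # assumed starting values
try:
    env = run_sequentially(ma, env)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "failed"

print(outcome, env)
```

Whether an implementation may skip the overwritten assignment, and so avoid the failure, is exactly the kind of question the sketch leaves open.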

Andl - A New Database Language - andl.org
Quote from dandl on November 26, 2019, 1:48 pm
Quote from Dave Voorhis on November 26, 2019, 8:13 am
Quote from dandl on November 26, 2019, 7:29 am
Quote from Dave Voorhis on November 26, 2019, 6:50 am
Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem  in this particular instance is not with the principle of syntactic substitution but the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

That assignment to a component is shorthand for an equivalent selector invocation is a given. The question is how to formalise "rearranging terms".

No, that was not the problem Hugh raised. The problem he raised evaporates once the shorthands are properly expanded and put into the form shown. The issue with 21b and the WITH clause and triggering type constraints has gone. Disappeared.

No, expanding the shorthands properly is what exposes the problem -- the pseudo-variable assignments expand to selector invocations, with the result that a selector invocation will throw an exception.

Not exactly, but close.

No, it is exactly the one identified by Hugh, per the currently-specified semantics.

You can certainly take the position that expanding the shorthands correctly exposes a problem, but it's not the same one. Instead of running into a problem inside a WITH construct in step (b) you run into a problem in dealing with selectors exposed in step (a). This provides an opportunity to 'pick apart' nested selector invocations and transform them into a single invocation (as shown), thus avoiding the problem with invalid intermediate values. It would be perfectly possible to formalise that specific case, but is it really worth the bother? I'm sure there are others that are much harder.

While we're about it, what do we do about other kinds of invalid intermediate value, like this one:

I := 0, J := 0, I := I/J, I := 17;

The annoying thing is that the compiler cannot in general predict the failure, because these values can only be tested at runtime.

What compiler?

The behaviour in that case is no more and no less than what the semantics for the language specify. If it's unspecified, then it's presumably implementation-dependent, but that means it should be specified by the implementation.

The goal here is to define the semantics in the model rather than leaving them to implementations.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on November 25, 2019, 12:57 pm
Quote from Erwin on November 25, 2019, 12:24 pm
Quote from dandl on November 25, 2019, 1:49 am

Gospel: There is no assignment to a component. A component is not a variable. It's an illusion, propagated by Tutorial D which makes it appear so.

...And the type constraint is violated. But all this proves that multiple assignment was never needed or possible for this case. It is easy to see that this can be rewritten as a single assignment. A compiler would and should emit an error, not do the rewrite.

THE_ pseudovariables were meant as the counterpart to OO setters.  So the "makes it appear so" was very deliberate and on purpose.  Everything can always be rewritten in another form.  Programmers tend to prefer the shortest.

I agree. No argument whatsoever.

The point is that for the purposes of Hugh's example, THE() forms are not variables, so they must be considered as shorthands and eliminated in step RM Pre 21a. In step b when every LHS is a variable (of declared type ELLIPSE), the problem he refers to no longer exists. It evaporates.

Hugh's problem is caused by treating THE() forms as variables, which they are not.

But expressions that can be assigned to must behave like variables.  Which properties of variables does THE_A(E) lack to justify your assertion?

Coauthor of The Third Manifesto and related books.
Quote from Dave Voorhis on November 25, 2019, 3:09 pm
Quote from Hugh on November 25, 2019, 2:33 pm

Also, we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.

The TTM approach -- which defines assignment to a possrep component as shorthand for invoking a selector -- has no equivalent in typical object-oriented languages.

In typical object-oriented languages, the closest equivalent to assignment to a possrep component is either invocation of a method that changes instance state (such a method is often called a 'setter', because it sets the value of an instance's member variable) or direct assignment to an instance's member variable. I'm not aware of any object-oriented language in which assignment to a member variable is shorthand for invoking a constructor (which is the nearest equivalent to a selector), though an object-oriented language could conceivably be made that would do that.

Or did you mean something different by "as in OO languages"?

Well, for what it's worth, the SQL standard used a similar definition in the 1998 edition and I'm not aware of that having changed since.

Let P be a POINT object in some OO language, such that P.X := 3.0 results in P's X value becoming 3.0 while its Y value remains unchanged. Couldn't that be short for P := POINT(3.0, P.Y)? I understand that the RHS there is an invocation of a constructor, but does that make a material difference? (Please excuse my ignorance.)

Hugh

Quote from dandl on November 26, 2019, 12:44 am
Quote from Hugh on November 25, 2019, 2:33 pm

...we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.  But we have struggled in vain to correct the problem without abandoning the idea of expressing them with syntactic substitution and I just wondered if anybody here would be willing to have a go.  It seems not, and I don't want to participate in a long discussion about the merits and demerits of RM Pre 21's intention.

@Hugh: IMHO the problem  in this particular instance is not with the principle of syntactic substitution but the way you applied it. Here is the original as you posted:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The remainder of RM Pre 21 plays out without further difficulty.

IMO there is no problem to solve. It evaporates, once you note that assignment to a component is a shorthand, to be resolved in step a.

The expression ELLIPSE(THE_A(E), THE_A(E) + 1.0) violates the type constraint for ELLIPSE, so if E' is of type ELLIPSE, then it is impossible for it to acquire or stand for the (non-existent) value of that expression.

Hugh

Quote from Hugh on November 26, 2019, 3:43 pm
Quote from Dave Voorhis on November 25, 2019, 3:09 pm
Quote from Hugh on November 25, 2019, 2:33 pm

Also, we believe very strongly that assignment to a possrep component is merely shorthand (as in OO languages) and we think that is the right way to express the semantics.

The TTM approach -- which defines assignment to a possrep component as shorthand for invoking a selector -- has no equivalent in typical object-oriented languages.

In typical object-oriented languages, the closest equivalent to assignment to a possrep component is either invocation of a method that changes instance state (such a method is often called a 'setter', because it sets the value of an instance's member variable) or direct assignment to an instance's member variable. I'm not aware of any object-oriented language in which assignment to a member variable is shorthand for invoking a constructor (which is the nearest equivalent to a selector), though an object-oriented language could conceivably be made that would do that.

Or did you mean something different by "as in OO languages"?

Well, for what it's worth, the SQL standard used a similar definition in the 1998 edition and I'm not aware of that having changed since.

Let P be a POINT object in some OO language, such that P.X := 3.0 results in P's X value becoming 3.0 while its Y value remains unchanged. Couldn't that be short for P := POINT(3.0, P.Y)? I understand that the RHS there is an invocation of a constructor, but does that make a material difference? (Please excuse my ignorance.)

Hugh

It depends on how "material difference" is defined.

Given, say, variables P and Q which are references to distinct instances of POINT(2.0, 4.0), executing P.X := 3.0 and Q := new POINT(3.0, Q.Y) will result in P.X and Q.X both being equal to 3.0, and P.Y and Q.Y both remaining equal to 4.0.

From that point of view, they are not materially different.

However, after the assignments, P references the same instance it did before but Q references a different instance. Memory was allocated for Q's new instance of POINT, but no new memory was allocated for P.X := 3.0. Furthermore, the accessible methods of a class instance provide an interface to what might be encapsulated (and possibly hidden) additional functionality. For example, POINT might internally maintain a historical log of every assigned X or Y value (the history presumably being accessible via some POINT method). In that case, P might reference an instance with a lengthy history of assignments to X and Y. Being newly constructed, the instance referenced by Q would contain no such history.

From that point of view, they are materially different.
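Both points of view can be seen in a short Python sketch. The `Point` class, including its `history` log, is a hypothetical illustration of the description above, not code from any particular system.

```python
class Point:
    """Mutable object with an internal history log (hypothetical, for illustration)."""

    def __init__(self, x, y):
        self._x = x
        self.y = y
        self.history = [("x", x)]  # log of every value assigned to x

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value
        self.history.append(("x", value))


p = Point(2.0, 4.0)
q = Point(2.0, 4.0)
p_before, q_before = p, q

p.x = 3.0            # setter: same instance, state changed in place
q = Point(3.0, q.y)  # constructor: a new instance replaces the old one

same_values = (p.x, p.y) == (q.x, q.y)  # not materially different
p_same_instance = p is p_before         # P still references its old instance
q_same_instance = q is q_before         # Q references a new one

print(same_values, p_same_instance, q_same_instance)
print(len(p.history), len(q.history))
```

The observable X and Y values agree, while instance identity and the accumulated history do not.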

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 25, 2019, 10:58 am
Quote from AntC on November 25, 2019, 10:15 am
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

A final result might depend on what would be an invalid state of the database. Indeed RM Pre 23 Note specifically talks about invalid states of the database involving two relvar values. It's silent on invalid states of the database due to (say) violating a key constraint. Again, nobody has given a reason to show violating type constraints is qualitatively different to violating database constraints. (That is, a reason couched in terms specific to TTM Pre's/Pro's. Of course there are reasons, were we talking about updates to Objects within some OO language. But we know reference semantics is a can of worms, especially for update. That's why TTM has value semantics.)

E.g., is it correct for a "valid" result to be semantically dependent on a SQUARE temporarily being a non-square rectangle per the current RM Pre 21?

Is it correct for a "valid" result to be semantically dependent on a relvar containing duplicate keys? Or on a dangling foreign key reference? Again there's nothing in RM Pre 21 specific to PossRep components.

Or does that never matter (or conceptually never occur) and the overall result is always provably valid?

Yes the overall result is always provably valid because all constraints must hold at the semicolon. If all constraints hold but somebody wants to say there's something not "valid", all that means is there aren't the right constraints declared. I think you're being poisoned by SQL thinking, because it's just not possible to declare all desired constraints in SQL; consequently validation happens in application code or in triggers, and we all know what a pile of ordure that is.

No, I'm being "poisoned" -- such as it is -- by the notion that type constraints should, and perhaps by definition, be inviolate. An unconstrained intermediate state of a database leading to a constrained state intuitively seems necessary. A notional violation of type constraints intuitively seems abominable.

OK here's a nice knock-down argument (to quote Lewis Carroll, who seems to be looking over my shoulder at this discussion).

  • Suppose a scalar type with two relation-valued components (among others);
  • Suppose the type has a constraint declared on those components that enforces some referential integrity between them.
  • How is that different to the two-relvar example in RM Pre 23 Note?
  • And how do you propose to keep that constraint "inviolate" for an attempt to update each component as separate THE_x( ) := ... assignments within an MA?
    (Let's say the same MA is also assigning to an ELLIPSE component.)

 

Quote from AntC on November 26, 2019, 8:31 pm
Quote from Dave Voorhis on November 25, 2019, 10:58 am
Quote from AntC on November 25, 2019, 10:15 am
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

A final result might depend on what would be an invalid state of the database. Indeed RM Pre 23 Note specifically talks about invalid states of the database involving two relvar values. It's silent on invalid states of the database due to (say) violating a key constraint. Again, nobody has given a reason to show violating type constraints is qualitatively different to violating database constraints. (That is, a reason couched in terms specific to TTM Pre's/Pro's. Of course there are reasons, were we talking about updates to Objects within some OO language. But we know reference semantics is a can of worms, especially for update. That's why TTM has value semantics.)

E.g., is it correct for a "valid" result to be semantically dependent on a SQUARE temporarily being a non-square rectangle per the current RM Pre 21?

Is it correct for a "valid" result to be semantically dependent on a relvar containing duplicate keys? Or on a dangling foreign key reference? Again there's nothing in RM Pre 21 specific to PossRep components.

Or does that never matter (or conceptually never occur) and the overall result is always provably valid?

Yes the overall result is always provably valid because all constraints must hold at the semicolon. If all constraints hold but somebody wants to say there's something not "valid", all that means is there aren't the right constraints declared. I think you're being poisoned by SQL thinking, because it's just not possible to declare all desired constraints in SQL; consequently validation happens in application code or in triggers, and we all know what a pile of ordure that is.

No, I'm being "poisoned" -- such as it is -- by the notion that type constraints should, and perhaps by definition, be inviolate. An unconstrained intermediate state of a database leading to a constrained state intuitively seems necessary. A notional violation of type constraints intuitively seems abominable.

OK here's a nice knock-down argument (to quote Lewis Carroll, who seems to be looking over my shoulder at this discussion).

  • Suppose a scalar type with two relation-valued components (among others);
  • Suppose the type has a constraint declared on those components that enforces some referential integrity between them.
  • How is that different to the two-relvar example in RM Pre 23 Note?
  • And how do you propose to keep that constraint "inviolate" for an attempt to update each component as separate THE_x( ) := ... assignments within an MA?
    (Let's say the same MA is also assigning to an ELLIPSE component.)

The difference between a type and some relvars is that the latter is exposed in the schema, so update mechanisms are always fully visible. A type may be more than just conceptually opaque; it can be actually opaque -- say, supplied without source in binary form by some vendor, such that its participation in a multiple assignment may invisibly expose risky states that would otherwise be prohibited by constraint. For example, imagine some type has component values that are constrained to be coprime as the basis for some security mechanism. Can we guarantee that as part of a multiple assignment that non-coprime pairs won't "leak" out of temporary unconstrained non-coprime components of that type, even if indirectly and only (say) as the basis for some decision?

Pseudo-variables that make it look like components are updatable when they really aren't (it only looks like you're mutating the value, but it really looks like you're mutating the value) conceptually blur mutability and immutability. I find that to be a strange conflation of object-oriented statefulness and value semantics. I'd prefer no pseudo-variables at all. Mutable state clearly belongs strictly to variables, and values of types should be categorically and boastfully immutable without even the slightest whiff of something that looks like "mutable" values but isn't.

That would mean any mechanism that relies on an attempt to temporarily violate a type constraint must be (a) explicitly conceived and written by the user, and (b) regarded as being as broken as, say, trying to temporarily assign a string to an integer.

Then if a user still wants to temporarily select an invalid value -- and I appreciate that not having pseudovariables doesn't eliminate the issue raised in this thread, it just means there's no impetus to find a way to formalise (and automate) solving it -- then you must find another way to solve the underlying business problem, because un-ellipsing an ELLIPSE or un-squaring a SQUARE -- or similar -- is a fundamentally wrong way to do it, even if only temporarily.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Hugh on November 26, 2019, 3:35 pm

Hugh's problem is caused by treating THE() forms as variables, which they are not.

But expressions that can be assigned to must behave like variables.  Which properties of variables does THE_A(E) lack to justify your assertion?

No, definitely no. Despite the 'general intent' mentioned in the TD spec, there is no overriding requirement that a pseudo-variable behave in all respects identically to a variable. [Hint: the 'pseudo' gives it away.]

[Speaking as one who has implemented pseudo-variables in a production compiler, I can tell you there are a myriad of gotchas and odd little differences to deal with. Identical behaviour might be the Holy Grail, but it is never achieved.]

In any case, how close it gets is a TD problem, not one for TTM. TTM deals with variables, those defined in RM Pre 11, 12, 13. The interpretation of RM Pre 21 must read 'variable' in that light, and if some language has used a pseudo-variable to give effect to the update operator mentioned in RM Pre 3b, that must be treated as a shorthand and expanded in step a.

I note that the TD spec sets out its own version of RM Pre 21, on page 16, and your OP and the expansion you provide do not conform. The wording given there is quite similar to the rearrangement of terms I provided earlier.

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

After expanding shorthands per RM Pre 21a you should get:

E' := ELLIPSE(THE_A(E), THE_A(E) + 1.0), E'' := ELLIPSE(THE_A(E') + 2.0, THE_B(E'));

and then by rearranging terms:

E' := ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0);

The form provided in the TD spec is:

ST := PR ( X1 , X2 , ... , Xn )

In this form the problem you originally posted does not occur.
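One way to picture how several component assignments to the same target could end up as a single ST := PR ( X1 , X2 , ... , Xn ) is the Python sketch below. It evaluates at run time rather than rewriting syntax, so it is only an analogy for the rearrangement, not the TD spec's wording. The names `fold_component_updates` and `Ellipse`, the constraint a >= b, and the starting value are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    """Immutable value; the constructor stands in for the selector PR."""
    a: float
    b: float

    def __post_init__(self):
        if self.a < self.b:  # assumed type constraint a >= b
            raise ValueError("type constraint violated")


def fold_component_updates(value, components, selector, updates):
    """Fold successive THE_x(V) := expr updates into one selector call.

    value      -- the current value of the target variable
    components -- component names in selector argument order
    selector   -- callable that builds (and type-checks) a value
    updates    -- list of (component name, function of a components dict)
    """
    # Read the components out once. This plain dict is not a value of the
    # type, so no type constraint applies while the updates are applied.
    parts = {name: getattr(value, name) for name in components}
    for name, expr in updates:
        parts[name] = expr(parts)
    # The selector, and hence the type constraint, is invoked exactly once.
    return selector(*(parts[name] for name in components))


e = Ellipse(5.0, 4.0)  # assumed starting value of E

# THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;
result = fold_component_updates(
    e,
    ["a", "b"],
    Ellipse,
    [("b", lambda p: p["a"] + 1.0), ("a", lambda p: p["a"] + 2.0)],
)

print(result)
```

Under these assumptions the result equals ELLIPSE(THE_A(E) + 2.0, THE_A(E) + 1.0) evaluated on the starting value, and no intermediate Ellipse is ever constructed.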

 

Andl - A New Database Language - andl.org