The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Help sought to fix RM Pre 21 (multiple assignment)

Quote from dandl on November 25, 2019, 12:37 am
Quote from Erwin on November 24, 2019, 1:49 pm
Quote from AntC on November 24, 2019, 2:38 am

That's just like an Equi-Dependency between relvars; and it's been the classic justification for multiple assignment. So we can't expect constraints to hold 'at the comma'(?)

It is important to realise that constraints (database constraints in particular) DO hold throughout the entire assignment, i.e. "at every single comma" in particular.  Here's how (using extreme shorthand lingo for notational and expository convenience) :

INSET 1 INTO R, INSERT 2 INTO R;

R := (WITH R' (R UNION 1) : R' UNION 2)

Whatever key applies to R (and is obviously satisfied by it), that key cannot be inferred to also apply to (R UNION 1).  A system making correct inferences from declared constraints, will understand full well that constraints applying to R do not of necessity apply to (R UNION 1).

Imagine that the constraint is: R shall always contain an even number of tuples. That constraint is routinely violated by the first INSERT, and satisfied by the second. The note to RM Pre 23 gives another more abstract instance.

Type constraints cannot be violated because they define the set of values of a type; other values do not exist and cannot be created. There are no type errors, they just never happen.

That might be true in the languages/type systems you're familiar with. I don't see it as true in TTM's system with declared constraints between components of a PossRep; nor with the ability in TTM to assign to components individually; nor with Multiple PossReps. Please give some example languages that support all three of those 'features'.

Consider these ELLIPSE values (can you say "they just never happen"?):

ELLIPSE(1.0, 0.5)     // valid

ELLIPSE(3.0, 2.0)     //valid

ELLIPSE(2.0, 2.0)     // valid?

ELLIPSE(1.0, 2.0)    // not valid

1.0 is type-valid for component A, as shown by the first example. 2.0 is type-valid for component B, as shown by the second example. So why can't I combine those two type-valid component values? I see no grounds to say that "do[es] not exist and cannot be created". I just gave an expression that creates it. The argument against looks to me very like the case for inter-relvar constraints. (And within a single relvar, we have inter-tuple constraints, same argument; or within a tuple we have inter-attribute constraints, same argument.)

Note that in sum-of-product type systems, components are regarded as orthogonal; any value that's valid for component 1 must be valid for all possible values of the other components. If you want to implement restrictions like that for ELLIPSE, either you write code to detect at run-time attempts to set invalid combinations, or you move to dynamic typing (which amounts to the same behaviour). Or you represent the components combined as a single scalar value, where you execute code to extract the pseudo-components.

A good example would be dates represented as a single INT days-since-epoch. Day, Month, Year are pseudo-components. Day := 31 is not always valid, but that's OK because Day is not really a component. Assigning to a pseudo-component executes code to validate and build the INT value. I.e. it 'constructs' (potentially allocating storage) in exactly the way Hugh objects to about 'constructors' in OOP. I've assumed a conventional/familiar language there; in context of TTM we could have Multiple PossReps for single INT vs 3 components. I don't see that anything changes at implementation level.

Database constraints must be satisfied for an assignment to succeed.

All constraints must be satisfied for an assignment to succeed. The question is where we count the assignment as succeeding: at the comma or at the semicolon?

For a multiple assignment they succeed as a whole or none of them do (and presumably some other action is taken, such as an exception or error). There is no mechanism to test or violate a database constraint except 'at the semicolon'.

Nonsense: RM Pre 23 says the constraint " shall be satisfied if and only if that boolean expression evaluates to TRUE,". We can evaluate a boolean expression at the comma. (Or we could change all the commas to semicolons, for the purposes of constraint-checking.)

To me these are simply straightforward readings of RM Pre 23.

No: look at the wording. You're importing a whole load of pre-judgments and expectations from other languages/type systems.

At an implementation level, your example is bundled into a single set-oriented merge. The insertion of {1,2} into {3,4,5,6} succeeds, the insertion of {1,2} into (1,3,4,5} or {2,3,4,5} fails. There is no 'at the comma'.

Not adequate if there are other assignments appearing between Erwin's two INSERTs. See my example just posted. If Erwin is correctly interpreting the algorithm in RM Pre 21, then the algorithm is not adequate. I say "if" because I plain don't know; I don't get the intent behind RM Pre 21 for these tricky cases; is the intent valid but RM Pre 21's algorithm flawed/a poor realisation thereof? Is RM Pre 21's algorithm adequate but Erwin's interpretation not accurate?

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.
Quote from AntC on November 25, 2019, 12:40 am

What is not equally straightforward about deferring checking scalar constraints to the end of a multiple assignment?

There's three possible interpretations of "invalid scalar values":

  • Assigning (say) a CHAR to a component declared as RAT. Yes this is never allowed.
  • Assigning 'at the comma' a RAT to a component declared as RAT that thereby violates the PossRep compound's declared constraint. Note this is validly assigning a value within the RAT component's "set of values" per RM Pre 23 a.
  • It's when we expand the shorthand to turn that into an assignment to the bare var of some compound type that we violate a constraint. This is Hugh's ELLIPSE example.

I agree with most if not all of the elided text, but I must emphasise my response to the second bullet point here.

Gospel: There is no assignment to a component. A component is not a variable. It's an illusion, propagated by Tutorial D which makes it appear so.

RM Pre 5b carefully defines an update to variable V such that the values before and after differ in a single component, but makes it clear that this is a replacement of the whole value of the variable. So at some level if we decompose the update operator there must be a selector with arguments intended to return the new value which will be assigned to V. The selector will fail because the arguments do not 'select' a value that is a member of the set of values for that type. There is no sense in which the value is created and then fails a test, it can simply never be created.

Hugh wrote this:

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

In the light of the above, this is incorrect. The purported MA is not in the form required by RM Pre 21 a, because THE_B(E) is not the name of a variable, rather it is a shorthand for an assignment to E. Expanding shorthands results in something like this:

E := ELLIPSE (THE_A(E), THE_A(E) + 1.0), E := ELLIPSE (THE_A(E) + 2.0, THE_B(E));

And the type constraint is violated. But all this proves that multiple assignment was never needed or possible for this case. It is easy to see that this can be rewritten as a single assignment. A compiler would and should emit an error, not do the rewrite.

Andl - A New Database Language - andl.org
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

E.g., is it correct for a "valid" result to be semantically dependent on a SQUARE temporarily being a non-square rectangle per the current RM Pre 21?

Or does that never matter (or conceptually never occur) and the overall result is always provably valid?

Or is there at least one provably invalid case?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

A final result might depend on what would be an invalid state of the database. Indeed RM Pre 23 Note specifically talks about invalid states of the database involving two relvar values. It's silent on invalid states of the database due to (say) violating a key constraint. Again, nobody has given a reason to show violating type constraints is qualitatively different to violating database constraints. (That is, a reason couched in terms specific to TTM Pre's/Pro's. Of course there's reasons were we talking about updates to Objects within some OO language. But we know reference semantics is a can of worms, especially for update. That's why TTM has value semantics.)

E.g., is it correct for a "valid" result to be semantically dependent on a SQUARE temporarily being a non-square rectangle per the current RM Pre 21?

Is it correct for a " valid"  result to be semantically dependent on a relvar containing duplicate keys? Or on a dangling foreign key reference? Again there's nothing in RM Pre 21 specific to PossRep components.

Or does that never matter (or conceptually never occur) and the overall result is always provably valid?

Yes the overall result is always provably valid because all constraints must hold at the semicolon. If all constraints hold but somebody wants to say there's something not "valid", all that means is there aren't the right constraints declared. I think you're being poisoned by SQL thinking, because it's just not possible to declare all desired constraints in SQL; consequently validation happens in application code or in triggers, and we all know what a pile of ordure that is.

Or is there at least one provably invalid case?

If all constraints hold, the overall effect of the MA is provably valid. Or you have some other definition for "invalid" in mind. Please explain.

Quote from AntC on November 25, 2019, 10:15 am
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).
  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

A final result might depend on what would be an invalid state of the database. Indeed RM Pre 23 Note specifically talks about invalid states of the database involving two relvar values. It's silent on invalid states of the database due to (say) violating a key constraint. Again, nobody has given a reason to show violating type constraints is qualitatively different to violating database constraints. (That is, a reason couched in terms specific to TTM Pre's/Pro's. Of course there's reasons were we talking about updates to Objects within some OO language. But we know reference semantics is a can of worms, especially for update. That's why TTM has value semantics.)

E.g., is it correct for a "valid" result to be semantically dependent on a SQUARE temporarily being a non-square rectangle per the current RM Pre 21?

Is it correct for a " valid"  result to be semantically dependent on a relvar containing duplicate keys? Or on a dangling foreign key reference? Again there's nothing in RM Pre 21 specific to PossRep components.

Or does that never matter (or conceptually never occur) and the overall result is always provably valid?

Yes the overall result is always provably valid because all constraints must hold at the semicolon. If all constraints hold but somebody wants to say there's something not "valid", all that means is there aren't the right constraints declared. I think you're being poisoned by SQL thinking, because it's just not possible to declare all desired constraints in SQL; consequently validation happens in application code or in triggers, and we all know what a pile of ordure that is.

No, I'm being "poisoned" -- such as it is -- by the notion that type constraints should, and perhaps by definition, be inviolate. An unconstrained intermediate state of a database leading to a constrained state intuitively seems necessary. A notional violation of type constraints intuitively seems abominable.

Or is there at least one provably invalid case?

If all constraints hold, the overall effect of the MA is provably valid. Or you have some other definition for "invalid" in mind. Please explain.

"Invalid" means some intermediate step dependent on an invalid type -- e.g., a non-square SQUARE, for example -- even if only notionally.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

Can't we just say:

  • A Multiple Assignment must have the effect (at the semicolon) as if each individual assignment took place in sequence.
    That is, as if the commas were semicolons.
  • Except that no constraints (RM Pre 23) need be checked at the comma. And no updates are final at the comma.
  • All constraints must hold at the semicolon for the overall MA to succeed. If any constraints are violated, all the assignments within the MA are void, there is no update to any vars (neither scalar nor non-, neither database relvars nor local).

Hmm, this next point is not well put:

  • An implementation can choose to check some constraints at the comma (and potentially at compile time, especially for type constraints RM Pre 23 a.), for efficiency reasons or to give more precise feedback to users about the source of violations.

I don't mean at the comma check the constraints against the intermediate state of the var(s), because constraints are allowed not to hold at intermediate states, per the second bullet. I mean: at the comma optionally pre-check the constraints that will apply at the semicolon. I have in mind things like assigning a CHAR value to a var or component of declared type RAT: if there's no following re-assignment to that var/component, then the CHAR value will continue to the semicolon, where it'll be found to be a violation. If some later assignment within the MA uses the value, it'll be type-invalid for any operation and/or type-invalid for assigning to some other var/component. Hmm, hmm what if the value is not used, and there is a later assignment that overwrites the CHAR value? Then assigning the CHAR was pointless, but the rules won't detect the temporary violation. So what (I guess).

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

In the case of assigning a CHAR value to a RAT var/component, the 'best' outcome is that was pointless. So I think a user-friendly compiler is within its rights to reject the attempt. In case of inserting or updating tuples into a relvar that would cause a duplicate key, that's impossible to detect at compile time; and validating for it might as well be left to the semicolon. In case of assigning values to an axis that would violate the ELLIPSE constraint: again that's impossible in general to detect at compile time; it must be checked at the semicolon; what merit is there in the extra effort to check at the comma?; especially given there might be a later assignment in the MA that results in a type-valid value for the overall var -- which is the point of Hugh's example.

Revised 4th bullet:

  • An implementation can choose at the comma to pre-check some constraints that will need to hold at the semicolon. (The pre-checking can potentially happen at compile time, especially for type constraints RM Pre 23 a.). The pre-check must not simplistically evaluate the boolean expression (RM Pre 23) for intermediate values assigned as at the comma, but rather must anticipate the final values for vars that will result at the semicolon, and evaluate the boolean expression for the final values. Pre-checking is for for efficiency reasons or to give more precise feedback to users about the source of violations.

I envisage grouping the individual assignments according to their target top-level var, then validating the resulting value for that var overall.

Quote from Dave Voorhis on November 25, 2019, 10:58 am
Quote from AntC on November 25, 2019, 10:15 am
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

 

Or is there at least one provably invalid case?

If all constraints hold, the overall effect of the MA is provably valid. Or you have some other definition for "invalid" in mind. Please explain.

"Invalid" means some intermediate step dependent on an invalid type -- e.g., a non-square SQUARE, for example -- even if only notionally.

I think you still have in mind by "invalid" the emotive force of something like a RAT var/component getting assigned a CHAR value. WRT the SQUARE example, would you be so upset if we represented the SQUARE data structure as a tuple-type value with attributes length, width and a tuple constraint they must be the same? That's how I'm thinking of type constraints. (Also because the IM forms no part of my thinking.)

See my earlier example building on Hugh's of alternating an update to an ELLIPSE component (making it a non-elliptical ELLIPSE) with an update to a relvar holding an ELLIPSE attribute (making the relvar hold non-elliptical ELLIPSES); which nevertheless after more updates overall results in all ELLIPSEs being elliptical. It is vital for the code to work as written that there be an intermediate state with non-elliptical ELLIPSEs.

Now we could reorganise that code to avoid any intermediate states holding non-elliptical ELLIPSEs; but that would require deep knowledge of the data structures and constraints, well beyond a general solution for the smartest of compilers.

Quote from AntC on November 25, 2019, 11:27 am
Quote from Dave Voorhis on November 25, 2019, 10:58 am
Quote from AntC on November 25, 2019, 10:15 am
Quote from Dave Voorhis on November 25, 2019, 9:48 am
Quote from AntC on November 25, 2019, 1:49 am

That seems notionally reasonable, but my concern is that it may mean a final result dependent on what would have been invalid types per the current RM Pre 21. If that's the case, what does it mean?

 

Or is there at least one provably invalid case?

If all constraints hold, the overall effect of the MA is provably valid. Or you have some other definition for "invalid" in mind. Please explain.

"Invalid" means some intermediate step dependent on an invalid type -- e.g., a non-square SQUARE, for example -- even if only notionally.

I think you still have in mind by "invalid" the emotive force of something like a RAT var/component getting assigned a CHAR value. WRT the SQUARE example, would you be so upset if we represented the SQUARE data structure as a tuple-type value with attributes length, width and a tuple constraint they must be the same? That's how I'm thinking of type constraints. (Also because the IM forms no part of my thinking.)

See my earlier example building on Hugh's of alternating an update to an ELLIPSE component (making it a non-elliptical ELLIPSE) with an update to a relvar holding an ELLIPSE attribute (making the relvar hold non-elliptical ELLIPSES); which nevertheless after more updates overall results in all ELLIPSEs being elliptical. It is vital for the code to work as written that there be an intermediate state with non-elliptical ELLIPSEs.

Now we could reorganise that code to avoid any intermediate states holding non-elliptical ELLIPSEs; but that would require deep knowledge of the data structures and constraints, well beyond a general solution for the smartest of compilers.

I regard a RAT being assigned to a CHAR value, and un-squaring a SQUARE, as being equivalent invalidity.

A square represented as a tuple with length, width and some database constraint is different: It's not a type.

Intuitively -- and at this point (lacking any logical proof), certainly emotionally -- I suggest that types must be opaque to the environment in which they are used. Deferring type constraints in some context breaks the type abstraction, violates encapsulation, and blurs distinct constructs.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on November 25, 2019, 1:29 am

Not adequate if there are other assignments appearing between Erwin's two INSERTs. See my example just posted. If Erwin is correctly interpreting the algorithm in RM Pre 21, then the algorithm is not adequate. I say "if" because I plain don't know; I don't get the intent behind RM Pre 21 for these tricky cases; is the intent valid but RM Pre 21's algorithm flawed/a poor realisation thereof? Is RM Pre 21's algorithm adequate but Erwin's interpretation not accurate?

["example just posted" copypasted here for convenience]

As it stands, because RM Pre 21 only gives an algorithm, I don't understand the intent. For example consider if there's relvar MYELLIPSES with attribute X of type ELLIPSE:

THE_B(E) := THE_A(E) + 1.0, UPDATE MYELLIPSES WHERE THE_A(X) = THE_A(E) : {THE_B(X) := THE_B(E)},   // comma here, so MA continues

THE_A(E) := THE_A(E) + 2.0, UPDATE MYELLIPSES WHERE THE_B(X) = THE_B(E) : {THE_A(X) := THE_A(E)};

(We'll conveniently ignore that the second UPDATE is probably not doing what the programmer intended.)

At the semicolon all constraints hold. At the first UPDATE, it is vital for that assignment to "see the state of affairs" in which the type constraint on E is violated -- ie THE_B(E) = THE_A(E) + 1.0. Furthermore there's no way to 'expand' those assignments to THE_xs in such a way as to get a single assignment to E and support the first UPDATE to apply as written.

Adrian's old way of explaining was "updates to the same target are done in sequence, updates to different targets are done in parallel".  Don't know if that helps, but what it means for your example is it won't work as you want.  You get two chainings : UPDATE MYELLIPSES ... , UPDATE MYELLIPSES ... ;   and the second   THE_B(E) := ... , THE_A(E) := ... ;

In the first chaining, the MYELLIPSES value 'seen' by the first UPDATE is the pre-update one, the MYELLIPSES value 'seen' by the second UPDATE is the one left by the first.  But whatever is happening to E is never seen by this chain.  Any reference to E in this chain evaluates to the pre-update value, and this is contrary to your 'it is vital that ...'.

It is the fact that no intermediate state is ever shared between distinct chains, that allows the claim "no state shall ever be seen that is in violation of the constraints".  The only "intermediate state" ever possibly seen by any expression is the one left by a preceding sub-assignment as it gets passed to a subsequent sub-assignment to the same target.  Now, with the name shadowing trick as applied to relvars, it is still the case that no state ***of the concerned relvar*** can be seen that is in violation of constraints, because if the name 'R' shadows the relvar and replaces it with the ***expression*** 'R UNION 1', then obviously the result of that UNION might be in violation of, say, an empty key, but that cannot be regarded as a 'state of R' violating its constraint.  It is not problematic to create a container to hold the result of that UNION, because no ***type*** constraints apply [at least not of the kind that rules ELLIPSE], but the same cannot be said for scalar containers.

So the solution to OP problem must lie in somehow avoiding the creation of those "intermediate scalar containers".  Replacing them with unconstrained tuple types is an obvious potential solution to explore (but then watch out for multipossreps), rewriting to eliminate them altogether is another.