The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Help sought to fix RM Pre 21 (multiple assignment)

Quote from AntC on November 24, 2019, 2:15 am

Isn't there a similar technique for determining if database updates are serialisable/can be applied in parallel?

There might be at some extremely coarse level of granularity (say, entire relvars, which no one is ever going to accept in real production systems), but otherwise there is not because because it is hard to compute upfront which additional tuples are going to be accessed during the constraint checking.  At that level, we're talking tuple-level predicate locks.

Quote from AntC on November 24, 2019, 2:38 am

That's just like an Equi-Dependency between relvars; and it's been the classic justification for multiple assignment. So we can't expect constraints to hold 'at the comma'(?)

It is important to realise that constraints (database constraints in particular) DO hold throughout the entire assignment, i.e. "at every single comma" in particular.  Here's how (using extreme shorthand lingo for notational and expository convenience) :

INSET 1 INTO R, INSERT 2 INTO R;

R := (WITH R' (R UNION 1) : R' UNION 2)

Whatever key applies to R (and is obviously satisfied by it), that key cannot be inferred to also apply to (R UNION 1).  A system making correct inferences from declared constraints, will understand full well that constraints applying to R do not of necessity apply to (R UNION 1).

Quote from AntC on November 24, 2019, 10:02 am
Quote from dandl on November 24, 2019, 5:07 am
Quote from AntC on November 24, 2019, 2:38 am

Should that UPDATE SP be allowed when it'll break referential integrity? Should that DELETE S be allowed, because presumably SP already has a tuple referencing Supplier S1? Is the MA acceptable overall, because the deleted S1 gets re-inserted 'before the semicolon'. If this MA is not valid, then I think Hugh's original MA is not valid either.

Type constraints have to hold 'at the comma' to guard against invalid values and undefined behaviour.

Where in the TTM doco (or TTM's IM) does it say that? There's no motivation/explanation for the algorithm in RM Pre 21. I repeat: RM Pre 21 should be couched as a Prescription for behaviour, not as an algorithm that's a solution to whatever Prescription is in the authors' heads.

I agree. TTM generally has too much algorithmic surface area, where intent and results are what count.

A value for the dbvar that breaks database constraints is just as much an invalid value, and just as likely to yield inconsistent behaviour if that invalid intermediate state is visible to other assignments within some MA. Not sure what you mean there by "undefined behaviour": what is the defined behaviour for assignments/updates to relvars that have duplicate values in attributes that are declared to be keys? RM Pre 23 Note appears to be talking only about inter-relvar constraints. Then by omission from RM Pre 23 Note do we guess that 'at the comma' applies to type constraints and to single-relvar constraints?

RM Pre 23 specifies type constraints and database constraints. The former ensures that no value can ever appear except for those that constitute a given type.It is simply not possible for any operator to return a value that is invalid for the type, as there is no way to create such a value.

The DBC apply only to the visible state of the database. The note makes it clear that the DBC must hold at the semicolon; for this purpose assignments separated by comma are treated as if simultaneous and there is no intermediate point at which the DBC is violated.

Perhaps it's a 32-bit number and the constraint on an addition is simply: expressible in 32 bits.

Database constraints have to hold 'at the semicolon'. Nobody cares about intermediate relational values, the constraint is what goes in the database, and notionally they all go in at once.

RM Pre 23 Note says (in effect) that inter-relvar database constraints need not hold 'at the comma'? I don't see it saying "nobody cares" about intermediate values of the dbvar; nor that intra-relvar constraints can be violated at the comma.

There are no intermediate values of the dbvar. To the extent that they seem to occur in a MA, they are an illusion of the code, never appearing in reality.

There's nothing in the wording for RM pre 21 specific to scalar type vars vs nonscalars. "all go in at once" applies to MAs of scalar vars just as much as of nonscalars. Indeed your MA might mix updates to scalar vars with those to nonscalars.

Agreed. RM Pre 23 is what sets the kinds of constraint apart.

 

Andl - A New Database Language - andl.org
Quote from dandl on November 23, 2019, 1:29 pm

I do not find the definition flawed so much as requiring a little further explanation. Others might see it differently.

Re the other, perhaps I should be clearer. The wording of RM Pre 21 says:

where Vi is the name of some variable not defined in terms of any others and Xi is
an expression of declared type the same as that of Vi.

Each Vi is a variable obtained by the preceding process of expanding shortcuts, so is of some declared type. And likewise each Xi is an expression and it too has a declared type. For example, Vi might be of explicitly declared type INTEGER and Xi of declared type ODD (it is the value of a selector by that name). They are assignment compatible, and this is consistent with IM Pre 9.

I do not see how RM Pre 21 can require or assume a declared type for an expression (INTEGER in this case, where it is plainly ODD). Perhaps an example would help me to understand.

I agree that explicitly requiring Vi and Xi to be of the same declared type seems inconsistent with the IM, but TTM assumes no subtyping apart from what is stated in OO Pre 2.  In fact this has always bothered me a little.  I suppose we could have added a note to OO Pre 2 stating that if inheritance is supported, then some of the rules concerning type of expressions and values stated in this manifesto are to some extent relaxed in accordance with that model.  I'll see what Chris has to say.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from AntC on November 24, 2019, 11:11 am

This syntactic-based rule won't work for Multiple PossReps where one of the assignments targets via one PossRep, another via a different PossRep. Because notionally it would need constructing a value via the first-used PossRep then select from it using the second-used; and that notional constructed value might violate the type constraints. Another good reason to do away with Multiple PossReps, I'd say.

For the multi-possreps feature to function reasonably, there need to be ***declarations*** of how to convert from one to the other, for each component of each possrep.

THE_RHO(V) === SQRT(THE_X(V) * THE_X(V) + THE_Y(V) * THE_Y(V))

THE_THETA(V) === ATAN2(THE_Y(V), THE_X(V))

THE_X(V) === THE_RHO(V) * SIN(THE_THETA(V))

THE_Y(V) === THE_RHO(V) * COS(THE_THETA(V))

Now,

THE_X(P) := 10, THE_RHO(P) := 10;

P := WITH (P' : CART(10, THE_Y(P)) : POLAR(10, THE_THETA(P')) )

P := WITH (P' : CART(10, THE_Y(P)) : POLAR(10, ATAN2(THE_Y(P'), THE_X(P'))) )

P := WITH (P' : CART(10, THE_Y(P)) : POLAR(10, ATAN2(THE_Y(P), 10)) )

P := POLAR(10, ATAN2(THE_Y(P), 10)) )

Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

Having read the responses as far as Antc's, I now realise that Chris's description (to me) of the "broad idea" is not quite what we really had in mind.

Take the case where the MA contains two ore more assigns to possrep components of the same variable.  Then these need to be collected together in advance of RM Pre 21a and the values of their RHSes collected into a ordered n-tuple whose components then become arguments to the selector invocation that is assigned to the parent variable.

But perhaps we should just go with Dave Voorhis and Erwin Smout and leave RM Pre 21 as-is.  I agree that it's not really flawed, but it does mean that an implementation that gets around the problem then become non-conforming (strictly speaking).

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on November 24, 2019, 3:23 pm

Having read the responses as far as Antc's, I now realise that Chris's description (to me) of the "broad idea" is not quite what we really had in mind.

Take the case where the MA contains two ore more assigns to possrep components of the same variable.  Then these need to be collected together in advance of RM Pre 21a and the values of their RHSes collected into a ordered n-tuple whose components then become arguments to the selector invocation that is assigned to the parent variable.

But perhaps we should just go with Dave Voorhis and Erwin Smout and leave RM Pre 21 as-is.  I agree that it's not really flawed, but it does mean that an implementation that gets around the problem then become non-conforming (strictly speaking).

Hugh

Why "non-conforming" ?  E.g. what would make the process/procedure suggested by Antc (and me and no doubt Dandl even though he wasn't anywhere near as explicit as he should have been) be "non-conforming" ?  Because it won't throw the exception that your (plural) perception of the spec says should be thrown ?  That's odd because if you want the exception to be part of the spec then of course you can't come complaining it poses a problem ...

Quote from Hugh on November 24, 2019, 3:02 pm
Quote from dandl on November 23, 2019, 1:29 pm

I do not find the definition flawed so much as requiring a little further explanation. Others might see it differently.

Re the other, perhaps I should be clearer. The wording of RM Pre 21 says:

where Vi is the name of some variable not defined in terms of any others and Xi is
an expression of declared type the same as that of Vi.

Each Vi is a variable obtained by the preceding process of expanding shortcuts, so is of some declared type. And likewise each Xi is an expression and it too has a declared type. For example, Vi might be of explicitly declared type INTEGER and Xi of declared type ODD (it is the value of a selector by that name). They are assignment compatible, and this is consistent with IM Pre 9.

I do not see how RM Pre 21 can require or assume a declared type for an expression (INTEGER in this case, where it is plainly ODD). Perhaps an example would help me to understand.

I agree that explicitly requiring Vi and Xi to be of the same declared type seems inconsistent with the IM, but TTM assumes no subtyping apart from what is stated in OO Pre 2.  In fact this has always bothered me a little.  I suppose we could have added a note to OO Pre 2 stating that if inheritance is supported, then some of the rules concerning type of expressions and values stated in this manifesto are to some extent relaxed in accordance with that model.  I'll see what Chris has to say.

That was my intention, to point out that this sentence as written might not be fully consistent with the IM. At present TTM has only one concession to the IM (the reference to omega), but the term 'declared type' is used over 20 times. I haven't checked them all, but I can see others that have the same issue.

Andl - A New Database Language - andl.org
Quote from Erwin on November 24, 2019, 1:49 pm
Quote from AntC on November 24, 2019, 2:38 am

That's just like an Equi-Dependency between relvars; and it's been the classic justification for multiple assignment. So we can't expect constraints to hold 'at the comma'(?)

It is important to realise that constraints (database constraints in particular) DO hold throughout the entire assignment, i.e. "at every single comma" in particular.  Here's how (using extreme shorthand lingo for notational and expository convenience) :

INSET 1 INTO R, INSERT 2 INTO R;

R := (WITH R' (R UNION 1) : R' UNION 2)

Whatever key applies to R (and is obviously satisfied by it), that key cannot be inferred to also apply to (R UNION 1).  A system making correct inferences from declared constraints, will understand full well that constraints applying to R do not of necessity apply to (R UNION 1).

Imagine that the constraint is: R shall always contain an even number of tuples. That constraint is routinely violated by the first INSERT, and satisfied by the second. The note to RM Pre 23 gives another more abstract instance.

Type constraints cannot be violated because they define the set of values of a type; other values do not exist and cannot be created. There are no type errors, they just never happen.

Database constraints must be satisfied for an assignment to succeed. For a multiple assignment they succeed as a whole or none of them do (and presumably some other action is taken, such as an exception or error). There is no mechanism to test or violate a database constraint except 'at the semicolon'.

To me these are simply straightforward readings of RM Pre 23.

At an implementation level, your example is bundled into a single set-oriented merge. The insertion of {1,2} into {3,4,5,6} succeeds, the insertion of {1,2} into (1,3,4,5} or {2,3,4,5} fails. There is no 'at the comma'.

Andl - A New Database Language - andl.org
Quote from Hugh on November 24, 2019, 3:23 pm
Quote from Hugh on November 22, 2019, 12:40 pm

Consider the following attempt to update ELLIPSE variable E, via multiple assignment to its possrep components:

THE_B(E) := THE_A(E) + 1.0, THE_A(E) := THE_A(E) + 2.0;

Under the definition given in RM Pre 21 this is stated to be equivalent  to this:

E := WITH (E := ELLIPSE (THE_A(E), THE_A(E) + 1.0) : ELLIPSE (THE_A(E) + 2.0, THE_B(E));

The WITH clause defines a new E, overriding the variable of that name for the expression following the colon.  But ELLIPSE (THE_A(E), THE_A(E) + 1.0), a selector invocation, violates the type constraint for ELLIPSE, which requires the minor semiaxis, B to be no longer than the major one, A.

A long time ago Chris and I tried to fix this problem, but here is what he wrote in an email to me yesterday:

[Our] broad idea for [a] solution to this problem was to map the intended result of that invalid selector invocation to a tuple, then "update that tuple" (sorry, wash out my mouth with soap) in accordance with the other selector invocation, then map the resulting tuple back to an ellipse possrep.  But I don't know how to define such a procedure generically and formally.  If you don't know either, is there anyone you know and trust who could write an appropriate specification?  I do think this is a lacuna that needs to be plugged.

There are certainly people here I can trust.  Perhaps there's a better idea than our "broad" one.  Can anyone step forward?

Hugh

Having read the responses as far as Antc's, I now realise that Chris's description (to me) of the "broad idea" is not quite what we really had in mind.

Take the case where the MA contains two ore more assigns to possrep components of the same variable.  Then these need to be collected together in advance of RM Pre 21a and the values of their RHSes collected into a ordered n-tuple whose components then become arguments to the selector invocation that is assigned to the parent variable.

Sorry Hugh, I don't think that will work either in general. Example below.

But perhaps we should just go with Dave Voorhis and Erwin Smout and leave RM Pre 21 as-is.  I agree that it's not really flawed, but it does mean that an implementation that gets around the problem then become non-conforming (strictly speaking).

I've now read all the responses up to here. There appears to be some interpretation in the heads of Dave and Erwin of RM Pre 21 and 23 taken together that is not supported by the actual wording. Perhaps the interpretation is in DTATRM (but I shouldn't need to look there), or perhaps it's appeared in ancient discussions on the forum/while those implementors were implementing.

Can someone succinctly state "the problem" that Hugh's last sentence is talking about -- please in terms of RM Pre's, not in some implementor's/language designer's/type theorist's general terms. Is it per Hugh's OP "a selector invocation, violates the type constraint for ELLIPSE"? Well no selector violation appears in the original MA; the invocation appears only in the expansion per RM Pre 21, and  I see no need to count that expansion as an invocation 'appearing' anywhere visible to the user.

So no, I don't think RM Pre 21 should be left as is (and probably RM Pre 23 needs to be more explicit as well). Simply put: I do not understand the intent behind RM Pre 21, therefore were I to be implementing a D, I'd be unclear whether/why RM Pre 21 might be flawed, nor what damage any work-round might do.

RM Pre 23 says "No user shall ever see a state of affairs in which C [a constraint] is violated." This appears before RM Pre 23 makes the distinction between type constraints vs database constraints, and before the Note about inter-relvar constraints; so presumably applies to all constraints at all times(?)

Does "see a state of affairs" include expressions on RHS of assignments in a MA, where the expression is seeing 'at the comma'? I can find nothing in the wording of TTM that says KEY constraints on a single relvar should be treated differently to type constraints for these purposes.

Quote from Dave Voorhis on November 24, 2019, 10:23 am

...

Intuitively, deferring checking database constraints to the end of a multiple assignment seems straightforward; complicating it by temporarily allowing invalid scalar values seems (at least) undesirable. But it may be fine to do so; I await seeing the proof.

What is not equally straightforward about deferring checking scalar constraints to the end of a multiple assignment?

There's three possible interpretations of "invalid scalar values":

  • Assigning (say) a CHAR to a component declared as RAT. Yes this is never allowed.
  • Assigning 'at the comma' a RAT to a component declared as RAT that thereby violates the PossRep compound's declared constraint. Note this is validly assigning a value within the RAT component's "set of values" per RM Pre 23 a.
  • It's when we expand the shorthand to turn that into an assignment to the bare var of some compound type that we violate a constraint. This is Hugh's ELLIPSE example.

I don't feel a need to provide a proof here. I do feel a need to clarify the Pre's. If the idea behind RM Pre 21 is that we expand the shorthand first (which will elide some of the individual assignments), then check the constraints 'at the comma' resulting after the elision; RM Pre 21, 23 between them need to say so.

As it stands, because RM Pre 21 only gives an algorithm, I don't understand the intent. For example consider if there's relvar MYELLIPSES with attribute X of type ELLIPSE:

THE_B(E) := THE_A(E) + 1.0, UPDATE MYELLIPSES WHERE THE_A(X) = THE_A(E) : {THE_B(X) := THE_B(E)},   // comma here, so MA continues

THE_A(E) := THE_A(E) + 2.0, UPDATE MYELLIPSES WHERE THE_B(X) = THE_B(E) : {THE_A(X) := THE_A(E)};

(We'll conveniently ignore that the second UPDATE is probably not doing what the programmer intended.)

At the semicolon all constraints hold. At the first UPDATE, it is vital for that assignment to "see the state of affairs" in which the type constraint on E is violated -- ie THE_B(E) = THE_A(E) + 1.0. Furthermore there's no way to 'expand' those assignments to THE_xs in such a way as to get a single assignment to E and support the first UPDATE to apply as written.

So under my best interpretation of RM Pre 21, 23, my MA just above must be allowed; therefore Hugh's MA in the OP must be allowed. If you disagree, please explain your interpretation of RM Pre 21, 23 that would disallow those MAs (both or just one of?).