The Forum for Discussion about The Third Manifesto and Related Matters


Key questions


If there are multiple constraints, each taking a different key as 'Persistent', my guess is that it would be impossible to change any of the keys.

  • Suppose the S&P data model, but with relvar S having both {SNO} and {SNAME} as keys -- that is, Supplier names must be unique as well as Supplier numbers.
  • And suppose a transition constraint that STATUS, if changed, can only be increased. Or rather two constraints: one saying that if as-is-SNO is the same as to-be-SNO then as-is-STATUS must be less than or equal to to-be-STATUS; and another with the same logic in terms of SNAME as PKey. (Sketched just below.)
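A sketch of that pair, assuming a hypothetical relvar name S_BEFORE for the pre-assignment value of S (Tutorial D itself has no transition-constraint syntax, so this only illustrates the intended logic):

    CONSTRAINT STATUS_UP_BY_SNO IS_EMPTY (
      ( ( S_BEFORE { SNO, STATUS } RENAME { STATUS AS OLD_STATUS } )
        JOIN S { SNO, STATUS } )
      WHERE STATUS < OLD_STATUS ) ;

    CONSTRAINT STATUS_UP_BY_SNAME IS_EMPTY (
      ( ( S_BEFORE { SNAME, STATUS } RENAME { STATUS AS OLD_STATUS } )
        JOIN S { SNAME, STATUS } )
      WHERE STATUS < OLD_STATUS ) ;

Each constraint only bites for a key value present both before and after the assignment; a tuple whose SNO and SNAME both change escapes both checks, which is the loophole discussed next.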

Are you suggesting that transition constraints should 'look through' the current transaction, and act to prevent certain changes 'by whatever means necessary'?

  • To prevent cheating the transition constraint by 'modifying a tuple' in one transaction to change 'its' SNO, then re-modifying in a second transaction the tuple with the modified SNO back to as-was-SNO with decreased STATUS, you'd need a transition constraint that no SNO can disappear. Same anti-cheating constraint for SNAME.

If a TC only works one level deep, then surely a requirement that is more than one level deep will have to be expressed as a separate relation tracking changes.  The MINSTATUSVALUE relation records the previous highest value, and requires that no value for that key be set lower. That's an ordinary constraint, sketched below.
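A sketch of that, with names of my own choosing and types simplified; MINSTATUSVALUE is an ordinary relvar and STATUS_FLOOR an ordinary single-state constraint:

    VAR MINSTATUSVALUE BASE
      RELATION { SNO CHAR, MINSTATUS INTEGER }
      KEY { SNO } ;

    CONSTRAINT STATUS_FLOOR IS_EMPTY (
      ( S JOIN MINSTATUSVALUE ) WHERE STATUS < MINSTATUS ) ;

Keeping MINSTATUS raised in step with STATUS is then the application's job (or a multiple-assignment discipline), but the constraint itself can be checked against any single database state, with no look-back needed.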

Also, in this context, the only way to avoid a key 'disappearing' is to accompany any change of key by the insertion of a new record with the old key in its place. Which is exactly equivalent to forbidding key changes.

  • Now I think you're banjaxed: you can't persist a SNO and change 'its' SNAME; neither can you persist a SNAME and change 'its' SNO. Either of those attempts would make a SNO/SNAME disappear.
  • So after inserting a tuple, its initial SNO, SNAME remain in lock-step.

Perhaps that was the intended purpose? But I think we're a bit light on details to be sure.

(?)

  • there's a relvar somewhere in the design that is supposed to document stuff like "the country that was called 'Zaire' from 1970 to 1998 is the same as the one that was called 'Congo' from 1905 to 1970", and the constraint is formulated in such a way as to take note of that relvar (which assumes the user is properly updating that relvar too).  This is going to be ***very hard*** to do because of the transitive closures it is going to require, which involve a temporal aspect by definition.  I don't even know whether it is technically possible using the TTM algebra (in particular the TC/temporal combo, though I suspect it must be) and, if so, what its feasibility would be (which I suspect to be on the low side).
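For concreteness only (a hypothetical design, not anything SIRA_PRISE prescribes), such a relvar might record pairwise identity claims, and it is exactly the pairwise form that forces the closure:

    VAR SAME_COUNTRY BASE
      RELATION { CNAME_FROM CHAR, CNAME_TO CHAR }
      KEY { CNAME_FROM, CNAME_TO } ;

with a tuple like TUPLE { CNAME_FROM 'Congo', CNAME_TO 'Zaire' }. A constraint wanting "the" identity behind a name would have to consult TCLOSE ( SAME_COUNTRY ) -- Tutorial D's TCLOSE applies to just such binary relations -- and then reason about validity periods on top of that, which is where the temporal aspect compounds the difficulty.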

***Having*** declarative transition constraints in the first place was enough for me.

This is exactly the situation in hospitals I described earlier, except with way more of them. This UR number 123456 has records from J Smith (Snr), J Smith (Jnr, son) and J Smith (unrelated, used in error); other records for these 3 exist under URs 222222, 333333 and 444444. Now sort out that lot!

 

Andl - A New Database Language - andl.org
Quote from AntC on October 12, 2020, 6:42 am
Quote from Erwin on October 12, 2020, 5:37 am
Quote from AntC on October 2, 2020, 3:53 am

I wonder how SIRA_PRISE's transition constraints are declared? Is there reliance on something that amounts to PKey? What if a relvar has multiple keys?

If a relvar has >1 key, then either :

  • one key is picked as "primary" for the purpose of that particular transition constraint (and the semantics won't be enforced in case there's a change to the value of some other key)
  • the "same" constraint is declared multiple times, reformulated in terms of each of the applicable keys (this is the very first time I thought of this possibility myself, so I never thought this through and have no idea about feasibility or possible downsides, apart from the blatantly obvious "redundancy" downside)

If there are multiple constraints, each taking a different key as 'Persistent', my guess is that it would be impossible to change any of the keys.

  • Suppose the S&P data model, but with relvar S having both {SNO} and {SNAME} as keys -- that is, Supplier names must be unique as well as Supplier numbers.
  • And suppose a transition constraint that STATUS, if changed, can only be increased. Or rather two constraints: one saying that if as-is-SNO is the same as to-be-SNO then as-is-STATUS must be less than or equal to to-be-STATUS; and another with the same logic in terms of SNAME as PKey.
  • To prevent cheating the transition constraint by 'modifying a tuple' in one transaction to change 'its' SNO, then re-modifying in a second transaction the tuple with the modified SNO back to as-was-SNO with decreased STATUS, you'd need a transition constraint that no SNO can disappear. Same anti-cheating constraint for SNAME.
  • Now I think you're banjaxed: you can't persist a SNO and change 'its' SNAME; neither can you persist a SNAME and change 'its' SNO. Either of those attempts would make a SNO/SNAME disappear.
  • So after inserting a tuple, its initial SNO, SNAME remain in lock-step.

(?)

  • there's a relvar somewhere in the design that is supposed to document stuff like "the country that was called 'Zaire' from 1970 to 1998 is the same as the one that was called 'Congo' from 1905 to 1970", and the constraint is formulated in such a way as to take note of that relvar (which assumes the user is properly updating that relvar too).  This is going to be ***very hard*** to do because of the transitive closures it is going to require, which involve a temporal aspect by definition.  I don't even know whether it is technically possible using the TTM algebra (in particular the TC/temporal combo, though I suspect it must be) and, if so, what its feasibility would be (which I suspect to be on the low side).

***Having*** declarative transition constraints in the first place was enough for me.

 

Oh yes, top marks for achieving that.

With the 'Congo'/'Zaire' case, presumably its code/abbreviation changed at the same time as its name? Wikipedia is only giving me 'Republic of the Congo'/.cg/COG. Both name and code changing at the same time would be indistinguishable from one country 'disappearing' and some other country appearing fresh.

Or does SIRA_PRISE look at the form of UPDATE ... : {CDomain = '.cg', CName = 'Congo'}? Which smells suspiciously like it's sidestepping RM Pre 21: we can translate that UPDATE into assignments; but having done that we can't enforce the transition constraints against deleting a country code.

Your remarks are the reason why I believe there is never a good business case for transition constraints of the ilk "status can never decrease", "salary can never decrease", "can never go from unmarried to divorced" etc etc.  In particular "prevent cheating the constraint" is a hopeless endeavour imo.

The true case for "transition constraints" (I prefer to call them 'assignment constraints') is for declaring rules that include the identity of the user doing the transaction (security rules, that is) and/or rules that include context information such as the system date (persons cannot be -declared to be- born later than today, for example).  imo.

(Especially in the realm of security rules it's a fine achievement methinks : I don't force an information model (users, roles, departments etc) of my own making onto the user for him to fit in, he can just define his own and use that, and then the user is free to define really almost anything as the predicate to be satisfied by some assignment.)
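A sketch of the second kind of rule, assuming an invented PERSON relvar and an invented niladic CURRENT_DATE() operator (standard Tutorial D has neither):

    CONSTRAINT NO_FUTURE_BIRTHS IS_EMPTY (
      PERSON WHERE BIRTHDATE > CURRENT_DATE() ) ;

Because CURRENT_DATE() gives a different answer on different days, this cannot be a timeless state constraint; it only makes sense checked at assignment time, which is why I file it under assignment constraints.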

SIRA_PRISE models an update as a tuple much like the database tuple.  But instead of an RVA for each relvar in the db, there's a TVA for each relvar and that TVA has two RVA's in turn : the inserts and the deletes.  It's this tuple that drives the constraint checking machinery so there's no dependency on syntactic form.  On the "downside", if you want to recognise updates and treat them "specially" you're limited to picking a key that must be treated by the apps as immutable (so the inserts joined to the -key projection of the- deletes produce nonempty results).  But for example, there's just as much a problem with "prevent cheating the update" as there is with "prevent cheating the transition".  I don't want to enter that discussion again.  Last time it lasted three years.  And I still believe there's no "prevent cheating".

(To illustrate what the constraints look like : "country codes can't disappear" would be something like SEMIMINUS(DELETES(COUNTRY), PROJECT(INSERTS(COUNTRY), (CCode))).  That could be something like IS_EMPTY( (DELETES FROM COUNTRY FROM STATEMENTTUPLE) NOT MATCHING (INSERTS FROM COUNTRY FROM STATEMENTTUPLE){CCode} ) in Tutorial D.  You can do that but then there's no deleting anymore.  Do that, and by definition you'll get to the point where a user says "but now I must".)
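(To connect that back to the STATUS example, and with the same caveat that this is a sketch rather than actual SIRA_PRISE syntax : "STATUS can only increase, taking SNO as the immutable key" would pair the deletes with the inserts on that key :

    IS_EMPTY (
      ( ( ( DELETES FROM S FROM STATEMENTTUPLE ) { SNO, STATUS }
          RENAME { STATUS AS OLD_STATUS } )
        JOIN ( INSERTS FROM S FROM STATEMENTTUPLE ) { SNO, STATUS } )
      WHERE STATUS < OLD_STATUS )

which is just "the inserts joined to the key projection of the deletes" from the previous paragraph.)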

Quote from Erwin on October 12, 2020, 10:51 am

Your remarks are the reason why I believe there is never a good business case for transition constraints of the ilk "status can never decrease", "salary can never decrease", "can never go from unmarried to divorced" etc etc.  In particular "prevent cheating the constraint" is a hopeless endeavour imo.

I have found that while such transition constraints may be valid in reality (depending on how realistic they really are), sooner or later someone makes a mistake when inserting data into the DB, which then has to be corrected.  The correction cannot be made because it now contradicts the constraint.  So the constraint has to be 'temporarily' lifted to make the correction, which defeats its purpose.

The way accountants get round such problems of 'mistaken' transactions is to allow the chief accountant (or some such) to put in a correcting transaction.  Both this and the original mistaken transaction get logged, so the auditors can check on it.

Quote from Erwin on October 12, 2020, 10:51 am

The true case for "transition constraints" (I prefer to call them 'assignment constraints') is for declaring rules that include the identity of the user doing the transaction (security rules, that is) and/or rules that include context information such as the system date (persons cannot be -declared to be- born later than today, for example).  imo.

The approach you propose seems to be just what is needed for a "Chief DB Accountant"/DBA to do accounting-style fixes.

Of course you can't expect the relational FORMALISM to deduce how to create the DESIGN of the transition/assignment constraints to make this possible.  The DB designer has to design the constraint appropriately.

Quote from David Livingstone on October 19, 2020, 8:16 pm

I have found that while such transition constraints may be valid in reality (depending on how realistic they really are), sooner or later someone makes a mistake when inserting data into the DB, which then has to be corrected.  The correction cannot be made because it now contradicts the constraint.  So the constraint has to be 'temporarily' lifted to make the correction, which defeats its purpose.

The way accountants get round such problems of 'mistaken' transactions is to allow the chief accountant (or some such) to put in a correcting transaction.  Both this and the original mistaken transaction get logged, so the auditors can check on it.

The approach you propose seems to be just what is needed for a "Chief DB Accountant"/DBA to do accounting-style fixes.

Of course you can't expect the relational FORMALISM to deduce how to create the DESIGN of the transition/assignment constraints to make this possible.  The DB designer has to design the constraint appropriately.

Yes.  Thanks for agreeing with my whole point.  The whole point being :

  • While the ***formulation*** of the transition constraint may be a perfectly valid [even formal] reflection/modeling/... of reality itself, applying it to the database is a ***mistake***, because that database is not reality itself, but only a model/reflection/... of it, and that model/reflection/... might itself be in error.
  • And the kind of "transition" constraint that allows one to deal with this [in a "managed" sort of way] must by necessity include some kind of filtering on ***who*** is allowed to make this kind of [potentially pervasive] correction.  Which puts it outside the realm of "pure" transition constraints, where by "pure" I mean some rule that depends on nothing more than just before-state and after-state.

And it's good to see you're still alive [and kicking] [and, for students' sake, hopefully still in the teaching business].

PS

A remark wrt "The DB designer has to design the constraint appropriately." : and this requires the ***end user*** [who is supposed to act as the source of information for the DB designer at hand] to ***think for himself [or his organisation]*** about ***how he wants this new system to be secured***, with the stress on ***wants***.  It's how I feel about the current state of IT: DB design discipline/knowledge is in a starving kind of way, but security has already starved out altogether.

Sorry for not joining in on this topic apart from an initial response to AntC, and also for my failure to respond to his response to that.  (Btw, apart from anything else, I'm pretty sure there isn't going to be another revision of TTM now.)

It appears to me that subsequent discussion has mainly focussed on AntC's motivation for asking for primary key support: his desire to single out keys whose values are deemed to be "immutable".  I have no further comment on that particular issue, apart from agreeing with some objections I have seen (and therefore feeling a bit embarrassed about RM VSS3!).

I just want to say that, thinking about my own databases in Rel, I have several base relvars with more than one declared key and I have never once thought of a possible advantage in nominating one to be in some sense primary.  Moreover, if there is some sense in which an advantage could be obtained, that would presumably involve some kind of constraint over and above that implied by KEY; and if there is more than one "some sense", then the "primary" nomination would have to be qualified.  For AntC's requirements, why not just KEY{...} IMMUTABLE? (with MUTABLE as the default).
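As purely hypothetical syntax (nothing in Tutorial D or Rel currently spells this), that might look like:

    VAR S BASE
      RELATION { SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR }
      KEY { SNO } IMMUTABLE
      KEY { SNAME } ;

leaving it to the DBMS (or the language definition) to say what, if anything, IMMUTABLE enforces.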

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on October 24, 2020, 2:13 pm

It appears to me that subsequent discussion has mainly focussed on AntC's motivation for asking for primary key support: his desire to single out keys whose values are deemed to be "immutable".  I have no further comment on that particular issue, apart from agreeing with some objections I have seen (and therefore feeling a bit embarrassed about RM VSS3!).

I just want to say that, thinking about my own databases in Rel, I have several base relvars with more than one declared key and I have never once thought of a possible advantage in nominating one to be in some sense primary.

Thanks Hugh, could I ask: are those relvars in 6NF? I'm not suggesting they should be, but if you were to take the schema as far as 6NF, would there still be multiple candidate keys in some relvar(s)?

The 'possible advantage' I see is avoiding update anomalies. For example, if all INDs (Foreign Key Constraints) targeted (REFERENCEd) the same key in a relvar, there'd be less danger of dangling links. In particular with CASCADEd operations, it might be quite perplexing that some forms of update CASCADE following one Foreign Key, while others follow another.

  Moreover, if there is some sense in which an advantage could be obtained, that would presumably involve some kind of constraint over and above that implied by KEY; and if there is more than one "some sense", then the "primary" nomination would have to be qualified.  For AntC's requirements, why not just KEY{...} IMMUTABLE? (with MUTABLE as the default).

The question would be: what does marking a key as IMMUTABLE mean? Is it purely documentation, or does that mean the DBMS treats it specially/enforces extra rules/etc? My pick would be:

  • If an IND targets some relvar, that should be to the IMMUTABLE key.
  • On an update (noting that RM Pre 21 in effect rules out tuple-at-a-time thinking) I want to say: you can't amend the value of an IMMUTABLE key -- but of course that would be exactly tuple-at-a-time thinking. (And it rules out renumbering keys because the 6 digits that were adequate 20 years ago have become exhausted.)
  • Or at least on an update you can't delete a tuple unless within the same Multiple Update you insert a tuple with the same value for its IMMUTABLE key (sketched after this list). But there are legitimate business reasons to say the entity denoted by this key has ceased to be, for the purposes of the enterprise. Even if it has ceased to be, we probably want to keep records of what happened to it while it existed, and those'll be expressed in 'transactions' REFERENCEing it from other relvars, so the IND will prevent deletion and instead require marking it as 'expired'/'inactive'.
  • Hmm so any update to a relvar with a key marked IMMUTABLE must not decrease its cardinality. That's pretty feeble.
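Borrowing Erwin's sketched STATEMENTTUPLE operators from above (a sketch, not actual syntax), that 'no IMMUTABLE key value can disappear' reading would be:

    IS_EMPTY (
      ( DELETES FROM S FROM STATEMENTTUPLE ) { SNO }
      NOT MATCHING ( INSERTS FROM S FROM STATEMENTTUPLE ) { SNO } )

Every deleted SNO must be re-inserted within the same assignment, so the relvar's cardinality can indeed never decrease.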
Quote from AntC on October 25, 2020, 1:56 am
Quote from Hugh on October 24, 2020, 2:13 pm

It appears to me that subsequent discussion has mainly focussed on AntC's motivation for asking for primary key support: his desire to single out keys whose values are deemed to be "immutable".  I have no further comment on that particular issue, apart from agreeing with some objections I have seen (and therefore feeling a bit embarrassed about RM VSS3!).

I just want to say that, thinking about my own databases in Rel, I have several base relvars with more than one declared key and I have never once thought of a possible advantage in nominating one to be in some sense primary.

Thanks Hugh, could I ask: are those relvars in 6NF? I'm not suggesting they should be, but if you were to take the schema as far as 6NF, would there still be multiple candidate keys in some relvar(s)?

The 'possible advantage' I see is avoiding update anomalies. For example, if all INDs (Foreign Key Constraints) targeted (REFERENCEd) the same key in a relvar, there'd be less danger of dangling links. In particular with CASCADEd operations, it might be quite perplexing that some forms of update CASCADE following one Foreign Key, while others follow another.

  Moreover, if there is some sense in which an advantage could be obtained, that would presumably involve some kind of constraint over and above that implied by KEY; and if there is more than one "some sense", then the "primary" nomination would have to be qualified.  For AntC's requirements, why not just KEY{...} IMMUTABLE? (with MUTABLE as the default).

The question would be: what does marking a key as IMMUTABLE mean? Is it purely documentation, or does that mean the DBMS treats it specially/enforces extra rules/etc? My pick would be:

  • If an IND targets some relvar, that should be to the IMMUTABLE key.
  • On an update (noting that RM Pre 21 in effect rules out tuple-at-a-time thinking) I want to say: you can't amend the value of an IMMUTABLE key -- but of course that would be exactly tuple-at-a-time thinking. (And it rules out renumbering keys because the 6 digits that were adequate 20 years ago have become exhausted.)
  • Or at least on an update you can't delete a tuple unless within the same Multiple Update you insert a tuple with the same value for its IMMUTABLE key. But there are legitimate business reasons to say the entity denoted by this key has ceased to be, for the purposes of the enterprise. Even if it has ceased to be, we probably want to keep records of what happened to it while it existed, and those'll be expressed in 'transactions' REFERENCEing it from other relvars, so the IND will prevent deletion and instead require marking it as 'expired'/'inactive'.
  • Hmm so any update to a relvar with a key marked IMMUTABLE must not decrease its cardinality. That's pretty feeble.

I have little idea of what IMMUTABLE would mean, precisely, and I don't much want to give it further thought.  My assumption was that AntC knew what he wanted and that it would be shorthand for some kind of constraint, probably a transition constraint.

I don't consciously strive for 6NF, but I have found at least one relvar that is binary with two unary keys (so in 6NF).  Neither of those two singleton keys is in any sense more special than the other.   I have also found one that has just one key that happens to be composite.  It is not in 6NF because it has two non-key attributes, but decomposing to 6NF would necessitate the proverbial "foreign keys in both directions" and therefore multiple assignment for inserts and deletes.  It's not the only relvar like that in the database I'm looking at.  What's more, this is one of my real business-like databases, with information about real people who submit solutions to bridge problems at my website devoted to this activity.
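For illustration, with invented names rather than my actual schemas, the two shapes are:

    VAR ALIAS BASE
      RELATION { PERSON_ID CHAR, NICKNAME CHAR }
      KEY { PERSON_ID } KEY { NICKNAME } ;

    VAR SUBMISSION BASE
      RELATION { PERSON_ID CHAR, PROBLEM_ID CHAR, SUBMITTED CHAR, SCORE INTEGER }
      KEY { PERSON_ID, PROBLEM_ID } ;

ALIAS is binary with two unary keys, hence in 6NF; SUBMISSION has one composite key and two non-key attributes, so decomposing it to 6NF would split SUBMITTED and SCORE into two relvars whose keys must each reference the other's, with multiple assignment needed for every insert and delete.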

Hugh

Coauthor of The Third Manifesto and related books.

The selection of a primary key is arbitrary, but it determines a unique database schema from the set of all equivalent database schemas.  If foreign keys only reference primary keys, then the structure of the database will reflect that choice.  For instance, first with no primary key:

EMPLOYEE {BADGE#, EMP#, SSN, ...} KEY {BADGE#} KEY {EMP#} KEY {SSN}

EMPRATE {EMP#, RATE} KEY {EMP#} EMP# REFERENCES EMPLOYEE {EMP#}

EMPTIMEIN {BADGE#, DATE_IN, TIME_IN} KEY {BADGE#, DATE_IN, TIME_IN} BADGE# REFERENCES EMPLOYEE {BADGE#}

EMPHEALTH {SSN, PLAN} KEY {SSN} SSN REFERENCES EMPLOYEE {SSN}

Now with EMP# as the primary key:

EMPLOYEE {BADGE#, EMP#, SSN, ...} KEY {BADGE#} PRIMARY KEY {EMP#} KEY {SSN}

EMPRATE {EMP#, RATE} PRIMARY KEY {EMP#} EMP# REFERENCES EMPLOYEE {EMP#}

EMPTIMEIN {EMP#, DATE_IN, TIME_IN} PRIMARY KEY {EMP#, DATE_IN, TIME_IN} EMP# REFERENCES EMPLOYEE {EMP#}

EMPHEALTH {EMP#, PLAN} PRIMARY KEY {EMP#} EMP# REFERENCES EMPLOYEE {EMP#}

This database schema is information-equivalent to the former, but only primary keys are referenced.

Are there benefits to declaring primary keys?  From an implementation standpoint, most definitely, because only one hash table needs to be maintained to ensure referential integrity.  Indexes must be maintained for other candidate keys to ensure uniqueness for inserts and updates that target the attributes of a candidate key, but those indexes would not need to be referenced when inserts or updates occur to other tables.  However, from a logical standpoint, it doesn't really matter one way or the other, unless all primary keys are unary, in which case I believe that some normalization issues that are otherwise intractable can be solved.
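To make the implementation point concrete: each REFERENCES above amounts to an inclusion dependency, expressible in Tutorial D as an emptiness constraint. Under the primary-key schema, all three reduce to lookups on EMPLOYEE {EMP#}, so a single hash table on EMP# serves them all:

    CONSTRAINT FK_EMPRATE IS_EMPTY ( EMPRATE { EMP# } NOT MATCHING EMPLOYEE { EMP# } ) ;
    CONSTRAINT FK_EMPTIMEIN IS_EMPTY ( EMPTIMEIN { EMP# } NOT MATCHING EMPLOYEE { EMP# } ) ;
    CONSTRAINT FK_EMPHEALTH IS_EMPTY ( EMPHEALTH { EMP# } NOT MATCHING EMPLOYEE { EMP# } ) ;

Under the first schema the same three constraints target {BADGE#}, {EMP#} and {SSN} respectively, each wanting its own supporting index.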

Brian
