The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Proposed SAFEUNGROUP operator

12
Quote from Hugh on June 29, 2019, 1:19 pm
Quote from johnwcowan on June 29, 2019, 1:28 am

The UNGROUP operator is not a safe operator; it cannot be applied to just any RVA.  In particular, if the attribute names inside the RVA overlap at all with the attribute names of the relation holding the RVA, then the UNGROUP cannot be done until the conflicting names in the outer relation have been renamed away (since there is, AFAICT, no way to rename the inner ones).  A second rename will then give all the attributes the desired final names.  But this RENAME-UNGROUP-RENAME is both sufficiently annoying and overly general.

Therefore I am suggesting a slightly different operator called SAFEUNGROUP.  It is not entirely safe either, but it's a lot better.  The idea here is that if the RVA is named foo and its inner attributes are {bar, baz, zam}, then they are ungrouped using the names {foo_bar, foo_baz, foo_zam}.  In particular, if there are two RVAs in the same relation that have the same type, then ungrouping both of them with UNGROUP is definitely going to produce a problem, but with SAFEUNGROUP it will work.

Of course if by chance the outer relation already has attributes with these names, SAFEUNGROUP will fail.  If another character such as $ were reserved for the purpose and never used for user-chosen attribute names, that source of conflict would be eliminated.

Sympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field).  John's example  doesn't work if an attribute named foo_bar already exists.  I'd use RENAME with the PREFIX option, specifying a prefix that I "know" will be "safe".  We introduced the PREFIX and SUFFIX options in an attempt to address the annoying-ness John mentions.

Hugh

 

If attribute naming policy is focused on always naming containers that hold content of the same type by the same name, i.e. always naming a part identifier attribute P_ID (the benefit of such a policy would be that many joins can naturally be written as a natural join), then adding prefixes willy-nilly for reasons that actually apply in only a rare number of cases (for some meaning of "rare"), is going to kill/contravene the benefits of the naming policy.

(And I know full well that there are those cases where "role" makes a difference, as in, say, BOM structures where of necessity we must have both CONTAINING_P_ID and CONTAINED_P_ID.)

I think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple likeID, and the natural-join rules require ITEM_ID and SUMMARY_IDand EMP_ID and DEP_ID, which come across as verbose and redundant.  I don't think there's anything to be done about it, though.

(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable".  Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last.  I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)

Quote from johnwcowan on June 29, 2019, 3:13 pm
Quote from Hugh on June 29, 2019, 1:19 pm

Sympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field).

Yes, I should have said "partial", as AntC says.

Then lots of Tutorial D operators are partial, why pick on UNGROUP? Indeed the operators of Codd's RA are partial, and even a more algebraic algebra could not avoid it. The point is any mismatch of operands can be detected statically as a type inference failure, for all of those operators. This isn't a 'problem' that needs any more fixing.

  • any operator with two or more relation operands needs those operands to be 'join-compatible' -- that is: same attribute name needs same attribute type within those operands, not necessarily database-wide. (Or you follow Tropashko and have attributes as untyped, that is: if you attempt to JOIN a relation with attribute <A INT> to a relation with attribute <A CHAR>, you'll get an empty result, because no INT value compares equal to any CHAR value.)
  • Codd's so-called 'union-compatible' operators (UNION, INTERSECTION, MINUS) need the operands to have the same heading; TIMES requires the operands' headings to be disjoint.
  • GROUP requires the target RVA attribute name to not already appear in the relation. (Or possibly, it can appear, providing it's one of the attributes to be grouped?? Is this valid in Tutorial D: SP GROUP P# := {P#, QTY}? I'd call that obfuscated, at best. In SP GROUP P# := {ALL BUT S#} you can't even see that it's P# getting grouped.)
  • Project/REMOVE require the named attributes to appear in the relation operand. RENAME, EXTEND require the 'source(s)' to appear, the 'target' not to.
  • WHERE requires all the attribute names free in the restriction boolean expr to appear in the relation operand.
  • And I'm sure there are other restrictions (all the flavours of DIVIDE, subset/equality comparisons between relations, ...)

 

Now that I understand how to use EXTEND for inner renaming, I'm no longer concerned.

 

Quote from johnwcowan on June 30, 2019, 1:25 am

I think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple likeID, and the natural-join rules require ITEM_ID and SUMMARY_IDand EMP_ID and DEP_ID, which come across as verbose and redundant.  I don't think there's anything to be done about it, though.

(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable".  Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last.  I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)

Of course strictly adhering to such policy would imply that inside the EMP relvar thou shalt have the EMP_ID attribute, not the ID attribute.  My 'RELVAR' relvar has the attribute 'RELVARNAME', not 'NAME'.

As for the stability of names, I remember a discussion of what it would take to identify "The country that was named 'Congo' from 1860 to 1960" as being the same one as "The country that was named 'Zaire' from 1960 to 1998" and also being the same as "The country that was named 'Congo' from 1998 to some moment in the future that is presently still unknown".

(Don't take my year numbers as anything supposed to match reality.)

12