Proposed SAFEUNGROUP operator
Quote from Erwin on June 29, 2019, 10:57 pmQuote from Hugh on June 29, 2019, 1:19 pmQuote from johnwcowan on June 29, 2019, 1:28 amThe UNGROUP operator is not a safe operator; it cannot be applied to just any RVA. In particular, if the attribute names inside the RVA overlap at all with the attribute names of the relation holding the RVA, then the UNGROUP cannot be done until the conflicting names in the outer relation have been renamed away (since there is, AFAICT, no way to rename the inner ones). A second rename will then give all the attributes the desired final names. But this RENAME-UNGROUP-RENAME is both sufficiently annoying and overly general.
Therefore I am suggesting a slightly different operator called SAFEUNGROUP. It is not entirely safe either, but it's a lot better. The idea here is that if the RVA is named
foo
and its inner attributes are{bar, baz, zam}
, then they are ungrouped using the names{foo_bar, foo_baz, foo_zam}
. In particular, if there are two RVAs in the same relation that have the same type, then ungrouping both of them with UNGROUP is definitely going to produce a problem, but with SAFEUNGROUP it will work.Of course if by chance the outer relation already has attributes with these names, SAFEUNGROUP will fail. If another character such as $ were reserved for the purpose and never used for user-chosen attribute names, that source of conflict would be eliminated.
Sympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field). John's example doesn't work if an attribute named foo_bar already exists. I'd use RENAME with the PREFIX option, specifying a prefix that I "know" will be "safe". We introduced the PREFIX and SUFFIX options in an attempt to address the annoying-ness John mentions.
Hugh
If attribute naming policy is focused on always naming containers that hold content of the same type by the same name, i.e. always naming a part identifier attribute P_ID (the benefit of such a policy would be that many joins can naturally be written as a natural join), then adding prefixes willy-nilly for reasons that actually apply in only a rare number of cases (for some meaning of "rare"), is going to kill/contravene the benefits of the naming policy.
(And I know full well that there are those cases where "role" makes a difference, as in, say, BOM structures where of necessity we must have both CONTAINING_P_ID and CONTAINED_P_ID.)
Quote from Hugh on June 29, 2019, 1:19 pmQuote from johnwcowan on June 29, 2019, 1:28 amThe UNGROUP operator is not a safe operator; it cannot be applied to just any RVA. In particular, if the attribute names inside the RVA overlap at all with the attribute names of the relation holding the RVA, then the UNGROUP cannot be done until the conflicting names in the outer relation have been renamed away (since there is, AFAICT, no way to rename the inner ones). A second rename will then give all the attributes the desired final names. But this RENAME-UNGROUP-RENAME is both sufficiently annoying and overly general.
Therefore I am suggesting a slightly different operator called SAFEUNGROUP. It is not entirely safe either, but it's a lot better. The idea here is that if the RVA is named
foo
and its inner attributes are{bar, baz, zam}
, then they are ungrouped using the names{foo_bar, foo_baz, foo_zam}
. In particular, if there are two RVAs in the same relation that have the same type, then ungrouping both of them with UNGROUP is definitely going to produce a problem, but with SAFEUNGROUP it will work.Of course if by chance the outer relation already has attributes with these names, SAFEUNGROUP will fail. If another character such as $ were reserved for the purpose and never used for user-chosen attribute names, that source of conflict would be eliminated.
Sympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field). John's example doesn't work if an attribute named foo_bar already exists. I'd use RENAME with the PREFIX option, specifying a prefix that I "know" will be "safe". We introduced the PREFIX and SUFFIX options in an attempt to address the annoying-ness John mentions.
Hugh
If attribute naming policy is focused on always naming containers that hold content of the same type by the same name, i.e. always naming a part identifier attribute P_ID (the benefit of such a policy would be that many joins can naturally be written as a natural join), then adding prefixes willy-nilly for reasons that actually apply in only a rare number of cases (for some meaning of "rare"), is going to kill/contravene the benefits of the naming policy.
(And I know full well that there are those cases where "role" makes a difference, as in, say, BOM structures where of necessity we must have both CONTAINING_P_ID and CONTAINED_P_ID.)
Quote from johnwcowan on June 30, 2019, 1:25 amI think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple like
ID
, and the natural-join rules requireITEM_ID
andSUMMARY_ID
andEMP_ID
andDEP_ID
, which come across as verbose and redundant. I don't think there's anything to be done about it, though.(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable". Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last. I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)
I think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple likeID
, and the natural-join rules require ITEM_ID
and SUMMARY_ID
and EMP_ID
and DEP_ID
, which come across as verbose and redundant. I don't think there's anything to be done about it, though.
(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable". Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last. I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)
Quote from AntC on June 30, 2019, 5:44 amQuote from johnwcowan on June 29, 2019, 3:13 pmQuote from Hugh on June 29, 2019, 1:19 pmSympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field).
Yes, I should have said "partial", as AntC says.
Then lots of Tutorial D operators are partial, why pick on
UNGROUP
? Indeed the operators of Codd's RA are partial, and even a more algebraic algebra could not avoid it. The point is any mismatch of operands can be detected statically as a type inference failure, for all of those operators. This isn't a 'problem' that needs any more fixing.
- any operator with two or more relation operands needs those operands to be 'join-compatible' -- that is: same attribute name needs same attribute type within those operands, not necessarily database-wide. (Or you follow Tropashko and have attributes as untyped, that is: if you attempt to
JOIN
a relation with attribute<A INT>
to a relation with attribute<A CHAR>
, you'll get an empty result, because noINT
value compares equal to anyCHAR
value.)- Codd's so-called 'union-compatible' operators
(UNION, INTERSECTION, MINUS)
need the operands to have the same heading;TIMES
requires the operands' headings to be disjoint.GROUP
requires the target RVA attribute name to not already appear in the relation. (Or possibly, it can appear, providing it's one of the attributes to be grouped?? Is this valid in Tutorial D:SP GROUP P# := {P#, QTY}
? I'd call that obfuscated, at best. InSP GROUP P# := {ALL BUT S#}
you can't even see that it'sP#
getting grouped.)- Project/
REMOVE
require the named attributes to appear in the relation operand.RENAME, EXTEND
require the 'source(s)' to appear, the 'target' not to.WHERE
requires all the attribute names free in the restriction boolean expr to appear in the relation operand.- And I'm sure there are other restrictions (all the flavours of
DIVIDE
, subset/equality comparisons between relations, ...)
Quote from johnwcowan on June 29, 2019, 3:13 pmQuote from Hugh on June 29, 2019, 1:19 pmSympathise with the problem but not sure it's worth fixing, especially if the fix can't guarantee to be "safe" (I'm uncomfortable with the use of that word, which has other connotations in the database field).
Yes, I should have said "partial", as AntC says.
Then lots of Tutorial D operators are partial, why pick on UNGROUP
? Indeed the operators of Codd's RA are partial, and even a more algebraic algebra could not avoid it. The point is any mismatch of operands can be detected statically as a type inference failure, for all of those operators. This isn't a 'problem' that needs any more fixing.
- any operator with two or more relation operands needs those operands to be 'join-compatible' -- that is: same attribute name needs same attribute type within those operands, not necessarily database-wide. (Or you follow Tropashko and have attributes as untyped, that is: if you attempt to
JOIN
a relation with attribute<A INT>
to a relation with attribute<A CHAR>
, you'll get an empty result, because noINT
value compares equal to anyCHAR
value.) - Codd's so-called 'union-compatible' operators
(UNION, INTERSECTION, MINUS)
need the operands to have the same heading;TIMES
requires the operands' headings to be disjoint. GROUP
requires the target RVA attribute name to not already appear in the relation. (Or possibly, it can appear, providing it's one of the attributes to be grouped?? Is this valid in Tutorial D:SP GROUP P# := {P#, QTY}
? I'd call that obfuscated, at best. InSP GROUP P# := {ALL BUT S#}
you can't even see that it'sP#
getting grouped.)- Project/
REMOVE
require the named attributes to appear in the relation operand.RENAME, EXTEND
require the 'source(s)' to appear, the 'target' not to. WHERE
requires all the attribute names free in the restriction boolean expr to appear in the relation operand.- And I'm sure there are other restrictions (all the flavours of
DIVIDE
, subset/equality comparisons between relations, ...)
Quote from johnwcowan on June 30, 2019, 5:54 amNow that I understand how to use EXTEND for inner renaming, I'm no longer concerned.
Now that I understand how to use EXTEND for inner renaming, I'm no longer concerned.
Quote from Erwin on June 30, 2019, 1:07 pmQuote from johnwcowan on June 30, 2019, 1:25 amI think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple like
ID
, and the natural-join rules requireITEM_ID
andSUMMARY_ID
andEMP_ID
andDEP_ID
, which come across as verbose and redundant. I don't think there's anything to be done about it, though.(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable". Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last. I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)
Of course strictly adhering to such policy would imply that inside the EMP relvar thou shalt have the EMP_ID attribute, not the ID attribute. My 'RELVAR' relvar has the attribute 'RELVARNAME', not 'NAME'.
As for the stability of names, I remember a discussion of what it would take to identify "The country that was named 'Congo' from 1860 to 1960" as being the same one as "The country that was named 'Zaire' from 1960 to 1998" and also being the same as "The country that was named 'Congo' from 1998 to some moment in the future that is presently still unknown".
(Don't take my year numbers as anything supposed to match reality.)
Quote from johnwcowan on June 30, 2019, 1:25 amI think the naming policy, despite its great benefits, is particularly annoying in the case of artificial primary keys having no "business" meaning, where one tends to want to name them something simple like
ID
, and the natural-join rules requireITEM_ID
andSUMMARY_ID
andEMP_ID
andDEP_ID
, which come across as verbose and redundant. I don't think there's anything to be done about it, though.(By the way, I think people way overdo it with the artificial keys, on the grounds that "names aren't stable". Sometimes they aren't, but in such cases I like to point out to such people that the name "Roma" has been associated with the same city in Italy for the last 2771 years, which is rather longer than any system designed today is likely to last. I have even seen artificial keys used in two-attribute linkage relations representing many-to-many relationships, which means of course that the DBMS cannot ensure their uniqueness and joins involving them produce lots of replicated records in conventional systems.)
Of course strictly adhering to such policy would imply that inside the EMP relvar thou shalt have the EMP_ID attribute, not the ID attribute. My 'RELVAR' relvar has the attribute 'RELVARNAME', not 'NAME'.
As for the stability of names, I remember a discussion of what it would take to identify "The country that was named 'Congo' from 1860 to 1960" as being the same one as "The country that was named 'Zaire' from 1960 to 1998" and also being the same as "The country that was named 'Congo' from 1998 to some moment in the future that is presently still unknown".
(Don't take my year numbers as anything supposed to match reality.)