The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Are inclusion dependencies reducible to foreign-key dependencies?

Quote from AntC on September 23, 2019, 2:20 am

The term 'Foreign Key' (constraint) is well-established and well-defined and was so long before Date started interfering. And he should accept that definition and shut the f*** up.  Persistently and perniciously he takes well-established terms and bends them.

Well, if our predecessors hadn't done that, we wouldn't talk of files, records, assemblers, compilers, objects, and Ghu only knows how many other terms that someone took and wrenched out of their earlier meanings.  Terminological buccaneering, as Northrop Frye called it, is inevitable.

And then gets all hurt when people don't follow.

Much like Geoff Pullum on preposition and subjunctive, actually.  He's entitled to redefine his terms, but not to criticize people who in all innocence use them in older senses.

There are umpteen examples of that through the old forum: I would point to "COMPOSE" which is not algebraicists' compose; to 'Image Relations' which are not mathematicians' images of a function; to 'type inheritance' which is not OOP inheritance; ...

TTM already concedes that Cartesian product in databases doesn't mean what mathematicians mean by it, but uses it anyway.  What alternative term would you propose?

Your "referring to a relation" above is wrong: it's not relation (values) that have keys, but relational expressions, keys derived from declared keys for relvars mentioned.

Say what?  Relations have keys, relation variables have key constraints that say the value of this relation must have these keys.

By generalising from FOREIGN KEY ... REFERENCES <relvar> <key> (where we can statically verify that <key> is indeed a declared key to <relvar>) to arbitrary relational expressions as targets, we're then at the mercy of the complexity of the expression, and the ability of the compiler/query engine to infer keys.

So we are.    That's why I want to read Darwen's article "The Role of Functional Dependence in Query Decomposition", about which Date says: "Hugh Darwen has given solutions in [this article] for <relation exp>s involving any of the following relational operations: rename, project, restrict, join, product, intersect, union, difference, extend, and summarize."  But it's in Relational Database Writings 1989-1991, which as I say is not available online.  I may have to break down and buy it, despite its budget-busting price for me.

Here's the description of the paper in SQL and Relational Theory 3e:

This paper gives a set of inference rules by which functional dependencies (FDs) satisfied by the relation r denoted by an arbitrary relational expression can be inferred from those holding for the relvar(s) referenced in the expression in question. The set of FDs thus inferred can then be inspected to determine the key constraints satisfied by r, thus providing a basis for the key inference rules mentioned in passing in Chapter 4 of the present book.

Time to read Chapter 4, I guess.  Tomorrow.

 

Quote from johnwcowan on September 23, 2019, 3:49 am
Quote from AntC on September 23, 2019, 2:20 am

The term 'Foreign Key' (constraint) is well-established and well-defined and was so long before Date started interfering. And he should accept that definition and shut the f*** up.  Persistently and perniciously he takes well-established terms and bends them.

Well, if our predecessors hadn't done that, we wouldn't talk of files, records, assemblers, compilers, objects, and Ghu only knows how many other terms that someone took and wrenched out of their earlier meanings.  Terminological buccaneering, as Northrop Frye called it, is inevitable.

And if people didn't think in terms of "files, records" when discussing the relational model, we'd be getting along far better. "objects" is an almost-empty term anyway. I'm not seeing that re-purposing "assemblers, compilers" is doing any harm: those terms did not previously have specific technical meanings within the field.

But Date isn't doing 'terminological buccaneering': he's taking a term that already has a precise definition within the profession and trying to change its meaning to something that includes the original meaning plus something contradictory to the original. This phenomenon we already suffer from hugely in the industry: look what gets called 'Relational' by the marketeers and isn't and Date fulminates against them, quite rightly.

And then gets all hurt when people don't follow.

Much like Geoff Pullum on preposition and subjunctive, actually.  He's entitled to redefine his terms, but not to criticize people who in all innocence use them in older senses.

The proper sense of such terms is wrt Latin and other classical/highly inflected languages. They just don't make sense for English syntax as she is spoke. So it was the C19th grammarians who were the buccaneers; Geoff is trying to retrieve some balance. Which is why he's so caustic about passive voice. English doesn't have a system of voice as in inflection on the verb. The appropriate grammatical category to which passive applies for English is the clause.

There are umpteen examples of that through the old forum: I would point to "COMPOSE" which is not algebraicists' compose; to 'Image Relations' which are not mathematicians' images of a function; to 'type inheritance' which is not OOP inheritance; ...

TTM already concedes that Cartesian product in databases doesn't mean what mathematicians mean by it, but uses it anyway.  What alternative term would you propose?

That is Codd's usage. It is by well now-established (although I agree it's not the mathematicians' sense). So long-set precedence trumps accuracy. And if TTM is going to observe that precedence, why abandon it wrt Foreign Key?

Your "referring to a relation" above is wrong: it's not relation (values) that have keys, but relational expressions, keys derived from declared keys for relvars mentioned.

Say what?  Relations have keys,

No: the relational expression has key(s). Some relation value produced by applying that expression from some instance of the database at some point in time will have the property that the key(s)' attribute(s) together determine the non-key attributes. But for that relation value there could be other attribute(s) that by chance correlate with some other attributes, in a way that makes them appear to be keys. That's just happenstance of the point-in-time content of the database. It's not sufficient to make them keys.

relation variables have key constraints that say the value of this relation must have these keys.

"have these keys"? I can't make any sense of that. A key constraint (or any constraint) says the relation value must have such-and-such properties at all times. Please give an explanation of what you mean by "relation ... have ... keys" -- where you seem to mean relation value, not relvar nor relational expression. Then also please give chapter-and-verse for where you're getting that explanation.

By generalising from FOREIGN KEY ... REFERENCES <relvar> <key> (where we can statically verify that <key> is indeed a declared key to <relvar>) to arbitrary relational expressions as targets, we're then at the mercy of the complexity of the expression, and the ability of the compiler/query engine to infer keys.

So we are.    That's why I want to read Darwen's article "The Role of Functional Dependence in Query Decomposition", about which Date says: "Hugh Darwen has given solutions in [this article] for <relation exp>s involving any of the following relational operations: rename, project, restrict, join, product, intersect, union, difference, extend, and summarize."  But it's in Relational Database Writings 1989-1991, which as I say is not available online.  I may have to break down and buy it, despite its budget-busting price for me.

Here's the description of the paper in SQL and Relational Theory 3e:

This paper gives a set of inference rules by which functional dependencies (FDs) satisfied by the relation r denoted by an arbitrary relational expression can be inferred from those holding for the relvar(s) referenced in the expression in question. The set of FDs thus inferred can then be inspected to determine the key constraints satisfied by r, thus providing a basis for the key inference rules mentioned in passing in Chapter 4 of the present book.

Time to read Chapter 4, I guess.  Tomorrow.

 

What's missing in those quotes is "statically" inferred or inferred "at compile time" OO Pre 1. Here's an example:

SSP1 := S JOIN SP;  -- inferred key in general is {S#, P#}

SSP2 := S JOIN (SP WHERE S# = S1 AND P# = P4);


SP := REL{TUP{S# S1, P# P4, QTY 50}}; 
SSP3 := S JOIN SP;  -- has only one tuple, what would you like to say is its key?


SSP2 and SSP3 have the same value. Does your notion of "relations have keys" apply here? The expression for the assignment to SSP1, SSP3 is syntactically identical. Please explain how static inference will derive different keys for SSP1 vs SSP3, but the same key for SSP2 vs SSP3. Or will it? Or what?

Notice that with that assignment of a single tuple to SP, we might like to say the resulting key for SP is more specific than its declared key {S#, P#}. But that would need dynamic typing semantics, which cuts against the whole idea of declared key. How does that go with your notion of "relations have keys"?

From RM VSS 2

D should provide such [key inference] functionality, but without any guarantee (a) that such inferred candidate keys are
not proper supersets of actual candidate keys (“proper superkeys”) or (b) that such an inferred candidate
key is discovered for every actual candidate key.

Sounds like Hugh is already aware the algorithm has limitations.

Good luck with that!

Quote from johnwcowan on September 20, 2019, 8:29 pm

By inclusion dependency I mean this: given two relvars RV1 and RV2 and a set of attributes {A, B, C, ...} in common, the projection of the value of RV1 onto {A, B, C, ...} is a subset of the projection of the value of RV2 onto {A, B, C, ...}.

By foreign-key dependency, I mean an inclusion dependency such that {A, B, C, ...} is a superkey of RV2.

Is it enough to provide foreign-key dependencies to achieve the purposes of arbitrary inclusion dependencies, given that RV1 and RV2 may be factored?

 

I hope somebody knows.

Somebody thinks he knows, but ...

Are your RV1 and RV2 relvars constrained to being (1) base relvars specifically, (2) just any declared relvar (i.e. virtual ones included) or (3) just any arbitrary expression that could just as well also have been explicitly declared as a virtual relvar but wasn't for some reason of varying degree of relevance ?

In (1) and (2), it's "no", in (3) it's "yes".

The reason for the "yes" is that it is then always possible to architect your RV1/RV2 in such a way that, say, your {A B C ...} ***becomes*** a key of RV2.

Take e.g. "All parts must be in a city where there is a supplier".  Formally, P{CITY} SUBSETOF S{CITY}.

{CITY} is not a superkey of S, but it ***is*** a superkey of the projection itself.  (So equate your RV2 not with S, but with the projection.)

Moreover, it is even the case that in (3), every conceivable db constraint becomes reducible to INDs/FKs (note that because we are in interpretation (3) here, we are not speaking of the limited/constrained/narrow/... SQL sense of "FK" here).

Take any constraint.  It defines a scenario that should never ever occur.  Write the RA expression for finding the occurrences of that scenario.  Require it to be a subset of TABLE_DUM (after first projecting on {} for argument compatibility).  There you go.

Quote from AntC on September 23, 2019, 5:48 am

 

And if people didn't think in terms of "files, records" when discussing the relational model, we'd be getting along far better.

Historically file meant more like 'directory' and record more like 'document', just like paper records stored in file folders.  Early programmers took over those metaphors directly, but they have now separated from their source.  Asserting that that's in itself a fault is the etymological fallacy.

No: the relational expression has key(s). Some relation value produced by applying that expression from some instance of the database at some point in time will have the property that the key(s)' attribute(s) together determine the non-key attributes.

Just so; if a relation produced by evaluating a relational expression has key attributes, then surely those key attributes taken together form its key.

But for that relation value there could be other attribute(s) that by chance correlate with some other attributes, in a way that makes them appear to be keys. That's just happenstance of the point-in-time content of the database. It's not sufficient to make them keys.

I would put that thus: constraints apply to variables, and there may or may not exist a key-constraint on the variable enforcing that some attribute set is always going to be a key for the relation that is the value of that variable.  Other keys of the relation are indeed happenstance from the viewpoint of the variable, though not perhaps from other variables whose value is the same relation.

"To be is to be the value of a variable"; are we replaying Strawson v. Quine here?

Please give an explanation of what you mean by "relation ... have ... keys" -- where you seem to mean relation value, not relvar nor relational expression.

Per above, a relation has a key if some subset of the attributes are jointly unique for each tuple of the relation.   Every relation has at least one key, viz. all the attributes.  Consequently, every relvar has at least one key-constraint: the constraint that the set of all the attributes is a (super)key.

But otherwise, a relvar may or may not have a key-constraint saying "set X of attributes must always be a key of my value."  This is deontic rather than propositional logic.

SSP1 := S JOIN SP;  -- inferred key in general is {S#, P#}

SSP2 := S JOIN (SP WHERE S# = S1 AND P# = P4);


SP := REL{TUP{S# S1, P# P4, QTY 50}}; 
SSP3 := S JOIN SP;  -- has only one tuple, what would you like to say is its key?


SSP2 and SSP3 have the same value. Does your notion of "relations have keys" apply here? The expression for the assignment to SSP1, SSP3 is syntactically identical. Please explain how static inference will derive different keys for SSP1 vs SSP3, but the same key for SSP2 vs SSP3. Or will it? Or what?

Static inference tells us that the value of SSP2 and SSP3 has just one tuple, as S# and P# are constrained to have just one value.  That means that any subset of the attributes except the empty subset constitutes a key for that value.  (Exception:  DEE does not have a key; a fortiori you cannot put a key-constraint on a relvar that constrains its value to be DEE, so you have to use cardinality and degree constraints instead.)

Quote from johnwcowan on September 23, 2019, 2:18 pm
Quote from AntC on September 23, 2019, 5:48 am

 

 

 

Please give an explanation of what you mean by "relation ... have ... keys" -- where you seem to mean relation value, not relvar nor relational expression.

Per above, a relation has a key if some subset of the attributes are jointly unique for each tuple of the relation.   Every relation has at least one key, viz. all the attributes.  Consequently, every relvar has at least one key-constraint: the constraint that the set of all the attributes is a (super)key.

But otherwise, a relvar may or may not have a key-constraint saying "set X of attributes must always be a key of my value."  This is deontic rather than propositional logic.

SSP1 := S JOIN SP;  -- inferred key in general is {S#, P#}

SSP2 := S JOIN (SP WHERE S# = S1 AND P# = P4);


SP := REL{TUP{S# S1, P# P4, QTY 50}}; 
SSP3 := S JOIN SP;  -- has only one tuple, what would you like to say is its key?


SSP2 and SSP3 have the same value. Does your notion of "relations have keys" apply here? The expression for the assignment to SSP1, SSP3 is syntactically identical. Please explain how static inference will derive different keys for SSP1 vs SSP3, but the same key for SSP2 vs SSP3. Or will it? Or what?

Static inference tells us that the value of SSP2 and SSP3 has just one tuple, as S# and P# are constrained to have just one value.  That means that any subset of the attributes except the empty subset constitutes a key for that value.  (Exception:  DEE does not have a key; a fortiori you cannot put a key-constraint on a relvar that constrains its value to be DEE, so you have to use cardinality and degree constraints instead.)

Oh dear. RM Pro 5

D shall not forget that relations with no attributes are respectable and interesting, nor that candidate keys
with no components are likewise respectable and interesting.

DEE (as usually declared) has a key with no components. Oh in fact RM Pre 15 requires every relvar to have a key. (We might quibble whether DEE is a relvar or a relation constant, but I'm pretty sure Hugh would insist relcons have candidate keys too.)

Any relvar which is constrained to contain only one tuple thereby has a candidate key with no components.

You seem to have missed the point of my example: for a relvar constrained to contain only one tuple, any arbitrary subset of the attributes is a superkey (of the key with no components). So when you assign a single tuple to some relvar declared with a non-empty key, and/or when the result from a relational expression contains only one tuple, you could say the relation value has a candidate key with no components. But you could say many other things, which is why the idea of "relation has key" is not coherent (going by your explanation thereof).

Quote from AntC on September 23, 2019, 2:20 am

Implementors will need a bunch of special-purpose rules to distinguish ducks-that-quack vs ducks-that-cluck IOW INDs whose target is a declared key of a relvar vs INDs whose target is an arbitrary relational expression.

I don't understand why people want to keep searching for the solution in that direction.  It's not there.

(And I agree it would be far better for clarity if people stopped talking of relations "having" keys and replaced it with relations ***satisfying*** keys (or rather the constraint that corresponds to it - a key is just a set of attribute names.)

Quote from AntC on September 23, 2019, 11:30 pm
Quote from johnwcowan on September 23, 2019, 2:18 pm
Quote from AntC on September 23, 2019, 5:48 am

 

 

 

Please give an explanation of what you mean by "relation ... have ... keys" -- where you seem to mean relation value, not relvar nor relational expression.

Per above, a relation has a key if some subset of the attributes are jointly unique for each tuple of the relation.   Every relation has at least one key, viz. all the attributes.  Consequently, every relvar has at least one key-constraint: the constraint that the set of all the attributes is a (super)key.

But otherwise, a relvar may or may not have a key-constraint saying "set X of attributes must always be a key of my value."  This is deontic rather than propositional logic.

SSP1 := S JOIN SP;  -- inferred key in general is {S#, P#}

SSP2 := S JOIN (SP WHERE S# = S1 AND P# = P4);


SP := REL{TUP{S# S1, P# P4, QTY 50}}; 
SSP3 := S JOIN SP;  -- has only one tuple, what would you like to say is its key?


SSP2 and SSP3 have the same value. Does your notion of "relations have keys" apply here? The expression for the assignment to SSP1, SSP3 is syntactically identical. Please explain how static inference will derive different keys for SSP1 vs SSP3, but the same key for SSP2 vs SSP3. Or will it? Or what?

Static inference tells us that the value of SSP2 and SSP3 has just one tuple, as S# and P# are constrained to have just one value.  That means that any subset of the attributes except the empty subset constitutes a key for that value.  (Exception:  DEE does not have a key; a fortiori you cannot put a key-constraint on a relvar that constrains its value to be DEE, so you have to use cardinality and degree constraints instead.)

Oh dear. RM Pro 5

Yes, yes, just a brain fart.  The empty key is the key of Dee (which may be divided into three sub-types, known as Sharp, Flat, and Natural) and Dum, but nothing else.nd interesting.

DEE (as usually declared) has a key with no components. Oh in fact RM Pre 15 requires every relvar to have a key. (We might quibble whether DEE is a relvar or a relation constant, but I'm pretty sure Hugh would insist relcons have candidate keys too.)

Surely it's utterly arbitrary to say that a relation does not have keys, but if you bind it to a name, then it does.   What neither a relation nor a relcon have is key constraints, since when nothing can change, there is no need for a constraint.  It's pointless to say that con x int = 5 constrains x to be 5:  x is just a name for the value 5.

You seem to have missed the point of my example: for a relvar constrained to contain only one tuple, any arbitrary subset of the attributes is a superkey (of the key with no components).

Except for slipping up about the empty key, I understood it entirely.

Quote from johnwcowan on September 23, 2019, 3:49 am
Quote from AntC on September 23, 2019, 2:20 am

The term 'Foreign Key' (constraint) is well-established and well-defined and was so long before Date started interfering. And he should accept that definition and shut the f*** up.  Persistently and perniciously he takes well-established terms and bends them.

Well, if our predecessors hadn't done that, we wouldn't talk of files, records, assemblers, compilers, objects, and Ghu only knows how many other terms that someone took and wrenched out of their earlier meanings.  Terminological buccaneering, as Northrop Frye called it, is inevitable.

And then gets all hurt when people don't follow.

Much like Geoff Pullum on preposition and subjunctive, actually.  He's entitled to redefine his terms, but not to criticize people who in all innocence use them in older senses.

There are umpteen examples of that through the old forum: I would point to "COMPOSE" which is not algebraicists' compose; to 'Image Relations' which are not mathematicians' images of a function; to 'type inheritance' which is not OOP inheritance; ...

TTM already concedes that Cartesian product in databases doesn't mean what mathematicians mean by it, but uses it anyway.  What alternative term would you propose?

Your "referring to a relation" above is wrong: it's not relation (values) that have keys, but relational expressions, keys derived from declared keys for relvars mentioned.

Say what?  Relations have keys, relation variables have key constraints that say the value of this relation must have these keys.

No, relations don't have keys.  Relations are values.  The same relation can be assigned to multiple relation variables--each of which may have a different key constraint.  Also, just because a projection over a subset of the heading of a relvar has the same cardinality as the relation in that relvar doesn't imply that that particular subset of the heading is a candidate key.

 

Quote from Brian S on October 1, 2019, 3:12 pm

No, relations don't have keys.  Relations are values.  The same relation can be assigned to multiple relation variables--each of which may have a different key constraint.  Also, just because a projection over a subset of the heading of a relvar has the same cardinality as the relation in that relvar doesn't imply that that particular subset of the heading is a candidate key.

 

True, but it raises an interesting problem. Do you remember Cobol, when to invoke string formatting you had to MOVE a value to a particular variable? This is shaping up the same way. We don't do things that way any more.

Of course relational values do not have keys, but there are occasions when it would be useful to test a value for compliance with a key (or other constraint) without the need for storing the value in a variable. A constraint is a function, which is true or false for any relation value (of the proper type). A relvar KEY or a database CONSTRAINT are simply functions that are applied in connection with relational assignment, but should be available for more general use in evaluating relational expressions.

Andl - A New Database Language - andl.org
Quote from Brian S on October 1, 2019, 3:12 pm

 

No, relations don't have keys.  Relations are values.  The same relation can be assigned to multiple relation variables--each of which may have a different key constraint.  Also, just because a projection over a subset of the heading of a relvar has the same cardinality as the relation in that relvar doesn't imply that that particular subset of the heading is a candidate key.

I grant that most of my problem is terminological:  I've been using "key" instead of "candidate key", and "key constraint" instead of "key".  But your last sentence is confusing:  I thought that is exactly what a candidate key is.  What is an example of a projection over some relation R having the same cardinality as R that is not a candidate key?