The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Are inclusion dependencies reducible to foreign-key dependencies?

By inclusion dependency I mean this: given two relvars RV1 and RV2 and a set of attributes {A, B, C, ...} in common, the projection of the value of RV1 onto {A, B, C, ...} is a subset of the projection of the value of RV2 onto {A, B, C, ...}.

By foreign-key dependency, I mean an inclusion dependency such that {A, B, C, ...} is a superkey of RV2.

Is it enough to provide foreign-key dependencies to achieve the purposes of arbitrary inclusion dependencies, given that RV1 and RV2 may be factored?

 

I hope somebody knows.

 

Have you seen Chapter 13 "Inclusion Dependencies and Foreign Keys" of Database Explorations by Date & Darwen?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

No, but thanks for the pointer.

Short answer (as usual) is that SQL FOREIGN KEYs are not adequate. But since that's all that's available from most vendors, not a lot of people know that.

An obvious counter-example is: every Order header must have at least one Order detail/line. That can be expressed as an IND, but not a Foreign Key.

The Chapter Dave points to explains it all. In TTM-land we hardly ever mention Foreign Key constraints/dependencies; prefer Inclusion Dependencies.

The short answer is: No.

First of all, a foreign key always references a candidate key, not a superkey.  An inclusion dependency that references a superkey implies that the referencing relvar exhibits redundancy.  The same information can be obtained by simply joining the referencing and referenced relvars.

Second, when there is a foreign key constraint, there is a simple implication between each the facts represented by a referencing tuple and each of the facts represented by the referenced tuple.  When there is an inclusion dependency that does not reference a superkey, that is not the case.  When you take a projection, each tuple in the projection represents an existentially quantified sentence, but a projection over a superkey has the same arity as the original relation.  As a result, for each tuple in the referencing relation, there is exactly one tuple in the referenced relation, and the existential quantifiers can therefore be eliminated, resolving to simple implications between the facts represented by each referencing tuple and the facts represented by its corresponding referenced tuple.   That is not the case in general for inclusion dependencies, however, instead, the facts represented by each referencing tuple imply only that there are corresponding facts in the referenced relation, but they may not all be instantiations of the same tuple.  Existential quantification over a finite domain resolves into an iterated OR.  If a tuple in the referencing relation represents two facts, X and Y, and if each tuple in the referenced relation represents three facts, and there are two tuples having the referenced component, then there are six relevant facts, A1, A2, B1, B2, C1, C2, in the referenced relation.  The implications become

X IMPLIES (A1 OR A2), X IMPLIES (B1 OR B2), X IMPLIES (C1 OR C2), Y IMPLIES (A1 OR A2), Y IMPLIES (B1 OR B2), Y IMPLIES (C1 OR C2)

When a superkey is referenced, the disjunctions are eliminated, yielding a set of simple implications.

So, in answer to your question, the presence of disjunctive information that is inherent when an inclusion dependency does not reference a key or superkey eliminates the possibility of using simple foreign key constraints to represent arbitrary inclusion dependencies.

 

Brian

Well, that's two No's and one qualified Yes, the Yes being Date's in "Inclusion Dependencies and Foreign Keys".  At the bottom of p. 227 (PDF p. 245), he says that provided the target (right-hand) relation is allowed to be an arbitrary relational expression rather than just a relvar, any IND is indeed reducible to a foreign-key declaration.  The example is CONSTRAINT IND1 P{CITY} ⊆ S{CITY}, which is equivalent to the following foreign-key definition on P: FOREIGN KEY FK1 {CITY}REFERENCES S{CITY}.  Unfortunately, there is not enough explanation to show me how to do this with the order-detail example.

To make this practical, of course, requires figuring out the keys of a relational expression based on the declared keys of the relvars it refers to.  Unfortunately "The Role of Functional Dependence in Query Decomposition" does not seem to be freely available online.  It might be simpler to limit the right-hand side to being a base or derived relvar, and allow keys to be declared on derived relvars, signaling a compile-time error if an inconsistency between the derived and the corresponding base relvars exists, since it is easier to check for an inconsistency than to infer a consistent solution.

The article, however, fundamentally comes from a different mindset than mine.  Its background assumption, which is made explicit in a few places, is that there already exists a mechanism for imposing a constraint on the database as a whole, no matter if it touches every tuple in the database or potentially takes a looooong time to run (consider CONSTRAINT ALL is_prime(R{P}), where P is an attribute of A whose type is integer).  I'd rather not provide such a thing and talk about what constraints (subdomain constraint, key constraints, foreign-key constraints, perhaps cardinality constraints

Quote from johnwcowan on September 21, 2019, 3:56 pm

Well, that's two No's and one qualified Yes, the Yes being Date's in "Inclusion Dependencies and Foreign Keys".  At the bottom of p. 227 (PDF p. 245), he says that provided the target (right-hand) relation is allowed to be an arbitrary relational expression rather than just a relvar, any IND is indeed reducible to a foreign-key declaration.  The example is CONSTRAINT IND1 P{CITY} ⊆ S{CITY}, which is equivalent to the following foreign-key definition on P: FOREIGN KEY FK1 {CITY}REFERENCES S{CITY}.

I'm feeling not inclined to read the fine print of Date's chapter. A Foreign Key in SQL AFAIR must reference a key in the target. wp's first paragraph says " furthermore that those [target] attributes must also be a candidate key in [target relvar] S." [citations include to Date's guide to SQL 1996; no reference to the DBE Chapter]. So that rules out ... REFERENCES S{CITY} because CITY is not a key to relvar S. Perhaps the definition of 'Foreign Key' in that Chapter is not using the SQL sense -- which would be entirely reasonable. Also and for the same reason AFAIR SQL targets can't be "arbitrary relational expression"s.

Unfortunately, there is not enough explanation to show me how to do this with the order-detail example.

If we allow arbitrary relational expressions -- in particular projections as targets:

CONSTRAINT IND2 Orders{OrderNo} ⊆ OrderLines{OrderNo}

OrderNo is not a key to OrderLines -- that would be {OrderNo, OrderLine} and/or {OrderNo, P#}.

Beware I'm expecting also a more conventional Foreign Key IND from OrderLines to Orders on OrderNo. So the only way to INSERT into Orders is by also INSERTing into OrderLines with a Multiple Assignment. (Which is probably another reason why SQL doesn't do that.)

To make this practical, of course, requires figuring out the keys of a relational expression based on the declared keys of the relvars it refers to.

Well no: the CONSTRAINT's Boolean expression must evaluate to TRUE at all times (as with all CONSTRAINTs). It doesn't rely on the notion of key at all -- which is why it's better called an 'Inclusion Dependency' than a 'Foreign Key'.

  Unfortunately "The Role of Functional Dependence in Query Decomposition" does not seem to be freely available online.  It might be simpler to limit the right-hand side to being a base or derived relvar, and allow keys to be declared on derived relvars, signaling a compile-time error if an inconsistency between the derived and the corresponding base relvars exists, since it is easier to check for an inconsistency than to infer a consistent solution.

The article, however, fundamentally comes from a different mindset than mine.  Its background assumption, which is made explicit in a few places, is that there already exists a mechanism for imposing a constraint on the database as a whole, no matter if it touches every tuple in the database or potentially takes a looooong time to run (consider CONSTRAINT ALL is_prime(R{P}), where P is an attribute of A whose type is integer).  I'd rather not provide such a thing and talk about what constraints (subdomain constraint, key constraints, foreign-key constraints, perhaps cardinality constraints

That's an implementer's concern, if I might say so. Whereas Dependencies/constraints in general are a matter of the logic/information structure inherent in the business domain of the data. Then the pragmatic compromise comes in designing the schema. For your is_prime example, possibly pre-build a relvar holding all the primes, and turn the CONSTRAINT into an IND.

Quote from AntC on September 22, 2019, 3:30 am
Quote from johnwcowan on September 21, 2019, 3:56 pm

Well, that's two No's and one qualified Yes, the Yes being Date's in "Inclusion Dependencies and Foreign Keys".  At the bottom of p. 227 (PDF p. 245), he says that provided the target (right-hand) relation is allowed to be an arbitrary relational expression rather than just a relvar, any IND is indeed reducible to a foreign-key declaration.  The example is CONSTRAINT IND1 P{CITY} ⊆ S{CITY}, which is equivalent to the following foreign-key definition on P: FOREIGN KEY FK1 {CITY}REFERENCES S{CITY}.

I'm feeling not inclined to read the fine print of Date's chapter. ...

Having said I wouldn't of course I did. Frankly that last CONCLUDING REMARKS section of the Chapter exhibits one of the 'features' of Date's writing that I find screamingly annoyingly. [Aside I should say I find many such screamingly annoying features in his writing, which is why I try to avoid it. End of aside] You should stick with the definitions he gives at the start of the Chapter, FOREIGN KEYS starting middle of p. 200. In which indeed being a key of the target is the criterion "To be specific, they[FKs]’re the special case [of INDs] in which the set of attributes on which the target relvar R1 is to be projected constitutes a key for that relvar."

That's consistent also with the definition of foreign key Date gives in his 'Relational Database Dictionary'.

So I'm happy to say CONSTRAINT IND1 P{CITY} ⊆ S{CITY} is an Inclusion Dependency. I'm not happy to say it's a Foreign Key constraint. Then p. 227's FOREIGN KEY FK1 {CITY}REFERENCES S{CITY} is talking speculatively about a possible Tutorial D 'shorthand' for declaring INDs. (Shorthand in scare quotes, because the proposed syntax is longer than the IND.)

Date's "Clearly, this IND isn’t a foreign key constraint as conventionally understood. But it can be expressed as one!" might be paraphrased as 'clearly a hen isn't a duck as conventionally understood. But if we redefine "duck" as anything that quacks or clucks it can be counted as one!'. Yes CITY can (trivially) be inferred as constituting the key for that "arbitrary relational expression" S{CITY}; no that doesn't convince me the IND is thereby a Foreign Key constraint.

I see a chunk of that Chapter is talking about COMPENSATORY ACTIONS (such as CASCADE) and IMPLEMENTATION ISSUES. And that's sensible: Foreign Key constraints (in the proper sense) are far more tractable for implementors than INDs-in-general. With that speculative alleged Foreign Key on CITY I've no idea what Compensatory Actions make sense.

 

Quote from AntC on September 22, 2019, 4:57 am
That's an implementer's concern, if I might say so.
Indeed.  But an implementer can't escape the Iron Law of Big-O unless a new algorithm is found, which doesn't happen every day.  I was rather annoyed at a recent job interview when I gave an O(n log n) solution to the problem on the whiteboard, and was then asked for an O(n) solution.   Such a thing does exist, but I thought it was unreasonable to ask me to discover it on the fly; I either knew it or I didn't, and I didn't.

For your is_prime example, possibly pre-build a relvar holding all the primes, and turn the CONSTRAINT into an IND.

Well, those distinguished Greeks kirie Euclid and kirie Diophantus might have a word to say to that idea.

Date's "Clearly, this IND isn’t a foreign key constraint as conventionally understood. But it can be expressed as one!" might be paraphrased as 'clearly a hen isn't a duck as conventionally understood. But if we redefine "duck" as anything that quacks or clucks it can be counted as one!'.

From a purely theoretical perspective, yes, of course: a thing is what it is defined to be, neither more nor less.  But is it such a big leap to extend the notion of "foreign-key constraint" from referring to a relvar on its right-hand side to referring to a relation?  If so, I don't see why.

I see a chunk of that Chapter is talking about COMPENSATORY ACTIONS (such as CASCADE) and IMPLEMENTATION ISSUES. And that's sensible: Foreign Key constraints (in the proper sense) are far more tractable for implementors than INDs-in-general. With that speculative alleged Foreign Key on CITY I've no idea what Compensatory Actions make sense.

That's another thing: a constraint is a constraint and a trigger is a trigger, and I don't see any benefit of having a special syntax for triggers that are notionally associated with certain (not all) constraints than for other triggers.  Indeed, triggers are a dodgy notion: it might well be better to have DELETE ... WITH CASCADE and UPDATE ... WITH CASCADE to specify that such far-reaching actions are to take place, where the ordinary notions of DELETE and UPDATE simply fail if a constraint is violated.  We write rm -rf dir to remove a file-system directory and all its contents: we do not have a mode bit on dir to say that rm dir will succeed even if it has contents,  much less a global rule on the whole file system that prescribes (according to some criteria) which directories are going to remove their contents and which will fail on an attempt to remove them.

 

Quote from johnwcowan on September 22, 2019, 12:50 pm
Quote from AntC on September 22, 2019, 4:57 am

Date's "Clearly, this IND isn’t a foreign key constraint as conventionally understood. But it can be expressed as one!" might be paraphrased as 'clearly a hen isn't a duck as conventionally understood. But if we redefine "duck" as anything that quacks or clucks it can be counted as one!'.

From a purely theoretical perspective, yes, of course: a thing is what it is defined to be, neither more nor less.

The term 'Foreign Key' (constraint) is well-established and well-defined and was so long before Date started interfering. And he should accept that definition and shut the f*** up. Persistently and perniciously he takes well-established terms and bends them. And then gets all hurt when people don't follow. There are umpteen examples of that through the old forum: I would point to "COMPOSE" which is not algebraicists' compose; to 'Image Relations' which are not mathematicians' images of a function; to 'type inheritance' which is not OOP inheritance; ...

  But is it such a big leap to extend the notion of "foreign-key constraint" from referring to a relvar on its right-hand side to referring to a relation?

It's a leap, given that the notion of Key and Foreign Key were already well-established. Then the appropriate approach for an educator is to give a new term to a new(-ish) concept. And that term is 'Inclusion Dependency'. And all is well. Stop there. But then Date rips the arse out of it (as my mother wouldn't say).

  If so, I don't see why.

Your "referring to a relation" above is wrong: it's not relation (values) that have keys, but relational expressions, keys derived from declared keys for relvars mentioned. Read RM VSS 2 more closely. By generalising from FOREIGN KEY ... REFERENCES <relvar> <key> (where we can statically verify that <key> is indeed a declared key to <relvar>) to arbitrary relational expressions as targets, we're then at the mercy of the complexity of the expression, and the ability of the compiler/query engine to infer keys. That generalising makes it an 'Inclusion Dependency'. And all is well. Stop there. I see no conceptual gain and a great deal of conceptual confusion as against well-established terminology in trying to argue that a hen can be considered a duck. Implementors will need a bunch of special-purpose rules to distinguish ducks-that-quack vs ducks-that-cluck IOW INDs whose target is a declared key of a relvar vs INDs whose target is an arbitrary relational expression.

People claim Date is a great educator, even though his contributions to theory may be slight. I find him a terrible educator.