The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Are inclusion dependencies reducible to foreign-key dependencies?

Quote from johnwcowan on October 2, 2019, 12:56 am
Quote from Brian S on October 1, 2019, 3:12 pm

 

No, relations don't have keys.  Relations are values.  The same relation can be assigned to multiple relation variables--each of which may have a different key constraint.  Also, just because a projection over a subset of the heading of a relvar has the same cardinality as the relation in that relvar doesn't imply that that particular subset of the heading is a candidate key.

I grant that most of my problem is terminological:  I've been using "key" instead of "candidate key", and "key constraint" instead of "key".  But your last sentence is confusing:  I thought that is exactly what a candidate key is.  What is an example of a projection over some relation R having the same cardinality as R that is not a candidate key?

It's been said a number of times before in this thread, and obviously lots of times in preceding discussions : a key is declared on a variable, not on a value.  I.e. it's the variable that "has" the key, in the sense that any assignment to the variable will check for satisfaction of the key [constraint], and the relation, i.e. the assigned value, ***satisfies*** the key [constraint] (or not).  The value ***satisfying*** the key [constraint] is a precondition for the assignment to succeed to a variable that ***has*** the key [and corresponding constraint].

Of course one could agree to use the term "relation has a key" iff it satisfies that key, but as this entire thread, and many before it, shows, doing that ***increases*** the confusion instead of addressing/eliminating it.

(and yet iow : a key is just a set of attributes and when declared to apply to a variable, it gives rise to a corresponding database constraint.  A projection of a relation yields(/is) a relation, and a relation is not "just a set of attributes".)

Quote from Erwin on October 2, 2019, 9:54 pm

It's been said a number of times before in this thread, and obviously lots of times in preceding discussions : a key is declared on a variable, not on a value.

It seems to follow from this, then, that a relation cannot be in nth normal form for some n, but only a relvar.  Is that right?  Unfortunately the word relation is used ambiguously outside the TTM community, so I can't really tell.

Quote from johnwcowan on October 8, 2019, 7:49 pm
Quote from Erwin on October 2, 2019, 9:54 pm

It's been said a number of times before in this thread, and obviously lots of times in preceding discussions : a key is declared on a variable, not on a value.

It seems to follow from this, then, that a relation cannot be in nth normal form for some n, but only a relvar.  Is that right?

I think the term used for what's in nth normal form is a (relation or database) 'schema'. Outside of TTM amongst relational model/normalisation theorists, the term 'relvar' is not much used. Note nth normal form is relative to a set of FDs, so "only a relvar" isn't really right. RM VSS 2 says keys can/should be inferred for Relational Expressions as if they were the defining expressions for (virtual) relvars.

  Unfortunately the word relation is used ambiguously outside the TTM community, so I can't really tell.

Outside of TTM amongst the profession, relation is often used (wrongly) to mean table (something not necessarily a set), with no clear distinction between variables (base table) vs bag-of-rows value. But yes keys are what's declared against a base table. SQL does not follow RM VSS 2 in expecting the engine to infer keys for expressions (although many engines do so, for optimisations).

Quote from AntC on October 8, 2019, 10:15 pm
Quote from johnwcowan on October 8, 2019, 7:49 pm
Quote from Erwin on October 2, 2019, 9:54 pm

It's been said a number of times before in this thread, and obviously lots of times in preceding discussions : a key is declared on a variable, not on a value.

It seems to follow from this, then, that a relation cannot be in nth normal form for some n, but only a relvar.  Is that right?

... RM VSS 2 says keys can/should be inferred for Relational Expressions as if they were the defining expressions for (virtual) relvars.

If we want to go beyond taking au pied de la lettre of TTM, and your D is equipped with a rich type system that supports 'phantom types' and/or you're embedding D-ness in such a language (now available in Scala, Rust, Elm and all sorts) ...

RM VSS 2 doesn't exclude that your Relational Expression might be a relation literal. Then we can infer what key dependencies hold from the data content. Then we can take any relational content, express it equivalently as a relation literal expression, then infer the key(s) that hold. In general we might find more keys hold for the happenstance of content than would be inferred purely from the query expression that produced that content at a point in time.

From here on I'm going to talk about Functional Dependencies that hold (and/or that can be inferred), rather than about keys. We can always infer keys from knowing the FDs, but not necessarily v.v. For example take (S JOIN SP): its inferred key(s) is(are) {{S#, P#}} but saying merely that loses information that FD {S#} -> {SNAME, CITY, STATUS} also holds, as inherited from the key for S.

Given that RM Pre 10 says "A relation value r (relation for short) consists of a heading and a body," and that we generally interpret the heading part as being expressed in the type system, I want to augment: A relation value consists of a heading, a set of Functional Dependencies and a body. The heading + FDs together constitute the schema for that relation value/or for that relvar (if the value is a current value of a relvar). The heading is merely the type of the tuples in the body (which might be empty); the FDs are a phantom type. Then note there's no type-theoretical difficulty in assigning a value with a more-specific phantom type to a variable (relvar) declared with a less-specific type. (Liskov Substitution applies.)

 

 

Quote from johnwcowan on October 8, 2019, 7:49 pm
Quote from Erwin on October 2, 2019, 9:54 pm

It's been said a number of times before in this thread, and obviously lots of times in preceding discussions : a key is declared on a variable, not on a value.

It seems to follow from this, then, that a relation cannot be in nth normal form for some n, but only a relvar.  Is that right?  Unfortunately the word relation is used ambiguously outside the TTM community, so I can't really tell.

Yes, to say so (that a relation is in xNF) is a contribution to perpetuating the conflation between variable and value.  And yes, outside of the TTM community there is not much interest in trying to make do with that particular conflation and yes, that is utterly unfortunate.  What I see on the db noob forums is, however, somewhat better in the sense that the question asked is often whether their ***designs*** are xNF.  Which at least is in line with Anthony's remark that the property of being in xNF applies to a ***schema*** (plus F/J Ds).

Quote from AntC on October 9, 2019, 4:13 am

Given that RM Pre 10 says "A relation value r (relation for short) consists of a heading and a body," and that we generally interpret the heading part as being expressed in the type system, I want to augment: A relation value consists of a heading, a set of Functional Dependencies and a body. The heading + FDs together constitute the schema for that relation value/or for that relvar (if the value is a current value of a relvar). The heading is merely the type of the tuples in the body (which might be empty); the FDs are a phantom type. Then note there's no type-theoretical difficulty in assigning a value with a more-specific phantom type to a variable (relvar) declared with a less-specific type. (Liskov Substitution applies.)

I disagree.  A relation value consists of a heading and a body.  Consider this,  Does the fact that 11 is a prime number mean that 11 consists of being a prime number?  No.  It merely satisfies the conditions that govern whether a number is prime.

From a theoretical standpoint, a relation consists only of a body, which is merely a set of tuples that have identical attributes.  The TTM definition poses some problems--especially when combined with the IM.  For instance:

ODD: INT | (X \ 2) * 2 != X

EVEN: INT | (X \ 2) * 2 == X

P {A ODD, B EVEN}, Q{A EVEN, B ODD}, R{A INT, B INT}

The comparison P == Q must valid under the IM, because the empty relation {A INT, B INT}{} can be assigned to all three of the relation variables.

How do I know that:

DELETE P; R := P;

The assignment principle requires that R == P holds.

Likewise,

DELETE Q; R := Q;

The assignment principle requires that R == Q holds.

It follows that P == Q must also hold whenever both P and Q are empty.

One could also argue that the actual type of the empty relation isn't {A INT, B INT}, but rather {A ALPHA, B ALPHA}, so if S{A CHAR, B FLOAT} is a relation variable, then P == S must also hold whenever both P and S are empty.

So static type checking under the IM is therefore problematic: a relation expression of type {A INT} can be compared to a relation expression of type {A CHAR}, and are identical whenever both expressions yield empty relations.

Brian

Quote from Brian S on October 12, 2019, 5:38 pm
Quote from AntC on October 9, 2019, 4:13 am

Given that RM Pre 10 says "A relation value r (relation for short) consists of a heading and a body," and that we generally interpret the heading part as being expressed in the type system, I want to augment: A relation value consists of a heading, a set of Functional Dependencies and a body. The heading + FDs together constitute the schema for that relation value/or for that relvar (if the value is a current value of a relvar). The heading is merely the type of the tuples in the body (which might be empty); the FDs are a phantom type. Then note there's no type-theoretical difficulty in assigning a value with a more-specific phantom type to a variable (relvar) declared with a less-specific type. (Liskov Substitution applies.)

I disagree.  A relation value consists of a heading and a body.

I don't quite get what your disagreeing with. You've restated RM Pre 10. I'm not disagreeing that's what RM Pre 10 says, nor that TTM proceeds from that. I interpret: the heading is the type of a value; the body is the value (which might be an empty set). I think the wording in RM Pre 10 is hugely unfortunate (even though I used it above, for consistency), and I have complained about it at length to the point I was told to shut up. Your post seems to be another demonstration of how unfortunate it is.

A phantom type (which I am proposing to capture the schema of some value, and to support 'schema-inference' if you will) is also a type of some value; it is not somehow apart from or independent of the value.

  Consider this,  Does the fact that 11 is a prime number mean that 11 consists of being a prime number?  No.  It merely satisfies the conditions that govern whether a number is prime.

Strictly speaking, TTM is about how to represent and manipulate values within a data structure and programming language. Your "11" seems to be in some abstract Platonic world, divorced from representation or programming. We only need consider what-if the example was some 23-digit monster integer, to see the benefit (within a programming language) of accompanying values with 'schema' (primeness, in this example) for tractable type inference.

From a theoretical standpoint, a relation consists only of a body, which is merely a set of tuples that have identical attributes.  The TTM definition poses some problems--especially when combined with the IM.

The rest of your post seems to be a complaint about the IM. (And you're merely back on one of your pet peeves, about which you've also been told to shut up.) I won't disagree: I think the IM is an abomination; I have nothing to do with it.

 

 

Quote from Brian S on October 12, 2019, 5:38 pm

The TTM definition poses some problems--especially when combined with the IM.  For instance:

ODD: INT | (X \ 2) * 2 != X

EVEN: INT | (X \ 2) * 2 == X

P {A ODD, B EVEN}, Q{A EVEN, B ODD}, R{A INT, B INT}

The comparison P == Q must valid under the IM, because the empty relation {A INT, B INT}{} can be assigned to all three of the relation variables.

I think you're layering a whole bunch of unnecessary detail. The IM either does or does not allow subtypes to be compared without an explicit conversion. The rest follows. Embedding values into tuples and relations and giving them conflicting names is irrelevant to the issue. ODD and EVEN are different types with a common super-type. The IM either considers comparisons between values of those types to be valid or not. It's a matter of fact, not opinion.

[I have no idea. The IM is a thing of mystery to me, but Dave and Hugh should certainly be able to say yea or nay.]

Andl - A New Database Language - andl.org
Quote from dandl on October 12, 2019, 11:43 pm
Quote from Brian S on October 12, 2019, 5:38 pm

The TTM definition poses some problems--especially when combined with the IM.  For instance:

ODD: INT | (X \ 2) * 2 != X

EVEN: INT | (X \ 2) * 2 == X

P {A ODD, B EVEN}, Q{A EVEN, B ODD}, R{A INT, B INT}

The comparison P == Q must valid under the IM, because the empty relation {A INT, B INT}{} can be assigned to all three of the relation variables.

I think you're layering a whole bunch of unnecessary detail. The IM either does or does not allow subtypes to be compared without an explicit conversion. The rest follows. Embedding values into tuples and relations and giving them conflicting names is irrelevant to the issue. ODD and EVEN are different types with a common super-type. The IM either considers comparisons between values of those types to be valid or not. It's a matter of fact, not opinion.

[I have no idea. The IM is a thing of mystery to me, but Dave and Hugh should certainly be able to say yea or nay.]

Per my recollection, that's an essay question, though as if I recall correctly, Date (at least) felt that it should be allowable to compare, say, INT to CHAR.

But given an expression like 3 = "blah" I'd argue that it should obviously and unconditionally be a static type mismatch under the IM, despite INT and CHAR being subtypes of ALPHA. In Rel, it throws an error.

This is similar to, say, Java, where attempts to compare (using the '==' operator) an Integer to a String fails as a type mismatch, despite Integer and String both being subclasses of Object. But note that '==' is an object identity comparison, not a value comparison. The result of new Integer(3) == new Integer(3) is false! (Which, per identity comparison, is true.)

However, Java allows you to do value comparisons using the equals method. The comparison new Integer(3).equals(new String("blah")) is statically valid and returns false. The comparison new Integer(3).equals(new Integer(3)) returns true.

Java's approach, though notionally obedient to the implications of a class hierarchy rooted in Object (and perhaps unavoidable given how the language works) -- which is notionally isomorphic to an IM subtype hierarchy rooted in ALPHA -- is weak. I'd argue that the same weakness applies to an IM that allows comparisons of INT to CHAR. It weakens type safety, and shouldn't be allowed.

That said, I wouldn't categorically preclude such comparisons, only require that they be made explicit. I.e., if you wish to be able to compare INT to CHAR, you are required to define an implementation of the '=' operator specific to INT and CHAR operands. That's how Rel works.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

More or less as I would expect.

You're in danger of muddying the waters by mentioning Java. In Java (as in C#) strings are reference types except when they aren't. The equals operator by default compares for identity (so objects never compare equal unless they're the 'very same object') except when they're strings. And equals() is not the same '==' and '!=' operator, and compare() and compareTo() are different again (or was that C#?). No matter. [C++ is better because it's less 'helpful'.]

My preference is marginally to treat comparisons like that as valid but constant, and issue a warning. By all means compare equal a whosit to a fizzbug, but it will always be false, the code does nothing, and the compiler might as well pick that up and tell you about it.

But then what should we do if we compare a INT99 (range 0..99) to a INT9 (range 0..9)? It seems reasonable that if they share a common supertype they can be compared safely as values of that supertype. I don't think that rule runs foul of the ALPHA situation in the IM, does it?

Andl - A New Database Language - andl.org