The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

ANNOUNCE: Rel version 3.013 now available!

Quote from Dave Voorhis on December 16, 2018, 10:37 am
Quote from Brian S on December 16, 2018, 5:27 am

Are you saying that REL doesn't support relation-valued attributes?

The logic of missing information is actually quite clear: (1) there is an n-ary predicate; (2) it has been asserted that there is an instantiation of that n-ary predicate for a particular key value; (3) it is not clear precisely which instantiation satisfies that n-ary predicate for that particular key value.

There is an existentially-quantified sentence that is true for each and every dependent component: there is one and only one x such that P(k, x).  This is the case regardless of whether it is known what that dependent component actually is.

Rel supports relation-valued attributes. They should not be used to re-invent SQL NULL.

I reject the notion that "missing information" needs to be recorded in a relational database. What does need to be recorded, in some cases, is the entirely non-missing fact that a particular attribute value is not within its usual domain.

For example, we may need to record the fact that a temperature reading did not find a certain value in degrees Celsius, and instead found a broken thermometer or defective sensor. What we actually record in a relational database are readings about the real (or a fictional, but notionally veridical) world, the domains of which can be wholly described by appropriate types.

A truly "missing" reading is not recorded, because by definition, it is missing. If we can record it, then by definition it is not missing.

Thus, "missing" can only be the absence of any record (a notion covered under the Closed World Assumption). Any present record is not missing.

Indeed, a record with "missing" information is a contradiction.

I disagree with your characterization of tuples as readings.  A tuple in an item master relation isn't a reading: it identifies a class of real-world objects which may not yet actually exist.  Until the first article has been produced, the item in question has no substance and exists only conceptually in the mind, and yet it can still appear in the database.

I disagree also with your characterization of the Closed World Assumption:  You can record what you know and you can record what you don't know, but under the Closed World Assumption, you can't record that you don't know what you don't know.

I think it's important to be able to record that there should be a value--even if it is not clear which one it is.  The presence of a tuple with a ZOO component asserts the fact that there should be a value, and a non-empty ZOO component declares which one it is.

The logic is in fact the logic of projection: "there is an x such that P(k, x)."  P(k, a) implies "there is an x such that P(k, x)," but it doesn't change the fact that P(k, a) and "there is an x such that P(k, x)" are distinct facts, and that "there is an x such that P(k, x)" can hold even if it is not clear whether P(k, a) holds, or P(k, b) holds, or P(k, c) holds.  This is the essence of missing information.  Whether information is in fact missing is distinct from the reason it is missing.  Your example above conflates the two concepts.

Quote from Brian S on December 22, 2018, 3:59 am
Quote from Dave Voorhis on December 16, 2018, 10:37 am
Quote from Brian S on December 16, 2018, 5:27 am

Are you saying that REL doesn't support relation-valued attributes?

The logic of missing information is actually quite clear: (1) there is an n-ary predicate; (2) it has been asserted that there is an instantiation of that n-ary predicate for a particular key value; (3) it is not clear precisely which instantiation satisfies that n-ary predicate for that particular key value.

There is an existentially-quantified sentence that is true for each and every dependent component: there is one and only one x such that P(k, x).  This is the case regardless of whether it is known what that dependent component actually is.

Rel supports relation-valued attributes. They should not be used to re-invent SQL NULL.

I reject the notion that "missing information" needs to be recorded in a relational database. What does need to be recorded, in some cases, is the entirely non-missing fact that a particular attribute value is not within its usual domain.

For example, we may need to record the fact that a temperature reading did not find a certain value in degrees Celsius, and instead found a broken thermometer or defective sensor. What we actually record in a relational database are readings about the real (or a fictional, but notionally veridical) world, the domains of which can be wholly described by appropriate types.

A truly "missing" reading is not recorded, because by definition, it is missing. If we can record it, then by definition it is not missing.

Thus, "missing" can only be the absence of any record (a notion covered under the Closed World Assumption). Any present record is not missing.

Indeed, a record with "missing" information is a contradiction.

I disagree with your characterization of tuples as readings.  A tuple in an item master relation isn't a reading: it identifies a class of real-world objects which may not yet actually exist.  Until the first article has been produced, the item in question has no substance and exists only conceptually in the mind, and yet it can still appear in the database.

I disagree also with your characterization of the Closed World Assumption:  You can record what you know and you can record what you don't know, but under the Closed World Assumption, you can't record that you don't know what you don't know.

I think it's important to be able to record that there should be a value--even if it is not clear which one it is.  The presence of a tuple with a ZOO component asserts the fact that there should be a value, and a non-empty ZOO component declares which one it is.

The logic is in fact the logic of projection: "there is an x such that P(k, x)."  P(k, a) implies "there is an x such that P(k, x)," but it doesn't change the fact that P(k, a) and "there is an x such that P(k, x)" are distinct facts, and that "there is an x such that P(k, x)" can hold even if it is not clear whether P(k, a) holds, or P(k, b) holds, or P(k, c) holds.  This is the essence of missing information.  Whether information is in fact missing is distinct from the reason it is missing.  Your example above conflates the two concepts.

Identifying a class of real-world objects which may not yet actually exist is just as much capturing data -- record keeping, or taking readings -- about the real (or a fictional) world as any other. If you can write it down on paper -- which you can certainly do with a class of real-world objects which may not yet actually exist -- then it is a reading, even if it is a reading of mind, so to speak.

If you can record what you don't know, then it isn't missing information. It's a record of what you do know, like the thermometer is broken rather than emitting Celsius temperature values. At that point, it's notionally equivalent whether you use a ZOO attribute is used to record the attempted temperature reading -- and you presumably throw away the fact that the thermometer is broken, perhaps because you only care about Celsius temperature values -- or you use a tagged union to record the specific fact that the thermometer is broken. Which approach you choose is a matter of programming language ergonomics and meeting requirements. Conceptually, the same information is recorded (or not recorded) in either approach. There is no logical distinction.

Regarding your last paragraph, if you can record "missing information" then it is, by definition, not missing. Describing it as "missing" is a mistake. I am not conflating non-missing information with missing information because the latter is an oxymoron, a self-contradiction. There is either recorded information or there is no record at all. Recording what you don't know is never "missing information"; it's a record of what you do know.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on December 22, 2018, 11:02 am
Quote from Brian S on December 22, 2018, 3:59 am
Quote from Dave Voorhis on December 16, 2018, 10:37 am
Quote from Brian S on December 16, 2018, 5:27 am

Are you saying that REL doesn't support relation-valued attributes?

The logic of missing information is actually quite clear: (1) there is an n-ary predicate; (2) it has been asserted that there is an instantiation of that n-ary predicate for a particular key value; (3) it is not clear precisely which instantiation satisfies that n-ary predicate for that particular key value.

There is an existentially-quantified sentence that is true for each and every dependent component: there is one and only one x such that P(k, x).  This is the case regardless of whether it is known what that dependent component actually is.

Rel supports relation-valued attributes. They should not be used to re-invent SQL NULL.

I reject the notion that "missing information" needs to be recorded in a relational database. What does need to be recorded, in some cases, is the entirely non-missing fact that a particular attribute value is not within its usual domain.

For example, we may need to record the fact that a temperature reading did not find a certain value in degrees Celsius, and instead found a broken thermometer or defective sensor. What we actually record in a relational database are readings about the real (or a fictional, but notionally veridical) world, the domains of which can be wholly described by appropriate types.

A truly "missing" reading is not recorded, because by definition, it is missing. If we can record it, then by definition it is not missing.

Thus, "missing" can only be the absence of any record (a notion covered under the Closed World Assumption). Any present record is not missing.

Indeed, a record with "missing" information is a contradiction.

I disagree with your characterization of tuples as readings.  A tuple in an item master relation isn't a reading: it identifies a class of real-world objects which may not yet actually exist.  Until the first article has been produced, the item in question has no substance and exists only conceptually in the mind, and yet it can still appear in the database.

I disagree also with your characterization of the Closed World Assumption:  You can record what you know and you can record what you don't know, but under the Closed World Assumption, you can't record that you don't know what you don't know.

I think it's important to be able to record that there should be a value--even if it is not clear which one it is.  The presence of a tuple with a ZOO component asserts the fact that there should be a value, and a non-empty ZOO component declares which one it is.

The logic is in fact the logic of projection: "there is an x such that P(k, x)."  P(k, a) implies "there is an x such that P(k, x)," but it doesn't change the fact that P(k, a) and "there is an x such that P(k, x)" are distinct facts, and that "there is an x such that P(k, x)" can hold even if it is not clear whether P(k, a) holds, or P(k, b) holds, or P(k, c) holds.  This is the essence of missing information.  Whether information is in fact missing is distinct from the reason it is missing.  Your example above conflates the two concepts.

Identifying a class of real-world objects which may not yet actually exist is just as much capturing data -- record keeping, or taking readings -- about the real (or a fictional) world as any other. If you can write it down on paper -- which you can certainly do with a class of real-world objects which may not yet actually exist -- then it is a reading, even if it is a reading of mind, so to speak.

If can record what you don't know, then it isn't missing information. It's a record of what you do know, like the thermometer is broken rather than emitting Celsius temperature values. At that point, it's notionally equivalent whether you use a ZOO attribute is used to record the attempted temperature reading -- and you presumably throw away the fact that the thermometer is broken, perhaps because you only care about Celsius temperature values -- or you use a tagged union to record the specific fact that the thermometer is broken. Which approach you choose is a matter of programming language ergonomics and meeting requirements. Conceptually, the same information is recorded (or not recorded) in either approach. There is no logical distinction.

Regarding your last paragraph, if you can record "missing information" then it is, by definition, not missing. Describing it as "missing" is a mistake. I am not conflating non-missing information with missing information because the latter is an oxymoron, a self-contradiction. There is either recorded information or there is no record at all. Recording what you don't know is never "missing information"; it's a record of what you do know.

I don't see it that way: what is being recorded is that there is a particular value from a specific domain that satisfies the predicate for a given key value.  It's just not clear which particular value satisfies the predicate for that given key value when the ZOO component is empty.  It is known, however, by the user inserting the tuple that there is one.

I think it is a mistake to have multiple relvars that employ the same predicate symbol.  Why?  Because the scope of application of the closed world assumption is each relvar, and as a result, the presence of a tuple in another relvar with the predicate, "there is an x such that P(k, x)...," is a clear contradiction of what can be deduced from the absence of a tuple in a relvar with the predicate P(k, x) under the closed world assumption.  It also negatively affects the semantics of projection:  The projection over k on the relvar with predicate P(k, x) no longer has the predicate "there is an x such that P(k, x)" under the closed world assumption."  The absence of a tuple in that projection for a given k no longer implies that there isn't an x such that P(k, x).  The user has to also look elsewhere in the database to find that out.

It is better, therefore, to house instances of P(k, x) and "there is an x such that P(k, x)" in the same relvar.

Quote from Brian S on December 28, 2018, 12:10 pm
Quote from Dave Voorhis on December 22, 2018, 11:02 am
Quote from Brian S on December 22, 2018, 3:59 am

 

I think it's important to be able to record that there should be a value--even if it is not clear which one it is.  The presence of a tuple with a ZOO component asserts the fact that there should be a value, and a non-empty ZOO component declares which one it is.

The logic is in fact the logic of projection: "there is an x such that P(k, x)."  P(k, a) implies "there is an x such that P(k, x)," but it doesn't change the fact that P(k, a) and "there is an x such that P(k, x)" are distinct facts, and that "there is an x such that P(k, x)" can hold even if it is not clear whether P(k, a) holds, or P(k, b) holds, or P(k, c) holds.  This is the essence of missing information.  Whether information is in fact missing is distinct from the reason it is missing.  Your example above conflates the two concepts.

 

I don't see it that way: what is being recorded is that there is a particular value from a specific domain that satisfies the predicate for a given key value.  It's just not clear which particular value satisfies the predicate for that given key value when the ZOO component is empty.  It is known, however, by the user inserting the tuple that there is one.

I think it is a mistake to have multiple relvars that employ the same predicate symbol.  Why?  Because the scope of application of the closed world assumption is each relvar, and as a result, the presence of a tuple in another relvar with the predicate, "there is an x such that P(k, x)...," is a clear contradiction of what can be deduced from the absence of a tuple in a relvar with the predicate P(k, x) under the closed world assumption.  It also negatively affects the semantics of projection:  The projection over k on the relvar with predicate P(k, x) no longer has the predicate "there is an x such that P(k, x)" under the closed world assumption."  The absence of a tuple in that projection for a given k no longer implies that there isn't an x such that P(k, x).  The user has to also look elsewhere in the database to find that out.

It is better, therefore, to house instances of P(k, x) and "there is an x such that P(k, x)" in the same relvar.

If you are facing a design situation where "known-ness of x" is a relevant/distinguishing property then your predicates must express that concept of "known-ness" somewhere somehow.  Yours don't.

Quote from Erwin on December 29, 2018, 1:28 pm
Quote from Brian S on December 28, 2018, 12:10 pm

 

P(k, a) and "there is an x such that P(k, x)" are distinct facts

I think it is a mistake to have multiple relvars that employ the same predicate symbol.

It is better, therefore, to house instances of P(k, x) and "there is an x such that P(k, x)" in the same relvar.

 

Besides, I think it is a mistake to use the same predicate symbol for distinct predicates.

Quote from Erwin on December 29, 2018, 1:28 pm
Quote from Brian S on December 28, 2018, 12:10 pm
Quote from Dave Voorhis on December 22, 2018, 11:02 am
Quote from Brian S on December 22, 2018, 3:59 am

 

I think it's important to be able to record that there should be a value--even if it is not clear which one it is.  The presence of a tuple with a ZOO component asserts the fact that there should be a value, and a non-empty ZOO component declares which one it is.

The logic is in fact the logic of projection: "there is an x such that P(k, x)."  P(k, a) implies "there is an x such that P(k, x)," but it doesn't change the fact that P(k, a) and "there is an x such that P(k, x)" are distinct facts, and that "there is an x such that P(k, x)" can hold even if it is not clear whether P(k, a) holds, or P(k, b) holds, or P(k, c) holds.  This is the essence of missing information.  Whether information is in fact missing is distinct from the reason it is missing.  Your example above conflates the two concepts.

 

I don't see it that way: what is being recorded is that there is a particular value from a specific domain that satisfies the predicate for a given key value.  It's just not clear which particular value satisfies the predicate for that given key value when the ZOO component is empty.  It is known, however, by the user inserting the tuple that there is one.

I think it is a mistake to have multiple relvars that employ the same predicate symbol.  Why?  Because the scope of application of the closed world assumption is each relvar, and as a result, the presence of a tuple in another relvar with the predicate, "there is an x such that P(k, x)...," is a clear contradiction of what can be deduced from the absence of a tuple in a relvar with the predicate P(k, x) under the closed world assumption.  It also negatively affects the semantics of projection:  The projection over k on the relvar with predicate P(k, x) no longer has the predicate "there is an x such that P(k, x)" under the closed world assumption."  The absence of a tuple in that projection for a given k no longer implies that there isn't an x such that P(k, x).  The user has to also look elsewhere in the database to find that out.

It is better, therefore, to house instances of P(k, x) and "there is an x such that P(k, x)" in the same relvar.

If you are facing a design situation where "known-ness of x" is a relevant/distinguishing property then your predicates must express that concept of "known-ness" somewhere somehow.  Yours don't.

I disagree.  There is no need for an "it is known to be the case that ..." modality.  The instantiation of the predicate having just the components of a key as parameters implies that there is exactly one instantiation of each predicate having a proper superset of the components of a key--even if it isn't clear which values actually instantiate the predicate.  It would therefore be redundant to state "it is known to be the case that Pka(k, a)" in addition to stating "Pka(k, a)."   The sentence,

∀k [Pk(k) → ∃1a Pka(k, a)].

is a logical consequence of the structure of a relvar having attributes k and a such that k → a.  The existence of predicates Pk and Pka is also a logical consequence of the functional dependency.  This is the case regardless of whether a value has been supplied for each a.  The modality you're suggesting is therefore superfluous.  The database represents a collection of definite facts.  Indefinite information is not directly represented by values in the database, but is instead implied by the instantiation of predicates like Pk.  For example, given the FD EMP# -> SSAN, and a tuple containing EMP#(1234), we can deduce that the employee having EMP#(1234) has exactly one social security number, even if no SSAN value has been supplied yet.  The presence of an explicit SSAN value for a given employee doesn't alter the fact that the employee in question has exactly one social security number.  Moreover, the fact that the employee in question has exactly one social security value combines with the fact that the employee has a specified social security number to deny all other instantiations of the predicate "employee EMP# has social security number SSAN" for that employee.  It is not just the closed world assumption at work here, but rather the implication that is a logical consequence of the functional dependency.

Given, P(a), and there is one and only one x such that P(x), one can conclude that forall x, P(x) -> x = a.

Brian

Quote from Hugh on December 8, 2018, 12:56 pm

I showed these outer join bullets to Chris Date:

- Rel: Added p LEFT JOIN q which returns
TUPLE {Matched p JOIN q, Missing p NOT MATCHING q}

- Rel: Added p RIGHT JOIN q which returns
TUPLE {Matched p JOIN q, Missing q NOT MATCHING p}

- Rel: Added p FULL JOIN q which returns
TUPLE {
Matched p JOIN q,
MissingLeft p NOT MATCHING q,
MissingRight q NOT MATCHING p
}

Sorry, I'm not seeing any sense in which these are "outer joins". If you're aiming to include them in a Tutorial language, you're just going to confuse beginners. (Several people make similar comments in this thread and in the preceding thread/question from asulling.)

 

To start from asulling's example, I might want employee outer joined with phone number, with email address, with emergency contact person, ... With this Rel alleged outer join, I can't compose the result straightforwardly with another outer join. Furthermore hard-coding the attribute names will quickly get really, really annoying.

Now there might be several phone numbs, contacts, etc. Then SQL's outer join is also dis-ergonomic: it'll cope OK(-ish) with a 'missing' email address, but cross-multiply the other auxiliary tables.

You're introducing RVAs in this so-called Join. Then why not go whole hog and produce a relation with RVAs? (This is straight from one of the DBE chapters, as somebody also suggested.)

 

Employee OUTER JOIN { Phones := Phone, EMails := Email, Contacts := Contact }

(I've used attribute names as plurals there, to show they're RVAs. I'm assuming the aux tables have an Inclusion Dependency to Employee on ENum, and are of cardinality 0-or-many:1.)

The desugarring would use Chris's shiny gizmo:

Employee EXTEND { Phones := IMAGE_IN(Phone), EMails := IMAGE_IN(Email), Contacts := IMAGE_IN(Contact) }

Now that would be hard to define as a generic operator (because Tutorial D doesn't have enough first-classness). Whereas what's gone into Rel could straghtforwardly be user-defined, if anybody wants it (which I doubt, quite frankly).

 

 FULL OUTER JOIN { Suppliers := S, Parts := P }

desugars to

UNION{ S{City}, P{City} } OUTER JOIN { Suppliers := S, Parts := P }

That is: project the operands of FULL OUTER JOIN onto their common attributes; UNIONthe projections to give a core of keys; OUTER JOIN the operands to the core.

I've written the FULL OUTER JOIN in n-adic form. That's chiefly so I could use Extend/attribute naming syntax (which would be awkward with infix form). n-adic is a dodgy move because FULL OUTER JOIN is not associative (in general -- neither is there an identity value), then I can't reduce the implementation to iterating a dyadic operator. OTOH, full outer joins I've seen in the wild usually are associative. There could perhaps be validation to say: the attributes in common across all the operands must be exactly the attributes in common between each pair of operands. Then

 FULL OUTER JOIN { Suppliers := S, Parts := P, Shipments := SP }         // rejected

 FULL OUTER JOIN{ Cities := FULL OUTER JOIN { Suppliers := S, Parts := P }, Shipments := SP }          // rejected

That latter is rejected because for the outer outer join there's no attributes in common. (Or we join on a core of Dee, which'll not give what anybody might expect.)

 

AntC

 

Quote from AntC on February 13, 2019, 12:26 am

... compose ...

Oh, in case it wasn't clear, the 'n-adic' form of OUTER JOIN is equivalent to composing a 'dyadic' form.

Employee OUTER JOIN { Phones := Phone, EMails := Email, Contacts := Contact }

Equivalent to

(( Employee OUTER JOIN { Phones := Phone } )
OUTER JOIN
{ EMails := Email } )
OUTER JOIN { Contacts := Contact }

(Scare quotes round those 'adic's, because OUTER JOIN is not a genuine operator.)

 

Also I saw a comment that the Rel alleged outer join results are information-equivalent to the two input relations. I'm dubious: you can only recover the input relation values if you know what were the attributes in common. And that's information lost by the operation/not held as a value in an attribute.

To the extent it's a valid observation, then my forms are also info-equiv to the inputs. Again you need to know which were the attributes in common (which might be different attribute-sets for different operands of the n-adic form).

AntC

 

Quote from AntC on February 13, 2019, 12:26 am
Quote from Hugh on December 8, 2018, 12:56 pm

I showed these outer join bullets to Chris Date:

- Rel: Added p LEFT JOIN q which returns
TUPLE {Matched p JOIN q, Missing p NOT MATCHING q}

- Rel: Added p RIGHT JOIN q which returns
TUPLE {Matched p JOIN q, Missing q NOT MATCHING p}

- Rel: Added p FULL JOIN q which returns
TUPLE {
Matched p JOIN q,
MissingLeft p NOT MATCHING q,
MissingRight q NOT MATCHING p
}

Sorry, I'm not seeing any sense in which these are "outer joins". If you're aiming to include them in a Tutorial language, you're just going to confuse beginners.

It doesn't appear to have confused anyone so far, particularly as it was introduced mainly to solve a practical use case for which it serves well.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I know Chris Date was enthusiastic about these additions, but I regret to say that I didn't alert him to the response (I think it was Brian Selzer's) pointing out that by returning a tuple rather than an SQL-style pseudo relation you fail to allow an outer join expression to be an operand to another outer join.

I had noted that the user can easily put RELATION{...} around the tuple expression but I'm not sure that helps a great deal.

Hugh

 

Coauthor of The Third Manifesto and related books.