The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Questions about some relational operator definitions and comments about the DIVIDEBY operation

In the Third Manifesto in the section titled RM PRESCRIPTION 18: RELATIONAL ALGEBRA, there is a definition of a DIVIDEBY relational operator with an example of a divide using the Suppliers example database. I repeat a portion of the example description below. I'm not including the sample database values.  The purpose of the example, however, is to demonstrate that the DIVIDEBY operation can handle the situation that results when the restriction operator returns no tuples.  In this case there are no purple parts in relvar P (nor in SP).

Consider the query “Get suppliers who supply every purple part” (the point of the example being that, given our usual sample data, the set of purple parts is empty). Using Codd’s divide, then, we might write:

     SP { S#, P# } DIVIDEBY ( P WHERE COLOR = COLOR('Purple') ) { P# }

But this expression misses suppliers who supply no parts at all—even though (logically speaking) such a supplier does supply every purple part, because, as is well known, “FORALL x (exp)” always evaluates to TRUE if there are no x’s, regardless of what the boolean expression exp happens to be. By contrast, the following expression (which involves a Small Divide) is guaranteed to give the right answer in all cases:

          WITH ( P WHERE COLOR = COLOR('Purple') ) AS PP :

S { S# } DIVIDEBY PP { P# } PER ( SP { S#, P# } )

This expression is defined to be shorthand for the following:

      WITH ( P WHERE COLOR = COLOR('Purple') ) AS PP ,

( S { S# } JOIN PP { P# } ) AS QQ ,

( QQ MINUS SP { S#, P# } ) AS RR :

S { S# } MINUS RR { S# }

I have questions regarding the example:

1) To which Boolean expression is the statement “FORALL x(exp)” always evaluates to TRUE if there are no x’s...” a reference? I see only relational operators that yield sets of tuples.

2) What is the value of the relation that results from “P (P#) WHERE COLOR(‘Purple’)”.? That is, what is the heading and what is the cardinality? In other words, I am requesting a formal definition of “empty relation/set”.

3) What would be the relation resulting from the relational JOIN as specified above: S(S#) JOIN PP(P#)? Since the two operands are disjoint then it reduces to a relational cartesian product. What is the result of a relational cartesian product when one of the relations is empty? If it were a set operation, I know the answer would be an empty set. But with the relational algebra as defined in TTM I see the possibility of the result of the above product to be either (a) an empty relation: or (b) a relation with heading of { {S#} (P#} } and a body of 5 tuples:

{ {S1} {} }, { {S2} {} }, { {S3} {} }, { {S4} {} }, { {S51} {} }

The definition of the JOIN (Chapter 2, Relational Operators) does not cover this situation.

4) If the dividend (the cross product) of a DIVIDE is 0 (empty relation) then is the quotient also 0 as it is for an arithmetic divide? I infer from the results of the example that the answer is no. Perhaps my inference is wrong? What is the formal definition of the DIVIDE operator especially in regard to handling that special case? Or, is it not a special case? I believe that it is a special case for reasons I express in my commentary below.

Some commentary:

A)  For the example, the relvar SP is defined as: “Relvar SP represents shipments (it shows which parts are supplied by which suppliers)”. There is a contradiction in the definition because recording the details of a shipment (i.e.: which parts have been shipped) is not the same as recording what parts are available from a supplier (I.e.: parts which they could supply in future). The latter predicate can not be evaluated by the data available. Therefore the query actually executed in the example is “Get suppliers who have shipped/supplied every purple part”. A visual inspection of the values in the database is sufficient to verify (via the SP relvar) that there are no suppliers who have shipped any purple parts. The result of the divide should agree with that fact; but it does not. Instead the result hypothesizes some possible future event in which purple parts will be shipped. The problem, as I see it, is that the inquirer would get the same result if there actually were purple parts in P and every supplier has supplied all of them. In other words the inquirer is unable to distinguish the case in which every supplier actually shipped every purple part from the case in which no purple parts exist and/or weren’t shipped. Unacceptable to me.

Suppose that we insert at least 1 tuple(s) into P for some purple part(s) without inserting any tuples into SP for that part(s) and then repeat the same divide operation. The result should be that there are no suppliers who have shipped all purple parts. And that is the result produced.

Logically, both of the above examples should have resulted in the same answer. I believe the error in the first case results from not dealing correctly with the empty relation produced by the cross product (QQ).

B) Consistent with my statements expressed in A) above, the introduction of the S relvar in place of the SP in the cross product of the DIVIDE operation is superfluous because all of the suppliers who have supplied any or all parts are included in the SP relvar and are the only suppliers that could satisfy the query. Suppliers who are known not to have supplied any parts do not need to be injected into the operation because they will later be excluded in the division.

End of post.

Quote from Greg Klisch on February 6, 2020, 3:42 am
Welcome to our little forum!
I don't entirely follow your arguments, but I can tell you that the D&D writing on the subject of DIVIDE of various kinds is extensive, and all of your questions are surely covered in DBE (Database Explorations) or DTATRM (Databases, Types and the Relational Model) or both. But I can try to provide brief answers to your specific questions.

1) To which Boolean expression is the statement “FORALL x(exp)” always evaluates to TRUE if there are no x’s...” a reference? I see only relational operators that yield sets of tuples.

The FORALL refers to a set of tuples (a relation). No x's means the set (relation) is empty.

2) What is the value of the relation that results from “P (P#) WHERE COLOR(‘Purple’)”.? That is, what is the heading and what is the cardinality? In other words, I am requesting a formal definition of “empty relation/set”.

The heading is (trivially) P# and the cardinality is (trivially) zero.

3) What would be the relation resulting from the relational JOIN as specified above: S(S#) JOIN PP(P#)? Since the two operands are disjoint then it reduces to a relational cartesian product. What is the result of a relational cartesian product when one of the relations is empty?

The heading is (trivially) { S#, P# }. The result is an empty relation.

If it were a set operation, I know the answer would be an empty set. But with the relational algebra as defined in TTM I see the possibility of the result of the above product to be either (a) an empty relation: or (b) a relation with header of { {S#} (P#} } and a body of 5 tuples:

{ {S1} {} }, { {S2} {} }, { {S3} {} }, { {S4} {} }, { {S51} {} }

No. What do you intend that syntactic form to denote? If you mean some kind of SQL-like NULL, then see RM Pro 4 (no nulls).

4) If the dividend (the cross product) of a DIVIDE is 0 (empty relation) then is the quotient also 0 as it is for an arithmetic divide? I infer from the results of the example that the answer is no. Perhaps my inference is wrong? What is the formal definition of the DIVIDE operator especially in regard to handling that special case? Or, is it not a special case? I believe that it is a special case for reasons I express in my commentary below.

There are no special cases. It would help if you rephrase that question. I'm having difficulty being certain what you're asking, and I wouldn't want to spend time on answering the wrong question.

 

Andl - A New Database Language - andl.org
Quote from dandl on February 6, 2020, 6:42 am
Quote from Greg Klisch on February 6, 2020, 3:42 am

 

If it were a set operation, I know the answer would be an empty set. But with the relational algebra as defined in TTM I see the possibility of the result of the above product to be either (a) an empty relation: or (b) a relation with header of { {S#} (P#} } and a body of 5 tuples:

{ {S1} {} }, { {S2} {} }, { {S3} {} }, { {S4} {} }, { {S51} {} }

No. What do you intend that syntactic form to denote? If you mean some kind of SQL-like NULL, then see RM Pro 4 (no nulls).

In addition to this : no, there is no such "possibility" and the existence of such a "possibility" is ruled out by the definition of natural join.  All you need to do is carefully read the definitions and apply them (e.g there is no such thing as "product" in "the relational algebra as defined in TTM", there is only natural join, and that one is pretty well-defined), and not let your own imagination get in the way.  That is not to say that people's imagination cannot possibly ever give rise to useful ideas/concepts/... , that is only to say that if and when discussing "the relational algebra as defined in TTM", one must stick to the definitions TTM provides.

 

Quote from Greg Klisch on February 6, 2020, 3:42 am

 

4) If the dividend (the cross product) of a DIVIDE is 0 (empty relation) then is the quotient also 0 as it is for an arithmetic divide? I infer from the results of the example that the answer is no. Perhaps my inference is wrong? What is the formal definition of the DIVIDE operator especially in regard to handling that special case? Or, is it not a special case? I believe that it is a special case for reasons I express in my commentary below.

Some commentary:

A)  For the example, the relvar SP is defined as: “Relvar SP represents shipments (it shows which parts are supplied by which suppliers)”. There is a contradiction in the definition because recording the details of a shipment (i.e.: which parts have been shipped) is not the same as recording what parts are available from a supplier (I.e.: parts which they could supply in future). The latter predicate can not be evaluated by the data available. Therefore the query actually executed in the example is “Get suppliers who have shipped/supplied every purple part”. A visual inspection of the values in the database is sufficient to verify (via the SP relvar) that there are no suppliers who have shipped any purple parts. The result of the divide should agree with that fact; but it does not. Instead the result hypothesizes some possible future event in which purple parts will be shipped. The problem, as I see it, is that the inquirer would get the same result if there actually were purple parts in P and every supplier has supplied all of them. In other words the inquirer is unable to distinguish the case in which every supplier actually shipped every purple part from the case in which no purple parts exist and/or weren’t shipped. Unacceptable to me.

Suppose that we insert at least 1 tuple(s) into P for some purple part(s) without inserting any tuples into SP for that part(s) and then repeat the same divide operation. The result should be that there are no suppliers who have shipped all purple parts. And that is the result produced.

Logically, both of the above examples should have resulted in the same answer. I believe the error in the first case results from not dealing correctly with the empty relation produced by the cross product (QQ).

B) Consistent with my statements expressed in A) above, the introduction of the S relvar in place of the SP in the cross product of the DIVIDE operation is superfluous because all of the suppliers who have supplied any or all parts are included in the SP relvar and are the only suppliers that could satisfy the query. Suppliers who are known not to have supplied any parts do not need to be injected into the operation because they will later be excluded in the division.

End of post.

Re. relational divide and arithmetic divide : that analogy is very brittle and best simply not made.  Relational divide was intended to cover the case where there is a need for some universal quantification to act as a filtering predicate in a RESTRICT.  That's it.

Re. "contradiction in the definition because recording the details of a shipment ..." : the source of your perceiving a "contradiction" here probably stems from you attaching too much meaning to the word "shipment".  I believe "Introduction to database systems" has an explicit remark to the effect that "SHIPMENT" represents "potential shipments" and not "shipments actually made on a specific day to a specific destinee" or some such.  That's another book than the one you were reading, so it's not "your fault", and the choice of name could indeed be considered unfortunate, but then again otoh it's typically impossible to cover all of the semantics of an external predicate in one single name acting as the predicate symbol.

Re. "The problem, as I see it, is that the inquirer would get the same result if there actually were purple parts in P and every supplier has supplied all of them. In other words the inquirer is unable to distinguish the case in which every supplier actually shipped every purple part from the case in which no purple parts exist and/or weren’t shipped. Unacceptable to me." : the problem is that if the inquirer has a need to "distinguish the case ... from the case ...", then he has asked the wrong question.  If he is interested in knowing whether there are purple arts or not, then he must query a projection of a restriction, and division should/will not be involved.

Mathematical convention says that a universal quantification over an empty set always yields true, even if the predicate subordinate to the quantification is an outright falsehood/contradiction such as 1=0 :

FORALL pink_elephants : 1=0   /* true if there are no pink elephants */

FORALL pink_elephants : the pink_elephant at hand is green   /* true if there are no pink elephants */

the "FORALL pink_elephants" part is a more informal way of stating "FORALL x | x element of the set of pink_elephants".

The consequence is that if there are no purple parts, then all suppliers indeed supply all purple parts.

Thank you David and Erwin for your responses.

Re: question 1) the FORALL question: I asked for which (meaning specific) expression to which the FORALL is being applied.  I recognize that a FORALL expression is being applied to the projection of a restriction in this divide.  And then there is another FORALL in the JOIN operation. And I suppose in each of the MINUS operations.  Because  I have been regarding the division in the long form as a series of operations (a procedure) that does not use the DIVIDEBY,  I didn't know which FORALL was being referenced.  I believe now that it could be either the projection of the restriction or the JOIN (possibly the SP table could be empty).  As I write this, I realize that I erred by thinking of the series of operations as a procedure rather than a single operator.  And that is why I couldn't identify which FORALL expression was referenced.

Re: question 2):  Thank you David.

Re: question 3):  I was not suggesting a notion of NULLs.  When I included that notation, I had about thinking about the statement in TTM that an attribute can be of type TUPLE or SET. So, I thought it possible that a JOIN to an empty relvar could result in empty TUPLEs values for some attributes.  So the notation was meant to represent an empty TUPLE value for an attribute. Including that thought into the post was not relevant and I apologize.

Re: question 4):  I was interested in having a formal definition of relational divide that I could use to allay my distrust of the results that were produced by the example.  I repeat my reason for not trusting the results: "The problem, as I see it, is that the inquirer would get the same result if there actually were purple parts in P and every supplier has supplied all of them. In other words the inquirer is unable to distinguish the case in which every supplier actually shipped every purple part from the case in which no purple parts exist and/or weren’t shipped."   To quote Erwin:  "the problem is that if the inquirer has a need to "distinguish the case ... from the case ...", then he has asked the wrong question.  If he is interested in knowing whether there are purple prts or not, then he must query a projection of a restriction, and division should/will not be involved".  Sadly, I was aware of that but I was expecting the responsibility for that to be in the definition of the divide operation(i.e.: special case) to provide a result more in alignment with a customer's expectations for that "case".  That was my error. The divide operation divides, it's my responsibility to know when to divide.

New question (5): Is the long form of the divide an implementation in Tutorial D or a definition?

Quote from Greg Klisch on February 7, 2020, 7:00 am

Thank you David and Erwin for your responses.

Hi Greg, welcome to the forum -- which is as much for discussion as explanation.

Nearly all learners get confused by Relational DIVIDE, judging by the qs on StackOverflow. It's a topic whose only purpose seems to be for instructors to torture students as far as I can tell. I've never used it in anger -- which is to say I've never tried to achieve the effect in SQL with the double-negative EXISTS. It's far easier to drag the data into some programming language and code all your tricky cases in a place you understand how to control. (So it's no good asking me your questions: I refuse to know those kind of answers.)

The name DIVIDE is supposed to suggest the operation stands to Cartesian product as (integer) arithmetic divide stands to  multiplication. With empty relations this gives comparable difficulties to divide-by-zero. There's also a Relational parallel to dividing by an integer that's not a factor of the product. In short, if the relation you're trying to DIVIDE into is not a Cartesian product of the relation you're trying to DIVIDE by, abandon hope of forming any reliable intuitions.

Re: question 1) ...  Because  I have been regarding the division in the long form as a series of operations (a procedure) that does not use the DIVIDEBY,  ...

Yes, and emphatically to the extent I always code using the other operations rather than DIVIDE. You can find those operations discussed in DBE Chapter 14 'Image Relations'. There's some drawbacks: none of the operations you need are in SQL: Relation-Valued Attributes; Tutorial D's GROUP operator; subset tests between the values in RVAs.

Emphatically I do not use the 'other operations' you'll find in SQL textbooks: a double-NOT EXISTS makes my head hurt. And it heaps obfuscation on confusion.

 

Welcome from me too, Greg, and thanks to those who have dealt with your queries, to your satisfaction, I hope, as well as to mine.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Greg Klisch on February 7, 2020, 7:00 am

Re: question 1) the FORALL question: I asked for which (meaning specific) expression to which the FORALL is being applied.  I recognize that a FORALL expression is being applied to the projection of a restriction in this divide.  And then there is another FORALL in the JOIN operation. And I suppose in each of the MINUS operations.  Because  I have been regarding the division in the long form as a series of operations (a procedure) that does not use the DIVIDEBY,  I didn't know which FORALL was being referenced.  I believe now that it could be either the projection of the restriction or the JOIN (possibly the SP table could be empty).  As I write this, I realize that I erred by thinking of the series of operations as a procedure rather than a single operator.  And that is why I couldn't identify which FORALL expression was referenced.

A supplier who supplies parts will appear in the relation headed { S#, P# }. Some of those will be 'suppliers who supply all purple parts' (so this is a join followed by a restriction), as will suppliers who do not appear (supply no parts at al)l. The FORALL applies to this relation.

Re: question 2):  Thank you David.

Re: question 3):  I was not suggesting a notion of NULLs.  When I included that notation, I had about thinking about the statement in TTM that an attribute can be of type TUPLE or SET. So, I thought it possible that a JOIN to an empty relvar could result in empty TUPLEs values for some attributes.  So the notation was meant to represent an empty TUPLE value for an attribute. Including that thought into the post was not relevant and I apologize.

Attributes may be tuple-valued (TVA) or relation-valued (RVA), but that is reflected in their type and that of the value(s) they hold. An attribute such as S# can only ever hold a scalar value of the correct type, not a tuple.

Re: question 4):  I was interested in having a formal definition of relational divide that I could use to allay my distrust of the results that were produced by the example.  I repeat my reason for not trusting the results: "The problem, as I see it, is that the inquirer would get the same result if there actually were purple parts in P and every supplier has supplied all of them. In other words the inquirer is unable to distinguish the case in which every supplier actually shipped every purple part from the case in which no purple parts exist and/or weren’t shipped."   To quote Erwin:  "the problem is that if the inquirer has a need to "distinguish the case ... from the case ...", then he has asked the wrong question.  If he is interested in knowing whether there are purple prts or not, then he must query a projection of a restriction, and division should/will not be involved".  Sadly, I was aware of that but I was expecting the responsibility for that to be in the definition of the divide operation(i.e.: special case) to provide a result more in alignment with a customer's expectations for that "case".  That was my error. The divide operation divides, it's my responsibility to know when to divide.

Yes. DIVIDE often answers a question that no-one wanted to ask.

New question (5): Is the long form of the divide an implementation in Tutorial D or a definition?

TD implements both the Small Divide and the Great Divide (spec on the TTM site). Is that what you meant? I see little point in using them, since they can be written longhand, often with greater clarity of intent. Andl does not.

 

Andl - A New Database Language - andl.org
Quote from Greg Klisch on February 7, 2020, 7:00 am

New question (5): Is the long form of the divide an implementation in Tutorial D or a definition?

I don't know whether any of the material in this discussion has already provided you with an answer, but my answer would be the question as such is almost impossible to answer.

In the long history of trying to come up with a definition of relational division that is all of "sensible", "intuitive" and "useful", it has turned out (besides having to pick only two out of those three to keep things achievable to begin with) that if you have MINUS (or actually SEMIMINUS) as a primitive operator in your algebra, then relational division is not a primitive operator of the algebra, meaning its effects/results can ***always*** be obtained by an equivalent formula that uses only a nesting of invocations of operators that ***are*** primitive in the algebra (Antc's "double NOT EXISTS"), meaning in turn that "relational division" is actually nothing more than what language designers would (and typically do) call a "syntactic shorthand".  So in ***that*** sense, the "longhand" is ***both*** "definition" and "implementation".  Think of it as a C macro.  Or assembler macro or PL/1 macro.  The macro is itself the definition and the implementation is the act of replacing the invocation by its definition.  The fact that you speak of relational division as if it were a "process" means you have an idea of what the definition looks like.

And that leads me straight to my second point : relational division may be equivalent to some nesting of [invocations of] primitive operators of the algebra, but what that means is that relational division is itself just what mathematics calls a "function composition".  And in mathematics, a composition of functions is itself still a function, and anything that is mathematically a function, becomes a [deterministic] operator in computing.

Brief : relational division may not be a primitive operator, but an operator it still is.  And nonprimitive operators have to be ***defined*** in terms of the primitive ones (the primitive ones of choice that is, because sets of primitive operators are not necessarily unique).  And if all the DBMS/language implementation does with an invocation of a nonprimitive operator is to [syntactically] replace it with its equivalent longhand, then in a certain sense that ***definition*** counts as the "implementation" in one and the same go.

As an aside, I've spoken of "relational division" here as if it were "one single operator", but the long history of trying to come up with a "good" definition of it has actually produced a multitude of definitions, all of them distinct, all of them useful for a specific set of use cases, but none of them suitable for all of the use cases.  So pls also bear in mind that "relational division" is best thought of not as ***one single*** relational operator, but rather as a whole family of them (the main reason for Antc's dismay of course).

Thank you to C. J. Date and Hugh Darwen for making “Databases, Types and the Relational Model The Third Manifesto” available on your website. It is an enormously generous gift of a textbook. It is my first reading on the relational algebra and I find it comprehensive and readable.

 

Thank you, Erwin for such an elucidating set of explanations regarding what is relational division and what it is not. I had been thinking of relational divide as a primitive operation.

Re: your statement: "in a certain sense that ***definition*** counts as the "implementation" in one and the same go.” My preference is that a definition be explicitly stated independent of its implementation. I want a definition against which I can judge the correctness of a solution/implementation. It is a declaration of intentions.

One of the reasons that I posted on this forum is that I was not satisfied with the result of the relational small division example in the TTM because it seems contrary to what to what I expected. I couldn’t discern which definition that was being used in the implementation so I felt compelled to ask. But the answer was there in the form of a citation that I overlooked. Specifically:

“In reference [34], therefore, the present writers proposed (a) a generalized version of Codd’s divide (“the Small Divide”) and (b) a generalized version of Todd’s divide (“the Great Divide”) that overcame the difficulties with empty relations.”

Reference 34: “Hugh Darwen and C. J. Date: “Into the Great Divide,” in reference [38].”

It’s my fault for not pursuing the reference.

Because there are a “multitude of definitions” of relational divide, I did a search on the world wide web to see if there were definitions other than Codd’s, and Todd’s (and the revisions to them proposed by Date and Darwen). I found many examples of relational division; but, only one of them provided a definition with their example. Perhaps, if not stated, the default definition is Codd’s.

The one definition I found was the following (author not revealed; link: https://en.wikipedia.org/wiki/Relational_algebra#Division_(%C3%B7) ):

The division is a binary operation that is written as R ÷ S. Division is not implemented directly in SQL. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R.

“ More formally the semantics of the division is defined as follows:

R ÷ S = { t[a1,...,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,...,an] ∪ s)∈ R) }

where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of  t  to this set. “

I think there is a typo in the definition and that the following is what was intended:

R ÷ S = { t[a1,...,an] : t ∈ R ∧∀s ∈ S ( (t[a1,...,an]  ⋈ s)∈ R) }

 

As someone new to relational algebra, it looks like a candidate definition for “small” divide. I reproduced it here because it seems to differ from the definition used by others.

 

In response to AntC, I absolutely am not trying to draw any parallels to operations in SQL. I am interested solely in the relational algebra as discussed in TTM. But, I am appreciate of your comments; especially:

The name DIVIDE is supposed to suggest the operation stands to Cartesian product as (integer) arithmetic divide stands to  multiplication. With empty relations this gives comparable difficulties to divide-by-zero. There's also a Relational parallel to dividing by an integer that's not a factor of the product. In short, if the relation you're trying to DIVIDE into is not a Cartesian product of the relation you're trying to DIVIDE by, abandon hope of forming any reliable intuitions.”

Why has it been difficult to define how empty relations should be handled? Is it because, to quote Erwin “nonprimitive operators have to be ***defined*** in terms of the primitive ones ...’, such a definition could not be expressed in the existing primitive operators? I think now I can answer that question myself: There are no control-of-flow operators in the algebra and therefore the sequential flow of operations alone decide the results. I had been thinking that the implementation should derive from the definition. But, if you limit an implementation to only the primitive operations available then they dictate the available definitions. I now understand why the FORALL exp(e) dictates the answer for the entire small divide proposed in the TTM. And why David said “There are no special cases”. I had been thinking like a programmer with all the programming commands available. Big mistake.

 

I have received a valuable education in this forum from all of you. Thank you again.

Perhaps I will get a copy of Database Explorations (recommended by David) and continue my education.