The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Questions about some relational operator definitions and comments about the DIVIDEBY operation

Why has it been difficult to define how empty relations should be handled? Is it because, to quote Erwin “nonprimitive operators have to be ***defined*** in terms of the primitive ones ...’, such a definition could not be expressed in the existing primitive operators?

There is no unique set of primitive operators. Appendix A (*) pursues this at some length, as does Alice (http://webdam.inria.fr/Alice/). 

Alice gives the minimal set for the named RA (relational algebra) as SPJRUN: Select, Project, Join, Rename, Union, Negate, and these match up with Codd (*). Negate appears only in combination (eg TD NOT MATCHING (antijoin)). Most query languages add some way to compute new attribute values (TD EXTEND). Some query languages add recursion (Datalog, Andl, SQL). Many add a Turing-complete language on top.

Perhaps I will get a copy of Database Explorations (recommended by David) and continue my education.

(*) DTATRM, DBE, Alice, Appendix A, Codd 1972, Hall & Todd 1975 are often mentioned here, and are all free downloads. I guess that's my core library. Others may want to extend it.

Andl - A New Database Language - andl.org
Quote from Greg Klisch on February 11, 2020, 1:57 am

...

“ More formally the semantics of the division is defined as follows:

R ÷ S = { t[a1,...,an] : t ∈ R ∧ ∀s ∈ S ( (t[a1,...,an] ∪ s)∈ R) }

where {a1,...,an} is the set of attribute names unique to R and t[a1,...,an] is the restriction of  t  to this set. “

I think there is a typo in the definition and that the following is what was intended:

R ÷ S = { t[a1,...,an] : t ∈ R ∧∀s ∈ S ( (t[a1,...,an]  ⋈ s)∈ R) }

 

The "typo" you seem to refer to is using the UNION symbol where you seemed to believe it should have been the JOIN (bowtie) symbol.

But pls do note that the things the operation is applied to are ***tuples*** not relations.

And while it is perfectly conceivable to imagine an operation of "natural join on tuples", it is mathematically the same thing (*) as the set union of those same tuples.  Tuples (in the named perspective to RM, where attributes have names) are sets of (name, value) pairs and a set such as { (a,1) (b,2) } can be unioned with { (c,3) (b,2) } to obtain { (a,1) (b,2) (c,3) } which is exactly what you'd want to get were you trying to define a "tuple natural join" operation.

(*) but for the case where the tuples at hand are not "union compatible" because they would result in a kind of tuple where the attribute names no longer appear uniquely, e.g. { (a,1) (b,2) (b,9) (c,3) } .  But such cases are very often tacitly considered ruled out.

One again I am asking about the relational divide operator. In the quote below from TTM an alternate formulation for the "Get suppliers who supply every purple part" is presented:

We now observe that expressions involving DIVIDEBY (either version) can always be

replaced by logically simpler—though sometimes lengthier—expressions involving relation

comparisons instead (see RM Prescription 22). For example, here is another formulation of the

query “Get suppliers who supply every purple part”:

               S WHERE ( ( SP RENAME ( S# AS X ) ) WHERE X = S# ) { P# } ⊇

                                   ( P WHERE COLOR = COLOR('Purple') ) { P# }

Explanation: For a given supplier, identified by the S# value from some tuple t in relvar S,

the expression

                         ( ( SP RENAME ( S# AS X ) ) WHERE X = S# ) { P# }

yields the set of part numbers for parts shown in relvar SP as currently being supplied by that

supplier. That set of part numbers is then compared with the set of part numbers currently

appearing in relvar P for purple parts. If and only if the two sets are equal, then that tuple t from

relvar S is included in the result. (We have omitted the final projection over supplier numbers for

simplicity, since it would probably not be wanted in practice anyway.)

 

Question: Will the above formulation yield the same results for an empty set as this prior formulation?:

          WITH ( P WHERE COLOR = COLOR('Purple') ) AS PP ,

                ( S { S# } JOIN PP { P# } ) AS QQ ,

                ( QQ MINUS SP { S#, P# } ) AS RR :

          S { S# } MINUS RR { S# }

It appears to me that they do not. It appears that supplier S5 will appear in the result set in the "FORALL exp(e)"  formulation but not in the relational comparisons version . The  relational comparison formulation differs from the other in that it eliminates any suppliers not supplying any parts (per SP). The two formulations do not seem equivalent.  Were they intended to be?  

Quote from Greg Klisch on February 12, 2020, 1:07 am

One again I am asking about the relational divide operator. In the quote below from TTM an alternate formulation for the "Get suppliers who supply every purple part" is presented:

We now observe that expressions involving DIVIDEBY (either version) can always be

replaced by logically simpler—though sometimes lengthier—expressions involving relation

comparisons instead (see RM Prescription 22). For example, here is another formulation of the

query “Get suppliers who supply every purple part”:

               S WHERE ( ( SP RENAME ( S# AS X ) ) WHERE X = S# ) { P# } ⊇

                                   ( P WHERE COLOR = COLOR('Purple') ) { P# }

Explanation: For a given supplier, identified by the S# value from some tuple t in relvar S,

the expression

                         ( ( SP RENAME ( S# AS X ) ) WHERE X = S# ) { P# }

yields the set of part numbers for parts shown in relvar SP as currently being supplied by that

supplier. That set of part numbers is then compared with the set of part numbers currently

appearing in relvar P for purple parts. If and only if the two sets are equal, then that tuple t from

relvar S is included in the result. (We have omitted the final projection over supplier numbers for

simplicity, since it would probably not be wanted in practice anyway.)

 

Question: Will the above formulation yield the same results for an empty set as this prior formulation?:

          WITH ( P WHERE COLOR = COLOR('Purple') ) AS PP ,

                ( S { S# } JOIN PP { P# } ) AS QQ ,

                ( QQ MINUS SP { S#, P# } ) AS RR :

          S { S# } MINUS RR { S# }

It appears to me that they do not. It appears that supplier S5 will appear in the result set in the "FORALL exp(e)"  formulation but not in the relational comparisons version . The  relational comparison formulation differs from the other in that it eliminates any suppliers not supplying any parts (per SP). The two formulations do not seem equivalent.  Were they intended to be?  

The outermost portion in the example from TTM is "S WHERE", that is, it is the suppliers relvar, that is, result tuples come from the suppliers relvar.  S5 appears in the suppliers relvar, so it will appear in the result if the WHERE clause evaluates to true.  That WHERE clause boils down to (informally) "set of parts supplied by supplier being tested is a superset of the set of purple parts", thus "set of parts supplied by supplier being tested is a superset of the empty set", which is trivially true.

(The comment speaks of "if both sets are equal", which is not the same thing as "one is superset of the other", so there is a remark to be made perhaps, but not the one you did.)

In your WITH version, PP is empty because no purple parts, QQ is empty because it is a join with PP, RR is empty because it is QQ minus something and the result includes S5 because it is (loosely) S MINUS empty.

Erwin says:  Mathematical convention says that a universal quantification over an empty set always yields true, even if the predicate subordinate to the quantification is an outright falsehood/contradiction such as 1=0 :

In RA , would the  "FORALL exp(e)" ever evaluate to FALSE?  You explained that an empty set yields TRUE.  And I believe that a non-empty set also yields TRUE.  Is it only if the set = TABLE_DUM that the result would be FALSE?

Erwin said: The outermost portion in the example from TTM is "S WHERE", that is, it is the suppliers relvar, that is, result tuples come from the suppliers relvar.  S5 appears in the suppliers relvar, so it will appear in the result if the WHERE clause evaluates to true.  That WHERE clause boils down to (informally) "set of parts supplied by supplier being tested is a superset of the set of purple parts", thus "set of parts supplied by supplier being tested is a superset of the empty set", which is trivially true.

(The comment speaks of "if both sets are equal", which is not the same thing as "one is superset of the other", so there is a remark to be made perhaps, but not the one you did.)

I used the phrase "if both sets are equal" because that was the phrase used in the text of the example.  I did notice that set inclusion was used in the actual formulation.

Does not the inner-most portion:  SP RENAME ( S# AS X ) ) WHERE X = S#   get executed first and eliminate S5 from the superset because it does not occur in SP?  

Thank you David for suggesting additional reading material to my list:

(*) DTATRM, DBE, Alice, Appendix A, Codd 1972, Hall & Todd 1975 are often mentioned here, and are all free downloads. I guess that's my core library. Others may want to extend it.

Quote from Greg Klisch on February 12, 2020, 5:26 pm

Erwin says:  Mathematical convention says that a universal quantification over an empty set always yields true, even if the predicate subordinate to the quantification is an outright falsehood/contradiction such as 1=0 :

In RA , would the  "FORALL exp(e)" ever evaluate to FALSE?  You explained that an empty set yields TRUE.  And I believe that a non-empty set also yields TRUE.  Is it only if the set = TABLE_DUM that the result would be FALSE?

In a logical system, the only thing that is logically material is whether conclusions always follow from premises in a logical argument. See the truth value combinations for logical implication aka material implication, which allow false as well as true premises. TABLE_DUM in a logical expression does not necessarily imply logical validity aka deductive validity, neither does an empty set of propositions.

 

The test for logical validity simply verifies that a false conclusion never follows from true premises. (A test that is easily automated even if the usual compilers don't bother.)

 

Logical validity is a requirement of a logically complete system which is one way of evaluating information content, which some languages try to represent in the form of IF ... THEN ... syntax.

 

Implications can themselves be premises and usually are, or should be as far as an RA expression is concerned, such as one meant to calculate suppliers that might supply an infinite number of non-existent parts rather than ones that supply  a finite number of existing parts. A semantic interpretation might involve subjunctive questions but it's only argument syntax that determines validity.

 

Interpreting/comparing those two calculations is easily obfuscated by datatype frameworks, things are more obvious in terms of Codd's domains.

 

Whether an RA expression is sufficiently explicit to represent a formal argument is usually up for grabs here and elsewhere because many premises are left unstated, are implicit, not explicit.

Quote from Erwin on February 6, 2020, 8:06 am
Re. "contradiction in the definition because recording the details of a shipment ..." : the source of your perceiving a "contradiction" here probably stems from you attaching too much meaning to the word "shipment".  I believe "Introduction to database systems" has an explicit remark to the effect that "SHIPMENT" represents "potential shipments" and not "shipments actually made on a specific day to a specific destinee" or some such.  That's another book than the one you were reading, so it's not "your fault", and the choice of name could indeed be considered unfortunate, but then again otoh it's typically impossible to cover all of the semantics of an external predicate in one single name acting as the predicate symbol.

Does "SUPPLIERS" have a similar definition to the effect that "SUPPLIERS" supply all "potential" parts?  If so, am I able to infer that all parts, current and potential, are provided by all suppliers?  If so, then would not the answer to the query--quantified or not quantified--'Get suppliers who supply every ... part' simply be (by definition)?:

S {S#}

After all, there is nothing in the query about shipments! As worded, the query refers only to suppliers and parts, not shipments.

 

 

 

Quote from Greg Klisch on February 13, 2020, 11:55 pm
Quote from Erwin on February 6, 2020, 8:06 am
Re. "contradiction in the definition because recording the details of a shipment ..." : the source of your perceiving a "contradiction" here probably stems from you attaching too much meaning to the word "shipment".  I believe "Introduction to database systems" has an explicit remark to the effect that "SHIPMENT" represents "potential shipments" and not "shipments actually made on a specific day to a specific destinee" or some such.  That's another book than the one you were reading, so it's not "your fault", and the choice of name could indeed be considered unfortunate, but then again otoh it's typically impossible to cover all of the semantics of an external predicate in one single name acting as the predicate symbol.

Does "SUPPLIERS" have a similar definition to the effect that "SUPPLIERS" supply all "potential" parts?  If so, am I able to infer that all parts, current and potential, are provided by all suppliers?  If so, then would not the answer to the query--quantified or not quantified--'Get suppliers who supply every ... part' simply be (by definition)?:

S {S#}

After all, there is nothing in the query about shipments! As worded, the query refers only to suppliers and parts, not shipments.

 

 

 

“...Does "SUPPLIERS" have a similar definition to the effect that "SUPPLIERS" supply all "potential" parts?  ...”

 

Logically, that’s an important question. As you suggested, it should be clear that the alternative relational expression for division is assuming an argument which implies that the set or domain of suppliers who supply all purple parts is the same set as that of suppliers who supply no purple parts.

 

Or, is the assumption just an inference rather than an implication? If so, the sample db is logically incomplete for the purpose this division definition/expression puts it to.  If the assumption is a premise of the argument, then it must be deriveable/logically provable by/from the sample db.  If it's a conclusion then the argument needs to be a logically valid implication.

 

If it’s incomplete, then the limit of the sample db’s information content has been reached and casual language risks various inferences/mysticisms.

 

For example, when is the NOT MATCHING operation not logically valid? Does R{a} Not Matching R{a,b} imply NOT R{a} for some databases but not for others?