The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Questions about New Relational Algebra A

Quote from Greg Klisch on February 25, 2020, 2:50 am

I am confused by the use of predicate logic operators as relational operators.  The predicate operators have Boolean values as operands as demonstrated by a truth table.  The relational operators have relvars (sets of tuples) as operands.  In what sense do relvar values have Boolean values of TRUE and FALSE? I believe only two relvar values have been defined as having BOOLEAN values:  TABLE_DEE and TABLE_DUM.  I obviously have failed to recognize the truth.

<AND>, <OR> are somewhat spelled the same as logic operators (for reasons I'll explain), but they're not spelled exactly as the logic operators -- they have those strange arrow things each side (which we can't easily render in WordPress).

In what sense do relvar values have Boolean values of TRUE and FALSE?

DTATRM Chapter 2/Survey. Suggest you read all of the chapter but especially RELVARS, RELATIONS, AND PREDICATES. This is the kind of foundational understanding most SQL texts manage to omit entirely. Every relvar 'comes with' a characteristic predicate/a criteria for membership of tuples/an interpretation. 'Comes with' in the sense the business analyst/DBA considered it in designing its schema (probably informally, probably unconsciously unless they know D&D texts).

We can apply that predicate to some proffered tuple value and look up the tuple in the relvar: if it's present, the predicate is TRUE of that proffered value. Otherwise it is FALSE (by the Closed World Assumption).

The content of a relvar at a point-in-time state of the database is the extension of that predicate as at that point in time. That's how the database schema is a model of the mini-world of the enterprise. If relvar S contains a tuple with Supplier code 'S1', SName 'Smith', etc, that's saying something about the world. If relvar SP contains a tuple with Supplier code 'S1', etc, that's saying something further about the world. If (S JOIN SP) contains a tuple with Supplier code 'S1', etc, that's saying: 'S1' identifies a Supplier named 'Smith' ... AND that Supplier supplies Part 'Pnn' in quantity QTY.

Consider two queries (S WHERE STATUS = 20), (S WHERE CITY = 'London'). From the characteristic predicate for S, and the predicate represented by the WHERE-condition, we derive a characteristic predicate for the content of each of those queries. As it happens, Supplier 'S1' appears in both. i.e.: 'S1' identifies a Supplier named 'Smith'... AND that Supplier has STATUS 20; 'S1' identifies a Supplier named 'Smith' ... AND that Supplier is located in 'London'.

As per Appendix A, I could write those queries as (S <AND> REL{TUP{ STATUS 20 }}); (S <AND> REL{TUP{ CITY 'London' }}).

Exercise for the reader: consider the <AND> of those two queries; consider the <OR> of them.

Another exercise: consider ((S <AND> REL{TUP{ STATUS 20 }}) <OR> (S <AND> REL{TUP{ CITY 'Paris' }})) wrt the example data in that section of DTATRM, with two Suppliers located in 'Paris'. What is the characteristic predicate for the query?

 I believe only two relvar values have been defined as having BOOLEAN values:  TABLE_DEE and TABLE_DUM.

That's a common misconception. They're relation constants, not variables (usually). Otherwise you'd be able to say: INSERT REL{TUP{}} INTO TABLE_DUM; or DELETE REL{TUP{}} FROM TABLE_DEE. And then where would you be?

Those two values/constants are defined to have an empty heading, with one tuple or no tuples respectively conforming to that heading. They are not defined to have a characteristic predicate (typically).

Consider: (S WHERE STATUS = 20 {}) what are the possible results from that query? What is the characteristic predicate of the query? What is the result of the query asserting for each possible result?

There's a convention we interpret TABLE_DEE as representing TRUE. But really that only makes sense if we know the characteristic predicate of the query that's giving that as a result.

Consider: (TABLE_DEE NOT MATCHING (S WHERE STATUS = 20 {})). Is that a Boolean value? If so, does it denote TRUE or FALSE?

That's a very nice essay on relational theory and I can't argue with any of it, but does it address the question?

I'm used to the idea of AND and OR turning up in a whole range of disciplines, not just predicate logic.From my electronics days, they represent voltages. From assembly language they are bit operations, in programming they guide flow of control and in English grammar they are arse about (to coin a phrase).

RA theory is implicitly founded on predicate logic, but I don't read the <AND> and <OR> in App-A in that light. We certainly wouldn't try to apply de Morgan to them. They're operators on relations and the result is always another relation. Every tuple in the relation is taken as true; everything else false. I don't think you can read more into the names.

Ask Hugh if you get a chance.

Andl - A New Database Language - andl.org
Quote from dandl on February 25, 2020, 8:42 am

RA theory is implicitly founded on predicate logic, but I don't read the <AND> and <OR> in App-A in that light. We certainly wouldn't try to apply de Morgan to them. They're operators on relations and the result is always another relation. Every tuple in the relation is taken as true; everything else false. I don't think you can read more into the names.

Ask Hugh if you get a chance.

No need to ask, he's already written it. From Appendix A

We do not actually need both <AND> and <OR>in order to achieve relational completeness, thanks to De Morgan's Laws. For example, A <AND> B is identically equal to <NOT>((<NOT> A) <OR> (<NOT> B)), ...

Quote from AntC on February 25, 2020, 9:44 am
Quote from dandl on February 25, 2020, 8:42 am

RA theory is implicitly founded on predicate logic, but I don't read the <AND> and <OR> in App-A in that light. We certainly wouldn't try to apply de Morgan to them. They're operators on relations and the result is always another relation. Every tuple in the relation is taken as true; everything else false. I don't think you can read more into the names.

Ask Hugh if you get a chance.

No need to ask, he's already written it. From Appendix A

We do not actually need both <AND> and <OR>in order to achieve relational completeness, thanks to De Morgan's Laws. For example, A <AND> B is identically equal to <NOT>((<NOT> A) <OR> (<NOT> B)), ...

Thank you, AntC.

The relationship between predicate logic and relational theory came to me long after my first visit (in 1972) to Relationland.  It was my friend the late Adrian Larner who first drew my attention to it and I remember asking him, what's a predicate?--a question to which Chris and I found ourselves still unsure of the exact answer when we wrote Chapter 2 of DBE.

By the time we were writing the first book based on TTM I felt I had at last come to a full enough understanding of the connection, now expressed in my free download book at Bookboon (and in my university lectures up to 2013 when I finally retired fully).  When we decided to invent a TTM-conforming computer language (because one didn't exist and we needed one for illustrative examples), I proposed to use the operator names AND and OR in place of JOIN and UNION.  Chris demurred but came up with the alternative idea of using them, adorned with angle brackets, in a definition of a totally abstract algebra that he envisaged would be a useful referent in definitions of relational operators designed for practical use (and we did indeed use it in the definition of Tutorial D).  Because an abstract algebra is not concerned with computational issues, we were able to define <OR> without the restrictions imposed on UNION.

Consider r1 <OR> r2.  If tuple t1 appears in r1, then in the result it appears (loosely speaking) "alongside" every tuple t2 that has the same heading as r2.  (Note: heading; sorry, but "header" touches a nerve with me every time I see it when heading is meant.)  Some of those tuples t2 do not appear in r2 and so represent falsehood with respect to the predicate p2 associated with r2; but that's fine because t1 represents truth withe respect to r1's predicate p1, so the predicate p1 OR p2 is satisfied.

The original question asked about relvars but there's no need to bring relation variables into the discussion.  The algebra A is just a set of mathematical operators in which the notion of variables (especially those in the programming language sense of that term) is alien.  The questioner also showed a misunderstanding of TABLE_DEE and TABLE_DUM.  These are relations, not variables.  (Chris and I found it necessary to explicate the difference between a value and variable because computer programmer types seem so often to lose it. That's because in a programmer's mind every value he or she is concerned with occupies a bit of space in the computer's memory, regardless of whether that bit of space has a variable name.  As a programmer myself I, too, had to learn the hard way.)  Anyway, relations are not truth values and do not represent truth values.  The same is the case with tuples: they are not truth values, nor does a tuple, considered in isolation, represent one.  However, we can use the attribute values of a tuple to form a sentence of the kind of which it can be said that it's either true or false, such as "Supplier S1 supplies part P1", where the tuple's attributes have the values S1 and P1.  That statement is either true, S1 does indeed supply part P1, or false.  A relation consisting of all the tuples of that form that yield true sentences in the same way gives us the entire truth about which suppliers supply which parts.  If we project that relation on no attributes at all, we get either TABLE_DEE (so there's at least one supplier who supplies at least one part) or TABLE_DUM (so there isn't: no suppliers supply anything).

Well, I was asked (albeit indirectly).

Hugh

Coauthor of The Third Manifesto and related books.

version 2: I am removing some statements.

Quote from Erwin February 24, 2020, 9:18 am:

Consider a tuple (following your own syntax) {A7,B7,C1,D1} and just apply the definition Bs={...}.  Does that tuple satisfy the condition or not ?  Repeat same exercise for a hypothetical tuple {A1,B1,C7,D7}.  Does that one satisfy the condition or not ?

Next exercise : I now rewrite the condition "tr1 ∈ Br1 or tr2 ∈ Br2" to "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1)".  In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).  Does that actually alter the condition or are those conditions provably equivalent ?

Next exercise : is the rewritten condition then equivalent to the algebraic expression (R1 JOIN UR2) UNION (R2 JOIN UR1) ?

(Bonus question : what might be the reason the typical DBMS won't provide an operator such as ◄OR► ?)

Before I attempt to do the assigned exercises, I need to understand more than I yet do. I restate my example including values for the types:

Example of relvar R1:

  • with Hr1 = { <A,AT>, <B,BT> }
  • type AT has this set of values {A1,A2,A3}
  • type BT has this set of values {B1,B2}
  • where the current value of R1 is these two tuples:

{ {A1,B1} , {A2, B2} }

and relvar R2:

  • with Hr2 = { <C,CT>,<D,DT> }
  • type CT has this set of values {C1,C2,C3,C4}
  • type DT has this set of values {D1,D2,D3}
  • where the current value of R2 is these 3 tuples:

{ {C1,D1} , {C2,D2}, {C3, D3} }

 

To test my understanding: would the value of Ur1 be these 6 tuples (IE: A Cartesian product of the attribute values for the two types):

{ {A1,B1}, {A1,B2}, {A2,B1}, {A2,B2}, {A3,B1}, {A3,B2} }

and would the value of Ur2 be:

{ {C1,D1},{C1,D2}, {C1,D3}, {C2,D1}, {C2,D2},{C2,D3},{C3,D1}, {C3,D2}, {C3,D3},

{C4,D1}, {C4,D2}, {C4,D3} }

?

If have no idea what Ur1 would like if any of the type were of an infinite type; example: integer with no constraints.

Now an attempt at doing the 4 exercises:

Exercise 1: Consider a tuple (following your own syntax) {A7,B7,C1,D1} and just apply the definition Bs={...}. Does that tuple satisfy the condition or not ?Repeat same exercise for a hypothetical tuple {A1,B1,C7,D7}. Does that one satisfy the condition or not ?

Per the revised example above, neither of these tuples would satisfy the condition because ‘ts = tr1 union tr2’ would be FALSE. In the first tuple: a tr2 exists in R2 but a tr1 does not exist in R1. Nor are they in the super-sets Ur2 or Ur1, respectively. [I’m trusting that I now understand the definition of the newly introduced Urx]. If A7 and B7 were in the type AT and BT, respectively, then the first tuple would satisfy the condition. Similar statement for C7 and D7.

Exercise 2: I now rewrite the condition "tr1 ∈ Br1 or tr2 ∈ Br2" to "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1)". In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types). Does that actually alter the condition or are those conditions provably equivalent ?

As stated, they are not equivalent because the original condition is incomplete: it does not specify set membership for tuple tr2 when tr1 ∈ Br1 [similarly for tuple tr1 when tr2 ∈ Br2]. It does state that a tr2 must exist that can be ‘union’ed with tr1. But, it doesn’t tell you to which set it belongs. Now, I see it must be a member of Ur2.

The rewritten condition is more complete and is easier to understand. Thank you.

It would also be helpful to have a formal definition of Urx, in addition to the explanation: In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types)”.

Exercise 3: Next exercise : is the rewritten condition then equivalent to the algebraic expression (R1 JOIN UR2) UNION (R2 JOIN UR1) ?

My answer is yes no as I believe the formal conditions/requirements are currently incomplete. The formal definitions does not include a definition of an operation. I think the formal definition is incomplete because it does not require that for every tr1 in R1 (or Ur1) that there be a corresponding tuple ts in B [similar statement for tr2]. Maybe, informally, something like for all tr1 in R1 there exists a ts in B <condition>. The definition/condition does state that were a tuple to be included in B it must meet that condition. Using my example database from above this value of B with a single tuple would meet the conditions as currently stated:

{A1,B1,C1,D1}

Informally, the explanatory text does state that a ‘union’ is being referred to. I do agree that if one takes the explanatory text and combines it with the formal definition; then, Yes the algebraic expression represents the intension of the operation. It was kind of you to provide a clue to the operation of the operator via that expression.

I posted these questions in my second post of this thread;

Given a tuple tr1 to which tuple tr2 would the tuple union apply? Every tr2

I retract those questions. I can find the answers using the revised conditions and in the algebraic expression you supplied.

Exercise 4: (Bonus question : what might be the reason the typical DBMS won't provide an operator such as ◄OR►?)

The creation of the sets of Urx could be prohibitively expensive. The number of members in Urx increases rapidly with the addition of attributes of different types. Particularly if any of the attribute types were ‘infinite’; such as, any integer.  Most candidate keys of a relation are of the infinite variety.  What would such a superset even look like? Would would be the key of the the combined relations? Would it require new system generated attribute that becomes the key and the keys from R1 and R2 become 'foreign keys'. It's just a rhetorical question; but what if:

it were permissible to use some sort of placeholder/reference in lieu of a specific value for such types as is done in this predicate:

If employee EX works in department DX, then <EX,DX,j> is a tuple in the body of this relation for all possible projects j, regardless of whether employee EX actually works on project j (and regardless of whether there is even a project j in the company at this time).

Maybe something like this:

{A1, {<TB>} } where TB is the type name for attribute B and plays the role of j in the predicate above.

A kind of GROUPing as it were without enumerating all the values of the type.

 

RE DUM and DEE

I did mistakenly call them relvars; but only because DTATRM defined them in terms of PROJECTions over relvars. But later in DTATRM they are referred to as relcons.

I obtained the idea of relcons TABLE_DEE and TABLE_DUM as having Boolean values from this quote from RM PRESCRIPTION 10: RELATIONS {emphasis is mine}:

As we saw under RM Prescription 9, tuples of degree zero are valid; thus, relations of degree zero are necessarily valid as well. In fact, there are exactly two such: one whose body contains just one tuple—the 0-tuple, of course—and one whose body contains no tuples at all. Following reference [29], we call these two relations, informally, TABLE_DEE and TABLE_DUM, respectively (DEE and DUM for short). What makes them so special is the fact that they can be interpreted as TRUE and FALSE, respectively,…

Perhaps the key word here is “interpreted”?

Further, in Appendix E “Darwen’s approach” to view updating, DEE and DUM are used in an example of SHOP IS OPEN/CLOSED and ALARM IS SET (or not) to represent truth values (or so it appears to me). I’m not going to reproduce that entire example here.

So it was in that role that I mentioned them at all when I asked this question:  In what sense do relvar values have Boolean values of TRUE and FALSE?  DUM and DEE are the only RA constructs that are mentioned in DTATRM in that context.

As always I am thankful for your patience with my ignorance, AntC, dandl, and Hugh, and for willingness to continue to answer my questions. You have reminded me that the answers are in the DTATRM; but sometimes a rephrasing of information helps me understand.

Greg,

" that there be a corresponding tuple ts in B "

I think this exposes your thinking error in trying to understand <OR>.  It's not about "finding" "corresponding" tuples in the other argument.  That's what JOIN (aka <AND>) is about.

The condition for <OR> says "imagine any imagineable tuple (of the heading of the result).  If its "A part" is in Ra (regardless of what the "B part" is) ***OR*** if its "B part" is in Rb (regardless of what the "A part" is) then that "any imagineable tuple" should be in the result".

The two mentions of "regardless" make it a ***wrong*** strategy to go try and find "corresponding" tuples.  That kind of search, because of the "regardless", should be done in those "universal" relations, and those relations being the universal relations, finding those tuples there will just be trivially true (so upfront you already know there is just no point in "go try and find" - you know you will) !

At least you already got close where you said "The creation of the sets of Urx could be prohibitively expensive.".  That was the point because the only known strategies that effectively produce the result as defined by the definition, all include "the creation of the sets of Urx".

Quote from Greg Klisch on February 26, 2020, 6:53 pm

...
I obtained the idea of relcons TABLE_DEE and TABLE_DUM as having Boolean values from this quote from RM PRESCRIPTION 10: RELATIONS {emphasis is mine}:

As we saw under RM Prescription 9, tuples of degree zero are valid; thus, relations of degree zero are necessarily valid as well. In fact, there are exactly two such: one whose body contains just one tuple—the 0-tuple, of course—and one whose body contains no tuples at all. Following reference [29], we call these two relations, informally, TABLE_DEE and TABLE_DUM, respectively (DEE and DUM for short). What makes them so special is the fact that they can be interpreted as TRUE and FALSE, respectively,…

Perhaps the key word here is “interpreted”?

Yes.  The set of boolean values (that is, the BOOLEAN datatype) is of cardinality two.  The set of possible relations of degree zero (that is, the nonscalar datatype RELATION{}) is of cardinality two.

Enumerated, they are respectively {TRUE, FALSE} and {REL{}{}   REL{}{TUP{}} }.  As long as this notation keeps confusing you, don't read on and try to mark the accolades denoting the type's value set, the type designators in the nonscalar datatype RELATION{} and the members of the bodies of the values of that nonscalar datatype, all in different colours with a marker.  Just don't do it on your expensive display :-)

Now because both these types are of cardinality two, there is by definition some isomorphism (bijective function) that "maps" any member of the former to some member of the latter.  And it is this isomorphism that gives rise to the "interpretation" : the isomorphism means you can substitute TRUE for REL{}{TUP{}} and be certain you haven't lost any information at all (just don't try that in human communications with non-TTM folk :-) ).

(The "opposite" isomorphism is obviously equally "thinkable" and "valid" in the mathematical sense, but somehow it would probably go counter to most people's instincts to map "yippEEh there is a tuple" to FALSE and "deUrhM there is no tuple" to TRUE.)

Greg,

Before I forget, your understanding :

"IE: A Cartesian product of the attribute values for the two types"

is indeed correct.  It's not complete as a formal statement, but I probably could hardly do much better.  In your example, cardinalities 6 and 12 (and the enumerations) are right.

Just for total clarity and to put things into a concrete setting, would someone care to enumerate the tuples that comprise

S {S#,SNAME} ◄AND► P {P#,PNAME}

S {S#,SNAME} ◄OR► P {P#,PNAME}

Or at least a representative sample thereof?

The first would seem to be simply the cross-product (5 * 6 = 30 tuples). But the second? Is is 5 * 6 * 5 + 6 * 5 * 5 = 300 or 5 * 5 * 6 * 5 = 750 tuples, or something else again?

Andl - A New Database Language - andl.org