The Third Manifesto ForumThe Third Manifesto: TTM General discussionQuestions about New Relational Al …

Please or Register to create posts and topics.

Questions about New Relational Algebra A

#31 · February 28, 2020, 3:22 am

Quote from AntC on February 28, 2020, 3:22 am

Quote from dandl on February 28, 2020, 12:06 am

Quote from Erwin on February 27, 2020, 2:56 pm

Quote from dandl on February 27, 2020, 1:00 pm

So no, you stand little chance that somebody is going to want to enumerate that.

Thank you. That was much clearer, but now I have to ask: what' s the point?

I guess a similar consideration would apply to ◄NOT►, relational values made up of vast quantities of 'virtual' tuples to be weeded out by subsequent operations.

I can do the maths, but I've always liked to check out the intermediate results to see where we're headed. Maybe not this time.

The point is that the result of "R1 <OR> R2" is the ***only*** relation that truly complies with the CWA and is the extension of the predicate "P(R1) OR P(R2)".

(complies with CWA in the sense that for any absent tuple the corresponding proposition must be false, meaning that if some combo of R1 attribute values makes P(R1) true, then the disjunction is by definition also true, meaning any such R1 tuple must appear "accompanied" by every possible combo of R2 attribute values)

I get that, but that's not quite my concern. Apart from Boolean, the common attribute types we deal with are all infinite. That means the results of ◄NOT► and generalised ◄OR► will almost always be infinite relations. I was taught to be deeply suspicious of any kind of maths that has an infinity as an intermediate result. The CWA exists precisely for the purpose of avoiding having to deal with infinities. I don't see where and how that is dealt with in App-A.

I can do the maths, I just didn't know I had to. I have read App-A or bits of it many times, but I had not realised that two of the core operators routinely produce infinities and have to be dealt with by recourse to the CWA. An observation to that effect would have been helpful.

From Appendix A 'Dispensing with UNION'

Note: We do not concern ourselves here with the computational difficulties that might arise from our generalization of Codd's UNION, because at this point we are only defining an algebra. Various safety mechanisms can be (and normally are) imposed in practice to circumvent such difficulties. For similar reasons, we also do not concern ourselves with the high degree of redundancy that most relations produced by <OR> will exhibit.

The "safety mechanisms" have been well understood since Codd 1972 (also in HHT 1975), and are in fact embodied in 'HOW Tutorial D BUILDS ON A'. For example, the only Tutorial D operation that maps to <OR> is UNION; the only that maps to <NOT> is MINUS; and both those Tutorial D operators require their operands to have same heading so are not domain-dependent. Also note that r1 NOT MATCHING r2 (semiminus) is not translated to bare r1 <AND> <NOT> r2, but to r1 MINUS (r1 MATCHING r2); so can be implemented as a filter on r1, thus avoiding any "degree of redundancy".

From 'Dispensing with MINUS'

... Note: Computational difficulties arise here as they did with <OR>, but again we need not concern ourselves with them at this juncture.

Having just re-read the Intro and Motivation sections of Appendix A, I see no suggestion the A algebra is intended to be executable or computable. Neither do I see such a suggestion in Codd 1972's set of operators. "This paper attempts to provide a theoretical basis which may be used to determine how complete a selection capability is provide" says the Abstract. The operators characterise "a yardstick of selective power".

So A is giving a semantics for the operators of a D; as an example it translates Tutorial D operators to A.

It may also be relevant that as at when Appendix A was written, TTM RM Pre 1 said scalar types must be finite.

Quote from dandl on February 28, 2020, 12:06 am

Quote from Erwin on February 27, 2020, 2:56 pm

Quote from dandl on February 27, 2020, 1:00 pm

So no, you stand little chance that somebody is going to want to enumerate that.

Thank you. That was much clearer, but now I have to ask: what' s the point?

I guess a similar consideration would apply to ◄NOT►, relational values made up of vast quantities of 'virtual' tuples to be weeded out by subsequent operations.

I can do the maths, but I've always liked to check out the intermediate results to see where we're headed. Maybe not this time.

The point is that the result of "R1 <OR> R2" is the ***only*** relation that truly complies with the CWA and is the extension of the predicate "P(R1) OR P(R2)".

(complies with CWA in the sense that for any absent tuple the corresponding proposition must be false, meaning that if some combo of R1 attribute values makes P(R1) true, then the disjunction is by definition also true, meaning any such R1 tuple must appear "accompanied" by every possible combo of R2 attribute values)

I get that, but that's not quite my concern. Apart from Boolean, the common attribute types we deal with are all infinite. That means the results of ◄NOT► and generalised ◄OR► will almost always be infinite relations. I was taught to be deeply suspicious of any kind of maths that has an infinity as an intermediate result. The CWA exists precisely for the purpose of avoiding having to deal with infinities. I don't see where and how that is dealt with in App-A.

I can do the maths, I just didn't know I had to. I have read App-A or bits of it many times, but I had not realised that two of the core operators routinely produce infinities and have to be dealt with by recourse to the CWA. An observation to that effect would have been helpful.

From Appendix A 'Dispensing with UNION'

Note: We do not concern ourselves here with the computational difficulties that might arise from our generalization of Codd's UNION, because at this point we are only defining an algebra. Various safety mechanisms can be (and normally are) imposed in practice to circumvent such difficulties. For similar reasons, we also do not concern ourselves with the high degree of redundancy that most relations produced by <OR> will exhibit.

The "safety mechanisms" have been well understood since Codd 1972 (also in HHT 1975), and are in fact embodied in 'HOW Tutorial D BUILDS ON A'. For example, the only Tutorial D operation that maps to <OR> is UNION; the only that maps to <NOT> is MINUS; and both those Tutorial D operators require their operands to have same heading so are not domain-dependent. Also note that r1 NOT MATCHING r2 (semiminus) is not translated to bare r1 <AND> <NOT> r2, but to r1 MINUS (r1 MATCHING r2); so can be implemented as a filter on r1, thus avoiding any "degree of redundancy".

From 'Dispensing with MINUS'

... Note: Computational difficulties arise here as they did with <OR>, but again we need not concern ourselves with them at this juncture.

Having just re-read the Intro and Motivation sections of Appendix A, I see no suggestion the A algebra is intended to be executable or computable. Neither do I see such a suggestion in Codd 1972's set of operators. "This paper attempts to provide a theoretical basis which may be used to determine how complete a selection capability is provide" says the Abstract. The operators characterise "a yardstick of selective power".

So A is giving a semantics for the operators of a D; as an example it translates Tutorial D operators to A.

It may also be relevant that as at when Appendix A was written, TTM RM Pre 1 said scalar types must be finite.

#32 · February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

Author of SIRA_PRISE

#33 · February 28, 2020, 8:36 am

Quote from AntC on February 28, 2020, 8:36 am

Quote from Erwin on February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

tr2 = TUPLE_DEE I can't see a difficulty: TUPLE_DEE is a (empty) set, so can be unioned with tr1.

But yes this is getting tricky. Suppose BR2 is empty (whatever its heading). Then what in the Appendix A FORMAL DEFINITIONS is constraining tr2 to conform to the heading HR2? (We don't get that difficulty with the definition for <AND>, because that requires tr2 ∈ Br2 so tr2 must conform to the heading.) Possibly the definition for <NOT> suffers this worse: what constrains tr or ts to conform to the heading?

Do we assume a multi-sorted setbuilder algebra, such that the very appearance of tr1 ∈ Br1 constrains tr1 to be of the appropriate type? And in the definition for <NOT>, ts = tr constrains ts to be same type as tr (which is also constrained by a (non-)membership test).

What if Ur2 is empty? I think TTM explicitly excludes empty scalar types other than OMEGA, but that can't be an attribute type(?)

Quote from Erwin on February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

tr2 = TUPLE_DEE I can't see a difficulty: TUPLE_DEE is a (empty) set, so can be unioned with tr1.

But yes this is getting tricky. Suppose BR2 is empty (whatever its heading). Then what in the Appendix A FORMAL DEFINITIONS is constraining tr2 to conform to the heading HR2? (We don't get that difficulty with the definition for <AND>, because that requires tr2 ∈ Br2 so tr2 must conform to the heading.) Possibly the definition for <NOT> suffers this worse: what constrains tr or ts to conform to the heading?

Do we assume a multi-sorted setbuilder algebra, such that the very appearance of tr1 ∈ Br1 constrains tr1 to be of the appropriate type? And in the definition for <NOT>, ts = tr constrains ts to be same type as tr (which is also constrained by a (non-)membership test).

What if Ur2 is empty? I think TTM explicitly excludes empty scalar types other than OMEGA, but that can't be an attribute type(?)

#34 · February 28, 2020, 11:46 am

Quote from Hugh on February 28, 2020, 11:46 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Hugh,

The definition of the <OR> operator as stated in Appendix A of the DTATRM includes this condition:

Bs = { ts : exists tr1 exists tr2 ( ( tr1 ∈ Br1 or tr2 ∈ Br2 ) and ts = tr1 union tr2 ) }

Question: Is Erwin’s version acceptable to you as a replacement for the version in the DTATRM?

Question: Does Erwin’s Ur1 and Ur2 faithfully represent your meaning of “superset” in this quote from DTATRM regarding the <OR> operator?

The body of s contains every tuple that conforms to the heading of s and is a superset of either some tuple in the body of r1 or some tuple in the body of r2.

You flatter me, because I have had little formal training in logic and regard myself as a strict amateur even though I felt confident enough to teach the basics to computer science undergrads. That said, yes, I do believe the definitions are equivalent, though you have to read the surrounding text as well. For example, Appendix A has an important stipulation regarding common attributes. (I assume the double quote in the citation of Erwin's definition is a typo for left paren.)

Erwin's definition and uses of Urx cover a stipulation that Appendix A deals with in the introductory text under FORMAL DEFINITIONS.

Hugh

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Hugh,

The definition of the <OR> operator as stated in Appendix A of the DTATRM includes this condition:

Bs = { ts : exists tr1 exists tr2 ( ( tr1 ∈ Br1 or tr2 ∈ Br2 ) and ts = tr1 union tr2 ) }

Question: Is Erwin’s version acceptable to you as a replacement for the version in the DTATRM?

Question: Does Erwin’s Ur1 and Ur2 faithfully represent your meaning of “superset” in this quote from DTATRM regarding the <OR> operator?

The body of s contains every tuple that conforms to the heading of s and is a superset of either some tuple in the body of r1 or some tuple in the body of r2.

You flatter me, because I have had little formal training in logic and regard myself as a strict amateur even though I felt confident enough to teach the basics to computer science undergrads. That said, yes, I do believe the definitions are equivalent, though you have to read the surrounding text as well. For example, Appendix A has an important stipulation regarding common attributes. (I assume the double quote in the citation of Erwin's definition is a typo for left paren.)

Erwin's definition and uses of Urx cover a stipulation that Appendix A deals with in the introductory text under FORMAL DEFINITIONS.

Hugh

Coauthor of The Third Manifesto and related books.

#35 · February 28, 2020, 11:55 am

Quote from Hugh on February 28, 2020, 11:55 am

Quote from AntC on February 28, 2020, 8:36 am

Quote from Erwin on February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

tr2 = TUPLE_DEE I can't see a difficulty: TUPLE_DEE is a (empty) set, so can be unioned with tr1.

But yes this is getting tricky. Suppose BR2 is empty (whatever its heading). Then what in the Appendix A FORMAL DEFINITIONS is constraining tr2 to conform to the heading HR2? (We don't get that difficulty with the definition for <AND>, because that requires tr2 ∈ Br2 so tr2 must conform to the heading.) Possibly the definition for <NOT> suffers this worse: what constrains tr or ts to conform to the heading?

Do we assume a multi-sorted setbuilder algebra, such that the very appearance of tr1 ∈ Br1 constrains tr1 to be of the appropriate type? And in the definition for <NOT>, ts = tr constrains ts to be same type as tr (which is also constrained by a (non-)membership test).

What if Ur2 is empty? I think TTM explicitly excludes empty scalar types other than OMEGA, but that can't be an attribute type(?)

Nice question, but isn't the answer obvious? If Ur2 is empty then of course there are no tuples tr2 and so r1 <OR> r2 is empty. But Ur2 can be empty only if at least one of its attribute types is empty, and that can happen only if omega is an attribute type at some level of nesting in Hr2. I think. Am I right?

Hugh

Quote from AntC on February 28, 2020, 8:36 am

Quote from Erwin on February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

tr2 = TUPLE_DEE I can't see a difficulty: TUPLE_DEE is a (empty) set, so can be unioned with tr1.

But yes this is getting tricky. Suppose BR2 is empty (whatever its heading). Then what in the Appendix A FORMAL DEFINITIONS is constraining tr2 to conform to the heading HR2? (We don't get that difficulty with the definition for <AND>, because that requires tr2 ∈ Br2 so tr2 must conform to the heading.) Possibly the definition for <NOT> suffers this worse: what constrains tr or ts to conform to the heading?

Do we assume a multi-sorted setbuilder algebra, such that the very appearance of tr1 ∈ Br1 constrains tr1 to be of the appropriate type? And in the definition for <NOT>, ts = tr constrains ts to be same type as tr (which is also constrained by a (non-)membership test).

What if Ur2 is empty? I think TTM explicitly excludes empty scalar types other than OMEGA, but that can't be an attribute type(?)

Nice question, but isn't the answer obvious? If Ur2 is empty then of course there are no tuples tr2 and so r1 <OR> r2 is empty. But Ur2 can be empty only if at least one of its attribute types is empty, and that can happen only if omega is an attribute type at some level of nesting in Hr2. I think. Am I right?

Hugh

Coauthor of The Third Manifesto and related books.

#36 · February 28, 2020, 12:01 pm

Quote from Erwin on February 28, 2020, 7:38 am

Quote from Greg Klisch on February 27, 2020, 10:33 pm

To Hugh and Erwin:

Erwin,

you provided me with the following rewritten condition for the <OR> operator:

Bs = { ts : exists tr1 exists tr2 "(tr1 ∈ Br1 and tr2 ∈ Ur2) or (tr2 ∈ Br2 and tr1 ∈ Ur1) and ts = tr1 union tr2 ) }

With this explanation of Urx : “ In there Urx are the "universal" relations of their respective types (the ones that contain all possible tuples of their types).

At that time you asked me: “Does that actually alter the condition or are those conditions provably equivalent ?” Even though you posed that question as a student exercise, I decided that it is not proper for me to answer that question because Hugh has ownership of the original condition. So, I’m asking it of Hugh.

Just want to add here that very strictly speaking, the two are not 100% equivalent imo because the original (the shorter, TTM version) does not appear to explicitly require both tr1 and tr2 to have the specific headings such as to conform to the type of BR1/BR2. Meaning, strictly speaking, tr1 itself must also be member of Bs (take tr2 = TUPLE_DEE). Cannot be the intent, of course.

That matter is explicitly covered in the introductory text for the whole section FORMAL DEFINITIONS.

Hugh

Coauthor of The Third Manifesto and related books.

#37 · February 28, 2020, 5:39 pm

Quote from Hugh on February 28, 2020, 5:39 pm

Quote from p c on February 27, 2020, 8:07 pm

to: Greg Klisch

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

It’s also important to take posts claiming “I can do the math” with a grain of salt when they are accompanied by the kind of happy-go-lucky posts that an end-users might make, where “facts” trump theory, where a property of data is an unknown concept, where implications apparently can’t be premises of a logical argument and where Codd’s 1972 division which requires union-compatible operands is equated with various TTM expressions that don’t. (Even though the needed mathematical thinking is simple, comments such as “That’s English usage” don’t cut it.)

Note the 1970 warming: “...Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with them. …”

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid.

In the concise form preferred by some logicians and a tool I prefer natural join has this equivalance

( A=S) = ( -(A&S) = (-A & -S ) ) is valid .

This is not an expression that the average coder will recognize (and apparently some dbms developers too). Operator precedence is the customary logical precedence, & is conjunction, - is negation, and = is mutual implication/bi-implication. Because of poor eyesight I can’t risk more verbose expression forms.

The crucial equivalence is the first parenthesized term which gives Codd’s condition for natural join. It acts as a premise for a succeeding implication which has a premise of a negated conjunction. With suitable rearrangement other forms are possible such as conjunctive premises I just happen to like this form for certain visual advantages.

By contrast, the fAppendix A de Morgan example fails to be valid under Codd’s condition:

(A=S) = (-(A & S) = (-A + -S)) is not valid.

Occasionally I have posted examples of how following Codd’s natural join condition can lead to data designs that permit logically complete databases in the Godel sense, logically valid queries and updates and importantly designs that take much data housekeeping out of the hands of users and into the control of designers. If you swear only by the Appendix A catechism such possibilities are likely to be lost.

I have nothing to say about the phrase “semantically equivalent” in Appendix A.

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only. Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal. Unfortunately, pc's opening sentence is ambiguous: what is the referent for "which"? Is it the deviation from Codd1970 that he thinks "creates massive unnecessary use-ability differences", or is it, as I have pointed out, Codd1970 per se? I assume the former because he later claims that "Appendix A has tragic practical consequences because it misunderstands natural join". On the contrary, the consequences for me, as a daily user of Rel, would indeed be tragic if I didn't have TD's JOIN (which I take to be what pc calls "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Hugh

Quote from p c on February 27, 2020, 8:07 pm

to: Greg Klisch

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

It’s also important to take posts claiming “I can do the math” with a grain of salt when they are accompanied by the kind of happy-go-lucky posts that an end-users might make, where “facts” trump theory, where a property of data is an unknown concept, where implications apparently can’t be premises of a logical argument and where Codd’s 1972 division which requires union-compatible operands is equated with various TTM expressions that don’t. (Even though the needed mathematical thinking is simple, comments such as “That’s English usage” don’t cut it.)

Note the 1970 warming: “...Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with them. …”

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid.

In the concise form preferred by some logicians and a tool I prefer natural join has this equivalance

( A=S) = ( -(A&S) = (-A & -S ) ) is valid .

This is not an expression that the average coder will recognize (and apparently some dbms developers too). Operator precedence is the customary logical precedence, & is conjunction, - is negation, and = is mutual implication/bi-implication. Because of poor eyesight I can’t risk more verbose expression forms.

The crucial equivalence is the first parenthesized term which gives Codd’s condition for natural join. It acts as a premise for a succeeding implication which has a premise of a negated conjunction. With suitable rearrangement other forms are possible such as conjunctive premises I just happen to like this form for certain visual advantages.

By contrast, the fAppendix A de Morgan example fails to be valid under Codd’s condition:

(A=S) = (-(A & S) = (-A + -S)) is not valid.

Occasionally I have posted examples of how following Codd’s natural join condition can lead to data designs that permit logically complete databases in the Godel sense, logically valid queries and updates and importantly designs that take much data housekeeping out of the hands of users and into the control of designers. If you swear only by the Appendix A catechism such possibilities are likely to be lost.

I have nothing to say about the phrase “semantically equivalent” in Appendix A.

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only. Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal. Unfortunately, pc's opening sentence is ambiguous: what is the referent for "which"? Is it the deviation from Codd1970 that he thinks "creates massive unnecessary use-ability differences", or is it, as I have pointed out, Codd1970 per se? I assume the former because he later claims that "Appendix A has tragic practical consequences because it misunderstands natural join". On the contrary, the consequences for me, as a daily user of Rel, would indeed be tragic if I didn't have TD's JOIN (which I take to be what pc calls "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Hugh

Coauthor of The Third Manifesto and related books.

#38 · February 29, 2020, 12:24 am

Quote from AntC on February 29, 2020, 12:24 am

Quote from Hugh on February 28, 2020, 5:39 pm

Quote from p c on February 27, 2020, 8:07 pm

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

...

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid. ...

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

So I expect Paul to condemn Codd's post-1970 papers. Or is it possible (gasp!) that Codd 1970 is not the last word on the RM?

Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal.

Indeed. Unlike the union-compatible requirement that can be checked at compile time; a run-time error is going to render applications unusable: you have a Customer who hasn't yet placed any orders? Sorry, you can't see any Customer Orders atall. Or an Employee who's just started and hasn't received a payslip? Sorry, you can't see any payslips. Is that the "massive unnecessary use-ability differences" that Paul is talking about? As in Codd's 1970 model is unusable? All that's going to happen is programmers will avoid Natural Join and instead use equi-joins with projections which'll give us SQL SELECT. Oh yes, that's what they do. Many of them go further and warn "Natural Join: a disaster waiting to happen".

... "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Appendix A FORMAL DEFINITIONS for <AND>

We remark that the <AND> operator might logically be called the conjoin.

Quote from Hugh on February 28, 2020, 5:39 pm

Quote from p c on February 27, 2020, 8:07 pm

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

...

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid. ...

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

So I expect Paul to condemn Codd's post-1970 papers. Or is it possible (gasp!) that Codd 1970 is not the last word on the RM?

Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal.

Indeed. Unlike the union-compatible requirement that can be checked at compile time; a run-time error is going to render applications unusable: you have a Customer who hasn't yet placed any orders? Sorry, you can't see any Customer Orders atall. Or an Employee who's just started and hasn't received a payslip? Sorry, you can't see any payslips. Is that the "massive unnecessary use-ability differences" that Paul is talking about? As in Codd's 1970 model is unusable? All that's going to happen is programmers will avoid Natural Join and instead use equi-joins with projections which'll give us SQL SELECT. Oh yes, that's what they do. Many of them go further and warn "Natural Join: a disaster waiting to happen".

... "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Appendix A FORMAL DEFINITIONS for <AND>

We remark that the <AND> operator might logically be called the conjoin.

#39 · February 29, 2020, 3:01 pm

Quote from Hugh on February 29, 2020, 3:01 pm

Quote from AntC on February 29, 2020, 12:24 am

Quote from Hugh on February 28, 2020, 5:39 pm

Quote from p c on February 27, 2020, 8:07 pm

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

...

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid. ...

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

So I expect Paul to condemn Codd's post-1970 papers. Or is it possible (gasp!) that Codd 1970 is not the last word on the RM?

Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal.

Indeed. Unlike the union-compatible requirement that can be checked at compile time; a run-time error is going to render applications unusable: you have a Customer who hasn't yet placed any orders? Sorry, you can't see any Customer Orders atall. Or an Employee who's just started and hasn't received a payslip? Sorry, you can't see any payslips. Is that the "massive unnecessary use-ability differences" that Paul is talking about? As in Codd's 1970 model is unusable? All that's going to happen is programmers will avoid Natural Join and instead use equi-joins with projections which'll give us SQL SELECT. Oh yes, that's what they do. Many of them go further and warn "Natural Join: a disaster waiting to happen".

... "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Appendix A FORMAL DEFINITIONS for <AND>

We remark that the <AND> operator might logically be called the conjoin.

Thanks Antc. Chris must have written that. Apologies to pc.

As soon as I discovered ISBL in 1978 I realised that Codd's early writings were merely seminal—that it would be up to others, language designers and implementers, to "stand on his shoulders" and eventually clean up. At the time that perception didn't bother me. Why should it? But Codd never seemed to show much interest in other people's work that he had inspired. And he still had a lot of undue influence even when he later came up with poor ideas (e.g. 3VL) and poor work (e.g. RM/T and RM/V2). As a result, the industry (at least) didn't clean up. It's all very sad and I have indeed wept real tears over it, as I wrote in the intro to my contribution in RDB Writings 1985-1989.

Hugh

Quote from AntC on February 29, 2020, 12:24 am

Quote from Hugh on February 28, 2020, 5:39 pm

Quote from p c on February 27, 2020, 8:07 pm

As well-intentioned and well-written as Appendix A is, it should be de rigueur to approach it with an understanding of its departures from the 1970 paper. Especially noticeable is the difference between conjoin and Codd’s natural join which creates massive unnecessary use-ability differences.

...

Note especially section 2.1.3 from 1970 where two relations are defined as joinable without loss of information when they have equal projections on common attributes. Codd distinguishes this as natural join, different from all other joins. In other words he represents data properties with attribute values. (Appendix A dependence on common sets of attributes is just a red herring.)

Although Appendix A has tragic practical consequences because it misunderstands natural join, it is at least consistent in its appeal to de Morgan’s laws. But this is just because de Morgan is prematurely applied. When it is applied in agreement with Codd it takes a different form which is logically valid and the Appendix A form becomes invalid. ...

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

So I expect Paul to condemn Codd's post-1970 papers. Or is it possible (gasp!) that Codd 1970 is not the last word on the RM?

Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and the those projections aren't equal.

Indeed. Unlike the union-compatible requirement that can be checked at compile time; a run-time error is going to render applications unusable: you have a Customer who hasn't yet placed any orders? Sorry, you can't see any Customer Orders atall. Or an Employee who's just started and hasn't received a payslip? Sorry, you can't see any payslips. Is that the "massive unnecessary use-ability differences" that Paul is talking about? As in Codd's 1970 model is unusable? All that's going to happen is programmers will avoid Natural Join and instead use equi-joins with projections which'll give us SQL SELECT. Oh yes, that's what they do. Many of them go further and warn "Natural Join: a disaster waiting to happen".

... "conjoin" and it's his fault if I'm wrong about that, because Chris and I don't use that term anywhere afaik).

Appendix A FORMAL DEFINITIONS for <AND>

We remark that the <AND> operator might logically be called the conjoin.

Thanks Antc. Chris must have written that. Apologies to pc.

As soon as I discovered ISBL in 1978 I realised that Codd's early writings were merely seminal—that it would be up to others, language designers and implementers, to "stand on his shoulders" and eventually clean up. At the time that perception didn't bother me. Why should it? But Codd never seemed to show much interest in other people's work that he had inspired. And he still had a lot of undue influence even when he later came up with poor ideas (e.g. 3VL) and poor work (e.g. RM/T and RM/V2). As a result, the industry (at least) didn't clean up. It's all very sad and I have indeed wept real tears over it, as I wrote in the intro to my contribution in RDB Writings 1985-1989.

Hugh

Coauthor of The Third Manifesto and related books.

#40 · February 29, 2020, 3:43 pm

Quote from p c on February 29, 2020, 3:43 pm

To: Hugh Darwen

I addressed the quoted post to Greg Klisch because he asked the kind of probing questions which might be useful for repairing TTM and Appendix A, as opposed to presuming that following them necessarily results in a so-called truly relational system, the general affliction here.. But it would be impolite not to respond to the comments about that post by one of the co-authors.

Some points in reply:

The "referent" of "which" is the obvious subject of the sentence, spelled by the word "difference".

Much of what Codd wrote needs to be read as a puzzle which needs to be solved, or should be. Most people can't be bothered to do that and get by with criticisms based on isolated pieces of the puzzle or homespun logic instead.

Appendix A says that <AND>> might logically be called conjoin. It's a good handle for avoiding conflation with Codd's natural join.

Codd was consistent in his treatment of natural join right up to the 1990 ibook in his example of join deletion but this may not be obvious to casual readers.

It's not surprising that coders looking for long-term consistent definitions might take him to be inconsistent. It's equally likely that he fastened on a perhaps more obvious characterization of join as an equivalence, namely that ( S & A ) bi-implies ( S &A{"common attributes"} bi-implies A&S{"common attributes" } ). Such an equivalence applies just as much to queries as it might to updates.

It seems a certainty that he had this equivalence in mind in his 1990 reference to "backchaining". ( in my perhaps clumsy way, for some years I tried to make this very point in this group only to be met by the email equivalents of blank stares or derision.)

Some of what he wrote was obscure and some was aimed at his preferred logical language style and some assumed an open-world assumption plus a few other things I haven't tried to understand either but he went to some lengths to show that he was okay with an algebraic style that is apparently more accessible to many people.

But his "natural structure" of data suggests more than tuple variable types and relation variable types but happily not much more.

When it is understood that the aspects of a database language that involve backchaining will correspond with various inverse functions one is easily led to a categorization of basic relational structures based on four essential operations: union, difference, projection and join.

One sentence in the subject post entranced me: "Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and those projections aren't equal." It reminded me of the hot air balloon builder who when shown a Jet Plane asked "but where is the gondola?" Coder clubs everywhere are likely to fall into the trap of thinking that theory must follow code. Do r1 and r2 satisfy the equivalence or not? If they don’t, then what is the problem with a runtime error? Is it really nonsense?

(I noticed some supposed examples by Antc, which are laughable).

Natural join as well as conjoin generally drops tuples. It looks like cognitive dissonance when a type system allows people to think that what is joined is represented by relvars r1 and f2, including the very tuples that are dropped! Loosely, an output is conflated with an input. The thought that natural join must cause execution errors has to be born out of mysticism.

It doesn't matter what somebody thinks NJ means when a really relational dbms is used. Since extensions are available all it need do is check that ( r1 join r2 ) satisfies the above characteristic equivalence. Going beyond the above, if it does not then the supposed logical structure is not a join and must be one of the other three structures (ignoring external host language functions for the moment). For example, the characteristic equivalence of the difference structure should be obvious. It may fail to be one of the three in which case now you have a situation that you really could call "nonsense".

To: Hugh Darwen

I addressed the quoted post to Greg Klisch because he asked the kind of probing questions which might be useful for repairing TTM and Appendix A, as opposed to presuming that following them necessarily results in a so-called truly relational system, the general affliction here.. But it would be impolite not to respond to the comments about that post by one of the co-authors.

Some points in reply:

The "referent" of "which" is the obvious subject of the sentence, spelled by the word "difference".

Much of what Codd wrote needs to be read as a puzzle which needs to be solved, or should be. Most people can't be bothered to do that and get by with criticisms based on isolated pieces of the puzzle or homespun logic instead.

Appendix A says that <AND>> might logically be called conjoin. It's a good handle for avoiding conflation with Codd's natural join.

Codd was consistent in his treatment of natural join right up to the 1990 ibook in his example of join deletion but this may not be obvious to casual readers.

It's not surprising that coders looking for long-term consistent definitions might take him to be inconsistent. It's equally likely that he fastened on a perhaps more obvious characterization of join as an equivalence, namely that ( S & A ) bi-implies ( S &A{"common attributes"} bi-implies A&S{"common attributes" } ). Such an equivalence applies just as much to queries as it might to updates.

It seems a certainty that he had this equivalence in mind in his 1990 reference to "backchaining". ( in my perhaps clumsy way, for some years I tried to make this very point in this group only to be met by the email equivalents of blank stares or derision.)

Some of what he wrote was obscure and some was aimed at his preferred logical language style and some assumed an open-world assumption plus a few other things I haven't tried to understand either but he went to some lengths to show that he was okay with an algebraic style that is apparently more accessible to many people.

But his "natural structure" of data suggests more than tuple variable types and relation variable types but happily not much more.

When it is understood that the aspects of a database language that involve backchaining will correspond with various inverse functions one is easily led to a categorization of basic relational structures based on four essential operations: union, difference, projection and join.

One sentence in the subject post entranced me: "Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and those projections aren't equal." It reminded me of the hot air balloon builder who when shown a Jet Plane asked "but where is the gondola?" Coder clubs everywhere are likely to fall into the trap of thinking that theory must follow code. Do r1 and r2 satisfy the equivalence or not? If they don’t, then what is the problem with a runtime error? Is it really nonsense?

(I noticed some supposed examples by Antc, which are laughable).

Natural join as well as conjoin generally drops tuples. It looks like cognitive dissonance when a type system allows people to think that what is joined is represented by relvars r1 and f2, including the very tuples that are dropped! Loosely, an output is conflated with an input. The thought that natural join must cause execution errors has to be born out of mysticism.

It doesn't matter what somebody thinks NJ means when a really relational dbms is used. Since extensions are available all it need do is check that ( r1 join r2 ) satisfies the above characteristic equivalence. Going beyond the above, if it does not then the supposed logical structure is not a join and must be one of the other three structures (ignoring external host language functions for the moment). For example, the characteristic equivalence of the difference structure should be obvious. It may fail to be one of the three in which case now you have a situation that you really could call "nonsense".

TTM Forum

The Forum for Discussion about The Third Manifesto and Related Matters

Questions about New Relational Algebra A