The Forum for Discussion about The Third Manifesto and Related Matters


Questions about New Relational Algebra A

Page 5 of 5

“... At the time that perception didn't bother me. Why should it? ...”

There is a pop culture explanation for why it needn’t have bothered Codd either.

Some schools of industrial psychology like to categorize people into four groups: accommodators, divergers, convergers, and assimilators. I remember being tested by two different employers, along with a number of other people, for what reason I never really knew. In one case we were even sent off site for six days, where we were tested and made to do group exercises day and night.

I did come to believe in a couple of points that the exercises demonstrated, especially that it's poison for any group endeavour to put an extreme converger and an extreme assimilator in the same room without other people to buffer them; it can even be seriously incendiary. The joke about the former was that they prefer not to know what the problem is because that will only slow them down in solving it. The joke about the latter is that they only want to understand problems and couldn't care less about solving them.

I suspect that Codd would have tested as an extreme assimilator.

Quote from p c on February 29, 2020, 3:43 pm

 

To: Hugh Darwen

I addressed the quoted post to Greg Klisch because he asked the kind of probing questions that might be useful for repairing TTM and Appendix A, as opposed to presuming that following them necessarily results in a so-called truly relational system, the general affliction here. But it would be impolite not to respond to the comments about that post by one of the co-authors.

Some points in reply:

The "referent" of "which" is the obvious subject of the sentence, namely the word "difference".

Much of what Codd wrote needs to be read as a puzzle which needs to be solved, or should be. Most people can't be bothered to do that and get by with criticisms based on isolated pieces of the puzzle or homespun logic instead.

Appendix A says that <AND> might logically be called conjoin. It's a good handle for avoiding conflation with Codd's natural join.
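For readers who want something concrete: restricted to finite relations, <AND> behaves as the familiar natural join. The Python below is a minimal sketch under assumptions of my own, not Appendix A notation: a relation is modelled as a pair (heading, body), where the heading is a frozenset of attribute names and each tuple is a frozenset of (attribute, value) pairs. The names `relation` and `conjoin` are hypothetical.

```python
from typing import Any, FrozenSet, Iterable, Mapping, Tuple

# Hypothetical representation: a tuple is a frozenset of (attribute, value)
# pairs, and a relation is a (heading, body) pair.
Tup = FrozenSet[Tuple[str, Any]]
Relation = Tuple[FrozenSet[str], FrozenSet[Tup]]


def relation(heading: Iterable[str], rows: Iterable[Mapping[str, Any]]) -> Relation:
    """Build a relation, checking that every row has exactly the given heading."""
    h = frozenset(heading)
    body = frozenset(frozenset(row.items()) for row in rows)
    for t in body:
        if frozenset(a for a, _ in t) != h:
            raise ValueError("row does not match heading")
    return h, body


def conjoin(r1: Relation, r2: Relation) -> Relation:
    """<AND> on finite relations: merge every pair of tuples that agree
    on the attributes the two headings have in common."""
    (h1, b1), (h2, b2) = r1, r2
    common = h1 & h2
    out = set()
    for t1 in b1:
        d1 = dict(t1)
        for t2 in b2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in common):
                out.add(frozenset({**d1, **d2}.items()))
    return h1 | h2, frozenset(out)


S = relation({"sno", "city"}, [{"sno": "S1", "city": "London"},
                               {"sno": "S2", "city": "Paris"}])
P = relation({"pno", "city"}, [{"pno": "P1", "city": "London"}])
heading, body = conjoin(S, P)
print(sorted(heading))                  # ['city', 'pno', 'sno']
print([dict(sorted(t)) for t in body])  # [{'city': 'London', 'pno': 'P1', 'sno': 'S1'}]
```

With no common attributes the same function yields a Cartesian product, and with identical headings it yields intersection, which is why a single name for all three cases is handy.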

Codd was consistent in his treatment of natural join right up to the 1990 book, in his example of join deletion, but this may not be obvious to casual readers.

It's not surprising that coders looking for long-term consistent definitions might take him to be inconsistent. It's equally likely that he fastened on a perhaps more obvious characterization of join as an equivalence, namely that ( S & A ) bi-implies ( S & A{"common attributes"} bi-implies A & S{"common attributes"} ). Such an equivalence applies just as much to queries as it might to updates.

It seems a certainty that he had this equivalence in mind in his 1990 reference to "backchaining". (In my perhaps clumsy way, for some years I tried to make this very point in this group only to be met by the email equivalents of blank stares or derision.)

Some of what he wrote was obscure, some was aimed at his preferred logical language style, and some assumed an open-world assumption plus a few other things I haven't tried to understand either; but he went to some lengths to show that he was okay with an algebraic style that is apparently more accessible to many people.

But his "natural structure" of data suggests more than tuple variable types and relation variable types, but happily not much more.

When it is understood that the aspects of a database language that involve backchaining will correspond with various inverse functions one is easily led to a categorization of basic relational structures based on four essential operations: union, difference, projection and join.
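As a reference point for the four operations named above, here is a minimal Python sketch of union, difference, and projection over finite relations (join follows the same set-of-tuples pattern and is left out for brevity). It illustrates only the textbook operators, not the "structures" or inverse functions the post has in mind. The representation (heading as a frozenset of names, tuples as frozensets of (attribute, value) pairs) and all function names are assumptions for illustration.

```python
from typing import Any, FrozenSet, Iterable, Mapping, Tuple

# Hypothetical representation: relation = (heading, body).
Tup = FrozenSet[Tuple[str, Any]]
Relation = Tuple[FrozenSet[str], FrozenSet[Tup]]


def relation(heading: Iterable[str], rows: Iterable[Mapping[str, Any]]) -> Relation:
    """Build a relation from dict rows (heading check omitted for brevity)."""
    return frozenset(heading), frozenset(frozenset(r.items()) for r in rows)


def union(r1: Relation, r2: Relation) -> Relation:
    """Tuples in either operand; headings must be identical."""
    if r1[0] != r2[0]:
        raise TypeError("UNION requires identical headings")
    return r1[0], r1[1] | r2[1]


def difference(r1: Relation, r2: Relation) -> Relation:
    """Tuples in r1 but not in r2; headings must be identical."""
    if r1[0] != r2[0]:
        raise TypeError("MINUS requires identical headings")
    return r1[0], r1[1] - r2[1]


def project(r: Relation, attrs: Iterable[str]) -> Relation:
    """Keep only the named attributes; duplicates collapse because the body is a set."""
    keep = frozenset(attrs)
    if not keep <= r[0]:
        raise ValueError("projection names an attribute not in the heading")
    return keep, frozenset(frozenset(p for p in t if p[0] in keep) for t in r[1])


SP = relation({"sno", "pno"}, [{"sno": "S1", "pno": "P1"},
                               {"sno": "S1", "pno": "P2"},
                               {"sno": "S2", "pno": "P1"}])
print(len(project(SP, {"sno"})[1]))  # 2: S1 appears only once after projection
```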

One sentence in the subject post entranced me: "Think of all the run-time errors that the system has to raise when r1 NJ r2 is attempted and those projections aren't equal." It reminded me of the hot-air balloon builder who, when shown a jet plane, asked "But where is the gondola?" Coder clubs everywhere are likely to fall into the trap of thinking that theory must follow code. Do r1 and r2 satisfy the equivalence or not? If they don’t, then what is the problem with a run-time error? Is it really nonsense?

(I noticed some supposed examples by Antc, which are laughable.)

Natural join as well as conjoin generally drops tuples. It looks like cognitive dissonance when a type system allows people to think that what is joined is represented by relvars r1 and r2, including the very tuples that are dropped! Loosely, an output is conflated with an input. The thought that natural join must cause execution errors has to be born out of mysticism.

It doesn't matter what somebody thinks NJ means when a really relational DBMS is used. Since extensions are available, all it need do is check that ( r1 join r2 ) satisfies the above characteristic equivalence. Going beyond the above, if it does not, then the supposed logical structure is not a join and must be one of the other three structures (ignoring external host language functions for the moment). For example, the characteristic equivalence of the difference structure should be obvious. It may fail to be one of the three, in which case you now have a situation that you really could call "nonsense".

As usual, none of this makes the slightest sense. Minutiae aside, the relational model is notionally pretty simple stuff -- a set of structures (relations) along with a set of operators (relational algebra) that accept relations and sometimes attribute lists and return relations or scalar values. You can turn the entirety of the notion into languages like Tutorial D. You can turn some of the notion into languages like SQL. You can take an equivalent notion with a different fundamental construct -- say, arrays -- and create a language like APL, or use sets to create SETL, or use extended sets to create whatever David L Childs created, or use lists to create LISP, and so on.

In each case, you start with a basic construct, add some sensible and logically-based (either formally or informally) operators, and you wind up with a useful, practical thing. You can use it to write useful computer programs.

Generally, the useful thing is provably useful, too. You can prove that Tutorial D's JOIN doesn't produce bogus results, that APL's dyadic iota doesn't produce bogus results, that LISP's CDR doesn't produce bogus results, and so on.

That being so, p_c, I have no idea what you're on about here. The only thing that matters in terms of logic and language implementation is whether or not the operators are logical and -- ideally -- reasonably ergonomic, composable, and possessed of some other desirable properties that don't affect their logical correctness.

That. Is. All. That. Matters.

I can only assume you've fallen into some bizarre trap of believing the specific choice of operators -- Codd's join vs Appendix A vs Tutorial D's JOIN vs I-don't-know-what -- has some great philosophical or operational significance, when it obviously doesn't.

Have you ever actually developed a database management system, a database-driven application, or a database to underpin some real-world purpose?

I can only assume from your writings that you've done none of these things, because what you write is so thoroughly divorced from any practical or theoretical reality that it has no earthly meaning whatsoever, and thus appears to be nothing but trollish gibberish, borrowing words and phrases from relational theory and practice but bearing no other connection to it, and thus ultimately of no value here other than provoking responses from those of us who have struggled over the years to make the slightest sense of it.

Therefore, I suggest, in the interest of not wasting anyone's time any further -- including your own -- that you just bloody stop it, and instead spend some time learning the barest practical fundamentals of the field in which you deign to comment, so that at least there's a distant hope that at some point in the future you'll manage to make a post about databases, the relational model, coding, Codd, you-name-it that not only conveys meaning, but actually serves the purpose of human language and communicates a message.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.
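To make the contrast concrete, here is a minimal Python sketch of the two readings: an ordinary natural join that silently drops unmatched tuples, and a hypothetical variant that refuses to run unless the operands' projections on their common attributes are equal, i.e. the precondition described above. Codd stated it for binary relations; generalizing it to arbitrary headings is an assumption here, as are the representation and all function names.

```python
from typing import Any, FrozenSet, Iterable, Mapping, Tuple

# Hypothetical representation: relation = (heading, body).
Tup = FrozenSet[Tuple[str, Any]]
Relation = Tuple[FrozenSet[str], FrozenSet[Tup]]


def relation(heading: Iterable[str], rows: Iterable[Mapping[str, Any]]) -> Relation:
    return frozenset(heading), frozenset(frozenset(r.items()) for r in rows)


def project_body(r: Relation, attrs: FrozenSet[str]) -> FrozenSet[Tup]:
    """Body of r projected onto attrs."""
    return frozenset(frozenset(p for p in t if p[0] in attrs) for t in r[1])


def natural_join(r1: Relation, r2: Relation) -> Relation:
    """Tutorial D-style JOIN: tuples with no partner are dropped."""
    common = r1[0] & r2[0]
    out = set()
    for t1 in r1[1]:
        for t2 in r2[1]:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in common):
                out.add(frozenset({**d1, **d2}.items()))
    return r1[0] | r2[0], frozenset(out)


def codd1970_join(r1: Relation, r2: Relation) -> Relation:
    """Hypothetical 'joinable only if' variant: raise unless both operands
    project to the same set of values on their common attributes."""
    common = r1[0] & r2[0]
    if project_body(r1, common) != project_body(r2, common):
        raise ValueError("not joinable: projections on common attributes differ")
    return natural_join(r1, r2)


S = relation({"sno", "city"}, [{"sno": "S1", "city": "London"},
                               {"sno": "S2", "city": "Paris"}])
P = relation({"pno", "city"}, [{"pno": "P1", "city": "London"}])
print(len(natural_join(S, P)[1]))  # 1: the Paris supplier is dropped
try:
    codd1970_join(S, P)
except ValueError as e:
    print(e)  # not joinable: projections on common attributes differ
```

When the precondition holds, every tuple of each operand finds a partner, so projecting the result back onto either operand's heading returns that operand unchanged; that is the sense in which such a join loses nothing.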

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

Just to clarify my own understanding, is Codd 1970 ("Data Banks") now of anything but historical value?

The earliest paper I have is 1969 ("Relations"), which is a good read but falls short of a formal treatment. 1970 is written in quite a chatty style suited to publication in the Communications of the ACM, and it's introducing quite a novel idea with much effort in providing a setting, but I haven't found a need to refer to it. I have always relied on Codd 1972 ("Relational Completeness") as being the formal treatment of the topic, and looked to other authors to fill in the gaps (mainly Alice). The later papers (1983ff) take on quite a different character.

Am I missing something in the earlier papers?

Andl - A New Database Language - andl.org
Quote from dandl on March 1, 2020, 12:36 am

I have rarely looked at Codd's 1970 paper, which is a very hard read (for me, at least). I got all my original understanding of relational operators from ISBL. Anyway, I dragged out the 1970 paper again because I couldn't believe what pc was saying about Codd's definition of "natural join". He's right: Codd really did define it that way, requiring the projections of the operands on their common attributes to be equal. But that definition is complete nonsense for practical purposes, even if we overlook the fact that he defined it for binary relations only.

Codd 1972 'Relational Completeness' doesn't mention that requirement; although it does refer to "the natural join of the given relations as defined in [2]" -- which is 1971 'Further Normalization'. I can't find that online. Perhaps it's considering normalisation as in 'lossless join'? By 1979 'More Meaning', Codd's operators seem to have wandered further. Again no requirement for the join to be lossless. He distinguishes 'nonloss natural joins', presumably from joins that are lossy but still natural. Oh yes: there's an operator OUTER NATURAL JOIN which introduces nulls.

Just to clarify my own understanding, is Codd 1970 ("Data Banks") now of anything but historical value?

Suggest you look at Chris Date's evaluation of Codd's early papers -- which covers both 1969 and 1970.

The earliest paper I have is 1969 ("Relations") which is a good read but falls short of a formal treatment. 1970 is written in quite a chatty style suited to publication in the Communications of the ACM, and it's introducing quite a novel idea with much effort in providing a setting, but I haven't found a need to refer to it.

Mostly 1970 is a "chatty style" version of 1969. But there are a few novel bits, and I think they're important: within section 1.3, the "domain-unordered counterparts" of relations, footnote 2, and the associated discussion. In particular, I think 1970 shows why Codd didn't want to equate his 'domain' with programming language 'type'. And I think D&D (including in Date's evaluation) never grokked that.

I have always relied on Codd 1972 ("Relational Completeness") as being the formal treatment of the topic, and looked to other authors to fill in the gaps (mainly Alice).

1972 makes very brief mention of "domain-unordered counterparts", but there's no formal treatment, and for that reason I always regard 1972 as defective (there's no understanding of RENAME in the definition of 'Relationally Complete'; I think 1972 is flawed because the algebra derives from ALPHA/calculus, and there's no justification for where that comes from).
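Why RENAME matters: in an algebra whose headings are sets of attribute names, the names alone decide which attributes a join matches on, so some renaming operator is needed, for instance to join a relation with itself. The Python below is a minimal sketch of a Tutorial D-style RENAME, not Codd's 1972 formulation; the representation and the names `rename` and `EMP` are hypothetical.

```python
from typing import Any, FrozenSet, Mapping, Tuple

# Hypothetical representation: relation = (heading, body).
Tup = FrozenSet[Tuple[str, Any]]
Relation = Tuple[FrozenSet[str], FrozenSet[Tup]]


def rename(r: Relation, mapping: Mapping[str, str]) -> Relation:
    """Rename attributes simultaneously per mapping (old -> new);
    attributes not mentioned keep their names."""
    heading, body = r
    missing = set(mapping) - heading
    if missing:
        raise ValueError(f"not in heading: {sorted(missing)}")
    new_heading = frozenset(mapping.get(a, a) for a in heading)
    if len(new_heading) != len(heading):
        raise ValueError("renaming would merge two attributes")
    new_body = frozenset(frozenset((mapping.get(a, a), v) for a, v in t) for t in body)
    return new_heading, new_body


# EMP(emp, boss): renaming emp -> boss and boss -> grandboss lets a natural
# join of EMP with the renamed copy match each employee's boss to that boss's boss.
EMP = (frozenset({"emp", "boss"}),
       frozenset({frozenset({("emp", "E2"), ("boss", "E1")})}))
print(sorted(rename(EMP, {"emp": "boss", "boss": "grandboss"})[0]))  # ['boss', 'grandboss']
```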

The later papers (1983ff) take on quite a different character.

There are a few good parts in 1979 'More Meaning', but it's more of a curate's egg than anything. Codd's 12 rules (1985?) I think are worth considering for what they should have said. The stuff after that is dross, and gets embarrassingly worse.

Am I missing something in the earlier papers?

 

Quote from dandl on March 1, 2020, 12:36 am

Just to clarify my own understanding, is Codd 1970 ("Data Banks") now of anything but historical value?

I suppose it depends how you define "value".

The tendency of some to venerate Codd's early (and sometimes later) writings as if they're definitive, flawless, inviolable, laden with hidden treasure, and misunderstood or unfairly-ignored Truth is decidedly odd and misguided. In most academic, scientific, and technical communities that reference the relational model, Codd's 1970 paper is regarded the same way almost all seminal papers in any field are regarded -- as starting something important but by no means the endpoint, and generally no longer relevant.

Subsequent work is almost invariably regarded as more important and usually corrects and/or refines the flawed initial work. In thinking and writing, citations should form a chain by pointing to the nearest relevant prior work, not a sunburst back to the origin, though seminal works are often mentioned to provide context.

Note that Codd's subsequent contributions to the body of work on the relational model are generally ignored. They're usually considered poor deviations rather than corrections or refinement.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on March 1, 2020, 11:55 am
Quote from dandl on March 1, 2020, 12:36 am

Just to clarify my own understanding, is Codd 1970 ("Data Banks") now of anything but historical value?

I suppose it depends how you define "value".

The tendency of some to venerate Codd's early (and sometimes later) writings as if they're definitive, flawless, inviolable, laden with hidden treasure, and misunderstood or unfairly-ignored Truth is decidedly odd and misguided. In most academic, scientific, and technical communities that reference the relational model, Codd's 1970 paper is regarded the same way almost all seminal papers in any field are regarded -- as starting something important but by no means the endpoint, and generally no longer relevant.

Not my intent at all. Just a simple question: is 1972 enough, or are earlier papers worthy of study or quoting, other than in a historical context?

Subsequent work is almost invariably regarded as more important and usually corrects and/or refines the flawed initial work. In thinking and writing, citations should form a chain by pointing to the nearest relevant prior work, not a sunburst back to the origin, though seminal works are often mentioned to provide context.

No argument, but which? Codd 1972 is formal, readable and 14 pages. Most of what D&D wrote is prolix, usually beyond my capacity to sustain reading, and AFAICT does not have any concise, formal, quotable treatment of a definitive RM and RA. App-A is concise and formal, but serves a different purpose. Alice is way more formal but also very long. If you know of any short formal authoritative treatment of the RM and RA since Codd I would happily prefer that over Codd. I don't know one.

Note that Codd's subsequent contributions to the body of work on the relational model are generally ignored. They're usually considered poor deviations rather than corrections or refinement.

Yes.

Andl - A New Database Language - andl.org

... Note: Computational difficulties arise here as they did with <OR>, but again we need not concern ourselves with them at this juncture.

Having just re-read the Intro and Motivation sections of Appendix A, I see no suggestion that the A algebra is intended to be executable or computable. Neither do I see such a suggestion in Codd 1972's set of operators. "This paper attempts to provide a theoretical basis which may be used to determine how complete a selection capability is provided" says the Abstract. The operators characterise "a yardstick of selective power".

So A is giving a semantics for the operators of a D; as an example it translates Tutorial D operators to A.

It may also be relevant that, at the time Appendix A was written, TTM RM Pre 1 said scalar types must be finite.

My concern is not with how to turn an algebra into working code -- I can do that just fine.

What troubled me was the realisation that those two operators have finite arguments but return an infinite result, and the question as to whether the algebra is robust in handling infinities. If the types are merely large but finite there is no problem, but if the types are infinite and there are infinite intermediate results, are the final results valid? Is the algebra compromised?
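One way to see the concern: with finite types, <NOT> and <OR> can be computed by enumerating every tuple of the result heading, and the size of the result is governed by the attribute types rather than by the operands. In the Python sketch below (representation and names are assumptions, not Appendix A notation), the one-tuple arguments stay fixed while the results grow with the domain size; with an infinite type the same enumeration would never terminate. It illustrates the problem only and says nothing about whether the algebra is compromised.

```python
from itertools import product
from typing import Any, FrozenSet, Iterable, Iterator, Mapping, Tuple

Tup = FrozenSet[Tuple[str, Any]]


def all_tuples(types: Mapping[str, Iterable[Any]]) -> Iterator[Tup]:
    """Every tuple over the heading; types maps attribute name -> finite domain."""
    names = sorted(types)
    for values in product(*(list(types[n]) for n in names)):
        yield frozenset(zip(names, values))


def a_not(types: Mapping[str, Iterable[Any]], body: FrozenSet[Tup]) -> FrozenSet[Tup]:
    """<NOT>: every tuple of the heading that is not in body."""
    return frozenset(t for t in all_tuples(types) if t not in body)


def a_or(types1, body1, types2, body2) -> FrozenSet[Tup]:
    """<OR>: tuples over the combined heading whose projection onto heading 1
    is in body1, or whose projection onto heading 2 is in body2.
    Assumes a shared attribute has the same domain in both operands."""
    combined = {**types1, **types2}
    out = set()
    for t in all_tuples(combined):
        p1 = frozenset(p for p in t if p[0] in types1)
        p2 = frozenset(p for p in t if p[0] in types2)
        if p1 in body1 or p2 in body2:
            out.add(t)
    return frozenset(out)


r1 = frozenset({frozenset({("x", 1)})})  # one-tuple relation over {x}
r2 = frozenset({frozenset({("y", 2)})})  # one-tuple relation over {y}
for n in (10, 100):
    dom = range(n)
    print(n, len(a_not({"x": dom}, r1)), len(a_or({"x": dom}, r1, {"y": dom}, r2)))
# 10 9 19
# 100 99 199
```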

Andl - A New Database Language - andl.org
Quote from dandl on March 2, 2020, 12:17 am
Quote from Dave Voorhis on March 1, 2020, 11:55 am
Quote from dandl on March 1, 2020, 12:36 am

Just to clarify my own understanding, is Codd 1970 ("Data Banks") now of anything but historical value?

I suppose it depends how you define "value".

The tendency of some to venerate Codd's early (and sometimes later) writings as if they're definitive, flawless, inviolable, laden with hidden treasure, and misunderstood or unfairly-ignored Truth is decidedly odd and misguided. In most academic, scientific, and technical communities that reference the relational model, Codd's 1970 paper is regarded the same way almost all seminal papers in any field are regarded -- as starting something important but by no means the endpoint, and generally no longer relevant.

Not my intent at all. Just a simple question: is 1972 enough, or are earlier papers worthy of study or quoting, other than in a historical context?

Subsequent work is almost invariably regarded as more important and usually corrects and/or refines the flawed initial work. In thinking and writing, citations should form a chain by pointing to the nearest relevant prior work, not a sunburst back to the origin, though seminal works are often mentioned to provide context.

No argument, but which? Codd 1972 is formal, readable and 14 pages. Most of what D&D wrote is prolix, usually beyond my capacity to sustain reading, and AFAICT does not have any concise, formal, quotable treatment of a definitive RM and RA. App-A is concise and formal, but serves a different purpose. Alice is way more formal but also very long. If you know of any short formal authoritative treatment of the RM and RA since Codd I would happily prefer that over Codd. I don't know one.

There isn't one, because the RM/RA isn't a singular thing. There is no such thing as "a definitive RM and RA."

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

No argument, but which? Codd 1972 is formal, readable and 14 pages. Most of what D&D wrote is prolix, usually beyond my capacity to sustain reading, and AFAICT does not have any concise, formal, quotable treatment of a definitive RM and RA. App-A is concise and formal, but serves a different purpose. Alice is way more formal but also very long. If you know of any short formal authoritative treatment of the RM and RA since Codd I would happily prefer that over Codd. I don't know one.

There isn't one, because the RM/RA isn't a singular thing. There is no such thing as "a definitive RM and RA."

You know that, I know that, I never suggested there is (only) one. I would settle for any short formal authoritative treatment of any specific RA and/or RM since Codd. I don't know any.

 

Andl - A New Database Language - andl.org
Quote from dandl on March 3, 2020, 2:36 am

No argument, but which? Codd 1972 is formal, readable and 14 pages. Most of what D&D wrote is prolix, usually beyond my capacity to sustain reading, and AFAICT does not have any concise, formal, quotable treatment of a definitive RM and RA. App-A is concise and formal, but serves a different purpose. Alice is way more formal but also very long. If you know of any short formal authoritative treatment of the RM and RA since Codd I would happily prefer that over Codd. I don't know one.

There isn't one, because the RM/RA isn't a singular thing. There is no such thing as "a definitive RM and RA."

You know that, I know that, I never suggested there is (only) one. I would settle for any short formal authoritative treatment of any specific RA and/or RM since Codd. I don't know any.

In the academic community, TTM "Appendix A" is considered as definitive and authoritative as anything on the subject needs to be. It's not a specification or a standard, nor should it be. It's computer science.

Note that the academic community largely regards the Codd et al. relational model to no longer be of theoretical interest. Indeed, its academic product and successor -- Datalog -- is considered academically "done" and no longer worth research pursuit. The field has moved on. Whatever was written on the subject has been deemed sufficient. If you disagree, and feel none of the writings on the subject are sufficient, then it's up to you (in the broad sense -- I'm not singling anyone out) to address that gap because it's almost certain that no one will do it for you.

The relational model -- whatever you deem it to be, even if only a source of inspiration -- is arguably still (or should be) of enormous engineering interest, but that means building things instead of ruminating on old papers.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org