The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Uniqueness of Dum (R00)

PreviousPage 2 of 4Next
Quote from AntC on July 24, 2019, 1:49 am
Quote from Hugh on July 23, 2019, 2:40 pm
Quote from AntC on July 23, 2019, 6:56 am

BTW, I think D&D could reasonably take umbrage at the title of this thread: The uniqueness of DUM is perfectly clear, defined using the set-theoretic basis for relations as Prescribed in TTM.

...  Was it [the initial post] a (needless) proof of uniqueness, or a questioning thereof?  Or was it just attempting to equate it with something else?

True that DUM needs no proof of uniqueness. What had been questioned (at first in other places) was: a) the uniquess of R00 (given definitions appearing in Vadim/Marshall's and Litak et al's papers); then b) equating R00 with DUM.

Quote from Erwin on July 23, 2019, 2:36 pm
... I'm not really an algebraicist let alone a logician. I never understood what he was up to or where he was headed or what problems he thought needed addressing. Still don't BTW.

Then as one non-algebraicist to another ...

The appeal is an equivalent to the A algebra, but with fewer primitive concepts. (I think the appeal of having fewer primitive operators is a rather pointless parlour game.) Specifically, the A algebra still requires attribute name literals (for <REMOVE>); and their being literals only means we can't treat them as first-class; so we can't express theorems about equivalences between A expressions. For example we can't express the conditions under which you can 'push project through <AND>' in any form that allows deriving proofs.

Yes. The purpose of naming things (types, variables, etc) is for human convenience, which means they can be eliminated by the compiler. But here we have those names being a crucial part of the database itself, and of the runtime working of operators such as JOIN and PROJECT. Appendix A goes to a lot of trouble to remove RENAME, but it does so by sleight of hand: the attribute names are simply moved somewhere else. They should be eliminated entirely.

Assume relation R1 with attributes A,B,C identical in structure with relation R2 with attributes X,Y,Z. Can I write a program that runs interchangeably on either? Can I write a complicated user-defined operator that will run interchangeably on any suitable subset of attributes of the proper type? Can I get rid of attribute names entirely and use only type names (which the compiler will eliminate). I think I should.

The Alice book goes to a lot of trouble to present the RA expressed in both named and unnamed perspectives, and provides a  logical basis for doing so. Perhaps this is a worthwhile starting point.

Yes, I think this is a problem worth solving.

 

Andl - A New Database Language - andl.org
Quote from AntC on July 24, 2019, 1:49 am

Does the underpinning algebra make a difference to the surface programming language/D? Not necessarily. It at least suggests 'blessing' the technique of using relation(al expression)s to denote a heading. That's already available in Rel, as it turns out, with ATTRIBUTES_OF( ). It remains an annoyance in Rel that the return value from ATTRIBUTES_OF( ) is not first-class/can only appear where syntactically an attribute-commalist is expected. 

The generalisation of ATTRIBUTES_OF() is TYPE_OF(), but I'm the first to admit it's not very ergonomic.

I could make an ATTRIBUTES_OF() that generally returns a relation of attribute names, if that would be helpful?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on July 24, 2019, 1:49 am

Quote from Erwin on July 23, 2019, 2:36 pm
... I'm not really an algebraicist let alone a logician. I never understood what he was up to or where he was headed or what problems he thought needed addressing. Still don't BTW.

Then as one non-algebraicist to another ...

The appeal is an equivalent to the A algebra, but with fewer primitive concepts. (I think the appeal of having fewer primitive operators is a rather pointless parlour game.) Specifically, the A algebra still requires attribute name literals (for <REMOVE>); and their being literals only means we can't treat them as first-class; so we can't express theorems about equivalences between A expressions. For example we can't express the conditions under which you can 'push project through <AND>' in any form that allows deriving proofs.

Then the appeal of the 'inner union' operator is that it can express projection as a relational operation taking only relations as operands; and we can express 'r1 has no attributes in common with r2' as an equiequation: (r1 INNERUNION r2) JOIN DUM = DUM . DUM plays a critical role as the relational value that enables taking relations as headings for these proofs. We must have an equational definition of DUM in terms of JOIN and InnerUnion to drive inference; and that's what we do not have.

We do have some equations around a thing called R00 -- which we'd like to be equivalent to DUM. So far those equtions aren't sufficient to fix all the needed properties of R00.

A further appeal of 'inner union' (speaking more as an implementor/programmer than algebraicist) is that it is domain-independent, as opposed to <OR>, <NOT>. Then for Vadim's opening gambit on this thread to define R00 in terms of two domain-dependent operators has no appeal. This criticism is additional to the point about not defining R00 in terms of JOIN/InnerUnion. Indeed I'm not sure if 'domain dependent' is the correct term for 'inversion'/the unary postfix backquote operator. It requires an absolute complement of the set of attribute names. That seems worse than domain dependent  -- schema dependent?

Does the underpinning algebra make a difference to the surface programming language/D? Not necessarily. It at least suggests 'blessing' the technique of using relation(al expression)s to denote a heading. That's already available in Rel, as it turns out, with ATTRIBUTES_OF( ). It remains an annoyance in Rel that the return value from ATTRIBUTES_OF( ) is not first-class/can only appear where syntactically an attribute-commalist is expected. 

There are quite a number of things here I'm not grasping sufficiently.

"in any form that allows deriving proofs" for example.  Proofs of what ?  And do you mean to suggest that there are "forms" in which any axiom/theorem can appear that "does not allow deriving proofs" ?

I also do not understand why you would "need" to define your R00/DUM in terms of the operators of your algebra.  To my simplistic mind, if one sets out to devise an algebra, the values it acts upon are a given thing and it looks outright silly to me to try and define such a given in terms of the new things you're inventing to act on those givens.  In the same frame of mind, if the values are a given then is each and every one of those values not unique by definition ?  Why try to prove uniqueness ?

And from a practical perspective, I would not want to write all my expressions using only INNERUNION and JOIN, not any more than I would want to write all my expressions using just <AND> and <OR> (and nephews).  The benefit must be somewhere else, but I feel highly suspicious about the very existence of such a benefit.

Finally, about the "attribte name lists" in PROJECT.  While in search of a precise definition of INNERUNION, I ran into this old promo article for QBQL where a projection was written as "R INNERUNION [a1, a2]" and claiming an advantage that this expression did not involve attribute lists like traditional projection does.  (The trick was that "[a1, a2]" denotes an empty relation with heading a1,a2 and types SAME_AS_IN_THE_OTHER_ARGUMENT such that it was just a syntactic variation of REL{a1 ... a2 ...}{} with the ellipsis stuff inferred from stuff outside the expression per se.)  My mildest response to that is a very high frown.

Quote from Erwin on July 24, 2019, 8:28 pm

...Finally, about the "attribte name lists" in PROJECT.  While in search of a precise definition of INNERUNION, I ran into this old promo article for QBQL where a projection was written as "R INNERUNION [a1, a2]" and claiming an advantage that this expression did not involve attribute lists like traditional projection does.  (The trick was that "[a1, a2]" denotes an empty relation with heading a1,a2 and types SAME_AS_IN_THE_OTHER_ARGUMENT such that it was just a syntactic variation of REL{a1 ... a2 ...}{} with the ellipsis stuff inferred from stuff outside the expression per se.)  My mildest response to that is a very high frown.

I think you are referring to this pamphlet. Let me take this as an opportunity to add some commentary.

  1. The query output is usually printed this way, so why not to allow creating a table with exactly the same syntax as query output? And if you want to specify attribute types, this is just a separate database dictionary update query.
  2. Copying as relvar assignment, who would have thought of that?
  3. You are correctly noticed that attributes are listed in square brackets, but this is exactly the same syntax for the table/relation constant as in bullet #1.
  4. Selection as a join. You may question its utility, but keep in mind that rules for relational algebra query optimization make no distinction between selection and join. Therefore, implementing query optimizer with lesser number of operations is easier and therefore would result in more robust query execution/optimization engine.
  5.  Set intersection join AKA composition is an interesting operator. As soon as I introduced it in QBQL, I discovered that I use it more often then natural join! Also keep in mind that composition is arguably the most important operation in Tarski/Peirce algebra of binary relations. (As a side note, Tarski/Peirce algebra has been developed for 100 years by the time Codd discovered relational model).
  6. I admit: this query is rare in practice, but it leverages the inner union.
  7. The query is quite concise, but I'm not fan of operations which are not associative, such as inner join.
  8. Anti-join is also quite laconic, at the expense of attribute inversion, which many people dislike.
  9. Again, no new special syntax for REMOVE here.
  10. Assertions are identities in relational lattice, and, consequently, in QBQL. Please also note the syntax for TABLE_DUM: it is just a pair of square brackets.

 

Quote from Vadim on July 24, 2019, 10:05 pm
Quote from Erwin on July 24, 2019, 8:28 pm

...Finally, about the "attribte name lists" in PROJECT.  While in search of a precise definition of INNERUNION, I ran into this old promo article for QBQL where a projection was written as "R INNERUNION [a1, a2]" and claiming an advantage that this expression did not involve attribute lists like traditional projection does.  (The trick was that "[a1, a2]" denotes an empty relation with heading a1,a2 and types SAME_AS_IN_THE_OTHER_ARGUMENT such that it was just a syntactic variation of REL{a1 ... a2 ...}{} with the ellipsis stuff inferred from stuff outside the expression per se.)  My mildest response to that is a very high frown.

I think you are referring to this pamphlet. Let me take this as an opportunity to add some commentary.

I remember reading this some years back, and thinking it's a rather strange language. I couldn't tell what is a literal, what the basic types are, why something that looks like an expression is written in quotes. I gave up.

Is it real? Is there an implementation? Do you have a grammar?

I tried to read the page on renaming, but could make no sense of the QBQL.

Andl - A New Database Language - andl.org

If you are a java person, then just check out https://github.com/seanjensengrey/qbql

It is much more natural to run it from IDE, such as Eclipse. Then, fire Run.main(). It will execute some qbql program against a database (another qbql program constructing a bunch of relations).

The language grammar:

https://github.com/seanjensengrey/qbql/blob/master/src/qbql/lattice/lattice.grammar

There is only one datatype in QBQL, that is String. Consequently, quotation marks around string literals become redundant. Then, you may ask what "p=q" may possibly mean. Here is an explanation (apologies for copy-and-paste):

What's in a name?

SQL identifiers syntax looks odd by today’s programming language standards. They are case insensitive (which by itself is not a sin); however, in what looks like a futile attempt to please everybody, many SQL implementations offer quoted identifiers.

There is more in quoted identifiers idea than just admitting names in mixed case, though. A name becomes an arbitrary string literal: the name can start with number, contain spaces, or even Kanji symbols. An identifier can be a whole sentence! Wait a minute, didn’t Formal Logic and, consequently, Database Theory established already that a predicate is a sentence?

Consider the following sentence: “Max fed Folly at 2 o’clock”. This is a proposition (i.e. predicate of zero arity), which can be considered a tuple in a relation:

Fed=[person pet     time]
     claire folly   200
     max    folly   200
     max    folly   300
     max    scruffy 200
;

(The example is from Barwise&Etchemendy textbook), Therefore, a relation name Fed is a shorthand for “Person fed pet at time [o'clock]“. Fully named relations may appear silly for database professional who often operates tables and views with hundreds of attributes; although, database theoretician can fire back suggesting that database design with relations of excessive arity is preposterous.

 

Quote from Dave Voorhis on July 24, 2019, 11:39 am
Quote from AntC on July 24, 2019, 1:49 am

Does the underpinning algebra make a difference to the surface programming language/D? Not necessarily. It at least suggests 'blessing' the technique of using relation(al expression)s to denote a heading. That's already available in Rel, as it turns out, with ATTRIBUTES_OF( ). It remains an annoyance in Rel that the return value from ATTRIBUTES_OF( ) is not first-class/can only appear where syntactically an attribute-commalist is expected. 

The generalisation of ATTRIBUTES_OF() is TYPE_OF(), but I'm the first to admit it's not very ergonomic.

I could make an ATTRIBUTES_OF() that generally returns a relation of attribute names, if that would be helpful?

Thanks but I think not: you could manipulate those attribute names as a relation, but you'd lose static guarantees about headings(?) And any manipulation would be no more than what you could achieve with operations on the relation expression operand.

I guess ATTRIBUTES_OF( ) as it stands is as much as/more than you could expect at Tutorial level.

For an Industrial D, what I had more in mind was 'operations returning relations taking relation operands'. So perhaps a couple of shorthands as pseudo-relational operators:

* rx1 PROJECT rx2 =df rx1 {ATTRIBUTES_OF( rx2 )}

* rx1 REMOVE rx2 =df rx1 {ALL BUT ATTRIBUTES_OF( rx2 )}

To be generalised, I'd want no expectation that rx2's heading is a subset of rx1's or even that they have any attributes in common. So perhaps those shorthands should be

* rx1 PROJECT rx2 =df rx1 {ATTRIBUTES_OF( rx2 { ATTRIBUTES_OF( rx1 ) } )}

* rx1 REMOVE rx2 =df rx1 {ALL BUT ATTRIBUTES_OF( rx2 { ATTRIBUTES_OF( rx1 ) } )}

Except now I'm in an infinite regress ...

Quote from Erwin on July 24, 2019, 8:28 pm
Quote from AntC on July 24, 2019, 1:49 am

Quote from Erwin on July 23, 2019, 2:36 pm
... I'm not really an algebraicist let alone a logician. I never understood what he was up to or where he was headed or what problems he thought needed addressing. Still don't BTW.

Then as one non-algebraicist to another ...

The appeal is an equivalent to the A algebra, but with fewer primitive concepts. (I think the appeal of having fewer primitive operators is a rather pointless parlour game.) Specifically, the A algebra still requires attribute name literals (for <REMOVE>); and their being literals only means we can't treat them as first-class; so we can't express theorems about equivalences between A expressions. For example we can't express the conditions under which you can 'push project through <AND>' in any form that allows deriving proofs.

Then the appeal of the 'inner union' operator is that it can express projection as a relational operation taking only relations as operands; and we can express 'r1 has no attributes in common with r2' as an equiequation: (r1 INNERUNION r2) JOIN DUM = DUM . DUM plays a critical role as the relational value that enables taking relations as headings for these proofs. We must have an equational definition of DUM in terms of JOIN and InnerUnion to drive inference; and that's what we do not have.

We do have some equations around a thing called R00 -- which we'd like to be equivalent to DUM. So far those equtions aren't sufficient to fix all the needed properties of R00.

A further appeal of 'inner union' (speaking more as an implementor/programmer than algebraicist) is that it is domain-independent, as opposed to <OR>, <NOT>. Then for Vadim's opening gambit on this thread to define R00 in terms of two domain-dependent operators has no appeal. This criticism is additional to the point about not defining R00 in terms of JOIN/InnerUnion. Indeed I'm not sure if 'domain dependent' is the correct term for 'inversion'/the unary postfix backquote operator. It requires an absolute complement of the set of attribute names. That seems worse than domain dependent  -- schema dependent?

Does the underpinning algebra make a difference to the surface programming language/D? Not necessarily. It at least suggests 'blessing' the technique of using relation(al expression)s to denote a heading. That's already available in Rel, as it turns out, with ATTRIBUTES_OF( ). It remains an annoyance in Rel that the return value from ATTRIBUTES_OF( ) is not first-class/can only appear where syntactically an attribute-commalist is expected.

There are quite a number of things here I'm not grasping sufficiently.

Hi Erwin, ideally I'd resequence your qs/my answers for more logical flow; but that's too hard with this coal-fired editor. Please mentally resequence.

"in any form that allows deriving proofs" for example.  Proofs of what ?  And do you mean to suggest that there are "forms" in which any axiom/theorem can appear that "does not allow deriving proofs" ?

Yes: take my example push-project-through-<AND>. Very often needed for query optimisation. You can push a projection providing it's projecting-away attributes not in common between the operands of the <AND>. This rule-of-thumb is typically encoded inside the dark arts of query planners. In what form does a projection appear in the A algebra? Only as a literal attribute name in a <REMOVE>. How do we algebraically prove the compiler has correctly translated a projection into a <REMOVE>? How do we algebraically prove the optimiser/planner has correctly identified some particular <REMOVE> can be pushed through some particular <AND>?

I also do not understand why you would "need" to define your R00/DUM in terms of the operators of your algebra.  To my simplistic mind, if one sets out to devise an algebra, the values it acts upon are a given thing and it looks outright silly to me to try and define such a given in terms of the new things you're inventing to act on those givens.  In the same frame of mind, if the values are a given then is each and every one of those values not unique by definition ?  Why try to prove uniqueness ?

To be clear: DUM is defined in terms of the set algebra of Appendix A and its setbuilder specifications. The lattice algebra doesn't know anything about relations as sets of tuples; nor tuples as sets of <A,T,v> triples. So if we want the lattice R00 to behave like DUM, we have to define the properties of R00 in terms of lattice meet/join. We have some of those definitions but not enough to prove everything we need. We can claim (as Litak et al appear to) that R00 is constant. But we can ask a (seemingly) endless series of questions about its properties, and get neither a positive nor negative provable reply. Nothing unusual here in an axiom system: there are always sentences that are undecidable, that's just Goedel.

And from a practical perspective, I would not want to write all my expressions using only INNERUNION and JOIN, not any more than I would want to write all my expressions using just <AND> and <OR> (and nephews).  The benefit must be somewhere else, but I feel highly suspicious about the very existence of such a benefit.

Of course not. That's why I said "not necessarily" any difference to the surface D. The idea is the compiler translates surface D to equivalent expressions in the algebra; then the query planner transforms those into optimised expressions in the same algebra. In other compilers/other languages (for example translating surface language to Lambda calculus or to the JVM or to LLVM), those optimising transformations are highly likely to be infelicitous, because they involve all sorts of dark arts buried in rewrite rules. The advantage of an axiomisable algebra is we can prove ech transform is meaning-preserving. In the sense: the optimised query produces the same outputs from the same inputs.

Finally, about the "attribte name lists" in PROJECT.  While in search of a precise definition of INNERUNION, I ran into this old promo article for QBQL where a projection was written as "R INNERUNION [a1, a2]" and claiming an advantage that this expression did not involve attribute lists like traditional projection does.  (The trick was that "[a1, a2]" denotes an empty relation with heading a1,a2 and types SAME_AS_IN_THE_OTHER_ARGUMENT such that it was just a syntactic variation of REL{a1 ... a2 ...}{} with the ellipsis stuff inferred from stuff outside the expression per se.)  My mildest response to that is a very high frown.

I'll defer to Vadim's response, but I see no need to frown. Tutorial D already has SAME_TYPE_AS. Any type-unification-based compiler would happily see a polymorphic attribute type for an attribute same-named in another operand to <AND> (say) and infer 'same attribute name must be same attribute type'. This is 1980's (if not earlier) compiler technology. Are you saying SIRA_PRISE struggles with unifying polymorphic types?

Quote from Vadim on July 25, 2019, 1:32 am

If you are a java person, then just check out https://github.com/seanjensengrey/qbql

It is much more natural to run it from IDE, such as Eclipse. Then, fire Run.main(). It will execute some qbql program against a database (another qbql program constructing a bunch of relations).

I would not describe myself as a Java person, but I get by. Thank you for the github link, but I think you're making it too hard. You must provide a decent readme for any project you expect other people to use, and you should provide some sample programs, sample data and ideally create a github release. That would be the time to ask people to look at it.

The language grammar:

https://github.com/seanjensengrey/qbql/blob/master/src/qbql/lattice/lattice.grammar

Again, too hard. Impenetrable structure, almost no comments, probably incomplete without also consulting the 12 files in the parser directory. I tried to figure out how to tell an identifier from a string literal and failed miserably. And I have no idea how whitespace is treated.

There is only one datatype in QBQL, that is String. Consequently, quotation marks around string literals become redundant. Then, you may ask what "p=q" may possibly mean. Here is an explanation (apologies for copy-and-paste):

What's in a name?

SQL identifiers syntax looks odd by today’s programming language standards. They are case insensitive (which by itself is not a sin); however, in what looks like a futile attempt to please everybody, many SQL implementations offer quoted identifiers.

There is more in quoted identifiers idea than just admitting names in mixed case, though. A name becomes an arbitrary string literal: the name can start with number, contain spaces, or even Kanji symbols. An identifier can be a whole sentence! Wait a minute, didn’t Formal Logic and, consequently, Database Theory established already that a predicate is a sentence?

Consider the following sentence: “Max fed Folly at 2 o’clock”. This is a proposition (i.e. predicate of zero arity), which can be considered a tuple in a relation:

Fed=[person pet     time]
     claire folly   200
     max    folly   200
     max    folly   300
     max    scruffy 200
;

(The example is from Barwise&Etchemendy textbook), Therefore, a relation name Fed is a shorthand for “Person fed pet at time [o'clock]“. Fully named relations may appear silly for database professional who often operates tables and views with hundreds of attributes; although, database theoretician can fire back suggesting that database design with relations of excessive arity is preposterous.

Yes, I like the idea of relation literals. Other than that, I fail to see how this explanation bears on my question.

I tried to figure out (in the example given) whether the line break after each group of three is significant or just whitespace. Failed again. Sorry, too hard.

Andl - A New Database Language - andl.org
Quote from Vadim on July 25, 2019, 1:32 am

A couple of remarks on QBQL syntax. Like David B, I find it highly counter-intuitive. So I'm making these remarks to help explain in advance of responding to another post from Vadim which purports to show some sort of 'proof' in QBQL. I've no idea whether that's proving something that agrees with what I'm trying to do, or disproves it.

Of course Vadim is entitled to use whatever syntax he chooses in his own notation. But if you're trying to explain something on a TTM forum you need to use the lingua franca here -- either Tutorial D or A algebra. Or include a great deal of explanation how QBQL notation corresponds.

... There is only one datatype in QBQL, that is String.

Hmm. I'd prefer to say QBQL is untyped. If you want to use String types, how do I include a space/tab/newline character inside a value? AFAICT space is significant as a attribute-value delimiter, and newline is significant as a tuple delimiter. Oh but then semicolon delimits the end of something(?)

Consequently, quotation marks around string literals become redundant. Then, you may ask what "p=q" may possibly mean.

I certainly do ask if "p=q" means the same as [p=q]. If "p=q" is 'just a name', what does the embedded equal sign mean? If it's not denoting an equi-relation then please don't put an equal sign.

SQL identifiers syntax looks odd by today’s programming language standards. They are case insensitive (which by itself is not a sin); however, in what looks like a futile attempt to please everybody, many SQL implementations offer quoted identifiers.

Most of SQL syntax looks odd by any programming language standards. TTM has a whole prescription to forbid it:

RM Pre 26: "D shall be constructed according to well established principles of good language design. " IOW D shall be almost totally unlike SQL.

Quoted identifiers must be one of the more stupid bits of SQL syntax (although I admit there's plenty of competition for stupid language design). It reeks of Operating System Command Language syntax vintage 1960's. If you're claiming QBQL is trying to follow SQL, that's a negative as far as I'm concerned. Furthermore seeing QBQL syntax in the flesh (on the other post) I can confirm it's utterly confusing.

 

 

PreviousPage 2 of 4Next