The Forum for Discussion about The Third Manifesto and Related Matters


Some issues with TTM

There are 6 distinct issues raised here. I can break them out as separate posts if preferred.

1. Reword RM Pre 6 for consistency with RM Pre 18

RM Pre 6 refers to the relational operators "RENAME, project, EXTEND, and JOIN" and cross-refers to RM Pre 18, but all mentions of specific operators (including these) were removed from RM Pre 18 some time ago. The operator names and the cross-reference should therefore be removed from RM Pre 6.

Suggestions: my view is that TTM should include a suggested list of relational operators, expressed in the same kind of generic wording as the concatenate and aggregate operators in RM VSS 5. My suggested set would be:

      • select, project, rename, join, union
      • the negated forms antijoin and minus
      • a new value operator
      • aggregation operators such as sum, max and min
      • transitive closure or generalised transitive closure or a while (fixed point recursion) operator.

This could be added as a section in the Preamble, or as an RM VSS.
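To make the suggested set concrete, here is a minimal Python sketch of the generic operators (it is not Tutorial D and not anything TTM prescribes). The function names and the encoding of a relation as a frozenset of tuples, each tuple a frozenset of (attribute, value) pairs, are assumptions made for illustration. Headings are not tracked, so union and minus do not check that their operands have the same heading.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of
# (attribute_name, value) pairs, so attribute order never matters.
# Simplification: headings are not stored separately.

def tup(**attrs):
    return frozenset(attrs.items())

def rel(*tuples):
    return frozenset(tuples)

def select(r, pred):
    # keep the tuples for which pred (given a dict view of the tuple) is true
    return frozenset(t for t in r if pred(dict(t)))

def project(r, names):
    # keep only the named attributes; duplicates collapse automatically
    return frozenset(frozenset((k, v) for k, v in t if k in names) for t in r)

def rename(r, old, new):
    return frozenset(
        frozenset((new if k == old else k, v) for k, v in t) for t in r
    )

def join(r, s):
    # natural join: combine tuples that agree on all common attributes
    out = set()
    for a in r:
        da = dict(a)
        for b in s:
            db = dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return frozenset(out)

def union(r, s):
    return r | s

def minus(r, s):
    return r - s

def antijoin(r, s):
    # tuples of r that have no matching tuple in s
    return frozenset(a for a in r if not join(frozenset([a]), s))

suppliers = rel(tup(sno="S1", city="London"), tup(sno="S2", city="Paris"))
shipments = rel(tup(sno="S1", pno="P1"))
with_parts = join(suppliers, shipments)
without_parts = antijoin(suppliers, shipments)
```

Aggregation and transitive closure are left out to keep the sketch short; they could be written in the same style.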

2. Reword RM Pre 6 for consistency with RM Pre 5a

The wording of point (b) is unclear or incorrect. It should say something similar to RM Pre 5a, for example:

It shall be possible to “retrieve” (i.e., read the value of) every attribute of a tuple value. The read-only operator that provides this functionality shall have declared type the same as that of the attribute.

 

3. Reword RM Pre 6 and 7 to say that types with the same heading are the same type

RM Pre 6 and 7 refer to type equality without making it clear what is intended. Type equality is referred to in RM Pre 1 as a synonym for type identity ("the same type"). The same principle should be applied here, using the following wording:

Tuple types TUPLE H1 and TUPLE H2 shall be equal (that is, the same type) if and only if H1 = H2.

Relation types RELATION H1 and RELATION H2 shall be equal (that is, the same type) if and only if H1 = H2.
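The proposed wording amounts to structural equality on headings. A minimal Python sketch of that rule follows; the class names `TupleType` and `RelationType`, the `heading` helper, and the use of type names as plain strings are assumptions for illustration only.

```python
from dataclasses import dataclass

def heading(**attrs):
    # A heading is a set of (attribute name, type name) pairs; order is irrelevant.
    return frozenset(attrs.items())

@dataclass(frozen=True)
class TupleType:
    heading: frozenset

@dataclass(frozen=True)
class RelationType:
    heading: frozenset

t1 = TupleType(heading(x="INTEGER", y="CHAR"))
t2 = TupleType(heading(y="CHAR", x="INTEGER"))   # same heading, written in another order
t3 = TupleType(heading(x="INTEGER"))             # different heading
r1 = RelationType(heading(x="INTEGER", y="CHAR"))

same_type = (t1 == t2)         # equal headings, hence the same tuple type
different_type = (t1 != t3)    # different headings, hence different types
tuple_vs_relation = (t1 != r1) # a tuple type is never a relation type
```

Because the generated `__eq__` compares only the heading (and the class), two independently written type expressions with the same heading denote the same type, with no need for a type name.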

4. Move all proscriptions from the Prescription section to the Proscription section.

All requirements in any Pre section that specify what is not allowed should be moved to the Proscription sections. They include:

      • RM Pre 3a and d
      • RM Pre 16
      • RM Pre 23 (arguable)

5. Amend RM Pre 21 to clarify how to deal with pseudo-assignments

As per our previous discussion, my view is that pseudo-assignments are shorthands and should be expanded in step a. Whether you agree or not, this should be made clear, since it affects the validity of the subsequent steps.

 

6. Publish TTM and other suitable works on GitHub or similar

GitHub (or similar) is now the preferred means of publishing material of this kind. In particular, it gives you a read-me, licence terms, version control and facilities to allow interested users to raise and debate issues (like this one). Please consider.

Andl - A New Database Language - andl.org

If the authors of TTM could find energy or enthusiasm to revise the doco -- which at least one of the authors has said often enough they couldn't -- I'd be looking for a more strategic-level guidance to the Pres and Pros, rather than tinkering with the wording.

Currently the document reads as though each of the Pres and Pros are equal requirements for a language to count as a D. But I suspect it's not as heinous to omit Multiple Possreps as it is to transgress RM Pro 3 (no Nulls) or 4 (really no Nulls). Is a language without the Tuple operators of RM Pre 6 irretrievably beyond the pale of Relationland? That seems harsh when the effect could be achieved by placing individual tuples within relations, and applying relational operators -- presuming there's an IS_EMPTY( ), COUNT(*), TUPLE FROM ... to inspect the result.

Then I'd like to see some of the Pres/Pros demoted to VSSs; and the VSSs demoted to FSSs -- Fairly Strong Suggestions.

I'd also like to see some of the points couched more in terms of objective/requirement than implementation. (Does the doco have to go into that much detail about aggregate operators? There's no need to differentiate from how SQL handles aggregates over Nulls, because the doco has already ruled out Nulls.) In that vein, less specificity about implementation in procedural languages/program internals; more semantics about the required effect on the database/DBMS, that could be interpreted into other language paradigms and/or interpreted into 'features' to support D-ness within an existing language.

Then following the sequence of the doco:

  • RM Pre 1: user-defined scalar types, but no mention of enumerated types -- are those somehow anti-Relational? Reading RM Pre 1 together with 4, 5 seems to leave no way to define enumerations. (I think these Pres suffer from over-specifying the implementation/underspecifying the objective.)
  • RM Pre 3 scalar operators seem to rule out polymorphic or overloaded operators. And yet overloading is one possible mechanism for handling the multiplicity of numeric types typically found in databases, without an explosion of a set of distinct operators for each type. Is there something anti-Relational about overloading? Generic operators are only allowed on relations (RM VSS 6).
  • RM Pre 4, 5 scalar data structures/types and Selectors. Is the Selector mechanism the only way to implement scalar data structures? Would it be anti-Relational if a candidate-D provided only TVAs and their Tuple-types? Tuple-types seem to satisfy most of the requirements about component access in those two Pres, except the implementation detail of positional parameters: is it anti-Relational to fail to provide positional access to components of non-scalars? This is in sharp contrast to RM Pro 1, 2 ruling out positional access to attributes or tuples.
    Multiple PossReps: what's the objective with those? Is it merely programming ergonomic convenience -- then why is it so necessary that every value for the type have a representation for each PossRep? Is it essential to treat each PossRep as of equal status? Is it anti-Relational to take one of the PossReps as 'base' and regard others as mapped to/from it, with a possibility those mappings are not total? (The D will presumably anyway need to cater for non-total operations, such as divide by zero or numeric overflow.)
  • RM Pre 6 Tuple types, operations, and RM Pre 12 Tuple variables, assignment, RM Pre 20 operators: why must tuples be first-class/is it anti-Relational to only support operations for singleton relations? Yes the abstract semantics for relations needs to define the structure in terms of tuples. That doesn't require tuples be realised concretely in the language AFAICS. Requiring it seems to preclude certain sorts of representations for relations -- such as 'vertical stores'. A D could support the TUP{ } type generator/pseudo-Selector as syntax purely within REL{ }.
  • RM Pre 8 comparison operator == "for every type T". Just no: indeed many of the types commonly held in databases support equality-comparison. Others don't (such as function types). Perhaps non-comparable types should be excluded from the database, but they shouldn't be excluded from the D's ecosystem. I'm not sure RM Pre 8 would support Image/Audio/document/BLOB types. Is it anti-Relational to hold those in a database?
  • RM Pre 14 virtual 'relvars' with RM Pre 21 assignment. Again just no: since there are no implementations of D-likes that support assignment to arbitrary virtuals, there's your explanation for "Why are there no Truly Relational DBMSs?".

 

  • RM Pro 8 no "compound" or "composite" attributes seems to contra-indicate TVAs or RVAs. Again, what needs to be stated is the objective rather than the implementation. Presumably something along the lines 'no repeating groups'; but what makes RVAs Relational and "compound" attributes not?
  • OO Pre 5 nested transactions. AFAICT there is no D-like that supports those; indeed there's no very clear definition of how such a mechanism should behave. Another explanation for "Why are there no Truly Relational DBMSs?".
  • OO Pre 6 aggregate operator too much detail already.
  • OO Pro 1 "Relvars are not domains." too little detail already: what's the sense of "domains" here?
  • OO Pro 2 no pointers too little detail already: what distinguishes a pointer vs a surrogate key vs a Record-ID (auto-generated, not a disk-address), and are those latter two anti-Relational, and how to distinguish from RM VSS 1 supplied by the system? A timestamp is OK (presumably), a Record-ID not (presumably) but why? (I mean to proscribe a Record-ID accessible/visible within the D/joinable to other relations; I expect the D implementation would use Record-IDs internally/maybe make it visible for debugging.)

... ends.

Quote from AntC on May 7, 2020, 11:47 am

If the authors of TTM could find energy or enthusiasm to revise the doco -- which at least one of the authors has said often enough they couldn't -- I'd be looking for a more strategic-level guidance to the Pres and Pros, rather than tinkering with the wording.

Blow-by-blow response in order to keep the responses short, but it means there's going to be many ...

The authors have their background and history and age. I can understand it perfectly well if they consider the work they've delivered as their final legacy to the world of data management (and are no longer willing to dedicate the majority of the time they have left in this world to dealing with questions like yours). I understand your request for "more strategic-level guidance" but here's the thing: when I started SIRA_PRISE, I deliberately and willingly and consciously made a choice to disregard all that "language" and "compiler" business, to do only as little of that as turned out to be inevitable, and to focus on what the authors' descriptions said it meant "to be relational". So SIRA_PRISE is entirely data-sublanguage paradigm, the SIRA_PRISE expression language is nowhere near "computationally complete", and that's a crystal clear violation of at least one of the PREs, and yet the authors have not dismissed SIRA_PRISE-as-a-D. Quite the contrary. Liberal interpretation gets you places. All you need is a sense of not getting too liberal. If that's too vague, sorry but I can't do better. If you want "strategic guidance" then you're going to have to make do with what you can get out of those people still responding here who might have "gotten it".

Quote from AntC on May 7, 2020, 11:47 am

Currently the document reads as though each of the Pres and Pros are equal requirements for a language to count as a D. But I suspect it's not as heinous to omit Multiple Possreps as it is to transgress RM Pro 3 (no Nulls) or 4 (really no Nulls). Is a language without the Tuple operators of RM Pre 6 irretrievably beyond the pale of Relationland? That seems harsh when the effect could be achieved by placing individual tuples within relations, and applying relational operators -- presuming there's an IS_EMPTY( ), COUNT(*), TUPLE FROM ... to inspect the result.

Yes, the document does not include a numerical measurement of "degree of importance" for each individual rule. Yes, I'm confident an implementation constraining users to single-possrep could (would) be "condoned" as "sufficiently compliant" whereas an implementation condoning NULL would not (see Alphora D4). The same ***could*** be true of a language deliberately choosing to not expose TUPLE types ***within the language***, i.e. as a full-blown first-class data type. SIRA_PRISE was once like that. The database contains only relations, the system is meant to only expose those relations, and there is no need for tuple-level stuff inside the language, presuming (as you've already said) there are operators to inspect the result.
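As a rough illustration of getting by without first-class tuples, the following Python sketch shows the three inspection operators mentioned in the quote applied to relations only. The Python names `is_empty`, `count` and `tuple_from` are stand-ins chosen here for IS_EMPTY, COUNT and TUPLE FROM, and the frozenset encoding of a relation is an assumption.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of
# (attribute_name, value) pairs. Both encodings are assumptions of this sketch.

def is_empty(r):
    return len(r) == 0

def count(r):
    return len(r)

def tuple_from(r):
    # Defined only for relations of cardinality one; anything else is an error.
    if len(r) != 1:
        raise ValueError("TUPLE FROM requires a relation with exactly one tuple")
    (t,) = r
    return dict(t)

singleton = frozenset({frozenset({("sno", "S1"), ("city", "London")})})
empty = frozenset()

city = tuple_from(singleton)["city"]   # attribute extraction via a singleton relation
```

Everything a tuple operator would do can be phrased as a relational operation on `singleton`, followed by one of these three calls to get the result out.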

Quote from AntC on May 7, 2020, 11:47 am

I'd also like to see some of the points couched more in terms of objective/requirement than implementation. ...  In that vein, less specificity about implementation in procedural languages/program internals; more semantics about the required effect on the database/DBMS, that could be interpreted into other language paradigms and/or interpreted into 'features' to support D-ness within an existing language.

The authors came to where they arrived from the engineering side of things. Neither of them is the Codd/Dijkstra/Hoare/... style of fundamental scientist. And there are limits to what such a non-scientist can express if he's not allowed to ultimately resort to implementation-level language. Express semantics by stating the algorithm. Hell, even Dijkstra did that often enough. Maybe it's regrettable they got "stuck in the sixties" and never got beyond COBOL and PL/1 (to put it bluntly). Otoh I would expect any IT guy of considerable age to be able to remember what programming in COBOL, PL/1, RPG for God's sake, ... was like in those days, and thereby get a sense of what the TTM authors are getting at at any particular point in their narrative. (E.g. my understanding of BS12 is that it was aimed at solving the exact same problem that RPGII was invented for. I never saw one byte of BS12 code before Hugh started posting it here and I willfully skipped all of my RPGII classes.)

Quote from AntC on May 7, 2020, 11:47 am
  • RM Pre 1: user-defined scalar types, but no mention of enumerated types -- are those somehow anti-Relational? Reading RM Pre 1 together with 4, 5 seems to leave no way to define enumerations. (I think these Pres suffer from over-specifying the implementation/underspecifying the objective.)

Please get it imprinted in your brain that the TTM authors want, first and foremost, the type system to be open-ended. User-defined types must be possible and must be first-class. They would NEVER have intended any of the PREs (or combination thereof) to PREVENT the type system from being open-ended in that sense. They NEVER intended to PROSCRIBE AGAINST a programmer being able to define WeekDay = {Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday}; or any other such. (They might have reservations if users started trying to exploit the feature to get to MyBoolean = {true, false, null} but that's another matter.)

Quote from AntC on May 7, 2020, 11:47 am
  • RM Pre 3 scalar operators seem to rule out polymorphic or overloaded operators. And yet overloading is one possible mechanism for handling the multiplicity of numeric types typically found in databases, without an explosion of a set of distinct operators for each type. Is there something anti-Relational about overloading? Generic operators are only allowed on relations (RM VSS 6).

That's because you continue to fail to see the distinction between the operator and the symbol used to denote it.  What gets overloaded is the symbol, not the operator.

Per appendix A, the equality operator that tests for equality between BYTE values is a relation with 256^2 tuples, 256 of which hold a return value 'TRUE'.  Per same, the equality operator that tests for equality between INTEGER values (assuming the usual definition of INTEGER) is a relation with 4294967296^2 tuples, 4294967296 of which hold a return value 'TRUE'.  The two are not the same relation, hence the two are not the same operator.

I've suggested "operator generator" as a concept in the past and still believe it's a valuable way of seeing things.  Not that I'm going to insist very hard any more.
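The tuple counts above, and the "operator generator" idea, can be checked with a small Python sketch. The function name `equality_operator` and the encoding of an operator as a set of (left, right, result) triples are assumptions for illustration; NIBBLE is a made-up 16-value type used only for contrast.

```python
from itertools import product

def equality_operator(values):
    """An 'operator generator': given a type's set of values, return the
    equality operator for that type as a set of (left, right, result) triples."""
    return {(a, b, a == b) for a, b in product(values, repeat=2)}

BYTE = range(256)
eq_byte = equality_operator(BYTE)

total_tuples = len(eq_byte)                                 # 256 ** 2
true_tuples = sum(1 for (_, _, result) in eq_byte if result)

# A different type yields a different relation, hence a different operator,
# even though both would normally be written with the same "=" symbol.
NIBBLE = range(16)
eq_nibble = equality_operator(NIBBLE)
distinct_operators = eq_byte != eq_nibble

# For a 32-bit INTEGER the relation is far too large to materialise,
# but its cardinality follows the same arithmetic.
integer_tuples = (2 ** 32) ** 2
```

One generator, many operators: the symbol is shared, the relations are not.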

Quote from AntC on May 7, 2020, 11:47 am

If the authors of TTM could find energy or enthusiasm to revise the doco -- which at least one of the authors has said often enough they couldn't -- I'd be looking for a more strategic-level guidance to the Pres and Pros, rather than tinkering with the wording.

Currently the document reads as though each of the Pres and Pros are equal requirements for a language to count as a D. But I suspect it's not as heinous to omit Multiple Possreps as it is to transgress RM Pro 3 (no Nulls) or 4 (really no Nulls). Is a language without the Tuple operators of RM Pre 6 irretrievably beyond the pale of Relationland? That seems harsh when the effect could be achieved by placing individual tuples within relations, and applying relational operators -- presuming there's an IS_EMPTY( ), COUNT(*), TUPLE FROM ... to inspect the result.

Then I'd like to see some of the Pres/Pros demoted to VSSs; and the VSSs demoted to FSSs -- Fairly Strong Suggestions.

I'd also like to see some of the points couched more in terms of objective/requirement than implementation. (Does the doco have to go into that much detail about aggregate operators? There's no need to differentiate from how SQL handles aggregates over Nulls, because the doco has already ruled out Nulls.) In that vein, less specificity about implementation in procedural languages/program internals; more semantics about the required effect on the database/DBMS, that could be interpreted into other language paradigms and/or interpreted into 'features' to support D-ness within an existing language.

TTM is an extraordinary document, but these are among its many failings. I am accustomed to reading standards documents with their lengthy set of definitions and their careful use of shall and may, and on these grounds TTM falls short. I find the level of detail of some parts excruciating and the lack of definitions in other parts mystifying, but when I've challenged something the response is usually hand-wavy.

I raised the above issues just to have them on record, not because I think anything will happen. I've previously written a 'Paraphrase' (it's on the Andl web site) and I'm currently writing a 'Lesser Manifesto' in like vein. I don't think anyone is going to set out to rewrite TTM, but if they did I think it should say much less. Type systems and computational completeness are largely a done deal, but the really important issues are about how tuple and relational types fit in, how to connect to the RA, how to interact with the database.

And of course the real issue is: what's the target? Where is it to be used? By whom? For what?

I'll respond to only a couple of points.

Then following the sequence of the doco:

  • RM Pre 1: user-defined scalar types, but no mention of enumerated types -- are those somehow anti-Relational? Reading RM Pre 1 together with 4, 5 seems to leave no way to define enumerations. (I think these Pres suffer from over-specifying the implementation/underspecifying the objective.)
  • RM Pre 3 scalar operators seem to rule out polymorphic or overloaded operators. And yet overloading is one possible mechanism for handling the multiplicity of numeric types typically found in databases, without an explosion of a set of distinct operators for each type. Is there something anti-Relational about overloading? Generic operators are only allowed on relations (RM VSS 6).
  • RM Pre 4, 5 scalar data structures/types and Selectors. Is the Selector mechanism the only way to implement scalar data structures? Would it be anti-Relational if a candidate-D provided only TVAs and their Tuple-types? Tuple-types seem to satisfy most of the requirements about component access in those two Pres, except the implementation detail of positional parameters: is it anti-Relational to fail to provide positional access to components of non-scalars? This is in sharp contrast to RM Pro 1, 2 ruling out positional access to attributes or tuples.
    Multiple PossReps: what's the objective with those? Is it merely programming ergonomic convenience -- then why is it so necessary that every value for the type have a representation for each PossRep? Is it essential to treat each PossRep as of equal status? Is it anti-Relational to take one of the PossReps as 'base' and regard others as mapped to/from it, with a possibility those mappings are not total? (The D will presumably anyway need to cater for non-total operations, such as divide by zero or numeric overflow.)
  • RM Pre 6 Tuple types, operations, and RM Pre 12 Tuple variables, assignment, RM Pre 20 operators: why must tuples be first-class/is it anti-Relational to only support operations for singleton relations? Yes the abstract semantics for relations needs to define the structure in terms of tuples. That doesn't require tuples be realised concretely in the language AFAICS. Requiring it seems to preclude certain sorts of representations for relations -- such as 'vertical stores'. A D could support the TUP{ } type generator/pseudo-Selector as syntax purely within REL{ }.

FWIW my view is you can't really avoid tuple types, but there is no inherent value in ever seeing an isolated tuple attribute/value/variable etc. I would see them 'bundled' into the relation type, but there are issues.

  • RM Pre 8 comparison operator == "for every type T". Just no: indeed many of the types commonly held in databases support equality-comparison. Others don't (such as function types). Perhaps non-comparable types should be excluded from the database, but they shouldn't be excluded from the D's ecosystem. I'm not sure RM Pre 8 would support Image/Audio/document/BLOB types. Is it anti-Relational to hold those in a database?

Yes, but it's not a small issue. I think the answer is: the database can hold stuff that is not typed values in attributes in tuples; you have to get at it some other way; TTM doesn't stop you defining that other way.

  • RM Pre 14 virtual 'relvars' with RM Pre 21 assignment. Again just no: since there are no implementations of D-likes that support assignment to arbitrary virtuals, there's your explanation for "Why are there no Truly Relational DBMSs?".

Yes, but there is a whole layer missing here. Pseudo-variable assignment to a view is just fine, but only if you have a mechanism to specify what happens.

  • RM Pro 8 no "compound" or "composite" attributes seems to contra-indicate TVAs or RVAs. Again, what needs to be stated is the objective rather than the implementation. Presumably something along the lines 'no repeating groups'; but what makes RVAs Relational and "compound" attributes not?

The intent is clear, the wording is not.

  • OO Pre 5 nested transactions. AFAICT there is no D-like that supports those; indeed there's no very clear definition of how such a mechanism should behave. Another explanation for "Why are there no Truly Relational DBMSs?".
  • OO Pre 6 aggregate operator too much detail already.
  • OO Pro 1 "Relvars are not domains." too little detail already: what's the sense of "domains" here?
  • OO Pro 2 no pointers too little detail already: what distinguishes a pointer vs a surrogate key vs a Record-ID (auto-generated, not a disk-address), and are those latter two anti-Relational, and how to distinguish from RM VSS 1 supplied by the system? A timestamp is OK (presumably), a Record-ID not (presumably) but why? (I mean to proscribe a Record-ID accessible/visible within the D/joinable to other relations; I expect the D implementation would use Record-IDs internally/maybe make it visible for debugging.)

... ends.

Generally yes, but really nothing to add.

Andl - A New Database Language - andl.org
Quote from AntC on May 7, 2020, 11:47 am

 

  • RM Pre 4, 5 scalar data structures/types and Selectors. Is the Selector mechanism the only way to implement scalar data structures? Would it be anti-Relational if a candidate-D provided only TVAs and their Tuple-types? Tuple-types seem to satisfy most of the requirements about component access in those two Pres, except the implementation detail of positional parameters: is it anti-Relational to fail to provide positional access to components of non-scalars? This is in sharp contrast to RM Pro 1, 2 ruling out positional access to attributes or tuples.
    Multiple PossReps: what's the objective with those? Is it merely programming ergonomic convenience -- then why is it so necessary that every value for the type have a representation for each PossRep? Is it essential to treat each PossRep as of equal status? Is it anti-Relational to take one of the PossReps as 'base' and regard others as mapped to/from it, with a possibility those mappings are not total? (The D will presumably anyway need to cater for non-total operations, such as divide by zero or numeric overflow.)

 

While the similarities (between possreps with components and tuples with attributes) are striking, tuple types do not offer the equivalent of possrep constraints.  You cannot have

POSSREP {X RATIONAL, Y RATIONAL} CONSTRAINT X>0

if tuple types are all there is, because this would create a situation where you have clearly distinct nonscalar types with the same heading (and of the same "nonscalar type type" - both tuple types or both relation types). Which means you are now going to have to name them so the user can even make that distinction. Which might get you into the user-friendliness problem that our other friend from below the equator calls "creating the type system".
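A small Python sketch of the point: two types with the same components, one carrying the constraint X > 0, have to be distinguished by name. The class names are hypothetical, and `Fraction` merely stands in for RATIONAL.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class PlainPoint:
    # components {X RATIONAL, Y RATIONAL}, no constraint
    x: Fraction
    y: Fraction

@dataclass(frozen=True)
class PositiveXPoint:
    # same components, plus the possrep constraint X > 0
    x: Fraction
    y: Fraction

    def __post_init__(self):
        if not self.x > 0:
            raise ValueError("possrep constraint violated: X > 0")

ok = PlainPoint(Fraction(-1), Fraction(2))      # fine: no constraint to violate

try:
    PositiveXPoint(Fraction(-1), Fraction(2))   # rejected by the constraint
    rejected = False
except ValueError:
    rejected = True

# Same component values, but the two values belong to different named types.
same_value = PositiveXPoint(Fraction(1), Fraction(2)) == PlainPoint(Fraction(1), Fraction(2))
```

If the only thing identifying a type were its heading, `PlainPoint` and `PositiveXPoint` could not both exist; the name is what carries the constraint.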

Re. "Is it anti-Relational to take one of the PossReps as 'base' and regard others as mapped to/from it": (I think no, because) that's exactly what SIRA_PRISE user-defined types do. Someone must define the implementation. D&D sweep that aspect of the problem under the carpet by relegating it to some unspecified "storage structure definition language" (I don't recall the precise words). I believe that if a type designer does think about the implementation, he'll find 99.99% of the time that by dealing with the physical aspect, he has already unveiled the first possrep, which it will make great sense to expose at the logical level too. My 2c.
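A minimal Python sketch of the "base possrep plus mappings" arrangement, using the familiar cartesian/polar point. This is an assumption-laden illustration, not how SIRA_PRISE or any TTM implementation actually does it: the class `Point`, its selector names, and the choice of cartesian as the base are all made up here.

```python
import math

class Point:
    """Cartesian components are the stored 'base'; polar is mapped to and from it."""

    def __init__(self, x, y):
        self._x = float(x)
        self._y = float(y)

    # Selector for the base possrep.
    @classmethod
    def cartesian(cls, x, y):
        return cls(x, y)

    # Selector for the second possrep: mapped onto the base, and not total
    # (a negative radius denotes no value of the type).
    @classmethod
    def polar(cls, r, theta):
        if r < 0:
            raise ValueError("no POINT value has a negative radius")
        return cls(r * math.cos(theta), r * math.sin(theta))

    # Components of the base possrep.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    # Components of the polar possrep, derived from the base on demand.
    @property
    def r(self):
        return math.hypot(self._x, self._y)

    @property
    def theta(self):
        return math.atan2(self._y, self._x)

    def __eq__(self, other):
        return isinstance(other, Point) and (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        return hash((self._x, self._y))

p = Point.cartesian(3, 4)
q = Point.polar(1, 0)
```

With floating-point components the round trip between possreps is only approximate, which is one practical reason a designer might privilege one representation as the base.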

Quote from AntC on May 7, 2020, 11:47 am
  • RM Pre 6 Tuple types, operations, and RM Pre 12 Tuple variables, assignment, RM Pre 20 operators: why must tuples be first-class/is it anti-Relational to only support operations for singleton relations? Yes the abstract semantics for relations needs to define the structure in terms of tuples. That doesn't require tuples be realised concretely in the language AFAICS. Requiring it seems to preclude certain sorts of representations for relations -- such as 'vertical stores'. A D could support the TUP{ } type generator/pseudo-Selector as syntax purely within REL{ }.
  • RM Pre 8 comparison operator == "for every type T". Just no: indeed many of the types commonly held in databases support equality-comparison. Others don't (such as function types). Perhaps non-comparable types should be excluded from the database, but they shouldn't be excluded from the D's ecosystem. I'm not sure RM Pre 8 would support Image/Audio/document/BLOB types. Is it anti-Relational to hold those in a database?

 

(I think I already mentioned that re. TUP{} "outside" REL{}, SIRA_PRISE was once like that. But it does not really provide a benefit for the user if support is that limited, does it? Unless of course the benefit is "If I have to do TUP{} everywhere then the system won't get built and you dear user won't have a system to use" ...)

Not having an equality operator for some type precludes that type from entering the database a priori. The absence of its equality operator makes testing tuple equality impossible, so makes duplicate testing impossible, and the whole sheboopla that entails. It might be tolerable / condonable for such types to exist within the language but be kept out of the database; it might be equally reasonable to demand that the type designer just define equality, no matter how useless it is in practice to the end-user. Same way Java defines Object.equals(). Inventorising all the distinct possible senses in which two pieces of audio or two pieces of video might be "equivalent" or not is not something anyone wants to do (nor should need to) when defining equality. Audio sampled at 44K and sampled at 48K. Same or not? Color photograph converted to grayscale. Same or not? Etc. etc.
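The dependence of duplicate elimination on equality can be seen in a short Python sketch. Both class names are hypothetical, and byte-for-byte comparison is just one arbitrary choice of equality, in the "define it no matter how useless" spirit described above.

```python
class OpaqueAudio:
    """No equality defined: Python falls back to object identity,
    much as Java's default Object.equals() does."""
    def __init__(self, samples):
        self.samples = bytes(samples)

class ComparableAudio:
    """Equality defined, arbitrarily, as byte-for-byte identical samples."""
    def __init__(self, samples):
        self.samples = bytes(samples)

    def __eq__(self, other):
        return isinstance(other, ComparableAudio) and self.samples == other.samples

    def __hash__(self):
        return hash(self.samples)

# A set stands in for the body of a relation: it must not contain duplicates.
opaque_body = {OpaqueAudio(b"abc"), OpaqueAudio(b"abc")}
comparable_body = {ComparableAudio(b"abc"), ComparableAudio(b"abc")}

opaque_count = len(opaque_body)          # the "duplicate" is not detected
comparable_count = len(comparable_body)  # the duplicate collapses
```

Without a value-based equality the set cannot tell that it holds the same value twice, which is exactly the problem for tuple equality and duplicate testing.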