The Forum for Discussion about The Third Manifesto and Related Matters


Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

Quote from AntC on October 22, 2019, 9:52 am
Quote from Dave Voorhis on October 22, 2019, 7:15 am

... an attempt to comprehensively explore integration of a reasonably rich type system (despite its flaws) with the relational model. That should invite doing the same with alternative type systems, which hasn't (to my knowledge) yet been done

Well, (scalar) types are supposed to be orthogonal to model. I'm not seeing why a (scalar) type system (however rich) needs a particularly relational flavour. Non-scalar types might be a different kettle of fish. OTOH they boil down to data structures with scalar types innermost. So if a type system is rich enough to accommodate (polymorphic/parametric) data structures, doesn't that get us nearly all the way there?

Maybe. Or maybe not. Until the work is done, it's only an assumption, which leaves Date's TIRT as the only thorough work (to my knowledge) in the area.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on October 22, 2019, 10:35 am
Quote from Erwin on October 22, 2019, 8:45 am

I think footnote 2 ***is*** quite right.

In the domain-ordered approach the relations

BOM(containing_part, contained_part) {(1,2)} and BOM(contained_part, containing_part) {(2,1)}

But those examples aren't domain-ordered; they're 'role-qualified'. Domain-ordered would be

BOM(part, part) {(1,2)} -- in which case this value is not equal: BOM(part, part) {(2,1)}.

(I assume you're putting parens round the heading and tuple rather than braces, to show it's ordered, not a set.)

Yes, thanks for the correction.

Though of course Codd also wrote reasonably explicitly that if domain names are not unique within a relation, they MUST be qualified with the role name, and it would overall get to be something syntactically like

BOM(containing:part, contained:part) {(1,2)} -- in which case this value is not equal: BOM(contained:part, containing:part) {(2,1)}.
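A minimal Python sketch of the distinction being drawn here. It assumes two ways of modelling a relation value: positionally (heading and tuples are ordered, as the parentheses above suggest) and by name (each tuple is a set of name/value pairs, roughly the TTM-style reading). The helper names `ordered` and `named` are hypothetical, introduced only for illustration.

```python
def ordered(heading, rows):
    # Positional reading: the order of the heading and of each tuple is significant.
    return (tuple(heading), frozenset(tuple(r) for r in rows))

def named(heading, rows):
    # Name-based reading: each tuple is a set of (name, value) pairs, so order is irrelevant.
    return frozenset(frozenset(zip(heading, r)) for r in rows)

# Domain-ordered: both columns are just 'part', so only position tells them apart.
d1 = ordered(["part", "part"], [(1, 2)])
d2 = ordered(["part", "part"], [(2, 1)])

# Role-qualified, still read positionally.
r1 = ordered(["containing:part", "contained:part"], [(1, 2)])
r2 = ordered(["contained:part", "containing:part"], [(2, 1)])

# Role-qualified, read by name.
n1 = named(["containing:part", "contained:part"], [(1, 2)])
n2 = named(["contained:part", "containing:part"], [(2, 1)])

print(d1 == d2)  # False
print(r1 == r2)  # False
print(n1 == n2)  # True
```

Under the positional reading the two values differ; once the role-qualified names identify the attributes and order is dropped, they carry the same information and compare equal.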

By now I'm pretty convinced one of the things Codd had in mind when he wrote about "identifying attributes (/attribute values) by domain name" was exactly for the purpose of "not comparing weights to lengths" and the like.  Which in the most prevailing languages of the day was a very easy thing to do, given that those languages only had "types" 'numeric' and 'text', so to speak (and were primarily focused on physical representation, as you already stated).

Quote from AntC on October 22, 2019, 10:35 am
Quote from Erwin on October 22, 2019, 8:45 am

They never get any farther than the equally meaningless/unclear "domains are not types, they have types". It could mean that for a given domain, a multitude of types could be used to implement that domain in a computer system. That would imply of necessity that domains and types cannot possibly "be the same thing". But I think that's nitpickingly seeking excuses for the mere purpose of bashing TTM. As far as implementation on a computer is concerned, types are the obvious device to map domains to, and choosing distinct types for the same domain is probably never the wisest choice the designer can make.

I'm not seeing any evidence Codd would allow "choosing distinct types for the same domain". What he seems to be doing in those Figs is 'choosing the same types for different domains'. (That is, if I make some guesses about what you mean by type.) We have an equivalent in modern type theory (which unfortunately goes by a number of different/confusing terms): "declaration that does introduce a new, distinct type, isomorphic to an existing type." 'Isomorphic' there means: contains the same "pool of values" and supports the same operations/methods within the type.

This is a powerful feature in nominative/nominal type systems: the PhysReps are the same; those 'casts' are nullops -- including the 'casts' from numeric literals to lengths/weights; the type system does the bookkeeping; and can apply type erasure semantics.
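As a rough illustration of that idea (not a claim about how any D does it), Python's standard `typing.NewType` declares a distinct nominal type over an existing representation. The names `Length`, `Weight` and `total_length` are hypothetical. The distinction exists only for a static checker; at runtime the 'cast' hands back the same object.

```python
from typing import NewType

# Two distinct nominal types over the same representation (float).
Length = NewType("Length", float)
Weight = NewType("Weight", float)

def total_length(a: Length, b: Length) -> Length:
    # Arithmetic happens on the underlying floats; the result is re-labelled.
    return Length(a + b)

raw = 3.0
x = Length(raw)   # the 'cast' is a no-op at runtime
w = Weight(raw)

print(x is raw)           # True: same object, nothing was converted
print(type(x) is float)   # True: the distinction is erased at runtime
print(total_length(x, Length(2.0)))  # 5.0

# A static checker such as mypy would be expected to reject this call,
# although Python itself would run it without complaint:
# total_length(x, w)
```

The bookkeeping is done entirely by the checker, which is the type-erasure behaviour described above.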

Oh, ah, I meant to add: this same-PhysRep/different-domain arrangement potentially conflicts with RM Pre 4's "By definition, v has exactly one physical representation and one or more possible representations (at least one, because there is obviously always one that is the same as the physical representation)."

Potentially part 4 has the same PhysRep as supplier 4 as quantity 4. Their PossReps are different. Or perhaps I should be more cautious here: 4 must appear in a syntactic context in a D where the domain it belongs to can be inferred. This is no more than requiring numeric literals appear in a context where we can infer if 4 represents INT vs INT64 vs RAT vs FLOAT vs DOUBLE etc.

I must say that writing/reading REL{TUP{S# 4, P# 4, QTY 4}} is far more ergonomic than REL{TUP{S# S#('S4'), P# P#('P4'), QTY 4}}. Something like SNAME NAME('SMITH') has always struck me as excessive circumlocution. Doubtless there's some pedagogical significance, but if we do away with multiple PossReps, we can surely do away with some of the clutter.

Quote from Erwin on October 22, 2019, 11:29 am

By now I'm pretty convinced one of the things Codd had in mind when he wrote about "identifying attributes (/attribute values) by domain name" was exactly for the purpose of "not comparing weights to lengths" and the like.

That's possible, but it's also possible that when Codd wrote the original paper he simply didn't think it through.

It's easy to assume such a landmark work was meticulously crafted by a genius who thought of everything, but that's only because it turned out to be a landmark work. That has no doubt retroactively imbued every passing remark with significance that it doesn't deserve. Like most academic and research papers, Codd 1970 was almost certainly written in a hurry under deadline pressure and page-count constraints, and thus given no more thought than millions of other papers that are tossed out and forgotten. I.e., some, but much left un- or hastily- considered.

In other words, trying to mine Codd 1970 for modern meaning (or great insights which we will surely discover if we read it again! Etc... Not.) beyond the obvious (the RM itself) is, I think, unproductive.

Quote from AntC on October 22, 2019, 11:38 am

I must say that writing/reading REL{TUP{S# 4, P# 4, QTY 4}} is far more ergonomic than REL{TUP{S# S#('S4'), P# P#('P4'), QTY 4}}. Something like SNAME NAME('SMITH') has always struck me as excessive circumlocution. Doubtless there's some pedagogical significance, but if we do away with multiple PossReps, we can surely do away with some of the clutter.

Even with multiple possreps, I think it would be possible to (at least some of the time) infer selectors unambiguously from literals, but I imagine it might only add confusion in the pedagogic contexts where it's intended to be used.

Quote from Dave Voorhis on October 22, 2019, 11:49 am
Quote from Erwin on October 22, 2019, 11:29 am

By now I'm pretty convinced one of the things Codd had in mind when he wrote about "identifying attributes (/attribute values) by domain name" was exactly for the purpose of "not comparing weights to lengths" and the like.

That's possible, but it's also possible that when Codd wrote the original paper he simply didn't think it through.

It's easy to assume such a landmark work was meticulously crafted by a genius who thought of everything, but that's only because it turned out to be a landmark work. That has no doubt retroactively imbued every passing remark with significance that it doesn't deserve. Like most academic and research papers, Codd 1970 was almost certainly written in a hurry under deadline pressure and page-count constraints, and thus given no more thought than millions of other papers that are tossed out and forgotten. I.e., some, but much left un- or hastily- considered.

In other words, trying to mine Codd 1970 for modern meaning (or great insights which we will surely discover if we read it again! Etc... Not.) beyond the obvious (the RM itself) is, I think, unproductive.

Fair points. They apply just as much to Date's interpretation/claims for Codd 1969/1970; and for his critiques/blaming on those papers.

I'm dissatisfied with SQL's treatment of types. I'm dissatisfied with TTM's treatment. (I guess TTM minus multiple PossReps wouldn't be coherent.) I don't see either as 'necessary' to support the relational model. Most theoretical work on RMs takes domains/pools of values for granted/as uninteresting. I'm casting about for glimpses of other possibilities.

Quote from AntC on October 22, 2019, 12:04 pm
Quote from Dave Voorhis on October 22, 2019, 11:49 am
Quote from Erwin on October 22, 2019, 11:29 am

By now I'm pretty convinced one of the things Codd had in mind when he wrote about "identifying attributes (/attribute values) by domain name" was exactly for the purpose of "not comparing weights to lengths" and the like.

That's possible, but it's also possible that when Codd wrote the original paper he simply didn't think it through.

It's easy to assume such a landmark work was meticulously crafted by a genius who thought of everything, but that's only because it turned out to be a landmark work. That has no doubt retroactively imbued every passing remark with significance that it doesn't deserve. Like most academic and research papers, Codd 1970 was almost certainly written in a hurry under deadline pressure and page-count constraints, and thus given no more thought than millions of other papers that are tossed out and forgotten. I.e., some, but much left un- or hastily- considered.

In other words, trying to mine Codd 1970 for modern meaning (or great insights which we will surely discover if we read it again! Etc... Not.) beyond the obvious (the RM itself) is, I think, unproductive.

Fair points. They apply just as much to Date's interpretation/claims for Codd 1969/1970; and for his critiques/blaming on those papers.

I'm dissatisfied with SQL's treatment of types. I'm dissatisfied with TTM's treatment. (I guess TTM minus multiple PossReps wouldn't be coherent.) I don't see either as 'necessary' to support the relational model. Most theoretical work on RMs takes domains/pools of values for granted/as uninteresting. I'm casting about for glimpses of other possibilities.

You might have to invent them. Most of computer science and engineering is a result of someone being dissatisfied with X's treatment of Y.

Quote from Dave Voorhis on October 22, 2019, 11:02 am
Quote from dandl on October 22, 2019, 10:20 am

I find the distinction clear enough. Domains identify attributes (and perhaps a set of permitted values) in tuples in relations in the RA. Types are a feature of programming languages. TTM offers unification of the two concepts, but this is a choice. It would be perfectly possible to implement a relational database strictly conforming to Codd 1970, and access it from a typeless programming language. Types are a useful luxury, not a basic essential.

There is no such thing as a 'typeless programming language', even with languages like Tcl that have essentially one type (string, with some numeric manipulations of numeric strings) and assembly languages (at least, in those without descriptors) where types are expressly manifest only in opcode semantics.

I disagree. A typeless programming language is one that provides/imposes no type-like rules on values. A value may be treated as an integer, float, string, pointer etc by simply using it as if it were, usually by passing it as an argument to an operator. There is no way to determine the type of any value, either at compile time or runtime. Values are anonymous bit patterns that are interpreted in whatever way you choose to use them. A Turing machine is typeless. So is Forth, I guess, it just has words on a stack.

TCL has a single type, string, and many operations on strings. Some strings represent numbers, some do not. That's not really typeless.

BCPL was typeless in that it had a machine word of a predetermined size as its only data type, and that word could be used as an integer, pointer, character etc at will. It made writing compilers easier, and using them harder!

Parts of this philosophy made their way into its successor B, and thence into the C language that finally succeeded it. You can still write typeless code in C, if you don't mind adding a few casts and ignoring a few warnings.

Do you mean that user-defined types are a useful luxury? Or that having more types than just 'string' is a useful luxury?

It's tongue-in-cheek, but Turing guaranteed we can compute anything without needing a type system; types then are a very convenient thing, but not essential for computation to happen.

Andl - A New Database Language - andl.org
Quote from AntC on October 22, 2019, 11:03 am
Quote from dandl on October 22, 2019, 9:00 am

I agree. In my view a domain is an abstract entity that provides a unique identifier for an attribute/column, and also constrains the set of permitted values. Domains are expected to be unique within a relation, but if they are not, a domain can be further qualified by a role, which does not however play any part in matching up attributes for the RA.

Where this might lead to (if we try to graft on to modern type theory, and do away with all the hangover from ordered/positional tuples/relations) is that domains are roles are attribute names, we don't need two/three separate concepts.

There are two separate concepts, but the issue is that we would like to rationalise them in order to implement the RA in a programming language. These are choices.

A domain does not provide any of the other features of a programming language type, such as permitted operations, sub-typing etc.

Hmm hmm. See my reply to Erwin. We might use the same "pool of values" (i.e. INTs) for part and quantity. We might want to do arithmetic over quantity. We might even want to do (limited) arithmetic or at least sorting/ordering over part. But we certainly don't want to add part numbers to quantity.
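A minimal runtime sketch of that wish list in Python, assuming both domains draw on the same pool of values (plain `int`). The class and field names are hypothetical. `Quantity` supports addition, `Part` supports only equality and ordering, and mixing the two fails.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class Part:
    number: int          # comparable and sortable, but no arithmetic defined

@dataclass(frozen=True, order=True)
class Quantity:
    amount: int

    def __add__(self, other):
        # Addition is defined only between quantities.
        if not isinstance(other, Quantity):
            return NotImplemented
        return Quantity(self.amount + other.amount)

def mixing_is_rejected():
    # Adding a part number to a quantity should raise TypeError.
    try:
        Quantity(4) + Part(4)
    except TypeError:
        return True
    return False

print(Quantity(4) + Quantity(6))     # Quantity(amount=10)
print(sorted([Part(3), Part(1)]))    # [Part(number=1), Part(number=3)]
print(Part(4) == Quantity(4))        # False: same integer, different domain
print(mixing_is_rejected())          # True
```

Unlike a static newtype, this version pays for a wrapper object at runtime; it is only meant to show which operations each domain would admit.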

The RA provides no such capability. We have the First Order RA with domains and all the queries Codd wrote about in 1970. But now we want the Second Order RA in order to write queries that generate new values, which means open expressions, which in both SQL and TTM means adding a type system. SQL stops there, but in TTM we want unsafe queries, so we add a full blown Turing complete language, with SORA embedded.

These are choices we make, for good reasons, but they have consequences. I can imagine sticking with domains and coming up with a different way to match it to a modern language and type system, but it's not obvious it would be worth the trouble.

If you replace domains by types (as TTM does) you lose the uniqueness feature so you have to add in attribute names. Now there is potential for confusion between names across relations (enter RENAME) and the way names are used in programming languages (hence the weird TTM type system).

Perhaps there is a different way to marry up relations, domains and types, but I don't know what it might be.

For any heading, requiring unique domains (named pool of values) is isomorphic to requiring unique role-qualifiers (role name + domain) is isomorphic to requiring unique attributes (attribute name + TTM type, which is a type name + set of values). I see redundancy.

This doesn't avoid RENAME: we still need part in role of assembly vs part in role of component. If distinct domains are defined over the same "pool of values", that's a type-level renaming resulting in an execution-time nullop.

I sense that it does. In that case where the domain part is ambiguous, the RA expression has to be annotated as to which role to use, but the result value is still a part.

Quote from dandl on October 22, 2019, 1:02 pm
Quote from Dave Voorhis on October 22, 2019, 11:02 am
Quote from dandl on October 22, 2019, 10:20 am

I find the distinction clear enough. Domains identify attributes (and perhaps a set of permitted values) in tuples in relations in the RA. Types are a feature of programming languages. TTM offers unification of the two concepts, but this is a choice. It would be perfectly possible to implement a relational database strictly conforming to Codd 1970, and access it from a typeless programming language. Types are a useful luxury, not a basic essential.

There is no such thing as a 'typeless programming language', even with languages like Tcl that have essentially one type (string, with some numeric manipulations of numeric strings) and assembly languages (at least, in those without descriptors) where types are expressly manifest only in opcode semantics.

I disagree. A typeless programming language is one that provides/imposes no type-like rules on values. A value may be treated as an integer, float, string, pointer etc by simply using it as if it were, usually by passing it as an argument to an operator. There is no way to determine the type of any value, either at compile time or runtime. Values are anonymous bit patterns that are interpreted in whatever way you choose to use them. A Turing machine is typeless. So is Forth, I guess, it just has words on a stack.

TCL has a single type, string, and many operations on strings. Some strings represent numbers, some do not. That's not really typeless.

BCPL was typeless in that it had a machine word of a predetermined size as its only data type, and that word could be used as an integer, pointer, character etc at will. It made writing compilers easier, and using them harder!

Parts of this philosophy made their way into its successor B, and thence into the C language that finally succeeded it. You can still write typeless code in C, if you don't mind adding a few casts and ignoring a few warnings.

It sounds like you may be conflating "typeless" with "minimal (or no) type checking" and/or "weakly typed".

"Typeless" is now a rather archaic term when used with programming languages, and would raise eyebrows even for B and BCPL, or Forth and assembly languages without descriptors. The preferred term is (usually) "weakly typed" and/or some appropriate description of how the type system works. Turing machines and untyped lambda calculus are indeed typeless, but they're theoretical constructs in which there is categorically no distinction between types. Implementations on real machines are almost inevitably weakly typed -- with varying degrees of type checking from none to some -- but always some distinction between types, even if only in how different (human-selected, so the human provides the type checking) operators interpret their operands.

Only if there is truly no distinction between interpretations of bit patterns at machine addresses can a system be considered typeless. As soon as there is some semantic distinction between the types (!) of data stored at addresses, it is (at least weakly) typed.
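To make the "operators interpret their operands" point concrete, here is a small Python sketch using the standard `struct` module. The 32-bit word is an arbitrary value picked for illustration, not something from the thread. The same four bytes are read as an unsigned integer or as an IEEE 754 single-precision float, depending only on which format the programmer chooses.

```python
import struct

# One 32-bit machine word, stored as four little-endian bytes.
word = struct.pack("<I", 0x40490FDB)

# The bytes carry no type; the chosen 'operator' (format code) supplies the interpretation.
as_int = struct.unpack("<I", word)[0]     # read as an unsigned 32-bit integer
as_float = struct.unpack("<f", word)[0]   # read as a 32-bit float

print(word.hex())
print(as_int)
print(as_float)
```

Here the human selecting `"<I"` or `"<f"` is doing the type checking, which is the sense in which such a system is weakly typed rather than typeless.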
