The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

Page 1 of 22Next

I dipped into Date's TIRT [2016] a little. Almost immediately [Chapter 1, second paragraph], I bumped into something he attributes to Codd which I think is a mis-representation. Date appears to be merely regurgitating his previous (mis-)readings of Codd.

the relational model implicitly requires a further theoretical footing of its own: namely, a supporting, or underlying, theory of types (or domains, as Codd called them in his early papers).

Is what Codd called domain something we'd today call type? I think not. I say that from the very wording of Codd's "early papers".

I'm specifically looking at Codd 1970, although much of the material wrt domain is word-for-word the same as Codd 1969. (As we've discussed before) Codd in 1970 would not have used type: programming language type theory of that vintage was preoccupied with physical representation, not abstract/mathematical properties; Codd as a mathematician would have thought of relations over domains. I think it's also significant that in his later writings and discussions, Codd refused to identify type with domain -- as reported by D&D.

Codd 1970 has some extra material over 1969 wrt domain, which suggests to me his thinking was evolving/not yet fixed. Let's start with what is common between the two papers:

One relation called part is defined on the following domains:
(1) part number
(2) part name
(3) part color
(4) part weight
(5) quantity on hand
(6) quantity on order
and possibly other domains as well. Each of these domains is, in effect, a pool of values, ...

What do the "values" in this "pool" look like? There's an example relation value Fig 1. in both papers [page numbered 379 in the 1970 paper, WordPress is murdering the formatting]:

supply (supplier part project quantity)

1     2     5     17
1     3     5     23
2     3    7      9
2     7    5      4
4     1     1     12

FIG. 1. A relation of degree 4

Without wishing to over-interpret the digits appearing in those cells, I can see 1 appears in multiple columns (i.e. multiple domains), as do 2, 4, 7. Those might be integers used as measures or cardinals or ordinals, but we can at least say that quantity 4 isn't the same as supplier 4. I want to say (contra RM Pre 1): if a domain's "pool of values" is to be interpreted as "a named set of scalar values", those sets of values are not necessarily disjoint; the domain keeps the values distinct. Let's pay attention to the comment following Fig 1 (in both papers):

One might ask: If the columns are labeled by the name of corresponding domains, ...

So domains have names (per RM Pre 1 "named"); we might use the name as label a columnar/tabular representation of a relation value. But not necessarily (again from both papers):

... two columns may have identical headings (indicating identical domains) but possess distinct meanings with respect to the relation. The relation depicted [Fig 2] is called component. It is a ternary relation whose first two domains are called part and third domain is called quantity.

(To be precise, Fig 2 in the 1969 has only two columns, both labelled part. This is the classic parts assembly/component hierarchy.)

It's in consideration of relations with "identical domains" in multiple columns the 1970 paper has extra material [top of p 380].

we propose that users deal, not with relations which are domain-ordered, but with relationships which are their domain-unordered counterparts.2 To accomplish this, domains must be uniquely identifiable at least within any given relation, without using position. Thus, where there are two or more identical domains, we require in each case that the domain name be qualified by a distinctive role name, ...

2 In mathematical terms, a relationship is an equivalence class of those relations that are equivalent under permutation of domains (see Section 2.1.1).

Footnote 2 can't be quite right, which is why I say Codd's thinking was in flux: if a relation has two identical domains, a permutation obtained by swapping those two domains is indistinguishable. So the role name is essential to distinguishing -- hence we arrive at the TTM model for relation values (presuming Codd's role name corresponds to TTM attribute name.)

Codd doesn't go into enough detail to determine if the "pool of values" for domains supplier, part, project, quantity are identical, whether any are a strict subset, etc. But then he doesn't need to: values always notionally carry their domain with them. Therefore also their Possible Representations might be the same -- excepting that in a tabular representation they appear in different columns, labelled for the "domains (role qualified whenever necessary)".

If a value can appear in multiple domains, that doesn't go well for TTM's IM. But again that doesn't matter if values always carry their domain with them. Under nominative typing, it's sufficient to distinguish types if they have different names, irrespective of the "pool of values" or however the type model specifies elements of the type.

I think it's sometimes difficult to appreciate the degree to which the late 1960's and very early 1970's computer science was a curiously baroque mix of well-formed and (in retrospect) ill-formed or "antique" thinking. At the time Codd 1970 was written, the work that largely shaped current thinking on programming language type systems was still 13 years away (Standard ML, 1983) or 20 years away (Haskell, 1990) or 15 years away (Cardelli, On Understanding Types, Data Structures, and Polymorphism, 1985). Smalltalk and Simula date from the same era, but are ad hoc achievements rather than theoretical or theoretically-informed frameworks.

Indeed, the notions behind "modern" type systems were only starting to appear in 1974 and 1975 with Reynolds' Towards a Theory of Type Structure and User-Defined Types and Procedural Data Structures as Complementary Approaches to Data Abstraction.

So it doesn't really matter whether Codd meant "domain" = type or "domain" = pool of values (subset of type), and indeed might -- given its origin deep in the primordial type-thinking ooze -- have vaguely meant either or both. What's notable about TIRT is that it's an attempt to comprehensively explore integration of a reasonably rich type system (despite its flaws) with the relational model. That should invite doing the same with alternative type systems, which hasn't (to my knowledge) yet been done -- other than various SQL vendors' implementations of typically-bletcherous SQL type systems.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I think footnote 2 ***is*** quite right.

In the domain-ordered approach the relations

BOM(containing_part, contained_part) {(1,2)} and BOM(contained_part, containing_part) {(2,1)}

are not the same but in the corresponding domain-unordered approach they are, and all he's trying to say here is that a "relation-in-the-domain-unordered-approach" (for which he uses the term "relationship") corresponds 1-1 to a whole equivalence class defined in the hopefully obvious way by [also hopefully obvious] "domain ordering permutation functions" applied to [the members of] the set of domain-ordered-relations.

 

As for the types/domains thing, the clan that insists the hardest on recognizing and maintaining the distinction, has never been able to explain clearly what that distinction actually is.  They never get any farther than the equally meaningless/unclear "domains are not types, they have types".  It could mean that for a given domain, a multitude of types could be used to implement that domain in a computer system.  That would imply of necessity that domains and types cannot possibly "be the same thing".  But I think that's nitpickingly seeking excuses for the mere purpose of bashing TTM.  As far as implementation on a computer is concerned, types are the obvious device to map domains to, and choosing distinct types for the same domain is probably never the wisest choice the designer can make.

Quote from AntC on October 22, 2019, 5:39 am

I dipped into Date's TIRT [2016] a little. Almost immediately [Chapter 1, second paragraph], I bumped into something he attributes to Codd which I think is a mis-representation. Date appears to be merely regurgitating his previous (mis-)readings of Codd.

the relational model implicitly requires a further theoretical footing of its own: namely, a supporting, or underlying, theory of types (or domains, as Codd called them in his early papers).

Is what Codd called domain something we'd today call type? I think not. I say that from the very wording of Codd's "early papers".

I agree. In my view a domain is an abstract entity that provides a unique identifier for an attribute/column, and also constrains the set of permitted values. Domains are expected to be unique within a relation, but if not it can be further qualified by a role, which does not however play any part in matching up attributes for the RA.

A domain does not provide any of the other features of a programming language type, such as permitted operations, sub-typing etc.

If you replace domains by types (as TTM does) you lose the uniqueness feature so you have to add in attribute names. Now there is potential for confusion between names across relations (enter RENAME) and the way names are used in programming languages (hence the weird TTM type system).

Perhaps there is a different way to marry up relations, domains and types, but I don't know what it might be.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on October 22, 2019, 7:15 am

...

So it doesn't really matter whether Codd meant ...

I'm pretty sure Codd's domain doesn't mean the same as RM Pre 1's 'scalar type', which is all I was trying to point out. Then TTM's IM can't apply either. I agree it's non-trivial to match up domain to any of the myriad of modern senses of programming language type.

"domain" = type or "domain" = pool of values (subset of type),

Neither do I see Codd's "pool of values" corresponding to 'subset of type'. That is: not if you're implying the subset must be defined/declared explicitly through a constraint on some TTM-style type's set of values. Perhaps we don't need to be curious about how/where values arise? We just pull some into the pool and give it a name.

... an attempt to comprehensively explore integration of a reasonably rich type system (despite its flaws) with the relational model. That should invite doing the same with alternative type systems, which hasn't (to my knowledge) yet been done

Well, (scalar) types are supposed to be orthogonal to model. I'm not seeing why a (scalar) type system (however rich) needs a particularly relational flavour. Non-scalar types might be a different kettle of fish. OTOH they boil down to data structures with scalar types innermost. So if a type system is rich enough to accommodate (polymorphic/parametric) data structures, doesn't that get us nearly all the way there?

-- other than various SQL vendors' implementations of typically-bletcherous SQL type systems.

Indeed. I wouldn't be looking to SQL for well-founded type theory.

Quote from Erwin on October 22, 2019, 8:45 am

As for the types/domains thing, the clan that insists the hardest on recognizing and maintaining the distinction, has never been able to explain clearly what that distinction actually is.  They never get any farther than the equally meaningless/unclear "domains are not types, they have types".  It could mean that for a given domain, a multitude of types could be used to implement that domain in a computer system.  That would imply of necessity that domains and types cannot possibly "be the same thing".  But I think that's nitpickingly seeking excuses for the mere purpose of bashing TTM.  As far as implementation on a computer is concerned, types are the obvious device to map domains to, and choosing distinct types for the same domain is probably never the wisest choice the designer can make.

I don't know who that clan might be, but the universe of prospective solutions to the problem is not large.

I find the distinction clear enough. Domains identify attributes (and perhaps a set of permitted values) in tuples in relations in the RA. Types are a feature of programming languages. TTM offers unification of the two concepts, but this is a choice. It would be perfectly possible to implement a relational database strictly conforming to Codd 1970, and access it from a typeless programming language. Types are a useful luxury, not a basic essential.

Andl - A New Database Language - andl.org
Quote from Erwin on October 22, 2019, 8:45 am

I think footnote 2 ***is*** quite right.

In the domain-ordered approach the relations

BOM(containing_part, contained_part) {(1,2)} and BOM(contained_part, containing_part) {(2,1)}

But those examples aren't domain-ordered; they're 'role-qualified'. Domain-ordered would be

BOM(part, part) {(1,2)} -- in which case this value is not equal: BOM(part, part) {(2,1)}.

(I assume you're putting parens round the heading and tuple rather than braces, to show it's ordered, not a set.)

are not the same but in the corresponding domain-unordered approach they are, and all he's trying to say here is that a "relation-in-the-domain-unordered-approach" (for which he uses the term "relationship") corresponds 1-1 to a whole equivalence class defined in the hopefully obvious way by [also hopefully obvious] "domain ordering permutation functions" applied to [the members of] the set of domain-ordered-relations.

Yes I do think the correspondence he means is obvious, so I didn't dwell on it.

As for the types/domains thing, the clan that insists the hardest on recognizing and maintaining the distinction, has never been able to explain clearly what that distinction actually is.

Who is this clan? AFAICT, D&D made their mis-reading at an early stage and have never reconsidered. Those who've taken issue with D&D on their interpretations of Codd-in-general haven't debated this particular point(?) Or if they have then where? (Let us tread lightly, we don't want to wake up those who live under the bridge.)

  They never get any farther than the equally meaningless/unclear "domains are not types, they have types".  It could mean that for a given domain, a multitude of types could be used to implement that domain in a computer system.  That would imply of necessity that domains and types cannot possibly "be the same thing".  But I think that's nitpickingly seeking excuses for the mere purpose of bashing TTM.  As far as implementation on a computer is concerned, types are the obvious device to map domains to, and choosing distinct types for the same domain is probably never the wisest choice the designer can make.

I'm not seeing any evidence Codd would allow "choosing distinct types for the same domain". What he seems to be doing in those Figs is 'choosing the same types for different domains'. (That is if I make some guesses about what you're meaning by type.) We have an equivalent in modern type theory (which unfortunately goes by a number of different/confusing terms): "declaration that does introduce a new, distinct type, isomorphic to an existing type." 'Isomorphic' there means: contains the same "pool of values" and supporting the same operations/methods within the type. So we might have a numeric type for lengths (on which we support arithmetic); and a distinct type for weights (on which we support arithmetic); but we don't directly support arithmetic between lengths and weights. A program would need to explicitly cast a length and a weight to some common isomorphic type, perform the arithmetic, then cast back to some other units.

This is a powerful feature in nominative/nominal type systems: the PhysReps are the same; those 'casts' are nullops -- including the 'casts' from numeric literals to lengths/weights; the type system does the bookkeeping; and can apply type erasure semantics.

And this keeping types separate connects to the earlier debate about auto-coercing numerics: wot we do not do. If you want to cast INT to RAT your code must be explicit, just as casting INT to LENGTH.

Then as my first guess: a domain is a named "pool of values", which values must draw from the same type/or perhaps which values must be isomorphic to some type's values. Distinct domains (IOW distinctly named) are distinct even if they're isomorphic to the same type's values.

Quote from dandl on October 22, 2019, 10:20 am
Quote from Erwin on October 22, 2019, 8:45 am

As for the types/domains thing, the clan that insists the hardest on recognizing and maintaining the distinction, has never been able to explain clearly what that distinction actually is.  They never get any farther than the equally meaningless/unclear "domains are not types, they have types".  It could mean that for a given domain, a multitude of types could be used to implement that domain in a computer system.  That would imply of necessity that domains and types cannot possibly "be the same thing".  But I think that's nitpickingly seeking excuses for the mere purpose of bashing TTM.  As far as implementation on a computer is concerned, types are the obvious device to map domains to, and choosing distinct types for the same domain is probably never the wisest choice the designer can make.

I don't know who that clan might be, but the universe of prospective solutions to the problem is not large.

I find the distinction clear enough. Domains identify attributes (and perhaps a set of permitted values) in tuples in relations in the RA. Types are a feature of programming languages. TTM offers unification of the two concepts, but this is a choice. It would be perfectly possible to implement a relational database strictly conforming to Codd 1970, and access it from a typeless programming language. Types are a useful luxury, not a basic essential.

There is no such thing as a 'typeless programming language', even with languages like Tcl that have essentially one type (string, with some numeric manipulations of numeric strings) and assembly languages (at least, in those without descriptors) where types are expressly manifest only in opcode semantics.

Do you mean that user-defined types are a useful luxury?  Or that have more types than just 'string' is a useful luxury?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on October 22, 2019, 9:00 am
Quote from AntC on October 22, 2019, 5:39 am

I dipped into Date's TIRT [2016] a little. Almost immediately [Chapter 1, second paragraph], I bumped into something he attributes to Codd which I think is a mis-representation. Date appears to be merely regurgitating his previous (mis-)readings of Codd.

the relational model implicitly requires a further theoretical footing of its own: namely, a supporting, or underlying, theory of types (or domains, as Codd called them in his early papers).

Is what Codd called domain something we'd today call type? I think not. I say that from the very wording of Codd's "early papers".

I agree. In my view a domain is an abstract entity that provides a unique identifier for an attribute/column, and also constrains the set of permitted values. Domains are expected to be unique within a relation, but if not it can be further qualified by a role, which does not however play any part in matching up attributes for the RA.

Where this might lead to (if we try to graft on to modern type theory, and do away with all the hangover from ordered/positional tuples/relations) is that domains are roles are attribute names, we don't need two/three separate concepts.

A domain does not provide any of the other features of a programming language type, such as permitted operations, sub-typing etc.

Hmm hmm. See my reply to Erwin. We might use the same "pool of values" (i.e. INTs) for part and quantity. We might want to do arithmetic over quantity. We might even want to do (limited) arithmetic or at least sorting/ordering over part. But we certainly don't want to add part numbers to quantity.

If you replace domains by types (as TTM does) you lose the uniqueness feature so you have to add in attribute names. Now there is potential for confusion between names across relations (enter RENAME) and the way names are used in programming languages (hence the weird TTM type system).

Perhaps there is a different way to marry up relations, domains and types, but I don't know what it might be.

For any heading, requiring unique domains (named pool of values) is isomorphic to requiring unique role-qualifiers (role name + domain) is isomorphic to requiring unique attributes (attribute name + TTM type, which is a type name + set of values). I see redundancy.

This doesn't avoid RENAME: we still need part in role of assembly vs part in role of component. If distinct domains are defined over the same "pool of values", that's a type-level renaming resulting in an execution-time nullop.

Quote from AntC on October 22, 2019, 9:52 am
Quote from Dave Voorhis on October 22, 2019, 7:15 am

...

So it doesn't really matter whether Codd meant ...

I'm pretty sure Codd's domain doesn't mean the same as RM Pre 1's 'scalar type', which is all I was trying to point out. Then TTM's IM can't apply either. I agree it's non-trivial to match up domain to any of the myriad of modern senses of programming language type.

"domain" = type or "domain" = pool of values (subset of type),

Neither do I see Codd's "pool of values" corresponding to 'subset of type'. That is: not if you're implying the subset must be defined/declared explicitly through a constraint on some TTM-style type's set of values.

Not now, no. Now we'd define a "pool of values" as a relvar P -- perhaps with a single attribute -- and define constraints against it if we want to make sure attribute A in relvar B can only belong to the pool of values in relvar P.

In 1970, notions of "type" were looser and more amorphous, it seemed reasonable to have a notion of Domain for P to be defined as a type-ish thing. Now we know better. The type is the invariant; inclusion constraints support the arbitrary (and perhaps notionally variant) pool of values. True invariant "pools" are specified via type constraints.

But we know all that now, because it's been thought through time and time again. In 1970, no.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Page 1 of 22Next