The Forum for Discussion about The Third Manifesto and Related Matters


Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

Quote from AntC on October 23, 2019, 3:52 am

 

As someone who mostly thinks in terms of parametric/polymorphic struct types (of which a relation is a particular shape), limiting the number of values (you don't mean "number of types"?) to ℵ₀ seems almost laughable.

No, I do mean "number of types".  There are ℵ₁ different subsets of the set of all objects (modulo Cantor's paradox), but only ℵ₀ different names (at least if we require names to have finite length), therefore only ℵ₀ types.

Trivial? <goggle> You still have to figure out with (say) numeric operations what type of result (NAT/INT/FLOAT/RAT/etc) to generate from the type of the arguments. Without a hierarchy of numeric types, that's harder.

"Type theory" specifically means the theory of static types, which are attached not to objects but to their representations in code.  The number 1 belongs to dynamic type Integer, but the code 1 has a static type of Integer in a statically typed language.  In a dynamically typed language, its static type is Object, just like everything else.

If you're subtracting two NATs, do you assume the result is INT, or do you wait to see if you get a negative result? If you're adding/multiplying two INT32s, do you assume the result is INT64, or do you wait to see if the result overflows 32 bits? etc, etc.

These questions only make sense in the world of dynamic types.  The static type of NAT - NAT is INT, period.

Quote from johnwcowan on October 23, 2019, 11:30 am
Quote from AntC on October 23, 2019, 3:52 am

 

As someone who mostly thinks in terms of parametric/polymorphic struct types (of which a relation is a particular shape), limiting the number of values (you don't mean "number of types"?) to ℵ₀ seems almost laughable.

No, I do mean "number of types".  There are ℵ₁ different subsets of the set of all objects (modulo Cantor's paradox), but only ℵ₀ different names (at least if we require names to have finite length), therefore only ℵ₀ types.

Oh dear. You need to read TTM more carefully. An example TTM type name might be REL{S# S#, P# P#, QTY INT}. A type's name (in both TTM and type theory) is typically structured. A parameterised/polymorphic type's name includes variables, for example (Ord a, Eq b) => Set (a, b) (being a Set-of-pairs type constructor parameterised over one type that must be Orderable and another testable for equality). That programming languages have textual representations of type names, and that in extant programs those are finite length is by-the-by. I'll be wanting type names for relations like (Arith a) => {{S#, P#, a}} for a relation literal where type inference hasn't yet resolved all the attribute/domain types.

Trivial? <goggle> You still have to figure out with (say) numeric operations what type of result (NAT/INT/FLOAT/RAT/etc) to generate from the type of the arguments. Without a hierarchy of numeric types, that's harder.

"Type theory" specifically means the theory of static types, which are attached not to objects but to their representations in code.  The number 1 belongs to dynamic type Integer, but the code 1 has a static type of Integer in a statically typed language.  In a dynamically typed language, its static type is Object, just like everything else.

Not every language is object-oriented. Not every language is wholly dynamically typed or wholly statically typed. Not every language subscribes to whatever theory of typing is in your head in those remarks. Not every language treats bare lexeme 1 as necessarily Integer type.

If you're subtracting two NATs, do you assume the result is INT, or do you wait to see if you get a negative result? If you're adding/multiplying two INT32s, do you assume the result is INT64, or do you wait to see if the result overflows 32 bits? etc, etc.

These questions only make sense in the world of dynamic types.  The static type of NAT - NAT is INT, period.

No: the static type of NAT - NAT is whatever type the - operator tells us to infer from the type of its arguments. For example in Haskell we have (-) :: a -> a -> a so the result of the subtraction would be type NAT. And that means it'll throw an exception if the result would be negative. (Or possibly it'll return zero.)
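
A minimal sketch of that point, in Python rather than Haskell purely for brevity. The `Nat` class and `sub_widening` function are hypothetical names made up for illustration, and the annotations only stand in for static signatures (Python does not enforce them without an external checker). The same two arguments give a different result type depending on which signature the subtraction is declared with:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nat:
    """Hypothetical natural-number type: wraps a non-negative int."""
    value: int

    def __post_init__(self) -> None:
        if self.value < 0:
            raise ValueError("Nat cannot be negative")

    # Closed signature, in the spirit of (-) :: a -> a -> a:
    # Nat in, Nat out, so a negative result has to be an error.
    def __sub__(self, other: "Nat") -> "Nat":
        if other.value > self.value:
            raise ArithmeticError("Nat subtraction underflow")
        return Nat(self.value - other.value)


def sub_widening(a: Nat, b: Nat) -> int:
    """Alternative signature: Nat in, plain int out, so negatives are fine."""
    return a.value - b.value


difference = Nat(5) - Nat(3)               # stays a Nat
widened = sub_widening(Nat(3), Nat(5))     # a plain negative int
```

Which of the two behaviours you get is a decision made by whoever declares the operator, not something forced by the argument types.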

Quote from johnwcowan on October 23, 2019, 11:30 am

"Type theory" specifically means the theory of static types, which are attached not to objects but to their representations in code.

Kind of... A type theory is one of various purely mathematical abstractions that formalise types in abstracto. A given computer language may draw upon type theory to a greater or lesser degree to inform the design of its type system, which presumably exists to achieve some desirable level of type safety -- i.e., how different kinds of parts of a language may be allowed to fit together -- which may be static, dynamic, or both.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Hugh on October 23, 2019, 11:19 am
Quote from AntC on October 22, 2019, 9:27 pm
Quote from Hugh on October 22, 2019, 4:46 pm

I'm responding to the original post and possibly repeating material that's been given in the plethora of posts that followed it (which I haven't read yet).

Thank you Hugh. And yes there's been quite a flurry, furthermore sticking to the subject.

Codd certainly thought of types as physical representations and that's why he chose the mathematical term domain.  But he was a bit woolly regarding the exact meaning, especially when he appeared to be confusing the terms "domain" and "attribute", as Chris has noted.

When did the term "attribute" appear (in a RM context)? I see "attribute" does appear a few times in Codd 1970. Amongst the plethora you haven't read yet is a suggestion that Codd wasn't so much "confusing" those terms as consciously equating them (at least in straightforward cases). And that equating is possible because domain does not mean type.

Our claim that TTM types serve exactly the same purpose as Codd's domains is based on a reasonable interpretation of what he was driving at.  In particular, values of different domains were not comparable (so SQL domains are certainly not Codd ones).  Of course in TTM  and the IM we have given a lot more detail than Codd did but I doubt we have added anything that he would have regarded as deviating from his model.  That said, I don't remember him ever giving much of an opinion on TTM.  In fact, he never seemed greatly interested in work done by other people on the RM.  And he kept saying that types are not domains to his dying day.

It's that last sentence (which you've told us before) that concerns me. Did he simply not try to understand 'type' in any of its more modern programming language senses? Or did he understand, but still think his domain meant something different?

There's a great deal of material on Codd's confusing writings re "domain" in Chris's recent book E.F. Codd and Relational Theory.  Equating domain and attribute doesn't work, as Codd himself acknowledged, when two attributes are defined on the same domain/type.  He clearly distinguished between domains and "columns" in his 1992 RM/V2 book.

The trouble with trying to interpret Codd's later writings is I think they got more confused/confusing as time went on.

I can't authoritatively answer your last question but the RM/V2 book has in Chapter 2: "Each domain is declared as an extended data type, not a mere basic data type."  And in Chapter 3: "The DBMS supports calendar dates, clock times, and decimal currency as extended data types ..." followed by a definition headed "RT-3 User-defined Extended Data Types" (the text under this heading doesn't define the term at all, just says they must be supported).

I think I should withdraw my "to his dying day" assertion.  Sorry for not thinking of checking RM/V2 before I repeated it here.  It seems that he did eventually buy into the domain = type equation to a large extent, but his prohibition of the use of "basic data types" as domains is telling, I think.

Yes, seems he didn't consider user-defined types, and therefore what it is to define a type 'from the ground up' as it were.

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It'd be highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across in Codd's later writings whether he expands the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

Quote from AntC on October 23, 2019, 9:36 pm
Quote from Hugh on October 23, 2019, 11:19 am

I can't authoritatively answer your last question but the RM/V2 book has in Chapter 2: "Each domain is declared as an extended data type, not a mere basic data type."  And in Chapter 3: "The DBMS supports calendar dates, clock times, and decimal currency as extended data types ..." followed by a definition headed "RT-3 User-defined Extended Data Types" (the text under this heading doesn't define the term at all, just says they must be supported).

I think I should withdraw my "to his dying day" assertion.  Sorry for not thinking of checking RM/V2 before I repeated it here.  It seems that he did eventually buy into the domain = type equation to a large extent, but his prohibition of the use of "basic data types" as domains is telling, I think.

Yes, seems he didn't consider user-defined types, and therefore what it is to define a type 'from the ground up' as it were.

Unsurprising, perhaps. User-defined types are still viewed with skepticism by some software engineers, and I sometimes see arguments to the effect that a C-like set of strings, numerics (perhaps subdivided into decimal or float and integer) and booleans and some means of composition into records are all we really need. Everything else is gratuitous.

Quote from AntC on October 23, 2019, 9:36 pm

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It'd be highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across in Codd's later writings whether he expands the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

It's an interesting proposition, but I don't think the answer is so certain. By using domains you are in essence defining in advance the name by which the attribute is known in expressions of the RA and which relational attributes are joinable. If you go with INT or MONEY then you are allowing joins on every attribute which has INT or MONEY as its domain, and the attributes are called INT or MONEY. I don't think you want that.

I would expect the design process to involve generating a data dictionary of attributes with literally hundreds or thousands of uniquely named domains. Then a relation is a set of domains. If the same domain is used in two different relations, they are joinable on that domain. If a domain is to be used more than once in a relation then add a qualifying role. So BUY-PRICE and SELL-PRICE and MARGIN and COGS are all money-ish, but they're different domains.
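
A rough sketch of that idea in Python, assuming relations are modelled as lists of dicts whose keys are the domain names. The `natural_join` helper and all the sample data are hypothetical, made up for illustration:

```python
def natural_join(r, s):
    """Join two relations (lists of dicts) on every attribute name they share."""
    joined = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                joined.append({**t, **u})
    return joined


# Role-qualified, uniquely named domains: only P# is shared, so only P# joins.
supplies = [
    {"S#": "S1", "P#": "P1", "BUY-PRICE": 10},
    {"S#": "S2", "P#": "P2", "BUY-PRICE": 7},
]
catalogue = [
    {"P#": "P1", "SELL-PRICE": 15},
    {"P#": "P2", "SELL-PRICE": 9},
]
priced = natural_join(supplies, catalogue)

# Same data with both prices simply called MONEY: the join now also
# compares buy price with sell price, which is not what was intended.
supplies_money = [{"S#": "S1", "P#": "P1", "MONEY": 10}]
catalogue_money = [{"P#": "P1", "MONEY": 15}]
accidental = natural_join(supplies_money, catalogue_money)
```

With distinct domain names every supply picks up its sell price; with the shared MONEY name the join comes back empty because 10 is not equal to 15.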

Please remember: you absolutely do not need any kind of type system in order to use Codd's FO-RA. You only need equality between values of the same domain, or a domain and a literal. All new values and more complex comparisons are handled by SO-RA open expressions, and that depends on a type system. To use that, every domain has to map into a host language type, and that's where your MONEY or INT concepts are needed.

 

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on October 23, 2019, 9:51 pm

Unsurprising, perhaps. User-defined types are still viewed with skepticism by some software engineers, and I sometimes see arguments to the effect that a C-like set of strings, numerics (perhaps subdivided into decimal or float and integer) and booleans and some means of composition into records are all we really need. Everything else is gratuitous.

Depends totally on the size of the code. Under 300 LOC I'll happily use Ruby, Python, JavaScript, even GWBASIC, whatever. Over 300KLOC I need all the help I can get to keep it all together. User types really help with managing complexity.

Don't forget that if you choose domains you don't need any user-defined types in the database. You can map your domains into anything the host language supports. Kind of like ODBC, where SQL currency and dates are mapped into strings for C to use.

Quote from Dave Voorhis on October 23, 2019, 9:51 pm
Quote from AntC on October 23, 2019, 9:36 pm
Quote from Hugh on October 23, 2019, 11:19 am

I can't authoritatively answer your last question but the RM/V2 book has in Chapter 2: "Each domain is declared as an extended data type, not a mere basic data type."  And in Chapter 3: "The DBMS supports calendar dates, clock times, and decimal currency as extended data types ..." followed by a definition headed "RT-3 User-defined Extended Data Types" (the text under this heading doesn't define the term at all, just says they must be supported).

I think I should withdraw my "to his dying day" assertion.  Sorry for not thinking of checking RM/V2 before I repeated it here.  It seems that he did eventually buy into the domain = type equation to a large extent, but his prohibition of the use of "basic data types" as domains is telling, I think.

Yes, seems he didn't consider user-defined types, and therefore what it is to define a type 'from the ground up' as it were.

Unsurprising, perhaps. User-defined types are still viewed with skepticism by some software engineers, and I sometimes see arguments to the effect that a C-like set of strings, numerics (perhaps subdivided into decimal or float and integer) and booleans and some means of composition into records are all we really need. Everything else is gratuitous.

Strings and floats and integers and booleans !!?! Luxury! All you need is binary bitmaps. Octal? We used to dream of octal ... [In a Monty Python Four Yorkshiremen voice, of course]

Composing into records? How do you get that in through the front-panel toggle switches?

Quote from dandl on October 24, 2019, 12:37 am
Quote from AntC on October 23, 2019, 9:36 pm

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It'd be highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across in Codd's later writings whether he expands the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

It's an interesting proposition, but I don't think the answer is so certain. By using domains you are in essence defining in advance the name by which the attribute is known in expressions of the RA and which relational attributes are joinable.

Well no. We still allow a RENAME operation in the algebra -- either to provide a join where the domain names are different, or to avoid a join where the domain names are the same, but they're in a different role wrt some query. (That's what's going on with the part component/assembly example.)
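
A small Python sketch of the "provide a join" case, assuming relations are lists of dicts. The `rename` and `natural_join` helpers, the attribute names and the data are all hypothetical, loosely modelled on the part component/assembly example:

```python
def rename(relation, old, new):
    """Return a copy of the relation with attribute `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in t.items()} for t in relation]


def natural_join(r, s):
    """Join two relations (lists of dicts) on every attribute name they share."""
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in t.keys() & u.keys())
    ]


# The same PART domain appears twice in `structure`, in two different roles.
structure = [
    {"ASSEMBLY": "P1", "COMPONENT": "P2", "QTY": 2},
    {"ASSEMBLY": "P1", "COMPONENT": "P3", "QTY": 4},
]
parts = [
    {"PART": "P2", "PNAME": "bolt"},
    {"PART": "P3", "PNAME": "nut"},
]

# No shared attribute name: the join degenerates to a cartesian product.
unjoined = natural_join(structure, parts)

# Renaming PART to the role we care about makes the join meaningful.
component_names = natural_join(structure, rename(parts, "PART", "COMPONENT"))
```

The "avoid a join" case is the mirror image: rename one side's attribute away from the shared name before joining.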

If you go with INT or MONEY then you are allowing joins on every attribute which has INT or MONEY as its domain, and the attributes are called INT or MONEY. I don't think you want that.

Quite. That's why I think Codd wanted domain names to be distinct from type names. Although it's not clear from the 1970 paper whether he thought of types as having names that appear within the data sublanguage: perhaps "calendar dates, clock times, and decimal currency" are merely the names by which he refers to pools-of-values when he's talking about the sublanguage.

 

Quote from AntC on October 23, 2019, 9:36 pm
Quote from Hugh on October 23, 2019, 11:19 am
Quote from AntC on October 22, 2019, 9:27 pm
Quote from Hugh on October 22, 2019, 4:46 pm

I'm responding to the original post and possibly repeating material that's been given in the plethora of posts that followed it (which I haven't read yet).

Thank you Hugh. And yes there's been quite a flurry, furthermore sticking to the subject.

Codd certainly thought of types as physical representations and that's why he chose the mathematical term domain.  But he was a bit woolly regarding the exact meaning, especially when he appeared to be confusing the terms "domain" and "attribute", as Chris has noted.

 

Did he simply not try to understand 'type' in any of its more modern programming language senses? Or did he understand, but still think his domain meant something different?

There's a great deal of material on Codd's confusing writings re "domain" in Chris's recent book E.F. Codd and Relational Theory.  Equating domain and attribute doesn't work, as Codd himself acknowledged, when two attributes are defined on the same domain/type.  He clearly distinguished between domains and "columns" in his 1992 RM/V2 book.

The trouble with trying to interpret Codd's later writings is I think they got more confused/confusing as time went on.

I can't authoritatively answer your last question but the RM/V2 book has in Chapter 2: "Each domain is declared as an extended data type, not a mere basic data type."  And in Chapter 3: "The DBMS supports calendar dates, clock times, and decimal currency as extended data types ..." followed by a definition headed "RT-3 User-defined Extended Data Types" (the text under this heading doesn't define the term at all, just says they must be supported).

I think I should withdraw my "to his dying day" assertion.  Sorry for not thinking of checking RM/V2 before I repeated it here.  It seems that he did eventually buy into the domain = type equation to a large extent, but his prohibition of the use of "basic data types" as domains is telling, I think.

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It'd be highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across in Codd's later writings whether he expands the "role-qualified domain" idea.

Aha! To answer my own question, from RM/V2 section 6.1 Basic Naming Features there's "A guideline for naming that tends to make programs easier to read ... two simple rules:" (but "guideline"/"rules" isn't a formal part of the model)

  1. If one considers the name of all domains, all relations, and all functions as a single collection of names, then in that collection every name is distinct from every other name.
  2. Every column name is a combination of a role name and a domain name, where the role name designates in brief the purpose of the column's use of the specified domain.
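
As a sketch only, the two rules could be checked mechanically along these lines in Python. The helper names, the underscore separator, the role names and the sample schema are all assumptions made for illustration; RM/V2 as quoted here does not prescribe any of them:

```python
def column_name(role, domain):
    """Rule 2: a column name combines a role name with a domain name."""
    return f"{role}_{domain}"  # the separator is an arbitrary choice


def all_distinct(names):
    """Rule 1: every name in the collection differs from every other."""
    return len(set(names)) == len(names)


# Hypothetical schema in the spirit of the 1970 component example.
domains = ["PART", "QUANTITY"]
relations = ["COMPONENT"]
functions = []

columns = [
    column_name("SUB", "PART"),
    column_name("SUPER", "PART"),
    column_name("USED", "QUANTITY"),
]

rule_1_holds = all_distinct(domains + relations + functions)
columns_distinct = all_distinct(columns)
```

The two PART columns stay distinguishable only because of the role prefix, which is the same job the "role-qualified domain" did in 1970.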

Hmm hmm. Dave has already pointed out the dangers of reading too much wisdom/foresight into works subsequently taken to be seminal. I've just said works as late as 1992 are well past Codd's seminal best. So I'm concluding 1970's "role-qualified domain" idea petered out.
