The Forum for Discussion about The Third Manifesto and Related Matters


Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

Quote from dandl on October 24, 2019, 12:37 am
Quote from AntC on October 23, 2019, 9:36 pm

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It's highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across anything in Codd's later writings about whether he expanded the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

It's an interesting proposition, but I don't think the answer is so certain. By using domains you are in essence defining in advance the name by which the attribute is known in expressions of the RA and which relational attributes are joinable. If you go with INT or MONEY then you are allowing joins on every attribute which has INT or MONEY as its domain, and the attributes are called INT or MONEY. I don't think you want that.

I would expect the design process to involve generating a data dictionary of attributes with literally hundreds or thousands of uniquely named domains. Then a relation is a set of domains. If the same domain is used in two different relations, they are joinable on that domain. If a domain is to be used more than once in a relation then add a qualifying role. So BUY-PRICE and SELL-PRICE and MARGIN and COGS are all money-ish, but they're different domains.

Please remember: you absolutely do not need any kind of type system in order to use Codd's FO-RA. You only need equality between values of the same domain, or between a domain and a literal.

That sounds like a type system. If the system prevents comparisons of certain attributes because they belong to different domains, then "domain" is a synonym for a kind of type -- in some cases suggesting an enumeration or range type, and sometimes suggesting simple nominal typing -- and the system preventing those comparisons is demonstrating type safety.

I presume your literals are of exactly one kind, string?

If there's more than one kind of literal, then there's typefulness there, too.
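To make the exchange above concrete, here is a minimal Python sketch of uniquely named, optionally role-qualified domains with equality-only comparisons. Python is only a stand-in, and every name (`Domain`, `Attr`, `check_comparable`, `check_literal`, `natural_join`) is hypothetical. It shows that even an equality-only model ends up carrying per-domain checks: relations join on the attributes they share, `BUY-PRICE` and `SELL-PRICE` cannot be compared, and a literal must match the kind nominated for its domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A uniquely named domain; literal_kind is the (assumed) kind of literal it accepts."""
    name: str
    literal_kind: type = str


@dataclass(frozen=True)
class Attr:
    """An attribute is a domain, optionally qualified by a role (e.g. ASSEMBLY, COMPONENT)."""
    domain: Domain
    role: str = ""

    @property
    def name(self) -> str:
        return f"{self.role}-{self.domain.name}" if self.role else self.domain.name


def check_comparable(a: Attr, b: Attr) -> None:
    # Equality is only defined between values of the same domain...
    if a.domain != b.domain:
        raise TypeError(f"cannot compare {a.name} with {b.name}")


def check_literal(a: Attr, literal: object) -> None:
    # ...or between a domain and a literal of the kind nominated for it.
    if not isinstance(literal, a.domain.literal_kind):
        raise TypeError(f"{literal!r} is not a valid literal for {a.name}")


def natural_join(r: list, r_heading: set, s: list, s_heading: set) -> list:
    # Relations are joinable on exactly the attributes (domain + role) they share.
    common = [a.name for a in r_heading & s_heading]
    return [{**t, **u} for t in r for u in s if all(t[c] == u[c] for c in common)]


PART = Domain("PART")
BUY_PRICE = Domain("BUY-PRICE", float)
SELL_PRICE = Domain("SELL-PRICE", float)
QTY = Domain("QTY", int)

prices_h = {Attr(PART), Attr(BUY_PRICE), Attr(SELL_PRICE)}
stock_h = {Attr(PART), Attr(QTY)}
prices = [{"PART": "P1", "BUY-PRICE": 2.0, "SELL-PRICE": 3.5}]
stock = [{"PART": "P1", "QTY": 10}, {"PART": "P2", "QTY": 4}]
```

Here `Attr(PART, "ASSEMBLY")` and `Attr(PART, "COMPONENT")` are comparable because they share a domain, while the two "money-ish" price domains are not.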

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on October 24, 2019, 5:56 am

Hmm hmm. Dave has already pointed out the dangers of reading too much wisdom/foresight into works subsequently taken to be seminal. I've just said works as late as 1992 are well past Codd's seminal best. So I'm concluding 1970's "role-qualified domain" idea petered out.

Indeed. My view is always that there is no "wisdom of the ancients", and Codd is no exception. There is no hidden wisdom to tease out, and it's a waste of time looking.

However, I do see domains as an interesting idea that may have petered out before being fully explored. I don't look to Codd for answers, but it remains an interesting question. Did TTM take a wrong turn by imposing a full-blown, state-of-the-art type system directly onto the RM? SQL uses a restricted type system and has been more successful, so is that a hint? Is there an interesting direction arising out of the idea of 'domain' viewed as a truly minimal type system that might lead to something better than either?

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on October 24, 2019, 6:45 am
Quote from dandl on October 24, 2019, 12:37 am
Quote from AntC on October 23, 2019, 9:36 pm

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It's highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across anything in Codd's later writings about whether he expanded the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

It's an interesting proposition, but I don't think the answer is so certain. By using domains you are in essence defining in advance the name by which the attribute is known in expressions of the RA and which relational attributes are joinable. If you go with INT or MONEY then you are allowing joins on every attribute which has INT or MONEY as its domain, and the attributes are called INT or MONEY. I don't think you want that.

I would expect the design process to involve generating a data dictionary of attributes with literally hundreds or thousands of uniquely named domains. Then a relation is a set of domains. If the same domain is used in two different relations, they are joinable on that domain. If a domain is to be used more than once in a relation then add a qualifying role. So BUY-PRICE and SELL-PRICE and MARGIN and COGS are all money-ish, but they're different domains.

Please remember: you absolutely do not need any kind of type system in order to use Codd's FO-RA. You only need equality between values of the same domain, or between a domain and a literal.

That sounds like a type system. If the system prevents comparisons of certain attributes because they belong to different domains, then "domain" is a synonym for a kind of type -- in some cases suggesting an enumeration or range type, and sometimes suggesting simple nominal typing -- and the system preventing those comparisons is demonstrating type safety.

I presume your literals are of exactly one kind, string?

If there's more than one kind of literal, then there's typefulness there, too.

Ok, if you spread the net wide enough just about anything could be a type system. It's just not the kind of type system used in any programming language, nor is one needed. It's not what you'll find in Wikipedia or any other reference I can find.

I'm interested in what you get if you go for a data sub-language (ambiguously, a DSL), leaving out everything except early Codd, but with a separate host language (of your choice) and open expressions in that language. I wrote:

A DSL does not need a type system of the kind commonly found in computer languages, because it has no operations on values other than the RA itself. A DSL can:

  • Define named domains
  • Nominate a literal type (string, number, date) for use with that domain (optional)
  • Define named relations as a set of domains, optionally with roles to resolve ambiguities
  • Define named queries in the RA as per Codd
  • Define named updates to relations as a kind of query (relational assignment)
  • Define a host language attribute name and type for each domain/role
  • Provide a means for the host language to execute a query by name (such as a function call)
  • Optionally replace any literal by a parameter passed in from the host language
  • Optionally replace any literal by an open expression to be evaluated in the host language
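A few of the capabilities listed above (named domains with a nominated literal type, relations as sets of domains, relational assignment, named queries executed by name from the host with parameters replacing literals) can be sketched in-memory in Python. This is a hypothetical illustration, not Andl's or any real DSL's API; `DataSublanguage` and its methods are invented names, and host-side attribute mapping and open expressions are not covered.

```python
from typing import Callable


class DataSublanguage:
    """Hypothetical, in-memory stand-in for the DSL outlined above."""

    def __init__(self) -> None:
        self.domains: dict[str, type] = {}             # domain name -> nominated literal type
        self.headings: dict[str, dict[str, str]] = {}  # relation -> {attribute: domain}
        self.bodies: dict[str, list[dict]] = {}
        self.queries: dict[str, Callable[..., list[dict]]] = {}

    def define_domain(self, name: str, literal_type: type = str) -> None:
        self.domains[name] = literal_type

    def define_relation(self, name: str, heading: dict[str, str]) -> None:
        # heading maps attribute names (a domain, or a role-qualified domain) to domains
        missing = set(heading.values()) - set(self.domains)
        if missing:
            raise KeyError(f"undefined domains: {sorted(missing)}")
        self.headings[name] = heading
        self.bodies[name] = []

    def _check(self, domain: str, value: object) -> None:
        if not isinstance(value, self.domains[domain]):
            raise TypeError(f"{value!r} is not a valid {domain} literal")

    def assign(self, name: str, tuples: list[dict]) -> None:
        # Relational assignment: replace the whole body, keeping set semantics.
        body: list[dict] = []
        for t in tuples:
            for attr, dom in self.headings[name].items():
                self._check(dom, t[attr])
            if t not in body:
                body.append(t)
        self.bodies[name] = body

    def restrict_eq(self, rel: str, attr: str, value: object) -> list[dict]:
        # A literal, or a parameter passed in from the host, is checked against the domain.
        self._check(self.headings[rel][attr], value)
        return [t for t in self.bodies[rel] if t[attr] == value]

    def define_query(self, name: str, body: Callable[..., list[dict]]) -> None:
        self.queries[name] = body

    def run(self, name: str, **params: object) -> list[dict]:
        # The host language executes a query by name, like a function call.
        return self.queries[name](self, **params)


db = DataSublanguage()
db.define_domain("SNO")
db.define_domain("CITY")
db.define_relation("S", {"SNO": "SNO", "CITY": "CITY"})
db.assign("S", [{"SNO": "S1", "CITY": "London"}, {"SNO": "S2", "CITY": "Paris"}])
db.define_query("suppliers_in", lambda d, city: d.restrict_eq("S", "CITY", city))
```

From the host side, `db.run("suppliers_in", city="Paris")` returns the Paris supplier, and passing a non-string city is rejected by the domain check.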
Andl - A New Database Language - andl.org
Quote from dandl on October 24, 2019, 7:39 am
Quote from AntC on October 24, 2019, 5:56 am

Hmm hmm. Dave has already pointed out the dangers of reading too much wisdom/foresight into works subsequently taken to be seminal. I've just said works as late as 1992 are well past Codd's seminal best. So I'm concluding 1970's "role-qualified domain" idea petered out.

Indeed. My view is always that there is no "wisdom of the ancients", and Codd is no exception. There is no hidden wisdom to tease out, and it's a waste of time looking.

However, I do see domains as an interesting idea that may have petered out before being fully explored. I don't look to Codd for answers, but it remains an interesting question. Did TTM take a wrong turn by imposing a full-blown, state-of-the-art type system directly onto the RM? SQL uses a restricted type system and has been more successful, so is that a hint? Is there an interesting direction arising out of the idea of 'domain' viewed as a truly minimal type system that might lead to something better than either?

It's unlikely that TTM's relative lack of uptake is because TTM advocated a more sophisticated type system than SQL, because the IM is optional. SQL's pervasiveness is entirely due to being in the right place at the right time. At the time SQL use exploded, there simply wasn't a load-and-go alternative to Oracle and DB2, and later MS SQL Server, MySQL, PostgreSQL and many others. By the time MySQL was the typical choice for a new dynamic Web site -- which was about the time, give or take, that the first TTM book was published -- SQL was already completely entrenched and dominant.

TTM (and the IM in particular) was a reaction to the growing popularity of object oriented database systems at a time when "object oriented" was primarily recognised -- for better or worse -- by inheritance and substitutability/polymorphism via inheritance. Thus, a relational alternative either had to tackle inheritance by embracing it, or by providing an equivalent mechanism. The TTM authors chose the former. Without it, arguably TTM would have been completely ignored and forgotten by now.

A "New TTM" devised today would similarly want to provide type-system facilities equivalent to -- or embracing -- those typical of popular programming languages and reflective of industry trends in those languages' type systems. I.e., substitutability/polymorphism via generics, maybe algebraic (e.g., sum & product) types, etc.

That said, domains could be as simple as nominal typing via a type alias, e.g., assuming Tutorial D and given an existing type like INT, the declaration

   DOMAIN MyType INT

states that MyType provides the interface of an INT but is not type compatible with INT or any other DOMAIN <name> INT unless explicitly cast to an INT.

E.g., given:

DOMAIN MyType INT;
DOMAIN AnotherType INT;
VAR MyVar REAL RELATION {x INT, y MyType, z MyType, q AnotherType} KEY {x};

EXTEND MyVar: {p := x + y} would fail with a domain (type) mismatch error.

EXTEND MyVar: {p := x + CAST_AS_INT(y)} would be valid.

EXTEND MyVar: {p := y + z} would be valid.

EXTEND MyVar: {p := q + z} would fail with a domain (type) mismatch error.

This could be quite trivially added to Tutorial D (and thus Rel) and would work either with or without the IM.
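The four EXTEND cases above can be mimicked with a small Python sketch. Python checks at run time, whereas Tutorial D/Rel would presumably reject the bad cases at compile time; `DomainValue`, `add`, `extend` and `cast_as_int` are hypothetical names, with `cast_as_int` standing in for the proposed CAST_AS_INT (not an existing Rel operator). The post doesn't say what type `y + z` yields; the sketch assumes it stays in MyType.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainValue:
    """A value of a DOMAIN declared over INT, e.g. DOMAIN MyType INT."""
    domain: str
    value: int


def domain_of(v: object) -> str:
    return v.domain if isinstance(v, DomainValue) else "INT"


def raw(v: object) -> int:
    return v.value if isinstance(v, DomainValue) else v


def add(a: object, b: object) -> object:
    # Operands must belong to the same domain; plain INT counts as its own domain here.
    dom_a, dom_b = domain_of(a), domain_of(b)
    if dom_a != dom_b:
        raise TypeError(f"domain (type) mismatch: {dom_a} + {dom_b}")
    total = raw(a) + raw(b)
    # Assumption: the result stays in the operands' domain.
    return total if dom_a == "INT" else DomainValue(dom_a, total)


def cast_as_int(v: DomainValue) -> int:
    # Explicit escape hatch, standing in for the proposed CAST_AS_INT.
    return v.value


def extend(body: list, name: str, fn) -> list:
    # EXTEND r: {name := fn(t)}
    return [{**t, name: fn(t)} for t in body]


MyVar = [{"x": 1, "y": DomainValue("MyType", 2), "z": DomainValue("MyType", 3),
          "q": DomainValue("AnotherType", 4)}]
```

With this, `x + CAST_AS_INT(y)` and `y + z` succeed, while `x + y` and `q + z` raise a mismatch error, matching the four cases listed above.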

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on October 24, 2019, 7:47 am
Quote from Dave Voorhis on October 24, 2019, 6:45 am
Quote from dandl on October 24, 2019, 12:37 am
Quote from AntC on October 23, 2019, 9:36 pm

One possible reason for banning "basic data types" as domains is that domains must be unique within a relation schema. It's highly likely there's more than one INT or MONEY or DATE in a schema (especially if we include the whole database schema). So the 1970 paper names the domain. As you say, and with the 1970 part component/assembly example, even naming the domain might not be unique. I haven't come across anything in Codd's later writings about whether he expanded the "role-qualified domain" idea. (But then I don't examine them closely: it's too depressing/the decline of a fine mind.)

It's an interesting proposition, but I don't think the answer is so certain. By using domains you are in essence defining in advance the name by which the attribute is known in expressions of the RA and which relational attributes are joinable. If you go with INT or MONEY then you are allowing joins on every attribute which has INT or MONEY as its domain, and the attributes are called INT or MONEY. I don't think you want that.

I would expect the design process to involve generating a data dictionary of attributes with literally hundreds or thousands of uniquely named domains. Then a relation is a set of domains. If the same domain is used in two different relations, they are joinable on that domain. If a domain is to be used more than once in a relation then add a qualifying role. So BUY-PRICE and SELL-PRICE and MARGIN and COGS are all money-ish, but they're different domains.

Please remember: you absolutely do not need any kind of type system in order to use Codd's FO-RA. You only need equality between values of the same domain, or between a domain and a literal.

That sounds like a type system. If the system prevents comparisons of certain attributes because they belong to different domains, then "domain" is a synonym for a kind of type -- in some cases suggesting an enumeration or range type, and sometimes suggesting simple nominal typing -- and the system preventing those comparisons is demonstrating type safety.

I presume your literals are of exactly one kind, string?

If there's more than one kind of literal, then there's typefulness there, too.

Ok, if you spread the net wide enough just about anything could be a type system. It's just not the kind of type system used in any programming language, nor is one needed. It's not what you'll find in Wikipedia or any other reference I can find.

Any time a given part of a program can have different kinds, and the kinds -- or types -- are used to determine (either by the compiler, or the human using the language) how parts of various kinds can or can't be used together (type safety) or their semantics (dispatch), it is typeful.

That accords rather well with the (rather poorly-written) Wikipedia reference for "type system" at https://en.wikipedia.org/wiki/Type_system.

It could include things some wouldn't consider typeful -- like XML DTDs (even though it means Document Type Definition) -- but it also excludes things that are not typeful. For example, a typical imperative programming language will have different kinds of loops, but as long as they're structural there will be no type safety associated with them -- there's no notion of being prevented from using this kind of loop here but not that kind of loop there despite the syntax being correct -- nor do we dispatch differently depending on the kind of loop. Indeed, difference in dispatch doesn't even apply.
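The two criteria above, type safety and dispatch, can both be seen in a few lines of Python using the standard `functools.singledispatch`. The function `render` is a hypothetical example, and Python performs both checks at run time rather than at compile time.

```python
from functools import singledispatch


@singledispatch
def render(v: object) -> str:
    # Type safety: kinds with no registered implementation are rejected.
    raise TypeError(f"render() is not defined for {type(v).__name__}")


@render.register
def _(v: int) -> str:
    # Dispatch: the kind of the argument selects the implementation.
    return f"INT {v}"


@render.register
def _(v: str) -> str:
    return f"CHAR {v!r}"
```

`render(3)` gives `INT 3`, `render("a")` gives `CHAR 'a'`, and `render(1.5)` is refused. There is no comparable "kind" to dispatch on or reject for a structural while-loop versus a for-loop.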

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

"DOMAIN MyType INT"

That's the same thing that TTM (even sans IM) has already in the form

TYPE MyType POSSREP X {V INT};

"Cast to INT" is then invoking THE_V. Everything will turn out to have a counterpart. In other words, this DOMAIN thing as presented here is nothing but syntactic sugar.

Complaints about the verbosity of selectors, as in, e.g., "MYTYPE(X(V(3)))", are ill-inspired: relating the appearance of an expression to elements of its context (e.g., the type of an assigned-to variable) can allow omission of the MYTYPE() part, leaving "X(V(3))". Inspecting the number of possreps defined for the type can allow omission of the X() part, leaving "V(3)". Inspecting the number of components within the possrep can allow omission of the V() part, leaving "3". Nothing in TTM prohibits any of those, except possibly rule 26 (I wouldn't judge).
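The elision argument above can be sketched in Python. `MyType`, `X` and `THE_V` mirror the Tutorial D names in the post; the `POSSREPS` registry and `select_from_context` are hypothetical. The sketch accepts a bare `3` only when the expected type is known from context and has exactly one possrep with exactly one component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MyType:
    # Stand-in for: TYPE MyType POSSREP X {V INT};
    v: int


def X(V: int) -> MyType:
    """Selector for possrep X; in Tutorial D the selector takes the possrep's name."""
    if not isinstance(V, int):
        raise TypeError("component V must be an INT")
    return MyType(V)


def THE_V(m: MyType) -> int:
    """THE_V extracts component V, which is all "cast to INT" needs to be."""
    return m.v


# Hypothetical metadata: type -> {possrep name: (component names, selector)}
POSSREPS = {MyType: {"X": (("V",), X)}}


def select_from_context(expected: type, value: object) -> object:
    """Sketch of selector elision when the target type is known from context."""
    if isinstance(value, expected):
        return value
    reps = POSSREPS[expected]
    if len(reps) != 1:          # more than one possrep: X() can't be omitted
        raise TypeError("ambiguous possrep; an explicit selector is needed")
    ((components, selector),) = reps.values()
    if len(components) != 1:    # more than one component: V() can't be omitted
        raise TypeError("more than one component; an explicit selector is needed")
    return selector(value)
```

So `select_from_context(MyType, 3)` yields the same value as `X(V=3)`, and `THE_V` recovers the `3`.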

 

Quote from Erwin on October 24, 2019, 11:32 am

"DOMAIN MyType INT"

That's the same thing that TTM (even sans IM) has already in the form

TYPE MyType POSSREP X {V INT};

"Cast to INT" is then invoking THE_V. Everything will turn out to have a counterpart. In other words, this DOMAIN thing as presented here is nothing but syntactic sugar.

Yes, syntactic sugar is precisely the intent. If I implement it in Rel, it would internally be nothing more than wrapping a declaration of TYPE MyType POSSREP etc., plus selector invocation shorthands for types declared via DOMAIN. I've reposted the idea in the Tutorial D subforum, where I obliquely allude to its shorthand-ness, and I solicit further comments there, since it's veering off-topic here.
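As a rough illustration of "nothing more than wrapping a declaration", a purely textual desugaring can be sketched in Python. It is not how Rel would implement it: the component name `V` is a hypothetical choice, the output assumes a plain single-possrep TYPE declaration like the one in Erwin's post is sufficient, and the selector shorthands are not handled.

```python
import re

# Matches a whole-line declaration such as "DOMAIN MyType INT;".
DOMAIN_DECL = re.compile(r"^([ \t]*)DOMAIN[ \t]+(\w+)[ \t]+(\w+)[ \t]*;", re.MULTILINE)


def desugar(source: str) -> str:
    """Rewrite each DOMAIN declaration as a single-possrep TYPE declaration."""
    return DOMAIN_DECL.sub(
        lambda m: f"{m.group(1)}TYPE {m.group(2)} POSSREP {{V {m.group(3)}}};", source
    )
```

For example, `DOMAIN MyType INT;` becomes `TYPE MyType POSSREP {V INT};`, and other statements pass through unchanged.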

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on October 24, 2019, 8:30 am
Quote from dandl on October 24, 2019, 7:39 am

Indeed. My view is always that there is no "wisdom of the ancients", and Codd is no exception. There is no hidden wisdom to tease out, and it's a waste of time looking.

However, I do see domains as an interesting idea that may have petered out before being fully explored. I don't look to Codd for answers, but it remains an interesting question. Did TTM take a wrong turn by imposing a full-blown, state-of-the-art type system directly onto the RM? SQL uses a restricted type system and has been more successful, so is that a hint? Is there an interesting direction arising out of the idea of 'domain' viewed as a truly minimal type system that might lead to something better than either?

It's unlikely that TTM's relative lack of uptake is because TTM advocated a more sophisticated type system than SQL, because the IM is optional. SQL's pervasiveness is entirely due to being in the right place at the right time. At the time SQL use exploded, there simply wasn't a load-and-go alternative to Oracle and DB2, and later MS SQL Server, MySQL, PostgreSQL and many others. By the time MySQL was the typical choice for a new dynamic Web site -- which was about the time, give or take, that the first TTM book was published -- SQL was already completely entrenched and dominant.

TTM (and the IM in particular) was a reaction to the growing popularity of object oriented database systems at a time when "object oriented" was primarily recognised -- for better or worse -- by inheritance and substitutability/polymorphism via inheritance. Thus, a relational alternative either had to tackle inheritance by embracing it, or by providing an equivalent mechanism. The TTM authors chose the former. Without it, arguably TTM would have been completely ignored and forgotten by now.

A "New TTM" devised today would similarly want to provide type-system facilities equivalent to -- or embracing -- those typical of popular programming languages and reflective of industry trends in those languages' type systems. I.e., substitutability/polymorphism via generics, maybe algebraic (e.g., sum & product) types, etc.

No, you really are misunderstanding. I already know the history and I won't argue with this take on it, but it's not what I'm suggesting. Developers are quite happy writing code in Java, C#, JavaScript or whatever, and they don't want to consider a bizarre new language with a weird type system. However, getting access to the data is painful, whether it's via embedded SQL, ODBC, ORMs or whatever. Part of the attraction of NoSQL is writing all the code in one language (no SQL).

Codd proposed a DSL, embedded in a host language, and those systems were the only kind of SQL up to the 1990s. Then Microsoft and others created SQLC which became ODBC and CLI, the vendors added stored procedures and the rest is history. D&D tried to replace SQL by a grander language and the punters said no thanks. Nobody AFAICT has ever taken another look at the DSL idea.

The proposal I've been outlining amounts to a new DSL, but one designed from the outset to fit smoothly into any modern programming language. It avoids the perils of the ORM by directly using the host type system. It has open expressions and a couple of other things like RVAs that are really hard in SQL. The application code is entirely client-side where it can use conventional source code management tools and test suites.

Full implementation would be quite challenging, but it degrades smoothly to SQL. In some ways it overlaps with LINQ, but they're really quite different.
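One possible reading of "degrades smoothly to SQL" is sketched below in Python, with the standard `sqlite3` module as a convenient SQL engine. The names `Eq`, `Predicate` and `restrict` are hypothetical, and this is not a description of the proposal's actual design: a simple equality against a literal or host parameter is pushed down as SQL with a bound parameter, while an open expression (a host-language function) is evaluated client-side.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Eq:
    """attribute = literal-or-parameter: simple enough to push down to SQL."""
    attr: str
    value: object


# Anything else is an open expression, evaluated in the host language.
Predicate = Union[Eq, Callable[[dict], bool]]


def restrict(conn: sqlite3.Connection, table: str, columns: list, pred: Predicate) -> list:
    # Identifiers are interpolated for brevity; a real tool would validate them.
    cols = ", ".join(columns)
    base = f"SELECT DISTINCT {cols} FROM {table}"
    if isinstance(pred, Eq):
        # Degrade to plain SQL, with the value passed as a bound parameter.
        rows = conn.execute(f"{base} WHERE {pred.attr} = ? ORDER BY {cols}", (pred.value,))
        return [dict(zip(columns, r)) for r in rows]
    # Open expression: fetch the projection, then filter client-side.
    rows = conn.execute(f"{base} ORDER BY {cols}")
    return [t for t in (dict(zip(columns, r)) for r in rows) if pred(t)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (sno TEXT PRIMARY KEY, city TEXT, status INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?, ?)",
                 [("S1", "London", 20), ("S2", "Paris", 10), ("S3", "Paris", 30)])
```

`restrict(conn, "S", ["sno", "city"], Eq("city", "Paris"))` runs entirely in SQL, whereas a lambda such as `lambda t: t["status"] % 20 == 10` is applied in Python after the fetch.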

I have no reason to believe that in practice it would fare any better than TTM, but the barrier to adoption is way lower than TD/Rel, so it might just find a niche.

 

Andl - A New Database Language - andl.org
Quote from dandl on October 24, 2019, 1:15 pm
Quote from Dave Voorhis on October 24, 2019, 8:30 am
Quote from dandl on October 24, 2019, 7:39 am

Indeed. My view is always that there is no "wisdom of the ancients", and Codd is no exception. There is no hidden wisdom to tease out, and it's a waste of time looking.

However, I do see domains as an interesting idea that may have petered out before being fully explored. I don't look to Codd for answers, but it remains an interesting question. Did TTM take a wrong turn by imposing a full-blown, state-of-the-art type system directly onto the RM? SQL uses a restricted type system and has been more successful, so is that a hint? Is there an interesting direction arising out of the idea of 'domain' viewed as a truly minimal type system that might lead to something better than either?

It's unlikely that TTM's relative lack of uptake is because TTM advocated a more sophisticated type system than SQL, because the IM is optional. SQL's pervasiveness is entirely due to being in the right place at the right time. At the time SQL use exploded, there simply wasn't a load-and-go alternative to Oracle and DB2, and later MS SQL Server, MySQL, PostgreSQL and many others. By the time MySQL was the typical choice for a new dynamic Web site -- which was about the time, give or take, that the first TTM book was published -- SQL was already completely entrenched and dominant.

TTM (and the IM in particular) was a reaction to the growing popularity of object oriented database systems at a time when "object oriented" was primarily recognised -- for better or worse -- by inheritance and substitutability/polymorphism via inheritance. Thus, a relational alternative either had to tackle inheritance by embracing it, or by providing an equivalent mechanism. The TTM authors chose the former. Without it, arguably TTM would have been completely ignored and forgotten by now.

A "New TTM" devised today would similarly want to provide type-system facilities equivalent to -- or embracing -- those typical of popular programming languages and reflective of industry trends in those languages' type systems. I.e., substitutability/polymorphism via generics, maybe algebraic (e.g., sum & product) types, etc.

No, you really are misunderstanding. I already know the history and I won't argue with this take on it, but it's not what I'm suggesting. Developers are quite happy writing code in Java, C#, JavaScript or whatever, and they don't want to consider a bizarre new language with a weird type system. However, getting access to the data is painful, whether it's via embedded SQL, ODBC, ORMs or whatever. Part of the attraction of NoSQL is writing all the code in one language (no SQL).

Codd proposed a DSL, embedded in a host language, and those systems were the only kind of SQL up to the 1990s. Then Microsoft and others created SQLC which became ODBC and CLI, the vendors added stored procedures and the rest is history. D&D tried to replace SQL by a grander language and the punters said no thanks. Nobody AFAICT has ever taken another look at the DSL idea.

The proposal I've been outlining amounts to a new DSL, but one designed from the outset to fit smoothly into any modern programming language. It avoids the perils of the ORM by directly using the host type system. It has open expressions and a couple of other things like RVAs that are really hard in SQL. The application code is entirely client-side where it can use conventional source code management tools and test suites.

Full implementation would be quite challenging, but it degrades smoothly to SQL. In some ways it overlaps with LINQ, but they're really quite different.

I have no reason to believe that in practice it would fare any better than TTM, but the barrier to adoption is way lower than TD/Rel, so it might just find a niche.

I don't think I'm misunderstanding, at least not in terms of domains and the type system. Your proposal sounds like fitting a new programming language into an existing programming language, which doesn't require a new programming language -- just code generation and tooling for an existing one.

Which, as it turns out, is exactly what I'm doing with my datasheet tool. It's becoming something that works rather like Rel for manipulating data and data sources, but instead of writing code in Tutorial D, you write code in Java.

And instead of the administrative front-end generating Tutorial D code, it's generating and compiling Java.

I fully appreciate that this, unfortunately, dispenses with some very nice D notions -- like tuples and relations having mixed structural/nominal typing; e.g., TUPLE {x INT, y CHAR} is the same type as TUPLE {y CHAR, x INT} -- leaving only the usual Java nominal typing. E.g., class P {int x; String y;} is a different type from class Q {int x; String y;} because P is not Q.

But it turns out that isn't really a show-stopper, and on the client-side, mapping tuples to class instances works -- despite P and Q being different types. Though mediating this is very much a work-in-progress; there may still be better ways to do it. When combined with Java streams -- plus the entirety of the Java ecosystem, plus cross-platform capability including Web -- you get effective blending of client-side and server-side capability without having to embrace a whole new language and possibly a new paradigm.
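The structural/nominal contrast can be sketched in Python, whose dataclasses, like Java classes, are nominally typed. The helpers `heading` and `to_instance` are hypothetical and are not the datasheet tool's mapping: `heading` gives a D-style structural view (a set of attribute/type pairs, so attribute order is irrelevant), and `to_instance` maps a tuple onto whichever nominal class the client asks for.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class P:
    x: int
    y: str


@dataclass(frozen=True)
class Q:
    x: int
    y: str


def heading(cls: type) -> frozenset:
    """D-style structural view: a tuple type is its set of (attribute, type) pairs."""
    return frozenset((f.name, f.type) for f in fields(cls))


def to_instance(cls: type, tup: dict):
    """Map a tuple (attribute name -> value, in any order) onto a nominal class."""
    expected = {f.name for f in fields(cls)}
    if set(tup) != expected:
        raise TypeError(f"heading {sorted(tup)} does not match {cls.__name__}")
    return cls(**tup)
```

`heading(P) == heading(Q)` holds, yet `P(1, "a") != Q(1, "a")`; the same tuple `{"y": "a", "x": 1}` can still be mapped onto either class.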

Though I admit the result is looking a lot closer to being a better ORM (which I guess it is, among other things) than a new D (which it definitely isn't).

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on October 24, 2019, 9:36 am

It could include things some wouldn't consider typeful -- like XML DTDs (even though it means Document Type Definition)

Despite WP, it is "document type declaration", the <!DOCTYPE...> part of a document, including any external file incorporated, which is what most people think of as "a DTD".  It does indeed declare the type of — that is, the constraints on — the document.

For example, a typical imperative programming language will have different kinds of loops, but as long as they're structural there will be no type safety associated with them -- there's no notion of being prevented from using this kind of loop here but not that kind of loop there despite the syntax being correct -- nor do we dispatch differently depending on the kind of loop. Indeed, difference in dispatch doesn't even apply.

I've thought about this a bit in the past. A language might have a procedure type label, primitive_recursive, which means that the compiler searches the potential dynamic extent of the procedure (all the procedures it might call) and makes sure that there are no while-loops in them, only for-loops. Then the different loops really do have types.

Such a language needs to be first-order, in the sense that there are no procedure-valued variables, arguments, or procedures.
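The check described above can be sketched in Python as reachability over a call graph. `Proc`, `dynamic_extent` and `check` are hypothetical names. A static call graph is only trustworthy because of the first-order restriction just mentioned. As described, the sketch looks only for while-loops; it does not reject recursion, which a real check would presumably also need to rule out.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    name: str
    loops: set = field(default_factory=set)   # kinds of loop in the body, e.g. {"for"}
    calls: set = field(default_factory=set)   # names of procedures it may call
    primitive_recursive: bool = False         # the hypothetical type label


def dynamic_extent(procs: dict, start: str) -> set:
    """All procedures reachable from start via the static call graph."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(procs[name].calls)
    return seen


def check(procs: dict) -> list:
    """Names of procedures labelled primitive_recursive whose extent contains a while-loop."""
    bad = []
    for p in procs.values():
        extent = dynamic_extent(procs, p.name)
        if p.primitive_recursive and any("while" in procs[n].loops for n in extent):
            bad.append(p.name)
    return sorted(bad)
```

A labelled procedure that only reaches for-loops passes, while one that calls (directly or indirectly) a procedure containing a while-loop is flagged.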
