Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

#71 · October 27, 2019, 12:54 am

Quote from dandl on October 27, 2019, 12:54 am

Quote from johnwcowan on October 26, 2019, 7:19 pm

Quote from dandl on October 26, 2019, 9:24 am

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited.

A database might have only one type, say strings or bytevectors, but I don't see how it can have none at all. In any case, the RM prescriptions require only one scalar type, boolean.

I agree. But the RM requires many types, it only mandates one of them plus the ability to generate others.

Build on the work of others (JSON for the DB type system? Or even SQL?)

JSON's types are boolean, inexact rational number, string, array, tuple (and of course null, which we can ignore). In practice I think an exact-number type is also desirable, or as a bare minimum an exact-integer type. In short, just what Tutorial D prescribes, though it is unclear whether RATIONAL means exact or inexact numbers.

JSON values are: object array string number "true" "false" "null". The numbers are arbitrary precision generic signed decimal. Date values are commonly included too (see I-JSON https://tools.ietf.org/html/rfc7493).

Note that objects allow you to represent user-defined structured types (structs/TTM Pre 4) and arrays allow you to represent recurrences (arrays/TTM RVAs).

It's most likely enough for storing any conceivable business data.

As for SQL's scalar type system, it is unparalleled for jugheadedness by anything except the XML Schema scalar type system: read it and weep. In summary:

I agree, but it's easy enough to come up with an SQL type system that is a common subset of all common implementations. It's enough for storing any simple business data values, but (compared to JSON) it cannot store UDTs or recurrences.

I can say from experience that all you really need for business data storage is string, number and date/time, but it's really handy to also be able to store UDTs and recurrences, without the workarounds needed in SQL.

Quote from johnwcowan on October 26, 2019, 7:19 pm

Quote from dandl on October 26, 2019, 9:24 am

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited.

A database might have only one type, say strings or bytevectors, but I don't see how it can have none at all. In any case, the RM prescriptions require only one scalar type, boolean.

I agree. But the RM requires many types, it only mandates one of them plus the ability to generate others.

Build on the work of others (JSON for the DB type system? Or even SQL?)

JSON's types are boolean, inexact rational number, string, array, tuple (and of course null, which we can ignore). In practice I think an exact-number type is also desirable, or as a bare minimum an exact-integer type. In short, just what Tutorial D prescribes, though it is unclear whether RATIONAL means exact or inexact numbers.

JSON values are: object array string number "true" "false" "null". The numbers are arbitrary precision generic signed decimal. Date values are commonly included too (see I-JSON https://tools.ietf.org/html/rfc7493).

Note that objects allow you to represent user-defined structured types (structs/TTM Pre 4) and arrays allow you to represent recurrences (arrays/TTM RVAs).

It's most likely enough for storing any conceivable business data.

As for SQL's scalar type system, it is unparalleled for jugheadedness by anything except the XML Schema scalar type system: read it and weep. In summary:

I agree, but it's easy enough to come up with an SQL type system that is a common subset of all common implementations. It's enough for storing any simple business data values, but (compared to JSON) it cannot store UDTs or recurrences.

I can say from experience that all you really need for business data storage is string, number and date/time, but it's really handy to also be able to store UDTs and recurrences, without the workarounds needed in SQL.

Andl - A New Database Language - andl.org

#72 · October 27, 2019, 1:41 am

Quote from dandl on October 27, 2019, 12:54 am

JSON values are: object array string number "true" "false" "null".

Using JSON for numbers outside the range and precision of 64-bit binary floats is not interoperable: see section 2.2 of the RFC.

The numbers are arbitrary precision generic signed decimal.

The "object" type is in fact a tuple type, since all the types of JSON literals are manifestly typed and their types needn't be given by name.

Date values are commonly included too (see I-JSON https://tools.ietf.org/html/rfc7493).

There's a convention for encoding dates as strings, but dates are not a manifest JSON type. You cannot tell, looking at a JSON literal, whether "1920-01-01" is a date or some kind of code.

It's most likely enough for storing any conceivable business data.

As I've pointed out before, many businesses need to deal with measurements of all sorts, which is a different numerical type than for money (which is counted, not measured). "Business data" is not confined to the data common to all businesses.

I agree, but it's easy enough to come up with an SQL type system that is a common subset of all common implementations.

Here it is: 64-bit integer, 64-bit binary float, string, and blob (and null). That is all that SQLite has.

Specifically, it has no booleans (use integers), no unbounded integers, no exact rationals (use floats and take your chances, or use integers and do your own scaling), no dates (use strings or floats), no arrays, no row types, no UDTs, no refs (use integer synthetic keys), no enumerations, no complex numbers (use two columns).

#73 · October 27, 2019, 3:21 am

Quote from dandl on October 27, 2019, 3:21 am

Quote from johnwcowan on October 27, 2019, 1:41 am

Quote from dandl on October 27, 2019, 12:54 am

JSON values are: object array string number "true" "false" "null".

Using JSON for numbers outside the range and precision of 64-bit binary floats is not interoperable: see section 2.2 of the RFC.

That's enough, but in any we don't need interop. We just a a familiar universal type system with access to pre-written libraries in a variety of languages.

The numbers are arbitrary precision generic signed decimal.

The "object" type is in fact a tuple type, since all the types of JSON literals are manifestly typed and their types needn't be given by name.

It's close, but again it doesn't matter. JSON objects can be used to store all the familiar stuff like points, rectangles, numbers with units, etc.

Date values are commonly included too (see I-JSON https://tools.ietf.org/html/rfc7493).

There's a convention for encoding dates as strings, but dates are not a manifest JSON type. You cannot tell, looking at a JSON literal, whether "1920-01-01" is a date or some kind of code.

Again that doesn't matter. We can use dates as per I-JSON and JavaScript, for familiarity and universal acceptance. No interop needed.

It's most likely enough for storing any conceivable business data.

As I've pointed out before, many businesses need to deal with measurements of all sorts, which is a different numerical type than for money (which is counted, not measured). "Business data" is not confined to the data common to all businesses.

No problem. We can store them all as numbers, and if we need units we can use an object. No interop needed.

I agree, but it's easy enough to come up with an SQL type system that is a common subset of all common implementations.

Here it is: 64-bit integer, 64-bit binary float, string, and blob (and null). That is all that SQLite has.

Specifically, it has no booleans (use integers), no unbounded integers, no exact rationals (use floats and take your chances, or use integers and do your own scaling), no dates (use strings or floats), no arrays, no row types, no UDTs, no refs (use integer synthetic keys), no enumerations, no complex numbers (use two columns).

The choice of an SQL type system is for familiarity. SQLite happily deals with around 25 data types (see https://sqlite.org/datatype3.html) and stores them in 5 affinities. An implementation for SQL would have to deal with some numerics and date types using special case conversions.

The big problem is that there is no standard or expected way to deal with UDTs (tuples if you like) or recurrences (arrays) in SQL, so there is nothing conceptual there to build on. That's a barrier to usability, not an implementation problem.

Quote from johnwcowan on October 27, 2019, 1:41 am

Quote from dandl on October 27, 2019, 12:54 am

JSON values are: object array string number "true" "false" "null".

Using JSON for numbers outside the range and precision of 64-bit binary floats is not interoperable: see section 2.2 of the RFC.

That's enough, but in any we don't need interop. We just a a familiar universal type system with access to pre-written libraries in a variety of languages.

The numbers are arbitrary precision generic signed decimal.

The "object" type is in fact a tuple type, since all the types of JSON literals are manifestly typed and their types needn't be given by name.

It's close, but again it doesn't matter. JSON objects can be used to store all the familiar stuff like points, rectangles, numbers with units, etc.

Date values are commonly included too (see I-JSON https://tools.ietf.org/html/rfc7493).

There's a convention for encoding dates as strings, but dates are not a manifest JSON type. You cannot tell, looking at a JSON literal, whether "1920-01-01" is a date or some kind of code.

Again that doesn't matter. We can use dates as per I-JSON and JavaScript, for familiarity and universal acceptance. No interop needed.

It's most likely enough for storing any conceivable business data.

As I've pointed out before, many businesses need to deal with measurements of all sorts, which is a different numerical type than for money (which is counted, not measured). "Business data" is not confined to the data common to all businesses.

No problem. We can store them all as numbers, and if we need units we can use an object. No interop needed.

I agree, but it's easy enough to come up with an SQL type system that is a common subset of all common implementations.

Here it is: 64-bit integer, 64-bit binary float, string, and blob (and null). That is all that SQLite has.

Specifically, it has no booleans (use integers), no unbounded integers, no exact rationals (use floats and take your chances, or use integers and do your own scaling), no dates (use strings or floats), no arrays, no row types, no UDTs, no refs (use integer synthetic keys), no enumerations, no complex numbers (use two columns).

The choice of an SQL type system is for familiarity. SQLite happily deals with around 25 data types (see https://sqlite.org/datatype3.html) and stores them in 5 affinities. An implementation for SQL would have to deal with some numerics and date types using special case conversions.

The big problem is that there is no standard or expected way to deal with UDTs (tuples if you like) or recurrences (arrays) in SQL, so there is nothing conceptual there to build on. That's a barrier to usability, not an implementation problem.

Andl - A New Database Language - andl.org

#74 · October 27, 2019, 4:00 am

Quote from dandl on October 27, 2019, 4:00 am

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Quote from dandl on October 26, 2019, 9:24 am

Quote from Dave Voorhis on October 25, 2019, 1:31 pm

Ah, that's exactly where we differ. I'm trying to dispense with the function API to access query results. I did that with the Rel client library, and whilst it's workable, it's awkward -- as DBMS-to-client API's usually are. That's why I'm re-examining tuple-as-class-instance, but only as a way to represent database tuples on the client-side and not as a general database abstraction. The "first great blunder" is not being re-blundered; it's purely a means to make client-side database-driven application coding tastier and less chewy.

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited. Had they succeeded they would have entrenched another non-portable language in the database. One is bad enough.

What if D&D had rejected the notion, avoided trying to design a new type system and a new application development language and simply gone back to the fundamentals? I suggest:

Just a Data Sub-language, based closely on the RA/RC (Codd's 1972 RA would work just fine)

Anything can be a Host Language (portability in our time)

Build on the work of others (JSON for the DB type system? Or even SQL?)

Code lives in the repo, not in the database

Keep it simple, stupid!

Domains are just a way to work with a really simple type system in the database. At the end you get a familiar thing, a kind of RA-programmable ORM with no visible SQL. It's cross platform, cross language, works locally as well as on the server, it can even work on a NoSQL back end.

The devil of course is in the usual place. Maybe it's not possible, but it's fun to think about.

It sounds do-able, but for the typical developer who knows SQL, what benefit does it give over SQL and the usual JDBC/ODBC/etc. client-side libraries?

But that's the whole point, to present a solution to a different problem, one that is instantly familiar. TTM/D was intended as the solution for a perceived giant problem with SQL, but has become the solution for people (like myself) in need of a project they (mistakenly) think might be useful. Rel is a solution to the problem people have if they need an implementation of TD for teaching purposes.

But once you start asking the question of "why is it better than X?" then the potential set of people with some kind of a relevant problem is far larger. You could ask the exact same question as to why Microsoft created LINQ to SQL when they already had at least two ORMs and a full set of ODBC and SQL Server API tools, and you will find hundreds of people telling you why. You could ask why jOOQ? And so on.

I'm just interested in a solution to the problem for which "SQL and the usual JDBC/ODBC/etc. client-side libraries?" are the current solution, and asking whether a solution based on the RA is a better way to do it.

I'm not sure "domains" as you've hinted at here are anything but a type alias, or the usual ORM-like mapping between the client-side and server/DBMS-side types. Or are you using the term "domain" for precisely that?

No, not a type alias, more like a blend of a role and a type mapping. Just a different perspective on the same old problems of naming things and giving them useful attributes.

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them. Nothing other than TTM requires them, so let's see where we get without them.

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Quote from dandl on October 26, 2019, 9:24 am

Quote from Dave Voorhis on October 25, 2019, 1:31 pm

Ah, that's exactly where we differ. I'm trying to dispense with the function API to access query results. I did that with the Rel client library, and whilst it's workable, it's awkward -- as DBMS-to-client API's usually are. That's why I'm re-examining tuple-as-class-instance, but only as a way to represent database tuples on the client-side and not as a general database abstraction. The "first great blunder" is not being re-blundered; it's purely a means to make client-side database-driven application coding tastier and less chewy.

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited. Had they succeeded they would have entrenched another non-portable language in the database. One is bad enough.

What if D&D had rejected the notion, avoided trying to design a new type system and a new application development language and simply gone back to the fundamentals? I suggest:

Just a Data Sub-language, based closely on the RA/RC (Codd's 1972 RA would work just fine)

Anything can be a Host Language (portability in our time)

Build on the work of others (JSON for the DB type system? Or even SQL?)

Code lives in the repo, not in the database

Keep it simple, stupid!

Domains are just a way to work with a really simple type system in the database. At the end you get a familiar thing, a kind of RA-programmable ORM with no visible SQL. It's cross platform, cross language, works locally as well as on the server, it can even work on a NoSQL back end.

The devil of course is in the usual place. Maybe it's not possible, but it's fun to think about.

It sounds do-able, but for the typical developer who knows SQL, what benefit does it give over SQL and the usual JDBC/ODBC/etc. client-side libraries?

But that's the whole point, to present a solution to a different problem, one that is instantly familiar. TTM/D was intended as the solution for a perceived giant problem with SQL, but has become the solution for people (like myself) in need of a project they (mistakenly) think might be useful. Rel is a solution to the problem people have if they need an implementation of TD for teaching purposes.

But once you start asking the question of "why is it better than X?" then the potential set of people with some kind of a relevant problem is far larger. You could ask the exact same question as to why Microsoft created LINQ to SQL when they already had at least two ORMs and a full set of ODBC and SQL Server API tools, and you will find hundreds of people telling you why. You could ask why jOOQ? And so on.

I'm just interested in a solution to the problem for which "SQL and the usual JDBC/ODBC/etc. client-side libraries?" are the current solution, and asking whether a solution based on the RA is a better way to do it.

I'm not sure "domains" as you've hinted at here are anything but a type alias, or the usual ORM-like mapping between the client-side and server/DBMS-side types. Or are you using the term "domain" for precisely that?

No, not a type alias, more like a blend of a role and a type mapping. Just a different perspective on the same old problems of naming things and giving them useful attributes.

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them. Nothing other than TTM requires them, so let's see where we get without them.

Andl - A New Database Language - andl.org

#75 · October 27, 2019, 4:36 am

Quote from dandl on October 27, 2019, 3:21 am

We don't need interop.

Without interop there is no guarantee that what goes into the database from one language is what comes out in another.

SQLite happily deals with around 25 data types (see https://sqlite.org/datatype3.html) and stores them in 5 affinities.

It has 25 names for its four types. Actually, it has a good many more names than that. Consider the following perfectly correct SQLite DDL:

CREATE TABLE wtf(
moldy blob,
cheese blintz,
sweet saccharin,
cargo offloading,
prom date);

Needless to say, the last column does not contain dates, except by some sort of convention.

The big problem is that there is no standard or expected way to deal with UDTs (tuples if you like) or recurrences (arrays) in SQL, so there is nothing conceptual there to build on. That's a barrier to usability, not an implementation problem.

I don't know what you mean by standard. If you mean the ISO standards, there have been ways for 20 years. If you mean available across all SQL databases, then you only get the four.

#76 · October 27, 2019, 9:12 am

Quote from dandl on October 27, 2019, 4:00 am

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them.

Not sure what you mean. What's arcane and alien about them?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#77 · October 27, 2019, 9:28 am

Quote from dandl on October 27, 2019, 12:54 am

I can say from experience that all you really need for business data storage is string, number and date/time ...

Well, yes -- you can encode anything in a string -- and you can write any application in assembly language.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#78 · October 27, 2019, 12:16 pm

Quote from Dave Voorhis on October 27, 2019, 9:12 am

Quote from dandl on October 27, 2019, 4:00 am

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them.

Not sure what you mean. What's arcane and alien about them?

+1.

They [tuple and relation types] seem in the realm of the blimmin' obvious to me (with my query-user's hat on). Only if I put my type-theoretician's/implementor's hat on first do they seem problematic.

#79 · October 27, 2019, 7:22 pm

Quote from AntC on October 27, 2019, 12:16 pm

Quote from Dave Voorhis on October 27, 2019, 9:12 am

Quote from dandl on October 27, 2019, 4:00 am

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them.

Not sure what you mean. What's arcane and alien about them?

+1.

They [tuple and relation types] seem in the realm of the blimmin' obvious to me (with my query-user's hat on). Only if I put my type-theoretician's/implementor's hat on first do they seem problematic.

Perhaps it's the term "type generator" itself.

RELATION and TUPLE type generators don't have the usual appearance of parametric types. I imagine a typical Java/C#/whatever developer -- used to creating a parametrised type via something like class P <T> { .... T blah() { ... }} -- might well raise an eyebrow on hearing that RELATION {x INT, y CHAR} is notionally a parametrised type.

But tell him or her that RELATION {x INT, y CHAR} is a relation type and the slight unease goes away, and it all turns out to be nicely intuitive.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#80 · October 27, 2019, 11:49 pm

Quote from Dave Voorhis on October 27, 2019, 9:12 am

Quote from dandl on October 27, 2019, 4:00 am

Quote from Dave Voorhis on October 26, 2019, 10:12 am

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them.

Not sure what you mean. What's arcane and alien about them?

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

are unique features of TTM/D -- not found in other languages (*)
cannot be added to other languages (*) except by compiler modifications
are named by means of a type generator, such that the name itself cannot be written
cause confusion, disagreement and disbelief amongst those who ask questions about or challenge TTM, but never enthusiastic acceptance.

(*) Other languages that are fully implemented and used by at least a handful of people, and findable by Web search.

Items 1,2 and 4 also apply to POSSREPs.

Andl - A New Database Language - andl.org

The Forum for Discussion about The Third Manifesto and Related Matters