The Forum for Discussion about The Third Manifesto and Related Matters


Which type?

Quote from Paul Vernon on November 11, 2021, 12:55 pm
Quote from dandl on November 11, 2021, 12:18 pm

The big assumption of TTM was the idea of imposing a programming language type system on a data model.

Indeed. An assumption. And (unfortunately) a harmful one I have come to believe. Not that the intent was wrong, more that "type systems" are the wrong answer to the right question.

Which is where my '9 types' solution comes from. They don't comprise a type 'system', just a set of labels to accommodate whatever data you want to store. The type 'system' can be added by whatever programming language you might choose.

"Catching a significant number of mistakes" and "Improving the robustness and safety of software" are the right questions. We just need the right answer.

We had a long thread here a while back in which I advocated strongly that the next big advance in programming languages should be 'shorter, safer, higher' but most particularly 'safer'. There are a few programming environments in which: if it compiles, it runs. By this I mean that if you write code and it satisfies the static testing required before it can be executed, then once executed it can never fail: no exceptions, no runtime errors, no type errors, nothing. The 'system' is smart enough to figure out all the things that could go wrong and ensure that they are resolved before the program runs.

We are light years away from that now. I routinely spend hours and days getting correct code to run because of things I didn't know or think of about the stuff outside my code. I don't want to do that anymore.

Andl - A New Database Language - andl.org
Quote from dandl on November 12, 2021, 12:17 am
Quote from Dave Voorhis on November 11, 2021, 2:04 pm
Quote from dandl on November 11, 2021, 11:58 am
Quote from Dave Voorhis on November 11, 2021, 8:54 am
Quote from dandl on November 11, 2021, 12:48 am

[...]

In all the database models I can ever remember seeing, I have never found the need for more than 9 scalar types: boolean, integer, real, decimal, datetime, text string, binary string, enum, struct. (And I regard value inheritance as a pointless thought experiment.)

A programming language may have many more types, but IMO it should support those 9 as scalar value types if it is to be a database programming language (which is the aim of TTM/D). I don't know any that do.

I don't know any popular programming languages that don't support those as user-defined types in a library.

That's the crucial requirement, really -- not that those 9 (give or take) types be baked into the language, but that the language allow the user to define them, along with any other conceivable types that may be deemed desirable, and have them treated as notionally equivalent to built-in-to-the-language types, modulo the usual primitive vs non-primitive type issues.

The point is: "should support those 9 as scalar value types if it is to be a database programming language". It's not intended to be a high bar, but the languages I know don't have all these types natively, and encounter various restrictions when adding them using libraries. A D candidate (or SQL replacement) should support them 'as native', smoothly and seamlessly.

I presume "as native" allows them to be defined as part of a standard library?

The C language does almost nothing without the inclusion of a standard library. That's a language design issue, not germane to this question.

Indeed, true of every popular programming language. Primitive types -- int, char and so forth -- are often baked into the language as hardware types (at least notionally, though not exclusively), and all other core types are defined using the language and provided in a standard library.

My point is that capability -- to define all useful types from some truly primitive (i.e., either real or virtual hardware, or conceptual starting point with appropriate theoretical basis) set -- is what is essential.

There are no other 'conceivable' types that I know of. Remember: this is just about the storage/data model. There are lots of other interesting types for writing programs, but then as per OO VSS 2, the operators on those types are not part of the type itself. So for example, you can store a Point or a Complex value as a struct of two elements, but they are of database type struct. The libraries that provide operators on Point and Complex come from some programming language rather than being part of the database itself.
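The Point/Complex-as-struct idea above can be sketched like this (a purely illustrative sketch; all names are hypothetical, not from any actual DBMS or the post itself): the database holds only a generic struct of two elements, and the operators live in application code.

```python
# Illustrative sketch: a Point or a Complex value stored as a generic
# database 'struct' of two elements. The database knows only the struct;
# the operators below belong to application code, not to the stored type.

def make_struct(**fields):
    """A database-side 'struct' value: just labelled components."""
    return dict(fields)

# Stored values -- to the database both are merely structs of two reals.
p = make_struct(x=3.0, y=4.0)
c = make_struct(re=1.0, im=2.0)

# Application-side operators, defined outside the stored data model.
def point_magnitude(pt):
    return (pt["x"] ** 2 + pt["y"] ** 2) ** 0.5

def complex_add(a, b):
    return make_struct(re=a["re"] + b["re"], im=a["im"] + b["im"])

print(point_magnitude(p))   # 5.0
print(complex_add(c, c))    # {'re': 2.0, 'im': 4.0}
```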

The same applies to decimal and datetime -- struct for the former and (usually) an integer or two for the latter.

If you take that line of thinking, binary blob does it all. I don't. The aim is a set that is sufficient rather than necessary or minimal, and that aligns with the data types found in other data models. If the nine types are supported, everything else can be readily accommodated by sets of operators on those types.

Yes, binary blob does do it all, at least conceptually. Though it's reasonable -- for performance reasons -- to ergonomically map some kinds of blobs to fundamental hardware types.

[As an aside, struct is a poor choice for decimal because the struct members have no useful names. Integer is a poor choice for date/time because (a) you also need an epoch (b) it sets arbitrary limits on precision.]
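The aside's point about integers and date/time can be illustrated with a small sketch (the epochs chosen here are my examples, not anything proposed in the thread): a bare integer timestamp is ambiguous until an epoch and a precision are agreed out-of-band, and neither is carried by the integer itself.

```python
# Illustrative: the same stored integer yields different instants
# depending on the externally agreed epoch and precision.
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
GPS_EPOCH  = datetime(1980, 1, 6, tzinfo=timezone.utc)

t = 1_000_000_000  # the integer actually stored in the database

# Same integer, three readings:
as_unix_seconds = UNIX_EPOCH + timedelta(seconds=t)       # 2001-09-09 01:46:40+00:00
as_unix_millis  = UNIX_EPOCH + timedelta(milliseconds=t)  # 1970-01-12 13:46:40+00:00
as_gps_seconds  = GPS_EPOCH + timedelta(seconds=t)        # lands in 2011

print(as_unix_seconds, as_unix_millis, as_gps_seconds)
```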

Ah hah!  You don't like my implementation of decimal and timestamp?  You want your own?

Exactly my point.

There are plenty of conceivable types, of course -- temperature, currency (not quite the same as decimal), distance, vector, etc., plus every domain-specific type to enforce type safety, so invoice_number, customer_number, and so on ad infinitum. Of course, you might suggest that decimal is somehow more typeful than invoice_number, but why, if both are based on integer?

Thus, I'd argue that the measure of any language -- database or otherwise -- is in its ability to define types, not in whether (or how) it embeds (or doesn't) some canonical set of them.

And that is the point: types of the kinds you mention are required by programming languages but are not inherently needed in the data model. Start with the bedrock of the 9 types and every data model is supported, at least at the storage level. Add libraries of operators as you add programming types, but as artefacts of the language through which the data model is viewed, not part of the model itself.

This is an inherent problem with TTM and not with SQL. The intention is a language-independent base view on the data, and language dependent views above that.

TTM doesn't prescribe types outside of boolean (RM Pre 1), and Tutorial D specified a set sufficient for pedagogical illustration. What set of types is baked into a language, provided in a standard library, available to download or default or whatever is more an issue for the marketing department or whatever promotes the language. It certainly has no theoretical, foundational, or essential basis, and specifying some "bedrock" set of nine types or four (e.g., int, double, char, or INTEGER, RATIONAL, CHARACTER plus a boolean) or whatever is the opposite of "language-independent."

It's not even application domain independent.

However, being able to define all the useful types within the language from some essential (either minimal theoretical, or minimal fundamental) types is a good indication of language capability and expressivity.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on November 12, 2021, 12:30 am
Quote from Paul Vernon on November 11, 2021, 12:55 pm
Quote from dandl on November 11, 2021, 12:18 pm

The big assumption of TTM was the idea of imposing a programming language type system on a data model.

Indeed. An assumption. And (unfortunately) a harmful one I have come to believe. Not that the intent was wrong, more that "type systems" are the wrong answer to the right question.

Which is where my '9 types' solution comes from. They don't comprise a type 'system', just a set of labels to accommodate whatever data you want to store. The type 'system' can be added by whatever programming language you might choose.

They comprise components of a type system.

I think there's some misunderstanding of terminology here, that "type system" only means mechanisms for specifying substitutivity -- via inheritance or whatever --  and conditions for dispatch, but simply providing some primitive or built-in types -- for some undefined value of "primitive" and "built-in" -- isn't a type system. In other words, in some posts here (and elsewhere) there appears to be a belief that a language like Java has a type system (because inheritance, etc.) but (say) classic BASIC doesn't have a type system.

No. BASIC has a type system. A type system consisting only of predefined primitive types (maybe as few as one, in some languages) and no explicit substitution mechanisms is still a type system. It may not have (much, or any) type checking or type safety, but even a crude and unhelpful and un-automated type system is a type system.

"Catching a significant number of mistakes" and "Improving the robustness and safety of software" are the right questions. We just need the right answer.

We had a long thread here a while back in which I advocated strongly that the next big advance in programming languages should be 'shorter, safer, higher' but most particularly 'safer'. There are a few programming environments in which: if it compiles, it runs. By this I mean that if you write code and it satisfies the static testing required before it can be executed, then once executed it can never fail: no exceptions, no runtime errors, no type errors, nothing. The 'system' is smart enough to figure out all the things that could go wrong and ensure that they are resolved before the program runs.

We are light years away from that now. I routinely spend hours and days getting correct code to run because of things I didn't know or think of about the stuff outside my code. I don't want to do that anymore.

I don't think anyone (here, at least) disagrees that better run-time safety -- and catching or avoiding more oopsies before run-time -- is valuable. Indeed, languages like Swift and Kotlin go a good way to providing it without losing flexibility, general-purpose capability, and (at least) integration with existing popular infrastructure, languages, and ecosystems. There was some disagreement about your particular approach to achieving it. (No slight at your suggestions, though -- I see much worse ones elsewhere on a daily basis.)

Speaking of elsewhere... I sometimes see arguments that favour (what amounts to) fewer run-time safety mechanisms, because (apparently) the most important thing is gaining marketshare by getting functionality out there fastest. Whether it's robust or not is mostly irrelevant.

Their argument, made seriously and with apparent weight, is that bugs are always a minor issue. You log them in production and fix them at your leisure. Unreliable software may annoy OCD programmers, but users are used to it. Indeed, they expect it.

So give them what they expect... Quickly. Speed of code delivery is the important thing. Safer, higher, shorter, no. Faster, faster, faster, yes.

"Shorter" is sometimes raised as an issue, but only by counting programmer keystrokes. "Shorter" in terms of less overall developer effort (which isn't necessarily the fewest keystrokes), not so much.

Etc.


Indeed, true of every popular programming language. Primitive types -- int, char and so forth -- are often baked into the language as hardware types (at least notionally, though not exclusively), and all other core types are defined using the language and provided in a standard library.

My point is that capability -- to define all useful types from some truly primitive (i.e., either real or virtual hardware, or conceptual starting point with appropriate theoretical basis) set -- is what is essential.

 

So this is the same idea. The set of 9 types forms a reasonable minimal set of virtual hardware supporting a DBMS and associated language, on which to build application types. Each of them:

  • has a reasonable literal form
  • is intrinsically useful and already widely used
  • has a standardised canonical interchange format (most are in JS)
  • is reasonably distinct, i.e. not easily expressed in terms of the others, and/or there are barriers to doing so.
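As a purely illustrative sketch of the interchange-format criterion (the pairings below are my own assumption, not prescribed by the post or any standard), the nine labels might map to JSON-flavoured forms like this:

```python
# Hypothetical pairings of the nine proposed bedrock types with one
# plausible JSON-flavoured interchange form each. JSON has no native
# decimal, datetime, binary, or enum, so strings stand in -- a common
# workaround, but an assumption here, not something the post prescribes.
import json

interchange_examples = {
    "boolean":       True,
    "integer":       42,
    "real":          6.02e23,
    "decimal":       "123.45",                # decimal carried as a string
    "datetime":      "2021-11-12T00:17:00Z",  # ISO 8601 string
    "text string":   "hello",
    "binary string": "aGVsbG8=",              # base64-encoded bytes
    "enum":          "RED",                   # enum label as a string
    "struct":        {"x": 3.0, "y": 4.0},
}

# Every value round-trips losslessly through JSON.
assert json.loads(json.dumps(interchange_examples)) == interchange_examples
print(len(interchange_examples))  # 9
```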

[As an aside, struct is a poor choice for decimal because the struct members have no useful names. Integer is a poor choice for date/time because (a) you also need an epoch (b) it sets arbitrary limits on precision.]

Ah hah!  You don't like my implementation of decimal and timestamp?  You want your own?

Exactly my point.

I don't care a hoot how you implement it as long as it's hidden, but if you build decimal on a visible struct type then the members are visible, and that won't do. Struct is a more plausible choice for datetime, as long as the struct is specifically {year:month:day:seconds}, so to some extent that one is a matter of choice. I think the primitive type plays better because dates are absolutely pervasive in business computing, and it's more a question of why not?

There are plenty of conceivable types, of course -- temperature, currency (not quite the same as decimal), distance, vector, etc., plus every domain-specific type to enforce type safety, so invoice_number, customer_number, and so on ad infinitum. Of course, you might suggest that decimal is somehow more typeful than invoice_number, but why, if both are based on integer?

Thus, I'd argue that the measure of any language -- database or otherwise -- is in its ability to define types, not in whether (or how) it embeds (or doesn't) some canonical set of them.

And that is the point: types of the kinds you mention are required by programming languages but are not inherently needed in the data model. Start with the bedrock of the 9 types and every data model is supported, at least at the storage level. Add libraries of operators as you add programming types, but as artefacts of the language through which the data model is viewed, not part of the model itself.

This is an inherent problem with TTM and not with SQL. The intention is a language-independent base view on the data, and language dependent views above that.

TTM doesn't prescribe types outside of boolean (RM Pre 1), and Tutorial D specified a set sufficient for pedagogical illustration. What set of types is baked into a language, provided in a standard library, available to download or default or whatever is more an issue for the marketing department or whatever promotes the language. It certainly has no theoretical, foundational, or essential basis, and specifying some "bedrock" set of nine types or four (e.g., int, double, char, or INTEGER, RATIONAL, CHARACTER plus a boolean) or whatever is the opposite of "language-independent."

And I think that's a mistake. The problem is that you create a dependency: the database is inaccessible except through the type system created by, and belonging to, that language. My view is that the database should provide values conforming to the bedrock 9 types, and the programming language builds on that foundation. Types created in the language stay in the language and do not pollute the database.

It's not even application domain independent.

However, being able to define all the useful types within the language from some essential (either minimal theoretical, or minimal fundamental) types is a good indication of language capability and expressivity.

Sure, but I'm not talking about language design, I'm talking about a database, a data model and the bedrock types it should provide for all languages to use.

Quote from dandl on November 12, 2021, 12:21 pm

Indeed, true of every popular programming language. Primitive types -- int, char and so forth -- are often baked into the language as hardware types (at least notionally, though not exclusively), and all other core types are defined using the language and provided in a standard library.

My point is that capability -- to define all useful types from some truly primitive (i.e., either real or virtual hardware, or conceptual starting point with appropriate theoretical basis) set -- is what is essential.

 

So this is the same idea. The set of 9 types forms a reasonable minimal set of virtual hardware supporting a DBMS and associated language, on which to build application types. Each of them:

  • has a reasonable literal form
  • is intrinsically useful and already widely used
  • has a standardised canonical interchange format (most are in JS)
  • is reasonably distinct, i.e. not easily expressed in terms of the others, and/or there are barriers to doing so.

[As an aside, struct is a poor choice for decimal because the struct members have no useful names. Integer is a poor choice for date/time because (a) you also need an epoch (b) it sets arbitrary limits on precision.]

Ah hah!  You don't like my implementation of decimal and timestamp?  You want your own?

Exactly my point.

I don't care a hoot how you implement it as long as it's hidden, but if you build decimal on a visible struct type then the members are visible, and that won't do. Struct is a more plausible choice for datetime, as long as the struct is specifically {year:month:day:seconds}, so to some extent that one is a matter of choice. I think the primitive type plays better because dates are absolutely pervasive in business computing, and it's more a question of why not?

Sorry, you've missed my point. The reason we don't currently see even types like decimal as a canonical type -- and it's a relatively straightforward one, unlike dates and timestamps that seem simple but are a minefield of complexity, exceptions, and varying (and sometimes contradictory) requirements -- is that there is no universal agreement on how they should work. I'll make a decimal that doesn't suit your requirements. Your decimal won't suit mine.

At least with integer and floats you have hardware implementations defining (or imposing) requirements, but outside of that there are no consistent requirements -- and even integer and floats vary -- which is why it's not unusual for SQL DBMSs to have from over a dozen built-in column types (https://docs.oracle.com/cd/A58617_01/server.804/a58241/ch5.htm#417949) to dozens (https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver15) or (https://www.postgresql.org/docs/9.5/datatype.html).

Yet another here's-everything-you-need-canonical-set attempt is not what we need, particularly if you've had the rare joy of working in some domain where most of the canonical "business" types are no help and everything winds up being shoe-horned into some type where it doesn't belong, like BLOB.

Better is the means to easily define rich user-defined types in the DBMS (most DBMSs now allow user-defined types, but don't make them easy) and provide straightforward means to map them to types on the client side. If you want to provide a default set of nine (or whatever) that's fine, as long as I can choose not to have them or replace them with equivalents, as I see fit.

Done properly, DBMS vs client types are only compatible, not the same. That's because the server side defines types in terms of storage, the client side defines types in terms of presentation, and the middleware layer (if present) defines them in terms of business rules and calculations.

Thus they're actually distinct types, sharing only (where appropriate) convertibility.

Quote from Dave Voorhis on November 12, 2021, 11:09 am
Quote from dandl on November 12, 2021, 12:30 am
Quote from Paul Vernon on November 11, 2021, 12:55 pm
Quote from dandl on November 11, 2021, 12:18 pm

The big assumption of TTM was the idea of imposing a programming language type system on a data model.

Indeed. An assumption. And (unfortunately) a harmful one I have come to believe. Not that the intent was wrong, more that "type systems" are the wrong answer to the right question.

Which is where my '9 types' solution comes from. They don't comprise a type 'system', just a set of labels to accommodate whatever data you want to store. The type 'system' can be added by whatever programming language you might choose.

They comprise components of a type system.

I think there's some misunderstanding of terminology here, that "type system" only means mechanisms for specifying substitutivity -- via inheritance or whatever --  and conditions for dispatch, but simply providing some primitive or built-in types -- for some undefined value of "primitive" and "built-in" -- isn't a type system. In other words, in some posts here (and elsewhere) there appears to be a belief that a language like Java has a type system (because inheritance, etc.) but (say) classic BASIC doesn't have a type system.

No. BASIC has a type system. A type system consisting only of predefined primitive types (maybe as few as one, in some languages) and no explicit substitution mechanisms is still a type system. It may not have (much, or any) type checking or type safety, but even a crude and unhelpful and un-automated type system is a type system.

But that specifically is not what I'm proposing. Every programming language that provides values also has a type system around those values whether it wants one or not. I'm proposing a set of bedrock types (or domains, or permissible values) for a stored persistent data model that is language-agnostic (even atheistic) and can be accessed by any programming language, or none. If you can get over your and TTM's fixation on a single data language and think in terms of data not code, we might get somewhere.

"Catching a significant number of mistakes" and "Improving the robustness and safety of software" are the right questions. We just need the right answer.

We had a long thread here a while back in which I advocated strongly that the next big advance in programming languages should be 'shorter, safer, higher' but most particularly 'safer'. There are a few programming environments in which: if it compiles, it runs. By this I mean that if you write code and it satisfies the static testing required before it can be executed, then once executed it can never fail: no exceptions, no runtime errors, no type errors, nothing. The 'system' is smart enough to figure out all the things that could go wrong and ensure that they are resolved before the program runs.

We are light years away from that now. I routinely spend hours and days getting correct code to run because of things I didn't know or think of about the stuff outside my code. I don't want to do that anymore.

I don't think anyone (here, at least) disagrees that better run-time safety -- and catching or avoiding more oopsies before run-time -- is valuable. Indeed, languages like Swift and Kotlin go a good way to providing it without losing flexibility, general-purpose capability, and (at least) integration with existing popular infrastructure, languages, and ecosystems. There was some disagreement about your particular approach to achieving it. (No slight at your suggestions, though -- I see much worse ones elsewhere on a daily basis.)

Nice of you to say so. :-)

But seriously, those languages are aiming at a better C, low level but with safer memory management, threading, etc. I want higher. Actually a bit of AI would be nice, a 'system' that watches what I do and scouts ahead to find the traps I'm about to walk into.

Speaking of elsewhere... I sometimes see arguments that favour (what amounts to) fewer run-time safety mechanisms, because (apparently) the most important thing is gaining marketshare by getting functionality out there fastest. Whether it's robust or not is mostly irrelevant.

Their argument, made seriously and with apparent weight, is that bugs are always a minor issue. You log them in production and fix them at your leisure. Unreliable software may annoy OCD programmers, but users are used to it. Indeed, they expect it.

So give them what they expect... Quickly. Speed of code delivery is the important thing. Safer, higher, shorter, no. Faster, faster, faster, yes.

"Shorter" is sometimes raised as an issue, but only by counting programmer keystrokes. "Shorter" in terms of less overall developer effort (which isn't necessarily the fewest keystrokes), not so much.

Etc.

We had a presentation recently from a high level manager boasting about how they were doing multiple software releases per day. For some strange reason he was not able to answer my question about bug counts or what the users thought about taking over the role of testing department.

Quote from Dave Voorhis on November 12, 2021, 5:55 pm
Quote from dandl on November 12, 2021, 12:21 pm

Indeed, true of every popular programming language. Primitive types -- int, char and so forth -- are often baked into the language as hardware types (at least notionally, though not exclusively), and all other core types are defined using the language and provided in a standard library.

My point is that capability -- to define all useful types from some truly primitive (i.e., either real or virtual hardware, or conceptual starting point with appropriate theoretical basis) set -- is what is essential.

 

So this is the same idea. The set of 9 types forms a reasonable minimal set of virtual hardware supporting a DBMS and associated language, on which to build application types. Each of them:

  • has a reasonable literal form
  • is intrinsically useful and already widely used
  • has a standardised canonical interchange format (most are in JS)
  • is reasonably distinct, i.e. not easily expressed in terms of the others, and/or there are barriers to doing so.

[As an aside, struct is a poor choice for decimal because the struct members have no useful names. Integer is a poor choice for date/time because (a) you also need an epoch (b) it sets arbitrary limits on precision.]

Ah hah!  You don't like my implementation of decimal and timestamp?  You want your own?

Exactly my point.

I don't care a hoot how you implement it as long as it's hidden, but if you build decimal on a visible struct type then the members are visible, and that won't do. Struct is a more plausible choice for datetime, as long as the struct is specifically {year:month:day:seconds}, so to some extent that one is a matter of choice. I think the primitive type plays better because dates are absolutely pervasive in business computing, and it's more a question of why not?

Sorry, you've missed my point. The reason we don't currently see even types like decimal as a canonical type -- and it's a relatively straightforward one, unlike dates and timestamps that seem simple but are a minefield of complexity, exceptions, and varying (and sometimes contradictory) requirements -- is that there is no universal agreement on how they should work. I'll make a decimal that doesn't suit your requirements. Your decimal won't suit mine.

No, actually you're missing mine and no, you won't. My definitions are:

  • real: a number complying with the JSON definition of number
  • decimal: a number complying with the JSON definition of number but without an exponent
  • integer: a number complying with the JSON definition of number but without an exponent or a decimal point.

And we're done. So what decimal do you have in mind?
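The three definitions above follow directly from the JSON number grammar (RFC 8259), and can be sketched as regexes; the function name and the choice of regexes here are illustrative, not part of the proposal:

```python
# Sketch of the three proposed numeric categories built on the JSON
# number grammar (RFC 8259): 'real' is any JSON number, 'decimal' drops
# the exponent, 'integer' drops the exponent and the fraction.
import re

JSON_INT  = r"-?(?:0|[1-9][0-9]*)"   # no leading zeros, per the grammar
JSON_FRAC = r"(?:\.[0-9]+)?"
JSON_EXP  = r"(?:[eE][+-]?[0-9]+)?"

REAL    = re.compile(f"^{JSON_INT}{JSON_FRAC}{JSON_EXP}$")
DECIMAL = re.compile(f"^{JSON_INT}{JSON_FRAC}$")
INTEGER = re.compile(f"^{JSON_INT}$")

def classify(lit: str) -> str:
    """Most specific of the three categories, or 'not a JSON number'."""
    if INTEGER.match(lit):
        return "integer"
    if DECIMAL.match(lit):
        return "decimal"
    if REAL.match(lit):
        return "real"
    return "not a JSON number"

print(classify("42"))        # integer
print(classify("-123.45"))   # decimal
print(classify("6.02e23"))   # real
print(classify("007"))       # not a JSON number (leading zeros disallowed)
```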

At least with integer and floats you have hardware implementations defining (or imposing) requirements, but outside of that there are no consistent requirements -- and even integer and floats vary -- which is why it's not unusual for SQL DBMSs to have from over a dozen built-in column types (https://docs.oracle.com/cd/A58617_01/server.804/a58241/ch5.htm#417949) to dozens (https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver15) or (https://www.postgresql.org/docs/9.5/datatype.html).

Yes, I'm familiar with the situation. That's the kind of mess you finish up with when you have a mixture of historical types, storage-size variants, SQL featuritis, and oddball DBMS-specific types all thrown in together. We definitely don't want to do that.

At the DBMS end I have no objection to adding an expected storage size, but that shouldn't make it a different type. A smallint, short, long or whatever is just an integer with a range constraint. Autoincrement is a DBMS specific attribute, not a type. Types intended to support SQL as a language should be kept separate and built on top of the bedrock types (as indeed they are if you've ever prowled around inside Postgres).
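The smallint-as-constrained-integer claim can be sketched like this (the helper name and the particular bounds are illustrative, though the 16-bit range is the conventional one):

```python
# Sketch: SMALLINT, BIGINT, etc. treated not as distinct types but as
# one integer type plus a range constraint. Names and bounds here are
# illustrative, not taken from any particular DBMS.

def make_range_check(lo, hi):
    """Return a constraint: an integer restricted to [lo, hi]."""
    def check(v):
        return isinstance(v, int) and lo <= v <= hi
    return check

smallint = make_range_check(-32768, 32767)     # conventional 16-bit range
bigint   = make_range_check(-2**63, 2**63 - 1)

print(smallint(1000))    # True
print(smallint(70000))   # False -- out of range, but still the same type
print(bigint(70000))     # True
```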

Yet another here's-everything-you-need-canonical-set attempt is not what we need, particularly if you've had the rare joy of working in some domain where most of the canonical "business" types are no help and everything winds up being shoe-horned into some type where it doesn't belong, like BLOB.

Such as? I've done a lot of work with numeric series data and geometric data, but they're easily handled. What are you missing?

Better is the means to easily user-define rich user-defined types in the DBMS (most DBMSs now allow user-defined types, but don't make them easy) and provide straightforward means to map them to types on the client side. If you want to provide a default set of nine (or whatever) that's fine, as long as I can choose not to have them or replace them with equivalents, as I see fit.

So if you prefer you can look at my 9 types as the canonical mapping types, that allow data to be transferred reliably from one data model to another and to and from various programming languages.

Done properly, DBMS vs client types are only compatible, not the same. That's because the server side defines types in terms of storage, the client side defines types in terms of presentation, and the middleware layer (if present) defines them in terms of business rules and calculations.

Thus they're actually distinct types, sharing only (where appropriate) convertibility.

Of course. Client code works with programming types, but the TTM mistake is to allow client types to infect the database.

Quote from dandl on November 13, 2021, 4:42 am
Quote from Dave Voorhis on November 12, 2021, 5:55 pm
Quote from dandl on November 12, 2021, 12:21 pm

Indeed, true of every popular programming language. Primitive types -- int, char and so forth -- are often baked into the language as hardware types (at least notionally, though not exclusively), and all other core types are defined using the language and provided in a standard library.

My point is that capability -- to define all useful types from some truly primitive (i.e., either real or virtual hardware, or conceptual starting point with appropriate theoretical basis) set -- is what is essential.

 

So this is the same idea. The set of 9 types forms a reasonable minimal set of virtual hardware supporting a DBMS and associated language, on which to build application types. Each of them:

  • has a reasonable literal form
  • is intrinsically useful and already widely used
  • has a standardised canonical interchange format (most are in JS)
  • is reasonably distinct, i.e. not easily expressed in terms of the others, and/or there are barriers to doing so.

[As an aside, struct is a poor choice for decimal because the struct members have no useful names. Integer is a poor choice for date/time because (a) you also need an epoch (b) it sets arbitrary limits on precision.]

Ah hah!  You don't like my implementation of decimal and timestamp?  You want your own?

Exactly my point.

I don't care a hoot how you implement it as long as it's hidden, but if you build decimal on a visible struct type then the members are visible, and that won't do. Struct is a more plausible choice for datetime, as long as the struct is specifically {year:month:day:seconds}, so to some extent that one is a matter of choice. I think the primitive type plays better because dates are absolutely pervasive in business computing, and it's more a question of why not?

Sorry, you've missed my point. The reason we don't currently see even types like decimal as a canonical type -- and it's a relatively straightforward one, unlike dates and timestamps that seem simple but are a minefield of complexity, exceptions, and varying (and sometimes contradictory) requirements -- is that there is no universal agreement on how they should work. I'll make a decimal that doesn't suit your requirements. Your decimal won't suit mine.

No, actually you're missing mine and no, you won't. My definitions are:

  • real: a number complying with the JSON definition of number
  • decimal: a number complying with the JSON definition of number but without an exponent
  • integer: a number complying with the JSON definition of number but without an exponent or a decimal point.

And we're done. So what decimal do you have in mind?
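The three definitions above can be checked mechanically against the JSON number grammar (RFC 8259). A minimal sketch, where the regular expression is my own rendering of that grammar, not anything prescribed by the proposal:

```python
import re
from typing import Optional

# JSON number grammar (RFC 8259): -? int frac? exp?
JSON_NUMBER = re.compile(
    r'^-?(0|[1-9]\d*)(?P<frac>\.\d+)?(?P<exp>[eE][+-]?\d+)?$'
)

def classify(literal: str) -> Optional[str]:
    """Classify a JSON number literal as integer, decimal, or real."""
    m = JSON_NUMBER.match(literal)
    if m is None:
        return None           # not a JSON number at all
    if m.group('exp'):
        return 'real'         # exponent present
    if m.group('frac'):
        return 'decimal'      # decimal point, no exponent
    return 'integer'          # neither
```

Note that under these definitions every integer is a valid decimal literal and every decimal is a valid real literal; the three differ only in which grammar productions they admit.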

At least with integer and floats you have hardware implementations defining (or imposing) requirements, but outside of that there are no consistent requirements -- and even integer and floats vary -- which is why it's not unusual for SQL DBMSs to have from over a dozen built-in column types (https://docs.oracle.com/cd/A58617_01/server.804/a58241/ch5.htm#417949) to dozens (https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-ver15) or (https://www.postgresql.org/docs/9.5/datatype.html).

Yes, I'm familiar with the situation. That's the kind of mess you finish up with when you have a mixture of historical, storage size, SQL featuritis and oddball DBMS specific types all thrown in together. We definitely don't want to do that.

At the DBMS end I have no objection to adding an expected storage size, but that shouldn't make it a different type. A smallint, short, long or whatever is just an integer with a range constraint. Autoincrement is a DBMS specific attribute, not a type. Types intended to support SQL as a language should be kept separate and built on top of the bedrock types (as indeed they are if you've ever prowled around inside Postgres).
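The "smallint is just an integer with a range constraint" point can be sketched directly: the range is a constraint on values of the one integer type, not a new type. A minimal illustration (class and names are mine, not from any actual DBMS):

```python
class RangedInt:
    """An integer with a range constraint. The constraint, not a new
    type, is what distinguishes it from a plain integer."""

    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi

    def check(self, v: int) -> int:
        """Accept v if it satisfies the constraint, else reject it."""
        if not (self.lo <= v <= self.hi):
            raise ValueError(f"{v} outside [{self.lo}, {self.hi}]")
        return v

# A 'smallint' is then just a constrained integer, not a distinct type:
SMALLINT = RangedInt(-32768, 32767)
```

This is essentially what a Postgres CREATE DOMAIN over integer does: one underlying type, many named constraints.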

Yet another here's-everything-you-need-canonical-set attempt is not what we need, particularly if you've had the rare joy of working in some domain where most of the canonical "business" types are no help and everything winds up being shoe-horned into some type where it doesn't belong, like BLOB.

Such as? I've done a lot of work with numeric series data and geometric data, but they're easily handled. What are you missing?

Better is the means to easily user-define rich user-defined types in the DBMS (most DBMSs now allow user-defined types, but don't make them easy) and provide straightforward means to map them to types on the client side. If you want to provide a default set of nine (or whatever) that's fine, as long as I can choose not to have them or replace them with equivalents, as I see fit.

So if you prefer you can look at my 9 types as the canonical mapping types, that allow data to be transferred reliably from one data model to another and to and from various programming languages.

Done properly, DBMS vs client types are only compatible, not the same. That's because the server side defines types in terms of storage, the client side defines types in terms of presentation, and the middleware layer (if present) defines them in terms of business rules and calculations.

Thus they're actually distinct types, sharing only (where appropriate) convertibility.

Of course. Client code works with programming types, but the TTM mistake is to allow client types to infect the database.
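The "distinct types sharing only convertibility" idea can be sketched as a pair of conversion functions: the server type is a storage representation, the client type a presentation/calculation one, and neither is the other. A minimal illustration, assuming (my choice, not anyone's actual design) that the server stores exact decimals as text:

```python
from decimal import Decimal

# Server-side type: exact decimal stored as text (a storage concern).
# Client-side type: Decimal, used for presentation and calculation.
# They are distinct types; what they share is a conversion each way.

def db_to_client(stored: str) -> Decimal:
    """Convert the storage representation to the client type."""
    return Decimal(stored)

def client_to_db(value: Decimal) -> str:
    """Convert the client type back to the storage representation."""
    return str(value)
```

The round trip preserves the value (including trailing zeros, since Decimal keeps them), which is all "convertibility" demands; nothing forces the two sides to use the same type internally.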

Rather than (probably pointlessly) debate individual points, I'll point out that I am at least somewhat sympathetic to your approach. Though I would expect which types are available by default to be a product implementation concern only, as there isn't any theoretical basis for it. As a theoretical framework for applied implementations, TTM takes exactly the right approach in not prescribing specific types -- beyond the unavoidable boolean -- but it does appropriately prescribe how (Date & Darwen believe) types should be built.

I would expect that (or similar) ability to define any and all types in any post-SQL DBMS, completely independent of whatever types a given product implementation may or may not provide on boot-up.

In short: Provide whatever built-in types you feel are appropriate in your implementation, but they don't belong in a conceptual framework like TTM, and a reasonable product should neither compel their use nor limit the user to only that set as primitives or baseline types.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I think I'm still curious as to what the contributors to this thread (and any lurkers) actually think the answer is to my original question.

So, for some undefined meaning of the word "do": do all scalar values always carry with them, at least conceptually, some identification of the type(s) to which they belong?

Valid answers are:  Yes, No, Don't know, Don't care, It depends, If you want them to, It does not matter, Yes and No, Impossible to say, If you declare it so, ... or any other answer as long as it is of no more than say 5(ish) words.

Personally I think: No.

or at least, I don't think it is useful or needed to conceive that they do (OK, so that is more than 5(ish) words :-( )

Quote from Paul Vernon on November 13, 2021, 7:23 pm

I think I'm still curious as to what the contributors to this thread (and any lurkers) actually think the answer is to my original question.

So, for some undefined meaning of the word "do": do all scalar values always carry with them, at least conceptually, some identification of the type(s) to which they belong?

Valid answers are:  Yes, No, Don't know, Don't care, It depends, If you want them to, It does not matter, Yes and No, Impossible to say, If you declare it so, ... or any other answer as long as it is of no more than say 5(ish) words.

Personally I think: No.

Conceptually, a value either has a known type, or the type is opaque -- you can't see it. Linguistically, the latter is "some value of unknown or arbitrary type."

A value with an opaque type presumably has a known type by some system that references it.

A value without a type is meaningless. It can perhaps be manipulated with other values in some sensible fashion -- e.g., algebraically -- or treated as an abstraction, but ultimately to have any meaning, a value must have a type.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
PreviousPage 5 of 9Next