The Forum for Discussion about The Third Manifesto and Related Matters


Does a database need anything more than a storage-based type system?

Quote from dandl on April 11, 2020, 9:14 am
Quote from Dave Voorhis on April 11, 2020, 1:03 am
Quote from dandl on April 11, 2020, 12:30 am
Quote from Dave Voorhis on April 10, 2020, 4:31 pm
Quote from dandl on April 10, 2020, 11:19 am

TTM describes a language with a sophisticated type system and user-defined operators using those types, with the expectation that those types and operators live in the database, supporting database relvars. Is this a good idea?

I have argued that relations should be treated as templated collections rather than as types in their own right, which would preclude storing RVAs (and this is consistent with the principles of normalisation). Is it a good idea to store typed values in the database that can only be used in conjunction with a particular user-defined type system (also stored in the database)?

Would it not be better to define a general purpose database type system, with enough detail to allow the database to make choices about storage allocation but not presuming or preempting any particular code that might use the data?

SQL already comes close, but has a fair bit of legacy baggage. My preferred list would be: boolean, integer, decimal, real, text, binary, time, enum and UDT.

Time is a point in time, effectively a non-zoned datetime. If you want a timezone, use a UDT.

Enum is a type for which every permitted value is enumerated. In SQL terms, it's probably just text strings.

UDT is a type defined as a record structure with one or more named fields, each of one of the other types. In SQL terms, it's just a group of columns.

Who needs more (in the database)?
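For concreteness, the nine proposed storage types could be sketched as follows (the Python names and the ENUM/UDT encodings here are illustrative assumptions, not part of the proposal):

```python
from enum import Enum, auto

# The nine proposed database storage types (names follow the post).
class StorageType(Enum):
    BOOLEAN = auto()
    INTEGER = auto()
    DECIMAL = auto()
    REAL = auto()
    TEXT = auto()
    BINARY = auto()
    TIME = auto()    # a point in time; effectively a non-zoned datetime
    ENUM = auto()    # every permitted value is enumerated
    UDT = auto()     # a record of named fields, each of another type

# An ENUM column is defined by enumerating its permitted values,
# which in SQL terms are probably just text strings:
status_values = ("yes", "no", "maybe")

# A UDT is just a group of named, typed fields -- in SQL terms,
# a group of columns:
lat_long = {"lat": StorageType.REAL, "long": StorageType.REAL}
```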

That seems like an unnecessarily large and rather domain-specific set -- at least in a computer science sense -- and predefined, unextendable sets of types inevitably run into a need for one more type that's really appropriate for this problem but turns up nowhere else (or in this domain but not that one, or often enough to need it here but not there, etc.)

You lost me. In what sense?

At least that's the case with things I write, but I'm a proponent of "typeful programming" and encouraging use of user-defined types.

In a language like Tutorial D, the primitive types -- BOOLEAN, CHARACTER, RATIONAL, INTEGER -- are a sufficient basis for defining any user-defined type via composition as long as the compositional facilities are sufficiently rich, though I'd be inclined to add some form of Binary Large Object or byte array as a primitive.

Then we are in total agreement, except that I have added TIME and DECIMAL. My personal preference is to store only one numeric type (as per JSON), so having 3 is a nod to SQL and TD, 2 is just the wrong number (pick 1 or 3). My UDTs are simple compositions. Where else do you think we differ?

You can interface with any other language by providing a correspondence with (or mapping from) database language primitives to some set of types in the interfacing language. Most languages natively -- or at least easily -- support equivalents to the Tutorial D primitives.

Isn't that what I said?

User-defined types in the database can then be arbitrarily complex. That does mean interacting with other languages requires reduction to primitives, such that given some D-side value v of some given user-defined type V with a component FOO, itself a user-defined type with a component BAR, itself a user-defined type with a component BAZ, itself a user-defined type which has a CHARACTER component called STRING, you may wind up with something like THE_STRING(THE_BAZ(THE_BAR(THE_FOO(v))))[1] to reach the "root" primitive, but that actually works quite well. It's fairly typical to build rich data-processing capabilities on the database side, rich UI (or other system interfacing) capabilities on the client side, and the two are otherwise decoupled. Coupling is entirely via mapped primitives.

In other words, one set of rich data-oriented user-defined types lives on the database side, a different set of rich UI- or interface-oriented user-defined types lives on the client side, and there is no (or little) overlap between user-defined types on either side. The interface between the client side and the database side is via mapped primitives rather than user-defined types.
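A toy sketch of that reduction to primitives, using the hypothetical FOO/BAR/BAZ types from the example above (Python dataclasses standing in for D-side user-defined types):

```python
from dataclasses import dataclass

# Toy mirror of the D-side nesting described above (FOO, BAR, BAZ and V
# are hypothetical, as in the post): each level wraps the next, and the
# client reaches the root CHARACTER primitive by unwrapping.

@dataclass(frozen=True)
class Baz:
    string: str  # the root CHARACTER component

@dataclass(frozen=True)
class Bar:
    baz: Baz

@dataclass(frozen=True)
class Foo:
    bar: Bar

@dataclass(frozen=True)
class V:
    foo: Foo

v = V(Foo(Bar(Baz("hello"))))

# The Python equivalent of THE_STRING(THE_BAZ(THE_BAR(THE_FOO(v)))):
root = v.foo.bar.baz.string
```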

Yes. And your point is? Sounds like furious agreement to me (but I will stick to insisting on TIME as a type).

When there is some overlap, it's relatively straightforward to reconstruct the relatively few shared types on each side.

--
[1] This is perhaps exaggerated, for the sake of illustration. Code I've written has never nested this deep. One or two levels is typical.

Short answer (because it's late) is that user-defined types are absolutely mandatory. Only having simple records is entirely insufficient. Ideally, and at a minimum, I expect to see a rich type system. TTM's is fine (debate over that is elsewhere; let's not repeat it here) as is any system based on algebraic sum-of-product types. Rich polymorphism is a necessity; e.g., multiple dispatch, generics, etc.

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries; otherwise it does not.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

Of course, in the application but not in the database. All your examples are easily dealt with using the types I gave, plus application-side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

Andl - A New Database Language - andl.org
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries; otherwise it does not.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

Of course, in the application but not in the database. All your examples are easily dealt with using the types I gave, plus application-side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

If the application is doing all the type processing, then the "database" is merely a (transactional?) persistence engine. That sounds exactly like the Berkeley DB and similar key/value stores.

If that's the case, the only type needed in the database is byte array. Why provide others?

Instead of predefining a fixed set of types, provide a mechanism to let the user define their own database/storage-compatible primitive types. The type definer is obligated to provide a means to marshal a value of that type to a byte array representation and a means to unmarshal a byte array representation back to a native value. For composite types, the marshal/unmarshal operations can be provided automatically, by virtue of them being defined for primitive types.

That's how the Rel internals work, in fact.
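A minimal sketch of that scheme (the names and wire format here are my own assumptions, not Rel's actual internals): each primitive type registers a marshal/unmarshal pair to and from byte arrays, and composite types get theirs derived automatically from their components.

```python
import struct

# Registry of primitive types: name -> (marshal, unmarshal) functions.
PRIMITIVES = {}

def register(name, to_bytes, from_bytes):
    PRIMITIVES[name] = (to_bytes, from_bytes)

# The type definer supplies the byte-array representation for each
# primitive type.
register("INTEGER",
         lambda v: struct.pack(">q", v),
         lambda b: struct.unpack(">q", b)[0])
register("CHARACTER",
         lambda v: v.encode("utf-8"),
         lambda b: b.decode("utf-8"))

def marshal_composite(schema, value):
    # schema: list of (field, primitive-name) pairs; value: dict of fields.
    # Each field is length-prefixed so unmarshalling can find the boundaries.
    parts = []
    for field, prim in schema:
        payload = PRIMITIVES[prim][0](value[field])
        parts.append(struct.pack(">i", len(payload)) + payload)
    return b"".join(parts)

def unmarshal_composite(schema, data):
    value, offset = {}, 0
    for field, prim in schema:
        (length,) = struct.unpack_from(">i", data, offset)
        offset += 4
        value[field] = PRIMITIVES[prim][1](data[offset:offset + length])
        offset += length
    return value

# A composite type is just its schema; its marshalling is derived.
point = [("x", "INTEGER"), ("label", "CHARACTER")]
blob = marshal_composite(point, {"x": 42, "label": "origin"})
```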

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

All of my examples and all of your examples can be implemented with exactly one type: byte array.

Or string.

If you're going to admit more than byte array or string but otherwise hobble or preclude user-defined types, then you've created a system that implies typeful programming but restricts it to the DBMS author who presumes to know better than I do about what types I need and want. Forcing me to shoehorn every, say, Lat/Long or Complex into a pair of columns, or every polygon into a who-knows-what...?

No.

That takes us back to dBASE III levels of type capability.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries; otherwise it does not.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

Of course, in the application but not in the database.

This seems to be a distinction between the PossRep (how the application handles types) and the PhysRep (byte arrays 'in the database').

All your examples are easily dealt with using the types I gave, plus application-side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

 

It's true I've never seen SQL's User-defined types in the wild. That everything is numeric is why SQL is perfectly happy to add a length to a price and store the sum in a weight. The "not widely accepted" argument would seem to shut down any DBMS or relational model to being only SQL-compatible.

A half-decent DBMS would support the PhysReps being the same numeric type, but differentiate by PossRep -- i.e. by nominative type. And that type differentiation has to be held in the DBMS/catalogue. If the "other languages" are type aware, they can make use of the differentiation. If not, I suggest applications in those languages be forbidden from directly updating the database (instead update via APIs).

For a typeful language, we just need the ability to map/serialise/represent any type within the language to/from a byte array in the database. If the DBMS physically represents some types in formats "other [less typeful] languages" recognise, that's a bonus (and record the format in the catalogue). Otherwise, represent as an opaque B-SOB (Binary Small Object).

Quote from AntC on April 11, 2020, 11:31 am
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries; otherwise it does not.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

Of course, in the application but not in the database.

This seems to be a distinction between the PossRep (how the application handles types) and the PhysRep (byte arrays 'in the database').

I don't understand this. I don't think it has anything to do with anything I've been proposing.

All your examples are easily dealt with using the types I gave, plus application-side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

 

It's true I've never seen SQL's User-defined types in the wild. That everything is numeric is why SQL is perfectly happy to add a length to a price and store the sum in a weight.

That's an incredibly subtle and difficult problem to solve in the general case. It's perfectly valid to divide a length by a price and store the result in a value. How would the database know? This goes well beyond a simple type system.

The "not widely accepted" argument would seem to shut down any DBMS or relational model to being only SQL-compatible.

There are lots of 'widely accepted' ways of storing data that are not SQL-compatible.

A half-decent DBMS would support the PhysReps being the same numeric type, but differentiate by PossRep -- i.e. by nominative type. And that type differentiation has to be held in the DBMS/catalogue. If the "other languages" are type aware, they can make use of the differentiation. If not, I suggest applications in those languages be forbidden from directly updating the database (instead update via APIs).

I don't think 'PossRep' is helpful here; it carries too much baggage. I do see value in distinguishing a 'storage type' from an 'intended usage type', but it's a difficult thing to get right. A TEXT string might be a URL, an email address, a text label, a customer name/address, a markdown document, a JSON document, a "yes,no,maybe" enum, etc. Application programmers are used to dealing with those distinctions, but if you propose a general mechanism enforced by the database across multiple application clients, I would be interested to know how that works.

For a typeful language, we just need the ability to map/serialise/represent any type within the language to/from a byte array in the database. If the DBMS physically represents some types in formats "other [less typeful] languages" recognise, that's a bonus (and record the format in the catalogue). Otherwise, represent as an opaque B-SOB (Binary Small Object).

This makes no sense. I'm proposing a type system and query language that are a superset of SQL/JSON and you want raw binary? How does that work?

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on April 11, 2020, 11:25 am
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries; otherwise it does not.

If user-defined types are supported, the only built-in types needed for a reasonably performant language are (per Tutorial D) BOOLEAN, CHARACTER, RATIONAL or (preferably) FLOAT to map to standard hardware IEEE 754, INTEGER (mapped to a canonical long or equivalent), and (ideally) a byte array, though CHARACTER will suffice via encoding (if necessary).

With user-defined type support, additional types -- TIMESTAMP, DECIMAL, DATE -- can be provided where appropriate via a non-builtin, non-primitive standard library or equivalent.

I disagree about your reasons: I see no reason to omit any of the ones I listed, they are all widely available, widely used and trivial to implement. The question is not whether you would leave any out, the question is: do you agree that they are sufficient?

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a Cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures (a value/unit pair)? Etc.

Of course, in the application but not in the database. All your examples are easily dealt with using the types I gave, plus application-side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

If the application is doing all the type processing, then the "database" is merely a (transactional?) persistence engine. That sounds exactly like the Berkeley DB and similar key/value stores.

Pardon? Where did you read that? I'm proposing a query language and rich type system that are a superset of SQL/JSON, and you equate that to a key/value store? I want to keep user-written code out of the database type system, that's all.

If that's the case, the only type needed in the database is byte array. Why provide others?

Instead of predefining a fixed set of types, provide a mechanism to let the user define their own database/storage-compatible primitive types. The type definer is obligated to provide a means to marshal a value of that type to a byte array representation and a means to unmarshal a byte array representation back to a native value. For composite types, the marshal/unmarshal operations can be provided automatically, by virtue of them being defined for primitive types.

That's how the Rel internals work, in fact.

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

All of my examples and all of your examples can be implemented with exactly one type: byte array.

Or string.

And every program you've ever designed or written can be implemented in hand-coded machine language. So what?

If you're going to admit more than byte array or string but otherwise hobble or preclude user-defined types, then you've created a system that implies typeful programming but restricts it to the DBMS author who presumes to know better than I do about what types I need and want. Forcing me to shoehorn every, say, Lat/Long or Complex into a pair of columns, or every polygon into a who-knows-what...?

No.

That takes us back to dBASE III levels of type capability.

I've proposed fully supporting user-defined types, but keeping user-written executable code out of the database type system. What do you propose?

Andl - A New Database Language - andl.org
Quote from dandl on April 11, 2020, 7:24 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

 

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

I don't recall ever needing fractions with denominators other than 1, 2 and 10, and INT, DEC and REAL have those covered.

Naturally not, since you don't have them.  To determine their worth, you have to talk to Python or Lisp programmers (who actually have access to them as a standard part of their languages) and see what use they make of them.

Quote from johnwcowan on April 13, 2020, 12:31 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

For my purposes of logical representation to users, inexact real numbers are just exact real numbers whose operators round the result they would otherwise give, to keep the size manageable.  Floats are technically an exact type that just can't store all values: what you put in by plain assignment is exactly what you get out; it is the calculations that are lossy.  As such, in my system, a floating-point type is one of the valid physical representations behind the scenes for a Fraction.

Of course, if one wants their floating-point number to be a disjoint type from fraction rather than a proper subset, that's a different matter, and perfectly reasonable.
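The point above can be illustrated with Python's Fraction type: every float is an exact value (a dyadic rational), so converting it to a Fraction is lossless; it is the arithmetic on floats that rounds.

```python
from fractions import Fraction

# Assignment is exact: the float 0.5 stores exactly the value written.
assert Fraction(0.5) == Fraction(1, 2)

# 0.1 is stored as the nearest representable dyadic rational -- still
# an exact value, just not 1/10.
assert Fraction(0.1) != Fraction(1, 10)

# Calculations are where the loss happens: float addition rounds...
assert 0.1 + 0.2 != 0.3

# ...whereas exact fraction arithmetic does not.
assert Fraction(1, 3) * 3 == 1
```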

 

Quote from dandl on April 13, 2020, 12:27 am
Quote from Dave Voorhis on April 11, 2020, 11:25 am
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types it can support queries; without them it cannot.

 

Not sufficient. Types need to be defined on an application-by-application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures? (a value/unit pair) Etc.

Of course, in the application but not in the database. All your examples are easily dealt with using the types I gave, plus application side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?
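What that "application-side code" could look like can be sketched briefly. Assuming a hypothetical UDT stored as two plain decimal fields, the mapping to and from an application-side Lat/Long class is a few lines (names and the row shape here are illustrative, not from any real driver):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LatLong:
    """Application-side type; the database stores two plain decimals."""
    lat: float
    lon: float

# Hypothetical row fetched from a UDT with fields (lat, lon).
row = {"lat": 52.52, "lon": 13.405}

def from_row(r: dict) -> LatLong:
    return LatLong(r["lat"], r["lon"])

def to_row(p: LatLong) -> dict:
    return {"lat": p.lat, "lon": p.lon}

# Round-tripping through the storage representation is lossless.
assert from_row(to_row(from_row(row))) == from_row(row)
```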

If the application is doing all the type processing, then the "database" is merely a (transactional?) persistence engine. That sounds exactly like the Berkeley DB and similar key/value stores.

Pardon? Where did you read that? I'm proposing a query language and a rich type system that are a superset of SQL/JSON, and you equate that to a key/value store? I want to keep user-written code out of the database type system, that's all.

If that's the case, the only type needed in the database is byte array. Why provide others?

Instead of predefining a fixed set of types, provide a mechanism to let the user define their own database/storage-compatible primitive types. The type definer is obligated to provide a means to marshal a value of that type to a byte-array representation and a means to unmarshal a byte-array representation back to a native value. For composite types, the marshal/unmarshal operations can be provided automatically, by virtue of their being defined for the primitive types.

That's how the Rel internals work, in fact.
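A minimal sketch of that mechanism, assuming a simple registry of marshal/unmarshal pairs per primitive type (the names and registry shape here are illustrative, not Rel's actual internals):

```python
import struct

# Each primitive type registers a (marshal, unmarshal) pair to/from bytes.
# Composite types could then derive theirs by composing these.
PRIMITIVES = {
    "integer": (lambda v: struct.pack(">q", v),
                lambda b: struct.unpack(">q", b)[0]),
    "text":    (lambda v: v.encode("utf-8"),
                lambda b: b.decode("utf-8")),
}

def marshal(type_name: str, value) -> bytes:
    return PRIMITIVES[type_name][0](value)

def unmarshal(type_name: str, data: bytes):
    return PRIMITIVES[type_name][1](data)

# Round-trips are lossless for registered types.
assert unmarshal("integer", marshal("integer", -42)) == -42
assert unmarshal("text", marshal("text", "héllo")) == "héllo"
```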

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language, you deny access by other languages. Third, you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them: a lingua franca of widely accepted types is enough, in the database and in data interchange.

All of my examples and all of your examples can be implemented with exactly one type: byte array.

Or string.

 

If you're going to admit more than byte array or string but otherwise hobble or preclude user-defined types, then you've created a system that implies typeful programming but restricts it to the DBMS author who presumes to know better than I do about what types I need and want. Forcing me to shoehorn every, say, Lat/Long or Complex into a pair of columns, or every polygon into a who-knows-what...?

No.

That takes us back to dBASE III levels of type capability.

I've proposed fully supporting user-defined types, but keeping user-written executable code out of the database type system. What do you propose?

You've rejected supporting enough to block adding a price to a weight, or storing the sum in a length. (I'm not saying I always/for all types want blocks like that, but I certainly want it sometimes.) Then you're not "fully supporting user-defined types". And if that's what you mean by the title's "a storage-based type system", then yes, a database (DBMS/DML) does need more than that.

I think you've limited yourself in one of your replies above: "any RDBMS today", " a superset of SQL/JSON". What you're proposing might match what's available today in the mainstream. That's not adequate.

What I propose is not to build a dedicated DML/database-specific type system/DBMS. I propose using an existing, sufficiently powerful language with its type system, and building on D-ness. In terms of the persistence layer, that type system must be powerful enough to marshal values between the language and the persistence layer in a statically type-safe way. If that means no programs in other languages/type systems can access the data without a mapping layer, then so be it.

Quote from AntC on April 13, 2020, 5:20 am
Quote from dandl on April 13, 2020, 12:27 am

If you're going to admit more than byte array or string but otherwise hobble or preclude user-defined types, then you've created a system that implies typeful programming but restricts it to the DBMS author who presumes to know better than I do about what types I need and want. Forcing me to shoehorn every, say, Lat/Long or Complex into a pair of columns, or every polygon into a who-knows-what...?

No.

That takes us back to dBASE III levels of type capability.

I've proposed fully supporting user-defined types, but keeping user-written executable code out of the database type system. What do you propose?

You've rejected supporting enough to block adding a price to a weight, or storing the sum in a length. (I'm not saying I always/for all types want blocks like that, but I certainly want it sometimes.) Then you're not "fully supporting user-defined types". And if that's what you mean by the title's "a storage-based type system", then yes, a database (DBMS/DML) does need more than that.

I think you've limited yourself in one of your replies above: "any RDBMS today", " a superset of SQL/JSON". What you're proposing might match what's available today in the mainstream. That's not adequate.

What I propose is not to build a dedicated DML/database-specific type system/DBMS. I propose using an existing, sufficiently powerful language with its type system, and building on D-ness. In terms of the persistence layer, that type system must be powerful enough to marshal values between the language and the persistence layer in a statically type-safe way. If that means no programs in other languages/type systems can access the data without a mapping layer, then so be it.

Yes, that.

I agree.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org