The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

Does a database need anything more than a storage-based type system?

Quote from johnwcowan on April 13, 2020, 12:31 am
Quote from dandl on April 11, 2020, 7:24 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

I don't recall ever needing fractions with denominators other than 1, 2 and 10, and INT, DEC and REAL have those covered.

Naturally not, since you don't have them.  To determine their worth, you have to talk to Python or Lisp programmers (who actually have access to them as a standard part of their languages) and use they make of them.

I've written code in all those languages, and others beside. I've just never come across a problem for which that was the solution.

But I'm happy to be educated. Please suggest one.

Andl - A New Database Language - andl.org
Quote from dandl on April 13, 2020, 7:55 am
Quote from johnwcowan on April 13, 2020, 12:31 am
Quote from dandl on April 11, 2020, 7:24 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

I don't recall ever needing fractions with denominators other than 1, 2 and 10, and INT, DEC and REAL have those covered.

Naturally not, since you don't have them.  To determine their worth, you have to talk to Python or Lisp programmers (who actually have access to them as a standard part of their languages) and use they make of them.

I've written code in all those languages, and others beside. I've just never come across a problem for which that was the solution.

But I'm happy to be educated. Please suggest one.

If you're developing tools for others, "I've never seen an x therefore x is unnecessary" or "I've only ever needed x so x is all you need" is always the wrong approach.

Good language and system design follows from having the flexibility to handle unanticipated requirements.

Bad language and system design is characterised by accommodating the domain experience of the designer and nothing more.

Quote from AntC on April 13, 2020, 5:20 am
Quote from dandl on April 13, 2020, 12:27 am
Quote from Dave Voorhis on April 11, 2020, 11:25 am
Quote from dandl on April 11, 2020, 10:57 am

Now I'm confused. There are plenty of languages out there with user-defined types, etc, etc supported by user-written code. I'm proposing a specific set of 9 types for the database, including user-defined ENUM and STRUCT, but no code. No selectors, no operators, and in fact no type declarations and no collections. That leaves the job of application type system and the database interface layer as a job for code, but that code is written in a GP language of your choice. The database simply promises that you can access the data natively in your chosen language.

That's not a database, that's a persistence layer with too many types. If you're defining a persistence layer, you need only one type: byte array.

No, that's a database, just as surely as any RDBMS today. With types, it supports queries, otherwise it does not.

Not sufficient. Types need to be defined on an application by application basis, flexibly based on requirements and composed from canonical primitives, not by some arbitrary domain-specific predetermination. That said, what about complex numbers? Geometric shapes? Points on a cartesian coordinate system? Lat/Long? Arbitrary-precision decimals? Fixed-point decimals? Currency? Measures? (a value/unit pair) Etc.

Of course, in the application but not in the database. All your examples are easily dealt with using the types I gave, plus application side code as needed. I gave them all as examples. Why do you say they have to be in the database, as user-defined types with supporting code?

If the application is doing all the type processing, then the "database" is merely a (transactional?) persistence engine. That sounds exactly like the Berkeley DB and similar key/value stores.

Pardon? Where did you read that? I'm a query language and rich types system that are a superset of SQL/JSON, and you equate that to a key/value store? I want to keep user-written code out of the database type system, that's all.

If that's the case, the only type needed in the database is byte array. Why provide others?

Instead of predefining a fixed set of types, provide a mechanism to let the user define their own database/storage-compatible primitive types. The type definer is obligated to provide a means to martial a value of that type to a byte array representation and a means to un-martial a byte array representation back to a native value. For composite types, the martial/un-martial operations can be provided automatically, by virtue of them being defined for primitive types.

That's how the Rel internals work, in fact.

There are strong motivations for my approach. First, there are no implementations of 'types in the database', or if there are they are rare and not widely accepted. Second, if you tie the database to supporting code in a particular language you deny access by other languages. Third,you invite a myriad of problems with versioning as an application and its types evolve over time. Fourth, you don't need them. A lingua franca of widely accepted types is enough, in the database and in data interchange.

All of my examples and all of your examples can be implemented with exactly one type: byte array.

Or string.

If you're going to admit more than byte array or string but otherwise hobble or preclude user-defined types, then you've created a system that implies typeful programming but restricts it to the DBMS author who presumes to know better than I do about what types I need and want. Forcing me to shoehorn every, say, Lat/Long or Complex into a pair of columns, or every polygon into a who-knows-what...?

No.

That takes us back to dBASE III levels of type capability.

I've proposed fully supporting user-defined types, but keeping user-written executable code out of the database type system. What do you propose?

You've rejected supporting enough to block adding a price to a weight, storing the sum in a length. (I'm not saying I always/for all types want blocks like that, I certainly want it sometimes.) Then you're not "fully  supporting user-defined types". And if that's what you mean by the title's "a storage-based type system", then yes a database (DBMS/DML) does need more than that.

No, I've proposed a type system with user-defined types but without related application code executed in the database. You get to define types as a set of values, as per RM Pre 1. You just don't get to write your own code to put in selectors for the database to execute per RM Pre 4. You also don't get to write your own library functions to use in queries (which as per OO VSS 2 are not part of the type system).

I think you've limited yourself in one of your replies above: "any RDBMS today", " a superset of SQL/JSON". What you're proposing might match what's available today in the mainstream. That's not adequate.

What I propose is not to build a dedicated DML/database-specific type system/DBMS. I propose using an existing sufficiently powerful language with its type system, and build on D-ness. In terms of the persistence layer, that type system must be powerful enough to statically type-safely marshall values from/to the language to/from the persistence layer. If it means no programs in other languages/type systems can access that data without a mapping layer, then so be it.

That's my plan B. Plan A is:

• A query language based on Andl ERA (as outlined), implemented as an extension to a GP language
• A type system as per this proposal (9 types including user-defined STRUCT and ENUM), compatible with the chosen GPL
• A function library compatible with the chosen GPL and SQL.

This can be executed locally or cross-compiled to SQL. If the lack of type system or library features is a compelling limitation, then Plan B:

• Add an enhanced ability to define types and related library functions accessible to queries.

Note that Plan B precludes compiling to SQL, but it might still be possible to target a server that supports other languages. If not, you're on your own. Good luck with that.

Given that no known DBMS provides facilities per Plan B, it's hard to know just how important it really is. Which is kind of why this proposal to not do it.

Andl - A New Database Language - andl.org
Quote from dandl on April 13, 2020, 8:56 am

... Plan A is:

• A query language based on Andl ERA (as outlined), implemented as an extension to a GP language
• A type system as per this proposal (9 types including user-defined STRUCT and ENUM), compatible with the chosen GPL
• A function library compatible with the chosen GPL and SQL.

This can be executed locally or cross-compiled to SQL. If the lack of type system or library features is a compelling limitation, then Plan B:

• Add an enhanced ability to define types and related library functions accessible to queries.

Note that Plan B precludes compiling to SQL, but it might still be possible to target a server that supports other languages. If not, you're on your own. Good luck with that.

Given that no known DBMS provides facilities per Plan B, it's hard to know just how important it really is. Which is kind of why this proposal to not do it.

I'm afraid I'm understanding this less the more you write about it.

At first it sounded like a lightweight storage engine with a set of predefined types. That's unduly limiting for my purposes, but I can kind of see the justification.

Now I don't know what it is, particularly with descriptions like "I've proposed a type system with user-defined types but without related application code executed in the database. You get to define types as a set of values, as per RM Pre 1. You just don't get to write your own code to put in selectors for the database to execute per RM Pre 4. You also don't get to write your own library functions to use in queries (which as per OO VSS 2 are not part of the type system)."

What is "application code executed in the database"?

Databases don't execute, languages execute. DBMSs can host languages that execute inside the DBMS. Is that what you mean?

If you "don't get to write your own code to put in selectors for the database to execute" (again, "for the database to execute" doesn't make sense), which I presume means you can't have selectors (?), how do you select values of types?

I get the feeling there's maybe a sound idea behind this, but it's not clear what it is.

As the thread went on, it sounded like you were planning to expose what is or could be the internals of a database language -- i.e., the core library -- in its implementation language, along with a predefined set of types. But that isn't it?

It does sound like queries would be unpleasantly restricted. No use of library functions in queries means queries that are largely unusable. No date functions? No string functions? Just -- I presume -- testing for equality?

Quote from Dave Voorhis on April 13, 2020, 8:16 am
Quote from dandl on April 13, 2020, 7:55 am
Quote from johnwcowan on April 13, 2020, 12:31 am
Quote from dandl on April 11, 2020, 7:24 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

I don't recall ever needing fractions with denominators other than 1, 2 and 10, and INT, DEC and REAL have those covered.

Naturally not, since you don't have them.  To determine their worth, you have to talk to Python or Lisp programmers (who actually have access to them as a standard part of their languages) and use they make of them.

I've written code in all those languages, and others beside. I've just never come across a problem for which that was the solution.

But I'm happy to be educated. Please suggest one.

If you're developing tools for others, "I've never seen an x therefore x is unnecessary" or "I've only ever needed x so x is all you need" is always the wrong approach.

Good language and system design follows from having the flexibility to handle unanticipated requirements.

Bad language and system design is characterised by accommodating the domain experience of the designer and nothing more.

Adding features to a product just because you can is always a bad idea. Adding features to a product just someone thought it might 'add flexibility' is an even worse idea.

The basic rule is: add features to software for others there is a need, it solves a meaningful problem, people are going to use it. If in doubt, leave it out. YAGNI.

There are no use cases for fractional numbers that I know of. Someone (maybe the BDFL) thought it would be a good idea and whipped up some code, and nobody had the heart to tell him that's a stupid idea. It's there because it's there, not because anyone wants it or needs it.

Andl - A New Database Language - andl.org
Quote from dandl on April 13, 2020, 11:21 am
Quote from Dave Voorhis on April 13, 2020, 8:16 am
Quote from dandl on April 13, 2020, 7:55 am
Quote from johnwcowan on April 13, 2020, 12:31 am
Quote from dandl on April 11, 2020, 7:24 am
Quote from Darren Duncan on April 11, 2020, 2:42 am

I find that for core numeric types, 2 is exactly the right number, where I have Integer and Fraction as those 2.  Both of these are exact and unlimited precision and scale.

Well, in my view the question of integer and fraction vs. just fraction is not so important.  But when you need inexact numbers (e.g. floats) you really do need them, especially to express measurements (which unlike counts are inherently inexact).  Fraction arithmetic is a fine thing, but it can get expensive too.  And people who deal in complex numbers almost always want complex floats, too.  So:

Exact integers
Exact ratios
Inexact real numbers (to a certain precision and accuracy)
Inexact complex numbers (ditto)

I don't recall ever needing fractions with denominators other than 1, 2 and 10, and INT, DEC and REAL have those covered.

Naturally not, since you don't have them.  To determine their worth, you have to talk to Python or Lisp programmers (who actually have access to them as a standard part of their languages) and use they make of them.

I've written code in all those languages, and others beside. I've just never come across a problem for which that was the solution.

But I'm happy to be educated. Please suggest one.

If you're developing tools for others, "I've never seen an x therefore x is unnecessary" or "I've only ever needed x so x is all you need" is always the wrong approach.

Good language and system design follows from having the flexibility to handle unanticipated requirements.

Bad language and system design is characterised by accommodating the domain experience of the designer and nothing more.

Adding features to a product just because you can is always a bad idea. Adding features to a product just someone thought it might 'add flexibility' is an even worse idea.

The basic rule is: add features to software for others there is a need, it solves a meaningful problem, people are going to use it. If in doubt, leave it out. YAGNI.

There are no use cases for fractional numbers that I know of. Someone (maybe the BDFL) thought it would be a good idea and whipped up some code, and nobody had the heart to tell him that's a stupid idea. It's there because it's there, not because anyone wants it or needs it.

Adding arbitrary features without requirements is just gilding, and is indeed a bad idea. Providing flexibility to meet future requirements isn't a feature but a characteristic -- and sometimes a philosophy -- and is always a good idea.

"Here's everything you'll ever need" was indeed the dBASE III (and friends) approach, and there's a good reason such speciality languages gave way to general-purpose languages.

Quote from Dave Voorhis on April 13, 2020, 9:35 am

I get the feeling there's maybe a sound idea behind this, but it's not clear what it is.

Somewhere during the very first few years of my presence on the discussion list, I wrote almost exactly the same thing to P C.

Quote from Dave Voorhis on April 13, 2020, 8:16 am

If you're developing tools for others, "I've never seen an x therefore x is unnecessary" or "I've only ever needed x so x is all you need" is always the wrong approach.

Codd himself wrote down exactly the same argument against relations-as-domains in the very first paper.

If normalization as described above is to be applicable, the unnormalized collection of relations must satisfy the following conditions :
(1) The graph of interrelationships of the nonsimple domains is a collection of trees.
(2) No primary key has a component domain which is nonsimple.
The writer knows of no application which would require any relaxation of these conditions.

Quote from Erwin on April 13, 2020, 1:02 pm
Quote from Dave Voorhis on April 13, 2020, 8:16 am

If you're developing tools for others, "I've never seen an x therefore x is unnecessary" or "I've only ever needed x so x is all you need" is always the wrong approach.

Codd himself wrote down exactly the same argument against relations-as-domains in the very first paper.

If normalization as described above is to be applicable, the unnormalized collection of relations must satisfy the following conditions :
(1) The graph of interrelationships of the nonsimple domains is a collection of trees.
(2) No primary key has a component domain which is nonsimple.
The writer knows of no application which would require any relaxation of these conditions.

There are two absolutely consistent features of truly revolutionary ideas.

1. They are initially rejected and resisted by highly competent people who are experts in the state of the art and the application of established knowledge
2. About 99% of the time they are wrong, and rejection and resistance is what they need and deserve.

Due to recency bias you will easily be able to recount cases where the revolutionary idea was right: Newton, Galileo, Einstein, Schrödinger, etc; continental drift and Helicobacter pylori are two in my personal recollection.

You will not so easily be able to identify instances where it was wrong but there are far more, countless thousands of them. In my view this is one of the wrong ones, and I think I have about a 99% chance of being right.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on April 13, 2020, 9:35 am
Quote from dandl on April 13, 2020, 8:56 am

... Plan A is:

• A query language based on Andl ERA (as outlined), implemented as an extension to a GP language
• A type system as per this proposal (9 types including user-defined STRUCT and ENUM), compatible with the chosen GPL
• A function library compatible with the chosen GPL and SQL.

This can be executed locally or cross-compiled to SQL. If the lack of type system or library features is a compelling limitation, then Plan B:

• Add an enhanced ability to define types and related library functions accessible to queries.

Note that Plan B precludes compiling to SQL, but it might still be possible to target a server that supports other languages. If not, you're on your own. Good luck with that.

Given that no known DBMS provides facilities per Plan B, it's hard to know just how important it really is. Which is kind of why this proposal to not do it.

I'm afraid I'm understanding this less the more you write about it.

At first it sounded like a lightweight storage engine with a set of predefined types. That's unduly limiting for my purposes, but I can kind of see the justification.

I don't know whether it's because I don't have it clear in my mind, I'm not writing it well, or you are reading into it what I didn't write.

I don't care about 'lightweight'. What I'm exploring is how to adapt the ideas in TTM to fit with a real world out there that now has a rich mixture of GP languages, SQL and NoSQL data stores, and a myriad of interesting data sources, but very little interest in a 'D' as defined. I'm interested in finding something people might actually want to use.

So my proposal is about handing over most of TTM to some existing GP 'host' language while retaining a core of key features as a language extension. I'm suggesting that should include:

1. A version of the RA, roughly at the level of SQL DQL, with all attribute computations (Select, New Value, Aggregation) farmed out to a set of library functions in the host language
2. A type system integrated with the host language, roughly as capable as TTM but scalars only.

A type system designed as a superset of SQL/JSON would allow queries to be cross-compiled into SQL and executed on a RDBMS. A type system with "selector operator S" as per RM Pre 4a would rule out this possibility, but could be implemented using a local storage engine.

Now I don't know what it is, particularly with descriptions like "I've proposed a type system with user-defined types but without related application code executed in the database. You get to define types as a set of values, as per RM Pre 1. You just don't get to write your own code to put in selectors for the database to execute per RM Pre 4. You also don't get to write your own library functions to use in queries (which as per OO VSS 2 are not part of the type system)."

What is "application code executed in the database"?

Databases don't execute, languages execute. DBMSs can host languages that execute inside the DBMS. Is that what you mean?

Being a bit picky? A DBMS as per TTM has a catalog, and if there are instances of user-defined types in the database then any "Selector Operator S" must be stored in the catalog and executed by the DBMS when new values are created. If the types are user-defined that means application code being executed by the DBMS. That has implications, and losing the ability to compile queries into SQL is one of them.

If you "don't get to write your own code to put in selectors for the database to execute" (again, "for the database to execute" doesn't make sense), which I presume means you can't have selectors (?), how do you select values of types?

I get the feeling there's maybe a sound idea behind this, but it's not clear what it is.

As the thread went on, it sounded like you were planning to expose what is or could be the internals of a database language -- i.e., the core library -- in its implementation language, along with a predefined set of types. But that isn't it?

It does sound like queries would be unpleasantly restricted. No use of library functions in queries means queries that are largely unusable. No date functions? No string functions? Just -- I presume -- testing for equality?

I would expect all those as library functions, and hundreds more. TEXT and TIME are standard types, so you get the full gamut of ways to create and manipulate values, based on the existing standard library for the host language. My aim would be to 'sanitise' the library functions so you can use the same queries with (say) Java and C# and JavaScript as host languages, without getting tied up in platform detail differences. But maybe that's too hard.

However if you want your query to run on a RDBMS, you can only use the type and library function subset that can be supported by the targeted SQL.

BTW there is an interesting parallel with LINQ. Mostly what we all use is the `IEnumerable` interface, which gives access to something like 95% of the standard library functions, as well as the ability to write your own. Alternatively there is the `IQueryable` interface which compiles into SQL and restricts the range of types and functions to what can be cross-compiled. Perhaps that's a better model to go after.

Andl - A New Database Language - andl.org