The Forum for Discussion about The Third Manifesto and Related Matters


Where do y'all hang out?

Quote from Erwin on December 31, 2022, 10:29 pm
Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

Dude, please, learn the RM.  The "not amenable" is a DIRECT consequence of people not being willing to let the "type system" be anything more than "just numbers, dates and strings".

I know the RM well enough for all practical purposes. Otherwise I wouldn't have written this: http://www.andl.org/2021/07/formal-definitions-for-an-extended-relational-algebra/. Feel free to critique.

The key insight of TTM is the interplay between the RM and the attribute type system. The key innovation (over SQL) is the inclusion of a struct type and a relation type in that type system, but that's only important if (a) you already have a data model in mind and (b) you need a full blown programming language to operate on it. Most business applications store values of just 6 types (bool, int, decimal, float, date/time and string), which SQL already handles, although struct would certainly be a nice addition. The only place other types are really needed is in the language that operates on that data, and available GP languages do that well enough.

My post was about being able to query (not update) a wide variety of data sources already in existence interactively or in tiny scripts, not about the needs of major application development. For that purpose those 6 types are enough.
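As a rough illustration of the kind of read-only, ad hoc querying described above, here is a minimal Python sketch that loads CSV text into an in-memory SQLite table and queries it with SQL. The CSV content, the table name `files` and the helper `load_csv` are all made up for the example; it shows one standard-library route, not a claim about how any particular tool does it.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a downloaded dataset.
CSV_TEXT = """name,size,modified
report.txt,1200,2022-12-30
notes.md,300,2022-12-31
data.csv,56000,2022-12-31
"""

def load_csv(conn, table, text):
    """Create `table` from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)

conn = sqlite3.connect(":memory:")
load_csv(conn, "files", CSV_TEXT)

# Values arrive as strings, so the numeric comparison needs an explicit cast.
big_recent = conn.execute(
    "SELECT name FROM files "
    "WHERE CAST(size AS INTEGER) > 1000 AND modified >= '2022-12-31' "
    "ORDER BY name"
).fetchall()
print(big_recent)
```

Note that everything is loaded as a string and cast at query time, which is exactly the sort of type decision the rest of this thread argues about.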

Andl - A New Database Language - andl.org
Quote from dandl on January 1, 2023, 12:46 am
Quote from Erwin on December 31, 2022, 10:29 pm
Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

Dude, please, learn the RM.  The "not amenable" is a DIRECT consequence of people not being willing to let the "type system" be anything more than "just numbers, dates and strings".

I know the RM well enough for all practical purposes. Otherwise I wouldn't have written this: http://www.andl.org/2021/07/formal-definitions-for-an-extended-relational-algebra/. Feel free to critique.

The key insight of TTM is the interplay between the RM and the attribute type system. The key innovation (over SQL) is the inclusion of a struct type and a relation type in that type system, but that's only important if (a) you already have a data model in mind and (b) you need a full blown programming language to operate on it. Most business applications store values of just 6 types (bool, int, decimal, float, date/time and string), which SQL already handles, although struct would certainly be a nice addition. The only place other types are really needed is in the language that operates on that data, and available GP languages do that well enough.

My post was about being able to query (not update) a wide variety of data sources already in existence interactively or in tiny scripts, not about the needs of major application development. For that purpose those 6 types are enough.

Fixed-precision decimal or unlimited precision?  Fixed-size int or unlimited?  What's the biggest/smallest float?  Date/time with or without timezone?  What is the maximum length of a string? No blob, or will string work for arbitrary data?

Etc.

Or... If it's for simple data-crunching, why have any type other than string?

These things are a surprising minefield of considerations. What you consider to be common-sense and obvious decisions, the next guy along will consider to be abominations or omissions.

That's why I favour rich type systems as the starting point, rather than some arbitrary selection of built-in types. That way, if your excellent type-x is abominable for my purposes, I can easily replace it. Or add it, if you didn't provide it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on January 1, 2023, 12:26 am
Quote from Dave Voorhis on December 31, 2022, 1:22 pm
Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

That's kind of the forte of bash and/or Python, though I wouldn't say either is particularly good at it; it's just that everything else is a bit worse.

Bash is a horrible mixture of 70's cute features, impossible quoting conventions and a maze of twisty little Unix filters, all different. I won't use it. [Actually, half the time it finishes up being a mix of Sed and Awk, so you might just as well just use Python and be done with it.]

Bash and the associated gaggle of Unix tools are undeniably horrid, yet for some things I'll have the bash solution done and tested whilst the Pythonistas are still deciding what packages to import.

Python happens by pure luck to be the winner over Perl and Ruby, with PHP a sad last. I do use it, but never with pleasure.

I use Rel on a mix of CSVs, spreadsheets, external SQL database tables and internal relvars and for those it's great. It's not much help on emails and ZIPs and system APIs, though I can use Java code and libraries from within Rel. It's not fun or easy, though.

It's that which led me to the to-Java transpiler I've been slowly working on, to get both friendly relational semantics and Java for just this sort of thing.

Which is kind of what I've been talking about -- that's one way to add features to a known language.

What strikes me is that API endpoints are notionally like relvars. GET/POST requests (per REST terminology) or queries (per GraphQL terminology) are like relvar retrievals. Mutations (per GraphQL terminology) or PUT/DELETE (per REST, though GET/POST can also be used to update) are like relvar updates.

That means what we could have is a relational model for API access, where a relational algebra is used to query whilst relevant operations update a collection of relvars (which are actually API endpoints.)  That would be much higher level than REST, and could be considerably higher level -- and probably simpler, easier, and more powerful -- than GraphQL.

I think I'll work on a prototype.
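To make the idea concrete, here is a small Python sketch of what "endpoints as relvars" might look like. It is not the prototype mentioned above and not based on any existing framework: `Relation`, `EndpointRelvar` and the injected `fetch`/`post` callables are hypothetical names, and the "endpoints" are simulated with in-memory lists rather than real HTTP calls.

```python
class Relation:
    """A set of tuples, each stored as a frozenset of (attribute, value) pairs."""

    def __init__(self, rows):
        # Set semantics: duplicate rows collapse.
        self.rows = frozenset(frozenset(r.items()) for r in rows)

    def dicts(self):
        return [dict(r) for r in self.rows]

    def select(self, pred):
        return Relation(r for r in self.dicts() if pred(r))

    def project(self, *attrs):
        return Relation({a: r[a] for a in attrs} for r in self.dicts())

    def join(self, other):
        # Natural join: rows combine when they agree on all common attributes.
        out = []
        for a in self.dicts():
            for b in other.dicts():
                if all(a[k] == b[k] for k in a.keys() & b.keys()):
                    out.append({**a, **b})
        return Relation(out)


class EndpointRelvar:
    """Relvar-like wrapper: `fetch` stands in for a GET/query, `post` for a mutation."""

    def __init__(self, fetch, post):
        self._fetch, self._post = fetch, post

    def value(self):
        return Relation(self._fetch())

    def insert(self, row):
        self._post(row)


# Simulated endpoints (assumption: real ones would issue HTTP requests here).
_users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
_orders = [{"order_id": 10, "user_id": 1}, {"order_id": 11, "user_id": 1}]
users = EndpointRelvar(lambda: list(_users), _users.append)
orders = EndpointRelvar(lambda: list(_orders), _orders.append)

orders.insert({"order_id": 12, "user_id": 2})
result = (
    users.value()
    .join(orders.value())
    .select(lambda r: r["name"] == "bob")
    .project("name", "order_id")
)
print(result.dicts())
```

The caller composes `join`, `select` and `project` without ever seeing URLs or parameter maps; where the evaluation actually happens (client or server) is left open in this sketch.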

I have used APIs to retrieve JSON arrays, and screen scraping to retrieve HTML tables, but I'm not really using enough of this kind of stuff to comment usefully.

I create and consume Web-tech APIs a lot.

It's hard not to recognise that significant complexity in this area would vanish if we favoured remote-procedure-call facilities, so a relational API framework should ideally look and taste like local function calls.

IMO the BIG reason for the RM is to think about relations (or tables) as things in their own right, not just a stream of tuples. Linq has to be written stream-wise, but the mental picture easily rises to the higher level. That's what I would focus on.

I generally think in terms of structures and operations on structures, where the operations can be composed to form expressions that describe transformation from input structure to output structure. Relational models are one example of this approach. Java Streams and C# LINQ are another, and there are various others. I think in terms of operations on this relation or these relations, or on this stream or these streams, or on this matrix or these matrices, or on this bitmap image or these bitmap images, and so on. They're different operations depending on what the structures are, but conceptually equivalent.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Erwin on December 31, 2022, 10:26 pm
Quote from Hugh on December 27, 2022, 2:44 pm

If Dave's trying to tempt me with his nice bird pics (robin, eagle owl), then he's succeeded.  This one's a nice video taken by my trailcam, fortuitously set to take 30 seconds per shot.

Hugh

I've had Roodstaart (gekraagde roodstaart, I think, because it was clearly a couple and neither of them were black as at least one of them should have been had they been zwarte roodstaart) in my little garden for some 2-3 months.  This was the first year I got to see them.

Thanks, Erwin.  I have never forgotten learning the song of the redstart on the Hoge Veluwe once during my time in The Netherlands working on Business System 12.

I hope the forum at large doesn't mind these little off-topic diversions.  Like you, I've had to find things to occupy me since my full retirement in 2013.  I joined our Parish Council and I write articles under the rubric Nature Notes (originally just Bird Notes).

Hugh

Coauthor of The Third Manifesto and related books.

The key insight of TTM is the interplay between the RM and the attribute type system. The key innovation (over SQL) is the inclusion of a struct type and a relation type in that type system, but that's only important if (a) you already have a data model in mind and (b) you need a full blown programming language to operate on it. Most business applications store values of just 6 types (bool, int, decimal, float, date/time and string), which SQL already handles, although struct would certainly be a nice addition. The only place other types are really needed is in the language that operates on that data, and available GP languages do that well enough.

My post was about being able to query (not update) a wide variety of data sources already in existence interactively or in tiny scripts, not about the needs of major application development. For that purpose those 6 types are enough.

Fixed-precision decimal or unlimited precision?  Fixed-size int or unlimited?  What's the biggest/smallest float?  Date/time with or without timezone?  What is the maximum length of a string? No blob, or will string work for arbitrary data?

As I said, these are storage choices. Yes, of course they might internally be strings (as per SQLite) but I'm describing a storage interface. The guarantees are (a) if a put succeeds then you will always get back what you put in (b) a put can fail because you exceeded limits imposed by storage (c) you can put using one type system and get using another but there will be specific issues you may have to deal with, such as those you mention (int size, float size, decimal size, time zone, etc). Those issues arise now between SQL used directly and programmatically (CLI/ODBC).
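The three guarantees can be sketched in a few lines of Python. This is only an illustration under assumed limits (64-bit ints, 1000-character strings); `Store`, `StorageLimitError` and `get_as_float` are hypothetical names, not part of any real storage API.

```python
from datetime import datetime
from decimal import Decimal


class StorageLimitError(Exception):
    """Raised when a value exceeds a limit imposed by the (hypothetical) store."""


class Store:
    # Assumed limits, chosen only for illustration.
    INT_MIN, INT_MAX = -(2**63), 2**63 - 1
    MAX_STRING = 1000
    ALLOWED = (bool, int, Decimal, float, datetime, str)

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, self.ALLOWED):
            raise TypeError(f"unsupported storage type: {type(value).__name__}")
        # bool is a subclass of int in Python, so exclude it from the range check.
        if isinstance(value, int) and not isinstance(value, bool):
            if not self.INT_MIN <= value <= self.INT_MAX:
                raise StorageLimitError("int out of range")  # guarantee (b)
        if isinstance(value, str) and len(value) > self.MAX_STRING:
            raise StorageLimitError("string too long")  # guarantee (b)
        self._data[key] = value

    def get(self, key):
        # Guarantee (a): a successful put always returns the same value.
        return self._data[key]

    def get_as_float(self, key):
        # Case (c): reading through a different type system may lose precision.
        return float(self._data[key])


store = Store()
store.put("price", Decimal("19.99"))
round_trip = store.get("price")

try:
    store.put("huge", 2**63)  # one past the assumed 64-bit limit
    rejected = False
except StorageLimitError:
    rejected = True

as_float = store.get_as_float("price")
print(round_trip, rejected, as_float)
```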

As soon as you import a programming language type system into storage you make life difficult (impossible?) for other languages with other type systems. This was an implicit choice made by TTM, but it causes immense problems for interoperability. It is arguable that the only reason SQL succeeded was the decision by MS to adopt ODBC, based on the primitive CLI and with the help of Simba, in around 91/92. The choices they made to separate the storage-oriented SQL type system from the quite different host language type system (C) are with us today. It's a topic I've thought about a lot, but one that TTM ignores.

That's why I favour rich type systems as the starting point, rather than some arbitrary selection of built-in types. That way, if your excellent type-x is abominable for my purposes, I can easily replace it. Or add it, if you didn't provide it.

So I have a team skilled in Rust, C#, Java, Python and JS and you want to add type-x to the data storage system I depend on. How will you do that, and how do I interoperate with your data?

 

Andl - A New Database Language - andl.org

I create and consume Web-tech APIs a lot.

It's hard not to recognise that significant complexity in this area would vanish if we favoured remote-procedure-call facilities, so a relational API framework should ideally look and taste like local function calls.

Caveat emptor. It seems this would be repeating the mistake of DCOM where APIs did get easier to use, but all applications got bogged down in a morass of fine-grained remote calls.

Quote from tobega on January 2, 2023, 11:03 am

I create and consume Web-tech APIs a lot.

It's hard not to recognise that significant complexity in this area would vanish if we favoured remote-procedure-call facilities, so a relational API framework should ideally look and taste like local function calls.

Caveat emptor. It seems this would be repeating the mistake of DCOM where APIs did get easier to use, but all applications got bogged down in a morass of fine-grained remote calls.

But what's worse -- fine-grained remote calls which are no different from fine-grained local calls, or a morass of fine-grained laboriously-constructed obviously-Web-API calls?

The latter is typically constructed via builder syntax or similar at best; some arduous construction of Map<String, String>s of parameters at worst. I regularly see both, and it's obvious these are just crunchy surrogates for what could be cleaner and simpler as procedure calls.

But I suspect they'd be a lot cleaner if instead of being too fine-grained and mostly ad-hoc, they were mainly "standard" calls like insert(...), update(...), delete(...), project(...), select(...), join(...), etc.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I use relational algebra to good effect in some of the Advent of Code problems and mention it once in a while in posts (blog, forum). That's pitifully little, but if an itch starts to develop within the developer community, RA will eventually be included in languages. Although I wouldn't hold my breath, I think it takes a bit of a mental climb to advance from just iterating on simpler datastructures to accomplishing everything at once with a few select operations. Mostly it tends to be "divide" that's required and most magical, but I've used "matching" and "notMatching" quite a bit, and an occasional "join" and "union".
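For readers who haven't met these operators, here is a minimal Python sketch of "matching" (semijoin), "notMatching" (antijoin) and a simple form of "divide" over relations represented as sets of tuples. The representation and the function names are assumptions made for this example; this is not how Tailspin or any other language implements them.

```python
def relation(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attr, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def _agree(t, u):
    """True when two tuples have equal values on all their common attributes."""
    a, b = dict(t), dict(u)
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def matching(r, s):
    """Semijoin: tuples of r that have at least one matching tuple in s."""
    return {t for t in r if any(_agree(t, u) for u in s)}


def not_matching(r, s):
    """Antijoin: tuples of r that have no matching tuple in s."""
    return r - matching(r, s)


def project(r, attrs):
    return {frozenset((k, v) for k, v in t if k in attrs) for t in r}


def divide(r, s):
    """Simple divide, assuming s's attributes are a subset of r's:
    the tuples over r's remaining attributes that pair with every tuple of s."""
    s_attrs = {k for u in s for k, _ in u}
    rest = {k for t in r for k, _ in t} - s_attrs
    return {c for c in project(r, rest) if all((c | u) in r for u in s)}


takes = relation(
    {"student": "ann", "course": "db"},
    {"student": "ann", "course": "os"},
    {"student": "bob", "course": "db"},
)
required = relation({"course": "db"}, {"course": "os"})

# Who takes every required course? One expression instead of nested loops.
print(divide(takes, required))
print(not_matching(takes, relation({"course": "db"})))
```

The divide call answers "which students take all required courses" in a single step, which is the kind of "everything at once" operation described above.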

FWIW, I wrote a blog post where relational algebra is presented as an example of "closeness of mapping" according to the Cognitive Dimensions of Notation. https://tobega.blogspot.com/2022/12/evaluating-tailspin-language-after.html

 

Quote from Dave Voorhis on January 2, 2023, 11:29 am
Quote from tobega on January 2, 2023, 11:03 am

I create and consume Web-tech APIs a lot.

It's hard not to recognise that significant complexity in this area would vanish if we favoured remote-procedure-call facilities, so a relational API framework should ideally look and taste like local function calls.

Caveat emptor. It seems this would be repeating the mistake of DCOM where APIs did get easier to use, but all applications got bogged down in a morass of fine-grained remote calls.

But what's worse -- fine-grained remote calls which are no different from fine-grained local calls, or a morass of fine-grained laboriously-constructed obviously-Web-API calls?

The latter is typically constructed via builder syntax or similar at best; some arduous construction of Map<String, String>s of parameters at worst. I regularly see both, and it's obvious these are just crunchy surrogates for what could be cleaner and simpler as procedure calls.

But I suspect they'd be a lot cleaner if instead of being too fine-grained and mostly ad-hoc, they were mainly "standard" calls like insert(...), update(...), delete(...), project(...), select(...), join(...), etc.

Well, if you're going to design your REST APIs badly, as if they were local calls, anyway, sure. But why should you make it even easier to do the dumb thing?

Quote from dandl on January 2, 2023, 1:37 am

The key insight of TTM is the interplay between the RM and the attribute type system. The key innovation (over SQL) is the inclusion of a struct type and a relation type in that type system, but that's only important if (a) you already have a data model in mind and (b) you need a full blown programming language to operate on it. Most business applications store values of just 6 types (bool, int, decimal, float, date/time and string), which SQL already handles, although struct would certainly be a nice addition. The only place other types are really needed is in the language that operates on that data, and available GP languages do that well enough.

My post was about being able to query (not update) a wide variety of data sources already in existence interactively or in tiny scripts, not about the needs of major application development. For that purpose those 6 types are enough.

Fixed-precision decimal or unlimited precision?  Fixed-size int or unlimited?  What's the biggest/smallest float?  Date/time with or without timezone?  What is the maximum length of a string? No blob, or will string work for arbitrary data?

As I said, these are storage choices. Yes, of course they might internally be strings (as per SQLite) but I'm describing a storage interface. The guarantees are (a) if a put succeeds then you will always get back what you put in (b) a put can fail because you exceeded limits imposed by storage (c) you can put using one type system and get using another but there will be specific issues you may have to deal with, such as those you mention (int size, float size, decimal size, time zone, etc). Those issues arise now between SQL used directly and programmatically (CLI/ODBC).

As soon as you import a programming language type system into storage you make life difficult (impossible?) for other languages with other type systems. This was an implicit choice made by TTM, but it causes immense problems for interoperability. It is arguable that the only reason SQL succeeded was the decision by MS to adopt ODBC, based on the primitive CLI and with the help of Simba, in around 91/92. The choices they made to separate the storage-oriented SQL type system from the quite different host language type system (C) are with us today. It's a topic I've thought about a lot, but one that TTM ignores.

It doesn't even ignore it; it's simply outside of scope. TTM explicitly doesn't cover security or connection mechanisms either.

That's why I favour rich type systems as the starting point, rather than some arbitrary selection of built-in types. That way, if your excellent type-x is abominable for my purposes, I can easily replace it. Or add it, if you didn't provide it.

So I have a team skilled in Rust, C#, Java, Python and JS and you want to add type-x to the data storage system I depend on. How will you do that, and how do I interoperate with your data?

The usual approach is to specify the wire format as a string data representation (usually JSON, XML, YAML, etc.) with local implementation as the native language sees fit. E.g., the wire format for a date/time type may be specified to be yyyy-mm-ddThh:mm:ss+hh:mm[tz] and local implementations may vary.

That means the only shared type is string, and the only obligatory consideration is which string encoding to use; everything else is by agreement and your own language-specific parsing & checking machinery.

Though if it's a database, language-specific definitions can be stored in it and retrieved by clients without having to roll their own.
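A minimal Python sketch of the wire-format approach, assuming JSON as the carrier and an ISO 8601 string with a UTC offset for the date/time value. The record layout and the `to_wire`/`from_wire` helpers are invented for the example; a client in another language would parse the same string with its own date/time type.

```python
import json
from datetime import datetime, timedelta, timezone


def to_wire(record):
    """Serialise to JSON; the date/time travels as an ISO 8601 string."""
    return json.dumps({"id": record["id"], "created": record["created"].isoformat()})


def from_wire(text):
    """Parse JSON and rebuild the local (Python) date/time representation."""
    raw = json.loads(text)
    return {"id": raw["id"], "created": datetime.fromisoformat(raw["created"])}


cet = timezone(timedelta(hours=1))
record = {"id": 7, "created": datetime(2023, 1, 2, 11, 29, 0, tzinfo=cet)}

wire = to_wire(record)
restored = from_wire(wire)
print(wire)
print(restored == record)
```

Only the string form is shared; the checking and parsing machinery on each side is the local implementation's business, as described above.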

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org