Where do y'all hang out?

#21 · December 29, 2022, 11:31 am

Quote from dandl on December 29, 2022, 4:53 am

I do say that you need a cogent answer to those familiar with SQL and accustomed to using NULL to solve certain problems. Right now you don't have that, or at least you haven't made it clear what that answer might be.

We have it, and it's clear what the answer is. It's obviously option types for most cases -- given that's where popular programming languages have already headed, with their inbuilt support for option types -- plus some structural approaches like Rel's outer join.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#22 · December 30, 2022, 3:20 am

I'm not at all sure it's 'obvious', or that popular GP languages are headed there. Documentation on the Web is scanty, and the Wiki page is pretty bad, suggesting it's not of widespread interest.

But that's not the whole issue. I've presented an XRM with 9 operators plus a specific mechanism for ordered queries which can completely replace SQL and then some. I've shown how the TTM expectation of compile-time checking/generation of heading types could be added to an existing compiler, allowing immediate access to the vast infrastructure of existing languages. And yes, the Rel outer join solution looks to be a good enough answer.

But IMO the type system is the major outstanding issue. The RM does not prescribe a type system, but it does require one based on values, not objects. So whatever language is chosen as the base, it must support a value-based type system and a language-native answer to how to solve the SQL NULL problem without heading down the same path. C# can probably do it, the value types are good enough and parameterised types can handle missing values. Java - maybe? Or perhaps bite the bullet and head directly into Scala/F#/Swift/Rust.

As so often, I can see a solution and if I had a customer with that problem I could solve it. Until then...

Andl - A New Database Language - andl.org

#23 · December 30, 2022, 8:06 pm

Quote from Dave Voorhis on December 30, 2022, 8:06 pm

Quote from dandl on December 30, 2022, 3:20 am

I'm not at all sure it's 'obvious', or that popular GP languages are headed there. Documentation on the Web is scanty, and the Wiki page is pretty bad, suggesting it's not of widespread interest.

Apparent Wikipedia page quality issues aside, it shows some good examples and links to a page on nullable types, which are perhaps a more familiar implementation of the same underlying concept. Whether or not popular general-purpose languages are headed there -- and the presence of nullable types and option types (along with a general trend toward functional-programming-inspired features) suggests that they are -- where we're clearly not headed is back to limited non-union types with sentinel values and the like. Option types are the only reasonable solution to missing values where structural approaches (like that used in Rel's outer join) are not applicable.
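To make the contrast with sentinel values concrete, here is a minimal Python sketch of an option type. The names `Some`, `Nothing`, `Option` and `middle_name_length` are invented for this illustration; they are not taken from Rel, TTM or any library, and a language with built-in option types would express this more directly.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Some(Generic[T]):
    """A present value, wrapped so it can't be confused with absence."""
    value: T


@dataclass(frozen=True)
class Nothing:
    """The absent case; it carries no value at all."""


# Hypothetical alias: an Option[T] is either Some[T] or Nothing.
Option = Union[Some[T], Nothing]


def middle_name_length(middle_name: Option[str]) -> Option[int]:
    # Both cases are handled explicitly: no three-valued logic and no
    # magic sentinel such as "" or -1 standing in for "missing".
    if isinstance(middle_name, Some):
        return Some(len(middle_name.value))
    return Nothing()


print(middle_name_length(Some("Quincy")))
print(middle_name_length(Nothing()))
```

Note that `Some("")` (a known, empty middle name) and `Nothing()` (no middle name recorded) stay distinct, which is exactly what a sentinel like the empty string can't guarantee.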

But that's not the whole issue. I've presented an XRM with 9 operators plus a specific mechanism for ordered queries which can completely replace SQL and then some. I've shown how the TTM expectation of compile-time checking/generation of heading types could be added to an existing compiler, allowing immediate access to the vast infrastructure of existing languages. And yes, the Rel outer join solution looks to be a good enough answer.

But IMO the type system is the major outstanding issue. The RM does not prescribe a type system, but it does require one based on values, not objects.

By "objects", I presume you mean references to mutable class instances and by "values" I presume you mean references to immutable class (or equivalent) instances?

Why does a RM require values, and wouldn't work with mutable instances?

I'm presuming they wouldn't be mutated in some relational-expression-breaking way, such as by in situ updates within a relational expression, or by some other thread. Of course, that's a concern with object-oriented programming in general and something to consider in many contexts, not something specific to a relational model.

So whatever language is chosen as the base, it must support a value-based type system and a language-native answer to how to solve the SQL NULL problem without heading down the same path. C# can probably do it, the value types are good enough and parameterised types can handle missing values. Java - maybe? Or perhaps bite the bullet and head directly into Scala/F#/Swift/Rust.

Java has immutable record types, but in general -- and specific languages aside -- I'm not sure there's a compelling justification to replace Java Streams / C# LINQ / <insert language and its collection algebra here> with a relational model equivalent. Is there a problem that a relational algebra set of operators solves that a typical set of collection operators does not?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#24 · December 30, 2022, 11:50 pm

Quote from dandl on December 30, 2022, 11:50 pm

But that's not the whole issue. I've presented an XRM with 9 operators plus a specific mechanism for ordered queries which can completely replace SQL and then some. I've shown how the TTM expectation of compile-time checking/generation of heading types could be added to an existing compiler, allowing immediate access to the vast infrastructure of existing languages. And yes, the Rel outer join solution looks to be a good enough answer.

But IMO the type system is the major outstanding issue. The RM does not prescribe a type system, but it does require one based on values, not objects.

By "objects", I presume you mean references to mutable class instances and by "values" I presume you mean references to immutable class (or equivalent) instances?

Why does a RM require values, and wouldn't work with mutable instances?

I guess the key point identified by TTM was that the RM is pretty useless without a type system, and that everything has to be a value so that it can provide value (not reference) equality. Mutability of values in a relation would potentially break things. Values (not references) have to be copied. You can't have typeless nulls, and for Extend and Aggregate you need operators on those types. All of this is doable in C++/Java/C# and others, just a bit clunky and hard to enforce. Which is where the slightly modified compiler comes in.
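A small Python sketch of the value-versus-reference point, under the assumption that a tuple is modelled as a frozen dataclass. The class names `Supplier` and `MutableSupplier` and the helper `try_update` are made up for illustration only.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Supplier:
    """An immutable value: equality and hashing are based on the fields."""
    sno: str
    city: str


class MutableSupplier:
    """A plain object: equality and hashing fall back to identity."""

    def __init__(self, sno: str, city: str) -> None:
        self.sno = sno
        self.city = city


# Two equal values collapse to one element, as a relation body requires.
values = {Supplier("S1", "London"), Supplier("S1", "London")}

# Two look-alike objects remain distinct elements (reference equality).
objects = {MutableSupplier("S1", "London"), MutableSupplier("S1", "London")}


def try_update(s: Supplier) -> bool:
    """Return True if an in-place update of the value is rejected."""
    try:
        s.city = "Paris"  # frozen dataclasses refuse attribute assignment
    except FrozenInstanceError:
        return True
    return False


print(len(values), len(objects), try_update(Supplier("S1", "London")))
```

This is the "doable but clunky and hard to enforce" situation: nothing stops someone from putting a `MutableSupplier` into the set, which is where compiler support would help.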

Java has immutable record types, but in general -- and specific languages aside -- I'm not sure there's a compelling justification to replace Java Streams / C# LINQ / <insert language and its collection algebra here> with a relational model equivalent. Is there a problem that a relational algebra set of operators solves that a typical set of collection operators does not?

Yes, quite a few of them, but nothing compelling. These products have no concept of a relation as such, just a stream of tuples. So there are queries that you can write with confidence in TD or SQL involving multiple relations/tables, that take quite a bit of planning and care to get to work when all you have is tuples. But I don't see any killer app on the horizon anytime soon.
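One way to picture the difference is to model a relation as a set of tuples rather than a stream. The following is only a sketch under that assumption; `relation` and `natural_join` are hypothetical helpers, not how Andl, Rel, LINQ or Java Streams actually work.

```python
# A tuple is a frozenset of (attribute, value) pairs and a relation is a
# frozenset of such tuples, so duplicates and ordering simply don't exist.


def relation(*rows: dict) -> frozenset:
    """Build a relation value from dict-shaped rows."""
    return frozenset(frozenset(row.items()) for row in rows)


def natural_join(r: frozenset, s: frozenset) -> frozenset:
    """Join on whatever attributes each pair of tuples has in common."""
    out = set()
    for t in r:
        for u in s:
            td, ud = dict(t), dict(u)
            common = td.keys() & ud.keys()
            if all(td[a] == ud[a] for a in common):
                out.add(frozenset({**td, **ud}.items()))
    return frozenset(out)


suppliers = relation(
    {"sno": "S1", "city": "London"},
    {"sno": "S2", "city": "Paris"},
)
shipments = relation(
    {"sno": "S1", "pno": "P1"},
    {"sno": "S1", "pno": "P2"},
)

joined = natural_join(suppliers, shipments)
print(len(joined))
```

Because the result is itself a relation value, claims like "join is commutative" can be checked with plain equality, which is awkward to state when all you have is an ordered stream of tuples.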

Andl - A New Database Language - andl.org

#25 · December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.
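For illustration only, this is roughly what such an ad hoc query looks like today in plain Python, joining a directory listing against a small CSV. The function names, the `name`/`owner` column layout and the sample data are all assumptions made up for this sketch.

```python
import csv
import io
import tempfile
from pathlib import Path


def files_relation(root: Path) -> set:
    """One (name, size) tuple per regular file directly under root."""
    return {(p.name, p.stat().st_size) for p in root.iterdir() if p.is_file()}


def csv_relation(text: str) -> set:
    """One (name, owner) tuple per CSV row; expects 'name' and 'owner' columns."""
    return {(row["name"], row["owner"]) for row in csv.DictReader(io.StringIO(text))}


def owned_files(files: set, owners: set) -> set:
    """Join the two relations on name, giving (name, size, owner) tuples."""
    return {(n, size, owner) for (n, size) in files for (m, owner) in owners if n == m}


# Sample data in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("hello")
    (root / "b.txt").write_text("hi")
    files = files_relation(root)

owners = csv_relation("name,owner\na.txt,dave\nc.txt,hugh\n")
result = owned_files(files, owners)
print(result)
```

It works, but every source needs its own hand-written adapter and the join is spelled out by hand, which is the gap being described.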

Andl - A New Database Language - andl.org

#26 · December 31, 2022, 1:22 pm

Quote from Dave Voorhis on December 31, 2022, 1:22 pm

Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

That's kind of the forte of bash and/or Python, though I wouldn't say either is particularly good at it; it's just that everything else is a bit worse. I use Rel on a mix of CSVs, spreadsheets, external SQL database tables and internal relvars and for those it's great. It's not much help on emails and ZIPs and system APIs, though I can use Java code and libraries from within Rel. It's not fun or easy, though.

It's that which led me to the to-Java transpiler I've been slowly working on, to get both friendly relational semantics and Java for just this sort of thing.

Something I've been thinking about a lot lately is APIs; particularly when using microservices. Without getting into a debate about pros and cons of microservices, think of them here as "a bunch of Web-technology APIs."

Typically, when we want to use a bunch of APIs from some client, they're implemented and accessed via RESTful protocols -- which generally means we can use a given API endpoint to GET (retrieve), POST/PUT (update or insert), and/or DELETE (delete) data. That works, but it's low-level; it's basically primitive CRUD operations over the wire, and the data representations are largely arbitrary so it's dependent on much pre-agreement between API providers and consumers.

There's an alternative to (raw) REST called GraphQL, which lets you query data from an endpoint (with facilities to specify what data you want, like a somewhat-hierarchical relational projection) or mutate (update/insert/delete) data at an API endpoint. It also provides a rather crude facility for something akin to a JOIN on the results of diverse API queries, called "schema stitching", and it provides some standard type definitions and machinery for extending them (and far too much checking that happens at runtime instead of compile-time.)

GraphQL is arguably easier to use than raw RESTful APIs, but it's still rather low-level. It's a bit higher than low-level CRUD operations, but not much.

What strikes me is that API endpoints are notionally like relvars. GET/POST (per REST terminology) or queries (per GraphQL terminology) are like relvar retrievals. Mutations (per GraphQL terminology) or PUT/DELETE (per REST, and GET/POST can also be used to update) are like relvar updates.

That means what we could have is a relational model for API access, where a relational algebra is used to query whilst relevant operations update a collection of relvars (which are actually API endpoints.) That would be much higher level than REST, and could be considerably higher level -- and probably simpler, easier, and more powerful -- than GraphQL.
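As a very rough sketch of the idea (not the prototype itself), an endpoint can be wrapped so that it looks like a relvar. Everything here is hypothetical: `EndpointRelvar`, `fake_endpoint`, `restrict` and `project` are invented names, and the `fetch`/`store` callables stand in for real HTTP calls so the example runs without a network.

```python
class EndpointRelvar:
    """A relvar-like facade over one API endpoint.

    `fetch` and `store` are stand-ins for the endpoint's retrieve and
    update operations (e.g. GET and PUT in REST terms).
    """

    def __init__(self, fetch, store):
        self._fetch = fetch
        self._store = store

    def value(self) -> frozenset:
        """Retrieve the current relation value from the endpoint."""
        return frozenset(frozenset(r.items()) for r in self._fetch())

    def insert(self, row: dict) -> None:
        """Relational assignment: new value = old value UNION {row}."""
        new_value = self.value() | {frozenset(row.items())}
        self._store([dict(t) for t in new_value])


def fake_endpoint(initial):
    """In-memory substitute for a remote endpoint (an assumption of this sketch)."""
    state = {"rows": list(initial)}

    def fetch():
        return list(state["rows"])

    def store(rows):
        state["rows"] = list(rows)

    return fetch, store


def restrict(rel: frozenset, pred) -> frozenset:
    """Keep only the tuples for which pred(tuple-as-dict) is true."""
    return frozenset(t for t in rel if pred(dict(t)))


def project(rel: frozenset, attrs: set) -> frozenset:
    """Keep only the named attributes; duplicates collapse automatically."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in rel)


fetch, store = fake_endpoint([{"id": 1, "city": "London"}])
customers = EndpointRelvar(fetch, store)
customers.insert({"id": 2, "city": "Paris"})
customers.insert({"id": 2, "city": "Paris"})  # same tuple again: no effect
paris_ids = project(restrict(customers.value(), lambda t: t["city"] == "Paris"), {"id"})
print(paris_ids)
```

The client code only sees relation values and relational operators; which wire protocol sits behind `fetch` and `store` becomes an implementation detail.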

I think I'll work on a prototype.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#27 · December 31, 2022, 10:17 pm

Quote from Paul Vernon on December 22, 2022, 11:27 am

Seeing as this forum is not so very active nowadays, I was wondering where you all go to for stimulating ideas, discussion and debate on relational matters?

Quora, Stack Overflow, LinkedIn, Wikipedia are some of the places I go to nowadays, but I fear I'm missing out elsewhere.

There is a Slack workspace provided by dbt that is great to keep up with the "modern data stack" and observing everyone's ongoing life with SQL, but there is no real discussion of fundamentals there.

Any other suggestions?

I'm hanging out nowhere anymore (almost). LinkedIn discussions by "modelers" around issues of "modeling" are the only thing that still comes close to being somewhat intellectually satisfying. But if people like Ronald Ross ask a question to tease out information that's in other people's minds, you don't get appreciated for just answering that the problem was already solved 15 years ago. (I effectively had him answer once that "mathematics is not the solution, because ordinary people don't understand that anymore" - not his words, but my paraphrase reflects the intent quite accurately.)

I'm home on burn-out/depression trying to figure out what to do with the last years of my professional life. One counselor, after hearing my story of how and why I did SIRA_PRISE, said that "when the highly gifted brain decides to focus on one problem exclusively and do nothing else anymore, that gives an orgy of development of new knowledge and new insights. That goes at a pace that no one else in the world is able to keep up with. And if you keep that orgy going for 4-5 years uninterrupted, well, no picture needed to show where you end up." Those are just other words for "there is nothing left for you to do in this world." So I need to find something in the way of new directions.

Author of SIRA_PRISE

#28 · December 31, 2022, 10:26 pm

Quote from Hugh on December 27, 2022, 2:44 pm

If Dave's trying to tempt me with his nice bird pics (robin, eagle owl), then he's succeeded. This one's a nice video taken by my trailcam, fortuitously set to take 30 seconds per shot.

Hugh

I've had redstarts (common redstart, "gekraagde roodstaart", I think, because it was clearly a couple and neither of them was black, as at least one of them should have been had they been black redstart, "zwarte roodstaart") in my little garden for some 2-3 months. This was the first year I got to see them.

Author of SIRA_PRISE

#29 · December 31, 2022, 10:29 pm

Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

Dude, please, learn the RM. The "not amenable" is a DIRECT consequence of people not being willing to let the "type system" be anything more than "just numbers, dates and strings".

Author of SIRA_PRISE

#30 · January 1, 2023, 12:26 am

Quote from dandl on January 1, 2023, 12:26 am

Quote from Dave Voorhis on December 31, 2022, 1:22 pm

Quote from dandl on December 31, 2022, 12:44 am

Actually, it occurs to me that there is a set of problems I encounter routinely that involve connections between tabular data not amenable to putting in a server database. Things like searching through emails, files and folders on disk, text files, downloaded datasets in CSVs and ZIPs, system APIs, etc. The type system is just numbers, dates and strings for the most part, but it works best with a scripting language, interactively, a bit the way you use SQL on stored data. It harks back to your shell scripts. I don't know any tools that do that stuff well, if at all.

That's kind of the forte of bash and/or Python, though I wouldn't say either is particularly good at it; it's just that everything else is a bit worse.

Bash is a horrible mixture of 70's cute features, impossible quoting conventions and a maze of twisty little Unix filters, all different. I won't use it. [Actually, half the time it finishes up being a mix of Sed and Awk, so you might just as well just use Python and be done with it.]

Python happens by pure luck to be the winner over Perl and Ruby, with PHP a sad last. I do use it, but never with pleasure.

I use Rel on a mix of CSVs, spreadsheets, external SQL database tables and internal relvars and for those it's great. It's not much help on emails and ZIPs and system APIs, though I can use Java code and libraries from within Rel. It's not fun or easy, though.

It's that which led me to the to-Java transpiler I've been slowly working on, to get both friendly relational semantics and Java for just this sort of thing.

Which is kind of what I've been talking about -- that's one way to add features to a known language.

What strikes me is that API endpoints are notionally like relvars. GET/POST (per REST terminology) or queries (per GraphQL terminology) are like relvar retrievals. Mutations (per GraphQL terminology) or PUT/DELETE (per REST, and GET/POST can also be used to update) are like relvar updates.

That means what we could have is a relational model for API access, where a relational algebra is used to query whilst relevant operations update a collection of relvars (which are actually API endpoints.) That would be much higher level than REST, and could be considerably higher level -- and probably simpler, easier, and more powerful -- than GraphQL.

I think I'll work on a prototype.

I have used APIs to retrieve JSON arrays, and screen scraping to retrieve HTML tables, but I'm not really using enough of this kind of stuff to comment usefully.

IMO the BIG reason for the RM is to think about relations (or tables) as things in their own right, not just a stream of tuples. LINQ has to be written stream-wise, but the mental picture easily rises to the higher level. That's what I would focus on.

Andl - A New Database Language - andl.org

The Forum for Discussion about The Third Manifesto and Related Matters