The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Aggregates in relational calculus

Quote from dandl on August 25, 2019, 2:09 pm

 

Those are two quite different things. Only a tiny part of the accesses to a data repository are for the purposes of update, and an even tinier fraction assume a formal database and concurrent/atomic update. A database language to support those updates is important, but merely a niche. I write tools to make it possible, but I never use them. ACID is way down the list on what I need.

I agree that concurrent update is not so important, but read/write concurrency is still very important, except for truly immutable data.  Nothing like processing a stream of data only to have it change out from under you or disappear completely as you go.

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

Most people's queries seem to be of the form SELECT * FROM table [WHERE column = ?], in which case better query languages are more or less irrelevant.

 

Quote from dandl on August 25, 2019, 2:09 pm
Quote from Dave Voorhis on August 25, 2019, 9:16 am

Though I note, largely as an aside, that I'm increasingly questioning of the assumption that a relational algebra is the right approach, particularly as what characterises the "power" of a database language for most users is not adherence to a relational algebra (or what's influenced by it -- SQL) but the fact that it's reasonably intuitive and provides persistence and ACID-compliant transactional state. (Though with much of NoSQL you can leave out ACID compliance and nobody seems to mind.)

Note that .NET LINQ and Java Streams and other collection frameworks provide an effective means to query containers without being a familiar relational algebra. Indeed, what they lack isn't a relational algebra but easy persistence and, often, ACID-compliant transactional state. Does the average -- or even well above average -- programmer care whether they implement a relational algebra (or not), as long as they're powerful and expressive -- and, ideally, provide easy persistence and ACID compliance?

Those are two quite different things. Only a tiny part of the accesses to a data repository are for the purposes of update, and an even tinier fraction assume a formal database and concurrent/atomic update. A database language to support those updates is important, but merely a niche. I write tools to make it possible, but I never use them. ACID is way down the list on what I need.

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

I think you're describing what you need, and "most data is queried more often than it is updated" depends entirely on the domain you're working in.

I worked on an inventory system where quantities on hand were irrelevant -- the items being cheap and effectively unlimited, but their outgoing still needed to be noted -- but every transaction was recorded so that a single report could be generated once a month. I.e., vast number of updates, one query. I've worked on logging systems that record masses of data on the off chance that it might need to be queried (and rarely is.) Many updates, often no queries...

This is again back to my point that we're mainly talking about tradeoffs here. Your recent posts smack of a one right way to do this, and my point is that no, there isn't and there can't be. Rather than suggest what needs to be done as a singular direction, perhaps it is more productive to focus on what you need to do to make a tool that solves a problem for your domain. It will almost certainly be very different from what others will do to solve problems for their niches, whether based on a relational model or SQL (as per the various *QL languages) or something else.

I think the original (ostensibly shared) goal set by TTM -- replace SQL -- is (currently) a fruitless endeavour and perhaps an unnecessarily limited one. IT has fragmented and broadened so much that there is no reasonable single focus, and replacing SQL is definitely not it. That said, I don't think just creating a general purpose language with embedded relational operators is "it", either, or at least not all of it. If (a/the) RM is to remain viable, it will do so as a component of a variety of tools and in a variety of ways, some being little more than Codd's RA, some being a full D, some being more than a full D, some exhibiting contradictory and incompatible features and approaches as befits various requirements and user inclinations, and many being something else entirely but remaining identifiably faithful (if only glancingly) to some variation of the relational model.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from johnwcowan on August 25, 2019, 4:50 pm
Quote from dandl on August 24, 2019, 2:23 pm
  1. The statically typed GP languages use a record structure with (a) ordered members (b) type name equivalence. The named perspective presumes a tuple with (a) a set of attributes (b) equivalence for types that share the same set of attributes.

That turns out not to be the case, at least not always.  C, C++, and Cobol have ordered members, but there is no ordering in Java or C#, where objects aren't even necessarily laid out in memory until class loading time.  Algol 68 uses structural type equivalence exclusively, and OCaml, Go, and C++ templating language use it to a varying extent.

It's not an important point, and it was intended to be read as and/or. The point is that these languages do lay out record members in some order, at times that order may be made visible for various purposes, and the languages are designed conceptually around the idea of ordered members and user-named types. There is nothing in the way the language is designed and implemented to correspond with the 'set of selector names' (HHT) or the 'RELATION type generator' (TTM), or the set equivalence of types that follows.

Andl - A New Database Language - andl.org
Quote from dandl on August 26, 2019, 4:27 am

 

It's not an important point, and it was intended to be read as and/or. The point is that these languages do lay out record members in some order, at times that order may be made visible for various purposes, and the languages are designed conceptually around the idea of ordered members and user-named types.

Algol 68 neither assigns meaning to the order in which structure fields are declared (they are a set) nor operates on name equivalence at the structure level (two structure types are the same iff they have the same set of fields with the same field names and types).  All logical differences are big differences.

But you say it's a minor point to your argument, so I'll say no more about it.

 

Quote from johnwcowan on August 26, 2019, 4:46 am
Quote from dandl on August 26, 2019, 4:27 am

 

It's not an important point, and it was intended to be read as and/or. The point is that these languages do lay out record members in some order, at times that order may be made visible for various purposes, and the languages are designed conceptually around the idea of ordered members and user-named types.

Algol 68 neither assigns meaning to the order in which structure fields are declared (they are a set) nor operates on name equivalence at the structure level (two structure types are the same iff they have the same set of fields with the same field names and types).  All logical differences are big differences.

But you say it's a minor point to your argument, so I'll say no more about it.

 

Thanks John, em I'm having trouble parsing your comment. If I take it one way, it'll be quite a major point. To quote/paraphrase ...

  1. Algol 68 [doesn't] assigns meaning to the order in which structure fields are declared (they are a set).
  2. [does treat] two structure types are the same iff they have the same set of fields with the same field names and types.
  3. [doesn't] operates on name equivalence at the structure level.

What does point 3's "name equivalence" denote? If I take it as equivalence of field names, that seems to contradict point 2. But perhaps you mean name of the overall structure type?

Perhaps you mean "types are the same" for some purposes; but Algol 68 doesn't for other purposes operate on "name equivalence"?

To take Dave's example from earlier in the thread

class x {int p} is a different type from class y {int p},

but an anonymous structure {int p, int q} is the same type as {int q, int p}, different from {int q, int r}? (I've no idea if Algol 68 has anonymous structures, nor what the syntax is to declare them.)

Then how about class z1 {x q, int r} compare class z2 {y q, int r}? Although the component named q has has different nominal type (x vs y), the bodys' type is the same and same field name p.

Quote from johnwcowan on August 25, 2019, 5:10 pm

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

Most people's queries seem to be of the form SELECT * FROM table [WHERE column = ?], in which case better query languages are more or less irrelevant.

That's as maybe, but a very solid subset of queries are what you can do in Excel, or can't quite do in Excel. Things like sorting, selecting one or more ranges and hiding some columns work OK in Excel. Totaling is kind of OK, but grouping and sub-totaling are bad. Combining data from multiple tables is woeful.

The classic situation is we have two different computer systems, one for accounts, one for issue reporting. We want to check if any issues were reported by people who had unpaid accounts. The go-to tool is Excel: we can write an SQL query to pull out the relevant data from each system but we have no easy way to combine the two tables and generate a usable report, especially if we need it on a regular basis.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on August 25, 2019, 5:19 pm
Quote from dandl on August 25, 2019, 2:09 pm

Those are two quite different things. Only a tiny part of the accesses to a data repository are for the purposes of update, and an even tinier fraction assume a formal database and concurrent/atomic update. A database language to support those updates is important, but merely a niche. I write tools to make it possible, but I never use them. ACID is way down the list on what I need.

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

I think you're describing what you need, and "most data is queried more often than it is updated" depends entirely on the domain you're working in.

I worked on an inventory system where quantities on hand were irrelevant -- the items being cheap and effectively unlimited, but their outgoing still needed to be noted -- but every transaction was recorded so that a single report could be generated once a month. I.e., vast number of updates, one query. I've worked on logging systems that record masses of data on the off chance that it might need to be queried (and rarely is.) Many updates, often no queries...

No, this is true across most domains. The structured data held by an enterprise to reflect its business activities is a tiny part of all interesting data, but even in the business you mention I think you'll find that each row in the database is retrieved anything from 2x to 20x more frequently than it is updated.

But consider the toll road with electronic tolling. In a given period of time the business system might record just a few vehicles passing tolling points, the traffic flow sensors record some thousands of vehicle movements while the video systems used for monitoring and licence plate recognition record terabytes of raw data.

No, it's not about what I need, it's about considering all kinds of data and not just the structured, organised, sanitised business records that live in a RDBMS.

This is again back to my point that we're mainly talking about tradeoffs here. Your recent posts smack of a one right way to do this, and my point is that no, there isn't and there can't be. Rather than suggest what needs to be done as a singular direction, perhaps it is more productive to focus on what you need to do to make a tool that solves a problem for your domain. It will almost certainly be very different from what others will do to solve problems for their niches, whether based on a relational model or SQL (as per the various *QL languages) or something else.

I agree, there is no one right way and there never could be. I just want to move the focus away from the places where SQL is already dominant to places where SQL cannot live but something else might.

I think the original (ostensibly shared) goal set by TTM -- replace SQL -- is (currently) a fruitless endeavour and perhaps an unnecessarily limited one. IT has fragmented and broadened so much that there is no reasonable single focus, and replacing SQL is definitely not it. That said, I don't think just creating a general purpose language with embedded relational operators is "it", either, or at least not all of it. If (a/the) RM is to remain viable, it will do so as a component of a variety of tools and in a variety of ways, some being little more than Codd's RA, some being a full D, some being more than a full D, some exhibiting contradictory and incompatible features and approaches as befits various requirements and user inclinations, and many being something else entirely but remaining identifiably faithful (if only glancingly) to some variation of the relational model.

I tend to agree. There is a lot of tabular data in the world, and a lot of it lives in various informal non-database stores such as spreadsheets, web pages, CSV files and so on. There ought to be a role for one or more such tools, but it's not that easy to say where exactly is the low hanging fruit.

Andl - A New Database Language - andl.org
Quote from dandl on August 28, 2019, 12:44 am
Quote from Dave Voorhis on August 25, 2019, 5:19 pm
Quote from dandl on August 25, 2019, 2:09 pm

Those are two quite different things. Only a tiny part of the accesses to a data repository are for the purposes of update, and an even tinier fraction assume a formal database and concurrent/atomic update. A database language to support those updates is important, but merely a niche. I write tools to make it possible, but I never use them. ACID is way down the list on what I need.

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

I think you're describing what you need, and "most data is queried more often than it is updated" depends entirely on the domain you're working in.

I worked on an inventory system where quantities on hand were irrelevant -- the items being cheap and effectively unlimited, but their outgoing still needed to be noted -- but every transaction was recorded so that a single report could be generated once a month. I.e., vast number of updates, one query. I've worked on logging systems that record masses of data on the off chance that it might need to be queried (and rarely is.) Many updates, often no queries...

No, this is true across most domains. The structured data held by an enterprise to reflect its business activities is a tiny part of all interesting data, but even in the business you mention I think you'll find that each row in the database is retrieved anything from 2x to 20x more frequently than it is updated.

But consider the toll road with electronic tolling. In a given period of time the business system might record just a few vehicles passing tolling points, the traffic flow sensors record some thousands of vehicle movements while the video systems used for monitoring and licence plate recognition record terabytes of raw data.

No, it's not about what I need, it's about considering all kinds of data and not just the structured, organised, sanitised business records that live in a RDBMS.

Exactly. Though it is notable (and slightly eyebrow-raising) that in spite of that awareness, you still see fit to make (incorrect) assumptions about the I/O profile is of an application I wrote.

Tsk. Tsk.

Again, as soon as you say "most data", you are already viewing the field through the distortions of your own domain lens. If you must look at the world through it, I suggest it will be far more interesting and productive for you to tell us what you are doing with the relational model -- and in turn we'll tell you what we are doing with the relational model -- rather than try to tell us what we should do with the relational model.

Quote from dandl on August 28, 2019, 12:44 am
Quote from Dave Voorhis on August 25, 2019, 5:19 pm

I think the original (ostensibly shared) goal set by TTM -- replace SQL -- is (currently) a fruitless endeavour and perhaps an unnecessarily limited one. IT has fragmented and broadened so much that there is no reasonable single focus, and replacing SQL is definitely not it. That said, I don't think just creating a general purpose language with embedded relational operators is "it", either, or at least not all of it. If (a/the) RM is to remain viable, it will do so as a component of a variety of tools and in a variety of ways, some being little more than Codd's RA, some being a full D, some being more than a full D, some exhibiting contradictory and incompatible features and approaches as befits various requirements and user inclinations, and many being something else entirely but remaining identifiably faithful (if only glancingly) to some variation of the relational model.

I tend to agree. There is a lot of tabular data in the world, and a lot of it lives in various informal non-database stores such as spreadsheets, web pages, CSV files and so on. There ought to be a role for one or more such tools, but it's not that easy to say where exactly is the low hanging fruit.

If you'll pardon my perhaps unduly aphoristic quip, it's the low hanging fruit that tends to be overripe, rotten, or already picked through by others.

Though more effort to pick, maybe we should look a bit further up the tree for better fruit.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on August 28, 2019, 12:15 am
Quote from johnwcowan on August 25, 2019, 5:10 pm

Most data is queried far more often than it is updated, and it is reasonable to expend energy on better query languages that operate within a familiar environment. Recent announcements include PartiQL, NRQL, DQL, and I'm sure I've missed a few. Are they based on the RA? I have no idea, but they do give some idea where the action is happening.

Most people's queries seem to be of the form SELECT * FROM table [WHERE column = ?], in which case better query languages are more or less irrelevant.

That's as maybe, but a very solid subset of queries are what you can do in Excel, or can't quite do in Excel. Things like sorting, selecting one or more ranges and hiding some columns work OK in Excel. Totaling is kind of OK, but grouping and sub-totaling are bad. Combining data from multiple tables is woeful.

The classic situation is we have two different computer systems, one for accounts, one for issue reporting. We want to check if any issues were reported by people who had unpaid accounts. The go-to tool is Excel: we can write an SQL query to pull out the relevant data from each system but we have no easy way to combine the two tables and generate a usable report, especially if we need it on a regular basis.

For that sort of thing MS Access is conceptually ideal, particularly if you're a Microsoft-heavy shop. Unfortunately, in practice Access typically turns out to be too technical (compared to Excel, at least) for users and too lightweight (and fragile and quirky) for programmers.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on August 28, 2019, 9:22 am

 

For that sort of thing MS Access is conceptually ideal, particularly if you're a Microsoft-heavy shop. Unfortunately, in practice Access typically turns out to be too technical (compared to Excel, at least) for users and too lightweight (and fragile and quirky) for programmers.

That was the concept behind Lotus Agenda.  A modernized open-source cross-platform Agenda (perhaps with a SQLite backing store, which would permit reasonable interchange with corporate data) would be a great boon to the world.  Several attempts have been made, but they have all foundered on the second-system effect as far as I know (Chandler) or been abandoned before they were useful (beeswax).

There's really not that much to it.  There's a category hierarchy; a tiny bit of AI for extracting categories, dates, and other properties from unstructured text; an ultra-simplistic definition of views (which categories and properties to project, which categories to select); simple condition/action pairs to add new categories to an item based on its existing categories, delete an older item, or what have you.  It isn't even necessary to distinguish between AND and OR; a category can be marked "children are mutually exclusive", in which case those children are ORed when they appear together; everything else is ANDed.