The case for an RA based on generics instead of types

#21 · April 10, 2020, 11:03 am

Quote from AntC on April 10, 2020, 11:03 am

Quote from Dave Voorhis on April 10, 2020, 10:26 am

Quote from dandl on April 10, 2020, 4:37 am

Quote from Dave Voorhis on April 9, 2020, 2:15 pm

Quote from dandl on April 9, 2020, 2:09 pm

Quote from Dave Voorhis on April 9, 2020, 10:31 am

Quote from AntC on April 8, 2020, 11:08 pm

Quote from Dave Voorhis on April 8, 2020, 6:33 pm

Quote from Erwin on April 8, 2020, 6:09 pm

Quote from Dave Voorhis on April 8, 2020, 2:54 pm

There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().

Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.

Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.

I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.

It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
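For readers who haven't seen the flatMap() + filter() approach, here is a minimal sketch of an inner join in plain Java Streams. The Customer and Order records, their fields, and the join() helper are hypothetical, invented purely for illustration (records need Java 16 or later). It is effectively a nested-loop join, so it rescans the orders list for every customer.

```java
import java.util.List;
import java.util.stream.Collectors;

public class StreamJoinSketch {
    // Hypothetical record types standing in for two otherwise unrelated "tables".
    record Customer(int id, String name) {}
    record Order(int orderId, int customerId, String item) {}

    // Nested-loop inner join: for each customer, scan all orders and keep
    // only those whose customerId matches the customer's id.
    static List<String> join(List<Customer> customers, List<Order> orders) {
        return customers.stream()
            .flatMap(c -> orders.stream()
                .filter(o -> o.customerId() == c.id())
                .map(o -> c.name() + " ordered " + o.item()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(1, "Ann"),
            new Customer(2, "Bob"));
        List<Order> orders = List.of(
            new Order(10, 1, "widget"),
            new Order(11, 2, "gadget"),
            new Order(12, 1, "sprocket"));
        // One line per matching (customer, order) pair.
        join(customers, orders).forEach(System.out::println);
    }
}
```

Because the inner stream is rebuilt for each outer element, the cost grows with the product of the two list sizes, which is one reason repeated queries tend to favour a prebuilt lookup or object graph.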

Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.

It's rare because of the inherent nature of object oriented programs; instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.

An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.

It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.

I would like to draw a distinction between 3 classes of program (I'm sure there are others).

Window on Data (a la Toon Koppelaars). The database is a filing cabinet for a running business; there are some reasonably simple programs to add/update what it contains, and a variety of displays/reports that let people see the data in various ways.

Data warehouse. The data is a record of stuff that has happened (weather records, power usage, traffic, etc); the challenge is all on the analysis side.

Persistence mechanism. The database is simply the persistence store for a continually running complex application, so that it can be restarted if needed, and is not accessed directly, only in the context of that application.

It seems to me you're always focused on the last of these, as if it was the only thing that mattered. I don't think it is.

Are you sure you've responded to the correct post?

Your response doesn't appear to have anything to do with what I wrote.

If it was a response to what I wrote, then you appear to have misunderstood what I wrote.

Perhaps I did. You gave an answer that seemed to me to come from a particular narrow perspective; I was trying to point out that there are others, equally valuable, that have nothing in particular to do with 'maintaining an object graph'. But perhaps it would be better if I just ask: what do you mean, and how would you prefer to go about "using an object-oriented language to connect to diverse and discrete data sources" and then querying same if this is not it?

I was merely pointing out that's the typical assumption in the object-oriented programming world. I.e., the usual thinking is that you already have a fully-connected object graph, so why would you need JOIN?

Is "a fully-connected object graph" tantamount to a bunch of relvars with Foreign Key constraints declared? Or is the object graph more like a hierarchical/network database structure? Because (I don't need to tell you this) if the connections are too rigid, your applications become driven by navigation: your control logic is not JOIN because it's lookup; and that's fine providing you only ever need to follow that one graph.

I wasn't advocating it.

I like the idea of using a general-purpose language as a query language for diverse data sources. It's the essence of my work-in-progress datasheet tool.

But I imagine a goodly number of typical object-oriented programmers will want to create an object graph linking instances of data from the diverse data sources (assuming the selection of sources is relatively static) before they reach the query stage, and I have no objection to that. Indeed, for repeated queries, it will improve performance.

So by the data modeller/programmer declaring Foreign Key constraints, equally the DBMS can improve performance by maintaining access paths. This sounds like a distinction without a difference; or perhaps using different terminology for the same data structure. (Except I smell an implication that a (R-ish)DBMS might not be as performant.) In TTM we prefer the terminology 'Inclusion Dependencies' because some data structures/referential integrity can't be expressed as Foreign Keys.

In particular, Exclusion Dependencies are difficult to express in a way that gives good performance. Oh silly me, of course we don't need Exclusion Dependencies because we have Null; or Null pointers in the object graph -- or if we're lucky, Tagged Unions to represent a properly-terminated edge in the graph.

#22 · April 10, 2020, 5:03 pm

Quote from Dave Voorhis on April 10, 2020, 5:03 pm

Quote from AntC on April 10, 2020, 11:03 am

Quote from Dave Voorhis on April 10, 2020, 10:26 am

Quote from dandl on April 10, 2020, 4:37 am

Quote from Dave Voorhis on April 9, 2020, 2:15 pm

Quote from dandl on April 9, 2020, 2:09 pm

Quote from Dave Voorhis on April 9, 2020, 10:31 am

Quote from AntC on April 8, 2020, 11:08 pm

Quote from Dave Voorhis on April 8, 2020, 6:33 pm

Quote from Erwin on April 8, 2020, 6:09 pm

Quote from Dave Voorhis on April 8, 2020, 2:54 pm

There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().

Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.

Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.

I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.

It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.

Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.

It's rare because of the inherent nature of object oriented programs; instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.

An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.

It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.

I would like to draw a distinction between 3 classes of program (I'm sure there are others).

Window on Data (a la Toon Koppelaars). The database is a filing cabinet for a running business; there are some reasonably simple programs to add/update what it contains, and a variety of displays/reports that let people see the data in various ways.

Data warehouse. The data is a record of stuff that has happened (weather records, power usage, traffic, etc); the challenge is all on the analysis side.

Persistence mechanism. The database is simply the persistence store for a continually running complex application, so that it can be restarted if needed, and is not accessed directly, only in the context of that application.

It seems to me you're always focused on the last of these, as if it was the only thing that mattered. I don't think it is.

Are you sure you've responded to the correct post?

Your response doesn't appear to have anything to do with what I wrote.

If it was a response to what I wrote, then you appear to have misunderstood what I wrote.

Perhaps I did. You gave an answer that seemed to me to come from a particular narrow perspective; I was trying to point out that there are others, equally valuable, that have nothing in particular to do with 'maintaining an object graph'. But perhaps it would be better if I just ask: what do you mean, and how would you prefer to go about "using an object-oriented language to connect to diverse and discrete data sources" and then querying same if this is not it?

I was merely pointing out that's the typical assumption in the object-oriented programming world. I.e., the usual thinking is that you already have a fully-connected object graph, so why would you need JOIN?

Is "a fully-connected object graph" tantamount to a bunch of relvars with Foreign Key constraints declared? Or is the object graph more like a hierarchical/network database structure? Because (I don't need to tell you this) if the connections are too rigid, your applications become driven by navigation: your control logic is not JOIN because it's lookup; and that's fine providing you only ever need to follow that one graph.

An object graph is probably closer to a network database than a relational database with foreign key constraints, but it's not really like either one. A running program isn't a database as such. The state of a running object-oriented (or procedural, with pointers/references like C) program is a collection of variables containing primitive values and references to instances, which themselves contain references to primitives or other instances. Some of the instances are containers, which are a collection of n references to other instances or primitives. There's typically no notion of ad-hoc queries (except maybe for debugging), and queries (as such) are predefined and reference a particular container. E.g., given container instance x, for each item i in x where i.foo < 0, add i.bar.baz to a UI listbox, or send the text representation of i.zot.zap.zaz to another system, or invoke i.blat() to perform some processing, etc.

Note the dotted syntax, which implies that i is a reference to an instance with (at least) member variables foo, bar and zot and a member method blat. In turn, bar is a reference to an instance with a member baz, and zot is a reference to an instance with a member zap, and zap is a reference to an instance with a member zaz. The bar→baz and zot→zap and zap→zaz references are what in the relational model would be obtained with joins. In an object-oriented or procedural language, they are references or pointers.
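To make the dotted-reference example concrete, here is a rough Java sketch. The member names (foo, bar, baz, zot, zap, zaz) come from the description above; the record types, the field types, and the negativeBaz() helper are assumptions made only so the example compiles (records need Java 16 or later), and the blat() method is left out. Note that no join appears anywhere: each hop is just a reference that was already set when the instances were created.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ObjectGraphSketch {
    // Hypothetical types; each field is an in-memory reference, not a key to look up.
    record Zap(String zaz) {}
    record Zot(Zap zap) {}
    record Bar(String baz) {}
    record Item(int foo, Bar bar, Zot zot) {}

    // The predefined "query": for each item i in container x where i.foo < 0,
    // collect i.bar.baz (e.g. to populate a UI listbox).
    static List<String> negativeBaz(List<Item> x) {
        return x.stream()
            .filter(i -> i.foo() < 0)
            .map(i -> i.bar().baz())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The object graph is built as a side effect of constructing instances.
        List<Item> x = List.of(
            new Item(-1, new Bar("first"), new Zot(new Zap("alpha"))),
            new Item(5, new Bar("second"), new Zot(new Zap("beta"))),
            new Item(-7, new Bar("third"), new Zot(new Zap("gamma"))));

        negativeBaz(x).forEach(System.out::println);

        // Following a longer reference chain, i.zot.zap.zaz, works the same way.
        x.stream()
            .filter(i -> i.foo() < 0)
            .forEach(i -> System.out.println(i.zot().zap().zaz()));
    }
}
```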

Generally, you don't explicitly construct an object graph or even think about it. It's simply the term given to the data structure that inevitably results from writing and running an object-oriented (or procedural with pointers or references) program. Most object-oriented (or procedural with pointers or references) programmers never even think about the object graph -- I suspect some wouldn't even recognise the term, or would dimly recognise it as being the data and data structures in memory -- but it's always there.

I wasn't advocating it.

I like the idea of using a general-purpose language as a query language for diverse data sources. It's the essence of my work-in-progress datasheet tool.

But I imagine a goodly number of typical object-oriented programmers will want to create an object graph linking instances of data from the diverse data sources (assuming the selection of sources is relatively static) before they reach the query stage, and I have no objection to that. Indeed, for repeated queries, it will improve performance.

So by the data modeller/programmer declaring Foreign Key constraints, equally the DBMS can improve performance by maintaining access paths. This sounds like a distinction without a difference; or perhaps using different terminology for the same data structure. (Except I smell an implication that a (R-ish)DBMS might not be as performant.) In TTM we prefer the terminology 'Inclusion Dependencies' because some data structures/referential integrity can't be expressed as Foreign Keys.

In particular, Exclusion Dependencies are difficult to express in a way that gives good performance. Oh silly me, of course we don't need Exclusion Dependencies because we have Null; or Null pointers in the object graph -- or if we're lucky, Tagged Unions to represent a properly-terminated edge in the graph.

Sorry, not following you here.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters
