The case for an RA based on generics instead of types
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
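For anyone who has not seen the alternatives mentioned above, here is a minimal Java sketch of two of them: a join written as flatMap() + filter(), and the same join done with an explicit lookup on a container. The record types, field names and sample data are invented for illustration and are not from this thread; it assumes Java 16 or later (for records) and that the supplier number is unique. jOOλ's innerJoin() is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical tuple types, invented for illustration only.
record Supplier(int sno, String name) {}
record Shipment(int sno, String part, int qty) {}
record Joined(String name, String part, int qty) {}

class JoinSketch {
    // Nested-loop join: for each supplier, scan all shipments and keep the matches.
    static List<Joined> joinWithFlatMap(List<Supplier> suppliers, List<Shipment> shipments) {
        return suppliers.stream()
            .flatMap(s -> shipments.stream()
                .filter(sp -> sp.sno() == s.sno())
                .map(sp -> new Joined(s.name(), sp.part(), sp.qty())))
            .collect(Collectors.toList());
    }

    // Explicit lookup on a container: index suppliers by key once, then probe per shipment.
    // Assumes sno is unique among suppliers (toMap throws on duplicate keys).
    static List<Joined> joinWithLookup(List<Supplier> suppliers, List<Shipment> shipments) {
        Map<Integer, Supplier> bySno = suppliers.stream()
            .collect(Collectors.toMap(Supplier::sno, Function.identity()));
        return shipments.stream()
            .filter(sp -> bySno.containsKey(sp.sno()))
            .map(sp -> new Joined(bySno.get(sp.sno()).name(), sp.part(), sp.qty()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Supplier> suppliers = List.of(
            new Supplier(1, "Smith"), new Supplier(2, "Jones"), new Supplier(3, "Blake"));
        List<Shipment> shipments = List.of(
            new Shipment(1, "P1", 300), new Shipment(1, "P2", 200),
            new Shipment(2, "P1", 100), new Shipment(4, "P9", 50));
        joinWithFlatMap(suppliers, shipments).forEach(System.out::println);
        joinWithLookup(suppliers, shipments).forEach(System.out::println);
    }
}
```

The first version rescans the shipments for every supplier; the second builds the index once, which is roughly what the "if you have to run the query more than once" remark is getting at.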
Quote from AntC on April 8, 2020, 11:08 pm
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements. People rarely (allegedly) want more than Foreign Keys, because SQL provides only Foreign Keys (and not much more for a long time); and if you want anything fancier use triggers for validation and Stored Procedures for queries; except don't use triggers because they slow performance; Stored Procedures only for RBAR-style retrieval following Foreign Keys.
So Streams only provide lookup via declared Foreign Keys (the 'object graph') because SQL does little else because ... history and benighted imagination in the 1970s.
What's hard to measure is how rare it really is for programs to use streams to provide the forward direction for the main logic, while taking each element of the stream and looking sideways/backwards with ad-hoc database access that would have been better expressed in one bigger query.
Furthermore not all possible Foreign Keys get declared (let alone more subtle referential integrity), again because of the worry about performance for transaction insert. So what is declared is very much focussed on lookup for validating transaction input: CRUD or a few dozen Order/Invoice/Payslip lines.
I've seen some horrendous stuff written with explicit loops in PL/SQL, because the DBA wouldn't allow 'occasional' Foreign Keys to be declared; and certainly not a View using those Keys. I can think of a couple of typical SQL designs with lots of nullable fields that would be Foreign Key references if not Null.
Quote from dandl on April 9, 2020, 12:40 am
I get the point, but that seems an odd omission. Join is foundational, AFAICT. LINQ has a serviceable join, although I never get it right the first time. How do you deal with joining two CSV tables, without join?
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I guess if that's the problem, it's what an ORM tries to solve and there are already plenty of those.
But my goal is specifically to resolve the type mismatch inherent in the RA if you try to treat relations as types (as TTM does). And to join CSV files.
Edit: re separate compilation, queries written in this language have dependencies on relation value headings and library functions that must be resolved by the compiler. Change the dependencies, recompile. But I expect the query compile and host language compile to be separate. Perhaps that's not what you meant.
I'm not sure I understand your last "Edit: ..." paragraph.
Oops! I edited the wrong message! Ignore.
Quote from dandl on April 9, 2020, 1:25 am
What this discussion highlights to me is that there remains a significant problem with scalar types. My proposal assumes that all the attribute data exists in a host-language-provided type system, nothing more needed, but this is rarely true. And this is a big part of what an ORM does.
Simple examples:
- you read data from a CSV file, you get strings, but you want to read (and write) numbers and dates
- you read data from a JSON file, you get numbers and strings, but you want to read (and write) currency and dates
- you read data from ODBC, you get columns X and Y but you want to read (and write) a POINT data structure
You can (and should) get rid of the non-scalar types by using a template generic approach, as used by collection classes in C++, C# and Java, but you still need an adequate scalar type system for attributes. There is no longer the type system mismatch of TTM/D, but now there is the type system mismatch of all the different data sources you need to deal with: CSV, XLS, JSON, ODBC, POCO/POJO to mention a few.
Thinking of the template-based ERA in language terms, it's possible to devise syntax to write the actual queries, but there needs to be a declarations section spelling out all the data sources and callable functions. This is very much like the approach used in a number of compiler generators. Here is a fragment of a Pegasus grammar for Andl:
    @namespace Andl.Peg
    @classname PegParser
    @using System.Linq

    // A do body is just {} or { statements }, EOF not allowed
    DoBody <IList<AstStatement>>
      = WSO LCexp &{ AST(state).Enter() } WSC v:DoBodyLine* &{ AST(state).Exit() } WSO RCexp { v };
    DoBodyLine <AstStatement>
      = DirectiveOrBlank { AST(state).Empty() } // Note: DoBlock must discard these
      / WSO !RC v:Statement WSO (&RC / EOLchk EOLX) { v };

You can see portions of PEG code interleaved with C# code. The result is generated C# code, which is readable and debuggable. It actually works quite well, and is one real possibility, but it's a much bigger job than the initial proposal. More thought needed.
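To make the first of the simple examples above concrete (CSV gives you strings, but you want numbers and dates), here is a minimal Java sketch of the conversion layer each data source would need. The row type, its attribute names and the three-field layout are invented for illustration; it assumes Java 16 or later, ISO-8601 dates, and fields with no quoting or embedded commas. It is not the declarations section proposed above, only an illustration of the mismatch it would have to describe.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row type: attribute names and types are invented for illustration.
record Order(int id, LocalDate placed, BigDecimal amount) {}

class CsvTypingSketch {
    // Convert one raw CSV line (all strings) into typed attribute values.
    // Assumes comma-separated fields with no quoting or embedded commas.
    static Order parse(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("expected 3 fields: " + line);
        }
        return new Order(
            Integer.parseInt(f[0].trim()),
            LocalDate.parse(f[1].trim()),   // ISO-8601 date, e.g. 2020-04-09
            new BigDecimal(f[2].trim()));   // exact decimal, suitable for currency
    }

    // The reverse mapping, for writing typed values back out as strings.
    static String format(Order o) {
        return o.id() + "," + o.placed() + "," + o.amount().toPlainString();
    }

    static List<Order> parseAll(List<String> lines) {
        return lines.stream().map(CsvTypingSketch::parse).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = parseAll(List.of("7, 2020-04-09, 12.50", "8, 2020-04-10, 3.00"));
        orders.forEach(o -> System.out.println(format(o)));
    }
}
```

Every other source in the list (JSON, ODBC, XLS, POCO/POJO) would need its own pair of mappings like parse() and format(), which is the part an ORM normally takes on.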
Quote from Dave Voorhis on April 9, 2020, 10:31 am
Quote from AntC on April 8, 2020, 11:08 pm
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.
It's rare because of the inherent nature of object oriented programs; instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.
An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.
It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.
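As a rough illustration of what "the object graph is already there" means in practice, here is a minimal Java sketch. The Customer and Invoice classes and the sample figures are invented for illustration and are not from this thread. The references between instances are set up when the objects are created, so the query simply follows them instead of matching keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain classes, invented for illustration only.
class Customer {
    final String name;
    final List<Invoice> invoices = new ArrayList<>();

    Customer(String name) { this.name = name; }

    // The instance-to-instance references are created here, at construction
    // time, and maintained as state changes.
    Invoice addInvoice(int amount) {
        Invoice inv = new Invoice(this, amount);
        invoices.add(inv);
        return inv;
    }
}

class Invoice {
    final Customer customer;
    final int amount;

    Invoice(Customer customer, int amount) {
        this.customer = customer;
        this.amount = amount;
    }
}

class ObjectGraphSketch {
    // No join on keys: the query navigates references that already exist.
    static List<String> largeInvoices(List<Customer> customers, int threshold) {
        return customers.stream()
            .flatMap(c -> c.invoices.stream())
            .filter(inv -> inv.amount >= threshold)
            .map(inv -> inv.customer.name + ":" + inv.amount)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Customer alice = new Customer("Alice");
        alice.addInvoice(50);
        alice.addInvoice(500);
        Customer bob = new Customer("Bob");
        bob.addInvoice(700);
        System.out.println(largeInvoices(List.of(alice, bob), 100));
    }
}
```

Note that flatMap() appears here too, but only to flatten a collection reached through a reference, not to pair up two otherwise unrelated collections.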
Quote from dandl on April 9, 2020, 2:09 pm
Quote from Dave Voorhis on April 9, 2020, 10:31 am
Quote from AntC on April 8, 2020, 11:08 pm
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.
It's rare because of the inherent nature of object oriented programs; instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.
An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.
It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.
I would like to draw a distinction between 3 classes of program (I'm sure there are others).
- Window on Data (a la Toon Koppelaars). The database is a filing cabinet for a running business; there are some reasonably simple programs to add/update what it contains, and a variety of displays/reports that let people see the data in various ways.
- Data warehouse. The data is a record of stuff that has happened (weather records, power usage, traffic, etc.); the challenge is all on the analysis side.
- Persistence mechanism. The database is simply the persistence store for a continually running complex application, so that it can be restarted if needed, and is not accessed except in the context of that application.
It seems to me you're always focused on the last of these, as if it was the only thing that mattered. I don't think it is.
Quote from Dave Voorhis on April 9, 2020, 2:15 pm
Quote from dandl on April 9, 2020, 2:09 pm
Quote from Dave Voorhis on April 9, 2020, 10:31 am
Quote from AntC on April 8, 2020, 11:08 pm
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.
It's rare because of the inherent nature of object oriented programs; instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.
An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.
It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.
I would like to draw a distinction between 3 classes of program (I'm sure there are others).
- Window on Data (a la Toon Koppelaars). The database is a filing cabinet for a running business; there are some reasonably simple programs to add/update what it contains, and a variety of displays/reports that let people see the data in various ways.
- Data warehouse. The data is a record of stuff that has happened (weather records, power usage, traffic, etc.); the challenge is all on the analysis side.
- Persistence mechanism. The database is simply the persistence store for a continually running complex application, so that it can be restarted if needed, and is not accessed except in the context of that application.
It seems to me you're always focused on the last of these, as if it was the only thing that mattered. I don't think it is.
Are you sure you've responded to the correct post?
Your response doesn't appear to have anything to do with what I wrote.
If it was a response to what I wrote, then you appear to have misunderstood what I wrote.
Quote from dandl on April 10, 2020, 4:37 am
Quote from Dave Voorhis on April 9, 2020, 2:15 pm
Quote from dandl on April 9, 2020, 2:09 pm
Quote from Dave Voorhis on April 9, 2020, 10:31 am
Quote from AntC on April 8, 2020, 11:08 pm
Quote from Dave Voorhis on April 8, 2020, 6:33 pm
Quote from Erwin on April 8, 2020, 6:09 pm
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
I remember having had to revert to that paradigm once. I cannot imagine anyone liking it, but then perhaps there are those who simply do not know any better.
It's not a matter of not knowing any better; it's simply that in the general-purpose programming contexts where Streams is normally used -- or its .NET & C# equivalent, LINQ -- you usually do have a preexisting object graph to query within a running program. Performing ad-hoc joins on tables/relvars (that are often unrelated until referenced in a query) is a database thing; it's not a typical programming thing. You can do it -- see flatMap() + filter() and jOOλ's innerJoin(), above -- but it's rare to need to do it.
Hmmm? Rare? That's always smelled to me like one of those self-fulfilling statements.
It's rare because of the inherent nature of object-oriented programs: instance-to-instance references -- i.e., the object graph -- tend to be created from the start and maintained as state changes, rather than constructed dynamically in queries.
An object graph isn't an artificial construct. It's already there because it's the essence of a running object-oriented program.
It needs to be emphasised that a running object-oriented program and a database are completely different things. The idea discussed in this thread -- using an object-oriented language to connect to diverse and discrete data sources and then using a functional-programming-inspired API like .NET LINQ or Java Streams to query them -- is a bit odd from a typical object-oriented programming point of view.
I would like to draw a distinction between 3 classes of program (I'm sure there are others).
- Window on Data (a la Toon Koppelaars). The database is a filing cabinet for a running business; there are some reasonably simple programs to add/update what it contains, and a variety of displays/reports that let people see the data in various ways.
- Data warehouse. The data is a record of stuff that has happened (weather records, power usage, traffic, etc.); the challenge is all on the analysis side.
- Persistence mechanism. The database is simply the persistence store for a continually running complex application, so that it can be restarted if needed, and is not accessed directly or outside the context of that application.
It seems to me you're always focused on the last of these, as if it was the only thing that mattered. I don't think it is.
Are you sure you've responded to the correct post?
Your response doesn't appear to have anything to do with what I wrote.
If it was a response to what I wrote, then you appear to have misunderstood what I wrote.
Perhaps I did. You gave an answer that seemed to me to come from a particular narrow perspective; I was trying to point out that there are others, equally valuable, that have nothing in particular to do with 'maintaining an object graph'. But perhaps it would be better if I just ask: what do you mean, and how would you prefer to go about "using an object-oriented language to connect to diverse and discrete data sources" and then querying same if this is not it?
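For readers who have not seen it, the flatMap() + filter() join mentioned at the start of the quoted exchange might look roughly like the sketch below. The `Customer`, `Order` and `CustomerOrder` records and their sample data are hypothetical, made up purely for illustration, and records assume Java 16 or later. It is a plain nested-loop equi-join over two unrelated in-memory lists, not anything taken from a poster's code.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapJoin {
    // Hypothetical record types, invented for this illustration.
    record Customer(int id, String name) {}
    record Order(int orderId, int customerId, String item) {}
    record CustomerOrder(String name, String item) {}

    // Nested-loop equi-join: for each customer, scan all orders and keep
    // those whose customerId matches, then project to a result record.
    static List<CustomerOrder> join(List<Customer> customers, List<Order> orders) {
        return customers.stream()
            .flatMap(c -> orders.stream()
                .filter(o -> o.customerId() == c.id())
                .map(o -> new CustomerOrder(c.name(), o.item())))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(1, "Ann"),
            new Customer(2, "Bob"));
        List<Order> orders = List.of(
            new Order(10, 1, "pen"),
            new Order(11, 2, "ink"),
            new Order(12, 1, "pad"));
        join(customers, orders).forEach(System.out::println);
    }
}
```

Every call to `join` rescans the whole order list once per customer, which is the cost being alluded to when JOIN is described as not "acceptably performant" for repeated queries.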
Quote from Dave Voorhis on April 10, 2020, 10:26 am
Quote from dandl on April 10, 2020, 4:37 am
Perhaps I did. You gave an answer that seemed to me to come from a particular narrow perspective; I was trying to point out that there are others, equally valuable, that have nothing in particular to do with 'maintaining an object graph'. But perhaps it would be better if I just ask: what do you mean, and how would you prefer to go about "using an object-oriented language to connect to diverse and discrete data sources" and then querying same if this is not it?
I was merely pointing out that's the typical assumption in the object-oriented programming world. I.e., the usual thinking is that you already have a fully-connected object graph, so why would you need JOIN?
I wasn't advocating it.
I like the idea of using a general-purpose language as a query language for diverse data sources. It's the essence of my work-in-progress datasheet tool.
But I imagine a goodly number of typical object-oriented programmers will want to create an object graph linking instances of data from the diverse data sources (assuming the selection of sources is relatively static) before they reach the query stage, and I have no objection to that. Indeed, for repeated queries, it will improve performance.
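One way to picture the "link the instances first, then query" approach is the sketch below. It assumes hypothetical `Customer` and `Order` records (Java 16 or later) and stands in for a real object graph with a simple id-to-orders map built once via `Collectors.groupingBy`. It is only an illustration of the general idea, not the datasheet tool's design.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class IndexedJoin {
    // Hypothetical record types, invented for this illustration.
    record Customer(int id, String name) {}
    record Order(int orderId, int customerId, String item) {}

    // Built once up front: customer id -> that customer's orders.
    static Map<Integer, List<Order>> indexByCustomer(List<Order> orders) {
        return orders.stream()
            .collect(Collectors.groupingBy(Order::customerId));
    }

    // Each later query is a map lookup instead of a scan of all orders.
    static List<String> itemsFor(Customer c, Map<Integer, List<Order>> index) {
        return index.getOrDefault(c.id(), List.of()).stream()
            .map(Order::item)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order(10, 1, "pen"),
            new Order(11, 2, "ink"),
            new Order(12, 1, "pad"));
        Map<Integer, List<Order>> index = indexByCustomer(orders);

        // The same index serves any number of subsequent queries.
        System.out.println(itemsFor(new Customer(1, "Ann"), index));
        System.out.println(itemsFor(new Customer(3, "Cy"), index));
    }
}
```

The trade-off is the usual one: the index has to be kept in step with the underlying data, which is easier when the selection of sources is relatively static.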