Meaning of a relation
Quote from Dave Voorhis on January 20, 2021, 11:28 amQuote from dandl on January 20, 2021, 9:24 amThen you run a static analyzer that verifies that joining a, b and c really gives an ABC, and, even better, such a tool could be configured to check the table definitions in the database as well.
[Sorry, but not much, to sound like a fuddy-duddy but ...] In the 1970's, on System/38, database schemas were visible at operating system level/to compilers.
You'll be right at home here.
He is right at home here. You're responding to AntC. He's been here a while. :-)
Quote from dandl on January 20, 2021, 9:24 amThen you run a static analyzer that verifies that joining a, b and c really gives an ABC, and, even better, such a tool could be configured to check the table definitions in the database as well.
[Sorry, but not much, to sound like a fuddy-duddy but ...] In the 1970's, on System/38, database schemas were visible at operating system level/to compilers.
You'll be right at home here.
He is right at home here. You're responding to AntC. He's been here a while. :-)
Quote from Dave Voorhis on January 20, 2021, 12:12 pmQuote from dandl on January 20, 2021, 12:31 amIt was little more than a thin skin over JDBC, but just the right amount of skin to make it easier to use SQL queries in Java.
There is an issue here. One of the tasks of a serious-scale software project is to raise the level of the language by adding libraries, macros, reusable code. This greatly reduces the effort for incremental work on the application by those familiar with it, but greatly increases the work for new arrivals, or casual users/contributors. They want to be able to write code using principles learned elsewhere, with a relatively shallow layer of application-specific knowledge acquisition.
IMO this kind of 'thin layer over SQL' is well suited to ad hoc and hopeless at scale. It's good if you just want to drag out some data with a bit of SQL. It's bad if you want long-term durable code that can support rapid prototyping by core developers and deep test suites, and survive heavy refactoring and incremental requirements.
I would never use this thin layer approach at scale. I rather like the 'Helsinki' approach.
I think it's compatible with the Helsinki approach, at least as I understand it. My new "SQL amplifier" simply provides a convenient DBMS abstraction layer to:
- Safely and easily parametrise queries;
- Easily access query results in a native-code type-safe manner;
- Easily auto-generate update and insert queries, as they tend to be particularly awkward;
- Encourage and facilitate unit testing;
- Manage database schema versioning, particularly for multiple deployments; and
- Maintain synchronisation between the database schema and client-side code.
In other words, it does what typical ORMs like Entity Framework do, but with SQL instead of a client-side SQL-ish native code, and without ORM scaffolding, without inevitable ORM machinery to (as required) defeat/enable ORM caching, and without the usual explicit SQL workarounds for the poorly-performing or otherwise inadequate ORM-generated SQL.
It shouldn't harm application scalability at all, as the rest of the application is exactly the same as it would be with an ORM. It simply means the 'persistence layer' (as it usually appears in enterprise applications) can use SQL instead of some "ORM-QL."
Quote from dandl on January 20, 2021, 12:31 amIt was little more than a thin skin over JDBC, but just the right amount of skin to make it easier to use SQL queries in Java.
There is an issue here. One of the tasks of a serious-scale software project is to raise the level of the language by adding libraries, macros, reusable code. This greatly reduces the effort for incremental work on the application by those familiar with it, but greatly increases the work for new arrivals, or casual users/contributors. They want to be able to write code using principles learned elsewhere, with a relatively shallow layer of application-specific knowledge acquisition.
IMO this kind of 'thin layer over SQL' is well suited to ad hoc and hopeless at scale. It's good if you just want to drag out some data with a bit of SQL. It's bad if you want long-term durable code that can support rapid prototyping by core developers and deep test suites, and survive heavy refactoring and incremental requirements.
I would never use this thin layer approach at scale. I rather like the 'Helsinki' approach.
I think it's compatible with the Helsinki approach, at least as I understand it. My new "SQL amplifier" simply provides a convenient DBMS abstraction layer to:
- Safely and easily parametrise queries;
- Easily access query results in a native-code type-safe manner;
- Easily auto-generate update and insert queries, as they tend to be particularly awkward;
- Encourage and facilitate unit testing;
- Manage database schema versioning, particularly for multiple deployments; and
- Maintain synchronisation between the database schema and client-side code.
In other words, it does what typical ORMs like Entity Framework do, but with SQL instead of a client-side SQL-ish native code, and without ORM scaffolding, without inevitable ORM machinery to (as required) defeat/enable ORM caching, and without the usual explicit SQL workarounds for the poorly-performing or otherwise inadequate ORM-generated SQL.
It shouldn't harm application scalability at all, as the rest of the application is exactly the same as it would be with an ORM. It simply means the 'persistence layer' (as it usually appears in enterprise applications) can use SQL instead of some "ORM-QL."
Quote from dandl on January 20, 2021, 1:29 pmMy experience in using tools like this to build software is limited: Rails, MS EF, couple of others. I defer to experts.
But I know about building tools that are used in this way by others, and I know how hard it is to create the right abstraction and defend it over time, at scale and across change. I think the fundamental premise of the ORM is wrong, but while I think a 'layer over SQL' is probably a better way, at least for smaller apps, I don't think it scales or delivers the other benefits of a true abstraction layer.
The ORM is wrong because it maintains a fiction of an OO database. If you had an OO database you wouldn't use it, so why pretend?
Layer over SQL is less wrong (because it keeps the good bits) but it really only gives you better glue code, it doesn't raise the level of the language (or the database).
I would prefer a solution that was function oriented (like Linq or Java Streams) and composable, but providing the benefits of TTM: extensible type system, RA, MA. Visible SQL won't get you there.
My experience in using tools like this to build software is limited: Rails, MS EF, couple of others. I defer to experts.
But I know about building tools that are used in this way by others, and I know how hard it is to create the right abstraction and defend it over time, at scale and across change. I think the fundamental premise of the ORM is wrong, but while I think a 'layer over SQL' is probably a better way, at least for smaller apps, I don't think it scales or delivers the other benefits of a true abstraction layer.
The ORM is wrong because it maintains a fiction of an OO database. If you had an OO database you wouldn't use it, so why pretend?
Layer over SQL is less wrong (because it keeps the good bits) but it really only gives you better glue code, it doesn't raise the level of the language (or the database).
I would prefer a solution that was function oriented (like Linq or Java Streams) and composable, but providing the benefits of TTM: extensible type system, RA, MA. Visible SQL won't get you there.
Quote from tobega on January 20, 2021, 1:59 pmQuote from dandl on January 20, 2021, 12:31 amIt was little more than a thin skin over JDBC, but just the right amount of skin to make it easier to use SQL queries in Java.
There is an issue here. One of the tasks of a serious-scale software project is to raise the level of the language by adding libraries, macros, reusable code. This greatly reduces the effort for incremental work on the application by those familiar with it, but greatly increases the work for new arrivals, or casual users/contributors. They want to be able to write code using principles learned elsewhere, with a relatively shallow layer of application-specific knowledge acquisition.
IMO this kind of 'thin layer over SQL' is well suited to ad hoc and hopeless at scale. It's good if you just want to drag out some data with a bit of SQL. It's bad if you want long-term durable code that can support rapid prototyping by core developers and deep test suites, and survive heavy refactoring and incremental requirements.
I would never use this thin layer approach at scale. I rather like the 'Helsinki' approach
Seems good to me. Not sure if this counts, but I now and then try to mention that business logic tends to get very thin in general and that perhaps we should all just start using https://postgrest.org/en/v7.0.0/ that just exposes your database as a REST API with json objects, which are very close to tuples in TTM, especially if you add Typescript typing on top of them.
Quote from dandl on January 20, 2021, 12:31 amIt was little more than a thin skin over JDBC, but just the right amount of skin to make it easier to use SQL queries in Java.
There is an issue here. One of the tasks of a serious-scale software project is to raise the level of the language by adding libraries, macros, reusable code. This greatly reduces the effort for incremental work on the application by those familiar with it, but greatly increases the work for new arrivals, or casual users/contributors. They want to be able to write code using principles learned elsewhere, with a relatively shallow layer of application-specific knowledge acquisition.
IMO this kind of 'thin layer over SQL' is well suited to ad hoc and hopeless at scale. It's good if you just want to drag out some data with a bit of SQL. It's bad if you want long-term durable code that can support rapid prototyping by core developers and deep test suites, and survive heavy refactoring and incremental requirements.
I would never use this thin layer approach at scale. I rather like the 'Helsinki' approach
Seems good to me. Not sure if this counts, but I now and then try to mention that business logic tends to get very thin in general and that perhaps we should all just start using https://postgrest.org/en/v7.0.0/ that just exposes your database as a REST API with json objects, which are very close to tuples in TTM, especially if you add Typescript typing on top of them.
Quote from Dave Voorhis on January 20, 2021, 2:42 pmQuote from dandl on January 20, 2021, 1:29 pmMy experience in using tools like this to build software is limited: Rails, MS EF, couple of others. I defer to experts.
But I know about building tools that are used in this way by others, and I know how hard it is to create the right abstraction and defend it over time, at scale and across change. I think the fundamental premise of the ORM is wrong, but while I think a 'layer over SQL' is probably a better way, at least for smaller apps, I don't think it scales or delivers the other benefits of a true abstraction layer.
These days, I get paid to use (and sometimes build) tools like this. ORMs are kind of like dynamically typed languages -- they're deceptively simple and make you think you're writing less code, and maybe you are, but you actually wind up going to a lot more effort.
I think a layer over SQL probably is a better way. Over time, I'll find out as I use it.
Having used true abstraction layers of at least the ORM variety -- such as Entity Framework -- with large applications, they seem to wind up encouraging a certain amount of code bloat and a lot of awfully SQL-like non-SQL, plus some actual SQL shoehorned in because something runs too slowly.
That doesn't seem particularly scalable.
The ORM is wrong because it maintains a fiction of an OO database. If you had an OO database you wouldn't use it, so why pretend?
Layer over SQL is less wrong (because it keeps the good bits) but it really only gives you better glue code, it doesn't raise the level of the language (or the database).
I would prefer a solution that was function oriented (like Linq or Java Streams) and composable, but providing the benefits of TTM: extensible type system, RA, MA. Visible SQL won't get you there.
Yes, the full benefits of the TTM ideal would be nice.
But, tradeoffs, as usual.
Quote from dandl on January 20, 2021, 1:29 pmMy experience in using tools like this to build software is limited: Rails, MS EF, couple of others. I defer to experts.
But I know about building tools that are used in this way by others, and I know how hard it is to create the right abstraction and defend it over time, at scale and across change. I think the fundamental premise of the ORM is wrong, but while I think a 'layer over SQL' is probably a better way, at least for smaller apps, I don't think it scales or delivers the other benefits of a true abstraction layer.
These days, I get paid to use (and sometimes build) tools like this. ORMs are kind of like dynamically typed languages -- they're deceptively simple and make you think you're writing less code, and maybe you are, but you actually wind up going to a lot more effort.
I think a layer over SQL probably is a better way. Over time, I'll find out as I use it.
Having used true abstraction layers of at least the ORM variety -- such as Entity Framework -- with large applications, they seem to wind up encouraging a certain amount of code bloat and a lot of awfully SQL-like non-SQL, plus some actual SQL shoehorned in because something runs too slowly.
That doesn't seem particularly scalable.
The ORM is wrong because it maintains a fiction of an OO database. If you had an OO database you wouldn't use it, so why pretend?
Layer over SQL is less wrong (because it keeps the good bits) but it really only gives you better glue code, it doesn't raise the level of the language (or the database).
I would prefer a solution that was function oriented (like Linq or Java Streams) and composable, but providing the benefits of TTM: extensible type system, RA, MA. Visible SQL won't get you there.
Yes, the full benefits of the TTM ideal would be nice.
But, tradeoffs, as usual.
Quote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Quote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Quote from tobega on January 21, 2021, 10:27 amQuote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Quote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Quote from Erwin on January 21, 2021, 2:51 pmQuote from tobega on January 21, 2021, 10:27 amQuote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Little addendum FWIW :
Where I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
So if departments are identified by, say, a 4-byte integer that has no further constraints on it (such as having to be positive, or having to be even, or whatever notion that would rule out some of the integers), then that's over 4bn tuples for every single emloyee who works on some project Z.
Take the tuple {X "Hugh Darwen", Y "does_not_exist", Z "The third Manifesto"}, fill in the values in the predicate and verify for yourself that the resulting proposition (the disjunction) is true (and will also be true for every other thinkable department identifier) if it is true that "Hugh Darwen works on The third Manifesto".
Quote from tobega on January 21, 2021, 10:27 amQuote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Little addendum FWIW :
Where I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
So if departments are identified by, say, a 4-byte integer that has no further constraints on it (such as having to be positive, or having to be even, or whatever notion that would rule out some of the integers), then that's over 4bn tuples for every single emloyee who works on some project Z.
Take the tuple {X "Hugh Darwen", Y "does_not_exist", Z "The third Manifesto"}, fill in the values in the predicate and verify for yourself that the resulting proposition (the disjunction) is true (and will also be true for every other thinkable department identifier) if it is true that "Hugh Darwen works on The third Manifesto".
Quote from tobega on January 21, 2021, 3:47 pmQuote from Erwin on January 21, 2021, 2:51 pmQuote from tobega on January 21, 2021, 10:27 amQuote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Little addendum FWIW :
Where I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
So if departments are identified by, say, a 4-byte integer that has no further constraints on it (such as having to be positive, or having to be even, or whatever notion that would rule out some of the integers), then that's over 4bn tuples for every single emloyee who works on some project Z.
Take the tuple {X "Hugh Darwen", Y "does_not_exist", Z "The third Manifesto"}, fill in the values in the predicate and verify for yourself that the resulting proposition (the disjunction) is true (and will also be true for every other thinkable department identifier) if it is true that "Hugh Darwen works on The third Manifesto".
Well, all is all, as defined by the domain, which may or may not be all possible values of the 4 byte integer. It could also be restricted to a list of N distinct 4-byte values, in which case it would be N values. Otherwise why care about a restriction imposed by some semi-arbitrarily selected physical limit if you don't care about the restriction imposed by the business?
I think that clearly nonsensical alternatives can be safely ignored anyway, since ignoring them cannot affect the outcome. But maybe you've made a good argument against using OR in the predicate that applies to a tuple if we would have to add tuples even for nonsense values on the leg that doesn't matter? I am at least much happier that the AND form is easier to apply further restrictions to.
Quote from Erwin on January 21, 2021, 2:51 pmQuote from tobega on January 21, 2021, 10:27 amQuote from Erwin on January 20, 2021, 8:07 pmQuote from tobega on January 13, 2021, 2:12 pm"All the tuples are true and each tuple means that Employee X works in Department Y OR on project Z"
The problem I have with that is that when I project on (Employee, Department), I get a relation value where it is suddenly very difficult to assert anything about the employee's relation to the department. Since it would (I presume) still be required to have "AND" between all the tuples, my only recourse seems to be to say that each tuple means "Employee X may work in Department Y".
Three things I want to say here.
If the predicate of a relation (/relational expression) truly is "Employee X works in Department Y OR on project Z", [and the relation has three attributes X, Y and Z] then the CWA dictates that :
- if X works in dept.Y, then a tuple must be present for each and every possible Z value (each and every possible project) and the particular (X,Y) combination
- if X works on project Z, then a tuple must be present for each and every possible Y value (each and every possible department) and the particular (X,Z) combination.
Note very carefully that where I mentioned this example in my reply, I was talking very specifically of a relation of degree 1, just attribute X, and your Y and Z were ***NOT*** free variables in my predicate.
Representing this predicate in the form of a relation of degree 3, with Y (department) and Z (project) also being attributes of the relation, gets us straight into the territory of OUTER UNION. And while <OR> in A algebra is indeed intended to correspond to that form of union, there are practical reasons why we'd all want to avoid using it in practice, same reasons why real DBMS's ***require*** "heading compatibility" between the arguments used in an invocation of relational UNION.
Two, the precise nature of the predicate that is satisfied by the result coming out of your projection (and any projection in general actually). In your example, that predicate is (study very carefully some 15-20 times and only if and when you're sufficiently awake) :
- THERE EXISTS SOME PROJECT Z such that ...
- ... Employee X works in Department Y OR on project Z
Note how the 'Z' has, in terms of logic, become a ***bound*** variable of the predicate, meaning it's no longer a ***free*** variable, meaning there's no corresponding Z attribute in the relation representing that.
The tuples that ***should*** appear in the relation are :
- the (X,Y) tuples that satisfy "Employee X works in Department Y"
- ***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)
PONDER THAT SECOND BULLET.
Three, about "it is suddenly very difficult to assert anything about the employee's relation to the department".
If the informational problem you are trying to address in some specific particular piece of code you' re writing is about "the employee's relation to the department", ***and nothing else but that***, then you ***SHOULDN'T*** be inspecting any things that import any concept into the data other than, precisely, "the employee's relation to the department". Meaning, you ***SHOULDN'T*** be inspecting things that say anything, at whatever level of vagueness, about the possible existence of projects that the employee at hand might possibly also be working on. Meaning, if your ***business problem*** is "does this employee work in this department or not", then ***use the relvar that TELLS you thing thing (and nothing else)***.
Thanks for this carefully considered response, very useful to see the logic.
For my own purposes I concluded that the crucial thing is the CWA.
I also concluded that I do not want to have an OR between attributes. It remains to be seen if it is possible to work around this in general, but in this case I would use the predicate "It may be true that Employee X works for Department Y AND in Project Z", which holds when we generate the combinations given. It is a somewhat weaker assertion, but I can separately keep track of the fact that I expect to be able to replace "may be" by "is" for at least one tuple. The crucial point is by the CWA that no missing relation can have "may be" replaced by "is".
You are of course correct that in the simple example given, the projection onto (employee, department) will by necessity contain all departments. But it will not be so if we meanwhile receive other pieces of information such as "nobody on project J works in department C".
As for use case, are you familiar with the problem "Who owns the zebra?" otherwise known as Einstein's riddle? That is probably a good example of the type of work to do (without going to datalog). In addition, of course, to more usual business purposes since it's a language feature.
Little addendum FWIW :
Where I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
So if departments are identified by, say, a 4-byte integer that has no further constraints on it (such as having to be positive, or having to be even, or whatever notion that would rule out some of the integers), then that's over 4bn tuples for every single emloyee who works on some project Z.
Take the tuple {X "Hugh Darwen", Y "does_not_exist", Z "The third Manifesto"}, fill in the values in the predicate and verify for yourself that the resulting proposition (the disjunction) is true (and will also be true for every other thinkable department identifier) if it is true that "Hugh Darwen works on The third Manifesto".
Well, all is all, as defined by the domain, which may or may not be all possible values of the 4 byte integer. It could also be restricted to a list of N distinct 4-byte values, in which case it would be N values. Otherwise why care about a restriction imposed by some semi-arbitrarily selected physical limit if you don't care about the restriction imposed by the business?
I think that clearly nonsensical alternatives can be safely ignored anyway, since ignoring them cannot affect the outcome. But maybe you've made a good argument against using OR in the predicate that applies to a tuple if we would have to add tuples even for nonsense values on the leg that doesn't matter? I am at least much happier that the AND form is easier to apply further restrictions to.
Quote from dandl on January 21, 2021, 10:32 pmWhere I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
We had a discussion around this very point not that long back. It's far from obvious in Algebra-A that OR is well-defined on relations with differing headings, but effectively meaningless. There is a similar issue with NOT. For the 4 cases of Same, Subset, Overlap, Disjoint headings:
- AND equates usefully to Set Intersection, Semijoin, Join, Times
- OR equates usefully to Set Union, nothing, nothing, nothing
- AND NOT equates usefully to Set Minus, Antijoin, nothing, nothing
- OR NOT equates usefully to nothing
Having written that out, I'm less sure I got it right. Please feel free to correct.
Where I wrote "***ALL*** the (X,Y) tuples such that Y is ***some*** department and X works on ***some*** project Z (further unspecified in this relation)", the part "Y is ***some department***" specifically means "any thinkable department", ***including*** all the ones that are not claimed to exist (in some other relvar dedicated to that purpose of stating which departments are in existence).
We had a discussion around this very point not that long back. It's far from obvious in Algebra-A that OR is well-defined on relations with differing headings, but effectively meaningless. There is a similar issue with NOT. For the 4 cases of Same, Subset, Overlap, Disjoint headings:
- AND equates usefully to Set Intersection, Semijoin, Join, Times
- OR equates usefully to Set Union, nothing, nothing, nothing
- AND NOT equates usefully to Set Minus, Antijoin, nothing, nothing
- OR NOT equates usefully to nothing
Having written that out, I'm less sure I got it right. Please feel free to correct.