Possreps, Objects, Mutability, Immutability, and Developing Non-database Applications.
Quote from AntC on June 22, 2019, 1:13 am
Quote from Tegiri Nenashi on June 21, 2019, 5:10 pm
Quote from AntC on June 21, 2019, 9:59 am
... Of course because SQL's support for constraints is a joke. This should be a strong argument for TTM over SQL.
Wouldn't SQL ASSERT be adequate for the task?
Heh heh. Look what I bumped into as very nearly the first Google hit: https://community.oracle.com/ideas/13028
Assertions have been part of the SQL standard since SQL-92.
We are considering building support for the CREATE ASSERTION command in a next release
You'd think this was written sometime in the mid 1990s. Wrong: it's 2016.
You'd think this was written by some two-bit DBMS vendor. Wrong, it's Oracle Corp.
Perhaps it was written by some Oracle junior who didn't know SQL from a bar of soap? Wrong, it's Toon_Koppelaars-Oracle
AFAICT to this day, Oracle has very limited support for ASSERTIONs, and other vendors are not much better. Quite a few implement CHECK CONSTRAINTs -- that is, check the value in specific columns. Some have limited forms of whole-row constraints or whole-table constraints -- that is, more than just UNIQUE constraints. I don't think any of the big vendors offer inter-table constraints (beyond FOREIGN KEY).
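For concreteness, here is roughly what such an inter-table constraint looks like as a Tutorial D database constraint (a minimal sketch: the relvars S and SP and the business rule are illustrative assumptions, in the spirit of the usual suppliers-and-parts examples):

    // Two base relvars (hypothetical). The constraint below spans both of
    // them -- the kind of rule SQL CHECK and FOREIGN KEY cannot express.
    VAR S REAL RELATION { SNO CHAR , STATUS INTEGER } KEY { SNO } ;
    VAR SP REAL RELATION { SNO CHAR , PNO CHAR , QTY INTEGER } KEY { SNO , PNO } ;

    // No shipment quantity may exceed its supplier's status.
    CONSTRAINT QTY_WITHIN_STATUS IS_EMPTY ( ( S JOIN SP ) WHERE QTY > STATUS ) ;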
It is the implementation of database assertions that needs more thought. Either in SQL or TTM, evaluating assertions can be extremely expensive.
Yes, this is the excuse vendors give. Then do you (or they) suggest we just don't bother validating incoming data? Instead you can write database triggers to validate before update (and that'll be Row By Agonising Row); or you can build the logic into your application. IOW you're doing exactly the same amount of logic, just in the wrong way (RBAR rather than set-of-tuples) or in the wrong place (on the client side rather than in the database engine). This is bound to be more expensive.
It's a joke.
In real life, constraints are everywhere; they are just called differently. Humans gave up on enforcing the law flawlessly (i.e. without any incidental violations) a long time ago. Therefore, one can break a rule (or two). Some mess would be created that has to be cleaned up afterwards, but the troublemaker is penalized (if caught). The idea of constraint enforcement via applications is of that kind.
No. There's a logical difference between database content that conforms to constraints vs content that doesn't. (That is, conforms to the constraints you'd like to ASSERT if only your vendor supported it.) That's one of those big differences. Because a query is an exercise in logic: what can I infer, given the facts currently represented in the database? -- that is, facts as axioms. If those facts are inconsistent (with each other, because an undeclared constraint fails), then your query can conclude anything it likes: FALSE implies anything. You might be lucky that some queries don't draw on the tables or rows that hold falsehoods -- or your query has some convoluted logic to avoid them (such as typically surrounds SQL NULL) -- more "extremely expensive" logic that is amortised over the drip, drip of running queries. But everything you do with the database has to be qualified by caveats and Fear, Uncertainty and Doubt.
Oh, and here's some more humour from an SQL reference:
All Constraints are defined with a deferral mode of either DEFERRABLE or NOT DEFERRABLE. A deferral mode of DEFERRABLE allows you to specify when you want your DBMS to check the Constraint for violation (the choices are at statement end or at transaction end). A deferral mode of NOT DEFERRABLE doesn’t give you this option: your DBMS will check the Constraint for violation as soon as it finishes executing an SQL statement. Of course, not every SQL statement will cause your DBMS to check Constraints – the main statements that cause Constraint checking are INSERT, UPDATE and DELETE (there is no Constraint checking for DROP statements). DELETE is slightly less important because whenever you get rid of a row, there is no longer any need to check whether that row violates a Constraint.
WTF? If I delete the row holding a Supplier record, that invalidates all the transactions referencing that supplier: FOREIGN KEY constraint violation.
DEFERRABLE constraints (and there are some exciting timing options in there, in case there was any chance of you understanding the impact on your data) are a superb way to banjax your database. At least the reference has a caution (in the fine print):
You’re taking a huge risk when you use deferred Constraints,
Why the need for DEFERRABLE constraints? Because sometimes you need multiple updates to get your data consistent; and SQL has nothing like that. This (Multiple Assignment) should be another strong argument for TTM over SQL.
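A minimal sketch in Tutorial D of the sort of case that drives people to deferral (the relvars ACTIVE_EMP and ARCHIVED_EMP, and the rule that an employee must be in exactly one of them, are my own illustrative assumptions): moving a row takes two updates, so in SQL any constraint tying the two tables together would have to be DEFERRABLE. With multiple assignment the two updates form one statement, and constraints are checked once, at the end:

    // Both updates take effect as a single atomic assignment; database
    // constraints are checked only at the terminating semicolon, so there
    // is no intermediate state with the employee in neither relvar.
    DELETE ACTIVE_EMP WHERE ENO = 'E42' ,
    INSERT ARCHIVED_EMP RELATION { TUPLE { ENO 'E42' , NAME 'Smith' } } ;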
But then the industry wouldn't need so many propellor-headed data analysts (like me) going round patching up databases or writing convoluted logic buried in applications where only the programmer understands it.
Quote from Erwin on June 22, 2019, 12:06 pm
Toon has been pushing this idea for a very, very long time. But it's also a long time since I last heard him say anything anywhere on the topic.
Blanket statements that "evaluating assertions can be extremely expensive" do not add to the discussion (and are even arguably wrong for the vast majority of cases). Proper physical access paths are all that's needed, and the only constraints that will intrinsically be "always [relatively] expensive" are those involving aggregations/summaries over large tuple sets. I've never got around to actually building it, but a good analysis tool should get you very close, even to outright telling the DBA "you need an index on this-and-that on this relvar for that constraint".
One of my test scenarios is as follows:
- a BOM (bill of materials) relvar with two identifiers
- a "no cyclic graphs" constraint on that relvar (i.e. ISEMPTY(TCLOSE(my_bom_relvar) WHERE ID1 = ID2)).
- an update program that "runs forever", each time applying 500 randomly generated updates to that relvar in one MA, initially 99% inserts and 1% deletes, evolving to 50% inserts and 50% deletes as the relvar "grows".
I've never seen its running pace fall to pieces on any particular size I allowed the relvar to grow to. The process shows hiccups (temporary speed degradations), but never permanent ones.
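For the record, that fixture is easy to state in Tutorial D (a sketch: the relvar name MY_BOM and the INTEGER identifiers are guesses at the setup, and ISEMPTY is spelled IS_EMPTY as in DTATRM):

    // Bill-of-materials edges: part ID1 directly contains part ID2.
    VAR MY_BOM REAL RELATION { ID1 INTEGER , ID2 INTEGER } KEY { ID1 , ID2 } ;

    // No part may contain itself, directly or transitively: the transitive
    // closure of the containment graph must have an empty "diagonal".
    CONSTRAINT NO_CYCLIC_GRAPHS IS_EMPTY ( TCLOSE ( MY_BOM ) WHERE ID1 = ID2 ) ;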
All that said, I have developed some doubts regarding ASSERTION in a 3VL system. The maths I did for my 2VL implementation seems unfit to be adapted to SQL's 3VL, where the basic principle concerning constraints is "if you don't know, then you let it pass".
Quote from johnwcowan on June 22, 2019, 1:41 pm
I wrote: "Big vendors have far less to gain by implementing standards than small vendors do."
I should add that big vendors win even bigger if they can make a description of what their existing product does into a standard, as in the notorious case of Microsoft's Office Open XML, which became ISO/IEC 29500. That way they gain an even bigger advantage over smaller competitors, who are going to be far behind in conformance for a very long time. In this case it is good if the standard is as complex and confusing as possible.
Still more is achieved for the big vendor if, once the standard is finished, they immediately initiate the development of the next version of the standard matching the next version of their product. They have been able to develop their new product version during standardization, since they have not had to push for anything during the standards process, only fight defensively to avoid "undue" simplification. At this point the small vendors are trapped on an obsolete version of the standard. This game of shifting standards (not in the usual sense of that term) can go on indefinitely until the vendor loses interest.
Quote from Dave Voorhis on June 22, 2019, 1:45 pm
Quote from johnwcowan on June 22, 2019, 1:41 pm
Let's face it: software is crap. Feature-laden and bloated, written under tremendous time-pressure, often by incapable coders, using dangerous languages and inadequate tools, trying to connect to heaps of broken or obsolete protocols, implemented equally insufficiently, running on unpredictable hardware -- we are all more than used to brokenness. --Felix Winkelmann
I appreciate the sentiment, but Felix is far too optimistic to be a real programmer.
Quote from johnwcowan on June 22, 2019, 1:59 pm
Quote from Dave Voorhis on June 22, 2019, 1:45 pm
I appreciate the sentiment, but Felix is far too optimistic to be a real programmer.
Well, perhaps. He wrote Chicken, a world-class compiler and interpreter for Scheme and a very substantial standard library + extensions in about 17,000 lines of C plus a lot of Scheme, and the project continues to exist and sees use (including commercial use) mostly without him these days. It compiles Scheme to C, which is then compiled using gcc or clang, and is bootstrapped by having a copy of the generated C for the bare-bones compiler, or by downloading a suitable binary as the bootstrap compiler.
Quote from Dave Voorhis on June 22, 2019, 2:01 pm
Quote from johnwcowan on June 22, 2019, 1:59 pm
Quote from Dave Voorhis on June 22, 2019, 1:45 pm
I appreciate the sentiment, but Felix is far too optimistic to be a real programmer.
Well, perhaps. He wrote Chicken, a world-class compiler and interpreter for Scheme and a very substantial standard library + extensions in about 17,000 lines of C plus a lot of Scheme, and the project continues to exist and sees use (including commercial use) mostly without him these days. It compiles Scheme to C, which is then compiled using gcc or clang, and is bootstrapped by having a copy of the generated C for the bare-bones compiler, or by downloading a suitable binary as the bootstrap compiler.
I know. I was being sarcastic. I should have used a smiley.
Quote from Erwin on June 22, 2019, 5:48 pm
Quote from johnwcowan on June 22, 2019, 1:41 pm
and multiple assignment won't handle all such cases.
Hmmmmmmmmm. Care to elaborate?
Quote from johnwcowan on June 22, 2019, 10:48 pm
In principle it is always possible to write an expression for each relvar in the database giving its new value in terms of the existing values of some or all of the relvars in the database, and then a multiple assignment to the relvars will achieve a correct result. But the resulting code might be a tad difficult for a human being to understand, even with the introduction of private relvars prior to the assignment to capture common subexpressions. It would be quite an optimizer that could reliably turn a multi-relvar transformation of arbitrary complexity into a minimal number of relvar inserts, deletes, and updates at the physical level.
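For instance (a small sketch; the relvars A and B and the attribute K are hypothetical), "move the K = 1 tuples from A to B" comes out as whole-relvar assignments over the pre-update values:

    // All right-hand sides are evaluated against the old database value
    // before either assignment takes effect, so the order of the two
    // assignments within the statement is immaterial.
    A := A MINUS ( A WHERE K = 1 ) ,
    B := B UNION ( A WHERE K = 1 ) ;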
Indeed, if multiple assignments like this were the rule rather than the exception, we could dispense with transactions, and simply say that the unit of work is a relational assignment, single or multiple.
Quote from AntC on June 23, 2019, 12:01 am
Quote from johnwcowan on June 22, 2019, 10:48 pm
In principle it is always possible to write an expression for each relvar in the database giving its new value in terms of the existing values of some or all of the relvars in the database, and then a multiple assignment to the relvars will achieve a correct result. But the resulting code might be a tad difficult for a human being to understand, even with the introduction of private relvars prior to the assignment to capture common subexpressions. It would be quite an optimizer that could reliably turn a multi-relvar transformation of arbitrary complexity into a minimal number of relvar inserts, deletes, and updates at the physical level.
D&D have had many occasions to regret RM Pre 21's wording "the assignment operator", as if they were prescribing syntax. 'We do not prescribe syntax.' Remember, RM Pre 3 (second half) prescribes support for update operators. So you can have your traditional INSERT/DELETE/UPDATE ... WHERE ..., and it's the compiler's job to figure out how to assign the required value to each relvar.
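Tutorial D itself defines those update operators as shorthands for relational assignment; a minimal sketch (R is a hypothetical relvar with heading { X INTEGER }):

    INSERT R RELATION { TUPLE { X 1 } } ;  // i.e. R := R UNION RELATION { TUPLE { X 1 } } ;
    DELETE R WHERE X = 1 ;                 // i.e. R := R WHERE NOT ( X = 1 ) ;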
Indeed, if multiple assignments like this were the rule rather than the exception,
As per RM Pre 21, you mean? Swanky update operators could well be syntactic sugar for bunching multiple assignments taken from a private relvar which follows a denormalised schema relative to the assignment targets.
we could dispense with transactions, and simply say that the unit of work is a relational assignment, single or multiple.
That is what RM Pre 21 says, in effect, wrt ensuring all constraints hold within the database at (multiple) assignment boundaries.
You still need transaction commitment (RM Pre 4) for multiple multiple assignments. I am less convinced of the need for nested transaction commitments (RM Pre 5) -- that would be multiple multiple multiple assignments.
Quote from johnwcowan on June 23, 2019, 12:56 am
Quote from AntC on June 23, 2019, 12:01 am
You still need transaction commitment (RM Pre 4) for multiple multiple assignments. I am less convinced of the need for nested transaction commitments (RM Pre 5) -- that would be multiple multiple multiple assignments.
Per contra, you actually need only one single assignment: the assignment of the whole database, which per Appendix D of the TTM book is a tuplevar, to a new tuple. (Indeed, in the closed world of Tutorial D, there is no need to perform more than one such action ever, as everything that can be known is already known, and all we need do is reformulate it for all possible uses present and future. But that's by the way.) However, that's only in principle. I think it is "quite beyond the powers of all the dwarves put together, if they could all be collected again from the four corners of the world" to convert such a single assignment in full generality to a minimal set of physical-level inserts, deletes, and updates. Much simpler tasks are attempted by database optimizers every day, and they often fail, or are very brittle for reasons that remain mysterious to their users.
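For concreteness, the in-principle single assignment looks something like this sketch (the tuplevar DB, its attributes, and the update are all hypothetical illustrations):

    // The database as a single tuplevar whose attributes are the relvars;
    // every update is then one tuple assignment. A FROM t extracts an
    // attribute, here the current value of "relvar" S or SP.
    VAR DB TUPLE { S RELATION { SNO CHAR } , SP RELATION { SNO CHAR , PNO CHAR } }
        INIT ( TUPLE { S RELATION { SNO CHAR } { } ,
                       SP RELATION { SNO CHAR , PNO CHAR } { } } ) ;

    DB := TUPLE { S ( ( S FROM DB ) UNION RELATION { TUPLE { SNO 'S9' } } ) ,
                  SP ( SP FROM DB ) } ;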