The Third Manifesto ForumDatabases: Database TheoryTuple visibility through a view

Please or Register to create posts and topics.

Tuple visibility through a view

#21 · October 13, 2019, 12:55 am

Quote from p c on October 13, 2019, 12:55 am
Quote from Dave Voorhis on October 12, 2019, 4:32 pm

Quote from p c on October 12, 2019, 3:24 pm

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Here's a simplified part of a billing system, written in Tutorial D:
VAR InvoiceHeading REAL RELATION {InvoiceNumber INT, InvoiceDate Date, CustomerID CHAR} KEY {InvoiceNumber};
VAR InvoiceDetail REAL RELATION {InvoiceNumber INT, ProductCode CHAR, Qty INT} KEY {InvoiceNumber, ProductCode};

CONSTRAINT InvoiceDetail_FK1 InvoiceDetail {InvoiceNumber} ⊆ InvoiceHeading {InvoiceNumber};

VAR Invoices VIEW InvoiceHeading JOIN InvoiceDetail
We can assume there would also be, at least, relvars and corresponding constraints for customers and products, which for simplicity's sake are not shown here.

Suppose the view Invoices is used to present invoices on some user interface. If the user deletes a tuple from Invoices, what should happen?

Alternatively, if this schema is undesirable, how should it be changed?

In other words, if there are issues that have "never been understood by the posters here", what do we need to understand?

Please be specific and as clear and logical as possible, and use examples.
What should happen is a typical misguided coder's question. It depends on whatever the application's requirement is, in other words what the application designer wants to happen. What will happen depends on the predicate expressed by the view's expression that addresses the singleton relation aka tuple. If the expression doesn't address a certain singleton aka addressing an empty relation, then nothing should happen.

Since detail invoice number logically implies a header invoice number, a header can't be deleted without also deleting all matching details. If that is desired then the most obvious view is not the one declared, rather it's a projection of that view and the algebraic expression that defnes it should give an extension that satisfies a predicate having quantified variables.

If the designer wants one certain detail among others for the same invoice to be deleted, the predicate of Invoices Join Details derives the projection equation Invoices{number} = Details{number} as well as the resulting Invoices'{number} = Details'{number}. See Codd 1970 for natural join definition. (Note as in next paragraph that this predicate is not usually the predicate of a deleted singleton or to-be-deleted singleton.)

If it's desired to never delete an invoice from this view, more accurately never replace a subset of its extension that is addressed by a certain expression/predicate with an empty relation, the predicate Invoices Join Details is the WRONG predicate to use. It can only address all invoices. Lots of talk here treats it as the right predicate even though it is wrong unless Invoices has only one tuple. Instead a different view could join Invoices Join Details = J with two singleton relations having a distinct attribute and two distinct values, eg rel{tup{x 1}} = KeepI and rel{tup{x 0}} = DeleteI. J Join KeepI Union J Join DeleteI = B is the beginning of an expression that can reflect a more accurate predicate. Its importance is that it can be used as part of the effective predicate of an application view that addresses only the subset that matches DeleteI, eg., B Join DeleteI, which guarantees that the projection J{number} is always a subset of J'{number} because that must be a subset of J Join Keep1{number} which guarantees that the resulting projection Invoices'{number} implies all the original invoicenumbers and possibly more.

The above can be embodied in a further expression that projects out the x attribute giving a relation for which it can be said forall ... x = 'DeleteI'. Effectively every tuple of the projection implies an unprojected tuple such that x = 'DeleteI'. This gives a projected view predicate whose empty extension can be replaced with a non-empty extension, thus allowing inserts to B.

But these particular answers might just be answering some application requirement that you haven't given. Don't ask me for more details if you can't detail your application's requirments. Anyway, this level of manipulation is beyond most coders and any optimization to implement a language shortcut is more so. Even if it weren't it's not economic for coders to be involved because they are generally overwhelmed by more routine matters and confused by even the most basic relational concepts, as the original question shows, In fact this is probably the wrong group to even discuss it in and given that there is so little mention of physical issues that can effect language implementation and therefore dbms implementation, leaving a scope here that is just language and given that the talk is generally not even relational it is strange to see talk of 'engines'.

The question of whether the given schema is desireable is also misguided, about as useful as the strangely popular adjective here, 'reasonable'. When talking about what Codd talked about, it is what de Bono called a porridge word. It is only ever useful in file system talk where anything a program can do is reasonable if it satifies a heritage of unspoken requirements which is just mysticism as far as relational theory is concerned. Whether a different schema should be used or whether relational expressions can be used to same effect has nothing to do with manipulating predicates. But it should at least be obvious that given the Invoices join, the given constraint is redundant. Nearly all constraints are redundant in their usual given form, since even with the four-operator algebra of the 1972 completeness paper nearly all constraints are update constraints. In their usual form I suspect Codd meant them as a convenient stepping stone for an audience that was almost universally file system oriented. His primary keys were an exception. I once listened to him introduce his theory for an hour and all he talked about was the importance of primary keys. He clearly thought his relations always had primary keys, not just candidate keys, he expected them to be built-in to his relations, so it's not surprising he didn't need a Rename operator.

As for what needs to be understood, as I've said in this group for many many years now, quite a lot of unlearning would be needed before most existing practioners could develop predicate skill. I spent many years implementing physical dbmses and all the low-level facets too including concurrency support which effecively requires predicate manipulation and even some such as security and distribution that can be supported relationally so this answer is based on my own experience of finally realizing how much time I had wasted and how much there was to unlearn, but I don't say copy me. Above I gave a basic answer to a simple question from a non-logician. From talking to some of the kids who live nearby it's pretty obvious that they could master some of the basic skills just as easily as they handle grade 7 algebra, at least the few that are taught algebra as opposed to rote coding from patterns.

That unlearning probably needs a cultural change which implies generational change and possibly even a non-English language culture. Some people want to lay the blame for this on Codd, because he didn't tell them how to implement a dbms even though he freely acknowledged that wasn't his purpose. His was one of the very rare human inventions where the whole theory was outlined before any little piece appeared. But history is full of technologies that weren't initially fully understood, made their mark on the the strength of an isolated facet and humanity staggers on living with the waste and damage too in spite of its ignorance. Maybe it shouldn't be surprising to see database practioners prematurely try to apply de Morgan's laws to the wrong relations let alone ever seeing anybody who knows that a relational update is not just a derivation but a logical argument too. Most posters here accept that base updates are fundamental rather than the special case they actually are. Were change ever to happen TTM is not enough, with its file heritage that deals with mostly physical matters concerning result values rather than logical manipulation of declarative expressions by procedural means, not logical derivations and not logical validity. Functions mentioned only in terms of storage optimization eg normalization, the usefullness of projecting all pairs of relations on common attributes, etc., etc. In other words it's a mistake to ever think a D-language is a relational dbms, especially if it uses one of the existing dbmses as a so-called engine.

This area is not the first place I've seen non-system-programmers incapable of understanding an imaginative system programmer such as Codd. His kind of programmer is extremely rare. As I've long said the main signposts are in the ten pages of the 1970 paper though most people I've worked with were too lazy to read it once let alone the many times average people like me need to in order to figure out the minimum that must be done to make a basic start on an implementation even though what a basic relational dbms/data sublanguage needs is more thought and less code, orders of magnitude less code than the commercial dbms products even when the most important optimizations are supported. Such an anti-culture change is not likely to happen in my lifetime. My advice to you is live with it and don't claim to implement a relational system. I only got started on this because of Erwin's comment. I hope nobody else comments with 'too many joins', a very popular misapprehension here and elsewhere.

Quote from Dave Voorhis on October 12, 2019, 4:32 pm

Quote from p c on October 12, 2019, 3:24 pm

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Here's a simplified part of a billing system, written in Tutorial D:
VAR InvoiceHeading REAL RELATION {InvoiceNumber INT, InvoiceDate Date, CustomerID CHAR} KEY {InvoiceNumber};
VAR InvoiceDetail REAL RELATION {InvoiceNumber INT, ProductCode CHAR, Qty INT} KEY {InvoiceNumber, ProductCode};

CONSTRAINT InvoiceDetail_FK1 InvoiceDetail {InvoiceNumber} ⊆ InvoiceHeading {InvoiceNumber};

VAR Invoices VIEW InvoiceHeading JOIN InvoiceDetail
We can assume there would also be, at least, relvars and corresponding constraints for customers and products, which for simplicity's sake are not shown here.

Suppose the view Invoices is used to present invoices on some user interface. If the user deletes a tuple from Invoices, what should happen?

Alternatively, if this schema is undesirable, how should it be changed?

In other words, if there are issues that have "never been understood by the posters here", what do we need to understand?

Please be specific and as clear and logical as possible, and use examples.

What should happen is a typical misguided coder's question. It depends on whatever the application's requirement is, in other words what the application designer wants to happen. What will happen depends on the predicate expressed by the view's expression that addresses the singleton relation aka tuple. If the expression doesn't address a certain singleton aka addressing an empty relation, then nothing should happen.

Since detail invoice number logically implies a header invoice number, a header can't be deleted without also deleting all matching details. If that is desired then the most obvious view is not the one declared, rather it's a projection of that view and the algebraic expression that defnes it should give an extension that satisfies a predicate having quantified variables.

If the designer wants one certain detail among others for the same invoice to be deleted, the predicate of Invoices Join Details derives the projection equation Invoices{number} = Details{number} as well as the resulting Invoices'{number} = Details'{number}. See Codd 1970 for natural join definition. (Note as in next paragraph that this predicate is not usually the predicate of a deleted singleton or to-be-deleted singleton.)

If it's desired to never delete an invoice from this view, more accurately never replace a subset of its extension that is addressed by a certain expression/predicate with an empty relation, the predicate Invoices Join Details is the WRONG predicate to use. It can only address all invoices. Lots of talk here treats it as the right predicate even though it is wrong unless Invoices has only one tuple. Instead a different view could join Invoices Join Details = J with two singleton relations having a distinct attribute and two distinct values, eg rel{tup{x 1}} = KeepI and rel{tup{x 0}} = DeleteI. J Join KeepI Union J Join DeleteI = B is the beginning of an expression that can reflect a more accurate predicate. Its importance is that it can be used as part of the effective predicate of an application view that addresses only the subset that matches DeleteI, eg., B Join DeleteI, which guarantees that the projection J{number} is always a subset of J'{number} because that must be a subset of J Join Keep1{number} which guarantees that the resulting projection Invoices'{number} implies all the original invoicenumbers and possibly more.

The above can be embodied in a further expression that projects out the x attribute giving a relation for which it can be said forall ... x = 'DeleteI'. Effectively every tuple of the projection implies an unprojected tuple such that x = 'DeleteI'. This gives a projected view predicate whose empty extension can be replaced with a non-empty extension, thus allowing inserts to B.

But these particular answers might just be answering some application requirement that you haven't given. Don't ask me for more details if you can't detail your application's requirments. Anyway, this level of manipulation is beyond most coders and any optimization to implement a language shortcut is more so. Even if it weren't it's not economic for coders to be involved because they are generally overwhelmed by more routine matters and confused by even the most basic relational concepts, as the original question shows, In fact this is probably the wrong group to even discuss it in and given that there is so little mention of physical issues that can effect language implementation and therefore dbms implementation, leaving a scope here that is just language and given that the talk is generally not even relational it is strange to see talk of 'engines'.

The question of whether the given schema is desireable is also misguided, about as useful as the strangely popular adjective here, 'reasonable'. When talking about what Codd talked about, it is what de Bono called a porridge word. It is only ever useful in file system talk where anything a program can do is reasonable if it satifies a heritage of unspoken requirements which is just mysticism as far as relational theory is concerned. Whether a different schema should be used or whether relational expressions can be used to same effect has nothing to do with manipulating predicates. But it should at least be obvious that given the Invoices join, the given constraint is redundant. Nearly all constraints are redundant in their usual given form, since even with the four-operator algebra of the 1972 completeness paper nearly all constraints are update constraints. In their usual form I suspect Codd meant them as a convenient stepping stone for an audience that was almost universally file system oriented. His primary keys were an exception. I once listened to him introduce his theory for an hour and all he talked about was the importance of primary keys. He clearly thought his relations always had primary keys, not just candidate keys, he expected them to be built-in to his relations, so it's not surprising he didn't need a Rename operator.

As for what needs to be understood, as I've said in this group for many many years now, quite a lot of unlearning would be needed before most existing practioners could develop predicate skill. I spent many years implementing physical dbmses and all the low-level facets too including concurrency support which effecively requires predicate manipulation and even some such as security and distribution that can be supported relationally so this answer is based on my own experience of finally realizing how much time I had wasted and how much there was to unlearn, but I don't say copy me. Above I gave a basic answer to a simple question from a non-logician. From talking to some of the kids who live nearby it's pretty obvious that they could master some of the basic skills just as easily as they handle grade 7 algebra, at least the few that are taught algebra as opposed to rote coding from patterns.

That unlearning probably needs a cultural change which implies generational change and possibly even a non-English language culture. Some people want to lay the blame for this on Codd, because he didn't tell them how to implement a dbms even though he freely acknowledged that wasn't his purpose. His was one of the very rare human inventions where the whole theory was outlined before any little piece appeared. But history is full of technologies that weren't initially fully understood, made their mark on the the strength of an isolated facet and humanity staggers on living with the waste and damage too in spite of its ignorance. Maybe it shouldn't be surprising to see database practioners prematurely try to apply de Morgan's laws to the wrong relations let alone ever seeing anybody who knows that a relational update is not just a derivation but a logical argument too. Most posters here accept that base updates are fundamental rather than the special case they actually are. Were change ever to happen TTM is not enough, with its file heritage that deals with mostly physical matters concerning result values rather than logical manipulation of declarative expressions by procedural means, not logical derivations and not logical validity. Functions mentioned only in terms of storage optimization eg normalization, the usefullness of projecting all pairs of relations on common attributes, etc., etc. In other words it's a mistake to ever think a D-language is a relational dbms, especially if it uses one of the existing dbmses as a so-called engine.

This area is not the first place I've seen non-system-programmers incapable of understanding an imaginative system programmer such as Codd. His kind of programmer is extremely rare. As I've long said the main signposts are in the ten pages of the 1970 paper though most people I've worked with were too lazy to read it once let alone the many times average people like me need to in order to figure out the minimum that must be done to make a basic start on an implementation even though what a basic relational dbms/data sublanguage needs is more thought and less code, orders of magnitude less code than the commercial dbms products even when the most important optimizations are supported. Such an anti-culture change is not likely to happen in my lifetime. My advice to you is live with it and don't claim to implement a relational system. I only got started on this because of Erwin's comment. I hope nobody else comments with 'too many joins', a very popular misapprehension here and elsewhere.

#22 · October 13, 2019, 1:55 am

Quote from johnwcowan on October 13, 2019, 1:55 am

Quote from Erwin on October 10, 2019, 2:20 pm

Present day users of those data must interpret the data not as "father_of(George, Bill)" but as "the census says that the father of George is Bill".

Of course there's nothing about Prolog notation that disambiguates, but I'm used to reading father_of(George, Bill) as "George is the father of Bill", whereas you read it exactly the opposite way. I wonder why.

In their minds they're probably already very good at doing exactly that imo.

Probably, though with aggregate queries that has to be set aside. If you want to know "average number of children per father", you have to clean the data as best you can before you can put it in the database where you compute the average. Hopefully the result will be more reliable than the raw dirty data would provide.

Automated improvement is not possible until a human has done enough detailed investigation to conclude that there are some "patterns" of errors that have been frequently made, as well as to conclude with sufficient degree of certainty what the real state of affairs most likely had been for each individual specific error pattern concerned.

Ideally, yes. But if there are too many contradictions for a human being to examine, a priori reasoning may be in order. To give an idea of the scale of this data, it potentially provides multiple data for each year that has passed for each living person in the U.S. since about 1940, whether or not they have died since. So far, I have come up with the following general (i.e. not specific to this effort) principles:

Assign an a priori probability of correctness to each attribute (as noted above, counting children is a priori more likely to be wrong than mentioning them as individuals). Given a conflict between two fields, change the one with the lower probability of being right.

Failing that, assign a priori importance to each attribute on an ordinal scale. Given a conflict, change the one with lower importance.

Failing that, express the constraint to be maintained as an overall negation of conjunctive normal form, so that denying any one conjunct will defease the constraint. Change the attribute(s) referred to in the least complicated conjunct.

Quote from Erwin on October 10, 2019, 2:20 pm

Present day users of those data must interpret the data not as "father_of(George, Bill)" but as "the census says that the father of George is Bill".

Of course there's nothing about Prolog notation that disambiguates, but I'm used to reading father_of(George, Bill) as "George is the father of Bill", whereas you read it exactly the opposite way. I wonder why.

In their minds they're probably already very good at doing exactly that imo.

Probably, though with aggregate queries that has to be set aside. If you want to know "average number of children per father", you have to clean the data as best you can before you can put it in the database where you compute the average. Hopefully the result will be more reliable than the raw dirty data would provide.

Automated improvement is not possible until a human has done enough detailed investigation to conclude that there are some "patterns" of errors that have been frequently made, as well as to conclude with sufficient degree of certainty what the real state of affairs most likely had been for each individual specific error pattern concerned.

Ideally, yes. But if there are too many contradictions for a human being to examine, a priori reasoning may be in order. To give an idea of the scale of this data, it potentially provides multiple data for each year that has passed for each living person in the U.S. since about 1940, whether or not they have died since. So far, I have come up with the following general (i.e. not specific to this effort) principles:

Assign an a priori probability of correctness to each attribute (as noted above, counting children is a priori more likely to be wrong than mentioning them as individuals). Given a conflict between two fields, change the one with the lower probability of being right.
Failing that, assign a priori importance to each attribute on an ordinal scale. Given a conflict, change the one with lower importance.
Failing that, express the constraint to be maintained as an overall negation of conjunctive normal form, so that denying any one conjunct will defease the constraint. Change the attribute(s) referred to in the least complicated conjunct.

#23 · October 13, 2019, 10:44 am

Quote from Dave Voorhis on October 13, 2019, 10:44 am
Quote from p c on October 13, 2019, 12:55 am
Quote from Dave Voorhis on October 12, 2019, 4:32 pm

Quote from p c on October 12, 2019, 3:24 pm

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Here's a simplified part of a billing system, written in Tutorial D:
VAR InvoiceHeading REAL RELATION {InvoiceNumber INT, InvoiceDate Date, CustomerID CHAR} KEY {InvoiceNumber};
VAR InvoiceDetail REAL RELATION {InvoiceNumber INT, ProductCode CHAR, Qty INT} KEY {InvoiceNumber, ProductCode};

CONSTRAINT InvoiceDetail_FK1 InvoiceDetail {InvoiceNumber} ⊆ InvoiceHeading {InvoiceNumber};

VAR Invoices VIEW InvoiceHeading JOIN InvoiceDetail
We can assume there would also be, at least, relvars and corresponding constraints for customers and products, which for simplicity's sake are not shown here.

Suppose the view Invoices is used to present invoices on some user interface. If the user deletes a tuple from Invoices, what should happen?

Alternatively, if this schema is undesirable, how should it be changed?

In other words, if there are issues that have "never been understood by the posters here", what do we need to understand?

Please be specific and as clear and logical as possible, and use examples.
What should happen is a typical misguided coder's question. It depends on whatever the application's requirement is, in other words what the application designer wants to happen. What will happen depends on the predicate expressed by the view's expression that addresses the singleton relation aka tuple. If the expression doesn't address a certain singleton aka addressing an empty relation, then nothing should happen.

Since detail invoice number logically implies a header invoice number, a header can't be deleted without also deleting all matching details. If that is desired then the most obvious view is not the one declared, rather it's a projection of that view and the algebraic expression that defnes it should give an extension that satisfies a predicate having quantified variables.

If the designer wants one certain detail among others for the same invoice to be deleted, the predicate of Invoices Join Details derives the projection equation Invoices{number} = Details{number} as well as the resulting Invoices'{number} = Details'{number}. See Codd 1970 for natural join definition. (Note as in next paragraph that this predicate is not usually the predicate of a deleted singleton or to-be-deleted singleton.)

If it's desired to never delete an invoice from this view, more accurately never replace a subset of its extension that is addressed by a certain expression/predicate with an empty relation, the predicate Invoices Join Details is the WRONG predicate to use. It can only address all invoices. Lots of talk here treats it as the right predicate even though it is wrong unless Invoices has only one tuple. Instead a different view could join Invoices Join Details = J with two singleton relations having a distinct attribute and two distinct values, eg rel{tup{x 1}} = KeepI and rel{tup{x 0}} = DeleteI. J Join KeepI Union J Join DeleteI = B is the beginning of an expression that can reflect a more accurate predicate. Its importance is that it can be used as part of the effective predicate of an application view that addresses only the subset that matches DeleteI, eg., B Join DeleteI, which guarantees that the projection J{number} is always a subset of J'{number} because that must be a subset of J Join Keep1{number} which guarantees that the resulting projection Invoices'{number} implies all the original invoicenumbers and possibly more.

The above can be embodied in a further expression that projects out the x attribute giving a relation for which it can be said forall ... x = 'DeleteI'. Effectively every tuple of the projection implies an unprojected tuple such that x = 'DeleteI'. This gives a projected view predicate whose empty extension can be replaced with a non-empty extension, thus allowing inserts to B.

But these particular answers might just be answering some application requirement that you haven't given. Don't ask me for more details if you can't detail your application's requirments. Anyway, this level of manipulation is beyond most coders and any optimization to implement a language shortcut is more so. Even if it weren't it's not economic for coders to be involved because they are generally overwhelmed by more routine matters and confused by even the most basic relational concepts, as the original question shows, In fact this is probably the wrong group to even discuss it in and given that there is so little mention of physical issues that can effect language implementation and therefore dbms implementation, leaving a scope here that is just language and given that the talk is generally not even relational it is strange to see talk of 'engines'.

The question of whether the given schema is desireable is also misguided, about as useful as the strangely popular adjective here, 'reasonable'. When talking about what Codd talked about, it is what de Bono called a porridge word. It is only ever useful in file system talk where anything a program can do is reasonable if it satifies a heritage of unspoken requirements which is just mysticism as far as relational theory is concerned. Whether a different schema should be used or whether relational expressions can be used to same effect has nothing to do with manipulating predicates. But it should at least be obvious that given the Invoices join, the given constraint is redundant. Nearly all constraints are redundant in their usual given form, since even with the four-operator algebra of the 1972 completeness paper nearly all constraints are update constraints. In their usual form I suspect Codd meant them as a convenient stepping stone for an audience that was almost universally file system oriented. His primary keys were an exception. I once listened to him introduce his theory for an hour and all he talked about was the importance of primary keys. He clearly thought his relations always had primary keys, not just candidate keys, he expected them to be built-in to his relations, so it's not surprising he didn't need a Rename operator.

As for what needs to be understood, as I've said in this group for many many years now, quite a lot of unlearning would be needed before most existing practioners could develop predicate skill. I spent many years implementing physical dbmses and all the low-level facets too including concurrency support which effecively requires predicate manipulation and even some such as security and distribution that can be supported relationally so this answer is based on my own experience of finally realizing how much time I had wasted and how much there was to unlearn, but I don't say copy me. Above I gave a basic answer to a simple question from a non-logician. From talking to some of the kids who live nearby it's pretty obvious that they could master some of the basic skills just as easily as they handle grade 7 algebra, at least the few that are taught algebra as opposed to rote coding from patterns.

That unlearning probably needs a cultural change which implies generational change and possibly even a non-English language culture. Some people want to lay the blame for this on Codd, because he didn't tell them how to implement a dbms even though he freely acknowledged that wasn't his purpose. His was one of the very rare human inventions where the whole theory was outlined before any little piece appeared. But history is full of technologies that weren't initially fully understood, made their mark on the the strength of an isolated facet and humanity staggers on living with the waste and damage too in spite of its ignorance. Maybe it shouldn't be surprising to see database practioners prematurely try to apply de Morgan's laws to the wrong relations let alone ever seeing anybody who knows that a relational update is not just a derivation but a logical argument too. Most posters here accept that base updates are fundamental rather than the special case they actually are. Were change ever to happen TTM is not enough, with its file heritage that deals with mostly physical matters concerning result values rather than logical manipulation of declarative expressions by procedural means, not logical derivations and not logical validity. Functions mentioned only in terms of storage optimization eg normalization, the usefullness of projecting all pairs of relations on common attributes, etc., etc. In other words it's a mistake to ever think a D-language is a relational dbms, especially if it uses one of the existing dbmses as a so-called engine.

This area is not the first place I've seen non-system-programmers incapable of understanding an imaginative system programmer such as Codd. His kind of programmer is extremely rare. As I've long said the main signposts are in the ten pages of the 1970 paper though most people I've worked with were too lazy to read it once let alone the many times average people like me need to in order to figure out the minimum that must be done to make a basic start on an implementation even though what a basic relational dbms/data sublanguage needs is more thought and less code, orders of magnitude less code than the commercial dbms products even when the most important optimizations are supported. Such an anti-culture change is not likely to happen in my lifetime. My advice to you is live with it and don't claim to implement a relational system. I only got started on this because of Erwin's comment. I hope nobody else comments with 'too many joins', a very popular misapprehension here and elsewhere.
Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Quote from p c on October 13, 2019, 12:55 am
Quote from Dave Voorhis on October 12, 2019, 4:32 pm

Quote from p c on October 12, 2019, 3:24 pm

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Here's a simplified part of a billing system, written in Tutorial D:
VAR InvoiceHeading REAL RELATION {InvoiceNumber INT, InvoiceDate Date, CustomerID CHAR} KEY {InvoiceNumber};
VAR InvoiceDetail REAL RELATION {InvoiceNumber INT, ProductCode CHAR, Qty INT} KEY {InvoiceNumber, ProductCode};

CONSTRAINT InvoiceDetail_FK1 InvoiceDetail {InvoiceNumber} ⊆ InvoiceHeading {InvoiceNumber};

VAR Invoices VIEW InvoiceHeading JOIN InvoiceDetail
We can assume there would also be, at least, relvars and corresponding constraints for customers and products, which for simplicity's sake are not shown here.

Suppose the view Invoices is used to present invoices on some user interface. If the user deletes a tuple from Invoices, what should happen?

Alternatively, if this schema is undesirable, how should it be changed?

In other words, if there are issues that have "never been understood by the posters here", what do we need to understand?

Please be specific and as clear and logical as possible, and use examples.
What should happen is a typical misguided coder's question. It depends on whatever the application's requirement is, in other words what the application designer wants to happen. What will happen depends on the predicate expressed by the view's expression that addresses the singleton relation aka tuple. If the expression doesn't address a certain singleton aka addressing an empty relation, then nothing should happen.

Since detail invoice number logically implies a header invoice number, a header can't be deleted without also deleting all matching details. If that is desired then the most obvious view is not the one declared, rather it's a projection of that view and the algebraic expression that defnes it should give an extension that satisfies a predicate having quantified variables.

If the designer wants one certain detail among others for the same invoice to be deleted, the predicate of Invoices Join Details derives the projection equation Invoices{number} = Details{number} as well as the resulting Invoices'{number} = Details'{number}. See Codd 1970 for natural join definition. (Note as in next paragraph that this predicate is not usually the predicate of a deleted singleton or to-be-deleted singleton.)

If it's desired to never delete an invoice from this view, more accurately never replace a subset of its extension that is addressed by a certain expression/predicate with an empty relation, the predicate Invoices Join Details is the WRONG predicate to use. It can only address all invoices. Lots of talk here treats it as the right predicate even though it is wrong unless Invoices has only one tuple. Instead a different view could join Invoices Join Details = J with two singleton relations having a distinct attribute and two distinct values, eg rel{tup{x 1}} = KeepI and rel{tup{x 0}} = DeleteI. J Join KeepI Union J Join DeleteI = B is the beginning of an expression that can reflect a more accurate predicate. Its importance is that it can be used as part of the effective predicate of an application view that addresses only the subset that matches DeleteI, eg., B Join DeleteI, which guarantees that the projection J{number} is always a subset of J'{number} because that must be a subset of J Join Keep1{number} which guarantees that the resulting projection Invoices'{number} implies all the original invoicenumbers and possibly more.

The above can be embodied in a further expression that projects out the x attribute giving a relation for which it can be said forall ... x = 'DeleteI'. Effectively every tuple of the projection implies an unprojected tuple such that x = 'DeleteI'. This gives a projected view predicate whose empty extension can be replaced with a non-empty extension, thus allowing inserts to B.

But these particular answers might just be answering some application requirement that you haven't given. Don't ask me for more details if you can't detail your application's requirments. Anyway, this level of manipulation is beyond most coders and any optimization to implement a language shortcut is more so. Even if it weren't it's not economic for coders to be involved because they are generally overwhelmed by more routine matters and confused by even the most basic relational concepts, as the original question shows, In fact this is probably the wrong group to even discuss it in and given that there is so little mention of physical issues that can effect language implementation and therefore dbms implementation, leaving a scope here that is just language and given that the talk is generally not even relational it is strange to see talk of 'engines'.

The question of whether the given schema is desireable is also misguided, about as useful as the strangely popular adjective here, 'reasonable'. When talking about what Codd talked about, it is what de Bono called a porridge word. It is only ever useful in file system talk where anything a program can do is reasonable if it satifies a heritage of unspoken requirements which is just mysticism as far as relational theory is concerned. Whether a different schema should be used or whether relational expressions can be used to same effect has nothing to do with manipulating predicates. But it should at least be obvious that given the Invoices join, the given constraint is redundant. Nearly all constraints are redundant in their usual given form, since even with the four-operator algebra of the 1972 completeness paper nearly all constraints are update constraints. In their usual form I suspect Codd meant them as a convenient stepping stone for an audience that was almost universally file system oriented. His primary keys were an exception. I once listened to him introduce his theory for an hour and all he talked about was the importance of primary keys. He clearly thought his relations always had primary keys, not just candidate keys, he expected them to be built-in to his relations, so it's not surprising he didn't need a Rename operator.

As for what needs to be understood, as I've said in this group for many many years now, quite a lot of unlearning would be needed before most existing practioners could develop predicate skill. I spent many years implementing physical dbmses and all the low-level facets too including concurrency support which effecively requires predicate manipulation and even some such as security and distribution that can be supported relationally so this answer is based on my own experience of finally realizing how much time I had wasted and how much there was to unlearn, but I don't say copy me. Above I gave a basic answer to a simple question from a non-logician. From talking to some of the kids who live nearby it's pretty obvious that they could master some of the basic skills just as easily as they handle grade 7 algebra, at least the few that are taught algebra as opposed to rote coding from patterns.

That unlearning probably needs a cultural change which implies generational change and possibly even a non-English language culture. Some people want to lay the blame for this on Codd, because he didn't tell them how to implement a dbms even though he freely acknowledged that wasn't his purpose. His was one of the very rare human inventions where the whole theory was outlined before any little piece appeared. But history is full of technologies that weren't initially fully understood, made their mark on the the strength of an isolated facet and humanity staggers on living with the waste and damage too in spite of its ignorance. Maybe it shouldn't be surprising to see database practioners prematurely try to apply de Morgan's laws to the wrong relations let alone ever seeing anybody who knows that a relational update is not just a derivation but a logical argument too. Most posters here accept that base updates are fundamental rather than the special case they actually are. Were change ever to happen TTM is not enough, with its file heritage that deals with mostly physical matters concerning result values rather than logical manipulation of declarative expressions by procedural means, not logical derivations and not logical validity. Functions mentioned only in terms of storage optimization eg normalization, the usefullness of projecting all pairs of relations on common attributes, etc., etc. In other words it's a mistake to ever think a D-language is a relational dbms, especially if it uses one of the existing dbmses as a so-called engine.

This area is not the first place I've seen non-system-programmers incapable of understanding an imaginative system programmer such as Codd. His kind of programmer is extremely rare. As I've long said the main signposts are in the ten pages of the 1970 paper though most people I've worked with were too lazy to read it once let alone the many times average people like me need to in order to figure out the minimum that must be done to make a basic start on an implementation even though what a basic relational dbms/data sublanguage needs is more thought and less code, orders of magnitude less code than the commercial dbms products even when the most important optimizations are supported. Such an anti-culture change is not likely to happen in my lifetime. My advice to you is live with it and don't claim to implement a relational system. I only got started on this because of Erwin's comment. I hope nobody else comments with 'too many joins', a very popular misapprehension here and elsewhere.

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#24 · October 13, 2019, 2:59 pm

Quote from p c on October 12, 2019, 3:24 pm

(plucked out from a massive stew of semi-intelligible rant)

A record can be altered but a tuple can't be changed.

A distinction without a difference, surely. SQL systems think in terms of updatable records. TTM usually talks about immutable tuples and immutable sets of them, and grants mutability only to variables holding them. Date says in VURT that in principle relvars are just constant names for relations within a database value, and the only thing that can be mutated is which database value is held in a dbvar. The FRP standpoint is that all databases everywhere, along with everything else, are immutable, and that the only action that can be taken is to replace the value of the singleton world[*] variable with another immutable world value.

[*] In Wittgenstein's sense: "Die Welt ist alles, was der Fall ist."

assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions

A predicate as such is simply a set of objects of which it is true, pure extension. Intensions arise only from constraints.

#25 · October 14, 2019, 6:35 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

#26 · October 14, 2019, 8:50 pm

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#27 · October 14, 2019, 9:06 pm

Quote from p c on October 14, 2019, 6:35 pm

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration.

Datalog ground facts can just be stored in relvars as-is. Rules, including non-ground facts, can also be stored in relvars in a coded form, as the Information Principle requires. I don't yet have a good representation of rules as relvars. Then you just need a syntactic representation of Datalog queries. One thing that will definitely be useful is a set of types foo_var for every ground type foo, which contain a value of type foo, a variable name, and a boolean saying whether a specific value of this type is a variable or a value. Note that when pickled in the database, variables can't vary, so there is no violation of TTM here.

#28 · October 16, 2019, 12:14 am

Quote from p c on October 16, 2019, 12:14 am

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

#29 · October 16, 2019, 7:18 am

Quote from Erwin on October 16, 2019, 7:18 am

Quote from p c on October 16, 2019, 12:14 am

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

The first time you ask me, I'll "want to happen" that only the lineitem is deleted and nothing else (even if it would leave an "orphaned" -well, no, "childless"- invoice).

The second time you ask me, I'll "want to happen" that only the lineitem is deleted and nothing else, except in the case when the lineitem deleted is the last one remaining for the invoice, in which case I want the invoice itself deleted too.

The third time you ask me, I'll "want to happen" that the lineitem, and the invoice it belongs to, and all other lineitems belonging to that invoice, get deleted.

Now please show in minute detail how your "logical theory" can help a DBMS/data language support each of those three possible "want to happens".

Quote from p c on October 16, 2019, 12:14 am

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

The first time you ask me, I'll "want to happen" that only the lineitem is deleted and nothing else (even if it would leave an "orphaned" -well, no, "childless"- invoice).

The second time you ask me, I'll "want to happen" that only the lineitem is deleted and nothing else, except in the case when the lineitem deleted is the last one remaining for the invoice, in which case I want the invoice itself deleted too.

The third time you ask me, I'll "want to happen" that the lineitem, and the invoice it belongs to, and all other lineitems belonging to that invoice, get deleted.

Now please show in minute detail how your "logical theory" can help a DBMS/data language support each of those three possible "want to happens".

Author of SIRA_PRISE

#30 · October 16, 2019, 7:44 am

Quote from Dave Voorhis on October 16, 2019, 7:44 am

Quote from p c on October 16, 2019, 12:14 am

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

It's not divergence, but an explanation providing background to my question in the last paragraph.

My explanation described just one common, real-world(-ish) example of the sort of one-to-many association between tables that exists in millions of real-world database-driven applications. The view, Invoices, is one illustration of the sort of view commonly created to join (or JOIN) the tables that are so associated and is often used to underpin some user interface.

Erwin described the problem with update-through-views on Invoices very well, but I'll reiterate: Through that one view, deleting a row may...

Sometimes mean the user wants to delete only from the "many" side (delete a line item from InvoiceDetail) even if it leaves an invoice with no line items (we'll add them later);

Sometimes mean the user wants to delete a whole invoice (delete an invoice from InvoiceHeading and all its line items in InvoiceDetail); and

Sometimes mean the user wants to delete the last line item in InvoiceDetail and automatically delete the associated InvoiceHeading row because there are no associated line items.

... And the user will want to do be able to do all three in the same session.

In the scenario as I described it in my last post, it's easy. Appropriate buttons are available in the user interface and the user can click them. They issue appropriate DELETEs to the base tables. No update-through-views is needed nor would it work, it appears.

How would you handle it?

Quote from p c on October 16, 2019, 12:14 am

Quote from Dave Voorhis on October 14, 2019, 8:50 pm

Quote from p c on October 14, 2019, 6:35 pm

Quote from Dave Voorhis on October 13, 2019, 10:44 am

Sorry, I can't make head or tails of this. It starts with the usual anti-programmer sentiment -- hardly a good way to address an audience of programmers -- and the rest is mostly incoherent. If there are useful insights there, they're drowning in a sea of irrelevant verbiage. I get that you think we're doing the relational model all wrong -- which is fine; that's what the Datalog folks thought too, and their approach is now largely considered "right" in academic circles -- but despite the fact that your writing occasionally hints of aspects of Prolog/Datalog and other logic languages, I've never been able to tell if you agree with them or not.

Of course, another possibility is that you simply don't understand any of it.

Not all wrong, there is plenty in TTM that I don't see any need to comment on given its goal which is more ambitious than I think is useful. It has many aspects I don't disagree with even if I personally wouldn't see any need to implement them. The separation of logical from physical matters will always be sprouting thorns for implementations when a language tries to combine them. If it is possible it won't happen without some formal division of concepts, architectural layering or somesuch and like existing dbms systems the relational parts won't be nearly as small as they should be to be maintainable.

The latest repeat mention of Datalog confuses me just as much as it has for years. It seems silly because I've never read that it embodies Codd's theory so a dbms implementation must need a lot of backward integration. I suppose if Horn clauses or other logic programming techniques were patented there might be a commercial reason forcing the use of Datalog but as far as I know they aren't and anyway, proofs of the theorems a dbms needs are short and very few in number.

As for the rest of your latest response, think of the post as a litmus test for approximating a predisposition, which the response demonstrates.

The previous response asked what should happen but you still haven't said what you want to happen. I don't have the energy to enumerate all the many logical possibilities and such a large effort would be error-prone, like writing a book. I gave outlines of some possible expressions the dbms could use to satisfy the request. Without knowing what the user want to happen and without relation values only a general answer is possible.

A possible general answer is that when a user deletes a tuple the dbms replaces a relation value that is a subset of the Invoices extension with an empty relation provided that the argument it uses is logically valid and provided that when the original database value is logically consistent, the resulting replacement database value is consistent.

You ask "what should happen?", a familiarly simplistic question, then give a partial situation then ask me to be specific and leaving me to guess what result you want. When I give a partial solution for several possible results, you can't make head nor tail of it. The question is too vague for dbms purposes because it assumes the database has other relations which are not declared and for which no values are given. The results a dbms gives depend entirely on the relations an expression references and the form of those references. Note that says relations, not relvars because the relation or relations that a relvar indicates can be expressed in a variety of equivalent forms. From the dbms perspective, relvars are not part of the logical model, they are just a coat-hanger that makes some users feel more comfortable. When asking seemingly simple questions on a topic that most database workers don't understand at all (nor should some users ever need to understand), such as dbms internals, some of whom post here, don't assume that anybody else can assume what you think you can assume. It's not hard to see that the assumed relations might have attributes and projections in common with your relvars. Since an algebra uses extensions, results can't be predicted without input relation values. Without values you can declare a relation but you can't predict an update result without defining which means defining the value.

This thread would last a long time if I had to keep asking you questions so that I could be sure the relations you don't declare and their values are logically independent of the ones you do declare.

There are multiple ways for the result to be logically consistent so any dbms applies policies when updating, even when base relation values are replaced. One useful policy minimizes storage changes provided consistency is maintained. Although being mechanical a dbms can't embody application semantics from a users perspective such a policy is useful for reflecting some of the usual user expectations.

The expression the dbms uses to address or identify the relation the user indirectly specifies for replacement is also important because it can be written in a way that allows data designers to confirm logical validity in advance of user commands. Restricting the coding of such expressions to only designers who are responsible for database integrity as opposed to all users is another useful policy.

It's also important to note that Appendix A says that JOIN is natural join even though its formal definition doesn't exactly coincide with Codd's. Many posts in this group over the years show a frequent confusion, assuming that the first name in the user expression A JOIN B addresses, is equivalent to, the powerset of relation A as opposed to that of relvar A. even though that's frequenty not so. It's usually a subset of the powerset that the name A addresses, otherwise data independence is violated. As far as I know, TTM doesn't redress this. That's another policy needed for a dbms algebra to implement TTM algebra.

I should congratulate you for at least giving part of a situation that has straightforward solutions, even if they are multiple which often disappoints coders, rather than the frequent response that wants to know how all possible views can be updated and a a thread that then usually concludes that none should be updated because it's not obvious how to update all of them which most of the time is an unnecessary pointless question.

I mention Datalog because sometimes your descriptions and rants are somewhat reminiscent of how Datalog works. I think you'd like it.

There are a known and relatively small set of kinds of views that can be consistently and reasonably intuitively updated. Microsoft SQL Server implements them. Microsoft Access implements them. There are almost certainly other SQL DBMSs that implement them.

But even using a DBMS that implements them, it's generally considered good practice to not update through views. That's because -- among other reasons -- even with a limited set of views that can be updated through, it takes only a few views of views to make predicting the outcome of a given update painfully difficult, or for it to be disallowed by the SQL interpreter, when the intent of the update is almost always trivial to express in terms of updates to base tables.

My invoice example is one I like to use because it's simple, should be familiar to anyone who has ever dealt with a small to medium-sized business, and is roughly illustrative of how invoices have been implemented in many SQL-based (and dBase III!) small-business information systems for the last 40 years. Typically, when we want to create an invoice, we insert a tuple/row directly into InvoiceHeading. When we want to add line items, we insert related tuples/rows directly into InvoiceDetail.

Likewise updates and deletes. When we want to delete an invoice, we delete a the appropriate tuple/row in InvoiceHeading. Typically, a CASCADE DELETE will be used to automatically delete related InvoiceDetail tuples/rows, so deleting a whole invoice is easy when we need to do it.

Of course, for an actual generated invoice we should never need to do it -- or even allow users to do it -- but there may be testing/administrative/repair reasons to delete an invoice that was generated in error or testing and was never intended to be sent to a customer.

When we want to delete a invoice line item, we delete the appropriate tuple/row in InvoiceDetail. Again, for a production billing system that normally shouldn't occur, and indeed there may never be cause to do anything but INSERT into InvoiceHeading or InvoiceDetail. But regard my example as an illustration rather than an implementation. If you prefer, think of some other 1-to-many relationship from your preferred domain where INSERT/UPDATE/DELETE can occur on either side of the relationship.

Typically, most of the code for all that -- both SQL and application -- doesn't even have to be written by the programmer. Most will be code-generated by some form painter, having been told to create a master/detail form for InvoiceHeading which has a 1-to-many relationship with InvoiceDetail.

My example VIEW called Invoices might be used by some interactive report/screen/display that (say) shows the products invoiced today. One can easily imagine that this is updatable, perhaps as part of a validation process to ensure the invoices are correct before being printed or emailed. (If you don't like it, endure. Work with me here.) The typical way to handle this is to explicitly issue updates to the base tables -- InvoiceHeading and InvoiceDetail -- as appropriate. Update-through-views, even if available, would typically not be used, for the reasons I noted above.

Note that all of this is pretty straightforward stuff. It's the sort of thing junior programmers churn out on a daily basis and they generally get it right, because it's simple and intuitive and doesn't require a great deal of either technical or theoretical understanding.

But I get the impression from past posts of yours that you consider the approach I've described above to be flawed and reflective of file-system and programmer-oriented thinking. If so, how do you think it should be done?

You keep diverging, now into overall system design and asking more questions without answering the one question I asked about your Invoices deletion: what do you want to happen? I tried to guess what you want to happen and sketched parts of things that MIGHT happen. Why not just tell me what you want to happen. If it can happen, I'll tell you how and give a reason.

Regarding the latest divergence, it's an example of a system that to the extent it was designed, the design was bottom up. The mention of SQL CASCADES is an example of a feature that is totally unnecessary in the relational model. What's worse is teaching young database workers physical techniques before teaching them the logical theory. I suppose the idea is that if we want to teach bad habits it's best to start early.

It's not divergence, but an explanation providing background to my question in the last paragraph.

My explanation described just one common, real-world(-ish) example of the sort of one-to-many association between tables that exists in millions of real-world database-driven applications. The view, Invoices, is one illustration of the sort of view commonly created to join (or JOIN) the tables that are so associated and is often used to underpin some user interface.

Erwin described the problem with update-through-views on Invoices very well, but I'll reiterate: Through that one view, deleting a row may...

Sometimes mean the user wants to delete only from the "many" side (delete a line item from InvoiceDetail) even if it leaves an invoice with no line items (we'll add them later);
Sometimes mean the user wants to delete a whole invoice (delete an invoice from InvoiceHeading and all its line items in InvoiceDetail); and
Sometimes mean the user wants to delete the last line item in InvoiceDetail and automatically delete the associated InvoiceHeading row because there are no associated line items.

... And the user will want to do be able to do all three in the same session.

In the scenario as I described it in my last post, it's easy. Appropriate buttons are available in the user interface and the user can click them. They issue appropriate DELETEs to the base tables. No update-through-views is needed nor would it work, it appears.

How would you handle it?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters

Tuple visibility through a view