The Forum for Discussion about The Third Manifesto and Related Matters


Updating through a view: a special case

Quote from johnwcowan on July 5, 2019, 12:34 am
Quote from AntC on July 4, 2019, 11:40 pm

Then John should know there's a spectrum of opinions on the forum (and its predecessors) about updating-through-view in general. I would describe Chris Date's (many) positions as off the end of the spectrum.

Good to know.

Specifically, Date has held many positions, all of which have had complications/contradictions and most of which conclude "Further investigation is obviously needed." I haven't re-read Appendix E lately, but it's well out of date. Date published a whole book on the topic a couple of years ago, and gave a presentation to a TTM seminar before that. Date believes these complications are resolvable. Without wanting to put words into people's mouths, I think the general consensus on TTM is that the contradictions are problems in logic and are not resolvable. (And there's probably a grey area of challenges that are not worth the candle of solving, as per Dave's judgment.)

<rant>Historical note: I first came to TTM after reading on dbdebunk Date+McGoveran's 1994 approach to update-through-views, which is truly, embarrassingly, awful. I came here trying to find out how two supposedly intelligent people, respected in the industry, could have produced such unalloyed tripe.</rant>

I don't think there'd be much support for INSERT through a projection -- even with all sorts of defaulting of projected-out attributes [...] because RM Pro 4 (aka no Nulls):

I am not talking about defaulted values, which of course will "work".  I am talking about known values, known because of the structure of the underlying relation which was projected as the last step in forming the view.  The footnote I mentioned is on p. 451 of the TTM book (PDF page 462), in Date's section of Appendix E, and reads in part:

For example, consider a view defined as the projection over all but STATUS of S WHERE STATUS = 20, for which a default STATUS value of 20 clearly makes sense, vs. a view defined as the projection over all but STATUS of S WHERE STATUS ≠ 20, for which a default STATUS value of 20 clearly does not. Consider also the fact that the projection over all but Z of EXTEND A ADD (z AS Z) is identically equal to A. Further investigation is obviously needed.

This is precisely the case I have in mind, though I didn't find this footnote till after I got interested.  Here 20 is the known value of STATUS.  I argue in my OP that known values include far more than simple literals, though equality is definitely a necessity.  It is rather more than "makes sense"; it is true by construction.  So if all the projected-out attributes have such known values, the insertion is legitimate.

For every case where there's a plausible strategy, there's a gazillion cases with no strategy. Hence the 'not worth the candle'. How to INSERT into

S WHERE City = 'Leeds' {ALL BUT Status}

S WHERE Status < 21 AND Status > 19 {ALL BUT Status}

S WHERE Status MOD 5 = 0 AND Status MOD 4 = 0 AND Status BETWEEN 1 AND 39 {ALL BUT Status}

(S WHERE IS_EMPTY(REL{TUP{S# S#}} MATCHING SP)) {ALL BUT Status}   // see below

Would you like to specify a general rule -- as in what the compiler/DBMS would have to go by?

(I reject any claim that the number of insertions to be made into the underlying relvar is unknown; in the absence of a join, it is exactly one.)

Bzzt: Codd's Rule 7, read with Rule 12's "multiple-records-at-a-time" requirement, requires that updates (all three verbs) in general work with sets of tuples. Appendix E's talking about a single tuple is a simplification. Suppose relvar `S` is in fact a join view. Now we need to insert into two or more base relvars. That's why all query operators return sets of tuples, and why TTM more explicitly (RM Pre 21) requires updates to relvars holus-bolus.
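The join-view point can be made concrete with a toy sketch (the dict-of-attributes representation, relvar names, and headings are my own illustration, not anything from TTM or the thread): one tuple inserted "through" a join view decomposes into inserts on two base relvars, so "exactly one insertion" doesn't hold once any relvar may turn out to be a view.

```python
# Sketch: a base relvar is modelled as a set of frozen attribute/value pairs.
def project(tup, heading):
    """Restrict a tuple (a dict) to the given set of attribute names."""
    return {a: v for a, v in tup.items() if a in heading}

S = {frozenset({"S#": "S1", "City": "Leeds"}.items())}
SP = {frozenset({"S#": "S1", "P#": "P1"}.items())}

S_HEADING = {"S#", "City"}
SP_HEADING = {"S#", "P#"}

def insert_through_join(tup):
    """One tuple inserted via the join view S JOIN SP becomes two base inserts."""
    S.add(frozenset(project(tup, S_HEADING).items()))
    SP.add(frozenset(project(tup, SP_HEADING).items()))

insert_through_join({"S#": "S2", "City": "Paris", "P#": "P2"})
```

A real DBMS would of course also have to check keys and constraints before committing either base insert; the sketch only shows why the "exactly one insertion" claim fails in general.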

Again, specifying the whole requirement, in which any relvar might turn out to be a view of arbitrary complexity, is fraught with complications. Not worth the candle.

Oh, also you might have inter-relvar constraints which mean that you must update multiple tuples as a set. There are some weaselly words in the 1994 treatment suggesting the update strategy could take advantage of seeing both the constraints and the whole set of updates. Something like ON DELETE CASCADE. "Weaselly words"? Read: clutching at straws.

(There might be a corollary to the Information Rule 1: everything represented explicitly by values in tables must be information.)

Hardly.  If an extension of a relation with attributes A and B adds attribute C whose value is A+B, then C contains no information as such,

OK: s/tables/base tables/.

but such an extension may be useful nonetheless, particularly if the extended relation is being consumed by something that does no computation itself, such as a screen or a plain HTML form.

That is, if a tuple makes business sense without those projected-out attributes, then that tuple alone should be the base schema. If later more information comes to hand that 'extends' that tuple, then that should be in a separate relvar with a Foreign Key constraint. (And probably a timestamp or audit trail to explain when/how the info came to hand.)

Please explain further.

What's the business reason a user is inserting into SELECT S WHERE Status = 20 {ALL BUT Status}, as opposed to into underlying relvar S? Note from an earlier thread Hugh's general distaste for Status attributes, which I agree with. Because they indicate obliquely 'more information is known', without telling what information or who knows it or when/how they came to know it. Or they're duplicating/summarising information that is recorded elsewhere -- in which case they should be represented in a view, not in the base relvar.

In this example, perhaps Status = 20 means we're negotiating with the Supplier: we've signed a contract with them but we haven't yet agreed any specific Products. Then that situation is shown in the database by there being a tuple in S but none in SP (Supplier-Product contracts) for that S#. Status 20 is a virtual/calculated field. Status 30 'Supplier contracted for Products' is another virtual/calculated field: there are tuples in SP for that S#.
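A toy sketch of that reading of Status (representation and names are mine): the attribute is computed from the presence of SP tuples rather than stored in base relvar S.

```python
# Status as a virtual/calculated attribute, derived from whether any
# Supplier-Product contract exists for the supplier.
S = {"S1", "S2"}                     # supplier numbers under contract
SP = {("S1", "P1"), ("S1", "P2")}    # (S#, P#) supplier-product contracts

def status(s_num):
    """20 = contract signed, no products agreed yet; 30 = products contracted."""
    return 30 if any(s == s_num for (s, _) in SP) else 20

assert status("S1") == 30   # there are tuples in SP for S1
assert status("S2") == 20   # no SP tuples yet for S2
```

On this design there is nothing to insert "through" the Status view: Status changes as a side effect of ordinary inserts into SP.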

The more common 'requirement' for INSERT-through-projection is (say) an employee has started work but we don't yet have their date of birth or Social Security number and/or the lowly payroll inputter is not allowed to see such sensitive info. (That's more the case I had in mind, and I don't see any defaulting rules that would help.) Then if it's legitimate by the business rules to have an employee number + name without dob/SSN: don't include dob/SSN in the schema for the Employee base table; put them in auxiliary base table(s) keyed by employee number.

Quote from Dave Voorhis on July 4, 2019, 9:28 pm

Nothing, other than meeting some ill-defined goal of "logical data independence", i.e., to be able to modify database schemas without modifying the programs that use them, for a vanishingly small number of use cases where it could work and, at best, might make a little sense if there weren't more reasonable alternatives like scrapping the notion of update-through-views and simply modifying the programs that access the database. It's not that difficult.

Indeed, considering how to build automated software engineering tools to track when database schema changes necessitate application program changes -- and reexamining coupling between applications and database management systems in general -- would be a far more productive expenditure of time and mental effort than yet another fruitless round of debating update-through-views.

I'm all for.  Where do we start ?  What would the "considering" and "reexamining" even look like ?

Quote from Erwin on July 5, 2019, 6:55 am
Quote from Dave Voorhis on July 4, 2019, 9:28 pm

Nothing, other than meeting some ill-defined goal of "logical data independence", i.e., to be able to modify database schemas without modifying the programs that use them, for a vanishingly small number of use cases where it could work and, at best, might make a little sense if there weren't more reasonable alternatives like scrapping the notion of update-through-views and simply modifying the programs that access the database. It's not that difficult.

Indeed, considering how to build automated software engineering tools to track when database schema changes necessitate application program changes -- and reexamining coupling between applications and database management systems in general -- would be a far more productive expenditure of time and mental effort than yet another fruitless round of debating update-through-views.

I'm all for.  Where do we start ?  What would the "considering" and "reexamining" even look like ?

I don't know. Discussion, I guess?

I observe that relatively successful logical data independence is typically achieved with a change in abstraction, such as wrapping database access (i.e., queries against relvars/tables) with a procedural interface (e.g., stored procedures), or wrapping one API in another conceptually-different API.

Achieving logical data independence appears (at least so far) to be generally unworkable when trying to retain the same level of abstraction, such as wrapping base relvars in virtual relvars.

I've suggested before (as have others) that we'd gain more benefit -- compared to update-through-views -- from always wrapping database updates and shared business logic in stored procedures / operators, and providing automated mechanisms to expose those procedures / operators (and only those procedures / operators; no query-level database updates, only read-only ad hoc queries for reporting) to application programs, along with automated mechanisms to identify breaking API changes.

Of course, this means updates must be manually defined (same for PostgreSQL's RULEs or SQL Server's INSTEAD OF triggers, before anyone mentions those) but until there is some sensible general and agreed-upon specification of update-through-views -- and not, as it is currently, an endless parade of exceptions and contradictions -- it's the best we've got...

...unless there is some other shift in abstraction level that we haven't yet considered.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on July 4, 2019, 9:08 pm

 

An imperative language like Tutorial D would be awkward and strange without private (local) variables of scalar or tuple type, but the absence of persistent[1] global variables of scalar or tuple type isn't a problem.

--
[1] Non-persistent global variables of scalar or tuple type are fine.

/me chuckles.

People always think that what they have, or have had, is essential, whereas what they have never had isn't a problem.  I climb the stairs to my apartment once a day or more and don't complain, but if an elevator could be and were installed (there is simply no room for it), I would complain plenty when it broke down.  There are use cases for persistent scalars, there is precedent for providing them, they are orthogonal, and the workarounds for the lack of them are obvious kludges.  I have spoken.

Quote from johnwcowan on July 5, 2019, 4:40 pm
Quote from Dave Voorhis on July 4, 2019, 9:08 pm

 

An imperative language like Tutorial D would be awkward and strange without private (local) variables of scalar or tuple type, but the absence of persistent[1] global variables of scalar or tuple type isn't a problem.

--
[1] Non-persistent global variables of scalar or tuple type are fine.

/me chuckles.

People always think that what they have, or have had, is essential, whereas what they have never had isn't a problem.  I climb the stairs to my apartment once a day or more and don't complain, but if an elevator could be and were installed (there is simply no room for it), I would complain plenty when it broke down.  There are use cases for persistent scalars, there is precedent for providing them, they are orthogonal, and the workarounds for the lack of them are obvious kludges.  I have spoken.

What do you need them for?

For a pedagogic language like Tutorial D, I'm not sure they'd add much. Admittedly, I don't find much use for them and I've written a fair amount of "production" Tutorial D code.

Aside: An early release of Rel, which implemented Tutorial D as it does now but also provided a number of extensions, effectively allowed persistent arrays. One user proudly sent me some code he'd written to demonstrate how he could use arrays for everything (he'd written an "array algebra") and was delighted that he could finally dispense with those pesky and confusing relations, relvars, and relational algebra. I subsequently removed the facilities that allowed persistent arrays.

If your D and/or coding style would benefit significantly from persistent scalars then I'm sure it wouldn't be a problem, though I suspect you will hear some grumbles about violation of Codd's Rule 1.

Quote from AntC on July 5, 2019, 6:50 am

For every case where there's a plausible strategy, there's a gazillion cases with no strategy. Hence the 'not worth the candle'. How to INSERT into

S WHERE City = 'Leeds' {ALL BUT Status}

Trivially insertable.

S WHERE Status < 21 AND Status > 19 {ALL BUT Status}

Trivially not insertable; not equality.

S WHERE Status MOD 5 = 0 AND Status MOD 4 = 0 AND Status BETWEEN 1 AND 39 {ALL BUT Status}

A straightforward "value inference" engine analogous to Hindley-Milner type inference will show that this is insertable as well.

(S WHERE IS_EMPTY(REL{TUP{S# S#}} MATCHING SP)) {ALL BUT Status}   // see below

I'd have to figure this out to see if it works, and I'm not willing to do so today.

Would you like to specify a general rule -- as in what the compiler/DBMS would have to go by?

The general rule is easy.  It's the "Minute Particulars" (Blake) that are tricky.
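For what it's worth, the finite-domain cases can be checked without anything as sophisticated as Hindley-Milner machinery. This brute-force sketch (mine, purely illustrative) enumerates the restriction's solutions and treats the INSERT as unambiguous only when exactly one value of the projected-out attribute survives:

```python
# Brute-force "value inference" for insert-through-projection: over a finite
# STATUS domain, an INSERT through `S WHERE pred {ALL BUT Status}` has a
# known Status value iff exactly one domain value satisfies the restriction.
def inferred_value(pred, domain):
    """Return the unique domain value satisfying pred, else None."""
    candidates = [v for v in domain if pred(v)]
    return candidates[0] if len(candidates) == 1 else None

STATUS_DOMAIN = range(100)

# S WHERE Status MOD 5 = 0 AND Status MOD 4 = 0 AND Status BETWEEN 1 AND 39:
# the only multiple of 20 in 1..39 is 20, so Status is known by construction.
assert inferred_value(lambda s: s % 5 == 0 and s % 4 == 0 and 1 <= s <= 39,
                      STATUS_DOMAIN) == 20

# A restriction that doesn't mention Status pins nothing down:
assert inferred_value(lambda s: True, STATUS_DOMAIN) is None
```

Real attribute domains aren't usefully enumerable, which is where the "Minute Particulars" bite: a practical engine would need symbolic reasoning over each predicate form.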

(I reject any claim that the number of insertions to be made into the underlying relvar is unknown; in the absence of a join, it is exactly one.)

Bzzt:

As usual we seem to be at cross purposes.  I am talking about this quotation from Darwen's section of Appendix E (p. 465, PDF p. 476):

If view V is defined on some projection of relvar R1, then deletions from V are permitted but insertions, in general, are not.  The relvar predicate, P say, for V is EXISTS x (R1(x ∪ y)), where x is a tuple whose heading consists of the attributes of R1 that are excluded by the projection and y is a tuple consisting of the remaining attributes of R1.

To delete tuple y from V is to deny that there exists such an x (i.e., to assert that NOT P). The only way of achieving that effect is to delete all tuples of the form x ∪ y from R1.

To insert tuple y into V is to assert that there does exist such an x. In general, there are many equally reasonable ways of achieving that effect and to choose one of those ways would be arbitrary. Now, some writers have proposed to treat as special the case where all of the attributes of the heading of x have default values defined for them. My objection to that proposal is that we cannot safely conclude, given that there exists at least one x that makes R1(x∪y) evaluate to TRUE, that there exists exactly one such x; hence, to insert exactly one tuple into R1 would be arbitrary.

I believe that this over-applies what is true for joins (that more than one relvar may need an insert when you insert one tuple into the join view) to projection.  Or else I have misunderstood it, as is perfectly likely.

What's the business reason a user is inserting into SELECT S WHERE Status = 20 {ALL BUT Status}, as opposed to into underlying relvar S?

STATUS was introduced by Date, not me.  So let's take an actual use case.  A relation consists of items (index cards, emails, documents, what have you) that have some general properties (text, creation date, etc.) and also a fixed and finite set of categories to which a given item belongs or does not belong.  These can be neatly represented as a set of boolean attributes in the relation.

Now the UI is this: the user chooses certain categories, and all items in all of those categories are going to appear in the user's new view.  This is a restrict.  After that, it is quite useful to remove the attributes representing those categories from the view, since they now carry no information beyond what is in the definition of the view.  This is a project.

So in this case it is true by construction that an INSERT through this view must supplement the attributes and values supplied by the user with the removed attributes, all set to True as specified by the underlying restrictions.  Otherwise the inserts would not appear in the view, violating the Assignment Principle.  It is equally true if a single tuple is being inserted or many tuples are.  (Inequality works too because there are only two elements in the boolean domain, but it wouldn't in any other domain.)
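That "true by construction" step can be sketched in a few lines (the representation and names are mine, not from the post): the view fixes a set of boolean category attributes to True and projects them away, so an INSERT through the view re-supplies exactly those values.

```python
# Categories fixed by the view definition: the view is
#   items WHERE Urgent AND Work {ALL BUT Urgent, Work}  (roughly)
CHOSEN = {"Urgent", "Work"}

items = []   # base relvar: dicts of general properties + boolean categories

def insert_through_view(partial_tuple):
    """Supplement the user's tuple with the projected-out categories,
    all set to True -- the known values the restriction guarantees."""
    full = dict(partial_tuple)
    full.update({cat: True for cat in CHOSEN})
    items.append(full)

insert_through_view({"text": "call the supplier", "Personal": False})
assert items[0]["Urgent"] and items[0]["Work"]   # the insert appears in the view
```

Any other choice for the projected-out attributes would make the inserted tuple invisible in the view, which is the Assignment Principle violation the post describes.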

Yes, it's a special case — that's what the topic says.   But it is not as special as Date's footnote suggests.

The more common 'requirement' for INSERT-through-projection is (say) an employee has started work but we don't yet have their date of birth or Social Security number and/or the lowly payroll inputter is not allowed to see such sensitive info. (That's more the case I had in mind, and I don't see any defaulting rules that would help.)

I agree with you about that use case.

 

Quote from johnwcowan on July 5, 2019, 4:40 pm
Quote from Dave Voorhis on July 4, 2019, 9:08 pm

 

An imperative language like Tutorial D would be awkward and strange without private (local) variables of scalar or tuple type, but the absence of persistent[1] global variables of scalar or tuple type isn't a problem.

--
[1] Non-persistent global variables of scalar or tuple type are fine.

/me chuckles.

People always think that what they have, or have had, is essential, whereas what they have never had isn't a problem.  I climb the stairs to my apartment once a day or more and don't complain, but if an elevator could be and were installed (there is simply no room for it), I would complain plenty when it broke down.  There are use cases for persistent scalars, there is precedent for providing them, they are orthogonal, and the workarounds for the lack of them are obvious kludges.  I have spoken.

If your language has the concept of scopes (such as local and global) and variables and types then to have variables of any type and any scope is the normal expectation. To do less is to impose an artificial restriction. This says nothing about database variables.

A relational database has its own rules regarding access, visibility, persistence, transaction control and so on, which the language has to accept and implement. (Again, my strong preference is for that to be mediated by an abstraction layer.)

If you have use cases for persistent scalars, they are outside the database and would require a whole new set of rules, not defined by TTM.

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on July 5, 2019, 7:53 am
Quote from Erwin on July 5, 2019, 6:55 am
Quote from Dave Voorhis on July 4, 2019, 9:28 pm

Nothing, other than meeting some ill-defined goal of "logical data independence", i.e., to be able to modify database schemas without modifying the programs that use them, for a vanishingly small number of use cases where it could work and, at best, might make a little sense if there weren't more reasonable alternatives like scrapping the notion of update-through-views and simply modifying the programs that access the database. It's not that difficult.

Indeed, considering how to build automated software engineering tools to track when database schema changes necessitate application program changes -- and reexamining coupling between applications and database management systems in general -- would be a far more productive expenditure of time and mental effort than yet another fruitless round of debating update-through-views.

I'm all for.  Where do we start ?  What would the "considering" and "reexamining" even look like ?

I don't know. Discussion, I guess?

I observe that relatively successful logical data independence is typically achieved with a change in abstraction, such as wrapping database access (i.e., queries against relvars/tables) with a procedural interface (e.g., stored procedures), or wrapping one API in another conceptually-different API.

Achieving logical data independence appears (at least so far) to be generally unworkable when trying to retain the same level of abstraction, such as wrapping base relvars in virtual relvars.

I've suggested before (as have others) that we'd gain more benefit -- compared to update-through-views -- from always wrapping database updates and shared business logic in stored procedures / operators, and providing automated mechanisms to expose those procedures / operators (and only those procedures / operators; no query-level database updates, only read-only ad hoc queries for reporting) to application programs, along with automated mechanisms to identify breaking API changes.

Of course, this means updates must be manually defined (same for PostgreSQL's RULEs or SQL Server's INSTEAD OF triggers, before anyone mentions those) but until there is some sensible general and agreed-upon specification of update-through-views -- and not, as it is currently, an endless parade of exceptions and contradictions -- it's the best we've got...

...unless there is some other shift in abstraction level that we haven't yet considered.

 

I'm a big fan of designing a procedural API for use by the application programmers, as long as they factor properly. You need queries that take stream arguments and return stream values, which can be combined by RA operators, to give query optimisation a chance to work. Ditto updates.

LINQ to SQL is not perfect, but it gives an idea of what is possible.

Quote from dandl on July 6, 2019, 1:21 am

 

If your language has the concept of scopes (such as local and global) and variables and types then to have variables of any type and any scope is the normal expectation. To do less is to impose an artificial restriction. This says nothing about database variables.

If you treat database variables as a special case, then no, it doesn't.  If you treat them as a persistence property that can apply to any variable (or at least any global variable, as local variables have genuine issues), then saying "All persistent variables must be of a specific type" is what's artificial.

A relational database has its own rules regarding access, visibility, persistence, transaction control and so on, which the language has to accept and implement. (Again, my strong preference is for that to be mediated by an abstraction layer.)

If you have use cases for persistent scalars, they are outside the database and would require a whole new set of rules, not defined by TTM.

As I noted above, public tuple and scalar variables can be emulated as public relvars with suitable constraints.  But so can private tuple and scalar variables.
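The emulation mentioned here can be sketched as follows (Python stands in for a D; the class and names are mine): a "scalar variable" is a relvar constrained to contain exactly one single-attribute tuple.

```python
# A scalar variable emulated as a relvar with a cardinality-1 constraint.
class ScalarAsRelvar:
    def __init__(self, value):
        self._body = {(value,)}     # relvar body: a set of 1-attribute tuples

    def assign(self, value):
        new_body = {(value,)}
        assert len(new_body) == 1   # the constraint: exactly one tuple
        self._body = new_body

    def value(self):
        (tup,) = self._body         # constraint guarantees a unique tuple
        return tup[0]

counter = ScalarAsRelvar(0)
counter.assign(counter.value() + 1)
assert counter.value() == 1
```

Whether the host persists `counter` like a public relvar or scopes it like a private one is exactly the update-rules question raised in the reply below the quote.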

While I am at it, what is the meaning in TD of a public relvar declaration that is local to an operator definition?  Does it mean that the relvar is created at the point of declaration (like global public relvars) and dropped at the end of the definition?   In that case, what if the operator definition is recursive?

Or is it just a locally scoped reference to a relvar with global extent, like static variables in C/C++?

Quote from johnwcowan on July 6, 2019, 2:07 am
Quote from dandl on July 6, 2019, 1:21 am

 

If your language has the concept of scopes (such as local and global) and variables and types then to have variables of any type and any scope is the normal expectation. To do less is to impose an artificial restriction. This says nothing about database variables.

If you treat database variables as a special case, then no, it doesn't.  If you treat them as a persistence property that can apply to any variable (or at least any global variable, as local variables have genuine issues), then saying "All persistent variables must be of a specific type" is what's artificial.

You have it back to front. TTM defines a relational database and a language for accessing it. If you don't have that, it's not TTM.

You are perfectly free to add persisted non-database variables to your language, with or without database. TTM has nothing to say about them.

A relational database has its own rules regarding access, visibility, persistence, transaction control and so on, which the language has to accept and implement. (Again, my strong preference is for that to be mediated by an abstraction layer.)

If you have use cases for persistent scalars, they are outside the database and would require a whole new set of rules, not defined by TTM.

As I noted above, public tuple and scalar variables can be emulated as public relvars with suitable constraints.  But so can private tuple and scalar variables.

I don't think so. The update rules are different.

While I am at it, what is the meaning in TD of a public relvar declaration that is local to an operator definition?  Does it mean that the relvar is created at the point of declaration (like global public relvars) and dropped at the end of the definition?   In that case, what if the operator definition is recursive?

Or is it just a locally scoped reference to a relvar with global extent, like static variables in C/C++?

TTM says very little about scope. I would expect any public relvar declaration to be a reference to a database variable which, while public, happens not to be visible elsewhere in the program (not declared there). It is created explicitly (somewhere), dropped explicitly (somewhere), and every reference to it is a reference to the same database relvar. But your language might do something different.
