The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Tuple visibility through a view

Quote from johnwcowan on October 10, 2019, 1:34 am

Did I say "autonomously"?  I did not.  I suggested that it was not unreasonable for the application to tell the DBMS in advance what general approach to take in cases like this.

If you've defined a view, then you've told the DBMS in advance such-and-such. If you want the view to be updatable-through, what you've told the DBMS is that everything to insert/delete/update through this particular view must satisfy its predicate/constraint.

  Or if you're allergic to that for some reason, what about the application?

If the application wants to manipulate data that doesn't meet what you've told the DBMS in advance wrt that view, then don't access via that view: access via the base relvar or by some other view defined appropriately. What's this obsession/madness with (ab)using the wrong view?

I am still mystified what is your use case. Does your DBMS only allow one view per base relvar or something?

 

Now that all the preliminary cruft has been cleared away, let me try again.  Suppose the Powers That Be have decided ...

 

The Powers That Be seem to have decided that they disagree with logic; or disagree with the Relational Model; or want a different DBMS. Fine. NoSQL is a good way to get cruft into your database -- not just preliminary cruft, but persistent, impossible-to-eliminate cruft that spreads like bindweed. Because logic.

IOW sounds like the PTB want a key-value store where the key is a pointer to the tuple. Not a relational database at all.

It looks to me like you didn't read the second half of what I wrote.  THE DATA AS SUPPLIED TO US IS WRONG, and we have no choice but to repair it.  What is a systematic procedure by which our application can minimally repair it in such a way that it will satisfy the constraints of our database?

To forestall the objection that I am shifting ground as I go on: of course I am.  I am steadily refining my question, which means removing yet more preliminary cruft.

Quote from johnwcowan on October 10, 2019, 2:10 am

It looks to me like you didn't read the second half of what I wrote.

It seems to me you're not reading what everybody is telling you.

  THE DATA AS SUPPLIED TO US IS WRONG,

No, what you've said is there's a contradiction between the data content vs the constraint on how you want to push the data into the database. We (the DBMS) has no way to know the data is 'wrong', unless somebody tells us. And if they're telling us, that'll be via some data content -- aka the Information Principle.

and we have no choice but to repair it.

We have plenty of choices: we can reject it; we can put it in some other place whose constraint it doesn't break; we can push it through an API/Stored Procedure that manipulates it. That's an application's job, not the DBMS's job.

Specifically you want to break RM Pre 21 "After assignment of v to V (the “target variable”), the equality comparison V = v shall evaluate to TRUE (see RM Prescription 8)." If you've manipulated the v to get it to fit V's constraint, V = v won't hold; if you've forced v into the database against V's constraint, then it won't appear in V, so again V = v won't hold. In the theory of update-through-view this failing is known as a side-effect (Dayal & Bernstein) or breaking the Assignment Principle (Date).

  What is a systematic procedure by which our application can minimally repair it in such a way that it will satisfy the constraints of our database?

That is not a question that makes sense. Noone round here is going to try to answer it.

To forestall the objection that I am shifting ground as I go on: of course I am.  I am steadily refining my question, which means removing yet more preliminary cruft.

What you're failing to do is give a motivating use case. Rather than saying (or rather shouting) the data is wrong, tell the Powers That Be they are wrong. Of course real-world data is messy and users make mistakes in data entry. That's the realm of applications to detect mistakes and provide some mechanism to repair. There's nothing specific to update-through-view about it. Provide a base relvar that shadows the true relvar but has more relaxed constraints, as a 'draft' or 'Inbox' for inspection/repair by a supervisor.

Quote from johnwcowan on October 10, 2019, 1:34 am

Did I say "autonomously"?  I did not.  I suggested that it was not unreasonable for the application to tell the DBMS in advance what general approach to take in cases like this.  Or if you're allergic to that for some reason, what about the application?  Is the user allowed to tell the application what recovery strategy is to be employed?  Or must we send a letter to the employee saying "Your promotion has been canceled due to a contradiction in the database"?

Now that all the preliminary cruft has been cleared away, let me try again.  Suppose the Powers That Be have decided that if the actions taken by the code (to wit, inserting into a specific relvar) contradict what the data says, then the fault is in the data and it must be repaired.  What now is the most appropriate general strategy to minimally repair the data so that it meets the constraints? This question is actually just as appropriate for base tables as for views, so Dave's objection is no longer to the point.

Data, after all, despite its etymology, is not given to us from Heaven on tablets of gold, or even stone.  It is collected by fallible human beings who make mistakes of all kinds.  If you want to say this isn't database theory, fine.  It's data theory, then.  Or something.

 

No you did not say "autonomously", but it was implied by everything you said suggesting that it is ***the system*** which should be applying the repair tactic.

You can't expect that of "the system" (whether that refers to DBMS in isolation or DBMS+applications) because "the system" cannot possibly know that, and that is because "the system" was constructed at a time well before the inconsistency/contradiction arose.

The problem that is in the data originated (as you observed) from a fallible user who made a mistake.  The only reasonable option is to go back to ***that*** user and ask/tell him to figure out exactly what is wrong.

Here is an abstract scenario.  A implies B.  A is observed true.  B is observed false.  What are the ***three*** possible mistakes that have caused this anomaly/contradiction ?   Hint : the scenario consists of three statements.  Hence there are three possible repair tactics : remove the implication, alter the observation of A to false, alter the observation of B to true.  What makes you think it is feasible in general to know ***in advance***, that is at system design time (and it must be at system design time because the invocation of whatever possible repair tactic applies must itself be designed into the system) which of the three will be "most" appropriate in any particular given case ?

Applied to your example : being in SENIOR_EMPLOYEE implies senior.  JWC is asserted to have to be present in SENIOR_EMPLOYEE.  JWC is asserted to be not senior.  Repair tactics : remove the implication (achievable through "alter the view definiton"), make A false (that is, bluntly deny the user's command for INSERT), make B true (that is, bluntly deny the -semantics of the- assertion as made by the user commanding the INSERT).

I am allergic to systems doing any of the latter two and the first one begs the question "alter view definition TO WHAT".

Quote from Erwin on October 10, 2019, 7:31 am

The problem that is in the data originated (as you observed) from a fallible user who made a mistake.  The only reasonable option is to go back to ***that*** user and ask/tell him to figure out exactly what is wrong.

In short, your advice is "Rerun the census".  No, I don't work for the Census Bureau and never have.  I'm trying to be suggestive without revealing proprietary and forward-looking information in a public forum.  Sometimes you can't re-gather the data (for example, you may need to know the value of various measurements as of a certain time) and have to do what you can with it, without the option of discarding it altogether.  Please take my word for it that this does happen, and in a commercial setting.

Here is an abstract scenario.

I really do understand logic.  I also understand non-axiomatic reasoning, where you have to intelligently do the best you can with limited time and resources.

Quote from johnwcowan on October 9, 2019, 10:01 pm
Quote from Dave Voorhis on October 9, 2019, 9:48 pm
Quote from johnwcowan on October 9, 2019, 8:55 pm
  1. Refuse to insert T through SENIOR_EMPLOYEES and raise an exception.

That is the most sane option in a world you've already made mad.

Better would be to disallow any form of "insert through" and only permit updates to base relvars.

If you do not like crottled greeps, by all means do not eat them.  But I do: see above.

Your crottled greeps are poisonous. Any form of "insert through" is an abomination.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on October 10, 2019, 5:01 am
It seems to me you're not reading what everybody is telling you.
Naah, we are just talking past one another, that's all.  I think the moral (for me) here is that expecting database people, for whom the data is indeed 'given', to understand data repair issues is not a priori reasonable.

 No, what you've said is there's a contradiction between the data content vs the constraint on how you want to push the data into the database. We (the DBMS) has no way to know the data is 'wrong', unless somebody tells us. And if they're telling us, that'll be via some data content -- aka the Information Principle.

If you don't want the DBMS to do this, I understand that, although it potentially has privileged access to the best methods of checking its own constraints.  That's why I spoke of the total system, including data intake.

We have plenty of choices: we can reject it; we can put it in some other place whose constraint it doesn't break; we can push it through an API/Stored Procedure that manipulates it. That's an application's job, not the DBMS's job.

As may be.  The "application" is still part of the total system.  I see no reason why you couldn't code such decisions in Industrial D — if we knew what procedure to use to make them.

Specifically you want to break RM Pre 21

I don't.  The Assignment Principle only applies within the Platonic world of the database proper, where all data and all constraints are assumed correct.

That is not a question that makes sense. Noone round here is going to try to answer it.

As I said above.

What you're failing to do is give a motivating use case.

Well, suppose the data is from the 1860 U.S. Census, which is in remarkably good shape for its age.  It is in 1NF at best, and contains oodles of assertions like (to speak basic Prolog for a moment) father_of(George, James), number_of_children(George, 0).  Those are inconsistent, and a decent database constraint will catch the problem.  But you can't ask the census-taker which one is wrong, you have to do your best.  In this case, father_of is probably more securely founded than number_of_children, so the total system should retract number_of_children(George, 0) and assert number_of_children(George, 1).  This leads me to some ideas that I'll post later.

(Again, I'm talking about the Census because that is not the problem I'm working on; it's comparable in scale and amount of dirt in the data, though.  In particular, nulls in the data are a particular form of dirt to which machine learning can be applied, but algorithmic methods are much more efficient if you can get them.)

There's nothing specific to update-through-view about it.

Agreed.  That was a red herring from the start that I unfortunately dragged across the trail myself.

Provide a base relvar that shadows the true relvar but has more relaxed constraints, as a 'draft' or 'Inbox' for inspection/repair by a supervisor.

Human intervention is not practical with something as big and error-riddled as the census.  Computers need to apply heuristics settled on by humans after a general (not particular) consideration of the errors found.

Anyway, thank you all for helping me to refine the issues.

Quote from johnwcowan on October 10, 2019, 1:18 pm
Quote from AntC on October 10, 2019, 5:01 am
It seems to me you're not reading what everybody is telling you.
Naah, we are just talking past one another, that's all.  I think the moral (for me) here is that expecting database people, for whom the data is indeed 'given', to understand data repair issues is not a priori reasonable.

 No, what you've said is there's a contradiction between the data content vs the constraint on how you want to push the data into the database. We (the DBMS) has no way to know the data is 'wrong', unless somebody tells us. And if they're telling us, that'll be via some data content -- aka the Information Principle.

If you don't want the DBMS to do this, I understand that, although it potentially has privileged access to the best methods of checking its own constraints.  That's why I spoke of the total system, including data intake.

We have plenty of choices: we can reject it; we can put it in some other place whose constraint it doesn't break; we can push it through an API/Stored Procedure that manipulates it. That's an application's job, not the DBMS's job.

As may be.  The "application" is still part of the total system.  I see no reason why you couldn't code such decisions in Industrial D — if we knew what procedure to use to make them.

Specifically you want to break RM Pre 21

I don't.  The Assignment Principle only applies within the Platonic world of the database proper, where all data and all constraints are assumed correct.

That is not a question that makes sense. Noone round here is going to try to answer it.

As I said above.

What you're failing to do is give a motivating use case.

Well, suppose the data is from the 1860 U.S. Census, which is in remarkably good shape for its age.  It is in 1NF at best, and contains oodles of assertions like (to speak basic Prolog for a moment) father_of(George, James), number_of_children(George, 0).  Those are inconsistent, and a decent database constraint will catch the problem.  But you can't ask the census-taker which one is wrong, you have to do your best.  In this case, father_of is probably more securely founded than number_of_children, so the total system should retract number_of_children(George, 0) and assert number_of_children(George, 1).  This leads me to some ideas that I'll post later.

(Again, I'm talking about the Census because that is not the problem I'm working on; it's comparable in scale and amount of dirt in the data, though.  In particular, nulls in the data are a particular form of dirt to which machine learning can be applied, but algorithmic methods are much more efficient if you can get them.)

There's nothing specific to update-through-view about it.

Agreed.  That was a red herring from the start that I unfortunately dragged across the trail myself.

Provide a base relvar that shadows the true relvar but has more relaxed constraints, as a 'draft' or 'Inbox' for inspection/repair by a supervisor.

Human intervention is not practical with something as big and error-riddled as the census.  Computers need to apply heuristics settled on by humans after a general (not particular) consideration of the errors found.

Anyway, thank you all for helping me to refine the issues.

Age-old census figures are what they are and registering them in a db is a top notch case for ***not declaring constraints at all***.

Whatever contradictions are held within those figures is something the present day user must simply learn to live with.

Present day users of those data must interpret the data not as "father_of(George, Bill)" but as "the census says that the father of George is Bill".  In their minds they're probably already very good at doing exactly that imo.  That's no different than historians keeping in the back of their minds that the author of an age-old document might just have been so laudable about the person he was praising because that person had the hand that fed the author his daily bread.

Besides, this doesn't prohibit going in search of inconsistencies.  Instead of declaring a faults expression as a constraint which will ***prevent*** the fault, just execute the faults expression as a query and it will ***list*** the faults that are there.  Automated improvement is not possible until a human has done enough detailed investigation to conclude that there are some "patterns" of errors that have been frequently made, as well as to conclude with sufficient degree of certainty what the real state of affairs most likely had been for each individual specific error pattern concerned.

(And if the father of George is James, then there is no inconsistency in claiming George has no children :-)  But I got your point.)

Quote from Erwin on October 9, 2019, 9:29 pm
Quote from johnwcowan on October 9, 2019, 8:55 pm

Very concretely, then, and talking wlg only about the visible-insert case:

I have a relvar holding an EMPLOYEEs relation.   There is a field SENIORITY, and I have a view SENIOR_EMPLOYEES defined as EMPLOYEES WHERE SENIORITY = "senior".  I have constructed a tuple T representing a new employee, and I wish to insert T through the SENIOR_EMPLOYEES view.

If the attribute SENIORITY in T = "senior", all is well.  If not, the database engine has three choices:

  1. Refuse to insert T through SENIOR_EMPLOYEES and raise an exception.
  2. Insert T anyway, and accept that this employee is not senior and will not appear in the view it was inserted through.
  3. Insert T', where T' is equal to T except in the SENIORITY field.  This requires access to the definition of SENIOR_EMPLOYEES, but I posit that.

It's case 3 that interests me.  ...

 

I'm relatively sorry to have to say, but this post betrays that you don't understand the CWA and/or predicates aka set membership conditions (and the role they play in the how and why the RM actually works).

The only reasonable option is 1.

That's because asserting that someone is a [member of the] SENIOR_EMPLOYEES (-and making that assertion is exactly what you semantically mean by inserting into this view-) despite the fact that it has been formally stated that being a member of the SENIOR_EMPLOYEES ***requires*** seniority == "senior" and despite the fact that you are asserting that for this particular employee, that is not the case.

In logic that is plain and simply called a contradiction, and a DBMS [supposed to] being a dependable system of logical inference, it ***CANNOT AFFORD TO ACCEPT ANY CONTRADICTIONS***.

Any busy dbms is likely to reflect a database tomorrow that contradicts today's database unless all the relations that can be derived from today's database are equal to those deriveable from tomorrow's database, let alone from minute to minute.

The question is reminiscent of pre-relational security requirements for fields that are unknown to the user to be insertable to the same record as fields that are known. The requirement hasn't disappeared but will remain impossible among the typical coders' clubs, SQL or otherwise, until they learn how to think in terms of relational operators/logical connectives, common attributes, 'quantified' attribute values, so as to exploit relations, views, predicates and data independence, understand logical validity, the significance and use of empty relations and learn how to  postpone storage complications such as relvars. A record can be altered but a tuple can't be changed.

The predicate of a projection doesn't need to be visible to an application if the dbms is capable of satisfying the predicate as well as that of a join of it and another relation that has unique value(s) common to all tuples in the join. This means that some attribute values don't need to be visible and set union can be used to reflect what is incorrectly called insert through projection, in otherwords the replaceent of a predicate that has a universally quantified attribute and a given value and that has an empty extension with a non-empty extension, provided the algebraic defnition of the view can be equated to the quantified set. This is not a solution only to one isolated requirement, but also has other diffeent uses.

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Quote from p c on October 12, 2019, 3:24 pm

In this group, a requirement for such a different use might likely be sloppily expressed as 'deleting only one side of a join'. What has never been understood by the posters here is how their approach is sabotaged by the simplistic view definitions, and therefore predicates, they choose such as S JOIN SP so the predicate reflected can never be accurate for their purpose. They are unable to synthesize the predicates they need and therfore the view defnitions they need so as to overcome the twin obstacles of relvars and normalized storage structures. They duplicate the same error the SQL world makes, assuming physical artifacts such as tables and relvars accurately reflect predicates as opposed to merely reflecting extensions. They must be physical because there are no such things as tables and relvars that set operators and their equivalent logical connectives can operate on.

Here's a simplified part of a billing system, written in Tutorial D:

VAR InvoiceHeading REAL RELATION {InvoiceNumber INT, InvoiceDate Date, CustomerID CHAR} KEY {InvoiceNumber};
VAR InvoiceDetail REAL RELATION {InvoiceNumber INT, ProductCode CHAR, Qty INT} KEY {InvoiceNumber, ProductCode};

CONSTRAINT InvoiceDetail_FK1 InvoiceDetail {InvoiceNumber} ⊆ InvoiceHeading {InvoiceNumber};

VAR Invoices VIEW InvoiceHeading JOIN InvoiceDetail

We can assume there would also be, at least, relvars and corresponding constraints for customers and products, which for simplicity's sake are not shown here.

Suppose the view Invoices is used to present invoices on some user interface. If the user deletes a tuple from Invoices, what should happen?

Alternatively, if this schema is undesirable, how should it be changed?

In other words, if there are issues that have "never been understood by the posters here", what do we need to understand?

Please be specific and as clear and logical as possible, and use examples.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org