## The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

# The problem of company cars

PreviousPage 3 of 4Next
Quote from AntC on March 3, 2020, 12:23 am

You mean some sort of shorthand for a Boolean expression? I don't see what could be more succinct than the expression Erwin and I gave.

Suppose that a relvar has ten attributes `{a, ... j}`.  Rewriting `KEY {ALL BUT b, c}` as a first-order predicate logic expression is straightforward, but I don't see how you can make it succinct.

Please explain why you want to eliminate 'general constraints'.

Because they are general.

And why you think that's a "practical need". FFS there's already two implementations that support constraints as free-standing Boolean expressions. It ain't hard.

I do wish you would read what I write and not what you assume I probably must have written.  I wanted to know what list, if any, of special-syntax constraints like `KEY` would be sufficient for practical purposes.  I am now convinced that domains, `KEY`, and `REFERENCES/REFERENCED BY` (i.e. foreign keys) are not sufficient.

Quote from johnwcowan on March 3, 2020, 2:54 am
Quote from AntC on March 3, 2020, 12:23 am

You mean some sort of shorthand for a Boolean expression? I don't see what could be more succinct than the expression Erwin and I gave.

Suppose that a relvar has ten attributes `{a, ... j}`.  Rewriting `KEY {ALL BUT b, c}` as a first-order predicate logic expression is straightforward, but I don't see how you can make it succinct.

Please explain why you want to eliminate 'general constraints'.

Because they are general.

And why you think that's a "practical need". FFS there's already two implementations that support constraints as free-standing Boolean expressions. It ain't hard.

I do wish you would read what I write and not what you assume I probably must have written.  I wanted to know what list, if any, of special-syntax constraints like `KEY` would be sufficient for practical purposes.  I am now convinced that domains, `KEY`, and `REFERENCES/REFERENCED BY` (i.e. foreign keys) are not sufficient.

I'm not objecting to there being a shorthand for `KEY` declarations; or for domain constraints. Because they attach to one place, and only that place. I'm so-so about `REFERENCES`, but I give way to the weight of precedence for Foreign Key constraints represented in Entity-Relationship modelling, for example.

I think we have an 80-20 split: 80% of the constraints can be expressed by 20% of the possible declaration constructs -- i.e. the three in that last paragraph. Then we're into diminishing returns. If we support a construct for Exclusion Dependencies, we need two (at least): one for `AT MOST ONE` (your cars example); one for `EXACTLY ONE OF` (each car must be assigned somewhere). Does that cover all plausible requirements? It covers a significant proportion, but no not all.

So I don't think anybody would bother itemising "special-syntax constraints" when Boolean expressions cover all eventualities. Historically Keys and Foreign Keys pre-date the RM; they come from indexed files and network/hierarchical models. Codd/Boyce/Fagin had to assure the industry that the RM was at least as expressive/powerful as those models. Oh, and I suppose persuade the dyed-in-the-wool programmers who didn't see how you could use a data structure as a content-addressable store without pointers. I rather thought they'd retired long ago.

However many forms of constraint, I expect any DBMS/query optimizer will translate them all into Boolean expressions using relational operators very similar to TUTORIAL D or some algebra. (Rather a messy algebra if it has to deal with SQL nulls.) Why? Because there is a huge, well-established body of knowledge on converting Boolean/predicate expressions and relational operators into equivalent forms. By "equivalent" I mean returns the same result for the same database content; that extends to ... for the same database content providing it conforms to the declared constraints. So that equivalence doesn't hold if the in-effect constraints are locked up in code such as triggers.  There's push-select-through-join; there's key inference; there's choosing which is to be the driving source for stream-based operations; there's various canonical forms by which you can assess whether two expressions are equivalent or one always returns a subset of another; etc; etc.

Neither do those equivalences hold if the in-effect constraints are locked up as values in attributes pointing to relvars. Then here's my response to David's

You need to produce a cogent argument why indirection would be a bad thing.

And let me point out that no I don't "need" to produce an argument for what is industry best practice. You [David] need to get some foundation knowledge and stop being blinkered by a programmers perpective. No apology from me for the following being in a condescending tone to a newbie. Erwin said "In a TTM system that's so trivial/mundane/... as to not even being worth a post here ..." I agree.

Why do those transforms work? Because the optimiser can look at the schema and the Boolean expressions and manipulate their form before going to the database. If all the 'logic'/structure is locked up as the value of attributes in tuples, it can only go to the data first. And get the results of the query Row By Agonising Row.

This also applies to validating proffered updates against constraints (Erwin/SIRA_PRISE is the go-to expert here). If we have a visible constraint `IS_EMPTY (CAR_ASSGN_EMP JOIN CAR_ASSGN_DEPT);` together with visible Key and Foreign Key constraints, the DBMS can unpick proffered tuples for update (note they'll be part of a Multiple Assignment) and convert the validation requirement into a single fetch from the database, using known indexes/access paths.

So the DBMS is going to work with "general constraints" anyway -- that is, free-standing Boolean expressions. It is no extra burden on the DBMS to support declaring those constraints. The DBMS can examine/manipulate the Boolean expressions, so it can see which relvars/attributes they mention; and only bother with them when it's querying or updating those targetted relvars.

Quote from dandl on March 3, 2020, 12:27 am

IS_EMPTY (CAR_ASSGN_EMP JOIN CAR_ASSGN_DEPT).

I agree, this is thematic for TTM. But please note, this is an expression written in a programming language, a D, and it's part of the database schema. See RM Pre 23.

Well, it's an expression written in ***some*** language.  I'm not sure how you determined it's a "programming" language.  It's a declaration of something that's as you say, "part of the database schema", and therefore it might be much clearer and appropriate to call it the "schema" language.

The database design world is not served very well by calling their languages "programming languages", thereby reinforcing and perpetuating the severe misconception that it's all just about programming.

Quote from johnwcowan on March 3, 2020, 2:54 am
Quote from AntC on March 3, 2020, 12:23 am

You mean some sort of shorthand for a Boolean expression? I don't see what could be more succinct than the expression Erwin and I gave.

Suppose that a relvar has ten attributes `{a, ... j}`.  Rewriting `KEY {ALL BUT b, c}` as a first-order predicate logic expression is straightforward, but I don't see how you can make it succinct.

Please explain why you want to eliminate 'general constraints'.

Because they are general.

And why you think that's a "practical need". FFS there's already two implementations that support constraints as free-standing Boolean expressions. It ain't hard.

I do wish you would read what I write and not what you assume I probably must have written.  I wanted to know what list, if any, of special-syntax constraints like `KEY` would be sufficient for practical purposes.  I am now convinced that domains, `KEY`, and `REFERENCES/REFERENCED BY` (i.e. foreign keys) are not sufficient.

"I am interested in the questions of what other forms of constraints are sufficiently useful to justify their own syntax, and whether some basis set of such forms eliminates the practical (not theoretical) need for general constraints."

You might be interested in Halpin ORM.  It has 14 "categories" of constraint, the 14th being "the remaining rest".  I never looked at it in detail, because it's an ORM language targeted at the conceptual level, and from scanning the summary definitions, I remember being left with the conviction that there had to be [nonempty] overlap between several pairs of those categories.  Which makes it too blurry to be useful, in principle.

"Rewriting `KEY {ALL BUT b, c}` as a first-order predicate logic expression is straightforward, but I don't see how you can make it succinct."

Yes.  That's why SIRA_PRISE offers defining keys the shorthand way instead of forcing the user to spell out a restrict (with full restrict condition) of a join of a relvar with a rename of itself (with full rename list).  It's the only reason I see for offering a shorthand (that the longhand is too long and tedious) and keys are the only case I am aware of.  (FWIW : in the case of foreign keys, it turns out that the shorthand is often longer than the longhand, which is the reason why SIRA_PRISE doesn't offer anything like the "REFERENCES" shorthand, which also suffers from the fact that it gets declared for relvar REFERENCING but affects relvar REFERENCED without there being any mention of that at the declaration site of REFERENCED.)

Quote from AntC on March 3, 2020, 5:57 am

So the DBMS is going to work with "general constraints" anyway -- that is, free-standing Boolean expressions. It is no extra burden on the DBMS to support declaring those constraints. The DBMS can examine/manipulate the Boolean expressions, so it can see which relvars/attributes they mention; and only bother with them when it's querying or updating those targetted relvars.

Yes.  SIRA_PRISE has one general enforcement machinery, and even the keys that are declared in shorthand are enforced by generating the longhand expression from the shorthand, and feeding that to the enforcer.

You need to produce a cogent argument why indirection would be a bad thing.

And let me point out that no I don't "need" to produce an argument for what is industry best practice. You [David] need to get some foundation knowledge and stop being blinkered by a programmers perpective. No apology from me for the following being in a condescending tone to a newbie. Erwin said "In a TTM system that's so trivial/mundane/... as to not even being worth a post here ..." I agree.

Those who subscribe to a religion hold that nothing can be said against it except by those who are likewise deeply knowledgeable. In other words, they claim immunity from criticism by anyone who is not already a member of the inner sanctum. Best practice is simply a codification of that inner sanctum.

What I see is a spirited defence of the establishment. You insist on a problem defined on your terms, and then provide a solution on like terms. Perhaps this is steam power defending against oil, valves defending against transistors.

TTM defines something rather different from current best practice. It reached its peak of development some 10 years ago, but even then D&D had already anticipated the concept I am proposing. There has been a lot of language development since then, so those of us who understand these things can see areas where TTM could be improved. Your rigid adherence to 'best practice' tells me that you do not.

Why do those transforms work? Because the optimiser can look at the schema and the Boolean expressions and manipulate their form before going to the database. If all the 'logic'/structure is locked up as the value of attributes in tuples, it can only go to the data first. And get the results of the query Row By Agonising Row.

The solution I propose carries the same benefits. It will probably optimise better, as it does not prescribe a full join.

This also applies to validating proffered updates against constraints (Erwin/SIRA_PRISE is the go-to expert here). If we have a visible constraint `IS_EMPTY (CAR_ASSGN_EMP JOIN CAR_ASSGN_DEPT);` together with visible Key and Foreign Key constraints, the DBMS can unpick proffered tuples for update (note they'll be part of a Multiple Assignment) and convert the validation requirement into a single fetch from the database, using known indexes/access paths.

So the DBMS is going to work with "general constraints" anyway -- that is, free-standing Boolean expressions. It is no extra burden on the DBMS to support declaring those constraints. The DBMS can examine/manipulate the Boolean expressions, so it can see which relvars/attributes they mention; and only bother with them when it's querying or updating those targetted relvars.

Of course.

Andl - A New Database Language - andl.org
Quote from dandl on March 3, 2020, 10:37 am

TTM defines something rather different from current best practice. It reached its peak of development some 10 years ago, but even then D&D had already anticipated the concept I am proposing. There has been a lot of language development since then, so those of us who understand these things can see areas where TTM could be improved. Your rigid adherence to 'best practice' tells me that you do not.

The solution I propose carries the same benefits. It will probably optimise better, as it does not prescribe a full join.

I suppose "the solution I propose" refers to

As far as data in the database is concerned, therefore, the only kind of dereferencing D
might possibly need is an operation that would allow a database relvar to be specified, not
directly by means of its name, but indirectly by means of a variable whose value is that
name.

and that it is supposed to be a solution to the problem of enforcing OP's exclusion constraint ?

I have seen no such solution spelled out though, so you could perhaps start off with doing that (showing how it is more concise and less error-prone for the designer compared to just declaring the constraint) and once that done, you might try to explain how you arrived at that conclusion that it would "probably optimise better, as it does not prescribe a full join", and in particular explain "better" ***compared to what***.  In case the answer is "compared to the naive enforcement strategy that just takes the join and evaluates it", if my algorithm were that stupid I'd never even have mentioned anything on the subject here ever.

Quote from Erwin on March 3, 2020, 7:39 am
Quote from dandl on March 3, 2020, 12:27 am

IS_EMPTY (CAR_ASSGN_EMP JOIN CAR_ASSGN_DEPT).

I agree, this is thematic for TTM. But please note, this is an expression written in a programming language, a D, and it's part of the database schema. See RM Pre 23.

Well, it's an expression written in ***some*** language.  I'm not sure how you determined it's a "programming" language.  It's a declaration of something that's as you say, "part of the database schema", and therefore it might be much clearer and appropriate to call it the "schema" language.

It's a programming language, by any definition. It has a lexical structure, a syntactic structure, a semantics based on the meanings of symbols. It is defined by BNF, and it conforms to some Chomsky level. TTM describes it as a boolean expression:

"values of a given set of database relvars taken in combination shall be such that a given boolean expression (which shall mention no variables
other than the database relvars in question) evaluates to TRUE."

The TTM database has attributes defined in accordance with a persistent type system, which in turn have associated operator definitions, so a database constraint must be able to call upon those types and operators. It is unlikely to be a full D, but certainly one would expect it to be a major subset.

The database design world is not served very well by calling their languages "programming languages", thereby reinforcing and perpetuating the severe misconception that it's all just about programming.

Programming languages are pervasive. The construction of computer systems proceeds entirely by writing programs, which operate on data, in accordance with algorithms. Every digital artefact is created by programs, which are constructed by programmers in programming languages.

I will happily allow that you can design using paper, sticky notes, pictures and words, but when you want it to do work, you have to program it. In some programming language.

Andl - A New Database Language - andl.org

and that it is supposed to be a solution to the problem of enforcing OP's exclusion constraint ?

I have seen no such solution spelled out though, so you could perhaps start off with doing that (showing how it is more concise and less error-prone for the designer compared to just declaring the constraint) and once that done, you might try to explain how you arrived at that conclusion that it would "probably optimise better, as it does not prescribe a full join", and in particular explain "better" ***compared to what***.  In case the answer is "compared to the naive enforcement strategy that just takes the join and evaluates it", if my algorithm were that stupid I'd never even have mentioned anything on the subject here ever.

I made only one point: that the problem as set could be solved by indirection on relvar name instead of by the method outlined. In my experience, this kind of different approach often leads to solutions that are simpler and more elegant, and should be considered if the language/type system supports it. I was asked:

And what is the predicate for a relvar containing an attribute that points to either {Employee, E} or {Department, D}? How do I query this relvar to find the current assignment for a car? Is the query (dealing with the type-union) any simpler than one over link tables? I don't see a solution here, only a path to more difficulties.

The predicate is:

`Cars: Car [C#] has registration plate [PLATE] and is assigned to [ASSIGNEE]`

The value of assignee includes a reference to the relvar by name. The type system constraint ensures that it can only reference a valid relvar, and that requires only a simple lookup on the database catalog. Assuming that various other issues are resolved the solution appears simpler and more elegant.

Andl - A New Database Language - andl.org
Quote from dandl on March 3, 2020, 1:08 pm
Quote from Erwin on March 3, 2020, 7:39 am
Quote from dandl on March 3, 2020, 12:27 am

IS_EMPTY (CAR_ASSGN_EMP JOIN CAR_ASSGN_DEPT).

I agree, this is thematic for TTM. But please note, this is an expression written in a programming language, a D, and it's part of the database schema. See RM Pre 23.

Well, it's an expression written in ***some*** language.  I'm not sure how you determined it's a "programming" language.  It's a declaration of something that's as you say, "part of the database schema", and therefore it might be much clearer and appropriate to call it the "schema" language.

It's a programming language, by any definition. It has a lexical structure, a syntactic structure, a semantics based on the meanings of symbols. It is defined by BNF, and it conforms to some Chomsky level.

Popular usage holds that a programming language is a computer language that is Turing Complete. That's what makes C a programming language, and HTML and CSS not programming languages. We usually use the term "database language" or "query language" to distinguish certain languages that are often not Turing Complete -- or that are Turing Complete but not in a remotely general-purpose way -- from programming languages.

Tutorial D is a programming language.