The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

The problem of company cars

This is a followup to the "Programmer as Navigator" thread, talking about a problem that is hard to solve in a model like TTM where named entities are second-class.

A company's database has the usual sort of employee and department relvars, the details being unimportant.  There is also a relvar representing the cars owned by the company.  At any time, a car may be assigned to an employee for their use, a department for use by any of its members, or not assigned.  How is this to be best represented?

If there were only employee cars, it would suffice to have a relation E with attributes EMP# and CAR# with a uniqueness constraint on the CAR#, so that no car is linked to more than one employee.  Ditto for department cars only, using an anologous relation D.  But since both cases are needed, if we create both E and D, then there is the risk that a car might wind up assigned to both a unique employee and a unique department.  This does not meet the conditions of the problem.

Alternatively, one can unify E and D into a single relation A with an EMP# and an ASSIGNEE#.  The linkage from an ASSIGNEE# to an EMP# would be relation E', and to a DEPT# would be relation D'.  But then we have re-created the problem: no guarantee that the same ASSIGNEE# does not map to both an EMP# and a DEPT#!  A practical but not theoretically sound resolution is to see to it that EMP# and DEPT# have disjoint values, and then the two secondary linkage relations would be trivial and need not be expressed as real relations.

Of course in a non-TTM world with nulls-as-meaningless, this is trivial: each tuple representing a car has either a null EMP# attribute or a null DEPT# attribute or neither, and the constraint preventing both from being non-null is local.

It is also trivial in a system with first-class relvars: each car tuple contains an attribute whose value is a reference to either relvar E or relvar D, and another which contains a reference to E.EMP# or D.DEPT# as the case may be; this is a kind of dependent typing.

Finally, this problem, like all problems, can be solved by a general database constraint.

Quote from johnwcowan on March 1, 2020, 5:42 am

...

Finally, this problem, like all problems, can be solved by a general database constraint.

Yes. Under TTM all constraints are 'database constraints' even if they mention only one relvar. A referential constraint (such as a Foreign Key constraint aka Inclusion Dependency; an Equivalence Dependency; or an Exclusion Dependency -- which is what you need in this case) is necessarily a 'database constraint' because it needs to mention two (at least) relvars. In SQL you declare a FK in the referring table:

a) why do you think that isn't a 'database constraint'?

b) why do you think it's sensible or 'better' to define the constraint in one place when it affects two places?

I see no need for relvar names to be first-class to address this particular design. (There might be other data sub-language requirements which justify first-classness.)

(And no I didn't look at your example in detail: it's a FAQ on StackOverflow.)

Quote from AntC on March 1, 2020, 6:21 am

In SQL you declare a FK in the referring table:

a) why do you think that isn't a 'database constraint'?

I have, of course, no such absurd view.

b) why do you think it's sensible or 'better' to define the constraint in one place when it affects two places?

Same answer.

(And no I didn't look at your example in detail: it's a FAQ on StackOverflow.)

If you had, you might have realized that it is precisely about the insufficiency of foreign-key constraints in this case.

Or not.

 

Quote from johnwcowan on March 1, 2020, 6:31 am
Quote from AntC on March 1, 2020, 6:21 am

In SQL you declare a FK in the referring table:

a) why do you think that isn't a 'database constraint'?

I have, of course, no such absurd view.

b) why do you think it's sensible or 'better' to define the constraint in one place when it affects two places?

Same answer.

(And no I didn't look at your example in detail: it's a FAQ on StackOverflow.)

If you had, you might have realized that it is precisely about the insufficiency of foreign-key constraints in this case.

 

John, I think you're exhibiting SQL-dominated thinking, and your language of framing the question is revealing that. My reply mentioned three varieties of referential constraints; from which you can infer I was already saying foreign-key constraints are not sufficient. (Sadly, I see lots of texts, such as wikipedia, reiterating this thinking that "referential" = Foreign Key. ) I agree with TTM's preference to frame integrity as 'Dependencies' . From DBE Ch 13

foreign key constraints are self evidently not fundamental—unlike, e.g., the notion of integrity constraints in general, or candidate key constraints in particular.

See that Chapter's EXACTLY ONE OFAT LEAST ONE OF, etc. Also Chapter 23 'The Decomposition Approach' (to so-called "missing information"), section on CONSTRAINTS.

SQL is not sufficient. Foreign-key constraints are not sufficient in most schema designs. But SQL offers only hammer and flat-head screwdriver (Keys/Uniqueness constraints and Foreign Key constraints). To make SQL sufficient, SQL wants you to express any other variety of constraint using triggers for validation.

Your example needs an Exclusion Dependency (wot SQL doesn't 'ave) to exclude a car from being assigned to more than one entity. It also needs a couple of four Foreign Key constraints (one on two from the car -> employee link; one on two from the car -> department link). I note a car might be non-assigned -- i.e. not appear in either link; or might be assigned to different entities at different times. So those link relvars are a natural consequence of normalising the schema to 6NF.

IS_EMPTY( CAR_EMPLink JOIN CAR_DEPLink ); JOIN by the attribute in common, which is CAR#

Or if you don't want an Exclusion Dependency, a 'poor man's way is a View unioning those two linkages with a uniqueness constraint on the View. (The View's schema to have CAR# and a source E/D indicator. SQL vendors do not all support the standard's allowing Uniqueness constraints to be declared on Views.)

There are plenty of examples where the natural solution to the problem is to add a level of indirection. The point of the RM is abstract over the physical layout of data, but (at least in TTM) we still have very concrete references by name to databases, database relvars, attributes, types (and components) and constraints. (The D language itself has some further examples).

The point is that this example could be approached differently if an attribute could refer to a relvar. Then there would be a single attribute to express the problem directly (plus a typed union for the key).

I won't argue whether it's better, just that indirection offers different solutions.

[I've seen problems like this solved in SQL by generating SQL code.]

Andl - A New Database Language - andl.org
Quote from dandl on March 1, 2020, 1:41 pm

There are plenty of examples where the natural solution to the problem is to add a level of indirection.

So you want pointers? If your 'indirection' in the schema is via business-related fields, the RM already gives you that as referential integrity -- just don't restrict your thinking to foreign-key constraints.

The point of the RM is abstract over the physical layout of data, but (at least in TTM) we still have very concrete references by name to databases, database relvars, attributes, types (and components) and constraints. (The D language itself has some further examples).

Those references are within the data manipulation language, not the schema. As my earlier message said, I'm open to allowing that in a D.

The point is that this example could be approached differently if an attribute could refer to a relvar. Then there would be a single attribute to express the problem directly (plus a typed union for the key).

You're going to need much stronger examples (and many of them) before turning the RM upside down. Note here (and I expect this is typical) a car might be non-assigned. Then your attribute could refer to an Employee, a Department or nothing. Nulls to the rescue! But we don't need that attribute nullable if we instead put the link in a um ... link relvar. Your 'typed union' must be over a finite, pre-determined number of types (one for each referred-to relvar). Well then just make a finite, pre-determined number of link relvars. And add a Exclusion Dependency.

I won't argue whether it's better, just that indirection offers different solutions.

And what is the predicate for a relvar containing an attribute that points to either {Employee, E} or {Department, D}? How do I query this relvar to find the current assignment for a car? Is the query (dealing with the type-union) any simpler than one over link tables? I don't see a solution here, only a path to more difficulties.

[I've seen problems like this solved in SQL by generating SQL code.]

Of course because SQL can't do this. (Or maybe it can but the DBA doesn't want such nasties as Union Views in their database.) That code should be in triggers as a way to at least make it visible. (But DBAs don't like triggers either; for probably better reasons.)

 

Quote from AntC on March 1, 2020, 11:04 pm
Quote from dandl on March 1, 2020, 1:41 pm

There are plenty of examples where the natural solution to the problem is to add a level of indirection.

So you want pointers? If your 'indirection' in the schema is via business-related fields, the RM already gives you that as referential integrity -- just don't restrict your thinking to foreign-key constraints.

Of course not. Indirection only means that an enquiry of A is redirected to B. An attribute of type "RELVAR" is a reference to a particular relvar. An attribute of type "ATTRIBUTE" is a reference to another attribute. And so on, not a pointer in sight.

The point of the RM is abstract over the physical layout of data, but (at least in TTM) we still have very concrete references by name to databases, database relvars, attributes, types (and components) and constraints. (The D language itself has some further examples).

Those references are within the data manipulation language, not the schema. As my earlier message said, I'm open to allowing that in a D.

There is no such division. The database schema must contain every persistent named thing defined by its D. If a type, component, database relvar or constraint is persistent, it must be represented in the schema. Given that a type or constraint expresses a boolean expression, persistent operators must also be represented in the schema. Indirection simply allows the choice of named thing to be set by a variable/value rather than being statically encoded.

The point is that this example could be approached differently if an attribute could refer to a relvar. Then there would be a single attribute to express the problem directly (plus a typed union for the key).

You're going to need much stronger examples (and many of them) before turning the RM upside down. Note here (and I expect this is typical) a car might be non-assigned. Then your attribute could refer to an Employee, a Department or nothing. Nulls to the rescue! But we don't need that attribute nullable if we instead put the link in a um ... link relvar. Your 'typed union' must be over a finite, pre-determined number of types (one for each referred-to relvar). Well then just make a finite, pre-determined number of link relvars. And add a Exclusion Dependency.

I have no intention of taking part in a pecker contest of examples and implementations. My experience tells me that many problems that have lengthy, laborious but plausible solutions at one level of indirection have shorter, simpler, more elegant solutions at the next level up. I have no quarrel with the RM as such, but in the same way that TTM has a higher level type system than the domains Codd wrote about, it's perfectly feasible to imagine a richer system for naming and referencing the components that make up a database and its schema, in which certain kinds of problems can be expressed more elegantly.

The changes are minimal:

  • new types for references to relvars, attributes, constraints and operators. I would leave types for database and type out of this list.
  • union types, if  references are allowed to resolve to objects of the same kind but different types.
  • queries on those new types.

I won't argue whether it's better, just that indirection offers different solutions.

And what is the predicate for a relvar containing an attribute that points to either {Employee, E} or {Department, D}? How do I query this relvar to find the current assignment for a car? Is the query (dealing with the type-union) any simpler than one over link tables? I don't see a solution here, only a path to more difficulties.

Cars: Car [C#] has registration plate [PLATE] and is assigned to [ASSIGNEE]

The query is like any other. Eventually you will get a value for ASSIGNEE which will be a reference to either {Employee, E} or {Department, D}. Then you can query via the reference to get further details. Really nothing complicated at all. Just a bit of simple indirection, all completely type safe.

[I've seen problems like this solved in SQL by generating SQL code.]

Of course because SQL can't do this. (Or maybe it can but the DBA doesn't want such nasties as Union Views in their database.) That code should be in triggers as a way to at least make it visible. (But DBAs don't like triggers either; for probably better reasons.)

You missed my point. The fact that such code gets written guarantees there are problems of this kind that people want to solve. I'm just proposing a particular way of solving them by indirection rather than by generating code.

 

Andl - A New Database Language - andl.org
Quote from dandl on March 2, 2020, 12:07 am
Quote from AntC on March 1, 2020, 11:04 pm
Quote from dandl on March 1, 2020, 1:41 pm

There are plenty of examples where the natural solution to the problem is to add a level of indirection.

So you want pointers? If your 'indirection' in the schema is via business-related fields, the RM already gives you that as referential integrity -- just don't restrict your thinking to foreign-key constraints.

Of course not.

A relvar name (or attribute name) in an attribute quacks like a pointer to me. It is not business-meaningful, although you might try to name your relvars/attributes using words in a natural language. Good luck with that over thousands of relvars and tens of thousands of attribute names. The relvar name points to a db-internal structure just as much as does an ObjectId or a memory address.

I have no intention of taking part in a pecker contest of examples and implementations. My experience tells me that many problems that have lengthy, laborious but plausible solutions at one level of indirection ...

My experience tells me that lengthy, laborious solutions in SQL/code are because SQL is not fit for purpose.

[I've seen problems like this solved in SQL by generating SQL code.]

Of course because SQL can't do this. (Or maybe it can but the DBA doesn't want such nasties as Union Views in their database.) That code should be in triggers as a way to at least make it visible. (But DBAs don't like triggers either; for probably better reasons.)

You missed my point. The fact that such code gets written guarantees there are problems of this kind that people want to solve.

The 'problems' in this case are because SQL doesn't implement the RM. And that the industry's thinking is so dominated by SQL and Nulls and foreign-keys as the only solution, guarantees that the code that gets written is a bodge to make a hammer and flathead screwdriver mimic proper nuts and bolts.

I'm just proposing a particular way of solving them by indirection rather than by generating code.

 

The solution I've outlined needs one extra form of constraint wot SQL does not 'ave. No code. But while we're at it, let's support constraints being arbitrary Boolean conditions using relational operators.

Quote from AntC on March 2, 2020, 12:42 am

A relvar name (or attribute name) in an attribute quacks like a pointer to me. It is not business-meaningful, although you might try to name your relvars/attributes using words in a natural language. Good luck with that over thousands of relvars and tens of thousands of attribute names. The relvar name points to a db-internal structure just as much as does an ObjectId or a memory address.

This is precisely the argument people working and thinking in first-order languages like Java and C# make against first-class procedures.  Eventually they discover that the first-order viewpoint is too restrictive and engage in one or more of various kludges to work around the lack of them.

The solution I've outlined needs one extra form of constraint wot SQL does not 'ave. No code. But while we're at it, let's support constraints being arbitrary Boolean conditions using relational operators.

Well, now that I know what exclusion constraints are, I will investigate them.  But in general I favor the strongest possible meta-constraints on the form of constraints.

 

Quote from johnwcowan on March 2, 2020, 2:41 am
Quote from AntC on March 2, 2020, 12:42 am

A relvar name (or attribute name) in an attribute quacks like a pointer to me. It is not business-meaningful, although you might try to name your relvars/attributes using words in a natural language. Good luck with that over thousands of relvars and tens of thousands of attribute names. The relvar name points to a db-internal structure just as much as does an ObjectId or a memory address.

This is precisely the argument people working and thinking in first-order languages like Java and C# make against first-class procedures.  Eventually they discover that the first-order viewpoint is too restrictive and engage in one or more of various kludges to work around the lack of them.

Yes, I'm with you on this one. I remember people praising Fortran, and Cobol, and Algol similar terms. None of them provide indirection, so why need it?

But be careful lumping C# in with Java. C# supports first-class procedures in most scenarios, and you can usually write elegant FP-like code and avoid destructive assignments. You can pass the function to the data just as well as the other way around. It's not all roses, but it's getting quite a nice smell to it.

Java (at least as at v8) has streams and lambdas, but they don't play nicely with the rest of the language, and you get to write a lot of stuff the compiler could work out for itself. It still feels very Pascal-ish to write. Anyone for C# cross-compiling to the JVM? https://stackoverflow.com/questions/682608/implementing-c-sharp-for-the-jvm

The solution I've outlined needs one extra form of constraint wot SQL does not 'ave. No code. But while we're at it, let's support constraints being arbitrary Boolean conditions using relational operators.

Well, now that I know what exclusion constraints are, I will investigate them.  But in general I favor the strongest possible meta-constraints on the form of constraints.

 

 

Andl - A New Database Language - andl.org