The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Updating through a view: a special case

I have not yet fully grasped Appendix E, but I'm interested in the special case of inserting into a projection view where the removed attributes have known values.  This is mentioned in a footnote, but the question is, which values are known?  The three obvious cases are a restriction that equates the removed attribute with a constant expression (one that does not contain any variable names, though it may contain attribute names of the current relation), an equivalent general constraint, and an extension.  In order for such inserts to work, all of the removed variables must be reconstructible in this way.  I think everything will be fine even for non-constant expressions, provided there are no circular dependencies:  given a relvar with one integer attribute X currently holding an empty relation, what is the effect of INSERT INTO var {X := COUNT(var)}?

This should also apply to any other relop that removes columns as part of its work.  What I don't know is whether there are other ways to supply a known value.

(While I am at it, why are public variables restricted to being relvars?  It seems that it should be possible to make a variable of any type persistent.   Certainly you can extend a tuple to a relation of cardinality 1 or a scalar to a relation of degree 1 and cardinality 1, but the artifice is unlovely.)

Quote from johnwcowan on July 3, 2019, 3:13 pm

While I am at it, why are public variables restricted to being relvars?  It seems that it should be possible to make a variable of any type persistent.   Certainly you can extend a tuple to a relation of cardinality 1 or a scalar to a relation of degree 1 and cardinality 1, but the artifice is unlovely.

I presume it's per Codd's Information Rule, "All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables." (See https://reldb.org/c/index.php/twelve-rules/)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on July 3, 2019, 3:18 pm

I presume it's per Codd's Information Rule, "All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables." (See https://reldb.org/c/index.php/twelve-rules/)

The context shows that that rule is about not having catalog metadata that is not accessible by relational means, quite a different consideration.  With persistent variables of arbitrary type, we can have a bunch of configurational values for the database as a whole (perhaps the name and email address of the database owner) that otherwise have to be put into one or more relations of the kind I mentioned above, requiring more ceremony to test and set them. There was a lot of experience with this back in the day with Napier88 (which calls it the "persistent store") , Pascal/R and its sequels, and of course Date's proposals for PL/I.

Quote from johnwcowan on July 3, 2019, 5:22 pm
Quote from Dave Voorhis on July 3, 2019, 3:18 pm

I presume it's per Codd's Information Rule, "All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables." (See https://reldb.org/c/index.php/twelve-rules/)

The context shows that that rule is about not having catalog metadata that is not accessible by relational means, quite a different consideration.  With persistent variables of arbitrary type, we can have a bunch of configurational values for the database as a whole (perhaps the name and email address of the database owner) that otherwise have to be put into one or more relations of the kind I mentioned above, requiring more ceremony to test and set them. There was a lot of experience with this back in the day with Napier88 (which calls it the "persistent store") , Pascal/R and its sequels, and of course Date's proposals for PL/I.

Codd's "Rule 1" and associated context is the following:

Rule 1: All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables.

Even table names, column names and domain names are represented as character strings in some tables. Tables containing such names are normally part of the built-in system catalog. The catalog is accordingly a relational data base itself — one that is dynamic and active and represents the metadata (data describing the rest of the data in the system). ...

Whenever "Rule 1" comes up it tends to gets interpreted in one of two ways. Depending on who you ask, it will either be, "All information in a relational data base is represented ... in exactly one way -- by values in tables. [Which includes] even table names, column names and domain names ... [i.e.] the built-in system catalog."

Or it will be, "All information in a relational data base is represented ... in exactly one way -- by values in tables. [By which I mean] table names, column names and domain names ... [i.e.] the built-in system catalog."

I think the former is a more compelling interpretation -- mostly hinging on the presence of the "even" preceding "table names, column names ..." etc.  -- but I've heard the latter often enough. In the grand scheme of things, I doubt the elegance of a database language would be compromised by supporting persistent scalar variables for global settings -- though they might simply be a syntactic shorthand for retrieving some scalar value from a relvar.

I can easily imagine the Rule was based on the belief that insisting on one and only one persistent mechanism -- i.e., the table (now relvar) -- would enforce a consistent and simple database representation, and avoid a proliferation of different database access and update mechanisms and representations, the uttermost extension of that being (say) an object-oriented database where the memory state is persistent but the access mechanisms are arbitrary, complex and programmer-defined.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on July 3, 2019, 7:07 pm

Whenever "Rule 1" comes up it tends to gets interpreted in one of two ways. Depending on who you ask, it will either be, "All information in a relational data base is represented ... in exactly one way -- by values in tables. [Which includes] even table names, column names and domain names ... [i.e.] the built-in system catalog."

On a closer look, that is probably what Codd meant.  However, in D there seem to be many other persistent things: scalar types, keys, operators, and constraints at least.  It is hard to know what is meant by dropping these unless they are persistent: note that non-persistent variables cannot be dropped.

(It is also hard to understand what effects such drops have.  Does an expression mentioning a relvar or operator that has been dropped fail with an exception the next time it is evaluated?  What happens to a relation or tuple value with an attribute of a scalar type that has been dropped?  Or do drops perhaps only take effect after the D program terminates?  In that case, does existing code have to be recompiled?  The whole idea of dropping things inside D, as opposed to with some administrative action like that which creates and drops databases, seems incoherent to me.)

In the grand scheme of things, I doubt the elegance of a database language would be compromised by supporting persistent scalar variables for global settings -- though they might simply be a syntactic shorthand for retrieving some scalar value from a relvar.

Quite the other way about, I think. Orthogonality is a substantial factor in elegance, and treating one variable type differently from all the rest looks quite inelegant to me.

I can easily imagine the Rule was based on the belief that insisting on one and only one persistent mechanism -- i.e., the table (now relvar) -- would enforce a consistent and simple database representation, and avoid a proliferation of different database access and update mechanisms and representations, the uttermost extension of that being (say) an object-oriented database where the memory state is persistent but the access mechanisms are arbitrary, complex and programmer-defined.

I think part of the miscommunications that occur is because the critique of non-relational, non-SQL, non-OO database systems is so old that the state of the art in them is not even thought about.  Comparisons are made against IMS and Codasyl with their exposed pointers.  "Of an opinion which ceases to be doubted, the evidence ceases to be examined."  (George Bernard Shaw)

The last such database system I used, a few years ago, gave me an extremely pleasant experience.  It's a tree similar to the Posix file system, and as such hierarchical.  The edge between each node and its children is labeled by a string, and each node contains an arbitrary number of properties with arbitrary names (thus no distinction between "directories" and "files").   The system maps the REST verbs GET (get child names and property names and values of a node), PUT (update properties of a node), POST (add a child to a node, giving the edge a previously unused name), DELETE (remove a node and its descendants) extremely well.  It is general-purpose and exposes no pointers.  (I can't remember if there were explicit ascending edges or not.)  Symbolic links were also provided, whereby operating on such a node is transparently mapped to some other node identified by its absolute pathname.

The last hierarchical database before that was a Posix file system; for my purposes it could be viewed as a single relation with five string attributes and one CLOB.  I chose the file system because it required no DBA or maintainer, and because it was extremely robust compared to even the best single-machine databases in those days.  I was also able to exploit hard links to keep only a single copy of each CLOB on disk no matter how many times the same text appeared in the logical relation.  The application did not notice this except when inserting, but disks were small in those days.

Correction to come...

Quote from johnwcowan on July 3, 2019, 10:55 pm
Quote from Dave Voorhis on July 3, 2019, 7:07 pm

Whenever "Rule 1" comes up it tends to gets interpreted in one of two ways. Depending on who you ask, it will either be, "All information in a relational data base is represented ... in exactly one way -- by values in tables. [Which includes] even table names, column names and domain names ... [i.e.] the built-in system catalog."

On a closer look, that is probably what Codd meant.  However, in D there seem to be many other persistent things: scalar types, keys, operators, and constraints at least.  It is hard to know what is meant by dropping these unless they are persistent: note that non-persistent variables cannot be dropped.

I think you have to read Rule 1 with Rule 2 specifying how to access data (which does not include 'by variable name'). In a D, scalar types, keys, operator definitions and constraints are all held in the catalogue, so are accessible (as metadata) using the means in Rule 2.

(It is also hard to understand what effects such drops have.  Does an expression mentioning a relvar or operator that has been dropped fail with an exception the next time it is evaluated?  What happens to a relation or tuple value with an attribute of a scalar type that has been dropped?  Or do drops perhaps only take effect after the D program terminates?  In that case, does existing code have to be recompiled?  The whole idea of dropping things inside D, as opposed to with some administrative action like that which creates and drops databases, seems incoherent to me.)

In the grand scheme of things, I doubt the elegance of a database language would be compromised by supporting persistent scalar variables for global settings -- though they might simply be a syntactic shorthand for retrieving some scalar value from a relvar.

Quite the other way about, I think. Orthogonality is a substantial factor in elegance, and treating one variable type differently from all the rest looks quite inelegant to me.

I can easily imagine the Rule was based on the belief that insisting on one and only one persistent mechanism -- i.e., the table (now relvar) -- would enforce a consistent and simple database representation, and avoid a proliferation of different database access and update mechanisms and representations, the uttermost extension of that being (say) an object-oriented database where the memory state is persistent but the access mechanisms are arbitrary, complex and programmer-defined.

I think part of the miscommunications that occur is because the critique of non-relational, non-SQL, non-OO database systems is so old that the state of the art in them is not even thought about.  Comparisons are made against IMS and Codasyl with their exposed pointers.  "Of an opinion which ceases to be doubted, the evidence ceases to be examined."  (George Bernard Shaw)

At the time Codd wrote those rules (mid-1980's) all sorts of vendors were jumping on the 'Relational' bandwagon, and putting SQL-like query frontends on their 1960's era file systems. (Lipstick on a pig.) Yes IMS and Codasyl with their pointers. Also mid-range machines ("minicomputers") with hierarchical file systems, claiming that a whole directory was a database. They had size limits on how much data per file or entries per index or (especially) entries per 'secondary index', which meant you had to 'archive' last month's or last year's transactions into sub-directories; and then your queries had to specify which "generation(s)" they were to run over. So then the current "generation" had to include a swag of summaries of historical data/brought forward position, and queries had to combine this with current transactions. Woebetide anybody who had to 'correct' a historical transaction.

Accessing was very much not per Rule 2: you had to look at the data range requested in a query; then navigate the "generations"/file hierarchy (knowing what is the archiving policy); then merge the data from the various files. It sounds a lot like you describe for your Posix hierarchical directory. In what way is a directory structure not tantamount to pointers? It certainly isn't "by values in tables".

The last such database system I used, a few years ago, gave me an extremely pleasant experience.  It's a tree similar to the Posix file system, and as such hierarchical.  The edge between each node and its children is labeled by a string, and each node contains an arbitrary number of properties with arbitrary names (thus no distinction between "directories" and "files").   The system maps the REST verbs GET (get child names and property names and values of a node), PUT (update properties of a node), POST (add a child to a node, giving the edge a previously unused name), DELETE (remove a node and its descendants) extremely well.  It is general-purpose and exposes no pointers.  (I can't remember if there were explicit ascending edges or not.)  Symbolic links were also provided, whereby operating on such a node is transparently mapped to some other node identified by its absolute pathname.

The last hierarchical database before that was a Posix file system; for my purposes it could be viewed as a single relation with five string attributes and one CLOB.  I chose the file system because it required no DBA or maintainer, and because it was extremely robust compared to even the best single-machine databases in those days.  I was also able to exploit hard links to keep only a single copy of each CLOB on disk no matter how many times the same text appeared in the logical relation.  The application did not notice this except when inserting, but disks were small in those days.

 

Quote from Dave Voorhis on July 3, 2019, 3:18 pm
Quote from johnwcowan on July 3, 2019, 3:13 pm

While I am at it, why are public variables restricted to being relvars?  It seems that it should be possible to make a variable of any type persistent.   Certainly you can extend a tuple to a relation of cardinality 1 or a scalar to a relation of degree 1 and cardinality 1, but the artifice is unlovely.

I presume it's per Codd's Information Rule, "All information in a relational data base is represented explicitly at the logical level and in exactly one way — by values in tables." (See https://reldb.org/c/index.php/twelve-rules/)

As an aside ...

Rule 3 talks about "null values". But elsewhere Codd says NULL is not a value, but a mark; and it's difficult to treat NULL systematically as Rule 3 requires if it is merely a value.

But if NULL isn't a value, then attributes marked NULL aren't "values in tables"; then Rule 1 gets broken. Furthermore if you must access by IS [NOT] NULL ..., Rule 2 gets broken.

Quote from johnwcowan on July 3, 2019, 10:55 pm
Quote from Dave Voorhis on July 3, 2019, 7:07 pm

Whenever "Rule 1" comes up it tends to gets interpreted in one of two ways. Depending on who you ask, it will either be, "All information in a relational data base is represented ... in exactly one way -- by values in tables. [Which includes] even table names, column names and domain names ... [i.e.] the built-in system catalog."

On a closer look, that is probably what Codd meant.  However, in D there seem to be many other persistent things: scalar types, keys, operators, and constraints at least.  It is hard to know what is meant by dropping these unless they are persistent: note that non-persistent variables cannot be dropped.

(It is also hard to understand what effects such drops have.  Does an expression mentioning a relvar or operator that has been dropped fail with an exception the next time it is evaluated?  What happens to a relation or tuple value with an attribute of a scalar type that has been dropped?  Or do drops perhaps only take effect after the D program terminates?  In that case, does existing code have to be recompiled?  The whole idea of dropping things inside D, as opposed to with some administrative action like that which creates and drops databases, seems incoherent to me.)

The intention of TTM is usually taken to be that D is strongly typed, with the compiler able to check every possible statement construct for correctness. Dropping a variable or anything else other than a type means that the name goes out of scope. It is up to the language designer and compiler to ensure that the name cannot be used where it is undefined. For public variables, the compiler must track dependencies to ensure that this is enforced across separate compilation units. This is non-trivial.

Dropping or modifying a type is even more difficult, especially a public type. Yes, it may require recompilation, which may require special admin rights.

I can easily list the kinds of issues that might arise, but I don't think anyone has worked through a complete set of problems and solutions.

 

Andl - A New Database Language - andl.org
Quote from dandl on July 4, 2019, 12:35 am

The intention of TTM is usually taken to be that D is strongly typed, with the compiler able to check every possible statement construct for correctness. Dropping a variable or anything else other than a type means that the name goes out of scope. It is up to the language designer and compiler to ensure that the name cannot be used where it is undefined. For public variables, the compiler must track dependencies to ensure that this is enforced across separate compilation units. This is non-trivial.

That's much too modest a claim.  Consider a conditional statement (CASE)that drops a relvar iff some condition is true.  Then it cannot even be decided until runtime whether references to that relvar are valid or not.  If the conditional declares another relvar of the same name, who knows what happens.

I can easily list the kinds of issues that might arise, but I don't think anyone has worked through a complete set of problems and solutions.

And yet various TTM 'scriptions make these things mandatory.