The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

A Scheme library for the true RM

I've been thinking a lot about the true RM in the interest of implementing a library for it in Scheme (a dialect of Lisp), somewhat similar to the Python implementation Dee.  In Scheme, everything is first-class except variables and syntax extensions (macros).  That aside, however, Scheme is a dynamically typed impure functional programming language: that is, assignment to a variable is permitted but discouraged, and compilers typically rewrite assignable variables to be bound to a mutable box object that holds the underlying value.  As such it is almost, but not quite, entirely unlike D.

For Scheme purposes, a relvar can't be a variable, because given a Scheme variable's name all you can get is its value, whereas relvars have several other properties beside their relation value:  a name, a list of keys, a list of constraints, possibly other things like a physical database reference.  After muddling on it for a while, I came to the conclusion that a relvar for Scheme has to be a mutable object whose properties are the things mentioned above.  A Scheme variable can then be bound (typically but not necessarily immutably) to a relvar object.  The procedures (functions) that operate on relations must also be made to operate on relvars, which is easy thanks to dynamic typing.  (This introduces the possibility of relations that contain relvars, but I intend to sweep that under the rug: do that at your own risk.  The same is true of relations that contain first-class functions.)

I currently think the pr{e,o}scriptions will be satisfied except for the following (some knock-on consequences affect other 'scriptions, but just how is not stated here):

RM PRE 2: No update operators, at least not yet; only relational assignment (not the same as Scheme variable assignment, but rather mutating the "relational value" property of a relvar).

RM PRE 5:  No update operators for components of scalar types either.

RM PRE 8:  Scheme types have equivalence procedures that are more fine-grained than type-specific equality; use them with relations at your own risk.

RM PRE 22:  I see no objection to supporting < and > for tuples in the sense that T1 < T2 if T1's attributes are a proper subset of T2's and the values of those attributes are equal.  For relations, I think R1 < R2 if R1's attributes are a proper subset of R2's, or R1 and R2 have the same attributes and R1's tuples are a subset of R2's.  I would like comment on this.

RM PRE 23: I intend to support only key constraints, not arbitrary database constraints, at least for public relvars.

RM PRE 26:  I'll do my best.

RM PRO 4:  Here I intend to introduce something bold.  Scheme has a Maybe type generator based on Haskell's (its name is Option in ML, Scala, and Java) but dynamically typed.  For example: Maybe Integer is the type of all objects that are either Just integers or Nothing.  This is different from NULL because Just 5 (e.g.) is not an integer and cannot be treated as one: it has to be unwrapped in a way that respects the possibility that it is Nothing. Integers remain integers and have no extra element NULL.  My sense is that while this violates the letter of RM PRO 4, it respects the spirit of it.  Comments welcome.

It also allows outer joins to be safely introduced:  if R1 and R2 have attributes named A, B, C and A, X, Y respectively, all of type Integer, then A OUTER_JOIN B returns a relation with all five attributes, but only A retains the type Integer, whereas the others have the type Maybe Integer.  The tuples that an ordinary (inner) join would produce have Just Integer values, whereas the remaining tuples from both R1 and R2 have Nothing values.  The definitions of LEFT_OUTER_JOIN and RIGHT_OUTER_JOIN follow.

OO PRE 1:  Scheme is dynamically typed, though some implementations have non-standard exceptions that declare the types of both values and variables and can check them at compile time.

OO PRE 2:  No type inheritance in standard Scheme, though implementations provide it as an extension.

OO PRO 4, 5:  I have not yet resolved how best to implement transactions, but I will conform to these prescriptions when I do.

OO PRE 6:  I intend to introduce a general fold over relations, which requires the user rather than the system to provide an identity element.  Fold(x, +, 0> folds + over the sequence x as follows:  + is invoked on the first element of x and the identity 0 to produce intermediate result r, then + is invoked again on the next element and the result, and so on until the sequence is exhausted.  A zero-length sequence immediately returns 0, so for a sequence of length n, exactly n calls to + are made.  This generalizes to string concatenation and all sorts of other things: fold (left fold in Scheme) is almost a totipotent operator.

RM VSS 2, 4, 5: I intend to provide only key constraints and type constraints, at least for public relvars.

OO VSS 3:  My design will not be single-level.

Another constraint I am imposing: in implementations conforming to this design (including mine), scalars in public relvars MAY be limited to just numbers, strings of arbitrary length, single characters (which may be indistinguishable from strings of length 1), booleans, bytevectors (blobs), and the corresponding Maybe types.  Furthermore, lists will be forbidden in public relvars and possibly arrays as well.  These restrictions make certain implementation techniques easier.

I have also been thinking more speculatively about how to use a SQL DBMS as a back-end (this is distinct from providing access to existing tables).  For my purposes, each relvar is stored in one table, except that in dynamically typed DBMSes like SQLite the attribute type names have to be kept elsewhere.  Columns for non-Maybe types are marked NOT NULL, whereas for Maybe types they are marked NULL.  The restrictions on scalars above mean that even low-conforming DBMSes can be used.  I think it will suffice to use the SQL system just to join and restrict, with everything else done in-memory.  (One issue is that a first-class function in Scheme used for restriction can't be "decompiled" into a SQL expression.  There may have to be two distinct representations of restriction functions.)

All comments, criticisms, or kudos welcome!

Addendum:  for this purpose, Just (Just 5) collapses to Just 5, and Just Nothing collapses to Nothing.

Quote from johnwcowan on June 5, 2019, 4:20 pm

For Scheme purposes, a relvar can't be a variable, because given a Scheme variable's name all you can get is its value, whereas relvars have several other properties beside their relation value:  a name, a list of keys, a list of constraints, possibly other things like a physical database reference.

It's good to hear of a proposal for a D-like language in a style not yet attempted (I assume).  Here I just want to say that a constraint in general is not a property of a single variable.  Chris and I even now have reservations about regarding keys as such too, though we followed tradition in that respect in Tutorial D.  A key declaration, after all, is only a convenient shorthand for something that can also be expressed using a CONSTRAINT statement.  As for the name, I'm unfamiliar with any means by which that can be dispensed with and the thing in question still to be assignable to or referenced to obtain its value.  Physical database reference is totally separate from the variable declaration itself and is expected to be defined by and visible to DBAs only; as far as D is concerned, it's not even part of the language as defined in the reference manual.

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on June 6, 2019, 3:03 pm

It's good to hear of a proposal for a D-like language in a style not yet attempted (I assume).  Here I just want to say that a constraint in general is not a property of a single variable.  Chris and I even now have reservations about regarding keys as such too, though we followed tradition in that respect in Tutorial D.  A key declaration, after all, is only a convenient shorthand for something that can also be expressed using a CONSTRAINT statement.  As for the name, I'm unfamiliar with any means by which that can be dispensed with and the thing in question still to be assignable to or referenced to obtain its value.  Physical database reference is totally separate from the variable declaration itself and is expected to be defined by and visible to DBAs only; as far as D is concerned, it's not even part of the language as defined in the reference manual.

Hugh

Thanks so much for the quick response!

As I noted, it's closest to Dee, being a library embedded in an existing dynamically typed programming language, although one that is typically used more functionally than Python is.

I don't doubt that general constraints can handle everything that specialized constraints can!  But I wish to use the weakest system of constraints that still seems usable, rather than the strongest, especially to keep it practical to execute them on the server.  Allowing only type and key constraints also pushes people toward domain-key normal form, which I believe to be the best general-purpose normalization, without actually mandating its use.

As for relvars, let me give an example.  You'll need a little Scheme syntax to understand this, but I hope it is transparent.  Suppose there exists a database named foo with various relvars in it. This assumes that the database to be worked on may vary in different parts of the program, as seems right for a library.  Then the operation (set! george (db-read "foo" "Frank"))  causes the relvar named "Frank" in the foodatabase, along with its properties, to be read in and returned as the result of db-read; this relvar is then assigned to the local variable george.  (The ! indicates a function that mutates some portion of one of its arguments, or in the case of set!, mutates a local variable.  In either case no useful value is returned.)

If we want the current relation inside "Frank", we use the result of (relvar-value george), but as a convenience we allow a relvar to be passed to any function expecting a relation.  Now we can do one of two things.  The command(set-relvar-value! george my-relation) causes the relvar held in george to have its relation replaced with my-relation (at which point constraints are checked).  If this is to affect the database, then (db-write "foo" "frank" george) is required.

This is distinct from (set! stephen george), which causes the local variables stephen and george to refer to the same relvar; if the relvar's value is mutated through george, the result is visible through stephen as well.  So you see that in this design a relvar is neither a variable nor a value, but an anonymous mutable object that provides various properties: name, value, constraints, perhaps others.  As such it is first-class: you could write a function that returns a newly created relvar that shares properties with an existing relvar passed to it, for example, or store a bunch of relvars in a local data structure such as a list or vector.

On dynamic typing:  Dynamic typing is basically a special case of static typing in which all variables have a DT of α.

Most implementations of TTM other than Tutorial D have been along FP lines. That certainly includes mine, Andl. Most of what TTM has to say is quite unsurprising and easily implemented in any modern GP programming language. In my view the big ideas in TTM and the hardest to reconcile with existing languages are the type system, the treatment of relvars and the use of open expressions. They just happen to be things that directly contradict choices made in SQL.

TTM defines a fully extensible scalar type system plus generated tuple and relation types for persistent data in the database, and then mandates that the exact same type system be available in the programming language D. You are free to also have dynamic types in your D, but you must provide static scalar, tuple and relation types in the database and those same types in your D. You are encouraged to do type checking on those types. Your proposed limitations in this area are deeply hostile to this concept, and I find them unacceptable.

Many languages provide a record or struct type, but these are problematic because TTM requires name equivalence for relational operations. Grafting a TTM-compatible type system onto an existing language is the biggest challenge. I think it's impossible for most languages, other than by modifying the compiler. I would be interested in your approach.

TTM requires that access to persistent data in the database be made to look like any other variable, rather than something unique. If a line of code can contain a reference to a local variable V, then that can be replaced by the name of a relvar R. But there is no magic in 'assignment'. Whatever serves to update a local variable will do just fine. Even FP languages like Haskell can find a way around this one.

Finally, open expressions are a bit of a sleeper. They require nested scopes and the evaluation of an expression in the context of the database content, not easily done in most GP languages. It's not just about fold.

Andl - A New Database Language - andl.org
Quote from johnwcowan on June 6, 2019, 5:27 pm

Thanks John, I can say a lot about your approach, but it'll take some time ...

Quickly re TTM's relvars: in Scheme I would look at a monad abstraction to capture that not only the data content but also the schema (type) of a relvar can change. Note that TTM tends to think of the whole database being a variable, and relvars being a component of that data structure.

Re constraints:

...  Allowing only type and key constraints also pushes people toward domain-key normal form, which I believe to be the best general-purpose normalization, without actually mandating its use.

I think you need to read Fagin's DK/NF paper (closing sections). Because he very clearly doesn't expect Domain and Key constraints to be sufficient. He talks about Foreign Key constraints as perhaps being the only inter-relational constraints needed. (Of course in TTM terms that should be inter-relvar constraints.) And he expects FK constraints to piggy-back on Keys.

 

Quote from dandl on June 7, 2019, 2:04 am

You are free to also have dynamic types in your D, but you must provide static scalar, tuple and relation types in the database and those same types in your D.

[...]

Finally, open expressions are a bit of a sleeper. They require nested scopes and the evaluation of an expression in the context of the database content, not easily done in most GP languages. It's not just about fold.

Implementing a D in Scheme would be straightforward, but I have no interest in doing that.  Neither am I starting with an already statically typed programming language and extending its type system to do what D does, which seems to be what Andl.NET is about; I can't say I understand it deeply.   What I am doing is designing and writing a library to implement the RM, not D, in a way that is culturally compatible with Scheme, and as close to satisfying the TTM rules for D as closely as seems reasonable.

As for open expressions, they are just lambdas (anonymous functions) and at the very core of any Lisp (or Python for that matter).  As I see it, they should be provided a (list representation of a) tuple as an argument, from which they can pull attributes and their values.  Nested scopes and all are already handled by the language.

Questions about Andl: When you use the SQLite back end, where are the relation heading type names stored?  How do you represent tuple-valued and relation-valued values within relvars on a SQL back end?

I refer people again to the documentation for Dee, a Python library with the same general style that I am employing.  Here's a precis of the available functions and methods:  "__and__" is a method that implements Python's & operator for the given type and so with other such "__foo__" methods.

module functions:
def dictToTuple(heading, d):
def validateHeading(heading):
def constraintFromCandidateKeyFactory(r, Hk=None, scope={}):
def constraintFromForeignKeyFactory(r1, (r2, map), scope={}):
def constraintFromLambdaFactory(r, f, scope={}):
def _convertToShorthand(kn):
def _convertToConstraint(kn):
def relationFromCondition(f):
def relationFromExtension(f):
def AND(r1, r2):
def OR(r1, r2):
def MINUS(r1, r2):
def REMOVE(r, Hr):
def COMPOSE(r1, r2):
def RESTRICT(r, restriction = lambda trx:True):
def EXTEND(r, Hextension=[], extension = lambda trx:{}):
def SEMIJOIN(r1, r2):
def SEMIMINUS(r1, r2):
def SUMMARIZE(r1, r2, exps):
def GROUP(r, Hr, groupname):
def UNGROUP(r, groupname):
def WRAP(r, Hr, wrapname):
def UNWRAP(r, wrapname):
def DIVIDE_SIMPLE(r1, r2):
def DIVIDE(r1, r2, r3, r4):
def GENERATE(extension = {}):
def TCLOSE(r):
def QUOTA(r, limit, Hr=None, asc=True):
def COUNT(r, none=None):
def SUM(r, expression = lambda trx:None):
def AVG(r, expression = lambda trx:None):
def MAX(r, expression = lambda trx:None):
def MIN(r, expression = lambda trx:None):
def ALL(r, expression = lambda trx:None):
def ANY(r, expression = lambda trx:None):
def IS_EMPTY(r):

class Tuple(dict):
def __init__(self, _indict=None, **args):
def __getattr__(self, item):
def __setattr__(self, item, value):
def attributes(self):
def __hash__(self):
def __repr__(self):
def remove(self, head):
def project(self, head):
def extend(self, Hextension = [], extension = lambda t:{}):
def rename(self, newNames):
def wrap(self, Hr, wrapname):
def unwrap(self, wrapname):

class Relation(object):
def __init__(self, heading, body, constraints={'PK':(Key, None)}):
def setConstraints(self, constraints):
def __hash__(self):
def __getstate__(self):
def __setstate__(self,dict):
def setBody(self, body):
def heading(self):
def common(self, rel):
def __iter__(self):
def renderHTML(self, columns=None, sort=None, row_limit=None, link_columns=None, title_columns=False):
def __str__(self):
def __repr__(self):
def toTuple(self):
def fromTuple(tr, constraints=None):
def toTupleList(self, sort=None):
def fromTupleList(trl, constraints=None):
def dump(self, filename = None):
def load(self, filename = None):
def project(self, head):
def __call__(self, head):
def remove(self, head):
def rename(self, newnames):
def where(self, restriction = lambda r:True):
def extend(self, Hextension = [], extension = lambda t:{}):
def group(self, Hr, groupname):
def ungroup(self, groupname):
def wrap(self, Hr, wrapname):
def unwrap(self, wrapname):
def insert(self, rel):
def delete(self, rel):
def update(self, restriction = lambda r:True, Hsetting = [], setting = lambda t:{}):
def __contains__(self, rel):
def __eq__(self, rel):
def __ne__(self, rel):
def __lt__(self, rel):
def __gt__(self, rel):
def __le__(self, rel):
def __ge__(self, rel):
def __and__(self, rel):
def __or__(self, rel):
def __ior__(self, rel):
def __sub__(self, rel):
def __isub__(self, rel):
def __copy__(self):
def __len__(self):

 

Quote from AntC on June 7, 2019, 6:25 am
Quote from johnwcowan on June 6, 2019, 5:27 pm

Thanks John, I can say a lot about your approach, but it'll take some time ...

Quickly re TTM's relvars: in Scheme I would look at a monad abstraction to capture that not only the data content but also the schema (type) of a relvar can change. Note that TTM tends to think of the whole database being a variable, and relvars being a component of that data structure.

I think you need to read Fagin's DK/NF paper (closing sections). Because he very clearly doesn't expect Domain and Key constraints to be sufficient. He talks about Foreign Key constraints as perhaps being the only inter-relational constraints needed. (Of course in TTM terms that should be inter-relvar constraints.) And he expects FK constraints to piggy-back on Keys.

 

Thanks, and I hope to hear more.

I don't see the point of a monad here, when Scheme has perfectly good mutable records.

I will indeed read the paper: I had not understood that DK/NF did not include foreign key constraints, and I certainly meant to include them as an available constraint type.

Quote from johnwcowan on June 7, 2019, 10:19 am

Implementing a D in Scheme would be straightforward, but I have no interest in doing that.  Neither am I starting with an already statically typed programming language and extending its type system to do what D does, which seems to be what Andl.NET is about; I can't say I understand it deeply.   What I am doing is designing and writing a library to implement the RM, not D, in a way that is culturally compatible with Scheme, and as close to satisfying the TTM rules for D as closely as seems reasonable.

Then I think you'll have to make it clearer just what you intend. As I said, the key features of TTM are the unification of the type system between database and programming language, the representation of headings as generated tuple and relation types, the representation of relations as variables and the implicit scoping of open expressions. If you abandon those, it would seem you should say what you intend will replace them.

I know Python well and Dee somewhat. My recollection is that it makes little or no effort to deal with any of these.

As for open expressions, they are just lambdas (anonymous functions) and at the very core of any Lisp (or Python for that matter).  As I see it, they should be provided a (list representation of a) tuple as an argument, from which they can pull attributes and their values.  Nested scopes and all are already handled by the language.

Lambdas are easy, open expressions are not, at least in the way they interact with the type system. But I guess if you don't have relation or tuple types the point is rather moot.

Questions about Andl: When you use the SQLite back end, where are the relation heading type names stored?  How do you represent tuple-valued and relation-valued values within relvars on a SQL back end?

Andl creates a database table for each relvar, with a column for each attribute. System scalars are represented as a corresponding SQL type, POSSREP scalars, tuples and relations are serialised into a canonical binary format. Open expressions use the SQLite C-language callback mechanism via Interop.

Andl - A New Database Language - andl.org